Adaptive text mining: Inferring structure from sequences

dc.contributor.author: Witten, Ian H.
dc.date.accessioned: 2008-11-11T02:09:55Z
dc.date.available: 2008-11-11T02:09:55Z
dc.date.issued: 2004
dc.description.abstract: Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively. [en_US]
dc.format.mimetype: application/pdf
dc.identifier.citation: Witten, I.H. (2004). Adaptive text mining: Inferring structure from sequences. Journal of Discrete Algorithms, 2(2), pp. 137-159. [en_US]
dc.identifier.doi: 10.1016/j.jda.2004.04.010 [en_US]
dc.identifier.uri: https://hdl.handle.net/10289/1296
dc.language.iso: en
dc.publisher: Elsevier B.V. [en_US]
dc.relation.isPartOf: Journal of Discrete Algorithms [en_NZ]
dc.relation.uri: http://www.sciencedirect.com/science/journal/15708667 [en_US]
dc.rights: This is an author’s version of an article published in the Journal of Discrete Algorithms, (c) 2008 Elsevier B.V. [en_US]
dc.subject: computer science [en_US]
dc.subject: text mining [en_US]
dc.subject: phrase hierarchies [en_US]
dc.subject: keyphrase extraction [en_US]
dc.subject: generic entity extraction [en_US]
dc.subject: text categorization [en_US]
dc.subject: word segmentation [en_US]
dc.subject: acronym extraction [en_US]
dc.subject: compression algorithms [en_US]
dc.subject: adaptive techniques [en_US]
dc.subject: Machine learning
dc.title: Adaptive text mining: Inferring structure from sequences [en_US]
dc.type: Journal Article [en_US]
pubs.begin-page: 137 [en_NZ]
pubs.edition: June [en_NZ]
pubs.elements-id: 30422
pubs.end-page: 159 [en_NZ]
pubs.issue: 2 [en_NZ]
pubs.volume: 2 [en_NZ]
uow.identifier.article-no: 2 [en_NZ]
Files
Original bundle
Name: Adaptive Text Mining.pdf
Size: 107.73 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.79 KB
Format: Item-specific license agreed upon to submission
Description: