Adaptive text mining: Inferring structure from sequences

Witten, Ian H.

dc.contributor.author	Witten, Ian H.
dc.date.accessioned	2008-11-11T02:09:55Z
dc.date.available	2008-11-11T02:09:55Z
dc.date.issued	2004
dc.description.abstract	Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively.	en_US
dc.format.mimetype	application/pdf
dc.identifier.citation	Witten, I.H. (2004). Adaptive text mining: Inferring structure from sequences. Journal of Discrete Algorithms, 2(2), pp. 137-159.	en_US
dc.identifier.doi	10.1016/j.jda.2004.04.010	en_US
dc.identifier.uri	https://hdl.handle.net/10289/1296
dc.language.iso	en
dc.publisher	Elsevier B.V.	en_US
dc.relation.isPartOf	Journal of Discrete Algorithms	en_NZ
dc.relation.uri	http://www.sciencedirect.com/science/journal/15708667	en_US
dc.rights	This is an author’s version of an article published in the Journal of Discrete Algorithms, (c) 2008 Elsevier B.V.	en_US
dc.subject	computer science	en_US
dc.subject	text mining	en_US
dc.subject	phrase hierarchies	en_US
dc.subject	keyphrase extraction	en_US
dc.subject	generic entity extraction	en_US
dc.subject	text categorization	en_US
dc.subject	word segmentation	en_US
dc.subject	acronym extraction	en_US
dc.subject	compression algorithms	en_US
dc.subject	adaptive techniques	en_US
dc.subject	Machine learning
dc.title	Adaptive text mining: Inferring structure from sequences	en_US
dc.type	Journal Article	en_US
pubs.begin-page	137	en_NZ
pubs.edition	June	en_NZ
pubs.elements-id	30422
pubs.end-page	159	en_NZ
pubs.issue	2	en_NZ
pubs.volume	2	en_NZ
uow.identifier.article-no	2	en_NZ