Adaptive text mining: Inferring structure from sequences
dc.contributor.author | Witten, Ian H. | |
dc.date.accessioned | 2008-11-11T02:09:55Z | |
dc.date.available | 2008-11-11T02:09:55Z | |
dc.date.issued | 2004 | |
dc.description.abstract | Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively. | en_US |
dc.format.mimetype | application/pdf | |
dc.identifier.citation | Witten, I.H. (2004). Adaptive text mining: Inferring structure from sequences. Journal of Discrete Algorithms, 2(2), pp. 137-159. | en_US |
dc.identifier.doi | 10.1016/j.jda.2004.04.010 | en_US |
dc.identifier.uri | https://hdl.handle.net/10289/1296 | |
dc.language.iso | en | |
dc.publisher | Elsevier B.V. | en_US |
dc.relation.isPartOf | Journal of Discrete Algorithms | en_NZ |
dc.relation.uri | http://www.sciencedirect.com/science/journal/15708667 | en_US |
dc.rights | This is an author’s version of an article published in the Journal of Discrete Algorithms, (c) 2008 Elsevier B.V. | en_US |
dc.subject | computer science | en_US |
dc.subject | text mining | en_US |
dc.subject | phrase hierarchies | en_US |
dc.subject | keyphrase extraction | en_US |
dc.subject | generic entity extraction | en_US |
dc.subject | text categorization | en_US |
dc.subject | word segmentation | en_US |
dc.subject | acronym extraction | en_US |
dc.subject | compression algorithms | en_US |
dc.subject | adaptive techniques | en_US |
dc.subject | Machine learning | |
dc.title | Adaptive text mining: Inferring structure from sequences | en_US |
dc.type | Journal Article | en_US |
pubs.begin-page | 137 | en_NZ |
pubs.edition | June | en_NZ |
pubs.elements-id | 30422 | |
pubs.end-page | 159 | en_NZ |
pubs.issue | 2 | en_NZ |
pubs.volume | 2 | en_NZ |
uow.identifier.article-no | 2 | en_NZ |
Files
Original bundle
- Name: Adaptive Text Mining.pdf
- Size: 107.73 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.79 KB
- Format: Item-specific license agreed to upon submission