An open-source toolkit for mining Wikipedia

dc.contributor.author: Milne, David N.
dc.contributor.author: Witten, Ian H.
dc.date.accessioned: 2012-10-23T04:02:41Z
dc.date.available: 2012-10-23T04:02:41Z
dc.date.copyright: 2012-08
dc.date.issued: 2013
dc.description.abstract: The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques. Keywords: Annotation; Disambiguation; Ontology extraction; Semantic relatedness; Toolkit; Wikipedia [en_NZ]
dc.identifier.citation: Milne, D. N., & Witten, I. H. (2013). An open-source toolkit for mining Wikipedia. Artificial Intelligence, 194, 222-239. [en_NZ]
dc.identifier.doi: 10.1016/j.artint.2012.06.007 [en_NZ]
dc.identifier.issn: 0004-3702
dc.identifier.uri: https://hdl.handle.net/10289/6733
dc.language.iso: en
dc.publisher: Elsevier [en_NZ]
dc.relation.isPartOf: Artificial Intelligence [en_NZ]
dc.relation.ispartof: Artificial Intelligence
dc.subject: Annotation [en_NZ]
dc.subject: Disambiguation [en_NZ]
dc.subject: Ontology extraction [en_NZ]
dc.subject: Semantic relatedness [en_NZ]
dc.subject: Toolkit [en_NZ]
dc.subject: Wikipedia [en_NZ]
dc.subject: Machine learning
dc.title: An open-source toolkit for mining Wikipedia [en_NZ]
dc.type: Journal Article [en_NZ]
pubs.begin-page: 222 [en_NZ]
pubs.elements-id: 38155
pubs.end-page: 239 [en_NZ]
pubs.volume: 194 [en_NZ]
uow.identifier.article-no: C [en_NZ]