Applying Wikipedia to Interactive Information Retrieval

dc.contributor.advisor: Witten, Ian H.
dc.contributor.advisor: Cunningham, Sally Jo
dc.contributor.author: Milne, David N.
dc.date.accessioned: 2010-09-20T02:49:36Z
dc.date.available: 2010-09-20T02:49:36Z
dc.date.issued: 2010
dc.date.updated: 2010-09-15T05:15:38Z
dc.description.abstract: There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases (e.g. controlled vocabularies, classification schemes, thesauri and ontologies) to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday web-scale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.
dc.format.mimetype: application/pdf
dc.identifier.citation: Milne, D. N. (2010). Applying Wikipedia to Interactive Information Retrieval (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/4584 [en]
dc.identifier.uri: https://hdl.handle.net/10289/4584
dc.language.iso: en
dc.publisher: University of Waikato
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: Interactive Information Retrieval
dc.subject: Wikipedia
dc.subject: Information Extraction
dc.subject: Knowledge Bases
dc.subject: Data Mining
dc.title: Applying Wikipedia to Interactive Information Retrieval [en]
dc.type: Thesis
pubs.place-of-publication: Hamilton, New Zealand [en_NZ]
thesis.degree.grantor: University of Waikato
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD) [en_NZ]
Files
Original bundle
Name: thesis.pdf
Size: 8.59 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.07 KB
Format: Item-specific license agreed upon to submission