2003 Working Papers

Recent Submissions

  • Publication
    Predicting Library of Congress Classifications from Library of Congress Subject Headings
    (Working Paper, University of Waikato, 2003-01) Frank, Eibe; Paynter, Gordon W.
    This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: the root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a classification model mapping from sets of LCSH to nodes in the LCC tree. We present empirical results for our technique showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs.
  • Publication
    Visualizing class probability estimators
    (Working Paper, University of Waikato, Department of Computer Science, 2003-02-19) Frank, Eibe; Hall, Mark A.
    Inducing classifiers that make accurate predictions on future data is a driving force for research in inductive learning. Equally important to users, however, is how to gain information from the models produced. Unfortunately, some of the most powerful inductive learning algorithms generate "black boxes"; that is, the representation of the model makes it virtually impossible to gain any insight into what has been learned. This paper presents a technique that can help the user understand why a classifier makes the predictions that it does by providing a two-dimensional visualization of its class probability estimates. It requires the classifier to generate class probabilities, but most practical algorithms are able to do so (or can be modified to this end).
  • Publication
    From sit-forward to lean-back: Using a mobile device to vary interactive pace
    (Working Paper, University of Waikato, Department of Computer Science, 2003-03) Jones, Mark Hedley; Jain, Preeti; Buchanan, George; Marsden, Gary
    Although online, handheld, mobile computers offer new possibilities in searching and retrieving information on the go, the fast-paced, "sit-forward" style of interaction may not be appropriate for all user search needs. In this paper, we explore how a handheld computer can be used to enable interactive search experiences that vary in pace from fast and immediate through to reflective and delayed. We describe a system that asynchronously combines an offline handheld computer and an online desktop Personal Computer, and discuss some results of an initial user evaluation.
  • Publication
    Locally weighted naive Bayes
    (Working Paper, University of Waikato, Department of Computer Science, 2003-04) Frank, Eibe; Hall, Mark A.; Pfahringer, Bernhard
    Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome naive Bayes' primary weakness—attribute independence—and improve the performance of the algorithm. This paper presents a locally weighted version of naive Bayes that relaxes the independence assumption by learning local models at prediction time. Experimental results show that locally weighted naive Bayes rarely degrades accuracy compared to standard naive Bayes and, in many cases, improves accuracy dramatically. The main advantage of this method compared to other techniques for enhancing naive Bayes is its conceptual and computational simplicity.
  • Publication
    Comparison of data and process refinement
    (Working Paper, University of Waikato, Department of Computer Science, 2003-05) Reeves, Steve; Streader, David
    When is it reasonable, or possible, to refine a one-place buffer into a two-place buffer? In order to answer this question we characterise refinement based on substitution in restricted contexts. We see that data refinement (specifically in Z) and process refinement give differing answers to the original question, and we compare the precise circumstances which give rise to this difference by translating programs and processes into labelled transition systems, so providing a common basis upon which to make the comparison. We also look at the closely related area of subtyping of objects. Along the way we see how all these sorts of computational constructs are related as far as refinement is concerned, discover and characterise some (as far as we can tell) new sorts of refinement and, finally, point out some research avenues for the future.
  • Publication
    Applying propositional learning algorithms to multi-instance data
    (Working Paper, University of Waikato, Department of Computer Science, 2003-06) Frank, Eibe; Xu, Xin
    Multi-instance learning is commonly tackled using special-purpose algorithms. Development of these algorithms started because early experiments with standard propositional learners failed to produce satisfactory results on multi-instance data, more specifically the Musk data. In this paper we present evidence that this is not necessarily the case. We introduce a simple wrapper for applying standard propositional learners to multi-instance problems and present empirical results for the Musk data that are competitive with genuine multi-instance algorithms. The key features of our new wrapper technique are: (1) it discards the standard multi-instance assumption that there is some inherent difference between positive and negative bags, and (2) it introduces weights to treat instances from different bags differently. We show that these two modifications are essential for producing good results on the Musk benchmark datasets.
  • Publication
    Using keyphrases as search result surrogates on small screen devices
    (Working Paper, University of Waikato, Department of Computer Science, 2003-09) Jones, Steve; Jones, Matt; Deo, Shaleen
    This paper investigates user interpretation of search result displays on small screen devices. Such devices present interesting design challenges given their limited display capabilities, particularly in relation to screen size. Our aim is to provide users with succinct yet useful representations of search results that allow rapid and accurate decisions to be made about the utility of result documents, yet minimize user actions (such as scrolling), the use of device resources, and the volume of data to be downloaded. Our hypothesis is that keyphrases that are automatically extracted from documents can support this aim. We report on a user study that compared how accurately users categorized result documents on small screens when the document surrogates consisted of either keyphrases only, or document titles. We found no significant performance differences between the two conditions. In addition to these encouraging results, keyphrases have the benefit that they can be extracted and presented when no other document metadata can be identified.
  • Publication
    Mining data streams using option trees
    (Working Paper, University of Waikato, Department of Computer Science, 2003-09) Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon
    The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over time within these constraints. Additionally, the model must be able to be used for data mining at any point in time. This paper describes a data stream classification algorithm using an ensemble of option trees. The ensemble of trees is induced by boosting and iteratively combined into a single interpretable model. The algorithm is evaluated using benchmark datasets for accuracy against state-of-the-art algorithms that make use of the entire dataset.
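
As an illustration of the wrapper idea in the abstract of "Applying propositional learning algorithms to multi-instance data" listed above, the following is a minimal sketch and not the authors' implementation. The abstract states only that the standard multi-instance assumption is dropped and that instances are weighted by bag. Everything else here is an assumption made for illustration: the weight 1/len(bag) (so every bag carries the same total weight), the averaging of instance-level probabilities at prediction time, the hypothetical helper names `flatten_bags` and `predict_bag`, and the stand-in learner `WeightedCentroid`.

```python
from math import exp


def flatten_bags(bags, labels):
    """Turn labelled bags into weighted single instances.

    Every instance inherits its bag's label. The weight 1/len(bag) is an
    assumption: it gives each bag the same total weight regardless of size.
    """
    X, y, w = [], [], []
    for bag, label in zip(bags, labels):
        for instance in bag:
            X.append(instance)
            y.append(label)
            w.append(1.0 / len(bag))
    return X, y, w


class WeightedCentroid:
    """Stand-in propositional learner (hypothetical): weighted class centroids."""

    def fit(self, X, y, w):
        dims = len(X[0])
        self.centroids = {}
        for c in set(y):
            total = sum(wi for yi, wi in zip(y, w) if yi == c)
            self.centroids[c] = [
                sum(wi * xi[d] for xi, yi, wi in zip(X, y, w) if yi == c) / total
                for d in range(dims)
            ]
        return self

    def predict_proba(self, x):
        # Squared distance from x to each class centroid.
        dist = {
            c: sum((a - b) ** 2 for a, b in zip(x, mu))
            for c, mu in self.centroids.items()
        }
        # Softmax over negative distances, shifted by the minimum for stability.
        nearest = min(dist.values())
        scores = {c: exp(-(d - nearest)) for c, d in dist.items()}
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}


def predict_bag(model, bag):
    """Label a bag by averaging instance-level class probabilities (assumed rule)."""
    probs = [model.predict_proba(x) for x in bag]
    avg = {c: sum(p[c] for p in probs) / len(probs) for c in probs[0]}
    return max(avg, key=avg.get)
```

To use the sketch, call `flatten_bags` on the training bags, pass the resulting instances, labels, and weights to `WeightedCentroid().fit`, and then call `predict_bag` on an unseen bag. Any learner that accepts per-instance weights and returns class probabilities could replace the stand-in.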