1999 Working Papers

 
  • High precision traffic measurement by the WAND research group

    Cleary, John G.; Graham, Ian; McGregor, Anthony James; Pearson, Murray W.; Siedins, Ilze; Curtis, James; Donnelly, Stephen F.; Martens, Jed; Martin, Stele (Department of Computer Science, 1999-12)
    Over recent years the size and capacity of the Internet has continued its exponential growth driven by new applications and improving network technology. These changes are particularly significant in the New Zealand context ...
  • The Niupepa Collection: Opening the blinds on a window to the past

    Keegan, Te Taka Adrian Gregory; Cunningham, Sally Jo; Apperley, Mark (Department of Computer Science, 1999-12)
    This paper describes the building of a digital library collection of historic newspapers. The newspapers (Niupepa in Maori), which were published in New Zealand during the period 1842 to 1933, form a unique historical ...
  • Clustering with finite data from semi-parametric mixture distributions

    Wang, Yong; Witten, Ian H. (Dept. of Computer Science, University of Waikato, 1999-11)
    Existing clustering methods for the semi-parametric mixture distribution perform well as the volume of data increases. However, they all suffer from a serious drawback in finite-data situations: small outlying groups of ...
  • A compression-based algorithm for Chinese word segmentation

    Teahan, W.J.; Wen, Yingying; McNab, Rodger J.; Witten, Ian H. (Computer Science, University of Waikato, 1999-09)
    The Chinese language is written without using spaces or other word delimiters. Although a text may be thought of as a corresponding sequence of words, there is considerable ambiguity in the placement of boundaries. ...
  • Pace Regression

    Wang, Yong; Witten, Ian H. (Computer Science, University of Waikato, 1999-09)
    This paper articulates a new method of linear regression, “pace regression”, that addresses many drawbacks of standard regression reported in the literature, particularly the subset selection problem. Pace regression improves ...
  • Weka: Practical machine learning tools and techniques with Java implementations

    Witten, Ian H.; Frank, Eibe; Trigg, Leonard E.; Hall, Mark A.; Holmes, Geoffrey; Cunningham, Sally Jo (1999-08)
    The Waikato Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java class libraries that implement many state-of-the-art machine learning and data mining algorithms. Weka is freely available on the ...
  • Reduced-error pruning with significance tests

    Frank, Eibe; Witten, Ian H. (Computer Science, University of Waikato, 1999-06)
    When building classification models, it is common practice to prune them to counter spurious effects of the training data: this often improves performance and reduces model size. “Reduced-error pruning” is a fast pruning ...
  • The LRU*WWW proxy cache document replacement algorithm

    Chang, Chung-yi; McGregor, Anthony James; Holmes, Geoffrey (1999-06)
    Obtaining good performance from WWW proxy caches is critically dependent on the document replacement policy used by the proxy. This paper validates the work of other authors by reproducing their studies of proxy cache ...
  • A survey of software requirements specification practices in the New Zealand software industry

    Groves, Lindsay; Nickson, Ray; Reeve, Greg; Reeves, Steve; Utting, Mark (Computer Science, University of Waikato, 1999-06)
    We report on the software development techniques used in the New Zealand software industry, paying particular attention to requirements gathering. We surveyed a selection of software companies with a general questionnaire ...
  • Automating iterative tasks with programming by demonstration: a user evaluation

    Paynter, Gordon W.; Witten, Ian H. (1999-05)
    Computer users often face iterative tasks that cannot be automated using the tools and aggregation techniques provided by their application program: they end up performing the iteration by hand, repeating user interface ...
  • Facilitating multiple copy/paste operations

    Apperley, Mark; Baker, Jay; Fletcher, Dale; Rogers, Bill (Computer Science, University of Waikato, 1999-05)
    Copy and paste, or cut and paste, using a clipboard or paste buffer has long been the principal facility provided to users for transferring data between and within GUI applications. We argue that this mechanism can be ...
  • Browsing tree structures

    Apperley, Mark; Spence, Robert; Hodge, Stephen; Chester, Michael (Computer Science, University of Waikato, 1999-05)
    Graphic representations of tree structures are notoriously difficult to create, display, and interpret, particularly when the volume of information they contain, and hence the number of nodes, is large. The problem of ...
  • Feature selection for discrete and numeric class machine learning

    Hall, Mark A. (Computer Science, University of Waikato, 1999-04)
    Algorithms for feature selection fall into two broad categories: wrappers use the learning algorithm itself to evaluate the usefulness of features, while filters evaluate features according to heuristics based on general ...
  • A diagnostic tool for tree based supervised classification learning algorithms

    Holmes, Geoffrey; Trigg, Leonard E. (1999-03)
    The process of developing applications of machine learning and data mining that employ supervised classification algorithms includes the important step of knowledge verification. Interpretable output is presented to a user ...
  • Generating rule sets from model trees

    Holmes, Geoffrey; Hall, Mark A.; Frank, Eibe (1999-03)
    Knowledge discovered in a database must be represented in a form that is easy to understand. Small, easy to interpret nuggets of knowledge from data are one requirement and the ability to induce them from a variety of data ...
  • Lexical attraction for text compression

    Bach, Joscha; Witten, Ian H. (Computer Science, University of Waikato, 1999-01)
    New methods of acquiring structural information in text documents may support better compression by identifying an appropriate prediction context for each symbol. The method of “lexical attraction” infers syntactic dependency ...
