2000 - 2009 Working Papers
Permanent URI for this collection: https://hdl.handle.net/10289/17940
Recent Submissions
Item type: Publication, Text categorization using compression models (Department of Computer Science, University of Waikato, 2000) Frank, Eibe; Chui, Chang; Witten, Ian H.
Text categorization, or the assignment of natural language texts to predefined categories based on their content, is of growing importance as the volume of information available on the internet continues to overwhelm us. The use of predefined categories implies a “supervised learning” approach to categorization, where already-classified articles, which effectively define the categories, are used as “training data” to build a model that can be used for classifying new articles that comprise the “test data.” This contrasts with “unsupervised” learning, where there is no training data and clusters of like documents are sought amongst the test articles. With supervised learning, meaningful labels (such as keyphrases) are attached to the training documents, and appropriate labels can be assigned automatically to test documents depending on which category they fall into.

Item type: Publication, A simple approach to ordinal classification (Department of Computer Science, University of Waikato, 2001) Frank, Eibe; Hall, Mark A.
Machine learning methods for classification problems commonly assume that the class values are unordered. However, in many practical applications, the class values do exhibit a natural order, for example when learning how to grade. The standard approach to ordinal classification converts the class value into a numeric quantity and applies a regression learner to the transformed data, translating the output back into a discrete class value in a post-processing step. A disadvantage of this method is that it can only be applied in conjunction with a regression scheme. In this paper, we present a simple method that enables standard classification algorithms to make use of ordering information in class attributes. By applying it in conjunction with a decision tree learner, we show that it outperforms the naive approach, which treats the class values as an unordered set. Compared to special-purpose algorithms for ordinal classification, our method has the advantage that it can be applied without any modification to the underlying learning scheme.

Item type: Publication, Constructing programs or processes (Department of Computer Science, University of Waikato, 2005-12) Reeves, Steve; Streader, David
We define interacting sequential programs, motivated originally by constructivist considerations. We use them to investigate notions of implementation and determinism. Process algebras do not define what can be implemented and what cannot. As we demonstrate, it is problematic to do so on the set of all processes. Guided by constructivist notions, we have constructed interacting sequential programs which we claim can be readily implemented and which form a subset of processes.

Item type: Publication, Toward a theory of music information retrieval queries: System design implications (Department of Computer Science, University of Waikato, 2002) Cunningham, Sally Jo; Downie, J. Stephen
Interest in the development of content-based music information retrieval (MIR) systems is growing rapidly. The MIR research community consists of a multidisciplinary amalgam of librarians, digital librarians, information scientists, computer scientists, musicologists, audio engineers, lawyers and business persons. This multidisciplinary approach has given rise to significant technological advancements in retrieval algorithms, audio interfaces and data representation schemes. Notwithstanding these technological advancements, MIR research is currently a systems-centered research domain. For a variety of reasons (including intellectual property law, limited access to substantial multigenre, multi-format collections, and the lack of a historical user base), MIR research has hitherto been unable to develop and exploit data concerning the nature of real-world user needs and use of music information.

Item type: Publication, Proceedings of the second computing women congress: Student Papers (University of Waikato, Department of Computer Science, 2006-02-11) Hinze, Annika; Jung, Doris; Cunningham, Sally Jo
The CWC 2006 Proceedings contains the following student papers:
• Kathryn Hempstalk: Hiding Behind Corners: Using Edges in Images for Better Steganography
• Supawan Prompramote, Kathy Blashki: Playing to Learn: Enhancing Educational Opportunities using Games Technology
• Judy Bowen: Celebrity Death Match: Formal Methods vs. User-Centred Design
• Liz Bryce: BECOMING INDIGENOUS: an impossible necessity
• Tatiana King: Privacy Issues in Health Care and Security of Statistical Databases
• Nilufar Baghaei: A Collaborative Constraint-based Intelligent System for Learning Object-Oriented Analysis and Design using UML
• Sonja van Kerkhof: Alternatives to stereotypes: some thoughts on issues and an outline of one game

Item type: Publication, Design and formal model of an event-driven and service-oriented architecture for the Mobile Tourist Information System TIP (Department of Computer Science, University of Waikato, 2008) Eschner, Lisa; Hinze, Annika
This thesis introduces a new collaboration framework for context-aware services in a mobile environment, enabling services to co-operate with several anonymous co-operation partners. We extend the current TIP design and architecture so that new services may easily be added to, and co-operate with, existing ones. Obsolete services may be replaced by new ones providing the same functionality. Services are de-coupled, and service co-operation is completely changed: services react to the events they receive, irrespective of the events' publishers. We also show how service-oriented and event-driven architectures may be combined while maintaining their respective advantages. We introduce features of service-oriented architectures to services co-operating via an event-based middleware. We describe the formal model of a new system for mobile tourist information and the newly introduced features of the collaboration framework. Those features fundamentally change the way services communicate and co-operate.

Item type: Publication, Trust-based recommendations for mobile tourists in TIP (Department of Computer Science, The University of Waikato, 2008) Quan, Qiu; Hinze, Annika
Recommender systems aim to suggest to users items they would like. However, concerns about the reliability of information from unknown recommenders influence user acceptance. In this paper, we analyse trust-based recommendations for the tourist information system TIP. We believe that the recommender strategy is closely related to the information domain to which it is applied. Accordingly, the trust-based tourist recommendations we deliver combine peers’ ratings of sights, trust computations and geographical constraints. We create two trust propagation models to spread trust in the TIP community. Three trust-based, location-aware filtering algorithms are implemented. Drawing on research into the feasibility of trust in recommendation, we improve three collaborative filtering algorithms in TIP by introducing the concept of trust.

Item type: Publication, Seven abstraction rules preserving generalised nonblocking (University of Waikato, Department of Computer Science, 2009-09-01) Malik, Robi; Leduc, Ryan
This working paper proposes a compositional approach for verifying the generalised nonblocking property of discrete-event systems. Generalised nonblocking was introduced in [15] to overcome weaknesses of the standard nonblocking check in discrete-event systems and to increase the scope of liveness properties that can be handled. This paper addresses the question of how generalised nonblocking can be verified efficiently. The explicit construction of the complete state space is avoided by first composing and simplifying individual components in ways that preserve generalised nonblocking. The paper extends and generalises previous results about compositional verification of standard nonblocking and lists a new set of computationally feasible abstraction rules for standard and generalised nonblocking.

Item type: Publication, Linear-time graph triples census algorithm under assumptions typical of social networks (University of Waikato, Department of Computer Science, 2009-08-20) McEnnis, Daniel
A graph triples census is a histogram that counts every set of three vertices (called a triple) in a graph according to the pattern of edges among them. Graph triples censuses have been in active use in sociology for over 50 years; the earliest paper using this approach is by Holland and Leinhardt [1]. A census gives a general description of the structure of a directed graph as a fixed-length vector. Since then, this analytic tool has been widely used in social network analysis. A summary of important papers using this approach, both as an end product and as a component of further analysis, is given in [2].

Item type: Publication, MIR task and evaluation techniques (University of Waikato, Department of Computer Science, 2009-08-12) McEnnis, Daniel
Existing tasks in MIREX have traditionally focused on low-level MIR tasks working with flat (usually DSP-only) ground truth. These evaluation techniques, however, cannot evaluate the increasing number of algorithms that utilize relational data, and they do not make use of the state of the art in evaluating ranked or ordered output. This paper summarizes the state of the art in evaluating relational ground truth. These components are then synthesized into novel evaluation techniques, which are applied to 14 concrete music document retrieval tasks, demonstrating how these evaluation techniques can be applied in a practical context.

Item type: Publication, Graph-RAT programming environment (University of Waikato, Department of Computer Science, 2009-08-12) McEnnis, Daniel
Graph-RAT is a new programming environment specializing in relational data mining. It incorporates a number of different techniques into a single framework for data collection, data cleaning, propositionalization, and analysis. The language is functional: algorithms are executed over arbitrary sub-graphs of the data. Analysis can be conducted using collaborative filtering or machine learning techniques. The example algorithms are released under a BSD license.

Item type: Publication, A robust semantics hides fewer errors (University of Waikato, Department of Computer Science, 2009-06-10) Reeves, Steve; Streader, David
In this paper we explore how formal models are interpreted, to what degree meaning is captured in the formal semantics, and to what degree it remains in the informal interpretation of the semantics. By applying a robust approach to the definition of refinement and semantics, favoured by the event-based community, to state-based theory, we are able to move some aspects from the informal interpretation into the formal semantics.

Item type: Publication, Guarded operations, refinement and simulation (University of Waikato, Department of Computer Science, 2009-06-10) Reeves, Steve; Streader, David
Simulation rules have long been used as an effective computational means to decide refinement relations in state-based formalisms. Here we investigate how they might be amended so as to decide the event-based notion of singleton failures refinement of abstract data types or processes that have operations with a "guarded" interpretation. As the results presented here and found elsewhere in the literature are so sensitive to the details of the definitions used, we have machine-checked our results.

Item type: Publication, A semantics and implementation of a causal logic programming language (University of Waikato, Department of Computer Science, 2009-02-11) Cleary, John G.; Utting, Mark; Clayton, Roger
The increasingly widespread availability of multicore and manycore computers demands new programming languages that make parallel programming dramatically easier and less error prone. This paper describes a semantics for a new class of declarative programming languages that support massive amounts of implicit parallelism.

Item type: Publication, Considering reachability when comparing data refinements (University of Waikato, Department of Computer Science, 2008-11-03) Reeves, Steve
This paper adds considerations about reachability to chapter [2] of Logics of Specification Languages [1].

Item type: Publication, Hierarchical document clustering using automatically extracted keyphrases (University of Waikato, Department of Computer Science, 2000-10) Jones, Steve; Mahoui, Malika
In this paper we present a technique for automatically generating hierarchical clusters of documents. Our technique exploits document keyphrases as features of the document space to support clustering. In fact, we cluster keyphrases rather than documents themselves and then associate documents with keyphrase clusters. We discuss alternative measures of similarity between ‘soft-clusters’ which seed Ward’s hierarchical clustering algorithm, and present the resulting cluster hierarchies that we have produced for a large collection of scientific technical reports. We analyse the effect of the alternative similarity measures and suggest improvements to our technique.

Item type: Publication, A comparative transaction log analysis of two computing collections (University of Waikato, Department of Computer Science, 2000-07) Mahoui, Malika; Cunningham, Sally Jo
Transaction logs are invaluable sources of fine-grained information about users’ search behavior. This paper compares the searching behavior of users across two WWW-accessible digital libraries: the New Zealand Digital Library’s Computer Science Technical Reports collection (CSTR), and the Karlsruhe Computer Science Bibliographies (CSBIB) collection. Since the two collections are designed to support the same type of user (researchers and students in computer science), a comparative log analysis is likely to uncover common searching preferences for that user group. The two collections differ in their content, however: the CSTR indexes a full-text collection, while the CSBIB is primarily a bibliographic database. Differences in searching behavior between the two systems may indicate the effect of differing search facilities and content type.

Item type: Publication, µ-Charts and Z: Extending the translation (University of Waikato, Department of Computer Science, 2000-08) Reeve, Greg; Reeves, Steve
This paper describes extensions and modifications to µ-charts as given in earlier papers by Philipps and Scholz. The charts are extended to include a command language, integer-valued signals and local integer variables. The command language is based on the syntax presented in Scholz’ thesis, and the integer-valued signals and local variables are based loosely on Scholz’ earlier work. After presenting the new semantics, we turn to extending the µ-charts-to-Z translation that we developed in previous work. The extensions to the translation process describe both the changes due to the extensions to µ-charts and a modification to the translation method that more fully captures the beneficial modularisation encouraged by the µ-charts formalism. We finish by giving three complete translation examples. The paper should be read as a record of our gradual development of a Z semantics for µ-charts, hence its sometimes exploratory character and laborious explanations as we come to terms (thinking out loud) with the (sometimes very subtle) meaning of µ-charts, especially with regard to pathological and unusual examples of their use.

Item type: Publication, Benchmarking attribute selection techniques for data mining (University of Waikato, Department of Computer Science, 2000-07) Hall, Mark A.; Holmes, Geoffrey
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model-building phase can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the rankings with respect to a learning scheme to find the best attributes. Results are reported for a selection of standard data sets and two learning schemes: C4.5 and naive Bayes.

Item type: Publication, A development environment for predictive modelling in foods (University of Waikato, Department of Computer Science, 2000-07) Holmes, Geoffrey; Hall, Mark A.
WEKA (Waikato Environment for Knowledge Analysis) is a comprehensive suite of Java class libraries that implement many state-of-the-art machine learning/data mining algorithms. Non-programmers interact with the software via a user interface component called the Knowledge Explorer. Applications constructed from the WEKA class libraries can be run on any computer with a web browsing capability, allowing users to apply machine learning techniques to their own data regardless of computer platform. This paper describes the user interface component of the WEKA system with reference to previous applications in the predictive modeling of foods.
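The text categorization paper listed above (Frank, Chui and Witten, 2000) classifies documents with compression models, but the abstract does not say how. As a minimal sketch of the general idea only, and not the paper's own models, the Python below swaps in the standard library's zlib (DEFLATE) as a stand-in compressor: a new document is assigned to the category whose training text lets it be compressed with the fewest extra bytes. The function names and the toy training data are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(document: str, training: dict) -> str:
    """Pick the category whose training text best 'explains' the document.

    Cost of a category = extra compressed bytes needed when the document is
    appended to that category's training text. The cheapest category wins.
    Note: DEFLATE only looks back 32 KB, so with large training texts only
    their tail actually helps compress the document.
    """
    def extra_cost(category_text: str) -> int:
        joined = category_text + "\n" + document
        return compressed_size(joined) - compressed_size(category_text)

    return min(training, key=lambda category: extra_cost(training[category]))


# Hypothetical toy "training data": one concatenated text per category.
training = {
    "animals": "the quick brown fox jumps over the lazy dog. " * 5,
    "latin": "lorem ipsum dolor sit amet consectetur adipiscing elit. " * 5,
}

print(classify("the quick brown fox jumps over the lazy dog", training))
```

A general-purpose compressor like this is only a rough proxy; a model that adapts per category, as the paper's title suggests, would be the natural next step.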
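The ordinal classification paper (Frank and Hall, 2001) describes its method only in outline above. One decomposition commonly associated with it turns a k-valued ordered class V_1 < ... < V_k into k - 1 binary problems "is the class greater than V_i?", trains any probabilistic binary learner on each, and recovers class probabilities by differencing: P(V_1) = 1 - P(>V_1), P(V_i) = P(>V_(i-1)) - P(>V_i), P(V_k) = P(>V_(k-1)). The Python below is a sketch under that assumption; the 3-nearest-neighbour learner is a hypothetical stand-in for the decision tree learner the paper uses, and all names and data are illustrative.

```python
def knn_binary_learner(X, y, k=3):
    """Hypothetical stand-in learner: P(1) = share of positives among k nearest neighbours."""
    def predict_proba(x):
        # (squared distance, label) pairs, closest first
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xi)), yi) for xi, yi in zip(X, y)
        )
        nearest = scored[:k]
        return sum(yi for _, yi in nearest) / len(nearest)
    return predict_proba


def fit_ordinal(X, y, classes, learner=knn_binary_learner):
    """Train one binary model per threshold; model i estimates P(class > classes[i])."""
    rank = {c: i for i, c in enumerate(classes)}
    models = []
    for i in range(len(classes) - 1):
        targets = [1 if rank[label] > i else 0 for label in y]
        models.append(learner(X, targets))

    def predict(x):
        greater = [m(x) for m in models]  # P(class > V_i) for each threshold
        probs = [1 - greater[0]]
        probs += [greater[i - 1] - greater[i] for i in range(1, len(greater))]
        probs.append(greater[-1])
        best = max(range(len(classes)), key=lambda j: probs[j])
        return classes[best]

    return predict


# Toy ordinal data: one numeric feature, three ordered grades.
X = [[1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3
predict = fit_ordinal(X, y, ["low", "medium", "high"])
print(predict([2]), predict([5]), predict([8]))
```

Because every binary model is an ordinary classifier, the underlying learner needs no modification, which matches the advantage the abstract claims. The differenced probabilities can occasionally come out negative; this sketch simply takes the largest.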
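The triples census defined in the McEnnis (2009) abstract above can be made concrete with a brute-force count. The sketch below is a deliberate simplification and an assumption: it treats the graph as undirected and buckets each triple only by how many of its three possible edges are present, whereas the paper works with directed graphs (whose triples fall into 16 types) and a linear-time algorithm. This O(n^3) version just illustrates what the histogram records; the function name is hypothetical.

```python
from itertools import combinations


def undirected_triad_census(vertices, edges):
    """Histogram of vertex triples by how many of their 3 possible edges exist.

    census[m] = number of triples containing exactly m edges (m = 0..3).
    Brute force over all C(n, 3) triples, so O(n^3).
    """
    edge_set = {frozenset(e) for e in edges}
    census = [0, 0, 0, 0]
    for a, b, c in combinations(vertices, 3):
        count = sum(frozenset(pair) in edge_set for pair in ((a, b), (a, c), (b, c)))
        census[count] += 1
    return census


census = undirected_triad_census([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (3, 4)])
print(census)  # [0, 1, 2, 1]
```

The output is the fixed-length vector the abstract mentions: its length does not depend on the size of the graph, which is what makes censuses of different networks comparable.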
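The benchmarking paper (Hall and Holmes, 2000) notes that every attribute selection method it compares produces an attribute ranking. As one hedged example of such a ranker, the sketch below scores nominal attributes by information gain, H(C) - H(C | A), and sorts them from most to least informative. The function names and toy data are hypothetical, and the follow-on step described in the abstract (cross-validating the ranking with a learner such as C4.5 or naive Bayes to decide how many top attributes to keep) is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """H(class) - H(class | attribute) for one nominal attribute."""
    total = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def rank_attributes(rows, labels, names):
    """Return attribute names sorted from most to least informative."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda n: scores[n], reverse=True)


rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
print(rank_attributes(rows, labels, ["outlook", "temp"]))  # ['outlook', 'temp']
```

Scoring each attribute in isolation is cheap, but it cannot detect redundancy between attributes, which is one reason the paper evaluates rankings against an actual learning scheme.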