1995 Working Papers

Recent Submissions

  • Publication
    Video support for shared work-space interaction – an empirical study
    (Working Paper, Department of Computer Science, The University of Waikato, 1995) Masoodian, Masood; Apperley, Mark; Frederikson, Lesley
    A study has been carried out to identify the effects of different human-to-human communication modes on dyadic computer supported group work. A pilot study evaluated an available shared work-space software system, supplemented by face-to-face, telephone-based, and text-based communication modes between two users. The findings from this study were then used to design an extensive experiment to explore the relative impact of face-to-face, full motion video, slow motion video, and audio only communication modes when used in conjunction with this type of CSCW system. This paper describes the experiments, and examines the findings of this empirical study with the aim of establishing the importance of co-presence in CSCW, and the effectiveness of these various communication modes in achieving it.
  • Publication
    How Maui captured the sun: using a MUD for educational simulation
    (Working Paper, University of Waikato, Department of Computer Science, 1995-12) Cunningham, Sally Jo; Williams, Warren
    MUDs (Multi-User Dungeons) are text-based, multi-user communication and modelling programs. This paper investigates the potential of a popular extensible MUD, the LambdaMOO system, as a tool for second language training and for educational simulation gaming.
  • Publication
    Information retrieval programs on the Internet: tools for teaching IR
    (Working Paper, University of Waikato, Department of Computer Science, 1995-11) Cunningham, Sally Jo
    The theory of information retrieval has generally been taught in theory: it has been difficult to provide students with hands-on experience with retrieval engines incorporating many IR topics such as relevance ranking, fuzzy queries, etc. Recently, however, a number of retrieval programs have become freely available for interactive use over the Internet. These programs can be useful in the classroom, by permitting students to examine a variety of implementations of IR algorithms over different document collections. Moreover, many of the document collections are in themselves valuable subject resources, and are well worth exploring from the point of view of developing familiarity with them as reference materials.
  • Publication
    Character-less programming II: the spreadsheet
    (Working Paper, University of Waikato, Department of Computer Science, 1995-11) Barbour, Robert H.
    The spreadsheet is a commonly used, yet under-researched, application tool. Hendry and Green (1994) report fewer than ten entries about spreadsheets in the HCI community literature between 1984 and 1991. The 1993 Human Computer Interaction Proceedings of a Conference on Applications and Case Studies does not list the word 'spreadsheet' in the index whereas 'databases' and 'wordprocessor' are both referenced. The current situation in 1996 is not significantly different. For Polytechnic and University communities, many of whom are involved in teaching about application software, this situation means that the widespread teaching of the tool is simply not being reported in the research literature. The unanswered but researchable questions raised in Hendry and Green (1994) indicate the breadth of the field. Following a summary of these questions, the even more neglected issue of providing culturally appropriate spreadsheet software is raised. A set of constructs for thinking about multicultural software is presented. A model of a procedure for generating an elementary spreadsheet is provided, exemplified and demonstrated for the Maori Language in New Zealand.
  • Publication
    Towards the digital music library: tune retrieval from acoustic input
    (Working Paper, University of Waikato, Department of Computer Science, 1995-10) McNab, Rodger J.; Smith, Lloyd A.; Witten, Ian H.; Henderson, Clare L.; Cunningham, Sally Jo
    Music is traditionally retrieved by title, composer or subject classification. It is possible, with current technology, to retrieve music from a database on the basis of a few notes sung or hummed into a microphone. This paper describes the implementation of such a system, and discusses several issues pertaining to music retrieval. We first describe an interface that transcribes acoustic input into standard music notation. We then analyze string matching requirements for ranked retrieval of music and present the results of an experiment which tests how accurately people sing well known melodies. The performance of several string matching criteria is analyzed using two folk song databases. Finally, we describe a prototype system which has been developed for retrieval of tunes from acoustic input.
  • Publication
    Teaching novice conceptual data modellers to become experts
    (Working Paper, University of Waikato, Department of Computer Science, 1995-09) Venable, John R.
    This paper describes teaching practices designed to help novice data modellers become expert data modellers. We base these practices on extant empirical research which highlights the strengths of expert data modellers and reveals the weaknesses of novices. After reviewing this research and analysing the causes of the novices' difficulties, we describe the strategy and specific techniques for helping novices to overcome their weaknesses and acquire the strengths and skills of expert data modellers. Techniques recommended include explicit comparison and teaching of novice and expert characteristics and behaviours, providing students with a realistic plan for how to acquire expert data modellers' capabilities, exposure to and comparison of a wide variety of data modelling approaches and topics, extensive amounts of practice on a wide variety of application domains, and critique of practical work in light of the understanding of novice errors and expert behaviours. Our intent is not just to make significant progress during a course, but to provide students with a means to continue to learn and improve in the long term.
  • Publication
    A logic for specifying and reasoning about cooperative environments
    (Working Paper, University of Waikato, Department of Computer Science, 1995-08) Reeves, Steve
    In this paper we describe the current progress of an attempt to develop a logic which will allow us to specify required properties of systems which typically consist of a single interactive program being used, probably simultaneously, by several agents, usually people. The logic is a development of ideas from modal logic and their more recent developments to describe computation. Since modal logic and its extensions are still relatively new to most people we give introductions to these logics in this paper, assuming only a familiarity with classical first-order logic and some proof theory. We also give an account of some of the sorts of situations that we want to specify. Finally, we consider what work will be needed in the future, building on what we present here, in order to achieve our goal of providing a language in which to specify and reason about systems intended to support co-operative working.
  • Publication
    Towards an integrated refinement environment for formal program development
    (Working Paper, University of Waikato, Department of Computer Science, 1995-08) Reeves, Steve; Grundy, John C.
    One of the main hurdles to the general adoption of formal program development techniques is a lack of tools to support their use in combination with more traditional development techniques. This paper describes an integrated environment for software development which embodies the aim of formal program development. Multiple levels of refinement of each specification are supported, with associated proof obligations, each of which can be viewed at various levels of detail throughout the development process. All of these formal views are kept consistent with each other and with more traditional design and implementation views. This allows software developers to specify, design, refine, prove, implement and document their software within a single integrated environment.
  • Publication
    Writing anxiety in computer science students
    (Working Paper, University of Waikato, Department of Computer Science, 1995-08) Cunningham, Sally Jo; Holmes, Geoffrey
    Effective written communication skills are recognized as essential for computing professions, but are notoriously difficult to impart to our students. One problem in teaching computing students to write may be their attitudes toward writing; anecdotally, computing students are (often justifiably) lacking in confidence about their writing skills, and avoid writing when possible. This paper explores the degree of writing anxiety/apprehension in computing majors through the administration of a standard survey instrument, the Daly and Miller Writing Apprehension Test.
  • Publication
    Building a public digital library based on full-text retrieval
    (Working Paper, University of Waikato, Department of Computer Science, 1995-08) Witten, Ian H.; Nevill-Manning, Craig G.; Cunningham, Sally Jo
    Digital libraries are expensive to create and maintain, and generally restricted to a particular corporation or group of paying subscribers. While many indexes to the World Wide Web are freely available, the quality of what is indexed is extremely uneven. The digital analog of a public library (a reliable, quality, community service) has yet to appear. This paper demonstrates the feasibility of a cost-effective collection of high-quality public-domain information, available free over the Internet. One obstacle to the creation of a digital library is the difficulty of providing formal cataloguing information. Without a title, author and subject database it seems hard to offer the searching facilities normally available in physical libraries. Full-text retrieval provides a way of approximating these services without a concomitant investment of resources. A second is the problem of finding a suitable corpus of material. Computer science research reports form the focus of our prototype implementation. These constitute a large body of high-quality public-domain documents. Given such a corpus, a third issue becomes the question of obtaining both plain text for indexing, and page images for readability. Typesetting formats such as PostScript provide some of the benefits of libraries scanned from paper documents, such as page-based indexing and viewing, without the physical demands and error-prone nature of scanning and optical character recognition. However, until recently the difficulty of extracting text from PostScript seems to have encouraged indexing on plain-text abstracts or bibliographic information provided by authors. We have developed a new technique that overcomes the problem. This paper describes the architecture, the indexing, collection and maintenance processes, and the retrieval interface, to a prototype public digital library.
  • Publication
    An investigation into the use of machine learning for determining oestrus in cows
    (Working Paper, University of Waikato, Department of Computer Science, 1995-08) Mitchell, R. Scott; Sherlock, Robert A.; Smith, Lloyd A.
    A preliminary investigation of the application of two well-known machine learning schemes—C4.5 and FOIL—to detection of oestrus in dairy cows has been made. This is a problem of practical economic significance as each missed opportunity for artificial insemination results in 21 days lost milk production. Classifications were made on normalised deviations of milk volume production and milking order time series data. The best learning scheme was C4.5 which was able to detect 69% of oestrus events, albeit with an unacceptably high rate of "false positives" (74%). Several directions for further work and improvements are identified.
  • Publication
    Signal processing for melody transcription
    (Working Paper, University of Waikato, Department of Computer Science, 1995-08) McNab, Rodger J.; Smith, Lloyd A.; Witten, Ian H.
    MT is a melody transcription system that accepts acoustic input, typically sung by the user, and displays it in standard music notation. It tracks the pitch of the input and segments the pitch stream into musical notes, which are labelled by their pitches relative to a reference frequency that adapts to the user's tuning. This paper describes the signal processing operations involved, and discusses two applications that have been prototyped: a sightsinging tutor and a scheme for acoustically indexing a melody database.
  • Publication
    A teaching and support tool for building formal models of graphical user-interfaces
    (Working Paper, University of Waikato, Department of Computer Science, 1995-08) Reeves, Steve
    In this paper we propose the design of a tool that will allow the construction of a formal, textual description of a software system even if it has a graphical user-interface as a component. An important aspect of this design is that it can be used for two purposes—the teaching of predicate calculus and the formal specification of graphical user-interfaces. The design has been suggested by considering a system that has already been very successful for teaching predicate logic, namely Tarski's World.
  • Publication
    Applying machine learning to subject classification and subject description for information retrieval
    (Working Paper, University of Waikato, Department of Computer Science, 1995-06) Cunningham, Sally Jo; Summers, Brent
    This paper describes an experiment in applying standard supervised machine learning algorithms (C4.5 and Induct) to the problem of developing subject classification rules for documents. These algorithms are found to produce surprisingly concise models of document classifications. While the models are highly accurate on the training sets, evaluation over test sets or through cross-validation shows a significant decrease in classification accuracy. Given the difficult nature of the experimental task, however, the results of this investigation are promising and merit further study. An additional algorithm, 1R, is shown to be highly effective in generating lists of candidate terms for subject descriptions.
  • Publication
    The development of Holte's 1R Classifier
    (Working Paper, University of Waikato, Department of Computer Science, 1995-06) Nevill-Manning, Craig G.; Holmes, Geoffrey; Witten, Ian H.
    The 1R procedure for machine learning is a very simple one that proves surprisingly effective on the standard datasets commonly used for evaluation. This paper describes the method and discusses two areas that can be improved: the way that intervals are formed when discretizing continuously-valued attributes, and the way that missing values are treated. Then we show how the algorithm can be extended to avoid a problem endemic to most practical machine learning algorithms—their frequent dismissal of an attribute as irrelevant when in fact it is highly relevant when combined with other attributes.
  • Publication
    Instance-based learning: nearest neighbour with generalisation
    (Working Paper, University of Waikato, Department of Computer Science, 1995-05) Martin, Brent
    Instance-based learning is a machine learning method that classifies new examples by comparing them to those already seen and in memory. There are two types of instance-based learning: nearest neighbour and case-based reasoning. Of these two methods, nearest neighbour fell into disfavour during the 1980s, but regained popularity recently due to its simplicity and ease of implementation. Nearest neighbour learning is not without problems. It is difficult to define a distance function that works well for both discrete and continuous attributes. Noise and irrelevant attributes also pose problems. Finally, the specificity bias adopted by instance-based learning, while often an advantage, can over-represent small rules at the expense of more general concepts, leading to a marked decrease in classification performance for some domains. Generalised exemplars offer a solution. Examples that share the same class are grouped together, and so represent large rules more fully. This reduces the role of the distance function to determining the class when no rule covers the new example, which reduces the number of classification errors that result from inaccuracies of the distance function, and increases the influence of large rules while still representing small ones. This thesis investigates non-nested generalised exemplars as a way of improving the performance of nearest neighbour. The method is tested using benchmark domains and the results compared with documented results for ungeneralised exemplars, nested generalised exemplars, rule induction methods and a composite rule induction and nearest neighbour learner. The benefits of generalisation are isolated and the performance improvement measured. The results show that non-nested generalisation of exemplars improves the classification performance of nearest neighbour systems and reduces classification time.
  • Publication
    Applications for bibliometric research in the emerging digital library
    (Working Paper, University of Waikato, Department of Computer Science, 1995-05) Cunningham, Sally Jo; Vallabh, Mahendra
    A large amount of research literature has recently become available on the Internet through "digital libraries". This migration of information from paper to electronic media promises to have a huge impact on the way that research is performed, as documents become more widely, cheaply, and quickly distributed than is possible through traditional publishing. A secondary use for these document repositories and indexes is as a platform for bibliometric research. We examine the extent to which the new digital libraries support conventional bibliometric analysis, and discuss shortcomings in their current forms. Interestingly, these electronic text archives also provide opportunities for new types of studies: generally the full text of documents is available for analysis, giving a finer grain of insight than abstract-only online databases; these repositories often contain technical reports or pre-prints, the "gray literature" that has been previously unavailable for analysis; and document "usage" can be measured directly by recording user accesses, rather than studied indirectly through document references.
  • Publication
    An empirical investigation of the obsolescence rate for information systems literature
    (Working Paper, University of Waikato, Department of Computer Science, 1995-05) Cunningham, Sally Jo
    A synchronous study has been performed on four years of the International Conference on Information Systems (ICIS) proceedings, to determine the obsolescence rate for the field of information systems (as reflected in the sub-topics covered by the ICIS conference). IS is found to have a relatively high obsolescence rate, similar to that of fields in engineering and the technology-dependent "hard" sciences. In addition, this study provides a categorization of the types of documents referenced by IS research, and presents an analysis of the obsolescence rate for these types. This type of categorization permits a finer-grained examination of patterns of information dissemination and use.
  • Publication
    The application of machine learning techniques to time-series data
    (Working Paper, University of Waikato, Department of Computer Science, 1995-05) Mitchell, R. Scott
    "Knowledge discovery" is one of the most recent and fastest growing fields of research in computer science. It combines techniques from machine learning and database technology to find and extract meaningful knowledge from large, real world databases. Much real world data is temporal in nature, for example stock prices, dairy cow milk production figures or meteorological data. Most current knowledge discovery systems utilise similarity-based machine learning methods ("learning from examples") which are not in general well suited to this type of data. Time-series analysis techniques are used extensively in signal processing and sequence identification applications such as speech recognition, but have not often been considered for knowledge discovery tasks. This report documents new methods for discovering knowledge in real world time-series data. Two complementary approaches were investigated: 1) manipulation of the original dataset into a form that is usable by conventional similarity-based learners; and 2) using sequence identification techniques to learn the concepts embedded in the database. Experimental results obtained from applying both techniques to a large agricultural database are presented and analysed.
  • Publication
    Machine learning in practice: experience with agricultural databases
    (Working Paper, University of Waikato, Department of Computer Science, 1995-05) Garner, Stephen R.; Cunningham, Sally Jo; Holmes, Geoffrey; Nevill-Manning, Craig G.; Witten, Ian H.
    The Waikato Environment for Knowledge Analysis (WEKA) is a New Zealand government-sponsored initiative to investigate the application of machine learning to economically important problems in the agricultural industries. The overall goals are to create a workbench for machine learning, determine the factors that contribute towards its successful application in the agricultural industries, and develop new methods of machine learning and ways of assessing their effectiveness. The project began in 1993 and is currently working towards the fulfilment of three objectives: to design and implement the workbench, to provide case studies of applications of machine learning techniques to problems in agriculture, and to develop a methodology for evaluating generalisations in terms of their entropy. These three objectives are by no means independent. For example, the design of the WEKA workbench has been inspired by the demands placed on it by the case studies, and has also benefited from our work on evaluating the outcomes of applying a technique to data. Our experience throughout the development of the project is that the successful application of machine learning involves much more than merely executing a learning algorithm on some data. In this paper we present the process model that underpins our work over the past two years for the development of applications in agriculture; the software we have developed around our workbench of machine learning schemes to support this model; and the outcomes and problems we have encountered in developing applications.
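
Several abstracts in this list refer to the 1R procedure without spelling it out. For readers unfamiliar with it, the following is a minimal sketch of the basic idea as generally understood: for each attribute, build a rule that maps every attribute value to its most frequent class, then keep the attribute whose rule makes the fewest training errors. It is an illustration only, not the authors' implementation. It assumes nominal attributes, so the interval formation and missing-value handling discussed in the paper are not covered, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute_index, rule, training_errors) for the best one-attribute rule.

    Assumes every attribute is nominal; no discretization or
    missing-value handling is attempted in this sketch.
    """
    best = None
    for a in range(len(rows[0])):
        # Count class frequencies for each value of attribute a.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[a]][label] += 1
        # Each value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Training errors: examples not in the majority class of their value.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


def predict(model, row, default=None):
    """Classify a row with a one_r model; unseen values fall back to default."""
    attribute, rule, _ = model
    return rule.get(row[attribute], default)


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [
    ("sunny", "hot"),
    ("sunny", "mild"),
    ("rainy", "hot"),
    ("rainy", "mild"),
    ("overcast", "hot"),
]
labels = ["no", "no", "yes", "yes", "yes"]

model = one_r(rows, labels)
print(model[0], model[2])              # chosen attribute index and its training errors
print(predict(model, ("sunny", "hot")))
```

On this toy data the first attribute separates the classes without error, so it is the one selected. Ties between attributes are broken in favour of the earlier attribute, which is an arbitrary choice made for this sketch.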