Recent Submissions
Publication: A graphical user interface for Boolean query specification (Working Paper, Computer Science, University of Waikato, 1997-12). Jones, Steve; McInnes, Shona.
On-line information repositories commonly provide keyword search facilities via textual query languages based on Boolean logic. However, there is evidence to suggest that the syntactical demands of such languages can lead to user errors and adversely affect the time it takes users to form queries. Users also face difficulties because the semantics of AND and OR in Boolean logic conflict with their everyday English meanings. We suggest that graphical query languages, in particular Venn-like diagrams, can alleviate the problems that users experience when forming Boolean expressions with textual languages. We describe Vquery, a Venn-diagram-based user interface to the New Zealand Digital Library (NZDL). The design of Vquery has been partly motivated by analysis of NZDL usage: we found that few queries contain more than three terms, that use of the intersection operator dominates, and that query refinement is common. A study of the utility of Venn diagrams for query specification indicates that, with little or no training, users can interpret and form Venn-like diagrams which accurately correspond to Boolean expressions. The utility of Vquery is considered and directions for future work are proposed.
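To make the correspondence in the abstract above concrete: each circle of a Venn-like diagram selects the set of documents matching one term, and each region of the diagram corresponds to a Boolean combination of those sets. The sketch below illustrates that mapping; it is not Vquery's implementation, and the postings data and function names are invented for illustration.

```python
# Hypothetical illustration: each query term selects a set of matching
# document identifiers; regions of a Venn diagram map to set operations.

# Toy postings list: term -> set of document ids (invented data).
postings = {
    "digital": {1, 2, 3, 5},
    "library": {2, 3, 4},
    "music":   {3, 5, 6},
}

def venn_region(include, exclude=()):
    """Documents inside every circle in `include` and outside every
    circle in `exclude`: AND over `include`, AND NOT over `exclude`."""
    docs = set.intersection(*(postings[t] for t in include))
    for term in exclude:
        docs -= postings[term]
    return docs

# Shaded centre of a three-circle diagram: digital AND library AND music.
print(venn_region(["digital", "library", "music"]))    # {3}

# An outer region: digital AND library AND NOT music.
print(venn_region(["digital", "library"], ["music"]))  # {2}

# A union of whole circles: digital OR music.
print(postings["digital"] | postings["music"])         # {1, 2, 3, 5, 6}
```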
Publication: Adaptive models of English text (Working Paper, Department of Computer Science, The University of Waikato, 1997-11). Teahan, W.J.; Cleary, John G.
High-quality models of English text, with performance approaching that of humans, are important for many applications including spelling correction, speech recognition, OCR, and encryption. A number of different statistical models of English are compared with each other and with previous estimates from human subjects. It is concluded that the best current models are word-based with part-of-speech tags. Given sufficient training text, they are able to attain performance comparable to humans.

Publication: OZCHI'96 Industry Session: Sixth Australian Conference on Human-Computer Interaction (Working Paper, Computer Science, University of Waikato, 1997-11). Phillips, Chris; McKauge, Janis.
The idea for a specific industry session at OZCHI was first mooted at the 1995 conference in Wollongong, during questions following a session of short papers which happened (serendipitously) to be presented by people from industry. An animated discussion took place, most of which was about how OZCHI could be made more relevant to people in industry, whether working as usability consultants or working within organisations, either as usability professionals or as 'champions of the cause'. The discussion raised more questions than answers: about the format of such a session, about the challenges of attracting industry participation, and about the best way of publishing the results. Although no real solutions were arrived at, it was enough to place an industry session on the agenda for OZCHI'96.

Publication: Effects of re-ordered memory operations on parallelism (Working Paper, Computer Science, University of Waikato, 1997-11). Littin, Richard H.; Cleary, John G.
The performance effect of permitting different memory operations to be re-ordered is examined. The available parallelism is computed using a machine code simulator. A range of possible restrictions on the re-ordering of memory operations is considered, from the purely sequential case where no re-ordering is permitted, to the completely permissive one where memory operations may occur in any order, so that parallelism is restricted only by data dependencies. A general conclusion is drawn that reliably obtaining parallelism beyond 10 instructions per clock will require the ability to re-order all memory instructions. A brief description of a feasible architecture capable of this is given.

Publication: Constraints on parallelism beyond 10 instructions per cycle (Working Paper, Computer Science, University of Waikato, 1997-11). Cleary, John G.; Littin, Richard H.; McWha, David J.A.; Pearson, Murray W.
The problem of extracting instruction-level parallelism at levels of 10 instructions per clock and higher is considered. Two different architectures which use speculation on memory accesses to achieve this level of performance are reviewed. It is pointed out that while this form of speculation gives high potential parallelism, it is necessary to retain execution state so that incorrect speculation can be detected and subsequently squashed. Simulation results show that the space to store such state is a critical resource in obtaining good speedup. To make good use of the space it is essential that state be stored efficiently and retired as soon as possible. A number of techniques for extracting the best usage from the available state storage are introduced.

Publication: Correcting English text using PPM models (Working Paper, Computer Science, University of Waikato, 1997-11). Teahan, W.J.; Inglis, Stuart J.; Cleary, John G.; Holmes, Geoffrey.
An essential component of many applications in natural language processing is a language modeler able to correct errors in the text being processed. For optical character recognition (OCR), poor scanning quality or extraneous pixels in the image may cause one or more characters to be mis-recognized, while for spelling correction, two characters may be transposed, or a character may be inadvertently inserted or missed out. This paper describes a method for correcting English text using a PPM model. A method that segments words in English text is introduced and shown to be a significant improvement over previously used methods. A similar technique is also applied as a post-processing stage after pages have been recognized by a state-of-the-art commercial OCR system. We show that the accuracy of the OCR system can be increased from 95.9% to 96.6%, a decrease of about 10 errors per page.
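The correction strategy in the PPM abstract above, scoring candidate readings by how cheaply a character-level model encodes them, can be sketched roughly as follows. A character-bigram model with add-one smoothing stands in for a real PPM model and its escape mechanism, and the corpus and candidate strings are invented for illustration.

```python
import math
from collections import Counter

# Toy stand-in for a PPM model: a character-bigram model with add-one
# smoothing, trained on an invented corpus.
corpus = "the quick brown fox jumps over the lazy dog " * 50
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bits(text):
    """Approximate cost in bits to encode `text` under the bigram model."""
    cost = 0.0
    for a, b in zip(text, text[1:]):
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams))
        cost += -math.log2(p)
    return cost

def correct(candidates):
    """Pick the candidate reading that the model encodes most cheaply."""
    return min(candidates, key=bits)

# OCR-style confusion between 'l' and '1' yields two candidate readings.
print(correct(["the 1azy dog", "the lazy dog"]))  # "the lazy dog"
```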
Publication: Musical image compression (Working Paper, Computer Science, University of Waikato, 1997-11). Bainbridge, David; Inglis, Stuart J.
Optical music recognition aims to convert the vast repositories of sheet music in the world into an on-line digital format [Bai97]. In the near future it will be possible to assimilate music into digital libraries, and users will be able to perform searches based on a sung melody in addition to typical text-based searching [MSW+96]. An important requirement for such a system is the ability to reproduce the original score as accurately as possible. Due to the huge amount of sheet music available, the efficient storage of musical images is an important topic of study. This paper investigates whether the "knowledge" extracted during the optical music recognition (OMR) process can be exploited to gain higher compression than the JBIG international standard for bi-level image compression. We present a hybrid approach in which the primitive shapes of music extracted by the optical music recognition process (note heads, note stems, staff lines and so forth) are fed into a graphical-symbol-based compression scheme originally designed for images containing mainly printed text. Using this hybrid approach the average compression rate for a single page is improved by 3.5% over JBIG. When multiple pages with similar typography are processed in sequence, the file size is decreased by 4-8%. Section 2 presents the relevant background to both optical music recognition and textual image compression. Section 3 describes the experiments performed on 66 test images, outlining the combinations of parameters that were examined to give the best results. The initial results and refinements are presented in Section 4, and we conclude in the last section by summarizing the findings of this work.

Publication: Tag based models of English text (Working Paper, 1997-11). Teahan, W.J.; Cleary, John G.
The problem of compressing English text is important both because of the ubiquity of English as a target for compression and because of the light that compression can shed on the structure of English. English text is examined in conjunction with additional information about the part of speech of each word in the text (these are referred to as "tags"). It is shown that the tags plus the text can be compressed more than the text alone: essentially, the tags can be compressed for nothing, or even a small net saving in size. A comparison is made of a number of different ways of integrating compression of tags and text using an escape mechanism similar to PPM. These are also compared with standard word-based and character-based compression programs. The result is that the tag-based character and word schemes always outperform the character-based schemes, and overall the tag-based schemes outperform the word-based schemes. We conclude by conjecturing that tags chosen for compression rather than linguistic purposes would perform even better.

Publication: Fast convergence with a greedy tag-phrase dictionary (Working Paper, 1997-11). Peeters, Ross; Smith, Tony C.
The best general-purpose compression schemes make their gains by estimating a probability distribution over all possible next symbols, given the context established by some number of previous symbols. Such context models typically obtain good compression results for plain text by taking advantage of regularities in character sequences. Frequent words and syllables can be incorporated into the model quickly and thereafter used for reasonably accurate prediction. However, the precise context in which frequent patterns emerge is often extremely varied, and each new word or phrase immediately introduces new contexts which can adversely affect the compression rate. A great deal of the structural regularity in a natural language comes rather more from properties of its grammar than from the orthographic transcription of its phonology. This implies that access to a grammatical abstraction might lead to good compression. While grammatical models have been used successfully for compressing computer programs [4], grammar-based compression of plain text has received little attention, primarily because of the difficulties associated with constructing a suitable natural language grammar. But even without a precise formulation of the syntax of a language, there is a linguistic abstraction which is easily accessed, and which exhibits a high degree of regularity that can be exploited for compression purposes: namely, lexical categories.
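The two tag-based abstracts above both depend on finite-context prediction with a PPM-style escape to a lower-order model. The sketch below shows that prediction scheme in miniature for part-of-speech tags; the training sequence is invented, and the escape estimate is a crude simplification rather than any of the schemes the papers evaluate.

```python
from collections import Counter, defaultdict

# Predict the next part-of-speech tag from the previous one, escaping
# to an order-0 model when the context gives no help (PPM-style idea,
# heavily simplified). Training tags are invented for illustration.
training_tags = ["DET", "NOUN", "VERB", "DET", "ADJ",
                 "NOUN", "VERB", "DET", "NOUN"]

order1 = defaultdict(Counter)
for prev, cur in zip(training_tags, training_tags[1:]):
    order1[prev][cur] += 1
order0 = Counter(training_tags)

def probability(prev_tag, tag):
    """P(tag | prev_tag), reserving one escape count for unseen tags."""
    context = order1[prev_tag]
    total = sum(context.values())
    if tag in context:
        return context[tag] / (total + 1)       # predicted directly
    escape = 1 / (total + 1) if total else 1.0  # fall back to order 0
    return escape * order0[tag] / sum(order0.values())

print(probability("DET", "NOUN"))  # high: DET is usually followed by NOUN
print(probability("DET", "VERB"))  # low: reached only via the escape
```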
Publication: Inducing cost-sensitive trees via instance-weighting (Working Paper, Computer Science, University of Waikato, 1997-09). Ting, Kai Ming.
This paper introduces an instance-weighting method to induce cost-sensitive trees. It is a generalization of the standard tree induction process in which only the initial instance weights determine the type of tree to be induced (i.e., minimum-error trees or minimum-cost trees). We demonstrate that it can easily be adapted to an existing tree learning algorithm. Previous research provided insufficient evidence that the greedy divide-and-conquer algorithm can effectively induce a truly cost-sensitive tree directly from the training data; we provide this empirical evidence in this paper. The algorithm employing the instance-weighting method is found to be comparable to or better than both C4.5 and C5 in terms of total misclassification costs, tree size and the number of high-cost errors. The instance-weighting method is also simpler and more effective in implementation than a method based on altered priors.
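The instance-weighting idea in the preceding abstract, reweighting the training data so that a standard tree inducer minimises expected cost rather than error count, can be illustrated as follows. The proportional weighting formula and the data are illustrative assumptions, not the paper's exact formulation.

```python
# Give each instance a weight proportional to the cost of misclassifying
# its class, normalised so the total weight equals the number of instances.
# A weight-aware tree inducer (one that sums weights wherever it would
# count instances) then grows a minimum-cost rather than minimum-error tree.

def instance_weights(labels, cost_of_error):
    """labels: class per instance; cost_of_error: class -> cost of
    misclassifying that class (an assumed, simplified cost model)."""
    raw = [cost_of_error[y] for y in labels]
    scale = len(labels) / sum(raw)
    return [w * scale for w in raw]

labels = ["sick", "healthy", "healthy", "healthy", "sick"]
costs = {"sick": 5.0, "healthy": 1.0}  # missing a sick case is 5x worse

print(instance_weights(labels, costs))
# approx [1.92, 0.38, 0.38, 0.38, 1.92]: sick cases dominate splitting
```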
Publication: Usability testing: a Malaysian study (Working Paper, Computer Science, University of Waikato, 1997-07). Yeo, Alvin; Barbour, Robert H.; Apperley, Mark.
An exploratory study of software assessment techniques conducted in Malaysia is reported. Subjects in the study comprised staff members of a Malaysian university with a high Information Technology (IT) presence. The subjects assessed a spreadsheet tool with a Bahasa Melayu (Malaysia's national language) interface. The software evaluation techniques used include the think-aloud method, interviews and the System Usability Scale. The responses gathered by the various techniques are reported, and initial results indicate idiosyncratic behaviour of the Malaysian subjects. The implications of the findings are also discussed.

Publication: Language use in software (Working Paper, 1997-07). Yeo, Alvin; Barbour, Robert H.
Much of the popular software we use today is in English, and very few software applications are available in minority languages. Beyond economic goals, we justify why software should be made available to smaller cultures. Furthermore, there is evidence that people learn and progress faster with software in their mother tongue (Griffiths et al., 1994; Krock, 1996). We hypothesise that experienced users of an English spreadsheet can easily migrate to a spreadsheet in their native tongue, in this case Bahasa Melayu (Malaysia's national language). Observations made in the study suggest that the native speakers of Bahasa Melayu had difficulties with the Bahasa Melayu interface; the subjects' main difficulty was their unfamiliarity with computing terminology in Bahasa Melayu. We present possible strategies to increase the use of Bahasa Melayu in IT. These strategies may also be used to promote the use of other minority languages in IT.

Publication: Strategies of internationalisation and localisation: a postmodernist's perspective (Working Paper, Department of Computer Science, University of Waikato, 1997-07). Barbour, Robert H.; Yeo, Alvin.
Many software companies today are developing software not only for local consumption but for the rest of the world. We introduce the concepts of internationalisation and localisation and discuss some techniques using these processes. An examination of postmodern critique with respect to the software industry is also reported. In addition, we present our proposed internationalisation technique, which was inspired by the work of postmodern philosophers and mathematicians. As illustrated in our prototype, the technique empowers non-programmers to localise their own software. Further development of the technique and its implications for user interfaces and the future of software internationalisation and localisation are discussed.

Publication: Localising a spreadsheet: an Iban example (Working Paper, 1997-07). Yeo, Alvin; Barbour, Robert H.
At present, there is little localisation of software to smaller cultures when it is not economically viable. We believe software should also be localised to the languages of small cultures in order to sustain and preserve those cultures. As an example, we localised a spreadsheet from English to Iban. The process by which we carried out the localisation can be used as a framework for the localisation of software to the languages of small ethnic minorities. Some problems faced during the localisation process are also discussed.

Publication: Internationalising a spreadsheet for Pacific Basin languages (Working Paper, Department of Computer Science, University of Waikato, 1997-07). Barbour, Robert H.; Yeo, Alvin.
As people trade and engage in commerce, an economically dominant culture tends to migrate its language into other recently contacted cultures. Information technology (IT) can accelerate enculturation and promote the expansion of western hegemony in IT. Equally, IT can present a culturally appropriate interface to the user that promotes the preservation of culture and language with very little additional effort. In this paper a spreadsheet is internationalised to accept languages from the Latin-1 character set, such as English, Maori and Bahasa Melayu (Malaysia's national language). A technique that allows a non-programmer to add a new language to the spreadsheet is described. The technique could also be used to internationalise other software at the point of design by following the steps we outline.
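The internationalisation papers above all turn on letting a non-programmer add a language without touching code. One common way of achieving that, externalising every user-visible string into per-language tables a translator can edit, is sketched below. The catalogue format, keys and translations are invented for illustration and are not the authors' actual technique.

```python
# Externalised message catalogue: user-visible strings live in editable
# per-language tables, so adding a language requires no code changes.
# All keys and translations below are invented examples.
CATALOGUES = {
    "en": {"menu.file": "File", "menu.edit": "Edit", "cmd.sum": "SUM"},
    "ms": {"menu.file": "Fail", "menu.edit": "Edit", "cmd.sum": "JUMLAH"},
}

def message(key, lang):
    """Look up a UI string, falling back to English if untranslated."""
    return CATALOGUES.get(lang, {}).get(key, CATALOGUES["en"][key])

# A translator adds a language by supplying one more table, no code change:
CATALOGUES["mi"] = {"menu.file": "Kōnae"}

print(message("menu.file", "ms"))  # Fail
print(message("menu.file", "mi"))  # Kōnae
print(message("cmd.sum", "mi"))    # SUM (falls back to English)
```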
Publication: Proceedings of the INTERACT97 Combined Workshop on CSCW in HCI-Worldwide (Working Paper, Department of Computer Science, University of Waikato, 1997-07). Rauterberg, Matthias; Oestreicher, Lars; Grundy, John C.
These are the proceedings of the INTERACT97 combined workshop on "CSCW in HCI-worldwide". The position papers in these proceedings were selected from topics relating to HCI community development worldwide and to CSCW issues. Originally these were to be two separate INTERACT workshops, but they were combined to ensure sufficient participation. The combined workshop was split into two sessions run on the morning of July 15th in Sydney, Australia: one to discuss the issues raised by the position papers focusing on general CSCW systems, the other to discuss the development of HCI communities in a worldwide context. The CSCW session uses as a case study a proposed groupware tool for facilitating the development of an HCI database with a worldwide geographical distribution. The HCI community session focuses on developing the content for such a database, in order for it to foster the continued development of HCI communities. The afternoon session of the combined workshop involves a joint discussion of the case-study groupware tool, in terms of its content and likely groupware facilities. The position papers have been grouped into those focusing on HCI communities, and hence content issues for a groupware database, and those focusing on CSCW and groupware issues, and hence the groupware support likely in the proposed HCI database and collaboration tools. We hope you find that the position papers offer a wide range of interesting reports on HCI community development worldwide and on leading CSCW systems research, and that a groupware tool supporting a worldwide HCI database can draw upon the varied work reported.

Publication: Information seeking, retrieval, reading and storing behaviour of library users (Working Paper, Computer Science, University of Waikato, 1997-06). Turner, Kristine.
In the interest of digital libraries, it is advisable that designers be aware of the potential behaviour of the users of such systems. Two distinct areas are investigated: users' interaction with traditional libraries, involving the seeking and retrieval of relevant material, and the reading and storage behaviours that ensue. The findings of this analysis could be incorporated into digital library facilities. There has been a copious amount of research on information seeking, leading to the development of behavioural models to describe the process; often, research on the information-seeking practices of individuals is based on the task and field of study. The information-seeking model presented by Ellis et al. (1993) frames this study, where it is used to compare various research on the information-seeking practices of groups of people, from academics to professionals. It is found that, although researchers do make use of library facilities, they tend to rely heavily on their own collections and primarily use the library as a source for previously identified information, browsing and interloan. Significant differences in user behaviour were found between the groups analysed. For the reading and storage of material it was hard to draw conclusions, owing to the lack of substantial research and information on the topic; however, through the use of reading strategies, a general idea of how readers behave can be developed. Designers of digital libraries can benefit from the guidelines presented here to better understand their audience.

Publication: Learning from batched data: model combination vs data combination (Working Paper, Department of Computer Science, University of Waikato, 1997-05). Ting, Kai Ming; Low, Boon Toh; Witten, Ian H.
When presented with multiple batches of data, one can either combine them into a single batch before applying a machine learning procedure, or learn from each batch independently and combine the resulting models. The former procedure, data combination, is straightforward; this paper investigates the latter, model combination. Given an appropriate combination method, one might expect model combination to prove superior when the data in each batch were obtained under somewhat different conditions, or when different learning algorithms were used on the batches. Empirical results show that model combination often outperforms data combination even when the batches are drawn randomly from a single source of data and the same learning method is used on each. Moreover, this is not just an artifact of one particular method of combining models: it occurs with several different combination methods. We relate this phenomenon to the learning curve of the classifiers being used. Early in the learning process, when the learning curve is steep, there is much to gain from data combination; later, when it becomes shallow, there is less to gain, and model combination achieves a greater reduction in variance and hence a lower error rate. The practical implication of these results is that one should consider using model combination rather than data combination, especially when multiple batches of data for the same task are readily available. It is often superior even when the batches are drawn randomly from a single sample, and we expect its advantage to increase if genuine statistical differences between the batches exist.
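The batched-data comparison in the preceding abstract is easy to reproduce in miniature: train one classifier per batch and average their predicted class probabilities (one of several possible combination methods), versus pooling the batches and training once. The dataset, the choice of decision trees, and the averaging rule here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Three batches drawn randomly from one synthetic source, plus a test set.
X, y = make_classification(n_samples=1200, n_features=10, random_state=0)
X_test, y_test = X[900:], y[900:]
batches = [(X[i:i + 300], y[i:i + 300]) for i in (0, 300, 600)]

# Model combination: one tree per batch, average their class probabilities.
models = [DecisionTreeClassifier(random_state=0).fit(Xb, yb)
          for Xb, yb in batches]
avg = np.mean([m.predict_proba(X_test) for m in models], axis=0)
model_acc = np.mean(avg.argmax(axis=1) == y_test)

# Data combination: pool the batches, then train a single tree.
X_pool = np.vstack([Xb for Xb, _ in batches])
y_pool = np.concatenate([yb for _, yb in batches])
pooled = DecisionTreeClassifier(random_state=0).fit(X_pool, y_pool)
data_acc = np.mean(pooled.predict(X_test) == y_test)

print(f"model combination: {model_acc:.3f}  data combination: {data_acc:.3f}")
```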
Publication: Discovering inter-attribute relationships (Working Paper, 1997-04). Holmes, Geoffrey.
It is important to discover relationships between the attributes used to predict a class attribute in supervised learning situations, for two reasons. First, any such relationship will be potentially interesting to the provider of a dataset in its own right. Second, removing these relationships from datasets ahead of learning would simplify a learning algorithm's search space, along with the related problems of irrelevant features and subset selection. An algorithm to discover such relationships is presented in this paper. The algorithm is described, and a surprising number of inter-attribute relationships are discovered in datasets from the University of California at Irvine (UCI) repository.

Publication: Using model trees for classification (Working Paper, 1997-04). Frank, Eibe; Wang, Yong; Inglis, Stuart J.; Holmes, Geoffrey; Witten, Ian H.
Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be applied to classification problems by employing a standard method of transforming a classification problem into a problem of function approximation. Surprisingly, using this simple transformation, the model tree inducer M5', based on Quinlan's M5, generates more accurate classifiers than the state-of-the-art decision tree learner C5.0, particularly when most of the attributes are numeric.
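The transformation the final abstract relies on, recasting classification as function approximation, is the standard one-regressor-per-class scheme: fit each regressor to the 0/1 class-membership indicator and predict the class whose regressor scores highest. In the sketch below, ordinary linear regression stands in for M5' model trees, purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

X, y = load_iris(return_X_y=True)

# One regressor per class, each approximating the 0/1 membership function.
# (In the paper this role is played by M5' model trees.)
regressors = [LinearRegression().fit(X, (y == cls).astype(float))
              for cls in np.unique(y)]

# Classify by taking the class whose membership estimate is largest.
scores = np.column_stack([r.predict(X) for r in regressors])
predictions = scores.argmax(axis=1)
print(f"training accuracy: {np.mean(predictions == y):.3f}")
```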