  • Using Wikipedia for language learning

    Wu, Shaoqun; Witten, Ian H. (2015)
    Differentiating between words like look, see and watch, injury and wound, or broad and wide presents great challenges to language learners because it is the collocates of these words that reveal their different shades of ...
  • Learning English with FLAX apps

    Yu, Alex; Witten, Ian H. (Computing and Information Technology Research and Education New Zealand (CITRENZ), 2015)
    The rise of Mobile Assisted Language Learning has brought a new dimension and dynamic into language classes. Game-like language learning apps have become a particularly effective way to promote self-learning outside classroom ...
  • Second language learning in the context of MOOCs

    Wu, Shaoqun; Fitzgerald, Alannah; Witten, Ian H. (SCITEPRESS, 2014)
    Massive Open Online Courses are becoming popular educational vehicles through which universities reach out to non-traditional audiences. Many enrolees hail from other countries and cultures, and struggle to cope with the ...
  • An open-source toolkit for mining Wikipedia

    Milne, David N.; Witten, Ian H. (Elsevier, 2013)
    The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential ...
  • Constructing a focused taxonomy from a document collection

    Medelyan, Olena; Manion, Steve; Broekstra, Jeen; Divoli, Anna; Huang, Anna-Lan; Witten, Ian H. (Springer-Verlag Berlin Heidelberg, 2013)
    We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, ...
  • Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures

    Medelyan, Olena; Witten, Ian H.; Divoli, Anna; Broekstra, Jeen (Wiley, 2013)
    Abstract, structured, representations of knowledge such as lexicons, taxonomies, and ontologies have proven to be powerful resources not only for the systematization of knowledge in general, but to support practical ...
  • Realistic electronic books

    Liesaputra, Veronica; Witten, Ian H. (Elsevier, 2012)
    We describe a software book model that emulates a range of properties associated with physical books—analog page turning, visual location cues, bookmarks and annotations—and, furthermore, incorporates many advantages of ...
  • Learning a concept-based document similarity measure

    Huang, Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (Wiley, 2012)
    Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface ...
  • Semantic document representation: Do It with Wikification

    Witten, Ian H. (Springer-Verlag, 2012)
    Wikipedia is a goldmine of information. Each article describes a single concept, and together they constitute a vast investment of manual effort and judgment. Wikification is the process of automatically augmenting a ...
  • Can we avoid high coupling?

    Taube-Schock, Craig; Walker, Robert J.; Witten, Ian H. (Springer-Verlag, 2011)
    It is considered good software design practice to organize source code into modules and to favour within-module connections (cohesion) over between-module connections (coupling), leading to the oft-repeated maxim "low ...
  • A link-based visual search engine for Wikipedia

    Milne, David N.; Witten, Ian H. (ACM, 2011)
    This paper introduces Hōpara, a new search engine that aims to make Wikipedia easier to explore. It works on top of the encyclopedia's existing link structure, abstracting away from document content and allowing users to ...
  • Exploring Wikipedia with Hōpara

    Milne, David N.; Witten, Ian H. (ACM, 2011)
    Anyone who has browsed Wikipedia has likely experienced the feeling of being happily lost, browsing from one interesting topic to the next and encountering information that they would never have searched for explicitly. ...
  • A bookmaker's workbench

    Liesaputra, Veronica; Witten, Ian H. (ACM, 2011)
    We have been developing electronic Realistic Books that combine the natural advantages of electronic documents---full-text search, hyperlinks, animation, multimedia---with those of conventional books---the ambient information ...
  • Perambulating libraries: Demonstrating how a Victorian idea can help OLPC users share books

    Witten, Ian H.; Bainbridge, David (2011)
    In this extended abstract we detail how the open source digital library toolkit Greenstone [5] can help users of the XO laptop, produced by the One Laptop Per Child Foundation, manage and share electronic documents. The ...
  • Supporting collocation learning with a digital library

    Wu, Shaoqun; Franken, Margaret; Witten, Ian H. (Taylor & Francis Group, 2010)
    Extensive knowledge of collocations is a key factor that distinguishes learners from fluent native speakers. Such knowledge is difficult to acquire simply because there is so much of it. This paper describes a system that ...
  • Utilizing lexical data from a Web-derived corpus to expand productive collocation knowledge

    Wu, Shaoqun; Witten, Ian H.; Franken, Margaret (European Association for Computer Assisted Language Learning, 2010)
    Collocations are of great importance for second language learners, and a learner’s knowledge of them plays a key role in producing language fluently (Nation, 2001: 323). In this article we describe and evaluate an innovative ...
  • WEKA - Experiences with a Java open-source project

    Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H. (Microtome Publishing, 2010)
    WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the ...
  • Subject metadata support powered by Maui

    Medelyan, Olena; Perrone, Vye; Witten, Ian H. (ACM, 2010)
    Selecting subject headings and keywords is a chore for all metadata editors, who often leave these fields blank or incomplete—even when there are no guidelines and any word or phrase can be chosen. For example, tags are ...
  • Experiences with the Greenstone digital library software for international development

    Nichols, David M.; Rose, John; Bainbridge, David; Witten, Ian H. (2010)
    Greenstone is a versatile open source multilingual digital library environment, emerging from research on text compression within the New Zealand Digital Library Research Project in the Department of Computer Science at ...
  • Computer graphics techniques for modeling page turning

    Liesaputra, Veronica; Witten, Ian H. (Springer, 2009)
    Turning the page is a mechanical part of the cognitive act of reading that we do literally unthinkingly. Interest in realistic book models for digital libraries and other online documents is growing. Yet, actually producing ...
  • Human-competitive tagging using automatic keyphrase extraction

    Medelyan, Olena; Frank, Eibe; Witten, Ian H. (Association for Computational Linguistics, 2009)
    This paper connects two research areas: automatic tagging on the web and statistical keyphrase extraction. First, we analyze the quality of tags in a collaboratively created folksonomy using traditional evaluation techniques. ...
  • Classification

    Witten, Ian H. (Springer, 2009)
    In classification learning, an algorithm is presented with a set of classified examples or “instances” from which it is expected to infer a way of classifying unseen instances into one of several “classes”. Instances ...
  • Clustering documents with active learning using Wikipedia

    Huang, Anna; Witten, Ian H.; Frank, Eibe; Milne, David N. (IEEE Computer Society, 2009)
    Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge ...
  • Creating and reading realistic electronic books

    Liesaputra, Veronica; Witten, Ian H.; Bainbridge, David (IEEE Press, 2009)
    A digital library project aims to combine the look and feel of physical books with the advantages of online documents such as hyperlinks and multimedia. A lightweight open source implementation enables highly responsive ...
  • Mining meaning from Wikipedia

    Medelyan, Olena; Milne, David N.; Legg, Catherine; Witten, Ian H. (Elsevier, 2009)
    Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of ...
  • Clustering documents using a Wikipedia-based concept representation

    Huang, Anna; Witten, Ian H.; Frank, Eibe; Milne, David N. (Springer, 2009)
    This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to ...
  • Searching in a Book

    Liesaputra, Veronica; Witten, Ian H.; Bainbridge, David (Springer, 2009)
    Information has no value unless it is accessible. With physical books, most people rely on the table of contents and subject index to find what they want. But what if they are reading a book in a digital library and have ...
  • Refining the use of the web (and web search) as a language teaching and learning resource

    Wu, Shaoqun; Franken, Margaret; Witten, Ian H. (Routledge, 2009)
    The web is a potentially useful corpus for language study because it provides examples of language that are contextualized and authentic, and is large and easily searchable. However, web contents are heterogeneous in the ...
  • Stress-testing general purpose digital library software

    Bainbridge, David; Witten, Ian H.; Boddie, Stefan J.; Thompson, John (Springer, 2009)
    DSpace, Fedora, and Greenstone are three widely used open source digital library systems. In this paper we report on scalability tests performed on these tools by ourselves and others. These range from repositories populated ...
  • Mining meaning from Wikipedia

    Medelyan, Olena; Legg, Catherine; Milne, David N.; Witten, Ian H. (University of Waikato, Department of Computer Science, 2008-09)
    Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of ...
  • Domain-independent automatic keyphrase indexing with small training sets

    Medelyan, Olena; Witten, Ian H. (Wiley InterScience, 2008-03)
    Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist ...
  • Running Greenstone on an iPod

    Bainbridge, David; Jones, Steve; McIntosh, Samuel John; Jones, Matt; Witten, Ian H. (ACM, 2008)
    The open source digital library software Greenstone is demonstrated running on an iPod. The standalone configuration supports browsing, searching and displaying documents in a range of media formats. Plugged in to a host ...
  • Seeking information in realistic books: a user study

    Liesaputra, Veronica; Witten, Ian H. (ACM, 2008)
    There are opposing views on whether readers gain any advantage from using a computer model of a 3D physical book. There is enough evidence, both anecdotal and from formal user studies, to suggest that the usual HTML or PDF ...
  • One-Class Classification by Combining Density and Class Probability Estimation

    Hempstalk, Kathryn; Frank, Eibe; Witten, Ian H. (Springer, Berlin, 2008)
    One-class classification has important applications such as outlier and novelty detection. It is commonly tackled using density estimation techniques or by adapting a standard classification algorithm to the problem of ...
  • Beyond the Client-Server Model: Self-contained to Portable Digital Libraries

    Bainbridge, David; Jones, Steve; McIntosh, Samuel John; Witten, Ian H.; Jones, Matt (Springer, 2008)
    We have created an experimental prototype that enhances an ordinary personal media player by adding digital library capabilities. It does not enable access to a remote digital library from a user’s PDA; rather, it runs a ...
  • Portable digital libraries on an iPod

    Bainbridge, David; Jones, Steve; McIntosh, Samuel John; Jones, Matt; Witten, Ian H. (ACM, 2008)
    This paper describes the facilities we built to run a self-contained digital library on an iPod. The digital library software used was the open source package Greenstone, and the paper highlights the technical problems ...
  • Learning to link with Wikipedia

    Milne, David N.; Witten, Ian H. (ACM, 2008)
    This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique ...
  • Topic indexing with Wikipedia

    Medelyan, Olena; Witten, Ian H.; Milne, David N. (AAAI Press, 2008)
    Wikipedia article names can be utilized as a controlled vocabulary for identifying the main topics in a document. Wikipedia’s 2M articles cover the terminology of nearly any document collection, which permits controlled ...
  • Searching ... in a Web

    Witten, Ian H. (Institut fuer Informationssysteme und Computer Medien, 2008)
    Search engines—“web dragons”—are the portals through which we access society’s treasure trove of information. They do not publish the algorithms they use to sort and filter information, yet what they do and how they do it ...
  • A User-Oriented Approach to Scheduling Collection Building in Greenstone

    Osborn, Wendy; Bainbridge, David; Witten, Ian H. (Springer, 2008)
    We propose a user-oriented approach for the automated and scheduled maintenance of Greenstone digital library collections. Existing systems require the user either to add new data manually to a collection, or to have ...
  • A competitive environment for exploratory query expansion

    Milne, David N.; Nichols, David M.; Witten, Ian H. (ACM, 2008)
    Most information workers query digital libraries many times a day. Yet people have little opportunity to hone their skills in a controlled environment, or compare their performance with others in an objective way. Conversely, ...
  • A Fedora librarian interface

    Bainbridge, David; Witten, Ian H. (ACM, 2008)
    The Fedora content management system embodies a powerful and flexible digital object model. This paper describes a new open-source software front-end that enables end-user librarians to transfer documents and metadata in ...
  • An effective, low-cost measure of semantic relatedness obtained from Wikipedia links

    Witten, Ian H.; Milne, David N. (AAAI Press, 2008)
    This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique ...
  • The Development and Usage of the Greenstone Digital Library Software

    Witten, Ian H. (ASIS&T, 2008)
    The Greenstone software has helped spread the practical impact of digital library technology throughout the world, particularly in developing countries. This article reviews the project’s origins, usage, and the development ...
  • Semantics in Greenstone

    Hinze, Annika; Buchanan, George; Bainbridge, David; Witten, Ian H. (Springer Berlin Heidelberg, 2008)
    This chapter illustrates the impact on a well-known digital library system, Greenstone, when it is moved from fixed modules and simple metadata-based structures to open semantic digital library modules. This change has ...
  • Computer graphics techniques for modeling page turning

    Liesaputra, Veronica; Witten, Ian H. (University of Waikato, Department of Computer Science, 2007-10-24)
    Turning the page is a mechanical part of the cognitive act of reading that we do literally unthinkingly. Interest in realistic book models for digital libraries and other online documents is growing. Yet actually producing ...
  • Extracting corpus specific knowledge bases from Wikipedia

    Milne, David N.; Witten, Ian H.; Nichols, David M. (University of Waikato, Department of Computer Science, 2007-06-01)
    Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. ...
  • A retrospective look at Greenstone: Lessons from the first decade

    Witten, Ian H.; Bainbridge, David (ACM, 2007)
    The Greenstone Digital Library Software has helped spread the practical impact of digital library technology throughout the world, with particular emphasis on developing countries. As Greenstone enters its second decade, ...
  • Lightweight realistic books: The Greenstone connection

    Liesaputra, Veronica; Witten, Ian H.; Bainbridge, David (ACM, 2007)
    Realistic physically-based computer models of page-turning have been around for years, but are rarely deployed in practice except as eye-catching demos. This demo shows a connection from the Greenstone digital library ...
  • A digital library of language learning exercises

    Wu, Shaoqun; Witten, Ian H.; Edwards, Arthur; Nichols, David M.; Aquino, Raúl (Kassel University Press, 2007)
    Recent years have seen widespread adoption of the Internet for language teaching and learning. Interactive systems on the World-Wide Web provide useful alternatives to face-to-face tuition, and both teachers and learners ...
  • A knowledge-based search engine powered by Wikipedia

    Milne, David N.; Witten, Ian H.; Nichols, David M. (ACM, 2007)
    This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it ...
  • Content-Based Language Learning in a Digital Library

    Wu, Shaoqun; Witten, Ian H. (Springer, 2007)
    Digital libraries have untapped potential for supporting language teaching and learning. This paper describes a new scheme for automating topic-specific language learning using a specially built digital library. Three ...
  • Detecting replay attacks in audiovisual identity verification

    Bredin, Herve; Miguel, Antonio; Witten, Ian H.; Chollet, Gerard (IEEE Computer Society, 2006)
    We describe an algorithm that detects a lack of correspondence between speech and lip motion by detecting and monitoring the degree of synchrony between live audio and visual signals. It is simple, effective, and computationally ...
  • Extending Greenstone for Institutional Repositories

    Bainbridge, David; Osborn, Wendy; Witten, Ian H.; Nichols, David M. (Springer, 2006)
    We examine the problem of designing a generalized system for building institutional repositories. Widely used schemes such as DSpace are tailored to a particular set of requirements: fixed metadata set; standard view when ...
  • Document level interoperability for Collection Creators

    Bainbridge, David; Ke, Kaun-Yu; Witten, Ian H. (ACM, 2006)
    Digital library interoperability for both documents and metadata is a critical and complex issue. Although many relevant standards have been developed, and continue to evolve, in practice things are not quite so easy as ...
  • Towards a digital library for language learning

    Wu, Shaoqun; Witten, Ian H. (Springer, 2006)
    Digital libraries have untapped potential for supporting language teaching and learning. Although the Internet at large is widely used for language education, it has critical disadvantages that can be overcome in a more ...
  • Thesaurus based automatic keyphrase indexing

    Medelyan, Olena; Witten, Ian H. (ACM, 2006)
    We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. We evaluate the results against keyphrase sets assigned by ...
  • Measuring inter-indexer consistency using a thesaurus

    Medelyan, Olena; Witten, Ian H. (ACM, 2006)
    When professional indexers independently assign terms to a given document, the term sets generally differ between indexers. Studies of inter-indexer consistency measure the percentage of matching index terms, but none of ...
  • Digital libraries for the developing world

    Witten, Ian H. (ACM, 2006)
    Digital libraries (DLs) are the killer app for information technology in developing countries. Priorities here include health, agriculture, nutrition, hygiene, sanitation, and safe drinking water. Computers are not a ...
  • Mining Domain-Specific Thesauri from Wikipedia: A case study

    Milne, David N.; Medelyan, Olena; Witten, Ian H. (IEEE Computer Society, 2006)
    Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a ...
  • How the dragons work: searching in a web

    Witten, Ian H. (ACM, 2006)
    Search engines -- "web dragons" -- are the portals through which we access society's treasure trove of information. They do not publish the algorithms they use to sort and filter information, yet how they work is one of ...
  • StoneD: A bridge between Greenstone and DSpace

    Witten, Ian H.; Bainbridge, David; Tansley, Robert; Huang, Chi-Yu; Don, Katherine J. (University of Waikato, Department of Computer Science, 2005-04)
    Greenstone and DSpace are widely-used software systems for digital libraries, and prospective users sometimes wonder which one to adopt. In fact, the aims of the two are very different, although their domains of application ...
  • Weka: A machine learning workbench for data mining

    Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard; Witten, Ian H. (Springer, 2005)
    The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, ...
  • Practical digital library interoperability standards

    Bainbridge, David; Witten, Ian H. (ACM, 2005)
    As the field of digital libraries matures and new systems and standards develop, the ability to interoperate between systems becomes paramount. This tutorial gives a practical introduction to many recent standards and de ...
  • A new framework for building digital library collections

    Buchanan, George; Bainbridge, David; Don, Katherine J.; Witten, Ian H. (ACM, 2005)
    This paper introduces a new framework for building digital library collections and contrasts it with existing systems. It describes a significant new step in the development of a widely-used open-source digital library ...
  • Managing personal documents with a digital library

    Jaballah, Imene; Cunningham, Sally Jo; Witten, Ian H. (Springer, 2005)
    This paper presents a desktop system for managing personal documents. The documents can be of many types—text, spreadsheets, images, multimedia—and are organized in a personal “digital library”. The interface supports ...
  • Building digital library collections with Greenstone

    Witten, Ian H.; Bainbridge, David (ACM, 2005)
    This tutorial will demonstrate how to build a variety of different kinds of digital library collections with the Greenstone digital library software, a comprehensive, open-source system for constructing, presenting, and ...
  • Searching digital music libraries

    Bainbridge, David; Dewsnip, Michael; Witten, Ian H. (Elsevier B.V., 2005)
    There has been a recent explosion of interest in digital music libraries. In particular, interactive melody retrieval is a striking example of a search paradigm that differs radically from the standard full-text search. ...
  • Digital libraries and minority languages

    Nichols, David M.; Witten, Ian H.; Keegan, Te Taka Adrian Gregory; Bainbridge, David; Dewsnip, Michael (Taylor & Francis, 2005)
    Digital libraries have a pivotal role to play in the preservation and maintenance of international cultures in general and minority languages in particular. This paper outlines a software tool for building digital libraries ...
  • Creating digital library collections with Greenstone

    Witten, Ian H.; Bainbridge, David (Emerald Group Publishing Limited, 2005)
    The Greenstone digital library software is a comprehensive system for building and distributing digital library collections. It provides a way of organizing information based on metadata and publishing it on the Internet. ...
  • Thesaurus-based index term extraction for agricultural documents

    Medelyan, Olena; Witten, Ian H. (EFITA/WICCA, 2005)
    This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled ...
  • Text mining in a digital library

    Witten, Ian H.; Don, Katherine J.; Dewsnip, Michael; Tablan, Valentin (Springer, 2004)
    Digital libraries strive to add value to the collections they create and maintain. One way is through selectivity: a carefully chosen set of authoritative documents in a particular topic area is far more useful to those ...
  • Applying machine learning to programming by demonstration

    Paynter, Gordon W.; Witten, Ian H.; Koblitz, Neil; Powell, Matthew (Taylor & Francis, 2004)
    ‘Familiar’ is a tool that helps end-users automate iterative tasks in their applications by showing examples of what they want to do. It observes the user’s actions, predicts what they will do next, and then offers to ...
  • Digital library access for illiterate users

    Deo, Shaleen; Nichols, David M.; Cunningham, Sally Jo; Witten, Ian H.; Trujillo, Maria F. (College of Information Technology, 2004)
    The problems that illiteracy poses in accessing information are gaining attention from the research community. Issues currently being explored include developing an understanding of the barriers to information acquisition ...
  • Creating and customizing digital library collections with the Greenstone Librarian Interface

    Witten, Ian H. (University of Tsukuba, 2004)
    The Greenstone digital library software is a comprehensive system for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet. This paper ...
  • Realistic books: A bizarre homage to an obsolete medium?

    Chu, Yi-Chun; Bainbridge, David; Jones, Matt; Witten, Ian H. (ACM, 2004)
    For many readers, handling a physical book is an enjoyably exquisite part of the information seeking process. Many physical characteristics of a book (its size, heft, the patina of use on its pages and so on) communicate ...
  • Adaptive text mining: Inferring structure from sequences

    Witten, Ian H. (Elsevier B.V., 2004)
    Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although ...
  • Greenstone digital library software: current research

    Bainbridge, David; Witten, Ian H. (ACM, 2004)
    The Greenstone digital library software (www.greenstone.org) provides a flexible way of organizing information and publishing it on the Internet or removable media such as CD-ROM. Its aim is to empower users, particularly ...
  • Dynamic digital library construction and configuration

    Bainbridge, David; Don, Katherine J.; Buchanan, George; Witten, Ian H.; Jones, Steve; Jones, Matt; Barr, Malcolm I. (Springer, 2004)
    This paper describes a digital library architecture and implementation that is configurable, extensible and dynamic in the way it presents content and in the services it provides. The design manifests itself as a network ...
  • Digital libraries: developing countries, universal access, and information for all

    Witten, Ian H. (Springer, 2004)
    Digital libraries are large, organized collections of information objects. Well-designed digital library software has the potential to enable non-specialist people to conceive, assemble, build, and disseminate new information ...
  • Digital libraries for creative communities

    Witten, Ian H.; Jones, Matt; Bainbridge, David; Cantlon, Polly; Cunningham, Sally Jo (Taylor & Francis Group, 2004)
    Digital library technologies have a great deal to offer to creative, design communities. They can enable large collections of text, images, music, video and other information objects to be organised and accessed in interesting ...
  • Data mining in bioinformatics using Weka

    Frank, Eibe; Hall, Mark A.; Trigg, Leonard E.; Holmes, Geoffrey; Witten, Ian H. (Oxford University Press, 2004)
    The Weka machine learning workbench provides a general purpose environment for automatic classification, regression, clustering and feature selection (common data mining problems in bioinformatics research). It contains an ...
  • Customizing digital library interfaces with Greenstone

    Witten, Ian H. (IEEE Computer Society, 2003)
    Digital libraries are organized, focused collections of information. They are focused on a particular topic or theme—and good digital libraries will articulate the principles governing what is included. They are organized ...
  • A user evaluation of hierarchical phrase browsing

    Edgar, Katrina D.; Nichols, David M.; Paynter, Gordon W.; Thomson, Kirsten; Witten, Ian H. (Springer, 2003)
    Phrase browsing interfaces based on hierarchies of phrases extracted automatically from document collections offer a useful compromise between automatic full-text searching and manually-created subject indexes. The literature ...
  • How to turn the page

    Chu, Yi-Chun; Witten, Ian H.; Lobb, Richard; Bainbridge, David (IEEE Computer Society, 2003)
    Can digital libraries provide a reading experience that more closely resembles a real book than a scrolled or paginated electronic display? This paper describes a prototype page-turning system that realistically animates ...
  • Managing change in a digital library system with many interface languages

    Bainbridge, David; Edgar, Katrina D.; Witten, Ian H.; McPherson, John R. (Springer Berlin, 2003)
    Managing the organizational and software complexity of a comprehensive open source digital library system presents a significant challenge. The challenge becomes even more imposing when the interface is available in different ...
  • Token identification using HMM and PPM models

    Wen, Yingying; Witten, Ian H.; Wang, Dianhui (Springer, 2003)
    Hidden Markov models (HMMs) and prediction by partial matching models (PPM) have been successfully used in language processing tasks including learning-based token identification. Most of the existing systems are domain- ...
  • Assembling and enriching digital library collections

    Bainbridge, David; Thompson, John; Witten, Ian H. (IEEE Computer Society, 2003)
    People who create digital libraries need to gather together the raw material, add metadata as necessary, and design and build new collections. This paper sets out the requirements for these tasks and describes a new tool ...
  • Explaining cryptographic systems

    Bell, Timothy C.; Thimbleby, Harold W.; Fellows, Mike; Witten, Ian H.; Koblitz, Neil; Powell, Matthew (Elsevier, 2003)
    Modern cryptography can achieve levels of security and authentication that non-specialists find literally incredible. Techniques including information-hiding protocols, zero-knowledge proofs and public key cryptosystems ...
  • Learning structure from sequences, with applications in a digital library

    Witten, Ian H. (Springer, 2002)
    The services that digital libraries provide to users can be greatly enhanced by automatically gleaning certain kinds of information from the full text of the documents they contain. This paper reviews some recent work that ...
  • Examples of practical digital libraries: collections built internationally using Greenstone

    Witten, Ian H. (Springer, 2002)
    Although the field of digital libraries is still young, digital library collections have been built around the world and are being deployed on numerous public web sites. But what is a digital library, exactly? In many ...
  • Delivering the Maori-language newspapers on the Internet

    Apperley, Mark; Keegan, Te Taka Adrian Gregory; Cunningham, Sally Jo; Witten, Ian H. (Auckland University Press, Auckland, New Zealand, 2002)
    Although any collection of historical newspapers provides a particularly rich and valuable record of events and social and political commentary, the content tends to be difficult to access and extremely time-consuming to ...
  • Importing documents and metadata into digital libraries: requirements analysis and an extensible architecture

    Witten, Ian H.; Bainbridge, David; Paynter, Gordon W.; Boddie, Stefan J. (Springer, 2002)
    Flexible digital library systems need to be able to accept, or “import,” documents and metadata in a variety of forms, and associate metadata with the appropriate documents. This paper analyzes the requirements of the ...
  • Modeling for optimal probability prediction

    Wang, Yong; Witten, Ian H. (Morgan Kaufmann Publishers Inc., 2002)
    We present a general modelling method for optimal probability prediction over future observations, in which model dimensionality is determined as a natural by-product. This new method yields several estimators, and we ...
  • Greenstone: Open-source digital library software

    Witten, Ian H.; Bainbridge, David; Boddie, Stefan J. (http://www.dlib.org/dlib/october01/witten/10witten.html, 2001-10)
    The Greenstone digital library software is an open-source system for the construction and presentation of information collections. It builds collections with effective full-text searching and metadata-based browsing ...
  • The promise of digital libraries in developing countries

    Witten, Ian H.; Loots, Michel; Trujillo, Maria F.; Bainbridge, David (ASSOC COMPUTING MACHINERY, 2001-05-01)
    Although knowledge is critical for development, few developing countries are participating in the information revolution. Just as industrialization and globalization have increased the gulf between the haves and have-nots, ...
  • Greenstone: open-source DL software

    Witten, Ian H.; Bainbridge, David; Boddie, Stefan J. (ACM, 2001-05-01)
    Greenstone is a comprehensive system for constructing and presenting collections of thousands or millions of documents, including text, images, audio, and video. Greenstone libraries contain many collections, individually ...
  • Niupepa: A historical newspaper collection

    Apperley, Mark; Cunningham, Sally Jo; Keegan, Te Taka Adrian Gregory; Witten, Ian H. (ASSOC COMPUTING MACHINERY, 2001-05-01)
    Niupepa is a collection of 42 newspaper titles published in New Zealand from 1842-1933, comprising a total of 21,000 pages in 1,750 issues. This collection forms a unique historical record of the language of the indigenous ...
  • Domain-independent programming by demonstration in existing applications

    Paynter, Gordon W.; Witten, Ian H. (Morgan Kaufmann, 2001-02)
    This paper describes Familiar, a domain-independent programming by demonstration system for automating iterative tasks in existing, unmodified applications on a popular commercial platform. Familiar is domain-independent ...
  • Greenstone: Open source digital library software with end-user collection building

    Witten, Ian H.; Bainbridge, David; Boddie, Stefan J. (Emerald Insight, 2001)
    The Greenstone digital library software is an open-source system for the construction and presentation of information collections. Collections built with Greenstone offer effective full-text searching and metadata-based ...

Showing up to 5 theses - most recently added to Research Commons first.

  • Learner Modelling for Individualised Reading in a Second Language

    Walmsley, Michael (University of Waikato, 2015)
    Extensive reading is an effective language learning technique that involves fast reading of large quantities of easy and interesting second language (L2) text. However, graded readers used by beginner learners are expensive ...
  • A Predictive Model for the Parallel Processing of Digital Libraries

    Thompson, John Matthew (University of Waikato, 2015)
    The computing world is facing the problem of a seemingly exponential increase in the amount of raw digital data, and the speed at which it is being collected is eclipsing our ability to manage it manually. Combine this ...
  • Scalable Text Mining with Sparse Generative Models

    Puurula, Antti (University of Waikato, 2015)
    The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks ...
  • Computing the fast Fourier transform on SIMD microprocessors

    Blake, Anthony Martin (University of Waikato, 2012)
    This thesis describes how to compute the fast Fourier transform (FFT) of a power-of-two length signal on single-instruction, multiple-data (SIMD) microprocessors faster than or very close to the speed of state of the art ...
  • Patterns of Change: Can modifiable software have high coupling?

    Taube-Schock, Craig (University of Waikato, 2012)
    There are few aspects of modern life that remain unaffected by software, and as our day-to-day challenges change, so too must our software. Software systems are complex, and as they grow larger and more interconnected, ...
