
dc.contributor.author  Witten, Ian H.
dc.contributor.author  Nevill-Manning, Craig G.
dc.contributor.author  Cunningham, Sally Jo
dc.date.accessioned  2008-10-21T00:50:13Z
dc.date.available  2008-10-21T00:50:13Z
dc.date.issued  1995-08
dc.identifier.citation  Witten, I. H., Nevill-Manning, C. G. & Cunningham, S. J. (1995) Building a public digital library based on full-text retrieval. (Working paper 95/24). Hamilton, New Zealand: University of Waikato, Department of Computer Science.  en_US
dc.identifier.issn  1170-487X
dc.identifier.uri  https://hdl.handle.net/10289/1101
dc.description.abstract  Digital libraries are expensive to create and maintain, and generally restricted to a particular corporation or group of paying subscribers. While many indexes to the World Wide Web are freely available, the quality of what is indexed is extremely uneven. The digital analog of a public library (a reliable, quality, community service) has yet to appear. This paper demonstrates the feasibility of a cost-effective collection of high-quality public-domain information, available free over the Internet. One obstacle to the creation of a digital library is the difficulty of providing formal cataloguing information. Without a title, author and subject database it seems hard to offer the searching facilities normally available in physical libraries. Full-text retrieval provides a way of approximating these services without a concomitant investment of resources. A second obstacle is the problem of finding a suitable corpus of material. Computer science research reports form the focus of our prototype implementation. These constitute a large body of high-quality public-domain documents. Given such a corpus, a third issue becomes the question of obtaining both plain text for indexing, and page images for readability. Typesetting formats such as PostScript provide some of the benefits of libraries scanned from paper documents, such as page-based indexing and viewing, without the physical demands and error-prone nature of scanning and optical character recognition. However, until recently the difficulty of extracting text from PostScript seems to have encouraged indexing on plain-text abstracts or bibliographic information provided by authors. We have developed a new technique that overcomes the problem. This paper describes the architecture, the indexing, collection and maintenance processes, and the retrieval interface of a prototype public digital library.  en_US
dc.format.mimetype  application/pdf
dc.language.iso  en
dc.publisher  University of Waikato, Department of Computer Science  en_US
dc.relation.ispartofseries  Computer Science Working Papers
dc.subject  computer science  en_US
dc.subject  digital libraries  en_US
dc.subject  information retrieval  en_US
dc.subject  full-text retrieval  en_US
dc.subject  distributed databases  en_US
dc.subject  data compression  en_US
dc.subject  user interfaces  en_US
dc.title  Building a public digital library based on full-text retrieval  en_US
dc.type  Working Paper  en_US
uow.relation.series  95/24

