Compression and full-text indexing for Digital Libraries
Abstract
This chapter has demonstrated the feasibility of full-text indexing of large information bases. The use of modern compression techniques means that there is no space penalty: large document databases can be compressed and indexed in less than a third of the space required by the originals. Surprisingly, there is little or no time penalty either: querying can be faster because less information needs to be read from disk. Simple queries can be answered in a second; more complex ones with more query terms may take a few seconds. One important application is the creation of static databases on CD-ROM, and a 1.5 gigabyte document database can be compressed onto a standard 660 megabyte CD-ROM.
Creating a compressed and indexed document database containing hundreds of thousands of documents and gigabytes of data takes a few hours. Whereas retrieval can be done on ordinary workstations, creation requires a machine with a fair amount of main memory.
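To make the idea of a compressed full-text index concrete, here is a minimal sketch of an inverted index whose posting lists are stored as gaps between document numbers. The abstract does not say which coding scheme the authors use, so the variable-byte code below is an assumption chosen for simplicity. The function names (`vbyte_encode`, `vbyte_decode`, `build_index`, `lookup`) are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def vbyte_encode(n):
    """Encode a non-negative integer as 7-bit groups, low-order group first.

    The high bit is set only on the final byte of each number.
    (Illustrative coding choice; not taken from the paper.)
    """
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by concatenating vbyte_encode outputs."""
    numbers, value, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            # Final byte of this number: add its 7 payload bits and emit.
            numbers.append(value | ((byte & 0x7F) << shift))
            value, shift = 0, 0
        else:
            value |= byte << shift
            shift += 7
    return numbers


def build_index(docs):
    """Map each term to a compressed list of the documents containing it."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs, start=1):
        for term in set(text.lower().split()):
            postings[term].append(doc_id)  # doc_ids arrive in increasing order
    index = {}
    for term, ids in postings.items():
        # Store the first id, then the differences between successive ids.
        # Gaps are small for frequent terms, so they need fewer bytes.
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[term] = b"".join(vbyte_encode(g) for g in gaps)
    return index


def lookup(index, term):
    """Return the document numbers containing term, decoding gaps back to ids."""
    ids, total = [], 0
    for gap in vbyte_decode(index.get(term.lower(), b"")):
        total += gap
        ids.append(total)
    return ids


docs = [
    "compression saves space",
    "indexing needs space",
    "compression and indexing",
]
index = build_index(docs)
print(lookup(index, "compression"))
print(lookup(index, "space"))
```

Answering a query only requires reading and decoding the posting lists of the query terms, which is one way to see why a compressed index can reduce the amount of data read from disk.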
Type
Conference Contribution
Citation
Witten, I.H., Moffat, A. & Bell, T.C. (1995). Compression and full-text indexing for Digital Libraries. SIGOIS Bulletin, 15(1), 11-13.
Date
1995
Publisher
Springer