Learning structure from sequences, with applications in a digital library

Witten, Ian H.

doi:10.1007/3-540-36169-3_6

Authors

Witten, Ian H.

Permanent Link

https://hdl.handle.net/10289/1348

DOI

10.1007/3-540-36169-3_6

Abstract

The services that digital libraries provide to users can be greatly enhanced by automatically gleaning certain kinds of information from the full text of the documents they contain. This paper reviews some recent work that applies novel techniques of machine learning (broadly interpreted) to extract information from plain text, and puts it in the context of digital library applications. We describe three areas: hierarchical phrase browsing, including efficient methods for inferring a phrase hierarchy from a large corpus of text; text mining using adaptive compression techniques, giving a new approach to generic entity extraction, word segmentation, and acronym extraction; and keyphrase extraction.

Citation

Witten, I.H. (2002). Learning structure from sequences, with applications in a digital library. In Algorithmic Learning Theory. Lecture Notes in Computer Science, Volume 2533, pp. 42-56.

Type

Conference Contribution

Date

2002

Publisher

Springer
