Publication:
Providing pin-point page-level precision to 1 trillion tokens of text for workset creation

Abstract

We report on work undertaken to develop a web environment that allows users to search over 1 trillion tokens of text, down to the page level, in the HathiTrust Part-of-Speech Extracted Features Dataset to help produce worksets for scholarly analysis. We present an extended example of the web environment in use, along with details about its implementation.

Citation

Bainbridge, D., Downie, J. S., & Capitanu, B. (2018). Providing pin-point page-level precision to 1 trillion tokens of text for workset creation. In Proceedings of 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) (pp. 407–408). New York, USA: ACM. https://doi.org/10.1145/3197026.3203875

Publisher

ACM