Providing pin-point page-level precision to 1 trillion tokens of text for workset creation

We report on the work undertaken in developing a web environment that allows users to search over 1 trillion tokens of text, down to the page level, in the HathiTrust Part-of-Speech Extracted Features Dataset to help produce worksets for scholarly analysis. We present an extended example of the web environment in use, along with details about its implementation.

Citation

Bainbridge, D., Downie, J. S., & Capitanu, B. (2018). Providing pin-point page-level precision to 1 trillion tokens of text for workset creation. In Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) (pp. 407–408). New York, USA: ACM. https://doi.org/10.1145/3197026.3203875

Type

Conference Contribution

Date

2018

Publisher

ACM
