Providing pin-point page-level precision to 1 trillion tokens of text for workset creation

Abstract
We report on the development of a web environment that allows users to search over 1 trillion tokens of text, down to the page level, in the HathiTrust Part-of-Speech Extracted Features Dataset to help produce worksets for scholarly analysis. We present an extended example of the web environment in use, along with details of its implementation.
Type
Conference Contribution
Citation
Bainbridge, D., Downie, J. S., & Capitanu, B. (2018). Providing pin-point page-level precision to 1 trillion tokens of text for workset creation. In Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) (pp. 407–408). New York, USA: ACM. https://doi.org/10.1145/3197026.3203875
Date
2018
Publisher
ACM
Rights
© 2018 Copyright held by the author(s).