Providing pin-point page-level precision to 1 trillion tokens of text for workset creation
Bainbridge, D., Downie, J. S., & Capitanu, B. (2018). Providing pin-point page-level precision to 1 trillion tokens of text for workset creation. In Proceedings of 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) (pp. 407–408). New York, USA: ACM. https://doi.org/10.1145/3197026.3203875
Permanent Research Commons link: https://hdl.handle.net/10289/11929
We report on the work undertaken developing a web environment that allows users to search over 1 trillion tokens of text -- down to the page-level -- of the HathiTrust Part-of-Speech Extracted Features Dataset to help produce worksets for scholarly analysis. We present an extended example of the web environment in use, along with details about its implementation.
© 2018 Copyright held by the author(s).