Text categorization and similarity analysis: similarity measure, literature review

Fowke, Michael; Hinze, Annika; Heese, Ralf

Authors

Fowke, Michael

Hinze, Annika

Heese, Ralf

Files

uow-cs-wp-2013-11.pdf (448.09 KB)

Permanent Link

https://hdl.handle.net/10289/8432

Abstract

Document classification and provenance have become an important area of computer science as the amount of digital information grows significantly. Organisations are storing documents on computers rather than in paper form. Software is now required that shows the similarities between documents (i.e. document classification) and points out duplicates and possibly the history of each document (i.e. provenance). Poor organisation is common and leads to problems such as duplicated documents and unclear document histories. There exist a number of software solutions in this area designed to make document organisation as simple as possible. I am doing my project with Pingar, an Auckland-based company that aims to help organise the growing amount of unstructured digital data. This report analyses the existing literature in this area with the aim of determining what already exists and how my project will differ from existing solutions.

Citation

Fowke, M., Hinze, A., & Heese, R. (2013). Text categorization and similarity analysis: similarity measure, literature review. (Working paper 11/2013). Hamilton, New Zealand: University of Waikato, Department of Computer Science.

Type

Working Paper

Series name

Computer Science Working Papers

Date

2013-12

Publisher

University of Waikato, Department of Computer Science
