Text categorization and similarity analysis: implementation and evaluation

dc.contributor.author: Fowke, Michael
dc.contributor.author: Hinze, Annika
dc.contributor.author: Heese, Ralf
dc.date.accessioned: 2014-01-29T03:22:08Z
dc.date.available: 2014-01-29T03:22:08Z
dc.date.issued: 2013-12
dc.description.abstract: This report covers the implementation of software that aims to identify document versions and semantically related documents. This is important due to the increasing amount of digital information. Key criteria were that the software was fast and required limited disk space. Previous research determined that the Simhash algorithm was the most appropriate for this application so this method was implemented. The structure of each component was well defined with the inputs and outputs constant and the result was a software system that can have interchangeable parts if required. [en_NZ]
dc.format.mimetype: application/pdf
dc.identifier.citation: Fowke, M., Hinze, A., & Heese, R. (2013). Text categorization and similarity analysis: implementation and evaluation. (Working paper 10/2013). Hamilton, New Zealand: University of Waikato, Department of Computer Science. [en_NZ]
dc.identifier.issn: 1177-777X
dc.identifier.uri: https://hdl.handle.net/10289/8430
dc.language.iso: en [en_NZ]
dc.publisher: University of Waikato, Department of Computer Science [en_NZ]
dc.relation.ispartofseries: Computer Science Working Papers [en_NZ]
dc.rights: © 2013 Michael Fowke, Annika Hinze, Ralf Heese. [en_NZ]
dc.subject: computer science [en_NZ]
dc.title: Text categorization and similarity analysis: implementation and evaluation [en_NZ]
dc.type: Working Paper [en_NZ]
uow.relation.series: 10/2013 [en_NZ]
Files
Original bundle
Name: uow-cs-wp-2013-10.pdf
Size: 706.7 KB
Format: Adobe Portable Document Format