Mining Arbitrarily Large Datasets Using Heuristic k-Nearest Neighbour Search

Wu, Xing; Holmes, Geoffrey; Pfahringer, Bernhard

doi:10.1007/978-3-540-89378-3_35

Mining Arbitrarily Large Datasets Using Heuristic k-Nearest Neighbour Search

Authors

Wu, Xing

Holmes, Geoffrey

Pfahringer, Bernhard

Permanent Link

https://hdl.handle.net/10289/1764

DOI

10.1007/978-3-540-89378-3_35

Abstract

Nearest Neighbour Search (NNS) is one of the top ten data mining algorithms. It is simple and effective but has a time complexity that is the product of the number of instances and the number of dimensions. When the number of dimensions is greater than two there are no known solutions that can guarantee a sublinear retrieval time. This paper describes and evaluates two ways to make NNS efficient for datasets that are arbitrarily large in the number of instances and dimensions. The methods are best described as heuristic as they are neither exact nor approximate. Both stem from recent developments in the field of data stream classification. The first uses Hoeffding Trees, an extension of decision trees to streams and the second is a direct stream extension of NNS. The methods are evaluated in terms of their accuracy and the time taken to find the neighbours. Results show that the methods are competitive with NNS in terms of accuracy but significantly faster.

Citation

Wu, X. ,Pfahringer, B. & Holmes, G. (2008). Mining arbitrarily large datasets using heuristic k-nearest neighbour search. In W. Wobcke & M. Zhang, Proceedings of 21st Australasian Joint Conference on Artificial Intelligence Auckland, New Zealand, December 1-5, 2008(pp. 355-361). Berlin: Springer.

Type

Conference Contribution

Date

2008

Publisher

Springer

Mining Arbitrarily Large Datasets Using Heuristic k-Nearest Neighbour Search

Authors

Permanent Link

DOI

Publisher link

Rights

Abstract

Citation

Type

Series name

Date

Publisher

Degree

Type of thesis

Supervisor