Clustering for Classification

Evans, Reuben James Emmanuel

Files

thesis.pdf (496.06 KB)

Permanent Link

https://hdl.handle.net/10289/2403

Abstract

Advances in technology have provided industry with an array of devices for collecting data. The frequency and scale of data collection mean that many large datasets are now being generated. To find patterns in these datasets, it would be useful to apply modern classification methods such as support vector machines. Unfortunately, these methods are computationally expensive (quadratic in the number of data points), so they cannot be applied directly to datasets of this size. This thesis proposes a framework in which a variety of clustering methods can be used to summarise datasets, that is, to reduce them to a smaller but still representative dataset to which these advanced methods can be applied. It compares the results of using this framework against random selection on a large number of classification and regression problems. Results show that the clustered datasets are on average fifty percent smaller than the original datasets without loss of classification accuracy, which is significantly better than random selection. They also show that there is no free lunch: for each dataset, it is important to choose the clustering method carefully.
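The abstract describes the summarisation framework only in outline. To illustrate the general idea, here is a minimal sketch that assumes k-means as the clustering method, run separately within each class, with each class's points replaced by its cluster centroids. It also includes a random-selection baseline of the same size for comparison. The function names (`kmeans`, `cluster_summarise`, `random_summarise`) and the `fraction` parameter are hypothetical. The sketch is not necessarily how the thesis summarises data; it uses only the Python standard library.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means; returns k centroids (requires k <= len(points))."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        # Update step: move each centroid to the mean of its group;
        # an empty group keeps its previous centroid.
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def cluster_summarise(X, y, fraction=0.5, seed=0):
    """Hypothetical summariser: replace each class's points with k-means
    centroids, keeping about `fraction` of that class (at least one)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    Xs, ys = [], []
    for label, pts in by_class.items():
        k = max(1, round(fraction * len(pts)))
        for c in kmeans(pts, k, rng=rng):
            Xs.append(c)
            ys.append(label)
    return Xs, ys


def random_summarise(X, y, fraction=0.5, seed=0):
    """Baseline: keep a uniformly random subset of about `fraction` of the data."""
    rng = random.Random(seed)
    m = max(1, round(fraction * len(X)))
    idx = rng.sample(range(len(X)), m)
    return [X[i] for i in idx], [y[i] for i in idx]


# Usage: two well-separated 2-D classes of 100 points each, halved both ways.
gen = random.Random(1)
X = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(100)] + \
    [[gen.gauss(4, 1), gen.gauss(4, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100
Xc, yc = cluster_summarise(X, y, fraction=0.5)
Xr, yr = random_summarise(X, y, fraction=0.5)
print(len(Xc), len(Xr))  # 100 100
```

Clustering within each class keeps every centroid's label unambiguous, which is why this sketch only suits classification. For the regression problems mentioned in the abstract, each centroid would need some other target (for example, the mean target of its cluster); that too is an assumption rather than the thesis's method. Under the simplifying assumption that training cost grows exactly as n², halving the dataset reduces that cost to (n/2)² = n²/4, about a quarter of the original.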

Citation

Evans, R. J. E. (2007). Clustering for Classification (Thesis, Master of Science (MSc)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/2403

Type

Thesis

Date

2007

Publisher

The University of Waikato

Degree

Master of Science (MSc)