
dc.contributor.author: Evans, Reuben James Emmanuel
dc.contributor.author: Pfahringer, Bernhard
dc.contributor.author: Holmes, Geoffrey
dc.coverage.spatial: Conference held at Kuching, Sarawak, Malaysia [en_NZ]
dc.date.accessioned: 2012-02-08T02:07:20Z
dc.date.available: 2012-02-08T02:07:20Z
dc.date.issued: 2011
dc.identifier.citation: Evans, R. & Pfahringer, B. (2011). Clustering for classification. In Proceedings of 2011 7th International Conference Information Technology in Asia (CITA 11), 12-13 July 2011, Kuching, Sarawak (pp. 1-8). [en_NZ]
dc.identifier.uri: https://hdl.handle.net/10289/6004
dc.description.abstract: Advances in technology have provided industry with an array of devices for collecting data. The frequency and scale of data collection mean that many large datasets are now being generated. To find patterns in these datasets, it would be useful to be able to apply modern methods of classification such as support vector machines. Unfortunately, these methods are computationally expensive (quadratic in the number of data points, in fact), so they cannot be applied directly. This paper proposes a framework whereby a variety of clustering methods can be used to summarise datasets, that is, to reduce them to a smaller but still representative dataset so that advanced methods can be applied. It compares the results of using this framework against using random selection on a large number of classification problems. Results show that clustering prior to classification is beneficial when employing a sophisticated classifier; however, when the classifier is simple, the benefits over random selection are not justified given the added cost of clustering. The results also show that for each dataset it is important to choose a clustering method carefully. [en_NZ]
dc.language.iso: en
dc.publisher: IEEE [en_NZ]
dc.relation.uri: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5998839&abstractAccess=no&userType=inst [en_NZ]
dc.source: 7th International Conference on Information Technology in Asia (CITA) [en_NZ]
dc.subject: computer science [en_NZ]
dc.subject: classification methods [en_NZ]
dc.subject: clustering methods [en_NZ]
dc.subject: random selection [en_NZ]
dc.subject: support vector machines [en_NZ]
dc.subject: Machine learning
dc.title: Clustering for classification [en_NZ]
dc.type: Conference Contribution [en_NZ]
dc.identifier.doi: 10.1109/CITA.2011.5998839 [en_NZ]
dc.relation.isPartOf: Proceedings of the Seventh International Conference on Information Technology in Asia 2011: Emerging Convergences and Singularity of Forms [en_NZ]
pubs.begin-page: 27 [en_NZ]
pubs.elements-id: 20825
pubs.end-page: 34 [en_NZ]
pubs.finish-date: 2011-07-14 [en_NZ]
pubs.place-of-publication: Malaysia [en_NZ]
pubs.start-date: 2011-07-12 [en_NZ]
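
The summarise-then-classify idea described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes plain k-means as the clustering method, clusters each class separately, and uses 1-nearest-neighbour as a stand-in for a more sophisticated classifier such as a support vector machine. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns up to k centroids."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to its group's mean; keep it if the group is empty.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def summarise_by_clustering(X, y, per_class, seed=0):
    """Replace each class by the centroids of its clusters (assumed per-class scheme)."""
    summary = []
    for label in sorted(set(y)):
        members = [x for x, lbl in zip(X, y) if lbl == label]
        summary += [(c, label) for c in kmeans(members, per_class, seed=seed)]
    return summary

def summarise_by_random(X, y, per_class, seed=0):
    """Baseline: keep a random subset of each class instead of clustering."""
    rng = random.Random(seed)
    summary = []
    for label in sorted(set(y)):
        members = [x for x, lbl in zip(X, y) if lbl == label]
        chosen = rng.sample(members, min(per_class, len(members)))
        summary += [(x, label) for x in chosen]
    return summary

def predict_1nn(summary, query):
    """Stand-in classifier: label of the nearest summary point."""
    return min(summary, key=lambda item: sq_dist(item[0], query))[1]
```

Either summary can then be passed to the downstream classifier in place of the full dataset, which is what makes a quadratic-cost method affordable; comparing the two summaries on held-out data mirrors the clustering-versus-random-selection comparison the abstract describes.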

