Scalable Multi-label Classification

Read, Jesse

dc.contributor.advisor	Pfahringer, Bernhard
dc.contributor.advisor	Holmes, Geoffrey
dc.contributor.author	Read, Jesse
dc.date.accessioned	2010-10-04T02:20:39Z
dc.date.available	2010-10-04T02:20:39Z
dc.date.issued	2010
dc.date.updated	2010-09-23T23:01:53Z
dc.description.abstract	Multi-label classification is relevant to many domains, such as text, image and other media, and bioinformatics. Researchers have already noticed that in multi-label data, correlations exist between labels, and a variety of approaches, drawing inspiration from many spheres of machine learning, have been able to model these correlations. However, data sources from the real world are growing ever larger, and the multi-label task is particularly sensitive to this due to the complexity associated with multiple labels and the correlations between them. Consequently, many methods do not scale up to large problems. This thesis deals with scalable multi-label classification: methods which exhibit high predictive performance, but are also able to scale up to larger problems. The first major contribution is the pruned sets method, which is able to model label correlations directly for high predictive performance, but reduces overfitting and complexity relative to related methods by pruning and subsampling label sets, and can thus scale up to larger datasets. The second major contribution is the classifier chains method, which models correlations with a chain of binary classifiers. The use of binary models allows for scalability to even larger datasets. Pruned sets and classifier chains are robust with respect to both the variety and scale of data that they can deal with, and can be incorporated into other methods. In an ensemble scheme, these methods are able to compete with state-of-the-art methods in terms of predictive performance as well as scale up to large datasets of hundreds of thousands of training examples. This thesis also puts a special emphasis on multi-label evaluation, introducing a new evaluation measure and studying threshold calibration.
With one of the largest and most varied collections of multi-label datasets in the literature, extensive experimental evaluation shows the advantage of these methods in terms of both predictive performance and computational efficiency and scalability.
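The pruning and subsampling of label sets mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name `pruned_sets_transform`, the parameters `p` (minimum label-set frequency) and `max_subsets` (how many frequent subsets to reintroduce), and the rule used to rank subsets are all assumptions made for illustration, and the thesis's actual procedure may differ. After a transformation of this kind, each remaining distinct label set would typically be treated as a single class by an ordinary multi-class learner.

```python
from collections import Counter


def pruned_sets_transform(examples, p=2, max_subsets=2):
    """Prune rare label sets and reintroduce their examples via frequent subsets.

    examples: list of (features, frozenset_of_labels) pairs.
    p: hypothetical threshold; label sets seen fewer than p times are pruned.
    max_subsets: hypothetical cap on frequent subsets reintroduced per example.
    """
    counts = Counter(labels for _, labels in examples)
    frequent = {s for s, c in counts.items() if c >= p}

    transformed = []
    for x, labels in examples:
        if labels in frequent:
            # Frequent label set: keep the example unchanged.
            transformed.append((x, labels))
            continue
        # Rare label set: look for frequent, non-empty proper subsets.
        subsets = [s for s in frequent if s and s < labels]
        # Assumed ranking: larger subsets first, then more frequent ones.
        subsets.sort(key=lambda s: (-len(s), -counts[s], sorted(s)))
        for s in subsets[:max_subsets]:
            transformed.append((x, s))
        # An example with no frequent subset is dropped entirely.
    return transformed


# Toy data: document ids stand in for feature vectors.
examples = [
    ("d1", frozenset({"sports"})),
    ("d2", frozenset({"sports"})),
    ("d3", frozenset({"sports", "politics"})),
    ("d4", frozenset({"sports", "politics"})),
    ("d5", frozenset({"sports", "politics", "health"})),
    ("d6", frozenset({"health"})),
]

for x, labels in pruned_sets_transform(examples, p=2, max_subsets=2):
    print(x, sorted(labels))
```

In this toy data, `d5` carries a label set seen only once, so it is replaced by copies carrying the two frequent subsets, while `d6` has no frequent subset and is discarded. Lowering `max_subsets` trades retained label information for a smaller training set.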
dc.format.mimetype	application/pdf
dc.identifier.citation	Read, J. (2010). Scalable Multi-label Classification (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/4645	en
dc.identifier.uri	https://hdl.handle.net/10289/4645
dc.language.iso	en
dc.publisher	University of Waikato
dc.rights	All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject	multi-label
dc.subject	scalable methods
dc.subject	classification
dc.title	Scalable Multi-label Classification	en
dc.type	Thesis
pubs.place-of-publication	Hamilton, New Zealand	en_NZ
thesis.degree.grantor	University of Waikato
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy (PhD)	en_NZ