Scalable Multi-label Classification

dc.contributor.advisor: Pfahringer, Bernhard
dc.contributor.advisor: Holmes, Geoffrey
dc.contributor.author: Read, Jesse
dc.date.accessioned: 2010-10-04T02:20:39Z
dc.date.available: 2010-10-04T02:20:39Z
dc.date.issued: 2010
dc.date.updated: 2010-09-23T23:01:53Z
dc.description.abstract: Multi-label classification is relevant to many domains, such as text, image and other media, and bioinformatics. Researchers have already noticed that correlations exist between labels in multi-label data, and a variety of approaches, drawing inspiration from many spheres of machine learning, have been able to model these correlations. However, real-world data sources are growing ever larger, and the multi-label task is particularly sensitive to this growth because of the complexity associated with multiple labels and the correlations between them. Consequently, many methods do not scale up to large problems. This thesis deals with scalable multi-label classification: methods which exhibit high predictive performance but are also able to scale up to larger problems. The first major contribution is the pruned sets method, which models label correlations directly for high predictive performance, but reduces overfitting and complexity relative to related methods by pruning and subsampling label sets, and can thus scale up to larger datasets. The second major contribution is the classifier chains method, which models correlations with a chain of binary classifiers. The use of binary models allows for scalability to even larger datasets. Pruned sets and classifier chains are robust with respect to both the variety and scale of data that they can deal with, and can be incorporated into other methods. In an ensemble scheme, these methods are able to compete with state-of-the-art methods in terms of predictive performance, as well as scale up to large datasets of hundreds of thousands of training examples. This thesis also puts a special emphasis on multi-label evaluation, introducing a new evaluation measure and studying threshold calibration. With one of the largest and most varied collections of multi-label datasets in the literature, extensive experimental evaluation shows the advantage of these methods in terms of both predictive performance and computational efficiency and scalability.
dc.format.mimetype: application/pdf
dc.identifier.citation: Read, J. (2010). Scalable Multi-label Classification (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/4645 [en]
dc.identifier.uri: https://hdl.handle.net/10289/4645
dc.language.iso: en
dc.publisher: University of Waikato
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: multi-label
dc.subject: scalable methods
dc.subject: classification
dc.title: Scalable Multi-label Classification [en]
dc.type: Thesis
pubs.place-of-publication: Hamilton, New Zealand [en_NZ]
thesis.degree.grantor: University of Waikato
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD) [en_NZ]
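
Illustrative note: the abstract describes the pruned sets method only as "pruning and subsampling label sets". The following is a minimal sketch of how such a transformation might look, not the thesis's implementation. The function name prune_label_sets, the parameters min_count and max_subsets, and the rule for choosing subsets are all assumptions made for illustration. The idea shown is that each distinct label set is treated as a single class, rare label sets are pruned, and an example with a rare label set is reintroduced under its frequent subsets.

```python
from collections import Counter
from itertools import combinations


def prune_label_sets(examples, min_count=2, max_subsets=2):
    """Return (features, label_set) pairs in which every label set is frequent.

    examples: list of (features, iterable_of_labels) pairs.
    min_count, max_subsets: hypothetical parameters for this sketch only.
    """
    data = [(x, frozenset(labels)) for x, labels in examples]
    counts = Counter(labels for _, labels in data)
    # Pruning step: only label sets seen at least min_count times survive.
    frequent = {s for s, c in counts.items() if c >= min_count}

    kept = []
    for x, labels in data:
        if labels in frequent:
            kept.append((x, labels))
            continue
        # Subsampling step (assumed rule): replace a rare label set with
        # its frequent proper subsets, preferring larger, then more common.
        subsets = [
            frozenset(combo)
            for size in range(len(labels) - 1, 0, -1)
            for combo in combinations(sorted(labels), size)
            if frozenset(combo) in frequent
        ]
        subsets.sort(key=lambda s: (-len(s), -counts[s], sorted(s)))
        kept.extend((x, s) for s in subsets[:max_subsets])
    return kept


if __name__ == "__main__":
    toy = [
        ("d1", {"a", "b"}), ("d2", {"a", "b"}),
        ("d3", {"c"}), ("d4", {"c"}),
        ("d5", {"a", "b", "c"}),  # rare set, split into frequent subsets
        ("d6", {"d"}),            # rare set with no frequent subset, dropped
    ]
    for features, label_set in prune_label_sets(toy):
        print(features, sorted(label_set))
```

After a transformation of this kind, a single multi-class learner could be trained with each remaining label set as one class, which keeps the number of classes bounded by the number of frequent label sets.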
Files
Original bundle
Name: thesis.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.07 KB
Format: Item-specific license agreed upon to submission