A study of hierarchical and flat classification of proteins

dc.contributor.author: Zimek, Arthur
dc.contributor.author: Buchwald, Fabian
dc.contributor.author: Frank, Eibe
dc.contributor.author: Kramer, Stefan
dc.coverage.spatial: United States [en_NZ]
dc.date.accessioned: 2009-07-15T02:05:12Z
dc.date.available: 2009-07-15T02:05:12Z
dc.date.issued: 2008
dc.description.abstract: Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area. [en]
dc.format.mimetype: application/pdf
dc.identifier.citation: Zimek, A., Buchwald, F., Frank, E. & Kramer, S. (2008). A study of hierarchical and flat classification of proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 06, Oct, 2008. [en]
dc.identifier.doi: 10.1109/TCBB.2008.104 [en]
dc.identifier.uri: https://hdl.handle.net/10289/2677
dc.language.iso: en
dc.publisher: IEEE Computer Society [en_NZ]
dc.relation.isPartOf: IEEE/ACM Transactions on Computational Biology and Bioinformatics [en_NZ]
dc.relation.uri: http://www2.computer.org/portal/web/csdl/doi/10.1109/TCBB.2008.104 [en]
dc.rights: ©2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. [en]
dc.subject: computer science [en]
dc.subject: classifier design and evaluation [en]
dc.subject: science [en]
dc.subject: biology and genetics [en]
dc.subject: data mining [en]
dc.subject: protein classification [en]
dc.subject: hierarchical classification [en]
dc.subject: multi-class classification [en]
dc.subject: Machine learning
dc.title: A study of hierarchical and flat classification of proteins [en]
dc.type: Journal Article [en]
pubs.begin-page: 563 [en_NZ]
pubs.edition: July-Sept [en_NZ]
pubs.elements-id: 34043
pubs.end-page: 571 [en_NZ]
pubs.issue: 3 [en_NZ]
pubs.volume: 7 [en_NZ]
Files
Original bundle
Name: a study of hierarchical and flat classification of proteins.pdf
Size: 159.83 KB
Format: Adobe Portable Document Format