A study of hierarchical and flat classification of proteins

Zimek, Arthur; Buchwald, Fabian; Frank, Eibe; Kramer, Stefan

doi:10.1109/TCBB.2008.104

Files

a study of hierarchical and flat classification of proteins.pdf (159.83 KB)

Permanent Link

https://hdl.handle.net/10289/2677

DOI

10.1109/TCBB.2008.104

Rights

©2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Abstract

Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.

Citation

Zimek, A., Buchwald, F., Frank, E. & Kramer, S. (2008). A study of hierarchical and flat classification of proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 06, Oct, 2008.

Type

Journal Article

Date

2008

Publisher

IEEE Computer Society
