Comparing classical criteria for selecting intra-class correlated features in Multimix

The mixture approach to clustering requires the user to specify both the number of components to be fitted and the form of the component distributions. In the Multimix class of models, the user must also decide on the correlation structure to be introduced into the model. This work investigates the behaviour of some commonly used model selection criteria when the finite mixture model is used to cluster data containing mixed categorical and continuous attributes. The performance of these criteria in selecting both the number of components and the form of the correlation structure amongst the attributes, when fitting the Multimix class of models, is illustrated using simulated data and a real medical data set. Criteria based on the integrated classification likelihood are found to perform best, both in detecting the number of clusters to be fitted and in selecting the form of the component distributions. The ability of the Bayesian information criterion to detect the correct model depends on the partitioning structure among the attributes, while the Akaike information criterion and the classification likelihood criterion perform less satisfactorily.
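The four criteria named in the abstract can be sketched in their commonly used textbook forms, where smaller values are preferred. This is a minimal illustration, not the authors' implementation: `loglik` is the maximised log-likelihood of a fitted mixture, `d` its number of free parameters (in Multimix this would grow with both the number of components and the richness of the assumed correlation structure), `n` the number of observations, and `tau` the matrix of posterior component-membership probabilities. The ICL criterion is shown in its BIC-style approximation (ICL-BIC); the exact variants compared in the paper may differ. The function names and the candidate-model labels are hypothetical.

```python
import math


def entropy(tau):
    """Classification entropy EN(tau) = -sum_i sum_k tau_ik * log(tau_ik).

    Terms with tau_ik == 0 contribute 0 (the convention 0 * log 0 = 0).
    """
    return -sum(t * math.log(t) for row in tau for t in row if t > 0)


def criteria(loglik, d, n, tau):
    """Return AIC, BIC, CLC and ICL-BIC for one fitted mixture (smaller is better)."""
    en = entropy(tau)
    bic = -2.0 * loglik + d * math.log(n)
    return {
        "AIC": -2.0 * loglik + 2.0 * d,       # penalises parameter count only
        "BIC": bic,                           # penalty grows with log(n)
        "CLC": -2.0 * loglik + 2.0 * en,      # penalises overlapping clusters
        "ICL-BIC": bic + 2.0 * en,            # BIC penalty plus entropy penalty
    }


def select(fits, n):
    """fits maps a (hypothetical) model label to (loglik, d, tau).

    Returns, for each criterion, the label of the model with the smallest value.
    """
    scores = {label: criteria(ll, d, n, tau) for label, (ll, d, tau) in fits.items()}
    names = next(iter(scores.values())).keys()
    return {c: min(scores, key=lambda label: scores[label][c]) for c in names}


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    fits = {
        "g=2": (-100.0, 5, [[1.0, 0.0], [0.0, 1.0]]),
        "g=3": (-99.0, 8, [[0.5, 0.5], [0.5, 0.5]]),
    }
    print(select(fits, n=2))
```

Because the entropy term is zero when every observation is assigned to a single component with certainty, CLC and ICL-BIC then reduce to -2 log L and BIC respectively; the entropy penalty only bites when the fitted clusters overlap, which is why these criteria tend to favour well-separated partitions.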

Citation

Hunt, L. A., & Basford, K. E. (2016). Comparing classical criteria for selecting intra-class correlated features in Multimix. Computational Statistics & Data Analysis, 103, 350–366. https://doi.org/10.1016/j.csda.2016.05.018

Type

Journal Article

Date

2016

Publisher

Elsevier
