Hunt, L. A., & Basford, K. E. (2016). Comparing classical criteria for selecting intra-class correlated features in Multimix. Computational Statistics & Data Analysis, 103, 350–366. https://doi.org/10.1016/j.csda.2016.05.018
Permanent Research Commons link: https://hdl.handle.net/10289/12951
The mixture approach to clustering requires the user to specify both the number of components to be fitted to the model and the form of the component distributions. In the Multimix class of models, the user also has to decide on the correlation structure to be introduced into the model. The behaviour of some commonly used model selection criteria is investigated when using the finite mixture model to cluster data containing mixed categorical and continuous attributes. The performance of these criteria in selecting both the number of components in the model and the form of the correlation structure amongst the attributes when fitting the Multimix class of models is illustrated using simulated data and a real medical data set. It is found that criteria based on the integrated classification likelihood perform best both in detecting the number of clusters to be fitted to the model and in selecting the form of the component distributions. The performance of the Bayesian information criterion in detecting the correct model depends on the partitioning structure among the attributes, while the Akaike information criterion and the classification likelihood criterion perform less satisfactorily.
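The four criteria compared above can be illustrated with a short sketch. It assumes the common textbook forms, with lower values preferred: AIC = -2 log L + 2d, BIC = -2 log L + d log n, CLC = -2 log L + 2 EN(τ), and ICL-BIC = BIC + 2 EN(τ). Here L is the maximised likelihood, d the number of free parameters, n the number of observations, and EN(τ) the entropy of the posterior membership probabilities τ. The function names and inputs are hypothetical, the exact definitions used in the paper may differ, and fitting the Multimix model itself is not shown.

```python
import math
from typing import Dict, Sequence


def classification_entropy(tau: Sequence[Sequence[float]]) -> float:
    """EN(tau) = -sum_i sum_k tau_ik * log(tau_ik); zero entries contribute 0."""
    return -sum(t * math.log(t) for row in tau for t in row if t > 0.0)


def selection_criteria(loglik: float, d: int,
                       tau: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Return AIC, BIC, CLC and ICL-BIC for one fitted mixture (lower is better).

    loglik: maximised log-likelihood of the fit (assumed input)
    d:      number of free parameters in the fitted model (assumed input)
    tau:    n x g matrix of posterior component-membership probabilities
    """
    n = len(tau)
    en = classification_entropy(tau)
    bic = -2.0 * loglik + d * math.log(n)
    return {
        "AIC": -2.0 * loglik + 2.0 * d,
        "BIC": bic,
        "CLC": -2.0 * loglik + 2.0 * en,
        "ICL-BIC": bic + 2.0 * en,
    }


def best_fit(fits: Dict[str, Dict[str, float]], criterion: str) -> str:
    """Return the label of the candidate fit with the smallest criterion value."""
    return min(fits, key=lambda label: fits[label][criterion])
```

In use, one would compute these values for each candidate model (for example, different numbers of components or different correlation structures) and keep the fit that minimises the chosen criterion. The entropy term is what lets CLC and ICL-BIC penalise fits whose clusters overlap heavily. When every observation is assigned to a single component with probability 1, EN(τ) is zero, so CLC reduces to -2 log L and ICL-BIC reduces to BIC.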
This is an author’s accepted version of an article published in the journal: Computational Statistics & Data Analysis. © 2016 Elsevier.