Schmidberger, G. & Frank, E. (2005). Unsupervised discretization using tree-based density estimation. In A. Jorge et al. (Eds.), Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005 (pp. 240-251). Berlin: Springer.
Permanent Research Commons link: http://hdl.handle.net/10289/1444
This paper presents an unsupervised discretization method that performs density estimation for univariate data. The subintervals produced by the discretization can be used as the bins of a histogram. Histograms are a simple and widely understood means of displaying data, and our method automatically adapts the bin widths to the data. It uses the log-likelihood as the scoring function for selecting cut points and the cross-validated log-likelihood for selecting the number of intervals. We compare this method with equal-width discretization, for which the number of bins is also selected using the cross-validated log-likelihood, and with equal-frequency discretization.
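The abstract describes the method only at a high level, so the Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses the standard histogram density f(x) = n_j / (N * w_j) for a point in bin j (count n_j, width w_j, N training values), whose training log-likelihood is the sum over bins of n_j * log(n_j / (N * w_j)). Assumptions: cut points are added by greedy top-down splitting (a simplification standing in for the paper's tree-based procedure); candidate cuts are midpoints between consecutive distinct values; held-out densities use a pseudo-count of 1 per bin so that empty bins do not produce log(0); the histogram range is fixed to the minimum and maximum of the full sample; and 10-fold cross-validation is used. All function names are hypothetical.

```python
import bisect
import math
import random


def n_log_density(n, width):
    """n * log(n / width), with 0 * log(0) taken as 0. The 1/N factor of the
    density cancels when comparing splits of one interval, so it is omitted."""
    return n * math.log(n / width) if n > 0 else 0.0


def greedy_cut_sequences(train, lo, hi, max_cuts):
    """Assumed greedy top-down splitting: seqs[k] is the sorted list of the
    first k cuts, each chosen for the largest training log-likelihood gain.
    Candidate cuts are midpoints between consecutive distinct values."""
    xs = sorted(train)
    cuts, seqs = [], [[]]
    for _ in range(max_cuts):
        edges = [lo] + cuts + [hi]
        best_gain, best_cut = -math.inf, None
        for a, b in zip(edges, edges[1:]):
            # Values in [a, b); the last interval also keeps values equal to hi.
            start = bisect.bisect_left(xs, a)
            end = len(xs) if b == hi else bisect.bisect_left(xs, b)
            inside = xs[start:end]
            n_i = len(inside)
            for m in range(1, n_i):
                if inside[m - 1] == inside[m]:
                    continue  # no cut between tied values
                c = (inside[m - 1] + inside[m]) / 2
                gain = (n_log_density(m, c - a) + n_log_density(n_i - m, b - c)
                        - n_log_density(n_i, b - a))
                if gain > best_gain:
                    best_gain, best_cut = gain, c
        if best_cut is None:
            break  # every interval holds a single distinct value
        bisect.insort(cuts, best_cut)
        seqs.append(list(cuts))
    return seqs


def equal_width_sequences(train, lo, hi, max_cuts):
    """seqs[k] holds the k cuts of k + 1 equal-width bins (train is unused)."""
    return [[lo + (hi - lo) * i / (k + 1) for i in range(1, k + 1)]
            for k in range(max_cuts + 1)]


def heldout_loglik(train, test, cuts, lo, hi, alpha=1.0):
    """Log-likelihood of test under the histogram built from train, with an
    assumed pseudo-count alpha per bin so empty bins keep a positive density."""
    counts = [0] * (len(cuts) + 1)
    for x in train:
        counts[bisect.bisect_right(cuts, x)] += 1
    edges = [lo] + list(cuts) + [hi]
    total = len(train) + alpha * len(counts)
    loglik = 0.0
    for x in test:
        j = bisect.bisect_right(cuts, x)
        loglik += math.log((counts[j] + alpha) / (total * (edges[j + 1] - edges[j])))
    return loglik


def select_histogram(data, cut_sequences, max_cuts=20, folds=10, seed=0):
    """Pick the number of cuts with the best cross-validated log-likelihood,
    then recompute the cuts on all data. Returns (cuts, per-k CV scores)."""
    lo, hi = min(data), max(data)  # assumption: range fixed from all data
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    scores = [0.0] * (max_cuts + 1)
    for f in range(folds):
        test = shuffled[f::folds]
        train = [x for i, x in enumerate(shuffled) if i % folds != f]
        seqs = cut_sequences(train, lo, hi, max_cuts)
        for k in range(max_cuts + 1):
            # Reuse the last cut list if splitting stopped early.
            cuts = seqs[min(k, len(seqs) - 1)]
            scores[k] += heldout_loglik(train, test, cuts, lo, hi)
    best_k = max(range(max_cuts + 1), key=scores.__getitem__)
    return cut_sequences(data, lo, hi, best_k)[-1], scores


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 0.3) for _ in range(100)]
    for name, seq_fn in [("greedy", greedy_cut_sequences), ("equal-width", equal_width_sequences)]:
        cuts, _ = select_histogram(sample, seq_fn)
        print(name, "intervals:", len(cuts) + 1)
```

Because each greedy run yields nested cut lists, a single run per fold scores every candidate number of intervals at once. The equal-width baseline plugs into the same cross-validation loop, mirroring the comparison described in the abstract.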