An MDL estimate of the significance of rules

Abstract
This paper proposes a new method for measuring the performance of models (whether decision trees or sets of rules) inferred by machine learning methods. Inspired by the minimum description length (MDL) philosophy and theoretically rooted in information theory, the new method measures the complexity of test data with respect to the model. It has been evaluated on rule sets produced by several different machine learning schemes on a large number of standard data sets. When compared with the usual percentage correct measure, it is shown to agree with that measure in restricted cases. However, in other, more general cases taken from real data sets (for example, when rule sets make multiple predictions or none) it disagrees substantially. It is argued that the MDL measure is more reasonable in these cases and represents a better way of assessing the significance of a rule set's performance. The question of the complexity of the rule set itself is not addressed in the paper.
Type
Working Paper
Series
Computer Science Working Papers
Citation
Cleary, J. G., Legg, S. & Witten, I. H. (1996). An MDL estimate of the significance of rules. (Working paper 96/03). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
Date
1996-03