Smoothing in Probability Estimation Trees

Rights

All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.

Abstract

Classification learning is a type of supervised machine learning technique that uses a classification model (e.g. a decision tree) to predict unknown class labels for previously unseen instances. In many applications it is also useful to obtain class probabilities for the different class labels. Decision trees that yield these probabilities are also called probability estimation trees (PETs). Smoothing is a technique used to improve the probability estimates. Several smoothing methods exist, such as the Laplace correction, M-Estimate smoothing and M-Branch smoothing. Smoothing is not unique to PETs: in the field of text compression, and in prediction by partial matching (PPM) in particular, smoothing methods play an important role. This thesis migrates smoothing methods from text compression to PETs. The newly migrated methods are compared with the best of the existing smoothing methods considered in this thesis under different experimental setups. Unpruned, pruned and bagged trees are considered in the experiments. The main finding is that the PPM-based methods yield the best probability estimates when used with bagged trees, but not when used with individual (pruned or unpruned) trees.
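
As a concrete illustration of the baseline methods named in the abstract, the following is a minimal Python sketch of the Laplace correction and M-Estimate smoothing applied to the class counts at a single tree leaf. The function names, the choice of m, and the example counts are illustrative assumptions, not taken from the thesis.

    def laplace_correction(class_counts):
        # Laplace correction: add 1 to every class count so that no class
        # receives a zero probability estimate at a sparsely populated leaf.
        total = sum(class_counts)
        num_classes = len(class_counts)
        return [(n + 1) / (total + num_classes) for n in class_counts]

    def m_estimate(class_counts, class_priors, m=2.0):
        # M-Estimate smoothing: shrink the leaf's relative frequencies
        # towards the class priors; m controls the strength of the shrinkage.
        # With uniform priors and m equal to the number of classes, this
        # reduces to the Laplace correction.
        total = sum(class_counts)
        return [(n + m * p) / (total + m)
                for n, p in zip(class_counts, class_priors)]

    # Hypothetical leaf with 8 positive and 2 negative training instances:
    print(laplace_correction([8, 2]))           # [0.75, 0.25] instead of the raw [0.8, 0.2]
    print(m_estimate([8, 2], [0.5, 0.5], m=2))  # same here, since the priors are uniform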

Citation

Han, Z. (2011). Smoothing in Probability Estimation Trees (Thesis, Master of Science (MSc)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/5701

Publisher

University of Waikato
