A novel two stage scheme utilizing the test set for model selection in text classification

Abstract

Text classification is a natural application domain for semi-supervised learning: labeling documents is expensive, whereas unlabeled documents are usually abundant. We describe a novel, simple two-stage scheme based on dagging that allows the test set to be used in model selection. The dagging ensemble can also be used by itself in place of the original classifier. We evaluate the performance of a meta-classifier that chooses between various base learners and their respective dagging ensembles. The selection process appears to perform robustly, especially when only a small percentage of labels is available for training.
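
For readers unfamiliar with dagging, the following is a minimal sketch of the general technique, not the authors' implementation: the training data is split into disjoint stratified folds, one copy of the base learner is trained on each fold, and the copies are combined by majority vote. The tiny naive Bayes learner is only a stand-in base learner, and all function names and parameters here are hypothetical. The second stage of the scheme, which uses the test set for model selection, is not sketched because the abstract does not specify it.

```python
import math
import random
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Stand-in base learner: multinomial naive Bayes on whitespace tokens.

    Returns a predict(doc) function. Any other text classifier could be used.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = len(labels)

    def predict(doc):
        best_label, best_score = None, -math.inf
        for label in sorted(class_counts):
            n_words = sum(word_counts[label].values())
            score = math.log(class_counts[label] / total)
            for tok in doc.lower().split():
                # Laplace smoothing; the extra +1 reserves mass for unseen words.
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab) + 1)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def disjoint_stratified_folds(labels, n_folds, seed=0):
    """Split example indices into disjoint folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


def train_dagging(docs, labels, n_folds=3, train=train_naive_bayes, seed=0):
    """Train one base learner per disjoint fold; predict by majority vote."""
    folds = disjoint_stratified_folds(labels, n_folds, seed)
    members = [
        train([docs[i] for i in fold], [labels[i] for i in fold])
        for fold in folds
        if fold  # skip empty folds when there are fewer examples than folds
    ]

    def predict(doc):
        votes = Counter(member(doc) for member in members)
        # Most votes wins; ties are broken by label order for determinism.
        return min(votes, key=lambda label: (-votes[label], label))

    return predict
```

Calling `train_dagging(docs, labels)` with a list of document strings and a parallel list of class labels returns a function that classifies a single document. Because each ensemble member sees only its own fold, the number of folds trades the amount of data per member against the number of voters.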

Citation

Pfahringer, B., Reutemann, P., Mayo, M. (2005). A novel two stage scheme utilizing the test set for model selection in text classification. Paper presented at the 18th Australian Joint Conference on Artificial Intelligence, University of Technology, Sydney, Australia, December 5-9, 2005.

Publisher

University of Technology, Sydney
