Naive Bayes for text classification with unbalanced classes
Abstract
Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations [1,2]. In this paper, we present another transformation that is designed to combat a potential problem with the application of MNB to unbalanced datasets. We propose an appropriate correction by adjusting attribute priors. This correction can be implemented as another data normalization step, and we show that it can significantly improve the area under the ROC curve. We also show that the modified version of MNB is very closely related to the simple centroid-based classifier and compare the two methods empirically.
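The abstract does not spell out the correction, so the following is only a minimal sketch of one plausible reading: rescale each class's word counts to the same total before smoothing, so that the smoothing prior carries equal weight in small and large classes. The function names `train_mnb` and `predict`, the parameter `alpha`, and the choice of a per-class total of 1 are assumptions made for illustration and are not taken from the paper.

```python
import math

def train_mnb(docs, labels, vocab, alpha=1.0, normalize=False):
    """Train a multinomial naive Bayes model on tokenized documents.

    normalize=True rescales each class's word counts to sum to 1 before
    smoothing (an assumed form of the correction, not the paper's text).
    """
    classes = sorted(set(labels))
    counts = {c: {w: 0.0 for w in vocab} for c in classes}
    for tokens, c in zip(docs, labels):
        for w in tokens:
            if w in counts[c]:
                counts[c][w] += 1.0

    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_cond = {}
    for c in classes:
        wc = counts[c]
        total = sum(wc.values())
        if normalize and total > 0:
            # Assumption: every class gets the same total count (here 1),
            # so alpha shrinks all classes toward uniform equally.
            wc = {w: n / total for w, n in wc.items()}
            total = 1.0
        denom = total + alpha * len(vocab)
        log_cond[c] = {w: math.log((wc[w] + alpha) / denom) for w in vocab}
    return log_prior, log_cond

def predict(model, tokens):
    """Return the class with the highest posterior log-score."""
    log_prior, log_cond = model

    def score(c):
        return log_prior[c] + sum(log_cond[c][w] for w in tokens if w in log_cond[c])

    return max(log_prior, key=score)

# Toy unbalanced data: one minority document, ten majority documents,
# all with the same word proportions.
docs = [["a", "a", "a", "b"]] * 11
labels = ["min"] + ["maj"] * 10
vocab = ["a", "b"]

raw = train_mnb(docs, labels, vocab)[1]
norm = train_mnb(docs, labels, vocab, normalize=True)[1]
```

In this toy setting both classes have identical word proportions, yet the unnormalized estimates differ because smoothing pulls the small class further toward uniform. After the assumed normalization the two classes receive identical conditional estimates. With a per-class total of 1, `alpha=1.0` smooths heavily, so in practice the smoothing parameter would likely need tuning.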
Type
Conference Contribution
Date
2006-01-01
Publisher
Springer-Verlag Berlin
Rights
This is an author’s accepted version of a conference paper published in the Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases. © 2006 Copyright held by the authors.