Using weighted nearest neighbor to benefit from unlabeled data

The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It uses a weighted nearest neighbor classification algorithm with the combined example sets as a knowledge base. The examples from the unlabeled set are “pre-labeled” by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
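The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which initial classifier, distance measure, value of k, or weighting scheme the paper uses. The nearest-centroid first stage, the Euclidean distance, the weight of 1.0 for labeled examples, and the names `centroid_classifier`, `build_knowledge_base`, `weighted_knn_predict`, and `unlabeled_weight` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def centroid_classifier(X, y):
    """Stage 1 stand-in (assumption): a nearest-centroid classifier
    trained only on the labeled examples."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Return the class whose centroid is closest to x.
        return min(centroids, key=lambda c: math.dist(x, centroids[c]))

    return predict


def build_knowledge_base(X_lab, y_lab, X_unlab, unlabeled_weight=0.3):
    """Combine labeled examples (weight 1.0) with unlabeled examples that
    are pre-labeled by the initial classifier (hypothetical lower weight)."""
    initial = centroid_classifier(X_lab, y_lab)
    base = [(x, label, 1.0) for x, label in zip(X_lab, y_lab)]
    base += [(x, initial(x), unlabeled_weight) for x in X_unlab]
    return base


def weighted_knn_predict(x, base, k=3):
    """Stage 2: each of the k nearest examples votes with its weight."""
    nearest = sorted(base, key=lambda e: math.dist(x, e[0]))[:k]
    votes = defaultdict(float)
    for _, label, weight in nearest:
        votes[label] += weight
    return max(votes, key=votes.get)


# Toy data, invented for illustration only.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = ["a", "b"]
X_unlab = [[0.5, 0.2], [0.1, 0.6], [3.8, 4.1], [4.2, 3.7]]

base = build_knowledge_base(X_lab, y_lab, X_unlab)
print(weighted_knn_predict([0.3, 0.3], base, k=3))
```

In this sketch a single weight is shared by all pre-labeled examples, so a lower `unlabeled_weight` makes the second stage trust the original training data more than the initial classifier's guesses.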

Citation

Driessens, K., Reutemann, P., Pfahringer, B. & Leschi, C. (2006). Using weighted nearest neighbor to benefit from unlabeled data. In W.K. Ng, M. Kitsuregawa & J. Li (Eds.), Proceedings of 10th Pacific-Asia Conference, PAKDD, Singapore, April 9-12, 2006 (pp. 97-106). Berlin: Springer.

Type

Conference Contribution

Date

2006

Publisher

Springer, Berlin
