Scaling up semi-supervised learning: An efficient and effective LLGC variant

Pfahringer, Bernhard; Leschi, Claire; Reutemann, Peter

Abstract

Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately, most of the theoretically well-founded algorithms that have been described in recent years are cubic or worse in the total number of labeled and unlabeled training examples. In this paper we apply modifications to the standard LLGC algorithm to improve efficiency to a point where we can handle datasets with hundreds of thousands of training examples. The modifications are priming of the unlabeled data and, most importantly, sparsification of the similarity matrix. We report promising results on large text classification problems.
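
The sparsification idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard LLGC update F <- alpha*S*F + (1 - alpha)*Y with S = D^{-1/2} W D^{-1/2}, and it assumes sparsification means keeping only each point's k nearest neighbours under an RBF similarity. The names `knn_graph` and `llgc` and the parameters `k`, `alpha`, `sigma` and `n_iter` are hypothetical, and the priming step is omitted because the abstract does not describe it.

```python
import math


def knn_graph(points, k, sigma=1.0):
    """Sparse symmetric similarity graph (assumed form of sparsification).

    An RBF weight is kept between two points only if one of them is among
    the other's k nearest neighbours; all other entries are implicitly zero.
    The neighbour search here is brute force, for illustration only.
    """
    n = len(points)
    weights = [dict() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = math.exp(-d * d / (2.0 * sigma * sigma))
            weights[i][j] = w
            weights[j][i] = w  # keep the graph symmetric
    return weights


def llgc(points, labels, n_classes, k=3, alpha=0.9, n_iter=50, sigma=1.0):
    """Iterative LLGC on a sparse graph.

    labels[i] is a class index, or None for an unlabeled point.
    Returns one predicted class index per point.
    """
    n = len(points)
    weights = knn_graph(points, k, sigma)
    degree = [sum(row.values()) for row in weights]
    # Normalised similarity S = D^{-1/2} W D^{-1/2}, stored row by row.
    s = [
        {j: w / math.sqrt(degree[i] * degree[j]) for j, w in row.items()}
        for i, row in enumerate(weights)
    ]
    # Y holds the known labels as one-hot rows; unlabeled rows are all zero.
    y = [
        [1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
        for i in range(n)
    ]
    f = [row[:] for row in y]
    for _ in range(n_iter):
        # Each sweep touches only the stored edges, not all n*n pairs.
        f = [
            [
                alpha * sum(w * f[j][c] for j, w in s[i].items())
                + (1.0 - alpha) * y[i][c]
                for c in range(n_classes)
            ]
            for i in range(n)
        ]
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]
```

With at most k stored neighbours per point, each propagation sweep costs on the order of n times k per class instead of n squared, which is where the efficiency gain in this sketch comes from.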

Type

Conference Contribution

Citation

Pfahringer, B., Leschi, C. & Reutemann, P. (2007). Scaling up semi-supervised learning: An efficient and effective LLGC variant. In Z.-H. Zhou, H. Li & Q. Yang (Eds.), Proceedings 11th Pacific-Asia Conference, PAKDD 2007, Nanjing, China, May 22-25, 2007 (pp. 236-247). Berlin: Springer.

Date

2007

Publisher

Springer, Berlin
