Scaling up semi-supervised learning: An efficient and effective LLGC variant

Pfahringer, Bernhard; Leschi, Claire; Reutemann, Peter

doi:10.1007/978-3-540-71701-0_25

Authors

Pfahringer, Bernhard

Leschi, Claire

Reutemann, Peter

Permanent Link

https://hdl.handle.net/10289/1433

DOI

10.1007/978-3-540-71701-0_25

Abstract

Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately, most of the theoretically well-founded algorithms described in recent years have a cost that is cubic or worse in the total number of labeled and unlabeled training examples. In this paper we modify the standard LLGC (learning with local and global consistency) algorithm to improve its efficiency to the point where we can handle datasets with hundreds of thousands of training examples. The modifications are priming of the unlabeled data and, most importantly, sparsification of the similarity matrix. We report promising results on large text classification problems.
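The abstract contrasts the cubic cost of standard LLGC with a sparsified similarity matrix. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation. It follows the original LLGC formulation: Gaussian affinities with W_ii = 0, the normalised matrix S = D^{-1/2} W D^{-1/2}, and the update F <- alpha S F + (1 - alpha) Y, whose fixed point is F* = (1 - alpha)(I - alpha S)^{-1} Y. The hypothetical `llgc_dense` solves that closed form directly, which costs O(n^3). The hypothetical `knn_similarity` assumes a symmetric k-nearest-neighbour rule for sparsification (the abstract does not say which rule the paper uses), and `llgc_sparse` runs the iterative update so that each step only touches stored edges. The function names, the k-NN rule and the parameters `k`, `sigma`, `alpha` and `iters` are all illustrative assumptions. Priming of the unlabeled data is not modelled, because the abstract gives no details about it.

```python
import numpy as np


def squared_distances(X):
    """Dense n x n matrix of pairwise squared Euclidean distances."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def llgc_dense(X, Y, sigma, alpha=0.99):
    """Closed-form LLGC, F* = (1 - alpha) (I - alpha S)^{-1} Y: O(n^2) memory, O(n^3) time."""
    n = X.shape[0]
    W = np.exp(-squared_distances(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                        # W_ii = 0
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))                 # S = D^{-1/2} W D^{-1/2}
    F = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)


def knn_similarity(X, k, sigma):
    """Hypothetical sparsification: symmetric k-NN graph stored as edge arrays.

    Returns (rows, cols, vals) with W[rows[e], cols[e]] = vals[e]; roughly
    n * k entries are kept instead of n * n. The neighbour search below still
    builds the dense distance matrix, which is only acceptable for toy data.
    """
    n = X.shape[0]
    dist2 = squared_distances(X)
    np.fill_diagonal(dist2, np.inf)                 # never pick i as its own neighbour
    edges = {}
    for i in range(n):
        for j in np.argpartition(dist2[i], k)[:k]:  # indices of the k nearest points
            w = float(np.exp(-dist2[i, j] / (2.0 * sigma ** 2)))
            edges[(i, int(j))] = w                  # keep the edge if either endpoint
            edges[(int(j), i)] = w                  # chose the other, so W is symmetric
    rows = np.array([i for i, _ in edges], dtype=int)
    cols = np.array([j for _, j in edges], dtype=int)
    vals = np.array(list(edges.values()))
    return rows, cols, vals


def llgc_sparse(rows, cols, vals, Y, alpha=0.99, iters=200):
    """Iterative LLGC, F <- alpha S F + (1 - alpha) Y, using only the stored edges."""
    n = Y.shape[0]
    deg = np.zeros(n)
    np.add.at(deg, rows, vals)                      # D_ii = sum_j W_ij
    s_vals = vals / np.sqrt(deg[rows] * deg[cols])  # nonzero entries of D^{-1/2} W D^{-1/2}
    F = Y.astype(float)
    for _ in range(iters):                          # fixed iteration count, no convergence test
        SF = np.zeros_like(F)
        np.add.at(SF, rows, s_vals[:, None] * F[cols])  # sparse product S @ F
        F = alpha * SF + (1.0 - alpha) * Y
    return F.argmax(axis=1)


if __name__ == "__main__":
    # Two well-separated chains of points, with one labeled point in each.
    X = np.array([[x, 0.0] for x in range(10)] + [[x, 100.0] for x in range(10)])
    Y = np.zeros((20, 2))
    Y[0, 0] = 1.0
    Y[10, 1] = 1.0
    rows, cols, vals = knn_similarity(X, k=2, sigma=1.0)
    print("sparse:", llgc_sparse(rows, cols, vals, Y))
    print("dense: ", llgc_dense(X, Y, sigma=1.0))
```

On this toy data both functions label each chain from its single seed point. The sparse graph keeps only the links along the two chains rather than all 400 pairwise entries, and each iteration of `llgc_sparse` costs time proportional to that edge count. To approach the scale described in the abstract, the dense distance matrix inside `knn_similarity` would also have to be replaced by a neighbour search that avoids all n^2 comparisons. If `sigma` is very small relative to neighbour distances, `exp` can underflow to zero and leave a node with zero degree, which this sketch does not guard against.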

Citation

Pfahringer, B., Leschi, C. & Reutemann, P. (2007). Scaling up semi-supervised learning: An efficient and effective LLGC variant. In Z.-H. Zhou, H. Li & Q. Yang (Eds.), Proceedings 11th Pacific-Asia Conference, PAKDD 2007, Nanjing, China, May 22-25, 2007 (pp. 236-247). Berlin: Springer.

Type

Conference Contribution

Date

2007

Publisher

Springer, Berlin
