Clustering Relational Data Based on Randomized Propositionalization

Anderson, Grant; Pfahringer, Bernhard

doi:10.1007/978-3-540-78469-2_8

Authors

Anderson, Grant

Pfahringer, Bernhard

Permanent Link

https://hdl.handle.net/10289/1726

DOI

10.1007/978-3-540-78469-2_8

Abstract

Clustering of relational data has so far received far less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows standard clustering algorithms like KMeans to be applied to multi-relational data. We describe how random rules are generated and then turned into Boolean-valued features. Clustering is generally not straightforward to evaluate, but preliminary experiments on a number of standard ILP datasets are promising. Clusters generated without class information usually agree well with the true class labels of cluster members, i.e. class distributions inside clusters generally differ significantly from the global class distributions. The two-tiered algorithm described shows good scalability due to the randomized nature of the first step and the availability of efficient propositional clustering algorithms for the second step.
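The two-tiered idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy molecule/atom data, the form of the random rules ("some atom has element E and charge at least t"), and the small hand-written k-means are all hypothetical stand-ins. Step one turns each relational example into a row of Boolean features, one per random rule; step two clusters those rows with an ordinary propositional algorithm.

```python
import random

# Hypothetical toy relational data: each molecule owns a bag of atoms,
# and each atom is an (element, charge) tuple.
MOLECULES = {
    "m1": [("c", -0.1), ("c", 0.2), ("o", -0.4)],
    "m2": [("c", -0.2), ("o", -0.5), ("o", -0.3)],
    "m3": [("n", 0.3), ("c", 0.1), ("cl", -0.2)],
    "m4": [("n", 0.4), ("cl", -0.1), ("c", 0.0)],
}
ELEMENTS = ["c", "o", "n", "cl"]


def random_rule(rng):
    """Draw one random rule: 'some atom has this element and charge >= t'.
    This rule language is an assumption chosen for brevity."""
    return rng.choice(ELEMENTS), rng.uniform(-0.5, 0.5)


def covers(rule, atoms):
    """A rule fires for an example if at least one related atom satisfies it."""
    element, threshold = rule
    return any(e == element and q >= threshold for e, q in atoms)


def propositionalize(molecules, n_rules, seed=0):
    """Step 1: build a Boolean feature table, one column per random rule."""
    rng = random.Random(seed)
    rules = [random_rule(rng) for _ in range(n_rules)]
    table = {
        name: [int(covers(rule, atoms)) for rule in rules]
        for name, atoms in molecules.items()
    }
    return rules, table


def nearest(row, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])),
    )


def kmeans(rows, k, iters=20, seed=0):
    """Step 2: plain k-means on the Boolean rows (a stand-in for any
    efficient propositional clustering algorithm)."""
    rng = random.Random(seed)
    centers = [[float(v) for v in row] for row in rng.sample(rows, k)]
    for _ in range(iters):
        assign = [nearest(row, centers) for row in rows]
        for c in range(k):
            members = [row for row, a in zip(rows, assign) if a == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(row, centers) for row in rows]


if __name__ == "__main__":
    rules, table = propositionalize(MOLECULES, n_rules=16, seed=1)
    labels = kmeans(list(table.values()), k=2, seed=1)
    for name, label in zip(table, labels):
        print(name, "-> cluster", label)
```

Because the rules are drawn at random rather than searched for, the cost of the first step grows with the number of rules and examples only, which is the intuition behind the scalability claim; the cluster assignments themselves depend on the seed and the assumed rule language.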

Citation

Anderson, G. & Pfahringer, B. (2008). Clustering Relational Data Based on Randomized Propositionalization. In Proceedings of 17th International Conference, ILP 2007, Corvallis, OR, USA, June 19-21, 2007 (pp. 39-48). Berlin: Springer.

Type

Conference Contribution

Date

2008

Publisher

Springer, Berlin
