
Propositionalisation of multi-instance data using random forests

Multi-instance learning is a generalisation of attribute-value learning in which each example is a labelled bag (i.e. multi-set) of instances. This learning setting is more computationally challenging than attribute-value learning and a natural fit for important application areas of machine learning such as classification of molecules and image classification. One approach to solving multi-instance learning problems is propositionalisation, where bags of data are converted into vectors of attribute-value pairs so that a standard propositional (i.e. attribute-value) learning algorithm can be applied. This approach is attractive because of the large number of propositional learning algorithms that have been developed and can thus be applied to the propositionalised data. In this paper, we empirically investigate a variant of an existing propositionalisation method called TLC. TLC uses a single decision tree to obtain propositionalised data. Our variant applies a random forest instead, motivated by the potential gain in robustness this may yield. We present results on synthetic and real-world data from the above two application domains showing that it indeed yields increased classification accuracy when applying boosting and support vector machines to classify the propositionalised data.
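To illustrate the general idea of tree-based propositionalisation described above (this is only a minimal sketch, not the authors' TLC method or its random-forest variant: random one-feature stumps stand in for learned forest trees, and the names `make_stumps` and `propositionalise` are invented for this example), each tree partitions the instance space into leaves, and a bag is mapped to a fixed-length vector recording how its instances distribute over those leaves:

```python
import random

def make_stumps(instances, n_stumps, rng):
    # Crude stand-in for a random forest: each "tree" is a random
    # one-feature stump (feature index, threshold) with two leaves.
    d = len(instances[0])
    stumps = []
    for _ in range(n_stumps):
        f = rng.randrange(d)
        t = rng.choice([inst[f] for inst in instances])
        stumps.append((f, t))
    return stumps

def propositionalise(bag, stumps):
    # Map a bag (list of instances) to one attribute-value vector:
    # for each stump, the fraction of the bag's instances falling
    # into each of its two leaves.
    vec = []
    for f, t in stumps:
        left = sum(1 for inst in bag if inst[f] <= t) / len(bag)
        vec.extend([left, 1.0 - left])
    return vec

# Two bags of 2-dimensional instances; after propositionalisation,
# any standard attribute-value learner can be applied to the vectors.
bags = [[(0.1, 2.0), (0.5, 1.0)], [(0.9, 3.0), (0.7, 2.5)]]
rng = random.Random(0)
stumps = make_stumps([i for b in bags for i in b], 4, rng)
vectors = [propositionalise(b, stumps) for b in bags]
```

Each bag, whatever its size, yields a vector of length twice the number of stumps, and the two leaf fractions for each stump sum to one.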
Type: Conference Contribution
Frank, E., & Pfahringer, B. (2013). Propositionalisation of multi-instance data using random forests. In S. Cranefield & A. Nayak (Eds.), Proceedings of the 26th Australasian Joint Conference on Advances in Artificial Intelligence (LNAI Vol. 8272, pp. 362–373). Dunedin, NZ: Springer. https://doi.org/10.1007/978-3-319-03680-9_37
© 2013 Springer International Publishing Switzerland. This is the author's accepted version. The final publication is available at Springer via dx.doi.org/10.1007/978-3-319-03680-9_37