Full model selection in the space of data mining operators

Abstract

We propose a framework and a novel algorithm for the full model selection (FMS) problem. The proposed algorithm, which combines genetic algorithms (GA) and particle swarm optimization (PSO), is named GPS (which stands for GAPSO-FMS): a GA searches for the optimal structure of a data mining solution, and PSO searches for the optimal parameter set for a particular structure instance. Given a classification or regression problem, GPS outputs an FMS solution as a directed acyclic graph consisting of diverse data mining operators that are applicable to the problem, including data cleansing, data sampling, feature transformation/selection and algorithm operators. The solution can also be represented graphically in a human-readable form. Experimental results demonstrate the benefit of the algorithm.
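The two-level search described in the abstract can be illustrated with a minimal sketch. This is not the authors' GPS implementation: the operator catalogue, the `toy_score` objective, the bit-mask encoding, and all constants are hypothetical, and the structure is simplified to a linear chain of operators (a special case of a directed acyclic graph). The sketch only shows the division of labour, with an outer GA proposing structures and an inner PSO tuning the parameters of each proposed structure.

```python
import random

# Hypothetical catalogue: operator name -> number of continuous parameters in [0, 1].
OPERATORS = {"clean": 1, "sample": 1, "select": 1, "classify": 2}


def toy_score(structure, params):
    """Stand-in for a cross-validated error (lower is better); purely illustrative."""
    penalty = 0.0 if "classify" in structure else 1.0  # a pipeline needs a learner
    return penalty + 0.05 * len(structure) + sum((p - 0.3) ** 2 for p in params)


def pso(structure, rng, particles=8, iters=20):
    """Inner loop: PSO over the parameter vector of one fixed structure."""
    dim = sum(OPERATORS[op] for op in structure)
    if dim == 0:  # empty structure: nothing to tune
        return toy_score(structure, []), []
    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best = [p[:] for p in pos]                      # personal best positions
    best_val = [toy_score(structure, p) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best[g][:], best_val[g]          # global best
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (g_pos[d] - pos[i][d]))
                # Clamp each parameter to its assumed [0, 1] range.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = toy_score(structure, pos[i])
            if val < best_val[i]:
                best[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_val, g_pos


def decode(mask):
    """Map a bit mask to the tuple of selected operators, in catalogue order."""
    return tuple(op for op, bit in zip(OPERATORS, mask) if bit)


def ga_pso(seed=0, pop_size=6, generations=5):
    """Outer loop: GA over structures; each candidate is scored by the inner PSO."""
    rng = random.Random(seed)
    n = len(OPERATORS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = []
        for mask in pop:
            structure = decode(mask)
            val, params = pso(structure, rng)
            scored.append((val, mask, structure, params))
        scored.sort(key=lambda t: t[0])
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        parents = [m for _, m, _, _ in scored[: pop_size // 2]]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]               # one-point crossover
            if rng.random() < 0.2:                  # occasional bit-flip mutation
                child[rng.randrange(n)] ^= 1
            children.append(child)
        pop = parents + children
    return best


if __name__ == "__main__":
    value, mask, structure, params = ga_pso()
    print(structure, value, params)
```

In this sketch every structure evaluation triggers a full PSO run, so the cost grows with the product of the GA and PSO budgets; whether and how GPS mitigates that cost is not stated in the abstract.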

Citation

Sun, Q., Pfahringer, B. & Mayo, M. (2012). Full model selection in the space of data mining operators. GECCO’12 Companion, July 7–11, 2012, Philadelphia, PA, USA.

Publisher

ACM

Degree

Type of thesis

Supervisor