Browsing by Author "Pfahringer, Bernhard"

van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin
(2015)
Ensembles of classifiers are among the strongest classifiers in most data mining applications. Bagging ensembles exploit the instability of base-classifiers by training them on different bootstrap replicates. It has been ...

Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard
(AAAI Press, 2015)
We present a supervised framework for expanding an opinion lexicon for tweets. The lexicon contains part-of-speech (POS) disambiguated entries with a three-dimensional probability distribution for positive, negative, and ...

Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard
(ACM, 2015)
In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: ...

van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin
(Springer International Publishing, 2014)
We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment we calculate the characteristics of a small sample of a data stream, and try to predict which classifier ...

van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin
(CEUR-WS, 2014)
Modern society produces vast streams of data. Many stream mining algorithms have been developed to capture general trends in these streams, and make predictions for future observations, but relatively little is known about ...

Sun, Quan; Pfahringer, Bernhard
(Springer Verlag, 2014)
The Pairwise Meta-Rules (PMR) method proposed in [18] has been shown to improve the predictive performances of several meta-learning algorithms for the algorithm ranking problem. Given m target objects (e.g., algorithms), ...

Ienco, Dino; Bifet, Albert; Pfahringer, Bernhard; Poncelet, Pascal
(ACM, 2014)
Detecting change in evolving data streams is a central issue for accurate adaptive learning. In real world applications, data streams have categorical features, and changes induced in the data distribution of these categorical ...

Žliobaitė, Indrė; Bifet, Albert; Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2014)
Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution is often evolving over ...

Sun, Quan; Pfahringer, Bernhard
(Springer-Verlag, 2013)
In this paper, we present a novel meta-feature generation method in the context of meta-learning, which is based on rules that compare the performance of individual base learners in a one-against-one manner. In addition ...

Sun, Quan; Pfahringer, Bernhard; Mayo, Michael
(Springer, 2013)
People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now have to face the new problem ...

Bifet, Albert; Read, Jesse; Žliobaitė, Indrė; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2013)
Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting the underlying distribution from which this data ...

Ienco, Dino; Bifet, Albert; Žliobaitė, Indrė; Pfahringer, Bernhard
(Springer, 2013)
Data labeling is an expensive and time-consuming task. Choosing which labels to use is increasingly becoming important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction ...

Bifet, Albert; Pfahringer, Bernhard; Read, Jesse; Holmes, Geoffrey
(ACM, 2013)
In the context of a data stream, a classifier must be able to learn from a theoretically infinite stream of examples using limited time and memory, while being able to predict at any point. Many methods deal with this ...

Seeland, Madeleine; Kramer, Stefan; Pfahringer, Bernhard
(ACM, 2013)
The choice of a suitable graph kernel is intrinsically hard and often cannot be made in an informed manner for a given dataset. Methods for multiple kernel learning offer a possible remedy, as they combine and weight kernels ...

Torgo, Luís; Ribeiro, Rita P.; Pfahringer, Bernhard; Branco, Paula
(Springer, 2013)
Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that was already studied thoroughly within machine learning. ...

Sun, Quan; Pfahringer, Bernhard
(Springer, 2012)
Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as an ensemble of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on ...

Bifet, Albert; Frank, Eibe; Holmes, Geoffrey; Pfahringer, Bernhard
(Association for Computing Machinery (ACM), 2012)
The success of simple methods for classification shows that it is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this article, we propose to ...

Read, Jesse; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2012)
Many real world problems involve the challenging context of data streams, where classifiers must be incremental: able to learn from a theoretically infinite stream of examples using limited time and memory, while being ...

Seeland, Madeleine; Buchwald, Fabian; Kramer, Stefan; Pfahringer, Bernhard
(ACM, 2012)
This paper investigates a simple, yet effective method for regression on graphs, in particular for applications in cheminformatics and for quantitative structure-activity relationships (QSARs). The method combines Locally ...

Wicker, Jörg; Pfahringer, Bernhard; Kramer, Stefan
(ACM, 2012)
This paper introduces a new multilabel classifier based on Boolean matrix decomposition. Boolean matrix decomposition is used to extract, from the full label matrix, latent labels representing useful Boolean combinations ...

Sun, Quan; Pfahringer, Bernhard; Mayo, Michael
(ACM, 2012)
We propose a framework and a novel algorithm for the full model selection (FMS) problem. The proposed algorithm, combining both genetic algorithms (GA) and particle swarm optimization (PSO), is named GPS (which stands for ...

Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard
(Springer, 2012)
Many challenging real world problems involve multilabel data streams. Efficient methods exist for multilabel classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, ...

Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey
(2012)
Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further ...

Evans, Reuben James Emmanuel; Pfahringer, Bernhard; Holmes, Geoffrey
(IEEE, 2011)
Advances in technology have provided industry with an array of devices for collecting data. The frequency and scale of data collection means that there are now many large datasets being generated. To find patterns in these ...

Pfahringer, Bernhard
(Springer, 2011)
We present and investigate ensembles of semi-random model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivalling the state of the art in numeric ...

Sun, Quan; Pfahringer, Bernhard
(2011)
Ensemble selection has recently appeared as a popular ensemble learning method, not only because its implementation is fairly straightforward, but also due to its excellent predictive performance on practical problems. The ...

Sarjant, Samuel; Pfahringer, Bernhard; Driessens, Kurt; Smith, Tony C.
(IEEE, 2011)
By defining a videogame environment as a collection of objects, relations, actions and rewards, the relational reinforcement learning algorithm presented in this paper generates and optimises a set of concise, human-readable ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard
(ACM, 2011)
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to ...

Pfahringer, Bernhard
(University of Waikato, Department of Computer Science, 2010)
We present and investigate ensembles of randomized model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivaling the state of the art in numeric ...

Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard
(University of Waikato, Department of Computer Science, 2010)
Many real world problems involve data which can be considered as multilabel data streams. Efficient methods exist for multilabel classification in non-streaming scenarios. However, learning in evolving streaming scenarios ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas
(2010)
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Frank, Eibe
(Springer Berlin, 2010)
Mining of data streams must balance three evaluation dimensions: accuracy, time and memory. Excellent accuracy on data streams has been obtained with Naive Bayes Hoeffding Trees—Hoeffding Trees with naive Bayes models at ...

Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard
(2010)
In today's applications, evolving data streams are ubiquitous. Stream clustering algorithms were introduced to gain useful knowledge from these streams in real-time. The quality of the obtained clusterings, i.e. how good ...

Bifet, Albert; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard
(Massachusetts Institute of Technology Press, 2010)
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard
(Springer-Verlag, 2010)
Bagging, boosting and Random Forests are classical ensemble methods used to improve the performance of single classifiers. They obtain superior performance by increasing the accuracy and diversity of the single classifiers. ...

Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H.
(2010)
WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the ...

Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2009)
Profile Hidden Markov Models (PHMMs) have been widely used as models for Multiple Sequence Alignments. By their nature, they are generative one-class classifiers trained only on sequences belonging to the target class they ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard
(2009)
We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of ...

Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2009)
All around the globe, thousands of learning experiments are being executed on a daily basis, only to be discarded after interpretation. Yet, the information contained in these experiments might have uses beyond their ...

Anderson, Grant; Pfahringer, Bernhard
(Morgan Kaufmann Publishers Inc. San Francisco, CA, USA, 2009)
Random Forests have been shown to perform very well in propositional learning. FORF is an upgrade of Random Forests for relational data. In this paper we investigate shortcomings of FORF and propose an alternative algorithm, ...

Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey; Frank, Eibe
(Springer, 2009)
The widely known binary relevance method for multilabel classification, which considers each label as an independent binary problem, has been sidelined in the literature due to the perceived inadequacy of its label-independence ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Gavaldà, Ricard
(ACM, 2009)
Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is ...

Vanschoren, Joaquin; Pfahringer, Bernhard; Holmes, Geoffrey
(2008)
Thousands of Machine Learning research papers contain experimental comparisons that usually have been conducted with a single focus of interest, and detailed results are usually lost after publication. Once past experiments ...

Wu, Xing; Holmes, Geoffrey; Pfahringer, Bernhard
(Springer, 2008)
Nearest Neighbour Search (NNS) is one of the top ten data mining algorithms. It is simple and effective but has a time complexity that is the product of the number of instances and the number of dimensions. When the number ...

Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2008)
Hidden Markov Models are a widely used generative model for analysing sequence data. A variant, Profile Hidden Markov Models are a special case used in Bioinformatics to represent, for example, protein families. In this ...

Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon
(Springer, Berlin, 2008)
For conventional machine learning classification algorithms handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into predefined bins or sort ...

Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey
(2008)
Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally statistical models like Profile Hidden Markov ...

Anderson, Grant; Pfahringer, Bernhard
(Springer, Berlin, 2008)
Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying ...

Pfahringer, Bernhard; Anderson, Grant
(Springer, Berlin, 2008)
In this paper we investigate an approach to semi-supervised learning based on randomized propositionalization, which allows for applying standard propositional classification algorithms like support vector machines to ...

Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey
(2008)
Many studies in machine learning try to investigate what makes an algorithm succeed or fail on certain datasets. However, the field is still evolving relatively quickly, and new algorithms, preprocessing methods, learning ...

Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey
(IEEE, 2008)
This paper presents a Pruned Sets method (PS) for multilabel classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account ...

Mutter, Stefan; Pfahringer, Bernhard
(2007)
This paper introduces the first author’s PhD project which has just got out of its initial stage. Biological sequence data is, on the one hand, highly structured. On the other hand there are large amounts of unlabelled ...

Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon
(Springer, 2007)
Hoeffding trees are state-of-the-art for processing high-speed data streams. Their ingenuity stems from updating sufficient statistics, only addressing growth when decisions can be made that are guaranteed to be almost ...

Pfahringer, Bernhard; Leschi, Claire; Reutemann, Peter
(Springer, Berlin, 2007)
Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve ...

Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon
(2006)
We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta level analyser from a number of levels constructed over time from a data stream. ...

Pfahringer, Bernhard; Anderson, Grant
(2006)
Exhaustive search in relational learning is generally infeasible, therefore some form of heuristic search is usually employed, such as in FOIL[1]. On the other hand, so-called stochastic discrimination provides a framework ...

Frank, Eibe; Pfahringer, Bernhard
(Springer, Berlin, 2006)
Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an ensemble of decision ...

Pfahringer, Bernhard
(2006)
This document describes a novel semi-supervised approach to spam classification, which was successful at the ECML/PKDD 2006 spam classification challenge. A local learning method based on lazy projections was successfully ...

Driessens, Kurt; Reutemann, Peter; Pfahringer, Bernhard; Leschi, Claire
(Springer, Berlin, 2006)
The development of data-mining applications such as text-classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where often the ...

Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard
(Springer, Berlin, 2005)
Hoeffding trees are state-of-the-art in classification for data streams. They perform prediction by choosing the majority class at each leaf. Their predictive accuracy can be increased by adding Naive Bayes models at the ...

Reutemann, Peter; Pfahringer, Bernhard; Frank, Eibe
(Springer, 2005)
Most databases employ the relational model for data storage. To use this data in a propositional learner, a propositionalization step has to take place. Similarly, the data has to be transformed to be amenable to a ...

Li, Mi; Holmes, Geoffrey; Pfahringer, Bernhard
(Springer, Berlin, 2005)
This paper presents a single scan algorithm for clustering large datasets based on a two phase process which combines two well known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with ...

Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard; Witten, Ian H.
(Springer, 2005)
The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, ...

Pfahringer, Bernhard; Reutemann, Peter; Mayo, Michael
(2005)
Text classification is a natural application domain for semi-supervised learning, as labeling documents is expensive, but on the other hand usually an abundance of unlabeled documents is available. We describe a novel ...

Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard
(2005)
A thorough examination of the performance of Hoeffding trees, state-of-the-art in classification for data streams, on a range of datasets reveals that tie breaking, an essential but supposedly rare procedure, is employed ...

Kibriya, Ashraf Masood; Frank, Eibe; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2005)
This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it ...

Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard
(2004)
The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...

Blockeel, Hendrik; Džeroski, Sašo; Kompare, Boris; Kramer, Stefan; Pfahringer, Bernhard; Van Laer, Wim
(Taylor & Francis, 2004)
This paper is concerned with the use of AI techniques in ecology. More specifically, we present a novel application of inductive logic programming (ILP) in the area of quantitative structure-activity relationships (QSARs). ...

Pfahringer, Bernhard; Holmes, Geoffrey; Wang, Cheng
(2004)
In this paper we report on work in progress based on the induction of vast numbers of almost random rules. This work tries to combine and explore ideas from both Random Forests as well as Stochastic Discrimination. We ...

Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon
(University of Waikato, Department of Computer Science, 2003)
The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...

Frank, Eibe; Hall, Mark A.; Pfahringer, Bernhard
(University of Waikato, Department of Computer Science, 2003)
Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome ...

Sauban, Maximilien; Pfahringer, Bernhard
(Springer, Berlin, 2003)
This paper presents an extension of prior work by Michael D. Lee on psychologically plausible text categorisation. Our approach utilises Lee's model as a preprocessing filter to generate a dense representation for a given ...

Pfahringer, Bernhard; Holmes, Geoffrey
(2003)
A simple algorithm based on the theory of stochastic discrimination is developed for the fast extraction of subgraphs with potential discriminative power from a given set of preclassified graphs. A preliminary experimental ...

Weidmann, Nils; Frank, Eibe; Pfahringer, Bernhard
(Springer, Berlin, 2003)
In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag's class label depends on the labels of the instances in the bag and can ...

Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A.
(University of Waikato, Department of Computer Science, 2002)
The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ...

Pfahringer, Bernhard
(2002)
When considering the merit of data mining challenges, we need to answer the question of whether the amount of academic outcome justifies the related expense of scarce research time. In this paper I will provide anecdotal ...

Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A.
(Springer, Berlin, 2002)
The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ...

Pfahringer, Bernhard; Holmes, Geoffrey; Schmidberger, Gabi
(Springer, Berlin, 2001)
Wrappers have recently been used to obtain parameter optimizations for learning algorithms. In this paper we investigate the use of a wrapper for estimating the correct number of boosting ensembles in the presence of class ...

Pfahringer, Bernhard
(2001)
This paper describes my submission to one of the subproblems formulated for the Predictive Toxicology Challenge 2001. The challenge is to predict the carcinogenicity of chemicals based on structural information only. I ...

Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon
(Springer, Berlin, 2001)
The alternating decision tree brings comprehensibility to the performance enhancing capabilities of boosting. A single interpretable tree is induced wherein knowledge is distributed across the nodes and multiple paths are ...

Helma, Christoph; Kramer, Stefan; Pfahringer, Bernhard; Gottmann, Eva
(Environmental health perspectives, 2000)
Every technique for toxicity prediction and for the detection of structure–activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss the ...

Kovar, Klaus; Fürnkranz, Johannes; Petrak, Johann; Pfahringer, Bernhard; Trappl, Robert; Widmer, Gerhard
(Taylor & Francis, 2000)
This paper presents an empirical study on the possibility of discovering interesting event sequences and sequential rules in a large database of international political events. A data mining algorithm first presented by ...

Fürnkranz, Johannes; Pfahringer, Bernhard; Kaindl, Hermann; Kramer, Stefan
(IOS press, 2000)
We address the problem of advice-taking in a given domain, in particular for building a game-playing program. Our approach to solving it strives for the application of machine learning techniques throughout, i.e. for ...

Kramer, Stefan; Widmer, Gerhard; Pfahringer, Bernhard; de Groeve, Michael
(Springer, Berlin, 2000)
This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with SCART, a tree induction algorithm, and study various ways of ...