Browsing by Author "Pfahringer, Bernhard"

Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard
(ACM, 2015)
In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: ...

Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard
(AAAI Press, 2015)
We present a supervised framework for expanding an opinion lexicon for tweets. The lexicon contains part-of-speech (POS) disambiguated entries with a three-dimensional probability distribution for positive, negative, and ...

van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin
(2015)
Ensembles of classifiers are among the strongest classifiers in most data mining applications. Bagging ensembles exploit the instability of base-classifiers by training them on different bootstrap replicates. It has been ...

van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin
(CEUR-WS, 2014)
Modern society produces vast streams of data. Many stream mining algorithms have been developed to capture general trends in these streams, and make predictions for future observations, but relatively little is known about ...

van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin
(Springer International Publishing, 2014)
We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment we calculate the characteristics of a small sample of a data stream, and try to predict which classifier ...

Ienco, Dino; Bifet, Albert; Pfahringer, Bernhard; Poncelet, Pascal
(ACM, 2014)
Detecting change in evolving data streams is a central issue for accurate adaptive learning. In real world applications, data streams have categorical features, and changes induced in the data distribution of these categorical ...

Sun, Quan; Pfahringer, Bernhard
(Springer Verlag, 2014)
The Pairwise Meta-Rules (PMR) method proposed in [18] has been shown to improve the predictive performances of several meta-learning algorithms for the algorithm ranking problem. Given m target objects (e.g., algorithms), ...

Žliobaitė, Indrė; Bifet, Albert; Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2014)
Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution is often evolving over ...

Sun, Quan; Pfahringer, Bernhard
(Springer-Verlag, 2013)
In this paper, we present a novel meta-feature generation method in the context of meta-learning, which is based on rules that compare the performance of individual base learners in a one-against-one manner. In addition ...

Bifet, Albert; Pfahringer, Bernhard; Read, Jesse; Holmes, Geoffrey
(ACM, 2013)
In the context of a data stream, a classifier must be able to learn from a theoretically infinite stream of examples using limited time and memory, while being able to predict at any point. Many methods deal with this ...

Bifet, Albert; Read, Jesse; Žliobaitė, Indrė; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2013)
Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting the underlying distribution from which this data ...

Ienco, Dino; Bifet, Albert; Žliobaitė, Indrė; Pfahringer, Bernhard
(Springer, 2013)
Data labeling is an expensive and time-consuming task. Choosing which labels to use is increasingly becoming important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction ...

Torgo, Luís; Ribeiro, Rita P.; Pfahringer, Bernhard; Branco, Paula
(Springer, 2013)
Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that was already studied thoroughly within machine learning. ...

Seeland, Madeleine; Kramer, Stefan; Pfahringer, Bernhard
(ACM, 2013)
The choice of a suitable graph kernel is intrinsically hard and often cannot be made in an informed manner for a given dataset. Methods for multiple kernel learning offer a possible remedy, as they combine and weight kernels ...

Sun, Quan; Pfahringer, Bernhard; Mayo, Michael
(Springer, 2013)
People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now have to face the new problem ...

Wicker, Jörg; Pfahringer, Bernhard; Kramer, Stefan
(ACM, 2012)
This paper introduces a new multilabel classifier based on Boolean matrix decomposition. Boolean matrix decomposition is used to extract, from the full label matrix, latent labels representing useful Boolean combinations ...

Read, Jesse; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2012)
Many real world problems involve the challenging context of data streams, where classifiers must be incremental: able to learn from a theoretically infinite stream of examples using limited time and memory, while being ...

Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey
(2012)
Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further ...

Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard
(Springer, 2012)
Many challenging real world problems involve multilabel data streams. Efficient methods exist for multilabel classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, ...

Seeland, Madeleine; Buchwald, Fabian; Kramer, Stefan; Pfahringer, Bernhard
(ACM, 2012)
This paper investigates a simple, yet effective method for regression on graphs, in particular for applications in cheminformatics and for quantitative structure-activity relationships (QSARs). The method combines Locally ...

Bifet, Albert; Frank, Eibe; Holmes, Geoffrey; Pfahringer, Bernhard
(Association for Computing Machinery (ACM), 2012)
The success of simple methods for classification shows that it is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this article, we propose to ...

Sun, Quan; Pfahringer, Bernhard
(Springer, 2012)
Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as an ensemble of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on ...

Sun, Quan; Pfahringer, Bernhard; Mayo, Michael
(ACM, 2012)
We propose a framework and a novel algorithm for the full model selection (FMS) problem. The proposed algorithm, combining both genetic algorithms (GA) and particle swarm optimization (PSO), is named GPS (which stands for ...

Pfahringer, Bernhard
(Springer, 2011)
We present and investigate ensembles of semi-random model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivalling the state of the art in numeric ...

Evans, Reuben James Emmanuel; Pfahringer, Bernhard; Holmes, Geoffrey
(IEEE, 2011)
Advances in technology have provided industry with an array of devices for collecting data. The frequency and scale of data collection means that there are now many large datasets being generated. To find patterns in these ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard
(ACM, 2011)
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to ...

Sarjant, Samuel; Pfahringer, Bernhard; Driessens, Kurt; Smith, Tony C.
(IEEE, 2011)
By defining a videogame environment as a collection of objects, relations, actions and rewards, the relational reinforcement learning algorithm presented in this paper generates and optimises a set of concise, human-readable ...

Sun, Quan; Pfahringer, Bernhard
(2011)
Ensemble selection has recently appeared as a popular ensemble learning method, not only because its implementation is fairly straightforward, but also due to its excellent predictive performance on practical problems. The ...

Pfahringer, Bernhard
(University of Waikato, Department of Computer Science, 2010)
We present and investigate ensembles of randomized model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivaling the state of the art in numeric ...

Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard
(University of Waikato, Department of Computer Science, 2010)
Many real world problems involve data which can be considered as multilabel data streams. Efficient methods exist for multilabel classification in non-streaming scenarios. However, learning in evolving streaming scenarios ...

Bifet, Albert; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard
(Massachusetts Institute of Technology Press, 2010)
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well ...

Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H.
(2010)
WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the ...

Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard
(2010)
In today's applications, evolving data streams are ubiquitous. Stream clustering algorithms were introduced to gain useful knowledge from these streams in real time. The quality of the obtained clusterings, i.e. how good ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard
(Springer-Verlag, 2010)
Bagging, boosting and Random Forests are classical ensemble methods used to improve the performance of single classifiers. They obtain superior performance by increasing the accuracy and diversity of the single classifiers. ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas
(2010)
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Frank, Eibe
(Springer Berlin, 2010)
Mining of data streams must balance three evaluation dimensions: accuracy, time and memory. Excellent accuracy on data streams has been obtained with Naive Bayes Hoeffding Trees—Hoeffding Trees with naive Bayes models at ...

Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2009)
Profile Hidden Markov Models (PHMMs) have been widely used as models for Multiple Sequence Alignments. By their nature, they are generative one-class classifiers trained only on sequences belonging to the target class they ...

Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey; Frank, Eibe
(Springer, 2009)
The widely known binary relevance method for multilabel classification, which considers each label as an independent binary problem, has been sidelined in the literature due to the perceived inadequacy of its label-independence ...

Anderson, Grant; Pfahringer, Bernhard
(Morgan Kaufmann Publishers Inc. San Francisco, CA, USA, 2009)
Random Forests have been shown to perform very well in propositional learning. FORF is an upgrade of Random Forests for relational data. In this paper we investigate shortcomings of FORF and propose an alternative algorithm, ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Gavaldà, Ricard
(ACM, 2009)
Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is ...

Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2009)
All around the globe, thousands of learning experiments are being executed on a daily basis, only to be discarded after interpretation. Yet, the information contained in these experiments might have uses beyond their ...

Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard
(2009)
We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of ...

Vanschoren, Joaquin; Pfahringer, Bernhard; Holmes, Geoffrey
(2008)
Thousands of Machine Learning research papers contain experimental comparisons that usually have been conducted with a single focus of interest, and detailed results are usually lost after publication. Once past experiments ...

Wu, Xing; Holmes, Geoffrey; Pfahringer, Bernhard
(Springer, 2008)
Nearest Neighbour Search (NNS) is one of the top ten data mining algorithms. It is simple and effective but has a time complexity that is the product of the number of instances and the number of dimensions. When the number ...

Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2008)
Hidden Markov Models are a widely used generative model for analysing sequence data. A variant, Profile Hidden Markov Models are a special case used in Bioinformatics to represent, for example, protein families. In this ...

Pfahringer, Bernhard; Anderson, Grant
(Springer, Berlin, 2008)
In this paper we investigate an approach to semi-supervised learning based on randomized propositionalization, which allows for applying standard propositional classification algorithms like support vector machines to ...

Anderson, Grant; Pfahringer, Bernhard
(Springer, Berlin, 2008)
Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying ...

Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon
(Springer, Berlin, 2008)
For conventional machine learning classification algorithms handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into predefined bins or sort ...

Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey
(IEEE, 2008)
This paper presents a Pruned Sets method (PS) for multilabel classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account ...

Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey
(2008)
Many studies in machine learning try to investigate what makes an algorithm succeed or fail on certain datasets. However, the field is still evolving relatively quickly, and new algorithms, preprocessing methods, learning ...

Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey
(2008)
Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally statistical models like Profile Hidden Markov ...

Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon
(Springer, 2007)
Hoeffding trees are state-of-the-art for processing high-speed data streams. Their ingenuity stems from updating sufficient statistics, only addressing growth when decisions can be made that are guaranteed to be almost ...

Pfahringer, Bernhard; Leschi, Claire; Reutemann, Peter
(Springer, Berlin, 2007)
Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve ...

Mutter, Stefan; Pfahringer, Bernhard
(2007)
This paper introduces the first author’s PhD project which has just got out of its initial stage. Biological sequence data is, on the one hand, highly structured. On the other hand there are large amounts of unlabelled ...

Driessens, Kurt; Reutemann, Peter; Pfahringer, Bernhard; Leschi, Claire
(Springer, Berlin, 2006)
The development of data-mining applications such as text-classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where often the ...

Frank, Eibe; Pfahringer, Bernhard
(Springer, Berlin, 2006)
Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an ensemble of decision ...

Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon
(2006)
We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta level analyser from a number of levels constructed over time from a data stream. ...

Pfahringer, Bernhard; Anderson, Grant
(2006)
Exhaustive search in relational learning is generally infeasible, therefore some form of heuristic search is usually employed, such as in FOIL[1]. On the other hand, so-called stochastic discrimination provides a framework ...

Pfahringer, Bernhard
(2006)
This document describes a novel semi-supervised approach to spam classification, which was successful at the ECML/PKDD 2006 spam classification challenge. A local learning method based on lazy projections was successfully ...

Kibriya, Ashraf Masood; Frank, Eibe; Pfahringer, Bernhard; Holmes, Geoffrey
(Springer, 2005)
This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it ...

Reutemann, Peter; Pfahringer, Bernhard; Frank, Eibe
(Springer, 2005)
Most databases employ the relational model for data storage. To use this data in a propositional learner, a propositionalization step has to take place. Similarly, the data has to be transformed to be amenable to a ...

Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard
(Springer, Berlin, 2005)
Hoeffding trees are state-of-the-art in classification for data streams. They perform prediction by choosing the majority class at each leaf. Their predictive accuracy can be increased by adding Naive Bayes models at the ...

Li, Mi; Holmes, Geoffrey; Pfahringer, Bernhard
(Springer, Berlin, 2005)
This paper presents a single scan algorithm for clustering large datasets based on a two phase process which combines two well known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with ...

Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard
(2005)
A thorough examination of the performance of Hoeffding trees, state-of-the-art in classification for data streams, on a range of datasets reveals that tie breaking, an essential but supposedly rare procedure, is employed ...

Pfahringer, Bernhard; Reutemann, Peter; Mayo, Michael
(2005)
Text classification is a natural application domain for semi-supervised learning, as labeling documents is expensive, but on the other hand usually an abundance of unlabeled documents is available. We describe a novel ...

Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard; Witten, Ian H.
(Springer, 2005)
The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, ...

Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard
(2004)
The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...

Pfahringer, Bernhard; Holmes, Geoffrey; Wang, Cheng
(2004)
In this paper we report on work in progress based on the induction of vast numbers of almost random rules. This work tries to combine and explore ideas from both Random Forests as well as Stochastic Discrimination. We ...

Blockeel, Hendrik; Džeroski, Sašo; Kompare, Boris; Kramer, Stefan; Pfahringer, Bernhard; Van Laer, Wim
(Taylor & Francis, 2004)
This paper is concerned with the use of AI techniques in ecology. More specifically, we present a novel application of inductive logic programming (ILP) in the area of quantitative structure-activity relationships (QSARs). ...

Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon
(University of Waikato, Department of Computer Science, 2003)
The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...

Frank, Eibe; Hall, Mark A.; Pfahringer, Bernhard
(University of Waikato, Department of Computer Science, 2003)
Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome ...

Sauban, Maximilien; Pfahringer, Bernhard
(Springer, Berlin, 2003)
This paper presents an extension of prior work by Michael D. Lee on psychologically plausible text categorisation. Our approach utilises Lee's model as a preprocessing filter to generate a dense representation for a given ...

Weidmann, Nils; Frank, Eibe; Pfahringer, Bernhard
(Springer, Berlin, 2003)
In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag's class label depends on the labels of the instances in the bag and can ...

Pfahringer, Bernhard; Holmes, Geoffrey
(2003)
A simple algorithm based on the theory of stochastic discrimination is developed for the fast extraction of subgraphs with potential discriminative power from a given set of preclassified graphs. A preliminary experimental ...

Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A.
(University of Waikato, Department of Computer Science, 2002)
The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ...

Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A.
(Springer, Berlin, 2002)
The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ...

Pfahringer, Bernhard
(2002)
When considering the merit of data mining challenges, we need to answer the question of whether the amount of academic outcome justifies the related expense of scarce research time. In this paper I will provide anecdotal ...

Pfahringer, Bernhard; Holmes, Geoffrey; Schmidberger, Gabi
(Springer, Berlin, 2001)
Wrappers have recently been used to obtain parameter optimizations for learning algorithms. In this paper we investigate the use of a wrapper for estimating the correct number of boosting ensembles in the presence of class ...

Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon
(Springer, Berlin, 2001)
The alternating decision tree brings comprehensibility to the performance enhancing capabilities of boosting. A single interpretable tree is induced wherein knowledge is distributed across the nodes and multiple paths are ...

Pfahringer, Bernhard
(2001)
This paper describes my submission to one of the subproblems formulated for the Predictive Toxicology Challenge 2001. The challenge is to predict the carcinogenicity of chemicals based on structural information only. I ...

Kramer, Stefan; Widmer, Gerhard; Pfahringer, Bernhard; de Groeve, Michael
(Springer, Berlin, 2000)
This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with SCART, a tree induction algorithm, and study various ways of ...

Fürnkranz, Johannes; Pfahringer, Bernhard; Kaindl, Hermann; Kramer, Stefan
(IOS Press, 2000)
We address the problem of advice-taking in a given domain, in particular for building a game-playing program. Our approach to solving it strives for the application of machine learning techniques throughout, i.e. for ...

Kovar, Klaus; Fürnkranz, Johannes; Petrak, Johann; Pfahringer, Bernhard; Trappl, Robert; Widmer, Gerhard
(Taylor & Francis, 2000)
This paper presents an empirical study on the possibility of discovering interesting event sequences and sequential rules in a large database of international political events. A data mining algorithm first presented by ...

Helma, Christoph; Kramer, Stefan; Pfahringer, Bernhard; Gottmann, Eva
(Environmental Health Perspectives, 2000)
Every technique for toxicity prediction and for the detection of structure–activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss the ...