Browsing by Author "Pfahringer, Bernhard"
Now showing items 1-100 of 101

Probability calibration trees
Leathart, Tim; Frank, Eibe; Holmes, Geoffrey; Pfahringer, Bernhard (2017)Obtaining accurate and well calibrated probability estimates from classifiers is useful in many applications, for example, when minimising the expected cost of classifications. Existing methods of calibrating probability ... 
A survey on feature drift adaptation: Definition, benchmark, challenges and future directions
Barddal, Jean Paul; Gomes, Heitor Murilo; Enembreck, Fabricio; Pfahringer, Bernhard (Elsevier, 2017) Data stream mining is a fast growing research topic due to the ubiquity of data in several real-world problems. Given their ephemeral nature, data stream sources are expected to undergo changes in data distribution, a ...
Static techniques for reducing memory usage in the C implementation of Whiley programs
Weng, Min-Hsien; Pfahringer, Bernhard; Utting, Mark (ACM, 2017) Languages that use call-by-value semantics, such as Whiley, can make program verification easier. But efficient implementation becomes harder, due to the overhead of copying and garbage collection. This paper describes how ...
Building a Twitter opinion lexicon from automatically-annotated tweets
Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard (Elsevier, 2016-09-15) Opinion lexicons, which are lists of terms labelled by sentiment, are widely used resources to support automatic sentiment analysis of textual passages. However, existing resources of this type exhibit some limitations ...
Annotate-Sample-Average (ASA): A New Distant Supervision Approach for Twitter Sentiment Analysis
Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard (IOS Press, 2016-01-01) The classification of tweets into polarity classes is a popular task in sentiment analysis. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. ...
Learning Distance Metrics for Multi-Label Classification
Gouk, Henry; Pfahringer, Bernhard; Cree, Michael J. (2016)Distance metric learning is a well studied problem in the field of machine learning, where it is typically used to improve the accuracy of instance based learning techniques. In this paper we propose a distance metric ... 
Building ensembles of adaptive nested dichotomies with random-pair selection
Leathart, Tim; Pfahringer, Bernhard; Frank, Eibe (Springer, 2016)A system of nested dichotomies is a method of decomposing a multiclass problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains ... 
From opinion lexicons to sentiment classification of tweets and vice versa: a transfer learning approach
Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard (IEEE Computer Society, 2016) Message-level and word-level polarity classification are two popular tasks in Twitter sentiment analysis. They have been commonly addressed by training supervised models from labelled data. The main limitation of these ...
Determining word–emotion associations from tweets by multilabel classification
Bravo-Marquez, Felipe; Frank, Eibe; Mohammad, Saif M.; Pfahringer, Bernhard (IEEE Computer Society, 2016) The automatic detection of emotions in Twitter posts is a challenging task due to the informal nature of the language used in this platform. In this paper, we propose a methodology for expanding the NRC word-emotion ...
On dynamic feature weighting for feature drifting data streams
Barddal, Jean Paul; Gomes, Heitor Murilo; Enembreck, Fabricio; Pfahringer, Bernhard; Bifet, Albert (Springer, 2016) The ubiquity of data streams has been encouraging the development of new incremental and adaptive learning algorithms. Data stream learners must be fast, memory-bounded, but mainly, tailored to adapt to possible changes ...
MEKA: A multilabel/multitarget extension to WEKA
Read, Jesse; Reutemann, Peter; Pfahringer, Bernhard; Holmes, Geoffrey (2016) Multilabel classification has rapidly attracted interest in the machine learning literature, and there are now a large number and considerable variety of methods for this type of learning. We present MEKA: an open-source ...
Having a Blast: Meta-Learning and Heterogeneous Ensembles for Data Streams
van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin (IEEE, 2015-01-01) Ensembles of classifiers are among the best performing classifiers available in many data mining applications. However, most ensembles developed specifically for the dynamic data stream setting rely on only one type of ...
Positive, Negative, or Neutral: Learning an Expanded Opinion Lexicon from Emoticon-annotated Tweets
Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard (AAAI Press, 2015) We present a supervised framework for expanding an opinion lexicon for tweets. The lexicon contains part-of-speech (POS) disambiguated entries with a three-dimensional probability distribution for positive, negative, and ...
From unlabelled tweets to Twitter-specific opinion words
Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard (ACM, 2015) In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: ...
Case study on bagging stable classifiers for data streams
van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin (2015) Ensembles of classifiers are among the strongest classifiers in most data mining applications. Bagging ensembles exploit the instability of base-classifiers by training them on different bootstrap replicates. It has been ...
Evaluation methods and decision theory for classification of streaming data with temporal dependence
Žliobaitė, Indrė; Bifet, Albert; Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2015)Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution is often evolving over ... 
Bound analysis for Whiley programs
Weng, Min-Hsien; Utting, Mark; Pfahringer, Bernhard (Elsevier, 2015) The Whiley compiler can generate naive C code, but the code is inefficient because it uses infinite integers and dynamic array sizes. Our project goal is to build up a compiler that can translate Whiley programs into ...
Use of Ensembles of Fourier Spectra in Capturing Recurrent Concepts in Data Streams
Sakthithasan, Sakthithasan; Pears, Russel; Bifet, Albert; Pfahringer, Bernhard (IEEE, 2015)In this research, we apply ensembles of Fourier encoded spectra to capture and mine recurring concepts in a data stream environment. Previous research showed that compact versions of Decision Trees can be obtained by ... 
Efficient online evaluation of big data stream classifiers
Bifet, Albert; de Francisci Morales, Gianmarco; Read, Jesse; Holmes, Geoffrey; Pfahringer, Bernhard (ACM, 2015) The evaluation of classifiers in data streams is fundamental so that poorly-performing models can be identified, and either improved or replaced by better-performing models. This is an increasingly relevant and important ...
Change detection in categorical evolving data streams
Ienco, Dino; Bifet, Albert; Pfahringer, Bernhard; Poncelet, Pascal (ACM, 2014)Detecting change in evolving data streams is a central issue for accurate adaptive learning. In real world applications, data streams have categorical features, and changes induced in the data distribution of these categorical ... 
Hierarchical meta-rules for scalable meta-learning
Sun, Quan; Pfahringer, Bernhard (Springer Verlag, 2014) The Pairwise Meta-Rules (PMR) method proposed in [18] has been shown to improve the predictive performances of several meta-learning algorithms for the algorithm ranking problem. Given m target objects (e.g., algorithms), ...
Algorithm selection on data streams
van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin (Springer International Publishing, 2014) We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment we calculate the characteristics of a small sample of a data stream, and try to predict which classifier ...
Towards Meta-learning over Data Streams
van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin (CEUR-WS, 2014) Modern society produces vast streams of data. Many stream mining algorithms have been developed to capture general trends in these streams, and make predictions for future observations, but relatively little is known about ...
Pairwise meta-rules for better meta-learning-based algorithm ranking
Sun, Quan; Pfahringer, Bernhard (Springer-Verlag, 2013-07) In this paper, we present a novel meta-feature generation method in the context of meta-learning, which is based on rules that compare the performance of individual base learners in a one-against-one manner. In addition ...
Model selection based product kernel learning for regression on graphs
Seeland, Madeleine; Kramer, Stefan; Pfahringer, Bernhard (ACM, 2013)The choice of a suitable graph kernel is intrinsically hard and often cannot be made in an informed manner for a given dataset. Methods for multiple kernel learning offer a possible remedy, as they combine and weight kernels ... 
Towards a framework for designing full model selection and optimization systems
Sun, Quan; Pfahringer, Bernhard; Mayo, Michael (Springer, 2013) People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now have to face the new problem ...
Clustering based active learning for evolving data streams
Ienco, Dino; Bifet, Albert; Žliobaitė, Indrė; Pfahringer, Bernhard (Springer, 2013) Data labeling is an expensive and time-consuming task. Choosing which labels to use is increasingly becoming important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction ...
SMOTE for regression
Torgo, Luís; Ribeiro, Rita P.; Pfahringer, Bernhard; Branco, Paula (Springer, 2013)Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that was already studied thoroughly within machine learning. ... 
Pitfalls in benchmarking data stream classification and how to avoid them
Bifet, Albert; Read, Jesse; Žliobaitė, Indrė; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2013)Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting the underlying distribution from which this data ... 
Propositionalisation of multi-instance data using random forests
Frank, Eibe; Pfahringer, Bernhard (Springer, 2013) Multi-instance learning is a generalisation of attribute-value learning where examples for learning consist of labeled bags (i.e. multisets) of instances. This learning setting is more computationally challenging than ...
Efficient data stream classification via probabilistic adaptive windows
Bifet, Albert; Pfahringer, Bernhard; Read, Jesse; Holmes, Geoffrey (ACM, 2013) In the context of a data stream, a classifier must be able to learn from a theoretically infinite stream of examples using limited time and memory, while being able to predict at any point. Many methods deal with this ...
Bagging ensemble selection for regression
Sun, Quan; Pfahringer, Bernhard (Springer, 2012)Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as an ensemble of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on ... 
Full model selection in the space of data mining operators
Sun, Quan; Pfahringer, Bernhard; Mayo, Michael (ACM, 2012)We propose a framework and a novel algorithm for the full model selection (FMS) problem. The proposed algorithm, combining both genetic algorithms (GA) and particle swarm optimization (PSO), is named GPS (which stands for ... 
Scalable and efficient multilabel classification for evolving data streams
Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, 2012) Many challenging real world problems involve multilabel data streams. Efficient methods exist for multilabel classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, ...
Maximum Common Subgraph based locally weighted regression
Seeland, Madeleine; Buchwald, Fabian; Kramer, Stefan; Pfahringer, Bernhard (ACM, 2012) This paper investigates a simple, yet effective method for regression on graphs, in particular for applications in cheminformatics and for quantitative structure-activity relationships (QSARs). The method combines Locally ...
Multilabel classification using boolean matrix decomposition
Wicker, Jörg; Pfahringer, Bernhard; Kramer, Stefan (ACM, 2012)This paper introduces a new multilabel classifier based on Boolean matrix decomposition. Boolean matrix decomposition is used to extract, from the full label matrix, latent labels representing useful Boolean combinations ... 
Batch-incremental versus instance-incremental learning in dynamic and evolving data
Read, Jesse; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2012)Many real world problems involve the challenging context of data streams, where classifiers must be incremental: able to learn from a theoretically infinite stream of examples using limited time and memory, while being ... 
Ensembles of restricted Hoeffding trees
Bifet, Albert; Frank, Eibe; Holmes, Geoffrey; Pfahringer, Bernhard (Association for Computing Machinery (ACM), 2012) The success of simple methods for classification shows that it is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this article, we propose to ...
Experiment databases: A new way to share, organize and learn from experiments
Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2012)Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further ... 
Using the online cross-entropy method to learn relational policies for playing different games
Sarjant, Samuel; Pfahringer, Bernhard; Driessens, Kurt; Smith, Tony C. (IEEE, 2011) By defining a videogame environment as a collection of objects, relations, actions and rewards, the relational reinforcement learning algorithm presented in this paper generates and optimises a set of concise, human-readable ...
Clustering for classification
Evans, Reuben James Emmanuel; Pfahringer, Bernhard; Holmes, Geoffrey (IEEE, 2011)Advances in technology have provided industry with an array of devices for collecting data. The frequency and scale of data collection means that there are now many large datasets being generated. To find patterns in these ... 
Semi-random model tree ensembles: An effective and scalable regression method
Pfahringer, Bernhard (Springer, 2011) We present and investigate ensembles of semi-random model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivalling the state of the art in numeric ...
Bagging ensemble selection
Sun, Quan; Pfahringer, Bernhard (Springer, 2011)Ensemble selection has recently appeared as a popular ensemble learning method, not only because its implementation is fairly straightforward, but also due to its excellent predictive performance on practical problems. The ... 
Detecting sentiment change in Twitter streaming data
Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard (JMLR, 2011) MOA-TweetReader is a real-time system to read tweets in real time, to detect changes, and to find the terms whose frequency changed. Twitter is a microblogging service built to discover what is happening at any moment in ...
Mining frequent closed graphs on evolving data streams
Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard (ACM, 2011) Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to ...
Random model trees: an effective and scalable regression method
Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2010-06) We present and investigate ensembles of randomized model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivaling the state of the art in numeric ...
Efficient multilabel classification for evolving data streams
Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2010-05) Many real world problems involve data which can be considered as multilabel data streams. Efficient methods exist for multilabel classification in non-streaming scenarios. However, learning in evolving streaming scenarios ...
Clustering performance on evolving data streams: Assessing algorithms and evaluation measures within MOA
Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas; Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard (IEEE Computer Society, 2010) In today's applications, evolving data streams are ubiquitous. Stream clustering algorithms were introduced to gain useful knowledge from these streams in real-time. The quality of the obtained clusterings, i.e. how good ...
Fast perceptron decision tree learning from evolving data streams
Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Frank, Eibe (Springer Berlin, 2010)Mining of data streams must balance three evaluation dimensions: accuracy, time and memory. Excellent accuracy on data streams has been obtained with Naive Bayes Hoeffding Trees—Hoeffding Trees with naive Bayes models at ... 
MOA: Massive Online Analysis, a framework for stream classification and clustering.
Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas (JMLR, 2010)Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling ... 
MOA: Massive Online Analysis
Bifet, Albert; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (Massachusetts Institute of Technology Press, 2010)Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well ... 
Leveraging bagging for evolving data streams
Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (Springer-Verlag, 2010) Bagging, boosting and Random Forests are classical ensemble methods used to improve the performance of single classifiers. They obtain superior performance by increasing the accuracy and diversity of the single classifiers. ...
WEKA−Experiences with a Java open-source project
Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H. (Microtome Publishing, 2010)WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the ... 
Improving adaptive bagging methods for evolving data streams
Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard (Springer, 2009) We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of ...
The positive effects of negative information: Extending one-class classification models in binary proteomic sequence classification
Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2009) Profile Hidden Markov Models (PHMMs) have been widely used as models for Multiple Sequence Alignments. By their nature, they are generative one-class classifiers trained only on sequences belonging to the target class they ...
Relational random forests based on random relational rules
Anderson, Grant; Pfahringer, Bernhard (Morgan Kaufmann Publishers Inc. San Francisco, CA, USA, 2009)Random Forests have been shown to perform very well in propositional learning. FORF is an upgrade of Random Forests for relational data. In this paper we investigate shortcomings of FORF and propose an alternative algorithm, ... 
Organizing the World’s Machine Learning Information
Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2009)All around the globe, thousands of learning experiments are being executed on a daily basis, only to be discarded after interpretation. Yet, the information contained in these experiments might have uses beyond their ... 
Classifier chains for multilabel classification
Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey; Frank, Eibe (Springer, 2009) The widely known binary relevance method for multilabel classification, which considers each label as an independent binary problem, has been sidelined in the literature due to the perceived inadequacy of its label-independence ...
New ensemble methods for evolving data streams
Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Gavaldà, Ricard (ACM, 2009)Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is ... 
Learning from the past with experiment databases
Vanschoren, Joaquin; Pfahringer, Bernhard; Holmes, Geoffrey (University of Waikato, Department of Computer Science, 2008-06-24) Thousands of Machine Learning research papers contain experimental comparisons that usually have been conducted with a single focus of interest, and detailed results are usually lost after publication. Once past experiments ...
Experiment Databases: Creating a New Platform for Meta-Learning Research
Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (University of Porto, 2008)Many studies in machine learning try to investigate what makes an algorithm succeed or fail on certain datasets. However, the field is still evolving relatively quickly, and new algorithms, preprocessing methods, learning ... 
Exploiting propositionalization based on random relational rules for semi-supervised learning
Pfahringer, Bernhard; Anderson, Grant (Springer, Berlin, 2008) In this paper we investigate an approach to semi-supervised learning based on randomized propositionalization, which allows for applying standard propositional classification algorithms like support vector machines to ...
Handling numeric attributes in Hoeffding trees
Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, Berlin, 2008)For conventional machine learning classification algorithms handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into predefined bins or sort ... 
Clustering Relational Data Based on Randomized Propositionalization
Anderson, Grant; Pfahringer, Bernhard (Springer, Berlin, 2008)Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying ... 
Mining Arbitrarily Large Datasets Using Heuristic k-Nearest Neighbour Search
Wu, Xing; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, 2008)Nearest Neighbour Search (NNS) is one of the top ten data mining algorithms. It is simple and effective but has a time complexity that is the product of the number of instances and the number of dimensions. When the number ... 
Propositionalisation of Profile Hidden Markov Models for Biological Sequence Analysis
Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2008)Hidden Markov Models are a widely used generative model for analysing sequence data. A variant, Profile Hidden Markov Models are a special case used in Bioinformatics to represent, for example, protein families. In this ... 
Propositionalisation of multiple sequence alignments using probabilistic models
Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (Canterbury University, 2008)Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally statistical models like Profile Hidden Markov ... 
Multilabel classification using ensembles of pruned sets
Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey (IEEE, 2008)This paper presents a Pruned Sets method (PS) for multilabel classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account ... 
Scaling up semi-supervised learning: An efficient and effective LLGC variant
Pfahringer, Bernhard; Leschi, Claire; Reutemann, Peter (Springer, Berlin, 2007) Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve ...
A discriminative approach to structured biological data
Mutter, Stefan; Pfahringer, Bernhard (The University of Waikato, 2007)This paper introduces the first author’s PhD project which has just got out of its initial stage. Biological sequence data is, on the one hand, highly structured. On the other hand there are large amounts of unlabelled ... 
New Options for Hoeffding Trees
Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, 2007) Hoeffding trees are state-of-the-art for processing high-speed data streams. Their ingenuity stems from updating sufficient statistics, only addressing growth when decisions can be made that are guaranteed to be almost ...
Cache Hierarchy Inspired Compression: a Novel Architecture for Data Streams
Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon (2006)We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta level analyser from a number of levels constructed over time from a data stream. ... 
Random Relational Rules
Pfahringer, Bernhard; Anderson, Grant (2006) Exhaustive search in relational learning is generally infeasible, therefore some form of heuristic search is usually employed, such as in FOIL[1]. On the other hand, so-called stochastic discrimination provides a framework ...
Using weighted nearest neighbor to benefit from unlabeled data
Driessens, Kurt; Reutemann, Peter; Pfahringer, Bernhard; Leschi, Claire (Springer, Berlin, 2006) The development of data-mining applications such as text-classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where often the ...
Improving on bagging with input smearing
Frank, Eibe; Pfahringer, Bernhard (Springer, Berlin, 2006)Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an ensemble of decision ... 
A semi-supervised spam mail detector
Pfahringer, Bernhard (2006) This document describes a novel semi-supervised approach to spam classification, which was successful at the ECML/PKDD 2006 spam classification challenge. A local learning method based on lazy projections was successfully ...
Stress testing Hoeffding trees
Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (Springer, Berlin, 2005) Hoeffding trees are state-of-the-art in classification for data streams. They perform prediction by choosing the majority class at each leaf. Their predictive accuracy can be increased by adding Naive Bayes models at the ...
A novel two stage scheme utilizing the test set for model selection in text classification
Pfahringer, Bernhard; Reutemann, Peter; Mayo, Michael (University of Technology, Sydney, 2005) Text classification is a natural application domain for semi-supervised learning, as labeling documents is expensive, but on the other hand usually an abundance of unlabeled documents is available. We describe a novel ...
Tie-breaking in Hoeffding trees
Holmes, Geoffrey; Kirkby, Richard; Pfahringer, Bernhard (ECML/PKDD, 2005) A thorough examination of the performance of Hoeffding trees, state-of-the-art in classification for data streams, on a range of datasets reveals that tie breaking, an essential but supposedly rare procedure, is employed ...
A Toolbox for Learning from Relational Data with Propositional and Multi-instance Learners
Reutemann, Peter; Pfahringer, Bernhard; Frank, Eibe (Springer, 2005)Most databases employ the relational model for data storage. To use this data in a propositional learner, a propositionalization step has to take place. Similarly, the data has to be transformed to be amenable to a ... 
Weka: A machine learning workbench for data mining
Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard; Witten, Ian H. (Springer, 2005) The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, ...
Multinomial naive Bayes for text categorization revisited
Kibriya, Ashraf Masood; Frank, Eibe; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2005)This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it ... 
Clustering large datasets using cobweb and K-means in tandem
Li, Mi; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, Berlin, 2005)This paper presents a single scan algorithm for clustering large datasets based on a two phase process which combines two well known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with ... 
Mining data streams using option trees (revised edition, 2004)
Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (Department of Computer Science, The University of Waikato, 2004-01-01) The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...
Millions of random rules
Pfahringer, Bernhard; Holmes, Geoffrey; Wang, Cheng (Knowledge Engineering Group, 2004)In this paper we report on work in progress based on the induction of vast numbers of almost random rules. This work tries to combine and explore ideas from both Random Forests as well as Stochastic Discrimination. We ... 
Experiments in Predicting Biodegradability
Blockeel, Hendrik; Džeroski, Sašo; Kompare, Boris; Kramer, Stefan; Pfahringer, Bernhard; Van Laer, Wim (Taylor & Francis, 2004) This paper is concerned with the use of AI techniques in ecology. More specifically, we present a novel application of inductive logic programming (ILP) in the area of quantitative structure-activity relationships (QSARs). ...
Mining data streams using option trees
Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon (University of Waikato, Department of Computer Science, 2003-09)The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ... 
Locally weighted naive Bayes
Frank, Eibe; Hall, Mark A.; Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2003-04)Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome ... 
Propositionalization through stochastic discrimination
Pfahringer, Bernhard; Holmes, Geoffrey (Department of Informatics, University of Szeged, 2003)A simple algorithm based on the theory of stochastic discrimination is developed for the fast extraction of subgraphs with potential discriminative power from a given set of preclassified graphs. A preliminary experimental ... 
Text categorisation using document profiling
Sauban, Maximilien; Pfahringer, Bernhard (Springer, Berlin, 2003)This paper presents an extension of prior work by Michael D. Lee on psychologically plausible text categorisation. Our approach utilises Lee's model as a preprocessing filter to generate a dense representation for a given ... 
A two-level learning method for generalized multi-instance problems
Weidmann, Nils; Frank, Eibe; Pfahringer, Bernhard (Springer, Berlin, 2003)In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag's class label depends on the labels of the instances in the bag and can ... 
A logic boosting approach to inducing multiclass alternating decision trees
Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A. (University of Waikato, Department of Computer Science, 2002-03)The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ... 
Multiclass alternating decision trees
Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A. (Springer, Berlin, 2002)The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ... 
Data mining challenge problems: any lessons learned?
Pfahringer, Bernhard (2002)When considering the merit of data mining challenges, we need to answer the question of whether the amount of academic outcome justifies the related expense of scarce research time. In this paper I will provide anecdotal ... 
(The Futility of) Trying to Predict Carcinogenicity of Chemical Compounds
Pfahringer, Bernhard (2001)This paper describes my submission to one of the subproblems formulated for the Predictive Toxicology Challenge 2001. The challenge is to predict the carcinogenicity of chemicals based on structural information only. I ... 
Wrapping boosters against noise
Pfahringer, Bernhard; Holmes, Geoffrey; Schmidberger, Gabi (Springer, Berlin, 2001)Wrappers have recently been used to obtain parameter optimizations for learning algorithms. In this paper we investigate the use of a wrapper for estimating the correct number of boosting ensembles in the presence of class ... 
Optimizing the induction of alternating decision trees
Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, Berlin, 2001)The alternating decision tree brings comprehensibility to the performance enhancing capabilities of boosting. A single interpretable tree is induced wherein knowledge is distributed across the nodes and multiple paths are ... 
Learning to use operational advice
Fürnkranz, Johannes; Pfahringer, Bernhard; Kaindl, Hermann; Kramer, Stefan (IOS Press, 2000)We address the problem of advice-taking in a given domain, in particular for building a game-playing program. Our approach to solving it strives for the application of machine learning techniques throughout, i.e. for ... 
Data Quality in Predictive Toxicology: Identification of Chemical Structures and Calculation of Chemical Descriptors
Helma, Christoph; Kramer, Stefan; Pfahringer, Bernhard; Gottmann, Eva (Environmental Health Perspectives, 2000)Every technique for toxicity prediction and for the detection of structure–activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss the ... 
Searching for patterns in political event sequences: Experiments with the KEDs database
Kovar, Klaus; Fürnkranz, Johannes; Petrak, Johann; Pfahringer, Bernhard; Trappl, Robert; Widmer, Gerhard (Taylor & Francis, 2000)This paper presents an empirical study on the possibility of discovering interesting event sequences and sequential rules in a large database of international political events. A data mining algorithm first presented by ...
Coauthors for Bernhard Pfahringer
Bernhard Pfahringer has 72 coauthors in Research Commons. Showing the 30 most frequent coauthors.
 Grant Anderson
 Jean Paul Barddal
 Albert Bifet
 Hendrik Blockeel
 Felipe Bravo-Marquez
 Kurt Driessens
 Fabricio Enembreck
 Eibe Frank
 Johannes Fürnkranz
 Ricard Gavaldà
 Heitor Murilo Gomes
 Mark A. Hall
 Geoffrey Holmes
 Dino Ienco
 Richard Brendon Kirkby
 Stefan Kramer
 Philipp Kranen
 Hardy Kremer
 Tim Leathart
 Claire Leschi
 Michael Mayo
 Stefan Mutter
 Jesse Read
 Peter Reutemann
 Madeleine Seeland
 Thomas Seidl
 Quan Sun
 Joaquin Vanschoren
 Jan N. van Rijn
 Indrė Žliobaitė
Supervised by Bernhard Pfahringer
Showing up to 5 theses, most recently added to Research Commons first.

Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis
Bravo-Marquez, Felipe (University of Waikato, 2017)The most popular sentiment analysis task in Twitter is the automatic classification of tweets into sentiment categories such as positive, negative, and neutral. State-of-the-art solutions to this problem are based on ... 
Heterogeneous Computing for Data Stream Mining
Petko, Vladimir (University of Waikato, 2016)Graphical Processing Units are de facto standard for acceleration of data parallel tasks in high performance computing. They are widely used to accelerate batch machine learning algorithms. High-end discrete GPUs are ... 
Meta-Learning and the Full Model Selection Problem
Sun, Quan (University of Waikato, 2014)When working as a data analyst, one of my daily tasks is to select appropriate tools from a set of existing data analysis techniques in my toolbox, including data preprocessing, outlier detection, feature selection, learning ... 
Policy Search Based Relational Reinforcement Learning using the Cross-Entropy Method
Sarjant, Samuel (University of Waikato, 2013)Relational Reinforcement Learning (RRL) is a subfield of machine learning in which a learning agent seeks to maximise a numerical reward within an environment, represented as collections of objects and relations, by ... 
Sequence-based protein classification: binary Profile Hidden Markov Models and propositionalisation
Mutter, Stefan (University of Waikato, 2011)Detecting similarity in biological sequences is a key element to understanding the mechanisms of life. Researchers infer potential structural, functional or evolutionary relationships from similarity. However, the concept ...