Research Commons

Browsing by Author "Pfahringer, Bernhard"

  • Sun, Quan; Pfahringer, Bernhard (Springer-Verlag, 2013)
    In this paper, we present a novel meta-feature generation method in the context of meta-learning, which is based on rules that compare the performance of individual base learners in a one-against-one manner. In addition ...
  • Torgo, Luís; Ribeiro, Rita P.; Pfahringer, Bernhard; Branco, Paula (Springer, 2013)
    Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that was already studied thoroughly within machine learning. ...
  • Sun, Quan; Pfahringer, Bernhard; Mayo, Michael (Springer, 2013)
    People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now have to face the new problem ...
  • Bifet, Albert; Pfahringer, Bernhard; Read, Jesse; Holmes, Geoffrey (ACM, 2013)
    In the context of a data stream, a classifier must be able to learn from a theoretically-infinite stream of examples using limited time and memory, while being able to predict at any point. Many methods deal with this ...
  • Ienco, Dino; Bifet, Albert; Žliobaitė, Indrė; Pfahringer, Bernhard (Springer, 2013)
    Data labeling is an expensive and time-consuming task. Choosing which labels to use is increasingly becoming important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction ...
  • Bifet, Albert; Read, Jesse; Žliobaitė, Indrė; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2013)
    Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting the underlying distribution from which this data ...
  • Seeland, Madeleine; Kramer, Stefan; Pfahringer, Bernhard (ACM, 2013)
    The choice of a suitable graph kernel is intrinsically hard and often cannot be made in an informed manner for a given dataset. Methods for multiple kernel learning offer a possible remedy, as they combine and weight kernels ...
  • Wicker, Jörg; Pfahringer, Bernhard; Kramer, Stefan (ACM, 2012)
    This paper introduces a new multi-label classifier based on Boolean matrix decomposition. Boolean matrix decomposition is used to extract, from the full label matrix, latent labels representing useful Boolean combinations ...
  • Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (2012)
    Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further ...
  • Read, Jesse; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2012)
    Many real world problems involve the challenging context of data streams, where classifiers must be incremental: able to learn from a theoretically-infinite stream of examples using limited time and memory, while being ...
  • Seeland, Madeleine; Buchwald, Fabian; Kramer, Stefan; Pfahringer, Bernhard (ACM, 2012)
    This paper investigates a simple, yet effective method for regression on graphs, in particular for applications in chem-informatics and for quantitative structure-activity relationships (QSARs). The method combines Locally ...
  • Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, 2012)
    Many challenging real world problems involve multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, ...
  • Bifet, Albert; Frank, Eibe; Holmes, Geoffrey; Pfahringer, Bernhard (Association for Computing Machinery (ACM), 2012)
    The success of simple methods for classification shows that it is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this article, we propose to ...
  • Sun, Quan; Pfahringer, Bernhard (Springer, 2012)
    Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as an ensemble of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on ...
  • Sun, Quan; Pfahringer, Bernhard; Mayo, Michael (ACM, 2012)
    We propose a framework and a novel algorithm for the full model selection (FMS) problem. The proposed algorithm, combining both genetic algorithms (GA) and particle swarm optimization (PSO), is named GPS (which stands for ...
  • Pfahringer, Bernhard (Springer, 2011)
    We present and investigate ensembles of semi-random model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivalling the state of the art in numeric ...
  • Sarjant, Samuel; Pfahringer, Bernhard; Driessens, Kurt; Smith, Tony C. (IEEE, 2011)
    By defining a video-game environment as a collection of objects, relations, actions and rewards, the relational reinforcement learning algorithm presented in this paper generates and optimises a set of concise, human-readable ...
  • Sun, Quan; Pfahringer, Bernhard (2011)
    Ensemble selection has recently appeared as a popular ensemble learning method, not only because its implementation is fairly straightforward, but also due to its excellent predictive performance on practical problems. The ...
  • Evans, Reuben James Emmanuel; Pfahringer, Bernhard; Holmes, Geoffrey (IEEE, 2011)
    Advances in technology have provided industry with an array of devices for collecting data. The frequency and scale of data collection means that there are now many large datasets being generated. To find patterns in these ...
  • Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard; Gavaldà, Ricard (ACM, 2011)
  • Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2010)
    We present and investigate ensembles of randomized model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivaling the state of the art in numeric ...
  • Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2010)
    Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non streaming scenarios. However, learning in evolving streaming scenarios ...
  • Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (Springer-Verlag, 2010)
    Bagging, boosting and Random Forests are classical ensemble methods used to improve the performance of single classifiers. They obtain superior performance by increasing the accuracy and diversity of the single classifiers. ...
  • Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Frank, Eibe (Springer Berlin, 2010)
    Mining of data streams must balance three evaluation dimensions: accuracy, time and memory. Excellent accuracy on data streams has been obtained with Naive Bayes Hoeffding Trees—Hoeffding Trees with naive Bayes models at ...
  • Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H. (2010)
    WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the ...
  • Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas (2010)
    Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling ...
  • Bifet, Albert; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (Massachusetts Institute of Technology Press, 2010)
    Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well ...
  • Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas; Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard (2010)
    In today's applications, evolving data streams are ubiquitous. Stream clustering algorithms were introduced to gain useful knowledge from these streams in real-time. The quality of the obtained clusterings, i.e. how good ...
  • Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey; Frank, Eibe (Springer, 2009)
    The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has been sidelined in the literature due to the perceived inadequacy of its label-independence ...
  • Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2009)
    All around the globe, thousands of learning experiments are being executed on a daily basis, only to be discarded after interpretation. Yet, the information contained in these experiments might have uses beyond their ...
  • Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard (2009)
    We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of ...
  • Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Gavaldà, Ricard (ACM, 2009)
    Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is ...
  • Anderson, Grant; Pfahringer, Bernhard (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2009)
    Random Forests have been shown to perform very well in propositional learning. FORF is an upgrade of Random Forests for relational data. In this paper we investigate shortcomings of FORF and propose an alternative algorithm, ...
  • Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2009)
    Profile Hidden Markov Models (PHMMs) have been widely used as models for Multiple Sequence Alignments. By their nature, they are generative one-class classifiers trained only on sequences belonging to the target class they ...
  • Vanschoren, Joaquin; Pfahringer, Bernhard; Holmes, Geoffrey (2008)
    Thousands of Machine Learning research papers contain experimental comparisons that usually have been conducted with a single focus of interest, and detailed results are usually lost after publication. Once past experiments ...
  • Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2008)
    Hidden Markov Models are a widely used generative model for analysing sequence data. A variant, Profile Hidden Markov Models are a special case used in Bioinformatics to represent, for example, protein families. In this ...
  • Wu, Xing; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, 2008)
    Nearest Neighbour Search (NNS) is one of the top ten data mining algorithms. It is simple and effective but has a time complexity that is the product of the number of instances and the number of dimensions. When the number ...
  • Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (2008)
    Many studies in machine learning try to investigate what makes an algorithm succeed or fail on certain datasets. However, the field is still evolving relatively quickly, and new algorithms, preprocessing methods, learning ...
  • Pfahringer, Bernhard; Anderson, Grant (Springer, Berlin, 2008)
    In this paper we investigate an approach to semi-supervised learning based on randomized propositionalization, which allows for applying standard propositional classification algorithms like support vector machines to ...
  • Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, Berlin, 2008)
    For conventional machine learning classification algorithms handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into pre-defined bins or sort ...
  • Anderson, Grant; Pfahringer, Bernhard (Springer, Berlin, 2008)
    Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying ...
  • Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (2008)
    Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally statistical models like Profile Hidden Markov ...
  • Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey (IEEE, 2008)
    This paper presents a Pruned Sets method (PS) for multi-label classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account ...
  • Pfahringer, Bernhard; Leschi, Claire; Reutemann, Peter (Springer, Berlin, 2007)
    Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve ...
  • Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, 2007)
    Hoeffding trees are state-of-the-art for processing high-speed data streams. Their ingenuity stems from updating sufficient statistics, only addressing growth when decisions can be made that are guaranteed to be almost ...
  • Mutter, Stefan; Pfahringer, Bernhard (2007)
    This paper introduces the first author’s PhD project, which has just completed its initial stage. Biological sequence data is, on the one hand, highly structured. On the other hand there are large amounts of unlabelled ...
  • Pfahringer, Bernhard (2006)
    This document describes a novel semi-supervised approach to spam classification, which was successful at the ECML/PKDD 2006 spam classification challenge. A local learning method based on lazy projections was successfully ...
  • Frank, Eibe; Pfahringer, Bernhard (Springer, Berlin, 2006)
    Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an ensemble of decision ...
  • Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon (2006)
    We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta level analyser from a number of levels constructed over time from a data stream. ...
  • Driessens, Kurt; Reutemann, Peter; Pfahringer, Bernhard; Leschi, Claire (Springer, Berlin, 2006)
    The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where often the ...
  • Pfahringer, Bernhard; Anderson, Grant (2006)
    Exhaustive search in relational learning is generally infeasible, therefore some form of heuristic search is usually employed, such as in FOIL [1]. On the other hand, so-called stochastic discrimination provides a framework ...
  • Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (Springer, Berlin, 2005)
    Hoeffding trees are state-of-the-art in classification for data streams. They perform prediction by choosing the majority class at each leaf. Their predictive accuracy can be increased by adding Naive Bayes models at the ...
  • Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard; Witten, Ian H. (Springer, 2005)
    The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, ...
  • Holmes, Geoffrey; Kirkby, Richard; Pfahringer, Bernhard (2005)
    A thorough examination of the performance of Hoeffding trees, state-of-the-art in classification for data streams, on a range of datasets reveals that tie breaking, an essential but supposedly rare procedure, is employed ...
  • Pfahringer, Bernhard; Reutemann, Peter; Mayo, Michael (2005)
    Text classification is a natural application domain for semi-supervised learning, as labeling documents is expensive, but on the other hand usually an abundance of unlabeled documents is available. We describe a novel ...
  • Li, Mi; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, Berlin, 2005)
    This paper presents a single scan algorithm for clustering large datasets based on a two phase process which combines two well known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with ...
  • Kibriya, Ashraf Masood; Frank, Eibe; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2005)
    This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it ...
  • Reutemann, Peter; Pfahringer, Bernhard; Frank, Eibe (Springer, 2005)
    Most databases employ the relational model for data storage. To use this data in a propositional learner, a propositionalization step has to take place. Similarly, the data has to be transformed to be amenable to a ...
  • Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (2004)
    The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...
  • Blockeel, Hendrik; Džeroski, Sašo; Kompare, Boris; Kramer, Stefan; Pfahringer, Bernhard; Van Laer, Wim (Taylor & Francis, 2004)
    This paper is concerned with the use of AI techniques in ecology. More specifically, we present a novel application of inductive logic programming (ILP) in the area of quantitative structure-activity relationships (QSARs). ...
  • Pfahringer, Bernhard; Holmes, Geoffrey; Wang, Cheng (2004)
    In this paper we report on work in progress based on the induction of vast numbers of almost random rules. This work tries to combine and explore ideas from both Random Forests as well as Stochastic Discrimination. We ...
  • Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon (University of Waikato, Department of Computer Science, 2003)
    The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...
  • Frank, Eibe; Hall, Mark A.; Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2003)
    Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome ...
  • Pfahringer, Bernhard; Holmes, Geoffrey (2003)
    A simple algorithm based on the theory of stochastic discrimination is developed for the fast extraction of sub-graphs with potential discriminative power from a given set of pre-classified graphs. A preliminary experimental ...
  • Weidmann, Nils; Frank, Eibe; Pfahringer, Bernhard (Springer, Berlin, 2003)
    In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag's class label depends on the labels of the instances in the bag and can ...
  • Sauban, Maximilien; Pfahringer, Bernhard (Springer, Berlin, 2003)
    This paper presents an extension of prior work by Michael D. Lee on psychologically plausible text categorisation. Our approach utilises Lee's model as a pre-processing filter to generate a dense representation for a given ...
  • Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A. (University of Waikato, Department of Computer Science, 2002)
    The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ...
  • Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A. (Springer, Berlin, 2002)
    The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ...
  • Pfahringer, Bernhard (2002)
    When considering the merit of data mining challenges, we need to answer the question of whether the amount of academic outcome justifies the related expense of scarce research time. In this paper I will provide anecdotal ...
  • Pfahringer, Bernhard; Holmes, Geoffrey; Schmidberger, Gabi (Springer, Berlin, 2001)
    Wrappers have recently been used to obtain parameter optimizations for learning algorithms. In this paper we investigate the use of a wrapper for estimating the correct number of boosting ensembles in the presence of class ...
  • Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, Berlin, 2001)
    The alternating decision tree brings comprehensibility to the performance enhancing capabilities of boosting. A single interpretable tree is induced wherein knowledge is distributed across the nodes and multiple paths are ...
  • Pfahringer, Bernhard (2001)
    This paper describes my submission to one of the sub-problems formulated for the Predictive Toxicology Challenge 2001. The challenge is to predict the carcinogenicity of chemicals based on structural information only. I ...
  • Fürnkranz, Johannes; Pfahringer, Bernhard; Kaindl, Hermann; Kramer, Stefan (IOS Press, 2000)
    We address the problem of advice-taking in a given domain, in particular for building a game-playing program. Our approach to solving it strives for the application of machine learning techniques throughout, i.e. for ...
  • Kovar, Klaus; Fürnkranz, Johannes; Petrak, Johann; Pfahringer, Bernhard; Trappl, Robert; Widmer, Gerhard (Taylor & Francis, 2000)
    This paper presents an empirical study on the possibility of discovering interesting event sequences and sequential rules in a large database of international political events. A data mining algorithm first presented by ...
  • Kramer, Stefan; Widmer, Gerhard; Pfahringer, Bernhard; de Groeve, Michael (Springer, Berlin, 2000)
    This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with S-CART, a tree induction algorithm, and study various ways of ...
  • Helma, Christoph; Kramer, Stefan; Pfahringer, Bernhard; Gottmann, Eva (Environmental Health Perspectives, 2000)
    Every technique for toxicity prediction and for the detection of structure–activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss the ...

Supervised by Bernhard Pfahringer

Showing up to 5 theses - most recently added to Research Commons first.

  • Sun, Quan (University of Waikato, 2014)
    When working as a data analyst, one of my daily tasks is to select appropriate tools from a set of existing data analysis techniques in my toolbox, including data preprocessing, outlier detection, feature selection, learning ...
  • Sarjant, Samuel (University of Waikato, 2013)
    Relational Reinforcement Learning (RRL) is a subfield of machine learning in which a learning agent seeks to maximise a numerical reward within an environment, represented as collections of objects and relations, by ...
  • Mutter, Stefan (University of Waikato, 2011)
    Detecting similarity in biological sequences is a key element to understanding the mechanisms of life. Researchers infer potential structural, functional or evolutionary relationships from similarity. However, the concept ...
  • Read, Jesse (University of Waikato, 2010)
    Multi-label classification is relevant to many domains, such as text, image and other media, and bioinformatics. Researchers have already noticed that in multi-label data, correlations exist between labels, and a variety ...