  • MEKA: A multi-label/multi-target extension to WEKA

    Read, Jesse; Reutemann, Peter; Pfahringer, Bernhard; Holmes, Geoff (2016)
    Multi-label classification has rapidly attracted interest in the machine learning literature, and there are now a large number and considerable variety of methods for this type of learning. We present MEKA: an open-source ...
  • Case study on bagging stable classifiers for data streams

    van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin (2015)
    Ensembles of classifiers are among the strongest classifiers in most data mining applications. Bagging ensembles exploit the instability of base-classifiers by training them on different bootstrap replicates. It has been ...
  • Efficient online evaluation of big data stream classifiers

    Bifet, Albert; de Francisci Morales, Gianmarco; Read, Jesse; Holmes, Geoff; Pfahringer, Bernhard (ACM, 2015)
    The evaluation of classifiers in data streams is fundamental so that poorly-performing models can be identified, and either improved or replaced by better-performing models. This is an increasingly relevant and important ...
  • From unlabelled tweets to Twitter-specific opinion words

    Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard (ACM, 2015)
    In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: ...
  • Positive, Negative, or Neutral: Learning an Expanded Opinion Lexicon from Emoticon-annotated Tweets

    Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard (AAAI Press, 2015)
    We present a supervised framework for expanding an opinion lexicon for tweets. The lexicon contains part-of-speech (POS) disambiguated entries with a three-dimensional probability distribution for positive, negative, and ...
  • Towards Meta-learning over Data Streams

    van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin (CEUR-WS, 2014)
    Modern society produces vast streams of data. Many stream mining algorithms have been developed to capture general trends in these streams, and make predictions for future observations, but relatively little is known about ...
  • Hierarchical meta-rules for scalable meta-learning

    Sun, Quan; Pfahringer, Bernhard (Springer Verlag, 2014)
    The Pairwise Meta-Rules (PMR) method proposed in [18] has been shown to improve the predictive performances of several metalearning algorithms for the algorithm ranking problem. Given m target objects (e.g., algorithms), ...
  • Algorithm selection on data streams

    van Rijn, Jan N.; Holmes, Geoffrey; Pfahringer, Bernhard; Vanschoren, Joaquin (Springer International Publishing, 2014)
    We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment we calculate the characteristics of a small sample of a data stream, and try to predict which classifier ...
  • Change detection in categorical evolving data streams

    Ienco, Dino; Bifet, Albert; Pfahringer, Bernhard; Poncelet, Pascal (ACM, 2014)
    Detecting change in evolving data streams is a central issue for accurate adaptive learning. In real world applications, data streams have categorical features, and changes induced in the data distribution of these categorical ...
  • Evaluation methods and decision theory for classification of streaming data with temporal dependence

    Žliobaitė, Indrė; Bifet, Albert; Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2014)
    Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution is often evolving over ...
  • Pairwise meta-rules for better meta-learning-based algorithm ranking

    Sun, Quan; Pfahringer, Bernhard (Springer-Verlag, 2013-07)
    In this paper, we present a novel meta-feature generation method in the context of meta-learning, which is based on rules that compare the performance of individual base learners in a one-against-one manner. In addition ...
  • Model selection based product kernel learning for regression on graphs

    Seeland, Madeleine; Kramer, Stefan; Pfahringer, Bernhard (ACM, 2013)
    The choice of a suitable graph kernel is intrinsically hard and often cannot be made in an informed manner for a given dataset. Methods for multiple kernel learning offer a possible remedy, as they combine and weight kernels ...
  • Efficient data stream classification via probabilistic adaptive windows

    Bifet, Albert; Pfahringer, Bernhard; Read, Jesse; Holmes, Geoffrey (ACM, 2013)
    In the context of a data stream, a classifier must be able to learn from a theoretically-infinite stream of examples using limited time and memory, while being able to predict at any point. Many methods deal with this ...
  • SMOTE for regression

    Torgo, Luís; Ribeiro, Rita P.; Pfahringer, Bernhard; Branco, Paula (Springer, 2013)
    Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that was already studied thoroughly within machine learning. ...
  • Clustering based active learning for evolving data streams

    Ienco, Dino; Bifet, Albert; Žliobaitė, Indrė; Pfahringer, Bernhard (Springer, 2013)
    Data labeling is an expensive and time-consuming task. Choosing which labels to use is increasingly becoming important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction ...
  • Towards a framework for designing full model selection and optimization systems

    Sun, Quan; Pfahringer, Bernhard; Mayo, Michael (Springer, 2013)
    People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now have to face the new problem ...
  • Pitfalls in benchmarking data stream classification and how to avoid them

    Bifet, Albert; Read, Jesse; Žliobaitė, Indrė; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2013)
    Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting the underlying distribution from which this data ...
  • Multi-label classification using boolean matrix decomposition

    Wicker, Jörg; Pfahringer, Bernhard; Kramer, Stefan (ACM, 2012)
    This paper introduces a new multi-label classifier based on Boolean matrix decomposition. Boolean matrix decomposition is used to extract, from the full label matrix, latent labels representing useful Boolean combinations ...
  • Maximum Common Subgraph based locally weighted regression

    Seeland, Madeleine; Buchwald, Fabian; Kramer, Stefan; Pfahringer, Bernhard (ACM, 2012)
    This paper investigates a simple, yet effective method for regression on graphs, in particular for applications in cheminformatics and for quantitative structure-activity relationships (QSARs). The method combines Locally ...
  • Full model selection in the space of data mining operators

    Sun, Quan; Pfahringer, Bernhard; Mayo, Michael (ACM, 2012)
    We propose a framework and a novel algorithm for the full model selection (FMS) problem. The proposed algorithm, combining both genetic algorithms (GA) and particle swarm optimization (PSO), is named GPS (which stands for ...
  • Bagging ensemble selection for regression

    Sun, Quan; Pfahringer, Bernhard (Springer, 2012)
    Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as an ensemble of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on ...
  • Ensembles of restricted Hoeffding trees

    Bifet, Albert; Frank, Eibe; Holmes, Geoffrey; Pfahringer, Bernhard (Association for Computing Machinery (ACM), 2012)
    The success of simple methods for classification shows that it is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this article, we propose to ...
  • Experiment databases: A new way to share, organize and learn from experiments

    Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (2012)
    Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further ...
  • Scalable and efficient multi-label classification for evolving data streams

    Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, 2012)
    Many challenging real world problems involve multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, ...
  • Batch-incremental versus instance-incremental learning in dynamic and evolving data

    Read, Jesse; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2012)
    Many real world problems involve the challenging context of data streams, where classifiers must be incremental: able to learn from a theoretically-infinite stream of examples using limited time and memory, while being ...
  • Semi-random model tree ensembles: An effective and scalable regression method

    Pfahringer, Bernhard (Springer, 2011)
    We present and investigate ensembles of semi-random model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivalling the state of the art in numeric ...
  • Clustering for classification

    Evans, Reuben James Emmanuel; Pfahringer, Bernhard; Holmes, Geoffrey (IEEE, 2011)
    Advances in technology have provided industry with an array of devices for collecting data. The frequency and scale of data collection means that there are now many large datasets being generated. To find patterns in these ...
  • Using the online cross-entropy method to learn relational policies for playing different games

    Sarjant, Samuel; Pfahringer, Bernhard; Driessens, Kurt; Smith, Tony C. (IEEE, 2011)
    By defining a video-game environment as a collection of objects, relations, actions and rewards, the relational reinforcement learning algorithm presented in this paper generates and optimises a set of concise, human-readable ...
  • Bagging ensemble selection

    Sun, Quan; Pfahringer, Bernhard (2011)
    Ensemble selection has recently appeared as a popular ensemble learning method, not only because its implementation is fairly straightforward, but also due to its excellent predictive performance on practical problems. The ...
  • Mining frequent closed graphs on evolving data streams

    Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard (ACM, 2011)
    Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to ...
  • Random model trees: an effective and scalable regression method

    Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2010-06)
    We present and investigate ensembles of randomized model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivaling the state of the art in numeric ...
  • Efficient multi-label classification for evolving data streams

    Read, Jesse; Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2010-05)
    Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios ...
  • Clustering performance on evolving data streams: Assessing algorithms and evaluation measures within MOA

    Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas; Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard (2010)
    In today's applications, evolving data streams are ubiquitous. Stream clustering algorithms were introduced to gain useful knowledge from these streams in real-time. The quality of the obtained clusterings, i.e. how good ...
  • MOA: Massive Online Analysis, a framework for stream classification and clustering

    Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas (2010)
    Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling ...
  • Fast perceptron decision tree learning from evolving data streams

    Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Frank, Eibe (Springer Berlin, 2010)
    Mining of data streams must balance three evaluation dimensions: accuracy, time and memory. Excellent accuracy on data streams has been obtained with Naive Bayes Hoeffding Trees—Hoeffding Trees with naive Bayes models at ...
  • Leveraging bagging for evolving data streams

    Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard (Springer-Verlag, 2010)
    Bagging, boosting and Random Forests are classical ensemble methods used to improve the performance of single classifiers. They obtain superior performance by increasing the accuracy and diversity of the single classifiers. ...
  • MOA: Massive Online Analysis

    Bifet, Albert; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (Massachusetts Institute of Technology Press, 2010)
    Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well ...
  • WEKA: Experiences with a Java open-source project

    Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H. (2010)
    WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the ...
  • Organizing the World’s Machine Learning Information

    Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2009)
    All around the globe, thousands of learning experiments are being executed on a daily basis, only to be discarded after interpretation. Yet, the information contained in these experiments might have uses beyond their ...
  • The positive effects of negative information: Extending one-class classification models in binary proteomic sequence classification

    Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2009)
    Profile Hidden Markov Models (PHMMs) have been widely used as models for Multiple Sequence Alignments. By their nature, they are generative one-class classifiers trained only on sequences belonging to the target class they ...
  • Improving adaptive bagging methods for evolving data streams

    Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà, Ricard (2009)
    We propose two new improvements for bagging methods on evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of ...
  • Relational random forests based on random relational rules

    Anderson, Grant; Pfahringer, Bernhard (Morgan Kaufmann Publishers Inc. San Francisco, CA, USA, 2009)
    Random Forests have been shown to perform very well in propositional learning. FORF is an upgrade of Random Forests for relational data. In this paper we investigate shortcomings of FORF and propose an alternative algorithm, ...
  • New ensemble methods for evolving data streams

    Bifet, Albert; Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Gavaldà, Ricard (ACM, 2009)
    Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is ...
  • Classifier chains for multi-label classification

    Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey; Frank, Eibe (Springer, 2009)
    The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has been sidelined in the literature due to the perceived inadequacy of its label-independence ...
  • Learning from the past with experiment databases

    Vanschoren, Joaquin; Pfahringer, Bernhard; Holmes, Geoffrey (2008-06-24)
    Thousands of Machine Learning research papers contain experimental comparisons that usually have been conducted with a single focus of interest, and detailed results are usually lost after publication. Once past experiments ...
  • Experiment Databases: Creating a New Platform for Meta-Learning Research

    Vanschoren, Joaquin; Blockeel, Hendrik; Pfahringer, Bernhard; Holmes, Geoffrey (2008)
    Many studies in machine learning try to investigate what makes an algorithm succeed or fail on certain datasets. However, the field is still evolving relatively quickly, and new algorithms, preprocessing methods, learning ...
  • Propositionalisation of multiple sequence alignments using probabilistic models

    Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (2008)
    Multiple sequence alignments play a central role in Bioinformatics. Most alignment representations are designed to facilitate knowledge extraction by human experts. Additionally, statistical models like Profile Hidden Markov ...
  • Handling numeric attributes in Hoeffding trees

    Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, Berlin, 2008)
    For conventional machine learning classification algorithms handling numeric attributes is relatively straightforward. Unsupervised and supervised solutions exist that either segment the data into pre-defined bins or sort ...
  • Exploiting propositionalization based on random relational rules for semi-supervised learning

    Pfahringer, Bernhard; Anderson, Grant (Springer, Berlin, 2008)
    In this paper we investigate an approach to semi-supervised learning based on randomized propositionalization, which allows for applying standard propositional classification algorithms like support vector machines to ...
  • Propositionalisation of Profile Hidden Markov Models for Biological Sequence Analysis

    Mutter, Stefan; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2008)
    Hidden Markov Models are a widely used generative model for analysing sequence data. A variant, Profile Hidden Markov Models are a special case used in Bioinformatics to represent, for example, protein families. In this ...
  • Mining Arbitrarily Large Datasets Using Heuristic k-Nearest Neighbour Search

    Wu, Xing; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, 2008)
    Nearest Neighbour Search (NNS) is one of the top ten data mining algorithms. It is simple and effective but has a time complexity that is the product of the number of instances and the number of dimensions. When the number ...
  • Clustering Relational Data Based on Randomized Propositionalization

    Anderson, Grant; Pfahringer, Bernhard (Springer, Berlin, 2008)
    Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying ...
  • Multi-label classification using ensembles of pruned sets

    Read, Jesse; Pfahringer, Bernhard; Holmes, Geoffrey (IEEE, 2008)
    This paper presents a Pruned Sets method (PS) for multi-label classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account ...
  • Scaling up semi-supervised learning: An efficient and effective LLGC variant

    Pfahringer, Bernhard; Leschi, Claire; Reutemann, Peter (Springer, Berlin, 2007)
    Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve ...
  • A discriminative approach to structured biological data

    Mutter, Stefan; Pfahringer, Bernhard (2007)
    This paper introduces the first author’s PhD project which has just got out of its initial stage. Biological sequence data is, on the one hand, highly structured. On the other hand there are large amounts of unlabelled ...
  • New Options for Hoeffding Trees

    Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, 2007)
    Hoeffding trees are state-of-the-art for processing high-speed data streams. Their ingenuity stems from updating sufficient statistics, only addressing growth when decisions can be made that are guaranteed to be almost ...
  • Cache Hierarchy Inspired Compression: a Novel Architecture for Data Streams

    Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon (2006)
    We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta level analyser from a number of levels constructed over time from a data stream. ...
  • Random Relational Rules

    Pfahringer, Bernhard; Anderson, Grant (2006)
    Exhaustive search in relational learning is generally infeasible, therefore some form of heuristic search is usually employed, such as in FOIL [1]. On the other hand, so-called stochastic discrimination provides a framework ...
  • A semi-supervised spam mail detector

    Pfahringer, Bernhard (2006)
    This document describes a novel semi-supervised approach to spam classification, which was successful at the ECML/PKDD 2006 spam classification challenge. A local learning method based on lazy projections was successfully ...
  • Improving on bagging with input smearing

    Frank, Eibe; Pfahringer, Bernhard (Springer, Berlin, 2006)
    Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an ensemble of decision ...
  • Using weighted nearest neighbor to benefit from unlabeled data

    Driessens, Kurt; Reutemann, Peter; Pfahringer, Bernhard; Leschi, Claire (Springer, Berlin, 2006)
    The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where often the ...
  • A Toolbox for Learning from Relational Data with Propositional and Multi-instance Learners

    Reutemann, Peter; Pfahringer, Bernhard; Frank, Eibe (Springer, 2005)
    Most databases employ the relational model for data storage. To use this data in a propositional learner, a propositionalization step has to take place. Similarly, the data has to be transformed to be amenable to a ...
  • Tie-breaking in Hoeffding trees

    Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (2005)
    A thorough examination of the performance of Hoeffding trees, state-of-the-art in classification for data streams, on a range of datasets reveals that tie breaking, an essential but supposedly rare procedure, is employed ...
  • A novel two stage scheme utilizing the test set for model selection in text classification

    Pfahringer, Bernhard; Reutemann, Peter; Mayo, Michael (2005)
    Text classification is a natural application domain for semi-supervised learning, as labeling documents is expensive, but on the other hand usually an abundance of unlabeled documents is available. We describe a novel ...
  • Stress-testing Hoeffding trees

    Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (Springer, Berlin, 2005)
    Hoeffding trees are state-of-the-art in classification for data streams. They perform prediction by choosing the majority class at each leaf. Their predictive accuracy can be increased by adding Naive Bayes models at the ...
  • Weka: A machine learning workbench for data mining

    Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard; Witten, Ian H. (Springer, 2005)
    The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, ...
  • Clustering large datasets using cobweb and K-means in tandem

    Li, Mi; Holmes, Geoffrey; Pfahringer, Bernhard (Springer, Berlin, 2005)
    This paper presents a single scan algorithm for clustering large datasets based on a two phase process which combines two well known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with ...
  • Multinomial naive Bayes for text categorization revisited

    Kibriya, Ashraf Masood; Frank, Eibe; Pfahringer, Bernhard; Holmes, Geoffrey (Springer, 2005)
    This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it ...
  • Mining data streams using option trees (revised edition, 2004)

    Holmes, Geoffrey; Kirkby, Richard Brendon; Pfahringer, Bernhard (Department of Computer Science, The University of Waikato, 2004-01-01)
    The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...
  • Millions of random rules

    Pfahringer, Bernhard; Holmes, Geoffrey; Wang, Cheng (2004)
    In this paper we report on work in progress based on the induction of vast numbers of almost random rules. This work tries to combine and explore ideas from both Random Forests as well as Stochastic Discrimination. We ...
  • Experiments in Predicting Biodegradability

    Blockeel, Hendrik; Džeroski, Sašo; Kompare, Boris; Kramer, Stefan; Pfahringer, Bernhard; Van Laer, Wim (Taylor & Francis, 2004)
    This paper is concerned with the use of AI techniques in ecology. More specifically, we present a novel application of inductive logic programming (ILP) in the area of quantitative structure-activity relationships (QSARs). ...
  • Mining data streams using option trees

    Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon (University of Waikato, Department of Computer Science, 2003-09)
    The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over ...
  • Locally weighted naive Bayes

    Frank, Eibe; Hall, Mark A.; Pfahringer, Bernhard (University of Waikato, Department of Computer Science, 2003-04)
    Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome ...
  • Text categorisation using document profiling

    Sauban, Maximilien; Pfahringer, Bernhard (Springer, Berlin, 2003)
    This paper presents an extension of prior work by Michael D. Lee on psychologically plausible text categorisation. Our approach utilises Lee's model as a pre-processing filter to generate a dense representation for a given ...
  • Propositionalization through stochastic discrimination

    Pfahringer, Bernhard; Holmes, Geoffrey (2003)
    A simple algorithm based on the theory of stochastic discrimination is developed for the fast extraction of sub-graphs with potential discriminative power from a given set of pre-classified graphs. A preliminary experimental ...
  • A two-level learning method for generalized multi-instance problems

    Weidmann, Nils; Frank, Eibe; Pfahringer, Bernhard (Springer, Berlin, 2003)
    In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag's class label depends on the labels of the instances in the bag and can ...
  • A logic boosting approach to inducing multiclass alternating decision trees

    Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A. (University of Waikato, Department of Computer Science, 2002-03)
    The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ...
  • Data mining challenge problems: any lessons learned?

    Pfahringer, Bernhard (2002)
    When considering the merit of data mining challenges, we need to answer the question of whether the amount of academic outcome justifies the related expense of scarce research time. In this paper I will provide anecdotal ...
  • Multiclass alternating decision trees

    Holmes, Geoffrey; Pfahringer, Bernhard; Kirkby, Richard Brendon; Frank, Eibe; Hall, Mark A. (Springer, Berlin, 2002)
    The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. The original formulation ...
  • (The Futility of) Trying to Predict Carcinogenicity of Chemical Compounds

    Pfahringer, Bernhard (2001)
    This paper describes my submission to one of the sub-problems formulated for the Predictive Toxicology Challenge 2001. The challenge is to predict the carcinogenicity of chemicals based on structural information only. I ...
  • Wrapping boosters against noise

    Pfahringer, Bernhard; Holmes, Geoffrey; Schmidberger, Gabi (Springer, Berlin, 2001)
    Wrappers have recently been used to obtain parameter optimizations for learning algorithms. In this paper we investigate the use of a wrapper for estimating the correct number of boosting ensembles in the presence of class ...
  • Optimizing the induction of alternating decision trees

    Pfahringer, Bernhard; Holmes, Geoffrey; Kirkby, Richard Brendon (Springer, Berlin, 2001)
    The alternating decision tree brings comprehensibility to the performance enhancing capabilities of boosting. A single interpretable tree is induced wherein knowledge is distributed across the nodes and multiple paths are ...
  • Learning to use operational advice

    Fürnkranz, Johannes; Pfahringer, Bernhard; Kaindl, Hermann; Kramer, Stefan (IOS Press, 2000)
    We address the problem of advice-taking in a given domain, in particular for building a game-playing program. Our approach to solving it strives for the application of machine learning techniques throughout, i.e. for ...
  • Data Quality in Predictive Toxicology: Identification of Chemical Structures and Calculation of Chemical Descriptors

    Helma, Christoph; Kramer, Stefan; Pfahringer, Bernhard; Gottmann, Eva (Environmental Health Perspectives, 2000)
    Every technique for toxicity prediction and for the detection of structure–activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss the ...
  • Prediction of ordinal classes using regression trees

    Kramer, Stefan; Widmer, Gerhard; Pfahringer, Bernhard; de Groeve, Michael (Springer, Berlin, 2000)
    This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with S-CART, a tree induction algorithm, and study various ways of ...
  • Searching for patterns in political event sequences: Experiments with the KEDS database

    Kovar, Klaus; Fürnkranz, Johannes; Petrak, Johann; Pfahringer, Bernhard; Trappl, Robert; Widmer, Gerhard (Taylor & Francis, 2000)
    This paper presents an empirical study on the possibility of discovering interesting event sequences and sequential rules in a large database of international political events. A data mining algorithm first presented by ...
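Several entries above (New Options for Hoeffding Trees, Tie-breaking in Hoeffding trees, Stress-testing Hoeffding trees) concern Hoeffding trees, which only grow a leaf when a split decision is "guaranteed to be almost" as good as one made with unlimited data. The following is a minimal sketch of that split test in the form used by the original VFDT algorithm of Domingos and Hulten: the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) plus a tie threshold τ. It is an illustration under those assumptions, not the code from MOA or from the papers listed; the helper names, the δ = 1e-7 example value, and the default τ = 0.05 are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the true mean of a variable whose values span
    value_range lies within the returned epsilon of its mean over n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Hypothetical VFDT-style split test for a single leaf."""
    # Information gain over n_classes classes lies in [0, log2(n_classes)]
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute is clearly ahead of the runner-up, or if eps
    # is so small that the two candidates are effectively tied (tie-breaking)
    return (best_gain - second_gain) > eps or eps < tie_threshold


print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))  # 0.0898
```

Because ε shrinks with 1/√n, a leaf with a small gap between its two best attributes keeps collecting examples until either the gap exceeds ε or ε falls below τ, at which point the tie is broken in favour of the current best attribute.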

Showing up to 5 theses - most recently added to Research Commons first.

  • Heterogeneous Computing for Data Stream Mining

    Petko, Vladimir (University of Waikato, 2016)
    Graphics Processing Units are the de facto standard for acceleration of data parallel tasks in high performance computing. They are widely used to accelerate batch machine learning algorithms. High-end discrete GPUs are ...
  • Meta-Learning and the Full Model Selection Problem

    Sun, Quan (University of Waikato, 2014)
    When working as a data analyst, one of my daily tasks is to select appropriate tools from a set of existing data analysis techniques in my toolbox, including data preprocessing, outlier detection, feature selection, learning ...
  • Policy Search Based Relational Reinforcement Learning using the Cross-Entropy Method

    Sarjant, Samuel (University of Waikato, 2013)
    Relational Reinforcement Learning (RRL) is a subfield of machine learning in which a learning agent seeks to maximise a numerical reward within an environment, represented as collections of objects and relations, by ...
  • Sequence-based protein classification: binary Profile Hidden Markov Models and propositionalisation

    Mutter, Stefan (University of Waikato, 2011)
    Detecting similarity in biological sequences is a key element to understanding the mechanisms of life. Researchers infer potential structural, functional or evolutionary relationships from similarity. However, the concept ...
  • Scalable Multi-label Classification

    Read, Jesse (University of Waikato, 2010)
    Multi-label classification is relevant to many domains, such as text, image and other media, and bioinformatics. Researchers have already noticed that in multi-label data, correlations exist between labels, and a variety ...
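The thesis entry above notes that correlations exist between labels in multi-label data, and the earlier entry Classifier chains for multi-label classification contrasts binary relevance, which "considers each label as an independent binary problem", with a chained approach. A minimal sketch of the chaining idea follows, assuming a toy 1-nearest-neighbour base learner; all class names are hypothetical and this is not MEKA's API. Model j is trained on the original features plus the true values of labels 0..j-1, and at prediction time those extra inputs are filled with the chain's own earlier predictions.

```python
from typing import Callable, List, Protocol


class BinaryLearner(Protocol):
    def fit(self, X: List[List[float]], y: List[int]) -> "BinaryLearner": ...
    def predict_one(self, x: List[float]) -> int: ...


class OneNearestNeighbour:
    """Hypothetical toy base learner: 1-NN with squared Euclidean distance."""

    def fit(self, X: List[List[float]], y: List[int]) -> "OneNearestNeighbour":
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x: List[float]) -> int:
        def dist(i: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(self.X[i], x))
        # min() keeps the first index on ties
        return self.y[min(range(len(self.X)), key=dist)]


class BinaryRelevance:
    """One independent binary model per label; label correlations are ignored."""

    def __init__(self, make_base: Callable[[], BinaryLearner]):
        self.make_base = make_base
        self.models: List[BinaryLearner] = []

    def fit(self, X: List[List[float]], Y: List[List[int]]) -> "BinaryRelevance":
        self.models = [self.make_base().fit(X, [y[j] for y in Y]) for j in range(len(Y[0]))]
        return self

    def predict_one(self, x: List[float]) -> List[int]:
        return [m.predict_one(x) for m in self.models]


class ClassifierChain:
    """Model j also sees labels 0..j-1, so it can exploit label correlations."""

    def __init__(self, make_base: Callable[[], BinaryLearner]):
        self.make_base = make_base
        self.models: List[BinaryLearner] = []

    def fit(self, X: List[List[float]], Y: List[List[int]]) -> "ClassifierChain":
        self.models = []
        for j in range(len(Y[0])):
            # Training input for label j: original features + true earlier labels
            Xj = [list(x) + [float(v) for v in y[:j]] for x, y in zip(X, Y)]
            self.models.append(self.make_base().fit(Xj, [y[j] for y in Y]))
        return self

    def predict_one(self, x: List[float]) -> List[int]:
        preds: List[int] = []
        for model in self.models:
            # At prediction time the earlier labels are the chain's own predictions
            preds.append(model.predict_one(list(x) + [float(p) for p in preds]))
        return preds


X = [[0.0], [1.0], [2.0], [3.0]]
Y = [[0, 0], [0, 1], [1, 1], [1, 0]]
print(ClassifierChain(OneNearestNeighbour).fit(X, Y).predict_one([2.0]))  # [1, 1]
```

The chain order is fixed here for simplicity; with a poor ordering, early prediction errors propagate down the chain, which is one reason ensembles of chains with different orders are often used in practice.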