Research Commons
      • Browse 
        • Communities & Collections
        • Titles
        • Authors
        • By Issue Date
        • Subjects
        • Types
        • Series
      • Help 
        • About
        • Collection Policy
        • OA Mandate Guidelines
        • Guidelines FAQ
        • Contact Us
      • My Account 
        • Sign In
        • Register
      View Item 
      •   Research Commons
      • University of Waikato Research
      • Computing and Mathematical Sciences
      • Computing and Mathematical Sciences Papers
      • View Item
      •   Research Commons
      • University of Waikato Research
      • Computing and Mathematical Sciences
      • Computing and Mathematical Sciences Papers
      • View Item
      JavaScript is disabled for your browser. Some features of this site may not work without it.

      Efficient online evaluation of big data stream classifiers

      Bifet, Albert; de Francisci Morales, Gianmarco; Read, Jesse; Holmes, Geoffrey; Pfahringer, Bernhard
      Thumbnail
      Files
      p59-bifet.pdf
      Published version, 709.6Kb
      DOI
       10.1145/2783258.2783372
      Find in your library  
      Citation
      Export citation
      Bifet, A., de Francisci Morales, G., Read, J., Holmes, G., & Pfahringer, B. (2015). Efficient online evaluation of big data stream classifiers. In Proceedings of 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 59–68). New York, USA: ACM. http://doi.org/10.1145/2783258.2783372
      Permanent Research Commons link: https://hdl.handle.net/10289/10145
      Abstract
      The evaluation of classifiers in data streams is fundamental so that poorly-performing models can be identified, and either improved or replaced by better-performing models. This is an increasingly relevant and important task as stream data is generated from more sources, in real-time, in large quantities, and is now considered the largest source of big data. Both researchers and practitioners need to be able to effectively evaluate the performance of the methods they employ. However, there are major challenges for evaluation in a stream. Instances arriving in a data stream are usually time-dependent, and the underlying concept that they represent may evolve over time. Furthermore, the massive quantity of data also tends to exacerbate issues such as class imbalance. Current frameworks for evaluating streaming and online algorithms are able to give predictions in real-time, but as they use a prequential setting, they build only one model, and are thus not able to compute the statistical significance of results in real-time. In this paper we propose a new evaluation methodology for big data streams. This methodology addresses unbalanced data streams, data where change occurs on different time scales, and the question of how to split the data between training and testing, over multiple models.
      Date
      2015
      Type
      Conference Contribution
      Publisher
      ACM
      Rights
      Copyright is held by the owner/author(s). Publication rights licensed to ACM.
      Collections
      • Computing and Mathematical Sciences Papers [1455]
      Show full item record  

      Usage

      Downloads, last 12 months
      320
       
       
       

      Usage Statistics

      For this itemFor all of Research Commons

      The University of Waikato - Te Whare Wānanga o WaikatoFeedback and RequestsCopyright and Legal Statement