Research Commons

      Comparing high dimensional word embeddings trained on medical text to bag-of-words for predicting medical codes

      Yogarajan, Vithya; Gouk, Henry; Smith, Tony C.; Mayo, Michael; Pfahringer, Bernhard
      Files
      Comparing_High_Dimensional_YOGARAJAN_DOA06122019_AFV.pdf
      Accepted version, 217.5Kb
      DOI
       10.1007/978-3-030-41964-6_9
      Citation
      Yogarajan, V., Gouk, H., Smith, T. C., Mayo, M., & Pfahringer, B. (2020). Comparing high dimensional word embeddings trained on medical text to bag-of-words for predicting medical codes. In P. Sitek, M. Petranik, M. Krótkiewicz, & C. Srinilta (Eds.), Proceedings of 12th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2020) LNCS 12033 (pp. 97–108). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-41964-6_9
      Permanent Research Commons link: https://hdl.handle.net/10289/13591
      Abstract
      Word embeddings are a useful tool for extracting knowledge from the free-form text contained in electronic health records, but it has become commonplace to train such word embeddings on data that do not accurately reflect how language is used in a healthcare context. We use prediction of medical codes as an example application to compare the accuracy of word embeddings trained on health corpora to those trained on more general collections of text. It is shown that both an increase in embedding dimensionality and an increase in the volume of health-related training data improve prediction accuracy. We also present a comparison to the traditional bag-of-words feature representation, demonstrating that in many cases this conceptually simple method for representing text results in superior accuracy to that of word embeddings.
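      To make the two feature representations named in the abstract concrete, the following is a minimal sketch of how a clinical note might be turned into a bag-of-words count vector and into an averaged word-embedding vector. It is not the authors' pipeline: the abstract does not say how word vectors are combined into a document vector, so averaging is an assumption, and the function names, vocabulary, and two-dimensional toy embeddings are hypothetical values invented for illustration.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Return a count vector with one dimension per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of tokens found in the embedding table.

    Averaging is an assumed pooling choice, not taken from the paper.
    Returns a zero vector when no token has an embedding.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy data, invented for illustration only.
vocabulary = ["chest", "pain", "fever", "cough"]
embeddings = {
    "chest": [1.0, 0.0],
    "pain": [0.0, 1.0],
    "fever": [1.0, 1.0],
}
note = "chest pain and chest tightness".split()

bow_vector = bag_of_words(note, vocabulary)           # length = vocabulary size
emb_vector = mean_embedding(note, embeddings, dim=2)  # length = embedding dimension
```

      In this sketch the bag-of-words vector grows with the vocabulary while the embedding vector has a fixed dimensionality. Either vector could then be passed to a classifier that predicts medical codes.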
      Date
      2020
      Type
      Conference Contribution
      Publisher
      Springer
      Rights
      © Springer Nature Switzerland AG. This is the author's accepted version. The final publication is available at Springer via dx.doi.org/10.1007/978-3-030-41964-6_9
      Collections
      • Computing and Mathematical Sciences Papers [1455]

      The University of Waikato - Te Whare Wānanga o Waikato