
dc.contributor.author: Yogarajan, Vithya [en_NZ]
dc.contributor.author: Gouk, Henry [en_NZ]
dc.contributor.author: Smith, Tony C. [en_NZ]
dc.contributor.author: Mayo, Michael [en_NZ]
dc.contributor.author: Pfahringer, Bernhard [en_NZ]
dc.contributor.editor: Sitek, P. [en_NZ]
dc.contributor.editor: Petranik, M. [en_NZ]
dc.contributor.editor: Krótkiewicz, M. [en_NZ]
dc.contributor.editor: Srinilta, C. [en_NZ]
dc.coverage.spatial: Phuket, Thailand [en_NZ]
dc.date.accessioned: 2020-05-28T05:16:48Z
dc.date.available: 2020 [en_NZ]
dc.date.available: 2020-05-28T05:16:48Z
dc.date.issued: 2020 [en_NZ]
dc.identifier.citation: Yogarajan, V., Gouk, H., Smith, T. C., Mayo, M., & Pfahringer, B. (2020). Comparing high dimensional word embeddings trained on medical text to bag-of-words for predicting medical codes. In P. Sitek, M. Petranik, M. Krótkiewicz, & C. Srinilta (Eds.), Proceedings of 12th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2020) LNCS 12033 (pp. 97–108). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-41964-6_9 [en]
dc.identifier.isbn: 9783030419639 [en_NZ]
dc.identifier.issn: 0302-9743 [en_NZ]
dc.identifier.uri: https://hdl.handle.net/10289/13591
dc.description.abstract: Word embeddings are a useful tool for extracting knowledge from the free-form text contained in electronic health records, but it has become commonplace to train such word embeddings on data that do not accurately reflect how language is used in a healthcare context. We use prediction of medical codes as an example application to compare the accuracy of word embeddings trained on health corpora to those trained on more general collections of text. It is shown that both an increase in embedding dimensionality and an increase in the volume of health-related training data improve prediction accuracy. We also present a comparison to the traditional bag-of-words feature representation, demonstrating that in many cases, this conceptually simple method for representing text results in superior accuracy to that of word embeddings.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: Springer [en_NZ]
dc.rights: © Springer Nature Switzerland AG. This is the author's accepted version. The final publication is available at Springer via dx.doi.org/10.1007/978-3-030-41964-6_9
dc.source: ACIIDS 2020 [en_NZ]
dc.subject: computer science [en_NZ]
dc.subject: word embeddings [en_NZ]
dc.subject: binary classification [en_NZ]
dc.subject: machine learning for health [en_NZ]
dc.subject: Machine learning
dc.title: Comparing high dimensional word embeddings trained on medical text to bag-of-words for predicting medical codes [en_NZ]
dc.type: Conference Contribution
dc.identifier.doi: 10.1007/978-3-030-41964-6_9 [en_NZ]
dc.relation.isPartOf: Proceedings of 12th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2020) LNCS 12033 [en_NZ]
pubs.begin-page: 97
pubs.elements-id: 251557
pubs.end-page: 108
pubs.finish-date: 2020-03-26 [en_NZ]
pubs.place-of-publication: Cham, Switzerland
pubs.publication-status: Published [en_NZ]
pubs.start-date: 2020-03-23 [en_NZ]

