Comparing high dimensional word embeddings trained on medical text to bag-of-words for predicting medical codes

dc.contributor.author: Yogarajan, Vithya (en_NZ)
dc.contributor.author: Gouk, Henry (en_NZ)
dc.contributor.author: Smith, Tony C. (en_NZ)
dc.contributor.author: Mayo, Michael (en_NZ)
dc.contributor.author: Pfahringer, Bernhard (en_NZ)
dc.contributor.editor: Sitek, P. (en_NZ)
dc.contributor.editor: Petranik, M. (en_NZ)
dc.contributor.editor: Krótkiewicz, M. (en_NZ)
dc.contributor.editor: Srinilta, C. (en_NZ)
dc.coverage.spatial: Phuket, Thailand (en_NZ)
dc.date.accessioned: 2020-05-28T05:16:48Z
dc.date.available: 2020 (en_NZ)
dc.date.available: 2020-05-28T05:16:48Z
dc.date.issued: 2020 (en_NZ)
dc.description.abstract: Word embeddings are a useful tool for extracting knowledge from the free-form text contained in electronic health records, but it has become commonplace to train such word embeddings on data that do not accurately reflect how language is used in a healthcare context. We use prediction of medical codes as an example application to compare the accuracy of word embeddings trained on health corpora to those trained on more general collections of text. It is shown that both an increase in embedding dimensionality and an increase in the volume of health-related training data improve prediction accuracy. We also present a comparison to the traditional bag-of-words feature representation, demonstrating that in many cases, this conceptually simple method for representing text results in superior accuracy to that of word embeddings.
dc.format.mimetype: application/pdf
dc.identifier.citation: Yogarajan, V., Gouk, H., Smith, T. C., Mayo, M., & Pfahringer, B. (2020). Comparing high dimensional word embeddings trained on medical text to bag-of-words for predicting medical codes. In P. Sitek, M. Petranik, M. Krótkiewicz, & C. Srinilta (Eds.), Proceedings of 12th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2020) LNCS 12033 (pp. 97–108). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-41964-6_9 (en)
dc.identifier.doi: 10.1007/978-3-030-41964-6_9 (en_NZ)
dc.identifier.isbn: 9783030419639 (en_NZ)
dc.identifier.issn: 0302-9743 (en_NZ)
dc.identifier.uri: https://hdl.handle.net/10289/13591
dc.language.iso: en
dc.publisher: Springer (en_NZ)
dc.relation.isPartOf: Proceedings of 12th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2020) LNCS 12033 (en_NZ)
dc.rights: © Springer Nature Switzerland AG. This is the author's accepted version. The final publication is available at Springer via dx.doi.org/10.1007/978-3-030-41964-6_9
dc.source: ACIIDS 2020 (en_NZ)
dc.subject: computer science (en_NZ)
dc.subject: word embeddings (en_NZ)
dc.subject: binary classification (en_NZ)
dc.subject: machine learning for health (en_NZ)
dc.subject: Machine learning
dc.title: Comparing high dimensional word embeddings trained on medical text to bag-of-words for predicting medical codes (en_NZ)
dc.type: Conference Contribution
dspace.entity.type: Publication
pubs.begin-page: 97
pubs.end-page: 108
pubs.finish-date: 2020-03-26 (en_NZ)
pubs.place-of-publication: Cham, Switzerland
pubs.publication-status: Published (en_NZ)
pubs.start-date: 2020-03-23 (en_NZ)

Files

Original bundle

Name: Comparing_High_Dimensional_YOGARAJAN_DOA06122019_AFV.pdf
Size: 217.53 KB
Format: Adobe Portable Document Format
Description: Accepted version

License bundle

Name: Research Commons Deposit Agreement 2017.pdf
Size: 188.11 KB
Format: Adobe Portable Document Format
Description: