
dc.contributor.advisor Pfahringer, Bernhard
dc.contributor.advisor Smith, Tony C.
dc.contributor.advisor Montiel, Jacob
dc.contributor.author Yogarajan, Vithya
dc.date.accessioned 2022-02-25T01:15:33Z
dc.date.available 2022-02-25T01:15:33Z
dc.date.issued 2022
dc.identifier.uri https://hdl.handle.net/10289/14757
dc.description.abstract Recent advances in machine learning-based multi-label classification of medical text can enhance understanding of the human body and support patient care. This research treats the prediction of medical codes from electronic health records (EHRs) as multi-label problems with between 15 and 923 labels. It is motivated by advances in domain-specific language models, which offer better representations of electronic health records and can improve the predictive accuracy of medical codes. The thesis presents an extensive empirical study of language models for binary and multi-label medical text classification. Domain-specific, multi-sourced fastText pre-trained embeddings are introduced, and experimental results show considerable improvements in predictive accuracy when these embeddings are used to represent medical text. Domain-specific transformer models are shown to outperform previous results on multi-label problems with fixed sequence lengths. When processing time is not a constraint for long medical text, TransformerXL is the best-performing model: experimental results show significant improvements over other models, including state-of-the-art results, when TransformerXL is used for downstream tasks such as predicting medical codes. The thesis also considers concatenated language models to handle long medical documents and text from multiple EHR sources. Experimental results show improvements in overall micro and macro F1 scores, achieved with fewer resources. In addition, concatenated domain-specific transformers improve F1 scores for infrequent labels across several multi-label problems, especially those with long-tailed label distributions.
dc.format.mimetype application/pdf
dc.language.iso en
dc.publisher The University of Waikato
dc.rights All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject Machine learning
dc.subject Natural language processing
dc.subject Health care applications
dc.subject Transformers
dc.subject Multi-label classification
dc.subject.lcsh Medical records -- Data processing
dc.subject.lcsh Medical records -- Classification
dc.subject.lcsh Medical codes
dc.subject.lcsh Medical informatics
dc.subject.lcsh Natural language processing (Computer science)
dc.subject.lcsh Classification rule mining
dc.subject.lcsh Data mining
dc.subject.lcsh Statistical matching
dc.title Domain-specific language models for multi-label classification of medical text
dc.type Thesis
thesis.degree.grantor The University of Waikato
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy (PhD)
dc.date.updated 2022-02-18T03:10:35Z
pubs.place-of-publication Hamilton, New Zealand
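
To make the multi-label setup described in the abstract concrete, below is a minimal sketch (not taken from the thesis) of the standard formulation for medical-code prediction: one independent sigmoid output per label trained with binary cross-entropy, evaluated with the micro and macro F1 scores the abstract reports. The class name, dimensions, and label count are illustrative assumptions; the document representation could come from averaged fastText embeddings or a pooled transformer encoder.

```python
# Illustrative sketch only, assuming a precomputed document representation.
import torch
import torch.nn as nn
from sklearn.metrics import f1_score

class MultiLabelHead(nn.Module):
    """Maps a document vector (e.g. pooled transformer output or averaged
    fastText embeddings) to one independent logit per medical code."""
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, doc_repr: torch.Tensor) -> torch.Tensor:
        return self.classifier(doc_repr)  # raw logits, one per label

# Toy dimensions: 768-d document vectors, 50 candidate codes.
head = MultiLabelHead(hidden_dim=768, num_labels=50)
loss_fn = nn.BCEWithLogitsLoss()  # each label scored independently

doc_repr = torch.randn(8, 768)                   # batch of 8 document vectors
targets = torch.randint(0, 2, (8, 50)).float()   # multi-hot gold code sets

logits = head(doc_repr)
loss = loss_fn(logits, targets)                  # would be backpropagated in training

# Threshold the per-label probabilities at 0.5 and report micro/macro F1.
probs = torch.sigmoid(logits).detach()
preds = (probs > 0.5).int().numpy()
gold = targets.int().numpy()
print("micro F1:", f1_score(gold, preds, average="micro", zero_division=0))
print("macro F1:", f1_score(gold, preds, average="macro", zero_division=0))
```

Using a per-label sigmoid with binary cross-entropy, rather than a softmax over codes, is what makes the task multi-label: each code is decided independently, so a single record can receive many codes, and macro F1 exposes how well infrequent (long-tail) codes are recovered.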

