Show simple item record  

dc.contributor.advisorPfahringer, Bernhard
dc.contributor.advisorFrank, Eibe
dc.contributor.authorBravo-Marquez, Felipe
dc.date.accessioned2017-07-26T01:40:40Z
dc.date.available2017-07-26T01:40:40Z
dc.date.issued2017
dc.identifier.citationBravo-Marquez, F. (2017). Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/11225en
dc.identifier.urihttps://hdl.handle.net/10289/11225
dc.description.abstractThe most popular sentiment analysis task in Twitter is the automatic classification of tweets into sentiment categories such as positive, negative, and neutral. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. These models are affected by label sparsity, because the manual annotation of tweets is labour-intensive and time-consuming. This thesis addresses the label sparsity problem for Twitter polarity classification by automatically building two type of resources that can be exploited when labelled data is scarce: opinion lexicons, which are lists of words labelled by sentiment, and synthetically labelled tweets. In the first part of the thesis, we induce Twitter-specific opinion lexicons by training words level classifiers using representations that exploit different sources of information: (a) the morphological information conveyed by part-of-speech (POS) tags, (b) associations between words and the sentiment expressed in the tweets that contain them, and (c) distributional representations calculated from unlabelled tweets. Experimental results show that the induced lexicons produce significant improvements over existing manually annotated lexicons for tweet-level polarity classification. In the second part of the thesis, we develop distant supervision methods for generating synthetic training data for Twitter polarity classification by exploiting unlabelled tweets and prior lexical knowledge. Positive and negative training instances are generated by averaging unlabelled tweets annotated according to a given polarity lexicon. We study different mechanisms for selecting the candidate tweets to be averaged. Our experimental results show that the training data generated by the proposed models produce classifiers that perform significantly better than classifiers trained from tweets annotated with emoticons, a popular distant supervision approach for Twitter sentiment analysis.
dc.format.mimetypeapplication/pdf
dc.language.isoen
dc.publisherUniversity of Waikato
dc.rightsAll items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subjectsentiment analysis
dc.subjecttwitter
dc.subjectlexicon expansion
dc.subjectdata mining
dc.subjectopinion mining
dc.titleAcquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis
dc.typeThesis
thesis.degree.grantorUniversity of Waikato
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy (PhD)
dc.date.updated2017-07-18T00:05:55Z
pubs.place-of-publicationHamilton, New Zealanden_NZ


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record