Pfahringer, BernhardFrank, EibeBravo-Marquez, Felipe2017-07-262017-07-262017Bravo-Marquez, F. (2017). Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/11225https://hdl.handle.net/10289/11225The most popular sentiment analysis task in Twitter is the automatic classification of tweets into sentiment categories such as positive, negative, and neutral. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. These models are affected by label sparsity, because the manual annotation of tweets is labour-intensive and time-consuming. This thesis addresses the label sparsity problem for Twitter polarity classification by automatically building two type of resources that can be exploited when labelled data is scarce: opinion lexicons, which are lists of words labelled by sentiment, and synthetically labelled tweets. In the first part of the thesis, we induce Twitter-specific opinion lexicons by training words level classifiers using representations that exploit different sources of information: (a) the morphological information conveyed by part-of-speech (POS) tags, (b) associations between words and the sentiment expressed in the tweets that contain them, and (c) distributional representations calculated from unlabelled tweets. Experimental results show that the induced lexicons produce significant improvements over existing manually annotated lexicons for tweet-level polarity classification. In the second part of the thesis, we develop distant supervision methods for generating synthetic training data for Twitter polarity classification by exploiting unlabelled tweets and prior lexical knowledge. Positive and negative training instances are generated by averaging unlabelled tweets annotated according to a given polarity lexicon. We study different mechanisms for selecting the candidate tweets to be averaged. Our experimental results show that the training data generated by the proposed models produce classifiers that perform significantly better than classifiers trained from tweets annotated with emoticons, a popular distant supervision approach for Twitter sentiment analysis.application/pdfenAll items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.sentiment analysistwitterlexicon expansiondata miningopinion miningAcquiring and Exploiting Lexical Knowledge for Twitter Sentiment AnalysisThesis2017-07-18