Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis

Bravo-Marquez, Felipe

Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis

Authors

Bravo-Marquez, Felipe

Files

thesis.pdf (1.73 MB)

Permanent Link

https://hdl.handle.net/10289/11225

Rights

Abstract

The most popular sentiment analysis task in Twitter is the automatic classification of tweets into sentiment categories such as positive, negative, and neutral. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. These models are affected by label sparsity, because the manual annotation of tweets is labour-intensive and time-consuming. This thesis addresses the label sparsity problem for Twitter polarity classification by automatically building two type of resources that can be exploited when labelled data is scarce: opinion lexicons, which are lists of words labelled by sentiment, and synthetically labelled tweets. In the first part of the thesis, we induce Twitter-specific opinion lexicons by training words level classifiers using representations that exploit different sources of information: (a) the morphological information conveyed by part-of-speech (POS) tags, (b) associations between words and the sentiment expressed in the tweets that contain them, and (c) distributional representations calculated from unlabelled tweets. Experimental results show that the induced lexicons produce significant improvements over existing manually annotated lexicons for tweet-level polarity classification. In the second part of the thesis, we develop distant supervision methods for generating synthetic training data for Twitter polarity classification by exploiting unlabelled tweets and prior lexical knowledge. Positive and negative training instances are generated by averaging unlabelled tweets annotated according to a given polarity lexicon. We study different mechanisms for selecting the candidate tweets to be averaged. Our experimental results show that the training data generated by the proposed models produce classifiers that perform significantly better than classifiers trained from tweets annotated with emoticons, a popular distant supervision approach for Twitter sentiment analysis.

Citation

Bravo-Marquez, F. (2017). Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/11225

Type

Thesis

Date

2017

Publisher

University of Waikato

Degree

Doctor of Philosophy (PhD)

Supervisor

Pfahringer, Bernhard
Frank, Eibe

Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis

Authors

Files

Permanent Link

Publisher link

Rights

Abstract

Citation

Type

Series name

Date

Publisher

Degree

Type of thesis

Supervisor