
dc.contributor.author: Bravo-Marquez, Felipe [en_NZ]
dc.contributor.author: Frank, Eibe [en_NZ]
dc.contributor.author: Pfahringer, Bernhard [en_NZ]
dc.date.accessioned: 2016-11-28T01:04:28Z
dc.date.available: 2016-09-15 [en_NZ]
dc.date.available: 2016-11-28T01:04:28Z
dc.date.issued: 2016-09-15 [en_NZ]
dc.identifier.citation: Bravo-Marquez, F., Frank, E., & Pfahringer, B. (2016). Building a Twitter opinion lexicon from automatically-annotated tweets. Knowledge-Based Systems, 108, 65–78. http://doi.org/10.1016/j.knosys.2016.05.018 [en]
dc.identifier.issn: 0950-7051 [en_NZ]
dc.identifier.uri: https://hdl.handle.net/10289/10754
dc.description.abstract: Opinion lexicons, which are lists of terms labelled by sentiment, are widely used resources to support automatic sentiment analysis of textual passages. However, existing resources of this type exhibit some limitations when applied to social media messages such as tweets (posts on Twitter), because they are unable to capture the diversity of informal expressions commonly found in this type of media. In this article, we present a method that combines information from automatically annotated tweets and existing hand-made opinion lexicons to expand an opinion lexicon in a supervised fashion. The expanded lexicon contains part-of-speech (POS) disambiguated entries with a probability distribution for positive, negative, and neutral polarity classes, similarly to SentiWordNet. To obtain this distribution using machine learning, we propose word-level attributes based on (a) the morphological information conveyed by POS tags and (b) associations between words and the sentiment expressed in the tweets that contain them. We consider tweets with both hard and soft sentiment labels. The sentiment associations are modelled in two different ways: using point-wise-mutual-information semantic orientation (PMI-SO), and using stochastic gradient descent semantic orientation (SGD-SO), which learns a linear relationship between words and sentiment. The training dataset is labelled by a seed lexicon formed by combining multiple hand-annotated lexicons. Our experimental results show that our method outperforms the three-dimensional word-level polarity classification performance obtained by using PMI-SO alone. This is significant because PMI-SO is a state-of-the-art measure for establishing word-level sentiment. Additionally, we show that lexicons created with our method achieve significant improvements over SentiWordNet for classifying tweets into polarity classes, and also outperform SentiStrength in the majority of the experiments.
dc.format.mimetype: application/pdf
dc.language.iso: en [en_NZ]
dc.publisher: Elsevier [en_NZ]
dc.rights: This is an author’s accepted version of an article published in the journal: Knowledge-Based Systems. © 2016 Elsevier.
dc.subject: Science & Technology [en_NZ]
dc.subject: Technology [en_NZ]
dc.subject: Computer Science, Artificial Intelligence [en_NZ]
dc.subject: Computer Science [en_NZ]
dc.subject: Lexicon expansion [en_NZ]
dc.subject: Sentiment analysis [en_NZ]
dc.subject: Twitter [en_NZ]
dc.subject: SENTIMENT ANALYSIS [en_NZ]
dc.subject: KNOWLEDGE [en_NZ]
dc.subject: DICTIONARY [en_NZ]
dc.subject: Machine learning
dc.title: Building a Twitter opinion lexicon from automatically-annotated tweets [en_NZ]
dc.type: Journal Article
dc.identifier.doi: 10.1016/j.knosys.2016.05.018 [en_NZ]
dc.relation.isPartOf: Knowledge-Based Systems [en_NZ]
pubs.begin-page: 65
pubs.elements-id: 138829
pubs.end-page: 78
pubs.publication-status: Published [en_NZ]
pubs.volume: 108 [en_NZ]

