Detecting Happiness in Italian Tweets: Towards an Evaluation Dataset for Sentiment Analysis in Felicittà
Patti, Viviana; Ruffo, Giancarlo Francesco; Sulis, Emilio
2014-01-01
Abstract
This paper focuses on the development of a gold standard corpus for the validation of Felicittà, an online platform that uses Twitter as a data source to estimate and interactively display the degree of happiness in Italian cities. The ultimate goal is to create an Italian reference Twitter dataset for sentiment analysis that can be used in several frameworks aimed at detecting sentiment from big data sources. We provide an overview of the reference corpus created for evaluating Felicittà, with a special focus on the issues raised during its development, on the discussion of inter-annotator agreement, and on the implications for the further development of the corpus. This is important because, for the particular kind of data at issue, the assumption that a single right answer exists for each annotated instance cannot be made in several cases.