SentiMP-21 Dataset

The SentiMP-21 Dataset is a multilingual sentiment analysis dataset based on tweets written by members of parliament in Greece, Spain and the United Kingdom in 2021. It has been developed collaboratively by the Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI) research group from the University of Granada and the Cardiff NLP research group from the University of Cardiff.

Andalusian Research Institute in Data Science and Computational Intelligence Sherpa AI

Dataset details

The dataset contains 1500 tweets from three countries: Greece (500 tweets), Spain (500 tweets) and the United Kingdom (500 tweets). For each tweet, we provide the following information:

  • tweet_id: Identifier of the tweet.
  • full_text: Content of the tweet.
  • mp_party: Party to which the member of parliament who wrote the tweet belongs.
  • mp_name: Name of the member of parliament who wrote the tweet.
  • created_at: Date of the tweet.
  • label_i: Label assigned by annotator i (i in {1,2,3} for English and Greek, and i in {1,2,3,4,5} for Spanish). It takes values in {-1,0,1,x}.
  • majority_vote: Result of applying majority voting to the annotators' labels. Ties are marked with the label "TIE". It takes values in {-1,0,1,TIE}.
  • tie_break: Label used to resolve a tie. It is only filled in when TIE appears in the majority_vote column. It takes values in {-1,0,1}.
  • final_label: The final label, obtained by combining the majority_vote and tie_break columns. It takes values in {-1,0,1}.
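As an illustration of how the majority_vote, tie_break and final_label columns relate, here is a minimal Python sketch. The function names are hypothetical and this is not the authors' annotation code. In particular, this README does not say how "x" labels enter the vote, so the sketch simply skips them, which is an assumption.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label, or "TIE" when the top count is shared.

    Assumption: "x" votes are skipped before counting; the README does not
    specify how they are handled.
    """
    counts = Counter(label for label in labels if label != "x")
    if not counts:
        # No usable votes at all: treated as a tie here (assumption).
        return "TIE"
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "TIE"
    return ranked[0][0]


def final_label(majority, tie_break):
    """Use tie_break only when the majority vote is a tie."""
    return tie_break if majority == "TIE" else majority
```

For example, three annotators voting "1", "1" and "0" give a majority_vote of "1", while "1", "0" and "-1" give "TIE", in which case final_label falls back to the tie_break column.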

Downloads

We release three different versions of each dataset:

  • Extended version (full): Includes all the columns for each of the initial 500 tweets.
  • Extended version (without x): The previous version with the tweets labeled "x" removed.
  • Simple version: Keeps only the columns tweet_id, full_text and final_label from the previous version.
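The relationship between the three versions can be sketched in plain Python, with each tweet represented as a dict keyed by column name. This is only an illustration under stated assumptions: the helper names are hypothetical, and a tweet is treated as "labeled with x" when any of its label_i columns equals "x", which this README does not state explicitly.

```python
SIMPLE_COLUMNS = ("tweet_id", "full_text", "final_label")


def has_x_label(row):
    """True if any annotator column (label_1, label_2, ...) holds "x" (assumed rule)."""
    return any(
        value == "x" for key, value in row.items() if key.startswith("label_")
    )


def without_x(rows):
    """Extended version (without x): drop the tweets labeled "x"."""
    return [row for row in rows if not has_x_label(row)]


def simple_version(rows):
    """Simple version: keep three columns of the without-x rows."""
    return [{col: row[col] for col in SIMPLE_COLUMNS} for row in without_x(rows)]
```

Rows read with `csv.DictReader` from the standard library would fit this shape, assuming the files are distributed as CSV.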

You can find these files in the following repositories:

Citation

If you use this dataset, please cite:

Contact

Nuria Rodríguez Barroso - [email protected]

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.