ari-dasci/OD-SentiMP-21

SentiMP-21 Dataset

The SentiMP-21 Dataset is a multilingual sentiment analysis dataset based on tweets written by members of parliament in Greece, Spain and the United Kingdom in 2021. It has been developed collaboratively by the Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI) research group from the University of Granada and the Cardiff NLP research group from the University of Cardiff.

Andalusian Research Institute in Data Science and Computational Intelligence Sherpa AI

Dataset details

The dataset contains 1500 tweets from three countries: Greece (500 tweets), Spain (500 tweets) and the United Kingdom (500 tweets). For each tweet we provide the following information:

  • tweet_id: Identifier of the tweet.
  • full_text: Content of the tweet.
  • mp_party: Party to which the member of parliament who wrote the tweet belongs.
  • mp_name: Name of the member of parliament who wrote the tweet.
  • created_at: Date of the tweet.
  • label_i: Label assigned by annotator i (i in {1,2,3} for English and Greek; i in {1,2,3,4,5} for Spanish). It takes values in {-1,0,1,x}.
  • majority_vote: Result of applying the majority vote strategy to the annotators' individual labels. When there is a tie we use the label "TIE". It takes values in {-1,0,1,TIE}.
  • tie_break: Label used to resolve ties. It is filled in only when TIE appears in the majority_vote column. It takes values in {-1,0,1}.
  • final_label: The final label, obtained by combining the majority_vote and tie_break columns. It takes values in {-1,0,1}.
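
The relationship between the label_i, majority_vote, tie_break and final_label columns can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, labels are treated as strings, a "tie" is assumed to mean that no single label is strictly the most frequent, and "x" is given no special handling because the README does not specify any.

```python
from collections import Counter


def majority_vote(labels):
    """Return the strictly most frequent label, or "TIE" if the top count is shared.

    Assumes `labels` is a non-empty list of strings.
    """
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "TIE"
    return counts[0][0]


def final_label(majority, tie_break):
    """Use the tie_break value only when the majority vote is a tie."""
    return tie_break if majority == "TIE" else majority


# Illustrative row (not taken from the dataset): three annotators disagree.
row = {"label_1": "1", "label_2": "0", "label_3": "-1", "tie_break": "0"}
votes = [row[f"label_{i}"] for i in (1, 2, 3)]
mv = majority_vote(votes)
print(mv, final_label(mv, row["tie_break"]))
```

The same functions apply unchanged to the Spanish data by collecting five labels instead of three.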

Downloads

We release three different versions of each dataset:

  • Extended version (full): We include all the columns for each of the initial 500 tweets.
  • Extended version (without x): We delete the tweets labeled with "x" from the previous version.
  • Simple version: It only keeps the columns tweet_id, full_text and final_label from the previous version.
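
As a rough sketch of how the three versions relate, the snippet below derives the "without x" and simple versions from rows of the full version. It rests on assumptions: rows are represented as plain dictionaries, a tweet counts as "labeled with x" when any label_i column holds "x" (the README does not say which column is checked), and the function names and sample rows are hypothetical.

```python
SIMPLE_COLUMNS = ("tweet_id", "full_text", "final_label")


def without_x(rows):
    """Drop rows in which any label_i column holds "x" (assumed interpretation)."""
    return [
        r for r in rows
        if not any(v == "x" for k, v in r.items() if k.startswith("label_"))
    ]


def simple(rows):
    """Keep only the tweet_id, full_text and final_label columns."""
    return [{c: r[c] for c in SIMPLE_COLUMNS} for r in rows]


# Illustrative rows (not taken from the dataset), with most columns omitted.
full = [
    {"tweet_id": "1", "full_text": "example text A",
     "label_1": "1", "label_2": "1", "label_3": "0", "final_label": "1"},
    {"tweet_id": "2", "full_text": "example text B",
     "label_1": "x", "label_2": "0", "label_3": "0", "final_label": "0"},
]

filtered = without_x(full)
print(simple(filtered))
```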

You can find these files in the following repositories:

Citation

If you use this dataset, please cite:

Contact

Nuria Rodríguez Barroso - [email protected]

Shield: CC BY-SA 4.0

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
