Skip to content

Latest commit

 

History

History
43 lines (27 loc) · 1.72 KB

README.md

File metadata and controls

43 lines (27 loc) · 1.72 KB

message-translation

License: CC BY-NC 4.0

An assistive writing tool to analyze linguistic and cultural variation across communities

Environment Setup

Please run the following:

conda create -n message python=3.8
pip install -r requirements.txt

Dataset

You can follow the instructions from the public BLM Twitter dataset to download tweets using our filtered tweetid to generate a smaller dataset which contains ~200K pro-BLM tweets and ~100K anti-BLM tweets. The preprocessing code and data are here. After that, move the dataset to ./data/blm_alm/raw/ such that you have the following two files: pro_blm_200k.txt and anti_blm_100k.txt.

Semantic Shift Analysis

cd semantic_shift
# download BERTweet to your local machine
python download_bertweet.py
sh ./bash_scripts/compute_semantic_shifts.sh

Check the notebook to see the analysis.

Cultural and Ideological Analysis

cd ideology-alignment
sh train_script.sh

Check the notebook to see the analysis.

Acknowledgement

This github is developed on the basis of UiO-UvA at SemEval-2020 Task 1 and Aligning Multidimensional Worldviews and Discovering Ideological Differences.