Comparing Twitter data and survey data: a topic modeling and sentiment analysis approach

by Felipe Alamos and Bhargavi Ganesh

Goal of study

The goal of this study is to analyze whether Twitter data is a good proxy for public opinion and, in particular, whether tweets reveal the same information as traditional surveys. A full report of the project can be found in the file 'Is Twitter a Proxy for Public Opinion_Alamos-Ganesh.pdf'.

Files in repo

download_tweets folder

  • download_tweets.py: Python script that downloads tweets containing the given keywords. To call it, pass max_tweets followed by the keywords (keyword_1, keyword_2, etc.). Ex: python3 download_tweets.py 100 '#environment' environment (quote hashtags, since an unquoted # starts a comment in most shells)

  • config.yaml: configuration file specifying some other parameters for downloading tweets (countries, dates and radius of search).

  • downloaded_tweets folder: contains the CSV files with the downloaded tweets.
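
The command-line interface described above (max_tweets followed by one or more keywords) could be parsed roughly as follows. This is a minimal sketch, not the code in download_tweets.py: the function name `parse_args` and the use of `argparse` are assumptions made for illustration.

```python
import argparse


def parse_args(argv):
    """Parse `max_tweets keyword_1 keyword_2 ...` (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Download tweets that match the given keywords."
    )
    # First positional argument: upper bound on the number of tweets.
    parser.add_argument("max_tweets", type=int,
                        help="maximum number of tweets to download")
    # Remaining positional arguments: keywords or hashtags.
    parser.add_argument("keywords", nargs="+",
                        help="one or more keywords or hashtags")
    return parser.parse_args(argv)


# Equivalent to: python3 download_tweets.py 100 '#environment' environment
args = parse_args(["100", "#environment", "environment"])
print(args.max_tweets, args.keywords)
```

Using `nargs="+"` makes the script reject calls that give no keywords at all, which seems to match the intended usage.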

data_cleaning folder

  • data_cleaning_environment.R and data_cleaning_climate_change.R: two R scripts that read the raw .csv files of downloaded tweets, clean them, and create new files with the corpus.

  • corpus_df_#environment-usa.csv and corpus_df_#climatechange-usa.csv are the two files generated by the previous scripts. The former will be used for topic modelling and the latter for sentiment analysis.
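
The cleaning scripts in this folder are written in R and their exact steps are not listed here. As a rough illustration only, the Python sketch below shows cleaning steps that are common before topic modeling on tweets (lowercasing, removing URLs and mentions, dropping punctuation). The function name `clean_tweet` and the choice of steps are assumptions, not a description of the R scripts.

```python
import re

URL_RE = re.compile(r"https?://\S+")      # links
MENTION_RE = re.compile(r"@\w+")          # @user mentions
NON_ALPHA_RE = re.compile(r"[^a-z\s]")    # anything but lowercase letters and whitespace


def clean_tweet(text):
    """Return a lowercased tweet without URLs, mentions, or punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")         # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)
    return " ".join(text.split())         # collapse repeated whitespace


print(clean_tweet("RT @user: Protect the #Environment! https://t.co/abc123"))
```

Stop-word removal and stemming are deliberately left out of this sketch; whether they are applied depends on the actual scripts.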

topic_modeling folder

  • topic_modeling.R: R script that runs the topic modelling analysis on the tweet corpora.

  • plots folder: contains plots and images from the topic modelling.

sentiment analysis

  • sentiment_analysis.R: main script to run sentiment analysis on the tweets.
  • vader_final.py: Python script that performs sentiment analysis with the VADER dictionary.
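
For readers unfamiliar with VADER, the sketch below shows one way to score tweets with it. It is not the content of vader_final.py. It assumes the third-party `vaderSentiment` package is installed, and the helper names `label_from_compound` and `score_tweets` are hypothetical. The ±0.05 cutoffs on the compound score are the thresholds conventionally suggested in the VADER documentation; the study may have used different ones.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweets(texts):
    """Return (text, compound, label) for each tweet.

    Assumption: the `vaderSentiment` package is available. It is imported
    here, inside the function, so the labeling helper works without it.
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    results = []
    for text in texts:
        # polarity_scores returns a dict with 'neg', 'neu', 'pos', 'compound'.
        compound = analyzer.polarity_scores(text)["compound"]
        results.append((text, compound, label_from_compound(compound)))
    return results


print(label_from_compound(0.6), label_from_compound(-0.3), label_from_compound(0.0))
```

Calling `score_tweets(["I love clean energy", "Pollution is terrible"])` would return one tuple per tweet, provided the package is installed.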

eda_survey folder

  • eda_survey.R: script that plots a basic exploratory data analysis (EDA) of the survey responses, for both the environmental issues and the climate change questions.

Setup

References

We used this example as a general guideline to conduct topic modelling on Twitter data.