by Felipe Alamos and Bhargavi Ganesh
The goal of this study is to analyze whether Twitter data is a good proxy for public opinion and, in particular, whether tweets reveal the same information as traditional surveys. A full report of the project can be found in the file 'Is Twitter a Proxy for Public Opinion_Alamos-Ganesh.pdf'.
- download_tweets.py: Python script that downloads tweets matching a given list of keywords. To call it, pass max_tweets followed by the keywords (keyword_1, keyword_2, etc.). Example: python3 download_tweets.py 100 '#environment' environment (quote the hashtag so that the shell does not treat it as the start of a comment).
- config.yaml: configuration file specifying additional parameters for downloading tweets (countries, dates and search radius).
- downloaded_tweets folder: contains the CSV files of downloaded tweets.
- data_cleaning_environment.R and data_cleaning_climate_change.R: R scripts that read the raw CSV files of downloaded tweets, clean them and write new files containing the corpus.
- corpus_df_#environment-usa.csv and corpus_df_#climatechange-usa.csv: the two files generated by the previous scripts. The former is used for topic modelling and the latter for sentiment analysis.
- topic_modeling.R: R script that runs the topic modelling analysis on the tweet corpora.
- plots folder: plots and images from the topic modelling.
- sentiment_analysis.R: main script for running sentiment analysis on the tweets.
- vader_final.py: Python script that performs sentiment analysis with the VADER dictionary.
- eda_survey.R: script that plots a basic exploratory data analysis (EDA) of the survey responses, for both the environmental-issues and the climate-change questions.
- GetOldTweets3: library used to download tweets; install it with pip install GetOldTweets3.
We used this example as a general guideline for topic modelling on Twitter data.
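The download_tweets.py invocation listed above takes a tweet limit followed by one or more keywords. A minimal sketch of how such a command line could be parsed with Python's standard argparse module is shown here; the argument names are assumptions, the real script may read sys.argv differently, and the actual GetOldTweets3 query is omitted.

```python
import argparse


def parse_args(argv=None):
    """Parse a command line of the form: max_tweets keyword_1 [keyword_2 ...].

    Hypothetical sketch: argument names are assumptions, not necessarily
    the ones used by download_tweets.py.
    """
    parser = argparse.ArgumentParser(
        description="Download tweets matching the given keywords."
    )
    # First positional argument: upper bound on the number of tweets to fetch.
    parser.add_argument("max_tweets", type=int)
    # Remaining positional arguments: one or more keywords or hashtags.
    parser.add_argument("keywords", nargs="+")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.max_tweets, args.keywords)
```

Using nargs="+" makes argparse reject a call with no keywords, which is a cheap guard against the unquoted-hashtag mistake (where the shell silently drops everything after the "#").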
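The cleaning step is done in R by data_cleaning_environment.R and data_cleaning_climate_change.R. The Python sketch below only illustrates the kind of normalization such a step typically performs (lowercasing and removing links, @mentions and punctuation); it is not a translation of those scripts, and the helpers clean_tweet and build_corpus, as well as the CSV column name "text", are hypothetical.

```python
import csv
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9#\s]")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Normalize one raw tweet for a text corpus (illustrative only)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    # Replace anything other than ASCII letters, digits, '#' and whitespace,
    # which removes punctuation and emoji but keeps hashtags intact.
    text = NON_WORD_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def build_corpus(csv_path, text_column="text"):
    """Read a CSV of downloaded tweets and return the cleaned texts.

    The column name "text" is an assumption about the CSV layout.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [clean_tweet(row[text_column]) for row in csv.DictReader(f)]
```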
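VADER scores a text by summing the valences of the words it finds in its lexicon (after adjusting them with rules for negation, intensifiers, capitalization and punctuation) and then squashing that sum into [-1, 1] with x / sqrt(x^2 + alpha), where alpha = 15, to obtain the compound score. The toy sketch below shows only the summing and normalization steps, assuming a tiny made-up lexicon and none of VADER's rules; vader_final.py presumably relies on a full VADER implementation instead (one common option is the vaderSentiment package's SentimentIntensityAnalyzer).

```python
import math

# Toy valence lexicon; words and scores are chosen for illustration only.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}


def normalize(score, alpha=15):
    """Map an unbounded valence sum into [-1, 1], like VADER's compound score."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def toy_compound(text, lexicon=TOY_LEXICON):
    """Sum the valences of known words and normalize the total.

    Ignores VADER's rules for negation, intensifiers, capitalization
    and punctuation, so it is only a rough approximation.
    """
    total = sum(lexicon.get(word, 0.0) for word in text.lower().split())
    return normalize(total)
```

Because the normalization saturates, tweets with many positive words approach but never exceed 1, which keeps compound scores comparable across tweets of different lengths.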