So far this repo contains functions for sentiment analysis of tweets: regex-based preprocessing serves as a form of tokenization so that simple bag-of-words-style methods can be used for polarity and sentiment analysis. It also supports automated multi-API search for data collection via Tweepy, though it obviously ships without the API keys.
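A minimal sketch of what such regex preprocessing can look like; the patterns and the function name are illustrative, not the repo's actual code:

```python
import re

# Illustrative cleaner: strip URLs and @mentions, drop non-letter
# characters, lowercase, and split into tokens for a bag-of-words model.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")

def tokenize_tweet(text: str) -> list[str]:
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text.lower())
    return text.split()

print(tokenize_tweet("Loving the new release!! @dev https://example.com #happy"))
# -> ['loving', 'the', 'new', 'release', 'happy']
```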
On the modelling side, the repo currently only contains basic classical machine learning on the data: without more accurate labelling of the tweets, models such as RNNs are neither faster nor more accurate than the baseline, which in my opinion makes them superfluous.
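As an illustration of this kind of classical baseline, here is a bag-of-words pipeline in scikit-learn; the data and labels are toy placeholders, not the repo's actual model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data stands in for labelled tweets.
tweets = ["great product, love it", "terrible service, never again",
          "absolutely fantastic day", "worst experience ever"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Bag-of-words features + logistic regression: a simple classical baseline.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(tweets, labels)
print(model.predict(["what a great day"]))
# likely [1]: "great" and "day" only occur in positive examples here
```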
Other useful analyses that are not yet included but might be added in the future are, for example:
- Cluster and hashtag analysis
- GPS/location data
- A language-compatibility parameter for TextBlob (currently hard-coded)
- Use tqdm for progress tracking when loading tweets (see the first sketch after this list)
- Still searching for efficient ways to search beyond the "recent" week allowed by Twitter's API/Tweepy
- Improve and simplify (for the user) the handling of data-distribution comparisons (MMD vs. t-test vs. ...; see the second sketch after this list)
- Visually improve plots
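For the tqdm item, progress tracking amounts to wrapping whatever iterable yields the tweets; `load_tweets` below is a hypothetical loader, not the repo's function:

```python
from tqdm import tqdm

# Hypothetical loader: wrapping the iterable in tqdm() gives a live
# progress bar while the tweet files are read.
def load_tweets(paths):
    for path in tqdm(paths, desc="Loading tweets"):
        with open(path, encoding="utf-8") as f:
            yield f.read()
```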
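For the distribution-comparison item, here is a sketch of the two methods mentioned side by side: SciPy's two-sample t-test (which tests for a mean shift) and a simple biased MMD² estimate with a Gaussian kernel (which is sensitive to broader distributional differences). The polarity scores are made up for the example:

```python
import numpy as np
from scipy.stats import ttest_ind

def mmd_gaussian(x, y, sigma=1.0):
    """Biased MMD^2 estimate with a Gaussian kernel, for 1-D samples."""
    x = np.asarray(x)[:, None]
    y = np.asarray(y)[:, None]
    def k(a, b):
        return np.exp(-((a - b.T) ** 2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Made-up polarity scores for two tweet samples.
rng = np.random.default_rng(0)
a = rng.normal(0.1, 0.3, 200)
b = rng.normal(0.3, 0.3, 200)

t_stat, p_value = ttest_ind(a, b)  # classic test for a mean difference
print(f"t-test p = {p_value:.4f}, MMD^2 = {mmd_gaussian(a, b):.4f}")
```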
After cloning, run `pip install -r requirements.txt` to get the relevant packages. The requirements file might not include all necessary packages yet; checking that is something I would like to try on Linux rather than Windows.
Rename all `config_default.py` files to `config_local.py` and specify the correct local paths: where the API keys live, the save directory for tweets, and the directory that plots are saved to.
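A `config_local.py` could then look roughly like this; all variable names and paths below are placeholders and must match whatever `config_default.py` actually defines:

```python
# config_local.py -- illustrative only; use the variable names
# from your config_default.py.
API_KEY_PATH = "/home/user/.secrets/twitter_keys.json"  # path to API keys
TWEET_SAVE_DIR = "/home/user/data/tweets"               # downloaded tweets
PLOT_SAVE_DIR = "/home/user/data/plots"                 # generated plots
```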