COVID19Predictor is an Apache Spark-based machine learning model designed to predict novel coronavirus (COVID-19) cases in Canada. It does so using historical tweets from Twitter.
- Historical top 1000 terms, bigrams, and trigrams on Twitter
- Provincial Daily Cases (Canada)
The data is processed into a more Spark-readable format in Preprocess.scala and SentimentPreprocess.scala.
Run sentiment analysis on the Twitter data to obtain a measure of sentiment toward COVID-19. Use this sentiment as a feature for the ML algorithm.
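As a rough illustration of how a sentiment feature could be derived from term counts alone, here is a minimal lexicon-based sketch. It is not taken from SentimentPreprocess.scala: the `SentimentSketch` object, the `lexicon` entries and their weights, and the `dailySentiment` helper are all hypothetical and assume the input is a map from term to daily count.

```scala
object SentimentSketch {
  // Hypothetical polarity lexicon; terms and weights are illustrative assumptions.
  val lexicon: Map[String, Double] = Map(
    "recovered" -> 1.0,
    "safe"      -> 0.5,
    "outbreak"  -> -0.5,
    "death"     -> -1.0
  )

  // Count-weighted mean polarity over the terms found in the lexicon.
  // Returns 0.0 when none of the day's terms appear in the lexicon.
  def dailySentiment(termCounts: Map[String, Long]): Double = {
    // Convert to a Seq first so that equal (score, count) pairs are not merged.
    val scored = termCounts.toSeq.collect {
      case (term, n) if lexicon.contains(term) => (lexicon(term) * n, n)
    }
    val total = scored.map(_._2).sum
    if (total == 0L) 0.0 else scored.map(_._1).sum / total
  }
}
```

For example, `SentimentSketch.dailySentiment(Map("recovered" -> 10L, "death" -> 30L, "toronto" -> 100L))` evaluates to `-0.5`, since only the two lexicon terms contribute. A score built this way ignores context and negation, which is one reason full tweets would be preferable.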
Use the top terms as features for a Lasso regression, together with the COVID-19 sentiment feature.
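The regression step can be sketched with Spark ML, where a `LinearRegression` with `elasticNetParam` set to 1.0 applies a pure L1 penalty (Lasso). This is a minimal sketch and not the project's actual code: the column names, the toy rows, and the `regParam` value are made-up assumptions for illustration only.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.{DataFrame, SparkSession}

object LassoSketch {
  // Hypothetical feature columns: two term counts plus the sentiment score.
  val featureCols: Array[String] = Array("term_covid", "term_lockdown", "sentiment")

  // Packs the individual feature columns into a single vector column.
  val assembler: VectorAssembler = new VectorAssembler()
    .setInputCols(featureCols)
    .setOutputCol("features")

  // elasticNetParam = 1.0 selects the L1 (Lasso) penalty;
  // regParam = 0.1 is an arbitrary strength that would need tuning.
  val lasso: LinearRegression = new LinearRegression()
    .setFeaturesCol("features")
    .setLabelCol("cases")
    .setElasticNetParam(1.0)
    .setRegParam(0.1)

  def fit(df: DataFrame): LinearRegressionModel =
    lasso.fit(assembler.transform(df))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LassoSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up toy rows, one per day: term counts, sentiment, reported cases.
    val df = Seq(
      (120.0, 40.0, -0.2, 15.0),
      (300.0, 90.0, -0.4, 42.0),
      (450.0, 150.0, -0.5, 80.0),
      (500.0, 200.0, -0.3, 95.0)
    ).toDF("term_covid", "term_lockdown", "sentiment", "cases")

    val model = fit(df)
    println(s"coefficients: ${model.coefficients}, intercept: ${model.intercept}")

    spark.stop()
  }
}
```

With a thousand candidate terms, the L1 penalty tends to drive many coefficients to exactly zero, so the fitted model doubles as a rough feature selector.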
- Predictions were not accurate
- More data is needed for proper sentiment analysis (preferably full tweets)
- Potentially a better model could have been used (e.g. LSTM)