CalderLund/COVID19Predictor

COVID19Predictor

COVID19Predictor is an Apache Spark-based machine learning model designed to predict Novel Coronavirus (COVID19) cases in Canada. It does so using historical Twitter data.

Dataset

  • Historical top 1000 terms, bigrams, and trigrams on Twitter
  • Provincial Daily Cases (Canada)

Preprocessing

The data is processed into a more Spark-readable format in Preprocess.scala and SentimentPreprocess.scala.

Sentiment Analysis

Compute a sentiment score for Twitter discussion of COVID19, and use that score as a feature for the ML algorithm.
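
As a rough illustration of how a sentiment feature could be derived when only top terms (not full tweets) are available, the sketch below computes a count-weighted average polarity per day. It is not the method used in SentimentPreprocess.scala; the `lexicon` values, the `dailySentiment` name, and the sample counts are all hypothetical.

```scala
object SentimentSketch {
  // Hypothetical mini-lexicon: term -> polarity in [-1, 1].
  // A real run would load a published sentiment lexicon instead.
  val lexicon: Map[String, Double] = Map(
    "recovered" -> 1.0,
    "safe"      -> 0.5,
    "outbreak"  -> -0.5,
    "death"     -> -1.0
  )

  /** Count-weighted mean polarity of the terms found in the lexicon.
    * Terms missing from the lexicon are ignored; returns 0.0 if none match. */
  def dailySentiment(termCounts: Map[String, Long]): Double = {
    val scored = termCounts.toSeq.flatMap { case (term, count) =>
      lexicon.get(term.toLowerCase).map(polarity => (polarity * count, count))
    }
    val totalCount = scored.map(_._2).sum
    if (totalCount == 0L) 0.0 else scored.map(_._1).sum / totalCount
  }

  def main(args: Array[String]): Unit = {
    // Made-up term counts for a single day.
    val day = Map("outbreak" -> 300L, "recovered" -> 100L, "toronto" -> 500L)
    println(dailySentiment(day))
  }
}
```

A score built this way only sees isolated terms, so it cannot capture negation or context. That is consistent with the note under Results that full tweets would be preferable.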

Algorithm

Use the top terms as features for a Lasso regression, with the COVID19 sentiment score as an additional feature.
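
For readers unfamiliar with how Lasso is expressed in Spark MLlib, here is a minimal sketch. It assumes the Spark ML library is on the classpath; the column names (`termA`, `termB`, `sentiment`, `cases`), the toy rows, and the `regParam` value are placeholders, not the repository's actual schema or settings. In MLlib, `LinearRegression` with `elasticNetParam = 1.0` applies a pure L1 penalty, which is Lasso.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object LassoSketch {
  // elasticNetParam = 1.0 selects a pure L1 penalty (Lasso);
  // regParam controls how strongly coefficients are shrunk toward zero.
  def buildLasso(regParam: Double): LinearRegression =
    new LinearRegression()
      .setFeaturesCol("features")
      .setLabelCol("cases")
      .setElasticNetParam(1.0)
      .setRegParam(regParam)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LassoSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows: two term counts, a sentiment score, and daily cases.
    val data = Seq(
      (120.0, 30.0, -0.20, 50.0),
      (150.0, 45.0, -0.35, 80.0),
      (200.0, 60.0, -0.40, 130.0),
      (180.0, 50.0, -0.10, 110.0)
    ).toDF("termA", "termB", "sentiment", "cases")

    // Pack the individual feature columns into the single vector column MLlib expects.
    val assembler = new VectorAssembler()
      .setInputCols(Array("termA", "termB", "sentiment"))
      .setOutputCol("features")

    val model = buildLasso(regParam = 0.1).fit(assembler.transform(data))
    println(s"coefficients = ${model.coefficients}, intercept = ${model.intercept}")

    spark.stop()
  }
}
```

With around a thousand candidate terms, the L1 penalty is a reasonable fit because it can drive the coefficients of uninformative terms to exactly zero. The right `regParam` would need to be chosen by validation on held-out days.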

Results

  • Predictions were not accurate
  • More data is needed for proper sentiment analysis (preferably full tweets)
  • A different model (e.g. an LSTM) might have performed better
