Big Data weather forecasting, experimenting with logistic regression, SVM and random forest in a distributed setting by using PySpark

andrea-gasparini/big-data-weather-forecasting

Weather Forecasting with PySpark

In this project I address weather forecasting with Machine Learning and Big Data tools, in order to show whether it is possible to make valuable predictions of meteorological conditions based only on previously observed meteorological data. The classification goal is therefore: given a set of weather measurements, predict which meteorological condition will occur.

For further details, see the presentation slides or the Python Notebook (also published on Databricks).

This project was developed during the 2020-2021 academic year for the Big Data Computing course at Sapienza University of Rome.

Dataset

The dataset comes from Kaggle and contains hourly weather measurements for 36 cities, collected from 2012 to 2017. These five years of data amount to approximately 45,000 measurements per city of temperature, humidity, air pressure and the like.
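As a rough sanity check on these figures, the arithmetic below relates hourly sampling to the row count. It is only a back-of-the-envelope sketch: the helper names are made up here, and it assumes one reading per hour with no gaps and an average year of 365.25 days.

```python
HOURS_PER_YEAR = 24 * 365.25  # average year length, leap days included

ROWS_PER_CITY = 45_000  # approximate figure quoted above
CITIES = 36


def hourly_rows(years):
    """Number of hourly readings in the given number of years."""
    return round(years * HOURS_PER_YEAR)


def years_covered(rows):
    """Time span, in years, covered by the given number of hourly readings."""
    return rows / HOURS_PER_YEAR


if __name__ == "__main__":
    print(hourly_rows(5))                          # readings in exactly five years
    print(round(years_covered(ROWS_PER_CITY), 2))  # span implied by 45,000 readings
    print(ROWS_PER_CITY * CITIES)                  # readings per variable, all cities
```

Exactly five years of hourly data would give 43,830 readings, so roughly 45,000 per city corresponds to a little over five years (about 5.1). Across 36 cities that is on the order of 1.6 million readings per measured variable, assuming every city has a value at every timestamp.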

Author

Andrea Gasparini
