This project aims to predict coral bleaching percentages using various environmental factors. The main analysis and model development are contained in the linear_regression.ipynb Jupyter notebook.
Coral bleaching is a significant environmental issue affecting marine ecosystems worldwide. This project uses linear regression models to predict the percentage of coral bleaching from environmental variables such as sea surface temperature and depth.
The dataset used in this project contains the following key features:
- Year
- Percent bleached
- Depth (meters)
- Climate Sea Surface Temperature (SST)
- Temperature mean
- Sea Surface Temperature Anomaly (SSTA) mean
- SSTA Degree Heating Weeks (DHW)
- Thermal Stress Anomaly (TSA)
- TSA DHW
- Severity code
- Latitude
- Longitude
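As a rough illustration of how a few of these features might be read and cleaned before modeling, here is a minimal sketch using only the Python standard library. The column names (`Depth_m`, `ClimSST`, and so on) and the sample rows are assumptions made up for this example; the notebook's actual headers and loading code (which relies on pandas) may differ.

```python
import csv
import io

# Hypothetical column names; the real dataset's headers may differ.
FEATURES = ["Depth_m", "ClimSST", "Temperature_Mean", "SSTA_Mean", "TSA"]
TARGET = "Percent_Bleached"

# Made-up rows for illustration only, not real observations.
SAMPLE_CSV = """Year,Percent_Bleached,Depth_m,ClimSST,Temperature_Mean,SSTA_Mean,TSA
2005,50.0,10.0,301.5,300.9,0.1,-0.8
2006,nd,4.0,302.1,301.2,0.2,0.3
2010,12.5,7.5,300.2,300.6,-0.1,-1.2
"""


def load_rows(handle, features=FEATURES, target=TARGET):
    """Return (X, y) as lists of floats, skipping rows with unusable values."""
    X, y = [], []
    for record in csv.DictReader(handle):
        try:
            row = [float(record[name]) for name in features]
            label = float(record[target])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric value: drop the whole row
        X.append(row)
        y.append(label)
    return X, y


X, y = load_rows(io.StringIO(SAMPLE_CSV))
print(len(X), "usable rows;", len(FEATURES), "features each")
```

Dropping incomplete rows is only one possible preprocessing choice; imputing missing values is another.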
The project follows these main steps:
- Data Import and Preprocessing
- Exploratory Data Analysis (EDA)
- Model Development
- Model Evaluation
- The relationship between sea surface temperature and coral bleaching percentage is explored visually.
- A geographic plot of the data points is created to show the global distribution of the dataset.
- Multiple linear regression models are developed and compared, including:
  - A model using only Climate Sea Surface Temperature
  - A model using Percent Bleached as input
  - A model using multiple input features
The final model uses multiple input features and achieves the lowest loss of the models compared when predicting bleaching percentage.
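The comparison between a single-feature model and a multi-feature model can be sketched as follows. This is not the notebook's code: the README lists TensorFlow among its dependencies, whereas this sketch swaps in a closed-form least-squares fit in plain Python, assumes mean squared error as the loss, and uses small made-up numbers rather than the real dataset. The helper names `fit_linear`, `predict`, and `mse` are hypothetical.

```python
def fit_linear(X, y):
    """Least-squares fit of y = w . x + b via the normal equations."""
    rows = [list(x) + [1.0] for x in X]  # append a constant column for the bias
    n = len(rows[0])
    # Augmented system [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]  # weights first, bias last


def predict(coeffs, x):
    """Apply the fitted weights and bias to one feature vector."""
    return sum(w * v for w, v in zip(coeffs, x)) + coeffs[-1]


def mse(coeffs, X, y):
    """Mean squared error of the fitted model on (X, y)."""
    return sum((predict(coeffs, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


# Made-up data: the target depends on both features (y = 2*a + 3*b + 1).
X_multi = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [2 * a + 3 * b + 1 for a, b in X_multi]
X_single = [[a] for a, _ in X_multi]  # keep only the first feature

mse_single = mse(fit_linear(X_single, y), X_single, y)
mse_multi = mse(fit_linear(X_multi, y), X_multi, y)
print("single-feature MSE:", mse_single)
print("multi-feature MSE: ", mse_multi)
```

Adding features never increases a least-squares model's loss on the data it was fitted to, so a comparison on held-out data is the more informative check.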
To run this project:
- Ensure you have Jupyter Notebook installed along with the required libraries (pandas, numpy, matplotlib, seaborn, tensorflow, geopandas).
- Open the linear_regression.ipynb file in Jupyter Notebook.
- Run the cells in order to reproduce the analysis and model development.
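Before opening the notebook, it may help to check which of the libraries listed above are importable in the current environment. The following is a small sketch using the standard library; the helper name `missing_packages` is hypothetical and not part of the project.

```python
import importlib.util

# Libraries named in the setup instructions above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "tensorflow", "geopandas"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

This only confirms that each package can be found, not that the installed versions match those the notebook was developed with.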
You can also run this notebook in an interactive environment using Binder.
Possible future improvements include:
- Further feature engineering to improve model performance
- Exploration of more advanced machine learning techniques
- Integration with real-time data for ongoing predictions