This project contains code written to analyze disaster data from Appen and to build a model for an API that classifies disaster messages.
The project also includes a web app where an emergency worker can input a new message and get classification results in several categories. This is a multi-output classification task.
- Installation
- Project Motivation
- Project Overview
- Setup Instructions
- Results
- Acknowledgements
## Installation

There are no major libraries required to run the code beyond what is provided in the Anaconda Python distribution. The code can be run with any version of Python 3.
## Project Motivation

Following a disaster, response organizations receive millions of communications, either directly or via social media, at the very time when they have the least capacity to filter out the most important messages. Typically, different organizations handle different parts of the problem: one might be in charge of water, another of blocked roads, another of fire. Yet often only about one in a thousand messages is relevant to disaster response professionals. Supervised machine learning approaches, which are more accurate than keyword searches, are therefore used to analyze the data and determine which organization should respond to which need.
## Project Overview

There are three main components of this project:
The first component is implemented in `process_data.py`; as sketched below, it:

- Loads the `messages` and `categories` datasets.
- Merges the two datasets.
- Cleans the data.
- Stores the cleaned data in a SQLite database.
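A minimal sketch of those steps is below. It assumes the two CSVs share an `id` column and that the `categories` column encodes labels as `label-0;label-1;...`, as in the standard disaster response dataset; the table name `DisasterMessages` is illustrative, not necessarily what `process_data.py` uses.

```python
import sys

import pandas as pd
from sqlalchemy import create_engine


def main(messages_filepath, categories_filepath, database_filepath):
    # Load the messages and categories datasets
    messages = pd.read_csv(messages_filepath)
    categories = pd.read_csv(categories_filepath)

    # Merge the two datasets on their shared id column (assumed layout)
    df = messages.merge(categories, on="id")

    # Expand the semicolon-separated categories column
    # ("related-1;request-0;...") into one numeric column per label
    cat_df = df["categories"].str.split(";", expand=True)
    cat_df.columns = cat_df.iloc[0].str.rsplit("-", n=1).str[0]
    for col in cat_df.columns:
        cat_df[col] = cat_df[col].str[-1].astype(int)

    # Clean: swap in the expanded labels and drop duplicate rows
    df = pd.concat([df.drop(columns="categories"), cat_df], axis=1)
    df = df.drop_duplicates()

    # Store the result in a SQLite database
    engine = create_engine(f"sqlite:///{database_filepath}")
    df.to_sql("DisasterMessages", engine, index=False, if_exists="replace")


if __name__ == "__main__":
    main(*sys.argv[1:4])
```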
The second component, given in `train_classifier.py`, is a machine learning pipeline that (see the sketch below):

- Loads data from the SQLite database.
- Splits the dataset into training and testing sets.
- Builds a text processing and machine learning pipeline.
- Trains and tunes a model using GridSearchCV.
- Outputs the results on the test set.
- Exports the final model as a pickle file.
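A minimal sketch of such a multi-output pipeline, using scikit-learn's `TfidfVectorizer` and a `MultiOutputClassifier` over a random forest; the table name, column layout, and hyperparameter grid are assumptions, not necessarily what `train_classifier.py` uses:

```python
import pickle
import sys

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sqlalchemy import create_engine


def main(database_filepath, model_filepath):
    # Load data from the SQLite database
    engine = create_engine(f"sqlite:///{database_filepath}")
    df = pd.read_sql_table("DisasterMessages", engine)
    X = df["message"]
    Y = df.drop(columns=["id", "message", "original", "genre"])  # assumed layout

    # Split the dataset into training and testing sets
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=42)

    # Text processing + multi-output machine learning pipeline
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("clf", MultiOutputClassifier(RandomForestClassifier())),
    ])

    # Train and tune the model over a small grid using GridSearchCV
    param_grid = {"clf__estimator__n_estimators": [50, 100]}
    model = GridSearchCV(pipeline, param_grid=param_grid, cv=3)
    model.fit(X_train, Y_train)

    # Output results on the test set, one category at a time
    Y_pred = model.predict(X_test)
    for i, col in enumerate(Y.columns):
        print(col)
        print(classification_report(Y_test[col], Y_pred[:, i]))

    # Export the final model as a pickle file
    with open(model_filepath, "wb") as f:
        pickle.dump(model, f)


if __name__ == "__main__":
    main(*sys.argv[1:3])
```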
The third component is the web app (sketched below):

- Here I use my knowledge of Flask, HTML, CSS, and JavaScript to build the web app.
- I also add data visualizations to the web app using Plotly.
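A minimal sketch of such an app, assuming the database and model produced by the two pipelines above; the routes, template names (`master.html`, `go.html`), file paths, port, and column layout are assumptions, not necessarily what `run.py` uses:

```python
import json
import pickle

import pandas as pd
from flask import Flask, render_template, request
from plotly.utils import PlotlyJSONEncoder
from sqlalchemy import create_engine

app = Flask(__name__)

# Load the cleaned data and the trained model (paths are illustrative)
engine = create_engine("sqlite:///../data/DisasterResponse.db")
df = pd.read_sql_table("DisasterMessages", engine)
with open("../models/classifier.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/")
def index():
    # Plotly bar chart of message counts per genre, passed to the
    # template as JSON for rendering with Plotly.js
    genre_counts = df.groupby("genre").count()["message"]
    graphs = [{
        "data": [{"type": "bar",
                  "x": list(genre_counts.index),
                  "y": list(genre_counts.values)}],
        "layout": {"title": "Distribution of Message Genres"},
    }]
    graph_json = json.dumps(graphs, cls=PlotlyJSONEncoder)
    return render_template("master.html", graphJSON=graph_json)


@app.route("/go")
def go():
    # Classify the message typed into the search form and show
    # the predicted label for every category
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    results = dict(zip(df.columns[4:], labels))  # label columns assumed to start at index 4
    return render_template("go.html", query=query, classification_result=results)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001, debug=True)
```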
## Setup Instructions

- Run the following commands in the project's root directory to set up your database and model:
  - To run the ETL pipeline that cleans the data and stores it in the database:
    `python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db`
  - To run the ML pipeline that trains the classifier and saves it:
    `python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl`
- Go to the `app` directory: `cd app`
- Run your web app: `python run.py`
- Click on the web link to visualize the app.
## Results

Here I provide a snapshot of the built web app.
## Acknowledgements

- This project was inspired by the Data Science Nanodegree program at Udacity.
- The data was provided by Appen.