Completed by Sonakshi Chauhan.
Overview: This project uses the Titanic dataset to:
- return the conditional survival probability of a passenger
- help you compare and contrast classification models based on accuracy
- produce data visualizations given a condition on a numerical variable from the dataset
Problem Statement: Build a model that returns a passenger's chance of survival given that passenger's details as input.
Data: Titanic Kaggle Challenge
Deliverables: Probability
Ahoy! Let's Sail
- Statistical Modeling
- Imputation of Missing values
- Probability
- Various Classification Techniques
- Scikit-learn
- Google Colab
Ensure that the following packages have been installed and imported.
pip install numpy
pip install pandas
pip install seaborn
Follow the instructions at https://docs.anaconda.com/anaconda/install/ to install Anaconda with Jupyter. Alternatively, VS Code can render Jupyter notebooks.
The structure of this notebook is as follows:
- Imports
- Data Loading
- Data Pre-processing
- Data Analysis
- Data Visualization
- Encoding
- Supporting Target and Features
- Splitting Data
- Model Training
- Testing and Prediction
-> Observing the data above, we found missing values in several columns and rows.
-> We dropped the 'Cabin' column because it had the highest number of missing values.
-> We filled in the missing values in the 'Age' and 'Embarked' columns.
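The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the rows are made-up toy values that only mimic the Kaggle column names, and filling 'Age' with the median and 'Embarked' with the most frequent value is an assumption about how those columns were handled.

```python
import numpy as np
import pandas as pd

# Toy rows that mimic the Kaggle Titanic columns (made-up values, not real data).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 1],
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan],
    "Embarked": ["S", "C", None, "S", "S"],
    "Cabin": [None, "C85", None, None, "C123"],
})

# Count missing values per column to see where the gaps are.
missing_counts = df.isnull().sum()

# Drop 'Cabin', the column with the most missing values.
df = df.drop(columns=["Cabin"])

# Assumption: impute 'Age' with the median and 'Embarked' with the mode.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
```

Median imputation is used here only because it is robust to outliers in age; the mean or a group-wise value would be equally valid choices.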
-> Predictions are based on the survival counts.
-> Here we analyze the number of passengers who survived across the different classes.
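One way to run this kind of analysis is a pandas `groupby`. The sketch below uses made-up toy rows with the Kaggle column names `Survived`, `Pclass`, and `Sex`; which groupings the notebook actually uses is an assumption.

```python
import pandas as pd

# Toy rows (made-up values) using the Kaggle Titanic column names.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
})

# Number of survivors in each passenger class.
survivors_by_class = df.groupby("Pclass")["Survived"].sum()

# Survival rate by sex: the mean of a 0/1 flag is the fraction of ones.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
```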
-> Here we visualize the data to better understand which categories have the highest survival rates.
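A survival-rate bar chart per category could be drawn with seaborn along these lines. This is only a sketch on made-up toy rows; the choice of `Pclass` as the category and of a bar plot as the chart type are assumptions, not necessarily what the notebook plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Toy rows (made-up values) using the Kaggle Titanic column names.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3, 2, 3],
})

# barplot shows the mean of 'Survived' per class, i.e. the survival rate.
ax = sns.barplot(data=df, x="Pclass", y="Survived")
ax.set_xlabel("Passenger class")
ax.set_ylabel("Survival rate")
ax.set_title("Survival rate by passenger class")
```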
#Categorical Encoding
-> Here we encode all categorical values numerically so that every feature has a numeric data type.
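A simple way to do this encoding is to map each category to an integer. The sketch below assumes the text columns are `Sex` and `Embarked` and uses arbitrary integer codes; the notebook may use a different encoder or different codes.

```python
import pandas as pd

# Toy rows (made-up values) with the two text columns to encode.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Assumption: hand-picked integer codes for each category.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})
```

Integer codes imply an ordering that does not really exist for ports of embarkation; tree-based models such as Random Forest tolerate this, while linear models usually work better with one-hot encoding.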
#Supporting Target and Features
-> Here we divide the data into dependent and independent variables: 'Y' holds the dependent (target) values and 'X' holds the independent (feature) values.
#Splitting our Dataset into Train and Test Set
-> Using the scikit-learn library, we split the dataset into train and test sets.
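Separating the target from the features and then splitting can be sketched with scikit-learn's `train_test_split`. The rows are made-up toy values, and the 80/20 split, the fixed `random_state`, and the stratification are assumptions rather than settings taken from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy, already-encoded rows (made-up values).
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 1, 3, 2, 1, 3, 2],
    "Sex": [0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 14.0, 40.0, 31.0],
    "Survived": [0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
})

# 'y' is the dependent variable; 'X' holds the independent variables.
y = df["Survived"]
X = df.drop(columns=["Survived"])

# Assumption: hold out 20% of the rows, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```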
-> First we scale the train and test set values.
-> Then we train multiple classification models to see which one is the most accurate.
-> We found RandomForest to be the most accurate and moved ahead with it.
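The scale-then-compare workflow might look like the sketch below. It runs on synthetic data from `make_classification` as a stand-in for the Titanic features, and the three models listed are an assumed selection; the notebook's actual candidates and scores are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the Titanic dataset).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Assumption: an illustrative set of candidate classifiers.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighbors": KNeighborsClassifier(n_neighbors=5),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Train each model and record its accuracy on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test_scaled))

best_model_name = max(accuracies, key=accuracies.get)
```

Fitting the scaler on the training data alone avoids leaking test-set statistics into training.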
#Prediction and accuracy
-> This is the final step, where we make predictions with the model and test its accuracy.
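Since the deliverable is a probability, a single passenger's survival chance can be read from `predict_proba`. The sketch below trains on made-up toy rows with an assumed feature set (`Pclass`, `Sex`, `Age`) and a hypothetical passenger, and skips scaling for brevity because tree-based models do not depend on feature scale.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy, already-encoded training rows (made-up values).
train = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 1, 3, 2, 1],
    "Sex": [0, 1, 1, 0, 1, 0, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 40.0],
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0],
})
X = train.drop(columns=["Survived"])
y = train["Survived"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Hypothetical passenger: first class, female (encoded as 1), aged 30.
passenger = pd.DataFrame([{"Pclass": 1, "Sex": 1, "Age": 30.0}])

# predict_proba returns one column per class; pick the column for class 1.
survived_column = list(model.classes_).index(1)
survival_probability = model.predict_proba(passenger)[0, survived_column]
```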
#Conclusion
-> We built a classifier using the Random Forest technique to predict Titanic survival rates.
Contact: [email protected]
This project is complete.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.