Data science competitions are one of the best platforms to practise and learn new skills.
The objective was to create a machine learning model that predicts the cost a tourist is expected to spend when visiting Tanzania national parks.
Beforehand, I reframed the hypothesis as a data science problem.
Hypothesis: How much tourists spend is likely to depend on their number.
Data Science Framing of Problem:
To test the hypothesis, a machine learning regressor is required to predict the total cost a tourist is likely to spend.
Contributing features include:
- When a tourist is planning to visit Tanzania
- The country a tourist is coming from
- The number of tourists
- The mode of payment for the tourism service
After building the model, we can inspect its interpretability using feature importance
to identify the features that best explain an increase in cost per tour.
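As a rough illustration of the feature importance step, the sketch below fits a random forest on synthetic data and ranks the features by impurity-based importance. The column names (`num_tourists`, `visit_month`, `pays_by_card`), the cost formula and the choice of `RandomForestRegressor` are all assumptions made for the example; they are not taken from the competition dataset or from the final solution.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: column names and the cost relationship are
# assumptions for illustration, not the competition dataset.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "num_tourists": rng.integers(1, 10, size=n),
    "visit_month": rng.integers(1, 13, size=n),
    "pays_by_card": rng.integers(0, 2, size=n),
})
# Cost is driven mostly by group size, plus a small payment effect and noise.
y = 1000 * X["num_tourists"] + 200 * X["pays_by_card"] + rng.normal(0, 100, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

If the hypothesis holds on real data, the group-size feature should rank near the top. Categorical features such as country would need encoding before being passed to a model like this one.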
After framing the problem and hypothesis, the next task was EDA on the given dataset. To make the process a breeze, I used the powerful yet simple Sweetviz library.
The EDA work and observations can be found in this detailed and separate notebook.
After EDA, I understood the data better, and the next step was feature engineering. This involved taking a deeper dive into the data and formulating features that would better predict the amount of money a tourist is likely to spend on a tour.
Work relating to this task can be found in this notebook.
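To give a flavour of what such features might look like, here is a minimal pandas sketch. The raw columns (`country`, `total_female`, `total_male`, `payment_mode`) and the derived features are hypothetical, chosen to mirror the contributing features listed earlier; the real dataset and the features in the notebook may differ.

```python
import pandas as pd

# Hypothetical raw columns; the real dataset's column names may differ.
df = pd.DataFrame({
    "country": ["KENYA", "ITALY", "KENYA", "FRANCE"],
    "total_female": [1, 2, 0, 3],
    "total_male": [1, 0, 1, 2],
    "payment_mode": ["Cash", "Credit Card", "Cash", "Cash"],
})

# Group size: the quantity the hypothesis is about.
df["group_size"] = df["total_female"] + df["total_male"]

# Flag solo travellers, who may spend differently from groups.
df["is_solo"] = (df["group_size"] == 1).astype(int)

# One-hot encode the categorical columns so a regressor can consume them.
features = pd.get_dummies(df, columns=["country", "payment_mode"])
print(features.head())
```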
After the features had been formed, I used the polished dataset to build the regression model. This involved testing several regressors and choosing the best one, followed by model evaluation to scrutinize performance. The evaluation metric used for the final solution is Mean Absolute Error.
The notebook for this task can be found here.
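The sketch below shows one way to compare several regressors by Mean Absolute Error, that is, the average of |actual - predicted| over a held-out set. The synthetic data and the three candidate models are assumptions for illustration only; they are not necessarily the regressors tested in the notebook.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the competition dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 3))
y = 500 * X[:, 0] + 50 * X[:, 1] ** 2 + rng.normal(0, 100, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate models, including a naive baseline that always predicts the mean.
candidates = {
    "baseline_mean": DummyRegressor(strategy="mean"),
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each candidate and score it on the held-out split (lower MAE is better).
scores = {}
for name, reg in candidates.items():
    reg.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, reg.predict(X_test))

best = min(scores, key=scores.get)
for name, mae in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {mae:.1f}")
print("best:", best)
```

Including a naive baseline makes it easier to judge whether a model's MAE is actually good, since MAE is expressed in the same units as the target.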
Presenting the results is arguably the most critical component. This is where your client sees the value of the work you have been doing. For this, I prepared an executive summary slide detailing the findings, insights and recommendations.
The presentation can be found on this slide.