Special repository for Tourism Expenditure Analysis showcase project based on Tanzania Datasets

Tonyloyt/Tourism-Expenditure-in-Tanzania-Analysis

PyCON CONFERENCE 2020 MACHINE LEARNING CHALLENGE

Data science competitions are the best platform to practise and learn new skills

PART A: FRAMING A DATA SCIENCE PROBLEM

The objective was to build a machine learning model that predicts how much a tourist is expected to spend when visiting Tanzania's national parks.

Beforehand, the hypothesis was reframed as a data science problem.

Hypothesis: Tourists are likely to spend more as their number increases.

Data Science Framing of Problem:
To test the hypothesis, a machine learning regressor is required to predict the total cost a tourist can spend.

Contributing features include:

  • When a tourist is planning to visit Tanzania
  • The country a tourist is coming from
  • Number of tourists
  • Mode of payment for tourism services

After building the model, we can interpret it using feature importance to identify the features that best explain the increase in cost per tour.
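As a rough illustration of that idea, the sketch below fits a regressor and ranks features by importance. Everything in it is an assumption made for the example: the data is synthetic, the column names (`total_tourists`, `nights_stayed`, `unrelated`) and the cost formula are invented, and a random forest is simply one regressor that exposes `feature_importances_`. It is not the model or data used in the notebooks.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: names and the cost formula are invented for illustration.
rng = np.random.default_rng(0)
n_rows = 500
total_tourists = rng.integers(1, 7, size=n_rows)   # group size, 1 to 6
nights_stayed = rng.integers(1, 15, size=n_rows)   # length of stay, 1 to 14
unrelated = rng.normal(size=n_rows)                # pure noise feature

# Cost is driven mostly by group size, a little by nights, plus random noise.
total_cost = (
    1000 * total_tourists
    + 50 * nights_stayed
    + rng.normal(0, 100, size=n_rows)
)

feature_names = ["total_tourists", "nights_stayed", "unrelated"]
X = np.column_stack([total_tourists, nights_stayed, unrelated])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, total_cost)

# Rank features from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

On this synthetic data the group-size feature should rank first, which is the kind of evidence that would support the hypothesis on real data.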

PART B: EXPLORATORY DATA ANALYSIS (EDA)

After framing the problem and hypothesis, the next task was EDA on the given dataset. To keep the process simple, I used the Sweetviz library.

The EDA work and observations can be found in this detailed and separate notebook
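For readers who have not used Sweetviz, a minimal sketch of generating a report might look like the following. The sample rows, column names, and the output file name `eda_report.html` are hypothetical placeholders, not the competition data. Sweetviz is a third-party package (`pip install sweetviz`), so it is imported inside the helper and the rest of the snippet runs without it.

```python
import pandas as pd


def make_sample_frame() -> pd.DataFrame:
    """Return a few hypothetical rows; real data would be loaded from a file."""
    return pd.DataFrame(
        {
            "country": ["KENYA", "ITALY", "KENYA", "GERMANY", "ITALY", "KENYA"],
            "total_tourists": [3, 2, 1, 2, 4, 1],
            "payment_mode": ["Cash", "Credit Card", "Cash", "Cash", "Cash", "Other"],
            "total_cost": [3200.0, 2100.0, 900.0, 2500.0, 4100.0, 1000.0],
        }
    )


def build_eda_report(df: pd.DataFrame, target: str, out_path: str = "eda_report.html") -> str:
    """Write a Sweetviz HTML report for df, highlighting the target column."""
    import sweetviz as sv  # third-party; imported here so the rest runs without it

    report = sv.analyze(df, target_feat=target)
    report.show_html(out_path, open_browser=False)
    return out_path


sample = make_sample_frame()
print(sample.describe())
```

Calling `build_eda_report(sample, target="total_cost")` would then write the HTML report to disk, provided Sweetviz is installed.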

PART C: FEATURE ENGINEERING

After EDA, I understood the data better, and the next step was feature engineering. This involved taking a deeper dive into the data and formulating features that would better predict the amount of money a tourist is likely to spend on a tour.

Work relating to this task can be found in this notebook
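To give a flavour of what such features can look like, here is a small sketch using pandas. The column names (`total_female`, `total_male`, `country`, `payment_mode`, `total_cost`) and the rows are assumptions made up for the example, and the two transformations shown (a group-size total and one-hot encoding) are generic choices, not necessarily the ones in the notebook.

```python
import pandas as pd

# Hypothetical raw rows; column names are assumptions for illustration only.
raw = pd.DataFrame(
    {
        "country": ["KENYA", "ITALY", "KENYA", "GERMANY"],
        "total_female": [1, 2, 0, 1],
        "total_male": [2, 0, 1, 1],
        "payment_mode": ["Cash", "Credit Card", "Cash", "Cash"],
        "total_cost": [3200.0, 2100.0, 900.0, 2500.0],
    }
)

features = raw.copy()

# Derived feature: the size of the travelling group.
features["total_tourists"] = features["total_female"] + features["total_male"]

# One-hot encode the categorical columns so a regressor can consume them.
features = pd.get_dummies(features, columns=["country", "payment_mode"])

print(features.columns.tolist())
```

The group-size column ties directly back to the hypothesis in Part A, while the encoded columns cover country of origin and mode of payment.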

PART D: MODEL BUILDING AND EVALUATION

After the features had been formed, I used the polished dataset to build the regression model. This involved testing out several regressors and choosing the best. This was followed by model evaluation to scrutinize performance. The evaluation metric used for the final solution is Mean Absolute Error (MAE).

The notebook to the task can be found here.
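A comparison of this kind can be sketched as below. The three candidate regressors, the 5-fold cross-validation, and the synthetic data are all assumptions chosen for the example; the notebook may use different models and a different validation scheme. MAE is the average of the absolute differences between predicted and actual values, so lower is better.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data, invented for illustration.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 3))
y = 200 * X[:, 0] + 30 * X[:, 1] ** 2 + rng.normal(0, 50, size=300)

# Candidate regressors: an arbitrary selection, not the author's list.
candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

mae_by_model = {}
for name, regressor in candidates.items():
    # scikit-learn reports negated MAE so that higher is always better.
    neg_mae = cross_val_score(
        regressor, X, y, cv=5, scoring="neg_mean_absolute_error"
    )
    mae_by_model[name] = -neg_mae.mean()

best_name = min(mae_by_model, key=mae_by_model.get)
for name, mae in sorted(mae_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.1f}")
print("best:", best_name)
```

Averaging MAE across folds gives a steadier estimate than a single train/test split, which helps when the candidates score close to one another.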

REPORTING, INSIGHTS AND RECOMMENDATION

This is arguably the most critical component, because it is where the client sees the value of the work that has been done. For this, I prepared an executive summary slide detailing the findings, insights and recommendations.

The presentation can be found on this slide.
