PyCON CONFERENCE 2020 MACHINE LEARNING CHALLENGE

Data science competitions are the best platform to practise and learn new skills

PART A: FRAMING A DATA SCIENCE PROBLEM

The objective was to create a machine learning model that predicts the amount a tourist is expected to spend when visiting Tanzania's national parks.

Beforehand, the hypothesis was reframed as a data science problem.

Hypothesis: Tourists are likely to spend more as their number increases.
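As a quick first look at a hypothesis like this, one could check how strongly the number of tourists and the total cost move together. The sketch below computes a Pearson correlation in plain Python. The values in `group_size` and `total_cost` are made up for illustration and are not taken from the competition dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative values, NOT from the competition dataset.
group_size = [1, 2, 2, 3, 4, 5]
total_cost = [500.0, 900.0, 1100.0, 1400.0, 2100.0, 2400.0]

r = pearson(group_size, total_cost)
print(f"correlation between group size and total cost: {r:.3f}")
```

A value close to 1 would be consistent with the hypothesis, although correlation alone does not show that group size drives the spending.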

Data Science Framing of the Problem:
To test this hypothesis, a machine learning regressor is required to predict the total cost a tourist is likely to spend.

Contributing features include:

When the tourist is planning to visit Tanzania
The country the tourist is coming from
The number of tourists
The mode of payment for the tourism service
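The features listed above mix numbers and categories, so they usually need to be turned into a numeric vector before a regressor can use them. The sketch below shows one possible way to do that with one-hot encoding. The `Trip` class, its field names, and the category lists are hypothetical and chosen only for illustration; they are not the column names of the competition dataset.

```python
from dataclasses import dataclass

@dataclass
class Trip:
    # Hypothetical field names, not the dataset's actual columns.
    travel_month: int        # when the tourist plans to visit (1-12)
    country: str             # country the tourist is coming from
    number_of_tourists: int  # size of the travelling group
    payment_mode: str        # how the tourism service is paid for

def one_hot(value, categories):
    """Return a 0/1 vector with a 1 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]

def to_features(trip, countries, payment_modes):
    """Flatten a Trip into a numeric feature vector."""
    return (
        [float(trip.travel_month), float(trip.number_of_tourists)]
        + one_hot(trip.country, countries)
        + one_hot(trip.payment_mode, payment_modes)
    )

# Hypothetical category lists for illustration only.
countries = ["KENYA", "ITALY", "OTHER"]
payment_modes = ["CASH", "CREDIT CARD"]

trip = Trip(travel_month=7, country="ITALY", number_of_tourists=2, payment_mode="CASH")
print(to_features(trip, countries, payment_modes))
# → [7.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```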

After building the model, we can inspect its interpretability using feature importance to identify the features that best explain the increase in cost per tour.
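One common, model-agnostic way to estimate feature importance is permutation importance: shuffle one feature column at a time and measure how much the error grows. The sketch below is a minimal plain-Python version under stated assumptions. `toy_model` is a hypothetical stand-in for a fitted regressor, the feature names and data are made up, and this is not necessarily the method used in the notebooks.

```python
import random

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, y_true, n_repeats=5, seed=0):
    """Mean increase in MAE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_absolute_error(y_true, [predict(r) for r in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical stand-in for a fitted regressor: it uses columns 0 and 1
# and ignores column 2 entirely.
def toy_model(row):
    return 400.0 * row[0] + 50.0 * row[1]

# Made-up rows: [number_of_tourists, nights_stayed, unused_feature]
rows = [[1, 3, 7], [2, 5, 1], [3, 2, 4], [4, 7, 2], [5, 1, 9], [6, 4, 3]]
costs = [toy_model(r) for r in rows]  # made-up targets the toy model fits exactly

feature_names = ["number_of_tourists", "nights_stayed", "unused_feature"]
scores = permutation_importance(toy_model, rows, costs)
for name, score in zip(feature_names, scores):
    print(f"{name}: {score:.1f}")
```

A feature the model ignores gets an importance of zero, while features the model relies on produce a larger error once shuffled.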

PART B: EXPLORATORY DATA ANALYSIS (EDA)

After framing the problem and hypothesis, the next task was EDA on the dataset provided. To make the process a breeze, I used the powerful yet simple Sweetviz library.

The EDA work and observations can be found in this separate, detailed notebook.

PART C: FEATURE ENGINEERING

After EDA, I understood the data better, and the next step was feature engineering. This involved taking a deeper dive into the data and formulating features that would better predict the amount of money a tourist is likely to spend on a tour.

Work relating to this task can be found in this notebook.

PART D: MODEL BUILDING AND EVALUATION

After the features had been formed, I used the polished dataset to build the regression model. This involved testing several regressors and choosing the best one. This was followed by model evaluation to scrutinize performance. The evaluation metric used for the final solution is Mean Absolute Error (MAE).

The notebook for this task can be found here.
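The selection step described above can be sketched as fitting each candidate on a training split and keeping the one with the lowest MAE on held-out data, where MAE is the average of |actual - predicted|. The three candidates below (mean baseline, median baseline, one-feature least-squares line) and all data values are assumptions made for illustration; the regressors actually compared are in the notebook.

```python
from statistics import mean, median

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_mean(x_train, y_train):
    """Baseline that always predicts the training mean."""
    m = mean(y_train)
    return lambda x: m

def fit_median(x_train, y_train):
    """Baseline that always predicts the training median."""
    m = median(y_train)
    return lambda x: m

def fit_line(x_train, y_train):
    """Ordinary least squares on a single feature."""
    mx, my = mean(x_train), mean(y_train)
    slope = sum((x - mx) * (y - my) for x, y in zip(x_train, y_train)) / sum(
        (x - mx) ** 2 for x in x_train
    )
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

# Made-up data: x is the number of tourists, y is the total cost.
x_train, y_train = [1, 2, 3, 4, 5], [450.0, 950.0, 1300.0, 1900.0, 2300.0]
x_valid, y_valid = [2, 4, 6], [900.0, 1800.0, 2900.0]

candidates = {"mean baseline": fit_mean, "median baseline": fit_median, "linear fit": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit(x_train, y_train)
    scores[name] = mean_absolute_error(y_valid, [model(x) for x in x_valid])

best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: MAE = {score:.2f}")
print("best candidate:", best)
```

MAE is expressed in the same unit as the target, so a score can be read directly as the typical size of the prediction error in spending.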

REPORTING, INSIGHTS AND RECOMMENDATION

Arguably the most critical component. This is where your client sees the value of the work you have been doing. For this, I prepared an executive summary slide detailing the findings, insights and recommendations.

The presentation can be found in this slide deck.

Tonyloyt/Tourism-Expenditure-in-Tanzania-Analysis
