World Bank Data Analysis with Python

Introduction

This project applies data wrangling, data engineering, and reporting techniques to a World Bank Demography and Census dataset to identify the key factors influencing each country's GDP growth. The project is scoped to 13 Asian countries, grouped here under the label "Central Asia":

Bangladesh
Bhutan
China
India
Kazakhstan
Kyrgyzstan
Maldives
Mongolia
Myanmar
Nepal
Sri Lanka
Tajikistan
Thailand

Python libraries used:

pandas
matplotlib.pyplot
seaborn
numpy
math
statsmodels.api
pylab
statsmodels.stats.diagnostic
statsmodels.stats.outliers_influence.variance_inflation_factor
sklearn.linear_model.LinearRegression
sklearn.model_selection.train_test_split
sklearn.metrics (mean_squared_error, r2_score, mean_absolute_error)
scipy.stats

Data management strategy:

Data Wrangling

Count nulls in each column (e.g. with isnull().sum()) and visualize missing values with plots such as heatmaps.
Replace nulls or missing values with a suitable statistic of each column (mean, median, or mode), or drop the rows that contain them.
Consult online sources for key figures needed to fill in or correct nulls and missing values.
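A minimal pandas sketch of these wrangling steps follows. It assumes a DataFrame with hypothetical columns (country, gdp_growth, tax_revenue) holding toy values, not World Bank figures; the helper names null_report and fill_nulls are illustrative and not taken from the project's notebook.

```python
import numpy as np
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.Series:
    """Count nulls per column, largest count first."""
    return df.isnull().sum().sort_values(ascending=False)


def fill_nulls(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Fill numeric nulls with each column's mean or median and
    non-numeric nulls with each column's mode."""
    out = df.copy()
    numeric = out.select_dtypes(include="number").columns
    if strategy == "mean":
        out[numeric] = out[numeric].fillna(out[numeric].mean())
    elif strategy == "median":
        out[numeric] = out[numeric].fillna(out[numeric].median())
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    # Categorical/text columns: use the most frequent value, if any exists.
    for col in out.columns.difference(numeric):
        modes = out[col].mode()
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])
    return out


# Toy data for illustration only (hypothetical columns and values).
toy = pd.DataFrame({
    "country": ["A", "B", None, "B"],
    "gdp_growth": [6.0, np.nan, 4.0, 5.0],
    "tax_revenue": [10.0, 12.0, np.nan, 14.0],
})

print(null_report(toy))
filled = fill_nulls(toy)   # impute with median / mode
dropped = toy.dropna()     # the alternative: drop rows containing any null
```

Median imputation is shown as the default because it is less sensitive to outliers than the mean; dropping rows is simpler but shrinks an already small 13-country sample.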

Data Engineering

Run a correlation matrix to identify relationships among columns.
Plot the distribution of each attribute to check its central tendency, skewness, and outliers.
Run linear regression on the variables most highly correlated with GDP growth.
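The engineering steps can be sketched as below on synthetic data. The project lists sklearn's LinearRegression and statsmodels for regression; this sketch swaps in numpy.linalg.lstsq, which fits the same ordinary least squares model with an intercept, to keep dependencies light. The column names (gdp_growth, x1, x2), the 1.5 × IQR outlier rule, and the helper names are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def top_correlates(df: pd.DataFrame, target: str, k: int = 3) -> pd.Series:
    """Rank numeric columns by absolute Pearson correlation with `target`."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index).head(k)


def iqr_outliers(s: pd.Series) -> pd.Series:
    """Flag values more than 1.5 * IQR beyond the quartiles (boxplot rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)


def fit_ols(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with an intercept column; returns (coef, R^2).
    coef[0] is the intercept, coef[1:] the slopes in column order of X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1 - ss_res / ss_tot


# Synthetic data: gdp_growth depends on x1 only; x2 is pure noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
df = pd.DataFrame({
    "gdp_growth": 2 + 3 * x1 + 0.1 * rng.normal(size=n),
    "x1": x1,
    "x2": x2,
})

print(top_correlates(df, "gdp_growth", k=2))
print(df.skew())  # sample skewness of each column
coef, r2 = fit_ols(df[["x1", "x2"]].to_numpy(), df["gdp_growth"].to_numpy())
```

With several correlated predictors, checking variance inflation factors (the project lists statsmodels' variance_inflation_factor) before trusting individual coefficients helps spot multicollinearity.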

Reporting

Bar plots of the attributes most highly correlated with a country's GDP growth
Pie charts of each industry's share of a country's GDP
Scatter plots of the correlation and residuals between middle-class income and tax revenue gain
Bar charts of the percentage of the population participating in each industry
Residual plots to reveal outliers
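A residual plot for outlier detection can be sketched as follows, assuming a simple one-predictor linear fit (for instance, tax revenue regressed on middle-class income) and a |z| > 2 cutoff. Residuals here are standardized by dividing by their sample standard deviation, a simpler stand-in for the studentized residuals statsmodels can report; the cutoff, function names, and data are illustrative assumptions, not taken from the project's notebook.

```python
import numpy as np


def standardized_residuals(x: np.ndarray, y: np.ndarray):
    """Fit y = a + b*x by least squares; return fitted values and
    residuals divided by their sample standard deviation."""
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns highest degree first
    fitted = a + b * x
    resid = y - fitted
    return fitted, resid / resid.std(ddof=1)


def flag_outliers(z: np.ndarray, threshold: float = 2.0) -> np.ndarray:
    """Indices whose standardized residual exceeds the threshold in magnitude."""
    return np.flatnonzero(np.abs(z) > threshold)


def plot_residuals(fitted, z, threshold: float = 2.0):
    """Residual plot: standardized residuals against fitted values,
    with dashed lines marking the outlier cutoff."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(fitted, z)
    ax.axhline(0, color="grey", linewidth=1)
    for t in (threshold, -threshold):
        ax.axhline(t, color="red", linestyle="--", linewidth=1)
    ax.set_xlabel("Fitted value")
    ax.set_ylabel("Standardized residual")
    return fig


# Synthetic example: a perfect line with one injected outlier at index 7.
x = np.arange(20, dtype=float)
y = 1 + 0.5 * x
y[7] += 10
fitted, z = standardized_residuals(x, y)
print(flag_outliers(z))
```

Points far from the zero line in the plot are the ones flag_outliers returns; a single large outlier also inflates the residual standard deviation, so an unusually low cutoff can make borderline points look clean.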

Repository: SunlongNgouv/World-bank-dataset

Folders and files

Latest commit

History

Repository files navigation

World Bank Data Analysis with Python

Introduction

Python Libraries being used:

Data management strategy:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages