Data wrangling project for the Udacity Data Analyst Nanodegree program
Data wrangling is one of the most important parts of data analysis. It consists of 3 main parts:
- Data gathering
- Data assessment
- Data cleaning
Most of the work was done in Python in a Jupyter Notebook.
Data can be gathered in different ways. In this project, three methods were used: loading a dataset from a file, downloading a dataset from a provided link, and querying the Twitter API via tweepy. The data analyzed come from the WeRateDogs Twitter page. In addition, a Udacity instructor provided a prediction table, produced by an algorithm that predicts dog breeds.
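As a rough illustration of the gathering step, the sketch below loads three small tables with pandas. It is not the project's actual code: the in-memory strings stand in for the file on disk, the downloaded file, and the JSON returned by the API, so no network call or tweepy credentials are needed, and all column names and values are hypothetical.

```python
import io
import json

import pandas as pd

# Hypothetical stand-ins for the three sources (assumed contents, not real data).
archive_csv = io.StringIO(
    "tweet_id,text\n"
    "1,Good dog 12/10\n"
    "2,Fluffy pupper 11/10\n"
)
predictions_tsv = io.StringIO(
    "tweet_id\tp1\tp1_conf\n"
    "1\tgolden_retriever\t0.95\n"
    "2\tpug\t0.80\n"
)
api_json_lines = io.StringIO(
    '{"id": 1, "retweet_count": 5, "favorite_count": 20}\n'
    '{"id": 2, "retweet_count": 3, "favorite_count": 9}\n'
)

# 1. Dataset loaded from a file (here a file-like object).
archive = pd.read_csv(archive_csv)

# 2. Dataset obtained through a link; once downloaded it is read like any
#    other delimited file (assumed here to be tab-separated).
predictions = pd.read_csv(predictions_tsv, sep="\t")

# 3. API results, assumed to be saved as one JSON object per line.
api_records = [json.loads(line) for line in api_json_lines]
api_data = (
    pd.DataFrame(api_records)[["id", "retweet_count", "favorite_count"]]
    .rename(columns={"id": "tweet_id"})
)

print(archive.shape, predictions.shape, api_data.shape)
```

In a real run, the file-like objects would be replaced by a local path, the content fetched from the provided link, and the tweets returned by tweepy.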
Data assessment was performed both visually and programmatically. Visual assessment consisted of inspecting the DataFrames directly and reviewing the data in an Excel spreadsheet. Programmatic assessment used various Python methods and functions. The data were assessed for both quality and tidiness issues.
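A minimal sketch of what programmatic assessment can look like with pandas is shown here. The small DataFrame and its column names are made up for illustration and are not the project's data; the checks are common pandas calls, not necessarily the ones used in the notebook.

```python
import pandas as pd

# Hypothetical sample with a few deliberate problems.
df = pd.DataFrame({
    "tweet_id": [1, 2, 2, 3],
    "rating_denominator": [10, 10, 10, 0],
    "name": ["Rex", "a", "a", None],
    "timestamp": ["2017-01-01", "2017-01-02", "2017-01-02", "2017-01-03"],
})

# Column types: timestamp is stored as text rather than as a datetime.
dtypes = df.dtypes

# Missing values per column.
missing = df.isnull().sum()

# Fully duplicated rows.
n_duplicates = df.duplicated().sum()

# Suspicious values: denominators other than the expected 10 (an assumption).
bad_denominators = (df["rating_denominator"] != 10).sum()

# Frequency of each name; odd entries such as "a" stand out here.
name_counts = df["name"].value_counts()

print(dtypes, missing, n_duplicates, bad_denominators, name_counts, sep="\n\n")
```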
Data cleaning was carried out by writing Python code to fix the issues identified during assessment.
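The following sketch shows a few typical cleaning operations in pandas. It works on a copy of a hypothetical raw table; the columns, values, and the rule that all-lowercase names are invalid are assumptions made for the example, not the project's confirmed cleaning steps.

```python
import pandas as pd

# Hypothetical raw data with the kinds of issues found during assessment.
raw = pd.DataFrame({
    "tweet_id": [1, 2, 2, 3],
    "timestamp": ["2017-01-01", "2017-01-02", "2017-01-02", "2017-01-03"],
    "name": ["Rex", "a", "a", "Bella"],
})

# Work on a copy so the raw data stays available for comparison.
clean = raw.copy()

# Remove fully duplicated rows.
clean = clean.drop_duplicates().reset_index(drop=True)

# Store timestamps as datetimes instead of text.
clean["timestamp"] = pd.to_datetime(clean["timestamp"])

# Treat IDs as labels, not numbers.
clean["tweet_id"] = clean["tweet_id"].astype(str)

# Assumed rule: an all-lowercase "name" is not a real name, so mark it missing.
clean["name"] = clean["name"].where(~clean["name"].str.islower())

print(clean)
```

Each change can then be tested by re-running the relevant assessment check on the cleaned copy.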
The cleaned dataset was then analyzed and the results were visualized.
Finally, two reports were written, one for internal and one for external use.
Note:
The Jupyter Notebook was committed to Git following this article.