This repository contains projects on data collection, assessment, cleaning, visualization, and analysis, which together build thorough knowledge of, and expertise in, the data analysis process.

Hazarika666/Data-Analytics-Projects

Data-Analytics-Projects

This repository mainly contains projects I have completed through:

  • Self-learning activities: searching for projects and datasets on my own, then performing the data analysis process on them.
  • Skillslash Academy's acclaimed course, Data Science with Artificial Intelligence & Machine Learning, which spans 12 months.
  • Udacity's Data Analysis Nanodegree syllabus.

Skillslash Academy's and Udacity's online Data Science and Data Analyst programs prepare me for a career as an industry-ready data analyst by teaching me to clean and organize data, uncover patterns and insights, draw meaningful conclusions, and clearly communicate critical findings. I am developing proficiency in Python and its data analysis libraries (NumPy, pandas, Matplotlib) and in SQL as I build a portfolio of projects.

Part 1 - Intro to Data Analysis

Subjects Covered in this portfolio of projects:

  • Anaconda
  • Jupyter Notebook
  • Data Analysis Process
  • NumPy for 1D and 2D Data
  • Pandas Series and Dataframes
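As a quick illustration of these Part 1 basics, here is a minimal sketch using made-up temperature values (not data from any project). NumPy handles the 1D and 2D arrays, and pandas wraps them with labels as a Series and a DataFrame. The city and day labels are hypothetical.

```python
import numpy as np
import pandas as pd

# 1D array: hypothetical daily temperatures for one city
temps = np.array([21.0, 23.5, 19.0, 25.5])
print(temps.mean())  # 22.25

# 2D array: rows are (hypothetical) cities, columns are days
grid = np.array([[21.0, 23.5, 19.0],
                 [15.0, 16.5, 18.0]])
row_means = grid.mean(axis=1)  # one mean per city
col_means = grid.mean(axis=0)  # one mean per day

# pandas adds labels on top of the NumPy data
s = pd.Series(temps, index=["Mon", "Tue", "Wed", "Thu"])
df = pd.DataFrame(grid, index=["city_a", "city_b"], columns=["d1", "d2", "d3"])
print(s.idxmax())        # Thu (label of the largest value)
print(df.mean(axis=1))   # per-city means, indexed by label
```

The `axis` argument works the same way in both libraries. `axis=0` aggregates down the rows and `axis=1` aggregates across the columns.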

Project 1: Explore Weather Trends with weather forecast data

In this project, I chose one of Udacity's curated datasets and investigated it using NumPy and pandas. I completed the entire data analysis process, from posing a question to sharing the findings. (This section may be better placed in Project 1's own README.)

Project 2: Investigate the TMDb movie dataset

I was provided a dataset reflecting data collected from an experiment. I used statistical techniques to answer questions about the data and reported my conclusions and recommendations.

Part 2 - Practical Statistics

Subjects Covered:

  • Probability
  • Conditional Probability
  • Binomial Distribution
  • Sampling Distribution and Central Limit Theorem
  • Descriptive Statistics
  • Inferential Statistics
  • Confidence Levels and Intervals
  • Hypothesis Testing
  • T-tests and A/B Tests
  • Regression
  • Multiple Linear Regression
  • Logistic Regression
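The sampling distribution, Central Limit Theorem, and confidence interval topics above can be illustrated with a short NumPy simulation. This is a sketch on a synthetic, skewed (exponential) population, which is an assumption and not course data. The percentile bootstrap used for the interval is one common approach and is not necessarily the method taught in the coursework.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible

# Hypothetical, strongly skewed population (exponential with mean 2)
population = rng.exponential(scale=2.0, size=100_000)

# Sampling distribution of the mean: draw many samples of size n, keep each mean
n, n_samples = 50, 5_000
sample_means = np.array(
    [rng.choice(population, size=n).mean() for _ in range(n_samples)]
)

# Central Limit Theorem: the sample means are roughly normal, centred on the
# population mean, with standard deviation close to sigma / sqrt(n)
print(population.mean(), sample_means.mean())
print(population.std() / np.sqrt(n), sample_means.std())

# 95% confidence interval for the mean from ONE sample, via bootstrapping:
# resample that sample with replacement and take the middle 95% of the means
sample = rng.choice(population, size=n)
boot_means = np.array(
    [rng.choice(sample, size=n, replace=True).mean() for _ in range(5_000)]
)
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```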

Project 3: Analyze A/B Test Results with the company's ab_data.csv

Using Python, I gathered data from a variety of sources, assessed its quality and tidiness, then cleaned it. I documented the wrangling efforts in a Jupyter Notebook and showcased them through analyses and visualizations using Python and SQL. I then used A/B testing and regression methods to decide whether the company should launch a new webpage or keep the old one.
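The launch decision can be sketched as a one-sided two-proportion z-test on conversion rates. This is a compact illustration swapped in here; it is not necessarily the exact procedure used in the project. The data below is a synthetic stand-in for ab_data.csv. The column names `group` and `converted`, the group labels, and the conversion rates are all assumptions for this sketch. A regression-based route, such as a logistic regression of `converted` on a treatment indicator, would be the other option the project mentions.

```python
import math

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for ab_data.csv (column names and rates are assumptions)
n_per_group = 10_000
df = pd.DataFrame({
    "group": ["control"] * n_per_group + ["treatment"] * n_per_group,
    "converted": np.concatenate([
        rng.binomial(1, 0.120, n_per_group),  # old page: ~12.0% (made up)
        rng.binomial(1, 0.125, n_per_group),  # new page: ~12.5% (made up)
    ]),
})

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided test of H0: p_b <= p_a against H1: p_b > p_a (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return z, p_value

counts = df.groupby("group")["converted"].agg(["sum", "count"])
z, p = two_proportion_z_test(
    counts.loc["control", "sum"], counts.loc["control", "count"],
    counts.loc["treatment", "sum"], counts.loc["treatment", "count"],
)
alpha = 0.05  # significance level chosen before looking at the data
decision = "launch new page" if p < alpha else "keep old page"
print(f"z = {z:.3f}, p = {p:.4f} -> {decision}")
```

The test is one-sided because the business question is only whether the new page converts *better*. A two-sided test would double the p-value.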

Part 3 - Data Extraction and Wrangling

Subjects Covered:

GATHERING DATA:

  • Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs
  • Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files
  • Store gathered data in a PostgreSQL database
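The gathering steps above can be sketched with pandas readers. To keep the sketch runnable offline, inline strings stand in for a downloaded TSV file and for saved API responses in JSON-lines form. All column names and values are hypothetical. SQLite from Python's standard library stands in for PostgreSQL. With PostgreSQL you would typically pass a SQLAlchemy engine to `to_sql` instead, which is not shown here.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical TSV content, as if downloaded programmatically
tsv_text = "tweet_id\tp1\tp1_conf\n1\tgolden_retriever\t0.91\n2\tpug\t0.77\n"
predictions = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Hypothetical API responses saved as JSON lines (one JSON object per line)
json_lines = '{"id": 1, "retweet_count": 10}\n{"id": 2, "retweet_count": 25}\n'
engagement = pd.read_json(io.StringIO(json_lines), lines=True)

# Combine the two sources on their shared key and drop the redundant key column
merged = predictions.merge(engagement, left_on="tweet_id", right_on="id").drop(columns="id")

# Store the gathered data in a database (SQLite here, as a stand-in for PostgreSQL)
with sqlite3.connect(":memory:") as conn:
    merged.to_sql("tweets", conn, index=False)
    stored = pd.read_sql("SELECT COUNT(*) AS n FROM tweets", conn)
print(stored)
```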

ASSESSING DATA

  • Assess data visually and programmatically using pandas
  • Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
  • Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity
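A programmatic assessment in pandas might look like the following minimal sketch. It runs on a small hypothetical table with planted problems. The helper name `assess` is illustrative and not part of pandas. The last part shows a tidiness ("messy data") issue: one variable, the dog stage, is spread across several columns and gets reshaped with `melt`.

```python
import pandas as pd

def assess(df: pd.DataFrame) -> dict:
    """Hypothetical helper: a few programmatic quality checks on a DataFrame."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),  # completeness
        "duplicate_rows": int(df.duplicated().sum()),      # validity / uniqueness
        "dtypes": df.dtypes.astype(str).to_dict(),         # consistency of types
    }

# Hypothetical sample with deliberate quality ("dirty data") issues
tweets = pd.DataFrame({
    "tweet_id": [1, 2, 2, 4],
    "rating_numerator": [13, 12, 12, 1776],
    "rating_denominator": [10, 10, 10, 10],
    "name": ["Bella", "a", "a", None],
})
report = assess(tweets)
print(report)

# Value-level checks that often expose accuracy or validity problems
print(tweets["rating_numerator"].describe())  # 1776 stands out as an outlier
print(tweets["name"].value_counts())          # "a" is unlikely to be a real name

# Tidiness issue: one variable (dog stage) spread across several columns
stages = pd.DataFrame({
    "tweet_id": [1, 2],
    "doggo": ["doggo", None],
    "pupper": [None, "pupper"],
})
tidy = stages.melt(id_vars="tweet_id", value_name="stage").dropna(subset=["stage"])
print(tidy[["tweet_id", "stage"]])
```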

CLEANING DATA

  • Identify each step of the data cleaning process (defining, coding, and testing)
  • Clean data using Python and pandas
  • Test cleaning code visually and programmatically using Python
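The define, code, and test cycle can be sketched as follows. The table, its columns, and the "13/10" rating pattern in the text are hypothetical examples. Cleaning happens on a copy so the raw data stays available for comparison, and the test step is a set of plain assertions.

```python
import pandas as pd

# Hypothetical raw data: ratings embedded in free text, timestamps stored as strings
raw = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "text": ["This is Bella. 13/10 would pet", "Meet Max. 12/10", "Such a good boy 11/10"],
    "timestamp": ["2017-08-01 16:23:56", "2017-07-31 00:18:03", "2017-07-30 15:58:51"],
})

# Work on a copy so the original stays untouched
clean = raw.copy()

# DEFINE: extract the rating numerator and denominator from the text into
#         integer columns, and convert timestamp from string to datetime.

# CODE:
ratings = clean["text"].str.extract(r"(\d+)/(\d+)")  # two capture groups -> columns 0 and 1
clean["rating_numerator"] = ratings[0].astype(int)
clean["rating_denominator"] = ratings[1].astype(int)
clean["timestamp"] = pd.to_datetime(clean["timestamp"])

# TEST: verify the result programmatically
assert clean["rating_numerator"].tolist() == [13, 12, 11]
assert (clean["rating_denominator"] == 10).all()
assert pd.api.types.is_datetime64_any_dtype(clean["timestamp"])
print(clean[["tweet_id", "rating_numerator", "rating_denominator", "timestamp"]])
```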

Project 4: Wrangle and Analyze WeRateDogs Tweet Data

I collected data from different sources, assessed it visually and programmatically, and cleaned it so it could later be visualized and mined for insights.

Part 4 - Data Visualisation

Subjects Covered:

  • Univariate exploration of data (histograms, bar charts, axis limits, and different scales)
  • Bivariate exploration of data (scatter plots, clustered bar charts, violin plots and bar charts, faceting)
  • Multivariate exploration of data (encodings, plot matrices, feature engineering)
  • Explanatory visualizations (storytelling with data, polishing plots, creating a slide deck)
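One of the univariate techniques listed above, a histogram viewed on a linear and a logarithmic scale, can be sketched in Matplotlib as follows. The right-skewed, price-like variable is synthetic and is only an assumption for illustration. The output filename is hypothetical. The non-interactive Agg backend lets the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render to files; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical right-skewed variable, loosely resembling prices
prices = rng.lognormal(mean=8.0, sigma=0.9, size=2_000)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram on a linear scale: most values pile up on the left
ax_lin.hist(prices, bins=40)
ax_lin.set(title="Linear scale", xlabel="price", ylabel="count")

# Same data with log-spaced bin edges and a log x-axis, which spreads out skewed values
log_bins = np.logspace(np.log10(prices.min()), np.log10(prices.max()), 40)  # 40 edges -> 39 bins
ax_log.hist(prices, bins=log_bins)
ax_log.set_xscale("log")
ax_log.set(title="Log scale", xlabel="price (log scale)", ylabel="count")

fig.tight_layout()
fig.savefig("price_hist.png", dpi=150)  # hypothetical output file
plt.close(fig)
```

Log-spaced bins matter here. Setting only `set_xscale("log")` on equal-width linear bins would produce very uneven-looking bars.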

Project 5: Data Visualization with Diamond Data

I applied data visualization to a dataset describing the characteristics of diamonds and their prices.

Project 6: Communicate Data Findings with Ford Go Bike Sharing Data

In this project, I used Python’s data visualization tools to systematically explore the bike dataset for its properties and relationships between variables. Then, I created a presentation that communicates the findings to others.
