Skip to content

Abderaouf-KHELFAOUI/Walmart-Sales-Data-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Here's an improved version of your README.md file with some linguistic enhancements:


Walmart Sales Data Analysis

Overview

This project explores Walmart sales data to identify top-performing branches and products, examine sales trends, and understand customer behavior. The primary goal is to study how sales strategies can be optimized. The dataset was obtained from the Kaggle Walmart Sales Forecasting Competition.

In this competition, participants are provided with historical sales data from 45 Walmart stores across different regions. Each store contains several departments, and the task is to forecast sales for each department in each store. Holiday markdown events are included, affecting sales unpredictably, making it a challenge to project the extent of these impacts. Source.

Project Objectives

The primary goal of this project is to gain insights into Walmart's sales data to better understand the factors that influence sales performance across different branches.

Dataset Description

The dataset was sourced from the Kaggle Walmart Sales Forecasting Competition. It contains sales transactions from three Walmart branches, located in Mandalay, Yangon, and Naypyitaw. The data consists of 17 columns and 1,000 rows:

Column Description Data Type
invoice_id Invoice for the sales transaction VARCHAR(30)
branch Branch where the sales occurred VARCHAR(5)
city Location of the branch VARCHAR(30)
customer_type Type of customer VARCHAR(30)
gender Gender of the customer VARCHAR(10)
product_line Product line of the sold item VARCHAR(100)
unit_price Price per product DECIMAL(10, 2)
quantity Number of units sold INT
VAT Tax on the purchase FLOAT(6, 4)
total Total cost of the transaction DECIMAL(10, 2)
date Date of the transaction DATE
time Time of the transaction TIMESTAMP
payment_method Payment method used VARCHAR(15)
cogs Cost of goods sold DECIMAL(10, 2)
gross_margin_percentage Gross margin percentage FLOAT(11, 9)
gross_income Gross income from the transaction DECIMAL(10, 2)
rating Customer rating FLOAT(2, 1)

Analytical Goals

Product Analysis

  • Analyze different product lines to determine which are performing well and which need improvement.

Sales Analysis

  • Identify sales trends to evaluate the effectiveness of sales strategies and suggest improvements to boost sales.

Customer Analysis

  • Study customer segments, their purchasing trends, and profitability to better target future marketing efforts.

Approach

  1. Data Wrangling:

    • The data is inspected for missing or NULL values, which are addressed using appropriate methods.

    Steps:

    • Build the database.
    • Create the table and insert the data.
    • Since we used NOT NULL constraints, no missing values are present.
  2. Feature Engineering:

    • Generate new columns from existing ones to extract insights.

    New features:

    • time_of_day: Classifies transactions into Morning, Afternoon, or Evening.
    • day_name: Extracts the day of the week from the date.
    • month_name: Extracts the month of the transaction.
  3. Exploratory Data Analysis (EDA):

    • Perform analysis to answer the questions and objectives laid out in the project.

Business Questions

General

  1. How many unique cities are in the data?
  2. What city corresponds to each branch?

Product-Related

  1. How many unique product lines are in the data?
  2. What is the most common payment method?
  3. Which product line sells the most?
  4. What is the total revenue per month?
  5. Which month had the highest COGS (Cost of Goods Sold)?
  6. Which product line generated the most revenue?
  7. What city produced the highest revenue?
  8. Which product line had the highest VAT?
  9. Classify each product line as "Good" or "Bad" based on whether its sales exceed the average.
  10. Which branch sold more than the average number of products?
  11. What is the most popular product line by gender?
  12. What is the average rating for each product line?

Sales-Related

  1. Number of sales made during different times of the day on weekdays.
  2. Which customer type generates the most revenue?
  3. Which city has the highest VAT percentage?
  4. Which customer type pays the most VAT?

Customer-Related

  1. How many unique customer types are in the data?
  2. How many unique payment methods are there?
  3. What is the most common customer type?
  4. Which customer type buys the most products?
  5. What is the gender breakdown of customers?
  6. What is the gender distribution per branch?
  7. At what time of day are the most ratings given?
  8. At what time of day are the most ratings given per branch?
  9. Which day of the week has the best average ratings?
  10. Which day of the week has the best average ratings per branch?

Revenue and Profit Calculations

  • COGS (Cost of Goods Sold): [ COGS = unitPrice \times quantity ]

  • VAT (Value Added Tax): [ VAT = 5% \times COGS ]

  • Total Sales: [ total(gross_sales) = VAT + COGS ]

  • Gross Profit (Gross Income): [ grossProfit = total(gross_sales) - COGS ]

  • Gross Margin: [ \text{Gross Margin} = \frac{\text{gross income}}{\text{total revenue}} ]

Example Calculation

For the first row in the dataset:

  • Unit Price: $45.79
  • Quantity: 7
  1. COGS: [ COGS = 45.79 \times 7 = 320.53 ]

  2. VAT: [ VAT = 5% \times 320.53 = 16.0265 ]

  3. Total Sales: [ Total = 16.0265 + 320.53 = 336.5565 ]

  4. Gross Margin Percentage: [ \text{Gross Margin} = \frac{16.0265}{336.5565} \approx 4.76% ]

Code

-- Create database CREATE DATABASE IF NOT EXISTS walmartSales;

-- Create table CREATE TABLE IF NOT EXISTS sales( invoice_id VARCHAR(30) NOT NULL PRIMARY KEY, branch VARCHAR(5) NOT NULL, city VARCHAR(30) NOT NULL, customer_type VARCHAR(30) NOT NULL, gender VARCHAR(30) NOT NULL, product_line VARCHAR(100) NOT NULL, unit_price DECIMAL(10, 2) NOT NULL, quantity INT NOT NULL, tax_pct FLOAT(6, 4) NOT NULL, total DECIMAL(12, 4) NOT NULL, date DATETIME NOT NULL, time TIME NOT NULL, payment VARCHAR(15) NOT NULL, cogs DECIMAL(10, 2) NOT NULL, gross_margin_pct FLOAT(11, 9), gross_income DECIMAL(12, 4), rating FLOAT(2, 1) );

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published