Here's an improved version of your README.md
file with some linguistic enhancements:
This project explores Walmart sales data to identify top-performing branches and products, examine sales trends, and understand customer behavior. The primary goal is to study how sales strategies can be optimized. The dataset was obtained from the Kaggle Walmart Sales Forecasting Competition.
In this competition, participants are provided with historical sales data from 45 Walmart stores across different regions. Each store contains several departments, and the task is to forecast sales for each department in each store. Holiday markdown events are included, affecting sales unpredictably, making it a challenge to project the extent of these impacts. Source.
The primary goal of this project is to gain insights into Walmart's sales data to better understand the factors that influence sales performance across different branches.
The dataset was sourced from the Kaggle Walmart Sales Forecasting Competition. It contains sales transactions from three Walmart branches, located in Mandalay, Yangon, and Naypyitaw. The data consists of 17 columns and 1,000 rows:
Column | Description | Data Type |
---|---|---|
invoice_id | Invoice for the sales transaction | VARCHAR(30) |
branch | Branch where the sales occurred | VARCHAR(5) |
city | Location of the branch | VARCHAR(30) |
customer_type | Type of customer | VARCHAR(30) |
gender | Gender of the customer | VARCHAR(10) |
product_line | Product line of the sold item | VARCHAR(100) |
unit_price | Price per product | DECIMAL(10, 2) |
quantity | Number of units sold | INT |
VAT | Tax on the purchase | FLOAT(6, 4) |
total | Total cost of the transaction | DECIMAL(10, 2) |
date | Date of the transaction | DATE |
time | Time of the transaction | TIMESTAMP |
payment_method | Payment method used | VARCHAR(15) |
cogs | Cost of goods sold | DECIMAL(10, 2) |
gross_margin_percentage | Gross margin percentage | FLOAT(11, 9) |
gross_income | Gross income from the transaction | DECIMAL(10, 2) |
rating | Customer rating | FLOAT(2, 1) |
- Analyze different product lines to determine which are performing well and which need improvement.
- Identify sales trends to evaluate the effectiveness of sales strategies and suggest improvements to boost sales.
- Study customer segments, their purchasing trends, and profitability to better target future marketing efforts.
-
Data Wrangling:
- The data is inspected for missing or NULL values, which are addressed using appropriate methods.
Steps:
- Build the database.
- Create the table and insert the data.
- Since we used NOT NULL constraints, no missing values are present.
-
Feature Engineering:
- Generate new columns from existing ones to extract insights.
New features:
time_of_day
: Classifies transactions into Morning, Afternoon, or Evening.day_name
: Extracts the day of the week from the date.month_name
: Extracts the month of the transaction.
-
Exploratory Data Analysis (EDA):
- Perform analysis to answer the questions and objectives laid out in the project.
- How many unique cities are in the data?
- What city corresponds to each branch?
- How many unique product lines are in the data?
- What is the most common payment method?
- Which product line sells the most?
- What is the total revenue per month?
- Which month had the highest COGS (Cost of Goods Sold)?
- Which product line generated the most revenue?
- What city produced the highest revenue?
- Which product line had the highest VAT?
- Classify each product line as "Good" or "Bad" based on whether its sales exceed the average.
- Which branch sold more than the average number of products?
- What is the most popular product line by gender?
- What is the average rating for each product line?
- Number of sales made during different times of the day on weekdays.
- Which customer type generates the most revenue?
- Which city has the highest VAT percentage?
- Which customer type pays the most VAT?
- How many unique customer types are in the data?
- How many unique payment methods are there?
- What is the most common customer type?
- Which customer type buys the most products?
- What is the gender breakdown of customers?
- What is the gender distribution per branch?
- At what time of day are the most ratings given?
- At what time of day are the most ratings given per branch?
- Which day of the week has the best average ratings?
- Which day of the week has the best average ratings per branch?
-
COGS (Cost of Goods Sold): [ COGS = unitPrice \times quantity ]
-
VAT (Value Added Tax): [ VAT = 5% \times COGS ]
-
Total Sales: [ total(gross_sales) = VAT + COGS ]
-
Gross Profit (Gross Income): [ grossProfit = total(gross_sales) - COGS ]
-
Gross Margin: [ \text{Gross Margin} = \frac{\text{gross income}}{\text{total revenue}} ]
For the first row in the dataset:
- Unit Price: $45.79
- Quantity: 7
-
COGS: [ COGS = 45.79 \times 7 = 320.53 ]
-
VAT: [ VAT = 5% \times 320.53 = 16.0265 ]
-
Total Sales: [ Total = 16.0265 + 320.53 = 336.5565 ]
-
Gross Margin Percentage: [ \text{Gross Margin} = \frac{16.0265}{336.5565} \approx 4.76% ]
-- Create database CREATE DATABASE IF NOT EXISTS walmartSales;
-- Create table CREATE TABLE IF NOT EXISTS sales( invoice_id VARCHAR(30) NOT NULL PRIMARY KEY, branch VARCHAR(5) NOT NULL, city VARCHAR(30) NOT NULL, customer_type VARCHAR(30) NOT NULL, gender VARCHAR(30) NOT NULL, product_line VARCHAR(100) NOT NULL, unit_price DECIMAL(10, 2) NOT NULL, quantity INT NOT NULL, tax_pct FLOAT(6, 4) NOT NULL, total DECIMAL(12, 4) NOT NULL, date DATETIME NOT NULL, time TIME NOT NULL, payment VARCHAR(15) NOT NULL, cogs DECIMAL(10, 2) NOT NULL, gross_margin_pct FLOAT(11, 9), gross_income DECIMAL(12, 4), rating FLOAT(2, 1) );