Skip to content

Adeolujoseph/Retailstore_ELT_pipeline_project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Retail store ELT pipeline project

Project Overview:

In this project, I designed and developed an ELT (Extract, Load, Transform) pipeline for a retail store, automating the data flow from raw data collection to insightful visualizations.

  1. Data Collection:

    OLTP Data: Data is extracted from the store’s OLTP database (RDS) and loaded into Snowflake using Airbyte. This process is scheduled to run daily.

    S3 Data: Data is also collected from an S3 bucket and loaded into Snowflake using an AWS Lambda function. This Lambda function is orchestrated to run daily at 6 AM using Amazon EventBridge.

  2. Data Transformation: The data in Snowflake is transformed using dbt (Data Build Tool) to align with business requirements. The transformation models are scheduled to run automatically through a cron job.

  3. Data Visualization: The transformed data is visualized using Metabase. The dashboards and reports in Metabase are configured to refresh daily, ensuring that they reflect the most up-to-date data.

Project Architecture:

image

Business Scenerio:

A retail store needs a Data Engineer to develop an ELT (Extract, Load, Transform) pipeline that extracts raw data from the company’s database, transforms it to meet business needs, and offers a platform for Data Analysts to create visualizations that address key business questions. image

Business Questions:

1.Identification of the highest and lowest-performing items of the week by analyzing sales amounts and quantities. 2.Display of items with low supply levels for each week. 3.Detection of items experiencing low stock levels, along with their corresponding week and warehouse numbers, marked as "True". 4.Analysis of Revenue 5.Analysis of Customer 6.Analysis of Warehouse based on low stock items and Revenue

Project Steps:

  1. OLTP Data Ingestion (Airbyte) : Airbyte was hosted on EC2 instance and configured to ingest all but inventory table from RDS into snowflake. Replication freq: 6:00 UTC , Sync mode:Full refresh image

  2. S3 Data Ingestion (AWS Lambda) : Inventory table is ingested using a custom python function powered by Lambda. Orchestration: Eventbridge image

    Data Loaded into snowflake sucessfully as shown below image

  3. Data Transformation (DBT) : a. All tables are loaded into the staging area b. Integration of several customer-related tables into one Customer table c. Creating a snapshot of the Customer table (for SCD type 2) in the intermediate layer d. Creating a daily sales facts table in the intermediate layere e. Creating the customer dimension and weekly_sales_inventory models in the marts layer using the intermediate models f. The new fact table 'weekly_sales_inventory' provides the additional metrics

    sum_qty_wk: Sum of sales_quantity for the current week.
    
    sum_amt_wk: Sum of sales_amount for the current week.
    
    sum_profit_wk: Sum of net_profit for the current week.
    
    avg_qty_dy: Average daily sales_quantity for the current week (calculated as sum_qty_wk/7).
    
    inv_on_hand_qty_wk: Item inventory on hand at the end of each week across all warehouses (equivalent to the inventory on hand at the end of the weekend).
    
    wks_sply: Weeks of supply, an estimated metric indicating how many weeks the inventory can supply the sales (calculated as inv_on_hand_qty_wk/sum_qty_wk).
    
    low_stock_flg_wk: Low stock weekly flag. If, during the week, there is a single day where [(avg_qty_dy > 0 && avg_qty_dy > inventory_on_hand_qty_wk)], then set the flag to True.
    

    g. Testing: various test like not null, unique, relationship, accepted values were carried out on relevant table columns in the marts model

    image

    h. Documentation:detailed column and table descriptions helped to generate exhaustive documentation

    image

    image

    image

    i. Data Analyzation (Metabase) : Mart models are fed into Metabase for analytics purposes

    image

        Table relationships and column semantic types are set to aid analysis as below
    

    image

    j. Dashboards : These are created with SQL queries and Metabase GUI. Dashboards are categorized into different tabs to answer business questions with relavant filters to drill things down

    image

    image

    image

    image

Possible future work:

a. Performance was based on Revenue, profit could be considered to see the effect of cost on the business b. Filter by multiple criteria within each tab in Metabase c. Use Airbyte to ingest data from s3 bucket d. Perform more data quality tests

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages