Skip to content

tjhoranumass/dbt-project

Repository files navigation

DBT E-commerce Project

This DBT project is designed to model and transform e-commerce sales data into useful business insights. The project includes raw data ingestion, staging models, and marts for reporting purposes. The transformed data is then used for visualization and analysis.

Project Structure

The project is organized into several key directories:

dbt_project.yml # Main DBT project configuration file models/ staging/ stg_sales.sql # Staging model for sales data stg_customers.sql # Staging model for customer data stg_products.sql # Staging model for product data marts/ sales_reporting/ orders_by_day.sql # Mart for orders aggregated by day customer_lifetime_value.sql # Mart for customer lifetime value product_performance.sql # Mart for product performance analysis monthly_revenue_orders.sql # Mart for monthly revenue and orders snapshots/ sales_snapshots.sql # Snapshot model for tracking changes in sales data macros/ audit_macros.sql # Custom macros for data quality checks seeds/ customer_dim.csv # Static seed data for customer dimension product_dim.csv # Static seed data for product dimension

vbnet Always show details

Copy code

How to Run the Pipeline

To run the DBT pipeline, follow these steps:

  1. Set Up the Profile:

    • Ensure that your DBT profile is correctly configured in the profiles.yml file. It should contain the correct credentials to connect to your PostgreSQL database.
  2. Run the Pipeline: Use the following command to execute the pipeline and build all models:

    dbt run

    This command will run all transformations from the staging models to the marts.

  3. Test the Pipeline: You can run tests to ensure the pipeline's integrity:

    dbt test

    This will run any schema.yml tests defined for your models, such as checking for uniqueness or null values.

  4. Snapshot Process: If your project includes snapshots for tracking changes over time (like for sales data), run the following command:

    dbt snapshot

    Snapshots help you maintain historical records of your data by capturing the state of specific tables at each point in time.

Visualization with Python

After running the DBT pipeline, you can use Python to query the resulting data and create visualizations.

  1. Python Script for Visualization: Here's an example of a Python script that connects to the PostgreSQL database and visualizes the data using matplotlib and seaborn.

    import pandas as pd
    from sqlalchemy import create_engine
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # PostgreSQL connection string
    db_connection_url = "postgresql://<username>:<password>@<host>:5432/<database>"
    
    # Create SQLAlchemy engine
    engine = create_engine(db_connection_url)
    
    # Query sales data
    sales_query = '''
    SELECT order_date, SUM(total_price) as total_revenue, COUNT(order_id) as total_orders
    FROM stg_sales
    GROUP BY order_date
    ORDER BY order_date;
    '''
    sales_data = pd.read_sql(sales_query, engine)
    
    # Plot total revenue over time
    plt.figure(figsize=(10, 6))
    sns.lineplot(data=sales_data, x='order_date', y='total_revenue', marker='o')
    plt.title('Total Revenue Over Time')
    plt.xlabel('Order Date')
    plt.ylabel('Total Revenue ($)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
  2. Running the Script: To run the Python script:

    • Save it as visualize_data.py.
    • Run it from the command line:
      python visualize_data.py

    This will create visualizations based on the transformed data from your DBT pipeline.

Additional Notes

  • Ensure you restart your DBT environment and refresh connections if you make significant changes to the database or models.
  • You can customize the marts and staging models further to fit specific business requirements.

For more information, refer to the DBT documentation. """

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published