This DBT project is designed to model and transform e-commerce sales data into useful business insights. The project includes raw data ingestion, staging models, and marts for reporting purposes. The transformed data is then used for visualization and analysis.
The project is organized into several key directories:
```
dbt_project.yml                      # Main DBT project configuration file
models/
  staging/
    stg_sales.sql                    # Staging model for sales data
    stg_customers.sql                # Staging model for customer data
    stg_products.sql                 # Staging model for product data
  marts/
    sales_reporting/
      orders_by_day.sql              # Mart for orders aggregated by day
      customer_lifetime_value.sql    # Mart for customer lifetime value
      product_performance.sql        # Mart for product performance analysis
      monthly_revenue_orders.sql     # Mart for monthly revenue and orders
snapshots/
  sales_snapshots.sql                # Snapshot model for tracking changes in sales data
macros/
  audit_macros.sql                   # Custom macros for data quality checks
seeds/
  customer_dim.csv                   # Static seed data for customer dimension
  product_dim.csv                    # Static seed data for product dimension
```
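For orientation, a staging model such as `stg_sales.sql` might look like the minimal sketch below. The column names (`order_id`, `order_date`, `total_price`) follow the queries used later in this README; the `raw` / `raw_sales` source names are hypothetical placeholders for your actual source definitions:

```sql
-- models/staging/stg_sales.sql — a minimal sketch, not the actual model.
-- `raw` / `raw_sales` are placeholder source names; define them in a
-- sources block of your schema.yml before using this.
select
    order_id,
    order_date,
    total_price
from {{ source('raw', 'raw_sales') }}
```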
To run the DBT pipeline, follow these steps:
- **Set Up the Profile:** Ensure that your DBT profile is correctly configured in the `profiles.yml` file. It should contain the correct credentials to connect to your PostgreSQL database. A minimal sketch is shown below.
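  A minimal `profiles.yml` sketch, assuming a profile named `ecommerce` (the name must match the `profile:` key in your `dbt_project.yml`; all credential values are placeholders):

  ```yaml
  # profiles.yml — placeholder values throughout; `ecommerce` is an
  # assumed profile name, not confirmed by this project
  ecommerce:
    target: dev
    outputs:
      dev:
        type: postgres
        host: localhost
        user: <username>
        password: <password>
        port: 5432
        dbname: <database>
        schema: analytics
        threads: 4
  ```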
- **Run the Pipeline:** Use the following command to execute the pipeline and build all models:

  ```
  dbt run
  ```

  This command runs all transformations, from the staging models through to the marts.
- **Test the Pipeline:** You can run tests to verify the pipeline's integrity:

  ```
  dbt test
  ```

  This runs any `schema.yml` tests defined for your models, such as checks for uniqueness or null values. A sample test definition is sketched below.
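  A hypothetical `schema.yml` fragment for the sales staging model (the tested columns follow the names used in the queries later in this README):

  ```yaml
  version: 2

  models:
    - name: stg_sales
      columns:
        - name: order_id
          tests:
            - unique
            - not_null
        - name: order_date
          tests:
            - not_null
  ```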
- **Snapshot Process:** If your project includes snapshots for tracking changes over time (like for sales data), run the following command:

  ```
  dbt snapshot
  ```

  Snapshots help you maintain historical records of your data by capturing the state of specific tables at each point in time. A sketch of a snapshot definition follows this list.
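As an illustration, `snapshots/sales_snapshots.sql` might use a timestamp strategy like the sketch below. The `unique_key`, the `updated_at` column, and the `raw_sales` source are assumptions, not confirmed details of this project:

```sql
-- snapshots/sales_snapshots.sql — a sketch; the unique key, the
-- updated_at column, and the `raw_sales` source name are assumptions
{% snapshot sales_snapshots %}

{{
    config(
      target_schema='snapshots',
      unique_key='order_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ source('raw', 'raw_sales') }}

{% endsnapshot %}
```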
After running the DBT pipeline, you can use Python to query the resulting data and create visualizations.
- **Python Script for Visualization:** Here's an example of a Python script that connects to the PostgreSQL database and visualizes the data using `matplotlib` and `seaborn`:

  ```python
  import pandas as pd
  from sqlalchemy import create_engine
  import matplotlib.pyplot as plt
  import seaborn as sns

  # PostgreSQL connection string
  db_connection_url = "postgresql://<username>:<password>@<host>:5432/<database>"

  # Create SQLAlchemy engine
  engine = create_engine(db_connection_url)

  # Query sales data
  sales_query = '''
      SELECT order_date,
             SUM(total_price) AS total_revenue,
             COUNT(order_id) AS total_orders
      FROM stg_sales
      GROUP BY order_date
      ORDER BY order_date;
  '''
  sales_data = pd.read_sql(sales_query, engine)

  # Plot total revenue over time
  plt.figure(figsize=(10, 6))
  sns.lineplot(data=sales_data, x='order_date', y='total_revenue', marker='o')
  plt.title('Total Revenue Over Time')
  plt.xlabel('Order Date')
  plt.ylabel('Total Revenue ($)')
  plt.xticks(rotation=45)
  plt.tight_layout()
  plt.show()
  ```
- **Running the Script:** To run the Python script:
  - Save it as `visualize_data.py`.
  - Run it from the command line:

    ```
    python visualize_data.py
    ```

  This will create visualizations based on the transformed data from your DBT pipeline.
- Ensure you restart your DBT environment and refresh connections if you make significant changes to the database or models.
- You can customize the marts and staging models further to fit specific business requirements; see the sketch after this list for one example.
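For example, a minimal `orders_by_day.sql` mart might look like the following sketch; the column names mirror the `stg_sales` columns assumed earlier in this README:

```sql
-- models/marts/sales_reporting/orders_by_day.sql — an illustrative
-- sketch; column names are assumptions carried over from stg_sales
select
    order_date,
    count(order_id)  as total_orders,
    sum(total_price) as total_revenue
from {{ ref('stg_sales') }}
group by order_date
order by order_date
```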
For more information, refer to the DBT documentation.