This DBT project is designed to model and transform e-commerce sales data into useful business insights. The project includes raw data ingestion, staging models, and marts for reporting purposes. The transformed data is then used for visualization and analysis.
The project is organized into several key directories:
```
dbt_project.yml                      # Main DBT project configuration file
models/
  staging/
    stg_sales.sql                    # Staging model for sales data
    stg_customers.sql                # Staging model for customer data
    stg_products.sql                 # Staging model for product data
  marts/
    sales_reporting/
      orders_by_day.sql              # Mart for orders aggregated by day
      customer_lifetime_value.sql    # Mart for customer lifetime value
      product_performance.sql        # Mart for product performance analysis
      monthly_revenue_orders.sql     # Mart for monthly revenue and orders
snapshots/
  sales_snapshots.sql                # Snapshot model for tracking changes in sales data
macros/
  audit_macros.sql                   # Custom macros for data quality checks
seeds/
  customer_dim.csv                   # Static seed data for customer dimension
  product_dim.csv                    # Static seed data for product dimension
```
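For orientation, a staging model such as `stg_sales.sql` might look like the minimal sketch below. The column names (`order_id`, `order_date`, `total_price`) follow the queries used later in this README; the `raw` / `raw_sales` source names are hypothetical placeholders for your actual source definitions:

```sql
-- models/staging/stg_sales.sql — a minimal sketch, not the actual model.
-- `raw` / `raw_sales` are placeholder source names; define them in a
-- sources block of your schema.yml before using this.
select
    order_id,
    order_date,
    total_price
from {{ source('raw', 'raw_sales') }}
```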
To run the DBT pipeline, follow these steps:
- **Set Up the Profile:** Ensure that your DBT profile is correctly configured in the `profiles.yml` file. It should contain the correct credentials to connect to your PostgreSQL database. A minimal sketch is shown below.
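  A minimal `profiles.yml` sketch, assuming a profile named `ecommerce` (the name must match the `profile:` key in your `dbt_project.yml`; all credential values are placeholders):

  ```yaml
  # profiles.yml — placeholder values throughout; `ecommerce` is an
  # assumed profile name, not confirmed by this project
  ecommerce:
    target: dev
    outputs:
      dev:
        type: postgres
        host: localhost
        user: <username>
        password: <password>
        port: 5432
        dbname: <database>
        schema: analytics
        threads: 4
  ```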
- **Run the Pipeline:** Use the following command to execute the pipeline and build all models:

  ```
  dbt run
  ```

  This command runs all transformations, from the staging models through to the marts.
- **Test the Pipeline:** You can run tests to verify the pipeline's integrity:

  ```
  dbt test
  ```

  This runs any `schema.yml` tests defined for your models, such as checks for uniqueness or null values. A sample test definition is sketched below.
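  A hypothetical `schema.yml` fragment for the sales staging model (the tested columns follow the names used in the queries later in this README):

  ```yaml
  version: 2

  models:
    - name: stg_sales
      columns:
        - name: order_id
          tests:
            - unique
            - not_null
        - name: order_date
          tests:
            - not_null
  ```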
- **Snapshot Process:** If your project includes snapshots for tracking changes over time (like for sales data), run the following command:

  ```
  dbt snapshot
  ```

  Snapshots help you maintain historical records of your data by capturing the state of specific tables at each point in time. A sketch of a snapshot definition follows this list.
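As an illustration, `snapshots/sales_snapshots.sql` might use a timestamp strategy like the sketch below. The `unique_key`, the `updated_at` column, and the `raw_sales` source are assumptions, not confirmed details of this project:

```sql
-- snapshots/sales_snapshots.sql — a sketch; the unique key, the
-- updated_at column, and the `raw_sales` source name are assumptions
{% snapshot sales_snapshots %}

{{
    config(
      target_schema='snapshots',
      unique_key='order_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ source('raw', 'raw_sales') }}

{% endsnapshot %}
```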
After running the DBT pipeline, you can use Python to query the resulting data and create visualizations.
- **Python Script for Visualization:** Here's an example of a Python script that connects to the PostgreSQL database and visualizes the data using `matplotlib` and `seaborn`:

  ```python
  import pandas as pd
  from sqlalchemy import create_engine
  import matplotlib.pyplot as plt
  import seaborn as sns

  # PostgreSQL connection string
  db_connection_url = "postgresql://<username>:<password>@<host>:5432/<database>"

  # Create SQLAlchemy engine
  engine = create_engine(db_connection_url)

  # Query sales data
  sales_query = '''
      SELECT order_date,
             SUM(total_price) AS total_revenue,
             COUNT(order_id) AS total_orders
      FROM stg_sales
      GROUP BY order_date
      ORDER BY order_date;
  '''
  sales_data = pd.read_sql(sales_query, engine)

  # Plot total revenue over time
  plt.figure(figsize=(10, 6))
  sns.lineplot(data=sales_data, x='order_date', y='total_revenue', marker='o')
  plt.title('Total Revenue Over Time')
  plt.xlabel('Order Date')
  plt.ylabel('Total Revenue ($)')
  plt.xticks(rotation=45)
  plt.tight_layout()
  plt.show()
  ```
- **Running the Script:** To run the Python script:
  - Save it as `visualize_data.py`.
  - Run it from the command line:

    ```
    python visualize_data.py
    ```

  This will create visualizations based on the transformed data from your DBT pipeline.
- Ensure you restart your DBT environment and refresh connections if you make significant changes to the database or models.
- You can customize the marts and staging models further to fit specific business requirements; see the sketch after this list for one example.
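For example, a minimal `orders_by_day.sql` mart might look like the following sketch; the column names mirror the `stg_sales` columns assumed earlier in this README:

```sql
-- models/marts/sales_reporting/orders_by_day.sql — an illustrative
-- sketch; column names are assumptions carried over from stg_sales
select
    order_date,
    count(order_id)  as total_orders,
    sum(total_price) as total_revenue
from {{ ref('stg_sales') }}
group by order_date
order by order_date
```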
For more information, refer to the DBT documentation.