Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add example copypastable style guide starter to style guide #4104

Merged
merged 5 commits into from
Sep 21, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,98 @@ Now that you've seen how we style our dbt projects, it's time to build your own.
## Pre-commit hooks

Lastly, to ensure your style guide's automated rules are being followed without additional mental overhead to your team, you can use [pre-commit hooks](https://pre-commit.com/) to automatically check your code for style violations (and often fix them automagically) before it's committed. This is a great way to make sure your style guide is followed by all contributors. We recommend implementing this once you've settled on and published your style guide, and your codebase is conforming to it. This will ensure that all future commits follow the style guide. You can find an excellent set of open source pre-commit hooks for dbt from the community [here in the dbt-checkpoint project](https://github.com/dbt-checkpoint/dbt-checkpoint).

## Style guide template

```markdown
# dbt Example Style Guide

## SQL Style

- Use lowercase keywords.
- Use trailing commas.

## Model Organization

Our models (typically) fit into two main categories:\

- Staging — Contains models that clean and standardize data.
- Marts — Contains models which combine or heavily transform data.

Things to note:

- There are different types of models that typically exist in each of the above categories. See [Model Layers](#model-layers) for more information.
- Read [How we structure our dbt projects](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview) for an example and more details around organization.

## Model Layers

- Only models in `staging` should select from [sources](https://docs.getdbt.com/docs/building-a-dbt-project/using-sources).
- Models not in the `staging` folder should select from [refs](https://docs.getdbt.com/reference/dbt-jinja-functions/ref).

## Model File Naming and Coding

- All objects should be plural.
Example: `stg_stripe__invoices.sql` vs. `stg_stripe__invoice.sql`

- All models should use the naming convention `<type/dag_stage>_<source/topic>__<additional_context>`. See [this article](https://docs.getdbt.com/blog/stakeholder-friendly-model-names) for more information.

- Models in the **staging** folder should use the source's name as the `<source/topic>` and the entity name as the `additional_context`.

Examples:

- seed_snowflake_spend.csv
- base_stripe\_\_invoices.sql
- stg_stripe\_\_customers.sql
- stg_salesforce\_\_customers.sql
- int_customers\_\_unioned.sql
- fct_orders.sql

- Schema, table, and column names should be in `snake_case`.

- Limit the use of abbreviations that are related to domain knowledge. An onboarding employee will understand `current_order_status` better than `current_os`.

- Use names based on the _business_ rather than the source terminology.

- Each model should have a primary key to identify the unique row and should be named `<object>_id`. For example, `account_id`. This makes it easier to know what `id` is referenced in downstream joined models.

- For `base` or `staging` models, columns should be ordered in categories, where identifiers are first and date/time fields are at the end.
- Date/time columns should be named according to these conventions:

- Timestamps: `<event>_at`
Format: UTC
Example: `created_at`

- Dates: `<event>_date`
Format: Date
Example: `created_date`

- Booleans should be prefixed with `is_` or `has_`.
Example: `is_active_customer` and `has_admin_access`

- Price/revenue fields should be in decimal currency (for example, `19.99` for $19.99; many app databases store prices as integers in cents). If a non-decimal currency is used, indicate this with suffixes. For example, `price_in_cents`.

- Avoid using reserved words (such as [these](https://docs.snowflake.com/en/sql-reference/reserved-keywords.html) for Snowflake) as column names.

- Consistency is key! Use the same field names across models where possible. For example, a key to the `customers` table should be named `customer_id` rather than `user_id`.

## Model Configurations

- Model configurations at the [folder level](https://docs.getdbt.com/reference/model-configs#configuring-directories-of-models-in-dbt_projectyml) should be considered (and if applicable, applied) first.
- More specific configurations should be applied at the model level [using one of these methods](https://docs.getdbt.com/reference/model-configs#apply-configurations-to-one-model-only).
- Models within the `marts` folder should be materialized as `table` or `incremental`.
- By default, `marts` should be materialized as `table` within `dbt_project.yml`.
- If switching to `incremental`, this should be specified in the model's configuration.

## Testing

- At a minimum, `unique` and `not_null` tests should be applied to the expected primary key of each model.

## CTEs

For more information about why we use so many CTEs, read [this glossary entry](https://docs.getdbt.com/terms/cte).

- Where performance permits, CTEs should perform a single, logical unit of work.
- CTE names should be as verbose as needed to convey what they do.
- CTEs with confusing or noteable logic should be commented with SQL comments as you would with any complex functions and should be located above the CTE.
- CTEs duplicated across models should be pulled out and created as their own models.
```
Loading