Merge branch 'current' into ENTERPRISE-84-end-user-saml-sso-using-google-idp-should-get-group-membership
robotmiller authored Sep 23, 2023
2 parents c4b78cc + d8a207a commit b0ff48c
Showing 6 changed files with 200 additions and 13 deletions.
84 changes: 84 additions & 0 deletions website/docs/docs/cloud/billing.md
@@ -94,6 +94,90 @@ There are 2 options to disable models from being built and charged:
2. Alternatively, you can delete some or all of your dbt Cloud jobs. This will ensure that no runs are kicked off, but you will permanently lose your job(s).


## Optimize costs in dbt Cloud

dbt Cloud offers ways to optimize your successful models built usage and your warehouse costs.

### Best practices for optimizing successful models built

There are several ways to reduce the costs of successful models built while still adhering to best practices. To make sure you keep testing your models and rebuilding views whenever their logic changes, implement the combination of the best practices below that fits your needs. In particular, if you decide to exclude views from your regularly scheduled dbt Cloud job runs, it's imperative that you set up a [merge job](#build-only-changed-views) to deploy updated view logic when changes are detected.

#### Exclude views in a dbt Cloud job

Many dbt Cloud users rely on views, which don't always need to be rebuilt every time a job runs. For any job whose views _do not_ include macros that dynamically generate code (for example, case statements) based on upstream tables and _do not_ have tests, you can implement these steps:

1. Go to your current production deployment job in dbt Cloud.
2. Modify your command to include: `--exclude config.materialized:view`.
3. Save your job changes.
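
For example, if your job currently runs `dbt build`, the modified step might look like this (a sketch; fold the flag into whatever command your job already runs):

```shell
dbt build --exclude config.materialized:view
```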

If you have views that contain macros with case statements based on upstream tables, these will need to be run each time to account for new values. If you still need to test your views with each run, follow the [Exclude views while still running tests](#exclude-views-while-running-tests) best practice to create a custom selector.
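
As an illustration, here's a hypothetical view built with the `dbt_utils.get_column_values` macro (assuming the dbt_utils package is installed; the model and column names are made up). Because the macro queries the upstream table at compile time, the generated case statements only pick up new values when the view is rebuilt:

```sql
-- models/order_status_flags.sql (hypothetical example)
-- get_column_values runs a query at compile time, so the case
-- statements below are frozen until the next time this view is built.
{% set statuses = dbt_utils.get_column_values(ref('stg_orders'), 'status') %}

select
    order_id,
    {% for status in statuses %}
    case when status = '{{ status }}' then 1 else 0 end as is_{{ status }}
    {%- if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('stg_orders') }}
```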

#### Exclude views while running tests

Running tests for views in every job run can help keep data quality intact and save you from the need to rerun failed jobs. To exclude views from your job run while running tests, you can follow these steps to create a custom [selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors) for your job command.

1. Open your dbt project in the dbt Cloud IDE.
2. Add a file called `selectors.yml` in your top-level project folder.
3. In the file, add the following code:

```yaml
selectors:
  - name: skip_views_but_test_views
    description: >
      A default selector that will exclude materializing views
      without skipping tests on views.
    default: true
    definition:
      union:
        - union:
            - method: path
              value: "*"
            - exclude:
                - method: config.materialized
                  value: view
        - method: resource_type
          value: test
```

4. Save the file and commit it to your project.
5. Modify your dbt Cloud jobs to include `--selector skip_views_but_test_views`.
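
Because the selector sets `default: true`, runs in this project pick it up automatically when no other selection flags are passed; to be explicit, a job step might look like this (a sketch):

```shell
dbt build --selector skip_views_but_test_views
```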

#### Build only changed views

If you want to ensure that you're building views whenever the logic is changed, create a merge job that gets triggered when code is merged into main:

1. Ensure you have a [CI job setup](/docs/deploy/ci-jobs) in your environment.
2. Create a new [deploy job](/docs/deploy/deploy-jobs#create-and-schedule-jobs) and call it "Merge Job".
3. Set the **Environment** to your CI environment. Refer to [Types of environments](/docs/deploy/deploy-environments#types-of-environments) for more details.
4. Set **Commands** to: `dbt run -s state:modified+`.
Executing `dbt build` in this context is unnecessary because the CI job was used to both run and test the code that just got merged into main.
5. Under the **Execution Settings**, select the default production job to compare changes against:
- **Defer to a previous run state** — Select the “Merge Job” you created so the job compares and identifies what has changed since the last merge.
6. In your dbt project, follow the steps in [Run a dbt Cloud job on merge](/guides/orchestration/custom-cicd-pipelines/3-dbt-cloud-job-on-merge) to create a script to trigger the dbt Cloud API to run your job after a merge happens within your git repository or watch this [video](https://www.loom.com/share/e7035c61dbed47d2b9b36b5effd5ee78?sid=bcf4dd2e-b249-4e5d-b173-8ca204d9becb).
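
As a sketch of the API call such a script ultimately makes (the account ID, job ID, and token variable are placeholders):

```shell
# Trigger the "Merge Job" through the dbt Cloud API after a merge to main.
# ACCOUNT_ID, JOB_ID, and DBT_CLOUD_API_TOKEN are placeholders.
curl --request POST \
  --url "https://cloud.getdbt.com/api/v2/accounts/${ACCOUNT_ID}/jobs/${JOB_ID}/run/" \
  --header "Authorization: Token ${DBT_CLOUD_API_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{"cause": "Triggered by merge to main"}'
```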

The purpose of the merge job is to:

- Immediately deploy any changes from PRs to production.
- Ensure your production views remain up-to-date with how they’re defined in your codebase while remaining cost-efficient when running jobs in production.

The merge action will optimize your cloud data platform spend and shorten job times, but you’ll need to decide if making the change is right for your dbt project.

### Rework inefficient models

#### Job Insights tab

To reduce your warehouse spend, you can identify which models, on average, take the longest to build on the **Job** page under the **Insights** tab. This chart shows the average run time for each model based on its last 20 runs. Any model that takes longer than anticipated to build might be a prime candidate for optimization, which will ultimately reduce cloud warehouse spending.

#### Model Timing tab

To better understand how long each model takes to run within the context of a specific run, look at the **Model Timing** tab. To find it, select the run of interest on the **Run History** page, then click **Model Timing** on that **Run** page.

Once you've identified which models could be optimized, check out these other resources that walk through how to optimize your work:
* [Build scalable and trustworthy data pipelines with dbt and BigQuery](https://services.google.com/fh/files/misc/dbt_bigquery_whitepaper.pdf)
* [Best Practices for Optimizing Your dbt and Snowflake Deployment](https://www.snowflake.com/wp-content/uploads/2021/10/Best-Practices-for-Optimizing-Your-dbt-and-Snowflake-Deployment.pdf)
* [How to optimize and troubleshoot dbt models on Databricks](/guides/dbt-ecosystem/databricks-guides/how_to_optimize_dbt_models_on_databricks)

## FAQs

* What happens if I need more than 8 seats on the Team plan?
@@ -10,3 +10,98 @@ Now that you've seen how we style our dbt projects, it's time to build your own.
## Pre-commit hooks

Lastly, to ensure your style guide's automated rules are being followed without additional mental overhead to your team, you can use [pre-commit hooks](https://pre-commit.com/) to automatically check your code for style violations (and often fix them automagically) before it's committed. This is a great way to make sure your style guide is followed by all contributors. We recommend implementing this once you've settled on and published your style guide, and your codebase is conforming to it. This will ensure that all future commits follow the style guide. You can find an excellent set of open source pre-commit hooks for dbt from the community [here in the dbt-checkpoint project](https://github.com/dbt-checkpoint/dbt-checkpoint).
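
As a minimal sketch of what that configuration might look like (the hook IDs come from the dbt-checkpoint README; pin `rev` to the release you actually want):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v1.1.0  # placeholder; pin to the latest release
    hooks:
      - id: check-model-has-description
      - id: check-model-has-tests
```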

## Style guide template

```markdown
# dbt Example Style Guide

## SQL Style

- Use lowercase keywords.
- Use trailing commas.

## Model Organization

Our models (typically) fit into two main categories:

- Staging — Contains models that clean and standardize data.
- Marts — Contains models which combine or heavily transform data.

Things to note:

- There are different types of models that typically exist in each of the above categories. See [Model Layers](#model-layers) for more information.
- Read [How we structure our dbt projects](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview) for an example and more details around organization.

## Model Layers

- Only models in `staging` should select from [sources](https://docs.getdbt.com/docs/building-a-dbt-project/using-sources).
- Models not in the `staging` folder should select from [refs](https://docs.getdbt.com/reference/dbt-jinja-functions/ref).

## Model File Naming and Coding

- All objects should be plural.
Example: `stg_stripe__invoices.sql` vs. `stg_stripe__invoice.sql`

- All models should use the naming convention `<type/dag_stage>_<source/topic>__<additional_context>`. See [this article](https://docs.getdbt.com/blog/stakeholder-friendly-model-names) for more information.

- Models in the **staging** folder should use the source's name as the `<source/topic>` and the entity name as the `additional_context`.

Examples:

- `seed_snowflake_spend.csv`
- `base_stripe__invoices.sql`
- `stg_stripe__customers.sql`
- `stg_salesforce__customers.sql`
- `int_customers__unioned.sql`
- `fct_orders.sql`

- Schema, table, and column names should be in `snake_case`.

- Limit the use of abbreviations that rely on domain knowledge. A new team member will understand `current_order_status` better than `current_os`.

- Use names based on the _business_ rather than the source terminology.

- Each model should have a primary key to identify the unique row and should be named `<object>_id`. For example, `account_id`. This makes it easier to know what `id` is referenced in downstream joined models.

- For `base` or `staging` models, columns should be ordered in categories, where identifiers are first and date/time fields are at the end.
- Date/time columns should be named according to these conventions:

- Timestamps: `<event>_at`
Format: UTC
Example: `created_at`

- Dates: `<event>_date`
Format: Date
Example: `created_date`

- Booleans should be prefixed with `is_` or `has_`.
Example: `is_active_customer` and `has_admin_access`

- Price/revenue fields should be in decimal currency (for example, `19.99` for $19.99; many app databases store prices as integers in cents). If a non-decimal currency is used, indicate this with suffixes. For example, `price_in_cents`.

- Avoid using reserved words (such as [these](https://docs.snowflake.com/en/sql-reference/reserved-keywords.html) for Snowflake) as column names.

- Consistency is key! Use the same field names across models where possible. For example, a key to the `customers` table should be named `customer_id` rather than `user_id`.

## Model Configurations

- Model configurations at the [folder level](https://docs.getdbt.com/reference/model-configs#configuring-directories-of-models-in-dbt_projectyml) should be considered (and if applicable, applied) first.
- More specific configurations should be applied at the model level [using one of these methods](https://docs.getdbt.com/reference/model-configs#apply-configurations-to-one-model-only).
- Models within the `marts` folder should be materialized as `table` or `incremental`.
- By default, `marts` should be materialized as `table` within `dbt_project.yml`.
- If switching to `incremental`, this should be specified in the model's configuration.

## Testing

- At a minimum, `unique` and `not_null` tests should be applied to the expected primary key of each model.

## CTEs

For more information about why we use so many CTEs, read [this glossary entry](https://docs.getdbt.com/terms/cte).

- Where performance permits, CTEs should perform a single, logical unit of work.
- CTE names should be as verbose as needed to convey what they do.
- CTEs with confusing or notable logic should be commented with SQL comments, placed above the CTE, just as you would comment any complex function.
- CTEs duplicated across models should be pulled out and created as their own models.
```
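
For instance, the minimum testing rule in the template above might look like this in a model's properties file (a sketch; the model and column names are hypothetical):

```yaml
# models/marts/fct_orders.yml (hypothetical)
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```
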
3 changes: 1 addition & 2 deletions website/docs/guides/legacy/debugging-schema-names.md
@@ -44,8 +44,7 @@ If your `generate_schema_name` macro looks like so:
```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{{ generate_schema_name_for_env(custom_schema_name, node) }}
{%- endmacro %}
```
Your project is switching out the `generate_schema_name` macro for another macro, `generate_schema_name_for_env`. Similar to the above example, this is a macro which is defined in dbt's global project, [here](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/include/global_project/macros/etc/get_custom_schema.sql#L43-L56).

Your project is switching out the `generate_schema_name` macro for another macro, `generate_schema_name_for_env`. Similar to the above example, this is a macro which is defined in dbt's global project, [here](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/get_custom_name/get_custom_schema.sql#L47-L60).
```sql
{% macro generate_schema_name_for_env(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}
    {%- if target.name == 'prod' and custom_schema_name is not none -%}

        {{ custom_schema_name | trim }}

    {%- else -%}

        {{ default_schema }}

    {%- endif -%}

{%- endmacro %}
```
8 changes: 8 additions & 0 deletions website/docs/reference/dbt-classes.md
@@ -86,6 +86,7 @@ col = Column('name', 'varchar', 255)
col.is_string() # True
col.is_numeric() # False
col.is_number() # False
col.is_integer() # False
col.is_float() # False
col.string_type() # character varying(255)
col.numeric_type('numeric', 12, 4) # numeric(12,4)
@@ -112,6 +113,7 @@ col.numeric_type('numeric', 12, 4) # numeric(12,4)
- **is_string()**: Returns True if the column is a String type (eg. text, varchar), else False
- **is_numeric()**: Returns True if the column is a fixed-precision Numeric type (eg. `numeric`), else False
- **is_number()**: Returns True if the column is a number-y type (eg. `numeric`, `int`, `float`, or similar), else False
- **is_integer()**: Returns True if the column is an integer (eg. `int`, `bigint`, `serial` or similar), else False
- **is_float()**: Returns True if the column is a float type (eg. `float`, `float64`, or similar), else False
- **string_size()**: Returns the width of the column if it is a string type, else, an exception is raised

@@ -136,6 +138,9 @@ col.numeric_type('numeric', 12, 4) # numeric(12,4)
-- Return true if the column is a number
{{ string_column.is_number() }}
-- Return true if the column is an integer
{{ string_column.is_integer() }}
-- Return true if the column is a float
{{ string_column.is_float() }}
@@ -151,6 +156,9 @@ col.numeric_type('numeric', 12, 4) # numeric(12,4)
-- Return true if the column is a number
{{ numeric_column.is_number() }}
-- Return true if the column is an integer
{{ numeric_column.is_integer() }}
-- Return true if the column is a float
{{ numeric_column.is_float() }}
@@ -1,22 +1,23 @@
---
title: " About dbt_project.yml context variables"
title: " About dbt_project.yml context"
sidebar_label: "dbt_project.yml context"
id: "dbt-project-yml-context"
description: "The context variables and methods are available when configuring resources in the dbt_project.yml file."
description: "The context methods and variables available when configuring resources in the dbt_project.yml file."
---

The following context variables and methods are available when configuring
The following context methods and variables are available when configuring
resources in the `dbt_project.yml` file. This applies to the `models:`, `seeds:`,
and `snapshots:` keys in the `dbt_project.yml` file.

**Available context methods:**
- [env_var](/reference/dbt-jinja-functions/env_var)
- [var](/reference/dbt-jinja-functions/var) (_Note: only variables defined with `--vars` are available_)

**Available context variables:**
- [target](/reference/dbt-jinja-functions/target)
- [env_var](/reference/dbt-jinja-functions/env_var)
- [vars](/reference/dbt-jinja-functions/var) (_Note: only variables defined with `--vars` are available_)
- [builtins](/reference/dbt-jinja-functions/builtins)
- [dbt_version](/reference/dbt-jinja-functions/dbt_version)


### Example configuration

<File name='dbt_project.yml'>
@@ -1,16 +1,16 @@
---
title: "About profiles.yml context variable"
title: "About profiles.yml context"
sidebar_label: "profiles.yml context"
id: "profiles-yml-context"
description: "Use these context variables to configure resources in `profiles.yml` file."
description: "Use these context methods to configure resources in `profiles.yml` file."
---

The following context variables and methods are available when configuring
The following context methods are available when configuring
resources in the `profiles.yml` file.

**Available context variables:**
**Available context methods:**
- [env_var](/reference/dbt-jinja-functions/env_var)
- [vars](/reference/dbt-jinja-functions/var) (_Note: only variables defined with `--vars` are available_)
- [var](/reference/dbt-jinja-functions/var) (_Note: only variables defined with `--vars` are available_)

### Example usage

