Commit: Merge branch 'current' into packages

dbeatty10 authored Dec 11, 2023
2 parents 4d8c5f8 + 2560cfb commit 819dfdf
Showing 214 changed files with 1,396 additions and 469 deletions.
2 changes: 1 addition & 1 deletion contributing/content-style-guide.md
@@ -284,7 +284,7 @@ If the list starts getting lengthy and dense, consider presenting the same conte

A bulleted list with introductory text:

> A dbt project is a directory of `.sql` and .yml` files. The directory must contain at a minimum:
> A dbt project is a directory of `.sql` and `.yml` files. The directory must contain at a minimum:
>
> - Models: A model is a single `.sql` file. Each model contains a single `select` statement that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.
> - A project file: A `dbt_project.yml` file, which configures and defines your dbt project.
79 changes: 79 additions & 0 deletions website/docs/best-practices/clone-incremental-models.md
@@ -0,0 +1,79 @@
---
title: "Clone incremental models as the first step of your CI job"
id: "clone-incremental-models"
description: Learn how to define clone incremental models as the first step of your CI job.
displayText: Clone incremental models as the first step of your CI job
hoverSnippet: Learn how to clone incremental models for CI jobs.
---

Before you begin, you must be aware of a few conditions:
- `dbt clone` is only available with dbt version 1.6 and newer. Refer to our [upgrade guide](/docs/dbt-versions/upgrade-core-in-cloud) for help enabling newer versions in dbt Cloud.
- This strategy only works for warehouses that support zero-copy cloning (otherwise, `dbt clone` will just create pointer views).
- Some teams may want to test that their incremental models run in both incremental mode and full-refresh mode.

Imagine you've created a [Slim CI job](/docs/deploy/continuous-integration) in dbt Cloud and it is configured to:

- Defer to your production environment.
- Run the command `dbt build --select state:modified+` to run and test all of the models you've modified and their downstream dependencies.
- Trigger whenever a developer on your team opens a PR against the main branch.

<Lightbox src="/img/best-practices/slim-ci-job.png" width="70%" title="Example of a slim CI job with the above configurations" />

Now imagine your dbt project looks something like this in the DAG:

<Lightbox src="/img/best-practices/dag-example.png" width="70%" title="Sample project DAG" />

When you open a pull request (PR) that modifies `dim_wizards`, your CI job will kick off and build _only the modified models and their downstream dependencies_ (in this case, `dim_wizards` and `fct_orders`) into a temporary schema that's unique to your PR.

This build mimics the behavior of what will happen once the PR is merged into the main branch. It ensures you're not introducing breaking changes, without needing to build your entire dbt project.

## What happens when one of the modified models (or one of their downstream dependencies) is an incremental model?

Because your CI job is building modified models into a PR-specific schema, on the first execution of `dbt build --select state:modified+`, the modified incremental model will be built in its entirety _because it does not yet exist in the PR-specific schema_ and [is_incremental will be false](/docs/build/incremental-models#understanding-the-is_incremental-macro). You're running in `full-refresh` mode.
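For reference, here's a minimal sketch of the kind of incremental model this describes, using the `is_incremental()` macro (the model and column names are illustrative, not from a real project):

```sql
{{ config(materialized='incremental', unique_key='order_id') }}

select *
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- on incremental runs, only pull rows newer than what's already in the table
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run in the PR-specific schema, the target table doesn't exist, `is_incremental()` evaluates to false, and the `where` filter is skipped, so the model scans and rebuilds everything.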

This can be suboptimal because:
- Incremental models are typically your largest datasets, so building them in their entirety takes a long time, slowing down development and incurring high warehouse costs.
- There are situations where a `full-refresh` of the incremental model passes successfully in your CI job, but an _incremental_ build of that same table in prod would fail when the PR is merged into main (think schema drift where [on_schema_change](/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change) config is set to `fail`).

You can alleviate these problems by zero-copy cloning the relevant, pre-existing incremental models into your PR-specific schema as the first step of the CI job, using the `dbt clone` command. This way, the incremental models already exist in the PR-specific schema when you first execute `dbt build --select state:modified+`, so the `is_incremental` flag will be `true`.
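On a warehouse that supports zero-copy cloning (Snowflake, for example), the clone step issues DDL along these lines. This is a hedged sketch only, and the database and schema names are purely illustrative:

```sql
-- a zero-copy clone is copy-on-write: near-instant, no storage duplicated up front
create or replace table analytics.dbt_cloud_pr_123.fct_orders
    clone analytics.prod.fct_orders;
```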

You'll have two commands for your dbt Cloud CI check to execute:
1. Clone all of the pre-existing incremental models that have been modified or are downstream of another model that has been modified: `dbt clone --select state:modified+,config.materialized:incremental,state:old`
2. Build all of the models that have been modified and their downstream dependencies: `dbt build --select state:modified+`

Because of your first clone step, the incremental models selected in your `dbt build` on the second step will run in incremental mode.

<Lightbox src="/img/best-practices/clone-command.png" width="70%" title="Clone command in the CI config" />

Your CI jobs will run faster, and you're more accurately mimicking the behavior of what will happen once the PR has been merged into main.

### Expansion on "think schema drift where [on_schema_change](/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change) config is set to `fail`" from above

Imagine you have an incremental model `my_incremental_model` with the following config:

```sql
{{
    config(
        materialized='incremental',
        unique_key='unique_id',
        on_schema_change='fail'
    )
}}
```

Now, let’s say you open a PR that adds a new column to `my_incremental_model` (see the sketch after this list). In this case:
- An incremental build will fail.
- A `full-refresh` will succeed.
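A hedged sketch of what such a change might look like; the upstream model and column names are made up for illustration:

```sql
{{
    config(
        materialized='incremental',
        unique_key='unique_id',
        on_schema_change='fail'
    )
}}

select
    unique_id,
    updated_at,
    customer_segment  -- newly added column: incremental runs now fail, a full refresh succeeds
from {{ ref('stg_events') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```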

If you have a daily production job that just executes `dbt build` without a `--full-refresh` flag, the job will fail once the PR is merged into main. So the question is: what do you want to happen in CI?
- Do you want CI to fail too, so you know that once this PR is merged into main you need to immediately execute `dbt build --full-refresh --select my_incremental_model` in production to avoid a failure in prod? This will block your CI check from passing.
- Do you want your CI check to succeed, because once you run a `full-refresh` for this model in prod you will be in a successful state? This may lead to unpleasant surprises if you merge the PR into main, forget that you need to execute `dbt build --full-refresh --select my_incremental_model` in production, and your production job suddenly fails.

There’s probably no perfect solution here; it’s all just tradeoffs! Our preference would be to let the CI job fail and manually override the blocking branch protection rule, so that there are no surprises and we can proactively run the appropriate command in production once the PR is merged.

### Expansion on "why `state:old`"

For brand-new incremental models, you want them to run in `full-refresh` mode in CI, because that's how they will run in production when the PR is merged into `main`. They also don't exist yet in the production environment; they're brand new!
If you omit `state:old` from the clone step, you won't get an error, just a warning along the lines of "No relation found in state manifest for…". So it technically works without `state:old`, but including it is more explicit and means the job won't even try to clone the brand-new incremental models.
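To unpack the selector used in the clone step: commas in `--select` intersect criteria, so the command only clones nodes that satisfy all three. A sketch (the selector is exactly the one above; the comments are the interpretation):

```shell
dbt clone --select "state:modified+,config.materialized:incremental,state:old"
# state:modified+                  -> modified nodes and their downstream dependencies
# config.materialized:incremental  -> only incremental models
# state:old                        -> only nodes that already exist in the comparison (prod) manifest
```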
@@ -49,9 +49,9 @@ The **prod** service principal should have “read” access to raw source data,

| | Source Data | Development catalog | Production catalog | Test catalog |
| --- | --- | --- | --- | --- |
| developers | use | use, create table & create view | use or none | none |
| production service principal | use | none | use, create table & create view | none |
| Test service principal | use | none | none | use, create table & create view |
| developers | use | use, create schema, table, & view | use or none | none |
| production service principal | use | none | use, create schema, table & view | none |
| Test service principal | use | none | none | use, create schema, table & view |


## Next steps
@@ -7,7 +7,7 @@ displayText: Materializations best practices
hoverSnippet: Read this guide to understand the incremental models you can create in dbt.
---

So far we’ve looked at tables and views, which map to the traditional objects in the data warehouse. As mentioned earlier, incremental models are a little different. This where we start to deviate from this pattern with more powerful and complex materializations.
So far we’ve looked at tables and views, which map to the traditional objects in the data warehouse. As mentioned earlier, incremental models are a little different. This is where we start to deviate from this pattern with more powerful and complex materializations.

- 📚 **Incremental models generate tables.** They physically persist the data itself to the warehouse, just piece by piece. What’s different is **how we build that table**.
- 💅 **Only apply our transformations to rows of data with new or updated information**, this maximizes efficiency.
@@ -53,7 +53,7 @@ where
updated_at > (select max(updated_at) from {{ this }})
```

Let’s break down that `where` clause a bit, because this where the action is with incremental models. Stepping through the code **_right-to-left_** we:
Let’s break down that `where` clause a bit, because this is where the action is with incremental models. Stepping through the code **_right-to-left_** we:

1. Get our **cutoff.**
1. Select the `max(updated_at)` timestamp — the **most recent record**
@@ -138,7 +138,7 @@ where
{% endif %}
```

Fantastic! We’ve got a working incremental model. On our first run, when there is no corresponding table in the warehouse, `is_incremental` will evaluate to false and we’ll capture the entire table. On subsequent runs is it will evaluate to true and we’ll apply our filter logic, capturing only the newer data.
Fantastic! We’ve got a working incremental model. On our first run, when there is no corresponding table in the warehouse, `is_incremental` will evaluate to false and we’ll capture the entire table. On subsequent runs it will evaluate to true and we’ll apply our filter logic, capturing only the newer data.

### Late arriving facts

2 changes: 1 addition & 1 deletion website/docs/community/resources/getting-help.md
@@ -60,4 +60,4 @@ If you want to receive dbt training, check out our [dbt Learn](https://learn.get
- Billing
- Bug reports related to the web interface

As a rule of thumb, if you are using dbt Cloud, but your problem is related to code within your dbt project, then please follow the above process rather than reaching out to support.
As a rule of thumb, if you are using dbt Cloud, but your problem is related to code within your dbt project, then please follow the above process rather than reaching out to support. Refer to [dbt Cloud support](/docs/dbt-support) for more information.
1 change: 1 addition & 0 deletions website/docs/community/spotlight/alison-stanton.md
@@ -17,6 +17,7 @@ socialLinks:
link: https://github.com/alison985/
dateCreated: 2023-11-07
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/bruno-de-lima.md
@@ -20,6 +20,7 @@ socialLinks:
link: https://medium.com/@bruno.szdl
dateCreated: 2023-11-05
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/dakota-kelley.md
@@ -15,6 +15,7 @@ socialLinks:
link: https://www.linkedin.com/in/dakota-kelley/
dateCreated: 2023-11-08
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/fabiyi-opeyemi.md
@@ -18,6 +18,7 @@ socialLinks:
link: https://www.linkedin.com/in/opeyemifabiyi/
dateCreated: 2023-11-06
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/josh-devlin.md
@@ -23,6 +23,7 @@ socialLinks:
link: https://www.linkedin.com/in/josh-devlin/
dateCreated: 2023-11-10
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/karen-hsieh.md
@@ -24,6 +24,7 @@ socialLinks:
link: https://medium.com/@ijacwei
dateCreated: 2023-11-04
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/oliver-cramer.md
@@ -16,6 +16,7 @@ socialLinks:
link: https://www.linkedin.com/in/oliver-cramer/
dateCreated: 2023-11-02
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/sam-debruyn.md
@@ -18,6 +18,7 @@ socialLinks:
link: https://debruyn.dev/
dateCreated: 2023-11-03
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/stacy-lo.md
@@ -17,6 +17,7 @@ socialLinks:
link: https://www.linkedin.com/in/olycats/
dateCreated: 2023-11-01
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
1 change: 1 addition & 0 deletions website/docs/community/spotlight/sydney-burns.md
@@ -15,6 +15,7 @@ socialLinks:
link: https://www.linkedin.com/in/sydneyeburns/
dateCreated: 2023-11-09
hide_table_of_contents: true
communityAward: true
---

## When did you join the dbt community and in what way has it impacted your career?
8 changes: 4 additions & 4 deletions website/docs/docs/about-setup.md
@@ -21,14 +21,14 @@ To begin configuring dbt now, select the option that is right for you.

<Card
title="dbt Cloud setup"
body="Learn how to connect to a data platform, integrate with secure authentication methods, configure a sync with a git repo, how to use the IDE, and how to install the dbt Cloud CLI."
body="Learn how to connect to a data platform, integrate with secure authentication methods, and configure a sync with a git repo."
link="/docs/cloud/about-cloud-setup"
icon="dbt-bit"/>

<Card
title="dbt Core installation"
body="Learn how to connect install dbt Core using Pip, Homebrew, Docker, or the open source repo."
link="/docs/core/installation"
title="dbt Core setup"
body="Learn about dbt Core and how to setup data platform connections."
link="/docs/core/about-core-setup"
icon="dbt-bit"/>

</div>
2 changes: 1 addition & 1 deletion website/docs/docs/build/build-metrics-intro.md
@@ -14,7 +14,7 @@ Use MetricFlow in dbt to centrally define your metrics. As a key component of th

MetricFlow allows you to:
- Intuitively define metrics in your dbt project
- Develop from your preferred environment, whether that's the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation), [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud), or [dbt Core](/docs/core/installation)
- Develop from your preferred environment, whether that's the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation), [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud), or [dbt Core](/docs/core/installation-overview)
- Use [MetricFlow commands](/docs/build/metricflow-commands) to query and test those metrics in your development environment
- Harness the true magic of the universal dbt Semantic Layer and dynamically query these metrics in downstream tools (Available for dbt Cloud [Team or Enterprise](https://www.getdbt.com/pricing/) accounts only).

5 changes: 1 addition & 4 deletions website/docs/docs/build/cumulative-metrics.md
@@ -38,10 +38,7 @@ metrics:

## Limitations
Cumulative metrics are currently under active development and have the following limitations:

1. You can only use the [`metric_time` dimension](/docs/build/dimensions#time) to check cumulative metrics. If you don't use `metric_time` in the query, the cumulative metric will return incorrect results because it won't perform the time spine join. This means you cannot reference time dimensions other than the `metric_time` in the query.
2. If you use `metric_time` in your query filter but don't include "start_time" and "end_time," cumulative metrics will left-censor the input data. For example, if you query a cumulative metric with a 7-day window with the filter `{{ TimeDimension('metric_time') }} BETWEEN '2023-08-15' AND '2023-08-30' `, the values for `2023-08-15` to `2023-08-20` return missing or incomplete data. This is because we apply the `metric_time` filter to the aggregation input. To avoid this, you must use `start_time` and `end_time` in the query filter.

- You are required to use [`metric_time` dimension](/docs/build/dimensions#time) when querying cumulative metrics. If you don't use `metric_time` in the query, the cumulative metric will return incorrect results because it won't perform the time spine join. This means you cannot reference time dimensions other than the `metric_time` in the query.

## Cumulative metrics example

3 changes: 2 additions & 1 deletion website/docs/docs/build/dimensions.md
@@ -15,7 +15,8 @@ In a data platform, dimensions is part of a larger structure called a semantic m
Groups are defined within semantic models, alongside entities and measures, and correspond to non-aggregatable columns in your dbt model that provides categorical or time-based context. In SQL, dimensions is typically included in the GROUP BY clause.-->

All dimensions require a `name`, `type` and in some cases, an `expr` parameter.
All dimensions require a `name`, `type` and in some cases, an `expr` parameter. The `name` for your dimension must be unique to the semantic model and can not be the same as an existing `entity` or `measure` within that same model.


| Parameter | Description | Type |
| --------- | ----------- | ---- |
2 changes: 1 addition & 1 deletion website/docs/docs/build/entities.md
@@ -8,7 +8,7 @@ tags: [Metrics, Semantic Layer]

Entities are real-world concepts in a business such as customers, transactions, and ad campaigns. We often focus our analyses around specific entities, such as customer churn or annual recurring revenue modeling. We represent entities in our semantic models using id columns that serve as join keys to other semantic models in your semantic graph.

Within a semantic graph, the required parameters for an entity are `name` and `type`. The `name` refers to either the key column name from the underlying data table, or it may serve as an alias with the column name referenced in the `expr` parameter.
Within a semantic graph, the required parameters for an entity are `name` and `type`. The `name` refers to either the key column name from the underlying data table, or it may serve as an alias with the column name referenced in the `expr` parameter. The `name` for your entity must be unique to the semantic model and can not be the same as an existing `measure` or `dimension` within that same model.

Entities can be specified with a single column or multiple columns. Entities (join keys) in a semantic model are identified by their name. Each entity name must be unique within a semantic model, but it doesn't have to be unique across different semantic models.

2 changes: 2 additions & 0 deletions website/docs/docs/build/materializations.md
@@ -14,6 +14,8 @@ pagination_next: "docs/build/incremental-models"
- ephemeral
- materialized view

You can also configure [custom materializations](/guides/create-new-materializations?step=1) in dbt. Custom materializations are a powerful way to extend dbt's functionality to meet your specific needs.


## Configuring materializations
By default, dbt models are materialized as "views". Models can be configured with a different materialization by supplying the `materialized` configuration parameter as shown below.
3 changes: 2 additions & 1 deletion website/docs/docs/build/measures.md
@@ -34,7 +34,8 @@ measures:
When you create a measure, you can either give it a custom name or use the `name` of the data platform column directly. If the `name` of the measure is different from the column name, you need to add an `expr` to specify the column name. The `name` of the measure is used when creating a metric.

Measure names must be **unique** across all semantic models in a project.
Measure names must be unique across all semantic models in a project and can not be the same as an existing `entity` or `dimension` within that same model.


### Description
