Merge branch 'current' into runleonarun-patch-15
runleonarun authored Jan 8, 2024
2 parents 0f57c4d + fdfffe8 commit 25abe94
Showing 8 changed files with 171 additions and 37 deletions.
16 changes: 9 additions & 7 deletions website/docs/best-practices/how-we-mesh/mesh-1-intro.md
@@ -6,19 +6,21 @@ hoverSnippet: Learn how to get started with dbt Mesh

## What is dbt Mesh?

Organizations of all sizes, from small startups to large enterprises, rely on dbt to manage their data transformations. At scale, it can be challenging to coordinate all the organizational and technical requirements demanded by your stakeholders within the scope of a single dbt project.

To date, there also hasn't been a first-class way to effectively manage the dependencies, governance, and workflows between multiple dbt projects.

That's where **dbt Mesh** comes in - empowering data teams to work *independently and collaboratively*, sharing data, code, and best practices without sacrificing security or autonomy.

This guide will walk you through the concepts and implementation details needed to get started. dbt Mesh is not a single product - it is a pattern enabled by a convergence of several features in dbt:

- **[Cross-project references](/docs/collaborate/govern/project-dependencies#how-to-write-cross-project-ref)** - this is the foundational feature that enables multi-project deployments. `{{ ref() }}`s now work across dbt Cloud projects on Enterprise plans.
- **[dbt Explorer](/docs/collaborate/explore-projects)** - dbt Cloud's metadata-powered documentation platform, complete with full, cross-project lineage.
- **Governance** - dbt's governance features allow you to manage access to your dbt models both within and across projects; a short sketch of how these configs compose follows this list.
- **[Groups](/docs/collaborate/govern/model-access#groups)** - With groups, you can organize nodes in your dbt DAG that share a logical connection (for example, by functional area) and assign an owner to the entire group.
- **[Access](/docs/collaborate/govern/model-access#access-modifiers)** - access configs allow you to control who can reference models.
- **[Model Versions](/docs/collaborate/govern/model-versions)** - when coordinating across projects and teams, we recommend treating your data models as stable APIs. Model versioning is the mechanism to allow graceful adoption and deprecation of models as they evolve.
- **[Model Contracts](/docs/collaborate/govern/model-contracts)** - data contracts set explicit expectations on the shape of the data to ensure data changes upstream of dbt or within a project's logic don't break downstream consumers' data products.
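
To give a flavor of how these governance configs compose in practice, here's a minimal sketch (the group, owner, and model names are placeholders, not from this guide):

```yaml
groups:
  - name: marketing
    owner:
      name: Marketing Data Team

models:
  - name: dim_customers
    group: marketing
    access: public   # models in other projects can now ref() this model
```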

## Who is dbt Mesh for?

@@ -127,4 +127,4 @@ We've provided a set of example projects you can use to explore the topics cover

### dbt-meshify

We recommend using the `dbt-meshify` [command line tool](<https://dbt-labs.github.io/dbt-meshify/>) to help you do this. It provides CLI operations that automate most of the steps above.
4 changes: 2 additions & 2 deletions website/docs/docs/build/cumulative-metrics.md
@@ -31,8 +31,8 @@ metrics:
    label: The value that will be displayed in downstream tools # Required
    type_params: # Required
      measure: The measure you are referencing # Required
      window: The accumulation window, such as 1 month, 7 days, or 1 year. # Optional. Cannot be used with grain_to_date
      grain_to_date: Sets the accumulation grain. For example, month will accumulate data for one month, then restart at the beginning of the next. # Optional. Cannot be used with window

```
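
For example, a cumulative metric over a rolling 7-day window might look like the following (the metric and measure names here are hypothetical):

```yaml
metrics:
  - name: weekly_active_users
    label: Weekly active users
    type: cumulative
    type_params:
      measure: active_users
      window: 7 days
```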

126 changes: 125 additions & 1 deletion website/docs/docs/build/incremental-models.md
@@ -236,7 +236,7 @@ Instead, whenever the logic of your incremental changes, execute a full-refresh

## About `incremental_strategy`

There are various ways (strategies) to implement the concept of incremental materializations. The value of each strategy depends on:

* the volume of data,
* the reliability of your `unique_key`, and
@@ -450,5 +450,129 @@ The syntax depends on how you configure your `incremental_strategy`:

</VersionBlock>

### Built-in strategies

Before diving into [custom strategies](#custom-strategies), it's important to understand the built-in incremental strategies in dbt and their corresponding macros:

| `incremental_strategy` | Corresponding macro |
|------------------------|----------------------------------------|
| `append` | `get_incremental_append_sql` |
| `delete+insert` | `get_incremental_delete_insert_sql` |
| `merge` | `get_incremental_merge_sql` |
| `insert_overwrite` | `get_incremental_insert_overwrite_sql` |


For example, the built-in `append` strategy can be defined and used with the following files:

<File name='macros/append.sql'>

```sql
{% macro get_incremental_append_sql(arg_dict) %}

{# Unpack the strategy arguments dbt passes in and delegate to a helper macro #}
{% do return(some_custom_macro_with_sql(arg_dict["target_relation"], arg_dict["temp_relation"], arg_dict["unique_key"], arg_dict["dest_columns"], arg_dict["incremental_predicates"])) %}

{% endmacro %}


{% macro some_custom_macro_with_sql(target_relation, temp_relation, unique_key, dest_columns, incremental_predicates) %}

{#- Build a quoted, comma-separated list of destination column names -#}
{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

insert into {{ target_relation }} ({{ dest_cols_csv }})
(
select {{ dest_cols_csv }}
from {{ temp_relation }}
)

{% endmacro %}
```
</File>

Define a model `models/my_model.sql`:

<File name='models/my_model.sql'>

```sql
{{ config(
    materialized="incremental",
    incremental_strategy="append",
) }}

select * from {{ ref("some_model") }}
```

</File>

### Custom strategies

<VersionBlock lastVersion="1.1">

Custom incremental strategies can be defined beginning in dbt v1.2.

</VersionBlock>

<VersionBlock firstVersion="1.2">

As an easier alternative to [creating an entirely new materialization](/guides/create-new-materializations), users can define and use their own custom incremental strategies by:

1. defining a macro named `get_incremental_STRATEGY_sql`. Note that `STRATEGY` is a placeholder and you should replace it with the name of your custom incremental strategy.
2. configuring `incremental_strategy: STRATEGY` within an incremental model

dbt won't validate user-defined strategies; it will simply look for a macro by that name and raise an error if it can't find one.

For example, a user-defined strategy named `insert_only` can be defined and used with the following files:

<File name='macros/my_custom_strategies.sql'>

```sql
{% macro get_incremental_insert_only_sql(arg_dict) %}

{% do return(some_custom_macro_with_sql(arg_dict["target_relation"], arg_dict["temp_relation"], arg_dict["unique_key"], arg_dict["dest_columns"], arg_dict["incremental_predicates"])) %}

{% endmacro %}


{% macro some_custom_macro_with_sql(target_relation, temp_relation, unique_key, dest_columns, incremental_predicates) %}

{%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

insert into {{ target_relation }} ({{ dest_cols_csv }})
(
select {{ dest_cols_csv }}
from {{ temp_relation }}
)

{% endmacro %}
```

</File>

<File name='models/my_model.sql'>

```sql
{{ config(
    materialized="incremental",
    incremental_strategy="insert_only",
    ...
) }}

...
```

</File>

### Custom strategies from a package

To use the `merge_null_safe` custom incremental strategy from the `example` package:
- [Install the package](/docs/build/packages#how-do-i-add-a-package-to-my-project)
- Then add the following macro to your project:

<File name='macros/my_custom_strategies.sql'>

```sql
{% macro get_incremental_merge_null_safe_sql(arg_dict) %}
{% do return(example.get_incremental_merge_null_safe_sql(arg_dict)) %}
{% endmacro %}
```

</File>
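
A model can then opt in to the packaged strategy by name, just as with a built-in strategy (a minimal sketch; the model and ref names are placeholders):

```sql
{{ config(
    materialized="incremental",
    incremental_strategy="merge_null_safe"
) }}

select * from {{ ref("some_model") }}
```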
</VersionBlock>

<Snippet path="discourse-help-feed-header" />
<DiscourseHelpFeed tags="incremental"/>
43 changes: 22 additions & 21 deletions website/docs/docs/cloud/dbt-cloud-ide/lint-format.md
@@ -14,7 +14,7 @@ Linters analyze code for errors, bugs, and style issues, while formatters fix st
</details>


In the dbt Cloud IDE, you can perform linting, auto-fix, and formatting on five different file types:

- SQL &mdash; [Lint](#lint) and fix with SQLFluff, and [format](#format) with sqlfmt
- YAML, Markdown, and JSON &mdash; Format with Prettier
@@ -146,7 +146,7 @@ The Cloud IDE formatting integrations take care of manual tasks like code format

To format your SQL code, dbt Cloud integrates with [sqlfmt](http://sqlfmt.com/), an uncompromising SQL query formatter that provides one consistent way to format SQL and Jinja.

By default, the IDE uses sqlfmt rules to format your code, making the **Format** button available and convenient to use immediately. However, if you have a file named `.sqlfluff` in the root directory of your dbt project, the IDE will default to SQLFluff rules instead.

To enable sqlfmt:

@@ -189,10 +189,8 @@ To format your Python code, dbt Cloud integrates with [Black](https://black.read

## FAQs

<detailsToggle alt_header="When should I use SQLFluff and when should I use sqlfmt?">
SQLFluff and sqlfmt are both tools used for formatting SQL code, but some differences may make one preferable to the other depending on your use case. <br />

SQLFluff is a SQL code linter and formatter. It analyzes your code to identify potential issues and bugs and to check that it follows coding standards. It also formats your code according to a set of [customizable](#customize-linting) rules to ensure consistent coding practices. You can also use SQLFluff to keep your SQL code well-formatted and follow styling best practices. <br />

@@ -204,34 +202,37 @@ You can use either SQLFluff or sqlfmt depending on your preference and what work

- Use sqlfmt if you only want your code well-formatted, without analyzing it for errors and bugs. sqlfmt works out of the box, making it convenient to use right away without any configuration.

</detailsToggle>

<detailsToggle alt_header="Can I nest `.sqlfluff` files?">

To ensure optimal code quality and a consistent style, it's highly recommended you keep one main `.sqlfluff` configuration file in the root folder of your project. Having multiple files can result in inconsistent SQL styles across your project. <br /><br />

However, you can customize and include an additional child `.sqlfluff` configuration file within specific subfolders of your dbt project. <br /><br />If you nest a `.sqlfluff` file in a subfolder, SQLFluff applies the rules defined in that subfolder's configuration file to any files located within it. The rules specified in the parent `.sqlfluff` file will be used for all other files and folders outside of the subfolder. This hierarchical approach allows for tailored linting rules while maintaining consistency throughout your project. Refer to [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/configuration.html#configuration-files) for more info.
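
For illustration, a minimal root-level `.sqlfluff` might look like the following; a child file in a subfolder could then override just one of these sections. The dialect and rule names here are hypothetical and vary by SQLFluff version:

```ini
[sqlfluff]
templater = dbt
dialect = snowflake

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper
```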

</detailsToggle>

<detailsToggle alt_header="Can I run SQLFluff commands from the terminal?">

Currently, running SQLFluff commands from the terminal isn't supported.
</detailsToggle>

<detailsToggle alt_header="Why am I unable to see the Lint or Format button?">

Make sure you're on a development branch. Formatting and linting aren't available on "main" or "read-only" branches.
</detailsToggle>

<detailsToggle alt_header="Why is there inconsistent SQLFluff behavior when running outside the dbt Cloud IDE?">
- Double-check that your SQLFluff version matches the one in the dbt Cloud IDE (found in the **Code Quality** tab after a lint operation).
- If your lint operation passes despite clear rule violations, confirm you're not linting ephemeral models. Linting doesn't support ephemeral models in dbt v1.5 and lower.
</detailsToggle>

<detailsToggle alt_header="What are some considerations when using dbt Cloud linting?">
Currently, the dbt Cloud IDE can lint or fix files only up to a certain size and complexity. If you attempt to lint or fix a file that's too large, taking more than 60 seconds for the dbt Cloud backend to process, you'll see an 'Unable to complete linting this file' error.

To avoid this, break up your model into smaller models (files) so that they are less complex to lint or fix. Note that linting is simpler than fixing, so there may be cases where a file can be linted but not fixed.

</detailsToggle>

## Related docs

12 changes: 10 additions & 2 deletions website/docs/docs/collaborate/govern/model-contracts.md
@@ -28,10 +28,18 @@ While this is ideal for quick and iterative development, for some models, consta
## Where are contracts supported?

At present, model contracts are supported for:
- SQL models.
- Models materialized as one of the following:
  - `table`
  - `view` &mdash; Views offer limited support for column names and data types, but not `constraints`.
  - `incremental` &mdash; with `on_schema_change: append_new_columns` or `on_schema_change: fail`.
- Certain data platforms, but the supported and enforced `constraints` vary by platform.

Model contracts are _not_ supported for:
- Python models.
- `ephemeral`-materialized SQL models.


## How to define a contract

Let's say you have a model with a query like:
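
For instance, imagine a hypothetical `dim_customers` model (the model, ref, and column names below are placeholders):

```sql
select
    customer_id,
    customer_name
from {{ ref('stg_customers') }}
```

A contract for it can then be enforced in the model's YAML by declaring each column's name and `data_type` (a minimal sketch; available data types and `constraints` vary by platform):

```yaml
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
      - name: customer_name
        data_type: string
```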
4 changes: 2 additions & 2 deletions website/docs/guides/manual-install-qs.md
@@ -70,7 +70,7 @@
$ pwd
<Lightbox src="/img/starter-project-dbt-cli.png" title="The starter project in a code editor" />
</div>

6. dbt provides the following values in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

Expand All @@ -92,7 +92,7 @@ models:

## Connect to BigQuery

When developing locally, dbt connects to your <Term id="data-warehouse" /> using a [profile](/docs/core/connect-data-platform/connection-profiles), which is a YAML file with all the connection details to your warehouse.

1. Create a file in the `~/.dbt/` directory named `profiles.yml`.
2. Move your BigQuery keyfile into this directory.
1 change: 0 additions & 1 deletion website/docs/reference/global-configs/usage-stats.md
@@ -18,4 +18,3 @@ config:
dbt Core users can also use the `DO_NOT_TRACK` environment variable to enable or disable sending anonymous data. For more information, see [Environment variables](/docs/build/environment-variables).

- `DO_NOT_TRACK=1` is the same as `DBT_SEND_ANONYMOUS_USAGE_STATS=False`
- `DO_NOT_TRACK=0` is the same as `DBT_SEND_ANONYMOUS_USAGE_STATS=True`
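
For example, a hypothetical invocation that disables tracking for a single run:

```shell
DO_NOT_TRACK=1 dbt run  # equivalent to setting DBT_SEND_ANONYMOUS_USAGE_STATS=False
```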
