diff --git a/website/blog/2024-10-04-hybrid-mesh.md b/website/blog/2024-10-04-hybrid-mesh.md new file mode 100644 index 00000000000..34b2a67d1cb --- /dev/null +++ b/website/blog/2024-10-04-hybrid-mesh.md @@ -0,0 +1,89 @@ +--- +title: "How Hybrid Mesh unlocks dbt collaboration at scale" +description: A deep-dive into the Hybrid Mesh pattern for enabling collaboration between domain teams using dbt Core and dbt Cloud. +slug: hybrid-mesh +authors: [jason_ganz] +tags: [analytics craft] +hide_table_of_contents: false +date: 2024-09-30 +is_featured: true +--- + +One of the most important things that dbt does is unlock the ability for teams to collaborate on creating and disseminating organizational knowledge. + +In the past, this primarily looked like a team working in one dbt Project to create a set of transformed objects in their data platform. + +As dbt was adopted by larger organizations and began to drive workloads at a global scale, it became clear that we needed mechanisms to allow teams to operate independently from each other, creating and sharing data models across teams — [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro). + + + +dbt Mesh is powerful because it allows teams to operate _independently_ and _collaboratively_, each team free to build on their own but contributing to a larger, shared set of data outputs. + +The flexibility of dbt Mesh means that it can support [a wide variety of patterns and designs](/best-practices/how-we-mesh/mesh-3-structures). Today, let’s dive into one pattern that is showing promise as a way to enable teams working on very different dbt deployments to work together. + +## How Hybrid Mesh enables collaboration between dbt Core and dbt Cloud teams + +**_Scenario_** — A company with a central data team uses dbt Core. The setup is working well for that team. They want to scale their impact to enable faster decision-making, organization-wide. The current dbt Core setup isn't well suited for onboarding a larger number of less-technical, nontechnical, or less-frequent contributors. + +**_The goal_** — Enable three domain teams of less-technical users to leverage and extend the central data models, with full ownership over their domain-specific dbt models. + + - **Central data team:** Data engineers comfortable using dbt Core and the command line interface (CLI), building and maintaining foundational data models for the entire organization. + + - **Domain teams:** Data analysts comfortable working in SQL but not using the CLI and prefer to start working right away without managing local dbt Core installations or updates. The team needs to build transformations specific to their business context. Some of these users may have tried dbt in the past, but they were not able to successfully onboard to the central team's setup. + +**_Solution: Hybrid Mesh_** — Data teams can use dbt Mesh to connect projects *across* dbt Core and dbt Cloud, creating a workflow where everyone gets to work in their preferred environment while creating a shared lineage that allows for visibility, validation, and ownership across the data pipeline. + +Each team will fully own its dbt code, from development through deployment, using the product that is appropriate to their needs and capabilities _while sharing data products across teams using both dbt Core and dbt Cloud._ + + + +Creating a Hybrid Mesh is mostly the same as creating any other [dbt Mesh](/guides/mesh-qs?step=1) workflow — there are a few considerations but mostly _it just works_. 
We anticipate it will continue to see adoption as more central data teams look to onboard their downstream domain teams. + +A Hybrid Mesh can be adopted as a stable long-term pattern, or as an intermediary while you perform a [migration from dbt Core to dbt Cloud](/guides/core-cloud-2?step=1). + +## How to build a Hybrid Mesh +Enabling a Hybrid Mesh is as simple as a few additional steps to import the metadata from your Core project into dbt Cloud. Once you’ve done this, you should be able to operate your dbt Mesh like normal and all of our [standard recommendations](/best-practices/how-we-mesh/mesh-1-intro) still apply. + +### Step 1: Prepare your Core project for access through dbt Mesh + +Configure public models to serve as stable interfaces for downstream dbt Projects. + +- Decide which models from your Core project will be accessible in your Mesh. For more information on how to configure public access for those models, refer to the [model access page.](/docs/collaborate/govern/model-access) +- Optionally set up a [model contract](/docs/collaborate/govern/model-contracts) for all public models for better governance. +- Keep dbt Core and dbt Cloud projects in separate repositories to allow for a clear separation between upstream models managed by the dbt Core team and the downstream models handled by the dbt Cloud team. + +### Step 2: Mirror each "producer" Core project in dbt Cloud +This allows dbt Cloud to know about the contents and metadata of your project, which in turn allows for other projects to access its models. + +- [Create a dbt Cloud account](https://www.getdbt.com/signup/) and a dbt project for each upstream Core project. + - Note: If you have [environment variables](/docs/build/environment-variables) in your project, dbt Cloud environment variables must be prefixed with `DBT_ `(including `DBT_ENV_CUSTOM_ENV_` or `DBT_ENV_SECRET`). Follow the instructions in [this guide](https://docs.getdbt.com/guides/core-to-cloud-1?step=8#environment-variables) to convert them for dbt Cloud. +- Each upstream Core project has to have a production [environment](/docs/dbt-cloud-environments) in dbt Cloud. You need to configure credentials and environment variables in dbt Cloud just so that it will resolve relation names to the same places where your dbt Core workflows are deploying those models. +- Set up a [merge job](/docs/deploy/merge-jobs) in a production environment to run `dbt parse`. This will enable connecting downstream projects in dbt Mesh by producing the necessary [artifacts](/reference/artifacts/dbt-artifacts) for cross-project referencing. + - Note: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud. +- Optional: Set up a regular job (for example, daily) to run `source freshness` and `docs generate`. This will hydrate dbt Cloud with additional metadata and enable features in [dbt Explorer](/docs/collaborate/explore-projects) that will benefit both teams, including [Column-level lineage](/docs/collaborate/column-level-lineage). 
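Before moving on, here is a sketch of what the Step 1 configuration might look like in the upstream Core project's YAML. The model and column names below are hypothetical, and the contract is optional:

```yaml
# Hypothetical properties file in the upstream dbt Core project
models:
  - name: fct_orders
    description: "Foundational orders model shared with downstream domain teams."
    access: public            # makes the model referenceable from other projects in the Mesh
    config:
      contract:
        enforced: true        # optional: locks in column names and data types for consumers
    columns:
      - name: order_id
        data_type: integer
      - name: order_total
        data_type: numeric
```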
+ +### Step 3: Create and connect your downstream projects to your Core project using dbt Mesh +Now that dbt Cloud has the necessary information about your Core project, you can begin setting up your downstream projects, building on top of the public models from the project you brought into Cloud in [Step 2](#step-2-mirror-each-producer-core-project-in-dbt-cloud). To do this: +- Initialize each new downstream dbt Cloud project and create a [`dependencies.yml` file](/docs/collaborate/govern/project-dependencies#use-cases). +- In that `dependencies.yml` file, add the dbt project name from the `dbt_project.yml` of the upstream project(s). This sets up cross-project references between different dbt projects: + + ```yaml + # dependencies.yml file in dbt Cloud downstream project + projects: + - name: upstream_project_name + ``` +- Use [cross-project references](/reference/dbt-jinja-functions/ref#ref-project-specific-models) for public models in upstream project. Add [version](/reference/dbt-jinja-functions/ref#versioned-ref) to references of versioned models: + ```yaml + select * from {{ ref('upstream_project_name', 'monthly_revenue') }} + ``` + +And that’s all it takes! From here, the domain teams that own each dbt Project can build out their models to fit their own use cases. You can now build out your Hybrid Mesh however you want, accessing the full suite of dbt Cloud features. +- Orchestrate your Mesh to ensure timely delivery of data products and make them available to downstream consumers. +- Use [dbt Explorer](/docs/collaborate/explore-projects) to trace the lineage of your data back to its source. +- Onboard more teams and connect them to your Mesh. +- Build [semantic models](/docs/build/semantic-models) and [metrics](/docs/build/metrics-overview) into your projects to query them with the [dbt Semantic Layer](https://www.getdbt.com/product/semantic-layer). + + +## Conclusion + +In a world where organizations have complex and ever-changing data needs, there is no one-size fits all solution. Instead, data practitioners need flexible tooling that meets them where they are. The Hybrid Mesh presents a model for this approach, where teams that are comfortable and getting value out of dbt Core can collaborate frictionlessly with domain teams on dbt Cloud. diff --git a/website/docs/docs/build/data-tests.md b/website/docs/docs/build/data-tests.md index 59d716b4ca9..ae3ac9225db 100644 --- a/website/docs/docs/build/data-tests.md +++ b/website/docs/docs/build/data-tests.md @@ -70,8 +70,6 @@ The name of this test is the name of the file: `assert_total_payment_amount_is_p Singular data tests are easy to write—so easy that you may find yourself writing the same basic structure over and over, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend... - - ## Generic data tests Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a `test` block, which contains a parametrized query and accepts arguments. It might look like: @@ -304,7 +302,6 @@ data_tests: -To suppress warnings about the rename, add `TestsConfigDeprecation` to the `silence` block of the `warn_error_options` flag in `dbt_project.yml`, [as described in the Warnings documentation](https://docs.getdbt.com/reference/global-configs/warnings). 
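Stepping back to the "Generic data tests" section earlier in this file: the `test` block it describes is just a parametrized SQL query that returns failing rows. A minimal sketch, with a hypothetical test name that makes no claim to match the example in the published page:

```sql
-- tests/generic/is_positive.sql (hypothetical)
{% test is_positive(model, column_name) %}

select *
from {{ model }}
where {{ column_name }} <= 0

{% endtest %}
```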
diff --git a/website/docs/docs/build/documentation.md b/website/docs/docs/build/documentation.md index d040d3c5bef..6f7c6c27f31 100644 --- a/website/docs/docs/build/documentation.md +++ b/website/docs/docs/build/documentation.md @@ -101,7 +101,18 @@ The events in this table are recorded by [Snowplow](http://github.com/snowplow/s In the above example, a docs block named `table_events` is defined with some descriptive markdown contents. There is nothing significant about the name `table_events` — docs blocks can be named however you like, as long as the name only contains alphanumeric and underscore characters and does not start with a numeric character. ### Placement -Docs blocks should be placed in files with a `.md` file extension. By default, dbt will search in all resource paths for docs blocks (i.e. the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [macro-paths](/reference/project-configs/macro-paths) and [snapshot-paths](/reference/project-configs/snapshot-paths)) — you can adjust this behavior using the [docs-paths](/reference/project-configs/docs-paths) config. + + + +Docs blocks should be placed in files with a `.md` file extension. By default, dbt will search in all resource paths for docs blocks (for example, the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [test-paths](/reference/project-configs/test-paths), [macro-paths](/reference/project-configs/macro-paths), and [snapshot-paths](/reference/project-configs/snapshot-paths)) — you can adjust this behavior using the [docs-paths](/reference/project-configs/docs-paths) config. + + + + + +Docs blocks should be placed in files with a `.md` file extension. By default, dbt will search in all resource paths for docs blocks (for example, the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [macro-paths](/reference/project-configs/macro-paths), and [snapshot-paths](/reference/project-configs/snapshot-paths)) — you can adjust this behavior using the [docs-paths](/reference/project-configs/docs-paths) config. + + ### Usage diff --git a/website/docs/docs/build/metricflow-commands.md b/website/docs/docs/build/metricflow-commands.md index 55472ba53ce..d9e01bede71 100644 --- a/website/docs/docs/build/metricflow-commands.md +++ b/website/docs/docs/build/metricflow-commands.md @@ -59,7 +59,6 @@ The following table lists the commands compatible with the dbt Cloud IDE and dbt |
Command | Description
| dbt Cloud IDE | dbt Cloud CLI | |---------|-------------|---------------|---------------| -| [`list`](#list) | Retrieves metadata values. | ✅ | ✅ | | [`list metrics`](#list-metrics) | Lists metrics with dimensions. | ✅ | ✅ | | [`list dimensions`](#list) | Lists unique dimensions for metrics. | ✅ | ✅ | | [`list dimension-values`](#list-dimension-values) | List dimensions with metrics. | ✅ | ✅ | @@ -94,7 +93,6 @@ Check out the following video for a short video demo of how to query or preview Use the `mf` prefix before the command name to execute them in dbt Core. For example, to list all metrics, run `mf list metrics`. -- [`list`](#list) — Retrieves metadata values. - [`list metrics`](#list-metrics) — Lists metrics with dimensions. - [`list dimensions`](#list) — Lists unique dimensions for metrics. - [`list dimension-values`](#list-dimension-values) — List dimensions with metrics. @@ -107,17 +105,7 @@ Use the `mf` prefix before the command name to execute them in dbt Core. For exa -### List - -This command retrieves metadata values related to [Metrics](/docs/build/metrics-overview), [Dimensions](/docs/build/dimensions), and [Entities](/docs/build/entities) values. - - ### List metrics - -```bash -dbt sl list # In dbt Cloud -mf list # In dbt Core -``` This command lists the metrics with their available dimensions: ```bash @@ -213,23 +201,23 @@ The list of available saved queries: The following command performs validations against the defined semantic model configurations. ```bash -dbt sl validate # dbt Cloud users -mf validate-configs # In dbt Core +dbt sl validate # For dbt Cloud users +mf validate-configs # For dbt Core users Options: - --dw-timeout INTEGER Optional timeout for data warehouse + --timeout # dbt Cloud only + Optional timeout for data warehouse validation in dbt Cloud. + --dw-timeout INTEGER # dbt Core only + Optional timeout for data warehouse validation steps. Default None. - --skip-dw If specified, skips the data warehouse - validations - --show-all If specified, prints warnings and future- - errors - --verbose-issues If specified, prints any extra details - issues might have - --semantic-validation-workers INTEGER - Optional. Uses the number of workers - specified to run the semantic validations. - Should only be used for exceptionally large - configs + --skip-dw # dbt Core only + Skips the data warehouse validations. + --show-all # dbt Core only + Prints warnings and future errors. + --verbose-issues # dbt Core only + Prints extra details about issues. + --semantic-validation-workers INTEGER # dbt Core only + Uses specified number of workers for large configs. --help Show this message and exit. ``` @@ -350,13 +338,13 @@ mf query --metrics order_total,users_active --group-by metric_time # In dbt Core -You can include multiple dimensions in a query. For example, you can group by the `is_food_order` dimension to confirm if orders were for food or not. +You can include multiple dimensions in a query. For example, you can group by the `is_food_order` dimension to confirm if orders were for food or not. Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. 
**Query** ```bash -dbt sl query --metrics order_total --group-by metric_time,is_food_order # In dbt Cloud +dbt sl query --metrics order_total --group-by order_id__is_food_order # In dbt Cloud -mf query --metrics order_total --group-by metric_time,is_food_order # In dbt Core +mf query --metrics order_total --group-by order_id__is_food_order # In dbt Core ``` **Result** @@ -380,13 +368,15 @@ mf query --metrics order_total --group-by metric_time,is_food_order # In dbt Cor You can add order and limit functions to filter and present the data in a readable format. The following query limits the data set to 10 records and orders them by `metric_time`, descending. Note that using the `-` prefix will sort the query in descending order. Without the `-` prefix sorts the query in ascending order. + Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. + **Query** ```bash # In dbt Cloud -dbt sl query --metrics order_total --group-by metric_time,is_food_order --limit 10 --order-by -metric_time +dbt sl query --metrics order_total --group-by order_id__is_food_order --limit 10 --order-by -metric_time # In dbt Core -mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 --order-by -metric_time +mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --order-by -metric_time ``` **Result** @@ -406,15 +396,15 @@ mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 - -You can further filter the data set by adding a `where` clause to your query. The following example shows you how to query the `order_total` metric, grouped by `metric_time` with multiple where statements (orders that are food orders and orders from the week starting on or after Feb 1st, 2024): +You can further filter the data set by adding a `where` clause to your query. The following example shows you how to query the `order_total` metric, grouped by `is_food_order` with multiple where statements (orders that are food orders and orders from the week starting on or after Feb 1st, 2024). Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. **Query** ```bash # In dbt Cloud -dbt sl query --metrics order_total --group-by metric_time --where "{{ Dimension('order_id__is_food_order') }} = True and metric_time__week >= '2024-02-01'" +dbt sl query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and metric_time__week >= '2024-02-01'" # In dbt Core -mf query --metrics order_total --group-by metric_time --where "{{ Dimension('order_id__is_food_order') }} = True and metric_time__week >= '2024-02-01'" +mf query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and metric_time__week >= '2024-02-01'" ``` **Result** @@ -440,16 +430,16 @@ mf query --metrics order_total --group-by metric_time --where "{{ Dimension('ord To filter by time, there are dedicated start and end time options. Using these options to filter by time allows MetricFlow to further optimize query performance by pushing down the where filter when appropriate. - + Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. 
**Query** ```bash # In dbt Core -mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 --order-by -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' +mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --order-by -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' ``` **Result** diff --git a/website/docs/docs/build/metricflow-time-spine.md b/website/docs/docs/build/metricflow-time-spine.md index f3387399ffe..5de3221a677 100644 --- a/website/docs/docs/build/metricflow-time-spine.md +++ b/website/docs/docs/build/metricflow-time-spine.md @@ -18,10 +18,14 @@ MetricFlow requires you to define a time-spine table as a model-level configurat To see the generated SQL for the metric and dimension types that use time-spine joins, refer to the respective documentation or add the `compile=True` flag when querying the Semantic Layer to return the compiled SQL. ## Configuring time-spine in YAML + +- The time spine is a special model that tells dbt and MetricFlow how to use specific columns by defining their properties. +- The [`models` key](/reference/model-properties) for the time spine must be in your `models/` directory. - You only need to configure time-spine models that the Semantic Layer should recognize. - At a minimum, define a time-spine table for a daily grain. - You can optionally define a time-spine table for a different granularity, like hourly. -- Note that if you don’t have a date or calendar model in your project, you'll need to create one. +- Note that if you don’t have a date or calendar model in your project, you'll need to create one. + - If you're looking to specify the grain of a time dimension so that MetricFlow can transform the underlying column to the required granularity, refer to the [Time granularity documentation](/docs/build/dimensions?dimension=time_gran) If you already have a date dimension or time-spine table in your dbt project, you can point MetricFlow to this table by updating the `model` configuration to use this table in the Semantic Layer. This is a model-level configuration that tells dbt to use the model for time range joins in the Semantic Layer. @@ -40,7 +44,7 @@ If you don’t have a date dimension table, you can still create one by using th ```yaml -models: +[models:](/reference/model-properties) - name: time_spine_hourly time_spine: standard_granularity_column: date_hour # column for the standard grain of your table @@ -56,7 +60,7 @@ models: ``` -For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example. +For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example. Note that the [`models` key](/reference/model-properties) in the time spine configuration must be placed in your `models/` directory. Now, break down the configuration above. It's pointing to a model called `time_spine_daily`. It sets the time spine configurations under the `time_spine` key. The `standard_granularity_column` is the lowest grain of the table, in this case, it's hourly. It needs to reference a column defined under the columns key, in this case, `date_hour`. Use the `standard_granularity_column` as the join key for the time spine table when joining tables in MetricFlow. Here, the granularity of the `standard_granularity_column` is set at the column level, in this case, `hour`. 
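For comparison with the hourly example above, a daily-grain time spine might be configured like the following sketch, assuming a model named `time_spine_daily` with a `date_day` column:

```yaml
models:
  - name: time_spine_daily
    time_spine:
      standard_granularity_column: date_day   # column for the standard grain of your table
    columns:
      - name: date_day
        granularity: day                      # granularity set at the column level
```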
diff --git a/website/docs/docs/cloud/about-cloud-develop-defer.md b/website/docs/docs/cloud/about-cloud-develop-defer.md index 4e2f70b7b82..472cabe13c5 100644 --- a/website/docs/docs/cloud/about-cloud-develop-defer.md +++ b/website/docs/docs/cloud/about-cloud-develop-defer.md @@ -51,7 +51,10 @@ The dbt Cloud CLI offers additional flexibility by letting you choose the source ```yml -defer-env-id: '123456' +context: + active-host: ... + active-project: ... + defer-env-id: '123456' ``` @@ -60,7 +63,7 @@ defer-env-id: '123456' ```yml -dbt_cloud: +dbt-cloud: defer-env-id: '123456' ``` diff --git a/website/docs/docs/cloud/manage-access/external-oauth.md b/website/docs/docs/cloud/manage-access/external-oauth.md index 7ed9e4ef446..deb23f36f09 100644 --- a/website/docs/docs/cloud/manage-access/external-oauth.md +++ b/website/docs/docs/cloud/manage-access/external-oauth.md @@ -1,20 +1,17 @@ --- -title: "Set up external Oauth" +title: "Set up external OAuth" id: external-oauth -description: "Configuration instructions for dbt Cloud and external Oauth connections" -sidebar_label: "Set up external Oauth" -unlisted: true +description: "Configuration instructions for dbt Cloud and external OAuth connections" +sidebar_label: "Set up external OAuth" pagination_next: null pagination_prev: null --- -# Set up external Oauth +# Set up external OAuth -:::note Beta feature +:::note -External OAuth for authentication is available in a limited beta. If you are interested in joining the beta, please contact your account manager. - -This feature is currently only available for the Okta and Entra ID identity providers and Snowflake connections. Only available to Enterprise accounts. +This feature is currently only available for the Okta and Entra ID identity providers and [Snowflake connections](/docs/cloud/connect-data-platform/connect-snowflake). ::: @@ -23,7 +20,7 @@ dbt Cloud Enterprise supports [external OAuth authentication](https://docs.snow ## Getting started -The process of setting up external Oauth will require a little bit of back-and-forth between your dbt Cloud, IdP, and Snowflake accounts, and having them open in multiple browser tabs will help speed up the configuration process: +The process of setting up external OAuth will require a little bit of back-and-forth between your dbt Cloud, IdP, and Snowflake accounts, and having them open in multiple browser tabs will help speed up the configuration process: - **dbt Cloud:** You’ll primarily be working in the **Account Settings** —> **Integrations** page. You will need [proper permission](/docs/cloud/manage-access/enterprise-permissions) to set up the integration and create the connections. - **Snowflake:** Open a worksheet in an account that has permissions to [create a security integration](https://docs.snowflake.com/en/sql-reference/sql/create-security-integration). @@ -34,7 +31,7 @@ If the admins that handle these products are all different people, it’s better ### Snowflake commands -The following is a template for creating the Oauth configurations in the Snowflake environment: +The following is a template for creating the OAuth configurations in the Snowflake environment: ```sql @@ -53,41 +50,45 @@ external_oauth_any_role_mode = 'ENABLE' The `external_oauth_token_user_mapping_claim` and `external_oauth_snowflake_user_mapping_attribute` can be modified based on the your organizations needs. These values point to the claim in the users’ token. In the example, Snowflake will look up the Snowflake user whose `email` matches the value in the `sub` claim. 
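For example, to match on the token's `upn` claim against each user's Snowflake `login_name` (the same mapping used by the Entra ID template later on this page), those two lines of the security integration would read:

```sql
external_oauth_token_user_mapping_claim = 'upn'
external_oauth_snowflake_user_mapping_attribute = 'login_name'
```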
-**Note:** The Snowflake default roles ACCOUNTADMIN, ORGADMIN, or SECURITYADMIN, are blocked from external Oauth by default and they will likely fail to authenticate. See the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/sql/create-security-integration-oauth-external) for more information. +**Note:** The Snowflake default roles ACCOUNTADMIN, ORGADMIN, or SECURITYADMIN, are blocked from external OAuth by default and they will likely fail to authenticate. See the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/sql/create-security-integration-oauth-external) for more information. + +## Identity provider configuration -## Set up with Okta +Select a supported identity provider (IdP) for instructions on configuring external OAuth in their environment and completing the integration in dbt Cloud. + + ### 1. Initialize the dbt Cloud settings -1. In your dbt Cloud account, navigate to **Account settings** —> **Integrations**. +1. In your dbt Cloud account, navigate to **Account settings** —> **Integrations**. 2. Scroll down to **Custom integrations** and click **Add integrations** -3. Leave this window open. You can set the **Integration type** to Okta and make a note of the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. +3. Leave this window open. You can set the **Integration type** to Okta and note the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. ### 2. Create the Okta app -1. From the Okta dashboard, expand the **Applications** section and click **Applications.** Click the **Create app integration** button. +1. Expand the **Applications** section from the Okta dashboard and click **Applications.** Click the **Create app integration** button. 2. Select **OIDC** as the sign-in method and **Web applications** as the application type. Click **Next**. -3. Give the application an appropriate name, something like “External Oauth app for dbt Cloud” that will make it easily identifiable. +3. Give the application an appropriate name, something like “External OAuth app for dbt Cloud,” that will make it easily identifiable. 4. In the **Grant type** section, enable the **Refresh token** option. -5. Scroll down to the **Sign-in redirect URIs** option. Here, you’ll need to paste the redirect URI you gathered from dbt Cloud in step 1.3. +5. Scroll down to the **Sign-in redirect URIs** option. You’ll need to paste the redirect URI you gathered from dbt Cloud in step 1.3. - + -6. Save the app configuration. You’ll come back to it, but for now move on to the next steps. +6. Save the app configuration. You’ll come back to it, but move on to the next steps for now. ### 3. Create the Okta API -1. From the Okta sidebar menu, expand the **Security** section and clicl **API**. -2. On the API screen, click **Add authorization server**. Give the authorizations server a name (a nickname for your Snowflake account would be appropriate). For the **Audience** field, copy and paste your Snowflake login URL (for example, https://abdc-ef1234.snowflakecomputing.com). Give the server an appropriate description and click **Save**. +1. Expand the **Security** section and click **API** from the Okta sidebar menu. +2. On the API screen, click **Add authorization server**. Give the authorization server a name (a nickname for your Snowflake account would be appropriate). 
For the **Audience** field, copy and paste your Snowflake login URL (for example, https://abdc-ef1234.snowflakecomputing.com). Give the server an appropriate description and click **Save**. -3. On the authorization server config screen, open the **Metadata URI** in a new tab. You’ll need information from this screen in later steps. +3. On the authorization server config screen, open the **Metadata URI** in a new tab. You’ll need information from this screen in later steps. @@ -97,7 +98,7 @@ The `external_oauth_token_user_mapping_claim` and `external_oauth_snowflake_u -5. Open the **Access policies** tab and click **Add policy**. Give the policy a **Name** and **Description** and set **Assign to** as **The following clients**. Start typing the name of the app you created in step 2.3 and you’ll see it autofill. Select the app and click **Create Policy**. +5. Open the **Access policies** tab and click **Add policy**. Give the policy a **Name** and **Description** and set **Assign to** as **The following clients**. Start typing the name of the app you created in step 2.3, and you’ll see it autofill. Select the app and click **Create Policy**. @@ -105,13 +106,13 @@ The `external_oauth_token_user_mapping_claim` and `external_oauth_snowflake_u -7. Give the rule a descriptive name and scroll down to **token lifetimes**. Configure the **Access token lifetime is**, **Refresh token lifetime is**, and **but will expire if not used every** settings according to your organizational policies. We recommend the defaults of 1 hour and 90 days. Stricter rules increases the odds of your users having to re-authenticate. +7. Give the rule a descriptive name and scroll down to **token lifetimes**. Configure the **Access token lifetime is**, **Refresh token lifetime is**, and **but will expire if not used every** settings according to your organizational policies. We recommend the defaults of 1 hour and 90 days. Stricter rules increase the odds of your users having to re-authenticate. 8. Navigate back to the **Settings** tab and leave it open in your browser. You’ll need some of the information in later steps. -### 4. Create the Oauth settings in Snowflake +### 4. Create the OAuth settings in Snowflake 1. Open up a Snowflake worksheet and copy/paste the following: @@ -130,9 +131,9 @@ external_oauth_any_role_mode = 'ENABLE' ``` -2. Change `your_integration_name` to something appropriately descriptive. For example, `dev_OktaAccountNumber_okta`. Copy the `external_oauth_issuer` and `external_oauth_jws_keys_url` from the metadate URI in step 3.3. Use the same Snowflake URL that you entered in step 3.2 as the `external_oauth_audience_list`. +2. Change `your_integration_name` to something appropriately descriptive. For example, `dev_OktaAccountNumber_okta`. Copy the `external_oauth_issuer` and `external_oauth_jws_keys_url` from the metadata URI in step 3.3. Use the same Snowflake URL you entered in step 3.2 as the `external_oauth_audience_list`. -Adjust the other settings as needed to meet your organizations configurations in Okta and Snowflake. +Adjust the other settings as needed to meet your organization's configurations in Okta and Snowflake. @@ -140,39 +141,47 @@ Adjust the other settings as needed to meet your organizations configurations in ### 5. Configuring the integration in dbt Cloud -1. Navigate back to the dbt Cloud **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. - 1. 
`Integration name`: Give the integration a descriptive name that includes identifying information about the Okta environment so future users won’t have to guess where it belongs. - 2. `Client ID` and `Client secrets`: Retrieve these from the Okta application page. - - 3. Authorize URL and Token URL: Found in the metadata URI. - +1. Navigate back to the dbt Cloud **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. + 1. `Integration name`: Give the integration a descriptive name that includes identifying information about the Okta environment so future users won’t have to guess where it belongs. + 2. `Client ID` and `Client secrets`: Retrieve these from the Okta application page. + + 3. Authorize URL and Token URL: Found in the metadata URI. + 2. **Save** the configuration + ### 6. Create a new connection in dbt Cloud -1. Navigate the **Account settings** and click **Connections** from the menu. Click **Add connection**. -2. Configure the `Account`, `Database`, and `Warehouse` as you normally would and for the `Oauth method` select the external Oauth you just created. - +1. Navigate the **Account settings** and click **Connections** from the menu. Click **Add connection**. +2. Configure the `Account`, `Database`, and `Warehouse` as you normally would, and for the `OAuth method`, select the external OAuth you just created. + + + + -3. Scroll down to the **External Oauth** configurations box and select the config from the list. +3. Scroll down to the **External OAuth** configurations box and select the config from the list. - -4. **Save** the connection and you have now configured External Oauth with Okta and Snowflake! + -## Set up with Entra ID + +4. **Save** the connection, and you have now configured External OAuth with Okta and Snowflake! + + + + ### 1. Initialize the dbt Cloud settings -1. In your dbt Cloud account, navigate to **Account settings** —> **Integrations**. +1. In your dbt Cloud account, navigate to **Account settings** —> **Integrations**. 2. Scroll down to **Custom integrations** and click **Add integrations**. -3. Leave this window open. You can set the **Integration type** to Entra ID and make a note of the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. +3. Leave this window open. You can set the **Integration type** to Entra ID and note the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. ### Entra ID -You’ll create two different `apps` in the Azure portal — A resource server and a client app. +You’ll create two apps in the Azure portal: A resource server and a client app. :::important @@ -187,68 +196,74 @@ In your Azure portal, open the **Entra ID** and click **App registrations** from ### 1. Create a resource server 1. From the app registrations screen, click **New registration**. - 1. Give the app a name. - 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” - 3. Click **Register** and you will be taken to the apps overview. + 1. Give the app a name. + 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” + 3. Click **Register**to see the application’s overview. 2. From the app overview page, click **Expose an API** from the left menu. -3. Click **Add** next to **Application ID URI**. The field will automatically populate. Click **Save**. -4. 
Record the `value` field as it will be used in a future step. *This is only displayed once. Be sure to record it immediately. It will be hidden when you leave the page and come back.* +3. Click **Add** next to **Application ID URI**. The field will automatically populate. Click **Save**. +4. Record the `value` field for use in a future step. _This is only displayed once. Be sure to record it immediately. Microsoft hides the field when you leave the page and come back._ 5. From the same screen, click **Add scope**. - 1. Give the scope a name. - 2. Set “Who can consent?” to **Admins and users**. - 3. Set **Admin consent display name** session:role-any and give it a description. - 4. Ensure **State** is set to **Enabled**. - 5. Click **Add scope**. + 1. Give the scope a name. + 2. Set “Who can consent?” to **Admins and users**. + 3. Set **Admin consent display name** session:role-any and give it a description. + 4. Ensure **State** is set to **Enabled**. + 5. Click **Add scope**. ### 2. Create a client app 1. From the **App registration page**, click **New registration**. - 1. Give the app a name that uniquely identifies it as the client app. - 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” - 3. Set the **Redirect URI** to **Web** and copy/paste the **Redirect URI** from dbt Cloud into the field. - 4. Click **Register**. + 1. Give the app a name that uniquely identifies it as the client app. + 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” + 3. Set the **Redirect URI** to **Web** and copy/paste the **Redirect URI** from dbt Cloud into the field. + 4. Click **Register**. 2. From the app overview page, click **API permissions** from the left menu, and click **Add permission**. 3. From the pop-out screen, click **APIs my organization uses**, search for the resource server name from the previous steps, and click it. 4. Ensure the box for the **Permissions** `session:role-any` is enabled and click **Add permissions**. 5. Click **Grant admin consent** and from the popup modal click **Yes**. -6. From the left menu, click **Certificates and secrets** and cllick **New client secret**. Name the secret, set an expiration, and click **Add**. -**Note**: Microsoft does not allow “forever” as an expiration. The maximum time is two years. It’s essential to document the expiration date so that the secret can be refreshed before the expiration or user authorization will fail. -7. Record the `value` for use in a future step and record it immediately. -**Note**: This value will not be displayed again once you navigate away from this screen. +6. From the left menu, click **Certificates and secrets** and click **New client secret**. Name the secret, set an expiration, and click **Add**. +**Note**: Microsoft does not allow “forever” as an expiration date. The maximum time is two years. Documenting the expiration date so you can refresh the secret before the expiration or user authorization fails is essential. +7. Record the `value` for use in a future step and record it immediately. +**Note**: Entra ID will not display this value again once you navigate away from this screen. ### 3. Snowflake configuration -You'll be switching between the Entra ID site and Snowflake. Keep your Entra ID account open for this process. +You'll be switching between the Entra ID site and Snowflake. Keep your Entra ID account open for this process. 
Copy and paste the following as a template in a Snowflake worksheet: ```sql + create or replace security integration - type = external_oauth - enabled = true - external_oauth_type = azure - external_oauth_issuer = '' - external_oauth_jws_keys_url = '' - external_oauth_audience_list = ('') - external_oauth_token_user_mapping_claim = 'upn' - external_oauth_any_role_mode = 'ENABLE' - external_oauth_snowflake_user_mapping_attribute = 'login_name'; + type = external_oauth + enabled = true + external_oauth_type = azure + external_oauth_issuer = '' + external_oauth_jws_keys_url = '' + external_oauth_audience_list = ('') + external_oauth_token_user_mapping_claim = 'upn' + external_oauth_any_role_mode = 'ENABLE' + external_oauth_snowflake_user_mapping_attribute = 'login_name'; + ``` + On the Entra ID site: -1. From the Client ID app in Entra ID, click **Endpoints** and open the **Federation metadata document** in a new tab. - - The **entity ID** on this page maps to the `external_oauth_issuer` field in the Snowflake config. +1. From the Client ID +app in Entra ID, click **Endpoints** and open the **Federation metadata document** in a new tab. + - The **entity ID** on this page maps to the `external_oauth_issuer` field in the Snowflake config. 2. Back on the list of endpoints, open the **OpenID Connect metadata document** in a new tab. - - The **jwks_uri** field maps to the `external_oauth_jws_keys_url` field in Snowflake. + - The **jwks_uri** field maps to the `external_oauth_jws_keys_url` field in Snowflake. 3. Navigate to the resource server in previous steps. - - The **Application ID URI** maps to teh `external_oauth_audience_list` field in Snowflake. -4. Run the configurations. Be sure the admin who created the Microsoft apps is also a user in Snowflake, or the configuration will fail. + - The **Application ID URI** maps to the `external_oauth_audience_list` field in Snowflake. +4. Run the configurations. Be sure the admin who created the Microsoft apps is also a user in Snowflake, or the configuration will fail. ### 4. Configuring the integration in dbt Cloud -1. Navigate back to the dbt Cloud **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. There will be some back-and-forth between the Entra ID account and dbt Cloud. -2. `Integration name`: Give the integration a descriptive name that includes identifying information about the Entra ID environment so future users won’t have to guess where it belongs. -3. `Client secrets`: These are found in the Client ID from the **Certificates and secrets** page. `Value` is the `Client secret` . Note that it only appears when created; if you return later, it will be hidden, and you must recreate the secret. +1. Navigate back to the dbt Cloud **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. There will be some back-and-forth between the Entra ID account and dbt Cloud. +2. `Integration name`: Give the integration a descriptive name that includes identifying information about the Entra ID environment so future users won’t have to guess where it belongs. +3. `Client secrets`: Found in the Client ID from the **Certificates and secrets** page. `Value` is the `Client secret`. Note that it only appears when created; _Microsoft hides the secret if you return later, and you must recreate it._ 4. `Client ID`: Copy the’ Application (client) ID’ on the overview page for the client ID app. -5. 
`Authorization URL` and `Token URL`: From the client ID app, open the `Endpoints` tab. The `Oauth 2.0 authorization endpoint (v2)` and `Oauth 2.0 token endpoint (v2)` fields map to these. *You must use v2 of the `Oauth 2.0 authorization endpoint`. Do not use V1.* You can use either version of the `Oauth 2.0 token endpoint`. +5. `Authorization URL` and `Token URL`: From the client ID app, open the `Endpoints` tab. These URLs map to the `OAuth 2.0 authorization endpoint (v2)` and `OAuth 2.0 token endpoint (v2)` fields. *You must use v2 of the `OAuth 2.0 authorization endpoint`. Do not use V1.* You can use either version of the `OAuth 2.0 token endpoint`. 6. `Application ID URI`: Copy the `Application ID URI` field from the resource server’s Overview screen. + + diff --git a/website/docs/docs/cloud/secure/postgres-privatelink.md b/website/docs/docs/cloud/secure/postgres-privatelink.md index 864cfe4acba..76b7774fcec 100644 --- a/website/docs/docs/cloud/secure/postgres-privatelink.md +++ b/website/docs/docs/cloud/secure/postgres-privatelink.md @@ -6,6 +6,7 @@ sidebar_label: "PrivateLink for Postgres" --- import SetUpPages from '/snippets/_available-tiers-privatelink.md'; import PrivateLinkTroubleshooting from '/snippets/_privatelink-troubleshooting.md'; +import PrivateLinkCrossZone from '/snippets/_privatelink-cross-zone-load-balancing.md'; @@ -41,9 +42,16 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - Target Group protocol: **TCP** - **Network Load Balancer (NLB)** — Requires creating a Listener that attaches to the newly created Target Group for port `5432` + - **Scheme:** Internal + - **IP address type:** IPv4 + - **Network mapping:** Choose the VPC that the VPC Endpoint Service and NLB are being deployed in, and choose subnets from at least two Availability Zones. + - **Security Groups:** The Network Load Balancer (NLB) associated with the VPC endpoint service must either not have an associated security group, or the security group must have a rule that allows requests from the appropriate dbt Cloud **private CIDR(s)**. Note that _this is different_ than the static public IPs listed on the dbt Cloud [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses) page. dbt Support can provide the correct private CIDR(s) upon request. If necessary, until you can refine the rule to the smaller CIDR provided by dbt, allow connectivity by temporarily adding an allow rule of `10.0.0.0/8`. + - **Listeners:** Create one listener per target group that maps the appropriate incoming port to the corresponding target group ([details](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-listeners.html)). - **VPC Endpoint Service** — Attach to the newly created NLB. - Acceptance required (optional) — Requires you to [accept our connection request](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests) after dbt creates the endpoint. + + ### 2. Grant dbt AWS account access to the VPC Endpoint Service On the provisioned VPC endpoint service, click the **Allow principals** tab. Click **Allow principals** to grant access. Enter the ARN of the root user in the appropriate production AWS account and save your changes. 
diff --git a/website/docs/docs/cloud/secure/redshift-privatelink.md b/website/docs/docs/cloud/secure/redshift-privatelink.md index a9d4332918b..16d14badc05 100644 --- a/website/docs/docs/cloud/secure/redshift-privatelink.md +++ b/website/docs/docs/cloud/secure/redshift-privatelink.md @@ -7,6 +7,7 @@ sidebar_label: "PrivateLink for Redshift" import SetUpPages from '/snippets/_available-tiers-privatelink.md'; import PrivateLinkTroubleshooting from '/snippets/_privatelink-troubleshooting.md'; +import PrivateLinkCrossZone from '/snippets/_privatelink-cross-zone-load-balancing.md'; @@ -79,9 +80,16 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - Target Group protocol: **TCP** - **Network Load Balancer (NLB)** — Requires creating a Listener that attaches to the newly created Target Group for port `5439` + - **Scheme:** Internal + - **IP address type:** IPv4 + - **Network mapping:** Choose the VPC that the VPC Endpoint Service and NLB are being deployed in, and choose subnets from at least two Availability Zones. + - **Security Groups:** The Network Load Balancer (NLB) associated with the VPC endpoint service must either not have an associated security group, or the security group must have a rule that allows requests from the appropriate dbt Cloud **private CIDR(s)**. Note that _this is different_ than the static public IPs listed on the dbt Cloud [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses) page. dbt Support can provide the correct private CIDR(s) upon request. If necessary, until you can refine the rule to the smaller CIDR provided by dbt, allow connectivity by temporarily adding an allow rule of `10.0.0.0/8`. + - **Listeners:** Create one listener per target group that maps the appropriate incoming port to the corresponding target group ([details](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-listeners.html)). - **VPC Endpoint Service** — Attach to the newly created NLB. - Acceptance required (optional) — Requires you to [accept our connection request](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests) after dbt creates the endpoint. + + ### 2. Grant dbt AWS Account access to the VPC Endpoint Service On the provisioned VPC endpoint service, click the **Allow principals** tab. Click **Allow principals** to grant access. Enter the ARN of the root user in the appropriate production AWS account and save your changes. diff --git a/website/docs/docs/cloud/secure/vcs-privatelink.md b/website/docs/docs/cloud/secure/vcs-privatelink.md index 6041b1cb4ed..28b4df8f706 100644 --- a/website/docs/docs/cloud/secure/vcs-privatelink.md +++ b/website/docs/docs/cloud/secure/vcs-privatelink.md @@ -7,6 +7,7 @@ sidebar_label: "PrivateLink for VCS" import SetUpPages from '/snippets/_available-tiers-privatelink.md'; import PrivateLinkTroubleshooting from '/snippets/_privatelink-troubleshooting.md'; +import PrivateLinkCrossZone from '/snippets/_privatelink-cross-zone-load-balancing.md'; @@ -44,12 +45,15 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - **Scheme:** Internal - **IP address type:** IPv4 - **Network mapping:** Choose the VPC that the VPC Endpoint Service and NLB are being deployed in, and choose subnets from at least two Availability Zones. 
+ - **Security Groups:** The Network Load Balancer (NLB) associated with the VPC Endpoint Service must either not have an associated Security Group, or the Security Group must have a rule that allows requests from the appropriate dbt Cloud **private CIDR(s)**. Note that **this is different** than the static public IPs listed on the dbt Cloud [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses) page. The correct private CIDR(s) can be provided by dbt Support upon request. If necessary, temporarily adding an allow rule of `10.0.0.0/8` should allow connectivity until the rule can be refined to the smaller dbt provided CIDR. - **Listeners:** Create one Listener per Target Group that maps the appropriate incoming port to the corresponding Target Group ([details](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-listeners.html)). - **Endpoint Service** - The VPC Endpoint Service is what allows for the VPC to VPC connection, routing incoming requests to the configured load balancer. - **Load balancer type:** Network. - **Load balancer:** Attach the NLB created in the previous step. - **Acceptance required (recommended)**: When enabled, requires a new connection request to the VPC Endpoint Service to be accepted by the customer before connectivity is allowed ([details](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests)). + + ### 2. Grant dbt AWS account access to the VPC Endpoint Service Once these resources have been provisioned, access needs to be granted for the dbt Labs AWS account to create a VPC Endpoint in our VPC. On the provisioned VPC endpoint service, click the **Allow principals** tab. Click **Allow principals** to grant access. Enter the ARN of the following IAM role in the appropriate production AWS account and save your changes ([details](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#add-remove-permissions)). diff --git a/website/docs/docs/collaborate/govern/model-versions.md b/website/docs/docs/collaborate/govern/model-versions.md index f255aa9db1a..eefcf76e824 100644 --- a/website/docs/docs/collaborate/govern/model-versions.md +++ b/website/docs/docs/collaborate/govern/model-versions.md @@ -69,7 +69,7 @@ When you make updates to a model's source code — its logical definition, i **Versioned models are different.** Defining model `versions` is appropriate when people, systems, and processes beyond your team's control, inside or outside of dbt, depend on your models. You can neither simply go migrate them all, nor break their queries on a whim. You need to offer a migration path, with clear diffs and deprecation dates. -Multiple versions of a model will live in the same code repository at the same time, and be deployed into the same data environment simultaneously. This is similar to how web APIs are versioned: Multiple versions are live simultaneously, two or three, and not more). Over time, newer versions come online, and older versions are sunsetted . +Multiple versions of a model will live in the same code repository at the same time, and be deployed into the same data environment simultaneously. This is similar to how web APIs are versioned: Multiple versions live simultaneously, two or three, and not more). Over time, newer versions come online, and older versions are sunsetted . ## How is this different from just creating a new model? 
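Part of the answer is visible in how versions are declared: they hang off a single model entry in YAML rather than existing as unrelated models. A minimal sketch, with a hypothetical model name:

```yaml
models:
  - name: dim_customers
    latest_version: 2
    versions:
      - v: 2    # the latest version
      - v: 1    # older version kept alive for existing consumers until it is sunsetted
```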
diff --git a/website/docs/docs/collaborate/govern/project-dependencies.md b/website/docs/docs/collaborate/govern/project-dependencies.md index a56646b0d0b..6a84a04e109 100644 --- a/website/docs/docs/collaborate/govern/project-dependencies.md +++ b/website/docs/docs/collaborate/govern/project-dependencies.md @@ -30,9 +30,6 @@ import UseCaseInfo from '/snippets/_packages_or_dependencies.md'; -Refer to the [FAQs](#faqs) for more info. - - ## Example As an example, let's say you work on the Marketing team at the Jaffle Shop. The name of your team's project is `jaffle_marketing`: diff --git a/website/docs/faqs/Troubleshooting/job-memory-limits.md b/website/docs/faqs/Troubleshooting/job-memory-limits.md index 06f6a752507..abba43d18cd 100644 --- a/website/docs/faqs/Troubleshooting/job-memory-limits.md +++ b/website/docs/faqs/Troubleshooting/job-memory-limits.md @@ -6,14 +6,14 @@ sidebar_label: 'Job failures due to exceeded memory limits' If you're receiving a `This run exceeded your account's run memory limits` error in your failed job, it means that the job exceeded the [memory limits](/docs/deploy/job-scheduler#job-memory) set for your account. All dbt Cloud accounts have a pod memory of 600Mib and memory limits are on a per run basis. They're typically influenced by the amount of result data that dbt has to ingest and process, which is small but can become bloated unexpectedly by project design choices. -## Common reasons +### Common reasons Some common reasons for higher memory usage are: - dbt run/build: Macros that capture large result sets from run query may not all be necessary and may be memory inefficient. - dbt docs generate: Source or model schemas with large numbers of tables (even if those tables aren't all used by dbt) cause the ingest of very large results for catalog queries. -## Resolution +### Resolution There are various reasons why you could be experiencing this error but they are mostly the outcome of retrieving too much data back into dbt. For example, using the `run_query()` operations or similar macros, or even using database/schemas that have a lot of other non-dbt related tables/views. Try to reduce the amount of data / number of rows retrieved back into dbt by refactoring the SQL in your `run_query()` operation using `group`, `where`, or `limit` clauses. Additionally, you can also use a database/schema with fewer non-dbt related tables/views. @@ -26,5 +26,5 @@ As an additional resource, check out [this example video](https://www.youtube.co If you've tried the earlier suggestions and are still experiencing failed job runs with this error about hitting the memory limits of your account, please [reach out to support](mailto:support@getdbt.com). We're happy to help! -## Additional resources +### Additional resources - [Blog post on how we shaved 90 mins off](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model) diff --git a/website/docs/reference/dbt-jinja-functions/set.md b/website/docs/reference/dbt-jinja-functions/set.md index d85e0539924..fa4de60e968 100644 --- a/website/docs/reference/dbt-jinja-functions/set.md +++ b/website/docs/reference/dbt-jinja-functions/set.md @@ -27,6 +27,10 @@ __Args__: {% do log(my_set) %} {# None #} ``` +``` +{% set email_id = "'admin@example.com'" %} +``` + ### set_strict The `set_strict` context method can be used to convert any iterable to a sequence of iterable elements that are unique (a set). 
The difference to the `set` context method is that the `set_strict` method will raise an exception on a `TypeError`, if the provided value is not a valid iterable and cannot be converted to a set.
diff --git a/website/docs/reference/global-configs/resource-type.md b/website/docs/reference/global-configs/resource-type.md
index ad8897a745c..9e6ec82df06 100644
--- a/website/docs/reference/global-configs/resource-type.md
+++ b/website/docs/reference/global-configs/resource-type.md
@@ -4,7 +4,17 @@ id: "resource-type"
 sidebar: "resource type"
 ---
-The `--resource-type` and `--exclude-resource-type` flags include or exclude resource types from the `dbt build`, `dbt clone`, and `dbt list` commands.
+
+
+The `--resource-type` and `--exclude-resource-type` flags include or exclude resource types from the `dbt build`, `dbt clone`, and `dbt list` commands. In Versionless and from dbt v1.9 onwards, these flags are also supported in the `dbt test` command.
+
+
+
+
+
+The `--resource-type` and `--exclude-resource-type` flags include or exclude resource types from the `dbt build`, `dbt test`, `dbt clone`, and `dbt list` commands.
+
+
 This means the flags enable you to specify which types of resources to include or exclude when running the commands, instead of targeting specific resources.
@@ -109,3 +119,27 @@ Instead of targeting specific resources, use the `--resource-flag` or `--exclude
+
+
+
+- In this example, use the following command to exclude _all_ unit tests when running tests:
+
+
+
+  ```text
+  dbt test --exclude-resource-type unit_test
+  ```
+
+
+
+- In this example, use the following command to include all data tests when running tests:
+
+
+
+  ```text
+  dbt test --resource-type test
+  ```
+
+
+
+
diff --git a/website/docs/reference/global-configs/warnings.md b/website/docs/reference/global-configs/warnings.md
index 97eb270338e..d432432d25f 100644
--- a/website/docs/reference/global-configs/warnings.md
+++ b/website/docs/reference/global-configs/warnings.md
@@ -46,7 +46,6 @@ flags:
   error: # Previously called "include"
   warn: # Previously called "exclude"
   silence: # To silence or ignore warnings
-    - TestsConfigDeprecation
     - NoNodesForSelectionCriteria
 ```
@@ -131,7 +130,6 @@ config:
   warn: # Previously called "exclude"
     - NoNodesForSelectionCriteria
   silence: # Silence or ignore warnings
-    - TestsConfigDeprecation
     - NoNodesForSelectionCriteria
 ```
diff --git a/website/docs/reference/model-properties.md b/website/docs/reference/model-properties.md
index 46fb0ca3bad..7576fc350f8 100644
--- a/website/docs/reference/model-properties.md
+++ b/website/docs/reference/model-properties.md
@@ -2,9 +2,9 @@ title: Model properties
 ---
-Models properties can be declared in `.yml` files in your `models/` directory (as defined by the [`model-paths` config](/reference/project-configs/model-paths)).
+Model properties can be declared in `.yml` files in your `models/` directory (as defined by the [`model-paths` config](/reference/project-configs/model-paths)).
-You can name these files `whatever_you_want.yml`, and nest them arbitrarily deeply in subfolders within the `models/` directory.
+You can name these files `whatever_you_want.yml`, and nest them arbitrarily deeply in subfolders within the `models/` directory.
 The [MetricFlow time spine](/docs/build/metricflow-time-spine) is a model property that tells dbt and MetricFlow how to use specific columns by defining their properties.
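As a minimal, hypothetical sketch of such a properties file (the file, model, and column names below are placeholders):

```yml
# models/staging/properties.yml -- any file name under models/ works
version: 2

models:
  - name: stg_orders
    description: "One row per order, lightly cleaned from the raw source."
    columns:
      - name: order_id
        description: "Primary key of the order."
        data_tests:
          - unique
          - not_null
```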
@@ -74,7 +74,3 @@ models:
-
diff --git a/website/docs/reference/project-configs/docs-paths.md b/website/docs/reference/project-configs/docs-paths.md
index 51ff5c5ccca..5481c19c9fd 100644
--- a/website/docs/reference/project-configs/docs-paths.md
+++ b/website/docs/reference/project-configs/docs-paths.md
@@ -17,8 +17,18 @@ Optionally specify a custom list of directories where [docs blocks](/docs/build/
 ## Default
-By default, dbt will search in all resource paths for docs blocks (i.e. the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [macro-paths](/reference/project-configs/macro-paths) and [snapshot-paths](/reference/project-configs/snapshot-paths)). If this option is configured, dbt will _only_ look in the specified directory for docs blocks.
+
+
+By default, dbt will search in all resource paths for docs blocks (for example, the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [test-paths](/reference/project-configs/test-paths), [macro-paths](/reference/project-configs/macro-paths), and [snapshot-paths](/reference/project-configs/snapshot-paths)). If this option is configured, dbt will _only_ look in the specified directory for docs blocks.
+
+
+
+
+
+By default, dbt will search in all resource paths for docs blocks (i.e. the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [macro-paths](/reference/project-configs/macro-paths), and [snapshot-paths](/reference/project-configs/snapshot-paths)). If this option is configured, dbt will _only_ look in the specified directory for docs blocks.
+
+
 ## Example
diff --git a/website/docs/reference/project-configs/test-paths.md b/website/docs/reference/project-configs/test-paths.md
index 59f17db05eb..6749a07d23d 100644
--- a/website/docs/reference/project-configs/test-paths.md
+++ b/website/docs/reference/project-configs/test-paths.md
@@ -13,7 +13,7 @@ test-paths: [directorypath]
 ## Definition
-Optionally specify a custom list of directories where [singular tests](/docs/build/data-tests) are located.
+Optionally specify a custom list of directories where [singular tests](/docs/build/data-tests#singular-data-tests) and [custom generic tests](/docs/build/data-tests#generic-data-tests) are located.
 ## Default
diff --git a/website/docs/reference/resource-configs/updated_at.md b/website/docs/reference/resource-configs/updated_at.md
index 896405bf063..c61b04264be 100644
--- a/website/docs/reference/resource-configs/updated_at.md
+++ b/website/docs/reference/resource-configs/updated_at.md
@@ -27,6 +27,17 @@ snapshots:
+
+
+:::caution
+
+You will get a warning if the data type of the `updated_at` column does not match the adapter-configured default.
+
+:::
+
+
+
+
 ## Description
 A column within the results of your snapshot query that represents when the record row was last updated.
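For illustration, a hedged sketch of a snapshot that relies on such a column with the timestamp strategy, written in the YAML snapshot format available in newer dbt versions (the source, snapshot, and column names are hypothetical):

```yml
snapshots:
  - name: orders_snapshot
    relation: source('jaffle_shop', 'orders')
    config:
      schema: snapshots
      unique_key: order_id
      strategy: timestamp
      # Should be a timestamp-typed column; a mismatched data type triggers the warning noted above.
      updated_at: updated_at
```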
diff --git a/website/docs/reference/resource-properties/description.md b/website/docs/reference/resource-properties/description.md
index 59420614b02..6f32f75efa4 100644
--- a/website/docs/reference/resource-properties/description.md
+++ b/website/docs/reference/resource-properties/description.md
@@ -13,6 +13,7 @@ description: "This guide explains how to use the description key to add YAML des
     { label: 'Snapshots', value: 'snapshots', },
     { label: 'Analyses', value: 'analyses', },
     { label: 'Macros', value: 'macros', },
+    { label: 'Singular data tests', value: 'singular_data_tests', },
   ]
 }>
@@ -145,6 +146,33 @@ macros:
+
+
+
+
+
+
+```yml
+version: 2
+
+data_tests:
+  - name: singular_data_test_name
+    description: markdown_string
+
+```
+
+
+
+
+
+
+
+The `description` property is available for singular data tests beginning in dbt v1.9.
+
+
+
+
+
diff --git a/website/docs/reference/resource-properties/unit-testing-versions.md b/website/docs/reference/resource-properties/unit-testing-versions.md
index 4d28e19e71d..39ef241c122 100644
--- a/website/docs/reference/resource-properties/unit-testing-versions.md
+++ b/website/docs/reference/resource-properties/unit-testing-versions.md
@@ -27,7 +27,7 @@ unit_tests:
   - name: test_is_valid_email_address
     model: my_model
     versions:
-      include:
+      exclude:
        - 1
 ...
diff --git a/website/sidebars.js b/website/sidebars.js
index f211269b0b7..590279d0680 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -124,6 +124,7 @@ const sidebarSettings = {
        "docs/cloud/manage-access/set-up-snowflake-oauth",
        "docs/cloud/manage-access/set-up-databricks-oauth",
        "docs/cloud/manage-access/set-up-bigquery-oauth",
+       "docs/cloud/manage-access/external-oauth",
      ],
    }, // SSO
    "docs/cloud/manage-access/audit-log",
diff --git a/website/snippets/_packages_or_dependencies.md b/website/snippets/_packages_or_dependencies.md
index 3cd0361a099..a822b9773db 100644
--- a/website/snippets/_packages_or_dependencies.md
+++ b/website/snippets/_packages_or_dependencies.md
@@ -1,16 +1,23 @@
 ## Use cases
-Starting from dbt v1.6, we added a new configuration file called `dependencies.yml`. The file can contain both types of dependencies: "package" and "project" dependencies.
-- ["Package" dependencies](/docs/build/packages#how-do-i-add-a-package-to-my-project) lets you add source code from someone else's dbt project into your own, like a library.
-- ["Project" dependencies](/docs/collaborate/govern/project-dependencies) provide a different way to build on top of someone else's work in dbt.
+The following setup will work for every dbt project:
+
+- Add [any package dependencies](/docs/collaborate/govern/project-dependencies#when-to-use-package-dependencies) to `packages.yml`
+- Add [any project dependencies](/docs/collaborate/govern/project-dependencies#when-to-use-project-dependencies) to `dependencies.yml`
+
+However, you may be able to consolidate both into a single `dependencies.yml` file. Read the following section to learn more.
+
+#### About packages.yml and dependencies.yml
+The `dependencies.yml` file can contain both types of dependencies: "package" and "project" dependencies.
+- [Package dependencies](/docs/build/packages#how-do-i-add-a-package-to-my-project) let you add source code from someone else's dbt project into your own, like a library.
+- Project dependencies provide a different way to build on top of someone else's work in dbt (see the sketch that follows).
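As a rough sketch of what a consolidated `dependencies.yml` could look like, with placeholder project and package names:

```yml
# dependencies.yml
projects:
  # "Project" dependency: enables cross-project ref() to an upstream dbt project.
  - name: jaffle_finance

packages:
  # "Package" dependency: pulls in library code from the dbt package hub.
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```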
If your dbt project doesn't require the use of Jinja within the package specifications, you can simply rename your existing `packages.yml` to `dependencies.yml`. However, something to note is if your project's package specifications use Jinja, particularly for scenarios like adding an environment variable or a [Git token method](/docs/build/packages#git-token-method) in a private Git package specification, you should continue using the `packages.yml` file name.
-Examine the following tabs to understand the differences and determine when to use `dependencies.yml` or `packages.yml` (or both at the same time).
+Use the following toggles to understand the differences and determine when to use `dependencies.yml` or `packages.yml` (or both).
 Refer to the [FAQs](#faqs) for more info.
-
-
+
 Project dependencies are designed for the [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro) and [cross-project reference](/docs/collaborate/govern/project-dependencies#how-to-write-cross-project-ref) workflow:
@@ -19,9 +26,9 @@ Project dependencies are designed for the [dbt Mesh](/best-practices/how-we-mesh
 - Private packages are not supported in `dependencies.yml` because they intentionally don't support Jinja rendering or conditional configuration. This is to maintain static and predictable configuration and ensures compatibility with other services, like dbt Cloud.
 - Use `dependencies.yml` for organization and maintainability if you're using both [cross-project refs](/docs/collaborate/govern/project-dependencies#how-to-write-cross-project-ref) and [dbt Hub packages](https://hub.getdbt.com/). This reduces the need for multiple YAML files to manage dependencies.
-
+
-
+
 Package dependencies allow you to add source code from someone else's dbt project into your own, like a library:
@@ -31,5 +38,5 @@ Package dependencies allow you to add source code from someone else's dbt projec
 - `packages.yml` supports Jinja rendering for historical reasons, allowing dynamic configurations. This can be useful if you need to insert values, like a [Git token method](/docs/build/packages#git-token-method) from an environment variable, into your package specifications.
 Currently, to use private git repositories in dbt, you need to use a workaround that involves embedding a git token with Jinja. This is not ideal as it requires extra steps like creating a user and sharing a git token. We're planning to introduce a simpler method soon that won't require Jinja-embedded secret environment variables. For that reason, `dependencies.yml` does not support Jinja.
-
-
+
+
diff --git a/website/snippets/_privatelink-cross-zone-load-balancing.md b/website/snippets/_privatelink-cross-zone-load-balancing.md
new file mode 100644
index 00000000000..cb879e5602b
--- /dev/null
+++ b/website/snippets/_privatelink-cross-zone-load-balancing.md
@@ -0,0 +1,6 @@
+
+:::note Cross-Zone Load Balancing
+We highly recommend cross-zone load balancing for your NLB or Target Group; some connections may require it. Cross-zone load balancing may also [improve routing distribution and connection resiliency](https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html#cross-zone-load-balancing). Note that cross-zone connectivity may incur additional data transfer charges, though this should be minimal for requests from dbt Cloud.
+
+- [Enabling cross-zone load balancing for a load balancer or target group](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/edit-target-group-attributes.html#target-group-cross-zone)
+:::
diff --git a/website/static/img/blog/2024-09-30-hybrid-mesh/hybrid-mesh.png b/website/static/img/blog/2024-09-30-hybrid-mesh/hybrid-mesh.png
new file mode 100644
index 00000000000..ce081a11834
Binary files /dev/null and b/website/static/img/blog/2024-09-30-hybrid-mesh/hybrid-mesh.png differ