From a490897f551591831c5da361861e6d8adf6d3537 Mon Sep 17 00:00:00 2001
From: Joel Labes
Date: Mon, 29 Jan 2024 18:50:33 +1300
Subject: [PATCH 01/14] Rework custom schema docs

---
 website/docs/docs/build/custom-schemas.md | 132 +++++++++++-----------
 1 file changed, 67 insertions(+), 65 deletions(-)

diff --git a/website/docs/docs/build/custom-schemas.md b/website/docs/docs/build/custom-schemas.md
index b20d4130725..feb9d41bf5d 100644
--- a/website/docs/docs/build/custom-schemas.md
+++ b/website/docs/docs/build/custom-schemas.md
@@ -4,25 +4,28 @@ id: "custom-schemas"
 pagination_next: "docs/build/custom-databases"
 ---
 
-By default, all dbt models are built in the schema specified in your target. In dbt projects with lots of models, it may be useful to instead build some models in schemas other than your target schema – this can help logically group models together.
+By default, all dbt models are built in the schema specified in your [environment](/docs/dbt-cloud-environments) (dbt Cloud) or [profile's target](/docs/core/dbt-core-environments) (dbt Core). This default schema is called your **target schema**.
+
+In dbt projects with lots of models, it is often preferable to build models across multiple schemas and group similar models together. For example, you may wish to:
 
-For example, you may wish to:
 * Group models based on the business unit using the model, creating schemas such as `core`, `marketing`, `finance` and `support`; or,
 * Hide intermediate models in a `staging` schema, and only present models that should be queried by an end user in an `analytics` schema.
 
-You can use **custom schemas** in dbt to build models in a schema other than your target schema. It's important to note that by default, dbt will generate the schema name for a model by **concatenating the custom schema to the target schema**, as in: `<target_schema>_<custom_schema>`.
+To do this, specify a custom schema. dbt will then generate the schema name for a model by **appending the custom schema to the target schema**, as in: `<target_schema>_<custom_schema>`.
 
 | Target schema | Custom schema | Resulting schema |
 | ------------- | ------------- | ---------------- |
-| <target_schema> | None | <target_schema> |
-| analytics | None | analytics |
-| dbt_alice | None | dbt_alice |
-| <target_schema> | <custom_schema> | <target_schema>\_<custom_schema> |
-| analytics | marketing | analytics_marketing |
-| dbt_alice | marketing | dbt_alice_marketing |
+| analytics_prod | None | analytics_prod |
+| alice_dev | None | alice_dev |
+| dbt_cloud_pr_123_456 | None | dbt_cloud_pr_123_456 |
+| analytics_prod | marketing | analytics_prod_marketing |
+| alice_dev | marketing | alice_dev_marketing |
+| dbt_cloud_pr_123_456 | marketing | dbt_cloud_pr_123_456_marketing |
 
 ## How do I use custom schemas?
-Use the `schema` configuration key to specify a custom schema for a model. As with any configuration, you can either:
+
+Use the `schema` configuration key. As with any configuration, you can either:
+
 * apply this configuration to a specific model by using a config block within a model, or
 * apply it to a subdirectory of models by specifying it in your `dbt_project.yml` file
 
@@ -36,12 +39,10 @@
 select ...
-
-
 
 ```yaml
-# models in `models/marketing/ will be rendered to the "*_marketing" schema
+# models in `models/marketing/` will be built in the "*_marketing" schema
 models:
   my_project:
     marketing:
@@ -52,17 +53,17 @@
 
 ## Understanding custom schemas
 
-When first using custom schemas, it's common to assume that a model will be built in a schema that matches the `schema` configuration exactly, for example, a model that has the configuration `schema: marketing`, would be built in the `marketing` schema. However, dbt instead creates it in a schema like `<target_schema>_marketing` by default – there's a good reason for this!
+When first using custom schemas, it's common to assume that a model will use _only_ the new `schema` configuration, for example, a model that has the configuration `schema: marketing`, would be built in the `marketing` schema. However, dbt will actually put it in a schema like `<target_schema>_marketing` by default – there's a good reason for this!
 
-In a typical setup of dbt, each dbt user will use a separate target schema (see [Managing Environments](/docs/build/custom-schemas#managing-environments)). If dbt created models in a schema that matches a model's custom schema exactly, every dbt user would create models in the same schema.
+Each dbt user has their own target schema for development (see [Managing Environments](#managing-environments)). If dbt ignored the target schema and only used the model's custom schema, every dbt user would create models in the same schema and would overwrite each other's work.
 
-Further, the schema that your development models are built in would be the same schema that your production models are built in! Instead, concatenating the custom schema to the target schema helps create distinct schema names, reducing naming conflicts.
+This would be bad enough if it was only development schemas overwriting each other, but it would _also_ impact your production models. By combining the target schema and the custom schema, dbt ensures that objects it creates in your data warehouse don't collide with one another.
 
 If you prefer to use different logic for generating a schema name, you can change the way dbt generates a schema name (see below).
 
 ### How does dbt generate a model's schema name?
 
-dbt uses a default macro called `generate_schema_name` to determine the name of the schema that a model should be built in. 
+dbt uses a default macro called `generate_schema_name` to determine the name of the schema that a model should be built in.
 
 The following code represents the default macro's logic:
 
@@ -83,30 +84,23 @@ The following code represents the default macro's logic:
 {%- endmacro %}
 ```
 
-## Advanced custom schema configuration
-
-You can customize schema name generation in dbt depending on your needs, such as creating a custom macro named `generate_schema_name` in your project or using the built-in macro for environment-based schema names. The built-in macro follows a pattern of generating schema names based on the environment, making it a convenient alternative.
-
-If your dbt project has a macro that’s also named `generate_schema_name`, dbt will always use the macro in your dbt project instead of the default macro.
-
-### Changing the way dbt generates a schema name
+## Changing the way dbt generates a schema name
 
-To modify how dbt generates schema names, you should add a macro named `generate_schema_name` to your project and customize it according to your needs:
+If your dbt project has a custom macro called `generate_schema_name`, dbt will use it instead of the default macro. This allows you to customize the name generation according to your needs.
 
-- Copy and paste the `generate_schema_name` macro into a file named 'generate_schema_name'.
+To customize this macro, copy the example code above into a file named `macros/generate_schema_name.sql` and make changes as necessary.
 
-- Modify the target schema by either using [target variables](/reference/dbt-jinja-functions/target) or [env_var](/reference/dbt-jinja-functions/env_var). Check out our [Advanced Deployment - Custom Environment and job behavior](https://courses.getdbt.com/courses/advanced-deployment) course video for more details.
-
-**Note**: dbt will ignore any custom `generate_schema_name` macros included in installed packages.
+**Note**: dbt will ignore any custom `generate_schema_name` macros included in installed packages.
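
To make "make changes as necessary" concrete, the following is a minimal sketch of one possible project-level override. The `prod` target name and the `prod_` prefix are illustrative assumptions, not part of this patch:

```sql
-- macros/generate_schema_name.sql
-- A hedged sketch of one possible customization; adjust names to your project.
{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}

    {%- if custom_schema_name is none -%}
        {# No custom schema configured: build in the target schema #}
        {{ default_schema }}
    {%- elif target.name == 'prod' -%}
        {# Illustrative: production builds get a fixed, stable namespace prefix #}
        prod_{{ custom_schema_name | trim }}
    {%- else -%}
        {# Development keeps the collision-safe default concatenation #}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}

{%- endmacro %}
```

Note that the non-production branch still concatenates `default_schema`, which matters for the warning that follows.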
 ❗️ Warning: Don't replace default_schema in the macro.
 
 If you're modifying how dbt generates schema names, don't just replace ```{{ default_schema }}_{{ custom_schema_name | trim }}``` with ```{{ custom_schema_name | trim }}``` in the ```generate_schema_name``` macro.
 
 If you remove ```{{ default_schema }}```, it causes developers to override each other's models if they create their own custom schemas. This can also cause issues during development and continuous integration (CI).
 
-❌ The following code block is an example of what your code _should not_ look like:
+❌ The following code block is an example of what your code _should not_ look like:
+
 ```sql
 {% macro generate_schema_name(custom_schema_name, node) -%}
 
     {%- set default_schema = target.schema -%}
 
     {%- if custom_schema_name is none -%}
 
         {{ default_schema }}
 
     {%- else -%}
 
         {{ custom_schema_name | trim }}
 
     {%- endif -%}
 
 {%- endmacro %}
 
-```
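
For contrast, the safe shape keeps `{{ default_schema }}` in the concatenating branch. This simply restates the default logic already shown earlier in this patch:

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}

    {%- if custom_schema_name is none -%}

        {{ default_schema }}

    {%- else -%}

        {# Keeping default_schema here is what prevents developers overwriting each other #}
        {{ default_schema }}_{{ custom_schema_name | trim }}

    {%- endif -%}

{%- endmacro %}
```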
-
-### An alternative pattern for generating schema names
-
-A common way to generate schema names is by adjusting the behavior according to the environment in dbt. Here's how it works:
-
-**Production environment**
-
-- If a custom schema is specified, the schema name of a model should match the custom schema, instead of concatenating to the target schema.
-- If no custom schema is specified, the schema name of a model should match the target schema.
-
-**Other environments** (like development or quality assurance (QA)):
-
-- Build _all_ models in the target schema, ignoring any custom schema configurations.
-
-dbt ships with a global, predefined macro that contains this logic - `generate_schema_name_for_env`.
-
-If you want to use this pattern, you'll need a `generate_schema_name` macro in your project that points to this logic. You can do this by creating a file in your `macros` directory (typically named `get_custom_schema.sql`), and copying/pasting the following code:
-
-```sql
--- put this in macros/get_custom_schema.sql
-
-{% macro generate_schema_name(custom_schema_name, node) -%}
-    {{ generate_schema_name_for_env(custom_schema_name, node) }}
-{%- endmacro %}
-```
-
-**Note:** When using this macro, you'll need to set the target name in your job specifically to "prod" if you want custom schemas to be applied.
+
 ### generate_schema_name arguments
 
@@ -165,6 +129,7 @@ If you want to use this pattern, you'll need a `generate_schema_name` macro in y
 | custom_schema_name | The configured value of `schema` in the relevant node | "marketing" |
 | node | The `node` that is currently being processed by dbt | `{"name": "my_model", "resource_type": "model",...}` |
 
 ### Jinja context available in generate_schema_name
+
 If you choose to write custom logic to generate a schema name, it's worth noting that not all variables and methods are available to you when defining this logic. In other words: the `generate_schema_name` macro is compiled with a limited Jinja context. The following context methods _are_ available in the `generate_schema_name` macro:
 
@@ -192,12 +157,49 @@ See docs on macro `dispatch`: ["Managing different global overrides across packa
 
+## A built-in alternative pattern for generating schema names
+
+A common customization is to ignore the target schema in production environments, and ignore the custom schema configurations in other environments (such as development and CI).
+
+Production Environment (`target.name == 'prod'`)
+| Target schema | Custom schema | Resulting schema |
+| ------------- | ------------- | ---------------- |
+| analytics_prod | None | analytics_prod |
+| analytics_prod | marketing | marketing |
+
+Development/CI Environment (`target.name != 'prod'`)
+| Target schema | Custom schema | Resulting schema |
+| ------------- | ------------- | ---------------- |
+| alice_dev | None | alice_dev |
+| alice_dev | marketing | alice_dev |
+| dbt_cloud_pr_123_456 | None | dbt_cloud_pr_123_456 |
+| dbt_cloud_pr_123_456 | marketing | dbt_cloud_pr_123_456 |
+
+Just like the normal macro, this approach guarantees that schemas from different environments will not collide.
+
+dbt ships with a macro for this use case – called `generate_schema_name_for_env` – which is disabled by default. To enable it, add a custom `generate_schema_name` macro to your project that contains the following code:
+
+```sql
+-- put this in macros/get_custom_schema.sql
+
+{% macro generate_schema_name(custom_schema_name, node) -%}
+    {{ generate_schema_name_for_env(custom_schema_name, node) }}
+{%- endmacro %}
+```
+
+**Note:** When using this macro, you'll need to set the target name in your production job to `prod`.
+
 ## Managing environments
 
-In the `generate_schema_name` macro examples shown above, the `target.name` context variable is used to change the schema name that dbt generates for models. If the `generate_schema_name` macro in your project uses the `target.name` context variable, you must additionally ensure that your different dbt environments are configured appropriately. While you can use any naming scheme you'd like, we typically recommend:
-  - **dev**: Your local development environment; configured in a `profiles.yml` file on your computer.
+In the `generate_schema_name` macro examples shown above, the `target.name` context variable is used to change the schema name that dbt generates for models. If the `generate_schema_name` macro in your project uses the `target.name` context variable, you must ensure that your different dbt environments are configured appropriately. While you can use any naming scheme you'd like, we typically recommend:
+
+* **dev**: Your local development environment; configured in a `profiles.yml` file on your computer.
 * **ci**: A [continuous integration](/docs/cloud/git/connect-github) environment running on Pull Requests in GitHub, GitLab, etc.
-  - **prod**: The production deployment of your dbt project, like in dbt Cloud, Airflow, or [similar](/docs/deploy/deployments).
+* **prod**: The production deployment of your dbt project, like in dbt Cloud, Airflow, or [similar](/docs/deploy/deployments).
 
 If your schema names are being generated incorrectly, double check your target name in the relevant environment.

From 1f6a30d90ebd893f579699bd523c22c0518af523 Mon Sep 17 00:00:00 2001
From: Joel Labes
Date: Mon, 29 Jan 2024 19:08:53 +1300
Subject: [PATCH 02/14] add line breaks

---
 website/docs/docs/build/custom-schemas.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/website/docs/docs/build/custom-schemas.md b/website/docs/docs/build/custom-schemas.md
index feb9d41bf5d..eb28c46fab1 100644
--- a/website/docs/docs/build/custom-schemas.md
+++ b/website/docs/docs/build/custom-schemas.md
@@ -162,12 +162,14 @@ See docs on macro `dispatch`: ["Managing different global overrides across packa
 
 A common customization is to ignore the target schema in production environments, and ignore the custom schema configurations in other environments (such as development and CI).
 
 Production Environment (`target.name == 'prod'`)
+
 | Target schema | Custom schema | Resulting schema |
 | ------------- | ------------- | ---------------- |
 | analytics_prod | None | analytics_prod |
 | analytics_prod | marketing | marketing |
 
 Development/CI Environment (`target.name != 'prod'`)
+
 | Target schema | Custom schema | Resulting schema |
 | ------------- | ------------- | ---------------- |
 | alice_dev | None | alice_dev |

From 3e9fba13be5e3f777ba7e5808f2e8edcbe6d03a0 Mon Sep 17 00:00:00 2001
From: rpourzand
Date: Wed, 31 Jan 2024 10:54:34 -0800
Subject: [PATCH 03/14] Update sl-partner-integration-guide.md

---
 website/docs/guides/sl-partner-integration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/website/docs/guides/sl-partner-integration-guide.md b/website/docs/guides/sl-partner-integration-guide.md
index 7eb158a2c85..57402bc040a 100644
--- a/website/docs/guides/sl-partner-integration-guide.md
+++ b/website/docs/guides/sl-partner-integration-guide.md
@@ -91,6 +91,8 @@ We recommend organizing metrics and dimensions in ways that a non-technical user
 
 - **Organizing Metrics** — The goal is to organize metrics into a hierarchy in our configurations, instead of presenting them in a long list.<br /><br />This hierarchy helps you organize metrics based on specific criteria, such as business unit or team. By providing this structured organization, users can find and navigate metrics more efficiently, enhancing their overall data analysis experience.
 
+-**Using Saved Queries** — The Semantic Layer has a concept of Saved Queries which allows users to pre-build slices of metrics, dimensions, filters to be easily accessed. You should surface these as first class objects in your integration. Refer to our JDBC and GraphQL guides for syntax.
+
 ### Query flexibility

From 8281981ae404a50de0063e5c80491101bbb5a878 Mon Sep 17 00:00:00 2001
From: Joel Labes
Date: Thu, 1 Feb 2024 11:08:41 +1300
Subject: [PATCH 04/14] tweak on self-review

---
 website/docs/docs/build/custom-schemas.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/website/docs/docs/build/custom-schemas.md b/website/docs/docs/build/custom-schemas.md
index eb28c46fab1..b9a8ccfff81 100644
--- a/website/docs/docs/build/custom-schemas.md
+++ b/website/docs/docs/build/custom-schemas.md
@@ -53,11 +53,11 @@ models:
 
 ## Understanding custom schemas
 
-When first using custom schemas, it's common to assume that a model will use _only_ the new `schema` configuration, for example, a model that has the configuration `schema: marketing`, would be built in the `marketing` schema. However, dbt will actually put it in a schema like `<target_schema>_marketing` by default – there's a good reason for this!
+When first using custom schemas, it's a common misunderstanding to assume that a model will use _only_ the new `schema` configuration, for example, a model that has the configuration `schema: marketing` would be built in the `marketing` schema. However, dbt will actually put it in a schema like `<target_schema>_marketing`.
 
-Each dbt user has their own target schema for development (see [Managing Environments](#managing-environments)). If dbt ignored the target schema and only used the model's custom schema, every dbt user would create models in the same schema and would overwrite each other's work.
+There's a good reason for this deviation! Each dbt user has their own target schema for development (see [Managing Environments](#managing-environments)). If dbt ignored the target schema and only used the model's custom schema, every dbt user would create models in the same schema and would overwrite each other's work.
 
-This would be bad enough if it was only development schemas overwriting each other, but it would _also_ impact your production models. By combining the target schema and the custom schema, dbt ensures that objects it creates in your data warehouse don't collide with one another.
+By combining the target schema and the custom schema, dbt ensures that objects it creates in your data warehouse don't collide with one another.
 
 If you prefer to use different logic for generating a schema name, you can change the way dbt generates a schema name (see below).
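
These patches lean on the built-in `generate_schema_name_for_env` macro without ever printing it. Going by the behavior documented above (custom schema alone in production, target schema everywhere else), its logic is roughly the following sketch; the shipped dbt-core source may differ in detail:

```sql
{% macro generate_schema_name_for_env(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}

    {%- if target.name == 'prod' and custom_schema_name is not none -%}
        {# Production: the custom schema is used on its own, e.g. "marketing" #}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {# Development and CI: everything builds into the target schema, e.g. "alice_dev" #}
        {{ default_schema }}
    {%- endif -%}

{%- endmacro %}
```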

From 6da897fded7f251987bc4fc876dc29a7fa71e945 Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Thu, 1 Feb 2024 11:07:59 +0000
Subject: [PATCH 05/14] update format

---
 website/docs/docs/build/saved-queries.md      |  4 +---
 .../guides/sl-partner-integration-guide.md    | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/website/docs/docs/build/saved-queries.md b/website/docs/docs/build/saved-queries.md
index 2ad16b86f0d..3f61de05cac 100644
--- a/website/docs/docs/build/saved-queries.md
+++ b/website/docs/docs/build/saved-queries.md
@@ -33,6 +33,4 @@ saved_queries:
           - "{{ Dimension('listing__capacity_latest') }} > 3"
 ```
 
-### FAQs
-
-* All metrics in a saved query need to use the same dimensions in the `group_by` or `where` clauses.
+All metrics in a saved query need to use the same dimensions in the `group_by` or `where` clauses.

diff --git a/website/docs/guides/sl-partner-integration-guide.md b/website/docs/guides/sl-partner-integration-guide.md
index 57402bc040a..df26556c750 100644
--- a/website/docs/guides/sl-partner-integration-guide.md
+++ b/website/docs/guides/sl-partner-integration-guide.md
@@ -52,7 +52,7 @@ Best practices for exposing metrics are summarized into five themes:
 
 - [Governance](#governance-and-traceability) — Recommendations on how to establish guardrails for governed data work.
 - [Discoverability](#discoverability) — Recommendations on how to make user-friendly data interactions.
-- [Organization](#organization) — Organize metrics and dimensions for all audiences.
+- [Organization](#organization) — Organize metrics and dimensions for all audiences, and use [saved queries](/docs/build/saved-queries).
 - [Query flexibility](#query-flexibility) — Allow users to query either one metric alone without dimensions or multiple metrics with dimensions.
 - [Context and interpretation](#context-and-interpretation) — Contextualize metrics for better analysis; expose definitions, metadata, lineage, and freshness.
 
@@ -73,13 +73,13 @@ When working with more governed data, it's essential to establish clear guardrai
 
 - Consider treating [metrics](/docs/build/metrics-overview) as first-class objects rather than measures. Metrics offer a higher-level and more contextual way to interact with data, reducing the burden on end-users to manually aggregate data.
 
-- Easy metric interactions: Provide users with an intuitive approach to:
+- **Easy metric interactions** — Provide users with an intuitive approach to:
   * Search for Metrics — Users should be able to easily search and find relevant metrics. Metrics can serve as the starting point to lead users into exploring dimensions.
   * Search for Dimensions — Users should be able to query metrics with associated dimensions, allowing them to gain deeper insights into the data.
   * Filter by Dimension Values — Expose and enable users to filter metrics based on dimension values, encouraging data analysis and exploration.
   * Filter additional metadata — Allow users to filter metrics based on other available metadata, such as metric type and default time granularity.
 
-- Suggested Metrics: Ideally, the system should intelligently suggest relevant metrics to users based on their team's activities. This approach encourages user exposure, facilitates learning, and supports collaboration among team members.
+- **Suggested metrics** — Ideally, the system should intelligently suggest relevant metrics to users based on their team's activities. <br /><br />This approach encourages user exposure, facilitates learning, and supports collaboration among team members.
 
 By implementing these recommendations, the data interaction process becomes more user-friendly, empowering users to gain valuable insights without the need for extensive data manipulation.
 
@@ -87,11 +87,11 @@ By implementing these recommendations, the data interaction process becomes more
 
 We recommend organizing metrics and dimensions in ways that a non-technical user can understand the data model, without needing much context:
 
-- **Organizing Dimensions** — To help non-technical users understand the data model better, we recommend organizing dimensions based on the entity they originated from. For example, consider dimensions like `user__country` and `product__category`.<br /><br />You can create groups by extracting `user` and `product` and then nest the respective dimensions under each group. This way, dimensions align with the entity or semantic model they belong to and make them more user-friendly and accessible.
+- **Organizing dimensions** — To help non-technical users understand the data model better, we recommend organizing dimensions based on the entity they originated from. For example, consider dimensions like `user__country` and `product__category`.<br /><br />You can create groups by extracting `user` and `product` and then nest the respective dimensions under each group. This way, dimensions align with the entity or semantic model they belong to and make them more user-friendly and accessible.
 
-- **Organizing Metrics** — The goal is to organize metrics into a hierarchy in our configurations, instead of presenting them in a long list.<br /><br />This hierarchy helps you organize metrics based on specific criteria, such as business unit or team. By providing this structured organization, users can find and navigate metrics more efficiently, enhancing their overall data analysis experience.
+- **Organizing metrics** — The goal is to organize metrics into a hierarchy in our configurations, instead of presenting them in a long list.<br /><br />This hierarchy helps you organize metrics based on specific criteria, such as business unit or team. By providing this structured organization, users can find and navigate metrics more efficiently, enhancing their overall data analysis experience.
 
--**Using Saved Queries** — The Semantic Layer has a concept of Saved Queries which allows users to pre-build slices of metrics, dimensions, filters to be easily accessed. You should surface these as first class objects in your integration. Refer to our JDBC and GraphQL guides for syntax.
+- **Using Saved queries** — The dbt Semantic Layer has a concept of [saved queries](/docs/build/saved-queries) which allows users to pre-build slices of metrics, dimensions, filters to be easily accessed. You should surface these as first class objects in your integration. Refer to our JDBC and GraphQL guides for syntax.
 
 ### Query flexibility
 
@@ -104,7 +104,11 @@ Allow users to query either one metric alone without dimensions or multiple metr
 
 - Only expose time granularities (monthly, daily, yearly) that match the available metrics.
   * For example, if a dbt model and its resulting semantic model have a monthly granularity, make sure querying data with a 'daily' granularity isn't available to the user. Our APIs have functionality that will help you surface the correct granularities
 
-- We recommend that time granularity is treated as a general time dimension-specific concept and that it can be applied to more than just the primary aggregation (or `metric_time`). Consider a situation where a user wants to look at `sales` over time by `customer signup month`; in this situation, having the ability to apply granularities to both time dimensions is crucial. Our APIs include information to fetch the granularities for the primary (metric_time) dimensions, as well as all time dimensions. You can treat each time dimension and granularity selection independently in your application. Note: Initially, as a starting point, it makes sense to only support `metric_time` or the primary time dimension, but we recommend expanding that as your solution evolves.
+- We recommend that time granularity is treated as a general time dimension-specific concept and that it can be applied to more than just the primary aggregation (or `metric_time`).
+
+  Consider a situation where a user wants to look at `sales` over time by `customer signup month`; in this situation, having the ability to apply granularities to both time dimensions is crucial. Our APIs include information to fetch the granularities for the primary (metric_time) dimensions, as well as all time dimensions.
+
+  You can treat each time dimension and granularity selection independently in your application. Note: Initially, as a starting point, it makes sense to only support `metric_time` or the primary time dimension, but we recommend expanding that as your solution evolves.
 
 - You should allow users to filter on date ranges and expose a calendar and nice presets for filtering these.
   * For example, last 30 days, last week, and so on.
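
These patches keep pointing integrators at the JDBC and GraphQL APIs without showing a query. For orientation, here is a sketch of the JDBC-style syntax for the Semantic Layer; the metric and dimension names are illustrative assumptions, and exact parameters should be checked against the API docs:

```sql
-- Hedged sketch of a dbt Semantic Layer JDBC query.
-- 'food_order_amount' and 'customer__customer_type' are assumed example names.
select *
from {{
    semantic_layer.query(
        metrics=['food_order_amount'],
        group_by=[Dimension('metric_time').grain('month'),
                  Dimension('customer__customer_type')]
    )
}}
```

A saved query bundles exactly this kind of selection, so consumers don't have to restate the metrics, dimensions, and filters each time.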

From 683350263afabcc40eb4a6b7c9642ddd111ab6b8 Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Thu, 1 Feb 2024 11:13:39 +0000
Subject: [PATCH 06/14] update

---
 website/docs/guides/sl-partner-integration-guide.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/website/docs/guides/sl-partner-integration-guide.md b/website/docs/guides/sl-partner-integration-guide.md
index df26556c750..e47948b27cd 100644
--- a/website/docs/guides/sl-partner-integration-guide.md
+++ b/website/docs/guides/sl-partner-integration-guide.md
@@ -91,7 +91,7 @@ We recommend organizing metrics and dimensions in ways that a non-technical user
 
 - **Organizing metrics** — The goal is to organize metrics into a hierarchy in our configurations, instead of presenting them in a long list.<br /><br />This hierarchy helps you organize metrics based on specific criteria, such as business unit or team. By providing this structured organization, users can find and navigate metrics more efficiently, enhancing their overall data analysis experience.
 
-- **Using Saved queries** — The dbt Semantic Layer has a concept of [saved queries](/docs/build/saved-queries) which allows users to pre-build slices of metrics, dimensions, filters to be easily accessed. You should surface these as first class objects in your integration. Refer to our JDBC and GraphQL guides for syntax.
+- **Using Saved queries** — The dbt Semantic Layer has a concept of [saved queries](/docs/build/saved-queries) which allows users to pre-build slices of metrics, dimensions, filters to be easily accessed. You should surface these as first class objects in your integration. Refer to the [JDBC](/docs/dbt-cloud-apis/sl-jdbc) and [GraphQL](/docs/dbt-cloud-apis/sl-graphql) APIs for syntax.
 
 ### Query flexibility
 
@@ -148,6 +148,7 @@ These are recommendations on how to evolve a Semantic Layer integration and not
 * Listing available dimensions based on one or many metrics
 * Querying defined metric values on their own or grouping by available dimensions
 * Display metadata from [Discovery API](/docs/dbt-cloud-apis/discovery-api) and other context
+* Use [saved queries](/docs/build/saved-queries) to query pre-built metrics, dimensions, and filters. Refer to the [JDBC](/docs/dbt-cloud-apis/sl-jdbc) and [GraphQL](/docs/dbt-cloud-apis/sl-graphql) APIs for syntax.
 
 **Stage 3 - More querying flexibility and better user experience (UX)**
 * More advanced filtering

From 7d59f7469bd2b344c977dcd3e8fdcfc93b19ffbd Mon Sep 17 00:00:00 2001
From: Joel Labes
Date: Fri, 2 Feb 2024 15:31:25 +1300
Subject: [PATCH 07/14] Apply suggestions from code review

Co-authored-by: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com>
---
 website/docs/docs/build/custom-schemas.md | 34 +++++++++++------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/website/docs/docs/build/custom-schemas.md b/website/docs/docs/build/custom-schemas.md
index b9a8ccfff81..086aeae0853 100644
--- a/website/docs/docs/build/custom-schemas.md
+++ b/website/docs/docs/build/custom-schemas.md
@@ -4,14 +4,14 @@ id: "custom-schemas"
 pagination_next: "docs/build/custom-databases"
 ---
 
-By default, all dbt models are built in the schema specified in your [environment](/docs/dbt-cloud-environments) (dbt Cloud) or [profile's target](/docs/core/dbt-core-environments) (dbt Core). This default schema is called your **target schema**.
+By default, all dbt models are built in the schema specified in your [environment](/docs/dbt-cloud-environments) (dbt Cloud) or [profile's target](/docs/core/dbt-core-environments) (dbt Core). This default schema is called your _target schema_.
 
-In dbt projects with lots of models, it is often preferable to build models across multiple schemas and group similar models together. For example, you may wish to:
+For dbt projects with lots of models, it's common to build models across multiple schemas and group similar models together. For example, you might want to:
 
-* Group models based on the business unit using the model, creating schemas such as `core`, `marketing`, `finance` and `support`; or,
+* Group models based on the business unit using the model, creating schemas such as `core`, `marketing`, `finance` and `support`
 * Hide intermediate models in a `staging` schema, and only present models that should be queried by an end user in an `analytics` schema.
 
-To do this, specify a custom schema. dbt will then generate the schema name for a model by **appending the custom schema to the target schema**, as in: `<target_schema>_<custom_schema>`.
+To do this, specify a custom schema. dbt generates the schema name for a model by appending the custom schema to the target schema. For example, `<target_schema>_<custom_schema>`.
 
 | Target schema | Custom schema | Resulting schema |
 | ------------- | ------------- | ---------------- |
@@ -24,9 +24,9 @@
 
 ## How do I use custom schemas?
 
-Use the `schema` configuration key. As with any configuration, you can either:
+To specify a custom schema for a model, use the `schema` configuration key. As with any configuration, you can do one of the following:
 
-* apply this configuration to a specific model by using a config block within a model, or
+* apply this configuration to a specific model by using a config block within a model
 * apply it to a subdirectory of models by specifying it in your `dbt_project.yml` file
 
@@ -53,9 +53,9 @@
 
 ## Understanding custom schemas
 
-When first using custom schemas, it's a common misunderstanding to assume that a model will use _only_ the new `schema` configuration, for example, a model that has the configuration `schema: marketing` would be built in the `marketing` schema. However, dbt will actually put it in a schema like `<target_schema>_marketing`.
+When first using custom schemas, it's a common misunderstanding to assume that a model _only_ uses the new `schema` configuration; for example, a model that has the configuration `schema: marketing` would be built in the `marketing` schema. However, dbt puts it in a schema like `<target_schema>_marketing`.
 
-There's a good reason for this deviation! Each dbt user has their own target schema for development (see [Managing Environments](#managing-environments)). If dbt ignored the target schema and only used the model's custom schema, every dbt user would create models in the same schema and would overwrite each other's work.
+There's a good reason for this deviation. Each dbt user has their own target schema for development (refer to [Managing Environments](#managing-environments)). If dbt ignored the target schema and only used the model's custom schema, every dbt user would create models in the same schema and would overwrite each other's work.
 
 By combining the target schema and the custom schema, dbt ensures that objects it creates in your data warehouse don't collide with one another.
 
@@ -88,9 +88,9 @@
 
 If your dbt project has a custom macro called `generate_schema_name`, dbt will use it instead of the default macro. This allows you to customize the name generation according to your needs.
 
-To customize this macro, copy the example code above into a file named `macros/generate_schema_name.sql` and make changes as necessary.
+To customize this macro, copy the example code in the section [How does dbt generate a model's schema name](#how-does-dbt-generate-a-models-schema-name) into a file named `macros/generate_schema_name.sql` and make changes as necessary.
+To customize this macro, copy the example code in the section [How does dbt generate a model's schema name](#how-does-dbt-generate-a-models-schema-name) into a file named `macros/generate_schema_name.sql` and make changes as necessary. -**Note**: dbt will ignore any custom `generate_schema_name` macros included in installed packages. +Be careful. dbt will ignore any custom `generate_schema_name` macros included in installed packages.
❗️ Warning: Don't replace default_schema in the macro. @@ -177,9 +177,9 @@ Development/CI Environment (`target.name != 'prod'`) | dbt_cloud_pr_123_456 | None | dbt_cloud_pr_123_456 | | dbt_cloud_pr_123_456 | marketing | dbt_cloud_pr_123_456 | -Just like the normal macro, this approach guarantees that schemas from different environments will not collide. +Similar to the regular macro, this approach guarantees that schemas from different environments will not collide. -dbt ships with a macro for this use case – called `generate_schema_name_for_env` – which is disabled by default. To enable it, add a custom `generate_schema_name` macro to your project that contains the following code: +dbt ships with a macro for this use case — called `generate_schema_name_for_env` — which is disabled by default. To enable it, add a custom `generate_schema_name` macro to your project that contains the following code: @@ -193,16 +193,16 @@ dbt ships with a macro for this use case – called `generate_schema_name_for_en -**Note:** When using this macro, you'll need to set the target name in your production job to `prod`. +When using this macro, you'll need to set the target name in your production job to `prod`. ## Managing environments In the `generate_schema_name` macro examples shown above, the `target.name` context variable is used to change the schema name that dbt generates for models. If the `generate_schema_name` macro in your project uses the `target.name` context variable, you must ensure that your different dbt environments are configured accordingly. While you can use any naming scheme you'd like, we typically recommend: -* **dev**: Your local development environment; configured in a `profiles.yml` file on your computer. -* **ci**: A [continuous integration](/docs/cloud/git/connect-github) environment running on Pull Requests in GitHub, GitLab, etc. -* **prod**: The production deployment of your dbt project, like in dbt Cloud, Airflow, or [similar](/docs/deploy/deployments). +* **dev** — Your local development environment; configured in a `profiles.yml` file on your computer. +* **ci** — A [continuous integration](/docs/cloud/git/connect-github) environment running on pull pequests in GitHub, GitLab, and so on. +* **prod** — The production deployment of your dbt project, like in dbt Cloud, Airflow, or [similar](/docs/deploy/deployments). -If your schema names are being generated incorrectly, double check your target name in the relevant environment. +If your schema names are being generated incorrectly, double-check your target name in the relevant environment. For more information, consult the [managing environments in dbt Core](/docs/core/dbt-core-environments) guide. From d5d62ec6cd37794a9392a21a06690a84b724b543 Mon Sep 17 00:00:00 2001 From: Joel Labes Date: Fri, 2 Feb 2024 15:34:21 +1300 Subject: [PATCH 08/14] Apply final change from code review --- website/docs/docs/build/custom-schemas.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/custom-schemas.md b/website/docs/docs/build/custom-schemas.md index 086aeae0853..042bb45a744 100644 --- a/website/docs/docs/build/custom-schemas.md +++ b/website/docs/docs/build/custom-schemas.md @@ -197,7 +197,7 @@ When using this macro, you'll need to set the target name in your production job ## Managing environments -In the `generate_schema_name` macro examples shown above, the `target.name` context variable is used to change the schema name that dbt generates for models. 
If the `generate_schema_name` macro in your project uses the `target.name` context variable, you must ensure that your different dbt environments are configured accordingly. While you can use any naming scheme you'd like, we typically recommend:
+In the `generate_schema_name` macro examples shown in the [built-in alternative pattern](#a-built-in-alternative-pattern-for-generating-schema-names) section, the `target.name` context variable is used to change the schema name that dbt generates for models. If the `generate_schema_name` macro in your project uses the `target.name` context variable, you must ensure that your different dbt environments are configured accordingly. While you can use any naming scheme you'd like, we typically recommend:

From b12c5c69b82910ff5f523dcab080c6656d0eea43 Mon Sep 17 00:00:00 2001
From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Date: Fri, 2 Feb 2024 10:21:52 +0000
Subject: [PATCH 09/14] Update website/docs/guides/sl-partner-integration-guide.md

---
 website/docs/guides/sl-partner-integration-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/guides/sl-partner-integration-guide.md b/website/docs/guides/sl-partner-integration-guide.md
index e47948b27cd..21ea822389f 100644
--- a/website/docs/guides/sl-partner-integration-guide.md
+++ b/website/docs/guides/sl-partner-integration-guide.md
@@ -148,7 +148,7 @@ These are recommendations on how to evolve a Semantic Layer integration and not
 * Listing available dimensions based on one or many metrics
 * Querying defined metric values on their own or grouping by available dimensions
 * Display metadata from [Discovery API](/docs/dbt-cloud-apis/discovery-api) and other context
-* Use [saved queries](/docs/build/saved-queries) to query pre-built metrics, dimensions, and filters. Refer to the [JDBC](/docs/dbt-cloud-apis/sl-jdbc) and [GraphQL](/docs/dbt-cloud-apis/sl-graphql) APIs for syntax.
+* Expose [Saved queries](/docs/build/saved-queries), which are pre-built metrics, dimensions, and filters that Semantic Layer developers create for easier analysis. You can expose them in your application. Refer to the [JDBC](/docs/dbt-cloud-apis/sl-jdbc) and [GraphQL](/docs/dbt-cloud-apis/sl-graphql) APIs for syntax.
 
 **Stage 3 - More querying flexibility and better user experience (UX)**
 * More advanced filtering

From 34237df72d176763aeb59512ba395e06fdc35430 Mon Sep 17 00:00:00 2001
From: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
Date: Fri, 2 Feb 2024 16:23:39 -0700
Subject: [PATCH 10/14] `unique_key` applies to the input rather than the output

---
 website/docs/reference/resource-configs/unique_key.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/reference/resource-configs/unique_key.md b/website/docs/reference/resource-configs/unique_key.md
index 4e2409bb618..3f3ee20eabd 100644
--- a/website/docs/reference/resource-configs/unique_key.md
+++ b/website/docs/reference/resource-configs/unique_key.md
@@ -27,7 +27,7 @@ snapshots:
 
 ## Description
 
-A column name or expression that is unique for the results of a snapshot. dbt uses this to match records between a result set and an existing snapshot, so that changes can be captured correctly.
+A column name or expression that is unique for the inputs of a snapshot. dbt uses this to match records between a result set and an existing snapshot, so that changes can be captured correctly.
 
 :::caution

From a308d6bbbf10596ee925ecdedc56859d25cc68f5 Mon Sep 17 00:00:00 2001
From: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
Date: Fri, 2 Feb 2024 16:28:14 -0700
Subject: [PATCH 11/14] Testing for uniqueness prior to snapshots

---
 website/docs/reference/resource-configs/unique_key.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/reference/resource-configs/unique_key.md b/website/docs/reference/resource-configs/unique_key.md
index 4e2409bb618..8b7146385e8 100644
--- a/website/docs/reference/resource-configs/unique_key.md
+++ b/website/docs/reference/resource-configs/unique_key.md
@@ -31,7 +31,7 @@ A column name or expression that is unique for the results of a snapshot. dbt us
 
 :::caution
 
-Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key, consider adding a test to your project to ensure that this key is indeed unique.
+Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key, consider [testing](/blog/primary-key-testing) the source data to ensure that this key is indeed unique.
 
 :::

From e13c50f9df22bdbacb91bc472127a39a087457f2 Mon Sep 17 00:00:00 2001
From: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
Date: Fri, 2 Feb 2024 16:34:05 -0700
Subject: [PATCH 12/14] Hyperlink to the most specific section for testing primary keys with data tests

---
 website/docs/reference/resource-configs/unique_key.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/reference/resource-configs/unique_key.md b/website/docs/reference/resource-configs/unique_key.md
index 8b7146385e8..dec1aae9a8e 100644
--- a/website/docs/reference/resource-configs/unique_key.md
+++ b/website/docs/reference/resource-configs/unique_key.md
@@ -31,7 +31,7 @@ A column name or expression that is unique for the results of a snapshot. dbt us
 
 :::caution
 
-Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key, consider [testing](/blog/primary-key-testing) the source data to ensure that this key is indeed unique.
+Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key, consider [testing](blog/primary-key-testing#how-to-test-primary-keys-with-dbt) the source data to ensure that this key is indeed unique.
 
 :::

From 97b27a0b9436956292d6cad0d6a12dd8b4af4ea1 Mon Sep 17 00:00:00 2001
From: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com>
Date: Fri, 2 Feb 2024 15:53:29 -0800
Subject: [PATCH 13/14] Update website/docs/docs/build/custom-schemas.md

---
 website/docs/docs/build/custom-schemas.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/build/custom-schemas.md b/website/docs/docs/build/custom-schemas.md
index 042bb45a744..24cd4194a1c 100644
--- a/website/docs/docs/build/custom-schemas.md
+++ b/website/docs/docs/build/custom-schemas.md
@@ -8,7 +8,7 @@ By default, all dbt models are built in the schema specified in your [environmen
 
 For dbt projects with lots of models, it's common to build models across multiple schemas and group similar models together. For example, you might want to:
 
-* Group models based on the business unit using the model, creating schemas such as `core`, `marketing`, `finance` and `support`
+* Group models based on the business unit using the model, creating schemas such as `core`, `marketing`, `finance` and `support`.
 * Hide intermediate models in a `staging` schema, and only present models that should be queried by an end user in an `analytics` schema.
 
 To do this, specify a custom schema. dbt generates the schema name for a model by appending the custom schema to the target schema. For example, `<target_schema>_<custom_schema>`.

From 6a42be9eadefc518c15949eb6223dae9303f1a5c Mon Sep 17 00:00:00 2001
From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
Date: Mon, 5 Feb 2024 10:10:29 +0000
Subject: [PATCH 14/14] Update website/docs/reference/resource-configs/unique_key.md

---
 website/docs/reference/resource-configs/unique_key.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/reference/resource-configs/unique_key.md b/website/docs/reference/resource-configs/unique_key.md
index dec1aae9a8e..46aad99e71f 100644
--- a/website/docs/reference/resource-configs/unique_key.md
+++ b/website/docs/reference/resource-configs/unique_key.md
@@ -31,7 +31,7 @@ A column name or expression that is unique for the results of a snapshot. dbt us
 
 :::caution
 
-Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key, consider [testing](blog/primary-key-testing#how-to-test-primary-keys-with-dbt) the source data to ensure that this key is indeed unique.
+Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key, consider [testing](/blog/primary-key-testing#how-to-test-primary-keys-with-dbt) the source data to ensure that this key is indeed unique.
 
 :::
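
The `unique_key` patches above all circle the same point: the key must be unique in the snapshot's input, and dbt will not verify that for you. For orientation, a sketch of the configuration being documented; the source and column names are illustrative assumptions:

```sql
{% snapshot orders_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='order_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

-- dbt matches existing snapshot rows to incoming rows on unique_key, so order_id
-- must be unique in this input. dbt will not check that for you; a `unique` test
-- on the source column is the usual guard.
select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}
```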