diff --git a/website/blog/2021-11-22-dbt-labs-pr-template.md b/website/blog/2021-11-22-dbt-labs-pr-template.md index 40d4960ac18..439a02371ec 100644 --- a/website/blog/2021-11-22-dbt-labs-pr-template.md +++ b/website/blog/2021-11-22-dbt-labs-pr-template.md @@ -70,7 +70,7 @@ Checking for things like modularity and 1:1 relationships between sources and st #### Validation of models: -This section should show something to confirm that your model is doing what you intended it to do. This could be a [dbt test](/docs/build/tests) like uniqueness or not null, or could be an ad-hoc query that you wrote to validate your data. Here is a screenshot from a test run on a local development branch: +This section should show something to confirm that your model is doing what you intended it to do. This could be a [dbt test](/docs/build/data-tests) like uniqueness or not null, or could be an ad-hoc query that you wrote to validate your data. Here is a screenshot from a test run on a local development branch: ![test validation](/img/blog/pr-template-test-validation.png "dbt test validation") diff --git a/website/blog/2021-11-22-primary-keys.md b/website/blog/2021-11-22-primary-keys.md index 84c92055eb0..d5f87cddd94 100644 --- a/website/blog/2021-11-22-primary-keys.md +++ b/website/blog/2021-11-22-primary-keys.md @@ -51,7 +51,7 @@ In the days before testing your data was commonplace, you often found out that y ## How to test primary keys with dbt -Today, you can add two simple [dbt tests](/docs/build/tests) onto your primary keys and feel secure that you are going to catch the vast majority of problems in your data. +Today, you can add two simple [dbt tests](/docs/build/data-tests) onto your primary keys and feel secure that you are going to catch the vast majority of problems in your data. Not surprisingly, these two tests correspond to the two most common errors found on your primary keys, and are usually the first tests that teams testing data with dbt implement: diff --git a/website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md b/website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md index b179c0f5c7c..d20c7d139d0 100644 --- a/website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md +++ b/website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md @@ -90,7 +90,7 @@ So instead of getting bogged down in defining roles, let’s focus on hard skill The common skills needed for implementing any flavor of dbt (Core or Cloud) are: * SQL: ‘nuff said -* YAML: required to generate config files for [writing tests on data models](/docs/build/tests) +* YAML: required to generate config files for [writing tests on data models](/docs/build/data-tests) * [Jinja](/guides/using-jinja): allows you to write DRY code (using [macros](/docs/build/jinja-macros), for loops, if statements, etc) YAML + Jinja can be learned pretty quickly, but SQL is the non-negotiable you’ll need to get started. 
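For readers newer to dbt, here is a minimal sketch of the kind of YAML config the testing bullet above refers to (the model and column names are hypothetical):

```yaml
# Hypothetical schema.yml entry: two built-in generic tests attached to a column
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

Configs like this are where the YAML skill pays off; Jinja enters the picture once you start writing custom generic tests or macros.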
diff --git a/website/blog/2021-12-05-how-to-build-a-mature-dbt-project-from-scratch.md b/website/blog/2021-12-05-how-to-build-a-mature-dbt-project-from-scratch.md index 8ea387cf00c..f3a24a0febd 100644 --- a/website/blog/2021-12-05-how-to-build-a-mature-dbt-project-from-scratch.md +++ b/website/blog/2021-12-05-how-to-build-a-mature-dbt-project-from-scratch.md @@ -87,7 +87,7 @@ The most important thing we’re introducing when your project is an infant is t * Introduce modularity with [{{ ref() }}](/reference/dbt-jinja-functions/ref) and [{{ source() }}](/reference/dbt-jinja-functions/source) -* [Document](/docs/collaborate/documentation) and [test](/docs/build/tests) your first models +* [Document](/docs/collaborate/documentation) and [test](/docs/build/data-tests) your first models ![image alt text](/img/blog/building-a-mature-dbt-project-from-scratch/image_3.png) diff --git a/website/blog/2022-04-19-complex-deduplication.md b/website/blog/2022-04-19-complex-deduplication.md index daacff4eec6..f33e6a8fe35 100644 --- a/website/blog/2022-04-19-complex-deduplication.md +++ b/website/blog/2022-04-19-complex-deduplication.md @@ -146,7 +146,7 @@ select * from filter_real_diffs > *What happens in this step? You check your data because you are thorough!* -Good thing dbt has already built this for you. Add a [unique test](/docs/build/tests#generic-tests) to your YAML model block for your `grain_id` in this de-duped staging model, and give it a dbt test! +Good thing dbt has already built this for you. Add a [unique test](/docs/build/data-tests#generic-data-tests) to your YAML model block for your `grain_id` in this de-duped staging model, and give it a dbt test! ```yaml models: diff --git a/website/blog/2022-09-28-analyst-to-ae.md b/website/blog/2022-09-28-analyst-to-ae.md index 7c8ccaeabec..bf19bbae59e 100644 --- a/website/blog/2022-09-28-analyst-to-ae.md +++ b/website/blog/2022-09-28-analyst-to-ae.md @@ -111,7 +111,7 @@ The analyst caught the issue because they have the appropriate context to valida An analyst is able to identify which areas do *not* need to be 100% accurate, which means they can also identify which areas *do* need to be 100% accurate. -> dbt makes it very quick to add [data quality tests](/docs/build/tests). In fact, it’s so quick, that it’ll take an analyst longer to write up what tests they want than it would take for an analyst to completely finish coding them. +> dbt makes it very quick to add [data quality tests](/docs/build/data-tests). In fact, it’s so quick, that it’ll take an analyst longer to write up what tests they want than it would take for an analyst to completely finish coding them. When data quality issues are identified by the business, we often see that analysts are the first ones to be asked: diff --git a/website/blog/2022-10-19-polyglot-dbt-python-dataframes-and-sql.md b/website/blog/2022-10-19-polyglot-dbt-python-dataframes-and-sql.md index bab92000a16..694f6ddc105 100644 --- a/website/blog/2022-10-19-polyglot-dbt-python-dataframes-and-sql.md +++ b/website/blog/2022-10-19-polyglot-dbt-python-dataframes-and-sql.md @@ -133,9 +133,9 @@ This model tries to parse the raw string value into a Python datetime. When not #### Testing the result -During the build process, dbt will check if any of the values are null. This is using the built-in [`not_null`](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) test, which will generate and execute SQL in the data platform. +During the build process, dbt will check if any of the values are null. 
This is using the built-in [`not_null`](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-data-tests) test, which will generate and execute SQL in the data platform. -Our initial recommendation for testing Python models is to use [generic](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) and [singular](https://docs.getdbt.com/docs/building-a-dbt-project/tests#singular-tests) tests. +Our initial recommendation for testing Python models is to use [generic](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-data-tests) and [singular](https://docs.getdbt.com/docs/building-a-dbt-project/tests#singular-data-tests) tests. ```yaml version: 2 diff --git a/website/blog/2023-01-24-aggregating-test-failures.md b/website/blog/2023-01-24-aggregating-test-failures.md index d82c202b376..2319da910a6 100644 --- a/website/blog/2023-01-24-aggregating-test-failures.md +++ b/website/blog/2023-01-24-aggregating-test-failures.md @@ -30,7 +30,7 @@ _It should be noted that this framework is for dbt v1.0+ on BigQuery. Small adap When we talk about high quality data tests, we aren’t just referencing high quality code, but rather the informational quality of our testing framework and their corresponding error messages. Originally, we theorized that any test that cannot be acted upon is a test that should not be implemented. Later, we realized there is a time and place for tests that should receive attention at a critical mass of failures. All we needed was a higher specificity system: tests should have an explicit severity ranking associated with them, equipped to filter out the noise of common, but low concern, failures. Each test should also mesh into established [RACI](https://project-management.com/understanding-responsibility-assignment-matrix-raci-matrix/) guidelines that state which groups tackle what failures, and what constitutes a critical mass. -To ensure that tests are always acted upon, we implement tests differently depending on the user groups that must act when a test fails. This led us to have two main classes of tests — Data Integrity Tests (called [Generic Tests](https://docs.getdbt.com/docs/build/tests) in dbt docs) and Context Driven Tests (called [Singular Tests](https://docs.getdbt.com/docs/build/tests#singular-tests) in dbt docs), with varying levels of severity across both test classes. +To ensure that tests are always acted upon, we implement tests differently depending on the user groups that must act when a test fails. This led us to have two main classes of tests — Data Integrity Tests (called [Generic Tests](https://docs.getdbt.com/docs/build/tests) in dbt docs) and Context Driven Tests (called [Singular Tests](https://docs.getdbt.com/docs/build/tests#singular-data-tests) in dbt docs), with varying levels of severity across both test classes. Data Integrity tests (Generic Tests)  are simple — they’re tests akin to a uniqueness check or not null constraint. These tests are usually actionable by the data platform team rather than subject matter experts. We define Data Integrity tests in our YAML files, similar to how they are [outlined by dbt’s documentation on generic tests](https://docs.getdbt.com/docs/build/tests). 
They look something like this — diff --git a/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md b/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md index 2a4879ac98d..6b1012a5320 100644 --- a/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md +++ b/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md @@ -143,7 +143,7 @@ To help you get started, [we have created a template GitHub project](https://git ### Entity Relation Diagrams (ERDs) and dbt -Data lineage is dbt's strength, but sometimes it's not enough to help you to understand the relationships between Data Vault components like a classic ERD would. There are a few open source packages to visualize the entities in your Data Vault built with dbt. I recommend checking out the [dbterd](https://dbterd.datnguyen.de/1.2/index.html) which turns your [dbt relationship data quality checks](https://docs.getdbt.com/docs/build/tests#generic-tests) into an ERD. +Data lineage is dbt's strength, but sometimes it's not enough to help you to understand the relationships between Data Vault components like a classic ERD would. There are a few open source packages to visualize the entities in your Data Vault built with dbt. I recommend checking out the [dbterd](https://dbterd.datnguyen.de/1.2/index.html) which turns your [dbt relationship data quality checks](https://docs.getdbt.com/docs/build/tests#generic-data-tests) into an ERD. ## Summary diff --git a/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md b/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md new file mode 100644 index 00000000000..ea77072a6dd --- /dev/null +++ b/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md @@ -0,0 +1,97 @@ +--- +title: "How we built consistent product launch metrics with the dbt Semantic Layer." +description: "We built an end-to-end data pipeline for measuring the launch of the dbt Semantic Layer using the dbt Semantic Layer." +slug: product-analytics-pipeline-with-dbt-semantic-layer + +authors: [jordan_stein] + +tags: [dbt Cloud] +hide_table_of_contents: false + +date: 2023-12-12 +is_featured: false +--- +There’s nothing quite like the feeling of launching a new product. +On launch day emotions can range from excitement, to fear, to accomplishment all in the same hour. +Once the dust settles and the product is in the wild, the next thing the team needs to do is track how the product is doing. +How many users do we have? How is performance looking? What features are customers using? How often? Answering these questions is vital to understanding the success of any product launch. + +At dbt we recently made the [Semantic Layer Generally Available](https://www.getdbt.com/blog/new-dbt-cloud-features-announced-at-coalesce-2023). The Semantic Layer lets teams define business metrics centrally, in dbt, and access them in multiple analytics tools through our semantic layer APIs. +I’m a Product Manager on the Semantic Layer team, and the launch of the Semantic Layer put our team in an interesting, somewhat “meta,” position: we need to understand how a product launch is doing, and the product we just launched is designed to make defining and consuming metrics much more efficient. It’s the perfect opportunity to put the semantic layer through its paces for product analytics. This blog post walks through the end-to-end process we used to set up product analytics for the dbt Semantic Layer using the dbt Semantic Layer. 
+ +## Getting your data ready for metrics + +The first steps to building a product analytics pipeline with the Semantic Layer look the same as just using dbt - it’s all about data transformation. The steps we followed were broadly: + +1. Work with engineering to understand the data sources. In our case, it’s db exports from Semantic Layer Server. +2. Load the data into our warehouse. We use Fivetran and Snowflake. +3. Transform the data into normalized tables with dbt. This step is a classic: dbt’s bread and butter. You probably know the drill by now. + +There are [plenty of other great resources](https://docs.getdbt.com/docs/build/projects) on how to accomplish the above steps, so I’m going to skip them in this post and focus on how we built business metrics using the Semantic Layer. Once the data is loaded and modeling is complete, our DAG for the Semantic Layer data looks like the following: + + + + + + + + +Let’s walk through the DAG from left to right: first, we have raw tables from the Semantic Layer Server loaded into our warehouse; next, we have staging models where we apply business logic; and then a clean, normalized `fct_semantic_layer_queries` model. Finally, we built a semantic model named `semantic_layer_queries` on top of our normalized fact model. This is a typical DAG for a dbt project that contains semantic objects. Now let’s zoom in to the section of the DAG that contains our semantic layer objects and look in more detail at how we defined our semantic layer product metrics. + +## [How we build semantic models and metrics](https://docs.getdbt.com/best-practices/how-we-build-our-metrics/semantic-layer-1-intro) + +What [is a semantic model](https://docs.getdbt.com/docs/build/semantic-models)? Put simply, semantic models contain the components we need to build metrics. Semantic models are YAML files that live in your dbt project. They contain metadata about your dbt models in a format that MetricFlow, the query builder that powers the semantic layer, can understand. The DAG below in [dbt Explorer](https://docs.getdbt.com/docs/collaborate/explore-projects) shows the metrics we’ve built off of `semantic_layer_queries`. + + + + +Let’s dig into semantic models and metrics a bit more, and explain some of the data modeling decisions we made. First, we needed to decide what model to use as a base for our semantic model. We decided to use `fct_semantic_layer_queries` as our base model because defining a semantic model on top of a normalized fact table gives us maximum flexibility to join to other tables. This increased the number of dimensions available, which means we can answer more questions. + +You may wonder: why not just build our metrics on top of raw tables and let MetricFlow figure out the rest? The reality is that you will almost always need to do some form of data modeling to create the data set you want to build your metrics off of. MetricFlow’s job isn’t to do data modeling. The transformation step is done with dbt. + +Next, we had to decide what we wanted to put into our semantic models. Semantic models contain [dimensions](https://docs.getdbt.com/docs/build/dimensions), [measures](https://docs.getdbt.com/docs/build/measures), and [entities](https://docs.getdbt.com/docs/build/entities).
We took the following approach to add each of these components: + +- Dimensions: We included all the relevant dimensions in our semantic model that stakeholders might ask for, like the time a query was created, the query status, and booleans showing if a query contained certain elements like a where filter or multiple metrics. +- Entities: We added entities to our semantic model, like dbt Cloud environment ID. Entities function as join keys in semantic models, which means any other semantic models that have a [joinable entity](https://docs.getdbt.com/docs/build/join-logic) can be used when querying metrics. +- Measures: Next, we added measures. Measures define the aggregation you want to run on your data. I think of measures as metric primitives: we’ll use them to build metrics and can reuse them to keep our code [DRY](https://docs.getdbt.com/terms/dry). + +Finally, we reference the measures defined in our semantic model to create metrics. Our initial set of usage metrics are all relatively simple aggregations. For example, the total number of queries run. + +```yaml +## Example of a metric definition +metrics: + - name: queries + description: The total number of queries run + type: simple + label: Semantic Layer Queries + type_params: + measure: queries +``` + +Having our metrics in the semantic layer is powerful in a few ways. Firstly, metric definitions and the generated SQL are centralized, and live in our dbt project, instead of being scattered across BI tools or SQL clients. Secondly, the types of queries I can run are dynamic and flexible. Traditionally, I would materialize a cube or rollup table which needs to contain all the different dimensional slices my users might be curious about. Now, users can join tables and add dimensionality to their metrics queries on the fly at query time, saving our data team cycles of updating and adding new fields to rollup tables. Thirdly, we can expose these metrics to a variety of downstream BI tools so stakeholders in product, finance, or GTM can understand product performance regardless of their technical skills. + +Now that we’ve done the pipeline work to set up our metrics for the semantic layer launch, we’re ready to analyze how the launch went! + +## Our Finance, Operations and GTM teams are all looking at the same metrics 😊 + +To query the Semantic Layer, you have two paths: you can query metrics directly through the Semantic Layer APIs or use one of our [first-class integrations](https://docs.getdbt.com/docs/use-dbt-semantic-layer/avail-sl-integrations). Our analytics team and product teams are big Hex users, while our operations and finance teams live and breathe Google Sheets, so it’s important for us to have the same metric definitions available in both tools. + +The leg work of building our pipeline and defining metrics is all done, which makes last-mile consumption much easier. First, we set up a launch dashboard in Hex as the source of truth for semantic layer product metrics. This dashboard is used by cross-functional partners like marketing, sales, and the executive team to easily check product and usage metrics like total semantic layer queries or weekly active semantic layer users. To set up our Hex connection, we simply enter a few details from our dbt Cloud environment and then we can work with metrics directly in Hex notebooks. We can use the JDBC interface, or use Hex’s GUI metric builder to build reports.
We run all our WBRs off this dashboard, which allows us to spot trends in consumption and react quickly to changes in our business. + + + + + +On the finance and operations side, product usage data is crucial to making informed pricing decisions. All our pricing models are created in spreadsheets, so we leverage the Google Sheets integration to give those teams access to consistent data sets without the need to download CSVs from the Hex dashboard. This lets the Pricing team add dimensional slices, like tier and company size, to the data in a self-serve manner without having to request data team resources to generate those insights. This allows our finance team to iteratively build financial models and be more self-sufficient in pulling data. + + + + + +As a former data scientist and data engineer, I personally think this is a huge improvement over the approach I would have used without the semantic layer. My old approach would have been to materialize One Big Table with all the numeric and categorical columns I needed for analysis, then write a ton of SQL in Hex or various notebooks to create reports for stakeholders. Inevitably, I’d be signing up for more development cycles to update the pipeline whenever a new dimension needs to be added or the data needs to be aggregated in a slightly different way. From a data team management perspective, using a central semantic layer saves data analysts cycles since users can more easily self-serve. At every company I’ve ever worked at, data analysts are always in high demand, with more requests than they can reasonably accomplish. This means any time a stakeholder can self-serve their data without pulling us in, it’s a huge win. + +## The Result: Consistent Governed Metrics + +And just like that, we have an end-to-end pipeline for product analytics on the dbt Semantic Layer using the dbt Semantic Layer 🤯. Part of the foundational work to build this pipeline will be familiar to you, like building out a normalized fact table using dbt. Hopefully walking through the next step of adding semantic models and metrics on top of those dbt models helped give you some ideas about how you can use the semantic layer for your team. Having launch metrics defined in dbt made keeping the entire organization up to date on product adoption and performance much easier. Instead of a rollup table or static materialized cubes, we added flexible metrics without rewriting logic in SQL or adding additional tables to the end of our DAG. + +The result is access to consistent and governed metrics in the tool our stakeholders are already using to do their jobs. We are able to keep the entire organization aligned and give them access to consistent, accurate data they need to do their part to make the semantic layer product successful. Thanks for reading! If you’re thinking of using the semantic layer or have questions, we’re always happy to keep the conversation going in the [dbt community Slack](https://www.getdbt.com/community/join-the-community). Drop us a note in #dbt-cloud-semantic-layer. We’d love to hear from you!
diff --git a/website/blog/authors.yml b/website/blog/authors.yml index 31d69824ed4..cd2bd162935 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -287,6 +287,14 @@ jonathan_natkins: url: https://twitter.com/nattyice name: Jon "Natty" Natkins organization: dbt Labs +jordan_stein: + image_url: /img/blog/authors/jordan.jpeg + job_title: Product Manager + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/jstein5/ + name: Jordan Stein + organization: dbt Labs josh_fell: image_url: /img/blog/authors/josh-fell.jpeg diff --git a/website/docs/best-practices/custom-generic-tests.md b/website/docs/best-practices/custom-generic-tests.md index f2d84e38853..e96fc864ee6 100644 --- a/website/docs/best-practices/custom-generic-tests.md +++ b/website/docs/best-practices/custom-generic-tests.md @@ -1,15 +1,15 @@ --- -title: "Writing custom generic tests" +title: "Writing custom generic data tests" id: "writing-custom-generic-tests" -description: Learn how to define your own custom generic tests. -displayText: Writing custom generic tests -hoverSnippet: Learn how to define your own custom generic tests. +description: Learn how to define your own custom generic data tests. +displayText: Writing custom generic data tests +hoverSnippet: Learn how to write your own custom generic data tests. --- -dbt ships with [Not Null](/reference/resource-properties/tests#not-null), [Unique](/reference/resource-properties/tests#unique), [Relationships](/reference/resource-properties/tests#relationships), and [Accepted Values](/reference/resource-properties/tests#accepted-values) generic tests. (These used to be called "schema tests," and you'll still see that name in some places.) Under the hood, these generic tests are defined as `test` blocks (like macros) in a globally accessible dbt project. You can find the source code for these tests in the [global project](https://github.com/dbt-labs/dbt-core/tree/main/core/dbt/include/global_project/macros/generic_test_sql). +dbt ships with [Not Null](/reference/resource-properties/data-tests#not-null), [Unique](/reference/resource-properties/data-tests#unique), [Relationships](/reference/resource-properties/data-tests#relationships), and [Accepted Values](/reference/resource-properties/data-tests#accepted-values) generic data tests. (These used to be called "schema tests," and you'll still see that name in some places.) Under the hood, these generic data tests are defined as `test` blocks (like macros) in a globally accessible dbt project. You can find the source code for these tests in the [global project](https://github.com/dbt-labs/dbt-core/tree/main/core/dbt/include/global_project/macros/generic_test_sql). :::info -There are tons of generic tests defined in open source packages, such as [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) and [dbt-expectations](https://hub.getdbt.com/calogica/dbt_expectations/latest/) — the test you're looking for might already be here! +There are tons of generic data tests defined in open source packages, such as [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) and [dbt-expectations](https://hub.getdbt.com/calogica/dbt_expectations/latest/) — the test you're looking for might already be here! 
::: ### Generic tests with standard arguments diff --git a/website/docs/best-practices/dbt-unity-catalog-best-practices.md b/website/docs/best-practices/dbt-unity-catalog-best-practices.md index 89153fe1b86..a55e1d121af 100644 --- a/website/docs/best-practices/dbt-unity-catalog-best-practices.md +++ b/website/docs/best-practices/dbt-unity-catalog-best-practices.md @@ -49,9 +49,9 @@ The **prod** service principal should have “read” access to raw source data, | | Source Data | Development catalog | Production catalog | Test catalog | | --- | --- | --- | --- | --- | -| developers | use | use, create table & create view | use or none | none | -| production service principal | use | none | use, create table & create view | none | -| Test service principal | use | none | none | use, create table & create view | +| developers | use | use, create schema, table, & view | use or none | none | +| production service principal | use | none | use, create schema, table & view | none | +| Test service principal | use | none | none | use, create schema, table & view | ## Next steps diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md index 19c6717063c..ee3d4262882 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md @@ -4,10 +4,6 @@ description: Getting started with the dbt and MetricFlow hoverSnippet: Learn how to get started with the dbt and MetricFlow --- -:::tip -**This is a guide for a beta product.** We anticipate this guide will evolve alongside the Semantic Layer through community collaboration. We welcome discussions, ideas, issues, and contributions to refining best practices. -::: - Flying cars, hoverboards, and true self-service analytics: this is the future we were promised. The first two might still be a few years out, but real self-service analytics is here today. With dbt Cloud's Semantic Layer, you can resolve the tension between accuracy and flexibility that has hampered analytics tools for years, empowering everybody in your organization to explore a shared reality of metrics. Best of all for analytics engineers, building with these new tools will significantly [DRY](https://docs.getdbt.com/terms/dry) up and simplify your codebase. As you'll see, the deep interaction between your dbt models and the Semantic Layer make your dbt project the ideal place to craft your metrics. ## Learning goals diff --git a/website/docs/best-practices/materializations/materializations-guide-4-incremental-models.md b/website/docs/best-practices/materializations/materializations-guide-4-incremental-models.md index 71b24ef58f2..a195812ff1c 100644 --- a/website/docs/best-practices/materializations/materializations-guide-4-incremental-models.md +++ b/website/docs/best-practices/materializations/materializations-guide-4-incremental-models.md @@ -7,7 +7,7 @@ displayText: Materializations best practices hoverSnippet: Read this guide to understand the incremental models you can create in dbt. --- -So far we’ve looked at tables and views, which map to the traditional objects in the data warehouse. As mentioned earlier, incremental models are a little different. This where we start to deviate from this pattern with more powerful and complex materializations. +So far we’ve looked at tables and views, which map to the traditional objects in the data warehouse. 
As mentioned earlier, incremental models are a little different. This is where we start to deviate from this pattern with more powerful and complex materializations. - 📚 **Incremental models generate tables.** They physically persist the data itself to the warehouse, just piece by piece. What’s different is **how we build that table**. - 💅 **Only apply our transformations to rows of data with new or updated information**, this maximizes efficiency. @@ -53,7 +53,7 @@ where updated_at > (select max(updated_at) from {{ this }}) ``` -Let’s break down that `where` clause a bit, because this where the action is with incremental models. Stepping through the code **_right-to-left_** we: +Let’s break down that `where` clause a bit, because this is where the action is with incremental models. Stepping through the code **_right-to-left_** we: 1. Get our **cutoff.** 1. Select the `max(updated_at)` timestamp — the **most recent record** @@ -138,7 +138,7 @@ where {% endif %} ``` -Fantastic! We’ve got a working incremental model. On our first run, when there is no corresponding table in the warehouse, `is_incremental` will evaluate to false and we’ll capture the entire table. On subsequent runs is it will evaluate to true and we’ll apply our filter logic, capturing only the newer data. +Fantastic! We’ve got a working incremental model. On our first run, when there is no corresponding table in the warehouse, `is_incremental` will evaluate to false and we’ll capture the entire table. On subsequent runs it will evaluate to true and we’ll apply our filter logic, capturing only the newer data. ### Late arriving facts diff --git a/website/docs/community/resources/oss-expectations.md index 9c916de1240..a57df7fe67f 100644 --- a/website/docs/community/resources/oss-expectations.md +++ b/website/docs/community/resources/oss-expectations.md @@ -94,6 +94,8 @@ PRs are your surest way to make the change you want to see in dbt / packages / d **Every PR should be associated with an issue.** Why? Before you spend a lot of time working on a contribution, we want to make sure that your proposal will be accepted. You should open an issue first, describing your desired outcome and outlining your planned change. If you've found an older issue that's already open, comment on it with an outline for your planned implementation. Exception to this rule: If you're just opening a PR for a cosmetic fix, such as a typo in documentation, an issue isn't needed. +**PRs should include robust testing.** Comprehensive testing within pull requests is crucial for the stability of our project. By prioritizing robust testing, we ensure the reliability of our codebase, minimize unforeseen issues, and safeguard against potential regressions. We understand that creating thorough tests often requires significant effort, and your dedication to this process greatly contributes to the project's overall reliability. Thank you for your commitment to maintaining the integrity of our codebase! + **Our goal is to review most new PRs within 7 days.** The first review will include some high-level comments about the implementation, including (at a high level) whether it’s something we think suitable to merge. Depending on the scope of the PR, the first review may include line-level code suggestions, or we may delay specific code review until the PR is more finalized / until we have more time.
**Automation that can help us:** Many repositories have a template for pull request descriptions, which will include a checklist that must be completed before the PR can be merged. You don’t have to do all of these things to get an initial PR, but they definitely help. Those many include things like: diff --git a/website/docs/community/spotlight/alison-stanton.md b/website/docs/community/spotlight/alison-stanton.md index 2054f78b0b7..fd4a1796411 100644 --- a/website/docs/community/spotlight/alison-stanton.md +++ b/website/docs/community/spotlight/alison-stanton.md @@ -17,6 +17,7 @@ socialLinks: link: https://github.com/alison985/ dateCreated: 2023-11-07 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/community/spotlight/bruno-de-lima.md b/website/docs/community/spotlight/bruno-de-lima.md index 0365ee6c6a8..3cce6135ae0 100644 --- a/website/docs/community/spotlight/bruno-de-lima.md +++ b/website/docs/community/spotlight/bruno-de-lima.md @@ -20,6 +20,7 @@ socialLinks: link: https://medium.com/@bruno.szdl dateCreated: 2023-11-05 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/community/spotlight/dakota-kelley.md b/website/docs/community/spotlight/dakota-kelley.md index 57834d9cdff..85b79f0e85a 100644 --- a/website/docs/community/spotlight/dakota-kelley.md +++ b/website/docs/community/spotlight/dakota-kelley.md @@ -15,6 +15,7 @@ socialLinks: link: https://www.linkedin.com/in/dakota-kelley/ dateCreated: 2023-11-08 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/community/spotlight/fabiyi-opeyemi.md b/website/docs/community/spotlight/fabiyi-opeyemi.md index f67ff4aaefc..67efd90e1c5 100644 --- a/website/docs/community/spotlight/fabiyi-opeyemi.md +++ b/website/docs/community/spotlight/fabiyi-opeyemi.md @@ -18,6 +18,7 @@ socialLinks: link: https://www.linkedin.com/in/opeyemifabiyi/ dateCreated: 2023-11-06 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/community/spotlight/josh-devlin.md b/website/docs/community/spotlight/josh-devlin.md index d8a9b91c282..6036105940b 100644 --- a/website/docs/community/spotlight/josh-devlin.md +++ b/website/docs/community/spotlight/josh-devlin.md @@ -23,6 +23,7 @@ socialLinks: link: https://www.linkedin.com/in/josh-devlin/ dateCreated: 2023-11-10 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/community/spotlight/karen-hsieh.md b/website/docs/community/spotlight/karen-hsieh.md index 5147f39ce59..22d6915baf7 100644 --- a/website/docs/community/spotlight/karen-hsieh.md +++ b/website/docs/community/spotlight/karen-hsieh.md @@ -24,6 +24,7 @@ socialLinks: link: https://medium.com/@ijacwei dateCreated: 2023-11-04 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? 
diff --git a/website/docs/community/spotlight/oliver-cramer.md b/website/docs/community/spotlight/oliver-cramer.md index bfd62db0908..7e9974a8a2c 100644 --- a/website/docs/community/spotlight/oliver-cramer.md +++ b/website/docs/community/spotlight/oliver-cramer.md @@ -16,6 +16,7 @@ socialLinks: link: https://www.linkedin.com/in/oliver-cramer/ dateCreated: 2023-11-02 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/community/spotlight/sam-debruyn.md b/website/docs/community/spotlight/sam-debruyn.md index 166adf58b09..24ce7aa1b15 100644 --- a/website/docs/community/spotlight/sam-debruyn.md +++ b/website/docs/community/spotlight/sam-debruyn.md @@ -18,6 +18,7 @@ socialLinks: link: https://debruyn.dev/ dateCreated: 2023-11-03 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/community/spotlight/stacy-lo.md b/website/docs/community/spotlight/stacy-lo.md index f0b70fcc225..23a5491dd18 100644 --- a/website/docs/community/spotlight/stacy-lo.md +++ b/website/docs/community/spotlight/stacy-lo.md @@ -17,6 +17,7 @@ socialLinks: link: https://www.linkedin.com/in/olycats/ dateCreated: 2023-11-01 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/community/spotlight/sydney-burns.md b/website/docs/community/spotlight/sydney-burns.md index ecebd6cdec0..25278ac6ecf 100644 --- a/website/docs/community/spotlight/sydney-burns.md +++ b/website/docs/community/spotlight/sydney-burns.md @@ -15,6 +15,7 @@ socialLinks: link: https://www.linkedin.com/in/sydneyeburns/ dateCreated: 2023-11-09 hide_table_of_contents: true +communityAward: true --- ## When did you join the dbt community and in what way has it impacted your career? diff --git a/website/docs/docs/build/tests.md b/website/docs/docs/build/data-tests.md similarity index 56% rename from website/docs/docs/build/tests.md rename to website/docs/docs/build/data-tests.md index 3d86dc6a81b..d981d7e272d 100644 --- a/website/docs/docs/build/tests.md +++ b/website/docs/docs/build/data-tests.md @@ -1,43 +1,43 @@ --- -title: "Add tests to your DAG" -sidebar_label: "Tests" -description: "Read this tutorial to learn how to use tests when building in dbt." +title: "Add data tests to your DAG" +sidebar_label: "Data tests" +description: "Read this tutorial to learn how to use data tests when building in dbt." search_weight: "heavy" -id: "tests" +id: "data-tests" keywords: - test, tests, testing, dag --- ## Related reference docs * [Test command](/reference/commands/test) -* [Test properties](/reference/resource-properties/tests) -* [Test configurations](/reference/test-configs) +* [Data test properties](/reference/resource-properties/data-tests) +* [Data test configurations](/reference/data-test-configs) * [Test selection examples](/reference/node-selection/test-selection-examples) ## Overview -Tests are assertions you make about your models and other resources in your dbt project (e.g. sources, seeds and snapshots). When you run `dbt test`, dbt will tell you if each test in your project passes or fails. +Data tests are assertions you make about your models and other resources in your dbt project (e.g. sources, seeds and snapshots). When you run `dbt test`, dbt will tell you if each test in your project passes or fails. 
-You can use tests to improve the integrity of the SQL in each model by making assertions about the results generated. Out of the box, you can test whether a specified column in a model only contains non-null values, unique values, or values that have a corresponding value in another model (for example, a `customer_id` for an `order` corresponds to an `id` in the `customers` model), and values from a specified list. You can extend tests to suit business logic specific to your organization – any assertion that you can make about your model in the form of a select query can be turned into a test. +You can use data tests to improve the integrity of the SQL in each model by making assertions about the results generated. Out of the box, you can test whether a specified column in a model only contains non-null values, unique values, or values that have a corresponding value in another model (for example, a `customer_id` for an `order` corresponds to an `id` in the `customers` model), and values from a specified list. You can extend data tests to suit business logic specific to your organization – any assertion that you can make about your model in the form of a select query can be turned into a data test. -Both types of tests return a set of failing records. Previously, generic/schema tests returned a numeric value representing failures. Generic tests (f.k.a. schema tests) are defined using `test` blocks instead of macros prefixed `test_`. +Data tests return a set of failing records. Generic data tests (f.k.a. schema tests) are defined using `test` blocks. -Like almost everything in dbt, tests are SQL queries. In particular, they are `select` statements that seek to grab "failing" records, ones that disprove your assertion. If you assert that a column is unique in a model, the test query selects for duplicates; if you assert that a column is never null, the test seeks after nulls. If the test returns zero failing rows, it passes, and your assertion has been validated. +Like almost everything in dbt, data tests are SQL queries. In particular, they are `select` statements that seek to grab "failing" records, ones that disprove your assertion. If you assert that a column is unique in a model, the test query selects for duplicates; if you assert that a column is never null, the test seeks after nulls. If the data test returns zero failing rows, it passes, and your assertion has been validated. -There are two ways of defining tests in dbt: -* A **singular** test is testing in its simplest form: If you can write a SQL query that returns failing rows, you can save that query in a `.sql` file within your [test directory](/reference/project-configs/test-paths). It's now a test, and it will be executed by the `dbt test` command. -* A **generic** test is a parameterized query that accepts arguments. The test query is defined in a special `test` block (like a [macro](jinja-macros)). Once defined, you can reference the generic test by name throughout your `.yml` files—define it on models, columns, sources, snapshots, and seeds. dbt ships with four generic tests built in, and we think you should use them! +There are two ways of defining data tests in dbt: +* A **singular** data test is testing in its simplest form: If you can write a SQL query that returns failing rows, you can save that query in a `.sql` file within your [test directory](/reference/project-configs/test-paths). It's now a data test, and it will be executed by the `dbt test` command. 
+* A **generic** data test is a parameterized query that accepts arguments. The test query is defined in a special `test` block (like a [macro](jinja-macros)). Once defined, you can reference the generic test by name throughout your `.yml` files—define it on models, columns, sources, snapshots, and seeds. dbt ships with four generic data tests built in, and we think you should use them! -Defining tests is a great way to confirm that your code is working correctly, and helps prevent regressions when your code changes. Because you can use them over and over again, making similar assertions with minor variations, generic tests tend to be much more common—they should make up the bulk of your dbt testing suite. That said, both ways of defining tests have their time and place. +Defining data tests is a great way to confirm that your outputs and inputs are as expected, and helps prevent regressions when your code changes. Because you can use them over and over again, making similar assertions with minor variations, generic data tests tend to be much more common—they should make up the bulk of your dbt data testing suite. That said, both ways of defining data tests have their time and place. -:::tip Creating your first tests +:::tip Creating your first data tests If you're new to dbt, we recommend that you check out our [quickstart guide](/guides) to build your first dbt project with models and tests. ::: -## Singular tests +## Singular data tests -The simplest way to define a test is by writing the exact SQL that will return failing records. We call these "singular" tests, because they're one-off assertions usable for a single purpose. +The simplest way to define a data test is by writing the exact SQL that will return failing records. We call these "singular" data tests, because they're one-off assertions usable for a single purpose. -These tests are defined in `.sql` files, typically in your `tests` directory (as defined by your [`test-paths` config](/reference/project-configs/test-paths)). You can use Jinja (including `ref` and `source`) in the test definition, just like you can when creating models. Each `.sql` file contains one `select` statement, and it defines one test: +These tests are defined in `.sql` files, typically in your `tests` directory (as defined by your [`test-paths` config](/reference/project-configs/test-paths)). You can use Jinja (including `ref` and `source`) in the test definition, just like you can when creating models. Each `.sql` file contains one `select` statement, and it defines one data test: @@ -56,10 +56,10 @@ having not(total_amount >= 0) The name of this test is the name of the file: `assert_total_payment_amount_is_positive`. Simple enough. -Singular tests are easy to write—so easy that you may find yourself writing the same basic structure over and over, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend... +Singular data tests are easy to write—so easy that you may find yourself writing the same basic structure over and over, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend... -## Generic tests -Certain tests are generic: they can be reused over and over again. A generic test is defined in a `test` block, which contains a parametrized query and accepts arguments. It might look like: +## Generic data tests +Certain data tests are generic: they can be reused over and over again. 
A generic data test is defined in a `test` block, which contains a parametrized query and accepts arguments. It might look like: ```sql {% test not_null(model, column_name) %} @@ -77,7 +77,7 @@ You'll notice that there are two arguments, `model` and `column_name`, which are If this is your first time working with adding properties to a resource, check out the docs on [declaring properties](/reference/configs-and-properties). ::: -Out of the box, dbt ships with four generic tests already defined: `unique`, `not_null`, `accepted_values` and `relationships`. Here's a full example using those tests on an `orders` model: +Out of the box, dbt ships with four generic data tests already defined: `unique`, `not_null`, `accepted_values` and `relationships`. Here's a full example using those tests on an `orders` model: ```yml version: 2 @@ -100,19 +100,19 @@ models: field: id ``` -In plain English, these tests translate to: +In plain English, these data tests translate to: * `unique`: the `order_id` column in the `orders` model should be unique * `not_null`: the `order_id` column in the `orders` model should not contain null values * `accepted_values`: the `status` column in the `orders` should be one of `'placed'`, `'shipped'`, `'completed'`, or `'returned'` * `relationships`: each `customer_id` in the `orders` model exists as an `id` in the `customers` (also known as referential integrity) -Behind the scenes, dbt constructs a `select` query for each test, using the parametrized query from the generic test block. These queries return the rows where your assertion is _not_ true; if the test returns zero rows, your assertion passes. +Behind the scenes, dbt constructs a `select` query for each data test, using the parametrized query from the generic test block. These queries return the rows where your assertion is _not_ true; if the test returns zero rows, your assertion passes. -You can find more information about these tests, and additional configurations (including [`severity`](/reference/resource-configs/severity) and [`tags`](/reference/resource-configs/tags)) in the [reference section](/reference/resource-properties/tests). +You can find more information about these data tests, and additional configurations (including [`severity`](/reference/resource-configs/severity) and [`tags`](/reference/resource-configs/tags)) in the [reference section](/reference/resource-properties/data-tests). -### More generic tests +### More generic data tests -Those four tests are enough to get you started. You'll quickly find you want to use a wider variety of tests—a good thing! You can also install generic tests from a package, or write your own, to use (and reuse) across your dbt project. Check out the [guide on custom generic tests](/best-practices/writing-custom-generic-tests) for more information. +Those four tests are enough to get you started. You'll quickly find you want to use a wider variety of tests—a good thing! You can also install generic data tests from a package, or write your own, to use (and reuse) across your dbt project. Check out the [guide on custom generic tests](/best-practices/writing-custom-generic-tests) for more information. :::info There are generic tests defined in some open source packages, such as [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) and [dbt-expectations](https://hub.getdbt.com/calogica/dbt_expectations/latest/) — skip ahead to the docs on [packages](/docs/build/packages) to learn more! 
@@ -241,7 +241,7 @@ where {{ column_name }} is null ## Storing test failures -Normally, a test query will calculate failures as part of its execution. If you set the optional `--store-failures` flag, the [`store_failures`](/reference/resource-configs/store_failures), or the [`store_failures_as`](/reference/resource-configs/store_failures_as) configs, dbt will first save the results of a test query to a table in the database, and then query that table to calculate the number of failures. +Normally, a data test query will calculate failures as part of its execution. If you set the optional `--store-failures` flag, the [`store_failures`](/reference/resource-configs/store_failures), or the [`store_failures_as`](/reference/resource-configs/store_failures_as) configs, dbt will first save the results of a test query to a table in the database, and then query that table to calculate the number of failures. This workflow allows you to query and examine failing records much more quickly in development: diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index 683ff730d3c..3c4edd9aef0 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -252,7 +252,8 @@ This example shows how to create slowly changing dimensions (SCD) using a semant | 333 | 2 | 2020-08-19 | 2021-10-22| | 333 | 3 | 2021-10-22 | 2048-01-01| -Take note of the extra arguments under `validity_params`: `is_start` and `is_end`. These arguments indicate the columns in the SCD table that contain the start and end dates for each tier (or beginning or ending timestamp column for a dimensional value). + +The `validity_params` include two important arguments — `is_start` and `is_end`. These specify the columns in the SCD table that mark the start and end dates (or timestamps) for each tier or dimension. Additionally, the entity is tagged as `natural` to differentiate it from a `primary` entity. In a `primary` entity, each entity value has one row. In contrast, a `natural` entity has one row for each combination of entity value and its validity period. 
```yaml semantic_models: @@ -280,9 +281,11 @@ semantic_models: - name: tier type: categorical + primary_entity: sales_person + entities: - name: sales_person - type: primary + type: natural expr: sales_person_id ``` diff --git a/website/docs/docs/build/incremental-models.md b/website/docs/docs/build/incremental-models.md index 3a597499f04..46788758ee6 100644 --- a/website/docs/docs/build/incremental-models.md +++ b/website/docs/docs/build/incremental-models.md @@ -249,31 +249,29 @@ The `merge` strategy is available in dbt-postgres and dbt-redshift beginning in - -| data platform adapter | default strategy | additional supported strategies | -| :-------------------| ---------------- | -------------------- | -| [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | `append` | `delete+insert` | -| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | `append` | `delete+insert` | -| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | `merge` | `insert_overwrite` | -| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | -| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | -| [dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | `merge` | `append`, `delete+insert` | -| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | `append` | `merge` `delete+insert` | +| data platform adapter | `append` | `merge` | `delete+insert` | `insert_overwrite` | +|-----------------------------------------------------------------------------------------------------|:--------:|:-------:|:---------------:|:------------------:| +| [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | ✅ | | ✅ | | +| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | ✅ | | ✅ | | +| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | | ✅ | | ✅ | +| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | ✅ | ✅ | | ✅ | +| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | ✅ | ✅ | | ✅ | +| [dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | ✅ | ✅ | ✅ | | +| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | ✅ | ✅ | ✅ | | - -| data platform adapter | default strategy | additional supported strategies | -| :----------------- | :----------------| : ---------------------------------- | -| [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | `append` | `merge` , `delete+insert` | -| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | `append` | `merge`, `delete+insert` | -| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | `merge` | `insert_overwrite` | -| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | -| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | `append` | `merge` (Delta only) `insert_overwrite` | -| 
[dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | `merge` | `append`, `delete+insert` | -| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | `append` | `merge` `delete+insert` | +| data platform adapter | `append` | `merge` | `delete+insert` | `insert_overwrite` | +|-----------------------------------------------------------------------------------------------------|:--------:|:-------:|:---------------:|:------------------:| +| [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | +| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | +| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | | ✅ | | ✅ | +| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | ✅ | ✅ | | ✅ | +| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | ✅ | ✅ | | ✅ | +| [dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | ✅ | ✅ | ✅ | | +| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | ✅ | ✅ | ✅ | | diff --git a/website/docs/docs/build/jinja-macros.md b/website/docs/docs/build/jinja-macros.md index 135db740f75..074e648d410 100644 --- a/website/docs/docs/build/jinja-macros.md +++ b/website/docs/docs/build/jinja-macros.md @@ -23,7 +23,7 @@ Using Jinja turns your dbt project into a programming environment for SQL, givin In fact, if you've used the [`{{ ref() }}` function](/reference/dbt-jinja-functions/ref), you're already using Jinja! -Jinja can be used in any SQL in a dbt project, including [models](/docs/build/sql-models), [analyses](/docs/build/analyses), [tests](/docs/build/tests), and even [hooks](/docs/build/hooks-operations). +Jinja can be used in any SQL in a dbt project, including [models](/docs/build/sql-models), [analyses](/docs/build/analyses), [tests](/docs/build/data-tests), and even [hooks](/docs/build/hooks-operations). :::info Ready to get started with Jinja and macros? diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md index 9039822c9fd..29b9d101a59 100644 --- a/website/docs/docs/build/join-logic.md +++ b/website/docs/docs/build/join-logic.md @@ -84,10 +84,6 @@ mf query --metrics average_purchase_price --dimensions metric_time,user_id__type ## Multi-hop joins -:::info -This feature is currently in development and not currently available. -::: - MetricFlow allows users to join measures and dimensions across a graph of entities, which we refer to as a 'multi-hop join.' This is because users can move from one table to another like a 'hop' within a graph. Here's an example schema for reference: @@ -134,9 +130,6 @@ semantic_models: ### Query multi-hop joins -:::info -This feature is currently in development and not currently available. -::: To query dimensions _without_ a multi-hop join involved, you can use the fully qualified dimension name with the syntax entity double underscore (dunder) dimension, like `entity__dimension`. diff --git a/website/docs/docs/build/metricflow-commands.md b/website/docs/docs/build/metricflow-commands.md index 7e535e4ea62..e3bb93da964 100644 --- a/website/docs/docs/build/metricflow-commands.md +++ b/website/docs/docs/build/metricflow-commands.md @@ -556,3 +556,8 @@ Keep in mind that modifying your shell configuration files can have an impact on +
+Why is my query limited to 100 rows in the dbt Cloud CLI? +The default limit for queries issued from the dbt Cloud CLI is 100 rows. We set this default to prevent returning unnecessarily large data sets, as the dbt Cloud CLI is typically used to query the dbt Semantic Layer during the development process, not for production reporting or to access large data sets. For most workflows, you only need to return a subset of the data.

+However, you can change this limit if needed by setting the `--limit` option in your query. For example, to return 1000 rows, you can run `dbt sl list metrics --limit 1000`. +
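To make the new FAQ concrete, here is a minimal sketch of how the flag is used from the dbt Cloud CLI; the metric and dimension names are placeholders, and only the `--limit` flag itself comes from the text above:

```bash
# Raise the default 100-row cap when listing metrics
dbt sl list metrics --limit 1000

# The same flag can be passed when querying a metric (names are hypothetical)
dbt sl query --metrics order_total --group-by metric_time --limit 1000
```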
diff --git a/website/docs/docs/build/projects.md b/website/docs/docs/build/projects.md index a54f6042cce..c5e08177dee 100644 --- a/website/docs/docs/build/projects.md +++ b/website/docs/docs/build/projects.md @@ -14,7 +14,7 @@ At a minimum, all a project needs is the `dbt_project.yml` project configuration | [models](/docs/build/models) | Each model lives in a single file and contains logic that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation. | | [snapshots](/docs/build/snapshots) | A way to capture the state of your mutable tables so you can refer to it later. | | [seeds](/docs/build/seeds) | CSV files with static data that you can load into your data platform with dbt. | -| [tests](/docs/build/tests) | SQL queries that you can write to test the models and resources in your project. | +| [data tests](/docs/build/data-tests) | SQL queries that you can write to test the models and resources in your project. | | [macros](/docs/build/jinja-macros) | Blocks of code that you can reuse multiple times. | | [docs](/docs/collaborate/documentation) | Docs for your project that you can build. | | [sources](/docs/build/sources) | A way to name and describe the data loaded into your warehouse by your Extract and Load tools. | diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md index a657b6257c9..466bcedc688 100644 --- a/website/docs/docs/build/sources.md +++ b/website/docs/docs/build/sources.md @@ -88,10 +88,10 @@ Using the `{{ source () }}` function also creates a dependency between the model ### Testing and documenting sources You can also: -- Add tests to sources +- Add data tests to sources - Add descriptions to sources, that get rendered as part of your documentation site -These should be familiar concepts if you've already added tests and descriptions to your models (if not check out the guides on [testing](/docs/build/tests) and [documentation](/docs/collaborate/documentation)). +These should be familiar concepts if you've already added tests and descriptions to your models (if not check out the guides on [testing](/docs/build/data-tests) and [documentation](/docs/collaborate/documentation)). diff --git a/website/docs/docs/build/sql-models.md b/website/docs/docs/build/sql-models.md index 237ac84c0c2..a0dd174278b 100644 --- a/website/docs/docs/build/sql-models.md +++ b/website/docs/docs/build/sql-models.md @@ -262,7 +262,7 @@ Additionally, the `ref` function encourages you to write modular transformations ## Testing and documenting models -You can also document and test models — skip ahead to the section on [testing](/docs/build/tests) and [documentation](/docs/collaborate/documentation) for more information. +You can also document and test models — skip ahead to the section on [testing](/docs/build/data-tests) and [documentation](/docs/collaborate/documentation) for more information. 
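Because several of the retargeted links above point at the data-tests and documentation docs, a brief sketch of the commands that exercise those features may help; the model name is hypothetical:

```bash
# Run the data tests defined on one model
dbt test --select stg_orders

# Build the documentation site locally and preview it
dbt docs generate
dbt docs serve
```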
## Additional FAQs diff --git a/website/docs/docs/cloud/about-cloud/regions-ip-addresses.md b/website/docs/docs/cloud/about-cloud/regions-ip-addresses.md index 7f32505d56e..119201b389d 100644 --- a/website/docs/docs/cloud/about-cloud/regions-ip-addresses.md +++ b/website/docs/docs/cloud/about-cloud/regions-ip-addresses.md @@ -12,7 +12,7 @@ dbt Cloud is [hosted](/docs/cloud/about-cloud/architecture) in multiple regions | Region | Location | Access URL | IP addresses | Developer plan | Team plan | Enterprise plan | |--------|----------|------------|--------------|----------------|-----------|-----------------| | North America multi-tenant [^1] | AWS us-east-1 (N. Virginia) | cloud.getdbt.com | 52.45.144.63
54.81.134.249
52.22.161.231
52.3.77.232
3.214.191.130
34.233.79.135 | ✅ | ✅ | ✅ | -| North America Cell 1 [^1] | AWS us-east-1 (N. Virginia) | {account prefix}.us1.dbt.com | 52.45.144.63
54.81.134.249
52.22.161.231
52.3.77.232
3.214.191.130
34.233.79.135 | ❌ | ❌ | ✅ | +| North America Cell 1 [^1] | AWS us-east-1 (N. Virginia) | {account prefix}.us1.dbt.com | 52.45.144.63
54.81.134.249
52.22.161.231
52.3.77.232
3.214.191.130
34.233.79.135 | ✅ | ❌ | ✅ | | EMEA [^1] | AWS eu-central-1 (Frankfurt) | emea.dbt.com | 3.123.45.39
3.126.140.248
3.72.153.148 | ❌ | ❌ | ✅ | | APAC [^1] | AWS ap-southeast-2 (Sydney)| au.dbt.com | 52.65.89.235
3.106.40.33
13.239.155.206
| ❌ | ❌ | ✅ | | Virtual Private dbt or Single tenant | Customized | Customized | Ask [Support](/community/resources/getting-help#dbt-cloud-support) for your IPs | ❌ | ❌ | ✅ | diff --git a/website/docs/docs/cloud/cloud-cli-installation.md b/website/docs/docs/cloud/cloud-cli-installation.md index f3294477611..8d2196696aa 100644 --- a/website/docs/docs/cloud/cloud-cli-installation.md +++ b/website/docs/docs/cloud/cloud-cli-installation.md @@ -26,7 +26,7 @@ dbt commands are run against dbt Cloud's infrastructure and benefit from: The dbt Cloud CLI is available in all [deployment regions](/docs/cloud/about-cloud/regions-ip-addresses) and for both multi-tenant and single-tenant accounts (Azure single-tenant not supported at this time). - Ensure you are using dbt version 1.5 or higher. Refer to [dbt Cloud versions](/docs/dbt-versions/upgrade-core-in-cloud) to upgrade. -- Note that SSH tunneling for [Postgres and Redshift](/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb) connections and [Single sign-on (SSO)](/docs/cloud/manage-access/sso-overview) doesn't support the dbt Cloud CLI yet. +- Note that SSH tunneling for [Postgres and Redshift](/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb) connections doesn't support the dbt Cloud CLI yet. ## Install dbt Cloud CLI diff --git a/website/docs/docs/cloud/manage-access/audit-log.md b/website/docs/docs/cloud/manage-access/audit-log.md index b90bceef570..774400529e9 100644 --- a/website/docs/docs/cloud/manage-access/audit-log.md +++ b/website/docs/docs/cloud/manage-access/audit-log.md @@ -34,7 +34,7 @@ On the audit log page, you will see a list of various events and their associate Click the event card to see the details about the activity that triggered the event. This view provides important details, including when it happened and what type of event was triggered. For example, if someone changes the settings for a job, you can use the event details to see which job was changed (type of event: `v1.events.job_definition.Changed`), by whom (person who triggered the event: `actor`), and when (time it was triggered: `created_at_utc`). For types of events and their descriptions, see [Events in audit log](#events-in-audit-log). -The event details provides the key factors of an event: +The event details provide the key factors of an event: | Name | Description | | -------------------- | --------------------------------------------- | @@ -160,16 +160,22 @@ The audit log supports various events for different objects in dbt Cloud. You wi You can search the audit log to find a specific event or actor, which is limited to the ones listed in [Events in audit log](#events-in-audit-log). The audit log successfully lists historical events spanning the last 90 days. You can search for an actor or event using the search bar, and then narrow your results using the time window. - + ## Exporting logs You can use the audit log to export all historical audit results for security, compliance, and analysis purposes: -- For events within 90 days — dbt Cloud will automatically display the 90-day selectable date range. Select **Export Selection** to download a CSV file of all the events that occurred in your organization within 90 days. -- For events beyond 90 days — Select **Export All**. The Account Admin will receive an email link to download a CSV file of all the events that occurred in your organization. +- **For events within 90 days** — dbt Cloud will automatically display the 90-day selectable date range. 
Select **Export Selection** to download a CSV file of all the events that occurred in your organization within 90 days. - +- **For events beyond 90 days** — Select **Export All**. The Account Admin will receive an email link to download a CSV file of all the events that occurred in your organization. + +### Azure Single-tenant + +For users deployed in [Azure single tenant](/docs/cloud/about-cloud/tenancy), while the **Export All** button isn't available, you can conveniently use specific APIs to access all events: + +- [Get recent audit log events CSV](/dbt-cloud/api-v3#/operations/Get%20Recent%20Audit%20Log%20Events%20CSV) — This API returns all events in a single CSV without pagination. +- [List recent audit log events](/dbt-cloud/api-v3#/operations/List%20Recent%20Audit%20Log%20Events) — This API returns a limited number of events at a time, which means you will need to paginate the results. diff --git a/website/docs/docs/cloud/manage-access/sso-overview.md b/website/docs/docs/cloud/manage-access/sso-overview.md index f613df7907e..b4954955c8c 100644 --- a/website/docs/docs/cloud/manage-access/sso-overview.md +++ b/website/docs/docs/cloud/manage-access/sso-overview.md @@ -57,8 +57,9 @@ Non-admin users that currently login with a password will no longer be able to d ### Security best practices There are a few scenarios that might require you to login with a password. We recommend these security best-practices for the two most common scenarios: -* **Onboarding partners and contractors** - We highly recommend that you add partners and contractors to your Identity Provider. IdPs like Okta and Azure Active Directory (AAD) offer capabilities explicitly for temporary employees. We highly recommend that you reach out to your IT team to provision an SSO license for these situations. Using an IdP highly secure, reduces any breach risk, and significantly increases the security posture of your dbt Cloud environment. -* **Identity Provider is down -** Account admins will continue to be able to log in with a password which would allow them to work with your Identity Provider to troubleshoot the problem. +* **Onboarding partners and contractors** — We highly recommend that you add partners and contractors to your Identity Provider. IdPs like Okta and Azure Active Directory (AAD) offer capabilities explicitly for temporary employees. We highly recommend that you reach out to your IT team to provision an SSO license for these situations. Using an IdP highly secure, reduces any breach risk, and significantly increases the security posture of your dbt Cloud environment. +* **Identity Provider is down** — Account admins will continue to be able to log in with a password which would allow them to work with your Identity Provider to troubleshoot the problem. +* **Offboarding admins** — When offboarding admins, revoke access to dbt Cloud by deleting the user from your environment; otherwise, they can continue to use username/password credentials to log in. ### Next steps for non-admin users currently logging in with passwords @@ -67,4 +68,5 @@ If you have any non-admin users logging into dbt Cloud with a password today: 1. Ensure that all users have a user account in your identity provider and are assigned dbt Cloud so they won’t lose access. 2. Alert all dbt Cloud users that they won’t be able to use a password for logging in anymore unless they are already an Admin with a password. 3. 
We **DO NOT** recommend promoting any users to Admins just to preserve password-based logins because you will reduce security of your dbt Cloud environment. -** + + diff --git a/website/docs/docs/collaborate/documentation.md b/website/docs/docs/collaborate/documentation.md index 16a4e610c70..1a989806851 100644 --- a/website/docs/docs/collaborate/documentation.md +++ b/website/docs/docs/collaborate/documentation.md @@ -15,7 +15,7 @@ pagination_prev: null ## Assumed knowledge -* [Tests](/docs/build/tests) +* [Tests](/docs/build/data-tests) ## Overview @@ -32,7 +32,7 @@ Here's an example docs site: ## Adding descriptions to your project -To add descriptions to your project, use the `description:` key in the same files where you declare [tests](/docs/build/tests), like so: +To add descriptions to your project, use the `description:` key in the same files where you declare [tests](/docs/build/data-tests), like so: diff --git a/website/docs/docs/collaborate/explore-projects.md b/website/docs/docs/collaborate/explore-projects.md index 78fe6f45cc7..ed5dee93317 100644 --- a/website/docs/docs/collaborate/explore-projects.md +++ b/website/docs/docs/collaborate/explore-projects.md @@ -149,7 +149,7 @@ An example of the details you might get for a model: - **Lineage** graph — The model’s lineage graph that you can interact with. The graph includes one parent node and one child node from the model. Click the Expand icon in the graph's upper right corner to view the model in full lineage graph mode. - **Description** section — A [description of the model](/docs/collaborate/documentation#adding-descriptions-to-your-project). - **Recent** section — Information on the last time the model ran, how long it ran for, whether the run was successful, the job ID, and the run ID. - - **Tests** section — [Tests](/docs/build/tests) for the model, including a status indicator for the latest test status. A :white_check_mark: denotes a passing test. + - **Tests** section — [Tests](/docs/build/data-tests) for the model, including a status indicator for the latest test status. A :white_check_mark: denotes a passing test. - **Details** section — Key properties like the model’s relation name (for example, how it’s represented and how you can query it in the data platform: `database.schema.identifier`); model governance attributes like access, group, and if contracted; and more. - **Relationships** section — The nodes the model **Depends On**, is **Referenced by**, and (if applicable) is **Used by** for projects that have declared the models' project as a dependency. - **Code** tab — The source code and compiled code for the model. diff --git a/website/docs/docs/collaborate/govern/model-contracts.md b/website/docs/docs/collaborate/govern/model-contracts.md index 342d86c1a77..8e7598f8e3b 100644 --- a/website/docs/docs/collaborate/govern/model-contracts.md +++ b/website/docs/docs/collaborate/govern/model-contracts.md @@ -183,9 +183,9 @@ Any model meeting the criteria described above _can_ define a contract. We recom A model's contract defines the **shape** of the returned dataset. If the model's logic or input data doesn't conform to that shape, the model does not build. -[Tests](/docs/build/tests) are a more flexible mechanism for validating the content of your model _after_ it's built. So long as you can write the query, you can run the test. Tests are more configurable, such as with [custom severity thresholds](/reference/resource-configs/severity). 
They are easier to debug after finding failures, because you can query the already-built model, or [store the failing records in the data warehouse](/reference/resource-configs/store_failures). +[Data Tests](/docs/build/data-tests) are a more flexible mechanism for validating the content of your model _after_ it's built. So long as you can write the query, you can run the data test. Data tests are more configurable, such as with [custom severity thresholds](/reference/resource-configs/severity). They are easier to debug after finding failures, because you can query the already-built model, or [store the failing records in the data warehouse](/reference/resource-configs/store_failures). -In some cases, you can replace a test with its equivalent constraint. This has the advantage of guaranteeing the validation at build time, and it probably requires less compute (cost) in your data platform. The prerequisites for replacing a test with a constraint are: +In some cases, you can replace a data test with its equivalent constraint. This has the advantage of guaranteeing the validation at build time, and it probably requires less compute (cost) in your data platform. The prerequisites for replacing a data test with a constraint are: - Making sure that your data platform can support and enforce the constraint that you need. Most platforms only enforce `not_null`. - Materializing your model as `table` or `incremental` (**not** `view` or `ephemeral`). - Defining a full contract for this model by specifying the `name` and `data_type` of each column. diff --git a/website/docs/docs/core/connect-data-platform/infer-setup.md b/website/docs/docs/core/connect-data-platform/infer-setup.md index 7642c553cc4..c04fba59a56 100644 --- a/website/docs/docs/core/connect-data-platform/infer-setup.md +++ b/website/docs/docs/core/connect-data-platform/infer-setup.md @@ -12,16 +12,21 @@ meta: slack_channel_name: n/a slack_channel_link: platform_name: 'Infer' - config_page: '/reference/resource-configs/no-configs' + config_page: '/reference/resource-configs/infer-configs' min_supported_version: n/a --- +:::info Vendor-supported plugin + +Certain core functionality may vary. If you would like to report a bug, request a feature, or contribute, you can check out the linked repository and open an issue. + +::: + import SetUpPages from '/snippets/_setup-pages-intro.md'; - ## Connecting to Infer with **dbt-infer** Infer allows you to perform advanced ML Analytics within SQL as if native to your data warehouse. @@ -30,10 +35,18 @@ you can build advanced analysis for any business use case. Read more about SQL-inf and Infer in the [Infer documentation](https://docs.getinfer.io/). The `dbt-infer` package allow you to use SQL-inf easily within your DBT models. -You can read more about the `dbt-infer` package itself and how it connecst to Infer in the [dbt-infer documentation](https://dbt.getinfer.io/). +You can read more about the `dbt-infer` package itself and how it connects to Infer in the [dbt-infer documentation](https://dbt.getinfer.io/). + +The dbt-infer adapter is distributed via PyPI and installed with pip. +To install the latest dbt-infer package, simply run the following within the same shell as you run dbt. +```bash +pip install dbt-infer +``` + +Versioning of dbt-infer follows the standard dbt versioning scheme - meaning if you are using dbt 1.2 the corresponding dbt-infer will be named 1.2.x where x is the latest minor version number.
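The versioning note just added above suggests pinning the adapter to the same minor series as your dbt installation; a hedged example of doing that with pip (the version numbers are illustrative):

```bash
# Pin dbt-infer to the 1.2.x series to match a dbt 1.2 installation (example versions)
pip install "dbt-infer==1.2.*"
```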
Before using SQL-inf in your DBT models you need to setup an Infer account and generate an API-key for the connection. -You can read how to do that in the [Getting Started Guide](https://dbt.getinfer.io/docs/getting_started#sign-up-to-infer). +You can read how to do that in the [Getting Started Guide](https://docs.getinfer.io/docs/reference/integrations/dbt). The profile configuration in `profiles.yml` for `dbt-infer` should look something like this: @@ -101,10 +114,10 @@ as native SQL functions. Infer supports a number of SQL-inf commands, including `PREDICT`, `EXPLAIN`, `CLUSTER`, `SIMILAR_TO`, `TOPICS`, `SENTIMENT`. -You can read more about SQL-inf and the commands it supports in the [SQL-inf Reference Guide](https://docs.getinfer.io/docs/reference). +You can read more about SQL-inf and the commands it supports in the [SQL-inf Reference Guide](https://docs.getinfer.io/docs/category/commands). To get you started we will give a brief example here of what such a model might look like. -You can find other more complex examples on the [dbt-infer examples page](https://dbt.getinfer.io/docs/examples). +You can find other more complex examples in the [dbt-infer examples repo](https://github.com/inferlabs/dbt-infer-examples). In our simple example, we will show how to use a previous model 'user_features' to predict churn by predicting the column `has_churned`. diff --git a/website/docs/docs/dbt-cloud-apis/schema-discovery-job.mdx b/website/docs/docs/dbt-cloud-apis/schema-discovery-job.mdx index 8b02c5601ad..faebcd9ec2c 100644 --- a/website/docs/docs/dbt-cloud-apis/schema-discovery-job.mdx +++ b/website/docs/docs/dbt-cloud-apis/schema-discovery-job.mdx @@ -61,4 +61,4 @@ query JobQueryExample { ### Fields When querying an `job`, you can use the following fields. - + diff --git a/website/docs/docs/dbt-cloud-apis/schema.jsx b/website/docs/docs/dbt-cloud-apis/schema.jsx index 31568671573..9ac4656c984 100644 --- a/website/docs/docs/dbt-cloud-apis/schema.jsx +++ b/website/docs/docs/dbt-cloud-apis/schema.jsx @@ -173,7 +173,7 @@ export const NodeArgsTable = ({ parent, name, useBetaAPI }) => { ) } -export const SchemaTable = ({ nodeName, useBetaAPI }) => { +export const SchemaTable = ({ nodeName, useBetaAPI, exclude = [] }) => { const [data, setData] = useState(null) useEffect(() => { const fetchData = () => { @@ -255,6 +255,7 @@ export const SchemaTable = ({ nodeName, useBetaAPI }) => { {data.data.__type.fields.map(function ({ name, description, type }) { + if (exclude.includes(name)) return; return ( {name} diff --git a/website/docs/docs/dbt-cloud-apis/service-tokens.md b/website/docs/docs/dbt-cloud-apis/service-tokens.md index f1369711d2b..b0b5fbd6cfe 100644 --- a/website/docs/docs/dbt-cloud-apis/service-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/service-tokens.md @@ -51,7 +51,7 @@ Job admin service tokens can authorize requests for viewing, editing, and creati Member service tokens can authorize requests for viewing and editing resources, triggering runs, and inviting members to the account. Tokens assigned the Member permission set will have the same permissions as a Member user. For more information about Member users, see "[Self-service permissions](/docs/cloud/manage-access/self-service-permissions)". **Read-only**
-Read-only service tokens can authorize requests for viewing a read-only dashboard, viewing generated documentation, and viewing source freshness reports. +Read-only service tokens can authorize requests for viewing a read-only dashboard, viewing generated documentation, and viewing source freshness reports. This token can access and retrieve account-level information endpoints on the [Admin API](/docs/dbt-cloud-apis/admin-cloud-api) and authorize requests to the [Discovery API](/docs/dbt-cloud-apis/discovery-api). ### Enterprise plans using service account tokens diff --git a/website/docs/docs/dbt-support.md b/website/docs/docs/dbt-support.md index 40968b9d763..84bf92482c5 100644 --- a/website/docs/docs/dbt-support.md +++ b/website/docs/docs/dbt-support.md @@ -17,7 +17,9 @@ If you're developing on the command line (CLI) and have questions or need some h ## dbt Cloud support -The global dbt Support team is available to dbt Cloud customers by email or in-product live chat. We want to help you work through implementing and utilizing dbt Cloud at your organization. Have a question you can't find an answer to in [our docs](https://docs.getdbt.com/) or [the Community Forum](https://discourse.getdbt.com/)? Our Support team is here to `dbt help` you! +The global dbt Support team is available to dbt Cloud customers by [email](mailto:support@getdbt.com) or using the in-product live chat (💬). + +We want to help you work through implementing and utilizing dbt Cloud at your organization. Have a question you can't find an answer to in [our docs](https://docs.getdbt.com/) or [the Community Forum](https://discourse.getdbt.com/)? Our Support team is here to `dbt help` you! - **Enterprise plans** — Priority [support](#severity-level-for-enterprise-support), options for custom support coverage hours, implementation assistance, dedicated management, and dbt Labs security reviews depending on price point. - **Developer and Team plans** — 24x5 support (no service level agreement (SLA); [contact Sales](https://www.getdbt.com/pricing/) for Enterprise plan inquiries). diff --git a/website/docs/docs/dbt-versions/core-upgrade/03-upgrading-to-dbt-utils-v1.0.md b/website/docs/docs/dbt-versions/core-upgrade/03-upgrading-to-dbt-utils-v1.0.md index a7b302c9a58..229a54627fc 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/03-upgrading-to-dbt-utils-v1.0.md +++ b/website/docs/docs/dbt-versions/core-upgrade/03-upgrading-to-dbt-utils-v1.0.md @@ -82,7 +82,7 @@ models: # ...with this... where: "created_at > '2018-12-31'" ``` -**Note** — This may cause some tests to get the same autogenerated names. To resolve this, you can [define a custom name for a test](/reference/resource-properties/tests#define-a-custom-name-for-one-test). +**Note** — This may cause some tests to get the same autogenerated names. To resolve this, you can [define a custom name for a test](/reference/resource-properties/data-tests#define-a-custom-name-for-one-test). - The deprecated `unique_where` and `not_null_where` tests have been removed, because [where is now available natively to all tests](https://docs.getdbt.com/reference/resource-configs/where). To migrate, find and replace `dbt_utils.unique_where` with `unique` and `dbt_utils.not_null_where` with `not_null`. - `dbt_utils.current_timestamp()` is replaced by `dbt.current_timestamp()`. 
- Note that Postgres and Snowflake’s implementation of `dbt.current_timestamp()` differs from the old `dbt_utils` one ([full details here](https://github.com/dbt-labs/dbt-utils/pull/597#issuecomment-1231074577)). If you use Postgres or Snowflake and need identical backwards-compatible behavior, use `dbt.current_timestamp_backcompat()`. This discrepancy will hopefully be reconciled in a future version of dbt Core. diff --git a/website/docs/docs/dbt-versions/core-upgrade/07-upgrading-to-v1.1.md b/website/docs/docs/dbt-versions/core-upgrade/07-upgrading-to-v1.1.md index 12f0f42354a..868f3c7ed04 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/07-upgrading-to-v1.1.md +++ b/website/docs/docs/dbt-versions/core-upgrade/07-upgrading-to-v1.1.md @@ -43,7 +43,7 @@ Expected a schema version of "https://schemas.getdbt.com/dbt/manifest/v5.json" i [**Incremental models**](/docs/build/incremental-models) can now accept a list of multiple columns as their `unique_key`, for models that need a combination of columns to uniquely identify each row. This is supported by the most common data warehouses, for incremental strategies that make use of the `unique_key` config (`merge` and `delete+insert`). -[**Generic tests**](/reference/resource-properties/tests) can define custom names. This is useful to "prettify" the synthetic name that dbt applies automatically. It's needed to disambiguate the case when the same generic test is defined multiple times with different configurations. +[**Generic tests**](/reference/resource-properties/data-tests) can define custom names. This is useful to "prettify" the synthetic name that dbt applies automatically. It's needed to disambiguate the case when the same generic test is defined multiple times with different configurations. [**Sources**](/reference/source-properties) can define configuration inline with other `.yml` properties, just like other resource types. The only supported config is `enabled`; you can use this to dynamically enable/disable sources based on environment or package variables. diff --git a/website/docs/docs/dbt-versions/core-upgrade/08-upgrading-to-v1.0.md b/website/docs/docs/dbt-versions/core-upgrade/08-upgrading-to-v1.0.md index 6e437638ef6..0460186551d 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/08-upgrading-to-v1.0.md +++ b/website/docs/docs/dbt-versions/core-upgrade/08-upgrading-to-v1.0.md @@ -34,7 +34,7 @@ dbt Core major version 1.0 includes a number of breaking changes! Wherever possi ### Tests -The two **test types** are now "singular" and "generic" (instead of "data" and "schema", respectively). The `test_type:` selection method accepts `test_type:singular` and `test_type:generic`. (It will also accept `test_type:schema` and `test_type:data` for backwards compatibility.) **Not backwards compatible:** The `--data` and `--schema` flags to dbt test are no longer supported, and tests no longer have the tags `'data'` and `'schema'` automatically applied. Updated docs: [tests](/docs/build/tests), [test selection](/reference/node-selection/test-selection-examples), [selection methods](/reference/node-selection/methods). +The two **test types** are now "singular" and "generic" (instead of "data" and "schema", respectively). The `test_type:` selection method accepts `test_type:singular` and `test_type:generic`. (It will also accept `test_type:schema` and `test_type:data` for backwards compatibility.) 
**Not backwards compatible:** The `--data` and `--schema` flags to dbt test are no longer supported, and tests no longer have the tags `'data'` and `'schema'` automatically applied. Updated docs: [tests](/docs/build/data-tests), [test selection](/reference/node-selection/test-selection-examples), [selection methods](/reference/node-selection/methods). The `greedy` flag/property has been renamed to **`indirect_selection`**, which is now eager by default. **Note:** This reverts test selection to its pre-v0.20 behavior by default. `dbt test -s my_model` _will_ select multi-parent tests, such as `relationships`, that depend on unselected resources. To achieve the behavior change in v0.20 + v0.21, set `--indirect-selection=cautious` on the CLI or `indirect_selection: cautious` in YAML selectors. Updated docs: [test selection examples](/reference/node-selection/test-selection-examples), [yaml selectors](/reference/node-selection/yaml-selectors). diff --git a/website/docs/docs/dbt-versions/core-upgrade/10-upgrading-to-v0.20.md b/website/docs/docs/dbt-versions/core-upgrade/10-upgrading-to-v0.20.md index 9ff5695d5dc..be6054087b3 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/10-upgrading-to-v0.20.md +++ b/website/docs/docs/dbt-versions/core-upgrade/10-upgrading-to-v0.20.md @@ -29,9 +29,9 @@ dbt Core v0.20 has reached the end of critical support. No new patch versions wi ### Tests -- [Building a dbt Project: tests](/docs/build/tests) -- [Test Configs](/reference/test-configs) -- [Test properties](/reference/resource-properties/tests) +- [Building a dbt Project: tests](/docs/build/data-tests) +- [Test Configs](/reference/data-test-configs) +- [Test properties](/reference/resource-properties/data-tests) - [Node Selection](/reference/node-selection/syntax) (with updated [test selection examples](/reference/node-selection/test-selection-examples)) - [Writing custom generic tests](/best-practices/writing-custom-generic-tests) diff --git a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/upgrading-to-0-14-0.md b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/upgrading-to-0-14-0.md index 036a9a2aedf..48aa14a42e5 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/upgrading-to-0-14-0.md +++ b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/upgrading-to-0-14-0.md @@ -189,7 +189,7 @@ models:
-**Configuring the `incremental_strategy for a single model:** +**Configuring the `incremental_strategy` for a single model:** diff --git a/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md b/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md new file mode 100644 index 00000000000..25791b66fb1 --- /dev/null +++ b/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md @@ -0,0 +1,16 @@ +--- +title: "Update: Extended attributes is GA" +description: "December 2023: The extended attributes feature is now GA in dbt Cloud. It enables you to override dbt adapter YAML attributes at the environment level." +sidebar_label: "Update: Extended attributes is GA" +sidebar_position: 10 +tags: [Dec-2023] +date: 2023-12-06 +--- + +The extended attributes feature in dbt Cloud is now GA! It allows for an environment-level override on any YAML attribute that a dbt adapter accepts in its `profiles.yml`. You can provide a YAML snippet to add or replace any [profile](/docs/core/connect-data-platform/profiles.yml) value. + +To learn more, refer to [Extended attributes](/docs/dbt-cloud-environments#extended-attributes). + +The **Extended Attributes** text box is available from your environment's settings page: + + diff --git a/website/docs/docs/deploy/deploy-jobs.md b/website/docs/docs/deploy/deploy-jobs.md index e43020bf66e..cee6e245359 100644 --- a/website/docs/docs/deploy/deploy-jobs.md +++ b/website/docs/docs/deploy/deploy-jobs.md @@ -84,15 +84,22 @@ To fully customize the scheduling of your job, choose the **Custom cron schedule Use tools such as [crontab.guru](https://crontab.guru/) to generate the correct cron syntax. This tool allows you to input cron snippets and returns their plain English translations. -Refer to the following example snippets: +Here are examples of cron job schedules. The dbt Cloud job scheduler supports using `L` to schedule jobs on the last day of the month: -- `0 * * * *`: Every hour, at minute 0 -- `*/5 * * * *`: Every 5 minutes -- `5 4 * * *`: At exactly 4:05 AM UTC -- `30 */4 * * *`: At minute 30 past every 4th hour (e.g. 4:30AM, 8:30AM, 12:30PM, etc., all UTC) -- `0 0 */2 * *`: At midnight UTC every other day +- `0 * * * *`: Every hour, at minute 0. +- `*/5 * * * *`: Every 5 minutes. +- `5 4 * * *`: At exactly 4:05 AM UTC. +- `30 */4 * * *`: At minute 30 past every 4th hour (such as 4:30 AM, 8:30 AM, 12:30 PM, and so on, all UTC). +- `0 0 */2 * *`: At 12:00 AM (midnight) UTC every other day. - `0 0 * * 1`: At midnight UTC every Monday. +- `0 0 L * *`: At 12:00 AM (midnight), on the last day of the month. +- `0 0 L 1,2,3,4,5,6,8,9,10,11,12 *`: At 12:00 AM, on the last day of the month, only in January, February, March, April, May, June, August, September, October, November, and December. +- `0 0 L 7 *`: At 12:00 AM, on the last day of the month, only in July. +- `0 0 L * FRI,SAT`: At 12:00 AM, on the last day of the month, and on Friday and Saturday. +- `0 12 L * *`: At 12:00 PM (afternoon), on the last day of the month. +- `0 7 L * 5`: At 07:00 AM, on the last day of the month, and on Friday. +- `30 14 L * *`: At 02:30 PM, on the last day of the month. ## Related docs diff --git a/website/docs/docs/introduction.md b/website/docs/docs/introduction.md index c575a9ae657..08564aeb2f0 100644 --- a/website/docs/docs/introduction.md +++ b/website/docs/docs/introduction.md @@ -56,7 +56,7 @@ As a dbt user, your main focus will be on writing models (i.e.
select queries) t | Use a code compiler | SQL files can contain Jinja, a lightweight templating language. Using Jinja in SQL provides a way to use control structures in your queries. For example, `if` statements and `for` loops. It also enables repeated SQL to be shared through `macros`. Read more about [Macros](/docs/build/jinja-macros).| | Determine the order of model execution | Often, when transforming data, it makes sense to do so in a staged approach. dbt provides a mechanism to implement transformations in stages through the [ref function](/reference/dbt-jinja-functions/ref). Rather than selecting from existing tables and views in your warehouse, you can select from another model.| | Document your dbt project | dbt provides a mechanism to write, version-control, and share documentation for your dbt models. You can write descriptions (in plain text or markdown) for each model and field. In dbt Cloud, you can auto-generate the documentation when your dbt project runs. Read more about the [Documentation](/docs/collaborate/documentation).| -| Test your models | Tests provide a way to improve the integrity of the SQL in each model by making assertions about the results generated by a model. Read more about writing tests for your models [Testing](/docs/build/tests)| +| Test your models | Tests provide a way to improve the integrity of the SQL in each model by making assertions about the results generated by a model. Read more about writing tests for your models [Testing](/docs/build/data-tests)| | Manage packages | dbt ships with a package manager, which allows analysts to use and publish both public and private repositories of dbt code which can then be referenced by others. Read more about [Package Management](/docs/build/packages). | | Load seed files| Often in analytics, raw values need to be mapped to a more readable value (for example, converting a country-code to a country name) or enriched with static or infrequently changing data. These data sources, known as seed files, can be saved as a CSV file in your `project` and loaded into your data warehouse using the `seed` command. Read more about [Seeds](/docs/build/seeds).| | Snapshot data | Often, records in a data source are mutable, in that they change over time. This can be difficult to handle in analytics if you want to reconstruct historic values. dbt provides a mechanism to snapshot raw data for a point in time, through use of [snapshots](/docs/build/snapshots).| diff --git a/website/docs/faqs/Models/specifying-column-types.md b/website/docs/faqs/Models/specifying-column-types.md index 8e8379c4ec1..904c616d89a 100644 --- a/website/docs/faqs/Models/specifying-column-types.md +++ b/website/docs/faqs/Models/specifying-column-types.md @@ -38,6 +38,6 @@ So long as your model queries return the correct column type, the table you crea To define additional column options: -* Rather than enforcing uniqueness and not-null constraints on your column, use dbt's [testing](/docs/build/tests) functionality to check that your assertions about your model hold true. +* Rather than enforcing uniqueness and not-null constraints on your column, use dbt's [data testing](/docs/build/data-tests) functionality to check that your assertions about your model hold true. * Rather than creating default values for a column, use SQL to express defaults (e.g. `coalesce(updated_at, current_timestamp()) as updated_at`) * In edge-cases where you _do_ need to alter a column (e.g. 
column-level encoding on Redshift), consider implementing this via a [post-hook](/reference/resource-configs/pre-hook-post-hook). diff --git a/website/docs/faqs/Project/properties-not-in-config.md b/website/docs/faqs/Project/properties-not-in-config.md index d1aea32b687..76de58404a9 100644 --- a/website/docs/faqs/Project/properties-not-in-config.md +++ b/website/docs/faqs/Project/properties-not-in-config.md @@ -16,7 +16,7 @@ Certain properties are special, because: These properties are: - [`description`](/reference/resource-properties/description) -- [`tests`](/reference/resource-properties/tests) +- [`tests`](/reference/resource-properties/data-tests) - [`docs`](/reference/resource-configs/docs) - `columns` - [`quote`](/reference/resource-properties/quote) diff --git a/website/docs/faqs/Tests/available-tests.md b/website/docs/faqs/Tests/available-tests.md index f08e6841bd0..2b5fd3ff55c 100644 --- a/website/docs/faqs/Tests/available-tests.md +++ b/website/docs/faqs/Tests/available-tests.md @@ -12,6 +12,6 @@ Out of the box, dbt ships with the following tests: * `accepted_values` * `relationships` (i.e. referential integrity) -You can also write your own [custom schema tests](/docs/build/tests). +You can also write your own [custom schema data tests](/docs/build/data-tests). Some additional custom schema tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils/tree/0.2.4/#schema-tests), check out the docs on [packages](/docs/build/packages) to learn how to make these tests available in your project. diff --git a/website/docs/faqs/Tests/custom-test-thresholds.md b/website/docs/faqs/Tests/custom-test-thresholds.md index 34d2eec7494..400a5b4e28b 100644 --- a/website/docs/faqs/Tests/custom-test-thresholds.md +++ b/website/docs/faqs/Tests/custom-test-thresholds.md @@ -10,5 +10,5 @@ As of `v0.20.0`, you can use the `error_if` and `warn_if` configs to set custom For dbt `v0.19.0` and earlier, you could try these possible solutions: -* Setting the [severity](/reference/resource-properties/tests#severity) to `warn`, or: +* Setting the [severity](/reference/resource-properties/data-tests#severity) to `warn`, or: * Writing a [custom generic test](/best-practices/writing-custom-generic-tests) that accepts a threshold argument ([example](https://discourse.getdbt.com/t/creating-an-error-threshold-for-schema-tests/966)) diff --git a/website/docs/guides/airflow-and-dbt-cloud.md b/website/docs/guides/airflow-and-dbt-cloud.md index a3ff59af14e..e7f754ef02d 100644 --- a/website/docs/guides/airflow-and-dbt-cloud.md +++ b/website/docs/guides/airflow-and-dbt-cloud.md @@ -11,50 +11,29 @@ recently_updated: true ## Introduction -In some cases, [Airflow](https://airflow.apache.org/) may be the preferred orchestrator for your organization over working fully within dbt Cloud. 
There are a few reasons your team might be considering using Airflow to orchestrate your dbt jobs: - -- Your team is already using Airflow to orchestrate other processes -- Your team needs to ensure that a [dbt job](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview#schedule-and-run-dbt-jobs-in-production) kicks off before or after another process outside of dbt Cloud -- Your team needs flexibility to manage more complex scheduling, such as kicking off one dbt job only after another has completed -- Your team wants to own their own orchestration solution -- You need code to work right now without starting from scratch - -### Prerequisites - -- [dbt Cloud Teams or Enterprise account](https://www.getdbt.com/pricing/) (with [admin access](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions)) in order to create a service token. Permissions for service tokens can be found [here](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens#permissions-for-service-account-tokens). -- A [free Docker account](https://hub.docker.com/signup) in order to sign in to Docker Desktop, which will be installed in the initial setup. -- A local digital scratchpad for temporarily copy-pasting API keys and URLs - -### Airflow + dbt Core - -There are [so many great examples](https://gitlab.com/gitlab-data/analytics/-/blob/master/dags/transformation/dbt_snowplow_backfill.py) from GitLab through their open source data engineering work. This is especially appropriate if you are well-versed in Kubernetes, CI/CD, and docker task management when building your airflow pipelines. If this is you and your team, you’re in good hands reading through more details [here](https://about.gitlab.com/handbook/business-technology/data-team/platform/infrastructure/#airflow) and [here](https://about.gitlab.com/handbook/business-technology/data-team/platform/dbt-guide/). - -### Airflow + dbt Cloud API w/Custom Scripts - -This has served as a bridge until the fabled Astronomer + dbt Labs-built dbt Cloud provider became generally available [here](https://registry.astronomer.io/providers/dbt%20Cloud/versions/latest). - -There are many different permutations of this over time: - -- [Custom Python Scripts](https://github.com/sungchun12/airflow-dbt-cloud/blob/main/archive/dbt_cloud_example.py): This is an airflow DAG based on [custom python API utilities](https://github.com/sungchun12/airflow-dbt-cloud/blob/main/archive/dbt_cloud_utils.py) -- [Make API requests directly through the BashOperator based on the docs](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#operation/triggerRun): You can make cURL requests to invoke dbt Cloud to do what you want -- For more options, check out the [official dbt Docs](/docs/deploy/deployments#airflow) on the various ways teams are running dbt in airflow - -These solutions are great, but can be difficult to trust as your team grows and management for things like: testing, job definitions, secrets, and pipelines increase past your team’s capacity. Roles become blurry (or were never clearly defined at the start!). Both data and analytics engineers start digging through custom logging within each other’s workflows to make heads or tails of where and what the issue really is. Not to mention that when the issue is found, it can be even harder to decide on the best path forward for safely implementing fixes. This complex workflow and unclear delineation on process management results in a lot of misunderstandings and wasted time just trying to get the process to work smoothly! 
+Many organizations already use [Airflow](https://airflow.apache.org/) to orchestrate their data workflows. dbt Cloud works great with Airflow, letting you execute your dbt code in dbt Cloud while keeping orchestration duties with Airflow. This ensures your project's metadata (important for tools like dbt Explorer) is available and up-to-date, while still enabling you to use Airflow for general tasks such as: +- Scheduling other processes outside of dbt runs +- Ensuring that a [dbt job](/docs/deploy/job-scheduler) kicks off before or after another process outside of dbt Cloud +- Triggering a dbt job only after another has completed In this guide, you'll learn how to: -1. Creating a working local Airflow environment -2. Invoking a dbt Cloud job with Airflow (with proof!) -3. Reusing tested and trusted Airflow code for your specific use cases +1. Create a working local Airflow environment +2. Invoke a dbt Cloud job with Airflow +3. Reuse tested and trusted Airflow code for your specific use cases You’ll also gain a better understanding of how this will: - Reduce the cognitive load when building and maintaining pipelines - Avoid dependency hell (think: `pip install` conflicts) -- Implement better recoveries from failures -- Define clearer workflows so that data and analytics engineers work better, together ♥️ +- Define clearer handoff of workflows between data engineers and analytics engineers + +## Prerequisites +- [dbt Cloud Teams or Enterprise account](https://www.getdbt.com/pricing/) (with [admin access](/docs/cloud/manage-access/enterprise-permissions)) in order to create a service token. Permissions for service tokens can be found [here](/docs/dbt-cloud-apis/service-tokens#permissions-for-service-account-tokens). +- A [free Docker account](https://hub.docker.com/signup) in order to sign in to Docker Desktop, which will be installed in the initial setup. +- A local digital scratchpad for temporarily copy-pasting API keys and URLs 🙌 Let’s get started! 🙌 @@ -72,7 +51,7 @@ brew install astro ## Install and start Docker Desktop -Docker allows us to spin up an environment with all the apps and dependencies we need for the example. +Docker allows us to spin up an environment with all the apps and dependencies we need for this guide. Follow the instructions [here](https://docs.docker.com/desktop/) to install Docker desktop for your own operating system. Once Docker is installed, ensure you have it up and running for the next steps. ## Clone the airflow-dbt-cloud repository -Open your terminal and clone the [airflow-dbt-cloud repository](https://github.com/sungchun12/airflow-dbt-cloud.git). This contains example Airflow DAGs that you’ll use to orchestrate your dbt Cloud job. Once cloned, navigate into the `airflow-dbt-cloud` project. +Open your terminal and clone the [airflow-dbt-cloud repository](https://github.com/sungchun12/airflow-dbt-cloud). This contains example Airflow DAGs that you’ll use to orchestrate your dbt Cloud job. Once cloned, navigate into the `airflow-dbt-cloud` project. ```bash git clone https://github.com/sungchun12/airflow-dbt-cloud.git @@ -91,12 +70,9 @@ cd airflow-dbt-cloud ``` ## Start the Docker container -You can initialize an Astronomer project in an empty local directory using a Docker container, and then run your project locally using the `start` command. - -1. Run the following commands to initialize your project and start your local Airflow deployment: +1.
From the `airflow-dbt-cloud` directory you cloned and opened in the prior step, run the following command to start your local Airflow deployment: ```bash - astro dev init astro dev start ``` @@ -110,10 +86,10 @@ You can initialize an Astronomer project in an empty local directory using a Doc Airflow Webserver: http://localhost:8080 Postgres Database: localhost:5432/postgres The default Airflow UI credentials are: admin:admin - The default Postrgres DB credentials are: postgres:postgres + The default Postgres DB credentials are: postgres:postgres ``` -2. Open the Airflow interface. Launch your web browser and navigate to the address for the **Airflow Webserver** from your output in Step 1. +2. Open the Airflow interface. Launch your web browser and navigate to the address for the **Airflow Webserver** from your output above (for us, `http://localhost:8080`). This will take you to your local instance of Airflow. You’ll need to log in with the **default credentials**: @@ -126,15 +102,15 @@ You can initialize an Astronomer project in an empty local directory using a Doc ## Create a dbt Cloud service token -Create a service token from within dbt Cloud using the instructions [found here](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens). Ensure that you save a copy of the token, as you won’t be able to access this later. In this example we use `Account Admin`, but you can also use `Job Admin` instead for token permissions. +[Create a service token](/docs/dbt-cloud-apis/service-tokens) with `Job Admin` privileges from within dbt Cloud. Ensure that you save a copy of the token, as you won’t be able to access this later. ## Create a dbt Cloud job -In your dbt Cloud account create a job, paying special attention to the information in the bullets below. Additional information for creating a dbt Cloud job can be found [here](/guides/bigquery). +[Create a job in your dbt Cloud account](/docs/deploy/deploy-jobs#create-and-schedule-jobs), paying special attention to the information in the bullets below. -- Configure the job with the commands that you want to include when this job kicks off, as Airflow will be referring to the job’s configurations for this rather than being explicitly coded in the Airflow DAG. This job will run a set of commands rather than a single command. +- Configure the job with the full commands that you want to include when this job kicks off. This sample code has Airflow triggering the dbt Cloud job and all of its commands, instead of explicitly identifying individual models to run from inside of Airflow. - Ensure that the schedule is turned **off** since we’ll be using Airflow to kick things off. - Once you hit `save` on the job, make sure you copy the URL and save it for referencing later. The url will look similar to this: @@ -144,77 +120,59 @@ https://cloud.getdbt.com/#/accounts/{account_id}/projects/{project_id}/jobs/{job -## Add your dbt Cloud API token as a secure connection - - +## Connect dbt Cloud to Airflow -Now you have all the working pieces to get up and running with Airflow + dbt Cloud. Let’s dive into make this all work together. We will **set up a connection** and **run a DAG in Airflow** that kicks off a dbt Cloud job. +Now you have all the working pieces to get up and running with Airflow + dbt Cloud. It's time to **set up a connection** and **run a DAG in Airflow** that kicks off a dbt Cloud job. -1. Navigate to Admin and click on **Connections** +1. 
From the Airflow interface, navigate to Admin and click on **Connections** ![Airflow connections menu](/img/guides/orchestration/airflow-and-dbt-cloud/airflow-connections-menu.png) 2. Click on the `+` sign to add a new connection, then click on the drop down to search for the dbt Cloud Connection Type - ![Create connection](/img/guides/orchestration/airflow-and-dbt-cloud/create-connection.png) - ![Connection type](/img/guides/orchestration/airflow-and-dbt-cloud/connection-type.png) 3. Add in your connection details and your default dbt Cloud account id. This is found in your dbt Cloud URL after the accounts route section (`/accounts/{YOUR_ACCOUNT_ID}`), for example the account with id 16173 would see this in their URL: `https://cloud.getdbt.com/#/accounts/16173/projects/36467/jobs/65767/` -![https://lh3.googleusercontent.com/sRxe5xbv_LYhIKblc7eiY7AmByr1OibOac2_fIe54rpU3TBGwjMpdi_j0EPEFzM1_gNQXry7Jsm8aVw9wQBSNs1I6Cyzpvijaj0VGwSnmVf3OEV8Hv5EPOQHrwQgK2RhNBdyBxN2](https://lh3.googleusercontent.com/sRxe5xbv_LYhIKblc7eiY7AmByr1OibOac2_fIe54rpU3TBGwjMpdi_j0EPEFzM1_gNQXry7Jsm8aVw9wQBSNs1I6Cyzpvijaj0VGwSnmVf3OEV8Hv5EPOQHrwQgK2RhNBdyBxN2) - -## Add your `job_id` and `account_id` config details to the python file + ![Connection type](/img/guides/orchestration/airflow-and-dbt-cloud/connection-type-configured.png) - Add your `job_id` and `account_id` config details to the python file: [dbt_cloud_provider_eltml.py](https://github.com/sungchun12/airflow-dbt-cloud/blob/main/dags/dbt_cloud_provider_eltml.py). +## Update the placeholders in the sample code -1. You’ll find these details within the dbt Cloud job URL, see the comments in the code snippet below for an example. + Add your `account_id` and `job_id` to the python file [dbt_cloud_provider_eltml.py](https://github.com/sungchun12/airflow-dbt-cloud/blob/main/dags/dbt_cloud_provider_eltml.py). - ```python - # dbt Cloud Job URL: https://cloud.getdbt.com/#/accounts/16173/projects/36467/jobs/65767/ - # account_id: 16173 - #job_id: 65767 +Both IDs are included inside of the dbt Cloud job URL as shown in the following snippets: - # line 28 - default_args={"dbt_cloud_conn_id": "dbt_cloud", "account_id": 16173}, +```python +# For the dbt Cloud Job URL https://cloud.getdbt.com/#/accounts/16173/projects/36467/jobs/65767/ +# The account_id is 16173 - trigger_dbt_cloud_job_run = DbtCloudRunJobOperator( - task_id="trigger_dbt_cloud_job_run", - job_id=65767, # line 39 - check_interval=10, - timeout=300, - ) - ``` - -2. Turn on the DAG and verify the job succeeded after running. Note: screenshots taken from different job runs, but the user experience is consistent. 
- - ![https://lh6.googleusercontent.com/p8AqQRy0UGVLjDGPmcuGYmQ_BRodyL0Zis-eQgSmp69EHbKW51o4S-bCl1fXHlOmwpYEBxD0A-O1Q1hwt-VDVMO1wWH-AIeaoelBx06JXRJ0m1OcHaPpFKH0xDiduIhNlQhhbLiy](https://lh6.googleusercontent.com/p8AqQRy0UGVLjDGPmcuGYmQ_BRodyL0Zis-eQgSmp69EHbKW51o4S-bCl1fXHlOmwpYEBxD0A-O1Q1hwt-VDVMO1wWH-AIeaoelBx06JXRJ0m1OcHaPpFKH0xDiduIhNlQhhbLiy) - - ![Airflow DAG](/img/guides/orchestration/airflow-and-dbt-cloud/airflow-dag.png) - - ![Task run instance](/img/guides/orchestration/airflow-and-dbt-cloud/task-run-instance.png) - - ![https://lh6.googleusercontent.com/S9QdGhLAdioZ3x634CChugsJRiSVtTTd5CTXbRL8ADA6nSbAlNn4zV0jb3aC946c8SGi9FRTfyTFXqjcM-EBrJNK5hQ0HHAsR5Fj7NbdGoUfBI7xFmgeoPqnoYpjyZzRZlXkjtxS](https://lh6.googleusercontent.com/S9QdGhLAdioZ3x634CChugsJRiSVtTTd5CTXbRL8ADA6nSbAlNn4zV0jb3aC946c8SGi9FRTfyTFXqjcM-EBrJNK5hQ0HHAsR5Fj7NbdGoUfBI7xFmgeoPqnoYpjyZzRZlXkjtxS) - -## How do I rerun the dbt Cloud job and downstream tasks in my pipeline? - -If you have worked with dbt Cloud before, you have likely encountered cases where a job fails. In those cases, you have likely logged into dbt Cloud, investigated the error, and then manually restarted the job. - -This section of the guide will show you how to restart the job directly from Airflow. This will specifically run *just* the `trigger_dbt_cloud_job_run` and downstream tasks of the Airflow DAG and not the entire DAG. If only the transformation step fails, you don’t need to re-run the extract and load processes. Let’s jump into how to do that in Airflow. - -1. Click on the task +# Update line 28 +default_args={"dbt_cloud_conn_id": "dbt_cloud", "account_id": 16173}, +``` - ![Task DAG view](/img/guides/orchestration/airflow-and-dbt-cloud/task-dag-view.png) +```python +# For the dbt Cloud Job URL https://cloud.getdbt.com/#/accounts/16173/projects/36467/jobs/65767/ +# The job_id is 65767 + +# Update line 39 +trigger_dbt_cloud_job_run = DbtCloudRunJobOperator( + task_id="trigger_dbt_cloud_job_run", + job_id=65767, + check_interval=10, + timeout=300, + ) +``` -2. Clear the task instance + - ![Clear task instance](/img/guides/orchestration/airflow-and-dbt-cloud/clear-task-instance.png) +## Run the Airflow DAG - ![Approve clearing](/img/guides/orchestration/airflow-and-dbt-cloud/approve-clearing.png) +Turn on the DAG and trigger it to run. Verify the job succeeded after running. -3. Watch it rerun in real time +![Airflow DAG](/img/guides/orchestration/airflow-and-dbt-cloud/airflow-dag.png) - ![Re-run](/img/guides/orchestration/airflow-and-dbt-cloud/re-run.png) +Click Monitor Job Run to open the run details in dbt Cloud. +![Task run instance](/img/guides/orchestration/airflow-and-dbt-cloud/task-run-instance.png) ## Cleaning up @@ -224,9 +182,9 @@ At the end of this guide, make sure you shut down your docker container. When y $ astrocloud dev stop [+] Running 3/3 - ⠿ Container airflow-dbt-cloud_e3fe3c-webserver-1 Stopped 7.5s - ⠿ Container airflow-dbt-cloud_e3fe3c-scheduler-1 Stopped 3.3s - ⠿ Container airflow-dbt-cloud_e3fe3c-postgres-1 Stopped 0.3s + ⠿ Container airflow-dbt-cloud_e3fe3c-webserver-1 Stopped 7.5s + ⠿ Container airflow-dbt-cloud_e3fe3c-scheduler-1 Stopped 3.3s + ⠿ Container airflow-dbt-cloud_e3fe3c-postgres-1 Stopped 0.3s ``` To verify that the deployment has stopped, use the following command: @@ -244,37 +202,29 @@ airflow-dbt-cloud_e3fe3c-scheduler-1 exited airflow-dbt-cloud_e3fe3c-postgres-1 exited ``` - + ## Frequently asked questions ### How can we run specific subsections of the dbt DAG in Airflow? 
-Because of the way we configured the dbt Cloud job to run in Airflow, you can leave this job to your analytics engineers to define in the job configurations from dbt Cloud. If, for example, we need to run hourly-tagged models every hour and daily-tagged models daily, we can create jobs like `Hourly Run` or `Daily Run` and utilize the commands `dbt run -s tag:hourly` and `dbt run -s tag:daily` within each, respectively. We only need to grab our dbt Cloud `account` and `job id`, configure it in an Airflow DAG with the code provided, and then we can be on your way. See more node selection options: [here](/reference/node-selection/syntax) - -### How can I re-run models from the point of failure? - -You may want to parse the dbt DAG in Airflow to get the benefit of re-running from the point of failure. However, when you have hundreds of models in your DAG expanded out, it becomes useless for diagnosis and rerunning due to the overhead that comes along with creating an expansive Airflow DAG. +Because the Airflow DAG references dbt Cloud jobs, your analytics engineers can take responsibility for configuring the jobs in dbt Cloud. -You can’t re-run from failure natively in dbt Cloud today (feature coming!), but you can use a custom rerun parser. +For example, to run some models hourly and others daily, you can create jobs such as `Hourly Run` and `Daily Run` that use the commands `dbt run --select tag:hourly` and `dbt run --select tag:daily`, respectively. Once configured in dbt Cloud, these jobs can be added as steps in an Airflow DAG as shown in this guide. Refer to the full [node selection syntax docs here](/reference/node-selection/syntax). -Using a simple python script coupled with the dbt Cloud provider, you can: - -- Avoid managing artifacts in a separate storage bucket(dbt Cloud does this for you) -- Avoid building your own parsing logic -- Get clear logs on what models you're rerunning in dbt Cloud (without hard coding step override commands) - -Watch the video below to see how it works! +### How can I re-run models from the point of failure? - +You can trigger a re-run from the point of failure with the `rerun` API endpoint. See the docs on [retrying jobs](/docs/deploy/retry-jobs) for more information. ### Should Airflow run one big dbt job or many dbt jobs? -Overall we recommend being as purposeful and minimalistic as you can. This is because dbt manages all of the dependencies between models and the orchestration of running those dependencies in order, which in turn has benefits in terms of warehouse processing efforts. +dbt jobs are most effective when a single build command covers as many models as is practical. This is because dbt manages the dependencies between models and coordinates running them in order, which ensures that your jobs can run in a highly parallelized fashion. It also streamlines debugging when a model fails and enables re-running from the point of failure. + +As a concrete example, it's not recommended to have a dbt job for every single node in your DAG. Try combining your steps according to desired run frequency, or grouping by department (finance, marketing, customer success...) instead. ### We want to kick off our dbt jobs after our ingestion tool (such as Fivetran) / data pipelines are done loading data. Any best practices around that?
-Our friends at Astronomer answer this question with this example: [here](https://registry.astronomer.io/dags/fivetran-dbt-cloud-census) +Astronomer's DAG registry has a sample workflow combining Fivetran, dbt Cloud and Census [here](https://registry.astronomer.io/dags/fivetran-dbt_cloud-census/versions/3.0.0). ### How do you set up a CI/CD workflow with Airflow? @@ -285,12 +235,12 @@ Check out these two resources for accomplishing your own CI/CD pipeline: ### Can dbt dynamically create tasks in the DAG like Airflow can? -We prefer to keep models bundled vs. unbundled. You can go this route, but if you have hundreds of dbt models, it’s more effective to let the dbt Cloud job handle the models and dependencies. Bundling provides the solution to clear observability when things go wrong - we've seen more success in having the ability to clearly see issues in a bundled dbt Cloud job than combing through the nodes of an expansive Airflow DAG. If you still have a use case for this level of control though, our friends at Astronomer answer this question [here](https://www.astronomer.io/blog/airflow-dbt-1/)! +As discussed above, we prefer to keep jobs bundled together and containing as many nodes as are necessary. If you must run nodes one at a time for some reason, then review [this article](https://www.astronomer.io/blog/airflow-dbt-1/) for some pointers. -### Can you trigger notifications if a dbt job fails with Airflow? Is there any way to access the status of the dbt Job to do that? +### Can you trigger notifications if a dbt job fails with Airflow? -Yes, either through [Airflow's email/slack](https://www.astronomer.io/guides/error-notifications-in-airflow/) functionality by itself or combined with [dbt Cloud's notifications](/docs/deploy/job-notifications), which support email and slack notifications. +Yes, either through [Airflow's email/slack](https://www.astronomer.io/guides/error-notifications-in-airflow/) functionality, or [dbt Cloud's notifications](/docs/deploy/job-notifications), which support email and Slack notifications. You could also create a [webhook](/docs/deploy/webhooks). -### Are there decision criteria for how to best work with dbt Cloud and airflow? +### How should I plan my dbt Cloud + Airflow implementation? -Check out this deep dive into planning your dbt Cloud + Airflow implementation [here](https://www.youtube.com/watch?v=n7IIThR8hGk)! +Check out [this recording](https://www.youtube.com/watch?v=n7IIThR8hGk) of a dbt meetup for some tips. diff --git a/website/docs/guides/bigquery-qs.md b/website/docs/guides/bigquery-qs.md index c1f632f0621..9cf2447fa52 100644 --- a/website/docs/guides/bigquery-qs.md +++ b/website/docs/guides/bigquery-qs.md @@ -78,7 +78,7 @@ In order to let dbt connect to your warehouse, you'll need to generate a keyfile - Click **Next** to create a new service account. 2. Create a service account for your new project from the [Service accounts page](https://console.cloud.google.com/projectselector2/iam-admin/serviceaccounts?supportedpurview=project). For more information, refer to [Create a service account](https://developers.google.com/workspace/guides/create-credentials#create_a_service_account) in the Google Cloud docs. 
As an example for this guide, you can: - Type `dbt-user` as the **Service account name** - - From the **Select a role** dropdown, choose **BigQuery Admin** and click **Continue** + - From the **Select a role** dropdown, choose **BigQuery Job User** and **BigQuery Data Editor** roles and click **Continue** - Leave the **Grant users access to this service account** fields blank - Click **Done** 3. Create a service account key for your new project from the [Service accounts page](https://console.cloud.google.com/iam-admin/serviceaccounts?walkthrough_id=iam--create-service-account-keys&start_index=1#step_index=1). For more information, refer to [Create a service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating) in the Google Cloud docs. When downloading the JSON file, make sure to use a filename you can easily remember. For example, `dbt-user-creds.json`. For security reasons, dbt Labs recommends that you protect this JSON file like you would your identity credentials; for example, don't check the JSON file into your version control software. diff --git a/website/docs/guides/building-packages.md b/website/docs/guides/building-packages.md index 641a1c6af6d..55f0c2ed912 100644 --- a/website/docs/guides/building-packages.md +++ b/website/docs/guides/building-packages.md @@ -104,7 +104,7 @@ dbt makes it possible for users of your package to override your model `, you can proceed with the rest of this guide. -## Configure Databricks workflows for dbt Cloud jobs - -To use Databricks workflows for running dbt Cloud jobs, you need to perform the following steps: - -- [Set up a Databricks secret scope](#set-up-a-databricks-secret-scope) -- [Create a Databricks Python notebook](#create-a-databricks-python-notebook) -- [Configure the workflows to run the dbt Cloud jobs](#configure-the-workflows-to-run-the-dbt-cloud-jobs) - ## Set up a Databricks secret scope 1. Retrieve **[User API Token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens#user-api-tokens) **or **[Service Account Token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens#generating-service-account-tokens) **from dbt Cloud diff --git a/website/docs/guides/manual-install-qs.md b/website/docs/guides/manual-install-qs.md index c74d30db51c..e9c1af259ac 100644 --- a/website/docs/guides/manual-install-qs.md +++ b/website/docs/guides/manual-install-qs.md @@ -16,7 +16,7 @@ When you use dbt Core to work with dbt, you will be editing files locally using * To use dbt Core, it's important that you know some basics of the Terminal. In particular, you should understand `cd`, `ls` and `pwd` to navigate through the directory structure of your computer easily. * Install dbt Core using the [installation instructions](/docs/core/installation-overview) for your operating system. -* Complete [Setting up (in BigQuery)](/guides/bigquery?step=2) and [Loading data (BigQuery)](/guides/bigquery?step=3). +* Complete appropriate Setting up and Loading data steps in the Quickstart for dbt Cloud series. For example, for BigQuery, complete [Setting up (in BigQuery)](/guides/bigquery?step=2) and [Loading data (BigQuery)](/guides/bigquery?step=3). * [Create a GitHub account](https://github.com/join) if you don't already have one. 
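Before moving on, it can help to confirm these prerequisites from the command line. A minimal sketch, assuming you are following the BigQuery path and working inside a Python virtual environment:

```bash
# one common installation path for dbt Core plus the BigQuery adapter
python -m pip install dbt-bigquery

# confirm dbt Core and the adapter plugin are installed
dbt --version
```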
### Create a starter project diff --git a/website/docs/guides/productionize-your-dbt-databricks-project.md b/website/docs/guides/productionize-your-dbt-databricks-project.md index b95d8ffd2dd..3584cffba77 100644 --- a/website/docs/guides/productionize-your-dbt-databricks-project.md +++ b/website/docs/guides/productionize-your-dbt-databricks-project.md @@ -81,7 +81,7 @@ CI/CD, or Continuous Integration and Continuous Deployment/Delivery, has become The steps below show how to create a CI test for your dbt project. CD in dbt Cloud requires no additional steps, as your jobs will automatically pick up the latest changes from the branch assigned to the environment your job is running in. You may choose to add steps depending on your deployment strategy. If you want to dive deeper into CD options, check out [this blog on adopting CI/CD with dbt Cloud](https://www.getdbt.com/blog/adopting-ci-cd-with-dbt-cloud/). -dbt allows you to write [tests](/docs/build/tests) for your data pipeline, which can be run at every step of the process to ensure the stability and correctness of your data transformations. The main places you’ll use your dbt tests are: +dbt allows you to write [tests](/docs/build/data-tests) for your data pipeline, which can be run at every step of the process to ensure the stability and correctness of your data transformations. The main places you’ll use your dbt tests are: 1. **Daily runs:** Regularly running tests on your data pipeline helps catch issues caused by bad source data, ensuring the quality of data that reaches your users. 2. **Development**: Running tests during development ensures that your code changes do not break existing assumptions, enabling developers to iterate faster by catching problems immediately after writing code. diff --git a/website/docs/guides/set-up-your-databricks-dbt-project.md b/website/docs/guides/set-up-your-databricks-dbt-project.md index c17c6a1f99e..b2988f36589 100644 --- a/website/docs/guides/set-up-your-databricks-dbt-project.md +++ b/website/docs/guides/set-up-your-databricks-dbt-project.md @@ -62,7 +62,7 @@ Let’s [create a Databricks SQL warehouse](https://docs.databricks.com/sql/admi 5. Click *Create* 6. Configure warehouse permissions to ensure our service principal and developer have the right access. -We are not covering python in this post but if you want to learn more, check out these [docs](https://docs.getdbt.com/docs/build/python-models#specific-data-platforms). Depending on your workload, you may wish to create a larger SQL Warehouse for production workflows while having a smaller development SQL Warehouse (if you’re not using Serverless SQL Warehouses). +We are not covering python in this post but if you want to learn more, check out these [docs](https://docs.getdbt.com/docs/build/python-models#specific-data-platforms). Depending on your workload, you may wish to create a larger SQL Warehouse for production workflows while having a smaller development SQL Warehouse (if you’re not using Serverless SQL Warehouses). As your project grows, you might want to apply [compute per model configurations](/reference/resource-configs/databricks-configs#specifying-the-compute-for-models). 
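As a rough sketch of that pattern, you could point your development and production targets at differently sized SQL warehouses in `profiles.yml` (the host, warehouse paths, catalog, and schema names below are placeholders):

```yaml
my_databricks_project:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: dev_catalog            # optional, if you use Unity Catalog
      schema: dbt_dev
      host: yourorg.databrickshost.com
      http_path: /sql/1.0/warehouses/<small_dev_warehouse_id>   # smaller warehouse for development
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
    prod:
      type: databricks
      catalog: prod_catalog
      schema: analytics
      host: yourorg.databrickshost.com
      http_path: /sql/1.0/warehouses/<large_prod_warehouse_id>  # larger warehouse for production runs
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
```

With something like this in place, `dbt run --target prod` builds against the larger warehouse while everyday development stays on the smaller one.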
## Configure your dbt project diff --git a/website/docs/reference/artifacts/dbt-artifacts.md b/website/docs/reference/artifacts/dbt-artifacts.md index 859fde7c908..31525777500 100644 --- a/website/docs/reference/artifacts/dbt-artifacts.md +++ b/website/docs/reference/artifacts/dbt-artifacts.md @@ -48,3 +48,6 @@ In the manifest, the `metadata` may also include: #### Notes: - The structure of dbt artifacts is canonized by [JSON schemas](https://json-schema.org/), which are hosted at **schemas.getdbt.com**. - Artifact versions may change in any minor version of dbt (`v1.x.0`). Each artifact is versioned independently. + +## Related docs +- [Other artifacts](/reference/artifacts/other-artifacts) files such as `index.html` or `graph_summary.json`. diff --git a/website/docs/reference/artifacts/other-artifacts.md b/website/docs/reference/artifacts/other-artifacts.md index 205bdfc1a14..60050a6be66 100644 --- a/website/docs/reference/artifacts/other-artifacts.md +++ b/website/docs/reference/artifacts/other-artifacts.md @@ -21,7 +21,7 @@ This file is used to store a compressed representation of files dbt has parsed. **Produced by:** commands supporting [node selection](/reference/node-selection/syntax) -Stores the networkx representation of the dbt resource DAG. +Stores the network representation of the dbt resource DAG. ### graph_summary.json diff --git a/website/docs/reference/commands/test.md b/website/docs/reference/commands/test.md index c050d82a0ab..373ad9b6db3 100644 --- a/website/docs/reference/commands/test.md +++ b/website/docs/reference/commands/test.md @@ -28,4 +28,4 @@ dbt test --select "one_specific_model,test_type:singular" dbt test --select "one_specific_model,test_type:generic" ``` -For more information on writing tests, see the [Testing Documentation](/docs/build/tests). +For more information on writing tests, see the [Testing Documentation](/docs/build/data-tests). diff --git a/website/docs/reference/configs-and-properties.md b/website/docs/reference/configs-and-properties.md index c6458babeaa..9464faf719d 100644 --- a/website/docs/reference/configs-and-properties.md +++ b/website/docs/reference/configs-and-properties.md @@ -11,7 +11,7 @@ A rule of thumb: properties declare things _about_ your project resources; confi For example, you can use resource **properties** to: * Describe models, snapshots, seed files, and their columns -* Assert "truths" about a model, in the form of [tests](/docs/build/tests), e.g. "this `id` column is unique" +* Assert "truths" about a model, in the form of [data tests](/docs/build/data-tests), e.g. "this `id` column is unique" * Define pointers to existing tables that contain raw data, in the form of [sources](/docs/build/sources), and assert the expected "freshness" of this raw data * Define official downstream uses of your data models, in the form of [exposures](/docs/build/exposures) @@ -33,7 +33,7 @@ Depending on the resource type, configurations can be defined: dbt prioritizes configurations in order of specificity, from most specificity to least specificity. This generally follows the order above: an in-file `config()` block --> properties defined in a `.yml` file --> config defined in the project file. -Note - Generic tests work a little differently when it comes to specificity. See [test configs](/reference/test-configs). +Note - Generic data tests work a little differently when it comes to specificity. See [test configs](/reference/data-test-configs). Within the project file, configurations are also applied hierarchically. 
The most specific config always "wins": In the project file, configurations applied to a `marketing` subdirectory will take precedence over configurations applied to the entire `jaffle_shop` project. To apply a configuration to a model, or directory of models, define the resource path as nested dictionary keys. @@ -76,7 +76,7 @@ Certain properties are special, because: These properties are: - [`description`](/reference/resource-properties/description) -- [`tests`](/reference/resource-properties/tests) +- [`tests`](/reference/resource-properties/data-tests) - [`docs`](/reference/resource-configs/docs) - [`columns`](/reference/resource-properties/columns) - [`quote`](/reference/resource-properties/quote) diff --git a/website/docs/reference/test-configs.md b/website/docs/reference/data-test-configs.md similarity index 87% rename from website/docs/reference/test-configs.md rename to website/docs/reference/data-test-configs.md index 960e8d5471a..5f922d08c6b 100644 --- a/website/docs/reference/test-configs.md +++ b/website/docs/reference/data-test-configs.md @@ -1,8 +1,8 @@ --- -title: Test configurations -description: "Read this guide to learn about using test configurations in dbt." +title: Data test configurations +description: "Read this guide to learn about using data test configurations in dbt." meta: - resource_type: Tests + resource_type: Data tests --- import ConfigResource from '/snippets/_config-description-resource.md'; import ConfigGeneral from '/snippets/_config-description-general.md'; @@ -10,20 +10,20 @@ import ConfigGeneral from '/snippets/_config-description-general.md'; ## Related documentation -* [Tests](/docs/build/tests) +* [Data tests](/docs/build/data-tests) -Tests can be configured in a few different ways: -1. Properties within `.yml` definition (generic tests only, see [test properties](/reference/resource-properties/tests) for full syntax) +Data tests can be configured in a few different ways: +1. Properties within `.yml` definition (generic tests only, see [test properties](/reference/resource-properties/data-tests) for full syntax) 2. A `config()` block within the test's SQL definition 3. In `dbt_project.yml` -Test configs are applied hierarchically, in the order of specificity outlined above. In the case of a singular test, the `config()` block within the SQL definition takes precedence over configs in the project file. In the case of a specific instance of a generic test, the test's `.yml` properties would take precedence over any values set in its generic SQL definition's `config()`, which in turn would take precedence over values set in `dbt_project.yml`. +Data test configs are applied hierarchically, in the order of specificity outlined above. In the case of a singular test, the `config()` block within the SQL definition takes precedence over configs in the project file. In the case of a specific instance of a generic test, the test's `.yml` properties would take precedence over any values set in its generic SQL definition's `config()`, which in turn would take precedence over values set in `dbt_project.yml`. ## Available configurations Click the link on each configuration option to read more about what it can do. -### Test-specific configurations +### Data test-specific configurations @@ -204,7 +204,7 @@ version: 2 [alias](/reference/resource-configs/alias): ``` -This configuration mechanism is supported for specific instances of generic tests only. To configure a specific singular test, you should use the `config()` macro in its SQL definition. 
+This configuration mechanism is supported for specific instances of generic data tests only. To configure a specific singular test, you should use the `config()` macro in its SQL definition. @@ -216,7 +216,7 @@ This configuration mechanism is supported for specific instances of generic test #### Add a tag to one test -If a specific instance of a generic test: +If a specific instance of a generic data test: @@ -232,7 +232,7 @@ models: -If a singular test: +If a singular data test: @@ -244,7 +244,7 @@ select ... -#### Set the default severity for all instances of a generic test +#### Set the default severity for all instances of a generic data test @@ -260,7 +260,7 @@ select ... -#### Disable all tests from a package +#### Disable all data tests from a package diff --git a/website/docs/reference/dbt-jinja-functions/config.md b/website/docs/reference/dbt-jinja-functions/config.md index c2fc8f96e5b..3903c82eef7 100644 --- a/website/docs/reference/dbt-jinja-functions/config.md +++ b/website/docs/reference/dbt-jinja-functions/config.md @@ -25,8 +25,6 @@ is responsible for handling model code that looks like this: ``` Review [Model configurations](/reference/model-configs) for examples and more information on valid arguments. -https://docs.getdbt.com/reference/model-configs - ## config.get __Args__: diff --git a/website/docs/reference/dbt-jinja-functions/return.md b/website/docs/reference/dbt-jinja-functions/return.md index 43bbddfa2d1..d2069bc9254 100644 --- a/website/docs/reference/dbt-jinja-functions/return.md +++ b/website/docs/reference/dbt-jinja-functions/return.md @@ -1,6 +1,6 @@ --- title: "About return function" -sidebar_variable: "return" +sidebar_label: "return" id: "return" description: "Read this guide to understand the return Jinja function in dbt." --- diff --git a/website/docs/reference/dbt_project.yml.md b/website/docs/reference/dbt_project.yml.md index 34af0f696c7..7b5d54c3e03 100644 --- a/website/docs/reference/dbt_project.yml.md +++ b/website/docs/reference/dbt_project.yml.md @@ -81,7 +81,7 @@ sources: [](source-configs) tests: - [](/reference/test-configs) + [](/reference/data-test-configs) vars: [](/docs/build/project-variables) @@ -153,7 +153,7 @@ sources: [](source-configs) tests: - [](/reference/test-configs) + [](/reference/data-test-configs) vars: [](/docs/build/project-variables) @@ -222,7 +222,7 @@ sources: [](source-configs) tests: - [](/reference/test-configs) + [](/reference/data-test-configs) vars: [](/docs/build/project-variables) diff --git a/website/docs/reference/model-properties.md b/website/docs/reference/model-properties.md index 63adc1f0d63..65f9307b5b3 100644 --- a/website/docs/reference/model-properties.md +++ b/website/docs/reference/model-properties.md @@ -23,9 +23,9 @@ models: [](/reference/model-configs): [constraints](/reference/resource-properties/constraints): - - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - - ... # declare additional tests + - ... # declare additional data tests [columns](/reference/resource-properties/columns): - name: # required [description](/reference/resource-properties/description): @@ -33,9 +33,9 @@ models: [quote](/reference/resource-properties/quote): true | false [constraints](/reference/resource-properties/constraints): - - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - - ... # declare additional tests + - ... # declare additional data tests [tags](/reference/resource-configs/tags): [] - name: ... 
# declare properties of additional columns @@ -51,9 +51,9 @@ models: - [config](/reference/resource-properties/config): [](/reference/model-configs): - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - - ... # declare additional tests + - ... # declare additional data tests columns: # include/exclude columns from the top-level model properties - [include](/reference/resource-properties/include-exclude): @@ -63,9 +63,9 @@ models: [quote](/reference/resource-properties/quote): true | false [constraints](/reference/resource-properties/constraints): - - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - - ... # declare additional tests + - ... # declare additional data tests [tags](/reference/resource-configs/tags): [] - v: ... # declare additional versions diff --git a/website/docs/reference/node-selection/defer.md b/website/docs/reference/node-selection/defer.md index 03c3b2aac12..81a0f4a0328 100644 --- a/website/docs/reference/node-selection/defer.md +++ b/website/docs/reference/node-selection/defer.md @@ -234,3 +234,8 @@ dbt will check to see if `dev_alice.model_a` exists. If it doesn't exist, dbt wi + +## Related docs + +- [Using defer in dbt Cloud](/docs/cloud/about-cloud-develop-defer) + diff --git a/website/docs/reference/node-selection/methods.md b/website/docs/reference/node-selection/methods.md index e29612e3401..2ffe0ea599e 100644 --- a/website/docs/reference/node-selection/methods.md +++ b/website/docs/reference/node-selection/methods.md @@ -173,7 +173,7 @@ dbt test --select "test_type:singular" # run all singular tests The `test_name` method is used to select tests based on the name of the generic test that defines it. For more information about how generic tests are defined, read about -[tests](/docs/build/tests). +[tests](/docs/build/data-tests). ```bash diff --git a/website/docs/reference/node-selection/syntax.md b/website/docs/reference/node-selection/syntax.md index d0ea4a9acd8..22946903b7d 100644 --- a/website/docs/reference/node-selection/syntax.md +++ b/website/docs/reference/node-selection/syntax.md @@ -35,7 +35,7 @@ To follow [POSIX standards](https://pubs.opengroup.org/onlinepubs/9699919799/bas 3. dbt now has a list of still-selected resources of varying types. As a final step, it tosses away any resource that does not match the resource type of the current task. (Only seeds are kept for `dbt seed`, only models for `dbt run`, only tests for `dbt test`, and so on.) -### Shorthand +## Shorthand Select resources to build (run, test, seed, snapshot) or check freshness: `--select`, `-s` @@ -43,6 +43,9 @@ Select resources to build (run, test, seed, snapshot) or check freshness: `--sel By default, `dbt run` will execute _all_ of the models in the dependency graph. During development (and deployment), it is useful to specify only a subset of models to run. Use the `--select` flag with `dbt run` to select a subset of models to run. Note that the following arguments (`--select`, `--exclude`, and `--selector`) also apply to other dbt tasks, such as `test` and `build`. + + + The `--select` flag accepts one or more arguments. Each argument can be one of: 1. a package name @@ -52,8 +55,7 @@ The `--select` flag accepts one or more arguments. 
Each argument can be one of: Examples: - - ```bash +```bash dbt run --select "my_dbt_project_name" # runs all models in your project dbt run --select "my_dbt_model" # runs a specific model dbt run --select "path.to.my.models" # runs all models in a specific directory @@ -61,14 +63,30 @@ dbt run --select "my_package.some_model" # run a specific model in a specific pa dbt run --select "tag:nightly" # run models with the "nightly" tag dbt run --select "path/to/models" # run models contained in path/to/models dbt run --select "path/to/my_model.sql" # run a specific model by its path - ``` +``` -dbt supports a shorthand language for defining subsets of nodes. This language uses the characters `+`, `@`, `*`, and `,`. + + - ```bash +dbt supports a shorthand language for defining subsets of nodes. This language uses the following characters: + +- plus operator [(`+`)](/reference/node-selection/graph-operators#the-plus-operator) +- at operator [(`@`)](/reference/node-selection/graph-operators#the-at-operator) +- asterisk operator (`*`) +- comma operator (`,`) + +Examples: + +```bash # multiple arguments can be provided to --select - dbt run --select "my_first_model my_second_model" +dbt run --select "my_first_model my_second_model" + +# select my_model and all of its children +dbt run --select "my_model+" + +# select my_model, its children, and the parents of its children +dbt run --select "@my_model" # these arguments can be projects, models, directory paths, tags, or sources dbt run --select "tag:nightly my_model finance.base.*" @@ -77,6 +95,10 @@ dbt run --select "tag:nightly my_model finance.base.*" dbt run --select "path:marts/finance,tag:nightly,config.materialized:table" ``` + + + + As your selection logic gets more complex, and becomes unwieldly to type out as command-line arguments, consider using a [yaml selector](/reference/node-selection/yaml-selectors). You can use a predefined definition with the `--selector` flag. Note that when you're using `--selector`, most other flags (namely `--select` and `--exclude`) will be ignored. diff --git a/website/docs/reference/project-configs/test-paths.md index e3d0e0b76fa..59f17db05eb 100644 --- a/website/docs/reference/project-configs/test-paths.md +++ b/website/docs/reference/project-configs/test-paths.md @@ -13,7 +13,7 @@ test-paths: [directorypath] ## Definition -Optionally specify a custom list of directories where [singular tests](/docs/build/tests) are located. +Optionally specify a custom list of directories where [singular tests](/docs/build/data-tests) are located.
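For example, you can point dbt at a custom folder and place singular test files inside it (the directory, model, and column names below are illustrative):

```yaml
# dbt_project.yml
test-paths: ["my_singular_tests"]
```

Each `.sql` file in that directory is treated as a singular test, that is, a query that should return zero rows:

```sql
-- my_singular_tests/assert_no_negative_payment_amounts.sql
{{ config(severity='warn') }}

select *
from {{ ref('payments') }}
where amount < 0
```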
## Default diff --git a/website/docs/reference/resource-configs/access.md b/website/docs/reference/resource-configs/access.md index da50e48d2f0..fcde93647a1 100644 --- a/website/docs/reference/resource-configs/access.md +++ b/website/docs/reference/resource-configs/access.md @@ -27,7 +27,7 @@ You can apply access modifiers in config files, including `the dbt_project.yml`, There are multiple approaches to configuring access: -In the model configs of `dbt_project.yml``: +In the model configs of `dbt_project.yml`: ```yaml models: diff --git a/website/docs/reference/resource-configs/alias.md b/website/docs/reference/resource-configs/alias.md index 6b7588ecaf7..e1d3ae41f8b 100644 --- a/website/docs/reference/resource-configs/alias.md +++ b/website/docs/reference/resource-configs/alias.md @@ -112,7 +112,7 @@ When using `--store-failures`, this would return the name `analytics.finance.ord ## Definition -Optionally specify a custom alias for a [model](/docs/build/models), [tests](/docs/build/tests), [snapshots](/docs/build/snapshots), or [seed](/docs/build/seeds). +Optionally specify a custom alias for a [model](/docs/build/models), [data test](/docs/build/data-tests), [snapshot](/docs/build/snapshots), or [seed](/docs/build/seeds). When dbt creates a relation (/) in a database, it creates it as: `{{ database }}.{{ schema }}.{{ identifier }}`, e.g. `analytics.finance.payments` diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index ffbaa37c059..d3497a02caf 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -718,3 +718,188 @@ Views with this configuration will be able to select from objects in `project_1. The `grant_access_to` config is not thread-safe when multiple views need to be authorized for the same dataset. The initial `dbt run` operation after a new `grant_access_to` config is added should therefore be executed in a single thread. Subsequent runs using the same configuration will not attempt to re-apply existing access grants, and can make use of multiple threads. 
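For instance, the first run after adding the config might pin the affected models to a single thread, after which normal runs can resume (the tag name below is hypothetical):

```bash
# first run after adding a new grant_access_to config: single thread
dbt run --select tag:authorized_views --threads 1

# subsequent runs can go back to your usual thread count
dbt run --select tag:authorized_views
```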
+ + +## Materialized views + +The BigQuery adapter supports [materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) +with the following configuration parameters: + +| Parameter | Type | Required | Default | Change Monitoring Support | +|-------------------------------------------------------------|------------------------|----------|---------|---------------------------| +| `on_configuration_change` | `` | no | `apply` | n/a | +| [`cluster_by`](#clustering-clause) | `[]` | no | `none` | drop/create | +| [`partition_by`](#partition-clause) | `{}` | no | `none` | drop/create | +| [`enable_refresh`](#auto-refresh) | `` | no | `true` | alter | +| [`refresh_interval_minutes`](#auto-refresh) | `` | no | `30` | alter | +| [`max_staleness`](#auto-refresh) (in Preview) | `` | no | `none` | alter | +| [`description`](/reference/resource-properties/description) | `` | no | `none` | alter | +| [`labels`](#specifying-labels) | `{: }` | no | `none` | alter | +| [`hours_to_expiration`](#controlling-table-expiration) | `` | no | `none` | alter | +| [`kms_key_name`](#using-kms-encryption) | `` | no | `none` | alter | + + + + + + + + +```yaml +models: + [](/reference/resource-configs/resource-path): + [+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): materialized_view + [+](/reference/resource-configs/plus-prefix)on_configuration_change: apply | continue | fail + [+](/reference/resource-configs/plus-prefix)[cluster_by](#clustering-clause): | [] + [+](/reference/resource-configs/plus-prefix)[partition_by](#partition-clause): + - field: + - data_type: timestamp | date | datetime | int64 + # only if `data_type` is not 'int64' + - granularity: hour | day | month | year + # only if `data_type` is 'int64' + - range: + - start: + - end: + - interval: + [+](/reference/resource-configs/plus-prefix)[enable_refresh](#auto-refresh): true | false + [+](/reference/resource-configs/plus-prefix)[refresh_interval_minutes](#auto-refresh): + [+](/reference/resource-configs/plus-prefix)[max_staleness](#auto-refresh): + [+](/reference/resource-configs/plus-prefix)[description](/reference/resource-properties/description): + [+](/reference/resource-configs/plus-prefix)[labels](#specifying-labels): {: } + [+](/reference/resource-configs/plus-prefix)[hours_to_expiration](#acontrolling-table-expiration): + [+](/reference/resource-configs/plus-prefix)[kms_key_name](##using-kms-encryption): +``` + + + + + + + + + + +```yaml +version: 2 + +models: + - name: [] + config: + [materialized](/reference/resource-configs/materialized): materialized_view + on_configuration_change: apply | continue | fail + [cluster_by](#clustering-clause): | [] + [partition_by](#partition-clause): + - field: + - data_type: timestamp | date | datetime | int64 + # only if `data_type` is not 'int64' + - granularity: hour | day | month | year + # only if `data_type` is 'int64' + - range: + - start: + - end: + - interval: + [enable_refresh](#auto-refresh): true | false + [refresh_interval_minutes](#auto-refresh): + [max_staleness](#auto-refresh): + [description](/reference/resource-properties/description): + [labels](#specifying-labels): {: } + [hours_to_expiration](#acontrolling-table-expiration): + [kms_key_name](##using-kms-encryption): +``` + + + + + + + + + + +```jinja +{{ config( + [materialized](/reference/resource-configs/materialized)='materialized_view', + on_configuration_change="apply" | "continue" | "fail", + [cluster_by](#clustering-clause)="" | [""], + 
[partition_by](#partition-clause)={ + "field": "", + "data_type": "timestamp" | "date" | "datetime" | "int64", + + # only if `data_type` is not 'int64' + "granularity": "hour" | "day" | "month" | "year, + + # only if `data_type` is 'int64' + "range": { + "start": , + "end": , + "interval": , + } + }, + + # auto-refresh options + [enable_refresh](#auto-refresh)= true | false, + [refresh_interval_minutes](#auto-refresh)=, + [max_staleness](#auto-refresh)="", + + # additional options + [description](/reference/resource-properties/description)="", + [labels](#specifying-labels)={ + "": "", + }, + [hours_to_expiration](#acontrolling-table-expiration)=, + [kms_key_name](##using-kms-encryption)="", +) }} +``` + + + + + + + +Many of these parameters correspond to their table counterparts and have been linked above. +The set of parameters unique to materialized views covers [auto-refresh functionality](#auto-refresh). + +Find more information about these parameters in the BigQuery docs: +- [CREATE MATERIALIZED VIEW statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_materialized_view_statement) +- [materialized_view_option_list](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#materialized_view_option_list) + +### Auto-refresh + +| Parameter | Type | Required | Default | Change Monitoring Support | +|------------------------------|--------------|----------|---------|---------------------------| +| `enable_refresh` | `` | no | `true` | alter | +| `refresh_interval_minutes` | `` | no | `30` | alter | +| `max_staleness` (in Preview) | `` | no | `none` | alter | + +BigQuery supports [automatic refresh](https://cloud.google.com/bigquery/docs/materialized-views-manage#automatic_refresh) configuration for materialized views. +By default, a materialized view will automatically refresh within 5 minutes of changes in the base table, but not more frequently than once every 30 minutes. +BigQuery only officially supports the configuration of the frequency (the "once every 30 minutes" frequency); +however, there is a feature in preview that allows for the configuration of the staleness (the "5 minutes" refresh). +dbt will monitor these parameters for changes and apply them using an `ALTER` statement. + +Find more information about these parameters in the BigQuery docs: +- [materialized_view_option_list](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#materialized_view_option_list) +- [max_staleness](https://cloud.google.com/bigquery/docs/materialized-views-create#max_staleness) + +### Limitations + +As with most data platforms, there are limitations associated with materialized views. Some worth noting include: + +- Materialized view SQL has a [limited feature set](https://cloud.google.com/bigquery/docs/materialized-views-create#supported-mvs). +- Materialized view SQL cannot be updated; the materialized view must go through a `--full-refresh` (DROP/CREATE). +- The `partition_by` clause on a materialized view must match that of the underlying base table. +- While materialized views can have descriptions, materialized view *columns* cannot. +- Recreating/dropping the base table requires recreating/dropping the materialized view. + +Find more information about materialized view limitations in Google's BigQuery [docs](https://cloud.google.com/bigquery/docs/materialized-views-intro#limitations). 
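To make the configuration spec above more concrete, here is a minimal sketch of a materialized view model (the `stg_orders` relation and its columns are hypothetical):

```sql
{{ config(
    materialized='materialized_view',
    on_configuration_change='apply',
    enable_refresh=true,
    refresh_interval_minutes=60
) }}

select
    order_date,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by order_date
```

Because the materialized view's SQL cannot be updated in place, changing the `select` statement later requires a `--full-refresh`, per the limitations above.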
+ + diff --git a/website/docs/reference/resource-configs/database.md b/website/docs/reference/resource-configs/database.md index 7d91358ff01..19c9eca272d 100644 --- a/website/docs/reference/resource-configs/database.md +++ b/website/docs/reference/resource-configs/database.md @@ -70,7 +70,7 @@ This would result in the generated relation being located in the `reporting` dat ## Definition -Optionally specify a custom database for a [model](/docs/build/sql-models), [seed](/docs/build/seeds), or [tests](/docs/build/tests). (To specify a database for a [snapshot](/docs/build/snapshots), use the [`target_database` config](/reference/resource-configs/target_database)). +Optionally specify a custom database for a [model](/docs/build/sql-models), [seed](/docs/build/seeds), or [data test](/docs/build/data-tests). (To specify a database for a [snapshot](/docs/build/snapshots), use the [`target_database` config](/reference/resource-configs/target_database)). When dbt creates a relation (/) in a database, it creates it as: `{{ database }}.{{ schema }}.{{ identifier }}`, e.g. `analytics.finance.payments` diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index a3b00177967..677cad57ce6 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -38,9 +38,9 @@ When materializing a model as `table`, you may include several optional configs ## Incremental models dbt-databricks plugin leans heavily on the [`incremental_strategy` config](/docs/build/incremental-models#about-incremental_strategy). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of four values: - - **`append`** (default): Insert new records without updating or overwriting any existing data. + - **`append`**: Insert new records without updating or overwriting any existing data. - **`insert_overwrite`**: If `partition_by` is specified, overwrite partitions in the with new data. If no `partition_by` is specified, overwrite the entire table with new data. - - **`merge`** (Delta and Hudi file format only): Match records based on a `unique_key`, updating old records, and inserting new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.) + - **`merge`** (default; Delta and Hudi file format only): Match records based on a `unique_key`, updating old records, and inserting new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.) - **`replace_where`** (Delta file format only): Match records based on `incremental_predicates`, replacing all records that match the predicates from the existing table with records matching the predicates from the new data. (If no `incremental_predicates` are specified, all new data is inserted, similar to `append`.) Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block. @@ -49,8 +49,6 @@ Each of these strategies has its pros and cons, which we'll discuss below. As wi Following the `append` strategy, dbt will perform an `insert into` statement with all new data. The appeal of this strategy is that it is straightforward and functional across all platforms, file types, connection methods, and Apache Spark versions. 
However, this strategy _cannot_ update, overwrite, or delete existing data, so it is likely to insert duplicate records for many data sources. -Specifying `append` as the incremental strategy is optional, since it's the default strategy used when none is specified. - + + +## Selecting compute per model + +Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis. +For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all purpose cluster. +For details on how this feature interacts with python models, see [Specifying compute for Python models](#specifying-compute-for-python-models). + +:::note + +This is an optional setting. If you do not configure this as shown below, we will default to the compute specified by http_path in the top level of the output section in your profile. +This is also the compute that will be used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema. + +::: + + +To take advantage of this capability, you will need to add compute blocks to your profile: + + + +```yaml + +: + target: # this is the default target + outputs: + : + type: databricks + catalog: [optional catalog name if you are using Unity Catalog] + schema: [schema name] # Required + host: [yourorg.databrickshost.com] # Required + + ### This path is used as the default compute + http_path: [/sql/your/http/path] # Required + + ### New compute section + compute: + + ### Name that you will use to refer to an alternate compute + Compute1: + http_path: [‘/sql/your/http/path’] # Required of each alternate compute + + ### A third named compute, use whatever name you like + Compute2: + http_path: [‘/some/other/path’] # Required of each alternate compute + ... + + : # additional targets + ... + ### For each target, you need to define the same compute, + ### but you can specify different paths + compute: + + ### Name that you will use to refer to an alternate compute + Compute1: + http_path: [‘/sql/your/http/path’] # Required of each alternate compute + + ### A third named compute, use whatever name you like + Compute2: + http_path: [‘/some/other/path’] # Required of each alternate compute + ... + +``` + + + +The new compute section is a map of user chosen names to objects with an http_path property. +Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. +We recommend choosing a name that is easily recognized as the compute resources you're using, such as the name of the compute resource inside the Databricks UI. + +:::note + +You need to use the same set of names for compute across your outputs, though you may supply different http_paths, allowing you to use different computes in different deployment scenarios. + +::: + +To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments: + +```yaml + +compute: + Compute1: + http_path:[`/some/other/path'] + Compute2: + http_path:[`/some/other/path'] + +``` + +### Specifying the compute for models + +As with many other configuaration options, you can specify the compute for a model in multiple ways, using `databricks_compute`. +In your `dbt_project.yml`, the selected compute can be specified for all the models in a given directory: + + + +```yaml + +... 
+ +models: + +databricks_compute: "Compute1" # use the `Compute1` warehouse/cluster for all models in the project... + my_project: + clickstream: + +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`. + +snapshots: + +databricks_compute: "Compute1" # all Snapshot models are configured to use `Compute1`. + +``` + + + +For an individual model the compute can be specified in the model config in your schema file. + + + +```yaml + +models: + - name: table_model + config: + databricks_compute: Compute1 + columns: + - name: id + data_type: int + +``` + + + + +Alternatively the warehouse can be specified in the config block of a model's SQL file. + + + +```sql + +{{ + config( + materialized='table', + databricks_compute='Compute1' + ) +}} +select * from {{ ref('seed') }} + +``` + + + + +To validate that the specified compute is being used, look for lines in your dbt.log like: + +``` +Databricks adapter ... using default compute resource. +``` + +or + +``` +Databricks adapter ... using compute resource . +``` + +### Specifying compute for Python models + +Materializing a python model requires execution of SQL as well as python. +Specifically, if your python model is incremental, the current execution pattern involves executing python to create a staging table that is then merged into your target table using SQL. +The python code needs to run on an all purpose cluster, while the SQL code can run on an all purpose cluster or a SQL Warehouse. +When you specify your `databricks_compute` for a python model, you are currently only specifying which compute to use when running the model-specific SQL. +If you wish to use a different compute for executing the python itself, you must specify an alternate `http_path` in the config for the model. Please note that declaring a separate SQL compute and a python compute for your python dbt models is optional. If you wish to do this: + + + + ```python + +def model(dbt, session): + dbt.config( + http_path="sql/protocolv1/..." + ) + +``` + + + +If your default compute is a SQL Warehouse, you will need to specify an all purpose cluster `http_path` in this way. + + ## Persisting model descriptions diff --git a/website/docs/reference/resource-configs/infer-configs.md b/website/docs/reference/resource-configs/infer-configs.md new file mode 100644 index 00000000000..c4837806935 --- /dev/null +++ b/website/docs/reference/resource-configs/infer-configs.md @@ -0,0 +1,39 @@ +--- +title: "Infer configurations" +description: "Read this guide to understand how to configure Infer with dbt." +id: "infer-configs" +--- + + +## Authentication + +To connect to Infer from your dbt instance you need to set up a correct profile in your `profiles.yml`. + +The format of this should look like this: + + + +```yaml +: + target: + outputs: + : + type: infer + url: "" + username: "" + apikey: "" + data_config: + [configuration for your underlying data warehouse] +``` + + + +### Description of Infer Profile Fields + +| Field | Required | Description | +|------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------| +| `type` | Yes | Must be set to `infer`. This must be included either in `profiles.yml` or in the `dbt_project.yml` file. | +| `url` | Yes | The host name of the Infer server to connect to. Typically this is `https://app.getinfer.io`. | +| `username` | Yes | Your Infer username - the one you use to login. 
| +| `apikey` | Yes | Your Infer api key. | +| `data_config` | Yes | The configuration for your underlying data warehouse. The format of this follows the format of the configuration for your data warehouse adapter. | diff --git a/website/docs/reference/resource-configs/meta.md b/website/docs/reference/resource-configs/meta.md index bc0c0c7c041..bd9755548ab 100644 --- a/website/docs/reference/resource-configs/meta.md +++ b/website/docs/reference/resource-configs/meta.md @@ -127,7 +127,7 @@ See [configs and properties](/reference/configs-and-properties) for details. -You can't add YAML `meta` configs for [generic tests](/docs/build/tests#generic-tests). However, you can add `meta` properties to [singular tests](/docs/build/tests#singular-tests) using `config()` at the top of the test file. +You can't add YAML `meta` configs for [generic tests](/docs/build/data-tests#generic-data-tests). However, you can add `meta` properties to [singular tests](/docs/build/data-tests#singular-data-tests) using `config()` at the top of the test file. diff --git a/website/docs/reference/resource-configs/postgres-configs.md b/website/docs/reference/resource-configs/postgres-configs.md index 97a695ee12e..fcc0d91a47c 100644 --- a/website/docs/reference/resource-configs/postgres-configs.md +++ b/website/docs/reference/resource-configs/postgres-configs.md @@ -10,16 +10,16 @@ In dbt-postgres, the following incremental materialization strategies are suppor -- `append` (default) -- `delete+insert` +- `append` (default when `unique_key` is not defined) +- `delete+insert` (default when `unique_key` is defined) -- `append` (default) +- `append` (default when `unique_key` is not defined) - `merge` -- `delete+insert` +- `delete+insert` (default when `unique_key` is defined) @@ -108,21 +108,100 @@ models: ## Materialized views -The Postgres adapter supports [materialized views](https://www.postgresql.org/docs/current/rules-materializedviews.html). -Indexes are the only configuration that is specific to `dbt-postgres`. -The remaining configuration follows the general [materialized view](/docs/build/materializations#materialized-view) configuration. -There are also some limitations that we hope to address in the next version. +The Postgres adapter supports [materialized views](https://www.postgresql.org/docs/current/rules-materializedviews.html) +with the following configuration parameters: -### Monitored configuration changes +| Parameter | Type | Required | Default | Change Monitoring Support | +|---------------------------|--------------------|----------|---------|---------------------------| +| `on_configuration_change` | `` | no | `apply` | n/a | +| [`indexes`](#indexes) | `[{}]` | no | `none` | alter | -The settings below are monitored for changes applicable to `on_configuration_change`. + -#### Indexes -Index changes (`CREATE`, `DROP`) can be applied without the need to rebuild the materialized view. -This differs from a table model, where the table needs to be dropped and re-created to update the indexes. -If the `indexes` portion of the `config` block is updated, the changes will be detected and applied -directly to the materialized view in place. 
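As a filled-in illustration of this behavior (the model and column names are hypothetical; the full configuration spec follows below):

```sql
{{ config(
    materialized='materialized_view',
    on_configuration_change='apply',
    indexes=[
        {'columns': ['customer_id'], 'unique': true, 'type': 'btree'}
    ]
) }}

select
    customer_id,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by customer_id
```

If only the `indexes` list changes later, dbt can apply the change in place rather than rebuilding the materialized view.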
+ + + + +```yaml +models: + [](/reference/resource-configs/resource-path): + [+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): materialized_view + [+](/reference/resource-configs/plus-prefix)on_configuration_change: apply | continue | fail + [+](/reference/resource-configs/plus-prefix)[indexes](#indexes): + - columns: [] + unique: true | false + type: hash | btree +``` + + + + + + + + + + +```yaml +version: 2 + +models: + - name: [] + config: + [materialized](/reference/resource-configs/materialized): materialized_view + on_configuration_change: apply | continue | fail + [indexes](#indexes): + - columns: [] + unique: true | false + type: hash | btree +``` + + + + + + + + + + +```jinja +{{ config( + [materialized](/reference/resource-configs/materialized)="materialized_view", + on_configuration_change="apply" | "continue" | "fail", + [indexes](#indexes)=[ + { + "columns": [""], + "unique": true | false, + "type": "hash" | "btree", + } + ] +) }} +``` + + + + + + + +The [`indexes`](#indexes) parameter corresponds to that of a table, as explained above. +It's worth noting that, unlike tables, dbt monitors this parameter for changes and applies the changes without dropping the materialized view. +This happens via a `DROP/CREATE` of the indexes, which can be thought of as an `ALTER` of the materialized view. + +Find more information about materialized view parameters in the Postgres docs: +- [CREATE MATERIALIZED VIEW](https://www.postgresql.org/docs/current/sql-creatematerializedview.html) + + ### Limitations @@ -138,3 +217,5 @@ If the user changes the model's config to `materialized="materialized_view"`, th The solution is to execute `DROP TABLE my_model` on the data warehouse before trying the model again. + + diff --git a/website/docs/reference/resource-configs/redshift-configs.md b/website/docs/reference/resource-configs/redshift-configs.md index 9bd127a1e1a..85b2af0c552 100644 --- a/website/docs/reference/resource-configs/redshift-configs.md +++ b/website/docs/reference/resource-configs/redshift-configs.md @@ -16,16 +16,16 @@ In dbt-redshift, the following incremental materialization strategies are suppor -- `append` (default) -- `delete+insert` - +- `append` (default when `unique_key` is not defined) +- `delete+insert` (default when `unique_key` is defined) + -- `append` (default) +- `append` (default when `unique_key` is not defined) - `merge` -- `delete+insert` +- `delete+insert` (default when `unique_key` is defined) @@ -111,40 +111,138 @@ models: ## Materialized views -The Redshift adapter supports [materialized views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html). -Redshift-specific configuration includes the typical `dist`, `sort_type`, `sort`, and `backup`. -For materialized views, there is also the `auto_refresh` setting, which allows Redshift to [automatically refresh](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-refresh.html) the materialized view for you. -The remaining configuration follows the general [materialized view](/docs/build/materializations#Materialized-View) configuration. -There are also some limitations that we hope to address in the next version. 
+The Redshift adapter supports [materialized views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html) +with the following configuration parameters: + +| Parameter | Type | Required | Default | Change Monitoring Support | +|-------------------------------------------|--------------|----------|------------------------------------------------|---------------------------| +| `on_configuration_change` | `` | no | `apply` | n/a | +| [`dist`](#using-sortkey-and-distkey) | `` | no | `even` | drop/create | +| [`sort`](#using-sortkey-and-distkey) | `[]` | no | `none` | drop/create | +| [`sort_type`](#using-sortkey-and-distkey) | `` | no | `auto` if no `sort`
`compound` if `sort` | drop/create | +| [`auto_refresh`](#auto-refresh) | `` | no | `false` | alter | +| [`backup`](#backup) | `` | no | `true` | n/a | + + + + + + + + +```yaml +models: + [](/reference/resource-configs/resource-path): + [+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): materialized_view + [+](/reference/resource-configs/plus-prefix)on_configuration_change: apply | continue | fail + [+](/reference/resource-configs/plus-prefix)[dist](#using-sortkey-and-distkey): all | auto | even | + [+](/reference/resource-configs/plus-prefix)[sort](#using-sortkey-and-distkey): | [] + [+](/reference/resource-configs/plus-prefix)[sort_type](#using-sortkey-and-distkey): auto | compound | interleaved + [+](/reference/resource-configs/plus-prefix)[auto_refresh](#auto-refresh): true | false + [+](/reference/resource-configs/plus-prefix)[backup](#backup): true | false +``` + + + + + + + + + + +```yaml +version: 2 + +models: + - name: [] + config: + [materialized](/reference/resource-configs/materialized): materialized_view + on_configuration_change: apply | continue | fail + [dist](#using-sortkey-and-distkey): all | auto | even | + [sort](#using-sortkey-and-distkey): | [] + [sort_type](#using-sortkey-and-distkey): auto | compound | interleaved + [auto_refresh](#auto-refresh): true | false + [backup](#backup): true | false +``` + + + + + + + + + + +```jinja +{{ config( + [materialized](/reference/resource-configs/materialized)="materialized_view", + on_configuration_change="apply" | "continue" | "fail", + [dist](#using-sortkey-and-distkey)="all" | "auto" | "even" | "", + [sort](#using-sortkey-and-distkey)=[""], + [sort_type](#using-sortkey-and-distkey)="auto" | "compound" | "interleaved", + [auto_refresh](#auto-refresh)=true | false, + [backup](#backup)=true | false, +) }} +``` -### Monitored configuration changes + + + + + -The settings below are monitored for changes applicable to `on_configuration_change`. +Many of these parameters correspond to their table counterparts and have been linked above. +The parameters unique to materialized views are the [auto-refresh](#auto-refresh) and [backup](#backup) functionality, which are covered below. -#### Dist +Find more information about the [CREATE MATERIALIZED VIEW](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-create-sql-command.html) parameters in the Redshift docs. -Changes to `dist` will result in a full refresh of the existing materialized view (applied at the time of the next `dbt run` of the model). Redshift requires a materialized view to be -dropped and recreated to apply a change to the `distkey` or `diststyle`. +#### Auto-refresh -#### Sort type, sort +| Parameter | Type | Required | Default | Change Monitoring Support | +|----------------|-------------|----------|---------|---------------------------| +| `auto_refresh` | `` | no | `false` | alter | -Changes to `sort_type` or `sort` will result in a full refresh. Redshift requires a materialized -view to be dropped and recreated to apply a change to the `sortkey` or `sortstyle`. +Redshift supports [automatic refresh](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-refresh.html#materialized-view-auto-refresh) configuration for materialized views. +By default, a materialized view does not automatically refresh. +dbt monitors this parameter for changes and applies them using an `ALTER` statement. 
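A short sketch of enabling auto-refresh on a single model (the `stg_events` relation is hypothetical):

```sql
{{ config(
    materialized='materialized_view',
    on_configuration_change='apply',
    auto_refresh=true
) }}

select
    event_type,
    count(*) as event_count
from {{ ref('stg_events') }}
group by event_type
```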
+ +Learn more information about the [parameters](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-create-sql-command.html#mv_CREATE_MATERIALIZED_VIEW-parameters) in the Redshift docs. #### Backup -Changes to `backup` will result in a full refresh. Redshift requires a materialized -view to be dropped and recreated to apply a change to the `backup` setting. +| Parameter | Type | Required | Default | Change Monitoring Support | +|-----------|-------------|----------|---------|---------------------------| +| `backup` | `` | no | `true` | n/a | -#### Auto refresh +Redshift supports [backup](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-snapshots.html) configuration of clusters at the object level. +This parameter identifies if the materialized view should be backed up as part of the cluster snapshot. +By default, a materialized view will be backed up during a cluster snapshot. +dbt cannot monitor this parameter as it is not queryable within Redshift. +If the value is changed, the materialized view will need to go through a `--full-refresh` in order to set it. -The `auto_refresh` setting can be updated via an `ALTER` statement. This setting effectively toggles -automatic refreshes on or off. The default setting for this config is off (`False`). If this -is the only configuration change for the materialized view, dbt will choose to apply -an `ALTER` statement instead of issuing a full refresh, +Learn more about the [parameters](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-create-sql-command.html#mv_CREATE_MATERIALIZED_VIEW-parameters) in the Redshift docs. ### Limitations +As with most data platforms, there are limitations associated with materialized views. Some worth noting include: + +- Materialized views cannot reference views, temporary tables, user-defined functions, or late-binding tables. +- Auto-refresh cannot be used if the materialized view references mutable functions, external schemas, or another materialized view. + +Find more information about materialized view limitations in Redshift's [docs](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-create-sql-command.html#mv_CREATE_MATERIALIZED_VIEW-limitations). + + + #### Changing materialization from "materialized_view" to "table" or "view" Swapping a materialized view to a table or view is not supported. @@ -157,3 +255,5 @@ If the user changes the model's config to `materialized="table"`, they will get The workaround is to execute `DROP MATERIALIZED VIEW my_mv CASCADE` on the data warehouse before trying the model again. + + diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index 30c7966ec68..aa2bf370df6 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -346,82 +346,122 @@ In the configuration format for the model SQL file: ## Dynamic tables -The Snowflake adapter supports [dynamic tables](https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table). +The Snowflake adapter supports [dynamic tables](https://docs.snowflake.com/en/user-guide/dynamic-tables-about). This materialization is specific to Snowflake, which means that any model configuration that would normally come along for the ride from `dbt-core` (e.g. as with a `view`) may not be available for dynamic tables. This gap will decrease in future patches and versions. 
While this materialization is specific to Snowflake, it very much follows the implementation of [materialized views](/docs/build/materializations#Materialized-View). In particular, dynamic tables have access to the `on_configuration_change` setting. -There are also some limitations that we hope to address in the next version. +Dynamic tables are supported with the following configuration parameters: -### Parameters +| Parameter | Type | Required | Default | Change Monitoring Support | +|----------------------------------------------------------|------------|----------|---------|---------------------------| +| `on_configuration_change` | `` | no | `apply` | n/a | +| [`target_lag`](#target-lag) | `` | yes | | alter | +| [`snowflake_warehouse`](#configuring-virtual-warehouses) | `` | yes | | alter | -Dynamic tables in `dbt-snowflake` require the following parameters: -- `target_lag` -- `snowflake_warehouse` -- `on_configuration_change` + -To learn more about each parameter and what values it can take, see -the Snowflake docs page: [`CREATE DYNAMIC TABLE: Parameters`](https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table). -### Usage + -You can create a dynamic table by editing _one_ of these files: + -- the SQL file for your model -- the `dbt_project.yml` configuration file +```yaml +models: + [](/reference/resource-configs/resource-path): + [+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): dynamic_table + [+](/reference/resource-configs/plus-prefix)on_configuration_change: apply | continue | fail + [+](/reference/resource-configs/plus-prefix)[target_lag](#target-lag): downstream | + [+](/reference/resource-configs/plus-prefix)[snowflake_warehouse](#configuring-virtual-warehouses): +``` -The following examples create a dynamic table: + - + -```sql -{{ config( - materialized = 'dynamic_table', - snowflake_warehouse = 'snowflake_warehouse', - target_lag = '10 minutes', -) }} -``` -
+ - + ```yaml +version: 2 + models: - path: - materialized: dynamic_table - snowflake_warehouse: snowflake_warehouse - target_lag: '10 minutes' + - name: [] + config: + [materialized](/reference/resource-configs/materialized): dynamic_table + on_configuration_change: apply | continue | fail + [target_lag](#target-lag): downstream | + [snowflake_warehouse](#configuring-virtual-warehouses): ``` -### Monitored configuration changes + -The settings below are monitored for changes applicable to `on_configuration_change`. -#### Target lag + -Changes to `target_lag` can be applied by running an `ALTER` statement. Refreshing is essentially -always on for dynamic tables; this setting changes how frequently the dynamic table is updated. + -#### Warehouse +```jinja +{{ config( + [materialized](/reference/resource-configs/materialized)="dynamic_table", + on_configuration_change="apply" | "continue" | "fail", + [target_lag](#target-lag)="downstream" | " seconds | minutes | hours | days", + [snowflake_warehouse](#configuring-virtual-warehouses)="", +) }} +``` + + + + + + + +Find more information about these parameters in Snowflake's [docs](https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table): -Changes to `snowflake_warehouse` can be applied via an `ALTER` statement. +### Target lag + +Snowflake allows two configuration scenarios for scheduling automatic refreshes: +- **Time-based** — Provide a value of the form ` { seconds | minutes | hours | days }`. For example, if the dynamic table needs to be updated every 30 minutes, use `target_lag='30 minutes'`. +- **Downstream** — Applicable when the dynamic table is referenced by other dynamic tables. In this scenario, `target_lag='downstream'` allows for refreshes to be controlled at the target, instead of at each layer. + +Find more information about `target_lag` in Snowflake's [docs](https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh#understanding-target-lag). ### Limitations +As with materialized views on most data platforms, there are limitations associated with dynamic tables. Some worth noting include: + +- Dynamic table SQL has a [limited feature set](https://docs.snowflake.com/en/user-guide/dynamic-tables-tasks-create#query-constructs-not-currently-supported-in-dynamic-tables). +- Dynamic table SQL cannot be updated; the dynamic table must go through a `--full-refresh` (DROP/CREATE). +- Dynamic tables cannot be downstream from: materialized views, external tables, streams. +- Dynamic tables cannot reference a view that is downstream from another dynamic table. + +Find more information about dynamic table limitations in Snowflake's [docs](https://docs.snowflake.com/en/user-guide/dynamic-tables-tasks-create#dynamic-table-limitations-and-supported-functions). + + + #### Changing materialization to and from "dynamic_table" -Swapping an already materialized model to be a dynamic table and vice versa. -The workaround is manually dropping the existing materialization in the data warehouse prior to calling `dbt run`. -Normally, re-running with the `--full-refresh` flag would resolve this, but not in this case. -This would only need to be done once as the existing object would then be a dynamic table. +Version `1.6.x` does not support altering the materialization from a non-dynamic table to a dynamic table and vice versa. +Re-running with the `--full-refresh` flag does not resolve this either. +The workaround is manually dropping the existing model in the warehouse prior to calling `dbt run`.
+This only needs to be done once for the conversion. For example, assume for the example model below, `my_model`, has already been materialized to the underlying data platform via `dbt run`. -If the user changes the model's config to `materialized="dynamic_table"`, they will get an error. +If the model config is updated to `materialized="dynamic_table"`, dbt will return an error. The workaround is to execute `DROP TABLE my_model` on the data warehouse before trying the model again. @@ -429,7 +469,7 @@ The workaround is to execute `DROP TABLE my_model` on the data warehouse before ```yaml {{ config( - materialized="table" # or any model type eg view, incremental + materialized="table" # or any model type (e.g. view, incremental) ) }} ``` @@ -437,3 +477,5 @@ The workaround is to execute `DROP TABLE my_model` on the data warehouse before + + diff --git a/website/docs/reference/resource-configs/store_failures_as.md b/website/docs/reference/resource-configs/store_failures_as.md index a9149360089..dd61030afb8 100644 --- a/website/docs/reference/resource-configs/store_failures_as.md +++ b/website/docs/reference/resource-configs/store_failures_as.md @@ -17,7 +17,7 @@ You can configure it in all the same places as `store_failures`, including singu #### Singular test -[Singular test](https://docs.getdbt.com/docs/build/tests#singular-tests) in `tests/singular/check_something.sql` file +[Singular test](https://docs.getdbt.com/docs/build/tests#singular-data-tests) in `tests/singular/check_something.sql` file ```sql {{ config(store_failures_as="table") }} @@ -29,7 +29,7 @@ where 1=0 #### Generic test -[Generic tests](https://docs.getdbt.com/docs/build/tests#generic-tests) in `models/_models.yml` file +[Generic tests](https://docs.getdbt.com/docs/build/tests#generic-data-tests) in `models/_models.yml` file ```yaml models: @@ -70,7 +70,7 @@ As with most other configurations, `store_failures_as` is "clobbered" when appli Additional resources: -- [Test configurations](/reference/test-configs#related-documentation) -- [Test-specific configurations](/reference/test-configs#test-specific-configurations) +- [Data test configurations](/reference/data-test-configs#related-documentation) +- [Data test-specific configurations](/reference/data-test-configs#test-data-specific-configurations) - [Configuring directories of models in dbt_project.yml](/reference/model-configs#configuring-directories-of-models-in-dbt_projectyml) - [Config inheritance](/reference/configs-and-properties#config-inheritance) \ No newline at end of file diff --git a/website/docs/reference/resource-properties/columns.md b/website/docs/reference/resource-properties/columns.md index ff8aa8734c6..74727977feb 100644 --- a/website/docs/reference/resource-properties/columns.md +++ b/website/docs/reference/resource-properties/columns.md @@ -28,7 +28,7 @@ models: data_type: [description](/reference/resource-properties/description): [quote](/reference/resource-properties/quote): true | false - [tests](/reference/resource-properties/tests): ... + [tests](/reference/resource-properties/data-tests): ... [tags](/reference/resource-configs/tags): ... [meta](/reference/resource-configs/meta): ... - name: @@ -55,7 +55,7 @@ sources: [description](/reference/resource-properties/description): data_type: [quote](/reference/resource-properties/quote): true | false - [tests](/reference/resource-properties/tests): ... + [tests](/reference/resource-properties/data-tests): ... [tags](/reference/resource-configs/tags): ... [meta](/reference/resource-configs/meta): ... 
- name: @@ -81,7 +81,7 @@ seeds: [description](/reference/resource-properties/description): data_type: [quote](/reference/resource-properties/quote): true | false - [tests](/reference/resource-properties/tests): ... + [tests](/reference/resource-properties/data-tests): ... [tags](/reference/resource-configs/tags): ... [meta](/reference/resource-configs/meta): ... - name: @@ -106,7 +106,7 @@ snapshots: [description](/reference/resource-properties/description): data_type: [quote](/reference/resource-properties/quote): true | false - [tests](/reference/resource-properties/tests): ... + [tests](/reference/resource-properties/data-tests): ... [tags](/reference/resource-configs/tags): ... [meta](/reference/resource-configs/meta): ... - name: diff --git a/website/docs/reference/resource-properties/config.md b/website/docs/reference/resource-properties/config.md index 55d2f64d9ff..89d189d8a78 100644 --- a/website/docs/reference/resource-properties/config.md +++ b/website/docs/reference/resource-properties/config.md @@ -98,7 +98,7 @@ version: 2 - [](#test_name): : config: - [](/reference/test-configs): + [](/reference/data-test-configs): ... ``` diff --git a/website/docs/reference/resource-properties/tests.md b/website/docs/reference/resource-properties/data-tests.md similarity index 83% rename from website/docs/reference/resource-properties/tests.md rename to website/docs/reference/resource-properties/data-tests.md index 0fe86ccc57d..ce557ebeb4f 100644 --- a/website/docs/reference/resource-properties/tests.md +++ b/website/docs/reference/resource-properties/data-tests.md @@ -1,8 +1,8 @@ --- -title: "About tests property" -sidebar_label: "tests" +title: "About data tests property" +sidebar_label: "Data tests" resource_types: all -datatype: test +datatype: data-test keywords: [test, tests, custom tests, custom test name, test name] --- @@ -30,7 +30,7 @@ models: - [](#test_name): : [config](/reference/resource-properties/config): - [](/reference/test-configs): + [](/reference/data-test-configs): [columns](/reference/resource-properties/columns): - name: @@ -39,7 +39,7 @@ models: - [](#test_name): : [config](/reference/resource-properties/config): - [](/reference/test-configs): + [](/reference/data-test-configs): ```
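As a concrete companion to the property layout above, here is a hedged sketch using hypothetical model and column names (`orders`, `order_id`, `status`); it applies three of the built-in generic tests described later on this page.

```yaml
version: 2

models:
  - name: orders            # hypothetical model name
    columns:
      - name: order_id      # hypothetical primary key column
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              # example accepted values; adjust to your own data
              values: ['placed', 'shipped', 'completed', 'returned']
```

Running `dbt test` should then compile and execute one query for each test defined here.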
@@ -62,7 +62,7 @@ sources: - [](#test_name): : [config](/reference/resource-properties/config): - [](/reference/test-configs): + [](/reference/data-test-configs): columns: - name: @@ -71,7 +71,7 @@ sources: - [](#test_name): : [config](/reference/resource-properties/config): - [](/reference/test-configs): + [](/reference/data-test-configs): ``` @@ -93,7 +93,7 @@ seeds: - [](#test_name): : [config](/reference/resource-properties/config): - [](/reference/test-configs): + [](/reference/data-test-configs): columns: - name: @@ -102,7 +102,7 @@ seeds: - [](#test_name): : [config](/reference/resource-properties/config): - [](/reference/test-configs): + [](/reference/data-test-configs): ``` @@ -124,7 +124,7 @@ snapshots: - [](#test_name): : [config](/reference/resource-properties/config): - [](/reference/test-configs): + [](/reference/data-test-configs): columns: - name: @@ -133,7 +133,7 @@ snapshots: - [](#test_name): : [config](/reference/resource-properties/config): - [](/reference/test-configs): + [](/reference/data-test-configs): ``` @@ -152,17 +152,17 @@ This feature is not implemented for analyses. ## Related documentation -* [Testing guide](/docs/build/tests) +* [Data testing guide](/docs/build/data-tests) ## Description -The `tests` property defines assertions about a column, , or . The property contains a list of [generic tests](/docs/build/tests#generic-tests), referenced by name, which can include the four built-in generic tests available in dbt. For example, you can add tests that ensure a column contains no duplicates and zero null values. Any arguments or [configurations](/reference/test-configs) passed to those tests should be nested below the test name. +The data `tests` property defines assertions about a column, , or . The property contains a list of [generic tests](/docs/build/data-tests#generic-data-tests), referenced by name, which can include the four built-in generic tests available in dbt. For example, you can add tests that ensure a column contains no duplicates and zero null values. Any arguments or [configurations](/reference/data-test-configs) passed to those tests should be nested below the test name. Once these tests are defined, you can validate their correctness by running `dbt test`. -## Out-of-the-box tests +## Out-of-the-box data tests -There are four generic tests that are available out of the box, for everyone using dbt. +There are four generic data tests that are available out of the box, for everyone using dbt. ### `not_null` @@ -262,7 +262,7 @@ The `to` argument accepts a [Relation](/reference/dbt-classes#relation) – this ## Additional examples ### Test an expression -Some tests require multiple columns, so it doesn't make sense to nest them under the `columns:` key. In this case, you can apply the test to the model (or source, seed, or snapshot) instead: +Some data tests require multiple columns, so it doesn't make sense to nest them under the `columns:` key. In this case, you can apply the data test to the model (or source, seed, or snapshot) instead: @@ -300,7 +300,7 @@ models: Check out the guide on writing a [custom generic test](/best-practices/writing-custom-generic-tests) for more information. -### Custom test name +### Custom data test name By default, dbt will synthesize a name for your generic test by concatenating: - test name (`not_null`, `unique`, etc) @@ -434,11 +434,11 @@ $ dbt test 12:48:04 Done. 
PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2 ``` -**If using [`store_failures`](/reference/resource-configs/store_failures):** dbt uses each test's name as the name of the table in which to store any failing records. If you have defined a custom name for one test, that custom name will also be used for its table of failures. You may optionally configure an [`alias`](/reference/resource-configs/alias) for the test, to separately control both the name of the test (for metadata) and the name of its database table (for storing failures). +**If using [`store_failures`](/reference/resource-configs/store_failures):** dbt uses each data test's name as the name of the table in which to store any failing records. If you have defined a custom name for one test, that custom name will also be used for its table of failures. You may optionally configure an [`alias`](/reference/resource-configs/alias) for the test, to separately control both the name of the test (for metadata) and the name of its database table (for storing failures). ### Alternative format for defining tests -When defining a generic test with several arguments and configurations, the YAML can look and feel unwieldy. If you find it easier, you can define the same test properties as top-level keys of a single dictionary, by providing the test name as `test_name` instead. It's totally up to you. +When defining a generic data test with several arguments and configurations, the YAML can look and feel unwieldy. If you find it easier, you can define the same test properties as top-level keys of a single dictionary, by providing the test name as `test_name` instead. It's totally up to you. This example is identical to the one above: diff --git a/website/docs/reference/seed-properties.md b/website/docs/reference/seed-properties.md index 85e7be21ae1..9201df65f4c 100644 --- a/website/docs/reference/seed-properties.md +++ b/website/docs/reference/seed-properties.md @@ -18,7 +18,7 @@ seeds: show: true | false [config](/reference/resource-properties/config): [](/reference/seed-configs): - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - ... # declare additional tests columns: @@ -27,7 +27,7 @@ seeds: [meta](/reference/resource-configs/meta): {} [quote](/reference/resource-properties/quote): true | false [tags](/reference/resource-configs/tags): [] - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - ... # declare additional tests diff --git a/website/docs/reference/snapshot-properties.md b/website/docs/reference/snapshot-properties.md index 301747e9325..8f01fd8e988 100644 --- a/website/docs/reference/snapshot-properties.md +++ b/website/docs/reference/snapshot-properties.md @@ -22,7 +22,7 @@ snapshots: show: true | false [config](/reference/resource-properties/config): [](/reference/snapshot-configs): - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - ... columns: @@ -31,7 +31,7 @@ snapshots: [meta](/reference/resource-configs/meta): {} [quote](/reference/resource-properties/quote): true | false [tags](/reference/resource-configs/tags): [] - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - ... # declare additional tests - ... 
# declare properties of additional columns diff --git a/website/docs/reference/source-properties.md b/website/docs/reference/source-properties.md index d107881967e..aa95a19327c 100644 --- a/website/docs/reference/source-properties.md +++ b/website/docs/reference/source-properties.md @@ -57,7 +57,7 @@ sources: [meta](/reference/resource-configs/meta): {} [identifier](/reference/resource-properties/identifier): [loaded_at_field](/reference/resource-properties/freshness#loaded_at_field): - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - ... # declare additional tests [tags](/reference/resource-configs/tags): [] @@ -80,7 +80,7 @@ sources: [description](/reference/resource-properties/description): [meta](/reference/resource-configs/meta): {} [quote](/reference/resource-properties/quote): true | false - [tests](/reference/resource-properties/tests): + [tests](/reference/resource-properties/data-tests): - - ... # declare additional tests [tags](/reference/resource-configs/tags): [] diff --git a/website/docs/terms/data-wrangling.md b/website/docs/terms/data-wrangling.md index 58034fe8e91..46a14a25949 100644 --- a/website/docs/terms/data-wrangling.md +++ b/website/docs/terms/data-wrangling.md @@ -51,7 +51,7 @@ The cleaning stage involves using different functions so that the values in your - Removing appropriate duplicates or nulls you found in the discovery process - Eliminating unnecessary characters or spaces from values -Certain cleaning steps, like removing rows with null values, are helpful to do at the beginning of the process because removing nulls and duplicates from the start can increase the performance of your downstream models. In the cleaning step, it’s important to follow a standard for your transformations here. This means you should be following a consistent naming convention for your columns (especially for your primary keys) and casting to the same timezone and datatypes throughout your models. Examples include making sure all dates are in UTC time rather than source timezone-specific, all string in either lower or upper case, etc. +Certain cleaning steps, like removing rows with null values, are helpful to do at the beginning of the process because removing nulls and duplicates from the start can increase the performance of your downstream models. In the cleaning step, it’s important to follow a standard for your transformations here. This means you should be following a consistent naming convention for your columns (especially for your primary keys) and casting to the same timezone and datatypes throughout your models. Examples include making sure all dates are in UTC time rather than source timezone-specific, all strings are in either lower or upper case, etc. :::tip dbt to the rescue! If you're struggling to do all the cleaning on your own, remember that dbt packages ([dbt expectations](https://github.com/calogica/dbt-expectations), [dbt_utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/), and [re_data](https://www.getre.io/)) and their macros are also available to help you clean up your data. @@ -150,9 +150,9 @@ For nested data types such as JSON, you’ll want to check out the JSON parsing ### Validating -dbt offers [generic tests](/docs/build/tests#more-generic-tests) in every dbt project that allows you to validate accepted, unique, and null values. They also allow you to validate the relationships between tables and that the primary key is unique. 
+dbt offers [generic data tests](/docs/build/data-tests#more-generic-data-tests) in every dbt project that allow you to validate accepted, unique, and null values. They also allow you to validate the relationships between tables and that the primary key is unique. -If you can’t find what you need with the generic tests, you can download an additional dbt testing package called [dbt_expectations](https://hub.getdbt.com/calogica/dbt_expectations/0.1.2/) that dives even deeper into how you can test the values in your columns. This package has useful tests like `expect_column_values_to_be_in_type_list`, `expect_column_values_to_be_between`, and `expect_column_value_lengths_to_equal`. +If you can’t find what you need with the generic tests, you can download an additional dbt testing package called [dbt_expectations](https://hub.getdbt.com/calogica/dbt_expectations/0.1.2/) that dives even deeper into how you can test the values in your columns. This package has useful data tests like `expect_column_values_to_be_in_type_list`, `expect_column_values_to_be_between`, and `expect_column_value_lengths_to_equal`. ## Conclusion diff --git a/website/docs/terms/primary-key.md b/website/docs/terms/primary-key.md index 4acd1e8c46d..fde3ff44ac7 100644 --- a/website/docs/terms/primary-key.md +++ b/website/docs/terms/primary-key.md @@ -108,7 +108,7 @@ In general for Redshift, it’s still good practice to define your primary keys ### Google BigQuery -BigQuery is pretty unique here in that it doesn’t support or enforce primary keys. If your team is on BigQuery, you’ll need to have some [pretty solid testing](/docs/build/tests) in place to ensure your primary key fields are unique and non-null. +BigQuery is pretty unique here in that it doesn’t support or enforce primary keys. If your team is on BigQuery, you’ll need to have some [pretty solid data testing](/docs/build/data-tests) in place to ensure your primary key fields are unique and non-null. ### Databricks @@ -141,7 +141,7 @@ If you don't have a field in your table that would act as a natural primary key, If your data warehouse doesn’t provide out-of-the box support and enforcement for primary keys, it’s important to clearly label and put your own constraints on primary key fields. This could look like: * **Creating a consistent naming convention for your primary keys**: You may see an `id` field or fields prefixed with `pk_` (ex. `pk_order_id`) to identify primary keys. You may also see the primary key be named as the obvious table grain (ex. In the jaffle shop’s `orders` table, the primary key is called `order_id`). -* **Adding automated [tests](/docs/build/tests) to your data models**: Use a data tool, such as dbt, to create not null and unique tests for your primary key fields. +* **Adding automated [data tests](/docs/build/data-tests) to your data models**: Use a data tool, such as dbt, to create not null and unique tests for your primary key fields. ## Testing primary keys diff --git a/website/docs/terms/surrogate-key.md b/website/docs/terms/surrogate-key.md index e57a0b74a7f..1c4d7f21d57 100644 --- a/website/docs/terms/surrogate-key.md +++ b/website/docs/terms/surrogate-key.md @@ -177,7 +177,7 @@ After executing this, the table would now have the `unique_id` field now uniquel Amazing, you just made a surrogate key! You can just move on to the next data model, right? No!! It’s critically important to test your surrogate keys for uniqueness and non-null values to ensure that the correct fields were chosen to create the surrogate key.
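For instance, here is a minimal sketch of those two checks in a properties file. The model name `customer_orders` is hypothetical; `unique_id` is the surrogate key column built in the example above.

```yaml
version: 2

models:
  - name: customer_orders    # hypothetical model name
    columns:
      - name: unique_id      # surrogate key column from the example above
        tests:
          - unique
          - not_null
```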
-In order to test for null and unique values you can utilize code-based tests like [dbt tests](/docs/build/tests), that can check fields for nullness and uniqueness. You can additionally utilize simple SQL queries or unit tests to check if surrogate key count and non-nullness is correct. +In order to test for null and unique values, you can utilize code-based data tests like [dbt tests](/docs/build/data-tests) that can check fields for nullness and uniqueness. You can additionally use simple SQL queries or unit tests to check that surrogate key counts and non-null values are correct. ## A note on hashing algorithms diff --git a/website/sidebars.js b/website/sidebars.js index 598fffc7f0d..8d7be07d491 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -277,7 +277,7 @@ const sidebarSettings = { }, "docs/build/snapshots", "docs/build/seeds", - "docs/build/tests", + "docs/build/data-tests", "docs/build/jinja-macros", "docs/build/sources", "docs/build/exposures", @@ -723,6 +723,7 @@ const sidebarSettings = { "reference/resource-configs/oracle-configs", "reference/resource-configs/upsolver-configs", "reference/resource-configs/starrocks-configs", + "reference/resource-configs/infer-configs", ], }, { @@ -743,7 +744,7 @@ const sidebarSettings = { "reference/resource-properties/latest_version", "reference/resource-properties/include-exclude", "reference/resource-properties/quote", - "reference/resource-properties/tests", + "reference/resource-properties/data-tests", "reference/resource-properties/versions", ], }, @@ -809,7 +810,7 @@ const sidebarSettings = { type: "category", label: "For tests", items: [ - "reference/test-configs", + "reference/data-test-configs", "reference/resource-configs/fail_calc", "reference/resource-configs/limit", "reference/resource-configs/severity", diff --git a/website/snippets/_run-result.md b/website/snippets/_run-result.md index 77a35676e86..28de3a97cb6 100644 --- a/website/snippets/_run-result.md +++ b/website/snippets/_run-result.md @@ -1,2 +1,2 @@ -- `adapter_response`: Dictionary of metadata returned from the database, which varies by adapter. For example, success `code`, number of `rows_affected`, total `bytes_processed`, and so on. Not applicable for [tests](/docs/build/tests). +- `adapter_response`: Dictionary of metadata returned from the database, which varies by adapter. For example, success `code`, number of `rows_affected`, total `bytes_processed`, and so on. Not applicable for [tests](/docs/build/data-tests). * `rows_affected` returns the number of rows modified by the last statement executed. In cases where the query's row count can't be determined or isn't applicable (such as when creating a view), a [standard value](https://peps.python.org/pep-0249/#rowcount) of `-1` is returned for `rowcount`. diff --git a/website/snippets/_sl-faqs.md b/website/snippets/_sl-faqs.md index 5bc556ae00a..def8f3837f6 100644 --- a/website/snippets/_sl-faqs.md +++ b/website/snippets/_sl-faqs.md @@ -10,6 +10,11 @@ As we refine MetricFlow’s API layers, some users may find it easier to set up their own custom service layers for managing query requests. This is not currently recommended, as the API boundaries around MetricFlow are not sufficiently well-defined for broad-based community use +- **Why is my query limited to 100 rows in the dbt Cloud CLI?** +- The default `limit` for queries issued from the dbt Cloud CLI is 100 rows.
We set this default to prevent returning unnecessarily large data sets as the dbt Cloud CLI is typically used to query the dbt Semantic Layer during the development process, not for production reporting or to access large data sets. For most workflows, you only need to return a subset of the data. + + However, you can change this limit if needed by setting the `--limit` option in your query. For example, to return 1000 rows, you can run `dbt sl list metrics --limit 1000`. + - **Can I reference MetricFlow queries inside dbt models?** - dbt relies on Jinja macros to compile SQL, while MetricFlow is Python-based and does direct SQL rendering targeting at a specific dialect. MetricFlow does not support pass-through rendering of Jinja macros, so we can’t easily reference MetricFlow queries inside of dbt models. diff --git a/website/snippets/tutorial-add-tests-to-models.md b/website/snippets/tutorial-add-tests-to-models.md index 491fc72ba85..f743c2bf947 100644 --- a/website/snippets/tutorial-add-tests-to-models.md +++ b/website/snippets/tutorial-add-tests-to-models.md @@ -1,4 +1,4 @@ -Adding [tests](/docs/build/tests) to a project helps validate that your models are working correctly. +Adding [tests](/docs/build/data-tests) to a project helps validate that your models are working correctly. To add tests to your project: diff --git a/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-dag.png b/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-dag.png new file mode 100644 index 00000000000..c053e595b77 Binary files /dev/null and b/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-dag.png differ diff --git a/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-gsheets.png b/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-gsheets.png new file mode 100644 index 00000000000..b34fb11a678 Binary files /dev/null and b/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-gsheets.png differ diff --git a/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-hex.png b/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-hex.png new file mode 100644 index 00000000000..16affa42ea3 Binary files /dev/null and b/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-hex.png differ diff --git a/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-metrics-dag.png b/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-metrics-dag.png new file mode 100644 index 00000000000..ed141da4679 Binary files /dev/null and b/website/static/img/blog/2023-12-11-semantic-layer-on-semantic-layer/Screenshot-metrics-dag.png differ diff --git a/website/static/img/blog/authors/jordan.jpeg b/website/static/img/blog/authors/jordan.jpeg new file mode 100644 index 00000000000..16252e3161f Binary files /dev/null and b/website/static/img/blog/authors/jordan.jpeg differ diff --git a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg b/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg index 1239ea784de..04ec9280f14 100644 Binary files a/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg and b/website/static/img/docs/dbt-cloud/using-dbt-cloud/prod-settings.jpg differ diff --git a/website/static/img/guides/orchestration/airflow-and-dbt-cloud/connection-type-configured.png 
b/website/static/img/guides/orchestration/airflow-and-dbt-cloud/connection-type-configured.png new file mode 100644 index 00000000000..f77d3e188fc Binary files /dev/null and b/website/static/img/guides/orchestration/airflow-and-dbt-cloud/connection-type-configured.png differ diff --git a/website/vercel.json b/website/vercel.json index 3377b49278d..5cdc2656948 100644 --- a/website/vercel.json +++ b/website/vercel.json @@ -2,6 +2,21 @@ "cleanUrls": true, "trailingSlash": false, "redirects": [ + { + "source": "/reference/test-configs", + "destination": "/reference/data-test-configs", + "permanent": true + }, + { + "source": "/reference/resource-properties/tests", + "destination": "/reference/resource-properties/data-tests", + "permanent": true + }, + { + "source": "/docs/build/tests", + "destination": "/docs/build/data-tests", + "permanent": true + }, { "source": "/docs/cloud/dbt-cloud-ide", "destination": "/docs/cloud/dbt-cloud-ide/develop-in-the-cloud",