diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 309872dd818..0534dd916cb 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -1,6 +1,6 @@ ## What are you changing in this pull request and why? @@ -16,11 +16,8 @@ Uncomment if you're publishing docs for a prerelease version of dbt (delete if n - [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content). - [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch." -Adding new pages (delete if not applicable): -- [ ] Add page to `website/sidebars.js` -- [ ] Provide a unique filename for the new page - -Removing or renaming existing pages (delete if not applicable): -- [ ] Remove page from `website/sidebars.js` -- [ ] Add an entry `website/static/_redirects` -- [ ] Run link testing locally with `npm run build` to update the links that point to the deleted page +Adding or removing pages (delete if not applicable): +- [ ] Add/remove page in `website/sidebars.js` +- [ ] Provide a unique filename for new pages +- [ ] Add an entry for deleted pages in `website/static/_redirects` +- [ ] Run link testing locally with `npm run build` to update the links that point to deleted pages diff --git a/website/blog/2023-12-20-partner-integration-guide.md b/website/blog/2023-12-20-partner-integration-guide.md new file mode 100644 index 00000000000..b546f258f6c --- /dev/null +++ b/website/blog/2023-12-20-partner-integration-guide.md @@ -0,0 +1,99 @@ +--- +title: "How to integrate with dbt" +description: "This guide will cover the ways to integrate with dbt Cloud" +slug: integrating-with-dbtcloud + +authors: [amy_chen] + +tags: [dbt Cloud, Integrations, APIs] +hide_table_of_contents: false + +date: 2023-12-20 +is_featured: false +--- +## Overview + +Over the course of my three years running the Partner Engineering team at dbt Labs, the most common question I've been asked is, How do we integrate with dbt? Because those conversations often start out at the same place, I decided to create this guide so I’m no longer the blocker to fundamental information. This also allows us to skip the intro and get to the fun conversations so much faster, like what a joint solution for our customers would look like. + +This guide doesn't include how to integrate with dbt Core. If you’re interested in creating a dbt adapter, please check out the [adapter development guide](https://docs.getdbt.com/guides/dbt-ecosystem/adapter-development/1-what-are-adapters) instead. + +Instead, we're going to focus on integrating with dbt Cloud. Integrating with dbt Cloud is a key requirement to become a dbt Labs technology partner, opening the door to a variety of collaborative commercial opportunities. + +Here I'll cover how to get started, potential use cases you want to solve for, and points of integrations to do so. + +## New to dbt Cloud? + +If you're new to dbt and dbt Cloud, we recommend you and your software developers try our [Getting Started Quickstarts](https://docs.getdbt.com/guides) after reading [What is dbt](https://docs.getdbt.com/docs/introduction). The documentation will help you familiarize yourself with how our users interact with dbt. By going through this, you will also create a sample dbt project to test your integration. + +If you require a partner dbt Cloud account to test on, we can upgrade an existing account or a trial account. This account may only be used for development, training, and demonstration purposes. Please contact your partner manager if you're interested and provide the account ID (provided in the URL). Our partner account includes all of the enterprise level functionality and can be provided with a signed partnerships agreement. + +## Integration points + +- [Discovery API (formerly referred to as Metadata API)](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api) + - **Overview** — This GraphQL API allows you to query the metadata that dbt Cloud generates every time you run a dbt project. We have two schemas available (environment and job level). By default, we always recommend that you integrate with the environment level schema because it contains the latest state and historical run results of all the jobs run on the dbt Cloud project. The job level will only provide you the metadata of one job, giving you only a small snapshot of part of the project. +- [Administrative (Admin) API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api) + - **Overview** — This REST API allows you to orchestrate dbt Cloud jobs runs and help you administer a dbt Cloud account. For metadata retrieval, we recommend integrating with the Discovery API instead. +- [Webhooks](https://docs.getdbt.com/docs/deploy/webhooks) + - **Overview** — Outbound webhooks can send notifications about your dbt Cloud jobs to other systems. These webhooks allow you to get the latest information about your dbt jobs in real time. +- [Semantic Layers/Metrics](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview) + - **Overview** — Our Semantic Layer is made up of two parts: metrics definitions and the ability to interactively query the dbt metrics. For more details, here is a [basic overview](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl) and [our best practices](https://docs.getdbt.com/guides/dbt-ecosystem/sl-partner-integration-guide). + - Metrics definitions can be pulled from the Discovery API (linked above) or the Semantic Layer Driver/GraphQL API. The key difference is that the Discovery API isn't able to pull the semantic graph, which provides the list of available dimensions that one can query per metric. That is only available with the SL Driver/APIs. The trade-off is that the SL Driver/APIs doesn't have access to the lineage of the entire dbt project (that is, how the dbt metrics dependencies on dbt models). + - Three integration points are available for the Semantic Layer API. + +## dbt Cloud hosting and authentication + +To use the dbt Cloud APIs, you'll need access to the customer’s access urls. Depending on their dbt Cloud setup, they'll have a different access URL. To find out more, refer to [Regions & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/regions-ip-addresses) to understand all the possible configurations. My recommendation is to allow the customer to provide their own URL to simplify support. + +If the customer is on an Azure single tenant instance, they don't currently have access to the Discovery API or the Semantic Layer APIs. + +For authentication, we highly recommend that your integration uses account service tokens. You can read more about [how to create a service token and what permission sets to provide it](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens). Please note that depending on their plan type, they'll have access to different permission sets. We _do not_ recommend that users supply their user bearer tokens for authentication. This can cause issues if the user leaves the organization and provides you access to all the dbt Cloud accounts associated to the user rather than just the account (and related projects) that they want to integrate with. + +## Potential use cases + +- Event-based orchestration + - **Desired action** — You want to receive information that a scheduled dbt Cloud job has been completed or has kicked off a dbt Cloud job. You can align your product schedule to the dbt Cloud run schedule. + - **Examples** — Kicking off a dbt job after the ETL job of extracting and loading the data is completed. Or receiving a webhook after the job has been completed to kick off your reverse ETL job. + - **Integration points** — Webhooks and/or Admin API +- dbt lineage + - **Desired action** — You want to interpolate the dbt lineage metadata into your tool. + - **Example** — In your tool, you want to pull in the dbt DAG into your lineage diagram. For details on what you could pull and how to do this, refer to [Use cases and examples for the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples). + - **Integration points** — Discovery API +- dbt environment/job metadata + - **Desired action** — You want to interpolate the dbt Cloud job information into your tool, including the status of the jobs, the status of the tables executed in the run, what tests passed, etc. + - **Example** — In your Business Intelligence tool, stakeholders select from tables that a dbt model created. You show the last time the model passed its tests/last run to show that the tables are current and can be trusted. For details on what you could pull and how to do this, refer to [What's the latest state of each model](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples#whats-the-latest-state-of-each-model). + - **Integration points** — Discovery API +- dbt model documentation + - **Desired action** — You want to interpolate the dbt project Information, including model descriptions, column descriptions, etc. + - **Example** — You want to extract the dbt model description so you can display and help the stakeholder understand what they are selecting from. This way, the creators can easily pass on the information without updating another system. For details on what you could pull and how to do this, refer to [What does this dataset and its columns mean](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples#what-does-this-dataset-and-its-columns-mean). + - **Integration points** — Discovery API + +dbt Core only users will have no access to the above integration points. For dbt metadata, oftentimes our partners will create a dbt Core integration by using the [dbt artifact](https://www.getdbt.com/product/semantic-layer/) files generated by each run and provided by the user. With the Discovery API, we are providing a dynamic way to get the latest information parsed out for you. + +## dbt Cloud plans & permissions + +[The dbt Cloud plan type](https://www.getdbt.com/pricing) will change what the user has access to. There are four different types of plans: + +- **Developer** — This is free and available to one user with a limited amount of successful models built. This plan can't access the APIs, Webhooks, or Semantic Layer and is limited to just one project. +- **Team** — This plan provides access to the APIs, webhooks, and Semantic Layer. You can have up to eight users on the account and one dbt Cloud Project. This is limited to 15,000 successful models built. +- **Enterprise** (multi-tenant/multi-cell) — This plan provides access to the APIs, webhooks, and Semantic Layer. You can have more than one dbt Cloud project based on how many dbt projects/domains they have using dbt. The majority of our enterprise customers are on multi-tenant dbt Cloud instances. +- **Enterprise** (single tenant): This plan might have access to the APIs, webhooks, and Semantic Layer. If you're working with a specific customer, let us know and we can confirm if their instance has access. + +## FAQs + +- What is a dbt Cloud project? + - A dbt Cloud project is made up of two connections: one to the Git repository and one to the data warehouse/platform. Most customers will have only one dbt Cloud project in their account but there are enterprise clients who might have more depending on their use cases. The project also encapsulates two types of environments at minimal: a development environment and deployment environment. + - Folks commonly refer to the [dbt project](https://docs.getdbt.com/docs/build/projects) as the code hosted in their Git repository. +- What is a dbt Cloud environment? + - For an overview, check out [About environments](https://docs.getdbt.com/docs/environments-in-dbt). At a minimum, a project will have one deployment type environment that they will be executing jobs on. The development environment powers the dbt Cloud IDE and Cloud CLI. +- Can we write back to the dbt project? + - At this moment, we don't have a Write API. A dbt project is hosted in a Git repository, so if you have a Git provider integration, you can manually open a pull request (PR) on the project to maintain the version control process. +- Can you provide column-level information in the lineage? + - Column-level lineage is currently in beta release with more information to come. +- How do I get a Partner Account? + - Contact your Partner Manager with your account ID (in your URL). +- Why shouldn't I use the Admin API to pull out the dbt artifacts for metadata? + - We recommend not integrating with the Admin API to extract the dbt artifacts documentation. This is because the Discovery API provides more extensive information, a user-friendly structure, and a more reliable integration point. +- How do I get access to the dbt brand assets? + - Check out our [Brand guidelines](https://www.getdbt.com/brand-guidelines/) page. Please make sure you’re not using our old logo (hint: there should only be one hole in the logo). Please also note that the name dbt and the dbt logo are trademarked by dbt Labs, and that use is governed by our brand guidelines, which are fairly specific for commercial uses. If you have any questions about proper use of our marks, please ask your partner manager. +- How do I engage with the partnerships team? + - Email partnerships@dbtlabs.com. \ No newline at end of file diff --git a/website/blog/authors.yml b/website/blog/authors.yml index cd2bd162935..a3548575b6e 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -1,6 +1,6 @@ amy_chen: image_url: /img/blog/authors/achen.png - job_title: Staff Partner Engineer + job_title: Product Ecosystem Manager links: - icon: fa-linkedin url: https://www.linkedin.com/in/yuanamychen/ diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md index ee3d4262882..e50542a446c 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md @@ -2,6 +2,8 @@ title: "Intro to MetricFlow" description: Getting started with the dbt and MetricFlow hoverSnippet: Learn how to get started with the dbt and MetricFlow +pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-2-setup" +pagination_prev: null --- Flying cars, hoverboards, and true self-service analytics: this is the future we were promised. The first two might still be a few years out, but real self-service analytics is here today. With dbt Cloud's Semantic Layer, you can resolve the tension between accuracy and flexibility that has hampered analytics tools for years, empowering everybody in your organization to explore a shared reality of metrics. Best of all for analytics engineers, building with these new tools will significantly [DRY](https://docs.getdbt.com/terms/dry) up and simplify your codebase. As you'll see, the deep interaction between your dbt models and the Semantic Layer make your dbt project the ideal place to craft your metrics. diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-2-setup.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-2-setup.md index 6e9153a3780..470445891dc 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-2-setup.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-2-setup.md @@ -2,6 +2,7 @@ title: "Set up MetricFlow" description: Getting started with the dbt and MetricFlow hoverSnippet: Learn how to get started with the dbt and MetricFlow +pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models" --- ## Getting started @@ -13,9 +14,23 @@ git clone git@github.com:dbt-labs/jaffle-sl-template.git cd path/to/project ``` -Next, before you start writing code, you need to install MetricFlow as an extension of a dbt adapter from PyPI (dbt Core users only). The MetricFlow is compatible with Python versions 3.8 through 3.11. +Next, before you start writing code, you need to install MetricFlow: -We'll use pip to install MetricFlow and our dbt adapter: + + + + +- [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) — MetricFlow commands are embedded in the dbt Cloud CLI. You can immediately run them once you install the dbt Cloud CLI. Using dbt Cloud means you won't need to manage versioning — your dbt Cloud account will automatically manage the versioning. + +- [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) — You can create metrics using MetricFlow in the dbt Cloud IDE. However, support for running MetricFlow commands in the IDE will be available soon. + + + + + +- Download MetricFlow as an extension of a dbt adapter from PyPI (dbt Core users only). The MetricFlow is compatible with Python versions 3.8 through 3.11. + - **Note**: You'll need to manage versioning between dbt Core, your adapter, and MetricFlow. +- We'll use pip to install MetricFlow and our dbt adapter: ```shell # activate a virtual environment for your project, @@ -27,13 +42,16 @@ python -m pip install "dbt-metricflow[adapter name]" # e.g. python -m pip install "dbt-metricflow[snowflake]" ``` -Lastly, to get to the pre-Semantic Layer starting state, checkout the `start-here` branch. + + + +- Now that you're ready to use MetricFlow, get to the pre-Semantic Layer starting state by checking out the `start-here` branch: ```shell git checkout start-here ``` -For more information, refer to the [MetricFlow commands](/docs/build/metricflow-commands) or a [quickstart](/guides) to get more familiar with setting up a dbt project. +For more information, refer to the [MetricFlow commands](/docs/build/metricflow-commands) or the [quickstart guides](/guides) to get more familiar with setting up a dbt project. ## Basic commands diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models.md index a2dc55e37ae..9c710b286ef 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models.md @@ -2,6 +2,7 @@ title: "Building semantic models" description: Getting started with the dbt and MetricFlow hoverSnippet: Learn how to get started with the dbt and MetricFlow +pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-4-build-metrics" --- ## How to build a semantic model diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-4-build-metrics.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-4-build-metrics.md index da83adbdc69..003eff9de40 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-4-build-metrics.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-4-build-metrics.md @@ -2,6 +2,7 @@ title: "Building metrics" description: Getting started with the dbt and MetricFlow hoverSnippet: Learn how to get started with the dbt and MetricFlow +pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-5-refactor-a-mart" --- ## How to build metrics diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-5-refactor-a-mart.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-5-refactor-a-mart.md index dfdba2941e9..9ae80cbcd29 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-5-refactor-a-mart.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-5-refactor-a-mart.md @@ -2,6 +2,7 @@ title: "Refactor an existing mart" description: Getting started with the dbt and MetricFlow hoverSnippet: Learn how to get started with the dbt and MetricFlow +pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-6-advanced-metrics" --- ## A new approach diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-6-advanced-metrics.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-6-advanced-metrics.md index fe7438b5800..e5c6e452dac 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-6-advanced-metrics.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-6-advanced-metrics.md @@ -2,6 +2,7 @@ title: "More advanced metrics" description: Getting started with the dbt and MetricFlow hoverSnippet: Learn how to get started with the dbt and MetricFlow +pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-7-conclusion" --- ## More advanced metric types diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-7-conclusion.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-7-conclusion.md index a1062721177..1870b6b77e4 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-7-conclusion.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-7-conclusion.md @@ -2,6 +2,7 @@ title: "Best practices" description: Getting started with the dbt and MetricFlow hoverSnippet: Learn how to get started with the dbt and MetricFlow +pagination_next: null --- ## Putting it all together diff --git a/website/docs/best-practices/how-we-mesh/mesh-1-intro.md b/website/docs/best-practices/how-we-mesh/mesh-1-intro.md index ba1660a8d82..819a9e04111 100644 --- a/website/docs/best-practices/how-we-mesh/mesh-1-intro.md +++ b/website/docs/best-practices/how-we-mesh/mesh-1-intro.md @@ -6,19 +6,21 @@ hoverSnippet: Learn how to get started with dbt Mesh ## What is dbt Mesh? -Organizations of all sizes rely upon dbt to manage their data transformations, from small startups to large enterprises. At scale, it can be challenging to coordinate all the organizational and technical requirements demanded by your stakeholders within the scope of a single dbt project. To date, there also hasn't been a first-class way to effectively manage the dependencies, governance, and workflows between multiple dbt projects. +Organizations of all sizes rely upon dbt to manage their data transformations, from small startups to large enterprises. At scale, it can be challenging to coordinate all the organizational and technical requirements demanded by your stakeholders within the scope of a single dbt project. -Regardless of your organization's size and complexity, dbt should empower data teams to work independently and collaboratively; sharing data, code, and best practices without sacrificing security or autonomy. dbt Mesh provides the tooling for teams to finally achieve this. +To date, there also hasn't been a first-class way to effectively manage the dependencies, governance, and workflows between multiple dbt projects. -dbt Mesh is not a single product: it is a pattern enabled by a convergence of several features in dbt: +That's where **dbt Mesh** comes in - empowering data teams to work *independently and collaboratively*; sharing data, code, and best practices without sacrificing security or autonomy. -- **[Cross-project references](/docs/collaborate/govern/project-dependencies#how-to-use-ref)** - this is the foundational feature that enables the multi-project deployments. `{{ ref() }}`s now work across dbt Cloud projects on Enterprise plans. +This guide will walk you through the concepts and implementation details needed to get started. dbt Mesh is not a single product - it is a pattern enabled by a convergence of several features in dbt: + +- **[Cross-project references](/docs/collaborate/govern/project-dependencies#how-to-write-cross-project-ref)** - this is the foundational feature that enables the multi-project deployments. `{{ ref() }}`s now work across dbt Cloud projects on Enterprise plans. - **[dbt Explorer](/docs/collaborate/explore-projects)** - dbt Cloud's metadata-powered documentation platform, complete with full, cross-project lineage. -- **Governance** - dbt's new governance features allow you to manage access to your dbt models both within and across projects. - - **[Groups](/docs/collaborate/govern/model-access#groups)** - groups allow you to assign models to subsets within a project. +- **Governance** - dbt's governance features allow you to manage access to your dbt models both within and across projects. + - **[Groups](/docs/collaborate/govern/model-access#groups)** - With groups, you can organize nodes in your dbt DAG that share a logical connection (for example, by functional area) and assign an owner to the entire group. - **[Access](/docs/collaborate/govern/model-access#access-modifiers)** - access configs allow you to control who can reference models. -- **[Model Versions](/docs/collaborate/govern/model-versions)** - when coordinating across projects and teams, we recommend treating your data models as stable APIs. Model versioning is the mechanism to allow graceful adoption and deprecation of models as they evolve. -- **[Model Contracts](/docs/collaborate/govern/model-contracts)** - data contracts set explicit expectations on the shape of the data to ensure data changes upstream of dbt or within a project's logic don't break downstream consumers' data products. + - **[Model Versions](/docs/collaborate/govern/model-versions)** - when coordinating across projects and teams, we recommend treating your data models as stable APIs. Model versioning is the mechanism to allow graceful adoption and deprecation of models as they evolve. + - **[Model Contracts](/docs/collaborate/govern/model-contracts)** - data contracts set explicit expectations on the shape of the data to ensure data changes upstream of dbt or within a project's logic don't break downstream consumers' data products. ## Who is dbt Mesh for? @@ -32,6 +34,8 @@ dbt Cloud is designed to coordinate the features above and simplify the complexi If you're just starting your dbt journey, don't worry about building a multi-project architecture right away. You can _incrementally_ adopt the features in this guide as you scale. The collection of features work effectively as independent tools. Familiarizing yourself with the tooling and features that make up a multi-project architecture, and how they can apply to your organization will help you make better decisions as you grow. +For additional information, refer to the [dbt Mesh FAQs](/best-practices/how-we-mesh/mesh-4-faqs). + ## Learning goals - Understand the **purpose and tradeoffs** of building a multi-project architecture. diff --git a/website/docs/best-practices/how-we-mesh/mesh-2-structures.md b/website/docs/best-practices/how-we-mesh/mesh-2-structures.md index 9ab633c50ad..345ef22c62d 100644 --- a/website/docs/best-practices/how-we-mesh/mesh-2-structures.md +++ b/website/docs/best-practices/how-we-mesh/mesh-2-structures.md @@ -20,7 +20,7 @@ At a high level, you’ll need to decide: ### Cycle detection -Like resource dependencies, project dependencies are acyclic, meaning they only move in one direction. This prevents `ref` cycles (or loops), which lead to issues with your data workflows. For example, if project B depends on project A, a new model in project A could not import and use a public model from project B. Refer to [Project dependencies](/docs/collaborate/govern/project-dependencies#how-to-use-ref) for more information. +Like resource dependencies, project dependencies are acyclic, meaning they only move in one direction. This prevents `ref` cycles (or loops), which lead to issues with your data workflows. For example, if project B depends on project A, a new model in project A could not import and use a public model from project B. Refer to [Project dependencies](/docs/collaborate/govern/project-dependencies#how-to-write-cross-project-ref) for more information. ## Define your project interfaces by splitting your DAG diff --git a/website/docs/best-practices/how-we-mesh/mesh-3-implementation.md b/website/docs/best-practices/how-we-mesh/mesh-3-implementation.md index 65ed5d7935b..5934c1625a3 100644 --- a/website/docs/best-practices/how-we-mesh/mesh-3-implementation.md +++ b/website/docs/best-practices/how-we-mesh/mesh-3-implementation.md @@ -127,4 +127,4 @@ We've provided a set of example projects you can use to explore the topics cover ### dbt-meshify -We recommend using the `dbt-meshify` [command line tool]() to help you do this. This comes with CLI operations to automate most of the above steps. +We recommend using the `dbt-meshify` [command line tool]() to help you do this. This comes with CLI operations to automate most of the above steps. diff --git a/website/docs/best-practices/how-we-mesh/mesh-4-faqs.md b/website/docs/best-practices/how-we-mesh/mesh-4-faqs.md new file mode 100644 index 00000000000..7119a3d90bd --- /dev/null +++ b/website/docs/best-practices/how-we-mesh/mesh-4-faqs.md @@ -0,0 +1,317 @@ +--- +title: "dbt Mesh FAQs" +description: "Read the FAQs to learn more about dbt Mesh, how it works, compatibility, and more." +hoverSnippet: "dbt Mesh FAQs" +sidebar_label: "dbt Mesh FAQs" +--- + +dbt Mesh is a new architecture enabled by dbt Cloud. It allows you to better manage complexity by deploying multiple interconnected dbt projects instead of a single large, monolithic project. It’s designed to accelerate development, without compromising governance. + +## Overview of Mesh + + + +Here are some benefits of implementing dbt Mesh: + +* **Ship data products faster**: With a more modular architecture, teams can make changes rapidly and independently in specific areas without impacting the entire system, leading to faster development cycles. +* **Improve trust in data:** Adopting dbt Mesh helps ensure that changes in one domain's data models do not unexpectedly break dependencies in other domain areas, leading to a more secure and predictable data environment. +* **Reduce complexity**: By organizing transformation logic into distinct domains, dbt Mesh reduces the complexity inherent in large, monolithic projects, making them easier to manage and understand. +* **Improve collaboration**: Teams are able to share and build upon each other's work without duplicating efforts. + +Most importantly, all this can be accomplished without the central data team losing the ability to see lineage across the entire organization, or compromising on governance mechanisms. + + + + + +dbt [model contracts](/docs/collaborate/govern/model-contracts) serve as a governance tool enabling the definition and enforcement of data structure standards in your dbt models. They allow you to specify and uphold data model guarantees, including column data types, allowing for the stability of dependent models. Should a model fail to adhere to its established contracts, it will not build successfully. + + + + + +dbt [model versions](https://docs.getdbt.com/docs/collaborate/govern/model-versions) are iterations of your dbt models made over time. In many cases, you might knowingly choose to change a model’s structure in a way that “breaks” the previous model contract, and may break downstream queries depending on that model’s structure. When you do so, creating a new version of the model is useful to signify this change. + +You can use model versions to: + +- Test "prerelease" changes (in production, in downstream systems). +- Bump the latest version, to be used as the canonical "source of truth." +- Offer a migration window off the "old" version. + + + + + +A [model access modifier](/docs/collaborate/govern/model-access) in dbt determines if a model is accessible as an input to other dbt models and projects. It specifies where a model can be referenced using [the `ref` function](/reference/dbt-jinja-functions/ref). There are three types of access modifiers: + +1. **Private:** A model with a private access modifier is only referenceable by models within the same group. This is intended for models that are implementation details and are meant to be used only within a specific group of related models. +2. **Protected:** Models with a protected access modifier can be referenced by any other model within the same dbt project or when the project is installed as a package. This is the default setting for all models, ensuring backward compatibility, especially when groups are assigned to an existing set of models. +3. **Public:** A public model can be referenced across different groups, packages, or projects. This is suitable for stable and mature models that serve as interfaces for other teams or projects. + + + + + +A [model group](/docs/collaborate/govern/model-access#groups) in dbt is a concept used to organize models under a common category or ownership. This categorization can be based on various criteria, such as the team responsible for the models or the specific data source they model. + + + + + +This is a new way of working, and the intentionality required to build, and then maintain, cross-project interfaces and dependencies may feel like a slowdown versus what some developers are used to. The intentional friction introduced promotes thoughtful changes, fostering a mindset that values stability and systematic adjustments over rapid transformations. + +Orchestration across multiple projects is also likely to be slightly more challenging for many organizations, although we’re currently developing new functionality that will make this process simpler. + + + + + +dbt Mesh allows you to better _operationalize_ data mesh by enabling decentralized, domain-specific data ownership and collaboration. + +In data mesh, each business domain is responsible for its data as a product. This is the same goal that dbt Mesh facilitates by enabling organizations to break down large, monolithic data projects into smaller, domain-specific dbt projects. Each team or domain can independently develop, maintain, and share its data models, fostering a decentralized data environment. + +dbt Mesh also enhances the interoperability and reusability of data across different domains, a key aspect of the data mesh philosophy. By allowing cross-project references and shared governance through model contracts and access controls, dbt Mesh ensures that while data ownership is decentralized, there is still a governed structure to the overall data architecture. + + + +## How dbt Mesh works + + + +Like resource dependencies, project dependencies are acyclic, meaning they only move in one direction. This prevents `ref` cycles (or loops). For example, if project B depends on project A, a new model in project A could not import and use a public model from project B. Refer to [Project dependencies](/docs/collaborate/govern/project-dependencies#how-to-use-ref) for more information. + + + + + +While it’s not currently possible to share sources across projects, it would be possible to have a shared foundational project, with staging models on top of those sources, exposed as “public” models to other teams/projects. + + + + + +This would be a breaking change for downstream consumers of that model. If the maintainers of the upstream project wish to remove the model (or “downgrade” its access modifier, effectively the same thing), they should mark that model for deprecation (using [deprecation_date](/reference/resource-properties/deprecation_date)), which will deliver a warning to all downstream consumers of that model. + +In the future, we plan for dbt Cloud to also be able to proactively flag this scenario in [continuous integration](/docs/deploy/continuous-integration) for the maintainers of the upstream public model. + + + + + +No, unless downstream projects are installed as [packages](/docs/build/packages) (source code). In that case, the models in project installed as a project become “your” models, and you can select or run them. There are cases in which this can be desirable; see docs on [project dependencies](/docs/collaborate/govern/project-dependencies). + + + + + +Yes, as long as they’re in the same data platform (BigQuery, Databricks, Redshift, Snowflake, etc.) and you have configured permissions and sharing in that data platform provider to allow this. + + + + + +Yes, because the cross-project collaboration is done using the `{{ ref() }}` macro, you can use those models from other teams in [singular tests](/docs/build/data-tests#singular-data-tests). + + + + + +Each team defines their connection to the data warehouse, and the default schema names for dbt to use when materializing datasets. + +By default, each project belonging to a team will create: + +- One schema for production runs (for example, `finance`). +- One schema per developer (for example, `dev_jerco`). + +Depending on each team’s needs, this can be customized with model-level [schema configurations](/docs/build/custom-schemas), including the ability to define different rules by environment. + + + + + +No, contracts can only be applied at the [model level](/docs/collaborate/govern/model-contracts). It is a recommended best practice to [define staging models](/best-practices/how-we-structure/2-staging) on top of sources, and it is possible to define contracts on top of those staging models. + + + + + +No. A contract applies to an entire model, including all columns in the model’s output. This is the same set of columns that a consumer would see when viewing the model’s details in Explorer, or when querying the model in the data platform. + +- If you wish to contract only a subset of columns, you can create a separate model (materialized as a view) selecting only that subset. +- If you wish to limit which rows or columns a downstream consumer can see when they query the model’s data, depending on who they are, some data platforms offer advanced capabilities around dynamic row-level access and column-level data masking. + + + + + +No, a [group](/docs/collaborate/govern/model-access#groups) can only be assigned to a single owner. However, the assigned owner can be a _team_, rather than an individual. + + + + + +Not directly, but contracts are [assigned to models](/docs/collaborate/govern/model-contracts) and models can be assigned to individual owners. You can use meta fields for this purpose. + + + + + +This is not currently possible, but something we hope to enable in the near future. If you’re interested in this functionality, please reach out to your dbt Labs account team. + + + + + +dbt Cloud will soon offer the capability to trigger jobs on the completion of another job, including a job in a different project. This offers one mechanism for executing a pipeline from start to finish across projects. + + + + + +Yes. In addition to being viewable natively through [dbt Explorer](https://www.getdbt.com/product/dbt-explorer), it is possible to view cross-project lineage connect using partner integrations with data cataloging tools. For a list of available dbt Cloud integrations, refer to the [Integrations page](https://www.getdbt.com/product/integrations). + + + + + +Tests and model contracts in dbt help eliminate the need to restate data in the first place. With these tools, you can incorporate checks at the source and output layers of your dbt projects to assess data quality in the most critical places. When there are changes in transformation logic (for example, the definition of a particular column is changed), restating the data is as easy as merging the updated code and running a dbt Cloud job. + +If a data quality issue does slip through, you also have the option of simply rolling back the git commit, and then re-running the dbt Cloud job with the old code. + + + + + +Yes, all of this metadata is accessible via the [dbt Cloud Admin API](/docs/dbt-cloud-apis/admin-cloud-api). This metadata can be fed into a monitoring tool, or used to create reports and dashboards. + +We also expose some of this information in dbt Cloud itself in [jobs](/docs/deploy/jobs), [environments](/docs/environments-in-dbt) and in [dbt Explorer](https://www.getdbt.com/product/dbt-explorer). + + + +## Permissions and access + + + +The existence of projects that have at least one public model will be visible to everyone in the organization with [read-only access](/docs/cloud/manage-access/seats-and-users). + +Private or protected models require a user to have read-only access on the specific project in order to see its existence. + + + + + +There’s model-level access within dbt, role-based access for users and groups in dbt Cloud, and access to the underlying data within the data platform. + +First things first: access to underlying data is always defined and enforced by the underlying data platform (for example, BigQuery, Databricks, Redshift, Snowflake, Starburst, etc.) This access is managed by executing “DCL statements” (namely `grant`). dbt makes it easy to [configure `grants` on models](/reference/resource-configs/grants), which provision data access for other roles/users/groups in the data warehouse. However, dbt does _not_ automatically define or coordinate those grants unless they are configured explicitly. Refer to your organization's system for managing data warehouse permissions. + +[dbt Cloud Enterprise plans](https://www.getdbt.com/pricing) support [role-based access control (RBAC)](/docs/cloud/manage-access/enterprise-permissions#how-to-set-up-rbac-groups-in-dbt-cloud) that manages granular permissions for users and user groups. You can control which users can see or edit all aspects of a dbt Cloud project. A user’s access to dbt Cloud projects also determines whether they can “explore” that project in detail. Roles, users, and groups are defined within the dbt Cloud application via the UI or by integrating with an identity provider. + +[Model access](/docs/collaborate/govern/model-access) defines where models can be referenced. It also informs the discoverability of those projects within dbt Explorer. Model `access` is defined in code, just like any other model configuration (`materialized`, `tags`, etc). + +**Public:** Models with `public` access can be referenced everywhere. These are the “data products” of your organization. + +**Protected:** Models with `protected` access can only be referenced within the same project. This is the default level of model access. +We are discussing a future extension to `protected` models to allow for their reference in _specific_ downstream projects. Please read [the GitHub issue](https://github.com/dbt-labs/dbt-core/issues/9340), and upvote/comment if you’re interested in this use case. + +**Private:** Model `groups` enable more-granular control over where `private` models can be referenced. By defining a group, and configuring models to belong to that group, you can restrict other models (not in the same group) from referencing any `private` models the group contains. Groups also provide a standard mechanism for defining the `owner` of all resources it contains. + +Within dbt Explorer, `public` models are discoverable for every user in the dbt Cloud account — every public model is listed in the “multi-project” view. By contrast, `protected` and `private` models in a project are visible only to users who have access to that project (including read-only access). + +Because dbt does not implicitly coordinate data warehouse `grants` with model-level `access`, it is possible for there to be a mismatch between them. For example, a `public` model’s metadata is viewable to all dbt Cloud users, anyone can write a `ref` to that model, but when they actually run or preview, they realize they do not have access to the underlying data in the data warehouse. **This is intentional.** In this way, your organization can retain least-privileged access to underlying data, while providing visibility and discoverability for the wider organization. Armed with the knowledge of which other “data products” (public models) exist — their descriptions, their ownership, which columns they contain — an analyst on another team can prepare a well-informed request for access to the underlying data. + + + + + +Not currently! But this is something we may evaluate for the future. + + + + + +Yes! As long as a user has permissions (at least read-only access) on all projects in a dbt Cloud account, they can navigate across the entirety of the organization’s DAG in dbt Explorer, and see models at all levels of detail. + + + + + +By default, cross-project references resolve to the “Production” deployment environment of the upstream project. If your organization has genuinely different data in production versus non-production environments, this poses an issue. + +For this reason, we will soon roll out a new canonical type of deployment environment: “Staging.” If a project defines both a “Production” environment and a “Staging” environment, then cross-project references from development and “Staging” environments will resolve to “Staging,” whereas only references coming from “Production” environments will resolve to “Production.” In this way, you are guaranteed separation of data environments, without needing to duplicate project configurations. + +If you’re interested in beta access to “Staging” environments, let your dbt Labs account representative know! + + + +## Compatibility with other features + + + +The [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl) and dbt Mesh are complementary mechanisms enabled by dbt Cloud that work together to enhance the management, usability, and governance of data in large-scale data environments. + +The Semantic Layer in dbt Cloud allows teams to centrally define business metrics and dimensions. It ensures consistent and reliable metric definitions across various analytics tools and platforms. + +dbt Mesh enables organizations to split their data architecture into multiple domain-specific projects, while retaining the ability to reference “public” models across projects. It is also possible to reference a “public” model from another project for the purpose of defining semantic models and metrics. Your organization can have multiple dbt projects feed into a unified semantic layer, ensuring that metrics and dimensions are consistently defined and understood across these domains. + + + + + +**[dbt Explorer](/docs/collaborate/explore-projects)** is a tool within dbt Cloud that serves as a knowledge base and lineage visualization platform. It provides a comprehensive view of your dbt assets, including models, tests, sources, and their interdependencies. + +Used in conjunction with dbt Mesh, dbt Explorer becomes a powerful tool for visualizing and understanding the relationships and dependencies between models across multiple dbt projects. + + + + + +The [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) allows users to develop and run dbt commands from their preferred development environments, like VS Code, Sublime Text, or terminal interfaces. This flexibility is particularly beneficial in a dbt Mesh setup, where managing multiple projects can be complex. Developers can work in their preferred tools while leveraging the centralized capabilities of dbt Cloud. + + + +## Availability + + + +Yes, your account must be on [at least dbt v1.6](/docs/dbt-versions/upgrade-core-in-cloud) to take advantage of [cross-project dependencies](/docs/collaborate/govern/project-dependencies), one of the most crucial underlying capabilities required to implement a dbt Mesh. + + + + + +While dbt Core defines several of the foundational elements for dbt Mesh, dbt Cloud offers an enhanced experience that leverages these elements for scaled collaboration across multiple teams, facilitated by multi-project discovery in dbt Explorer that’s tailored to each user’s access. + +Several key components that underpin the dbt Mesh pattern, including model contracts, versions, and access modifiers, are defined and implemented in dbt Core. We believe these are components of the core language, which is why their implementations are open source. We want to define a standard pattern that analytics engineers everywhere can adopt, extend, and help us improve. + +To reference models defined in another project, users can also leverage [packages](/docs/build/packages), a longstanding feature of dbt Core. By importing an upstream project as a package, dbt will import all models defined in that project, which enables the resolution of cross-project references to those models. They can be [optionally restricted](/docs/collaborate/govern/model-access#how-do-i-restrict-access-to-models-defined-in-a-package) to just the models with `public` access. + +The major distinction comes with dbt Cloud's metadata service, which is unique to the dbt Cloud platform and allows for the resolution of references to only the public models in a project. This service enables users to take dependencies on upstream projects, and reference just their `public` models, *without* needing to load the full complexity of those upstream projects into their local development environment. + + + + + +Yes, a [dbt Cloud Enterprise](https://www.getdbt.com/pricing) plan is required to set up multiple projects and reference models across them. + + + +## Tips on implementing dbt Mesh + + + +Refer to our developer guide on [How we structure our dbt Mesh projects](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro). You may also be interested in watching the recording of this talk from Coalesce 2023: [Unlocking model governance and multi-project deployments with dbt-meshify](https://www.youtube.com/watch?v=FAsY0Qx8EyU). + + + + + +`dbt-meshify` is a [CLI tool](https://github.com/dbt-labs/dbt-meshify) that automates the creation of model governance and cross-project lineage features introduced in dbt-core v1.5 and v1.6. This package will leverage your dbt project metadata to create and/or edit the files in your project to properly configure the models in your project with these features. + + + + +Let’s say your organization has fewer than 500 models and fewer than a dozen regular contributors to dbt. You're operating at a scale well served by the monolith (a single project), and the larger pattern of dbt Mesh probably won't provide any immediate benefits. + +It’s never too early to think about how you’re organizing models _within_ that project. Use model `groups` to define clear ownership boundaries and `private` access to restrict purpose-built models from becoming load-bearing blocks in an unrelated section of the DAG. Your future selves will thank you for defining these interfaces, especially if you reach a scale where it makes sense to “graduate” the interfaces between `groups` into boundaries between projects. + + diff --git a/website/docs/docs/build/cumulative-metrics.md b/website/docs/docs/build/cumulative-metrics.md index 45a136df751..94ed762d1f0 100644 --- a/website/docs/docs/build/cumulative-metrics.md +++ b/website/docs/docs/build/cumulative-metrics.md @@ -31,8 +31,8 @@ metrics: label: The value that will be displayed in downstream tools # Required type_params: # Required measure: The measure you are referencing # Required - window: The accumulation window, such as 1 month, 7 days, 1 year. # Optional. Can not be used with window. - grain_to_date: Sets the accumulation grain, such as month will accumulate data for one month, then restart at the beginning of the next. # Optional. Cannot be used with grain_to_date + window: The accumulation window, such as 1 month, 7 days, 1 year. # Optional. Cannot be used with grain_to_date + grain_to_date: Sets the accumulation grain, such as month will accumulate data for one month, then restart at the beginning of the next. # Optional. Cannot be used with window ``` diff --git a/website/docs/docs/build/metricflow-commands.md b/website/docs/docs/build/metricflow-commands.md index e3bb93da964..a0964269e68 100644 --- a/website/docs/docs/build/metricflow-commands.md +++ b/website/docs/docs/build/metricflow-commands.md @@ -17,15 +17,16 @@ MetricFlow is compatible with Python versions 3.8, 3.9, 3.10, and 3.11. MetricFlow is a dbt package that allows you to define and query metrics in your dbt project. You can use MetricFlow to query metrics in your dbt project in the dbt Cloud CLI, dbt Cloud IDE, or dbt Core. -**Note** — MetricFlow commands aren't supported in dbt Cloud jobs yet. However, you can add MetricFlow validations with your git provider (such as GitHub Actions) by installing MetricFlow (`python -m pip install metricflow`). This allows you to run MetricFlow commands as part of your continuous integration checks on PRs. +Using MetricFlow with dbt Cloud means you won't need to manage versioning — your dbt Cloud account will automatically manage the versioning. + +**dbt Cloud jobs** — MetricFlow commands aren't supported in dbt Cloud jobs yet. However, you can add MetricFlow validations with your git provider (such as GitHub Actions) by installing MetricFlow (`python -m pip install metricflow`). This allows you to run MetricFlow commands as part of your continuous integration checks on PRs. -MetricFlow commands are embedded in the dbt Cloud CLI, which means you can immediately run them once you install the dbt Cloud CLI. - -A benefit to using the dbt Cloud is that you won't need to manage versioning — your dbt Cloud account will automatically manage the versioning. +- MetricFlow commands are embedded in the dbt Cloud CLI. This means you can immediately run them once you install the dbt Cloud CLI and don't need to install MetricFlow separately. +- You don't need to manage versioning — your dbt Cloud account will automatically manage the versioning for you. @@ -35,7 +36,7 @@ A benefit to using the dbt Cloud is that you won't need to manage versioning &md You can create metrics using MetricFlow in the dbt Cloud IDE. However, support for running MetricFlow commands in the IDE will be available soon. ::: -A benefit to using the dbt Cloud is that you won't need to manage versioning — your dbt Cloud account will automatically manage the versioning. + diff --git a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md index 5f1c4cae725..c265529fb49 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md @@ -42,10 +42,12 @@ alter user jsmith set rsa_public_key='MIIBIjANBgkqh...'; ``` 2. Finally, set the **Private Key** and **Private Key Passphrase** fields in the **Credentials** page to finish configuring dbt Cloud to authenticate with Snowflake using a key pair. - - **Note:** At this time ONLY Encrypted Private Keys are supported by dbt Cloud, and the keys must be of size 4096 or smaller. -3. To successfully fill in the Private Key field, you **must** include commented lines when you add the passphrase. Leaving the **Private Key Passphrase** field empty will return an error. If you're receiving a `Could not deserialize key data` or `JWT token` error, refer to [Troubleshooting](#troubleshooting) for more info. +**Note:** Unencrypted private keys are permitted. Use a passphrase only if needed. +As of dbt version 1.5.0, you can use a `private_key` string in place of `private_key_path`. This `private_key` string can be either Base64-encoded DER format for the key bytes or plain-text PEM format. For more details on key generation, refer to the [Snowflake documentation](https://community.snowflake.com/s/article/How-to-configure-Snowflake-key-pair-authentication-fields-in-dbt-connection). + + +4. To successfully fill in the Private Key field, you _must_ include commented lines. If you receive a `Could not deserialize key data` or `JWT token` error, refer to [Troubleshooting](#troubleshooting) for more info. **Example:** diff --git a/website/docs/docs/cloud/dbt-cloud-ide/keyboard-shortcuts.md b/website/docs/docs/cloud/dbt-cloud-ide/keyboard-shortcuts.md index 121cab68ce7..61fe47a235a 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/keyboard-shortcuts.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/keyboard-shortcuts.md @@ -13,14 +13,14 @@ Use this dbt Cloud IDE page to help you quickly reference some common operation |--------|----------------|------------------| | View a full list of editor shortcuts | Fn-F1 | Fn-F1 | | Select a file to open | Command-O | Control-O | -| Open the command palette to invoke dbt commands and actions | Command-P or Command-Shift-P | Control-P or Control-Shift-P | -| Multi-edit by selecting multiple lines | Option-click or Shift-Option-Command | Hold Alt and click | +| Close currently active editor tab | Option-W | Alt-W | | Preview code | Command-Enter | Control-Enter | | Compile code | Command-Shift-Enter | Control-Shift-Enter | -| Reveal a list of dbt functions | Enter two underscores `__` | Enter two underscores `__` | -| Toggle open the [Invocation history drawer](/docs/cloud/dbt-cloud-ide/ide-user-interface#invocation-history) located on the bottom of the IDE. | Control-backtick (or Control + `) | Control-backtick (or Ctrl + `) | -| Add a block comment to selected code. SQL files will use the Jinja syntax `({# #})` rather than the SQL one `(/* */)`.

Markdown files will use the Markdown syntax `()` | Command-Option-/ | Control-Alt-/ | -| Close the currently active editor tab | Option-W | Alt-W | +| Reveal a list of dbt functions in the editor | Enter two underscores `__` | Enter two underscores `__` | +| Open the command palette to invoke dbt commands and actions | Command-P / Command-Shift-P | Control-P / Control-Shift-P | +| Multi-edit in the editor by selecting multiple lines | Option-Click / Shift-Option-Command / Shift-Option-Click | Hold Alt and Click | +| Open the [**Invocation History Drawer**](/docs/cloud/dbt-cloud-ide/ide-user-interface#invocation-history) located at the bottom of the IDE. | Control-backtick (or Control + `) | Control-backtick (or Ctrl + `) | +| Add a block comment to the selected code. SQL files will use the Jinja syntax `({# #})` rather than the SQL one `(/* */)`.

Markdown files will use the Markdown syntax `()` | Command-Option-/ | Control-Alt-/ | ## Related docs diff --git a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md index 733ec9dbcfe..f6f2265a922 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md @@ -14,7 +14,7 @@ Linters analyze code for errors, bugs, and style issues, while formatters fix st -In the dbt Cloud IDE, you have the capability to perform linting, auto-fix, and formatting on five different file types: +In the dbt Cloud IDE, you can perform linting, auto-fix, and formatting on five different file types: - SQL — [Lint](#lint) and fix with SQLFluff, and [format](#format) with sqlfmt - YAML, Markdown, and JSON — Format with Prettier @@ -146,7 +146,7 @@ The Cloud IDE formatting integrations take care of manual tasks like code format To format your SQL code, dbt Cloud integrates with [sqlfmt](http://sqlfmt.com/), which is an uncompromising SQL query formatter that provides one way to format the SQL query and Jinja. -By default, the IDE uses sqlfmt rules to format your code, making the **Format** button available and convenient to use right away. However, if you have a file named .sqlfluff in the root directory of your dbt project, the IDE will default to SQLFluff rules instead. +By default, the IDE uses sqlfmt rules to format your code, making the **Format** button available and convenient to use immediately. However, if you have a file named .sqlfluff in the root directory of your dbt project, the IDE will default to SQLFluff rules instead. To enable sqlfmt: @@ -189,10 +189,8 @@ To format your Python code, dbt Cloud integrates with [Black](https://black.read ## FAQs -
-When should I use SQLFluff and when should I use sqlfmt? - -SQLFluff and sqlfmt are both tools used for formatting SQL code, but there are some differences that may make one preferable to the other depending on your use case.
+ +SQLFluff and sqlfmt are both tools used for formatting SQL code, but some differences may make one preferable to the other depending on your use case.
SQLFluff is a SQL code linter and formatter. This means that it analyzes your code to identify potential issues and bugs, and follows coding standards. It also formats your code according to a set of rules, which are [customizable](#customize-linting), to ensure consistent coding practices. You can also use SQLFluff to keep your SQL code well-formatted and follow styling best practices.
@@ -204,34 +202,37 @@ You can use either SQLFluff or sqlfmt depending on your preference and what work - Use sqlfmt to only have your code well-formatted without analyzing it for errors and bugs. You can use sqlfmt out of the box, making it convenient to use right away without having to configure it. -
+ -
-Can I nest .sqlfluff files? + To ensure optimal code quality, consistent code, and styles — it's highly recommended you have one main `.sqlfluff` configuration file in the root folder of your project. Having multiple files can result in various different SQL styles in your project.

However, you can customize and include an additional child `.sqlfluff` configuration file within specific subfolders of your dbt project.

By nesting a `.sqlfluff` file in a subfolder, SQLFluff will apply the rules defined in that subfolder's configuration file to any files located within it. The rules specified in the parent `.sqlfluff` file will be used for all other files and folders outside of the subfolder. This hierarchical approach allows for tailored linting rules while maintaining consistency throughout your project. Refer to [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/configuration.html#configuration-files) for more info. -
+ -
-Can I run SQLFluff commands from the terminal? + Currently, running SQLFluff commands from the terminal isn't supported. -
+ -
-Why am I unable to see the Lint or Format button? + Make sure you're on a development branch. Formatting or Linting isn't available on "main" or "read-only" branches. -
+ -
-Why is there inconsistent SQLFluff behavior when running outside the dbt Cloud IDE (such as a GitHub Action)? -— Double-check your SQLFluff version matches the one in dbt Cloud IDE (found in the Code Quality tab after a lint operation).

-— If your lint operation passes despite clear rule violations, confirm you're not linting models with ephemeral models. Linting doesn't support ephemeral models in dbt v1.5 and lower. -
+ +- Double-check that your SQLFluff version matches the one in dbt Cloud IDE (found in the Code Quality tab after a lint operation).

+- If your lint operation passes despite clear rule violations, confirm you're not linting models with ephemeral models. Linting doesn't support ephemeral models in dbt v1.5 and lower. +
+ + +Currently, the dbt Cloud IDE can lint or fix files up to a certain size and complexity. If you attempt to lint or fix files that are too large, taking more than 60 seconds for the dbt Cloud backend to process, you will see an 'Unable to complete linting this file' error. + +To avoid this, break up your model into smaller models (files) so that they are less complex to lint or fix. Note that linting is simpler than fixing so there may be cases where a file can be linted but not fixed. + + ## Related docs diff --git a/website/docs/docs/cloud/migration.md b/website/docs/docs/cloud/migration.md new file mode 100644 index 00000000000..0c43a287bbe --- /dev/null +++ b/website/docs/docs/cloud/migration.md @@ -0,0 +1,45 @@ +--- +title: "Multi-cell migration checklist" +id: migration +description: "Prepare for account migration to AWS cell-based architecture." +pagination_next: null +pagination_prev: null +--- + +dbt Labs is in the process of migrating dbt Cloud to a new _cell-based architecture_. This architecture will be the foundation of dbt Cloud for years to come, and will bring improved scalability, reliability, and security to all customers and users of dbt Cloud. + +There is some preparation required to ensure a successful migration. + +Migrations are being scheduled on a per-account basis. _If you haven't received any communication (either with a banner or by email) about a migration date, you don't need to take any action at this time._ dbt Labs will share migration date information with you, with appropriate advance notice, before we complete any migration steps in the dbt Cloud backend. + +This document outlines the steps that you must take to prevent service disruptions before your environment is migrated over to the cell-based architecture. This will impact areas such as login, IP restrictions, and API access. + +## Pre-migration checklist + +Prior to your migration date, your dbt Cloud account admin will need to make some changes to your account. + +If your account is scheduled for migration, you will see a banner indicating your migration date when you log in. If you don't see a banner, you don't need to take any action. + +1. **IP addresses** — dbt Cloud will be using new IPs to access your warehouse after the migration. Make sure to allow inbound traffic from these IPs in your firewall and include it in any database grants. All six of the IPs below should be added to allowlists. + * Old IPs: `52.45.144.63`, `54.81.134.249`, `52.22.161.231` + * New IPs: `52.3.77.232`, `3.214.191.130`, `34.233.79.135` +2. **APIs and integrations** — Each dbt Cloud account will be allocated a static access URL like: `aa000.us1.dbt.com`. You should begin migrating your API access and partner integrations to use the new static subdomain as soon as possible. You can find your access URL on: + * Any page where you generate or manage API tokens. + * The **Account Settings** > **Account page**. + + :::important Multiple account access + Be careful, each account that you have access to will have a different, dedicated [access URL](https://next.docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses#accessing-your-account). + ::: + +3. **IDE sessions** — Any uncommitted changes in the IDE might be lost during the migration process. dbt Labs _strongly_ encourages you to commit all changes in the IDE before your scheduled migration time. +4. **User invitations** — Any pending user invitations will be invalidated during the migration. You can resend the invitations once the migration is complete. +5. **Git integrations** — Integrations with GitHub, GitLab, and Azure DevOps will need to be manually updated. dbt Labs will not be migrating any accounts using these integrations at this time. If you're using one of these integrations and your account is scheduled for migration, please contact support and we will delay your migration. +6. **SSO integrations** — Integrations with SSO identity providers (IdPs) will need to be manually updated. dbt Labs will not be migrating any accounts using SSO at this time. If you're using one of these integrations and your account is scheduled for migration, please contact support and we will delay your migration. + +## Post-migration + +After migration, if you completed all the [Pre-migration checklist](#pre-migration-checklist) items, your dbt Cloud resources and jobs will continue to work as they did before. + +You have the option to log in to dbt Cloud at a different URL: + * If you were previously logging in at `cloud.getdbt.com`, you should instead plan to login at `us1.dbt.com`. The original URL will still work, but you’ll have to click through to be redirected upon login. + * You may also log in directly with your account’s unique [access URL](https://next.docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses#accessing-your-account). \ No newline at end of file diff --git a/website/docs/docs/cloud/secure/about-privatelink.md b/website/docs/docs/cloud/secure/about-privatelink.md index 2134ab25cfe..731cef3f019 100644 --- a/website/docs/docs/cloud/secure/about-privatelink.md +++ b/website/docs/docs/cloud/secure/about-privatelink.md @@ -6,10 +6,11 @@ sidebar_label: "About PrivateLink" --- import SetUpPages from '/snippets/_available-tiers-privatelink.md'; +import PrivateLinkHostnameWarning from '/snippets/_privatelink-hostname-restriction.md'; -PrivateLink enables a private connection from any dbt Cloud Multi-Tenant environment to your data platform hosted on AWS using [AWS PrivateLink](https://aws.amazon.com/privatelink/) technology. PrivateLink allows dbt Cloud customers to meet security and compliance controls as it allows connectivity between dbt Cloud and your data platform without traversing the public internet. This feature is supported in most regions across NA, Europe, and Asia, but [contact us](https://www.getdbt.com/contact/) if you have questions about availability. +PrivateLink enables a private connection from any dbt Cloud Multi-Tenant environment to your data platform hosted on AWS using [AWS PrivateLink](https://aws.amazon.com/privatelink/) technology. PrivateLink allows dbt Cloud customers to meet security and compliance controls as it allows connectivity between dbt Cloud and your data platform without traversing the public internet. This feature is supported in most regions across NA, Europe, and Asia, but [contact us](https://www.getdbt.com/contact/) if you have questions about availability. ### Cross-region PrivateLink @@ -24,3 +25,5 @@ dbt Cloud supports the following data platforms for use with the PrivateLink fea - [Redshift](/docs/cloud/secure/redshift-privatelink) - [Postgres](/docs/cloud/secure/postgres-privatelink) - [VCS](/docs/cloud/secure/vcs-privatelink) + + diff --git a/website/docs/docs/collaborate/govern/model-contracts.md b/website/docs/docs/collaborate/govern/model-contracts.md index e3ea1e8c70c..09389036513 100644 --- a/website/docs/docs/collaborate/govern/model-contracts.md +++ b/website/docs/docs/collaborate/govern/model-contracts.md @@ -28,10 +28,18 @@ While this is ideal for quick and iterative development, for some models, consta ## Where are contracts supported? At present, model contracts are supported for: -- SQL models. Contracts are not yet supported for Python models. -- Models materialized as `table`, `view`, and `incremental` (with `on_schema_change: append_new_columns`). Views offer limited support for column names and data types, but not `constraints`. Contracts are not supported for `ephemeral`-materialized models. +- SQL models. +- Models materialized as one of the following: + - `table` + - `view` — Views offer limited support for column names and data types, but not `constraints`. + - `incremental` — with `on_schema_change: append_new_columns` or `on_schema_change: fail`. - Certain data platforms, but the supported and enforced `constraints` vary by platform. +Model contracts are _not_ supported for: +- Python models. +- `ephemeral`-materialized SQL models. + + ## How to define a contract Let's say you have a model with a query like: diff --git a/website/docs/docs/collaborate/govern/project-dependencies.md b/website/docs/docs/collaborate/govern/project-dependencies.md index 569d69a87e6..80dee650698 100644 --- a/website/docs/docs/collaborate/govern/project-dependencies.md +++ b/website/docs/docs/collaborate/govern/project-dependencies.md @@ -4,16 +4,16 @@ id: project-dependencies sidebar_label: "Project dependencies" description: "Reference public models across dbt projects" pagination_next: null +keyword: dbt mesh, project dependencies, ref, cross project ref, project dependencies --- :::info Available in Public Preview for dbt Cloud Enterprise accounts Project dependencies and cross-project `ref` are features available in [dbt Cloud Enterprise](https://www.getdbt.com/pricing), currently in [Public Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). -Enterprise users can use these features by designating a [public model](/docs/collaborate/govern/model-access) and adding a [cross-project ref](#how-to-use-ref). +If you have an [Enterprise account](https://www.getdbt.com/pricing), you can unlock these features by designating a [public model](/docs/collaborate/govern/model-access) and adding a [cross-project ref](#how-to-write-cross-project-ref). ::: - For a long time, dbt has supported code reuse and extension by installing other projects as [packages](/docs/build/packages). When you install another project as a package, you are pulling in its full source code, and adding it to your own. This enables you to call macros and run models defined in that other project. While this is a great way to reuse code, share utility macros, and establish a starting point for common transformations, it's not a great way to enable collaboration across teams and at scale, especially at larger organizations. @@ -80,9 +80,9 @@ When you're building on top of another team's work, resolving the references in - You don't need to mirror any conditional configuration of the upstream project such as `vars`, environment variables, or `target.name`. You can reference them directly wherever the Finance team is building their models in production. Even if the Finance team makes changes like renaming the model, changing the name of its schema, or [bumping its version](/docs/collaborate/govern/model-versions), your `ref` would still resolve successfully. - You eliminate the risk of accidentally building those models with `dbt run` or `dbt build`. While you can select those models, you can't actually build them. This prevents unexpected warehouse costs and permissions issues. This also ensures proper ownership and cost allocation for each team's models. -### How to use ref +### How to write cross-project ref -**Writing `ref`:** Models referenced from a `project`-type dependency must use [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant), including the project name: +**Writing `ref`:** Models referenced from a `project`-type dependency must use [two-argument `ref`](/reference/dbt-jinja-functions/ref#ref-project-specific-models), including the project name: diff --git a/website/docs/docs/core/connect-data-platform/dremio-setup.md b/website/docs/docs/core/connect-data-platform/dremio-setup.md index 839dd8cffa8..21d0ee2956b 100644 --- a/website/docs/docs/core/connect-data-platform/dremio-setup.md +++ b/website/docs/docs/core/connect-data-platform/dremio-setup.md @@ -15,12 +15,6 @@ meta: config_page: '/reference/resource-configs/no-configs' --- -:::info Vendor plugin - -Some core functionality may be limited. If you're interested in contributing, check out the source code for each repository listed below. - -::: - import SetUpPages from '/snippets/_setup-pages-intro.md'; diff --git a/website/docs/docs/core/connect-data-platform/snowflake-setup.md b/website/docs/docs/core/connect-data-platform/snowflake-setup.md index 2b426ef667b..2ab5e64e36a 100644 --- a/website/docs/docs/core/connect-data-platform/snowflake-setup.md +++ b/website/docs/docs/core/connect-data-platform/snowflake-setup.md @@ -98,7 +98,8 @@ Along with adding the `authenticator` parameter, be sure to run `alter account s ### Key Pair Authentication -To use key pair authentication, omit a `password` and instead provide a `private_key_path` and, optionally, a `private_key_passphrase` in your target. **Note:** Versions of dbt before 0.16.0 required that private keys were encrypted and a `private_key_passphrase` was provided. This behavior was changed in dbt v0.16.0. +To use key pair authentication, skip the `password` and provide a `private_key_path`. If needed, you can also add a `private_key_passphrase`. +**Note**: Unencrypted private keys are accepted, so add a passphrase only if necessary. Starting from [dbt v1.5.0](/docs/dbt-versions/core), you have the option to use a `private_key` string instead of a `private_key_path`. The `private_key` string should be in either Base64-encoded DER format, representing the key bytes, or a plain-text PEM format. Refer to [Snowflake documentation](https://docs.snowflake.com/developer-guide/python-connector/python-connector-example#using-key-pair-authentication-key-pair-rotation) for more info on how they generate the key. diff --git a/website/docs/docs/core/connect-data-platform/spark-setup.md b/website/docs/docs/core/connect-data-platform/spark-setup.md index 93595cea3f6..9d9e0c9d5fb 100644 --- a/website/docs/docs/core/connect-data-platform/spark-setup.md +++ b/website/docs/docs/core/connect-data-platform/spark-setup.md @@ -20,10 +20,6 @@ meta: -:::note -See [Databricks setup](#databricks-setup) for the Databricks version of this page. -::: - import SetUpPages from '/snippets/_setup-pages-intro.md'; @@ -204,6 +200,7 @@ connect_retries: 3 + ### Server side configuration Spark can be customized using [Application Properties](https://spark.apache.org/docs/latest/configuration.html). Using these properties the execution can be customized, for example, to allocate more memory to the driver process. Also, the Spark SQL runtime can be set through these properties. For example, this allows the user to [set a Spark catalogs](https://spark.apache.org/docs/latest/configuration.html#spark-sql). diff --git a/website/docs/docs/core/connect-data-platform/teradata-setup.md b/website/docs/docs/core/connect-data-platform/teradata-setup.md index 1a30a1a4a54..4f467968716 100644 --- a/website/docs/docs/core/connect-data-platform/teradata-setup.md +++ b/website/docs/docs/core/connect-data-platform/teradata-setup.md @@ -38,6 +38,7 @@ import SetUpPages from '/snippets/_setup-pages-intro.md'; |1.4.x.x | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |1.5.x | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |1.6.x | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ +|1.7.x | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ ## dbt dependent packages version compatibility @@ -45,6 +46,7 @@ import SetUpPages from '/snippets/_setup-pages-intro.md'; |--------------|------------|-------------------|----------------| | 1.2.x | 1.2.x | 0.1.0 | 0.9.x or below | | 1.6.7 | 1.6.7 | 1.1.1 | 1.1.1 | +| 1.7.0 | 1.7.3 | 1.1.1 | 1.1.1 | ### Connecting to Teradata @@ -172,6 +174,8 @@ For using cross DB macros, teradata-utils as a macro namespace will not be used, | Cross-database macros | type_string | :white_check_mark: | custom macro provided | | Cross-database macros | last_day | :white_check_mark: | no customization needed, see [compatibility note](#last_day) | | Cross-database macros | width_bucket | :white_check_mark: | no customization +| Cross-database macros | generate_series | :white_check_mark: | custom macro provided +| Cross-database macros | date_spine | :white_check_mark: | no customization #### examples for cross DB macros diff --git a/website/docs/docs/core/connect-data-platform/trino-setup.md b/website/docs/docs/core/connect-data-platform/trino-setup.md index a7dc658358f..4caa56dcb00 100644 --- a/website/docs/docs/core/connect-data-platform/trino-setup.md +++ b/website/docs/docs/core/connect-data-platform/trino-setup.md @@ -4,7 +4,7 @@ description: "Read this guide to learn about the Starburst/Trino warehouse setup id: "trino-setup" meta: maintained_by: Starburst Data, Inc. - authors: Marius Grama, Przemek Denkiewicz, Michiel de Smet + authors: Marius Grama, Przemek Denkiewicz, Michiel de Smet, Damian Owsianny github_repo: 'starburstdata/dbt-trino' pypi_package: 'dbt-trino' min_core_version: 'v0.20.0' @@ -30,7 +30,7 @@ The parameters for setting up a connection are for Starburst Enterprise, Starbur ## Host parameters -The following profile fields are always required except for `user`, which is also required unless you're using the `oauth`, `cert`, or `jwt` authentication methods. +The following profile fields are always required except for `user`, which is also required unless you're using the `oauth`, `oauth_console`, `cert`, or `jwt` authentication methods. | Field | Example | Description | | --------- | ------- | ----------- | @@ -71,6 +71,7 @@ The authentication methods that dbt Core supports are: - `jwt` — JSON Web Token (JWT) - `certificate` — Certificate-based authentication - `oauth` — Open Authentication (OAuth) +- `oauth_console` — Open Authentication (OAuth) with authentication URL printed to the console - `none` — None, no authentication Set the `method` field to the authentication method you intend to use for the connection. For a high-level introduction to authentication in Trino, see [Trino Security: Authentication types](https://trino.io/docs/current/security/authentication-types.html). @@ -85,6 +86,7 @@ Click on one of these authentication methods for further details on how to confi {label: 'JWT', value: 'jwt'}, {label: 'Certificate', value: 'certificate'}, {label: 'OAuth', value: 'oauth'}, + {label: 'OAuth (console)', value: 'oauth_console'}, {label: 'None', value: 'none'}, ]} > @@ -269,7 +271,36 @@ sandbox-galaxy: host: bunbundersders.trino.galaxy-dev.io catalog: dbt_target schema: dataders - port: 433 + port: 443 +``` + + + + + +The only authentication parameter to set for OAuth 2.0 is `method: oauth_console`. If you're using Starburst Enterprise or Starburst Galaxy, you must enable OAuth 2.0 in Starburst before you can use this authentication method. + +For more information, refer to both [OAuth 2.0 authentication](https://trino.io/docs/current/security/oauth2.html) in the Trino docs and the [README](https://github.com/trinodb/trino-python-client#oauth2-authentication) for the Trino Python client. + +The only difference between `oauth_console` and `oauth` is: +- `oauth` — An authentication URL automatically opens in a browser. +- `oauth_console` — A URL is printed to the console. + +It's recommended that you install `keyring` to cache the OAuth 2.0 token over multiple dbt invocations by running `python -m pip install 'trino[external-authentication-token-cache]'`. The `keyring` package is not installed by default. + +#### Example profiles.yml for OAuth + +```yaml +sandbox-galaxy: + target: oauth_console + outputs: + oauth: + type: trino + method: oauth_console + host: bunbundersders.trino.galaxy-dev.io + catalog: dbt_target + schema: dataders + port: 443 ``` diff --git a/website/docs/docs/core/connect-data-platform/vertica-setup.md b/website/docs/docs/core/connect-data-platform/vertica-setup.md index b1424289137..8e499d68b3e 100644 --- a/website/docs/docs/core/connect-data-platform/vertica-setup.md +++ b/website/docs/docs/core/connect-data-platform/vertica-setup.md @@ -6,7 +6,7 @@ meta: authors: 'Vertica (Former authors: Matthew Carter, Andy Regan, Andrew Hedengren)' github_repo: 'vertica/dbt-vertica' pypi_package: 'dbt-vertica' - min_core_version: 'v1.6.0 and newer' + min_core_version: 'v1.7.0' cloud_support: 'Not Supported' min_supported_version: 'Vertica 23.4.0' slack_channel_name: 'n/a' @@ -46,10 +46,12 @@ your-profile: username: [your username] password: [your password] database: [database name] + oauth_access_token: [access token] schema: [dbt schema] connection_load_balance: True backup_server_node: [list of backup hostnames or IPs] retries: [1 or more] + threads: [1 or more] target: dev ``` @@ -70,6 +72,7 @@ your-profile: | username | The username to use to connect to the server. | Yes | None | dbadmin| password |The password to use for authenticating to the server. |Yes|None|my_password| database |The name of the database running on the server. |Yes | None | my_db | +| oauth_access_token | To authenticate via OAuth, provide an OAuth Access Token that authorizes a user to the database. | No | "" | Default: "" | schema| The schema to build models into.| No| None |VMart| connection_load_balance| A Boolean value that indicates whether the connection can be redirected to a host in the database other than host.| No| True |True| backup_server_node| List of hosts to connect to if the primary host specified in the connection (host, port) is unreachable. Each item in the list should be either a host string (using default port 5433) or a (host, port) tuple. A host can be a host name or an IP address.| No| None |['123.123.123.123','www.abc.com',('123.123.123.124',5433)]| diff --git a/website/docs/docs/dbt-cloud-apis/sl-api-overview.md b/website/docs/docs/dbt-cloud-apis/sl-api-overview.md index 6644d3e4b8b..0ddbc6888db 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-api-overview.md +++ b/website/docs/docs/dbt-cloud-apis/sl-api-overview.md @@ -15,7 +15,7 @@ import DeprecationNotice from '/snippets/_sl-deprecation-notice.md'; -The rapid growth of different tools in the modern data stack has helped data professionals address the diverse needs of different teams. The downside of this growth is the fragmentation of business logic across teams, tools, and workloads. +The rapid growth of different tools in the modern data stack has helped data professionals address the diverse needs of different teams. The downside of this growth is the fragmentation of business logic across teams, tools, and workloads.
The [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl) allows you to define metrics in code (with [MetricFlow](/docs/build/about-metricflow)) and dynamically generate and query datasets in downstream tools based on their dbt governed assets, such as metrics and models. Integrating with the dbt Semantic Layer will help organizations that use your product make more efficient and trustworthy decisions with their data. It also helps you to avoid duplicative coding, optimize development workflow, ensure data governance, and guarantee consistency for data consumers. diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index 3555b211f4f..56a4ac7ba59 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -26,6 +26,8 @@ The dbt Semantic Layer GraphQL API allows you to explore and query metrics and d dbt Partners can use the Semantic Layer GraphQL API to build an integration with the dbt Semantic Layer. +Note that the dbt Semantic Layer API doesn't support `ref` to call dbt objects. Instead, use the complete qualified table name. If you're using dbt macros at query time to calculate your metrics, you should move those calculations into your Semantic Layer metric definitions as code. + ## Requirements to use the GraphQL API - A dbt Cloud project on dbt v1.6 or higher - Metrics are defined and configured diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 345be39635e..97f70902c74 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -33,6 +33,8 @@ You *may* be able to use our JDBC API with tools that do not have an official in Refer to [Get started with the dbt Semantic Layer](/docs/use-dbt-semantic-layer/quickstart-sl) for more info. +Note that the dbt Semantic Layer API doesn't support `ref` to call dbt objects. Instead, use the complete qualified table name. If you're using dbt macros at query time to calculate your metrics, you should move those calculations into your Semantic Layer metric definitions as code. + ## Authentication dbt Cloud authorizes requests to the dbt Semantic Layer API. You need to provide an environment ID, host, and [service account tokens](/docs/dbt-cloud-apis/service-tokens). diff --git a/website/docs/docs/dbt-versions/core-upgrade/01-upgrading-to-v1.6.md b/website/docs/docs/dbt-versions/core-upgrade/01-upgrading-to-v1.6.md index 33a038baa9b..f1f7a77e1e1 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/01-upgrading-to-v1.6.md +++ b/website/docs/docs/dbt-versions/core-upgrade/01-upgrading-to-v1.6.md @@ -79,7 +79,7 @@ Support for BigQuery coming soon. [**Deprecation date**](/reference/resource-properties/deprecation_date): Models can declare a deprecation date that will warn model producers and downstream consumers. This enables clear migration windows for versioned models, and provides a mechanism to facilitate removal of immature or little-used models, helping to avoid project bloat. -[Model names](/faqs/Models/unique-model-names) can be duplicated across different namespaces (projects/packages), so long as they are unique within each project/package. We strongly encourage using [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant) when referencing a model from a different package/project. +[Model names](/faqs/Models/unique-model-names) can be duplicated across different namespaces (projects/packages), so long as they are unique within each project/package. We strongly encourage using [two-argument `ref`](/reference/dbt-jinja-functions/ref#ref-project-specific-models) when referencing a model from a different package/project. More consistency and flexibility around packages. Resources defined in a package will respect variable and global macro definitions within the scope of that package. - `vars` defined in a package's `dbt_project.yml` are now available in the resolution order when compiling nodes in that package, though CLI `--vars` and the root project's `vars` will still take precedence. See ["Variable Precedence"](/docs/build/project-variables#variable-precedence) for details. diff --git a/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md b/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md new file mode 100644 index 00000000000..c0236a30783 --- /dev/null +++ b/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md @@ -0,0 +1,15 @@ +--- +title: "New: Native support for partial parsing" +description: "December 2023: For faster run times with your dbt invocations, configure dbt Cloud to parse only the changed files in your project." +sidebar_label: "New: Native support for partial parsing" +sidebar_position: 09 +tags: [Jan-2024] +date: 2024-01-03 +--- + +By default, dbt parses all the files in your project at the beginning of every dbt invocation. Depending on the size of your project, this operation can take a long time to complete. With the new partial parsing feature in dbt Cloud, you can reduce the time it takes for dbt to parse your project. When enabled, dbt Cloud parses only the changed files in your project instead of parsing all the project files. As a result, your dbt invocations will take less time to run. + +To learn more, refer to [Partial parsing](/docs/deploy/deploy-environments#partial-parsing). + + + diff --git a/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/dec-sl-updates.md b/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/dec-sl-updates.md new file mode 100644 index 00000000000..401b43fb333 --- /dev/null +++ b/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/dec-sl-updates.md @@ -0,0 +1,22 @@ +--- +title: "dbt Semantic Layer updates for December 2023" +description: "December 2023: Enhanced Tableau integration, BIGINT support, LookML to MetricFlow conversion, and deprecation of legacy features." +sidebar_label: "Update and fixes: dbt Semantic Layer" +sidebar_position: 08 +date: 2023-12-22 +--- +The dbt Labs team continues to work on adding new features, fixing bugs, and increasing reliability for the dbt Semantic Layer. The following list explains the updates and fixes for December 2023 in more detail. + +## Bug fixes + +- Tableau integration — The dbt Semantic Layer integration with Tableau now supports queries that resolve to a "NOT IN" clause. This applies to using "exclude" in the filtering user interface. Previously it wasn’t supported. +- `BIGINT` support — The dbt Semantic Layer can now support `BIGINT` values with precision greater than 18. Previously it would return an error. +- Memory leak — Fixed a memory leak in the JDBC API that would previously lead to intermittent errors when querying it. +- Data conversion support — Added support for converting various Redshift and Postgres-specific data types. Previously, the driver would throw an error when encountering columns with those types. + + +## Improvements + +- Deprecation — We deprecated [dbt Metrics and the legacy dbt Semantic Layer](/docs/dbt-versions/release-notes/Dec-2023/legacy-sl), both supported on dbt version 1.5 or lower. This change came into effect on December 15th, 2023. +- Improved dbt converter tool — The [dbt converter tool](https://github.com/dbt-labs/dbt-converter) can now help automate some of the work in converting from LookML (Looker's modeling language) for those who are migrating. Previously this wasn’t available. + diff --git a/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md b/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md index 7c35991e961..eff15e96cfd 100644 --- a/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md +++ b/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md @@ -11,4 +11,4 @@ Now available for dbt Cloud Enterprise plans is a new option to enable Git repos To learn more, refer to [Repo caching](/docs/deploy/deploy-environments#git-repository-caching). - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md b/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md index ba82234c0b5..5cf1f97ff25 100644 --- a/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md +++ b/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md @@ -27,8 +27,12 @@ Jobs scheduled at the top of the hour used to take over 106 seconds to prepare b Our enhanced scheduler offers more durability and empowers users to run jobs effortlessly. -This means Enterprise, multi-tenant accounts can now enjoy the advantages of unlimited job concurrency. Previously limited to a fixed number of run slots, Enterprise accounts now have the freedom to operate without constraints. Single-tenant support will be coming soon. Team plan customers will continue to have only 2 run slots. +This means Enterprise, multi-tenant accounts can now enjoy the advantages of unlimited job concurrency. Previously limited to a fixed number of run slots, Enterprise accounts now have the freedom to operate without constraints. Single-tenant support will be coming soon. -Something to note, each running job occupies a run slot for its duration, and if all slots are occupied, jobs will queue accordingly. +Something to note, each running job occupies a run slot for its duration, and if all slots are occupied, jobs will queue accordingly. For more feature details, refer to the [dbt Cloud pricing page](https://www.getdbt.com/pricing/). + +Note, Team accounts created after July 2023 benefit from unlimited job concurrency: +- Legacy Team accounts have a fixed number of run slots. +- Both Team and Developer plans are limited to one project each. For larger-scale needs, our [Enterprise plan](https://www.getdbt.com/pricing/) offers features such as audit logging, unlimited job concurrency and projects, and more. diff --git a/website/docs/docs/deploy/job-scheduler.md b/website/docs/docs/deploy/job-scheduler.md index fba76f677a7..7a4cd740804 100644 --- a/website/docs/docs/deploy/job-scheduler.md +++ b/website/docs/docs/deploy/job-scheduler.md @@ -31,7 +31,7 @@ Familiarize yourself with these useful terms to help you understand how the job | Over-scheduled job | A situation when a cron-scheduled job's run duration becomes longer than the frequency of the job’s schedule, resulting in a job queue that will grow faster than the scheduler can process the job’s runs. | | Prep time | The time dbt Cloud takes to create a short-lived environment to execute the job commands in the user's cloud data platform. Prep time varies most significantly at the top of the hour when the dbt Cloud Scheduler experiences a lot of run traffic. | | Run | A single, unique execution of a dbt job. | -| Run slot | Run slots control the number of jobs that can run concurrently. Developer and Team plan accounts have a fixed number of run slots, and Enterprise users have [unlimited run slots](/docs/dbt-versions/release-notes/July-2023/faster-run#unlimited-job-concurrency-for-enterprise-accounts). Each running job occupies a run slot for the duration of the run. If you need more jobs to execute in parallel, consider the [Enterprise plan](https://www.getdbt.com/pricing/) | +| Run slot | Run slots control the number of jobs that can run concurrently. Developer plans have a fixed number of run slots, while Enterprise and Team plans have [unlimited run slots](/docs/dbt-versions/release-notes/July-2023/faster-run#unlimited-job-concurrency-for-enterprise-accounts). Each running job occupies a run slot for the duration of the run.

Team and Developer plans are limited to one project each. For additional projects, consider upgrading to the [Enterprise plan](https://www.getdbt.com/pricing/).| | Threads | When dbt builds a project's DAG, it tries to parallelize the execution by using threads. The [thread](/docs/running-a-dbt-project/using-threads) count is the maximum number of paths through the DAG that dbt can work on simultaneously. The default thread count in a job is 4. | | Wait time | Amount of time that dbt Cloud waits before running a job, either because there are no available slots or because a previous run of the same job is still in progress. | diff --git a/website/docs/faqs/Models/unique-model-names.md b/website/docs/faqs/Models/unique-model-names.md index c721fca7c6e..7878a5a704c 100644 --- a/website/docs/faqs/Models/unique-model-names.md +++ b/website/docs/faqs/Models/unique-model-names.md @@ -10,7 +10,7 @@ id: unique-model-names Within one project: yes! To build dependencies between models, you need to use the `ref` function, and pass in the model name as an argument. dbt uses that model name to uniquely resolve the `ref` to a specific model. As a result, these model names need to be unique, _even if they are in distinct folders_. -A model in one project can have the same name as a model in another project (installed as a dependency). dbt uses the project name to uniquely identify each model. We call this "namespacing." If you `ref` a model with a duplicated name, it will resolve to the model within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant) to disambiguate references by specifying the namespace. +A model in one project can have the same name as a model in another project (installed as a dependency). dbt uses the project name to uniquely identify each model. We call this "namespacing." If you `ref` a model with a duplicated name, it will resolve to the model within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](/reference/dbt-jinja-functions/ref#ref-project-specific-models) to disambiguate references by specifying the namespace. Those models will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](/docs/build/custom-aliases) and [custom schemas](/docs/build/custom-schemas) for details on how to achieve this. diff --git a/website/docs/guides/create-new-materializations.md b/website/docs/guides/create-new-materializations.md index af2732c0c39..52a8594b0d2 100644 --- a/website/docs/guides/create-new-materializations.md +++ b/website/docs/guides/create-new-materializations.md @@ -13,7 +13,7 @@ recently_updated: true ## Introduction -The model materializations you're familiar with, `table`, `view`, and `incremental` are implemented as macros in a package that's distributed along with dbt. You can check out the [source code for these materializations](https://github.com/dbt-labs/dbt-core/tree/main/core/dbt/include/global_project/macros/materializations). If you need to create your own materializations, reading these files is a good place to start. Continue reading below for a deep-dive into dbt materializations. +The model materializations you're familiar with, `table`, `view`, and `incremental` are implemented as macros in a package that's distributed along with dbt. You can check out the [source code for these materializations](https://github.com/dbt-labs/dbt-core/tree/main/core/dbt/adapters/include/global_project/macros/materializations). If you need to create your own materializations, reading these files is a good place to start. Continue reading below for a deep-dive into dbt materializations. :::caution diff --git a/website/docs/guides/custom-cicd-pipelines.md b/website/docs/guides/custom-cicd-pipelines.md index bd6d7617623..1778098f752 100644 --- a/website/docs/guides/custom-cicd-pipelines.md +++ b/website/docs/guides/custom-cicd-pipelines.md @@ -511,7 +511,7 @@ This section is only for those projects that connect to their git repository usi ::: -The setup for this pipeline will use the same steps as the prior page. Before moving on, **follow steps 1-5 from the [prior page](https://docs.getdbt.com/guides/orchestration/custom-cicd-pipelines/3-dbt-cloud-job-on-merge)** +The setup for this pipeline will use the same steps as the prior page. Before moving on, follow steps 1-5 from the [prior page](https://docs.getdbt.com/guides/custom-cicd-pipelines?step=2). ### 1. Create a pipeline job that runs when PRs are created diff --git a/website/docs/guides/sl-migration.md b/website/docs/guides/sl-migration.md index 8ede40a6a2d..afa181646e3 100644 --- a/website/docs/guides/sl-migration.md +++ b/website/docs/guides/sl-migration.md @@ -25,21 +25,26 @@ dbt Labs recommends completing these steps in a local dev environment (such as t 1. Create new Semantic Model configs as YAML files in your dbt project.* 1. Upgrade the metrics configs in your project to the new spec.* 1. Delete your old metrics file or remove the `.yml` file extension so they're ignored at parse time. Remove the `dbt-metrics` package from your project. Remove any macros that reference `dbt-metrics`, like `metrics.calculate()`. Make sure that any packages you’re using don't have references to the old metrics spec. -1. Install the CLI with `python -m pip install "dbt-metricflow[your_adapter_name]"`. For example: +1. Install the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) to run MetricFlow commands and define your semantic model configurations. + - If you're using dbt Core, install the [MetricFlow CLI](/docs/build/metricflow-commands) with `python -m pip install "dbt-metricflow[your_adapter_name]"`. For example: ```bash python -m pip install "dbt-metricflow[snowflake]" ``` - **Note** - The MetricFlow CLI is not available in the IDE at this time. Support is coming soon. + **Note** - MetricFlow commands aren't yet supported in the dbt CLoud IDE at this time. -1. Run `dbt parse`. This parses your project and creates a `semantic_manifest.json` file in your target directory. MetricFlow needs this file to query metrics. If you make changes to your configs, you will need to parse your project again. -1. Run `mf list metrics` to view the metrics in your project. -1. Test querying a metric by running `mf query --metrics --group-by `. For example: +2. Run `dbt parse`. This parses your project and creates a `semantic_manifest.json` file in your target directory. MetricFlow needs this file to query metrics. If you make changes to your configs, you will need to parse your project again. +3. Run `mf list metrics` to view the metrics in your project. +4. Test querying a metric by running `mf query --metrics --group-by `. For example: ```bash mf query --metrics revenue --group-by metric_time ``` -1. Run `mf validate-configs` to run semantic and warehouse validations. This ensures your configs are valid and the underlying objects exist in your warehouse. -1. Push these changes to a new branch in your repo. +5. Run `mf validate-configs` to run semantic and warehouse validations. This ensures your configs are valid and the underlying objects exist in your warehouse. +6. Push these changes to a new branch in your repo. + +:::info `ref` not supported +The dbt Semantic Layer API doesn't support `ref` to call dbt objects. This is currently due to differences in architecture between the legacy Semantic Layer and the re-released Semantic Layer. Instead, use the complete qualified table name. If you're using dbt macros at query time to calculate your metrics, you should move those calculations into your Semantic Layer metric definitions as code. +::: **To make this process easier, dbt Labs provides a [custom migration tool](https://github.com/dbt-labs/dbt-converter) that automates these steps for you. You can find installation instructions in the [README](https://github.com/dbt-labs/dbt-converter/blob/master/README.md). Derived metrics aren’t supported in the migration tool, and will have to be migrated manually.* diff --git a/website/docs/reference/dbt-jinja-functions/debug-method.md b/website/docs/reference/dbt-jinja-functions/debug-method.md index 0938970b50c..778ad095693 100644 --- a/website/docs/reference/dbt-jinja-functions/debug-method.md +++ b/website/docs/reference/dbt-jinja-functions/debug-method.md @@ -6,9 +6,9 @@ description: "The `{{ debug() }}` macro will open an iPython debugger." --- -:::caution New in v0.14.1 +:::warning Development environment only -The `debug` macro is new in dbt v0.14.1, and is only intended to be used in a development context with dbt. Do not deploy code to production which uses the `debug` macro. +The `debug` macro is only intended to be used in a development context with dbt. Do not deploy code to production that uses the `debug` macro. ::: diff --git a/website/docs/reference/dbt-jinja-functions/ref.md b/website/docs/reference/dbt-jinja-functions/ref.md index fda5992e234..bc1f3f1ba9e 100644 --- a/website/docs/reference/dbt-jinja-functions/ref.md +++ b/website/docs/reference/dbt-jinja-functions/ref.md @@ -3,6 +3,7 @@ title: "About ref function" sidebar_label: "ref" id: "ref" description: "Read this guide to understand the builtins Jinja function in dbt." +keyword: dbt mesh, project dependencies, ref, cross project ref, project dependencies --- The most important function in dbt is `ref()`; it's impossible to build even moderately complex models without it. `ref()` is how you reference one model within another. This is a very common behavior, as typically models are built to be "stacked" on top of one another. Here is how this looks in practice: @@ -68,15 +69,19 @@ select * from {{ ref('model_name', version=1) }} select * from {{ ref('model_name') }} ``` -### Two-argument variant +### Ref project-specific models -You can also use a two-argument variant of the `ref` function. With this variant, you can pass both a namespace (project or package) and model name to `ref` to avoid ambiguity. When using two arguments with projects (not packages), you also need to set [cross project dependencies](/docs/collaborate/govern/project-dependencies). +You can also reference models from different projects using the two-argument variant of the `ref` function. By specifying both a namespace (which could be a project or package) and a model name, you ensure clarity and avoid any ambiguity in the `ref`. This is also useful when dealing with models across various projects or packages. + +When using two arguments with projects (not packages), you also need to set [cross project dependencies](/docs/collaborate/govern/project-dependencies). + +The following syntax demonstrates how to reference a model from a specific project or package: ```sql select * from {{ ref('project_or_package', 'model_name') }} ``` -We recommend using two-argument `ref` any time you are referencing a model defined in a different package or project. While not required in all cases, it's more explicit for you, for dbt, and for future readers of your code. +We recommend using two-argument `ref` any time you are referencing a model defined in a different package or project. While not required in all cases, it's more explicit for you, for dbt, and future readers of your code. diff --git a/website/docs/reference/global-configs/usage-stats.md b/website/docs/reference/global-configs/usage-stats.md index 1f9492f4a43..01465bcac2a 100644 --- a/website/docs/reference/global-configs/usage-stats.md +++ b/website/docs/reference/global-configs/usage-stats.md @@ -18,4 +18,3 @@ config: dbt Core users can also use the DO_NOT_TRACK environment variable to enable or disable sending anonymous data. For more information, see [Environment variables](/docs/build/environment-variables). `DO_NOT_TRACK=1` is the same as `DBT_SEND_ANONYMOUS_USAGE_STATS=False` -`DO_NOT_TRACK=0` is the same as `DBT_SEND_ANONYMOUS_USAGE_STATS=True` diff --git a/website/docs/reference/parsing.md b/website/docs/reference/parsing.md index 1a68ba0d476..6eed4c96af0 100644 --- a/website/docs/reference/parsing.md +++ b/website/docs/reference/parsing.md @@ -41,7 +41,7 @@ The [`PARTIAL_PARSE` global config](/reference/global-configs/parsing) can be en Parse-time attributes (dependencies, configs, and resource properties) are resolved using the parse-time context. When partial parsing is enabled, and certain context variables change, those attributes will _not_ be re-resolved, and are likely to become stale. -In particular, you may see **incorrect results** if these attributes depend on "volatile" context variables, such as [`run_started_at`](/reference/dbt-jinja-functions/run_started_at), [`invocation_id`](/reference/dbt-jinja-functions/invocation_id), or [flags](/reference/dbt-jinja-functions/flags). These variables are likely (or even guaranteed!) to change in each invocation. We _highly discourage_ you from using these variables to set parse-time attributes (dependencies, configs, and resource properties). +In particular, you may see incorrect results if these attributes depend on "volatile" context variables, such as [`run_started_at`](/reference/dbt-jinja-functions/run_started_at), [`invocation_id`](/reference/dbt-jinja-functions/invocation_id), or [flags](/reference/dbt-jinja-functions/flags). These variables are likely (or even guaranteed!) to change in each invocation. dbt Labs _strongly discourages_ you from using these variables to set parse-time attributes (dependencies, configs, and resource properties). Starting in v1.0, dbt _will_ detect changes in environment variables. It will selectively re-parse only the files that depend on that [`env_var`](/reference/dbt-jinja-functions/env_var) value. (If the env var is used in `profiles.yml` or `dbt_project.yml`, a full re-parse is needed.) However, dbt will _not_ re-render **descriptions** that include env vars. If your descriptions include frequently changing env vars (this is highly uncommon), we recommend that you fully re-parse when generating documentation: `dbt --no-partial-parse docs generate`. @@ -51,7 +51,9 @@ If certain inputs change between runs, dbt will trigger a full re-parse. The res - `dbt_project.yml` content (or `env_var` values used within) - installed packages - dbt version -- certain widely-used macros, e.g. [builtins](/reference/dbt-jinja-functions/builtins) overrides or `generate_x_name` for `database`/`schema`/`alias` +- certain widely-used macros (for example, [builtins](/reference/dbt-jinja-functions/builtins), overrides, or `generate_x_name` for `database`/`schema`/`alias`) + +If you're triggering [CI](/docs/deploy/continuous-integration) job runs, the benefits of partial parsing are not applicable to new pull requests (PR) or new branches. However, they are applied on subsequent commits to the new PR or branch. If you ever get into a bad state, you can disable partial parsing and trigger a full re-parse by setting the `PARTIAL_PARSE` global config to false, or by deleting `target/partial_parse.msgpack` (e.g. by running `dbt clean`). diff --git a/website/docs/reference/resource-configs/full_refresh.md b/website/docs/reference/resource-configs/full_refresh.md index f75fe3a583b..c7f1b799087 100644 --- a/website/docs/reference/resource-configs/full_refresh.md +++ b/website/docs/reference/resource-configs/full_refresh.md @@ -74,7 +74,7 @@ Optionally set a resource to always or never full-refresh. -This logic is encoded in the [`should_full_refresh()`](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/configs.sql#L6) macro. +This logic is encoded in the [`should_full_refresh()`](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/adapters/include/global_project/macros/materializations/configs.sql#L6) macro. ## Usage diff --git a/website/docs/reference/resource-configs/store_failures.md b/website/docs/reference/resource-configs/store_failures.md index 2c596d1cf3e..8a83809152b 100644 --- a/website/docs/reference/resource-configs/store_failures.md +++ b/website/docs/reference/resource-configs/store_failures.md @@ -12,7 +12,7 @@ Optionally set a test to always or never store its failures in the database. - If the `store_failures` config is `none` or omitted, the resource will use the value of the `--store-failures` flag. - When true, `store_failures` save all the record(s) that failed the test only if [limit](/reference/resource-configs/limit) is not set or if there are fewer records than the limit. `store_failures` are saved in a new table with the name of the test. By default, `store_failures` use a schema named `dbt_test__audit`, but, you can [configure](/reference/resource-configs/schema#tests) the schema to a different value. -This logic is encoded in the [`should_store_failures()`](https://github.com/dbt-labs/dbt-core/blob/98c015b7754779793e44e056905614296c6e4527/core/dbt/include/global_project/macros/materializations/helpers.sql#L77) macro. +This logic is encoded in the [`should_store_failures()`](https://github.com/dbt-labs/dbt-core/blob/77632122974b28967221758b4a470d7dfb608ac2/core/dbt/adapters/include/global_project/macros/materializations/configs.sql#L15) macro. diff --git a/website/docs/reference/resource-configs/strategy.md b/website/docs/reference/resource-configs/strategy.md index 3cef8b0df51..2bfcf0a94e4 100644 --- a/website/docs/reference/resource-configs/strategy.md +++ b/website/docs/reference/resource-configs/strategy.md @@ -132,8 +132,8 @@ This is a **required configuration**. There is no default value. ### Advanced: define and use custom snapshot strategy Behind the scenes, snapshot strategies are implemented as macros, named `snapshot__strategy` -* [Source code](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql#L65) for the timestamp strategy -* [Source code](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql#L131) for the check strategy +* [Source code](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/adapters/include/global_project/macros/materializations/snapshots/strategies.sql#L52) for the timestamp strategy +* [Source code](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/adapters/include/global_project/macros/materializations/snapshots/strategies.sql#L136) for the check strategy It's possible to implement your own snapshot strategy by adding a macro with the same naming pattern to your project. For example, you might choose to create a strategy which records hard deletes, named `timestamp_with_deletes`. diff --git a/website/docs/reference/resource-configs/trino-configs.md b/website/docs/reference/resource-configs/trino-configs.md index 21df13feac4..9ee62959f76 100644 --- a/website/docs/reference/resource-configs/trino-configs.md +++ b/website/docs/reference/resource-configs/trino-configs.md @@ -97,8 +97,9 @@ The `dbt-trino` adapter supports these modes in `table` materialization, which y - `rename` — Creates an intermediate table, renames the target table to the backup one, and renames the intermediate table to the target one. - `drop` — Drops and re-creates a table. This overcomes the table rename limitation in AWS Glue. +- `replace` — Replaces a table using CREATE OR REPLACE clause. Support for table replacement varies across connectors. Refer to the connector documentation for details. -The recommended `table` materialization uses `on_table_exists = 'rename'` and is also the default. You can change this default configuration by editing _one_ of these files: +If CREATE OR REPLACE is supported in underlying connector, `replace` is recommended option. Otherwise, the recommended `table` materialization uses `on_table_exists = 'rename'` and is also the default. You can change this default configuration by editing _one_ of these files: - the SQL file for your model - the `dbt_project.yml` configuration file diff --git a/website/docs/reference/resource-configs/vertica-configs.md b/website/docs/reference/resource-configs/vertica-configs.md index 598bc3fecee..90badfe29ad 100644 --- a/website/docs/reference/resource-configs/vertica-configs.md +++ b/website/docs/reference/resource-configs/vertica-configs.md @@ -99,7 +99,7 @@ You can use `on_schema_change` parameter with values `ignore`, `fail` and `appen
-#### Configuring the `apppend_new_columns` parameter +#### Configuring the `append_new_columns` parameter 0" tests: - - unique # primary_key constraint is not enforced + - unique # need this test because primary_key constraint is not enforced - name: customer_name data_type: text - name: first_transaction_date diff --git a/website/sidebars.js b/website/sidebars.js index a82b2e06ec2..27bcd1147a3 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -135,7 +135,6 @@ const sidebarSettings = { "docs/cloud/secure/redshift-privatelink", "docs/cloud/secure/postgres-privatelink", "docs/cloud/secure/vcs-privatelink", - "docs/cloud/secure/ip-restrictions", ], }, // PrivateLink "docs/cloud/billing", @@ -1028,6 +1027,8 @@ const sidebarSettings = { id: "best-practices/how-we-build-our-metrics/semantic-layer-1-intro", }, items: [ + "best-practices/how-we-build-our-metrics/semantic-layer-1-intro", + "best-practices/how-we-build-our-metrics/semantic-layer-2-setup", "best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models", "best-practices/how-we-build-our-metrics/semantic-layer-4-build-metrics", "best-practices/how-we-build-our-metrics/semantic-layer-5-refactor-a-mart", @@ -1045,6 +1046,7 @@ const sidebarSettings = { items: [ "best-practices/how-we-mesh/mesh-2-structures", "best-practices/how-we-mesh/mesh-3-implementation", + "best-practices/how-we-mesh/mesh-4-faqs", ], }, { diff --git a/website/snippets/_cloud-environments-info.md b/website/snippets/_cloud-environments-info.md index 6e096b83750..4a882589f77 100644 --- a/website/snippets/_cloud-environments-info.md +++ b/website/snippets/_cloud-environments-info.md @@ -34,24 +34,6 @@ Both development and deployment environments have a section called **General Set - If you select a current version with `(latest)` in the name, your environment will automatically install the latest stable version of the minor version selected. ::: -### Git repository caching - -At the start of every job run, dbt Cloud clones the project's Git repository so it has the latest versions of your project's code and runs `dbt deps` to install your dependencies. - -For improved reliability and performance on your job runs, you can enable dbt Cloud to keep a cache of the project's Git repository. So, if there's a third-party outage that causes the cloning operation to fail, dbt Cloud will instead use the cached copy of the repo so your jobs can continue running as scheduled. - -dbt Cloud caches your project's Git repo after each successful run and retains it for 8 days if there are no repo updates. It caches all packages regardless of installation method and does not fetch code outside of the job runs. - -To enable Git repository caching, select **Account settings** from the gear menu and enable the **Repository caching** option. - - - -:::note - -This feature is only available on the dbt Cloud Enterprise plan. - -::: - ### Custom branch behavior By default, all environments will use the default branch in your repository (usually the `main` branch) when accessing your dbt code. This is overridable within each dbt Cloud Environment using the **Default to a custom branch** option. This setting have will have slightly different behavior depending on the environment type: @@ -92,3 +74,42 @@ schema: dbt_alice threads: 4 ``` +### Git repository caching + +At the start of every job run, dbt Cloud clones the project's Git repository so it has the latest versions of your project's code and runs `dbt deps` to install your dependencies. + +For improved reliability and performance on your job runs, you can enable dbt Cloud to keep a cache of the project's Git repository. So, if there's a third-party outage that causes the cloning operation to fail, dbt Cloud will instead use the cached copy of the repo so your jobs can continue running as scheduled. + +dbt Cloud caches your project's Git repo after each successful run and retains it for 8 days if there are no repo updates. It caches all packages regardless of installation method and does not fetch code outside of the job runs. + +dbt Cloud will use the cached copy of your project's Git repo under these circumstances: + +- Outages from third-party services (for example, the [dbt package hub](https://hub.getdbt.com/)). +- Git authentication fails. +- There are syntax errors in the `packages.yml` file. You can set up and use [continuous integration (CI)](/docs/deploy/continuous-integration) to find these errors sooner. +- If a package doesn't work with the current dbt version. You can set up and use [continuous integration (CI)](/docs/deploy/continuous-integration) to identify this issue sooner. + +To enable Git repository caching, select **Account settings** from the gear menu and enable the **Repository caching** option. + + + +:::note + +This feature is only available on the dbt Cloud Enterprise plan. + +::: + +### Partial parsing + +At the start of every dbt invocation, dbt reads all the files in your project, extracts information, and constructs an internal manifest containing every object (model, source, macro, and so on). Among other things, it uses the `ref()`, `source()`, and `config()` macro calls within models to set properties, infer dependencies, and construct your project's DAG. When dbt finishes parsing your project, it stores the internal manifest in a file called `partial_parse.msgpack`. + +Parsing projects can be time-consuming, especially for large projects with hundreds of models and thousands of files. To reduce the time it takes dbt to parse your project, use the partial parsing feature in dbt Cloud for your environment. When enabled, dbt Cloud uses the `partial_parse.msgpack` file to determine which files have changed (if any) since the project was last parsed, and then it parses _only_ the changed files and the files related to those changes. + +Partial parsing in dbt Cloud requires dbt version 1.4 or newer. The feature does have some known limitations. Refer to [Known limitations](/reference/parsing#known-limitations) to learn more about them. + +To enable, select **Account settings** from the gear menu and enable the **Partial parsing** option. + + + + + diff --git a/website/snippets/_packages_or_dependencies.md b/website/snippets/_packages_or_dependencies.md index 5cc4c67e63c..61014bc2b1a 100644 --- a/website/snippets/_packages_or_dependencies.md +++ b/website/snippets/_packages_or_dependencies.md @@ -12,7 +12,7 @@ There are some important differences between Package dependencies and Project de -Project dependencies are designed for the [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro) and [cross-project reference](/docs/collaborate/govern/project-dependencies#how-to-use-ref) workflow: +Project dependencies are designed for the [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro) and [cross-project reference](/docs/collaborate/govern/project-dependencies#how-to-write-cross-project-ref) workflow: - Use `dependencies.yml` when you need to set up cross-project references between different dbt projects, especially in a dbt Mesh setup. - Use `dependencies.yml` when you want to include both projects and non-private dbt packages in your project's dependencies. diff --git a/website/snippets/_privatelink-hostname-restriction.md b/website/snippets/_privatelink-hostname-restriction.md new file mode 100644 index 00000000000..a4bcd318a15 --- /dev/null +++ b/website/snippets/_privatelink-hostname-restriction.md @@ -0,0 +1,5 @@ +:::caution Environment variables + +Using [Environment variables](/docs/build/environment-variables) when configuring PrivateLink endpoints isn't supported in dbt Cloud. Instead, use [Extended Attributes](/docs/deploy/deploy-environments#extended-attributes) to dynamically change these values in your dbt Cloud environment. + +::: diff --git a/website/snippets/dbt-databricks-for-databricks.md b/website/snippets/dbt-databricks-for-databricks.md index f1c5ec84af1..1e18da33d42 100644 --- a/website/snippets/dbt-databricks-for-databricks.md +++ b/website/snippets/dbt-databricks-for-databricks.md @@ -1,4 +1,5 @@ :::info If you're using Databricks, use `dbt-databricks` -If you're using Databricks, the `dbt-databricks` adapter is recommended over `dbt-spark`. -If you're still using dbt-spark with Databricks consider [migrating from the dbt-spark adapter to the dbt-databricks adapter](/guides/migrate-from-spark-to-databricks). +If you're using Databricks, the `dbt-databricks` adapter is recommended over `dbt-spark`. If you're still using dbt-spark with Databricks consider [migrating from the dbt-spark adapter to the dbt-databricks adapter](/guides/migrate-from-spark-to-databricks). + +For the Databricks version of this page, refer to [Databricks setup](#databricks-setup). ::: diff --git a/website/snippets/warehouse-setups-cloud-callout.md b/website/snippets/warehouse-setups-cloud-callout.md index 3bc1147a637..56edd3a96ea 100644 --- a/website/snippets/warehouse-setups-cloud-callout.md +++ b/website/snippets/warehouse-setups-cloud-callout.md @@ -1,3 +1,3 @@ -:::info `profiles.yml` file is for CLI users only -If you're using dbt Cloud, you don't need to create a `profiles.yml` file. This file is only for CLI users. To connect your data platform to dbt Cloud, refer to [About data platforms](/docs/cloud/connect-data-platform/about-connections). +:::info `profiles.yml` file is for dbt Core users only +If you're using dbt Cloud, you don't need to create a `profiles.yml` file. This file is only for dbt Core users. To connect your data platform to dbt Cloud, refer to [About data platforms](/docs/cloud/connect-data-platform/about-connections). ::: diff --git a/website/src/components/detailsToggle/index.js b/website/src/components/detailsToggle/index.js index ba53192e54b..076d053846c 100644 --- a/website/src/components/detailsToggle/index.js +++ b/website/src/components/detailsToggle/index.js @@ -40,7 +40,7 @@ useEffect(() => { onMouseLeave={handleMouseLeave} >   - {alt_header} + {alt_header} {/* Visual disclaimer */} Hover to view diff --git a/website/src/components/detailsToggle/styles.module.css b/website/src/components/detailsToggle/styles.module.css index 446d3197128..b9c2f09df06 100644 --- a/website/src/components/detailsToggle/styles.module.css +++ b/website/src/components/detailsToggle/styles.module.css @@ -1,9 +1,11 @@ -:local(.link) { +:local(.link) :local(.headerText) { color: var(--ifm-link-color); - transition: background-color 0.3s; /* Smooth transition for background color */ + text-decoration: none; + transition: text-decoration 0.3s; /* Smooth transition */ } -:local(.link:hover), :local(.link:focus) { +:local(.link:hover) :local(.headerText), +:local(.link:focus) :local(.headerText) { text-decoration: underline; cursor: pointer; } @@ -12,6 +14,7 @@ font-size: 0.8em; color: #666; margin-left: 10px; /* Adjust as needed */ + text-decoration: none; } :local(.toggle) { diff --git a/website/src/components/faqs/index.js b/website/src/components/faqs/index.js index 52c4573d883..0741a29cd89 100644 --- a/website/src/components/faqs/index.js +++ b/website/src/components/faqs/index.js @@ -3,10 +3,10 @@ import styles from './styles.module.css'; import { usePluginData } from '@docusaurus/useGlobalData'; function FAQ({ path, alt_header = null }) { - const [isOn, setOn] = useState(false); - const [filePath, setFilePath] = useState(path) - const [fileContent, setFileContent] = useState({}) + const [filePath, setFilePath] = useState(path); + const [fileContent, setFileContent] = useState({}); + const [hoverTimeout, setHoverTimeout] = useState(null); // Get all faq file paths from plugin const { faqFiles } = usePluginData('docusaurus-build-global-data-plugin'); @@ -37,24 +37,45 @@ function FAQ({ path, alt_header = null }) { } }, [filePath]) - const toggleOn = function () { - setOn(!isOn); + const handleMouseEnter = () => { + setHoverTimeout(setTimeout(() => { + setOn(true); + }, 500)); + }; + + const handleMouseLeave = () => { + if (!isOn) { + clearTimeout(hoverTimeout); + setOn(false); } +}; + + useEffect(() => { + return () => { + if (hoverTimeout) { + clearTimeout(hoverTimeout); + } + }; + }, [hoverTimeout]); + + const toggleOn = () => { + if (hoverTimeout) { + clearTimeout(hoverTimeout); + } + setOn(!isOn); + }; return ( -
+
- -   - {alt_header || fileContent?.meta && fileContent.meta.title} - -
- {fileContent?.contents && fileContent.contents} + + {alt_header || (fileContent?.meta && fileContent.meta.title)} + Hover to view + +
+ {fileContent?.contents}
-
+
); } diff --git a/website/src/components/faqs/styles.module.css b/website/src/components/faqs/styles.module.css index e19156a3a7b..c179aa85cdc 100644 --- a/website/src/components/faqs/styles.module.css +++ b/website/src/components/faqs/styles.module.css @@ -1,9 +1,12 @@ -:local(.link) { +:local(.link) :local(.headerText) { color: var(--ifm-link-color); + text-decoration: none; + transition: text-decoration 0.3s; /* Smooth transition */ } -:local(.link:hover) { +:local(.link:hover) :local(.headerText), +:local(.link:focus) :local(.headerText) { text-decoration: underline; cursor: pointer; } @@ -24,6 +27,13 @@ filter: invert(1); } +:local(.disclaimer) { + font-size: 0.8em; + color: #666; + margin-left: 10px; /* Adjust as needed */ + text-decoration: none; +} + :local(.body) { margin-left: 2em; margin-bottom: 10px; diff --git a/website/static/img/docs/deploy/example-account-settings.png b/website/static/img/docs/deploy/example-account-settings.png new file mode 100644 index 00000000000..12b8d9bc49f Binary files /dev/null and b/website/static/img/docs/deploy/example-account-settings.png differ diff --git a/website/static/img/docs/deploy/example-repo-caching.png b/website/static/img/docs/deploy/example-repo-caching.png deleted file mode 100644 index 805d845dccb..00000000000 Binary files a/website/static/img/docs/deploy/example-repo-caching.png and /dev/null differ