diff --git a/website/docs/best-practices/how-we-mesh/mesh-1-intro.md b/website/docs/best-practices/how-we-mesh/mesh-1-intro.md index 0f27e64c447..fcd379de9cf 100644 --- a/website/docs/best-practices/how-we-mesh/mesh-1-intro.md +++ b/website/docs/best-practices/how-we-mesh/mesh-1-intro.md @@ -32,6 +32,8 @@ dbt Cloud is designed to coordinate the features above and simplify the complexi If you're just starting your dbt journey, don't worry about building a multi-project architecture right away. You can _incrementally_ adopt the features in this guide as you scale. The collection of features work effectively as independent tools. Familiarizing yourself with the tooling and features that make up a multi-project architecture, and how they can apply to your organization will help you make better decisions as you grow. +For additional information, refer to the [dbt Mesh FAQs](/best-practices/how-we-mesh/mesh-4-faqs). + ## Learning goals - Understand the **purpose and tradeoffs** of building a multi-project architecture. diff --git a/website/docs/best-practices/how-we-mesh/mesh-4-faqs.md b/website/docs/best-practices/how-we-mesh/mesh-4-faqs.md new file mode 100644 index 00000000000..7119a3d90bd --- /dev/null +++ b/website/docs/best-practices/how-we-mesh/mesh-4-faqs.md @@ -0,0 +1,317 @@ +--- +title: "dbt Mesh FAQs" +description: "Read the FAQs to learn more about dbt Mesh, how it works, compatibility, and more." +hoverSnippet: "dbt Mesh FAQs" +sidebar_label: "dbt Mesh FAQs" +--- + +dbt Mesh is a new architecture enabled by dbt Cloud. It allows you to better manage complexity by deploying multiple interconnected dbt projects instead of a single large, monolithic project. It’s designed to accelerate development, without compromising governance. 
+ +## Overview of Mesh + + + +Here are some benefits of implementing dbt Mesh: + +* **Ship data products faster**: With a more modular architecture, teams can make changes rapidly and independently in specific areas without impacting the entire system, leading to faster development cycles. +* **Improve trust in data:** Adopting dbt Mesh helps ensure that changes in one domain's data models do not unexpectedly break dependencies in other domain areas, leading to a more secure and predictable data environment. +* **Reduce complexity**: By organizing transformation logic into distinct domains, dbt Mesh reduces the complexity inherent in large, monolithic projects, making them easier to manage and understand. +* **Improve collaboration**: Teams are able to share and build upon each other's work without duplicating efforts. + +Most importantly, all this can be accomplished without the central data team losing the ability to see lineage across the entire organization, or compromising on governance mechanisms. + + + + + +dbt [model contracts](/docs/collaborate/govern/model-contracts) serve as a governance tool enabling the definition and enforcement of data structure standards in your dbt models. They allow you to specify and uphold data model guarantees, including column data types, allowing for the stability of dependent models. Should a model fail to adhere to its established contracts, it will not build successfully. + + + + + +dbt [model versions](https://docs.getdbt.com/docs/collaborate/govern/model-versions) are iterations of your dbt models made over time. In many cases, you might knowingly choose to change a model’s structure in a way that “breaks” the previous model contract, and may break downstream queries depending on that model’s structure. When you do so, creating a new version of the model is useful to signify this change. + +You can use model versions to: + +- Test "prerelease" changes (in production, in downstream systems). 
+- Bump the latest version, to be used as the canonical "source of truth." +- Offer a migration window off the "old" version. + + + + + +A [model access modifier](/docs/collaborate/govern/model-access) in dbt determines if a model is accessible as an input to other dbt models and projects. It specifies where a model can be referenced using [the `ref` function](/reference/dbt-jinja-functions/ref). There are three types of access modifiers: + +1. **Private:** A model with a private access modifier is only referenceable by models within the same group. This is intended for models that are implementation details and are meant to be used only within a specific group of related models. +2. **Protected:** Models with a protected access modifier can be referenced by any other model within the same dbt project or when the project is installed as a package. This is the default setting for all models, ensuring backward compatibility, especially when groups are assigned to an existing set of models. +3. **Public:** A public model can be referenced across different groups, packages, or projects. This is suitable for stable and mature models that serve as interfaces for other teams or projects. + + + + + +A [model group](/docs/collaborate/govern/model-access#groups) in dbt is a concept used to organize models under a common category or ownership. This categorization can be based on various criteria, such as the team responsible for the models or the specific data source they model. + + + + + +This is a new way of working, and the intentionality required to build, and then maintain, cross-project interfaces and dependencies may feel like a slowdown versus what some developers are used to. The intentional friction introduced promotes thoughtful changes, fostering a mindset that values stability and systematic adjustments over rapid transformations. 
+ +Orchestration across multiple projects is also likely to be slightly more challenging for many organizations, although we’re currently developing new functionality that will make this process simpler. + + + + + +dbt Mesh allows you to better _operationalize_ data mesh by enabling decentralized, domain-specific data ownership and collaboration. + +In data mesh, each business domain is responsible for its data as a product. This is the same goal that dbt Mesh facilitates by enabling organizations to break down large, monolithic data projects into smaller, domain-specific dbt projects. Each team or domain can independently develop, maintain, and share its data models, fostering a decentralized data environment. + +dbt Mesh also enhances the interoperability and reusability of data across different domains, a key aspect of the data mesh philosophy. By allowing cross-project references and shared governance through model contracts and access controls, dbt Mesh ensures that while data ownership is decentralized, there is still a governed structure to the overall data architecture. + + + +## How dbt Mesh works + + + +Like resource dependencies, project dependencies are acyclic, meaning they only move in one direction. This prevents `ref` cycles (or loops). For example, if project B depends on project A, a new model in project A could not import and use a public model from project B. Refer to [Project dependencies](/docs/collaborate/govern/project-dependencies#how-to-use-ref) for more information. + + + + + +While it’s not currently possible to share sources across projects, it would be possible to have a shared foundational project, with staging models on top of those sources, exposed as “public” models to other teams/projects. + + + + + +This would be a breaking change for downstream consumers of that model. 
If the maintainers of the upstream project wish to remove the model (or “downgrade” its access modifier, effectively the same thing), they should mark that model for deprecation (using [deprecation_date](/reference/resource-properties/deprecation_date)), which will deliver a warning to all downstream consumers of that model. + +In the future, we plan for dbt Cloud to also be able to proactively flag this scenario in [continuous integration](/docs/deploy/continuous-integration) for the maintainers of the upstream public model. + + + + + +No, unless downstream projects are installed as [packages](/docs/build/packages) (source code). In that case, the models in the project installed as a package become “your” models, and you can select or run them. There are cases in which this can be desirable; see the docs on [project dependencies](/docs/collaborate/govern/project-dependencies). + + + + + +Yes, as long as they’re in the same data platform (BigQuery, Databricks, Redshift, Snowflake, etc.) and you have configured permissions and sharing in that data platform provider to allow this. + + + + + +Yes, because the cross-project collaboration is done using the `{{ ref() }}` macro, you can use those models from other teams in [singular tests](/docs/build/data-tests#singular-data-tests). + + + + + +Each team defines their connection to the data warehouse, and the default schema names for dbt to use when materializing datasets. + +By default, each project belonging to a team will create: + +- One schema for production runs (for example, `finance`). +- One schema per developer (for example, `dev_jerco`). + +Depending on each team’s needs, this can be customized with model-level [schema configurations](/docs/build/custom-schemas), including the ability to define different rules by environment. + + + + + +No, contracts can only be applied at the [model level](/docs/collaborate/govern/model-contracts). 
It is a recommended best practice to [define staging models](/best-practices/how-we-structure/2-staging) on top of sources, and it is possible to define contracts on top of those staging models. + + + + + +No. A contract applies to an entire model, including all columns in the model’s output. This is the same set of columns that a consumer would see when viewing the model’s details in Explorer, or when querying the model in the data platform. + +- If you wish to contract only a subset of columns, you can create a separate model (materialized as a view) selecting only that subset. +- If you wish to limit which rows or columns a downstream consumer can see when they query the model’s data, depending on who they are, some data platforms offer advanced capabilities around dynamic row-level access and column-level data masking. + + + + + +No, a [group](/docs/collaborate/govern/model-access#groups) can only be assigned to a single owner. However, the assigned owner can be a _team_, rather than an individual. + + + + + +Not directly, but contracts are [assigned to models](/docs/collaborate/govern/model-contracts) and models can be assigned to individual owners. You can use meta fields for this purpose. + + + + + +This is not currently possible, but it is something we hope to enable in the near future. If you’re interested in this functionality, please reach out to your dbt Labs account team. + + + + + +dbt Cloud will soon offer the capability to trigger jobs on the completion of another job, including a job in a different project. This offers one mechanism for executing a pipeline from start to finish across projects. + + + + + +Yes. In addition to being viewable natively through [dbt Explorer](https://www.getdbt.com/product/dbt-explorer), it is possible to view cross-project lineage using partner integrations with data cataloging tools. For a list of available dbt Cloud integrations, refer to the [Integrations page](https://www.getdbt.com/product/integrations). 
+ + + + + +Tests and model contracts in dbt help eliminate the need to restate data in the first place. With these tools, you can incorporate checks at the source and output layers of your dbt projects to assess data quality in the most critical places. When there are changes in transformation logic (for example, the definition of a particular column is changed), restating the data is as easy as merging the updated code and running a dbt Cloud job. + +If a data quality issue does slip through, you also have the option of simply rolling back the Git commit, and then re-running the dbt Cloud job with the old code. + + + + + +Yes, all of this metadata is accessible via the [dbt Cloud Admin API](/docs/dbt-cloud-apis/admin-cloud-api). This metadata can be fed into a monitoring tool, or used to create reports and dashboards. + +We also expose some of this information in dbt Cloud itself in [jobs](/docs/deploy/jobs), [environments](/docs/environments-in-dbt), and in [dbt Explorer](https://www.getdbt.com/product/dbt-explorer). + + + +## Permissions and access + + + +The existence of projects that have at least one public model will be visible to everyone in the organization with [read-only access](/docs/cloud/manage-access/seats-and-users). + +Private or protected models require a user to have read-only access on the specific project in order to see its existence. + + + + + +There’s model-level access within dbt, role-based access for users and groups in dbt Cloud, and access to the underlying data within the data platform. + +First things first: access to underlying data is always defined and enforced by the underlying data platform (for example, BigQuery, Databricks, Redshift, Snowflake, or Starburst). This access is managed by executing “DCL statements” (namely `grant`). dbt makes it easy to [configure `grants` on models](/reference/resource-configs/grants), which provision data access for other roles/users/groups in the data warehouse. 
However, dbt does _not_ automatically define or coordinate those grants unless they are configured explicitly. Refer to your organization's system for managing data warehouse permissions. + +[dbt Cloud Enterprise plans](https://www.getdbt.com/pricing) support [role-based access control (RBAC)](/docs/cloud/manage-access/enterprise-permissions#how-to-set-up-rbac-groups-in-dbt-cloud) that manages granular permissions for users and user groups. You can control which users can see or edit all aspects of a dbt Cloud project. A user’s access to dbt Cloud projects also determines whether they can “explore” that project in detail. Roles, users, and groups are defined within the dbt Cloud application via the UI or by integrating with an identity provider. + +[Model access](/docs/collaborate/govern/model-access) defines where models can be referenced. It also informs the discoverability of those projects within dbt Explorer. Model `access` is defined in code, just like any other model configuration (`materialized`, `tags`, and so on). + +**Public:** Models with `public` access can be referenced everywhere. These are the “data products” of your organization. + +**Protected:** Models with `protected` access can only be referenced within the same project. This is the default level of model access. +We are discussing a future extension to `protected` models to allow for their reference in _specific_ downstream projects. Please read [the GitHub issue](https://github.com/dbt-labs/dbt-core/issues/9340), and upvote/comment if you’re interested in this use case. + +**Private:** Model `groups` enable more granular control over where `private` models can be referenced. By defining a group, and configuring models to belong to that group, you can restrict other models (not in the same group) from referencing any `private` models the group contains. Groups also provide a standard mechanism for defining the `owner` of all resources they contain. 
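+
+As a minimal sketch of how the three access levels and a group might be declared together in a properties file: the file path, group name, owner details, and model names below are hypothetical placeholders rather than values from a real project.
+
+```yaml
+# models/finance/_finance.yml (hypothetical path)
+groups:
+  - name: finance                # hypothetical group name
+    owner:
+      name: Finance Analytics    # the owner can be a team rather than an individual
+      email: finance@example.com # placeholder address
+
+models:
+  - name: fct_revenue            # hypothetical: a "data product" that other projects may ref
+    group: finance
+    access: public
+  - name: int_revenue_joined     # hypothetical: referenceable from anywhere in this project
+    group: finance
+    access: protected            # same as the default when access is not set
+  - name: stg_ledger_entries     # hypothetical: only models in the finance group may ref it
+    group: finance
+    access: private
+```
+
+With a configuration along these lines, a `ref` to `stg_ledger_entries` from a model outside the `finance` group should be rejected by dbt, while `fct_revenue` remains referenceable from downstream projects.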
+ +Within dbt Explorer, `public` models are discoverable for every user in the dbt Cloud account — every public model is listed in the “multi-project” view. By contrast, `protected` and `private` models in a project are visible only to users who have access to that project (including read-only access). + +Because dbt does not implicitly coordinate data warehouse `grants` with model-level `access`, it is possible for there to be a mismatch between them. For example, a `public` model’s metadata is viewable to all dbt Cloud users, anyone can write a `ref` to that model, but when they actually run or preview, they realize they do not have access to the underlying data in the data warehouse. **This is intentional.** In this way, your organization can retain least-privileged access to underlying data, while providing visibility and discoverability for the wider organization. Armed with the knowledge of which other “data products” (public models) exist — their descriptions, their ownership, which columns they contain — an analyst on another team can prepare a well-informed request for access to the underlying data. + + + + + +Not currently! But this is something we may evaluate for the future. + + + + + +Yes! As long as a user has permissions (at least read-only access) on all projects in a dbt Cloud account, they can navigate across the entirety of the organization’s DAG in dbt Explorer, and see models at all levels of detail. + + + + + +By default, cross-project references resolve to the “Production” deployment environment of the upstream project. If your organization has genuinely different data in production versus non-production environments, this poses an issue. 
+ +For this reason, we will soon roll out a new canonical type of deployment environment: “Staging.” If a project defines both a “Production” environment and a “Staging” environment, then cross-project references from development and “Staging” environments will resolve to “Staging,” whereas only references coming from “Production” environments will resolve to “Production.” In this way, you are guaranteed separation of data environments, without needing to duplicate project configurations. + +If you’re interested in beta access to “Staging” environments, let your dbt Labs account representative know! + + + +## Compatibility with other features + + + +The [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl) and dbt Mesh are complementary mechanisms enabled by dbt Cloud that work together to enhance the management, usability, and governance of data in large-scale data environments. + +The Semantic Layer in dbt Cloud allows teams to centrally define business metrics and dimensions. It ensures consistent and reliable metric definitions across various analytics tools and platforms. + +dbt Mesh enables organizations to split their data architecture into multiple domain-specific projects, while retaining the ability to reference “public” models across projects. It is also possible to reference a “public” model from another project for the purpose of defining semantic models and metrics. Your organization can have multiple dbt projects feed into a unified semantic layer, ensuring that metrics and dimensions are consistently defined and understood across these domains. + + + + + +**[dbt Explorer](/docs/collaborate/explore-projects)** is a tool within dbt Cloud that serves as a knowledge base and lineage visualization platform. It provides a comprehensive view of your dbt assets, including models, tests, sources, and their interdependencies. 
+ +Used in conjunction with dbt Mesh, dbt Explorer becomes a powerful tool for visualizing and understanding the relationships and dependencies between models across multiple dbt projects. + + + + + +The [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) allows users to develop and run dbt commands from their preferred development environments, like VS Code, Sublime Text, or terminal interfaces. This flexibility is particularly beneficial in a dbt Mesh setup, where managing multiple projects can be complex. Developers can work in their preferred tools while leveraging the centralized capabilities of dbt Cloud. + + + +## Availability + + + +Yes, your account must be on [at least dbt v1.6](/docs/dbt-versions/upgrade-core-in-cloud) to take advantage of [cross-project dependencies](/docs/collaborate/govern/project-dependencies), one of the most crucial underlying capabilities required to implement a dbt Mesh. + + + + + +While dbt Core defines several of the foundational elements for dbt Mesh, dbt Cloud offers an enhanced experience that leverages these elements for scaled collaboration across multiple teams, facilitated by multi-project discovery in dbt Explorer that’s tailored to each user’s access. + +Several key components that underpin the dbt Mesh pattern, including model contracts, versions, and access modifiers, are defined and implemented in dbt Core. We believe these are components of the core language, which is why their implementations are open source. We want to define a standard pattern that analytics engineers everywhere can adopt, extend, and help us improve. + +To reference models defined in another project, users can also leverage [packages](/docs/build/packages), a longstanding feature of dbt Core. By importing an upstream project as a package, dbt will import all models defined in that project, which enables the resolution of cross-project references to those models. 
They can be [optionally restricted](/docs/collaborate/govern/model-access#how-do-i-restrict-access-to-models-defined-in-a-package) to just the models with `public` access. + +The major distinction comes with dbt Cloud's metadata service, which is unique to the dbt Cloud platform and allows for the resolution of references to only the public models in a project. This service enables users to take dependencies on upstream projects, and reference just their `public` models, *without* needing to load the full complexity of those upstream projects into their local development environment. + + + + + +Yes, a [dbt Cloud Enterprise](https://www.getdbt.com/pricing) plan is required to set up multiple projects and reference models across them. + + + +## Tips on implementing dbt Mesh + + + +Refer to our developer guide on [How we structure our dbt Mesh projects](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro). You may also be interested in watching the recording of this talk from Coalesce 2023: [Unlocking model governance and multi-project deployments with dbt-meshify](https://www.youtube.com/watch?v=FAsY0Qx8EyU). + + + + + +`dbt-meshify` is a [CLI tool](https://github.com/dbt-labs/dbt-meshify) that automates the creation of model governance and cross-project lineage features introduced in dbt-core v1.5 and v1.6. This package will leverage your dbt project metadata to create and/or edit the files in your project to properly configure the models in your project with these features. + + + + +Let’s say your organization has fewer than 500 models and fewer than a dozen regular contributors to dbt. You're operating at a scale well served by the monolith (a single project), and the larger pattern of dbt Mesh probably won't provide any immediate benefits. + +It’s never too early to think about how you’re organizing models _within_ that project. 
Use model `groups` to define clear ownership boundaries and `private` access to restrict purpose-built models from becoming load-bearing blocks in an unrelated section of the DAG. Your future selves will thank you for defining these interfaces, especially if you reach a scale where it makes sense to “graduate” the interfaces between `groups` into boundaries between projects. + + diff --git a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md index 733ec9dbcfe..f6f2265a922 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md @@ -14,7 +14,7 @@ Linters analyze code for errors, bugs, and style issues, while formatters fix st -In the dbt Cloud IDE, you have the capability to perform linting, auto-fix, and formatting on five different file types: +In the dbt Cloud IDE, you can perform linting, auto-fix, and formatting on five different file types: - SQL — [Lint](#lint) and fix with SQLFluff, and [format](#format) with sqlfmt - YAML, Markdown, and JSON — Format with Prettier @@ -146,7 +146,7 @@ The Cloud IDE formatting integrations take care of manual tasks like code format To format your SQL code, dbt Cloud integrates with [sqlfmt](http://sqlfmt.com/), which is an uncompromising SQL query formatter that provides one way to format the SQL query and Jinja. -By default, the IDE uses sqlfmt rules to format your code, making the **Format** button available and convenient to use right away. However, if you have a file named .sqlfluff in the root directory of your dbt project, the IDE will default to SQLFluff rules instead. +By default, the IDE uses sqlfmt rules to format your code, making the **Format** button available and convenient to use immediately. However, if you have a file named .sqlfluff in the root directory of your dbt project, the IDE will default to SQLFluff rules instead. 
To enable sqlfmt: @@ -189,10 +189,8 @@ To format your Python code, dbt Cloud integrates with [Black](https://black.read ## FAQs -
-When should I use SQLFluff and when should I use sqlfmt? - -SQLFluff and sqlfmt are both tools used for formatting SQL code, but there are some differences that may make one preferable to the other depending on your use case.
+ +SQLFluff and sqlfmt are both tools used for formatting SQL code, but some differences may make one preferable to the other depending on your use case.
SQLFluff is a SQL code linter and formatter. This means that it analyzes your code to identify potential issues and bugs, and follows coding standards. It also formats your code according to a set of rules, which are [customizable](#customize-linting), to ensure consistent coding practices. You can also use SQLFluff to keep your SQL code well-formatted and follow styling best practices.
@@ -204,34 +202,37 @@ You can use either SQLFluff or sqlfmt depending on your preference and what work - Use sqlfmt to only have your code well-formatted without analyzing it for errors and bugs. You can use sqlfmt out of the box, making it convenient to use right away without having to configure it. -
+ -
-Can I nest .sqlfluff files? + To ensure optimal code quality, consistent code, and styles — it's highly recommended you have one main `.sqlfluff` configuration file in the root folder of your project. Having multiple files can result in various different SQL styles in your project.

However, you can customize and include an additional child `.sqlfluff` configuration file within specific subfolders of your dbt project.

By nesting a `.sqlfluff` file in a subfolder, SQLFluff will apply the rules defined in that subfolder's configuration file to any files located within it. The rules specified in the parent `.sqlfluff` file will be used for all other files and folders outside of the subfolder. This hierarchical approach allows for tailored linting rules while maintaining consistency throughout your project. Refer to [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/configuration.html#configuration-files) for more info. -
+ -
-Can I run SQLFluff commands from the terminal? + Currently, running SQLFluff commands from the terminal isn't supported. -
+ -
-Why am I unable to see the Lint or Format button? + Make sure you're on a development branch. Formatting or Linting isn't available on "main" or "read-only" branches. -
+ -
-Why is there inconsistent SQLFluff behavior when running outside the dbt Cloud IDE (such as a GitHub Action)? -— Double-check your SQLFluff version matches the one in dbt Cloud IDE (found in the Code Quality tab after a lint operation).

-— If your lint operation passes despite clear rule violations, confirm you're not linting models with ephemeral models. Linting doesn't support ephemeral models in dbt v1.5 and lower. -
+ +- Double-check that your SQLFluff version matches the one in dbt Cloud IDE (found in the Code Quality tab after a lint operation).

+- If your lint operation passes despite clear rule violations, confirm you're not linting models with ephemeral models. Linting doesn't support ephemeral models in dbt v1.5 and lower. +
+ + +Currently, the dbt Cloud IDE can lint or fix files up to a certain size and complexity. If you attempt to lint or fix files that are too large, taking more than 60 seconds for the dbt Cloud backend to process, you will see an 'Unable to complete linting this file' error. + +To avoid this, break up your model into smaller models (files) so that they are less complex to lint or fix. Note that linting is simpler than fixing so there may be cases where a file can be linted but not fixed. + + ## Related docs diff --git a/website/docs/docs/cloud/secure/about-privatelink.md b/website/docs/docs/cloud/secure/about-privatelink.md index 2134ab25cfe..731cef3f019 100644 --- a/website/docs/docs/cloud/secure/about-privatelink.md +++ b/website/docs/docs/cloud/secure/about-privatelink.md @@ -6,10 +6,11 @@ sidebar_label: "About PrivateLink" --- import SetUpPages from '/snippets/_available-tiers-privatelink.md'; +import PrivateLinkHostnameWarning from '/snippets/_privatelink-hostname-restriction.md'; -PrivateLink enables a private connection from any dbt Cloud Multi-Tenant environment to your data platform hosted on AWS using [AWS PrivateLink](https://aws.amazon.com/privatelink/) technology. PrivateLink allows dbt Cloud customers to meet security and compliance controls as it allows connectivity between dbt Cloud and your data platform without traversing the public internet. This feature is supported in most regions across NA, Europe, and Asia, but [contact us](https://www.getdbt.com/contact/) if you have questions about availability. +PrivateLink enables a private connection from any dbt Cloud Multi-Tenant environment to your data platform hosted on AWS using [AWS PrivateLink](https://aws.amazon.com/privatelink/) technology. PrivateLink allows dbt Cloud customers to meet security and compliance controls as it allows connectivity between dbt Cloud and your data platform without traversing the public internet. 
This feature is supported in most regions across NA, Europe, and Asia, but [contact us](https://www.getdbt.com/contact/) if you have questions about availability. ### Cross-region PrivateLink @@ -24,3 +25,5 @@ dbt Cloud supports the following data platforms for use with the PrivateLink fea - [Redshift](/docs/cloud/secure/redshift-privatelink) - [Postgres](/docs/cloud/secure/postgres-privatelink) - [VCS](/docs/cloud/secure/vcs-privatelink) + + diff --git a/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md b/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md new file mode 100644 index 00000000000..c0236a30783 --- /dev/null +++ b/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md @@ -0,0 +1,15 @@ +--- +title: "New: Native support for partial parsing" +description: "December 2023: For faster run times with your dbt invocations, configure dbt Cloud to parse only the changed files in your project." +sidebar_label: "New: Native support for partial parsing" +sidebar_position: 09 +tags: [Jan-2024] +date: 2024-01-03 +--- + +By default, dbt parses all the files in your project at the beginning of every dbt invocation. Depending on the size of your project, this operation can take a long time to complete. With the new partial parsing feature in dbt Cloud, you can reduce the time it takes for dbt to parse your project. When enabled, dbt Cloud parses only the changed files in your project instead of parsing all the project files. As a result, your dbt invocations will take less time to run. + +To learn more, refer to [Partial parsing](/docs/deploy/deploy-environments#partial-parsing). 
+ + + diff --git a/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md b/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md index 7c35991e961..eff15e96cfd 100644 --- a/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md +++ b/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md @@ -11,4 +11,4 @@ Now available for dbt Cloud Enterprise plans is a new option to enable Git repos To learn more, refer to [Repo caching](/docs/deploy/deploy-environments#git-repository-caching). - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/guides/create-new-materializations.md b/website/docs/guides/create-new-materializations.md index af2732c0c39..52a8594b0d2 100644 --- a/website/docs/guides/create-new-materializations.md +++ b/website/docs/guides/create-new-materializations.md @@ -13,7 +13,7 @@ recently_updated: true ## Introduction -The model materializations you're familiar with, `table`, `view`, and `incremental` are implemented as macros in a package that's distributed along with dbt. You can check out the [source code for these materializations](https://github.com/dbt-labs/dbt-core/tree/main/core/dbt/include/global_project/macros/materializations). If you need to create your own materializations, reading these files is a good place to start. Continue reading below for a deep-dive into dbt materializations. +The model materializations you're familiar with, `table`, `view`, and `incremental` are implemented as macros in a package that's distributed along with dbt. You can check out the [source code for these materializations](https://github.com/dbt-labs/dbt-core/tree/main/core/dbt/adapters/include/global_project/macros/materializations). If you need to create your own materializations, reading these files is a good place to start. Continue reading below for a deep-dive into dbt materializations. 
:::caution diff --git a/website/docs/reference/dbt-jinja-functions/debug-method.md b/website/docs/reference/dbt-jinja-functions/debug-method.md index 0938970b50c..778ad095693 100644 --- a/website/docs/reference/dbt-jinja-functions/debug-method.md +++ b/website/docs/reference/dbt-jinja-functions/debug-method.md @@ -6,9 +6,9 @@ description: "The `{{ debug() }}` macro will open an iPython debugger." --- -:::caution New in v0.14.1 +:::warning Development environment only -The `debug` macro is new in dbt v0.14.1, and is only intended to be used in a development context with dbt. Do not deploy code to production which uses the `debug` macro. +The `debug` macro is only intended to be used in a development context with dbt. Do not deploy code to production that uses the `debug` macro. ::: diff --git a/website/docs/reference/parsing.md b/website/docs/reference/parsing.md index 1a68ba0d476..6eed4c96af0 100644 --- a/website/docs/reference/parsing.md +++ b/website/docs/reference/parsing.md @@ -41,7 +41,7 @@ The [`PARTIAL_PARSE` global config](/reference/global-configs/parsing) can be en Parse-time attributes (dependencies, configs, and resource properties) are resolved using the parse-time context. When partial parsing is enabled, and certain context variables change, those attributes will _not_ be re-resolved, and are likely to become stale. -In particular, you may see **incorrect results** if these attributes depend on "volatile" context variables, such as [`run_started_at`](/reference/dbt-jinja-functions/run_started_at), [`invocation_id`](/reference/dbt-jinja-functions/invocation_id), or [flags](/reference/dbt-jinja-functions/flags). These variables are likely (or even guaranteed!) to change in each invocation. We _highly discourage_ you from using these variables to set parse-time attributes (dependencies, configs, and resource properties). 
+In particular, you may see incorrect results if these attributes depend on "volatile" context variables, such as [`run_started_at`](/reference/dbt-jinja-functions/run_started_at), [`invocation_id`](/reference/dbt-jinja-functions/invocation_id), or [flags](/reference/dbt-jinja-functions/flags). These variables are likely (or even guaranteed!) to change in each invocation. dbt Labs _strongly discourages_ you from using these variables to set parse-time attributes (dependencies, configs, and resource properties). Starting in v1.0, dbt _will_ detect changes in environment variables. It will selectively re-parse only the files that depend on that [`env_var`](/reference/dbt-jinja-functions/env_var) value. (If the env var is used in `profiles.yml` or `dbt_project.yml`, a full re-parse is needed.) However, dbt will _not_ re-render **descriptions** that include env vars. If your descriptions include frequently changing env vars (this is highly uncommon), we recommend that you fully re-parse when generating documentation: `dbt --no-partial-parse docs generate`. @@ -51,7 +51,9 @@ If certain inputs change between runs, dbt will trigger a full re-parse. The res - `dbt_project.yml` content (or `env_var` values used within) - installed packages - dbt version -- certain widely-used macros, e.g. [builtins](/reference/dbt-jinja-functions/builtins) overrides or `generate_x_name` for `database`/`schema`/`alias` +- certain widely-used macros (for example, [builtins](/reference/dbt-jinja-functions/builtins), overrides, or `generate_x_name` for `database`/`schema`/`alias`) + +If you're triggering [CI](/docs/deploy/continuous-integration) job runs, the benefits of partial parsing are not applicable to new pull requests (PR) or new branches. However, they are applied on subsequent commits to the new PR or branch. 
If you ever get into a bad state, you can disable partial parsing and trigger a full re-parse by setting the `PARTIAL_PARSE` global config to false, or by deleting `target/partial_parse.msgpack` (e.g. by running `dbt clean`). diff --git a/website/docs/reference/resource-configs/full_refresh.md b/website/docs/reference/resource-configs/full_refresh.md index f75fe3a583b..c7f1b799087 100644 --- a/website/docs/reference/resource-configs/full_refresh.md +++ b/website/docs/reference/resource-configs/full_refresh.md @@ -74,7 +74,7 @@ Optionally set a resource to always or never full-refresh. -This logic is encoded in the [`should_full_refresh()`](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/configs.sql#L6) macro. +This logic is encoded in the [`should_full_refresh()`](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/adapters/include/global_project/macros/materializations/configs.sql#L6) macro. ## Usage diff --git a/website/docs/reference/resource-configs/store_failures.md b/website/docs/reference/resource-configs/store_failures.md index 2c596d1cf3e..8a83809152b 100644 --- a/website/docs/reference/resource-configs/store_failures.md +++ b/website/docs/reference/resource-configs/store_failures.md @@ -12,7 +12,7 @@ Optionally set a test to always or never store its failures in the database. - If the `store_failures` config is `none` or omitted, the resource will use the value of the `--store-failures` flag. - When true, `store_failures` save all the record(s) that failed the test only if [limit](/reference/resource-configs/limit) is not set or if there are fewer records than the limit. `store_failures` are saved in a new table with the name of the test. By default, `store_failures` use a schema named `dbt_test__audit`, but, you can [configure](/reference/resource-configs/schema#tests) the schema to a different value. 
-This logic is encoded in the [`should_store_failures()`](https://github.com/dbt-labs/dbt-core/blob/98c015b7754779793e44e056905614296c6e4527/core/dbt/include/global_project/macros/materializations/helpers.sql#L77) macro. +This logic is encoded in the [`should_store_failures()`](https://github.com/dbt-labs/dbt-core/blob/77632122974b28967221758b4a470d7dfb608ac2/core/dbt/adapters/include/global_project/macros/materializations/configs.sql#L15) macro. diff --git a/website/docs/reference/resource-configs/strategy.md b/website/docs/reference/resource-configs/strategy.md index 3cef8b0df51..2bfcf0a94e4 100644 --- a/website/docs/reference/resource-configs/strategy.md +++ b/website/docs/reference/resource-configs/strategy.md @@ -132,8 +132,8 @@ This is a **required configuration**. There is no default value. ### Advanced: define and use custom snapshot strategy Behind the scenes, snapshot strategies are implemented as macros, named `snapshot__strategy` -* [Source code](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql#L65) for the timestamp strategy -* [Source code](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/include/global_project/macros/materializations/snapshots/strategies.sql#L131) for the check strategy +* [Source code](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/adapters/include/global_project/macros/materializations/snapshots/strategies.sql#L52) for the timestamp strategy +* [Source code](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/adapters/include/global_project/macros/materializations/snapshots/strategies.sql#L136) for the check strategy It's possible to implement your own snapshot strategy by adding a macro with the same naming pattern to your project. For example, you might choose to create a strategy which records hard deletes, named `timestamp_with_deletes`. 
diff --git a/website/docs/reference/resource-configs/vertica-configs.md b/website/docs/reference/resource-configs/vertica-configs.md index 598bc3fecee..90badfe29ad 100644 --- a/website/docs/reference/resource-configs/vertica-configs.md +++ b/website/docs/reference/resource-configs/vertica-configs.md @@ -99,7 +99,7 @@ You can use `on_schema_change` parameter with values `ignore`, `fail` and `appen -#### Configuring the `apppend_new_columns` parameter +#### Configuring the `append_new_columns` parameter - -:::note - -This feature is only available on the dbt Cloud Enterprise plan. - -::: - ### Custom branch behavior By default, all environments will use the default branch in your repository (usually the `main` branch) when accessing your dbt code. This is overridable within each dbt Cloud Environment using the **Default to a custom branch** option. This setting will have slightly different behavior depending on the environment type: @@ -99,3 +74,42 @@ schema: dbt_alice threads: 4 ``` +### Git repository caching + +At the start of every job run, dbt Cloud clones the project's Git repository so it has the latest versions of your project's code and runs `dbt deps` to install your dependencies. + +For improved reliability and performance on your job runs, you can enable dbt Cloud to keep a cache of the project's Git repository. So, if there's a third-party outage that causes the cloning operation to fail, dbt Cloud will instead use the cached copy of the repo so your jobs can continue running as scheduled. + +dbt Cloud caches your project's Git repo after each successful run and retains it for 8 days if there are no repo updates. It caches all packages regardless of installation method and does not fetch code outside of the job runs. + +dbt Cloud will use the cached copy of your project's Git repo under these circumstances: + +- Outages from third-party services (for example, the [dbt package hub](https://hub.getdbt.com/)). +- Git authentication fails. 
+- There are syntax errors in the `packages.yml` file. You can set up and use [continuous integration (CI)](/docs/deploy/continuous-integration) to find these errors sooner. +- A package doesn't work with the current dbt version. You can set up and use [continuous integration (CI)](/docs/deploy/continuous-integration) to identify this issue sooner. + +To enable Git repository caching, select **Account settings** from the gear menu and enable the **Repository caching** option. + + + +:::note + +This feature is only available on the dbt Cloud Enterprise plan. + +::: + +### Partial parsing + +At the start of every dbt invocation, dbt reads all the files in your project, extracts information, and constructs an internal manifest containing every object (model, source, macro, and so on). Among other things, it uses the `ref()`, `source()`, and `config()` macro calls within models to set properties, infer dependencies, and construct your project's DAG. When dbt finishes parsing your project, it stores the internal manifest in a file called `partial_parse.msgpack`. + +Parsing projects can be time-consuming, especially for large projects with hundreds of models and thousands of files. To reduce the time it takes dbt to parse your project, use the partial parsing feature in dbt Cloud for your environment. When enabled, dbt Cloud uses the `partial_parse.msgpack` file to determine which files have changed (if any) since the project was last parsed, and then it parses _only_ the changed files and the files related to those changes. + +Partial parsing in dbt Cloud requires dbt version 1.4 or newer. The feature has some known limitations. Refer to [Known limitations](/reference/parsing#known-limitations) to learn more about them. + +To enable partial parsing, select **Account settings** from the gear menu and enable the **Partial parsing** option. 
+ + + + + diff --git a/website/snippets/_privatelink-hostname-restriction.md b/website/snippets/_privatelink-hostname-restriction.md new file mode 100644 index 00000000000..a4bcd318a15 --- /dev/null +++ b/website/snippets/_privatelink-hostname-restriction.md @@ -0,0 +1,5 @@ +:::caution Environment variables + +Using [Environment variables](/docs/build/environment-variables) when configuring PrivateLink endpoints isn't supported in dbt Cloud. Instead, use [Extended Attributes](/docs/deploy/deploy-environments#extended-attributes) to dynamically change these values in your dbt Cloud environment. + +::: diff --git a/website/src/components/detailsToggle/index.js b/website/src/components/detailsToggle/index.js index ba53192e54b..076d053846c 100644 --- a/website/src/components/detailsToggle/index.js +++ b/website/src/components/detailsToggle/index.js @@ -40,7 +40,7 @@ useEffect(() => { onMouseLeave={handleMouseLeave} >   - {alt_header} + {alt_header} {/* Visual disclaimer */} Hover to view diff --git a/website/src/components/detailsToggle/styles.module.css b/website/src/components/detailsToggle/styles.module.css index 446d3197128..b9c2f09df06 100644 --- a/website/src/components/detailsToggle/styles.module.css +++ b/website/src/components/detailsToggle/styles.module.css @@ -1,9 +1,11 @@ -:local(.link) { +:local(.link) :local(.headerText) { color: var(--ifm-link-color); - transition: background-color 0.3s; /* Smooth transition for background color */ + text-decoration: none; + transition: text-decoration 0.3s; /* Smooth transition */ } -:local(.link:hover), :local(.link:focus) { +:local(.link:hover) :local(.headerText), +:local(.link:focus) :local(.headerText) { text-decoration: underline; cursor: pointer; } @@ -12,6 +14,7 @@ font-size: 0.8em; color: #666; margin-left: 10px; /* Adjust as needed */ + text-decoration: none; } :local(.toggle) { diff --git a/website/src/components/faqs/index.js b/website/src/components/faqs/index.js index 52c4573d883..0741a29cd89 100644 --- 
a/website/src/components/faqs/index.js +++ b/website/src/components/faqs/index.js @@ -3,10 +3,10 @@ import styles from './styles.module.css'; import { usePluginData } from '@docusaurus/useGlobalData'; function FAQ({ path, alt_header = null }) { - const [isOn, setOn] = useState(false); - const [filePath, setFilePath] = useState(path) - const [fileContent, setFileContent] = useState({}) + const [filePath, setFilePath] = useState(path); + const [fileContent, setFileContent] = useState({}); + const [hoverTimeout, setHoverTimeout] = useState(null); // Get all faq file paths from plugin const { faqFiles } = usePluginData('docusaurus-build-global-data-plugin'); @@ -37,24 +37,45 @@ function FAQ({ path, alt_header = null }) { } }, [filePath]) - const toggleOn = function () { - setOn(!isOn); + const handleMouseEnter = () => { + setHoverTimeout(setTimeout(() => { + setOn(true); + }, 500)); + }; + + const handleMouseLeave = () => { + if (!isOn) { + clearTimeout(hoverTimeout); + setOn(false); } +}; + + useEffect(() => { + return () => { + if (hoverTimeout) { + clearTimeout(hoverTimeout); + } + }; + }, [hoverTimeout]); + + const toggleOn = () => { + if (hoverTimeout) { + clearTimeout(hoverTimeout); + } + setOn(!isOn); + }; return ( -
+
- -   - {alt_header || fileContent?.meta && fileContent.meta.title} - -
- {fileContent?.contents && fileContent.contents} + + {alt_header || (fileContent?.meta && fileContent.meta.title)} + Hover to view + +
+ {fileContent?.contents}
-
+
); } diff --git a/website/src/components/faqs/styles.module.css b/website/src/components/faqs/styles.module.css index e19156a3a7b..c179aa85cdc 100644 --- a/website/src/components/faqs/styles.module.css +++ b/website/src/components/faqs/styles.module.css @@ -1,9 +1,12 @@ -:local(.link) { +:local(.link) :local(.headerText) { color: var(--ifm-link-color); + text-decoration: none; + transition: text-decoration 0.3s; /* Smooth transition */ } -:local(.link:hover) { +:local(.link:hover) :local(.headerText), +:local(.link:focus) :local(.headerText) { text-decoration: underline; cursor: pointer; } @@ -24,6 +27,13 @@ filter: invert(1); } +:local(.disclaimer) { + font-size: 0.8em; + color: #666; + margin-left: 10px; /* Adjust as needed */ + text-decoration: none; +} + :local(.body) { margin-left: 2em; margin-bottom: 10px; diff --git a/website/static/img/docs/deploy/example-account-settings.png b/website/static/img/docs/deploy/example-account-settings.png new file mode 100644 index 00000000000..12b8d9bc49f Binary files /dev/null and b/website/static/img/docs/deploy/example-account-settings.png differ diff --git a/website/static/img/docs/deploy/example-repo-caching.png b/website/static/img/docs/deploy/example-repo-caching.png deleted file mode 100644 index 805d845dccb..00000000000 Binary files a/website/static/img/docs/deploy/example-repo-caching.png and /dev/null differ