Skip to content

Commit

Permalink
Merge branch 'current' into patch-1
Browse files Browse the repository at this point in the history
  • Loading branch information
mirnawong1 authored Jan 10, 2024
2 parents d02a122 + 7cb0ebb commit 33ae1bf
Show file tree
Hide file tree
Showing 17 changed files with 395 additions and 35 deletions.
2 changes: 1 addition & 1 deletion website/blog/2023-12-20-partner-integration-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ This guide doesn't include how to integrate with dbt Core. If you’re intereste
Instead, we're going to focus on integrating with dbt Cloud. Integrating with dbt Cloud is a key requirement to become a dbt Labs technology partner, opening the door to a variety of collaborative commercial opportunities.

Here I'll cover how to get started, potential use cases you want to solve for, and points of integrations to do so.

<!-- truncate -->
## New to dbt Cloud?

If you're new to dbt and dbt Cloud, we recommend you and your software developers try our [Getting Started Quickstarts](https://docs.getdbt.com/guides) after reading [What is dbt](https://docs.getdbt.com/docs/introduction). The documentation will help you familiarize yourself with how our users interact with dbt. By going through this, you will also create a sample dbt project to test your integration.
Expand Down
160 changes: 160 additions & 0 deletions website/blog/2024-01-09-defer-in-development.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
---
title: "More time coding, less time waiting: Mastering defer in dbt"
description: "Learn how to take advantage of the defer to prod feature in dbt Cloud"
slug: defer-to-prod

authors: [dave_connors]

tags: [analytics craft]
hide_table_of_contents: false

date: 2024-01-09
is_featured: true
---

Picture this — you’ve got a massive dbt project, thousands of models chugging along, creating actionable insights for your stakeholders. A ticket comes your way &mdash; a model needs to be refactored! "No problem," you think to yourself, "I will simply make that change and test it locally!" You look at you lineage, and realize this model is many layers deep, buried underneath a long chain of tables and views.

“OK,” you think further, “I’ll just run a `dbt build -s +my_changed_model` to make sure I have everything I need built into my dev schema and I can test my changes”. You run the command. You wait. You wait some more. You get some coffee, and completely take yourself out of your dbt development flow state. A lot of time and money down the drain to get to a point where you can *start* your work. That’s no good!

Luckily, dbt’s defer functionality allow you to *only* build what you care about when you need it, and nothing more. This feature helps developers spend less time and money in development, helping ship trusted data products faster. dbt Cloud offers native support for this workflow in development, so you can start deferring without any additional overhead!
<!-- truncate -->
## Defer to prod or prefer to slog

A lot of dbt’s magic relies on the elegance and simplicity of the `{{ ref() }}` function, which is how you can build your lineage graph, and how dbt can be run in different environments &mdash; the `{{ ref() }}` functions dynamically compile depending on your environment settings, so that you can run your project in development and production without changing any code.

Here's how a simple `{{ ref() }}` would compile in different environments:

<Tabs defaultValue="Raw Model Code">

<TabItem value="Raw Model Code">

```sql
-- in models/my_model.sql
select * from {{ ref('model_a') }}
```
</TabItem>

<TabItem value="Compiled in Dev">

```sql
-- in target/compiled/models/my_model.sql
select * from analytics.dbt_dconnors.model_a
```
</TabItem>

<TabItem value="Compiled in Prod">

```sql
-- in target/compiled/models/my_model.sql
select * from analytics.analytics.model_a
```
</TabItem>

</Tabs>

All of that is made possible by the dbt `manifest.json`, [the artifact](https://docs.getdbt.com/reference/artifacts/manifest-json) that is produced each time you run a dbt command, containing the comprehensive and encyclopedic compendium of all things in your project. Each node is assigned a `unique_id` (like `model.my_project.my_model` ) and the manifest stores all the metadata about that model in a dictionary associated to that id. This includes the data warehouse location that gets returned when you write `{{ ref('my_model') }}` in SQL. Different runs of your project in different environments result in different metadata written to the manifest.

Let’s think back to the hypothetical above &mdash; what if we made use of the production metadata to read in data from production, so that I don’t have to rebuild *everything* upstream of the model I’m changing? That’s exactly what `defer` does! When you supply dbt with a production version of the `manifest.json` artifact, and pass the `--defer` flag to your dbt command, dbt will resolve the `{{ ref() }}` functions for any resource upstream of your selected models with the *production metadata* — no need to rebuild anything you don’t have to!

Let’s take a look at a simplified example &mdash; let’s say your project looks like this in production:

<Lightbox src="/img/blog/2024-01-09-defer-in-development/prod-environment-plain.png" width="85%" title="A simplified dbt project running in production." />

And you’re tasked with making changes to `model_f`. Without defer, you would need to make sure to at minimum execute a `dbt run -s +model_f` to ensure all the upstream dependencies of `model_f` are present in your development schema so that you can start to run `model_f`.* You just spent a whole bunch of time and money duplicating your models, and now your warehouse looks like this:

<Lightbox src="/img/blog/2024-01-09-defer-in-development/prod-and-dev-full.png" width="85%" title="The whole project has been rebuilt into the dev schema, which can be time consuming and expensive!" />

With defer, we should not build anything other than the models that have changed, and are now different from their production counterparts! Let’s tell dbt to use production metadata to resolve our refs, and only build the model I have changed &mdash; that command would be `dbt run -s model_f --defer` .**

<Lightbox src="/img/blog/2024-01-09-defer-in-development/prod-and-dev-defer.png" width="85%" title="Using defer, we can only build one single model" />

This results in a *much slimmer build* &mdash; we read data in directly from the production version of `model_b` and `model_c`, and don’t have to worry about building anything other than what we selected!

\* [Another option](https://docs.getdbt.com/reference/commands/clone) is to run `dbt clone -s +model_f` , which will make clones of your production models into your development schema, making use of zero copy cloning where available. Check out this [great dev blog](https://docs.getdbt.com/blog/to-defer-or-to-clone) from Doug and Kshitij on when to use `clone` vs `defer`!

** in dbt Core, you also have to tell dbt where to find the production artifacts! Otherwise it doesn’t know what to defer to. You can either use the `--state path/to/artifact/folder` option, or set a `DBT_STATE` environment variable.

### Batteries included deferral in dbt Cloud

dbt Cloud offers a seamless deferral experience in both the dbt Cloud IDE and the dbt Cloud CLI — dbt Cloud ***always*** has the latest run artifacts from your production environment. Rather than having to go through the painful process of somehow getting a copy of your latest production `manifest.json` into your local filesystem to defer to, and building a pipeline to always keep it fresh, dbt Cloud does all that work for you. When developing in dbt Cloud, the latest artifact is automatically provided to you under the hood, and dbt Cloud handles the `--defer` flag for you when you run commands in “defer mode”. dbt Cloud will use the artifacts from the deployment environment in your project marked as `Production` in the [environments settings](https://docs.getdbt.com/docs/deploy/deploy-environments#set-as-production-environment) in both the IDE and the Cloud CLI. Be sure to configure a production environment to unlock this feature!

In the dbt Cloud IDE, there’s as simple toggle switch labeled `Defer to production`. Simply enabling this toggle will defer your command to the production environment when you run any dbt command in the IDE!

<Lightbox src="/img/blog/2024-01-09-defer-in-development/defer-toggle.png" title="The defer to prod toggle in the IDE" />

The cloud CLI has this setting *on by default* — there’s nothing else you need to do to set this up! If you prefer not to defer, you can pass the `--no-defer` flag to override this behavior. You can also set an environment other than your production environment as the deferred to environment in your `dbt-cloud` settings in your `dbt_project.yml` :

```yaml
dbt-cloud:
project-id: <Your project id>
defer-env-id: <An environment id>
```
When you’re developing with dbt Cloud, you can defer right away, and completely avoid unnecessary model builds in development!
### Other things to to know about defer
**Favoring state**
One of the major gotchas in the defer workflow is that when you’re in defer mode, dbt assumes that all the objects in your development schema are part of your current work stream, and will prioritize those objects over the production objects when possible.
Let’s take a look at that example above again, and pretend that some time before we went to make this edit, we did some work on `model_c`, and we have a local copy of `model_c` hanging out in our development schema:

<Lightbox src="/img/blog/2024-01-09-defer-in-development/prod-and-dev-model-c.png" width="85%" title="Hypothetical starting point, with a development copy of model_c in the development schema at the start of the development cycle." />

When you run `dbt run -s model_f --defer` , dbt will detect the development copy of `model_c` and say “Hey, y’know, I bet Dave is working on that model too, and he probably wants to make sure his changes to `model_c` work together with his changes to `model_f` . Because I am a kind and benevolent data transformation tool, i’ll make sure his `{{ ref('model_c') }]` function compiles to his development changes!” Thanks dbt!

As a result, we’ll effectively see this behavior when we run our command:

<Lightbox src="/img/blog/2024-01-09-defer-in-development/prod-and-dev-mixed.png" width="85%" title="With a development version of model_a in our dev schema, dbt will preferentially use that version instead of deferring" />

Where our code would compile from

```sql
# in models/model_f.sql
with
model_b as (
select * from {{ ref('model_b') }}
),
model_c as (
select * from {{ ref('model_c') }}
),
...
```

to

```sql
# in target/compiled/models/model_f.sql
with
model_b as (
select * from analytics.analytics.model_b
),
model_c as (
select * from analytics.dbt_dconnors.model_b
),
...
```

A mix of prod and dev models may not be what we want! To avoid this, we have a couple options:

1. **Start fresh every time:** The simplest way to avoid this issue is to make sure you are always drop your development schema at the start of a new development session. That way, the only things that show up in your development schema are the things you intentionally selected with your commands!
2. **Favor state:** Passing the `--favor-state` flag to your command tells dbt “Hey benevolent tool, go ahead and use what you find in the production manifest no matter what you find in my development schema” so that both `{{ ref() }}` functions in the example above point to the production schema, even if `model_c` was hanging around in there.

In this example, `model_c` is a relic of a previous development cycle, but I should be clear here that defaulting to using dev relations is *usually the right course of action* &mdash; generally, a dbt PR spans a few models, and you want to coordinate your changes across those models together. This behavior can just get a bit confusing if you’re encountering it for the first time!

**When should I *not* defer to prod**

While defer is a faster and cheaper option for most folks in most situations, defer to prod does not support all projects. The most common reason you should not use defer is regulatory &mdash; defer to prod makes the assumption that data is shared between your production and development environments, so reading between these environments is not an issue. For some organizations, like healthcare companies, have restrictions around the data access and sharing that precludes the basic defer structure presented here.

### Call me Willem Defer

<Lightbox src="/img/blog/2024-01-09-defer-in-development/willem.png" title="Willem Dafoe after using the `-—defer` flag" />

Defer to prod is a powerful way to improve your development velocity with dbt, and dbt Cloud makes it easier than ever to make use of this feature! You too could look this cool while you’re saving time and money developing on your dbt projects!
12 changes: 6 additions & 6 deletions website/docs/best-practices/how-we-mesh/mesh-4-faqs.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,9 +44,9 @@ You can use model versions to:

A [model access modifier](/docs/collaborate/govern/model-access) in dbt determines if a model is accessible as an input to other dbt models and projects. It specifies where a model can be referenced using [the `ref` function](/reference/dbt-jinja-functions/ref). There are three types of access modifiers:

1. **Private:** A model with a private access modifier is only referenceable by models within the same group. This is intended for models that are implementation details and are meant to be used only within a specific group of related models.
2. **Protected:** Models with a protected access modifier can be referenced by any other model within the same dbt project or when the project is installed as a package. This is the default setting for all models, ensuring backward compatibility, especially when groups are assigned to an existing set of models.
3. **Public:** A public model can be referenced across different groups, packages, or projects. This is suitable for stable and mature models that serve as interfaces for other teams or projects.
* **Private:** A model with a private access modifier is only referenceable by models within the same group. This is intended for models that are implementation details and are meant to be used only within a specific group of related models.
* **Protected:** Models with a protected access modifier can be referenced by any other model within the same dbt project or when the project is installed as a package. This is the default setting for all models, ensuring backward compatibility, especially when groups are assigned to an existing set of models.
* **Public:** A public model can be referenced across different groups, packages, or projects. This is suitable for stable and mature models that serve as interfaces for other teams or projects.

</detailsToggle>

Expand Down Expand Up @@ -208,12 +208,12 @@ First things first: access to underlying data is always defined and enforced by

[Model access](/docs/collaborate/govern/model-access) defines where models can be referenced. It also informs the discoverability of those projects within dbt Explorer. Model `access` is defined in code, just like any other model configuration (`materialized`, `tags`, etc).

**Public:** Models with `public` access can be referenced everywhere. These are the “data products” of your organization.
* **Public:** Models with `public` access can be referenced everywhere. These are the “data products” of your organization.

**Protected:** Models with `protected` access can only be referenced within the same project. This is the default level of model access.
* **Protected:** Models with `protected` access can only be referenced within the same project. This is the default level of model access.
We are discussing a future extension to `protected` models to allow for their reference in _specific_ downstream projects. Please read [the GitHub issue](https://github.com/dbt-labs/dbt-core/issues/9340), and upvote/comment if you’re interested in this use case.

**Private:** Model `groups` enable more-granular control over where `private` models can be referenced. By defining a group, and configuring models to belong to that group, you can restrict other models (not in the same group) from referencing any `private` models the group contains. Groups also provide a standard mechanism for defining the `owner` of all resources it contains.
* **Private:** Model `groups` enable more-granular control over where `private` models can be referenced. By defining a group, and configuring models to belong to that group, you can restrict other models (not in the same group) from referencing any `private` models the group contains. Groups also provide a standard mechanism for defining the `owner` of all resources it contains.

Within dbt Explorer, `public` models are discoverable for every user in the dbt Cloud account — every public model is listed in the “multi-project” view. By contrast, `protected` and `private` models in a project are visible only to users who have access to that project (including read-only access).

Expand Down
Loading

0 comments on commit 33ae1bf

Please sign in to comment.