diff --git a/website/blog/2023-12-20-partner-integration-guide.md b/website/blog/2023-12-20-partner-integration-guide.md
index b546f258f6c..432ed97635b 100644
--- a/website/blog/2023-12-20-partner-integration-guide.md
+++ b/website/blog/2023-12-20-partner-integration-guide.md
@@ -20,7 +20,7 @@ This guide doesn't include how to integrate with dbt Core. If you’re intereste
 Instead, we're going to focus on integrating with dbt Cloud. Integrating with dbt Cloud is a key requirement to become a dbt Labs technology partner, opening the door to a variety of collaborative commercial opportunities. Here I'll cover how to get started, potential use cases you want to solve for, and points of integrations to do so.
-
+
 ## New to dbt Cloud?
 If you're new to dbt and dbt Cloud, we recommend you and your software developers try our [Getting Started Quickstarts](https://docs.getdbt.com/guides) after reading [What is dbt](https://docs.getdbt.com/docs/introduction). The documentation will help you familiarize yourself with how our users interact with dbt. By going through this, you will also create a sample dbt project to test your integration.
diff --git a/website/blog/2024-01-09-defer-in-development.md b/website/blog/2024-01-09-defer-in-development.md
new file mode 100644
index 00000000000..4ff48d7d2e0
--- /dev/null
+++ b/website/blog/2024-01-09-defer-in-development.md
@@ -0,0 +1,160 @@
+---
+title: "More time coding, less time waiting: Mastering defer in dbt"
+description: "Learn how to take advantage of the defer to prod feature in dbt Cloud"
+slug: defer-to-prod
+
+authors: [dave_connors]
+
+tags: [analytics craft]
+hide_table_of_contents: false
+
+date: 2024-01-09
+is_featured: true
+---
+
+Picture this — you’ve got a massive dbt project, thousands of models chugging along, creating actionable insights for your stakeholders. A ticket comes your way — a model needs to be refactored! "No problem," you think to yourself, "I will simply make that change and test it locally!"
You look at your lineage, and realize this model is many layers deep, buried underneath a long chain of tables and views.
+
+“OK,” you think further, “I’ll just run a `dbt build -s +my_changed_model` to make sure I have everything I need built into my dev schema and I can test my changes”. You run the command. You wait. You wait some more. You get some coffee, and completely take yourself out of your dbt development flow state. A lot of time and money down the drain to get to a point where you can *start* your work. That’s no good!
+
+Luckily, dbt’s defer functionality allows you to *only* build what you care about when you need it, and nothing more. This feature (which has been around for a long time!) helps developers spend less time and money in development, helping ship trusted data products faster. dbt Cloud now offers native support for this workflow in development, so it’s never been easier to master the defer feature in dbt!
+
+## Defer to prod or prefer to slog
+
+A lot of dbt’s magic relies on the elegance and simplicity of the `{{ ref() }}` function, which is how you can build your lineage graph, and how dbt can be run in different environments — the `{{ ref() }}` functions dynamically compile depending on your environment settings, so that you can run your project in development and production without changing any code.
+
+Here's how a simple `{{ ref() }}` would compile in different environments:
+
+
+
+
+
+ ```sql
+ -- in models/my_model.sql
+ select * from {{ ref('model_a') }}
+ ```
+
+
+
+
+ ```sql
+ -- in target/compiled/models/my_model.sql
+ select * from analytics.dbt_dconnors.model_a
+ ```
+
+
+
+
+ ```sql
+ -- in target/compiled/models/my_model.sql
+ select * from analytics.analytics.model_a
+ ```
+
+
+
+
+All of that is made possible by the dbt `manifest.json`, [the artifact](https://docs.getdbt.com/reference/artifacts/manifest-json) that is produced each time you run a dbt command, containing the comprehensive and encyclopedic compendium of all things in your project. Each node is assigned a `unique_id` (like `model.my_project.my_model`) and the manifest stores all the metadata about that model in a dictionary associated to that id. This includes the data warehouse location that gets returned when you write `{{ ref('my_model') }}` in SQL. Different runs of your project in different environments result in different metadata written to the manifest.
+
+Let’s think back to the hypothetical above — what if we made use of the production metadata to read in data from production, so that I don’t have to rebuild *everything* upstream of the model I’m changing? That’s exactly what `defer` does! When you supply dbt with a production version of the `manifest.json` artifact, and pass the `--defer` flag to your dbt command, dbt will resolve the `{{ ref() }}` functions for any resource upstream of your selected models with the *production metadata* — no need to rebuild anything you don’t have to!
+
+Let’s take a look at a simplified example — let’s say your project looks like this in production:
+
+
+
+And you’re tasked with making changes to `model_f`.
Without defer, you would need to make sure to at minimum execute a `dbt run -s +model_f` to ensure all the upstream dependencies of `model_f` are present in your development schema so that you can start to run `model_f`.* You just spent a whole bunch of time and money duplicating your models, and now your warehouse looks like this:
+
+
+
+With defer, we should not build anything other than the models that have changed, and are now different from their production counterparts! Let’s tell dbt to use production metadata to resolve our refs, and only build the model I have changed — that command would be `dbt run -s model_f --defer`.\*\*
+
+
+
+This results in a *much slimmer build* — we read data in directly from the production version of `model_b` and `model_c`, and don’t have to worry about building anything other than what we selected!
+
+\* [Another option](https://docs.getdbt.com/reference/commands/clone) is to run `dbt clone -s +model_f`, which will make clones of your production models into your development schema, making use of zero-copy cloning where available. Check out this [great dev blog](https://docs.getdbt.com/blog/to-defer-or-to-clone) from Doug and Kshitij on when to use `clone` vs `defer`!
+
+\*\* In dbt Core, you also have to tell dbt where to find the production artifacts! Otherwise it doesn’t know what to defer to. You can either use the `--state path/to/artifact/folder` option, or set a `DBT_STATE` environment variable.
+
+### Batteries included deferral in dbt Cloud
+
+dbt Cloud offers a seamless deferral experience in both the dbt Cloud IDE and the dbt Cloud CLI — dbt Cloud ***always*** has the latest run artifacts from your production environment. Rather than having to go through the painful process of somehow getting a copy of your latest production `manifest.json` into your local filesystem to defer to, and building a pipeline to always keep it fresh, dbt Cloud does all that work for you.
When developing in dbt Cloud, the latest artifact is automatically provided to you under the hood, and dbt Cloud handles the `--defer` flag for you when you run commands in “defer mode”. dbt Cloud will use the artifacts from the deployment environment in your project marked as `Production` in the [environments settings](https://docs.getdbt.com/docs/deploy/deploy-environments#set-as-production-environment) in both the IDE and the Cloud CLI. Be sure to configure a production environment to unlock this feature!
+
+In the dbt Cloud IDE, there’s a simple toggle switch labeled `Defer to production`. Simply enabling this toggle will defer your command to the production environment when you run any dbt command in the IDE!
+
+
+
+The Cloud CLI has this setting *on by default* — there’s nothing else you need to do to set this up! If you prefer not to defer, you can pass the `--no-defer` flag to override this behavior. You can also set an environment other than your production environment as the deferred-to environment in your `dbt-cloud` settings in your `dbt_project.yml`:
+
+```yaml
+dbt-cloud:
+  project-id:
+  defer-env-id:
+```
+
+When you’re developing with dbt Cloud, you can defer right away, and completely avoid unnecessary model builds in development!
+
+### Other things to know about defer
+
+**Favoring state**
+
+One of the major gotchas in the defer workflow is that when you’re in defer mode, dbt assumes that all the objects in your development schema are part of your current work stream, and will prioritize those objects over the production objects when possible.
+
+Let’s take a look at that example above again, and pretend that some time before we went to make this edit, we did some work on `model_c`, and we have a local copy of `model_c` hanging out in our development schema:
+
+
+
+When you run `dbt run -s model_f --defer`, dbt will detect the development copy of `model_c` and say “Hey, y’know, I bet Dave is working on that model too, and he probably wants to make sure his changes to `model_c` work together with his changes to `model_f`. Because I am a kind and benevolent data transformation tool, I’ll make sure his `{{ ref('model_c') }}` function compiles to his development changes!” Thanks dbt!
+
+As a result, we’ll effectively see this behavior when we run our command:
+
+
+
+Where our code would compile from
+
+```sql
+-- in models/model_f.sql
+with
+
+model_b as (
+    select * from {{ ref('model_b') }}
+),
+
+model_c as (
+    select * from {{ ref('model_c') }}
+),
+
+...
+```
+
+to
+
+```sql
+-- in target/compiled/models/model_f.sql
+with
+
+model_b as (
+    select * from analytics.analytics.model_b
+),
+
+model_c as (
+    select * from analytics.dbt_dconnors.model_c
+),
+
+...
+```
+
+A mix of prod and dev models may not be what we want! To avoid this, we have a couple of options:
+
+1. **Start fresh every time:** The simplest way to avoid this issue is to make sure you always drop your development schema at the start of a new development session. That way, the only things that show up in your development schema are the things you intentionally selected with your commands!
+2. **Favor state:** Passing the `--favor-state` flag to your command tells dbt “Hey benevolent tool, go ahead and use what you find in the production manifest no matter what you find in my development schema” so that both `{{ ref() }}` functions in the example above point to the production schema, even if `model_c` was hanging around in there.
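
To make the rule above concrete, here is a minimal Python sketch of how a deferred `{{ ref() }}` could be resolved. It is a toy model under stated assumptions, not dbt's actual implementation: `Relation`, `resolve_ref`, and `compile_upstream` are hypothetical names, and the `dev_objects` set stands in for dbt checking which relations exist in your development schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A warehouse location, similar to what a manifest records for a node."""
    database: str
    schema: str
    identifier: str

    def __str__(self) -> str:
        return f"{self.database}.{self.schema}.{self.identifier}"


def resolve_ref(name, *, selected, dev_objects, dev, prod, favor_state=False):
    """Pick the relation a ref to `name` compiles to in defer mode (toy rules)."""
    if name in selected:
        # Models selected in this run are always built and read in dev.
        return dev[name]
    if favor_state:
        # Assumed --favor-state behavior: prefer production for everything else.
        return prod[name]
    if name in dev_objects:
        # Default defer behavior: an existing dev object wins over production.
        return dev[name]
    return prod[name]


# Hypothetical metadata mirroring the schemas used in this post.
models = ["model_b", "model_c", "model_f"]
dev = {m: Relation("analytics", "dbt_dconnors", m) for m in models}
prod = {m: Relation("analytics", "analytics", m) for m in models}

selected = {"model_f"}     # dbt run -s model_f --defer
dev_objects = {"model_c"}  # leftover from an earlier development session


def compile_upstream(favor_state):
    """Resolve the two refs that model_f makes, as fully qualified names."""
    return {
        m: str(resolve_ref(m, selected=selected, dev_objects=dev_objects,
                           dev=dev, prod=prod, favor_state=favor_state))
        for m in ("model_b", "model_c")
    }


print(compile_upstream(favor_state=False))
print(compile_upstream(favor_state=True))
```

In this sketch, the leftover `model_c` resolves to the development schema by default and to production once `favor_state` is set, while `model_b` resolves to production either way.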
+
+In this example, `model_c` is a relic of a previous development cycle, but I should be clear here that defaulting to using dev relations is *usually the right course of action* — generally, a dbt PR spans a few models, and you want to coordinate your changes across those models together. This behavior can just get a bit confusing if you’re encountering it for the first time!
+
+**When should I *not* defer to prod?**
+
+While defer is a faster and cheaper option for most folks in most situations, defer to prod does not support all projects. The most common reason you should not use defer is regulatory — defer to prod makes the assumption that data is shared between your production and development environments, so reading between these environments is not an issue. Some organizations, like healthcare companies, have restrictions around data access and sharing that preclude the basic defer structure presented here.
+
+### Call me Willem Defer
+
+
+
+Defer to prod is a powerful way to improve your development velocity with dbt, and dbt Cloud makes it easier than ever to make use of this feature! You too could look this cool while you’re saving time and money developing on your dbt projects!
diff --git a/website/static/img/blog/2023-12-04-defer-in-development/defer-toggle.png b/website/static/img/blog/2023-12-04-defer-in-development/defer-toggle.png
new file mode 100644
index 00000000000..7161dc68b93
Binary files /dev/null and b/website/static/img/blog/2023-12-04-defer-in-development/defer-toggle.png differ
diff --git a/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-defer.png b/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-defer.png
new file mode 100644
index 00000000000..7ec96a7b598
Binary files /dev/null and b/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-defer.png differ
diff --git a/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-full.png b/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-full.png
new file mode 100644
index 00000000000..45a1119cd96
Binary files /dev/null and b/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-full.png differ
diff --git a/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-mixed.png b/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-mixed.png
new file mode 100644
index 00000000000..1020c3b65f0
Binary files /dev/null and b/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-mixed.png differ
diff --git a/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-model-c.png b/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-model-c.png
new file mode 100644
index 00000000000..3f48255ac12
Binary files /dev/null and b/website/static/img/blog/2023-12-04-defer-in-development/prod-and-dev-model-c.png differ
diff --git a/website/static/img/blog/2023-12-04-defer-in-development/prod-environment-plain.png b/website/static/img/blog/2023-12-04-defer-in-development/prod-environment-plain.png
new file mode 100644
index 00000000000..5c2860411ec
Binary files /dev/null and b/website/static/img/blog/2023-12-04-defer-in-development/prod-environment-plain.png differ
diff --git a/website/static/img/blog/2023-12-04-defer-in-development/willem.png b/website/static/img/blog/2023-12-04-defer-in-development/willem.png
new file mode 100644
index 00000000000..bd38e9b0bd4
Binary files /dev/null and b/website/static/img/blog/2023-12-04-defer-in-development/willem.png differ