Commit 4c7e2f0

Merge branch 'current' into mwong-clarify-defer

mirnawong1 authored Nov 13, 2023
2 parents 1da1108 + b047185

Showing 265 changed files with 6,728 additions and 5,494 deletions.
2 changes: 1 addition & 1 deletion contributing/adding-page-components.md
@@ -1,6 +1,6 @@
## Using warehouse components

-You can use the following components to provide code snippets for each supported warehouse. You can see a real-life example in the docs page [Initialize your project](/quickstarts/databricks?step=6).
+You can use the following components to provide code snippets for each supported warehouse. You can see a real-life example in the docs page [Initialize your project](/guides/databricks?step=6).

Identify code by labeling with the warehouse names:

2 changes: 1 addition & 1 deletion contributing/content-style-guide.md
@@ -360,7 +360,7 @@ Otherwise, the text will appear squished and provide users with a bad experience
- `<div className="grid--5-col">`: creates 5-column cards (use sparingly)
- You can't create cards with 6 or more columns as that would provide users with a poor experience.

-Refer to [dbt Cloud features](/docs/cloud/about-cloud/dbt-cloud-features) and [Quickstarts](/docs/quickstarts/overview) as examples.
+Refer to [dbt Cloud features](/docs/cloud/about-cloud/dbt-cloud-features) and [Quickstarts](/docs/guides) as examples.

### Create cards

6 changes: 3 additions & 3 deletions website/blog/2021-02-05-dbt-project-checklist.md
@@ -139,7 +139,7 @@ This post is the checklist I created to guide our internal work, and I’m shari
* [Sources](/docs/build/sources/)
* [Refs](/reference/dbt-jinja-functions/ref/)
* [tags](/reference/resource-configs/tags/)
-* [Jinja docs](/guides/advanced/using-jinja)
+* [Jinja docs](/guides/using-jinja)

## ✅ Testing & Continuous Integration
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -156,7 +156,7 @@ This post is the checklist I created to guide our internal work, and I’m shari

**Useful links**

-* [Version control](/guides/legacy/best-practices#version-control-your-dbt-project)
+* [Version control](/best-practices/best-practice-workflows#version-control-your-dbt-project)
* [dbt Labs' PR Template](/blog/analytics-pull-request-template)

## ✅ Documentation
@@ -252,7 +252,7 @@ Thanks to Christine Berger for her DAG diagrams!

**Useful links**

-* [How we structure our dbt Project](/guides/best-practices/how-we-structure/1-guide-overview)
+* [How we structure our dbt Project](/best-practices/how-we-structure/1-guide-overview)
* [Coalesce DAG Audit Talk](https://www.youtube.com/watch?v=5W6VrnHVkCA&t=2s)
* [Modular Data Modeling Technique](https://getdbt.com/analytics-engineering/modular-data-modeling-technique/)
* [Understanding Threads](/docs/running-a-dbt-project/using-threads)
@@ -159,4 +159,4 @@ All of the above configurations “work”. And as detailed, they each solve for
2. Figure out what may be a pain point in the future and try to plan for it from the beginning.
3. Don’t over-complicate things until you have the right reason. As I said in my Coalesce talk: **don’t drag your skeletons from one closet to another** 💀!

-**Note:** Our attempt in writing guides like this and [How we structure our dbt projects](/guides/best-practices/how-we-structure/1-guide-overview) isn’t to try to convince you that our way is right; it is to hopefully save you the hundreds of hours it has taken us to form those opinions!
+**Note:** Our attempt in writing guides like this and [How we structure our dbt projects](/best-practices/how-we-structure/1-guide-overview) isn’t to try to convince you that our way is right; it is to hopefully save you the hundreds of hours it has taken us to form those opinions!
2 changes: 1 addition & 1 deletion website/blog/2021-11-23-how-to-upgrade-dbt-versions.md
@@ -156,7 +156,7 @@ Once your compilation issues are resolved, it's time to run your job for real, t

After that, make sure that your CI environment in dbt Cloud or your orchestrator is on the right dbt version, then open a PR.

-If you're using [Slim CI](https://docs.getdbt.com/docs/guides/best-practices#run-only-modified-models-to-test-changes-slim-ci), keep in mind that artifacts aren't necessarily compatible from one version to another, so you won't be able to use it until the job you defer to has completed a run with the upgraded dbt version. This doesn't impact our example because support for Slim CI didn't come out until 0.18.0.
+If you're using [Slim CI](https://docs.getdbt.com/docs/best-practices#run-only-modified-models-to-test-changes-slim-ci), keep in mind that artifacts aren't necessarily compatible from one version to another, so you won't be able to use it until the job you defer to has completed a run with the upgraded dbt version. This doesn't impact our example because support for Slim CI didn't come out until 0.18.0.

## Step 7. Merge and communicate

@@ -26,7 +26,7 @@ So let’s all commit to sharing our hard won knowledge with each other—and in

The purpose of this blog is to double down on our long running commitment to contributing to the knowledge loop.

-From early posts like ‘[The Startup Founders Guide to Analytics’](https://thinkgrowth.org/the-startup-founders-guide-to-analytics-1d2176f20ac1) to foundational guides like [‘How We Structure Our dbt Projects](/guides/best-practices/how-we-structure/1-guide-overview)’, we’ve had a long standing goal of working with the community to create practical, hands-on tutorials and guides which distill the knowledge we’ve been able to collectively gather.
+From early posts like ‘[The Startup Founders Guide to Analytics’](https://thinkgrowth.org/the-startup-founders-guide-to-analytics-1d2176f20ac1) to foundational guides like [‘How We Structure Our dbt Projects](/best-practices/how-we-structure/1-guide-overview)’, we’ve had a long standing goal of working with the community to create practical, hands-on tutorials and guides which distill the knowledge we’ve been able to collectively gather.

dbt as a product is based around the philosophy that even the most complicated problems can be broken down into modular, reusable components, then mixed and matched to create something novel.

4 changes: 2 additions & 2 deletions website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
@@ -91,7 +91,7 @@ The common skills needed for implementing any flavor of dbt (Core or Cloud) are:

* SQL: ‘nuff said
* YAML: required to generate config files for [writing tests on data models](/docs/build/tests)
-* [Jinja](/guides/advanced/using-jinja): allows you to write DRY code (using [macros](/docs/build/jinja-macros), for loops, if statements, etc)
+* [Jinja](/guides/using-jinja): allows you to write DRY code (using [macros](/docs/build/jinja-macros), for loops, if statements, etc)

YAML + Jinja can be learned pretty quickly, but SQL is the non-negotiable you’ll need to get started.
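
To make the Jinja bullet above concrete, here is a minimal sketch of the DRY pattern it describes; the macro, model, and source names are invented for illustration rather than taken from the post:

```sql
-- macros/cents_to_dollars.sql (hypothetical): define the conversion once
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- models/staging/stg_payments.sql (hypothetical): reuse it wherever needed
select
    payment_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ source('stripe', 'payment') }}
```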

@@ -176,7 +176,7 @@ Instead you can now use the following command:
`dbt build --select result:error+ --defer --state <previous_state_artifacts>` … and that’s it!


-You can see more examples [here](https://docs.getdbt.com/docs/guides/best-practices#run-only-modified-models-to-test-changes-slim-ci).
+You can see more examples [here](https://docs.getdbt.com/docs/best-practices#run-only-modified-models-to-test-changes-slim-ci).


This means that whether you’re actively developing or you simply want to rerun a scheduled job (because of, say, permission errors or timeouts in your database), you now have a unified approach to doing both.
@@ -69,7 +69,7 @@ In addition to learning the basic pieces of dbt, we're familiarizing ourselves w

If we decide not to do this, we end up missing out on what the dbt workflow has to offer. If you want to learn more about why we think analytics engineering with dbt is the way to go, I encourage you to read the [dbt Viewpoint](/community/resources/viewpoint#analytics-is-collaborative)!

-In order to learn the basics, we’re going to [port over the SQL file](/guides/migration/tools/refactoring-legacy-sql) that powers our existing "patient_claim_summary" report that we use in our KPI dashboard in parallel to our old transformation process. We’re not ripping out the old plumbing just yet. In doing so, we're going to try dbt on for size and get used to interfacing with a dbt project.
+In order to learn the basics, we’re going to [port over the SQL file](/guides/refactoring-legacy-sql) that powers our existing "patient_claim_summary" report that we use in our KPI dashboard in parallel to our old transformation process. We’re not ripping out the old plumbing just yet. In doing so, we're going to try dbt on for size and get used to interfacing with a dbt project.

**Project Appearance**

@@ -30,7 +30,7 @@ In short, a jaffle is:

*See above: Tasty, tasty jaffles.*

-Jaffle Shop is a demo repo referenced in [dbt’s Getting Started Guide](/quickstarts), and its jaffles hold a special place in the dbt community’s hearts, as well as on Data Twitter™.
+Jaffle Shop is a demo repo referenced in [dbt’s Getting Started Guide](/guides), and its jaffles hold a special place in the dbt community’s hearts, as well as on Data Twitter™.

![jaffles on data twitter](/img/blog/2022-02-08-customer-360-view/image_1.png)

@@ -157,7 +157,7 @@ These 3 parts go from least granular (general) to most granular (specific) so yo

### Coming up...

-In this part of the series, we talked about why the model name is the center of understanding for the purpose and content within a model. In the upcoming ["How We Structure Our dbt Projects"](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview) guide, you can explore how to use this naming pattern with more specific examples in different parts of your dbt DAG that cover regular use cases:
+In this part of the series, we talked about why the model name is the center of understanding for the purpose and content within a model. In the upcoming ["How We Structure Our dbt Projects"](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview) guide, you can explore how to use this naming pattern with more specific examples in different parts of your dbt DAG that cover regular use cases:

- How would you name a model that is filtered on some columns
- Do we recommend naming snapshots in a specific way
2 changes: 1 addition & 1 deletion website/blog/2022-06-30-lower-sql-function.md
@@ -75,7 +75,7 @@ After running this query, the `customers` table will look a little something lik
Now, all characters in the `first_name` and `last_name` columns are lowercase.

> **Where do you lower?**
-> Changing all string columns to lowercase to create uniformity across data sources typically happens in our dbt project’s [staging models](https://docs.getdbt.com/guides/best-practices/how-we-structure/2-staging). There are a few reasons for that: data cleanup and standardization, such as aliasing, casting, and lowercasing, should ideally happen in staging models to create downstream uniformity. It’s also more performant in downstream models that join on string values to join on strings that are of all the same casing versus having to join and perform lowercasing at the same time.
+> Changing all string columns to lowercase to create uniformity across data sources typically happens in our dbt project’s [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging). There are a few reasons for that: data cleanup and standardization, such as aliasing, casting, and lowercasing, should ideally happen in staging models to create downstream uniformity. It’s also more performant in downstream models that join on string values to join on strings that are of all the same casing versus having to join and perform lowercasing at the same time.
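
A minimal sketch of that staging-model convention, reusing the post's `customers` example (the model and source names are assumptions):

```sql
-- models/staging/stg_customers.sql (hypothetical): standardize casing once, upstream of any joins
select
    id as customer_id,
    lower(first_name) as first_name,
    lower(last_name) as last_name
from {{ source('jaffle_shop', 'customers') }}
```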
## Why we love it

2 changes: 1 addition & 1 deletion website/blog/2022-07-19-migrating-from-stored-procs.md
@@ -54,7 +54,7 @@ With dbt, we work towards creating simpler, more transparent data pipelines like

![Diagram of what data flows look like with dbt. It's easier to trace lineage in this setup.](/img/blog/2022-07-19-migrating-from-stored-procs/dbt-diagram.png)

-Tight [version control integration](https://docs.getdbt.com/docs/guides/best-practices#version-control-your-dbt-project) is an added benefit of working with dbt. By leveraging the power of git-based tools, dbt enables you to integrate and test changes to transformation pipelines much faster than you can with other approaches. We often see teams who work in stored procedures making changes to their code without any notion of tracking those changes over time. While that’s more of an issue with the team’s chosen workflow than a problem with stored procedures per se, it does reflect how legacy tooling makes analytics work harder than necessary.
+Tight [version control integration](https://docs.getdbt.com/docs/best-practices#version-control-your-dbt-project) is an added benefit of working with dbt. By leveraging the power of git-based tools, dbt enables you to integrate and test changes to transformation pipelines much faster than you can with other approaches. We often see teams who work in stored procedures making changes to their code without any notion of tracking those changes over time. While that’s more of an issue with the team’s chosen workflow than a problem with stored procedures per se, it does reflect how legacy tooling makes analytics work harder than necessary.

## Methodologies for migrating from stored procedures to dbt

2 changes: 1 addition & 1 deletion website/blog/2022-07-26-pre-commit-dbt.md
@@ -12,7 +12,7 @@ is_featured: true

*Editor's note — since the creation of this post, the package pre-commit-dbt's ownership has moved to another team and it has been renamed to [dbt-checkpoint](https://github.com/dbt-checkpoint/dbt-checkpoint). A redirect has been set up, meaning that the code example below will still work. It is also possible to replace `repo: https://github.com/offbi/pre-commit-dbt` with `repo: https://github.com/dbt-checkpoint/dbt-checkpoint` in your `.pre-commit-config.yaml` file.*

-At dbt Labs, we have [best practices](https://docs.getdbt.com/docs/guides/best-practices) we like to follow for the development of dbt projects. One of them, for example, is that all models should have at least `unique` and `not_null` tests on their primary key. But how can we enforce rules like this?
+At dbt Labs, we have [best practices](https://docs.getdbt.com/docs/best-practices) we like to follow for the development of dbt projects. One of them, for example, is that all models should have at least `unique` and `not_null` tests on their primary key. But how can we enforce rules like this?

That question becomes difficult to answer in large dbt projects. Developers might not follow the same conventions. They might not be aware of past decisions, and reviewing pull requests in git can become more complex. When dbt projects have hundreds of models, it's hard to know which models do not have any tests defined and aren't enforcing your conventions.

@@ -286,7 +286,7 @@ Developing an analytic code base is an ever-evolving process. What worked well w

4. **Test on representative data**

-Testing on a [subset of data](https://docs.getdbt.com/guides/legacy/best-practices#limit-the-data-processed-when-in-development) is a great general practice. It allows you to iterate quickly, and doesn’t waste resources. However, there are times when you need to test on a larger dataset for problems like disk spillage to come to the fore. Testing on large data is hard and expensive, so make sure you have a good idea of the solution before you commit to this step.
+Testing on a [subset of data](https://docs.getdbt.com/best-practices/best-practice-workflows#limit-the-data-processed-when-in-development) is a great general practice. It allows you to iterate quickly, and doesn’t waste resources. However, there are times when you need to test on a larger dataset for problems like disk spillage to come to the fore. Testing on large data is hard and expensive, so make sure you have a good idea of the solution before you commit to this step.

5. **Repeat**

2 changes: 1 addition & 1 deletion website/blog/2022-08-22-narrative-modeling.md
@@ -177,7 +177,7 @@ To that final point, if presented with the DAG from the narrative modeling appro

### Users can tie business concepts to source data

-- While the schema structure above is focused on business entities, there are still ample use cases for [staging and intermediate tables](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview).
+- While the schema structure above is focused on business entities, there are still ample use cases for [staging and intermediate tables](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview).
- After cleaning up source data with staging tables, use the same “what happened” approach to more technical events, creating a three-node dependency from `stg_snowplow_events` to `int_page_click_captured` to `user_refreshed_cart` and thus answering the question “where do we get online user behavior information?” in a quick visit to the DAG in dbt docs.
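
A rough sketch of the middle model in that three-node chain; the column names and event filter are assumptions, but it shows how `ref()` makes the "what happened" lineage explicit:

```sql
-- models/intermediate/int_page_click_captured.sql (hypothetical)
select
    event_id,
    user_id,
    page_url,
    collector_tstamp as clicked_at
from {{ ref('stg_snowplow_events') }}
where event_name = 'page_click'

-- user_refreshed_cart would in turn select from {{ ref('int_page_click_captured') }},
-- so the "where does this come from?" question is answered directly by the DAG.
```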

# Should your team use it?
2 changes: 1 addition & 1 deletion website/blog/2022-09-08-konmari-your-query-migration.md
@@ -108,7 +108,7 @@ Here are a few things to look for:

## Steps 4 & 5: Tidy by category and follow the right order—upstream to downstream

-We are ready to unpack our kitchen. Use your design as a guideline for [modularization](/guides/best-practices/how-we-structure/1-guide-overview).
+We are ready to unpack our kitchen. Use your design as a guideline for [modularization](/best-practices/how-we-structure/1-guide-overview).

- Build your staging tables first, and then your intermediate tables in your pre-planned buckets.
- Important, reusable joins that are performed in the final query should be moved upstream into their own modular models, as well as any joins that are repeated in your query.
4 changes: 2 additions & 2 deletions website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md
@@ -102,7 +102,7 @@ Instead of syncing all cells in a sheet, you create a [named range](https://five

<Lightbox src="/img/blog/2022-11-22-move-spreadsheets-to-your-dwh/google-sheets-uploader.png" title="Creating a named range in Google Sheets to sync via the Fivetran Google Sheets Connector" />

-Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/guides/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null.
+Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null.
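
A small sketch of that explicit-casting approach; the model, source, and column names are invented, and `try_cast` is Snowflake syntax (BigQuery has `safe_cast`, and other warehouses have their own equivalents):

```sql
-- models/staging/stg_marketing_budgets.sql (hypothetical)
-- try_cast returns null instead of failing when someone typed text into a numeric column
select
    campaign_name,
    try_cast(monthly_budget as number(18, 2)) as monthly_budget
from {{ source('google_sheets', 'marketing_budgets') }}
```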

#### Good fit for:

@@ -192,4 +192,4 @@ Databricks also supports [pulling in data, such as spreadsheets, from external c

Beyond the options we’ve already covered, there’s an entire world of other tools that can load data from your spreadsheets into your data warehouse. This is a living document, so if your preferred method isn't listed then please [open a PR](https://github.com/dbt-labs/docs.getdbt.com) and I'll check it out.

-The most important things to consider are your files’ origins and formats—if you need your colleagues to upload files on a regular basis then try to provide them with a more user-friendly process; but if you just need two computers to talk to each other, or it’s a one-off file that will hardly ever change, then a more technical integration is totally appropriate.
+The most important things to consider are your files’ origins and formats—if you need your colleagues to upload files on a regular basis then try to provide them with a more user-friendly process; but if you just need two computers to talk to each other, or it’s a one-off file that will hardly ever change, then a more technical integration is totally appropriate.