Merge branch 'current' into patch-1
mirnawong1 authored Jan 8, 2024
2 parents d9ac330 + 515840b commit a4500be
Showing 451 changed files with 4,292 additions and 3,726 deletions.
17 changes: 7 additions & 10 deletions .github/pull_request_template.md
@@ -1,26 +1,23 @@
## What are you changing in this pull request and why?
<!---
-Describe your changes and why you're making them. If linked to an open
+Describe your changes and why you're making them. If related to an open
issue or a pull request on dbt Core, then link to them here!
To learn more about the writing conventions used in the dbt Labs docs, see the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md).
-->

## Checklist
<!--
-Uncomment if you're publishing docs for a prerelease version of dbt (delete if not applicable):
+Uncomment when publishing docs for a prerelease version of dbt:
- [ ] Add versioning components, as described in [Versioning Docs](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-entire-pages)
- [ ] Add a note to the prerelease version [Migration Guide](https://github.com/dbt-labs/docs.getdbt.com/tree/current/website/docs/docs/dbt-versions/core-upgrade)
-->
- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

-Adding new pages (delete if not applicable):
-- [ ] Add page to `website/sidebars.js`
-- [ ] Provide a unique filename for the new page
-
-Removing or renaming existing pages (delete if not applicable):
-- [ ] Remove page from `website/sidebars.js`
-- [ ] Add an entry `website/static/_redirects`
-- [ ] Run link testing locally with `npm run build` to update the links that point to the deleted page
+Adding or removing pages (delete if not applicable):
+- [ ] Add/remove page in `website/sidebars.js`
+- [ ] Provide a unique filename for new pages
+- [ ] Add an entry for deleted pages in `website/static/_redirects`
+- [ ] Run link testing locally with `npm run build` to update the links that point to deleted pages
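
For context on the `_redirects` item: assuming the Netlify-style format that a `website/static/_redirects` file conventionally uses, each entry is one line mapping an old path to its replacement plus a status code — for example, `/docs/build/tests /docs/build/data-tests 301`, which matches the page rename this commit applies across the blog posts below.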
8 changes: 8 additions & 0 deletions .github/workflows/lint.yml
@@ -12,8 +12,16 @@ jobs:
        uses: actions/setup-node@v3
        with:
          node-version: '18.12.0'

+      - name: Cache Node Modules
+        uses: actions/cache@v3
+        id: cache-node-mods
+        with:
+          path: website/node_modules
+          key: node-modules-cache-v3-${{ hashFiles('**/package.json', '**/package-lock.json') }}
+
      - name: Install Packages
+        if: steps.cache-node-mods.outputs.cache-hit != 'true'
        run: cd website && npm ci

      - name: Run ESLint
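Pieced together, the cached-install flow added here reads as the sketch below. The runner, checkout step, the step names outside the hunk, and the lint command are assumptions, since the diff truncates everything around the changed lines:

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest            # assumed; not shown in the diff
    steps:
      - uses: actions/checkout@v3     # assumed; not shown in the diff
      - name: Setup Node              # assumed step name
        uses: actions/setup-node@v3
        with:
          node-version: '18.12.0'
      - name: Cache Node Modules
        uses: actions/cache@v3
        id: cache-node-mods
        with:
          path: website/node_modules
          key: node-modules-cache-v3-${{ hashFiles('**/package.json', '**/package-lock.json') }}
      - name: Install Packages
        if: steps.cache-node-mods.outputs.cache-hit != 'true'   # skip npm ci when the cache key matched
        run: cd website && npm ci
      - name: Run ESLint
        run: cd website && npm run lint   # assumed command; truncated in the diff
```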
5 changes: 4 additions & 1 deletion contributing/content-style-guide.md
@@ -284,7 +284,7 @@ If the list starts getting lengthy and dense, consider presenting the same conte

A bulleted list with introductory text:

-> A dbt project is a directory of `.sql` and .yml` files. The directory must contain at a minimum:
+> A dbt project is a directory of `.sql` and `.yml` files. The directory must contain at a minimum:
>
> - Models: A model is a single `.sql` file. Each model contains a single `select` statement that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.
> - A project file: A `dbt_project.yml` file, which configures and defines your dbt project.
@@ -479,6 +479,9 @@ Some common Latin abbreviations and other words to use instead:
| i.e. | that is | Use incremental models when your dbt runs are becoming too slow (that is, don't start with incremental models) |
| e.g. | <ul><li>for example</li><li>like</li></ul> | <ul><li>Join both the dedicated #adapter-ecosystem channel in dbt Slack and the channel for your adapter's data store (for example, #db-sqlserver and #db-athena)</li><li>Using Jinja in SQL provides a way to use control structures (like `if` statements and `for` loops) in your queries </li></ul> |
| etc. | <ul><li>and more</li><li>and so forth</li></ul> | <ul><li>A continuous integration environment running pull requests in GitHub, GitLab, and more</li><li>While reasonable defaults are provided for many such operations (like `create_schema`, `drop_schema`, `create_table`, and so forth), you might need to override one or more macros when building a new adapter</li></ul> |
+| N.B. | note | Note: State-based selection is a powerful, complex feature. |
+
+https://www.thoughtco.com/n-b-latin-abbreviations-in-english-3972787

### Prepositions

6 changes: 0 additions & 6 deletions package-lock.json

This file was deleted.

2 changes: 1 addition & 1 deletion website/blog/2021-11-22-dbt-labs-pr-template.md
@@ -70,7 +70,7 @@ Checking for things like modularity and 1:1 relationships between sources and st

#### Validation of models:

-This section should show something to confirm that your model is doing what you intended it to do. This could be a [dbt test](/docs/build/tests) like uniqueness or not null, or could be an ad-hoc query that you wrote to validate your data. Here is a screenshot from a test run on a local development branch:
+This section should show something to confirm that your model is doing what you intended it to do. This could be a [dbt test](/docs/build/data-tests) like uniqueness or not null, or could be an ad-hoc query that you wrote to validate your data. Here is a screenshot from a test run on a local development branch:

![test validation](/img/blog/pr-template-test-validation.png "dbt test validation")

2 changes: 1 addition & 1 deletion website/blog/2021-11-22-primary-keys.md
@@ -51,7 +51,7 @@ In the days before testing your data was commonplace, you often found out that y

## How to test primary keys with dbt

-Today, you can add two simple [dbt tests](/docs/build/tests) onto your primary keys and feel secure that you are going to catch the vast majority of problems in your data.
+Today, you can add two simple [dbt tests](/docs/build/data-tests) onto your primary keys and feel secure that you are going to catch the vast majority of problems in your data.

Not surprisingly, these two tests correspond to the two most common errors found on your primary keys, and are usually the first tests that teams testing data with dbt implement:

2 changes: 1 addition & 1 deletion website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
@@ -90,7 +90,7 @@ So instead of getting bogged down in defining roles, let’s focus on hard skill
The common skills needed for implementing any flavor of dbt (Core or Cloud) are:

* SQL: ‘nuff said
-* YAML: required to generate config files for [writing tests on data models](/docs/build/tests)
+* YAML: required to generate config files for [writing tests on data models](/docs/build/data-tests)
* [Jinja](/guides/using-jinja): allows you to write DRY code (using [macros](/docs/build/jinja-macros), for loops, if statements, etc)

YAML + Jinja can be learned pretty quickly, but SQL is the non-negotiable you’ll need to get started.
2 changes: 1 addition & 1 deletion website/blog/2021-11-29-open-source-community-growth.md
@@ -57,7 +57,7 @@ For starters, I want to know how much conversation is occurring across the vario

There are a ton of metrics that can be tracked in any GitHub project — committers, pull requests, forks, releases — but I started pretty simple. For each of the projects we participate in, I just want to know how the number of GitHub stars grows over time, and whether the growth is accelerating or flattening out. This has become a key performance indicator for open source communities, for better or for worse, and keeping track of it isn't optional.

-Finally, I want to know how much Marquez and OpenLineage are being used. It used to be that when you wanted to consume a bit of tech, you'd download a file. Folks like me who study user behavior would track download counts as if they were stock prices. This is no longer the case; today, our tech is increasingly distributed through package managers and image repositories. Docker Hub and PyPI metrics have therefore become good indicators of consumption. Docker image pulls and runs of `pip install` are the modern day download and, as noisy as these metrics are, they indicate a similar level of user commitment.
+Finally, I want to know how much Marquez and OpenLineage are being used. It used to be that when you wanted to consume a bit of tech, you'd download a file. Folks like me who study user behavior would track download counts as if they were stock prices. This is no longer the case; today, our tech is increasingly distributed through package managers and image repositories. Docker Hub and PyPI metrics have therefore become good indicators of consumption. Docker image pulls and runs of `python -m pip install` are the modern day download and, as noisy as these metrics are, they indicate a similar level of user commitment.

To summarize, here are the metrics I decided to track (for now, anyway):
- Slack messages (by user/ by community)
@@ -87,7 +87,7 @@ The most important thing we’re introducing when your project is an infant is t

* Introduce modularity with [{{ ref() }}](/reference/dbt-jinja-functions/ref) and [{{ source() }}](/reference/dbt-jinja-functions/source)

-* [Document](/docs/collaborate/documentation) and [test](/docs/build/tests) your first models
+* [Document](/docs/collaborate/documentation) and [test](/docs/build/data-tests) your first models

![image alt text](/img/blog/building-a-mature-dbt-project-from-scratch/image_3.png)

4 changes: 2 additions & 2 deletions website/blog/2022-04-14-add-ci-cd-to-bitbucket.md
@@ -159,7 +159,7 @@
          artifacts: # Save the dbt run artifacts for the next step (upload)
            - target/*.json
          script:
-            - pip install -r requirements.txt
+            - python -m pip install -r requirements.txt
            - mkdir ~/.dbt
            - cp .ci/profiles.yml ~/.dbt/profiles.yml
            - dbt deps
@@ -208,7 +208,7 @@
            # Set up dbt environment + dbt packages. Rather than passing
            # profiles.yml to dbt commands explicitly, we'll store it where dbt
            # expects it:
-            - pip install -r requirements.txt
+            - python -m pip install -r requirements.txt
            - mkdir ~/.dbt
            - cp .ci/profiles.yml ~/.dbt/profiles.yml
            - dbt deps
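Assembled from the fragments above, one such pipeline step might look like the sketch below; the `pipelines:` scaffolding and the step name are assumptions, since the diff shows only the step body:

```yaml
pipelines:
  default:
    - step:
        name: Run dbt                  # assumed step name
        artifacts:                     # save dbt run artifacts for the next (upload) step
          - target/*.json
        script:
          - python -m pip install -r requirements.txt
          - mkdir ~/.dbt
          - cp .ci/profiles.yml ~/.dbt/profiles.yml
          - dbt deps
          # ...subsequent dbt commands are truncated in the diff above
```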
2 changes: 1 addition & 1 deletion website/blog/2022-04-19-complex-deduplication.md
@@ -146,7 +146,7 @@ select * from filter_real_diffs

> *What happens in this step? You check your data because you are thorough!*
-Good thing dbt has already built this for you. Add a [unique test](/docs/build/tests#generic-tests) to your YAML model block for your `grain_id` in this de-duped staging model, and give it a dbt test!
+Good thing dbt has already built this for you. Add a [unique test](/docs/build/data-tests#generic-data-tests) to your YAML model block for your `grain_id` in this de-duped staging model, and give it a dbt test!

```yaml
models:
```
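
The YAML block above is cut off by the page. A minimal sketch of what the complete block might look like, assuming a hypothetical staging model name:

```yaml
models:
  - name: stg_events_deduped         # hypothetical de-duped staging model
    columns:
      - name: grain_id
        tests:
          - unique                   # fails if any grain_id appears more than once
```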
@@ -59,7 +59,7 @@ You probably agree that the latter example is definitely more elegant and easier

In addition to CLI commands that interact with a single dbt Cloud API endpoint, there are composite helper commands that call one or more API endpoints and perform more complex operations. One example is the pair `dbt-cloud job export` and `dbt-cloud job import`: under the hood, the export command performs a `dbt-cloud job get` and writes the job metadata to a <Term id="json" /> file, while the import command reads job parameters from a JSON file and calls `dbt-cloud job create`. The export and import commands can be used in tandem to move dbt Cloud jobs between projects. Another example is `dbt-cloud job delete-all`, which fetches a list of all jobs using `dbt-cloud job list` and then iterates over the list, prompting the user to confirm each deletion. For each job the user confirms, a `dbt-cloud job delete` is performed.

-To install the CLI in your Python environment run `pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or e.g. in a GitHub actions workflow.
+To install the CLI in your Python environment run `python -m pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or e.g. in a GitHub actions workflow.

## How the project came to be

@@ -310,7 +310,7 @@ The `CatalogExploreCommand.execute` method implements the interactive exploratio
I’ve included the app in the latest version of dbt-cloud-cli so you can test it out yourself! To use the app you need to install dbt-cloud-cli with extra dependencies:

```bash
-pip install dbt-cloud-cli[demo]
+python -m pip install dbt-cloud-cli[demo]
```

Now you can run the app:
2 changes: 1 addition & 1 deletion website/blog/2022-09-28-analyst-to-ae.md
@@ -111,7 +111,7 @@ The analyst caught the issue because they have the appropriate context to valida

An analyst is able to identify which areas do *not* need to be 100% accurate, which means they can also identify which areas *do* need to be 100% accurate.

-> dbt makes it very quick to add [data quality tests](/docs/build/tests). In fact, it’s so quick, that it’ll take an analyst longer to write up what tests they want than it would take for an analyst to completely finish coding them.
+> dbt makes it very quick to add [data quality tests](/docs/build/data-tests). In fact, it’s so quick, that it’ll take an analyst longer to write up what tests they want than it would take for an analyst to completely finish coding them.
When data quality issues are identified by the business, we often see that analysts are the first ones to be asked:

@@ -133,9 +133,9 @@ This model tries to parse the raw string value into a Python datetime. When not

#### Testing the result

-During the build process, dbt will check if any of the values are null. This is using the built-in [`not_null`](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) test, which will generate and execute SQL in the data platform.
+During the build process, dbt will check if any of the values are null. This is using the built-in [`not_null`](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-data-tests) test, which will generate and execute SQL in the data platform.

-Our initial recommendation for testing Python models is to use [generic](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) and [singular](https://docs.getdbt.com/docs/building-a-dbt-project/tests#singular-tests) tests.
+Our initial recommendation for testing Python models is to use [generic](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-data-tests) and [singular](https://docs.getdbt.com/docs/building-a-dbt-project/tests#singular-data-tests) tests.

```yaml
version: 2
```
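
This block is likewise truncated by the page. A minimal sketch of how such a generic test might be declared for a Python model — the model and column names are hypothetical:

```yaml
version: 2

models:
  - name: my_python_model            # hypothetical Python model name
    columns:
      - name: date_parsed            # hypothetical datetime column the model produces
        tests:
          - not_null                 # dbt generates and executes the null-check SQL
```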
2 changes: 1 addition & 1 deletion website/blog/2023-01-24-aggregating-test-failures.md
@@ -30,7 +30,7 @@ _It should be noted that this framework is for dbt v1.0+ on BigQuery. Small adap

When we talk about high quality data tests, we aren’t just referencing high quality code, but rather the informational quality of our testing framework and its error messages. Originally, we theorized that any test that cannot be acted upon is a test that should not be implemented. Later, we realized there is a time and place for tests that should receive attention only at a critical mass of failures. All we needed was a higher-specificity system: tests should have an explicit severity ranking associated with them, equipped to filter out the noise of common but low-concern failures. Each test should also mesh into established [RACI](https://project-management.com/understanding-responsibility-assignment-matrix-raci-matrix/) guidelines that state which groups tackle what failures, and what constitutes a critical mass.

-To ensure that tests are always acted upon, we implement tests differently depending on the user groups that must act when a test fails. This led us to have two main classes of tests — Data Integrity Tests (called [Generic Tests](https://docs.getdbt.com/docs/build/tests) in dbt docs) and Context Driven Tests (called [Singular Tests](https://docs.getdbt.com/docs/build/tests#singular-tests) in dbt docs), with varying levels of severity across both test classes.
+To ensure that tests are always acted upon, we implement tests differently depending on the user groups that must act when a test fails. This led us to have two main classes of tests — Data Integrity Tests (called [Generic Tests](https://docs.getdbt.com/docs/build/tests) in dbt docs) and Context Driven Tests (called [Singular Tests](https://docs.getdbt.com/docs/build/tests#singular-data-tests) in dbt docs), with varying levels of severity across both test classes.

Data Integrity tests (Generic Tests)  are simple — they’re tests akin to a uniqueness check or not null constraint. These tests are usually actionable by the data platform team rather than subject matter experts. We define Data Integrity tests in our YAML files, similar to how they are [outlined by dbt’s documentation on generic tests](https://docs.getdbt.com/docs/build/tests). They look something like this —

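The example itself is truncated by the page. A minimal sketch of a YAML-defined Data Integrity test carrying the explicit severity ranking the post describes — the model and column names are hypothetical:

```yaml
models:
  - name: fct_orders                 # hypothetical model name
    columns:
      - name: order_id
        tests:
          - not_null
          - unique:
              config:
                severity: warn       # explicit severity ranking, per the post
```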
@@ -79,12 +79,12 @@ Depending on which database you’ve chosen, install the relevant database adapt

```text
# install adaptor for duckdb
-pip install dbt-duckdb
+python -m pip install dbt-duckdb
# OR
# install adaptor for postgresql
-pip install dbt-postgres
+python -m pip install dbt-postgres
```

### Step 4: Setup dbt profile
@@ -16,6 +16,8 @@ This article covers an approach to handling time-varying ragged hierarchies in a

To help visualize this data, we're going to pretend we are a company that manufactures and rents out eBikes in a ride share application. When we build a bike, we keep track of the serial numbers of the components that make up the bike. Any time something breaks and needs to be replaced, we track the old parts that were removed and the new parts that were installed. We also precisely track the mileage accumulated on each of our bikes. Our primary analytical goal is to be able to report on the expected lifetime of each component, so we can prioritize improving that component and reduce costly maintenance.

+<!--truncate-->

## Data model

Obviously, a real bike could have a hundred or more separate components. To keep things simple for this article, let's just consider the bike, the frame, a wheel, the wheel rim, tire, and tube. Our component hierarchy looks like:
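The hierarchy diagram that follows is truncated by the page. From the component list above, the shape is presumably something like this sketch — the exact nesting is an assumption:

```yaml
bike:
  frame:            # leaf component
  wheel:
    rim:            # leaf component
    tire:
      tube:         # leaf component
```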
2 changes: 1 addition & 1 deletion website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md
@@ -143,7 +143,7 @@ To help you get started, [we have created a template GitHub project](https://git

### Entity Relation Diagrams (ERDs) and dbt

-Data lineage is dbt's strength, but sometimes it's not enough to help you to understand the relationships between Data Vault components like a classic ERD would. There are a few open source packages to visualize the entities in your Data Vault built with dbt. I recommend checking out the [dbterd](https://dbterd.datnguyen.de/1.2/index.html) which turns your [dbt relationship data quality checks](https://docs.getdbt.com/docs/build/tests#generic-tests) into an ERD.
+Data lineage is dbt's strength, but sometimes it's not enough to help you to understand the relationships between Data Vault components like a classic ERD would. There are a few open source packages to visualize the entities in your Data Vault built with dbt. I recommend checking out the [dbterd](https://dbterd.datnguyen.de/1.2/index.html) which turns your [dbt relationship data quality checks](https://docs.getdbt.com/docs/build/tests#generic-data-tests) into an ERD.

## Summary

Expand Down