Merge branch 'current' into patch-1
matthewshaver authored Nov 30, 2023
2 parents c05c218 + 7fc3cbf commit 82e8fdc
Showing 1,323 changed files with 4,169 additions and 3,149 deletions.
5 changes: 3 additions & 2 deletions .github/pull_request_template.md
@@ -12,7 +12,8 @@ Uncomment if you're publishing docs for a prerelease version of dbt (delete if not applicable)
- [ ] Add versioning components, as described in [Versioning Docs](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-entire-pages)
- [ ] Add a note to the prerelease version [Migration Guide](https://github.com/dbt-labs/docs.getdbt.com/tree/current/website/docs/docs/dbt-versions/core-upgrade)
-->
- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) and [About versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) so my content adheres to these guidelines.
- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Adding new pages (delete if not applicable):
@@ -22,4 +23,4 @@ Adding new pages (delete if not applicable):
Removing or renaming existing pages (delete if not applicable):
- [ ] Remove page from `website/sidebars.js`
- [ ] Add an entry `website/static/_redirects`
- [ ] [Ran link testing](https://github.com/dbt-labs/docs.getdbt.com#running-the-cypress-tests-locally) to update the links that point to the deleted page
- [ ] Run link testing locally with `npm run build` to update the links that point to the deleted page
17 changes: 17 additions & 0 deletions .github/workflows/asana-connection.yml
@@ -0,0 +1,17 @@
name: Show PR Status in Asana
on:
pull_request:
types: [opened, reopened]

jobs:
create-asana-attachment-job:
runs-on: ubuntu-latest
name: Create pull request attachments on Asana tasks
steps:
- name: Create pull request attachments
uses: Asana/create-app-attachment-github-action@latest
id: postAttachment
with:
asana-secret: ${{ secrets.ASANA_SECRET }}
- name: Log output status
run: echo "Status is ${{ steps.postAttachment.outputs.status }}"
8 changes: 8 additions & 0 deletions .github/workflows/lint.yml
@@ -12,8 +12,16 @@ jobs:
uses: actions/setup-node@v3
with:
node-version: '18.12.0'

- name: Cache Node Modules
uses: actions/cache@v3
id: cache-node-mods
with:
path: website/node_modules
key: node-modules-cache-v3-${{ hashFiles('**/package.json', '**/package-lock.json') }}

- name: Install Packages
if: steps.cache-node-mods.outputs.cache-hit != 'true'
run: cd website && npm ci

- name: Run ESLint
6 changes: 0 additions & 6 deletions package-lock.json

This file was deleted.

2 changes: 1 addition & 1 deletion website/blog/2021-11-29-open-source-community-growth.md
@@ -57,7 +57,7 @@ For starters, I want to know how much conversation is occurring across the various

There are a ton of metrics that can be tracked in any GitHub project — committers, pull requests, forks, releases — but I started pretty simple. For each of the projects we participate in, I just want to know how the number of GitHub stars grows over time, and whether the growth is accelerating or flattening out. This has become a key performance indicator for open source communities, for better or for worse, and keeping track of it isn't optional.

Finally, I want to know how much Marquez and OpenLineage are being used. It used to be that when you wanted to consume a bit of tech, you'd download a file. Folks like me who study user behavior would track download counts as if they were stock prices. This is no longer the case; today, our tech is increasingly distributed through package managers and image repositories. Docker Hub and PyPI metrics have therefore become good indicators of consumption. Docker image pulls and runs of `pip install` are the modern day download and, as noisy as these metrics are, they indicate a similar level of user commitment.
Finally, I want to know how much Marquez and OpenLineage are being used. It used to be that when you wanted to consume a bit of tech, you'd download a file. Folks like me who study user behavior would track download counts as if they were stock prices. This is no longer the case; today, our tech is increasingly distributed through package managers and image repositories. Docker Hub and PyPI metrics have therefore become good indicators of consumption. Docker image pulls and runs of `python -m pip install` are the modern day download and, as noisy as these metrics are, they indicate a similar level of user commitment.

To summarize, here are the metrics I decided to track (for now, anyway):
- Slack messages (by user/ by community)
4 changes: 2 additions & 2 deletions website/blog/2022-04-14-add-ci-cd-to-bitbucket.md
@@ -159,7 +159,7 @@ pipelines:
artifacts: # Save the dbt run artifacts for the next step (upload)
- target/*.json
script:
- pip install -r requirements.txt
- python -m pip install -r requirements.txt
- mkdir ~/.dbt
- cp .ci/profiles.yml ~/.dbt/profiles.yml
- dbt deps
@@ -208,7 +208,7 @@ pipelines:
# Set up dbt environment + dbt packages. Rather than passing
# profiles.yml to dbt commands explicitly, we'll store it where dbt
# expects it:
- pip install -r requirements.txt
- python -m pip install -r requirements.txt
- mkdir ~/.dbt
- cp .ci/profiles.yml ~/.dbt/profiles.yml
- dbt deps
@@ -59,7 +59,7 @@ You probably agree that the latter example is definitely more elegant and easier to read

In addition to CLI commands that interact with a single dbt Cloud API endpoint, there are composite helper commands that call one or more API endpoints to perform more complex operations. One example is the pair `dbt-cloud job export` and `dbt-cloud job import`: under the hood, the export command performs a `dbt-cloud job get` and writes the job metadata to a <Term id="json" /> file, while the import command reads job parameters from a JSON file and calls `dbt-cloud job create`. The export and import commands can be used in tandem to move dbt Cloud jobs between projects. Another example is `dbt-cloud job delete-all`, which fetches a list of all jobs using `dbt-cloud job list` and then iterates over the list, prompting the user to confirm each deletion. For each job the user agrees to delete, a `dbt-cloud job delete` is performed.
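For instance, moving a job between projects might look like the following sketch (the job ID and flag names are placeholders for illustration; check `dbt-cloud job export --help` for the actual interface):

```bash
# Export a job's metadata to a JSON file, then recreate it in another project.
# The job ID and flag names below are assumptions for illustration.
dbt-cloud job export --job-id 43167 > job.json
dbt-cloud job import --file job.json
```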

To install the CLI in your Python environment, run `pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or, for example, in a GitHub Actions workflow.
To install the CLI in your Python environment, run `python -m pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or, for example, in a GitHub Actions workflow.
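As a sketch of the GitHub Actions case (the secret names and trigger are assumptions; `dbt-cloud` reads its account, job, and token settings from `DBT_CLOUD_*` environment variables):

```yaml
name: Trigger dbt Cloud job
on: workflow_dispatch

jobs:
  trigger:
    runs-on: ubuntu-latest
    env:
      # Secret names are placeholders; configure them in your repo settings
      DBT_CLOUD_API_TOKEN: ${{ secrets.DBT_CLOUD_API_TOKEN }}
      DBT_CLOUD_ACCOUNT_ID: ${{ secrets.DBT_CLOUD_ACCOUNT_ID }}
      DBT_CLOUD_JOB_ID: ${{ secrets.DBT_CLOUD_JOB_ID }}
    steps:
      - name: Install dbt-cloud-cli
        run: python -m pip install dbt-cloud-cli
      - name: Trigger the job
        run: dbt-cloud job run
```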

## How the project came to be

@@ -310,7 +310,7 @@ The `CatalogExploreCommand.execute` method implements the interactive exploration
I’ve included the app in the latest version of dbt-cloud-cli so you can test it out yourself! To use the app, you need to install dbt-cloud-cli with extra dependencies:

```bash
pip install dbt-cloud-cli[demo]
python -m pip install dbt-cloud-cli[demo]
```

Now you can run the app:
@@ -79,12 +79,12 @@ Depending on which database you’ve chosen, install the relevant database adapter

```text
# install adapter for duckdb
pip install dbt-duckdb
python -m pip install dbt-duckdb
# OR
# install adapter for postgresql
pip install dbt-postgres
python -m pip install dbt-postgres
```

### Step 4: Set up dbt profile
73 changes: 73 additions & 0 deletions website/blog/2023-11-14-specify-prod-environment.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
---

title: Why you should specify a production environment in dbt Cloud
description: "The bottom line: You should split your Environments in dbt Cloud based on their purposes (e.g. Production and Staging/CI) and mark one environment as Production. This will improve your CI experience and enable you to use dbt Explorer."
slug: specify-prod-environment

authors: [joel_labes]

tags: [dbt Cloud]
hide_table_of_contents: false

date: 2023-11-14
is_featured: false

---

:::tip The Bottom Line:
You should [split your Jobs](#how) across Environments in dbt Cloud based on their purposes (e.g. Production and Staging/CI) and set one environment as Production. This will improve your CI experience and enable you to use dbt Explorer.
:::

[Environmental segmentation](/docs/environments-in-dbt) has always been an important part of the analytics engineering workflow:

- When developing new models you can [process a smaller subset of your data](/reference/dbt-jinja-functions/target#use-targetname-to-limit-data-in-dev) by using `target.name` or an environment variable (see the sketch after this list).
- By building your production-grade models into [a different schema and database](https://docs.getdbt.com/docs/build/custom-schemas#managing-environments), you can experiment in peace without being worried that your changes will accidentally impact downstream users.
- Using dedicated credentials for production runs, instead of an analytics engineer's individual dev credentials, ensures that things don't break when that long-tenured employee finally hangs up their IDE.
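
For example, a minimal sketch of the `target.name` pattern (the model and column names are placeholders):

```sql
select *
from {{ ref('stg_orders') }}
-- In dev, build against only the most recent data; the table and
-- column names here are placeholders for illustration.
{% if target.name == 'dev' %}
where ordered_at >= dateadd('day', -3, current_date)
{% endif %}
```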

Historically, dbt Cloud required a separate environment for _Development_, but was otherwise unopinionated in how you configured your account. This mostly just worked – as long as you didn't have anything more complex than a CI job mixed in with a couple of production jobs – because important constructs like deferral in CI and documentation were only ever tied to a single job.

But as companies' dbt deployments have grown more complex, it doesn't make sense to assume that a single job is enough anymore. We need to exchange a job-oriented strategy for a more mature and scalable environment-centric view of the world. To support this, a recent change in dbt Cloud enables project administrators to [mark one of their environments as the Production environment](/docs/deploy/deploy-environments#set-as-production-environment-beta), just as has long been possible for the Development environment.

Explicitly separating your Production workloads lets dbt Cloud be smarter with the metadata it creates, and is particularly important for two new features: dbt Explorer and the revised CI workflows.

<!-- truncate -->

## Make sure dbt Explorer always has the freshest information available

**The old way**: Your dbt docs site was based on a single job's run.

**The new way**: dbt Explorer uses metadata from across every invocation in a defined Production environment to build the richest and most up-to-date understanding of your project.

Because dbt docs could only be updated by a single predetermined job, users who needed their documentation to immediately reflect changes deployed throughout the day (regardless of which job executed them) would find themselves forced to run a dedicated job which did nothing other than run `dbt docs generate` on a regular schedule.

The Discovery API that powers dbt Explorer ingests all metadata generated by any dbt invocation, which means it can always be up to date with the applied state of your project. However, it doesn't make sense for dbt Explorer to show docs based on a PR that hasn't been merged yet.

To avoid this conflation, you need to mark an environment as the Production environment. All runs completed in _that_ environment will contribute to dbt Explorer's view of your project, while others will be excluded. (Future versions of Explorer will support environment selection, so that you can preview your documentation changes as well.)

## Run Slimmer CI than ever with environment-level deferral

**The old way**: [Slim CI](/guides/set-up-ci?step=2) deferred to a single job, and would only detect changes as of that job's last build time.

**The new way**: Changes are detected regardless of the job they were deployed in, removing false positives and overbuilding of models in CI.

Just like with dbt docs, relying on a single job to define your state for comparison purposes forces a choice between unnecessarily rebuilding models that were deployed by another job and creating a dedicated job that runs `dbt compile` on repeat to keep on top of all changes.

With the environment as the arbiter of state, any change made to your Production deployment is immediately taken into consideration by subsequent Slim CI runs.
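
In dbt Core terms, a Slim CI run is roughly the following (a sketch; the artifact path is a placeholder, and dbt Cloud manages the state artifacts for you):

```bash
# Build only models changed relative to production state, deferring
# unchanged upstream models to their production relations.
dbt build --select state:modified+ --defer --state ./prod-run-artifacts
```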

## The easiest way to break apart your jobs {#how}

<Lightbox src="/img/blog/2023-11-06-differentiate-prod-and-staging-environments/data-landscape.png" alt="A chart showing the interplay of Data Warehouse, git repo and dbt Cloud project across Dev, CI and Prod environments." title="Your organization's data landscape should separate Dev, CI and Prod environments. To achieve this, configure your data warehouse, git repo and dbt Cloud account as shown above." width="100%"/>

For most projects, changing from a job-centric to an environment-centric approach to metadata is straightforward and immediately pays dividends, as described above. Assuming that your Staging/CI and Production jobs are currently intermingled, you can extricate them as follows:

1. Create a new dbt Cloud environment called Staging
2. For each job that belongs to the Staging environment, edit the job and update its environment
3. Tick the ["Mark as Production environment" box](/docs/deploy/deploy-environments#set-as-production-environment-beta) in your original environment's settings

## Conclusion

Until very recently, I only thought of Environments in dbt Cloud as a way to use different authentication credentials in different contexts. And until very recently, I was mostly right.

Not anymore. The metadata dbt creates is critical for effective data teams – whether you're concerned about cost savings, discoverability, increased development speed or reliable results across your organization – but is only fully effective if it's segmented by the environment that created it.

Take a few minutes to clean up your environments; it'll make all the difference.
2 changes: 2 additions & 0 deletions website/blog/categories.yml
@@ -19,3 +19,5 @@
display_title: SQL magic
description: Stories of dbt developers making SQL sing across warehouses.
is_featured: true
- name: dbt Cloud
description: Using dbt Cloud to build for scale
2 changes: 1 addition & 1 deletion website/docs/best-practices/best-practice-workflows.md
@@ -24,7 +24,7 @@ SQL styles, field naming conventions, and other rules for your dbt project should

:::info Our style guide

We've made our [style guide](https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md) public – it can act as a good starting point for your own style guide.
We've made our [style guide](/best-practices/how-we-style/0-how-we-style-our-dbt-projects) public – it can act as a good starting point for your own style guide.

:::

