
Commit

Merge branch 'current' into mwong-clarify-use-case
mirnawong1 authored Nov 27, 2023
2 parents 2d24dd2 + f0f002b commit 7348fbf
Showing 1,190 changed files with 734 additions and 2,536 deletions.
8 changes: 8 additions & 0 deletions .github/workflows/lint.yml
@@ -12,8 +12,16 @@ jobs:
uses: actions/setup-node@v3
with:
node-version: '18.12.0'

+- name: Cache Node Modules
+uses: actions/cache@v3
+id: cache-node-mods
+with:
+path: website/node_modules
+key: node-modules-cache-v3-${{ hashFiles('**/package.json', '**/package-lock.json') }}

- name: Install Packages
+if: steps.cache-node-mods.outputs.cache-hit != 'true'
run: cd website && npm ci

- name: Run ESLint
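The caching step added to the workflow above keys the cache on the package manifests, so `node_modules` is reused until dependencies change. A rough local sketch of how such a key behaves (an approximation for illustration, not GitHub's exact `hashFiles` algorithm; use `shasum -a 256` in place of `sha256sum` on macOS):

```bash
#!/bin/sh
# Approximate the cache key: hash the package manifest so the key (and thus
# the cached node_modules) changes whenever dependencies change.
set -eu
tmp=$(mktemp -d)

printf '{"name":"website"}\n' > "$tmp/package.json"
key1="node-modules-cache-v3-$(sha256sum "$tmp/package.json" | cut -d' ' -f1)"

# Add a dependency: the manifest's hash — and therefore the key — changes.
printf '{"name":"website","dependencies":{"react":"^18"}}\n' > "$tmp/package.json"
key2="node-modules-cache-v3-$(sha256sum "$tmp/package.json" | cut -d' ' -f1)"

echo "key1=$key1"
echo "key2=$key2"
[ "$key1" != "$key2" ] && echo "manifest change busts the cache"
```

This is why the `Install Packages` step can be skipped on a cache hit: an identical key implies identical manifests.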
6 changes: 0 additions & 6 deletions package-lock.json

This file was deleted.

2 changes: 1 addition & 1 deletion website/blog/2021-11-29-open-source-community-growth.md
@@ -57,7 +57,7 @@ For starters, I want to know how much conversation is occurring across the vario

There are a ton of metrics that can be tracked in any GitHub project — committers, pull requests, forks, releases — but I started pretty simple. For each of the projects we participate in, I just want to know how the number of GitHub stars grows over time, and whether the growth is accelerating or flattening out. This has become a key performance indicator for open source communities, for better or for worse, and keeping track of it isn't optional.

-Finally, I want to know how much Marquez and OpenLineage are being used. It used to be that when you wanted to consume a bit of tech, you'd download a file. Folks like me who study user behavior would track download counts as if they were stock prices. This is no longer the case; today, our tech is increasingly distributed through package managers and image repositories. Docker Hub and PyPI metrics have therefore become good indicators of consumption. Docker image pulls and runs of `pip install` are the modern day download and, as noisy as these metrics are, they indicate a similar level of user commitment.
+Finally, I want to know how much Marquez and OpenLineage are being used. It used to be that when you wanted to consume a bit of tech, you'd download a file. Folks like me who study user behavior would track download counts as if they were stock prices. This is no longer the case; today, our tech is increasingly distributed through package managers and image repositories. Docker Hub and PyPI metrics have therefore become good indicators of consumption. Docker image pulls and runs of `python -m pip install` are the modern day download and, as noisy as these metrics are, they indicate a similar level of user commitment.

To summarize, here are the metrics I decided to track (for now, anyway):
- Slack messages (by user/ by community)
4 changes: 2 additions & 2 deletions website/blog/2022-04-14-add-ci-cd-to-bitbucket.md
@@ -159,7 +159,7 @@ pipelines:
artifacts: # Save the dbt run artifacts for the next step (upload)
- target/*.json
script:
-- pip install -r requirements.txt
+- python -m pip install -r requirements.txt
- mkdir ~/.dbt
- cp .ci/profiles.yml ~/.dbt/profiles.yml
- dbt deps
@@ -208,7 +208,7 @@ pipelines:
# Set up dbt environment + dbt packages. Rather than passing
# profiles.yml to dbt commands explicitly, we'll store it where dbt
# expects it:
-- pip install -r requirements.txt
+- python -m pip install -r requirements.txt
- mkdir ~/.dbt
- cp .ci/profiles.yml ~/.dbt/profiles.yml
- dbt deps
@@ -59,7 +59,7 @@ You probably agree that the latter example is definitely more elegant and easier

In addition to CLI commands that interact with a single dbt Cloud API endpoint there are composite helper commands that call one or more API endpoints and perform more complex operations. One example of composite commands are `dbt-cloud job export` and `dbt-cloud job import` where, under the hood, the export command performs a `dbt-cloud job get` and writes the job metadata to a <Term id="json" /> file and the import command reads job parameters from a JSON file and calls `dbt-cloud job create`. The export and import commands can be used in tandem to move dbt Cloud jobs between projects. Another example is the `dbt-cloud job delete-all` which fetches a list of all jobs using `dbt-cloud job list` and then iterates over the list prompting the user if they want to delete the job. For each job that the user agrees to delete a `dbt-cloud job delete` is performed.
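As an illustrative sketch of that export/import round trip (the flag names and the job ID here are assumptions — consult `dbt-cloud job export --help` for the real interface; the snippet exits quietly when the CLI isn't installed):

```bash
#!/bin/sh
# Hypothetical round trip: move a dbt Cloud job between projects by exporting
# its metadata to JSON and importing it elsewhere. Job ID is made up.
command -v dbt-cloud >/dev/null 2>&1 || { echo "dbt-cloud not installed; skipping"; exit 0; }

dbt-cloud job export --job-id 43167 > job.json  # under the hood: `dbt-cloud job get`
dbt-cloud job import < job.json                 # under the hood: `dbt-cloud job create`
```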

-To install the CLI in your Python environment run `pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or e.g. in a GitHub actions workflow.
+To install the CLI in your Python environment run `python -m pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or e.g. in a GitHub actions workflow.

## How the project came to be

@@ -310,7 +310,7 @@ The `CatalogExploreCommand.execute` method implements the interactive exploratio
I’ve included the app in the latest version of dbt-cloud-cli so you can test it out yourself! To use the app you need install dbt-cloud-cli with extra dependencies:

```bash
-pip install dbt-cloud-cli[demo]
+python -m pip install dbt-cloud-cli[demo]
```

Now you can run the app:
@@ -79,12 +79,12 @@ Depending on which database you’ve chosen, install the relevant database adapt

```text
# install adaptor for duckdb
-pip install dbt-duckdb
+python -m pip install dbt-duckdb
# OR
# install adaptor for postgresql
-pip install dbt-postgres
+python -m pip install dbt-postgres
```

### Step 4: Setup dbt profile
2 changes: 1 addition & 1 deletion website/docs/best-practices/best-practice-workflows.md
@@ -24,7 +24,7 @@ SQL styles, field naming conventions, and other rules for your dbt project shoul

:::info Our style guide

-We've made our [style guide](https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md) public – these can act as a good starting point for your own style guide.
+We've made our [style guide](/best-practices/how-we-style/0-how-we-style-our-dbt-projects) public – these can act as a good starting point for your own style guide.

:::

@@ -23,8 +23,8 @@ We'll use pip to install MetricFlow and our dbt adapter:
python -m venv [virtual environment name]
source [virtual environment name]/bin/activate
# install dbt and MetricFlow
-pip install "dbt-metricflow[adapter name]"
-# e.g. pip install "dbt-metricflow[snowflake]"
+python -m pip install "dbt-metricflow[adapter name]"
+# e.g. python -m pip install "dbt-metricflow[snowflake]"
```

Lastly, to get to the pre-Semantic Layer starting state, checkout the `start-here` branch.
60 changes: 34 additions & 26 deletions website/docs/community/spotlight/alison-stanton.md
@@ -2,7 +2,7 @@
id: alison-stanton
title: Alison Stanton
description: |
-I started programming 20+ years ago. I moved from web applications into transforming data and business intelligence reporting because it's both hard and useful. The majority of my career has been engineering for SaaS companies. For my last few positions I've been brought in to transition larger, older companies to a modern data platform and ways of thinking.
+I started programming 20+ years ago. I moved from web applications into transforming data and business intelligence reporting because it's both hard and useful. The majority of my career has been in engineering for SaaS companies. For my last few positions I've been brought in to transition larger, older companies to a modern data platform and ways of thinking.
I am dbt Certified. I attend Coalesce and other dbt events virtually. I speak up in <a href="https://www.getdbt.com/community/join-the-community" rel="noopener noreferrer" target="_blank">dbt Slack</a> and on the dbt-core, dbt-redshift, and dbt-sqlserver repositories. dbt Slack is my happy place, especially #advice-for-dbt-power-users. I care a lot about the dbt documentation and dbt doc.
image: /img/community/spotlight/alison.jpg
@@ -23,7 +23,7 @@ hide_table_of_contents: true

I joined the dbt community when I joined an employer in mid-2020. To summarize the important things that dbt has given me: it allowed me to focus on the next set of data challenges instead of staying in toil. Data folks joke that we're plumbers, but we're digital plumbers and that distinction should enable us to be DRY. That means not only writing DRY code like dbt allows, but also having tooling automation to DRY up repetitive tasks like dbt provides.

-dbt's existence flipped the experience of data testing on it's head for me. I went from a)years of instigating tech discussions on how to systematize data quality checks and b) building my own SQL tests and design patterns, to having built-in mechanisms for data testing.
+dbt's existence flipped the experience of data testing on its head for me. I went from a)years of instigating tech discussions on how to systematize data quality checks and b) building my own SQL tests and design patterns, to having built-in mechanisms for data testing.

dbt and the dbt community materials are assets I can use in order to provide validation for things I have, do, and will say about data. Having outside voices to point to when requesting investment in data up-front - to avoid problems later - is an under-appreciated tool for data leader's toolboxes.

@@ -33,49 +33,57 @@ dbt's community has given me access to both a) high-quality, seasoned SMEs in my

I want to be when I grow up:

-MJ, who was the first person to ever say "data build tool" to me. If I'd listened to her then I could have been part of the dbt community years sooner.
+- MJ, who was the first person to ever say "data build tool" to me. If I'd listened to her then I could have been part of the dbt community years sooner.

-Christine Dixon who presented <a href="https://www.youtube.com/watch?v=vD6IrGtxNAM" rel="noopener noreferrer" target="_blank">"Could You Defend Your Data in Court?"</a> at Coalesce 2023. In your entire data career, that is the most important piece of education you'll get.
+- Christine Dixon who presented <a href="https://www.youtube.com/watch?v=vD6IrGtxNAM" rel="noopener noreferrer" target="_blank">"Could You Defend Your Data in Court?"</a> at Coalesce 2023. In your entire data career, that is the most important piece of education you'll get.

-The dbt community team in general. Hands-down the most important work they do is the dbt Slack community, which gives me and others the accessibility we need to participate. Gwen Windflower (Winnie) for her extraordinary ability to bridge technical nuance with business needs on-the-fly. Dave Connors for being the first voice for "a node is a node is a node". Joel Labes for creating the ability to emoji-react with :sparkles: to post to the #best-of-slack channel. And so on. The decision to foster a space for data instead of just for their product because that enhances their product. The extremely impressive ability to maintain a problem-solving-is-cool, participate-as-you-can, chorus-of-voices, international, not-only-cis-men, and we're-all-in-this-together community.
+- The dbt community team in general. Hands-down the most important work they do is the dbt Slack community, which gives me and others the accessibility we need to participate. Gwen Windflower (Winnie) for her extraordinary ability to bridge technical nuance with business needs on-the-fly. Dave Connors for being the first voice for "a node is a node is a node". Joel Labes for creating the ability to emoji-react with :sparkles: to post to the #best-of-slack channel. And so on. The decision to foster a space for data instead of just for their product because that enhances their product. The extremely impressive ability to maintain a problem-solving-is-cool, participate-as-you-can, chorus-of-voices, international, not-only-cis-men, and we're-all-in-this-together community.

-Other (all?) dbt labs employees who engage with the community, instead of having a false separation with it - like most software companies. Welcoming feedback, listening to it, and actioning or filtering it out (ex. Mirna Wong, account reps). Thinking holistically about the eco-system not just one feature at a time (ex. Anders). Responsiveness and ability to translate diverse items into technical clarity and focused actions (ex. Doug Beatty, the dbt support team). I've been in software and open source and online communities for a long time - these are rare things we should not take for granted.
+- Other (all?) dbt labs employees who engage with the community, instead of having a false separation with it &mdash; like most software companies. Welcoming feedback, listening to it, and actioning or filtering it out (ex. Mirna Wong, account reps). Thinking holistically about the eco-system, not just one feature at a time (ex. Anders). Responsiveness and ability to translate diverse items into technical clarity and focused actions (ex. Doug Beatty, the dbt support team). I've been in software and open source and online communities for a long time - these are rare things we should not take for granted.

-Josh Devlin for prolificness that demonstrates expertise and dedication to helping.
+- Josh Devlin for prolificness that demonstrates expertise and dedication to helping.

-The maintainers of dbt packages like dbt-utils, dbt-expectations, dbt-date, etc.
+- The maintainers of dbt packages like dbt-utils, dbt-expectations, dbt-date, etc.

-Everyone who gets over their fear to ask a question, propose an answer that may not work, or otherwise take a risk by sharing their voice.
+- Everyone who gets over their fear to ask a question, propose an answer that may not work, or otherwise take a risk by sharing their voice.

-I hope I can support my employer and my professional development and my dbt community through the following:
--Elevate dbt understanding of and support for Enterprise-size company use cases through dialogue, requests, and examples.
--Emphasize rigor with defensive coding and comprehensive testing practices.
--Improve the onboarding and up-skilling of dbt engineers through feedback and edits on <a href="/">docs.getdbt.com</a>.
--Contribute to the maintenance of a collaborative and helpful dbt community as the number of dbt practitioners reaches various growth stages and tipping points.
--Engage in dialogue. Providing feedback. Champion developer experience as a priority. Be a good open source citizen on Github.
+I hope I can support my employer my professional development and my dbt community through the following:
+
+- Elevate dbt understanding of and support for Enterprise-size company use cases through dialogue, requests, and examples.
+- Emphasize rigor with defensive coding and comprehensive testing practices.
+- Improve the onboarding and up-skilling of dbt engineers through feedback and edits on <a href="/">docs.getdbt.com</a>.
+- Contribute to the maintenance of a collaborative and helpful dbt community as the number of dbt practitioners reaches various growth stages and tipping points.
+- Engage in dialogue. Providing feedback. Champion developer experience as a priority. Be a good open-source citizen on GitHub.

## What have you learned from community members? What do you hope others can learn from you?

I have learned:

-Details on DAG sequencing.
-How to make an engineering proposal a community conversation.
-The <a href="https://www.getdbt.com/product/semantic-layer" rel="noopener noreferrer" target="_blank">dbt semantic layer</a>
-.
+- Details on DAG sequencing.
+- How to make an engineering proposal a community conversation.
+- The <a href="https://www.getdbt.com/product/semantic-layer" rel="noopener noreferrer" target="_blank">dbt semantic layer</a>

So many things that are now so engrained in me that I can't remember not knowing them.

I can teach and share about:

-Naming new concepts and how to choose those names.
-Reproducibility, reconciliation, and audits.
-Data ethics.
-Demographic questions for sexual orientation and/or gender identity on a form. I'm happy to be your shortcut to the most complicated data and most-engrained tech debt in history.
-I also geek out talking about: reusing functionality in creative ways, balancing trade-offs in data schema modeling, dealing with all of an organization's data holistically, tracking instrumentation, and philosophy on prioritization.
+- Naming new concepts and how to choose those names.
+- Reproducibility, reconciliation, and audits.
+- Data ethics.
+- Demographic questions for sexual orientation and/or gender identity on a form. I'm happy to be your shortcut to the most complicated data and most engrained tech debt in history.
+
+I also geek out talking about:
+
+- reusing functionality in creative ways,
+- balancing trade-offs in data schema modeling,
+- dealing with all of an organization's data holistically,
+- tracking instrumentation, and
+- the philosophy on prioritization.

The next things on my agenda to learn about:

-Successes and failures in data literacy work. The best I've found so far is 1:1 interactions and that doesn't scale.
-How to reduce the amount of time running dbt test takes while maintaining coverage.
+- Successes and failures in data literacy work. The best I've found so far is 1:1 interactions and that doesn't scale.
+- How to reduce the amount of time running dbt test takes while maintaining coverage.
Data ethics.
The things you think are most important by giving them a :sparkles: emoji reaction in Slack.

2 changes: 1 addition & 1 deletion website/docs/docs/build/about-metricflow.md
@@ -82,7 +82,7 @@ The following example data is based on the Jaffle Shop repo. You can view the co
To make this more concrete, consider the metric `order_total`, which is defined using the SQL expression:

`select sum(order_total) as order_total from orders`
-This expression calculates the revenue from each order by summing the order_total column in the orders table. In a business setting, the metric order_total is often calculated according to different categories, such as"
+This expression calculates the total revenue for all orders by summing the order_total column in the orders table. In a business setting, the metric order_total is often calculated according to different categories, such as"
- Time, for example `date_trunc(ordered_at, 'day')`
- Order Type, using `is_food_order` dimension from the `orders` table.
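As a toy illustration of cutting the metric by a time dimension — here using a local SQLite database as a stand-in for the warehouse, with invented data (the snippet skips itself if `sqlite3` isn't available):

```bash
#!/bin/sh
# Toy example: the order_total metric grouped by day.
command -v sqlite3 >/dev/null 2>&1 || { echo "sqlite3 not available; skipping"; exit 0; }

db=$(mktemp)
sqlite3 "$db" "
create table orders (ordered_at text, is_food_order int, order_total real);
insert into orders values
  ('2023-11-01', 1, 10.0),
  ('2023-11-01', 0, 5.0),
  ('2023-11-02', 1, 7.5);
-- the same sum(order_total) expression, now cut by a day dimension
select date(ordered_at) as day, sum(order_total) as order_total
from orders
group by 1
order by 1;
"
```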

2 changes: 2 additions & 0 deletions website/docs/docs/build/environment-variables.md
@@ -103,6 +103,8 @@ dbt Cloud has a number of pre-defined variables built in. The following environm
- `DBT_CLOUD_RUN_ID`: The ID of this particular run
- `DBT_CLOUD_RUN_REASON_CATEGORY`: The "category" of the trigger for this run (one of: `scheduled`, `github_pull_request`, `gitlab_merge_request`, `azure_pull_request`, `other`)
- `DBT_CLOUD_RUN_REASON`: The specific trigger for this run (eg. `Scheduled`, `Kicked off by <email>`, or custom via `API`)
+- `DBT_CLOUD_ENVIRONMENT_ID`: The ID of the environment for this run
+- `DBT_CLOUD_ACCOUNT_ID`: The ID of the dbt Cloud account for this run
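Outside of dbt Cloud these variables are simply unset, so a script that consumes them can fall back to local defaults — a minimal sketch, with illustrative sentinel values:

```bash
#!/bin/sh
# Read dbt Cloud's predefined environment variables, with fallbacks so the
# same script also works in local development where they are unset.
run_id="${DBT_CLOUD_RUN_ID:-local-dev}"
environment_id="${DBT_CLOUD_ENVIRONMENT_ID:-0}"
account_id="${DBT_CLOUD_ACCOUNT_ID:-0}"
echo "run=${run_id} environment=${environment_id} account=${account_id}"
```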

**Git details**

