Commit
Merge branch 'current' into mwong-config-descriptions
Showing 31 changed files with 1,479 additions and 84 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
--- | ||
|
||
title: Why you should specify a production environment in dbt Cloud | ||
description: "The bottom line: You should split your Environments in dbt Cloud based on their purposes (e.g. Production and Staging/CI) and mark one environment as Production. This will improve your CI experience and enable you to use dbt Explorer." | ||
slug: specify-prod-environment | ||
|
||
authors: [joel_labes] | ||
|
||
tags: [dbt Cloud] | ||
hide_table_of_contents: false | ||
|
||
date: 2023-11-14 | ||
is_featured: false | ||
|
||
--- | ||
|
||
:::tip The Bottom Line: | ||
You should [split your Jobs](#how) across Environments in dbt Cloud based on their purposes (e.g. Production and Staging/CI) and set one environment as Production. This will improve your CI experience and enable you to use dbt Explorer. | ||
::: | ||
|
||
[Environmental segmentation](/docs/environments-in-dbt) has always been an important part of the analytics engineering workflow: | ||
|
||
- When developing new models you can [process a smaller subset of your data](/reference/dbt-jinja-functions/target#use-targetname-to-limit-data-in-dev) by using `target.name` or an environment variable. | ||
- By building your production-grade models into [a different schema and database](https://docs.getdbt.com/docs/build/custom-schemas#managing-environments), you can experiment in peace without being worried that your changes will accidentally impact downstream users. | ||
- Using dedicated credentials for production runs, instead of an analytics engineer's individual dev credentials, ensures that things don't break when that long-tenured employee finally hangs up their IDE. | ||
|
||
Historically, dbt Cloud required a separate environment for _Development_, but was otherwise unopinionated in how you configured your account. This mostly just worked – as long as you didn't have anything more complex than a CI job mixed in with a couple of production jobs – because important constructs like deferral in CI and documentation were only ever tied to a single job. | ||
|
||
But as companies' dbt deployments have grown more complex, it doesn't make sense to assume that a single job is enough anymore. We need to exchange a job-oriented strategy for a more mature and scalable environment-centric view of the world. To support this, a recent change in dbt Cloud enables project administrators to [mark one of their environments as the Production environment](/docs/deploy/deploy-environments#set-as-production-environment-beta), just as has long been possible for the Development environment. | ||
|
||
Explicitly separating your Production workloads lets dbt Cloud be smarter with the metadata it creates, and is particularly important for two new features: dbt Explorer and the revised CI workflows. | ||
|
||
<!-- truncate --> | ||
|
||
## Make sure dbt Explorer always has the freshest information available | ||
|
||
**The old way**: Your dbt docs site was based on a single job's run. | ||
|
||
**The new way**: dbt Explorer uses metadata from across every invocation in a defined Production environment to build the richest and most up-to-date understanding of your project. | ||
|
||
Because dbt docs could only be updated by a single predetermined job, users who needed their documentation to immediately reflect changes deployed throughout the day (regardless of which job executed them) would find themselves forced to run a dedicated job which did nothing other than run `dbt docs generate` on a regular schedule. | ||
|
||
The Discovery API that powers dbt Explorer ingests all metadata generated by any dbt invocation, which means that it can always be up to date with the applied state of your project. However, it doesn't make sense for dbt Explorer to show docs based on a PR that hasn't been merged yet. | ||
|
||
To avoid this conflation, you need to mark an environment as the Production environment. All runs completed in _that_ environment will contribute to dbt Explorer's view of your project, while others will be excluded. (Future versions of Explorer will support environment selection, so that you can preview your documentation changes as well.) | ||
|
||
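To make the distinction concrete, here is a minimal Python sketch of the idea. It is an illustration only, not how dbt Cloud or the Discovery API is implemented: the `Run` record, its fields, the environment names and the `latest_production_metadata` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One dbt invocation (hypothetical shape, for illustration only)."""
    job: str
    environment: str
    finished_at: int          # any increasing timestamp
    model_descriptions: dict  # model name -> description seen in that run


def latest_production_metadata(runs, production_env):
    """Merge metadata from every run in the Production environment.

    Later runs win per model, whichever job executed them. Runs from
    other environments (such as CI builds of unmerged PRs) are ignored.
    """
    merged = {}
    for run in sorted(runs, key=lambda r: r.finished_at):
        if run.environment != production_env:
            continue
        merged.update(run.model_descriptions)
    return merged


runs = [
    Run("nightly", "Production", 1, {"orders": "All orders", "customers": "All customers"}),
    Run("hourly", "Production", 2, {"orders": "All orders, refreshed hourly"}),
    Run("ci", "Staging", 3, {"orders": "Unmerged PR description"}),
]

explorer_view = latest_production_metadata(runs, production_env="Production")
```

In this toy example the hourly job's newer description of `orders` is picked up without a dedicated docs job, and the CI run's unmerged description never appears.
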
## Run Slimmer CI than ever with environment-level deferral | ||
|
||
**The old way**: [Slim CI](/guides/set-up-ci?step=2) deferred to a single job, and would only detect changes as of that job's last build time. | ||
|
||
**The new way**: Changes are detected regardless of the job they were deployed in, removing false positives and overbuilding of models in CI. | ||
|
||
Just as with dbt docs, relying on a single job to define your state for comparison purposes leads to a choice between unnecessarily rebuilding models which were deployed by another job, or creating a dedicated job that runs `dbt compile` on repeat to keep on top of all changes. | ||
|
||
By using the environment as the arbiter of state, any time a change is made to your Production deployment it will immediately be taken into consideration by subsequent Slim CI runs. | ||
|
||
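The difference between job-level and environment-level state can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not dbt's actual state comparison: the checksum dictionaries, job names and both helper functions are hypothetical.

```python
def modified_models(pr_checksums, deployed_checksums):
    """Return models whose checksum in the PR differs from the deployed state."""
    return sorted(
        name
        for name, checksum in pr_checksums.items()
        if deployed_checksums.get(name) != checksum
    )


def environment_state(job_states):
    """Combine per-job states, oldest first, so the most recent deployment wins."""
    combined = {}
    for state in job_states:
        combined.update(state)
    return combined


# Checksums as of each production job's last run (oldest run first).
nightly_job = {"orders": "v1", "customers": "v1"}
hourly_job = {"orders": "v2"}  # a newer `orders` was already deployed by this job

# The PR under test only changes `customers`; its `orders` matches what is deployed.
pr = {"orders": "v2", "customers": "v2"}

job_level = modified_models(pr, nightly_job)
env_level = modified_models(pr, environment_state([nightly_job, hourly_job]))
```

Deferring to the nightly job alone flags `orders` as modified even though the hourly job has already deployed it, so CI rebuilds it needlessly. Comparing against the whole environment leaves only `customers` to build.
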
## The easiest way to break apart your jobs {#how} | ||
|
||
<Lightbox src="/img/blog/2023-11-06-differentiate-prod-and-staging-environments/data-landscape.png" alt="A chart showing the interplay of Data Warehouse, git repo and dbt Cloud project across Dev, CI and Prod environments." title="Your organization's data landscape should separate Dev, CI and Prod environments. To achieve this, configure your data warehouse, git repo and dbt Cloud account as shown above." width="100%"/> | ||
|
||
For most projects, changing from a job-centric to environment-centric approach to metadata is straightforward and immediately pays dividends as described above. Assuming that your Staging/CI and Production jobs are currently intermingled, you can extricate them as follows: | ||
|
||
1. Create a new dbt Cloud environment called Staging | ||
2. For each job that belongs to the Staging environment, edit the job and update its environment | ||
3. Tick the ["Mark as Production environment" box](/docs/deploy/deploy-environments#set-as-production-environment-beta) in your original environment's settings | ||
|
||
## Conclusion | ||
|
||
Until very recently, I only thought of Environments in dbt Cloud as a way to use different authentication credentials in different contexts. And until very recently, I was mostly right. | ||
|
||
Not anymore. The metadata dbt creates is critical for effective data teams – whether you're concerned about cost savings, discoverability, increased development speed or reliable results across your organization – but is only fully effective if it's segmented by the environment that created it. | ||
|
||
Take a few minutes to clean up your environments - it'll make all the difference. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
--- | ||
id: alison-stanton | ||
title: Alison Stanton | ||
description: | | ||
I started programming 20+ years ago. I moved from web applications into transforming data and business intelligence reporting because it's both hard and useful. The majority of my career has been engineering for SaaS companies. For my last few positions I've been brought in to transition larger, older companies to a modern data platform and ways of thinking. | ||
I am dbt Certified. I attend Coalesce and other dbt events virtually. I speak up in <a href="https://www.getdbt.com/community/join-the-community" rel="noopener noreferrer" target="_blank">dbt Slack</a> and on the dbt-core, dbt-redshift, and dbt-sqlserver repositories. dbt Slack is my happy place, especially #advice-for-dbt-power-users. I care a lot about the dbt documentation and dbt doc. | ||
image: /img/community/spotlight/alison.jpg | ||
pronouns: she/her | ||
location: Chicago, IL, USA | ||
jobTitle: AVP, Analytics Engineering Lead | ||
organization: Advocates for SOGIE Data Collection | ||
socialLinks: | ||
- name: LinkedIn | ||
link: https://www.linkedin.com/in/alisonstanton/ | ||
- name: Github | ||
link: https://github.com/alison985/ | ||
dateCreated: 2023-11-07 | ||
hide_table_of_contents: true | ||
--- | ||
|
||
## When did you join the dbt community and in what way has it impacted your career? | ||
|
||
I joined the dbt community when I joined an employer in mid-2020. To summarize the important things that dbt has given me: it allowed me to focus on the next set of data challenges instead of staying in toil. Data folks joke that we're plumbers, but we're digital plumbers and that distinction should enable us to be DRY. That means not only writing DRY code like dbt allows, but also having tooling automation to DRY up repetitive tasks like dbt provides. | ||
|
||
dbt's existence flipped the experience of data testing on its head for me. I went from a) years of instigating tech discussions on how to systematize data quality checks and b) building my own SQL tests and design patterns, to having built-in mechanisms for data testing. | ||
|
||
dbt and the dbt community materials are assets I can use in order to provide validation for things I have said, do say, and will say about data. Having outside voices to point to when requesting investment in data up-front - to avoid problems later - is an under-appreciated tool for data leaders' toolboxes. | ||
|
||
dbt's community has given me access to both a) high-quality, seasoned SMEs in my field to learn from and b) newer folks I can help. Both are gifts that I cherish. | ||
|
||
## What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? | ||
|
||
Who I want to be when I grow up: | ||
|
||
MJ, who was the first person to ever say "data build tool" to me. If I'd listened to her then I could have been part of the dbt community years sooner. | ||
|
||
Christine Dixon who presented <a href="https://www.youtube.com/watch?v=vD6IrGtxNAM" rel="noopener noreferrer" target="_blank">"Could You Defend Your Data in Court?"</a> at Coalesce 2023. In your entire data career, that is the most important piece of education you'll get. | ||
|
||
The dbt community team in general. Hands-down the most important work they do is the dbt Slack community, which gives me and others the accessibility we need to participate. Gwen Windflower (Winnie) for her extraordinary ability to bridge technical nuance with business needs on-the-fly. Dave Connors for being the first voice for "a node is a node is a node". Joel Labes for creating the ability to emoji-react with :sparkles: to post to the #best-of-slack channel. And so on. The decision to foster a space for data instead of just for their product because that enhances their product. The extremely impressive ability to maintain a problem-solving-is-cool, participate-as-you-can, chorus-of-voices, international, not-only-cis-men, and we're-all-in-this-together community. | ||
|
||
Other (all?) dbt Labs employees who engage with the community, instead of having a false separation with it - like most software companies. Welcoming feedback, listening to it, and actioning or filtering it out (ex. Mirna Wong, account reps). Thinking holistically about the ecosystem, not just one feature at a time (ex. Anders). Responsiveness and ability to translate diverse items into technical clarity and focused actions (ex. Doug Beatty, the dbt support team). I've been in software and open source and online communities for a long time - these are rare things we should not take for granted. | ||
|
||
Josh Devlin for prolificness that demonstrates expertise and dedication to helping. | ||
|
||
The maintainers of dbt packages like dbt-utils, dbt-expectations, dbt-date, etc. | ||
|
||
Everyone who gets over their fear to ask a question, propose an answer that may not work, or otherwise take a risk by sharing their voice. | ||
|
||
I hope I can support my employer and my professional development and my dbt community through the following: | ||
- Elevate dbt understanding of and support for Enterprise-size company use cases through dialogue, requests, and examples. | ||
- Emphasize rigor with defensive coding and comprehensive testing practices. | ||
- Improve the onboarding and up-skilling of dbt engineers through feedback and edits on <a href="/">docs.getdbt.com</a>. | ||
- Contribute to the maintenance of a collaborative and helpful dbt community as the number of dbt practitioners reaches various growth stages and tipping points. | ||
- Engage in dialogue. Provide feedback. Champion developer experience as a priority. Be a good open source citizen on GitHub. | ||
|
||
## What have you learned from community members? What do you hope others can learn from you? | ||
|
||
I have learned: | ||
|
||
Details on DAG sequencing. | ||
How to make an engineering proposal a community conversation. | ||
The <a href="https://www.getdbt.com/product/semantic-layer" rel="noopener noreferrer" target="_blank">dbt semantic layer</a>. | ||
So many things that are now so engrained in me that I can't remember not knowing them. | ||
|
||
I can teach and share about: | ||
|
||
Naming new concepts and how to choose those names. | ||
Reproducibility, reconciliation, and audits. | ||
Data ethics. | ||
Demographic questions for sexual orientation and/or gender identity on a form. I'm happy to be your shortcut to the most complicated data and most-engrained tech debt in history. | ||
I also geek out talking about: reusing functionality in creative ways, balancing trade-offs in data schema modeling, dealing with all of an organization's data holistically, tracking instrumentation, and philosophy on prioritization. | ||
|
||
The next things on my agenda to learn about: | ||
|
||
Successes and failures in data literacy work. The best I've found so far is 1:1 interactions and that doesn't scale. | ||
How to reduce the amount of time running dbt test takes while maintaining coverage. | ||
Data ethics. | ||
The things you think are most important by giving them a :sparkles: emoji reaction in Slack. | ||
|
||
## Anything else interesting you want to tell us? | ||
|
||
My gratitude to each community member for this community. |