Revisions & additions to Model Versions #3232

jtcohen6 · 2023-04-20T15:14:48Z

Preview: Collaborate with others > Model governance > Model versions

What are you changing in this pull request and why?

We've written the minimal viable reference docs for this feature. I want to offer some more opinionated guidance & framing, and gesture in the direction of some best practices:

- Don't create a new version for every model change
- Do actually sunset/deprecate your old model versions

This does require a more personal tone, and a sense of future direction, than a lot of other (more-established) documentation. Very open to feedback.

netlify · 2023-04-20T15:14:57Z

✅ Deploy Preview for docs-getdbt-com ready!

Name	Link
🔨 Latest commit	`1a00757`
🔍 Latest deploy log	https://app.netlify.com/sites/docs-getdbt-com/deploys/644900b6a731830008eb5024
😎 Deploy Preview	https://deploy-preview-3232--docs-getdbt-com.netlify.app

MichelleArk · 2023-04-20T15:36:58Z

website/docs/docs/collaborate/govern/model-versions.md

+          value: old
+```
+
+Finally, we intend to add support for **deprecating models** in dbt Core v1.6. When you slate a versioned model for deprecation, dbt will be able to provide more helpful warnings to downstream consumers of that model. Rather than just, "This model is going away," it's - "This older version of the model is going away, and there's a new version coming soon."


Not sure if this is the best spot for it, but it'd be great to raise the tradeoff between cost/clutter (from maintaining multiple versions of a model in a warehouse) and providing consumers with enough time to gracefully migrate off old versions. It could be easy to let old versions of models pile up, but deprecation dates create an explicit bound for how costly a migration should be for a particular model.
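
For illustration only, here is a rough sketch of what such an explicit bound could look like in a model's YAML. The `deprecation_date` key and the date are assumptions: deprecation support is only planned for v1.6 at this point in the thread, so the final property name and placement may differ.

```yml
models:
  - name: dim_customers
    latest_version: 2
    versions:
      - v: 2
      - v: 1
        # Hypothetical property: the date after which v1 is expected to go away
        deprecation_date: 2023-10-01
```

The point of a date like this is that the old version's warehouse cost is bounded, while consumers still get a known window to migrate.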

joellabes

This is wonderful. Nitpicks, pedantry and clarifying questions from me throughout, but I think that vibe-wise this is great!

joellabes · 2023-04-20T23:57:42Z

website/docs/docs/collaborate/govern/model-versions.md

-In the meantime, anywhere that model is used downstream, it can be referenced at a specific version.
+In the meantime, anywhere that model is used downstream, it can continue to be referenced at a specific version.
+
+In the future, we intend to also add support for **deprecating models**. Taken together, model versions and deprecation offer a pathway for _sunsetting_ and _migrating_. In the short term, avoid breaking everyone's queries. Over the longer term, older & unmaintained versions go away—they do **not** stick around forever.


This feels a lil fuzzy right now.

In the short term, avoid breaking everyone's queries

Who is the subject here?

Are we (dbt Labs) avoiding breaking users' queries by shipping versioning first, while in the long term making it possible for old versions to go away?

Or is this an imperative to the reader: once deprecation ships, you should use versions to avoid breaking your own queries in the short term, and use the deprecation window to eventually get rid of unmaintained versions?

joellabes · 2023-04-21T00:04:29Z

website/docs/docs/collaborate/govern/model-versions.md

+
+It's also possible to change the model in more subtle ways — by recalculating a column in a way that doesn't change its name, data type, or enforceable characteristics—but would substantially change the results seen by downstream queriers.
+
+The process of sunsetting and migrating model versions requires real work, and may require significant coordination across teams. If, instead of using model versions, you opt for non-breaking changes wherever possible—that's a completely legitimate approach. Even so, after a while, you'll find yourself with lots of unused or deprecated columns. Many teams will want to consider a predictable cadence (once or twice a year) for bumping the version of their mature models, and taking the opportunity to remove no-longer-used columns.


that's a completely legitimate approach

if anything, I reckon that that's underselling it. You should make non-breaking changes as much as possible, and if you have to make breaking changes to a model, you should try to bunch them all together instead of dribbling out bad news over time.

An interesting thing I just found while looking at breaking changes best practices: Our friends at HubSpot have a stated policy for deprecating tables, because they offer Snowflake Data Shares: https://developers.hubspot.com/docs/breaking-change-definition#snowflake-data-share

That's a great find! We're taking exactly the same approach, in terms of what we're considering breaking versus non-breaking, and in recommending a clear migration window:

In this case, HubSpot will add a new table with the same data to the share so that you can begin using the new name. The old table will continue to exist until the end of the 90 day notice period.

joellabes · 2023-04-21T00:13:52Z

website/docs/docs/collaborate/govern/model-versions.md

+
+You've always been able to create a new model, and name it `dim_customers_v2`. Why should you opt for a "real" versioned model instead?
+
+First, the versioned model preserves its _reference name_. Versioned models are `ref`'d by their _model name_, rather than the name of the file that they're defined in. By default, the `ref` resolves to the latest version (as declared by that model's maintainer), but you can also `ref` a specific version of the model, with a `version` keyword.


"reference name" is a new concept to me here. Is this the same as "model name"? I think it is from context, but if so then I don't think we want to introduce a brand new term as a one-off

I think this needs a more worked example with examples and how the different elements combine:

    models:
      - name: dim_customers
        latest_version: 2
        versions:
          - v: 3
            defined_in: dim_customers_NOT_READY_YET.sql
            ...
          - v: 2
            alias: dim_customers
            ...
          - v: 1
            ...

| v | ref syntax | file name | table name |
| --- | --- | --- | --- |
| 3 | `ref('dim_customers', v=3)` | `dim_customers_NOT_READY_YET.sql` | `analytics.dim_customers_v3` |
| 2 | `ref('dim_customers')` or `ref('dim_customers', v=2)` | `dim_customers_v2.sql` | `analytics.dim_customers` |
| 1 | `ref('dim_customers', v=1)` | `dim_customers_v1.sql` | `analytics.dim_customers_v1` |

Does latest version of a model get to look for the _vX-less sql file? I don't like that the grid here forces new sql files for every version, so you don't get a nice git diff

Does latest version of a model get to look for the _vX-less sql file?

That's not the case in the current implementation. Should it be? I think we could do this. (Naive attempt: dbt-labs/dbt-core@b13cd2b)

Even if we don't do this, you could do the same thing as with aliases, and keep moving the defined_in property around, so that dim_customers.sql is always your "latest":

    models:
      - name: dim_customers
        latest_version: 2
        versions:
          - v: 3
            defined_in: dim_customers_NOT_READY_YET.sql
            ...
          - v: 2
            # because this is the latest, it should have the canonical file name + alias
            alias: dim_customers
            defined_in: dim_customers
            ...
          - v: 1
            ...

But if that's our strong recommendation - let's just make it the default behavior

It feels weird/WET to have to move both the alias and defined_in around over time to just get the same name as is already defined in the name key up top.

Edit: here's my actual objection: defined_in feels like it should be a "break glass in case of emergency" property, not something that gets rolled out everywhere. If you're using defined_in, you best have a good reason. Encouraging it to be everywhere cheapens that a bit

joellabes · 2023-04-21T00:14:11Z

website/docs/docs/collaborate/govern/model-versions.md

+
+First, the versioned model preserves its _reference name_. Versioned models are `ref`'d by their _model name_, rather than the name of the file that they're defined in. By default, the `ref` resolves to the latest version (as declared by that model's maintainer), but you can also `ref` a specific version of the model, with a `version` keyword.
+
+<File name="models/schema.yml">


Suggested change

<File name="models/schema.yml">

<File name="models/scratchpad.sql">

Although if you take the table and sample code above, I think this File block is totally redundant

joellabes · 2023-04-21T00:59:02Z

website/docs/docs/collaborate/govern/model-versions.md

+```yml
+selectors:
+  - name: exclude_old_versions
+    default: "{{ target.name == 'dev' }}"


Does this need | as_bool?

It actually doesn't. I think the reason is because yaml treats the string "True" as truthy

joellabes · 2023-04-21T01:15:33Z

website/docs/docs/collaborate/govern/model-versions.md

+Or, you could define a separate view that always points to the latest version of the model. We recommend this pattern because it's the most transparent and easiest to follow.
+
+<File name="models/dim_customers_view.yml">
+
+```sql
+{{ config(alias = 'dim_customers') }}
+
+select * from {{ ref('dim_customers') }}
+```
+
+</File>


As I was building my table of examples up top, I decided that if we're not going to do this magically behind the scenes, I think we should just encourage people to move the alias definition around in their YAML as they progress their models. Telling people to shepherd an entire extra model around by hand feels gross.

@joellabes I've had another thought here: It would be possible to implement this as a standard pattern, with a modification to the generate_alias_name macro.

    {% macro generate_alias_name(custom_alias_name=none, node=none) -%}
        {%- if custom_alias_name -%}
            {{ return(custom_alias_name | trim) }}
        {%- elif node.version and not node.is_latest_version -%}  {# <--- this bit #}
            {{ return(node.name ~ "_v" ~ (node.version | replace(".", "_"))) }}
        {%- else -%}  {# latest version has standard behavior #}
            {{ return(node.name) }}
        {%- endif -%}
    {%- endmacro %}

This way, whichever version is latest_version, it always lands in the model's "canonical" location.

Should we make that the default behavior? Or should it be something that end users opt into? I'm inclined to be a bit more opinionated, and say this should be the default—really emphasize that the latest version is the thing, and the old/new versions are mechanisms for managing change—but it does add a bit more inconsistency.

Yes I definitely agree it should be the default. More to come on the Slack thread

joellabes · 2023-04-21T01:17:29Z

website/docs/docs/collaborate/govern/model-versions.md

+Of course, if one model version makes meaningful and substantive changes to logic in another, it may not be possible to optimize it in this way. At that point, the cost of human intuition and legibility is more important than the cost of recomputing similar transformations.
+
+We expect to develop more opinionated recommendations as teams start adopting model versions in practice. One recommended pattern we can envision: Prioritize the definition of the `latest_version`, and define other versions (old and prerelease) based on their diffs from the latest. How?
+- Define the properties and configuration for the latest version in the top-level model yaml, and the diffs for other versions below (via `include`/`exclude`)


👨‍🍳 💋

joellabes · 2023-04-21T01:18:15Z

website/docs/docs/collaborate/govern/model-versions.md

+We expect to develop more opinionated recommendations as teams start adopting model versions in practice. One recommended pattern we can envision: Prioritize the definition of the `latest_version`, and define other versions (old and prerelease) based on their diffs from the latest. How?
+- Define the properties and configuration for the latest version in the top-level model yaml, and the diffs for other versions below (via `include`/`exclude`)
+- Where possible, define other versions as `select` transformations, which take the latest version as their starting point
+- When bumping the `latest_version`, migrate the SQL and yaml accordingly. In this case, we would see if it's possible to redefine `v1` with respect to `v2`.


in this case

The sample above where country name is removed? I don't quite follow this bit
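
To make the first bullet concrete, here is a minimal sketch of "latest at the top level, older versions as diffs". It assumes the running example where `country_name` was removed in v2; the column names and data types are illustrative, not taken from the PR.

```yml
models:
  - name: dim_customers
    latest_version: 2
    # Top-level columns describe the latest version (v2)
    columns:
      - name: customer_id
        data_type: int
    versions:
      - v: 2   # uses the top-level columns unchanged
      - v: 1
        # v1 expressed as a diff from the latest:
        # everything in v2, plus the column that v2 removed
        columns:
          - include: all
          - name: country_name
            data_type: varchar
```

Read this way, "redefine `v1` with respect to `v2`" would mean that after bumping `latest_version`, the top-level definition moves to v2 and v1 only carries what differs.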

joellabes · 2023-04-21T01:21:55Z

website/docs/docs/collaborate/govern/model-versions.md

+
+Many changes to a model are not breaking, and do not require a new version! Examples include adding a new column, or fixing a bug in modeling logic.
+
+By enforcing a model's contract, dbt can help you catch unintended changes to column names and data types that could cause a big headache for downstream queriers.


True, but nothing to do with versioning on its own. Is the implicit recommendation here "when you break your contract, you should bump versions"?

joellabes · 2023-04-21T01:23:38Z

website/docs/docs/collaborate/govern/model-versions.md

+
+By enforcing a model's contract, dbt can help you catch unintended changes to column names and data types that could cause a big headache for downstream queriers.
+
+It's also possible to change the model in more subtle ways — by recalculating a column in a way that doesn't change its name, data type, or enforceable characteristics—but would substantially change the results seen by downstream queriers.


Likewise here, it feels like this would benefit from driving the point home: if you are going to surprise your querier, probably bump versions

jtcohen6 · 2023-04-24T02:02:41Z

Thank you for the excellent feedback ❤️

My revision includes two significant assumptions:

- That we will implement two of the UX changes proposed in "UX improvements to model versions" (dbt-core#7435), which I can remove if implementing proves impossible before Thursday:
  - Latest version can be defined in <model_name>.sql (no suffix)
  - Unpinned ref will log if a newer prerelease version is detected
- That, for now, the best / recommended approach to handle aliasing is with a hook that creates a view pointing to the latest version, as in this gist

joellabes

🔥

joellabes · 2023-04-24T04:47:05Z

website/docs/docs/collaborate/govern/model-versions.md

-Model versions are different. Multiple versions of a model will live in the same code repository at the same time and be deployed into the same data environment simultaneously. This is similar to how web APIs are versioned—multiple versions are live simultaneously; older versions are often eventually sunsetted.
+**Versioned models are different.** Defining model `versions` is appropriate when there are people, systems, and processes beyond your team's control, inside or outside of dbt. You can neither simply go migrate them all, nor break their queries on a whim. I need to do my part by offering a migration path, with clear diffs and deprecation dates.
+
+Multiple versions of a model will live in the same code repository at the same time, and be deployed into the same data environment simultaneously. This is similar to how web APIs are versioned—multiple versions are live simultaneously; older versions are often eventually sunsetted. 


(but hopefully not more than 2)

joellabes · 2023-04-24T04:48:10Z

website/docs/docs/collaborate/govern/model-versions.md


-Model versions are different. Multiple versions of a model will live in the same code repository at the same time and be deployed into the same data environment simultaneously. This is similar to how web APIs are versioned—multiple versions are live simultaneously; older versions are often eventually sunsetted.
+**Versioned models are different.** Defining model `versions` is appropriate when there are people, systems, and processes beyond your team's control, inside or outside of dbt. You can neither simply go migrate them all, nor break their queries on a whim. I need to do my part by offering a migration path, with clear diffs and deprecation dates.


You can neither simply go migrate them all, nor break their queries on a whim. I need to do my part by offering a migration path, with clear diffs and deprecation dates.

Both of these are the same actor aren't they? Did you just feel weird about telling people they have to do their part?

:) I suspect I'm switching frequently between first & second person - worth another reread just for this

I've actually been reasonably consistent:

"we" = dbt Labs, developers/maintainers of dbt-core

"you" = user of dbt, maintainer of a versioned model

joellabes · 2023-04-24T04:51:31Z

website/docs/docs/collaborate/govern/model-versions.md

+**Where are they defined?**
+
+
+**Where will they be materialized?** By convention, these will create database relations with aliases `dim_customers_v1` and `dim_customers_v2`. We recommend that you also create a view, named `dim_customers`, pointing to the latest version. Check out guidance on an easy & repeatable way to do that.


Check out guidance on an easy & repeatable way to do that.

Where is that guidance? Looks like a link is missing here

joellabes · 2023-04-24T04:53:26Z

website/docs/docs/collaborate/govern/model-versions.md

+
+**Where will they be materialized?** By convention, these will create database relations with aliases `dim_customers_v1` and `dim_customers_v2`. We recommend that you also create a view, named `dim_customers`, pointing to the latest version. Check out guidance on an easy & repeatable way to do that.
+
+By convention, dbt will expect those two models to be defined in files named `dim_customers_v1.sql` and `dim_customers_v2.sql`. It will also accept `dim_customers.sql` (no suffix) as the definition of the latest version. (It is possible to override this by setting `defined_in: any_file_name_you_want`, but we strongly encourage you to follow the convention!)


It is possible to override this by setting defined_in: any_file_name_you_want

You have to include the .sql suffix right?

(Related: if it's optional, what happens if you have any_file_you_want.sql and any_file_you_want.py?)

You don't need to include the file extension (and in fact, shouldn't)

(Related: if it's optional, what happens if you have any_file_you_want.sql and any_file_you_want.py?)

Not allowed - model file names still need to be globally unique, independent of the file extension. I think I have a note about this in the reference docs for defined_in - I think I'll add a link there from here.
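
A short sketch of what that looks like in practice, reusing the placeholder file name from the docs text. Only the `defined_in` line matters here; the rest is scaffolding.

```yml
models:
  - name: dim_customers
    versions:
      - v: 1
      - v: 2
        # No file extension: this points at the model file named
        # any_file_name_you_want (.sql or .py, but not both)
        defined_in: any_file_name_you_want
```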

joellabes · 2023-04-24T04:55:28Z

website/docs/docs/collaborate/govern/model-versions.md


-<File name="models/dim_customers_view.yml">
+<!-- TODO: add the macro from my gist to dbt-core. Better as on-run-end or post-hook? -->


I'd say post-hook for all the same reasons we used to encourage doing grants in post-hooks - the changes apply immediately instead of having to wait for the entire 3 hour run to complete
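
As a sketch of the post-hook option, assuming the gist's macro is copied into the project under the name `create_latest_version_view` and can be called with no arguments (both are assumptions, as is the project name):

```yml
# dbt_project.yml
models:
  my_project:   # hypothetical project name
    # Runs right after each model builds, so the view is refreshed
    # without waiting for the rest of the run to finish
    +post-hook: "{{ create_latest_version_view() }}"
```

With `on-run-end` instead, the view would only be updated once every model in the invocation has finished.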

joellabes · 2023-04-24T05:03:48Z

website/docs/reference/resource-properties/versions.md

@@ -35,24 +35,29 @@ The standard convention for naming model versions is `<model_name>_v<v>`. This h

 The version identifier for a version of a model. This value can be numeric (integer or float), or any string.

-The value of the version identifier is used to order versions of a model relative to one another. If a versioned model does _not_ explicitly configure a [`latest_version`](resource-properties/latest-version), the highest version number is used as the latest version to resolve `ref` calls to the model without a `version` argument.
+The value of the version identifier is used to order versions of a model relative to one another. If a versioned model does _not_ explicitly configure a [`latest_version`](resource-properties/latest_version), the highest version number is used as the latest version to resolve `ref` calls to the model without a `version` argument.

 In general, we recommend that you use a simple "major versioning" scheme for your models: `v1`, `v2`, `v3`, etc, where each version represents a breaking change from previous versions. However, you are welcome to use other versioning schemes.


However, you are welcome to use other versioning schemes

as long as they behave correctly when sort()ed. (or however we're actually doing it).

On that note, do we handle people putting `v`s in their yaml? What would happen if I did this?

    models:
      - name: dim_customers
        versions:
          - v: v1
            ...
          - v: 2

Both from a sorting perspective and an alias-creation perspective - would I wind up with dim_customers_vv1 which outranked the 2 in sort order?

Yes & yes. I'll add an explicit caution that people should not include v in their version identifier.
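
For contrast, a minimal sketch of the recommended form, with bare identifiers and the aliases that follow from the `<model_name>_v<v>` convention:

```yml
models:
  - name: dim_customers
    versions:
      - v: 1   # alias by convention: dim_customers_v1
      - v: 2   # alias by convention: dim_customers_v2
```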

jtcohen6 · 2023-04-24T13:46:36Z

As with dbt-labs/dbt-core#7435 (comment), holding off on merging this until we decide whether to vendor create_latest_version_view (recommended post-hook) within dbt-core directly.

Update: Let's opt for this: it will come in v1.6; in the meantime, here's a macro you can copy, paste, edit, and use as a post-hook.

In the meantime, comments from other reviewers still welcome!

mirnawong1 · 2023-04-24T14:10:50Z

website/docs/reference/model-properties.md

@@ -16,7 +16,7 @@ models:
    [description](description): <markdown_string>
    [docs](/reference/resource-configs/docs):
      show: true | false
-    [latest_version](resource-properties/latest-version): <version_identifier>
+    [latest_version](resource-properties/latest_version): <version_identifier>


i dont understand why an underscore was added here, the actual page is https://docs.getdbt.com/reference/resource-properties/latest-version and a latest_version brings the user to a 'page not found'. suggesting it goes back to the latest-version

Suggested change

[latest_version](resource-properties/latest_version): <version_identifier>

[latest_version](resource-properties/latest-version): <version_identifier>

@mirnawong1 The name of this resource property is latest_version (underscore). I looked at some other similar properties/configs, and they all have underscores in their file name / id:

https://docs.getdbt.com/reference/resource-configs/sql_header

https://docs.getdbt.com/reference/resource-configs/column_types

https://docs.getdbt.com/reference/resource-configs/store_failures

(etc)

So I renamed the page, and added a redirect for it

jtcohen6 · 2023-04-24T19:40:44Z

website/docs/docs/collaborate/govern/model-versions.md

+        ) %}
+
+        {% set existing_relation = load_relation(new_relation) %}
+        {{ drop_relation_if_exists(existing_relation) }}


Since the whole point is that this view should be live-queryable from a BI tool... I don't think we want to be dropping it outside of a transaction / atomic operation.

I don't think we have a handy cross-db way to do this, outside of the actual view materialization logic. This might take some fudging.

runleonarun

Just a few comments now and I will finish reviewing tomorrow!

runleonarun · 2023-04-25T01:40:42Z

_redirects

@@ -278,6 +278,7 @@ docs/dbt-cloud/using-dbt-cloud/cloud-model-timing-tab /docs/deploy/dbt-cloud-job
 /docs/artifacts /docs/dbt-cloud/using-dbt-cloud/artifacts 301
 /docs/bigquery-configs /reference/resource-configs/bigquery-configs 301
 /reference/resource-properties/docs /reference/resource-configs/docs 301
+/reference/resource-properties/latest-version /reference/resource-configs/latest_version 301


I think this should be as follows:

Suggested change

/reference/resource-properties/latest-version /reference/resource-configs/latest_version 301

/reference/resource-properties/latest-version /reference/resource-properties/latest_version 301

good catch!

runleonarun · 2023-04-25T01:42:09Z

website/docs/docs/collaborate/govern/model-contracts.md

@@ -6,7 +6,7 @@ description: "Model contracts define a set of parameters validated during transf
 ---

 :::info New functionality
-This functionality is new in v1.5.
+This functionality is new in v1.5 — if you have thoughts, weigh into the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/6726)!


"Weigh in" might be vague.

Suggested change

This functionality is new in v1.5 — if you have thoughts, weigh into the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/6726)!

This functionality is new in v1.5 — if you have feedback, then participate in the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/6726)!

👍 replacing with ~~"comment on"~~ "participate in"

runleonarun · 2023-04-25T01:42:48Z

website/docs/docs/collaborate/govern/model-versions.md

@@ -6,39 +6,120 @@ description: "Version models to help with lifecycle management"
 ---

 :::info New functionality
-This functionality is new in v1.5.
+This functionality is new in v1.5 — if you have thoughts, weigh into the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/6736)!


Suggested change

This functionality is new in v1.5 — if you have thoughts, weigh into the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/6736)!

This functionality is new in v1.5 — if you have feedback, then participate in the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/6736)!

runleonarun · 2023-04-25T01:43:32Z

website/docs/docs/collaborate/govern/model-versions.md

 :::

-API versioning is a _complex_ problem in software engineering. It's also essential. Our goal is to _overcome obstacles to transform a complex problem into a reality_.
+Versioning APIs is a hard problem in software engineering. At the root of the challenge is the fact that the producers and consumers of an API have competing incentives:


Suggested change

Versioning APIs is a hard problem in software engineering. At the root of the challenge is the fact that the producers and consumers of an API have competing incentives:

Versioning APIs is a hard problem in software engineering. The root of the challenge is that the producers and consumers of an API have competing incentives:

runleonarun

@jtcohen6 This looks good! I have some wording suggestions to clarify the message here. I also have a few questions that might be worth addressing.

Approving so you can release during your day.

runleonarun · 2023-04-25T16:50:09Z

website/docs/docs/collaborate/govern/model-versions.md

 :::

-API versioning is a _complex_ problem in software engineering. It's also essential. Our goal is to _overcome obstacles to transform a complex problem into a reality_.
+Versioning APIs is a hard problem in software engineering. The root of the challenge is that the producers and consumers of an API have competing incentives:
+- Producers of an API need the ability to make changes to its logic. There is a real cost associated with maintaining legacy endpoints forever, but losing the trust of downstream users is far costlier.


I think we can make this pack more punch by writing out some of the passive language:

Suggested change

- Producers of an API need the ability to make changes to its logic. There is a real cost associated with maintaining legacy endpoints forever, but losing the trust of downstream users is far costlier.

- Producers of an API need the ability to modify its logic. Although maintaining legacy endpoints forever incurs a significant expense, it costs more to lose the trust of downstream users.

runleonarun · 2023-04-25T16:55:48Z

website/docs/docs/collaborate/govern/model-versions.md

-API versioning is a _complex_ problem in software engineering. It's also essential. Our goal is to _overcome obstacles to transform a complex problem into a reality_.
+Versioning APIs is a hard problem in software engineering. The root of the challenge is that the producers and consumers of an API have competing incentives:
+- Producers of an API need the ability to make changes to its logic. There is a real cost associated with maintaining legacy endpoints forever, but losing the trust of downstream users is far costlier.
+- Consumers of an API need to trust in its stability—their queries will keep working, and won't break without warning. There is a real cost associated with migrating to a newer API version, but unplanned migration is far costlier.


Suggested change

- Consumers of an API need to trust in its stability—their queries will keep working, and won't break without warning. There is a real cost associated with migrating to a newer API version, but unplanned migration is far costlier.

- Consumers of an API need to trust in its stability—their queries will keep working, and won't break without warning. Although migrating to a newer API version incurs an expense, an unplanned migration is far costlier.

runleonarun · 2023-04-25T17:01:21Z

website/docs/docs/collaborate/govern/model-versions.md

+- Producers of an API need the ability to make changes to its logic. There is a real cost associated with maintaining legacy endpoints forever, but losing the trust of downstream users is far costlier.
+- Consumers of an API need to trust in its stability—their queries will keep working, and won't break without warning. There is a real cost associated with migrating to a newer API version, but unplanned migration is far costlier.
+
+The goal of model versions is not to make the problem go away, nor to pretend it's somehow easier or simpler than it is. Rather, we want dbt to provide tools that make it possible to tackle this problem, thoughtfully and head-on, and to develop standard patterns for solving it.


To be clear, the "tools" are model versioning, right? I'd also suggest flipping the two sentences so that what model versioning does do is not buried.

Suggested change

The goal of model versions is not to make the problem go away, nor to pretend it's somehow easier or simpler than it is. Rather, we want dbt to provide tools that make it possible to tackle this problem, thoughtfully and head-on, and to develop standard patterns for solving it.

The goal of model versions is not to make the problem go away, nor to pretend it's somehow easier or simpler than it is. Rather, model versioning makes it possible to tackle this problem, thoughtfully and head-on, and to develop standard patterns for solving it.

runleonarun · 2023-04-25T17:11:53Z

website/docs/docs/collaborate/govern/model-versions.md

+
+## When should you version a model?
+
+By enforcing a model's contract, dbt can help you catch unintended changes to column names and data types that could cause a big headache for downstream queriers. These changes, when made intentionally, would require a new model version. But many changes are not breaking, and don't require a new version—such as adding a new column, or fixing a bug in an existing column's calculation.


Suggested change

By enforcing a model's contract, dbt can help you catch unintended changes to column names and data types that could cause a big headache for downstream queriers. These changes, when made intentionally, would require a new model version. But many changes are not breaking, and don't require a new version—such as adding a new column, or fixing a bug in an existing column's calculation.

By enforcing a model's contract, dbt can help you catch unintended changes to column names and data types that could cause a big headache for downstream queriers. These changes, when made intentionally, would require a new model version. But when making non-breaking changes, you don't need a new version—such as adding a new column, or fixing a bug in an existing column's calculation.

@jtcohen6 for this sentence, do we want to say that they get the option of creating a new version vs fixing the problem? It feels like that's the undertone here, but we might want to be explicit. Using "require" kind of sounds like dbt will require it, which doesn't seem to be the case later.

These changes, when made intentionally, would require a new model version

Good clarifying question. If you make a breaking contract change, dbt will raise an error during CI — and you'd need to "merge on red" (you always can do it). https://docs.getdbt.com/reference/resource-configs/contract#detecting-breaking-changes

Consider making an additive (non-breaking) change instead, if possible. Otherwise, create a new model version: https://docs.getdbt.com/docs/collaborate/govern/model-versions

runleonarun · 2023-04-25T17:23:16Z

website/docs/docs/collaborate/govern/model-versions.md

+
+When you make updates to a model's source code—its logical definition, in SQL or Python, or related configuration—dbt can [compare your project to previous state](project-state), enabling you to rebuild only models that have changed, and models downstream of a change. In this way, it's possible to develop changes to a model, quickly test in CI, and efficiently deploy into production—all coordinated via your version control system.
+
+**Versioned models are different.** Defining model `versions` is appropriate when there are people, systems, and processes beyond your team's control, inside or outside of dbt. You can neither simply go migrate them all, nor break their queries on a whim. You need to do my part by offering a migration path, with clear diffs and deprecation dates.


Not sure if "do my part" was a typo?

Suggested change

**Versioned models are different.** Defining model `versions` is appropriate when there are people, systems, and processes beyond your team's control, inside or outside of dbt. You can neither simply go migrate them all, nor break their queries on a whim. You need to do my part by offering a migration path, with clear diffs and deprecation dates.

**Versioned models are different.** Defining model `versions` is appropriate when people, systems, and processes beyond your team's control, inside or outside of dbt, depend on your models. You can neither simply go migrate them all, nor break their queries on a whim. You need to offer a migration path, with clear diffs and deprecation dates.

*do your part! whoops. I like your suggestion better

runleonarun · 2023-04-25T17:28:20Z

website/docs/docs/collaborate/govern/model-versions.md

+
+You've always been able to copy-paste, create a new model file, and name it `dim_customers_v2.sql`. Why should you opt for a "real" versioned model instead?
+
+As the **producer** of a versioned model:


These benefits are super clear! Love how this section reads!

You might consider bullets instead of steps. Then the reader can focus on the content. Steps usually indicate you need to do something.

runleonarun · 2023-04-25T18:07:28Z

website/docs/docs/collaborate/govern/model-versions.md

 :::

-API versioning is a _complex_ problem in software engineering. It's also essential. Our goal is to _overcome obstacles to transform a complex problem into a reality_.
+Versioning APIs is a hard problem in software engineering. The root of the challenge is that the producers and consumers of an API have competing incentives:


We never actually say how model versioning relates to API versioning. I wonder if we could call that out here before we start talking about the problems with versioning APIs?

Added these sentences in between, to try and connect the dots:

When sharing a final dbt model with other teams or systems, that model is operating like an API. When the producer of that model needs to make significant changes, how can they avoid breaking the queries of its users downstream?

runleonarun · 2023-04-25T18:10:21Z

website/docs/docs/collaborate/govern/model-versions.md

+| 2 | "latest"     | `ref('dim_customers', v=2)` **and** `ref('dim_customers')`  | `dim_customers_v2.sql` **or** `dim_customers.sql` | `analytics.dim_customers_v2` **and** `analytics.dim_customers` (recommended) |
+| 1 | "old"        |  `ref('dim_customers', v=1)`                           | `dim_customers_v1.sql`                          | `analytics.dim_customers_v1`                                             |
+
+As you'll see in the implementation section below, a versioned model can reuse the majority of its yaml properties and configuration. Each version needs to only say how it _differs_ from the shared set of attributes. This gives you, as the producer of a versioned model, the opportunity to highlight the differences across versions—which is otherwise difficult to detect in models with dozens or hundreds of columns—and to clearly track, in one place, all versions of the model which are currently live.


Suggested change

As you'll see in the implementation section below, a versioned model can reuse the majority of its yaml properties and configuration. Each version needs to only say how it _differs_ from the shared set of attributes. This gives you, as the producer of a versioned model, the opportunity to highlight the differences across versions—which is otherwise difficult to detect in models with dozens or hundreds of columns—and to clearly track, in one place, all versions of the model which are currently live.

As you'll see in the implementation section below, a versioned model can reuse the majority of its YAML properties and configuration. Each version needs to only say how it _differs_ from the shared set of attributes. This gives you, as the producer of a versioned model, the opportunity to highlight the differences across versions—which is otherwise difficult to detect in models with dozens or hundreds of columns—and to clearly track, in one place, all versions of the model which are currently live.
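
To make the "each version only says how it differs" point concrete, here is a minimal sketch of what such a properties file might look like. The column names (`customer_id`, `country_name`) and their data types are hypothetical, and the `include`/`exclude` syntax follows my reading of the `versions` reference docs linked elsewhere in this thread. Treat it as an illustration, not as the final text for the page.

```yaml
# models/schema.yml (illustrative; column names and types are assumptions)
models:
  - name: dim_customers
    latest_version: 2
    config:
      contract:
        enforced: true
    # Shared attributes, inherited by every version unless overridden
    columns:
      - name: customer_id
        data_type: int
      - name: country_name
        data_type: varchar
    versions:
      # v2 states only its difference from the shared set: it drops country_name
      - v: 2
        columns:
          - include: all
            exclude: [country_name]
      # v1 matches the shared definition, so nothing more needs to be said
      - v: 1
```

Reading the `versions` block top to bottom then gives a reviewer the diff between versions in one place, which is the benefit the paragraph describes.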

runleonarun · 2023-04-25T18:20:04Z

website/docs/docs/collaborate/govern/model-versions.md

+
+  Try out v3: {{ ref('my_dbt_project', 'my_model', v='3') }}
+  Pin to  v2: {{ ref('my_dbt_project', 'my_model', v='2') }}
+```

 ## How to create a new version of a model


Not for this week, but eventually, we might want to pull the procedure section out into its own page so people who don't need the "why" and other context can more easily find the "how."

Heard! A lot of the "how" details are also captured in the reference documentation: https://docs.getdbt.com/reference/resource-properties/versions

For now, we should expect that the main audience for these docs is people trying to learn about & understand the new feature. As the concept becomes more established, and people are already convinced they want to use the feature, we can make it even easier to just do the thing.

Draft some revisions to model versions

b9049a8

github-actions bot added the labels `content` (Improvements or additions to content) and `size: medium` (This change will take up to a week to address) Apr 20, 2023

MichelleArk reviewed Apr 20, 2023


Side-by-side example

cb26fd1

joellabes requested changes Apr 21, 2023


This was linked to issues Apr 24, 2023

Referencing specific model versions #3226

Closed

Keeping the original filename and table name when versioning for first time #3203

Closed

PR feedback

e263bed

jtcohen6 marked this pull request as ready for review April 24, 2023 02:02

jtcohen6 requested a review from a team as a code owner April 24, 2023 02:02

github-actions bot added the label `size: large` (This change will take more than a week to address and might require more than one person) and removed the label `size: medium` (This change will take up to a week to address) Apr 24, 2023

joellabes reviewed Apr 24, 2023


More feedback

8d303df

jtcohen6 mentioned this pull request Apr 24, 2023

UX improvements to model versions dbt-labs/dbt-core#7435

Merged

6 tasks

Merge branch 'current' into jerco/more-on-model-versions

8fd1fef

mirnawong1 reviewed Apr 24, 2023


jtcohen6 commented Apr 24, 2023


runleonarun reviewed Apr 25, 2023


PR feedback, self-review

1ebed2c

runleonarun previously approved these changes Apr 25, 2023


Final feedbacack

1a00757

jtcohen6 dismissed runleonarun’s stale review via 1a00757 April 26, 2023 10:45

jtcohen6 merged commit 22ecf93 into current Apr 26, 2023

jtcohen6 deleted the jerco/more-on-model-versions branch April 26, 2023 10:49


		It's also possible to change the model in more subtle ways — by recalculating a column in a way that doesn't change its name, data type, or enforceable characteristics—but would substantially change the results seen by downstream queriers.

		The process of sunsetting and migrating model versions requires real work, and may require significant coordination across teams. If, instead of using model versions, you opt for non-breaking changes wherever possible—that's a completely legitimate approach. Even so, after a while, you'll find yourself with lots of unused or deprecated columns. Many teams will want to consider a predictable cadence (once or twice a year) for bumping the version of their mature models, and taking the opportunity to remove no-longer-used columns.


		You've always been able to create a new model, and name it `dim_customers_v2`. Why should you opt for a "real" versioned model instead?

		First, the versioned model preserves its _reference name_. Versioned models are `ref`'d by their _model name_, rather than the name of the file that they're defined in. By default, the `ref` resolves to the latest version (as declared by that model's maintainer), but you can also `ref` a specific version of the model, with a `version` keyword.

| v | ref syntax | file name | table name |
|---|------------|-----------|------------|
| 3 | `ref('dim_customers', v=3)` | dim_customers_NOT_READY_YET.sql | analytics.dim_customers_v3 |
| 2 | `ref('dim_customers')` or `ref('dim_customers', v=2)` | dim_customers_v2.sql | analytics.dim_customers |
| 1 | `ref('dim_customers', v=1)` | dim_customers_v1.sql | analytics.dim_customers_v1 |



		<File name="models/schema.yml">

	<File name="models/schema.yml">
	<File name="models/scratchpad.sql">


		Many changes to a model are not breaking, and do not require a new version! Examples include adding a new column, or fixing a bug in modeling logic.

		By enforcing a model's contract, dbt can help you catch unintended changes to column names and data types that could cause a big headache for downstream queriers.


		Model versions are different. Multiple versions of a model will live in the same code repository at the same time and be deployed into the same data environment simultaneously. This is similar to how web APIs are versioned—multiple versions are live simultaneously; older versions are often eventually sunsetted.
		Versioned models are different. Defining model `versions` is appropriate when there are people, systems, and processes beyond your team's control, inside or outside of dbt. You can neither simply go migrate them all, nor break their queries on a whim. I need to do my part by offering a migration path, with clear diffs and deprecation dates.

		Where are they defined?


		Where will they be materialized? By convention, these will create database relations with aliases `dim_customers_v1` and `dim_customers_v2`. We recommend that you also create a view, named `dim_customers`, pointing to the latest version. Check out guidance on an easy & repeatable way to do that.



		By convention, dbt will expect those two models to be defined in files named `dim_customers_v1.sql` and `dim_customers_v2.sql`. It will also accept `dim_customers.sql` (no suffix) as the definition of the latest version. (It is possible to override this by setting `defined_in: any_file_name_you_want`, but we strongly encourage you to follow the convention!)
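
As a rough sketch of the override mentioned above: the file name `dim_customers_legacy` is hypothetical, and I'm assuming (per my reading of the `versions` reference docs) that `defined_in` takes the file name without its `.sql` extension.

```yaml
# models/schema.yml (illustrative; dim_customers_legacy is a made-up file name)
models:
  - name: dim_customers
    latest_version: 2
    versions:
      # Follows the convention: dim_customers_v2.sql, or dim_customers.sql as the latest version
      - v: 2
      # Overrides the convention: dbt would look for dim_customers_legacy.sql
      # instead of dim_customers_v1.sql (assumption: no file extension here)
      - v: 1
        defined_in: dim_customers_legacy
```

Sticking to the default names keeps the mapping from version to file guessable, which is why the convention is encouraged.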


		<File name="models/dim_customers_view.yml">
		<!-- TODO: add the macro from my gist to dbt-core. Better as on-run-end or post-hook? -->

	[latest_version](resource-properties/latest_version): <version_identifier>
	[latest_version](resource-properties/latest-version): <version_identifier>

	/reference/resource-properties/latest-version /reference/resource-configs/latest_version 301
	/reference/resource-properties/latest-version /reference/resource-properties/latest_version 301

	This functionality is new in v1.5 — if you have thoughts, weigh into the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/6726)!
	This functionality is new in v1.5 — if you have feedback, then participate in the [GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/6726)!

	Versioning APIs is a hard problem in software engineering. At the root of the challenge is the fact that the producers and consumers of an API have competing incentives:
	Versioning APIs is a hard problem in software engineering. The root of the challenge is that the producers and consumers of an API have competing incentives:

	- Producers of an API need the ability to make changes to its logic. There is a real cost associated with maintaining legacy endpoints forever, but losing the trust of downstream users is far costlier.
	- Producers of an API need the ability to modify its logic. Although maintaining legacy endpoints forever incurs a significant expense, it costs more to lose the trust of downstream users.

	- Consumers of an API need to trust in its stability—their queries will keep working, and won't break without warning. There is a real cost associated with migrating to a newer API version, but unplanned migration is far costlier.
	- Consumers of an API need to trust in its stability—their queries will keep working, and won't break without warning. Although migrating to a newer API version incurs an expense, an unplanned migration is far costlier.

	The goal of model versions is not to make the problem go away, nor to pretend it's somehow easier or simpler than it is. Rather, we want dbt to provide tools that make it possible to tackle this problem, thoughtfully and head-on, and to develop standard patterns for solving it.
	The goal of model versions is not to make the problem go away, nor to pretend it's somehow easier or simpler than it is. Rather, model versioning makes it possible to tackle this problem, thoughtfully and head-on, and to develop standard patterns for solving it.


