-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] Snapshot contract errors for dbt_change_type and dbt_unique_key #9542
Comments
Thanks for reporting this @david-burke5000 ! Initial assessmentI suspect that we didn't intentionally include support model contracts for snapshots -- it's not listed here either way. But at the very least, you did figure out how to get model contracts to work for snapshots in dbt-postgres (albeit with behavior that is a bit wonky). When I tried it locally with dbt-postgres, I got the same behavior as you, but it didn't really work at all when I tried with dbt-snowflake. OptionsWe've got a couple options moving forward:
I'm going to label this as "refinement" for us to take a closer look. |
We talked about this briefly today (@graciegoheen and myself). We thought we'd leave this in the Court of the Mesh King 👑 to decide. What do you think @jtcohen6? |
@david-burke5000 @dbeatty10 My inclination is that we should not seek to officially support contracts on snapshots. Here's my rationale: For snapshots that depend on sources (best practice), when a column in the upstream source system is added or changes data type, the snapshot will handle this by adding a new column with that name / data type. If a column is removed, the snapshot does not remove its corresponding column. In this way, we ensure that data is not lost (good!), but we trade off having a fully predictable schema for snapshots. For snapshots that depend on models, and sit in the middle of the DAG (not something we encourage outright, but it definitely happens): I would recommend limiting the transformation logic in the snapshot to just To that end, I've opened a docs PR: What say ye? Any strong dissent? |
…4954) ## What are you changing in this pull request and why? Clarify that model contracts are only supported for models, not for other resource types. Context: - dbt-labs/dbt-core#9542
No dissent from me, but I do think if we go this route we should add an error message when you attempt to add a contract to a snapshot, so folks don't get into this tricky spot. |
To round out this story, here's the components that would be the most helpful:
|
Workaround using a pass-through modelSuppose you have a snapshot (or seed) named To add a pass-through model contract, you can do the following:
1. Make a simple pass-through view
{{ config(materialized="view") }}
select * from {{ ref("not_a_model") }} 2. Enforce a contract on the pass-through view
models:
- name: contract_for_not_a_model
contract:
enforced: true
columns:
# snapshot-specific columns
# your columns here
- name: column_1
data_type: dtype_1
# etc.
# snapshot-specific columns
- name: dbt_scd_id
data_type: text
- name: dbt_updated_at
data_type: timestamp
- name: dbt_valid_from
data_type: timestamp
- name: dbt_valid_to
data_type: timestamp |
Dear @graciegoheen, @dbeatty10, @jtcohen6 , Thank you for your support and feedback on this issue! I am working together with @david-burke5000. I am commenting the situation with the dbt-postgres-adapter in mind. We work in a highly regulated industry where it is imperative that we track how data has changed over time. This is where we rely on the SCD2 snapshots to build the history of the data, incl. the time intervals a record is valid. All our downstream models are based on these snapshots. Our current DAG: source table => view (adding metadata and derive surrogate key column + dbt tests) => snapshot (with data contracts) => any downstream model Snapshots vs. incremental models:@jtcohen6 argued:
Currently contracts are supported for incremental models. As far as I understand, the behavior for incremental models with on_schema_change: append_new_columns is the same as for snapshots: new columns are added to the table, deleted columns are left within the table. Workaround using a pass-through model:The workaround to add a view-based pass-through model would not (fully) work, as database constraints (e.g. not_null, primary_key, foreign_key) can only be applied on an actual table (in our case the snapshot table) and not on a view. Kindly ask for further consideration and feedback on our thoughts. Thank you and best regards! |
Is this a new bug in dbt-core?
Current Behavior
Defining in the config of a snapshot schema that contracts are to be enforced, one gets error relating to fields
If one includes them in the schema for the first run of a snapshot, this generates an error since the fields do not exist yet. In subsequent snapshots (after an initial snapshot, and where a change has occurred) one gets an error if they are missing in the schema definition. This then requires one, especially in the development phase, to continually put these contracts in or take them out. Understandably, one has to define a contract for all columns in the schema that finally appear in the snapshot table. However, for internally generated columns, this is wasting development time. This doesn not happen for dbt_scd_id column for example.
Expected Behavior
If columns do not appear in the snapshot tables, and are internal, it should not cause a problem to not define their contracts directly in a schema.
Steps To Reproduce
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
postgres
Additional Context
No response
The text was updated successfully, but these errors were encountered: