-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-2869] Add new snapshot records when source removes a field #8207
Comments
+1, would love to see this handled as a config option similar to |
+1 |
Thanks for such a nice write-up @owen-mc-git 🤩 -- made it easy to see what is it doing currently vs. your expectation. See "Reprex" below for the code that I used as a reproducible example. Refinement@compilerqueen if we can address this, do you think it should be a config option... or should we just roll it out (essentially) as a bug fix? Would be curious to hear your thoughts on pros/cons either way. Potential solutionIf we wanted to make the minimal change to produce the behavior that @owen-mc-git described, we could just add the following line right above the return statement here: {%- set ns.column_added = intersection|length < query_columns|length or intersection|length < existing_cols|length -%} 👉 Of course we'd probably do more changes than that (e.g., rename variables and remove unused code), but it was pretty pleasing to see that the behavior can be achieved with such minimal code changes 😎 Result
ReprexClick here for detailsThese project files assume
{% snapshot my_snapshot %}
{{
config(
target_database=target.database,
target_schema=target.schema,
updated_at="'" ~ var("updated_at", '2023-07-23') ~ "'::date",
unique_key='id',
strategy='check',
check_cols='all',
)
}}
{% set day_number = var("day", 1) | int %}
{% if day_number <= 1 %}
select * from {{ ref("my_seed.v1") }}
{% elif day_number == 2 %}
select * from {{ ref("my_seed.v2") }}
{% elif day_number >= 3 %}
select * from {{ ref("my_seed.v3") }}
{% endif %}
{% endsnapshot %} Run the snapshots one after another and view the output in between: rm db.db
dbt seed
dbt snapshot --vars "{'day': 1, 'updated_at': '2023-07-23'}"
duckcli db.db -e "select id, col_1, col_2, col_3, date_trunc('day', dbt_valid_from)::date as dbt_valid_from, date_trunc('day', dbt_valid_to)::date as dbt_valid_to from my_snapshot order by dbt_valid_from, id" --table
dbt snapshot --vars "{'day': 2, 'updated_at': '2023-07-24'}"
duckcli db.db -e "select id, col_1, col_2, col_3, col_4, date_trunc('day', dbt_valid_from)::date as dbt_valid_from, date_trunc('day', dbt_valid_to)::date as dbt_valid_to from my_snapshot order by dbt_valid_from, id" --table
dbt snapshot --vars "{'day': 3, 'updated_at': '2023-07-25'}"
duckcli db.db -e "select id, col_1, col_2, col_3, col_4, date_trunc('day', dbt_valid_from)::date as dbt_valid_from, date_trunc('day', dbt_valid_to)::date as dbt_valid_to from my_snapshot order by dbt_valid_from, id" --table Output:
|
@dbeatty10 following up on this issue, have there been any new updates that can be shared, thanks |
Hi @owen-mc-git - we opened up a new issue for this #10235 so am closing this one as a dupe :) |
@owen-mc-git Hi! Just wanted to clarify - is this request specific to when you have a snapshot with |
@graciegoheen When I encountered the issue I was using the strategy: "check" and check_cols:'all; I di not do any test to see what would happen for the same scenario under strategy timestamp |
Thanks! I'll block off some time to confirm what happens with the |
The logic referred to in the potential solution above by @dbeatty10 has since moved to dbt-adapters. See #8906 and #9401. |
Is this your first time submitting a feature request?
Describe the feature
Summary,
Using DBT snapshots, if a field is removed from source no new records are created in the snapshot table to show that the prior values no longer exists in source. This is to request that this functionality is added/supported.
This functionality exists when a new field is added but not the reverse.
Expand:
When using DBT snapshots, if a new field added in the source, it will also be added to the snapshot table and a new record is created for each existing record in the table to show the new field added/updated at this new point in time.
However if a field is removed from source no new records are created in the snapshot table to show that the prior value no longer exists in source. When a net new record is added we do see a null value appear in the removed field location.
Not having a new record created for data that is removed gives a false sense that the source still has these values.
Example:
source adding a new column on the 24th
source removing a column and adding new record on the 25th
what I would expect to see in a true SCD table
Describe alternatives you've considered
No response
Who will this benefit?
users of snapshots who wish to see the true state of a source at the correct points in time
Are you interested in contributing this feature?
No response
Anything else?
FAQ section "What happens if I add new columns to my snapshot query?" https://docs.getdbt.com/docs/build/snapshots
The text was updated successfully, but these errors were encountered: