[Feature]: Snapshots of "source" tables? #410

codigo-ergo-sum · 2023-11-30T04:14:07Z

Overview

I'm not sure if this is really a feature request or if I'm misunderstanding something about how this (very helpful and much appreciated) package works.

The "source" tables (e.g. models, tests, etc.) where the dbt execution info is loaded via INSERT INTO statements are defined as incremental models. I understand on one level why this is done - if you just used a standard table materialization then every time the model was run the data would get removed, but the issue for us is that we run --full-refresh on our existing dbt project fairly often, i.e. once a week or so. We do this because we have a number of incremental models which cut down on query processing time and costs but occasionally there are some really late-arriving facts or other sneaky data cases that sneak past our incremental windowing filters, so just to be safe it's easier for us to do a full trunc and reload on our incremental tables every so often.

But... This causes a problem with this package because it means we'll lose all the historical invocation information every time we do a --full-refresh. Yes we could do some exclusions on our dbt job invocations in our dev and prod environments like --exclude dbt-artifacts but that's fragile and vulnerable to somebody forgetting to not exclude those when they're editing dbt job parameters in dbt cloud. We want to keep all of that over time and not have to worry about ever losing all of our dbt invocation history just because somebody accidentally configured a job wrong or didn't understand.

Am I understanding the situation correctly here? Wouldn't others want to ensure that they keep the full invocation history even when running a full refresh? And in that case wouldn't dbt snapshots be the proper method to do this? So, have the source tables that could be incremental, or just regular table materializations, but then have a snapshot on top of them to capture and persist the data?

Let me know if I misunderstood the intention and use case here, maybe I'm not "getting" it.

What problem would this solve?

It would ensure persistence of dbt artifact data over time regardless of how other incremental models are being used and refreshed in the project.

Would you be willing to contribute?

Yes but not able to at this exact moment.

The text was updated successfully, but these errors were encountered:

codigo-ergo-sum · 2024-08-21T02:12:45Z

Just checking if any feedback from anyone on this? Looking back now to adding dbt-artifacts to our bigquery project since some of the previous issues around managing large numbers of tests seem to have been resolved but still scratching my head about the question I posted previously in this issue.

codigo-ergo-sum added the enhancement New feature or request label Nov 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Snapshots of "source" tables? #410

[Feature]: Snapshots of "source" tables? #410

codigo-ergo-sum commented Nov 30, 2023

codigo-ergo-sum commented Aug 21, 2024

[Feature]: Snapshots of "source" tables? #410

[Feature]: Snapshots of "source" tables? #410

Comments

codigo-ergo-sum commented Nov 30, 2023

Overview

What problem would this solve?

Would you be willing to contribute?

codigo-ergo-sum commented Aug 21, 2024