Overview

I'm not sure if this is really a feature request or if I'm misunderstanding something about how this (very helpful and much appreciated) package works.
The "source" tables (e.g. models, tests, etc.) where the dbt execution info is loaded via INSERT INTO statements are defined as incremental models. I understand on one level why this is done - if you just used a standard table materialization then every time the model was run the data would get removed, but the issue for us is that we run --full-refresh on our existing dbt project fairly often, i.e. once a week or so. We do this because we have a number of incremental models which cut down on query processing time and costs but occasionally there are some really late-arriving facts or other sneaky data cases that sneak past our incremental windowing filters, so just to be safe it's easier for us to do a full trunc and reload on our incremental tables every so often.
But... this causes a problem with this package, because it means we'll lose all the historical invocation information every time we do a `--full-refresh`. Yes, we could add exclusions to our dbt job invocations in our dev and prod environments, like `--exclude dbt-artifacts`, but that's fragile: it only works until somebody drops the exclusion while editing dbt job parameters in dbt Cloud. We want to keep all of that history over time and not have to worry about ever losing our dbt invocation history just because somebody accidentally misconfigured a job or didn't understand the implications.
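For the record, the fragile workaround would look something like this in the job's command (assuming the package is installed under the package name `dbt_artifacts`, which may vary by setup):

```sh
# Full refresh everything except the artifact models.
# One forgotten --exclude and the invocation history is gone.
dbt run --full-refresh --exclude dbt_artifacts
```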
Am I understanding the situation correctly here? Wouldn't others also want to ensure they keep the full invocation history even when running a full refresh? And in that case, wouldn't dbt snapshots be the proper mechanism for this? That is: keep the source tables as incremental (or even plain `table`) materializations, but put a snapshot on top of them to capture and persist the data.
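Something like this is what I have in mind. This is just a sketch; the snapshot name, target schema, unique key, and referenced model are assumptions on my part rather than anything the package actually ships:

```sql
-- Hypothetical snapshot over one of the package's source tables.
-- Rows captured here survive even if the underlying incremental
-- table is truncated by a --full-refresh.
{% snapshot dbt_invocations_snapshot %}

{{
    config(
        target_schema='dbt_artifacts_snapshots',
        unique_key='command_invocation_id',
        strategy='check',
        check_cols='all'
    )
}}

select * from {{ ref('invocations') }}  -- hypothetical source table name

{% endsnapshot %}
```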
Let me know if I misunderstood the intention and use case here; maybe I'm just not "getting" it.
What problem would this solve?
It would ensure persistence of dbt artifact data over time regardless of how other incremental models are being used and refreshed in the project.
Would you be willing to contribute?
Yes, but I'm not able to at this exact moment.
Just checking whether there's any feedback from anyone on this? I'm looking again at adding dbt-artifacts to our BigQuery project, since some of the previous issues around managing large numbers of tests seem to have been resolved, but I'm still scratching my head about the question I posted above.