
Option to disable "skip downstream models on failure" #2142

Open
ian-whitestone opened this issue Feb 17, 2020 · 24 comments · May be fixed by #9020
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Refinement Maintainer input needed

Comments

@ian-whitestone

ian-whitestone commented Feb 17, 2020

Describe the feature

When a model run fails, all dependent, downstream model runs are skipped. This is a great default, but there can be some cases where you still want to run the downstream model.

For example, one case where I would actually want this default behaviour overridden is for jobs that rely on an infrequently updated dimension. Something like a country dimension or currency dimension which changes very rarely. Say you have a daily summary/rollup model that uses order transaction data and a country dimension. If the country dimension starts failing, I'd still want my model to keep running off the new order transaction data, and not have to wait until the responsible team fixes it.

Describe alternatives you've considered

Individually running models and keeping track of which ones ran successfully and which ones failed. This would be quite time consuming.

Who will this benefit?

Can't comment on how many people this will benefit, but the added flexibility is always a plus.

@ian-whitestone ian-whitestone added enhancement New feature or request triage labels Feb 17, 2020
@drewbanin drewbanin removed the triage label Feb 18, 2020
@drewbanin
Contributor

Thanks @ian-whitestone - the example use case you described is a really good one!

I think we can make this a model config. I'm picturing a config key called on_error with values like continue, skip_children, and abort.

The values map onto the following behavior:

  • skip_children - the default, and the current behavior. Skips children of this model
  • continue - try running child models, letting them fail or succeed as they will
  • abort - this is my addition, and it would stop the entire run (ie. skip all remaining nodes)

You buy this? I don't know that we want/need to include the abort parameter, but I thought it was worth discussing :)
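For illustration only, a hypothetical model-level config along these lines (on_error is a proposed key, not an existing dbt config, and the model/source names are placeholders) might look like:

-- models/dim_country.sql (hypothetical sketch; on_error is not an existing dbt config)
{{ config(
    materialized='table',
    on_error='continue'  -- proposed values: 'skip_children' (current default), 'continue', 'abort'
) }}

select * from {{ source('reference', 'country') }}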

@ian-whitestone
Author

ian-whitestone commented Feb 19, 2020

👍 this makes sense to me. The only alternative I can think of is having this specified at the downstream (dependent) model level, i.e. the same values but with different meanings:

  • skip - the default, and the current behaviour. do not run this model if any of the upstream dependencies fail
  • continue - try running this model even if the upstream dependencies fail

The advantage of this approach would be that it gives individual analysts/modellers the ability to specify how they want to handle errors for their specific model without dictating the response for everyone else.
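As a sketch of that alternative (using the key name on_upstream_error that comes up later in this thread; neither the key nor the model names exist in dbt today), the downstream model could declare:

-- models/daily_order_summary.sql (hypothetical sketch)
{{ config(
    on_upstream_error='continue'  -- proposed values: 'skip' (current default), 'continue'
) }}

select * from {{ ref('dim_country') }}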

@bashyroger

bashyroger commented Feb 20, 2020

What is the difference between:

  • skip_children - the default, and the current behavior. Skips children of this model

  • abort - this is my addition, and it would stop the entire run (ie. skip all remaining nodes)

Does abort do a ROLLBACK of the full tree (so, parents too?)

@drewbanin
Contributor

@bashyroger I'm not picturing abort doing any sort of rollback. I was instead thinking that it would compel dbt to skip all models that have not yet run, regardless of their relationship to the failed model. It's sort of a solution in search of a problem, and we probably shouldn't implement it until we have a better handle on some good use cases :)

@github-actions
Contributor

github-actions bot commented Dec 4, 2021

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Dec 4, 2021
@cmdugan13

@drewbanin I've been following this, since the use case that @ian-whitestone described would be useful to me as well. Curious if there has been any more work in this area. Thanks!

@smitsrr

smitsrr commented May 3, 2022

I would love this feature :-)

@uttam17

uttam17 commented May 3, 2022

This would be very helpful

@nick-heron-zip

+1 would be very helpful

@jtcohen6
Contributor

jtcohen6 commented May 13, 2022

Reopening this issue given the renewed interest!

Some of this is now possible with metadata, namely the result status selector, which can pick up skipped models to rerun (docs). Think of it as selecting all nodes where both are true: they were "skipped" because of an upstream failure, and they're tagged rerun_despite_upstream_failure.

dbt run --select result:skipped,rerun_despite_upstream_failure --state previous/run_results.json
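For that selector to work, the models you want to rerun would need to carry the tag, and the second invocation points --state at the run_results.json written by the failed run. For example (the model and tag names are just illustrative):

-- models/daily_order_summary.sql
{{ config(
    tags=['rerun_despite_upstream_failure']
) }}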

I'd say I'm open to this possibility of conditionally disabling "skip downstream models on failure." It's a core construct of dbt-core's execution model, but I don't know that it's an absolutely essential component of the opinionated workflow, or of DAGs more generally.

Where to configure?

I think it feels appropriate to support this config on either the parent node ("I'm worth stopping the DAG for"), or the child node ("I don't care about upstream failures"). By picking one OR the other, it's much simpler to reason about, and much simpler to implement.

I'm hesitant to support defining this configuration for the full matrix of parent-child relationships, as I think that can get unwieldy very quickly. I think it might be possible to deploy ephemeral "middle models" that would be "passthroughs" for the upstream model's data, while allowing you to customize this behavior. Similar to the discussion about dbt build and test failures in #4591. Unlike test failures, though, which access actual data, the ephemeral model would need some way to know that the upstream model errored. Query logs...?

How to configure?

There are two ways we could support this:

  • a per-model config, which controls behavior about this node's children when it fails
  • an overall "run-level" config (--skip-on-failures=...)

If we decide to support both, I'd see this being similar to how the --full-refresh flag / full_refresh model config work today (docs), where the model-level config (if set) takes precedence over the run-level flag.

How to implement?

For parents, that would look like adding a check here for result.node.config.get('on_error') == 'continue', before "marking dependent errors":

if result.status in self.MARK_DEPENDENT_ERRORS_STATUSES:
    if is_ephemeral:
        cause = result
    else:
        cause = None
    self._mark_dependent_errors(node.unique_id, result, cause)
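A rough sketch of that change (on_error is the proposed config, not one dbt recognizes today; everything else follows the snippet above):

# sketch: skip "marking dependent errors" when the failed parent opts in to 'continue'
if result.status in self.MARK_DEPENDENT_ERRORS_STATUSES:
    if result.node.config.get('on_error') != 'continue':
        cause = result if is_ephemeral else None
        self._mark_dependent_errors(node.unique_id, result, cause)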

For children, that would look like checking whether the "dependent node" has on_upstream_error: continue before marking it to skip:

for dep_node_id in self.graph.get_dependent_nodes(node_id):
    self._skipped_children[dep_node_id] = cause
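And a rough sketch for the child-side variant (on_upstream_error is the proposed config; this also assumes the dependent node can be looked up from the manifest):

# sketch: only mark a child as skipped if it has not opted in to 'continue'
for dep_node_id in self.graph.get_dependent_nodes(node_id):
    dep_node = self.manifest.nodes.get(dep_node_id)
    if dep_node is not None and dep_node.config.get('on_upstream_error') == 'continue':
        continue  # let this child attempt to run despite the upstream failure
    self._skipped_children[dep_node_id] = cause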

Next steps

I think there are still some open questions here—namely, whether parents or children are the right place to put this config. It'd be helpful to get more context on specific use cases, to motivate the choice!

Personally, I lean toward parents. This feels analogous to the "warning" behavior of tests, which say: I found some failures, which is good information to know, but it's not worth stopping the DAG for.

@jtcohen6 jtcohen6 reopened this May 13, 2022
@jtcohen6 jtcohen6 added Team:Execution help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed stale Issues that have gone stale labels May 13, 2022
@nick-heron-zip

Our context:

In our source layer we rename and recast all fields where necessary. One of the Salesforce tables (dimension) we ingest via Fivetran has 400+ columns. Less than 10 of these columns are used downstream in business-critical models. Recently one of the ~390 unused columns was dropped from the source schema, leading to a DAG failure and delayed business data while we fixed the error.

As a temporary solution we are no longer enforcing naming conventions or typecasting columns from Salesforce tables. Long-term we would like to trigger a warning instead of a failure when issues are detected in dimension tables.

@jtcohen6
Contributor

@nick-heron-zip Thanks for sharing the use case!

My instinct would be to include only the columns in your staging model that you are actually using downstream, those which are actually relevant to your "analytical universe." I realize it doesn't say that in the docs or discourse explicitly, but that's always been, for me, a significant consideration when writing a staging model.

I also know that I've occasionally used staging model definitions as a form of documentation / quick look-up to see what's available in the source—but that's why you can document and describe columns in sources, and dbt docs generate produces catalog info (including full column list) for sources, too.

@boje

boje commented Jun 21, 2022

Our context:

Our source data is JSON, and with Snowpipe it is loaded into a table with column type VARIANT. In some cases Snowflake recommends loading data into a VARIANT, since new schema changes won't break the load into Snowflake.
When staging this data, the JSON elements are flattened out into a table with a number of columns and data types.
If a column was defined as int but is now sent as char, the staging model will break.

Another scenario is when creating a merge statement in Snowflake: the rows are required to be unique, otherwise the error Duplicate row detected during DML action is raised.

In both cases dataset X returns an error and the dbt Cloud job stops. Datasets Z, Y, and Q work perfectly, but because X errors all models are blocked, even though some reports depend only on the Z and Y datasets.

Hence we are wondering how to avoid the all-or-nothing scenario we have today.
One option is to split the job into smaller jobs, but that creates dependencies across jobs, which is not supported in dbt Cloud today.

@jens-koster

our use case:
Some of our sources are Google Sheets of rarely changing data, edited by humans, and notoriously prone to breaking due to typos. This data is not worth skipping the downstream models for. I would stage this source to a table, which would occasionally fail without affecting downstream loads. On failure, the downstream loads would use the data I materialised in the previous run. That works well enough, and I will catch the problem using freshness alerts on the table of materialised data.
I can achieve this by running the staging of these tables in a separate step before the main run. Having this feature would make it a little neater and less work.
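For example, with the sheet-backed staging models grouped under an illustrative tag, that separate step could be:

dbt run --select tag:google_sheet_sources
dbt run --exclude tag:google_sheet_sources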

I can also see myself using this setting as a quick fix to get the load running again after failure, just until I can find the root cause and fix it properly. Excluding the model in the dbt run command means messing with the devops/build stuff, risking the entire load. Setting a config on one model is low impact and I can do it in my normal work flow of editing models.

@davidsr2r

We're getting pretty desperate for this. There are all kinds of models where we pull in data from a variety of data providers, and one of those providers having out-of-date data (due to e.g. a schema breakage) doesn't invalidate the rest of the providers' data in the downstream models.

I'm about to have to break up our DAG purely to avoid this breakage, which nullifies a significant part of dbt's value, in DAG management.

@jtcohen6
Contributor

@boje @jens-koster @davidsr2r I've marked this issue as help_wanted, meaning that we're not planning to prioritize it ourselves in the next few months, but we'd be happy to have a motivated community member work on contributing it.

Do you have thoughts on the questions I raised in my comment above, around how you'd want to configure this? Would you look for an all-or-nothing runtime config (--no-skip-on-failures), or a config you can specify in project code for specific upstream/parent models?

@davidsr2r

@jtcohen6

We'd be needing a per-model config, so that the rerun of skipped models is intentional and explicit, and avoids downstream runs that shouldn't happen.

Re: parents vs. children, I'd say it probably depends on your use-case preferences. If it's on parents, you need to configure it in fewer places and the information is specific to that model, which makes a lot of sense. However, if it's on children, the query that needs to be aware of the possibility of failure is in the same file that contains the config option, making the dependency and failure case more explicit for the developer. I always prefer maintainability, so while I could be persuaded otherwise, I'd want us to use the latter option. Why not support both, or just one or the other?

@jtcohen6 jtcohen6 added the Refinement Maintainer input needed label Nov 28, 2022
@Inviso-LiseBohr

@jtcohen6
We have a similar situation where we are waiting for the responsible team to fix an issue causing a staging model to fail. This model contains very rarely changing data, so we can live without having it updated for some time.
In the meantime the rest of our DAG is skipped, and none of our most important models are updated until the issue is fixed.

In our situation it would make sense to have the configuration on the parent model, since a large number of models reference the failing model.

We would appreciate the feature described earlier in this thread: an on_error key with values like skip_children and continue.
Is there a good way to upvote the implementation of this?

Btw, our workaround for now is to just temporarily exclude the failing model from our dbt build;
the downside is that it's a little manual and we need to remember to remove the exclusion again once the issue is fixed.
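For example (the model name is just illustrative):

dbt build --exclude stg_rarely_changing_dim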

@VDFaller

My only worry about putting the config in the child would be this use case

  • We have a chain A->B->C
  • B is configured to skip if upstream fails
  • C is configured not to skip if upstream fails
  • A fails

So C would only have data from B as of the last time that A passed. I think this could lead analysts (or, more likely, the business side) to wrongly trust that the data in C is up to date (even though it will be stale for the B portion), because some loaded_date column is up to date.

@pengyusong

our use case:
Our BI tool uses several wide tables to show metrics, and each wide table is composed of multiple upstream tables. When one upstream table fails, the wide table fails too, so the impact is significant. I want to be able to ignore some upstream table failures so that the wide table can still run successfully, which would give me enough time to fix the abnormal upstream model.

@CitationJamespaul

CitationJamespaul commented Nov 1, 2023

We have a large project building our entire data warehouse - there are very few points of failure that would make us want to terminate our entire downstream build.

We would rather have downstream continuation by default and define our points of failure with tests.

I'd be happy with a --no-skip-on-failures option.

@maxrostron

Would still be very interested in the skip_children, abort, continue proposal and in being able to specify that at the top of specific models, especially as we use dbt as part of Dagster, so we don't always trigger dbt flows using command-line flags like --no-skip-on-failures.

Would be a game changer for our flow. At the moment we have to do a lot of convoluted workarounds.

@jtcohen6
Contributor

I've left a comment on the linked PR:

After rereading the use cases described in the comments above (thank you everyone!), I'm interested in moving forward with this as a model-level config set on "parent" models, whose failures are generally flaky ones caused by changes in unreliable upstream data sources. I don't believe we should support this on "child" models (at all), or as a global flag broadly applied (yet).

@mahiki

mahiki commented Sep 22, 2024

I run analytics pipelines for reports, not data engineering pipelines.

If a test fails (especially anomaly detection tests), I want the whole pipeline to run. I'll get an alert about the anomaly or other failure and deal with it.

Two bad things about current behavior:

  1. I miss downstream errors that are unrelated to the first test failure. Yes, there can be two unrelated failures, and I want to see them both after a single build or run/test. I don't want to fix one thing, rebuild, find a new error, fix, rebuild, and so on, multiple times.
  2. What I cannot have is a stack of empty reports with nothing in them for days while I work out the bug in one small part of the outputs. A test failure in my domain is something the business team would prefer to talk about rather than skip review of many blank pages of reports, or skip the business review altogether.

edit:
Oh, I would just do a dbt run, and then a dbt test in a different process that could handle failures in a customized way. Sorry, I just started using dbt and Elementary. Carry on.
