[CT-3569] [Feature] Add a --favor-state-selector flag that supports node selection syntax #9410
Hi! Thanks for opening this issue. I'm thinking through this, and wondering if you can use the If you were to run
Can you give that a go, and let me know what you think? |
Hi Grace, thank you for taking a look! I believe using
It does achieve the desired defer selection behaviour but compared to a --favor-state-selector has two downsides:
|
Jumping in to offer an idea with one more layer -- wanna give it a shot @d-cole ?
Producing multi-step artifacts
Let's say you want to:
Then you can do the following:
dbt compile --select model_b+ --defer --favor-state --state prod-run-artifacts --target dev
rm -rf hybrid-run-artifacts
cp -r target hybrid-run-artifacts
dbt run --select model_c --defer --favor-state --state hybrid-run-artifacts --target dev
For me, this gives:
You can repeat this approach successively with different selectors and targets to create an N-layer burrito with all your desired ingredients. |
Here's a tangible example that uses the First, suppose we have a
-- model_d: {{ this }}
-- depends on:
-- model_a: {{ ref("model_a") }}
-- model_b: {{ ref("model_b") }}
-- model_c: {{ ref("model_c") }}
select 1 as id
We can layer a series of target states like this to mix-n-match where the references are coming from:
dbt compile --target prod --target-path prod-run-artifacts
dbt compile --select model_b+ --defer --favor-state --state prod-run-artifacts --target dev1 --target-path hybrid1-run-artifacts
dbt compile --select model_c --defer --favor-state --state hybrid1-run-artifacts --target dev2 --target-path hybrid2-run-artifacts
dbt compile --select model_d --defer --favor-state --state hybrid2-run-artifacts --target dev2
The final command gives the following output:
|
Hey @dbeatty10, thank you for the detailed response. I didn't know you could create an n-layer burrito like that; it is really cool! This approach does allow for the described node selection without any of the downsides of the alternatives mentioned above. However, a --favor-state-selector does seem like a simpler way to achieve the same behaviour. It is also much closer to what many users are used to (e.g. swapping table refs in a SQL query or input paths in a Python job).
vs.
I understand if this isn't prioritized as it is not new functionality. No worries about that, thanks again for teaching me about the n-layer burrito! |
@d-cole Thanks for describing a tricky scenario so well and exploring each of the edge cases to consider 🧠 Indeed, the proposed But since we can already support the end goal with current functionality, I'm going to close this as "not planned". |
@dbeatty10 I'm sorry to say that I've removed the possibility of the n-layer burrito, with some recent changes we made to rationalize the behavior of deferral (and resolve a thorny bug with unit tests):
In older versions of dbt-core, when a node was deferred, its manifest entry was completely overwritten with the node from the state manifest. In dbt Core v1.8+, rather than completely overwriting the node, we're simply going to add an attribute ( (cc @MichelleArk - this is what we were talking about a few weeks ago) Going back to the original example:
Rather than dropping specific tables (difficult to manage and potentially expensive to recompute), I wonder if a more straightforward approach might be to switch target schemas, and use |
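To make the deferral change described in this comment concrete, here is a minimal Python sketch of the two behaviours. It is a simplified model, not dbt-core's actual code: nodes are plain dictionaries, and the function names `merge_overwrite` and `merge_with_defer_relation` are hypothetical. The attribute name `defer_relation` is taken from the macro later in this thread.

```python
import copy


def merge_overwrite(current, state, deferred_ids):
    """Older behaviour (simplified): a deferred node's entry is replaced
    wholesale by the node from the state manifest."""
    merged = copy.deepcopy(current)
    for unique_id in deferred_ids:
        if unique_id in state:
            merged[unique_id] = copy.deepcopy(state[unique_id])
    return merged


def merge_with_defer_relation(current, state):
    """v1.8+ behaviour (simplified): the current node is kept, and the state
    node's location is attached as an extra `defer_relation` attribute."""
    merged = copy.deepcopy(current)
    for unique_id, node in merged.items():
        if unique_id in state:
            other = state[unique_id]
            node["defer_relation"] = {
                "database": other["database"],
                "schema": other["schema"],
                "alias": other["alias"],
            }
    return merged


# Hypothetical manifests: the same model built in a dev schema and a prod schema.
current = {"model.proj.model_a": {"database": "db", "schema": "dev", "alias": "model_a"}}
state = {"model.proj.model_a": {"database": "db", "schema": "prod", "alias": "model_a"}}

old_style = merge_overwrite(current, state, {"model.proj.model_a"})
new_style = merge_with_defer_relation(current, state)
```

In this model, the old-style merge leaves the node pointing at the prod schema, so artifacts written afterwards carry the prod location forward into the next layer. The new-style merge keeps the node's own dev location and only records prod under `defer_relation`, which is one way to see why the layered approach above no longer composes.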
There are two scenarios that come to mind.
The clone approach does work and I've seen a fair bit of its use. The downside of the clone approach is:
With referencing upstreams from a different target, the experience I've found to be the most intuitive is one that is closest to swapping out table references in SQL. I suspect this is because that is a very common thing to do when iterating on SQL outside of dbt. The closer the defer experience can be to the flexibility that provides, the more use cases it will address. Due to the complexity of defer's interface, I've come across a few solutions that just override ref or generate_schema/database_name in order to provide a |
@d-cole Really helpful, thank you! I've been reflecting more on this, especially now that the heavy-lifting conditional logic for Regarding the two workflows you've outlined:
This is the part that resonated most with me:
What if this looked exactly like swapping out the reference within your SQL? If it's part of my debugging of This is pretty ugly, though not as bad as I expected:
-- macros/ref_from_state.sql
{% macro ref_from_state(model_name) %}
-- at parse time, just return a simple ref to capture the dependency
{{ return(ref(model_name)) if not execute }}
{% for node in graph.nodes.values() %}
{% if node.name == model_name %}
-- favor defer_relation if available, unless the upstream model is also selected in current run
{% set rel = node.defer_relation
if (node.defer_relation and node.unique_id not in selected_resources)
else node %}
{{ return(api.Relation.create(rel.database, rel.schema, rel.alias)) }}
{% endif %}
{% endfor %}
{% endmacro %}
-- models/model_c.sql
with model_a as (
-- select * from {{ ref('model_a') }}
-- I am manually swapping this to prefer the defer_relation defined in --state manifest
select * from {{ ref_from_state('model_a') }}
),
model_b as (
select * from {{ ref('model_b') }}
),
...
I think my primary hesitation stems from the number of flags we already have for very similar (and complex) functionality:
Both |
@jtcohen6 Thank you for looking into this!
This can achieve the desired behaviour but requires altering the SQL which seems like the toil defer is aiming to remove. I could just swap {{ ref('model_a') }} to the production fqn and use defer with roughly the same effort as swapping to this macro.
Good point, the cli flags can be overwhelming. What do you think of this?
|
Is this your first time submitting a feature request?
Describe the feature
It is often desirable to defer some but not all upstream parents. The current behaviour of --favor-state does not easily allow for node-specific selection. For example, consider the following scenario:
Alternatives:
dbt run -s model_c --defer --favor-state --state prod-run-artifacts
does not work as it will select model_b (prod) unless model_b (prod) is dropped.
dbt run -s model_c --defer --state prod-run-artifacts
cannot achieve this behaviour without first dropping model_a (dev). While the cost of dropping model_a (dev) seems trivial in this example, it becomes increasingly cumbersome as the number of inputs grows. This requires the user to spend time to identify what upstreams exist in their dev target and to drop them.
dbt run -s model_b+ --defer --favor-state --state prod-run-artifacts
will work but requires model_b and model_a to be executed in the same command. This can cause issues when the upstream dev target models have a long runtime. Additionally this can waste compute if the user has already recently run model_b in their dev target.
Desired Behaviour:
This experience could be improved by defer supporting favor state node selection. Considering the example above, what this could look like is shown below. Ideally --favor-state-selector would support all the node selection syntax.
dbt run -s model_c --defer --favor-state-selector model_a --state prod-run-artifacts
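The difference between the alternatives and the desired behaviour can be sketched as a small resolution function. This is a simplified model of how a `ref` is resolved under `--defer` and `--defer --favor-state`, not dbt's implementation. The `favor_state_nodes` parameter stands in for the proposed `--favor-state-selector` flag, which does not exist in dbt, and the assumed scenario is that model_c is selected while model_a and model_b exist in both dev and prod.

```python
def resolve_ref(node, *, selected, in_state, exists_in_dev,
                favor_state=False, favor_state_nodes=frozenset()):
    """Return "dev" or "state" for where a ref to `node` would point.

    Simplified rules:
    - selected nodes always resolve to the current (dev) target;
    - nodes missing from the state manifest cannot be deferred;
    - with favor_state, every other node resolves to the state relation;
    - favor_state_nodes (hypothetical) applies that preference per node;
    - otherwise defer only falls back to state when dev has no relation.
    """
    if node in selected or node not in in_state:
        return "dev"
    if favor_state or node in favor_state_nodes:
        return "state"
    return "dev" if node in exists_in_dev else "state"


# Assumed scenario: model_c is being run; both parents exist in dev and prod.
scenario = dict(
    selected={"model_c"},
    in_state={"model_a", "model_b", "model_c"},
    exists_in_dev={"model_a", "model_b"},
)

defer_only = {m: resolve_ref(m, **scenario) for m in ("model_a", "model_b")}
favor_all = {m: resolve_ref(m, favor_state=True, **scenario)
             for m in ("model_a", "model_b")}
favor_some = {m: resolve_ref(m, favor_state_nodes={"model_a"}, **scenario)
              for m in ("model_a", "model_b")}
```

Under these assumptions, plain `--defer` resolves both parents to dev, `--favor-state` resolves both to prod, and only the per-node preference gives model_a from prod together with model_b from dev.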
Who will this benefit?
Users with large models that take a long time and are expensive to recompute.
Are you interested in contributing this feature?
Yes