-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a no-op exposure runner #11082
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #11082 +/- ##
==========================================
- Coverage 88.96% 88.91% -0.05%
==========================================
Files 183 187 +4
Lines 23933 23970 +37
==========================================
+ Hits 21291 21312 +21
- Misses 2642 2658 +16
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@@ -1441,6 +1441,12 @@ def same_contents(self, old: Optional["Exposure"]) -> bool: | |||
def group(self): | |||
return None | |||
|
|||
def __post_serialize__(self, dct: Dict, context: Optional[Dict] = None): | |||
dct = super().__post_serialize__(dct, context) | |||
if "_event_status" in dct: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this field is never written out.
Fixed the failing test to account for this field instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't implying this was unnecessary, just moreso trying to understand what added _event_status and why the change was made.
Digging into this I see its because of the NodeInfoMixin
inheritance change to Exposure
in this PR. I think we should do what is consistent with other nodes that inherit from NodeInfoMixin. It looks like the post_serialize implementation was the same as what ParsedNode did:
dbt-core/core/dbt/contracts/graph/nodes.py
Lines 282 to 286 in 1b7d9b5
def __post_serialize__(self, dct: Dict, context: Optional[Dict] = None): | |
dct = super().__post_serialize__(dct, context) | |
if "_event_status" in dct: | |
del dct["_event_status"] | |
return dct |
I think keeping the __post_serialize__
change makes sense then 👍 Probably the best way to address this is actually implement the method on NodeInfoMixin itself, but I'd do that in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no sweat, just reverted the commit here
This reverts commit f6c8715.
Not strictly necessary to merge this PR, but let's just remember to make an update to the schemas in schemas.getdbt.com associated with this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left two comments, both of which confirm the behavior already present in this PR.
We manually confirmed that this should not run into the same issues we saw when adding hooks to the on-run-end
{{ results }}
context (#10699, #10885) because the structure of the no-op exposure result matches the structure of all other node results. Trying to access node-level properties that exist for other executable nodes, and do not exist for exposures (e.g. schema
), will simply return a null value.
# dbt_project.yml
on-run-end:
- "{% for result in results %} {{ log('The value of result.node.schema is: ' ~ result.node.schema, info = true) }} {% endfor %}"
🎩 Looks good:
% dbt retry
10:07:40 Running with dbt=1.10.0-a1
10:07:40 Registered adapter: duckdb=1.8.4
10:07:41 Found 1 seed, 1 operation, 1 model, 1 exposure, 421 macros
10:07:41 Nothing to do. Try checking your model configs and model specification args
(env) jerco@Jeremy-Cohen testy % dbt build -s my_model+
10:08:22 Running with dbt=1.10.0-a1
10:08:22 Registered adapter: duckdb=1.8.4
10:08:22 Unable to do partial parsing because a project config has changed
10:08:23 Found 1 model, 1 seed, 1 operation, 1 exposure, 421 macros
10:08:23
10:08:23 Concurrency: 1 threads (target='dev')
10:08:23
10:08:23 1 of 2 START sql table model main.my_model ..................................... [RUN]
10:08:23 1 of 2 ERROR creating sql table model main.my_model ............................ [ERROR in 0.04s]
10:08:23 2 of 2 SKIP exposure my_exposure ............................................... [SKIP]
10:08:23
10:08:23 The value of result.node.schema for model.testy.my_model is: main
10:08:23 The value of result.node.schema for exposure.testy.my_exposure is:
10:08:23 1 of 1 START hook: testy.on-run-end.0 .......................................... [RUN]
10:08:23 1 of 1 OK hook: testy.on-run-end.0 ............................................. [OK in 0.00s]
10:08:23
10:08:23 Finished running 1 exposure, 1 project hook, 1 table model in 0 hours 0 minutes and 0.13 seconds (0.13s).
10:08:23
10:08:23 Completed with 1 error, 0 partial successes, and 0 warnings:
10:08:23
10:08:23 Runtime Error in model my_model (models/example/my_model.sql)
Parser Error: syntax error at or near "bad_syntax"
10:08:23
10:08:23 Done. PASS=1 WARN=0 ERROR=1 SKIP=1 NO-OP=0 TOTAL=3
# fix syntax error
% dbt retry
10:08:52 Running with dbt=1.10.0-a1
10:08:52 Registered adapter: duckdb=1.8.4
10:08:52 Found 1 seed, 1 operation, 1 model, 1 exposure, 421 macros
10:08:52
10:08:52 Concurrency: 1 threads (target='dev')
10:08:52
10:08:52 1 of 2 START sql table model main.my_model ..................................... [RUN]
10:08:52 1 of 2 OK created sql table model main.my_model ................................ [OK in 0.04s]
10:08:52 2 of 2 NO-OP exposure my_exposure .............................................. [NO-OP in 0.00s]
10:08:52
10:08:52 The value of result.node.schema for model.testy.my_model is: main
10:08:52 The value of result.node.schema for exposure.testy.my_exposure is:
10:08:52 1 of 1 START hook: testy.on-run-end.0 .......................................... [RUN]
10:08:52 1 of 1 OK hook: testy.on-run-end.0 ............................................. [OK in 0.00s]
10:08:52
10:08:52 Finished running 1 exposure, 1 project hook, 1 table model in 0 hours 0 minutes and 0.11 seconds (0.11s).
10:08:52
10:08:52 Completed successfully
10:08:52
10:08:52 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 NO-OP=1 TOTAL=3
% dbt retry
10:08:55 Running with dbt=1.10.0-a1
10:08:55 Registered adapter: duckdb=1.8.4
10:08:55 Found 1 seed, 1 operation, 1 model, 1 exposure, 421 macros
10:08:55 Nothing to do. Try checking your model configs and model specification args
In run_results.json
:
{
"status": "no-op",
"timing": [
{
"name": "compile",
"started_at": "2024-12-12T10:00:30.047130Z",
"completed_at": "2024-12-12T10:00:30.047131Z"
},
{
"name": "execute",
"started_at": "2024-12-12T10:00:30.047287Z",
"completed_at": "2024-12-12T10:00:30.047290Z"
}
],
"thread_id": "Thread-1",
"execution_time": 0.0006129741668701172,
"adapter_response": {},
"message": "NO-OP",
"failures": 0,
"unique_id": "exposure.testy.my_exposure",
"compiled": null,
"compiled_code": null,
"relation_name": null,
"batch_results": null
}
# no-op | ||
return RunResult( | ||
node=compiled_node, | ||
status=RunStatus.NoOp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will have the effect of changing the existing result behavior of saved queries. Currently, their RunResult
returns RunStatus.Success
, and this will instead return RunStatus.NoOp
. This makes sense to me, and I do not believe it is a breaking change that we need to put behind a behavior-change flag 👍
|
||
|
||
class RunStatus(StrEnum): | ||
Success = NodeStatus.Success | ||
Error = NodeStatus.Error | ||
Skipped = NodeStatus.Skipped | ||
PartialSuccess = NodeStatus.PartialSuccess | ||
NoOp = NodeStatus.NoOp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aranke For the purposes of dbt retry
, should NoOp
be included in RETRYABLE_STATUSES?
Put differently: If my selection criteria includes a set of saved queries and --no-export-saved-queries
(docs), those saved queries will all have no-op statuses in run_results.json
. If I rerun with dbt retry --export-saved-queries
, should dbt reprocess those saved queries, this time with triggers? (The same thing will go for active exposure.)
In order to make that happen, I think we'd also need to include --export-saved-queries
(and the analogous config for exposures) in ALLOW_CLI_OVERRIDE_FLAGS.
My first intuition here was: Because NoOp
is not a failure (or a Skip
due to an upstream failure), it should not be retryable. But it also is not Success
.
- The benefit of making
NoOp
retryable (and adding those CLI flags) is a slightly more ergonomic mechanism to quickly "heal" a run that accidentally missed including that config. - The harm of including them is extra noise in the logs (dbt is retrying more things than expected, only to no-op most of them).
When we discussed this the other week, I think we landed on including RunStatus.NoOp
in the retryable statuses — but after reasoning through this above, I don't think we should include it (= leave the code as it is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this assessment because:
- If an exposure has a failure/skipped status, it will be retried automatically (I will add a test for this case).
- Its result won’t change unless the project config also changes (unlikely IME).
This is also a change we can implement fairly easily if we see a need for it, so I’ll merge the code in as-is for now.
Problem
We want to make exposures executable with a no-op status in run results and log line, similar to saved queries.
Solution
NoOp
run statusExposure
extendNodeInfoMixin
to make it executable inbuild
NoOpRunner
in a newdbt.runner
namespace, which uses theNoOp
run status aboveSavedQueryRunner
inherit fromNoOpRunner
and move it to thedbt.runner
namespaceExposureRunner
that also inherits from theNoOpRunner
Sample output
Logs
Run Results
Checklist