Create a no-op exposure runner #11082

aranke · 2024-12-02T14:03:16Z

Problem

We want to make exposures executable with a no-op status in run results and log line, similar to saved queries.

Solution

Create a new NoOp run status
Make Exposure extend NodeInfoMixin to make it executable in build
Create a new NoOpRunner in a new dbt.runner namespace, which uses the NoOp run status above
Make SavedQueryRunner inherit from NoOpRunner and move it to the dbt.runner namespace
Create a new ExposureRunner that also inherits from the NoOpRunner
Add new tests for run status for both exposures and saved queries
Add tests for selectors for exposures

Sample output

Logs

❯ dbt build
12:39:45  Running with dbt=1.10.0-a1
12:39:45  Registered adapter: duckdb=1.9.1-dev28
12:39:46  Unable to do partial parsing because a project config has changed
12:39:46  Found 5 models, 3 seeds, 20 data tests, 1 exposure, 422 macros, 1 group
...
12:39:47  26 of 29 NO-OP exposure weekly_jaffle_metrics .................................. [NO-OP in 0.00s]
...
12:39:47  Finished running 1 exposure, 3 seeds, 2 table models, 20 data tests, 3 view models in 0 hours 0 minutes and 0.44 seconds (0.44s).
12:39:47  
12:39:47  Completed successfully
12:39:47  
12:39:47  Done. PASS=28 WARN=0 ERROR=0 SKIP=0 NO-OP=1 TOTAL=29

Run Results

❯ jq .results target/run_results.json
[
  ...
  {
    "status": "no-op",
    "timing": [
      {
        "name": "compile",
        "started_at": "2024-12-03T12:39:47.125472Z",
        "completed_at": "2024-12-03T12:39:47.125474Z"
      },
      {
        "name": "execute",
        "started_at": "2024-12-03T12:39:47.129151Z",
        "completed_at": "2024-12-03T12:39:47.129158Z"
      }
    ],
    "thread_id": "Thread-17 (worker)",
    "execution_time": 0.01341700553894043,
    "adapter_response": {},
    "message": "NO-OP",
    "failures": 0,
    "unique_id": "exposure.jaffle_shop.weekly_jaffle_metrics",
    "compiled": null,
    "compiled_code": null,
    "relation_name": null,
    "batch_results": null
  },
  ...
]

Checklist

I have read the contributing guide and understand what's expected of me.
I have run this code in development, and it appears to resolve the stated issue.
This PR includes tests, or tests are not required or relevant for this PR.
This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
This PR includes type annotations for new and modified functions.

github-actions · 2024-12-02T14:03:31Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

codecov · 2024-12-02T14:09:25Z

Codecov Report

Attention: Patch coverage is 94.73684% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.91%. Comparing base (03fdb4c) to head (ef3d8d4).
Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #11082      +/-   ##
==========================================
- Coverage   88.96%   88.91%   -0.05%     
==========================================
  Files         183      187       +4     
  Lines       23933    23970      +37     
==========================================
+ Hits        21291    21312      +21     
- Misses       2642     2658      +16

Flag	Coverage Δ
integration	`86.21% <94.73%> (-0.13%)`	⬇️
unit	`61.98% <77.19%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
Unit Tests	`61.98% <77.19%> (+0.02%)`	⬆️
Integration Tests	`86.21% <94.73%> (-0.13%)`	⬇️

MichelleArk · 2024-12-03T16:00:58Z

core/dbt/contracts/graph/nodes.py

@@ -1441,6 +1441,12 @@ def same_contents(self, old: Optional["Exposure"]) -> bool:
    def group(self):
        return None

+    def __post_serialize__(self, dct: Dict, context: Optional[Dict] = None):
+        dct = super().__post_serialize__(dct, context)
+        if "_event_status" in dct:


why is this necessary?

You're right, this field is never written out.
Fixed the failing test to account for this field instead.

I wasn't implying this was unnecessary, just moreso trying to understand what added _event_status and why the change was made.

Digging into this I see its because of the NodeInfoMixin inheritance change to Exposure in this PR. I think we should do what is consistent with other nodes that inherit from NodeInfoMixin. It looks like the post_serialize implementation was the same as what ParsedNode did:

dbt-core/core/dbt/contracts/graph/nodes.py

Lines 282 to 286 in 1b7d9b5

def __post_serialize__(self, dct: Dict, context: Optional[Dict] = None):

dct = super().__post_serialize__(dct, context)

if "_event_status" in dct:

del dct["_event_status"]

return dct

I think keeping the __post_serialize__ change makes sense then 👍 Probably the best way to address this is actually implement the method on NodeInfoMixin itself, but I'd do that in a follow-up.

no sweat, just reverted the commit here

core/dbt/runner/exposure_runner.py

core/dbt/runner/saved_query_runner.py

tests/functional/adapter/hooks/test_on_run_hooks.py

This reverts commit f6c8715.

MichelleArk · 2024-12-03T17:58:08Z

Not strictly necessary to merge this PR, but let's just remember to make an update to the schemas in schemas.getdbt.com associated with this change.

jtcohen6

I left two comments, both of which confirm the behavior already present in this PR.

We manually confirmed that this should not run into the same issues we saw when adding hooks to the on-run-end {{ results }} context (#10699, #10885) because the structure of the no-op exposure result matches the structure of all other node results. Trying to access node-level properties that exist for other executable nodes, and do not exist for exposures (e.g. schema), will simply return a null value.

# dbt_project.yml
on-run-end:
  - "{% for result in results %} {{ log('The value of result.node.schema is: ' ~ result.node.schema, info = true) }} {% endfor %}"

🎩 Looks good:

% dbt retry
10:07:40  Running with dbt=1.10.0-a1
10:07:40  Registered adapter: duckdb=1.8.4
10:07:41  Found 1 seed, 1 operation, 1 model, 1 exposure, 421 macros
10:07:41  Nothing to do. Try checking your model configs and model specification args
(env) jerco@Jeremy-Cohen testy % dbt build -s my_model+
10:08:22  Running with dbt=1.10.0-a1
10:08:22  Registered adapter: duckdb=1.8.4
10:08:22  Unable to do partial parsing because a project config has changed
10:08:23  Found 1 model, 1 seed, 1 operation, 1 exposure, 421 macros
10:08:23  
10:08:23  Concurrency: 1 threads (target='dev')
10:08:23  
10:08:23  1 of 2 START sql table model main.my_model ..................................... [RUN]
10:08:23  1 of 2 ERROR creating sql table model main.my_model ............................ [ERROR in 0.04s]
10:08:23  2 of 2 SKIP exposure my_exposure ............................................... [SKIP]
10:08:23  
10:08:23  The value of result.node.schema for model.testy.my_model is: main
10:08:23  The value of result.node.schema for exposure.testy.my_exposure is: 
10:08:23  1 of 1 START hook: testy.on-run-end.0 .......................................... [RUN]
10:08:23  1 of 1 OK hook: testy.on-run-end.0 ............................................. [OK in 0.00s]
10:08:23  
10:08:23  Finished running 1 exposure, 1 project hook, 1 table model in 0 hours 0 minutes and 0.13 seconds (0.13s).
10:08:23  
10:08:23  Completed with 1 error, 0 partial successes, and 0 warnings:
10:08:23  
10:08:23    Runtime Error in model my_model (models/example/my_model.sql)
  Parser Error: syntax error at or near "bad_syntax"
10:08:23  
10:08:23  Done. PASS=1 WARN=0 ERROR=1 SKIP=1 NO-OP=0 TOTAL=3

# fix syntax error

% dbt retry             
10:08:52  Running with dbt=1.10.0-a1
10:08:52  Registered adapter: duckdb=1.8.4
10:08:52  Found 1 seed, 1 operation, 1 model, 1 exposure, 421 macros
10:08:52  
10:08:52  Concurrency: 1 threads (target='dev')
10:08:52  
10:08:52  1 of 2 START sql table model main.my_model ..................................... [RUN]
10:08:52  1 of 2 OK created sql table model main.my_model ................................ [OK in 0.04s]
10:08:52  2 of 2 NO-OP exposure my_exposure .............................................. [NO-OP in 0.00s]
10:08:52  
10:08:52  The value of result.node.schema for model.testy.my_model is: main
10:08:52  The value of result.node.schema for exposure.testy.my_exposure is: 
10:08:52  1 of 1 START hook: testy.on-run-end.0 .......................................... [RUN]
10:08:52  1 of 1 OK hook: testy.on-run-end.0 ............................................. [OK in 0.00s]
10:08:52  
10:08:52  Finished running 1 exposure, 1 project hook, 1 table model in 0 hours 0 minutes and 0.11 seconds (0.11s).
10:08:52  
10:08:52  Completed successfully
10:08:52  
10:08:52  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 NO-OP=1 TOTAL=3

% dbt retry             
10:08:55  Running with dbt=1.10.0-a1
10:08:55  Registered adapter: duckdb=1.8.4
10:08:55  Found 1 seed, 1 operation, 1 model, 1 exposure, 421 macros
10:08:55  Nothing to do. Try checking your model configs and model specification args

In run_results.json:

{
            "status": "no-op",
            "timing": [
                {
                    "name": "compile",
                    "started_at": "2024-12-12T10:00:30.047130Z",
                    "completed_at": "2024-12-12T10:00:30.047131Z"
                },
                {
                    "name": "execute",
                    "started_at": "2024-12-12T10:00:30.047287Z",
                    "completed_at": "2024-12-12T10:00:30.047290Z"
                }
            ],
            "thread_id": "Thread-1",
            "execution_time": 0.0006129741668701172,
            "adapter_response": {},
            "message": "NO-OP",
            "failures": 0,
            "unique_id": "exposure.testy.my_exposure",
            "compiled": null,
            "compiled_code": null,
            "relation_name": null,
            "batch_results": null
        }

jtcohen6 · 2024-12-12T09:30:35Z

core/dbt/runners/no_op_runner.py

+        # no-op
+        return RunResult(
+            node=compiled_node,
+            status=RunStatus.NoOp,


This will have the effect of changing the existing result behavior of saved queries. Currently, their RunResult returns RunStatus.Success, and this will instead return RunStatus.NoOp. This makes sense to me, and I do not believe it is a breaking change that we need to put behind a behavior-change flag 👍

jtcohen6 · 2024-12-12T09:39:51Z

core/dbt/artifacts/schemas/results.py



 class RunStatus(StrEnum):
    Success = NodeStatus.Success
    Error = NodeStatus.Error
    Skipped = NodeStatus.Skipped
    PartialSuccess = NodeStatus.PartialSuccess
+    NoOp = NodeStatus.NoOp


@aranke For the purposes of dbt retry, should NoOp be included in RETRYABLE_STATUSES?

Put differently: If my selection criteria includes a set of saved queries and --no-export-saved-queries (docs), those saved queries will all have no-op statuses in run_results.json. If I rerun with dbt retry --export-saved-queries, should dbt reprocess those saved queries, this time with triggers? (The same thing will go for active exposure.)

In order to make that happen, I think we'd also need to include --export-saved-queries (and the analogous config for exposures) in ALLOW_CLI_OVERRIDE_FLAGS.

My first intuition here was: Because NoOp is not a failure (or a Skip due to an upstream failure), it should not be retryable. But it also is not Success.

The benefit of making NoOp retryable (and adding those CLI flags) is a slightly more ergonomic mechanism to quickly "heal" a run that accidentally missed including that config.

The harm of including them is extra noise in the logs (dbt is retrying more things than expected, only to no-op most of them).

When we discussed this the other week, I think we landed on including RunStatus.NoOp in the retryable statuses — but after reasoning through this above, I don't think we should include it (= leave the code as it is)

I agree with this assessment because:

If an exposure has a failure/skipped status, it will be retried automatically (I will add a test for this case).

Its result won’t change unless the project config also changes (unlikely IME).

This is also a change we can implement fairly easily if we see a need for it, so I’ll merge the code in as-is for now.

Create a no-op exposure runner

e6325c6

cla-bot bot added the cla:yes label Dec 2, 2024

Fix functional test, remove _event_status from Exposure + Saved Query

f41ef0b

aranke added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Dec 2, 2024

aranke added 2 commits December 2, 2024 14:52

fix test_events

adc6af2

Add no-op to count tests

e1969eb

aranke added the artifacts label Dec 2, 2024

aranke added 2 commits December 2, 2024 16:47

add changie

38358b1

Add results tests for exposures and saved queries

d3eda93

aranke marked this pull request as ready for review December 3, 2024 12:43

aranke requested a review from a team as a code owner December 3, 2024 12:43

aranke requested review from gshank, MichelleArk, jtcohen6, dbeatty10 and graciegoheen December 3, 2024 12:44

MichelleArk reviewed Dec 3, 2024

View reviewed changes

core/dbt/runner/exposure_runner.py Outdated Show resolved Hide resolved

MichelleArk reviewed Dec 3, 2024

View reviewed changes

core/dbt/runner/saved_query_runner.py Outdated Show resolved Hide resolved

MichelleArk reviewed Dec 3, 2024

View reviewed changes

tests/functional/adapter/hooks/test_on_run_hooks.py Outdated Show resolved Hide resolved

aranke added 3 commits December 3, 2024 16:58

rename: runner -> runners

503ddac

remove log counts tests

878d5da

remove post_serialize and fix test

f6c8715

aranke requested a review from MichelleArk December 3, 2024 17:34

Revert "remove post_serialize and fix test"

e0281a6

This reverts commit f6c8715.

MichelleArk approved these changes Dec 3, 2024

View reviewed changes

aranke mentioned this pull request Dec 4, 2024

Add no-op status to run_results.json v6 dbt-labs/schemas.getdbt.com#73

Merged

aranke added 2 commits December 4, 2024 16:05

Merge branch 'main' into wip_exposure_runner

4b8cb16

Merge branch 'main' into wip_exposure_runner

ef3d8d4

jtcohen6 approved these changes Dec 12, 2024

View reviewed changes

aranke merged commit 7df04b0 into main Dec 12, 2024
54 of 56 checks passed

aranke deleted the wip_exposure_runner branch December 12, 2024 15:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a no-op exposure runner #11082

Create a no-op exposure runner #11082

aranke commented Dec 2, 2024 •

edited

Loading

github-actions bot commented Dec 2, 2024

codecov bot commented Dec 2, 2024 •

edited

Loading

MichelleArk Dec 3, 2024

aranke Dec 3, 2024

MichelleArk Dec 3, 2024

aranke Dec 3, 2024

MichelleArk commented Dec 3, 2024

jtcohen6 left a comment

jtcohen6 Dec 12, 2024

jtcohen6 Dec 12, 2024

aranke Dec 12, 2024

	def __post_serialize__(self, dct: Dict, context: Optional[Dict] = None):
	dct = super().__post_serialize__(dct, context)
	if "_event_status" in dct:
	del dct["_event_status"]
	return dct

Create a no-op exposure runner #11082

Create a no-op exposure runner #11082

Conversation

aranke commented Dec 2, 2024 • edited Loading

Problem

Solution

Sample output

Logs

Run Results

Checklist

github-actions bot commented Dec 2, 2024

codecov bot commented Dec 2, 2024 • edited Loading

Codecov Report

MichelleArk Dec 3, 2024

Choose a reason for hiding this comment

aranke Dec 3, 2024

Choose a reason for hiding this comment

MichelleArk Dec 3, 2024

Choose a reason for hiding this comment

aranke Dec 3, 2024

Choose a reason for hiding this comment

MichelleArk commented Dec 3, 2024

jtcohen6 left a comment

Choose a reason for hiding this comment

jtcohen6 Dec 12, 2024

Choose a reason for hiding this comment

jtcohen6 Dec 12, 2024

Choose a reason for hiding this comment

aranke Dec 12, 2024

Choose a reason for hiding this comment

aranke commented Dec 2, 2024 •

edited

Loading

codecov bot commented Dec 2, 2024 •

edited

Loading