Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move deferral resolution from merge_from_artifact to RuntimeRefResolver #9199

Merged
merged 10 commits into from
May 2, 2024

Conversation

jtcohen6
Copy link
Contributor

@jtcohen6 jtcohen6 commented Dec 4, 2023

resolves: #10017
follow-up to #9040

Problem

In #9040 (comment), @MichelleArk wondered aloud:

I wonder if it'd actually make more sense to do this deferral + merging business in the requires.manifest wrapper to centralize where manifest loading + writing happens.

Solution

Within requires.manifest: Add defer_relation to all relevant nodes right after manifest loading

Within RuntimeRefResolver: Handle logic for picking the target relation versus defer_relation, based on --favor-state and adapter cache checks (does the target relation exist in the expected location?)

This seems like a promising way to streamline much of this logic (for all tasks), and move it earlier (outside of tasks entirely).

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@cla-bot cla-bot bot added the cla:yes label Dec 4, 2023
Copy link
Contributor

github-actions bot commented Dec 4, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@jtcohen6 jtcohen6 force-pushed the jerco/refactor-refactor-deferral branch from fa16a8c to 7a3598f Compare December 4, 2023 04:10
core/dbt/cli/requires.py Outdated Show resolved Hide resolved

def message(self) -> str:
return f"Merged {self.num_merged} items from state (sample: {self.sample})"
# Removed A004: MergedFromState
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're no longer fully overwriting deferred manifest nodes with the stateful manifest's alternative, I didn't see much use in keeping this event around

Comment on lines 1624 to 1625
if "defer_relation" in node:
del node["defer_relation"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd excluded this field from serialization when it was just an implementation detail for the clone task, but now that it's more generally applicable, I think it makes sense to include in serialized manifest.json.

On the other hand, I think it makes sense to remove the deferred: boolean field.

@@ -181,10 +180,6 @@ def test_run_and_defer(self, project, unique_schema, other_schema):
assert other_schema not in results[0].node.compiled_code
assert unique_schema in results[0].node.compiled_code

with open("target/manifest.json") as fp:
data = json.load(fp)
assert data["nodes"]["seed.test.seed"]["deferred"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're no longer resolving & writing this boolean field. Every node gets a defer_relation (if appropriate), and the determination of whether or not to use it is made at runtime within RuntimeRefResolver

Copy link

codecov bot commented Dec 4, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.14%. Comparing base (4cb6d47) to head (9db16a8).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9199      +/-   ##
==========================================
- Coverage   88.14%   88.14%   -0.01%     
==========================================
  Files         181      181              
  Lines       22702    22700       -2     
==========================================
- Hits        20011    20008       -3     
- Misses       2691     2692       +1     
Flag Coverage Δ
integration 85.45% <100.00%> (+<0.01%) ⬆️
unit 62.36% <73.91%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -342,7 +342,6 @@ class ParsedNode(NodeInfoMixin, ParsedNodeMandatory, SerializableType):
docs: Docs = field(default_factory=Docs)
patch_path: Optional[str] = None
build_path: Optional[str] = None
deferred: bool = False
Copy link
Contributor Author

@jtcohen6 jtcohen6 Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, I don't think this field is necessary anymore. However:

  • Removing this does require updating a lot of tests
  • It's exactly the sort of breaking change (removing a field) that we said we would try to avoid doing for artifacts in v1.8+...

@@ -226,6 +223,7 @@ def test_clone_same_target_and_state(self, project, unique_schema, other_schema)

clone_args = [
"clone",
"--defer",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like currently, dbt clone only needs to pass --state (or --defer-state), but not the --defer flag. The behavior it's performing is very much analogous to deferral, though.

This does not feel clear in our docs: https://docs.getdbt.com/reference/commands/clone

I'd be okay with requiring it from now on. The alternative is something gross like this in requires.manifest:

if flags.defer or flags.which == 'clone':

@jtcohen6 jtcohen6 requested a review from MichelleArk December 4, 2023 05:41
@jtcohen6 jtcohen6 added the Skip Changelog Skips GHA to check for changelog file label Dec 4, 2023
@jtcohen6 jtcohen6 marked this pull request as ready for review December 4, 2023 05:41
@jtcohen6 jtcohen6 requested review from a team as code owners December 4, 2023 05:41
@jtcohen6 jtcohen6 requested a review from McKnight-42 December 4, 2023 05:41
Base automatically changed from jerco/7965-refactor-deferral to main January 17, 2024 09:24
@jtcohen6 jtcohen6 force-pushed the jerco/refactor-refactor-deferral branch from 5963a9a to 4f1fa63 Compare January 31, 2024 00:53
@jtcohen6 jtcohen6 removed the Skip Changelog Skips GHA to check for changelog file label Jan 31, 2024
@jtcohen6 jtcohen6 changed the title Move deferral from tasks to requires.manifest + RefResolver Move deferral from tasks to parse_manifest + RuntimeRefResolver Jan 31, 2024
@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Feb 19, 2024

One failing test TestUnitTestDeferState::test_unit_test_defer_state which will be resolved by dbt-labs/dbt-adapters#94.

At the same time, I got myself into quite a tricky loop trying to debug an edge case with dbt unit tests, until I realized that this PR should actually fix that edge case:

  • Say I have model_a -> model_b and a unit test defined on model_b
  • If I rename a column in model_a, model_b, and the unit test definition, and I build model_a but not model_b in my dev schema, then the unit test will pass without --defer and yet fail with --defer.
  • Why? Because status quo deferral behavior completely rewrites the node in the manifest, meaning that the unit test definition would be using the production version of model_b.raw_code rather than the updated development version.
  • The present PR fixes this by simply adding model_b.defer_relation, rather than overwriting the model_b node entry.

@MichelleArk MichelleArk force-pushed the jerco/refactor-refactor-deferral branch from 69ee8b6 to 6f1ad1e Compare April 23, 2024 21:41
@MichelleArk MichelleArk added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label May 1, 2024
@MichelleArk MichelleArk requested a review from vadim82 May 1, 2024 20:36
@@ -181,7 +181,6 @@ class ParsedResource(ParsedResourceMandatory):
docs: Docs = field(default_factory=Docs)
patch_path: Optional[str] = None
build_path: Optional[str] = None
deferred: bool = False
Copy link
Contributor

@MichelleArk MichelleArk May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming this is a minor change and safe to make here from a runtime serialization/deserialization perspective 👍 Independent unit tests here: #10066

Additionally, we're not using this field for anything functional in dbt-core's implementation, it was previously just accessed in testing to assert certain behaviour had occurred during deferral.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving minor schema evolution changes

@jtcohen6 jtcohen6 force-pushed the jerco/refactor-refactor-deferral branch from 942ae58 to 8e84df8 Compare May 1, 2024 22:14
@jtcohen6 jtcohen6 force-pushed the jerco/refactor-refactor-deferral branch from 8e84df8 to 6b63e31 Compare May 1, 2024 22:21
write_perf_info: bool,
write: bool,
write_json: bool,
) -> Manifest:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before 6b63e31, this was where I was calling merge_from_artifact. Now this file diff is just adding type annotations and an inline comment.

@@ -454,10 +454,10 @@ def print_results_line(self, results, execution_time) -> None:

def before_run(self, adapter, selected_uids: AbstractSet[str]) -> None:
with adapter.connection_named("master"):
self.defer_to_manifest()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this method is now to add defer_relation to all nodes in the project.

Previously, we needed to know at this point which objects exist in the target database locations (which requires the cache). Now that's cache check is happening within RuntimeRefResolver, meaning that we can consistently call defer_to_manifest before populate_adapter_cache in every task.

@@ -479,8 +477,8 @@ def populate_adapter_cache(

def before_run(self, adapter, selected_uids: AbstractSet[str]):
with adapter.connection_named("master"):
self.defer_to_manifest()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on RunTask.before_run

@jtcohen6 jtcohen6 changed the title Move deferral from tasks to parse_manifest + RuntimeRefResolver Move deferral resolution from merge_from_artifact to RuntimeRefResolver May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking cla:yes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] deferral for unit test when modifying upstream and doing a partial build
3 participants