Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test dynamic outputs + partitions #26725

Closed

Conversation

dpeng817
Copy link
Contributor

@dpeng817 dpeng817 commented Dec 26, 2024

Summary & Motivation

Proof of concept of current interoperability behavior between dynamic outputs and dynamic partitions. It actually works, I think this would make an interesting / useful github discussion.

There's some hackiness to this approach currently though - for example, the graph-backed asset defines an implicit output, which means that you need to return an output at the end of the line in the graph - in the case of accumulating dynamic partitions; it might not always make sense to return an output in terms of partition keys at all. I think in that case you're left in a weird state:

  • our default UPathIOManagers do not support handling multiple partition keys (maybe this works in some cases if you yield outputs individually? more test cases needed)
  • you have an accumulated value that represents all the partitions; you want to opt out of IO managers entirely.

How I Tested These Changes

New test

Copy link
Contributor Author

dpeng817 commented Dec 26, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

@dpeng817 dpeng817 marked this pull request as ready for review December 26, 2024 18:03
@gibsondan gibsondan requested review from smackesey and removed request for gibsondan December 26, 2024 18:12
@gibsondan
Copy link
Member

added sean and removed myself, I don't really have context on this one

@dpeng817 dpeng817 force-pushed the dpeng817/test_dynamic_outputs_and_assets_behavior branch from ef247df to 28c74fd Compare December 27, 2024 16:20
Comment on lines +257 to +258
assert result._output_capture == { # noqa: SLF001
StepOutputHandle(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

theres gotta be a better way to express this without using _output_capture or StepOutputHandle

Copy link
Contributor Author

@dpeng817 dpeng817 Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_for_node (the current best way of retrieving in-memory outputs) does not support mapping_key right now. We should definitely add support for that but didn't want to do so in this PR

return collect_fruit_vals(vals.collect())

with instance_for_test() as instance:
instance.add_dynamic_partitions(dg._check.not_none(partitions_def.name), partitions) # noqa: SLF001
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we are adding the dynamic partitions before executing anything. In a real deployment this would happen where, in a sensor that then requests this run?

Is there a similar but slightly different example that we would want to have that calculates and adds the dynamic partitions as part of the run? Is that possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a real deployment it would happen in a sensor. I see the utility of having a run that calculates and adds the dynamic partitions as well as processing, but it is not currently possible. For a run to be partitioned (that is, for all the partitioned properties to be filled out), you need to know the partition keys ahead of time (at least right now)

@dpeng817 dpeng817 requested a review from alangenfeld December 31, 2024 18:38
@dpeng817 dpeng817 closed this Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants