[data] Support yield in flat_map
#49900
Labels
data: Ray Data-related issues
enhancement: Request for new feature and/or capability
P2: Important issue, but not time-critical
Description
Currently, only map_batches supports generators, but it would be really useful for map to support generators as well.
Use case
I have a use case where it is easier to express the logic as map (i.e., each input row is a list of files). However, if I expand the list of files, I will end up with an explosion of data (each file is 5GB, so a list of 100 is 500GB).
So ideally I could yield the outputs along the way, which would avoid blowing up the heap and starving the rest of the pipeline.
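To make the difference concrete, here is a minimal pure-Python sketch of the two shapes the per-row function could take. It does not call Ray at all; `load_file`, `expand_eager`, and `expand_lazy` are hypothetical names made up for illustration, and the loader just returns a small placeholder instead of reading a real 5GB file.

```python
from typing import Dict, Iterator, List


def load_file(path: str) -> Dict[str, str]:
    """Hypothetical loader standing in for reading one large file."""
    return {"path": path, "data": f"contents of {path}"}


def expand_eager(row: Dict[str, List[str]]) -> List[Dict[str, str]]:
    # Returns every output at once, so all loaded files are held in
    # memory together before anything downstream can consume them.
    return [load_file(p) for p in row["files"]]


def expand_lazy(row: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    # Yields one output at a time, so each file is loaded only when
    # the consumer asks for the next item.
    for p in row["files"]:
        yield load_file(p)


row = {"files": [f"file_{i}.bin" for i in range(3)]}

eager_out = expand_eager(row)   # all three outputs materialized here
lazy_out = expand_lazy(row)     # nothing loaded yet
first = next(lazy_out)          # only the first file has been loaded
```

The request is essentially for the generator form (`expand_lazy`) to be accepted as the user function, so that outputs can be handed downstream as they are produced rather than after the whole list has been expanded.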