feat: Add support for cloud object storage (e.g. s3) based shuffle #48

andygrove · 2024-11-17T17:44:54Z

Follows on from #47

Closes #46

Description

Shuffle files are still written to local disk by ShuffleWriterExec, but they are then uploaded to object storage.
ShuffleReaderExec then downloads the shuffle files from object storage and deletes them.
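A minimal Python sketch of the write, upload, download and delete flow described above. It assumes a hypothetical `InMemoryObjectStore` standing in for an S3-style bucket, and the helper names (`write_shuffle_partition`, `read_shuffle_partition`), file names and key layout are all illustrative. It reads "deletes them" as removing the objects from the bucket once the reader has fetched them. The real ShuffleWriterExec/ShuffleReaderExec are not shown.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-style bucket: maps object keys to bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]

    def keys(self) -> list[str]:
        return sorted(self._objects)


def write_shuffle_partition(store: InMemoryObjectStore, local_dir: Path,
                            stage_id: int, partition: int, data: bytes) -> str:
    # Writer side: the shuffle file is still written to local disk first...
    local_path = local_dir / f"shuffle-{stage_id}-{partition}.bin"  # hypothetical naming
    local_path.write_bytes(data)
    # ...and then uploaded to object storage under a key the reader can derive.
    key = f"shuffle/{stage_id}/{partition}.bin"  # hypothetical key layout
    store.put(key, local_path.read_bytes())
    return key


def read_shuffle_partition(store: InMemoryObjectStore, local_dir: Path, key: str) -> bytes:
    # Reader side: download the shuffle file from object storage to local disk...
    local_path = local_dir / key.replace("/", "_")
    local_path.write_bytes(store.get(key))
    # ...then delete the object from storage once it has been fetched.
    store.delete(key)
    return local_path.read_bytes()


if __name__ == "__main__":
    store = InMemoryObjectStore()
    with tempfile.TemporaryDirectory() as writer_dir, tempfile.TemporaryDirectory() as reader_dir:
        key = write_shuffle_partition(store, Path(writer_dir), stage_id=1, partition=0, data=b"batch-bytes")
        print(store.keys())  # ['shuffle/1/0.bin']
        print(read_shuffle_partition(store, Path(reader_dir), key))  # b'batch-bytes'
        print(store.keys())  # []
```

Because writer and reader only share the bucket, they no longer need access to each other's local disks; the local write is kept on the writer side, matching the description.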

edmondop · 2024-11-18T14:18:23Z

datafusion_ray/context.py


        graph = self.ctx.plan(execution_plan)
        final_stage_id = graph.get_final_query_stage().id()
-        partitions = schedule_execution(graph, final_stage_id, True)
+        # serialize the query stages and store in Ray object store
+        query_stages = [


I am a little puzzled: are these the query_stages or the serialized execution plans?

andygrove · 2024-11-18

These are the serialized execution plans for each query stage:

graph.get_query_stage(i).get_execution_plan_bytes()
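To make the answer concrete, here is a minimal sketch using hypothetical `QueryStage` and `ExecutionGraph` stand-ins that expose only the method names visible in the diff (`get_final_query_stage`, `id`, `get_query_stage`, `get_execution_plan_bytes`). The list it builds holds serialized plan bytes (or references to them), not query-stage objects. Using pickle for serialization, contiguous stage ids from 0 to `final_stage_id`, and `put` as a placeholder for `ray.put` are all assumptions.

```python
from __future__ import annotations

import pickle
from typing import Any, Callable


class QueryStage:
    """Hypothetical stand-in for a query stage in the execution graph."""

    def __init__(self, stage_id: int, plan: Any) -> None:
        self._id = stage_id
        self._plan = plan

    def id(self) -> int:
        return self._id

    def get_execution_plan_bytes(self) -> bytes:
        # pickle stands in for the real plan serialization (assumption).
        return pickle.dumps(self._plan)


class ExecutionGraph:
    """Hypothetical stand-in for the graph returned by ctx.plan(...)."""

    def __init__(self, stages: list[QueryStage]) -> None:
        self._stages = {s.id(): s for s in stages}

    def get_query_stage(self, stage_id: int) -> QueryStage:
        return self._stages[stage_id]

    def get_final_query_stage(self) -> QueryStage:
        # Assumption: the final stage has the highest id.
        return self._stages[max(self._stages)]


def serialize_query_stages(
    graph: ExecutionGraph,
    final_stage_id: int,
    put: Callable[[bytes], Any] = lambda b: b,
) -> list[Any]:
    # One entry per stage: the serialized execution plan (bytes), passed
    # through `put`, which stands in for ray.put in the real code.
    # Assumption: stage ids run contiguously from 0 to final_stage_id.
    return [
        put(graph.get_query_stage(i).get_execution_plan_bytes())
        for i in range(final_stage_id + 1)
    ]
```

With `put=ray.put`, each element would be an object reference to the plan bytes rather than the bytes themselves.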

edmondop reviewed Nov 18, 2024

andygrove changed the title from "feat: Add support for object storage based shuffle" to "feat: Add support for cloud object storage (e.g. s3) based shuffle" on Nov 18, 2024

Use object store to transfer shuffle files between writers and readers

96495b7

andygrove force-pushed the minio branch from 7d08c06 to 96495b7 on November 19, 2024, 15:13
