You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Implete a reusable join operator, it supports to put multiple tables' data into the same hash table. So we could avoid building multiple hash tables, and save memory usage.
But it may not be an easy job, and we are not sure that its performance is well. A lot of work,😓
Make join spill adaptively
At persent, if the join operator's memory usage is not over a fixed memory limit, it will not spill out. It has no awareness of the memory usage of other operators.
We could make join operator spill out data when total remaining memory is not enough. This is should be done in CH.
Split joins into stages
We could limit to max size of joins that could put into the same stage. Even if we could balance the memory usage for each join operator, too many join operator in the same stage is not efficient.
Backend
CH (ClickHouse)
Bug description
Following query with multiple joins on the same keys, it will have mulitiple join operators in the same stage.
Join is memory-intensive operator, it's easy to cause OOM with multiple join operators in the same stage.
A related issue #8003
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response
The text was updated successfully, but these errors were encountered: