(I wanted to drop some notes/thoughts on the UDF dependency handling discussion we had earlier.)
The current approach to automatic UDF dependency handling (initiated in #237) is to run `pip install` on the driver and share the installed dependencies with the executors through a shared "job folder", either as a file tree or as a ZIP (added in #845).
An important assumption for this to work is that the "job folder" is actually shared between driver and executors. However, given recent problems, I'm starting to doubt whether it is safe and future-proof to assume that this job folder can be shared and synced in a timely manner across driver and executors.
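For reference, the driver-side part of the current approach can be sketched roughly as below. This is a minimal sketch under assumptions, not the actual implementation: the function names, the `udf-deps` folder name and the injectable `run` parameter are hypothetical, and only standard `pip install --target` and `shutil.make_archive` behaviour is relied upon.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_pip_command(requirements: list[str], target_dir: Path) -> list[str]:
    """Build a pip command that installs `requirements` into `target_dir`."""
    return [sys.executable, "-m", "pip", "install", "--target", str(target_dir), *requirements]


def package_udf_dependencies(requirements: list[str], job_dir: Path, run=subprocess.check_call) -> Path:
    """Install UDF dependencies on the driver and ZIP them inside the job folder.

    `job_dir` stands for the (assumed shared) job folder; `udf-deps` is a
    hypothetical name chosen for this sketch.
    """
    target_dir = job_dir / "udf-deps"
    target_dir.mkdir(parents=True, exist_ok=True)
    # Install into an isolated folder instead of the driver's own environment.
    run(build_pip_command(requirements, target_dir))
    # Produces <job_dir>/udf-deps.zip with the installed packages at the ZIP root.
    archive = shutil.make_archive(str(job_dir / "udf-deps"), "zip", root_dir=target_dir)
    return Path(archive)
```

Everything after this step depends on the executors seeing `job_dir` with the same, fully synced content, which is exactly the assumption in question.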
Should we consider alternative approaches? For example:
- create the ZIP on the driver and share it with the executors through an explicit S3 link (instead of indirect S3 mounts)
- drop the driver-executor sharing idea altogether and run the full `pip install` automatically on each executor separately
Maybe that last option is a good idea to have anyway, as a fallback that's not very efficient but at least always works.
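To make the fallback idea concrete, here is a minimal sketch of what an executor could do: use the shared ZIP when it is visible, otherwise install locally. The function name `ensure_udf_dependencies`, its parameters and the temp folder prefix are hypothetical and only illustrate the control flow; how and when this would be hooked into executor startup is left open.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def ensure_udf_dependencies(requirements: list[str], shared_archive: Path, run=subprocess.check_call) -> str:
    """Make UDF dependencies importable on this executor.

    Returns the entry that was put on `sys.path`.
    """
    if shared_archive.is_file():
        # Preferred path: the ZIP prepared by the driver is visible here.
        # Caveat: importing from a ZIP only works for pure-Python packages;
        # packages with compiled extensions would need to be extracted first.
        path_entry = str(shared_archive)
    else:
        # Fallback: the shared job folder is missing or not synced (yet),
        # so do the full pip install locally. Slower, but independent of sharing.
        local_dir = tempfile.mkdtemp(prefix="udf-deps-")
        run([sys.executable, "-m", "pip", "install", "--target", local_dir, *requirements])
        path_entry = local_dir
    if path_entry not in sys.path:
        sys.path.insert(0, path_entry)
    return path_entry
```

A per-executor install does require each executor to reach a package index, and it repeats the download for every executor, which is the inefficiency mentioned above.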
refs:
cc @jdries @EmileSonneveld