UDF dependency handling without "job folder" #941

Open
soxofaan opened this issue Nov 15, 2024 · 0 comments
Labels
architecture CDSE Copernicus Data Space Ecosystem enhancement question

Comments

@soxofaan
Member

(I wanted to drop some notes/thoughts on the UDF dependency handling discussion we had earlier.)

The current approach to automatic UDF dependency handling (initiated in #237) is to run a pip install in the driver and share the installed dependencies with the executors through a shared "job folder", either as a file tree or as a ZIP (added in #845).

This only works under an important assumption: that the "job folder" is shared between driver and executors. Given recent problems, however, I'm starting to doubt whether it's safe and future-proof to assume that this job folder will (or can) be shared and synced in time across driver and executors.

Should we consider alternative approaches, such as:

  • create the ZIP on the driver and share it with the executors through an explicit S3 link (instead of indirect S3 mounts)
  • drop the driver-executor sharing idea altogether and do the full pip install automatically on each executor separately
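
To make the first option a bit more concrete, here is a rough sketch of the two halves: the driver packs the installed dependencies into a ZIP, and each executor fetches that ZIP from an explicit URL and unpacks it locally. All names here (`zip_dependencies`, `fetch_and_unpack`, `udf-deps.zip`) are hypothetical and not taken from the existing code base. The upload to S3 and the creation of the link (for example a presigned URL) are left out on purpose, so the sketch only assumes that the executor receives some URL it can download from.

```python
import sys
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def zip_dependencies(dep_dir: Path, zip_path: Path) -> Path:
    """Driver side (hypothetical helper): pack a pip-installed tree into one ZIP."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(dep_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to dep_dir so the ZIP unpacks cleanly.
                zf.write(path, path.relative_to(dep_dir).as_posix())
    return zip_path


def fetch_and_unpack(url: str, target_dir: Path) -> Path:
    """Executor side (hypothetical helper): download the ZIP from an explicit
    URL, unpack it on local disk and make it importable."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        local_zip = Path(tmp) / "udf-deps.zip"
        urllib.request.urlretrieve(url, str(local_zip))
        with zipfile.ZipFile(local_zip) as zf:
            zf.extractall(target_dir)
    # Unpack instead of putting the ZIP itself on sys.path:
    # compiled extension modules cannot be imported from inside a ZIP.
    if str(target_dir) not in sys.path:
        sys.path.insert(0, str(target_dir))
    return target_dir
```

The point of the explicit URL is that the executor no longer depends on a mount being synced in time: either the download succeeds, or it fails loudly.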

That last option (a full pip install on each executor) may be a good idea to have anyway, as a fallback that's not very efficient but at least always works.
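
A minimal sketch of what such a per-executor fallback could look like, assuming each executor has a writable local directory and network access to a package index. The function names and the `.udf-deps-installed` marker file are made up for illustration. The only real tool relied on is `pip install --target`.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Sequence


def pip_install_command(dependencies: Sequence[str], target: Path) -> List[str]:
    """Build the pip command that installs the dependencies into a local folder."""
    return [sys.executable, "-m", "pip", "install", "--target", str(target), *dependencies]


def ensure_udf_dependencies(
    dependencies: Sequence[str],
    target: Path,
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> Path:
    """Install UDF dependencies locally on this executor, at most once per
    dependency set, and put them on sys.path. Hypothetical helper."""
    marker = target / ".udf-deps-installed"
    wanted = "\n".join(sorted(dependencies))
    if not (marker.exists() and marker.read_text() == wanted):
        target.mkdir(parents=True, exist_ok=True)
        if dependencies:
            run(pip_install_command(dependencies, target))
        # Only written after a successful install, so a failed run is retried.
        marker.write_text(wanted)
    if str(target) not in sys.path:
        sys.path.insert(0, str(target))
    return target
```

The marker file keeps repeated UDF invocations on the same executor from reinstalling everything. This sketch does not guard against several Python workers on one node installing into the same folder at the same time, which a real implementation would probably need to handle (for example with a lock file or a per-worker target).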

refs:

cc @jdries @EmileSonneveld

@soxofaan soxofaan added enhancement question architecture CDSE Copernicus Data Space Ecosystem labels Nov 15, 2024