set dynamic allocation for parallel jobs #14
Conversation
Tianhao-Gu commented May 31, 2024 (edited)
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage Diff (main vs. #14):

|          | main   | #14    | +/- |
|----------|--------|--------|-----|
| Coverage | 78.57% | 78.57% |     |
| Files    | 1      | 1      |     |
| Lines    | 28     | 28     |     |
| Hits     | 22     | 22     |     |
| Misses   | 6      | 6      |     |

☔ View full report in Codecov by Sentry.
Can you explain the purpose of this change? When I was running example jobs against Spark, they were using both workers without this config, so I'm wondering what it does.
It allows parallel job execution. I commented on it in the notebook bash script.
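The bash script itself isn't shown in this conversation, but the dynamic-allocation settings being discussed are presumably along these lines. The property keys are standard Spark configuration settings; the specific values here are illustrative assumptions for a small two-worker standalone cluster, not the PR's actual values:

```
# Sketch of spark-defaults.conf entries enabling dynamic allocation so that
# one job does not hold every core indefinitely (values are assumptions):
spark.dynamicAllocation.enabled                  true
# Standalone mode needs shuffle tracking (or an external shuffle service)
# so that idle executors can be released safely:
spark.dynamicAllocation.shuffleTracking.enabled  true
spark.dynamicAllocation.minExecutors             1
spark.dynamicAllocation.maxExecutors             2
# Cap total cores per application so a second job can get resources:
spark.cores.max                                  2
```

With `spark.cores.max` left unset on a standalone cluster, the default is for an application to grab all available cores, which is what produces the FIFO behavior described below.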
What happens if 2 people in 2 different notebooks run jobs?
It should allow both jobs to run.
Are the changes here needed for that, or are they just to allow parallel jobs to run from a single notebook?
Currently job execution is FIFO, meaning any job will take all resources and all other jobs will wait.
Ok, so what happens if one notebook has parallel executors configured and the other doesn't, and they both submit jobs?
I don't know why it matters. Our cluster only has one notebook.
For now. I'm trying to understand how this works; it seems odd to me that the configuration is on the "client" rather than on the cluster itself.
This setting can be configured on either the master or the driver. Typically it is set on the driver node, as it is the driver that requests resources from the cluster manager. It's also easier for us to configure the driver this way.
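To illustrate the "configuration lives on the driver" point: each driver assembles its own resource-request settings and passes them to the cluster manager at submit time, so two drivers can legitimately submit with different settings. The sketch below builds such driver-side settings as `spark-submit --conf` flags. The property keys are real Spark settings; the helper function, values, and dictionary name are hypothetical illustrations:

```python
# Hypothetical sketch: driver-side dynamic-allocation settings rendered as
# spark-submit flags. Keys are real Spark properties; values are assumptions.

DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # Standalone mode needs shuffle tracking so executors can be released:
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    # Cap executors so a second driver's job can also get resources:
    "spark.dynamicAllocation.maxExecutors": "2",
}


def to_submit_args(conf: dict) -> list:
    """Render a config mapping as a flat list of spark-submit --conf flags."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(DYNAMIC_ALLOCATION_CONF)))
```

Because each driver carries its own copy of these settings, a driver submitted without them falls back to the cluster defaults, which is exactly the mixed-configuration scenario raised in the next question.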
Ok, so back to the original question: what happens if 2 drivers request resources from the manager, and one has these settings and the other doesn't?
My best guess is that if the driver without this setting submits a job first, it will take all resources, same as now, and the driver with this setting will wait for the first job to finish. I didn't get your concern regarding this. This is the easiest way to allow parallel jobs with our current standalone setup.
It's less a concern and more about understanding how the settings work and what the implications are for the future, when we presumably will have multiple notebooks running against the cluster simultaneously, and maybe even people firing up their own notebooks with their own settings.
This is another one that seems fine for now but might need rethinking longer term.