Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set dynamic allocation for parallel jobs #14

Merged
merged 1 commit into from
May 31, 2024
Merged

Conversation

Tianhao-Gu
Copy link
Collaborator

@Tianhao-Gu Tianhao-Gu commented May 31, 2024

Screenshot 2024-05-30 at 11 43 32 AM

Copy link

codecov bot commented May 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.57%. Comparing base (f53c262) to head (a982887).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #14   +/-   ##
=======================================
  Coverage   78.57%   78.57%           
=======================================
  Files           1        1           
  Lines          28       28           
=======================================
  Hits           22       22           
  Misses          6        6           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MrCreosote
Copy link
Member

Can you explain the purpose of this change? When I was running example jobs against spark they were using both workers without config so I'm wondering what this does

@Tianhao-Gu
Copy link
Collaborator Author

Can you explain the purpose of this change? When I was running example jobs against spark they were using both workers without config so I'm wondering what this does

it allows parallel job executions. I commented on it in the notebook bash script.

@MrCreosote
Copy link
Member

What happens if 2 people in 2 different notebooks run jobs?

@Tianhao-Gu
Copy link
Collaborator Author

What happens if 2 people in 2 different notebooks run jobs?

It should allow both jobs running.

@MrCreosote
Copy link
Member

It should allow both jobs running.

Are the changes here needed for that, or are they just to allow parallel jobs to run from a single notebook?

@Tianhao-Gu
Copy link
Collaborator Author

It should allow both jobs running.

Are the changes here needed for that, or are they just to allow parallel jobs to run from a single notebook?

Currently job execution is FIFO meaning any job will take all resources and all other jobs will be waiting.

@MrCreosote
Copy link
Member

Ok, so what happens if one notebook has parallel executors configured and the other doesn't and they both submit jobs?

@Tianhao-Gu
Copy link
Collaborator Author

Ok, so what happens if one notebook has parallel executors configured and the other doesn't and they both submit jobs?

I don’t know why it matters. Our cluster only has one notebook.

@MrCreosote
Copy link
Member

For now. I'm trying to understand how this works - it seems odd to me that the configuration is on the "client" rather than on the cluster itself

@Tianhao-Gu
Copy link
Collaborator Author

For now. I'm trying to understand how this works - it seems odd to me that the configuration is on the "client" rather than on the cluster itself

This setting can be configured either in master or driver. Typically, the configuration is set on the driver node, as it is the driver that requests resources from the cluster manager. It's also easier for us to config driver this way.

@MrCreosote
Copy link
Member

Ok, so back to the original question - what happens if 2 drivers request resources from the manager, and one has these settings and the other doesn't?

@Tianhao-Gu
Copy link
Collaborator Author

Ok, so back to the original question - what happens if 2 drivers request resources from the manager, and one has these settings and the other doesn't?

My best guess is if driver without this setting submit job first, it will take all resources same as of now. And driver with this setting will wait for the first job to finish. I didn't get your cornering regarding this. This is the easiest way to allow parallel jobs with our current stand alone setting.

@MrCreosote
Copy link
Member

I didn't get your cornering regarding this.

Not understanding you here

@Tianhao-Gu
Copy link
Collaborator Author

I didn't get your cornering regarding this.

Not understanding you here

^concerns

@MrCreosote
Copy link
Member

MrCreosote commented May 31, 2024

It's less concerns and more understanding how the settings work and what the implications are for the future when we presumably will have multiple notebooks running against the cluster simultaneously, and maybe even people firing up their own notebooks with their own settings

Copy link
Member

@MrCreosote MrCreosote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another one that seems fine for now but might need rethinking longer term

@Tianhao-Gu Tianhao-Gu merged commit baa5869 into main May 31, 2024
9 checks passed
@Tianhao-Gu Tianhao-Gu deleted the dev_dynamic_allocation branch May 31, 2024 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants