[Core] Allow more PENDING jobs to be scheduled concurrently (1.4x faster) #4311

Draft: wants to merge 9 commits into master

Michaelvll (Collaborator) commented Nov 9, 2024

Follow-up to #4310: we now allow two PENDING jobs to be scheduled concurrently, which lets the cluster reach the full 32 simultaneous jobs when running 1-minute jobs (> 1.4x faster).
Note: this slightly relaxes FIFO order, i.e. at most one later job can be scheduled ahead of an earlier job.
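To illustrate why a small submission window trades a bounded amount of FIFO order for throughput, here is a toy simulation, not SkyPilot's scheduler code. It assumes a simplified model: only the first `max_concurrent` PENDING jobs (a hypothetical parameter; this PR uses two) may have a submission in flight, and each submission takes a random, hypothetical latency before the job starts. Because a job can only be submitted while it is among the first `max_concurrent` pending jobs, at most `max_concurrent - 1` earlier jobs can still be pending when it starts.

```python
import heapq
import random
from typing import List, Tuple


def simulate(num_jobs: int, max_concurrent: int, seed: int = 0,
             min_latency: float = 0.5,
             max_latency: float = 1.5) -> Tuple[List[int], int, float]:
    """Toy FIFO scheduler: only the first `max_concurrent` PENDING jobs may be
    submitted at once (hypothetical model, not SkyPilot's implementation).

    Returns (start_order, max_skipped, makespan), where max_skipped is the
    largest number of earlier jobs still PENDING when some job started.
    """
    rng = random.Random(seed)
    pending = list(range(num_jobs))  # not yet started, in FIFO (ID) order
    submitted = set()                # jobs whose submission is in flight
    events: List[Tuple[float, int]] = []  # min-heap of (start_time, job_id)
    now = 0.0
    start_order: List[int] = []
    max_skipped = 0

    while pending:
        # Submit every job in the FIFO window that is not submitted yet.
        for job_id in pending[:max_concurrent]:
            if job_id not in submitted:
                submitted.add(job_id)
                # Hypothetical submission-to-start latency.
                latency = rng.uniform(min_latency, max_latency)
                heapq.heappush(events, (now + latency, job_id))
        # Advance the clock to the next job that starts running.
        now, job_id = heapq.heappop(events)
        # Earlier jobs that are still PENDING were skipped by this job.
        skipped = pending.index(job_id)
        max_skipped = max(max_skipped, skipped)
        pending.remove(job_id)
        start_order.append(job_id)

    return start_order, max_skipped, now


if __name__ == "__main__":
    for k in (1, 2, 4):
        order, skipped, makespan = simulate(32, k)
        print(f"max_concurrent={k}: max_skipped={skipped}, "
              f"makespan={makespan:.1f}")
```

With a fixed latency (`min_latency=1.0, max_latency=1.0`), `simulate(32, 1, ...)` finishes at time 32.0 and `simulate(32, 2, ...)` at 16.0, because submissions overlap in pairs. Real speedups depend on the actual submission and startup costs, so these numbers are only illustrative.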

We could increase the number of concurrent Ray job submissions further, but it would lead to:

  1. Weaker FIFO ordering: the more concurrent Ray job submissions, the more jobs may be scheduled out of FIFO order.
  2. Higher memory consumption: each submitted Ray job consumes memory.

Job queue with this change (32 jobs, IDs 225 to 256, RUNNING simultaneously):

ID   NAME     SUBMITTED       STARTED         DURATION  RESOURCES   STATUS     LOG
257  sky-cmd  4 mins ago      -               -         1x[CPU:1+]  PENDING    ~/sky_logs/sky-2024-11-09-09-15-53-466777
256  sky-cmd  4 mins ago      a few secs ago  8s        1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-52-769114  
255  sky-cmd  4 mins ago      a few secs ago  8s        1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-51-749071  
254  sky-cmd  4 mins ago      a few secs ago  10s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-50-819021  
253  sky-cmd  4 mins ago      a few secs ago  10s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-49-438118  
252  sky-cmd  4 mins ago      13 secs ago     13s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-48-685777  
251  sky-cmd  4 mins ago      13 secs ago     13s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-48-208294  
250  sky-cmd  4 mins ago      16 secs ago     16s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-47-751779  
249  sky-cmd  4 mins ago      16 secs ago     16s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-47-080846  
248  sky-cmd  4 mins ago      19 secs ago     19s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-46-371702  
247  sky-cmd  4 mins ago      19 secs ago     19s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-45-185725  
246  sky-cmd  4 mins ago      22 secs ago     22s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-44-439899  
245  sky-cmd  4 mins ago      22 secs ago     22s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-43-138175  
244  sky-cmd  4 mins ago      25 secs ago     25s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-42-233891  
243  sky-cmd  4 mins ago      25 secs ago     25s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-41-872538  
242  sky-cmd  4 mins ago      28 secs ago     28s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-41-343198  
241  sky-cmd  4 mins ago      28 secs ago     28s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-40-807990  
240  sky-cmd  4 mins ago      31 secs ago     31s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-40-159020  
239  sky-cmd  4 mins ago      31 secs ago     31s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-38-953233  
238  sky-cmd  4 mins ago      33 secs ago     33s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-38-661534  
237  sky-cmd  4 mins ago      33 secs ago     33s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-36-493768  
236  sky-cmd  4 mins ago      37 secs ago     37s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-36-027661  
235  sky-cmd  4 mins ago      37 secs ago     37s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-35-079209  
234  sky-cmd  4 mins ago      39 secs ago     39s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-34-933620  
233  sky-cmd  4 mins ago      40 secs ago     40s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-33-983345  
232  sky-cmd  4 mins ago      42 secs ago     42s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-33-948978  
231  sky-cmd  4 mins ago      42 secs ago     42s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-32-272653  
230  sky-cmd  4 mins ago      45 secs ago     45s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-31-713130  
229  sky-cmd  5 mins ago      45 secs ago     45s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-29-825658  
228  sky-cmd  5 mins ago      48 secs ago     48s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-28-983373  
227  sky-cmd  5 mins ago      48 secs ago     48s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-28-049120  
226  sky-cmd  5 mins ago      51 secs ago     51s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-27-978491  
225  sky-cmd  5 mins ago      51 secs ago     51s       1x[CPU:1+]  RUNNING    ~/sky_logs/sky-2024-11-09-09-15-26-983769  
224  sky-cmd  5 mins ago      1 min ago       1m        1x[CPU:1+]  SUCCEEDED  ~/sky_logs/sky-2024-11-09-09-15-26-693400 

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Michaelvll marked this pull request as draft on November 9, 2024 09:24
Michaelvll changed the title from "[Core] Allow two PENDING jobs to be scheduled concurrently" to "[Core] Allow two PENDING jobs to be scheduled concurrently (1.4x faster)" on Nov 9, 2024
Michaelvll changed the title from "[Core] Allow two PENDING jobs to be scheduled concurrently (1.4x faster)" to "[Core] Allow more PENDING jobs to be scheduled concurrently (1.4x faster)" on Nov 9, 2024
Michaelvll (Collaborator, Author) commented:

We should weigh the tradeoff between losing strict FIFO order and the time spent on scheduling, especially since #4318 has already significantly sped up job scheduling.
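One way to make this tradeoff concrete is to measure how far an observed start order deviates from FIFO. A minimal sketch, assuming job IDs increase in submission order (as they do in the queue listing above); the function name `fifo_violation` is hypothetical and not part of SkyPilot.

```python
from typing import List, Tuple


def fifo_violation(start_order: List[int]) -> Tuple[int, int]:
    """Measure how far a job start order deviates from FIFO.

    Job IDs are assumed to increase in submission order. Returns
    (inversions, max_skipped):
      * inversions: number of pairs (i, j) with i < j where j started first;
      * max_skipped: the most earlier-submitted jobs that a single job
        started ahead of.
    """
    inversions = 0
    max_skipped = 0
    for pos, job_id in enumerate(start_order):
        # Earlier-submitted jobs that started after this one.
        skipped = sum(1 for later in start_order[pos + 1:] if later < job_id)
        inversions += skipped
        max_skipped = max(max_skipped, skipped)
    return inversions, max_skipped
```

For example, `fifo_violation([1, 0, 3, 2])` returns `(2, 1)`: two pairs are out of order, and no job started ahead of more than one earlier job. The quadratic loop is fine for a few hundred jobs; a Fenwick-tree or merge-sort variant could compute the same per-job counts in O(n log n) for larger queues.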
