-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make --fast robust against credential or wheel updates #4289
Conversation
This used to be true, but since skypilot-org#2943, 'ray' is the only provisioner. Add other keys that are now present instead.
This is needed since some files in the fake file mounts don't actually exist, like the wheel path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this @cg505! It looks mostly good to me. Left several comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @cg505! It looks mostly good to me! Left several comments, mostly nits.
Can we also add a smoke test for --fast
?
Co-authored-by: Zhanghao Wu <[email protected]>
Co-authored-by: Zhanghao Wu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the updates @cg505! It looks mostly good to me.
Co-authored-by: Zhanghao Wu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cg505! It looks mostly good to me. Left some comments mostly for readability and code style.
We should make sure the backward compatibility tests pass. Also, can we add a quick smoke test to test this --fast
.
Co-authored-by: Zhanghao Wu <[email protected]>
* [perf] use uv for venv creation and pip install (#4414) * Revert "remove `uv` from runtime setup due to azure installation issue (#4401)" This reverts commit 0b20d56. * on azure, use --prerelease=allow to install azure-cli * use uv venv --seed * fix backwards compatibility * really fix backwards compatibility * use uv to set up controller dependencies * fix python 3.8 * lint * add missing file * update comment * split out azure-cli dep * fix lint for dependencies * use runpy.run_path rather than modifying sys.path * fix cloud dependency installation commands * lint * Update sky/utils/controller_utils.py Co-authored-by: Zhanghao Wu <[email protected]> --------- Co-authored-by: Zhanghao Wu <[email protected]> * [Minor] README updates. (#4436) * [Minor] README touches. * update * update * make --fast robust against credential or wheel updates (#4289) * add config_dict['config_hash'] output to write_cluster_config * fix docstring for write_cluster_config This used to be true, but since #2943, 'ray' is the only provisioner. Add other keys that are now present instead. * when using --fast, check if config_hash matches, and if not, provision * mock hashing method in unit test This is needed since some files in the fake file mounts don't actually exist, like the wheel path. * check config hash within provision with lock held * address other PR review comments * rename to skip_if_no_cluster_updates Co-authored-by: Zhanghao Wu <[email protected]> * add assert details Co-authored-by: Zhanghao Wu <[email protected]> * address PR comments and update docstrings * fix test * update docstrings Co-authored-by: Zhanghao Wu <[email protected]> * address PR comments * fix lint and tests * Update sky/backends/cloud_vm_ray_backend.py Co-authored-by: Zhanghao Wu <[email protected]> * refactor skip_if_no_cluster_update var * clarify comment * format exception --------- Co-authored-by: Zhanghao Wu <[email protected]> * format * format * format * fix --------- Co-authored-by: Christopher Cooper <[email protected]> Co-authored-by: Zongheng Yang <[email protected]>
The --fast behavior is now always enabled. This was unsafe before but since \skypilot-org#4289 it should be safe. We will remove the flag before 0.8.0 so that it never touches a stable version. sky launch still has the --fast flag. This flag is unsafe because it could cause setup to be skipped even though it should be re-run. In the managed jobs case, this is not an issue because we fully control the setup and know it will not change.
The result is that --fast will reprovision the cluster if some important things change (such as new cloud credentials or a new version of the skypilot wheel).
However, this does cause some performance regression in the --fast case since we need to go through the initial provisioning stages. That costs about 4s on my machine. I am looking into optimizing this - it's mostly an unnecessary roundtrip checking that the cluster is up.
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_minimal
pytest tests/test_smoke.py:: --managed-jobs
conda deactivate; bash -i tests/backward_compatibility_tests.sh