Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize docker initialization #3823

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

camelop
Copy link

@camelop camelop commented Aug 9, 2024

Reduce ssh interaction round number on the happy path from 14 to 5 for docker initialization.
Related issue: #3661

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below) - No
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_docker_preinstalled_package
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this PR @camelop! This is awesome. Left several comments. : )

sky/provision/docker_utils.py Show resolved Hide resolved
if [[ $docker_installed -eq 1 ]]
then
# wait for docker daemon to be ready
docker_ready=0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming this to docker_socket_ready?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, renamed.


# SkyPilot: Docker login if user specified a private docker registry.
cmd_login = ''
if 'docker_login_config' in self.docker_config:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine the following into the same call as the _check_host_status?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I am wondering as we may get a longer commands to run, will it be more readable if we explicitly create a enum as a state machine to indicate the stage we are in, and we can do post processing based on the state.

Copy link
Author

@camelop camelop Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's combined to the next "action" call.
Currently it is following the structure of

  1. remote read-only status checking with sh
  2. local operation decision making in Python
  3. remote action

To further save SSH calls, I would recommend completely refactor the execution structure into the following:

  1. gather all local Python execution demands and render an execution script with it (e.g. with templating from Jinja2, similar to https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_templating.html ). The script should potentially execute different branches depending on status on the target machine.
  2. once the script is rendered ready, send it to remote for execution
  3. gather the result in one round to summarize whether the change has been successfully replied

I would expect such change to be a global coding style / design pattern change, so not sure if it is a good fit for a minor opt PR like this.

Comment on lines +255 to +261
if 'container_status' in status:
if 'Exited' in status['container_status']:
logger.info('Container is exited but not removed.')
self.initialized = True
self._run(f'{self.docker_cmd} start {self.container_name}')
self._run('sudo service ssh start', run_env='docker')
return self._run('whoami', run_env='docker')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be integrated into the check_host_status command as well?

Copy link
Author

@camelop camelop Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not on the happy path though, but also follow the above design pattern (the action part in step 3).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants