Make existing node into an cluster #2542

Closed
wisechat-eng opened this issue Sep 11, 2023 · 12 comments

Comments

@wisechat-eng

I have a few existing nodes running in GCP, and I want to turn them into a cluster so that I can control them with SkyPilot. However, I'm not sure what the best way to configure the existing nodes is.

@romilbhardwaj
Collaborator

Hi @wisechat-eng - we currently do not support using SkyPilot to manage externally created VMs in GCP. Can you share a bit more about your use case and why you want to manage existing nodes instead of creating new ones with SkyPilot?

@wisechat-eng
Author

Hi, we have a lot of code and environment already set up on the existing node. I want to migrate to having SkyPilot manage the node as a cluster. Instead of creating a new one, I'm wondering if I can just reuse the old one and have SkyPilot connect to it as a cluster.

@Michaelvll
Collaborator

Michaelvll commented Sep 11, 2023

Hey @wisechat-eng, one quick way to get around this is to clone your original node with the following steps:

  1. Create a machine image for your existing node.
  2. Launch a skypilot cluster with that machine image by:
sky launch --cloud gcp -c <cluster-name> \
    --image-id projects/<your-project-id>/global/machineImages/<your-machine-image-name> \
    --instance-type <instance-type-of-your-original-instance>

With the command above, you will clone your previous node. All the code and environment will be preserved in the new cluster managed by SkyPilot, and you can then safely delete your old node.
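The two steps above can be sketched as a small dry-run script. This is only an illustration: every value (project ID, zone, instance and image names, instance type) is a hypothetical placeholder, and the `gcloud compute machine-images create` call is one common way to do step 1 that is not taken from this thread. The script prints the commands instead of running them, so nothing is created until you copy them out and adjust them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands for cloning an existing GCP node
# into a SkyPilot cluster. All values below are hypothetical placeholders.
set -euo pipefail

PROJECT_ID="my-project"            # assumption: your GCP project ID
ZONE="us-central1-a"               # assumption: zone of the existing node
SOURCE_INSTANCE="my-old-node"      # assumption: name of the existing node
IMAGE_NAME="my-old-node-image"     # assumption: name for the new machine image
CLUSTER_NAME="my-cluster"          # assumption: SkyPilot cluster name
INSTANCE_TYPE="n1-standard-8"      # assumption: type of the original instance

# Build the machine image path in the format passed to --image-id above.
image_path() {
  printf 'projects/%s/global/machineImages/%s' "$1" "$2"
}

# Print both steps rather than executing them.
print_clone_commands() {
  # Step 1: create a machine image from the existing node.
  echo "gcloud compute machine-images create ${IMAGE_NAME} --project=${PROJECT_ID} --source-instance=${SOURCE_INSTANCE} --source-instance-zone=${ZONE}"
  # Step 2: launch a SkyPilot cluster from that machine image.
  echo "sky launch --cloud gcp -c ${CLUSTER_NAME} --image-id $(image_path "$PROJECT_ID" "$IMAGE_NAME") --instance-type ${INSTANCE_TYPE}"
}

print_clone_commands
```

Printing first makes it easy to double-check the image path before anything is provisioned.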

@wisechat-eng
Author

Makes sense, thanks Michael. Another use case is that my colleague and I may want to use SkyPilot to control the same node. If the cluster is created from my side, he will not see it as a cluster.

@Michaelvll
Collaborator

We don't officially support sharing a SkyPilot cluster across users, but as a workaround you can share your ~/.sky folder with your colleague and add your colleague's ~/.ssh/sky-key.pub to ~/.ssh/authorized_keys on your node.

Hopefully, both of you will then be able to see the same cluster in sky status and operate it with SkyPilot. Please keep in mind that this may cause some undefined behavior, although I think it should work.
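For the authorized_keys part of this workaround, a minimal sketch could look like the following. The helper name `add_authorized_key` is hypothetical, and the sketch assumes the colleague's public key file has already been copied to the node and contains a single line. It appends the key only if it is not already present, so running it twice does not duplicate the entry.

```shell
#!/usr/bin/env bash
# Sketch: idempotently add a public key to an authorized_keys file.
# add_authorized_key is a hypothetical helper, not part of SkyPilot.
set -euo pipefail

add_authorized_key() {
  local pubkey_file="$1" auth_file="$2"
  local key
  key="$(cat "$pubkey_file")"

  # Make sure the target file exists with permissions sshd will accept.
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  chmod 600 "$auth_file"

  # Skip if this exact key line is already present.
  if grep -qxF -- "$key" "$auth_file"; then
    return 0
  fi

  # If the file does not end with a newline, add one before appending.
  if [ -s "$auth_file" ] && [ -n "$(tail -c1 "$auth_file")" ]; then
    echo >> "$auth_file"
  fi
  printf '%s\n' "$key" >> "$auth_file"
}

# Example, run on the node (the source path is an assumption):
#   add_authorized_key /tmp/colleague-sky-key.pub "$HOME/.ssh/authorized_keys"
```

Sharing the ~/.sky folder itself is left out here, since how you transfer it depends on your setup.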

@concretevitamin
Member

@wisechat-eng To understand more, why do you want to share a cluster? Is it due to cost/quota reasons (where launching a new node is not ideal), or is it for collaboration/debugging?

@bastienjalbert

@concretevitamin I'm not wisechat, but honestly both use cases are worth considering.

Another use case could be having a Databricks cluster with already provisioned GPUs and wanting to use it with SkyPilot's cluster capabilities instead of creating standalone VMs to run tasks. That way, SkyPilot would run as an agent on the target runtime and be usable from the local command line.

@asaiacai
Contributor

@concretevitamin I'm working with a couple of startups that use bare metal machines, and they would like to be able to use them through the SkyPilot interface. Right now they either launch using Slurm or log into each node separately, but Slurm is quite bloated for this use case and is annoying to maintain, so something that only requires existing SSH access to launch the SkyPilot runtime would be really attractive, I think.

@romilbhardwaj
Collaborator

@asaiacai - that's interesting. If the bare metal machines are self-owned/long-term rentals, have you considered deploying Kubernetes on them and then using SkyPilot + Kubernetes support? Kubernetes' extensive support for devops and observability tooling keeps the ops teams happy, while SkyPilot can support ML engineers who do not need to deal with Kubernetes APIs.

@github-actions github-actions bot added the Stale label Apr 22, 2024

@github-actions github-actions bot closed this as not planned May 3, 2024
@Michaelvll Michaelvll removed the Stale label Sep 17, 2024
@Michaelvll
Collaborator

Re-open this issue. Related to #3926

6 participants