installation of openfl director keeps failing (k8s deployment) #36

Open
kta-intel opened this issue Mar 29, 2023 · 4 comments

Comments

@kta-intel

I am trying to deploy OpenFL on FedLCM. I set LIFECYCLEMANAGER_EXPERIMENT_ENABLED to true in the k8s_deploy.yaml for the backend.

I followed the instructions listed here: https://github.com/FederatedAI/FedLCM/blob/main/doc/OpenFL_Guide.md but the installation of the director keeps failing. I am unsure how to troubleshoot. Do you have any insights, or do you have advice for setting the director parameters?

This is the error description:

failed to install openfl director, error: job is Failed, job info: &{93231661-4d62-49a6-88d0-50fd70788bc8 2023-03-29 21:32:39.121 +0000 UTC 0001-01-01 00:00:00 +0000 UTC ClusterInstall ef50e111-9122-4a41-b22d-eac5525862b9 admin map[director:{director Running Undefined 2023-03-29 21:32:39.121 +0000 UTC 0001-01-01 00:00:00 +0000 UTC} notebook:{notebook Running Undefined 2023-03-29 21:32:39.121 +0000 UTC 0001-01-01 00:00:00 +0000 UTC}] Failed 1h0m0s 0xc0005602a0 [update job status to Running create Cluster in DB Success overwrite current installation helm install Success checkout Cluster status [3362] checkout Cluster status timeOut!] {3 2023-03-29 21:32:39.122 +0000 UTC 2023-03-29 22:32:40.015 +0000 UTC {0001-01-01 00:00:00 +0000 UTC false}}}

Thank you.

@wfangchi
Collaborator

Thanks for using FedLCM's OpenFL support! This is most likely because we currently keep FedLCM's OpenFL container image in a private registry, so your cluster cannot pull it. We are exploring options for making it public, which can be discussed during OpenFL's community meeting next week. I will update here once we reach a decision.
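
To check whether an image pull is what stalls the install, one option is to look for containers waiting with `ErrImagePull` or `ImagePullBackOff` on the target cluster. Below is a minimal sketch, not part of FedLCM: the helper names `find_image_pull_failures` and `load_pods_from_kubectl` are hypothetical, and it assumes `kubectl` is on your PATH and pointed at the cluster where the director is being installed.

```python
import json
import subprocess

# Waiting reasons Kubernetes reports when a container image cannot be pulled.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def find_image_pull_failures(pod_list):
    """Return (namespace, pod, image, reason) for containers stuck on an image pull.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    """
    failures = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        # Init containers can fail to pull as well, so check both lists.
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") in IMAGE_PULL_REASONS:
                failures.append(
                    (meta.get("namespace"), meta.get("name"),
                     cs.get("image"), waiting["reason"])
                )
    return failures


def load_pods_from_kubectl():
    """Hypothetical helper: fetch all pods as JSON (assumes kubectl is configured)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `find_image_pull_failures(load_pods_from_kubectl())` returns one tuple per affected container. An empty list suggests the timeout has a different cause, in which case `kubectl describe pod` on the director pod is the next place to look.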

@kta-intel
Author

Great, thanks for the information, and I appreciate your quick response! I will await your update.
In the meantime, I actually support OpenFL efforts from the Intel side, so let me know if there's anything I can do to help in this process.

@wfangchi
Collaborator

The OpenFL community kindly offered to create a new org account in Docker Hub where we can host our images. We will do that once the org account is set up.

In the meantime, we provide a way to build the image locally and push it to any registry of your choice. You can use the current develop-v0.3.0 branch to test FedLCM and follow this section to build the image: https://github.com/FederatedAI/FedLCM/blob/develop-0.3.0/doc/OpenFL_Guide.md#preparing-the-fedlcm-openfl-image-locally--using-you-own-registry . This image is based on the OpenFL v1.5 release, which FedLCM v0.3 requires (v0.3 is still under development, but its OpenFL support is complete).

If you are interested, you can try this approach. Otherwise, once v0.3.0 is released, we will upload the image and use the future OpenFL Docker Hub org account as the default registry address.

@kta-intel
Author

Sorry, I somehow missed your response. Thank you so much for your help. I will give this a try.
