support for control plane highly available #260
Comments
It used to have MySQL, but it became somewhat difficult to maintain, so we kind of dropped MySQL support. But if you're trying to use it in k8s, I encourage you to use the garm-operator. The operator pretty much treats GARM as stateless and syncs the sqlite DB using the info it has stored in etcd. The current push to move some things from the config to the DB is being done in order to eventually have GARM scale-out. So scaling out GARM is on the TODO list and we're working towards that, but even in the current state, it handles a large amount of runners with ease. |
Great, thanks for the quick reply! I tried the k8s operator, but as I understand it, it also requires a GARM instance that runs outside the cluster or is otherwise reachable. Is this correct? |
You can have GARM run inside k8s without a problem. Have a look here: https://github.com/mercedes-benz/garm-provider-k8s/blob/main/DEVELOPMENT.md The instructions use tilt to bootstrap a local development environment along with garm, the operator and the k8s provider. You can use that as a starting point and expand to other providers you may need. We need to add some proper docs in one place that gives a nice walk-through for the various cases. |
Thanks for sharing that! Would you say that guide is the only thing I need, and that it is enough to get started? I can improve the docs once I get familiar with it. |
That should get you up and running with a fully functional GARM on k8s + operator. I usually run it as stand-alone, but I did manage to get it running using that guide. @bavarianbidi may be able to chime in with more details. His wonderful team develops the k8s integration (operator and provider). |
Are you using any specific commit? I can't get garm deployed.
Garm should be deployed according to step 3) in https://github.com/mercedes-benz/garm-provider-k8s/blob/main/DEVELOPMENT.md#getting-started 🤔 |
I used But other than that, I just installed, docker, kubectl, tilt, go and went through the steps. |
You can also edit the existing config map:
kubectl -n garm-server edit configmap garm-configuration
and add it. Then remove the failing containers. At the end you should have something like:
root@garm-deleteme:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-5bd57786d4-jmwdj 1/1 Running 0 58m
cert-manager cert-manager-cainjector-57657d5754-89fwt 1/1 Running 0 58m
cert-manager cert-manager-webhook-7d9f8748d4-npk9b 1/1 Running 0 58m
garm-operator-system garm-operator-controller-manager-69fbd5c478-ctlqt 1/1 Running 0 47m
garm-server garm-server-5b84b7f66-r7mxp 1/1 Running 0 48m
kube-system coredns-5dd5756b68-g8k87 1/1 Running 0 58m
kube-system coredns-5dd5756b68-wzxwj 1/1 Running 0 58m
kube-system etcd-garm-control-plane 1/1 Running 0 59m
kube-system kindnet-7r7mh 1/1 Running 0 58m
kube-system kube-apiserver-garm-control-plane 1/1 Running 0 59m
kube-system kube-controller-manager-garm-control-plane 1/1 Running 0 59m
kube-system kube-proxy-jz67s 1/1 Running 0 58m
kube-system kube-scheduler-garm-control-plane 1/1 Running 0 59m
local-path-storage local-path-provisioner-7577fdbbfb-9bpx4 1/1 Running 0 58m
|
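A quick way to confirm that a listing like the one above is healthy is to scan it for pods that are not fully ready. The following is only a sketch: `pods_not_running` is a hypothetical helper, and it assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...). Note that it would also flag `Completed` pods, which may be fine in your cluster.

```python
def pods_not_running(kubectl_output: str) -> list[str]:
    """Return "namespace/name" for every pod that is not Running and fully ready.

    Assumes the default `kubectl get pods -A` column order; this is an
    illustrative helper, not part of GARM or the operator.
    """
    bad = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 4:
            continue  # ignore blank or truncated lines
        namespace, name, ready, status = cols[:4]
        have, want = ready.split("/")  # e.g. "1/1" -> ("1", "1")
        if status != "Running" or have != want:
            bad.append(f"{namespace}/{name}")
    return bad


if __name__ == "__main__":
    # The first pod line is taken from the thread; the second is made up
    # to show what a failing pod would look like.
    sample = """NAMESPACE NAME READY STATUS RESTARTS AGE
garm-server garm-server-5b84b7f66-r7mxp 1/1 Running 0 48m
garm-server garm-server-example 0/1 CrashLoopBackOff 5 3m
"""
    print(pods_not_running(sample))
```

In practice you would feed it the captured output of `kubectl get pods -A` and treat an empty list as "everything is up".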
Thanks for the tip about the configmap. I was able to fix that part. But now I'm getting a different error:
I created a PAT (classic token) but I'm not sure what's going on. I followed this: https://github.com/mercedes-benz/garm-operator/blob/main/DEVELOPMENT.md#%EF%B8%8F-bootstrap-garm-server-with-garm-provider-k8s-for-local-development. Did you use a GitHub App for authentication? |
I used PAT auth. Make sure that the PAT you're using has access to the org/repo/enterprise you're creating and that you enabled the required scopes when creating the PAT. See: https://github.com/cloudbase/garm/blob/main/doc/github_credentials.md |
ahh. I think I know what's happening. The operator is not yet updated to take into account the recent changes to GARM regarding the URLs. Try adding: webhook_url = "http://garm-server.garm-server.svc:9997/webhooks" here: If you can connect using garm-cli to the garm server, you can also update using |
I think it would be best if you switch garm to v0.1.4. The You can set |
also, to get webhooks from GitHub, you'll most likely need an ingress controller and a cluster IP set on the GARM server. Then you'll need to add your webhook in GitHub to point to your GARM webhook URL. See: https://github.com/cloudbase/garm/blob/v0.1.4/doc/webhooks.md |
That's right, but would I need a webhook to have a pool of runners working? I'm not sure. BTW, thanks to your help it worked! I'm noticing these are configured as ephemeral runners by default:
Do you think we could have GitHub Apps supported? It looks like support is already there on the garm-server side, but we're missing some bits between the release of v0.1.5 and garm-provider-k8s. |
You don't need webhooks for pools to work, but you do need them to know when to spin up a runner and when to delete it. Otherwise you'll have huge delays between when a job is started and when a runner is spun up. Github app support will probably be added once 0.1.5 is released, depending on how much time the nice folks from mercedes-benz have. |
GARM only spins up ephemeral runners. No persistent runners. |
In any case, the runners that were spawned didn't run anything. Log:
They did register with github.com, but they were not able to run any workflow 😢 |
I think the The upstream image: https://github.com/mercedes-benz/garm-provider-k8s/tree/main/runner/upstream To disable JIT, add: disable_jit_config = true In the provider section of the config: |
Context for the image: |
@pathcl you will most likely need to apply this patch as well: mercedes-benz/garm-provider-k8s#52
To build:
cd garm-provider-k8s/runner/upstream
docker build -t localhost:5000/runner-default:latest .
docker push localhost:5000/runner-default:latest
Then just apply the new image:
kubectl -n garm-operator-system patch image runner-default --type=merge --patch '{"spec": { "tag": "localhost:5000/runner-default:latest"}}'
And you should be fine with both JIT and registration token. |
Thanks! It worked, and now I can see idle runners. However, I don't see jobs being picked up. I used
Did you have to change anything else? |
try targeting just: |
FYI, until you set up the webhook endpoint, GARM won't be able to autoscale. You'll still get some cleanup/min-idle-runners. But it will be only when GARM consolidates instead of reacting right away. |
I was finally able to run a workflow! Thanks so much. Do we have docs for configuring the webhook endpoint? At this point I only see two things in my setup
|
I'm expecting these runners to be ephemeral but it seems idle runners are not being recreated once they've been used. Shouldn't we have always some runners waiting for jobs? |
GARM doesn't know that the runner has finished running a job if webhooks don't work. They will eventually be reaped by the consolidation loop that looks in github and locally and kills used runners. Then the same consolidation loop will create missing runners based on If you set up your webhooks, this will happen automatically, right away. |
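To make the behaviour described above concrete, here is a minimal sketch of what one pass of such a consolidation loop could look like. This is not GARM's actual implementation: the `Runner` and `Pool` types, the status names, and the `min_idle_runners` / `max_runners` fields are assumptions chosen for illustration (loosely modelled on the "min-idle-runners" setting mentioned earlier in the thread).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Runner:
    name: str
    status: str  # hypothetical states: "idle", "active", or "used"


@dataclass
class Pool:
    min_idle_runners: int
    max_runners: int
    runners: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def consolidate(self):
        """One pass: reap used runners, then top up idle runners.

        Returns (reaped_names, created_names).
        """
        # Step 1: remove runners that have already executed a job.
        reaped = [r.name for r in self.runners if r.status == "used"]
        self.runners = [r for r in self.runners if r.status != "used"]

        # Step 2: create idle runners until the minimum is met,
        # without exceeding the pool's overall cap.
        idle = sum(1 for r in self.runners if r.status == "idle")
        created = []
        while idle < self.min_idle_runners and len(self.runners) < self.max_runners:
            runner = Runner(f"runner-{next(self._ids)}", "idle")
            self.runners.append(runner)
            created.append(runner.name)
            idle += 1
        return reaped, created


if __name__ == "__main__":
    pool = Pool(
        min_idle_runners=2,
        max_runners=5,
        runners=[Runner("a", "used"), Runner("b", "idle")],
    )
    print(pool.consolidate())
```

Without webhooks, a pass like this only runs periodically, which is why used runners linger and replacements show up late. With webhooks, the same reap-and-replace work can be triggered as soon as GitHub reports that a job has finished.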
There are 2 ways to set up webhooks: in both cases, your webhook endpoint must be accessible by GitHub. |
You can access the GARM API directly by running the following steps:
Get the GARM admin password:
grep 'garm-password=' ~/garm-provider-k8s/hack/local-development/kubernetes/garm-operator-all.yaml | sed 's/.*=//g'
Exec into the garm-server pod:
kubectl -n garm-server exec -it garm-server-5b84b7f66-rxxxp -- sh
Replace the pod name with your own. Then, log into the GARM server using the GARM CLI:
garm-cli profile add --name garm --password <your_garm_password> --url http://garm-server.garm-server.svc:9997/ --username admin
Then you can view info about your controller, install webhooks, etc.:
garm-cli controller-info show
Make sure that the if your webhook URL is already accessible by GitHub and your PAT allows webhook management, you can run
garm-cli org webhook install <org_id> |
There is an explanation about the URLs here: https://github.com/cloudbase/garm/blob/main/doc/using_garm.md#controller-operations |
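As a rough illustration of how the controller URLs relate to the GARM base URL: the `/webhooks` suffix below matches the `webhook_url` suggested earlier in this thread. The `/api/v1/metadata` and `/api/v1/callbacks` paths are assumptions based on the conventions described in the linked GARM docs, so please verify them there before relying on them. `controller_urls` is a hypothetical helper, not a GARM API.

```python
def controller_urls(base_url: str) -> dict:
    """Derive the controller URLs from a GARM base URL.

    Only the "/webhooks" path is confirmed by this thread; the metadata
    and callback paths are assumptions to be checked against the docs.
    """
    base = base_url.rstrip("/")  # tolerate a trailing slash
    return {
        "webhook_url": f"{base}/webhooks",
        "metadata_url": f"{base}/api/v1/metadata",    # assumption
        "callback_url": f"{base}/api/v1/callbacks",   # assumption
    }


if __name__ == "__main__":
    for key, value in controller_urls("http://garm-server.garm-server.svc:9997/").items():
        print(f"{key} = {value}")
```

For webhooks to work, whatever base URL you use for `webhook_url` must be reachable by GitHub, so in a real setup it would be the externally exposed address rather than the in-cluster service name.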
If you're using
This will allow you to use the same GARM instance with multiple providers like Azure, GCP, OpenStack, OCI, etc. |
Thanks for the detailed explanation! By any chance, have you tried the GARM k8s operator with a runner image in a private registry? I'm trying to figure out whether the Image CRD needs imagePullSecrets. |
I have not tried, but I see there is an issue open here: mercedes-benz/garm-provider-k8s#6 You might try to add a comment there with your use case. |
sorry, didn't follow the entire conversation here 🙈
@pathcl are there any other questions open regarding the |
Dear folks,
I'm reading through the GARM codebase and spotted that there's support for MySQL. Is that enough to configure GARM as a highly available control plane? My use case is on top of k8s.