Introduce lib for gardener-node-agent gardener#8023
vknabel committed Jul 19, 2023
1 parent 6d5d601 commit c0f85a5
Showing 521 changed files with 73,708 additions and 2,487 deletions.
1 change: 1 addition & 0 deletions docs/README.md
@@ -17,6 +17,7 @@
* [Gardener Admission Controller](concepts/admission-controller.md)
* [Gardener Resource Manager](concepts/resource-manager.md)
* [Gardener Operator](concepts/operator.md)
* [Gardener Node Agent](concepts/node-agent.md)
* [Gardenlet](concepts/gardenlet.md)
* [Backup Restore](concepts/backup-restore.md)
* [etcd](concepts/etcd.md)
312 changes: 312 additions & 0 deletions docs/concepts/images/gardener-nodeagent-architecture.drawio.svg
59 changes: 59 additions & 0 deletions docs/concepts/node-agent.md
@@ -0,0 +1,59 @@
# Gardener Node Agent

The goal of the `gardener-node-agent` is to bootstrap a machine into a worker node and to maintain node-specific components that run on the node and are not managed by Kubernetes (e.g., the container runtime, the kubelet service, ...).

It is effectively a Kubernetes controller deployed onto the worker node.

## Basic Design

This section describes how the `gardener-node-agent` works, what its responsibilities are, and how it is installed onto the worker node.

To install the `gardener-node-agent` onto a worker node, a very small bash script called `gardener-node-init.sh` is placed on the node via cloud-init data. This script's sole purpose is to download and start the `gardener-node-agent`. The binary is downloaded as an [OCI artifact](https://github.com/opencontainers/image-spec/blob/main/manifest.md), which removes the dependency on `docker` on the worker node. Initially, two architectures of the `gardener-node-agent` are supported: `amd64` and `arm64`. The kubelet has to be provided as an OCI artifact in the same manner.
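When an artifact is published for several architectures, the matching one is typically selected from an OCI image index by its `os/architecture` platform. The init script itself is bash; the Go sketch below only illustrates that selection step. The helper name `ociPlatform` and the supported set (`amd64`, `arm64`) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// supportedArchitectures is an assumed set of architectures for which
// gardener-node-agent artifacts are published.
var supportedArchitectures = map[string]bool{
	"amd64": true,
	"arm64": true,
}

// ociPlatform returns the OCI platform string ("os/architecture") used to pick
// the matching manifest from an image index, or an error if no artifact exists.
func ociPlatform(goos, goarch string) (string, error) {
	if goos != "linux" {
		return "", fmt.Errorf("unsupported operating system %q", goos)
	}
	if !supportedArchitectures[goarch] {
		return "", fmt.Errorf("unsupported architecture %q", goarch)
	}
	return goos + "/" + goarch, nil
}

func main() {
	// runtime.GOOS and runtime.GOARCH describe the machine this binary runs on.
	platform, err := ociPlatform(runtime.GOOS, runtime.GOARCH)
	if err != nil {
		fmt.Println("cannot select artifact:", err)
		return
	}
	fmt.Println("selecting artifact for platform", platform)
}
```

Failing early on an unknown architecture gives a clear error on the node instead of a binary that cannot be executed.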

Along with the init script, a configuration for the `gardener-node-agent` is carried onto the worker node at `/etc/gardener/node-agent.config`. Among other things, this configuration contains the shoot's kube-apiserver endpoint, the corresponding certificates to communicate with it, and the bootstrap token for the kubelet.
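The exact format and fields of `/etc/gardener/node-agent.config` are not specified here. The following sketch assumes a simplified, hypothetical `NodeAgentConfig` type and uses JSON only so that it runs with the Go standard library. It shows how a typed configuration can be decoded strictly and validated in one place. The bootstrap token check follows the Kubernetes bootstrap token format (`[a-z0-9]{6}.[a-z0-9]{16}`).

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// NodeAgentConfig is a hypothetical, simplified configuration type.
// Field names and JSON keys are assumptions for illustration.
type NodeAgentConfig struct {
	APIServerURL   string `json:"apiServerURL"`
	CABundle       string `json:"caBundle"`
	BootstrapToken string `json:"bootstrapToken"`
}

// bootstrapTokenPattern matches the Kubernetes bootstrap token format "<id>.<secret>".
var bootstrapTokenPattern = regexp.MustCompile(`^[a-z0-9]{6}\.[a-z0-9]{16}$`)

// Validate collects all problems instead of stopping at the first one.
func (c NodeAgentConfig) Validate() error {
	var errs []error
	if u, err := url.Parse(c.APIServerURL); err != nil || u.Scheme != "https" || u.Host == "" {
		errs = append(errs, fmt.Errorf("apiServerURL must be an https URL, got %q", c.APIServerURL))
	}
	if strings.TrimSpace(c.CABundle) == "" {
		errs = append(errs, errors.New("caBundle must not be empty"))
	}
	if !bootstrapTokenPattern.MatchString(c.BootstrapToken) {
		errs = append(errs, errors.New("bootstrapToken has an invalid format"))
	}
	return errors.Join(errs...) // nil if errs is empty
}

// LoadConfig decodes strictly (unknown keys are rejected) and then validates.
func LoadConfig(data []byte) (NodeAgentConfig, error) {
	var cfg NodeAgentConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&cfg); err != nil {
		return NodeAgentConfig{}, fmt.Errorf("decoding config: %w", err)
	}
	if err := cfg.Validate(); err != nil {
		return NodeAgentConfig{}, err
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"apiServerURL":"https://api.example.com","caBundle":"<PEM data>","bootstrapToken":"abcdef.0123456789abcdef"}`)
	cfg, err := LoadConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("API server:", cfg.APIServerURL)
}
```

Rejecting unknown keys and reporting all validation errors at once makes a misconfigured node fail loudly during bootstrap rather than later.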

During the bootstrap phase, the `gardener-node-agent` sets itself up as a systemd service. It also executes tasks that must run before any other components are installed, e.g., formatting the data device for the kubelet.

After the bootstrap phase, the `gardener-node-agent` runs as a systemd service that watches secrets in the `kube-system` namespace. One of these secrets contains the `OperatingSystemConfig` (OSC) to reconcile. Such an OSC secret exists for every worker group of the shoot cluster and is named accordingly. Applying the OSC finally installs the kubelet and its configuration on the worker node.

## Architecture

![Design](./images/gardener-nodeagent-architecture.drawio.svg)

This figure visualizes the overall architecture of the `gardener-node-agent`. It starts with the machine-controller-manager (MCM) transferring the downloader OSC to a machine via its user data. The bootstrap phase of the `gardener-node-agent` then proceeds as described in the previous section.

## Reasoning

The `gardener-node-agent` replaces what were called the `cloud-config-downloader` and the `cloud-config-executor`, both written in `bash`. The sheer complexity of these two scripts, combined with their scalability and performance issues, called for their removal.

The new architecture brings several gains; the most important ones are described below.

### Developer Productivity

Because we develop in Go every day, writing business logic in `bash` is difficult, hard to maintain, and almost impossible to test. Getting rid of almost all `bash` scripts currently used in this critical part of the cluster creation process will speed up adding new features and fixing bugs.

### Speed

Until now, the `cloud-config-downloader` runs in a loop every 60 seconds to check whether something changed on the shoot that requires modifications on the worker node. This produces a lot of unneeded traffic on the API server and wastes time: it can take up to 60 seconds until a desired modification is started on the worker node.
By using controller-runtime, the agent can watch the `node`, the `OSC` in its `secret`, and the shoot access token in its `secret`. Only when one of these objects changes does the required action take place, and then it takes effect immediately.
This speeds up operations and dramatically reduces the load on the shoot's API server.
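In the agent, such change notifications come from controller-runtime watches. The stdlib-only sketch below imitates the idea with a channel of hypothetical `event` values: reconciliation runs as soon as an event arrives, duplicate notifications for an unchanged `resourceVersion` are skipped, and nothing runs while nothing changes. It is not the controller-runtime API, and the object names are made up.

```go
package main

import "fmt"

// event is a hypothetical, simplified stand-in for a watch notification on one
// of the objects the agent cares about (node, OSC secret, access token secret).
type event struct {
	Kind            string
	Name            string
	ResourceVersion string
}

// runReconcileLoop consumes events and calls reconcile only when an object's
// resourceVersion actually changed. There is no polling interval: work starts
// when an event arrives, and the loop is idle otherwise.
func runReconcileLoop(events <-chan event, reconcile func(event)) {
	lastSeen := map[string]string{}
	for ev := range events {
		key := ev.Kind + "/" + ev.Name
		if lastSeen[key] == ev.ResourceVersion {
			continue // duplicate notification, nothing changed
		}
		lastSeen[key] = ev.ResourceVersion
		reconcile(ev)
	}
}

func main() {
	events := make(chan event, 4)
	events <- event{"Secret", "osc-worker-a", "1"}
	events <- event{"Secret", "osc-worker-a", "1"} // duplicate, skipped
	events <- event{"Node", "node-1", "7"}
	events <- event{"Secret", "osc-worker-a", "2"}
	close(events)

	runReconcileLoop(events, func(ev event) {
		fmt.Printf("reconciling %s/%s at resourceVersion %s\n", ev.Kind, ev.Name, ev.ResourceVersion)
	})
}
```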

### Scalability

Currently, the `cloud-config-downloader` adds a random wait time before restarting the `kubelet` when the `kubelet` was updated or its configuration changed. This is required to reduce the load on the API server and the traffic on the internet uplink. It also reduces the overall downtime of the services in the cluster, because every `kubelet` restart puts a node into the `NotReady` state for several seconds, which may interrupt service availability.

```
TODO: The `gardener-node-agent` could do this much more intelligently because it watches the `node` object. The gardenlet could add an annotation that tells the `gardener-node-agent` to wait before restarting the kubelet, so that restarts happen in a coordinated manner. The coordination could be done in chunks of nodes: wait for one chunk to finish, then start the next. An equal spread over time is also possible.
```

The decision was made to keep the existing jitter mechanism, which calculates the `kubelet-download-and-restart-delay-seconds` in the controller itself.

### Correctness

The `cloud-config-downloader` is currently configured by placing one file per configuration item on the worker node's disk. This was done because reading a value from a file in `bash` boils down to something like `VALUE=$(cat /the/path/to/the/file)`: simple, but lacking validation, type safety, and so on.
With the `gardener-node-agent`, we introduce a new API, which is stored in the `gardener-node-agent` `secret` and persisted on disk as a single YAML file for comparison with the previously known state. This brings all the benefits of type-safe configuration.
Because the current and previous configurations are compared, files and units that are removed from the `OSC` are also removed and stopped on the worker node.
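The removal logic described above amounts to a set difference between the previously applied and the desired OSC. The sketch below models an OSC as two hypothetical maps (file path to content, unit name to content); it is a simplification, not the real `OperatingSystemConfig` type.

```go
package main

import (
	"fmt"
	"sort"
)

// oscState is a hypothetical, simplified view of an OperatingSystemConfig.
type oscState struct {
	Files map[string]string // file path -> content
	Units map[string]string // unit name -> content
}

// removedKeys returns the keys present in previous but absent from current, sorted.
func removedKeys(previous, current map[string]string) []string {
	var removed []string
	for k := range previous {
		if _, ok := current[k]; !ok {
			removed = append(removed, k)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	previous := oscState{
		Files: map[string]string{"/etc/a.conf": "a", "/etc/old.conf": "x"},
		Units: map[string]string{"kubelet.service": "...", "legacy.service": "..."},
	}
	current := oscState{
		Files: map[string]string{"/etc/a.conf": "a"},
		Units: map[string]string{"kubelet.service": "..."},
	}
	for _, f := range removedKeys(previous.Files, current.Files) {
		fmt.Println("delete file:", f)
	}
	for _, u := range removedKeys(previous.Units, current.Units) {
		fmt.Println("stop and disable unit:", u)
	}
}
```

Sorting the result makes the order of cleanup actions stable across runs, which simplifies logging and testing.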

### Availability

Previously, the `cloud-config-downloader` simply restarted the systemd units on every change to the `OSC`, regardless of which services changed. The `gardener-node-agent` first checks which systemd units changed and restarts only those. This avoids unneeded `kubelet` restarts.
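Restarting only what changed can be sketched by comparing each desired unit's content with what was applied before. Units that are new or whose content differs are selected; unchanged units, such as the kubelet in the example, keep running. The function name and the unit contents are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// changedUnits returns the units that are new or whose content differs
// between the previously applied and the desired configuration, sorted by name.
func changedUnits(previous, desired map[string]string) []string {
	var changed []string
	for name, content := range desired {
		if old, existed := previous[name]; !existed || old != content {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	previous := map[string]string{
		"kubelet.service":    "ExecStart=/opt/bin/kubelet --v=2",
		"containerd.service": "ExecStart=/usr/bin/containerd",
	}
	desired := map[string]string{
		"kubelet.service":    "ExecStart=/opt/bin/kubelet --v=2",
		"containerd.service": "ExecStart=/usr/bin/containerd --log-level=debug",
	}
	for _, unit := range changedUnits(previous, desired) {
		fmt.Println("restart", unit) // only containerd.service; the kubelet keeps running
	}
}
```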
41 changes: 29 additions & 12 deletions go.mod
@@ -27,20 +27,20 @@ require (
github.com/onsi/gomega v1.27.6
github.com/prometheus/client_golang v1.14.0
github.com/robfig/cron v1.2.0
github.com/spf13/cobra v1.6.1
github.com/spf13/cobra v1.7.0
github.com/spf13/pflag v1.0.5
github.com/spf13/viper v1.11.0
github.com/texttheater/golang-levenshtein v1.0.1
go.uber.org/automaxprocs v1.5.1
go.uber.org/goleak v1.2.0
go.uber.org/zap v1.24.0
golang.org/x/crypto v0.6.0
golang.org/x/text v0.8.0
golang.org/x/text v0.9.0
golang.org/x/time v0.3.0
golang.org/x/tools v0.7.0
golang.org/x/tools v0.8.0
gomodules.xyz/jsonpatch/v2 v2.2.0
gonum.org/v1/gonum v0.12.0
google.golang.org/protobuf v1.28.1
google.golang.org/protobuf v1.30.0
istio.io/api v0.0.0-20230217221049-9d422bf48675
istio.io/client-go v1.17.1
k8s.io/api v0.26.3
@@ -69,17 +69,28 @@ require (
)

require (
github.com/BurntSushi/toml v1.0.0 // indirect
github.com/google/go-containerregistry v0.15.2
github.com/spf13/afero v1.8.2
golang.org/x/exp v0.0.0-20230213192124-5e25df0256eb
)

require (
github.com/BurntSushi/toml v1.2.1 // indirect
github.com/Masterminds/goutils v1.1.1 // indirect
github.com/NYTimes/gziphandler v1.1.1 // indirect
github.com/antlr/antlr4/runtime/Go/antlr v1.4.10 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/blang/semver/v4 v4.0.0 // indirect
github.com/cenkalti/backoff/v4 v4.1.3 // indirect
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/containerd/stargz-snapshotter/estargz v0.14.3 // indirect
github.com/coreos/go-semver v0.3.0 // indirect
github.com/cyphar/filepath-securejoin v0.2.2 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/docker/cli v23.0.5+incompatible // indirect
github.com/docker/distribution v2.8.1+incompatible // indirect
github.com/docker/docker v23.0.5+incompatible // indirect
github.com/docker/docker-credential-helpers v0.7.0 // indirect
github.com/dsnet/compress v0.0.1 // indirect
github.com/elazarl/goproxy v0.0.0-20191011121108-aa519ddbe484 // indirect
github.com/emicklei/go-restful/v3 v3.10.1 // indirect
@@ -99,6 +110,7 @@ require (
github.com/go-task/slim-sprig v0.0.0-20230315185526-52ccab3ef572 // indirect
github.com/gobuffalo/flect v0.3.0 // indirect
github.com/gobwas/glob v0.2.3 // indirect
github.com/godbus/dbus/v5 v5.0.4 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.3 // indirect
github.com/golang/snappy v0.0.4 // indirect
@@ -112,22 +124,26 @@ require (
github.com/hashicorp/hcl v1.0.0 // indirect
github.com/huandu/xstrings v1.3.2 // indirect
github.com/imdario/mergo v0.3.12 // indirect
github.com/inconshreveable/mousetrap v1.0.1 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/klauspost/compress v1.16.5 // indirect
github.com/magiconair/properties v1.8.6 // indirect
github.com/mailru/easyjson v0.7.6 // indirect
github.com/mattn/go-colorable v0.1.12 // indirect
github.com/mattn/go-isatty v0.0.14 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.2 // indirect
github.com/mitchellh/copystructure v1.2.0 // indirect
github.com/mitchellh/go-homedir v1.1.0 // indirect
github.com/mitchellh/mapstructure v1.4.3 // indirect
github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/moby/spdystream v0.2.0 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/nwaples/rardecode v1.1.2 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/image-spec v1.1.0-rc3 // indirect
github.com/pelletier/go-toml v1.9.4 // indirect
github.com/pelletier/go-toml/v2 v2.0.0-beta.8 // indirect
github.com/pierrec/lz4 v2.6.1+incompatible // indirect
@@ -136,12 +152,13 @@ require (
github.com/prometheus/common v0.37.0 // indirect
github.com/prometheus/procfs v0.8.0 // indirect
github.com/russross/blackfriday/v2 v2.1.0 // indirect
github.com/spf13/afero v1.8.2 // indirect
github.com/sirupsen/logrus v1.9.0 // indirect
github.com/spf13/cast v1.4.1 // indirect
github.com/spf13/jwalterweatherman v1.1.0 // indirect
github.com/stoewer/go-strcase v1.2.0 // indirect
github.com/subosito/gotenv v1.2.0 // indirect
github.com/ulikunitz/xz v0.5.10 // indirect
github.com/vbatts/tar-split v0.11.3 // indirect
github.com/xi2/xz v0.0.0-20171230120015-48954b6210f8 // indirect
go.etcd.io/etcd/api/v3 v3.5.5 // indirect
go.etcd.io/etcd/client/pkg/v3 v3.5.5 // indirect
@@ -158,12 +175,12 @@ require (
go.opentelemetry.io/proto/otlp v0.19.0 // indirect
go.uber.org/atomic v1.9.0 // indirect
go.uber.org/multierr v1.7.0 // indirect
golang.org/x/mod v0.9.0 // indirect
golang.org/x/net v0.8.0 // indirect
golang.org/x/oauth2 v0.4.0 // indirect
golang.org/x/mod v0.10.0 // indirect
golang.org/x/net v0.9.0 // indirect
golang.org/x/oauth2 v0.7.0 // indirect
golang.org/x/sync v0.1.0 // indirect
golang.org/x/sys v0.6.0 // indirect
golang.org/x/term v0.6.0 // indirect
golang.org/x/sys v0.7.0 // indirect
golang.org/x/term v0.7.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
google.golang.org/genproto v0.0.0-20230110181048-76db0878b65f // indirect
google.golang.org/grpc v1.53.0 // indirect