Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

By hashing the config values and files, only restart services on differences #31

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
193 changes: 162 additions & 31 deletions charms/kubernetes_snaps.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
import contextlib
import hashlib
import ipaddress
import json
import logging
import os
import re
from base64 import b64encode
from functools import lru_cache
from pathlib import Path
from socket import getfqdn, gethostname
from subprocess import DEVNULL, CalledProcessError, call, check_call, check_output
Expand Down Expand Up @@ -100,7 +103,6 @@ def configure_apiserver(
enc_provider_config = encryption_config_path()
if enc_provider_config.exists():
api_opts["encryption-provider-config"] = str(enc_provider_config)

api_opts["advertise-address"] = advertise_address

api_opts["etcd-cafile"] = "/root/cdk/etcd/client-ca.pem"
Expand Down Expand Up @@ -192,7 +194,36 @@ def configure_apiserver(
else:
remove_if_exists(audit_webhook_conf_path)

configure_kubernetes_service("kube-apiserver", api_opts, extra_args_config)
config_files = set(
map(
api_opts.get,
(
"tls-cert-file",
"tls-private-key-file",
"kubelet-certificate-authority",
"kubelet-client-certificate",
"kubelet-client-key",
"authentication-token-webhook-config-file",
"service-account-signing-key-file",
"service-account-key-file",
"encryption-provider-config",
"etcd-cafile",
"etcd-keyfile",
"etcd-certfile",
"authorization-webhook-config-file",
"requestheader-client-ca-file",
"proxy-client-cert-file",
"proxy-client-key-file",
"client-ca-file",
"audit-policy-file",
"audit-webhook-config-file",
),
)
)
Comment on lines +196 to +221
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these keys point to config files that may have been overwritten during the reconcile period for the api server

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, how do you know that you didn't miss one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know... that's what i'm SO afraid of. Missing one would be bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eaudetcobello do you have any ideas/opinions on how to not miss a file? I could "scan all the arguments looking for file like strings" but that sounds not great

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I think this could become unmanageable in the future and might come back to bite us. Looking at the API server options, there seem to be a few common patterns:

  • Options with the file suffix
  • Options with the config suffix
  • Options that include certificate in the name
  • Options that include key in the name

It might be a good idea to check if an option has any of these indicators before trying to process the file. Here's an example: https://regex101.com/r/m5Gq9m/1

Copy link
Member Author

@addyess addyess Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the full list of args that can be files are listed here:
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

Some with comments like:
File with
Path to
File containing
file used

And the hits just keep coming. The docs and arguments names aren't consistent. Maybe in the go code there's a list -- but with every k8s release -- more and more arguments may or may not be config files. Yes -- each config file will need to be ensured it gets added to the list and the regex could be a cool way, but there's no guarantee upstream k8s will follow it.

Copy link
Member Author

@addyess addyess Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, there's another argument here which I expressly excluded because it's not a config file but a log file: audit-log-path

We do not want to watch for changes in this file b/c it's a log -- always changing means the api server reboots every time.

I think this list requires brain-power or some other comprehensive list of arguments which are absolutely treated as "read" by the api-server when restarted


configure_kubernetes_service(
"kube-apiserver", api_opts, extra_args_config, config_files
)


def configure_controller_manager(
Expand Down Expand Up @@ -231,10 +262,26 @@ def configure_controller_manager(

controller_opts["feature-gates"] = ",".join(feature_gates)

config_files = set(
map(
controller_opts.get,
(
"root-ca-file",
"kubeconfig",
"authorization-kubeconfig",
"authentication-kubeconfig",
"service-account-private-key-file",
"tls-cert-file",
"tls-private-key-file",
),
)
)

Comment on lines +264 to +278
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these keys point to config files that may have been overwritten during the reconcile period for the controller-manager

configure_kubernetes_service(
"kube-controller-manager",
controller_opts,
extra_args_config,
config_files,
)


Expand Down Expand Up @@ -286,10 +333,13 @@ def configure_kube_proxy(
f.write("# Generated by kubernetes-common library, do not edit\n")
yaml.dump(kube_proxy_config, f)

config_files = {
kube_proxy_opts["config"],
kubeconfig,
}
Comment on lines +335 to +338
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are config files that may have been overwritten during the reconcile period for the kube-proxy service


configure_kubernetes_service(
"kube-proxy",
kube_proxy_opts,
extra_args_config,
"kube-proxy", kube_proxy_opts, extra_args_config, config_files
)


Expand Down Expand Up @@ -375,38 +425,23 @@ def configure_kubelet(
f.write("# Generated by charm, do not edit\n")
yaml.dump(kubelet_config, f)

config_files = {
kubelet_opts["config"],
kubeconfig,
"/run/systemd/resolve/resolv.conf",
"/root/cdk/ca.crt",
"/root/cdk/server.crt",
"/root/cdk/server.key",
}

configure_kubernetes_service(
"kubelet",
kubelet_opts,
extra_args_config,
config_files,
)


def configure_kubernetes_service(service, base_args, extra_args_config):
extra_args = parse_extra_args(extra_args_config)

args = {}
args.update(base_args)
args.update(extra_args)

# TODO: CIS arg handling???
# CIS benchmark action may inject kv config to pass failing tests. Merge
# these after the func args as they should take precedence.
# cis_args_key = "cis-" + service
# cis_args = db.get(cis_args_key) or {}
# args.update(cis_args)

# Remove any args with 'None' values (all k8s args are 'k=v') and
# construct an arg string for use by 'snap set'.
args = {k: v for k, v in args.items() if v is not None}
args = ['--%s="%s"' % arg for arg in args.items()]
args = " ".join(args)

cmd = ["snap", "set", service, f"args={args}"]
check_call(cmd)
service_restart(f"snap.{service}.daemon")


Comment on lines -385 to -409
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is moved down the page...

def configure_scheduler(extra_args_config, kubeconfig):
kube_scheduler_config_path = "/root/cdk/kube-scheduler-config.yaml"
scheduler_opts = {}
Expand All @@ -433,7 +468,103 @@ def configure_scheduler(extra_args_config, kubeconfig):
with open(kube_scheduler_config_path, "w") as f:
yaml.safe_dump(scheduler_config, f)

configure_kubernetes_service("kube-scheduler", scheduler_opts, extra_args_config)
config_files = {
scheduler_opts["config"],
kubeconfig,
kube_scheduler_config_path,
}

configure_kubernetes_service(
"kube-scheduler", scheduler_opts, extra_args_config, config_files
)


@lru_cache
def _sha256_file(config_file):
addyess marked this conversation as resolved.
Show resolved Hide resolved
h = hashlib.sha256()
if Path(config_file).exists():
with open(config_file, "rb") as file:
while True:
# Reading is buffered, so we can read smaller chunks.
chunk = file.read(h.block_size)
if not chunk:
break
h.update(chunk)
return h
addyess marked this conversation as resolved.
Show resolved Hide resolved


def _dict_compare(d1, d2):
d1_keys = set(d1.keys())
d2_keys = set(d2.keys())
shared_keys = d1_keys.intersection(d2_keys)
added = d1_keys - d2_keys
removed = d2_keys - d1_keys
modified = {o: (d1[o], d2[o]) for o in shared_keys if d1[o] != d2[o]}
same = set(o for o in shared_keys if d1[o] == d2[o])
return added, removed, modified, same
addyess marked this conversation as resolved.
Show resolved Hide resolved


@contextlib.contextmanager
def _calculate_config_difference(service, args, config_files):
args_hash = _snap_common_path(service) / "args-hash.txt"

# gather existing config hash
current, updated = {}, {}
if args_hash.exists():
try:
current.update(yaml.safe_load(args_hash.open()))
except yaml.YAMLError:
log.warning("Failed to load existing args hash for %s", service)

# calculate new config hash
updated = {arg: hashlib.sha256(str(val).encode()) for arg, val in args.items()}
updated.update({f"path:{key}": _sha256_file(key) for key in config_files if key})
updated = {k: v.hexdigest() for k, v in updated.items()}

# discern differences
added, removed, modified, _ = _dict_compare(updated, current)
for key in added:
log.debug("New config value %s in test", key)
for key in removed:
log.debug("Dropped config value %s in test", key)
for key in modified:
log.debug("Updated config value %s in test", key)

different = added or removed or modified
yield different
if different:
yaml.safe_dump(updated, args_hash.open("w"))
else:
log.debug("No config changes detected for %s", service)


def configure_kubernetes_service(
service, base_args, extra_args_config, config_files=None
):
extra_args = parse_extra_args(extra_args_config)

args = {}
args.update(base_args)
args.update(extra_args)

# TODO: CIS arg handling???
# CIS benchmark action may inject kv config to pass failing tests. Merge
# these after the func args as they should take precedence.
# cis_args_key = "cis-" + service
# cis_args = db.get(cis_args_key) or {}
# args.update(cis_args)

# Remove any args with 'None' values (all k8s args are 'k=v') and
# construct an arg string for use by 'snap set'.

args = {k: v for k, v in args.items() if v is not None}
with _calculate_config_difference(service, args, config_files or {}) as restart:
if restart:
args = ['--%s="%s"' % arg for arg in args.items()]
args = " ".join(args)
cmd = ["snap", "set", service, f"args={args}"]
check_call(cmd)
service_restart(f"snap.{service}.daemon")


def configure_services_restart_always(control_plane=False):
Expand Down
84 changes: 80 additions & 4 deletions tests/unit/test_kubernetes_snaps.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
import logging
import pytest
import yaml
import unittest.mock as mock

from charms import kubernetes_snaps
Expand Down Expand Up @@ -136,17 +138,17 @@ def test_configure_kubelet(
{},
{},
external_cloud,
"kubeconfig",
"/path/to/kubeconfig",
"node_ip",
"registry.io",
["taint:NoExecute"],
)
configure_kubernetes_service.assert_called_once()
service, args, extra = configure_kubernetes_service.call_args[0]
service, args, extra, config_files = configure_kubernetes_service.call_args[0]
assert service == "kubelet"
assert extra == {}
expected_args = {
"kubeconfig": "kubeconfig",
"kubeconfig": "/path/to/kubeconfig",
"v": "0",
"node-ip": "node_ip",
"container-runtime-endpoint": "container_runtime_endpoint",
Expand All @@ -157,6 +159,14 @@ def test_configure_kubelet(
if external_cloud.has_xcp:
expected_args["cloud-provider"] = "external"
assert expected_args == args
assert config_files == {
"/path/to/kubeconfig",
"/root/cdk/ca.crt",
"/root/cdk/kubelet/config.yaml",
"/root/cdk/server.crt",
"/root/cdk/server.key",
"/run/systemd/resolve/resolv.conf",
}


@mock.patch("charms.kubernetes_snaps.configure_kubernetes_service")
Expand All @@ -178,7 +188,7 @@ def test_configure_apiserver(mock_path, configure_kubernetes_service, external_c
None,
)
configure_kubernetes_service.assert_called_once()
service, args, extra = configure_kubernetes_service.call_args[0]
service, args, extra, config_files = configure_kubernetes_service.call_args[0]
assert service == "kube-apiserver"
assert extra == {}
expected_args = {
Expand Down Expand Up @@ -226,6 +236,72 @@ def test_configure_apiserver(mock_path, configure_kubernetes_service, external_c
"audit-policy-file": "/some/path",
"audit-webhook-config-file": "/some/path",
}
assert "/root/cdk/ca.crt" in config_files
if external_cloud.has_xcp:
expected_args["cloud-provider"] = "external"
assert expected_args == args


@mock.patch("pathlib.Path.exists", mock.MagicMock(return_value=True))
@mock.patch("charms.kubernetes_snaps.check_call")
@mock.patch("charms.kubernetes_snaps.service_restart")
def test_configure_kubernetes_service_same_config(service_restart, check_call, caplog):
caplog.set_level(logging.DEBUG)
log_message = "No config changes detected for test"
base_args = {"arg1": "val1", "arg2": "val2"}
extra_args = "arg2=val2-updated arg3=val3"
hashed = {
"arg1": "cc1d9c865e8380c2d566dc724c66369051acfaa3e9e8f36ad6c67d7d9b8461a5", # val1
"arg2": "05a202bd2f507925efc418afec49c00c5904bb532f5b59588dd1cb76773c5075", # val2-updated
"arg3": "bac8d4414984861d5199b7a97699c728bee36c4084299b2ca905434cf65d8944", # val3
}
yamlized = yaml.safe_dump(hashed)
with mock.patch(
"pathlib.Path.open", mock.mock_open(read_data=yamlized)
) as mock_open:
kubernetes_snaps.configure_kubernetes_service("test", base_args, extra_args)
mock_open.assert_called_once()
service_restart.assert_not_called()
check_call.assert_not_called()
assert log_message in caplog.text


@mock.patch("pathlib.Path.exists", mock.MagicMock(return_value=True))
@mock.patch("charms.kubernetes_snaps.check_call")
@mock.patch("charms.kubernetes_snaps.service_restart")
@pytest.mark.parametrize(
"extra_args, log_message",
[
("arg2=val2-updated", "Dropped config value arg3 in test"),
(
"arg2=val2-updated arg3=val3 arg4=val4",
"New config value arg4 in test",
),
(
"arg2=val2-updated arg3=val3-updated",
"Updated config value arg3 in test",
),
],
ids=["drop_key", "add_key", "update_key"],
)
def test_configure_kubernetes_service_difference(
service_restart, check_call, extra_args, log_message, caplog
):
caplog.set_level(logging.DEBUG)
base_args = {"arg1": "val1", "arg2": "val2"}
hashed = {
"arg1": "cc1d9c865e8380c2d566dc724c66369051acfaa3e9e8f36ad6c67d7d9b8461a5",
"arg2": "05a202bd2f507925efc418afec49c00c5904bb532f5b59588dd1cb76773c5075", # val2-updated
"arg3": "bac8d4414984861d5199b7a97699c728bee36c4084299b2ca905434cf65d8944", # val3
}
yamlized = yaml.safe_dump(hashed)
with mock.patch(
"pathlib.Path.open", mock.mock_open(read_data=yamlized)
) as mock_open:
with mock.patch("yaml.safe_dump") as safe_dump:
kubernetes_snaps.configure_kubernetes_service("test", base_args, extra_args)
service_restart.assert_called_once_with("snap.test.daemon")
check_call.assert_called_once()
mock_open.assert_has_calls([mock.call(), mock.call("w")], any_order=True)
safe_dump.assert_called_once()
assert log_message in caplog.text