Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

By hashing the config values and files, only restart services on differences #31

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
208 changes: 177 additions & 31 deletions charms/kubernetes_snaps.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
import contextlib
import hashlib
import ipaddress
import json
import logging
Expand Down Expand Up @@ -100,7 +102,6 @@ def configure_apiserver(
enc_provider_config = encryption_config_path()
if enc_provider_config.exists():
api_opts["encryption-provider-config"] = str(enc_provider_config)

api_opts["advertise-address"] = advertise_address

api_opts["etcd-cafile"] = "/root/cdk/etcd/client-ca.pem"
Expand Down Expand Up @@ -192,7 +193,36 @@ def configure_apiserver(
else:
remove_if_exists(audit_webhook_conf_path)

configure_kubernetes_service("kube-apiserver", api_opts, extra_args_config)
config_files = set(
map(
api_opts.get,
(
"tls-cert-file",
"tls-private-key-file",
"kubelet-certificate-authority",
"kubelet-client-certificate",
"kubelet-client-key",
"authentication-token-webhook-config-file",
"service-account-signing-key-file",
"service-account-key-file",
"encryption-provider-config",
"etcd-cafile",
"etcd-keyfile",
"etcd-certfile",
"authorization-webhook-config-file",
"requestheader-client-ca-file",
"proxy-client-cert-file",
"proxy-client-key-file",
"client-ca-file",
"audit-policy-file",
"audit-webhook-config-file",
),
)
)
Comment on lines +196 to +221
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these keys point to config files that may have been overwritten during the reconcile period for the api server

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, how do you know that you didn't miss one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know... that's what i'm SO afraid of. Missing one would be bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eaudetcobello do you have any ideas/opinions on how to not miss a file? I could "scan all the arguments looking for file like strings" but that sounds not great

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I think this could become unmanageable in the future and might come back to bite us. Looking at the API server options, there seem to be a few common patterns:

  • Options with the file suffix
  • Options with the config suffix
  • Options that include certificate in the name
  • Options that include key in the name

It might be a good idea to check if an option has any of these indicators before trying to process the file. Here's an example: https://regex101.com/r/m5Gq9m/1

Copy link
Member Author

@addyess addyess Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the full list of args that can be files are listed here:
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

Some with comments like:
File with
Path to
File containing
file used

And the hits just keep coming. The docs and arguments names aren't consistent. Maybe in the go code there's a list -- but with every k8s release -- more and more arguments may or may not be config files. Yes -- each config file will need to be ensured it gets added to the list and the regex could be a cool way, but there's no guarantee upstream k8s will follow it.

Copy link
Member Author

@addyess addyess Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, there's another argument here which I expressly excluded because it's not a config file but a log file: audit-log-path

We do not want to watch for changes in this file b/c it's a log -- always changing means the api server reboots every time.

I think this list requires brain-power or some other comprehensive list of arguments which are absolutely treated as "read" by the api-server when restarted


configure_kubernetes_service(
"kube-apiserver", api_opts, extra_args_config, config_files
)


def configure_controller_manager(
Expand Down Expand Up @@ -231,10 +261,26 @@ def configure_controller_manager(

controller_opts["feature-gates"] = ",".join(feature_gates)

config_files = set(
map(
controller_opts.get,
(
"root-ca-file",
"kubeconfig",
"authorization-kubeconfig",
"authentication-kubeconfig",
"service-account-private-key-file",
"tls-cert-file",
"tls-private-key-file",
),
)
)

Comment on lines +264 to +278
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these keys point to config files that may have been overwritten during the reconcile period for the controller-manager

configure_kubernetes_service(
"kube-controller-manager",
controller_opts,
extra_args_config,
config_files,
)


Expand Down Expand Up @@ -286,10 +332,13 @@ def configure_kube_proxy(
f.write("# Generated by kubernetes-common library, do not edit\n")
yaml.dump(kube_proxy_config, f)

config_files = {
kube_proxy_opts["config"],
kubeconfig,
}
Comment on lines +335 to +338
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are config files that may have been overwritten during the reconcile period for the kube-proxy service


configure_kubernetes_service(
"kube-proxy",
kube_proxy_opts,
extra_args_config,
"kube-proxy", kube_proxy_opts, extra_args_config, config_files
)


Expand Down Expand Up @@ -375,38 +424,23 @@ def configure_kubelet(
f.write("# Generated by charm, do not edit\n")
yaml.dump(kubelet_config, f)

config_files = {
kubelet_opts["config"],
kubeconfig,
"/run/systemd/resolve/resolv.conf",
"/root/cdk/ca.crt",
"/root/cdk/server.crt",
"/root/cdk/server.key",
}

configure_kubernetes_service(
"kubelet",
kubelet_opts,
extra_args_config,
config_files,
)


def configure_kubernetes_service(service, base_args, extra_args_config):
extra_args = parse_extra_args(extra_args_config)

args = {}
args.update(base_args)
args.update(extra_args)

# TODO: CIS arg handling???
# CIS benchmark action may inject kv config to pass failing tests. Merge
# these after the func args as they should take precedence.
# cis_args_key = "cis-" + service
# cis_args = db.get(cis_args_key) or {}
# args.update(cis_args)

# Remove any args with 'None' values (all k8s args are 'k=v') and
# construct an arg string for use by 'snap set'.
args = {k: v for k, v in args.items() if v is not None}
args = ['--%s="%s"' % arg for arg in args.items()]
args = " ".join(args)

cmd = ["snap", "set", service, f"args={args}"]
check_call(cmd)
service_restart(f"snap.{service}.daemon")


Comment on lines -385 to -409
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is moved down the page...

def configure_scheduler(extra_args_config, kubeconfig):
kube_scheduler_config_path = "/root/cdk/kube-scheduler-config.yaml"
scheduler_opts = {}
Expand All @@ -433,7 +467,119 @@ def configure_scheduler(extra_args_config, kubeconfig):
with open(kube_scheduler_config_path, "w") as f:
yaml.safe_dump(scheduler_config, f)

configure_kubernetes_service("kube-scheduler", scheduler_opts, extra_args_config)
config_files = {
scheduler_opts["config"],
kubeconfig,
kube_scheduler_config_path,
}

configure_kubernetes_service(
"kube-scheduler", scheduler_opts, extra_args_config, config_files
)


def _enable_config_hashing() -> bool:
"""This is a debug option to enable config hashing."""
env = os.environ.get("JUJU_FEATURE_KUBERNETES_SNAP_CONFIG_HASHING") or ""
return env.lower() == "on"


Comment on lines +481 to +486
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows the charm at runtime (per hook even!!!) decide whether or not it wants to enable hashing.

Now an upgrade action could restart the service even if there aren't any config changes. We could even expose via charm config (though i think that'd be going too far).

def _sha256_file(config_file: str) -> hashlib.sha256:
h = hashlib.sha256()
h.update(config_file.encode())
config_file = Path(config_file)
if Path.is_file(config_file):
with config_file.open("rb") as file:
while True:
# Reading is buffered, so we can read smaller chunks.
chunk = file.read(h.block_size)
if not chunk:
break
h.update(chunk)
else:
h.update(b"--empty--")
return h


def _dict_compare(d1, d2):
d1_keys = set(d1.keys())
d2_keys = set(d2.keys())
shared_keys = d1_keys & d2_keys
added = d1_keys - d2_keys
removed = d2_keys - d1_keys
modified = {k: (d1[k], d2[k]) for k in shared_keys if d1[k] != d2[k]}
same = set(k for k in shared_keys if d1[k] == d2[k])
return added, removed, modified, same


@contextlib.contextmanager
def _calculate_config_difference(service: str, args, config_files):
args_hash = _snap_common_path(service) / "args-hash.txt"

if not _enable_config_hashing():
log.debug("Config hashing disabled, skipping config change detection")
yield True
args_hash.unlink(missing_ok=True)
return
Comment on lines +519 to +523
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if hashing isn't enabled, this method declares there are changes, and we should reconfigure the service (as before this feature existed) and it also clears the cache file if one exists.


log.debug("Checking for config changes for %s", service)
# gather existing config hash
current, updated = {}, {}
if Path.is_file(args_hash):
try:
current.update(yaml.safe_load(args_hash.open()))
except yaml.YAMLError:
log.warning("Failed to load existing args hash for %s", service)

# calculate new config hash
updated = {arg: hashlib.sha256(str(val).encode()) for arg, val in args.items()}
updated.update({f"path:{key}": _sha256_file(key) for key in config_files if key})
updated = {k: v.hexdigest() for k, v in updated.items()}

# discern differences
added, removed, modified, _ = _dict_compare(updated, current)
for key in added:
log.debug("%s: Added config value %s", service.capitalize(), key)
for key in removed:
log.debug("%s: Dropped config value %s", service.capitalize(), key)
for key in modified:
log.debug("%s: Updated config value %s", service.capitalize(), key)
Comment on lines +542 to +546
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.debug("%s: Added config value %s", service.capitalize(), key)
for key in removed:
log.debug("%s: Dropped config value %s", service.capitalize(), key)
for key in modified:
log.debug("%s: Updated config value %s", service.capitalize(), key)
log.debug("%s: Added config value %s", service.capitalize(), key)
for key in removed:
log.debug("%s: Dropped config value %s", service.capitalize(), key)
for key in modified:
log.debug("%s: Updated config value %s", service.capitalize(), key)


different = added or removed or modified
yield different
if different:
yaml.safe_dump(updated, args_hash.open("w"))
else:
log.debug("%s: No config changes detected", service.capitalize())


def configure_kubernetes_service(
service, base_args, extra_args_config, config_files=None
):
extra_args = parse_extra_args(extra_args_config)

args = {}
args.update(base_args)
args.update(extra_args)

# TODO: CIS arg handling???
# CIS benchmark action may inject kv config to pass failing tests. Merge
# these after the func args as they should take precedence.
# cis_args_key = "cis-" + service
# cis_args = db.get(cis_args_key) or {}
# args.update(cis_args)

# Remove any args with 'None' values (all k8s args are 'k=v') and
# construct an arg string for use by 'snap set'.

args = {k: v for k, v in args.items() if v is not None}
with _calculate_config_difference(service, args, config_files or {}) as restart:
if restart:
args = ['--%s="%s"' % arg for arg in args.items()]
args = " ".join(args)
cmd = ["snap", "set", service, f"args={args}"]
check_call(cmd)
service_restart(f"snap.{service}.daemon")


def configure_services_restart_always(control_plane=False):
Expand Down
Loading