Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

By hashing the config values and files, only restart services on differences #31

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

addyess
Copy link
Member

@addyess addyess commented Aug 22, 2024

Overview

  • calculate a hash for each config item on a kube service, and store the hash values on disk
  • each reconcile loop calculates the differences, if there are none -- don't restart the service

Details

  • store a hash file in _snap_common_path(service) / "args-hash.txt"
  • the hash file has a list of all the args and their sha256 hashed values so no secrets are leaked
  • the hash file also contains a sha256 of any file contents if the configured value is an on disk file that may have changed
  • if the hash for each item matches -- don't restart the service -- this saves 3x time on every reconcile

Comment on lines +197 to +222
config_files = set(
map(
api_opts.get,
(
"tls-cert-file",
"tls-private-key-file",
"kubelet-certificate-authority",
"kubelet-client-certificate",
"kubelet-client-key",
"authentication-token-webhook-config-file",
"service-account-signing-key-file",
"service-account-key-file",
"encryption-provider-config",
"etcd-cafile",
"etcd-keyfile",
"etcd-certfile",
"authorization-webhook-config-file",
"requestheader-client-ca-file",
"proxy-client-cert-file",
"proxy-client-key-file",
"client-ca-file",
"audit-policy-file",
"audit-webhook-config-file",
),
)
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these keys point to config files that may have been overwritten during the reconcile period for the api server

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, how do you know that you didn't miss one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know... that's what i'm SO afraid of. Missing one would be bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eaudetcobello do you have any ideas/opinions on how to not miss a file? I could "scan all the arguments looking for file like strings" but that sounds not great

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I think this could become unmanageable in the future and might come back to bite us. Looking at the API server options, there seem to be a few common patterns:

  • Options with the file suffix
  • Options with the config suffix
  • Options that include certificate in the name
  • Options that include key in the name

It might be a good idea to check if an option has any of these indicators before trying to process the file. Here's an example: https://regex101.com/r/m5Gq9m/1

Copy link
Member Author

@addyess addyess Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the full list of args that can be files are listed here:
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

Some with comments like:
File with
Path to
File containing
file used

And the hits just keep coming. The docs and arguments names aren't consistent. Maybe in the go code there's a list -- but with every k8s release -- more and more arguments may or may not be config files. Yes -- each config file will need to be ensured it gets added to the list and the regex could be a cool way, but there's no guarantee upstream k8s will follow it.

Copy link
Member Author

@addyess addyess Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, there's another argument here which I expressly excluded because it's not a config file but a log file: audit-log-path

We do not want to watch for changes in this file b/c it's a log -- always changing means the api server reboots every time.

I think this list requires brain-power or some other comprehensive list of arguments which are absolutely treated as "read" by the api-server when restarted

Comment on lines +265 to +279
config_files = set(
map(
controller_opts.get,
(
"root-ca-file",
"kubeconfig",
"authorization-kubeconfig",
"authentication-kubeconfig",
"service-account-private-key-file",
"tls-cert-file",
"tls-private-key-file",
),
)
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these keys point to config files that may have been overwritten during the reconcile period for the controller-manager

Comment on lines +336 to +339
config_files = {
kube_proxy_opts["config"],
kubeconfig,
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are config files that may have been overwritten during the reconcile period for the kube-proxy service

Comment on lines -385 to -409
def configure_kubernetes_service(service, base_args, extra_args_config):
extra_args = parse_extra_args(extra_args_config)

args = {}
args.update(base_args)
args.update(extra_args)

# TODO: CIS arg handling???
# CIS benchmark action may inject kv config to pass failing tests. Merge
# these after the func args as they should take precedence.
# cis_args_key = "cis-" + service
# cis_args = db.get(cis_args_key) or {}
# args.update(cis_args)

# Remove any args with 'None' values (all k8s args are 'k=v') and
# construct an arg string for use by 'snap set'.
args = {k: v for k, v in args.items() if v is not None}
args = ['--%s="%s"' % arg for arg in args.items()]
args = " ".join(args)

cmd = ["snap", "set", service, f"args={args}"]
check_call(cmd)
service_restart(f"snap.{service}.daemon")


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is moved down the page...

charms/kubernetes_snaps.py Outdated Show resolved Hide resolved
@addyess addyess force-pushed the akd/config-hashing-speedup branch 2 times, most recently from b1dd359 to fa9a3db Compare August 22, 2024 15:44
@addyess
Copy link
Member Author

addyess commented Aug 22, 2024

Events times from the same control-plane charm with and without these changes:

total time install -> active/idle
with -- 15min
without -- 36min

Comment on lines +197 to +222
config_files = set(
map(
api_opts.get,
(
"tls-cert-file",
"tls-private-key-file",
"kubelet-certificate-authority",
"kubelet-client-certificate",
"kubelet-client-key",
"authentication-token-webhook-config-file",
"service-account-signing-key-file",
"service-account-key-file",
"encryption-provider-config",
"etcd-cafile",
"etcd-keyfile",
"etcd-certfile",
"authorization-webhook-config-file",
"requestheader-client-ca-file",
"proxy-client-cert-file",
"proxy-client-key-file",
"client-ca-file",
"audit-policy-file",
"audit-webhook-config-file",
),
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I think this could become unmanageable in the future and might come back to bite us. Looking at the API server options, there seem to be a few common patterns:

  • Options with the file suffix
  • Options with the config suffix
  • Options that include certificate in the name
  • Options that include key in the name

It might be a good idea to check if an option has any of these indicators before trying to process the file. Here's an example: https://regex101.com/r/m5Gq9m/1

charms/kubernetes_snaps.py Outdated Show resolved Hide resolved
Comment on lines +529 to +533
log.debug("%s: Added config value %s", service.capitalize(), key)
for key in removed:
log.debug("%s: Dropped config value %s", service.capitalize(), key)
for key in modified:
log.debug("%s: Updated config value %s", service.capitalize(), key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.debug("%s: Added config value %s", service.capitalize(), key)
for key in removed:
log.debug("%s: Dropped config value %s", service.capitalize(), key)
for key in modified:
log.debug("%s: Updated config value %s", service.capitalize(), key)
log.debug("%s: Added config value %s", service.capitalize(), key)
for key in removed:
log.debug("%s: Dropped config value %s", service.capitalize(), key)
for key in modified:
log.debug("%s: Updated config value %s", service.capitalize(), key)

Copy link
Member Author

@addyess addyess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mateoflorido I think you'll like this. We can use this env var to control from the charm whether or not we want to use this caching feature.

Comment on lines +481 to +486
def _enable_config_hashing() -> bool:
"""This is a debug option to enable config hashing."""
env = os.environ.get("JUJU_FEATURE_KUBERNETES_SNAP_CONFIG_HASHING") or ""
return env.lower() == "on"


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows the charm at runtime (per hook even!!!) decide whether or not it wants to enable hashing.

Now an upgrade action could restart the service even if there aren't any config changes. We could even expose via charm config (though i think that'd be going too far).

Comment on lines +519 to +523
if not _enable_config_hashing():
log.debug("Config hashing disabled, skipping config change detection")
yield True
args_hash.unlink(missing_ok=True)
return
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if hashing isn't enabled, this method declares there are changes, and we should reconfigure the service (as before this feature existed) and it also clears the cache file if one exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants