Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: cache signature structure in ops.testing state classes #1499

Conversation

tonyandrewmeyer
Copy link
Contributor

@tonyandrewmeyer tonyandrewmeyer commented Dec 13, 2024

The Scenario code that limits the number of positional arguments (needed in Python 3.8 since the dataclasses there do not have KW_ONLY) is quite expensive. We have a method of providing this functionality in 3.8, but it is unfortunately expensive, and therefore a regression from older versions of Scenario.

I considered 4 options here:

  • Manually write out the dozen or so __init__ signatures (as we had at one point in the original PR). I'd rather not for the same reasons we made the decision at that time.
  • Figure out a way to have a syntax that allows using regular dataclasses.KW_ONLY with modern Python and only use our custom code on older Python. I spent some time looking into this, and couldn't figure out a way to do it where the code still seemed readable and maintainable enough.
  • Cache the most expensive work (what's in this PR).
  • Don't bother doing anything now, and eagerly wait for the ability to drop 3.8 support. A roughly 5% performance regression felt worth fixing as long as the change is fairly straightforward (although this change only gets about half of that back).

The most expensive parts of the __new__ method are the ones that work with inspect: inspect.signature in particular, but also getting the default value of the parameters. If we assume that no-one is run-time altering the signature of the class (I believe no-one should be) then the values of these never actually change, but we are currently calculating them every time an instance of the class is created. This PR changes that to cache those three values the first time they are needed.

There's one additional performance tweak in the branch that doesn't make a significant difference, but is trivial to do: when checking if the YAML files exist, skip the filesystem exists() call if we just created an empty temporary directory a few lines above, since we know that it will never exist.

A drive-by: I happened to notice while working on this branch Harness referring to options.yaml, which does not exist (any more?) as far as I know, so a docs tweak to address that.

Timing (best of 3):

Suite main branch
operator unit tests (no concurrency) 165.25s 161.89s
traefik scenario tests 45.49 44.30s
kafka-k8s-operator scenario tests 4.48s 4.38s

Refs #1434

@tonyandrewmeyer tonyandrewmeyer marked this pull request as ready for review December 13, 2024 07:31
Copy link
Contributor

@james-garner-canonical james-garner-canonical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, clever and simple solution, and a nice performance win.

I was briefly confused when tracing the new vs old logic, because required_args in __new__ stores required (as args -- not kwargs), while cls._init_required_args stores required (as args or kwargs). Idk if there's a clearer short name for _init_required_args though, and it seems easy enough to follow when not trying to go side by side (so fine for future readers).

Copy link
Contributor

@dimaqq dimaqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level: the time saved is marginal, but then code is rather straightforward and I'm not sure what would the alternative be, maybe custom types where the inspected bits are inlined or hardcoded?

Anyway +1 on this.

testing/src/scenario/_runtime.py Show resolved Hide resolved
testing/src/scenario/state.py Show resolved Hide resolved
).parameters
cls._init_kw_only = {
name
for name in tuple(parameters)[cls._max_positional_args :]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this was auto-formatted by ruff, the dangling :] is kinda funny :]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was, yeah - I lazily just wrote it out on one line and let tox -e fmt clean up.

Does the trailing happy face bother you? I could do something like move tuple(parameters) out to a separate line to get the dict comprehension to be on a single line, if that would be better.

@tonyandrewmeyer tonyandrewmeyer merged commit eb80926 into canonical:main Dec 17, 2024
32 checks passed
@tonyandrewmeyer tonyandrewmeyer deleted the cache-scenario-maxargs-calculations branch December 17, 2024 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants