Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: don't use the max-positional-args parent class for JujuLogLine #1495

Merged

Conversation

tonyandrewmeyer
Copy link
Contributor

The Scenario code that limits the number of positional arguments (needed in Python 3.8 since the dataclasses there do not have KW_ONLY) is quite expensive. For the case of JujuLogLine, it's not actually providing any value, because both of the arguments can be positional. JujuLogLine objects are created frequently (one per logging call), so the cost of creation matters a lot for the test performance.

This PR changes JujuLogLine to be a regular frozen dataclass. There are other performance improvements possible in the max-positional-args code (most significant of which would be only using it for Python 3.8 and 3.9) but this one-line change provides most of the benefits, so seems worth doing before any more significant work.

Timing (best of 3):

Suite main branch
operator unit tests (no concurrency) 4m11.923s 3m20.175s
traefik scenario tests 3m49.479s 2m18.024s
kafka-k8s-operator scenario tests 0m13.463s 0m9.973s

Refs #1434

Copy link
Contributor

@james-garner-canonical james-garner-canonical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, _max_pos_args surprised me! I suspect that the 'correct' way to have a dataclasswith kw_only args in Python < 3.10 would be to just manually write the __init__ for the classes that require it ... but there are 19 of them ... maybe worth if it _max_pos_args is a big hit to performance tho ...

@tonyandrewmeyer
Copy link
Contributor Author

Wow, _max_pos_args surprised me! I suspect that the 'correct' way to have a dataclasswith kw_only args in Python < 3.10 would be to just manually write the __init__ for the classes that require it ... but there are 19 of them ... maybe worth if it _max_pos_args is a big hit to performance tho ...

Yeah, we went back-and-forth a lot of times on how this should be implemented (the PR has most of the discussion). I did have all of those custom __init__ signatures in one version, and we ended up removing those and going for the single common location. I wasn't looking at performance at the time, though.

One of the problems is the lengths we go to in order to show really nice errors (more informative ones than you actually get with the built-in dataclasses, in fact).

I think we get to drop 3.8 in under a year, so for now it's perhaps still worth it.

@james-garner-canonical
Copy link
Contributor

Yeah, we went back-and-forth a lot of times on how this should be implemented (the PR has most of the discussion). I did have all of those custom __init__ signatures in one version, and we ended up removing those and going for the single common location. I wasn't looking at performance at the time, though.

Thanks, that was an interesting read!

I think we get to drop 3.8 in under a year, so for now it's perhaps still worth it.

Oh true, and the _max_posargs implementation is going to be way easier to refactor to KW_ONLY and/or the kw_only arguments for dataclass and field than refactoring from 19 custom __init__s would be, so probably best to keep it around for now.

Copy link
Contributor

@dimaqq dimaqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if you used a deterministic profiler or parca

@tonyandrewmeyer
Copy link
Contributor Author

I'm curious if you used a deterministic profiler or parca

The original profiling was provided in #1434 - one was cProfile and one unstated but likely the same. I did a bit of cProfile checking as well, but it was clear enough from what was already in the tickets where the issue was (and combined with my knowledge of the recent changes that was enough to pin things down).

I feel for profiling tests a statistical/deterministic profiler is a better choice than a sampling one.

@tonyandrewmeyer tonyandrewmeyer merged commit d4aadba into canonical:main Dec 11, 2024
32 checks passed
@tonyandrewmeyer tonyandrewmeyer deleted the juju-logline-regular-object branch December 11, 2024 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants