Reduce cache pressure in PyCodeObject #161

lpereira · 2021-12-08T21:02:21Z

While writing a tiny module to inspect the results of function quickening, I realized that PyCodeObject might need some reorganization.

I still need to profile this, but I suspect we're increasing cache pressure here: co_quickened sits at the very end of the struct, so we pull in that cache line every time we check whether a code object has already been quickened. If we move it closer to the beginning of the struct, and move fields that aren't used as often to the end, we reduce the cache pressure of that (very hot) structure.
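A minimal sketch of the layout concern, assuming 64-byte cache lines and an object that starts on a line boundary. The two structs are hypothetical stand-ins, not the real PyCodeObject; `cache_line_of` and `report_layouts` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 64-byte cache lines, object starts on a line boundary. */
#define CACHE_LINE_BYTES 64

/* Hypothetical stand-ins for illustration only; NOT the real PyCodeObject. */
struct code_hot_last {
    void *rarely_used[12];  /* cold fields occupy the front of the struct */
    int flags;
    void *quickened;        /* checked on every call, yet stored last */
};

struct code_hot_first {
    void *quickened;        /* hot field moved to the front */
    int flags;
    void *rarely_used[12];
};

/* Cache line (counted from the start of the object) holding a byte offset. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE_BYTES;
}

/* Print where the hot field lands in each layout. */
static void report_layouts(void)
{
    size_t last = offsetof(struct code_hot_last, quickened);
    size_t first = offsetof(struct code_hot_first, quickened);

    /* A field at the very start of the struct is always on line 0. */
    assert(cache_line_of(first) == 0);

    printf("hot-last:  offset %zu, cache line %zu\n", last, cache_line_of(last));
    printf("hot-first: offset %zu, cache line %zu\n", first, cache_line_of(first));
}
```

Calling `report_layouts()` on a platform with 8-byte pointers would place the hot-last field past the first 64 bytes, so checking it touches a second cache line. Whether that matters in practice still has to be profiled.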

Another thing that might help a bit in the same vein is, when quickening a code object, to reallocate the co_code buffer so that it also contains both the cache and the quickened instructions. This should reduce overall memory usage, since a single allocation dilutes the allocator's bookkeeping overhead. With this, we might not even need to move co_quickened to the first cache line of PyCodeObject, and could instead use a bit in co_flags to record whether the function is quickened. The downside is that we'd no longer check co_quickened with a quick zero check; the upside is that we'd eliminate a potential cache miss, which is a lot more expensive than a bitwise test. (Side note: co_flags should be unsigned int.)
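A rough sketch of that idea, under stated assumptions: one buffer laid out as original instructions, then cache, then quickened instructions, plus a flag bit in place of the pointer check. `struct code_sketch`, `SKETCH_CO_QUICKENED`, and the helper names are all hypothetical and do not correspond to CPython's actual fields or flags. Size overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bit. It uses the top bit, which is one reason the flags
 * field is unsigned here: shifting a 1 into the sign bit of a signed int is
 * undefined behavior in C. Assumes a 32-bit unsigned int. */
#define SKETCH_CO_QUICKENED (1u << 31)

/* Hypothetical stand-in, NOT the real PyCodeObject. */
struct code_sketch {
    unsigned int flags;
    size_t code_size;       /* bytes of original instructions */
    size_t cache_size;      /* bytes of cache entries (0 until quickened) */
    unsigned char *buffer;  /* [ original code | cache | quickened code ] */
};

/* The "is it quickened?" test becomes a bit test on an already-hot field. */
static bool is_quickened(const struct code_sketch *co)
{
    return (co->flags & SKETCH_CO_QUICKENED) != 0;
}

/* Grow the single buffer so it also holds the cache and a quickened copy. */
static bool quicken(struct code_sketch *co, size_t cache_size)
{
    assert(co != NULL && co->buffer != NULL);
    if (is_quickened(co)) {
        return true;
    }
    size_t total = co->code_size + cache_size + co->code_size;
    unsigned char *grown = realloc(co->buffer, total);
    if (grown == NULL) {
        return false;  /* the original buffer is still valid */
    }
    memset(grown + co->code_size, 0, cache_size);
    memcpy(grown + co->code_size + cache_size, grown, co->code_size);
    co->buffer = grown;
    co->cache_size = cache_size;
    co->flags |= SKETCH_CO_QUICKENED;
    return true;
}

/* Start of the quickened copy inside the shared buffer. */
static unsigned char *quickened_code(const struct code_sketch *co)
{
    return co->buffer + co->code_size + co->cache_size;
}
```

The trade-off described above shows up in `is_quickened`: it costs a mask and compare instead of a null check, but it reads a field that is likely already in cache.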

What I don't know, however, is whether there is an ABI guarantee for this struct, which would thwart this. I sure hope not, because this struct seems to need a better packing strategy that balances performance and size (I need to run pahole on it and take a look, but I only have a Mac and pahole works only on ELF binaries). At first glance, it currently seems to have some alignment holes on architectures with 64-bit pointers.
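Without pahole, padding can be estimated by comparing `sizeof` against the sum of the member sizes. The sketch below uses two hypothetical structs (not PyCodeObject) to show how interleaving `int` and pointer members can create holes on platforms with 8-byte pointers, and how grouping members of the same size avoids them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layouts for illustration only. */
struct interleaved {
    int a;      /* with 8-byte pointers, a hole typically follows */
    void *p;
    int b;      /* and another hole typically follows here */
    void *q;
};

struct grouped {
    void *p;
    void *q;
    int a;      /* the two ints share one pointer-sized slot */
    int b;
};

/* Bytes actually needed by the members of either struct. */
#define MEMBER_BYTES (2 * sizeof(int) + 2 * sizeof(void *))

static size_t padding_interleaved(void)
{
    return sizeof(struct interleaved) - MEMBER_BYTES;
}

static size_t padding_grouped(void)
{
    return sizeof(struct grouped) - MEMBER_BYTES;
}

/* Print the padding of both layouts on the current platform. */
static void report_padding(void)
{
    /* Grouping same-sized members should never add padding here. */
    assert(padding_grouped() <= padding_interleaved());
    printf("interleaved: size %zu, padding %zu\n",
           sizeof(struct interleaved), padding_interleaved());
    printf("grouped:     size %zu, padding %zu\n",
           sizeof(struct grouped), padding_grouped());
}
```

The exact numbers depend on the platform's ABI, so the sketch prints them rather than assuming them.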

gvanrossum (Maintainer) · 2021-12-08T21:26:15Z

There is no ABI guarantee. We totally reorganized this since 3.10 already. In fact, the comment on line 55 in code.h suggests that Mark tried to arrange the hottest fields at the top. Of course the quickened code pointer is only accessed once per call, so maybe it doesn't matter that it's at the end?

Reallocating co_code to also make room for co_quickened sounds somewhat dubious -- co_code is a bytes object, and people introspecting code objects will expect the size of that bytes object to reflect the number of instructions (times 2). And having "hidden" data at the end of a bytes object sounds really obscure.

brandtbucher (Maintainer) · 2022-03-16T18:26:59Z

I seem to have missed this discussion, but you'll be happy to know that most of the suggestions you made here will be part of python/cpython#31888 (reviews welcome)!

I removed 3 member caches (co_varnames, co_cellvars, and co_freevars) that are almost never used, grouped the int members to remove two 4-byte holes, and was able to get rid of three other pointers (co_code, co_quickened, and co_firstinstr) by storing the instructions as part of the PyCodeObject and quickening them in-place.
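One way to picture "storing the instructions as part of the object and quickening them in-place" is a C flexible array member, sketched below. This is an illustration under assumptions, not the code from the PR: the struct, the 2-byte instruction encoding (opcode in the low byte, oparg in the high byte), and the opcode numbers `OP_GENERIC` and `OP_ADAPTIVE` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t code_unit;  /* assumption: one instruction is 2 bytes */

/* Hypothetical opcode numbers, chosen only for this sketch. */
#define OP_GENERIC  10
#define OP_ADAPTIVE 200

/* Hypothetical stand-in, NOT the real PyCodeObject. */
struct inline_code {
    int flags;
    int n_instructions;
    code_unit instructions[];  /* stored inline, right after the header */
};

/* One allocation holds both the header and a copy of the instructions. */
static struct inline_code *inline_code_new(const code_unit *src, int n)
{
    assert(src != NULL && n > 0);
    struct inline_code *co = malloc(sizeof(*co) + (size_t)n * sizeof(code_unit));
    if (co == NULL) {
        return NULL;
    }
    co->flags = 0;
    co->n_instructions = n;
    memcpy(co->instructions, src, (size_t)n * sizeof(code_unit));
    return co;
}

/* Rewrite generic opcodes in place, keeping each instruction's oparg. */
static void quicken_in_place(struct inline_code *co)
{
    for (int i = 0; i < co->n_instructions; i++) {
        code_unit unit = co->instructions[i];
        if ((unit & 0xFF) == OP_GENERIC) {
            co->instructions[i] = (code_unit)((unit & 0xFF00) | OP_ADAPTIVE);
        }
    }
}
```

With this shape there is no separate pointer to the code, to a quickened copy, or to the first instruction, since all three are reachable from the object's own address.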

As a result, code objects are now much more compact. pahole says we're down from 3.5 cache lines to 2.75 (and the remainder of the last is filled by the first few instructions).

As @markshannon points out on the PR, though, there may be a benefit to adding 2 members back (perhaps a co_code cache and something else) so that the first instruction is aligned to a cache line. That would require 64-byte alignment of code object allocations, though.
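The arithmetic behind "adding 2 members back" can be sketched as follows, assuming 64-byte cache lines and 8-byte pointers: 2.75 lines is 176 bytes, the next boundary is at 192, and the 16-byte gap is the size of two pointer-sized members. The helper names are illustrative, and `aligned_alloc` is the C11 allocator, which may not be available on every toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_BYTES 64  /* assumption: 64-byte cache lines */

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t n, size_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/* Bytes to add after a header so the next byte starts a cache line. */
static size_t padding_before_code(size_t header_bytes)
{
    return round_up(header_bytes, CACHE_LINE_BYTES) - header_bytes;
}

/* The padding only helps if the object itself starts on a line boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment. */
static void *alloc_cache_aligned(size_t size)
{
    return aligned_alloc(CACHE_LINE_BYTES, round_up(size, CACHE_LINE_BYTES));
}
```

For example, `padding_before_code(176)` evaluates to 16 under these assumptions. Memory returned by `alloc_cache_aligned` is released with `free`.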

markshannon (Collaborator) · 2022-03-17T09:46:10Z

~~Don't forget the GC Header before the object. The object starts 16 bytes into the allocated space.~~

Edit: Code objects aren't GC'ed.
