There is no ABI guarantee. We have already reorganized this completely since 3.10. In fact, the comment on line 55 in code.h suggests that Mark tried to arrange the hottest fields at the top. Of course, the quickened code pointer is only accessed once per call, so maybe it doesn't matter that it's at the end? Reallocating co_code to also make room for co_quickened sounds somewhat dubious: co_code is a bytes object, and people introspecting code objects will expect the size of that bytes object to reflect the number of instructions (times 2). And having "hidden" data at the end of a bytes object sounds really obscure.
-
I seem to have missed this discussion, but you'll be happy to know that most of the suggestions you made here will be part of python/cpython#31888 (reviews welcome)! I removed 3 member caches ( As a result, code objects are now much more compact. As @markshannon points out on the PR, though, there may be a benefit to adding 2 members back (perhaps a
-
Edit: Code objects aren't GC'ed.
-
While writing a tiny module to inspect the results of function quickening, I realized that PyCodeObject might need some reorganization.

I still need to profile this, but I suspect that, since co_quickened is at the very end of the struct and we need to pull in that cache line every time we check whether a code object has already been quickened, we're increasing cache pressure there. If we move it closer to the beginning of the struct, and move rarely used fields to the end, we reduce the cache pressure on that (very hot) structure.

Another thing that might help in the same vein is, when quickening a code object, to reallocate the co_code buffer so that it also contains both the cache and the quickened instructions. This should reduce overall memory usage, since fewer separate allocations means less allocator bookkeeping overhead. With this, we might not even need to move co_quickened to the first cache line of PyCodeObject, and could instead stuff a bit in co_flags to record whether the function is quickened. The downside is that we no longer check co_quickened with a quick zero check, but the upside is that we eliminate a potential cache miss, which is far more expensive than a bitwise test. (Side note: co_flags should be unsigned int.)

What I don't know, however, is whether there is an ABI guarantee on this struct, which would thwart this. I sure hope not, because this struct seems to need a better packing strategy that balances performance and size. (I need to run pahole on it and take a look, but I only have a Mac and pahole only works on ELF binaries.) At first glance, it seems to have some alignment holes on architectures with 64-bit pointers.