Revisit passing tstate to function calls #294
Replies: 8 comments 1 reply
-
Can you explain how you ran your benchmark (what is the benchmark)? What is your OS and your versions (OS, Python, compiler, etc.)? I tried to only pass tstate when tstate was already used in the "regular" code path. One of the motivations of this change was to prepare Python for https://bugs.python.org/issue40522 : get the Python thread state from thread local variable (thread_local, TLS, pthread_getspecific, etc.). The problem is that this change may make the internal _PyThreadState_GET() function slower on C compilers which doesn't support C11 thread_local (use slower but portable pthread_getspecific()), and it may make the public PyThreadState_Get() function slower as well. |
Beta Was this translation helpful? Give feedback.
-
Please read again the issue. See for example the python-dev thread: https://mail.python.org/archives/list/[email protected]/thread/PIXJAJPWKDGHSQD65VOO2B7FDLU2QLHH/ See also this very old sub-intepreter bug which should be fixed: https://bugs.python.org/issue15751 |
Beta Was this translation helpful? Give feedback.
-
The impact on adding an additional argument to function on performance was discussed in python/cpython#17052 |
Beta Was this translation helpful? Give feedback.
-
I'd rather keep discussion on this tracker technical rather than political. Is your suggestion to try to revert the changes in 3.11, or look into faster ways to manage thread state? |
Beta Was this translation helpful? Give feedback.
-
CC @ericsnowcurrently, any thoughts? |
Beta Was this translation helpful? Give feedback.
-
FWIW, for the sake of clarity I'll say that, personally, I favor passing context explicitly. Still, I am not a fan of having such extensive inconsistency between the public API (all lookup) and parts of the internal API (passing). We should plan on committing to one or the other. FWIW, we could go all-in with passing the context around if we provide a shim header file for compatibility. However, going that route will be a larger project than I'm comfortable tackling. 🙂 |
Beta Was this translation helpful? Give feedback.
-
In a completely different context, we had something like this in asyncio. Originally most APIs there were designed to take a loop=... argument, but it defaulted to None, in which case it would be looked up as needed. Eventually we decided to remove the loop= argument in favor of always looking it up (though occasionally caching it on an object, like here). |
Beta Was this translation helpful? Give feedback.
-
Thread state is naturally thread-local variable. Moving the interpreter state to TLS (thread-local storage) seemed to have a positive performance impact, python/cpython#29228, at least for Linux. Maybe we should move the thread state to TLS as well? Using TLS will also help moving toward a multiple-parallel interpreter or NoGIL future. |
Beta Was this translation helpful? Give feedback.
-
In https://bugs.python.org/issue38644 many internal functions added overhead by calling
_PyThreadState_GET()
.The effect was pervasive (over 200 occurrences throughout the core) and hit many time critical code paths. For example in commit 1726909094, type_call() became slower for every single object instantiation. The cost varies depending on the cost of memory fencing and the compiler's implementation of atomic loads. It doesn't look like these changes had any discussion, review, tests, or demonstrable user benefits.
Beta Was this translation helpful? Give feedback.
All reactions