Make wrapped C++ functions pickleable #30099
Merged
Changes from 35 commits
39 commits
e4b1218 Add `test_*_repr()` to test behavior with different Python versions.
9d613e9 Adjust expected repr for PyPy
ebc1d23 Adjust another expected repr for PyPy
fdc9126 Try again: undo mistaken adjustment for PyPy
9f1d6a0 Give up on test_pytypes test_capsule_with_name_repr (not sufficiently…
04e16a2 `_wrapped_simple_callable` proof of concept
c53a281 Add `module_::def_as_native()`
424ace3 Resolve PyPy `TypeError: cannot create weak reference to builtin_func…
f9fd9d0 Replace `PyCapsule` with `function_record_PyObject`.
0723dda function_record_PyTypeObject: Replace C++20 designated initializers w…
694ebbd Introduce `PYBIND11_DETAIL_FUNCTION_RECORD_ABI_ID` and use along with…
16c19d4 Move `std::once_flag` out of `inline` function (in hopes that that fi…
ca31ae2 `tp_vectorcall` was introduced only with Python 3.8
b5d3bf5 clang-tidy auto-fixes
9e263c5 Disable `-Wmissing-field-initializers`. Guard `PyType_Ready(&function…
6b45525 Give up on the `std::call_once` idea, for Python 3.6 compatibility (i…
2c876d6 Add `__reduce_ex__` to `function_record_PyTypeObject`. Add `_pybind11…
3a5730a Move `function_record_PyTypeObject_PyType_Ready()` call in `get_inter…
fe1b774 gcc 4.8.5 and 7.5.0 reject `PYBIND11_WARNING_DISABLE_GCC("-Wmissing-f…
1679fe0 `function_record_PyTypeObject_PyType_Ready()`, `get_pybind11_detail_f…
7359b5a gcc 4.8.5 and 7.5.0 reject `PYBIND11_WARNING_DISABLE_GCC("-Wcast-func…
dc50802 Python 3.6, 3.7: Skip `get_pybind11_detail_function_record_pickle_hel…
fe87588 New version of `_function_record_pickle_helper`, using `collections.n…
554d529 Explicit `str(tup_obj[1])` to fix 🐍 3 • centos:7 • x64 segfault
84f9c35 Factor out detail/function_record_pyobject.h
808cbbf Use PYBIND11_NAMESPACE_BEGIN/END for function_record_PyTypeObject_met…
61973df Factor out function_record_PyTypeObject_methods::tp_name_impl, mainly…
9e22ddd Simplify implementation of UNEXPECTED CALL functions.
d701abb Factor out `detail::get_scope_module()`
d148a74 IncludeCleaner fixes (Google toolchain).
0d210a5 Comment out unreachable code (to resolve MSVC Werrors).
0e7bb10 Use built-in `eval()` instead of `function_record_pickle_helper()`
1607d54 Remove `function_record_pickle_helper()`
8b77252 Mark `internals::function_record_capsule_name` as OBSOLETE.
0731b71 Add comment pointing to google/pywrapcc#30099
02b31b1 Archive experimental code from video meet with @rainwoodman 2024-02-15
bcdfc69 Add a pickle roundtrip test starting with `m.simple_callable.__self__…
5793372 PyPy does not have `m.simple_callable.__self__`
1c24995 Change "UNUSUAL" comment as suggested by @rainwoodman (only very slig…
@@ -0,0 +1,205 @@
// Copyright (c) 2024 The Pybind Development Team.
// All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

// For background see the description of PR google/pywrapcc#30099.

#pragma once

#include "../attr.h"
#include "../pytypes.h"
#include "common.h"

#include <cstring>

PYBIND11_NAMESPACE_BEGIN(PYBIND11_NAMESPACE)
PYBIND11_NAMESPACE_BEGIN(detail)

struct function_record_PyObject {
    PyObject_HEAD
    function_record *cpp_func_rec;
};

PYBIND11_NAMESPACE_BEGIN(function_record_PyTypeObject_methods)

PyObject *tp_new_impl(PyTypeObject *type, PyObject *args, PyObject *kwds);
PyObject *tp_alloc_impl(PyTypeObject *type, Py_ssize_t nitems);
int tp_init_impl(PyObject *self, PyObject *args, PyObject *kwds);
void tp_dealloc_impl(PyObject *self);
void tp_free_impl(void *self);

static PyObject *reduce_ex_impl(PyObject *self, PyObject *, PyObject *);

PYBIND11_WARNING_PUSH
#if defined(__GNUC__) && __GNUC__ >= 8
PYBIND11_WARNING_DISABLE_GCC("-Wcast-function-type")
#endif
static PyMethodDef tp_methods_impl[]
    = {{"__reduce_ex__", (PyCFunction) reduce_ex_impl, METH_VARARGS | METH_KEYWORDS, nullptr},
       {nullptr, nullptr, 0, nullptr}};
PYBIND11_WARNING_POP

// Note that this name is versioned.
constexpr char tp_name_impl[]
    = "pybind11_detail_function_record_" PYBIND11_DETAIL_FUNCTION_RECORD_ABI_ID
      "_" PYBIND11_PLATFORM_ABI_ID_V4;

PYBIND11_NAMESPACE_END(function_record_PyTypeObject_methods)

// Designated initializers are a C++20 feature:
// https://en.cppreference.com/w/cpp/language/aggregate_initialization#Designated_initializers
// MSVC rejects them unless /std:c++20 is used (error code C7555).
PYBIND11_WARNING_PUSH
PYBIND11_WARNING_DISABLE_CLANG("-Wmissing-field-initializers")
#if defined(__GNUC__) && __GNUC__ >= 8
PYBIND11_WARNING_DISABLE_GCC("-Wmissing-field-initializers")
#endif
static PyTypeObject function_record_PyTypeObject = {
    PyVarObject_HEAD_INIT(nullptr, 0)
    /* const char *tp_name */ function_record_PyTypeObject_methods::tp_name_impl,
    /* Py_ssize_t tp_basicsize */ sizeof(function_record_PyObject),
    /* Py_ssize_t tp_itemsize */ 0,
    /* destructor tp_dealloc */ function_record_PyTypeObject_methods::tp_dealloc_impl,
    /* Py_ssize_t tp_vectorcall_offset */ 0,
    /* getattrfunc tp_getattr */ nullptr,
    /* setattrfunc tp_setattr */ nullptr,
    /* PyAsyncMethods *tp_as_async */ nullptr,
    /* reprfunc tp_repr */ nullptr,
    /* PyNumberMethods *tp_as_number */ nullptr,
    /* PySequenceMethods *tp_as_sequence */ nullptr,
    /* PyMappingMethods *tp_as_mapping */ nullptr,
    /* hashfunc tp_hash */ nullptr,
    /* ternaryfunc tp_call */ nullptr,
    /* reprfunc tp_str */ nullptr,
    /* getattrofunc tp_getattro */ nullptr,
    /* setattrofunc tp_setattro */ nullptr,
    /* PyBufferProcs *tp_as_buffer */ nullptr,
    /* unsigned long tp_flags */ Py_TPFLAGS_DEFAULT,
    /* const char *tp_doc */ nullptr,
    /* traverseproc tp_traverse */ nullptr,
    /* inquiry tp_clear */ nullptr,
    /* richcmpfunc tp_richcompare */ nullptr,
    /* Py_ssize_t tp_weaklistoffset */ 0,
    /* getiterfunc tp_iter */ nullptr,
    /* iternextfunc tp_iternext */ nullptr,
    /* struct PyMethodDef *tp_methods */ function_record_PyTypeObject_methods::tp_methods_impl,
    /* struct PyMemberDef *tp_members */ nullptr,
    /* struct PyGetSetDef *tp_getset */ nullptr,
    /* struct _typeobject *tp_base */ nullptr,
    /* PyObject *tp_dict */ nullptr,
    /* descrgetfunc tp_descr_get */ nullptr,
    /* descrsetfunc tp_descr_set */ nullptr,
    /* Py_ssize_t tp_dictoffset */ 0,
    /* initproc tp_init */ function_record_PyTypeObject_methods::tp_init_impl,
    /* allocfunc tp_alloc */ function_record_PyTypeObject_methods::tp_alloc_impl,
    /* newfunc tp_new */ function_record_PyTypeObject_methods::tp_new_impl,
    /* freefunc tp_free */ function_record_PyTypeObject_methods::tp_free_impl,
    /* inquiry tp_is_gc */ nullptr,
    /* PyObject *tp_bases */ nullptr,
    /* PyObject *tp_mro */ nullptr,
    /* PyObject *tp_cache */ nullptr,
    /* PyObject *tp_subclasses */ nullptr,
    /* PyObject *tp_weaklist */ nullptr,
    /* destructor tp_del */ nullptr,
    /* unsigned int tp_version_tag */ 0,
    /* destructor tp_finalize */ nullptr,
#if PY_VERSION_HEX >= 0x03080000
    /* vectorcallfunc tp_vectorcall */ nullptr,
#endif
};
PYBIND11_WARNING_POP

static bool function_record_PyTypeObject_PyType_Ready_first_call = true;

inline void function_record_PyTypeObject_PyType_Ready() {
    if (function_record_PyTypeObject_PyType_Ready_first_call) {
        if (PyType_Ready(&function_record_PyTypeObject) < 0) {
            throw error_already_set();
        }
        function_record_PyTypeObject_PyType_Ready_first_call = false;
    }
}

inline bool is_function_record_PyObject(PyObject *obj) {
    if (PyType_Check(obj) != 0) {
        return false;
    }
    PyTypeObject *obj_type = Py_TYPE(obj);
    // Fast path (pointer comparison).
    if (obj_type == &function_record_PyTypeObject) {
        return true;
    }
    // This works across extension modules. Note that tp_name is versioned.
    if (strcmp(obj_type->tp_name, function_record_PyTypeObject.tp_name) == 0) {
        return true;
    }
    return false;
}

inline function_record *function_record_ptr_from_PyObject(PyObject *obj) {
    if (is_function_record_PyObject(obj)) {
        return ((detail::function_record_PyObject *) obj)->cpp_func_rec;
    }
    return nullptr;
}

inline object function_record_PyObject_New() {
    auto *py_func_rec = PyObject_New(function_record_PyObject, &function_record_PyTypeObject);
    if (py_func_rec == nullptr) {
        throw error_already_set();
    }
    py_func_rec->cpp_func_rec = nullptr; // For clarity/purity. Redundant in practice.
    return reinterpret_steal<object>((PyObject *) py_func_rec);
}

PYBIND11_NAMESPACE_BEGIN(function_record_PyTypeObject_methods)

// Guard against accidents & oversights, in particular when porting to future Python versions.
inline PyObject *tp_new_impl(PyTypeObject *, PyObject *, PyObject *) {
    pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_new_impl");
    // return nullptr; // Unreachable.
}

inline PyObject *tp_alloc_impl(PyTypeObject *, Py_ssize_t) {
    pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_alloc_impl");
    // return nullptr; // Unreachable.
}

inline int tp_init_impl(PyObject *, PyObject *, PyObject *) {
    pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_init_impl");
    // return -1; // Unreachable.
}

// The implementation needs the definition of `class cpp_function`.
void tp_dealloc_impl(PyObject *self);

inline void tp_free_impl(void *) {
    pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_free_impl");
}

inline PyObject *reduce_ex_impl(PyObject *self, PyObject *, PyObject *) {
    // Deliberately ignoring the arguments for simplicity (expected is `protocol: int`).
    const function_record *rec = function_record_ptr_from_PyObject(self);
    if (rec == nullptr) {
        pybind11_fail(
            "FATAL: function_record_PyTypeObject reduce_ex_impl(): cannot obtain cpp_func_rec.");
    }
    if (rec->name != nullptr && rec->name[0] != '\0' && rec->scope
        && PyModule_Check(rec->scope.ptr()) != 0) {
        object scope_module = get_scope_module(rec->scope);
        if (scope_module) {
            return make_tuple(reinterpret_borrow<object>(PyEval_GetBuiltins())["eval"],
                              make_tuple(str("__import__('importlib').import_module('")
                                         + scope_module + str("')")))
                .release()
                .ptr();
        }
    }
    set_error(PyExc_RuntimeError, repr(self) + str(" is not pickleable."));
    return nullptr;
}

PYBIND11_NAMESPACE_END(function_record_PyTypeObject_methods)

PYBIND11_NAMESPACE_END(detail)
PYBIND11_NAMESPACE_END(PYBIND11_NAMESPACE)
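The versioned `tp_name_impl` in the header above is assembled by adjacent string-literal concatenation: the ABI-ID macros expand to string literals, and the compiler joins neighboring literals into a single array at compile time. A minimal standalone sketch of that mechanism follows; `EXAMPLE_FUNCTION_RECORD_ABI_ID` and `EXAMPLE_PLATFORM_ABI_ID` are hypothetical placeholders, not the real values of `PYBIND11_DETAIL_FUNCTION_RECORD_ABI_ID` or `PYBIND11_PLATFORM_ABI_ID_V4`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical placeholder IDs; the real pybind11 macros are defined elsewhere.
#define EXAMPLE_FUNCTION_RECORD_ABI_ID "v1"
#define EXAMPLE_PLATFORM_ABI_ID "example_platform"

// Adjacent string literals are merged by the compiler into one null-terminated array,
// so the full versioned type name exists at compile time, as with tp_name_impl.
constexpr char example_tp_name[]
    = "pybind11_detail_function_record_" EXAMPLE_FUNCTION_RECORD_ABI_ID
      "_" EXAMPLE_PLATFORM_ABI_ID;

// sizeof counts the terminating '\0', so equal sizes mean equal lengths.
static_assert(sizeof(example_tp_name)
                  == sizeof("pybind11_detail_function_record_v1_example_platform"),
              "unexpected length of the concatenated type name");
```

Changing either ID produces a different `tp_name`, so the `strcmp()` fallback in `is_function_record_PyObject()` will not treat function records from an incompatible build as its own.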
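`function_record_PyTypeObject` is initialized positionally, with each field named only in a comment, because designated initializers require C++20 (the header notes MSVC's C7555 error). The standalone sketch below shows the same C++11-compatible style on a hypothetical `ExampleTypeObject`, which is not the real `PyTypeObject` layout. Members omitted at the end of the braced list are value-initialized (`nullptr` or `0`), which is exactly what GCC and Clang flag with `-Wmissing-field-initializers`; that is presumably why the header disables the warning, since newer Python versions may add trailing `PyTypeObject` members that the initializer does not list.

```cpp
#include <cassert>

// Hypothetical stand-in for a C struct such as PyTypeObject (not the real layout).
struct ExampleTypeObject {
    const char *name;
    int basic_size;
    void (*dealloc)(void *);
    const char *doc;
    unsigned int version_tag;
};

inline void example_dealloc(void *) {}

// Positional aggregate initialization, with field names given only as comments.
// The members not listed (doc, version_tag) are value-initialized to nullptr / 0;
// with -Wextra, GCC and Clang may warn about this (-Wmissing-field-initializers).
static ExampleTypeObject example_type = {
    /* const char *name */ "example",
    /* int basic_size */ 16,
    /* void (*dealloc)(void *) */ example_dealloc,
};

// C++20-only equivalent using designated initializers (rejected by MSVC before /std:c++20):
//   ExampleTypeObject t = {.name = "example", .basic_size = 16, .dealloc = example_dealloc};
```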
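`function_record_PyTypeObject_PyType_Ready()` guards the one-time `PyType_Ready()` call with a plain `static bool` (the commit history records giving up on `std::call_once` for Python 3.6 compatibility). The sketch below reproduces that shape with hypothetical stand-ins, where `example_type_ready()` plays the role of `PyType_Ready()`. Because the flag is cleared only after success, a failed attempt is retried on the next call. The pattern is not thread-safe on its own; the sketch assumes, as the original presumably does, that callers are serialized (for example by the GIL).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins: a counter of initialization attempts and a switch that
// simulates PyType_Ready() failing.
static int example_ready_calls = 0;
static bool example_ready_should_fail = false;

// Mimics PyType_Ready(): returns a negative value on failure, 0 on success.
inline int example_type_ready() {
    ++example_ready_calls;
    return example_ready_should_fail ? -1 : 0;
}

static bool example_first_call = true;

// Same shape as function_record_PyTypeObject_PyType_Ready(): the flag is cleared only
// after a successful call, so an exception leaves it set and the next call retries.
inline void example_ensure_ready() {
    if (example_first_call) {
        if (example_type_ready() < 0) {
            throw std::runtime_error("type initialization failed");
        }
        example_first_call = false;
    }
}
```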
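`is_function_record_PyObject()` first compares type pointers and only then falls back to comparing the versioned `tp_name`, so a function record created by another extension module (which has its own `static` copy of the type object) is still recognized. `function_record_ptr_from_PyObject()` then relies on the `PyObject_HEAD`-first layout to cast the object pointer back to the full struct. The sketch below models both steps with hypothetical, simplified structs (not the CPython ones) and omits the `PyType_Check()` early exit.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stand-ins for PyTypeObject and PyObject (not the real layouts).
struct ExampleType {
    const char *tp_name;
};
struct ExampleObjectHead {
    ExampleType *ob_type;
};
// Like function_record_PyObject: the common header is the first member of a
// standard-layout struct, so a pointer to the header converts back to the full struct.
struct ExampleRecordObject {
    ExampleObjectHead head;
    int *payload;
};

static ExampleType example_record_type = {"example_record_v1"};

inline bool example_is_record(const ExampleObjectHead *obj) {
    if (obj->ob_type == &example_record_type) {
        return true;  // Fast path: the very same type object.
    }
    // Slow path: a distinct type object (e.g. from another module) with the same versioned name.
    return std::strcmp(obj->ob_type->tp_name, example_record_type.tp_name) == 0;
}

inline int *example_payload_from(ExampleObjectHead *obj) {
    if (!example_is_record(obj)) {
        return nullptr;
    }
    return reinterpret_cast<ExampleRecordObject *>(obj)->payload;
}
```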
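For a function whose scope is a module, `reduce_ex_impl()` returns a pair whose first element is the built-in `eval` and whose argument tuple holds a single expression string, built from the name returned by `get_scope_module(rec->scope)`. At unpickling time, evaluating that string imports the module and yields the module object. The sketch below reproduces only the string construction, in plain C++; `example_import_expression()` is a hypothetical helper, and it assumes the module name contains no quote characters (no escaping is done).

```cpp
#include <cassert>
#include <string>

// Builds the same expression text that reduce_ex_impl() hands to the built-in eval().
// Assumption: module_name is a plain (possibly dotted) module name without quotes.
inline std::string example_import_expression(const std::string &module_name) {
    return "__import__('importlib').import_module('" + module_name + "')";
}
```

For example, `example_import_expression("pybind11_tests")` gives `__import__('importlib').import_module('pybind11_tests')`. Because the expression performs the import itself, unpickling does not depend on the module having been imported beforehand, which is the requirement discussed in the review.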
I worry this implementation violates the protocol: the returned function call should create an object of the same type as the one that was `__reduce_ex__`-ed, i.e.:
Quoting from the Python documentation:
There is no mention that the callable has to be of the same type as the object being pickled. Such a restriction would be completely artificial and severely limiting:
There are types that, for good reasons, are not referenced from any importable module. There is no way that `pickle.load()` could find them. This is the case for Boost.Python and pybind11 functions.
Going deeper into the weeds:
Python has `__builtins__` as a home for some similar types. I was thinking of adding the pybind11 function type to `__builtins__`, but I was told that future Python versions will make this impossible.
Even deeper:
We could create something like `__pybind11__builtins__` (I'd actually call that `__pybind11_internals__`), which would have the benefit of solving problems like this one. BUT how does that spring into existence? Python creates `__builtins__` on startup; we don't have that luxury.
Also note that there may be multiple pybind11 versions/ABIs in use simultaneously in any given Python interpreter.
Conclusion: I don't think there is any practical way to implement a callable of the "correct" pybind11 function record type that `pickle.load()` could find.
My main concern is the type of the return value of the callable, which is supposed to be the "initial version of the object". From what I read, the current approach is:
I argue that step 1's callable should have returned the PyObject for function_record instead, because of the "returns initial value" requirement. As function_record has a name and a scope, I think we should have enough information for that. The records could be attached either to a pybind11_builtin module or to the actual scope where the CCallable is attached.
Then step 2 can be adjusted accordingly.
"Initial" here refers to what happens when the object is unpickled.
When the object is unpickled, the callable creates an object, and then the state captured by `__getstate__` (if there is one) is applied via `__setstate__` to mutate the "initial" object.
Step 2 is something we have no control over, because how `built-in method` objects are handled is coded into the pickle implementation. `repr(m.simple_callable)`:
Unless we want to completely change how pybind11 functions are implemented (nanobind actually did that!), we are forced to design the `__reduce_ex__` of the function_record to produce:
A callable that returns an (intermediate) object for which `getattr(obj, 'simple_callable')` returns our function object.
The intermediate object could be anything that has that behavior.
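A toy C++ model of the two-step load described above may help; it is an illustration only, not how CPython or pybind11 are implemented. A hypothetical `example_sys_modules` map stands in for `sys.modules` and a name-to-function table stands in for a module. Step 1 (the callable produced by the function record's `__reduce_ex__`) imports the module on demand and returns it as the intermediate object; step 2 (which pickle fixes for built-in methods) looks the function up by name. Because step 1 performs the import, the load works even in a fresh process where nothing has been imported yet.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-ins (hypothetical): a "module" maps attribute names to callables, and
// example_sys_modules plays the role of sys.modules (modules imported so far).
using ExampleFunction = std::function<int(int)>;
using ExampleModule = std::map<std::string, ExampleFunction>;
static std::map<std::string, ExampleModule> example_sys_modules;
static int example_import_count = 0;

// Step 1: the callable returned by the function record's __reduce_ex__.
// It imports the module itself instead of assuming an earlier import.
inline const ExampleModule &example_import_module(const std::string &name) {
    auto it = example_sys_modules.find(name);
    if (it == example_sys_modules.end()) {
        if (name != "example_ext") {
            throw std::runtime_error("ModuleNotFoundError: " + name);
        }
        ++example_import_count;  // Simulates running the extension module's init code.
        ExampleModule mod;
        mod["simple_callable"] = [](int x) { return x + 1; };
        it = example_sys_modules.emplace(name, std::move(mod)).first;
    }
    return it->second;
}

// Step 2: the equivalent of getattr(intermediate, name).
inline const ExampleFunction &example_getattr(const ExampleModule &mod,
                                              const std::string &attr) {
    auto it = mod.find(attr);
    if (it == mod.end()) {
        throw std::runtime_error("AttributeError: " + attr);
    }
    return it->second;
}

// Full load: step 1 produces the intermediate object, step 2 fetches the function.
inline int example_load_and_call(const std::string &module_name,
                                 const std::string &func_name, int arg) {
    const ExampleModule &mod = example_import_module(module_name);
    return example_getattr(mod, func_name)(arg);
}
```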
Additional important requirement: it must not assume that the module with the function was already imported; otherwise the pickle load step (which we have to assume runs in a new process) will fail.
In other words, the module needs to be imported in or before the `getattr(obj, 'simple_callable')` step, and we cannot rely on any side effects of importing the module before we trigger the import:
Yes, the function object has a name and a scope, but the type object does not have a scope.
This will be a problem even for nanobind, which doesn't use `PyCFunction` but has its own `nanobind.nb_func` type. To stay with that concrete example: unless `nanobind` exists as an importable module, the pickle load stage cannot work. This is what I meant by the "spring into existence" problem mentioned before.
I was thinking of giving the function_record type an `__init__` that could act as the callable in the pickle mechanism, but that's the exact same problem: how does it spring into existence in the pickle load stage?
Note that this PR solves that problem in a very minimalistic and clean fashion:
It could be even better (avoiding the `eval`) if we had something like `__import_module__`, which could then act as the callable directly:
Unfortunately that does not exist. I actually tried under this PR to make such a callable myself, and to inject it into the `rec->scope` when a `cpp_function` is built (so that it springs into existence just in time), but that fell flat because stubgen picks it up (>500 TGP failures, because we have ~130 auto-generated but checked-in stubgen files). See the message of the commit that replaced the "pickle helper" callable with `eval`: 0e7bb10
I had `_function_record_pickle_helper_abi_stamp` as the name at that time. I also tried `__function_record_pickle_helper_abi_stamp` and `__function_record_pickle_helper_abi_stamp__`, but stubgen stubbornly picked it up no matter what. The `eval` approach was born out of an intense search for a solution that does not require changing stubgen or, worse, 130+ checked-in auto-generated files all over the codebase.
Thanks a lot @rainwoodman for taking the time to go through this yesterday 1:1, and for correcting my understanding of your concern.
I captured the experimental code we worked on in 02b31b1 (exactly as it was when we left off).
Our conclusion is captured in the long comment added with bcdfc69. Could you please take another look?
Thanks for adding the comment and the test. This looks good!