-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
custom serialization and deserialization (2015 edition) #197
Conversation
3.3 seems to have some reference counting idiosyncrasies. I'll investigate further; if it turns out that it's just a matter of implementation details and the C-API is consistent between 3.3 and 3.4, I'll disable refcount tests for 3.3. |
Cool: I think the current branch tip is actually OK, and the failure was caused by the external memcached process failing to start on one of the builders. Is there an easy way to rerun Travis? This new commit is test-only. Basically, pickle has some internal memoization that can confuse refcounts, in particular the refcount of
Thoughts? |
I added caching for https://gist.github.com/slingamn/2a52be1180deb0b92336 Observations:
|
@@ -2491,9 +2554,14 @@ static void _make_excs(PyObject *module) { | |||
PylibMCExc_Error = PyErr_NewException( | |||
"pylibmc.Error", NULL, NULL); | |||
|
|||
PylibMCExc_CacheMiss = PyErr_NewException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be a root exception, especially if its only use is to signal failure in deserialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, "root exception" in terms of being a public member, or in terms of its place in the inheritance hierarchy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter
- Ludvig
On 10 okt. 2015, at 03:13, Shivaram Lingamneni [email protected] wrote:
In src/_pylibmcmodule.c:
@@ -2491,9 +2554,14 @@ static void _make_excs(PyObject *module) {
PylibMCExc_Error = PyErr_NewException(
"pylibmc.Error", NULL, NULL);
- PylibMCExc_CacheMiss = PyErr_NewException(
Sorry, "root exception" in terms of being a public member, or in terms of its place in the inheritance hierarchy?—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it inherit from BaseException
instead of Exception
? This seems not recommended.
For comparison, StopIteration
inherits from Exception
(but not from StandardError
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that it seems strange to not inherit from the common base exception in pylibmc.
On 11 okt. 2015, at 14:55, Shivaram Lingamneni [email protected] wrote:
In src/_pylibmcmodule.c:
@@ -2491,9 +2554,14 @@ static void _make_excs(PyObject *module) {
PylibMCExc_Error = PyErr_NewException(
"pylibmc.Error", NULL, NULL);
- PylibMCExc_CacheMiss = PyErr_NewException(
Should it inherit from BaseException instead of Exception? This seems not recommended.
For comparison, StopIteration inherits from Exception (but not from StandardError).
—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, gotcha.
Would you prefer a patch more along the lines of #75, where the client can choose between "user mode" and "native mode" serialization, thus eliminating the slowdown? I could do that. |
The question is whether or not it impacts performance; I've designed a rather solid benchmarking tool these last couple of days to this end. If you have the time, I would be super interested in seeing the benchmark results before and after the patch -- you don't need to print the graphs, but that would be nice too I guess. On a note unrelated this ticket, it seems like pylibmc's performance has degraded pretty severely over the years. 😢 |
I should also add that my gut is saying native mode sounds like a good idea, but backing it up with some benchmarks before and after would be really good. |
Results from the new benchmark are consistent with the results posted earlier (from older versions of the benchmark) --- a 3-microsecond slowdown per serialization/deserialization, multiplied by the number of values (hence 30 microseconds in the Multi benchmark). Here's the output: https://gist.github.com/slingamn/294f6991521144f34266 I'm open to eliminating this via native mode serialization. Do you have the historical numbers handy? I'm interested in going back and looking to see if some of the regressions can be eliminated. |
Thoughts? |
Sadly I don't have much in ways of historic numbers. Indeed, it might be hard to compare the times between versions simply due to differences in the machine running the test suite. I guess the python-memcached runtime is always a good reference point. |
I would love for pylibmc to be faster than it seems to be right now, though. Initially, the idea was to make something that performs better than the pure-Python implementation, which I remember used to be the case. Then again I didn't really know about statistics back then, so… Eh, well. ;) I'll merge this for now and release it in a while -- I guess you would prefer this be released sooner rather than later? A look into native-mode serialization sounds good. |
custom serialization and deserialization (2015 edition)
Oops wrong button |
Thanks! I can release this internally in the environment I'm planning to deploy it, so I'm not necessarily in a hurry for the release --- it's more important that the patch is upstreamed in the long term. (We were using a version of #102 internally, but this blocked us from upgrading because the patch had bitrotted substantially against the new upstream versions.) |
Alright, so I guess we can close #102 now as well? I’ll get to releasing ASAP, which is not that soon I’m afraid.
|
Good to hear it, @jstasiak. A new release is out. Enjoy! |
I actually already tested it few days ago and it worked very well. Once again - thank you. |
cc @bukzor
There are two commits in this PR. One is minor cleanup to some error-handling logic in the get_multi implementation. (I can split that out into a different PR.)
The main change is derived from #102 and incorporates the discussion from #75: it makes serialization and deserialization customizable by overriding methods in a client subclass. The current C implementations have become the
serialize
anddeserialize
methods of_pylibmc.client
; instead of dispatching to them via C function calls, it is now necessary to dispatch to them viaPyObject_CallMethod
, even in the normal case where they are not overridden.I have before-and-after benchmarks showing a small slowdown, at a low order of magnitude (apparently a few microseconds). The additional overhead is basically just:
flags
value in a Python integerflags
in a tupleThis should all be pretty cheap. (I might be able to recover some of the lost time by caching the values of
cPickle.dumps
andcPickle.loads
instatic PyObject *
variables; right now we do an import and a getattr every time.)Subclasses can raise
CacheMiss
to indicate that a retrieved value should be treated as though it were a miss. The reason it is necessary to do this, rather than having them returnNone
, is because of ambiguities in the API inherited from python-memcache. pylibmc allows clients to set aNone
value (the pickle ofNone
will be set in memcached), but such a value cannot be retrieved viaget
or__getitem__
in a way that distinguishes it from a cache miss. Specifically, in both cases,get
will returnNone
and__getitem__
will raise aKeyError
. However, a storedNone
can be retrieved viaget_multi
, which will return a dictionary mapping the relevant key toNone
. In contrast, when a key misses underget_multi
, the key will not appear in the result dictionary at all. Thus, with respect toget_multi
(and some other APIs), there's a need to distinguishNone
from a miss.Thanks for your time!