
custom serialization and deserialization (2015 edition) #197

Merged: 5 commits, Nov 20, 2015

Conversation

@slingamn (Contributor) commented Oct 1, 2015

cc @bukzor

There are two commits in this PR. One is minor cleanup to some error-handling logic in the get_multi implementation. (I can split that out into a different PR.)

The main change is derived from #102 and incorporates the discussion from #75: it makes serialization and deserialization customizable by overriding methods in a client subclass. The current C implementations have become the serialize and deserialize methods of _pylibmc.client; instead of dispatching to them via C function calls, it is now necessary to dispatch to them via PyObject_CallMethod, even in the normal case where they are not overridden.
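The override mechanism can be sketched in pure Python. This is a self-contained mock, not pylibmc's actual C implementation: the class and flag names are illustrative, but it shows the protocol the PR describes, where serialize returns a (bytes, flags) pair and deserialize inverts it, and the client always dispatches through the methods so a subclass can swap the codec.

```python
import json
import pickle

# Illustrative flag bit, not pylibmc's actual constant.
FLAG_PICKLE = 1 << 0

class BaseClient:
    """Mimics the dispatch described above: set/get always call
    self.serialize/self.deserialize, so subclasses can override them."""

    def __init__(self):
        self._store = {}  # stands in for the memcached server

    def serialize(self, value):
        # Default behavior: pickle everything and set the pickle flag.
        return pickle.dumps(value), FLAG_PICKLE

    def deserialize(self, data, flags):
        if flags & FLAG_PICKLE:
            return pickle.loads(data)
        return data

    def set(self, key, value):
        data, flags = self.serialize(value)
        self._store[key] = (data, flags)

    def get(self, key):
        if key not in self._store:
            return None
        return self.deserialize(*self._store[key])

class JSONClient(BaseClient):
    """A subclass swapping pickle for JSON, the kind of customization
    this PR makes possible."""

    def serialize(self, value):
        return json.dumps(value).encode("utf-8"), 0

    def deserialize(self, data, flags):
        return json.loads(data.decode("utf-8"))

client = JSONClient()
client.set("answer", {"value": 42})
print(client.get("answer"))  # {'value': 42}
```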

I have before-and-after benchmarks showing a small slowdown, on the order of a few microseconds per operation. The additional overhead is basically just:

  1. boxing and unboxing the flags value in a Python integer
  2. boxing and unboxing the serialized bytes together with flags in a tuple
  3. method dispatch magic

This should all be pretty cheap. (I might be able to recover some of the lost time by caching the values of cPickle.dumps and cPickle.loads in static PyObject * variables; right now we do an import and a getattr every time.)
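The caching idea amounts to resolving pickle's entry points once at import time instead of re-resolving them on every call. A Python-level analogue of the C-level `static PyObject *` caching (a sketch, not the actual implementation):

```python
import pickle

# Cached references, resolved once at import time -- the analogue of
# holding cPickle.dumps/cPickle.loads in static PyObject * variables.
_dumps = pickle.dumps
_loads = pickle.loads

def roundtrip_cached(value):
    # Fast path: no module or attribute lookup per call.
    return _loads(_dumps(value))

def roundtrip_uncached(value):
    # The pre-caching pattern: an import and a getattr on every call.
    import pickle as p
    return p.loads(p.dumps(value))

print(roundtrip_cached([1, 2, 3]))  # [1, 2, 3]
```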

Subclasses can raise CacheMiss to indicate that a retrieved value should be treated as though it were a miss. This is necessary, rather than having them return None, because of ambiguities in the API inherited from python-memcache. pylibmc allows clients to set a None value (the pickle of None is stored in memcached), but such a value cannot be retrieved via get or __getitem__ in a way that distinguishes it from a cache miss: in both cases, get returns None and __getitem__ raises a KeyError. However, a stored None can be retrieved via get_multi, which returns a dictionary mapping the relevant key to None; in contrast, when a key misses under get_multi, the key does not appear in the result dictionary at all. Thus, with respect to get_multi (and some other APIs), there is a need to distinguish None from a miss.
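The ambiguity can be reproduced with a dict-backed mock (MiniCache is a hypothetical stand-in mimicking the behavior described above, not pylibmc itself):

```python
class MiniCache:
    """Dict-backed mock of the API ambiguity described above."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        # None for a stored None *and* for a miss -- indistinguishable.
        return self._store.get(key)

    def __getitem__(self, key):
        value = self._store.get(key)
        if value is None:
            raise KeyError(key)  # stored None and a miss both raise
        return value

    def get_multi(self, keys):
        # Only get_multi tells the cases apart: a miss omits the key,
        # while a stored None maps the key to None.
        return {k: self._store[k] for k in keys if k in self._store}

c = MiniCache()
c.set("a", None)
print(c.get("a"), c.get("missing"))   # None None
print(c.get_multi(["a", "missing"]))  # {'a': None}
```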

Thanks for your time!

@slingamn (Contributor, Author) commented Oct 1, 2015

3.3 seems to have some reference counting idiosyncrasies. I'll investigate further; if it turns out that it's just a matter of implementation details and the C-API is consistent between 3.3 and 3.4, I'll disable refcount tests for 3.3.

@slingamn (Contributor, Author) commented Oct 2, 2015

Cool: I think the current branch tip is actually OK, and the failure was caused by the external memcached process failing to start on one of the builders. Is there an easy way to rerun Travis?

This new commit is test-only. Basically, pickle has some internal memoization that can confuse refcounts, in particular the refcount of None. (This makes the approach of testing reference counts somewhat more brittle than I had anticipated.) Anyway, the fix is:

  1. Stop checking the refcount of None
  2. Run gc.collect() before each refcount measurement
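The fix can be illustrated with a minimal refcount probe (a CPython-specific sketch; `stable_refcount` is a hypothetical helper, and per fix (1) it probes a fresh object rather than None):

```python
import gc
import sys

def stable_refcount(obj):
    # Collect first so lingering garbage (e.g. from pickle's internal
    # memoization) doesn't skew the measurement, per fix (2) above.
    gc.collect()
    return sys.getrefcount(obj)

sentinel = object()          # probe a fresh object, never None (fix 1)
before = stable_refcount(sentinel)
extra = [sentinel]           # hold exactly one additional reference
after = stable_refcount(sentinel)
print(after - before)        # 1
```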

Thoughts?

@slingamn (Contributor, Author) commented Oct 8, 2015

I added caching for pickle.loads and pickle.dumps. New benchmarks:

https://gist.github.com/slingamn/2a52be1180deb0b92336

Observations:

  1. Boxing/unboxing and Python method dispatch incur a 3-microsecond penalty when working with bytes, int, bool, and other types that aren't pickled. This penalty gets multiplied by the number of keys in a _multi operation, so the slowdown in the multi I/O benchmark is about 30 microseconds.
  2. When working with pickles, the caching seems to compensate for this, so performance is the same or faster.

@@ -2491,9 +2554,14 @@ static void _make_excs(PyObject *module) {
PylibMCExc_Error = PyErr_NewException(
"pylibmc.Error", NULL, NULL);

PylibMCExc_CacheMiss = PyErr_NewException(
@lericson (Owner):

I don't think this should be a root exception, especially if its only use is to signal failure in deserialization?

@slingamn (Contributor, Author):

Sorry, "root exception" in terms of being a public member, or in terms of its place in the inheritance hierarchy?

@lericson (Owner):

The latter.

– Ludvig


@slingamn (Contributor, Author):

Should it inherit from BaseException instead of Exception? That doesn't seem recommended.

For comparison, StopIteration inherits from Exception (but not from StandardError).

@lericson (Owner):

I mean that it seems strange to not inherit from the common base exception in pylibmc.


@slingamn (Contributor, Author):

Oh, gotcha.

@slingamn (Contributor, Author) commented:
Would you prefer a patch more along the lines of #75, where the client can choose between "user mode" and "native mode" serialization, thus eliminating the slowdown? I could do that.

@lericson (Owner) commented:
The question is whether or not it impacts performance; I've built a rather solid benchmarking tool over the last couple of days to this end. If you have the time, I would be super interested in seeing the benchmark results before and after the patch -- you don't need to print the graphs, but that would be nice too, I guess. On a note unrelated to this ticket, it seems like pylibmc's performance has degraded pretty severely over the years. 😢

@lericson (Owner) commented:
I should also add that my gut is saying native mode sounds like a good idea, but backing it up with some benchmarks before and after would be really good.

@slingamn (Contributor, Author) commented:
Results from the new benchmark are consistent with the results posted earlier (from older versions of the benchmark) --- a 3-microsecond slowdown per serialization/deserialization, multiplied by the number of values (hence 30 microseconds in the Multi benchmark). Here's the output: https://gist.github.com/slingamn/294f6991521144f34266

I'm open to eliminating this via native mode serialization.

Do you have the historical numbers handy? I'm interested in going back and looking to see if some of the regressions can be eliminated.

@slingamn (Contributor, Author) commented:
Thoughts?

@lericson (Owner) commented:
Sadly, I don't have much in the way of historical numbers. Indeed, it might be hard to compare times between versions simply due to differences in the machines running the test suite. I guess the python-memcached runtime is always a good reference point.

@lericson (Owner) commented:
I would love for pylibmc to be faster than it seems to be right now, though. Initially, the idea was to make something that performs better than the pure-Python implementation, which I remember used to be the case. Then again I didn't really know about statistics back then, so… Eh, well. ;) I'll merge this for now and release it in a while -- I guess you would prefer this be released sooner rather than later? A look into native-mode serialization sounds good.

@lericson lericson closed this Nov 20, 2015
@lericson lericson reopened this Nov 20, 2015
lericson added a commit that referenced this pull request Nov 20, 2015
custom serialization and deserialization (2015 edition)
@lericson lericson merged commit f3e1d03 into lericson:master Nov 20, 2015
@lericson (Owner) commented:
Oops wrong button

@slingamn (Contributor, Author) commented:
Thanks! I can release this internally in the environment I'm planning to deploy it, so I'm not necessarily in a hurry for the release --- it's more important that the patch is upstreamed in the long term. (We were using a version of #102 internally, but this blocked us from upgrading because the patch had bitrotted substantially against the new upstream versions.)

@lericson (Owner) commented:
Alright, so I guess we can close #102 now as well?

I’ll get to releasing ASAP, which is not that soon I’m afraid.


@jstasiak commented:
Wow! I remember seeing #75 from some time ago and came back to the issue tracker to refresh my memory, it makes me very happy to see this patch, thank you @slingamn and @lericson! Is there a chance this is released on PyPI any time soon?

@lericson (Owner) commented:
Good to hear it, @jstasiak. A new release is out. Enjoy!

@jstasiak commented:
I actually already tested it a few days ago and it worked very well. Once again, thank you.
