Add support for stream_id #104

Open
wants to merge 2 commits into master

Conversation

drolando
Contributor

Two main changes:

  • allow setting an optional stream_id in the request
  • if stream_id is set, all responses with a different id will be dropped

The issue we're hitting right now is that we set a low timeout (increasing it is not an option), which makes some of our queries time out. AFAICT there's a possible race condition: the first client times out, and immediately afterwards a second client sends a different request over the same connection and ends up reading the stale response meant for client1. The way to avoid this is to set and verify the stream_id and ignore any non-matching message.


NOTES

The logic that assigns an id to a request has to be protected with a mutex to avoid two requests ending up with the same id. Since the code outside lib/resty/cassandra cannot depend on OpenResty packages, I had to generate the id in cluster.lua and propagate it to lib/cassandra/cql.lua:build_frame.
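For reference, this is roughly where the id ends up in the wire format (a sketch based on the protocol v3 spec; the function name and signature are illustrative, not the library's actual build_frame):

    -- a rough sketch of how the stream id fits into a CQL v3 frame header
    -- (layout from the protocol spec; this is not the library's actual code)
    local function build_frame_header(op_code, body_len, stream_id)
      stream_id = stream_id or 0
      return string.char(
        0x03,                              -- version: request, protocol v3
        0x00,                              -- flags
        math.floor(stream_id / 256) % 256, -- stream id, high byte
        stream_id % 256,                   -- stream id, low byte
        op_code,                           -- e.g. 0x07 for QUERY
        math.floor(body_len / 2^24) % 256, -- body length, 4 bytes big-endian
        math.floor(body_len / 2^16) % 256,
        math.floor(body_len / 2^8) % 256,
        body_len % 256
      )
    end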

The response read code in lib/cassandra/init.lua:send is now inside an infinite loop: we keep reading until we find a response with the right id, or until the timeout expires.
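In pseudo-Lua, the loop looks roughly like this (read_frame, sock and the field names are placeholders, not the library's exact internals):

    -- keep reading frames until we see our own stream id; the socket's read
    -- timeout is what eventually breaks us out of the loop
    while true do
      local frame, err = read_frame(sock)
      if not frame then
        return nil, err                             -- includes the timeout case
      end
      if frame.stream_id == expected_stream_id then
        return frame                                -- this is our response
      end
      -- otherwise: a stale response from a request that already timed out; drop it
    end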

lib/cassandra/init.lua and lib/resty/cassandra are not unit tested at all, so I haven't added any new test for my change.

Also, the integration tests fail on my MacBook: Cassandra refuses to start with "Can not set double field org.apache.cassandra.config.Config.commitlog_sync_batch_window_in_ms to null value".
I'll let Travis run them.

@@ -550,6 +551,15 @@ function _Cluster:refresh()
-- initiate the load balancing policy
self.lb_policy:init(peers)

if self.stream_ids == nil then
-- Initialize the list of available seed ids (only once)
self.stream_ids = {}


Why not just use a counter for stream_id? It's less computationally expensive and you'll end up writing much less code

Contributor Author


True, I could use a counter and reset it once we go over 2^7; however, that gives us fewer features than this implementation.

Having a list of available ids and removing/reinserting them means only ids that are not currently in use can be handed out. With a counter, there's no way to keep track of which ids are still being used. This might not be a big deal with protocol v3, where we can use 32767 ids, but it could be an issue with protocol v2 since only 127 ids are available there.

The list of ids also has the advantage that we always reuse the least recently released id, since ids are added and removed in order.

This logic is similar to what's done in the python datastax driver: https://github.com/datastax/python-driver/blob/6bec6fd7e852ae33e12cf475b030d08260b04240/cassandra/connection.py#L286
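A minimal sketch of the list-based scheme described above (using the helper names mentioned later in this thread; the actual PR code may differ):

    -- ids start out all available; get pops from the back (O(1)) and release
    -- pushes at the front (O(N)), so the id popped next is always the least
    -- recently released one
    local stream_ids = {}
    for id = 1, 127 do                      -- 32767 with protocol v3
      stream_ids[id] = id
    end

    local function get_stream_id()
      return table.remove(stream_ids)       -- nil if all ids are in use
    end

    local function release_stream_id(id)
      table.insert(stream_ids, 1, id)
    end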

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the list implementation provides more features than the counter.
I'm just concerned about the performance impact of inserting at the front.
From the official docs: "The table.insert function inserts an element in a given position of an array, moving up other elements to open space." That makes it look like an O(N) operation.

Also, one thing to note is that this doesn't perfectly solve the issue of getting old responses. It seems to me that the stream_id will be released after a timeout, allowing another request to pick up that same stream_id and connection, and end up receiving the old response.
The probability of that happening is small, but the same can be said of a counter.

If Lua can insert at the front in O(1) time, then there shouldn't be a problem. But if it can't, it might be worth considering the counter implementation, which is better than the current behaviour (stream_id = 0) but not as robust as the list version.
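For comparison, the counter alternative being discussed would amount to something like this (purely illustrative, not code from the PR):

    -- a wrapping counter: O(1), but with no notion of which ids are still in use
    local MAX_STREAM_ID = 127               -- protocol v2; 32767 for v3
    local counter = 0

    local function next_stream_id()
      counter = counter % MAX_STREAM_ID + 1 -- wraps back to 1 after MAX_STREAM_ID
      return counter
    end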

Contributor Author


No, inserting at the beginning of an array as I'm doing is O(N), but according to the docs it's implemented in C and is very fast.

I ran get_stream_id and release_stream_id in a loop 10,000 times; on average each call took:
get_stream_id: 2.4 usec
release_stream_id: 54 usec

Another possibility is to use a linked list instead of an array, which keeps the same features as the current implementation, with O(1) get and put, but uses a bit more memory.

I'm not completely against the counter implementation though: we only use the v3 protocol, so it would work fine for us, though less so for people on v2.

@thibaultcha
Owner

@drolando Interesting, thanks for the PR! I have given it a look and will have some comments, but will need a couple days to post them. Are you already using this patch in production? And if so, have you noticed any performance impact (at least just from a QPS point of view)?

@thibaultcha
Owner

@drolando Sorry for the delay! I really need some free time to be able to look at this; I hope to get to it soon, since I finally got another library out the door and now have more time to dedicate to this lib.

@drolando
Contributor Author

np. FWIW we've been running this patch in production for a month now and haven't had any issues.

@thibaultcha
Owner

thibaultcha commented Aug 29, 2017

Hi @drolando,

Here are the concerns that have been on my mind but that I haven't had time to put in writing until now.

As far as I can tell, this logic will not prevent two workers from using the same stream ids at the same time. Sure, the mutex prevents two workers from choosing a stream_id at the same time. But since each worker operates in its own Lua VM, each worker has a different self.stream_ids table. And that is where the flaw is. Example:

Consider 2 workers with the following self.stream_ids arrays:

w1: [126, 127]
w2: [126, 127]

Now, if both workers process a query at the same time, both workers run get_stream_id():

w1: lock('stream_id')
    w2: lock('stream_id') -- sleep
w1: table.remove(self.stream_ids) -- returns 126
w1: unlock('stream_id')
    w2: table.remove(self.stream_ids) -- returns 126

We now have two workers using 126 at the same time as the stream_id for their request.

Additionally, the mutex logic does not prevent more than two workers from choosing stream ids at the same time. Example:

Consider 3 workers w1, w2, and w3:

w1: lock('stream_id')
    w2: lock('stream_id') -- sleep
    w3: lock('stream_id') -- sleep
w1: table.remove(self.stream_ids)
w1: unlock('stream_id') -- unlocks both w2 and w3
    w2: table.remove(self.stream_ids) + w3: table.remove(self.stream_ids) -- concurrently executed
    w2: unlock('stream_id') + w3: unlock('stream_id') -- could unlock another worker's lock

This illustrates several issues with the mutex logic: once a worker releases the mutex, all other waiting workers execute the get_stream_id() logic concurrently. Additionally, once they are done, they release a lock that was already released (and potentially release a mutex set by a hypothetical w4, which is the second issue with this logic).

To have one source of truth for stream ids, and to ensure access to that source of truth is protected by a mutex so that no two workers can modify it at the same time, you should take a look at the double-ended queue capabilities of ngx_lua (see ngx.shared.dict:lpush() et al.). However, I believe inserting a mutex in such a hot code path (if cluster:execute() is called once for each request) will create a bottleneck. If you are doing read operations, you can use a solution like the new lua-resty-mlcache, but Cassandra being a write-oriented database, this is an issue for time-series metrics, for example. A benchmark of this would be interesting.
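What that suggestion could look like, very roughly (the shared dict name and key are made up for this example; it relies on the ngx_lua shared dict queue API, ngx_lua >= 0.10.6):

    -- one queue of free ids shared by all workers, living in a lua_shared_dict
    -- (declared in nginx.conf, e.g. `lua_shared_dict cassandra_ids 1m;`)
    local shm = ngx.shared.cassandra_ids

    local function init_stream_ids(n)
      for id = 1, n do
        shm:rpush("free_ids", id)
      end
    end

    local function get_stream_id()
      return shm:lpop("free_ids")      -- nil, err if no id is available
    end

    local function release_stream_id(id)
      shm:rpush("free_ids", id)
    end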

I haven't looked into it, but do you happen to know what happens when two frames use the same stream_id? Does Cassandra refuse to process a query that bears the same stream_id as an already-running query? I will check the CQL protocol to dig for clarifications...

It is also worth noting that connection pools are only per worker, not per Nginx instance. This could matter when deciding whether several workers can use the same stream_id or not... A possible option could be to divide the pool of available ids per worker (e.g. w1 gets 1-16383 and w2 gets 16384-32767). This is just a wild idea and might not be applicable in practice - not sure.
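Back-of-the-envelope, that split could be derived like this (illustrative only, not part of the PR):

    -- carve the protocol v3 id space into one disjoint range per worker
    local MAX_STREAM_ID = 32767
    local per_worker = math.floor(MAX_STREAM_ID / ngx.worker.count())
    local first_id = ngx.worker.id() * per_worker + 1   -- ngx.worker.id() is 0-based
    local last_id  = first_id + per_worker - 1
    -- this worker only ever hands out ids in [first_id, last_id]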

@thibaultcha
Owner

@drolando Hi; any thoughts on the above?

@drolando
Contributor Author

drolando commented Sep 27, 2017

@thibaultcha Sorry, I completely missed your comments.

protocol v3: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v3.spec#L130

this logic will not prevent two workers from using the same stream ids at the same time

True, but that's not important. Each worker has its own connection pool, so they can use the same id if they want.
The id is used to differentiate responses on the same connection, so it only has to be unique per connection. Since we don't control which connection from the pool we'll get, we have to make ids unique across the entire connection pool.

Does Cassandra refuse to process a query that bears the same stream_id as an already-running query?

The protocol only says Cassandra will put the same id in the response, but it won't validate it in any way.

@thibaultcha
Owner

thibaultcha commented Sep 27, 2017

@drolando No worries.

True, but that's not important. Each worker has its own connection pool, so they can use the same id if they want.

Indeed they do have separate connection pools (as highlighted in my previous comment). Then what is the goal of the mutex in this patch? Wasn't it to prevent two workers from using the same stream id?

The protocol only says Cassandra will put the same id in the response, but it won't validate it in any way.

Hmm, indeed. For some reason I wouldn't feel comfortable just making that assumption though... I'd be inclined to double-check it (although I do have a hunch this assumption will turn out to be true).

In the meantime, should we rework this logic so that it is no longer built around an shm mutex?

@drolando
Contributor Author

Then what is the goal of the mutex in this patch? Wasn't it to prevent two workers from using the same stream id?

What I wanted to prevent is two coroutines on the same worker using the same id. However, I just remembered that nginx is non-preemptive, so we don't really risk any race condition, right?

In that case I could just remove the mutex entirely.

@drolando
Contributor Author

drolando commented Oct 5, 2017

Sorry for the delay on this. I now have some spare time, so I'll work on finishing upstreaming this change.

I have already made most of the changes suggested in the comments above; I'm just stuck trying to run the tests. I'll submit a few other pull requests while I fix them.

@drolando
Contributor Author

drolando commented Oct 6, 2017

A bunch of changes here:

I removed the mutex as discussed. This meant I could move the stream_id implementation into cassandra/init.lua, since I don't need anything OpenResty-specific anymore.

To improve the performance of pushing and popping from the stream_ids list, I decided to use a deque. I copied its implementation from the "Programming in Lua" book; I've no idea why they put it there but not in the stdlib... The implementation is quite simple and reduces the time complexity to O(1) for both push and pop.
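For readers following along, the shape of such a deque is roughly this (in the spirit of the "Programming in Lua" version; the PR's exact code may differ):

    -- both ends are O(1): push at the back, pop from the front (FIFO reuse of ids)
    local function new_deque()
      return { first = 1, last = 0 }
    end

    local function push_last(dq, v)
      dq.last = dq.last + 1
      dq[dq.last] = v
    end

    local function pop_first(dq)
      if dq.first > dq.last then
        return nil                      -- empty: every id is currently in use
      end
      local v = dq[dq.first]
      dq[dq.first] = nil                -- allow the slot to be collected
      dq.first = dq.first + 1
      return v
    end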

I also added a bunch of unit tests to make sure the stream_id is released and that the code does the right thing if it receives a response with the wrong stream_id in the header.

@drolando
Contributor Author

drolando commented Oct 6, 2017

I've also rebased on master, so hopefully the tests should now pass.
