[RFC, WIP] Global memorypool implementation #3644
Conversation
by completely removing malloc/free from the hot path of the benchmark utility
// Increment counter
auto prevVal = elementsCompletelyDequeued.fetch_add(count, std::memory_order_release);
assert(prevVal + count <= BLOCK_SIZE);
return prevVal + count == BLOCK_SIZE;
MEMORY_LEAK: memory dynamically allocated by call to malloc at line 3529, column 12 is not reachable after line 3530, column 10.
// Increment counter
auto prevVal = elementsCompletelyDequeued.fetch_add(count, std::memory_order_release);
assert(prevVal + count <= BLOCK_SIZE);
return prevVal + count == BLOCK_SIZE;
MEMORY_LEAK: memory dynamically allocated by call to malloc at line 3522, column 12 is not reachable after line 3523, column 10.
I am aware of this double-posting of comments. It is technically due to a double send of the pull-request event from GitHub. Solutions are being considered, but unless the issue occurs painfully often I don't expect to give it priority. Please feel free to shout about any pain points.
What's worse is that the line numbers are wrong: the comment is made at line 1548 but it mentions line 3522, and there is no malloc call around line 1548.
Anyway, I think I found the reason for a possible memory leak (the global_memory_pool_t dtor was not yet implemented!) and fixed it in my last check-in.
Thank you for calling this out. I am now in contact with GitHub regarding this issue.
Details: I've investigated and determined it is a GitHub bug. The quickest way to tell something is inconsistent is to click on the file src/concurrentqueue.h and see GitHub place the comment on the correct line (35xx instead of 15xx).
As explained in the other pull request, a single-consumer/single-producer memory pool is very dangerous and will break zeromq for existing users. I think we should either not include a built-in memory pool for the moment and only allow users to provide their own, or look for a safe alternative.
src/msg.cpp
Outdated
_u.lmsg.metadata = NULL;
_u.lmsg.type = type_lmsg;
_u.lmsg.flags = 0;
_u.lmsg.allocator_was_used = 1;
I don't think we need another field; you can use the ffn and maybe another msg type?
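For context, the "ffn" route already exists in the public API: zmq_msg_init_data() lets a large message carry a custom free function plus a hint pointer, so a pool could reclaim its blocks through that hook. A minimal sketch follows; the toy_pool_t type and its get/put helpers are hypothetical stand-ins, not part of this PR:
#include <zmq.h>
#include <stddef.h>

typedef struct toy_pool_t toy_pool_t;               /* hypothetical pool type */
void *toy_pool_get (toy_pool_t *pool, size_t size); /* hypothetical */
void toy_pool_put (toy_pool_t *pool, void *block);  /* hypothetical */

/*  Matches zmq_free_fn: libzmq calls this when the large message is
    released, so the block goes back to the pool instead of free(). */
static void pool_free_fn (void *data, void *hint)
{
    toy_pool_put ((toy_pool_t *) hint, data);
}

static int init_pooled_msg (zmq_msg_t *msg, toy_pool_t *pool, size_t size)
{
    void *block = toy_pool_get (pool, size);
    if (!block)
        return -1;
    /*  The pool is passed as the hint, the pool block as the data. */
    return zmq_msg_init_data (msg, block, size, pool_free_fn, pool);
}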
Yes, OK, I can make this change by adding a message type such as "type_lmsg_from_allocator". I was a bit wary of adding a new msg type since I was not sure whether it would trigger side effects: there are a few methods like zmq::msg_t::is_lmsg(). I will probably need to find the places where they get called and work out whether I also need to add a call to the new zmq::msg_t::is_lmsg_from_allocator(), but that's doable of course.
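A minimal sketch of what that could look like (the enum value and placement are illustrative, not the actual msg.hpp contents; the helper mirrors the existing is_lmsg()-style checks):
//  msg.hpp, next to type_lmsg (value chosen arbitrarily for illustration):
//      type_lmsg_from_allocator = 110,

//  msg.cpp, following the pattern of the existing is_*() helpers:
bool zmq::msg_t::is_lmsg_from_allocator () const
{
    return _u.base.type == type_lmsg_from_allocator;
}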
src/msg.cpp
Outdated
zmq_assert (alloc_ != NULL && size_ != 0);

_u.lmsg.metadata = NULL;
_u.lmsg.type = type_lmsg;
I think we should have a dedicated msg type for allocator-backed messages, just like the one for zero-copy messages: https://github.com/zeromq/libzmq/blob/master/src/msg.hpp#L197
Hi @somdoron
I tend to agree; indeed, in this WIP memory pool I used moodycamel::ConcurrentQueue, which is "a fast multi-producer, multi-consumer lock-free concurrent queue for C++11". As for global_memory_pool_t::m_storage, however, these variables should be set only during initial setup and not later, when multiple threads will be using these data structures.
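To make that concrete, here is a minimal sketch of the idea, assuming moodycamel::ConcurrentQueue as the lock-free free list (its enqueue()/try_dequeue() calls are the real queue API); the class is a simplified stand-in for global_memory_pool_t, not the PR's actual code:
#include "concurrentqueue.h"
#include <cstddef>
#include <cstdlib>

class pool_sketch_t
{
  public:
    pool_sketch_t (std::size_t block_size, std::size_t num_blocks) :
        _block_size (block_size),
        _storage (static_cast<char *> (std::malloc (block_size * num_blocks)))
    {
        //  Single-threaded setup (the m_storage-style state mentioned above):
        //  carve the arena into blocks and push them onto the free list.
        for (std::size_t i = 0; i < num_blocks; ++i)
            _free_blocks.enqueue (_storage + i * block_size);
    }

    //  Without a dtor the arena leaks, as noted earlier in the thread.
    ~pool_sketch_t () { std::free (_storage); }

    //  Hot path: no malloc/free, just a lock-free dequeue/enqueue.
    void *allocate ()
    {
        void *block = NULL;
        if (_free_blocks.try_dequeue (block))
            return block;
        return NULL; //  pool exhausted: caller may fall back to malloc
    }

    void release (void *block) { _free_blocks.enqueue (block); }

  private:
    std::size_t _block_size;
    char *_storage;
    moodycamel::ConcurrentQueue<void *> _free_blocks;
};
Whether exhaustion should fall back to malloc or fail is exactly the question debated later in the thread.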
Hi @somdoron, @sigiesec, @bluca,
Just for comparison, the DPDK framework has chosen option 1b and option 2b. PS: I will run some performance benchmarks to check whether the advantage is still like the one I found in #3631.
Regarding (1), I think a fallback to malloc/free is better, as one of the users of this method is going to be the decoder classes. Regarding (3), I would the
// default allocator using malloc/free
#define ZMQ_MSG_ALLOCATOR_DEFAULT 0
// using internally a SPSC queue (cannot be used with inproc maybe?) or perhaps an MPMC queue anyway
#define ZMQ_MSG_ALLOCATOR_PER_THREAD_POOL 1
I think we should drop this for now and stay with the global pool
// using internally a MPMC queue
#define ZMQ_MSG_ALLOCATOR_GLOBAL_POOL 2

ZMQ_EXPORT void *zmq_msg_allocator_new (int type_);
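For illustration, this is roughly how an application might use the API quoted above. Only zmq_msg_allocator_new() and the ZMQ_MSG_ALLOCATOR_* constants come from this PR; zmq_msg_init_allocator() is a hypothetical name for whatever call ends up binding a message to the allocator:
#include <zmq.h>
#include <string.h>

int send_pooled (void *socket, const void *buf, size_t len)
{
    //  Create a global-pool allocator once, e.g. at application startup.
    static void *allocator =
      zmq_msg_allocator_new (ZMQ_MSG_ALLOCATOR_GLOBAL_POOL);

    zmq_msg_t msg;
    //  Hypothetical call: initialise the message from a pool-owned block
    //  rather than via malloc.
    if (zmq_msg_init_allocator (&msg, len, allocator) != 0)
        return -1;

    memcpy (zmq_msg_data (&msg), buf, len);
    return zmq_msg_send (&msg, socket, 0);
}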
Maybe we can drop the "msg" from the class name?
I wonder if we want to call it pool instead of allocator. Also, how do you plan for the decoders to get hold of the allocator?
I think allocator is the appropriate abstract term here. A pool is a specific kind of allocator, but there might be other allocators that are custom but not based on a pool.
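A small sketch of that distinction (illustrative interface only, not the classes in this PR): an abstract allocator, with malloc/free as the default implementation and a pool as just one alternative.
#include <cstddef>
#include <cstdlib>

//  Abstract allocator interface (C++03-friendly, no override keyword).
struct i_allocator
{
    virtual ~i_allocator () {}
    virtual void *allocate (std::size_t size) = 0;
    virtual void deallocate (void *ptr) = 0;
};

//  Default behaviour (ZMQ_MSG_ALLOCATOR_DEFAULT): plain malloc/free.
struct malloc_allocator_t : public i_allocator
{
    virtual void *allocate (std::size_t size) { return std::malloc (size); }
    virtual void deallocate (void *ptr) { std::free (ptr); }
};

//  A pool (ZMQ_MSG_ALLOCATOR_GLOBAL_POOL) would implement the same
//  interface but recycle fixed-size blocks instead of freeing them.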
Hi guys, I continued a bit here, as this PR seemed a bit dead. I was wondering how you (@f18m and the zmq guys) would like to proceed. If you would like me to continue on this, I think it is better if I make a separate PR and we can discuss what needs to change there. @f18m, if you want to continue, feel free to merge my branch if you think it is useful.
Hi @mjvk,
OK cool, I'll try to pick this up. Your comments will be very welcome, of course. One thing would be a C++03-compatible MPMC queue, for example, or I'll just add some guards.
#3911 seems to have picked up these changes, so closing this one.
See #3631 for the full discussion.
This PR is just to show my current results with:
Any comment is welcome!