nng creates too many threads #1572

Jasper-Bekkers · 2022-02-01T15:59:38Z

Describe the bug
I'm running a 32-core/64-thread threadripper machine, and it looks like in total nng creates 151 threads.

While this could be the wanted case for some apps, for the application that I'm writing it's all around way too much, and clutters up the profiling tools quite a bit.

Thread name	Thread count
nng:iocp	64
nng:ipc:dial	1
nng:resolver	4
nng:task	16
nng:reap2	1
nng:timer	1
nng:aio:expire	64

Expected behavior
Ideally, thread creation is left to the application instead of being done by the nng framework, that way I can control scheduling of work and fit it into my async execution runtime, instead of having nng take up resource (and arguably more important) screen real estate in the profiler.

In our application we're trying to severely limit our number of threads and schedule work on our own.

Actual Behavior
See above

To Reproduce
Simply run nng on a high core-count machine.

** Environment Details **

NNG version: we're using the nng-rs crate, I think it's a bit behind on latest nng
Operating system and version: Windows
Compiler and language used: rustc
Shared or static library: unknown (and probably irrelevant)

Additional context
I found an additional issue on limiting the amount of threads being spun up in #769 this seems to require compile time toggles? Ideally this is something that can be set at runtime (we have a few different use-cases all sharing the same codebase - one case where run the service inproc and want a low thread count, one where we run the service as a daemon and we don't really care about thread count).

The text was updated successfully, but these errors were encountered:

JaylinYu · 2022-02-06T16:01:27Z

This is expected behavior. But I agree with you that nng should have a way to set the num of threads at runtime ideally.
Hi @gdamore, please have a look at "var_num_taskq_threads" https://github.com/nanomq/nng/blob/master/src/core/taskq.c, I will prepare a PR if you are ok with my way.

gdamore · 2022-02-06T18:30:35Z

The iocp and expire threads are created one per core to provide good scalability.

Having said that, I'm totally amenable to reducing this. Note that these threads should generally be sitting around idle unless you have extreme I/O pressure, in which case you actually probably want to have this many threads (although you wind up having a trade-off of application processing and NNG I/O processing.)

gdamore · 2022-02-06T18:32:52Z

@JaylinYu those tunables won't really help this -- because the threads involved aren't taskq threads. I'm also pretty opposed to using globals to tune that -- if we need tuning, then local variables (parameters on the taskq) should be set.

We need to ensure that at least 2 threads are allocated for any taskq, as it's likely that deadlocks will occur if only one thread is allocated. I recommend 4 as a bare minimum actually.

gdamore · 2022-02-06T18:35:34Z

I think the IOCP and expire threads should have a similar default limiter as the tasks... I think we max this to 16. Arguably these could be even smaller -- especially the expire thread can safely be reduced all the way down to 1. (It used to be one, but we started creating a lot more of them to improve scalability by reducing contention on locks -- each thread has it's own queue, and having one per thread tends to ensure we don't contend on the expiration queue locks, which were one of the larger areas of lock contention.

Jasper-Bekkers · 2022-02-08T22:53:02Z

Full disclosure: Since filing this issue I've moved away from nng because of this particular issue.

Note that these threads should generally be sitting around idle unless you have extreme I/O pressure, in which case you actually probably want to have this many threads (although you wind up having a trade-off of application processing and NNG I/O processing.)

I highly doubt this is true, over subscribing a machine like that is almost never useful and just introduces overhead.

gdamore · 2022-02-08T23:12:27Z

Idle threads not runnable threads don't hurt anything at all apart from the stack space they consume.

Where things get really ugly is when you have a large number of runnable threads. Then stuff falls apart.

There are unfortunately also some limitations in userland thread frameworks that wind up using way too many resources when allocating a thread (thread stacks) which is why you can't create 10,000 threads in user space harmlessly. (On modern kernels like illumos you can, but only in kernel space. 10,000 threads in that situations uses like 80MB of RAM for stack space, at 8K per thread.)

gdamore · 2022-02-08T23:13:15Z

Btw, the thread count we allocate is based on the CPU core count of the machine.

gdamore · 2022-04-16T22:52:03Z

Just a few more notes for anyone looking at this:

The 64 expiration threads are almost always idle. They represent a timer thread, and the reason we have so many (one timer per CPU) is to reduce pressure on the central lock used for expirations (timers). We found that in some workloads the expiration queue was tragically contended, and this was a simple solution.

We can probably look at alternative approaches to use fewer threads, since likely the expirations have some reasonable tolerances with respect to timing out.

The IOCP thread list is this way on Windows, as it represents the actual threads doing I/O. Conceivably this could better be done with a Windows thread pool. Note that on other platforms we don't have nearly this many threads, but that comes at a scalability cost as well.

The task threads are the ones doing real work, and likely running user callbacks.

It might be worth looking for ways for user supplied thread(s) to be able to chew on this instead of fixed allocation threads. This could ultimately be used for some kind of cooperative multitasking, although running user callbacks is always a bit dodgy since we have no long those will take to complete.

JochenBaier · 2022-09-18T14:28:14Z

The high amount of thread is also a blocker for me. Basically i want to use the Request/Reply
pattern (https://nanomsg.org/gettingstarted/nng/reqrep.html) with only 2 nodes.

On my 6 core machine (Windows 10) this creates 64 threads…

Patching the limits in source would also be fine for me.

wsxjbupt · 2022-11-16T08:59:50Z

Is there a way to close redundant threads without modifying the source code?
We only use nng for thread-to-thread communication through a simple P2P approach.
The number of threads is also a big obstacle in our system.

@gdamore

bionicbeagle · 2022-11-28T20:39:26Z

I've been evaluating nng for use in some tooling and this issue is a blocker for us as well. Much like @Jasper-Bekkers we often use machines with high core counts and the number of threads is a problem for debuggability and performance profiling since the threads cause lots of clutter as well as making the library needlessly resource intensive regardless of whether threads do actual work or not. In the case of IOCP especially there's typically no real need to have a huge number of threads as long as you only do I/O related work in the completion handlers.

We are currently consuming nng via vcpkg and therefore using build-time defines to reconfigure nng is not really a great option. Ideally there would be an API to allow application-specific tuning. I'm currently not familiar enough with the project to propose what that would look like but might take a stab at it if there is no attractive alternative.

bionicbeagle · 2022-11-30T21:12:31Z

I made a quick-and-dirty local patch to enable initialization-time tweaking of thread counts. It's still a bit rough around the edges but works well enough for me to evaluate.

It adds the following API functions:

void nng_set_ncpu_max(int);  // limit the number of CPUs reported
void nng_set_pool_thread_limit_min(int); // configure minimum number of threads in thread pools
void nng_set_pool_thread_limit_max(int); // configure maximum number of threads in thread pools
void nng_set_resolve_thread_max(int); // configure number of resolve threads

patch is here

I wasn't sure what would be a good naming scheme for these as I didn't see any similar function to mimic, feedback is welcomed! (I already know nng_set_resolve_thread_max is a misnomer as it sets the actual count)

gdamore · 2023-04-20T06:42:16Z

So initially Im going to add compile time defaults that are sensible. I may add runtime for these as well.

gdamore · 2023-04-24T00:19:22Z

I'm using 1619 as the EXPIRE_THREADS tunable, and a PR is out... (I'm waiting to merge that as I've uncovered some other worse bugs that I want to ensure are fixed first.)

I expect this will merge.

I'll revisit the poller for Windows to do something similar for it as well. Longer term we need to look at ways to improve the parallelism in the pollers anyway -- in nanomq they are running into scalability issues beyond 64 cores with the single epoll() thread.

gedalia · 2023-06-23T21:32:20Z

did a fix ever merge?

ayizhi · 2023-08-22T02:37:18Z

Hope these threads num settings functions apply as soon as possible, sincerely, respect

xinxinxiangying258 · 2023-08-22T05:59:14Z

Describe the bug I'm running a 32-core/64-thread threadripper machine, and it looks like in total nng creates 151 threads.

While this could be the wanted case for some apps, for the application that I'm writing it's all around way too much, and clutters up the profiling tools quite a bit.

Thread name Thread count
nng:iocp 64
nng:ipc:dial 1
nng:resolver 4
nng:task 16
nng:reap2 1
nng:timer 1
nng:aio:expire 64
Expected behavior Ideally, thread creation is left to the application instead of being done by the nng framework, that way I can control scheduling of work and fit it into my async execution runtime, instead of having nng take up resource (and arguably more important) screen real estate in the profiler.

In our application we're trying to severely limit our number of threads and schedule work on our own.

Actual Behavior See above

To Reproduce Simply run nng on a high core-count machine.

** Environment Details **

NNG version: we're using the nng-rs crate, I think it's a bit behind on latest nng

Operating system and version: Windows

Compiler and language used: rustc

Shared or static library: unknown (and probably irrelevant)

Additional context I found an additional issue on limiting the amount of threads being spun up in #769 this seems to require compile time toggles? Ideally this is something that can be set at runtime (we have a few different use-cases all sharing the same codebase - one case where run the service inproc and want a low thread count, one where we run the service as a daemon and we don't really care about thread count).

Could you tell me how to find nng threads on Windows? I use Process Explorer but cannot find nng threads named as your table shows. THX.

gdamore · 2023-09-14T07:47:04Z

I've not merged anything... as the patch was not submitted in the form of a PR, I will need some time to look it over and get it into such form so that it can run in our CI.

gdamore · 2023-09-14T07:49:17Z

Btw, there are compile time fixes for these that * did* merge. Runtime is a different matter. Probably it will still need to be set up by the application before calling nng_init() or other functions, because the initial threads have to be configured at startup.

gdamore · 2023-11-26T20:14:52Z

I intend at some point to refactor the expiration subsystem entirely -- which will probably let me greatly reduce the needs for lots of threads, by reducing the lock pressure caused by expiration.

I am probably not going to configuring the expiration threads a runtime tunable.

gdamore · 2023-12-18T06:14:19Z

In head, we got read of the nni_timer thread. It's only thread, but still an improvement, if only a small one.

gdamore · 2023-12-30T03:16:01Z

I've decided to apply a sane limit for Windows by default. 64 was simply too big.

#1749

This limits to 8 by default. Note that POSIX pollers are already single threaded right now, so this isn't necessarily a big deal (although we need to make them parallel too at some point.)

Its compile time tunable as well, but 8 is a lot nicer to work around than 64. It has to be at least 2 right now. (I probably could have allowed just one of these, but I'm paranoid about going too low.)

gdamore · 2024-01-01T19:19:06Z

I think I am going to provide an API for this, that applications can call at start up time. Stay tuned.

This further limits some of the thread counts, but principally it offers a new runtime facility, nng_init_set_parameter(), which can be used to set certain runtime parameters on the number of threads, provided it is called before the rest of application start up. This facility is quite intentionally "undocumented", at least for now, as we want to limit our commitment to it. Still this should be helpful for applications that need to reduce the number of threads that are created.

JochenBaier mentioned this issue Sep 23, 2022

Expose NNG_MAX_EXPIRE_THREADS via CMake #1619

Closed

gdamore changed the title ~~nng creates a massive amount of threads on high core count machines~~ nng creates too many expire threads Apr 20, 2023

xinxinxiangying258 mentioned this issue Aug 22, 2023

How to obseve nng threads on windows #1680

Closed

ngwese mentioned this issue Oct 4, 2023

Norns converged update monome/norns#1706

Merged

gdamore changed the title ~~nng creates too many expire threads~~ nng creates too many threads Jan 1, 2024

gdamore closed this as completed in a9e98e5 Jan 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nng creates too many threads #1572

nng creates too many threads #1572

Jasper-Bekkers commented Feb 1, 2022

JaylinYu commented Feb 6, 2022

gdamore commented Feb 6, 2022

gdamore commented Feb 6, 2022

gdamore commented Feb 6, 2022

Jasper-Bekkers commented Feb 8, 2022

gdamore commented Feb 8, 2022

gdamore commented Feb 8, 2022

gdamore commented Apr 16, 2022

JochenBaier commented Sep 18, 2022

wsxjbupt commented Nov 16, 2022

bionicbeagle commented Nov 28, 2022

bionicbeagle commented Nov 30, 2022 •

edited

Loading

gdamore commented Apr 20, 2023

gdamore commented Apr 24, 2023

gedalia commented Jun 23, 2023

ayizhi commented Aug 22, 2023

xinxinxiangying258 commented Aug 22, 2023

gdamore commented Sep 14, 2023

gdamore commented Sep 14, 2023

gdamore commented Nov 26, 2023

gdamore commented Dec 18, 2023

gdamore commented Dec 30, 2023

gdamore commented Jan 1, 2024

nng creates too many threads #1572

nng creates too many threads #1572

Comments

Jasper-Bekkers commented Feb 1, 2022

JaylinYu commented Feb 6, 2022

gdamore commented Feb 6, 2022

gdamore commented Feb 6, 2022

gdamore commented Feb 6, 2022

Jasper-Bekkers commented Feb 8, 2022

gdamore commented Feb 8, 2022

gdamore commented Feb 8, 2022

gdamore commented Apr 16, 2022

JochenBaier commented Sep 18, 2022

wsxjbupt commented Nov 16, 2022

bionicbeagle commented Nov 28, 2022

bionicbeagle commented Nov 30, 2022 • edited Loading

gdamore commented Apr 20, 2023

gdamore commented Apr 24, 2023

gedalia commented Jun 23, 2023

ayizhi commented Aug 22, 2023

xinxinxiangying258 commented Aug 22, 2023

gdamore commented Sep 14, 2023

gdamore commented Sep 14, 2023

gdamore commented Nov 26, 2023

gdamore commented Dec 18, 2023

gdamore commented Dec 30, 2023

gdamore commented Jan 1, 2024

bionicbeagle commented Nov 30, 2022 •

edited

Loading