Processes stack up on concurrent restarts #45
Replacements must be spawned before killing the workers, otherwise there will be a zero-availability period between killing the workers and the replacements becoming ready. Edit: additionally, old workers will continue to run without receiving new requests for as long as they are still servicing long-running old requests, or until the kill timeout expires. Blocking restarts while one is in progress could be a good optional feature. |
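For reference, the "drain then kill after a timeout" behaviour described above looks roughly like the sketch below. This is only an illustration using Node's built-in cluster module, not this library's actual implementation, and retireWorker and the 30-second timeout in the usage comment are made-up names and values.

const cluster = require('cluster');

// Gracefully retire a worker: stop routing it new connections, let it finish
// in-flight requests, and hard-kill it only if it exceeds the kill timeout.
// Usage (hypothetical): for (const id in cluster.workers) retireWorker(cluster.workers[id], 30000);
function retireWorker(worker, killTimeoutMs) {
  worker.disconnect();                       // no new connections for this worker
  const timer = setTimeout(function() {
    worker.kill('SIGKILL');                  // force-kill once the timeout expires
  }, killTimeoutMs);
  worker.once('exit', function() {
    clearTimeout(timer);                     // exited in time, cancel the forced kill
  });
}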
I believe if you kill first and then spawn, there will always be at least N-1 workers, which may be preferable to sometimes having up to N*2 workers. I tested and it can for sure be 100% available using either strategy, given you have 2 or more workers. Either way it's not 100% available now due to the unaccounted-for edge case. |
You mean using rolling restarts? That could be an option too. |
Correct. The primary goal of this issue would be blocking concurrent restarts so the cluster remains 100% available; without that, I don't feel like it's really production ready (what if a restart takes longer than usual and someone accidentally deploys again before it's done restarting?). According to siege, you'll drop around 10-20% of traffic to the Node.js backend, which is really risky.
A secondary goal would be to support rolling restarts, where you always have at least N-1 workers online and no more than N workers online. There could be a setting to preserve the current strategy of spawning N replacement workers, where you end up with between N and N*2 workers online. My performance testing shows that rolling restarts (>= N-1 workers and <= N workers at all times) perform better than sometimes having up to N*2 workers: with N-1 workers, the remaining workers just pick up the slack, whereas with N*2 workers they start competing for CPU and other system resources, since you now have more workers spawned than you have CPU cores, not to mention that some apps are memory intensive.
So the primary goal would be 100% uptime despite any edge cases. The secondary goal would be a "rolling restart" strategy that doesn't impact performance as much while a restart is occurring. |
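A minimal sketch of the rolling strategy described above (never fewer than N-1 and never more than N workers), assuming Node's built-in cluster module rather than this library's internals; rollingRestart is a made-up name for illustration only.

const cluster = require('cluster');

// Restart workers one at a time: retire an old worker first, then fork its
// replacement and wait until it is listening before moving on. The worker
// count stays between N-1 and N for the whole restart.
async function rollingRestart() {
  for (const id of Object.keys(cluster.workers)) {
    const oldWorker = cluster.workers[id];
    if (!oldWorker) continue;

    // Ask the old worker to finish in-flight requests and exit
    // (a real implementation would also need a kill timeout here).
    oldWorker.disconnect();
    await new Promise(function(resolve) { oldWorker.once('exit', resolve); });

    // Fork the replacement and wait until it is accepting connections.
    const replacement = cluster.fork();
    await new Promise(function(resolve) { replacement.once('listening', resolve); });
  }
}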
This workaround seems to solve the problem:

let reloading = false;

process.on('SIGUSR2', function() {
  if (reloading) {
    console.log('Got SIGUSR2, already reloading so ignoring it...');
    return;
  }
  reloading = true;
  console.log('Got SIGUSR2, reloading cluster...');
  // 'cluster' is the reloadable cluster instance created at startup;
  // clear the flag once reload() reports completion via its callback.
  cluster.reload(() => reloading = false);
}); |
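A possible variation on the workaround above (an untested sketch, using the same cluster instance and reload(callback) call) that remembers one pending reload instead of dropping the extra signal, so a deploy that races an in-progress restart still gets applied; pendingReload and startReload are made-up names.

let reloading = false;
let pendingReload = false;

process.on('SIGUSR2', function() {
  if (reloading) {
    console.log('Got SIGUSR2 while reloading, queueing one more reload...');
    pendingReload = true;        // remember that another reload was requested
    return;
  }
  startReload();
});

function startReload() {
  reloading = true;
  console.log('Got SIGUSR2, reloading cluster...');
  cluster.reload(function() {
    reloading = false;
    if (pendingReload) {         // a restart was requested mid-reload; run it now
      pendingReload = false;
      startReload();
    }
  });
}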
Send SIGUSR2 and wait: my 2 processes come back up as 2 processes every time. Using siege, availability stays at 100%.
Send SIGUSR2 twice rapidly: my 2 processes turn into 5 processes and I get an error in the console:
Error [ERR_IPC_DISCONNECTED]: IPC channel is already disconnected
Using siege, availability drops from 100% to around 90%. Eventually the number of running processes decreases back down to 2.
I would have expected it to kill a worker before spawning a replacement; instead it appears to spawn replacements before killing the worker being replaced, which means the total number of processes sometimes exceeds the number of CPUs (undesired). Also, 90% availability instead of 100% when issuing restarts in quick succession is an issue. A quick fix could be to toggle a flag and block restarts if there's already one pending.