
Fix docker graceful shutdown #830

Merged

Conversation

fflorent
Collaborator

Context

When running Grist on Docker, the only way to stop the node process is to kill it (or to wait for `docker stop` to kill the container after 10 seconds).

We can see that the bash PID remains as a wrapper around node, which seems to be a bad practice:

docker exec -it 74196865e076 ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          20  0.0  0.0   7640  2688 pts/0    Rs+  09:50   0:00 ps faux
root           1  0.0  0.0   2388  1408 ?        Ss   09:49   0:00 /bin/sh -c ./sandbox/run.sh
root           8  0.0  0.0   3736  2688 ?        S    09:49   0:00 bash ./sandbox/run.sh
root           9  9.0  0.5 1003572 194712 ?      Sl   09:49   0:01  \_ node _build/stubs/app/server/server.js

Instead we would like this situation:

docker exec -it f496a0a6fe273f59d288811deddb6a33f4105caf0f3ed43b485bb30d6afe92f1 ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          18  0.0  0.0   7640  2560 pts/0    Rs+  09:53   0:00 ps faux
root           1 15.7  0.6 1003316 196020 ?      Ssl  09:53   0:01 node _build/stubs/app/server/server.js

Proposed solution

Here are two fixes:

  1. To let Node.js shut down gracefully, in run.sh we now run node via the `exec` command (this substitutes the bash process with the node process, so node becomes PID 1).
  2. We replace process.kill() with process.exit(), which seems to stop the server without waiting (we are not 100% sure about this fix).

cc'ing @rouja and @ohemelaar, with whom I pair-programmed on this PR.

Florent FAYOLLE added 2 commits January 22, 2024 16:54
This notably ensures that "docker stop" sends a TERM signal to the node
process which can handle it gracefully and avoid document corruption.
Without that, the node process will hang until docker kills the
container.
@fflorent fflorent requested a review from dsagal January 25, 2024 15:10
Member

@dsagal dsagal left a comment


Thanks @fflorent . In my understanding (and limited testing), just the second change is sufficient. Are you seeing something different? In particular, do you see a scenario where the server is logging "Server ... exiting on SIGTERM" and then not exiting promptly?

If the server isn't seeing SIGTERM at all: I ran into this when testing the second change on grist-omnibus. It was helpful but not sufficient, because there was an extra sh process that wasn't passing along the SIGTERM from docker. To get rid of that in grist-omnibus, I had to change the command from /grist/run.js to ["node", "/grist/run.js"]. Just sharing in case something similar is affecting other setups.

@fflorent
Collaborator Author

Hi @dsagal

In my understanding (and limited testing), just the second change is sufficient.

Does the "second change" refer to this commit: a6259aa?

Are you seeing something different? In particular, do you see a scenario where the server is logging "Server ... exiting on SIGTERM" and then not exiting promptly?

Yes, exactly: SIGTERM with the `docker stop` command, or SIGINT with Ctrl-C. Neither stops the server.
I use Docker installed from the Debian stable (bookworm) repository:

$ docker --version
Docker version 20.10.24+dfsg1, build 297e128

A tip we use in order to test that fix:

$ yarn build:prod
$ cp ./sandbox/run.sh ./sandbox/run2.sh # Not sure about that… someone in our team told us this avoids using the previous version of run.sh in the image
$ docker run --rm -v $(pwd):/grist -p 8484:8484 --name grist_latest gristlabs/grist:latest /grist/sandbox/run2.sh

and then either try ctrl-c or docker stop.

@fflorent
Collaborator Author

Also, I just installed docker version 25.0.1 (following this procedure), and I've got the same problem.

@dsagal
Member

dsagal commented Jan 29, 2024

Thanks for the additional details @fflorent. I learned something new today about running things in Docker!

The reason I was looking to avoid process.exit(0) inside a signal handler is that it hides the fact that the process was killed. Normally, the exit code reflects the reason for the exit (i.e. which signal stopped the process). In most cases, like running Grist locally or in grist-omnibus, the current code works as intended: on receiving a signal, it does its cleanup, then re-sends the same signal, which is no longer caught, and causes the process to exit with the expected exit code for that signal. But not in grist-core, when exec is added to run.sh.

Turns out, the reason is that the main container process with PID 1 is special and cannot be killed from inside a container. I found a good explanation at https://daveiscoding.com/why-do-you-need-an-init-process-inside-your-docker-container-pid-1.

The recommendation there is to run docker with the --init flag. With that flag, the exec change in run.sh is sufficient: the container will now exit as expected as soon as Grist finishes cleanup. (In effect, --init runs a small wrapper, based on tini, as the PID 1 process.)

The other option is to keep your change, and address my concern by setting the process exit code to the expected one. This is possible with this tweak (in place of process.exit(0)):

        // Exit with the expected exit code for being killed by this signal.
        const signalNumber = os.constants.signals[signal];
        process.exit(128 + signalNumber);

The --init option has other benefits (cleanup of orphaned processes) and seems generally recommended by Docker. But the process.exit approach should be fine for this purpose and doesn't require a new flag.

@paulfitz or @fflorent , do either of you have an opinion about the preferred approach? I'd be OK with either of these.

@fflorent
Collaborator Author

Thanks for your feedback @dsagal.
The --init option seems like a good way to handle this. @rouja and I will look at whether Kubernetes has an equivalent option (shareProcessNamespace: true seems like a good candidate).
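For reference, a minimal pod spec using that option might look like this; the name and image tag are placeholders, and this is a sketch rather than a tested manifest:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grist
spec:
  # Share one PID namespace across containers; the pause container then acts
  # as PID 1 and reaps/forwards, similar to what docker's --init provides.
  shareProcessNamespace: true
  containers:
    - name: grist
      image: gristlabs/grist:latest
```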

@rouja

rouja commented Jan 29, 2024

Hello,

I did some tests on a k8s cluster. If we set shareProcessNamespace: true on the deployment, we don't need commit 0f76ada.
But a change was missing from the first commit. To have everything working, in the Dockerfile we need to replace:

CMD ./sandbox/run.sh

by

CMD ["/sandbox/run.sh"]

As @dsagal suggests in his comment.
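A hedged note on why this matters, as I understand it: with the shell form, Docker wraps the command in /bin/sh -c, so sh runs as PID 1 and does not forward SIGTERM to its child; the exec form runs the script directly as PID 1. Side by side:

```dockerfile
# Shell form: Docker actually runs /bin/sh -c "./sandbox/run.sh", so sh is
# PID 1 and does not pass SIGTERM along to the script or to node.
# CMD ./sandbox/run.sh

# Exec form: run.sh itself is PID 1, and (thanks to exec inside run.sh)
# node takes over PID 1 and receives SIGTERM directly.
CMD ["/sandbox/run.sh"]
```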

@fflorent
Collaborator Author

Thanks @rouja for your feedback, I have applied the suggestion and reverted 0f76ada

@fflorent fflorent requested a review from dsagal January 29, 2024 18:57
Member

@dsagal dsagal left a comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, thank you!

@fflorent
Collaborator Author

@dsagal Thank you for your review! Is it OK to merge or do you need anything beforehand? :)

@dsagal dsagal merged commit 36ade2b into gristlabs:main Jan 30, 2024
12 of 13 checks passed
@dsagal
Member

dsagal commented Jan 30, 2024

Sorry @fflorent , forgot that step again 🙂

@fflorent fflorent deleted the fix-docker-graceful-shutdown branch January 30, 2024 16:14
@paulfitz
Member

paulfitz commented Mar 6, 2024

Hi there, I've noticed recently that it doesn't seem possible to stop Grist with ^C anymore when it is run like this:

docker run -p 8484:8484 -it gristlabs/grist

This is the command given in the README for people trying out Grist for the first time. @fflorent do you see this behavior too?

@fflorent
Collaborator Author

fflorent commented Mar 7, 2024

I see it too (I have not tested on my usual laptop, but it needs to be fixed; I'll take the issue)
