Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restart_process restart-helper leaks zombie processes #570

Open
glennpratt opened this issue Apr 12, 2024 · 4 comments
Open

restart_process restart-helper leaks zombie processes #570

glennpratt opened this issue Apr 12, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@glennpratt
Copy link

glennpratt commented Apr 12, 2024

root@api-5dd8c8769-nwdv7:/# ps aux wwwf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          49  0.0  0.0   4188  3332 pts/0    Ss   14:30   0:00 bash
root          80  0.0  0.0   8088  3940 pts/0    R+   14:31   0:00  \_ ps aux wwwf
root           1  0.0  0.0 703588  1664 ?        Ssl  05:37   0:00 /tilt-restart-wrapper --watch_file=/tmp/.restart-proc sh -c /api-server
root          11  0.0  0.0   1220   104 ?        S    05:37   0:00 /entr -rz sh -c /api-server
root          70  0.0  0.0   2576   860 ?        S    14:30   0:00  \_ sh -c /api-server
root          71  0.1  0.4 1287044 72528 ?       Sl   14:30   0:00      \_ /api-server
root          13  0.0  0.0      0     0 ?        Z    05:37   0:02 [api-server] <defunct>
root          38  0.0  0.0      0     0 ?        Z    14:29   0:00 [api-server] <defunct>

Seems like it might need a minimal init (tini, dumb-init, etc).

I noticed this because entr also doesn't reliably kill the process before starting a new one... perhaps this is because of sh, I'm not sure. That's a different issue.

@nicks
Copy link
Member

nicks commented Apr 16, 2024

hmmm...i was not able to reproduce this problem. here are the steps i tried:

  1. cd into tilt-extensions/restart_process/test
  2. Run tilt up -f custom_deploy.Tiltfile
  3. Clicked test_update a bunch of times
  4. Exec'd into the pod and ran ps aux wwwf
  5. Saw that there was exactly one start.sh, as i expected.

when i poked around in the entr repo, i found this issue - eradman/entr#38 - which seems to indicate to me that it's at least expected for entr to kill the process

@nicks nicks added needs repro case bug Something isn't working labels Apr 16, 2024
@samuliy
Copy link

samuliy commented Jul 4, 2024

I'm running into a sort of similar situation where a process can be left around hanging as a zombie process.

I tried to create a reproducable testcase and came up with something which creates a zombie process.
This is not supposed to be a useful or optimized usecase for anything, just commands put together to create a zombie process.

diff --git a/restart_process/test/Dockerfile.test b/restart_process/test/Dockerfile.test
index cfbeb9f..89e11d3 100644
--- a/restart_process/test/Dockerfile.test
+++ b/restart_process/test/Dockerfile.test
@@ -1,5 +1,7 @@
 FROM alpine
 
+RUN apk add nginx
+
 RUN echo 0 > restart_count.txt
 
 ADD start.sh /
diff --git a/restart_process/test/start.sh b/restart_process/test/start.sh
index c9bc981..6277158 100755
--- a/restart_process/test/start.sh
+++ b/restart_process/test/start.sh
@@ -12,8 +12,13 @@ handle_sigterm() {
 
 trap handle_sigterm SIGTERM
 
-while true
-do
-  echo running
-  sleep 5
+while pgrep -f nginx >/dev/null ; do
+  echo "waiting for nginx to shut down"
+  set +e
+  pkill -f nginx
+  set -e
+  sleep 3
 done
+nginx
+
+tail -F /var/log/nginx/*
  1. Run tilt up -f custom_deploy.Tiltfile
  2. The restart should happen right away with the first start.
  3. The new run of the entrypoint script is left waiting for the previous instance of nginx to shut down, since it will remain there as a zombie process because tilt-restart-wrapper as pid 1 does not call waitpid on it.

Going into the container and running ps faux:

ps faux
PID   USER     TIME  COMMAND
    1 root      0:00 /tilt-restart-wrapper --watch_file=/tmp/.restart-proc sh -c /start.sh
   17 root      0:00 /entr -rz sh -c /start.sh
   25 root      0:00 [nginx]
   62 root      0:00 {start.sh} /bin/sh /start.sh
   78 root      0:00 sleep 3
   79 root      0:00 sh
   85 root      0:00 ps faux

The pid 25 is there as a zombie.


I assume the issue is tilt-restart-wrapper not reaping zombies, which should be a job of a process running as PID 1.

@glennpratt
Copy link
Author

Thanks for the reproduction @samuliy. Until this is fixed, there is a simple workaround if this doesn't cause issues with your workload or sidecars:

      {{- if .Values.tilt }}
      # Fix zombie processes not stopping under Tilt
      shareProcessNamespace: true
      {{- end }}

@glennpratt
Copy link
Author

glennpratt commented Oct 1, 2024

@nicks can you remove the needs repro case label?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants