Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] OpenSearch container does not allow for graceful termination #3229

Closed
shikharbhardwaj opened this issue Jan 28, 2023 · 15 comments · Fixed by #4694 · May be fixed by #3320 or #4217
Closed

[BUG] OpenSearch container does not allow for graceful termination #3229

shikharbhardwaj opened this issue Jan 28, 2023 · 15 comments · Fixed by #4694 · May be fixed by #3320 or #4217
Assignees
Labels
bug Something isn't working

Comments

@shikharbhardwaj
Copy link

shikharbhardwaj commented Jan 28, 2023

Describe the bug

OpenSearch container does not do a graceful shutdown when a termination signal (SIGTERM/SIGINT) is sent.
This issue was discovered when running the official docker image within Kubernetes, where a rollout would cause the container to be terminated forcefully after terminationGracePeriodSeconds instead of catching the SIGTERM signal sent and shutting down cleanly.

This could be fixed by using a proper init process (like tini) as PID 1 instead of a shell, which would register signal handlers for these termination signals and allow for graceful termination. An example for this change is in the gist below.

Resources with more details on this issue:

To Reproduce
Steps to reproduce the behavior:

  1. Run an OpenSearch container using the official docker image (tested on both 1.3.7 and 2.5.0).
  2. Send a SIGTERM or SIGINT to the container and it will not terminate.

Gist here with more detailed reproduction/remediation steps: OpenSearch container termination signal handling

Expected behavior
The container should listen for these termination signals and shutdown gracefully.

Plugins
N/A

Host/Environment (please complete the following information):

  • OS: Ubuntu 22.04
  • Docker: 20.10.16
  • OpenSearch versions: 1.3.7, 2.5.0

Additional context
Add any other context about the problem here.

@shikharbhardwaj shikharbhardwaj added bug Something isn't working untriaged Issues that have not yet been triaged labels Jan 28, 2023
@dblock dblock transferred this issue from opensearch-project/OpenSearch Jan 30, 2023
@shikharbhardwaj
Copy link
Author

@dblock -- taking another look at the Dockerfile, I believe it is generated from this file in the opensearch-build project.

https://github.com/opensearch-project/opensearch-build/blob/d99811b3cfab45e950cdd2ed7f004074baf7d93e/docker/release/dockerfiles/opensearch.al2.dockerfile#L110:L111

Do you think that project would be the better place to move this issue?

@peterzhuamazon
Copy link
Member

Hi @shikharbhardwaj we would love to work with you to improve our docker image.
I will move thie issue to build repo.

Thanks.

@peterzhuamazon peterzhuamazon transferred this issue from opensearch-project/opensearch-devops Feb 22, 2023
@peterzhuamazon
Copy link
Member

Hi @shikharbhardwaj, if you dont mind, we would like to invite you to open a PR if you have any solutions.
We would love to assist in the process.

Thanks!

@shikharbhardwaj
Copy link
Author

@peterzhuamazon sounds good, I will raise a PR for this.

@gaiksaya gaiksaya removed the untriaged Issues that have not yet been triaged label Mar 2, 2023
@shikharbhardwaj
Copy link
Author

Hello @peterzhuamazon, raised a PR for fixing this issue here: #3320
Let me know if there is something I can help with to get this reviewed/merged
Thanks!

@peterzhuamazon
Copy link
Member

Added review, sorry for the delay on this @shikharbhardwaj .

Thanks.

@peterzhuamazon peterzhuamazon self-assigned this Nov 14, 2023
@chawinphat
Copy link
Contributor

@peterzhuamazon commenting so you can assign me

@chawinphat
Copy link
Contributor

@peterzhuamazon

I was able to gracefully terminate using Tini as described above only with the path being '/bin/tini". Using the relative path "./tini" gives the error:

teddyt@Teddys-MacBook-Pro docker % docker run -it -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" chawinphat/opensearch:1.3.9 docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "./tini": stat ./tini: no such file or directory: unknown.

@jordarlu
Copy link
Contributor

jordarlu commented Dec 6, 2023

Thanks @chawinphat and @peterzhuamazon !
@peterzhuamazon , if you could have another review on #4217 when you get any chance, thanks!!

@239487230597507
Copy link

Hi @shikharbhardwaj we have solved this by using the exec command in the docker entrypoint file.

https://github.com/opensearch-project/opensearch-build/blob/d99811b3cfab45e950cdd2ed7f004074baf7d93e/docker/release/config/opensearch/opensearch-docker-entrypoint.sh

If the command in line 101 is started using the exec command the opensearch process will run with PID 1 and the TERM signal will be sent to this process.

@peterzhuamazon
Copy link
Member

Hi @shikharbhardwaj we have solved this by using the exec command in the docker entrypoint file.

https://github.com/opensearch-project/opensearch-build/blob/d99811b3cfab45e950cdd2ed7f004074baf7d93e/docker/release/config/opensearch/opensearch-docker-entrypoint.sh

If the command in line 101 is started using the exec command the opensearch process will run with PID 1 and the TERM signal will be sent to this process.

Hi @239487230597507 would you willing to share your changes?
If so, would you mind contributing the changes as a PR?

This issue has been opened for a while and we would like to resolve it soon. Thanks 😄

@cinimins
Copy link

cinimins commented Mar 7, 2024

Any updates here? It happened already for the 2nd time that our opensearch got corrupted because the container got killed by Kubernetes (probably duing some write-operation) after the termination grace period was exceeded.

Since the author of the PR hasn't commented in almost over a year, should we open a new PR based on his branch or do you want to go with the approach by @239487230597507 ?

@239487230597507
Copy link

@cinimins @peterzhuamazon I will try to create a PR. Unfortunately I am a complete Github noob so it might take a couple of days. :D

@cinimins
Copy link

cinimins commented Mar 7, 2024

@cinimins @peterzhuamazon I will try to create a PR. Unfortunately I am a complete Github noob so it might take a couple of days. :D

Let me know if you need any help with that! :)

@benobytes
Copy link

To anyone who is finding this thread like myself, for the current workaround, I am using the --init option on Docker 1.13+ which uses tini and is part of https://github.com/krallin/tini.

Example from my compose file:

  opensearch:
    image: opensearchproject/opensearch:latest
    init: true

JulianKniephoff added a commit to JulianKniephoff/opencast that referenced this issue Apr 30, 2024
This puts the process under the control of a process manager
(`tini` specifically) so it can handle signals better.

cf. opensearch-project/opensearch-build#3229 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment