Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bz1505037: Introduce DEBUG_IGNORE_SCRIPT_FAILURES option to rescue from failure of the setup script #199

Merged
merged 2 commits into from
Jan 2, 2018

Conversation

nak3
Copy link
Contributor

@nak3 nak3 commented Oct 21, 2017

Since run-mysqld sets set -eu, if commands returned non 0 result
inside passwd-change.sh, mysql container does not start. Due to this,
if mysql command starts failing inside passwd-change.sh, mysql does
not run and we cannot debug it.

This patch introduces set +e inside the script and ignore the
user/password creation error, so users can debug mysql after it starts
running.

@centos-ci
Copy link
Collaborator

Can one of the admins verify this patch?

1 similar comment
@rhscl-bot
Copy link

Can one of the admins verify this patch?

@hhorak
Copy link
Member

hhorak commented Oct 21, 2017

Thanks @nak3 for reporting. Do you know about some specific case when some command in the script returns non-zero exit code? I'd rather try fix that case, than ignore the failure by setting set +e, since at this point I don't see what command could return non-exit code and it would be taken as fine.

@nak3
Copy link
Contributor Author

nak3 commented Oct 22, 2017

Hi @hhorak

I'm sorry, I should have informed the bug ticket. Below is the exact ticket. We have not found the reason why mysql returned Access denied for user 'root'@'localhost' yet. (We are debugging it by an image which we built by ourselves as mysql never become Running.)

Bug 1505037 - mysql pod is not debug friendly when passwd-change.sh failed
https://bugzilla.redhat.com/show_bug.cgi?id=1505037

$ oc logs mysql-1-x95v8 -c mysql

OUTPUT:
---> 14:16:58     Processing MySQL configuration files ...
 
 ... snip ...
 
2017-10-18T14:16:59.149617Z 0 [Note] /opt/rh/rh-mysql57/root/usr/libexec/mysqld: ready for connections.
Version: '5.7.16'  socket: '/tmp/mysql.sock'  port: 0  MySQL Community Server (GPL)
2017-10-18T14:16:59.535076Z 2 [Note] Access denied for user 'UNKNOWN_USER'@'localhost' (using password: NO)
---> 14:16:59     MySQL started successfully
---> 14:16:59     Setting passwords ...
2017-10-18T14:16:59.558903Z 3 [Note] Access denied for user 'root'@'localhost' (using password: NO)
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

@nak3 nak3 changed the title Unset -e inside passwd-change.sh bz1505037: Unset -e inside passwd-change.sh Oct 22, 2017
@nak3
Copy link
Contributor Author

nak3 commented Oct 22, 2017

Another option I came up with that:

in 5.x/root/usr/bin/run-mysqld

if [[ ! -v DEBUG_MODE ]]; then
  set -eu
fi

@hhorak
Copy link
Member

hhorak commented Oct 23, 2017

I understand better now, although I don't think that just turning on the debug mode should change behaviour of the image. That could lead to a situation where the image fails when the debug mode is disabled, but would work fine when debug mode is enabled, which wouldn't help debugging much.

PR #202 tries to help debugging by simply putting more information, so users should see what failed when debugging mode is enabled (different env. var is used, since CONTAINER_DEBUG was already used in the image).

@hhorak
Copy link
Member

hhorak commented Oct 23, 2017

@nak3 would something like #202 address your needs?

@nak3
Copy link
Contributor Author

nak3 commented Oct 23, 2017

@hhorak Thank you. But unfortunately it does not address my needs. Although verbose message would be helpful, mysql pod will still not able to be RUNNING. We probably must run mysql command to fix the issue.

For example, if the deubg log informed of some grant was wrong, how we fix it? mysql pod is still not running, so we cannot access to the database/table.

@hhorak
Copy link
Member

hhorak commented Oct 23, 2017

Ah, I understand better now. So, it's not only about see where the issue happened, but also how to get it into working state by some troubleshooting techniques, which, in terms of container means to change some starting scripts.

For that, I believe, the new concept of making the image extensible could be helpful. For example, when you see some issue in passwd-change.sh file (that is provided by the image by default), that this script is not capable of fixing by its own, you can provide an own modified passwd-change.sh file by either bind-mounting a volume or using secrets (or creating a new image using source-to-image), simply putting the changed file into /opt/app-root/src/init/passwd-change.sh and this modified file will be used instead of the original one. This should work if there is no other way forward.

However, even your solution to not set the set -eu makes sense to me, sometimes it can help to overcome some random failure of a single command and can be the easiest way forward. However, one must understands that such change can potentially lead to undefined behavior and that it's done on user's responsibility. Thus, I'd not call this variable DEBUG_MODE, but rather something like DEBUG_IGNORE_SCRIPT_FAILURES.. And we should be explicit that running a container with this set is not supported and proper functionality is not guaranteed.

@nak3
Copy link
Contributor Author

nak3 commented Oct 24, 2017

For that, I believe, the new concept of making the image extensible could be helpful. For example, when you see some issue in passwd-change.sh file (that is provided by the image by default), that this script is not capable of fixing by its own, you can provide an own modified passwd-change.sh file by either bind-mounting a volume or using secrets (or creating a new image using source-to-image), simply putting the changed file into /opt/app-root/src/init/passwd-change.sh and this modified file will be used instead of the original one. This should work if there is no other way forward.

Yes, as I have reported on the bugzilla, we have already reached the solution of re-building the image.

From https://bugzilla.redhat.com/show_bug.cgi?id=1505037

Actual results:
- We had to rebuild MySQL image and put it on public registry.

However, there are so many restrictions that come up.

  • Users have to push the image for public repo.

    • let them use docker hub? Some company does not allow employees to pull image from docker hub.
    • let them use internal registry? OpenShit Online (or multi tenant) does not allow users to use it.
    • On top of that, RH complains if we distribute re-build RHEL image in public repo.
  • Users have to install docker on their client machine to build/push the image.

    • Windowns/Mac users cannot do that. (They need to install Linux VM.)

... etc

However, even your solution to not set the set -eu makes sense to me, sometimes it can help to overcome some random failure of a single command and can be the easiest way forward. However, one must understands that such change can potentially lead to undefined behavior and that it's done on user's responsibility. Thus, I'd not call this variable DEBUG_MODE, but rather something like DEBUG_IGNORE_SCRIPT_FAILURES.. And we should be explicit that running a container with this set is not supported and proper functionality is not guaranteed.

OK, DEBUG_IGNORE_SCRIPT_FAILURES is better. I know the risk of unset -eu. However, database outage is more critical and we want to use DEBUG_IGNORE_SCRIPT_FAILURES if we have the back up data.

@nak3 nak3 changed the title bz1505037: Unset -e inside passwd-change.sh bz1505037: Introduce DEBUG_IGNORE_SCRIPT_FAILURES option to rescue from failure of the setup script Nov 6, 2017
@nak3
Copy link
Contributor Author

nak3 commented Nov 6, 2017

Updated this PR to introduc DEBUG_IGNORE_SCRIPT_FAILURES option to rescue from failure of the setup script.

cc// @hhorak

@rhscl-bot
Copy link

Can one of the admins verify this patch?

1 similar comment
@centos-ci
Copy link
Collaborator

Can one of the admins verify this patch?

@hhorak
Copy link
Member

hhorak commented Dec 10, 2017

[test]

@nak3
Copy link
Contributor Author

nak3 commented Dec 29, 2017

ping @hhorak

@hhorak
Copy link
Member

hhorak commented Jan 2, 2018

Sorry for delay.. I think it's all right, let's merge it.

@hhorak hhorak merged commit a0e6a39 into sclorg:master Jan 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants