Rebooting loop after recovery by WAL-E. #123
Comments
Are you running v2.0.0? That error indicates |
@bacongobbler Yes, v2.0.0. It was fixed after deleted by using I thought it was caused by memory exhaustion. But the result of
|
Okay, so if you're on v2.0.0, then the other reason this error would pop up is if wal-e could not get a connection to the database, as it says in the logs. Since the previous logs say
I would assume that is your issue, and that the database took an abnormally long time to boot. Once it was restarted it restored faster (likely connection issues to Azure?). I've got a work-in-progress that removes the wait timeout, which is the likely cause for this issue. #112 |
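For readers following along, here is a minimal sketch of the difference between waiting with a fixed timeout and waiting without one, as described in the comment above. It is not the code from #112; `wait_until_ready`, `is_ready`, and the other parameter names are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: Optional[float] = None,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True.

    is_ready is a hypothetical probe, e.g. "can I open a connection
    to the database?". With timeout=None the loop never gives up, so
    a slow boot cannot trigger a failure. With a numeric timeout the
    function returns False once the deadline has passed.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        if is_ready():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)
```

With a fixed timeout, a database that boots abnormally slowly makes the caller give up, and if the container then exits and is restarted, the result can look like a reboot loop. With `timeout=None`, the caller simply keeps polling until the probe succeeds.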
I see. I'll try the canary build after #112 is merged. I'll also try to gather more information when this issue is reproduced. |
BTW, some people using Kube on Azure may have DNS-related issues. It seems reasonable that the issues I posted recently are specific to DNS on Azure. |
I tried #112-based builds and they seem to resolve this issue. |
(I know PR #112 is WIP and that it is going to fix this issue in the near future.)
I had some confusion about my canary images. Let me retract this comment.
But I'm still having trouble in this area and am investigating... |
I don't have any solid evidence, but I guess it happens when WAL-E and psql are executed at the same time (maybe psql inside WAL-E and psql outside WAL-E). The deis/database container runs psql periodically. |
Additional information: Recovery failures with SIGQUIT may be reduced by upgrading the spec of the node running SkyDNS (not the node running deis/database). However, upgrading specs does not seem to be a silver bullet, because I still get random terminations by signal 3 from WAL-E. (BTW, I'm curious. According to the official documentation, 2 cores * 2 nodes is enough to run Deis Workflow. But my cluster requires higher specs. My nodes are not in production, with just a few sample apps running. Why does the cluster require more power...) |
This should be resolved via #137. If it isn't, please re-open the issue at wal-e/wal-e. Thanks! |
The deis-database on my cluster was stuck in a reboot loop.
I'm not sure of the reason yet because it seems to have started while I was asleep.