[User Story] Detect server failures and automatically fix them #38
Comments
Thanks for opening this issue 👍 Just to be sure, you're describing that the server does not reconnect when an MQTT connection fails, and the HTTP interface is not affected, right? There's currently no official way to reconnect automatically; see empicano/aiomqtt#287 for reference. I'll see how to hack around this today.
I think so. I didn't explicitly test the HTTP connection. The issue showed up on the dashboard, which displayed no new messages from any sensor, all stopping at the same timestamp. The dashboard didn't report any errors and kept showing the "latest" messages, so I would guess that the API is still running. For a quick and dirty fix:
The problem is actually not detecting a connection failure; we get an exception from I've tested the reconnection by switching a local broker off and on, and it should work now 🙂
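For reference, the reconnect-on-exception idea described above can be sketched roughly like this. This is a minimal, stdlib-only illustration and not the code that was actually deployed: `run_with_reconnect`, `session`, `retry_on`, `delay`, and `max_attempts` are hypothetical names. It assumes the MQTT session is wrapped in a single coroutine that raises when the connection drops; with aiomqtt, one would presumably pass `aiomqtt.MqttError` in `retry_on`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple, Type


async def run_with_reconnect(
    session: Callable[[], Awaitable[None]],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    delay: float = 5.0,
    max_attempts: Optional[int] = None,
) -> int:
    """Run `session` until it finishes cleanly, retrying after failures.

    `session` is a hypothetical coroutine function that connects to the
    broker, subscribes, and processes messages. If it raises one of the
    exception types in `retry_on`, we wait `delay` seconds and start a
    fresh session. Returns the number of failed attempts.
    """
    failures = 0
    while True:
        try:
            await session()
            return failures
        except retry_on:
            failures += 1
            # Give up (re-raise) once the optional attempt limit is reached.
            if max_attempts is not None and failures >= max_attempts:
                raise
            await asyncio.sleep(delay)
```

A long-running server would leave `max_attempts` as `None` so the loop keeps retrying until the broker is reachable again. A fixed delay is the simplest choice; an exponential backoff could be swapped in if the broker is down for long periods.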
It happened again roughly 1 hour ago.
The server was still running on 4007fe3; I just pushed and deployed the new version for you 🙂 The redeploy step I showed you on DigitalOcean only restarts the server. To deploy a new version, we have to build and push a new Docker image to the registry. DigitalOcean has a guide with more information on how to do this. In short, we call
As a network operator, I would like the server to detect a communication failure on its own so that I can prevent downtime and loss of node messages.
Additional context:
The server was down for four days over the weekend (redeployment on 17.06 fixed the issue)
The server was down for two days over the weekend (redeployment on 27.05 fixed the issue)
Other downtimes: 08.04, 11.1, 28.12