This repository has been archived by the owner on May 23, 2024. It is now read-only.
Currently, when deploying services, we don't know if the application is actually running; we only know that the container was started. We recently had a configuration issue where the NIJobs API couldn't connect to the DB, causing it to restart indefinitely.
To fix this, we should add a line to the project configs file specifying a health check endpoint (this only works for web applications, but those are the only services we have right now, so it should be fine).
Then, the deployment script would call that endpoint (using curl, for example). If it didn't return 200, the deployment would fail, which would trigger a rollback (it would go back to the previous commit and deploy that one).
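As a rough illustration of the single check described here, a minimal sketch in Python rather than curl. The function name `endpoint_is_healthy` and the 5-second request timeout are assumptions for the example, not part of the existing deployment script.

```python
import http.client
import urllib.error
import urllib.request


def endpoint_is_healthy(url: str, timeout_seconds: float = 5.0) -> bool:
    """Return True only if a GET request to `url` answers with HTTP 200.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers 4xx/5xx answers (raised as HTTPError), refused connections,
        # timeouts, malformed responses, and malformed URLs. All of them mean
        # the service cannot be considered healthy.
        return False
```

The deployment script could exit with a non-zero status when this returns `False`, which is what would let the deployment fail and the rollback kick in.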
Since services can take some time to boot up, this health check should have a retry mechanism: call the endpoint every 10 seconds for up to 5 minutes, and roll back if no call returned 200.
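The retry behaviour could look something like the sketch below. It is only an illustration: `wait_until_healthy` and its parameters are hypothetical names, and `check` stands for whatever probe the deployment script ends up using (a curl call, an HTTP request, etc.). The defaults mirror the 10-second interval and 5-minute limit proposed above.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    interval_seconds: float = 10.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `check` repeatedly until it succeeds or the time limit is reached.

    Hypothetical helper; `sleep` and `clock` are injectable so the loop can
    be tested without actually waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        if check():
            return True  # service answered correctly, deployment can proceed
        if clock() + interval_seconds > deadline:
            return False  # another attempt would exceed the limit: roll back
        sleep(interval_seconds)
```

A `False` result is the signal to fail the deployment and roll back to the previous commit. Returning as soon as one check succeeds keeps healthy deployments fast, while slow-booting services still get the full 5 minutes.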