-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Inspection Service] Add simple consensus health check endpoint. #15512
Conversation
⏱️ 1h 54m total CI duration on this PR
🚨 1 job on the last run was significantly faster/slower than expected
|
if gauge_value == "0" { | ||
return ( | ||
StatusCode::OK, | ||
Body::from("Consensus health check passed!"), | ||
CONTENT_TYPE_TEXT.into(), | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The health check logic appears to be reversed. The code returns OK
when gauge_value == "0"
, indicating consensus is not executing. However, based on the function's documented purpose of checking if "the node is currently participating in consensus", it should return OK
when gauge_value == "1"
. This would align with the gauge's behavior set in update_executing_component_metrics()
where 1
indicates active consensus participation.
Spotted by Graphite Reviewer
Is this helpful? React 👍 or 👎 to let us know.
6c1c559
to
67569b0
Compare
use hyper::{Body, StatusCode}; | ||
use prometheus::TextEncoder; | ||
|
||
// The metric key for the consensus execution gauge | ||
const CONSENSUS_EXECUTION_GAUGE: &str = "aptos_state_sync_consensus_executing_gauge{}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metric name aptos_state_sync_consensus_executing_gauge{}
includes empty curly braces that should be removed since this gauge doesn't use any labels. To match the registered metric name in the code below, this should be aptos_state_sync_consensus_executing_gauge
.
Spotted by Graphite Reviewer
Is this helpful? React 👍 or 👎 to let us know.
67569b0
to
ea3a5b9
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
const CONSENSUS_EXECUTION_GAUGE: &str = "aptos_state_sync_consensus_executing_gauge{}"; | ||
|
||
/// Handles a consensus health check request. This method returns | ||
/// 200 iff the node is currently participating in consensus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// 200 iff the node is currently participating in consensus. | |
/// 200 if the node is currently participating in consensus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah, this was meant to be "if and only if" 😄 https://www.merriam-webster.com/dictionary/iff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah, this was meant to be "if and only if" 😄 merriam-webster.com/dictionary/iff
I recommend simply using if and only if in that case. I doubt that even half of our eng team knows the iff
thingy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. Changed it to if
. Don't feel as strongly as you do 😄
ea3a5b9
to
bf4f448
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
✅ Forge suite
|
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation and see the Github Action logs for details |
Description
This PR adds a simple consensus health check endpoint to the node inspection service. Specifically, the new endpoint
/consensus_health_check
will return a 200 status code iff the node is currently executing consensus.To achieve this, I added a new metric gauge that is set iff consensus is executing on the validator node. The inspection service simply fetches the value of this gauge. The gauge can be seen here.
Testing Plan
Existing test infrastructure, and manual verification, e.g., I ran a local validator and pinged the endpoint: