[apache/helix] -- Add SetPartitionToError for participants to self annotate a node to ERROR state #2792
Overall, I think making the logic similar to resetPartition would remove many of the assumptions here without adding much extra logic.
01e180e
to
1dddca9
Compare
1dddca9
to
d519193
Compare
Nice work!
All looks good to me, but I want to follow up on one thing. I remember seeing exceptions thrown when trying to create sensors for * -> ERROR state transition metrics. Can we also resolve that in this PR? I think this may be a long-standing issue for * -> * transitions as well. It would be good to fix it now, since users of helix-agent will want those metrics in addition to users of * -> ERROR.
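For readers unfamiliar with why a `*` in a transition name might trip up metric registration, here is a minimal sketch of one plausible cause. It is an assumption, not something this PR confirms: in standard JMX, an unquoted `*` inside an `ObjectName` property value turns the name into a pattern, and patterns cannot be used to register an MBean. The domain `SketchDomain`, the key names, and both helper methods are hypothetical and are not taken from Helix.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class TransitionMBeanNameSketch {
    // Hypothetical naming scheme for illustration only; Helix's real MBean names may differ.
    private static final String PREFIX = "SketchDomain:type=StateTransition,transition=";

    // Builds the name with the transition inserted as-is.
    // A '*' in the value makes the resulting ObjectName a pattern.
    static ObjectName rawName(String from, String to) throws MalformedObjectNameException {
        return new ObjectName(PREFIX + from + "-" + to);
    }

    // Builds the name with the transition quoted via ObjectName.quote,
    // which escapes '*' and '?' so they are treated as literal characters.
    static ObjectName quotedName(String from, String to) throws MalformedObjectNameException {
        return new ObjectName(PREFIX + ObjectName.quote(from + "-" + to));
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        System.out.println("raw OFFLINE-ONLINE is pattern: " + rawName("OFFLINE", "ONLINE").isPattern());
        System.out.println("raw *-ERROR is pattern: " + rawName("*", "ERROR").isPattern());
        System.out.println("quoted *-ERROR is pattern: " + quotedName("*", "ERROR").isPattern());
    }
}
```

Quoting is only one way to get a literal name; replacing `*` with a word such as `ANY` before building the name would work too. Which approach the fix in HelixTask actually takes is not stated in this thread.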
LGTM
d519193
to
6210d78
Compare
…notate a node to ERROR state
6210d78
to
c3c2cee
Compare
Fixed the MBean issue in HelixTask; it now supports * -> * transitions. Since this wasn't causing any test failures, I added some logs. Before:
After:
LGTM. Please make sure the final commit message is appended.
This PR is ready to be merged. Final commit message: This PR adds a SetPartitionToError endpoint for participants to self-annotate a node to the ERROR state.
Issues
Fixes #2791
Description
What: An API endpoint that validates the incoming request and sends a state transition message to set one or more partitions from any current state to the ERROR state.
Why: Currently, participants cannot explicitly set a partition to the ERROR state when it appears to be stuck in a particular current state. The only way a replica can be set to ERROR is from within a state model. An endpoint for this lets clients move the stuck replica to ERROR and then call the resetPartition endpoint to set it back to the INIT state and recover it, since resetPartition works only on partitions in the ERROR state.
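The recovery flow described above can be sketched with a small in-memory model. This is a toy stand-in under stated assumptions, not Helix code: the class `PartitionErrorSketch`, its methods, and the state names other than ERROR and INIT are hypothetical, and the real endpoint sends state transition messages rather than mutating a map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionErrorSketch {
    // Hypothetical stand-in for the current state of partitions on one instance.
    private final Map<String, String> currentState = new HashMap<>();

    void assign(String partition, String state) {
        currentState.put(partition, state);
    }

    String stateOf(String partition) {
        return currentState.get(partition);
    }

    // Validates the whole request first, then moves every listed partition
    // from whatever state it is in to ERROR.
    void setPartitionsToError(List<String> partitions) {
        for (String p : partitions) {
            if (!currentState.containsKey(p)) {
                throw new IllegalArgumentException("Unknown partition: " + p);
            }
        }
        for (String p : partitions) {
            currentState.put(p, "ERROR");
        }
    }

    // Mirrors the constraint stated in the description: reset is allowed
    // only for partitions that are currently in ERROR.
    void resetPartition(List<String> partitions) {
        for (String p : partitions) {
            if (!"ERROR".equals(currentState.get(p))) {
                throw new IllegalStateException("Partition not in ERROR: " + p);
            }
        }
        for (String p : partitions) {
            currentState.put(p, "INIT");
        }
    }

    public static void main(String[] args) {
        PartitionErrorSketch admin = new PartitionErrorSketch();
        admin.assign("db_0", "ONLINE"); // pretend this replica is stuck
        admin.setPartitionsToError(List.of("db_0"));
        admin.resetPartition(List.of("db_0"));
        System.out.println("db_0 is now " + admin.stateOf("db_0"));
    }
}
```

Validating every partition before changing any of them keeps a rejected request from leaving a partial update behind; whether the actual endpoint behaves atomically in that sense is not stated in this PR.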
Tests
The following tests are written/updated for this issue:
mvn test -o -Dtest=TestSetPartitionToErrorState -pl=helix-core
mvn test -o -Dtest=TestZkHelixAdmin -pl=helix-core
mvn test -o -Dtest=TestPerInstanceAccessor -pl=helix-rest
Changes that Break Backward Compatibility (Optional)
(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)
Documentation (Optional)
(Link the GitHub wiki you added)
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)