Fix partitionAssignment API failing due to NPE when no resource config #2653
Thanks @GrantPSpencer. Except for one comment, the code change is simple.
Thanks once again,
@@ -359,8 +359,10 @@ private void computeWagedAssignmentResult(List<IdealState> wagedResourceIdealSta
     ConfigAccessor cfgAccessor = getConfigAccessor();
     List<ResourceConfig> wagedResourceConfigs = new ArrayList<>();
     for (IdealState idealState : wagedResourceIdealState) {
-      wagedResourceConfigs
-          .add(cfgAccessor.getResourceConfig(clusterId, idealState.getResourceName()));
+      ResourceConfig resourceConfig = cfgAccessor.getResourceConfig(clusterId, idealState.getResourceName());
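The hunk is cut off after the new assignment, so the exact handling of a null result is not visible here. A minimal sketch of one possible null guard follows, using hypothetical stand-ins (`SimpleResourceConfig`, a `Map`-backed lookup) instead of Helix's real `ConfigAccessor` and `ResourceConfig`. It assumes the fix substitutes an empty config named after the resource rather than skipping the resource; that choice is an assumption, not something this diff confirms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Helix's ResourceConfig: only the resource name
// and a capacity map are modeled.
final class SimpleResourceConfig {
  final String resourceName;
  final Map<String, Integer> capacity;

  SimpleResourceConfig(String resourceName, Map<String, Integer> capacity) {
    this.resourceName = resourceName;
    this.capacity = capacity;
  }
}

final class WagedConfigCollector {
  // Builds one config per resource name. A missing key in `stored` plays the
  // role of getResourceConfig() returning null for a resource with no config in ZK.
  static List<SimpleResourceConfig> collect(List<String> resourceNames,
      Map<String, SimpleResourceConfig> stored) {
    List<SimpleResourceConfig> configs = new ArrayList<>();
    for (String name : resourceNames) {
      SimpleResourceConfig config = stored.get(name);
      if (config == null) {
        // Assumption: substitute an empty config instead of adding null,
        // so downstream code never dereferences a null element.
        config = new SimpleResourceConfig(name, Collections.emptyMap());
      }
      configs.add(config);
    }
    return configs;
  }
}
```

Substituting rather than skipping keeps one entry per WAGED ideal state, which matters if later code pairs the two lists; skipping would also avoid the NPE but drops the resource from the result.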
Do we not check whether the resource is WAGED-enabled?
The check for WAGED resources happens in computeOptimalAssignmentForResources(), the function that calls this one (computeWagedAssignmentResult). My understanding is that it assumes only WAGED resources are passed in.
The calling function is computeOptimalAssignmentForResources() in ResourceAssignmentOptimizerAccessor, line 254:
// Compute all Waged resources in a batch later.
if (idealState.getRebalancerClassName() != null && idealState.getRebalancerClassName()
.equals(WagedRebalancer.class.getName())) {
wagedResourceIdealState.add(idealState);
continue;
}
and then:
if (!wagedResourceIdealState.isEmpty()) {
computeWagedAssignmentResult(wagedResourceIdealState, inputFields, clusterState, clusterId,
result);
}
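The batching above selects resources purely by rebalancer class name, so nothing on this path guarantees that a resource config exists for them. A self-contained sketch of the same filter follows; `IdealStateStub` and `WagedRouter` are hypothetical, and the fully qualified class-name string is assumed to match `WagedRebalancer.class.getName()`. Calling `equals` on the constant folds the explicit null check into a single call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IdealState with only the fields the filter needs.
final class IdealStateStub {
  final String resourceName;
  final String rebalancerClassName; // may be null, as the original check allows

  IdealStateStub(String resourceName, String rebalancerClassName) {
    this.resourceName = resourceName;
    this.rebalancerClassName = rebalancerClassName;
  }
}

final class WagedRouter {
  // Assumed fully qualified name of Helix's WAGED rebalancer class.
  static final String WAGED_REBALANCER =
      "org.apache.helix.controller.rebalancer.waged.WagedRebalancer";

  // Returns only the WAGED resources. Invoking equals() on the non-null
  // constant is null-safe: a null class name is simply treated as non-WAGED.
  static List<IdealStateStub> wagedOnly(List<IdealStateStub> idealStates) {
    List<IdealStateStub> waged = new ArrayList<>();
    for (IdealStateStub idealState : idealStates) {
      if (WAGED_REBALANCER.equals(idealState.rebalancerClassName)) {
        waged.add(idealState);
      }
    }
    return waged;
  }
}
```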
Aha, so in some way the user has configured the resource as WAGED but hasn't provided a WAGED resource config? Isn't this a user error? We can prevent the null pointer exception, but shouldn't the user also know that the config is wrong?
Technically, WAGED rebalancing "works" for clusters without resource configs. This is actually the current setup of our super clusters: they use WAGED without any resource configs or relevant instance capacity configs.
I'm not 100% sure about this part, but I believe that if there are no resource configs, the score calculated for each node will be 0 and the tiebreak will go to the node without any resources assigned to it. I don't think evenness is guaranteed without resource and instance capacity configs, but it will guarantee that each node has at least 1 replica assigned to it (given # replicas > # nodes).
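The reasoning above can be illustrated with a toy model. This is not Helix's constraint-based WAGED algorithm, and the least-loaded tiebreak here is an assumption made for illustration; it only shows why all-zero scores combined with a "fewest replicas so far" tiebreak place at least one replica on every node once there are at least as many replicas as nodes. `ZeroScoreTieBreak` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ZeroScoreTieBreak {
  // Toy model: every node scores 0 (no capacity configs), so each placement is
  // decided by the tiebreak alone: pick the node with the fewest replicas so far.
  // Remaining ties go to the earliest node in input order.
  static Map<String, Integer> assign(List<String> nodes, int replicas) {
    Map<String, Integer> load = new LinkedHashMap<>();
    for (String node : nodes) {
      load.put(node, 0);
    }
    if (nodes.isEmpty()) {
      return load; // nowhere to place replicas
    }
    for (int r = 0; r < replicas; r++) {
      String best = null;
      for (Map.Entry<String, Integer> entry : load.entrySet()) {
        if (best == null || entry.getValue() < load.get(best)) {
          best = entry.getKey();
        }
      }
      load.put(best, load.get(best) + 1);
    }
    return load;
  }
}
```

With three nodes and five replicas, the tiebreak cycles through the nodes, so the loads come out as 2, 2, and 1: every node receives a replica, but nothing weighs replicas by their actual resource usage.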
The NPE occurs only in the partitionAssignment API; the controller's actual rebalance algorithm works fine in the same scenario.
That means we fill in the values with defaults in the WAGED workflow, but not in this workflow.
Please look at: WagedValidationUtil::validateAndGetPartitionCapacity.
But your fix should be good too.
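A hedged sketch of the kind of defaulting the reviewer refers to: use the partition's own capacity entry if present, else a resource-level default entry, else a cluster-wide default, and treat a missing resource config like the last case. All names below (`CapacityDefaults`, `partitionCapacity`, the `"DEFAULT"` key) are hypothetical; this is not the actual signature or logic of `WagedValidationUtil.validateAndGetPartitionCapacity`.

```java
import java.util.Map;

final class CapacityDefaults {
  // Hypothetical key under which a resource stores a capacity map that applies
  // to every partition lacking its own entry.
  static final String DEFAULT_PARTITION_KEY = "DEFAULT";

  // partitionCapacities: per-partition capacity maps taken from a resource
  // config, or null when the resource has no config at all.
  static Map<String, Integer> partitionCapacity(String partition,
      Map<String, Map<String, Integer>> partitionCapacities,
      Map<String, Integer> clusterDefault) {
    if (partitionCapacities == null) {
      return clusterDefault; // no resource config: fall back to the cluster default
    }
    Map<String, Integer> capacity = partitionCapacities.get(partition);
    if (capacity == null) {
      capacity = partitionCapacities.get(DEFAULT_PARTITION_KEY); // resource-level default
    }
    return capacity != null ? capacity : clusterDefault;
  }
}
```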
Pull request approved by @xyuanlu
Issues
The PartitionAssignment API fails for waged clusters where a resource does not have a respective resource config defined for it in ZK.
This is the error that is shown to users:
This is the error found in the helix-rest logs (truncated):
Description
The partitionAssignment API fails with an NPE for clusters where resource configs aren't set. The NPE occurs because getResourceConfig() returns null if the resource config does not exist, and that null is then added to the wagedResourceConfigs list. The code below is where the NPE occurs, as one of the items in the list is null.
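A minimal, self-contained illustration of why the failure surfaces later rather than at the getResourceConfig() call: `ArrayList` accepts `null` elements, so the null is stored silently and the NPE is thrown only when downstream code dereferences it. `DeferredNpeDemo`, `lookupConfig`, and the use of `String` in place of `ResourceConfig` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DeferredNpeDemo {
  // Stand-in for getResourceConfig(): returns null when no config is stored.
  static String lookupConfig(Map<String, String> stored, String resource) {
    return stored.get(resource);
  }

  // Mirrors the pre-fix loop: adds whatever the lookup returns, including null.
  static List<String> collectUnchecked(Map<String, String> stored, List<String> resources) {
    List<String> configs = new ArrayList<>();
    for (String resource : resources) {
      configs.add(lookupConfig(stored, resource)); // ArrayList accepts null here
    }
    return configs;
  }

  // Downstream consumer that dereferences every element; throws an NPE on null.
  static int totalConfigLength(List<String> configs) {
    int total = 0;
    for (String config : configs) {
      total += config.length();
    }
    return total;
  }
}
```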
Tests
The following tests are written for this issue:
No new unit tests, but I tested this by deploying helix-rest locally and confirming that the partitionAssignment API worked after the change.
The following is the result of the "mvn test" command on the appropriate module:
$mvn test -o -Dtest=TestResourceAssignmentOptimizerAccessor -pl=helix-rest