Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GenericHelixControler.getLeaderController does not check for potential conrtroller's _helixMapper's nullity #2963

Open
jacob-netguardians opened this issue Nov 13, 2024 · 1 comment · May be fixed by #2977
Labels
bug Something isn't working

Comments

@jacob-netguardians
Copy link

jacob-netguardians commented Nov 13, 2024

Describe the bug

Plain old NPE.
Depending on the order of the controllers in the set of existing controllers, and especially depending on the position of the "leader" controller in the collection, some controllers prior to the "leader" one may still have their _helixMapper field set to null, resulting with an NPE while filtering for leader.

To Reproduce

Happens from time to time, depending on the ordering of the retrieved controllers in the collection.

Expected behavior

controllers without _helixMapper set cannot be leaders, so we could expect them to be filtered out during the search.

Here is the code I propose to change (in GenericHelixController.java):

  public static GenericHelixController getLeaderController(String clusterName) {
    if (clusterName != null) {
      ImmutableSet<GenericHelixController> controllers = _helixControllerFactory.get(clusterName);
      if (controllers != null) {
        return controllers.stream().filter(controller -> controller._helixManager.isLeader())
            .findAny().orElse(null);
      }
    }
    return null;
  }

to

  public static GenericHelixController getLeaderController(String clusterName) {
    if (clusterName != null) {
      ImmutableSet<GenericHelixController> controllers = _helixControllerFactory.get(clusterName);
      if (controllers != null) {
        return controllers.stream().filter(controller -> controller._helixManager != null && controller._helixManager.isLeader())
            .findAny().orElse(null);
      }
    }
    return null;
  }
@jacob-netguardians
Copy link
Author

PR could be created (I don't have that sort of right, sorry, still a newbie here) from that branch here: https://github.com/jacob-netguardians/helix/tree/bugfix/2963-fix-npe-while-rebalancing-task

jacob-netguardians added a commit to jacob-netguardians/helix that referenced this issue Nov 19, 2024
In case _helixManager in one of the known controllers is _null_, finding
the "leader controller" can lead to a NPE.

When run as an asynchronous task, the NPE may not even be logged, and the
task finishes without any trace, hence the addition of ExecutorTaskUtil
class to wrap callables/runnables in such a manner that such an exception
at least gets logged.
jacob-netguardians added a commit to jacob-netguardians/helix that referenced this issue Nov 19, 2024
In case _helixManager in one of the known controllers is _null_, finding
the "leader controller" can lead to a NPE.

When run as an asynchronous task, the NPE may not even be logged, and the
task finishes without any trace, hence the addition of ExecutorTaskUtil
class to wrap callables/runnables in such a manner that such an exception
at least gets logged.
jacob-netguardians added a commit to jacob-netguardians/helix that referenced this issue Nov 19, 2024
In case _helixManager in one of the known controllers is _null_, finding
the "leader controller" can lead to a NPE.

When run as an asynchronous task, the NPE may not even be logged, and the
task finishes without any trace, hence the addition of ExecutorTaskUtil
class to wrap callables/runnables in such a manner that such an exception
at least gets logged.
jacob-netguardians added a commit to jacob-netguardians/helix that referenced this issue Nov 19, 2024
In case _helixManager in one of the known controllers is _null_, finding
the "leader controller" can lead to a NPE.

When run as an asynchronous task, the NPE may not even be logged, and the
task finishes without any trace, hence the addition of ExecutorTaskUtil
class to wrap callables/runnables in such a manner that such an exception
at least gets logged.
@jacob-netguardians jacob-netguardians linked a pull request Dec 6, 2024 that will close this issue
5 tasks
jacob-netguardians added a commit to jacob-netguardians/helix that referenced this issue Dec 9, 2024
Reformatted touched files according to "helix-format"
jacob-netguardians added a commit to jacob-netguardians/helix that referenced this issue Dec 11, 2024
jacob-netguardians added a commit to jacob-netguardians/helix that referenced this issue Dec 11, 2024
Reformatted touched files according to "helix-format"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant