Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Discovery not taking nat manager port-mapping into account #6578

Conversation

daanporon
Copy link

PR description

Proposal to fix the issue i reported in #6573

  • PingPacketData should take into account the NATManager portMapping for the discoveryPort.
  • Take into account the udpPort from the PingPacketData if the host is also taken from there.

Fixed Issue(s)

fixes: #6573

Copy link

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests
  • I thought about running CI.
  • If I did not run CI, I ran as much locally as possible before pushing.

@daanporon
Copy link
Author

daanporon commented Feb 15, 2024

This is my first PR and i haven't tested it yet .. at the moment it's more a proposal on how i think it can be fixed. If it makes sense i can try to finish the PR :)

@daanporon daanporon force-pushed the fix/discovery-not-taking-nat-manager--portmapping-into-account branch from d75da86 to c2c87cc Compare February 15, 2024 09:36
@daanporon daanporon force-pushed the fix/discovery-not-taking-nat-manager--portmapping-into-account branch from 983de99 to e6c3eee Compare February 21, 2024 16:21
@daanporon daanporon marked this pull request as ready for review February 23, 2024 14:22
@daanporon
Copy link
Author

This PR is ready for review. Only had one question. I saw that even when discovery is disabled we are still starting the peerDiscoveryAgent, does this make sense?

I noticed, because in the tests enodeUrlShouldHaveAdvertisedHostWhenDiscoveryDisabled discovery is disabled but still the enodeUrl returned a discport because the discoveryAgent was started and so we fetched the discoveryPort from the NatManager. The NatManager was enabled because on the CI we are running in Docker and by default natMode == AUTO.

@matthew1001
Copy link
Contributor

@daanporon this seems like a reasonable change.

Re. the PeerDiscoveryAgent always starting, I'm not sure I'm afraid. It may be that disabling discovery doesn't prevent "outbound" UDP-based discovery via bootnodes, in which case I guess the agent is still needed?

@daanporon
Copy link
Author

@daanporon this seems like a reasonable change.

Re. the PeerDiscoveryAgent always starting, I'm not sure I'm afraid. It may be that disabling discovery doesn't prevent "outbound" UDP-based discovery via bootnodes, in which case I guess the agent is still needed?

@matthew1001 is something else needed in order to get this approved?

About the PeerDiscoveryAgent i'm not familiar enough with the whole architecture. It was just a consideration i had while looking at the tests and the code. I tried to make my code changes in such a way that it only gets the PortMapping when needed, when the local discoveryPort != 0. This way the thing with the Docker NatMethod was not causing an issue anymore ... so it works the way it worked before. But i still chose to set the NatMethod to NONE, just for clarity.

@matthew1001
Copy link
Contributor

@matthew1001 is something else needed in order to get this approved?

It looks OK to me, but I think it would be sensible to get @pinges to take a look and do the actual approval.

Copy link
Contributor

@pinges pinges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me.
@daanporon what scenarios have you tested that in? What kind of nodes have you started to test this?

…iscovery/PeerDiscoveryAgent.java

Co-authored-by: Stefan Pingel <[email protected]>
Signed-off-by: Daan Poron <[email protected]>
@daanporon
Copy link
Author

daanporon commented Mar 18, 2024

These changes look good to me. @daanporon what scenarios have you tested that in? What kind of nodes have you started to test this?

@pinges I have only run the tests to see if it works and started a local node using: https://wiki.hyperledger.org/display/BESU/Building+from+source#running-developer-builds, but this doesn't use the NAT manager.

If possible it would be good if i can tests this in my k8s environment. Any idea what would be the easiest to build this and push it to a container registy so that i can easily test this?

@macfarla
Copy link
Contributor

@daanporon develop builds (updated every push to main) are here https://github.com/hyperledger/besu/releases/tag/develop - can you work with that?

@daanporon
Copy link
Author

@daanporon develop builds (updated every push to main) are here https://github.com/hyperledger/besu/releases/tag/develop - can you work with that?

but that will only work once it is merged in main i think? but i will try to trigger the workflow in my own repo and push it to a container registry i own so that i can test it.

@daanporon daanporon marked this pull request as draft April 15, 2024 11:26
@daanporon
Copy link
Author

I have been testing this on AWS and Azure and the change is working as expected. I still have other issues with discovery on kubernetes but at least this works already.

@daanporon daanporon marked this pull request as ready for review April 30, 2024 16:05
@daanporon daanporon requested a review from pinges May 15, 2024 15:00
@non-fungible-nelson non-fungible-nelson requested a review from matkt June 4, 2024 15:23
@jframe
Copy link
Contributor

jframe commented Jun 10, 2024

@pinges Can you take another look at this PR

@jflo
Copy link
Contributor

jflo commented Aug 13, 2024

@pinges bump

@pinges
Copy link
Contributor

pinges commented Aug 15, 2024

Looks like this is needed in some environments, I think the only question is how this can be tested. Would be great to have an "AT" that shows that it is working as it should. I guess setting up an environment that we could use to test that automatically is a lot of work.
We could just do some manual testing ourselves, or rely on Daanporon?
@jflo @jframe

@daanporon
Copy link
Author

daanporon commented Oct 3, 2024

I was picking this back up today, and resynced the PR ... because without the PR i'm not able to get it to work. The PONG will always be send to the host/udp-port combination.

This PR should only interfere when there is a NAT configured. And logically you would think that the configuration it exposes should work, else there are issues with the NAT. So i think this should work.

However i think there are still multiple other issues with Discovery to make it work in multiple setups, because for now with this fix i was only able to get it to work using NodePorts. I think there is caching happening on multiple levels that didn't allow it in other use-cases and also the fact that some loadBalancers are using dns-names in stead of ip addresses. I tried to describe my other findings here: #7013

I also added some logs of my latest tests: #7013 (comment)

@macfarla
Copy link
Contributor

Hi @daanporon

We have made the decision to deprecate k8s with nat. So we don't want to add to this section of the code base. We've got solutions in place that bypass nat manager for k8s and thats working for private, and public nodes that we run. Our preference is to keep it simple and clean for the long term and not more band aid patches for another service mechanism or cloud provider.

We recognize that you have gone through a lot of trial and error to create this PR but that's kinda on us for leaving this as is and not making the call earlier - and is also a symptom of the issue that we besu developers aren't experts in this part of the code base, it's not our core expertise.

If you want to keep this functionality and extend, you are welcome to fork and maintain - or consider converting the functionality to a plugin long-term?

I'm closing this PR as we won't be merging it, hope you understand the maintainers point of view.

@macfarla macfarla closed this Dec 10, 2024
@daanporon
Copy link
Author

@macfarla thanks for the clarification. We already worked around this by using NodePorts in stead of the Loadbalancers. But the last time i checked i was still having issues with the P2P discovery, and it trying to communicate back on the wrong UDP port. But i'll check again, and maybe create a partial PR with this fix: https://github.com/hyperledger/besu/pull/6578/files#diff-b10aec51aeaa23631d517c3107f0ab25c7f2afda83a8535e95d302b5c825a85cR298, because this was not only for the NAT manager.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discovery is not taking NATManager portMapping into account
6 participants