Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

authority-discovery: Populate DHT records with public listen addresses #6298

Merged
merged 28 commits into from
Nov 5, 2024

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Oct 30, 2024

This PR's main goal is to add public listen addresses to the DHT authorities records. This change improves the discoverability of validators that did not provide the --public-addresses flag.

This PR populates the authority DHT records with public listen addresses if any.

The change effectively ensures that addresses are added to the DHT record in following order:

  1. Public addresses provided by CLI --public-addresses
  2. Maximum of 4 public (global) listen addresses (if any)
  3. Any external addresses discovered from the network (ie from /identify protocol)

While at it, this PR adds the following constraints on the number of addresses:

  • Total number of addresses cached is bounded at 16 (increased from 10).
  • A maximum number of 32 addresses are published to DHT records (previously unbounded).
  • A maximum of 4 global listen addresses are utilized.

This PR also removes the following warning:
WARNING: No public address specified, validator node may not be reachable.

Next Steps

  • deploy and monitor in versi network

Closes: #6280
Part of: #5266

cc @paritytech/networking

@lexnv lexnv added T0-node This PR/Issue is related to the topic “node”. I5-enhancement An additional feature request. D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. A4-needs-backport Pull request must be backported to all maintained releases. labels Oct 30, 2024
@lexnv lexnv self-assigned this Oct 30, 2024
Signed-off-by: Alexandru Vasile <[email protected]>
Comment on lines -207 to -211
eprintln!(
"WARNING: No public address specified, validator node may not be reachable.
Consider setting `--public-addr` to the public IP address of this node.
This will become a hard requirement in future versions."
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I would keep this warning, because there is no guarantee we will have a global listen address nor that the external address will be discovered correctly (with libp2p address translation). Especially since there is a practice of "hiding" validators behind firewalls / NATs.

@lexnv
Copy link
Contributor Author

lexnv commented Nov 1, 2024

Versi Network Triage Report

Repo Count Level Triage report
polkadot-sdk 10000 error 🥩 for session starting at block .* no BEEFY authority key found in store
polkadot-sdk 144 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Invalid justification. Banned, disconnecting.)
polkadot-sdk 50 error Warp sync failed. Continuing with full sync.
polkadot-sdk 50 error Can't use warp sync mode with a partially synced database. Reverting to full sync mode.
polkadot-sdk 18 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Peer disconnected with inflight after backoffs. Banned, disconnecting.)
polkadot-sdk 4 error 💔 Called on_validated_block_announce with a bad peer ID .*
polkadot-sdk 3 error Call to DetermineUndisputedChain failed
polkadot-sdk 3 warn Received msg before first active leaves update. This is not expected - message will be dropped.
polkadot-sdk 3 warn Re-finalized block #.* (.) in the canonical chain, current best finalized is #.
  • 🥩 for session starting at block .* no BEEFY authority key found in store versi is misconfigured after a scale up and down some time ago
  • Warp sync failed and Can't use warp sync mode with a partially synced database. - believe these are related to deploying a new version and having old databases available

@lexnv
Copy link
Contributor Author

lexnv commented Nov 1, 2024

I have moved the warning to the authority-discovery mechanism instead of removing it.

This has the following implications:

  • warning is printed only once, with the first authority record publish to DHT if any
  • libp2p: Local listen addresses are propagated via swarm events, and therefore the libp2p object needs to be polled a few times. On my local machine, the libp2p backend produces listen addresses before authority-discovery is started.
  • litep2p: Liten addresses are available immediately via API calls

@dmitry-markin
Copy link
Contributor

  • warning is printed only once, with the first authority record publish to DHT if any

Is it printed soon enough for a node operator to notice?

@lexnv
Copy link
Contributor Author

lexnv commented Nov 1, 2024

From my testing it should be printed roughly within 10 seconds:

2024-11-01 15:17:15.675  INFO main sc_cli::runner: Parity Polkadot
...
2024-11-01 15:17:26.671  WARN tokio-runtime-worker sub-authority-discovery: No public addresses configured and no global...

I used cumulus/zombienet/tests/0003-full_node_catching_up.toml to double check this:

  • alice and bob relay chain validators: they print the warning one single time
  • charlie parachain collator: no warning
  • dave and eve parachain full node: no warning
  • ferdie: cumulus node with embedded light client: no warning

Signed-off-by: Alexandru Vasile <[email protected]>
@kianenigma
Copy link
Contributor

Who would be a good second reviewer of this code? I would love to get it approved, but it is too far from my domain to review in a meaningful way.

@dmitry-markin
Copy link
Contributor

I would ask @alexggh

@lexnv
Copy link
Contributor Author

lexnv commented Nov 4, 2024

Alex and Sebastian are on holiday for quite some time

Would love if maybe @alindima and @bkchr could have a look since I'm not that familiar with collators 🙏

@lexnv lexnv requested a review from koute as a code owner November 4, 2024 16:46
@lexnv lexnv added this pull request to the merge queue Nov 5, 2024
Merged via the queue into master with commit 762f824 Nov 5, 2024
189 of 192 checks passed
@lexnv lexnv deleted the lexnv/warn-dhtfix branch November 5, 2024 10:13
@paritytech-cmd-bot-polkadot-sdk

Created backport PR for stable2407:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-6298-to-stable2407
git worktree add --checkout .worktree/backport-6298-to-stable2407 backport-6298-to-stable2407
cd .worktree/backport-6298-to-stable2407
git reset --hard HEAD^
git cherry-pick -x 762f824cb3adf23c72a6645f575057bfb543850b
git push --force-with-lease

github-actions bot pushed a commit that referenced this pull request Nov 5, 2024
#6298)

This PR's main goal is to add public listen addresses to the DHT
authorities records. This change improves the discoverability of
validators that did not provide the `--public-addresses` flag.

This PR populates the authority DHT records with public listen addresses
if any.

The change effectively ensures that addresses are added to the DHT
record in following order:
 1. Public addresses provided by CLI `--public-addresses`
 2. Maximum of 4 public (global) listen addresses (if any)
3. Any external addresses discovered from the network (ie from
`/identify` protocol)

While at it, this PR adds the following constraints on the number of
addresses:
- Total number of addresses cached is bounded at 16 (increased from 10).
- A maximum number of 32 addresses are published to DHT records
(previously unbounded).
  - A maximum of 4 global listen addresses are utilized.

This PR also removes the following warning:
`WARNING: No public address specified, validator node may not be
reachable.`

### Next Steps
- [ ] deploy and monitor in versi network

Closes: #6280
Part of: #5266

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
(cherry picked from commit 762f824)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2409:

ordian added a commit that referenced this pull request Nov 5, 2024
* master: (129 commits)
  pallet-revive: Use `RUSTUP_TOOLCHAIN` if set (#6365)
  [eth-rpc] proxy /health (#6360)
  [Release|CI/CD] adjust release pipelines (#6366)
  Bump the known_good_semver group across 1 directory with 3 updates (#6339)
  Run check semver in MQ (#6287)
  [Deprecation] deprecate treasury `spend_local` call and related items (#6169)
  refactor and harden check_core_index (#6217)
  litep2p: Update litep2p to v0.8.0 (#6353)
  [pallet-staking] Additional check for virtual stakers (#5985)
  migrate pallet-remarks to v2 bench syntax (#6291)
  Remove leftover references of Wococo (#6361)
  snowbridge: allow account conversion for Ethereum accounts (#6221)
  authority-discovery: Populate DHT records with public listen addresses (#6298)
  Bounty Pallet: add `approve_bounty_with_curator` call to `bounties` pallet (#5961)
  Silent annoying log (#6351)
  [pallet-revive] rework balance transfers (#6187)
  `statement-distribution`: RFC103 implementation (#5883)
  Disable flaky tests reported in #6343 / #6345 (#6346)
  migrate pallet-recovery to benchmark V2 syntax (#6299)
  inclusion emulator: correctly handle UMP signals (#6178)
  ...

if let Some(multiaddr::Protocol::P2p(peer_id)) = address.iter().last() {
if peer_id != *local_peer_id.as_ref() {
error!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line for sure breaks this test:

alice: count of log lines containing "error" is 0 within 10 seconds

Does it need to be error? If so, I can adjust the condition in test to be 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it back to warning for now, sorry missed that 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is coming from the validator setup:

2024-11-05 20:07:15.454	
2024-11-05 18:07:15.454 ERROR tokio-runtime-worker sub-authority-discovery: No public addresses configured and no global listen addresses found. Authority DHT record may contain unreachable addresses. Consider setting `--public-addr` to the public IP address of this node. This will become a hard requirement in future versions for authorities.    

Will adjust the testing instead to accommodate this 🙏

github-merge-queue bot pushed a commit that referenced this pull request Nov 7, 2024
This PR fixes the `0002-validators-warp-sync` zombienet test by matching
for a specific error case.

The following PR introduced a new error for validators that don't have
public addresses available for publishing in the DHT:
- #6298
The log line was moved from `eprintln!("Warning: ...` in sc-cli to the
authority discovery where it is printed properly as an error.


cc @paritytech/sdk-node

Signed-off-by: Alexandru Vasile <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Nov 7, 2024
…6380)

This PR ensures that external addresses with different PeerIDs are not
propagated to the higher layer of the network code.

While at it, this ensures that libp2p only adds the `/p2p/peerid` part
to the discovered address if it does not contain it already.

This is a followup from:
- #6298

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
paritytech-ci pushed a commit that referenced this pull request Nov 7, 2024
…6380)

This PR ensures that external addresses with different PeerIDs are not
propagated to the higher layer of the network code.

While at it, this ensures that libp2p only adds the `/p2p/peerid` part
to the discovered address if it does not contain it already.

This is a followup from:
- #6298

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
lexnv added a commit that referenced this pull request Nov 15, 2024
…6380)

This PR ensures that external addresses with different PeerIDs are not
propagated to the higher layer of the network code.

While at it, this ensures that libp2p only adds the `/p2p/peerid` part
to the discovered address if it does not contain it already.

This is a followup from:
- #6298

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
lexnv added a commit that referenced this pull request Nov 15, 2024
#6298)

This PR's main goal is to add public listen addresses to the DHT
authorities records. This change improves the discoverability of
validators that did not provide the `--public-addresses` flag.

This PR populates the authority DHT records with public listen addresses
if any.

The change effectively ensures that addresses are added to the DHT
record in following order:
 1. Public addresses provided by CLI `--public-addresses`
 2. Maximum of 4 public (global) listen addresses (if any)
3. Any external addresses discovered from the network (ie from
`/identify` protocol)


While at it, this PR adds the following constraints on the number of
addresses:
- Total number of addresses cached is bounded at 16 (increased from 10).
- A maximum number of 32 addresses are published to DHT records
(previously unbounded).
  - A maximum of 4 global listen addresses are utilized.
      
This PR also removes the following warning:
`WARNING: No public address specified, validator node may not be
reachable.`

### Next Steps
- [ ] deploy and monitor in versi network

Closes: #6280
Part of: #5266

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
lexnv added a commit that referenced this pull request Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A4-needs-backport Pull request must be backported to all maintained releases. D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I5-enhancement An additional feature request. T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid warning for parachain node
5 participants