Dial all configured known relay and direct node addresses on schedule #622

sandreae · 2024-06-14T20:30:24Z

When a node is first launched, it attempts to connect to any configured relay or direct node addresses. The steps are as follows:

1. Register at all relay/rendezvous nodes for peer discovery and connections
2. Connect to and initiate a replication session with all relay/rendezvous nodes
3. Connect to and initiate a replication session with all direct nodes

If the device where a node is running is offline when the app starts, all of these steps will fail and no further connection attempts are made. Similarly, if the node loses connectivity after it started, connections will be dropped for good.

This PR implements a modest solution for points 2) and 3) above by adding a polling service to the EventLoop which attempts to (re)connect to all known peer addresses (relay or otherwise) every x seconds. A connection is only established if none already exists to the address in question. Replication sessions will be initiated with any peers that we successfully (re)connect to.
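To illustrate the polling rule described above, here is a minimal, std-only sketch. It is not the code in this PR: `DialScheduler`, `Address`, `poll`, `on_connected` and `on_disconnected` are hypothetical names, addresses are plain strings, and the actual dialing (which the real EventLoop does through the swarm) is left out. It only shows the decision "every interval, dial each known address that has no existing connection".

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a configured relay or direct node address.
type Address = String;

/// Hypothetical scheduler state, assumed for illustration only.
struct DialScheduler {
    /// All configured relay and direct node addresses.
    known: Vec<Address>,
    /// Addresses we currently hold a connection to.
    connected: HashSet<Address>,
    /// The "every x seconds" polling interval.
    interval: Duration,
    /// When the last poll ran, if any.
    last_poll: Option<Instant>,
}

impl DialScheduler {
    fn new(known: Vec<Address>, interval: Duration) -> Self {
        Self {
            known,
            connected: HashSet::new(),
            interval,
            last_poll: None,
        }
    }

    /// Returns the addresses to dial now. The list is empty when the
    /// interval has not elapsed since the previous poll.
    fn poll(&mut self, now: Instant) -> Vec<Address> {
        let due = match self.last_poll {
            None => true,
            Some(last) => now.duration_since(last) >= self.interval,
        };
        if !due {
            return Vec::new();
        }
        self.last_poll = Some(now);

        // Only dial addresses without an existing connection.
        self.known
            .iter()
            .filter(|addr| !self.connected.contains(*addr))
            .cloned()
            .collect()
    }

    /// Record a successful connection so the address is skipped next time.
    fn on_connected(&mut self, addr: &str) {
        self.connected.insert(addr.to_string());
    }

    /// Record a dropped connection so the address is dialed again.
    fn on_disconnected(&mut self, addr: &str) {
        self.connected.remove(addr);
    }
}
```

In this sketch the caller would invoke `poll` from its own loop, dial whatever comes back, and report results via `on_connected` and `on_disconnected`. Passing `now` in explicitly keeps the logic easy to test without sleeping.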

Finding a solution for point 1) in the same situation is a little more involved. These changes seemed like easy, low-hanging fruit that greatly improves UX for node users, especially in situations where nodes run for long periods and connections may drop.

📋 Checklist

~~Add tests that cover your changes~~
Add this PR to the Unreleased section in CHANGELOG.md
Link this PR to any issues it closes
New files contain a SPDX license header

adzialocha

Looking good!

On mobile phones we have ways to find out about the connection status of a device. It could be nice to have manual methods to disconnect and connect on the Node API instead of frequent checks, which could be a more efficient approach. Pragmatically, this PR is totally fine though and solves the issue.

* Make clippy happy
* Revert "Make clippy happy" This reverts commit e250ccd.
* Try fmt and clippy again
* Add clippy suggestions
* Allow setting path to config file via env args (p2panda#611)
* Enable passing path to config file via env args
* Remove println
* Update comment
* Remove unwanted file
* Update CHANGELOG
* Accept domain name and ip addresses for peers (p2panda#612)
* Accept String for relay and direct peer addresses in config
* Use ToSocketAddress to handle ip and domain name addresses
* Clippy
* fmt
* Update CHANGELOG
* Update example config.toml
* Prepare CHANGELOG for release
* 0.7.2
* Fix: query for child relations fails when relation list empty (p2panda#614)
* Add test get_child_document_ids test case for document with empty relation list
* Account for null values when relation lists are empty
* Update test comment
* Update CHANGELOG
* 0.7.3
* Re-run tasks for partially materialized blobs (p2panda#618)
* Check materialized blob file is complete before aborting task
* Add test
* fmt
* Update CHANGELOG
* Clippy
* Correct cmp logic
* Remove double comment

---------

Co-authored-by: adz <[email protected]>

* Fix: include all logs from target schema id during replication (p2panda#620)
* Include tombstoned documents when calculating local log heights
* Clippy
* Update CHANGELOG
* Make clippy happy
* Bump rust gh action to v1 and define toolchain version
* Introduce `PeerAddress` struct for improved address resolution patterns (p2panda#621)
* Introduce PeerAddress struct with socket and multiaddr resolution methods
* Don't pop of p2p protocol from relay address as it isn't there
* fmt
* Update CHANGELOG
* Cache socket addresses
* Remove Multiaddr from PeerAddress
* Remove serde traits from PeerAddress
* Add doc string to PeerAddress
* Rename methods
* Re-apply unhandled operations during startup of materializer service (p2panda#623)
* Store method to get all un-indexed operation ids
* Pick up un-indexed operations when starting materializer service, add a test
* Add entry to CHANGELOG.md
* Increase `max_pending_connections_*` (p2panda#628)
* Increase max pending connections
* Update CHANGELOG
* Dial all configured known relay and direct node addresses on schedule (p2panda#622)
* Poll all known peer addresses
* Update PeerAddress method name
* Update CHANGELOG
* WIP: poll known peers
* Check if a direct node was identified (and add comments)
* Don't dial direct node address on startup, rely on scheduler
* More comments
* Remove unused import
* fmt
* Doc strings for EventLoop struct
* Clippy
* 0.7.4
* Minor CHANGELOG.md formatting change
* Fix: handle connection ids greater than 9 in `Peer` impl of `Human` trait (p2panda#634)
* Handle connection ids greater than 9 in peer Human impl
* Clippy
* Update CHANGELOG
* Bump `libp2p` to version `0.53.2` (p2panda#631)
* Bump libp2p to version 0.53.2
* We don't need to listen on tcp port when in relay mode
* Listening on relay circuit no longer sometimes fails
* Remove tcp feature requirement from libp2p
* Refactor connection_keep_alive method
* Clippy
* Remove unnecessary connection_keep_alive method from peers behaviour
* Add CHANGELOG.md entry

---------

Co-authored-by: adz <[email protected]>

* Move relay connection logic into main event loop (p2panda#632)
* Bump `libp2p` to version `0.53.2` (p2panda#631)
* Bump libp2p to version 0.53.2
* We don't need to listen on tcp port when in relay mode
* Listening on relay circuit no longer sometimes fails
* Remove tcp feature requirement from libp2p
* Refactor connection_keep_alive method
* Clippy
* Remove unnecessary connection_keep_alive method from peers behaviour
* Add CHANGELOG.md entry

---------

Co-authored-by: adz <[email protected]>

* Move network service relay initialization into main event loop
* Clippy
* Add DCUTR event debug logging to swarm
* Change log message
* Adjust connection limits
* Even nicer log messages
* Helper to print or info log depending on log level
* Listening on relay circuit no longer sometimes fails

---------

Co-authored-by: adz <[email protected]>

* Support private net with pre-shared key (p2panda#635)
* Swarm listens on both TCP and QUIC addresses
* Support both QUIC and TCP protocols
* TCP port_reuse should be false
* Establish a private net over TCP when psk provided in NetworkConfig
* Initiate swarm with private net when psk provided in config
* Update CHANGELOG
* Doc string fix
* Don't need to differentiate between transports when detecting port
* Update README
* Fix README formatting
* Update example config file
* Check if blob file exists before deleting it from fs (p2panda#636)
* Check if blob file exists before deleting it from fs
* Add entry to CHANGELOG.md
* Inconsistent blob storage warning was wrongly shown (p2panda#638)
* Inconsistent blob storage warning was wrongly shown
* Add entry to CHANGELOG.md
* Minor config.toml cleanup
* Safely handle missing document when retrieving document view from store (p2panda#637)
* Return None when document was deleted
* Add entry to CHANGELOG.md
* Introduce API to subscribe to peer connection events (p2panda#625)
* Introduce API to subscribe to peer connection events
* Add entry to CHANGELOG.md
* 0.8.0
* Also bump version in aquadoggo_cli, add note about that in RELEASE.md
* Adjust level of replication session and document materialization logs (p2panda#639)
* Remove relay and direct peer poll attempt logging
* Change document creation/update/delete logging to info level
* Lower level of replication session logs to debug
* Update CHANGELOG
* Remove incorrectly commit file
* Lower logging level for replication finished message
* Fix logging logic error in reducer
* Improve GraphQL re-build error
* Update README.md
* Expose NodeEvent to public API (p2panda#643)
* Expose NodeEvent to public API
* Add entry to CHANGELOG.md

---------

Co-authored-by: adz <[email protected]>
Co-authored-by: Sam Andreae <[email protected]>
Co-authored-by: adz <[email protected]>

sandreae changed the title ~~Poll known peer addresses~~ Dial all configured known relay and direct node addresses on schedule Jun 14, 2024

sandreae requested a review from adzialocha June 14, 2024 20:36

sandreae marked this pull request as draft June 14, 2024 20:56

adzialocha approved these changes Jun 15, 2024


sandreae force-pushed the poll-known-peer-addresses branch from 2795c7b to f2d464d Compare June 18, 2024 18:06

sandreae marked this pull request as ready for review June 18, 2024 18:06

sandreae changed the base branch from improve-peer-addr-resolution to main June 18, 2024 18:11

sandreae requested a review from adzialocha June 18, 2024 18:45

sandreae added 11 commits June 18, 2024 19:55

* Poll all known peer addresses (e9474f1)
* Update PeerAddress method name (ac792ac)
* Update CHANGELOG (3d887d5)
* WIP: poll known peers (18f7848)
* Check if a direct node was identified (and add comments) (fd7ef1f)
* Don't dial direct node address on startup, rely on scheduler (ec5d650)
* More comments (4a03e97)
* Remove unused import (d5bd91c)
* fmt (848de5c)
* Doc strings for EventLoop struct (06883c8)
* Clippy (6c13f1b)

sandreae force-pushed the poll-known-peer-addresses branch from a2cecc1 to 6c13f1b Compare June 18, 2024 19:02

sandreae merged commit 2ed3c4e into main Jun 18, 2024
8 checks passed

sandreae deleted the poll-known-peer-addresses branch June 25, 2024 19:06
