Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add provider record addresses to peerstore #870

Merged
merged 2 commits into from
Aug 22, 2023
Merged

Conversation

dennis-tra
Copy link
Contributor

@dennis-tra dennis-tra commented Aug 22, 2023

fixes #868

@dennis-tra dennis-tra requested review from guillaumemichel and a team as code owners August 22, 2023 11:59
Copy link
Contributor

@guillaumemichel guillaumemichel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@guillaumemichel guillaumemichel merged commit 2cbe38a into master Aug 22, 2023
@guillaumemichel guillaumemichel deleted the issue-868 branch August 22, 2023 13:33
@Jorropo
Copy link
Contributor

Jorropo commented Aug 23, 2023

What is the point of this ?
Identify already takes care of adding addresses of peer we are connected to.


I mean the fact we are even passing the addresses in the add endpoint seems wrong and I doubt we do.
If we do this is a bug, we shouldn't send all of our addresses multiplied by all of our CIDs, wastefull use of bandwidth.

@dennis-tra
Copy link
Contributor Author

The idea is to avoid the second lookup for network addresses after you've found the provider record.

The routed host will still do the lookup for new addresses if it cannot connect to the peer.

Up until now we relied on identify to keep the addresses around. IIRC @cortze's RFM showed improvement potential if we kept them around longer.

I don't think at all that this is a bug or anything. This is just caching.

@Jorropo
Copy link
Contributor

Jorropo commented Aug 23, 2023

@dennis-tra what does the add provider endpoint has to do with that ?

When I add a provider the remote peer already know my addresses since I'm connected to it.
You could use host.Peerstore() to fetch them instead, and be sure the client does not send them over the wire.
But also generally you don't need to store addresses on each provider record, you should have two table, one CID → [peerid] oen and one PeerID → [Address] one, in order to avoid storing lots of duplicated data.

@Jorropo
Copy link
Contributor

Jorropo commented Aug 23, 2023

Ok so, the provider store does indeed not store the addreses

pm.pstore.AddAddrs(provInfo.ID, provInfo.Addrs, ProviderAddrTTL)

I did checked that some addresses are sent over the wire and yes some are:

pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 8
pi.Addrs: 7
pi.Addrs: 16
pi.Addrs: 16
pi.Addrs: 16
pi.Addrs: 14
pi.Addrs: 7
pi.Addrs: 11
pi.Addrs: 5
pi.Addrs: 15
pi.Addrs: 12
pi.Addrs: 8
pi.Addrs: 1
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 7
pi.Addrs: 12
pi.Addrs: 12
pi.Addrs: 14
pi.Addrs: 21
pi.Addrs: 8
pi.Addrs: 26
pi.Addrs: 9
pi.Addrs: 17
pi.Addrs: 12
pi.Addrs: 1
pi.Addrs: 11
pi.Addrs: 4
pi.Addrs: 18
pi.Addrs: 8

However this code is at least buggy because it fails to run the addresses through the address filter (using maybeAddAddrs).
That means it saves all the private LAN addresses of the nodes doing PUTs in our peerstore which we want to stop be doing due to the smart dialler.

@dennis-tra
Copy link
Contributor Author

what does the add provider endpoint has to do with that ?

The add provider endpoint explicitly allows the client to associate addresses with the record.

When I add a provider the remote peer already know my addresses since I'm connected to it.

I agree that, as things currently are, it probably doesn’t matter if we take the addresses from the identify exchange or from the wire message. They probably contain the same set of addresses.

That’s not the point though. The thing that we want to address here is the TTL associated with the multiaddresses of that peer. However, I still think it’s better to take the explicitly passed multiaddresses from the request.

You could use host.Peerstore() to fetch them instead, and be sure the client does not send them over the wire.
But also generally you don't need to store addresses on each provider record, you should have two table, one CID → [peerid] oen and one PeerID → [Address] one, in order to avoid storing lots of duplicated data.

Not sure if I’m missing something but that is what we’re doing isn’t it? We’re storing the multihash -> PeerID mapping in a datastore and another mapping from PeerID to multiaddresses in the peerstore. We’re not storing any duplicate multiaddresses.

When a peer requests a record we look up the peer IDs in the datastore and then the multiaddresses in the peerstore. We serve both information to the client. However, we stop serving multiaddresses after 30 min because the TTL is not properly updated. We want to continue serving the multiaddresses for much longer. This PR attempted to fix that.

@Jorropo
Copy link
Contributor

Jorropo commented Aug 23, 2023

Not sure if I’m missing something but that is what we’re doing isn’t it? We’re storing the multihash -> PeerID mapping in a datastore and another mapping from PeerID to multiaddresses in the peerstore. We’re not storing any duplicate multiaddresses.

Yeah I wrote that before tracing the code. Still over the wire we send theses duplicates which is an ineficient use of resources.

I agree that, as things currently are, it probably doesn’t matter if we take the addresses from the identify exchange or from the wire message. They probably contain the same set of addresses.

That’s not the point though. The thing that we want to address here is the TTL associated with the multiaddresses of that peer. However, I still think it’s better to take the explicitly passed multiaddresses from the request.

We don't need to send the addresses over kad-dht to update the TTL.

IMO This fix just makes thing worst because it incentivize clients to do the wrong thing and waste bandwidth, before it was a bug and we did nothing with theses addresses now you turned it in a feature ✨.
You could call https://pkg.go.dev/github.com/libp2p/[email protected]/core/peerstore#AddrBook.UpdateAddrs, sadly it doesn't have a wildcard update, I think you could submit a PR to libp2p which add some magic wellknown TTL like 0 match all addresses already stored.
If you don't want to submit code to go-libp2p you could do:

// PseudoCode-Ish
addrs := dht.pstore.Addrs(p)
if f := dht.addrsFilter; f != nil {
 addrs = dht.addrsFilter(addrs)
}
pstore.AddAddrs(p, addrs, dht.providerTTL)

They both achieve the same thing except this allows us to stop sending maddrs along side add provider.


Edit: the pseudo code that fetch run through the filter and add the addresses back-in is better IMO, because it only increase the cache on the addresses we want to cache.

@Jorropo
Copy link
Contributor

Jorropo commented Aug 23, 2023

if len(pi.Addrs) < 1 {
	logger.Debugw("no valid addresses for provider", "from", p)
	continue
}

I actually this is on purpose 🤦, forget what I said.
We should eventually make a protocol upgrade with versioned stream handler so we can skip doing this. This is a multiple X background traffic saving.

Edit: new issue here #871

@dennis-tra dennis-tra added the v2 All issues related to the v2 rewrite label Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v2 All issues related to the v2 rewrite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Increased ProviderAddrTTL from RFM17.1 isn't applied
3 participants