Cache Provider Records in the network #386
Comments
Yes and no. According to your quote, the time t for which we store the data corresponding to H is c * 2^-d, where d is some keyspace distance metric and c is some preset coefficient. If we use the distance from H instead of from V, then c will need to be very large in order to make the cache times non-negligible. However, if c is a preset coefficient, then it depends on properties like network size (i.e. it is not really a constant). This leaves us with some options:
I'd prefer option 2 (or even 1) over 3, since 3 makes DoS attacks on particular record updates much easier than 1 or 2 do.
@aschmahmann I see exactly what you mean. Because distance(N, H) is potentially >> num_nodes(N, V). Given that 2^-d for us will be in [1, 2] (min d = 1, max d = 2^32 - 1), we will have to choose c (based on the network size) carefully to come up with a meaningful expiry time in milliseconds. Let me come up with a rudimentary TODO.
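For concreteness, here is a minimal sketch of that expiry computation in Go. The function name, the way the coefficient c is passed in, and the use of a plain float for d are assumptions for illustration, not anything taken from the DHT codebase.

```go
package dhtcache

import (
	"math"
	"time"
)

// cacheExpiry sketches the paper's t = c * 2^-d rule: the TTL of a cached
// record shrinks exponentially with the keyspace distance d. The coefficient
// c is the tunable discussed above; a usable value depends on network size.
func cacheExpiry(c time.Duration, d float64) time.Duration {
	return time.Duration(float64(c) * math.Exp2(-d))
}
```

With a large d this collapses to a near-zero TTL, which is exactly the point raised above about having to pick c very carefully.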
Hmm, I now realise that we currently do not/should not allow a node to register another node as a provider, which means this caching protocol cannot work for us for Provider Records in its current form. I think using it for PutValue records first makes more sense. TODO
@aschmahmann @Stebalien Please let me know your thoughts.
Problem
In Kademlia, the K (20 for now) closest nodes to the hash H of some content are responsible for serving the providers for that content. As described quite well in libp2p/specs#163, these nodes get overburdened as the content gets more popular, since all provider lookups converge onto them. This creates both performance & availability problems.
Solution
The Kademlia paper suggests caching records like so:
Basically, once a node has finished fetching the providers for hash H from the network, it should store those providers on the closest node N to H that did not return them during the traversal. This expands the radius at which the providers are available: future queries for H from nodes further from H than N will discover the cached providers before they bother the K closest nodes.
This distributes provider lookups across the network & prevents them from converging on & congesting the same path.
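As a rough illustration (not the actual go-libp2p-kad-dht API), the caching step after a finished lookup could look something like the sketch below; lookupResult and sendAddProvider are hypothetical names.

```go
package dhtcache

import "context"

// lookupResult is a hypothetical summary of a finished provider lookup;
// it is not a type from go-libp2p-kad-dht.
type lookupResult struct {
	Queried   []string        // peers contacted, sorted closest-to-key first
	Responded map[string]bool // true if the peer already returned the providers
	Providers []string        // provider records discovered during the lookup
}

// sendAddProvider is a stand-in for the RPC that stores provider records on a
// peer; the real wire call is omitted in this sketch.
func sendAddProvider(ctx context.Context, peer string, key []byte, providers []string) error {
	return nil
}

// cacheProviders stores the providers on the closest queried node that did
// not return them, which is the caching step described above.
func cacheProviders(ctx context.Context, key []byte, res lookupResult) error {
	for _, p := range res.Queried {
		if !res.Responded[p] {
			return sendAddProvider(ctx, p, key, res.Providers)
		}
	}
	return nil
}
```

The only real decision point is picking the closest queried peer that did not already have the providers, which is what the loop does.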
Cache Eviction
From the kad paper:
Basically, the further a node N is from V, the node that is closest to hash H in the whole network, the sooner the cached providers on N expire. This makes intuitive sense because queries for hash H from any node move closer & closer to V with each hop. So a node closer to V is more likely to be on a provider lookup path than a node that is further from V, and its cached providers are therefore worth keeping around longer.
Question: Can't we simply use the XOR distance between N & H to determine the expiry time, rather than the number of nodes between N & V, since the latter is not trivial to compute in our current implementation?
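A sketch of what that could look like, under the assumption that we approximate the distance with the common-prefix length of the node ID and H (longer shared prefix = closer = longer TTL); commonPrefixLen, cacheTTL, and maxShift are hypothetical names:

```go
package dhtcache

import (
	"math/bits"
	"time"
)

// commonPrefixLen returns the number of leading bits shared by a and b;
// a longer shared prefix means the keys are closer in XOR space.
func commonPrefixLen(a, b []byte) int {
	for i := range a {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return len(a) * 8
}

// cacheTTL doubles a base TTL for every bit of prefix that the caching
// node's ID shares with the content hash, so nodes closer to H keep the
// cached providers longer. maxShift caps the exponent to avoid overflow.
func cacheTTL(base time.Duration, nodeID, hash []byte, maxShift int) time.Duration {
	cpl := commonPrefixLen(nodeID, hash)
	if cpl > maxShift {
		cpl = maxShift
	}
	return base << uint(cpl)
}
```

Doubling per shared prefix bit mirrors the 2^-d scaling if d is taken as (key length - common prefix length), but it inherits the concern @aschmahmann raised: distance(N, H) can be much larger than the node count between N and V, so the base TTL and cap still have to be tuned to the network size.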
Testing
libp2p testlab FTW!
@raulk @Stebalien