-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
External sender queue implementation #308
Conversation
Vlad, could you provide a brief (but complete) outline of the changes you've made just for a frame of reference? I haven't been following the this topic closely so it would be helpful for me but it would also make it easier for us in the future if we ever look back at this. I usually like to put a description or list of changes in the commit message but here in the PR is fine too. |
@c0gent, I updated the description as you asked. |
Excellent description! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I think that's the right approach to the sender queue.
(Only halfway through the review so far. Will continue in the afternoon.)
I'm still a bit concerned about getting rid of the general "observer", and instead coupling that concept to the peer turning up inside an InProgress
in a batch. I wonder whether there's something to be gained from keeping that concept more general, and allowing the user to just tell the sender queue if an observer has connected.
But maybe both can be combined anyway, and it's certainly something for a separate PR, if at all.
examples/simulation.rs
Outdated
let dyn_hb = DynamicHoneyBadger::builder().build(netinfo); | ||
QueueingHoneyBadger::builder(dyn_hb) | ||
let dhb = DynamicHoneyBadger::builder().build(netinfo.clone()); | ||
let netinfo = Arc::new(netinfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this Arc
needed? It seems like we don't even have to clone netinfo
: We could collect our_id
and all_ids
before building dhb
.
src/traits.rs
Outdated
.iter() | ||
.filter(|(_, them)| isnt_earlier_epoch(them)) | ||
.map(|(id, _)| id) | ||
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could avoid creating these collections by just pushing the messages right away. Especially if remote_epochs
contained all remote nodes, it would be a lot simpler:
for (id, them) in remote_epochs {
if is_accepting_epoch(&message, them) {
passed_msgs.push(Target::Node(id.clone()).message(message.clone()));
} else if is_later_epoch(&message, them) {
deferred_msgs.push((id.clone(), message.clone()));
}
}
(Although I might be confused by all the negations. 😵)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this ⬆️ would make the code simpler and avoid allocating the two sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about double negations. Let's keep them away if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should only be one extra negation in your else if
clause.
&& !node.instance().has_input() | ||
&& node.instance().netinfo().is_validator() | ||
&& !node.instance().algo().has_input() | ||
&& node.instance().algo().netinfo().is_validator() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should add these as convenience methods to SenderQueue
(has_input
only for SenderQueue<DHB>
and SenderQueue<HB>
, of course).
I just realized that we probably need to include With this and #288, it would make sense to add a mechanism that on network startup and for joining nodes ensures that the nodes all agree on those parameters (but probably in a separate PR). |
I reckon including |
e021c40
to
8247d33
Compare
examples/simulation.rs
Outdated
.expect("instantiate QueueingHoneyBadger") | ||
.expect("instantiate QueueingHoneyBadger"); | ||
let our_id = *netinfo.our_id(); | ||
let peer_ids = netinfo.all_ids().filter(|&&them| them != our_id).cloned(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we collect these before creating dhb
, we don't need to clone the netinfo
at all.
Also, how is the era/epoch terminology defined now? |
Hydrabadger works fine with these changes fyi. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More documentation of era
, epoch
terminology needed.
src/honey_badger/batch.rs
Outdated
|
||
/// Returns the **next** `HoneyBadger` epoch after the sequential epoch of the batch. | ||
fn epoch(&self) -> u64 { | ||
self.epoch + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that very counterintuitive. Should there maybe be a next_epoch
method in SenderQueueableOutput
instead?
src/sender_queue/mod.rs
Outdated
fn is_obsolete(&self, them: <Self as Epoched>::Epoch) -> bool; | ||
} | ||
|
||
pub trait SenderQueueableOutput<N, M>: Epoched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add next_epoch
, we can probably remove the Epoched
trait bound here.
src/traits.rs
Outdated
Target::Node(id) => peer_epochs | ||
.get(&id) | ||
.map_or(false, |&them| message.is_accepted(them, max_future_epochs)), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel it would be easier to just iterate through all the messages in the big loop below, and populate passed_msgs
there, too. That would avoid allocating and then consuming failed_msgs
.
(Can also be refactored in another PR, of course.)
src/traits.rs
Outdated
.iter() | ||
.filter(|(_, them)| isnt_earlier_epoch(them)) | ||
.map(|(id, _)| id) | ||
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this ⬆️ would make the code simpler and avoid allocating the two sets.
If that is needed, I would rather add |
b1346be
to
11c185b
Compare
59af49b
to
4563df3
Compare
Rebased on top of master. The last commit adds an intermediate type between non-linear epochs |
4563df3
to
52cbde4
Compare
Fixes #226
This PR implements a reference spam control mechanism - the sender queue - by requiring all peers implement post-processing of outgoing messages as follows:
Peer epochs are monitored.
An outgoing message is sent to a peer if and when the sender reckons the peer is at an epoch that matches the epoch of the message. If the peer is already ahead, the message is discarded by the sender instead of being sent.
Target::All
messages may be split intoTarget::Node
messages for each peer if peers happen to be at different epochs, some of which match the epoch of the message while others don't.Incoming message queues have been removed. Message recipients discard obsolete messages and return a
Fault
for messages with epochs too far ahead.The sender queue is agnostic to the differences between validators and observers. All it cares about is what the peer IDs are and what is the ID of the node it is running on. I believe this is approximately in line with Option 2 of @afck's comment.
There are three implementations of a sender queue, one for each of
HoneyBadger
,DynamicHoneyBadger
andQueueingHoneyBadger
, although the last one uses theDynamicHoneyBadger
implementation, so those two are essentially the same.Comments welcome: I'm not sure whether
Epoched::Epoch
is fine to stayCopy
rather thanClone
. Maybe I should have implemented an optional storage for the computed epoch and makeEpoched::Epoch
to beClone
instead to avoid possible recomputation of the epoch of the same message multiple times.