
External sender queue implementation #308

Merged (5 commits into master, Nov 5, 2018)
Conversation

@vkomenda (Contributor) commented Oct 29, 2018

Fixes #226

This PR implements a reference spam-control mechanism, the sender queue, by requiring that all peers post-process outgoing messages as follows:

  • Peer epochs are monitored.

  • An outgoing message is sent to a peer if and when the sender reckons the peer is at an epoch that matches the epoch of the message. If the peer is already ahead, the message is discarded by the sender instead of being sent.

  • Target::All messages may be split into Target::Node messages for each peer if peers happen to be at different epochs, some of which match the epoch of the message while others don't.

Incoming message queues have been removed. Message recipients discard obsolete messages and return a Fault for messages with epochs too far ahead.
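
Roughly, as a self-contained toy sketch (plain u64 epochs and made-up types; the real implementation works over Epoched messages and the Target/TargetedMessage types, and the exact acceptance window may differ):

use std::collections::BTreeMap;

#[derive(Debug)]
enum Outcome {
    SendNow, // the peer can accept the message right away
    Defer,   // the peer is behind; hold the message until it catches up
    Discard, // the peer is already past this epoch; the message is obsolete
}

/// Decide what to do with one outgoing message for one peer, given the last
/// epoch that peer announced and the window of future epochs it accepts.
fn gate(msg_epoch: u64, peer_epoch: u64, max_future_epochs: u64) -> Outcome {
    if msg_epoch < peer_epoch {
        Outcome::Discard
    } else if msg_epoch <= peer_epoch + max_future_epochs {
        Outcome::SendNow
    } else {
        Outcome::Defer
    }
}

fn main() {
    // A `Target::All` message becomes a per-peer decision, so peers at
    // different epochs can receive (or not receive) it independently.
    let peer_epochs: BTreeMap<&str, u64> =
        vec![("a", 5), ("b", 4), ("c", 7)].into_iter().collect();
    for (id, &their_epoch) in &peer_epochs {
        println!("{}: {:?}", id, gate(5, their_epoch, 1));
    }
    // a: SendNow, b: SendNow, c: Discard
}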

The sender queue is agnostic to the differences between validators and observers. All it cares about are the peer IDs and the ID of the node it is running on. I believe this is approximately in line with Option 2 of @afck's comment.

There are three implementations of a sender queue, one for each of HoneyBadger, DynamicHoneyBadger and QueueingHoneyBadger, although the last one uses the DynamicHoneyBadger implementation, so those two are essentially the same.

Comments welcome: I'm not sure whether it is fine for Epoched::Epoch to stay Copy rather than Clone. Maybe I should have implemented an optional storage for the computed epoch and made Epoched::Epoch Clone instead, to avoid possibly recomputing the epoch of the same message multiple times.
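
For illustration, the "optional storage" idea could look roughly like this hypothetical memoisation wrapper (not code from this PR; it only shows the Copy-vs-Clone trade-off):

use std::cell::Cell;

// Hypothetical wrapper that caches a message's computed epoch so it is only
// computed once, at the cost of an extra field per message.
struct CachedEpoch<M> {
    message: M,
    epoch: Cell<Option<u64>>,
}

impl<M> CachedEpoch<M> {
    fn new(message: M) -> Self {
        CachedEpoch { message, epoch: Cell::new(None) }
    }

    /// Returns the cached epoch, computing it on first use.
    fn epoch(&self, compute: impl Fn(&M) -> u64) -> u64 {
        if let Some(e) = self.epoch.get() {
            return e;
        }
        let e = compute(&self.message);
        self.epoch.set(Some(e));
        e
    }
}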

@vkomenda requested review from mbr, c0gent and afck on October 29, 2018
@c0gent (Contributor) commented Oct 29, 2018

Vlad, could you provide a brief (but complete) outline of the changes you've made, just for a frame of reference? I haven't been following this topic closely, so it would be helpful for me, and it would also make it easier for us if we ever look back at this in the future. I usually like to put a description or list of changes in the commit message, but here in the PR is fine too.

@vkomenda (Contributor, Author) commented

@c0gent, I updated the description as you asked.

@c0gent (Contributor) commented Oct 29, 2018

Excellent description!

@afck (Collaborator) left a comment

This looks great! I think that's the right approach to the sender queue.
(Only halfway through the review so far. Will continue in the afternoon.)

I'm still a bit concerned about getting rid of the general "observer", and instead coupling that concept to the peer turning up inside an InProgress in a batch. I wonder whether there's something to be gained from keeping that concept more general, and allowing the user to just tell the sender queue if an observer has connected.
But maybe both can be combined anyway, and it's certainly something for a separate PR, if at all.

let dyn_hb = DynamicHoneyBadger::builder().build(netinfo);
QueueingHoneyBadger::builder(dyn_hb)
let dhb = DynamicHoneyBadger::builder().build(netinfo.clone());
let netinfo = Arc::new(netinfo);
@afck (Collaborator) commented:

Is this Arc needed? It seems like we don't even have to clone netinfo: We could collect our_id and all_ids before building dhb.

src/traits.rs:
.iter()
.filter(|(_, them)| isnt_earlier_epoch(them))
.map(|(id, _)| id)
.collect();
@afck (Collaborator) commented:

We could avoid creating these collections by just pushing the messages right away. Especially if remote_epochs contained all remote nodes, it would be a lot simpler:

for (id, them) in remote_epochs {
    if is_accepting_epoch(&message, them) {
        passed_msgs.push(Target::Node(id.clone()).message(message.clone()));
    } else if is_later_epoch(&message, them) {
        deferred_msgs.push((id.clone(), message.clone()));
    }
}

(Although I might be confused by all the negations. 😵)

@afck (Collaborator) commented:

I still think this ⬆️ would make the code simpler and avoid allocating the two sets.

@vkomenda (Contributor, Author) commented:

Good point about double negations. Let's keep them away if possible.

@vkomenda (Contributor, Author) commented:

There should only be one extra negation in your else if clause.

&& !node.instance().has_input()
&& node.instance().netinfo().is_validator()
&& !node.instance().algo().has_input()
&& node.instance().algo().netinfo().is_validator()
@afck (Collaborator) commented:

Maybe we should add these as convenience methods to SenderQueue (has_input only for SenderQueue<DHB> and SenderQueue<HB>, of course).
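
Something along these lines, presumably (a rough sketch; the generic bounds and the algo() accessor name are guesses, not the PR's actual API):

// Hypothetical convenience forwarders on the wrapper, so callers don't have
// to reach through the inner algorithm explicitly.
impl<C, N> SenderQueue<DynamicHoneyBadger<C, N>>
where
    C: Contribution, // bounds guessed for illustration
    N: NodeIdT,
{
    pub fn has_input(&self) -> bool {
        self.algo().has_input()
    }

    pub fn is_validator(&self) -> bool {
        self.algo().netinfo().is_validator()
    }
}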

@afck (Collaborator) commented Oct 31, 2018

I just realized that we probably need to include max_future_epochs in the JoinPlan now (or, alternatively, include more information about which epochs we accept in the EpochStarted message). Otherwise a node with max_future_epochs == 3 will send messages to one with max_future_epochs == 2 that the latter won't accept yet.

With this and #288, it would make sense to add a mechanism that ensures, on network startup and for joining nodes, that all nodes agree on those parameters (but probably in a separate PR).
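
A tiny toy example of the mismatch (hypothetical accepts helper, numbers made up):

/// A receiver accepts a message whose epoch is at most `max_future_epochs`
/// ahead of its own epoch; anything further ahead is rejected or faulted.
fn accepts(receiver_epoch: u64, msg_epoch: u64, max_future_epochs: u64) -> bool {
    msg_epoch >= receiver_epoch && msg_epoch <= receiver_epoch + max_future_epochs
}

fn main() {
    // A sender configured with `max_future_epochs == 3` may send an epoch-13
    // message to a peer at epoch 10 that only allows 2 future epochs:
    assert!(!accepts(10, 13, 2)); // rejected by the receiver
    assert!(accepts(10, 12, 2));  // within the receiver's window
}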

@vkomenda (Contributor, Author) commented:

I reckon including max_future_epochs in every EpochStarted message would be a bit chatty. But including it in the JoinPlan is doable. I can do it in this PR if you prefer.

@vkomenda force-pushed the vk-extern-sender-queue branch 10 times, most recently from e021c40 to 8247d33, on November 1, 2018
.expect("instantiate QueueingHoneyBadger")
.expect("instantiate QueueingHoneyBadger");
let our_id = *netinfo.our_id();
let peer_ids = netinfo.all_ids().filter(|&&them| them != our_id).cloned();
@afck (Collaborator) commented:

If we collect these before creating dhb, we don't need to clone the netinfo at all.
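
I.e. something like this sketch, based on the quoted code above (only the ordering changes):

// Read everything we need out of `netinfo` first, then let the builder take
// ownership, so neither `clone()` nor an `Arc` is required.
let our_id = *netinfo.our_id();
let peer_ids: Vec<_> = netinfo
    .all_ids()
    .filter(|&&them| them != our_id)
    .cloned()
    .collect();
let dhb = DynamicHoneyBadger::builder().build(netinfo); // moves `netinfo`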

@c0gent previously requested changes on Nov 1, 2018
@c0gent (Contributor) commented Nov 1, 2018

Also, how is the era/epoch terminology defined now?

@c0gent dismissed their stale review on November 1, 2018 16:29: "Unnecessary"

@c0gent (Contributor) commented Nov 1, 2018

Hydrabadger works fine with these changes, FYI. Batch::seqnum definitely needs to be renamed.

@c0gent (Contributor) left a comment

More documentation of the era/epoch terminology is needed.


/// Returns the **next** `HoneyBadger` epoch after the sequential epoch of the batch.
fn epoch(&self) -> u64 {
    self.epoch + 1
}
@afck (Collaborator) commented:

I find that very counterintuitive. Should there maybe be a next_epoch method in SenderQueueableOutput instead?

fn is_obsolete(&self, them: <Self as Epoched>::Epoch) -> bool;
}

pub trait SenderQueueableOutput<N, M>: Epoched
@afck (Collaborator) commented:

If we add next_epoch, we can probably remove the Epoched trait bound here.
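
i.e., roughly (hypothetical signature sketched from this discussion, not necessarily the final code):

pub trait SenderQueueableOutput<N, M> {
    /// The `HoneyBadger` epoch that follows this batch, i.e. the epoch the
    /// algorithm moves to after outputting it.
    fn next_epoch(&self) -> u64;
}

// For a batch, that would simply be:
//
//     fn next_epoch(&self) -> u64 {
//         self.epoch + 1
//     }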

src/traits.rs:
Target::Node(id) => peer_epochs
.get(&id)
.map_or(false, |&them| message.is_accepted(them, max_future_epochs)),
});
@afck (Collaborator) commented:

I still feel it would be easier to just iterate through all the messages in the big loop below, and populate passed_msgs there, too. That would avoid allocating and then consuming failed_msgs.
(Can also be refactored in another PR, of course.)

@afck (Collaborator) commented Nov 1, 2018

I won't be able to continue the review before Monday, but if @mbr and @c0gent approve, feel free to merge.

@vkomenda (Contributor, Author) commented Nov 1, 2018

If that is needed, I would rather add max_future_epochs to JoinPlan in a separate PR, so as not to overcrowd this one.

@vkomenda force-pushed the vk-extern-sender-queue branch from b1346be to 11c185b on November 1, 2018
@vkomenda force-pushed the vk-extern-sender-queue branch from 59af49b to 4563df3 on November 2, 2018
@vkomenda (Contributor, Author) commented Nov 2, 2018

Rebased on top of master.

The last commit adds an intermediate type between the non-linear epochs Epoched::Epoch and the linear epochs of DHB, which are u64. This intermediate type is Epoched::LinEpoch, for "linearizable epochs". It can be converted to u64 but not vice versa. Also, peer_epochs now uses this type, which makes it possible to avoid unreachable match cases.
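
As a rough picture of the hierarchy (a simplified sketch, not the exact trait in this PR; the method names and the Option return are assumptions):

/// `Epoch` may be non-linear (e.g. a pair of an era and a `HoneyBadger` epoch
/// in DHB), `LinEpoch` is its linearizable counterpart, and every `LinEpoch`
/// converts into a `u64` but not the other way around.
trait EpochedSketch {
    type Epoch: Copy;
    type LinEpoch: Copy + Into<u64>;

    /// The epoch of this message or algorithm instance.
    fn epoch(&self) -> Self::Epoch;

    /// The linearizable view of that epoch, if it has one.
    fn linearizable_epoch(&self) -> Option<Self::LinEpoch>;
}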

@vkomenda force-pushed the vk-extern-sender-queue branch from 4563df3 to 52cbde4 on November 5, 2018
@vkomenda merged commit 62cd29e into master on Nov 5, 2018
@vkomenda deleted the vk-extern-sender-queue branch on November 5, 2018
@afck mentioned this pull request on Nov 6, 2018