Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collation fetching fairness #4880

Merged
merged 158 commits into from
Dec 13, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
158 commits
Select commit Hold shift + click to select a range
f4738dc
Collation fetching fairness
tdimitrov Jun 26, 2024
c7074da
Comments
tdimitrov Jun 26, 2024
73eee87
Fix tests and add some logs
tdimitrov Jun 26, 2024
fa321ce
Fix per para limit calculation in `is_collations_limit_reached`
tdimitrov Jun 27, 2024
96392a5
Fix default `TestState` initialization: claim queue len should be equ…
tdimitrov Jun 27, 2024
0f28aa8
clippy
tdimitrov Jun 27, 2024
e5ea548
Update `is_collations_limit_reached` - remove seconded limit
tdimitrov Jun 28, 2024
9abc898
Fix pending fetches and more tests
tdimitrov Jul 1, 2024
c07890b
Remove unnecessary clone
tdimitrov Jul 1, 2024
e50440e
Comments
tdimitrov Jul 1, 2024
42b05c7
Better var names
tdimitrov Jul 1, 2024
2f5a466
Fix `pick_a_collation_to_fetch` and add more tests
tdimitrov Jul 2, 2024
ff96ef9
Fix test: `collation_fetching_respects_claim_queue`
tdimitrov Jul 2, 2024
e837689
Add `collation_fetching_fallback_works` test + comments
tdimitrov Jul 2, 2024
91cdd13
More tests
tdimitrov Jul 3, 2024
9f2d59b
Fix collation limit fallback
tdimitrov Jul 3, 2024
a10c86d
Separate `claim_queue_support` from `ProspectiveParachainsMode`
tdimitrov Jul 3, 2024
b39858a
Fix comments and add logs
tdimitrov Jul 3, 2024
b30f340
Update test: `collation_fetching_prefer_entries_earlier_in_claim_queue`
tdimitrov Jul 3, 2024
c0f18b9
Fix `pick_a_collation_to_fetch` and more tests
tdimitrov Jul 3, 2024
703ed6d
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Jul 3, 2024
fba7ca6
Fix `pick_a_collation_to_fetch` - iter 1
tdimitrov Jul 4, 2024
d4f4ce2
Fix `pick_a_collation_to_fetch` - iter 2
tdimitrov Jul 4, 2024
5f52712
Remove a redundant runtime version check
tdimitrov Jul 4, 2024
6c73e24
formatting and comments
tdimitrov Jul 4, 2024
752f3cc
pr doc
tdimitrov Jul 4, 2024
f0069f1
add license
tdimitrov Jul 4, 2024
6b9f0b3
clippy
tdimitrov Jul 4, 2024
5f6dcdd
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Jul 4, 2024
b8c1b85
Update prdoc/pr_4880.prdoc
tdimitrov Jul 5, 2024
f26362f
Limit collations based on seconded count instead of number of fetches
tdimitrov Jul 7, 2024
d6857fc
Undo rename: is_seconded_limit_reached
tdimitrov Jul 7, 2024
cde28cd
fix collation tests
tdimitrov Jul 8, 2024
4c3db2a
`collations_fetching_respects_seconded_limit` test
tdimitrov Jul 8, 2024
b2bbdfe
nits
tdimitrov Jul 8, 2024
e220cb4
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Aug 26, 2024
01d121e
Remove duplicated dependency after merge
tdimitrov Aug 26, 2024
7b3c002
Remove `ProspectiveParachainsMode` from collator-protocol, validator-…
tdimitrov Jul 10, 2024
5dffdde
Fix compilation errors in collation tests
tdimitrov Jul 10, 2024
1c1744b
`is_seconded_limit_reached` uses the whole view
tdimitrov Jul 11, 2024
aaccab1
Fix `is_seconded_limit_reached` check
tdimitrov Aug 30, 2024
b1df2e3
Trace logs useful for debugging tests
tdimitrov Sep 11, 2024
ce3a95e
Handle unconnected candidates
tdimitrov Sep 11, 2024
fe3c09d
Rework pre-prospective parachains tests to work with claim queue
tdimitrov Sep 11, 2024
b9ab579
Fix `collation_fetches_without_claimqueue`
tdimitrov Sep 12, 2024
fe623bc
Test - `collation_fetching_prefer_entries_earlier_in_claim_queue`
tdimitrov Sep 13, 2024
d216689
Remove collations test file - all tests are moved in prospective_para…
tdimitrov Sep 13, 2024
ea99c7a
fixup - collation_fetching_prefer_entries_earlier_in_claim_queue
tdimitrov Sep 16, 2024
ee155f5
New test - `collation_fetching_considers_advertisements_from_the_whol…
tdimitrov Sep 16, 2024
55b7902
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Sep 17, 2024
515a784
Update PRdoc and comments
tdimitrov Sep 17, 2024
4ef6919
Combine `seconded_per_para` and `claims_per_para` from collations in …
tdimitrov Sep 17, 2024
bd7174f
No need to handle missing claim queue anymore
tdimitrov Sep 17, 2024
df6165e
Remove dead code and fix some comments
tdimitrov Sep 17, 2024
4c5c271
Remove `is_seconded_limit_reached` and use the code directly due to t…
tdimitrov Sep 17, 2024
b0e4627
Fix comments
tdimitrov Sep 17, 2024
d1cf41d
`pending_for_para` -> `is_pending_for_para`
tdimitrov Sep 17, 2024
df3a215
Fix `0011-async-backing-6-seconds-rate.toml` - set `lookahead` to 3 o…
tdimitrov Sep 19, 2024
b70807b
Set `lookahead` in polkadot/zombienet_tests/elastic_scaling/0002-elas…
tdimitrov Sep 20, 2024
f047036
paras_now -> assigned_paras
tdimitrov Sep 30, 2024
94e4fc3
Remove a duplicated parameter in `update_view`
tdimitrov Sep 30, 2024
386488b
Remove an outdated comment
tdimitrov Sep 30, 2024
ff312c9
Fix `seconded_and_pending_for_para_in_view`
tdimitrov Oct 2, 2024
88d0307
`claim_queue_state` becomes `unfulfilled_claim_queue_entries` - the b…
tdimitrov Oct 2, 2024
af78352
For consistency use `chain_ids` only from `test_state`
tdimitrov Oct 2, 2024
d636091
Limit the number of advertisements accepted by each peer for spam pro…
tdimitrov Oct 3, 2024
2bb82eb
Zombienet test
tdimitrov Oct 7, 2024
c782058
Rearrange imports
tdimitrov Oct 7, 2024
903f7f4
Newline and outdated comment
tdimitrov Oct 7, 2024
cefbce8
Undo `lookahead = 3` in zombienet tests
tdimitrov Oct 14, 2024
cb69361
Consider what's scheduled on the core when determining assignments
tdimitrov Oct 14, 2024
1142a90
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 14, 2024
4438349
Fix a clippy warning
tdimitrov Oct 14, 2024
e82c386
Update PRdoc
tdimitrov Oct 15, 2024
4b2d4c5
Apply suggestions from code review
tdimitrov Oct 15, 2024
1c91371
".git/.scripts/commands/fmt/fmt.sh"
Oct 15, 2024
be34132
Code review feedback
tdimitrov Oct 15, 2024
5c7b2ac
Fix a typo in prdoc
tdimitrov Oct 15, 2024
62c6473
`seconded_and_pending_for_para_in_view` looks up to the len of the cl…
tdimitrov Oct 16, 2024
9e3f62d
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 16, 2024
a4bc21f
rerun CI
tdimitrov Oct 16, 2024
6c103df
Fix zombienet test
tdimitrov Oct 18, 2024
15e3a74
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 18, 2024
d6b35ca
Relax expected block counts for each para
tdimitrov Oct 18, 2024
586b56b
Bump lookahead and decrease timeout
tdimitrov Oct 18, 2024
a04d480
Fix ZN pipeline - try 1
tdimitrov Oct 18, 2024
13d5d15
Fix ZN pipeline - try 2
tdimitrov Oct 18, 2024
86870d0
Fix ZN pipeline - try 3
tdimitrov Oct 18, 2024
7b822af
Fix ZN pipeline - try 4
tdimitrov Oct 18, 2024
558c82e
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 21, 2024
06c0fd0
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 24, 2024
ade7f9b
Rename ZN test
tdimitrov Oct 24, 2024
8ba2a80
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Oct 28, 2024
ab70567
Handle merge conflicts
tdimitrov Oct 28, 2024
d24fdc1
When counting occupied slots from the claim queue consider relay pare…
tdimitrov Nov 6, 2024
f55390e
Add a test
tdimitrov Nov 7, 2024
a2093ee
Small style fixes in tests
tdimitrov Nov 7, 2024
ded6fb5
Fix a todo
tdimitrov Nov 7, 2024
cda9330
Fix `paths_to_relay_parent`
tdimitrov Nov 7, 2024
505eb24
Additional test for `paths_to_relay_parent`
tdimitrov Nov 8, 2024
94f573a
Simplifications
tdimitrov Nov 8, 2024
fa82404
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Nov 8, 2024
55e7fb2
Resolve merge conflicts
tdimitrov Nov 8, 2024
e27ddd4
Fix todos
tdimitrov Nov 8, 2024
a10c0c1
Comment
tdimitrov Nov 8, 2024
ee11c6a
Remove unneeded log line
tdimitrov Nov 8, 2024
f8e3f52
Update ZN tests and add some additional logs
tdimitrov Nov 12, 2024
1c0f93a
Remove `CollationStatus::WaitingOnValidation` and fix pending collati…
tdimitrov Nov 12, 2024
67321cb
Use slot based collator for 0018-shared-core-idle-parachain
tdimitrov Nov 12, 2024
40cbf2f
Use slot based collator for 0019-coretime-collation-fetching fairness
tdimitrov Nov 12, 2024
b30c037
Remove log line assert from coretime-collation-fetching-fairness test
tdimitrov Nov 12, 2024
9fbed03
Update coretime-collation-fetching-fairness ZN test to simulate a col…
tdimitrov Nov 14, 2024
b032c66
Remove `outer_leaves` from backing implicit view
tdimitrov Nov 14, 2024
eb0ca12
Restore `WaitingOnValidation` and `back_to_waiting`
tdimitrov Nov 14, 2024
00fafd4
Update polkadot/node/network/collator-protocol/src/validator_side/mod.rs
tdimitrov Nov 15, 2024
61e3a80
Don't fetch availability cores
tdimitrov Nov 15, 2024
5b741d2
Fix a log message and remove 'outer' from comments/variable names.
tdimitrov Nov 15, 2024
56eac5d
Small refactor in `seconded_and_pending_for_para_below` and `seconded…
tdimitrov Nov 15, 2024
9607e1c
Comments + test name
tdimitrov Nov 15, 2024
02ed2ae
Remove `activated` from `update_view` in tests
tdimitrov Nov 15, 2024
fe7f944
Remove `cores` from `TestState`
tdimitrov Nov 15, 2024
a31bf6f
Fix `ensure_seconding_limit_is_respected`
tdimitrov Nov 17, 2024
d251a14
newline
tdimitrov Nov 18, 2024
7c39070
clippy
tdimitrov Nov 18, 2024
5c3e52a
fmt
tdimitrov Nov 18, 2024
2c9721e
".git/.scripts/commands/fmt/fmt.sh"
Nov 18, 2024
46b4d9f
Fix PRdoc
tdimitrov Nov 19, 2024
bf927d2
Remove `seconded_and_pending_for_para_below` and fix `seconded_and_pe…
tdimitrov Nov 20, 2024
d05777b
Update comment
tdimitrov Nov 20, 2024
f833783
new test - `claims_above_are_counted_correctly`
tdimitrov Nov 20, 2024
7c807e9
more tests
tdimitrov Nov 20, 2024
8f33ba0
comment
tdimitrov Nov 21, 2024
1c9db10
Fix path iteration in `seconded_and_pending_for_para_above`
tdimitrov Nov 22, 2024
e6947b9
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Nov 22, 2024
439291a
Remove a debug println
tdimitrov Nov 25, 2024
3f7691a
Refactor claims counting: project claim queue state on top of relay p…
tdimitrov Nov 22, 2024
b1de6ba
Remove unused code
tdimitrov Nov 25, 2024
48d3f5c
Trace logs in claim_queue_state
tdimitrov Nov 25, 2024
2e0d142
Fix path generation in `ensure_seconding_limit_is_respected`
tdimitrov Nov 25, 2024
33e5c9f
Rework `unfulfilled_claim_queue_entries`
tdimitrov Nov 26, 2024
5bc63de
`paths_from_leaves_via_relay_parent`
tdimitrov Nov 26, 2024
1a195c1
Fix `fetch_next_collation_on_invalid_collation`
tdimitrov Nov 26, 2024
ac3e1a1
`paths_via_relay_parent` uses `block_info`
tdimitrov Nov 26, 2024
31c6cdb
Fix tests
tdimitrov Nov 26, 2024
3794e69
Comments
tdimitrov Nov 26, 2024
3e2acd4
File header and tests for edge cases in claim_queue_state
tdimitrov Nov 26, 2024
583f469
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Nov 27, 2024
eac7a73
clippy
tdimitrov Nov 27, 2024
3bd85d9
Apply suggestions from code review
tdimitrov Nov 27, 2024
4b2a67d
comment
tdimitrov Nov 28, 2024
f3634e1
Small refactoring at claim_queue_state
tdimitrov Nov 28, 2024
6dd55bf
Apply suggestions from code review
tdimitrov Dec 2, 2024
14177ad
Fix comments and remove an unneeded error type
tdimitrov Dec 2, 2024
6d9d8b8
More tests + a small bug fix
tdimitrov Dec 2, 2024
0133a2f
Update PRdoc
tdimitrov Dec 2, 2024
33eeb6d
Comments
tdimitrov Dec 11, 2024
8014eba
Accept an advertisement if there is a free spot for it at at least on…
tdimitrov Dec 11, 2024
2e176b9
Merge branch 'master' into tsv-collator-proto-fairness
tdimitrov Dec 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
173 changes: 150 additions & 23 deletions polkadot/node/network/collator-protocol/src/validator_side/collation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,12 @@
//! ┌──────────────────────────────────────────┐
//! └─▶Advertised ─▶ Pending ─▶ Fetched ─▶ Validated
tdimitrov marked this conversation as resolved.
Show resolved Hide resolved

use std::{collections::VecDeque, future::Future, pin::Pin, task::Poll};
use std::{
collections::{BTreeMap, VecDeque},
future::Future,
pin::Pin,
task::Poll,
};

use futures::{future::BoxFuture, FutureExt};
use polkadot_node_network_protocol::{
Expand All @@ -48,6 +53,8 @@ use tokio_util::sync::CancellationToken;

use crate::{error::SecondingError, LOG_TARGET};

use super::GroupAssignments;

/// Candidate supplied with a para head it's built on top of.
#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq)]
pub struct ProspectiveCandidate {
Expand Down Expand Up @@ -216,27 +223,58 @@ impl CollationStatus {
}

/// Information about collations per relay parent.
#[derive(Default)]
pub struct Collations {
/// What is the current status in regards to a collation for this relay parent?
pub status: CollationStatus,
/// Collator we're fetching from, optionally which candidate was requested.
///
/// This is the currently last started fetch, which did not exceed `MAX_UNSHARED_DOWNLOAD_TIME`
/// yet.
/// This is the last fetch for the relay parent. The value is used in
/// `get_next_collation_to_fetch` (called from `dequeue_next_collation_and_fetch`) to determine
/// if the last fetched collation is the same as the one which just finished. If yes - another
/// collation should be fetched. If not - another fetch was already initiated and
/// `get_next_collation_to_fetch` will do nothing.
///
/// For the reasons above this value is not set to `None` when the fetch is done! Don't use it
/// to check if there is a pending fetch.
tdimitrov marked this conversation as resolved.
Show resolved Hide resolved
pub fetching_from: Option<(CollatorId, Option<CandidateHash>)>,
alindima marked this conversation as resolved.
Show resolved Hide resolved
/// Collation that were advertised to us, but we did not yet fetch.
pub waiting_queue: VecDeque<(PendingCollation, CollatorId)>,
/// Collation that were advertised to us, but we did not yet fetch. Grouped by `ParaId`.
tdimitrov marked this conversation as resolved.
Show resolved Hide resolved
waiting_queue: BTreeMap<ParaId, VecDeque<(PendingCollation, CollatorId)>>,
/// How many collations have been seconded.
pub seconded_count: usize,
/// What collations were fetched so far for this relay parent.
fetched_per_para: BTreeMap<ParaId, usize>,
// Claims per `ParaId` for the assigned core at the relay parent. This information is obtained
// from the claim queue.
claims_per_para: BTreeMap<ParaId, usize>,
}

impl Collations {
pub(super) fn new(assignments: &Vec<ParaId>) -> Self {
let mut claims_per_para = BTreeMap::new();
for para_id in assignments {
*claims_per_para.entry(*para_id).or_default() += 1;
}

Self {
status: Default::default(),
fetching_from: None,
waiting_queue: Default::default(),
seconded_count: 0,
fetched_per_para: Default::default(),
claims_per_para,
}
}

/// Note a seconded collation for a given para.
pub(super) fn note_seconded(&mut self) {
self.seconded_count += 1
}

// Note a collation which has been successfully fetched.
pub(super) fn note_fetched(&mut self, para_id: ParaId) {
*self.fetched_per_para.entry(para_id).or_default() += 1
}

/// Returns the next collation to fetch from the `waiting_queue`.
///
/// This will reset the status back to `Waiting` using [`CollationStatus::back_to_waiting`].
Expand All @@ -247,6 +285,7 @@ impl Collations {
&mut self,
finished_one: &(CollatorId, Option<CandidateHash>),
relay_parent_mode: ProspectiveParachainsMode,
assignments: &GroupAssignments,
) -> Option<(PendingCollation, CollatorId)> {
// If finished one does not match waiting_collation, then we already dequeued another fetch
// to replace it.
Expand All @@ -269,31 +308,119 @@ impl Collations {
match self.status {
// We don't need to fetch any other collation when we already have seconded one.
CollationStatus::Seconded => None,
tdimitrov marked this conversation as resolved.
Show resolved Hide resolved
CollationStatus::Waiting =>
if self.is_seconded_limit_reached(relay_parent_mode) {
None
} else {
self.waiting_queue.pop_front()
},
CollationStatus::Waiting => self.pick_a_collation_to_fetch(&assignments.current),
CollationStatus::WaitingOnValidation | CollationStatus::Fetching =>
unreachable!("We have reset the status above!"),
}
}

/// Checks the limit of seconded candidates.
pub(super) fn is_seconded_limit_reached(
/// Checks if another collation can be accepted. The number of collations that can be fetched
/// per parachain is limited by the entries in the claim queue for the `ParaId` in question.
///
/// If prospective parachains mode is not enabled then we fall back to synchronous backing. In
/// this case there is a limit of 1 collation per relay parent.
pub(super) fn is_collations_limit_reached(
tdimitrov marked this conversation as resolved.
Show resolved Hide resolved
&self,
relay_parent_mode: ProspectiveParachainsMode,
para_id: ParaId,
num_pending_fetches: usize,
) -> bool {
let seconded_limit =
if let ProspectiveParachainsMode::Enabled { max_candidate_depth, .. } =
relay_parent_mode
{
max_candidate_depth + 1
} else {
1
};
self.seconded_count >= seconded_limit
if let ProspectiveParachainsMode::Disabled = relay_parent_mode {
// fallback to synchronous backing
return self.seconded_count >= 1
}

// Successful fetches + a pending fetch < claim queue entries for `para_id`
let respected_per_para_limit =
self.claims_per_para.get(&para_id).copied().unwrap_or_default() >
self.fetched_per_para.get(&para_id).copied().unwrap_or_default() +
num_pending_fetches;

gum::trace!(
target: LOG_TARGET,
?para_id,
claims_per_para=?self.claims_per_para,
fetched_per_para=?self.fetched_per_para,
?num_pending_fetches,
?respected_per_para_limit,
"is_collations_limit_reached"
);

!respected_per_para_limit
}

/// Adds a new collation to the waiting queue for the relay parent. This function doesn't
/// perform any limits check. The caller (`enqueue_collation`) should assure that the collation
tdimitrov marked this conversation as resolved.
Show resolved Hide resolved
/// can be enqueued.
pub(super) fn add_to_waiting_queue(&mut self, collation: (PendingCollation, CollatorId)) {
self.waiting_queue.entry(collation.0.para_id).or_default().push_back(collation);
}

/// Picks a collation to fetch from the waiting queue.
/// When fetching collations we need to ensure that each parachain has got a fair core time
/// share depending on its assignments in the claim queue. This means that the number of
/// collations fetched per parachain should ideally be equal to (but not bigger than) the number
/// of claims for the particular parachain in the claim queue.
///
/// To achieve this each parachain with at an entry in the `waiting_queue` has got a score
/// calculated by dividing the number of fetched collations by the number of entries in the
/// claim queue. Lower the score means higher fetching priority. Note that if a parachain hasn't
/// got anything fetched at this relay parent it will have score 0 which means highest priority.
///
/// If two parachains has got the same score the one which is earlier in the claim queue will be
/// picked.
fn pick_a_collation_to_fetch(
&mut self,
claims: &Vec<ParaId>,
) -> Option<(PendingCollation, CollatorId)> {
gum::trace!(
target: LOG_TARGET,
waiting_queue = ?self.waiting_queue,
claim_queue = ?claims,
"Pick a collation to fetch."
);

// Find the parachain(s) with the lowest score.
let mut lowest_score = None;
for (para_id, collations) in &mut self.waiting_queue {
let para_score = self
.fetched_per_para
.get(para_id)
.copied()
.unwrap_or_default()
.saturating_div(self.claims_per_para.get(para_id).copied().unwrap_or_default());

// skip empty queues
if collations.is_empty() {
continue
}

match lowest_score {
Some((score, _)) if para_score < score =>
lowest_score = Some((para_score, vec![(para_id, collations)])),
Some((_, ref mut paras)) => {
paras.push((para_id, collations));
},
None => lowest_score = Some((para_score, vec![(para_id, collations)])),
}
}

if let Some((_, mut lowest_score)) = lowest_score {
for claim in claims {
if let Some((_, collations)) = lowest_score.iter_mut().find(|(id, _)| *id == claim)
{
match collations.pop_front() {
Some(collation) => return Some(collation),
None => {
unreachable!("Collation can't be empty!")
},
}
}
}
unreachable!("All entries in waiting_queue should also be in claim queue")
} else {
None
}
tdimitrov marked this conversation as resolved.
Show resolved Hide resolved
}
}

Expand Down
Loading
Loading