-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add domain snap sync algorithm #3027
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I don’t understand the snap sync algorithm enough to fully review it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates, I don't think I'm familiar enough with the algorithm to approve, but it looks reasonable to me.
} | ||
|
||
/// Wait for the allowing signal for the consensus chain snap sync. | ||
pub async fn consensus_snap_sync_unblocked(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SubspaceLink::block_importing_notification_stream()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIR, we discussed offline a possibility of changing the current algorithm with sync orchestrator from blocking to reactive approach by utilizing SubspaceLink::block_importing_notification_stream()
and its ability to acknowledge blocks. I tried to use that approach and deleted several initial synchronization points instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not deleting this one then? My point was that we ideally wouldn't need this orchestrator at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consensus chain snap sync is a part of the more complex domain snap sync process. In this form, it must start after we acquire the correct target block. I removed other blocking orchestrator points after our conversation: for example, we don't need to send signals from consensus snap sync anymore, however, it's not clear to me how to remove the dependency completely. Let's wait until the full solution is merged and return to this, I'm open for a change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just have a mutex or oneshot channel or something that is passed down to subspace-service
that prevents sync as such from starting? I don't think you need to block/unblock it many times anyway, just pause until something happens on domain side and it is not necessarily specific to Snap sync either.
What else is this orchestrator needed for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation will contain the target block provider that conceals the orchestrator in the full version:
pub trait SnapSyncTargetBlockProvider: Send + Sync {
async fn target_block(&self) -> Option<BlockNumber>;
}
The default non-blocking implementation returns None,
which is close to what you proposed. I tried to limit the scope of the PR, but it seems the whole solution will provide more context and will be easier to review despite its size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense in general, will take another look.
network_request, | ||
)?; | ||
|
||
let last_block_from_sync_result = sync_engine.download_state().await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be possible that the state is too large to fit into the memory and cause OOM, not an immediate issue to fix but we need to resolve it in the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an upstream issue for this: paritytech/polkadot-sdk#4
if last_confirmed_block_receipt.domain_block_number == 0u32.into() { | ||
return Err(sp_blockchain::Error::Application( | ||
"Can't snap sync from genesis.".into(), | ||
)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can happen if the domain is instantiated but has produced less than 14_400 blocks, especially when we enable the domain in the mainnet phase 2, the operator node will have to use full sync and sync from genesis.
Is it possible to avoid this error? by either downloading the genesis state from other peers or deriving the genesis state from the consensus chain (after consensus sync is finished) as the domain bootstrapper currently does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, it's not possible to insert the state after the initialization at the moment. However, we could try to investigate a hybrid "consensus-snap and domain-full" mode for such a case. Further research is required.
# Conflicts: # crates/subspace-service/src/sync_from_dsn.rs # domains/client/domain-operator/Cargo.toml
} | ||
|
||
/// Wait for the allowing signal for the consensus chain snap sync. | ||
pub async fn consensus_snap_sync_unblocked(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not deleting this one then? My point was that we ideally wouldn't need this orchestrator at all.
// Skip last `FINALIZATION_DEPTH_IN_SEGMENTS` archived segments | ||
.and_then(|max_segment_index| { | ||
max_segment_index.checked_sub(FINALIZATION_DEPTH_IN_SEGMENTS) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed that it is not actually needed to download older segment at FINALIZATION_DEPTH_IN_SEGMENTS
because it is equivalent to downloading the latest segment if responder can do a little bit of custom logic composing necessary data from technically not yet "finalized" from Substrate's point of view blocks. What happened to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my current task and I will issue a separate PR similar to other MMR-sync updates.,
} | ||
|
||
/// Wait for the allowing signal for the consensus chain snap sync. | ||
pub async fn consensus_snap_sync_unblocked(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just have a mutex or oneshot channel or something that is passed down to subspace-service
that prevents sync as such from starting? I don't think you need to block/unblock it many times anyway, just pause until something happens on domain side and it is not necessarily specific to Snap sync either.
What else is this orchestrator needed for?
Superseded by #3115 |
This PR introduces several important pieces for domain snap sync implementation (#3026): snap sync orchestrator and domain snap sync algorithm. This PR continues the discussion on the algorithm implementation highlighting the current decisions. It lacks the final integration in the code both at consensus and domain chain sides, proper configuration changes, and several security guarantees discussed previously.
The first commit introduces the
SnapSyncOrchestrator
- a synchronization manager that arranges correctly processes in both consensus and domain chains. 2-4 commits modify the existing code and add a structure to pass to the domain snap-sync algorithm introduced in the commit 5. The last (6) commit has an updatedCargo.lock
placed separately to simplify the review.Known future algorithm changes
FINALIZATION_DEPTH_IN_SEGMENTS
constant): modify MMR gadget to use archived blocks instead of finalized blocks.Code contributor checklist: