Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full domain chain snap sync. #3115

Closed
wants to merge 31 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
64e54ef
Add peer consensus for the last confirmed domain block receipt.
shamil-gadelshin Sep 30, 2024
40758d0
Introduce snap sync orchestrator.
shamil-gadelshin Oct 9, 2024
b7801f6
Introduce consensus sync params struct.
shamil-gadelshin Sep 16, 2024
ce5f5f4
Add domain snap sync algorithm.
shamil-gadelshin Sep 13, 2024
b2b85f8
Update MMR-sync
shamil-gadelshin Oct 9, 2024
ea16060
Update consensus snap-sync
shamil-gadelshin Oct 9, 2024
b9832af
Modify domain networking stack.
shamil-gadelshin Oct 10, 2024
3969d74
Update domain service (domain snap-sync integration)
shamil-gadelshin Oct 9, 2024
4a1404c
Update domain operator (domain snap-sync integration)
shamil-gadelshin Oct 10, 2024
f9f87a6
Update relayer (domain snap-sync integration)
shamil-gadelshin Oct 9, 2024
0568ab9
Update subspace-service (domain snap-sync integration)
shamil-gadelshin Oct 9, 2024
a274f32
Update subspace-node (domain snap-sync integration)
shamil-gadelshin Oct 9, 2024
bdb738a
Update test dependencies (domain snap-sync integration)
shamil-gadelshin Oct 9, 2024
859c2cd
Update Cargo.lock
shamil-gadelshin Oct 9, 2024
bf89a41
Remove unnecessary dependencies.
shamil-gadelshin Oct 11, 2024
eb78e45
Refactor domain snap sync
shamil-gadelshin Oct 14, 2024
b8571cf
Move wait_for_block_import method.
shamil-gadelshin Oct 16, 2024
c5d4b0f
Remove unnecessary AuxStore field from ConsensusChainSyncParams
shamil-gadelshin Oct 16, 2024
9956983
Change waiting for blocks to block notification acknowledgement
shamil-gadelshin Oct 17, 2024
44499af
Modify SubspaceSyncOracle to add domain snap sync.
shamil-gadelshin Oct 18, 2024
60d45e9
Refactor SnapSyncOrchestrator
shamil-gadelshin Oct 22, 2024
8a035c0
Move domain block receipt protocol to domain network
shamil-gadelshin Oct 18, 2024
bf34b67
Update last confirmed domain block execution receipt protocol.
shamil-gadelshin Oct 24, 2024
ad6a1f1
Mark block import streams as Unpin
teor2345 Oct 28, 2024
6962b8f
Merge pull request #3179 from autonomys/full-domain-snap-sync-unpin
shamil-gadelshin Oct 28, 2024
74c4f7f
Refactor domain snap sync.
shamil-gadelshin Oct 28, 2024
1f78a74
Simplify error handling.
shamil-gadelshin Oct 28, 2024
0bed801
Simplify async start_worker_task.
shamil-gadelshin Oct 29, 2024
7179802
Refactor domain snap sync
shamil-gadelshin Oct 30, 2024
b6e9f84
Add log message on. failed domain snap sync.
shamil-gadelshin Oct 30, 2024
2267205
Refactor error messages.
shamil-gadelshin Oct 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions crates/sc-consensus-subspace/src/slot_worker.rs
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ where
{
force_authoring: bool,
pause_sync: Arc<AtomicBool>,
domain_sync_finished: Option<Arc<AtomicBool>>,
shamil-gadelshin marked this conversation as resolved.
Show resolved Hide resolved
inner: SO,
}

Expand All @@ -106,6 +107,11 @@ where
// (default state), it also accounts for DSN sync
(!self.force_authoring && self.inner.is_major_syncing())
|| self.pause_sync.load(Ordering::Acquire)
|| self
.domain_sync_finished
.as_ref()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is an unusually strong ordering, which might have a performance impact. Usually it is enough to use Acquire for loads and Release for stores.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll set it to the proposed values. However, I don't think performance suffers here because of the access pattern (write once in total, query once per second). Please, correct me if I'm wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the domain worker, SubspaceSyncOracle is also used in other places of the consensus chain like the archiver, PoT worker and RPC, etc, and if the domain sync is not finished it will change the return value of is_major_syncing and some behaviors of these workers, better to have more eyes on this to see if it is okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SeqCst ordering has an impact on read and write performance, because it imposes a total order across all threads:
https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html#variant.SeqCst

.map(|sync_finished| !sync_finished.load(Ordering::SeqCst))
.unwrap_or_default()
}

fn is_offline(&self) -> bool {
Expand All @@ -122,10 +128,12 @@ where
force_authoring: bool,
pause_sync: Arc<AtomicBool>,
substrate_sync_oracle: SO,
domain_sync_finished: Option<Arc<AtomicBool>>,
) -> Self {
Self {
force_authoring,
pause_sync,
domain_sync_finished,
inner: substrate_sync_oracle,
}
}
Expand Down
37 changes: 9 additions & 28 deletions crates/subspace-service/src/domains/snap_sync_orchestrator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
use crate::sync_from_dsn::snap_sync::{DefaultTargetBlockProvider, SnapSyncTargetBlockProvider};
use async_trait::async_trait;
use parking_lot::Mutex;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use subspace_core_primitives::BlockNumber;
use tokio::sync::Notify;
Expand All @@ -23,15 +24,15 @@ pub struct SnapSyncOrchestrator {
notify_consensus_snap_sync_unblocked: Notify,
consensus_snap_sync_block_number: Mutex<Option<BlockNumber>>,
notify_domain_snap_sync_finished: Notify,
domain_snap_sync_finished: Mutex<bool>,
domain_snap_sync_finished: Arc<AtomicBool>,
}

#[async_trait]
impl SnapSyncTargetBlockProvider for SnapSyncOrchestrator {
async fn target_block(&self) -> Option<BlockNumber> {
self.consensus_snap_sync_unblocked().await;
self.notify_consensus_snap_sync_unblocked.notified().await;
NingLin-P marked this conversation as resolved.
Show resolved Hide resolved

self.target_consensus_snap_sync_block_number()
*self.consensus_snap_sync_block_number.lock()
}
}

Expand All @@ -48,23 +49,10 @@ impl SnapSyncOrchestrator {
notify_consensus_snap_sync_unblocked: Notify::new(),
consensus_snap_sync_block_number: Mutex::new(None),
notify_domain_snap_sync_finished: Notify::new(),
domain_snap_sync_finished: Mutex::new(false),
domain_snap_sync_finished: Arc::new(AtomicBool::new(false)),
}
}

/// Returns optional target block for consensus chain snap sync. None means target block is
/// not defined yet.
pub fn target_consensus_snap_sync_block_number(&self) -> Option<BlockNumber> {
*self.consensus_snap_sync_block_number.lock()
}

/// Wait for the allowing signal for the consensus chain snap sync.
pub async fn consensus_snap_sync_unblocked(&self) {
debug!("Waiting for a signal to start consensus chain snap sync.");
self.notify_consensus_snap_sync_unblocked.notified().await;
debug!("Finished waiting for a signal to start consensus chain snap sync.");
}

/// Unblocks (allows) consensus chain snap sync with the given target block.
pub fn unblock_consensus_snap_sync(&self, target_block_number: BlockNumber) {
debug!(%target_block_number, "Allowed starting consensus chain snap sync.");
Expand All @@ -76,31 +64,24 @@ impl SnapSyncOrchestrator {
}

/// Returns true if domain snap sync finished.
pub fn domain_snap_sync_finished(&self) -> bool {
*self.domain_snap_sync_finished.lock()
pub fn domain_snap_sync_finished(&self) -> Arc<AtomicBool> {
self.domain_snap_sync_finished.clone()
}

/// Signal that domain snap sync finished.
pub fn mark_domain_snap_sync_finished(&self) {
debug!("Signal that domain snap sync finished.");
*self.domain_snap_sync_finished.lock() = true;
self.domain_snap_sync_finished.store(true, Ordering::SeqCst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my last comment, Release should be enough here:
https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html

Copy link
Contributor

@teor2345 teor2345 Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just un-resolving this because Ning wanted more people to review the code.


self.notify_domain_snap_sync_finished.notify_waiters();
}

/// Wait for a signal that domain snap sync finished.
pub async fn domain_snap_sync_finished_blocking(&self) {
debug!("Waiting for a signal that domain snap sync finished.");
self.notify_domain_snap_sync_finished.notified().await;
debug!("Finished waiting for a signal that domain snap sync finished.");
}

/// Unblock all processes (synchronization cancelled).
pub fn unblock_all(&self) {
debug!("Allow all processes (synchronization cancelled).");

self.notify_consensus_snap_sync_unblocked.notify_waiters();
self.notify_domain_snap_sync_finished.notify_waiters();
*self.domain_snap_sync_finished.lock() = true;
self.domain_snap_sync_finished.store(true, Ordering::SeqCst);
}
}
3 changes: 3 additions & 0 deletions crates/subspace-service/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1038,6 +1038,9 @@ where
config.base.force_authoring,
Arc::clone(&pause_sync),
sync_service.clone(),
snap_sync_orchestrator
.as_ref()
.map(|orchestrator| orchestrator.domain_snap_sync_finished()),
);

let subspace_archiver = tokio::task::block_in_place(|| {
Expand Down
12 changes: 0 additions & 12 deletions domains/client/relayer/src/worker.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,6 @@ use sp_mmr_primitives::MmrApi;
use sp_runtime::traits::{CheckedSub, NumberFor, One};
use sp_runtime::SaturatedConversion;
use std::sync::Arc;
use subspace_service::domains::snap_sync_orchestrator::SnapSyncOrchestrator;

pub async fn gossip_channel_updates<Client, Block, CBlock, SO>(
chain_id: ChainId,
Expand Down Expand Up @@ -137,7 +136,6 @@ pub async fn start_relaying_messages<CClient, Client, CBlock, Block, SO>(
confirmation_depth_k: NumberFor<CBlock>,
sync_oracle: SO,
gossip_message_sink: GossipMessageSink,
snap_sync_orchestrator: Option<Arc<SnapSyncOrchestrator>>,
) where
Block: BlockT,
CBlock: BlockT,
Expand Down Expand Up @@ -177,16 +175,6 @@ pub async fn start_relaying_messages<CClient, Client, CBlock, Block, SO>(
continue;
}

if let Some(ref snap_sync_orchestrator) = snap_sync_orchestrator {
if !snap_sync_orchestrator.domain_snap_sync_finished() {
tracing::debug!(
target: LOG_TARGET,
"Domain snap sync: skipping message processing..."
);
continue;
}
}

let Some(confirmed_block_number) = imported_block
.header
.number()
Expand Down
5 changes: 0 additions & 5 deletions domains/service/src/domain.rs
Original file line number Diff line number Diff line change
Expand Up @@ -506,10 +506,6 @@ where
// TODO: Implement when block tree is ready.
let domain_confirmation_depth = 256u32;

let snap_sync_orchestrator = consensus_chain_sync_params
.as_ref()
.map(|params| params.snap_sync_orchestrator.clone());

let operator = Operator::new(
Box::new(spawn_essential.clone()),
OperatorParams {
Expand Down Expand Up @@ -550,7 +546,6 @@ where
// since domain sync oracle will always return `synced` due to force sync being set.
consensus_network_sync_oracle.clone(),
gossip_message_sink.clone(),
snap_sync_orchestrator,
);

spawn_essential.spawn_essential_blocking("domain-relayer", None, Box::pin(relayer_worker));
Expand Down