feat(node): new voting reactor state machine #1136

Open
wants to merge 41 commits into
base: main

Conversation

Contributor

@cryptoAtwill cryptoAtwill commented Sep 18, 2024

This PR is the first of four PRs that implement the new topdown flow. The implementation is broken down as follows:

For this PR, the key changes are:

  • Use tokio's single-threaded reactor mode
  • Use an enum to represent the different states of the state machine
  • Introduce an operation mode state machine skeleton with Active and Paused operation modes
  • Use events emitted by the syncer (TopDownSyncEvent) to communicate with the state machine
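
A minimal sketch of how the enum-based modes and syncer events listed above might fit together. Only the names `TopDownSyncEvent`, `Active`, and `Paused` come from this PR; the event variants, `OperationMode`, `on_event`, and `run_until_closed` are hypothetical, and `std::sync::mpsc` stands in for tokio so the snippet has no dependencies.

```rust
use std::sync::mpsc;

/// Events the syncer emits. The name comes from the PR; the variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TopDownSyncEvent {
    Synced,
    Stopping,
}

/// One variant per operation mode (per-mode data is omitted in this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum OperationMode {
    Paused,
    Active,
}

impl OperationMode {
    /// Consume the current mode and return the next one for a given event.
    fn on_event(self, event: TopDownSyncEvent) -> Self {
        match (self, event) {
            (OperationMode::Paused, TopDownSyncEvent::Synced) => OperationMode::Active,
            (OperationMode::Active, TopDownSyncEvent::Stopping) => OperationMode::Paused,
            // Any other combination leaves the mode unchanged.
            (mode, _) => mode,
        }
    }
}

/// Fold every queued event into a mode on the calling thread.
fn run_until_closed(rx: mpsc::Receiver<TopDownSyncEvent>) -> OperationMode {
    let mut mode = OperationMode::Paused; // the process starts paused
    // `recv` returns Err once all senders are dropped and the queue is empty.
    while let Ok(event) = rx.recv() {
        mode = mode.on_event(event);
    }
    mode
}

/// Usage: the syncer side sends events, the reactor side consumes them.
fn demo() -> OperationMode {
    let (tx, rx) = mpsc::channel();
    tx.send(TopDownSyncEvent::Synced).unwrap();
    drop(tx);
    run_until_closed(rx)
}
```

Because the mode is taken by value and returned, each transition produces a fresh state and the previous one cannot be reused by accident.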

The state machine is as follows:

stateDiagram-v2
  [*] --> Paused : Process start
  Paused --> Active : Synced
  Active --> Recovery : Checkpoints quiet
  Recovery --> Active : Checkpoints observed
  Active --> Paused : Stopping
  Recovery --> Paused : Stopping
	
  state Recovery {
    [*] --> SoftRecovery
    SoftRecovery --> HardRecovery : Still no new topdown checkpoints
    SoftRecovery --> [*] : New checkpoints
    HardRecovery --> [*] : New checkpoints
  }
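
The diagram above can be read as a transition table. The sketch below is one possible encoding, not the code in this PR (which defers Soft and Hard recovery): `State`, `RecoveryStage`, `Trigger`, and `next` are hypothetical names, and it assumes that "Still no new topdown checkpoints" is signalled by a repeated quiet-checkpoints trigger.

```rust
/// Nested states of Recovery in the diagram.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RecoveryStage {
    Soft,
    Hard,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Paused,
    Active,
    Recovery(RecoveryStage),
}

/// Edge labels from the diagram (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Trigger {
    Synced,
    CheckpointsQuiet,
    CheckpointsObserved,
    Stopping,
}

/// Pure transition function: one match arm per edge in the diagram.
fn next(state: State, trigger: Trigger) -> State {
    match (state, trigger) {
        (State::Paused, Trigger::Synced) => State::Active,
        // Entering Recovery always starts in the soft stage.
        (State::Active, Trigger::CheckpointsQuiet) => State::Recovery(RecoveryStage::Soft),
        // Assumption: staying quiet escalates soft recovery to hard recovery.
        (State::Recovery(RecoveryStage::Soft), Trigger::CheckpointsQuiet) => {
            State::Recovery(RecoveryStage::Hard)
        }
        // New checkpoints leave Recovery from either stage.
        (State::Recovery(_), Trigger::CheckpointsObserved) => State::Active,
        (State::Active, Trigger::Stopping) | (State::Recovery(_), Trigger::Stopping) => {
            State::Paused
        }
        // Pairs without an edge in the diagram keep the current state.
        (unchanged, _) => unchanged,
    }
}
```

Keeping the transitions in a pure function makes each edge easy to unit test without running the reactor.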

@cryptoAtwill cryptoAtwill requested a review from a team as a code owner September 18, 2024 09:10
Base automatically changed from collateral-sourcing to main September 25, 2024 10:45
@raulk raulk added the topdown label Oct 14, 2024
/// }
/// TODO: Soft and Hard recovery mode to be added
pub enum OperationStateMachine {
    Paused(PausedOperationMode),
Contributor

I find the Paused naming misleading, because it only relates to the state of vote publishing (I assume), while other operations are still running. What about Sync/Syncing?

Contributor Author

Paused could be triggered by many other conditions, not just syncing. They are just not implemented yet.

Contributor

What I mean is that Paused isn't really idle, so this name is a bit confusing. I didn't refer to what triggers that mode, but to how it's best to describe it (given what it's doing).

fendermint/vm/topdown/src/vote/operation/mod.rs (outdated, resolved)
fendermint/vm/topdown/src/vote/mod.rs (resolved)
fendermint/vm/topdown/src/vote/mod.rs (outdated, resolved)
}
Err(mpsc::error::TryRecvError::Disconnected) => {
    tracing::warn!("voting reactor tx closed unexpected");
    break;
Contributor

This break only exits the batch loop, which is then re-entered. There is no shutdown or error-handling mechanism.

Contributor Author

If there is an error, we just log it or maybe raise an alert, but the loop should continue running. As for shutdown, I think we can add it to VoteReactorRequest; it should be pretty straightforward to do.
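
One way the shutdown idea discussed in this thread could look, as a sketch only. `VoteReactorRequest` is named in the thread, but its `Vote` and `Shutdown` variants, the `Exit` type, `MAX_BATCH`, and `run_reactor` are all hypothetical, and `std::sync::mpsc` stands in for the tokio channel used in the PR. A labeled `break` exits the outer reactor loop, whereas a plain `break` only ends the current batch.

```rust
use std::sync::mpsc::{self, TryRecvError};

/// Requests handled by the reactor. `Shutdown` is the hypothetical variant
/// suggested in the thread; `Vote` stands in for real work.
enum VoteReactorRequest {
    Vote(u64),
    Shutdown,
}

/// Why the reactor stopped.
#[derive(Debug, PartialEq)]
enum Exit {
    ShutdownRequested,
    SenderDropped,
}

/// Upper bound on requests handled per batch (illustrative value).
const MAX_BATCH: usize = 8;

fn run_reactor(rx: mpsc::Receiver<VoteReactorRequest>) -> (Vec<u64>, Exit) {
    let mut handled = Vec::new();
    let exit = 'reactor: loop {
        // Drain up to MAX_BATCH pending requests without blocking.
        for _ in 0..MAX_BATCH {
            match rx.try_recv() {
                Ok(VoteReactorRequest::Vote(height)) => handled.push(height),
                Ok(VoteReactorRequest::Shutdown) => break 'reactor Exit::ShutdownRequested,
                // Nothing queued right now: end this batch only.
                Err(TryRecvError::Empty) => break,
                // A plain `break` here would end the batch and then spin
                // forever on a closed channel, so leave the outer loop.
                Err(TryRecvError::Disconnected) => break 'reactor Exit::SenderDropped,
            }
        }
        // A real reactor would advance the state machine here.
        std::thread::yield_now();
    };
    (handled, exit)
}

/// Usage: queue two votes and a shutdown request, then run the reactor.
fn demo() -> (Vec<u64>, Exit) {
    let (tx, rx) = mpsc::channel();
    tx.send(VoteReactorRequest::Vote(1)).unwrap();
    tx.send(VoteReactorRequest::Vote(2)).unwrap();
    tx.send(VoteReactorRequest::Shutdown).unwrap();
    run_reactor(rx)
}
```

Returning an `Exit` value lets the caller decide whether a dropped sender should be logged, alerted on, or treated as a normal stop.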

fendermint/vm/topdown/src/vote/mod.rs (resolved)
fendermint/vm/topdown/src/vote/mod.rs (resolved)
pub fn step(self) -> Self {
    match self {
        OperationStateMachine::Paused(p) => p.advance(),
        OperationStateMachine::Active(p) => p.advance(),
Contributor

The rationale behind the sequential phases within a single advance(), for both modes, is unclear. Can you elaborate more on that?

Contributor Author

Sorry, I don't quite get the question.

Contributor

We can continue here: #1136 (comment)

Contributor Author

Updated accordingly.

fendermint/vm/topdown/src/vote/operation/mod.rs (outdated, resolved)
fendermint/vm/topdown/src/sync/mod.rs (resolved)
Labels
Projects
Status: Backlog
4 participants