Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: keep sync fp status #52

Merged
merged 15 commits into from
Sep 18, 2024
Merged

chore: keep sync fp status #52

merged 15 commits into from
Sep 18, 2024

Conversation

RafilxTenfen
Copy link
Contributor

@RafilxTenfen RafilxTenfen commented Sep 12, 2024

  • Removed previous MinutesToWaitForConsumer added in chore: add retry for consumer chain to become available #50 since it was not good enough to wait for the chain to be up, without starting the fpd server to receive fpd creation
  • Add new loop that verifies the FP status on chain and update it accordingly (before it was only one time at startup)
  • Add check for keyring key is valid at start of NewBabylonController to avoid panic at mustGetTxSigner

@RafilxTenfen RafilxTenfen self-assigned this Sep 12, 2024
@RafilxTenfen RafilxTenfen marked this pull request as ready for review September 13, 2024 01:06
Copy link
Member

@Lazar955 Lazar955 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

finality-provider/store/storedfp.go Outdated Show resolved Hide resolved
Copy link
Member

@gitferry gitferry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I was missing something but I don't understand the motivation of having a loop for state sync. What specific case does this PR wants to resolve?

clientcontroller/babylon.go Show resolved Hide resolved
Comment on lines 280 to 286
allowedErr := fmt.Sprintf("rpc error: code = Unknown desc = %s: unknown request", btcstakingtypes.ErrVotingPowerTableNotUpdated.Wrapf("height: %d", blockHeight).Error())
if strings.EqualFold(err.Error(), allowedErr) {
// if nothing was updated in the voting power table, it should consider as zero VP to start to send pub random
return 0, nil
}

return 0, fmt.Errorf("failed to query Finality Voting Power at Height: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to think we should not ignore this error because the error essentially means that there's no voting power for any fps yet. If we keep retrying, then eventually the error will be gone. If we ignore this error, we won't revisit the voting power for this height, which could lead to missing a block that the FP has voting power.

I think we should ensure that QueryFinalityProviderVotingPower would not be called before waitForActivation is finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

humm, this makes sense
But I had understood that the finality provider should send pub rand list even if he does not have voting power, only the finality signature should verify that it has voting power, before sending

Is my understanding correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Thing is that if the fp gets 0 from QueryFinalityProviderVotingPower, it will skip sending finality votes for this height for good. In this PR, QueryFinalityProviderVotingPower can return 0 due to the protocol is not activated. So it is possible that QueryFinalityProviderVotingPower will return non-zero value for this height after the protocol is activated. I was worried that the fp might miss such heights due to this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

humm, I think I understand
For finality signatures we still want to respect this error and avoid skipping it!
For send the pub rand list, we should still send it even if the btcstaking protocol is not activated yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified at a43b3f1, now only the portion that updates from Created to Register, allows the error ErrVotingPowerTableNotUpdated

@RafilxTenfen
Copy link
Contributor Author

Maybe I was missing something but I don't understand the motivation of having a loop for state sync. What specific case does this PR wants to resolve?

This PR wants to solve:

  • FPs created offline following the docs that will be included in the upgrade, do not pass to the normal process of finality provider registration where it uses the command fpd register-finality-provider, since they will already be on the chain state. So the finality provider must keep asking the chain "Hey, are you ready? If so, give me the registered finality providers and I will start the fpd instance"
  • Finality providers to be able to turn on and start the fpd even before the consumer chain is running, and when the consumer chain becomes available and ready, it starts sending the needed messages

@gitferry

Copy link
Member

@gitferry gitferry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • FPs created offline following the docs that will be included in the upgrade, do not pass to the normal process of finality provider registration where it uses the command fpd register-finality-provider, since they will already be on the chain state. So the finality provider must keep asking the chain "Hey, are you ready? If so, give me the registered finality providers and I will start the fpd instance"
  • Finality providers to be able to turn on and start the fpd even before the consumer chain is running, and when the consumer chain becomes available and ready, it starts sending the needed messages

I'm still not convinced that we should use a long-lived loop for the above. I think the only job of this loop is to wait for the status of the fp to be >= REGISTERED && < SLASHED. After the fp is started then the loop runs for nothing (we already have monitorStatusUpdate to periodically update fp status). Why can't we use a one-time waitForChainReady and waitForActivation before initiating the fp instance? Note that the poller starts after the protocol is activated, meaning that the very first block sent from the poller should already be activated.

clientcontroller/babylon.go Outdated Show resolved Hide resolved
@RafilxTenfen
Copy link
Contributor Author

  • FPs created offline following the docs that will be included in the upgrade, do not pass to the normal process of finality provider registration where it uses the command fpd register-finality-provider, since they will already be on the chain state. So the finality provider must keep asking the chain "Hey, are you ready? If so, give me the registered finality providers and I will start the fpd instance"
  • Finality providers to be able to turn on and start the fpd even before the consumer chain is running, and when the consumer chain becomes available and ready, it starts sending the needed messages

I'm still not convinced that we should use a long-lived loop for the above. I think the only job of this loop is to wait for the status of the fp to be >= REGISTERED && < SLASHED. After the fp is started then the loop runs for nothing (we already have monitorStatusUpdate to periodically update fp status). Why can't we use a one-time waitForChainReady and waitForActivation before initiating the fp instance? Note that the poller starts after the protocol is activated, meaning that the very first block sent from the poller should already be activated.

Without this loop syncChainFpStatusLoop imagine that the fpd starts by complete, which starts to listen for requests like (create and register FP). We call fpd create-finality-provider to store the new FP into the fpd database, we can't call fpd register-finality-provider, because the FPs inserted during the upgrade are already registered on chain, the fpd just doesn't know they are already registered (and will never know without checking the chain).
So, no FP instance will start, because they are not considered as REGISTERED

One thing is that I do agree, after the FP is marked as REGISTERED and the instance start, there is no use for this loop and it could be stopped and from #34 we are not going to support multiple fps, so I will modify that after the FP is considered registered, the loop syncChainFpStatusLoop is stopped.

What do you think? @gitferry

@gitferry
Copy link
Member

gitferry commented Sep 17, 2024

Yep, we need the loop to periodically update the fp status. After it reaches to a status that is ready to start the instance, we can stop the loop and start the instance

@RafilxTenfen
Copy link
Contributor Author

to sync the fp status from CREATED to REGISTERED without the user having to run the register command. My question is why is the previous implementation not enough to cover this case? See

The old implementation was waiting for the chain to be ready to update from CREATED to REGISTERED to start the rest of loops (but this also did not started the fp server and handler of requests). The FP server needed to be online to receive the call fpd create-finality-provider

@gitferry
Copy link
Member

The old implementation was waiting for the chain to be ready to update from CREATED to REGISTERED to start the rest of loops (but this also did not started the fp server and handler of requests). The FP server needed to be online to receive the call fpd create-finality-provider

sorry I edited my comment before your reply. Yeah, now I think I slowly get it. Thanks!

Copy link
Member

@gitferry gitferry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks for all the changes and explanation! Only one missing piece is comprehensive tests, especially on the transition scenario. This can be done in a separate PR

finality-provider/service/app.go Show resolved Hide resolved
finality-provider/service/app.go Show resolved Hide resolved
@RafilxTenfen
Copy link
Contributor Author

I will be adding tests in a next PR

@RafilxTenfen RafilxTenfen merged commit 7a5db87 into main Sep 18, 2024
8 checks passed
RafilxTenfen added a commit that referenced this pull request Sep 20, 2024
Add tests to created functions at #52
@gusin13
Copy link

gusin13 commented Sep 27, 2024

Hi guys found a particular edge case while working on consumer integration
#73

This is not a blocker for us as we temporarily increased the status sync interval in the consumer fpd.conf ref

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants