-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large reorg test #2497
base: master
Are you sure you want to change the base?
Large reorg test #2497
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is due to this insertionMutex being held here.
During the reorg:
- TransactionStreamer, holding insertionMutex lock, calls ExecutionEngine, which then adds old messages to a channel, so they can be resequenced asynchronously by ExecutionEngine. After that, and before releasing the lock, TransactionStreamer does more computations.
- Asynchronously, ExecutionEngine reads from this channel and calls TransactionStreamer, which expects that insertionMutex is free in order to succeed.
This issue is more likely to occur with larger reorgs, since step 1 will take more time to execute, which make step 2 more likely to start while step 1 is still executing.
Not sure if this issue is already tracked in Linear, but we can bring it up with the team.
We are temporarily lowering the number of iterations for this test until we can fix the underlying issues we discovered with it.
Fixes NIT-2686 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failing with path scheme :(
There are two issues that I am aware with it:
- path scheme will only be able to handle reorgs that want to revert to a block before than HEAD-128 if ancients storage is enabled. Our tests don't enable it today, here. We should be using OpenDatabaseWithFreezer for this test instead. Not sure if there is an issue with using freezer in all Nitro's system tests 🤔, if not it will be simpler to adjust this :)
- NewNodeBuilder(ctx).DefaultConfig currently sets gethexec.CachingConfig.StateHistory to 0. We should set to a different value for this test. As an example, if StateHistory is set to 2 then it will be possible to revert to HEAD-130 but not to HEAD-131.
But even if those issues are fixed right now, not sure if we should merge this PR without first solving the issue with re-sequencing old messages during a reorg.
Even TestReorgResequencing, that has a smaller reorg than the one that is being proposed in this PR, is flaky, sometimes failing with the same error that you added to this PR's description.
If the
expectedCounter
field for this test is increased to 10_000 then it will fail with the following error:We are tracking resolution of this issue separately and can increase the reorg size once it's fixed.