
retry mine keystone on failure #18

Merged: 18 commits into main on Mar 14, 2024
Conversation

ClaytonNorthey92 (Contributor) commented on Mar 1, 2024

Summary
We need a way to retry mining keystones if they fail; this PR provides one.

Fixes #7

Changes

  • keep buffer of 10 newest keystones known
  • upon receiving a keystone, only insert into buffer if it is newer than the oldest keystone
  • if buffer is full, drop oldest keystone upon inserting another
  • when a keystone is mined, mark it as "processed" to not repeat mining same keystone
  • generate fewer btc blocks in full network e2e test, since we have a retry strategy now
  • remove retry from full network e2e test
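The buffer behavior described in the bullets above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation; the type and field names (`keystoneBuffer`, `Keystone`, `L2BlockNumber`) are assumptions.

```go
package main

import "fmt"

// Keystone is a stand-in for hemi.L2Keystone, keyed by block number.
type Keystone struct {
	L2BlockNumber uint32
}

// keystoneBuffer keeps the N newest keystones, oldest-first.
type keystoneBuffer struct {
	capacity int
	items    []Keystone
}

// Push inserts ks only if the buffer has room or ks is newer than the
// oldest entry; when full, the oldest entry is dropped.
func (b *keystoneBuffer) Push(ks Keystone) {
	if len(b.items) >= b.capacity {
		if ks.L2BlockNumber <= b.items[0].L2BlockNumber {
			return // not newer than the oldest known keystone
		}
		b.items = b.items[1:] // drop oldest
	}
	b.items = append(b.items, ks)
	sortOldestFirst(b.items)
}

func sortOldestFirst(items []Keystone) {
	// insertion sort; fine for a 10-element buffer
	for i := 1; i < len(items); i++ {
		for j := i; j > 0 && items[j].L2BlockNumber < items[j-1].L2BlockNumber; j-- {
			items[j], items[j-1] = items[j-1], items[j]
		}
	}
}

func main() {
	b := &keystoneBuffer{capacity: 3}
	for _, n := range []uint32{5, 1, 9, 2, 7} {
		b.Push(Keystone{L2BlockNumber: n})
	}
	fmt.Println(b.items) // only the 3 newest remain: 5, 7, 9
}
```

The PR uses a capacity of 10; 3 is used here just to keep the example short.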

@ClaytonNorthey92 ClaytonNorthey92 force-pushed the clayton/popm-keystone-retry branch 3 times, most recently from 68f726b to 6bb1289 Compare March 1, 2024 12:56
@ClaytonNorthey92 ClaytonNorthey92 added type: enhancement This improves existing functionality size: XL This change is very large (+/- <1000) area: popm This is a change to popm (PoP Miner) labels Mar 1, 2024
@ClaytonNorthey92 ClaytonNorthey92 marked this pull request as ready for review March 1, 2024 13:05
service/popm/popm.go (review thread, resolved; outdated)
@ClaytonNorthey92 ClaytonNorthey92 changed the title retry keystone on failure retry mine keystone on failure Mar 1, 2024
// Push inserts an L2Keystone, dropping the oldest if full
func (r *L2KeystonePriorityBuffer) Push(val hemi.L2Keystone) {
r.mtx.Lock()
defer r.mtx.Unlock()
Contributor:
Do less while locked.
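One common way to act on this advice: derive the map key before taking the mutex, so the critical section is just the map write. This is a sketch with made-up types (`miner`, `keystone`) and a plain SHA-256 key, not the PR's actual hashing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

type keystone struct{ serialized []byte }

type miner struct {
	mtx       sync.Mutex
	keystones map[string]keystone
}

// insert keeps the critical section minimal: the key is computed
// before Lock, so only the map insert happens while locked.
func (m *miner) insert(ks keystone) {
	sum := sha256.Sum256(ks.serialized) // hashing done without the lock
	key := hex.EncodeToString(sum[:])

	m.mtx.Lock()
	m.keystones[key] = ks
	m.mtx.Unlock()
}

func main() {
	m := &miner{keystones: make(map[string]keystone)}
	m.insert(keystone{serialized: []byte("ks")})
	fmt.Println(len(m.keystones)) // 1
}
```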


// temporarily set this to false, so another goroutine doesn't
// try to process
v.requiresProcessing = false
Contributor:
This looks racy. You could have started traversing the map while the variable is not set yet. This is a logic race. Setting and unsetting the variable should really happen under the same lock.
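A sketch of what "setting and unsetting under the same lock" can look like: an item is claimed and released while holding the mutex, so no goroutine observes a half-updated flag. All names here are illustrative, not the PR's actual types.

```go
package main

import (
	"fmt"
	"sync"
)

type container struct {
	requiresProcessing bool
	value              int
}

type miner struct {
	mtx   sync.Mutex
	items map[string]*container
}

// claim picks one item that still requires processing and marks it
// claimed; the check and the flag flip happen under the same lock.
func (m *miner) claim() (string, *container, bool) {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	for k, c := range m.items {
		if c.requiresProcessing {
			c.requiresProcessing = false
			return k, c, true
		}
	}
	return "", nil, false
}

// release puts an item back so it can be retried, again under the lock.
func (m *miner) release(k string) {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	if c, ok := m.items[k]; ok {
		c.requiresProcessing = true
	}
}

func main() {
	m := &miner{items: map[string]*container{"ks": {requiresProcessing: true, value: 7}}}
	if k, c, ok := m.claim(); ok {
		fmt.Println("claimed", k, c.value)
		m.release(k) // e.g. mining failed: make it eligible for retry
	}
}
```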

key := hex.EncodeToString(mined[:])

err := cb(e)
r.mtx.Lock()
Contributor:
I didn't want to say this, but this is probably where you want to serialize using a channel. I may be missing some subtlety, so don't take this as gospel truth, but once you start seeing locking/unlocking in the same loop, it typically means the code should be rethought to prevent that from happening.
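The channel-serialization pattern the reviewer is hinting at, as a minimal sketch (made-up `keystone` type): one goroutine owns the work items, so no mutex is needed inside the loop.

```go
package main

import "fmt"

type keystone struct{ height int }

func main() {
	work := make(chan keystone, 10)
	done := make(chan struct{})

	go func() {
		defer close(done)
		// A single goroutine owns the keystones; no lock/unlock per
		// iteration, because nothing else touches them.
		for ks := range work {
			fmt.Println("mining keystone at height", ks.height)
		}
	}()

	for i := 1; i <= 3; i++ {
		work <- keystone{height: i}
	}
	close(work)
	<-done
}
```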

}
case <-m.mineNowCh:
go m.mineKnownKeystones(ctx)
case <-time.After(15 * time.Second):
Contributor:
This is a giveaway that a logic race may exist somewhere in this code.

@@ -12,6 +12,7 @@ import (
"fmt"
Contributor:
While I cannot put my finger on it, it seems not unlikely that there is a logic race/error somewhere in this code. I would create a test that has hundreds of goroutines attacking the pipeline and see what `go test -race` comes up with.

marcopeereboom (Contributor) left a review comment:
I may not be fully appreciating what is going on but let's spend a minute talking through this PR.

bfgCmdCh: make(chan bfgCmd, 10),
holdoffTimeout: 5 * time.Second,
requestTimeout: 5 * time.Second,
mineNowCh: make(chan struct{}, 1),
l2Keystones: make(map[string]L2KeystoneProcessingContainer),
Contributor:
These really should have an initial size

ClaytonNorthey92 (Author):
Yeah, I can do that; I can make it the size of the max keystones (10).

ClaytonNorthey92 (Author):
done 👍

func (m *Miner) mineKnownKeystones(ctx context.Context) {
keystonesFailed := false

m.ForEachL2Keystone(func(ks hemi.L2Keystone) error {
Contributor:
Why is this not a proper for loop?

ClaytonNorthey92 (Author):
I wanted ForEachL2Keystone to handle the locking and copying; I could pull it out and make it a for loop if you want.

ClaytonNorthey92 (Author):
changed it to a for loop 👍

func (m *Miner) l2KeystonesForProcessing() []hemi.L2Keystone {
m.mtx.Lock()

copies := []hemi.L2Keystone{}
Contributor:
This should be allocated with make outside of the mutex.

ClaytonNorthey92 (Author):
sure 👍
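Both suggestions combined look roughly like this: the slice is allocated with `make` (with capacity) before taking the mutex, and callers iterate over the returned copy with a plain for loop instead of a callback. Types here are illustrative stand-ins, not the PR's actual ones.

```go
package main

import (
	"fmt"
	"sync"
)

const maxKeystones = 10

type keystone struct{ height int }

type miner struct {
	mtx       sync.Mutex
	keystones map[string]keystone
}

// l2KeystonesForProcessing copies the keystones out under the mutex so
// callers can iterate without holding it.
func (m *miner) l2KeystonesForProcessing() []keystone {
	copies := make([]keystone, 0, maxKeystones) // allocated outside the lock

	m.mtx.Lock()
	for _, ks := range m.keystones {
		copies = append(copies, ks)
	}
	m.mtx.Unlock()

	return copies
}

func main() {
	m := &miner{keystones: map[string]keystone{"a": {1}, "b": {2}}}
	total := 0
	for _, ks := range m.l2KeystonesForProcessing() { // plain for loop, no lock held
		total += ks.height
	}
	fmt.Println(total) // 3
}
```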

case <-ctx.Done():
return
case <-time.After(l2KeystoneRetryTimeout):
go m.mineKnownKeystones(ctx)
Contributor:
You should not `go`-recurse calls. And can we get rid of the channel?

* keep buffer of 10 newest keystones known
* upon receiving a keystone, only insert into buffer if it is newer than the oldest keystone
* if buffer is full, drop oldest keystone upon inserting another
* when a keystone is mined, mark it as "processed" to not repeat mining same keystone
* generate fewer btc blocks in full e2e network test since we have a retry strategy
@ClaytonNorthey92 ClaytonNorthey92 force-pushed the clayton/popm-keystone-retry branch from c25ed44 to 4927945 Compare March 11, 2024 10:39
Further review threads on service/popm/popm.go and service/popm/popm_test.go were resolved.
@ClaytonNorthey92 ClaytonNorthey92 merged commit 8896259 into main Mar 14, 2024
1 check passed
@ClaytonNorthey92 ClaytonNorthey92 deleted the clayton/popm-keystone-retry branch March 14, 2024 13:48
web3cryptoguy pushed a commit to web3cryptoguy/heminetwork that referenced this pull request Nov 1, 2024
* retry keystone on failure

* keep buffer of 10 newest keystones known
* upon receiving a keystone, only insert into buffer if it is newer than the oldest keystone
* if buffer is full, drop oldest keystone upon inserting another
* when a keystone is mined, mark it as "processed" to not repeat mining same keystone
* generate fewer btc blocks in full e2e network test since we have a retry strategy

* use requiresProcessing so multiple goroutines don't process the same keystone at once

* just copy the keystone for processing, not the whole buffer element

* squash

* map attached to miner

* better wording

* don't retry l2 keystone broadcast in tests

* one retry per loop

* do not remove from map; mark as processed or processing

* use one flag

* remove foreach and set size

* update comment

* make and other timer

* ok check in test

* added test case

* fix start height since we generate fewer

* formatted declaration

* removed extra line
Successfully merging this pull request may close these issues.

Implement "retry" when mining L2 Keystones in PoP Miner