Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix deadlock raised in #11 #13

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

said-saifi
Copy link

@said-saifi said-saifi commented Apr 1, 2022

This fixes #11 .

Root cause:

  • When an error occurs we exit a worker by returning from startG function.
  • If the number of workers gCount reaches 0 while all other functions are already in the buffer then a deadlock will be reached since there is no chance a new worker will be created and the current number of workers is 0.

Fix: (Note: there are different ways of achieving this)

  • Before adding to the buffer check if there was already an error, in case there was then just exit and don't try to add to buffer.
  • Unit test added

@@ -40,8 +40,10 @@ type Group struct {

wg sync.WaitGroup

errOnce sync.Once
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since now we will be reading err continuously then we needed an RWMutex already.
We could keep the sync.Once but the same can be achieved with RWMutex that is already needed, so removed the sync.Once. Let me know if you think it's cleaner to keep both sync.Once and sync.RWMutex.

@@ -154,13 +156,28 @@ func (g *Group) Go(f func() error) {
return
}

g.qCh <- f
for {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this for loop we simply do the following:

  • Get a read lock on err
  • If err is NOT nil we return as there is no need to send function to buffer
  • If err is nil we try to send to channel, if we can't send to channel then we break the select and go back to checking if err is not nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only concern with this is it could become a busy loop. I wonder if sync.Cond can do something better here

@@ -204,16 +221,30 @@ func (g *Group) startG() {
return
}

if err := f(); err != nil {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nicer but as explained above we are already needing the RWMutex so this can be achieved by using the RWMutex instead of also introducing a new field sync.Once.

So let me know if you think its better to stay using sync.Once in addition to the RWMutex just for readability.

@said-saifi
Copy link
Author

said-saifi commented Apr 1, 2022

As stated here this can be tried out using

go mod edit -replace github.com/neilotoole/errgroup=github.com/alpacahq/errgroup@master

err error

// errMu protects err.
errMu sync.RWMutex
Copy link

@umitanuki umitanuki May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atomc.Value could also work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock with context
2 participants