fix: resolve data race in Broadcasters system #153

Merged: 6 commits into v7 from cw/sc-244791/broadcast-data-race on Jun 4, 2024

Conversation


@cwaldren-ld (Contributor) commented May 29, 2024

This resolves a data race in the Broadcaster implementation; specifically, `HasListeners()` was not protected by a lock.

Additionally, it swaps out the mutex for an `RWMutex` so that concurrent readers don't block each other.

A behavioral change is that the Broadcast method now locks the subscribers array, so that `AddListener` or `RemoveListener` cannot be called concurrently with it.
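
For illustration, here is a minimal sketch of the locking scheme described above, assuming a generic broadcaster that keeps a slice of buffered channels. The type and field names (`Broadcaster`, `subscribers`, the buffer size) are placeholders, not the SDK's actual internals:

```go
package broadcast

import "sync"

// Broadcaster fans a value out to every subscriber channel.
type Broadcaster[V any] struct {
	lock        sync.RWMutex
	subscribers []chan V
}

// AddListener takes the write lock because it mutates the subscriber slice.
func (b *Broadcaster[V]) AddListener() <-chan V {
	b.lock.Lock()
	defer b.lock.Unlock()
	ch := make(chan V, 10)
	b.subscribers = append(b.subscribers, ch)
	return ch
}

// HasListeners only reads, so the read lock is enough; this is the read
// that previously raced with AddListener/RemoveListener.
func (b *Broadcaster[V]) HasListeners() bool {
	b.lock.RLock()
	defer b.lock.RUnlock()
	return len(b.subscribers) > 0
}

// Broadcast holds the read lock for the whole send, so subscriber channels
// cannot be closed out from under it; the trade-off is that AddListener,
// RemoveListener, and Close block until every send has completed.
func (b *Broadcaster[V]) Broadcast(value V) {
	b.lock.RLock()
	defer b.lock.RUnlock()
	for _, ch := range b.subscribers {
		ch <- value
	}
}

// Close takes the write lock, closes every subscriber channel, and clears
// the slice, so it can never race with an in-flight Broadcast.
func (b *Broadcaster[V]) Close() {
	b.lock.Lock()
	defer b.lock.Unlock()
	for _, ch := range b.subscribers {
		close(ch)
	}
	b.subscribers = nil
}
```

Because `Broadcast` and `HasListeners` only take the read lock, they don't block each other, while any mutation of the slice still requires the exclusive write lock.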

JohnStarich and others added 2 commits May 28, 2024 14:09
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/v5/CONTRIBUTING.md#submitting-pull-requests)
- [ ] I have validated my changes against all supported platform
versions
* What does "platform" mean here? Go version? OS version? Happy to help
test, just not sure what I'm testing!

**Related issues**

I hit data races when running tests in parallel, which effectively obscure other data races my application may have. It looks like others did too, judging by #102.

Fixes #102

**Describe the solution you've provided**

I've addressed 2 data races with improved locking:
* `HasListeners()` raced with any concurrent mutation of `b.subscribers`
* `Close()` had a data race where closing a send channel triggers a panic in `Broadcast()`
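
As a concrete illustration (not the repository's actual test suite), a test along these lines, run under `go test -race`, is the kind of thing that flags the first race when `HasListeners()` reads the slice without synchronization; against the locked sketch above it passes cleanly:

```go
package broadcast

import (
	"sync"
	"testing"
)

// TestHasListenersConcurrentWithAdd hammers HasListeners while another
// goroutine adds listeners. Run with `go test -race` so the race detector
// can check the synchronization.
func TestHasListenersConcurrentWithAdd(t *testing.T) {
	b := &Broadcaster[int]{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			b.AddListener()
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = b.HasListeners()
		}
	}()
	wg.Wait()
}
```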

I also saw an easy opportunity to use more fine-grained locks with an
RWMutex, although I'm happy to back that out if you would prefer.

**Describe alternatives you've considered**

I also considered using an atomic data type for subscribers, but I
figured that change would be less of a surgical fix. It also may be more
difficult to mutate the slice, since a compare-and-swap can fail and
would need a loop (it's not as simple as `atomic.Add`).
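
For context, a rough sketch of that copy-on-write alternative using Go's `atomic.Pointer`; everything here is hypothetical and not code from this PR. The retry loop is exactly the extra complexity mentioned above:

```go
package broadcast

import "sync/atomic"

// atomicBroadcaster is a hypothetical variant: the subscriber slice is never
// mutated in place, only replaced wholesale via compare-and-swap.
type atomicBroadcaster[V any] struct {
	subscribers atomic.Pointer[[]chan V]
}

func (b *atomicBroadcaster[V]) addListener(ch chan V) {
	for {
		old := b.subscribers.Load()
		var updated []chan V
		if old != nil {
			updated = append(updated, *old...)
		}
		updated = append(updated, ch)
		// CompareAndSwap fails if another goroutine replaced the slice
		// after our Load, so the whole copy-and-append must be retried.
		if b.subscribers.CompareAndSwap(old, &updated) {
			return
		}
	}
}
```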

Another idea which may be simpler is using a channel to manage shutdown.
Typically I'd use a `context.Context` here to manage cancellation.
That'd also prevent `Broadcast()`ing on a full send channel from
blocking `Close()`.
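
Sketching that alternative against the broadcaster above (again an assumption, not this PR's code): each send selects on a cancellation signal, so a receiver that never drains can't pin down `Close()`.

```go
package broadcast

import "context"

// broadcastCtx is a hypothetical variant of Broadcast that gives up on a
// blocked send once the context is cancelled, instead of waiting forever
// on a full channel.
func (b *Broadcaster[V]) broadcastCtx(ctx context.Context, value V) {
	b.lock.RLock()
	defer b.lock.RUnlock()
	for _, ch := range b.subscribers {
		select {
		case ch <- value:
		case <-ctx.Done():
			return // shutdown requested; stop broadcasting
		}
	}
}
```

In that design, Close would cancel the context first and only then take the write lock to close the channels, so a stuck broadcast releases the read lock promptly.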

**Additional context**

This is a small cleanup to the existing Broadcasters implementation to
use `slices.DeleteFunc`.
@cwaldren-ld requested review from tanderson-ld, kinyoklion, and keelerm84 and removed the review request for kinyoklion on May 29, 2024 23:21
@cwaldren-ld marked this pull request as ready for review on May 29, 2024 23:22
@cwaldren-ld requested a review from a team on May 29, 2024 23:22
```go
for _, ch := range ss {
	ch.sendCh <- value
}
b.lock.RLock()
```
Member:

So, there is a behavioral change here, which may or may not be relevant. But previously, during the broadcast, you could remove a listener (the lock wasn't retained during the actual broadcast), and now you wouldn't be able to get the exclusive lock during that time.

Contributor Author:

Hmm, we probably shouldn't hold the lock during the broadcast, as that could take an unbounded amount of time to complete. I'll revert that and add a test.

Contributor Author:

Ok, so there's a problem here: if we allow removing a listener while Broadcast is running concurrently, then Broadcast could try to send to a channel that has been closed (because the listener was removed). That's a panic.

So that behavior was already bad.

I'm thinking it's better to say "if you aren't draining your receivers, you're not gonna be able to add or remove receivers", which would be the case if we locked around the broadcast.
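
For reference, sending on a closed channel unconditionally panics in Go, which is the failure mode being described here:

```go
package main

func main() {
	ch := make(chan int, 1)
	close(ch)
	ch <- 1 // panic: send on closed channel
}
```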

Contributor Author:

I'm reverting to locking around the broadcast. I'll note in the release that this may affect the use case of adding/removing a listener while broadcasting.

This test is fundamentally flawed in that the order of executing the
broadcasters is important. We can't execute them in random order without
causing deadlock, as receivers need to drain their channels for
broadcast to unblock.
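
To make that concrete, here is a sketch of the test shape that avoids the deadlock, assuming the broadcaster sketched earlier (names are assumptions, not the repository's actual tests): each receiver drains its channel in its own goroutine, so `Broadcast` never blocks indefinitely on a full channel.

```go
package broadcast

import (
	"sync"
	"testing"
)

func TestBroadcastWithDrainingReceivers(t *testing.T) {
	b := &Broadcaster[int]{}

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		ch := b.AddListener()
		wg.Add(1)
		go func(ch <-chan int) {
			defer wg.Done()
			for range ch {
				// Drain until the channel is closed by Close().
			}
		}(ch)
	}

	for i := 0; i < 100; i++ {
		b.Broadcast(i)
	}

	b.Close() // closes every subscriber channel, letting the drainers exit
	wg.Wait()
}
```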
@cwaldren-ld force-pushed the cw/sc-244791/broadcast-data-race branch from eaa33b9 to 93c1e39 on June 3, 2024 22:27
@cwaldren-ld merged commit 68cb1a4 into v7 on Jun 4, 2024
14 checks passed
@cwaldren-ld deleted the cw/sc-244791/broadcast-data-race branch on June 4, 2024 18:04
@github-actions bot mentioned this pull request on Jun 4, 2024
cwaldren-ld pushed a commit that referenced this pull request Jun 4, 2024
🤖 I have created a release *beep* *boop*
---


## [7.4.1](v7.4.0...v7.4.1) (2024-06-04)


### Bug Fixes

* add warning log message if client init duration is longer than recommended (60s) ([68e3440](68e3440))
* align high timeout log msg with spec ([#150](#150)) ([606d224](606d224))
* resolve data race in Broadcasters system ([#153](#153)) ([68cb1a4](68cb1a4))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>