-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Close previous conn when switching to new conn. #351
Conversation
This is a different attempt at #350. Feels a bit cleaner, i. e. if the internal `conn atomic.Value` is replaced, the previous `conn` is closed before doing that.
signalclient.go
Outdated
if c.OnClose != nil { | ||
c.OnClose() | ||
} | ||
c.conn.Store((*websocket.Conn)(nil)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cannot store nil into an atomic.Value; it'll panic: https://pkg.go.dev/sync/atomic#Value.Store
would be good to use typed atomic.Pointer instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Casting nil
to the concrete type does not cause a panic. Learnt that today. I can't find that post now, but here is a Go Playground example - https://go.dev/play/p/-O0qiTGJxJs>
But, I will change to look at atomic.Pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, because a reference to an interface with a type but a nil object is not nil, which makes nil checks really confusing at time. Ask me how I know...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know @biglittlebigben 😊?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Panic in prod at 3am. It's always a panic in prod at 3am...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using atomic.Pointer
in this commit.
NOTE: There is comment in README.md that this module requires go 1.18+ since version 1.0. Need to update that to 1.19+ when we cut a release.
res, err := c.readResponse() | ||
if err != nil { | ||
if !isIgnoredWebsocketError(err) && !c.isClosed.Load() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll also need to make sure readResponse returns EOF if it's expected close. otherwise it'll keep spamming logs about error while reading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will look at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, this does not do anything different. A unilateral close (by calling Close() method) will close the underlying conn and read will fail with reading a closed conn
. And this loop will exit/return.
Before this change, the code was only checking isClosed
which is set by the unilateral Close()
.
The remote close will be an error returned by ReadMessage
. Yes, this change will log one message more because it will log an error on an unilateral Close()
too. But, nothing more.
Once an error is returned, the loop is exited.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I'm reading this incorrectly.. here it would previously return io.EOF when Close
has been called due to the c.isClosed.Load()
check.. currently I think it'll go into errors.New("cannot read response before join")
?
It seems that we should not return an error if Close has been called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct @davidzhao . You are reading it right. But, what I was saying is that Close()
in this PR sets the conn
to nil
(the other discussion about atomic.Value == nil
, I will change that to an atomic.Pointer
). So, yes readResponse
will return errors.New("cannot read response before join")
.
But, the readLoop
will exit because of that error return.
Only difference is that previously, the read loop would not have logged an error before exiting as it was checking for isClosed
and with this PR, it will log an error before exiting. So, what I was mentioning was that I could not spot any runaway loops repeatedly logging an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't mean it would be in a loop. Just that in server environments, it would be good to avoid logging errors when the user has closed the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I misunderstood. I will look at adding a change to filter out the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in this PR - #353
This seems to fix the cloud-ingress restart issue. |
res, err := c.readResponse() | ||
if err != nil { | ||
if !isIgnoredWebsocketError(err) && !c.isClosed.Load() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I'm reading this incorrectly.. here it would previously return io.EOF when Close
has been called due to the c.isClosed.Load()
check.. currently I think it'll go into errors.New("cannot read response before join")
?
It seems that we should not return an error if Close has been called.
@@ -348,18 +350,17 @@ func (c *SignalClient) handleResponse(res *livekit.SignalResponse) { | |||
func (c *SignalClient) readWorker() { | |||
defer func() { | |||
c.isStarted.Store(false) | |||
if c.OnClose != nil { | |||
c.OnClose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means OnClose
will only be called when Close
is called instead of connection is interrupted, and the Engine
relies this callback to handle signal disconnection and resume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @cnderrauber . Will take a look. Was hitting a data race in the tests with the OnClose()
in the defer
. So, I serialised it :-( My bad. Will take a look at calling back on read loop exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cnderrauber Made changes in these two commits
- 9a727bc - moving
OnClose()
callback back todefer
ofreadWorker
. - 0acef17 - using a local variable to avoid race. The race reported before is something like this. The test was calling simulate scenario reconnect which called signal client close. And the read on
readerCloserCh
inClose()
was being accessed in bothClose()
andStart()
(where it is changed to set up for next read worker). Getting around it by using a local variable inClose()
and read on that.
Please have a look when you get a chance and let me know if you have concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks for fixing this.
This is a different attempt at
#350. Feels a bit cleaner, i. e. if the internal
conn atomic.Value
is replaced, the previousconn
is closed before doing that.