Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[phantom-presence] handle stray websockets #207

Merged
merged 4 commits into from
Sep 13, 2024
Merged

[phantom-presence] handle stray websockets #207

merged 4 commits into from
Sep 13, 2024

Conversation

stopachka
Copy link
Contributor

@stopachka stopachka commented Sep 12, 2024

This PR:

  1. On the client, we make sure to close the previous websocket connection, when we are about to start a new one
  2. On the server, we started a ping-pong mechanism, where we detect idle connections and proactively kill them

Context

Kosmik reported a curious bug: If you went offline and went back online, you would get a phantom presence object. What happened?

Bugs

The fundamental bug was this: tcp connections stayed 'active', even when we went offline.

We assumed that the underlying websocket would close when we went offline. But this is not the case. Google Chrome does not close the socket automatically when you go offline. [1]

What happens to idle TCP connections really depends on the machine [2]. Sometimes it can take minutes to close the connection.

Solutions

I did two things in this PR:

1) Client: Close previous connections when starting new ones

In Reactor, when we go back online, we run _startSocket. We used to assume that previous _ws would have been closed. Now no more: when we start a new socket, we'll make sure to close the previous one.

2) Server: Start ping/pong

Having the client close stray connections is a good step. But it doesn't quite cover it. For example, if the user goes offline, and never goes back online, then what?

The server could keep the connection around for minutes. Here's what we do now: the server sends ping messages every 5 seconds. If we don't receive some message within 10 seconds, we close the connection.

[1] Here's an interesting chrome bug report
[2] Interesting microsoft article hiredman shared with us

I kicked off a conversation about this on the clojurians slack

Copy link

View Vercel preview at instant-www-js-socket-issues-jsv.vercel.app.

@stopachka stopachka changed the title [phantom-presence][1] make sure we close websocket connections before we start new ones [phantom-presence] handle stray websockets Sep 13, 2024
(let [pooled (.getData message)]
(try
(let [payload (.getResource pooled)]
(on-message {:channel (channel-wrapper channel)
:data (Util/toArray payload)}))
(finally (.free pooled)))))
(onPong [^WebSocketChannel channel ^StreamSourceFrameChannel channel]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be wondering:

How does the client know to send a pong, we did not change Reactor.

Most clients now can handle websocket pings by default. So since the server sends ping, chrome will automatically respond with a pong.

(Tested this with chrome and expo iOS)

Copy link
Contributor

@dwwoelfel dwwoelfel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@@ -548,7 +533,6 @@
:http-req http-req
:ws-conn ws-conn
:receive-q receive-q
:ping-job (start-ping-job store-conn id)
:pending-handlers pending-handlers}]
(on-open store-conn socket)))
:on-message (fn [{:keys [data]}]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update the atomic-last-received-at in on-message also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed this down to the caller of on-message, so this already happens. Took the inspiration of updating on every message from Precursor : )

@stopachka stopachka merged commit 5c86302 into main Sep 13, 2024
21 checks passed
@stopachka stopachka deleted the socket_issues branch September 13, 2024 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants