-
-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support HTTP long polling as an alternative to WebSockets #859
Conversation
afb841e
to
d38b44b
Compare
There's a case for doing that, since CI style testing doesn't really capture the diversity of browsers and company firewalls and network latency out there. But I'd guess this would involve significant more work? An alternative would be, once we're happy with the PR, to tag it for release to a fly.io preview system we have before it lands. Then we could have volunteers from all over the world (recruited from our discord) hammer on it a bit. That could still miss a problem that only shows up after e.g. multiple days work with intermittent laptop suspends or whatever, but it would increase confidence that whatever problem remains it'll be something we can handle in follow-up. |
1081d2f
to
5990029
Compare
To make sure we're on the same page: if when saying "this would involve significant more work" you're referring to the complexity of having both implementations coexist in the code, this is something I have already handled. I briefly tried a wholesale substitution at the start but realized I was not going to be able to ensure the absence of regressions if I was not able to keep running the previous (raw WebSockets) implementation side-by-side. Basically, with the code in this PR, setting (at run time — it has no effect on the build) However you may have been referring to extra work on the release management and support side of things — in which case I agree that there would be an added burden in supporting these two configurations. In any case, having this new implementation put through its paces by volunteers sound like a great idea. I have just managed to get all the tests to pass, so I'll mark this PR as ready for review and hopefully it can be deployed on your testing environment. |
I have updated the PR description to reflect the current implementation status. I have also pushed a commit that adds |
The Fly deployment failure seems caused by the fact that PRs based on branches from forks do not have access to repository secrets, for obvious security reasons. You may have to clone this branch to one local to this repository? |
Yes, you're right, done. And thanks for adding the environment variable, I was just kicking the preview machinery to make sure it still works. For anyone following along, a preview is over in https://grist-http-long-polling.fly.dev/ For the work: thanks for reminding me you had this feature under a flag already. If code review and preview all seem fine I think I'll be voting for making the new behavior the default, though it will be good that any installation that sees trouble has an easy way back to the old behavior. Will get someone on reviewing this as soon as possible. Thanks a lot for your work! Will be exciting to have this extra level of robustness. |
(just did a quick test in firefox with |
3bc132e
to
3b82ad6
Compare
Testing with a more complex deployment including multiple doc workers and a load balancer revealed that the addition of the I'll be trying to get rid of that prefix so that Grist admins don't need to rework their setup. |
Hi! I added a few commits that :
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work, thank you @jonathanperret!
app/server/lib/GristServerSocket.ts
Outdated
log.debug("got socket close reason=%s description=%s messages=%s", | ||
reason, description?.message ?? description, [...this._messageCallbacks.keys()]); | ||
|
||
const err = description ?? new Error(reason); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is description
an Error
instance, when present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see when it is constructed and forwarded, the description
is indeed an Error
instance. It is not documented as such however, unfortunately.
I'm not sure what to do here to make this cleaner. Maybe just ignore the description
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, maybe just add a comment that in practice description
is an instance of Error
, or just change its type to Error
if Typescript doesn't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went for a comment and hopefully robust way of always ending up with an Error object. I also changed GristServerSocket.onerror
to take an (err: Error) => void
callback.
const err = description ?? new Error(reason); | |
// In practice, when available, description has more error details, | |
// possibly in the form of an Error object. | |
const maybeErr = description ?? reason; | |
const err = maybeErr instanceof Error ? maybeErr : new Error(maybeErr); |
constructor(server: http.Server) { | ||
super(); | ||
this._server = new EIO.Server({ | ||
allowUpgrades: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this false because we try websocket before long-polling, and this option is for upgrading from long-polling to websockets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just suggest adding a comment for anyone else wondering the same question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
app/server/lib/GristSocketServer.ts
Outdated
// this CORS behavior will only apply to polling requests, which end up | ||
// subjected downstream to the same Origin checks as those necessarily | ||
// applied to WebSocket requests, it is safe to let any client attempt | ||
// a connection here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eek, it looks like your security assumption isn't actually the reality today. It doesn't look like we are checking the Origin header for websocket requests. I had to test whether there is a security vulnerability because of that. It looks like there is not, currently, at least on modern browsers, because we set SameSite=Lax attribute on session cookies, so they are not actually sent with CORS requests. With websockets, I don't think there is a way to include them. So a CORS ws connection may be made, and used to access public documents, but it would not include credentials to access private ones.
Your assumption (that an Origin check would necessarily get applied to all WebSocket requests) makes sense. I think it's important even if SameSite=Lax provides protection, if only for consistency with other request handling where we have decided it's important (and also because my quick check does not rule out all possible shenanigans). Incidentally, I found this commit as our latest change to origin checking.
Is that something you would consider adding as part of this diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, good thing you looked into this then. Indeed, after a bit of testing on my side I can confirm it looks like SameSite
is what is really keeping cross-origin WebSockets from being a vulnerability (WebSocket requests do include what cookies they can by default, without having to add e.g. credentials: "include"
as is required with fetch
).
It looks like the underlying assumption that "Opening up CORS for regular requests that are being handled by the same code that treats WebSocket requests (however insecurely) does not make things worse" still holds though.
But I agree that checking Origin
would make for a more secure implementation all around. I would not mind looking into including the extra check, however I would require a few pointers to find the proper methods to call for verifying the origin.
The commit you referenced modifies the FlexServer.trustOriginHandler
method but I cannot call this when validating a WebSocket request, since it is not in the (req, res, next)
context of an Express middleware. The underlying requestUtils.trustOrigin
is still bound to a Request
/Response
interface and can add response headers which again do not make sense in a WebSocket response (by the way, the fact that on this line req.hostname
is checked rather than origin
is puzzling).
It looks like the core of what trustOrigin
does is in this line:
if (!allowHost(req, new URL(origin)) && !isEnvironmentAllowedHost(new URL(origin)))
I could add something like this early in Comm._onWebSocketConnection
(where I found out WebSocket requests whose Host
header does not match an allowed value are rejected, as a result of calling hosts.addOrgInfo
).
But looking at allowHost
, I see a cast of req
to RequestWithOrg
(when the raw WebSocket request clearly has no organization info, at least when it first arrives) and this gets into a part of the code I have no familiarity with yet. I wouldn't want to break a legitimate scenario, so I'm afraid some hand-holding will be necessary here.
If you can show me how and where you would add this Origin check to WebSocket requests in the current codebase, I think I would easily port this to my rearranged version of the code in this PR. Come to think of it, maybe it would be more prudent to add this check in a separate (prerequisite to this one) PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...require a few pointers to find the proper methods to call for verifying the origin.
What you suggested is exactly what I had in mind. addOrgInfo
is what turns a req
into RequestWithOrg
, so if you add the allowHost
check afterwards, it should almost make sense. "Almost" because you would need to adjust some methods to accept either express.Request
or the underlying http.IncomingMessage
. I think that's always possible by getting headers using req.headers[lowercase_name]
.
If allowHost
returns false, we could then either fail when credentials are present (since we don't expect them to be), like trustOriginHandler
does, or perhaps just strip the credentials and keep going.
the fact that on this line req.hostname is checked rather than origin is puzzling
Doh, it's puzzling to me too. It looks like a mistake. It seems it was added for tests, and in practice normally has no effect (because GRIST_HOST is either unset or set to something like 0.0.0.0
), but if it did have an effect, it also looks like the wrong one to me. I'd try removing it and seeing if any tests fail. If you can try that, that would be great; if not, let me know, I'll make a note so it doesn't fall through the cracks. Thank you for noticing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fact that on this line req.hostname is checked rather than origin is puzzling
Doh, it's puzzling to me too. It looks like a mistake. It seems it was added for tests, and in practice normally has no effect (because GRIST_HOST is either unset or set to something like 0.0.0.0), but if it did have an effect, it also looks like the wrong one to me. I'd try removing it and seeing if any tests fail. If you can try that, that would be great; if not, let me know, I'll make a note so it doesn't fall through the cracks. Thank you for noticing!
I can confirm that removing this line has no effect on tests. I ran the whole test suite on GitHub Actions, even though it seems only the DocApi
test suite would catch a regression in this code based on the fact that it seems to be the only suite where the Origin
header is varied.
So I went ahead and removed that line. Now looking into getting websocket request origins checked as you described. This will involve adding some Comm
tests I believe.
app/server/lib/GristServerSocket.ts
Outdated
log.debug("got socket close reason=%s description=%s messages=%s", | ||
reason, description?.message ?? description, [...this._messageCallbacks.keys()]); | ||
|
||
const err = description ?? new Error(reason); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, maybe just add a comment that in practice description
is an instance of Error
, or just change its type to Error
if Typescript doesn't mind.
constructor(server: http.Server) { | ||
super(); | ||
this._server = new EIO.Server({ | ||
allowUpgrades: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just suggest adding a comment for anyone else wondering the same question.
public set onmessage(cb: null | ((data: string) => void)) { | ||
this._ws.onmessage = cb ? | ||
(event: WS.MessageEvent | MessageEvent<any>) => { | ||
cb(event.data instanceof String ? event.data : event.data.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work on the client-side? It looks like it might string
in practice, but if it's not, then according to documentation, it would be ArrayBuffer or Blob, and neither would do anything useful with toString()
, I don't think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the client side, I think it is supposed to always be a string
since we are only sending text frames, see https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/message_event#event_properties
If the message type is "text", then this field is a string.
On the server side (using ws
), the call to .toString()
here is meant to only make explicit what JSON.parse
in GristWSConnection
was doing implicitly.
But in fact, looking at what ws
does to emulate the HTML5 WebSocket API, it seems the toString()
is not even necessary since it will have been node by ws
to convert its NodeJS Buffer
object (which has a useful toString
, as contrasted from ArrayBuffer
or Blob
) to a string to match standard WebSocket behavior.
So I've gone ahead and removed the toString()
and added an explanatory comment.
cb(event.data instanceof String ? event.data : event.data.toString()); | |
// event.data is guaranteed to be a string here because we only send text frames. | |
// https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/message_event#event_properties | |
cb(event.data); |
app/server/lib/GristServerSocket.ts
Outdated
cb(err); | ||
} | ||
for (const cb of this._messageCallbacks.values()) { | ||
cb?.(err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?.
part shouldn't be needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I removed it.
app/server/lib/GristServerSocket.ts
Outdated
log.debug("calling cb for msg %d", msgNum); | ||
cb(); | ||
if (this._messageCallbacks.delete(msgNum)) { | ||
cb?.(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can cb
be undefined? That's fine, but it would be good if its type reflected that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I changed send
so that the callback is optional.
public send(data: string, cb?: (err?: Error) => void) {
const msgNum = this._messageCounter++;
if (cb) {
this._messageCallbacks.set(msgNum, cb);
}
this._socket.send(data, {}, () => {
if (cb && this._messageCallbacks.delete(msgNum)) {
cb();
}
});
}
app/server/lib/GristSocketServer.ts
Outdated
// this CORS behavior will only apply to polling requests, which end up | ||
// subjected downstream to the same Origin checks as those necessarily | ||
// applied to WebSocket requests, it is safe to let any client attempt | ||
// a connection here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...require a few pointers to find the proper methods to call for verifying the origin.
What you suggested is exactly what I had in mind. addOrgInfo
is what turns a req
into RequestWithOrg
, so if you add the allowHost
check afterwards, it should almost make sense. "Almost" because you would need to adjust some methods to accept either express.Request
or the underlying http.IncomingMessage
. I think that's always possible by getting headers using req.headers[lowercase_name]
.
If allowHost
returns false, we could then either fail when credentials are present (since we don't expect them to be), like trustOriginHandler
does, or perhaps just strip the credentials and keep going.
the fact that on this line req.hostname is checked rather than origin is puzzling
Doh, it's puzzling to me too. It looks like a mistake. It seems it was added for tests, and in practice normally has no effect (because GRIST_HOST is either unset or set to something like 0.0.0.0
), but if it did have an effect, it also looks like the wrong one to me. I'd try removing it and seeing if any tests fail. If you can try that, that would be great; if not, let me know, I'll make a note so it doesn't fall through the cracks. Thank you for noticing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! All the changes look good so far, including the removal of the GRIST_HOST check (which I am not seeing in the PR yet but I saw your note that you are removing it).
Thank you also for working on the origin checks!
3b2a88c
to
0139004
Compare
The reference to "window" would unexpectedly break Node-based tests where a client stays connected long enough to send a heartbeat. GristWSSettings.getPageUrl seems explicitly made for this.
This could avoid false results from the tests.
It seems this was added for test purposes but the current tests all pass without this, and it looks a bit safer to remove it.
4c47e47
to
63de9af
Compare
These classes, used as an alternative to native WebSockets, provide an automatic fallback to HTTP long polling (implemented using Engine.IO) when a WebSocket connection fails.
63de9af
to
ea08b49
Compare
Hi @dsagal , I finally found the time to work on this again, putting in what will hopefully be the last big change. For origin verification, I added a More significantly, I have changed my approach to integrating Engine.IO. It is no longer introduced as a unifying abstraction over WebSockets and long polling, as I initially hoped to do. Instead, the existing WebSocket-based protocol remains the default communication method, with an automatic fallback to the long polling implementation provided by Engine.IO. Among other things, this solves a problem revealed by the The most commonly taken path will therefore remain the native WebSocket connection without the overhead of the Engine.IO protocol, which makes regressions much less likely. This in turn led me to remove the I have squashed most of the commits into one that introduces the Apologies for the review churn, and as always I'm looking forward to your feedback. |
} | ||
|
||
export class GristClientSocket { | ||
private _wsSocket: WS.WebSocket | WebSocket | undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to have a comment that exactly one of _wsSocket
, _eioSocket
is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
private _onEIOOpen() { | ||
this._openHandler?.(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need to set _openDone
? And doesn't _onEIOMessage
need to check that? (If not, a comment to explain would be good, since that's a sign of a difference with WS implementation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned below, I reworked the logic and there's only one state check left, in _onWSError
since that's the only point where we need to decide whether to emit an error or downgrade the socket.
_onWSMessage
doesn't need to check the state (now called _wsConnected
) because it can only occur after a successful call to _onWSOpen
. None of the _onEIO
handlers need to check the state either, since the very fact they are called implies the socket was downgraded to Engine.IO.
this._openHandler?.(); | ||
} | ||
|
||
private _onWSError(ev: Event) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure it's impossible to receive the error
event for a socket that's already open? If possible, should we forward it to _errorHandler rather than reconnect using polling at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I ended up simplifying the logic, replacing _openDone
with a _wsConnected
bool that only tells us whether the next WebSocket error
event means we should retry with Engine.IO or emit the error.
app/server/lib/requestUtils.ts
Outdated
return true; | ||
} | ||
|
||
// Returns whether req satisfies the given allowedHost. Unless req is to a custom domain, it is | ||
// enough if only the base domains match. Differing ports are allowed, which helps in dev/testing. | ||
export function allowHost(req: Request, allowedHost: string|URL) { | ||
export function allowHost(req: IncomingMessage, allowedHost: string|URL) { | ||
const mreq = req as RequestWithOrg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest avoiding creating mreq
variable, since it's a risky type conversion (e.g. mreq.get()
is presumably not always available but would be considered correct type), and instead doing this cast inline in the one place where it's used, i.e. (req as RequestWithOrg).isCustomHost
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea about scoping the cast more tightly.
Actually, looking into it I realized the check as I had written it in Comm
would not work for custom hosts, because allowHost
was called before the request object had been enriched with organization info. I reworked the check to have the call to addOrgInfo
done before trustOrigin
.
if (process.env.APP_HOME_URL) { | ||
return new URL(process.env.APP_HOME_URL).protocol.replace(':', ''); | ||
} | ||
return req.get("X-Forwarded-Proto") || req.protocol; | ||
return req.headers["x-forwarded-proto"] || ((req.socket as TLSSocket).encrypted ? 'https' : 'http'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at the implementation of protocol
for express.Request. There is a difference around trust-proxy settings which protocol
checks. But it's moot because we seem to have been blindly trusting x-forwarded-proto
header anyway. That doesn't seem quite right (maybe it's safe in this case, but it's certanly not obvious to me). It's fine not to address here, but if you don't mind, I'd appreciate a TODO to look into when x-forwarded-proto
should be trusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO Comment added.
if (cb) { | ||
this._messageCallbacks.set(msgNum, cb); | ||
} | ||
this._socket.send(data, {}, () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the callback ever have an argument here (e.g. if err
can be received, should it be passed on)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Engine.IO only calls the send()
callback on success, without an argument. So if we get here the err
callback argument is correctly undefined
. I'm adding a comment to clarify.
app/server/lib/GristServerSocket.ts
Outdated
|
||
public set onmessage(handler: (data: string) => void) { | ||
this._ws.on('message', (msg: Buffer) => handler(msg.toString())); | ||
this._eventHandlers.push({ event: 'message', handler }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to remember the wrapper function to match on removal (handler
is not exactly what got added with _ws.on('message')
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry about missing that again, fixed.
app/server/lib/Client.ts
Outdated
@@ -189,7 +184,7 @@ export class Client { | |||
|
|||
public interruptConnection() { | |||
if (this._websocket) { | |||
this._removeWebsocketListeners(); | |||
this._websocket?.removeAllListeners(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question mark unneeded here (given the conditional)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, removed.
I just updated the preview at https://grist-http-long-polling.fly.dev/ and edited a doc using firefox with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thank you @jonathanperret !!
Thanks @jonathanperret and @dsagal ! This is really great to have. One thing that came to my mind recently as possible follow-up. @jonathanperret is it possible to determine whether long-polling fallback is happening consistently, and log a warning if so? Not setting up websocket forwarding correctly is a fairly common configuration problem (one example of many: #261). Until now people have been forced to notice and fix this problem since they can't use documents at all until they get it right. But now, they might never even notice. Which isn't terrible, but seems a bit sub-optimal? |
Thank you guys for helping bring this to fruition. Can't wait to deploy it and unlock the power of Grist for our restricted-network users. @paulfitz I see what you mean. I'm not optimistic however that a Grist administrator would notice such a warning, particularly since it wouldn't occur right on server startup. Maybe calling out WebSockets as worthy of attention in the self-managing docs would help more? It appears there are zero hits for "WebSocket" currently on https://support.getgrist.com. |
I agree, that is true. There is an admin panel coming soon where installation problems can be highlighted. If the warning existed, then the developer working on that panel would just need to find it in the code and hook up a way to report it better.
Good point! We have a systems engineer joining the team next week whose mandate will include smoothing out the self-hosting experience. I've added that to the list of easy things to start with :-) |
if (!origin) { return true; } // Not a CORS request. | ||
if (process.env.GRIST_HOST && req.hostname === process.env.GRIST_HOST) { return true; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for an explanation for why this line was removed, as I believe it's responsible for certain requests failing under Docker and possibly under other circumstances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From earlier in the review:
Jonathan:
the fact that on this line req.hostname is checked rather than origin is puzzling
Me:
Doh, it's puzzling to me too. It looks like a mistake. It seems it was added for tests, and in practice normally has no effect (because GRIST_HOST is either unset or set to something like 0.0.0.0), but if it did have an effect, it also looks like the wrong one to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When GRIST_HOST
is set to 0.0.0.0
, which seems to be the case when running all of Grist within the same Docker instance from the gristlabs/grist
image, certain internal requests from the doc workers to the home servers seems to fail. This seems to be the underlying cause for why certain operations like duplicating documents fail when running the tests under Docker.
But surely running the Docker tests isn't the only reason to have a setup in which we might need such internal requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a somewhat related discussion in #915
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jordigh would you have a repro scenario for requests failing since this change, under Docker or otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Motivation
This PR is a proposed fix for #840 .
As mentioned in that issue, the motivation for supporting an alternative to WebSockets is that while all browsers supported by Grist offer native WebSocket support, some networking environments do not allow WebSocket traffic.
Implementation
Engine.IO was selected as the underlying implementation of HTTP long polling, after careful consideration as described here.
While Engine.IO offers both a WebSocket-based transport and HTTP long polling, and the ability to transparently upgrade from long polling to WebSockets, I chose to only use Engine.IO for its polling implementation, to avoid imposing the overhead of the Engine.IO protocol (and server-side memory usage) on the vast majority of clients, which can use plain WebSockets.
The Grist client will first attempt a regular WebSocket connection, using the same protocol and endpoints as before, but fall back to long polling using Engine.IO if the WebSocket connection fails.
To get all of the test cases to pass (particularly the
Comm
tests) I had to add some form of emulation of the behavior of thews
library to the Engine.IO-based implementation. In particular, while the Engine.IO socket'ssend
method accepts a callback, it is only called upon success and not if the transfer fails. Because a failed transfer immediately causes the socket to be closed, I keep track of messages that have not had their callback called yet and invoke these uponclose
if any are still pending.