-
-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce APP_HOME_INTERNAL_URL and fix duplicate docs #915
Introduce APP_HOME_INTERNAL_URL and fix duplicate docs #915
Conversation
b76c4c4
to
1bf5fd5
Compare
LGTM :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mostly stylistic suggestions, feel free to disregard them.
More globally, I have doubts about building on the selfPrefix
workaround; as described here it only exists because it seems we have uncertainty about the app's URL. Maybe effort would be best spent resolving that uncertainty at the source instead of carrying that URL/prefix duality all the way, with the complexity it entails?
In a similar vein, the APP_HOME_INTERNAL_URL
variable we are introducing here would not be necessary if fetchDoc
directly consulted the DocWorkerMap
to find a document's worker URL (basically copying this logic) instead of bothering a home worker for that information. After all, the doc worker does have access to DocWorkerMap
since it registers itself in it. I did prototype this approach locally and it seemed to work, but more testing might be needed as there are so many possible configurations.
Again, feel free to ignore these ramblings 😉
app/server/lib/AppEndpoint.ts
Outdated
res.json({ | ||
docWorkerUrl: customizeDocWorkerUrl(docStatus.docWorker.publicUrl, req), | ||
internalDocWorkerUrl: docStatus.docWorker.internalUrl | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this endpoint (/api/worker/:id
) is called both from the client (in UserAPI
) and from a worker (in uploads
) but it does feel a bit weird to be returning an internal URL from a public-facing API.
Perhaps the endpoint could be split?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds a good idea. Also would give a hint about who is the caller in the logs.
Maybe then:
/api/worker/:assignmentId/public
- and
/api/worker/:assignmentId/internal
But there is this /?*
in the existing path that puzzles me a bit…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an interesting API design problem.
On the one hand, this sound like an API you'd want exposed only to internal callers, which would call for introducing a prefix (so that e.g. these routes could be blocked on the reverse proxy), as in /internalapi/worker/:assignmentId
. On the other hand, I can imagine scenarios where an internal call would be made to obtain an external URL, e.g. for redirecting a client to another worker. So, just because it is an internal call doesn't mean you want (only) the internal URL. So you'd end up with /internalapi/worker/:assignmentId/internal
I guess. Or maybe just have /internalapi/worker/:assignmentId
return both internal and public URLs as the current endpoint does.
Alternately, keep a single (public) endpoint but include the internal URL only if the caller was detected as internal (perhaps through a header that the reverse proxy blocks).
Or, as I suggested above, maybe drop the need for internal calls to this endpoint altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be resolved with the latest changes.
app/common/gristUrls.ts
Outdated
function isInternalUrl(host: string, envValue?: string) { | ||
if (!envValue) { return false; } | ||
const internalUrl = new URL('/', envValue); | ||
return internalUrl.host === host; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the need for new URL('/', …)
instead of just new URL(…)
?
Also, since it is now quite generic, perhaps this function would be better called something like hostMatchesUrl(host: string, url: string?)
?
As an aside, I'm slightly puzzled this works at all since APP_DOC_INTERNAL_URL
is only defined inside a doc worker, and can contain only the URL for this worker, whose host isn't guaranteed to match any other worker's host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, I'm slightly puzzled this works at all since APP_DOC_INTERNAL_URL is only defined inside a doc worker, and can contain only the URL for this worker, whose host isn't guaranteed to match any other worker's host.
Thanks, this sentence led me to change the functions, especially the naming to make it clearer: cba4e6c
The idea is that is should only be tested to check whether it matches its own internal URL, which I hope it is clearer now.
52f5599
to
cba4e6c
Compare
app/server/lib/requestUtils.ts
Outdated
|
||
if (isOwnInternalUrlHost(req.get('Host'))) { return true; } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the conflict with the changes I made in #859 (removing the GRIST_HOST
exception above, and changing the type of req
), I'm not sure what the intent of this exception is?
Something to keep in mind: every exception here is a potential vulnerability. Allowing a request based on the Host
header, without even checking Origin
seems strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is to allow requests made internally, assuming that the url of APP_DOC_INTERNAL_URL
and APP_HOME_INTERNAL_URL
cannot be resolved outside the internal network.
That being said, and as I understand that the origin
header is passed along the fetch
call made internally, I should check why it cannot pass the test of the below code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a test that fails without this exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A manual test at least (duplicating the document), but with what I understand of the code, I have doubts, so I should try again without and see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I confirm I have this error if I remove this line:
18:12:42 Credentials not supported for cross-origin requests
Duplicate Document
This is because below, the allowHost(req, new URL(origin))
checks whether the Origin
header contains the same domain as for the Host
header, because internally the Origin
header is passed along from the request the client made (the one to copy) to the internal one (the one to fetch the document).
BTW, the justification (that I think is given below) puzzles me a lot 🤔:
grist-core/app/server/lib/Authorizer.ts
Lines 686 to 687 in 3405ce9
const Origin = req.get('Origin'); // Pass along the original Origin since it may | |
// play a role in granular access control. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fflorent what you found matches my own research. I had likewise concluded that preserving the Host
header in some form when forwarding requests would be required, as access decisions in the Grist codebase seem to often be made by comparing Origin
to Host
(as an aside, here's an unfortunate detail : the getOrigin
function in requestUtils
actually returns a value based on the Host
header, not the Origin
header).
Also, these decisions can be quite involved, and may even require a database access as in the case of custom domains — which is unfortunately a code path that as far as I know no one is exercising apart from Grist SaaS, which makes me wary of suggesting any changes to that logic.
Your proposal that replicates the Host
header in getTransitiveHeaders
does go in the same direction I had in mind. But I'm afraid it may break some deployments, as the Host
header can play a critical role in dispatching an HTTP request. Here's a contrived example that shows how it could break:
- Given a Grist instance with a public URL of
https://grist.example.org
; - Handling an incoming user request at
https://grist.example.org/…
(therefore,Host
isgrist.example.org
) that triggers a forwarded request to e.g.http://grist-worker.internal
; - Copying the
Host
header from the public request to the forwarded request would result in making a request tohttp://grist-worker.internal
with aHost
header value ofgrist.example.org
(as opposed to aHost
value ofgrist-worker.internal
as is currently automatically generated by the HTTP client, i.e.node-fetch
as far as I can tell).
Depending on how the HTTP server ongrist-worker.internal
is setup, this could cause the request to fail. Ifgrist-worker.internal
resolves directly to a Grist worker, it's likely to work as the Node.JS HTTP server basically ignores the incomingHost
value. But if for some reason a reverse proxy/load balancer is in front of that server, it may reject the request or route it to the wrong server.
This very common problem with reverse proxying is generally addressed by copying the Host
header value to another header such as X-Forwarded-Host
instead, so that the code handling the request can make decisions based on the Host
value sent by the initial client, without compromising request routing. Of course, this implies careful handling of whether to trust such a header, since it can easily be spoofed by a client if it is not filtered at the reverse proxy (see e.g. @dsagal 's remark here about the related X-Forwarded-Proto
header).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paulfitz the use of Origin
in access rules is another wrinkle I hadn't discovered yet, thanks for bringing it to our attention.
If I understand what you're suggesting, removing the Origin
header from forwarded requests would likely let them more easily through, as requests without an Origin
header are considered same-origin and exempted from any further checks in trustOrigin
.
It seems to me though there are other parts of the code that currently match the Host
header with e.g. parts of the session cookie, such as this code in Authorizer
— again, this appears to be custom host-specific so difficult to exercise outside of Grist SaaS. Correctly forwarding Host
would probably help such code work, even though it doesn't seem to pose a problem currently because it probably isn't hit by any forwarded requests, but that seems rather an accident.
In general, it seems to me that if any header (such as Origin
or Cookie
) is forwarded, any value that it may be checked against (here, Host
) needs to be made available as well to preserve the full security context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This very common problem with reverse proxying is generally addressed by copying the Host header value to another header such as X-Forwarded-Host instead, so that the code handling the request can make decisions based on the Host value sent by the initial client, without compromising request routing. Of course, this implies careful handling of whether to trust such a header, since it can easily be spoofed by a client if it is not filtered at the reverse proxy (see e.g. @dsagal 's remark #859 (comment) about the related X-Forwarded-Proto header).
Could we take advantage of the express 'trust proxy'
feature? This way, the call to req.hostname
would return the value of the X-Forwarded-Host
header. We could introduce another env variable to set IP addresses / subnets written in CSV format, and document it as dangerous unless the X-Forwarded-For
, X-Forwarded-Host
, and X-Forwarded-Proto
are not cleared/overwritten by the RP.
PS:
@fflorent what you found matches my own research.
Actually, I remember you told us such a thing in the tchat, but was not able to retrieve it. This good idea was your own :).
PS2:
@fflorent In any case, I strongly suggest you rebase your branch onto main as the code in question was affected by the merging of #859 .
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the trust proxy
feature of Express is indeed meant to support reading these headers. A few things to keep in mind though:
- as explained in the Express docs, that option should not be enabled indiscriminately, so we would probably need to add an environment variable to opt into it;
trust proxy
does impact thehostname
on the ExpressRequest
object, but in several places ingrist-core
the parentIncomingMessage
interface is used instead, because WebSocket connections do not go through Express. This is why for example I had to re-implement Express'sreq.proto
here: Support HTTP long polling as an alternative to WebSockets #859 (comment)
93be827
to
e36f549
Compare
I am not very certain, I tend to think that we should anyway adapt the Also we would still need the But maybe my intuition is wrong, I am not working on that subject today (will do the next week) At least I am quite sure it does not harm, and quite a good news if that solves an issue you had! |
To add my two cents, at first glance, the idea of setting Setting One valuable insight from @jordigh is that Maybe what would be clearer is have |
283a31d
to
faf9fa2
Compare
9944492
to
d3d7e02
Compare
3c08203
to
79210fc
Compare
Also use GRIST_ORG_IN_PATH to make it work as expected, and GRIST_SINGLE_PORT=0 to ensure to query the pool of workers.
a7ab1b3
to
14b2bfa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some small comments
README.md
Outdated
| APP_HOME_URL | url prefix for home api (home and doc servers need this) | | ||
| APP_HOME_INTERNAL_URL | like `APP_HOME_URL` but used by the home server to reach any home workers using an internal domain name resolution (like in a docker environment). Defaults to `APP_HOME_URL` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistency in home workers
- for better or worse, we call the two types of server home servers
and doc workers
.
Btw do you know for a fact that the variable is used only by home servers and never by doc workers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw do you know for a fact that the variable is used only by home servers and never by doc workers?
Sorry, I may not understand your question, I may reformulate what I understand: you ask me whether this variable is used by both workers or only by home workers?
It's actually used by both servers. The /compare
endpoint is run by the doc worker, and the POST /api/docs
is IIRC run by the home server and may be used to copy a document (when passing sourceDocumentId
).
I change the description as requested.
app/common/UserAPI.ts
Outdated
@@ -1156,6 +1156,27 @@ export class DocAPIImpl extends BaseAPI implements DocAPI { | |||
} | |||
} | |||
|
|||
/** | |||
* Reprensents information to build public doc worker url. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*represents
app/common/UserAPI.ts
Outdated
} | ||
return docWorkerInfo.docWorkerUrl; | ||
export function getPublicDocWorkerUrl(homeUrl: string, docWorkerInfo: PublicDocWorkerUrlInfo) { | ||
return docWorkerInfo.selfPrefix ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you compared with null
you might not need the !
at the end of docWorkerInfo.docWorkerUrl!
.
app/common/gristUrls.ts
Outdated
* @param {string?} host The host to check | ||
*/ | ||
function isOwnInternalUrlHost(host?: string) { | ||
// Note: APP_HOME_INTERNAL_URL may also defined in doc worker as well as in Home worker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*be defined
inconsistent capitalization
app/server/lib/DocWorkerUtils.ts
Outdated
import log from 'app/server/lib/log'; | ||
import {adaptServerUrl} from 'app/server/lib/requestUtils'; | ||
import * as express from 'express'; | ||
import fetch, {Response as FetchResponse, RequestInit} from 'node-fetch'; | ||
import { getAssignmentId } from './idUtils'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use same style of import as the rest? It helps with grist-electron/grist-static for messy reasons.
@paulfitz Thanks for your review, I'll take a look either next Friday or next week! |
This was a feature that was never fleshed out, which has caused trouble, and which will shortly be removed: gristlabs/grist-core#915 I propose removing the documentation for the feature entirely. Another option would be to strike it out and link to the discussion above. But given that we've no evidence anyone is using it, I've just deleted it. If someone complains we can revisit.
This was a feature that was never fleshed out, which has caused trouble, and which will shortly be removed: gristlabs/grist-core#915 I propose removing the documentation for the feature entirely. Another option would be to strike it out and link to the discussion above. But given that we've no evidence anyone is using it, I've just deleted it. If someone complains we can revisit.
49a7631
to
3bb3346
Compare
3bb3346
to
c3a40d7
Compare
ceefd82
to
e1569b8
Compare
e1569b8
to
de29a6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some typos and header order linting issues in latest changes, but substance looks ok and in any case only affects tests. Not going to do any more back and forth, given schedule. Thanks a lot for all your time on this @fflorent !
Bravo @fflorent 🎉 |
Context
On self-hosted instances, some places in the code rely on the fact that we resolves public domains while being behind reverse proxies. This leads to cases where features are not available, such as the "Duplicate document" one.
Bugs that are solved
On self-hosted instances:
The checkboxes are ticked as I successfully qualified the bug resolution with my patch.
Proposed solution
APP_HOME_INTERNAL_URL
env variable, which is quite the same asAPP_DOC_INTERNAL_URL
except that it may point to any home worker;/api/worker/:assignmentId([^/]+)/?*
return not only the doc worker public url but also the internal one, and adapt the call points likefetchDocs
;trustOrigin
;Fixes #185