Review: How Top CDN manage content authentication? #4438
Comments
For our case, this could be as simple as one token per gateway, which would then enforce fine-grained restrictions per user (based on their plan with the App). |
Great work @zeeshanakram3. I don't get this comment.
Somehow the CDN server has to be able to know which token gives access to which resource? If you are saying there is just one universal authorization mode, access to everything or nothing, then that seems super impractical? Apps typically need more granularity. I would assume this information is somehow embedded or represented in the token. Obviously, with such a scheme, authentication is super fast; you don't need the sort of mapping we have been talking about for Argus. Can you elaborate? |
Yes, your assumption is correct. When I say |
Ok, great,
|
So I looked at the internals of CDN token generation and verification on different providers. Although they use more or less the same format, there isn't an industry-wide standard that everyone needs to follow. Also, each provider offers a lot of parameter customization, and applications can selectively choose some or all of the parameters to include in the configured token.
Azure CDN token format
List of some of the important Azure token authentication parameter values in our context:
Azure uses symmetric encryption to encrypt the parameters and create the token. The CDN then uses the same key to decrypt the token, read the token params, and serve requests based on them. First, applications have to set up the encryption parameters and generate one or more tokens. They then distribute those to consumers, after they sign in with a username/password, based on the App's internal authorization logic (i.e., what privileges a user has, so the user is given a token which includes a specific path set as the value of
IBM CDN token format
IBM Cloud CDN's authentication token format uses a scheme similar to Azure CDN's, that is, the token is generated by encrypting the parameter values. Looking in detail, it seems that IBM CDN does not have its own CDN infrastructure and uses Akamai's CDN services underneath; in fact, the Akamai client library's token generation logic can be used by IBM client applications.
Here is the list of token param options in the case of IBM/Akamai CDN:
AWS CloudFront token format
CloudFront uses signed URLs/cookies based on public-private key pairs instead of the encrypted-token approach described above. From the AWS docs:
Example of a sample policy:
{
"Statement": [
{
"Resource": "URL or stream name of the file",
"Condition": {
"DateLessThan": {
"AWS:EpochTime": required ending date and time in Unix time format and UTC
},
"DateGreaterThan": {
"AWS:EpochTime": optional beginning date and time in Unix time format and UTC
},
"IpAddress": {
"AWS:SourceIp": "optional IP address"
}
}
}
]
}
The CloudFront signed request URL consists of the following parts:
So, an example URL would be as follows:
Here is an excellent description of the complete token authentication workflow from the docs:
I didn't find much info on the design and internal workings of the CDNs, as most of the available information targets application developers. However, based on the token formats described above, it seems that the only state needed on the CDN side is the client-configured key, which is used to verify the token content. Setting up this state requires upfront interaction between the app and the CDN, and again whenever the app has to rotate or upgrade the CDN keys. |
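To make the "only state is the key" observation concrete, here is a minimal sketch of a token that a CDN edge could check with nothing but a shared key. It is an assumption-laden illustration, not any provider's actual format: it signs the parameters with HMAC-SHA256 instead of encrypting them as Azure and IBM/Akamai are described as doing above, and the names `issue_token`, `verify_token`, and the `exp`/`path` fields are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(key: bytes, path_prefix: str, ttl_seconds: int, now=None) -> str:
    """App side (hypothetical): bind an expiry and a path prefix into a signed token."""
    issued_at = int(time.time() if now is None else now)
    claims = {"exp": issued_at + ttl_seconds, "path": path_prefix}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_token(key: bytes, token: str, requested_path: str, now=None) -> bool:
    """CDN side (hypothetical): needs only the shared key, no per-user or per-session state."""
    payload, sep, signature = token.rpartition(".")
    if not sep:
        return False
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes on both sides so odd input cannot raise.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    return current < claims["exp"] and requested_path.startswith(claims["path"])
```

Under these assumptions, key rotation is the only moment the app and the CDN have to talk to each other; each request is checked against the key and the request itself.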
First off, thank you, very well done analysis.
General
What is pretty clear is that the policies used are all
(a) Non-interactive between CDN provider and content authority infrastructure, both per access and per new app user entry.
It seems like the original impetus for moving to token-based authentication was to avoid the per-user scaling cost of session-based authentication on the provider side
Notice here that the problem is that the memory footprint of authenticating each request became too large. This means they were already not accepting the use of an on-disk database for holding this state, let alone a third-party (QN) server with such a database, to be consulted per request. This is presumably because the per-request latency involved would be too large. Latency is indeed the key determinant of whether Atlas gives a good UX when rendering rich scenes with lots of assets. So I think we can add the following constraints
(c) server side verification must be memory-only and not scale with request volume, at worst, if not stateless.
Problem
We are imagining an access policy which is sensitive to
and at the same time satisfies (a), (d) and either (b) or (c), where (c) is a second best. By far the most important case is personae 1.9, as this is where the UX really counts. There is also a subsidiary question of the trust model, e.g. can we trust gateways blindly to just tell Argus nodes that a given gateway user indeed also has a CRT or NFT? If we can do this blindly, then that helps spare Argus from having to look at, and maintain, any memory state that pertains to the content directory. We have also discussed the prospect of having a dedicated lead-controlled auth server. |
Because it wasn't mentioned before
There are more listed here. |
After reading through this issue and #4332 (comment), I'm convinced that there is actually a good use case for introducing a separate authentication node managed by a working group lead. However, I think it only pays off if we make some non-trivial changes to Argus as well. What I imagine is that the lead node would be responsible for authenticating very specific roles, like:
Then, if the authentication is successful, it would return a signed token that attests to a specific set of roles (the client would have to specify which roles they wish to authenticate for). This token should be relatively short-lived, because the chain state w.r.t. which roles are available to a given member can change very frequently. However, it should also be long-lived enough that authentication doesn't become a burden and slow down page loads significantly. I think something like 15-30 minutes could be a good compromise. Of course, users can occupy hundreds of roles at a given time, so the client application will have to choose which roles to authenticate for and when, and probably manage a local cache of tokens for different roles/sets of roles, choosing which token to use depending on the asset being requested.
On the Argus side, I imagine it storing the access policy for each asset in memory in a structure like:
The first double-map would serve as a way to avoid unnecessary iterations over all assets in case the channel/video status changes. To keep this structure up to date, Argus would ideally have its own on-chain events processor; alternatively, to keep things simple initially, it could subscribe to PostgreSQL database updates made by the current Hydra processor.
Upon a request for an asset, Argus would then be able to very quickly execute the following steps:
|
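The client-side token cache mentioned in the comment above (one short-lived token per set of roles) could look roughly like the sketch below. Everything here is hypothetical: the class name `RoleTokenCache`, the `fetch_token` callback standing in for a call to the lead-run auth node, and the role names used in the usage note are illustrations, and the 15-minute default simply takes the lower end of the 15-30 minute range suggested above.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


class RoleTokenCache:
    """Caches one token per *set* of roles and re-fetches it once it expires."""

    def __init__(
        self,
        fetch_token: Callable[[FrozenSet[str]], str],  # hypothetical auth-node call
        ttl_seconds: float = 15 * 60,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[FrozenSet[str], Tuple[str, float]] = {}

    def get(self, roles: Iterable[str]) -> str:
        key = frozenset(roles)  # order of roles does not matter
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # still valid, no round trip to the auth node
        token = self._fetch(key)
        self._entries[key] = (token, now + self._ttl)
        return token
```

A client would call something like `cache.get({"member", "channel_owner"})` before requesting an asset and attach the returned token to the request; only a cache miss or an expired entry costs a round trip to the auth node.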
Great! Let's start with the most important consideration: what should we call it 🤣
This looks solid, I must admit I am a bit rusty on the full context here, but looks very plausible. What changes would be needed to accommodate 1.9, which arguably is the most important one?
Why not just QN? If it's for speed of processing, then are you thinking of a new Subsquid node? |
After reviewing the top CDN offerings, it seems that all of these follow more or less the same authentication approach to mitigate several problems, e.g., hotlinking, denial of service, etc.
I reviewed the following CDN providers.
Usually, a CDN involves three actors:
In the Joystream network topology, I think there is not a clear distinction between these three actors. (It will become clearer after going over the authentication flow of an example provider.)
IBM CDN Authentication Flow
Azure CDN Authentication Flow
AWS Cloudfront Authentication Flow
On CloudFront, applications configure CloudFront to require that users access files using either signed URLs or signed cookies. The application has to choose a public-private key pair and then 1) sign the URL/cookies using the private key, 2) give that signed message to an authenticated user, and 3) the user/consumer then uses it in a CDN request for private content.
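Once the signature on such a request has been verified, what remains on the CDN side is to evaluate the policy's conditions against the incoming request. The sketch below illustrates that step for a custom policy with the fields CloudFront documents (`Resource`, `DateLessThan`, optional `DateGreaterThan` and `IpAddress`). It is a simplification under stated assumptions: signature verification is left out, wildcard matching in `Resource` is not handled, and `policy_allows` is a hypothetical helper name, not part of any AWS SDK.

```python
import ipaddress
import json


def policy_allows(policy_json: str, resource: str, now_epoch: int, source_ip: str) -> bool:
    """Return True if any statement in the policy permits this request."""
    policy = json.loads(policy_json)
    for statement in policy.get("Statement", []):
        if statement.get("Resource") != resource:  # exact match only (assumption)
            continue
        condition = statement.get("Condition", {})
        # Required end time: the request must arrive strictly before it.
        not_after = condition.get("DateLessThan", {}).get("AWS:EpochTime")
        if not_after is None or now_epoch >= not_after:
            continue
        # Optional start time.
        not_before = condition.get("DateGreaterThan", {}).get("AWS:EpochTime")
        if not_before is not None and now_epoch <= not_before:
            continue
        # Optional source IP restriction, given as an address or CIDR range.
        allowed = condition.get("IpAddress", {}).get("AWS:SourceIp")
        if allowed is not None:
            network = ipaddress.ip_network(allowed, strict=False)
            if ipaddress.ip_address(source_ip) not in network:
                continue
        return True
    return False
```

Every input to this check comes from the request or from the policy carried with it, which is consistent with the observation that the CDN needs no per-user state beyond the verification key.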
The above authentication flows hint that, before a consumer can interact with the CDN, a backend communication happens between the CDN and the web application/client, during which the client configures the CDN and generates some token. The token is then passed to the consumer, who uses it in a request to the CDN for some asset.
However, in Joystream, the CDN (Argus) itself has to do the authentication (i.e., verifying the infra access key, verifying the persona, etc.), at least for the first request if we go without the lead node. It can then issue a token, just like the client issues it in the previous examples, which can be reused in subsequent requests to the CDN.
Comments
HTTPS so that connections are encrypted when a CDN communicates with viewers; hence, whatever scheme is used (token or signed message), the communication is protected against token reuse or replay attacks by malicious actors.
pay as you go plan.