
Add authn protocol #7

Merged: 12 commits into main, Aug 10, 2022

Conversation

@mkobetic (Contributor) commented Aug 9, 2022

This PR proposes a fairly significant change to our authn story. Instead of running a separate protocol, we would just add a Token type to the message/v1 protocol, carrying pretty much the same payload as authn.Request does. The Token would need to be attached to every request and validated by the nodes (the same kind of validation authn did), so validation would happen on every request instead of once per run of the authn protocol (although a significant chunk of it could be cached, since the token remains the same during its validity period).

The token can be carried as a standard HTTP/GRPC authorization header (the gateway maps these automatically: https://github.com/grpc-ecosystem/grpc-gateway#mapping-grpc-to-http). The server-side processing can be implemented with a GRPC interceptor.
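For illustration, here is a stdlib-only sketch of the header-extraction step such an interceptor would perform before validating the token. The names are hypothetical, not the actual node code:

```typescript
// Hypothetical sketch of the server-side step: pull the token out of the
// standard authorization header before running signature validation.
type RequestHeaders = Record<string, string | undefined>;

function extractBearerToken(headers: RequestHeaders): string {
  const auth = headers["authorization"];
  if (!auth || !auth.startsWith("Bearer ")) {
    throw new Error("missing or malformed authorization header");
  }
  return auth.slice("Bearer ".length);
}
```

The real interceptor would then run the signature and timestamp checks on the extracted token before letting the request through.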

Upsides:

  • no need to orchestrate the interaction of the two protocols
  • the token is portable across nodes, so the client doesn't need to "reauthenticate" if it gets load-balanced to a different node
  • implementation should be simpler

Downsides:

  • The token/signature validation happens on every request (we can cache some of that if needed)
  • We are losing the channel binding of the token that was mediated by the peer IDs, so the blast radius of a leaked token is necessarily larger (though comparable to JWT-style authorization)

proto/authn/v1/authn.proto (outdated thread, resolved)
@mkobetic commented Aug 9, 2022

@neekolas I'm not entirely sure what to do on the TS side. It doesn't seem like I should just add

export * from "./auth/v1/authn.pb"

to ts/index.ts. Wouldn't that yield a single package with everything mixed together?

@neekolas (Collaborator) commented Aug 9, 2022

Wouldn't that yield a single package with everything mixed together?

Ya, we should probably move to namespacing the exports. export * as authn from './auth/v1/authn.pb'

@neekolas commented Aug 9, 2022

Although we don't have to export anything right now. All we are ultimately going to care about from TypeScript is whatever GRPC Gateway API gets exposed. We shouldn't need any TypeScript for the actual protos.

@mkobetic commented Aug 9, 2022

Hrm, this will be a problem. When I run npm run generate I get:

authn/v1/authn.proto: is a proto3 file that contains optional fields, but code generator protoc-gen-grpc-gateway-ts hasn't been updated to support optional fields in proto3. Please ask the owner of this code generator to support proto3 optional.--grpc-gateway-ts_out:

@neekolas commented Aug 9, 2022

@mkobetic There is a hack listed in this issue, but it's not great. The suggestion is just to modify the generator and make it register itself as supporting optional fields when it doesn't. Given that we don't actually have optional fields (I'm not entirely sure why this is triggered, but I suspect it's because oneof fields with a single option are effectively the same as optional in proto3) that may be enough to make it generate valid code.

Open to better suggestions.

Review thread on proto/authn/v1/authn.proto:

message ClientAuthResponse {
  oneof union {
    bytes token = 1;      // if auth was successful
    string error_str = 2; // if auth was not successful

neekolas (Collaborator): Why can't we just use GRPC's native status codes, with custom messages, to pass error responses?

@mkobetic replied (Aug 9, 2022):

Maybe we can, I'm not quite sure how that works. I'm still just playing with this, trying to figure out what I want to do.

At this point I'm thinking the token could really be just a random number that we associate with the wallet when auth succeeds (this replaces the peer ID in the peer store). The client would have to pass the token along with every request to the API, and until the token is purged from the store (purgeLoop) the requests would be allowed. I think this would be equivalent to what we had.

But then it gets interesting: what happens when we purge the token and the connection is still active? Somehow the client would have to re-engage the auth flow to get a new token. Makes me wonder if we couldn't just bundle the AuthData as a token with every request to begin with. Then we don't have to worry about the protocol interplay, and requests would automatically be portable between nodes too. Thoughts?

@mkobetic replied (Aug 9, 2022):

To clarify: the token would be produced by the client; it would sign a timestamp and wallet address using the identity key signed by the wallet (very much like what Jazz had in the authn request). The server would just check the token on every request. If the token is too old (or from the future), or the signatures don't check out, the request is rejected (probably implemented using GRPC interceptors). That's it. So there wouldn't even be a separate Authn API.
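As a toy sketch of that token shape — illustrative only; a real token would carry a signature by the wallet-signed identity key, not an HMAC over a shared secret:

```typescript
import { createHmac } from "crypto";

interface Token {
  walletAddr: string;
  timestamp: number; // when the token was generated/signed
  signature: string; // stand-in for the identity-key signature
}

// Client side: "sign" a timestamp and wallet address. The HMAC here is a
// placeholder for the real identity-key signature.
function makeToken(walletAddr: string, identityKeySecret: string, now: number): Token {
  const signature = createHmac("sha256", identityKeySecret)
    .update(`${walletAddr}|${now}`)
    .digest("hex");
  return { walletAddr, timestamp: now, signature };
}

// Server side: recompute and compare; reject the request on mismatch.
function verifyToken(token: Token, identityKeySecret: string): boolean {
  const expected = createHmac("sha256", identityKeySecret)
    .update(`${token.walletAddr}|${token.timestamp}`)
    .digest("hex");
  return expected === token.signature;
}
```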

neekolas (Collaborator) replied:

That doesn't sound crazy to me. A little extra overhead verifying all the signatures on each publish, but there are no network calls or anything like that. I like that it's stateless and portable.

Apps would be able to generate tokens infinitely into the future, but that was possible with the previous flow as well (so long as they kept the same peerID). Probably not our biggest worry.

Wdyt @jazzz?

@mkobetic replied:

We can reject tokens with future timestamps.

@mkobetic replied:

Isn't the timestamp basically a "slowly changing network variable"? I don't see stockpiling as an issue. There's no point for a valid client to stockpile tokens; there's nothing to gain from it other than increasing the chance of someone stealing them.

@mkobetic replied (Aug 10, 2022):

effectively establishing an upper and lower bound on the validity of tokens. This would mean when the rest of the malicious client mitigation work is completed we won't need to worry about the risk of legacy tokens.

The bounds I want to set with this design are

  1. node_time < token.timestamp + expiration(1h?)
  2. token.timestamp < node_time

So a token is only valid between token.timestamp and token.timestamp + expiration as interpreted in node time.
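A minimal sketch of that check, with a hypothetical expiration value (the 1h cap is only the example floated above):

```typescript
const EXPIRATION_MS = 60 * 60 * 1000; // example: nodes accept tokens up to 1h old

// Mirrors the two bounds, evaluated in node time:
//   1. node_time < token.timestamp + expiration
//   2. token.timestamp < node_time
function tokenTimestampValid(tokenTimestamp: number, nodeTime: number): boolean {
  return tokenTimestamp < nodeTime && nodeTime < tokenTimestamp + EXPIRATION_MS;
}
```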

@jazzz replied (Aug 10, 2022):

There's no point for a valid client to stockpile tokens.

Agreed but I'm thinking in the context of a malicious client.

Isn't the timestamp basically a "slowly changing network variable"?

Time is deterministic, and since the timestamp field is client-supplied and untrustworthy, there's no guarantee of when that token was generated. By adding a random variable to be included, we build confidence that the token was generated after the challenge was made public.

jazzz replied:

There are other, more substantial issues that need to be addressed before this is the limiting factor, so I'm happy to revisit it later. However, having a "proof of life" style mechanic that establishes a lower bound on when the token was generated would reduce the risk of token abuse.

neekolas (Collaborator) replied:

Agreed but I'm thinking in the context of a malicious client.

At least right now, the malicious client would also have the IdentityKeys needed to generate new tokens on demand.

@mkobetic commented Aug 9, 2022

@mkobetic There is a hack listed in this issue, but it's not great. The suggestion is just to modify the generator and make it register itself as supporting optional fields when it doesn't. Given that we don't actually have optional fields (I'm not entirely sure why this is triggered, but I suspect it's because oneof fields with a single option are effectively the same as optional in proto3) that may be enough to make it generate valid code.

Open to better suggestions.

It's because of the optional Signature on the PublicKey. We can actually remove optional on it since we require the wallet signature anyway. That's also why I dropped the separate signature field from the message; we may as well use the field on the PublicKey struct (I understand that Jazz was trying to preserve the original definition, but I don't think we really need to in this case). So I just removed optional on that field and it's not complaining now.

@neekolas commented Aug 9, 2022

@mkobetic OK, great.

service AuthnApi {
rpc Authenticate (ClientAuthRequest) returns (ClientAuthResponse) {
option (google.api.http) = {
post: "/authn/v1/authenticate"
neekolas (Collaborator):

Feels like this and the next two lines should be indented. Does the formatter let you?

@mkobetic replied:

I just ran dev/lint on this and it mostly autofixes things. I'll check.

proto/authn/v1/authn.proto (outdated thread, resolved)
@mkobetic commented:

@neekolas, the latest is how it would look if Authn was just a Token that is part of the message/v1 protocol. WDYT?

Review thread on dev/ts/generate (outdated):
docker run --rm -i -v${PWD}:/code xmtp/protoc -I=./proto --grpc-gateway-ts_out=ts message_api/v1/message_api.proto message_api/v1/authn.proto
neekolas (Collaborator):

We are going to want to change this to use protoc-gen-ts_proto for the authn protos to get real protobuf generated code. This is going to generate types and code for a GRPC Gateway JSON API.

Think we'll need to add the --ts_proto_opt=esModuleInterop=true flag as well

@mkobetic replied:

Can you suggest the change you have in mind? This stuff is black magic to me.

@neekolas replied:

I'll try and commit it to this branch. It'll take a little fiddling for me to figure out too. Protoc arguments never work the way I expect them to.

@neekolas replied:

You can see the changes I made here

Basically, I added the generator to the Docker image and run it as a second codegen step, since the output formats are different.

This build is getting complicated enough I am starting to wonder if using buf would make this more maintainable/readable. I hesitated to include it originally, since it's giving us very little value in xmtp-js. But maybe it makes sense here just to make this configuration more declarative.

@mkobetic replied:

Thank you! I'd rather postpone switching to buf for a separate PR.

@neekolas replied:

Makes sense. I can take that one.


// Token is used by clients to prove to the nodes
// that they are serving a specific wallet.
message Token {
neekolas (Collaborator):
This looks good

@mkobetic commented:

I updated the PR description to summarize where we are atm (including Jazz's objections). Let me know if something's off.

I'd like to go ahead with this so that I can move forward on the node side. I'd appreciate some ✅ if we are far enough here.

// address of the wallet
string wallet_addr = 1;
// time when the token was generated/signed
uint64 timestamp = 2;
@mkobetic commented (Aug 10, 2022):

One thought: Should we turn this into valid_until to allow the client to control the validity expiration within node enforced limits? (e.g. let's say the nodes will only accept up to an hour or something). This would allow sensitive clients to use more restrictive expiration than the nodes allow.
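The suggestion amounts to the node clamping a client-chosen valid_until to its own cap. A sketch, with a hypothetical node-side limit:

```typescript
const NODE_MAX_VALIDITY_MS = 60 * 60 * 1000; // hypothetical node-enforced cap (1h)

// The node accepts the client's valid_until only up to its own limit, so a
// sensitive client can pick a shorter window but never a longer one.
function effectiveExpiry(tokenTimestamp: number, validUntil: number): number {
  return Math.min(validUntil, tokenTimestamp + NODE_MAX_VALIDITY_MS);
}
```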

@neekolas replied:

It's hard for me to imagine a client developer choosing anything less than the maximum amount of time, given that the tokens are there to protect XMTP more than they are to protect the user.

@mkobetic replied:

I look at it differently. A stolen token is not an issue for the network; it doesn't matter who used up the limit. It is an issue for the user, since it allows someone to DOS them. So I was thinking a more paranoid user might want to reduce the expiration period at the cost of having to create tokens more often.

Reply:

s/paranoid/security minded/g

@neekolas replied (Aug 10, 2022):

I still don't see it. If we have a problem where people are able to compromise these tokens and use them for DOS, the response is pretty surely going to be for compromised users to stop using the network at all. Being DOS'd for 20 minutes versus 60 feels about the same to the end user: unacceptable. Especially given the likelihood that whatever exploit gave the attacker the last token could be reused to get the new one if the user doesn't change their behavior in between.

@mkobetic replied:

Fair enough, I'll leave this be.

Reply:

I think having the time the token was generated would be helpful going forward.

Though I could see having an additional field for expiry, like in the JWT spec.

@neekolas replied:

If we care about the client being aware of when tokens are due to expire, we could always add a response header where the server provides that value. Not sure how useful it is, given that the client can always just regenerate when a request fails with a 401 status code, but that feels preferable to having the client guess when the token is going to expire.
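That regenerate-on-401 pattern could look something like the following sketch (hypothetical client-side helper, not part of this PR):

```typescript
// Don't guess at expiry: regenerate the token and retry once when the
// server answers 401 (token presumably expired).
type DoRequest = (token: string) => Promise<{ status: number }>;

async function requestWithFreshToken(
  makeToken: () => string,
  doRequest: DoRequest
): Promise<{ status: number }> {
  let res = await doRequest(makeToken());
  if (res.status === 401) {
    res = await doRequest(makeToken()); // regenerate and retry once
  }
  return res;
}
```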

@mkobetic mkobetic merged commit 1a7490b into main Aug 10, 2022
@mkobetic mkobetic deleted the authn branch August 10, 2022 18:27
@neekolas commented:

Great. This is going to be quite easy to integrate into my branch.

@mkobetic commented:

Sigh, as soon as I merged I realized that AuthData.timestamp should probably be AuthData.created_ns to fit the message/v1 theme. Worth fixing in a follow-up?

There's also PublicKey.timestamp, but changing that would move us away from the xmtp-js definitions. Not quite sure that actually matters, though.

@mkobetic commented:

And I think I tripped up with the semantic releaser thingy 🤦

@mkobetic mkobetic mentioned this pull request Aug 10, 2022
@github-actions commented:
🎉 This PR is included in version 1.1.1 🎉

The release is available on:

Your semantic-release bot 📦🚀
