Key management and initialization design #15
I know cfc_webapp.py already has support for TLS, but I haven't tested it. Here are the TLS changes I propose to make
Of course these 3 steps also require certificate generation. I think these are the extent of what is necessary to protect data and transfer the key properly. @shankari do you know if it's also necessary to modify how MongoDB transfers data to the cloud server? I'm not quite sure how that works, but I'm focusing on the steps I took to transfer the key (and also on verifying that user connections/uploads are secure).
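For concreteness, here is a minimal sketch of what enabling TLS on the server side could look like. This is not the actual cfc_webapp.py code; the handler, port, and cert/key filenames are placeholders.

```python
import http.server
import ssl

# Generate a self-signed cert for testing first, e.g.:
#   openssl req -x509 -newkey rsa:2048 -nodes -keyout server_key.pem \
#       -out server_cert.pem -days 365 -subj "/CN=localhost"

# Placeholder handler standing in for the real webapp routes.
handler = http.server.SimpleHTTPRequestHandler
httpd = http.server.HTTPServer(("0.0.0.0", 8443), handler)

# Wrap the listening socket with TLS so keys and data are encrypted in transit.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="server_cert.pem", keyfile="server_key.pem")
httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)

httpd.serve_forever()
```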
We should definitely discuss key management in greater detail.
We could use a self-signed cert to get around that. Alternatively, we could continue using a private key. I don't see why we need to switch from a regular private key to a bi-directional TLS connection. It seems more complicated wrt generating signed certs, and I am not sure what it actually buys us. Note also that we should have some way to recover the key in case people switch phones or lose their phone. Prior E2E encryption products have supported storing alternate representations (QR code, text representation) of the private key so that the user can store them offline for backup and recovery.
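A rough sketch of that offline-backup idea, assuming the `cryptography` and `qrcode` packages; the key type and filenames are purely illustrative, not what the app actually uses.

```python
import base64

import qrcode
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Illustrative key; in practice this would be the user's existing private key.
private_key = Ed25519PrivateKey.generate()
pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

# Text representation the user can print or write down.
text_backup = base64.b64encode(pem).decode()
print(text_backup)

# QR code representation for scanning on a new phone.
qrcode.make(text_backup).save("key_backup.png")
```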
These are all fantastic points and I look forward to discussing them soon. My reasoning behind bi-directional TLS was so we could ensure the user was the same user, but I'm not sure that's necessary (and it doesn't seem to be worth the hassle). Uni-directional TLS (which I went through the process of updating my examples to use and will PR soon) seems sufficient for the most part. I agree we need to talk about key recovery and the data life cycle in more detail.
@njriasan at our discussion yesterday, you said that one of the problems with the multi-step approach (client id and then private key over TLS) was that connections could be refused if the host went down. I'd like to understand that better. In particular, I don't think we expect complex multi-stage protocols between the client and the container. The container exposes a REST API, so the client sends a request and receives a response.
At a high level, I think we should focus on the case in which the container is stateless and is discarded after every API call. As David pointed out, that is definitely possible in a cloud environment, and it is the worst case for this scenario. Keeping the containers around is essentially a caching optimization: great when it works, but the system still has to function as expected when it doesn't.
And that brings me to one more question/food for thought. I think we have a pretty good sense of how key management will work for calls generated from the client. The client (e.g. the smartphone app) will be the only storage location for the key, and it will send the private key to the server with every call. However, the differentially private query layer will not have the private key when it makes calls to the container. I believe this was the reason for caching the key in the container to begin with. But if we can't assume that the key is cached in the container, then we will need a server-initiated request to the client for the key, which is likely to slow down the query response. Maybe that is acceptable, but we should discuss it.
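To make the client-side part concrete, a sketch of what "send the key with every call" could look like; the endpoint path and payload field names here are hypothetical, not the project's real API.

```python
import base64

import requests


def run_pipeline(server_url, user_uuid, private_key_bytes, payload):
    """Send one stateless API call; the key travels only inside the TLS channel
    and is never persisted by the container (hypothetical endpoint/field names)."""
    resp = requests.post(
        f"{server_url}/api/run_pipeline",
        json={
            "user_uuid": user_uuid,
            "private_key": base64.b64encode(private_key_bytes).decode(),
            "payload": payload,
        },
        verify="server_cert.pem",  # pin the (possibly self-signed) server certificate
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```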
These are definitely all great points. The issue I was raising for the REST API was rooted in how we contact each container. If we do a name lookup (the default way of doing things in Kubernetes), then yes, if the container goes down it will immediately be replaced and we can still reach the replacement directly by name. This should be how we want to do things. However, we had some questions about secure connections that pass through the load balancer (or router, as we called it yesterday). Because of this, we talked about trying to extract the specific IP address that we could use to contact the individual container/pod. This could produce issues because pods can be moved, and if we contact the IP address directly we could connect to something else, or to nothing, if the container goes down or is moved. Assuming this document can be trusted, https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-multi-ssl, I think the first two paragraphs explain why we don't want to just use the load balancer to forward to the services and instead ideally contact them directly.
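For reference, extracting pod IPs directly might look something like the sketch below, assuming the official `kubernetes` Python client; the namespace and label selector are made up for illustration.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (in-cluster config is also possible).
config.load_kube_config()
v1 = client.CoreV1Api()

# Hypothetical label identifying the per-user containers/pods.
pods = v1.list_namespaced_pod("default", label_selector="app=upc-instance")
for pod in pods.items:
    # pod_ip can change if the pod is rescheduled, which is exactly the
    # staleness concern described above.
    print(pod.metadata.name, pod.status.pod_ip)
```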
@njriasan I can think of a solution to this. May not be the best solution, but I am pretty sure it will work. And it is actually pretty elegant because the scheduler only needs to support one operation - running a script for a client. The scheduler does this by launching a container for the script and returning the container IP to the client. The client and container then handshake for attestation and authentication.
For user-initiated scripts - e.g. storing data, running inference algorithms, retrieving data - the client triggers the script in the normal way since it has the private key. For aggregate-querier-initiated scripts, the querier sends a message to the client with the query it wants to run. The client checks the user policy on participating in aggregate queries. If the query matches the policy, the client initiates the query script using the same mechanism as usual and passes in the IP address of the querier. The query script runs and sends the result to the querier. If we get attestation in general to work, the script should be able to attest the querier before it sends the data to it. Wrt the design alternatives:
It seems to me that there is a fairly simple solution to this. We only contact the IP address directly for the duration of a single call. Concretely, the flow could be something like:
The next client call will start again with step 1. This means that the vulnerable region is fairly short, and is bounded by the duration of a single call. Note also that the vulnerable region is sensitive to failures other than the container being killed. For example, the client <-> container connection could be dropped at any time due to network issues. The client has to (and does) handle connection failures in the vulnerable region anyway, primarily through retrying. So handling yet another failure mode is not really that big a deal.
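A minimal sketch of that client-side pattern under the assumptions above: ask the scheduler for a container IP, make one direct call over TLS, and restart from the lookup step if anything in the vulnerable region fails. The scheduler URL and endpoint names are hypothetical.

```python
import requests

SCHEDULER_URL = "https://scheduler.example.org"  # hypothetical


def call_container(user_uuid, script, payload, max_retries=3):
    """One logical client call: lookup, direct call, retry from scratch on failure."""
    for _ in range(max_retries):
        try:
            # Lookup step: scheduler launches/locates a container and returns its IP.
            ip = requests.post(
                f"{SCHEDULER_URL}/launch",
                json={"user_uuid": user_uuid, "script": script},
                timeout=10,
            ).json()["container_ip"]

            # Vulnerable region: talk to the container IP directly, for this call only.
            resp = requests.post(
                f"https://{ip}:8443/run",
                json=payload,
                verify="server_cert.pem",
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            # Container killed/moved or connection dropped: go back to the lookup step.
            continue
    raise RuntimeError("container call failed after retries")
```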
The code as written doesn't transfer the private keys securely over TLS. This change is almost certainly necessary to gauge performance accurately. Additionally, I should probably add an authentication mechanism (basically assign on first use) to prevent users from adding data to any already-allocated UPC instances. From a needs perspective, this authentication is likely separate from the user profile containing the legal algorithm instances (although the two could be joined by including a certificate in the user profile).
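A sketch of the "assign on first use" idea: the first caller's certificate fingerprint is bound to the UPC instance, and later calls must present the same one. The storage and fingerprint scheme below are placeholders, not a committed design.

```python
import hashlib

# Placeholder for whatever persistent store the UPC instance uses.
_instance_owner = {"fingerprint": None}


def authenticate(client_cert_der: bytes) -> bool:
    """Trust-on-first-use: bind the instance to the first certificate seen,
    then reject any caller whose certificate does not match."""
    fingerprint = hashlib.sha256(client_cert_der).hexdigest()
    if _instance_owner["fingerprint"] is None:
        _instance_owner["fingerprint"] = fingerprint  # assign on first use
        return True
    return _instance_owner["fingerprint"] == fingerprint
```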
Assigning to @njriasan (because it doesn't seem I can actually assign).