Gather use cases for service accounts as selectors #274
Conversation
## Non-Goals

* There might be other identity management constructs like SPIFFE IDs which are outside the scope of this enhancement. We can only provide a way to select constructs known to core Kubernetes.
Implementor bottlenecks - since implementations might not want to consume the whole CRD...
There was a conclusion at KubeCon that it makes more sense for this to be a new CRD?
What about an L7-layer CRD instead of the L3/L4 matches, since if it's in the policy it's usually for L3/L4 IP-based matches?
The typical/expected service mesh pattern is identity-based authorization policy at L4 (sketched after this list), with:
- some provision for configuring a default deny policy
- deny all connections except those coming from an allowlisted set of identities
- only applied logically by meshes after the bits have been allowed to flow at L3
- some mesh implementations with a CNI component may collapse this logic to enforce at L3
- potentially some use case to restrict L3 based on ServiceAccount in NetPol to drop connections before mTLS can even be established, but writing this in two separate places for defense in depth feels like a burdensome configuration for cluster operators - curious if a single vendor-neutral policy in a separate CRD can/should be applied by multiple implementations in a complementary manner
- ideally an "audit" mode to preview the impact of default deny after some granular "allow" rules have been configured
- allowing a TCP connection based on identity only if a lower layer like the CNI did not already drop/block the attempt at L3
- support for additional constraints at L7 (such as only allowing requests from a given identity on specific paths or with specific HTTP/gRPC methods)
- many (most?) meshes support this now, but historically this capability was often added years later than initial L4 AuthZ policy implementations
- ideally this is an extra optional field on an existing AuthZ policy, not a separate CRD from the L4 identity-based policy
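For context, a minimal sketch of what this L4 identity-based allowlist pattern looks like in one mesh today. Istio's AuthorizationPolicy is used purely as an illustration; the workload labels, namespaces, and ServiceAccount names below are made up.

```yaml
# Illustrative only: an L4 identity-based allowlist expressed in Istio terms.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-allow-frontend
  namespace: backend
spec:
  selector:
    matchLabels:
      app: backend            # workloads this policy applies to
  action: ALLOW               # once an ALLOW policy selects a workload,
  rules:                      # traffic not matched by any rule is denied
  - from:
    - source:
        # mTLS identities derived from ServiceAccounts
        principals: ["cluster.local/ns/frontend/sa/frontend-sa"]
```

A vendor-neutral API for this use case would presumably express the same allow-by-ServiceAccount intent without the implementation-specific principal string format.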
Thanks @mikemorris for the insights here! Very valuable. I have tried to capture some parts of this point, and the point @danwinship mentioned above, under "unresolved discussions" so that it's written down somewhere.

some mesh implementations with a CNI component may collapse this logic to enforce at L3

Would you know some sample mesh implementations that consolidate this with the CNI level? I am guessing Cilium is one?
@@ -0,0 +1,440 @@
# NPEP-133: FQDN Selector for Egress Traffic |
is this related?
Ah it looks like this is the rendered website source for a prior NPEP, probably need to just regenerate the current website content in a new PR and rebase this after that merges.
Yeah, it's not related at all; running `make docs` generated all this extra stuff. I can split it into a new PR.
Ability to express subjects and peers in (Baseline)AdminNetworkPolicy API
using [Kubernetes service accounts](https://kubernetes.io/docs/concepts/security/service-accounts/).

## Goals
I think that the main goal is to provide network policies based on identities, and service accounts are a good primitive in Kubernetes to do that.
I also think that we want this new model to be more scalable, to avoid implementations having to track all the existing Pod IPs.
@thockin your thoughts?
I don't understand the interest in describing this as being "based on identities", particularly if you say that arbitrary CIDRs can be "identities". How is calling this "based on identities" better than saying "you can select CIDRs or service accounts or pod labels"?
particularly if you say that arbitrary CIDRs can be "identities"
I don't think arbitrary CIDRs should be considered as identities - there's a bit of nuance in that some implementations of identity-based authorization policies may ultimately resolve those identities to IPs for enforcement, but an end-user would not be interleaving IP-based policy rules with identity-based policy rules even if the dataplane may collapse them together. https://www.solo.io/blog/could-network-cache-based-identity-be-mistaken and https://cilium.io/blog/2024/03/20/improving-mutual-auth-security/ have more specific details on the potential lossiness of this approach and possible mitigations.
Yea, I have added the scale and consistency user story to the next section under user stories; maybe I can change the goals to reflect "identity", or refactor the way I am writing "service account as a way to select pods".
Depending on whether we feel this is a new API CRD or part of ANP (in which case we also need to decide the points Dan is raising: https://github.com/kubernetes-sigs/network-policy-api/pull/274/files#r1850516386), we might need to tweak the TLDR/Goals sections here.
I have tried to change the TLDR and Goals to reflect the high-level abstract goals instead of the specifics I had in mind when putting them into ANP/BANP, based on the discussion we had in the sig-network-policy-api meeting on Tuesday (19th), since it's not really clear whether this is going to be part of the netpol API or a new API.
I second Mike here. The main benefits I can see are:
- Users don't interact with CIDRs/IPs.
- Some implementations of the API don't need to track IPs to provide this functionality (when using identities).
- ServiceAccounts are more secure than labels, e.g. they don't span namespaces and cannot be changed at runtime.

We don't have to be opinionated on how the API would be implemented, and service account as a way to select pods can just be an implementation detail.
My comments were based on Cilium's definitions, where (AIUI) everything you can apply a network policy toward is considered an "identity". (e.g., this blog post talks about how it turns podSelectors into "identities").
I had initially assumed people were talking about the same thing here. In particular, I had assumed that people were still talking about having a mix of "traditional" NetworkPolicy rules and new service-account-based NetworkPolicy rules, but calling the whole thing "identity-based", which seemed pointless.
But it sounds like what most of the people here actually want is a separate parallel network policy stack, which only uses "identity-based" rules based on service-account/mTLS/SPIFFE/whatever.
My comments were based on Cilium's definitions, where (AIUI) everything you can apply a network policy toward is considered an "identity". (e.g., this blog post talks about how it turns podSelectors into "identities").
Cilium identities are clearly a non-goal, and now I realize that makes the terminology confusing.
If we consider identity as "the set of characteristics that uniquely identify an application (running in a Pod)":
IPs can uniquely identify a Pod, but if a deployment is rolled out the IPs will change (pods are not restarted, they are recreated), so rules applied directly to IPs will not be identity-based.
We can use labels to identify the pods of deployments/daemonsets, but that identity system seems mostly based on implementation details of Kubernetes, and labels can be added arbitrarily.
Service Accounts, by the Kubernetes docs definition, are "a type of non-human account that, in Kubernetes, provides a distinct identity in a Kubernetes cluster. Application Pods, system components, and entities inside and outside the cluster can use a specific ServiceAccount's credentials to identify as that ServiceAccount."
I think that ServiceAccounts can work independently of how we translate the API in the implementation (map ServiceAccounts to IPs, use ServiceAccount tokens or certificates, ...), and also allow us to define cross-cluster or cross-service policies.
Hmm, my understanding was the same as Dan's, which is why I called out the non-goal here, but it seems like that is actually one of the main goals :)
## User-Stories/Use-Cases

1. **As a** cluster admin **I want** to select pods using their service accounts instead of labels **so that** I can avoid having to setup webhooks and validation admission policies to prevent users from changing labels on their namespaces and pods that will make my policy start/stop matching the traffic which is undesired
I don't think we want to "select pods", using the service accounts as an aggregation of pods; I think we want to provide identities to the applications and then implement policies based on those identities. How to implement that is a different thing.
Hmm, during the KubeCon meeting I thought that, for the purpose of this working group, if we were to put it into (admin) network policies then it has to be pods, because we don't really have any other identity in core Kubernetes that the API recognises; and do we define abstract things in the API? Plus, L3/L4 is the only place where we implement policies, not L7.
However, if we plan to go with a new CRD for expressing this, then we could call it an identity; but even then, a service account today is only for pods, right? What are the other identities we plan to cover use cases for? I called out our SPIFFE use case as a non-goal since we don't plan to address that in this API, right? I can move it out of the non-goals if the word "identity" is better suited than "serviceaccounts of pods".
I personally think we should have a separate API for this that is strictly focused on identities. The starting point has to be identities based on Service Accounts that both mesh and CNI implementations would have a path to implement.
However, if we plan to go with a new CRD for expressing this, then we could call it an identity; but even then, a service account today is only for pods, right? What are the other identities we plan to cover use cases for? I called out our SPIFFE use case as a non-goal since we don't plan to address that in this API, right? I can move it out of the non-goals if the word "identity" is better suited than "serviceaccounts of pods".
I'd personally rather focus on identity, with ServiceAccount as our starting point. Even though that's roughly the same thing, it leaves room for future expansion into other identities, potentially following a SPIFFE format. That seems like one of the key benefits of an identity-based policy, so I'd like to ensure we're leaving room in the API to eventually support identities outside of the local cluster.
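To make the "start with ServiceAccount, leave room for other identity types later" idea concrete, here is a purely hypothetical sketch. None of the group, kind, or field names below exist in any current API; they are invented only for illustration.

```yaml
# Hypothetical sketch only -- no such CRD or fields exist today.
apiVersion: policy.networking.k8s.io/v1alpha1   # group reused for illustration
kind: IdentityBasedPolicy                       # invented kind
metadata:
  name: allow-frontend-to-backend
spec:
  subject:
    serviceAccount:
      namespace: backend
      name: backend-sa
  ingress:
  - action: Allow
    from:
    - serviceAccount:
        namespace: frontend
        name: frontend-sa
    # A future identity type could slot in alongside serviceAccount, e.g.:
    # - identity:
    #     spiffeID: spiffe://example.org/ns/frontend/sa/frontend-sa
```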
## User-Stories/Use-Cases

1. **As a** cluster admin **I want** to select pods using their service accounts instead of labels **so that** I can avoid having to setup webhooks and validation admission policies to prevent users from changing labels on their namespaces and pods that will make my policy start/stop matching the traffic which is undesired
2. **As a** cluster admin **I want** to select pods using their service accounts instead of labels **so that** I can avoid the scale impact caused due to mutation of pod/namespace labels that will cause a churn which makes my CNI implementation react to that every time user changes a label.
If this ends up translating service accounts to Pods, the scale problem is going to be worse, because CNI/network plugins will need to watch both service accounts and pod objects; today they just watch one.
No, the idea is that they'd just match on `pod.spec.serviceAccountName`. They don't have to look at the actual ServiceAccounts at all.
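In other words, the matching keys off a field that is already present on every Pod the implementation watches. A minimal illustrative Pod spec (all names made up):

```yaml
# Illustrative only: an implementation matching "serviceAccount: backend-sa"
# only needs spec.serviceAccountName from Pods it already watches; it never
# has to read the ServiceAccount object itself.
apiVersion: v1
kind: Pod
metadata:
  name: backend-7d4f9
  namespace: prod
spec:
  serviceAccountName: backend-sa   # cannot be changed after the Pod is created
  containers:
  - name: app
    image: registry.example/app:latest
```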
Yea, I think we all agreed that day that "using labels to select serviceAccounts" should be dropped, because it just re-introduces the same problems; instead we would directly match on the name of the account. But yea, that is an implementation detail.
## Goals

* Implement service account as a way to select pods in the policy's subject
* Implement service account as a way to select pods in the policy's peers
There were two completely independent sets of goals in the discussion:
- Use service accounts / "identities" as a more scalable / less accidentally-explodable way of describing relevant policies about pods.
- Provide a standard way to configure/describe the service mesh behavior of intercepting traffic and deciding whether to allow it based on information in the TLS packets.
The latter part was much more complicated and problematic, and I'm not sure we came up with any good answers. In particular, there seemed to be no good way to say "deny all, except allow particular serviceAccount-to-serviceAccount connections" if the "deny all" was going to be implemented by the network plugin but the "allow based on serviceAccount" was going to be implemented by the service mesh. It seemed like the best we could do was to have the network plugin implement that as "deny all except allow particular pod-to-pod connections by matching serviceAccounts up to pod IPs", and then have the service mesh implement "allow TLS connections by matching serviceAccounts up to TLS certificates". In the normal case that would be entirely redundant: the service mesh is already ensuring that all pod-to-pod traffic uses exactly the expected TLS certificates, so the TLS-checking rule would end up allowing everything, since the IP-checking rule would already have filtered out any disallowed connections.
But I feel like, as the meeting ended, we had not yet given up on that idea? (Partly because the meeting ended abruptly because we ran out of time...)
if the "deny all" was going to be implemented by the network plugin but the "allow based on serviceAccount" was going to be implemented by the service mesh
Yea, I think that should not be expected to function seamlessly: the "deny all" behavior needs to be implemented by the mesh layer, and is only applied if the IP-based connection has been allowed by the CNI layer. Some cluster operators with a particularly "defense in depth" posture may stack policies like "deny all except port 443", or even want a ServiceAccount selector in L3 NetPol (for use case 1) in addition to cryptographic authorization provided by a mesh, although that would be particularly burdensome to operate, as it would require coordination between networking and app dev teams (who typically own AuthZ policies for their own services) to update authorization policies and network policies (which can still be a familiar pattern in more traditional organizations).
Right, but it's not necessarily "deny all". You could have an AdminNetworkPolicy that says:
egress:
- action: Deny
  to:
  - pods:
      podSelector: ...
- action: Allow
  to:
  - serviceAccount: foo
- action: Deny
  to:
  - pods:
      podSelector: ...
where the identity-based part of the policy is sandwiched between two non-identity-based parts.
There does not seem to be any plausible way to have the `podSelector` parts of that rule be implemented by the network plugin but have the `serviceAccount` part be implemented by the mesh (without implementation-specific coordination between them, or, obviously, if the network plugin is the mesh).
More generally, it feels like all network policy (in the allow/deny sense) needs to be implemented by a single component. Or at least, for every pair of destinations A and B, there must be only a single component implementing network policy between them. (eg, you could potentially have a service mesh doing TLS-based policy for pod-to-pod traffic, with another network plugin doing IP-based policy for pod-to-external and external-to-pod traffic).
There does not seem to be any plausible way to have the podSelector parts of that rule be implemented by the network plugin but have the serviceAccount part be implemented by the mesh
Yea, absolutely agreed - that was a significant motivation for the conclusion we somewhat reached at the end of the discussion that a new CRD specifically for ServiceAccount/identity-based authorization policy only would likely be preferable.
This is a good "UNRESOLVED, NEED TO THINK ABOUT" discussion that I should probably capture in one of the subsections; I will add an unresolved discussions section and start putting these there.
Thanks for writing this up @tssurya!
I will take a look at this ASAP, too (rough with US holiday this week).
Signed-off-by: Surya Seetharaman <[email protected]>
23d6732 to c53903f
1. **As a** cluster admin **I want** to select pods using their service accounts
instead of labels **so that** I can avoid having to setup webhooks and
validation admission policies to prevent users from changing labels on their
namespaces and pods that will make my policy start/stop matching the traffic
which is undesired
I am not sure I understand this user story: the only difference between labels vs serviceAccount here is that labels can be changed after pod creation, and serviceAccount can't. But the user can still create a new serviceAccount and a new pod using this account, that will not "match" a policy.
This user story kinda suggests that admin sets up ANPs using pod labels, and also creates or labels all pods in the cluster to match created ANPs, and doesn't want the users to be able to change those pod labels?
There is a huge difference IMO.
- This is as much a human issue as a technical one, but users are very used to using serviceAccount as a security boundary -- it serves essentially no purpose other than security, so it's heavily scrutinized, both in human reviews ("why are you changing the SA") and in validation tooling.
- It's far more likely to have 'update pod labels' permission on a workload than 'create a pod'. I'd wager >50% of clusters have a DaemonSet with that permission (which implies that if ANY node is compromised, any pod-label-based network policy can be bypassed).
- The cross-namespace label selection aspect is easy to get wrong (sketched below). For example "allow: role=admin" seems fine, except I gave my devs access to a namespace `dev-blah` and they have arbitrary label access. Oops! This can be avoided by using a namespace selector as well, but the API doesn't enforce it. With SA it would (presumably) be impossible to not specify the namespace.

Some more discussion in https://blog.howardjohn.info/posts/netpol-api/ and https://blog.howardjohn.info/posts/node-zerotrust/
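A minimal sketch of that cross-namespace pitfall, written against an AdminNetworkPolicy-style `pods` peer (field names as I recall them from the v1alpha1 API; labels and namespaces are made up):

```yaml
# Illustrative only. This peer matches any pod labelled role=admin in ANY
# namespace, so a dev with label-edit rights in their own namespace
# (e.g. "dev-blah") can label a pod role=admin and match the rule.
- action: Allow
  from:
  - pods:
      namespaceSelector: {}       # empty selector = all namespaces
      podSelector:
        matchLabels:
          role: admin
# A ServiceAccount-based peer would (presumably) have to name the namespace
# explicitly, which closes this particular gap.
```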
between the CNI plugin that implements the network policy at L3/L4 and
service mesh implementation that implements the policy at L7?
* One way is to probably split the implementation responsibility:
  1. CNI plugin can take care of implementing denyAll connections and
This is assuming it is either the CNI plugin or the service mesh; it can perfectly well be another component implementing it, see kube-network-policies for example. I think we need to be agnostic of the implementation and focus on the behaviors we want to implement: "we want to express that a group of pods using a specific service account can or can not communicate with another group identified by another service account". At what point the policy enforcement happens should not be in scope and should be left to the implementation.
I don't see how a non-CNI can implement it then. If both the CNI and the mesh are trying to enforce the same rule, it is either redundant (they both apply the same logic) or broken (one accepts, one denies).
For instance, I may have kube-network-policies and Istio, and say "allow SA foo". I get a TLS request from out-of-cluster with a TLS SAN mapping to `foo`. Istio would allow it but kube-network-policies would deny it.
IMO we need something like `networkPolicyClass` to specifically select which layer handles it.
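A purely hypothetical sketch of that `networkPolicyClass` idea; the field name and value below are invented, and nothing like this exists in the API today.

```yaml
# Hypothetical only -- "networkPolicyClass" is not a real field.
# The intent: the policy names which layer/implementation should enforce it,
# so a CNI plugin and a mesh do not both try to apply the same rule.
spec:
  networkPolicyClass: example.io/mesh-identity   # invented value
  ingress:
  - action: Allow
    from:
    - serviceAccount:        # invented peer type, for illustration
        namespace: frontend
        name: frontend-sa
```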
with other types of selectors like namespaces, pods, nodes, networks, fqdns and
expecting mesh implementations to implement the full CRD just to get the identity
based selection might not be practical
* Making it part of new API: We fall back to compatibility problem of the different
Why compatibility issues? The different policies applied should be ANDed; in what sense is this incompatible?
I don't see compatibility issues, but I can see terrible UX if we end up with another API which sits on top of netpol. This means that as a user, to allow point A to talk to point B on a cluster with default deny, I need to:
- add a netpol to allow the traffic
- add an identity-based policy to allow it
- potentially add an L7 policy if I want L7-attribute-based policy.
I think there are a non-trivial set of implementations that would like to support an identity-based version of network policy, but would not support the additional set of APIs based on label selectors. To me, that suggests that it would be cleaner to have a new API focused on identity-based APIs.
I agree that @LiorLieberman's example is not a great UX, but I'd hope that most users would be fully served by only one of these APIs and would not need to layer like that. Could we express default deny within this new policy to avoid layering?
FWIW in Istio we would ideally have a policy that consists only of SA/identity and not pod label selectors.
CNI component in them that collapses the logic of enforcement - so maybe
that might become a requirement that the implementation be able to handle
end to end enforcement of the full policy
* We might also want to express additional constraints at L7 (such as only
I don't think we should put this in scope; L7 is a totally different layer with much more complexity and an infinite number of protocols with different semantics.
+1, I think we should not care about it and leave it to Gateway/mesh implementations. It will most probably be a different API.
I generally agree. FWIW Istio literally splits these layers into two components (so there are really 3 layers in the system: L3/L4 CNI, L4 TLS, L7 TLS + HTTP)
Thanks for capturing the notes @tssurya! I left a few comments
This NPEP tries to capture the design details for service accounts as selectors
for policies.

## User-Stories/Use-Cases
One of the goals/motivations is also to decrease implementation complexity and allow larger scale. I think we want to cover this here or in the goals section.
* NetworkPolicies apply on L3 and L4 while Meshes operate at L7 mostly. So when
trying to express "denyAll connections except the allowed serviceAccount to
serviceAccount connections"; how do we split the enforcement in this scenario
between the CNI plugin that implements the network policy at L3/L4 and
service mesh implementation that implements the policy at L7?
* One way is to probably split the implementation responsibility:
  1. CNI plugin can take care of implementing denyAll connections and
     allowed serviceAccount to serviceAccount connections up to L3/L4
  2. service mesh implementation can implement the allowed serviceAccount to
     serviceAccount connections on L7
This concerns me the most. I can't think of a good way to divide the responsibility between mesh and CNI in the same API without creating an extremely complex UX (and perhaps a less portable experience).
service mesh implementation that implements the policy at L7?
* One way is to probably split the implementation responsibility:
  1. CNI plugin can take care of implementing denyAll connections and
     allowed serviceAccount to serviceAccount connections up to L3/L4
The problem with this, as I was saying in #274 (comment), is that it seems pointless. If this is going to work correctly, without accidentally allowing any connections that aren't supposed to be allowed, then the ANP/NP implementation needs to let through exactly the set of connections that the mesh implementation is going to inspect, which implies it needs to understand "serviceAccount selectors" and how they correspond to pod IP addresses. But at that point, it can just validate the serviceAccount selectors itself, based entirely on IPs, and the semantics of the service mesh guarantee that this is exactly equivalent to validating based on the TLS certificates, so the mesh doesn't actually have to do anything.
So we've been thinking about the TLS-based validation as being something roughly parallel to the "NetworkPolicy" step of our policy evaluation chain:
           +--------------------+
           | AdminNetworkPolicy |
           +--------------------+
                    /\
                   /  \
                  /    \
 +---------------+      +------------------+
 | NetworkPolicy |      | TLS-based policy |
 +---------------+      +------------------+
                  \    /
                   \  /
                    \/
      +----------------------------+
      | BaselineAdminNetworkPolicy |
      +----------------------------+
but what if instead it was parallel to the BANP step?
   +--------------------+
   | AdminNetworkPolicy |
   +--------------------+
             |
             |
             |
      +---------------+
      | NetworkPolicy |
      +---------------+
            /  \
           /    \
          /      \
  +------+        +-----------+
  | BANP |        | TLS-based |
  +------+        +-----------+
We'd say, if you want to use TLS-based policy, then that becomes the baseline (at least for pod-to-pod traffic). Then we don't have to worry about interleaving IP-based and TLS-based policy, because you'd only reach the TLS-based policy step if you had already exhausted the IP-based steps. (This doesn't mean you can't have "deny all unless accepted by serviceAccount match", it just means that the "deny all" part has to be implemented by the TLS-based policy implementation rather than by the core ANP/NP implementation in this case.)
This may go with the idea I had vaguely tossed out about replacing BaselineAdminNetworkPolicy with "ClusterDefaultPolicy" as part of trying to make tenancy make sense...
at that point, it can just validate the serviceAccount selectors itself based entirely on IPs, and the semantics of the service mesh guarantee that this is exactly equivalent to validating based on the TLS certificates, so the mesh doesn't actually have to do anything.
Isn't there some eventual consistency at play here? With a TLS certificate, I know the identity of the Pod with confidence. With an IP address, I'm assuming it matches the Pod that had that IP based on the most recent data I have, but if my Pod -> IP mapping is out of date, this isn't as strong of a guarantee as a TLS cert is.
This doesn't mean you can't have "deny all unless accepted by serviceAccount match", it just means that the "deny all" part has to be implemented by the TLS-based policy implementation rather than by the core ANP/NP implementation in this case.
+1, I think I was trying to get to a similar point in another comment. "Deny all" could also be implemented at the identity-based policy layer.
This may go with the idea I had vaguely tossed out about replacing BaselineAdminNetworkPolicy with "#178 (comment)" as part of trying to make tenancy make sense...
It would definitely be worth exploring a way to express "default deny" in a sufficiently generic way that any category of implementation could support it. For example, that seemed to be focused on namespace granularity, which could be some common ground in terms of implementability.
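As one implementation-specific, purely illustrative data point for "the deny-all part is implemented by the TLS-based policy layer": in Istio, an ALLOW policy with no rules matches nothing and therefore denies all traffic to the selected workloads, so the baseline deny and the ServiceAccount allow can both live at the identity layer (names below are made up):

```yaml
# Illustrative only: baseline deny plus allow-by-ServiceAccount, both enforced
# at the mesh/identity layer, expressed in Istio terms.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing          # ALLOW policy with no rules = deny everything
  namespace: backend
spec: {}
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: backend
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/frontend/sa/frontend-sa"]
```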
3. **As a** cluster admin my workloads have immutable identities and **I want**
to apply policies per workloads using their service accounts instead of labels
**since** I want to have eventual consistency of that policy in the cluster.
Plausibly: As a cluster admin, I want to assert policies against workloads outside of the cluster
Whether this is to other k8s clusters or non-k8s, having SA enables this if you make the implementation choice that SA maps to a transport identity (i.e. in Istio it maps to TLS SAN)
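For reference, the ServiceAccount-to-transport-identity mapping mentioned above looks roughly like this in SPIFFE-flavoured implementations; the trust domain and names are illustrative, not part of any proposed API:

```yaml
# Illustrative only: the ServiceAccount...
serviceAccount:
  namespace: prod
  name: backend-sa
# ...is carried by SPIFFE-based meshes as an X.509 SAN such as
#   spiffe://cluster.local/ns/prod/sa/backend-sa
# which is why the same identity can also be asserted by peers outside the
# local cluster (other clusters or non-Kubernetes workloads sharing the
# trust domain).
```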
serviceAccount connections"; how do we split the enforcement in this scenario
between the CNI plugin that implements the network policy at L3/L4 and
service mesh implementation that implements the policy at L7?
* One way is to probably split the implementation responsibility:
An alternative option is to have a networkPolicyClass (or similar).
Lots of comments here! :) I will look through them and try to cover/reflect the discussions, or at least write them down, in the next update I push.
This came from discussions at KubeCon, I want to make sure they are captured here correctly.
Closes: #173