Network Policy Not Enforced on Initial Creation #271
Comments
@kervrosales Are you saying that deleting and recreating the NP solved it, or are you restarting pods as well? Pods will be in default-allow mode until the network policies are reconciled against them, and this can take up to 2-3s depending on cluster load. Please check whether this situation applies to you; if it does, please try Strict mode and let us know if it helps.
@achevuru Thanks for your reply! That is correct; deleting and recreating the NetworkPolicy resolved the issue. I had to do this for each namespace that requires a NetworkPolicy. However, restarting the pods did not resolve the issue, which leads me to believe that the problem I am experiencing might be different from what was previously mentioned. Can you confirm that enabling "strict" mode on the AWS CNI is achieved by updating the DaemonSet in the kube-system namespace? I have done this, but for some reason it does not seem to work: none of my new pods get the default-deny policy by default. I have confirmed that the container has the correct environment variables.
Thanks!
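For reference, a minimal sketch of how strict mode is typically enabled — by setting the enforcing-mode environment variable on the network policy agent container of the `aws-node` DaemonSet. The variable and container names follow the aws-network-policy-agent documentation; verify them against your installed version, since addon-managed clusters may override manual edits:

```shell
# Set strict enforcement on the network policy agent container
# (container name per the standard aws-node DaemonSet layout).
kubectl -n kube-system set env daemonset/aws-node \
  -c aws-network-policy-agent \
  NETWORK_POLICY_ENFORCING_MODE=strict

# Verify the variable actually landed on that container:
kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="aws-network-policy-agent")].env}'
```

Note that on EKS-managed addon installs, such changes can be reverted on the next addon update unless made through the addon configuration.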
This is expected, albeit I do not know why the AWS implementation deviates from the official K8s spec for network policies. This choice has severe security implications: pods might not be isolated for the first few seconds after they start, giving a malicious actor the opportunity to perform network actions in this short time frame. @achevuru, IMO this should be addressed immediately, as a breach of network allow rules during pod startup poses a high security risk, especially for workloads that cannot be scanned for security issues.
@samox73 Above doc calls out a Also, in the
@achevuru This is not exactly the desired behavior either, though. Strict mode blocks all traffic by default, so one would have to write policies for every workload in the cluster that needs ingress or egress connections, which can be a time-consuming task taking multiple days for big clusters. It would be very useful if pods that are associated with a network policy had traffic blocked from the start, while pods that match no network policy kept the default-allow behavior.
@samox73 So, I'll summarize the options available right now:
Also, if you want to reduce the initial time period (1-3 seconds, based on load) during which a pod is either in default-allow or default-deny mode (based on the NP enforcement mode selected), you can use the ANNOTATE_POD_IP feature of VPC CNI. We introduced this feature precisely for this scenario, originally for the Calico network policy solution, and we extended it to VPC CNI's network policy implementation as well. With this feature, the initial 1-3s period should drop to less than a second (or even lower in most cases).
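A sketch of how this feature is commonly enabled, assuming the `ANNOTATE_POD_IP` environment variable documented for the VPC CNI and its `vpc.amazonaws.com/pod-ips` pod annotation; check your CNI version's documentation before relying on the exact names:

```shell
# Enable pod-IP annotation so the agent can resolve a pod's IP from its
# annotation instead of waiting on IPAM state to propagate.
kubectl -n kube-system set env daemonset/aws-node ANNOTATE_POD_IP=true

# Once the DaemonSet has rolled out, new pods should carry the annotation
# (annotation key per the VPC CNI documentation):
kubectl get pod <pod-name> \
  -o jsonpath='{.metadata.annotations.vpc\.amazonaws\.com/pod-ips}'
```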
ANNOTATE_POD_IP seems to fix the issue on my side too.
Using ANNOTATE_POD_IP, it appears that setting this only drops the initial period down by maybe a second, so it still takes a while for the network policy to be enforced (or, with the strict enforcement mode, for allowed traffic to start flowing). I have confirmed that the annotation includes the pod IP, etc. Is there something I can do to help speed up the pod startup even further?
@achevuru I think the comment @samox73 made is being somewhat underestimated. With enforcing mode standard, every pod is unprotected (default-allow) for its first seconds, so network policies are briefly not enforced. With enforcing mode strict, every pod starts in default-deny, so workloads that no policy selects lose connectivity.
We really need an implementation that follows the k8s spec for network policies.
Hi @achevuru @jayanthvn @jaydeokar, I wanted to follow up on the deviation of the AWS VPC CNI from the Kubernetes specification. Could you clarify whether this is an intentional divergence, or whether there are plans to align it with the official spec in the future? If there are ongoing efforts in this direction, any insights on the timeline would be greatly appreciated. Perhaps you could also guide a new contributor on where to start fixing this? I would be open to working on it. Thanks!
I have some crow to eat here, I think. Looking closely at the netpol behaviour spec, I think what it says about pods being "brought up in isolation" describes the behaviour of the vpc-cni in strict mode. The non-normative text explicitly calls out that pods should be able to operate if their required networking appears after they have started. We have in fact seen issues with coredns where that didn't happen, but I think that's just a bug. I am led to conclude that the misbehaviour of things like external-dns (which tries on startup to create an AWS client and fails because it can't retrieve a region, for instance) is actually a bug in that software; pods should retry client creation until it succeeds in those circumstances (and report via readiness when they are running correctly).
I am experiencing an issue with network policies not being enforced upon their initial creation in my Kubernetes cluster using the AWS Network Policy Agent. The policies only take effect after being deleted and re-created. Below are the details of my configuration.
Kubernetes YAML resources
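The original manifests are not reproduced here. A minimal, hypothetical NetworkPolicy of the kind described — selecting the demo-app pods and allowing ingress only from an assumed `app=allowed-client` label, so that traffic from client-one should be denied — might look like:

```shell
# Hypothetical reproduction manifest; resource names and labels are
# assumptions, not the issue author's actual YAML.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: demo-app-ingress
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: demo-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: allowed-client
EOF
```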
What happened:
When the network policy is initially created, it does not enforce the ingress rules as expected. I am still able to access the demo-app service from the client-one pod. However, after deleting and re-creating the network policy, the policy is enforced correctly, and access is denied as expected.
Attach logs
Initial Network Policy Creation logs from Network Policy agent:
not-working-infra.json
Delete of Network Policy logs:
delete-log-infra.json
Recreation of Network policy logs:
working-infra.json
What you expected to happen:
The network policy should enforce the ingress rules upon initial creation without requiring deletion and re-creation.
How to reproduce it (as minimally and precisely as possible):
1. Deploy the Kubernetes resources:
2. Test connectivity:
3. Delete and re-create the network policy:
4. Test connectivity again:
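The steps above can be sketched as a command sequence; the resource and file names are assumptions, since the issue's actual commands are not shown:

```shell
# 1. Deploy the workloads and the NetworkPolicy.
kubectl apply -f resources.yaml

# 2. Test connectivity from the client pod (reportedly succeeds even
#    though the policy should deny it).
kubectl exec client-one -- curl -m 2 http://demo-app

# 3. Delete and re-create the network policy.
kubectl delete networkpolicy demo-app-ingress
kubectl apply -f networkpolicy.yaml

# 4. Test connectivity again (now blocked, as expected).
kubectl exec client-one -- curl -m 2 http://demo-app
```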
Anything else we need to know?:
I have verified the CNI plugin configuration and ensured that it supports network policies. This issue seems to be related to the timing or synchronization of the policy application.
Environment:
EKS Version: 1.27
CNI Plugin: v1.18.1-eksbuild.1