Network denies despite allow-all policy (strict mode) #288
Comments
@creinheimer If I understood the issue accurately, you have pods that are only configured with an […]. So, the issue with […]
Hi @achevuru, I mentioned the other issues to provide you some context.
Yes. That happens sporadically on different pods even though we have an […]
I would suggest we focus on understanding why denials occur sporadically (sometimes hours after pods have been running) despite having an allow-all rule applied to all namespaces.
Hi, we are experiencing a similar issue with […]. Pods can start but are unable to connect to any host. They end up in a crash loop (due to timeouts in the app) and never recover. Only removing the pods manually resolves this. We moved to strict mode because we were experiencing dropped connections with workloads shortly after pod start. These are the network-policy-agent logs: network-policy-agent.log. The pod name is […]. We can easily reproduce this.
[edit] I am happy to supply any debugging information to help resolve this.
Hello @pelzerim, we are seeing the same behaviour, as our use case likewise involves a short pod lifecycle of 10-15 seconds. The init container workaround that you currently have, is it 100% effective, or are you still observing issues after that workaround? Thanks!
Hey @anshulpatel25, the init container workaround only works for standard mode. We've determined that it's not actually the log line that does the magic but the minimum wait time of 1 second. Unrelated to that issue, strict mode currently seems to be incompatible with high pod churn (see my previous comment).
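For readers looking for a concrete starting point, a minimal sketch of such an init container, assuming all it has to do is wait roughly a second before the main container starts, might look like the following (the image and names are illustrative assumptions, not the commenter's actual manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                      # illustrative name
spec:
  initContainers:
    - name: wait-for-network-policy
      image: busybox:1.36                # any small image with a shell works
      command: ["sh", "-c", "sleep 1"]   # the ~1 second wait described above
  containers:
    - name: app
      image: example.registry/app:latest # placeholder for the real workload image
```

As noted in the comment above, this kind of wait only helps in standard enforcement mode.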
Hi @pelzerim, can you shed some light on what the implementation of this init container looks like? Does it just wait? So far we have tried a static wait time in the regular container, but this does not seem to affect the problem.
@creinheimer We have a couple of fixes that went in with the latest release, v1.1.3. Can you try with the latest image and let us know if you are still seeing the issue: https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.18.5
Hello,
For several weeks we've been working on implementing network policies using the AWS solution. However, we've encountered various challenges along the way. Initially, we discovered that using the standard enforcement mode could lead to network instability. As a result, we decided to use the so-called strict mode.
In this thread #271 (comment), @achevuru suggested that we could create an allow-all policy for each namespace and that the only side effect would be the deny mode during the first seconds of a newly launched pod. We then created an allow-all policy on all namespaces and enabled the annotate Pod IP flag to allow faster network policy evaluations.
Now we have a new issue: pods in namespaces with an allow-all network policy are still experiencing network denies. This isn't limited to the initial startup period. It's happening long after pods have been running, sometimes hours later.
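For reference, a typical allow-all NetworkPolicy looks roughly like the following minimal sketch (illustrative only; the actual manifest used here is in the "NetworkPolicy allow-all" section below and may differ in details):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
  namespace: example-namespace   # one policy per namespace; name is illustrative
spec:
  podSelector: {}                # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}                         # allow all inbound traffic
  egress:
    - {}                         # allow all outbound traffic
```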
This behaviour is causing various problems, including pod crashes. In some cases, even the pod's internal health checks are being denied, triggering unnecessary restarts.
Can you provide any insight into why this might be happening? Am I missing something?
More info:
Deny logs from pods to control-plane
Deny logs from pods to pods in the same namespace
These are just a few of them. We had approximately 200 denies over the last 15 minutes.
NetworkPolicy allow-all
Environment:
Kubernetes version: v1.29.4-eks-036c24b
Amazon VPC CNI version: v1.18.1
Network policy agent version: v1.1.2
OS: Amazon Linux 2
Kernel: 5.10.215-203.850.amzn2.aarch64
Our AWS CNI deployment uses the default Helm chart with the following variables:
AWS-CNI configuration
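As a rough illustration of the settings discussed in this issue (strict enforcement plus pod IP annotation), the relevant values look something like the sketch below. The actual configuration is in the collapsed section above; the key names here are assumptions and may differ between chart versions:

```yaml
# Hedged sketch: values that map to environment variables on the aws-node
# DaemonSet. Exact Helm value keys are assumptions, not the actual config.
enableNetworkPolicy: "true"                 # enables the network policy agent
env:
  NETWORK_POLICY_ENFORCING_MODE: "strict"   # strict mode: default-deny until policies are attached
  ANNOTATE_POD_IP: "true"                   # annotate pod IPs for faster policy evaluation
```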
Note:
@jayanthvn that's a follow-up of #73 (comment).