[Policy Assistant] Add support for k8s native workload traffic #227
Conversation
Hi @gabrielggg. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
✅ Deploy Preview for kubernetes-sigs-network-policy-api ready!
/ok-to-test
/retest
Hi @gabrielggg, thanks for the PR. You are speedy! ⚡😅 Sorry that I was not as quick to respond.
Taking a step back, I think it'd be easiest to finalize the data structures (and how to populate them) before starting the code that will use the data structures.
These are the parts of #220 that I'd recommend we focus on first:
TL;DR
Write go code to get a Deployment/DaemonSet from a cluster and create a corresponding TrafficPeer (see struct referenced below).
...
It would be nice if a user could instead reference a Pod/Deployment/DaemonSet, and then Policy Assistant queries someone's cluster to fill in:
- pod labels
- namespace labels
- node labels
- IP (or IPs of a Deployment, for instance)
We could start by building go code to convert a Deployment or DaemonSet to a TrafficPeer for a user's Kubernetes cluster.
Data Format
I think your JSON design is nearly there. The main caveat with it is that it will be hard to track different IPs for the Pods of a workload.
So specifically, I would probably convert these structs
type TrafficPeer struct {
	Internal *InternalPeer
	IP string
}
type InternalPeer struct {
	PodLabels map[string]string
	NamespaceLabels map[string]string
	Namespace string
	NodeLabels map[string]string
	Node string
}
to
type TrafficPeer struct {
	Internal *InternalPeer
	// keep this field for backwards-compatibility or for IPs without internalPeer
	IP string
	// use this for pod IPs
	*Workload
}
type InternalPeer struct {
	PodLabels map[string]string
	NamespaceLabels map[string]string
	Namespace string
	// I believe I added these node pieces. We can remove
}
type Workload struct {
	// format: namespace/kind/name
	fullName string
	pods []PodNetworking
}
type PodNetworking struct {
	IP string
	// don't worry about populating below fields right now
	IsHostNetworking bool
	NodeLabels []string
}
Feel free to suggest changes as well. For instance, I think the placement of the workload name (fullName) field is somewhat arbitrary and could always be refactored later to make using the structs easier.
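For concreteness, a rough sketch of converting a Deployment into this proposed Workload shape with client-go could look like the following. This is an illustration only, not code from the PR: the deploymentToWorkload helper name, the package, and the clientset wiring are assumptions, and it assumes the structs above live in the same package.
package matcher // assumption: same package as the TrafficPeer/Workload structs above

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deploymentToWorkload (hypothetical helper) looks up a Deployment, lists its Pods
// via the Deployment's selector, and records one PodNetworking entry per Pod IP.
func deploymentToWorkload(ctx context.Context, client kubernetes.Interface, namespace, name string) (*Workload, error) {
	dep, err := client.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("failed to get deployment %s/%s: %w", namespace, name, err)
	}

	selector := metav1.FormatLabelSelector(dep.Spec.Selector)
	podList, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return nil, fmt.Errorf("failed to list pods for deployment %s/%s: %w", namespace, name, err)
	}

	workload := &Workload{fullName: fmt.Sprintf("%s/deployment/%s", namespace, name)}
	for _, pod := range podList.Items {
		workload.pods = append(workload.pods, PodNetworking{
			IP:               pod.Status.PodIP,
			IsHostNetworking: pod.Spec.HostNetwork,
		})
	}
	return workload, nil
}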
Populating the Data from Cluster
After we finalize the structs, would you be able to make those modifications? Could you please also see where TrafficPeer.IP is referenced? We can worry about what to do where it is referenced later.
Then, would you be ok writing go code to translate a user input (say "ns-dev/deployment/frontend") into your modified TrafficPeer struct? I would recommend we don't modify any code for the analyze command in this first PR.
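For illustration only (the parseWorkloadRef name is made up, not part of the PR), splitting that input before querying the cluster could be as simple as:
package main

import (
	"fmt"
	"strings"
)

// parseWorkloadRef splits the suggested "namespace/kind/name" form,
// e.g. "ns-dev/deployment/frontend" -> ("ns-dev", "deployment", "frontend").
func parseWorkloadRef(ref string) (namespace, kind, name string, err error) {
	parts := strings.Split(ref, "/")
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("expected namespace/kind/name, got %q", ref)
	}
	return parts[0], strings.ToLower(parts[1]), parts[2], nil
}

func main() {
	ns, kind, name, _ := parseWorkloadRef("ns-dev/deployment/frontend")
	fmt.Println(ns, kind, name) // prints: ns-dev deployment frontend
}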
Hey @gabrielggg I accidentally published my review before I finished writing. When you can, please take a look at the edited text above and let me know how this sounds.
Hey @gabrielggg, taking a second look I'd approve your original idea, and my only suggestion is adding the pods field to InternalPeer. Let's keep everything internal (as in internal to cluster). So like:
type TrafficPeer struct {
	Internal *InternalPeer
	// IP external to cluster
	IP string
}
// Internal to cluster
type InternalPeer struct {
	// optional: if set, will override remaining values with information from cluster
	workload string
	PodLabels map[string]string
	NamespaceLabels map[string]string
	Namespace string
	// optional
	Pods []*PodNetworking
}
I still would prefer to save the traffic command logic for a follow-up PR, if you don't mind. So this PR would just modify data structures and define a function that populates InternalPeer based on workload name.
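As a rough, hypothetical sketch of what that follow-up function could look like (the populateInternalPeer name, the deployment-only branch, and the client wiring are my assumptions rather than the PR's code), population from the cluster might go roughly like this:
package matcher // assumption: same package as the structs above

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// populateInternalPeer fills the remaining InternalPeer fields from the cluster
// when the optional workload field ("namespace/kind/name") is set.
func populateInternalPeer(ctx context.Context, client kubernetes.Interface, peer *InternalPeer) error {
	if peer.workload == "" {
		return nil // nothing to resolve; the caller supplied labels/namespace directly
	}
	parts := strings.Split(peer.workload, "/")
	if len(parts) != 3 {
		return fmt.Errorf("expected namespace/kind/name, got %q", peer.workload)
	}
	namespace, kind, name := parts[0], strings.ToLower(parts[1]), parts[2]

	ns, err := client.CoreV1().Namespaces().Get(ctx, namespace, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("failed to get namespace %s: %w", namespace, err)
	}
	peer.Namespace = namespace
	peer.NamespaceLabels = ns.Labels

	// Only the Deployment case is sketched; DaemonSets, StatefulSets, ReplicaSets, and bare Pods
	// would follow the same pattern with their own client-go getters.
	if kind != "deployment" {
		return fmt.Errorf("kind %q not handled in this sketch", kind)
	}
	dep, err := client.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("failed to get deployment %s/%s: %w", namespace, name, err)
	}
	peer.PodLabels = dep.Spec.Template.Labels
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: metav1.FormatLabelSelector(dep.Spec.Selector),
	})
	if err != nil {
		return fmt.Errorf("failed to list pods for %s/%s: %w", namespace, name, err)
	}
	for i := range pods.Items {
		peer.Pods = append(peer.Pods, &PodNetworking{IP: pods.Items[i].Status.PodIP})
	}
	return nil
}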
No worries @gabrielggg. It's nice how we can reuse the Translate() function! Thanks for taking the initiative to do the other workload types too. If we're doing all workloads, I was envisioning a function that gets all Pods in the cluster and creates only one TrafficPeer per Pod. Would you mind modifying your code to support this? Maybe handle all Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets, and finally remaining Pods.
Could you add results of testing this in the test/ directory? And if you don't mind, let's also add test yamls for a basic Pod, StatefulSet, and ReplicaSet (yamls which are not associated with a Deployment or DaemonSet).
This is awesome to see. Everything's coming together nicely 😁
I've created a separate func for each workload type, but maybe it would be better to put everything into one func to reuse the code. Please tell me what you think about it.
Looks ok to me but up to you what seems easier to write/read/maintain 🙂
@@ -5,7 +5,10 @@ import (
	"strings"

	"github.com/mattfenwick/collections/pkg/slice"
	"github.com/mattfenwick/cyclonus/pkg/kube"
do we need to import policy-assistant's kube package instead?
I think this is related to this issue #170
if !workloadOwnerExists {
	logrus.Infof("workload not found on the cluster")
	internalPeer = InternalPeer{
		Workload: "",
sorry for the churn. Could we actually return an error here instead? I think that might be a more natural way to convey failure to find the specified workload. If we do this, we might as well return errors at these two locations too:
ns, err := kubeClient.GetNamespace(workloadMetadata[0])
kubePods, err := kube.GetPodsInNamespaces(kubeClient, []string{workloadMetadata[0]})
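For example, a sketch of the suggested change at those two call sites, wrapping the errors instead of logging fatally (it assumes the surrounding function is changed to return an error):
ns, err := kubeClient.GetNamespace(workloadMetadata[0])
if err != nil {
	return nil, fmt.Errorf("failed to get namespace %s: %w", workloadMetadata[0], err)
}

kubePods, err := kube.GetPodsInNamespaces(kubeClient, []string{workloadMetadata[0]})
if err != nil {
	return nil, fmt.Errorf("failed to list pods in namespace %s: %w", workloadMetadata[0], err)
}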
hey @huntergregory, thanks for the review. The problem here is that doing a fatal exits the program when a workload is scaled down to 0. For example, if you have a deployment with 0 replicas, it crashes there when you call the function to map all the deployments on the cluster to TrafficPeers. Have you thought about that scenario?
My suggestion is nitpicky, and I'm noticing it requires a good amount of change, so honestly feel free to leave everything as is 🙂. I was more so suggesting that we could change the function signature to:
func (p *TrafficPeer) Translate() (*TrafficPeer, error) {
and instead of using logrus.Fatal(), we could start returning errors to be handled as needed:
return nil, fmt.Errorf("failed to get workload: %w", err)
But this might be a lot of work for little reward: if you replace logrus.Fatal() in this function, you might as well replace it in all functions. So feel free to leave as is 🙂
for example if you have a deployment with 0 replicas
Good call out. Do you mind calling this out somehow in the code too?
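Something as small as this could do it (a sketch only; the workloadPods and workloadName variable names are assumptions based on the snippets above, not the PR's code):
// A workload scaled to 0 replicas legitimately has no Pods, so an empty pod list
// should not be fatal; keep Pods empty and continue.
if len(workloadPods) == 0 {
	logrus.Infof("no pods found for workload %s (it may be scaled down to 0 replicas)", workloadName)
}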
Hey @huntergregory, thanks a lot for the review. I'm working on the observations, but I have one doubt regarding this comment:
The PodsToTrafficPeers() function does exactly that right now: it creates a TrafficPeer per Pod on the cluster. Maybe I'm not understanding what you mean; if so, can you please elaborate a little bit? Thanks a lot in advance.
Hey sorry @gabrielggg, I expressed this incorrectly. I mean for each Pod to be associated with exactly one TrafficPeer. For example, all Pods in a Deployment should be associated with the same TrafficPeer (with the Deployment workload type). Those same Pods should not be associated with a TrafficPeer with the ReplicaSet or Pod workload type. So there should be fewer TrafficPeers than Pods. The function logic would be something like:
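A rough illustration of that ordering (my sketch; the dedupeWorkloadPeers name and the per-kind slices are assumptions, not the PR's actual code) is:
// dedupeWorkloadPeers keeps one TrafficPeer per workload and ensures each Pod is
// claimed by exactly one TrafficPeer, walking the workload kinds in priority order.
func dedupeWorkloadPeers(deployments, daemonSets, replicaSets, statefulSets, pods []*TrafficPeer) []*TrafficPeer {
	claimed := map[string]bool{} // pod IP -> already assigned to a TrafficPeer
	var result []*TrafficPeer

	addUnclaimed := func(candidates []*TrafficPeer) {
		for _, peer := range candidates {
			anyNew := false
			for _, p := range peer.Internal.Pods {
				if !claimed[p.IP] {
					claimed[p.IP] = true
					anyNew = true
				}
			}
			if anyNew {
				result = append(result, peer)
			}
		}
	}

	// Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets,
	// and finally remaining bare Pods.
	addUnclaimed(deployments)
	addUnclaimed(daemonSets)
	addUnclaimed(replicaSets)
	addUnclaimed(statefulSets)
	addUnclaimed(pods)
	return result
}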
thanks for the clarification @huntergregory, I think that makes sense. I will be working on it.
hey @huntergregory, I managed to get each Pod to be associated with exactly one TrafficPeer (you can see those inputs/outputs in the test files). Please check that out. I also added all the test files used for testing, including workload yamls and JSON outputs of each function; please check those out too and tell me what you think.
Nice! Love that it was a simple solution 🙂. Thanks for adding all of the yamls/outputs too.
I just noticed a tiny edge case and had two last nitpicks.
hey @huntergregory, thanks for the review. I fixed the edge case and removed the log. Please check it out.
Thank you @gabrielggg for the HUGE help and persistence. This will be key infra for Policy Assistant going forward.
/lgtm 🚀
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gabrielggg, huntergregory
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @huntergregory, as we discussed in #168, this is to solve #220. I've implemented this so that a user can use both types of traffic (native, non-native) in the same traffic.json file (please refer to https://github.com/gabrielggg/network-policy-api/blob/main/cmd/policy-assistant/examples/traffic.json to see some examples). Now TrafficPeer supports both. I hope you like this; please give some feedback.
For example, for a traffic input like this one:
{
  "Source": {
    "Internal": {
      "Workload": {"daemonset": "fluentd-elasticsearch"},
      "Namespace": "kube-system"
    }
  },
  "Destination": {
    "Internal": {
      "Workload": {"deployment": "nginx-deployment2"},
      "Namespace": "default"
    }
  },
  "Protocol": "TCP",
  "ResolvedPort": 80,
  "ResolvedPortName": "serve-80-tcp"
},
{
  "Source": {
    "Internal": {
      "Workload": {"daemonset": "fluentd-elasticsearch"},
      "Namespace": "kube-system"
    }
  },
  "Destination": {
    "Internal": {
      "PodLabels": {"pod": "b"},
      "NamespaceLabels": {"ns": "y"},
      "Namespace": "y"
    },
    "IP": "192.168.1.100"
  },
  "Protocol": "TCP",
  "ResolvedPort": 80,
  "ResolvedPortName": "serve-80-tcp"
}
You get this output (considering that both the deployment and the daemonset have 2 replicas):
Please check this out!