
[Policy Assistant] Add support for k8s native workload traffic #227

Merged: 1 commit, Jul 5, 2024

Conversation

@gabrielggg (Contributor) commented Apr 26, 2024

Hi @huntergregory, as we discussed in #168, this solves #220. I've implemented it so that a user can use both types of traffic (native and non-native) in the same traffic.json file (please refer to https://github.com/gabrielggg/network-policy-api/blob/main/cmd/policy-assistant/examples/traffic.json for some examples). TrafficPeer now supports both. I hope you like this; any feedback is welcome.

For example, given a traffic input like this one:

[
  {
    "Source": {
      "Internal": {
        "Workload": {"daemonset": "fluentd-elasticsearch"},
        "Namespace": "kube-system"
      }
    },
    "Destination": {
      "Internal": {
        "Workload": {"deployment": "nginx-deployment2"},
        "Namespace": "default"
      }
    },
    "Protocol": "TCP",
    "ResolvedPort": 80,
    "ResolvedPortName": "serve-80-tcp"
  },
  {
    "Source": {
      "Internal": {
        "Workload": {"daemonset": "fluentd-elasticsearch"},
        "Namespace": "kube-system"
      }
    },
    "Destination": {
      "Internal": {
        "PodLabels": {"pod": "b"},
        "NamespaceLabels": {"ns": "y"},
        "Namespace": "y"
      },
      "IP": "192.168.1.100"
    },
    "Protocol": "TCP",
    "ResolvedPort": 80,
    "ResolvedPortName": "serve-80-tcp"
  }
]

You get this output (given that both the Deployment and the DaemonSet have 2 replicas):

[output screenshots]

Please check this out!

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 26, 2024
@k8s-ci-robot (Contributor)

Hi @gabrielggg. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 26, 2024
netlify bot commented Apr 26, 2024

Deploy Preview for kubernetes-sigs-network-policy-api ready!

🔨 Latest commit: 8ec9885
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-network-policy-api/deploys/6671fe846fe4320008f77644
😎 Deploy Preview: https://deploy-preview-227--kubernetes-sigs-network-policy-api.netlify.app

@gabrielggg gabrielggg changed the title [Policy Assistant] Add support for k8s native traffic [Policy Assistant] Add support for k8s native workload traffic Apr 26, 2024
@gabrielggg gabrielggg closed this Apr 28, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 28, 2024
@gabrielggg gabrielggg reopened this Apr 28, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 28, 2024
@mattfenwick (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 1, 2024
@gabrielggg (Contributor, Author)

/retest

@huntergregory (Contributor) left a comment


Hi @gabrielggg, thanks for the PR. You are speedy! ⚡😅 Sorry that I was not as quick to respond.

Taking a step back, I think it'd be easiest to finalize the data structures (and how to populate them) before starting the code that will use the data structures.

These are the parts of #220 that I'd recommend we focus on first:

TL;DR
Write Go code to get a Deployment/DaemonSet from a cluster and create a corresponding TrafficPeer (see struct referenced below).
...
It would be nice if a user could instead reference a Pod/Deployment/DaemonSet, and then Policy Assistant queries someone's cluster to fill in:
- pod labels
- namespace labels
- node labels
- IP (or IPs of a Deployment, for instance)
We could start by building Go code to convert a Deployment or DaemonSet to a TrafficPeer for a user's Kubernetes cluster.

Data Format

I think your JSON design is nearly there. The main caveat with it is that it will be hard to track different IPs for the Pods of a workload.

So specifically, I would probably convert these structs

type TrafficPeer struct {
	Internal *InternalPeer
	IP       string
}

type InternalPeer struct {
	PodLabels       map[string]string
	NamespaceLabels map[string]string
	Namespace       string
	NodeLabels      map[string]string
	Node            string
}

to

type TrafficPeer struct {
	Internal *InternalPeer
	// keep this field for backwards-compatibility or for IPs without an InternalPeer
	IP string
	// use this for pod IPs
	*Workload
}

type InternalPeer struct {
	PodLabels       map[string]string
	NamespaceLabels map[string]string
	Namespace       string
	// I believe I added the node pieces. We can remove them.
}

type Workload struct {
	// format: namespace/kind/name
	fullName string
	pods     []PodNetworking
}

type PodNetworking struct {
	IP string
	// don't worry about populating the fields below right now
	IsHostNetworking bool
	NodeLabels       []string
}

Feel free to suggest changes as well. For instance, I think the placement of the workload name (fullName) field is somewhat arbitrary and could always be refactored later to make using the structs easier.

Populating the Data from Cluster

After we finalize the structs, would you be able to make those modifications? Could you please also see where TrafficPeer.IP is referenced? We can worry about what to do where it is referenced later.

Then, would you be OK writing Go code to translate a user input (say "ns-dev/deployment/frontend") into your modified TrafficPeer struct? I would recommend we don't modify any code for the analyze command in this first PR.
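
For illustration, translating that input starts with splitting the workload reference into its parts; a minimal sketch in Go, where the function name and error handling are assumptions rather than the PR's actual code:

package main

import (
	"fmt"
	"strings"
)

// parseWorkloadRef is a hypothetical helper that splits a workload reference
// like "ns-dev/deployment/frontend" into namespace, kind, and name.
func parseWorkloadRef(ref string) (namespace, kind, name string, err error) {
	parts := strings.Split(ref, "/")
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("expected namespace/kind/name, got %q", ref)
	}
	return parts[0], strings.ToLower(parts[1]), parts[2], nil
}

func main() {
	ns, kind, name, err := parseWorkloadRef("ns-dev/deployment/frontend")
	if err != nil {
		panic(err)
	}
	fmt.Println(ns, kind, name) // prints: ns-dev deployment frontend
}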

@huntergregory (Contributor)

Hey @gabrielggg, I accidentally published my review before I finished writing. When you can, please take a look at the edited text above and let me know how this sounds.

@huntergregory (Contributor)

Hey @gabrielggg, taking a second look, I'd approve your original idea; my only suggestion is adding the pods field to InternalPeer. Let's keep everything internal (as in internal to the cluster). So, something like:

type TrafficPeer struct {
	Internal *InternalPeer
	// IP external to cluster
	IP string
}

// Internal to cluster
type InternalPeer struct {
	// optional: if set, will override remaining values with information from the cluster
	workload string

	PodLabels       map[string]string
	NamespaceLabels map[string]string
	Namespace       string
	// optional
	Pods []*PodNetworking
}

I'd still prefer to save the traffic command logic for a follow-up PR, if you don't mind? So this PR would just modify the data structures and define a function that populates InternalPeer based on the workload name.
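
To make that concrete, here is a rough sketch of such a population function using client-go, handling only Deployments; the package, helper name, exported field casing, and error handling are all assumptions for illustration, not the PR's actual code:

package matcher

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Field shapes mirror the proposal above (casing exported here for brevity).
type PodNetworking struct {
	IP               string
	IsHostNetworking bool
}

type InternalPeer struct {
	Workload  string
	PodLabels map[string]string
	Namespace string
	Pods      []*PodNetworking
}

// populateFromWorkload resolves a "namespace/kind/name" reference against the
// cluster and fills in an InternalPeer. Only the Deployment kind is sketched.
func populateFromWorkload(ctx context.Context, clientset kubernetes.Interface, workload string) (*InternalPeer, error) {
	parts := strings.Split(workload, "/")
	if len(parts) != 3 {
		return nil, fmt.Errorf("expected namespace/kind/name, got %q", workload)
	}
	ns, kind, name := parts[0], strings.ToLower(parts[1]), parts[2]
	if kind != "deployment" {
		return nil, fmt.Errorf("kind %q not handled in this sketch", kind)
	}
	dep, err := clientset.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("failed to get workload: %w", err)
	}
	// List the Deployment's Pods via its label selector to collect Pod IPs.
	selector := metav1.FormatLabelSelector(dep.Spec.Selector)
	pods, err := clientset.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return nil, fmt.Errorf("failed to list pods: %w", err)
	}
	peer := &InternalPeer{
		Workload:  workload,
		PodLabels: dep.Spec.Template.Labels,
		Namespace: ns,
	}
	for _, pod := range pods.Items {
		peer.Pods = append(peer.Pods, &PodNetworking{
			IP:               pod.Status.PodIP,
			IsHostNetworking: pod.Spec.HostNetwork,
		})
	}
	return peer, nil
}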

@gabrielggg gabrielggg closed this May 9, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 9, 2024
@gabrielggg gabrielggg reopened this May 9, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 9, 2024
@huntergregory (Contributor) left a comment


No worries @gabrielggg. This is nice how we can reuse the Translate() function! Thanks for taking initiative to do the other workload types too. If we're doing all workloads, I was envisioning a function that gets all Pods in the cluster and creates only one TrafficPeer per Pod. Would you mind modifying your code to support this? Maybe handle all Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets, and finally remaining Pods.

Could you add results of testing this in the test/ directory? And if you don't mind, let's also add test yamls for a basic Pod, StatefulSet, and ReplicaSet (yamls which are not associated with a Deployment or DaemonSet).

This is awesome to see. Everything's coming together nicely 😁

I've created a separate func for each workload type. But maybe it would be better to put everything in one func to reuse the code. Please tell me what you think about it.

Looks ok to me but up to you what seems easier to write/read/maintain 🙂

@@ -5,7 +5,10 @@ import (
	"strings"

	"github.com/mattfenwick/collections/pkg/slice"
	"github.com/mattfenwick/cyclonus/pkg/kube"
Contributor

do we need to import policy-assistant's kube package instead?

Contributor (Author)

I think this is related to issue #170.

if !workloadOwnerExists {
	logrus.Infof("workload not found on the cluster")
	internalPeer = InternalPeer{
		Workload: "",
Contributor

sorry for the churn. Could we actually return an error here instead? I think that might be a more natural way to convey failure to find the specified workload. If we do this, we might as well return errors at these two locations too:

  1. ns, err := kubeClient.GetNamespace(workloadMetadata[0])
  2. kubePods, err := kube.GetPodsInNamespaces(kubeClient, []string{workloadMetadata[0]})

@gabrielggg (Contributor, Author) commented Jun 8, 2024

Hey @huntergregory, thanks for the review. The problem here is that doing a fatal exits the program when a workload is scaled down to 0. For example, if you have a deployment with 0 replicas, it crashes there when you call the function to map all the deployments on the cluster to TrafficPeers. Have you thought about that scenario?

Contributor

My suggestion is nitpicky, and I'm noticing it requires a good amount of change, so honestly feel free to leave everything as is 🙂. I was more so suggesting that we could change the function signature to:

func (p *TrafficPeer) Translate() (*TrafficPeer, error) {

and instead of using logrus.Fatal(), we could start returning errors to be handled as needed:

return nil, fmt.Errorf("failed to get workload: %w", err)

But this might be a lot of work for little reward: if you replace logrus.Fatal() in this function, you might as well replace it in all functions. So feel free to leave as is 🙂
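
Under that signature, a caller would handle the error along these lines (illustrative fragment, not the PR's actual code):

peer, err := trafficPeer.Translate()
if err != nil {
	return fmt.Errorf("failed to translate traffic peer: %w", err)
}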

Contributor

for example if you have a deployment with 0 replicas

Good call out. Do you mind calling this out somehow in the code too?

@gabrielggg (Contributor, Author)

Hey @huntergregory, thanks a lot for the review. I'm working on the observations, but I have one doubt regarding this comment:

I was envisioning a function that gets all Pods in the cluster and creates only one TrafficPeer per Pod. Would you mind modifying your code to support this? Maybe handle all Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets, and finally remaining Pods.

The PodsToTrafficPeers() function does exactly that right now: it creates a TrafficPeer per Pod on the cluster. Maybe I'm not understanding what you mean; if so, can you please elaborate a little bit? Thanks a lot in advance.

@huntergregory (Contributor)


Hey, sorry @gabrielggg, I expressed this incorrectly. I mean for each Pod to be associated with exactly one TrafficPeer. For example, all Pods in a Deployment should be associated with the same TrafficPeer (with the Deployment workload type). Those same Pods should not be associated with a TrafficPeer with the ReplicaSet or Pod workload type. So there should be fewer TrafficPeers than Pods. The function logic would be something like the following (see the sketch after this list):

  1. Create one TrafficPeer for each Deployment, and consider all Pods in the Deployment as "handled".
  2. Similar for DaemonSets.
  3. Get all ReplicaSets. Ignore the ReplicaSet if it's associated with a Deployment (equally, ignore Pods that have been "handled" already).
  4. Similar for StatefulSets.
  5. For remaining Pods, only create a TrafficPeer if a Pod has not been "handled".
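
One way to implement that "handled" bookkeeping is to resolve each Pod's controller chain to a single top-level workload key and group Pods by it; a sketch under that assumption (all names illustrative, not the PR's actual code):

package matcher

import (
	"strings"

	appsv1 "k8s.io/api/apps/v1"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// topLevelWorkload maps a Pod to exactly one workload key ("namespace/kind/name"),
// rolling ReplicaSets owned by Deployments up to the Deployment. Grouping Pods by
// this key yields one TrafficPeer per workload, so there are fewer TrafficPeers than Pods.
func topLevelWorkload(pod *v1.Pod, replicaSets map[string]*appsv1.ReplicaSet) string {
	owner := metav1.GetControllerOf(pod)
	if owner == nil {
		// Bare Pod: it becomes its own TrafficPeer.
		return pod.Namespace + "/pod/" + pod.Name
	}
	switch owner.Kind {
	case "ReplicaSet":
		// A ReplicaSet controlled by a Deployment is "handled" by the Deployment.
		if rs, ok := replicaSets[pod.Namespace+"/"+owner.Name]; ok {
			if rsOwner := metav1.GetControllerOf(rs); rsOwner != nil && rsOwner.Kind == "Deployment" {
				return pod.Namespace + "/deployment/" + rsOwner.Name
			}
		}
		return pod.Namespace + "/replicaset/" + owner.Name
	case "DaemonSet", "StatefulSet":
		return pod.Namespace + "/" + strings.ToLower(owner.Kind) + "/" + owner.Name
	}
	// Unrecognized controller kinds fall back to a per-Pod peer.
	return pod.Namespace + "/pod/" + pod.Name
}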

@gabrielggg (Contributor, Author)

Thanks for the clarification @huntergregory, I think that makes sense. I'll work on it.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 17, 2024
@gabrielggg (Contributor, Author) commented Jun 17, 2024

Hey @huntergregory, I managed to get each Pod associated with exactly one TrafficPeer (you can see the inputs/outputs in the test files). I also added all the test files used for testing, including workload yamls and JSON outputs of each function; please check them out and tell me what you think.

@huntergregory (Contributor) left a comment


Nice! Love that it was a simple solution 🙂. Thanks for adding all of the yamls/outputs too.

I just noticed a tiny edge case and had two last nitpicks.

@gabrielggg (Contributor, Author) commented Jun 18, 2024

Hey @huntergregory, thanks for the review. I fixed the edge case and removed the log. Please check it out.

@huntergregory (Contributor) left a comment


Thank you @gabrielggg for the HUGE help and persistence. This will be key infra for Policy Assistant going forward.

/lgtm 🚀

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2024
@gabrielggg gabrielggg closed this Jun 18, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 18, 2024
@gabrielggg gabrielggg reopened this Jun 18, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 18, 2024
@huntergregory (Contributor)

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 5, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabrielggg, huntergregory

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
