
[Policy Assistant] Add support for k8s native workload traffic #227

Merged: 1 commit merged into kubernetes-sigs:main on Jul 5, 2024

Conversation

@gabrielggg (Contributor) commented Apr 26, 2024

Hi @huntergregory, as we discussed in #168, this is to solve #220. I've implemented this so that a user can use both types of traffic (native and non-native) in the same traffic.json file (please refer to https://github.com/gabrielggg/network-policy-api/blob/main/cmd/policy-assistant/examples/traffic.json for some examples). TrafficPeer now supports both. I hope you like this; please give me some feedback.

For example, for a traffic input like this one:

{
  "Source": {
    "Internal": {
      "Workload": {"daemonset": "fluentd-elasticsearch"},
      "Namespace": "kube-system"
    }
  },
  "Destination": {
    "Internal": {
      "Workload": {"deployment": "nginx-deployment2"},
      "Namespace": "default"
    }
  },
  "Protocol": "TCP",
  "ResolvedPort": 80,
  "ResolvedPortName": "serve-80-tcp"
},
{
  "Source": {
    "Internal": {
      "Workload": {"daemonset": "fluentd-elasticsearch"},
      "Namespace": "kube-system"
    }
  },
  "Destination": {
    "Internal": {
      "PodLabels": {"pod": "b"},
      "NamespaceLabels": {"ns": "y"},
      "Namespace": "y"
    },
    "IP": "192.168.1.100"
  },
  "Protocol": "TCP",
  "ResolvedPort": 80,
  "ResolvedPortName": "serve-80-tcp"
}

You get this output (considering that both the deployment and the daemonset have 2 replicas):

(Screenshots of the Policy Assistant output for these traffic entries omitted.)

Please check this out!

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 26, 2024
@k8s-ci-robot (Contributor)

Hi @gabrielggg. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 26, 2024

netlify bot commented Apr 26, 2024

Deploy Preview for kubernetes-sigs-network-policy-api ready!

Latest commit: 8ec9885
Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-network-policy-api/deploys/6671fe846fe4320008f77644
Deploy Preview: https://deploy-preview-227--kubernetes-sigs-network-policy-api.netlify.app

@gabrielggg gabrielggg changed the title [Policy Assistant] Add support for k8s native traffic [Policy Assistant] Add support for k8s native workload traffic Apr 26, 2024
@gabrielggg gabrielggg closed this Apr 28, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 28, 2024
@gabrielggg gabrielggg reopened this Apr 28, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 28, 2024
@mattfenwick (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 1, 2024
@gabrielggg (Contributor Author)

/retest

@huntergregory (Contributor) left a comment

Hi @gabrielggg, thanks for the PR. You are speedy! ⚡😅 Sorry that I was not as quick to respond.

Taking a step back, I think it'd be easiest to finalize the data structures (and how to populate them) before starting on the code that will use them.

These are the parts of #220 that I'd recommend we focus on first:

TL;DR
Write go code to get a Deployment/DaemonSet from a cluster and create a corresponding TrafficPeer (see struct referenced below).
...
It would be nice if a user could instead reference a Pod/Deployment/DaemonSet, and then Policy Assistant queries someone's cluster to fill in:
- pod labels
- namespace labels
- node labels
- IP (or IPs of a Deployment, for instance)
We could start by building go code to convert a Deployment or DaemonSet to a TrafficPeer for a user's Kubernetes cluster.

Data Format

I think your JSON design is nearly there. The main caveat with it is that it will be hard to track different IPs for the Pods of a workload.

So specifically, I would probably convert these structs

type TrafficPeer struct {
	Internal *InternalPeer
	IP       string
}

type InternalPeer struct {
	PodLabels       map[string]string
	NamespaceLabels map[string]string
	Namespace       string
	NodeLabels      map[string]string
	Node            string
}

to

type TrafficPeer struct {
	Internal *InternalPeer
	// keep this field for backwards-compatibility or for IPs without an InternalPeer
	IP string
	// use this for pod IPs
	*Workload
}

type InternalPeer struct {
	PodLabels       map[string]string
	NamespaceLabels map[string]string
	Namespace       string
	// I believe I added these node pieces. We can remove them.
}

type Workload struct {
	// format: namespace/kind/name
	fullName string
	pods     []PodNetworking
}

type PodNetworking struct {
	IP string
	// don't worry about populating the fields below right now
	IsHostNetworking bool
	NodeLabels       []string
}

Feel free to suggest changes as well. For instance, I think the placement of the workload name (fullName) field is somewhat arbitrary and could always be refactored later to make using the structs easier.

Populating the Data from Cluster

After we finalize the structs, would you be able to make those modifications? Could you please also see where TrafficPeer.IP is referenced? We can worry about what to do where it is referenced later.

Then, would you be ok writing go code to translate a user input (say "ns-dev/deployment/frontend") into your modified TrafficPeer struct? I would recommend we don't modify any code for the analyze command in this first PR.
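To make the "ns-dev/deployment/frontend" input format concrete, here is a minimal sketch of how such a reference could be split into namespace, kind, and name. parseWorkloadRef is a hypothetical helper name for illustration, not code from this PR.

package matcher

import (
	"fmt"
	"strings"
)

// parseWorkloadRef (hypothetical helper) splits a "namespace/kind/name" reference,
// e.g. "ns-dev/deployment/frontend", into its parts so the workload can be looked
// up in a cluster.
func parseWorkloadRef(fullName string) (namespace, kind, name string, err error) {
	parts := strings.Split(fullName, "/")
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("expected namespace/kind/name, got %q", fullName)
	}
	return parts[0], strings.ToLower(parts[1]), parts[2], nil
}

For example, parseWorkloadRef("ns-dev/deployment/frontend") would return ("ns-dev", "deployment", "frontend", nil).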

@huntergregory (Contributor)

Hey @gabrielggg, I accidentally published my review before I finished writing. When you can, please take a look at the edited text above and let me know how this sounds.

@huntergregory (Contributor)

Hey @gabrielggg, taking a second look, I'd approve your original idea; my only suggestion is adding the pods field to InternalPeer. Let's keep everything internal (as in, internal to the cluster). So, something like:

type TrafficPeer struct {
	Internal *InternalPeer
	// IP external to cluster
	IP string
}

// Internal to cluster
type InternalPeer struct {
	// optional: if set, will override remaining values with information from cluster
	workload string

	PodLabels       map[string]string
	NamespaceLabels map[string]string
	Namespace       string
	// optional
	Pods []*PodNetworking
}

I'd still prefer to save the traffic command logic for a follow-up PR, if you don't mind. So this PR would just modify the data structures and define a function that populates InternalPeer based on the workload name.
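As a rough illustration of the function described above (populating InternalPeer from a workload name), the following sketch handles only the Deployment case using client-go. The function name workloadToInternalPeer, the exported field names (Workload, Pods), and the reuse of the parseWorkloadRef helper from the earlier sketch are assumptions for illustration, not the code merged in this PR.

package matcher

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// workloadToInternalPeer (hypothetical sketch) resolves a "namespace/kind/name"
// workload reference against a live cluster and fills in an InternalPeer,
// including the IPs of the workload's Pods.
func workloadToInternalPeer(ctx context.Context, client kubernetes.Interface, workload string) (*InternalPeer, error) {
	ns, kind, name, err := parseWorkloadRef(workload)
	if err != nil {
		return nil, err
	}
	if kind != "deployment" {
		return nil, fmt.Errorf("kind %q not handled in this sketch", kind)
	}
	dep, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("failed to get deployment %s: %w", workload, err)
	}
	// List the Deployment's Pods via its label selector to record their IPs.
	selector := labels.Set(dep.Spec.Selector.MatchLabels).String()
	podList, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return nil, fmt.Errorf("failed to list pods for %s: %w", workload, err)
	}
	peer := &InternalPeer{
		Workload:  workload,
		PodLabels: dep.Spec.Template.Labels,
		Namespace: ns,
	}
	for _, pod := range podList.Items {
		peer.Pods = append(peer.Pods, &PodNetworking{
			IP:               pod.Status.PodIP,
			IsHostNetworking: pod.Spec.HostNetwork,
		})
	}
	return peer, nil
}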

@gabrielggg gabrielggg closed this May 9, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 9, 2024
@gabrielggg gabrielggg reopened this May 9, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 9, 2024
@huntergregory (Contributor) left a comment

No worries @gabrielggg. It's nice that we can reuse the Translate() function! Thanks for taking the initiative to do the other workload types too. If we're doing all workloads, I was envisioning a function that gets all Pods in the cluster and creates only one TrafficPeer per Pod. Would you mind modifying your code to support this? Maybe handle all Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets, and finally remaining Pods.

Could you add results of testing this in the test/ directory? And if you don't mind, let's also add test yamls for a basic Pod, StatefulSet, and ReplicaSet (yamls which are not associated with a Deployment or DaemonSet).

This is awesome to see. Everything's coming together nicely 😁

I've created a separate func for each workload type, but maybe it would be better to put everything in one func to reuse the code. Please tell me what you think.

Looks ok to me but up to you what seems easier to write/read/maintain 🙂

@@ -5,7 +5,10 @@ import (
	"strings"

	"github.com/mattfenwick/collections/pkg/slice"
	"github.com/mattfenwick/cyclonus/pkg/kube"
Contributor

Do we need to import policy-assistant's kube package instead?

Contributor Author

I think this is related to issue #170.

if !workloadOwnerExists {
	logrus.Infof("workload not found on the cluster")
	internalPeer = InternalPeer{
		Workload: "",
Contributor

Sorry for the churn. Could we actually return an error here instead? I think that might be a more natural way to convey failure to find the specified workload. If we do this, we might as well return errors at these two locations too:

  1. ns, err := kubeClient.GetNamespace(workloadMetadata[0])
  2. kubePods, err := kube.GetPodsInNamespaces(kubeClient, []string{workloadMetadata[0]})

@gabrielggg (Contributor Author) commented Jun 8, 2024

Hey @huntergregory, thanks for the review. The problem here is that doing a fatal exits the program when a workload is scaled down to 0. For example, if you have a Deployment with 0 replicas, it crashes there when you call the function to map all the Deployments in the cluster to TrafficPeers. Have you thought about that scenario?

Contributor

My suggestion is nitpicky, and I'm noticing it requires a good amount of change, so honestly feel free to leave everything as is 🙂. I was more so suggesting that we could change the function signature to:

func (p *TrafficPeer) Translate() (*TrafficPeer, error) {

and instead of using logrus.Fatal(), we could start returning errors to be handled as needed:

return nil, fmt.Errorf("failed to get workload: %w", err)

But this might be a lot of work for little reward: if you replace logrus.Fatal() in this function, you might as well replace it in all functions. So feel free to leave as is 🙂
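For concreteness, a minimal sketch of the error-returning variant being discussed might look like the following. translateWorkload is a hypothetical internal helper (not part of the PR), and fmt is assumed to be imported.

func (p *TrafficPeer) Translate() (*TrafficPeer, error) {
	translated, err := p.translateWorkload() // hypothetical helper that queries the cluster
	if err != nil {
		// A Deployment scaled to 0 replicas could be reported here (or returned as a
		// peer with an empty Pods slice) instead of terminating the whole program.
		return nil, fmt.Errorf("failed to get workload: %w", err)
	}
	return translated, nil
}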

Contributor

for example if you have a deployment with 0 replicas

Good call out. Do you mind calling this out somehow in the code too?

@gabrielggg (Contributor Author)

Hey @huntergregory, thanks a lot for the review. I'm working on the observations, but I have one doubt regarding this comment:

I was envisioning a function that gets all Pods in the cluster and creates only one TrafficPeer per Pod. Would you mind modifying your code to support this? Maybe handle all Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets, and finally remaining Pods.

The PodsToTrafficPeers() function does exactly that right now: it creates a TrafficPeer per Pod in the cluster. Maybe I'm not understanding what you mean; if so, could you please elaborate a bit? Thanks a lot in advance.

@huntergregory (Contributor)

I was envisioning a function that gets all Pods in the cluster and creates only one TrafficPeer per Pod. Would you mind modifying your code to support this? Maybe handle all Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets, and finally remaining Pods.

The PodsToTrafficPeers() function does exactly that right now: it creates a TrafficPeer per Pod in the cluster. Maybe I'm not understanding what you mean; if so, could you please elaborate a bit? Thanks a lot in advance.

Hey, sorry @gabrielggg, I expressed this incorrectly. I meant for each Pod to be associated with exactly one TrafficPeer. For example, all Pods in a Deployment should be associated with the same TrafficPeer (with the Deployment workload type). Those same Pods should not be associated with a TrafficPeer with the ReplicaSet or Pod workload type. So there should be fewer TrafficPeers than Pods. The function logic would be something like (a rough sketch follows the list):

  1. Create one TrafficPeer for each Deployment, and consider all Pods in the Deployment as "handled".
  2. Similar for DaemonSets.
  3. Get all ReplicaSets. Ignore the ReplicaSet if it's associated with a Deployment (equally, ignore Pods that have been "handled" already).
  4. Similar for StatefulSets.
  5. For remaining Pods, only create a TrafficPeer if a Pod has not been "handled".
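A rough Go sketch of that ordering is below, under the assumption of exported struct fields (Workload, Pods) and hypothetical function names (allPodsToTrafficPeers, addWorkloadPeers) rather than the PR's actual PodsToTrafficPeers implementation; the ReplicaSet and StatefulSet steps are elided for brevity.

package matcher

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// addWorkloadPeers (hypothetical helper) builds one TrafficPeer for a workload and
// marks its Pods as handled, skipping Pods already claimed by an earlier workload.
func addWorkloadPeers(peers *[]TrafficPeer, handled map[string]bool, workload, namespace string,
	selector map[string]string, pods []corev1.Pod) {
	peer := TrafficPeer{Internal: &InternalPeer{Workload: workload, PodLabels: selector, Namespace: namespace}}
	sel := labels.SelectorFromSet(selector)
	for i := range pods {
		pod := &pods[i]
		key := pod.Namespace + "/" + pod.Name
		if handled[key] || pod.Namespace != namespace || !sel.Matches(labels.Set(pod.Labels)) {
			continue
		}
		handled[key] = true
		peer.Internal.Pods = append(peer.Internal.Pods, &PodNetworking{IP: pod.Status.PodIP})
	}
	*peers = append(*peers, peer)
}

// allPodsToTrafficPeers sketches the ordering above: Deployments and DaemonSets
// first, then (not shown) ReplicaSets and StatefulSets not already covered, and
// finally standalone Pods, so every Pod ends up in exactly one TrafficPeer.
func allPodsToTrafficPeers(ctx context.Context, client kubernetes.Interface) ([]TrafficPeer, error) {
	var peers []TrafficPeer
	handled := map[string]bool{} // keyed by "namespace/name" of Pods already covered

	podList, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	pods := podList.Items

	deps, err := client.AppsV1().Deployments(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, d := range deps.Items {
		addWorkloadPeers(&peers, handled, d.Namespace+"/deployment/"+d.Name, d.Namespace, d.Spec.Selector.MatchLabels, pods)
	}

	dss, err := client.AppsV1().DaemonSets(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, d := range dss.Items {
		addWorkloadPeers(&peers, handled, d.Namespace+"/daemonset/"+d.Name, d.Namespace, d.Spec.Selector.MatchLabels, pods)
	}

	// ReplicaSets and StatefulSets would follow the same pattern here; Pods already
	// claimed by a Deployment or DaemonSet are skipped inside addWorkloadPeers.

	// Any Pod still unhandled gets its own TrafficPeer.
	for i := range pods {
		pod := &pods[i]
		if handled[pod.Namespace+"/"+pod.Name] {
			continue
		}
		peers = append(peers, TrafficPeer{Internal: &InternalPeer{
			Workload:  pod.Namespace + "/pod/" + pod.Name,
			PodLabels: pod.Labels,
			Namespace: pod.Namespace,
			Pods:      []*PodNetworking{{IP: pod.Status.PodIP}},
		}})
	}
	return peers, nil
}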

@gabrielggg (Contributor Author)

Thanks for the clarification @huntergregory, I think that makes sense. I'll work on it.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 17, 2024
@gabrielggg (Contributor Author) commented Jun 17, 2024

Hey @huntergregory, I managed to get each Pod associated with exactly one TrafficPeer (you can see those inputs/outputs in the test files). I also added all the files used for testing, including workload yamls and the JSON outputs of each function. Please check it out and tell me what you think.

@huntergregory (Contributor) left a comment

Nice! Love that it was a simple solution 🙂. Thanks for adding all of the yamls/outputs too.

I just noticed a tiny edge case and had two last nitpicks.

@gabrielggg (Contributor Author) commented Jun 18, 2024

Hey @huntergregory, thanks for the review. I fixed the edge case and removed the log. Please check it out.

@huntergregory (Contributor) left a comment

Thank you @gabrielggg for the HUGE help and persistence. This will be key infra for Policy Assistant going forward.

/lgtm 🚀

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2024
@gabrielggg gabrielggg closed this Jun 18, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 18, 2024
@gabrielggg gabrielggg reopened this Jun 18, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 18, 2024
@huntergregory (Contributor)

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 5, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabrielggg, huntergregory

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 964c353 into kubernetes-sigs:main Jul 5, 2024
8 checks passed