-
Notifications
You must be signed in to change notification settings - Fork 735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support to VPC resource controller's CNINode CRD #2442
Conversation
Note: Since this is waiting for a new release tag of VPC resource controller for an update on the CNINode CRD, it is referring to my fork repo tag by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change overall LGTM, no major concerns. I added a few questions and comments. For the manifest changes, can we make the aws-vpc-cni
chart changes as well? Since these are only additions, they are fine to add in this PR
pkg/sgpp/utils.go
Outdated
@@ -30,3 +38,68 @@ func LoadEnforcingModeFromEnv() EnforcingMode { | |||
return DefaultEnforcingMode | |||
} | |||
} | |||
|
|||
func AddFeatureToCNINode(ctx context.Context, k8sClient client.Client, nodeName string, featureName rc.FeatureName, featureValue string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not attached to any struct and is "dangling". Is it possible to attach this to a greater context, so we can avoid the complexity of passing dependencies around through method arguments (e.g. k8sClient).
Another thought -- does it make sense to decouple these two concepts:
- Constructs of the CNINode
- Application of the CNINode
Something like:
node := v1.Node{}
_, err := client.Get(ctx, &node) { ... }
cniNode := CNINodeFor(node)
client.Patch(ctx, &cniNode) { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be added to ipamd.go instead of from another package. Then it can be attached to the ipamd struct.
I prefer to create the CNINode objects from VPC resource controller side since in general VPC resource controller is persisting and tracking/managing the life time of the node object from its point of view. VPC CNI will restart for many reasons and seems no need to worry about creation of this CRD object, like it doesn't care the creation of node object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to create the CNINode objects from VPC resource controller side since in general VPC resource controller is persisting and tracking/managing the life time of the node object from its point of view.
Huge +1. I could see taking this a step further, and taking an assumption that the CRD exists for the given node before taking any actions as a CNI. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be nice. I am trying to find out if there is a k8s mechanism which is like a "generator" (the opposite side of finalizer) with Ownership. I think this is no such native controller to do this? In this case, we are relying on the VPC resource controller to generate the CRD object upon its owner node object's creation. Not sure if this is what you were referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there's no "generator" concept. I think of the CNINode is being owned by the CNI-agent, as it is the source of truth for the CNINode.spec. I'm not sure I see the benefit of creating the object from VPCRC, regardless of the restart strategy of VPCRC or VPCCNI.
What do you think about the following:
- VPCCNI wakes up, and applies (create/update) the CNINode to be what it expects it to be.
- VPCRC watches for the CNINode objects and takes action accordingly (i.e., attaching trunk eni)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the CNINode is being owned by the CNI-agent
I am not sure if this is correct since the agent will change for any reason but the CNINode should be living with the node. My preference is
- VPC RC (sort of) manage "nodes" and their owning CNINode objects
- VPC CNI doesn't have to worry about the CNINode creation since it should be taken care by VPC RC
- VPC CNI shouldn't have create permission on this resource
- VPC RC managing these objects will benefit future improvements (we can sync offline)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth a discussion, I think.
go.mod
Outdated
@@ -171,3 +173,5 @@ replace golang.org/x/crypto => golang.org/x/crypto v0.1.0 | |||
replace github.com/containernetworking/cni => github.com/containernetworking/cni v0.8.1 | |||
|
|||
replace github.com/containernetworking/plugins => github.com/containernetworking/plugins v0.9.1 | |||
|
|||
replace github.com/aws/amazon-vpc-resource-controller-k8s => github.com/haouc/amazon-vpc-resource-controller-k8s v1.2.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out go.work files, which you can add to .gitignore and not need to edit the module: https://go.dev/doc/tutorial/workspaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check the doc. My understanding is the multi modules workspace in this case will keep the change local and this PR will fail in GitHub actions (won't build)? Which is fine since this replace
change will be removed after the VPC RC has a new tag for CRD update and before this PR is merged. But I think I get your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haouc lets please make sure PR is merged after tag is released, or other option is cut a RC release on VPC RC and use that tag here. Since it will break master builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, @jayanthvn. We can discuss offline. This PR has been changed to target release branch as a possible RC release.
be25a05
to
5175e74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
charts/aws-vpc-cni/Chart.yaml
Outdated
version: 1.13.2 | ||
appVersion: "v1.13.2" | ||
version: 1.13.3 | ||
appVersion: "v1.13.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the appVersion updated? We generally update this during release..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
- vpcresources.k8s.aws | ||
resources: | ||
- cninodes | ||
verbs: ["get", "list", "patch"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss on adding "patch" permission offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we need to remove the update
permission from node resource right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need remove node update permission. The problem is we can merge the deletion to master. I have synced with Jeff offline. We may use a feature branch instead.
pkg/ipamd/ipamd.go
Outdated
@@ -138,9 +140,6 @@ const ( | |||
// vpcENIConfigLabel is used by the VPC resource controller to pick the right ENI config. | |||
vpcENIConfigLabel = "vpc.amazonaws.com/eniConfig" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even this can be cleaned up right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaning work has been done in the latest update.
// Signal to VPC Resource Controller that the node is using custom networking | ||
err := c.SetNodeLabel(ctx, vpcENIConfigLabel, eniConfigName) | ||
// Add the feature name to CNINode of this node | ||
err := c.AddFeatureToCNINode(ctx, rcv1alpha1.CustomNetworking, eniConfigName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit -> something like AddFeatureToCNINodeResource
or anything else maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we verified delete flow behavior (CNINode feature set/unset)? i.e, unset env variable -> terminate nodes -> new nodes launch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't unset the features in CNINode. CNiNode itself has owner reference to the node object. When node is deleted, the CNINode will be deleted.
pkg/ipamd/ipamd.go
Outdated
} | ||
// Check that there is room for a trunk ENI to be attached: | ||
if c.dataStore.GetENIs() >= (c.maxENI - c.unmanagedENI) { | ||
log.Error("No slot available for a trunk ENI to be attached. Not labeling the node") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we add these 2 lines here -
podENIErrInc("askForTrunkENIIfNeeded")
log.Errorf("Failed to add SGP feature into the CNINode", err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message has been updated in the updates.
cniNode := &rcv1alpha1.CNINode{} | ||
|
||
if err := c.cachedK8SClient.Get(ctx, types.NamespacedName{Name: c.myNodeName}, cniNode); err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel, we need to add some new metrics to track CNINode updates/deletes..you can refer here - https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/ipamd/ipamd.go#L187
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Client go should give us these for free
@@ -47,6 +49,7 @@ func CreateKubeClient(mapper meta.RESTMapper) (client.Client, error) { | |||
vpcCniScheme := runtime.NewScheme() | |||
clientgoscheme.AddToScheme(vpcCniScheme) | |||
eniconfigscheme.AddToScheme(vpcCniScheme) | |||
rcscheme.AddToScheme(vpcCniScheme) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we measured the number of API server calls increase with this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are changing to use rawclient.
bacfd13
to
dc0e478
Compare
What type of PR is this?
We want to replace node labels when enabling SGP feature through VPC CNI. VPC resource controller will create the new CRD objects with one to one mapping to k8s nodes. VPC CNI is responsible to patch the CR to add features.
The resource controller's PR is aws/amazon-vpc-resource-controller-k8s#270
The tests have been done with resource controller change.
Which issue does this PR fix:
What does this PR do / Why do we need it:
We need to switch to the new CRD based implementation.
If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
Testing done on this change:
Automation added to e2e:
Will need wait for the VPC resource controller release
Will this PR introduce any new dependencies?:
Yes, depending on the new CRD
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
It will not.
Does this change require updates to the CNI daemonset config files to work?:
no
Does this PR introduce any user-facing change?:
The node's label for trunk interface will no longer be added.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.