Render SriovNetworkNodeState before Device Plugin ConfigMap #487

Merged (1 commit) on Aug 8, 2023

Collaborator

@e0ne e0ne commented Aug 7, 2023

The Device Plugin ConfigMap relies on the SriovNetworkNodeState object, so we need to render that object earlier.

The issue occurs only if a node stops matching configDaemonNodeSelector during SR-IOV configuration: the controller deletes the SriovNetworkNodeState and does not re-create it once the node matches configDaemonNodeSelector again.
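
The ordering problem can be illustrated with a small, self-contained sketch. This is not the operator's code: `cluster`, `syncNodeStates`, `syncDevicePluginConfig` and `reconcile` are hypothetical names, and the sketch assumes that the ConfigMap step returns an error when the node state is missing, which aborts the reconcile before the node state can be re-created.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical in-memory stand-in for the API server.
type cluster struct {
	nodeStates map[string]bool   // node name -> SriovNetworkNodeState exists
	configData map[string]string // node name -> rendered device plugin config
}

var errNodeStateNotFound = errors.New("SriovNetworkNodeState not found")

func newCluster() *cluster {
	return &cluster{nodeStates: map[string]bool{}, configData: map[string]string{}}
}

// syncNodeStates creates a node state for every selected node.
func (c *cluster) syncNodeStates(nodes []string) {
	for _, n := range nodes {
		c.nodeStates[n] = true
	}
}

// syncDevicePluginConfig renders config per node and needs the node state.
func (c *cluster) syncDevicePluginConfig(nodes []string) error {
	for _, n := range nodes {
		if !c.nodeStates[n] {
			return fmt.Errorf("node %s: %w", n, errNodeStateNotFound)
		}
		c.configData[n] = "rendered"
	}
	return nil
}

// reconcile runs both steps; nodeStatesFirst selects the order.
func (c *cluster) reconcile(nodes []string, nodeStatesFirst bool) error {
	if nodeStatesFirst {
		c.syncNodeStates(nodes)
		return c.syncDevicePluginConfig(nodes)
	}
	if err := c.syncDevicePluginConfig(nodes); err != nil {
		return err // node states are never created on this path
	}
	c.syncNodeStates(nodes)
	return nil
}

func main() {
	// The node matches the selector again, but its node state was deleted.
	nodes := []string{"worker-0"}
	fmt.Println("config first:", newCluster().reconcile(nodes, false))
	fmt.Println("node state first:", newCluster().reconcile(nodes, true))
}
```

In this model the "config first" order fails on every loop, because the early return prevents the node state from ever being created, while the "node state first" order succeeds on the first loop.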


github-actions bot commented Aug 7, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

@coveralls

Pull Request Test Coverage Report for Build 5783839350

  • 0 of 4 (0.0%) changed or added relevant lines in 1 file are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.09%) to 24.523%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| controllers/sriovnetworknodepolicy_controller.go | 0 | 4 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| controllers/sriovnetwork_controller.go | 4 | 63.81% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 5740514141 | 0.09% |
| Covered Lines | 2083 |
| Relevant Lines | 8494 |

💛 - Coveralls

Collaborator Author

e0ne commented Aug 7, 2023

/test-all

Collaborator

@adrianchiris adrianchiris left a comment

The device plugin config map depends on the SriovNetworkNodeState of that node, so it makes sense to sync the node state first.

Member

@zeeke zeeke left a comment

LGTM

Note, though, that the device plugin config map rendering logic depends on SriovNetworkNodeState.Status, which is written by the config-daemon. Anyway, this PR avoids raising the most common errors IMO.

Collaborator

adrianchiris commented Aug 8, 2023

this PR avoids raising the most common errors IMO

Thanks for the review! Can you elaborate?

@adrianchiris adrianchiris merged commit 7b5ecac into k8snetworkplumbingwg:master Aug 8, 2023
Member

zeeke commented Aug 10, 2023

Thanks for the review! Can you elaborate?

Sure:
renderDevicePluginConfigData(...) is invoked on every reconcile loop, and it tries to retrieve the SriovNetworkNodeState (sriovnetworknodepolicy_controller.go#L226).

That value is used only if the policy has Spec.NicSelector.NetFilter != "" (#L783), and that code path reads the Status field. If the object has just been rendered by syncAllSriovNetworkNodeStates(...), the Status is not yet populated, because that function sets only the Spec fields.
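
To make the Status dependency concrete, here is a minimal sketch of a NetFilter lookup. The types `interfaceStatus` and `nodeState` and the function `rootDevicesFor` are simplified, hypothetical stand-ins and not the operator's API; the filter string is an illustrative value only.

```go
package main

import "fmt"

// interfaceStatus is a hypothetical per-NIC entry reported by the config-daemon.
type interfaceStatus struct {
	PciAddress string
	NetFilter  string
}

// nodeState is a hypothetical, reduced view of a SriovNetworkNodeState.
type nodeState struct {
	SpecRendered bool              // set by the controller
	Status       []interfaceStatus // filled in later by the config-daemon
}

// rootDevicesFor returns the PCI addresses whose NetFilter matches the policy.
// With an empty netFilter the Status is not consulted at all.
func rootDevicesFor(netFilter string, ns nodeState) []string {
	if netFilter == "" {
		return nil
	}
	var devices []string
	for _, iface := range ns.Status {
		if iface.NetFilter == netFilter {
			devices = append(devices, iface.PciAddress)
		}
	}
	return devices
}

func main() {
	filter := "openstack/NetworkID:example" // illustrative value only

	// Freshly rendered by the controller: Spec is set, Status is still empty.
	fresh := nodeState{SpecRendered: true}

	// After the config-daemon has reported the node's interfaces.
	populated := nodeState{SpecRendered: true, Status: []interfaceStatus{
		{PciAddress: "0000:3b:00.0", NetFilter: filter},
		{PciAddress: "0000:3b:00.1", NetFilter: "other"},
	}}

	fmt.Println(len(rootDevicesFor(filter, fresh))) // 0
	fmt.Println(rootDevicesFor(filter, populated))  // [0000:3b:00.0]
}
```

A freshly rendered node state yields no devices for a NetFilter policy, so in this model the first rendered config would be incomplete until the Status is filled in.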

But in the end, it probably works even in the above scenario: the policies are reconciled every 5 minutes, and at the second loop every resource should be in place and correctly populated. If that's the case, sorry for the noise.

Collaborator

adrianchiris commented Aug 10, 2023

But in the end, it probably works even in the above scenario: the policies are reconciled every 5 minutes, and at the second loop every resource should be in place and correctly populated. If that's the case, sorry for the noise.

That is the case to my understanding.

However, I think we should fail (and retry the reconcile) if the node state has no status field, when that status is a prerequisite for rendering the device plugin config. @e0ne
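
A rough sketch of that suggestion, under stated assumptions: `renderConfig`, `reconcileWithRetry` and `errStatusNotReady` are hypothetical names, and the retry loop only imitates a controller that requeues a request when reconcile returns an error. It is not how the operator or controller-runtime is implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// errStatusNotReady is a hypothetical sentinel error for an empty Status.
var errStatusNotReady = errors.New("node state status not populated yet")

// nodeState is a reduced, hypothetical view of a SriovNetworkNodeState.
type nodeState struct {
	StatusInterfaces []string // PCI addresses reported by the config-daemon
}

// renderConfig fails instead of rendering from an empty Status when the
// policy needs it (for example, when a NetFilter selector is set).
func renderConfig(needsStatus bool, ns *nodeState) (string, error) {
	if needsStatus && len(ns.StatusInterfaces) == 0 {
		return "", errStatusNotReady
	}
	return fmt.Sprintf("devices=%v", ns.StatusInterfaces), nil
}

// reconcileWithRetry imitates requeue-on-error: it retries until rendering
// succeeds or maxAttempts is reached. The daemon callback stands in for the
// config-daemon running between attempts.
func reconcileWithRetry(ns *nodeState, maxAttempts int, daemon func(attempt int)) (string, int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		cfg, err := renderConfig(true, ns)
		if err == nil {
			return cfg, attempt, nil
		}
		lastErr = err
		daemon(attempt)
	}
	return "", maxAttempts, lastErr
}

func main() {
	ns := &nodeState{}
	cfg, attempts, err := reconcileWithRetry(ns, 5, func(attempt int) {
		if attempt == 2 { // the daemon reports interfaces after the second try
			ns.StatusInterfaces = []string{"0000:3b:00.0"}
		}
	})
	fmt.Println(cfg, attempts, err)
}
```

The point of the design is that an error triggers a prompt retry, so the config is rendered soon after the Status appears rather than at the next periodic resync.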

e0ne added a commit to e0ne/sriov-network-operator that referenced this pull request Aug 14, 2023
The NetFilter selector depends on the PCI addresses of NICs from the node state.
After PR k8snetworkplumbingwg#487 is merged we need to check whether the node state is updated,
or return a reconcile error so that the device plugin config is rendered faster.