Use Pod IP for peer communication #220

Merged
5 commits merged into medik8s:main on Jul 9, 2024

clobrano
Contributor

Why we need this PR

SNR peer communication uses the host network (Node IP), which exposes an HTTP/2 endpoint.
Using the Pod IP instead makes the port harder to attack.

Changes made

Instead of looking up the Nodes' IPs, we look up the Pod IPs of the other agents.
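
A minimal sketch of that lookup, for illustration only. The `Node` and `Pod` types below are simplified stand-ins for the Kubernetes core types, and `peerPodIPs` is a hypothetical helper name, not the function in `pkg/apicheck/check.go`. It matches each node to the agent pod scheduled on it and collects that pod's IP:

```go
package main

import "fmt"

// Node and Pod are simplified stand-ins (assumptions) for the Kubernetes
// core types; only the fields this sketch needs are modeled.
type Node struct{ Name string }

type Pod struct {
	NodeName string // stands in for pod.Spec.NodeName
	PodIP    string // stands in for pod.Status.PodIP
}

// peerPodIPs returns, for each node, the IP of the agent pod scheduled on it.
// Nodes without a matching pod keep an empty string at their index.
func peerPodIPs(nodes []Node, pods []Pod) []string {
	addresses := make([]string, len(nodes))
	for i, node := range nodes {
		for _, pod := range pods {
			if pod.NodeName == node.Name {
				addresses[i] = pod.PodIP
				break
			}
		}
	}
	return addresses
}

func main() {
	nodes := []Node{{Name: "worker-0"}, {Name: "worker-1"}}
	pods := []Pod{
		{NodeName: "worker-1", PodIP: "10.128.2.7"},
		{NodeName: "worker-0", PodIP: "10.128.0.5"},
	}
	fmt.Println(peerPodIPs(nodes, pods))
}
```

Note that a node whose agent pod has not been assigned an IP yet ends up with an empty entry, so callers would need to handle that case.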

Which issue(s) this PR fixes

https://issues.redhat.com/browse/ECOPROJECT-1879

Test plan

@clobrano
Contributor Author

/test 4.15-openshift-e2e

@mshitrit
Member

/test 4.15-openshift-e2e

pkg/apicheck/check.go (outdated review threads)
@clobrano
Contributor Author

/test 4.15-openshift-e2e

addresses[i] = node.Status.Addresses
for _, pod := range pods.Items {
	if pod.Spec.NodeName == node.Name {
		addresses[i] = pod.Status.PodIP
Member

I was wondering why we have a string type now. IMHO there is a better choice: what about using pod.Status.PodIPs[0]?

Contributor Author

Do you mean passing this data around as PodIP and then letting popPeerIPs deal with it, returning the []string of IPs?

Contributor Author

I mean, at which point in the chain would it be better to use the underlying PodIP.IP?
IIUC, the only interface requiring the string is grpc.DialContext.

Member

I would use PodIP everywhere where we used NodeAddress before.
But: oh, it's just a wrapper around a string, I expected a more IP-ish thing 😁
And: oh, we did not even check the type of the NodeAddress in the old version of popNodes, and just took the first one 🙈

Contributor Author

But: oh, it's just a wrapper around a string, I expected a more IP-ish thing 😁

😁 yep, moreover at the end of the day we use the string, so not sure it's worth it
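
For illustration, a sketch of what a "more IP-ish" handling could look like right before dialing. It assumes a local `PodIP` type that mirrors the string wrapper discussed above, a hypothetical helper `dialTarget`, and a made-up port number; none of these come from the PR. The standard library's `netip.ParseAddr` validates the string and `net.JoinHostPort` builds a target that also works for IPv6:

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strconv"
)

// PodIP mirrors the shape discussed in the thread: a thin wrapper around a
// string. It is a local stand-in, not the Kubernetes type itself.
type PodIP struct{ IP string }

// dialTarget (hypothetical helper) validates the wrapped string and returns
// a "host:port" target suitable for a dialer that expects a string.
func dialTarget(ip PodIP, port int) (string, error) {
	addr, err := netip.ParseAddr(ip.IP)
	if err != nil {
		return "", fmt.Errorf("invalid pod IP %q: %w", ip.IP, err)
	}
	// JoinHostPort adds the brackets that IPv6 literals need.
	return net.JoinHostPort(addr.String(), strconv.Itoa(port)), nil
}

func main() {
	const port = 30001 // assumed port, for illustration only
	for _, ip := range []PodIP{{IP: "10.128.0.5"}, {IP: "fd00::5"}, {IP: ""}} {
		target, err := dialTarget(ip, port)
		fmt.Println(target, err)
	}
}
```

Since the string is what the dialer consumes in the end, parsing only buys validation; whether that is worth it is exactly the trade-off raised in this thread.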

}

chosenNodesAddresses := c.popNodes(&nodesToAsk, nodesBatchCount)
healthyResponses, unhealthyResponses, apiErrorsResponses, _ := c.getHealthStatusFromPeers(chosenNodesAddresses)
chosenPodIPs := c.popPeerIPs(&peersToAsk, nodesBatchCount)
Member

nit: in other places we use peer instead of node or pod; what about naming this var chosenPeerIPs as well?

Contributor Author

you're right, just a typo here
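
The body of popPeerIPs is not shown in this thread, so the following is only a guess at its general shape: a free function (the real one is a method on `c`) that removes up to `count` entries from the front of the slice it is given and returns them as one batch. The ordering and the front-of-slice choice are assumptions.

```go
package main

import "fmt"

// popPeerIPs removes up to count entries from the front of *peers and
// returns them. This is an assumed behavior, sketched for illustration.
func popPeerIPs(peers *[]string, count int) []string {
	if count < 0 {
		count = 0
	}
	if count > len(*peers) {
		count = len(*peers)
	}
	// Copy the batch so later appends to *peers cannot alias it.
	chosen := make([]string, count)
	copy(chosen, (*peers)[:count])
	*peers = (*peers)[count:]
	return chosen
}

func main() {
	peersToAsk := []string{"10.128.0.5", "10.128.2.7", "10.129.0.9"}
	for len(peersToAsk) > 0 {
		chosenPeerIPs := popPeerIPs(&peersToAsk, 2)
		fmt.Println(chosenPeerIPs, "remaining:", len(peersToAsk))
	}
}
```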

@clobrano
Contributor Author

/test 4.15-openshift-e2e

@clobrano
Contributor Author

/test 4.15-openshift-e2e

@mshitrit
Member

mshitrit commented Jun 26, 2024

/lgtm
/hold
Holding since I'm not sure whether the other threads are resolved - feel free to unhold if this is the case.

@slintes
Member

slintes commented Jun 27, 2024

code lgtm, but I would prefer to have enabled peer check e2e tests before merging this

- re-enable and fix api check log tests in e2e test
  - use service IP for killing API connection
  - kill API connection on SNR DS pod
  - add peer check server logs and use them for test which can't
    get logs from unhealthy node's SNR agent pod
  - wait for pod deletion only, not restart (restart is caused by
    reboot, not SNR)
- refactor / cleanup e2e tests
- fix owner check / node name / machine name in peer check server
  and agent reconciler
- update sort-imports, which ignores generated files now

At startup (but it might happen in other moments too), some peers' Pod
IP can still be empty, which means that until the next peers update we
cannot check the connection with the other peers.

Return an error in case a peer's Pod IP is empty.

Signed-off-by: Carlo Lobrano <[email protected]>
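
A minimal sketch of the check this commit message describes, under stated assumptions: the `peer` type, the `collectPeerIPs` helper, and the `errEmptyPodIP` sentinel are hypothetical names, not identifiers from the PR. The idea is to fail fast when any peer has no Pod IP yet, so the caller can retry after the next peers update:

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyPodIP is a hypothetical sentinel error for this sketch.
var errEmptyPodIP = errors.New("peer pod IP is empty")

// peer is a simplified stand-in for an agent pod.
type peer struct {
	NodeName string
	PodIP    string
}

// collectPeerIPs returns all peer IPs, or an error as soon as it finds a
// peer whose Pod IP has not been assigned yet.
func collectPeerIPs(peers []peer) ([]string, error) {
	ips := make([]string, 0, len(peers))
	for _, p := range peers {
		if p.PodIP == "" {
			return nil, fmt.Errorf("peer on node %q: %w", p.NodeName, errEmptyPodIP)
		}
		ips = append(ips, p.PodIP)
	}
	return ips, nil
}

func main() {
	peers := []peer{
		{NodeName: "worker-0", PodIP: "10.128.0.5"},
		{NodeName: "worker-1", PodIP: ""}, // IP not assigned yet
	}
	if _, err := collectPeerIPs(peers); err != nil {
		fmt.Println("cannot check peers yet:", err)
	}
}
```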
@clobrano
Contributor Author

clobrano commented Jul 9, 2024

/test 4.15-openshift-e2e

Member

@slintes slintes left a comment

/hold

waiting for #226 to be merged

Contributor

openshift-ci bot commented Jul 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@clobrano clobrano marked this pull request as ready for review July 9, 2024 17:10
@openshift-ci openshift-ci bot requested review from mshitrit and slintes July 9, 2024 17:10
@slintes
Member

slintes commented Jul 9, 2024

/test all

@slintes
Member

slintes commented Jul 9, 2024

/hold cancel

@slintes
Member

slintes commented Jul 9, 2024

/retest

@clobrano
Contributor Author

clobrano commented Jul 9, 2024

could not run steps: step [input:ocp-4.12-upi-installer] failed: failed to wait for importing imagestreamtag

4.12?

@clobrano
Contributor Author

clobrano commented Jul 9, 2024

It doesn't seem to be related to our test.

/retest

@slintes
Member

slintes commented Jul 9, 2024

could not run steps: step [input:ocp-4.12-upi-installer] failed: failed to wait for importing imagestreamtag

4.12?

looks very unrelated, upi is also wrong, should be ipi IIUC

@openshift-merge-bot openshift-merge-bot bot merged commit 22336d0 into medik8s:main Jul 9, 2024
26 checks passed
@slintes
Member

slintes commented Jul 9, 2024

/cherry-pick release-0.9

@openshift-cherrypick-robot

@slintes: new pull request created: #234

In response to this:

/cherry-pick release-0.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@slintes
Member

slintes commented Jul 9, 2024

/cherry-pick release-0.9

@openshift-cherrypick-robot

@slintes: #220 failed to apply on top of branch "release-0.9":

Applying: Some fixes and e2e test improvements:
Using index info to reconstruct a base tree...
M	Makefile
M	controllers/selfnoderemediation_controller.go
M	controllers/tests/controller/selfnoderemediation_controller_test.go
M	e2e/self_node_remediation_test.go
M	e2e/suite_test.go
A	e2e/utils/node.go
M	e2e/utils/pod.go
M	go.mod
M	go.sum
M	main.go
M	pkg/apicheck/check.go
M	pkg/peerhealth/client.go
M	pkg/peerhealth/client_server_test.go
M	pkg/peerhealth/peerhealth.pb.go
M	pkg/peerhealth/peerhealth_grpc.pb.go
M	pkg/peerhealth/server.go
M	pkg/peerhealth/suite_test.go
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging pkg/peerhealth/server.go
CONFLICT (content): Merge conflict in pkg/peerhealth/server.go
Auto-merging controllers/selfnoderemediation_controller.go
CONFLICT (content): Merge conflict in controllers/selfnoderemediation_controller.go
CONFLICT (add/add): Merge conflict in controllers/owner_and_name.go
Auto-merging controllers/owner_and_name.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Some fixes and e2e test improvements:
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.9

@slintes
Member

slintes commented Jul 9, 2024

meh, reopening and fixing #234
