OCPBUGS-30233: Filter IPs in majority group validation #6094
Conversation
@CrystalChun: This pull request references Jira Issue OCPBUGS-30233, which is invalid.
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: CrystalChun. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@zaneb would this change cover the issue? If not, may I request your advice?
internal/cluster/cluster.go (outdated)
primaryMachineCIDR := ""
if network.IsMachineCidrAvailable(cluster) {
	primaryMachineCIDR = network.GetMachineCidrById(cluster, 0)
}
Is there any reason to limit to a single MachineNetwork CIDR here?
This already will not work for dual stack clusters, because CreateL3MajorityGroup() is called for both IPv4 and IPv6, but this only passes the IPv4 MachineNetwork.
#6071 will allow multiple machine networks of each address family as well.
I would prefer to pass all of the MachineNetworks and check if each IP is in any of them.
Gotcha, I modified it to check IPs across multiple machine networks. I think the function I use to get all of the cluster's machine network CIDRs should encompass both address families?
I have difficulty following the connectivity groups code, but it seems plausible. If we ignore the extra IP addresses at the point where we're reading the inventory then it should work as if those interfaces did not exist, which is what I think we want.
This change is incorrect since we don't have machine-networks in multi-node UMN. So there is no reason to limit the connectivity to machine-networks.
@@ -482,7 +482,13 @@ func newL3QueryFactory(hosts []*models.Host, family AddressFamily) (hostQueryFac
	if err != nil {
		return nil, err
	}
	value[ip.String()] = true
	ipInCidr, err := IpInCidr(ip.String(), primaryMachineCIDR)
This misses the whole point of creating connectivity-groups. The connectivity-groups should provide all connectivity information. Then the validation should filter whether it needs L2, L3, or specific networks.
I see... I couldn't find another place to filter IP addresses 😅 Do you know of another place where we can filter them?
Actually you are right. If we want to limit the addresses this is probably the place.
In general, in UMN any IP from any network can be part of the cluster. Therefore, no machine-networks are required for multi-node UMN.
Since no machine-network should be provided in the install-config, OCP doesn't expect them either.
So even if we limit the addresses we use in connectivity-check, does OCP limit its activity to these addresses?
Hmm, not too sure. @zaneb do you know if OCP will use these addresses? Also, if no machine networks are required for UMN, then how do we filter them?
OCP doesn't really seem to use the machine-networks except for validations in the installer.
In UMN hosts are allowed to be in different networks. Because assisted has a limitation of only 1 machine-network, I guess we just allow none to be specified. #6071 removes that limitation. We still don't want any of the machine networks to overlap with e.g. the clusterNetwork or serviceNetwork, so this enables us to do all of the validations.
In any event, the agent installer always passes the machineNetwork(s), even with UMN.
So this change would work well in both cases: when multiple machine networks are specified and when none are.
/test?
@ori-amizur: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
/test edge-e2e-metal-assisted-none
/test edge-e2e-metal-assisted-none
It seems that the test passed, so I wonder where the machine-networks came from. Needs to be checked.
It seems there is something wrong with the connectivity-groups: there is IPv6 connectivity, but there are no IPv6 addresses.
/test edge-unit-test
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff             @@
##           master    #6094      +/-   ##
==========================================
+ Coverage   68.35%   68.36%    +0.01%
==========================================
  Files         239      239
  Lines       35524    35584       +60
==========================================
+ Hits        24283    24328       +45
- Misses       9132     9144       +12
- Partials     2109     2112        +3
Force-pushed e178b30 to 42d78e3 (compare)
Oh I think it's because of the original condition here https://github.com/openshift/assisted-service/pull/6094/files#diff-e0b0e8e6b4de6465aa429384eadda9ed02b2ceb1300e6b95a2f89b8a0858f658R442
Force-pushed 42d78e3 to e2907a2 (compare)
/test edge-unit-test
https://issues.redhat.com/browse/OCPBUGS-30233 Filter the IP addresses when creating connectivity groups to hosts that belong within the machine network CIDRs (if they exist) to prevent failing "belongs-to-majority" validation. Previously, if the IPs were not filtered, the validation would fail if hosts had IPs on different NICs that couldn't connect to other hosts. These IPs were not used and caused the "belongs-to-majority" validation to fail.
Force-pushed eff4c3b to 0b7cbc0 (compare)
Modified the filter to only be used if there are machine networks!
/retest
@CrystalChun: all tests passed! Full PR test history. Your PR dashboard.
Right now it actually checks that there are no impossible connections between hosts. That's a very high bar.
Failures could occur if the actual routes set up don't result in traffic going over the network explicitly specified as the machine network, and also don't result in a successful connection over the networks the traffic is actually sent over. But as you said, in standard assisted installations the machine network is not provided anyway. So this change will only affect ABI.
It appears that when UMN is enabled (so no VIPs), the interface used for the Node IP is the one that has the default route.
I believe it needs to be tested.
It is all a matter of what OCP dictates.
There might be other reasons for connectivity failures. Besides, the edge cases need to be tested in order to verify that this change doesn't lead to installation failures.
We currently do not limit adding machine-networks in UMN. We are not aware whether the invoker is ABI.
/hold
Also there's currently a validation that would fail this scenario (UMN, multiple machine nets): assisted-service/internal/cluster/validations/validations.go, lines 825 to 828 in 1795995:
"No more than one machine network can be defined if it's not dual-stack"
If the nodes can talk to each other on their kubelet IPs then OCP will be happy.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close.
/lifecycle stale
PR needs rebase.
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen.
/close
@openshift-bot: Closed this PR.
@CrystalChun: This pull request references Jira Issue OCPBUGS-30233. The bug has been updated to no longer refer to the pull request using the external bug tracker.
/cc @zaneb