OCPBUGS-29975: Allow multiple machine networks #6071

zaneb · 2024-03-11T01:10:13Z

Until now, assisted-service has assumed that there would be either exactly one MachineNetwork specified for single-stack clusters, or for dual-stack clusters that there would be exactly one IPv4 and one IPv6 MachineNetwork specified (in that order).

None of these restrictions exist in OpenShift itself, which allows multiple MachineNetworks of each address family. This is necessary to support remote worker nodes on day 1, as well as distributed installations of a kind that are common in UPI deployments (where an external load balancer can easily balance across hosts in separate networks). (OCPBUGS-29975)

This change removes the assumptions about a single MachineNetwork per address family for clusters with UserManagedNetworking enabled, and reverts the change in #4867 that prevents users from specifying more networks at the API level.

(Clusters without UserManagedNetworking will still use Layer 2 reachability checks in the belongs-to-majority-group host validation, which effectively prevents using remote worker nodes when creating these clusters. See OCPBUGS-30730.)

List all the issues related to this PR

What environments does this code impact?

How was this code tested?

assisted-test-infra environment
dev-scripts environment
Reviewer's test appreciated
Waiting for CI to do a full test run
Manual (Elaborate on how it was tested)
No tests needed

Checklist

Title and description added to both, commit and PR.
Relevant issues have been associated (see CONTRIBUTING guide)
This change does not require a documentation update (docstring, docs, README, etc)
Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

Are the title and description (in both PR and commit) meaningful and clear?
Is there a bug required (and linked) for this change?
Should this PR be backported?

openshift-ci-robot · 2024-03-11T01:10:17Z

@zaneb: This pull request references Jira Issue OCPBUGS-29975, which is invalid:

expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Until now, assisted-service has assumed that there would be either exactly one MachineNetwork specified for single-stack clusters, or for dual-stack clusters that there would be exactly one IPv4 and one IPv6 MachineNetwork specified (in that order).

None of these restrictions exist in OpenShift itself, which allows multiple MachineNetworks of each address family. This is necessary to support remote worker nodes on day 1, as well as distributed installations of a kind that are common in UPI deployments (where an external load balancer can easily balance across hosts in separate networks). (OCPBUGS-29975)

This change removes the assumptions about a single MachineNetwork per address family for clusters with UserManagedNetworking enabled, and reverts the change in #4867 that prevents users from specifying more networks at the API level.

(Clusters without UserManagedNetworking will still use Layer 2 reachability checks in the belongs-to-majority-group host validation, which effectively prevents using remote worker nodes when creating these clusters. See OCPBUGS-30730.)

List all the issues related to this PR

New Feature

Enhancement

Bug fix

Tests

Documentation

CI/CD

What environments does this code impact?

Automation (CI, tools, etc)

Cloud

Operator Managed Deployments

Agent-based installer

None

How was this code tested?

assisted-test-infra environment

dev-scripts environment

Reviewer's test appreciated

Waiting for CI to do a full test run

Manual (Elaborate on how it was tested)

No tests needed

Checklist

Title and description added to both, commit and PR.

Relevant issues have been associated (see CONTRIBUTING guide)

This change does not require a documentation update (docstring, docs, README, etc)

Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

Are the title and description (in both PR and commit) meaningful and clear?

Is there a bug required (and linked) for this change?

Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

zaneb · 2024-03-11T01:11:37Z

/jira refresh

openshift-ci-robot · 2024-03-11T01:11:43Z

@zaneb: This pull request references Jira Issue OCPBUGS-29975, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.16.0) matches configured target version for branch (4.16.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @mhanss

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

zaneb · 2024-03-11T03:00:11Z

Infra issues with the CentOS repos
/retest

codecov · 2024-03-11T03:24:14Z

Codecov Report

Attention: Patch coverage is 52.77778% with 68 lines in your changes missing coverage. Please review.

Project coverage is 68.18%. Comparing base (d295de7) to head (b9ba27b).
Report is 29 commits behind head on master.

Files with missing lines	Patch %	Lines
internal/network/cidr_validations.go	8.82%	31 Missing ⚠️
internal/network/machine_network_cidr.go	77.41%	9 Missing and 5 partials ⚠️
internal/network/dual_stack_validations.go	0.00%	11 Missing ⚠️
internal/provider/baremetal/installConfig.go	0.00%	7 Missing ⚠️
internal/bminventory/inventory.go	75.00%	1 Missing and 1 partial ⚠️
internal/cluster/validations/validations.go	0.00%	1 Missing ⚠️
internal/provider/nutanix/installConfig.go	0.00%	1 Missing ⚠️
internal/provider/vsphere/installConfig.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6071      +/-   ##
==========================================
- Coverage   68.19%   68.18%   -0.02%     
==========================================
  Files         279      279              
  Lines       39282    39288       +6     
==========================================
  Hits        26789    26789              
- Misses      10060    10068       +8     
+ Partials     2433     2431       -2

Files with missing lines	Coverage Δ
internal/cluster/validator.go	`95.87% <100.00%> (+0.51%)`	⬆️
internal/host/validator.go	`82.49% <100.00%> (+0.19%)`	⬆️
internal/cluster/validations/validations.go	`13.73% <0.00%> (+0.13%)`	⬆️
internal/provider/nutanix/installConfig.go	`0.00% <0.00%> (ø)`
internal/provider/vsphere/installConfig.go	`0.00% <0.00%> (ø)`
internal/bminventory/inventory.go	`70.88% <75.00%> (-0.03%)`	⬇️
internal/provider/baremetal/installConfig.go	`42.35% <0.00%> (-0.51%)`	⬇️
internal/network/dual_stack_validations.go	`0.00% <0.00%> (ø)`
internal/network/machine_network_cidr.go	`62.32% <77.41%> (+1.73%)`	⬆️
internal/network/cidr_validations.go	`44.76% <8.82%> (-5.78%)`	⬇️

ori-amizur

It is not clear what are the restrictions here:
Do we allow multiple API VIPs per address family?
Do we have restriction per host-role to belong to a specific machine-network? If yes, this needs to be added as validation.
What if we have only 3 host cluster (only masters). Can such cluster have multiple IPv4 machine-networks?
Can a machine-network be stale (without hosts)?

ori-amizur · 2024-03-11T12:46:21Z

internal/network/cidr_validations.go

@@ -21,7 +22,7 @@ const MinSNOMachineMaskDelta = 1
 func parseCIDR(cidr string) (ip net.IP, ipnet *net.IPNet, err error) {
 	ip, ipnet, err = net.ParseCIDR(cidr)
 	if err != nil {
-		err = errors.Wrapf(err, "Failed to parse CIDR '%s'", cidr)
+		err = fmt.Errorf("Failed to parse CIDR '%s': %w", cidr, err)


Why this was changed?

Because it's a pain to import both the standard library "errors" package and the deprecated old-timey hack "github.com/pkg/errors" package in the same file. This is a trivial refactor, the result is the same.

ori-amizur · 2024-03-11T12:47:42Z

internal/network/cidr_validations.go

 	if err != nil {
 		return err
 	}

 	if overlap {
-		return errors.Errorf("CIDRS %s and %s overlap", aCidrStr, bCidrStr)
+		return fmt.Errorf("CIDRS %s and %s overlap", aCidrStr, bCidrStr)


Please do not change the errors unless necessary

ori-amizur · 2024-03-11T16:16:34Z

internal/bminventory/inventory.go

 					log.WithError(err).Warnf("Verify VIPs")
 					return common.NewApiError(http.StatusBadRequest, err)
 				}
 			}

 		} else {
-			primaryMachineNetworkCidr, err = network.CalculateMachineNetworkCIDR(network.GetApiVipById(&targetConfiguration, 0), network.GetIngressVipById(&targetConfiguration, 0), cluster.Hosts, matchRequired)
+			primaryMachineNetworkCidr, err := network.CalculateMachineNetworkCIDR(network.GetApiVipById(&targetConfiguration, 0), network.GetIngressVipById(&targetConfiguration, 0), cluster.Hosts, matchRequired)


This might be wrong according the concept presented by this PR, because we may have to machine-network - one per vip and one of both vips.

I added a comment indicating that this is something we'll want to fix as part of OCPBUGS-30730.

ori-amizur · 2024-03-11T16:27:07Z

internal/cluster/validator.go

-	if err := checkCidrsOverlapping(c.cluster); err != nil {
-		return ValidationFailure, fmt.Sprintf("CIDRS Overlapping: %s.", err.Error())
+	if err := network.VerifyNoNetworkCidrOverlaps(c.cluster.ClusterNetworks, c.cluster.MachineNetworks, c.cluster.ServiceNetworks); err != nil {
+		return ValidationFailure, err.Error()


The validation error messages need to be agreed upon. They are presented by UI. Here it is not clear what is expected as text of this error message

Ack. I did change them, you can see the change in the tests here: 51e76b1#diff-854f83029709261bb4e532a5a6839b51ddeb2beb2f6e921056016a74340d06a3

internal/host/validator.go

ori-amizur · 2024-03-11T16:47:20Z

internal/network/machine_network_cidr.go

 		return models.VipVerificationUnverified, errors.Errorf("%s <%s> cannot be set if Machine Network CIDR is empty", vipName, vip)
 	}
-	if !ipInCidr(vip, machineNetworkCidr) {
-		return models.VipVerificationFailed, errors.Errorf("%s <%s> does not belong to machine-network-cidr <%s>", vipName, vip, machineNetworkCidr)
+	if machineNetworkCidr == "" {


This test is irrelevant. The machine network cannot be valid and empty.

Yes it can - validMachineNetwork will be true if any machine networks are defined, but machineNetworkCidr will only be set if the VIP is in one of them.

So you don't need the test for valid-network

This is maintaining the previous behaviour. If there are no machineNetworks specified, we return VipVerificationUnverified. If there are machineNetworks specified and the VIP isn't in any of them, we return VipVerificationFailed.

I think what's confusing in the diff is that machineNetworkCidr was previously a parameter, and now it's a local variable.

zaneb · 2024-03-12T10:57:59Z

Do we allow multiple API VIPs per address family?

No.

Do we have restriction per host-role to belong to a specific machine-network? If yes, this needs to be added as validation.

No.

What if we have only 3 host cluster (only masters). Can such cluster have multiple IPv4 machine-networks?

Yes (and in fact this is likely to be a common case for platform: none).

Can a machine-network be stale (without hosts)?

I don't see why not.

ori-amizur · 2024-03-13T09:05:44Z

Do we have restriction per host-role to belong to a specific machine-network? If yes, this needs to be added as validation.

No.

So workers do not have to belong to machine-networks of ingress-vips? Maybe we have to verify that we have at least 2 workers belonging to machine-network of ingress VIPs ? If not it will cause a mess. The logic is already complicated and if we don't set rules, users will be confused.

What if we have only 3 host cluster (only masters). Can such cluster have multiple IPv4 machine-networks?

Yes (and in fact this is likely to be a common case for platform: none).

None platform does not have machine-networks at all (At least how we implemented it).
Anyway if we have multiple machine-networks for masters, how can the API VIPs move between these hosts?

Can a machine-network be stale (without hosts)?

I don't see why not.

Why is it needed?

openshift-bot · 2024-06-12T09:01:14Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2024-07-13T00:30:12Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2024-08-12T08:00:46Z

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci · 2024-08-12T08:01:14Z

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci-robot · 2024-08-12T08:01:18Z

@zaneb: This pull request references Jira Issue OCPBUGS-29975. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.

In response to this:

Until now, assisted-service has assumed that there would be either exactly one MachineNetwork specified for single-stack clusters, or for dual-stack clusters that there would be exactly one IPv4 and one IPv6 MachineNetwork specified (in that order).

None of these restrictions exist in OpenShift itself, which allows multiple MachineNetworks of each address family. This is necessary to support remote worker nodes on day 1, as well as distributed installations of a kind that are common in UPI deployments (where an external load balancer can easily balance across hosts in separate networks). (OCPBUGS-29975)

This change removes the assumptions about a single MachineNetwork per address family for clusters with UserManagedNetworking enabled, and reverts the change in #4867 that prevents users from specifying more networks at the API level.

(Clusters without UserManagedNetworking will still use Layer 2 reachability checks in the belongs-to-majority-group host validation, which effectively prevents using remote worker nodes when creating these clusters. See OCPBUGS-30730.)

List all the issues related to this PR

New Feature

Enhancement

Bug fix

Tests

Documentation

CI/CD

What environments does this code impact?

Automation (CI, tools, etc)

Cloud

Operator Managed Deployments

Agent-based installer

None

How was this code tested?

assisted-test-infra environment

dev-scripts environment

Reviewer's test appreciated

Waiting for CI to do a full test run

Manual (Elaborate on how it was tested)

No tests needed

Checklist

Title and description added to both, commit and PR.

Relevant issues have been associated (see CONTRIBUTING guide)

This change does not require a documentation update (docstring, docs, README, etc)

Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

Are the title and description (in both PR and commit) meaningful and clear?

Is there a bug required (and linked) for this change?

Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

zaneb · 2024-08-20T10:21:45Z

@ori-amizur I added a check in the VerifyVIPs that the machine network the API VIP is in is present on all master nodes, and the machine network the Ingress VIP is in is present on all worker nodes.
Note that for non-UserManagedNetworking, the l2_connectivity check is still going to require all the nodes to be on the same L2 network segment, effectively. There is a separate bug OCPBUGS-30730 for handling that, but this is a prerequisite for it.
For UserManagedNetworking clusters, hosts need only have l3_connectivity, so this change will allow them to pass multiple machine networks and have hosts connected to disjoint L2 domains, as long as they are routable. Since these clusters have no VIPs, this is what we want.

Note that in ABI we always pass the MachineNetworks provided in the install-config. Also, for dual-stack the user must always pass the MachineNetworks explicitly to assisted-service.

zaneb · 2024-09-03T09:02:43Z

Tests are working now, so I believe this is ready for another round of review.

pawanpinjarkar · 2024-10-24T14:21:50Z

/cc @avishayt @ori-amizur could you take another look and comment if this is ready to go.

It's confusing that GetMachineNetworksFromBootstrapHost() returns the existing MachineNetworks in the cluster (and does *not* get them from the bootstrap host) if they already exist there. Refactor to make the logic clearer.

The network type is set for all platforms in getBasicInstallConfig(). There is no need to set it again in the none platform provider.

None of the subnets specified in any of the machineNetworks, clusterNetworks, or serviceNetworks should overlap. Validate all of these combinations, as openshift-installer does, instead of making assumptions about indices being aligned to address families.

Don't make assumptions about a 1:1 mapping between MachineNetworks and VIPs. Check only that the VIP is a member of any MachineNetwork.

When doing cluster validations, check that all of the hosts that the VIP can point to (i.e. control plane hosts for the API VIP, workers for the Ingress VIP) are members of the VIP's MachineNetwork. Since at the time of adding the VIPs (and also during cluster validations) we only check that the VIP is a member of _some_ MachineNetwork, we need this additional check to ensure that it is one where the hosts are.

It doesn't appear this was ever used outside of its own unit tests.

openshift-ci · 2024-12-03T04:26:39Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zaneb
Once this PR has been reviewed and has the lgtm label, please assign ori-amizur for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

In the belongs-to-machine-cidr validation, allow the host to be a member of any MachineNetwork. In a dual-stack cluster, require it to be a member of both an IPv4 and an IPv6 network. Previously it was assumed that the only reason for multiple MachineNetworks to appear was that a dual stack cluster could contain exactly one IPv4 and one IPv6 MachineNetwork.

Multiple MachineNetworks in the same address family and IPv6-primary dual-stack clusters are a thing, so relax the dual-stack validation requirements for machine networks to allow them.

Allow users to specify multiple machine networks of the same address family. This is a documented and supported feature of OpenShift. This reverts commit 873dd81.

Don't restrict ourselves to the first machine network when looking for an interface on a machine network to set the BootMACAddress.

OpenShift has ~always supported having machines in multiple machineNetworks, so update the TODO comment to reflect that accounting for this is already something we need to do to fully support non-UserManagedNetworking clusters. (UserManagedNetworking clusters use only the L3 connectivity check.) See https://issues.redhat.com/browse/OCPBUGS-30730 for more details.

openshift-ci · 2024-12-05T10:50:07Z

@zaneb: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/edge-e2e-metal-assisted-none	`ff9be84`	link	false	`/test edge-e2e-metal-assisted-none`
ci/prow/edge-e2e-metal-assisted-mtv-4-17	`e1878e0`	link	true	`/test edge-e2e-metal-assisted-mtv-4-17`
ci/prow/edge-e2e-nutanix-assisted	`b9ba27b`	link	false	`/test edge-e2e-nutanix-assisted`
ci/prow/edge-e2e-nutanix-assisted-4-14	`b9ba27b`	link	false	`/test edge-e2e-nutanix-assisted-4-14`
ci/prow/edge-subsystem-aws	`b9ba27b`	link	true	`/test edge-subsystem-aws`
ci/prow/edge-subsystem-kubeapi-aws	`b9ba27b`	link	true	`/test edge-subsystem-kubeapi-aws`
ci/prow/okd-scos-e2e-aws-ovn	`b9ba27b`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/edge-e2e-metal-assisted	`b9ba27b`	link	true	`/test edge-e2e-metal-assisted`
ci/prow/edge-unit-test	`b9ba27b`	link	true	`/test edge-unit-test`
ci/prow/edge-e2e-vsphere-assisted	`b9ba27b`	link	false	`/test edge-e2e-vsphere-assisted`
ci/prow/edge-e2e-oci-assisted	`b9ba27b`	link	false	`/test edge-e2e-oci-assisted`
ci/prow/e2e-agent-compact-ipv4	`b9ba27b`	link	true	`/test e2e-agent-compact-ipv4`
ci/prow/edge-e2e-oci-assisted-4-14	`b9ba27b`	link	false	`/test edge-e2e-oci-assisted-4-14`
ci/prow/edge-e2e-ai-operator-ztp	`b9ba27b`	link	true	`/test edge-e2e-ai-operator-ztp`
ci/prow/edge-e2e-metal-assisted-osc-sno-4-17	`b9ba27b`	link	true	`/test edge-e2e-metal-assisted-osc-sno-4-17`
ci/prow/edge-e2e-metal-assisted-osc-4-17	`b9ba27b`	link	true	`/test edge-e2e-metal-assisted-osc-4-17`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

paul-maidment · 2024-12-10T12:12:06Z

internal/bminventory/inventory_test.go

+		Context("Cluster with two networks of same stack", func() {
+			It("only v4 in cluster networks rejected", func() {
+				errStr := "Second cluster network has to be IPv6 subnet"
+				params.NewClusterParams.ClusterNetworks = []*models.ClusterNetwork{


Because ClusterNetworks is an array that can theoretically contain any number of entries.

Is the maximum network count we expect 2?

If so, then should we have a test to ensure that only 2 networks are provided.

If not, then should we be assuming that we are dealing with a "Second" cluster network or should this test in fact check that at least one of the cluster networks has an IPv6 subnet and at least one is IPv4?

paul-maidment · 2024-12-10T12:13:11Z

internal/bminventory/inventory_test.go

+			It("only v4 in service networks rejected", func() {
+				errStr := "Second service network has to be IPv6 subnet"
+				params.NewClusterParams.ClusterNetworks = common.TestDualStackNetworking.ClusterNetworks
+				params.NewClusterParams.ServiceNetworks = []*models.ServiceNetwork{


Because ServiceNetworks is an array that can theoretically contain any number of entries.

Is the maximum network count we expect 2?

If so, then should we have a test to ensure that only 2 networks are provided.

If not, then should we be assuming that we are dealing with a "Second" service network or should this test in fact check that at least one of the service networks has an IPv6 subnet and at least one is IPv4?

paul-maidment · 2024-12-10T12:17:07Z

internal/host/transition_test.go

@@ -4234,7 +4234,7 @@ var _ = Describe("Refresh Host", func() {
 				ntpSources:       defaultNTPSources,
 				role:             models.HostRoleMaster,
 				statusInfoChecker: makeValueChecker(formatStatusInfoFailedValidation(statusInfoNotReadyForInstall,
-					"Host does not belong to machine network CIDRs. Verify that the host belongs to every CIDR listed under machine networks")),
+					"Host does not belong to machine network CIDRs. Verify that the host belongs to a listed machine network CIDR for each IP stack in use")),


Maybe adding something like for each IP stack (IPv4/IPv6) in use so that the user clearly understands the meaning of 'IP stack'

paul-maidment · 2024-12-10T12:40:35Z

internal/network/dual_stack_validations.go

 func VerifyMachineNetworksDualStack(networks []*models.MachineNetwork, isDualStack bool) error {
 	if !isDualStack {
 		return nil
 	}
-	if len(networks) != 2 {
+	if len(networks) < 2 {
 		return errors.Errorf("Expected 2 machine networks, found %d", len(networks))


"Expected at least 2 machine networks and at least one for each IP stack (IPv4, IPv6), found %d"

paul-maidment · 2024-12-10T12:47:42Z

internal/cluster/validator.go

 			vipsWrapper.Verification(i), v.log)
-		failed = failed || verification != models.VipVerificationSucceeded
+		if verification == models.VipVerificationSucceeded {


I think we need a comment to explain what is going on here, looks like you are checking host networks after failing to find the VIP in machine networks.

If so, I think we should have a comment.

paul-maidment · 2024-12-10T12:50:56Z

internal/cluster/validator.go

+			verification, err = network.ValidateVipInHostNetworks(c.cluster.Hosts, c.cluster.MachineNetworks, vipsWrapper.IP(i), vipsWrapper.Type(), v.log)
+			failed = failed || verification != models.VipVerificationSucceeded
+		} else {
+			failed = true


I think we should skip the else here and assume failure unless disproved in the machine network and cluster network checks.

Failed = true should be the default value before any checks have been performed and we should be aiming to set this false in subsequent tests.

paul-maidment · 2024-12-10T13:09:05Z

internal/network/cidr_validations.go

+	machineNetworks []*models.MachineNetwork,
+	serviceNetworks []*models.ServiceNetwork) error {
+	errs := []error{}
+	for imn, mn := range machineNetworks {


High time complexity here O(m*n)
There might be something clever we could do to reduce this.

Different language I know, but there are approaches using a data structure to memorize already searched ranges and to allow iteration across a set of networks in Order(n)

Note the use of a Radix tree in this implementation.

https://github.com/fgiuba/ipconflict/blob/master/ipconflict/subnet.py#L46-L61

Due to the limited number of networks, the consequences of not handling this may not be problematic. But it does look like something could be improved here!

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 11, 2024

openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Mar 11, 2024

zaneb changed the title ~~OCPBUGS-29975: Support multiple machine networks~~ OCPBUGS-29975: Allow multiple machine networks Mar 11, 2024

openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 11, 2024

openshift-ci bot requested review from adriengentil and tsorya March 11, 2024 01:10

openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 11, 2024

openshift-ci bot requested a review from mhanss March 11, 2024 01:11

zaneb force-pushed the multiple-machine-networks branch from 68f0945 to b996310 Compare March 11, 2024 09:49

ori-amizur reviewed Mar 11, 2024

View reviewed changes

zaneb force-pushed the multiple-machine-networks branch 2 times, most recently from c6e01dc to ff9be84 Compare March 12, 2024 10:12

zaneb mentioned this pull request Mar 19, 2024

OCPBUGS-30233: Filter IPs in majority group validation #6094

Closed

20 tasks

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2024

openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 13, 2024

openshift-ci bot closed this Aug 12, 2024

zaneb force-pushed the multiple-machine-networks branch 2 times, most recently from 3be57c4 to 4a89db6 Compare August 20, 2024 10:20

zaneb force-pushed the multiple-machine-networks branch 2 times, most recently from 17570ee to 04cfd6d Compare August 20, 2024 11:19

zaneb force-pushed the multiple-machine-networks branch from 04cfd6d to e1878e0 Compare September 3, 2024 05:28

zaneb force-pushed the multiple-machine-networks branch from e1878e0 to 924e8a1 Compare November 19, 2024 00:03

zaneb added 10 commits December 3, 2024 17:24

Fix typo in function name

3348dcd

Refactor GetMachineNetworkForUserManagedNetworking()

def52bf

It's confusing that GetMachineNetworksFromBootstrapHost() returns the existing MachineNetworks in the cluster (and does *not* get them from the bootstrap host) if they already exist there. Refactor to make the logic clearer.

Use plural in function name

b91e5d5

Remove redundant code

484602b

The network type is set for all platforms in getBasicInstallConfig(). There is no need to set it again in the none platform provider.

Use fmt.Errorf() instead of deprecated errors package

270fcd6

Define constants for VIP types

ed86360

Validate VIPs against all MachineNetworks

44bb20e

Don't make assumptions about a 1:1 mapping between MachineNetworks and VIPs. Check only that the VIP is a member of any MachineNetwork.

Remove unused function

a3bccff

It doesn't appear this was ever used outside of its own unit tests.

zaneb force-pushed the multiple-machine-networks branch from 924e8a1 to eb9879e Compare December 3, 2024 04:25

zaneb added 5 commits December 3, 2024 17:41

Allow multiple MachineNetworks in dual stack

2ca8e8d

Multiple MachineNetworks in the same address family and IPv6-primary dual-stack clusters are a thing, so relax the dual-stack validation requirements for machine networks to allow them.

Revert "Forbid multiple machine networks in single-stack clusters"

102891f

Allow users to specify multiple machine networks of the same address family. This is a documented and supported feature of OpenShift. This reverts commit 873dd81.

Find BootMACAddress from any machine network

5f5598e

Don't restrict ourselves to the first machine network when looking for an interface on a machine network to set the BootMACAddress.

zaneb force-pushed the multiple-machine-networks branch from eb9879e to b9ba27b Compare December 3, 2024 04:41

paul-maidment suggested changes Dec 10, 2024

View reviewed changes

openshift-ci bot assigned paul-maidment Dec 10, 2024

OCPBUGS-29975: Allow multiple machine networks #6071

Are you sure you want to change the base?

OCPBUGS-29975: Allow multiple machine networks #6071

Conversation

zaneb commented Mar 11, 2024

List all the issues related to this PR

What environments does this code impact?

How was this code tested?

Checklist

Reviewers Checklist

openshift-ci-robot commented Mar 11, 2024

List all the issues related to this PR

What environments does this code impact?

How was this code tested?

Checklist

Reviewers Checklist

zaneb commented Mar 11, 2024

openshift-ci-robot commented Mar 11, 2024

zaneb commented Mar 11, 2024

codecov bot commented Mar 11, 2024 • edited Loading

Codecov Report

ori-amizur left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zaneb commented Mar 12, 2024

ori-amizur commented Mar 13, 2024 • edited Loading

openshift-bot commented Jun 12, 2024

openshift-bot commented Jul 13, 2024

openshift-bot commented Aug 12, 2024

openshift-ci bot commented Aug 12, 2024

openshift-ci-robot commented Aug 12, 2024

List all the issues related to this PR

What environments does this code impact?

How was this code tested?

Checklist

Reviewers Checklist

zaneb commented Aug 20, 2024

zaneb commented Sep 3, 2024

pawanpinjarkar commented Oct 24, 2024

openshift-ci bot commented Dec 3, 2024

openshift-ci bot commented Dec 5, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paul-maidment Dec 10, 2024 • edited Loading

Choose a reason for hiding this comment

codecov bot commented Mar 11, 2024 •

edited

Loading

ori-amizur commented Mar 13, 2024 •

edited

Loading

paul-maidment Dec 10, 2024 •

edited

Loading