
doc(networking): add detailed explanation regarding route connectivity #611

Open · wants to merge 3 commits into base: main

Conversation

@starbops (Member) commented Jul 17, 2024

Extending the work in #481, this adds a new section explaining how the route connectivity column is determined, what could be broken if the indicator is not green, and, more importantly, what can be done to fix it.

@starbops starbops marked this pull request as ready for review July 17, 2024 09:30

github-actions bot commented Jul 17, 2024

🔨 Latest commit: c36e97e
😎 Deploy Preview: https://66fa8978a6c7ca09f874a61c--harvester-preview.netlify.app

Behind the scenes, the Harvester network controller checks the connectivity of each VM Network. Connectivity means whether the target VM Network is reachable (via router(s) if necessary) from the Harvester node. The check is essential because it indicates that such a VM Network is suitable for running workloads that require connections to the Harvester node, especially the control plane. For instance, the Harvester cloud provider that is running in the guest cluster needs to access the underlying Harvester/Kubernetes APIs to be able to calculate the node topology and provide the load balancer feature.

To check the connectivity, the gateway IP address is of interest to the Harvester network controller. Such information could be absent during VM Network creation. However, it's still possible to get it if a DHCP server is running on the target VM Network and configured with the gateway information. If the user actively provides the gateway information during network creation, the network controller happily accepts it. Otherwise, the network controller will create a helper job on the target network that acts as a DHCP client to get the gateway information. With the gateway IP address in mind, the network controller then sends ICMP Echo Request packets from the management network to the gateway and waits for responses.
Member

A question, please double check:

The helper job is set with an additional network annotation, which in turn creates a second interface in the job pod that connects to the VM network. As with the storage network, where Longhorn pods get an additional NIC and IP, all the related DHCP/ping traffic is done on this interface, not from the management network.

job.Spec.Template.ObjectMeta.Annotations[cniv1.NetworkAttachmentAnnot] = selectedNetworks
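
(Illustrative sketch of roughly what setting that annotation on the helper job looks like. The function name and surrounding code are made up, not the actual harvester-network-controller code; only the annotation key comes from the quoted line.)

```go
package connectivity

import (
	cniv1 "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
	batchv1 "k8s.io/api/batch/v1"
)

// attachJobToVMNetwork is a hypothetical helper: it adds the Multus annotation
// ("k8s.v1.cni.cncf.io/networks") to the job's pod template so the pod gets a
// secondary interface on the target VM network, in addition to the default
// (management) pod network.
func attachJobToVMNetwork(job *batchv1.Job, nadNamespace, nadName string) {
	if job.Spec.Template.ObjectMeta.Annotations == nil {
		job.Spec.Template.ObjectMeta.Annotations = map[string]string{}
	}
	job.Spec.Template.ObjectMeta.Annotations[cniv1.NetworkAttachmentAnnot] =
		nadNamespace + "/" + nadName // e.g. "default/vlan100"
}
```

The DHCP client in the job pod then runs on that secondary interface.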

Member Author

I've confirmed that the DHCP client runs on the target VM Network according to the pod annotation. On the other hand, the pingGW call is run directly inside the manager pod instead of the helper pod, so it's apparently issued from the management network.

Besides, it would be useless to check the connectivity between the target VM Network and the management network just by pinging the gateway from a random IP address on the target VM network. Please correct me if I'm wrong. Thank you.
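
(For illustration, a gateway reachability check along these lines could look like the sketch below. It assumes the go-ping library and reuses the 20% packet-loss threshold mentioned in the doc text; it is not the actual pingGW implementation.)

```go
package connectivity

import (
	"fmt"
	"time"

	ping "github.com/go-ping/ping"
)

// gatewayReachable sends a few ICMP echo requests to the gateway from wherever
// this code runs (here, the manager pod on the management network) and reports
// whether the packet loss stayed within the 20% threshold.
func gatewayReachable(gatewayIP string) (bool, error) {
	pinger, err := ping.NewPinger(gatewayIP)
	if err != nil {
		return false, fmt.Errorf("create pinger: %w", err)
	}
	pinger.Count = 5
	pinger.Timeout = 5 * time.Second
	pinger.SetPrivileged(true) // raw ICMP socket, as a privileged pod would use

	if err := pinger.Run(); err != nil { // blocks until all packets are sent or timeout
		return false, fmt.Errorf("ping %s: %w", gatewayIP, err)
	}
	return pinger.Statistics().PacketLoss <= 20.0, nil
}
```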

Member

Per the discussion and investigation:

It looks like the connectivity between the mgmt network and the VM network is mostly relevant to the RKE2 downstream cluster scenario. For other general cases, those two networks are blocked from each other.

Ping detection looks better as an option for the VM network that is explicitly enabled by the user; the user then knows they need to make sure the infrastructure layer connects the two networks.

Let's add detailed background and connect it with the Rancher integration part if possible, thanks.

@jillian-maroket (Contributor) left a comment

Let me know if you have concerns about the changes.

Comment on lines 114 to 119
There are four possible cases for the **Route Connectivity** for each VM Network:

- `Active`, meaning the connectivity between the VM Network and Harvester hosts via the configured gateway is confirmed.
- `Dhcp failed`, meaning Harvester cannot get the route information via DHCP, therefore it cannot confirm the connectivity between the VM Network and Harvester hosts. Please ensure the DHCP server is configured appropriately and is L2-reachable (or routable if a DHCP relay server is provided in the target network). Otherwise, please provide the gateway IP address directly during the VM Network creation.
- `Ping failed`, meaning Harvester is unable to send ICMP Echo Request packets. This rarely happens.
- `Inactive`, meaning such a VM Network is not reachable (or reachable but packet loss is greater than 20%) from Harvester hosts. Please ensure the gateway is configured appropriately and is reachable via the management network where the Harvester nodes live.
Contributor

Suggested change
There are four possible cases for the **Route Connectivity** for each VM Network:
- `Active`, meaning the connectivity between the VM Network and Harvester hosts via the configured gateway is confirmed.
- `Dhcp failed`, meaning Harvester cannot get the route information via DHCP, therefore it cannot confirm the connectivity between the VM Network and Harvester hosts. Please ensure the DHCP server is configured appropriately and is L2-reachable (or routable if a DHCP relay server is provided in the target network). Otherwise, please provide the gateway IP address directly during the VM Network creation.
- `Ping failed`, meaning Harvester is unable to send ICMP Echo Request packets. This rarely happens.
- `Inactive`, meaning such a VM Network is not reachable (or reachable but packet loss is greater than 20%) from Harvester hosts. Please ensure the gateway is configured appropriately and is reachable via the management network where the Harvester nodes live.
Route connectivity for each VM Network can have any of the following states:
- **Active**: Connectivity between the VM Network and Harvester hosts via the configured gateway is confirmed.
- **Dhcp failed**: Harvester is unable to obtain route information via DHCP, so connectivity between the VM network and Harvester hosts cannot be confirmed. Ensure that the DHCP server is configured correctly and is L2-reachable (or routable if a DHCP relay server is provided in the target network). Otherwise, specify the gateway IP address when you create the VM network.
- **Ping failed**: Harvester is unable to send ICMP Echo Request packets. This is a rare occurrence.
- **Inactive**: Harvester hosts are unable to reach a VM network. In some cases, the VM network may be reachable but packet loss is greater than 20%. Ensure that the gateway is configured correctly and is reachable via the management network that the Harvester nodes are connected to.

Contributor

We usually reserve backticks for code. UI elements are bolded.

Member Author

I have to elaborate on the Dhcp failed part. The previous version is not accurate and causes confusion. Please take a look. Thank you.


:::info important

For the [VM load balancer feature](./loadbalancer#vm-load-balancer) to work, the VM network must be `Active` in terms of route connectivity.
Contributor

Suggested change
For the [VM load balancer feature](./loadbalancer#vm-load-balancer) to work, the VM network must be `Active` in terms of route connectivity.
The [VM load balancer](./loadbalancer#vm-load-balancer) functions as intended only if the route connectivity state is **Active**.


:::

Behind the scenes, the Harvester network controller checks the connectivity of each VM Network. Connectivity means whether the target VM Network is reachable (via router(s) if necessary) from the Harvester node. The check is essential because it indicates that such a VM Network is suitable for running workloads that require connections to the Harvester node, especially the control plane. For instance, the Harvester cloud provider that is running in the guest cluster needs to access the underlying Harvester/Kubernetes APIs to be able to calculate the node topology and provide the load balancer feature.
Contributor

Suggested change
Behind the scenes, the Harvester network controller checks the connectivity of each VM Network. Connectivity means whether the target VM Network is reachable (via router(s) if necessary) from the Harvester node. The check is essential because it indicates that such a VM Network is suitable for running workloads that require connections to the Harvester node, especially the control plane. For instance, the Harvester cloud provider that is running in the guest cluster needs to access the underlying Harvester/Kubernetes APIs to be able to calculate the node topology and provide the load balancer feature.
The Harvester network controller checks VM network connectivity. This check is essential because if a VM network is reachable from a Harvester node (via routers, if necessary), the VM network is suitable for running workloads that require connections to the Harvester node, especially the control plane. For example, the Harvester cloud provider that is running in the guest cluster must access the underlying Harvester and Kubernetes APIs to be able to calculate the node topology and provide the load balancer functionality.


Behind the scenes, the Harvester network controller checks the connectivity of each VM Network. Connectivity means whether the target VM Network is reachable (via router(s) if necessary) from the Harvester node. The check is essential because it indicates that such a VM Network is suitable for running workloads that require connections to the Harvester node, especially the control plane. For instance, the Harvester cloud provider that is running in the guest cluster needs to access the underlying Harvester/Kubernetes APIs to be able to calculate the node topology and provide the load balancer feature.

To check the connectivity, the gateway IP address is of interest to the Harvester network controller. Such information could be absent during VM Network creation. However, it's still possible to get it if a DHCP server is running on the target VM Network and configured with the gateway information. If the user actively provides the gateway information during network creation, the network controller happily accepts it. Otherwise, the network controller will create a helper job on the target network that acts as a DHCP client to get the gateway information. With the gateway IP address in mind, the network controller then sends ICMP Echo Request packets from the management network to the gateway and waits for responses.
Contributor

Suggested change
To check the connectivity, the gateway IP address is of interest to the Harvester network controller. Such information could be absent during VM Network creation. However, it's still possible to get it if a DHCP server is running on the target VM Network and configured with the gateway information. If the user actively provides the gateway information during network creation, the network controller happily accepts it. Otherwise, the network controller will create a helper job on the target network that acts as a DHCP client to get the gateway information. With the gateway IP address in mind, the network controller then sends ICMP Echo Request packets from the management network to the gateway and waits for responses.
To check connectivity, the Harvester network controller must know the gateway IP address, which is not always specified when the VM network is created. However, this address can still be obtained if a DHCP server that is configured with the gateway information is running on the target VM network. To obtain the information, the network controller creates a helper job, which functions as a DHCP client, on the target network. Once the gateway address is obtained, the network controller sends ICMP Echo Request packets from the management network to the gateway, and waits for responses.


To check the connectivity, the gateway IP address is of interest to the Harvester network controller. Such information could be absent during VM Network creation. However, it's still possible to get it if a DHCP server is running on the target VM Network and configured with the gateway information. If the user actively provides the gateway information during network creation, the network controller happily accepts it. Otherwise, the network controller will create a helper job on the target network that acts as a DHCP client to get the gateway information. With the gateway IP address in mind, the network controller then sends ICMP Echo Request packets from the management network to the gateway and waits for responses.

To wrap up, the **Route Connectivity** for VM Networks is an important indicator representing the connectivity between the VM Network and the management network where the Harvester nodes live.
Contributor

Suggested change
To wrap up, the **Route Connectivity** for VM Networks is an important indicator representing the connectivity between the VM Network and the management network where the Harvester nodes live.
In summary, **route connectivity** represents connectivity between the VM network and the management network, which the Harvester nodes are connected to.


:::note

If a VM Network's route connectivity is `Dhcp failed`, `Ping failed`, or `Inactive`, it doesn't mean the network is entirely unusable. It depends on what you're going to do with the network. Suppose you only want to run some workloads that should be completely isolated from any other network, including the management network where the Harvester nodes live. In that case, the VM Network is suitable for the job. Whether or not the VM Network has Internet connectivity is not the concern of the Harvester network controller.
Contributor

Suggested change
If a VM Network's route connectivity is `Dhcp failed`, `Ping failed`, or `Inactive`, it doesn't mean the network is entirely unusable. It depends on what you're going to do with the network. Suppose you only want to run some workloads that should be completely isolated from any other network, including the management network where the Harvester nodes live. In that case, the VM Network is suitable for the job. Whether or not the VM Network has Internet connectivity is not the concern of the Harvester network controller.
The states **Dhcp failed**, **Ping failed**, and **Inactive** do not imply that a VM network is completely unusable. For example, if you only want to isolate certain workloads from other networks (including the management network that the Harvester nodes are connected to), a VM network can still be used. Whether a VM network has internet connectivity is not the concern of the Harvester network controller.

@@ -80,7 +80,7 @@ The [Harvester network-controller](https://github.com/harvester/harvester-networ
![](/img/v1.2/networking/create-network-manual.png)

:::info important
Harvester uses the information to verify that all nodes can access the VM network you are creating. If that is the case, the *Network connectivity* column on the **VM Networks** screen indicates that the network is active. Otherwise, the screen indicates that an error has occurred.
Harvester uses the information to verify that all nodes can access the VM network you are creating. If that is the case, the *Network connectivity* column on the **VM Networks** screen indicates that the network is active. Otherwise, the screen indicates that an error has occurred. Please check [the Route Connectivity section](#about-route-connectivity) for more details.
Contributor

Suggested change
Harvester uses the information to verify that all nodes can access the VM network you are creating. If that is the case, the *Network connectivity* column on the **VM Networks** screen indicates that the network is active. Otherwise, the screen indicates that an error has occurred. Please check [the Route Connectivity section](#about-route-connectivity) for more details.
Harvester uses the information to verify that all nodes can access the VM network you are creating. If that is the case, the *Network connectivity* column on the **VM Networks** screen indicates that the network is active. Otherwise, the screen indicates that an error has occurred. For more information, see [the Route Connectivity section](#about-route-connectivity).

@bk201 requested a review from rrajendran17 August 8, 2024 03:53
@rrajendran17

For dhcpFailed, do you think we have to add more information to help the user understand what caused the failure?
For example, whether a DHCP server is allocated and connected to the VM network?
I am trying to understand who configures the DHCP server, whether it is done manually for every cluster, and whether we pre-allocate the VLAN networks we support.

@rrajendran17 commented Aug 13, 2024

I believe DHCP server configuration per cluster is infrastructure related, and the user will not have much control over it when creating VM networks. But I believe a better understanding of the DHCP server configuration will help the user configure supported VM VLAN networks and avoid DHCP connectivity failures.

My understanding is that the underlying DHCP server configuration is done on the switch connected to the uplink NIC: the switch should be configured as a trunk port with the supported VLANs to accept tagged traffic, and the DHCP server must be configured with IP pools (a subnet per VLAN) for those VLANs. If this is not done correctly, or if the user configures a VLAN VM network that the DHCP server does not support, route connectivity via DHCP will fail.

@starbops (Member Author) commented Sep 27, 2024

@rrajendran17 Thanks for the comments.

> My understanding is that the underlying DHCP server configuration is done on the switch connected to the uplink NIC: the switch should be configured as a trunk port with the supported VLANs to accept tagged traffic, and the DHCP server must be configured with IP pools (a subnet per VLAN) for those VLANs. If this is not done correctly, or if the user configures a VLAN VM network that the DHCP server does not support, route connectivity via DHCP will fail.

It's not that complicated. The user can set up a DHCP server for each VM network; the DHCP servers are not required to be accessible by the Harvester nodes from the management network. This is because the Harvester network controller creates a pod running the DHCP client to get the network information, including the gateway IP address, for each VM network. The pod is attached to the bridge that connects to the VM network. However, when checking the connectivity, the ICMP echo requests indeed come from the Harvester nodes via the management network. That's the difference between the two steps of the operation.
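
(Purely as an illustration, a per-VM-network DHCP server can be as small as an ISC dhcpd subnet declaration per VLAN; the addresses and VLAN IDs below are made up, and any DHCP server that hands out a gateway works equally well.)

```
# Illustrative dhcpd snippet: one subnet/pool per VLAN-backed VM network.
# "option routers" is the gateway the helper job learns via DHCP and that the
# Harvester nodes later ping from the management network.
subnet 172.16.10.0 netmask 255.255.255.0 {   # VM network on VLAN 10
  range 172.16.10.100 172.16.10.200;
  option routers 172.16.10.1;
}
subnet 172.16.20.0 netmask 255.255.255.0 {   # VM network on VLAN 20
  range 172.16.20.100 172.16.20.200;
  option routers 172.16.20.1;
}
```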

It's okay for users to create a VM network without a DHCP server running. However, they need to provide the network information, i.e., the subnet and gateway IP, when creating the VM network (the so-called "manual" mode). The Harvester network controller can then check the route connectivity as usual.

IMO, the most important thing from the user's perspective is to make sure the VM network is routable from the management network. This ensures that some of the functionalities, like load balancers, work as expected, and that's why the check is called "route connectivity" and not "dhcp availability" or something else. Conversely, this also implies that if users don't care about those functionalities and want to create a VM network isolated from the management network, they can simply ignore the inactive/dhcpFailed route connectivity.

> For dhcpFailed, do you think we have to add more information to help the user understand what caused the failure?
> For example, whether a DHCP server is allocated and connected to the VM network?
> I am trying to understand who configures the DHCP server, whether it is done manually for every cluster, and whether we pre-allocate the VLAN networks we support.

The proposed sentences are not very clear. I will revise them. Thank you for the suggestion.
