Define Linux Network Devices #1271
/assign @samuelkarp

**`netdevices`** (object, OPTIONAL) set of network devices that MUST be available in the container. The runtime MAY supply them however it likes.

The name of the network device is the entry key.
Does the map order matter? If so, implementation can be complicated for Go
The Linux kernel guarantees the uniqueness of the name in the runtime namespace, so a set is OK. Order is not important; each network device should be independent of the others.
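On the Go side of the ordering question: Go map iteration order is unspecified, so a runtime that wants reproducible processing (for example, stable error messages) can sort the keys first. This is only a minimal sketch; `NetDevice` and `sortedDeviceNames` are hypothetical names, not part of the spec or of any runtime.

```go
package main

import (
	"fmt"
	"sort"
)

// NetDevice is a hypothetical placeholder for the per-device settings.
type NetDevice struct {
	// Name is an assumed optional field: the device name inside the container.
	Name string
}

// sortedDeviceNames returns the map keys in lexical order so that devices
// are processed reproducibly even though Go map iteration order is not fixed.
func sortedDeviceNames(devices map[string]NetDevice) []string {
	names := make([]string, 0, len(devices))
	for name := range devices {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	devices := map[string]NetDevice{
		"eth1": {},
		"mlx0": {Name: "net0"},
		"eth0": {},
	}
	for _, name := range sortedDeviceNames(devices) {
		fmt.Println(name)
	}
}
```

Sorting is a convenience only; since the devices are independent of each other, correctness should not depend on it.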
Should we recommend a runtime performs a uniqueness check as well?
Uniqueness inside container should be checked, e.g. that rename operation was successful
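A pre-flight version of such a check could look roughly like the sketch below. It assumes each entry may carry an optional in-container name (the rename mentioned above); `NetDevice`, `targetName`, and `checkTargetNames` are hypothetical names. A real runtime would still have to handle a failed move or rename, since a check like this cannot rule out races.

```go
package main

import "fmt"

// NetDevice is a hypothetical per-device entry; Name is the assumed
// optional name of the device inside the container.
type NetDevice struct{ Name string }

// targetName returns the name the device will have inside the container:
// the requested rename if set, otherwise the host name (the map key).
func targetName(hostName string, dev NetDevice) string {
	if dev.Name != "" {
		return dev.Name
	}
	return hostName
}

// checkTargetNames rejects configurations where two devices would end up
// with the same in-container name, or where a target name is already taken
// by an interface that exists in the container namespace.
func checkTargetNames(devices map[string]NetDevice, existing map[string]bool) error {
	seen := make(map[string]string, len(devices))
	for hostName, dev := range devices {
		target := targetName(hostName, dev)
		if existing[target] {
			return fmt.Errorf("device %q: name %q already exists in the container namespace", hostName, target)
		}
		if other, ok := seen[target]; ok {
			return fmt.Errorf("devices %q and %q both map to name %q", other, hostName, target)
		}
		seen[target] = hostName
	}
	return nil
}

func main() {
	devices := map[string]NetDevice{"eth0": {Name: "net0"}, "mlx0": {Name: "net0"}}
	fmt.Println(checkTargetNames(devices, map[string]bool{"lo": true}))
}
```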
added more text to clarify runtime checks and network devices lifecycle, PTAL
https://github.com/opencontainers/runtime-spec/blob/main/features.md should be updated too
51e5104 to 3a666eb
updated and addressed the comments
AI @aojea (document the cleanup and destroy of the network interfaces)
From the in-person discussion today:

This schema focuses solely on moving existing network devices identified by name into the container namespace. It does not cover the complexities of network device creation or network configuration, such as IP address assignment, routing, and DNS setup.

**`netDevices`** (object, OPTIONAL) set of network devices that MUST be available in the container. The runtime is responsible for providing these devices; the underlying mechanism is implementation-defined.
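To make the shape of the field concrete, here is a sketch of how a Go runtime might decode it. The `netDevices` key and the use of the device name as the entry key come from the text above; the `LinuxNetDevice` and `Linux` types, the `name` property, and `parseNetDevices` are assumptions made for illustration and may not match the final schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LinuxNetDevice is a hypothetical Go shape for one entry of "netDevices".
type LinuxNetDevice struct {
	// Name is an assumed optional property: the device name inside the container.
	Name string `json:"name,omitempty"`
}

// Linux holds only the field discussed here, keyed by host device name.
type Linux struct {
	NetDevices map[string]LinuxNetDevice `json:"netDevices,omitempty"`
}

// parseNetDevices decodes the "netDevices" object from a JSON fragment.
func parseNetDevices(data []byte) (map[string]LinuxNetDevice, error) {
	var l Linux
	if err := json.Unmarshal(data, &l); err != nil {
		return nil, err
	}
	return l.NetDevices, nil
}

func main() {
	config := []byte(`{"netDevices": {"mlx0": {"name": "net0"}, "eth1": {}}}`)
	devices, err := parseNetDevices(config)
	if err != nil {
		panic(err)
	}
	for hostName, dev := range devices {
		target := dev.Name
		if target == "" {
			target = hostName // no rename requested
		}
		fmt.Printf("move %q into the container as %q\n", hostName, target)
	}
}
```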
The spec says "MUST", but I think a runtime can't do this for a rootless container, because a rootless container doesn't have CAP_NET_ADMIN, right?
I'm not sure we should take care of the rootless container.
Could be an error in the case of a rootless container, if the runtime is not able to satisfy the MUST condition.
> Could be an error in the case of a rootless container, if the runtime is not able to satisfy the MUST condition.

+1, but it'd be better to clarify it in the spec.
added more explanations about runtime and network devices lifecycle and runtime checks, PTAL
Pushed a new commit addressing those comments, the changelog is
7c83eac to da9b134
lgtm
Changelog since the last review, replace the field Example of implementation in runc opencontainers/runc#4538
Changelog since the last review:
@akerouanton I successfully implemented the new behavior of bringing back the interfaces, even though they are virtual, in runc. Please see opencontainers/runc#4538, which implements this spec, proving its feasibility.
Kind ping to reviewers: I think I addressed all the comments, and I also have an implementation in opencontainers/runc#4538 with tests to validate the assumptions.
Can you please elaborate a bit more on why this is something that What are the pros and cons? What are the risks (especially to some of the more esoteric runtimes, like those implemented via VMs)? When should bundle authors use this new field instead of what they're doing currently (and how do they accurately determine if what they're already doing is something that can be converted to this new field losslessly)?
Oops, some of what I'm after is in #1239, my apologies. That being said, I think the use cases for this are still pretty thin since it's just moves of pre-existing interfaces, and most container networking operations will be about new interfaces (veth at the very least). It feels inconsistent to include that bit of networking here when we don't have any other parts (nor plans to include them, because they're understandably complex essentially right away) -- for a higher-order runtime like Docker to take advantage of this, it'd have to split "network creation" processing code into separate paths based on what type of networking is requested (such as pre-creating veth devices before passing a bundle along so they can be moved by runc, which I think has some big rough edges too).
From the Kubernetes side, we have a considerable number of workloads and demand, especially those related to AI/ML and Telco, that require ONLY that existing network interfaces on the node be moved into the containers. Existing implementations based on CNI need to rely on out-of-band mechanisms to compensate for this problem, with a lot of glue and red tape that makes these operations very brittle. This problem can be easily solved with a declarative way to say "please, move mlx0 to the container"; that is this proposal.
This proposal is about "network interface" as a device interface, not "networks" as in "docker networks" or CNI. I've tried to make this distinction clear in several places to avoid conflating the two problems, which are completely different. For a high level runtime these should just be
config-linux.md
Outdated
* **`addresses`** *(array of strings, OPTIONAL)* - the IP addresses, IPv4 and/or IPv6, of the device within the container in CIDR format (IP address / Prefix). All IPv4 addresses SHOULD be expressed in their decimal format, consisting of four decimal numbers separated by periods. Each number ranges from 0 to 255 and represents an octet of the address. IPv6 addresses SHOULD be represented in their canonical form as defined in RFC 5952.
The runtime MAY limit the number of addresses allowed.
The runtime MAY decide to revert to the original addresses.
* **`hardwareAddress`** *(string, OPTIONAL)* - represents the hardware address (e.g. MAC Address) of the device's network interface.
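As an illustration of the `addresses` format, a runtime written in Go might validate each entry with the standard `net/netip` package. This sketch treats the string that `netip.Prefix.String` prints as a stand-in for the canonical form; whether that matches RFC 5952 in every edge case is an assumption worth verifying. `checkAddress` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/netip"
)

// checkAddress parses s as "address/prefix-length" and reports whether s is
// already in the textual form that net/netip itself would print.
func checkAddress(s string) (prefix netip.Prefix, canonical bool, err error) {
	prefix, err = netip.ParsePrefix(s)
	if err != nil {
		return netip.Prefix{}, false, err
	}
	return prefix, prefix.String() == s, nil
}

func main() {
	inputs := []string{
		"192.168.1.10/24",
		"2001:db8::1/64",
		"2001:DB8:0:0::1/64", // valid, but not in the form netip prints
		"10.0.0.1",           // missing prefix length
	}
	for _, s := range inputs {
		p, canonical, err := checkAddress(s)
		if err != nil {
			fmt.Printf("%-20s invalid: %v\n", s, err)
			continue
		}
		fmt.Printf("%-20s ok, canonical=%v, prints as %s\n", s, canonical, p)
	}
}
```

Since the text uses SHOULD rather than MUST for the address form, a runtime could accept a non-canonical but parseable address and only warn about it.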
nit: the string representation of the MAC address should be clarified
done, clarified to match the Go ParseMAC function https://pkg.go.dev/net#ParseMAC
Sounds too specific to Go implementation.
Probably just need to support the single standard form (HEX:HEX:HEX:HEX:HEX:HEX)
> Probably just need to support the single standard form (HEX:HEX:HEX:HEX:HEX:HEX)

that is the IEEE 802 MAC-48
I think golang implementation just got all the common formats https://networkengineering.stackexchange.com/a/82800
The runtime MUST set the network device state to "up" after moving it to the network namespace to allow the container to send and receive network traffic through that device.

Notice that after deleting a network namespace, all its migratable network devices are moved to the default network namespace, virtual devices (veth, macvlan, ...) are destroyed.
> virtual devices (veth, macvlan, ...) are destroyed.
I'm still on the fence on that as it means we can't re-use the same netDevice. @aojea What's the rationale for this?
will answer below, this is just explaining the status quo, the line below explains this is a MAY, so runtimes that don't want to participate in the interface deletion don't need to do anything
The runtime MAY decide to move back or destroy the network device before the network namespace is deleted. If the network device is moved back, the runtime MUST set its state to "down" before moving it back to ensure that the interface is no longer active and won't interfere with other network operations or cause IP address conflicts.
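The ordering described here (set "up" after moving in, set "down" before moving back) can be sketched against a fake backend. Everything in this example is hypothetical: `linkOps` stands in for whatever netlink layer a runtime uses, and the position of the rename step relative to the move is an assumption, not something the text prescribes.

```go
package main

import "fmt"

// linkOps is a hypothetical abstraction over the netlink operations a
// runtime would perform; it is not the API of any real library.
type linkOps interface {
	MoveToContainer(name string) error
	MoveToHost(name string) error
	Rename(oldName, newName string) error
	SetUp(name string) error
	SetDown(name string) error
}

// attach moves the device into the container, applies the optional rename,
// and only then sets the device "up".
func attach(ops linkOps, hostName, containerName string) error {
	if err := ops.MoveToContainer(hostName); err != nil {
		return err
	}
	name := hostName
	if containerName != "" && containerName != hostName {
		if err := ops.Rename(hostName, containerName); err != nil {
			return err
		}
		name = containerName
	}
	return ops.SetUp(name)
}

// detach sets the device "down" first, restores the original name, and then
// moves the device back to the host namespace.
func detach(ops linkOps, hostName, containerName string) error {
	name := hostName
	if containerName != "" {
		name = containerName
	}
	if err := ops.SetDown(name); err != nil {
		return err
	}
	if name != hostName {
		if err := ops.Rename(name, hostName); err != nil {
			return err
		}
	}
	return ops.MoveToHost(hostName)
}

// recorder is a fake linkOps that only logs the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) record(call string) error {
	r.calls = append(r.calls, call)
	return nil
}

func (r *recorder) MoveToContainer(name string) error { return r.record("move-in " + name) }
func (r *recorder) MoveToHost(name string) error      { return r.record("move-out " + name) }
func (r *recorder) Rename(oldName, newName string) error {
	return r.record("rename " + oldName + " " + newName)
}
func (r *recorder) SetUp(name string) error   { return r.record("up " + name) }
func (r *recorder) SetDown(name string) error { return r.record("down " + name) }

func main() {
	r := &recorder{}
	if err := attach(r, "mlx0", "net0"); err != nil {
		panic(err)
	}
	if err := detach(r, "mlx0", "net0"); err != nil {
		panic(err)
	}
	for _, call := range r.calls {
		fmt.Println(call)
	}
}
```

Because moving back is a MAY, a runtime that does not take part in cleanup would simply never call something like `detach`.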
Pretty much the same comment as above ^: why `MAY decide to move back` and not `MUST`?
You are the experts in the runtime world, so I will follow your advice. If you prefer and agree it should be MUST, I have no further argument; I'm suggesting MAY because I always prefer to be conservative when defining APIs and specs. I'm worried that some runtimes may not be able to participate in deletion, or that it may be very complex for them to do so, as I imagine could be the case for kata and gvisor. Being flexible allows them to implement the spec and be fully compliant instead of having to document it as a limitation.
The proposed "netdevices" field provides a declarative way to specify which host network devices should be moved into a container's network namespace.

This approach is similar to the existing "devices" field used for block devices but uses a dictionary keyed by the interface name instead.

The proposed scheme is based on the existing representation of a network device by the `struct net_device` https://docs.kernel.org/networking/netdevices.html.

This proposal focuses solely on moving existing network devices into the container namespace. It does not cover the complexities of network configuration or network interface creation, emphasizing the separation of device management and network configuration.

Signed-off-by: Antonio Ojea <[email protected]>
- Clarify network device lifecycle and runtime checks during creation and deletion of the container.
- Remove mask field and instead use the Address field with CIDR notation so that it can be used for both IPv4 and IPv6.
- Add a HardwareAddress field for use cases that require setting a specific MAC or InfiniBand address.

Signed-off-by: Antonio Ojea <[email protected]>
Co-authored-by: Albin Kerouanton <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
- Remove reference to rootless containers; the feature flag will be used by the corresponding runtime to indicate whether the feature is supported.
- Clarify the runtime MUST set the interface UP when moving it to the container network namespace.
- Clarify the runtime MUST restore the original name if the interface is renamed, to guarantee idempotence.
- Clarify the runtime MAY choose to restore the other original attributes like addresses, MTU and hardware address.

Signed-off-by: Antonio Ojea <[email protected]>
5ed1640 to d8eb25f
The proposed "netdevices" field provides a declarative way to specify which host network devices should be moved into a container's network namespace.
This approach is similar to the existing "devices" field used for block devices but uses a dictionary keyed by the interface name instead.
The proposed scheme is based on the existing representation of a network device by the `struct net_device` https://docs.kernel.org/networking/netdevices.html.
This proposal focuses solely on moving existing network devices into the container namespace. It does not cover the complexities of network configuration or network interface creation, emphasizing the separation of device management and network configuration.
Fixes: #1239