[feat, multicast] Multicast Group Support #9091
This work introduces multicast IP pool capabilities to support external multicast traffic routing through the rack's switching infrastructure.
Includes:
- Add IpPoolType enum (unicast/multicast) with unicast as default
- Add multicast pool fields: switch_port_uplinks (UUID[]), mvlan (VLAN ID)
- Add database migration (multicast-support/up01.sql) with new columns and indexes
- Add ASM/SSM range validation for multicast pools to prevent mixing
- Add pool type-aware resolution for IP allocation
- Add custom deserializer for switch port uplinks with deduplication
- Update external API params/views for multicast pool configuration
- Add SSM constants (IPV4_SSM_SUBNET, IPV6_SSM_FLAG_FIELD) for validation
Database schema updates:
- ip_pool table: pool_type, switch_port_uplinks, mvlan columns
- Index on pool_type for efficient filtering
- Migration preserves existing pools as unicast type by default
This provides the foundation for multicast group functionality while maintaining full backward compatibility with existing unicast pools.
References (for review):
- RFD 488: https://rfd.shared.oxide.computer/rfd/488
- Dendrite PRs (based on recency):
* oxidecomputer/dendrite#132
* oxidecomputer/dendrite#109
* oxidecomputer/dendrite#14
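For reviewers, the ASM/SSM range check described above can be sketched roughly as follows. This is a minimal IPv4-only illustration and not the code in this PR: it assumes the RFC 4607 SSM block 232.0.0.0/8, and `McastKind`, `SSM_FIRST`, `SSM_LAST`, and `classify_range` are hypothetical names chosen for the example.

```rust
use std::net::Ipv4Addr;

/// Delivery model implied by a multicast range (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum McastKind {
    Asm,
    Ssm,
}

// IPv4 SSM block per RFC 4607: 232.0.0.0/8 (assumed here, not read from the PR).
const SSM_FIRST: u32 = 0xE800_0000; // 232.0.0.0
const SSM_LAST: u32 = 0xE8FF_FFFF; // 232.255.255.255

/// Classify an inclusive range, rejecting ranges that mix ASM and SSM space.
fn classify_range(first: Ipv4Addr, last: Ipv4Addr) -> Result<McastKind, String> {
    if first > last {
        return Err("range start is after range end".to_string());
    }
    if !first.is_multicast() || !last.is_multicast() {
        return Err("range is not entirely multicast".to_string());
    }
    let (lo, hi) = (u32::from(first), u32::from(last));
    let inside_ssm = lo >= SSM_FIRST && hi <= SSM_LAST;
    let overlaps_ssm = lo <= SSM_LAST && hi >= SSM_FIRST;
    if inside_ssm {
        Ok(McastKind::Ssm)
    } else if overlaps_ssm {
        // Part of the range is SSM and part is ASM.
        Err("range mixes ASM and SSM addresses".to_string())
    } else {
        Ok(McastKind::Asm)
    }
}
```

The overlap test matters because a range whose endpoints are both ASM can still span the whole SSM block.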
Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required
for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming
and exposes an API for operating multicast groups over instances.
Highlights:
- DB: new multicast_group tables; member lifecycle management
- API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
- Control plane: RPW reconcilers for groups/members; sagas that apply dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
- Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
- Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
- Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/
Components:
- Database schema: external and underlay multicast groups; member/instance association tables
- Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
- API layer: endpoints and validation; default-VNI semantics when VPC not provided
- Sled agent: OPTE stubs and compatibility shims for older agents
Workflows Implemented:
1. Instance lifecycle integration:
- "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
- "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
- "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
- "Delete" -> remove instance memberships; group deletion is explicit
- "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
- Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming
2. RPW reconciliation:
- Ensure dataplane switches match database state
- Handle sled migrations and state transitions
- Eventual consistency with retry logic
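As a rough illustration of the reconciliation idea (make the switches match the database), the core step can be thought of as a set difference between desired and programmed state. The sketch below is an assumption-laden model, not the reconciler in this PR: `Plan` and `plan_reconcile` are hypothetical names, and group identity is reduced to a string.

```rust
use std::collections::BTreeSet;

/// Work needed to converge the dataplane onto the database (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct Plan {
    to_add: Vec<String>,
    to_remove: Vec<String>,
}

/// Compare desired (database) state with programmed (switch) state.
/// Running it again after the plan is applied yields an empty plan,
/// which is what makes retries safe.
fn plan_reconcile(desired: &BTreeSet<String>, programmed: &BTreeSet<String>) -> Plan {
    Plan {
        // In the database but not yet on the switch.
        to_add: desired.difference(programmed).cloned().collect(),
        // On the switch but no longer in the database.
        to_remove: programmed.difference(desired).cloned().collect(),
    }
}
```

Because the plan is derived from current state on every pass rather than from a log of events, a failed or partial pass is corrected by the next one.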
Migrations:
- Apply schema changes in schema/crdb/multicast-group-support/up01.sql (and update dbinit.sql)
- Bump schema versions accordingly
API/Compatibility:
- OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
- Contains a version change (to v5) as InstanceEnsureBody has been modified to
include multicast_groups associated with an instance in the
underlying sled config
- Regenerate clients where applicable
References:
- RFD 488: https://rfd.shared.oxide.computer/rfd/488
- IP Pool extensions: #9084
- Dendrite PRs (based on recency):
* oxidecomputer/dendrite#132
* oxidecomputer/dendrite#109
* oxidecomputer/dendrite#14
Follow-ups include:
- OPTE integration
- commtest extension
- omdb commands are tracked in issues
- pool and group stats
…sed on config
Because OPTE and Maghemite updates for statically routed multicast are still to come,
we gate RPW and saga actions behind runtime configuration (enabled for tests). API calls
are tagged "experimental."
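The gating described here can be pictured as an early return at the top of each gated action. The following is only a sketch under stated assumptions: `MulticastConfig`, its `enabled` field, `Activation`, and `activate_reconciler` are hypothetical names and do not reflect the actual configuration keys in this PR.

```rust
/// Hypothetical runtime settings; the field name is an assumption.
#[derive(Debug, Clone, Copy)]
struct MulticastConfig {
    enabled: bool,
}

/// Outcome of one reconciler activation (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Activation {
    /// The feature is off, so nothing was touched.
    Skipped,
    /// The feature is on and this many groups were processed.
    Reconciled { groups: usize },
}

/// Do no work at all unless multicast is enabled at runtime.
fn activate_reconciler(cfg: MulticastConfig, pending_groups: &[&str]) -> Activation {
    if !cfg.enabled {
        return Activation::Skipped;
    }
    Activation::Reconciled { groups: pending_groups.len() }
}
```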
@internet-diglett, others, I added "feature-gating" to this PR, as well as "experimental" tagging for the new entrypoints.
A few API questions to start out with.
Removes project scoping on multicast groups and adds documentation to the API paths that apply similar logic for members (through groups vs. instances).
This commit performs cleanup and adds additional tests related to moving multicast groups to Fleet scope. This commit also adds mvlan (for VLANs related to mcast egress traffic from the rack) to mcast groups.
Also, we add a DB constraint and specialize the range error.
This work introduces multicast IP pool capabilities to support external multicast traffic routing through the rack's switching infrastructure.
Closes #8217.
Includes:
- Add IpPoolType enum (unicast/multicast) with unicast as default
- Add database migration (multicast-pool-support/up01.sql) with new columns and indexes
- Add ASM/SSM range validation for multicast pools to prevent mixing
- Add pool type-aware resolution for IP allocation
- Update external API params/views for multicast pool configuration
- Add SSM constants (IPV4_SSM_SUBNET, IPV6_SSM_SUBNET) for validation
Database schema updates:
- ip_pool table: pool_type
- Index on pool_type for efficient filtering
- Migration preserves existing pools as unicast type by default
This provides the foundation for multicast group functionality while maintaining full backward compatibility with existing unicast pools.
References (for review):
- RFD 488: https://rfd.shared.oxide.computer/rfd/488
- Dendrite PRs (based on recency):
* oxidecomputer/dendrite#132
* oxidecomputer/dendrite#109
* oxidecomputer/dendrite#14
TODOs:
- [x] Multicast Group Support (#9091)
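To make the "unicast as default" and "pool type-aware resolution" points concrete, here is a minimal sketch. Only the `IpPoolType` name and its two variants come from the commit message; the `IpPool` struct, its fields, and `resolve_pool` are hypothetical and stand in for the real datastore queries.

```rust
/// Pool type as named in the commit message; unicast is the default,
/// so pools created before the migration keep their behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum IpPoolType {
    #[default]
    Unicast,
    Multicast,
}

/// Simplified stand-in for a pool record (hypothetical fields).
#[derive(Debug, Clone, PartialEq, Eq)]
struct IpPool {
    name: String,
    pool_type: IpPoolType,
}

/// Pick the first pool of the requested type, so a multicast allocation
/// never lands in a unicast pool and vice versa.
fn resolve_pool<'a>(pools: &'a [IpPool], wanted: IpPoolType) -> Option<&'a IpPool> {
    pools.iter().find(|p| p.pool_type == wanted)
}
```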
This also includes some test formatting consistency for multicast tests.
Thanks @zeeshanlakhani. Looks like multicast is coming together really nicely in Omicron. Here is my first batch of comments. I think I'm about half way through, will try to get through the next half tomorrow.
/// populated by reconciler when group becomes ["Active"](MulticastGroupState::Active).
pub underlay_group_id: Option<Uuid>,
/// Rack ID multicast group was created on.
pub rack_id: Uuid,
What is this used for?
Two reasons. We used to find the associated ports (which is logic I should reconfigure). This was also tying rack_id to groups if we (once we?) have a multi-rack world. Mainly the first one, though.
I'll return to this once I rework the find ports fn.
Coming back to this in the next commit.
Alrighty, through the second half. This is a massive chunk of work and a lot to wrap your arms around!
Schema updates:
- Use (only) ff04::/64 admin-scoped default for multicast underlay
addresses (not whole space)
- Remove vni from underlay_multicast_group structure and associated index
- Improve comment consistency across mcast schemas
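The restriction to ff04::/64 amounts to a prefix check on the first four 16-bit segments of the address. A minimal sketch, assuming nothing beyond the prefix stated above (`in_underlay_subnet` is a hypothetical helper, not a function from this PR):

```rust
use std::net::Ipv6Addr;

/// True when `addr` falls inside ff04::/64, the admin-scoped prefix
/// used for underlay multicast addresses.
fn in_underlay_subnet(addr: Ipv6Addr) -> bool {
    let s = addr.segments();
    // A /64 prefix covers the first four 16-bit segments.
    s[0] == 0xff04 && s[1] == 0 && s[2] == 0 && s[3] == 0
}
```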
Database Operations:
- New ops/ module with atomic operations addressing TOCTOU concerns:
- member_attach.rs: CTE-based atomic member attachment
- member_reconcile.rs: CAS operations for RPW reconciliation
- Refactor of members.rs datastore with sled_id tracking and lifecycle management
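The CAS pattern mentioned for member_reconcile.rs can be modeled in memory as "write the new state only if the current state is still the one you read". The sketch below is an illustration and not the datastore code: it uses the member states named in this commit message (Joining, Joined, Left), while `cas_member_state`, the `u64` member id, and the `HashMap` standing in for the table are all assumptions.

```rust
use std::collections::HashMap;

/// Member states as named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemberState {
    Joining,
    Joined,
    Left,
}

/// Compare-and-swap on a member's state: the write happens only when the
/// stored state still equals `expected`. Returns whether the swap happened,
/// so a stale reader learns it lost the race instead of overwriting.
fn cas_member_state(
    members: &mut HashMap<u64, MemberState>,
    id: u64,
    expected: MemberState,
    new: MemberState,
) -> bool {
    match members.get_mut(&id) {
        Some(state) if *state == expected => {
            *state = new;
            true
        }
        // Missing member or state changed since it was read.
        _ => false,
    }
}
```

In SQL terms this corresponds to an update whose WHERE clause includes the expected state, with the affected row count playing the role of the boolean.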
Instance Integration:
- Updates to multicast reconciler activation after instance lifecycle operations (start, stop, reboot, migrate)
- saga update: Update multicast member sled_id in dedicated saga node with undo support
RPW Background Tasks:
- Updates to reconciler logic for member state transitions (Joining → Joined → Left)
- Better DPD synchronization with retry and error handling
Database operations:
- Use CTE for atomic member attachment (addresses TOCTOU concerns)
- Use CAS operations for member reconciliation in RPW
- New ops module with member_attach and member_reconcile implementations
Address validation:
- Add multicast subnet constants to common/address.rs
- Use constants for IP pool validation (replaces hardcoded ranges)
Authorization:
- Allow any authenticated user to create/modify multicast groups in their fleet (not just Fleet::Admin)
- Enables cross-project and cross-silo multicast communication
- Added create_child and modify permissions to MulticastGroup policy
Note: Review IP Pool extensions first.
Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances.
Closes #8242.
This currently points to #9084, which is needed for this to work.