
@zeeshanlakhani
Collaborator

@zeeshanlakhani zeeshanlakhani commented Sep 25, 2025

Note: Review IP Pool extensions first.

Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances.

Closes #8242.

This currently points to #9084, which is needed for this to work.

Highlights:

  • DB: new multicast_group tables; member lifecycle management
  • API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
  • Control plane: RPW reconcilers for groups/members; sagas that apply dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
  • Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
  • Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
  • Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:

  • Database schema: external and underlay multicast groups; member/instance association tables
  • Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
  • API layer: endpoints and validation; default-VNI semantics when VPC not provided
  • Sled agent: OPTE stubs and compatibility shims for older agents
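
The default-VNI fallback and source-IP validation mentioned above could look roughly like the following. This is a minimal sketch, not the code in this PR: `Vni`, `DEFAULT_MULTICAST_VNI` (with a placeholder value), `resolve_vni`, and `validate_sources` are hypothetical names, and the rule that sources must be unicast and share the group's address family is an assumption.

```rust
use std::net::IpAddr;

/// Hypothetical stand-in for a VNI newtype.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Vni(pub u32);

/// Placeholder value for illustration only; not Omicron's actual default.
pub const DEFAULT_MULTICAST_VNI: Vni = Vni(12345);

/// Use the VPC's VNI when a VPC is given, otherwise fall back to the default.
pub fn resolve_vni(vpc_vni: Option<Vni>) -> Vni {
    vpc_vni.unwrap_or(DEFAULT_MULTICAST_VNI)
}

/// Assumed rules: the group must be multicast, and every source must be a
/// unicast address of the same family (IPv4 or IPv6) as the group.
pub fn validate_sources(group: IpAddr, sources: &[IpAddr]) -> Result<(), String> {
    if !group.is_multicast() {
        return Err(format!("{group} is not a multicast address"));
    }
    for src in sources {
        if src.is_multicast() {
            return Err(format!("source {src} is a multicast address"));
        }
        if src.is_ipv4() != group.is_ipv4() {
            return Err(format!("source {src} does not match the group's address family"));
        }
    }
    Ok(())
}
```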

Workflows Implemented:

  1. Instance lifecycle integration:

    • "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
    • "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
    • "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
    • "Delete" -> remove instance memberships; group deletion is explicit
    • "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
    • Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming
  2. RPW reconciliation:

    • ensure dataplane switches match database state
    • handle sled migrations and state transitions
    • eventual consistency with retry logic
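
A minimal sketch of one RPW pass, assuming the reconciler diffs desired membership (from the database) against what the switch currently holds and retries on transient failures. `Plan`, `plan`, and `reconcile` are hypothetical names; the `apply` closure stands in for the Dendrite DPD calls, which are not modeled here.

```rust
use std::collections::BTreeSet;

/// Changes needed to make the switch match the database.
#[derive(Debug, PartialEq, Eq)]
pub struct Plan<T> {
    pub add: Vec<T>,
    pub remove: Vec<T>,
}

impl<T> Plan<T> {
    pub fn is_noop(&self) -> bool {
        self.add.is_empty() && self.remove.is_empty()
    }
}

/// Diff desired (DB) state against actual (dataplane) state.
pub fn plan<T: Ord + Clone>(desired: &BTreeSet<T>, actual: &BTreeSet<T>) -> Plan<T> {
    Plan {
        add: desired.difference(actual).cloned().collect(),
        remove: actual.difference(desired).cloned().collect(),
    }
}

/// Run up to `max_attempts` passes and report whether the states converged.
pub fn reconcile<T, F>(
    desired: &BTreeSet<T>,
    actual: &mut BTreeSet<T>,
    max_attempts: u32,
    mut apply: F,
) -> bool
where
    T: Ord + Clone,
    F: FnMut(&Plan<T>) -> Result<(), String>,
{
    for _ in 0..max_attempts {
        let p = plan(desired, actual);
        if p.is_noop() {
            return true;
        }
        // On success, record the switch's new state; on error, leave it
        // untouched so the next pass recomputes the same diff and retries.
        if apply(&p).is_ok() {
            for t in &p.remove {
                actual.remove(t);
            }
            actual.extend(p.add.iter().cloned());
        }
    }
    plan(desired, actual).is_noop()
}
```

Because the diff is recomputed from scratch on every pass, running the reconciler again after a partial failure is harmless, which is the property that eventual consistency relies on.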

Migrations:

  • Apply schema changes in schema/crdb/multicast-group-support/up01.sql (and update dbinit.sql)
  • Bump schema versions accordingly

API/Compatibility:

  • OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
  • Contains a version change (to v5) as InstanceEnsureBody has been modified to
    include multicast_groups associated with an instance in the underlying sled config
  • Regenerate clients where applicable

References:

Follow-ups include:

  • OPTE integration
  • commtest extension
  • omdb commands are tracked in issues
  • pool and group stats

This work introduces multicast IP pool capabilities to support external
multicast traffic routing through the rack's switching infrastructure.

Includes:
  - Add IpPoolType enum (unicast/multicast) with unicast as default
  - Add multicast pool fields: switch_port_uplinks (UUID[]), mvlan (VLAN ID)
  - Add database migration (multicast-support/up01.sql) with new columns and indexes
  - Add ASM/SSM range validation for multicast pools to prevent mixing
  - Add pool type-aware resolution for IP allocation
  - Add custom deserializer for switch port uplinks with deduplication
  - Update external API params/views for multicast pool configuration
  - Add SSM constants (IPV4_SSM_SUBNET, IPV6_SSM_FLAG_FIELD) for validation
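
To illustrate the ASM/SSM mixing check, here is a sketch using only `std::net`. It assumes the RFC 4607 SSM ranges (232.0.0.0/8 for IPv4, ff3x::/32 for IPv6); `McastKind` and `validate_multicast_range` are hypothetical names and do not reflect the PR's actual constants or functions. Checking only the endpoints is not enough: 231.0.0.0 to 233.0.0.0 starts and ends in ASM space but contains the whole IPv4 SSM block, so the sketch tests for overlap instead.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Whether a multicast range is any-source (ASM) or source-specific (SSM).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum McastKind {
    Asm,
    Ssm,
}

/// Map an address onto a u128 so IPv4 and IPv6 ranges compare the same way.
fn to_u128(addr: IpAddr) -> u128 {
    match addr {
        IpAddr::V4(v4) => u128::from(u32::from(v4)),
        IpAddr::V6(v6) => u128::from(v6),
    }
}

/// Inclusive bounds of every SSM block (RFC 4607) for one address family.
fn ssm_blocks(ipv6: bool) -> Vec<(u128, u128)> {
    if ipv6 {
        // ff3x::/32 for each of the 16 scope values x; 96 host bits per block.
        (0u128..16)
            .map(|scope| {
                let start = (0xff30u128 | scope) << 112;
                (start, start | ((1u128 << 96) - 1))
            })
            .collect()
    } else {
        // 232.0.0.0/8
        let start = u128::from(u32::from(Ipv4Addr::new(232, 0, 0, 0)));
        vec![(start, start | 0x00ff_ffff)]
    }
}

/// Reject ranges that are not multicast, mix families, or straddle ASM and SSM.
pub fn validate_multicast_range(first: IpAddr, last: IpAddr) -> Result<McastKind, String> {
    if first.is_ipv4() != last.is_ipv4() {
        return Err("range endpoints use different address families".into());
    }
    if !first.is_multicast() || !last.is_multicast() {
        return Err("range endpoints must be multicast addresses".into());
    }
    let (lo, hi) = (to_u128(first), to_u128(last));
    if lo > hi {
        return Err("range start is after range end".into());
    }
    let mut overlaps_ssm = false;
    let mut inside_one_ssm_block = false;
    for (start, end) in ssm_blocks(first.is_ipv6()) {
        overlaps_ssm |= lo <= end && hi >= start;
        inside_one_ssm_block |= lo >= start && hi <= end;
    }
    match (overlaps_ssm, inside_one_ssm_block) {
        (false, _) => Ok(McastKind::Asm),
        (true, true) => Ok(McastKind::Ssm),
        (true, false) => Err("range mixes ASM and SSM addresses".into()),
    }
}
```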

Database schema updates:
  - ip_pool table: pool_type, switch_port_uplinks, mvlan columns
  - Index on pool_type for efficient filtering
  - Migration preserves existing pools as unicast type by default
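
A sketch of how a unicast default and type-aware pool resolution might fit together. `IpPoolType` mirrors the enum named above, but `Pool`, its `is_default` field, `from_db`, and `resolve_default_pool` are hypothetical; `from_db(None)` models a pre-existing row that picks up the migration's `unicast` default.

```rust
/// Pool type; `Unicast` is the default, matching the migration's behavior.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum IpPoolType {
    #[default]
    Unicast,
    Multicast,
}

impl IpPoolType {
    /// Hypothetical decoder: a missing value falls back to the default.
    pub fn from_db(value: Option<&str>) -> Result<Self, String> {
        match value {
            None => Ok(Self::default()),
            Some("unicast") => Ok(Self::Unicast),
            Some("multicast") => Ok(Self::Multicast),
            Some(other) => Err(format!("unknown pool type: {other}")),
        }
    }
}

/// Hypothetical pool record, trimmed to the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct Pool {
    pub name: String,
    pub pool_type: IpPoolType,
    pub is_default: bool,
}

/// Type-aware resolution: only pools of the requested type are considered,
/// so a unicast allocation is never served from a multicast pool.
pub fn resolve_default_pool(pools: &[Pool], want: IpPoolType) -> Option<&Pool> {
    pools.iter().find(|p| p.pool_type == want && p.is_default)
}
```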

This provides the foundation for multicast group functionality while
maintaining full backward compatibility with existing unicast pools.

References (for review):
  - RFD 488: https://rfd.shared.oxide.computer/rfd/488
  - Dendrite PRs (based on recency):
    * oxidecomputer/dendrite#132
    * oxidecomputer/dendrite#109
    * oxidecomputer/dendrite#14
@zeeshanlakhani zeeshanlakhani changed the title "Zl/mcast impl" to "[feat] Multicast Group Support" Sep 25, 2025
@zeeshanlakhani zeeshanlakhani changed the base branch from main to zl/ip-pool-multicast-support September 25, 2025 16:04
@zeeshanlakhani zeeshanlakhani self-assigned this Sep 25, 2025
@zeeshanlakhani zeeshanlakhani changed the title "[feat] Multicast Group Support" to "[feat, multicast] Multicast Group Support" Sep 25, 2025
Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required
for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming
and exposes an API for operating multicast groups over instances.

Highlights:
  - DB: new multicast_group tables; member lifecycle management
  - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
  - Control plane: RPW reconcilers for groups/members; sagas that apply dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
  - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
  - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
  - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:
  - Database schema: external and underlay multicast groups; member/instance association tables
  - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
  - API layer: endpoints and validation; default-VNI semantics when VPC not provided
  - Sled agent: OPTE stubs and compatibility shims for older agents

Workflows Implemented:
  1. Instance lifecycle integration:

     - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
     - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
     - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
     - "Delete" -> remove instance memberships; group deletion is explicit
     - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
     - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming

  2. RPW reconciliation:

     - ensure dataplane switches match database state
     - handle sled migrations and state transitions
     - Eventual consistency with retry logic

Migrations:
  - Apply schema changes in schema/crdb/multicast-group-support/up01.sql (and update dbinit.sql)
  - Bump schema versions accordingly

API/Compatibility:
  - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
  - Contains a version change (to v5) as InstanceEnsureBody has been modified to
    include multicast_groups associated with an instance in the
    underlying sled config
  - Regenerate clients where applicable

References:
  - RFD 488: https://rfd.shared.oxide.computer/rfd/488
  - IP Pool extensions: #9084
  - Dendrite PRs (based on recency):
    * oxidecomputer/dendrite#132
    * oxidecomputer/dendrite#109
    * oxidecomputer/dendrite#14

Follow-ups include:
  - OPTE integration
  - commtest extension
  - omdb commands are tracked in issues
  - pool and group stats
…sed on config

Because OPTE and Maghemite updates for statically routed multicast are still to come,
we gate RPW and saga actions behind runtime configuration (turned "on" for tests). The new
API endpoints are tagged "experimental."
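
The runtime gate can be sketched as an early return at the top of a background task. `MulticastConfig`, `TaskOutcome`, and `activate_reconciler` are hypothetical names, not the Nexus config or background-task API; the sketch assumes the flag defaults to off, with tests turning it on.

```rust
/// Hypothetical runtime config; `enabled` defaults to false (feature off).
#[derive(Debug, Clone, Copy, Default)]
pub struct MulticastConfig {
    pub enabled: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TaskOutcome {
    Disabled,
    Ran { groups_reconciled: usize },
}

/// Background-task body: bail out before touching the dataplane when gated off.
pub fn activate_reconciler(cfg: &MulticastConfig, pending_groups: &[&str]) -> TaskOutcome {
    if !cfg.enabled {
        return TaskOutcome::Disabled;
    }
    // Real work (DB reads, DPD programming) would happen here.
    TaskOutcome::Ran { groups_reconciled: pending_groups.len() }
}
```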
@zeeshanlakhani
Collaborator Author

@internet-diglett, others, I added "feature-gating" to this PR, as well as "experimental" tagging for the new entrypoints.

Contributor

@rcgoodfellow rcgoodfellow left a comment


A few API questions to start out with.

This commit performs cleanup and adds additional tests related
to moving multicast groups to Fleet scope.

This commit also adds mvlan (for VLANs related to mcast egress traffic from the
rack) to mcast groups.
Also, we add a DB constraint and specialize the range error.
zeeshanlakhani added a commit that referenced this pull request Oct 15, 2025
This work introduces multicast IP pool capabilities to support external multicast traffic routing through the rack's switching infrastructure.

Closes #8217. 

Includes:
  - Add IpPoolType enum (unicast/multicast) with unicast as default
  - Add database migration (multicast-pool-support/up01.sql) with new columns and indexes
  - Add ASM/SSM range validation for multicast pools to prevent mixing
  - Add pool type-aware resolution for IP allocation
  - Update external API params/views for multicast pool configuration
  - Add SSM constants (IPV4_SSM_SUBNET, IPV6_SSM_SUBNET) for validation

Database schema updates:
  - ip_pool table: pool_type
  - Index on pool_type for efficient filtering
  - Migration preserves existing pools as unicast type by default

This provides the foundation for multicast group functionality while maintaining full backward compatibility with existing unicast pools.

References (for review):
  - RFD 488: https://rfd.shared.oxide.computer/rfd/488
  - Dendrite PRs (based on recency):
    * oxidecomputer/dendrite#132
    * oxidecomputer/dendrite#109
    * oxidecomputer/dendrite#14

TODOs:
- [x] Multicast Group Support (#9091)
Base automatically changed from zl/ip-pool-multicast-support to main October 15, 2025 13:28
This also makes formatting consistent across the multicast tests.
Contributor

@rcgoodfellow rcgoodfellow left a comment


Thanks @zeeshanlakhani. Looks like multicast is coming together really nicely in Omicron. Here is my first batch of comments. I think I'm about halfway through, and will try to get through the second half tomorrow.

```rust
/// populated by reconciler when group becomes ["Active"](MulticastGroupState::Active).
pub underlay_group_id: Option<Uuid>,
/// Rack ID multicast group was created on.
pub rack_id: Uuid,
```
Contributor


What is this used for?

Collaborator Author

@zeeshanlakhani zeeshanlakhani Oct 21, 2025


Two reasons. First, we used it to find the associated ports (logic I should rework). Second, it ties groups to a rack_id in case (once?) we have a multi-rack world. Mainly the first one, though.

Collaborator Author


I'll return to this once I rework the find ports fn.

Collaborator Author


Coming back to this in the next commit.

Contributor

@rcgoodfellow rcgoodfellow left a comment


Alrighty, through the second half. This is a massive chunk of work and a lot to wrap your arms around!

  Schema updates:
    - Use only the ff04::/64 admin-scoped default for multicast underlay
    addresses (not the whole ff04::/16 space)
    - Remove vni from underlay_multicast_group structure and associated index
    - Improve comment consistency across mcast schemas
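
A sketch of the ff04::/64 containment check using `u128` masks from `std::net`. `in_prefix` is a generic helper; `underlay_addr`, which places a 64-bit index in the low bits of the prefix, is a hypothetical allocation scheme rather than the PR's actual mapping.

```rust
use std::net::Ipv6Addr;

/// Admin-local scope default for underlay groups: ff04::/64.
pub const UNDERLAY_PREFIX: Ipv6Addr = Ipv6Addr::new(0xff04, 0, 0, 0, 0, 0, 0, 0);
pub const UNDERLAY_PREFIX_LEN: u32 = 64;

/// True when `addr` falls inside `prefix/len`.
pub fn in_prefix(addr: Ipv6Addr, prefix: Ipv6Addr, len: u32) -> bool {
    assert!(len <= 128, "prefix length out of range");
    // Shifting a u128 by 128 would overflow, so /0 gets an all-zero mask.
    let mask = if len == 0 { 0 } else { u128::MAX << (128 - len) };
    (u128::from(addr) & mask) == (u128::from(prefix) & mask)
}

/// Hypothetical allocation: put a 64-bit group index in the low 64 bits.
pub fn underlay_addr(index: u64) -> Ipv6Addr {
    Ipv6Addr::from(u128::from(UNDERLAY_PREFIX) | u128::from(index))
}
```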

  Database Operations:
    - New ops/ module with atomic operations addressing TOCTOU concerns:
      - member_attach.rs: CTE-based atomic member attachment
      - member_reconcile.rs: CAS operations for RPW reconciliation
    - Refactor of members.rs datastore with sled_id tracking and lifecycle management
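
The TOCTOU concern is a gap between reading a row and writing it. The sketch below models the two fixes with an in-memory table behind a `Mutex`: `attach` does check-and-insert in one critical section (standing in for a single CTE statement), and `cas_state` writes only when the current state matches the expected one, the analog of `UPDATE ... WHERE state = $expected`. `MemberStore` and its methods are hypothetical; the real code issues SQL against CockroachDB.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemberState {
    Joining,
    Joined,
    Left,
}

/// Hypothetical in-memory stand-in for the multicast member table.
pub struct MemberStore {
    rows: Mutex<HashMap<u64, MemberState>>,
}

impl MemberStore {
    pub fn new() -> Self {
        Self { rows: Mutex::new(HashMap::new()) }
    }

    /// Check-and-insert inside one critical section, like a single CTE:
    /// there is no window between "not yet a member" and the insert.
    pub fn attach(&self, id: u64) -> bool {
        let mut rows = self.rows.lock().unwrap();
        if rows.contains_key(&id) {
            return false;
        }
        rows.insert(id, MemberState::Joining);
        true
    }

    /// Compare-and-swap: write `new` only if the row is still in `expected`.
    /// `false` means the row is missing or another writer changed it first.
    pub fn cas_state(&self, id: u64, expected: MemberState, new: MemberState) -> bool {
        let mut rows = self.rows.lock().unwrap();
        match rows.get_mut(&id) {
            Some(cur) if *cur == expected => {
                *cur = new;
                true
            }
            _ => false,
        }
    }
}
```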

  Instance Integration:
    - Updates to multicast reconciler activation after instance lifecycle operations (start, stop, reboot, migrate)
    - Saga update: update the multicast member's sled_id in a dedicated saga node with undo support
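
A minimal sketch of the action/undo pairing a saga node relies on: nodes run in order and, if one fails, the nodes that already succeeded are undone in reverse. `Node` and `run_saga` are hypothetical and far simpler than the Steno saga framework Omicron actually uses.

```rust
/// Hypothetical saga node: a forward action paired with its compensating undo.
pub struct Node<S> {
    pub name: &'static str,
    pub action: fn(&mut S) -> Result<(), String>,
    pub undo: fn(&mut S),
}

/// Run nodes in order. If one fails, undo every node that already
/// succeeded, newest first, then report which node failed.
pub fn run_saga<S>(state: &mut S, nodes: &[Node<S>]) -> Result<(), String> {
    for (i, node) in nodes.iter().enumerate() {
        if let Err(e) = (node.action)(state) {
            for done in nodes[..i].iter().rev() {
                (done.undo)(state);
            }
            return Err(format!("{} failed: {e}", node.name));
        }
    }
    Ok(())
}
```

For the sled_id update, the undo restores the member's previous sled, so a later failure (for example, in switch programming) leaves the membership pointing where it did before the saga started.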

  RPW Background Tasks:
    - Updates to reconciler logic for member state transitions (Joining → Joined → Left)
    - Better DPD synchronization with retry and error handling
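
The member states named above can be sketched as a transition function. The event names and the exact table (for example, migration re-entering `Joining` so the target sled gets programmed) are assumptions for illustration, not the reconciler's actual rules; `None` marks a transition the sketch rejects.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemberState {
    Joining,
    Joined,
    Left,
}

/// Hypothetical events that drive a member between states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    DataplaneProgrammed,
    InstanceStarted,
    InstanceStopped,
    InstanceMigrated,
}

pub fn next(state: MemberState, event: Event) -> Option<MemberState> {
    match (state, event) {
        // Switch acknowledged the programming: the member is live.
        (MemberState::Joining, Event::DataplaneProgrammed) => Some(MemberState::Joined),
        // Stop deactivates dataplane membership; the DB row is retained.
        (MemberState::Joining | MemberState::Joined, Event::InstanceStopped) => Some(MemberState::Left),
        // Restart re-enters Joining so the reconciler reprograms the switch.
        (MemberState::Left, Event::InstanceStarted) => Some(MemberState::Joining),
        // Migration: program the target sled before the member is live again.
        (MemberState::Joined, Event::InstanceMigrated) => Some(MemberState::Joining),
        _ => None,
    }
}
```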

  Database operations:
    - Use CTE for atomic member attachment (addresses TOCTOU concerns)
    - Use CAS operations for member reconciliation in RPW
    - New ops module with member_attach and member_reconcile implementations

  Address validation:
    - Add multicast subnet constants to common/address.rs
    - Use constants for IP pool validation (replaces hardcoded ranges)

  Authorization:
    - Allow any authenticated user to create/modify multicast groups in their fleet (not just Fleet::Admin)
    - Enables cross-project and cross-silo multicast communication
    - Added create_child and modify permissions to MulticastGroup policy