Relay chain coretime assigner does not support more assignments than fit in a single XCM message (currently 28) #6102

seadanda · 2024-10-17T09:55:02Z

The system allows interlacing right down to the single-block level (80 assignments per timeslice, each with a CoreMask that has one bit set).
The problem is that this creates a call that doesn't fit in an XCM message (at most 28 assignments fit in a single XCM).

We can easily chunk that on the Coretime Chain side and send it over as three messages. However, with the current design that means calling assign_core multiple times on the relay for a given timeslice, which is disallowed by design due to some assumptions made by the scheduler.

Mitigation in the meantime:
28 assignments is the limit, but 27 assignments that don't add up to a complete mask will be rejected, because the relay requires a full mask. Therefore we take the first 27 and append an Idle assignment, taking it to 28.
Anybody who interlaces more than 27 times will lose some assignments, but that is better than the current system, which drops the entire core's assignments because the message is too big. Once an assignment is missed, it's gone from the workplan, which is a total mess. It is far preferable to truncate and assign everything we can until we can drop some assumptions in the scheduler on the relay.
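The truncation described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked mitigations: `CoreAssignment` here is a simplified stand-in, parts are plain `u16` values rather than the real `PartsOf57600` type, and `truncate_with_idle` is a hypothetical helper name.

```rust
/// Simplified stand-in for the relay's assignment type (hypothetical).
#[derive(Clone, Debug, PartialEq)]
enum CoreAssignment {
    Idle,
    Task(u32),
}

/// A fully scheduled core adds up to 57600 parts.
const FULL: u16 = 57_600;
/// Assignments that fit in one XCM message, per the issue.
const MAX_PER_MESSAGE: usize = 28;

/// If there are too many assignments for one message, keep the first 27
/// and append a single Idle entry that covers the remaining parts.
fn truncate_with_idle(
    mut assignments: Vec<(CoreAssignment, u16)>,
) -> Vec<(CoreAssignment, u16)> {
    if assignments.len() <= MAX_PER_MESSAGE {
        return assignments;
    }
    assignments.truncate(MAX_PER_MESSAGE - 1);
    // Sum in u32 so that malformed input cannot overflow the u16 parts.
    let used: u32 = assignments
        .iter()
        .map(|(_, parts)| u32::from(*parts))
        .sum();
    let idle = u32::from(FULL).saturating_sub(used) as u16;
    if idle > 0 {
        assignments.push((CoreAssignment::Idle, idle));
    }
    assignments
}
```

With 80 equal assignments of 720 parts each, this keeps 27 tasks (19440 parts) and appends an Idle entry of 38160 parts, so the message carries 28 entries that together form a full core.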

Mitigation for the Polkadot launch: polkadot-fellows/runtimes#434
Testnets mitigation: #6022

seadanda · 2024-10-17T10:43:18Z

Just copying the initial idea here that I had when this first popped up:
The likelihood of somebody interlacing into more than 27 assignments is very low, so assigning each chunk one block later than the previous one could be a fix that maintains some of the assumptions in the implementation. The workloads in the first chunk would start on time, while those in the second or third chunk could see a short outage of a block or two, but by timeslice 2 of the region they would all be running as intended.

assign_core has the signature

	pub fn assign_core(
		core_idx: CoreIndex,
		begin: BlockNumberFor<T>,
		assignments: Vec<(CoreAssignment, PartsOf57600)>,
		end_hint: Option<BlockNumberFor<T>>,
	) -> Result<(), DispatchError> {

and we just need to never call it more than once for the same core and begin combination.
So for 80 assignments:
0..27 assigned on begin
27..55 assigned on begin+1
55..80 assigned on begin+2
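A rough sketch of that staggering, assuming a hypothetical helper `stagger_chunks` on the sending side. The chunk size is a parameter, so the exact split may differ from the ranges listed above; the only property that matters is that each chunk fits in one message and gets its own begin.

```rust
/// Split `assignments` into chunks of at most `max_per_message` entries and
/// give each chunk its own begin block: begin, begin + 1, begin + 2, ...
/// Each returned pair corresponds to one `assign_core` call, so no two calls
/// share the same (core, begin) combination.
fn stagger_chunks<A: Clone>(
    begin: u32,
    assignments: &[A],
    max_per_message: usize,
) -> Vec<(u32, Vec<A>)> {
    assert!(max_per_message > 0, "chunk size must be non-zero");
    assignments
        .chunks(max_per_message)
        .enumerate()
        .map(|(i, chunk)| (begin + i as u32, chunk.to_vec()))
        .collect()
}
```

For 80 assignments and a chunk size of 28 this yields three calls, with 28, 28 and 24 entries, at begin, begin+1 and begin+2.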

But we still need to change the relay logic to drop the requirement for each assign_core call to contain a fully scheduled core. As part of that we'd need to add logic to pad an underscheduled core with Idle. Then, when a further underscheduled assignment comes in within a timeslice (for example), it should remove the Idle padding, "append" the new parts, and recompute the padding.
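The pad, strip, append and re-pad cycle could look something like the sketch below. It is only an illustration under simplifying assumptions: `CoreAssignment` is a stand-in type, parts are plain `u16` values, `append_and_repad` is a hypothetical name, and every Idle entry is treated as padding (a real implementation might need to distinguish explicitly requested Idle time).

```rust
/// Simplified stand-in for the relay's assignment type (hypothetical).
#[derive(Clone, Debug, PartialEq)]
enum CoreAssignment {
    Idle,
    Task(u32),
}

/// A fully scheduled core adds up to 57600 parts.
const FULL: u32 = 57_600;

/// Drop existing Idle padding, append the new parts, then pad with Idle
/// again if the core is still underscheduled. An overscheduled result is
/// rejected and leaves `existing` untouched.
fn append_and_repad(
    existing: &mut Vec<(CoreAssignment, u16)>,
    new_parts: &[(CoreAssignment, u16)],
) -> Result<(), &'static str> {
    // Build the merged schedule on the side so that errors change nothing.
    let mut merged: Vec<(CoreAssignment, u16)> = existing
        .iter()
        .chain(new_parts.iter())
        .filter(|entry| entry.0 != CoreAssignment::Idle)
        .cloned()
        .collect();
    let used: u32 = merged.iter().map(|(_, parts)| u32::from(*parts)).sum();
    if used > FULL {
        return Err("core would be overscheduled");
    }
    if used < FULL {
        merged.push((CoreAssignment::Idle, (FULL - used) as u16));
    }
    *existing = merged;
    Ok(())
}
```

Calling it first with a task of 20000 parts leaves the schedule as that task plus 37600 parts of Idle; a second call with a task of 37600 parts replaces the padding entirely.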

spanow · 2024-11-04T16:30:18Z

Hello,
May I work on this issue?

seadanda · 2024-11-05T08:08:54Z

Hi, yes this is free to take on

eskimor · 2024-11-06T10:17:30Z

@spanow have you started already? I hope not: the release is getting cut today and we have the other fixes ready, so I want to get this in today. Fixed the issue here by relaxing the append requirement: a call is now also accepted if begin == last. The mask also no longer needs to be full.
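As a rough illustration of the relaxed ordering rule only (this is not the code from the fix, and both function names are hypothetical), the change amounts to turning a strict comparison against the last scheduled begin into a non-strict one:

```rust
/// Strict rule, as described earlier in the thread: a new call must begin
/// after the last scheduled entry for the core.
fn accepts_strict(last_begin: Option<u32>, begin: u32) -> bool {
    last_begin.map_or(true, |last| begin > last)
}

/// Relaxed rule: begin == last is accepted as well, so a second call can
/// update the last scheduled entry instead of being rejected.
fn accepts_relaxed(last_begin: Option<u32>, begin: u32) -> bool {
    last_begin.map_or(true, |last| begin >= last)
}
```

Under the relaxed rule, several chunks for the same core and begin can be sent in separate messages, while calls that begin before the last scheduled entry are still refused.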

spanow · 2024-11-06T12:13:37Z

Totally understandable
It's alright, thank you for notifying me @eskimor

eskimor · 2024-11-06T13:45:43Z

@spanow @seadanda suggested a unit test to ensure non-exhaustive (and overfull) assignments don't cause any issues. If you are interested in an even more beginner friendly task, that would still be left to do (and is not time-critical).

Relax requirements for `assign_core` so that it accepts updates for the last scheduled entry. Fixes #6102. Co-authored-by: eskimor, GitHub Action

seadanda · 2024-11-06T21:45:41Z

#6397 is the issue if you're interested

seadanda added I2-bug The node fails to follow expected behavior. T8-polkadot This PR/Issue is related to/affects the Polkadot network. labels Oct 17, 2024

bkchr added D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. C2-good-first-issue A task for a first time contributor to become familiar with the Polkadot-SDK. labels Oct 17, 2024

seadanda added the C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. label Oct 17, 2024

eskimor mentioned this issue Nov 6, 2024

Fix #6102 #6384

Merged

eskimor closed this as completed in #6384 Nov 6, 2024
