Add Litmus tests #170
base: main
Conversation
The bootrom SMP support consists of pausing all secondary cores after a first common reset sequence and letting the main core carry out the initialization process. The main (non-SMP) core is statically determined by a macro at the beginning of the bootrom. The secondary cores are then woken up before moving to the next boot stage, i.e. in boot_next_stage. The wakeup sequence consists of issuing a fence, writing the CLINT software-interrupt (MSIP) register of each secondary hart, and polling each register until the hart has cleared its IPI (see the hunk below).
Possible problems:
- We could avoid sending interrupts to the main core itself, simply resuming the other cores.
The SMP support in the software runtime (crt0.S) instead fixes the main core to core 0. All other cores are paused after some common required initialization steps in crt0.S. Non-main cores wait in a WFI loop for software interrupts. The wake-up sequence in this case only sends IPIs to all cores except core 0 (a minimal sketch of the park loop follows below).
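For illustration, a hedged C sketch of such a park loop; the actual code lives in crt0.S (assembly), and the hart-ID helper and CLINT layout below are assumptions inferred from the smp_resume hunk further down:

#include <stdint.h>

// Hedged sketch of the park loop described above, in C for illustration;
// the real implementation is assembly in crt0.S. Assumes each hart's CLINT
// MSIP register is a 32-bit word at byte offset hart_id * 4 from
// __base_clint, matching the smp_resume hunk below.
extern char __base_clint[];

static inline uint64_t get_hart_id(void) {
    uint64_t id;
    asm volatile("csrr %0, mhartid" : "=r"(id));
    return id;
}

void park_secondary_hart(void) {
    uint64_t hart = get_hart_id();
    if (hart == 0)
        return;  // core 0 is the main core and continues initialization
    volatile uint32_t *msip = (volatile uint32_t *)(__base_clint + hart * 4);
    while (!*msip)
        asm volatile("wfi");  // sleep until an IPI (software interrupt) arrives
    *msip = 0;  // clear the IPI so the waking core can observe the wakeup
}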
Zero-stage bootloader also required some adaptations wrt #85 due to the different behavior upon resuming secondary harts in the Cheshire runtime (crt0.S). When calling smp_resume the secondary harts jump to main - exactly as for the primary hart, but skipping some cold init steps - instead of jumping to the point in the code where the smp_resume is placed.
LGTM. Can we add smp_hello to the Cheshire CI? I remember that we previously had issues with executing from either DRAM or SPM because of the way the stack was set up. It would be great to see if this is working now. Just out of curiosity, did you test the bootloader for a multicore configuration (e.g. SMP Linux)?
Regarding your comments:

> We could avoid sending interrupts to the main core itself, simply resuming the other cores.

Why do you think this might be a problem? If synchronisation is needed, we could also add a barrier. Not sure if there would be a reason for it, though.
> Zero-stage bootloader also required some adaptations wrt #85 due to the different behavior upon resuming secondary harts in the Cheshire runtime (crt0.S). When calling smp_resume the secondary harts jump to main - exactly as for the primary hart, but skipping some cold init steps - instead of jumping to the point in the code where the smp_resume is placed.
I think this makes sense!
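As a hedged illustration of that behavior (the names below are placeholders, not the actual Cheshire code): after smp_resume, secondary harts enter main just like the primary hart, so the program dispatches on the hart ID early on.

#include <stdint.h>

extern uint64_t get_hart_id(void);  // placeholder for a hart-ID helper
extern void smp_resume(void);       // wakes the secondary harts
extern int secondary_work(void);    // placeholder for per-hart test code

// Hedged sketch: with the behavior described above, resumed secondary
// harts re-enter main (skipping cold init), so main branches on hart ID.
int main(void) {
    if (get_hart_id() != 0)
        return secondary_work();  // resumed secondary harts land here
    // ...primary-hart cold initialization...
    smp_resume();  // secondaries wake up and re-enter main from the top
    // ...primary-hart work, possibly synchronizing with the secondaries...
    return 0;
}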
fence();
for (uint32_t i = 1; i < num_harts; i++) {
    *reg32(&__base_clint, i << 2) = 0x1;
    while (*reg32(&__base_clint, i << 2))
The smp_resume routine also waits for the interrupt to be cleared by the secondary cores before proceeding (here). This ensures that when smp_resume returns, the IPIs have been propagated to all cores and the cores have woken up. However, this has the downside of potentially deadlocking if another core does not wake up properly. Also, if another core has not reached the WFI loop for any reason, this will stall core 0 until then. Finally, is this really race-free?

The main possible drawback that I see here is that it might introduce a delay between waking up cores. Could the same be achieved by adding a barrier after smp_resume if necessary?
Yes, we could remove the CLINT register polling and leave the synchronization up to the programmer (e.g. by adding a barrier) if needed.
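A minimal sketch of that alternative, assuming a variant of smp_resume that only sends the IPIs (no CLINT polling) and the smp_barrier_* helpers from this PR; N_CORES is a placeholder for the hart count:

#include <stdint.h>

extern void smp_resume(void);            // assumed: fire-and-forget IPIs
extern void smp_barrier_up(uint64_t n);  // from this PR
extern void smp_barrier_down(void);      // from this PR
#define N_CORES 2                        // placeholder hart count

// Hedged sketch: resume without polling, then synchronize explicitly.
// Every hart (including the woken secondaries) must call the barrier.
void start_secondaries_and_sync(void) {
    smp_resume();             // send IPIs to harts 1..N-1, do not wait
    smp_barrier_up(N_CORES);  // rendezvous once all harts have woken up
    smp_barrier_down();       // second phase (see the race discussion below)
}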
Regarding CI:
Yes, that should be possible. I have only tested the …

Regarding bootrom SMP:
I think both approaches would be fine. I am just not sure whether we need to synchronize cores, and whether this is a proper way of doing it. However, the current approach is working and I don't see major issues with it.
Co-authored-by: Emanuele Parisi <[email protected]>
- Add dual-core configuration in testbench
- Add number of cores parameter for consistent CLINT/PLIC generation
- Add PLIC configuration file generation according to number of cores
- Bump nonfree to version with baremetal SMP tests
void smp_barrier_init() {
    _barrier_target = 0;
}

void smp_barrier_up(uint64_t n_processes) {
    barrier_wait(&_barrier_target, 1, n_processes);
}

void smp_barrier_down() {
    barrier_wait(&_barrier_target, -1, 0);
}
I think we need two barriers to avoid race conditions, e.g. core 0 passes barrier_up and already reaches barrier_down, decrementing it before core 1 had a chance to pass barrier_up, which can result in a deadlock. See here for an example.
Is this not already enforced by the barrier itself? Can core 0 pass barrier_up if core 1 has not reached it as well?
Core 0 passing barrier_up will guarantee that core 1 has reached it, but not necessarily that it has passed it. Then, if core 0 reaches barrier_down before core 1 has passed barrier_up, it might decrement the barrier again before core 1 observed that it ever reached num_harts. This is especially pronounced on WB configurations, where the fence instructions invalidate the L1 and subsequent loads of the barrier lie a few hundred cycles apart, which can easily be sufficient for the other core to reach the next barrier and decrement it.

By adding the second barrier, we can guarantee that core 0 has passed the first barrier_up when core 1 reaches barrier_down. I hope that makes sense :-D
Yes, that makes total sense! Will push a fix for this.
Thanks for the catch :)
Also, I was wondering whether we could have a more efficient implementation of this, one that does not require a fixed interleaving of counting up and down, i.e. a barrier() function that can be called multiple times in a row. At the moment that can be achieved by

void smp_barrier() {
    smp_barrier_up(N_CORES);
    smp_barrier_down();
}

But this seems a bit inefficient, especially if each direction requires counting up/down on two variables.
Any insight on this is welcome :)
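One common alternative would be a sense-reversing barrier: a single counter plus a generation counter, safe to call back to back. A hedged sketch, with illustrative names not taken from this PR:

#include <stdint.h>

static volatile uint64_t _barrier_count = 0;
static volatile uint64_t _barrier_sense = 0;

// Hedged sketch of a reusable sense-reversing barrier. The last hart to
// arrive resets the counter and bumps the sense; all others spin on the
// sense, not the counter, so consecutive calls cannot interfere.
void smp_barrier_reusable(uint64_t n_harts) {
    uint64_t sense = __atomic_load_n(&_barrier_sense, __ATOMIC_ACQUIRE);
    if (__atomic_add_fetch(&_barrier_count, 1, __ATOMIC_ACQ_REL) == n_harts) {
        __atomic_store_n(&_barrier_count, 0, __ATOMIC_RELAXED);
        __atomic_store_n(&_barrier_sense, sense + 1, __ATOMIC_RELEASE);
    } else {
        while (__atomic_load_n(&_barrier_sense, __ATOMIC_ACQUIRE) == sense)
            ;  // wait for the last hart to flip the sense
    }
}

This needs only one shared update per arriving hart and one spin variable per round, and no fixed up/down interleaving is required.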
// Check that CLINT core count is equal to `NumIrqHarts`
if (clint_reg_pkg::NumCores != NumIrqHarts)
    $fatal(1, "CLIC core count (%d) does not match `NumIrqHarts` (%d)",
Suggested change:
-    $fatal(1, "CLIC core count (%d) does not match `NumIrqHarts` (%d)",
+    $fatal(1, "CLINT core count (%d) does not match `NumIrqHarts` (%d)",
Contributions:
- sw/deps
- utils/litmus