Add Litmus tests #170
base: main
Conversation
The bootrom SMP support consists of pausing all secondary cores after a first common reset sequence and letting the main core carry out the initialization process. The main (non-SMP) core is statically determined by a macro at the beginning of the bootrom. The secondary cores are then woken up before moving to the next boot stage, i.e. in boot_next_stage. The wakeup sequence consists of issuing a fence, writing the CLINT software-interrupt (MSIP) register of each secondary hart, and polling each register until the hart has cleared its IPI (see the hunk below).
Possible problems:
- We could avoid sending interrupts to the main core itself, simply resuming the other cores.
The SMP support in the software runtime (crt0.S) instead fixes the main core to core 0. All other cores are paused after some common required initialization steps in crt0.S. Non-main cores wait in a WFI loop for software interrupts. The wake-up sequence in this case only sends IPIs to all cores except core 0 (a minimal sketch of the park loop follows below).
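For illustration, a hedged C sketch of such a park loop; the actual code lives in crt0.S (assembly), and the hart-ID helper and CLINT layout below are assumptions inferred from the smp_resume hunk further down:

#include <stdint.h>

// Hedged sketch of the park loop described above, in C for illustration;
// the real implementation is assembly in crt0.S. Assumes each hart's CLINT
// MSIP register is a 32-bit word at byte offset hart_id * 4 from
// __base_clint, matching the smp_resume hunk below.
extern char __base_clint[];

static inline uint64_t get_hart_id(void) {
    uint64_t id;
    asm volatile("csrr %0, mhartid" : "=r"(id));
    return id;
}

void park_secondary_hart(void) {
    uint64_t hart = get_hart_id();
    if (hart == 0)
        return;  // core 0 is the main core and continues initialization
    volatile uint32_t *msip = (volatile uint32_t *)(__base_clint + hart * 4);
    while (!*msip)
        asm volatile("wfi");  // sleep until an IPI (software interrupt) arrives
    *msip = 0;  // clear the IPI so the waking core can observe the wakeup
}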
Zero-stage bootloader also required some adaptations wrt #85 due to the different behavior upon resuming secondary harts in the Cheshire runtime (crt0.S). When calling smp_resume the secondary harts jump to main - exactly as for the primary hart, but skipping some cold init steps - instead of jumping to the point in the code where the smp_resume is placed.
LGTM. Can we add smp_hello to the Cheshire CI? I remember that we previously had issues with executing from either DRAM or SPM because of the way the stack was set up. It would be great to see if this is working now. Just out of curiosity, did you test the bootloader for a multicore configuration (e.g. SMP Linux)?
Regarding your comments:

> We could avoid sending interrupts to the main core itself, simply resuming the other cores.

Why do you think this might be a problem? If synchronisation is needed, we could also add a barrier. Not sure if there would be a reason for it, though.
> Zero-stage bootloader also required some adaptations wrt #85 due to the different behavior upon resuming secondary harts in the Cheshire runtime (crt0.S). When calling smp_resume the secondary harts jump to main - exactly as for the primary hart, but skipping some cold init steps - instead of jumping to the point in the code where the smp_resume is placed.
I think this makes sense!
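As a hedged illustration of that behavior (the names below are placeholders, not the actual Cheshire code): after smp_resume, secondary harts enter main just like the primary hart, so the program dispatches on the hart ID early on.

#include <stdint.h>

extern uint64_t get_hart_id(void);  // placeholder for a hart-ID helper
extern void smp_resume(void);       // wakes the secondary harts
extern int secondary_work(void);    // placeholder for per-hart test code

// Hedged sketch: with the behavior described above, resumed secondary
// harts re-enter main (skipping cold init), so main branches on hart ID.
int main(void) {
    if (get_hart_id() != 0)
        return secondary_work();  // resumed secondary harts land here
    // ...primary-hart cold initialization...
    smp_resume();  // secondaries wake up and re-enter main from the top
    // ...primary-hart work, possibly synchronizing with the secondaries...
    return 0;
}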
fence();
for (uint32_t i = 1; i < num_harts; i++) {
    *reg32(&__base_clint, i << 2) = 0x1;
    while (*reg32(&__base_clint, i << 2))
The smp_resume routine also waits for the interrupt to be cleared by the secondary cores before proceeding (here). This ensures that when smp_resume returns, the IPIs have been propagated to all cores and the cores have woken up. However, this has the downside of potentially deadlocking if another core does not wake up properly. Also, if another core has not reached the WFI loop for any reason, this will stall core 0 until then. Finally, is this really race-free?

The main possible drawback that I see here is that it might introduce a delay between waking up cores. Could the same be achieved by adding a barrier after smp_resume if necessary?
Yes, we could remove the CLINT register polling and leave the synchronization up to the programmer (e.g. by adding a barrier) if needed.
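A minimal sketch of that alternative, assuming a variant of smp_resume that only sends the IPIs (no CLINT polling) and the smp_barrier_* helpers from this PR; N_CORES is a placeholder for the hart count:

#include <stdint.h>

extern void smp_resume(void);            // assumed: fire-and-forget IPIs
extern void smp_barrier_up(uint64_t n);  // from this PR
extern void smp_barrier_down(void);      // from this PR
#define N_CORES 2                        // placeholder hart count

// Hedged sketch: resume without polling, then synchronize explicitly.
// Every hart (including the woken secondaries) must call the barrier.
void start_secondaries_and_sync(void) {
    smp_resume();             // send IPIs to harts 1..N-1, do not wait
    smp_barrier_up(N_CORES);  // rendezvous once all harts have woken up
    smp_barrier_down();       // second phase (see the race discussion below)
}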
Regarding CI:
Yes, that should be possible. I have only tested the …

Regarding bootrom SMP:
I think both approaches would be fine. I am just not sure whether we need to synchronize cores, and whether this is a proper way of doing it. However, the current approach is working and I don't see major issues with it.
Co-authored-by: Emanuele Parisi <[email protected]>
- Add dual-core configuration in testbench
- Add number of cores parameter for consistent CLINT/PLIC generation
- Add PLIC configuration file generation according to number of cores
- Bump nonfree to version with baremetal SMP tests
void smp_barrier_init() {
    _barrier_target = 0;
}

void smp_barrier_up(uint64_t n_processes) {
    barrier_wait(&_barrier_target, 1, n_processes);
}

void smp_barrier_down() {
    barrier_wait(&_barrier_target, -1, 0);
}
I think we need two barriers to avoid race conditions, e.g. core 0 passes barrier_up and already reaches barrier_down, decrementing it before core 1 had a chance to pass barrier_up, which can result in a deadlock. See here for an example.
Is this not already enforced by the barrier itself? Can core 0 pass barrier_up if core 1 has not reached it as well?
Core 0 passing barrier_up will guarantee that core 1 has reached it, but not necessarily that it has passed it. Then, if core 0 reaches barrier_down before core 1 has passed barrier_up, it might decrement the barrier again before core 1 observed that it ever reached num_harts. This is especially pronounced on WB configurations, where the fence instructions invalidate the L1 and subsequent loads of the barrier lie a few hundred cycles apart, which can easily be sufficient for the other core to reach the next barrier and decrement it.

By adding the second barrier, we can guarantee that core 0 has passed the first barrier_up when core 1 reaches barrier_down. I hope that makes sense :-D
Yes, that makes total sense! Will push a fix for this.
Thanks for the catch :)
Also, I was wondering whether we could have a more efficient implementation of this, one that does not require a fixed interleaving of counting up and down, i.e. a barrier() function that can be called multiple times in a row. At the moment that can be achieved by

void smp_barrier() {
    smp_barrier_up(N_CORES);
    smp_barrier_down();
}

But this seems a bit inefficient, especially if each direction requires counting up/down on two variables.
Any insight on this is welcome :)
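One common alternative would be a sense-reversing barrier: a single counter plus a generation counter, safe to call back to back. A hedged sketch, with illustrative names not taken from this PR:

#include <stdint.h>

static volatile uint64_t _barrier_count = 0;
static volatile uint64_t _barrier_sense = 0;

// Hedged sketch of a reusable sense-reversing barrier. The last hart to
// arrive resets the counter and bumps the sense; all others spin on the
// sense, not the counter, so consecutive calls cannot interfere.
void smp_barrier_reusable(uint64_t n_harts) {
    uint64_t sense = __atomic_load_n(&_barrier_sense, __ATOMIC_ACQUIRE);
    if (__atomic_add_fetch(&_barrier_count, 1, __ATOMIC_ACQ_REL) == n_harts) {
        __atomic_store_n(&_barrier_count, 0, __ATOMIC_RELAXED);
        __atomic_store_n(&_barrier_sense, sense + 1, __ATOMIC_RELEASE);
    } else {
        while (__atomic_load_n(&_barrier_sense, __ATOMIC_ACQUIRE) == sense)
            ;  // wait for the last hart to flip the sense
    }
}

This needs only one shared update per arriving hart and one spin variable per round, and no fixed up/down interleaving is required.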
// Check that CLINT core count is equal to `NumIrqHarts`
if (clint_reg_pkg::NumCores != NumIrqHarts)
    $fatal(1, "CLIC core count (%d) does not match `NumIrqHarts` (%d)",
Suggested change:
-    $fatal(1, "CLIC core count (%d) does not match `NumIrqHarts` (%d)",
+    $fatal(1, "CLINT core count (%d) does not match `NumIrqHarts` (%d)",
Contributions:
- sw/deps
- utils/litmus