-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add synchronization for multicore #782
Conversation
Build for testing: |
Thanks @soyersoyer. |
Build for testing: |
Build for testing: |
Barriers are not needed because they are already in the Acquire () / Release () of the spinlock of the CDexedAdapter getSamples () function. Volatile variables are written and read with STR/LDR ARM instructions, which are atomic, and it would be enough if there were no other m_nFramesToProcess variable, for which there is no guarantee that its new value would be available to the other cores sooner than the new value of m_CoreStatus. If I understand correctly? If volatiles remain, I think that another solution could be to omit the nFrames variable in ProcessSound (use m_nFramesToProcess instead)? The other thing I don't know is whether Core1 can wait indefinitely if m_CoreStatus[nCore] != CoreStatusIdle is read, then it starts waiting for the interrupt, but between the read and the wait for the interrupt, m_CoreStatus changes (by Core2) and it has already received its Interrupt. Or, how can this be ensured so that it doesn't happen. I haven't been able to measure yet whether it really consumes less this way and whether it's really worth it, I'll let you know when the meter arrives. |
I wasn't aware of any issues in this area - do you have something specific you were attempting to fix? I seem to recall that m_CoreStatus is only ever written by one core and read by another, so just waiting on it ought to be fine. In theory it might be possible for the compiler to attempt to optimise out a read if it isn't obvious it is being changed, or for there to be some kind of cache coherency issue between cores. But it is already declared volatile for exactly this reason, and most multi-core CPUs are fully aware of the problems of cache coherency, so I'm not aware of any issues there either. As you say, areas where synchronisation could be a problem are already protected. As a general point, I'm getting slightly nervous that there seem to be a lot of changes happening here and there with no specific issue to solve and as the original developers have since moved on from the project I can't help feeling that we are introducing additional risk as a consequence. I know code bases ought to evolve, but there is also danger of negating the significant testing that has already happened with the codebase to date if we introduce too many other changes without a significant reason. Kevin |
Having a core continuously read a variable is not the most sophisticated solution. It generates unnecessary heat, can drain a battery faster if the Pi is running on it, and the DAC output can be noisier due to the extra heat. It would be nice to find a stable solution so that it can go to suspend while it waits. And then I saw that volatile is used for synchronization, which isn't exactly what it's for, and I thought I'd get around it better. This is a draft, by the way. m_CoreStatus is good anyway because of its volatile type, because ARM guarantees that its value will be the same on all cores at the same time. But since processors can rearrange memory writes, there is no guarantee (with volatile) that TG3 or TG6 will not use the previous value of m_nFramesToProcess. (Or at least I don't see it) I also tried running TG3 and TG6 with smaller chunk sizes while the others worked with the default, but I couldn't hear the difference. So in real life, I don't think it will cause any audible problems with volatile either. Although if there is a performance or stress test that can make a 3b+ sweat, I would be happy to try it. |
Maybe. And this might be something that is more an issue with a Pi 5. Certainly for Pi 1,2,3 I think the processing time was pretty maxed out processing frames anyway, so there probably wouldn't be much waiting time for suspending. With a Pi 4 and 16 TGs that might also be the case, but I don't believe we have any metrics to know if there is any idle time in the system. The latest performance with a Pi 5 could be more significant, but again, I don't know. Maybe we need some measures of idle times on the different Pis to see if there is anything better than essentially just "spinning".
Yes, I don't know. I can only imagine that is assumed because all cores are kicked off processing frames at the same time, so m_CoreStatus ought to be idle when the number of frames is set? And it is probably quite likely that the number of frames might always be the same - I haven't looked into the sound driver to see how many requested frames come back from GetQueueFramesAvail(). These are questions for Rene really, which is why I'm not a fan of changing these things unless we know there is a problem that needs fixing. Kevin |
There is currently no memory synchronization between the processor cores, which could cause problems in theory. I added memory barriers and atomic access to the cores' status.
I also suspend the cores when waiting. I thought this would make the PI a little cooler, but I haven't been able to measure it. Can someone measure it with a USB power meter?
Maybe not all of these changes are necessary. I'll get to know the barriers better.