-
Notifications
You must be signed in to change notification settings - Fork 699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement atomics wait/notify with C++20 runtime support #2268
Conversation
Edit: all tests passing now Looks like I have to fix some asan build settings. Will fix. Please feel free to start code reviews while I fix this though --- these fixes are should only require minor changes with build scripts. |
Ping: reminder to have a look when you have a chance :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a very reasonable approach!
lgtm, although didn't have time to digest the changes to the test runner yet. Everything else lgtm % comments
I have fixed the comments. I've also tested that this works with the wasi-thread implementation |
I guess I have some higher-level questions... feel free to tell me if this is overkill. I should admit I'm basically terrified of threads and the complexities they introduce.
|
A mix of the above. So i did have a look at that link in detail prior to the implementation. On the first parts on things like "how many spinlock iterations" before calling sleep, these are things that should happen at the wasm's libc level, and so would be transparent to us. However, in case the value change doesn't happen during the duration of the spinlock of Wasm libc, then Wasm libc has to fall back to the Wasm platform wait/notify primitives. Re the bits about avoid datastructures. Unfortunately this is not applicable to Wasm. libc++ can do this because they have to "implement a wait/notify until an underlying value has changed, AND they allow spurious wakeups". The OS APIs like futex are well setup for this. Wasm on the other hand wants "a wait/notify that waits UNTIL NOTIFIED AND with no spurious wake ups". This is a large difference that causes issues. I'll walk this more below by contrasting how this looks like in libstdc++ wait/notify vs. Wasm's wait/notify. Consider an example where want a Wasm implementation that avoids data strucutres. We want to use platform primitives like futex (or in our case, the host C++ runtime's atomic wait/notify support) to make this happen. However platform primitives has spurious wakeups. When this spurious wakeups happen, we need a test to see if this wake up is spurious. In libstdc++'s wait/notify, the test is simple --- check if the value has changed. Wasm does not have an easy test, as it is defined as "wake up only when you get a notify" --- there is no guarantee that checking that the underlying value has changed is the thing the program was waiting for. The only way to have a valid test, is to setup a separate data structure that bridges these differences. (This was the crux of my earlier comment about why I think this design was a mistake in the Wasm standards) Also, another interesting data point here is when the platform primitives do not match what the exactly want, even libstdc++ has to resort to extra data structures. This is discussed under "How to handle those types that do not fit in a __platform_wait_t". In their case, the limitations of the platform APIs are to do with size of atomics (which is different from what we consider as limitations), but it nevertheless demonstrates why separate data structures are needed when the platform primitives do not exactly match what the spec demands.
Oh this is neat. I wasn't aware of try_acquire_until. I checked how this is implemented in libstdc++ https://developers.redhat.com/articles/2023/04/18/implementing-c20-semaphores#semaphores_in_c__ and it seems like they are relying on atomics where possible, which means this won't just spinlock and wreck system performance. I think something like this could allow me to kill the timer thread in this implementation. It won't eliminate extra datastructures though. I'll investigate this more and update this thread/PR appropriately.
We may need to do something like that to be in compliance with the Wasm spec. The upside is that this is easily possible with the relaxed parameter to the existing compiler primitive atomics we rely on https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html However, I would like to separate this bit into a new PR, as the current PR is mostly focused on the runtime. I expect at least 2 more PRs in this space before we get to full spec compliance --- one PR to add some spec required checks on the shared memory bit (shared memory has to have a max size etc.), and one PR to address this point of regular non-atomic load/stores to shared memory.
On the fundamental philosophy of do we want to maintain tests --- At a high level, I do want to caution that I don't think we want to be in situation where we say the only tests we will execute are the upstream tests. I think is unrealistic for any production level wasm engine. Wamr for instance certainly has additional thread-wait-notify tests in their repo https://github.com/bytecodealliance/wasm-micro-runtime/tree/main/core/iwasm/libraries/lib-wasi-threads/test as does Wasmtime https://github.com/bytecodealliance/wasmtime/blob/72b87183ffc57d834737e9ad6d4f1c1967e559a9/tests/all/wait_notify.rs On the point about taking ownership of syntax define the parallel tests, executing the syntax in the test runner --- while I would definitely agree if this test runner change was sufficiently complicated, it turns out that it's, simply put, not that hard. I haven't counted the linediff for just this part, but this PR basically shows that this is probably less than 50 lines of code, changes that are pretty easy to grok or build on. Summarizing, the basic change is "invocation of each test-function sequentially" --> "invoke each test-function using THREAD_CREATE" and finally adding a THREAD_JOIN at the end, and then adding one annotation to wasm2c custom test format under the test/*.txt path (apart from some minor things like make test counts atomic). Ultimately, I don't think there is a lot here. But if you really feel strongly about this part, I am ok ripping out the extra tests
I don't have any tests in mind that need more than what this PR brings.
Unfortunately, I do not have cycles for this at this time. Additionally, with the possible changes in the underlying thread proposal being discussed, I am wary of entering this space of the standards discussions, when the momentum and folks' attention is plainly in other parts of the thread proposal (resolving thread_create). Maybe at some point in the future. Ultimately, though, I think this is an orthogonal question --- we can still land these tests in this PR and separately pursue getting these tests included upstream in parallel or after. |
Update: I had a chat with @conrad-watt and it looks like I can add to the above. Apparently concurrent spec tests are much further along than I expected (and it exists, albeit on a non main branch of th spec test repo). It is expected to make it to the main branch by mid October. Given this, and @keithw's concerns above, I am happy to rip out the concurrent testing flag/syntax we have for now. We will likely need something very similar in the future (but with some changes) to accommodate the upstream syntax to support the concurrent spec tests. @conrad-watt also mentioned I could share any missing tests directly with him given that this is getting updated as we speak. On the bit about non-atomic accesses, I have confirmed that we can go ahead and implement it as relaxed memory order and this is implementable in a way that won't introduce UB or performance issues in single or multi threaded wasm2c output for gcc/clang. I am still working on figuring out the path for msvc --- on msvc we can achieve spec compatibility, but maybe not optimal performance as I haven't yet figured out how to get memory_order_relaxed via compiler intrinsics, meaning we may have to employ a stronger memory order sacrificing performance. I will investigate further to see how to avoid this. Finally, one idea prompted by discussion with @conrad-watt does perhaps reopen the possibility of relying on c11 atomic accesses instead of compiler intrinsics which would be much nicer. I will investigate if this path can be taken while avoiding UB in wasm2c, in parallel to these changes. Proposed path forward
|
That plan sounds pretty good to me! |
Will create new PRs with proposed changes |
Implementing await/notify using runtime support with some C++20. This PR contains the following changes.
BUILD_WASM2C_THREAD_WAIT_NOTIFY
which will build wabt with C++ 20 and include the wait/notify runtime support in C++20Note -- build defaults to including dummy wait/notify implementation so that consumers don't need C++ 20 yet. Test suite defaults to using C++20 so we can continue to test the wait/notify runtime support.