wasm2c supporting threads and other features #1766

Open
rhobro opened this issue Nov 21, 2021 · 14 comments
Comments

@rhobro

rhobro commented Nov 21, 2021

Hi there,

I was trying to convert a WASM file to C when wasm2c exited, saying that the WASM file used threads.

Would it be possible to enable this feature in wasm2c?

Thanks

@kripken
Member

kripken commented Nov 21, 2021

It is possible in principle, but would take some work:

  1. Implement support for lowering wasm atomics into C atomics in wasm2c.
  2. Implement runtime support for the pthreads API.
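A minimal sketch of what step 1 might look like for a single instruction, assuming C11 `<stdatomic.h>` is available. The names `demo_memory` and `demo_i32_atomic_rmw_add` are hypothetical and are not part of wasm2c. The pointer cast assumes that `_Atomic uint32_t` has the same size and representation as `uint32_t`, which holds on mainstream compilers but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a linear memory; not the real wasm2c type. */
typedef struct {
  uint8_t *data;  /* backing storage, assumed to be at least 4-byte aligned */
  uint32_t size;  /* size in bytes */
} demo_memory;

/* Sketch of i32.atomic.rmw.add: atomically add `value` to the 32-bit cell at
 * `addr` and return the previous contents, with sequentially consistent
 * ordering. */
static uint32_t demo_i32_atomic_rmw_add(demo_memory *mem, uint32_t addr,
                                        uint32_t value) {
  /* A real translation would trap here; the sketch just asserts. */
  assert((uint64_t)addr + 4 <= mem->size); /* bounds check */
  assert(addr % 4 == 0);                   /* wasm atomics trap if misaligned */

  /* Assumption: reinterpreting linear memory as an _Atomic cell is acceptable
   * on the target compiler. */
  _Atomic uint32_t *cell = (_Atomic uint32_t *)(mem->data + addr);
  return atomic_fetch_add(cell, value); /* seq_cst by default; wraps mod 2^32 */
}
```

The other atomic read-modify-write, load, store, and compare-exchange instructions could follow the same pattern with the corresponding `atomic_*` functions. The wait and notify instructions have no direct C11 equivalent and would presumably belong to the runtime in step 2.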

@rhobro
Author

rhobro commented Nov 22, 2021

As long as it is possible haha. @kripken

Out of interest, what are atomics? Are they the same as the concurrency-safe variables I've come across in languages such as Java and Go?

Secondly, is pthreads part of Emscripten? When looking for existing issues on the matter, I think I came across an issue mentioning pthreads. Aha, I found it: #1645

@kripken
Member

kripken commented Nov 22, 2021

> Out of interest, what are atomics? Are they the same as the concurrency-safe variables I've come across in languages such as Java and Go?

They are lower-level than most languages. Basically they are used to implement the language-level features.

https://github.com/WebAssembly/threads/blob/main/proposals/threads/Overview.md#atomic-memory-accesses
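To illustrate how atomics serve as building blocks for language-level features, here is a minimal sketch of a spinlock built on one atomic boolean using C11 `<stdatomic.h>`. The `demo_spinlock` type and its functions are hypothetical names chosen for illustration. Production mutexes usually also put waiting threads to sleep instead of spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type built from a single atomic flag. */
typedef struct {
  atomic_bool locked;
} demo_spinlock;

static void demo_spinlock_init(demo_spinlock *l) {
  atomic_init(&l->locked, false);
}

/* Try to take the lock once; returns true on success. */
static bool demo_spinlock_try_lock(demo_spinlock *l) {
  bool expected = false;
  /* Atomically: if locked == false, set it to true. Acquire ordering makes
   * the previous owner's writes visible to the new owner. */
  return atomic_compare_exchange_strong_explicit(
      &l->locked, &expected, true,
      memory_order_acquire, memory_order_relaxed);
}

/* Busy-wait until the lock is acquired. */
static void demo_spinlock_lock(demo_spinlock *l) {
  while (!demo_spinlock_try_lock(l)) {
    /* spin */
  }
}

static void demo_spinlock_unlock(demo_spinlock *l) {
  /* Release ordering publishes this thread's writes to the next owner. */
  bool was_locked =
      atomic_exchange_explicit(&l->locked, false, memory_order_release);
  assert(was_locked); /* unlocking a lock that is not held is a bug */
  (void)was_locked;   /* avoid an unused-variable warning under NDEBUG */
}
```

A synchronized block in Java or a `sync.Mutex` in Go rests on primitives of this kind, with more machinery on top.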

> Secondly, is pthreads part of Emscripten?

Yes, Emscripten has stable support for pthreads:

https://emscripten.org/docs/porting/pthreads.html

That uses a bunch of JS for things not supported in wasm yet, like creating web workers and handling the inability to block on the main thread. It works on the web and in node, but atm not anywhere else AFAIK.

@lars-t-hansen

> It is possible in principle, ...

Wasm defines semantics for data races, mixed atomic/non-atomic accesses, and partly-aliased accesses, where C has only UB. So there's a fairly big caveat here: for an efficient and idiomatic translation, the wasm has to be well-behaved according to C shared-memory semantics to avoid UB.
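A rough sketch of the trade-off, with hypothetical helper names that are not taken from wasm2c. The first function is the efficient, idiomatic form of a non-atomic `i32.load`; in C it is undefined behavior if another thread writes the same bytes concurrently without synchronization. The second is a more conservative form that uses a relaxed atomic load, assuming the same `_Atomic` cast caveats as any such reinterpretation of linear memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Efficient form: a plain load. Racy use of this is UB in C, although the
 * corresponding wasm access has defined (if weak) semantics. */
static uint32_t demo_i32_load_plain(const uint8_t *mem, uint32_t addr) {
  uint32_t v;
  memcpy(&v, mem + addr, sizeof v); /* host byte order; fine on little-endian */
  return v;
}

/* Conservative form: a relaxed atomic load, which is race-free in C.
 * Assumption: the cast to an _Atomic pointer is acceptable on the target. */
static uint32_t demo_i32_load_relaxed(uint8_t *mem, uint32_t addr) {
  assert(addr % 4 == 0); /* this sketch only handles aligned accesses */
  return atomic_load_explicit((_Atomic uint32_t *)(mem + addr),
                              memory_order_relaxed);
}
```

Even the conservative form leaves gaps: non-atomic wasm loads may be unaligned or may partly overlap a concurrent access of a different size, and C atomics cover neither case.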

@rhobro
Author

rhobro commented Nov 23, 2021

As long as it is possible.

So the onus is on the WASM to prevent UB? That makes sense.

But being able to compile the WASM code to C and then compile that to machine code should have a decent performance boost.

@lars-t-hansen

> So the onus is on the WASM to prevent UB? That makes sense.

The onus is on the Wasm producer (source language compiler + binaryen + linker + ...) to produce wasm that is of a form that will allow wasm2c to produce UB-free C code.

@rossberg
Member

@lars-t-hansen, is that more than just a theoretic possibility? The producer would need to know precisely what wasm2c spits out, which it probably cannot even rely on being the same across versions. And then safely steer around the gigantic chasms that are UB in C.

Truth probably is that the idea that you could efficiently compile Wasm through C (or LLVM, for that matter) while correctly maintaining all its semantics is hopeless, at least with threads. C's (and LLVM's) UB semantics is much too weak and infectious.

In practice, the semantic discrepancy may not matter, though.

@rhobro
Author

rhobro commented Nov 23, 2021

If the code was compiled to WASM from C, then would it still produce errors when decompiling to C?

@lars-t-hansen

@rossberg

> @lars-t-hansen, is that more than just a theoretic possibility?

No, I don't think so.

> The producer would need to know precisely what wasm2c spits out, which it probably cannot even rely on being the same across versions. And then safely steer around the gigantic chasms that are UB in C.
>
> Truth probably is that the idea that you could efficiently compile Wasm through C (or LLVM, for that matter) while correctly maintaining all its semantics is hopeless, at least with threads. C's (and LLVM's) UB semantics is much too weak and infectious.
>
> In practice, the semantic discrepancy may not matter, though.

The main problem, I expect, is that the C compiler may observe that something is UB and start rewriting the code based on that observation. How big a problem this might be is hard to quantify. It's probably at most a small problem, but one could imagine static memory addresses in wasm turning into sufficiently predictable pointer values in C, and if the wasm code does something interesting via those pointers I could believe UB might be observable.

@rhobro, even if the source language is C, it's possible that the compiler and tools introduce optimized code that, if translated back to C, violates C's rules on undefined behavior. For example, a reasonable C compiler would assume that the integer representation is two's complement and optimize based on that, but that assumption may be invalid in C.
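Integer addition is one concrete case. Wasm's `i32.add` is defined to wrap modulo 2^32, whereas signed overflow in C is undefined behavior, so a translation back to C has to avoid naive signed arithmetic. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Guard the assumption that uint32_t operands are not promoted to a wider
 * signed int before the addition. */
static_assert(sizeof(int) <= sizeof(uint32_t),
              "uint32_t must not promote to signed int");

/* Safe form: unsigned arithmetic is defined to wrap in C, which matches the
 * wasm semantics of i32.add. */
static uint32_t demo_i32_add(uint32_t a, uint32_t b) {
  return (uint32_t)(a + b);
}

/* Naive form, shown for contrast only: UB in C whenever the mathematical sum
 * does not fit in int32_t, so an optimizer may assume that never happens. */
static int32_t demo_add_signed(int32_t a, int32_t b) {
  return a + b;
}
```

With the unsigned form, `demo_i32_add(0x7FFFFFFFu, 1u)` yields `0x80000000u` as wasm requires; calling the signed form with `INT32_MAX` and `1` has no defined result in C.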

@rhobro
Author

rhobro commented Nov 28, 2021

So how could this be implemented?

@kripken
Member

kripken commented Nov 28, 2021

@rhobro

> So how could this be implemented?

I think that was answered here? #1766 (comment) Those two tasks are basically the steps (which mirror most other features for wasm2c, although here the second task, the runtime, would be larger).

It is very true that there is some risk of C UB that needs to be considered carefully during that process, as has been mentioned since. But it may be easier to estimate how big an issue that is after work starts, with concrete code.

@rhobro
Author

rhobro commented Dec 6, 2021

Thanks. When can this be implemented?

@kripken
Member

kripken commented Dec 7, 2021

@rhobro

There is no way to really estimate that until someone volunteers to do the work. Usually how these things go is that someone interested enough in the feature, like yourself, will decide to implement it.

Once someone starts on this, I'd expect it would take around a week to get something basic working, in addition to any time to ramp up on the codebase. Implementing all the various instructions and corner cases would take somewhat longer, maybe a few more weeks. Aside from that, there are the open questions about undefined behavior as mentioned before - we might only see how serious that issue is once enough is implemented.

@rhobro
Author

rhobro commented Dec 7, 2021

I would implement it if I could 😂 but I don't understand the workings under the hood well enough.

Ok. Can someone tag the issue as an enhancement or something?
