-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new proxying modes foo__proxy: 'abort' and foo__proxy: 'abort_debug' #22648
base: main
Are you sure you want to change the base?
Conversation
…ug', which will add compile time checks to verify that given JS function is not called in pthreads or Wasm Workers.
How to fix the lint errors?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable if there is use case for it. Are there existing system library function that you had in mind when designing this attribute?
src/jsifier.mjs
Outdated
const insideWorker = (PTHREADS && WASM_WORKERS) | ||
? '(ENVIRONMENT_IS_PTHREAD || ENVIRONMENT_IS_WASM_WORKER)' | ||
: (PTHREADS ? 'ENVIRONMENT_IS_PTHREAD' : 'ENVIRONMENT_IS_WASM_WORKER'); | ||
prefix = `assert(!${insideWorker}, "Attempted to call function '${mangled}' inside a pthread/Wasm Worker, but this function has been declared to only be callable from the main browser thread");`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "but this function can only be called from the main thread"?
Using the term main browser thread
I a little misleading because emscripten application can be run on workers where there is no main browser thread involved. We sometimes call this "main application thread", but I think just "main thread" here is clear enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proxying modes __proxy: 'sync'
and __proxy: 'async'
definitely mean to run the JS code in the main browser thread, and not in the thread that executes 'main'.
I.e. in PROXY_TO_PTHREAD builds, __proxy: 'sync'
and __proxy: 'async'
run the proxied JS code in the main browser thread, which is not the same thread as the main application thread (the main runtime thread)
The earlier PROXY_TO_WORKER build mode (which I presume is what you mean by "can be run on workers where there is no main browser thread involved") predates pthreads/SharedArrayBuffer, and does not interact with SAB, or the proxying architecture. So these proxying modes are not relevant there when pthreads are not used.
So I think saying "main browser thread" here is the correct thing to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "but this function can only be called from the main thread"?
Changed the message into form
"Attempted to call function '${mangled}' inside a pthread/Wasm Worker, but by using the proxying directive "'${proxyingMode}'", this function has been declared to only be callable from the main browser thread"
I would like to keep the mention about the proxy declaration that prevents the call. Otherwise a developer might be surprised to mistakenly ponder "how does it know this function is not callable from a worker?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proxying modes
__proxy: 'sync'
and__proxy: 'async'
definitely mean to run the JS code in the main browser thread, and not in the thread that executes 'main'.
The thread on which the emscripten-generated code is first run I call the main application thread
, and it might be running in worker, in which case there is no main browser thread at all in the picture. When you write MAIN_THREAD_EM_ASM
or __proxy: 'sync'
its is the main application thread that runs you code. It just happens that this is also normally the main browser thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, has there been some kind of change in configuration that I am not aware of?
My expectation:
Building with -sPROXY_TO_WORKER
: main()
is run in a Worker thread. __proxy
architecture is not available there.
Building with -pthread
: main()
is run in main browser thread. main browser thread == main application thread, __proxy
targets main browser thread.
Building with -pthread -sPROXY_TO_PTHREAD
: main()
is run in a separate Worker thread, i.e. main application thread is not main browser thread. __proxy: 'sync'
and MAIN_THREAD_EM_ASM()
proxy to the main browser thread, and not to the main application thread.
In each case, proxying does target the main browser thread. Has this changed somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have several users that do this.
Is this case a scenario where the user manually spawns a new Worker(), and inside that Worker, they load up the Emscripten-compiled pthread-enabled program?
Such use case was never supported originally, because of the problem that in such a manual "I loaded the app into a Worker" mode, Emscripten would still need to be able to load some stuff (a lot of stuff) into the JS context of the main application thread, in order for any kind of general proxying to work. And Emscripten runtime JS libraries were not developed with the impression that the target thread that proxying occurs to, would not be the main browser thread.
For example, if I start searching for __proxy: 'sync'
in /src/, there are functions that assume that they are always the main browser thread:
emscripten_set_window_title(): document.title only exists in main browser thread
emscripten_get_screen_size(): screen.width / screen.height exist only in main browser thread
emscripten_hide_mouse(): document.styleSheets only in main browser thread
emscripten_set_canvas_size: Browser.setCanvasSize() works only in main browser thread
emscripten_get_canvas_size: Module['canvas'] only works in main browser thread
emscripten_create_worker: requires Subworkers support in order to work (https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers#spawning_subworkers)
library_html5.js: everything here that has __proxy: 'sync' expects that proxy target lands to the main browser thread
library_egl.js, library_glut.js: everything with __proxy: 'sync' expects that proxy target is the main browser thread
library_openal.js, library_html5_webgl.js, library_sdl.js: likewise
that is why I added the whole PROXY_TO_PTHREAD mode so that users would not need to figure out how to manually launch Emscripten content main() in a Worker - they could reuse the system-provided mechanism for that.
Although now that I think about it, iirc subworkers are supported widely, so that is no longer a problem (in 2015 Chrome did not support subworkers)
So maybe those users that do go and launch their pthread builds in a Worker are careful to not use any of the above APIs, and just instead do generic computation there? I.e. those people are not using Canvases or library_browser.js or library_html5.js or any other DOM accessing things that Emscripten provides?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I believe it was actually you who added those two functions back in 174a082)
Yeah, I added those, but that is not for supporting the "user manually launches a pthread-enabled build inside their manually created new Worker()".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case a scenario where the user manually spawns a new Worker(), and inside that Worker, they load up the Emscripten-compiled pthread-enabled program?
Such use case was never supported originally, because of the problem that in such a manual "I loaded the app into a Worker" mode, Emscripten would still need to be able to load some stuff (a lot of stuff) into the JS context of the main application thread, in order for any kind of general proxying to work. And Emscripten runtime JS libraries were not developed with the impression that the target thread that proxying occurs to, would not be the main browser thread.
Yes exactly. It turns out that this configuration is supported, at least for some programs. Obviously such programs cannot use DOM or anything like that, but I believe we do at least some testing of this configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I added those, but that is not for supporting the "user manually launches a pthread-enabled build inside their manually created new Worker()".
Can you explain what you intended the difference between those two functions to be then? I seems that unless we support that configuration then then the main runtime thread and the main browser thread would always be the same? Unless I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very originally those functions intended to be a generic set to enable common handling of each of single-threaded, -pthread, and PROXY_TO_PTHREAD and PROXY_TO_WORKER build modes, so that users would be able to do multiple variants of builds from the same codebase and we'd be able to write system library code that would be aware of each of those modes.
So, for example, in a PROXY_TO_PTHREAD build, it would be expected to happen that emscripten_is_main_runtime_thread()
== emscripten_is_main_browser_thread()
. But then if one would make a PROXY_TO_WORKER
build of the same, then those functions would change to return the different respective things. So that way one could have one codebase that was ready to mutate to any of those build modes.
But I did not originally intend for pthreads enabled builds to ever be loaded while inside a Worker(), since the __proxy: 'sync' semantics would proxy to the hosting Worker instead of going to the main browser thread, and we have never had any facilities to do proxying to the main browser thread in such a mode.
Although now that I think about it, yeah, it is probably the case that if user avoids all those library_x.js files that I listed above (that suffer from this problem), things will probably be pretty safe for them, even if they manually load pthread-enabled builds in their own Workers. And one can build "reasonable" use cases in the absence of relying on those library_x.js files.
I scanned library_browser.js and library_wasm_worker.js to see if there might be functions like this, but none stood out. The motivation for adding this is that at Unity I am seeing more and more feature developers add their own JS functions, and they generally copy-paste the __proxy: 'sync' directives when they are unsure about what that means. Many of these JS library functions should never use proxying, but we only ever call those JS functions in the main browser thread - if any pthread might call any of those JS functions, there would be trouble. So that is I why I wanted to make the proxying mechanism check this for me: by documenting each function that should be strictly main thread only, I can run the test suites to confirm that it indeeds happens to be the case, and that we are not getting any surprising unwanted proxying taking effect. |
It looks like there is this wholesale proxy: 'sync' directive that is happening with the filesystem, so changed https://github.com/emscripten-core/emscripten/pull/22648/files#diff-5de5cdf403acefc1dc87b4446a186049adc104ceca678d993f3d5a007bcafcc5R273 to make tests pass. |
@@ -270,6 +270,7 @@ var WasiLibrary = { | |||
#else | |||
fd_write__deps: ['$printChar'], | |||
#endif | |||
fd_write__proxy: 'none', // Each Worker can do stdout/stderr printing on its own, without needing to proxy to the main thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure doWritev
won't work if you call it on a thread. And I think its not true for printChar
either since you would end up with a different printCharBuffers
on each thread.. which is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would end up with a different
printCharBuffers
on each thread.. which is not correct.
I looked into this some decade ago, and from back then, I could not find defined semantics on how multiple threads writing on stdout/stderr should synchronize, when fflush is not called by the user. In most implementations, like ours, printing \n
is an implicit fflush.
If multiple threads race to print to stdout/stderr, then it does seem to me it would be better semantics for each thread to construct full \n buffered lines that they would then race to print, rather than each thread mixing their individual character writes that could result in the data from multiple threads inter-mixing with each other? I could not find info to say such behavior would not be compliant.
Each Worker has a copy of MEMFS/library_tty.js in them, so they would each end up calling to default_tty_ops.put_char() and pushing out console.log() calls independently?
Having each thread synchronously block to perform stdout prints would seem like a rough performance overhead to have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the function is not only used for stdout/stderr, its used for all file access.
If we do make this change it would be very large one (basically making the filesystem multi-threaded) and I would at least expect a separate RP for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, you're right of course. I somehow was interpreting this from the perspective that this function was already just for the stdout/stderr writing case. That would not work. I'll think of something here.
Add new proxying modes foo__proxy: 'abort' and foo__proxy: 'abort_debug', which will add compile time checks to verify that given JS function is not called in pthreads or Wasm Workers.