Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory corruption in WASI build #774

Open
Hakkin opened this issue Dec 27, 2024 · 13 comments
Open

Memory corruption in WASI build #774

Hakkin opened this issue Dec 27, 2024 · 13 comments

Comments

@Hakkin
Copy link

Hakkin commented Dec 27, 2024

Hi, I seem to have come across some kind of memory-corruption(?) bug while trying to use the WASI build.
This is the script I was running, it is an obfuscated function extracted from the YouTube player JS.
I added a bit of debug code to the middle of the function here:

    let getHandler = function(target, property, receiver) {
        console.log(typeof property);
        return Reflect.get(target, property, receiver);
    }
    let p = new Proxy(N,{get:getHandler});
    N = p;

I originally was trying to run this using wazero, but got the following output:

$ wazero run --mount .:/:ro qjs-wasi.wasm youtube.js
string
string
(...)
string
symbol
string
string
string
string
string
string
string
string
string
stri¡�
stri¡�
stri¡�
stri¡�
error instantiating wasm binary: module[] function[_start] failed: wasm error: out of bounds memory access
wasm stack trace:
        .JS_ToCStringLen2(i32,i32,i64,i32) i32
        .js_print(i32,i64,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .js_proxy_get(i32,i64,i32,i64) i64
        .JS_GetPropertyInternal2(i32,i64,i32,i64,i32,i32) i64
        .JS_ToPrimitiveFree(i32,i64,i32) i64
        .JS_ToStringInternal(i32,i64,i32) i64
        .JS_ToStringFree(i32,i64) i64
        .string_buffer_concat_value_free(i32,i64) i32
        .js_array_join(i32,i64,i32,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .js_array_toString(i32,i64,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .JS_ToPrimitiveFree(i32,i64,i32) i64
        .JS_ToStringInternal(i32,i64,i32) i64
        .JS_ToStringFree(i32,i64) i64
        .string_buffer_concat_value_free(i32,i64) i32
        .js_array_join(i32,i64,i32,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .js_array_toString(i32,i64,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .JS_ToPrimitiveFree(i32,i64,i32) i64
        .JS_ToStringInternal(i32,i64,i32) i64
        .JS_ToStringFree(i32,i64) i64
        ... maybe followed by omitted frames

I thought this might be a bug with wazero, so I attempted using wasmer, but got the same results:

$ wasmer run --mapdir /:. qjs-wasi.wasm youtube.js
string
string
(...)
string
symbol
string
string
string
string
string
string
string
string
string
stri¡�
stri¡�
stri¡�
stri¡�
[05:14:49:627 - 0]: wasm_runtime_malloc failed: memory hasn't been initialize.

[05:14:49:628 - 0]: Init vector failed: alloc memory failed.

The couple stri¡� before it crashes leads me to believe the script is causing some kind of memory corruption in the WASM runtime.

Running this script using the native version (seemingly?) works correctly, though I can't guarantee there isn't some kind of subtle memory corruption going on:

$ qjs youtube.js
string
string
(...)
string
string
B08_QXeAL9aa3Q
@Hakkin
Copy link
Author

Hakkin commented Dec 27, 2024

Hi, I was able to reduce the crash to the following:

let N = [1,2,3];
N[3] = N;
N % 1; 

It seems to be some issue with arrays that have circular references.

$ wazero run --mount . qjs-wasi.wasm repro.js
error instantiating wasm binary: module[] function[_start] failed: wasm error: out of bounds memory access
wasm stack trace:
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .JS_ToPrimitiveFree(i32,i64,i32) i64
        .JS_ToStringInternal(i32,i64,i32) i64
        .JS_ToStringFree(i32,i64) i64
        .string_buffer_concat_value_free(i32,i64) i32
        .js_array_join(i32,i64,i32,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .js_array_toString(i32,i64,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .JS_ToPrimitiveFree(i32,i64,i32) i64
        .JS_ToStringInternal(i32,i64,i32) i64
        .JS_ToStringFree(i32,i64) i64
        .string_buffer_concat_value_free(i32,i64) i32
        .js_array_join(i32,i64,i32,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .js_array_toString(i32,i64,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .JS_ToPrimitiveFree(i32,i64,i32) i64
        .JS_ToStringInternal(i32,i64,i32) i64
        .JS_ToStringFree(i32,i64) i64
        .string_buffer_concat_value_free(i32,i64) i32
        .js_array_join(i32,i64,i32,i32,i32) i64
        .js_call_c_function(i32,i64,i64,i32,i32,i32) i64
        .JS_CallInternal(i32,i64,i64,i64,i32,i32,i32) i64
        .js_array_toString(i32,i64,i32,i32) i64
        ... maybe followed by omitted frames

Note that N % 1; isn't the only problematic expression,
N.toString(); will also result in a crash.
N + 1; will result in an infinite loop.

I'm not sure if this is the sole issue here, since I'm not sure if this explains the memory corruption portion of this.

@Hakkin
Copy link
Author

Hakkin commented Dec 27, 2024

The root of the crash seems to stem from the built-in implementation of Array.prototype.toString, it doesn't seem to detect circular references so it overflows the stack with infinite recursion. In the native build, this throws a RangeError inside the JS environment once it hits the stack-size limit, but it seems the WASI build doesn't respect this and just overflows the stack and crashes the runtime.

Adding Array.prototype.toString = Object.prototype.toString to the top of both example scripts fixes the crash.

@saghul
Copy link
Contributor

saghul commented Dec 27, 2024

Good find!

Yes, stack overflow detection is disabled in WASI, I wonder if there is a way to make it work...

@Hakkin
Copy link
Author

Hakkin commented Dec 28, 2024

It seems like there are 2 bugs here I guess, the stack overflow detection in the WASI build and also the Array.prototype.toString implementation that affects all builds. Should I split that into a separate issue?

@saghul
Copy link
Contributor

saghul commented Dec 28, 2024

Doesn't the stack overflow checks work on other platdorms and thus errored properly?

@Hakkin
Copy link
Author

Hakkin commented Dec 28, 2024

Yes, but I'd argue Array.prototype.toString (technically it's actually Array.prototype.join that's the issue) triggering infinite recursion in the first place is a bug in its own right...Spidermonkey and V8 both detect these kinds of cyclic references and skip them.

@saghul
Copy link
Contributor

saghul commented Dec 28, 2024

Gotcha, I'll take a look!

@bnoordhuis
Copy link
Contributor

triggering infinite recursion in the first place is a bug in its own right...Spidermonkey and V8 both detect these kinds of cyclic references and skip them.

https://tc39.es/ecma262/multipage/indexed-collections.html#sec-array.prototype.join - the algorithm is explicitly recursive (step 7c.i) and throwing a stack overflow exception is allowed.

What V8 and SM do is maybe a quality-of-implementation enhancement but note how your [1,2,3,N] example gets stringified to [1,2,3,1], which is kind of nonsensical. I feel throwing may be better than quietly producing wrong results, less hard to debug.


W.r.t. the lack of stack checks in wasm builds: we could switch to explicit recursion checks (call_depth field in JSRuntime that's incremented and decremented at key points, like on entry and exit in JS_CallInternal) but it might be slower than what we have now.

@Hakkin
Copy link
Author

Hakkin commented Dec 28, 2024

What V8 and SM do is maybe a quality-of-implementation enhancement but note how your [1,2,3,N] example gets stringified to [1,2,3,1], which is kind of nonsensical. I feel throwing may be better than quietly producing wrong results, less hard to debug.

I'm not sure how you're getting that result.

let N = [1,2,3];
N[3] = [N,5,6];
N.toString();

results in 1,2,3,,5,6 in both V8 and Spidermonkey, ,, being valid syntax for an empty array slot (since it simply skips the object if it's found to be cyclic).

@bnoordhuis
Copy link
Contributor

I suppose we could do that - replace cycles with a hole - but the thing I don't like is that arrays are practically never cyclic so now the really uncommon case impacts the performance of the really common case. We'd need to track cycles in a side table but that isn't free; it costs CPU and memory.

If you could persuade tc39 to add a test to that effect to test262, we'd be obliged to support it, but otherwise I'm inclined to leave well enough alone.

@Hakkin
Copy link
Author

Hakkin commented Dec 31, 2024

Looking through the source, it seems like there are a fews places that circular references are handled already:
JSON.stringify handles it by storing a stack of parent objects in a context and scanning through it for each new non-primative:
https://github.com/quickjs-ng/quickjs/blob/74fd4d7dc90f7ebf07e0bb4c7223d1cb98f8ad78/quickjs.c#L45385-L45391

The JS_WriteObjectRec function (used by the bjson module) handles it by checking and marking the tmp_mark bit on each entered JSObject so a context/stack isn't required:
https://github.com/quickjs-ng/quickjs/blob/74fd4d7dc90f7ebf07e0bb4c7223d1cb98f8ad78/quickjs.c#L34411-L34415

If the tmp_mark bit could be reused, cycle detection could be done for pretty much free. Checking and setting a single bit should be negligible performance wise. The stack method used by JSON.stringify is less performant, but it also shouldn't be that bad unless you have deeply nested objects. There are obviously other methods that can be used as well.

@chqrlie
Copy link
Collaborator

chqrlie commented Dec 31, 2024

Messing with the tmp_mark bit is risky: it would be a problem if gc is triggered during the operation, which is not the case for JS_WriteObjectRec(), but could happen for Array.prototype.toString() because memory is allocated and javascript code is invoked for the conversions. Furthermore, this method would make Array.prototype.toString() non-reentrant, which may also be a problem. Here is an alternative approach: keep track of the maximum recursion level and if a given threshold is reached, restart the operation with an alternative implementation that uses a similar loop detection scheme as JSON.stringify.

Can you quote the text from the ECMA Standard that specifies the required behavior?

@Hakkin
Copy link
Author

Hakkin commented Dec 31, 2024

Can you quote the text from the ECMA Standard that specifies the required behavior?

It's not required by the standard as far as I can tell, but it is something implemented by most other engines, to the point where I wouldn't be surprised if there's a decent amount of existing code that unintentionally relies on this behavior. I found this crash running real-life code after all, and I was personally a bit surprised that something as simple as stringifying an array can cause a stack overflow. I can understand wanting to stick to the specifications though, changing unspecified behavior for compatibility with other engines is opening a bit of a can of worms. The join issue can technically be fixed outside of the library anyways, so it's not a huge issue if it's not something you guys are interesting in changing.

It would be nice to find a solution to the WASI stack overflow issue though, since that's a little harder to work around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants