-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Maximum call stack size exceeded" on Safari 18.2 #2
Comments
There's one recent WebKit bug that could be related. It specifies iOS but it doesn't actually say that it doesn't happen on desktop. It could be an ARM thing so it would show up on any Apple Silicon. The reporter says that the crash appears with Emscripten optimization levels "O2 O3 Os OZ". Our bug matches that behavior exactly:
My working theory is we're running into the same problem, and its absence from the WebKit Technology Preview may mean it is already being addressed. I'll still take a look at the stack leak possibility, though. |
I tried building with Emscripten 4.0.1 (instead of 3.1.61) on the chance that a difference in code generated by the compiler might sidestep the problem. Unfortunately it didn't. |
I injected this printf into the SQLite C source: SQLITE_API int sqlite3_step(sqlite3_stmt *pStmt){
int rc = SQLITE_OK; /* Result from sqlite3Step() */
Vdbe *v = (Vdbe*)pStmt; /* the prepared statement */
int cnt = 0; /* Counter to prevent infinite loop of reprepares */
sqlite3 *db; /* The database connection */
printf("sqlite3_step %p\n", &rc); That produced this on the Safari console: This shows thousands of calls to |
If this is a Safari bug then I'm wondering why a lot more developers haven't reported it. There are quite a few WebAssembly apps out there now. My speculation is that it is related to Asyncify, which would really narrow the space. I tried running the test page with a synchronous build (i.e. no Asyncify) and OPFSCoopSyncVFS, and that ran successfully so that is a consistent observation. |
Well, we know that Notion uses SQLite. I just tried it in Safari and it looks like it created a SQLite file in OPFS. And based on this article, my guess is they also use an async build. Another pretty mature implementation is Livestore (example project). If we could somehow stress-test those by injecting our code, that would tell us whether they have the same problem. I know author of Livestore, so maybe we could get him on a call and see if he encountered the same problem building the library. |
Notion uses the official SQLite WASM, which only has a synchronous build. I added a comment on that WebKit bug to ask the reporter if they use Asyncify. Update: the response was that they do use Asyncify. |
I put a simplified test online (source), and added a comment to the WebKit bug in hopes that might aid a complete and timely fix. |
I just tried branch https://github.com/team-reflect/reflect/pull/3177 and got the maximum call stack size error even in Google Chrome. This was when I was trying to sync my account with 10k notes. It seems that sync succeeded. Also, app is working normally after the error. Edit: I can reproduce this each time (tried 3 times) with this new build. Going to try multiple times with the old build to see if it's also happening there and we just missed it before. |
My local Reflect app with the -O1 build is working fine for me on both Safari and Chrome. Maybe I don't have enough notes or they don't have the right content to trigger the problem. I wonder also if your app build might somehow be picking up inconsistent pieces of wa-sqlite, e.g. because of build or browser caching? If you have mismatched Emscripten JS and WebAssembly, that can cause all kinds of strange failures. I'm thinking that isn't the problem here because it's usually more catastrophic than what you're getting, but if you can force a full rebuild and reload it might be worth trying. I also created another wa-sqlite branch, rhashimoto/wa-sqlite-01-debug, that has debug symbols (the WebAssembly file is 12MB!). Try reproducing your Chrome stack overflow with that and see if that tells us which function is getting blamed for infinite recursion. I don't know if this will tell us anything meaningful, so not high priority. Just searching for any bit of information that might help from cases we can actually reproduce. |
@rhashimoto I'm gonna try your debug build, and in the meantime, here's an export with 10k notes that should reproduce the issue: After importing, sign out, and sign back in, so that the full sync runs. |
With your |
Thanks, that does in fact reproduce in my environment. The failure of the O1 build in Chrome looks like a real stack overflow. Emscripten uses "far more [local variables]" for unoptimized (and presumably lightly optimized) code. My theory is that in trying to avoid a false stack overflow by reducing the optimization, we ironically caused a true stack overflow. I was hoping to fix this by changing the Next I tried changing The other path is to crank up the stack size. I think that should also work if we don't run into any hard limits. I'll try that tomorrow. |
@rhashimoto I switched to The SearchStore reindex happens with thousands of IDs after the initial sync. We could stop using IN like this:
However there is still an instance where a large IN query is necessary, and that is during the initial sync:
There's no way to stop using the IN query here. do you like the solution where we change the application code to stop using certain features, like IN? Would that be your suggestion for us? |
I suppose we could limit the size of IN queries in both reindexing code and initial sync. the page size of 20 however feels too small. I feel uneasy about crashing at only 100 rows per page. |
I'm thinking that if IN is the operation causing recursion (still a conjecture), we might be able to write an equivalent query with a CTE. So instead of this: SELECT * FROM "notes"
WHERE "graphId" == ?
AND "deletedAt" IS NULL
AND ("subject" IS NOT NULL OR "isBlank" = ?)
AND "id" IN (SELECT "value" FROM JSON_EACH(?))
ORDER BY "id" Replace with something like this (untested even for syntax): WITH "key" ("graphId", "id") AS
(SELECT ?, "value" FROM JSON_EACH(?))
SELECT * FROM "notes" NATURAL JOIN "key"
WHERE "deletedAt" IS NULL
AND ("subject" IS NOT NULL OR "isBlank" = ?)
ORDER BY "id" I think the CTE form might be faster as well. |
@rhashimoto so do you want me to update our query and try that form? It'll take some messing around with Kysely, but I should be able to do that. |
Yes, I think trying a CTE is worth a shot. So far I'm not having much success with cranking up the stack size. I would run both queries while you're developing so you can verify that they return identical results. For a first pass it might be easiest to use the "sql" raw query builder just to see if it works. I think that would be something like: await sql`WITH key (graphId, id) AS
(SELECT ${sql.lit(graphId)}, value FROM ${jsonEach(chunkIds)})
SELECT * FROM notes NATURAL JOIN key
WHERE deletedAt IS NULL
AND (subject IS NOT NULL OR isBlank = 0)
ORDER BY id
`.execute(db) |
@ocavue I'm not gonna have enough time to wrap this up today - so could you please try @rhashimoto's suggestion? And in the meantime @maccman agreed that we restore the dialog warning about Safari. |
Apparently there are two stacks in WebAssembly, the C stack and the VM stack. The "RangeError" in
indicates that it is the VM stack that is overflowing. Unfortunately, that one is built in to the browser and Emscripten build flags don't affect it. Generally the fix is...use an optimizing build to reduce stack usage by eliminating unnecessary locals. 😖 So we need to decrease optimization to avoid the Safari bug but doing so causes certain queries to overflow the Chrome VM stack. This is looking like a dead end to fix this problem. The pathways still open are:
|
If the CTE is hard to express with Kysely, maybe a subquery would be easier: SELECT * FROM notes
NATURAL JOIN (SELECT 'myGraph' AS graphId, value AS id FROM JSON_EACH('[...]'))
WHERE ...
ORDER BY ... |
This is close: const q = this.noteTable.query()
.innerJoin(
sql<{ graphId: string, id: string }>`(SELECT ${sql.val(this.graphId).as('graphId')}, ${sql.ref('value').as('id')} FROM JSON_EACH(${sql.val(JSON.stringify(chunkIds))}))`.as('key'),
join => join
.onRef('notes.graphId', '=', 'key.graphId')
.onRef('notes.id', '=', 'key.id'))
.selectAll()
.where('deletedAt', 'is', null)
.where((eb) => eb.or([eb('subject', 'is not', null), eb('isBlank', '=', 0)]))
.orderBy('key.id')
// .compile() But it produces this: select * from "notes" inner join
(SELECT ? as "graphId", "value" as "id" FROM JSON_EACH(?)) as "key"
on "notes"."graphId" = "key"."graphId" and "notes"."id" = "key"."id"
where "graphId" == ? and "deletedAt" is null and ("subject" is not null or "isBlank" = ?)
order by "id" That has an extra In any case, although this might eventually replace IN, I think this will be a partial solution at best. I see other queries without IN that also crash Chrome. I'm not feeling optimistic about it. |
@maccman @vojto @ocavue
After many queries, Safari 18.2 can hang or produce a "Maximum call stack size exceeded" WebAssembly error from wa-sqlite.
Reproduce with this test page. This fails for me after ~4000-10000 queries (depending if you use dist or debug wa-sqlite) on Safari 18.2 while it has not been observed to fail on Safari Technology Preview, Chrome 132, or Firefox 134 for 1 million queries.
It is possible this is a stack leak similar to this one - this can be investigated as done here.
I think a browser bug is also a strong possibility. Since it doesn't appear in Safari Technology Preview, we can look at recently closed WebKit bugs to see if there is anything interesting there.
The text was updated successfully, but these errors were encountered: