[wpe-2.38][mse] hang on early seek #1367
Comments
We use the attached patch as a workaround with GStreamer 1.18.5: |
I haven't been able to reproduce the issue even after 500 repetitions. I might try to blindly include that patch in our set of buildroot custom patches, but the problem you describe is suspiciously similar to the one reported in https://bugs.webkit.org/show_bug.cgi?id=272975 / WebKit/WebKit#27517 (a problem with a half-configured playsink when a flush happens). |
@emutavchi: Any feedback about applying the solutions from https://bugs.webkit.org/show_bug.cgi?id=272975 and https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6763 ? |
Hi @eocanha, apologies for the delayed response. Unfortunately, I don't have cycles to try the latest GStreamer right now. However, the hang is quite easily reproducible for me with Epiphany Browser on WebKitGTK 2.44.3 + GStreamer 1.20.3 on Ubuntu 22.04.5 LTS. This is the best callstack of the hang I could get with Ubuntu debug syms:
|
Ok. I've been able to reproduce it upstream. I'm debugging it. |
The problem, which happened once in a bunch of test case runs, was as follows:

A single buffer of one of the streams was pushed (the other stream simply didn't exist yet), so that stream was in mid-creation when the seek happened. The seek caused WebKitMediaSrc to emit flush-start, which was forwarded downstream through the half-created stream element chain until it reached an unconnected element. It couldn't be forwarded any further because the rest of the chain didn't exist yet. Then, by timing chance, the chain got completely connected and flush-stop was emitted. As a result, the last elements of the chain received a flush-stop without a previous flush-start.

The flush-stop event was propagated downstream up to a pad (decodebin3-5:video_0) that had sticky events stored (the stream-start event, to be precise). That pad had to push the sticky events first, before processing the flush-stop. The stream-start event was forwarded downstream until it reached video_sink:proxypad142, which calls a blocking probe and keeps the flush-stop event stuck upstream.

After discussing the problem with some colleagues and coming up with some possible workarounds, the final solution was to disable sending sticky events when pushing a FLUSH_STOP. @ntrrgc kindly wrote a patch to implement this solution in GStreamer and submitted it as https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7632, which was approved and should land soon.

I've checked that this patch effectively solves the problem in WebKit upstream. I haven't been able to reproduce it anymore after 334 iterations of the test case (without the patch it would manifest within 10 repetitions). Over the next few days I'm going to port that patch to GStreamer 1.18 and submit it as a buildroot patch. |
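To make the ordering in the comment above easier to follow, here is a minimal toy model in Python. It is not GStreamer code: the `Element` class and the element names (`src`, `parser`, `decoder`, `sink`) are hypothetical, and a missing `downstream` link stands in for the part of the chain that has not been created yet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Toy stand-in for a pipeline element (hypothetical, not a GStreamer type)."""
    name: str
    downstream: Optional["Element"] = None  # None means "not linked yet"
    seen: List[str] = field(default_factory=list)

    def push(self, event: str) -> None:
        # Record the event, then forward it only if a downstream link exists.
        self.seen.append(event)
        if self.downstream is not None:
            self.downstream.push(event)


src, parser, decoder, sink = (Element(n) for n in ("src", "parser", "decoder", "sink"))
src.downstream = parser
decoder.downstream = sink      # the tail exists, but parser -> decoder is missing

src.push("flush-start")        # stops at parser: nothing is linked after it
parser.downstream = decoder    # the chain gets completed "by timing chance"
src.push("flush-stop")         # now travels the whole chain

# Elements that saw a flush-stop without any preceding flush-start
unmatched = [e.name for e in (src, parser, decoder, sink) if e.seen == ["flush-stop"]]
print(unmatched)
```

In this sketch the tail elements (`decoder` and `sink`) end up in the list, which corresponds to the "flush-stop without a previous flush-start" situation described above.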
FLUSH_STOP is meant to clear the flushing state of pads and elements downstream, not to process data. Hence, a FLUSH_STOP should not propagate sticky events. This is also consistent with how flushes are a special case for probes.

Currently this is almost always the case, since a FLUSH_STOP is __usually__ preceded by a FLUSH_START, and events (sticky or not) are discarded while a pad has the FLUSHING flag active (set by FLUSH_START). However, it is currently assumed that a FLUSH_STOP not preceded by a FLUSH_START is correct behavior, and this will occur while autoplugging pipelines are constructed. This leaves us with an unhandled edge case!

This patch explicitly disables sending sticky events when pushing a FLUSH_STOP, instead of relying on the flushing flag of the pad, which will break in the edge case of a FLUSH_STOP not preceded by a FLUSH_START.

If sticky events are propagated in response to a FLUSH_STOP, the flushing thread can end up deadlocked in blocking code of a downstream pad, such as a blocking probe. Instead, those events should be propagated from the streaming thread of the pad when handling a non-flushing synchronized event or buffer.

This fixes a deadlock found in WebKit with playbin3 when seeks occur before preroll, where the seeking thread ended up stuck in the blocking probe of playsink: WebPlatformForEmbedded/WPEWebKit#1367

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7632>
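The rule in the commit message can be illustrated with a small Python sketch. This is a simplified model written for this thread, not the actual GStreamer pad code: `ToyPad`, `run`, and the `skip_sticky_on_flush_stop` switch are hypothetical names, and the model only tracks the order in which items leave the pad.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPad:
    """Simplified pad model (hypothetical; not the GStreamer implementation)."""
    skip_sticky_on_flush_stop: bool
    pending_sticky: List[str] = field(default_factory=list)
    flushing: bool = False
    pushed: List[str] = field(default_factory=list)

    def push(self, item: str) -> None:
        if item == "flush-start":
            self.flushing = True
            self.pushed.append(item)
            return
        if self.flushing and item != "flush-stop":
            return  # data and events are dropped while the pad is flushing
        # Old rule: resend pending sticky events unless the pad is flushing.
        resend = bool(self.pending_sticky) and not self.flushing
        # Patched rule: never resend them on behalf of a flush-stop.
        if item == "flush-stop" and self.skip_sticky_on_flush_stop:
            resend = False
        if resend:
            self.pushed.extend(self.pending_sticky)
            self.pending_sticky.clear()
        if item == "flush-stop":
            self.flushing = False
        self.pushed.append(item)


def run(skip: bool, items: List[str]) -> List[str]:
    pad = ToyPad(skip_sticky_on_flush_stop=skip, pending_sticky=["stream-start"])
    for item in items:
        pad.push(item)
    return pad.pushed


edge_old = run(False, ["flush-stop", "buffer"])
edge_new = run(True, ["flush-stop", "buffer"])
usual_old = run(False, ["flush-start", "flush-stop", "buffer"])
usual_new = run(True, ["flush-start", "flush-stop", "buffer"])
print(edge_old, edge_new, usual_old == usual_new)
```

In the edge case (flush-stop with no preceding flush-start), the unpatched model sends `stream-start` ahead of the flush-stop, that is, from whichever thread is flushing. The patched model defers it until the next buffer, which would come from the streaming thread. In the usual case both variants behave the same.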
Patch migrated and pushed to https://github.com/WebPlatformForEmbedded/buildroot/tree/wpe as WebPlatformForEmbedded/buildroot@44b57a2 and to https://github.com/WebPlatformForEmbedded/buildroot/tree/main as WebPlatformForEmbedded/buildroot@fbdbe27. Closing Issue. |
Can be reproduced with the attached mse_hang.html.gz. It reproduces randomly, but usually within the first 100 iterations.
The browser main thread is blocked on a playsink data probe that is waiting for data flow on another pad:
all_bt.txt.gz