Application crashes when using explicit sync #110
Comments
I have also seen this issue on GNOME. I've also used WAYLAND_DEBUG to get the log for this, here are the last few lines:
System: Note that I had to add this line to /etc/modprobe.d/nvidia.conf to get the Wayland session to show up in GDM. Regardless, I had to do this for other driver versions as well, and I don't think it has influenced the bug:
|
If I understand the log correctly, does it mean that applications such as Firefox are violating the Wayland explicit sync protocol by committing contents to an explicit sync surface before it's been allocated? If that's the case, this needs fixing elsewhere (in GUI toolkits?), not in the NVIDIA driver. |
No, Firefox doesn't knowingly violate the explicit sync protocol. There are likely two threads making Wayland compositor calls in parallel (which is perfectly fine, since the functions are thread safe), leading to an invalid sequence of individual calls. |
I specifically remember this exact issue being discussed on the MR for explicit sync. I think the conclusion of that discussion was that the Firefox behavior would be marked as a protocol violation. So it'd be up to Mozilla and Thunderbird to fix it. The protocol is working as designed. |
I have found that this issue also happens on KDE Plasma 6.0.5 with GTK4/libadwaita applications when a hamburger menu tries to close. This does not happen on GNOME. I have tested it with Curtail, Paper Clip, Foliate and the libadwaita demo. Log from the adwaita demo:
|
Thanks for the note. Found it: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/90#note_2243522 @MaLoLHD This portion of the log is not very useful. The object (wp_linux_drm_syncobj_surface_v1@68) that violates the protocol is never referenced and we don't know to which surface it is attached. |
|
The object is attached to wl_surface@62 which got a null buffer attached. |
That was a really important find. Three people from NVIDIA, the XWayland maintainer, one of the KDE Explicit Sync developers, and others are all talking about it there. I've read through every reply and can summarize it as follows:
I bet there's a Mozilla bug tracker thread about it somewhere too. |
Wayland as the messaging protocol is thread safe, but access to the wl_surface from different threads is not. What Firefox is doing has always been broken, and always had the potential to cause crashes and bugs. With explicit sync it just gets way more chances for that to actually cause visible problems. |
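To illustrate the point about per-surface access, here is a minimal Python sketch of client-side serialization. It is a toy model, not Firefox's code and not the libwayland API: `SharedSurface`, `send`, `render_update` and `region_update` are hypothetical names, and requests are only recorded as strings. The assumption is that each thread holds one per-surface lock for its whole request group, up to and including the commit.

```python
import threading


class SharedSurface:
    """Hypothetical stand-in for a wl_surface shared by two threads."""

    def __init__(self):
        self.lock = threading.Lock()  # one lock per surface
        self.requests = []            # request names in the order they were "sent"

    def send(self, name):
        self.requests.append(name)


def render_update(surface, frames):
    # Rendering thread: buffer plus both sync points, then commit, as one unit.
    for _ in range(frames):
        with surface.lock:
            surface.send("attach")
            surface.send("set_acquire_point")
            surface.send("set_release_point")
            surface.send("commit")


def region_update(surface, times):
    # Second thread: region change plus commit, as one unit.
    for _ in range(times):
        with surface.lock:
            surface.send("set_opaque_region")
            surface.send("commit")


def run(frames=50, times=50):
    surface = SharedSurface()
    a = threading.Thread(target=render_update, args=(surface, frames))
    b = threading.Thread(target=region_update, args=(surface, times))
    a.start()
    b.start()
    a.join()
    b.join()
    return surface.requests


def groups(requests):
    """Split the request stream into groups, each ending at a commit."""
    out, current = [], []
    for name in requests:
        current.append(name)
        if name == "commit":
            out.append(tuple(current))
            current = []
    return out
```

Because each group is sent while holding the lock, a commit from one thread can never land in the middle of the other thread's attach and sync-point requests in this model.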
Closing as this is a Firefox bug. Thanks everyone for following the discussion about it in the protocol MR. It seems that a Firefox fix is on the way. |
Possibly needs to be reopened; see: https://bugzilla.mozilla.org/show_bug.cgi?id=1908825 It is still crashing after backporting the following patches to 128.0 |
With
Using There is a closed issue in kitty bug tracker with this exact same issue: kitty#7767 Not sure where else to report this, so here I am. |
Reference: #104 (comment)
While I don't use KDE, I have similar issues with sway, although the only applications in which I have encountered such crashes were Firefox and Thunderbird.
The issue here is a race between sending requests to the compositor:
full log
This is a portion of the WAYLAND_DEBUG log when such a crash occurs (from Thunderbird).
What I imagine happens here is that one thread calls set_opaque_region on the surface while another attaches a new buffer.
The commit after the set_opaque_region call happens to land between attaching the buffer and setting the corresponding syncobj points. This is by definition a protocol violation of wp_linux_drm_syncobj_surface_v1.
So the real issue is that there is no way to send multiple requests (like attach, set_acquire_point, set_release_point) atomically to the compositor.
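The interleaving described above can be replayed deterministically with a small Python sketch. This is a toy model under stated assumptions, not the real Wayland client API: `ToySurface`, `ProtocolError`, `racy_interleaving` and `ordered_sequence` are hypothetical names, and the commit check only approximates the rule that a committed buffer on an explicit sync surface needs both an acquire and a release point.

```python
class ProtocolError(Exception):
    """Stands in for the compositor disconnecting the client."""


class ToySurface:
    """Toy surface with pending state that is validated on commit."""

    def __init__(self):
        self.pending = {}

    def attach(self, buffer):
        self.pending["buffer"] = buffer

    def set_acquire_point(self, point):
        self.pending["acquire"] = point

    def set_release_point(self, point):
        self.pending["release"] = point

    def set_opaque_region(self, region):
        self.pending["opaque"] = region

    def commit(self):
        # Assumed rule: a pending buffer requires both sync points.
        has_buffer = self.pending.get("buffer") is not None
        if has_buffer and "acquire" not in self.pending:
            raise ProtocolError("no_acquire_point")
        if has_buffer and "release" not in self.pending:
            raise ProtocolError("no_release_point")
        self.pending = {}  # pending state is applied and cleared


def racy_interleaving(surface):
    surface.attach("buffer-1")           # thread A starts its update
    surface.set_opaque_region("region")  # thread B cuts in ...
    surface.commit()                     # ... and commits a half-built state
    surface.set_acquire_point(1)         # thread A never gets this far
    surface.set_release_point(2)
    surface.commit()


def ordered_sequence(surface):
    surface.set_opaque_region("region")  # thread B's group, complete
    surface.commit()
    surface.attach("buffer-1")           # thread A's group, complete
    surface.set_acquire_point(1)
    surface.set_release_point(2)
    surface.commit()
```

In this model `racy_interleaving(ToySurface())` raises `ProtocolError` at the first commit, while `ordered_sequence(ToySurface())` completes, even though both send exactly the same set of requests.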