Xvnc crashes with SIGBUS on cross-GPU DRI usage #1772

Open
CendioOssman opened this issue Jun 21, 2024 · 11 comments
Labels
bug (Something isn't working)

Comments

@CendioOssman
Member

Describe the bug
If I start Xvnc with -renderNode set to my integrated AMD GPU and then start an application that uses my discrete Nvidia GPU, Xvnc crashes with SIGBUS:

(EE) 
(EE) Backtrace:
(EE) 0: Xvnc (xorg_backtrace+0x82) [0x557530197d42]
(EE) 1: Xvnc (0x55752ffe1000+0x1b7f4c) [0x557530198f4c]
(EE) 2: /lib64/libc.so.6 (0x7f475db30000+0x40710) [0x7f475db70710]
(EE) 3: /lib64/libpixman-1.so.0 (0x7f475e151000+0x8a2d0) [0x7f475e1db2d0]
(EE) 4: /lib64/libpixman-1.so.0 (pixman_blt+0x81) [0x7f475e15f8d1]
(EE) 5: Xvnc (vncDRI3SyncPixmapFromGPU+0x10e) [0x55753004303e]
(EE) 6: Xvnc (0x55752ffe1000+0x622c3) [0x5575300432c3]
(EE) 7: Xvnc (dri3_pixmap_from_fds+0xcf) [0x5575300cfdaf]
(EE) 8: Xvnc (0x55752ffe1000+0xf1309) [0x5575300d2309]
(EE) 9: Xvnc (Dispatch+0x426) [0x557530133f56]
(EE) 10: Xvnc (dix_main+0x46a) [0x557530142d4a]
(EE) 11: /lib64/libc.so.6 (0x7f475db30000+0x2a088) [0x7f475db5a088]
(EE) 12: /lib64/libc.so.6 (__libc_start_main+0x8b) [0x7f475db5a14b]
(EE) 13: Xvnc (_start+0x25) [0x55753003ed75]
(EE) 
(EE) Bus error at address 0x7f4753011000
(EE) 
Fatal server error:
(EE) Caught signal 7 (Bus error). Server aborting
(EE) 

To Reproduce
Steps to reproduce the behavior:

  1. Xvnc -renderNode /dev/dri/renderD128 :2 (assuming renderD128 is the AMD iGPU)
  2. DISPLAY=:2 vkcube --gpu-number 1 (assuming GPU 1 is the Nvidia dGPU)

Expected behavior
vkcube renders normally on the Xvnc display.

Client (please complete the following information):
No client needed.

Server (please complete the following information):

  • OS: Fedora 40
  • VNC server: TigerVNC
  • VNC server version: 1.14.0 beta
  • Server downloaded from: Built from contrib spec file
  • Server was started using: See above

Additional context
Also crashes with an Intel ARC discrete GPU instead of the Nvidia one.

Does not crash if Xvnc is started with the discrete GPU and the application uses the integrated GPU. Possible bug in AMD driver?

@CendioOssman
Member Author

More details available in this thread:

https://lists.freedesktop.org/archives/mesa-dev/2024-June/226245.html

CendioOssman added the bug label on Jun 21, 2024
@CendioHalim
Contributor

A bug has been reported to the kernel: https://bugzilla.kernel.org/show_bug.cgi?id=218993

@dcommander
Contributor

I observe a bus error when attempting to start a VMware virtual machine with 3D acceleration. VMware uses Vulkan, and the failure seems to occur at exactly the same place as the failure described in this issue. (The symptoms are identical when I start a VMware virtual machine with 3D acceleration vs. when I run vkcube --gpu_number 1.) Symptomatically, a pixmap is allocated from a file descriptor, and a buffer object is successfully imported. However, when attempting to synchronize the buffer object and the pixmap, the pointer obtained from gbm_bo_map() appears to be invalid, so the pixel copy crashes.
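
For illustration only, here is a minimal, hypothetical C sketch of the sync path described above (it is not the actual Xvnc code, and the function and parameter names are made up): the client's dma-buf fd is imported into a GBM buffer object on the server's render node, the bo is mapped with gbm_bo_map(), and the pixels are copied out. In the cross-GPU case both the import and the map appear to succeed, and it is the copy from the mapped pointer that raises SIGBUS.

/* Hypothetical sketch of a DRI3 pixmap-from-fd sync path; compile
 * against libgbm (-lgbm).  Not the actual Xvnc implementation. */
#include <stdint.h>
#include <string.h>
#include <gbm.h>

/* Copy the contents of a client-supplied dma-buf into a CPU-side
 * pixel buffer (dst), row by row. */
static int sync_pixels_from_fd(struct gbm_device *gbm, int dmabuf_fd,
                               uint32_t width, uint32_t height,
                               uint32_t stride, uint8_t *dst,
                               uint32_t dst_stride)
{
    struct gbm_import_fd_data data = {
        .fd = dmabuf_fd,
        .width = width,
        .height = height,
        .stride = stride,
        .format = GBM_FORMAT_ARGB8888,
    };
    struct gbm_bo *bo;
    void *map_data = NULL;
    uint32_t map_stride = 0;
    uint8_t *src;

    /* The import succeeds even when the fd comes from another GPU. */
    bo = gbm_bo_import(gbm, GBM_BO_IMPORT_FD, &data, GBM_BO_USE_RENDERING);
    if (!bo)
        return -1;

    /* gbm_bo_map() also returns a non-NULL pointer... */
    src = gbm_bo_map(bo, 0, 0, width, height, GBM_BO_TRANSFER_READ,
                     &map_stride, &map_data);
    if (!src) {
        gbm_bo_destroy(bo);
        return -1;
    }

    /* ...but reading through that mapping is what triggers the SIGBUS
     * in the cross-GPU case (Xvnc does the equivalent copy with
     * pixman_blt() in vncDRI3SyncPixmapFromGPU()). */
    for (uint32_t y = 0; y < height; y++)
        memcpy(dst + (size_t)y * dst_stride,
               src + (size_t)y * map_stride,
               (size_t)width * 4);

    gbm_bo_unmap(bo, map_data);
    gbm_bo_destroy(bo);
    return 0;
}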

@dcommander
Contributor

It does appear to be the same issue. If I set VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json to force VMware to use the AMD Vulkan driver, then all is well.

dcommander added a commit to TurboVNC/turbovnc that referenced this issue Jul 12, 2024
(based on the implementation in TigerVNC 1.14 beta)

- Synchronize pixels between DRI3 pixmaps and their corresponding GBM
  buffer objects on an as-needed basis, in response to specific X11
  operations rather than on a schedule.

- Implement the simpler DRI3 v1 interface rather than DRI3 v2.  This
  avoids the need to implement the get_formats(), get_modifiers(), and
  get_drawable_modifiers() methods.

- Use Pixman (which is SIMD-accelerated) to synchronize pixels.

- Hook the DestroyPixmap() screen method to clean up a pixmap's
  corresponding GBM buffer object if there are no more references to the
  pixmap.

- Hook the CloseScreen() screen method to clean up the GBM device and
  close the DRM render node.

To do:

- Synchronize only the pixels that have changed.

Known issues:

TigerVNC/tigervnc#1772
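
As a side note, the DestroyPixmap() hook mentioned in the commit message above normally follows the standard X server screen-function wrapping idiom. The sketch below is purely illustrative (built against the xorg-server SDK headers, with a hypothetical dri3ReleasePixmapBO() helper); it is not the actual TurboVNC or TigerVNC code.

/* Illustrative sketch of wrapping the DestroyPixmap screen method so a
 * pixmap's GBM buffer object can be released with it; names such as
 * dri3ReleasePixmapBO() are hypothetical. */
#include <scrnintstr.h>
#include <pixmapstr.h>

static DestroyPixmapProcPtr wrappedDestroyPixmap;

/* Hypothetical helper that frees the GBM bo attached to this pixmap. */
extern void dri3ReleasePixmapBO(PixmapPtr pixmap);

static Bool myDestroyPixmap(PixmapPtr pixmap)
{
    ScreenPtr screen = pixmap->drawable.pScreen;
    Bool ret;

    /* Clean up only when the last reference to the pixmap goes away. */
    if (pixmap->refcnt == 1)
        dri3ReleasePixmapBO(pixmap);

    /* Unwrap, call the original method, then re-wrap. */
    screen->DestroyPixmap = wrappedDestroyPixmap;
    ret = screen->DestroyPixmap(pixmap);
    wrappedDestroyPixmap = screen->DestroyPixmap;
    screen->DestroyPixmap = myDestroyPixmap;

    return ret;
}

static void hookDestroyPixmap(ScreenPtr screen)
{
    wrappedDestroyPixmap = screen->DestroyPixmap;
    screen->DestroyPixmap = myDestroyPixmap;
}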
CendioOssman marked this as a duplicate of #1913 on Feb 19, 2025
@seacat17

seacat17 commented Feb 19, 2025

Has any fix been found for this issue yet?

EDIT: I have an idea, but I don't know how to implement it.

How can I run the server entirely on the dGPU without using the AMD driver? I have two GPUs: the iGPU is an AMD Radeon Vega and the dGPU is an RTX 3050. Can I run the server on the RTX 3050 only?

@CendioOssman
Member Author

Has any fix been found for this issue yet?

Please see the upstream bug reports linked above. But no, we haven't seen any update from upstream with a fix yet.

How can I run the server entirely on the dGPU without using the AMD driver? I have two GPUs: the iGPU is an AMD Radeon Vega and the dGPU is an RTX 3050. Can I run the server on the RTX 3050 only?

Yes, with the renderNode setting mentioned above. I would guess your dGPU is at /dev/dri/renderD129. Note #1773, though.

You could also see if you can completely disable the iGPU in UEFI, if it's not being used.

@seacat17

seacat17 commented Feb 20, 2025

Yes, with the renderNode setting mentioned above. I would guess your dGPU is at /dev/dri/renderD129. Note #1773, though.

How do I set renderNode for vncsession or vncserver?

EDIT: I figured it out. I needed to use the user config file and add the parameter like this:

rendernode=/dev/dri/renderD129

However, I have a weird performance issue. The game runs fine and nvidia-smi reports load on the GPU, but the game seems unable to fully utilize it: GPU load peaks at about 15%, judging by the MangoHud stats.

@CendioOssman
Member Author

Indeed. Nvidia's driver is incompatible with TigerVNC, so you're not getting the same acceleration as with other drivers. Their driver appears to provide some basic acceleration, since it is faster than pure CPU rendering, but it is still far slower than what the GPU is actually capable of.

We can't do much about this until Nvidia either becomes more compatible with the open-source driver model or documents their proprietary magic.

@dcommander
Contributor

My understanding from their driver devs is that their proprietary magic is based on DRI2, which allocates GPU buffers on the X server. DRI3 instead allocates GPU buffers in the X client, at the expense of GLX conformance. (Multiple processes cannot render to the same GLX drawable with DRI3, but fortunately few applications need to do that.) nVidia's drivers also make heavy use of their proprietary and undocumented NV-GLX extension. Thus, even if there were documentation for the proprietary magic, we still wouldn't be able to make it work outside of a physical X server.

I strongly suspect that the hack described in #1773 (setting __GLX_VENDOR_LIBRARY_NAME=nvidia) causes the nVidia front end to be used with an unaccelerated back end. In my testing, not only is OpenGL performance sluggish with that hack, but Xvnc becomes sluggish and unresponsive as well. If you recall, prior to the introduction of GLVND, direct rendering with llvmpipe didn't work out of the box in Xvnc if nVidia's proprietary drivers were installed. You had to do a similar environment variable hack to enable Mesa's front end. With nVidia's front end, indirect rendering was used, which monopolized the X server. In my testing, Xvnc's behavior with __GLX_VENDOR_LIBRARY_NAME=nvidia and DRI3 is very reminiscent of that old indirect rendering environment. Maybe nVidia's front end has an unaccelerated fallback mode that allows it to talk to X servers that don't have nVidia's drivers installed, and that mode is activated because the X server doesn't have DRI2 or NV-GLX. That's just a wild guess, though.
