Permitting device loss due to improper external synchronization is (or at least seems) too restrictive #2409

Open
DemiMarie opened this issue Aug 16, 2024 · 2 comments

@DemiMarie

The Vulkan specification states that improper external synchronization may cause device loss. For semaphores:

  • Losing the logical device on which the violation occurred immediately or at a future time, resulting in a VK_ERROR_DEVICE_LOST error from subsequent commands, including the one causing the violation.

For fences:

  • In the preceding cases, any of the devices associated with the fences sharing the payload may be lost, or any of the queue submission or fence reset commands may return VK_ERROR_INITIALIZATION_FAILED

While VK_ERROR_DEVICE_LOST is not a fatal error, it is nevertheless quite disruptive. It is therefore undesirable for an untrusted program (such as a guest virtual machine or a sandboxed application) to be able to cause VK_ERROR_DEVICE_LOST. If this happened repeatedly, it could effectively deny service to the importing application, such as a Wayland compositor. It would be preferable for the specification to provide stronger guarantees here.

Due to hardware or firmware limitations, some implementations may not be able to prevent faults in one application from causing device loss in another. This is out of scope for this issue.

@cubanismo

This is beyond the level of security provided by the current external memory extensions. IIRC, those extensions guarantee only that applications cannot clobber each other's non-shared resources or cause the other application to terminate. Beyond that, some level of mutual trust is indeed assumed. If you have an environment that requires stronger protections and can provide them, you could define additional extensions that specify those guarantees.

For the DoS scenario you describe, the expectation is that the victim application detects repeated device loss from operations associated with synchronization primitives from the peer, and terminates its trust/sharing with that peer if it wants to avoid the DoS.
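As a rough illustration of that policy, here is a minimal sketch in C of per-peer bookkeeping. Everything in it is an assumption made for illustration: the `peer_trust` struct, the `peer_trust_record` helper, and the threshold of three losses are hypothetical, and the two `SKETCH_VK_*` constants are local stand-ins that mirror the `VkResult` values `VK_SUCCESS` (0) and `VK_ERROR_DEVICE_LOST` (-4) so the snippet compiles without the Vulkan headers. It also assumes the application can attribute a device loss to a specific peer, which the specification does not promise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins mirroring VkResult values from the Vulkan headers. */
enum {
    SKETCH_VK_SUCCESS           = 0,
    SKETCH_VK_ERROR_DEVICE_LOST = -4
};

/* Hypothetical policy knob: losses tolerated before sharing is cut off. */
#define MAX_PEER_DEVICE_LOSSES 3u

/* Hypothetical per-peer state kept by the importing application. */
struct peer_trust {
    unsigned device_losses; /* device losses attributed to this peer */
    bool     trusted;       /* false once sharing has been terminated */
};

static void peer_trust_init(struct peer_trust *peer)
{
    assert(peer != NULL);
    peer->device_losses = 0;
    peer->trusted = true;
}

/*
 * Record the result of an operation that used a synchronization primitive
 * imported from this peer. Returns true while the peer is still trusted.
 * Once trust is revoked it stays revoked, whatever later results are.
 */
static bool peer_trust_record(struct peer_trust *peer, int vk_result)
{
    assert(peer != NULL);
    if (peer->trusted && vk_result == SKETCH_VK_ERROR_DEVICE_LOST) {
        peer->device_losses++;
        if (peer->device_losses >= MAX_PEER_DEVICE_LOSSES)
            peer->trusted = false;
    }
    return peer->trusted;
}
```

In a real importer, the result passed to `peer_trust_record` would come from a call such as `vkQueueSubmit` or `vkWaitForFences`, and the application would still have to recreate its logical device after each loss. When the helper returns false, the application would stop importing semaphores and fences from that peer.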

@DemiMarie
Author

Are there any real-world implementations that do not provide stronger guarantees?

I’m specifically concerned with the dmabuf extension, which is used by Wayland compositors. My understanding (from reading many mailing list posts, among other things) is that a client is expected not to be able to DoS the compositor, except by causing an unrecoverable GPU fault. This means that the guarantees provided by the extension are significantly weaker than what real-world implementations are expected to provide.

I’m CCing @marmarek (from Qubes OS), @emersion (wlroots lead maintainer), @gfxstrand (Linux graphics stack developer), and @robclark (ChromeOS).

4 participants