Toga GTK should provide a Camera API implementation based on GStreamer, Pipewire, and the XDG Camera Portal.
Interaction with the camera portal needs to happen over D-Bus. While libportal can be used via its GObject bindings, its GIR bindings aren't available in the Flatpak runtime. It is also less generally useful than having Toga speak D-Bus directly, either via Gio.DBusProxy or using one of the Python D-Bus libraries, which have much better ergonomics from Python than the Gio PyGI bindings. jeepney looks like the best option: it is actively maintained, and it is flexible enough to support different main loops. That flexibility means it can be portable between Python's asyncio (e.g., for console apps), GLibMainLoop, Qt, or whatever other types of application loops Toga might support for Linux. While it is "pure-Python" and therefore potentially a performance bottleneck (Jeepney's docs cover this trade-off), Toga's usage for the Camera API and other UI-related matters will not be performance sensitive. Toga users in need of high-performance D-Bus would still be free to use whatever implementation they prefer, without worrying about main loop flexibility, because an actual app is already constrained by the main loop of the UI toolkit it targets (i.e., GTK apps will have the GLibMainLoop, Qt apps will have Qt's loop, etc.).
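To make the D-Bus side concrete, here is a minimal sketch of how the camera portal could be addressed with jeepney. The bus name, object path, and interface name come from the XDG desktop portal; the helper names (`request_path`, `build_access_camera_call`) and the token value are placeholders of mine, not part of Toga or jeepney. The jeepney import is deferred so the path helper works without jeepney installed.

```python
"""Sketch: addressing the XDG camera portal for use with jeepney."""

PORTAL_BUS = "org.freedesktop.portal.Desktop"
PORTAL_PATH = "/org/freedesktop/portal/desktop"
CAMERA_IFACE = "org.freedesktop.portal.Camera"


def request_path(unique_name: str, token: str) -> str:
    """Predict the Request object path for a portal call.

    The portal derives it from the caller's unique bus name (leading ':'
    dropped, '.' replaced by '_') and the caller-chosen handle_token.
    """
    sender = unique_name.lstrip(":").replace(".", "_")
    return f"{PORTAL_PATH}/request/{sender}/{token}"


def build_access_camera_call(token: str):
    """Build (but do not send) an AccessCamera method call message."""
    # Imported lazily so the pure helper above works without jeepney.
    from jeepney import DBusAddress, new_method_call

    camera = DBusAddress(PORTAL_PATH, bus_name=PORTAL_BUS, interface=CAMERA_IFACE)
    # Signature a{sv}: jeepney represents a variant as a (signature, value) tuple.
    options = {"handle_token": ("s", token)}
    return new_method_call(camera, "AccessCamera", "a{sv}", (options,))


if __name__ == "__main__":
    print(request_path(":1.42", "toga_camera_1"))
```

The idea would be to subscribe to the `Response` signal on the predicted request path before sending the call, then call `OpenPipeWireRemote` once access is granted to obtain the Pipewire file descriptor. How the message is actually sent depends on which jeepney I/O integration is used, which is the point of the main loop flexibility described above.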
While the implementation should not over-complicate itself in order to be GTK-agnostic from the start, if the core can be written in such a way that it does not rely on the GLib main loop, it will benefit future Linux backends and enable code sharing of core logic.
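One possible shape for that separation, as a rough sketch only: the core logic talks to a small transport interface, and each backend supplies a transport bound to its own main loop. `PortalTransport`, `CameraCore`, and `FakeTransport` are hypothetical names, and the transport is assumed to hide the portal's request/response signal handling behind a single awaitable call.

```python
import asyncio
from typing import Any, Protocol


class PortalTransport(Protocol):
    """Hypothetical seam: the only thing the core needs from a main loop."""

    async def call(self, method: str, options: dict[str, Any]) -> Any: ...


class CameraCore:
    """Loop-agnostic camera logic; knows nothing about GLib, Qt, or asyncio I/O."""

    def __init__(self, transport: PortalTransport):
        self._transport = transport

    async def open_remote(self) -> int:
        # Assumption: the transport resolves AccessCamera to True/False after
        # waiting for the portal's response on our behalf.
        granted = await self._transport.call("AccessCamera", {})
        if not granted:
            raise PermissionError("camera access denied")
        return await self._transport.call("OpenPipeWireRemote", {})


class FakeTransport:
    """Stand-in used here instead of a real D-Bus connection."""

    def __init__(self, granted: bool, fd: int = 7):
        self.granted = granted
        self.fd = fd
        self.calls: list[str] = []

    async def call(self, method: str, options: dict[str, Any]) -> Any:
        self.calls.append(method)
        return self.granted if method == "AccessCamera" else self.fd


if __name__ == "__main__":
    print(asyncio.run(CameraCore(FakeTransport(granted=True)).open_remote()))
```

A GTK backend would provide a transport driven by the GLib main loop, while a future Qt backend could reuse `CameraCore` unchanged with its own transport.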
Why tie Toga to the XDG portal even when the application is not sandboxed
Using the XDG Camera Portal similarly creates portability. While it couples Toga apps to the XDG desktop portal, and while that isn't strictly necessary, XDG is the way things are moving for Linux applications. By using the portal API even where it isn't strictly needed (i.e., outside a sandbox), Toga will not need divergent implementations for finding camera devices inside and outside the sandbox. One only needs to see the complexity of divergent behaviours in and out of a sandbox that the location API had to deal with (#2999) to see how useful it is if the implementation is able to be agnostic of whether it's sandboxed. There are vanishingly few circumstances on desktop Linux where a GUI is relevant but an XDG desktop portal implementation is not available. Even users of minimalist WMs like sway usually install an XDG desktop portal implementation, due to how prevalent usage of the portal is in contemporary Linux applications.
Ergonomics and framing the shot
As far as ergonomics are concerned, I think there might be issues with how to present the actual camera interface to users. For instance, it's probably desirable for the user of a Toga application to be able to see a viewfinder before they take the image. iOS and Android handle that nicely by popping up a camera UI so that the user can frame the shot, take the photo, and the photo gets passed back to the application. None of that would come by default for Toga GTK, so I think in addition to implementing the baseline Camera API, a kind of simple GTK popout window to simulate the same experience you get on mobile devices would be useful. Once a camera handle is available, it's easy to render the video output from the device into a GTK video widget.
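As a rough illustration of that last step, the sketch below builds a GStreamer pipeline description around `pipewiresrc`, fed by the file descriptor returned from the portal. The function names and the default `autovideosink` are placeholders; a real viewfinder would use a sink that can be embedded in a GTK widget, and which one is appropriate depends on the GTK version. The `path` property used to select a camera node is an assumption about the installed `pipewiresrc` version.

```python
def camera_pipeline(fd: int, node_id=None, sink: str = "autovideosink") -> str:
    """Return a gst-launch style description for a Pipewire camera stream."""
    source = f"pipewiresrc fd={fd}"
    if node_id is not None:
        # Assumption: the node is selected via pipewiresrc's `path` property.
        source += f" path={node_id}"
    return f"{source} ! videoconvert ! {sink}"


def start_viewfinder(fd: int, node_id=None):
    """Start playing the camera stream; requires PyGObject and GStreamer."""
    # Imported lazily so camera_pipeline() can be used without GStreamer.
    import gi

    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(camera_pipeline(fd, node_id))
    pipeline.set_state(Gst.State.PLAYING)
    return pipeline


if __name__ == "__main__":
    print(camera_pipeline(12, node_id=42))
```

Taking the photo could then be a matter of pulling a single frame from the same stream, for example through a second pipeline branch ending in an image encoder.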
Describe alternatives you've considered
I strongly considered and spent time exploring whether it would be worth implementing a simple version of the camera API that only works on non-sandboxed applications. The library linuxpy makes it easy to interact with video4linux2, for example, and capture a frame. Most of the camera API (except for flash) could be very easily implemented with linuxpy. However, none of that implementation could be shared with a version that works in an application sandbox. The dependency is also of questionable broad value for Toga GTK (none of its other APIs are really relevant for GUI apps) and writing a custom interaction with v4l2 would be a huge pain and a massive thing to maintain. The camera portal + Pipewire + GStreamer is, in fact, simpler in the long term. Pipewire video objects are relatively immature, but Firefox uses this approach successfully for its own camera support, so it absolutely is not lacking in support.
Similarly, a v4l2-specific approach would not set Toga GTK up for adding things like sound recording ability, if a microphone API was ever introduced. The Pipewire + GStreamer approach makes that easy. GStreamer itself is widely documented with many millions of lines of open source code to use as reference. Likewise for Pipewire. v4l2 is also popular, but is on its way out in favour of the Pipewire approach (as evidenced by Firefox's implementation, as well as the fact that the camera portal returns a Pipewire remote as the camera handle).
Another alternative I considered was to actually pipe everything through a WebKit WebView, and leverage WebKit's camera API instead. This might sound like overkill, but it could seriously simplify things for Toga. The WebView would not need to be rendered in the Toga application for this; it could just be used as a host for a small wrapper that translates the Python camera API into JavaScript calls. However, it would be useful for the WebView to be the video output sink to display the "viewfinder" for the framing UI I mention in the previous section.
I don't even hate this approach, but I shy away from it because it might be too unconventional, and the dependency on WebKit for it might be too heavy. Not sure.
The same approach could be used for other APIs that browsers support, and it would always work in a sandbox. The geolocation API could be re-implemented in this way, for example, thereby avoiding the need to use GeoClue directly and deal with the differences in sandboxed execution.
But it isn't pretty 🙂
Oh, and it doesn't move Toga towards better D-Bus interactions, which will be needed anyway for things like StatusNotifierItem (see #3001), and the WebView wouldn't be a suitable interface for all things that the XDG Portal supports, like backgrounding, global shortcuts, clipboard, printing arbitrary documents, etc, all things that would be useful for Toga to provide cross-platform APIs to. For that, speaking D-Bus and doing so in an easy to manage way would be much more useful than the potential simplification piping camera access through a WebView would bring.
Based on this writeup, you've convinced me that the XDG Camera Portal approach makes sense. WebKit would definitely be the cursed option of last resort; and needing a different implementation to support sandboxed and non-sandboxed apps sounds like a complication that isn't worth the effort.
One comment/background detail that might be helpful:
Ergonomics and framing the shot
As far as ergonomics are concerned, I think there might be issues with how to present the actual camera interface to users.
FWIW - macOS implements a "viewfinder" window as there's no native macOS viewfinder. I don't know how much of that implementation could be factored out into a utility available at the core level, but at least the "shape" of the implementation should be re-usable.
Longer term, we'll also need to be able to have a "live" camera view widget, that is, declaring an area of the app layout that is a live camera feed (to support QR/Barcode scanning, for example).
This isn't currently supported on any platform, but it is nominally on my radar as a likely feature at some point in the future, so I'm flagging it in case it factors into any technology choices.
FWIW - macOS implements a "viewfinder" window as there's no native macOS viewfinder. I don't know how much of that implementation could be factored out into a utility available at the core level, but at least the "shape" of the implementation should be re-usable.
Fantastic to know! I had not looked into it yet, but wondered if the macOS implementation had one. I'll look to it to dictate the shape of the interaction, as you've said.
What is the problem or limitation you are having?
Toga GTK has no support for the Toga Camera API.