Toga GTK Camera API support #3049

Open
sarayourfriend opened this issue Dec 19, 2024 · 2 comments
Labels
enhancement New features, or improvements to existing features.

Comments

@sarayourfriend
Contributor

sarayourfriend commented Dec 19, 2024

What is the problem or limitation you are having?

Toga GTK has no support for the Toga Camera API.

Describe the solution you'd like

Toga GTK should provide a Camera API implementation based on GStreamer, Pipewire, and the XDG Camera Portal.

Interaction with the camera portal needs to happen over D-Bus. While libportal can be used via its GObject bindings, its GIR bindings aren't available in the Flatpak runtime. It is also less generally useful than having Toga speak D-Bus directly, either via Gio.DBusProxy or using one of the Python D-Bus libraries, which have much better ergonomics from Python than the Gio PyGI bindings. jeepney looks like the best option, considering its flexibility to support different main loops. It is also actively maintained. Jeepney's flexibility means it can be portable between Python's asyncio (e.g., for console apps), the GLib main loop, Qt, or whatever other types of application loops Toga might support for Linux. While it is "pure-Python" and therefore potentially a performance bottleneck (Jeepney's docs cover this trade-off), Toga's usage for the Camera API and other UI-related matters will not be performance sensitive. Toga users in need of high-performance D-Bus would still be free to use whatever implementation they prefer, and without the concern of main loop flexibility, because an actual app is already constrained by the main loop of the UI toolkit it targets anyway (i.e., GTK apps will have the GLib main loop, Qt will have Qt's loop, etc.).
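To make the D-Bus side concrete, here is a rough sketch of what asking the portal for camera access might look like with jeepney's blocking I/O. The bus name, object path, interface, and `AccessCamera` method come from the XDG desktop portal; the helper names (`new_handle_token`, `request_path`, `access_camera`) and the `toga` token prefix are hypothetical, and error handling is left out.

```python
"""Sketch only: requesting camera access from the XDG camera portal."""
import secrets

PORTAL_BUS_NAME = "org.freedesktop.portal.Desktop"
PORTAL_OBJECT_PATH = "/org/freedesktop/portal/desktop"
CAMERA_INTERFACE = "org.freedesktop.portal.Camera"


def new_handle_token(prefix="toga"):
    """Return a token usable as a D-Bus object path element (hypothetical helper)."""
    return f"{prefix}_{secrets.token_hex(8)}"


def request_path(unique_name, token):
    """Predict the Request object path the portal uses for a call.

    The portal derives it from the caller's unique bus name: drop the
    leading ':' and replace every '.' with '_'.
    """
    sender = unique_name.lstrip(":").replace(".", "_")
    return f"{PORTAL_OBJECT_PATH}/request/{sender}/{token}"


def access_camera():
    """Call Camera.AccessCamera and return the Request handle from the reply.

    jeepney is imported here so the helpers above work without it installed.
    """
    from jeepney import DBusAddress, new_method_call
    from jeepney.io.blocking import open_dbus_connection

    camera = DBusAddress(
        PORTAL_OBJECT_PATH,
        bus_name=PORTAL_BUS_NAME,
        interface=CAMERA_INTERFACE,
    )
    conn = open_dbus_connection(bus="SESSION")
    try:
        # Options are a{sv}; jeepney writes a variant as (signature, value).
        options = {"handle_token": ("s", new_handle_token())}
        msg = new_method_call(camera, "AccessCamera", "a{sv}", (options,))
        reply = conn.send_and_get_reply(msg)
        # A real implementation would check the reply for an error first.
        return reply.body[0]
    finally:
        conn.close()
```

The returned handle is only the portal's Request object. The actual grant or denial arrives later as that object's `Response` signal, which is why being able to predict the path with something like `request_path` is handy: the signal subscription can be set up before the call is made. After access is granted, `OpenPipeWireRemote` on the same interface hands back the PipeWire file descriptor.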

While the implementation should not over-complicate itself in order to be GTK-agnostic from the start, if the core can be written in such a way that it does not rely on the GLib main loop, it will benefit future Linux backends and enable code sharing of core logic.
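One way this separation could look, as a sketch and not a proposal for the final shape: the portal logic is written against a small transport interface, and each backend supplies a transport bound to its own main loop. Every class name here (`PortalTransport`, `CameraPortalClient`, `FakeTransport`) is hypothetical, and the calls are synchronous only to keep the example short.

```python
from typing import Any, Dict, Protocol, Tuple


class PortalTransport(Protocol):
    """Hypothetical seam: sends one D-Bus method call and returns the reply body."""

    def call(
        self, interface: str, method: str, signature: str, body: Tuple[Any, ...]
    ) -> Tuple[Any, ...]:
        ...


class CameraPortalClient:
    """Core logic that knows the portal's methods but nothing about the main loop."""

    INTERFACE = "org.freedesktop.portal.Camera"

    def __init__(self, transport: PortalTransport) -> None:
        self._transport = transport

    def access_camera(self, token: str) -> str:
        # Variants are written as (signature, value) pairs in this sketch.
        options: Dict[str, Tuple[str, Any]] = {"handle_token": ("s", token)}
        (handle,) = self._transport.call(
            self.INTERFACE, "AccessCamera", "a{sv}", (options,)
        )
        return handle

    def open_pipewire_remote(self) -> int:
        (fd,) = self._transport.call(
            self.INTERFACE, "OpenPipeWireRemote", "a{sv}", ({},)
        )
        return fd


class FakeTransport:
    """Stand-in transport for exercising the core without a session bus."""

    def __init__(self) -> None:
        self.calls = []

    def call(self, interface, method, signature, body):
        self.calls.append((interface, method, signature, body))
        if method == "AccessCamera":
            token = body[0]["handle_token"][1]
            return (f"/org/freedesktop/portal/desktop/request/1_42/{token}",)
        return (7,)  # pretend file descriptor
```

A GTK backend would pass in a transport driven by the GLib main loop, a future Qt backend one driven by Qt's loop, and tests can use the fake. The core class stays the same in all three cases.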

By the way, while GStreamer is indeed a GLib API, there are Qt bindings for it that adapt it to use Qt's loop, so it does not necessarily depend on GLib's loop being available to the application (see "features" heading at the link).

Why tie Toga to the XDG portal even when the application is not sandboxed

Using the XDG Camera Portal similarly creates portability. While it couples Toga apps to the XDG desktop portal, and while that isn't strictly necessary, XDG is the way things are moving for Linux applications, and by using the portal API even where it isn't strictly needed (i.e., outside a sandbox) Toga will not need to have divergent implementations for finding camera devices in and outside the sandbox. One only needs to look at the complexity of divergent behaviours in and out of a sandbox that the location API had to deal with (#2999) to see how useful it is if the implementation is able to be agnostic of whether it's sandboxed. There are vanishingly few circumstances on desktop Linux where a GUI is relevant and an XDG desktop portal implementation is not available. Even users of minimalist WMs like sway usually install an XDG desktop portal implementation, due to how prevalent usage of the portal is in contemporary Linux applications.

Ergonomics and framing the shot

As far as ergonomics are concerned, I think there might be issues with how to present the actual camera interface to users. For instance, it's probably desirable for the user of a Toga application to be able to see a viewfinder before they take the image. iOS and Android handle that nicely by popping up a camera UI so that the user can frame the shot, take the photo, and the photo gets passed back to the application. None of that would come by default for Toga GTK, so I think in addition to implementing the baseline Camera API, a kind of simple GTK popout window to simulate the same experience you get on mobile devices would be useful. Once a camera handle is available, it's easy to render the video output from the device into a GTK video widget.
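For the viewfinder itself, the GStreamer side could be as small as a three-element pipeline. The sketch below only builds the pipeline description string; the function name `viewfinder_pipeline` is hypothetical, and the choice of `videoconvert` and `gtksink` (the GTK 3 sink) is an assumption about which plugins are installed. `fd` is the descriptor returned by the portal and `node_id` is the PipeWire node for the chosen camera.

```python
def viewfinder_pipeline(fd: int, node_id: int, sink: str = "gtksink") -> str:
    """Build a gst-launch style description for a camera viewfinder.

    The sink element name is an assumption; a GTK 4 app would need a
    different sink element.
    """
    if fd < 0:
        raise ValueError("fd must be a valid file descriptor")
    # pipewiresrc reads from the PipeWire remote behind `fd`, targeting the
    # node given by `path`; videoconvert adapts the format for the sink.
    return f"pipewiresrc fd={fd} path={node_id} ! videoconvert ! {sink} name=viewfinder"


# With PyGObject and GStreamer available, the string would be handed to
# Gst.parse_launch(...) after Gst.init(None), and the sink's widget looked up
# via pipeline.get_by_name("viewfinder").
```

Keeping the description as plain data means the popout window code only has to embed the sink's widget, and swapping the sink for a different toolkit does not touch the source side.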

Describe alternatives you've considered

I strongly considered and spent time exploring whether it would be worth implementing a simple version of the camera API that only works on non-sandboxed applications. The library linuxpy makes it easy to interact with video4linux2, for example, and capture a frame. Most of the camera API (except for flash) could be very easily implemented with linuxpy. However, none of that implementation could be shared with a version that works in an application sandbox. The dependency is also of questionable broad value for Toga GTK (none of its other APIs are really relevant for GUI apps) and writing a custom interaction with v4l2 would be a huge pain and a massive thing to maintain. The camera portal + Pipewire + GStreamer is, in fact, simpler in the long term. Pipewire video objects are relatively immature, but Firefox uses this approach successfully for its own camera support, so it absolutely is not lacking in support.

Similarly, a v4l2-specific approach would not set Toga GTK up for adding things like sound recording ability, if a microphone API was ever introduced. The Pipewire + GStreamer approach makes that easy. GStreamer itself is widely documented with many millions of lines of open source code to use as reference. Likewise for Pipewire. v4l2 is also popular, but is on its way out in favour of the Pipewire approach (as evidenced by Firefox's implementation, as well as the fact that the camera portal returns a Pipewire remote as the camera handle).


Another alternative I considered was to actually pipe everything through a WebKit WebView, and leverage WebKit's camera API instead. This might sound like overkill, but it could seriously simplify things for Toga. The WebView would not need to be rendered out to the Toga application for this; it could just be used as a host for a small wrapper to translate the Python camera API into JavaScript calls. However, it would be useful for the WebView to be the video output sink to display the "viewfinder" for the framing UI I mention in the previous section.

I don't even hate this approach, but I shy away from it because it might be too unconventional, and the dependency on WebKit for it might be too heavy. Not sure.

The same approach could be used for other APIs that browsers support, and it would always work in a sandbox. The geolocation API could be re-implemented in this way, for example, thereby avoiding the need to use GeoClue directly and deal with the differences in sandboxed execution.

But it isn't pretty 🙂

Oh, and it doesn't move Toga towards better D-Bus interactions, which will be needed anyway for things like StatusNotifierItem (see #3001), and the WebView wouldn't be a suitable interface for all things that the XDG Portal supports, like backgrounding, global shortcuts, clipboard, printing arbitrary documents, etc, all things that would be useful for Toga to provide cross-platform APIs to. For that, speaking D-Bus and doing so in an easy to manage way would be much more useful than the potential simplification piping camera access through a WebView would bring.

@sarayourfriend sarayourfriend added the enhancement New features, or improvements to existing features. label Dec 19, 2024
@freakboy3742
Member

Another extraordinarily thorough analysis - thanks.

Based on this writeup, you've convinced me that the XDG Camera Portal approach makes sense. Webkit would definitely be the cursed option of last resort; and needing a different implementation to support sandboxed and non-sandboxed apps sounds like a complication that isn't worth the effort.

One comment/background detail that might be helpful:

Ergonomics and framing the shot

As far as ergonomics are concerned, I think there might be issues with how to present the actual camera interface to users.

FWIW - macOS implements a "viewfinder" window as there's no native macOS viewfinder. I don't know how much of that implementation could be factored out into a utility available at the core level, but at least the "shape" of the implementation should be re-usable.

Longer term, we'll also need to be able to have a "live" camera view widget - that is, declaring an area of the app layout that is a live camera feed (to support QR/Barcode scanning, for example).

This isn't currently supported on any platform, but it is nominally on my radar as a likely feature at some point in the future, so I'm flagging it in case it factors into any technology choices.

@sarayourfriend
Contributor Author

FWIW - macOS implements a "viewfinder" window as there's no native macOS viewfinder. I don't know how much of that implementation could be factored out into a utility available at the core level, but at least the "shape" of the implementation should be re-usable.

Fantastic to know! I had not looked into it yet, but wondered if the macOS implementation had one. I'll look to it to dictate the shape of the interaction, as you've said.


2 participants