Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move more data providers to core library #288

Closed
nyalldawson opened this issue Mar 14, 2024 · 15 comments
Closed

Move more data providers to core library #288

nyalldawson opened this issue Mar 14, 2024 · 15 comments

Comments

@nyalldawson
Copy link
Contributor

nyalldawson commented Mar 14, 2024

QGIS Enhancement: Move more data providers to core library

Date 2024/03/14

Author Nyall Dawson (@nyalldawson)

Contact nyall dot dawson at gmail dot com

Version QGIS 3.38/3.40

Summary

Currently, QGIS data providers are split between implementation as private classes within the core library and an external "plugin" provider interface. There are significant benefits of moving providers to the "core" library and avoiding "plugin" style providers.

As of QGIS 3.36, the "core" providers are:

  • Cloud optimised point cloud ("COPC")
  • Entwine point cloud ("EPT")
  • GDAL raster provider ("GDAL")
  • Memory (scratch layer) vector provider ("memory")
  • Memory (scratch layer) mesh provider ("meshmemory")
  • OGR vector provider ("OGR")
  • SensorThings vector provider ("sensorthings")
  • Virtual point cloud ("vpc")
  • Vector tiles
  • Cesium tiled scene provider ("cesium")

Plugin based providers are:

  • ArcGIS REST vector feature server ("arcgisfeatureserver")
  • ArcGIS REST mapserver raster ("arcgismapserver")
  • Delimited text ("delimitedtext")
  • GPX ("gpx")
  • GRASS raster provider ("grassraster")
  • GRASS vector provider ("grass")
  • SAP HANA vector provider ("hana")
  • MDAL mesh provider ("mdal")
  • SQL Server vector provider ("mssql")
  • Oracle vector ("oracle")
  • PDAL point cloud provider ("pdal")
  • Postgres vector provider ("postgres")
  • Postgres raster provider ("postgresraster")
  • Spatialite vector provider ("spatialite")
  • Virtual vector provider ("virtual")
  • Virtual raster provider ("virtualraster")
  • WCS raster provider ("wcs")
  • WFS / OAPIF vector providers
  • WMS / WMTS / XYZ raster provider ("wms")

This split between core and plugin providers has developed over time. Initially, ALL data providers
were implemented as plugin providers. There are some benefits of this approach:

  • Since the plugin provider libraries are loaded at runtime, theoretically this allows users to
    create "slim" QGIS installations by permitting deletion of unwanted provider libraries from
    their QGIS install. (eg by deleting .dll files in Windows installs)
  • It enforces isolation of provider code, since individual providers cannot access the private code
    from other providers. Instead, common functionality must be extracted to modular re-usable functions
    and placed within the core library.
  • Ideally, this enforced isolation should also mean that there is NO provider specific logic living
    anywhere EXCEPT in that individual provider's library. (In practice this is not the case, see below)
  • Historically, it was needed to keep provider specific GUI related code out of the core library. Older
    QGIS versions required that a data providers mixed their core (headless) functionality with their
    GUI configuration widgets (such as dialogs for connection handling, layer configuration). By moving
    all the code related to one provider (including the gui code) into isolated parts of the QGIS code, it
    avoided GUI classes bleeding into the QGIS core library.
  • 3rd parties can theoretically create custom data providers and provide them as drop-in compiled
    libraries to be added to a standard QGIS install.

However, there are downsides to the plugin provider implementation:

  • Dynamically loading a provider at runtime increases the startup time of QGIS (and also other
    critically performance reliant QGIS related tools such as qgis_process and QGIS server).
  • Dynamically loaded providers also contribute to the (frustratingly slow!!) startup times of
    QGIS within debug sessions. This is a pain point for QGIS development.
  • In practice, it is not possible to completely disable an unwanted provider by removing libraries
    from a QGIS install. There are still leftovers from these providers, such as broken entries in the Add Layer
    menus, which just do nothing when activated. Instead, properly disabling a provider requires compile-time
    CMake configuration to ensure that all references to the provider are removed.
  • Dynamically loaded providers do not work well with other QGIS library clients (such as QField/Mergin),
    which require these providers to be statically linked. The setup for building with static libraries is
    complex, and is not tested on the QGIS CI. This leds to build breakage (eg fix installing and linking in delimited provider gui QGIS#56233)
  • Modern QGIS versions have full support for provider logic split between the core and gui libraries,
    with no risk of bleeding gui related code into the core library.

For these reasons, the recent trend has been to implementing new providers directly in the
core library (hence why "younger" providers like the point cloud providers and tiled scene providers
are implemented in the core library).

In this project some additional providers will be moved to become "core" providers. The targeted providers are:

  • ArcGIS REST vector feature server ("arcgisfeatureserver")
  • ArcGIS REST mapserver raster ("arcgismapserver")
  • Delimited text ("delimitedtext")
  • GPX ("gpx")
  • MDAL mesh provider ("mdal")
  • PDAL point cloud provider ("pdal")
  • Postgres vector provider ("postgres")
  • Postgres raster provider ("postgresraster")
  • Spatialite vector provider ("spatialite")
  • Virtual raster provider ("virtualraster")
  • WMS / WMTS / XYZ raster provider ("wms")

This leaves the following providers as plugin providers:

  • GRASS raster provider ("grassraster")
  • GRASS vector provider ("grass")
  • SAP HANA vector provider ("hana")
  • SQL Server vector provider ("mssql")
  • Oracle vector ("oracle")
  • Virtual vector provider ("virtual")
  • WCS raster provider ("wcs")
  • WFS / OAPIF vector providers

(These remaining providers are excluded in order to limit the project scope, and could be targeted in future projects.)

Startup cost of provider plugins

Profiling of the qgis_process tool shows that (on a debug enabled QGIS build) the initialization of the plugin style providers accounts for ~9% of the application start up cost (this increases to ~30% of the application start up cost if the --no-python flag is added!)

Scope of work

  • The targeted providers will be moved to src/core/providers/xxx, with their GUI related logic moved
    to gui/providers/xxx (to enforce the core/gui logic split).
  • Minor code cleanups and modernisation will be performed during the move

Benefits to QGIS

  • Faster startup times
  • More maintainable code
  • Simplified build setup
  • Benefits to PyQGIS initialisation -- see below

Easier initialisation of PyQGIS for use in external scripts

Right now, you can initialize the PyQGIS API through the following code

from qgis.core import QgsApplication

QGISAPP = QgsApplication([], True)
QgsApplication.initQgis()

However, this approach does NOT load any of the plugin style providers, so the resultant PyQGIS environment can only access layers using GDAL/OGR/memory and the other "core" providers. Other layer types (eg wms) will always be considered invalid.

To avoid this, you need to set various environment variables such as QGIS_PREFIX_PATH so that the plugin style providers can be located and loaded. This complicates development of standalone PyQGIS scripts.

By moving more providers to core style, we can help reduce the need for the extra environment setup (and ultimately, remove it entirely!)

Performance Implications

  • This will result in faster QGIS startup times!

Risks

This is a low-risk project. The interfaces for clean split of core/gui for providers are mature and widely used. The existing "core" providers are mature and the approach of including providers in the core library is proven and stable.

The targeted providers are all well covered by existing unit tests.

Further Considerations/Improvements

  • Won't this prevent custom "slim" QGIS builds? No. Providers will still be selectively enabled
    when compiling based on CMake configuration at compile time. E.g. a custom build without postgres
    support can be made by disabling the WITH_POSTGRESQL cmake flag. This is a BETTER approach to customised
    slim builds as ALL references to the provider are disabled in the build.
  • Won't this add extra dependencies to QGIS builds? No. Just like we currently have, the providers
    will be selectively compiled based on cmake configuration flags and the dependencies these libraries have
    configured via cmake. There will be NO change to the dependencies required for a default QGIS build.
  • Does this prevent the use of 3rd party provider libraries? No. All the existing logic and interfaces
    for dynamically loaded providers will be left in place. These third party libraries will continue
    to work without change.
  • Will provider logic be exposed to Python? No. core providers are marked as private classes, and
    are not exposed as part of stable API.
@troopa81
Copy link

Dynamically loading a provider at runtime increases the startup time of QGIS (and also other
critically performance reliant QGIS related tools such as qgis_process and QGIS server).

Do you have an idea on how much it increases startup time ?

Could we let them in providers directory and create a qgis_providers library dynamically linked but not dynamically loaded at runtime? It would keep isolation from core, advertise that all its content is private and solve the performance issues.

@andreasneumann
Copy link
Member

This leaves the following providers as plugin providers:

* GRASS raster provider ("grassraster")

* GRASS vector provider ("grass")

* SAP HANA vector provider ("hana")

* SQL Server vector provider ("mssql")

* Oracle vector ("oracle")

* Virtual vector provider ("virtual")

* WCS raster provider ("wcs")

* WFS / OAPIF vector providers

@nyalldawson - is there a particular reason why WMS / WMTS / XYZ would be a core provider and WFS / OAPIF a plugin?

Both are very important and widely used network services, just as much (or even more widely used than let's say ArcGIS REST services.

@rouault
Copy link
Contributor

rouault commented Mar 14, 2024

and WFS / OAPIF a plugin?

I guess one reason is that in the core logic of the WFS provider we use the Qt GUI API, for example to display the dialogue box with the progress bar (and more recently in QGIS 3.36 when downloading schemas in the new "schema aware" mode). This might be tricky to un-entangle. Well, one "quick" & dirty way would probably to use a callback mechanism where the UI part of the WFS would set the callback to the core. But perhaps there should be some core mechanism to report progress of long feature iteration, but that's a topic for another enhancement.

@wonder-sk
Copy link
Member

+1 this will be a good move! Some more advantages not mentioned in the QEP is the chance for 1. better testability (easier to do unit tests of internal classes of a provider), 2. better code reuse, 3. ability to expose some functionality in core API (e.g. manage list of connections through API, get parsed GetCapabilities response)

My slight preference would be to have WFS and WCS brought to core first instead of database providers (posgres, spatialite), so we have all OGC-related bits in core...

@nyalldawson
Copy link
Contributor Author

@andreasneumann

is there a particular reason why WMS / WMTS / XYZ would be a core provider and WFS / OAPIF a plugin?
Both are very important and widely used network services, just as much (or even more widely used than let's say ArcGIS REST services.

It's not so much that it's considered less important, rather that (as @rouault pointed out) there's particular complexities in upgrading wfs/oapif from a plugin to a core provider. There's a substantial amount of risk involved in porting that one, which is why I'm opting to leave it to a future piece of work.

@rouault

I guess one reason is that in the core logic of the WFS provider we use the Qt GUI API, for example to display the dialogue box with the progress bar (and more recently in QGIS 3.36 when downloading schemas in the new "schema aware" mode). This might be tricky to un-entangle. Well, one "quick" & dirty way would probably to use a callback mechanism where the UI part of the WFS would set the callback to the core. But perhaps there should be some core mechanism to report progress of long feature iteration, but that's a topic for another enhancement.

My thinking is that the progress bar should be removed along with all that gui code, and replaced with a QgsTask instead. That way we get nicer GUI since the wfs progress is shown alongside all the other long running tasks, and we don't have any gui related code / logic sitting in the core provider parts.

@nyalldawson
Copy link
Contributor Author

@troopa81

Do you have an idea on how much it increases startup time ?

I've added a "Startup cost of provider plugins" section above

Profiling of the qgis_process tool shows that (on a debug enabled QGIS build) the initialization of the plugin style providers accounts for ~9% of the application start up cost (this increases to ~30% of the application start up cost if the --no-python flag is added!). It's the next logical "low hanging fruit" in optimising this tool. (I don't think there's any opportunities to speed up the python initialisation, and the bulk of the remainder is coming from Qt application initialisation internals).

Could we let them in providers directory and create a qgis_providers library dynamically linked but not dynamically loaded at runtime? It would keep isolation from core, advertise that all its content is private and solve the performance issues.

We could, but I don't think the extra complexity is worth it.

@rouault
Copy link
Contributor

rouault commented Mar 15, 2024

I've added a "Startup cost of provider plugins" section above

FYI: a somewhat related move we've made with GDAL plugins in 3.9dev : https://gdal.org/development/rfc/rfc96_deferred_plugin_loading.html "However, there is a penalty at GDALAllRegister() time. For example, on Linux, it takes ~ 300 ms to complete for a build with 126 plugins, which is a substantial time for short lived GDAL-based processes (for example a script which would run gdalinfo or ogrinfo on many files). This time is entirely spent in the dlopen() method of the operating system and there is, to the best of our knowledge, nothing we can do to reduce it.." And I confirm the time penalty under a debugger was at least one order of magnitude worse.

@troopa81
Copy link

troopa81 commented Mar 18, 2024

I've added a "Startup cost of provider plugins" section above

Thanks!

We could, but I don't think the extra complexity is worth it.

I don't see exactly why it would be that much complex compared with the solution you propose?

@nyalldawson
Copy link
Contributor Author

@troopa81

I don't see exactly why it would be that much complex compared with the solution you propose?

Well we already have a mature framework for including providers in src/core and src/gui. There's no extra cmake setup required to move more providers to this system.

If we make another library there IS cmake changes required. (We'd also need two extra libraries, one for the non gui part of the providers and one for the gui components.) I just don't think the extra layers have a compelling enough justification.

@troopa81
Copy link

We'd also need two extra libraries, one for the non gui part of the providers and one for the gui components.)

Why is that? we could have both in the same library.

The performance issue is mainly due to the dynamic loading (dlopen), why not keep the providers as shared libraries and dynamically link the application to them, thus avoiding dynamic loading (which look like to be done for postgres ?

@m-kuhn
Copy link
Member

m-kuhn commented Mar 23, 2024

The current cmake code of plugins is painful indeed with building shared and static libraries of providers.

How we will make sure that it will still be possible to compile a "slim" version of qgis core. I think is important as it's more and more used as library (server, mobile, wasm, process). With the current approach provider API to provider functionality is clearly defined. I could well imagine that we will have direct access to some [name your provider] specific code in various places in QGIS core and the cmake flags will be rendered disfunctional. Are there any plans to test building with very few (e.g. only memory and maybe gdal/ogr) provider?

It's currently already possible to build providers with static linkage. This is referred to as "complex and untested" in the proposal. Has any consideration been given to make static the default / only supported approach. This would make providers immediately tested and increase runtime loading of qgis in the same way as with this proposal but keep the "natural barrier" for keeping providers self contained and modular. Or is there a complexity about this setup that we'd like to get rid of?

@anitagraser
Copy link
Member

Thank you for submitting your proposal to the 2024 QGIS Grant Programme. The 2 week discussion period starts today. At the end of the discussion, the proposal author has to provide a 3-line pitch of their proposal for the voter information material. (For an example from last year check qgis/PSC#58 (comment))

@nyalldawson
Copy link
Contributor Author

@m-kuhn If this proposal doesn't meet the QField use case to your satisfaction then I'll withdraw it -- improving things for QField was one of the main drivers here 🤣 👍

@m-kuhn
Copy link
Member

m-kuhn commented Mar 25, 2024

I am in favor of seeing something happen 👍

If building providers dynamic, newer XCode versions complain about the static and dynamic versions of the providers both depending on the same files (can't recall the exact wording right now) which makes things messy. I would not mind seeing providers as plugins disappear, I'm just not yet completely convinced that inflating core is the correct approach. What counts for me in the end is that we keep the possibility to build a "slick" core with only few providers (ideally we had the possibility to build just a renderer with memory layers but I guess that train has gone 😆 )

@nyalldawson
Copy link
Contributor Author

Ok, after giving more thought I've decided to withdraw this proposal and will rethink for next round.

@anitagraser can you remove this grant request from the list please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants