Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Data Source Manager STAC search initial implementation #59534

Merged
merged 5 commits into from
Dec 2, 2024

Conversation

uclaros
Copy link
Contributor

@uclaros uclaros commented Nov 20, 2024

Description

This is the second part of the implementation of qgis/QGIS-Enhancement-Proposals#300

A new page has been added to the Data Source Manager that allows interaction with STAC API endpoints.

image

Filters for searching the catalog can be supplied by a dialog accessible through the Filters button:

image

The results list allows for endless scrolling, fetching more contents as it is scrolled down.
For STAC Items containing Assets on a cloud optimized format (COG, COPC, EPT for now) the Add Layers button is enabled allowing to add all such Assets as layers, similar to how it can be done using the QGIS browser.

STAC Catalogs that do not support STAC API /collections and search endpoints can only be used QGIS browser and not the data source manager.

@uclaros uclaros added the STAC Related to SpatioTemporal Asset Catalog label Nov 20, 2024
@github-actions github-actions bot added this to the 3.42.0 milestone Nov 20, 2024
Copy link

github-actions bot commented Nov 20, 2024

🪟 Windows builds

Download Windows builds of this PR for testing.
Debug symbols for this build are available here.
(Built from commit d42acc8)

🪟 Windows Qt6 builds

Download Windows Qt6 builds of this PR for testing.
(Built from commit d42acc8)

@nyalldawson
Copy link
Collaborator

STAC Catalogs that do not support STAC API /collections and search endpoints can only be used QGIS browser and not the data source manager.

What happens when you try to connect to these via DSM? One thing we could do here is fallback to using the browser model with the root item being the connection, so that users could still utilise these connections via DSM. (This is the approach we use with eg the ArcGIS REST page in DSM -- when you connect to a service then we just use the browser model showing only the content from that connection. It's a nice approach because we already have all that code already for the browser panel..)

src/core/stac/qgsstacasset.cpp Outdated Show resolved Hide resolved
src/core/stac/qgsstacitem.cpp Outdated Show resolved Hide resolved
}
else if ( it->href().startsWith( QLatin1String( "s3://" ), Qt::CaseInsensitive ) )
{
uri.uri = QStringLiteral( "/vsis3/%1" ).arg( it->href().mid( 5 ) );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually work? I suspect we'd generally need to also set the credentialOptions in the uri, eg to add the AWS_NO_SIGN_REQUEST=Yes option. I'm not aware of any s3 connections which will work without ANY credential options set.

I think in practice we'll need to use the storage extension to determine appropriate credentials -- see https://github.com/stac-extensions/storage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works since we practically just let GDAL handle the s3 authentication. Users will need to set up their gdal env variables or ~/.aws/config files and their layers will just work!

For downloading assets though, this would not work indeed (there's no s3:// url handling at the moment).
There's room for discussion on how to handle that, but that's outside the scope of this PR.
I also haven't seen the storage stac-extension used on any of the servers I've used for testing, so I'd be reluctant to depend on it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users will need to set up their gdal env variables or ~/.aws/config files and their layers will just work!

Right -- but since #57826 we can do better, because we could just use credentialOptions in the layer URI and set this for users 😄 Then things really will "just work", no config file messing around required.

I also haven't seen the storage stac-extension used on any of the servers I've used for testing, so I'd be reluctant to depend on it.

Try https://earth-search.aws.element84.com/v1/collections/ -- that one uses the storage extension to specify:

"storage:platform": [
          "AWS"
        ],
        "storage:region": [
          "eu-central-1"
        ],
        "storage:requester_pays": [false]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the extension, we could indeed handle the cases where requester_pays=false and use AWS_NO_SIGN_REQUEST=YES (shouldn't this be default when no other signing info are provided?) and maybe also use the region in order to translate the s3:// url to https:// for downloading assets.
As for CredentialOptions, AFAICT they are gdal specific. I'd prefer to have something more generic (like the config files) that can also work outside gdal for downloading assets.
We could use the existing QgsAuthAwsS3Method and in the STAC connection we could define a list of authCfgs which should be used for downloading assets, as they will probably be different from the catalog's authcfg.
Then the AuthManager would have a method QStringList getAvailableAuthCfgsForThisLink( url ) and we could decide which one to use for downloading the files.

But as I said before, all this handling of s3 links is well outside the scope of this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the extension, we could indeed handle the cases where requester_pays=false and use AWS_NO_SIGN_REQUEST=YES (shouldn't this be default when no other signing info are provided?)

I'd say that's a safe call to make, but @rouault will likely know better

maybe also use the region in order to translate the s3:// url to https:// for downloading assets.

I'd suggest just directly mapping that to the "AWS_DEFAULT_REGION" credential option, and stick with vsis3 and let GDAL handle the rest.

As for CredentialOptions, AFAICT they are gdal specific. I'd prefer to have something more generic (like the config files) that can also work outside gdal for downloading assets.

They are GDAL/OGR provider specific, but we should definitely use them when loading GDAL/OGR layers from the collections. That we we pass off all the handling of this and optimisations and such to GDAL, and it's the BEST place to do that 😆

We could use the existing QgsAuthAwsS3Method and in the STAC connection we could define a list of authCfgs which should be used for downloading assets, as they will probably be different from the catalog's authcfg.
Then the AuthManager would have a method QStringList getAvailableAuthCfgsForThisLink( url ) and we could decide which one to use for downloading the files.

I'm -1 to using auth methods for this (in general). They don't work at all if you share projects or layer configuration, and for the simple cases of AWS_NO_SIGN_REQUEST/AWS_DEFAULT_REGION there's nothing sensitive in the configuration which requires use of the secure auth framework.

I guess the question is whether or not there's ever going to be a stac collection where the items require DIFFERENT authentication vs the collection itself. If not, then could just handle the auth configuration in the stac connection and just copy the authcfg key from the connection to the loaded layers.

But as I said before, all this handling of s3 links is well outside the scope of this PR

That's fair enough -- but I'd go back to suggesting that we then ALWAYS add the "AWS_NO_SIGN_REQUEST=Yes" credential option, and just hope that it works 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that's a safe call to make, but @rouault will likely know better

I'm not sure. aws cli requires an explicit --no-sign-request.

or not there's ever going to be a stac collection where the items require DIFFERENT authentication vs the collection itself

OSGeo/gdal#11317 / OSGeo/gdal#11319 is an example where the authentication to the STAC API is different than the assets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also use the region in order to translate the s3:// url to https:// for downloading assets.

I'd suggest just directly mapping that to the "AWS_DEFAULT_REGION" credential option, and stick with vsis3 and let GDAL handle the rest.

Can we use vsis3 to download assets? Assets are not necessarily gdal supported formats!

I'm -1 to using auth methods for this (in general). They don't work at all if you share projects or layer configuration

I don't mean for loading assets as layers, this should definitely be done using vsis3. My point is for downloading assets as files, which may also be arbitrary non-gdal formats.

That's fair enough -- but I'd go back to suggesting that we then ALWAYS add the "AWS_NO_SIGN_REQUEST=Yes" credential option, and just hope that it works 😆

Will it work if the link actually requires signing and the user has properly setup config files?

src/gui/stac/qgsstacsearchparametersdialog.cpp Outdated Show resolved Hide resolved
src/ui/qgisapp.ui Outdated Show resolved Hide resolved
src/gui/stac/qgsstacsourceselect.cpp Outdated Show resolved Hide resolved
@rouault
Copy link
Contributor

rouault commented Nov 21, 2024

very cool ! Maybe it could be worth extending tests/src/core/testqgsstac.cpp ?

src/gui/stac/qgsstacitemlistmodel.cpp Outdated Show resolved Hide resolved
src/gui/stac/qgsstacsourceselectprovider.cpp Outdated Show resolved Hide resolved
@uclaros
Copy link
Contributor Author

uclaros commented Nov 27, 2024

One thing we could do here is fallback to using the browser model with the root item being the connection, so that users could still utilise these connections via DSM.

@nyalldawson I like this approach! I'll look into extending this in the future!

Maybe it could be worth extending tests/src/core/testqgsstac.cpp ?

@rouault The parsing of Item Collection and Collection Collection replies is already covered. I'm not sure what else to test as this PR is mostly GUI which we usually don't test. Do you maybe have some specific test to suggest?

@rouault
Copy link
Contributor

rouault commented Nov 27, 2024

Do you maybe have some specific test to suggest?

I see new methods in src/core/stac/, so I imagine they could be tested

@uclaros
Copy link
Contributor Author

uclaros commented Nov 28, 2024

I see new methods in src/core/stac/, so I imagine they could be tested

Thanks for the hint, it helped identify a couple of minor bugs!

Is it safe to assume that the clang-tidy memory leak report on the QTimer::singleshot is a false positive?

@rouault
Copy link
Contributor

rouault commented Nov 28, 2024

Is it safe to assume that the clang-tidy memory leak report on the QTimer::singleshot is a false positive?

I think so

@nyalldawson
Copy link
Collaborator

assume that the clang-tidy memory leak report on the QTimer::singleshot is a false positive?

Yes,.see elsewhere where we have to "ifndef clang analyser" to suppress this

@wonder-sk wonder-sk merged commit 21f693e into qgis:master Dec 2, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
STAC Related to SpatioTemporal Asset Catalog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants