QGIS Enhancement Proposal: Mitigate Abusive Tile Fetching on OpenStreetMap (OSM) Servers #291

Closed
nirvn opened this issue Mar 20, 2024 · 12 comments

@nirvn

nirvn commented Mar 20, 2024

QGIS Enhancement: Mitigate Abusive Tile Fetching on OpenStreetMap (OSM) Servers

Date 2024/03/14

Author Mathieu Pellerin (@nirvn)

Contact [email protected]

Version QGIS 3.X

Current Situation

The OpenStreetMap (OSM) Foundation has expressed concerns with QGIS regarding the escalating tile usage, primarily attributable to a minority of users engaging in mass downloading of tiles. Detailed statistics on this issue are available at https://github.com/openstreetmap/operations/issues/1019. It is imperative for QGIS to address this matter to alleviate strain on OSM's limited resources, ensuring the continued availability of OSM layers as default options within QGIS.

Summary

This proposal aims to enhance QGIS's handling of tile fetching on the OpenStreetMap (OSM) servers to mitigate abusive tile fetching practices. The key objective is to prevent excessive strain on OSM servers by implementing measures that reduce tile fetching during ‘normal’ usage of the OSM tile server and discourage mass downloading of tiles through QGIS desktop.

Implementation Details

  1. Research and Analysis: Conduct thorough research into current tile fetching practices within QGIS and analyze the impact on OSM servers. Collaborate with the OSM Foundation to understand their concerns and gather insights for implementing effective mitigation strategies.
  2. Documentation and Communication: Provide clear documentation on the updated network cache size algorithms and algorithm changes to prohibit abusive tile fetching. Communicate these changes effectively to QGIS users through release notes and algorithm documentation updates.

Proposed Solution

  1. Optimized Default Network Cache Size: Develop improved logic to define the default network cache size in QGIS, taking into account the available space on users' systems. This optimization will help avoid small cache sizes when ample space is available, thereby reducing the frequency of tile requests to OSM servers.
  2. Processing Algorithm Changes: Modify relevant processing algorithms within QGIS to incorporate safeguards against mass downloading of tiles from the official OSM server (i.e. https://tile.openstreetmap.org/). These changes will include rate-limiting mechanisms and detection of large-scale downloading patterns to prevent abuse of OSM server resources.
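
As an illustration of the first point, here is a minimal sketch of sizing a default cache from free disk space. It is not the logic QGIS adopted; the function name `default_cache_size` and the floor, ceiling and fraction values are all assumptions chosen for the example.

```python
import shutil

MB = 1024 * 1024
GB = 1024 * MB


def default_cache_size(free_bytes, floor=50 * MB, ceiling=1 * GB, fraction=0.01):
    """Return a cache size in bytes: a fraction of free space, clamped to [floor, ceiling].

    All default values are hypothetical and only serve the illustration.
    """
    return int(min(ceiling, max(floor, free_bytes * fraction)))


def cache_size_for_path(path="."):
    """Apply the rule above to the free space of the volume holding `path`."""
    return default_cache_size(shutil.disk_usage(path).free)
```

With these assumed values, a volume with plenty of free space gets the full ceiling, while a nearly full disk still gets the small floor rather than no cache at all.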

Risks

Low

@haubourg
Member

+1 as discussed in PSC, we need to lower our impact on OSM servers.

@wonder-sk
Member

Maybe we could also add the {usage} term in the default OSM connection, so it is easier to track when someone is downloading tiles rather than using tiles for viewing - see qgis/QGIS#46731

Agreed we should increase the cache size - probably to at least 1 GB, given that more and more data sources are being streamed from remote servers...
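
The {usage} idea mentioned above can be pictured as one more placeholder in the XYZ URL template. The sketch below only illustrates that substitution: the host, the query parameter name and the "view"/"export" values are assumptions, not what QGIS or the OSM servers actually use.

```python
# Hypothetical template; the host and the "usage" query parameter are assumptions.
TEMPLATE = "https://tiles.example.org/{z}/{x}/{y}.png?usage={usage}"


def expand_tile_url(template, z, x, y, usage="view"):
    """Fill an XYZ URL template; {usage} is substituted only if the template contains it."""
    return (
        template.replace("{z}", str(z))
        .replace("{x}", str(x))
        .replace("{y}", str(y))
        .replace("{usage}", usage)
    )
```

A server operator could then separate interactive viewing from bulk export in its logs without relying on traffic patterns alone.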

@pathmapper

👍

Maybe qgis/QGIS#56197 is of interest: currently the cache can grow to double the size defined in the network settings.

@rouault
Contributor

rouault commented Mar 20, 2024

This proposal goes in the right direction, but while QGIS may help the OSM team to better identify the origin of requests, it seems to me that the ultimate solution is on the server side. Especially against abusive mass download of tiles (which seems the main issue OSM admins face). A sufficiently determined QGIS user may just remove any client-side rate-limiting we might add... Or they could just use a trivial shell script to mass download OSM tiles outside of QGIS.

I can imagine that when a tool starts downloading tiles, some unique identifier of the "session" could help the server. But isn't the IP address of the client already sufficient information for OSM servers to rate-limit an abuser?
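
For readers unfamiliar with the rate limiting discussed here, a token bucket is one common way to do it, whether keyed per client on a server or applied inside a client. The sketch below is a generic illustration under assumed parameters, not the mechanism used by QGIS or by the OSM servers; the class name and the injectable `clock` argument are choices made for the example.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, then at most `rate` requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the logic can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True and consume a token if a request may be sent now."""
        now = self.clock()
        # Refill according to the time elapsed since the previous call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

As the comment above points out, a limiter like this on the client side only restrains cooperative users, since anyone can remove it or bypass the client entirely.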

@nyalldawson
Contributor

One area I feel we can definitely improve is handling xyz tiles when the project is not in web mercator. In this scenario we fetch tiles at too high a zoom level, and end up requesting many more tiles than we need. I suspect this is one major contributor to our tile usage.

And in this scenario, the layer rendering will always be degraded anyway, so fetching lower res tiles shouldn't be a noticeable regression...
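
To make the cost of an overly high zoom level concrete, here is a small sketch based on the standard Web Mercator tile pyramid (256 px tiles, resolution halving at each zoom level). It is not how QGIS selects zoom levels; the function names and the `max_zoom=19` default are assumptions for the example.

```python
import math

# Ground resolution of a 256 px Web Mercator tile at zoom 0, in metres per pixel at the equator.
Z0_RESOLUTION = 2 * math.pi * 6378137 / 256  # about 156543.03


def zoom_for_resolution(metres_per_pixel, round_up=False, max_zoom=19):
    """Pick a zoom level for a target map resolution.

    round_up=False chooses the coarser neighbouring level (fewer tiles),
    round_up=True chooses the finer one (more tiles).
    """
    exact = math.log2(Z0_RESOLUTION / metres_per_pixel)
    z = math.ceil(exact) if round_up else math.floor(exact)
    return max(0, min(max_zoom, z))


def relative_tile_count(z_a, z_b):
    """How many times more tiles zoom z_a needs than zoom z_b to cover the same area."""
    return 4 ** (z_a - z_b)
```

For a canvas at 100 m per pixel, rounding down gives zoom 10 and rounding up gives zoom 11, and each extra level quadruples the number of tiles needed for the same extent.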

@nyalldawson
Contributor

But anyway, big +1 to this, and taking steps to improve the relationship with OSM. That's something we can't afford to harm!

@grischard

> This proposal goes in the right direction, but while QGIS may help the OSM team to better identify the origin of requests, it seems to me that the ultimate solution is on the server side.

Hi from OpenStreetMap! We do limit abusive downloads on the server side, including from people masquerading as QGIS. This is many legitimate users inadvertently hitting our servers a bit too hard, and we do not want to block all of QGIS.

@anitagraser
Member

Thank you for submitting your proposal to the 2024 QGIS Grant Programme. The 2 week discussion period starts today. At the end of the discussion, the proposal author has to provide a 3-line pitch of their proposal for the voter information material. (For an example from last year check qgis/PSC#58 (comment))

@rduivenvoorde
Contributor

rduivenvoorde commented Apr 29, 2024

Don't want to be a PITA, but in normal use of an OSM layer, we are still requesting all tiles of all 8 mapcanvas-extents around the current mapcanvas. I think (given the giant screens I see QGIS running on nowadays) that NOT requesting all of these (for OSM) would probably already help a lot. I have always found it a little opportunistic to do all these requests (given we do not run our own tile servers)...

I understand the user-experience will not be better, but I really want to keep OSM in QGIS.

Or else: what about giving the 'normal' requests (from QGIS/MapCanvas) a different User-Agent than the requests from processing/tile-downloader? Then at least OSM can determine the bottleneck better?

See: qgis/QGIS#41832
and qgis/QGIS#41953

Proposed/implemented it there earlier, but got veto'd/overthrown apparently :-)

I think client software should be more polite towards servers run by others.
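
The separate User-Agent suggestion above could look roughly like the following. This is only a sketch using Python's standard library: the User-Agent strings, the `source` labels and the example host are hypothetical and are not the values QGIS sends.

```python
import urllib.request

# Hypothetical User-Agent strings, one per kind of tile consumer.
USER_AGENTS = {
    "canvas": "ExampleGIS/1.0 (map canvas)",
    "processing": "ExampleGIS/1.0 (tile downloader)",
}


def build_tile_request(url, source):
    """Build (but do not send) a request whose User-Agent reveals where the fetch originated."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENTS[source]})
```

With distinct strings, a tile server operator can tell interactive browsing apart from bulk downloads in the access logs and apply different limits to each.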

@Marwe

Marwe commented May 1, 2024

Print output (e.g. for atlas with many pages) and rendering/storage for offline usage come to my mind. Such operations may trigger mass downloads, heavily depending on the resolution settings.
For WMS you usually get a notification, maybe there are places for hooks?
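
One possible hook for such operations is to estimate the number of tiles before starting and warn when it is large. The sketch below uses the standard slippy-map tile numbering formula; the function names and the `MAX_TILES` threshold are assumptions for illustration, not an existing QGIS check.

```python
import math

MAX_TILES = 250  # hypothetical threshold above which the user would be warned


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 longitude/latitude to XYZ tile indices at the given zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that the eastern and southern edges stay inside the tile grid.
    return min(n - 1, max(0, x)), min(n - 1, max(0, y))


def count_tiles(west, south, east, north, zoom):
    """Number of tiles needed to cover a bounding box at one zoom level."""
    x0, y0 = lonlat_to_tile(west, north, zoom)
    x1, y1 = lonlat_to_tile(east, south, zoom)
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def exceeds_limit(west, south, east, north, zoom, limit=MAX_TILES):
    """True if covering the box at this zoom would need more than `limit` tiles."""
    return count_tiles(west, south, east, north, zoom) > limit
```

Because the count roughly quadruples with each zoom level, a high print resolution over a large atlas extent can cross any reasonable threshold very quickly.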

@ianthetechie

I saw a comment about this on Mastodon and was encouraged to drop some thoughts over here 😀

@rduivenvoorde hit one of the things I'm pretty sure I've hit before. I haven't gotten to the level of digging into the detail he has, but I notice that it loads a lot more tiles than are necessary in many cases.

Another thing I think is going on is not cancelling queued or in-flight requests that aren't necessary anymore. This could actually be the same issue as the above.

Both of these would actually improve the overall responsiveness of QGIS (which is always slower than web-based maps, presumably due to some mix of poor queuing logic and over-fetching), in addition to being lighter on tile servers like OSM.
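
The idea of dropping requests that are no longer needed can be sketched with a simple pending queue that is pruned whenever the view changes. This is a hypothetical illustration, not QGIS's request handling; the `TileQueue` class and its method names are invented for the example, and aborting requests already in flight would additionally require support from the network layer, which is not shown.

```python
from collections import deque


class TileQueue:
    """Pending tile requests, identified by (z, x, y) tuples."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, tiles):
        """Add tiles to the queue, skipping ones that are already pending."""
        for tile in tiles:
            if tile not in self._pending:
                self._pending.append(tile)

    def retain_visible(self, visible):
        """Drop queued tiles that are no longer visible and return the dropped ones."""
        visible = set(visible)
        dropped = [t for t in self._pending if t not in visible]
        self._pending = deque(t for t in self._pending if t in visible)
        return dropped

    def next_tile(self):
        """Return the next tile to fetch, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None
```

Calling `retain_visible` after every pan or zoom means tiles for an abandoned view are never requested, which helps both the client's responsiveness and the server's load.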

@nirvn
Author

nirvn commented Oct 19, 2024

Implemented.

@nirvn nirvn closed this as completed Oct 19, 2024