Most COGs in the wild tend to have a ~100 km footprint (corresponding to a single satellite acquisition), so the total file size is usually something under 1 GB. But you can also generate huge COGs that cover a large spatial area or have very fine resolution. Here is a discussion about the considerations for that: https://twitter.com/howardbutler/status/1379053172497375232
a relevant point:
> Depends on how smart the client is. A 1 megapixel x 1 megapixel raster tiled as 512x512 has a 58 MB tile index ((1e6/512)^2 × 2 × 8 bytes). GDAL with recent libtiff will only read a few KB in it when extracting a given tile. Less smart clients will ingest that whole 58 MB at file opening.
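For concreteness, here is a minimal sketch of the arithmetic behind that 58 MB figure, assuming a BigTIFF-style index where each tile carries an 8-byte entry in both the TileOffsets and TileByteCounts arrays (that assumption is mine, not stated in the quote):

```python
import math

# 1 megapixel x 1 megapixel raster, tiled as 512x512
width = height = 1_000_000
tile = 512

tiles_across = math.ceil(width / tile)        # 1954
tiles_down = math.ceil(height / tile)         # 1954
n_tiles = tiles_across * tiles_down           # ~3.8 million tiles

# Assumed: 8-byte TileOffsets entry + 8-byte TileByteCounts entry per tile
index_bytes = n_tiles * (8 + 8)

print(f"{n_tiles:,} tiles -> {index_bytes / 2**20:.1f} MiB of tile index")
# 3,818,116 tiles -> 58.3 MiB of tile index
```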
There's also an interesting discussion about how S3 throttling works for single files versus separate files.
> There's also an interesting discussion about how S3 throttling works for single files versus separate files.
Minor clarification: S3 throttles per key prefix, which means it's less about single vs. separate files and more about the structure of the keys within the bucket. Consider landsat-pds, where each Landsat scene sits under a unique key prefix built from the WRS grid path/row and the scene ID.
In this example, the key prefix is L8/220/244/LC82202442014222LGN00. I'm a big fan of organizing S3 datasets this way because you can make the most of S3's per-prefix rate limits by distributing keys geographically across a grid (in this case WRS). As long as clients spread their requests across this grid, they can achieve very high aggregate throughput.
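A small sketch of what that key layout might look like in code; the helper names and the band filename pattern are illustrative assumptions for this example, not a documented landsat-pds contract:

```python
# Hypothetical sketch: per-scene key prefixes on a WRS-2 path/row grid,
# following the landsat-pds example above.
def scene_prefix(path: int, row: int, scene_id: str) -> str:
    """Key prefix for one Landsat scene, e.g. L8/220/244/LC82202442014222LGN00."""
    return f"L8/{path:03d}/{row:03d}/{scene_id}"

def band_key(path: int, row: int, scene_id: str, band: int) -> str:
    # Band filename pattern is an assumption for illustration.
    return f"{scene_prefix(path, row, scene_id)}/{scene_id}_B{band}.TIF"

# Requests that target different path/row prefixes count against different
# per-prefix rate limits, so spreading reads across the grid scales throughput.
print(band_key(220, 244, "LC82202442014222LGN00", 4))
# L8/220/244/LC82202442014222LGN00/LC82202442014222LGN00_B4.TIF
```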
Another interesting property of S3 is that it performs much better with fewer, larger files than with many smaller ones. There is a lot of overhead in accessing a new file for the first time: you have to establish a connection, negotiate TLS, and a handful of additional operations happen within the data center. All of this adds up to time-to-first-byte (TTFB), which is the most expensive part of performing remote reads against S3.
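As a rough illustration, here is a hypothetical snippet for measuring TTFB on a ranged GET with boto3; the bucket and key names are placeholders:

```python
# Sketch: time-to-first-byte for a ranged read against S3.
# Requires boto3 plus configured credentials/region; names below are placeholders.
import time
import boto3

s3 = boto3.client("s3")

def ttfb_ranged_read(bucket: str, key: str, length: int = 16_384) -> float:
    """Seconds until the first byte of a ranged GET arrives."""
    start = time.perf_counter()
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes=0-{length - 1}")
    resp["Body"].read(1)  # first byte received
    return time.perf_counter() - start

# One ranged read into a single large COG pays this setup cost once; opening
# many small COGs pays it once per object, which is where the time goes.
# print(ttfb_ranged_read("example-bucket", "cogs/big-mosaic.tif"))
```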
Of course, all of this changes with different filesystems, so when weighing lots of little COGs against fewer, larger COGs, it is critical to consider the properties of the filesystem being used and optimize around that.