Experimental, proof-of-concept.
This repository started as a simple set of demonstrations to prompt discussion about whether and how we should approach virtualizing GeoTIFFs and COGs.
First, some thoughts on why we should virtualize GeoTIFFs and/or COGs:
- Provide faster access to non-cloud-optimized GeoTIFFs that contain some form of internal tiling, without any data duplication (see notebook #1).
- Provide fully async I/O for both GeoTIFFs and COGs using Zarr-Python
- Allow loading a stack of GeoTIFFs/COGs into a data cube while minimizing the number of GET requests relative to using stackstac/xstac, thereby decreasing cost and increasing performance
- Provide users access to a lazily loaded DataTree containing both the data and the overviews, allowing scientists to use the overviews not only for tile-based visualization but also for quickly iterating on analytics
- Include ETags in the virtualized datasets to support reproducibility
- A motivation that's less clear to me, but maybe possible, is using the virtualization layer to access COGs with disparate CRSs as a single dataset (zarr-developers/geozarr-spec#53)
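
As a rough illustration of the "no data duplication" idea in the list above, the sketch below maps each internal tile of a TIFF to a byte range in the original file, using the values a reader would find in the TIFF `TileOffsets` and `TileByteCounts` tags. This is not the code used in this repository: `build_tile_manifest`, the manifest layout, the file path, and the numbers are all hypothetical, and it assumes a single-plane image whose tiles are stored in row-major order.

```python
from math import ceil


def build_tile_manifest(path, image_shape, tile_shape, tile_offsets, tile_byte_counts):
    """Map each internal TIFF tile to a byte range in the original file.

    Hypothetical helper for illustration only. Assumes a single-plane image
    with tiles listed left-to-right, top-to-bottom.
    """
    n_rows = ceil(image_shape[0] / tile_shape[0])
    n_cols = ceil(image_shape[1] / tile_shape[1])
    n_tiles = n_rows * n_cols
    if len(tile_offsets) != n_tiles or len(tile_byte_counts) != n_tiles:
        raise ValueError("expected one offset and one byte count per tile")

    manifest = {}
    for index, (offset, length) in enumerate(zip(tile_offsets, tile_byte_counts)):
        row, col = divmod(index, n_cols)  # position of the tile in the tile grid
        # Only a reference is stored; the pixel data stays in the TIFF.
        manifest[f"{row}.{col}"] = {"path": path, "offset": offset, "length": length}
    return manifest


manifest = build_tile_manifest(
    path="s3://example-bucket/scene.tif",  # hypothetical location
    image_shape=(512, 768),                # rows, columns
    tile_shape=(256, 256),
    tile_offsets=[1000, 5000, 9000, 13000, 17000, 21000],      # made-up values
    tile_byte_counts=[4000, 4000, 4000, 4000, 4000, 4000],     # made-up values
)
print(manifest["1.2"])
```

Each manifest entry is enough to issue one ranged read for one chunk, which is why a virtual dataset can point at an existing GeoTIFF instead of rewriting it.
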

To set up the development environment and run the tests:
- Clone the repository:
  git clone https://github.com/maxrjones/virtual-tiff.git
- Pull baseline image data from the dvc remote:
  pixi run -e test download-test-images
  WARNING: This will download ~1.4GB of TIFFs for testing to your machine.
- Run the test suite:
  pixi run -e test run-tests
  WARNING: Some tests will fail because the implementation is incomplete.
- Start a shell in the development environment, if needed:
  pixi run -e test zsh

virtual-tiff is distributed under the terms of the MIT license.