Creating a virtual zarr datacube from COG s3 objects #125
seems to be calling builtin
To work around the immediate issue:
with TiffFile(fs.open(s3_flist[0])) as tif:
    shape = tif.series[0].shape
    dtype = tif.series[0].dtype
...
with image_sequence.aszarr(chunkshape=shape, dtype=dtype) as store:
    store.write_fsspec(
        'issue125.json',
        'https://mghp.osn.xsede.org/rsignellbucket1/lcmap/cog',
        codec_id='tifffile',
    )
Except for multi-file OME-TIFF series, tifffile does not currently support multiscales or chunks within file series. I am waiting for zarr to standardize "sharding" before implementing this. Try to export the reference files for each TIFF file as follows and then edit and merge those files. It should be relatively straightforward for this case of a linear file sequence without higher dimensions or tiling:
import json
import fsspec
import zarr
import tifffile
fs = fsspec.filesystem(
's3',
anon=True,
client_kwargs={'endpoint_url': 'https://mghp.osn.xsede.org'},
)
s3_flist = [f's3://{f}' for f in fs.ls('/rsignellbucket1/lcmap/cog')]
for filename in s3_flist:
url, name = filename.rsplit('/', 1)
with tifffile.TiffFile(fs.open(filename), name=name) as tif:
with tif.series[0].aszarr() as store:
store.write_fsspec(name + '.json', url=url)
aggregate = {}
for i, filename in enumerate(sorted(s3_flist)):
url, name = filename.rsplit('/', 1)
with open(name + '.json') as fh:
data = fh.read()
for key, value in json.loads(data).items():
if key in aggregate:
continue
if key.endswith('.zarray'):
value = json.loads(value)
value['chunks'].insert(0, 1)
value['shape'].insert(0, len(s3_flist))
value = json.dumps(value)
elif key[-1].isdigit():
key = key.replace('/', f'/{i}.')
aggregate[key] = value
with open('aggregate.json', 'w') as fh:
fh.write(json.dumps(aggregate, indent=1))
mapper = fsspec.get_mapper(
'reference://',
fo='aggregate.json',
target_protocol='file',
remote_protocol='s3',
)
za = zarr.open(mapper, mode='r')
print(za[0])
Alternatively, it might be easier to create the reference file directly from the metadata in the TIFF files without going through a zarr store. This prints all the necessary metadata:
for filename in sorted(s3_flist):
    with TiffFile(fs.open(filename)) as tif:
        for level in tif.series[0].levels:
            print(level.dtype)
            print(level.shape)
            print(level.keyframe.chunks)
            print(level.keyframe.compression)
            for page in level.pages:
                for offset, bytecount in zip(
                    page.dataoffsets, page.databytecounts
                ):
                    print(offset, bytecount)
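For illustration only (not part of the original comment), here is a minimal sketch of how those printed offsets and byte counts could be turned into a kerchunk-style reference file by hand, for one non-pyramidal, zlib/deflate-compressed file. It reuses filename and level from the loop above and writes the same flat version-0 key layout used elsewhere in this thread:
import json
import math

refs = {}
chunks = list(level.keyframe.chunks)
refs['.zarray'] = json.dumps({
    'chunks': chunks,
    'compressor': {'id': 'zlib'},  # assumes deflate-compressed tiles, as in these COGs
    'dtype': level.dtype.str,
    'fill_value': None,
    'filters': None,
    'order': 'C',
    'shape': list(level.shape),
    'zarr_format': 2,
})
refs['.zattrs'] = json.dumps({'_ARRAY_DIMENSIONS': ['Y', 'X']})
ntiles_x = math.ceil(level.shape[-1] / chunks[-1])  # tiles per row
for page in level.pages:
    for j, (offset, bytecount) in enumerate(
        zip(page.dataoffsets, page.databytecounts)
    ):
        # each TIFF tile becomes one zarr chunk reference: [url, offset, length]
        refs[f'{j // ntiles_x}.{j % ntiles_x}'] = [filename, offset, bytecount]
with open('manual.json', 'w') as fh:
    json.dump(refs, fh, indent=1)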
I've had a go, and kerchunk's combine almost deals correctly with this case. It fails because there are no _ARRAY_DIMENSIONS, since the TIFFs don't conform to netCDF. If the output of TiffFile is meant for xarray (as opposed to just zarr), it would be reasonable to add them; but it is also reasonable that kerchunk should be able to cope with their absence. Also, as a note: there are very many small arrays in the data, so kerchunk wants to inline them, and it does this serially, which is very slow from where I am. @cgohlke, I'd love to hear your thoughts on zarr sharding versus the kinds of things that kerchunk can do, e.g., fsspec/kerchunk#134
I tried to add
Xarray can read these data, if we assign independent X/Y coordinates to each downsampling level. Here is a hand-modified version of one of the COGs:
where the X/Ys are simply index arrays, but could be values chosen to co-align the levels. Of course, one might ask whether it is useful to do this. It is true that xarray does not handle zarr arrays, but of course kerchunk can produce arbitrary directory layouts, so making it into a group for xarray's consumption is easy. Further, talking with zarr/OME people, it also makes sense to add an extra dimension to each of the levels, so they become 3D, even if the extra dimension is the same in each level. One would need to describe that the downsampling from one level to the next was not the same across dimensions (see the
Indeed, including the level in the array dimension names is the workaround I just implemented in tifffile. Also, specifying a groupname for a zarr array in
That extra dimension would only be needed if there are not already dimensions that are not downsampled, like samples/channels or time?
Indeed, there might be even more dimensions, like bands/channels. If the dimension being concatenated along also has multiple points per input file (e.g., each input array has 2 timepoints and so a shape like (2, X, Y)), then you get into exactly what kerchunk.combine is for; but we need a way to know what the times for each file are and where they figure in an array's dimensions, in order to know how to combine them. Of course, you can always do this kind of thing by processing the JSON directly, and time is just one example of a dimension to concatenate along.
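As a rough sketch of that combine step (an addition here, not from the original comment; it assumes per-file reference JSONs already exist and uses kerchunk's MultiZarrToZarr as I understand its API, so treat the argument names as approximate):
import json
from kerchunk.combine import MultiZarrToZarr

# hypothetical inputs: one reference JSON per TIFF, and one time value per
# file, e.g. parsed from the filenames
single_refs = ['a.tif.json', 'b.tif.json']
times = ['2020-12-05', '2020-12-06']

mzz = MultiZarrToZarr(
    single_refs,
    remote_protocol='s3',
    remote_options={'anon': True},
    concat_dims=['time'],      # new dimension to concatenate along
    coo_map={'time': times},   # explicit coordinate value for each input
)
combined = mzz.translate()     # dict of references
with open('combined.json', 'w') as fh:
    json.dump(combined, fh)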
@martindurant : I just noticed that the earthbigdata.py script no longer works with fsspec 2022.3.0 or main branch. The JSON file gets mapped and the xarray dataset created but the array data and plots are empty. fsspec 2022.02.0 works with the same JSON file. tifffile/examples/earthbigdata.py Lines 463 to 507 in 3428213
Because the reference for tifffile/examples/earthbigdata.py Line 467 in 3428213
Thank you! It works.
The file issue125.json was created by this script:
# tifffile/examples/issues125.py
"""Create a fsspec ReferenceFileSystem for a sequence of TIFF files on S3
This Python script uses the tifffile and fsspec libraries to create a
multiscale ReferenceFileSystem JSON file for a sequence of cloud optimized
GeoTIFF (COG) files stored on S3. The tiles of the COG files are used as
chunks. No additional numcodecs codec needs to be registered since the COG
files use zlib compression. An xarray dataset is created from the
ReferenceFileSystem file and a subset of the dataset is plotted.
See https://github.com/cgohlke/tifffile/issues/125
"""
import tifffile # >= 2022.4.8
import fsspec
import xarray
from matplotlib import pyplot
# get a list of cloud optimized GeoTIFF files stored on S3
remote_options = {
'anon': True,
'client_kwargs': {'endpoint_url': 'https://mghp.osn.xsede.org'},
}
fs = fsspec.filesystem('s3', **remote_options)
files = [f's3://{f}' for f in fs.ls('/rsignellbucket1/lcmap/cog')]
# write the ReferenceFileSystem of each file to a JSON file
with open('issue125.json', 'w', newline='\n') as jsonfile:
for i, filename in enumerate(tifffile.natural_sorted(files)):
url, name = filename.rsplit('/', 1)
with fs.open(filename) as fh:
with tifffile.TiffFile(fh, name=name) as tif:
with tif.series[0].aszarr() as store:
store.write_fsspec(
jsonfile,
url=url,
# using an experimental API:
_shape=[len(files)], # shape of file sequence
_axes='T', # axes of file sequence
_index=[i], # index of this file in sequence
_append=i != 0, # if True, only write index keys+value
_close=i == len(files) - 1, # if True, no more appends
# groupname='0', # required for non-pyramidal series
)
# create a fsspec mapper instance from the ReferenceFileSystem file
mapper = fsspec.get_mapper(
'reference://',
fo='issue125.json',
target_protocol='file',
remote_protocol='s3',
remote_options=remote_options,
)
# create a xarray dataset from the mapper
dataset = xarray.open_zarr(mapper, consolidated=False)
print(dataset)
# plot a slice of the 5th pyramidal level of the dataset
xarray.plot.imshow(dataset['5'][0, 32:-32, 32:-32])
pyplot.show()
@cgohlke I tried it and it worked perfectly -- it's awesome to be able to access the different overviews in xarray! What do you think would be the best way to generate the x and y coordinates for xarray? Xarray needs the full 1D x and y arrays, even if they are uniformly spaced (as they are in TIFF). These arrays would compress extremely well, however, so they shouldn't take up too much space in the JSON, right?
It should be possible to add the coordinate arrays to the JSON, similar to the earthbigdata.py script: tifffile/examples/earthbigdata.py Lines 197 to 202 in 3f88ca5
The coordinates could be hiding in the GeoTIFF metadata
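For example, a sketch (an editorial addition, assuming the files carry the standard ModelPixelScaleTag and ModelTiepointTag georeferencing tags, using tifffile's tag names and the fs and files objects from the script above) of deriving the 1D coordinate arrays:
import numpy
import tifffile

with tifffile.TiffFile(fs.open(files[0])) as tif:
    page = tif.pages[0]
    scale_x, scale_y, _ = page.tags['ModelPixelScaleTag'].value
    # ModelTiepointTag ties raster point (i, j, k) to model point (x, y, z)
    i, j, k, x0, y0, z0 = page.tags['ModelTiepointTag'].value[:6]
    height, width = page.shape[:2]

xcoords = x0 + (numpy.arange(width) - i) * scale_x   # add scale_x / 2 for pixel centers
ycoords = y0 - (numpy.arange(height) - j) * scale_y  # y decreases with row index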
I just noticed that the files are color indexed ( |
I don't think so, but I'll check around. This is land cover classification with only 8 values:
Shouldn't these tags appear as .zattrs in the output references? I don't know if they belong to the variables or the top level.
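One way to do that by hand (a sketch only, assuming the files carry a GDAL_METADATA tag; whether it belongs on the variable or the group is the open question noted above) is to read the tag with tifffile and merge it into the .zattrs entry of the reference JSON:
import json
import tifffile

with tifffile.TiffFile(fs.open(files[0])) as tif:
    tag = tif.pages[0].tags.get('GDAL_METADATA')
    gdal_xml = tag.value if tag is not None else None

with open('issue125.json') as fh:
    refs = json.load(fh)
attrs = json.loads(refs.get('.zattrs', '{}'))
if gdal_xml is not None:
    attrs['GDAL_METADATA'] = gdal_xml  # raw XML string from the tag
refs['.zattrs'] = json.dumps(attrs)
with open('issue125.json', 'w') as fh:
    json.dump(refs, fh, indent=1)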
I just tried the above:-
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_37452/991707357.py in <module>
14 with tifffile.TiffFile(fh, name=name) as tif:
15 with tif.series[0].aszarr() as store:
---> 16 store.write_fsspec(
17 jsonfile,
18 url=url,
TypeError: write_fsspec() got an unexpected keyword argument '_shape'
Am I behind?
Line 16 in 375d97f
Thanks!
Having a look at your examples now.
This is my current use case: https://discourse.pangeo.io/t/model-geotiff-postprocessing-at-scale-what-would-you-do/2461/3
Borrowing the example, it is definitely doing some things, e.g. it starts, gets a list of chunks, then I think stops one last file - am I missing something to do with adding a delimiter at the end?
{
 ".zattrs": "{\n \"_ARRAY_DIMENSIONS\": [\n \"T\",\n \"Y\",\n \"X\"\n ]\n}",
 ".zarray": "{\n \"chunks\": [\n 1,\n 1,\n 5087\n ],\n \"compressor\": {\n \"id\": \"imagecodecs_packbits\"\n },\n \"dtype\": \"<f4\",\n \"fill_value\": -1000.0,\n \"filters\": null,\n \"order\": \"C\",\n \"shape\": [\n 117,\n 4710,\n 5087\n ],\n \"zarr_format\": 2\n}",
So that looks OK. I am just throwing it a list of single-band floating-point standard GeoTIFFs, not COGs.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_149996/2784632950.py in <module>
19
20 # create a fsspec mapper instance from the ReferenceFileSystem file
---> 21 mapper = fsspec.get_mapper(
22 'reference://',
23 fo='test.json',
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\fsspec\mapping.py in get_mapper(url, check, create, missing_exceptions, alternate_root, **kwargs)
226 """
227 # Removing protocol here - could defer to each open() on the backend
--> 228 fs, urlpath = url_to_fs(url, **kwargs)
229 root = alternate_root if alternate_root is not None else urlpath
230 return FSMap(root, fs, check, create, missing_exceptions=missing_exceptions)
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\fsspec\core.py in url_to_fs(url, **kwargs)
405 options = cls._get_kwargs_from_urls(url)
406 update_storage_options(options, kwargs)
--> 407 fs = cls(**options)
408 urlpath = fs._strip_protocol(url)
409 return fs, urlpath
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\fsspec\spec.py in __call__(cls, *args, **kwargs)
66 return cls._cache[token]
67 else:
---> 68 obj = super().__call__(*args, **kwargs)
69 # Setting _fs_token here causes some static linters to complain.
70 obj._fs_token_ = token
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\fsspec\implementations\reference.py in __init__(self, fo, target, ref_storage_args, target_protocol, target_options, remote_protocol, remote_options, fs, template_overrides, simple_templates, loop, ref_type, **kwargs)
149 self._process_dataframe()
150 else:
--> 151 self._process_references(text, template_overrides)
152 if fs is not None:
153 self.fs = fs
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\fsspec\implementations\reference.py in _process_references(self, references, template_overrides)
284 def _process_references(self, references, template_overrides=None):
285 if isinstance(references, (str, bytes)):
--> 286 references = json.loads(references)
287 vers = references.get("version", None)
288 if vers is None:
ValueError: Unexpected character in found when decoding object value
# write the ReferenceFileSystem of each file to a JSON file
with open('test.json', 'w', newline='\n') as jsonfile:
for i, filename in enumerate(tifffile.natural_sorted(filetest[0:2])):
url, name = filename.rsplit('/', 1)
with fs.open(filename) as fh:
with tifffile.TiffFile(fh, name=name) as tif:
with tif.series[0].aszarr() as store:
store.write_fsspec(
jsonfile,
url=url,
# using an experimental API:
_shape=[len(files)], # shape of file sequence
_axes='T', # axes of file sequence
_index=[i], # index of this file in sequence
_append=i != 0, # if True, only write index keys+value
_close=i == len(files) - 1, # if True, no more appends
# groupname='0', # required for non-pyramidal series
)
# create a fsspec mapper instance from the ReferenceFileSystem file
mapper = fsspec.get_mapper(
'reference://',
fo='test.json',
target_protocol='file',
remote_protocol='s3',
remote_options=r_opts,
)
e.g. there was no } at the end - how should the file close?
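One thing that may be worth checking (an editorial guess based on the snippet above, not a confirmed diagnosis): the _append/_close flags need to be computed from the same list that is actually iterated, otherwise _close never becomes True and the closing brace is never written. A sketch, with the hypothetical name subset for the two-file test list:
subset = tifffile.natural_sorted(filetest[0:2])
with open('test.json', 'w', newline='\n') as jsonfile:
    for i, filename in enumerate(subset):
        url, name = filename.rsplit('/', 1)
        with fs.open(filename) as fh:
            with tifffile.TiffFile(fh, name=name) as tif:
                with tif.series[0].aszarr() as store:
                    store.write_fsspec(
                        jsonfile,
                        url=url,
                        _shape=[len(subset)],
                        _axes='T',
                        _index=[i],
                        _append=i != 0,
                        _close=i == len(subset) - 1,  # True only on the last file
                    )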
If I add the groupname in, then it wants a group - still no file closing, it seems.
---------------------------------------------------------------------------
ContainsArrayError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_149996/644436119.py in <module>
----> 1 dataset = xarray.open_zarr(mapper, consolidated=False)
2 print(dataset)
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\zarr.py in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, chunk_store, storage_options, decode_timedelta, use_cftime, **kwargs)
766 }
767
--> 768 ds = open_dataset(
769 filename_or_obj=store,
770 group=group,
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
493
494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
496 filename_or_obj,
497 drop_variables=drop_variables,
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, lock)
822
823 filename_or_obj = _normalize_path(filename_or_obj)
--> 824 store = ZarrStore.open_group(
825 filename_or_obj,
826 group=group,
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\zarr.py in open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel)
384 zarr_group = zarr.open_consolidated(store, **open_kwargs)
385 else:
--> 386 zarr_group = zarr.open_group(store, **open_kwargs)
387 return cls(
388 zarr_group,
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\zarr\hierarchy.py in open_group(store, mode, cache_attrs, synchronizer, path, chunk_store, storage_options)
1164 if mode in ['r', 'r+']:
1165 if contains_array(store, path=path):
-> 1166 raise ContainsArrayError(path)
1167 elif not contains_group(store, path=path):
1168 raise GroupNotFoundError(path)
ContainsArrayError: path '' contains an array
With the '0' default group, it then looks like this:-
{
".zgroup": "{\n \"zarr_format\": 2\n}",
"0/.zattrs": "{\n \"_ARRAY_DIMENSIONS\": [\n \"T\",\n \"Y\",\n \"X\"\n ]\n}",
"0/.zarray": "{\n \"chunks\": [\n 1,\n 1,\n 5087\n ],\n \"compressor\": {\n \"id\": \"imagecodecs_packbits\"\n },\n \"dtype\": \"<f4\",\n \"fill_value\": -1000.0,\n \"filters\": null,\n \"order\": \"C\",\n \"shape\": [\n 117,\n 4710,\n 5087\n ],\n \"zarr_format\": 2\n}",
Same error if I actually close with a brace, too. Just trying with two files to start.
Unless this is actually an xarray problem? This environment has xarray 0.20.1.
Something else it doesn't like in the directory - other JSON files, etc.?
Moving the notebook just to see, I get:-
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_26516/644436119.py in <module>
----> 1 dataset = xarray.open_zarr(mapper, consolidated=False)
2 print(dataset)
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\zarr.py in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, chunk_store, storage_options, decode_timedelta, use_cftime, **kwargs)
766 }
767
--> 768 ds = open_dataset(
769 filename_or_obj=store,
770 group=group,
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
493
494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
496 filename_or_obj,
497 drop_variables=drop_variables,
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, lock)
836 store_entrypoint = StoreBackendEntrypoint()
837 with close_on_error(store):
--> 838 ds = store_entrypoint.open_dataset(
839 store,
840 mask_and_scale=mask_and_scale,
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\store.py in open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
22 decode_timedelta=None,
23 ):
---> 24 vars, attrs = store.load()
25 encoding = store.get_encoding()
26
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\common.py in load(self)
121 """
122 variables = FrozenDict(
--> 123 (_decode_variable_name(k), v) for k, v in self.get_variables().items()
124 )
125 attributes = FrozenDict(self.get_attrs())
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\zarr.py in get_variables(self)
436
437 def get_variables(self):
--> 438 return FrozenDict(
439 (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
440 )
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\core\utils.py in FrozenDict(*args, **kwargs)
474
475 def FrozenDict(*args, **kwargs) -> Frozen:
--> 476 return Frozen(dict(*args, **kwargs))
477
478
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\xarray\backends\zarr.py in <genexpr>(.0)
436
437 def get_variables(self):
--> 438 return FrozenDict(
439 (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
440 )
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\zarr\hierarchy.py in _array_iter(self, keys_only, method, recurse)
483 path = self._key_prefix + key
484 if contains_array(self._store, path):
--> 485 yield key if keys_only else (key, self[key])
486 elif recurse and contains_group(self._store, path):
487 group = self[key]
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\zarr\hierarchy.py in __getitem__(self, item)
339 path = self._item_path(item)
340 if contains_array(self._store, path):
--> 341 return Array(self._store, read_only=self._read_only, path=path,
342 chunk_store=self._chunk_store,
343 synchronizer=self._synchronizer, cache_attrs=self.attrs.cache)
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\zarr\core.py in __init__(self, store, path, read_only, chunk_store, synchronizer, cache_metadata, cache_attrs, partial_decompress)
157
158 # initialize metadata
--> 159 self._load_metadata()
160
161 # initialize attributes
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\zarr\core.py in _load_metadata(self)
174 """(Re)load metadata from store."""
175 if self._synchronizer is None:
--> 176 self._load_metadata_nosync()
177 else:
178 mkey = self._key_prefix + array_meta_key
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\zarr\core.py in _load_metadata_nosync(self)
203 self._compressor = None
204 else:
--> 205 self._compressor = get_codec(config)
206
207 # setup filters
~\AppData\Local\Continuum\anaconda3\envs\stuartshelf\lib\site-packages\numcodecs\registry.py in get_codec(config)
31 cls = codec_registry.get(codec_id, None)
32 if cls is None:
---> 33 raise ValueError('codec not available: %r' % codec_id)
34 return cls.from_config(config)
35
ValueError: codec not available: 'imagecodecs_packbits'
fs = fsspec.filesystem('s3', **r_opts)
although the environment tells me it is there - maybe the environment has issues?
Same thing, different environment, so presumably some sort of install issue.
Just in case, but no luck:
(base) C:\>conda activate kerchunktest
(kerchunktest) C:\>conda install -c conda-forge --force-reinstall imagecodecs
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: C:\Users\rscott\AppData\Local\Continuum\anaconda3\envs\kerchunktest
added / updated specs:
- imagecodecs
The following packages will be downloaded:
package | build
---------------------------|-----------------
blosc-1.21.1 | h74325e0_2 49 KB conda-forge
c-blosc2-2.1.1 | h09319c2_0 212 KB conda-forge
imagecodecs-2022.2.22 | py38h6acac19_4 2.1 MB conda-forge
libavif-0.10.1 | h8ffe710_0 2.3 MB conda-forge
------------------------------------------------------------
Total: 4.6 MB
The following NEW packages will be INSTALLED:
aom conda-forge/win-64::aom-3.3.0-h0e60522_1
blosc conda-forge/win-64::blosc-1.21.1-h74325e0_2
c-blosc2 conda-forge/win-64::c-blosc2-2.1.1-h09319c2_0
cfitsio conda-forge/win-64::cfitsio-4.1.0-h5a969a9_0
charls conda-forge/win-64::charls-2.3.4-h39d44d4_0
giflib conda-forge/win-64::giflib-5.2.1-h8d14728_2
imagecodecs conda-forge/win-64::imagecodecs-2022.2.22-py38h6acac19_4
jbig conda-forge/win-64::jbig-2.1-h8d14728_2003
jxrlib conda-forge/win-64::jxrlib-1.1-h8ffe710_2
lcms2 conda-forge/win-64::lcms2-2.12-h2a16943_0
lerc conda-forge/win-64::lerc-3.0-h0e60522_0
libaec conda-forge/win-64::libaec-1.0.6-h39d44d4_0
libavif conda-forge/win-64::libavif-0.10.1-h8ffe710_0
libbrotlicommon conda-forge/win-64::libbrotlicommon-1.0.9-h8ffe710_7
libbrotlidec conda-forge/win-64::libbrotlidec-1.0.9-h8ffe710_7
libbrotlienc conda-forge/win-64::libbrotlienc-1.0.9-h8ffe710_7
libdeflate conda-forge/win-64::libdeflate-1.10-h8ffe710_0
libtiff conda-forge/win-64::libtiff-4.3.0-hc4061b1_3
libwebp-base conda-forge/win-64::libwebp-base-1.2.2-h8ffe710_1
libzopfli conda-forge/win-64::libzopfli-1.0.3-h0e60522_0
openjpeg conda-forge/win-64::openjpeg-2.4.0-hb211442_1
snappy conda-forge/win-64::snappy-1.1.9-h82413e6_0
zfp conda-forge/win-64::zfp-0.5.5-h0e60522_8
Proceed ([y]/n)? y
Downloading and Extracting Packages
c-blosc2-2.1.1 | 212 KB | ############################################################################ | 100%
imagecodecs-2022.2.2 | 2.1 MB | ############################################################################ | 100%
libavif-0.10.1 | 2.3 MB | ############################################################################ | 100%
blosc-1.21.1 | 49 KB | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Unless you have also updated numcodecs, you need to do
to make sure zarr knows about it.
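Presumably something along these lines (the exact snippet did not survive extraction; this is a sketch using imagecodecs' codec registration helper):
import imagecodecs.numcodecs

# register imagecodecs' numcodecs-compatible codecs (imagecodecs_tiff,
# imagecodecs_packbits, ...) with the numcodecs registry that zarr uses
imagecodecs.numcodecs.register_codecs()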
Thanks Martin! I'll see how much further I can get tomorrow.
Set
That's a very inefficient chunk. Probably better to treat individual files as chunks: Lines 655 to 684 in 375d97f
In that case, using the TIFF decoder from the imagecodecs package is more efficient. See the earthbigdata.py example script.
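A minimal sketch of that whole-file-as-chunk approach (an editorial addition assembled from the aszarr/write_fsspec calls used elsewhere in this thread, so the exact arguments may need adjusting; the file list and base URL are hypothetical):
import tifffile

# hypothetical list of single-band GeoTIFFs, all with the same shape
local_files = ['results-20201205.tif', 'results-20201206.tif']

fileseq = tifffile.TiffSequence(local_files)
with fileseq.aszarr(
    dtype='float32',
    chunkshape=(4710, 5087),  # shape of one file = one chunk
    fillvalue=-1000,
) as store:
    with open('refs.json', 'w', newline='\n') as jsonfile:
        store.write_fsspec(
            jsonfile,
            'https://example.com/data/',  # hypothetical base URL for the files
            codec_id='imagecodecs_tiff',  # decode whole TIFF files as chunks
        )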
Thanks for that.
OK, got this far:
chunkshape = (testraster.shape[1], testraster.shape[2])
print("chunkshape:", chunkshape)
fillvalue = -1000
latitude_label = 'latitude'
latitude_coordinates = [y for y in testraster.y.values]
longitude_label = 'longitude'
longitude_coordinates = [x for x in testraster.x.values]
coordinates = {}
zarrgroup = zarr.open_group(coordinates)
zarrgroup.array(
longitude_label, data=longitude_coordinates, dtype='float32'
).attrs['_ARRAY_DIMENSIONS'] = [longitude_label]
zarrgroup.array(
latitude_label, data=latitude_coordinates, dtype='float32'
).attrs['_ARRAY_DIMENSIONS'] = [latitude_label]
coordinates_json = tifffile.ZarrStore._json(str(coordinates)).decode()
jsonfile.write(coordinates_json[:-2]) # skip the last newline and brace
#mode = 'test' not needed here
fileseq = tifffile.TiffSequence(
[file for file in pMine_files],
)
store = fileseq.aszarr(
dtype='float32',
chunkshape=chunkshape,
fillvalue=fillvalue,
axestiled={0: 0},
zattrs={
'_ARRAY_DIMENSIONS': [
latitude_label,
longitude_label,
]
},
)
print(store)
store.write_fsspec(
jsonfile,
baseurl,
groupname=mode,
codec_id='imagecodecs_tiff',
_append=True,
_close=mode == 'lsmap', # close after last store
)
jsonfile.write('\n}')
jsonfile.close()
chunkshape: (4710, 5087)
ZarrFileSequenceStore
shape: 9420, 5087
chunks: 4710, 5087
dtype: float32
fillvalue: -1000
Which seems good; there's a JSON formatting error now, which I bypassed with a string conversion in the interim to check the above works. However, returning
coordinates_json = tifffile.ZarrStore._json(coordinates).decode()
gives a serialisation error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [70], in <cell line: 26>()
17 zarrgroup.array(
18 longitude_label, data=longitude_coordinates, dtype='float32'
19 ).attrs['_ARRAY_DIMENSIONS'] = [longitude_label]
21 zarrgroup.array(
22 latitude_label, data=latitude_coordinates, dtype='float32'
23 ).attrs['_ARRAY_DIMENSIONS'] = [latitude_label]
---> 26 coordinates_json = tifffile.ZarrStore._json(coordinates).decode()
28 jsonfile.write(coordinates_json[:-2]) # skip the last newline and brace
30 #mode = 'test' not needed here
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\tifffile\tifffile.py:10256, in ZarrStore._json(obj)
10253 @staticmethod
10254 def _json(obj: Any, /) -> bytes:
10255 """Serialize obj to a JSON formatted string."""
> 10256 return json.dumps(
10257 obj,
10258 indent=1,
10259 sort_keys=True,
10260 ensure_ascii=True,
10261 separators=(',', ': '),
10262 ).encode('ascii')
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\json\__init__.py:234, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
232 if cls is None:
233 cls = JSONEncoder
--> 234 return cls(
235 skipkeys=skipkeys, ensure_ascii=ensure_ascii,
236 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
237 separators=separators, default=default, sort_keys=sort_keys,
238 **kw).encode(obj)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\json\encoder.py:201, in JSONEncoder.encode(self, o)
199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
--> 201 chunks = list(chunks)
202 return ''.join(chunks)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\json\encoder.py:431, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
429 yield from _iterencode_list(o, _current_indent_level)
430 elif isinstance(o, dict):
--> 431 yield from _iterencode_dict(o, _current_indent_level)
432 else:
433 if markers is not None:
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\json\encoder.py:405, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
403 else:
404 chunks = _iterencode(value, _current_indent_level)
--> 405 yield from chunks
406 if newline_indent is not None:
407 _current_indent_level -= 1
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\json\encoder.py:438, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
436 raise ValueError("Circular reference detected")
437 markers[markerid] = o
--> 438 o = _default(o)
439 yield from _iterencode(o, _current_indent_level)
440 if markers is not None:
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\json\encoder.py:179, in JSONEncoder.default(self, o)
160 def default(self, o):
161 """Implement this method in a subclass such that it returns
162 a serializable object for ``o``, or calls the base implementation
163 (to raise a ``TypeError``).
(...)
177
178 """
--> 179 raise TypeError(f'Object of type {o.__class__.__name__} '
180 f'is not JSON serializable')
TypeError: Object of type bytes is not JSON serializable
tifffile/examples/earthbigdata.py Lines 242 to 249 in 375d97f
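The missing step is presumably converting the in-memory store's bytes values into something JSON can hold before writing. A sketch (an editorial addition, not the exact code from the example script) that keeps metadata as text and inlines chunk data with the reference spec's base64: prefix:
import base64
import json

def refs_from_store(store):
    """Convert an in-memory zarr store (str -> bytes) into JSON-safe references."""
    refs = {}
    for key, value in store.items():
        if key.rsplit('/', 1)[-1] in ('.zarray', '.zattrs', '.zgroup'):
            refs[key] = value.decode()  # metadata values are JSON text
        else:
            # chunk data: inline as a base64-prefixed string
            refs[key] = 'base64:' + base64.b64encode(value).decode('ascii')
    return refs

coordinates_json = json.dumps(refs_from_store(coordinates), indent=1)
jsonfile.write(coordinates_json[:-2])  # as before, skip the last newline and brace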
Ah, thanks - missed that bit then, probably assuming it was OK.
I get this error whether I include the band coordinate or not:
".zgroup": "{\n \"zarr_format\": 2\n}",
"band/.zarray": "{\n \"chunks\": [\n 1\n ],\n \"compressor\": {\n \"blocksize\": 0,\n \"clevel\": 5,\n \"cname\": \"lz4\",\n \"id\": \"blosc\",\n \"shuffle\": 1\n },\n \"dtype\": \"<i4\",\n \"fill_value\": 0,\n \"filters\": null,\n \"order\": \"C\",\n \"shape\": [\n 1\n ],\n \"zarr_format\": 2\n}",
"band/.zattrs": "{\n \"_ARRAY_DIMENSIONS\": [\n \"band\"\n ]\n}",
"band/0": "\u0002\u00013\u0004\u0004\u0000\u0000\u0000\u0004\u0000\u0000\u0000\u0014\u0000\u0000\u0000\u0001\u0000\u0000\u0000",
"latitude/.zarray": "{\n \"chunks\": [\n 4710\n ],\n \"compressor\": {\n \"blocksize\": 0,\n \"clevel\": 5,\n \"cname\": \"lz4\",\n \"id\": \"blosc\",\n \"shuffle\": 1\n },\n \"dtype\": \"<f4\",\n \"fill_value\": 0.0,\n \"filters\": null,\n \"order\": \"C\",\n \"shape\": [\n 4710\n ],\n \"zarr_format\": 2\n}",
"latitude/.zattrs": "{\n \"_ARRAY_DIMENSIONS\": [\n \"latitude\"\n ]\n}",
"latitude/0":
...
"test/.zattrs": "{\n \"_ARRAY_DIMENSIONS\": [\n \"band\",\n \"latitude\",\n \"longitude\"\n ]\n}",
"test/.zarray": "{\n \"chunks\": [\n 1,\n 1,\n 4710,\n 5087\n ],\n \"compressor\": {\n \"id\": \"imagecodecs_tiff\"\n },\n \"dtype\": \"<f4\",\n \"fill_value\": -1000.0,\n \"filters\": null,\n \"order\": \"C\",\n \"shape\": [\n 2,\n 1,\n 4710,\n 5087\n ],\n \"zarr_format\": 2\n}",
"test/0.0.0.0": ["https://generated/results-20201205/testrasters.tif"],
"test/1.0.0.0": ["https://generated/results-20201206/testrasters.tif"]
}
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [127], in <cell line: 9>()
1 mapper = fsspec.get_mapper(
2 'reference://',
3 fo='earthbigdata.json',
(...)
6 remote_options=r_opts,
7 )
----> 9 dataset = xarray.open_dataset(
10 mapper,
11 engine='zarr',
12 mask_and_scale=False,
13 backend_kwargs={'consolidated': False},
14 )
15 print(dataset)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\backends\api.py:495, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
483 decoders = _resolve_decoders_kwargs(
484 decode_cf,
485 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
491 decode_coords=decode_coords,
492 )
494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
496 filename_or_obj,
497 drop_variables=drop_variables,
498 **decoders,
499 **kwargs,
500 )
501 ds = _dataset_from_backend_dataset(
502 backend_ds,
503 filename_or_obj,
(...)
510 **kwargs,
511 )
512 return ds
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\backends\zarr.py:814, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
812 store_entrypoint = StoreBackendEntrypoint()
813 with close_on_error(store):
--> 814 ds = store_entrypoint.open_dataset(
815 store,
816 mask_and_scale=mask_and_scale,
817 decode_times=decode_times,
818 concat_characters=concat_characters,
819 decode_coords=decode_coords,
820 drop_variables=drop_variables,
821 use_cftime=use_cftime,
822 decode_timedelta=decode_timedelta,
823 )
824 return ds
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\backends\store.py:24, in StoreBackendEntrypoint.open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
12 def open_dataset(
13 self,
14 store,
(...)
22 decode_timedelta=None,
23 ):
---> 24 vars, attrs = store.load()
25 encoding = store.get_encoding()
27 vars, attrs, coord_names = conventions.decode_cf_variables(
28 vars,
29 attrs,
(...)
36 decode_timedelta=decode_timedelta,
37 )
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\backends\common.py:123, in AbstractDataStore.load(self)
101 def load(self):
102 """
103 This loads the variables and attributes simultaneously.
104 A centralized loading function makes it easier to create
(...)
120 are requested, so care should be taken to make sure its fast.
121 """
122 variables = FrozenDict(
--> 123 (_decode_variable_name(k), v) for k, v in self.get_variables().items()
124 )
125 attributes = FrozenDict(self.get_attrs())
126 return variables, attributes
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\backends\zarr.py:422, in ZarrStore.get_variables(self)
421 def get_variables(self):
--> 422 return FrozenDict(
423 (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
424 )
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\utils.py:473, in FrozenDict(*args, **kwargs)
472 def FrozenDict(*args, **kwargs) -> Frozen:
--> 473 return Frozen(dict(*args, **kwargs))
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\backends\zarr.py:423, in <genexpr>(.0)
421 def get_variables(self):
422 return FrozenDict(
--> 423 (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
424 )
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\backends\zarr.py:419, in ZarrStore.open_store_variable(self, name, zarr_array)
416 if getattr(zarr_array, "fill_value") is not None:
417 attributes["_FillValue"] = zarr_array.fill_value
--> 419 return Variable(dimensions, data, attributes, encoding)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\variable.py:305, in Variable.__init__(self, dims, data, attrs, encoding, fastpath)
285 """
286 Parameters
287 ----------
(...)
302 unrecognized encoding items.
303 """
304 self._data = as_compatible_data(data, fastpath=fastpath)
--> 305 self._dims = self._parse_dimensions(dims)
306 self._attrs = None
307 self._encoding = None
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\variable.py:573, in Variable._parse_dimensions(self, dims)
571 dims = tuple(dims)
572 if len(dims) != self.ndim:
--> 573 raise ValueError(
574 f"dimensions {dims} must have the same length as the "
575 f"number of data dimensions, ndim={self.ndim}"
576 )
577 return dims
ValueError: dimensions ('latitude', 'longitude') must have the same length as the number of data dimensions, ndim=3
OK, looks like it was keeping the first try - I'm trialling this in a notebook, so when I reran it, it was more up to date.
It seems I had to add another dimension to match and have it work, which isn't in Christoph's earthbigdata script, but I haven't worked out what I have done wrong yet - e.g. the first map looks like it is just all the fill value of -1000.
So, those are small low-res files. How would you approach chunking generic GeoTIFFs this way? I had been turning things into xarray/zarr, which had useful defaults that worked OK.
I have to decide what to do, chunk-wise, for a 40 GB GeoTIFF with 15 bands, for example.
Or add coords?
So the next question is approaching something like this:-
<xarray.Dataset>
Dimensions: (band: 23, model_run: 1, y: 4710, x: 5087)
Coordinates:
* band (band) int32 1 2 3 4 5 6 7 8 9 10 ... 15 16 17 18 19 20 21 22 23
* model_run (model_run) int32 20201205
* x (x) float32 113.0 113.0 113.0 113.0 ... 154.0 154.0 154.0 154.0
* y (y) float32 -10.0 -10.01 -10.02 -10.03 ... -43.98 -43.99 -44.0
Data variables:
    pMine (model_run, band, y, x) float32 ...
e.g. 23 different model varieties in one - this is too big for a 'one file' chunk. Depending on how big they are, strip-sized chunks might be OK [if it is 50 GB], but probably not at the 2 GB level, e.g. one Sentinel band or the file example above.
Not a particularly useful thing to do, but a trial:-
chunkshape: (23, 4710, 5087)
['20201205']
ZarrFileSequenceStore
shape: 1, 23, 4710, 5087
chunks: 1, 23, 4710, 5087
dtype: float32
fillvalue: -1000
dataset['pMine'].max()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 dataset['pMine'].max()
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\common.py:58, in ImplementsArrayReduce._reduce_method.<locals>.wrapped_func(self, dim, axis, skipna, **kwargs)
57 def wrapped_func(self, dim=None, axis=None, skipna=None, **kwargs):
---> 58 return self.reduce(func, dim, axis, skipna=skipna, **kwargs)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\dataarray.py:2696, in DataArray.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
2654 def reduce(
2655 self,
2656 func: Callable[..., Any],
(...)
2661 **kwargs: Any,
2662 ) -> DataArray:
2663 """Reduce this array by applying `func` along some dimension(s).
2664
2665 Parameters
(...)
2693 summarized data and the indicated dimension(s) removed.
2694 """
-> 2696 var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
2697 return self._replace_maybe_drop_dims(var)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\variable.py:1806, in Variable.reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
1804 data = func(self.data, axis=axis, **kwargs)
1805 else:
-> 1806 data = func(self.data, **kwargs)
1808 if getattr(data, "shape", ()) == self.shape:
1809 dims = self.dims
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\variable.py:339, in Variable.data(self)
337 return self._data
338 else:
--> 339 return self.values
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\variable.py:512, in Variable.values(self)
509 @property
510 def values(self):
511 """The variable's data as a numpy.ndarray"""
--> 512 return _as_array_or_item(self._data)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\variable.py:252, in _as_array_or_item(data)
238 def _as_array_or_item(data):
239 """Return the given values as a numpy array, or as an individual item if
240 it's a 0d datetime64 or timedelta64 array.
241
(...)
250 TODO: remove this (replace with np.asarray) once these issues are fixed
251 """
--> 252 data = np.asarray(data)
253 if data.ndim == 0:
254 if data.dtype.kind == "M":
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\indexing.py:552, in MemoryCachedArray.__array__(self, dtype)
551 def __array__(self, dtype=None):
--> 552 self._ensure_cached()
553 return np.asarray(self.array, dtype=dtype)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\indexing.py:549, in MemoryCachedArray._ensure_cached(self)
547 def _ensure_cached(self):
548 if not isinstance(self.array, NumpyIndexingAdapter):
--> 549 self.array = NumpyIndexingAdapter(np.asarray(self.array))
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\indexing.py:522, in CopyOnWriteArray.__array__(self, dtype)
521 def __array__(self, dtype=None):
--> 522 return np.asarray(self.array, dtype=dtype)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\core\indexing.py:423, in LazilyIndexedArray.__array__(self, dtype)
421 def __array__(self, dtype=None):
422 array = as_indexable(self.array)
--> 423 return np.asarray(array[self.key], dtype=None)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\xarray\backends\zarr.py:73, in ZarrArrayWrapper.__getitem__(self, key)
71 array = self.get_array()
72 if isinstance(key, indexing.BasicIndexer):
---> 73 return array[key.tuple]
74 elif isinstance(key, indexing.VectorizedIndexer):
75 return array.vindex[
76 indexing._arrayize_vectorized_indexer(key, self.shape).tuple
77 ]
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\zarr\core.py:720, in Array.__getitem__(self, selection)
718 result = self.vindex[selection]
719 else:
--> 720 result = self.get_basic_selection(pure_selection, fields=fields)
721 return result
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\zarr\core.py:846, in Array.get_basic_selection(self, selection, out, fields)
843 return self._get_basic_selection_zd(selection=selection, out=out,
844 fields=fields)
845 else:
--> 846 return self._get_basic_selection_nd(selection=selection, out=out,
847 fields=fields)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\zarr\core.py:889, in Array._get_basic_selection_nd(self, selection, out, fields)
883 def _get_basic_selection_nd(self, selection, out=None, fields=None):
884 # implementation of basic selection for array with at least one dimension
885
886 # setup indexer
887 indexer = BasicIndexer(selection, self)
--> 889 return self._get_selection(indexer=indexer, out=out, fields=fields)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\zarr\core.py:1179, in Array._get_selection(self, indexer, out, fields)
1173 if not hasattr(self.chunk_store, "getitems") or \
1174 any(map(lambda x: x == 0, self.shape)):
1175 # sequentially get one key at a time from storage
1176 for chunk_coords, chunk_selection, out_selection in indexer:
1177
1178 # load chunk selection into output array
-> 1179 self._chunk_getitem(chunk_coords, chunk_selection, out, out_selection,
1180 drop_axes=indexer.drop_axes, fields=fields)
1181 else:
1182 # allow storage to get multiple items at once
1183 lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\zarr\core.py:1883, in Array._chunk_getitem(self, chunk_coords, chunk_selection, out, out_selection, drop_axes, fields)
1880 out[out_selection] = fill_value
1882 else:
-> 1883 self._process_chunk(out, cdata, chunk_selection, drop_axes,
1884 out_is_ndarray, fields, out_selection)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\zarr\core.py:1791, in Array._process_chunk(self, out, cdata, chunk_selection, drop_axes, out_is_ndarray, fields, out_selection, partial_read_decode)
1789 if isinstance(cdata, PartialReadBuffer):
1790 cdata = cdata.read_full()
-> 1791 self._compressor.decode(cdata, dest)
1792 else:
1793 chunk = ensure_ndarray(cdata).view(self._dtype)
File ~\AppData\Local\Continuum\anaconda3\envs\kerchunktest\lib\site-packages\imagecodecs\numcodecs.py:812, in Tiff.decode(self, buf, out)
811 def decode(self, buf, out=None):
--> 812 return imagecodecs.tiff_decode(
813 buf,
814 index=self.index,
815 asrgb=self.asrgb,
816 verbose=self.verbose,
817 out=out,
818 )
File imagecodecs\_tiff.pyx:327, in imagecodecs._tiff.tiff_decode()
File imagecodecs\_shared.pyx:164, in imagecodecs._shared._create_array()
ValueError: invalid output shape (1, 23, 4710, 5087) (4710, 5087, 23)
Looks like the dataset was removed:
>>> import fsspec
>>> fs = fsspec.filesystem('s3', anon=True, client_kwargs={'endpoint_url': 'https://mghp.osn.xsede.org'})
>>> fs.ls('/rsignellbucket1/lcmap/cog')
[]
Edit: Never mind. Works for me now.
@cgohlke, indeed, every once in a while it seems there is a glitch on the Open Storage Network. :(
@cgohlke, as mentioned at fsspec/kerchunk#78 (comment), I'd like to try reading a time stack of geotiffs via the Zarr library.
As a test, I've got 2 COGs on S3-API-accessible object storage that I'd like to create a virtual Zarr dataset for using the tifffile library. I can get it going if the COGs are local, but I'm a little unsure how to handle the S3 objects.
I'm trying to do:
but neither one works. These buckets are public access (no egress!) so this code should be reproducible.
Here's the Jupyter Notebook demonstrating the local success (except I'd like to use the native 2048x2048 chunking in the COG instead of the entire file!) and remote s3 failures.
What is the best way to approach this?