- Add a constructor to
RepositoryConfig
- Now each array has its own chunk manifest, speeding up reads for large repositories
- The snapshot now keeps track of the chunk space bounding box for each manifest
- Configuration settings can now be overridden in a field-by-field basis
Example:
will use 0 for
config = icechunk.RepositoryConfig(inline_chunk_threshold_byte=0) storage = ... repo = icechunk.Repository.open( storage=storage, config=config, )
inline_chunk_threshold_byte
but all other configuration fields will come from the repository persistent config. If persistent config is not set, configuration defaults will take its place. - In preparation for on-disk format stability, all metadata files include extensive format information; including a set of magic bytes, file type, spec version, compression format, etc.
- Zarr's
getsize
got orders of magnitude faster because it's implemented natively and with no need of any I/O - We added several performance benchmarks to the repository
- Better configuration for metadata asset caches, now based on their sizes instead of their number
from icechunk import *
no longer fails
-
New
Repository.reopen
function to ope a repo again, overwriting its configuration and/or virtual chunk container credentials -
Configuration classes are now mutable and easier to use:
storage = ... config = icechunk.RepositoryConfig.default() config.storage.concurrency.ideal_concurrent_request_size = 1_000_000 repo = icechunk.Repository.open( storage=storage, config=config, )
-
ancestry
function can now receive a branch/tag name or a snapshot id -
set_virtual_ref
can now validate the virtual chunk container exists
- Better concurrent download of big chunks, both native and virtual
- We no longer allow
main
branch to be deleted
- Adds support for Azure Blob Storage
- Manifests now load faster, due to an improved serialization format
- The store now releases the GIL appropriately in multithreaded contexts
- Large chunks are fetched concurrently
IcechunkStore.list_dir
is now significantly faster- Support for Zarr 3.0 and xarray 2025.1.1
- Transaction logs and snapshot files are compressed
- Manifests compression using Zstd
- Large manifests are fetched using multiple parallel requests
- Functions to fetch and store repository config
- Faster
list_dir
anddelete_dir
implementations in the Zarr store
- Credentials from environment in GCS
- New Python API using
Repository
,Session
andStore
as separate entities - New Python API for configuring and opening
Repositories
- Added support for object store credential refresh
- Persistent repository config
- Commit conflict resolution and rebase support
- Added experimental support for Google Cloud Storage
- Add optional checksums for virtual chunks, either using Etag or last-updated-at
- Support for multiple virtual chunk locations using virtual chunk containers concept
- Added function
all_virtual_chunk_locations
toSession
to retrieve all locations where the repo has data
- Refs were stored in the wrong prefix
- Allow overwriting existing groups and arrays in Icechunk stores
- Fixed an error during commits where chunks would get mixed between different arrays
- Sync with zarr 3.0b2. The biggest change is the
mode
param onIcechunkStore
methods has been simplified toread_only
. - Changed
IcechunkStore::distributed_commit
toIcechunkStore::merge
, which now does not commit, but attempts to merge the changes from another store back into the current store. - Added a new
icechunk.dask.store_dask
method to write a dask array to an icechunk store. This is required for safely writing dask arrays to an icechunk store. - Added a new
icechunk.xarray.to_icechunk
method to write an xarray dataset to an icechunk store. This is required for safely writing xarray datasets with dask arrays to an icechunk store in a distributed or multi-processing context.
- The
StorageConfig
methods have been correctly typed. IcechunkStore
instances are now set toread_only
by default after pickling.- When checking out a snapshot or tag, the
IcechunkStore
will be set to read-only. If you want to write to the store, you must callIcechunkStore::set_writable()
. - An error will now be raised if you try to checkout a snapshot that does not exist.
- Added
IcechunkStore::reset_branch
andIcechunkStore::async_reset_branch
methods to point the head of the current branch to another snapshot, changing the history of the branch
- Zarr metadata will now only include the attributes key when the attributes dictionary of the node is not empty, aligning Icechunk with the python-zarr implementation.
- Initial release