feature(store): V3 ZipStore #2078
async def delete(self, key: str) -> None:
    raise NotImplementedError
Today I was reminded that you can't delete anything from inside a ZipFile 😢. This behavior also existed in 2.18.
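For readers unfamiliar with the limitation, here is a minimal sketch using only the standard-library `zipfile` module. As far as I know, the Python versions current for this PR offer no documented call that removes a member from an archive, and writing a name that already exists appends a second entry instead of replacing the first. The key `c/0` is just an illustrative chunk name.

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("c/0", b"first")
    # Writing the same name again does not replace the old entry:
    # zipfile emits a "Duplicate name" UserWarning and appends a second member.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        zf.writestr("c/0", b"second")

with zipfile.ZipFile(buf, mode="r") as zf:
    member_names = zf.namelist()  # both entries are still in the archive
    latest = zf.read("c/0")       # lookup by name resolves to the last one written
```

So an "overwrite" only grows the file, and a true delete would require rewriting the whole archive.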
Wouldn't it be nice to have an in-memory version of a Zip store, where all the zip data is read into memory? That way it could support deleting and updating entries. Afterwards, a user could persist the data using a method like write_to_file. I think it would be very efficient for data sets whose compressed size is small enough to fit entirely in memory.
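A rough sketch of what that suggestion could look like, using only the standard library. Everything here is hypothetical: `InMemoryZip`, `to_bytes`, and the method signatures are made up for illustration and are not part of zarr. Note that this sketch keeps the decompressed member bytes in a dict, which is a simplification of the "compressed size fits in memory" idea.

```python
from __future__ import annotations

import io
import zipfile


class InMemoryZip:
    """Hypothetical helper (not a zarr class): holds all zip members in a dict."""

    def __init__(self, data: bytes | None = None) -> None:
        self._members: dict[str, bytes] = {}
        if data is not None:
            # Load every member of an existing archive into memory.
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                for name in zf.namelist():
                    self._members[name] = zf.read(name)

    def set(self, key: str, value: bytes) -> None:
        self._members[key] = value  # an update replaces the old value

    def get(self, key: str) -> bytes:
        return self._members[key]

    def delete(self, key: str) -> None:
        self._members.pop(key, None)  # deleting a missing key is a no-op

    def to_bytes(self) -> bytes:
        # Serialize the current state as a fresh zip archive.
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, mode="w") as zf:
            for name, value in self._members.items():
                zf.writestr(name, value)
        return buf.getvalue()

    def write_to_file(self, path: str) -> None:
        with open(path, "wb") as f:
            f.write(self.to_bytes())


store = InMemoryZip()
store.set("zarr.json", b"{}")
store.set("c/0", b"chunk")
store.delete("c/0")
roundtrip = InMemoryZip(store.to_bytes())
```

Because the archive is only written once, at persist time, deletes and updates never leave stale entries behind.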
Indeed that would be nice @zoj613. I would like to save that until after the 3.0 release though as the minimal zip store is a release blocker at this point.
src/zarr/store/zip.py
Outdated
supports_writes: bool = True
supports_partial_writes: bool = False
supports_listing: bool = True
What do we think about adding supports_deletes: bool = False as a class attribute?
That would be a solid extrapolation of the design we are currently using. One downside of that design is that, even if a class has supports_x set to False, the class will still need an implementation of x. Another solution would be to express supports_x by having the class inherit from a DoesX mixin. But that's maybe out of scope for this PR.
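To make the trade-off concrete, here is a small sketch of both designs side by side. All class and function names (`FlagStore`, `FlagZipStore`, `DoesDelete`, `MixinZipStore`, `MixinMemoryStore`, `can_delete_flag`, `can_delete_mixin`) are hypothetical stand-ins, not zarr's actual classes.

```python
from abc import ABC, abstractmethod


# Design 1: capability flag. Every store must still define delete().
class FlagStore(ABC):
    supports_deletes: bool = True

    @abstractmethod
    async def delete(self, key: str) -> None: ...


class FlagZipStore(FlagStore):
    supports_deletes = False

    async def delete(self, key: str) -> None:
        # Required stub, even though the flag says deletes are unsupported.
        raise NotImplementedError


# Design 2: capability mixin. Only stores that can delete define delete().
class DoesDelete(ABC):
    @abstractmethod
    async def delete(self, key: str) -> None: ...


class MixinZipStore:
    """Does not inherit DoesDelete, so it has no delete() method at all."""


class MixinMemoryStore(DoesDelete):
    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    async def delete(self, key: str) -> None:
        self.data.pop(key, None)


def can_delete_flag(store: FlagStore) -> bool:
    return store.supports_deletes


def can_delete_mixin(store: object) -> bool:
    return isinstance(store, DoesDelete)
```

With the flag, callers check an attribute and the stub method still exists; with the mixin, the capability check is an `isinstance` test and unsupported stores simply lack the method.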
I've gone the supports_deletes route for now.
I played around with this a bit locally.
…nto feature/zip-store
There is now a context manager interface here. I'll note that I'm mildly uncomfortable with this because it's not clear whether this should be an async context manager or not. While I think this is what you were asking for, it's not really the way classes that do async things should work. I've got some ideas for how we can improve this, but I think we should hold them for another PR. Two asides:
Noting that this behavior is also present in v2: Lines 1884 to 1885 in 11fd8db
I think fsspec can handle most of this for us. Some work is likely needed there but it should be possible.
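On the context manager question raised in this comment, the sketch below shows the tension in miniature: a class with async methods can expose a plain `with` interface, an `async with` interface, or both. `TinyZipStore` is a hypothetical stand-in built on the standard-library `zipfile` module; it is not the `ZipStore` from this PR and makes no claim about how that class is implemented.

```python
import asyncio
import io
import zipfile
from typing import Optional


class TinyZipStore:
    """Hypothetical stand-in for a zip-backed store (not zarr's ZipStore)."""

    def __init__(self, target: io.BytesIO) -> None:
        self._target = target
        self._zf: Optional[zipfile.ZipFile] = None

    def close(self) -> None:
        if self._zf is not None:
            self._zf.close()  # closing writes the zip central directory
            self._zf = None

    # Synchronous form: `with TinyZipStore(...) as store:`
    def __enter__(self) -> "TinyZipStore":
        self._zf = zipfile.ZipFile(self._target, mode="w")
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()

    # Asynchronous form: `async with TinyZipStore(...) as store:`
    async def __aenter__(self) -> "TinyZipStore":
        return self.__enter__()

    async def __aexit__(self, *exc_info: object) -> None:
        self.close()

    async def set(self, key: str, value: bytes) -> None:
        if self._zf is None:
            raise RuntimeError("store is not open")
        self._zf.writestr(key, value)


async def write_one() -> list:
    buf = io.BytesIO()
    async with TinyZipStore(buf) as store:
        await store.set("zarr.json", b"{}")
    # Reopen the finished archive to see what was written.
    with zipfile.ZipFile(buf) as zf:
        return zf.namelist()


written_names = asyncio.run(write_one())
```

The sync form is convenient, but it means open and close can never await anything, which is the discomfort described above.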
A side note: similar to the situation with consolidated metadata, the zip store will bring v3 to parity with v2 (good thing ™️ 👍🏽) but will leave other implementations in the same situation of not having a spec. If there's a chance that the discussion around that spec will lead to changes in the on-disk format, it would be in our best interest to make sure we have a specification to go along with these changes, so that there's less likelihood of multiple versions we need to support.
+1, it would be great to define a spec for this so other implementations can safely support it.
Draft spec to go along with this feature is now available for review: zarr-developers/zarr-specs#311
I tried this, and it produces zip files that are compatible with the zip store implementation in
I would expect it to succeed and just not write empty chunks.
For reference, in
Let me see about addressing @LDeakin's finding here before merging.
@LDeakin - your example exposed something beyond the reach of this PR. Setting a chunk equal to the fill value triggers a delete, even if the chunk does not exist. This is handled nicely for all other stores but is hard to manage here. Something we could do in the future is allow

async def delete(self, key: str) -> None:
    if await self.exists(key):
        raise NotImplementedError

This is a bit too clever IMO so I'm going to leave this for later.
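For anyone who wants to see the proposed behaviour end to end, here is a self-contained sketch. `ReadOnlyDeleteStore` and `check_delete` are hypothetical names, and the dict-backed store is only a toy used to exercise the idea; it is not how the zip store in this PR works.

```python
import asyncio


class ReadOnlyDeleteStore:
    """Hypothetical toy store used only to illustrate the idea."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    async def exists(self, key: str) -> bool:
        return key in self._data

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    async def delete(self, key: str) -> None:
        # Deleting a key that was never written is a harmless no-op;
        # only a deletion that would actually remove data is refused.
        if await self.exists(key):
            raise NotImplementedError


async def check_delete() -> tuple:
    store = ReadOnlyDeleteStore()
    await store.delete("c/0")  # missing key: returns quietly
    missing_ok = True
    await store.set("c/0", b"chunk")
    try:
        await store.delete("c/0")  # existing key: refused
        existing_raises = False
    except NotImplementedError:
        existing_raises = True
    return missing_ok, existing_raises


delete_result = asyncio.run(check_delete())
```

Under this variant, writing a fill-value chunk into a fresh array would succeed, since the triggered delete targets a chunk that was never stored.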
* v3:
  fix: opening a group with unspecified format finds either v2 or v3 (zarr-developers#2183)
  test: check that store, array, and group classes are serializable (zarr-developers#2006)
  feature(store): V3 ZipStore (zarr-developers#2078)
  More typing fixes for tests (zarr-developers#2173)
  refactor: split metadata into v2 and v3 modules (zarr-developers#2163)
  Accept dictionaries for `store` argument (zarr-developers#2164)
  Simplify mypy config for tests (zarr-developers#2156)
  Fixed path segment duplication in open_array (zarr-developers#2167)
  Fixed test warnings (zarr-developers#2168)
  chore: update pre-commit hooks (zarr-developers#2165)
  Ensure that store_dict used for empty dicts (zarr-developers#2162)
  Bump pypa/gh-action-pypi-publish from 1.10.0 to 1.10.1 in the actions group (zarr-developers#2160)
I threw together a super basic implementation of a v3 zip store. I am still testing this but it seems to be working for basic things.
closes #2010
try it!
TODO: