Are write operations with the zarr Driver guaranteed to be thread- and process-safe? #198
To achieve this you need to use transactions. Note, however, that this will not work with the s3 driver.
I see... So something like this would be sufficient. Is that correct?

# Process 1
import tensorstore as ts

path = "path/to/my.zarr"
arr = ts.open(
    {
        "driver": "zarr",
        "kvstore": {"driver": "file", "path": path},
    },
    open=True,
    read=True,
    write=True,
    create=False,
).result()
with ts.Transaction() as txn:
    arr.with_transaction(txn)[(0,0,0):(64,64,32)] = 100

# Process 2
path = "path/to/my.zarr"
arr = ts.open(...)  # Same as Process 1
with ts.Transaction() as txn:
    arr.with_transaction(txn)[(0,0,32):(64,64,64)] = 200

I do have some additional related questions:
Please and thank you.
You don't need to use a transaction to ensure that concurrent writes by different processes to disjoint portions of the same chunk are not lost; this is always ensured, provided that you use a kvstore that supports atomic writes, like Both with and without use of an explicit transaction, only the conflicting chunks will be retried. In the case of the We are planning to add an option to disable the locking (which currently does not exist), but I would expect that it adds very little overhead for a local filesystem if there is no contention. In general I expect it would only be noticed in the case of very small chunks, as otherwise the actual I/O would surely dominate. What can have a large impact is disabling fsync (via https://google.github.io/tensorstore/kvstore/file/index.html#durability-of-writes).
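To make the "only the conflicting chunks will be retried" behaviour more concrete, here is a minimal, generic sketch of optimistic concurrency control over a chunk-like value. It is not TensorStore's implementation: `VersionedStore`, `write_if_generation`, `update_chunk`, and the key `"chunk/0.0.0"` are all hypothetical names invented for illustration, and an in-memory dict with threads stands in for a real kvstore with separate processes. The idea shown is a read-modify-write loop that only commits if the stored generation is unchanged since the read, and otherwise re-reads and retries.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory key-value store with a per-key generation counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (generation, value)

    def read(self, key):
        # Return (generation, value); generation 0 means the key does not exist yet.
        with self._lock:
            return self._data.get(key, (0, None))

    def write_if_generation(self, key, expected_generation, value):
        # Atomic conditional write: succeeds only if nobody wrote since our read.
        with self._lock:
            current_generation, _ = self._data.get(key, (0, None))
            if current_generation != expected_generation:
                return False
            self._data[key] = (current_generation + 1, value)
            return True


def update_chunk(store, key, index, new_value, chunk_size=4):
    """Write one element of a chunk, retrying the whole read-modify-write on conflict."""
    attempts = 0
    while True:
        attempts += 1
        generation, chunk = store.read(key)
        updated = list(chunk) if chunk is not None else [0] * chunk_size
        updated[index] = new_value
        if store.write_if_generation(key, generation, tuple(updated)):
            return attempts


store = VersionedStore()
# Four writers touch disjoint elements of the same chunk concurrently.
threads = [
    threading.Thread(target=update_chunk, args=(store, "chunk/0.0.0", i, 100 + i))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_generation, final_chunk = store.read("chunk/0.0.0")
print(final_generation, final_chunk)
```

Whatever order the writers run in, no update is lost: a writer that loses the race simply re-reads the newer chunk and reapplies its own element. A real store would need some equivalent of the conditional write to be atomic, which is the role the atomic-write requirement on the kvstore plays in the comment above.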
Suppose I have an existing on-disk Zarr array. If I were to have two separate processes that:
tensorstore.open
Are these two write operations guaranteed to write correctly?
For example, suppose my.zarr has a chunk shape of (64,64,64).
The only mention I could find was on the homepage, under the list of highlights.
And some basic testing seems to suggest that this is indeed true.
However, is this guaranteed to be the case? Is there anything within the documentation that provides this guarantee?
P.S. Out of curiosity, how is the OCC actually implemented? Checking the last modified date of the Zarr chunk in which to write, or something along these lines?
P.P.S. Great library, by the way