[Enhancement] Performance Tweaks #125
Thanks for exploring this and offering some ideas. I am generally hesitant to add any sort of Worker-specific features to zarr. I've used threads.js in a couple of projects and tried to package workers previously (https://github.com/geotiffjs/geotiff.js/, https://github.com/hms-dbmi/viv, https://github.com/gosling-lang/gosling.js), and I just don't think the ecosystem is mature enough to package worker code consistently; it has always resulted in serious headaches. The primary issue in my experience is that packaging for npm requires additional bundling of the library, which adds tons of maintenance complexity and requires making trade-offs across platforms. For example, using ES6 module imports in a Worker is only supported in Chrome, so any worker entrypoint must bundle its dependencies completely or fall back to `importScripts`.

With that said, one thing I have been curious about (but have yet to explore) is providing a codec implementation in an application using `addCodec`:

```js
import { addCodec } from 'zarr';

class BloscWorkerPool { /* ... */ }

addCodec('blosc', BloscWorkerPool); // override builtin module
```

I'd be curious to see how you modified the code in `getBasicSelection`.
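To make that snippet concrete, here is a minimal sketch of what a worker-backed codec could look like, assuming a numcodecs-style interface (a static `fromConfig` plus an async `decode`); the worker file `./blosc.worker.js` and the message protocol are made up for illustration, not anything zarr ships:

```js
// Sketch only: assumes a numcodecs-style codec interface and a
// hypothetical pre-bundled worker entrypoint at './blosc.worker.js'.
class BloscWorkerPool {
  static codecId = 'blosc';

  static fromConfig(config) {
    return new BloscWorkerPool(config);
  }

  constructor(config) {
    this.config = config;
    this.worker = new Worker('./blosc.worker.js'); // hypothetical entrypoint
    this.pending = new Map();
    this.nextId = 0;
    this.worker.onmessage = ({ data }) => {
      this.pending.get(data.id)(data.bytes);
      this.pending.delete(data.id);
    };
  }

  decode(bytes) {
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      // transfer the underlying buffer to the worker instead of copying it
      this.worker.postMessage({ id, bytes, config: this.config }, [bytes.buffer]);
    });
  }
}
```

With something along these lines, the `addCodec('blosc', BloscWorkerPool)` call above would route all blosc decoding through a worker without touching zarr internals.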
Thanks for the quick answer! Completely understood, bundling is a nightmare. I like the codec approach, but I am wondering (not being very well versed in JavaScript concurrency) what the tradeoff would be between having `fetch` run on the main thread and then passing the buffer to the codec pool (as a transferable object), versus having both run in the worker. I guess `fetch` does some magic under the hood so that it would not make a difference? Happy to share my implementation; it is however a React hook at this point and needs some refactoring to be condensed. :D If possible I would love to discuss Viv as well, if you have the time? (Trying to implement it as a viewer for a data management and workflow platform.) Maybe we could schedule a little online discussion?
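For reference, passing an `ArrayBuffer` through `postMessage` with a transfer list moves ownership to the worker rather than copying, so handing a fetched buffer to a worker pool is cheap. A small sketch (the worker file name is made up):

```js
// main thread: fetch on main, decode in a worker
const res = await fetch(chunkUrl);                // chunkUrl: wherever the chunk lives
const buffer = await res.arrayBuffer();
const worker = new Worker('./decode.worker.js');  // hypothetical worker file

// the second argument transfers the buffer (zero-copy); after this call
// `buffer` is detached and no longer usable on the main thread
worker.postMessage({ buffer }, [buffer]);
```

Since the handoff itself is zero-copy, the difference between fetch-on-main and fetch-in-worker mostly comes down to which thread pays for handling the response and scheduling, not the transfer.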
Oh wow, this question actually made me realize you could implement a Store that performs fetching and decoding entirely in a worker. Basically, the custom store could act as a transformation layer on top of the original store, intercepting requests for the `.zarray` metadata (to strip out the compressor) and for chunks (to fetch and decode them itself):

```js
import { openArray, HTTPStore } from 'zarr';

let url = "https://example.com/data.zarr";

// fetch the metadata
let meta = await fetch(url + '/.zarray').then(res => res.json());
console.log(meta.compressor);
// {
//   "blocksize": 0,
//   "clevel": 5,
//   "cname": "lz4",
//   "id": "blosc",
//   "shuffle": 1
// }

// fetching and decoding happen on the main thread within `ZarrArray`
let store = new HTTPStore(url);
let arr = await openArray({ store });
console.log(arr.compressor); // Blosc

// fetching and decoding happen in a Worker (inside the `getItem` method of the store)
let workerStore = new HTTPWorkerStore(url);
let workerArr = await openArray({ store: workerStore });
console.log(workerArr.compressor); // null, store modifies the `.zarray` and decodes chunks itself
```
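A rough sketch of what that `HTTPWorkerStore` could look like; the assumed interface here (an async `getItem(key)` returning bytes) mirrors zarr's async store shape, while the worker protocol and file name are invented for illustration:

```js
// Sketch: a store that proxies chunk fetching and decoding to a worker.
class HTTPWorkerStore {
  constructor(url) {
    this.url = url;
    this.worker = new Worker('./store.worker.js'); // hypothetical worker bundle
    this.pending = new Map();
    this.nextId = 0;
    this.worker.onmessage = ({ data }) => {
      this.pending.get(data.id)(data.value);
      this.pending.delete(data.id);
    };
  }

  async getItem(key) {
    if (key.endsWith('.zarray')) {
      // intercept metadata: fetch it here and strip the compressor so
      // ZarrArray doesn't try to decode chunks a second time
      const res = await fetch(`${this.url}/${key}`);
      const meta = await res.json();
      this.compressorConfig = meta.compressor; // remember it for the worker
      meta.compressor = null;
      return new TextEncoder().encode(JSON.stringify(meta));
    }
    // chunk keys: fetch + decode inside the worker, get raw bytes back
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.worker.postMessage({
        id,
        url: `${this.url}/${key}`,
        compressor: this.compressorConfig,
      });
    });
  }
}
```

A real implementation would also need the rest of the store methods (e.g. `containsItem`) and error handling for missing chunks, but the interception idea is all in `getItem`.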
That would be great! Send me an email and we can find a time to chat ([email protected]).
Love that! To maybe reuse some logic, it would be nice to have some `ComposedStore` that would allow inserting parts of that "middleware" logic, maybe following a pattern like Apollo Link. Something like this could also help with other commonly compute-intensive tasks like rescaling, e.g.:

```js
let store = composeStore(
  decodeLink,
  rescaleLink,
  fetchTerminatingLink
);
```
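Nothing like this exists in zarr today; a toy sketch of the idea, where each link wraps the next `getItem` in the chain and all names are hypothetical:

```js
// A "link" takes the next getItem in the chain and returns a new one,
// mirroring Apollo Link's middleware composition. The last link terminates
// the chain (e.g. performs the actual fetch); responses then flow back up
// through the wrappers from innermost to outermost.
function composeStore(...links) {
  const terminating = links[links.length - 1];
  let getItem = terminating();
  for (let i = links.length - 2; i >= 0; i--) {
    getItem = links[i](getItem);
  }
  return { getItem };
}

// placeholder transforms, identity only; real versions would run the
// codec and rescale intensities
const decode = (buf) => buf;
const rescale = (buf) => buf;

// example links
const fetchTerminatingLink = () => async (key) =>
  (await fetch(key)).arrayBuffer();

const decodeLink = (next) => async (key) => decode(await next(key));

const rescaleLink = (next) => async (key) => rescale(await next(key));
```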
Just found this thread, and the code I wrote might be of interest as showing one way to deal with worker loading.
While exploring zarr.js performance on big datasets, I realized that there are some limitations when loading and decoding lots of chunks, as for now everything happens in the main thread (#33 mentioned this issue before). I played around a bit and found that using threads.js and their pool implementation speeds up the loading dramatically, as decoding can happen in a web worker (threads.js also provides the same abstraction for running on Node); however, this required a rewrite of `getBasicSelection` in `ZarrArray`. I understood from #33 that this might be mission creep, but maybe it would be an option to give the ability to offload the decoding to the store (maybe by extending `DecodingStore`); then the store implementation could handle the retrieval and decoding in a worker pool. (There are a few gotchas around what you are able to pass to and from a web worker, but one way is to only send the codec config and the chunk key to the worker; see the sketch below.)