`zarr_checksum` does not seem to be zarr-specific at all! It just computes a checksum over a hierarchy of directories/files given a "walker" generator, so it could be generalized to checksum any hierarchy for which a "walker" exists. Moreover, it is already pretty much implemented that way, given the `FileGenerator` interface and the `yield_files_s3` and `yield_files_local` implementations. There is even now #41 ("`zarr` dependency if possible").
Then it would become relevant to fscacher, where we also checksum folders, and where a threaded implementation somehow ended up slowing things down: https://github.com/con/fscacher/pull/67/files .
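To make the "walker + checksum" split concrete, here is a minimal sketch in Python. It does not reproduce the actual `zarr_checksum` algorithm or its `FileGenerator` interface: `FileEntry`, `walk_local`, and `hierarchy_checksum` are hypothetical names, SHA-256 is an arbitrary choice, and the aggregation is a flat digest over sorted entries rather than a per-directory tree digest. The only point is that the aggregation step needs nothing beyond what a walker yields.

```python
import hashlib
import os
from dataclasses import dataclass
from pathlib import Path, PurePosixPath
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FileEntry:
    """One file reported by a walker (hypothetical type, not from zarr_checksum)."""

    path: PurePosixPath  # path relative to the root of the hierarchy
    size: int  # file size in bytes
    digest: str  # hex digest of the file contents


def walk_local(root: Path) -> Iterator[FileEntry]:
    """Hypothetical local walker: yield one entry per regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            data = full.read_bytes()  # fine for a sketch; stream large files instead
            yield FileEntry(
                path=PurePosixPath(full.relative_to(root).as_posix()),
                size=len(data),
                digest=hashlib.sha256(data).hexdigest(),
            )


def hierarchy_checksum(entries: Iterable[FileEntry]) -> str:
    """Fold walker output into one digest, independent of the order of yields."""
    h = hashlib.sha256()
    # Sorting by relative path makes the result deterministic for any walker.
    for e in sorted(entries, key=lambda e: e.path.as_posix()):
        h.update(f"{e.path.as_posix()}\0{e.size}\0{e.digest}\n".encode())
    return h.hexdigest()
```

Under these assumptions, `hierarchy_checksum(walk_local(Path("some/dir")))` fingerprints a local directory, and a walker over S3 or any other store could be swapped in as long as it yields the same kind of entries.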
What would be the point of generalizing the code? What do you want to use it for that requires generalization?
If you want to use this for fscacher, then (assuming the code is generalized to the point that we can reproduce fscacher's current directory-fingerprinting algorithm with it), what would we gain from doing so?
You might well be right, @jwodder. As you know, I dislike code duplication (duplication of effort, bugs, ...), and this feels like the right target because we now need directory fingerprinting in multiple cases (at least for zarr and fscacher ATM), and all of them could benefit from a clean and efficient "traverse + fingerprint" implementation.
The initial motivation was also simply the realization that there is really not much (if anything) zarr-specific in this library (hence #41), just our custom way of computing a checksum over a hierarchy, which is reminiscent of AWS Digest.
Inspired by the discussion/analysis in https://github.com/dandi/dandi-cli/pull/1371/files#r1420705289.