Custom Walker for key-value-store filesystems #45
Not that I know of, but it would be a good idea. There is a … Let me know if you need any help with that; I would almost certainly want to borrow the implementation for S3FS. BTW, if you are copying files, the slow walking is somewhat ameliorated by the multi-threaded copying, since the walking can be done in the background.
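A minimal sketch of how walking and multi-threaded copying can overlap. It assumes a slow, generator-based walker and a per-file copy function; `slow_walk`, `copy_file` and `copy_while_walking` are hypothetical stand-ins, not PyFilesystem2 APIs. Each copy is submitted to a thread pool as soon as the walker yields a path, so files that were already found get copied while the walk is still issuing requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_walk(paths, delay=0.01):
    """Hypothetical walker: yields one path per simulated network round trip."""
    for path in paths:
        time.sleep(delay)  # stand-in for a per-level list request
        yield path

def copy_file(path):
    """Hypothetical per-file copy; returns the path once it is "copied"."""
    time.sleep(0.01)  # stand-in for transferring the file's bytes
    return path

def copy_while_walking(paths, workers=4):
    """Submit each copy as soon as the walker yields its path."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The list comprehension drives the walk in this thread while the
        # pool's worker threads copy the files that were already found.
        futures = [pool.submit(copy_file, path) for path in slow_walk(paths)]
        return [future.result() for future in futures]

print(copy_while_walking(["a.txt", "docs/b.txt", "docs/img/c.png"]))
```

The total time is still bounded below by the walk itself; overlapping only hides the copy time behind it.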
Cool, thanks for the tips. I'll give it a try! The walking is slow in my use case because I am walking over deeply "nested" keys. For every level, a separate request is sent to GCS, which is, of course, a lot slower than retrieving the keys in large batches and "faking" the (path, dirs, files) tuple under the hood.
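To make the request-count argument concrete, here is a rough sketch that counts list calls for both strategies over the same set of keys. It assumes one delimiter-style list call per directory for the level-by-level walk (pagination ignored) and a single flat, recursive listing fetched in pages for the batch approach. `page_size=1000` is an assumption (S3 returns at most 1,000 keys per list request), and both function names are hypothetical.

```python
import math

def requests_per_level(keys):
    """Count list calls for a walker that lists each directory separately.

    Assumes one delimiter-style list call per directory and ignores
    pagination; the root directory always costs one call.
    """
    dirs = {""}  # "" stands for the bucket root
    for key in keys:
        parents = key.split("/")[:-1]  # directory components of the key
        for depth in range(1, len(parents) + 1):
            dirs.add("/".join(parents[:depth]))
    return len(dirs)

def requests_flat(keys, page_size=1000):
    """Count list calls for one flat, recursive listing fetched in pages.

    page_size=1000 is an assumed per-request limit; an empty bucket still
    costs one call.
    """
    return max(1, math.ceil(len(keys) / page_size))

keys = [f"x/y/{n}/f.txt" for n in range(100)]
print(requests_per_level(keys), requests_flat(keys))  # prints: 103 1
```

For these 100 keys the level-by-level walk needs 103 calls (the root, `x`, `x/y` and the 100 leaf directories), while the flat listing fits in a single page.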
Unfortunately, it's a little more complicated than I thought. For example, the first element returned by walk is supposed to contain all
Unfortunately, there is no real way to be smart here: one cannot anticipate how many files or folders are in a bucket, and therefore which algorithm will be faster or make more sense. In general, if you know that you will need to walk the entire fs anyway, option 2 will be a lot faster (which is my use case). I don't think it should be the default walker, though.
Has any work been done towards a custom Walker for key-value-store filesystems? Walking with the standard Walker is extremely slow on my gcsfs implementation because of all the requests, and I imagine it's the same on s3fs.
Walking on buckets could be implemented fairly efficiently, because it comes down to something like
bucket.list()
and one would just need to format the walk output correctly. This way we would need far fewer S3/GCS calls. Am I missing something here, or is this correct? Are there currently any custom Walker implementations? And where would such a custom Walker live? In the main pyfilesystem2 repo?
Thanks! :)
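A minimal sketch of "formatting the walk output" from a flat listing. It assumes the keys have already been fetched (for example via a `bucket.list()`-style call; here they are just a list of strings) and that keys ending in `/` are directory placeholder objects. It yields `(path, dirs, files)` tuples top-down using plain names; a real Walker would have to wrap these in whatever its walk API expects (PyFilesystem2's `Walker` yields `Info` objects). `walk_from_keys` is a hypothetical name.

```python
from collections import defaultdict

def walk_from_keys(keys):
    """Yield (path, dirs, files) tuples top-down from a flat list of keys.

    Keys ending in "/" are treated as directory placeholders (an assumption).
    """
    subdirs = defaultdict(set)  # directory path -> names of its subdirectories
    files = defaultdict(list)   # directory path -> names of its files
    for key in keys:
        is_dir_marker = key.endswith("/")
        parts = [part for part in key.split("/") if part]
        if not parts:
            continue  # skip empty keys and a bare "/"
        dir_parts = parts if is_dir_marker else parts[:-1]
        path = "/"
        for name in dir_parts:
            subdirs[path].add(name)
            path = path.rstrip("/") + "/" + name
        if not is_dir_marker:
            files[path].append(parts[-1])

    # Depth-first traversal from the root, so parents come before children.
    stack = ["/"]
    while stack:
        path = stack.pop()
        dirs = sorted(subdirs[path])
        yield path, dirs, sorted(files[path])
        # Push in reverse so subdirectories are visited in sorted order.
        for name in reversed(dirs):
            stack.append(path.rstrip("/") + "/" + name)

for step in walk_from_keys(["a.txt", "docs/b.txt", "docs/img/c.png", "empty/"]):
    print(step)
```

Note that the root tuple cannot be produced until the whole listing has been read, so this approach pays off mainly when the entire bucket is going to be walked anyway.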