Support AWS S3 access #25
Have a version running that passes all tests with regular OS filesystem access and with zipped sets of files (useful for test fixtures with empty dirs, which git doesn't support). Found a gotcha with S3 support via S3FS in that it assumes there are "directory objects" to help simulate a filesystem (noted in https://fs-s3fs.readthedocs.io/en/latest/#limitations). However, it would be good to be able to validate OCFL objects and storage roots on S3 that do not include "directory objects". The solution may be to create the S3FS object with …
Maybe it is possible to pass the …
Have merged in …
We are currently designing a process to validate ~11M OCFL objects that are stored behind an S3 interface. It would be ideal if we could perform OCFL object-level validation using ocfl-py without first copying the objects to local disk. Assuming access to large-scale memory for this validation process, is it conceivable for ocfl-py to be enhanced to support this use case?
I think the current version 1.3.0 on PyPI might actually work. For a single object you could try something like:
…
I did write a sketchy version of walk that uses pyfilesystem2, so even a bulk validation might work, though I think trying to do a run over 11M objects without some means of checkpoint/restart would be a frustrating experience. I am in the middle of working on a major refactor for v2 of …
re: "open S3 endpoint with a copy of the fixture objects" …
After a bit of time messing about, the current dev code does validate one and appropriately fail the other of two objects on S3:
…
I'm going to close this in favor of #133 to avoid giving the impression that S3 is not supported.
See the pyfilesystem2 branch for work to change over to use PyFilesystem for all file access. This should enable the code to work with regular OS filesystems, S3 and Zipped filesystems, among others.