[ENG-4624] [S3 Improvements] Project PR - Waterbutler Part #406
Branch updated from 821463a to 253dff0, then from 253dff0 to e627806.
There's some confusion in the path math for metadata. It might need some adjustments. It'd probably also be good to add some tests for metadata listings when the prefix is a bucket subfolder.
`waterbutler/providers/s3/provider.py` (outdated)
```diff
@@ -685,7 +688,11 @@ async def _metadata_file(self, path, revision=None):
     async def _metadata_folder(self, path):
         await self._check_region()

         params = {'prefix': path.path, 'delimiter': '/'}
         # The user selected base folder, the root of the where that user's node is connected.
         prefix = self.settings['id'].split(':/')[1] if path == '/' and self.settings.get('id') else path.path
```
I am pretty sure this is buggy. The two sides of this `if` statement are not equivalent. The LHS is the storage provider prefix. While WB needs this to send meaningful queries to S3, it is not something that we (directly) expose to the user. The RHS is the file/folder path that we return to the user in the WB metadata responses. If the prefix is `/`, i.e. not a subfolder, these two things end up being equivalent. But if the storage provider root folder is a subfolder, e.g. `/foo/`, then the LHS is `/foo/` and the RHS (e.g. for a folder `/bar/`) is `/bar/`.
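To make the mismatch concrete, here is a minimal standalone sketch that reproduces the reviewed conditional with plain strings. It is a simplification: the real code compares a `WaterButlerPath` to `'/'` and reads `path.path`, whose exact format may differ. It assumes a settings id of the form `<bucket>:/<subfolder>` (the `:/` delimiter introduced by this PR); `listing_prefix`, `my-bucket`, and `foo/` are made-up names for illustration only.

```python
def listing_prefix(settings: dict, path: str) -> str:
    """Hypothetical mirror of the reviewed expression, with `path` as a plain string."""
    if path == '/' and settings.get('id'):
        # LHS: the storage-provider prefix, i.e. bucket-relative S3 key space.
        return settings['id'].split(':/')[1]
    # RHS: the user-facing path, i.e. relative to the virtual root.
    return path


settings = {'id': 'my-bucket:/foo/'}

# The root listing is sent to S3 under the real prefix...
print(listing_prefix(settings, '/'))      # foo/
# ...but a child folder is sent without it, so S3 is asked for '/bar/'
# rather than something under 'foo/'.
print(listing_prefix(settings, '/bar/'))  # /bar/
```

The two branches return values from different namespaces, which is why the result is only correct when the connected folder is the bucket root.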
I'm not 100% convinced I explained this correctly, so maybe an example would be better. On staging, connect a subfolder (for this example, let's call it `blemmo`) of a bucket to a node. Then fetch the WB metadata listing for the root folder (e.g. https://files.us.staging.osf.io/v1/resources/mst3k/providers/s3/?meta). Not only will `blemmo` show up in the file/folder path metadata, but you can also get the same response by appending the prefix name to the URL (https://files.us.staging.osf.io/v1/resources/mst3k/providers/s3/blemmo/?meta). That's weird.
WB file/folder metadata is defined relative to the storage provider prefix. We treat the prefix as the virtual root. If the prefix is the bucket root folder, then YAY, no extra work for us. If not, we need to do the path math to make sure our paths are adjusted properly.
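A minimal sketch of that path math, assuming the virtual root is held as a bucket-relative key prefix such as `foo/` (or `''` when the bucket root is connected). `to_s3_prefix` and `to_user_path` are hypothetical helpers, not existing WaterButler functions, and they work on plain strings rather than `WaterButlerPath` objects; the `blemmo/data.csv` key is made up.

```python
def to_s3_prefix(base_folder: str, user_path: str) -> str:
    """Map a user-facing path (relative to the virtual root) to an S3 key prefix.

    base_folder: bucket-relative prefix of the connected folder, e.g. 'foo/',
                 or '' when the bucket root is connected (assumed convention).
    user_path:   path as exposed in WB metadata, e.g. '/' or '/bar/'.
    """
    return base_folder + user_path.lstrip('/')


def to_user_path(base_folder: str, s3_key: str) -> str:
    """Map an S3 key returned by a listing back to a path under the virtual root."""
    if not s3_key.startswith(base_folder):
        raise ValueError(f'{s3_key!r} is outside the connected folder {base_folder!r}')
    return '/' + s3_key[len(base_folder):]


# Connected subfolder 'blemmo/': the root listing goes to S3 as 'blemmo/',
# and keys come back as '/'-rooted paths without the 'blemmo' segment.
print(to_s3_prefix('blemmo/', '/'))                # blemmo/
print(to_user_path('blemmo/', 'blemmo/data.csv'))  # /data.csv
```

With the bucket root connected (`base_folder == ''`) both helpers reduce to adding or stripping the leading slash, which is the "no extra work" case. With a subfolder, every outgoing path gets the prefix added and every listed key gets it stripped, so the prefix name never leaks into user-facing metadata. Keeping the trailing `/` on `base_folder` also stops a sibling such as `blemmo2/` from matching the `blemmo/` prefix.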
I think the thing we need to be looking at for this is the … The other thing I would recommend is to go ahead and do the …
More Path Fixes

* fix issues with relative paths
* fix tests for s3 subfolder improvements
* fix dest path for s3
* take `base_folder` as part of `__init__`
* fix mocking for tests

---------

Co-authored-by: John Tordoff <[email protected]>
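As a rough illustration of taking `base_folder` in `__init__`, here is a sketch that parses it once from the settings id using the `:/` delimiter. `S3SettingsSketch` is a hypothetical class, not the provider's actual constructor, and treating an id with no delimiter as a bare bucket name is an assumption made for illustration.

```python
class S3SettingsSketch:
    """Hypothetical stand-in for the provider's constructor logic.

    Assumes settings['id'] looks like '<bucket>:/<base folder>'; an empty
    base folder means the bucket root is connected.
    """

    DELIMITER = ':/'

    def __init__(self, settings: dict):
        bucket, sep, base_folder = settings['id'].partition(self.DELIMITER)
        if not sep:
            # No delimiter: treat the whole id as a bucket name (assumed legacy form).
            base_folder = ''
        if base_folder and not base_folder.endswith('/'):
            # Normalise so prefix concatenation yields 'foo/bar/' rather than 'foobar/'.
            base_folder += '/'
        self.bucket = bucket
        self.base_folder = base_folder


cfg = S3SettingsSketch({'id': 'my-bucket:/foo/'})
print(cfg.bucket, repr(cfg.base_folder))  # my-bucket 'foo/'
```

`str.partition` splits on the first `:/` only, which is safe because S3 bucket names cannot contain colons. Unlike `split(':/')[1]`, it does not raise `IndexError` when the delimiter is absent and does not truncate a folder name that itself contains `:/`.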
Purpose
Enables WB to support both bucket-root and subfolder-root configuration for S3.
Credit @Johnetordoff for all the work 👍
Project Notion: https://www.notion.so/cos/S3-improvements-476a5b07cb7e4f458e9cd4c77cfa03ec
OSF Part: CenterForOpenScience/osf.io#10416
Changes
… `:/` as the new delimiter for S3 bucket
DevOps Notes
Dev Notes
Here are all child PRs that have been merged into this feature branch.
… `:/` …
QA Notes
See the QA docs on the project Notion page.
Documentation
Update our developer docs for S3 (if any).
Side Effects
… `boto` and thus won't support new regions added by `boto3`.
Ticket
https://openscience.atlassian.net/browse/ENG-4624