Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENG-4624] [S3 Improvements] Project PR - Waterbutler Part #406

Merged
merged 5 commits into from
Aug 17, 2023

Conversation

cslzchen
Copy link
Contributor

@cslzchen cslzchen commented Jul 26, 2023

Purpose

Enables WB to support both bucket-root and subfolder-root configuration for S3.

Credit @Johnetordoff for all the work 👍

Project notion: https://www.notion.so/cos/S3-improvements-476a5b07cb7e4f458e9cd4c77cfa03ec
OSF Part: CenterForOpenScience/osf.io#10416

Changes

  • Enables WB to support both bucket-root and subfolder-root configuration for S3
  • Updated WB to support :/ as the new delimiter for S3 bucket
  • Added/Updated unit tests

DevOps Notes

  • Deployment requires downtime, must be deployed at the same time as OSF
  • For details, see our release playbook (TBD)

Dev Notes

Here are all child PRs that have been merged into this feature branch.

QA Notes

See QA docs in the project notion page

Documentation

Update our developers doc for S3 (if any)

Side Effects

  • WB still uses boto and thus won't support new regions added by boto3.

Ticket

https://openscience.atlassian.net/browse/ENG-4624

@coveralls
Copy link

coveralls commented Jul 26, 2023

Coverage Status

coverage: 88.925% (-0.09%) from 89.014% when pulling 7683994 on feature/s3-improvements into 2745780 on develop.

@cslzchen cslzchen changed the title [S3 Improvements] Reference PR: feature -> develop [ENG-4624] [S3 Improvements] Project PR - Waterbutler Part Aug 7, 2023
@cslzchen cslzchen marked this pull request as ready for review August 8, 2023 12:50
@mfraezz mfraezz requested a review from felliott August 8, 2023 17:07
Copy link
Member

@felliott felliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some confusion in the path math for metadata. It might need some adjustments. It'd probably also be good to add some tests for metadata listings when the prefix is a bucket subfolder.

@@ -685,7 +688,11 @@ async def _metadata_file(self, path, revision=None):
async def _metadata_folder(self, path):
await self._check_region()

params = {'prefix': path.path, 'delimiter': '/'}
# The user selected base folder, the root of the where that user's node is connected.
prefix = self.settings['id'].split(':/')[1] if path == '/' and self.settings.get('id') else path.path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure this is buggy. The two sides of this if statement are not equivalent. The LHS side is the storage provider prefix. While WB needs this to send meaningful queries to S3, it is not something that we (directly) expose to the user. The RHS is the file/folder path that we return to the user in the WB metadata responses. If the prefix is / i.e. not a subfolder, these two things end up being equivalent. But if the storage provider root folder is a subfolder, e.g. /foo/, then the LHS is /foo/ and the RHS is /bar/.

I'm not a 100% convinced I explained this correctly, so maybe an example would be better. On staging, connect a subfolder (for this example, lets call it blemmo) of a bucket to a node. Then fetch the WB metadata listing for the root folder (e.g. https://files.us.staging.osf.io/v1/resources/mst3k/providers/s3/?meta). Not only will blemmo show up in the file/folder path metadata, but you can get the same response by appending the prefix name to the url (https://files.us.staging.osf.io/v1/resources/mst3k/providers/s3/blemmo/?meta). That's weird.

WB file/folder metadata is defined relative to the storage provider prefix. We treat the prefix as the virtual root. If the prefix is the bucket root folder, then YAY, no extra work for us. If not, we need to do the path math to make sure our paths are adjusted properly.

@felliott
Copy link
Member

felliott commented Aug 9, 2023

I think the thing we need to be looking at for this is the prepend kwarg to WaterButlerPath. When building a WBPath object, the first arg is the relative path (/bar/baz.txt) and the prepend kwarg is the storage provider root (/foo/). Such a WBPath would represent the file /foo/bar/baz.txt on the storage provider, but it's path metadata will be reported as /bar/baz.txt since that is the part relative to the project root. See the owncloud provider for a reference. owncloud is both path-based and supports subfolder roots and is probably the closest thing to what you want to do here.

The other thing i would recommend is to go ahead and do the {bucket}:{storage_root} split in the init method. You can save the results in attributes and refer to them throughout the code.

More Path Fixes

* fix issues with relative paths
* fix tests for s3 subfolder improvements
* fix dest path for s3
* take base_folder as part of __init__
* fix mocking for tests

---------

Co-authored-by: John Tordoff <[email protected]>
@cslzchen
Copy link
Contributor Author

fyi, Johnny's response to CR #409 has been merged as 7683994

@cslzchen cslzchen merged commit d5dab36 into develop Aug 17, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants