Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support public s3 files automatically #67

Merged
merged 5 commits into from
Aug 15, 2024

Conversation

SeanLeRoy
Copy link
Contributor

@SeanLeRoy SeanLeRoy commented Aug 15, 2024

Link to Relevant Issue

This pull request resolves #61

Description of Changes

When reading from an S3 protocol our current file system handlers require either explicit credentials passed in or an explicit declaration of an "anonymous" reader. This change makes it so we try to read a file given the current fs_kwargs (used by the readers for S3) and if that fails we try to read again as an explicitly "anonymous" reader.

Additionally I made it so we only create a text file once and only install the dummy-plugin once because I felt running the tests was rather slow and this sped it up IMO.

Testing

Added a unit test, unable to get this to work the S3 file I was provided s3://allencell/aics/emt_timelapse_dataset/data/3500005548_25_all_cells_mask.ome.zarr

bioio-ome-zarr has unreleased changes, when using the newest version of bioio-ome-zarr that is unreleased it works, so I am working on releasing that now but should be GTG after that

@SeanLeRoy SeanLeRoy self-assigned this Aug 15, 2024
@SeanLeRoy SeanLeRoy force-pushed the feature/support-public-s3-files-automatically branch from fe11a3f to 8e2917a Compare August 15, 2024 17:26
@SeanLeRoy SeanLeRoy marked this pull request as ready for review August 15, 2024 21:41
@SeanLeRoy SeanLeRoy requested a review from a team as a code owner August 15, 2024 21:41
example_file = tmp_path / "temp-example.txt"
example_file.write_text("just some example text here")
return example_file
yield example_file
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to main changeset, this is just to make the tests faster by only creating the file once since they seemed slow

@pytest.fixture
def dummy_plugin() -> typing.Generator[str, None, None]:
with InstallPackage(package_path=DUMMY_PLUGIN_PATH, package_name=DUMMY_PLUGIN_NAME):
yield DUMMY_PLUGIN_NAME
Copy link
Contributor Author

@SeanLeRoy SeanLeRoy Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to main changeset, this is just to make the tests faster by only installing the plugin once and re-using this fixture since they seemed slow

image,
reader,
use_plugin_cache,
fs_kwargs | {"anon": True},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implication here, then, is that if you passed fs_kwargs={anon:True} in the initial bioImage constructor then s3 reads do work?

Copy link
Contributor Author

@SeanLeRoy SeanLeRoy Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah 👍 (and I believe were we to always pass True in the initial constructor then users that need to auth like for private buckets would fail, haven't tested that explicitly yet)

Copy link
Contributor

@toloudis toloudis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I do wonder if we can handle the exception at an even lower level to avoid needing to run determine_plugin twice, but it's probably not a big deal for performance.

@SeanLeRoy
Copy link
Contributor Author

bioio-ome-zarr at version 1.1.0 has been released and works with this :) it seems the publish workflow for bioio-ome-zarr was broken from this change bioio-devs/bioio-ome-zarr#26 so I fixed it with bioio-devs/bioio-ome-zarr@0f3976d and now we are all good

@SeanLeRoy
Copy link
Contributor Author

Looks good! I do wonder if we can handle the exception at an even lower level to avoid needing to run determine_plugin twice, but it's probably not a big deal for performance.

Yeah I couldn't find a better place other than inside each individual plug-in and I thought it would be nicer to take the hit here for S3 files rather than be forced to do in in each plugin. Totally down to try something else out though if someone has an idea of where this could go it does seem like a performance bummer to have to straight up retry the read.

@toloudis
Copy link
Contributor

Yeah I couldn't find a better place other than inside each individual plug-in and I thought it would be nicer to take the hit here for S3 files rather than be forced to do in in each plugin. Totally down to try something else out though if someone has an idea of where this could go it does seem like a performance bummer to have to straight up retry the read.

one thought: Can we tell if we need the anon key by checking early for "s3://" in the path and looking at what fs_kwargs were passed in? If anon is omitted, I assume the s3fs just looks for environment variables with credentials or aws credential files. So maybe it's hard for us to make a good guess.

Copy link
Contributor

@BrianWhitneyAI BrianWhitneyAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for fixing up the dummy reader stuff :)

@SeanLeRoy
Copy link
Contributor Author

I assume the s3fs just looks for environment variables with credentials or aws credential files. So maybe it's hard for us to make a good guess.

I believe this is true, I think it looks at the same cred file s3 CLI and boto3 look at under /.aws/credentials

@SeanLeRoy SeanLeRoy merged commit 3c6097a into main Aug 15, 2024
20 checks passed
@SeanLeRoy SeanLeRoy deleted the feature/support-public-s3-files-automatically branch August 15, 2024 23:58
@pgarrison
Copy link

Thanks @SeanLeRoy! So glad to see this taken care of

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read from public s3:// paths without authentication, without requiring any code change from users
4 participants