-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File sources for gdrive, gcs, onedata, basespace #12500
Conversation
Have also added a filesource for basespace. However, the python basespacesdk itself was last released in 2018 and the pyfilesystem2 plugin needed some patches for things to work. In addition, the basespacesdk appears to ignore the provided credentials and expect them in: '/home/.basespace/default.cfg'. That too requires a patch: basespace/basespace-python-sdk#32 Once configured though, it does work fine. One issue is that basespace returns a numeric id for the filename, and the actual filename as 'alias': https://github.com/emedgene/fs_basespace/blob/bb560444bdec3cbcbc5d66344f877bd823a4bfb6/fs_basespace/_basespacefs.py#L132 This makes it difficult to use in practice. @jmchilton Any thoughts on this? |
f7d58ce
to
1e9d5ce
Compare
1e9d5ce
to
6450d41
Compare
This should now be good to go. The cloudbridge update has been made and upstream libraries fixed. Onedata continues to have the installation and configuration issue, but this can in principle be done, even if the process is a bit tedious, so I'd propose we merge this anyway in the hope that those issues are resolved upstream at a later date. Adding oauth support to streamline the process of obtaining tokens etc. would also be a future enhancement. |
0f48e65
to
3ee1042
Compare
@nuwang good stuff! I tested out GCS/GDrive access, and had some success and some issues; preliminary thoughts: Google Drive:
GCS:
I will keep testing; please let me know if there is something I am missing as its entirely possible i've just borked up my configuration |
@luke-c-sargent Thanks for testing. The google drive issues is more of an upstream issue I guess: https://github.com/rkhwaja/fs.googledrivefs/blob/9568eb49a9084d4d9751eb27fb558ee77558cbfc/fs/googledrivefs/googledrivefs.py#L427 Regarding GCS, this will work if you plug in your OIDC credentials. However, there is an anonymous access mode for public buckets, let me see whether I can activate that, but we'll need an extra setting like: |
👍 issue filed. TL;DR: they could detect mimetype and change the download mechanism for google native items easily enough; the only considerations i can see are a) what type to export stuff to (e.g. sheets as csv or excel?) and b) what to do about files >10MB that
ahh i only used parameters from one additional note - when using an old token the refresh fails with:
you mentioned that this is preliminary work that requires a better OIDC solution ...is the refresh token a placeholder for future functionality or have i misconfig'd something again? |
3ee1042
to
4183fee
Compare
@luke-c-sargent Thanks for filing the issue, looks like it's already been fixed since! Have added back client_id, client_secret and token_uri, so that should now be specifiable along with the refresh token. Also, I've added an |
1afb02b
to
0f46edc
Compare
760e90c
to
2188b26
Compare
@mvdbeek Would you be able to give a quick once over on this? In particular, whether the regeneration of requirements looks ok? |
pyproject.toml
Outdated
@@ -116,3 +116,4 @@ testfixtures = "*" | |||
tuspy = "*" | |||
twill = "*" | |||
watchdog = "*" | |||
fs-gcsfs = "*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be a conditional requirement and not appear in pyproject.toml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I specify it as a conditional requirement only for tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd maybe do it the other way round and skip the test if the dependency can't be imported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are you suggesting not running the test by default? Since it's accessing a public bucket, the test can be run successfully, provided fs-gcsfs
is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ideally unit tests shouldn't contact external services. We can run release tests with the conditional dependencies installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, fs-gcsfs
appears to have been correctly included in dev-requirements, but not pinned requirements. Isn't that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though we already do that for some other tests, I guess that is just a personal preference. If you want to run this one then this is ok and we do need to add it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rolled back the always on gcsfs test in favour of skipping the test if the dependency is unavailable.
) | ||
|
||
|
||
@skip_if_no_basespace_access_token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added unit tests are all using the same test structure, can you parameterize them ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think they can be. It's something that's generally the case for many file source tests. Maybe a refactoring run that should be done outside of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you at least push _configured_file_sources
into the helper ? That is literally the same in all those tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have refactored the tests and helpers. Can you take a look?
2188b26
to
2be4b40
Compare
2be4b40
to
0924628
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, I really appreciate the test simplification
This PR partly addresses: #11784
It includes filesources for google drive, google cloud storage and onedata. While these work in varying degrees, they all have issues that hamper their use.
Google Drive
Google drive in particular requires an oauth2 flow to obtain user credentials. Without adding client side support to authorize access, these are extremely tedious for end-users to obtain.
Google Cloud Storage
Google cloud storage also requires oauth2 credentials, but presumably, these could be configured by admins as opposed to end-users. It should also work without credentials in Google cloud. However, there's a library clash with CloudVE/cloudbridge#275 which will be sorted out once a new version is released.
One Data
Onedata does not work because of: onedata/fs-onedatafs#5
Should it be resolved, the code should in principle work, but I've squashed it to one commit so we can drop it easily.
BaseSpace
See separate comment below.
Summary
There are concerns about the practical usability of all of these providers unfortunately. We should probably add some kind of oauth2 authorization mechanisms to smoothen the flow. That would roughly entail:
How to test the changes?
(Select all options that apply)
License