[catalystproject-latam, unam] Enable object storage #4214
I will pick this up today.
@jnywong Following the instructions in the docs, I have created a Google Group that will grant permission to write to a new persistent bucket from outside the hub: https://groups.google.com/u/1/a/2i2c.org/g/persistent-unam-writers. I have added you as an owner and request that you add the hub community champion to the group too (as Group Owner) so that they can add community members (as Group Members) without bottlenecking on 2i2c staff doing the work.
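For reference, a minimal sketch of what granting that group write access to the bucket could look like with gcloud. The bucket name is a placeholder, the group address is inferred from the URL above, and in practice 2i2c manages such IAM bindings through its infrastructure configuration rather than by hand.

```bash
# Hedged sketch: grant the Google Group write access to the persistent bucket.
# "gs://persistent-unam" is a placeholder bucket name; the group address is
# inferred from the Google Group URL above.
gcloud storage buckets add-iam-policy-binding gs://persistent-unam \
  --member="group:persistent-unam-writers@2i2c.org" \
  --role="roles/storage.objectAdmin"
```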
There is now a persistent bucket set up for the UNAM community. I think the remaining todos are:
Fab! I'll pick up those tasks. Thank you, Sarah.
I'm going to unassign myself and remove this from the engineering board - but feel free to pull me back in if something isn't working |
Issue with gcloud web app auth

Context

I am reproducing the steps in the LEAP documentation, specifically the section "Uploading large original data from an HPC system (no browser access on the system available)". I have verified that the method preceding this section, "Upload medium sized original data from your local machine", works, so I can confirm that the bucket is public and that I can write to it from my local machine. The issue, I think, becomes apparent if you look closer at the first command. The scopes allude to

Error message

gcloud storage ls $SCRATCH_BUCKET
ERROR: (gcloud.storage.ls) HTTPError 403: [email protected] does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist). This command is authenticated as [email protected] which is the active account specified by the [core/account] property.
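As a debugging aid for 403s like the one above (an editorial addition, not from the thread): check which account the HPC-side gcloud session is actually using and which OAuth scopes its access token carries. The tokeninfo endpoint is a standard Google OAuth2 endpoint.

```bash
# Hedged debugging sketch: confirm the active gcloud identity and the scopes
# attached to its access token. A token minted from inside the hub typically
# belongs to the hub's service account rather than to the individual user.
gcloud auth list
curl -s "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=$(gcloud auth print-access-token)"
```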
I'm back at work tomorrow, but I think the crux is the assumptions.
I'm not sure if we have guidance and documentation on: if, when, and how to provide users personal cloud account access, and what's the minimal access required to work against specific buckets, if that is the goal.
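As a point of reference on "minimal access" (an editorial addition, not from the thread): the narrowest commonly used grants for working against a specific bucket are bucket-scoped predefined roles rather than project-wide ones. The bucket and member below are placeholders.

```bash
# Hedged sketch: bucket-scoped predefined roles keep access minimal.
# BUCKET and the member address are placeholders.
gcloud storage buckets add-iam-policy-binding gs://BUCKET \
  --member="user:someone@example.org" \
  --role="roles/storage.objectViewer"   # read-only; use roles/storage.objectCreator
                                        # or roles/storage.objectAdmin for write access
```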
Yes, I am focused on this scenario!
Does anyone know what was done for the LEAP hub?
Hey @jbusecke! I was wondering if you know anything about the above comment? I'm working on generalising the wonderful LEAP documentation you have written for uploading large datasets from an HPC system for other 2i2c communities and ran into this issue with reproducing your workflow. Here is a preview of what I have written so far, with my issue arising in this section.
Hey @jnywong, as a matter of fact, one of our users also ran into this exact issue. These docs look great btw! Once these are done I should def link them in our docs! Unfortunately I have 0 clue how this is working behind the scenes 😩. I think @yuvipanda helped set some of this up originally, maybe he has better feedback.
Interesting feedback about how you have seen this issue replicated elsewhere! Thank you for your insights 🙏
We are tracking that issue here internally.
Thanks for that @jbusecke! Can you tell us whether this affects @suryadheeshjith only, or does it affect every hub user?
The temporary token approach works for all JupyterHub users, but the "large files" or "more than 60 minutes of access" approach only works for those with direct access to a cloud account. Currently, 2i2c engineers (as defined by a GCP group; @jnywong, I just added you there!) and Julius have such access to LEAP's GCP project. I think this kind of access has so far only been provided ad hoc by 2i2c to individual power users like Julius, and we haven't come up with a way to offer it sustainably to all users. Related
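For context on the "temporary token" approach mentioned above, here is a hedged sketch of the general pattern: mint a short-lived access token inside an authenticated hub session, then use it from the HPC system. The bucket and file names are placeholders, and this is not the exact LEAP recipe.

```bash
# Hedged sketch of the short-lived-token pattern, not the exact LEAP recipe.

# 1. Inside a hub session where gcloud is already authenticated:
gcloud auth print-access-token        # token is valid for roughly an hour

# 2. On the HPC system, paste the token and use it for a single upload via
#    the GCS JSON API. BUCKET and data.tar.gz are placeholders.
TOKEN="paste-token-here"
curl -X POST --data-binary @data.tar.gz \
  -H "Authorization: Bearer ${TOKEN}" \
  "https://storage.googleapis.com/upload/storage/v1/b/BUCKET/o?uploadType=media&name=data.tar.gz"
```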
I suspected as much! I can confirm the workflow is working as expected. @consideRatio, thank you for adding me to the GCP group; this makes it easier for me to investigate these issues myself in future. I will document that this is not a supported feature for everyone due to the 💰 💰 💰 involved.
The key thing isn't money for cloud resources; it's that we don't have a way of doing this that scales well with regard to security and maintenance burden (so it would currently cost a lot of 2i2c and community user time to handle this). The crux is that our JupyterHub users don't map to cloud provider users, which means we can't grant direct cloud permissions to individual JupyterHub users and are forced to create individual cloud accounts when needed. In practice, from the cloud provider's perspective, when JupyterHub users access the object storage, the access is made by the same cloud provider user/identity, and we haven't been giving out direct persistent access as that identity.
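To make the identity point concrete (an editorial illustration, not from the thread): inside a user server on a GKE-backed hub, the workload's effective identity can be queried from the metadata server, and every user's server will usually report the same service account.

```bash
# Hedged illustration: from inside any user's server on the hub, ask the GKE
# metadata server which identity the workload is running as. Every JupyterHub
# user typically sees the same service account email here, which is why the
# cloud provider cannot tell individual hub users apart.
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
```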
Thanks for the explanation, Erik. I will capture this insight for our Product board. Regarding cloud permissions, do you happen to know what the Google Group @sgibson91 mentioned above is for, then? Here are the relevant infrastructure docs.
We actually have a Google Group that I manage (and where I added Surya). Is that method defunct?
Context
The unam community will be participating in the upcoming CAMDA competition mid-July. It would be really great to set them up for success and capture this as a demo of how we can enable bioscience workflows for a Global South community.

Proposal
They had a scratch bucket set up recently, but would really benefit from a "LEAP" style method of data transfer with a persistent bucket in order to "input" data into the hub.
This relates to the workflow proposed in #4213
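To illustrate the intended flow (a sketch under assumptions, not the final setup): community members outside the hub write raw data into the persistent bucket, and hub users then pull it into their sessions. The bucket name and paths below are placeholders.

```bash
# Hedged sketch of the proposed data flow; bucket name and paths are placeholders.

# Outside the hub (HPC system or laptop), a member of the writers group uploads:
gcloud storage cp raw-data.tar.gz gs://persistent-unam/camda/

# Inside the hub, any user can then copy the data into their session:
gcloud storage cp gs://persistent-unam/camda/raw-data.tar.gz .
```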
Updates and actions