
[catalystproject-latam, unam] Enable object storage #4214

Closed
2 tasks done
Tracked by #226
jnywong opened this issue Jun 12, 2024 · 20 comments · Fixed by 2i2c-org/docs#236
@jnywong (Member) commented Jun 12, 2024

Context

The unam community will be participating in the upcoming CAMDA competition in mid-July. It would be great to set them up for success and to capture this as a demo of how we can enable bioscience workflows for a Global South community.

Proposal

They recently had a scratch bucket set up, but they would really benefit from a LEAP-style method of data transfer, with a persistent bucket for "inputting" data into the hub.

This relates to the workflow proposed in #4213

Updates and actions

  • Add the community champion to the google group so they can add their community members
  • Reuse any relevant info from LEAP documentation to guide the community on writing to this bucket from outside the hub
@sgibson91 sgibson91 self-assigned this Jun 13, 2024
@sgibson91 (Member)

I will pick this up today

@sgibson91 (Member) commented Jun 13, 2024

@jnywong Following the instructions in the docs, I have created a Google Group that grants permission to write to a new persistent bucket from outside the hub: https://groups.google.com/u/1/a/2i2c.org/g/persistent-unam-writers

I have added you as an owner, and I ask that you also add the hub community champion to the group (as a Group Owner) so that they can add community members (as Group Members) themselves, without bottlenecking on 2i2c staff to do the work.
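
For reference, granting that group write access to a bucket comes down to a single bucket-level IAM binding. The command below is only a sketch: the bucket name is a placeholder and the exact role 2i2c assigns may differ.

# Sketch: grant the Google Group write access to the persistent bucket.
# <persistent-bucket-name> is a placeholder; the role actually used may differ.
gcloud storage buckets add-iam-policy-binding gs://<persistent-bucket-name> \
  --member="group:persistent-unam-writers@2i2c.org" \
  --role="roles/storage.objectAdmin"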

@sgibson91 (Member) commented Jun 13, 2024

There is now a persistent bucket set up for the UNAM community.

[Screenshot 2024-06-13 at 14:20:31]

I think the remaining to-dos are:

  • Add the community champion to the google group so they can add their community members
  • Reuse any relevant info from LEAP documentation to guide the community on writing to this bucket from outside the hub

@jnywong (Member Author) commented Jun 13, 2024

Fab! I'll pick up those tasks. Thank you Sarah ☺️

@sgibson91 (Member) commented Jun 13, 2024

I'm going to unassign myself and remove this from the engineering board - but feel free to pull me back in if something isn't working

@jnywong (Member Author) commented Jun 24, 2024

Issue with gcloud web app auth

Context

I am reproducing the steps in the LEAP documentation, specifically the section Uploading large original data from an HPC system (no browser access on the system available).

I have verified that the method in the preceding section, Upload medium sized original data from your local machine, works, so I can confirm that the bucket is public and that I can write to it from my local machine.

I think the issue becomes apparent if you look closely at the first command:

gcloud auth application-default login --scopes=https://www.googleapis.com/auth/devstorage.read_write,https://www.googleapis.com/auth/iam.test --no-browser

The scopes include iam.test, so I suspect there are specific IAM roles that need to be enabled for this to work.

Error message

gcloud storage ls $SCRATCH_BUCKET
ERROR: (gcloud.storage.ls) HTTPError 403: [email protected] does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist). This command is authenticated as [email protected] which is the active account specified by the [core/account] property.
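
Some diagnostics that may help narrow this down (a sketch; inspecting the bucket's IAM policy requires storage.buckets.getIamPolicy, which my account may not hold):

# Which account is gcloud using for these requests?
gcloud auth list
gcloud config get-value account

# Is that account (or a group containing it) in the bucket's IAM policy?
gcloud storage buckets get-iam-policy $SCRATCH_BUCKET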

@jnywong jnywong self-assigned this Jun 24, 2024
@consideRatio (Member)

I'm back at work tomorrow, but I think the crux lies in the assumptions.

  • JupyterHub users on their user servers have credentials that can be used to work against the bucket, and temporary credentials can be extracted from there to a local computer for use for up to an hour (a rough sketch of this is included after this list). The LEAP docs say this:

    For medium sized datasets, that can be uploaded within an hour, you can use a temporary access token generated on the JupyterHub to upload data to the cloud.

  • The "upload large files" category is probably the one where this "temporary" part becomes a problem, because the upload takes longer than an hour. This strategy relies on something out of the ordinary with regards to cloud permissions: personal Google account permissions set up against the cloud account.

    Due to this, for this procedure to work, you need to manually add permissions to your personal Google Cloud account ahead of time.
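
A rough illustration of the temporary-token route mentioned in the first bullet; this is a sketch rather than the exact commands from the LEAP docs, and the bucket name and --access-token-file usage are assumptions:

# On the JupyterHub server, where cloud credentials are already available,
# print a short-lived (roughly one hour) access token:
gcloud auth application-default print-access-token > token.txt

# Copy token.txt to the local/HPC machine and use it before it expires
# (bucket name is a placeholder):
gcloud storage cp ./large-file.nc gs://<persistent-bucket-name>/ --access-token-file=token.txt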

@consideRatio (Member)

I'm not sure if we have guidance and documentation on if, when, and how to provide users with personal cloud account access, and what's the minimal access required to work against specific buckets, if that is the goal.

@jnywong (Member Author) commented Jun 24, 2024

The "upload large files" etc category is probably one where this "temporary" part becomes a problem and the upload takes longer than an hour. This strategy relies on something out of the ordinary with regards to cloud permissions -- personal google account permissions setup against the cloud account.

Yes, I am focused on this scenario!

I'm not sure if we have guidance and documentation on if, when, and how to provide users with personal cloud account access, and what's the minimal access required to work against specific buckets, if that is the goal.

Does anyone know what was done for the LEAP hub?

@jnywong (Member Author) commented Jun 24, 2024

Hey @jbusecke ! I was wondering if you know anything about the above comment?

I'm working on generalising the wonderful LEAP documentation you have written on uploading large datasets from an HPC system, so that other 2i2c communities can use it, and I ran into this issue while reproducing your workflow.

Here is a preview of what I have written so far, with my issue arising in this section.

@jbusecke (Contributor)

Hey @jnywong, as a matter of fact one of our users also ran into this exact issue.

These docs look great btw! Once these are done I should def link them in our docs!

Unfortunately I have zero clue how this works behind the scenes 😩. I think @yuvipanda helped set some of this up originally; maybe he has better feedback.

@jnywong (Member Author) commented Jun 24, 2024

Interesting to hear that you have seen this issue elsewhere too! Thank you for your insights 🙏

@jbusecke (Contributor)

We are tracking that issue here internally
cc @suryadheeshjith

@jnywong (Member Author) commented Jun 25, 2024

Thanks for that @jbusecke !

Can you tell us whether this affects @suryadheeshjith only, or is this affecting every hub user?

@consideRatio (Member)

Can you tell us whether this affects @suryadheeshjith only, or is this affecting every hub user?

The temporary token approach works for all JupyterHub users, but the "large files" or "more than 60 minutes of access" approach only works for those with direct access to a cloud account. Currently, 2i2c engineers (as defined by a GCP group; @jnywong, I just added you there!) and Julius have such access to LEAP's GCP project.

I think this kind of access has only been provided ad hoc by 2i2c to individual power users like Julius, and we haven't come up with a way to do it sustainably for all users.

Related
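
For context, such an ad-hoc grant is essentially a single IAM binding; this is a hypothetical example (placeholder project ID, user email, and role), not the exact grant used for LEAP:

# Hypothetical per-user grant; the real role and scope used for LEAP may differ.
gcloud projects add-iam-policy-binding <leap-project-id> \
  --member="user:<power-user-email>" \
  --role="roles/storage.objectAdmin"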

@jnywong (Member Author) commented Jun 25, 2024

I suspected as much! I can confirm the workflow is working as expected. @consideRatio, thank you for adding me to the GCP group; this makes it easier for me to investigate these issues myself in future.

I will document that this is not a supported feature for everyone due to the 💰 💰 💰 involved.

@consideRatio (Member) commented Jun 25, 2024

not a supported feature for everyone due to the 💰 💰 💰 involved.

The key thing isn't money for cloud resources; it's just that we don't have a way of doing this that scales well with regards to security and maintenance burden (so it would currently cost a lot of 2i2c and community users' time to handle this).

The crux is that our "jupyterhub users" aren't associated with "cloud provider users", so we aren't able to grant direct cloud permissions to individual JupyterHub users and are forced to create individual cloud accounts when needed. In practice, from the perspective of the cloud provider, when JupyterHub users access the object storage, the access is made from the same cloud provider user/identity, and we haven't been giving out direct persistent access to that.

@jnywong (Member Author) commented Jun 25, 2024

Thanks for the explanation Erik, I will capture this insight for our Product board.

Regarding cloud permissions, do you happen to know what the Google Group @sgibson91 mentioned above is for then? Here are the relevant infrastructure docs.

@jbusecke (Contributor)

The temporary token approach works for all JupyterHub users, but the "large files" or "more than 60 minutes of access" approach only works for those with direct access to a cloud account. Currently, 2i2c engineers (as defined by a GCP group; @jnywong, I just added you there!) and Julius have such access to LEAP's GCP project.

We actually have a Google Group that I manage (and to which I added Surya). Is that method defunct?

@jnywong (Member Author) commented Jun 25, 2024

@jbusecke it doesn't seem to be working as expected. #4281 will investigate 👍
