
Supporting RAPIDS libraries as part of the images #37

Closed
chinmaychandak opened this issue Jan 15, 2021 · 9 comments

Comments

@chinmaychandak

chinmaychandak commented Jan 15, 2021

Thank you for such an amazing project!

I want to use JupyterHub on Kubernetes to serve JupyterLab workspaces to multiple users. All users must be able to use the NVIDIA RAPIDS suite of GPU-accelerated Python Data Science libraries out-of-the-box.

However, none of the available Jupyter Docker Stacks provide RAPIDS yet.

I had a couple of questions:

  1. Does the current gpu-jupyter DockerHub image work with JupyterHub out-of-the-box? I mean, if I just provide gpu-jupyter as a base image in the JupyterHub Helm config, will it work automatically? [EDIT: I was able to do this successfully! :)]
  2. Can someone help in adding RAPIDS to the existing gpu-jupyter image, or direct me as to how I can build a custom image myself?

Any help would be greatly appreciated!

@ChristophSchranz
Collaborator

Hi @chinmaychandak,

> I want to use JupyterHub on Kubernetes to serve JupyterLab workspaces to multiple users. All users must be able to use the NVIDIA RAPIDS suite of GPU-accelerated Python Data Science libraries out-of-the-box.
>
> However, none of the available Jupyter Docker Stacks provide RAPIDS yet.

Yes, RAPIDS as well as JupyterHub would definitely be nice here.

> I had a couple of questions:
>
> 1. Does the current [gpu-jupyter DockerHub image](https://hub.docker.com/r/cschranz/gpu-jupyter) work with JupyterHub out-of-the-box? I mean, if I just provide gpu-jupyter as a base image in the JupyterHub Helm config, will it work automatically? [EDIT: I was able to do this successfully! :)]

Cool, I gave up on setting this up myself. Would you like to share your solution? Is it possible to switch between JupyterLab and JupyterHub, similar to switching between JupyterLab and Jupyter Notebook? Ideally, one could select the preferred interface.

> 2. Can someone help in adding RAPIDS to the existing gpu-jupyter image, or direct me as to how I can build a custom image myself?

Maybe the instructions in RAPIDS' Dockerfile help. They build on the same base image, so it could work. Have you thought about a way to optionally select RAPIDS on top of the basic packages? It could be done in generate_Dockerfile.sh as an optional parameter, e.g., generate_Dockerfile.sh --include RAPIDS, which appends the RAPIDS installation steps to .build/Dockerfile.
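To make the optional-include idea more concrete, here is a minimal Python sketch of the concatenation logic such a flag could drive. It is not the project's generate_Dockerfile.sh: it assumes the Dockerfile is assembled from fragment files in a src/ directory, and the fragment names Dockerfile.header and Dockerfile.rapids, the OPTIONAL_FRAGMENTS mapping, and the generate_dockerfile helper are all hypothetical.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical fragment files, concatenated in this order; the real repository's
# layout may differ.
BASE_FRAGMENTS = ["Dockerfile.header", "Dockerfile.gpulibs"]
# Hypothetical opt-in components, selectable with --include NAME.
OPTIONAL_FRAGMENTS = {"RAPIDS": "Dockerfile.rapids"}


def generate_dockerfile(src_dir: Path, out_path: Path, include: Sequence[str] = ()) -> str:
    """Write the base fragments plus any requested optional fragments to out_path."""
    unknown = [name for name in include if name not in OPTIONAL_FRAGMENTS]
    if unknown:
        raise ValueError(f"unknown optional component(s): {unknown}")
    names = BASE_FRAGMENTS + [OPTIONAL_FRAGMENTS[name] for name in include]
    text = "\n".join((src_dir / name).read_text() for name in names)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text)
    return text


def main(argv: Optional[Sequence[str]] = None) -> None:
    """Command-line entry point, e.g. main(["--include", "RAPIDS"])."""
    parser = argparse.ArgumentParser(description="Assemble .build/Dockerfile from src/ fragments")
    parser.add_argument("--include", action="append", default=[],
                        choices=sorted(OPTIONAL_FRAGMENTS),
                        help="optional component to append (repeatable)")
    args = parser.parse_args(argv)
    generate_dockerfile(Path("src"), Path(".build/Dockerfile"), args.include)
```

Keeping each optional component in its own fragment means the default build is unchanged unless someone explicitly opts in.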

By the way, in issue #27, @mathematicalmichael and I brainstormed how to set up GPU-Jupyter so that users can individually select their existing versions and desired software. We have thought about an interface similar to this one on RAPIDS. Maybe advanced packages like JupyterHub and RAPIDS could be managed comfortably through such an interface.

@chinmaychandak
Author

Thank you so much for responding, @ChristophSchranz!

> Yes, RAPIDS as well as JupyterHub would definitely be nice here.

Brilliant!

> Would you like to share your solution?

I wrote a custom Helm chart config to deploy JupyterHub on K8s here. It uses gpu-jupyter as the base image when spawning new Jupyter workspaces for new users. It also installs RAPIDS, though currently in a hacky way. I am going to work on creating a dedicated RAPIDS Docker stack for JupyterHub.

If one has a GPU K8s cluster running, deploying this Helm Chart is literally a single command, and it sets up everything nicely.

> Is it possible to switch between JupyterLab and JupyterHub, similar to switching between JupyterLab and Jupyter Notebook? Ideally, one could select the preferred interface.

Yes, definitely! This is already built in: users can just change the URL to /tree or /lab to switch between the Jupyter Notebook and JupyterLab interfaces in any spawned workspace.
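As a small illustration of that URL switch, the sketch below builds a user's workspace URL for either interface. It assumes JupyterHub's default routing, where each user's server lives under /user/<name>/; the hub address and the workspace_url helper are hypothetical.

```python
from urllib.parse import quote


def workspace_url(hub_url: str, username: str, interface: str = "lab") -> str:
    """Build the URL of a user's single-user server for the chosen interface.

    Assumes the default JupyterHub routing /user/<name>/; "lab" opens JupyterLab
    and "tree" opens the classic Jupyter Notebook file browser.
    """
    if interface not in ("lab", "tree"):
        raise ValueError("interface must be 'lab' or 'tree'")
    return f"{hub_url.rstrip('/')}/user/{quote(username, safe='')}/{interface}"


print(workspace_url("https://hub.example.com", "alice"))          # https://hub.example.com/user/alice/lab
print(workspace_url("https://hub.example.com/", "alice", "tree"))  # https://hub.example.com/user/alice/tree
```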

> They build on the same base image, so it could work.

Yes, exactly! Going to work on that this week. :)

> Have you thought about a way to optionally select RAPIDS on top of the basic packages? It could be done in generate_Dockerfile.sh as an optional parameter, e.g., generate_Dockerfile.sh --include RAPIDS, which appends the RAPIDS installation steps to .build/Dockerfile.

Yes, that is indeed an idea! We could just add this to .build/Dockerfile, maybe?

> Maybe advanced packages like JupyterHub and RAPIDS could be managed comfortably through such an interface.

Yes, JupyterHub can definitely help there. In the Helm config I linked above, see the profiles section, where different base images can be used to spawn Jupyter workspaces. I use RAPIDS in the "RAPIDS Profile" and Spark in the "Spark-ML-ETL Profile"; users can choose which profile they want when they first log into JupyterHub.
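For readers unfamiliar with profiles: KubeSpawner exposes them through its profile_list option (singleuser.profileList in the Zero to JupyterHub Helm chart), where each entry has a display_name, an optional description, and a kubespawner_override that can swap the image. Below is a minimal Python sketch of that structure. The image names and tags are placeholders, not the ones in the linked config, and image_for_profile and default_profile are hypothetical helpers showing how a choice maps to an image.

```python
# Same shape as KubeSpawner's profile_list; in jupyterhub_config.py it would be
# assigned with c.KubeSpawner.profile_list = PROFILE_LIST.
# Image names and tags are placeholders.
PROFILE_LIST = [
    {
        "display_name": "RAPIDS Profile",
        "description": "GPU workspace with the RAPIDS libraries",
        "default": True,
        "kubespawner_override": {"image": "rapidsai/rapidsai:PLACEHOLDER_TAG"},
    },
    {
        "display_name": "Spark-ML-ETL Profile",
        "description": "Workspace with Spark for ML and ETL jobs",
        "kubespawner_override": {"image": "jupyter/pyspark-notebook:PLACEHOLDER_TAG"},
    },
]


def image_for_profile(display_name: str, profiles: list = PROFILE_LIST) -> str:
    """Hypothetical helper: return the image a given profile would spawn."""
    for profile in profiles:
        if profile["display_name"] == display_name:
            return profile["kubespawner_override"]["image"]
    raise KeyError(f"no profile named {display_name!r}")


def default_profile(profiles: list = PROFILE_LIST) -> str:
    """Hypothetical helper: the profile marked default, else the first entry."""
    for profile in profiles:
        if profile.get("default"):
            return profile["display_name"]
    return profiles[0]["display_name"]
```

Because the override only swaps the image, the Hub itself stays the same no matter which stack a user picks.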

@ChristophSchranz
Collaborator

Okay, so this is a single-user mode of JupyterHub, right? How can we use that for multiple users on a single notebook?

> Yes, that is indeed an idea! We could just add this to .build/Dockerfile, maybe?

The file .build/Dockerfile is generated, so modifications should be made in src/. For RAPIDS, I would suggest src/Dockerfile.gpulibs.

> Yes, exactly! Going to work on that this week. :)

Great! I'm looking forward to the results :)

@mathematicalmichael
Contributor

I didn't realize RAPIDS built on the same base image! Thanks for looking into it!

It does look like conda installs are an easy way to get it working in this environment. I think that's a low-friction approach and am curious how it'll work!

@mathematicalmichael
Contributor

Also, a note on architecture: containerized JupyterHub with DockerSpawner needs no knowledge of GPU libraries to work. It is only responsible for authentication and for selecting which image to spawn. This project provides images that are compatible with what the Hub expects. I would fully expect JupyterHub + DockerSpawner to work as a multi-user application regardless of which libraries are added to the base images, GPU-based or not.
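One way to make "compatible with what the Hub expects" concrete: the Hub mainly needs the image to be able to start a JupyterHub single-user server, which in practice means the jupyterhub package (ideally in a version compatible with the Hub's) and the jupyterhub-singleuser command. The Python sketch below, meant to be run inside a candidate image, checks only those two things. This is an illustrative assumption about what "compatible" means, not a check DockerSpawner itself performs, and the function names are hypothetical. Note that it never inspects GPU libraries.

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_jupyterhub_version() -> Optional[str]:
    """Version of the jupyterhub package in this environment, or None if absent."""
    try:
        return metadata.version("jupyterhub")
    except metadata.PackageNotFoundError:
        return None


def hub_compatibility_report() -> dict:
    """Check only what the Hub needs from an image; GPU libraries are never inspected."""
    return {
        "jupyterhub_version": installed_jupyterhub_version(),
        "singleuser_on_path": shutil.which("jupyterhub-singleuser") is not None,
    }


def is_hub_compatible() -> bool:
    """True if jupyterhub is installed and jupyterhub-singleuser is on PATH."""
    report = hub_compatibility_report()
    return report["jupyterhub_version"] is not None and report["singleuser_on_path"]
```

Comparing the reported version with the Hub's own jupyterhub version is left to the reader, since how strictly they must match depends on the releases involved.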

@chinmaychandak
Author

> Okay, so this is a single-user mode of JupyterHub, right? How can we use that for multiple users on a single notebook?

I'm not sure I understand. Every time a new user logs into JupyterHub, they get their own personal JupyterLab (and Notebook) workspace that persists across sessions. Even their conda environments can be preserved across sessions. Every such workspace comes with all the libraries gpu-jupyter installs, as well as RAPIDS, which I currently install via conda as part of a lifecycleHook.
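For context, a lifecycle hook of this kind is a Kubernetes container postStart hook; KubeSpawner accepts it through its lifecycle_hooks option (singleuser.lifecycleHooks in the Zero to JupyterHub chart). The Python sketch below builds such a hook around a conda install command. The channels and package pins are assumptions based on RAPIDS' conda instructions from around this time, the versions in the usage line are placeholders, and rapids_post_start_hook is a hypothetical helper rather than what the linked config actually contains.

```python
import shlex


def rapids_post_start_hook(rapids_version: str, python_version: str, cuda_version: str) -> dict:
    """Build a Kubernetes postStart hook that conda-installs RAPIDS when the pod starts.

    The dict follows the Kubernetes Lifecycle schema (postStart.exec.command).
    Channels and pins are assumptions; check RAPIDS' current install guide.
    """
    conda_cmd = [
        "conda", "install", "--yes",
        "-c", "rapidsai", "-c", "nvidia", "-c", "conda-forge",
        f"rapids={rapids_version}",
        f"python={python_version}",     # should match the image's existing Python
        f"cudatoolkit={cuda_version}",  # should match the base image's CUDA version
    ]
    # Run through sh -c so the whole command is a single exec entry.
    return {"postStart": {"exec": {"command": ["sh", "-c", shlex.join(conda_cmd)]}}}


hook = rapids_post_start_hook("0.17", "3.8", "11.0")  # placeholder versions
print(hook["postStart"]["exec"]["command"][2])
# conda install --yes -c rapidsai -c nvidia -c conda-forge rapids=0.17 python=3.8 cudatoolkit=11.0
```

A postStart hook runs on every pod start, so the install is repeated each time a workspace starts, and Kubernetes does not guarantee the hook finishes before the container's entrypoint runs. Both are reasons to prefer baking RAPIDS into a dedicated image.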

If users need to access each other's workspaces to collaborate, we can always mount an additional persistent volume at a shared location on the K8s cluster.
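A shared mount of that kind can be expressed with KubeSpawner's volumes and volume_mounts options (singleuser.storage.extraVolumes and extraVolumeMounts in the Zero to JupyterHub chart), which take entries using the Kubernetes Volume and VolumeMount field names. The Python sketch below builds those entries for one existing PersistentVolumeClaim. The claim name, volume name, and shared_volume_config helper are hypothetical, and the default mount path assumes the jovyan home directory of the Jupyter Docker Stacks images.

```python
def shared_volume_config(claim_name: str,
                         mount_path: str = "/home/jovyan/shared",
                         volume_name: str = "shared-data") -> dict:
    """Return volumes/volume_mounts entries that mount one existing PVC into every user pod.

    claim_name must refer to a PersistentVolumeClaim that already exists in the
    namespace where the user pods run.
    """
    return {
        "volumes": [
            {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}},
        ],
        "volume_mounts": [
            # The mount refers to the volume above by name.
            {"name": volume_name, "mountPath": mount_path},
        ],
    }


config = shared_volume_config("shared-pvc")  # hypothetical claim name
print(config["volume_mounts"])  # [{'name': 'shared-data', 'mountPath': '/home/jovyan/shared'}]
```

Since user pods may be scheduled on different nodes, the claim typically needs a storage class that supports the ReadWriteMany access mode (for example, NFS-backed storage).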

> It does look like conda installs are an easy way to get it working in this environment. I think that's a low-friction approach and am curious how it'll work!

Yes, I agree. As I mentioned above, I did get it working using this Helm config, but I think a dedicated RAPIDS Jupyter Docker stack would be really helpful, and that is what I will be working on soon. :)

@mathematicalmichael
Contributor

@chinmaychandak I haven't had time to look into this, so apologies for the perhaps obvious questions, but AFAIK the RAPIDS containers come with Jupyter Notebook/Lab plus a bunch of common libraries. What's missing from their stack that you have in mind?

I think these pre-made stacks are helpful for setting up a dev environment, but I still advocate for minimal custom builds when it comes time to ship.

@chinmaychandak
Author

An update from my side:

With a few tweaks, I was able to get the existing RAPIDS runtime DockerHub images to work with JupyterHub on a K8s cluster. :)

I've published the Helm chart config I was able to use here.

RAPIDS will also be publishing a Medium blog post on how users can get started with RAPIDS on JupyterHub. Hopefully that will help some folks in the future!

Thanks a lot everyone for all the help here! Really appreciate it.

@ChristophSchranz
Collaborator

Closing, as there has been no contribution for a while.
