
JupyterHub Binder deployment strategies on AWS

Bhavya Kandimalla edited this page Sep 30, 2019 · 32 revisions

Notes for deploying JupyterHub/Binder on AWS

JupyterHub Binder

Binder

What is JupyterHub and Choosing a Distribution for Deployment on AWS

(https://jupyter.org/hub ; https://tljh.jupyter.org/en/latest/topic/whentouse.html#topic-whentouse)

An overview of JupyterHub and its key features can be found at the first link above.

Information about the JupyterHub distributions, and how to choose between them, can be found at the first and second links above.

There are two distributions, Kubernetes and The Littlest JupyterHub:

  1. Kubernetes -
  • allows JupyterHub to scale to many thousands of users
  • can flexibly grow/shrink the size of resources it needs
  • uses container technology (Docker) in administering user sessions
  • allows users to interact with a computing environment through a webpage, which makes it easy to provide and standardize the computing environment for a group of people
  2. Littlest -
  • also known as The Littlest JupyterHub (TLJH)
  • an opinionated and pre-configured distribution to deploy a JupyterHub on a single virtual machine (in the cloud or on your own hardware)
  • designed to be a more lightweight and maintainable solution for use-cases where size, scalability, and cost-savings are not a huge concern
  • distribution for a small (0-100) number of users

Although we are testing with 1-5 users, we chose the Kubernetes deployment because it spreads users across a cluster of smaller machines that can be scaled up or down, and we need to be able to run containers (Docker or Singularity). It will also let us add users as needed.


1. Select an Amazon Machine Image (AMI) and an instance type

open ports 80 (HTTP), 443 (HTTPS), and 22 (SSH) in the instance's security group
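The same ports can also be opened from the command line. The sketch below only prints one AWS CLI `authorize-security-group-ingress` call per port so the commands can be reviewed before running them; it assumes the AWS CLI is installed and configured, and `sg_ingress_cmds`, the security group ID, and the CIDR ranges are hypothetical placeholders.

```shell
# Print (do not run) one AWS CLI ingress rule per TCP port.
# Arguments: security group ID, source CIDR, then one or more port numbers.
# sg_ingress_cmds is a hypothetical helper name.
sg_ingress_cmds() {
  local sg_id="$1" cidr="$2" port
  shift 2
  for port in "$@"; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
      "$sg_id" "$port" "$cidr"
  done
}
```

For example, `sg_ingress_cmds sg-0123456789abcdef0 0.0.0.0/0 80 443 | sh` would open HTTP and HTTPS to everyone, while SSH is safer restricted to your own address, e.g. `sg_ingress_cmds sg-0123456789abcdef0 203.0.113.10/32 22` (both IDs/addresses are placeholders).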

2. Extend the AWS instance's EBS volume size

reference: https://hackernoon.com/tutorial-how-to-extend-aws-ebs-volumes-with-no-downtime-ec7d9e82426e

a. log in to the AWS console
b. choose "EC2" from the services list
c. click on "Volumes" under the ELASTIC BLOCK STORE menu (on the left)
d. choose the volume to resize, right-click it, and select "Modify Volume"
e. set the new size for the volume
    `# extended from 8GB to 50GB`
    `# need at least ~15-20GB`
f. click on "Modify"
g. make sure the partition is extended
    `lsblk`
    -OR-
    `df -h`
    `# if the partition is not extended, see reference`
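If `lsblk` shows the disk larger than its partition, the partition and then the filesystem have to be grown. A minimal sketch that prints those two commands, assuming an ext4 root filesystem (Ubuntu's default) and the `growpart` tool from the `cloud-guest-utils` package; the device name is whatever `lsblk` reports (for example `xvda`, or `nvme0n1` on instances that expose EBS as NVMe), and `grow_partition_cmds` is a hypothetical helper name.

```shell
# Print the commands that grow partition N of DISK to fill the disk,
# then resize the ext4 filesystem on that partition.
# Arguments: disk name as shown by lsblk (e.g. xvda, nvme0n1) and partition number (e.g. 1).
grow_partition_cmds() {
  local disk="$1" part_num="$2" part
  # Disk names ending in a digit use a "p" separator (nvme0n1 -> nvme0n1p1); others do not (xvda -> xvda1)
  case "$disk" in
    *[0-9]) part="${disk}p${part_num}" ;;
    *)      part="${disk}${part_num}" ;;
  esac
  echo "sudo growpart /dev/${disk} ${part_num}"
  echo "sudo resize2fs /dev/${part}"
}
```

After running the printed commands, `df -h` should show the new size. An XFS root filesystem would need `xfs_growfs` instead of `resize2fs`.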


3. Set up Let's Encrypt & nginx
    reference: https://github.com/dandi/infrastructure/wiki/Girder-setup-on-aws

    install prerequisites:
        apt-get update && apt-get upgrade -y    # update package list and upgrade installed packages
        apt-get install -y git python3.7 python3-setuptools python3-pip nginx vim fail2ban

    set up nginx:
        vim /etc/nginx/sites-enabled/hub.dandiarchive.org    # create the site file

     edit the nginx site file (the lines marked "# managed by Certbot" are added automatically by `certbot --nginx` in the Let's Encrypt step):
         reference: https://jupyterhub.readthedocs.io/en/stable/reference/config-proxy.html
    
         # top-level http config for websocket headers
         # If Upgrade is defined, Connection = upgrade
         # If Upgrade is empty, Connection = close
         map $http_upgrade $connection_upgrade {
             default upgrade;
             ''      close;
         }
    
         server {
             server_name    hub.dandiarchive.org;
             location / {
                 # proxy_pass http://localhost:8080/;
                 proxy_pass http://localhost:8000/;
                 proxy_set_header Host $host;
                 proxy_set_header X-Real-IP $remote_addr;
                 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

                 # websocket headers
                 proxy_set_header Upgrade $http_upgrade;
                 proxy_set_header Connection $connection_upgrade;
             }
    
             listen 443 ssl; # managed by Certbot
             ssl_certificate /etc/letsencrypt/live/hub.dandiarchive.org/fullchain.pem; # managed by Certbot
             ssl_certificate_key /etc/letsencrypt/live/hub.dandiarchive.org/privkey.pem; # managed by Certbot
             include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
             ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
             
             ssl_session_cache shared:SSL:50m;
             ssl_stapling on;
             ssl_stapling_verify on;
             add_header Strict-Transport-Security max-age=15768000;
         }
    
         server {
             if ($host = hub.dandiarchive.org) {
                 return 301 https://$host$request_uri;
             } # managed by Certbot
    
             listen 80;
             server_name    hub.dandiarchive.org;
             return 404; # managed by Certbot
         }
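JupyterHub's notebook kernels talk to the browser over websockets, so a site file missing the upgrade headers can serve the hub pages while kernel connections fail. A small pre-flight check, meant to run before `nginx -t`; it is a sketch that only does literal string matching (`grep -F`), so it assumes the directives are written exactly as in the file above and that the `map` block lives in this site file rather than in `nginx.conf`. `check_jupyterhub_proxy_conf` is a hypothetical helper name.

```shell
# Check that an nginx site file contains the proxy directives JupyterHub relies on.
# Prints each missing directive to stderr; returns 0 only if all are present.
check_jupyterhub_proxy_conf() {
  local conf="$1" missing=0 directive
  while IFS= read -r directive; do
    # -F: match literally, so "$" is not treated as a regex anchor
    if ! grep -qF -- "$directive" "$conf"; then
      echo "missing: $directive" >&2
      missing=1
    fi
  done <<'EOF'
map $http_upgrade $connection_upgrade
proxy_set_header Host $host
proxy_set_header Upgrade $http_upgrade
proxy_set_header Connection $connection_upgrade
EOF
  return "$missing"
}
```

Usage: `check_jupyterhub_proxy_conf /etc/nginx/sites-enabled/hub.dandiarchive.org && nginx -t`.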
    

    restart nginx:
        nginx -t                 # test nginx configuration
        service nginx restart    # restart nginx
        service nginx status     # check nginx status

    set up Let's Encrypt:
        apt-get install -y software-properties-common
        add-apt-repository universe
        add-apt-repository -y ppa:certbot/certbot
        apt-get update
        apt-get install -y certbot python-certbot-nginx
        certbot --nginx          # obtain a certificate and let certbot add the SSL lines to the nginx site file
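Let's Encrypt certificates are valid for 90 days; the certbot package normally schedules automatic renewal, and `certbot renew --dry-run` tests that renewal works. To keep an eye on expiry, a sketch that turns the `notAfter=` line printed by `openssl x509 -enddate -noout` into whole days remaining. It assumes GNU `date` (standard on Ubuntu), and `cert_days_left` is a hypothetical helper name.

```shell
# Convert an openssl "notAfter=..." line into whole days left before expiry.
# Arguments: the notAfter line, and optionally "now" in epoch seconds (defaults to the current time).
cert_days_left() {
  local not_after="${1#notAfter=}" now="${2:-$(date +%s)}" expires
  # GNU date parses openssl's format, e.g. "Jan  2 00:00:00 2020 GMT"
  expires="$(date -d "$not_after" +%s)" || return 1
  echo $(( (expires - now) / 86400 ))
}
```

Usage: `cert_days_left "$(openssl x509 -enddate -noout -in /etc/letsencrypt/live/hub.dandiarchive.org/fullchain.pem)"`.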

4. Install Docker
    reference: https://phoenixnap.com/kb/install-kubernetes-on-ubuntu

    apt-get update && apt-get upgrade -y    # update package list
    apt-get install docker.io               # install Docker
    docker -v                               # check Docker version
    systemctl enable docker                 # set Docker to launch at boot
    systemctl status docker                 # check Docker is running
    systemctl start docker                  # start Docker if it is not running
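Right after `systemctl start docker` the daemon can take a moment to accept connections, so scripted setups often poll it rather than fail on the first attempt. A generic retry helper as a sketch; `retry` is a hypothetical name, and the attempt count and delay in the usage note are arbitrary.

```shell
# Run a command until it succeeds, at most MAX times, sleeping DELAY seconds between attempts.
# Arguments: MAX, DELAY, then the command and its arguments.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Usage: `retry 10 3 docker info > /dev/null && echo "docker is up"`.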

5. Run and configure a JupyterHub Docker container
    reference: https://medium.com/@bluedme/connecting-jupyterhub-to-auth0-e92f0bb6efb0

    docker pull jupyterhub/jupyterhub                  # download the JupyterHub image
    docker run -p 8000:8000 -d --name jupyterhub jupyterhub/jupyterhub jupyterhub    # launch the JupyterHub server
    docker exec -it jupyterhub bash                    # open a bash shell inside the container
    useradd --create-home dandi                        # create a user to log in to the JupyterHub server
    passwd dandi                                       # set that user's password

    username: dandi

    password: (set at the `passwd` prompt; not recorded here)

    conda install notebook                  # install Jupyter Notebook
    conda install jupyterlab                # install JupyterLab
    apt-get update && apt-get upgrade -y    # update package list
    apt-get install python3-pip             # install pip
    exit                                    # exit the container

    docker restart jupyterhub # restart jupyterhub server
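The user-creation part of the interactive `docker exec -it ... bash` session can also be run from the host. A sketch that creates a login user for JupyterHub's default PAM authenticator; `add_hub_user` is a hypothetical helper name, and setting `DOCKER=echo` previews the commands instead of running them.

```shell
# Create a local Linux user inside a running JupyterHub container.
# Arguments: container name, user name. DOCKER overrides the docker binary (e.g. DOCKER=echo to preview).
add_hub_user() {
  local container="$1" user="$2" docker="${DOCKER:-docker}"
  "$docker" exec "$container" useradd --create-home "$user" || return 1
  # passwd prompts interactively, so the password is never typed on the command line
  "$docker" exec -it "$container" passwd "$user"
}
```

Usage: `add_hub_user jupyterhub dandi`, then `docker restart jupyterhub` as above.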

6. Install and start minikube

    install kubectl:
        reference: https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-linux

     curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl   # download the latest stable release
     chmod +x ./kubectl                          # make the kubectl binary executable
     sudo mv ./kubectl /usr/local/bin/kubectl    # move the binary into your PATH
     kubectl version --client                    # check kubectl is installed (no cluster is running yet)
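Downloading `stable.txt` fetches whatever release is newest at that moment, so rebuilding the same machine later can give a different kubectl. A sketch of pinning an explicit version instead; it reuses the download URL layout from the command above, `kubectl_url` is a hypothetical helper name, and the version in the usage note is only an example.

```shell
# Build the kubectl download URL for an explicit release (same URL layout as above).
# Arguments: version such as v1.16.0 (the leading "v" is added if missing), optional architecture (default amd64).
kubectl_url() {
  local version="$1" arch="${2:-amd64}"
  case "$version" in
    v*) ;;
    *)  version="v${version}" ;;
  esac
  echo "https://storage.googleapis.com/kubernetes-release/release/${version}/bin/linux/${arch}/kubectl"
}
```

Usage: `curl -LO "$(kubectl_url v1.16.0)"`, followed by the same `chmod` and `mv` steps.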
    

    install minikube: reference: https://kubernetes.io/docs/tasks/tools/install-minikube/

     curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64    # download the latest release
     chmod +x minikube                        # make the minikube binary executable
     sudo mkdir -p /usr/local/bin/            # make sure the install directory exists
     sudo install minikube /usr/local/bin/    # copy the binary into your PATH
    

    check the minikube version and start minikube:
        minikube version
        sudo minikube start --vm-driver=none    # run Kubernetes directly on the host, without a VM; if this returns an error, see the reference
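With `--vm-driver=none`, minikube runs the Kubernetes components directly on the host using its Docker daemon, so the tools installed in the earlier steps need to be on the `PATH`. A small pre-flight sketch; `require_cmds` is a hypothetical helper name, and the list in the usage note simply mirrors the tools used on this page (`socat` is installed in the next step and is needed by `kubectl port-forward`).

```shell
# Report every command in the argument list that is not found on PATH.
# Returns 0 if all are present, 1 otherwise.
require_cmds() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `require_cmds docker kubectl minikube && sudo minikube start --vm-driver=none`.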

7. Deploy JupyterHub to a minikube pod
    reference: https://sweetcode.io/learning-kubernetes-getting-started-minikube/
    reference: https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/

    set up the pod:
        check which nodes and pods are up:
            kubectl get nodes
            kubectl get pods        # no pods should be deployed to the cluster yet

     install pre-requisites:
         apt-get update && apt-get upgrade -y        # update package list
         apt-get install socat                       # required for port-forwarding
    
     edit the pod configuration:
         vim pod.yaml            # create the pod configuration file
    
         apiVersion: v1
         kind: Pod
         metadata:
           name: pod-jupyter-test
           labels:
             app: pod-jupyter-test
         spec:  # specification of the pod's contents
           restartPolicy: Never
           containers:
           - name: pod-jupyter-test
             image: jupyterhub/jupyterhub
             ports:
             - containerPort: 8000
    

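    deploy the pod:
        kubectl create -f pod.yaml                # create the pod from the configuration file
        kubectl get pods                          # check the pod status
        kubectl describe pod pod-jupyter-test     # show details and events for the pod
        nohup kubectl port-forward pod-jupyter-test 8000:8000 &    # run port forwarder in the background (even after logout)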
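`kubectl port-forward` fails if the pod is not running yet, which is easy to hit when the image is still being pulled. A sketch that blocks on the pod's `Ready` condition with `kubectl wait` before the forwarder is started; `wait_for_pod` is a hypothetical helper name, the 120-second default timeout is an arbitrary choice, and `KUBECTL=echo` previews the command.

```shell
# Block until a pod reports the Ready condition, or fail after the timeout.
# Arguments: pod name, optional timeout (default 120s). KUBECTL overrides the kubectl binary.
wait_for_pod() {
  local pod="$1" timeout="${2:-120s}" kubectl="${KUBECTL:-kubectl}"
  "$kubectl" wait --for=condition=Ready "pod/${pod}" --timeout="$timeout"
}
```

Usage: run `wait_for_pod pod-jupyter-test` first, then start the background `nohup kubectl port-forward ...` exactly as above.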
