
JupyterHub Binder deployment strategies on AWS

Bhavya Kandimalla edited this page Sep 17, 2019 · 32 revisions

Notes for deploying JupyterHub/Binder on AWS


What is JupyterHub and Choosing a Distribution for Deployment on AWS

(https://jupyter.org/hub ; https://tljh.jupyter.org/en/latest/topic/whentouse.html#topic-whentouse)

An overview of JupyterHub and its key features can be found at the first link above.

Information about JupyterHub distributions, and how to choose between them, can be found at the first and second links above.

There are two distributions: Kubernetes and Littlest.

  1. Kubernetes -
  • allows JupyterHub to scale to many thousands of users
  • can flexibly grow/shrink the size of resources it needs
  • uses container technology (Docker) in administering user sessions
  • allows users to interact with a computing environment through a webpage - makes it easy to provide and standardize the computing environment for a group of people
  2. Littlest -
  • also known as The Littlest JupyterHub (TLJH)
  • an opinionated and pre-configured distribution to deploy a JupyterHub on a single virtual machine (in the cloud or on your own hardware)
  • designed to be a more lightweight and maintainable solution for use-cases where size, scalability, and cost-savings are not a huge concern
  • distribution for a small (0-100) number of users

Although we are testing with 1-5 users, we chose the Kubernetes deployment because it spreads users across a cluster of smaller machines that can be scaled up or down, and we need to be able to run containers (Docker or Singularity). It will also allow us to scale up the number of users as needed.


1. select amazon machine image

Ubuntu Server 18.04 LTS (HVM), SSD Volume Type - ami-05c1fa8df71875112 (64-bit x86) / ami-0606a0d9f566249d3 (64-bit Arm)

2. select instance type

t3a.medium (Variable ECUs, 2 vCPUs, 2.2 GHz, AMD EPYC 7571, 4 GiB memory, EBS only)

instance configuration details: T2/T3 Unlimited enabled
storage: root volume, device /dev/sda1, size 8 GiB, General Purpose SSD (gp2), not encrypted
tags: DANDI-HUB, Webserver, on instances and volumes

| Type  | Protocol | Port Range | Source    | Description |
|-------|----------|------------|-----------|-------------|
| HTTP  | TCP      | 80         | 0.0.0.0/0 |             |
| HTTP  | TCP      | 80         | ::/0      |             |
| SSH   | TCP      | 22         | 0.0.0.0/0 |             |
| HTTPS | TCP      | 443        | 0.0.0.0/0 |             |
| HTTPS | TCP      | 443        | ::/0      |             |
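The same ingress rules can be expressed with the AWS CLI. A sketch only: the security group ID below is a placeholder, and the loop prints the commands rather than running them, so they can be reviewed first (the IPv6 `::/0` variants require the `--ip-permissions` form of the command and are omitted here):

```shell
# Placeholder security group ID -- substitute your instance's group.
SG_ID="sg-0123456789abcdef0"

# Build one authorize call per port (22, 80, 443). This is a dry run:
# the commands are printed, not executed; pipe to `sh` to apply them.
RULES=$(for port in 22 80 443; do
  echo "aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol tcp --port $port --cidr 0.0.0.0/0"
done)
echo "$RULES"
```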

3. launch with key pair!

4. attach elastic IP

hub     3.19.206.158

5. set up Let's Encrypt & nginx

https://github.com/dandi/infrastructure/wiki/Girder-setup-on-aws

#### install prerequisites:
    apt-get update
    apt-get install -y git docker-compose python3.7 python3-setuptools nginx vim fail2ban

#### setup nginx:
    vim /etc/nginx/sites-enabled/hub.dandiarchive.org

#### edit nginx site file:
    server {
        listen 80;
        server_name hub.dandiarchive.org;
        location / {
            proxy_pass http://localhost:8080/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }

#### restart nginx:
    service nginx restart
    service nginx status

#### set up Let's Encrypt:
    apt-get install -y software-properties-common
    add-apt-repository universe
    add-apt-repository -y ppa:certbot/certbot
    apt-get update
    apt-get install -y certbot python-certbot-nginx
    certbot --nginx

6. install kops, kubectl, and aws cli tools

https://github.com/kubernetes/kops/blob/master/docs/install.md

#### kops:
    curl -Lo kops https://github.com/kubernetes/kops/releases/download/$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)/kops-linux-amd64
    chmod +x ./kops
    sudo mv ./kops /usr/local/bin/
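The nested `curl` in the kops download resolves the latest release tag from the GitHub API. A minimal sketch of that tag extraction, run against a sample API response instead of a live call (the tag value here is illustrative):

```shell
# A trimmed sample of what the GitHub releases API returns.
api_response='{
  "url": "https://api.github.com/repos/kubernetes/kops/releases/12345",
  "tag_name": "v1.15.0",
  "name": "Release v1.15.0"
}'

# grep picks the tag_name line; cut takes field 4 when splitting on
# double quotes, i.e. the tag value itself.
TAG=$(echo "$api_response" | grep tag_name | cut -d '"' -f 4)
echo "$TAG"
```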

#### kubectl:
    curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
    chmod +x ./kubectl
    sudo mv ./kubectl /usr/local/bin/kubectl

#### aws cli tools:
#### install pip:
    (python3.7 and python3-setuptools were already installed with the prerequisites above)
    apt-get update
    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
    python3.7 get-pip.py

#### install aws cli tools:
    pip install awscli
    pip install --user --upgrade awscli

7. set the region to deploy in

`export REGION=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F '"' '{print $4}')`
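The pipeline pulls the `region` line out of the instance-identity document and takes field 4 when splitting on double quotes (the region value). A sketch run against a sample document, since the 169.254.169.254 metadata endpoint only answers on an EC2 instance; the region and instance ID below are illustrative:

```shell
# Trimmed sample of /latest/dynamic/instance-identity/document.
doc='{
  "devpayProductCodes" : null,
  "region" : "us-east-2",
  "instanceId" : "i-0123456789abcdef0"
}'

# Splitting on `"`: field 2 is the key, field 4 is the value.
REGION=$(echo "$doc" | grep region | awk -F '"' '{print $4}')
echo "$REGION"
```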

8. install the AWS CLI (skip if already installed via pip above):

`sudo apt-get update`
`sudo apt-get install awscli`

9. set the availability zones for the nodes - allowing nodes to be deployed in all AZs

`export ZONES=$(aws ec2 describe-availability-zones --region $REGION | grep ZoneName | awk '{print $2}' | tr -d '"')`
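Here `awk '{print $2}'` takes the value after the `ZoneName` key, and `tr` strips the quotes while keeping the trailing commas that separate zones. A sketch against a sample `describe-availability-zones` response, since the real call needs AWS credentials (the zone names are illustrative):

```shell
# Trimmed sample of `aws ec2 describe-availability-zones` output.
response='
    "ZoneName": "us-east-2a",
    "ZoneName": "us-east-2b",
    "ZoneName": "us-east-2c",
'

# Field 2 of each matching line is the quoted zone name with a
# trailing comma; tr removes only the quotes.
ZONES=$(echo "$response" | grep ZoneName | awk '{print $2}' | tr -d '"')
echo "$ZONES"
```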

10. setup an ssh keypair to use with the cluster

`ssh-keygen`
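`ssh-keygen` prompts interactively by default; for scripted setups it can be run non-interactively. A sketch writing to a scratch path (kops looks for `~/.ssh/id_rsa.pub` by default, so the path and key size here are illustrative only):

```shell
# Generate a keypair with no passphrase (-N "") at a temporary path,
# quietly (-q). mktemp -u yields an unused filename without creating it.
KEYFILE=$(mktemp -u)
ssh-keygen -t rsa -b 4096 -N "" -f "$KEYFILE" -q

# Both the private key and the .pub half should now exist.
ls "$KEYFILE" "$KEYFILE.pub"
```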

11. create a Route53 domain for your cluster & an S3 bucket to store your cluster state

https://kubernetes.io/docs/setup/production-environment/tools/kops/#creating-a-cluster

*** sub-domain not found, as per the instructions on the website above:

    ubuntu@ip-172-31-42-61:~/.ssh$ dig ns hub.dandiarchive.org

    ; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> ns hub.dandiarchive.org
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54931
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 65494
    ;; QUESTION SECTION:
    ;hub.dandiarchive.org.  IN  NS

    ;; Query time: 36 msec
    ;; SERVER: 127.0.0.53#53(127.0.0.53)
    ;; WHEN: Tue Sep 17 02:09:26 UTC 2019
    ;; MSG SIZE  rcvd: 49

*** showing failure when trying to use the sub-domain:

    ubuntu@ip-172-31-42-61:~$ kops create cluster --name=cluster1.hub.dandiarchive.org \
        --zones=us-east-2a \
        --authorization=RBAC \
        --master-size=t3a.medium \
        --master-volume-size=4 \
        --node-size=t3a.medium \
        --node-volume-size=4 \
        --state=s3://bucket1.hub.dandiarchive.org \
        --topology=private \
        --networking=weave \
        --yes
    I0917 18:14:33.124809 17849 create_cluster.go:519] Inferred --cloud=aws from zone "us-east-2a"
    I0917 18:14:33.171711 17849 subnets.go:184] Assigned CIDR 172.20.32.0/19 to subnet us-east-2a
    I0917 18:14:33.171745 17849 subnets.go:198] Assigned CIDR 172.20.0.0/22 to subnet utility-us-east-2a
    I0917 18:14:33.462670 17849 create_cluster.go:1486] Using SSH public key: /home/ubuntu/.ssh/id_rsa.pub

    error doing DNS lookup for NS records for "hub.dandiarchive.org": lookup hub.dandiarchive.org on 127.0.0.53:53: no such host

*** aws only seems to want to use top-level domains:

    ubuntu@ip-172-31-42-61:~/.ssh$ dig ns dandiarchive.org

    ; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> ns dandiarchive.org
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33252
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 65494
    ;; QUESTION SECTION:
    ;dandiarchive.org.  IN  NS

    ;; ANSWER SECTION:
    dandiarchive.org.  21600  IN  NS  ns-cloud-d4.googledomains.com.
    dandiarchive.org.  21600  IN  NS  ns-cloud-d1.googledomains.com.
    dandiarchive.org.  21600  IN  NS  ns-cloud-d2.googledomains.com.
    dandiarchive.org.  21600  IN  NS  ns-cloud-d3.googledomains.com.

    ;; Query time: 104 msec
    ;; SERVER: 127.0.0.53#53(127.0.0.53)
    ;; WHEN: Tue Sep 17 02:09:31 UTC 2019
    ;; MSG SIZE  rcvd: 166

*** it works when using the top-level domain, but we are missing api.dandiarchive.org ...

    ubuntu@ip-172-31-42-61:~$ kops create cluster --name=dandiarchive.org --zones=us-east-2a \
        --authorization=RBAC --master-size=t3a.medium --master-volume-size=4 \
        --node-size=t3a.medium --node-volume-size=4 \
        --state=s3://bucket1.hub.dandiarchive.org --topology=private --networking=weave --yes
    I0917 02:23:23.999970 7524 create_cluster.go:519] Inferred --cloud=aws from zone "us-east-2a"
    I0917 02:23:24.050888 7524 subnets.go:184] Assigned CIDR 172.20.32.0/19 to subnet us-east-2a
    I0917 02:23:24.050918 7524 subnets.go:198] Assigned CIDR 172.20.0.0/22 to subnet utility-us-east-2a
    I0917 02:23:24.345531 7524 create_cluster.go:1486] Using SSH public key: /home/ubuntu/.ssh/id_rsa.pub
    I0917 02:23:25.170525 7524 executor.go:103] Tasks: 0 done / 103 total; 48 can run
    I0917 02:23:25.978820 7524 vfs_castore.go:729] Issuing new certificate: "etcd-manager-ca-events"
    I0917 02:23:26.030063 7524 vfs_castore.go:729] Issuing new certificate: "etcd-peers-ca-main"
    I0917 02:23:26.655894 7524 vfs_castore.go:729] Issuing new certificate: "apiserver-aggregator-ca"
    I0917 02:23:26.808139 7524 vfs_castore.go:729] Issuing new certificate: "ca"
    I0917 02:23:27.132766 7524 vfs_castore.go:729] Issuing new certificate: "etcd-clients-ca"
    I0917 02:23:27.157690 7524 vfs_castore.go:729] Issuing new certificate: "etcd-manager-ca-main"
    I0917 02:23:27.399647 7524 vfs_castore.go:729] Issuing new certificate: "etcd-peers-ca-events"
    I0917 02:23:27.566378 7524 executor.go:103] Tasks: 48 done / 103 total; 27 can run
    I0917 02:23:28.791094 7524 vfs_castore.go:729] Issuing new certificate: "kubelet-api"
    I0917 02:23:29.160236 7524 vfs_castore.go:729] Issuing new certificate: "kubecfg"
    I0917 02:23:29.352976 7524 vfs_castore.go:729] Issuing new certificate: "apiserver-proxy-client"
    I0917 02:23:29.406408 7524 vfs_castore.go:729] Issuing new certificate: "kube-proxy"
    I0917 02:23:29.497207 7524 vfs_castore.go:729] Issuing new certificate: "kubelet"
    I0917 02:23:29.525997 7524 vfs_castore.go:729] Issuing new certificate: "kube-scheduler"
    I0917 02:23:30.551625 7524 vfs_castore.go:729] Issuing new certificate: "apiserver-aggregator"
    I0917 02:23:30.780717 7524 vfs_castore.go:729] Issuing new certificate: "kube-controller-manager"
    I0917 02:23:31.094066 7524 vfs_castore.go:729] Issuing new certificate: "kops"
    I0917 02:23:31.410342 7524 vfs_castore.go:729] Issuing new certificate: "master"
    I0917 02:23:31.616379 7524 executor.go:103] Tasks: 75 done / 103 total; 22 can run
    I0917 02:23:31.752264 7524 launchconfiguration.go:364] waiting for IAM instance profile "nodes.dandiarchive.org" to be ready
    I0917 02:23:31.772118 7524 launchconfiguration.go:364] waiting for IAM instance profile "masters.dandiarchive.org" to be ready
    I0917 02:23:42.071310 7524 executor.go:103] Tasks: 97 done / 103 total; 4 can run
    I0917 02:23:42.587616 7524 executor.go:103] Tasks: 101 done / 103 total; 2 can run
    I0917 02:23:42.615335 7524 natgateway.go:286] Waiting for NAT Gateway "nat-0adac19b60471232b" to be available (this often takes about 5 minutes)
    I0917 02:25:28.329877 7524 executor.go:103] Tasks: 103 done / 103 total; 0 can run
    I0917 02:25:28.329926 7524 dns.go:153] Pre-creating DNS records
    I0917 02:25:28.566035 7524 update_cluster.go:291] Exporting kubecfg for cluster
    kops has set your kubectl context to dandiarchive.org

    Cluster is starting.  It should be ready in a few minutes.

    Suggestions:

    ubuntu@ip-172-31-42-61:~$ kubectl get nodes --show-labels
    Unable to connect to the server: dial tcp: lookup api.dandiarchive.org on 127.0.0.53:53: no such host
