# Docker recipes
This page captures recipes for extending the base stacks included in this repo.
## Add RISE for live slideshows

@pdonorio said:

> There is a great repo called RISE which allows you to create live slideshows of your notebooks, with no conversion, by adding the Reveal.js JavaScript library. I like it a lot, and often find myself adding this feature on top of your official images. As a quick example of how you could do it, taken from my personal repo:
```dockerfile
# Add live slideshows with RISE
# (the GitHub archive extracts to RISE-master/, not master/)
RUN wget https://github.com/pdonorio/RISE/archive/master.tar.gz \
    && tar xvzf master.tar.gz \
    && cd RISE-master \
    && python3 setup.py install
```
Ref: https://github.com/jupyter/docker-stacks/issues/43
## Run Jupyter behind an nginx proxy

Sometimes it is useful to run the Jupyter instance behind an nginx proxy, for instance:

- you would prefer to access the notebook at a server URL with a path (`https://example.com/jupyter`) rather than a port (`https://example.com:8888`)
- you may have many different services in addition to Jupyter running on the same server, and want nginx to help improve server performance by managing the connections

Here is a quick example nginx configuration to get started. You'll need a server, a `.crt` and `.key` file for your server, and `docker` and `docker-compose` installed. Then just download the files at that gist and run `docker-compose up -d` to test it out. Customize the `nginx.conf` file to set the desired paths and add other services.
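As a rough sketch of the proxying involved, the relevant part of an `nginx.conf` might look like the block below. The path, port, and addresses here are placeholder assumptions, not values from the gist, and the notebook server would need to be started with a matching base URL (e.g. `--NotebookApp.base_url=/jupyter`).

```nginx
# placeholder sketch: proxy https://example.com/jupyter to a notebook on port 8888
location /jupyter {
    proxy_pass http://127.0.0.1:8888;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    # websocket upgrade headers, required for notebook kernel connections
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
```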
## Use packages from spark-packages.org

If you'd like to use packages from spark-packages.org, see https://gist.github.com/parente/c95fdaba5a9a066efaab for an example of how to specify the package identifier in the environment before creating a SparkContext.
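The pattern in that gist boils down to setting `PYSPARK_SUBMIT_ARGS` before the context exists. A minimal sketch, using the spark-csv package identifier purely as an example (substitute whichever package you need):

```python
import os

# must be set before the SparkContext is created; the package identifier
# below is just an illustrative example from spark-packages.org
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-csv_2.10:1.4.0 pyspark-shell'

import pyspark
sc = pyspark.SparkContext("local[*]")
```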
Ref: https://github.com/jupyter/docker-stacks/issues/43
## Add libav-tools (deprecated)

Deprecated: `libav-tools` is included in the datascience-notebook and scipy-notebook stacks as of build/commit c9428336463c.
Ref: https://github.com/jupyter/docker-stacks/issues/90
## Let's Encrypt a notebook server

See the README for the simple automation here: https://github.com/jupyter/docker-stacks/tree/master/examples/make-deploy. It includes steps for requesting and renewing a Let's Encrypt certificate.
Ref: https://github.com/jupyter/docker-stacks/issues/78
## Create a custom child image

Create a new Dockerfile like the one shown in this gist: https://gist.github.com/parente/0d735d93cb81a582d635. Switch the base stack image to whichever you please (e.g., `FROM jupyter/datascience-notebook`, `FROM jupyter/pyspark-notebook`).

Alternatively, create a new Dockerfile like the one shown below.
```dockerfile
FROM jupyter/datascience-notebook

# install in the default python3 environment
RUN pip install 'ggplot==0.6.8'

# install in the python2 environment also
RUN bash -c "source activate python2 && pip install 'ggplot==0.6.8'"
```
Then build a new image.

```bash
docker build --rm -t jupyter/my-datascience-notebook .
```
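You can then run the new image the same way as the base stacks, for example:

```bash
docker run -d -p 8888:8888 jupyter/my-datascience-notebook
```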
## Use the stacks with JupyterHub's DockerSpawner

@jtyberg contributed https://github.com/jupyter/docker-stacks/pull/185.

Originally, @quanghoc asked:

> How does this [docker-stacks] work with dockerspawner?

@minrk replied:

> ... in most cases for use with DockerSpawner, given any image that already has a notebook stack set up, you would only need to:
>
> - install the `jupyterhub-singleuser` script (for the right Python)
> - change the command to launch the single-user server
> Swapping out the `FROM` line in the `jupyterhub/singleuser` Dockerfile should be enough for most cases.
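A minimal sketch of such a child Dockerfile, assuming `jupyter/scipy-notebook` as the base (this is illustrative, not the actual `jupyterhub/singleuser` Dockerfile; pin versions to match your Hub):

```dockerfile
# sketch: adapt a docker-stacks image for use with DockerSpawner
FROM jupyter/scipy-notebook

# install the jupyterhub-singleuser script into the default Python
RUN pip install jupyterhub

# launch the single-user server instead of the notebook server
CMD ["jupyterhub-singleuser"]
```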
Ref: https://github.com/jupyter/docker-stacks/issues/124
## xgboost

You need to install conda's gcc for the Python xgboost package to work properly. Otherwise, you'll get an exception about libgomp.so.1 missing GOMP_4.0.

```
%%bash
conda install -y gcc
pip install xgboost
```

```python
import xgboost
```
Ref: https://github.com/jupyter/docker-stacks/issues/177
## Read data from Amazon S3 with PySpark

Include the AWS SDK and hadoop-aws packages when the JVM starts, then configure the S3 filesystem and credentials before reading:

```python
import os
# must be set before the SparkContext is created
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'

import pyspark
sc = pyspark.SparkContext("local[*]")

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

# enter your AWS credentials when prompted rather than hardcoding them
hadoopConf = sc._jsc.hadoopConfiguration()
myAccessKey = input()
mySecretKey = input()
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", myAccessKey)
hadoopConf.set("fs.s3.awsSecretAccessKey", mySecretKey)

df = sqlContext.read.parquet("s3://myBucket/myKey")
```
Ref: https://github.com/jupyter/docker-stacks/issues/127
## Stream data from Kafka with PySpark

Add the Spark Streaming Kafka assembly jar when the JVM starts, then create a streaming context:

```python
import os
# must be set before the SparkContext is created
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /home/jovyan/spark-streaming-kafka-assembly_2.10-1.6.1.jar pyspark-shell'

import pyspark
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming import StreamingContext

sc = pyspark.SparkContext()
# micro-batch interval of 1 second
ssc = StreamingContext(sc, 1)

broker = "<my_broker_ip>"
directKafkaStream = KafkaUtils.createDirectStream(ssc, ["test1"], {"metadata.broker.list": broker})
directKafkaStream.pprint()

ssc.start()
```
Ref: https://github.com/jupyter/docker-stacks/issues/154
## Fix permission errors on mounted host volumes

If you are mounting a host directory as `/home/jovyan/work` in your container and you receive permission errors or connection errors when you create a notebook, be sure that the `jovyan` user (UID=1000 by default) has read/write access to the directory on the host. Alternatively, specify the UID of the `jovyan` user on the container using this option from the README:

- `-e NB_UID=1000` - Specify the uid of the `jovyan` user. Useful to mount host volumes with specific file ownership. For this option to take effect, you must run the container with `--user root`. (The `start-notebook.sh` script will `su jovyan` after adjusting the user id.)
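For example, to mount a host directory and have the container adjust the `jovyan` UID to match it (the host path and UID here are placeholders; adjust to your setup):

```bash
# run as root so start-notebook.sh can adjust the jovyan UID, then drop privileges
docker run -d -p 8888:8888 --user root -e NB_UID=1000 \
    -v /some/host/dir:/home/jovyan/work jupyter/datascience-notebook
```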
Ref: https://github.com/jupyter/docker-stacks/issues/199
## Run JupyterLab instead of the classic notebook

You can build a child image that runs the latest JupyterLab release instead of the classic notebook.

```dockerfile
# replace with your desired base stack
FROM jupyter/scipy-notebook

RUN pip install jupyterlab && \
    jupyter serverextension enable --py jupyterlab

CMD ["start.sh", "jupyter", "lab"]
```
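Build and run it like any other child image (the tag name is arbitrary):

```bash
docker build --rm -t my-lab-notebook .
docker run -d -p 8888:8888 my-lab-notebook
```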