Peter Parente edited this page Aug 12, 2016 · 10 revisions

This page captures recipes for extending the base stacks included in this repo.

Add RISE

@pdonorio said:

There is a great repo called RISE that uses a notebook extension and Reveal.js to turn your notebooks into live slideshows, with no conversion step.

I like it a lot, and often find myself adding this feature on top of your official images.

As a quick example of how you could do it, taken from my personal repo:

# Add live slideshows with RISE
# (GitHub's master.tar.gz archive unpacks into a RISE-master/ directory)
RUN wget https://github.com/pdonorio/RISE/archive/master.tar.gz \
    && tar xvzf master.tar.gz && cd RISE-master && python3 setup.py install

Ref: https://github.com/jupyter/docker-stacks/issues/43

Running behind an nginx proxy

Sometimes it is useful to run the Jupyter instance behind an nginx proxy, for instance:

  • you would prefer to access the notebook at a server URL with a path (https://example.com/jupyter) rather than a port (https://example.com:8888)
  • you may have many different services in addition to Jupyter running on the same server, and want nginx to help improve server performance by managing the connections

Here is a quick example NGINX configuration to get started. You'll need a server, a .crt and .key file for that server, and Docker and docker-compose installed. Then download the files from that gist and run docker-compose up -d to test it out. Customize the nginx.conf file to set the desired paths and add other services.
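The gist's files are not reproduced here. As a rough sketch of the part that matters most for Jupyter, the Python below renders an nginx location block that proxies a path prefix to the notebook server. The /jupyter prefix and the jupyter:8888 upstream (for example, a docker-compose service named jupyter) are assumptions, so substitute your own. The WebSocket upgrade headers are what let kernel connections pass through nginx. For simplicity this sketch uses a fixed Connection "upgrade" value. The nginx WebSocket proxying docs also show a `map $http_upgrade` variant for servers that handle mixed traffic.

```python
def jupyter_location_block(base_path: str = "/jupyter", upstream: str = "jupyter:8888") -> str:
    """Render an nginx `location` block that proxies `base_path` to a Jupyter server.

    `upstream` is a host:port pair; the default assumes a docker-compose
    service named "jupyter" listening on 8888 (an assumption, adjust as needed).
    """
    # Normalise "jupyter", "/jupyter" or "/jupyter/" to "/jupyter/"
    path = "/" + base_path.strip("/") + "/"
    lines = [
        f"location {path} {{",
        # No URI after the upstream, so nginx forwards the original /jupyter/... path
        f"    proxy_pass http://{upstream};",
        "    proxy_set_header Host $host;",
        # HTTP/1.1 plus the Upgrade/Connection headers are needed for WebSockets
        "    proxy_http_version 1.1;",
        "    proxy_set_header Upgrade $http_upgrade;",
        '    proxy_set_header Connection "upgrade";',
        # Keep idle kernel WebSocket connections open for up to a day
        "    proxy_read_timeout 86400;",
        "}",
    ]
    return "\n".join(lines)


print(jupyter_location_block())
```

Because proxy_pass has no URI part, nginx forwards the full /jupyter/... path. The notebook server therefore has to know about the prefix. One way to do that, assuming the classic notebook server these stacks ship, is to start the container with start-notebook.sh --NotebookApp.base_url=/jupyter/.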

Using spark-packages.org

If you'd like to use packages from spark-packages.org, see https://gist.github.com/parente/c95fdaba5a9a066efaab for an example of how to specify the package identifier in the environment before creating a SparkContext.
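The general pattern is to put a --packages argument into PYSPARK_SUBMIT_ARGS before the first SparkContext is created, because PySpark reads that variable only when it launches the JVM. Below is a minimal helper, assuming Maven-style group:artifact:version coordinates (the form spark-packages.org lists). Both set_spark_packages and the someorg:somepackage:0.1.0 coordinate are hypothetical.

```python
import os
from typing import Iterable


def set_spark_packages(packages: Iterable[str]) -> str:
    """Export PYSPARK_SUBMIT_ARGS with a --packages list and return it.

    Call this before creating the first SparkContext; any existing
    PYSPARK_SUBMIT_ARGS value is replaced, not extended.
    """
    coords = [p.strip() for p in packages if p.strip()]
    if not coords:
        raise ValueError("at least one package coordinate is required")
    for coord in coords:
        # spark-packages.org and Maven coordinates look like group:artifact:version
        if coord.count(":") != 2:
            raise ValueError(f"expected group:artifact:version, got {coord!r}")
    # The trailing "pyspark-shell" is required when setting this variable by hand
    args = f"--packages {','.join(coords)} pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args
```

For example, call set_spark_packages(["someorg:somepackage:0.1.0"]) in the first cell, then import pyspark and create the SparkContext.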

Ref: https://github.com/jupyter/docker-stacks/issues/43

Enable matplotlib animation

Deprecated: libav-tools is included in the datascience-notebook and scipy-notebook stacks as of build/commit c9428336463c.

Ref: https://github.com/jupyter/docker-stacks/issues/90

Let's Encrypt a Notebook server

See the README for the simple automation at https://github.com/jupyter/docker-stacks/tree/master/examples/make-deploy, which includes steps for requesting and renewing a Let's Encrypt certificate.

Ref: https://github.com/jupyter/docker-stacks/issues/78

Add Incubating Dashboard, Declarative Widget, Content Management Extensions

Create a new Dockerfile like the one shown in this gist: https://gist.github.com/parente/0d735d93cb81a582d635. Switch the base stack image to whichever you please (e.g., FROM jupyter/datascience-notebook, FROM jupyter/pyspark-notebook).

Using pip install in a Child Docker image

Create a new Dockerfile like the one shown below.

FROM jupyter/datascience-notebook

# install in the default python3 environment
RUN pip install 'ggplot==0.6.8'
# install in the python2 environment also
RUN bash -c "source activate python2 && pip install 'ggplot==0.6.8'"

Then build a new image.

docker build --rm -t jupyter/my-datascience-notebook .

Ref: https://github.com/jupyter/docker-stacks/commit/79169618d571506304934a7b29039085e77db78c#commitcomment-15960081

Use with JupyterHub's dockerspawner

@jtyberg contributed https://github.com/jupyter/docker-stacks/pull/185

Originally, @quanghoc asked:

How does this [docker-stacks] work with dockerspawner?

@minrk replied:

... in most cases for use with DockerSpawner, given any image that already has a notebook stack set up, you would only need to add:

  1. install the jupyterhub-singleuser script (for the right Python)
  2. change the command to launch the single-user server

Swapping out the FROM line in the jupyterhub/singleuser Dockerfile should be enough for most cases.

Ref: https://github.com/jupyter/docker-stacks/issues/124

Use xgboost

You need to install conda's gcc for Python xgboost to work properly. Otherwise, you'll get an exception about libgomp.so.1 missing GOMP_4.0.

%%bash
# run this in its own notebook cell
conda install -y gcc
pip install xgboost

# then, in a new cell
import xgboost

Ref: https://github.com/jupyter/docker-stacks/issues/177

Using PySpark with AWS S3

import os
# Pull the AWS SDK and Hadoop S3 support; "pyspark-shell" must stay last
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'

import pyspark
sc = pyspark.SparkContext("local[*]")

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

# Prompt for credentials without echoing them into the notebook
from getpass import getpass
myAccessKey = getpass('AWS access key ID: ')
mySecretKey = getpass('AWS secret access key: ')

hadoopConf = sc._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", myAccessKey)
hadoopConf.set("fs.s3.awsSecretAccessKey", mySecretKey)

df = sqlContext.read.parquet("s3://myBucket/myKey")
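Typing keys at an interactive prompt works, but in a container it can be more convenient to pass credentials in as environment variables. The sketch below uses a hypothetical s3_hadoop_settings helper and assumes the conventional AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY names (for example, docker run -e AWS_ACCESS_KEY_ID=... -e AWS_SECRET_ACCESS_KEY=...). It returns the same Hadoop keys used above.

```python
import os
from typing import Dict, Mapping, Optional


def s3_hadoop_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build Hadoop S3 settings from AWS credentials in the environment.

    Assumes the conventional AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY names;
    pass a dict as `env` to use something other than os.environ.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        # Same keys the notebook example sets by hand
        "fs.s3.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3.awsAccessKeyId": env["AWS_ACCESS_KEY_ID"],
        "fs.s3.awsSecretAccessKey": env["AWS_SECRET_ACCESS_KEY"],
    }
```

Apply each returned pair with hadoopConf.set(key, value) before reading from S3. This keeps the secret out of both the notebook and its output cells.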

Ref: https://github.com/jupyter/docker-stacks/issues/127

Using Local Spark JARs

import os
# The JAR path must exist inside the container; "pyspark-shell" must stay last
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /home/jovyan/spark-streaming-kafka-assembly_2.10-1.6.1.jar pyspark-shell'

import pyspark
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = pyspark.SparkContext()
ssc = StreamingContext(sc, 1)  # 1-second batch interval

# metadata.broker.list expects a comma-separated list of host:port pairs
broker = "<my_broker_ip>"
directKafkaStream = KafkaUtils.createDirectStream(ssc, ["test1"], {"metadata.broker.list": broker})
directKafkaStream.pprint()
ssc.start()
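PYSPARK_SUBMIT_ARGS takes paths as seen from inside the container, so a JAR that exists only on the host tends to show up later as a less obvious error once Spark starts. The sketch below adds a small pre-flight check through a hypothetical set_local_jars helper. It verifies that each file exists, then builds the comma-separated --jars argument.

```python
import os
from typing import Iterable


def set_local_jars(jar_paths: Iterable[str]) -> str:
    """Check that each JAR exists in this container, then export PYSPARK_SUBMIT_ARGS.

    Must run before the first SparkContext is created; any existing
    PYSPARK_SUBMIT_ARGS value is replaced.
    """
    paths = [os.path.abspath(p) for p in jar_paths]
    if not paths:
        raise ValueError("at least one JAR path is required")
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"JAR(s) not found in the container: {missing}")
    # --jars takes a comma-separated list; "pyspark-shell" must come last
    args = f"--jars {','.join(paths)} pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args
```

For the Kafka example above, this would be set_local_jars(["/home/jovyan/spark-streaming-kafka-assembly_2.10-1.6.1.jar"]) in place of the manual os.environ assignment.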

Ref: https://github.com/jupyter/docker-stacks/issues/154

Host volume mounts and notebook errors

If you mount a host directory as /home/jovyan/work in your container and get permission or connection errors when you create a notebook, make sure the jovyan user (UID=1000 by default) has read/write access to the directory on the host. Alternatively, set the UID of the jovyan user in the container with this option from the README:

  • -e NB_UID=1000 - Specify the uid of the jovyan user. Useful to mount host volumes with specific file ownership. For this option to take effect, you must run the container with --user root. (The start-notebook.sh script will su jovyan after adjusting the user id.)
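Before mounting a directory, you can roughly check from the host whether UID 1000 can read and write it. The hypothetical uid_can_read_write helper below looks only at the owner and "other" permission bits, plus the execute bit needed to enter a directory. It ignores group membership and ACLs, so a False result means "check further" rather than "definitely broken".

```python
import os
import stat


def uid_can_read_write(path: str, uid: int = 1000) -> bool:
    """Approximate whether `uid` can read and write `path` (host-side check).

    Only the owner and "other" permission classes are considered; group
    membership and ACLs are ignored, so False is not conclusive.
    """
    if uid == 0:
        return True  # root bypasses ordinary permission bits
    st = os.stat(path)
    is_dir = stat.S_ISDIR(st.st_mode)
    if st.st_uid == uid:
        # The owner class applies even if the "other" bits are more permissive
        needed = stat.S_IRUSR | stat.S_IWUSR | (stat.S_IXUSR if is_dir else 0)
    else:
        needed = stat.S_IROTH | stat.S_IWOTH | (stat.S_IXOTH if is_dir else 0)
    return (st.st_mode & needed) == needed
```

For example, run it on the host against the directory you plan to pass to docker run -v <host dir>:/home/jovyan/work. If it returns False, either adjust ownership or permissions on the host or use the NB_UID option.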

Ref: https://github.com/jupyter/docker-stacks/issues/199

Run JupyterLab

You can build a child image that runs the latest JupyterLab release instead of the classic notebook.

# replace with your desired base stack
FROM jupyter/scipy-notebook

RUN pip install jupyterlab && \
    jupyter serverextension enable --py jupyterlab
CMD ["start.sh", "jupyter", "lab"]

Ref: https://github.com/jupyter/docker-stacks/pull/258