# Docker recipes
This page captures recipes for extending the base stacks included in this repo.
## Add RISE for live slideshows

@pdonorio said:

> There is a great repo called RISE which allows you to create live slideshows of your notebooks, with no conversion, by adding the Reveal.js JavaScript library. I like it a lot, and often find myself adding this feature on top of your official images. As a quick example of how you could do it, taken from my personal repo:
```dockerfile
# Add live slideshows with RISE
# (the GitHub archive extracts to RISE-master/, not master/)
RUN wget https://github.com/pdonorio/RISE/archive/master.tar.gz \
    && tar xvzf master.tar.gz \
    && cd RISE-master \
    && python3 setup.py install
```
Ref: https://github.com/jupyter/docker-stacks/issues/43
## Run Jupyter behind an nginx proxy

Sometimes it is useful to run the Jupyter instance behind an nginx proxy, for instance:

- you would prefer to access the notebook at a server URL with a path (`https://example.com/jupyter`) rather than a port (`https://example.com:8888`)
- you may have many different services in addition to Jupyter running on the same server, and want nginx to help improve server performance by managing the connections

Here is a quick example nginx configuration to get started. You'll need a server, a `.crt` and `.key` file for your server, and `docker` and `docker-compose` installed. Then just download the files at that gist and run `docker-compose up -d` to test it out. Customize the `nginx.conf` file to set the desired paths and add other services.
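As a rough sketch of the proxying involved, the relevant part of an `nginx.conf` might look like the block below. The path, port, and addresses here are placeholder assumptions, not values from the gist, and the notebook server would need to be started with a matching base URL (e.g. `--NotebookApp.base_url=/jupyter`).

```nginx
# placeholder sketch: proxy https://example.com/jupyter to a notebook on port 8888
location /jupyter {
    proxy_pass http://127.0.0.1:8888;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    # websocket upgrade headers, required for notebook kernel connections
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
```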
## Use packages from spark-packages.org

If you'd like to use packages from spark-packages.org, see https://gist.github.com/parente/c95fdaba5a9a066efaab for an example of how to specify the package identifier in the environment before creating a SparkContext.
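The pattern in that gist boils down to setting `PYSPARK_SUBMIT_ARGS` before the context exists. A minimal sketch, using the spark-csv package identifier purely as an example (substitute whichever package you need):

```python
import os

# must be set before the SparkContext is created; the package identifier
# below is just an illustrative example from spark-packages.org
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-csv_2.10:1.4.0 pyspark-shell'

import pyspark
sc = pyspark.SparkContext("local[*]")
```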
Ref: https://github.com/jupyter/docker-stacks/issues/43
## Add libav-tools (deprecated)

Deprecated: `libav-tools` is included in the datascience-notebook and scipy-notebook stacks as of build/commit c9428336463c.
Ref: https://github.com/jupyter/docker-stacks/issues/90
## Let's Encrypt a notebook server

See the README for the simple automation here: https://github.com/jupyter/docker-stacks/tree/master/examples/make-deploy. It includes steps for requesting and renewing a Let's Encrypt certificate.
Ref: https://github.com/jupyter/docker-stacks/issues/78
## Create a custom child image

Create a new Dockerfile like the one shown in this gist: https://gist.github.com/parente/0d735d93cb81a582d635. Switch the base stack image to whichever you please (e.g., `FROM jupyter/datascience-notebook`, `FROM jupyter/pyspark-notebook`).

Alternatively, create a new Dockerfile like the one shown below.
```dockerfile
FROM jupyter/datascience-notebook

# install in the default python3 environment
RUN pip install 'ggplot==0.6.8'

# install in the python2 environment also
RUN bash -c "source activate python2 && pip install 'ggplot==0.6.8'"
```
Then build a new image.

```bash
docker build --rm -t jupyter/my-datascience-notebook .
```
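You can then run the new image the same way as the base stacks, for example:

```bash
docker run -d -p 8888:8888 jupyter/my-datascience-notebook
```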
## Use the stacks with JupyterHub's DockerSpawner

@jtyberg contributed https://github.com/jupyter/docker-stacks/pull/185.

Originally, @quanghoc asked:

> How does this [docker-stacks] work with dockerspawner?

@minrk replied:

> ... in most cases for use with DockerSpawner, given any image that already has a notebook stack set up, you would only need to:
>
> - install the `jupyterhub-singleuser` script (for the right Python)
> - change the command to launch the single-user server
> Swapping out the `FROM` line in the `jupyterhub/singleuser` Dockerfile should be enough for most cases.
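A minimal sketch of such a child Dockerfile, assuming `jupyter/scipy-notebook` as the base (this is illustrative, not the actual `jupyterhub/singleuser` Dockerfile; pin versions to match your Hub):

```dockerfile
# sketch: adapt a docker-stacks image for use with DockerSpawner
FROM jupyter/scipy-notebook

# install the jupyterhub-singleuser script into the default Python
RUN pip install jupyterhub

# launch the single-user server instead of the notebook server
CMD ["jupyterhub-singleuser"]
```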
Ref: https://github.com/jupyter/docker-stacks/issues/124
## xgboost

You need to install conda's gcc for the Python xgboost package to work properly. Otherwise, you'll get an exception about libgomp.so.1 missing GOMP_4.0.

```
%%bash
conda install -y gcc
pip install xgboost
```

```python
import xgboost
```
Ref: https://github.com/jupyter/docker-stacks/issues/177
## Read data from Amazon S3 with PySpark

Include the AWS SDK and hadoop-aws packages when the JVM starts, then configure the S3 filesystem and credentials before reading:

```python
import os
# must be set before the SparkContext is created
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'

import pyspark
sc = pyspark.SparkContext("local[*]")

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

# enter your AWS credentials when prompted rather than hardcoding them
hadoopConf = sc._jsc.hadoopConfiguration()
myAccessKey = input()
mySecretKey = input()
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", myAccessKey)
hadoopConf.set("fs.s3.awsSecretAccessKey", mySecretKey)

df = sqlContext.read.parquet("s3://myBucket/myKey")
```
Ref: https://github.com/jupyter/docker-stacks/issues/127
## Stream data from Kafka with PySpark

Add the Spark Streaming Kafka assembly jar when the JVM starts, then create a streaming context:

```python
import os
# must be set before the SparkContext is created
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /home/jovyan/spark-streaming-kafka-assembly_2.10-1.6.1.jar pyspark-shell'

import pyspark
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming import StreamingContext

sc = pyspark.SparkContext()
# micro-batch interval of 1 second
ssc = StreamingContext(sc, 1)

broker = "<my_broker_ip>"
directKafkaStream = KafkaUtils.createDirectStream(ssc, ["test1"], {"metadata.broker.list": broker})
directKafkaStream.pprint()

ssc.start()
```
Ref: https://github.com/jupyter/docker-stacks/issues/154
## Fix permission errors on mounted host volumes

If you are mounting a host directory as `/home/jovyan/work` in your container and you receive permission errors or connection errors when you create a notebook, be sure that the `jovyan` user (UID=1000 by default) has read/write access to the directory on the host. Alternatively, specify the UID of the `jovyan` user on the container using this option from the README:

- `-e NB_UID=1000` - Specify the uid of the `jovyan` user. Useful to mount host volumes with specific file ownership. For this option to take effect, you must run the container with `--user root`. (The `start-notebook.sh` script will `su jovyan` after adjusting the user id.)
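For example, to mount a host directory and have the container adjust the `jovyan` UID to match it (the host path and UID here are placeholders; adjust to your setup):

```bash
# run as root so start-notebook.sh can adjust the jovyan UID, then drop privileges
docker run -d -p 8888:8888 --user root -e NB_UID=1000 \
    -v /some/host/dir:/home/jovyan/work jupyter/datascience-notebook
```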
Ref: https://github.com/jupyter/docker-stacks/issues/199
## Run JupyterLab instead of the classic notebook

You can build a child image that runs the latest JupyterLab release instead of the classic notebook.

```dockerfile
# replace with your desired base stack
FROM jupyter/scipy-notebook

RUN pip install jupyterlab && \
    jupyter serverextension enable --py jupyterlab

CMD ["start.sh", "jupyter", "lab"]
```
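Build and run it like any other child image (the tag name is arbitrary):

```bash
docker build --rm -t my-lab-notebook .
docker run -d -p 8888:8888 my-lab-notebook
```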