The NGB Docker image is available in the lifescience repository on Docker Hub.
There are two versions of NGB in the repository:
- ngb:latest: the "core" version, an image of NGB that contains only the binaries and no data
- ngb:latest-demo: the "demo" version, which contains a demo data set that does not require any data registration; you only need to run the image
Warning: the demo version can take up to 2 GB of disk space (FASTA sequence, gene annotations, BAM and VCF files). To run the demo version, use the following command:
$ docker run -p 8080:8080 -d --name ngbcore lifescience/ngb:latest-demo
Then open http://localhost:8080/catgenome or http://ip-of-the-host:8080/catgenome in a browser to view the demo datasets, for example SV_Sample1 and SV_Sample2, which contain structural variations.
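Because the demo image can occupy up to 2 GB, you may want to check the free space on the filesystem that holds Docker's data before running the command above. This is a minimal sketch, assuming bash and the POSIX `df -P` output format; the helper name enough_space_gb is hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): succeed if the filesystem holding DIR
# has at least NEED_GB gigabytes available, fail otherwise.
enough_space_gb() {
  local dir="$1" need_gb="$2"
  local avail_kb
  # df -Pk prints a header plus one line per filesystem;
  # column 4 is the available space in 1K blocks
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

Usage, checking Docker's data directory (which may require sudo on your host):
$ enough_space_gb "$(docker info --format '{{.DockerRootDir}}')" 2 && echo "enough space for the demo image"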
For the core version, replace the <YOUR_NGS_DATA_FOLDER> placeholder with the real path to a folder with NGS data, and then run:
$ docker run -p 8080:8080 -d --name ngbcore -v <YOUR_NGS_DATA_FOLDER>:/ngs lifescience/ngb
This creates and starts the container in background (detached) mode, maps port 8080 of the container to port 8080 of the host, mounts <YOUR_NGS_DATA_FOLDER> of the host to the /ngs folder of the container and, finally, makes the container accessible by the name ngbcore.
Open http://localhost:8080/catgenome or http://ip-of-the-host:8080/catgenome in a browser (Chrome) to verify that the server started successfully (you should see an empty list of datasets).
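If you script the setup, you can wait for the server to respond before opening the browser or registering data. This is a minimal sketch, assuming curl is installed on the host; the helper name wait_for_ngb and its default timeout are hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): poll the NGB web UI until it answers
# or the timeout (approximate, in seconds of sleeping) expires.
wait_for_ngb() {
  local url="${1:-http://localhost:8080/catgenome}"
  local timeout="${2:-120}"
  local waited=0
  # --fail makes curl return non-zero on HTTP errors (status >= 400);
  # --max-time stops a single request from hanging indefinitely
  until curl --silent --fail --max-time 5 --output /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "NGB did not respond at $url within ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "NGB is up at $url"
}
```

Usage, right after docker run:
$ wait_for_ngb http://localhost:8080/catgenome 300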
To register your own data, open a shell in the running container:
$ docker exec -it ngbcore /bin/bash
This puts you in the container's console, where the ngb command is available. First, register a reference (genome data) from the mounted /ngs folder. NGB accepts FASTA files as the reference sequence:
# ngb reg_ref /ngs/<PATH_TO_FASTA> -n my_genome -t
Depending on the FASTA size, this can take several minutes.
To make NGS data available via NGB, create a DATASET, which groups linked files. You can register files first and then add them to a dataset.
Register a file:
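If you prefer to script registration instead of typing commands in the container console, the same ngb commands can be passed to docker exec from the host. This is a minimal sketch, assuming ngb is on the PATH for non-interactive docker exec sessions (this guide does not confirm it); ngb_exec and the NGB_CONTAINER and DOCKER overrides are hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): run an ngb subcommand inside the
# running NGB container from the host.
# Assumption: ngb is on the PATH for non-interactive `docker exec` sessions.
# NGB_CONTAINER defaults to ngbcore; DOCKER defaults to docker, and
# DOCKER=echo prints the command instead of running it (dry run).
ngb_exec() {
  "${DOCKER:-docker}" exec "${NGB_CONTAINER:-ngbcore}" ngb "$@"
}
```

Usage, equivalent to the reg_ref command above:
$ ngb_exec reg_ref /ngs/<PATH_TO_FASTA> -n my_genome -t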
# ngb reg_file my_genome /ngs/<PATH_TO_FILE> -n my_file1 -t
Note that you must provide the reference name (my_genome in this case). The -n (name) key is optional; if it is not specified, the original file name is used.
Create a dataset and add file(s) to it:
# ngb reg_dataset my_genome my_sample my_file1
Alternatively, you can create a dataset and register files in a single step:
# ngb reg_dataset my_genome my_sample /ngs/<PATH_TO_FILE> /ngs/<PATH_TO_FILE2>
Note that when registering a dataset, you must specify the name of the genome to which the files correspond.
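Because <YOUR_NGS_DATA_FOLDER> is mounted at /ngs, every file must be referenced by its container path when it is registered. This is a minimal sketch that converts host paths to /ngs paths and creates a dataset in one call, mirroring the single-step reg_dataset form above; NGS_HOST_DIR, to_container_path and register_dataset are hypothetical names, and ngb is assumed to be on the PATH for non-interactive docker exec:

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers). NGS_HOST_DIR must hold the host
# folder that was mounted at /ngs (<YOUR_NGS_DATA_FOLDER>).

# Print the container path for a host file located under NGS_HOST_DIR.
to_container_path() {
  [ -n "$NGS_HOST_DIR" ] || { echo "NGS_HOST_DIR is not set" >&2; return 1; }
  local host_dir="${NGS_HOST_DIR%/}" file="$1"
  case "$file" in
    "$host_dir"/*) printf '/ngs/%s\n' "${file#"$host_dir"/}" ;;
    *) echo "not under $host_dir: $file" >&2; return 1 ;;
  esac
}

# register_dataset GENOME DATASET HOST_FILE...
# Wraps `ngb reg_dataset`; DOCKER=echo prints the command (dry run).
register_dataset() {
  local genome="$1" dataset="$2"
  shift 2
  local paths=() f p
  for f in "$@"; do
    p=$(to_container_path "$f") || return 1
    paths+=("$p")
  done
  "${DOCKER:-docker}" exec "${NGB_CONTAINER:-ngbcore}" \
    ngb reg_dataset "$genome" "$dataset" "${paths[@]}"
}
```

Usage, with hypothetical host paths:
$ NGS_HOST_DIR=/data/ngs register_dataset my_genome my_sample /data/ngs/sample1.bam /data/ngs/sample1.vcf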
When you are done, leave the container's console with:
# exit
The NGB container keeps running in the background. Once datasets are created, you can browse the NGS data immediately.
Any data registered in an NGB container is lost once the container is removed. To avoid this, expose the cache locations inside the container to the host filesystem.
This can be done by mounting host folders into the container at the paths that hold the NGB index database (H2 dir) and the file caches (contents dir):
- /opt/catgenome/H2
- /opt/catgenome/contents
Note: these options must be passed to the docker run command at container start time.
Example:
Imagine a host machine that contains two folders:
- /ngs: stores NGS data that will be registered in NGB
- /ngb-cache: an empty folder that will be used to persist NGB caches
The following command persists all changes made in the container to these folders:
$ docker run -p 8080:8080 \
-d \
--name ngbcore \
-v /ngs:/ngs \
-v /ngb-cache/H2:/opt/catgenome/H2 \
-v /ngb-cache/contents:/opt/catgenome/contents \
lifescience/ngb
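Re-creating the container with this command (after removing the old one) will not cause loss of registered data or NGB configuration.
In general form, with host folders of your choice, the volume options are:
-v /host/ngs:/ngs -v /host/H2:/opt/catgenome/H2 -v /host/contents:/opt/catgenome/contents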
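To make this setup repeatable, you could wrap it in a small script. This is a minimal sketch that creates the host cache folders and re-creates the ngbcore container with the same three mounts as the example; run_ngb_persistent and the DOCKER=echo dry-run override are hypothetical, and the defaults mirror the example's /ngs and /ngb-cache folders:

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): (re)create the ngbcore container with
# NGS data and NGB caches mounted from the host.
# DOCKER=echo prints the docker commands instead of running them (dry run).
run_ngb_persistent() {
  local ngs_dir="${1:-/ngs}" cache_dir="${2:-/ngb-cache}"
  local docker="${DOCKER:-docker}"
  # Pre-create the cache folders; otherwise docker creates missing
  # bind-mount sources itself, owned by root
  mkdir -p "$cache_dir/H2" "$cache_dir/contents" || return 1
  # Remove an old container with the same name; bind-mounted data is kept
  "$docker" rm -f ngbcore >/dev/null 2>&1 || true
  "$docker" run -p 8080:8080 -d --name ngbcore \
    -v "$ngs_dir":/ngs \
    -v "$cache_dir/H2":/opt/catgenome/H2 \
    -v "$cache_dir/contents":/opt/catgenome/contents \
    lifescience/ngb
}
```

Usage, with the folders from the example:
$ run_ngb_persistent /ngs /ngb-cache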
The ngb:latest-demo container is built to show some basic features of NGB. It mostly uses reduced data to minimize the container size. It includes the following datasets:
- SV_Sample1 dataset: ALK-EML4 fusion
- SV_Sample2 dataset: ROS1-SLC34A2 fusion
- FGFR3-TACC-Fusion-Sample dataset: FGFR3-TACC3 fusion
- PIK3CA-E545K-Sample dataset: E545K SNV
- Fruitfly dataset: LIMK1 SNVs and indels