This Docker container provides a workflow to download and prepare a dbNSFP database dump for use in annotating the outputs of other pipelines. It currently targets dbNSFP version 4.0a, with plans to update as new versions are released.
Two copies of the database are made, one for hg19 and one for hg38. The downloaded database is 25GB and each reference genome version is also 25GB, so you'll need more than 75GB of storage to build the two versions. Once built, a tabix index file is created, allowing the database to be queried remotely.
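To illustrate, here is a minimal sketch of how such a tabix index can be built and then queried remotely; the file name, hosting URL, and the assumption that chromosome and position sit in columns 1 and 2 are placeholders, not details taken from this workflow:

```bash
# Compress the merged dbNSFP table with bgzip and build a tabix index.
# -s/-b/-e name the chromosome, start and end columns (assumed 1, 2, 2 here).
bgzip dbNSFP4.0a_hg38.txt
tabix -s 1 -b 2 -e 2 dbNSFP4.0a_hg38.txt.gz

# With the .gz file and its .tbi index hosted somewhere reachable, tabix can
# fetch just the blocks covering a region instead of the whole 25GB file:
tabix https://example.org/dbNSFP/dbNSFP4.0a_hg38.txt.gz 1:69000-70000
```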
The script will look for the zip file `dbNSFP4.0a.zip` and will not download it again if it is already there. You can test this container's later steps by pointing it at a directory that already contains `dbNSFP4.0a.zip`.
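Purely as an illustration of that check (the real script's paths, variable names, and download URL will differ), the skip-if-present logic combined with the aria2c multi-connection download mentioned in the changelog below might look like:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder locations -- not the script's actual values.
DATA_DIR=/data
ZIPFILE="${DATA_DIR}/dbNSFP4.0a.zip"
ZIP_URL="https://example.org/dbNSFP4.0a.zip"

if [ -f "${ZIPFILE}" ]; then
    # Reuse the existing 25GB download rather than fetching it again.
    echo "Found ${ZIPFILE}, skipping download."
else
    # aria2c with up to 5 connections to the server, writing into DATA_DIR.
    aria2c -x 5 -d "${DATA_DIR}" -o "$(basename "${ZIPFILE}")" "${ZIP_URL}"
fi
```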
Note: if you make changes to the `Dockerfile` or script, you'll need to rebuild the container.
```bash
git clone https://github.com/genomicsaotearoa/dbNSFP_build
cd dbNSFP_build
INPUTDIR=/data/dbNSFP   # Set this to a location with 100GB+ free
docker build -t dbnsfp .
docker run -it -v ${INPUTDIR}:/data dbnsfp
```
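The `-v ${INPUTDIR}:/data` flag mounts the host directory inside the container as `/data`, so the downloaded zip and the built databases should land in `${INPUTDIR}` on the host and persist after the container exits.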
- Initial Version as Gist - @sirselim
- August 22, 2019 - @jduckles
- Build a Docker container with dependencies
- Output some amount of progress feedback
- Read the number of threads from the CPU count in `/proc/cpuinfo` (so it assumes Linux rather than macOS, which is fine inside containers); a sketch of this follows the list.
- I removed the `-S 20G` memory allocation, as it won't work gracefully on machines with less than 20GB of memory
- Using functions in the BASH script to organise things a bit better
- Switched to Aria2c (vs wget) so I can have 5 download channels at once (faster download). Google Drive hosting is HEAPS faster, but the URLs are craptastic.
- Sketching out some MD5 sum checking
- Doing a few tests to prevent re-downloading the 25GB file, but downloading it if it isn't present
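As noted above, reading the thread count from `/proc/cpuinfo` can be sketched roughly as follows; this is illustrative only and not lifted from the actual script:

```bash
# Count "processor" entries in /proc/cpuinfo (Linux only); fall back to 1
# thread if the file is missing, e.g. when trying the logic on macOS.
THREADS=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null || echo 1)
echo "Using ${THREADS} threads"
```

On most Linux systems `nproc` reports the same information more directly.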
This work was started by @sirselim in this gist: https://gist.github.com/sirselim/dcaad07523c90b46c1c0685efbc5d04e