Major DROP update (#113)
* Major DROP update

* removed obsolete file

* changed project folder structure for initialisation

* API change: separated config and sample annotation parsers

* fixed missing root path

* import config differently

* package finding for subpackage

* running pipeline

* added circleci config

* removed obsolete file

* changed project folder structure for initialisation

* API change: separated config and sample annotation parsers

* fixed missing root path

* import config differently

* package finding for subpackage

* using correct .travis.yml

* removed subworkflow and running pipeline for aberrantExpression; new functions introduced for Script-Rule-Html conversions; analysis script directory renamed; pipeline code copied to different destination

* removed subworkflow for AS and MAE; changed folder structure; using log for tmp snakemake objects

* fixed input flags for analysis scripts

* running pipeline

* added circleci config

* removed obsolete file

* changed project folder structure for initialisation

* API change: separated config and sample annotation parsers

* fixed missing root path

* import config differently

* package finding for subpackage

* API change: separated config and sample annotation parsers

* fixed missing root path

* import config differently

* package finding for subpackage

* removed subworkflow and running pipeline for aberrantExpression; new functions introduced for Script-Rule-Html conversions; analysis script directory renamed; pipeline code copied to different destination

* removed subworkflow for AS and MAE; changed folder structure; using log for tmp snakemake objects

* fixed input flags for analysis scripts

* set submodules to master

* applied branch-specific modifications

* removed submodules.py and refactored path setup

* fixed Readme and config copy errors

* fixed copying helpers functions

* using wbuild with fixed Readme.html functionality and saving wbuild config in DropConfig

* moved exportCounts into separate class

* parsing GENE_COUNT_FILE column in sample annotation

* removed wbuild from conda recipe

* using bioconda version of drop for dependencies, pip install drop, no conda building

* remove bioconda wbuild

* install wbuild with expl pip

* remove wbuild with pip

* get count info from AE class

* some refactoring and saving external counts IDs in separate dictionary

* removed ID renaming in merge script (naming already in counts script); fixed missing input bug in DropConfig

* add external counts to count files

* merge counts and coverage including external counts

* file checks for config and sample annotation reimplemented

* config file getters for submodules

* refactored export counts

* travis run export counts rule

* reapplied bcftools command modification

* Documentation update (#102)

* README update to include Baylor counts
* updated drop installation command
* update install command docs to include conda-forge plus better descriptions

Co-authored-by: Vicente <[email protected]>
Co-authored-by: Michaela Müller <[email protected]>
Co-authored-by: Christian Mertes <[email protected]>

* updated documentation to include count import

* splicing export counts all columns added

* resolved requested formality errors

* fixed requested MAE changes

* removed scanBamParam

* create missing columns for MAE results

* Subindex (#3)

* first version of subindex implemented

* including readme and dependency graph for subindexes; removed fileRegex key from config (using default of ".*\.(R|md)")

* using different (wb1.8) config file

* separated dependency graph computation from rest of the pipeline

* using latest updated wbuild version

* use conda install for wbuild

Co-authored-by: mumichae <[email protected]>

* Fix version for wbuild (#108)

Co-authored-by: mumichae <[email protected]>

* updated version number in README and drop/cli

* Tests (#4)

Added pytest suite to project. The main things tested are:

* cli: basic drop command line functions

* all config classes

* pipeline runs for all submodules (including checking numbers of entries in output) and count export


## Commits

* first setup of pytest, pipeline runthroughs, no output checking yet

* updated version number in README and drop/cli

* fixed dependencies

* pip uninstall with -y

* changed pytest installation and error catching

* compare pipeline output and add more pipeline tests

* refactored pipeline tests

* fixed demo creation

* refactored pipeline tests again & fixed fixture scoping

* fixed minor issues in tests

* added config tests

* refactored getHtmlFromScript

Co-authored-by: mumichae <[email protected]>

* downloading data to temporary directory

* updated version to 1.0.0

Co-authored-by: Michaela Müller <[email protected]>
Co-authored-by: Vicente <[email protected]>
Co-authored-by: Christian Mertes <[email protected]>

* updated travis to use correct R and wbuild versions

* use github release badge instead of fixing it

Co-authored-by: Michaela Müller <[email protected]>
Co-authored-by: Vicente <[email protected]>
Co-authored-by: Christian Mertes <[email protected]>
4 people authored Oct 17, 2020
1 parent 8a493cf commit 5ce58f4
Showing 89 changed files with 1,817 additions and 1,434 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -9,6 +9,7 @@ Output/
.ipynb_checkpoints*
__pycache__*
*.egg-info*
.eggs*
dist/*

# typical latex tmp files
28 changes: 12 additions & 16 deletions .travis.yml
@@ -17,18 +17,17 @@ install:
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda

- conda config --add channels bioconda
- conda config --add channels conda-forge

# build package with cond
- conda install conda-build
- conda build conda.recipe --output-folder=$HOME/build
- conda config --add channels "file://${HOME}/build"

# test package
# install dependencies
- source $HOME/miniconda/etc/profile.d/conda.sh
- conda create -q -n drop drop_travis
- conda create -q -n drop -c conda-forge -c bioconda python=$TRAVIS_PYTHON_VERSION r-base=4.0.2
- conda activate drop
- conda install -c conda-forge -c bioconda drop
- conda remove --force drop wbuild
- conda install -c conda-forge -c bioconda wbuild=1.7.1
- pip install -r requirements_test.txt

# install drop
- pip install . -vv

script:
- conda list
@@ -37,10 +36,7 @@ script:
- samtools --version
- bcftools --version
- drop --version
- wbuild --version
- python --version

- mkdir drop_demo
- cd drop_demo
- drop demo
- snakemake -n
- snakemake --jobs 2 --cores 2

- pytest -vv -s
5 changes: 3 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
# Detection of RNA Outlier Pipeline
[![Pipeline status](https://travis-ci.org/gagneurlab/drop.svg?branch=master)](https://travis-ci.org/gagneurlab/drop)
[![Version](https://img.shields.io/badge/Version-0.9.2-green.svg)](https://github.com/gagneurlab/drop/releases/tag/0.9.2)
[![Version](https://img.shields.io/github/v/release/gagneurlab/drop?include_prereleases)](https://github.com/gagneurlab/drop/releases)
[![Version](https://readthedocs.org/projects/gagneurlab-drop/badge/?version=latest)](https://gagneurlab-drop.readthedocs.io/en/latest)

The manuscript main file, supplementary figures and table can be found in the manuscript folder or in
@@ -51,7 +51,8 @@ snakemake aberrantExpression -n
```

## Datasets
The following publicly-available datasets of gene counts can be used as controls:
The following publicly-available datasets of gene counts can be used as controls.
Please cite as instructed for each dataset.

* 119 non-strand specific fibroblasts: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3887451.svg)](https://doi.org/10.5281/zenodo.3887451)

91 changes: 0 additions & 91 deletions conda.recipe/meta.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -23,7 +23,7 @@
author = 'Michaela Müller'

# The full version, including alpha/beta/rc tags
release = '0.9.2'
release = '1.0.0'


# -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion docs/source/installation.rst
@@ -101,7 +101,7 @@ Alternatively, DROP can be installed without ``conda``. In this case the followi

* `python <https://www.python.org/>`_ >= 3.6 and `pip <https://pip.pypa.io/en/stable/installing/>`_ >= 19.1

* `R <https://www.r-project.org/>`_ >= 3.6 and corresponding `bioconductor <https://bioconductor.org/install/>`_ version
* `R <https://www.r-project.org/>`_ >= 3.6, <=4.0.2 and corresponding `bioconductor <https://bioconductor.org/install/>`_ version

* Commandline tools:

72 changes: 45 additions & 27 deletions docs/source/prepare.rst
@@ -39,15 +39,13 @@ Parameter Type Description
=================== ========== ======================================================================================================================================= ======
projectTitle character Title of the project to be displayed on the rendered HTML output ``Project 1``
htmlOutputPath character Full path of the folder where the HTML files are rendered ``/data/project1/htmlOutput``
indexWithFolderName boolean variable needed for wBuild, do not edit it ``true``
fileRegex character variable needed for wBuild, do not edit it ``.*\.R``
indexWithFolderName boolean If true, the basename of the project directory will be used as prefix for the index.html file ``true``
genomeAssembly character Either hg19 or hg38, depending on the genome assembly used for mapping ``/data/project1``
sampleAnnotation character Full path of the sample annotation table ``/data/project1/sample_annotation.tsv``
root character Full path of the folder where the subdirectories processed_data and processed_results will be created containing DROP's output files. ``/data/project1``
geneAnnotation dictionary A key-value list of the annotation name (key) and the full path to the GTF file (value). More than one annotation file can be provided. ``anno1: /path/to/gtf1.gtf``

``anno2: /path/to/gtf2.gtf``
scanBamParam character Either null or the path to an Rds file containing a scanBamParam object. Refer to the advanced options below. ``/path/to/scanBamParam.Rds``
tools dictionary A key-value list of different commands (key) and the command (value) to run them ``gatkCmd: gatk``

``bcftoolsCmd: bcftools``
@@ -126,10 +124,19 @@ qcGroups list Same as “groups”, but for the VCF-BAM matc
Creating the sample annotation table
------------------------------------

For details on how to generate the sample annotation, please refer to the DROP manuscript.
Here we provide some examples on how to deal with certain situations. For simplicity, we
do not include the other compulsory columns ``PAIRED_END``, ``COUNT_MODE``,
``COUNT_OVERLAPS`` and ``STRAND``.
For a detailed explanation of the columns of the sample annotation, please refer to
the DROP manuscript.
Inside the sample annotation, each row corresponds to a unique pair of RNA and DNA
samples derived from the same individual. An RNA assay can belong to one or more DNA
assays, and vice-versa. If so, they must be specified in different rows. The required
columns are ``RNA_ID``, ``RNA_BAM_FILE`` and ``DROP_GROUP``, plus other module-specific
ones (see DROP manuscript). In case external counts are included, add a new row for each
sample from those files (or a subset if not all samples are needed).

The sample annotation file should be saved in the tab-separated values (tsv) format. The
column order does not matter. Also, it does not matter where it is stored, as the path is
specified in the config file. Here we provide some examples on how to deal with certain
situations. For simplicity, we do not include all possible columns in the examples.
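
Editorial sketch, not part of this commit: the required columns listed above can be checked before running the pipeline. The snippet below is a minimal illustration using only the Python standard library; the helper name `missing_required_columns` is hypothetical and is not DROP's own file check.

```python
import csv
import io

# Columns named as required in the text above; module-specific ones are omitted.
REQUIRED_COLUMNS = ("RNA_ID", "RNA_BAM_FILE", "DROP_GROUP")


def missing_required_columns(tsv_text):
    """Return the required columns that are absent from the header of a TSV string."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Column order does not matter, so DROP_GROUP may come first.
example = (
    "DROP_GROUP\tRNA_ID\tDNA_ID\tRNA_BAM_FILE\n"
    "WGS\tS10R\tS10G\t/path/to/S10R.BAM\n"
)
print(missing_required_columns(example))
```

An empty list means all three columns are present. Reading from a string keeps the sketch self-contained; for a real file, the contents could be passed in after opening the path given in the config.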

Example of RNA replicates
++++++++++++++++++++++++++++++++++
@@ -144,22 +151,41 @@ S10R_M S10G MUSCLE /path/to/S10R_M.BAM /path/to/S10G.vcf.gz
Example of DNA replicates
++++++++++++++++++++++++++++++++++

====== ====== ========== =================== ==
RNA_ID DNA_ID DROP_GROUP RNA_BAM_FILE DNA_VCF_FILE
====== ====== ========== =================== ==
S20R S20E WES /path/to/S20R.BAM /path/to/S20E.vcf.gz
S20R S20G WGS /path/to/S20R.BAM /path/to/S20G.vcf.gz
====== ====== ========== =================== ==
====== ====== ========== ================= ==
RNA_ID DNA_ID DROP_GROUP RNA_BAM_FILE DNA_VCF_FILE
====== ====== ========== ================= ==
S20R S20E WES /path/to/S20R.BAM /path/to/S20E.vcf.gz
S20R S20G WGS /path/to/S20R.BAM /path/to/S20G.vcf.gz
====== ====== ========== ================= ==

Example of a multi-sample vcf file
++++++++++++++++++++++++++++++++++

====== ====== ========== =================== ==
RNA_ID DNA_ID DROP_GROUP RNA_BAM_FILE DNA_VCF_FILE
====== ====== ========== =================== ==
S10R S10G WGS /path/to/S10R.BAM /path/to/multi_sample.vcf.gz
S20R S20G WGS /path/to/S20R.BAM /path/to/multi_sample.vcf.gz
====== ====== ========== =================== ==
====== ====== ========== ================= ==
RNA_ID DNA_ID DROP_GROUP RNA_BAM_FILE DNA_VCF_FILE
====== ====== ========== ================= ==
S10R S10G WGS /path/to/S10R.BAM /path/to/multi_sample.vcf.gz
S20R S20G WGS /path/to/S20R.BAM /path/to/multi_sample.vcf.gz
====== ====== ========== ================= ==

External count matrices
+++++++++++++++++++++++

In case counts from external matrices are to be integrated into the analysis,
the file must be specified in the GENE_COUNTS_FILE column. A new row must be
added for each sample from the count matrix that should be included in the
analysis. An RNA_BAM_FILE must not be specified. The DROP_GROUP of the local
and external samples that are to be analyzed together must be the same.
Similarly, the GENE_ANNOTATION of the external counts and the key of the `geneAnnotation`
parameter from the config file must match.

====== ====== ========== ================= ==
RNA_ID DNA_ID DROP_GROUP RNA_BAM_FILE GENE_COUNTS_FILE
====== ====== ========== ================= ==
S10R S10G BLOOD /path/to/S10R.BAM
EXT-1R BLOOD /path/to/externalCounts.tsv.gz
EXT-2R BLOOD /path/to/externalCounts.tsv.gz
====== ====== ========== ================= ==
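
Editorial sketch, not part of this commit: the rules for external count rows can be illustrated with a small check. It assumes that a row counts as external when its ``GENE_COUNTS_FILE`` field is non-empty and that ``DROP_GROUP`` holds a single group name; the function `summarize_annotation` is hypothetical and does not reproduce DROP's parser.

```python
import csv
import io
from collections import defaultdict


def summarize_annotation(tsv_text):
    """Return (RNA_IDs breaking the BAM rule, samples per DROP_GROUP)."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    groups = defaultdict(lambda: {"local": [], "external": []})
    for row in rows:
        # Assumption: a non-empty GENE_COUNTS_FILE marks an external sample.
        is_external = bool(row.get("GENE_COUNTS_FILE"))
        # External samples must not specify an RNA_BAM_FILE.
        if is_external and row.get("RNA_BAM_FILE"):
            problems.append(row["RNA_ID"])
        kind = "external" if is_external else "local"
        groups[row["DROP_GROUP"]][kind].append(row["RNA_ID"])
    return problems, dict(groups)


# Same rows as the table above, written as tab-separated text.
example = (
    "RNA_ID\tDNA_ID\tDROP_GROUP\tRNA_BAM_FILE\tGENE_COUNTS_FILE\n"
    "S10R\tS10G\tBLOOD\t/path/to/S10R.BAM\t\n"
    "EXT-1R\t\tBLOOD\t\t/path/to/externalCounts.tsv.gz\n"
    "EXT-2R\t\tBLOOD\t\t/path/to/externalCounts.tsv.gz\n"
)
problems, groups = summarize_annotation(example)
print(problems)
print(groups["BLOOD"])
```

The per-group summary makes it easy to see which external samples share a ``DROP_GROUP`` with local ones and would therefore be analyzed together.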


Advanced options
@@ -183,12 +209,4 @@ We recommend the search space to be at most N/3 for the aberrant expression,
and N/6 for the aberrant splicing case. Nevertheless, the user can specify the
denominator with the parameter ``maxTestedDimensionProportion``.

In order to influence which fields of the BAM files are imported, the user can
provide a ``scanBamParam`` object. This will affect how the files are counted in
the aberrant expression and splicing modules. Refer to the function's
`documentation <https://www.rdocumentation.org/packages/Rsamtools/versions/1.24.0/topics/ScanBamParam>`_ for details.





26 changes: 0 additions & 26 deletions drop/GeneticDiagnosis_Demo.R

This file was deleted.

24 changes: 3 additions & 21 deletions drop/__init__.py
@@ -1,21 +1,3 @@
from .setupDrop import setupDrop as drop
from .configHelper import ConfigHelper as config
from .submodules import *

def init():
wbuild.cli.init()
# compy our template

def update():
wbuild.cli.update()

if __name__ == '__main__':
import sys
import wbuild

arg = sys.args[1]
if arg == 'init':
init()
elif arg == 'update':
update()

from .setupDrop import *
from . import config
from . import utils