Add multiqc #14

Merged · 33 commits · Oct 31, 2023

Commits
bfcac3e
add ngs_qa_qc dockerfile and ep
rroutsong Oct 5, 2023
61c62cc
begin restructuring of workflows for extension of demux workflow
rroutsong Oct 5, 2023
80d48c0
merge in dry_run action changes
rroutsong Oct 6, 2023
726c174
start ngs qc/qa pipeline, fastqc - trimmed/untrimmed + fastp trimming
rroutsong Oct 6, 2023
88d43eb
ngsqc fastqc, trimming, fastqc again after trim
rroutsong Oct 10, 2023
b91ad97
fix line endings on ep.sh
rroutsong Oct 11, 2023
b0f0407
add in fastq screen config to docker
rroutsong Oct 11, 2023
5a4b4fe
final ngsqc dockerfile
rroutsong Oct 12, 2023
3fa9e92
feat: expand ngs qc workflow
rroutsong Oct 13, 2023
0f5d1a0
feat: python module support for ngs qc-qa
rroutsong Oct 13, 2023
0101883
chore: ignore jsons, lower latency wait for biowulf
rroutsong Oct 13, 2023
b8a3b48
feat: finalize first half of ngsqc pipeline
rroutsong Oct 17, 2023
64252ef
fix: snakemake pathing correction
rroutsong Oct 17, 2023
160e368
fix: dry run action needs -s kwarg
rroutsong Oct 17, 2023
8034709
chore: fix args in github action
rroutsong Oct 17, 2023
b95ff7b
chore: refactor setuptools package, force include workflow and profil…
rroutsong Oct 17, 2023
889554d
chore: dry run command not being executed
rroutsong Oct 17, 2023
a9d1fd0
feat: kaiju and kraken annotation rules, beginning
rroutsong Oct 19, 2023
a6e5550
chore: merge in conda2src changes, drop using conda
rroutsong Oct 19, 2023
52558fc
chore: fix more merge conflicts
rroutsong Oct 19, 2023
797119a
feat: working kraken & kaiju
rroutsong Oct 24, 2023
63fd0a9
chore: remap outputs to discussed structure
rroutsong Oct 24, 2023
82b690e
fix: align io paths for ngsqc workflow
rroutsong Oct 24, 2023
bdb8c1d
chore: relocate slurm logs directory
rroutsong Oct 25, 2023
2d74018
fix: correct fastqc_trimmed output paths
rroutsong Oct 25, 2023
bc51fb2
fix: dry run action, add path to env
rroutsong Oct 25, 2023
34a33cd
fix: broken path in dry run action
rroutsong Oct 25, 2023
c063c04
fix: cat from /Users/routsongrm/git/NGS/Dmux
rroutsong Oct 25, 2023
ab25f77
fix: cat from \$PWD/NGS/Dmux
rroutsong Oct 25, 2023
25cbc30
Merge remote-tracking branch 'refs/remotes/origin/add_kaiju_kraken' i…
rroutsong Oct 25, 2023
c86aff4
feat: add multiqc report
rroutsong Oct 25, 2023
2ced134
chore: merge in main, fix multiqc output report name
rroutsong Oct 25, 2023
e787239
fix: new flag for dry run in CI
rroutsong Oct 25, 2023
2 changes: 1 addition & 1 deletion .github/workflows/dryrun.yaml
@@ -13,7 +13,7 @@ jobs:
- name: Dry Run with test data
run: |
docker run -h cn0000 -v $PWD:/opt2 snakemake/snakemake:stable /bin/bash -c \
"pip install /opt2; dmux run -s /opt2/.tests/illumnia_demux -o /opt2/.tests/illumnia_demux/dry_run_out --local --pretend /opt2/.tests/illumnia_demux"
"pip install /opt2; dmux run -s /opt2/.tests/illumnia_demux -o /opt2/.tests/illumnia_demux/dry_run_out --local --dry-run /opt2/.tests/illumnia_demux"
- name: View the pipeline config file
run: |
echo "Generated config file for pipeline...." && cat $PWD/.tests/illumnia_demux/dry_run_out/EXP_PROJ_demux/.config/config_job_0.json
10 changes: 5 additions & 5 deletions bin/dmux.py
rroutsong marked this conversation as resolved.
@@ -33,10 +33,10 @@ def run(args):
config['bcl_files'].append(list(Path(rundir).rglob('*.bcl.*')))
out_to = Path(args.output, f"{sample_sheet.Header['Experiment Name']}_demux") if args.output \
else Path(rundir, f"{sample_sheet.Header['Experiment Name']}_demux")
utils.valid_run_output(out_to, dry_run=args.pretend)
utils.valid_run_output(out_to, dry_run=args.dry_run)
config['out_to'].append(out_to)

utils.exec_demux_pipeline(config, dry_run=args.pretend, local=args.local)
utils.exec_demux_pipeline(config, dry_run=args.dry_run, local=args.local)

# if qc not disabled:
# - mutate config into structs/data appropriate for `args`
@@ -80,7 +80,7 @@ def ngsqc(args):

configs['out_to'].append(out_base)

utils.exec_ngsqc_pipeline(configs, dry_run=args.pretend, local=args.local)
utils.exec_ngsqc_pipeline(configs, dry_run=args.dry_run, local=args.local)


def logs(args):
@@ -103,7 +103,7 @@ def logs(args):
'matching run ids, if not using full paths.')
parser_run.add_argument('-o', '--output', metavar='<output directory>', default=None, type=str,
help='Top-level output directory for demultiplexing data (defaults to input directory + runid + "_demux")')
parser_run.add_argument('-p', '--pretend', action='store_true',
parser_run.add_argument('-d', '--dry-run', action='store_true',
help='Dry run the demultiplexing workflow')
parser_run.add_argument('-l', '--local', action='store_true',
help='Execute pipeline locally without a dispatching executor')
@@ -119,7 +119,7 @@ def logs(args):
parser_ngs_qc.add_argument('-s', '--seq_dir', metavar='<sequencing directory>', default=None, type=str,
help='Root directory for sequencing data (defaults for biowulf/bigsky/locus), must contain directories ' + \
'matching run ids, if not using full paths.')
parser_ngs_qc.add_argument('-p', '--pretend', action='store_true',
parser_ngs_qc.add_argument('-d', '--dry-run', action='store_true',
help='Dry run the demultiplexing workflow')
parser_ngs_qc.add_argument('-l', '--local', action='store_true',
help='Execute pipeline locally without a dispatching executor')
2 changes: 2 additions & 0 deletions src/Dmux/workflow/ngs_qaqc/Snakefile
@@ -27,6 +27,8 @@ rule all:
# kaiju
expand("{out_dir}/{project}/{rid}/{sids}/kaiju/{sids}.tsv", out_dir=config['out_to'], sids=config['sids'],
project=config['projects'], rid=config['run_ids']),
f"{config['out_to']}/{config['projects']}/{config['run_ids']}/multiqc/multiqc_report.html",


include: "fastq.smk"
include: "qc.smk"
11 changes: 6 additions & 5 deletions src/Dmux/workflow/ngs_qaqc/fastq.smk
Contributor

We need to change how adapter sequences are being removed. Currently, there is a bug where the barcode sequences from Illumina's sample sheet (i7/i5) are being passed to fastqc and fastp. These barcode sequences should already be removed after the bcl2fastq step and do not represent the traditional library-prep-kit-specific adapter sequences that need to be removed. With that being said, let's make use of fastp's auto-detect-adapter-sequences feature to remove them. We can also make use of fastqc's internal contaminants/adapters list to identify sequencing adapters.

Collaborator Author

Here's the fastp rule in the new branch master_job_and_bigsky:

    shell:
        """
        fastp \
        --detect_adapter_for_pe \
        --in1 {input.in_read1} --in2 {input.in_read2} \
        --out1 {output.out_read1} \
        --out2 {output.out_read2} \
        --html {output.html} \
        --json {output.json} \
        """

Fastqc:

    shell:
        """
        mkdir -p {params.output_dir}
        fastqc -o {params.output_dir} -t {threads} {input.samples}
        """

FastQC before trimming depends on the demuxed reads; FastQC after trimming depends on the trimmed reads files.
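
A minimal sketch of the change being discussed, assuming the old rule interpolated sample-sheet barcodes into fastp's explicit adapter flags (the get_adapter_opts body and lookup_barcodes helper here are hypothetical, not this PR's code):

    # Hypothetical before/after for the adapter-handling fix described above.

    def lookup_barcodes(sid):
        # stand-in for reading a sample's i7/i5 barcodes from the sample sheet
        return "ACGTACGT", "TGCATGCA"

    # Before (assumed buggy behavior): i7/i5 barcodes from Illumina's sample
    # sheet were handed to fastp as if they were library-prep adapters.
    def get_adapter_opts(wildcards):
        i7, i5 = lookup_barcodes(wildcards.sid)
        return f"--adapter_sequence {i7} --adapter_sequence_r2 {i5}"

    # After: drop the per-sample adapter params entirely and let fastp
    # auto-detect adapters from read overlap in paired-end data.
    FASTP_ADAPTER_FLAGS = "--detect_adapter_for_pe"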

@@ -8,12 +8,11 @@ rule trim_w_fastp:
in_read2 = config['demux_dir'] + "/{project}/{sid}_R2_001.fastq.gz",
output:
html = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/fastp/{sid}.html",
json = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/fastp/{sid}.json",
json = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/fastp/{sid}_fastp.json",
out_read1 = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/fastp/{sid}_trimmed_R1.fastq.gz",
out_read2 = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/fastp/{sid}_trimmed_R2.fastq.gz",
params:
adapters = get_adapter_opts,
# container: "docker://rroutsong/dmux_ngsqc:0.0.1",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
threads: 4,
resources: mem_mb = 8192,
@@ -42,7 +41,6 @@ rule fastq_screen:
subset = 1000000,
aligner = "bowtie2",
output_dir = lambda w: config['out_to'] + "/" + w.project + "/" + config['run_ids'] + "/" + w.sid + "/fastq_screen/",
# container: "docker://rroutsong/dmux_ngsqc:0.0.1",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
Contributor

We need to add an option to point to a SIF cache and dynamically resolve one of the following: a local SIF on the file system or a URI to pull an image from Docker Hub.

Collaborator Author

I have a solution to this issue in the upcoming PR: I have serialized the server-specific SIF directories and am dynamically adding the specific server configuration at initialization time.

Ends up like:

containerized: server_config["sif"] + "dmux_ngsqc_0.0.1.sif"

The SIF cache is always specified at execution time through environment variables and subprocess.
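
For reference, a minimal sketch of that resolution logic under the reviewer's two cases (the SIF_CACHE variable name and the Docker Hub namespace are illustrative assumptions, not the PR's actual implementation):

    import os
    from pathlib import Path

    def resolve_container(image_name: str, tag: str) -> str:
        # Prefer a locally cached SIF; otherwise fall back to a Docker Hub
        # URI so Singularity pulls the image at run time.
        sif_cache = os.environ.get("SIF_CACHE", "/data/OpenOmics/SIFs")
        local_sif = Path(sif_cache) / f"{image_name}_{tag}.sif"
        if local_sif.exists():
            return str(local_sif)
        return f"docker://rroutsong/{image_name}:{tag}"

    # e.g. containerized: resolve_container("dmux_ngsqc", "0.0.1")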

threads: 4,
resources: mem_mb = 8192,
Expand All @@ -65,11 +63,13 @@ rule kaiju_annotation:
read2 = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/fastp/{sid}_trimmed_R2.fastq.gz",
output:
kaiju_report = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/kaiju/{sid}.tsv",
kaiju_species = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/kaiju/{sid}_species.tsv",
kaiju_phylum = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/kaiju/{sid}_phylum.tsv",
params:
# TODO: soft code these paths
nodes = "/data/OpenOmics/references/Dmux/kaiju/kaiju_db_nr_euk_2023-05-10/nodes.dmp",
names = "/data/OpenOmics/references/Dmux/kaiju/kaiju_db_nr_euk_2023-05-10/names.dmp",
database = "/data/OpenOmics/references/Dmux/kaiju/kaiju_db_nr_euk_2023-05-10/kaiju_db_nr_euk.fmi",
# container: "docker://rroutsong/dmux_ngsqc:0.0.1",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
log: config['out_to'] + "/.logs/{project}/" + config['run_ids'] + "/kaiju/{sid}.log",
threads: 24
@@ -86,6 +86,8 @@ rule kaiju_annotation:
-j {input.read1} \
-z {threads} \
-o {output.kaiju_report}
kaiju2table -t {params.nodes} -n {params.names} -r species -o {output.kaiju_species} {output.kaiju_report}
kaiju2table -t {params.nodes} -n {params.names} -r phylum -o {output.kaiju_phylum} {output.kaiju_report}
"""


@@ -98,7 +100,6 @@ rule kraken_annotation:
kraken_log = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/kraken/{sid}.log",
params:
kraken_db = "/data/OpenOmics/references/Dmux/kraken2/k2_pluspfp_20230605"
Contributor

We need a method to dynamically resolve the reference files.

Collaborator Author

Also addressed this in the next PR. I just saved up all the server resolution methods until I moved on to bigsky.
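
A hedged sketch of host-based reference resolution (the hostname matching and the bigsky path are assumptions for illustration, not this PR's code):

    import socket

    # Hypothetical server-to-reference-root mapping; only the biowulf path
    # appears in this PR, the bigsky path is a placeholder.
    REF_ROOTS = {
        "biowulf": "/data/OpenOmics/references/Dmux",
        "bigsky": "/gs1/Scratch/OpenOmics/references/Dmux",  # placeholder
    }

    def reference(suffix: str) -> str:
        host = socket.gethostname()
        for server, root in REF_ROOTS.items():
            if server in host:
                return f"{root}/{suffix}"
        raise ValueError(f"no reference root configured for host {host!r}")

    # e.g. kraken_db = reference("kraken2/k2_pluspfp_20230605")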

# container: "docker://rroutsong/dmux_ngsqc:0.0.1",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
log: config['out_to'] + "/.logs/{project}/" + config['run_ids'] + "/kraken/{sid}.log",
threads: 24
45 changes: 43 additions & 2 deletions src/Dmux/workflow/ngs_qaqc/qc.smk
@@ -29,7 +29,6 @@ rule fastqc_untrimmed:
output_dir = lambda w: config['out_to'] + "/" + w.project + "/" + config['run_ids'] + "/" + w.sid + "/fastqc_untrimmed/"
log: config['out_to'] + "/.logs/{project}/" + config['run_ids'] + "/fastqc_untrimmed/{sid}_R{rnum}.log"
threads: 4
# container: "docker://rroutsong/dmux_ngsqc:0.0.1",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
resources: mem_mb = 8096
shell:
@@ -47,7 +46,6 @@ rule fastqc_trimmed:
fqreport = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/fastqc_trimmed/{sid}_trimmed_R{rnum}_fastqc.zip",
params:
output_dir = lambda w: config['out_to'] + "/" + w.project + "/" + config['run_ids'] + "/" + w.sid + "/fastqc_trimmed/"
# container: "docker://rroutsong/dmux_ngsqc:0.0.1",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
threads: 4
resources: mem_mb = 8096
@@ -57,3 +55,46 @@ rule fastqc_trimmed:
mkdir -p {params.output_dir}
fastqc -o {params.output_dir} -t {threads} {input.in_read}
"""


rule multiqc_report:
input:
# fastqc on untrimmed reads
expand("{out_dir}/{project}/{rid}/{sids}/fastqc_untrimmed/{sids}_R{rnum}_001_fastqc.zip", out_dir=config['out_to'],
project=config['projects'], rid=config['run_ids'], sids=config['sids'], rnum=config['rnums']),
# fastqc on trimmed reads
expand("{out_dir}/{project}/{rid}/{sids}/fastqc_trimmed/{sids}_trimmed_R{rnum}_fastqc.zip", out_dir=config['out_to'],
sids=config['sids'], project=config['projects'], rid=config['run_ids'], rnum=config['rnums']),
# fastp trimming metrics
expand("{out_dir}/{project}/{rid}/{sids}/fastp/{sids}_trimmed_R{rnum}.fastq.gz", out_dir=config['out_to'],
sids=config['sids'], project=config['projects'], rid=config['run_ids'], rnum=config['rnums']),
# fastq screen
expand("{out_dir}/{project}/{rid}/{sids}/fastq_screen/{sids}_trimmed_R{rnum}_screen.html", out_dir=config['out_to'],
sids=config['sids'], rnum=config['rnums'], rid=config['run_ids'], project=config['projects']),
# kraken2
expand("{out_dir}/{project}/{rid}/{sids}/kraken/{sids}.tsv", out_dir=config['out_to'], sids=config['sids'],
project=config['projects'], rid=config['run_ids']),
# kaiju
expand("{out_dir}/{project}/{rid}/{sids}/kaiju/{sids}.tsv", out_dir=config['out_to'], sids=config['sids'],
project=config['projects'], rid=config['run_ids']),
output:
mqc_report = f"{config['out_to']}/{config['projects']}/{config['run_ids']}" + \
"/multiqc/Run-" + config['run_ids'] + \
"-Project-" + config['projects'] + "_multiqc_report.html"
params:
input_dir = config['out_to'],
demux_dir = config['demux_dir'],
output_dir = config['out_to'] + "/" + config['projects'] + "/" + config['run_ids'] + "/multiqc/",
report_title = f"Run: {config['run_ids']}, Project: {config['projects']}",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
threads: 4
resources: mem_mb = 8096
log: config['out_to'] + "/.logs/" + config['projects'] + "/" + config['run_ids'] + "/multiqc/multiqc.log"
shell:
"""
multiqc -q -ip \
Contributor

At some point, we may want to point to a MultiQC config file to clean up the general statistics table, create two sections for fastqc, and create a preferred module order in the final report.

Collaborator Author

This is outlined in #15; a rough sketch of one such config follows the rule below.

--title \"{params.report_title}\" \
-o {params.output_dir} \
{params.input_dir} {params.demux_dir} \
--ignore ".cache" --ignore ".config" --ignore ".snakemake" --ignore ".slurm" --ignore ".singularity" --ignore ".logs"
"""