Adding single-read functionality to RAW and CLEAN #80

Merged on Dec 20, 2024 (108 commits; changes shown from 100 commits)

Commits
15354f6
Adding single read option to raw/main.nf
simonleandergrimm Oct 21, 2024
ad2115d
Adding WIP version of run.nf to enable testing raw and clean versions…
simonleandergrimm Oct 21, 2024
03ee37a
Created separate versions of summarize-multiqc-single.R and summarize…
simonleandergrimm Oct 22, 2024
b517340
Split processes in fastp to a single read and paired-end read version.
simonleandergrimm Oct 22, 2024
01ea0c5
Split processes in MultiQC to a single read and paired-end read versi…
simonleandergrimm Oct 22, 2024
ad8faf9
Deleted summarizeMultiqcSingle, which was superseded by summarizeMultiqc
simonleandergrimm Oct 22, 2024
ef0e9c8
Split processes in truncateConcat to a single read and paired-end rea…
simonleandergrimm Oct 22, 2024
2535ccd
Created a single_end if clause in Clean to either use the single read…
simonleandergrimm Oct 22, 2024
cbcb109
Created a single_end if clause in hv_screen to either use the single …
simonleandergrimm Oct 22, 2024
c7f8c83
Created a single_end if clause in qc to either use the single read or…
simonleandergrimm Oct 22, 2024
ff0a8be
Renamed test dir to test-paired-end. Added clause in nextflow.config …
simonleandergrimm Oct 22, 2024
6048dd3
Edited gitignore to leave out test-paired-end and test-single-read ru…
simonleandergrimm Oct 22, 2024
92270e5
Fixed name of test-single-end dir to test-single-read
simonleandergrimm Oct 22, 2024
b13ac94
Created a version of test dir that allows the run of single-read data.
simonleandergrimm Oct 22, 2024
dff2302
Added script to quickly download the s3 output of test single read an…
simonleandergrimm Oct 23, 2024
64bb7f4
Added nextflow config for test paired and test single read.
simonleandergrimm Oct 23, 2024
5bd1aec
Fixed if clause in main.nf
simonleandergrimm Oct 23, 2024
c8fd3ac
Updated gen samplesheet scripts to pull in data from s3://nao-mgs-sim…
simonleandergrimm Oct 23, 2024
578fde0
Updated gitignore
simonleandergrimm Oct 23, 2024
59218b9
Activated CLEAN subworkflow in run.nf
simonleandergrimm Oct 23, 2024
fd9dc1e
Starting to adapt Will's https://data.securebio.org/wills-public-note…
simonleandergrimm Oct 23, 2024
81ff0ba
Adding ignoring mgs-results to gitignore
simonleandergrimm Oct 23, 2024
590b2c3
Adding Will's auxiliary scripts to run his quarto notebooks.
simonleandergrimm Oct 23, 2024
6a650b4
Merge branch 'master' into single-read-raw
simonleandergrimm Oct 23, 2024
9f1eb03
Amended qmd somewhat so data imports work.
simonleandergrimm Oct 24, 2024
9622004
Added a flag to summarize-multiqc-single.R that provides info on the…
simonleandergrimm Oct 25, 2024
c61ed0c
Amended logic of split_sample, so it does not split and pull out read…
simonleandergrimm Oct 25, 2024
f8d9c28
Deleting separate version of summarize-multiqc I created for paired r…
simonleandergrimm Oct 25, 2024
8e1c7b5
Revert "Split processes in MultiQC to a single read and paired-end re…
simonleandergrimm Oct 25, 2024
0ba0552
Revert "Deleted summarizeMultiqcSingle, which was superseded by summa…
simonleandergrimm Oct 25, 2024
8bafee8
Revert "Created a single_end if clause in qc to either use the single…
simonleandergrimm Oct 25, 2024
68c7c50
Amended main.nf of summarizeMultiqcSingle, clean, qc, and raw, to pro…
simonleandergrimm Oct 25, 2024
1656b33
Amended summarize-multiqc-single.R's basic_info_fastqc so it also sub…
simonleandergrimm Oct 25, 2024
4ec6788
Switched the --paired flag to instead be --read_type, and have it be …
simonleandergrimm Oct 25, 2024
f2bb836
Merge branch 'dev' into single-read-raw
simonleandergrimm Oct 25, 2024
e13acc6
Deleted a directory with testing scripts that was superseded by https…
simonleandergrimm Oct 25, 2024
9c62aa4
this script is now in https://github.com/naobservatory/simon-analysis…
simonleandergrimm Oct 25, 2024
be46ee9
Adding normal test dataset back in.
simonleandergrimm Oct 26, 2024
17d61ff
removing new versions of generate_samplesheet.sh (will add two differ…
simonleandergrimm Oct 26, 2024
0ba23fb
Reinstating dev version of run.nf, and creating new version of run.nf…
simonleandergrimm Oct 26, 2024
118378c
Adding run_dev_se to main.nf, a run specifically used for checking if…
simonleandergrimm Oct 26, 2024
8cd5239
Fixing default value for --read_type in summarize-multiqc-single.R. A…
simonleandergrimm Oct 26, 2024
7d3e725
Dropping commented out sections in split_sample
simonleandergrimm Oct 26, 2024
c107e91
Pulling in newest version of generate_samplesheet.sh
simonleandergrimm Oct 26, 2024
2d07ae6
Fixing single vs paired end read logic in hv_screen
simonleandergrimm Oct 26, 2024
74cb53a
Turned generate_samplesheet.sh back into dev version. Will and single…
simonleandergrimm Oct 28, 2024
3b0a11c
Adding read_type information to run.nf so the correct processes are p…
simonleandergrimm Oct 28, 2024
8f6beda
Extended generate_samplesheet.sh so it also takes in single-read data.
Nov 12, 2024
69f404c
Merge branch 'master' into single-read-raw-clean
simonleandergrimm Nov 19, 2024
654dd1c
Amended subworkflows to take in single end data.
simonleandergrimm Nov 19, 2024
2a01243
Merge branch 'master' into single-read-raw-clean
simonleandergrimm Nov 19, 2024
e9f7384
Reworked summarize_multiqc_pair.R to take in single_end data.
simonleandergrimm Nov 19, 2024
793a061
Made run_dev_se.nf follow updates to run.nf, and fixed single_end det…
simonleandergrimm Nov 19, 2024
fdf81af
Dropped two versions of FASTP, created conditional statement instead.
simonleandergrimm Nov 19, 2024
95dcf91
Dropped two different versions of the truncate_concat and added condi…
simonleandergrimm Nov 19, 2024
ada8c5e
dropped conditional selection of processes.
simonleandergrimm Nov 19, 2024
a448dc9
Fixed single_end variable passing
simonleandergrimm Nov 19, 2024
e9b89be
Added new single read flagging in run.nf
simonleandergrimm Nov 19, 2024
eb82a32
removed old summarize-multiqc file
simonleandergrimm Nov 19, 2024
00ddcfc
fixed index in nextflow.config for paired end data.
simonleandergrimm Nov 19, 2024
e5b5ec5
added grouping and new index info to test-single-read config
simonleandergrimm Nov 19, 2024
8e201e7
Adding improved configs
simonleandergrimm Nov 23, 2024
591138d
dropped single end definition in run file.
simonleandergrimm Nov 23, 2024
27244bd
Adding params to single end variable invocation
simonleandergrimm Nov 23, 2024
517961f
removed whitespace
simonleandergrimm Nov 23, 2024
c28749f
updating nextflow.config of test
simonleandergrimm Nov 23, 2024
e132ec4
fixed single_end config in normal run workflow
simonleandergrimm Nov 23, 2024
51b9cf3
make single-end variable logical.
simonleandergrimm Nov 23, 2024
12c3fdd
Reverted to old gitignore structure.
simonleandergrimm Nov 23, 2024
4fd3ce6
Changed test dirs to only have one dir for run_dev_se.
simonleandergrimm Nov 24, 2024
d460813
Adding WIP progress
simonleandergrimm Nov 24, 2024
f412b07
Merge branch 'dev' into single-read-raw-clean
simonleandergrimm Nov 24, 2024
3d10bb0
Fixing single_end being unbound.
simonleandergrimm Nov 24, 2024
7899979
Merge branch 'dev' into single-read-raw-clean
simonleandergrimm Nov 29, 2024
dd942fa
Took into account new testing setup
simonleandergrimm Nov 29, 2024
50c2edc
adding single end info to config
simonleandergrimm Nov 29, 2024
61ea369
Moved single end eval from config to run files
simonleandergrimm Nov 29, 2024
ad640c6
Update nextflow.config
simonleandergrimm Dec 3, 2024
e85dd45
Merge remote-tracking branch 'origin/harmon_fix_gh_actions_test' into…
simonleandergrimm Dec 3, 2024
3a6f6b5
Merge remote-tracking branch 'origin/harmon_fix_gh_actions_test' into…
simonleandergrimm Dec 4, 2024
a0f5f32
Put single_end into profiles.config
simonleandergrimm Dec 4, 2024
d14da14
fixed run-dev-se config in tests
simonleandergrimm Dec 4, 2024
3fe2bd2
Creating a new config for read_type flag.
simonleandergrimm Dec 4, 2024
d0375ab
added run dev se to end-to-end yml
simonleandergrimm Dec 4, 2024
f5cf80a
Made rundevse index and outputs look the same as run.nf
simonleandergrimm Dec 4, 2024
3dc323e
Fixing setup of run_dev_se test config.
simonleandergrimm Dec 5, 2024
1904931
Update .gitignore (dropped new line)
simonleandergrimm Dec 5, 2024
e24d79e
Setting profiles.config back to original
simonleandergrimm Dec 5, 2024
b38b93d
Updated comments in main.nf to represent the possibility of not not ala…
simonleandergrimm Dec 9, 2024
21b15b8
Fixed duplicate par statement in fastp.
simonleandergrimm Dec 9, 2024
9d717b7
Responding to Harmon's comments.
simonleandergrimm Dec 9, 2024
ee7baf4
dropped unnecessary single-end variable.
simonleandergrimm Dec 9, 2024
034914b
fixed faulty paired-end fastp
simonleandergrimm Dec 10, 2024
c5454b9
added end-to-end-se.yml
simonleandergrimm Dec 10, 2024
4b966d8
added subworkflow to create samplesheet
simonleandergrimm Dec 11, 2024
7a3a59b
split truncate concat into two processes.
simonleandergrimm Dec 11, 2024
6ad3ce2
removed run dev se from end to end yml.
simonleandergrimm Dec 11, 2024
c096c48
fixed samplesheet typo.
simonleandergrimm Dec 16, 2024
10dbc48
Put additional things into loadsamplesheet.
simonleandergrimm Dec 16, 2024
4384fc4
added params. info
simonleandergrimm Dec 16, 2024
92c9312
Added new logic for handling start_time_str variable.
simonleandergrimm Dec 18, 2024
2c356f8
Update .gitignore
simonleandergrimm Dec 18, 2024
69386c7
Update end-to-end.yml
simonleandergrimm Dec 18, 2024
a96cc36
Update .gitignore
simonleandergrimm Dec 18, 2024
73d1e70
Updated index
simonleandergrimm Dec 18, 2024
e2af24d
Merge branch 'single-read-raw-clean' of https://github.com/naobservat…
simonleandergrimm Dec 18, 2024
65a7b76
Edited CHANGELOG.md to take into account changes made.
simonleandergrimm Dec 18, 2024
be30318
Amended CHANGELOG.md with changes suggested by Will
simonleandergrimm Dec 19, 2024
32 changes: 32 additions & 0 deletions .github/workflows/end-to-end-se.yml
@@ -0,0 +1,32 @@
name: End-to-end MGS workflow test for single-end run

on: [pull_request]

jobs:
  test-run-dev-se:
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - name: Checkout
        uses: actions/checkout@v4


      - name: Set up JDK 11
        uses: actions/setup-java@v4
        with:
          java-version: '11'
          distribution: 'adopt'

      - name: Setup Nextflow latest (stable)
        uses: nf-core/setup-nextflow@v1
        with:
          version: "latest"

      - name: Install nf-test
        run: |
          wget -qO- https://get.nf-test.com | bash
          sudo mv nf-test /usr/local/bin/

      - name: Run run_dev_se workflow
        run: nf-test test --tag run_dev_se --verbose
1 change: 1 addition & 0 deletions .github/workflows/end-to-end.yml
@@ -37,6 +37,7 @@ jobs:
      - name: Checkout
        uses: actions/checkout@v4


      - name: Set up JDK 11
        uses: actions/setup-java@v4
        with:
2 changes: 1 addition & 1 deletion .gitignore
Contributor

@simonleandergrimm can you sync up with @harmonbhasin re naming here? I think he's going to rename the test directory anyway due to conflict with nf-test.

FWIW I'd prefer something like test/single/... and test/paired/... to keep the main directory clean.

Contributor

Also what are these?

analysis_files/*
mgs-results/

Collaborator Author

@harmonbhasin What are your thoughts regarding having a test dataset for paired-end and single-end data? Could you rejig your test dataset by e.g., simply keeping the forward reads?
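
A minimal sketch of what "simply keeping the forward reads" could look like at the samplesheet level. It assumes the paired-end layout `sample,fastq_1,fastq_2` and the single-end layout `sample,fastq` that `bin/generate_samplesheet.sh` writes in this PR. The function name `paired_to_single_samplesheet` is hypothetical, and any extra columns (for example a group column) are dropped.

```shell
#!/bin/bash
# Hypothetical helper, not part of the pipeline: convert a paired-end
# samplesheet on stdin into a single-end samplesheet on stdout by keeping
# only the sample name and the forward-read path.
paired_to_single_samplesheet() {
    awk -F',' '
        NR == 1 { print "sample,fastq"; next }   # rewrite the header row
        { print $1 "," $2 }                      # keep sample and fastq_1 only
    '
}

# Usage: reads the paired-end sheet from stdin, writes the single-end sheet.
printf 'sample,fastq_1,fastq_2\nA,A_1.fastq.gz,A_2.fastq.gz\n' | paired_to_single_samplesheet
```

This only rewrites the samplesheet; whether the forward reads alone make a representative single-end test set is a separate question.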

@@ -9,4 +9,4 @@ test/.nextflow*
pipeline_report.txt

.nf-test/
-.nf-test.log
+.nf-test.log
91 changes: 74 additions & 17 deletions bin/generate_samplesheet.sh
Contributor

@harmonbhasin to review changes to this file

Collaborator Author

@harmonbhasin ping on this.

simonleandergrimm marked this conversation as resolved.
Collaborator

I'm not sure if you're interested in this, but if you want to turn this script into python, I wouldn't be mad lol

Collaborator Author

👀

@@ -1,5 +1,6 @@
#!/bin/bash


set -u
set -e

@@ -10,10 +11,28 @@ dir_path=""
forward_suffix=""
reverse_suffix=""
s3=0
single_end=0
output_path="samplesheet.csv" # Default output path
group_file="" # Optional parameter for the group file
group_across_illumina_lanes=false

# Function to print usage
print_usage() {
    echo "Usage:"
    echo "For paired-end reads:"
    echo " $0 --dir_path <path> --forward_suffix <suffix> --reverse_suffix <suffix> [--s3] [--output_path <path>]"
    echo "For single-end reads:"
    echo " $0 --dir_path <path> --single_end [--s3] [--output_path <path>]"
    echo
    echo "Options:"
    echo " --dir_path Directory containing FASTQ files"
    echo " --forward_suffix Suffix for forward reads (required for paired-end only)"
    echo " --reverse_suffix Suffix for reverse reads (required for paired-end only)"
    echo " --single_end Flag for single-end data"
    echo " --s3 Flag for S3 bucket access"
    echo " --output_path Output path for samplesheet (default: samplesheet.csv)"
}

# Parse command-line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
@@ -33,10 +52,18 @@ while [[ $# -gt 0 ]]; do
            s3=1
            shift
            ;;
        --single_end)
            single_end=1
            shift
            ;;
        --output_path)
            output_path="$2"
            shift 2
            ;;
        --help)
            print_usage
            exit 0
            ;;
        --group_file) # Optional group file
            group_file="$2"
            shift 2
@@ -47,20 +74,22 @@ while [[ $# -gt 0 ]]; do
            ;;
        *)
            echo "Unknown option: $1"
            print_usage
            exit 1
            ;;
    esac
done

# Check if all required parameters are provided
-if [[ -z "$dir_path" || -z "$forward_suffix" || -z "$reverse_suffix" ]]; then
-    echo "Error: dir_path, forward_suffix, and reverse_suffix are required."
+if [[ -z "$dir_path" || -z "$single_end" ]]; then
+    echo "Error: dir_path and single_end are required."
     echo -e "\nUsage: $0 [options]"
     echo -e "\nRequired arguments:"
     echo -e " --dir_path <path> Directory containing FASTQ files"
-    echo -e " --forward_suffix <suffix> Suffix identifying forward reads, supports regex (e.g., '_R1_001' or '_1')"
-    echo -e " --reverse_suffix <suffix> Suffix identifying reverse reads, supports regex (e.g., '_R2_001' or '_2')"
+    echo -e " --single_end Flag for single-end data"
     echo -e "\nOptional arguments:"
+    echo -e " --forward_suffix <suffix> When single_end is 0, suffix identifying forward reads, supports regex (e.g., '_R1_001' or '_1')"
+    echo -e " --reverse_suffix <suffix> When single_end is 0, suffix identifying reverse reads, supports regex (e.g., '_R2_001' or '_2')"
     echo -e " --s3 Use if files are stored in S3 bucket"
     echo -e " --output_path <path> Output path for samplesheet [default: samplesheet.csv]"
     echo -e " --group_file <path> Path to group file for sample grouping [header column must have the names 'sample,group' in that order; additional columns may be included, however they will be ignored by the script]"
@@ -74,15 +103,28 @@ if $group_across_illumina_lanes && [[ -n "$group_file" ]]; then
exit 1
fi

if [ $single_end -eq 0 ]; then
    # Paired-end validation
    if [[ -z "$forward_suffix" || -z "$reverse_suffix" ]]; then
        echo "Error: forward_suffix and reverse_suffix are required for paired-end reads."
        print_usage
        exit 1
    fi
fi
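
Because `single_end` is initialised to `0` and only ever set to `1`, a `-z "$single_end"` test cannot fire; the numeric comparison above is what actually separates the two modes. A minimal sketch of a combined check under that reading, where `validate_args` is a hypothetical helper that is not part of the script:

```shell
#!/bin/bash
# Hypothetical helper, not part of generate_samplesheet.sh.
# Arguments: dir_path, single_end (0 or 1), forward_suffix, reverse_suffix.
# Returns 0 when the combination is usable, 1 otherwise.
validate_args() {
    local dir_path="${1:-}"
    local single_end="${2:-0}"
    local forward_suffix="${3:-}"
    local reverse_suffix="${4:-}"

    # dir_path is needed in both modes.
    if [ -z "$dir_path" ]; then
        echo "Error: dir_path is required." >&2
        return 1
    fi

    # Suffixes only matter when the data is paired-end (single_end = 0).
    if [ "$single_end" -eq 0 ]; then
        if [ -z "$forward_suffix" ] || [ -z "$reverse_suffix" ]; then
            echo "Error: forward_suffix and reverse_suffix are required for paired-end reads." >&2
            return 1
        fi
    fi
    return 0
}

# Usage: single-end needs only a directory; paired-end also needs both suffixes.
validate_args "data/" 1 "" "" && echo "single-end arguments OK"
validate_args "data/" 0 "_1" "_2" && echo "paired-end arguments OK"
```

Treating `single_end` as a number rather than a string keeps the flag semantics in one place; the error text for paired-end mode is copied from the diff.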

# Display the parameters
echo "Parameters:"
echo "dir_path: $dir_path"
-echo "forward_suffix: $forward_suffix"
-echo "reverse_suffix: $reverse_suffix"
+echo "single_end: $single_end"
 echo "s3: $s3"
 echo "output_path: $output_path"
 echo "group_file: $group_file"
 echo "group_across_illumina_lanes: $group_across_illumina_lanes"
+if [ $single_end -eq 0 ]; then
+    echo "forward_suffix: $forward_suffix"
+    echo "reverse_suffix: $reverse_suffix"
+fi



#### EXAMPLES ####
@@ -109,30 +151,45 @@ echo "group_across_illumina_lanes: $group_across_illumina_lanes"
# Create a temporary file for the initial samplesheet
temp_samplesheet=$(mktemp)

-echo "sample,fastq_1,fastq_2" > "$temp_samplesheet"
+# Create header based on single_end flag
+if [ $single_end -eq 0 ]; then
+    echo "sample,fastq_1,fastq_2" > "$temp_samplesheet"
+else
+    echo "sample,fastq" > "$temp_samplesheet"
+fi
echo "group_file: $group_file"


# Ensure dir_path ends with a '/'
if [[ "$dir_path" != */ ]]; then
dir_path="${dir_path}/"
fi

listing=0

# Get file listing based on s3 flag
if [ $s3 -eq 1 ]; then
    listing=$(aws s3 ls ${dir_path} | awk '{print $4}')
else
    listing=$(ls ${dir_path} | awk '{print $1}')
fi

-echo "$listing" | grep "${forward_suffix}\.fastq\.gz$" | while read -r forward_read; do
-    sample=$(echo "$forward_read" | sed -E "s/${forward_suffix}\.fastq\.gz$//")
-    reverse_read=$(echo "$listing" | grep "${sample}${reverse_suffix}\.fastq\.gz$")
-    # If sample + reverse_suffix exists in s3_listing, then add to samplesheet
-    if [ -n "$reverse_read" ]; then
-        echo "$sample,${dir_path}${forward_read},${dir_path}${reverse_read}" >> "$temp_samplesheet"
-    fi
-done
+# Process files based on single_end flag
+if [ $single_end -eq 0 ]; then
+    # Paired-end processing
+    echo "$listing" | grep "${forward_suffix}\.fastq\.gz$" | while read -r forward_read; do
+        sample=$(echo "$forward_read" | sed -E "s/${forward_suffix}\.fastq\.gz$//")
+        reverse_read=$(echo "$listing" | grep "${sample}${reverse_suffix}\.fastq\.gz$")
+        # If sample + reverse_suffix exists in s3_listing, then add to samplesheet
+        if [ -n "$reverse_read" ]; then
+            echo "$sample,${dir_path}${forward_read},${dir_path}${reverse_read}" >> "$temp_samplesheet"
+        fi
+    done
+else
+    # Single-end processing - just process all fastq.gz files
+    echo "$listing" | grep "\.fastq\.gz$" | while read -r read_file; do
+        sample=$(echo "$read_file" | sed -E "s/\.fastq\.gz$//")
+        echo "$sample,${dir_path}${read_file}" >> "$temp_samplesheet"
+    done
+fi
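
The two branches above can be exercised without touching S3 or the filesystem by feeding a listing on stdin. The sketch below is a simplified re-statement of that logic, not the script itself: `listing_to_samplesheet` and its argument order are hypothetical, and the reverse-read lookup is anchored with `^`, which the original `grep` is not.

```shell
#!/bin/bash
# Hypothetical helper, not part of generate_samplesheet.sh.
# Reads a file listing (one name per line) on stdin and prints a samplesheet.
# Arguments: single_end (0 or 1), dir_path, forward_suffix, reverse_suffix.
listing_to_samplesheet() {
    local single_end="$1"
    local dir_path="$2"
    local forward_suffix="${3:-}"
    local reverse_suffix="${4:-}"
    local listing file sample reverse_read
    listing=$(cat)

    if [ "$single_end" -eq 0 ]; then
        echo "sample,fastq_1,fastq_2"
        # Forward reads define the samples; a row is written only if a mate exists.
        echo "$listing" | grep -E "${forward_suffix}\.fastq\.gz$" | while read -r file; do
            sample=$(echo "$file" | sed -E "s/${forward_suffix}\.fastq\.gz$//")
            reverse_read=$(echo "$listing" | grep -E "^${sample}${reverse_suffix}\.fastq\.gz$" || true)
            if [ -n "$reverse_read" ]; then
                echo "$sample,${dir_path}${file},${dir_path}${reverse_read}"
            fi
        done
    else
        echo "sample,fastq"
        # Every .fastq.gz file in the listing becomes its own sample.
        echo "$listing" | grep -E "\.fastq\.gz$" | while read -r file; do
            sample=$(echo "$file" | sed -E "s/\.fastq\.gz$//")
            echo "$sample,${dir_path}${file}"
        done
    fi
}

# Usage: the same listing interpreted as paired-end, then as single-end.
printf 'A_1.fastq.gz\nA_2.fastq.gz\nB_1.fastq.gz\nnotes.txt\n' | listing_to_samplesheet 0 "data/" "_1" "_2"
printf 'A_1.fastq.gz\nA_2.fastq.gz\nB_1.fastq.gz\nnotes.txt\n' | listing_to_samplesheet 1 "data/"
```

In paired-end mode `B_1.fastq.gz` is skipped because it has no mate. In single-end mode every `.fastq.gz` file is listed, so pointing `--single_end` at a directory of paired files yields one sample per file.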

# Check if group file is provided
if [[ -n "$group_file" ]]; then
6 changes: 6 additions & 0 deletions configs/read_type.config
willbradshaw marked this conversation as resolved.
@@ -0,0 +1,6 @@
// Universal flags for read type (single-end vs paired-end)

params {
    // Whether the underlying data is paired-end or single-end
    single_end = new File(params.sample_sheet).text.readLines()[0].contains('fastq_2') ? false : true
}
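
The config above infers the read type from the samplesheet header alone: a first line containing `fastq_2` means paired-end, anything else means single-end. The same rule can be checked from the command line before launching a run. A minimal sketch, where `detect_read_type` is a hypothetical helper rather than anything in the repository:

```shell
#!/bin/bash
# Hypothetical helper, not part of the pipeline: classify a samplesheet as
# "paired" or "single" using only its header line, mirroring the rule in
# configs/read_type.config.
detect_read_type() {
    local sample_sheet="$1"
    local header=""
    # Read just the first line; "|| true" tolerates a missing trailing newline.
    IFS= read -r header < "$sample_sheet" || true
    case "$header" in
        *fastq_2*) echo "paired" ;;
        *)         echo "single" ;;
    esac
}

# Usage: build two small samplesheets and classify them.
tmp_dir=$(mktemp -d)
printf 'sample,fastq_1,fastq_2\nA,A_1.fastq.gz,A_2.fastq.gz\n' > "$tmp_dir/paired.csv"
printf 'sample,fastq\nA,A.fastq.gz\n' > "$tmp_dir/single.csv"
detect_read_type "$tmp_dir/paired.csv"
detect_read_type "$tmp_dir/single.csv"
rm -r "$tmp_dir"
```

One consequence of this rule is that a samplesheet with a misspelled or missing `fastq_2` header is treated as single-end rather than rejected.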
1 change: 1 addition & 0 deletions configs/run.config
simonleandergrimm marked this conversation as resolved.
@@ -31,4 +31,5 @@ includeConfig "${projectDir}/configs/containers.config"
includeConfig "${projectDir}/configs/resources.config"
includeConfig "${projectDir}/configs/profiles.config"
includeConfig "${projectDir}/configs/output.config"
includeConfig "${projectDir}/configs/read_type.config"
process.queue = "will-batch-queue" // AWS Batch job queue
34 changes: 34 additions & 0 deletions configs/run_dev_se.config
@@ -0,0 +1,34 @@
/************************************************
| CONFIGURATION FILE FOR NAO VIRAL MGS WORKFLOW |
************************************************/

params {
    mode = "run_dev_se"

    // Directories
    base_dir = "s3://nao-mgs-simon/test_single_read" // Parent for working and output directories (can be S3)
    ref_dir = "s3://nao-mgs-wb/index-20241113/output" // Reference/index directory (generated by index workflow)

    // Files
    sample_sheet = "${launchDir}/samplesheet.csv" // Path to library TSV
    adapters = "${projectDir}/ref/adapters.fasta" // Path to adapter file for adapter trimming

    // Numerical
    grouping = false // Whether to group samples by 'group' column in samplesheet
    n_reads_trunc = 0 // Number of reads per sample to run through pipeline (0 = all reads)
    n_reads_profile = 1000000 // Number of reads per sample to run through taxonomic profiling
    bt2_score_threshold = 20 // Normalized score threshold for HV calling (typically 15 or 20)
    blast_hv_fraction = 0 // Fraction of putative HV reads to BLAST vs nt (0 = don't run BLAST)
    kraken_memory = "128 GB" // Memory needed to safely load Kraken DB
    quality_encoding = "phred33" // FASTQ quality encoding (probably phred33, maybe phred64)
    fuzzy_match_alignment_duplicates = 0 // Fuzzy matching the start coordinate of reads for identification of duplicates through alignment (0 = exact matching; options are 0, 1, or 2)
    host_taxon = "vertebrate"
}

includeConfig "${projectDir}/configs/logging.config"
includeConfig "${projectDir}/configs/containers.config"
includeConfig "${projectDir}/configs/resources.config"
includeConfig "${projectDir}/configs/profiles.config"
includeConfig "${projectDir}/configs/output.config"
includeConfig "${projectDir}/configs/read_type.config"
process.queue = "simon-batch-queue" // AWS Batch job queue
3 changes: 3 additions & 0 deletions main.nf
@@ -1,6 +1,7 @@
include { RUN } from "./workflows/run"
include { RUN_VALIDATION } from "./workflows/run_validation"
include { INDEX } from "./workflows/index"
include { RUN_DEV_SE } from "./workflows/run_dev_se"

workflow {
    if (params.mode == "index") {
@@ -9,6 +10,8 @@ workflow {
        RUN()
    } else if (params.mode == "run_validation") {
        RUN_VALIDATION()
    } else if (params.mode == "run_dev_se") {
        RUN_DEV_SE()
    }
}

38 changes: 37 additions & 1 deletion modules/local/fastp/main.nf
simonleandergrimm marked this conversation as resolved.
@@ -1,4 +1,4 @@
-process FASTP {
+process FASTP_PAIRED {
     label "max"
     label "fastp"
     input:
@@ -32,6 +32,41 @@ process FASTP {
'''
}

process FASTP_SINGLE {
    label "max"
    label "fastp"
    input:
        // reads holds the single-end FASTQ for this sample (no reverse-read mate)
        tuple val(sample), path(reads)
        path(adapters)
    output:
        tuple val(sample), path("${sample}_fastp.fastq.gz"), emit: reads
        tuple val(sample), path("${sample}_fastp_failed.fastq.gz"), emit: failed
        tuple val(sample), path("${sample}_fastp.{json,html}"), emit: log
    shell:
        /* Cleaning not done in CUTADAPT or TRIMMOMATIC:
         * Higher quality threshold for sliding window trimming;
         * Removing poly-X tails;
         * Automatic adapter detection;
         * Base correction in overlapping paired-end reads;
         * Filter low complexity reads.
         */
        '''
        # Define paths and subcommands
        of=!{sample}_fastp_failed.fastq.gz
        oj=!{sample}_fastp.json
        oh=!{sample}_fastp.html
        ad=!{adapters}
        o=!{sample}_fastp.fastq.gz
        io="--in1 !{reads[0]} --out1 ${o} --failed_out ${of} --html ${oh} --json ${oj} --adapter_fasta ${ad}"
        par="--cut_front --cut_tail --correction --detect_adapter_for_pe --trim_poly_x --cut_mean_quality 20 --average_qual 20 --qualified_quality_phred 20 --verbose --dont_eval_duplication --thread !{task.cpus} --low_complexity_filter"
        # Execute
        fastp ${io} ${par}
        '''
}
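
The `par` string in `FASTP_SINGLE` carries over `--correction` and `--detect_adapter_for_pe` from the paired-end process. The fastp documentation describes both as paired-end options, so they are probably inert here and could be dropped for clarity. A sketch of the single-end argument string without them, assuming the same output naming as the process above; `build_fastp_single_args` is a hypothetical helper and nothing in the PR defines it:

```shell
#!/bin/bash
# Hypothetical helper, not part of the pipeline: print the fastp argument
# string for one single-end sample, omitting the paired-end-only options.
# Arguments: sample name, reads file, adapter FASTA, thread count.
build_fastp_single_args() {
    local sample="$1"
    local reads="$2"
    local adapters="$3"
    local threads="$4"

    # Input/output arguments, using the file names from FASTP_SINGLE.
    local io="--in1 ${reads} --out1 ${sample}_fastp.fastq.gz --failed_out ${sample}_fastp_failed.fastq.gz --html ${sample}_fastp.html --json ${sample}_fastp.json --adapter_fasta ${adapters}"

    # Trimming and filtering arguments; --correction and
    # --detect_adapter_for_pe are left out on purpose.
    local par="--cut_front --cut_tail --trim_poly_x --cut_mean_quality 20 --average_qual 20 --qualified_quality_phred 20 --verbose --dont_eval_duplication --thread ${threads} --low_complexity_filter"

    echo "${io} ${par}"
}

# Usage: print the arguments that would be passed to fastp.
build_fastp_single_args "S1" "S1.fastq.gz" "adapters.fasta" 4
```

The helper only builds the string, so it can be inspected or diffed against the process definition without fastp installed.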



// Run FASTP for adapter trimming but don't trim for quality
process FASTP_NOTRIM {
label "max"
@@ -66,3 +101,4 @@ process FASTP_NOTRIM {
fastp ${io} ${par}
'''
}

5 changes: 3 additions & 2 deletions modules/local/summarizeMultiqcPair/main.nf
simonleandergrimm marked this conversation as resolved.
@@ -4,10 +4,11 @@ process SUMMARIZE_MULTIQC_PAIR {
     label "single"
     input:
         tuple val(stage), val(sample), path(multiqc_data)
+        val(single_end)
     output:
         tuple path("${stage}_${sample}_qc_basic_stats.tsv.gz"), path("${stage}_${sample}_qc_adapter_stats.tsv.gz"), path("${stage}_${sample}_qc_quality_base_stats.tsv.gz"), path("${stage}_${sample}_qc_quality_sequence_stats.tsv.gz")
     shell:
         '''
-        summarize-multiqc-pair.R -i !{multiqc_data} -s !{stage} -S !{sample} -o ${PWD}
+        summarize-multiqc-pair.R -i !{multiqc_data} -s !{stage} -S !{sample} -r !{single_end} -o ${PWD}
         '''
-}
+}