PM_WGS_pipeline

Kai Blumberg edited this page Nov 8, 2021 · 56 revisions

cd /xdisk/bhurwitz/mig2020/rsgrps/bhurwitz/kai/planet-microbe-functional-annotation/

modify the cluster.yml and config.yml files

add out and err directories

In run_snakemake.sh, change pm_env to snakemake and point the cd path to my version.

Copied bowtie.simg from Matt's version to my singularity dir.

The bowtie index folder is in /xdisk/bhurwitz/mig2020/rsgrps/bhurwitz/planet-microbe-functional-annotation/data. Copy it into my version of the git repo so that the whole thing is portable.

Submit the main snakemake job, which will submit other jobs. Need to make sure this isn't submitting too many.

sbatch run_snakemake.sh //submission for single-threaded version

//old sbatch submit_snakemake.sh //submission for multi-threaded version

squeue -u kblumberg

scancel job_ID_number

scancel -u kblumberg //cancel all my jobs
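
One way to make sure the main job is not flooding the queue is to count queued jobs before submitting. A minimal sketch, not part of the pipeline: the function name too_many_jobs and the limit of 100 are hypothetical, and squeue's -h flag (no header line) is the only addition to the squeue command above.

```shell
# too_many_jobs LIMIT: reads squeue output on stdin (one job per line)
# and succeeds (exit 0) when more than LIMIT jobs are listed.
# The function name and any limit passed to it are illustrative assumptions.
too_many_jobs() {
    limit=$1
    # grep -c . counts non-empty lines; it exits 1 on zero matches, hence || true
    n=$(grep -c . || true)
    [ "$n" -gt "$limit" ]
}

# Usage on the cluster (hypothetical limit of 100 jobs):
#   if squeue -h -u kblumberg | too_many_jobs 100; then
#       echo "queue is busy, not submitting" >&2
#   else
#       sbatch run_snakemake.sh
#   fi
```

The check reads from stdin rather than calling squeue itself, so it can be tried out on any machine with a few lines of fake input.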

va // shows allocation remaining on HPC

uquota // shows quota for group

du -sh kai/ // size of directory

scp [email protected]:/xdisk/bhurwitz/mig2020/rsgrps/bhurwitz/kai/planet-microbe-functional-annotation/bash/check_qc.sh .

sbatch headers

#SBATCH --account=bhurwitz
#SBATCH --partition=standard
#SBATCH --partition=windfall

Cleaning up to save space

in results/completed/

rm -r */step_05_chunk_reads/

rm -r */step_06_get_orfs/
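
The same cleanup can be wrapped in a small function that only touches the two step directories and reports what it removes. This is a sketch under the assumption that each sample has its own directory directly under results/completed/; the function name clean_intermediates is hypothetical.

```shell
# clean_intermediates DIR: remove the step_05 and step_06 directories
# from every sample directory directly under DIR.
# (Hypothetical helper; assumes the layout DIR/<sample>/<step>/.)
clean_intermediates() {
    completed_dir=$1
    for step in step_05_chunk_reads step_06_get_orfs; do
        for d in "$completed_dir"/*/"$step"; do
            # Skip the unexpanded glob when nothing matches.
            [ -d "$d" ] || continue
            echo "removing $d"
            rm -r -- "$d"
        done
    done
}

# Usage: clean_intermediates results/completed
```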

irods

iinit // standard command to get irods started; doesn't work on my head node but does in an interactive session

Command to copy files over from CyVerse to my data directory:

iget -PT /iplant/home/shared/planetmicrobe/sra/SRR4831663.fastq.gz /xdisk/bhurwitz/mig2020/rsgrps/bhurwitz/kai/planet-microbe-functional-annotation/data
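
To fetch several samples in one go, the iget call above can be looped over accession numbers. A rough sketch only: it assumes every sample sits in the same /iplant/home/shared/planetmicrobe/sra/ collection and is named ACCESSION.fastq.gz, which is true for SRR4831663 but not verified for others. The function name fetch_sra and the IGET override variable are hypothetical.

```shell
# fetch_sra DEST ACCESSION...: download each ACCESSION.fastq.gz into DEST.
# Assumes all files live in the same iRODS collection as SRR4831663.
# Setting IGET=echo (a hypothetical override) prints the arguments instead
# of downloading, which is useful as a dry run.
fetch_sra() {
    dest=$1
    shift
    for acc in "$@"; do
        ${IGET:-iget} -PT "/iplant/home/shared/planetmicrobe/sra/${acc}.fastq.gz" "$dest" || return 1
    done
}

# Usage (in an interactive session, after iinit):
#   fetch_sra /xdisk/bhurwitz/mig2020/rsgrps/bhurwitz/kai/planet-microbe-functional-annotation/data SRR4831663
```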

UA hpc

https://public.confluence.arizona.edu/display/UAHPC/HPC+Documentation

https://public.confluence.arizona.edu/display/UAHPC/Puma+Quick+Start

Redo List

SRR5002349
SRR9178501_1
SRR9178319_1
SRR9178098_1
SRR9178375_1
SRR5002378
SRR5002337
SRR9178233_1
SRR9178483_1
SRR9178089_1
SRR5002311
SRR5002344
SRR9178407_1
SRR9178147_1
SRR9178359_1
SRR9178281_1
SRR5002319
SRR9178156_1
SRR9178118_1

run failures

Run 5



SRR5002349
total 0

SRR9178501_1
total 43M

SRR9178319_1
total 40M

SRR9178098_1
total 49M

SRR9178375_1
total 56M

SRR5002378
total 37M

SRR5002337
total 67M

SRR9178233_1
total 50M

SRR9178483_1
total 50M

SRR9178089_1
total 42M

SRR5002311
total 52M

SRR5002344
total 70M

SRR9178407_1
total 56M

SRR9178147_1
total 45M

SRR9178359_1
total 45M

SRR9178281_1
total 69M

SRR5002319
total 68M

SRR9178156_1
total 85M

SRR9178118_1
total 81M
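
Samples like SRR5002349 above, with "total 0", produced nothing at all. A quick way to pick those out is to list sample directories that contain no files. This is a sketch and only approximates the "total 0" case: it assumes one directory per sample under a common results directory, and the function name list_empty_samples is hypothetical.

```shell
# list_empty_samples DIR: print the name of each sample directory under DIR
# that contains no regular files at any depth.
# (Hypothetical helper; assumes the layout DIR/<sample>/...)
list_empty_samples() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        # find prints nothing when the directory holds no regular files.
        if [ -z "$(find "$d" -type f | head -n 1)" ]; then
            basename "$d"
        fi
    done
}

# Usage: list_empty_samples results
```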

old

Debugging the snakemake pipeline steps should be:

  1. Check the slurm output file and see which rule crashed
  2. Check the error file and see if there's useful information about the crash
  3. Check the log file (found at e.g. results/SRR4831664/step_01_trimming/log) for specific information about the running of that step's executable.
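
The first and third steps can be sketched as two small helpers. This assumes Snakemake reports failures with lines of the form "Error in rule NAME:" in the slurm output file, which may vary between Snakemake versions. The function names crashed_rules and step_log_path, and the example file name, are hypothetical.

```shell
# crashed_rules FILE: print the name of each rule that Snakemake reported
# as failed in a slurm output file (assumes "Error in rule NAME:" lines).
crashed_rules() {
    sed -n 's/^Error in rule \([^:]*\):.*$/\1/p' "$1" | sort -u
}

# step_log_path SAMPLE STEP: print the per-step log location,
# following the results/<sample>/<step>/log pattern from the notes.
step_log_path() {
    echo "results/$1/$2/log"
}

# Usage (the slurm output file name here is only an example):
#   crashed_rules out/slurm-12345.out
#   less "$(step_log_path SRR4831664 step_01_trimming)"
```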

interactive

source ~/.bashrc

conda env create -f kraken2.yml

conda env create -f bracken.yml

conda env create -f pm_env.yml   // this failed; make a new pm_env.yml with snakemake

# steps to create pm_env again do this in interactive
conda create -n pm_env

conda activate pm_env

conda install -n base -c conda-forge mamba

mamba create -c conda-forge -c bioconda -n snakemake snakemake

snakemake conda env

conda install -n base -c conda-forge mamba // install mamba to install snakemake

mamba create -c conda-forge -c bioconda -n snakemake snakemake // install snakemake; this made a new conda environment called snakemake

conda install -c conda-forge biopython // added biopython

conda install -c anaconda java-1.7.0-openjdk-cos6-x86_64 // also added java. This didn't stay after logging back in (didn't get added to PATH?). Try conda install -c conda-forge openjdk

other

InterPro docs: https://interproscan-docs.readthedocs.io/en/latest/UserDocs.html

bash script:

/groups/bhurwitz/tools/interproscan-5.46-81.0/interproscan.sh -appl Pfam -i results/SRR4831664/step_05_chunk_reads/SRR4831664_trimmed_qcd_frags_2047.faa -b results/SRR4831664/step_06_get_orfs/SRR4831664_trimmed_qcd_frags_2047_interpro -goterms -iprlookup -dra -cpu 4

After installing java 11 this works, but I'm still getting the log for step 6 saying:

Java version 11 is required to run InterProScan. Detected version 1.8.0_292 Please install the correct version.

Changed the Snakemake interproscan rule to activate the snakemake conda env
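
To see which Java a given environment actually picks up, the version string can be reduced to its major number and compared against 11. A minimal sketch: the function names java_major and detected_java_version are hypothetical, and the parsing assumes the usual java -version format with the version in double quotes.

```shell
# java_major VERSION: print the major version for strings such as
# "1.8.0_292" (legacy scheme, major is the second field) or "11.0.11".
java_major() {
    v=$1
    case $v in
        1.*) v=${v#1.} ;;   # legacy "1.x" numbering: drop the leading "1."
    esac
    # Keep everything before the first ".", "_" or "-".
    echo "${v%%[._-]*}"
}

# detected_java_version: extract the quoted version from `java -version`,
# which writes its report to stderr.
detected_java_version() {
    java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage, e.g. inside the environment the interproscan rule runs in:
#   [ "$(java_major "$(detected_java_version)")" -ge 11 ] || echo "Java is too old" >&2
```

Running this in both the login shell and the environment used by the rule shows whether the two see different Java installations.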

Switch between pipeline runs up to step 4 and up to step 7

The directory job_runs/snakefile_versions holds two files: regular_Snakefile, which runs through step 7, and Snakefile_upto_step4, which runs only up to step 4.

From /xdisk/bhurwitz/mig2020/rsgrps/bhurwitz/kai/planet-microbe-functional-annotation:

cp job_runs/snakefile_versions/Snakefile_upto_step4 Snakefile // to run up to step 4

cp job_runs/snakefile_versions/regular_Snakefile Snakefile // to run the regular pipeline

When running Snakefile_upto_step4, raise the time limit to 48 hours in run_snakemake.sh; set it back to 24 hours for regular_Snakefile. Do the same in the cluster.yml file.
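
The switch can be wrapped in one function so the copy and the time-limit reminder stay together. A sketch only: the function name use_snakefile and its mode names are hypothetical, and it does not edit run_snakemake.sh or cluster.yml itself.

```shell
# use_snakefile MODE: copy the chosen Snakefile into place and print the
# time limit to set by hand. Run from the repository root.
# MODE is upto_step4 or regular (hypothetical names).
use_snakefile() {
    case $1 in
        upto_step4) src=job_runs/snakefile_versions/Snakefile_upto_step4; hours=48 ;;
        regular)    src=job_runs/snakefile_versions/regular_Snakefile;    hours=24 ;;
        *) echo "usage: use_snakefile upto_step4|regular" >&2; return 2 ;;
    esac
    cp "$src" Snakefile || return 1
    echo "Now set the time limit to $hours hours in run_snakemake.sh and cluster.yml"
}

# Usage: use_snakefile upto_step4
```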

job runs

Amazon Plume

Ran 23 samples (1 done in previous testing) for 48 hours; all but 3 finished (some were <1G). Not using the multi-threaded version.

Amazon River

Ran 24 samples for nearly 48 hours; the snakejob ended but not all samples finished. The log file said the job had timed out, and there were jobs remaining but no snakemake job. So I canceled the snakejobs and resubmitted. Not using the multi-threaded version.

HOT Chisholm

First job ran the smallest 24 samples. The first attempt had the shell parameter mistake.

tests

When testing the multi-threaded version, a single job submission sent out (at least at the time I captured it) 117 of the multi-threaded ips_0 nodes doing the interproscan step. Interproscan stopped working again because I pulled Matt's version of the Snakefile without the following shell parameters, which I added back in before re-running the job for rule run_pipeline. Might also need to add them to rule interproscan.

bash -c '
. $HOME/.bashrc
conda activate snakemake 