Home
Welcome to the one-workflow-many-ways wiki!
As the README says, the point of this project is to get an idea of how easy or hard it is for a beginner to implement a basic workflow in different workflow systems. With the exception of bash, I am a beginner at all of these.
I whipped up the bash script in approximately 10 minutes and then spent another 30 minutes making it nice. I consider this the 'baseline' by which the other implementations are measured.
- bash file: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/bash/bamqc.sh
- results : https://travis-ci.org/morgantaschuk/one-workflow-many-ways/jobs/349431822
WDL (run with Cromwell) was the first new workflow language I tried.
- wdl file: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/wdl/bamqc.wdl
- inputs: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/wdl/bamqc_inputs.json
- results: https://travis-ci.org/morgantaschuk/one-workflow-many-ways/jobs/349431820
Thoughts:
- The documentation is very good and all in one place. I had a practical working example very quickly
- The WDL file is almost like bash.
  - some escaping problems. basically nothing appreciates `samtools flagstat $BAM 2>&1 | perl -pe 's|(\d+ \+ \d+)\s+(.*)\R|"$2": "$1",|g' | sed 's/.$//'`
  - was finicky about colliding names (can't have a global `bamqc` variable and a `bamqc` task)
  - had it working pretty quickly
- Cromwell is a little bit verbose (but this is tunable)
- WOMtools lets you autogenerate the inputs.json file.
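
For readers who find the perl one-liner above hard to parse, here is a rough Python sketch of the same transformation: it takes each `count + count description` line of `samtools flagstat` output and turns it into a `"description": "count + count"` JSON entry. The function name `flagstat_to_json` and the sample lines are made up for illustration and are not taken from the repository.

```python
import json
import re

# Same pattern as the perl one-liner: "<passed> + <failed>", whitespace, description.
LINE_RE = re.compile(r"(\d+ \+ \d+)\s+(.*)")


def flagstat_to_json(text):
    """Map each flagstat description to its 'passed + failed' count string."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts, description = match.groups()
            stats[description] = counts
    return json.dumps(stats, indent=2)


# Illustrative input in the flagstat line format (not real results).
sample = """14 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
12 + 0 mapped (85.71% : N/A)
"""

print(flagstat_to_json(sample))
```

Unlike the perl/sed pipeline, this version lets the `json` module handle quoting and the trailing comma, which sidesteps most of the escaping trouble.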
CWL was a totally different experience.
- cwl files: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/cwl/workflow.cwl
- input file: https://github.com/morgantaschuk/one-workflow-many-ways/blob/master/cwl/workflow.yml
- results: https://travis-ci.org/morgantaschuk/one-workflow-many-ways/jobs/349431821
Thoughts:
- No pipes??? NO PIPES???
  - No you can have pipes but you need to have `requirements: class: ShellCommandRequirement` and then `{valueFrom: " | ", shellQuote: false}` (see bamqc.cwl)
  - ..... ok
- Every step in a different cwl file, joined together with a workflow cwl
- documentation is disorganized and the 'user guide' doesn't actually show you anything useful for a really long time http://www.commonwl.org/user_guide/
  - Creating workflows (with multiple steps... remember you can't pipe so each step is very small) is buried down in lesson 20
- Specifying inputs and outputs is a bit bonkers
  - this is how you name output files:

        stdout: bamqc_result.json
        outputs:
          outjson:
            type: stdout

  - yes that means the name is 'out of scope' of the actual outfile. For some reason.
- once I had the separate steps (I re-wrote flagstat2json.sh so that it could take a file as well as stdin) then creating the workflow was very simple 👍 for reuse
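
The "file as well as stdin" change is what makes a step usable both inside a pipe and as a standalone workflow step. A minimal sketch of that pattern in Python (the real script is a shell script; `read_input` is a hypothetical helper name used only here):

```python
import sys


def read_input(path=None):
    """Return the text from `path`, or from stdin when no path (or "-") is given."""
    if path is None or path == "-":
        return sys.stdin.read()
    with open(path) as handle:
        return handle.read()


# Hypothetical usage inside a script:
#   text = read_input(sys.argv[1] if len(sys.argv) > 1 else None)
# so that both `tool flagstat.txt` and `samtools flagstat ... | tool` work.
```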
Interestingly, Toil broke on the WDL file that worked on Cromwell (commit 7bac144):
task bamqc {
    String samtools
    File bamqc_pl
    File bamfile
    File bedfile
    String outjson
    String xtra_json
    command {
        eval '${samtools} view ${bamfile} | perl ${bamqc_pl} -r ${bedfile} -j "${xtra_json}" > ${outjson}'
    }
    output {
        File out = "${outjson}"
    }
}
It barfed with an overabundance of quotes:
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py", line 122
    eval ''''
            ^
SyntaxError: EOL while scanning string literal
Traceback (most recent call last):
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/bin/toil-wdl-runner", line 11, in <module>
    sys.exit(main())
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/local/lib/python2.7/site-packages/toil/wdl/toilwdl.py", line 2312, in main
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py']' returned non-zero exit status 1
In the Python script that Toil generates, `toilwdl_compiled.py`, the task turned into the following block (indentation preserved):
command9 = '''
eval ''''
command10 = samtools
command11 = ''' view '''
command12 = bamfile_fs
command13 = ''' | perl '''
command14 = bamqc_pl_fs
command15 = ''' -r '''
command16 = bedfile_fs
command17 = ''' -j "'''
command18 = xtra_json
command19 = '''" > '''
command20 = outjson
command21 = ''''
'''
So it looks like there's a bug, which I will eventually figure out where to file. In the meantime I'm going to remove the eval statement and the 'single quotes' since that seems to be the issue.
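
My reading of the failure, sketched in Python: when a command fragment that ends in a single quote is wrapped in `'''...'''`, the fragment's quote plus the closing delimiter gives four quotes in a row. Python closes the string at the first three and then sees a lone `'` that never terminates. The snippet below only reproduces that mechanism with `compile()`. It is not Toil's code, and using `repr()` is just one possible way to emit a safe literal, not necessarily how Toil would fix it.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


# A fragment ending in a single quote, like the start of the eval '...' command.
fragment = "\n    eval '"

# Naive code generation: wrap the fragment in triple single quotes.
naive_source = "command9 = '''" + fragment + "'''\n"

# Alternative: let repr() choose the quoting and escaping.
safe_source = "command9 = " + repr(fragment) + "\n"

print(compiles(naive_source))  # False: four quotes in a row break the literal
print(compiles(safe_source))   # True

# The safe literal round-trips to the original fragment.
namespace = {}
exec(safe_source, namespace)
assert namespace["command9"] == fragment
```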
Edit: Looks like Toil also doesn't like constants in WDL files. I set the output filename to "flagstat.json" in the flagstat task and it complains about it too.
Traceback (most recent call last):
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py", line 203, in <module>
    job1 = Job.wrapJobFn(flagstat, samtools=SAMTOOLS, flagstat_to_json=flagstat_to_json, bamfile=BAMFILE, outfile=outfile)
NameError: name 'outfile' is not defined
Traceback (most recent call last):
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/bin/toil-wdl-runner", line 11, in <module>
    sys.exit(main())
  File "/media/mtaschuk/Data/git/one-workflow-many-ways/venv/local/lib/python2.7/site-packages/toil/wdl/toilwdl.py", line 2312, in main
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '/media/mtaschuk/Data/git/one-workflow-many-ways/toilwdl_compiled.py']' returned non-zero exit status 1
I also note that I needed to clean up my local working directory before I could try again: `toil.jobStores.abstractJobStore.JobStoreExistsException: The job store '/media/mtaschuk/Data/git/one-workflow-many-ways/toilWorkflowRun' already exists. Use --restart to resume the workflow, or remove the job store with 'toil clean' to start the workflow from scratch`. Which, cool.
Giving up on Toil WDL for the moment because it requires too many changes to the WDL I made for Cromwell.