Optimize your data processing pipelines with doepipeline. The optimization strategy implemented in doepipeline is based on methods from statistical Design of Experiments (DoE). Use it to optimize quantitative and/or qualitative factors of simple (single tool) or complex (multiple tool) pipelines.
- Community developed: Contributions that add new functionality are welcome.
- Installation: Easy installation through conda or PyPI.
- Generic: The optimization is useful for all kinds of CLI applications.
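To give a feel for the Design of Experiments idea behind the optimization, here is a minimal sketch of a two-level full factorial design over a mix of quantitative and qualitative factors. It is plain Python and does not use doepipeline's API or configuration format. The factor names (`kmer_size`, `min_coverage`, `aligner`), their levels, and the helper `full_factorial` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical factors for an imagined pipeline (illustrative only,
# not taken from doepipeline or its example cases).
factors = {
    "kmer_size": [21, 31],             # quantitative factor: low and high level
    "min_coverage": [2, 5],            # quantitative factor: low and high level
    "aligner": ["tool_a", "tool_b"],   # qualitative factor: two categories
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


design = full_factorial(factors)

# Each entry is one planned pipeline run with a specific parameter setting.
for run_number, settings in enumerate(design, start=1):
    print(run_number, settings)
```

In a DoE-style optimization, each planned run would be executed and its result scored, and the scores would then guide the choice of the next set of settings. With three factors at two levels each, the sketch above plans 2 × 2 × 2 = 8 runs.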
Take a look at the wiki documentation to get started with doepipeline. Briefly, the following steps are needed to start using doepipeline.
Four example cases (including data and configuration files) are provided to help you get started:
- de-novo genome assembly
- scaffolding of a fragmented genome assembly
- k-mer taxonomic classification of ONT MinION reads
- genetic variant calling
Svensson D, Sjögren R, Sundell D, Sjödin A, Trygg J. doepipeline: a systematic approach for optimizing multi-level and multi-step data processing workflows. bioRxiv. doi: https://doi.org/10.1101/504050
doepipeline is implemented as a Python package. It is open source software made available under the MIT license.
If you run into difficulties with this software, have suggestions, or want to contribute directly, you have the following options:
- submit a bug report or feature request to the issue tracker
- contribute directly to the source code through the GitHub repository. Pull requests are especially welcome.