
bentsherman/awesome-workflow


awesome-workflow

An awesome list of workflow systems. See the table and the analysis.

Sources:

I curated this list to document the workflow systems that exist across many different domains, while trying to adhere to a specific definition of "workflow system". The entries are stored in a tabular format so that I can add metadata and perform analyses more easily.

Guidelines

Workflow systems are included in this list based on the following guidelines:

  • A workflow system should provide a way to describe workflows. A workflow is a sequence of computations that are related by dataflow (rather than control flow).

    Most programming languages are disqualified by this rule. However, Swift (not the one by Apple) can execute tasks in parallel based on dataflow, so I consider it a workflow system.

    Systems that provide a fixed set of workflows, but not a way to describe arbitrary workflows, are also disqualified by this rule.

  • A workflow system should provide a way to execute workflows. Ideally it should be able to delegate computations to distributed computing systems (e.g. HPC, cloud) and provide some kind of error recovery and caching, but I don't require these features.

    For example, workflow languages like CWL and WDL are not workflow systems by themselves, but implementations of them are (e.g. Toil).

    Systems that "wrap" another workflow system are also excluded, for example Seqera Platform (which re-uses Nextflow) and Terra (which re-uses Cromwell).

  • As a practical matter, the workflow system should be available to use. It should either be open-source or be publicly documented enough for me to verify my other criteria.

    For example, I include paid services like Google Cloud Workflows that are sufficiently documented, but not systems that are described only in research papers or are available only through a login with no public documentation.
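To make the dataflow criterion concrete, here is a minimal toy sketch in Python. It is not taken from any of the listed systems. Each task names the tasks whose outputs it consumes, and the scheduler starts a task as soon as all of its inputs exist. Independent tasks therefore run in parallel without any explicit control flow. The sketch also shows the optional caching mentioned above in its simplest form.

The `Workflow` class and its `add`/`run` methods are hypothetical. The cache assumes that tasks are deterministic and that their input values are hashable.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Any, Callable, Dict, Tuple


class Workflow:
    """Hypothetical toy engine: tasks are linked only by the data they exchange."""

    def __init__(self) -> None:
        # name -> (function, names of the tasks whose outputs it consumes)
        self.tasks: Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]] = {}
        # (task name, input values) -> output; assumes deterministic tasks
        self.cache: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}

    def add(self, name: str, fn: Callable[..., Any], *inputs: str) -> None:
        self.tasks[name] = (fn, inputs)

    def run(self, max_workers: int = 4) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        pending = dict(self.tasks)  # tasks not yet started
        running: Dict[Future, Tuple[str, Tuple[str, Tuple[Any, ...]]]] = {}
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            while pending or running:
                # A task is ready once every input it names has been produced.
                ready = [n for n, (_, ins) in pending.items()
                         if all(i in results for i in ins)]
                if not ready and not running:
                    raise ValueError(f"cycle or missing input: {sorted(pending)}")
                cache_hit = False
                for name in ready:
                    fn, inputs = pending.pop(name)
                    args = tuple(results[i] for i in inputs)
                    key = (name, args)
                    if key in self.cache:
                        # Reuse a previous result instead of recomputing it.
                        results[name] = self.cache[key]
                        cache_hit = True
                    else:
                        running[pool.submit(fn, *args)] = (name, key)
                if running and not cache_hit:
                    # Block until at least one task finishes, then record its output.
                    done, _ = wait(running, return_when=FIRST_COMPLETED)
                    for fut in done:
                        name, key = running.pop(fut)
                        results[name] = self.cache[key] = fut.result()
        return results


if __name__ == "__main__":
    wf = Workflow()
    wf.add("a", lambda: 2)
    wf.add("b", lambda: 3)
    wf.add("total", lambda x, y: x + y, "a", "b")  # runs only after a and b
    wf.add("square", lambda t: t * t, "total")
    print(wf.run()["square"])  # 25
```

Here `a` and `b` have no dependency on each other, so they can run concurrently. The order of `total` and `square` follows purely from the data they consume. This sketch does no error recovery: an exception in any task propagates out of `run`. Real systems differ widely in how they schedule ready tasks, key their caches, and handle failures.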

Beyond those guidelines, I aim to include workflow systems from a variety of domains and with a variety of architectures. It feels like every field has reinvented this wheel in its own way, but I am fascinated by the variety of these wheels, and I hope to find novel approaches that I can use to improve my own work.

Metadata

I'm currently tracking the following metadata for each entry:

  • Language: The language(s) used by the workflow system to describe workflows (or "GUI" if a graphical interface is supported).

  • Domain: The application domain from which the workflow system originated. Note that many workflow systems are designed to be general-purpose even though they may have started out in a particular domain. Note also that categories are hard, and you could spend an eternity trying to find the best way to categorize a dataset. I have avoided this trap by using a few broad categories that more or less delineate the variations in workflow systems that interest me.

  • GitHub Stars: Number of stars for entries that are on GitHub. Note that stars might be fake, so don't take this metric too seriously. It can be fun to compare projects based on stars (or some composite metric based on issues, PRs, commits, etc.) to figure out which ones are more popular, but if you're trying to decide which one to use, you should evaluate each option based on how well it enables you to do whatever you're trying to do. Really, you should just use Nextflow.

Contributing

Feel free to submit an issue or PR if you find a workflow system that I haven't included. I'm also happy to be corrected if any of the entries don't meet my stated guidelines... I'll either remove the entry or update my guidelines 😄. There may be a few exceptions that I included because I found them interesting in some relevant way.