Modular Architecture

The YesWorkflow software is envisioned as a set of modules that can be used together or independently. The primary goal of this modularity is to let YW users and developers independently implement alternatives to any module, as needed, to solve problems particular to their research domain. These alternative implementations and extensions can be developed in any programming language.

One way we plan to facilitate such easy replacement of YW modules is to require that each standard module be able, as an option, to read its inputs from and write its outputs to files with well-defined formats. Any program that produces or consumes these file formats can then function as an alternative to one or more standard YW modules and can provide identical, overlapping, or completely different capabilities.
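As a minimal sketch of this idea, the following Python fragment shows a stand-in module step that takes its input either in memory or from a file and writes its output to a file only when asked. The JSON encoding, the helper name `run_module`, and the record fields are assumptions made for illustration; they are not the file formats or interfaces defined by YesWorkflow.

```python
import json
import os
import tempfile


def run_module(transform, data=None, in_path=None, out_path=None):
    """Run one module step, touching files only when a path is given.

    Hypothetical helper: JSON is an assumed stand-in for a well-defined format.
    """
    if in_path is not None:
        with open(in_path, encoding="utf-8") as f:
            data = json.load(f)
    result = transform(data)
    if out_path is not None:
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(result, f, indent=2)
    return result


def count_comments(records):
    """Toy alternative implementation of a downstream module."""
    return {"comment_count": len(records)}


def demo():
    """Chain two steps through an intermediate file and return the final result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "comments.json")
        # The first step writes its output to a file.
        run_module(lambda _: [{"tag": "@begin", "value": "main"}], out_path=path)
        # The second step, possibly written by someone else, consumes that file.
        return run_module(count_comments, in_path=path)
```

Because the second step depends only on the contents of the intermediate file, it could equally be a program written in another language.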

The Java prototype of YesWorkflow partially implements the functions of the YW-Extract, YW-Model, and YW-Graph modules described in the section below. The completed Java implementation will fully support these modules and will also include complete YW-Query and YW-Validate modules.

The purpose of the YW-CLI module is to make it easy to invoke the other YW modules from the command line. YW-CLI will enable a user to execute sequences of the standard, Java-based YW modules, starting from an input file with a format appropriate to the first module in the executed sequence.

YesWorkflow Modules

YW-Extract

The purpose of this module is to identify YesWorkflow comments in a script defined in one or more source files. It produces a programming-language-independent representation of the script and the YW comments in it. This representation can be stored either in memory or in a file. Actual scripts that need to be analyzed may span multiple files (via include directives, etc.), so a useful first step is to create a single file that contains the extracted YesWorkflow comments and records their locations in the various input source files. YW-Extract is the only module that needs to handle differences in how comments are indicated in various programming languages.
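The extraction step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the YW-Extract implementation: the prefix table, the record fields, and the rule of one YW tag per line comment are all hypothetical, and the sketch does not guard against comment characters that appear inside string literals.

```python
import re

# Assumed mapping from language name to its line-comment prefix.
COMMENT_PREFIXES = {"python": "#", "r": "#", "matlab": "%", "java": "//"}

# Assumption: a YW comment is "@keyword value" inside a line comment.
YW_TAG = re.compile(r"@(begin|end|in|out)\b\s*(.*)")


def extract_comments(sources, language):
    """Return language-independent records for the YW comments in `sources`.

    `sources` maps a file name to that file's text.
    """
    prefix = COMMENT_PREFIXES[language]
    records = []
    for filename, text in sources.items():
        for line_number, line in enumerate(text.splitlines(), start=1):
            # Only this step knows how the language marks a comment.
            _, found, comment = line.partition(prefix)
            if not found:
                continue
            match = YW_TAG.search(comment)
            if match:
                records.append({
                    "file": filename,
                    "line": line_number,
                    "tag": "@" + match.group(1),
                    "value": match.group(2).strip(),
                })
    return records
```

Each record keeps the originating file and line number, so later steps can work from the records alone while still being able to point back to the source.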

YW-Model

The YW-Model module takes as input a set of YW comments extracted from one or more source files. It interprets these comments and builds a model of the script from which the comments were extracted. This model represents the script in terms of entities analogous to the components of a scientific workflow. These entities are referred to as programs, ports, channels, and workflows.

| Entity | Meaning of the entity and related YW comments |
| --- | --- |
| Program | A program represents a computational step in the analyzed script that receives input data and produces intermediate or final data products. A program is designated in a script by bracketing the relevant code between a pair of `@begin` and `@end` comments. |
| Workflow | Programs can be nested within other programs. A program that contains other programs is considered a workflow. |
| Port | A port represents a way in which data flows into or out of a program (or workflow). Ports are identified by `@in` and `@out` comments in the analyzed source code. |
| Channel | A channel is a connection between an `@in` port and an `@out` port (typically on different programs). YW-Model infers channels by matching the names (or aliases) of `@in` and `@out` ports within the same workflow. |
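To make these entities concrete, here is a small sketch of how a model might be assembled from extracted comments. The dictionary layout and function names are hypothetical, the input is assumed to be a list of `(tag, value)` pairs in source order with balanced `@begin` and `@end` comments, and aliases are ignored for brevity.

```python
def build_model(comments):
    """Build nested programs from (tag, value) pairs given in source order."""
    root = {"name": None, "ins": [], "outs": [], "children": []}
    stack = [root]
    for tag, value in comments:
        if tag == "@begin":
            program = {"name": value, "ins": [], "outs": [], "children": []}
            stack[-1]["children"].append(program)
            stack.append(program)
        elif tag == "@end":
            stack.pop()
        elif tag == "@in":
            stack[-1]["ins"].append(value)
        elif tag == "@out":
            stack[-1]["outs"].append(value)
    return root["children"]


def is_workflow(program):
    """A program that contains other programs is considered a workflow."""
    return bool(program["children"])


def infer_channels(workflow):
    """Match @out and @in port names among programs nested in one workflow."""
    channels = []
    for producer in workflow["children"]:
        for name in producer["outs"]:
            for consumer in workflow["children"]:
                if consumer is not producer and name in consumer["ins"]:
                    channels.append((producer["name"], name, consumer["name"]))
    return channels
```

In this sketch a channel is a `(producer, data name, consumer)` triple, and only ports of sibling programs inside the same workflow are matched.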

YW-Graph

YW-Graph operates on the outputs of YW-Model to produce a dataflow graph. YW-Graph will provide three different views of the workflow model. The process-centric view represents computational steps (programs) as graph nodes (blocks) and channels as directed edges (arrows), each labeled with the matching name (or alias) of the connected `@in` and `@out` ports. The data-centric view represents data output by programs as nodes in the graph, with directed edges between them labeled with the programs that consume and produce the data at each node. Finally, the combined view represents both programs and data as nodes; edges in this graph connect alternating computational-step and data-product nodes.

Note that because YW-Graph operates on the product of YW-Model, running YW-Graph does not require access to the original script files.
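As a rough illustration of two of these views, the sketch below renders a list of channels as Graphviz DOT text. The choice of DOT as the output format, the function names, and the `(producer, data name, consumer)` channel triples are assumptions for this example; the sketch also assumes that no program shares a name with a data product.

```python
def process_view(channels):
    """Programs as box nodes, channels as edges labeled with the data name."""
    programs = sorted({p for src, _, dst in channels for p in (src, dst)})
    lines = ["digraph process_view {"]
    lines += [f'  "{p}" [shape=box];' for p in programs]
    lines += [f'  "{src}" -> "{dst}" [label="{name}"];'
              for src, name, dst in channels]
    lines.append("}")
    return "\n".join(lines)


def combined_view(channels):
    """Programs (boxes) and data (ellipses) as nodes, with alternating edges."""
    programs = sorted({p for src, _, dst in channels for p in (src, dst)})
    data = sorted({name for _, name, _ in channels})
    edges = []
    for src, name, dst in channels:
        edges.append(f'  "{src}" -> "{name}";')
        edges.append(f'  "{name}" -> "{dst}";')
    lines = ["digraph combined_view {"]
    lines += [f'  "{p}" [shape=box];' for p in programs]
    lines += [f'  "{d}" [shape=ellipse];' for d in data]
    lines += list(dict.fromkeys(edges))  # drop repeated edges, keep order
    lines.append("}")
    return "\n".join(lines)
```

Both functions take only the channel list as input, which mirrors the point that graph rendering needs the model but not the original scripts.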

YW-Query

Like YW-Graph, the YW-Query module operates on the workflow model produced by YW-Model. YW-Query additionally takes as input a query about the script the model represents. Queries can be used to probe the structure of a complex script without having to inspect and interpret graphical representations. Examples of queries YW-Query will support include:

Q1. Given the name of an output of the script, list the inputs to the script that the output depends on (directly or indirectly).

Q2. List the computational steps involved in deriving a particular output of the script, or of a named intermediate data product.

Q3. For a particular computational step, reveal where each input to the step comes from: an input to the script, a constant in the script, a value produced by a different step, etc.

Q4. For a particular computational step, reveal what other steps are nested within it (including code blocks, calls to functions, and invocations of external programs).
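Queries such as Q1 and Q2 amount to walking the dataflow backwards from a named data product. The sketch below shows one way to do that; the `steps` mapping, the function name, and the assumption that each data name is produced by at most one step are hypothetical and do not describe the YW-Query interface.

```python
def upstream(steps, target):
    """Return (script_inputs, step_names) that `target` depends on.

    `steps` maps a step name to a pair (input names, output names).
    """
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    inputs, involved, seen = set(), set(), set()
    pending = [target]
    while pending:
        data = pending.pop()
        if data in seen:
            continue
        seen.add(data)
        step = producer.get(data)
        if step is None:
            inputs.add(data)  # nothing produces it, so treat it as a script input
        else:
            involved.add(step)
            pending.extend(steps[step][0])
    return inputs, involved
```

The first element of the result answers Q1 (which script inputs an output depends on, directly or indirectly) and the second answers Q2 (which computational steps are involved in deriving it).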

YW-Validate

Any approach that depends on adding comments to a script incurs a risk that those comments do not accurately reflect the actual function or behavior of the script. Moreover, comments that originally were correct can become inaccurate when the code they are associated with is changed or refactored.

The YW-Validate module addresses these problems by comparing assertions made in YesWorkflow comments with the actual script(s). Initially this analysis may be limited to confirming that data names used in `@in` and `@out` comments actually appear in the code bracketed by the associated `@begin` and `@end` comments. Later, YW-Validate will perform deeper checks, such as confirming that continuous data dependency chains exist from each script output all the way back to script inputs (and embedded constants).
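The initial check described above could look something like the following sketch. It is a deliberately naive illustration: the function name and the default `#` comment prefix are assumptions, a name counts as "appearing" if it occurs as a whole word anywhere in the non-comment part of the block, and no real parsing of the script is attempted.

```python
import re


def validate_block(block_lines, comment_prefix="#"):
    """Return port names declared in a block that never appear in its code.

    `block_lines` are the source lines from an @begin comment through its @end.
    """
    declared, code = [], []
    for line in block_lines:
        # Split each line into its code part and its trailing comment.
        code_part, _, comment = line.partition(comment_prefix)
        code.append(code_part)
        match = re.search(r"@(in|out)\s+(\w+)", comment)
        if match:
            declared.append(match.group(2))
    code_text = "\n".join(code)
    return [name for name in declared
            if not re.search(rf"\b{re.escape(name)}\b", code_text)]
```

An empty result means every declared data name was found in the bracketed code; any returned name points to a comment that may have drifted from the code it describes.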

YW-CLI

The YW-CLI module invokes one or more of the above YW modules based on the commands and input data provided to it. For example, given the command `graph` and the path to a script source file, YW-CLI will invoke YW-Extract, YW-Model, and YW-Graph.

Inputs to YW-CLI may be provided via paths to files or via the standard input stream. Outputs can similarly be routed to files or to standard output. Consequently, YW modules can be invoked in shell pipelines with standard modules interleaved with non-standard implementations as needed.
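The dispatch and stream handling described in this section can be sketched with Python's `argparse`. Everything specific here is an assumption for illustration: the program name `yw-sketch`, the `-o/--output` option, the use of `-` for the standard streams, and the toy text-to-text module functions are hypothetical and are not the options or behavior of the actual YW-CLI.

```python
import argparse
import sys


def extract(text):
    """Toy stand-in: keep only lines that contain a YW-style tag."""
    return "\n".join(l.strip() for l in text.splitlines() if "@" in l)


def model(text):
    """Toy stand-in: summarize the extracted comments."""
    return f"model({len(text.splitlines())} comments)"


def graph(text):
    """Toy stand-in: wrap the model summary in a DOT-like shell."""
    return f"digraph {{ /* {text} */ }}"


MODULES = {"extract": extract, "model": model, "graph": graph}

# Each command runs the module sequence needed to reach its result.
PIPELINES = {
    "extract": ["extract"],
    "model": ["extract", "model"],
    "graph": ["extract", "model", "graph"],
}


def main(argv=None, stdin=None, stdout=None):
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    parser = argparse.ArgumentParser(prog="yw-sketch")
    parser.add_argument("command", choices=sorted(PIPELINES))
    parser.add_argument("source", nargs="?", default="-",
                        help="input path, or - for standard input")
    parser.add_argument("-o", "--output", default="-",
                        help="output path, or - for standard output")
    args = parser.parse_args(argv)

    if args.source == "-":
        data = stdin.read()
    else:
        with open(args.source, encoding="utf-8") as f:
            data = f.read()

    for name in PIPELINES[args.command]:
        data = MODULES[name](data)

    if args.output == "-":
        stdout.write(data + "\n")
    else:
        with open(args.output, "w", encoding="utf-8") as f:
            f.write(data + "\n")
    return 0
```

Because input and output default to the standard streams, a tool built this way can sit in a shell pipeline next to programs that implement the other steps differently.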
