Modular Architecture
The YesWorkflow software is envisioned as a set of modules that can be used together or independently. The primary goal of this modularity is to enable YW users and developers to implement alternatives to any module independently, as needed, to solve problems particular to their research domains. It will be possible to write these alternative implementations and extensions in any programming language.
One way we plan to make YW modules easy to replace is to require that each standard module be able, optionally, to read and write files in well-defined formats representing the expected inputs or outputs of that module. Any program that produces or consumes these file formats can then serve as an alternative to one or more standard YW modules, providing identical, overlapping, or completely different capabilities.
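The value of a well-defined file format is that any language can read and write it. The sketch below shows this with one hypothetical layout for extracted comments: one comment per line, as four tab-separated fields (source file, line number, tag, value). This layout is an assumption made for illustration, not the format YesWorkflow actually uses. The code uses Java records, so it needs Java 16 or later.

```java
// Hypothetical interchange format, NOT YesWorkflow's actual format:
// one YW comment per line, as tab-separated fields
// sourceFile, lineNumber, tag, value.
class CommentRecordFormat {

    record CommentRecord(String sourceFile, int lineNumber, String tag, String value) {}

    // Serialize one record as a single tab-separated line.
    static String toLine(CommentRecord r) {
        return String.join("\t", r.sourceFile(), Integer.toString(r.lineNumber()), r.tag(), r.value());
    }

    // Parse one tab-separated line back into a record.
    static CommentRecord fromLine(String line) {
        String[] f = line.split("\t", 4);
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 tab-separated fields: " + line);
        }
        return new CommentRecord(f[0], Integer.parseInt(f[1]), f[2], f[3]);
    }

    public static void main(String[] args) {
        CommentRecord r = new CommentRecord("simulate.py", 12, "@in", "temperature");
        String line = toLine(r);
        System.out.println(line);
        // A consumer written in any language only needs to split on tabs.
        System.out.println(fromLine(line).equals(r)); // true
    }
}
```

A line-oriented text format was chosen here because it is easy to produce and consume from shell tools and from scripts in other languages.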
The Java prototype of YesWorkflow partially implements the functions of the YW-Extract, YW-Model, and YW-Graph modules described below. The completed Java implementation will fully support these modules and will also include complete YW-Query and YW-Validate modules.
The purpose of the YW-CLI module is to make it easy to invoke the other YW modules from the command line. YW-CLI will enable a user to execute sequences of the standard, Java-based YW modules, starting from an input file with a format appropriate to the first module in the executed sequence.
The purpose of the YW-Extract module is to identify YesWorkflow comments in a script defined in one or more source files. It produces a programming-language-independent representation of the script and the YW comments in it, which can be held in memory or stored in a file. Because the scripts to be analyzed may span multiple files (via include directives, etc.), a useful first step is to parse the YesWorkflow comments from all of these files into a single file that also records each comment's location in the original source files. YW-Extract is the only module that needs to handle differences in how comments are indicated in various programming languages.
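The sketch below shows the kind of work YW-Extract does: it scans source lines for comments that carry a YW tag and records the file and line number of each one. It is a minimal illustration under several assumptions:

- Only whole-line, single-line comments are handled.
- The prefix table covers only a few languages and is illustrative.
- The class and method names are hypothetical.

It uses Java records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YwExtractSketch {

    record YwComment(String sourceFile, int lineNumber, String tag, String value) {}

    // Illustrative single-line comment prefixes per file extension. This map is the
    // only language-specific part, mirroring YW-Extract's role described above.
    private static final Map<String, String> COMMENT_PREFIX = Map.of(
            "py", "#", "R", "#", "sh", "#", "java", "//", "c", "//", "m", "%");

    // A YW tag such as @begin or @in, followed by its argument text.
    private static final Pattern TAG = Pattern.compile("(@[A-Za-z]+)\\s*(.*)");

    static List<YwComment> extract(String sourceFile, List<String> lines) {
        String ext = sourceFile.substring(sourceFile.lastIndexOf('.') + 1);
        String prefix = COMMENT_PREFIX.get(ext);
        if (prefix == null) {
            throw new IllegalArgumentException("unsupported file type: " + sourceFile);
        }
        List<YwComment> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String trimmed = lines.get(i).trim();
            if (!trimmed.startsWith(prefix)) continue; // only whole-line comments
            Matcher m = TAG.matcher(trimmed.substring(prefix.length()).trim());
            if (m.matches()) {
                // Line numbers are 1-based, as in editors and compiler messages.
                result.add(new YwComment(sourceFile, i + 1, m.group(1), m.group(2).trim()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> script = List.of(
                "# @begin simulate",
                "# @in temperature",
                "t = read_temperature()",
                "# @out result",
                "# @end simulate");
        extract("simulate.py", script).forEach(System.out::println);
    }
}
```

Handling multiple files would amount to calling `extract` once per file and concatenating the results. Each record already carries its source file name.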
The YW-Model module takes as input a set of YW comments extracted from one or more source files. It interprets these comments and builds a model of the script from which the comments were extracted. This model represents the script in terms of entities analogous to the components of a scientific workflow. These entities are referred to as programs, ports, channels, and workflows.
Entity | Meaning of the entity and related YW comments |
---|---|
Program | A program represents a computational step in the analyzed script that receives input data and produces intermediate or final data products. A program is designated in a script by bracketing the relevant code between a pair of @begin and @end comments. |
Workflow | Programs can be nested within other programs. A program that contains other programs is considered a workflow. |
Port | A port represents a way in which data flows into or out of a program (or workflow). Ports are identified by @in and @out comments in the analyzed source code. |
Channel | A channel is a connection between an @in port and an @out port (typically on different programs). YW-Model infers channels by matching the names (or aliases) of @in and @out ports within the same workflow. |
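The channel-inference rule in the table can be sketched as a nested loop over the programs directly contained in one workflow. For each pair, the code checks whether an @out name on one program matches an @in name on another. This is a hypothetical sketch, not YW-Model's actual code, and it makes these assumptions:

- Aliases have already been resolved to a single data name per port.
- Connections between the workflow's own ports and its children's ports are omitted.
- Java records are used, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

class YwModelSketch {

    // A program with named @in and @out ports; a program with children is a workflow.
    record Program(String name, List<String> inPorts, List<String> outPorts, List<Program> children) {
        boolean isWorkflow() { return !children.isEmpty(); }
    }

    // A channel carries data from an @out port of one program to an @in port of another.
    record Channel(String from, String to, String data) {}

    // Infer channels among the immediate children of one workflow by name matching.
    static List<Channel> inferChannels(Program workflow) {
        List<Channel> channels = new ArrayList<>();
        for (Program producer : workflow.children()) {
            for (String out : producer.outPorts()) {
                for (Program consumer : workflow.children()) {
                    // Skip self-loops: this sketch only links distinct programs.
                    if (consumer != producer && consumer.inPorts().contains(out)) {
                        channels.add(new Channel(producer.name(), consumer.name(), out));
                    }
                }
            }
        }
        return channels;
    }

    public static void main(String[] args) {
        Program load = new Program("load", List.of("raw_file"), List.of("table"), List.of());
        Program fit = new Program("fit", List.of("table"), List.of("model"), List.of());
        Program plot = new Program("plot", List.of("table", "model"), List.of("figure"), List.of());
        Program wf = new Program("analysis", List.of("raw_file"), List.of("figure"), List.of(load, fit, plot));
        inferChannels(wf).forEach(System.out::println);
    }
}
```

Channels are inferred only among siblings, never across workflow boundaries, which is what the table means by matching names "within the same workflow".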
YW-Graph operates on the outputs of YW-Model to produce a dataflow graph. YW-Graph will provide three different views of the workflow model. The process-centric view represents computational steps (programs) as graph nodes (blocks) and channels as directed edges (arrows), each labeled with the matching name (or alias) of the connected @in and @out ports. The data-centric view represents the data output by programs as nodes in the graph, with a directed edge from one data node to another labeled with the program that consumes the former and produces the latter. Finally, the combined view represents both programs and data as nodes; edges in this graph connect alternating computational-step and data-product nodes.
Note that because YW-Graph operates on the product of YW-Model, running YW-Graph does not require access to the original script files.
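The three views can be sketched as emitters of Graphviz DOT text. The sketch makes these assumptions:

- The model is flattened to a list of programs, each with @in and @out data names.
- DOT is used as the output format for illustration.
- Names never contain double quotes, so no escaping is done.
- Java records are used, so it needs Java 16 or later.

Note that the input is only the model, never the script, as the paragraph above points out.

```java
import java.util.List;

class YwGraphSketch {

    // Flattened model: each program lists the data names on its @in and @out ports.
    record Program(String name, List<String> ins, List<String> outs) {}

    // Process-centric view: programs are box nodes; an edge P -> Q labeled d
    // means an @out port d on P matches an @in port d on Q.
    static String processView(List<Program> programs) {
        StringBuilder dot = new StringBuilder("digraph process_view {\n");
        for (Program p : programs) dot.append(box(p.name()));
        for (Program p : programs)
            for (String d : p.outs())
                for (Program q : programs)
                    if (q != p && q.ins().contains(d)) dot.append(edge(p.name(), q.name(), d));
        return dot.append("}\n").toString();
    }

    // Data-centric view: data items are nodes; an edge d1 -> d2 labeled P
    // means program P consumes d1 and produces d2.
    static String dataView(List<Program> programs) {
        StringBuilder dot = new StringBuilder("digraph data_view {\n");
        for (Program p : programs)
            for (String in : p.ins())
                for (String out : p.outs()) dot.append(edge(in, out, p.name()));
        return dot.append("}\n").toString();
    }

    // Combined view: program boxes and data nodes (Graphviz default ellipses) alternate.
    static String combinedView(List<Program> programs) {
        StringBuilder dot = new StringBuilder("digraph combined_view {\n");
        for (Program p : programs) {
            dot.append(box(p.name()));
            for (String in : p.ins()) dot.append(edge(in, p.name(), null));
            for (String out : p.outs()) dot.append(edge(p.name(), out, null));
        }
        return dot.append("}\n").toString();
    }

    private static String box(String name) {
        return "  \"" + name + "\" [shape=box];\n";
    }

    // A null label produces an unlabeled edge.
    private static String edge(String from, String to, String label) {
        String attrs = label == null ? "" : " [label=\"" + label + "\"]";
        return "  \"" + from + "\" -> \"" + to + "\"" + attrs + ";\n";
    }

    public static void main(String[] args) {
        List<Program> programs = List.of(
                new Program("load", List.of("raw_file"), List.of("table")),
                new Program("fit", List.of("table"), List.of("model")),
                new Program("plot", List.of("table", "model"), List.of("figure")));
        System.out.print(processView(programs));
    }
}
```

The printed text could be rendered with Graphviz, for example `dot -Tpng`.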
Like YW-Graph, the YW-Query module operates on the workflow model produced by YW-Model. YW-Query additionally takes as input a query about the script the model represents. Queries can be used to probe the structure of a complex script without having to inspect and interpret graphical representations. Examples of queries YW-Query will support include:
Q1. Given the name of an output of the script, list the inputs to the script that the output depends on (directly or indirectly).
Q2. List the computational steps involved in deriving a particular output of the script, or of a named intermediate data product.
Q3. For a particular computational step, reveal where each input to the step comes from: an input to the script, a constant in the script, a value produced by a different step, etc.
Q4. For a particular computational step, reveal which other steps are nested within it (including code blocks, calls to functions, and invocations of external programs).
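Queries Q1 and Q2 can both be answered by one backward traversal from a named data item. The traversal follows each item to the steps that produce it, then to those steps' inputs. Any item that no step produces is reported as a script input. The sketch below makes these assumptions:

- The model is flattened (no nesting).
- Constants are not distinguished from inputs.
- The names are hypothetical.
- Java records are used, so it needs Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class YwQuerySketch {

    record Program(String name, List<String> ins, List<String> outs) {}

    // Answers Q1 (upstream script inputs) and Q2 (steps involved) for one data item.
    record Lineage(SortedSet<String> inputs, SortedSet<String> steps) {}

    static Lineage lineage(List<Program> programs, String target) {
        // Index each data name by the programs that produce it.
        Map<String, List<Program>> producers = new HashMap<>();
        for (Program p : programs)
            for (String out : p.outs())
                producers.computeIfAbsent(out, k -> new ArrayList<>()).add(p);

        SortedSet<String> inputs = new TreeSet<>();
        SortedSet<String> steps = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(List.of(target));
        while (!pending.isEmpty()) {
            String data = pending.pop();
            if (!visited.add(data)) continue; // seen before (also stops cycles)
            List<Program> makers = producers.getOrDefault(data, List.of());
            if (makers.isEmpty()) {
                inputs.add(data); // produced by no step, so it enters the script from outside
            }
            for (Program p : makers) {
                steps.add(p.name());
                pending.addAll(p.ins()); // keep walking upstream through this step's inputs
            }
        }
        return new Lineage(inputs, steps);
    }

    public static void main(String[] args) {
        List<Program> programs = List.of(
                new Program("load", List.of("raw_file"), List.of("table")),
                new Program("fit", List.of("table", "params"), List.of("model")),
                new Program("plot", List.of("table", "model"), List.of("figure")));
        System.out.println(lineage(programs, "figure"));
        // prints Lineage[inputs=[params, raw_file], steps=[fit, load, plot]]
    }
}
```

Because the traversal needs only the model, a query like this could run without the original script, just as YW-Graph does.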
Any approach that depends on adding comments to a script incurs a risk that those comments do not accurately reflect the actual function or behavior of the script. Moreover, comments that originally were correct can become inaccurate when the code they are associated with is changed or refactored.
The YW-Validate module addresses these problems by comparing assertions made in YesWorkflow comments with the actual script(s). Initially this analysis may be limited to confirming that data names used in @in and @out comments actually appear in the code bracketed by the associated @begin and @end comments. Later, YW-Validate will confirm that continuous data dependency chains exist from each script output all the way back to script inputs (and embedded constants), etc.
The YW-CLI module invokes one or more of the above YW modules based on the commands and input data provided to it. For example, given the graph command and the path to a script source file, YW-CLI will invoke YW-Extract, YW-Model, and YW-Graph in sequence.
Inputs to YW-CLI may be provided via paths to files or via the standard input stream. Outputs can similarly be routed to files or to standard output. Consequently, YW modules can be invoked in shell pipelines with standard modules interleaved with non-standard implementations as needed.
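The dispatch just described can be sketched as a table that maps each command to a sequence of text-to-text stages. Input is read from a file path, or from standard input when the path is `-` or absent, and the result is written to standard output. This is a hypothetical sketch with these assumptions:

- The stages are stubs that only record that they ran.
- The `extract` and `model` commands are invented for illustration; only `graph` appears in the text above.
- `Files.readString` is used, so it needs Java 11 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class YwCliSketch {

    // Stub stages standing in for the real modules: each maps the text it reads
    // to the text it writes. Here they only append a marker.
    static final Function<String, String> EXTRACT = s -> s + "|extract";
    static final Function<String, String> MODEL = s -> s + "|model";
    static final Function<String, String> GRAPH = s -> s + "|graph";

    // Each command names the sequence of modules it runs (hypothetical commands).
    static final Map<String, List<Function<String, String>>> COMMANDS = Map.of(
            "extract", List.of(EXTRACT),
            "model", List.of(EXTRACT, MODEL),
            "graph", List.of(EXTRACT, MODEL, GRAPH));

    // Feed the input through each stage of the command in order.
    static String run(String command, String input) {
        List<Function<String, String>> stages = COMMANDS.get(command);
        if (stages == null) throw new IllegalArgumentException("unknown command: " + command);
        String data = input;
        for (Function<String, String> stage : stages) data = stage.apply(data);
        return data;
    }

    // Usage: java YwCliSketch <command> [path | -]   ("-" or no path reads standard input)
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: <command> [path | -]");
            System.exit(2);
        }
        String input = (args.length < 2 || args[1].equals("-"))
                ? new String(System.in.readAllBytes(), StandardCharsets.UTF_8)
                : Files.readString(Path.of(args[1]));
        System.out.print(run(args[0], input));
    }
}
```

Every stage exchanges plain text, and the program reads standard input and writes standard output. A tool shaped like this could therefore sit in a shell pipeline next to non-standard modules written in any language.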