Future Capabilities

Timothy McPhillips edited this page Oct 6, 2015 · 19 revisions

This page describes the limitations of the current YesWorkflow prototype. It also describes planned features for future versions of the software.

Nested code blocks

The YW-Extract and YW-Model modules currently support simple nesting of code blocks. Any pair of @begin and @end comment lines can enclose code that contains any number of other code blocks delimited with @begin and @end comment lines. The workflow model constructed for such a script reflects this nesting. In terms of the YW model, a Workflow contains one or more Programs, and any of these Programs can in turn be a Workflow that contains further nested Programs and Workflows.
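As an illustration of how such nesting can be recovered from comment lines, here is a minimal Python sketch. It is not the YW-Extract implementation: the function name parse_blocks, the dict-based tree, and the restriction to '#' comments are assumptions made for this example only.

```python
import re

# Matches a '#' comment line carrying @begin or @end, e.g. "# @begin load".
# Assumption: only '#'-style comments are considered in this sketch.
YW_LINE = re.compile(r"^\s*#\s*@(begin|end)\b\s*(\S*)")

def parse_blocks(script_text):
    """Return the top-level blocks as dicts: {'name': str, 'children': [...]}."""
    root = {"name": None, "children": []}  # hypothetical holder for top-level blocks
    stack = [root]
    for line_number, line in enumerate(script_text.splitlines(), start=1):
        match = YW_LINE.match(line)
        if not match:
            continue  # ordinary code or an unrelated comment
        keyword, name = match.groups()
        if keyword == "begin":
            block = {"name": name, "children": []}
            stack[-1]["children"].append(block)  # nest inside the enclosing block
            stack.append(block)
        else:
            if len(stack) == 1:
                raise ValueError(f"line {line_number}: @end without matching @begin")
            stack.pop()
    if len(stack) != 1:
        raise ValueError(f"missing @end for block {stack[-1]['name']!r}")
    return root["children"]

EXAMPLE = """\
# @begin main
# @begin load
x = 1
# @end load
# @begin plot
print(x)
# @end plot
# @end main
"""

tree = parse_blocks(EXAMPLE)
```

In this example the tree holds one top-level block, main, whose children are load and plot. Each block with children plays the role of a Workflow, and each leaf the role of a Program.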

However, graphing support for nested workflows currently is incomplete. The process-graph view of a nested workflow renders each level of the workflow as a separate graph. The data- and combined-graph views do not reveal subworkflows nested within the top-level workflow.

Upcoming versions of YesWorkflow will be able to render all subworkflows of a workflow. Additionally, YW-Graph will be able to render a specific, single workflow or subworkflow as desired.

Functions and function calls

YW-Extract currently expects nested code blocks to be defined in-line. However, many scripts are structured as functions (or classes) with a top-level script that calls these functions (or methods on objects). These functions can in turn call other functions.

Upcoming versions of YesWorkflow will allow function declarations to be marked up with YW comments in a manner similar to that of Javadoc, Doxygen, or Roxygen. Calls to these functions will also be annotated with YW markup. As a result, YW-Extract and YW-Model will be able to represent function calls as nested code blocks.

Packaging of input source scripts

YW currently reads input source files and outputs the results of the processing it performs. Future versions of YW optionally will capture and package in a single archive file the source code it analyzes. This will allow users to take a snapshot of their scripts whenever they use YW, and will allow YW models and results to refer to lines in the original scripts without fear of the original scripts disappearing or changing after YW is run. It also will allow the YW comments to be analyzed again, perhaps differently, in the future on the exact same scripts.

The captured scripts will be packaged in a single file along with any YW modeling and analysis results (e.g., the workflow model of the script). YW will provide tools for querying this package file to retrieve all of the original source files, individual source files, or snippets of code representing the YW-annotated code blocks found in the source files. This will make it easy for UIs that interactively explore YW graphs to display the code for any block.
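The packaging idea can be sketched with Python's standard zipfile module. The archive layout (a sources/ folder, manifest.json, model.json) and the function names below are assumptions for illustration; no YW package format has been defined.

```python
import hashlib
import io
import json
import zipfile

def package_sources(sources, model):
    """Bundle script texts and a model description into one in-memory zip archive.

    sources: dict mapping file name -> source text
    model:   any JSON-serializable description of the workflow model
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        manifest = {}
        for name, text in sorted(sources.items()):
            data = text.encode("utf-8")
            archive.writestr(f"sources/{name}", data)
            # A content hash lets later readers confirm the snapshot is unchanged.
            manifest[name] = hashlib.sha256(data).hexdigest()
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        archive.writestr("model.json", json.dumps(model, indent=2))
    return buffer.getvalue()

def read_snippet(package_bytes, name, first_line, last_line):
    """Return lines first_line..last_line (1-based, inclusive) of a packaged file."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        lines = archive.read(f"sources/{name}").decode("utf-8").splitlines()
    return "\n".join(lines[first_line - 1:last_line])

package = package_sources(
    {"analysis.py": "# @begin main\nx = 1\n# @end main\n"},
    {"blocks": [{"name": "main", "file": "analysis.py", "lines": [1, 3]}]},
)
snippet = read_snippet(package, "analysis.py", 1, 3)
```

Because the model records the file name and line range of each block, a viewer could call read_snippet to show the code behind any node without access to the original scripts.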

Intermediate file formats

YesWorkflow aims to be highly modular with the goal of enabling interoperability of standard YW modules with custom modules developed by others. Key to this modularity is the definition of simple input and output file formats for each module. Current versions of the software do not define these formats and depend on intermediate results being passed from one module to the next via data structures in memory.

Upcoming versions of YesWorkflow will define output formats for YW-Extract and YW-Model. The YW-Extract output file will represent a scripting-language neutral description of the YW comments found in a set of analyzed script files. YW-Model will be able to take the YW-Extract output as input and in turn will be able to save its results as a text description of the analyzed scripts in terms of the YW workflow model. YW-Graph and YW-Query will take as input the output of YW-Model.
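To make the notion of a scripting-language-neutral extract concrete, the following Python sketch turns YW comments from scripts in two languages into identical JSON records. The record fields (file, line, tag, value), the list of comment prefixes, and the function name are assumptions; the actual output formats are not yet defined.

```python
import json
import re

# Assumed comment prefixes for a few languages; a real extractor's list may differ.
COMMENT_PREFIXES = ("#", "%", "//")
YW_TAG = re.compile(r"@(begin|end|in|out)\b\s*(.*)")

def extract_comments(file_name, text):
    """Return one language-neutral record per YW comment line in text."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        prefix = next((p for p in COMMENT_PREFIXES if stripped.startswith(p)), None)
        if prefix is None:
            continue  # not a comment line
        # Only accept a YW keyword at the start of the comment body.
        match = YW_TAG.match(stripped[len(prefix):].lstrip())
        if match:
            records.append({
                "file": file_name,
                "line": line_number,
                "tag": match.group(1),
                "value": match.group(2).strip(),
            })
    return records

python_script = "# @begin clean\n# @in raw\nout = raw\n# @out out\n# @end clean\n"
matlab_script = "% @begin plot\n% @end plot\n"

records = (extract_comments("clean.py", python_script)
           + extract_comments("plot.m", matlab_script))
serialized = json.dumps(records, indent=2)   # what would be written to a file
restored = json.loads(serialized)            # what a downstream module would read
```

The serialized text carries no trace of the source language's comment syntax, so a downstream module only needs to understand the record structure.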

When support for these intermediate file formats is complete, users and other development teams will be able to write alternative YW modules for graphing, querying, and analyzing YW comments and YW models of scripts. They will also be able to develop tools that extract YW-compatible workflow structure from scripts and workflows not marked up with YW comments, and to apply YW graphing and querying capabilities to such scripts and workflows. For example, the Kurator team plans to use the YW-Graph and YW-Query tools to graphically render workflows defined using the Kurator-Akka workflow system and to query the prospective provenance of products of these workflows.

Interactive graphs

YW-Graph currently produces static graphical views (in Graphviz DOT format). The resulting graphs can be large and complex. The data view can be particularly difficult to interpret due to many crossing arrows.

An interactive viewer for YW graphical output will make these graphs easier to explore and interpret. For example, clicking on a data item in the combined or data views optionally will highlight the (prospective) direct and indirect data dependencies for that data item (the data from which it will be derived when the script is run). Features for expanding and collapsing nested subworkflows also will facilitate exploration of these graphs.

Live graph view

The primary function of YesWorkflow is to reveal workflow-like structure in existing scripts. YesWorkflow also can be used as a design tool when developing new scripts (or even before a script is written). Future versions of YesWorkflow will better support such applications by providing live-update features to the interactive graph capabilities described above. Given a set of script files, the live-graph feature will monitor these files for changes and update the chosen graphical view automatically. Users of this feature will continue to be able to use their favorite text editor or IDE for developing their scripts.
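One simple way to detect changes is to compare content hashes of the script files between checks. The Python sketch below shows a single monitoring step; the function names and the redraw callback are hypothetical, and a real viewer might instead rely on file-system notifications.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(paths):
    """Map each path to a hash of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}

def changed_files(before, after):
    """Return the sorted paths whose contents differ between two snapshots."""
    return sorted(p for p in after if before.get(p) != after[p])

def watch_once(paths, before, redraw):
    """Take a new snapshot, call redraw(changed) if needed, return the new snapshot."""
    after = snapshot(paths)
    changed = changed_files(before, after)
    if changed:
        redraw(changed)  # hypothetical hook that would regenerate the graph view
    return after

# Demonstration on a temporary file; redraws just collects the reported changes.
with tempfile.TemporaryDirectory() as folder:
    script = Path(folder) / "analysis.py"
    script.write_text("# @begin main\n# @end main\n")
    redraws = []
    state = snapshot([script])
    state = watch_once([script], state, redraws.append)  # nothing changed yet
    script.write_text("# @begin main\n# @in x\n# @end main\n")
    state = watch_once([script], state, redraws.append)  # edit is detected
```

A live viewer would call watch_once repeatedly, for example once per second, while the user keeps editing in any text editor or IDE.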

Workflow graph queries

The workflow structure of large scripts can be difficult to interpret fully even when represented graphically. The planned YW-Query module will allow this structure to be queried to answer specific questions about the script in workflow terms.

Example workflow-structure queries include:

  1. List all of the code blocks defined in the script along with any description given for each.
  2. List the code blocks nested (directly or indirectly) within a particular code block.
  3. List the code blocks that invoke a particular function or external program.
  4. List the code blocks that contain a particular block (directly or indirectly).
  5. List the code blocks that receive inputs derived (directly or indirectly) from the outputs of a particular upstream code block.
  6. List the code blocks affected (directly or indirectly) by a particular parameter value provided to the script.
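Several of these queries reduce to reachability over two relations: which blocks contain which, and which blocks feed data to which. The Python sketch below answers queries 2, 4, and 5 on a toy model. The dictionaries and function names are assumptions for illustration and do not reflect how YW-Query will store or query the model.

```python
# Hypothetical toy model: the blocks that each block directly contains ...
CONTAINS = {
    "main": ["load", "analyze"],
    "analyze": ["filter", "fit"],
}
# ... and the blocks that directly consume an output of each block.
FEEDS = {
    "load": ["filter"],
    "filter": ["fit"],
}

def reachable(graph, start):
    """Return every node reachable from start by following one or more edges."""
    seen = set()
    pending = list(graph.get(start, []))
    while pending:
        node = pending.pop()
        if node not in seen:
            seen.add(node)
            pending.extend(graph.get(node, []))
    return seen

def invert(graph):
    """Reverse every edge, so that children point to their parents."""
    inverted = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted

nested_in_main = reachable(CONTAINS, "main")             # query 2
containers_of_fit = reachable(invert(CONTAINS), "fit")   # query 4
downstream_of_load = reachable(FEEDS, "load")            # query 5
```

The same traversal serves all three queries; only the relation and its direction change.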

Declarations of data models

YW annotations currently focus on revealing the data flows otherwise implicit in scripts. Future versions will provide YW annotations for describing the data model(s) employed by the script. Once a data model is declared, it will be possible to state that data flowing through an @in or @out is of a type represented by an element of the data model. Data models initially will be relatively simple but eventually may include primitive type declarations, tuples, records, multidimensional arrays, nested collections, and object inheritance hierarchies, in addition to an overall ER description of each model. The declared data model may be simpler or more detailed than what could be inferred from the data types used in the script.

YW graphs will reflect the declared data model and allow, for example, the channel for a record or tuple to be represented either as a single edge or multiple edges in the process view (or a single box or multiple boxes in the data view).

Prospective provenance queries

The future YW-Query module additionally will allow scripts marked up with YW comments to be queried from a data-provenance perspective. Because YesWorkflow analyzes the definition of a workflow (the script plus YW comments) rather than information recorded during a run of the script, the YW-Query module will provide 'prospective provenance' (see ideas for supporting Inference of Retrospective Provenance below).

Example prospective data provenance queries include:

  1. Given the name of an output of the script, list the inputs to the script that the output depends on (directly or indirectly).
  2. List the computational steps (code blocks) involved in deriving a particular output of the script, or of a named intermediate data product.
  3. For a particular computational step reveal where each input to the step comes from: an input to the script, a constant in the script, a value produced by a different step, etc.
  4. Reveal the complete derivation of a particular script output. That is, list the sequence of code blocks and input and intermediate data products leading to the output. Results of queries of this kind optionally may be rendered graphically.
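Queries 1 and 2 can be sketched as a backward traversal from an output through the blocks that produce it. In the Python example below, the BLOCKS dictionary and both function names are hypothetical; the sketch only assumes that each block's @in and @out names are known.

```python
# Hypothetical model: each block lists the data names it reads (@in) and writes (@out).
BLOCKS = {
    "load":   {"in": ["raw_file"],           "out": ["table"]},
    "filter": {"in": ["table", "threshold"], "out": ["clean_table"]},
    "plot":   {"in": ["clean_table"],        "out": ["figure"]},
    "log":    {"in": ["table"],              "out": ["summary"]},
}

def derivation(data_name):
    """Return (blocks, data) involved in deriving data_name, directly or indirectly."""
    producers = {out: name for name, block in BLOCKS.items() for out in block["out"]}
    blocks, data = set(), set()
    pending = [data_name]
    while pending:
        current = pending.pop()
        producer = producers.get(current)
        if producer is None or producer in blocks:
            continue  # a script input or constant, or a block already visited
        blocks.add(producer)
        for upstream in BLOCKS[producer]["in"]:
            data.add(upstream)
            pending.append(upstream)
    return blocks, data

def script_inputs(data_name):
    """Return the upstream data items that no block produces (inputs to the script)."""
    produced = {out for block in BLOCKS.values() for out in block["out"]}
    _, data = derivation(data_name)
    return data - produced

steps_for_figure, data_for_figure = derivation("figure")   # query 2
inputs_for_figure = script_inputs("figure")                # query 1
```

Here the figure depends on the steps plot, filter, and load and on the script inputs raw_file and threshold, while the log block is correctly left out because nothing it produces leads to the figure. No run of the script is needed, which is what makes the provenance prospective.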

Validation of comments

YesWorkflow currently provides minimal validation of the YW comments added to a script. The future YW-Validate module will perform extensive validation of YW comments. This capability will help guide users adding YW comments to their scripts. Perhaps more importantly, automatic validation will help prevent initially correct YW comments from becoming stale (i.e., incorrect) when the underlying script is changed or refactored.

Validity checks that YW-Validate will perform include:

  1. Confirm that data names used in @in and @out comments actually appear in the code bracketed by associated @begin and @end comments.
  2. Confirm that the names of functions referred to in YW comments (at function declaration or at function calls) match the names of the functions actually declared or called.
  3. Confirm that continuous data dependency chains exist from each script output all the way back to script inputs (and embedded constants).
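The first check above can be approximated with a textual search. The Python sketch below reports @in and @out names that never occur as whole words in the code between the enclosing @begin and @end lines. The function name and the '#' comment assumption are illustrative, and a real validator would likely need language-aware parsing rather than a regular-expression match.

```python
import re

# Assumption: YW comments use '#' and carry at most one name after the keyword.
TAG = re.compile(r"^\s*#\s*@(begin|end|in|out)\b\s*(\S*)")

def undeclared_names(script_text):
    """Return (block, tag, name) for each @in/@out name absent from its block's code."""
    problems = []
    stack = []  # open blocks, innermost last: {'name', 'ports', 'code'}
    for line in script_text.splitlines():
        match = TAG.match(line)
        if match is None:
            for block in stack:          # code belongs to every enclosing block
                block["code"].append(line)
            continue
        tag, name = match.groups()
        if tag == "begin":
            stack.append({"name": name, "ports": [], "code": []})
        elif tag == "end":
            if not stack:
                continue  # unmatched @end; a fuller validator would report this too
            block = stack.pop()
            code = "\n".join(block["code"])
            for port_tag, port_name in block["ports"]:
                if not re.search(rf"\b{re.escape(port_name)}\b", code):
                    problems.append((block["name"], port_tag, port_name))
        elif stack:
            stack[-1]["ports"].append((tag, name))
    return problems

SCRIPT = """\
# @begin clean
# @in raw
# @out cleaned
result = [r for r in raw if r]
# @end clean
"""

problems = undeclared_names(SCRIPT)
```

In this example the @out name cleaned is flagged because the code assigns to result instead, which is the kind of stale comment that can appear after a variable is renamed.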