RO checklist evaluation API

Table of Contents API function overview API usage Link relations http://purl.org/ro/service/evaluate/checklist HTTP methods Resources and formats Service description Research Object Minim description Minim Checklist Minim Model (check item list) Minim Requirements Requirement for an RO to contain a workflow primitive Liveness testing Software environment testing Proposed Minim enhancements Integrity testing Resource annotation with checksum values Comparing with a reference object or resource Constraint references to RO and target resources Checklist evaluation results Cache considerations Security considerations References

API function overview

The checklist evaluation API is intended to provide access to the minim-based evaluation of Research Objects, used to test for completeness, executability, repeatability and other desired features. The functionality provided is based on the ro-manager evaluate checklist option:

  ro evaluate checklist [ -d <dir> ] [ -a | -l <level> ] <minim> <purpose> [ <target> ]

where:

<dir> is the directory containing the RO to be evaluated
<level> indicates the level of information detail to be returned
<minim> is a URI reference for a minimum information model resource from which the checklist definition is obtained
<target> is a target resource with respect to which the evaluation is performed; the default <target> is the RO itself, but a component within the RO may be selected.
<purpose> is a keyword indicating the purpose for which the RO or <target> is to be evaluated.

For example:

  ro evaluate checklist -d /workspace/myro -l all /workspace/minim.rdf "creation" myro/wfoutput.dat

might evaluate the RO at /workspace/myro using the minim model in file /workspace/minim.rdf.

The Web API is intended to provide remote access to the above functionality using simple HTTP requests.

Research Objects and other data are provided as web resources, and indicated in the API using their URIs.

API usage

Suppose we have:

A checklist evaluation service accessible at http://service.example.org/evaluate/checklist
An RO with URI http://sandbox.example.org/ROs/myro>

A MINIM minimum information model description (containing checklist definitions including one for repeatability) at http://another.example.com/minim/repeatable.rdf The checklist definition in the MINIM model for repeatability is associated with the purpose keyword "repeatable". Note: there is an example of a simple minim model at https://github.com/wf4ever/ro-catalogue/blob/master/v0.1/simple-requirements/simple-requirements-minim.rdf

The checklist evaluation would then be invoked in a sequence of two HTTP operations:

Client retrieves service document:

C: GET /evaluate/checklist HTTP/1.1
C: Host: service.example.org
C: Accept: application/rdf+xml

S: HTTP/1.1 200 OK
S: Content-Type: application/rdf+xml
S:
S: <?xml version="1.0"?>
S: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S:   xmlns:roe="http://purl.org/ro/service/evaluate/">
S:   <rdf:Description rdf:about="">
S:     <roe:checklist>/evaluate/checklist{?RO,minim,target,purpose}</roe:checklist>
S:   </rdf:Description>
S: </rdf:RDF>

Client parses the service document, extracts the URI template for the checklist evaluation service and assembles URI for the desired evaluation result (cf. RFC6570), and issues a second HTTP GET request:

C: GET /evaluate/checklist?RO=http%3A%2F%2Fsandbox.example.org%2FROs%2Fmyro
         &minim=http%3A%2F%2Fanother.example.com%2Fminim%2Frepeatable.rdf
         &purpose=repeatable HTTP/1.1
C: Host: service.example.org
C: Accept: application/rdf+xml

S: HTTP/1.1 200 OK
S: Content-Type: application/rdf+xml
S:
S: <?xml version="1.0"?>
S: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S:   xmlns:...="..."
S:   >
S:   <rdf:Description rdf:about="...">
S:     (Result of checklist evaluation)
S:   </rdf:Description>
S: </rdf:RDF>

The result from the second request is the checklist evaluation result. The URI shown above has been split over several lines for readability - the actual HTTP request must present it without whitespace. The optional target URI parameter has been omitted in this example on the assumption that the target is the Research Object itself.

See also:

Link relations

http://purl.org/ro/service/evaluate/checklist

This relation is generally used used in the service description document

It indicates a relation between a service description and a URI template for RO evaluation results using the described service. The URI template is is used to construct a service result URI by:

applying the URI template expansion procedures with caller-supplied RO URI, minim URI, purpose and target URIs, and
resolving the resulting URI-reference to an absolute URI using normal URI resolution rules (e.g. typically, using the service document URI as a base URI)

See also:

HTTP methods

The service description is obtained in response to an HTTP GET to a checklist evaluation service URI.

The checklist evaluation service responds to an HTTP GET with the results of a checklist evaluation, using the URI defined by expanding the template provided by the service description.

Resources and formats

Service description

The checklist evaluation service description is an RDF file that contains URI templates for accessing RO related services, including checklist evaluation. The RDF syntax used may be content negotiated. In the absence of content negotiation, RDF/XML should be returned.

Example:

<?xml version="1.0"?>
  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:roe="http://purl.org/ro/service/evaluate/"
    >
    <rdf:Description rdf:about="">
      <roe:checklist>/evaluate/checklist{?RO,minim,target,purpose}</roe:checklist>
    </rdf:Description>
  </rdf:RDF>

Research Object

See:

Minim description

NOTE: this section describes the original Minim model structure, which is still fully supported by the software. However, the capabilities described here have been refactored and extended by a revised minim, model, which is described more fully in https://github.com/wf4ever/ro-manager/blob/develop/Minim/Minim-description.md and https://github.com/wf4ever/ro-manager/blob/develop/Minim/minim-revised.md

The MINIM description contains 3 levels of description:

minim:Checklist associates a target and purpose (e.g. runnable RO) to a minim:Model

to be evaluated.

minim:Model encodes the checklist (list of requirements) to be evaluated.

(There is provision for MUST / SHOULD / MAY requirements in a checklist to cater for limited variation in levels of conformance.)

minim:Requirement is a single requirement (checklist item), which is associated with a rule for evaluating whether or not it is satisfied or not satisfied. Each rule makes reference to a "checklist primitive" function. Additional capabilities can be added (in due course) by expanding the set of available checklist primitives (e.g. see http://www.wf4ever-project.org/wiki/display/docs/RO+decay+detection+using+checklists)

These 3 levels are called out in the examples that follow

See:

Minim Checklist

The "checklist" (previously called "constraint") describes a mapping from target and purpose values to a particular minim:Model to be used as the basis of an evaluation. Relative URI references are resolved relative to the location of the Minim resource. In this example, the Minim resource is taken from the root directory of an RO, so "." refers to the RO itself.

<minim:Checklist rdf:about="#runnable_RO">
  <minim:forPurpose>Runnable</minim:forPurpose>
  <minim:onResource rdf:resource="." />
  <minim:toModel rdf:resource="#runnable_RO_model" />
  <rdfs:comment>
    Constraint to be satisfied if the RO is to be runnable
  </rdfs:comment>
</minim:Constraint>

Minim Model (check item list)

A minim model represents a list of check items to be evaluated. It enumerates of a number of requirements which may be declared at levels of MUST, SHOULD or MAY be satisfied for the model as a whole to be considered satisfied. This follows a structure for minimum information models proposed by Matthew Gamble.

<minim:Model rdf:about="#runnable_RO_model">
  <rdfs:label>Runnable RO</rdfs:label>
  <rdfs:comment>
    This model defines information that must be available for the 
    requirements Research Object to be runnable.
  </rdfs:comment>
  <minim:hasMustRequirement rdf:resource="#environment-software/lpod-show" />
  <minim:hasMustRequirement rdf:resource="#environment-software/python" />
  <minim:hasMustRequirement rdf:resource="#isPresent/workflow-instance" />
  <minim:hasMustRequirement rdf:resource="#isPresent/workflow-inputfiles" />
</minim:Model>

Minim Requirements

Minim Requirements are evaluated using rules, which in turn invoke checklist evaluation primitives with appropriate parameters. This structure allows a relatively wide range of checklist items to be evaluated based on a relatively small number of primitive tests. The examples show the various primitives.

Requirement for an RO to contain a workflow primitive

The minim:ContentMatchRequirementRule is driven by a SPARQL query probe which is evaluated over a merge of all the RO annotations (including the RO manifest). In this case, it simply tests that the query can be satisfied. The minim:showpass and minim:showfail properties indicate strings that are used for reporting the status of the checklist evaluation.

<!-- Workflow instance must be present -->
<minim:Requirement rdf:about="#isPresent/workflow-instance">
  <minim:isDerivedBy>
    <minim:ContentMatchRequirementRule>
      <minim:exists>
        ?wf rdf:type wfdesc:Workflow .
      </minim:exists>
      <minim:showpass>Workflow instance or template found</minim:showpass>
      <minim:showfail>No workflow instance or template found</minim:showfail>
      <minim:derives rdf:resource="#isPresent/workflow-instance" />
    </minim:ContentMatchRequirementRule>
  </minim:isDerivedBy>
</minim:Requirement>

Liveness testing

To test for liveness of a resource, the evaluator will need to attempt to access the resource. If it is a local file, a file existence check should suffice. If it is a web resource, then a success response to an HTTP HEAD request is expected.

<!-- Workflow descriptions must be accessible (live) -->
<minim:Requirement rdf:about="#workflows_accessible">
  <minim:isDerivedBy>
    <minim:ContentMatchRequirementRule>
      <minim:forall>
        ?wf rdf:type wfdesc:Workflow .
      </minim:forall>
      <minim:isLiveTemplate>
        {+wf}
      </minim:isLiveTemplate>
      <minim:showpass>All declared workflow descriptions are accessible</minim:showpass>
      <minim:showfail>Workflow description %(wf)s is not accessible</minim:showfail>
      <minim:derives rdf:resource="#workflows_accessible" />
    </minim:ContentMatchRequirementRule>
  </minim:isDerivedBy>
</minim:Requirement>

This varies from the simple aggregation test in that the minim::aggregatesTemplate property is replaced by a minim:isLiveTemplate property.

Software environment testing

A minim:SoftwareEnvironmentRule tests to see if a particular piece of software is available by issuing a command and checking the response against a supplied regular expression. (This test is primarily intended for local use within RO-manager, and may be of limited use on the evaluation service as the command is issued on the host running the evaluation service, not on the host requesting the service.)

<!-- Environment needs python -->
<minim:Requirement rdf:about="#environment-software/python">
  <minim:isDerivedBy>
    <minim:SoftwareEnvironmentRule>
      <minim:command>python --version</minim:command>
      <minim:response>Python 2.7</minim:response>
      <minim:show>Installed python version %(response)s</minim:show>
      <minim:derives rdf:resource="#environment-software/python" />
    </minim:SoftwareEnvironmentRule>
  </minim:isDerivedBy>
</minim:Requirement>

Proposed Minim enhancements

In viewing the proposals for liveness and integrity testing, compare with the current framework used for testing that an RO aggregates a specified resource; the following rule example tests that all workflow inputs are aggregated by the evaluated RO:

<minim:ContentMatchRequirementRule>
  <minim:forall>
    ?wf rdf:type wfdesc:Workflow ;
        wfdesc:hasInput [ wfdesc:hasArtifact ?if ] .
  </minim:forall>
  <minim:aggregatesTemplate>{+if}</minim:aggregatesTemplate>
    :
</minim:ContentMatchRequirementRule>

Integrity testing

In this context, "integrity testing" means checking that a resource content matches some expected (e.g. previously calculated) value.

The integrity testing builds upon the liveness testing, by adding a reference to a resource with which the tested resource may be compared, by way of the minim:contentMatchTemplate property. Note that this property says nothing about using a hash or checksum value; the plan is that it's just the URI of a resource with which to compare. But that URI may be an ni: URI (cf. http://tools.ietf.org/html/draft-farrell-decade-ni-06), in which case, instead of trying to dereference the URI and do a byte-by-byte comparison, instead it calculates the appropriate hash function over the resource being tested and compares that with the value encoded by the ni: URI. To check for a known constant value, a data: URI (http://tools.ietf.org/html/rfc2397) can be used.

This structure assumes that appropriate annotations have been created to allow a SPARQL query to discover the URI of an appropriate resource with which to match the content. If may turn out that further queries against a separate RO are needed, in which case the above structure may need to be expanded. In the example below, the annotation is a roeval:contentMatch property directly about the resource being integrity-checked.

<minim:ContentMatchRequirementRule>
  <minim:forall>
    ?wf rdf:type wfdesc:Workflow ;
        wfdesc:hasInput [ wfdesc:hasArtifact ?if ] .
    ?if roeval:contentMatch ?match .
  </minim:forall>
  <minim:accessTemplate>{+if}</minim:accessTemplate>
  <minim:contentMatchTemplate>{+match}</minim:contentMatchTemplate>
    :
</minim:ContentMatchRequirementRule>

Resource annotation with checksum values

Integrity checking with a single RO may be achieved by adding checksum annotations to the resources that subsequently will be tested. A new RO-manager command could be provided to facilitate creation of such annotations; e.g.

  ro content-hash _resource_

which would calculate and output an appropriate ni: URI, and could be used as part of an RO command thus:

  ro annotate _resource1_ _attribute-name_ `ro content-hash _resource2_`

This flexible approach would allow a checksum to be calculated from one RO and applied as a reference for checking to another. But this combination of commands might be difficult to do in some systems (e.g. Windows), so a less flexible form of command might be offered:

  ro annotate-hash _resource1_ roeval:contentMatch _resource2_

where resource1 would be annotated with the content hash of resource2, using the indicated attribute identifier.

Comparing with a reference object or resource

A proposed requirement is that contents of one RO can be compared with the content of another calibration (or reference) RO.

This might be achieved by adding an annotation to an RO referring to its corresponding calibration RO. It is expected that this information might be provided by RO evolution information, but in the absence of such annotations a new, specialized annotation might be needed for this.

Operationally, the comparison between ROs might be invoked in any of the following ways:

One RO contains links to its corresponding calibration RO. This would appear to be quite natural if one RO is derived from another: it's evolution trace would contain a reference to the RO from which it is derived, but would be useful only when that is the RO with which one would want to perform the comparison. This would limit the scope of ROs that could be compared to those that are explicitly related to each other. Thus, a researcher finding two independent ROs that claim to calculate a common value would not be able to simply compare those without first (copying and) modifying at least one of them.
References to the calibration RO are carried in the Minim description. This would limit the range of ROs with which such a Minim description might be used, and would make it harder or impossible to create truly generic checklists that compare different ROs.
A new RO might contain references to two other ROs to be compared.
The URIs of the ROs to be compared could be provided in the service invocation.

@@TODO - define how to handle this. We need a clearer view of the use-cases to best understand which approach(es) would be most appropriate.

Constraint references to RO and target resources

Currently, the mechanisms used for defining minim:Constraint values contain references to the RO, which makes it difficult to use the same Minim resource with multiple ROs.

Current structure:

<rdf:Description rdf:about=".">
  <minim:hasConstraint>
    <minim:Constraint rdf:about="#runnable_RO">
      <minim:forPurpose>Runnable</minim:forPurpose>
      <minim:onResource rdf:resource="." />
      <minim:toModel rdf:resource="#runnable_RO_model" />
      <rdfs:comment>
        Constraint to be satisfied if the RO is to be runnable
      </rdfs:comment>
    </minim:Constraint>
  </minim:hasConstraint>
</rdf:Description>

Outline of change proposal:

Do not use the minim:hasConstraint property when selecting potential constraints; instead, query the Minim description for all resources of type minim:Constraint.
As an alternative to minim:onResource for selecting the target resource, also recognize a minim:onResourceTemplate parameter whose value is a string containing a URI template. The URI template is expanded with variables RoUri being the URI of the RO being evaluated, and TargetUri being the URI reference of the selected target (which defaults to the RO URI if no specific target is selected). The URI reference resulting from the expansion may be resolved to a full URI using the RO URI as a base URI.

Thus, the above example might become:

<minim:Constraint rdf:about="#runnable_RO">
  <minim:forPurpose>Runnable</minim:forPurpose>
  <minim:onResourceTemplate>{+RoUri}<minim:onResourceTemplate>
  <minim:toModel rdf:resource="#runnable_RO_model" />
  <rdfs:comment>
    Constraint to be satisfied if the RO is to be runnable
  </rdfs:comment>
</minim:Constraint>

A further development might allow an additional SPARQL query to be used, which is executed against a merge of the RO annotations resulting in additional variable bindings that can be referenced by the target resource template. Such might be used, for example, to locate a workflow template within the RO that uses a particular global database, which in turn might be used to test that the corresponding provenance record indicates use of an up-to-date version of that database.

Checklist evaluation results

Evaluation results are an RDF or JSON file containing a fully detailed description of the results of a checklist evaluation. RDF results are reported using the Minim vocabulary and requirement URIs, together with additional terms to include diagnostic and summary information.

Example of RDF (Turtle syntax here):

PREFIX rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX spin:       <http://spinrdf.org/sp#>
PREFIX result:     <http://www.w3.org/2001/sw/DataAccess/tests/result-set#>
PREFIX minim:      <http://purl.org/minim/minim#>
<http://sandbox.example.org/ROs/myro>
  minim:testedConstraint   <http://another.example.com/minim/repeatable.rdf#runnable-RO-constraint> ;
  minim:testedPurpose      "Runnable" ;
  minim:testedTarget       <http://sandbox.example.org/ROs/myro> ;
  minim:fullySatisfies     <http://another.example.com/minim/repeatable.rdf#runnable-RO-checklist> ;
  minim:nominallySatisfies <http://another.example.com/minim/repeatable.rdf#runnable-RO-checklist> ;
  minim:minimallySatisfies <http://another.example.com/minim/repeatable.rdf#runnable-RO-checklist> ;
  # minim:missingMust      ... (none)
  # minim:missingShould    ... (none)
  # minim:missingMay       ... (none)
  minim:satisfied
    [ minim:tryRule <http://another.example.com/minim/repeatable.rdf#environment-software/lpod-show> ],
    [ minim:tryRule <http://another.example.com/minim/repeatable.rdf#environment-software/python> ],
    [ minim:tryRule <http://another.example.com/minim/repeatable.rdf#isPresent/workflow-instance>],
    [ minim:tryRule <http://another.example.com/minim/repeatable.rdf#isPresent/workflow-inputfiles> ;
      result:binding [ result:variable "wf" ; result:value "http://sandbox.example.org/ROs/myro/docs/mkjson.sh" ],
      result:binding [ result:variable "wi" ; result:value "GZZzLCkR38" ],
      result:binding [ result:variable "if" ; result:value "http://sandbox.example.org/ROs/myro/data/UserRequirements-gen.ods" ]
    ] .

These are mainly terms from the current MINIM vocabulary, with a few new ones added to provide more detailed diagnostic information about the evaluation result:

minim:testedConstraint is the URI of the constraint
minim:testedTarget is the URI of the target of the constraint
minim:testedPurpose is the URI of the constraint
minim:missingMust refers to a minim:hasMustRequirement rule for the model used that is not satisfied by the RO with corresponding variable bindings where appropriate
minim:missingShould refers to a minim:hasShouldRequirement rule for the model used that is not satisfied by the RO, with corresponding variable bindings where appropriate
minim:missingMay refers to a minim:hasMayRequirement rule for the model used that is not satisfied by the RO, with corresponding variable bindings where appropriate
minim:satisfied refers to a rule that is satisfied by the RO, with corresponding variable bindings where appropriate
minim:tryRule object is URI of a rule in the MINIM model.

URIs that that appear as object values above are references to elements of the MINIM model tested, and are defined in the minim description resource. It is proposed that the result graph should also include the original minim description, so that all information needed for checklist reporting is obtainable from the result returned.

@@TODO JSON format for results TBD; proposed to use something like a JSON-LD rendering of the RDF.

Cache considerations

Effective (and correct) cacheing of checklist evaluation results is desirable, as this could prevent unnecessary recalculation of results that have already been calculated. But care is needed to avoid incorrect cacheing which could result in incorrect results being returned.

The result of a checklist evaluation should be cacheable, subject to cacheability of the evaluation service document, the minim description and the RO that is evaluated. The service should return cache-control headers on the evaluation result that require re-evaluation when the criteria for cacheing the various resources used are no longer satisfied (e.g. Cache-Control: max-age values returned should no greater than the minimum of the values associated with all resources used).

Also note any Vary: headers or other Cache-control: values that are returned: if the corresponding headers are derived from client-supplied header field values, the checklist evaluation response should indicate this.

@@Any other potential gotchas here?

See also:

http://www.mnot.net/cache_docs/

Security considerations

Checklist evaluation is a read-only function, so there is no obvious risk of unintended data modification.

Checklist evaluation results may expose information about the content of a Research Object. As a general rule, the results of a checklist evaluation should be provided only to agents who themselves have permission to access the RO contents. Mechanisms for implementing such access control are not described here.

References

http://tools.ietf.org/html/rfc2616
http://tools.ietf.org/html/rfc3986
http://tools.ietf.org/html/rfc6570
http://www.wf4ever-project.org/wiki/display/docs/RO+model
http://www.wf4ever-project.org/wiki/display/docs/Research+Object+Vocabulary+Specification
http://www.wf4ever-project.org/wiki/display/docs/Research+Object+model
https://github.com/wf4ever/ro-manager/blob/develop/Minim/Minim-description.md - general description of revised Minim model
https://github.com/wf4ever/ro-manager/blob/develop/Minim/minim-revised.md - contains details of evaluation rules introduced in the revised minim model
https://github.com/wf4ever/ro-manager/blob/master/src/iaeval/Minim/minim.rdf
http://www.mnot.net/cache_docs/

See also:

RO Services and APIs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RO checklist evaluation API

Table of Contents

API function overview

API usage

Link relations

http://purl.org/ro/service/evaluate/checklist

HTTP methods

Resources and formats

Service description

Research Object

Minim description

Minim Checklist

Minim Model (check item list)

Minim Requirements

Requirement for an RO to contain a workflow primitive

Liveness testing

Software environment testing

Proposed Minim enhancements

Integrity testing

Resource annotation with checksum values

Comparing with a reference object or resource

Constraint references to RO and target resources

Checklist evaluation results

Cache considerations

Security considerations

References

Clone this wiki locally