-
Notifications
You must be signed in to change notification settings - Fork 2
RO checklist evaluation API
|
The checklist evaluation API is intended to provide access to the minim-based evaluation of Research Objects, used to test for completeness, executability, repeatability and other desired features. The functionality provided is based on the ro-manager
evaluate checklist option:
ro evaluate checklist [ -d <dir> ] [ -a | -l <level> ] <minim> <purpose> [ <target> ]
where:
-
<dir>
is the directory containing the RO to be evaluated -
<level>
indicates the level of information detail to be returned -
<minim>
is a URI reference for a minimum information model resource from which the checklist definition is obtained -
<target>
is a target resource with respect to which the evaluation is performed; the default<target>
is the RO itself, but a component within the RO may be selected. -
<purpose>
is a keyword indicating the purpose for which the RO or<target>
is to be evaluated.
ro evaluate checklist -d /workspace/myro -l all /workspace/minim.rdf "creation" myro/wfoutput.dat
might evaluate the RO at /workspace/myro using the minim model in file /workspace/minim.rdf
.
The Web API is intended to provide remote access to the above functionality using simple HTTP requests.
Research Objects and other data are provided as web resources, and indicated in the API using their URIs.
Suppose we have:
- A checklist evaluation service accessible at http://service.example.org/evaluate/checklist
- An RO with URI http://sandbox.example.org/ROs/myro>
The checklist evaluation would then be invoked in a sequence of two HTTP operations:
- Client retrieves service document:
C: GET /evaluate/checklist HTTP/1.1 C: Host: service.example.org C: Accept: application/rdf+xml S: HTTP/1.1 200 OK S: Content-Type: application/rdf+xml S: S: <?xml version="1.0"?> S: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" S: xmlns:roe="http://purl.org/ro/service/evaluate/"> S: <rdf:Description rdf:about=""> S: <roe:checklist>/evaluate/checklist{?RO,minim,target,purpose}</roe:checklist> S: </rdf:Description> S: </rdf:RDF>
- Client parses the service document, extracts the URI template for the checklist evaluation service and assembles URI for the desired evaluation result (cf. RFC6570), and issues a second HTTP GET request:
The result from the second request is the checklist evaluation result. The URI shown above has been split over several lines for readability - the actual HTTP request must present it without whitespace. The optional target URI parameter has been omitted in this example on the assumption that the target is the Research Object itself.C: GET /evaluate/checklist?RO=http%3A%2F%2Fsandbox.example.org%2FROs%2Fmyro &minim=http%3A%2F%2Fanother.example.com%2Fminim%2Frepeatable.rdf &purpose=repeatable HTTP/1.1 C: Host: service.example.org C: Accept: application/rdf+xml S: HTTP/1.1 200 OK S: Content-Type: application/rdf+xml S: S: <?xml version="1.0"?> S: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" S: xmlns:...="..." S: > S: <rdf:Description rdf:about="..."> S: (Result of checklist evaluation) S: </rdf:Description> S: </rdf:RDF>
See also:
- http://tools.ietf.org/html/rfc2616
- http://tools.ietf.org/html/rfc3986
- http://tools.ietf.org/html/rfc6570
- https://github.com/wf4ever/ro-manager/blob/master/src/iaeval/Minim/minim.rdf
This relation is generally used used in the service description document
It indicates a relation between a service description and a URI template for RO evaluation results using the described service. The URI template is is used to construct a service result URI by:
- applying the URI template expansion procedures with caller-supplied RO URI, minim URI, purpose and target URIs, and
- resolving the resulting URI-reference to an absolute URI using normal URI resolution rules (e.g. typically, using the service document URI as a base URI)
The service description is obtained in response to an HTTP GET to a checklist evaluation service URI.
The checklist evaluation service responds to an HTTP GET with the results of a checklist evaluation, using the URI defined by expanding the template provided by the service description.
The checklist evaluation service description is an RDF file that contains URI templates for accessing RO related services, including checklist evaluation. The RDF syntax used may be content negotiated. In the absence of content negotiation, RDF/XML should be returned.
Example:
<?xml version="1.0"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:roe="http://purl.org/ro/service/evaluate/" > <rdf:Description rdf:about=""> <roe:checklist>/evaluate/checklist{?RO,minim,target,purpose}</roe:checklist> </rdf:Description> </rdf:RDF>
See:
- http://www.wf4ever-project.org/wiki/display/docs/RO+model
- http://www.wf4ever-project.org/wiki/display/docs/Research+Object+Vocabulary+Specification
- http://www.wf4ever-project.org/wiki/display/docs/Research+Object+model
NOTE: this section describes the original Minim model structure, which is still fully supported by the software. However, the capabilities described here have been refactored and extended by a revised minim, model, which is described more fully in https://github.com/wf4ever/ro-manager/blob/develop/Minim/Minim-description.md and https://github.com/wf4ever/ro-manager/blob/develop/Minim/minim-revised.md
The MINIM description contains 3 levels of description:
-
minim:Checklist
associates a target and purpose (e.g. runnable RO) to aminim:Model
-
minim:Model
encodes the checklist (list of requirements) to be evaluated.
-
minim:Requirement
is a single requirement (checklist item), which is associated with a rule for evaluating whether or not it is satisfied or not satisfied. Each rule makes reference to a "checklist primitive" function. Additional capabilities can be added (in due course) by expanding the set of available checklist primitives (e.g. see http://www.wf4ever-project.org/wiki/display/docs/RO+decay+detection+using+checklists)
See:
- http://purl.org/minim/minim
- https://github.com/wf4ever/ro-catalogue/blob/master/v0.1/simple-requirements/simple-requirements-minim.rdf
- http://www.wf4ever-project.org/wiki/display/docs/RO+Examples
The "checklist" (previously called "constraint") describes a mapping from target and purpose values to a particular minim:Model to be used as the basis of an evaluation. Relative URI references are resolved relative to the location of the Minim resource. In this example, the Minim resource is taken from the root directory of an RO, so "." refers to the RO itself.
<minim:Checklist rdf:about="#runnable_RO"> <minim:forPurpose>Runnable</minim:forPurpose> <minim:onResource rdf:resource="." /> <minim:toModel rdf:resource="#runnable_RO_model" /> <rdfs:comment> Constraint to be satisfied if the RO is to be runnable </rdfs:comment> </minim:Constraint>
A minim model represents a list of check items to be evaluated. It enumerates of a number of requirements which may be declared at levels of MUST, SHOULD or MAY be satisfied for the model as a whole to be considered satisfied. This follows a structure for minimum information models proposed by Matthew Gamble.
<minim:Model rdf:about="#runnable_RO_model"> <rdfs:label>Runnable RO</rdfs:label> <rdfs:comment> This model defines information that must be available for the requirements Research Object to be runnable. </rdfs:comment> <minim:hasMustRequirement rdf:resource="#environment-software/lpod-show" /> <minim:hasMustRequirement rdf:resource="#environment-software/python" /> <minim:hasMustRequirement rdf:resource="#isPresent/workflow-instance" /> <minim:hasMustRequirement rdf:resource="#isPresent/workflow-inputfiles" /> </minim:Model>
Minim Requirements are evaluated using rules, which in turn invoke checklist evaluation primitives with appropriate parameters. This structure allows a relatively wide range of checklist items to be evaluated based on a relatively small number of primitive tests. The examples show the various primitives.
The minim:ContentMatchRequirementRule
is driven by a SPARQL query probe which is evaluated over a merge of all the RO annotations (including the RO manifest). In this case, it simply tests that the query can be satisfied. The minim:showpass
and minim:showfail
properties indicate strings that are used for reporting the status of the checklist evaluation.
<!-- Workflow instance must be present --> <minim:Requirement rdf:about="#isPresent/workflow-instance"> <minim:isDerivedBy> <minim:ContentMatchRequirementRule> <minim:exists> ?wf rdf:type wfdesc:Workflow . </minim:exists> <minim:showpass>Workflow instance or template found</minim:showpass> <minim:showfail>No workflow instance or template found</minim:showfail> <minim:derives rdf:resource="#isPresent/workflow-instance" /> </minim:ContentMatchRequirementRule> </minim:isDerivedBy> </minim:Requirement>
To test for liveness of a resource, the evaluator will need to attempt to access the resource. If it is a local file, a file existence check should suffice. If it is a web resource, then a success response to an HTTP HEAD request is expected.
This varies from the simple aggregation test in that the minim::aggregatesTemplate property is replaced by a minim:isLiveTemplate property.<!-- Workflow descriptions must be accessible (live) --> <minim:Requirement rdf:about="#workflows_accessible"> <minim:isDerivedBy> <minim:ContentMatchRequirementRule> <minim:forall> ?wf rdf:type wfdesc:Workflow . </minim:forall> <minim:isLiveTemplate> {+wf} </minim:isLiveTemplate> <minim:showpass>All declared workflow descriptions are accessible</minim:showpass> <minim:showfail>Workflow description %(wf)s is not accessible</minim:showfail> <minim:derives rdf:resource="#workflows_accessible" /> </minim:ContentMatchRequirementRule> </minim:isDerivedBy> </minim:Requirement>
A minim:SoftwareEnvironmentRule
tests to see if a particular piece of software is available by issuing a command and checking the response against a supplied regular expression. (This test is primarily intended for local use within RO-manager, and may be of limited use on the evaluation service as the command is issued on the host running the evaluation service, not on the host requesting the service.)
<!-- Environment needs python --> <minim:Requirement rdf:about="#environment-software/python"> <minim:isDerivedBy> <minim:SoftwareEnvironmentRule> <minim:command>python --version</minim:command> <minim:response>Python 2.7</minim:response> <minim:show>Installed python version %(response)s</minim:show> <minim:derives rdf:resource="#environment-software/python" /> </minim:SoftwareEnvironmentRule> </minim:isDerivedBy> </minim:Requirement>
In viewing the proposals for liveness and integrity testing, compare with the current framework used for testing that an RO aggregates a specified resource; the following rule example tests that all workflow inputs are aggregated by the evaluated RO:
<minim:ContentMatchRequirementRule> <minim:forall> ?wf rdf:type wfdesc:Workflow ; wfdesc:hasInput [ wfdesc:hasArtifact ?if ] . </minim:forall> <minim:aggregatesTemplate>{+if}</minim:aggregatesTemplate> : </minim:ContentMatchRequirementRule>
In this context, "integrity testing" means checking that a resource content matches some expected (e.g. previously calculated) value.
The integrity testing builds upon the liveness testing, by adding a reference to a resource with which the tested resource may be compared, by way of the minim:contentMatchTemplate
property. Note that this property says nothing about using a hash or checksum value; the plan is that it's just the URI of a resource with which to compare. But that URI may be an ni: URI (cf. http://tools.ietf.org/html/draft-farrell-decade-ni-06), in which case, instead of trying to dereference the URI and do a byte-by-byte comparison, instead it calculates the appropriate hash function over the resource being tested and compares that with the value encoded by the ni: URI. To check for a known constant value, a data: URI (http://tools.ietf.org/html/rfc2397) can be used.
This structure assumes that appropriate annotations have been created to allow a SPARQL query to discover the URI of an appropriate resource with which to match the content. If may turn out that further queries against a separate RO are needed, in which case the above structure may need to be expanded. In the example below, the annotation is a roeval:contentMatch
property directly about the resource being integrity-checked.
<minim:ContentMatchRequirementRule> <minim:forall> ?wf rdf:type wfdesc:Workflow ; wfdesc:hasInput [ wfdesc:hasArtifact ?if ] . ?if roeval:contentMatch ?match . </minim:forall> <minim:accessTemplate>{+if}</minim:accessTemplate> <minim:contentMatchTemplate>{+match}</minim:contentMatchTemplate> : </minim:ContentMatchRequirementRule>
Integrity checking with a single RO may be achieved by adding checksum annotations to the resources that subsequently will be tested. A new RO-manager command could be provided to facilitate creation of such annotations; e.g.
ro content-hash _resource_
which would calculate and output an appropriate ni: URI, and could be used as part of an RO command thus:
ro annotate _resource1_ _attribute-name_ `ro content-hash _resource2_`
This flexible approach would allow a checksum to be calculated from one RO and applied as a reference for checking to another. But this combination of commands might be difficult to do in some systems (e.g. Windows), so a less flexible form of command might be offered:
ro annotate-hash _resource1_ roeval:contentMatch _resource2_
where resource1
would be annotated with the content hash of resource2
, using the indicated attribute identifier.
A proposed requirement is that contents of one RO can be compared with the content of another calibration (or reference) RO.
This might be achieved by adding an annotation to an RO referring to its corresponding calibration RO. It is expected that this information might be provided by RO evolution information, but in the absence of such annotations a new, specialized annotation might be needed for this.
Operationally, the comparison between ROs might be invoked in any of the following ways:
- One RO contains links to its corresponding calibration RO. This would appear to be quite natural if one RO is derived from another: it's evolution trace would contain a reference to the RO from which it is derived, but would be useful only when that is the RO with which one would want to perform the comparison. This would limit the scope of ROs that could be compared to those that are explicitly related to each other. Thus, a researcher finding two independent ROs that claim to calculate a common value would not be able to simply compare those without first (copying and) modifying at least one of them.
- References to the calibration RO are carried in the Minim description. This would limit the range of ROs with which such a Minim description might be used, and would make it harder or impossible to create truly generic checklists that compare different ROs.
- A new RO might contain references to two other ROs to be compared.
- The URIs of the ROs to be compared could be provided in the service invocation.
Currently, the mechanisms used for defining minim:Constraint
values contain references to the RO, which makes it difficult to use the same Minim resource with multiple ROs.
Current structure:
Outline of change proposal:<rdf:Description rdf:about="."> <minim:hasConstraint> <minim:Constraint rdf:about="#runnable_RO"> <minim:forPurpose>Runnable</minim:forPurpose> <minim:onResource rdf:resource="." /> <minim:toModel rdf:resource="#runnable_RO_model" /> <rdfs:comment> Constraint to be satisfied if the RO is to be runnable </rdfs:comment> </minim:Constraint> </minim:hasConstraint> </rdf:Description>
- Do not use the minim:hasConstraint property when selecting potential constraints; instead, query the Minim description for all resources of type minim:Constraint.
- As an alternative to minim:onResource for selecting the target resource, also recognize a minim:onResourceTemplate parameter whose value is a string containing a URI template. The URI template is expanded with variables RoUri being the URI of the RO being evaluated, and TargetUri being the URI reference of the selected target (which defaults to the RO URI if no specific target is selected). The URI reference resulting from the expansion may be resolved to a full URI using the RO URI as a base URI.
A further development might allow an additional SPARQL query to be used, which is executed against a merge of the RO annotations resulting in additional variable bindings that can be referenced by the target resource template. Such might be used, for example, to locate a workflow template within the RO that uses a particular global database, which in turn might be used to test that the corresponding provenance record indicates use of an up-to-date version of that database.<minim:Constraint rdf:about="#runnable_RO"> <minim:forPurpose>Runnable</minim:forPurpose> <minim:onResourceTemplate>{+RoUri}<minim:onResourceTemplate> <minim:toModel rdf:resource="#runnable_RO_model" /> <rdfs:comment> Constraint to be satisfied if the RO is to be runnable </rdfs:comment> </minim:Constraint>
Evaluation results are an RDF or JSON file containing a fully detailed description of the results of a checklist evaluation. RDF results are reported using the Minim vocabulary and requirement URIs, together with additional terms to include diagnostic and summary information.
Example of RDF (Turtle syntax here):
These are mainly terms from the current MINIM vocabulary, with a few new ones added to provide more detailed diagnostic information about the evaluation result:PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX spin: <http://spinrdf.org/sp#> PREFIX result: <http://www.w3.org/2001/sw/DataAccess/tests/result-set#> PREFIX minim: <http://purl.org/minim/minim#> <http://sandbox.example.org/ROs/myro> minim:testedConstraint <http://another.example.com/minim/repeatable.rdf#runnable-RO-constraint> ; minim:testedPurpose "Runnable" ; minim:testedTarget <http://sandbox.example.org/ROs/myro> ; minim:fullySatisfies <http://another.example.com/minim/repeatable.rdf#runnable-RO-checklist> ; minim:nominallySatisfies <http://another.example.com/minim/repeatable.rdf#runnable-RO-checklist> ; minim:minimallySatisfies <http://another.example.com/minim/repeatable.rdf#runnable-RO-checklist> ; # minim:missingMust ... (none) # minim:missingShould ... (none) # minim:missingMay ... (none) minim:satisfied [ minim:tryRule <http://another.example.com/minim/repeatable.rdf#environment-software/lpod-show> ], [ minim:tryRule <http://another.example.com/minim/repeatable.rdf#environment-software/python> ], [ minim:tryRule <http://another.example.com/minim/repeatable.rdf#isPresent/workflow-instance>], [ minim:tryRule <http://another.example.com/minim/repeatable.rdf#isPresent/workflow-inputfiles> ; result:binding [ result:variable "wf" ; result:value "http://sandbox.example.org/ROs/myro/docs/mkjson.sh" ], result:binding [ result:variable "wi" ; result:value "GZZzLCkR38" ], result:binding [ result:variable "if" ; result:value "http://sandbox.example.org/ROs/myro/data/UserRequirements-gen.ods" ] ] .
-
minim:testedConstraint
is the URI of the constraint -
minim:testedTarget
is the URI of the target of the constraint -
minim:testedPurpose
is the URI of the constraint -
minim:missingMust
refers to aminim:hasMustRequirement
rule for the model used that is not satisfied by the RO with corresponding variable bindings where appropriate -
minim:missingShould
refers to aminim:hasShouldRequirement
rule for the model used that is not satisfied by the RO, with corresponding variable bindings where appropriate -
minim:missingMay
refers to aminim:hasMayRequirement
rule for the model used that is not satisfied by the RO, with corresponding variable bindings where appropriate -
minim:satisfied
refers to a rule that is satisfied by the RO, with corresponding variable bindings where appropriate -
minim:tryRule
object is URI of a rule in the MINIM model.
@@TODO JSON format for results TBD; proposed to use something like a JSON-LD rendering of the RDF.
Effective (and correct) cacheing of checklist evaluation results is desirable, as this could prevent unnecessary recalculation of results that have already been calculated. But care is needed to avoid incorrect cacheing which could result in incorrect results being returned.
The result of a checklist evaluation should be cacheable, subject to cacheability of the evaluation service document, the minim description and the RO that is evaluated. The service should return cache-control headers on the evaluation result that require re-evaluation when the criteria for cacheing the various resources used are no longer satisfied (e.g. Cache-Control: max-age values returned should no greater than the minimum of the values associated with all resources used).
Also note any Vary:
headers or other Cache-control:
values that are returned: if the corresponding headers are derived from client-supplied header field values, the checklist evaluation response should indicate this.
@@Any other potential gotchas here?
See also:
Checklist evaluation is a read-only function, so there is no obvious risk of unintended data modification.
Checklist evaluation results may expose information about the content of a Research Object. As a general rule, the results of a checklist evaluation should be provided only to agents who themselves have permission to access the RO contents. Mechanisms for implementing such access control are not described here.
See also: [@@ref]
- http://tools.ietf.org/html/rfc2616
- http://tools.ietf.org/html/rfc3986
- http://tools.ietf.org/html/rfc6570
- http://www.wf4ever-project.org/wiki/display/docs/RO+model
- http://www.wf4ever-project.org/wiki/display/docs/Research+Object+Vocabulary+Specification
- http://www.wf4ever-project.org/wiki/display/docs/Research+Object+model
- https://github.com/wf4ever/ro-manager/blob/develop/Minim/Minim-description.md - general description of revised Minim model
- https://github.com/wf4ever/ro-manager/blob/develop/Minim/minim-revised.md - contains details of evaluation rules introduced in the revised minim model
- https://github.com/wf4ever/ro-manager/blob/master/src/iaeval/Minim/minim.rdf
- http://www.mnot.net/cache_docs/
- http://www.wf4ever-project.org/wiki/display/docs/Integrity+and+Authenticity+component
- http://www.wf4ever-project.org/wiki/display/docs/RO+SRS+REST+interface+-+ver.+5
- http://www.wf4ever-project.org/wiki/display/docs/Research+Objects+Digital+Library+%28including+the+ROSRS%29
- http://www.wf4ever-project.org/wiki/display/docs/RO+dereferencing
- [...]