The package is further developed at https://github.com/rue-a/provo.
The package creates provenance graphs according to PROV-O, i.e. as RDF graphs. It is built on rdflib (https://rdflib.readthedocs.io/en/stable/rdf_terms.html).
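To make "as RDF graphs" concrete, here is a minimal stdlib-only sketch of what a few PROV-O statements look like as RDF triples written in N-Triples syntax. The prov namespace IRI is the one defined by PROV-O. The base namespace is borrowed from the basic example further down. The helpers `iri` and `to_ntriples` are hypothetical and not part of provo, which builds such triples through rdflib.

```python
# Minimal sketch: PROV-O statements as RDF triples, printed as N-Triples.
# Helper names are hypothetical; provo itself uses rdflib for this.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://provBasicExample.org/"  # namespace from the basic example


def iri(namespace: str, local: str) -> str:
    """Return a full IRI in N-Triples angle-bracket notation."""
    return f"<{namespace}{local}>"


def to_ntriples(triples) -> str:
    """Render (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (iri(BASE, "activity"), f"<{RDF_TYPE}>", iri(PROV, "Activity")),
    (iri(BASE, "inputEntity"), f"<{RDF_TYPE}>", iri(PROV, "Entity")),
    (iri(BASE, "activity"), iri(PROV, "used"), iri(BASE, "inputEntity")),
]

if __name__ == "__main__":
    print(to_ntriples(triples))
```

Every node and every property is identified by an IRI. This is why each provo object needs an identifier that can be appended to the graph's namespace.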
- Download and unzip the package.
- Open a shell and cd into the unzipped package.
- Run "pip install -e ." in the folder that contains setup.py.
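After installation, you can check that Python finds the package and its rdflib dependency with a small import probe. This is a minimal sketch that uses only `importlib.util.find_spec` from the standard library. The module name `provo` is taken from the import statement in the examples.

```python
import importlib.util


def check_installed(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_installed(["provo", "rdflib"]).items():
        print(f"{name}: {'found' if ok else 'missing'}")
```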
The package implements the PROV-O (https://www.w3.org/TR/prov-o/#starting-points-figure, graphic below) classes Entity, Activity and Agent as Python classes. Their methods establish the basic relations (arrows) between instances of those classes. In PROV-O, these relations are called properties. The package also contains utilities that ease the construction of provenance graphs.
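The class design can be illustrated with a small mock in which every relation method adds exactly one triple to a shared graph. The following sketch borrows only the class names, the constructor shape `(graph, id)` and the method names `used`, `wasAssociatedWith`, `wasGeneratedBy`, `wasAttributedTo` and `wasDerivedFrom` from this README. `MiniGraph`, `_Node` and the prefixed-name strings are hypothetical stand-ins, and provo is not necessarily implemented this way internally.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


class MiniGraph:
    """Hypothetical stand-in for ProvGraph: a namespace plus a list of triples."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self.triples: List[Triple] = []


class _Node:
    """Shared base: mints an IRI from the namespace and records the rdf:type."""

    prov_type = ""

    # `id` mirrors the keyword used in the README examples (it shadows the builtin)
    def __init__(self, graph: MiniGraph, id: str) -> None:
        self.graph = graph
        self.iri = graph.namespace + id
        self._add("rdf:type", self.prov_type)

    def _add(self, prop: str, obj: str) -> None:
        self.graph.triples.append((self.iri, prop, obj))


class Agent(_Node):
    prov_type = "prov:Agent"


class Entity(_Node):
    prov_type = "prov:Entity"

    def wasGeneratedBy(self, activity: "Activity") -> None:
        self._add("prov:wasGeneratedBy", activity.iri)

    def wasAttributedTo(self, agent: Agent) -> None:
        self._add("prov:wasAttributedTo", agent.iri)

    def wasDerivedFrom(self, entity: "Entity") -> None:
        self._add("prov:wasDerivedFrom", entity.iri)


class Activity(_Node):
    prov_type = "prov:Activity"

    def used(self, entity: Entity) -> None:
        self._add("prov:used", entity.iri)

    def wasAssociatedWith(self, agent: Agent) -> None:
        self._add("prov:wasAssociatedWith", agent.iri)
```

Building the basic unit with this mock yields four `rdf:type` triples and five relation triples.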
The package assumes that the basic unit of a provenance graph is an Activity that uses a number of input Entities, generates a number of output Entities, and is possibly controlled by an Agent. The following graphic shows this unit together with the resulting properties:
The script that generates this basic unit is provided in the examples folder as provBasicExample.py:
from provo import ProvGraph, Activity, Entity, Agent
# set up the graph object (a subclass of an rdflib Graph)
g = ProvGraph(namespace='https://provBasicExample.org/')
# first, create all required objects
inputEntity = Entity(graph = g, id = 'inputEntity')
# any PROV-O object can have a label and a description
inputEntity.label('Input Entity')
inputEntity.description('This is the first entity')
outputEntity = Entity(g, 'outputEntity')
activity = Activity(g, 'activity')
agent = Agent(g, 'agent')
# now we build the relations
activity.used(inputEntity)
activity.wasAssociatedWith(agent)
outputEntity.wasGeneratedBy(activity)
outputEntity.wasAttributedTo(agent)
outputEntity.wasDerivedFrom(inputEntity)
# finally serialize the graph
path = './examples/out/provBasicExample_n3.rdf'
g.serialize(format = 'n3', destination = path)
Every object of type Agent, Activity or Entity needs a unique identifier. The identifier must be an alphanumeric string without spaces. Instantiation also requires, as input, the graph to which the object is added:
newEntity = Entity(graph = provGraph, id = 'newEntity')
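The two identifier rules (alphanumeric, unique) can be checked before objects are created. The following is a minimal sketch of a hypothetical `IdRegistry` helper, not part of provo. It reads "alphanumeric" strictly as ASCII letters and digits. It also assumes that an object's IRI is the graph namespace followed by the identifier, which matches the namespace style used in the basic example.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
# str.isalnum() would also accept letters such as "ü".
_ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


class IdRegistry:
    """Hypothetical helper: enforces unique alphanumeric ids and mints IRIs."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._seen = set()

    def register(self, identifier: str) -> str:
        """Validate an identifier and return namespace + identifier."""
        if not _ID_PATTERN.fullmatch(identifier):
            raise ValueError(f"identifier must be alphanumeric without spaces: {identifier!r}")
        if identifier in self._seen:
            raise ValueError(f"identifier already used: {identifier!r}")
        self._seen.add(identifier)
        return self.namespace + identifier
```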
The folder provo/examples contains an example script that serializes an ArcGIS workflow description into a PROV-O provenance graph. The resulting graph is stored in the subfolder out. The ESRI tutorial with the example workflow is available at http://webhelp.esri.com/arcgisdesktop/9.3/pdf/Geoprocessing_in_ArcGIS_Tutorial.pdf, p. 36ff. The following figure shows the workflow (the wasDerivedFrom properties between Entities are omitted):
Assigning the required properties (arrows) manually is time-consuming and error-prone. The class ProvGraph therefore provides a utility method called link that simplifies this task. The following example describes the "Intersect" Activity, which is shown in the figure above:
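from provo import ProvGraph, Activity, Entity

# hypothetical namespace for this standalone excerpt
g = ProvGraph(namespace='https://example.org/arcgisWorkflow/')

elev = Entity(g, 'ElevationsLessThan250m')
slopes = Entity(g, 'SlopesLessThan40Percent')
climate = Entity(g, 'ClimateZones')
# output of an earlier workflow step; hypothetical identifier so the excerpt runs on its own
suitMinusRoads = Entity(g, 'SuitMinusRoads')
intersect = Activity(g, 'Intersect')
intersectOut = Entity(g, 'intersectOutput')

g.link(
    inputs = [elev, slopes, climate, suitMinusRoads],
    process = intersect,
    outputs = intersectOut,
    agents = None
)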
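The following is a hypothetical, stdlib-only sketch of what a method like link could do for one Activity. It normalizes each argument (None, a single object, or a list) to a list and then emits the basic-unit properties. Plain strings stand in for provo objects. The assumption that every output is derived from every input is made for illustration and may not match what provo's link actually adds.

```python
from typing import Iterable, List, Tuple, Union

Triple = Tuple[str, str, str]
OneOrMany = Union[None, str, Iterable[str]]


def as_list(value: OneOrMany) -> List[str]:
    """Normalize None, a single object, or an iterable of objects to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def link(inputs: OneOrMany, process: str, outputs: OneOrMany,
         agents: OneOrMany = None) -> List[Triple]:
    """Hypothetical re-implementation: basic-unit properties for one activity."""
    ins, outs, ags = as_list(inputs), as_list(outputs), as_list(agents)
    triples = [(process, "prov:used", e) for e in ins]
    triples += [(process, "prov:wasAssociatedWith", a) for a in ags]
    for out in outs:
        triples.append((out, "prov:wasGeneratedBy", process))
        # assumption: each output is derived from each input
        triples += [(out, "prov:wasDerivedFrom", e) for e in ins]
        triples += [(out, "prov:wasAttributedTo", a) for a in ags]
    return triples
```

For the Intersect step (four inputs, one output, no agent), the sketch yields 4 used, 1 wasGeneratedBy and 4 wasDerivedFrom triples. Written by hand, these would be nine separate method calls.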
Each parameter of the link method accepts either a single PROV-O object of the appropriate type or a list of such objects. Every time link is called, it checks whether the new relations imply a wasInformedBy property and, if so, adds it to the graph. This automatic inference can, and for large graphs should, be deactivated with g.link(..., inference = False).
- rdflib: https://rdflib.readthedocs.io/en/stable/, BSD License
License: MIT
Author: Arne Rümmler