Add Enhanced Targeting Features. #249

Merged
merged 14 commits into from
Oct 1, 2024
24 changes: 24 additions & 0 deletions CHANGELOG.md
Expand Up @@ -5,11 +5,35 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Python PEP 440 Versioning](https://www.python.org/dev/peps/pep-0440/).

## [Unreleased]

### Added
- Focus Node Filtering
- You can now pass in a list of focus nodes to the validator, and it will only validate those focus nodes.
- Note, you still need to pass in a SHACL Shapes Graph, and the shapes still need to target the focus nodes.
- This feature will filter the Shapes' targeted focus nodes to include only those that are in the list of specified focus nodes.
- SHACL Shape selection
- You can now pass in a list of SHACL Shapes to the validator, and it will use only those Shapes for validation.
- This is useful for testing new shapes in your shapes graph, or for many other procedure-driven use cases.
- Combined Shape Selection with Focus Node filtering
- The combination of the above two new features is especially powerful.
- If you give the validator a list of Shapes to use, and a list of focus nodes, the validator will operate in
a highly-targeted mode: it feeds those focus nodes directly into those given Shapes for validation.
- In this mode, the selected SHACL Shape does not need to specify any focus-targeting mechanisms of its own.

### Changed
- Don't make a clone of the DataGraph if the input data graph is ephemeral.
- An ephemeral graph is one that is loaded from a string or file location by PySHACL.
- This includes all files opened by the PySHACL CLI validator tool.
- We don't need to make a copy because PySHACL already parsed the Graph into memory itself, so there is no risk of polluting the user's graph.
- Refactorings
- Moved the `shacl_path_to_sparql_path` code to a reusable importable function.
- Moved the `sht_validate` and `dash_validate` routes to the `validator_conformance.py` module.
- Removes some complexity from the main `validate` function.
- Typing
- A whole swathe of python typing fixes and new type annotations. Thanks @ajnelson-nist

### Fixed
- Fix logic determining if a datagraph is ephemeral.


## [0.26.0] - 2024-04-11
28 changes: 25 additions & 3 deletions README.md
Expand Up @@ -109,7 +109,12 @@ optional arguments:
The maximum number of SHACL shapes "deep" that the
validator can go before reaching an "endpoint"
constraint.
-d, --debug Output additional runtime messages.
-d, --debug Output additional verbose runtime messages.
--focus [FOCUS] Optional IRIs of focus nodes from the DataGraph, the shapes will
validate only these nodes. Comma-separated list.
--shape [SHAPE] Optional IRIs of a NodeShape or PropertyShape from the SHACL
ShapesGraph, only these shapes will be used to validate the
DataGraph. Comma-separated list.
-f {human,table,turtle,xml,json-ld,nt,n3}, --format {human,table,turtle,xml,json-ld,nt,n3}
Choose an output format. Default is "human".
-df {auto,turtle,xml,json-ld,nt,n3}, --data-file-format {auto,turtle,xml,json-ld,nt,n3}
Expand Down Expand Up @@ -172,8 +177,8 @@ Some other optional keyword variables available on the `validate` function:

Return value:
* a three-component `tuple` containing:
* `conforms`: a `bool`, indicating whether or not the `data_graph` conforms to the `shacl_graph`
* `results_graph`: a `Graph` object built according to the SHACL specification's [Validation Report](https://www.w3.org/TR/shacl/#validation-report) structure
* `conforms`: a `bool`, indicating whether the `data_graph` conforms to the `shacl_graph`
* `results_graph`: a `Graph` object built according to the SHACL specification's [Validation Report](https://www.w3.org/TR/shacl/#validation-report) scheme
* `results_text`: python string representing a verbose textual representation of the [Validation Report](https://www.w3.org/TR/shacl/#validation-report)


Expand All @@ -200,6 +205,23 @@ Unlike `ValidationFailure`, these errors are not passed back as a result by the
caught in a `try ... except` block.
In the case of `ShapeLoadError` and `ConstraintLoadError`, see the `str()` string representation of the exception instance for the error message along with a link to the relevant section in the SHACL spec document.


## Focus Node Filtering, and Shape Selection
PySHACL v0.27.0 and above has two powerful new features:
- Focus Node Filtering
- You can pass in a list of focus nodes to the validator, and it will only validate those focus nodes.
- _Note_, you still need to use a SHACL ShapesGraph, and the Shapes _still need to target_ the focus nodes.
- This feature will filter the Shapes' targeted focus nodes to include only those that are in the list of specified focus nodes.
- SHACL Shape selection
- You can pass in a list of SHACL Shapes to the validator, and it will use only those Shapes for validation.
- This is useful for testing new shapes in your shapes graph, or for many other procedure-driven use cases.
- Combined Shape Selection with Focus Node filtering
- The combination of the above two new features is especially powerful.
- If you give the validator a list of Shapes to use, and a list of focus nodes, the validator will operate in
a highly-targeted mode: it feeds those focus nodes directly into those given Shapes for validation.
- In this mode, the selected SHACL Shape does not need to specify any focus-targeting mechanisms of its own.
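
The three modes described above can be sketched in plain Python. This is a minimal illustration of the described behaviour, not PySHACL's implementation: `select_focus_nodes` and its parameters are hypothetical names, and plain strings stand in for IRIs.

```python
from typing import Iterable, List, Optional


def select_focus_nodes(
    targeted: Iterable[str],
    requested: Optional[List[str]] = None,
    shape_selected: bool = False,
) -> List[str]:
    """Return the nodes a shape would validate (hypothetical helper)."""
    targeted_list = list(targeted)
    if not requested:
        # No focus list given: validate everything the shape targets.
        return targeted_list
    if shape_selected:
        # Combined mode: the requested nodes are fed directly into the
        # selected shape, so the shape's own targets are not consulted.
        return list(requested)
    # Filter-only mode: keep targeted nodes that were also requested.
    wanted = set(requested)
    return [node for node in targeted_list if node in wanted]
```

In filter-only mode a requested node that no shape targets is simply never validated, which is why the note above says the Shapes still need to target the focus nodes.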


## SPARQL Remote Graph Mode

_**PySHACL now has a built-in SPARQL Remote Graph Mode, which allows you to validate a data graph that is stored on a remote server.**_
27 changes: 26 additions & 1 deletion pyshacl/cli.py
Expand Up @@ -147,7 +147,28 @@ def str_is_true(s_var: str):
help="The maximum number of SHACL shapes \"deep\" that the validator can go before reaching an \"endpoint\" constraint.",
)
parser.add_argument(
'-d', '--debug', dest='debug', action='store_true', default=False, help='Output additional runtime messages.'
'-d',
'--debug',
dest='debug',
action='store_true',
default=False,
help='Output additional verbose runtime messages.',
)
parser.add_argument(
'--focus',
dest='focus',
action='store',
help='Optional IRIs of focus nodes from the DataGraph, the shapes will validate only these nodes. Comma-separated list.',
nargs="?",
default=None,
)
parser.add_argument(
'--shape',
dest='shape',
action='store',
help='Optional IRIs of a NodeShape or PropertyShape from the SHACL ShapesGraph, only these shapes will be used to validate the DataGraph. Comma-separated list.',
nargs="?",
default=None,
)
parser.add_argument(
'-f',
Expand Down Expand Up @@ -262,6 +283,10 @@ def main(prog: Union[str, None] = None) -> None:
validator_kwargs['advanced'] = True
if args.js:
validator_kwargs['js'] = True
if args.focus:
validator_kwargs['focus_nodes'] = [_f.strip() for _f in args.focus.split(',')]
if args.shape:
validator_kwargs['use_shapes'] = [_s.strip() for _s in args.shape.split(',')]
if args.iterate_rules:
if not args.advanced:
sys.stderr.write("Iterate-Rules option only works when you enable Advanced Mode.\n")
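
The option handling in this hunk can be tried in isolation with the standard library. The sketch below mirrors the `--focus` and `--shape` arguments and the comma splitting shown above; `split_iris` and `build_parser` are hypothetical helper names, and unlike the PR code this version also drops empty entries such as those left by a trailing comma.

```python
import argparse
from typing import List, Optional


def split_iris(raw: Optional[str]) -> List[str]:
    """Split a comma-separated option value into a list of trimmed IRIs."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the two new options (illustrative)."""
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument("--focus", dest="focus", nargs="?", default=None)
    parser.add_argument("--shape", dest="shape", nargs="?", default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# behaves the same wherever it is run.
args = build_parser().parse_args(
    ["--focus", "http://example.org/a, http://example.org/b"]
)
focus_nodes = split_iris(args.focus)
use_shapes = split_iris(args.shape)
print(focus_nodes)
print(use_shapes)
```

Because both options default to `None`, an omitted option produces an empty list here, which corresponds to the PR code not setting the keyword argument at all.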
14 changes: 5 additions & 9 deletions pyshacl/constraints/core/logical_constraints.py
Expand Up @@ -57,10 +57,10 @@ def constraint_name(cls):

def make_generic_messages(self, datagraph: GraphLike, focus_node, value_node) -> List[rdflib.Literal]:
if len(self.not_list) == 1:
m = f"Node {stringify_node(datagraph, value_node)} conforms to shape {stringify_node(self.shape.sg.graph, self.not_list[0])}"
m = f"Node {stringify_node(datagraph, value_node)} must not conform to shape {stringify_node(self.shape.sg.graph, self.not_list[0])}"
else:
nots_list = " , ".join(stringify_node(self.shape.sg.graph, n) for n in self.not_list)
m = f"Node {stringify_node(datagraph, value_node)} conforms to one or more shapes in {nots_list}"
m = f"Node {stringify_node(datagraph, value_node)} must not conform to any shapes in {nots_list}"
return [rdflib.Literal(m)]

def evaluate(self, executor: SHACLExecutor, datagraph: GraphLike, focus_value_nodes: Dict, _evaluation_path: List):
Expand Down Expand Up @@ -162,7 +162,7 @@ def make_generic_messages(self, datagraph: GraphLike, focus_node, value_node) ->
and_list = " , ".join(
stringify_node(self.shape.sg.graph, a_c) for a in self.and_list for a_c in self.shape.sg.graph.items(a)
)
m = "Node {} does not conform to all shapes in {}".format(stringify_node(datagraph, value_node), and_list)
m = "Node {} must conform to all shapes in {}".format(stringify_node(datagraph, value_node), and_list)
return [rdflib.Literal(m)]

def evaluate(
Expand Down Expand Up @@ -258,9 +258,7 @@ def make_generic_messages(self, datagraph: GraphLike, focus_node, value_node) ->
or_list = " , ".join(
stringify_node(self.shape.sg.graph, o_c) for o in self.or_list for o_c in self.shape.sg.graph.items(o)
)
m = "Node {} does not conform to one or more shapes in {}".format(
stringify_node(datagraph, value_node), or_list
)
m = "Node {} must conform to one or more shapes in {}".format(stringify_node(datagraph, value_node), or_list)
return [rdflib.Literal(m)]

def evaluate(
Expand Down Expand Up @@ -356,9 +354,7 @@ def make_generic_messages(self, datagraph: GraphLike, focus_node, value_node) ->
xone_list = " , ".join(
stringify_node(self.shape.sg.graph, a_c) for a in self.xone_nodes for a_c in self.shape.sg.graph.items(a)
)
m = "Node {} does not conform to exactly one shape in {}".format(
stringify_node(datagraph, value_node), xone_list
)
m = "Node {} must conform to exactly one shape in {}".format(stringify_node(datagraph, value_node), xone_list)
return [rdflib.Literal(m)]

def evaluate(
2 changes: 1 addition & 1 deletion pyshacl/constraints/core/shape_based_constraints.py
Expand Up @@ -167,7 +167,7 @@ def make_generic_messages(self, datagraph: GraphLike, focus_node, value_node) ->
m = "Value does not conform to Shape {}.".format(stringify_node(self.shape.sg.graph, self.node_shapes[0]))
else:
rules = "', '".join(stringify_node(self.shape.sg.graph, c) for c in self.node_shapes)
m = "Value does not conform to every Shape in ('{}').".format(rules)
m = "Value must conform to every Shape in ('{}').".format(rules)
m += " See details for more information."
return [rdflib.Literal(m)]

5 changes: 3 additions & 2 deletions pyshacl/pytypes.py
Expand Up @@ -2,10 +2,10 @@
#

from dataclasses import dataclass
from typing import Optional, Union
from typing import List, Optional, Union

from rdflib import ConjunctiveGraph, Dataset, Graph, Literal
from rdflib.term import IdentifiedNode
from rdflib.term import IdentifiedNode, URIRef

ConjunctiveLike = Union[ConjunctiveGraph, Dataset]
GraphLike = Union[ConjunctiveLike, Graph]
Expand All @@ -23,3 +23,4 @@ class SHACLExecutor:
debug: bool = False
sparql_mode: bool = False
max_validation_depth: int = 15
focus_nodes: Optional[List[URIRef]] = None
16 changes: 10 additions & 6 deletions pyshacl/rdfutil/stringify.py
Expand Up @@ -95,15 +95,19 @@ def stringify_list(node: rdflib.BNode) -> str:


def stringify_literal(graph: rdflib.Graph, node: rdflib.Literal, ns_manager: Optional[NamespaceManager] = None):
lit_val_string = str(node.value)
lex_val_string = str(node)
lit_val_string: Union[str, None] = None if node.value is None else str(node.value)
lex_string = str(node)
if ns_manager is None: # pragma: no cover
ns_manager = graph.namespace_manager
ns_manager.bind("sh", SH)
if lit_val_string != lex_val_string:
val_string = "\"{}\" = {}".format(lex_val_string, lit_val_string)
if lit_val_string is not None:
i_at = lit_val_string.find(" object at 0x")
if i_at > 0:
lit_val_string = lit_val_string[:i_at]
if lit_val_string is not None and lit_val_string != lex_string:
val_string = "\"{}\" = {}".format(lex_string, lit_val_string)
else:
val_string = "\"{}\"".format(lex_val_string)
val_string = "\"{}\"".format(lex_string)
if node.language:
lang_string = ", lang={}".format(str(node.language))
else:
Expand Down Expand Up @@ -136,7 +140,7 @@ def find_node_named_graph(dataset, node):
return g
except StopIteration:
continue
raise RuntimeError("Cannot find that node in any named graph.")
raise RuntimeError(f"Cannot find node {node} in any named graph.")
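
The `" object at 0x"` handling added to `stringify_literal` above trims the memory address from a default Python object repr so that messages stay stable between runs. A minimal standalone sketch of that idea follows; `trim_object_repr` and `Opaque` are hypothetical names used only for illustration.

```python
from typing import Optional


def trim_object_repr(value_text: Optional[str]) -> Optional[str]:
    """Cut a default object repr just before its ' object at 0x...' part."""
    if value_text is None:
        return None
    marker_at = value_text.find(" object at 0x")
    if marker_at > 0:
        # Keep only the text before the marker, e.g. '<module.ClassName'.
        return value_text[:marker_at]
    return value_text


class Opaque:
    """A class with no custom __str__, so str() includes an address."""


trimmed = trim_object_repr(str(Opaque()))
print(trimmed)
```

Strings that do not contain the marker, such as ordinary numbers or text, pass through unchanged.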


def stringify_node(
63 changes: 41 additions & 22 deletions pyshacl/shape.py
Expand Up @@ -5,9 +5,9 @@
import sys
from decimal import Decimal
from time import perf_counter
from typing import TYPE_CHECKING, Dict, List, Optional, Set, Tuple, Type, Union
from typing import TYPE_CHECKING, Dict, List, Optional, Sequence, Set, Type, Union

from rdflib import BNode, Literal, URIRef
from rdflib import BNode, IdentifiedNode, Literal, URIRef

from .consts import (
RDF_type,
Expand Down Expand Up @@ -622,10 +622,8 @@ def validate(
target_graph: GraphLike,
focus: Optional[
Union[
Tuple[Union[URIRef, BNode]],
List[Union[URIRef, BNode]],
Set[Union[URIRef, BNode]],
Union[URIRef, BNode],
Sequence[RDFNode],
RDFNode,
]
] = None,
_evaluation_path: Optional[List] = None,
Expand All @@ -634,33 +632,54 @@ def validate(
if executor.debug:
self.logger.debug(f"Skipping shape because it is deactivated: {str(self)}")
return True, []
focus_list: Sequence[RDFNode] = []
if focus is not None:
lh_shape = False
rh_shape = True
self.logger.debug(f"Running evaluation of Shape {str(self)}")
if not isinstance(focus, (tuple, list, set)):
focus = [focus]
self.logger.debug(f"Shape was passed {len(focus)} Focus Node/s to evaluate.")
if len(focus) < 1:
return True, []
# Passed in Focus node _can_ be a Literal, happens in PropertyShapes
# when the path resolves to a literal or set of Literals
if isinstance(focus, (IdentifiedNode, Literal)):
focus_list = [focus]
else:
focus_list = list(focus)
self.logger.debug(f"Shape was passed {len(focus_list)} Focus Node/s to evaluate.")
else:
lh_shape = True
rh_shape = False
self.logger.debug(f"Checking if Shape {str(self)} defines its own targets.")
self.logger.debug("Identifying targets to find focus nodes.")
if executor.sparql_mode:
focus = self.focus_nodes_sparql(target_graph, debug=executor.debug)
focus_set = self.focus_nodes_sparql(target_graph, debug=executor.debug)
else:
focus = self.focus_nodes(target_graph, debug=executor.debug)
self.logger.debug(f"Found {len(focus)} Focus Nodes to evaluate.")
if len(focus) < 1:
# It's possible for shapes to have _no_ focus nodes
# (they are called in other ways)
if executor.debug:
self.logger.debug(f"Skipping shape {str(self)} because it found no focus nodes.")
focus_set = self.focus_nodes(target_graph, debug=executor.debug)
focus_list = list(focus_set)
self.logger.debug(f"Found {len(focus_list)} Focus Nodes to evaluate.")

if len(focus_list) < 1:
# It's possible for shapes to have _no_ focus nodes
# (they are called in other ways)
if executor.debug:
self.logger.debug(f"Skipping shape {str(self)} because it found no focus nodes.")
return True, []
else:
self.logger.debug(f"Running evaluation of Shape {str(self)}")

if executor.focus_nodes is not None and len(executor.focus_nodes) > 0:
filtered_focus_nodes: List[URIRef] = []
for _fo in focus_list: # type: RDFNode
if isinstance(_fo, URIRef) and _fo in executor.focus_nodes:
filtered_focus_nodes.append(_fo)
len_orig_focus = len(focus_list)
len_filtered_focus = len(filtered_focus_nodes)
if len_filtered_focus < 1:
self.logger.debug(f"Skipping shape {str(self)} because specified focus nodes are not targeted.")
return True, []
else:
self.logger.debug(f"Running evaluation of Shape {str(self)}")
elif len_filtered_focus != len_orig_focus:
self.logger.debug(
f"Filtered focus nodes based on focus_nodes option. Only {len_filtered_focus} of {len_orig_focus} focus nodes remain."
)
focus_list = filtered_focus_nodes
t1 = ct1 = 0.0 # prevent warnings about use-before-assign
collect_stats = bool(executor.debug)

Expand Down Expand Up @@ -703,7 +722,7 @@ def validate(
parameters = (p for p, v in self.sg.predicate_objects(self.node) if p in search_parameters)
reports = []
focus_value_nodes = self.value_nodes(
target_graph, focus, sparql_mode=executor.sparql_mode, debug=executor.debug
target_graph, focus_list, sparql_mode=executor.sparql_mode, debug=executor.debug
)
filter_reports: bool = False
allow_conform: bool = False
Expand Down