
InspectorXSLT

Wendell Piez edited this page Feb 20, 2024 · 14 revisions

Inspector XSLT - a Metaschema validation engine for XML

See the Discussion topic on this initiative.

PRs are also welcome as a form of feedback, including edits to this page.

Rationale, purpose and behavior

The project has both concrete goals (enabling capabilities) and abstract goals (demonstrating principles, testing hypotheses).

Applying an XSLT to determine instance validity

"Validity" is usually determined for XML by a schema language such as XSD, RELAX NG (RNG) or DTD, sometimes in combination with Schematron (a query-based assertion language) or other XPath-based checks.

However, the same set of properties or logical assertions that constitutes validity can be implemented by a single XSLT transformation, if the input rule set is limited to rules that are easily enforced with XSLT/XPath. Metaschema provides for exactly such a limitation, radically simplifying the requirements for full-stack 'schema emulation' in a simple transformation.*

This amounts to providing functionality equivalent to schema-based validation. Because the rules impose the same system of regularities, both approaches share the same functional requirement: a mapping from source documents ("valid" and "invalid") to results that are known to be valid or invalid by testing for intrinsic properties, not by say-so. The outputs are not always the same, but they report the same variances from the expected or defined state.
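To make the idea concrete, here is a minimal Python sketch of validation as a set of rule checks over an instance. It is not XSLT and not how InspectorXSLT is implemented. Each rule tests an intrinsic property of the document and reports any variance it finds, and a document counts as valid when no variances are reported. The element names, the `status` flag and both rules are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, loosely in the spirit of Metaschema definitions:
# each rule inspects the document and yields messages describing variances.
def require_title(root):
    # Every <item> must have exactly one <title> child.
    for item in root.iter("item"):
        count = len(item.findall("title"))
        if count != 1:
            yield f"item {item.get('id')!r}: expected 1 title, found {count}"

def allowed_status(root):
    # The status flag (an XML attribute), when present, must take an allowed value.
    allowed = {"draft", "final"}
    for item in root.iter("item"):
        status = item.get("status")
        if status is not None and status not in allowed:
            yield f"item {item.get('id')!r}: status {status!r} not in {sorted(allowed)}"

RULES = [require_title, allowed_status]

def inspect(xml_text):
    """Return the list of variances found; an empty list means valid."""
    root = ET.fromstring(xml_text)
    return [message for rule in RULES for message in rule(root)]

doc = """<list>
  <item id="a" status="draft"><title>One</title></item>
  <item id="b" status="pending"/>
</list>"""

for message in inspect(doc):
    print(message)
# item 'b': expected 1 title, found 0
# item 'b': status 'pending' not in ['draft', 'final']
```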

The equivalence can be demonstrated by comparing the results of running an InspectorXSLT over a document or set of documents with the results of running the equivalent schema validation. Across the same set of tested instances, both should report the same documents as valid or invalid, for equivalent reasons.
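The comparison itself can be sketched in Python as follows. The sketch assumes that each validator's results have already been collected as a mapping from document name to a valid/invalid verdict. The document names and verdicts are made up for illustration. A fuller comparison would also check that the reasons reported are equivalent, not just the verdicts.

```python
def compare_verdicts(inspector, schema):
    """Compare two {document: is_valid} mappings.

    Returns (agreed, disagreed, missing): documents where both validators
    give the same verdict, documents where they differ, and documents
    reported by only one of them.
    """
    agreed, disagreed = [], []
    for doc in sorted(inspector.keys() & schema.keys()):
        if inspector[doc] == schema[doc]:
            agreed.append(doc)
        else:
            disagreed.append(doc)
    missing = sorted(inspector.keys() ^ schema.keys())
    return agreed, disagreed, missing

# Hypothetical results from an InspectorXSLT run and a schema-validation run.
inspector_results = {"a.xml": True, "b.xml": False, "c.xml": True}
schema_results = {"a.xml": True, "b.xml": False, "c.xml": False, "d.xml": True}

agreed, disagreed, missing = compare_verdicts(inspector_results, schema_results)
print("agree:", agreed)           # agree: ['a.xml', 'b.xml']
print("disagree:", disagreed)     # disagree: ['c.xml']
print("only one side:", missing)  # only one side: ['d.xml']
```

A disagreement such as `c.xml` is the interesting case: it points to a bug in one of the implementations or to a variance in the data that deserves a closer look.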

* For markup language designers: in v1.0, Metaschema assemblies do not support the full range of XML element content model constructs for grouping, sequencing and cardinality. Instead, Metaschema limits itself to a subset that sits cleanly within the constraints of object/property modeling in JSON or YAML.

Building such an XSLT using XSLT

Because such an XSLT can be defined as a mapping from a Metaschema into an XSLT implementation of its rules, that mapping can itself be codified. A 'Metaschema transpiler' transformation can then generate, from any Metaschema module, the functional InspectorXSLT that checks that module's rules.
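The following toy Python sketch illustrates the transpiler idea. It turns a hypothetical allowed-values rule (an element name mapped to its permitted values) into a small XSLT 3.0 stylesheet that reports disallowed values. It is not the metaschema-xslt transpiler, which is itself written in XSLT and covers far more of Metaschema. The element names and the `<report>`/`<variance>` output format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)

def xpath_string(value):
    # XPath 2.0+ string literal: apostrophes are escaped by doubling them.
    return "'" + value.replace("'", "''") + "'"

def generate_inspector(rules):
    """Generate XSLT source that flags values not allowed by `rules`.

    `rules` maps a (hypothetical) element name to its permitted values.
    """
    sheet = ET.Element(f"{{{XSL}}}stylesheet", version="3.0")
    # Unmatched nodes are skipped, so only reported variances reach the output.
    ET.SubElement(sheet, f"{{{XSL}}}mode", {"on-no-match": "shallow-skip"})
    root_template = ET.SubElement(sheet, f"{{{XSL}}}template", match="/")
    report = ET.SubElement(root_template, "report")
    ET.SubElement(report, f"{{{XSL}}}apply-templates")
    for name, values in rules.items():
        allowed = ", ".join(xpath_string(v) for v in values)
        # One template per rule, matching only elements whose value is not allowed.
        template = ET.SubElement(sheet, f"{{{XSL}}}template",
                                 match=f"{name}[not(string(.) = ({allowed}))]")
        variance = ET.SubElement(template, "variance", rule="allowed-values")
        # fn:path (XPath 3.0) locates the offending node in the report.
        ET.SubElement(variance, f"{{{XSL}}}value-of", select="path(.)")
    return ET.tostring(sheet, encoding="unicode")

print(generate_inspector({"status": ["draft", "final"]}))
```

Because the ElementTree API generates the output, XML escaping of the generated patterns is handled automatically. Only the XPath string literals need escaping by hand.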

Capabilities

At this point the Inspector XSLT becomes useful in several ways:

  • As a validator of your Metaschema-based XML format
  • As a second validator. Where two implementations agree, the agreement itself is valuable
    • Reporting an error twice from two different systems is more than twice as good as reporting it once
    • Discrepancies in reporting expose variances and bugs in the implementations as well as the data
    • Thus cross-checking is as good as checking and sometimes better
  • As such it is also 'full stack': it replaces both schema validation and constraints validation (as might be provided via Schematron or another rules engine), and cross-checks against both.
  • Because InspectorXSLT is XSLT-based, it can be deployed across a range of platforms with consistent results
    • May work in settings where you can't support another technology
    • Supports a wide range of workflows
  • Convenience features
    • batch processing
    • customizing reports
    • adjusting log levels and console tracebacks
  • For shops that already know XML/XSLT (or even those that do not), this is a way into Metaschema

The implementation provides test metaschemas for trying out its features, or developers can use their own metaschemas or public ones such as OSCAL. We know there are bugs, so patience is appreciated.

How to try it

At the time of writing, the distribution awaits merging and lives in a fork: https://github.com/wendellpiez/metaschema-xslt/tree/issue72-XSLT-inspectorA/src/schema-gen/InspectorXSLT

It contains readme files with specific directions (e.g., readme1 and readme2).

Please feel free to provide feedback where these directions are not clear, or on whether they should be moved to this wiki.

Command line invocation for validating single instances

Once an Inspector XSLT has been generated, it can be applied from the command line using an XSLT 3.0/3.1 engine (in practice, Saxon). Demonstration scripts are provided. They require bash (but are free to port) and use Apache Maven to manage the Java libraries.
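The demonstration scripts are the documented route. As a rough sketch of what a direct invocation involves, the Python below builds and runs a Saxon command line. It assumes that Java is installed and that `classpath` points at Saxon-HE (plus, for recent Saxon releases, the jars it depends on, which Maven can fetch). The stylesheet and instance file names are placeholders. `net.sf.saxon.Transform` and its `-s:`, `-xsl:` and `-o:` options belong to Saxon's standard command-line interface.

```python
import subprocess

def saxon_command(classpath, stylesheet, source, output=None):
    """Build a command line applying `stylesheet` to `source` with Saxon."""
    cmd = ["java", "-cp", classpath, "net.sf.saxon.Transform",
           f"-s:{source}", f"-xsl:{stylesheet}"]
    if output is not None:
        # Without -o:, Saxon writes the result to standard output.
        cmd.append(f"-o:{output}")
    return cmd

def run_inspector(classpath, stylesheet, source, output=None):
    # check=True raises CalledProcessError if Saxon exits with a nonzero status.
    return subprocess.run(saxon_command(classpath, stylesheet, source, output),
                          capture_output=True, text=True, check=True)

cmd = saxon_command("saxon-he.jar", "my-inspector.xsl", "instance.xml")
print(" ".join(cmd))
# java -cp saxon-he.jar net.sf.saxon.Transform -s:instance.xml -xsl:my-inspector.xsl
```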

Directing outputs, batch processing

Similarly, to write reports to files and to batch-process sets of files, you can use the scripts provided or invoke the XSLT directly, relying on features of either Saxon or XProc 1.0.
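One approach to batch processing is sketched below in Python: run the stylesheet once per file and name each report after its input. This approach is an assumption here and not necessarily what the provided scripts do. Saxon's command line can alternatively process a whole directory in one run when `-s:` names a directory, in which case `-o:` must name an output directory too. The `.report.xml` suffix and the directory layout are arbitrary choices for illustration.

```python
import subprocess
from pathlib import Path

def batch_commands(classpath, stylesheet, source_dir, report_dir,
                   suffix=".report.xml"):
    """Yield (source, report, command) for every *.xml file in source_dir.

    The report naming scheme (input stem + suffix) is an arbitrary choice.
    """
    for source in sorted(Path(source_dir).glob("*.xml")):
        report = Path(report_dir) / (source.stem + suffix)
        cmd = ["java", "-cp", classpath, "net.sf.saxon.Transform",
               f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{report}"]
        yield source, report, cmd

def run_batch(classpath, stylesheet, source_dir, report_dir):
    """Run the stylesheet over each file; return inputs whose run failed."""
    Path(report_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for source, _report, cmd in batch_commands(classpath, stylesheet,
                                               source_dir, report_dir):
        if subprocess.run(cmd).returncode != 0:
            failed.append(source)
    return failed
```

Running one process per file costs a JVM start-up each time. In exchange, each file gets its own exit status, which makes it easy to list the files whose runs failed.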

These features are currently documented in readmes, in the script and XSLT interfaces, and in code comments. They could also be documented on this wiki if they are found useful.

What we need to know

  • Confirm generally that it works, and how well it works
  • Provide guidance and any tips for user documentation
  • Contribute ideas for use cases
  • If you like XSpec,
    • Try the XSpecs, starting with the functional XSpecs
    • Consider contributing more or better tests around more functional edges
  • If you are learning XSLT
    • Examine the generated Inspector XSLT for legibility, traceability and debuggability
    • Does it make sense? How can it be improved for learners, analysts and assessors?
  • If you really need it working, and not just experimentally
    • Consider contributing to a public implementation for your Metaschema including realistic functional (test) examples
    • An OSCAL InspectorXSLT belongs in the OSCAL-xslt repository; please also speak up in OSCAL channels

How to offer feedback

Even the most casual feedback is welcome. Indeed, casual encouragement may be more welcome than more work to do.

  • Email the principal developer w e n d e l l (dot) p i e z (at) n i s t (dot) g o v.
  • A Discussion Board for this project hosts Q/A and free-form discussion
  • Bug reports and feature requests are welcome on the Issues board
  • Join us in the NIST Metaschema Element channel (chat)
  • Clone, copy, fork or reverse engineer this work, and let us know (but consider contributing first)

Future directions

On deck next might be:

Let us know if these are the wrong priorities.

Known issues and limitations

Dependencies

XSLT 3.0/XPath 3.1 and Saxon 10+, hence Java with Maven, or an alternative Saxon distribution.

We would be very interested to see the tool running under any other XSLT implementation.

Testing - application and generator

This project aims to demonstrate not only the capability but also the viability of this approach, in combination with other approaches.

This necessitates that testing be intelligible, traceable and comprehensive.

This effort has only started (see the testing directory), but results are promising. XSpec testing is in place both for functional testing of an InspectorXSLT (against the Metaschema semantics) and for unit testing of how that XSLT is produced from Metaschema sources. The testing harness adds no dependencies beyond Saxon/XSLT 3.0 and Maven (with a Java runtime).

Constraints implementation

The constraint elements allowed-values, matches, index-has-key (with its index declaration), is-unique and expect have been implemented and lightly tested. This is a promising area of work, but it needs testing and demonstration over realistic data sets.

The allowed-values functionality lags behind the current draft specification; see https://github.com/usnistgov/metaschema/pull/413
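For readers new to these constraint types, the Python sketch below approximates what four of them check (allowed-values, matches, is-unique and index-has-key) over a toy XML instance, using ElementTree. The document structure, names and regular expression are hypothetical. The sketch also ignores much of the actual Metaschema semantics, such as the allow-other setting, target paths and multi-field keys. InspectorXSLT itself generates XSLT for these checks.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical instance with one problem of each kind.
DOC = """<catalog>
  <party uuid="p1" type="person"><name>Ada</name></party>
  <party uuid="p2" type="robot"><name>Bob</name></party>
  <party uuid="p1" type="organization"><name>Acme</name></party>
  <party uuid="x9" type="person"><name>Cy</name></party>
  <role-ref party-uuid="p3"/>
</catalog>"""

def check_constraints(root):
    parties = root.findall("party")
    problems = []
    # allowed-values: @type must come from an enumerated list.
    for p in parties:
        if p.get("type") not in {"person", "organization"}:
            problems.append(f"allowed-values: type {p.get('type')!r}")
    # matches: @uuid must match a regular expression (an assumed format).
    for p in parties:
        if not re.fullmatch(r"p[0-9]+", p.get("uuid", "")):
            problems.append(f"matches: uuid {p.get('uuid')!r}")
    # is-unique: no two parties may share a @uuid value.
    seen = set()
    for p in parties:
        if p.get("uuid") in seen:
            problems.append(f"is-unique: duplicate uuid {p.get('uuid')!r}")
        seen.add(p.get("uuid"))
    # index-has-key: each role-ref must name a key in the index of party uuids.
    index = {p.get("uuid") for p in parties}
    for ref in root.iter("role-ref"):
        if ref.get("party-uuid") not in index:
            problems.append(f"index-has-key: no party {ref.get('party-uuid')!r}")
    return problems

for problem in check_constraints(ET.fromstring(DOC)):
    print(problem)
# allowed-values: type 'robot'
# matches: uuid 'x9'
# is-unique: duplicate uuid 'p1'
# index-has-key: no party 'p3'
```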

Metapath implementation

When they are used as Metapath, some complex XPath expressions in Metaschema inputs may trigger bugs in the constraints implementation. This is because the processor supports only the 'pattern subset' of XPath (specifically, the syntax of selection patterns) for defining constraint targets. This may be a limitation compared to other prototype implementations of Metapath, although a path that does not work can generally be rewritten as one that does. A next-generation InspectorXSLT will support full path parsing (see iXML, above), and this will become a non-issue, either because we can rewrite the paths or because we can refactor the approach entirely.
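The following deliberately rough Python heuristic gives a sense of what falls outside the pattern subset. In XSLT 3.0, the steps of a pattern may use only forward axes such as `child::`, `attribute::`, `descendant::` and `self::`, while predicates may contain arbitrary expressions. The sketch therefore strips predicates and then looks for reverse axes or `..` in the remaining steps. It is an illustration under those assumptions, not the check InspectorXSLT performs. It does not truly parse XPath: string literals containing brackets, for example, would confuse it.

```python
import re

REVERSE_AXES = ("parent", "ancestor", "ancestor-or-self",
                "preceding", "preceding-sibling")

def strip_predicates(path):
    """Remove [...] predicates (including nested ones) from a path string."""
    out, depth = [], 0
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)

def outside_pattern_subset(path):
    """Rough check: does any step (outside predicates) use a reverse axis?"""
    steps = strip_predicates(path)
    if ".." in steps:
        return True
    return any(re.search(rf"\b{axis}::", steps) for axis in REVERSE_AXES)

for path in ["party/name", "party[parent::catalog]/name", "name/parent::party"]:
    print(path, outside_pattern_subset(path))
# party/name False
# party[parent::catalog]/name False
# name/parent::party True
```

A path flagged this way can often be rewritten with the reverse step moved into a predicate. For instance, `name/parent::party` selects party elements that have a name child, which `party[name]` also matches.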