diff --git a/README.md b/README.md index 444bbda..4266c01 100644 --- a/README.md +++ b/README.md @@ -32,10 +32,18 @@ Enabling these lightweight, transparent and declarative "logical layers" written - [SchXSLT][schxslt] - ISO Schematron / community enhancements - [XSpec][xspec] - XSpec - XSLT/XQuery unit testing +As an alternative to Morgana, users are also invited to test [XML Calabash 3][xmlcalabash]. At time of writing, this release is too new to be incorporated into the project, but appears promising as an alternative platform for everything demonstrated here. + These are open-source projects in support of W3C- and ISO-standardized technologies. Helping to install, configure, and make these work seamlessly, so users do not have to notice, is a goal of this project. If this software is as easy, securable and performant as we hope to show, it might be useful not only to XML-stack developers but also to others who wish to cross-check their OSCAL data or software supporting OSCAL by comparison with another stack. +### XProc testbed + +XProc developers, similarly, may be interested in this project as a testbed for performance and conformance testing. + +This deployment is also intended to demonstrate conformance to relevant standards and external specifications, not just to APIs and interfaces defined by tool sets. + ### Projects -- current and conceived See the [Projects folder](./projects/) for current projects. Projects now planned for deployment in this repository include: @@ -48,18 +56,22 @@ See the [Projects folder](./projects/) for current projects. Projects now planne - Find and demonstrate modeling or conformance issues in schemas or processors - Conversely, demonstrate conformance of validators and design of models - Showcase differences between valid and invalid documents, especially edge cases - - [`oscal-import`](projects/oscal-import/) - produce OSCAL from PDF via HTML and NIST STS formats - a demonstration showing conversion of a 'high-touch' document into OSCAL, mapping its structures + - [`cprt-import`](projects/cprt-import/) - produce OSCAL from a raw JSON feed (not OSCAL) - demonstrating conversion of NIST CPRT [NIST SP 800-171](https://csrc.nist.gov/projects/cprt/catalog#/cprt/framework/version/SP_800_171_3_0_0/home) into OSCAL + - [`FM6-22-import`](projects/FM6-22-import/) - produce OSCAL from PDF via HTML and NIST STS formats - a demonstration showing conversion of a 'high-touch' document into OSCAL, namely US Army Field Manual 6-22 Chapter 4 "Developing Leadership", mapping its structures into STS and OSCAL formats - `batch-validate` validate OSCAL in batches against schemas and schema emulators - `index-oscal` - produce indexes to information encoded in OSCAL TODO: update this list + READERS: [anything to add?][repo-issues] Applications in this repository may occasionally have general use outside OSCAL; users who believe any of its capabilities should be generalized and published separately are invited to [create a Github Issue][repo-issues]. ### Organization -Folders outside `projects` including `lib`, `smoketest`, `project-template`, `testing`, `icons` and (hidden) `.github` folders serve the repository as a whole; specific applications are all to be found among [projects](./projects). +Folders outside `projects` including `lib`, `smoketest`, `project-template`, `testing`, `icons` and (hidden) `.github` folders serve the repository as a whole; specific applications are all to be found among [projects](./projects/). 
+ +An exception to this is the [tutorial](./tutorial/), which is a project, but also uses the projects as its source, so is kept apart from the other applications as a "global" project. [The `lib` directory](./lib) comes bare bones - it has only its readme, a configuration file and a couple of utility pipelines. This library is populated by the [installation script](./setup.sh), and (once the basic setup is done) by running the pipelines. @@ -87,7 +99,7 @@ The software in this repository is at varying levels of maturity. Many styleshee At the same time, the libraries we use (Morgana, Saxon and others) are themselves at various levels of maturity (Saxon in particular having been field-tested for over 20 years). And both particular initiatives and the code repository as a whole follow an incremental development model. Things left as good-enough-for-now are regarded as being good enough, until experience shows us it is no longer so. Punctuated equilibrium is normal. New contrivances are made of old and reliable parts. -Assume the worst, hope for the best, and test. +*Assume the worst, hope for the best, and test.* Cloning the repository is encouraged and taken as a sign of success. So is any participation in testing and development activities. @@ -119,7 +131,7 @@ Assuming 'TODO' items are addressed and these markers disappear, the git history
Innovations -As of mid-2024, we believe some aspects of this initiative are innovative or unusual, even as it stands on foundations laid by others. Please let us know of relevant prior art, or independent invention, especially if it anticipates the work here. +As of mid-2024, we believe some aspects of this initiative are innovative or unusual, even as it stands on foundations laid by others. Please let us know of relevant prior art, or independent invention, especially if it anticipates the work here. It is to be hoped that some of these applications are "obvious" and not as new as we think, at least in conception. #### Pipelines for “self setup” @@ -165,6 +177,8 @@ This makes cloning and further development easier. ## Where to start +One way to start is to dive into the [Tutorial](tutorial/readme.md). This introduction to XProc does not assume prior XML expertise, only a willingness to learn. +
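To give a first taste of what the tutorial works with, here is a minimal, hypothetical XProc 3.0 pipeline (illustrative only, not one of the repository's pipelines): it declares a single output port and echoes an inline document.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
   <p:output port="result"/>
   <!-- p:identity passes its input through unchanged;
        here that input is supplied literally, inline -->
   <p:identity>
      <p:with-input>
         <greeting>Hello, XProc!</greeting>
      </p:with-input>
   </p:identity>
</p:declare-step>
```

Saved as (say) HELLO.xpl, such a pipeline could be run with the runtime scripts described below, for example `./xp3.sh HELLO.xpl`.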
OSCAL developers @@ -196,7 +210,9 @@ An [XProc tutorial](tutorial/sequence/lesson-sequence.md) is offered on this sit ### Installation instructions -Note: if you already have Morgana XProc III installed, you should be able to use it, appropriately configured, to run any pipeline in the repository. But local installation is also easy and clean. +Installation is needed only if you do not already have an XProc 3 engine such as Morgana or XML Calabash. If you already have support for XProc 3, consider using your available tooling, instead of or in addition to the runtime offered. + +(Any bugs you find in doing so can be addressed and the entire repository "hardened" thereby -- one of the beneficial network effects of multiple implementations of a standard.) *Platform requirements*: Java, with a `bash` shell for automated installation. Only Java is required if you can install manually. @@ -280,11 +296,11 @@ See the [projects/](./projects/) directory with a list of projects - each should Or jump to these projects: +- [XProc Tutorial](tutorial/readme.md) provides step-by-step instructions and play-by-play commentary. - [Schema Field Tests](./schema-field-tests) - Testing whether OSCAL schemas correctly enforce rules over data (with surprises) - [OSCAL Profile Resolution](./profile-resolution) - converting an OSCAL profile (representing a baseline or overlay) into its catalog of controls -- [./projects/oscal-import/](./projects/oscal-import/) - Produce OSCAL from a PDF source via HTML and XML conversions +- Produce OSCAL from other data formats: from raw JSON source in [CPRT import](projects/cprt-import/); or from PDF source via HTML and XML conversions in [FM6-22 import](projects/FM6-22-import) - Any XProc3 pipeline can be executed using the script `xp3.sh` (`bash`) or `xp3.bat` (Windows CMD). For example: ```bash @@ -309,14 +325,14 @@ See the [House Rules](./house-rules.md) for more information. 
Drag and drop (Windows only) -Optionally, Windows users can use a batch file command interface, with drag-and-drop functionality in the GUI (graphical user interface, your 'Desktop'). +[Optionally, Windows users can use a batch file command interface](https://github.com/usnistgov/oscal-xproc3/discussions/18), with drag-and-drop functionality in the GUI (graphical user interface, your 'Desktop'). In the File Explorer, try dragging an icon for an XPL file onto the icon for `xp3.bat`. (Tip: choose a pipeline whose name is in all capitals, as in 'ALL-CAPS.xpl' — explanation below.) Gild the lily by creating a Windows shortcut to the 'bat' file. This link can be placed on your Desktop or in another folder, ready to run any pipelines that happen to be dropped onto it. Renaming the shortcut and changing its icon are also options. Some icons for this purpose are provided [in the repository](./icons/). -TODO: Develop and test [./xp3.sh](./xp3.sh) so it too offers this or equivalent functionality on \*nix or Mac platforms - AppleScript! - lettuce know 🥬 if you want or can do this - +TODO: Develop and test [./xp3.sh](./xp3.sh) (or scripts to come) so it too offers this or equivalent functionality on \*nix or Mac platforms - AppleScript! - lettuce know 🥬 if you want or can do this +
## Testing @@ -375,11 +391,23 @@ Morgana and Saxon both require Java, as detailed on their support pages. SchXSLT See [THIRD_PARTY_LICENSES.md](./THIRD_PARTY_LICENSES.md) for more. +As noted above, however, all software is also conformant with relevant openly published language specifications, and should deliver the same results, verifiably, using other software that follows the same specifications, including XProc and XSLT processors yet to be developed. + XProc 3.0 aims to be platform- and application-independent, so one use of this project will be to test and assess portability across environments supporting XProc. ## XProc platform acknowledgements -With the authors of incorporated tooling, the many contributors to the XProc and XML stacks underlying this functionality are owed thanks and acknowledgement. These include Norman Walsh, Achim Berndzen and the developers of XProc versions 1.0 and 3.0; developers of embedded commodity parsers and processers such as Java Xerces, Trang, and Apache FOP (to mention only three); and all developers of XML, XSLT, and XQuery especially unencumbered and open-source. Only an open, dedicated and supportive community could prove capable of such a collective achievement. +With the authors of incorporated tooling, the many contributors to the XProc and XML stacks underlying this functionality are owed thanks and acknowledgement. These include + +- [Henry Thompson](https://www.xml.com/pub/a/ws/2001/02/21/devcon1.html) and other pioneers of XML pipelining on a standards basis +- Norman Walsh +- Norm's fellow committee members and developers of XProc versions 1.0 and 3.0 +- Developers of embedded commodity parsers and processors such as Java Xerces, Trang, and Apache FOP (to mention only three) +- All developers of XML, XSLT, and XQuery technologies and applications, especially unencumbered and open-source + +Only an open, dedicated and supportive community could prove capable of such a collective achievement. + +This work is dedicated to the memory of Michael Sperberg-McQueen and to all his students, past and future. 
--- @@ -400,8 +428,8 @@ This README was composed starting from the [NIST Open Source Repository template [oscal-xslt]: https://github.com/usnistgov/oscal-xslt [oscal-cli]: https://github.com/usnistgov/oscal-cli [xslt3-functions]: https://github.com/usnistgov/xslt3-functions - [xdm3]: https://www.w3.org/TR/xpath-datamodel/ +[xmlcalabash]: https://github.com/xmlcalabash/xmlcalabash3 [xslt3]: https://www.w3.org/TR/xslt-30/ [xproc]: https://xproc.org/ [xproc-specs]: https://xproc.org/specifications.html diff --git a/projects/FM6-22-import/PRODUCE_FM6-22-chapter4.xpl b/projects/FM6-22-import/PRODUCE_FM6-22-chapter4.xpl index a89009c..bd23864 100644 --- a/projects/FM6-22-import/PRODUCE_FM6-22-chapter4.xpl +++ b/projects/FM6-22-import/PRODUCE_FM6-22-chapter4.xpl @@ -29,8 +29,9 @@ for demonstration or diagnostics --> - - + diff --git a/tutorial/PRODUCE-TUTORIAL-PREVIEW.xpl b/tutorial/PRODUCE-TUTORIAL-PREVIEW.xpl index 36d1cc2..095151f 100644 --- a/tutorial/PRODUCE-TUTORIAL-PREVIEW.xpl +++ b/tutorial/PRODUCE-TUTORIAL-PREVIEW.xpl @@ -140,7 +140,7 @@ tr:nth-child(even) { background-color: gainsboro } th { width: clamp(10em, auto, 40em) } td { width: clamp(10em, auto, 40em); border-top: thin solid grey } -section.unit { width: clamp(45ch, 50%, 75ch); padding: 0.8em; outline: thin solid black; margin: 0.6em 0em } +section.unit { width: clamp(45ch, 100%, 75ch); padding: 0.8em; outline: thin solid black; margin: 0.6em 0em } section.unit h1:first-child { margin-top: 0em } .observer { background-color: honeydew ; grid-column: 2 } .maker { background-color: seashell ; grid-column: 3 } @@ -160,19 +160,22 @@ span.wordcount.over { color: darkred } - + + + + - + - + + diff --git a/tutorial/punchlist.md b/tutorial/punchlist.md index c714946..f2fa6fc 100644 --- a/tutorial/punchlist.md +++ b/tutorial/punchlist.md @@ -24,8 +24,8 @@ To add to the production pipeline, edit PRODUCE-TUTORIAL-MARKDOWN.xpl - review phase: - Commends Day? (week?) go through all the comments consider factoring out into p:documentation / tooling - - 101 sequence is inspection and observation (only) - - 102 sequence is hands-on + - Observer sequence is inspection and observation (only) + - Maker sequence is hands-on - all 'Goals' in sequence, all 'Resources' in sequence, etc - where can we default e.g. `with-input` in place of `with-input[@port='source']` ? test all these ... - Review and normalize usage of 'i', 'b', 'em' and other inline elements? @@ -337,5 +337,20 @@ Note - in some places there may be 'road work' going on Here we should start with a proposed visiting order? +### XProc Synopsis + +Input ports bound - p:document | p:inline + top-level + per step + inlines +Output ports defined +Options defined +Imports + +At a glance: +- all load and document/@href +- all store/@href + + diff --git a/tutorial/readme.md b/tutorial/readme.md index d8ab1e4..7b2dcbd 100644 --- a/tutorial/readme.md +++ b/tutorial/readme.md @@ -4,7 +4,7 @@ This is work in progress towards an XProc 3.0 (and 3.1) tutorial or set of tutor Coverage here is not a substitute for project documentation - the tutorial relies on projects in the repo for its treatments - but an adjunct to it for beginners and new users who wish for guidance and information on XProc that they are not likely to find for themselves. -In its current form, only introductory materials are offered. The framework is easily extensible to cover more topics, and an XProc-based tutorial production system is part of the demonstration. 
+In its current form, only the first introductory materials are offered. The framework is easily extensible to cover more topics, and an XProc-based tutorial production system is part of the demonstration. But the approach needs to be tested more before it is extended. Tutorial exercises can be centered on OSCAL-oriented applications but the learning focus will be XProc 3.0/3.1. @@ -20,7 +20,7 @@ Follow the tutorial by reading the files published in the repository, or by copy First and foremost this is a "practicum" or *hands-on* introduction that encourages readers not only to follow along, but to try things out, practice and learn by interactive observation. -Otherwise, the tutorial is designed to support multiple different approaches suitable for different learners and needs - both learning styles, and use cases ("user stories") as described below. Develop an approach that works for you by moving at your own speed and skipping, skimming or delving more deeply into topics and problems of interest. +Otherwise, the tutorial is designed to support multiple different approaches suitable for different learners and needs - both learning styles and goals as described below. Develop an approach that works for your case by moving at your own speed and skipping, skimming or delving more deeply into topics and problems of interest. Each topic ("Lesson") in a sequence offers a set of Lesson Units around a common problem area or theme, leveraging projects in the repository to provide problems and solutions with working pipelines to run and analyze. @@ -53,9 +53,9 @@ To enable readers to cater to their own needs, the tutorial offers these **track Since the different tracks are arranged along the same topics, the treatments are also suitable for groups who wish to work any or all tracks collaboratively. -If you want a no-code experience, skip the Maker track and skim the Observer track, but do not skip looking at the code base, accepting that much will remain mysterious. +If you want a no-code experience, read the Learner track, skip the Maker track and skim the Observer track. Keep in mind that you might have to run pipelines, if only to see their outputs. -If security concerns preclude you from running locally, post us an Issue and the dev team will investigate options including a container-based distribution of some nature. The beauty and simplicity of 'bare bones' however is what recommends it to us and you. +If for any reason you can't run XProc or Java, post us an Issue and the dev team will investigate options including a container-based distribution of some nature. The simplicity of 'bare bones' however recommends it to us and you. ### Observer Track @@ -83,7 +83,7 @@ If you are a tactile learner with no patience for reading, you can skim through In parallel with the other two tracks, the Learner track offers all readers more explanation and commentary, in greater depth and with more links. -Note that the Learner track represents the views of one still learning, so it is subject to change and refinement - most especially if you find things in it that are in need of clarification or correction. +Note that the Learner track itself represents the views of one still learning, so it is subject to change and refinement - most especially if you find things in it that are in need of clarification or correction. ### Easter eggs @@ -134,11 +134,11 @@ See the top-level pipelines for current capabilities. 
At time of writing: [PRODUCE-TUTORIAL-MARKDOWN.xpl](PRODUCE-TUTORIAL-MARKDOWN.xpl) produces a set of Markdown files, writing them to the `sequence` directory. -[PRODUCE-TUTORIAL-TOC.xpl]() produces the [Tutorial Table of Contents](sequence/lesson-sequence.md) +[PRODUCE-TUTORIAL-TOC.xpl](PRODUCE-TUTORIAL-TOC.xpl) produces the [Tutorial Table of Contents](sequence/lesson-sequence.md) [PRODUCE-TUTORIAL-PREVIEW.xpl](PRODUCE-TUTORIAL-PREVIEW.xpl) produces a single [preview tutorial on one HTML page](tutorial-preview.html) -[PRODUCE-PROJECTS-ELEMENTLIST.xpl] produces an [index to XProc elements appearing in pipelines](sequence/element-directory.md) under discussion - read about it in the lessons +[PRODUCE-PROJECTS-ELEMENTLIST.xpl](PRODUCE-PROJECTS-ELEMENTLIST.xpl) produces an [index to XProc elements appearing in pipelines](sequence/element-directory.md) under discussion - read about it in the lessons # Leave your tracks diff --git a/tutorial/sequence/Lesson01/acquire_101.md b/tutorial/sequence/Lesson01/acquire_101.md index cb709dd..3019804 100644 --- a/tutorial/sequence/Lesson01/acquire_101.md +++ b/tutorial/sequence/Lesson01/acquire_101.md @@ -54,7 +54,7 @@ After reading and reviewing these documents, perform the setup on your system as After running the setup script, or performing the installation by hand, make sure you can run all the smoke tests successfully. -As noted in the docs, if you happen already to have [Morgana XProc III](https://www.xml-project.com/morganaxproc-iiise.html), you do not need to download it again. Try skipping straight to the smoke tests. You can use a runtime script `xp3.sh` or `xp3.bat` as a model for your own, and adjust. Any reasonably recent version of Morgana should function if configured correctly, and we are interested if it does not. +As noted in the docs, if you happen already to have [Morgana XProc III](https://www.xml-project.com/morganaxproc-iiise.html), you do not need to download it again. Try skipping straight to the smoke tests. You can use a runtime script `xp3.sh` or `xp3.bat` as a model for your own, and adjust. Any reasonably recent version of Morgana should function if configured correctly, and we are interested if it does not. ### Shortcut @@ -96,7 +96,7 @@ Such a script itself must be “vanilla” and generic: it simply invoke ### When running from a command line -As simple examples, these scripts show only one way of running XProc. Keep in mind that even simple scripts can be used in more than one way. +As simple examples, these scripts show only one way of running XProc. Keep in mind that even simple scripts can be used in more than one way. 
For example, a pipeline can be executed from the project root: @@ -104,13 +104,13 @@ For example, a pipeline can be executed from the project root: ``` $ ./xp3.sh smoketest/TEST-XPROC3.xpl ``` -Alternatively, a pipeline can be executed from its home directory, for example if currently in the `smoketest` directory (note the path to the script): +Alternatively, a pipeline can be executed from its home directory, for example if currently in the `smoketest` directory (note the path to the script): ``` $ ../xp3.sh TEST-XPROC3.xpl ``` -This works the same ways on Windows, with adjustments: +This works the same way on Windows, with adjustments: ``` > ..\xp3 TEST-XPROC3.xpl ``` @@ -120,7 +120,7 @@ Windows users (and others to varying degrees) can set up a drag-and-drop based workflow – using your mouse or pointer, select an XProc pipeline file and drag it to a shortcut for the executable (Windows batch file). A command window opens to show the operation of the pipeline. See the [README](../../README.md) for more information. -It is important to try things out since any of these methods can be the basis of a workflow. +It is important to try things out since any of these methods can be the basis of a workflow. For the big picture, keep in mind that while the command line is useful for development and demonstration – and however familiar XProc itself may become to the developer – to a great number of people it remains obscure, cryptic and intimidating if not forbidding. Make yourself comfortable at the command line! diff --git a/tutorial/sequence/Lesson01/acquire_102.md b/tutorial/sequence/Lesson01/acquire_102.md index fefefd5..8581cae 100644 --- a/tutorial/sequence/Lesson01/acquire_102.md +++ b/tutorial/sequence/Lesson01/acquire_102.md @@ -9,7 +9,7 @@ ## Goals * Look at some pipeline organization and syntax on the inside -* Success and failure invoking XProc pipelines: an early chance to “learn to die” gracefully (to use the gamers' idiom). +* Success and failure invoking XProc pipelines: making friends with tracebacks. ## Prerequisites diff --git a/tutorial/sequence/Lesson01/acquire_599.md b/tutorial/sequence/Lesson01/acquire_599.md index 6023df5..0d6b6ea 100644 --- a/tutorial/sequence/Lesson01/acquire_599.md +++ b/tutorial/sequence/Lesson01/acquire_599.md @@ -6,22 +6,36 @@ # 599: Meeting XProc +## Goals + +Offer some more context; help reduce the intimidation factor. + +XProc is not a simple thing, but a way in. The territory is vast, but the sky is always above us. + ## Resources [A Declarative Markup Bibliography](https://markupdeclaration.org/resources/bibliography) is available on line for future reference on this theoretical topic. ## Some observations -Because it is now centered on *pipelines* as much as on files and software packages, dependency management is different from other technologies including Java and NodeJS – how so? +Because it is now centered on *pipelines* as much as on files and software packages, dependency management when using XProc is different from other technologies including Java and NodeJS – how so? MorganaXProc-III is implemented in Scala, and Saxon is built in Java, but otherwise distributions including the SchXSLT and XSpec distributions consist mainly of XSLT. This is either very good (with development and maintenance requirements in view), or not good at all. -Which is it, and what are the determining variables that tell you XProc is a good fit? 
How much of this is due to the high-level, abstracted nature of [4GLs](https://en.wikipedia.org/wiki/Fourth-generation_programming_language) including both XSLT 3.1 and XProc 3.0? Prior experience with XML-based systems and the problem domains in which they work well is probably a factor. How much are the impediments technical, and how much are they due to culture? +If not using Morgana but another XProc engine (at time of writing, XML Calabash 3 has been published in alpha), there will presumably be analogous arrangements: contracts between the tool and its dependencies, software or components and capabilities bundled and unbundled. + +So does this work well, on balance, and what are the determining variables that tell you XProc is a good fit for data processing, whether high touch, or at scale? How much of this is due to the high-level, abstracted nature of [4GLs](https://en.wikipedia.org/wiki/Fourth-generation_programming_language) including both XSLT 3.1 and XProc 3.0? Prior experience with XML-based systems and the problem domains in which they work well is probably a consideration. How much are the impediments technical, and how much are they due to culture and perceptions? + +Will it always be that a developer determined to use XSLT will find a way, whereas a developer determined not to, will find a way to refuse it? XProc in 2024 seems slow in adoption – maybe because everyone who would want it already has a functional equivalent in place. + +This being said, going forward the principle remains that we gain an implicit advantage when we find ways of exploiting technology opportunities that our peers and competitors have decided to neglect. In essence, by leaving XML, XSLT and XProc off the table, developers who choose not to use it may actually be giving easy money to developers who are able to adopt and exploit this externality, where it works. + +It's all about the tools. Find ways to support your open-source developers and the software development operations that offer free tools and services. ## Declarative markup in action Considerable care is taken in developing these demonstrations to see to it that the technologies on which we depend, notably XProc and XSLT but not limited to these, are both nominally and actually conformant to externally specified standard technologies, i.e. XProc and XSLT respectively (as well as others), and reliant to the greatest possible extent on well-documented and accessible runtimes. -It is a tall order to ask that any code base should be both easy to integrate and use with others, and at the same time, functionally complete and self-sufficient. Of these two, we are lucky to get one, even if we are thoughtful enough to limit ourselves to building blocks. Because the world is complex, we are always throwing in one or another new dependency, along with new rule sets. The approach enabled by XML and openly-specified supporting specifications is to work by making everything transparent as possible. We seek for clarity and transparency at all levels (so nothing is downloaded behind the scenes, for example) while also documenting as thoroughly as we can, including with code comments. +Is it too much to expect that any code base should be both easy to integrate and use with others, and at the same time, functionally complete and self-sufficient? Of these two, we are lucky to get one, even if we are thoughtful enough to limit ourselves to building blocks. 
Because the world is complex, we are always throwing in one or another new dependency, along with new rule sets. The approach enabled by XML and openly-specified supporting specifications is to work by making everything as transparent as possible. We seek clarity and transparency at all levels (so nothing is downloaded behind the scenes, for example) while also documenting as thoroughly as we can, including with code comments. Can any code base be fully self-explanatory and self-disclosing? Doubtful, even assuming those terms are meaningful. But one can try and leave tracks and markers, at least. We call it “code” with the hope and intent that it should be amenable to and rewarding of interpretation. diff --git a/tutorial/sequence/Lesson02/walkthrough_101.md b/tutorial/sequence/Lesson02/walkthrough_101.md index 3f43867..517531c 100644 --- a/tutorial/sequence/Lesson02/walkthrough_101.md +++ b/tutorial/sequence/Lesson02/walkthrough_101.md @@ -73,7 +73,7 @@ Each of the test pipelines exercises a simple sequence of operations. Open any X The aim here is demystification. Understand the parts to understand the whole. Reading the element names also inscribes them in memory circuits where they can be recovered. -### TEST-XPROC3 +### [TEST-XPROC3](../../../smoketest/TEST-XPROC3.xpl) Examine the pipeline [TEST-XPROC3.xpl](../../../smoketest/TEST-XPROC3.xpl). It breaks down as follows: @@ -83,7 +83,7 @@ Examine the pipeline [TEST-XPROC3.xpl](../../../smoketest/TEST-XPROC3.xpl). It b When you run this pipeline, the `CONGRATULATIONS` document given in line will be echoed to the console, where designated outputs will appear if not otherwise directed. -### TEST-XSLT +### [TEST-XSLT](../../../smoketest/TEST-XSLT.xpl) [This pipeline](../../../smoketest/TEST-XSLT.xpl) executes a simple XSLT transformation, in order to test that XSLT transformations can be successfully executed. @@ -97,7 +97,7 @@ If your pipeline execution can't process the XSLT (perhaps Saxon is not installe Errors in XProc are reported by the Morgana engine using XML syntax. Among other things, this means they can be captured and processed in pipelines. -### TEST-SCHEMATRON +### [TEST-SCHEMATRON](../../../smoketest/TEST-SCHEMATRON.xpl) Schematron is a language used to specify rules to apply to XML documents. In this case a small Schematron is applied to a small XML. @@ -105,7 +105,7 @@ Schematron is a language used to specify rules to apply to XML documents. In thi * `p:validate-with-schematron` – This is an XProc step specifically for evaluating an XML document against the rules of a given Schematron. Like the TEST-XPROC3 and TEST-XSLT` pipelines, this one presents its own input, given as a literal XML document given in the pipeline document (using `p:inline`). A setting on this step provides for it to throw an error if the document does not conform to the rules. The Schematron file provided as input to this step, [src/doing-well.sch](../../../smoketest/src/doing-well.sch), gives the rules. This flexible technology enables easy testing of XML against rule sets defined either for particular cases in particular workflows, or for entire classes or sets of documents. * `p:namespace-delete` – This step is used here as in the other tests for final cleanup of the information produced. -### TEST-XSPEC +### [TEST-XSPEC](../../../smoketest/TEST-XSPEC.xpl) [XSpec](https://github.com/xspec/xspec) is a testing framework for XSLT, XQuery and Schematron. 
It takes the form of a vocabulary and a process (inevitably implemented in XSLT and XQuery) for executing queries, transformations, and validations, by running them over known inputs, comparing the results to expected results, and reporting the results of this comparison. XProc, built to orchestrate manipulations of XML contents, is well suited for running XSpec. diff --git a/tutorial/sequence/Lesson02/walkthrough_102.md b/tutorial/sequence/Lesson02/walkthrough_102.md index 557a035..36d2317 100644 --- a/tutorial/sequence/Lesson02/walkthrough_102.md +++ b/tutorial/sequence/Lesson02/walkthrough_102.md @@ -6,8 +6,6 @@ # 102: XProc fundamentals - - ## Goals * More familiarity with XProc 3.0, with more syntax @@ -22,13 +20,10 @@ You have done [Setup 101](../acquire/acquire_101.md), [Setup 102](../acquire/acq Take a quick look *now* (and a longer look later): -This tutorial's handmade [XProc links page](../../xproc-links.md) - -Also, the official [XProc.org dashboard page](https://xproc.org) - -Also, check out XProc index materials produced in this repository: [XProc docs](../../../projects/xproc-doc/readme.md) - -And the same pipelines you ran in setup: [Setup 101](../acquire/acquire_101.md). +* This tutorial's handmade [XProc links page](../../xproc-links.md) +* Also, the official [XProc.org dashboard page](https://xproc.org) +* If interested, check out XProc index materials produced in this repository: [XProc docs](../../../projects/xproc-doc/readme.md) +* In any case, the same pipelines you ran in setup: [Setup 101](../acquire/acquire_101.md). ## Learning more about XProc @@ -55,7 +50,7 @@ XProc pipelines described in [the previous lesson unit](walkthrough_101.md) cont * Yes, those conventions are enforced in the repository by [a Schematron](../../../testing/xproc3-house-rules.sch) that can be applied to any pipeline, both in development and when it is committed to the repository under CI/CD (continuous integration / continous development). Assuming we take care to run our tests and validations, this does most of the difficult work maintaining consistency, namely detecting the inconsistency. * Reassuring messages aside, no XSpec reports are actually captured by this XProc! With nothing bound to an output port, it *sinks* by default. That is because it is a smoke test, and we care only to see that it runs and completes without error. The inputs are all controlled, so we know what those reports say. Or we can find out. -### PRODUCE-PROJECTS-ELEMENTLIST.xpl +### PRODUCE-PROJECTS-ELEMENTLIST The pipeline [PRODUCE-PROJECTS-ELEMENTLIST.xpl](../../PRODUCE-PROJECTS-ELEMENTLIST.xpl) has “real-world complexity”. Reviewing its steps can give a sense of how XProc combines simple capabilities into complex operations. Notwithstanding the title of this section, it is not important to understand every detail – knowing they are there is enough. @@ -89,7 +84,7 @@ For newcomers to XML coding – you can “comment out” code in any XM Text ``` -becomes +becomes: ``` diff --git a/tutorial/sequence/Lesson02/walkthrough_219.md b/tutorial/sequence/Lesson02/walkthrough_219.md index 36c4372..dab08e3 100644 --- a/tutorial/sequence/Lesson02/walkthrough_219.md +++ b/tutorial/sequence/Lesson02/walkthrough_219.md @@ -18,9 +18,9 @@ More in depth. The same pipelines you ran in setup: [Setup 101](../acquire/acquire_101.md). -Also, [XProc.org dashboard page](https://xproc.org) +Also, [XProc.org dashboard page](https://xproc.org). 
-Also, XProc index materials produced in this repository: [XProc docs](../../../projects/xproc-doc/readme.md) +Also, XProc index materials produced in this repository: [XProc docs](../../../projects/xproc-doc/readme.md). ## XProc as XML @@ -84,7 +84,7 @@ Initiated in 1996, XML continues to be generative in 2024. ## Snapshot history: an XML time line -[TODO: complete this, or move it, or both] +[TODO: complete this, or move it, or both] ... | Year | Publication | Capabilities | Processing frameworks | Platforms | | --- | --- | --- | --- | --- | diff --git a/tutorial/sequence/Lesson02/walkthrough_301.md b/tutorial/sequence/Lesson02/walkthrough_301.md index 1431747..414d030 100644 --- a/tutorial/sequence/Lesson02/walkthrough_301.md +++ b/tutorial/sequence/Lesson02/walkthrough_301.md @@ -8,12 +8,12 @@ ## Goals -* See how XProc supports software testing, including testing itself, supportive of a test-driven development +* See how XProc supports software testing, including testing itself, supportive of test-driven development (TDD) * Exposure to the configuration of the Github repository supporting dynamic testing on Pull Requests and releases, subject to extension ## Prerequisites -**No prerequisites** +You have made it this far. ## Resources @@ -39,18 +39,21 @@ Both kinds of tests can be configured and executed using XProc. Pipelines here p Specifically, tests that are run anytime a Pull Request is updated against the home repository serve to guard against accepting non-functional code into the repository code base. The tests themselves are so far fairly rudimentary – while paying for themselves in the consistency and quality they help enforce. -Pipelines useful for the developer + +### Pipelines useful for the developer: * [VALIDATION-FILESET-READYCHECK.xpl](../../../testing/VALIDATION-FILESET-READYCHECK.xpl) runs a pre-check to validate that files referenced in FILESET Xprocs are in place * [REPO-FILESET-CHECK.xpl](../../../testing/REPO-FILESET-CHECK.xpl) for double checking the listed FILESET pipelines against the repository itself - run this preventatively to ensure files are not left off either list inadvertantly * [RUN_XPROC3-HOUSE-RULES_BATCH.xpl](../../../testing/RUN_XPROC3-HOUSE-RULES_BATCH.xpl) applies House Rules Schematron to all XProcs listed in the House Rules FILESET - just like the HARDFAIL House Rules pipeline except ending gracefully with error reports * [REPO-XPROC3-HOUSE-RULES.xpl](../../../testing/REPO-XPROC3-HOUSE-RULES.xpl) applies House Rules Schematron to all XProc documents in the repository * [RUN_XSPEC_BATCH.xpl](../../../testing/RUN_XSPEC_BATCH.xpl) runs all XSpecs listed in the XSpec FILESET, in a single batch, saving HTML and JUnit test results -Pipelines run under CI/CD + +### Pipelines run under CI/CD: * [HARDFAIL-XPROC3-HOUSE-RULES.xpl](../../../testing/HARDFAIL-XPROC3-HOUSE-RULES.xpl) runs a pipeline enforcing the House Rules Schematron to every XProc listed in the imported FILESET pipeline, bombing (erroring out) if an error is found - useful when we want to ensure an ERROR condition comes back on an error reported by a *successful* Schematron run * [RUN_XSPEC-JUNIT_BATCH.xpl](../../../testing/RUN_XSPEC-JUNIT_BATCH.xpl) runs all XSpecs listed in the XSpec FILESET, saving only JUnit results (no HTML reports) -Additionally + +### Additionally: * [FILESET_XPROC3_HOUSE-RULES.xpl](../../../testing/FILESET_XPROC3_HOUSE-RULES.xpl) provides a list of resources (documents) to be made accessible to importing pipelines * 
[FILESET_XSPEC.xpl](../../../testing/FILESET_XSPEC.xpl) provides a list of XSpec files to be run under CI/CD @@ -77,7 +80,7 @@ Demonstrating the capability is a more important goal, and XSpecs can and are ea The [XSpec FILESET](../../../testing/FILESET_XSPEC.xpl) will show XSpecs run under CI/CD but not all XSpecs in the repository will be listed there. -## XProc running under continuous integration +## XProc running under continuous integration and development (CI/CD) Any XProc pipelines designed, like the smoke tests or validations just described, to provide for quality checking over carefully maintained code bases, are natural candidates for running dynamically and on demand, for example when file change commits are made to git repositories under CI/CD (continuous integration / continuous deployment). diff --git a/tutorial/sequence/Lesson02/walkthrough_401.md b/tutorial/sequence/Lesson02/walkthrough_401.md index 37efdbd..aebacdd 100644 --- a/tutorial/sequence/Lesson02/walkthrough_401.md +++ b/tutorial/sequence/Lesson02/walkthrough_401.md @@ -4,34 +4,43 @@ > > Save this file elsewhere to create a persistent copy (for example, for purposes of annotation). -# 401: XSLT Forward and Back +# 401: The XSLT Factor -What is this XSLT? +## Goals -Read this page if you are a beginner, or an expert in XSLT, or if you plan never to use it. +What is this XSLT? Read this page if you are a beginner in XSLT, or an expert, or if you plan never to use it. -## Goals +* If you don't know XSLT and do not care to, consider skimming to help you understand what XSLT is and what it does. +* If you know XSLT or plan to learn it, read to understand something more about how it fits with XProc. +* XQuery is also mentioned. Much of what is said about XSLT here applies to XQuery as well. -* If you don't know XSLT, helps you understand what it is and what it does -* If you know XSLT, understand something more about how it fits with XProc +XSLT offers XProc a core capability. Even if not indispensable, what it brings is an important and frequently necessary part of what makes XProc able to address problems with real-world complexity that evolve – or are only revealed – over time. It would be unfair to introduce developers or proprietors of data processing systems to XProc without gaining some sense of XSLT and its uses and strengths. ## Prerequisites -Possibly, you have run and inspected pipelines mentioned earlier, most especially [PRODUCE-PROJECTS-ELEMENTLIST.xpl](../../PRODUCE-PROJECTS-ELEMENTLIST.xpl), which contains `p:xslt` steps. +You may have run and inspected pipelines mentioned earlier, such as [PRODUCE-PROJECTS-ELEMENTLIST](../../PRODUCE-PROJECTS-ELEMENTLIST.xpl), which contain `p:xslt` steps. (If not, `p:xslt` is pretty easy to find.) + +Possibly, you have also inspected XSLT files (standalone transformations or *stylesheets*), to be found more or less anywhere in this repository, especially directories named `src`, with the file suffix `xsl` by convention. (XSLT being a part of XSL.) -Possibly, you have inspected XSLT files (standalone transformations or *stylesheets*), to be found more or less anywhere in this repository, especially directories named `src`. +XSLT practitioners might consider reading this section to see what they agree with. ## Resources -XSLT links! +XSLT links! Absorbing these documents is not necessary, but you need to know they exist. These provide the basis and history of the XML Data Model (XDM), the foundation of XProc. 
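Before the version-by-version links, it may help to see the shape XSLT takes inside a pipeline: a hedged sketch of a `p:xslt` step carrying a small inline stylesheet (illustrative only, not taken from the repository). The step's `source` port is fed by the preceding step in the pipeline:

```xml
<p:xslt>
   <p:with-input port="stylesheet">
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
         <!-- Replace the incoming document with a count of its elements -->
         <xsl:template match="/">
            <element-count><xsl:value-of select="count(//*)"/></element-count>
         </xsl:template>
      </xsl:stylesheet>
   </p:with-input>
</p:xslt>
```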
### XSLT 1.0 and XPath 1.0 -* [XML Path Language (XPath) Version 1.0](https://www.w3.org/TR/1999/REC-xpath-19991116/) W3C Recommendation 16 November 1999 +This “Original Gangster” (OG) version is still available in browsers, and still capable, albeit not as general or powerful as it was to become. + +* [XML Path Language (XPath) Version 1.0](https://www.w3.org/TR/1999/REC-xpath-19991116/) W3C Recommendation 16 November 1999 * [XSL Transformations (XSLT) Version 1.0](https://www.w3.org/TR/xslt-10/) W3C Recommendation 16 November 1999 ### XSLT 2.0 and XQuery 1.0 +With capabilities for grouping, better string processing (regular expressions), a more extensive type system aligned with XQuery, *temporary trees* (to reprocess results) and other needed features, XSLT 2.0 was widely deployed in document production back-ends, and used successfully within XProc 1.0. + +The only reason not to use it today is that XSLT 3.0/3.1 is available. + * [XSL Transformations (XSLT) Version 2.0 (Second Edition)](https://www.w3.org/TR/xslt20/) W3C Recommendation 30 March 2021 (Amended by W3C) * [XQuery 1.0: An XML Query Language (Second Edition)](https://www.w3.org/TR/xquery-10/) W3C Recommendation 14 December 2010 * World Wide Web Consortium. *XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition)*. W3C Recommendation, 14 December 2010. See [http://www.w3.org/TR/xpath-datamodel/](https://www.w3.org/TR/xpath-datamodel/). @@ -41,23 +50,25 @@ XSLT links! ### XSLT 3.0, XQuery 3.0, XPath 3.1 +The current generation of the language – although work progresses on XPath 4.0, which promises to be more capable than ever. + * [XSL Transformations (XSLT) Version 3.0](https://www.w3.org/TR/xslt-30/) W3C Recommendation 8 June 2017 * [Normative references](https://www.w3.org/TR/xslt-30/#normative-references) for XSLT 3.0 - data model, functions and operators, etc., including **XPath 3.1** * [XQuery 3.0: An XML Query Language](https://www.w3.org/TR/xquery-30/) W3C Recommendation 08 April 2014 ## XSLT: XSL (XML Stylesheet Language) Transformations -XSLT has a long and amazing history to go with its checkered reputation. Its role in XProc is similarly ambiguous: in one sense it is an optional power feature: a nice-to-have. In another sense it can be regarded as foundational. XSLT is the reason to have XProc. +XSLT has a long and amazing history to go with its checkered reputation. Its role in XProc is similarly ambiguous: in one sense it is an optional power feature: a nice-to-have. In another sense it can be regarded as foundational. One of the best reasons to have XProc is in how easy it makes it to deploy and run XSLT. -Chances are good that if you are not current on the latest XSLT version, you have little idea of what we are talking about, as it may have changed quite a bit (and even despite external appearances) since you last saw it. You may think you know it but you might have to reconsider. +Chances are good that if you are not current on the latest XSLT version, you have little idea of what we are talking about, as, despite appearances, it may have changed quite a bit since you last saw it. You may think you know it but you might have to reconsider. Users who last used XSLT 1.0 and even 2.0, in particular, can consider their knowledge out of date until they have taken a look at XSLT 3.0. -Moreover, within the context of XProc, experienced users of XSLT should also consider it may take a different form, as it is unburdened of some operations that better belong outside it - operations such as those provided by XProc itself. 
Within XProc, XSLT may often be simpler than out in systems where it has to do more work. +Moreover, within the context of XProc, experienced users of XSLT should also consider it may take a different form, as it is unburdened of some operations that better belong outside it – often, operations such as those provided by XProc itself. Within XProc, XSLT may often be simpler than in systems where it has to do more work. -Over time, the principle of pipelining, iterative amelioration (as it might be described) or “licking into shape” has been repeatedly demonstrated. Of course it proves easier to do a complicated task when broken into a series of simpler tasks. On Java alone, ways of deploying operations in sequence include at least [Apache Ant](https://ant.apache.org/), Apache Tomcat/[Cocoon](https://cocoon.apache.org/) (a web processing framework), XQuery (such as [BaseX](https://basex.org/) or [eXist-db](https://exist-db.org/exist/apps/homepage/index.html) engines) and XSLT ([Saxon](https://www.saxonica.com/documentation12/index.html#!functions/fn/transform)) to say nothing of batch scripts, shell scripts and “transformation scenarios” or the like, as offered by XML tools and toolkits. +Over time, we have seen repeated demonstrations of the principle of pipelining, iterative amelioration (as it might be described) or “licking into shape” as applied to document processing. Of course it proves easier to do a complicated task when it is broken into a series of simpler tasks. On Java alone, ways of deploying transformations and modifications into sequences of steps include at least [Apache Ant](https://ant.apache.org/), Apache Tomcat/[Cocoon](https://cocoon.apache.org/) (a web processing framework), XQuery (using engines such as [BaseX](https://basex.org/) or [eXist-db](https://exist-db.org/exist/apps/homepage/index.html)) and XSLT itself ([Saxon](https://www.saxonica.com/documentation12/index.html#!functions/fn/transform)), to say nothing of batch scripts, shell scripts and “transformation scenarios” or the like, as offered by XML tools and toolkits. -All can be disturbingly haphazard. In contrast to the varied stopgap solutions, XProc helps quite a bit by taking over from XSLT, to whatever extent necessary and useful, any aspects of processing that require any sort of interaction with the wider system. This way XSLT plays to its strengths, while XProc standardizes and simplifies how it works. On one hand, XProc enables XSLT when needed, while on the other XProc may enable us largely to do without it, offering both a useful feature set and the flexibility we need, but with less overhead, especially with regard to routine chores like designating sets of inputs and outputs, or sequencing operations. The principle of Least Power may well apply here: it saves our present and our future selves effort if we can arrange and manage to do fewer things less. XProc lets us do less. +All this can be disturbingly haphazard. In contrast, XProc offers a single unified approach using a standard declarative vocabulary specifically for dealing with process orchestration and I/O (inputs and outputs, i.e. interfaces). Thus it helps quite a bit by taking over from XSLT, to whatever extent necessary and useful, all those aspects of processing that require any sort of interaction with the wider system. This way XSLT plays to its strengths, while XProc standardizes and simplifies how it works. 
Consequently, XProc enables XSLT when needed, on the one hand, while on the other XProc may enable us largely to do without it, as it *additionally* offers its own useful feature set with regard to routine chores like designating sets of inputs and outputs, or sequencing operations. The principle of Least Power may well apply here: it saves our present and our future selves effort if we can arrange and manage to do fewer things less. XProc lets us do less. With XSLT together, this effect is magnified. XSLT lets us write less XProc, and XProc lets us write less XSLT. Together they are easier than either would be without the other to lighten the lift. @@ -65,15 +76,15 @@ XProc lets us use XSLT when we must, but also keeps routine and simple things bo ### Reflecting on XSLT -Programmers can think of XSLT as a domain-specific language (DSL) or fourth-generation language (4GL) designed for the purpose of manipulating data structures suitable for documents and messages as well as for structured data sets. As such, XSLT is highly generalized and abstract and can be applied to a very broad range of problems. Its main distinguishing feature among similar languages (which tend to be functional languages such as Scala and Scheme) is that it is optimized for use specifically with XML-based data formats, offering well-defined handling of information sets expressed in XML, while the language itself uses XML syntax, affording nice composability, reflection and code generation capabilities. XSLT's processing model is both broadly applicable, and workable in a range of environments including client software or within encapsulated, secure software configurations and deployments. +Programmers can think of XSLT as a domain-specific language (DSL) or fourth-generation language (4GL) designed for the purpose of manipulating data structures suitable for documents and messages as well as for structured data sets. As such, XSLT is highly generalized and abstract and can be applied to a very broad range of problems. Its main distinguishing feature among similar languages (which tend to be functional languages such as Scala and Scheme) is that it is optimized for use specifically with XML-based data formats, offering well-defined handling of information sets expressed in XML, while the language itself uses XML syntax, affording nice composability, reflection and code generation capabilities. XSLT's processing model is both broadly applicable, and workable in a range of environments from widely distributed client software, to encapsulated (“containerized”), secure software configurations and deployments. -If your XSLT is strong enough, you don't need XProc, or not much. But as a functional language, XSLT is best used in a functionally pure, “stateless” way that does not interact with the system: no “side effects”. This is related to its definitions of conformant processing (X inputs produce Y outputs) and the determinism, based in mathematical formalisms, that underlies its idea of conformance. However one cost of mathematical purity is that operations that do interact with stateful externalities – things such as reading and writing files – are not in XSLT's “comfort zone”. XSLT works by defining what a new structure A' should look like for any given structure A, using such terms as a conformant XSLT engine can then effectuate. But to turn an actual A into an actual A' we must first acquire A – or an effective surrogate thereof – and moreover make our A' available, in some form. 
XSLT leaves it up to its processor and “calling application” to handle this aspect of the problem – which they typically do by offering interfaces for an XSLT transformation's nominal *source* and (primary) *result*. Often, this gap has been bridged by extended functionality in processors. Does your processor read and parse XML files off the file system? Can it be connected to upstream data producers in different ways? Can it use HTTP `GET` and `PUT`? The answer may be Yes to any or all of these. Throughout its history, XSLT in later versions was also extended in this direction, with features such as the `collection()` function, `xsl:result-document`, `doc-available()` and other features we may not need if we are using XProc. +If your XSLT is strong enough, you don't need XProc, or not much. But as a functional language, XSLT is best used in a functionally pure, “stateless” way that does not interact with the system: no “side effects”. This is related to its definitions of conformant processing (X inputs produce Y outputs) and the determinism, based in mathematical formalisms, that underlies its idea of conformance. However one cost of mathematical purity is that operations that do interact with stateful externalities – operations such as reading and writing files – are not in XSLT's “comfort zone”. XSLT works by defining what a new structure A' should look like for any given structure A, using such terms as a conformant XSLT engine can then effectuate. But to turn an actual A into an actual A' we must first acquire A – or an effective surrogate thereof – and then make our A' available, in some form. XSLT leaves it up to its processor and “calling application” to handle this aspect of the problem – which they typically do by offering interfaces for an XSLT transformation's nominal *source* and (primary) *result*. Often, this gap has been bridged by extended functionality in processors. Does your processor read and parse XML files off the file system? Can it be connected to upstream data producers in different ways? Can it use HTTP `GET` and `PUT`? The answer may be Yes to any or all of these. Throughout its history, XSLT in later versions was also extended in this direction, with features such as the `collection()` function, `xsl:result-document`, `doc-available()` and other features we may not need if we are using XProc. -### Running XSLT without XProc +Much of this can be set aside when using XSLT with XProc, making the XSLT simpler and easier. -As a standard and an externally-specified technology, XSLT can in principle be implemented on any platform, but the leading XSLT implementation for some years has been Saxon, produced by Saxonica of Reading, England. Saxon has achieved market share and developer support on a record of strictly-conformant, performant applications, deployed as an open-source software product free for developers to use and integrate. (While doing this, Saxonica also has related product offerings including optimized processor for those who choose to support it.) +### Running XSLT without XProc -Download and run Saxon to apply XSLT to XML and other inputs, without XProc. +XSLT can also be run without XProc, often to exactly the same ends. But as you start addressing more complex requirements, you might find yourself reinventing XProc wheels in XSLT.... ## Using XSLT in XProc: avoiding annoyances @@ -81,52 +92,58 @@ If you are an experienced XSLT user, congratulations! 
The power XProc puts into There are a couple of small but potentially annoying considerations when embedding XSLT literals in your XProc code. They do not apply when your XSLT is called from out of line, acquired by binding to an input port or even `p:load`. If you acquire and even manipulate your XSLT without including literal XSLT code in your XProc, that eliminates the syntax-level clashes at the roots of both these problems. -### Text and attribute value syntax in embedded XSLT +### Namespaces in and for your XSLT -XSLT practitioners know that within XSLT, in attributes and (in XSLT 3.0) within text (as directed), the curly brace signs `{` and `}` have special semantics as [attribute](https://www.w3.org/TR/xslt-30/#attribute-value-templates) or [text value templates](https://www.w3.org/TR/xslt-30/#text-value-templates). In the latter case, the operation can be controlled with an `xsl:expand-text` setting. When effective as template delimiters, these characters can be escaped and hidden from processing by doubling them: `{{` for `{` etc. +[A subsequent Lesson Unit on namespaces in XProc](../oscal-convert/oscal-convert_350.md) may help newcomers or anyone mystified by XML namespaces. They are worth mentioning here because everything tricky in XProc regarding namespaces is doubly tricky with XSLT in the picture. -XProc offers a similar feature for expanding expressions dynamically, indicated with a `p:expand-text` setting much like XSLT's. +In brief: keep in mind XSLT has its own features for both configuring namespace-based matching on elements by name (such as `xpath-default-namespace`), and for managing namespaces in serialization (`exclude-namespace-prefixes`). In the XProc context, however, your XSLT will typically not be writing results directly, instead only producing the same kind of (XDM) tree as is emitted and consumed by other steps. -Because they both operate, an XSLT author must take care to provide for the correct escaping (sometimes more than one level) or settings on either language's `expand-text` option. Searching the repository for the string value `{{` (two open curlies together) will turn up instances of this – or try [a worksheet XProc with XSLT embedded](../../worksheets/NAMESPACE_worksheet.xpl). +### Text and attribute value syntax in embedded XSLT -### Namespaces in and for your XSLT +If not yet conversant with XSLT, you can read more about this in an [upcoming Lesson Unit](../oscal-convert/oscal-convert_102.md) on data conversion. -[A lesson unit on namespaces in XProc](../oscal-convert/oscal-convert_350.md) may help newcomers or anyone mystified by XML namespaces. They are worth mentioning here because everything tricky in XProc regarding namespaces is doubly tricky with XSLT in the picture. +XSLT practitioners know that within XSLT, in attributes and (in XSLT 3.0) within text (as directed), the curly brace signs `{` and `}` have special semantics as [attribute](https://www.w3.org/TR/xslt-30/#attribute-value-templates) or [text value templates](https://www.w3.org/TR/xslt-30/#text-value-templates). In the latter case, the operation can be controlled with an `xsl:expand-text` setting. When effective as template delimiters, these characters can be escaped and hidden from processing by doubling them: `{{` for `{` etc. -In brief: keep in mind XSLT has its own features for both configuring namespace-based matching on elements by name (such as `xpath-default-namespace`), and for managing namespaces in serialization (`exclude-namespace-prefixes`). 
In the XProc context, however, your XSLT will typically not be writing results directly, instead only producing the same kind of (XDM) tree as is emitted and consumed by other steps.

+XProc offers a similar feature for expanding expressions dynamically, indicated with a `p:expand-text` setting much like XSLT's.
+
+Because both features operate at once, an XSLT author must take care to provide for the correct escaping (sometimes more than one level) or settings on either language's `expand-text` option. Searching the repository for the string value `{{` (two open curlies together) will turn up instances of this – or skip ahead and try [a worksheet XProc with XSLT embedded](../../worksheets/NAMESPACE_worksheet.xpl).

## Learning XSLT the safer way

If setting out to learn XSLT, pause to read the following *short* list of things to which you should give early attention, in order:

-1. Namespaces in XML and XSLT: names, name prefixes, unprefixed names and the `xpath-default-namespace` setting (not available until XSLT 2.0)

+1. Namespaces in XML and XSLT: names, name prefixes, unprefixed names and the `xpath-default-namespace` setting (not available until XSLT 2.0).

-1. Templates and modes in XSLT: template matching, `xsl:apply-templates`, built-in templates, and using modes to configure default behaviors when no template matches

+1. Templates and modes in XSLT: template matching, `xsl:apply-templates`, built-in templates, and using modes to configure default behaviors when no template matches.

-1. XPath, especially absolute and relative location paths: start easy and work up

+1. XPath, especially absolute and relative location paths such as `/child::oscal:catalog` or `path/to/node[qualified(.)]`: start easy and work up.

-Understanding each of these will provide useful insights into XProc.

+Understanding each of these will also provide useful insights into XProc.

## XProc without XSLT?

-XProc does not require XSLT absolutely, even if XSLT is indispensable for some XProc libraries, including those in this repository.

+As noted, XProc does not require XSLT absolutely, even if XSLT is indispensable for some XProc libraries, including those in this repository. How could we do without it?

* Using XQuery any time queries get complicated

-* Use XProc where possible, for example steps that support matches on patterns? E.g. `p:insert`, `p:label-elements` and `p:add-attribute`
-* Reliance on iterators and `p:viewport`
-* Much smarter (declarative, data-centric) HTML or other dialect in the application space?

+* Use XProc where possible, for example steps that support matches on patterns for XSLT-like functionality. Such steps include `p:insert`, `p:label-elements`, `p:add-attribute` and others.
+* Similarly, reliance on iterators and `p:viewport`
+* High-level design and refactoring: using a smarter (declarative, data-centric) HTML or other dialect in the application space to simplify transformation requirements?

Chances are, there is a limit. One thing XSLT does better than almost any comparable technology is support generalized or granular mappings between vocabularies.

-So not only creating, but also consuming HTML, is the place we begin with XSLT. But since it is also very fine for other vocabulary mappings in the middle and back, it becomes indispensable almost as soon as it is available for use.

+Typically, the place we begin with XSLT is to create HTML for viewing from an XML source.
But since it is also very fine for other vocabulary mappings in the middle and back, it becomes indispensable almost as soon as it is available for use.

-An XSLT that is used repeatedly can of course always be encapsulated as an XProc step.

+An XSLT that is used repeatedly can be encapsulated as an XProc step.

## XProc, XDM (the XML data model) and the standards stack

-Another critical consideration is whether and to what extent XProc and XSLT introduce unwanted dependencies, which make them strategically not a good choice (or not a good choice for everyone) at least in comparison to alternatives. These are standards in every way including nominally, emerging as the work of organizations such as W3C and ISO, while not escaping a reputation as “boutique” or “niche” technologies. Yet alternative models – whether offered by large software vendors and service providers, or by forests of Javascript libraries, or a bespoke stack using a developers' favorite flavor of Markdown or microformats – have not all fared very well either. Often scorned, XSLT has a reputation for projects migrating away from it as much as towards it. Yet look closely, and when problems arise, XSLT is never the issue by itself. (A project choosing not to use XSLT because of a lack of understanding or skills is something differet.) Often the question is, were you even using the right tool? It helps when your application is within the sweet spot of document processing at scale (and there is a sweet spot), but even this is not an absolute rule. Sometimes the question is, are you actually fitting the capabilities of the processing model to the problem at hand. Too often, that fit happens by accident. Too often, other considerations prevail and compromises are made - then the resulting system is blamed.

+Another critical consideration is whether and to what extent XProc and XSLT introduce unwanted dependencies, which make them strategically not a good choice (or not a good choice for everyone) at least in comparison to alternatives. These are standards in every way including nominally, emerging as the work of organizations such as W3C and ISO, while not escaping a reputation as “boutique” or “niche” technologies. Yet alternative approaches to software development – whether offered by large software vendors and service providers, or by forests of Javascript libraries, or a bespoke stack using a developer's favorite flavor of Markdown or microformats – have not all fared very well either. Often spurned or ignored, XSLT has a reputation for projects migrating away from it as much as towards it. Yet look closely, and when problems arise, XSLT is never the issue by itself. (A project not able to use XSLT because of a lack of understanding or skills is something different.) Often the question is, were you even using the right tool? XSLT's reputation suffers when people decide not to use it or to migrate away. But no one talks about all the systems that take advantage of it quietly.

+The *Golden Hammer* is an [anti-pattern](https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Architecture/Anti-Patterns) – related to the **Silver Bullet** – but this does not make hammers superfluous. It helps when your application is within the sweet spot of XSLT and XProc's document processing at scale (and there is a sweet spot), but even this is not an absolute rule. Sometimes the question is, are you actually fitting the capabilities of the processing model to the problem at hand? Too often, that fit happens by accident.
Too often, other considerations prevail and compromises are made – then the resulting system is blamed. So where has XML-based processing been not only tenable but rewarding over the long term? Interestingly, its success is to be found often in projects that have survived across more than one system over time, that have grown from one system into another, and that have morphed and adapted and grown new limbs. In many cases, look at them today and you do not see the same system as you would have only five years ago.

+Systems achieve sustainability when they are not only stable, but adaptive. This is a fine balance, but one that can be found by an evolutionary process of development and experiment. XProc 3.0 and its supporting technologies show the results of such an evolution. The demonstration should be in its ease of use combined with capability and maintainability.

diff --git a/tutorial/sequence/Lesson03/oscal-convert_101.md b/tutorial/sequence/Lesson03/oscal-convert_101.md
index fe4a8e1..1e790c5 100644
--- a/tutorial/sequence/Lesson03/oscal-convert_101.md
+++ b/tutorial/sequence/Lesson03/oscal-convert_101.md
@@ -10,17 +10,17 @@ Learn how OSCAL data can be converted between JSON and XML formats, using XProc.

-Learn something about potential problems and limitations when doing this, and about how they can be detected, avoided, prevented or mitigated.

+Learn something about potential problems and limitations when doing this, and about how they can be detected and avoided, prevented or mitigated.

-Become familiar with the idea of generic conversions between syntaxes such as XML and JSON (not always possible), versus conversions capable of handling a single class or type of documents, such as OSCAL format conversions.

+Become familiar with the idea of generic conversions between syntaxes such as XML and JSON, versus conversions capable of handling a single class or type of documents, such as OSCAL – a limitation that can also provide for a fully defined mapping supporting lossless, error-free translation across syntaxes.

## Prerequisites

-You have succeeded in prior exercises, including tools installation and setup, and ready to run pipelines.

+Having succeeded in prior exercises, including tools installation and setup, you are ready to run pipelines.

## Resources

-This unit relies on the [oscal-convert project](../../../projects/oscal-convert/readme.md) in this repository, with its files. Like all projects in the repo, it aims to be reasonably self-contained and self-explanatory. The pipelines there (described below) provide rudimentary support for data conversions “in miniature” – infrastructures that might scale up.

+This unit relies on the [oscal-convert project](../../../projects/oscal-convert/readme.md) in this repository, with its files. Like all projects in the repo, it aims to be reasonably self-contained and self-explanatory. The pipelines there (described below) provide rudimentary support for data conversions, demonstrating simplicity and scalability.

As always, use your search engine and XProc resources to learn background and terminology.

@@ -32,94 +32,92 @@ The idea here is simple: run the pipeline and observe its behaviors, including n

### [GRAB-RESOURCES](../../../projects/oscal-convert/GRAB-RESOURCES.xpl)

-Like other pipelines with this name, run this to acquire resources. In this case, XSLTs used by other XProc steps are downloaded from their release page.

+Like other pipelines with this name, run this to acquire resources.
In this case, XSLTs used by other XProc steps are downloaded from the OSCAL release page. ### [BATCH-JSON-TO-XML](../../../projects/oscal-convert/BATCH_JSON-TO-XML.xpl) -This pipeline uses an input port to include a set of JSON documents, which it translates into XML using [generic semantics defined in XPath](https://www.w3.org/TR/xpath-functions-31/#json-to-xml-mapping). For each JSON file input, an equivalent XML file with the same filename base is produced, in the same place. +This pipeline uses an XProc input port to include a set of JSON documents, which it translates into XML using [generic semantics defined in XPath](https://www.w3.org/TR/xpath-functions-31/#json-to-xml-mapping). For each JSON file input, an equivalent XML file with the same filename base is produced, in the same place. As posted, the pipeline reads some minimalistic fictional data, which can be found in the system at the designated paths. -It then uses a pipeline step defined in an imported pipeline, which casts the data and stores the result for each JSON file input. In the XProc source, the imported step can be easily identified by its namespace prefix, different from the prefix given to XProc elements – and what is more important, under the nominal control of its developer or sponsor. +It then uses a pipeline step defined in an imported pipeline, which casts the data and stores the result for each JSON file input. In the XProc source, the imported step can be easily identified by its namespace prefix, different from the prefix given to XProc elements, as designated by the pipeline's developer or sponsor. -Follow the `p:import` link (via its `href` file reference) to find the step that is imported. Recognize a step by its `type` given at the top of the XML (as is described in more depth [in a subsequent lesson unit](oscal-convert_201.md)) +Follow the `p:import` link (via its `href` file reference) to find the step that is imported. An imported step is invoked by using its `type` name, given at the top of the XML (as is described in more depth [in a subsequent lesson unit](oscal-convert_201.md)). ### [BATCH-XML-TO-JSON](../../../projects/oscal-convert/BATCH_XML-TO-JSON.xpl) This pipeline performs the inverse of the JSON-to-XML batch pipeline. It loads XML files and converts them into JSON. -Note however how in this case, no guarantees can be made that any XML inputs will result in valid JSON. Many XML inputs will result in errors instead since only the XML vocabulary supporting JSON is considered well enough described for a comprehensive, rules-bound cast. Exception handling logic in the form of an XProc `p:try/p:catch` can be found in the imported pipeline step (which performs the casting). +Note however how in this case, no guarantees can be made that any XML inputs will result in valid JSON. Unless using the correct vocabulary, XML inputs will result in errors, as no comprehensive, rules-bound cast can be defined across its variations. To alleviate this problem, exception handling logic in the form of an XProc `p:try/p:catch` can be found in the imported pipeline step (which performs the casting). -Additionally, this variant has a fail-safe (looks for `p:choose`) that prevents the production of JSON from files not named `*.xml` – strictly speaking, this is only a naming convention, but respecting it prevents unseen and unwanted name collisions. It does *not* defend against overwriting any other files that happen to be in place with the target name. 
+Additionally, this variant has a fail-safe (look for `p:choose`) that prevents the production of JSON from files not named `*.xml` – strictly speaking, this is only a naming convention, but respecting it prevents unseen and unwanted name collisions. It does *not* defend against overwriting any other files that happen to be in place with the target name.

### [CONVERT-OSCAL-XML-DATA](../../../projects/oscal-convert/CONVERT-OSCAL-XML-DATA.xpl)

-The requirement that any XML to be converted must already be JSON-ready by virtue of conforming to a JSON-descriptive vocabulary, is obviously an onerous one. To achieve a clean, complete and appropriate recasting and relabeling of data, depending on its intended semantics, is .

+The requirement that any XML to be converted must already be JSON-ready by virtue of conforming to a JSON-descriptive vocabulary, is obviously an onerous one. Achieving a clean, complete and appropriate recasting and relabeling of data depends on its intended semantics being defined in a way capable of a JSON expression.

-OSCAL solves this problem by defining its XML and JSON formats in parity, such that a complete bidirectional conversion can be guaranteed over data sets already schema-valid. The bidirectional conversions themselves can be performed implicitly or overtly by tools that read and write OSCAL, or they can be deployed as XSLT transformations, providing for the conversion to be performed by any XSLT engine, in principle (that supports the required version of the language).

+OSCAL solves this problem by defining its XML and JSON formats in parity, such that a complete bidirectional conversion can be guaranteed over data sets already schema-valid. The bidirectional conversions themselves can be performed implicitly or overtly by tools that read and write OSCAL, or they can be deployed as XSLT transformations, providing for the conversion to be performed by any XSLT 3.0 engine.

-XProc has Saxon for its XSLT, so it falls into the latter category. The XSLTs in question can be acquired from an OSCAL release, as shown in the [GRAB-RESOURCES](../../../projects/oscal-convert/GRAB-RESOURCES.xpl) pipeline.

+For XSLT 3.0, XProc has Saxon. The XSLTs in question can be acquired from an OSCAL release, as shown in the [GRAB-RESOURCES](../../../projects/oscal-convert/GRAB-RESOURCES.xpl) pipeline.

-This pipeline applies one of these XSLTS to a set of given OSCAL XML files, valid to the catalog model, to produce JSON.
-
-It works on any XML file that has `catalog` as its root element, in the OSCAL namespace. It does *not* provide for validation of the input against the schema, which is might, as a defense. Instead, the garbage-in/garbage-out (GIGO) principle is respected. If the process breaks, currently this must be discovered in the result.

+[CONVERT-OSCAL-XML-DATA](../../../projects/oscal-convert/CONVERT-OSCAL-XML-DATA.xpl) applies one of these XSLTs to a set of given OSCAL XML files, valid to the catalog model, to produce JSON. It works on any XML file that has `catalog` as its root element, in the OSCAL namespace. It does *not* provide for validation of the input against the schema: instead, the garbage-in/garbage-out (GIGO) principle is respected. This means that some pipelines will run successfully while producing defective outputs, which must be discovered in the result (via formal validation and other checks). An XProc pipeline with a validation step preceding the conversion would present such errors earlier.
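As a sketch of that idea (the schema and stylesheet paths here are placeholders; `p:validate-with-xml-schema` and `p:xslt` are standard XProc 3.0 steps), validation placed before the conversion halts the pipeline with an error before any bad input reaches the converter:

```
<!-- throws a dynamic error on invalid input, halting the pipeline -->
<p:validate-with-xml-schema>
  <p:with-input port="schema" href="lib/oscal_catalog_schema.xsd"/>
</p:validate-with-xml-schema>

<!-- reached only by documents that passed validation -->
<p:xslt>
  <p:with-input port="stylesheet" href="lib/oscal_catalog_xml-to-json-converter.xsl"/>
</p:xslt>
```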
The reverse pipeline is left as an exercise. Bring valid OSCAL JSON back into XML. Let us know if you have prototyped this and wish for someone to check your work! ### [CONVERT-OSCAL-XML-FOLDER](../../../projects/oscal-convert/CONVERT-OSCAL-XML-FOLDER.xpl) -This pipeline performs the same conversion as [CONVERT-OSCAL-XML-DATA](../../../projects/oscal-convert/CONVERT-OSCAL-XML-DATA.xpl) with an important distinction: instead of designating its sources, it processes all XML files in a designated directory. - -## The playing field is the internet - -Keep in mind that XProc in theory, and your XProc engine in practice, may read its inputs using whatever protocols it supports, while the `file` and `http` protocols are required for conformance, and work as they do on the Worldwide Web. - -Of course, permissions must be in place to read files from system locations, or save files to them. - -But when authentication is configured or resources are openly available, using `http` to reach resources or sources can be a very convenient option. - -## More catalogs needed! - -As we go to press we have only one example OSCAL catalog to use for this exercise. - -Other valid OSCAL catalogs are produced from other projects in this repo, specifically [CPRT import](../../../projects/CPRT-import/) and [FM6-22-IMPORT](../../../projects/FM6-22-import/). Run the pipelines in those projects to produce more catalogs (in XML) useful as inputs here. +This pipeline performs the same conversion as [CONVERT-OSCAL-XML-DATA](../../../projects/oscal-convert/CONVERT-OSCAL-XML-DATA.xpl) with an important distinction: instead of designating its sources, it processes all XML files in a designated directory. ## Working concept: return trip -Here's an idea: a single pipeline that would accept either XML or JSON inputs, and produce both as outputs. Would that be useful? +Note in this context that comparing the inputs and results of a round-trip conversion is an excellent way of determining, to some base level, the correctness and validity of your data set. While converting it twice cannot guarantee that anything in your data is “true”, if having converted XML to JSON and back again to XML, the result looks the same as the original, you can be sure that your information is “correctly represented” in both formats. -Note in this context that comparing the inputs and results of a round-trip conversion is an excellent way of determining, to some base level, the correctness and validity of your data set – as an encoded representation of *something* (expressed in OSCAL), even if not a truthful representation of *anything* (whether expressible in OSCAL or not). +Here's an idea: a single pipeline that would accept either XML or JSON inputs, and produce either, or both, as outputs. Would that be useful? ## What is this XSLT? If your criticism of XProc so far is that it makes it look easy when it isn't, you have a point. -Conversion from XML to JSON isn't free, assuming it works at all. +Conversion from XML to JSON isn't free, assuming it works at all. The runtime might be effectively free, but developing it isn't. -In this case, the heavy lifting is done by the XSLT component - the Saxon engine invoked by the `p:xslt` step, applying logic defined in an XSLT stylesheet (aka transformation) stored elsewhere. It happens that a converter for OSCAL data is available in XSLT, so rather than having to confront this considerable problem ourselves, we drop in the solution we have at hand. 
+Here, the heavy lifting is done by the XSLT component - the Saxon engine invoked by the `p:xslt` step, applying logic defined in an XSLT stylesheet (aka **transformation**) stored elsewhere. It happens that a converter for OSCAL data is available in XSLT, so rather than having to confront this considerable problem ourselves, we drop in the solution we have at hand. -In later units we will see how using the XProc steps described, rudimentary data manipulations can be done using XProc by itself, without entailing the use of either XSLT or XQuery (another capability invoked with a different step). +In later units we will see how using the XProc steps described, rudimentary data manipulations can be done using XProc by itself, without entailing the use of either XSLT or XQuery. At the same time, while pipelines are based on the idea of passing data through a series of processes, there are many cases where logic is sufficiently complex that it becomes essential to maintain – and test – that logic externally from the XProc. At what point it becomes more efficient to encapsulate logic separately (whether by XSLT, XQuery or other means), depends very much on the case. -The `p:xslt` pipeline step in particular is so important for real-world uses of XProc that it is introduced early, to show such a black-box application. +The `p:xslt` pipeline step in particular is so important for real-world uses of XProc that it is introduced early, to show such a black-box application. There is also an [XQuery](https://spec.xproc.org/3.0/steps/#c.xquery) step – for many purposes, functionally equivalent. -XProc also makes a fine environment for testing XSLT developed or acquired to handle specific tasks, a topic covered in more depth later. +XProc also makes a fine environment for testing XSLT developed or acquired to handle specific tasks – and it can support automated testing and test-driven development using [XSpec](https://github.com/xspec/xspec/wiki). Indeed XSLT and XQuery being, like XProc itself, declarative languages, it makes sense to factor them out while maintaining easy access and transparency for analysis and auditing purposes. ## What could possibly go wrong? -Among the range of possible errors, syntax errors are relatively easy to cope with. But anomalous inputs, especially invalid inputs, can result in lost data. (A common reason data sets fail validation is the presence of foreign unknown contents, or contents out of place - the kinds of things that might fail to be converted.) The most important concern when engineering a pipeline is to see to it that no data quality problems are introduced inadvertantly. While in comparison to syntax or configuration problems, data quality issues can be subtle, there is also good news: the very same tools we use to process inputs into outputs, can also be used to test and validate data to both applicable standards and local rules. +Three things can happen when we run a pipeline: + +* The pipeline can fail to run, typically terminating with an error message (or, unusually, failing to terminate) +* The pipeline can run successfully, but result in incorrect outputs given the inputs +* The pipeline can run successfully and correctly + +Among the range of possible errors, those that show up in your console with error messages are the easy ones. This will typically be a syntax error or error in usage (providing the wrong kind of input, etc.), remediable by a developer. 
Sometimes it is an input resource, not the pipeline, that must be corrected, or a different pipeline developed for the different input. If XML is expected but not provided, a conforming processor must emit an error. Correct it or plan on processing plain text.
+
+The second category is much harder. The most important concern when engineering a pipeline is to see to it that no data quality problems are introduced inadvertently. Anomalous inputs might process “correctly” (for the input provided) but result in lost data or disordered results. Often this is obvious in testing, but not always. The key is defining and working within a scope of application (range of inputs) within which “correctness” can be specified, unambiguously and demonstrably, both with respect to the source data, and the processing requirement. Given such a specification, testing is possible. While in comparison to syntax or configuration problems, data quality issues can be subtle, there is also good news: the very same tools we use to process inputs into outputs, can also be used to test and validate data to both applicable standards and local rules.

Generally speaking, OSCAL maintains “validation parity” between its XML and JSON formats with respect to their schemas. That is to say, the XSD (XML schema) covers essentially the same set of rules for OSCAL XML data as the JSON Schema does for OSCAL JSON data, accounting for differences between the two notations, the data models and how information is mapped into them. A consequence of this is that valid OSCAL data, either XML or JSON, can reliably be converted to valid data in the other notation, while invalid data may come through with significant gaps, or not be converted at all.

-For this and related reasons on open systems, the working principle in XML is often to formalize a model (typically by writing and deploying a schema) as early as possible - or adopt a model already built - as a way to institute and enforce schema validation as a **prerequisite** and **primary requirement** for working with any data set. Validation against schemas is also supported by XProc, making it still easier to enforce this dependency.

+For this reason (as it applies to OSCAL) and related reasons on open systems (applying across the board, and not only to data conversions), the working principle in XML is often to define and formalize a model as early as possible – or identify and adopt a model already built – as a way to institute and enforce schema validation as a *prerequisite* and *primary requirement* for working with any data set. We do this by acquiring or writing and deploying schemas. To this end, XProc supports several kinds of schema validation including [XML DTD (Document Type Definition)](https://spec.xproc.org/lastcall-2024-08/head/validation/#c.validate-with-dtd), [XSD (W3C XML Schema Definition Language)](https://spec.xproc.org/lastcall-2024-08/head/validation/#c.validate-with-xml-schema), [RelaxNG (ISO/IEC 19757-2)](https://spec.xproc.org/lastcall-2024-08/head/validation/#c.validate-with-relax-ng), [Schematron](https://spec.xproc.org/lastcall-2024-08/head/validation/#c.validate-with-schematron) and [JSON Schema](https://spec.xproc.org/lastcall-2024-08/head/validation/#c.validate-with-json-schema), making it straightforward to enforce this dependency at any point in a pipeline, whether applied to inputs or to pipeline results including interim results and pipeline outputs.
Resource validation is described further in subsequent coverage including the next [Maker lesson unit](oscal-convert_102.md). + +### The playing field is the Internet -### Intercepting errors +File resources in XProc are designated and distinguished by URIs. Keep in mind that XProc in theory, and your XProc engine in practice, may read its inputs using whatever [URI schemes](https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml) it supports, while the schemes `file` and `http` (or `https`) are required for conformance, and work as they do on the Worldwide Web. -One way to manage the problem of ensuring input quality is to validate on the way in, either as a dependent (prerequisite) process, or built into a pipeline. Whatever you want to do with invalid inputs, including ignoring them and producing warnings or runtime exceptions, can be defined in a pipeline much like anything else. +Of course, permissions must be in place to read files from system locations, or save files to them. When authentication is configured or resources are openly available, using `http` to reach resources or sources can be a very convenient option. -In the [publishing demonstration project folder](../../../projects/oscal-publish/publish-oscal-catalog.xpl) is an XProc that valides XML against an OSCAL schema, before formatting it. The same could be done for an XProc that converts the data into JSON - either or both before or after conversion. +While this is important and powerful, it comes with complications. Internet access is not always a given, making such runtime dependencies fragile. XML systems that rely on URIs frequently also support one or another kind of URI indirection, such as [OASIS XML Catalogs](https://www.oasis-open.org/committees/entity/spec-2001-08-06.html), to enable resource management, redirection and local caching of standard resources. For the XProc developer, this can be a silent source of bugs, hard to find and hard to duplicate and analyze. The [next lesson unit](oscal-convert_102.md) describes some functions that can be used to provide the transparency needed. -Learn more about recognizing and dealing with errors in [Lesson 102](oscal-convert_102.md), or continue on to the next project, oscal-validate, for more on validation of documents and sets of documents. +## More catalogs needed! + +As we go to press we have only one example OSCAL catalog to use for this exercise. + +Other valid OSCAL catalogs are produced from other projects in this repo, specifically [CPRT import](../../../projects/CPRT-import/) and [FM6-22-IMPORT](../../../projects/FM6-22-import/). Run the pipelines in those projects to produce more catalogs (in XML) useful as inputs here. diff --git a/tutorial/sequence/Lesson03/oscal-convert_102.md b/tutorial/sequence/Lesson03/oscal-convert_102.md index eb3a4e1..8fd9989 100644 --- a/tutorial/sequence/Lesson03/oscal-convert_102.md +++ b/tutorial/sequence/Lesson03/oscal-convert_102.md @@ -10,13 +10,13 @@ Learn how OSCAL data can be converted between JSON and XML formats, using XProc. -Learn something about potential problems and limitations when doing this, and about how to detect, avoid, prevent or mitigate them. +Learn about potential problems and limitations when doing this, and about how to detect, avoid, prevent or mitigate them. -Work with XProc features designed for handling JSON data (XDM **map** objects that can be cast to XML). +Learn something about XProc features designed for handling JSON data (XDM **map** objects that can be cast to XML). 
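As a small preview of what that means in practice – the file name below is a placeholder, while `p:load` and `p:cast-content-type` are standard XProc 3.0 steps – a JSON document can be brought into a pipeline and recast as XML in two steps:

```
<!-- a JSON input arrives in the pipeline as an XDM map -->
<p:load href="data/example.json"/>
<!-- recast it as markup, following the XPath json-to-xml() mapping -->
<p:cast-content-type content-type="application/xml"/>
```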
## Prerequisites -Run the pipelines described in [the 101 Lesson](https://github.com/usnistgov/oscal-xproc3/discussions/18) +Run the pipelines described in [the 101 Lesson Unit](oscal-convert_101.md) in this topic. ## Resources @@ -26,24 +26,24 @@ Same as the [101 lesson](oscal-convert_101.md). Every project you examine provides an opportunity to alter pipelines and see how they fail when not encoded correctly – when “broken”, any way we can think of breaking them. Then build good habits by repairing the damage. Experiment and observation bring learning. -After reading this page and [the project readme](../../../projects/oscal-convert/readme.md), run the pipelines while performing some more disassembly / reassembly. Here are a few ideas (including a few you may have already done): +After reading this page and [the project readme](../../../projects/oscal-convert/readme.md), run the pipelines while performing some more disassembly / reassembly. Here are a few ideas: * Switch out the value of an `@href` on a `p:document` or `p:load` step. See what happens when the file it points to is not actually there. * There is a difference between `p:input`, used to configure a pipeline in its prologue, and `p:load`, a step that loads data. Ponder what these differences are. Try changing a pipeline that uses one into a pipeline that uses the other. -* Similarly, there is a difference between a `p:output` configuration for a pipeline, and a `p:store` step executed by that pipeline. Consider this difference and how we might define a rule for when to prefer one or the other. How is the pipeline used - is it called directly, or intended for use as a step in other pipelines? How is it to be controlled at runtime? +* Similarly, there is a difference between a `p:output` configuration for a pipeline, and a `p:store` step executed by that pipeline. Consider this difference and how we might define a rule for when to prefer one or the other. How is the pipeline used – is it called directly, or intended for use as a step in other pipelines? How is it to be controlled at runtime? * Try inserting `p:store` steps into a pipeline to capture intermediate results, that is, the output of any step before they are processed by the next step. Such steps can aid in debugging, among other uses. * `@message` attributes on steps provide messages for the runtime traceback. They are optional but this repo follows a rule that any `p:load` or `p:store` should be provided with a message. Why? -* A `p:identity` step passes its input unchanged to the next step. But can also be provided with a `@message`. +* A `p:identity` step passes its input unchanged to the next step. It can also be provided with a `@message`. The two commonest uses of `p:identity` are probably to provide for a “no-op” option, for example within a conditional or try/catch – and to provide runtime messages to the console. After breaking anything, restore it to working order. Create modified copies of any pipelines for further analysis and discussion. * Concept: copy and change one of the pipelines provided to acquire a software library or resource of your choice. 
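Relating to the last few suggestions in this list: a fragment like the following (paths and messages are illustrative only) can be dropped between any two steps of a pipeline being studied, without otherwise changing its behavior:

```
<!-- a no-op that reports progress: remove it and nothing changes -->
<p:identity message="Checkpoint reached: {base-uri(.)}"/>

<!-- snapshot the intermediate result for inspection, then continue -->
<p:store href="temp/debug-snapshot.xml" message="Writing temp/debug-snapshot.xml"/>
```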
-## Value templates in attributes and text: { expr }

+## Value templates in attributes and text: { XPath-expr }

-Practitioners of XQuery, XSLT and related technologies will recognize the curly-bracket characters (U+007B and U+007D) as indicators of [attribute value templates](https://www.w3.org/TR/xslt-10/#dt-attribute-value-template), [text value templates](https://www.w3.org/TR/xslt-30/#text-value-templates), or [enclosed expressions](https://www.w3.org/TR/xquery-31/#id-enclosed-expr). The expression within the braces is to be evaluated dynamically by the processor. This is one of the most useful convenience features in the language.

+Practitioners of XQuery, XSLT and related technologies will recognize the curly-bracket characters (U+007B and U+007D) as indicators of [attribute value templates](https://www.w3.org/TR/xslt-10/#dt-attribute-value-template), [text value templates](https://www.w3.org/TR/xslt-30/#text-value-templates), or [enclosed expressions](https://www.w3.org/TR/xquery-31/#id-enclosed-expr). The expression within the brackets is to be evaluated dynamically by the processor. This is one of the most useful convenience features in the language.

-These quickly become invisible. Upon seeing

+[This syntax](https://spec.xproc.org/3.0/xproc/#value-templates) is concise but expressive. Upon seeing:

```

@@ -51,16 +51,18 @@ the XProc developer understands:

-* The date, in some form (try it and see) should be written into the message
-* The variable reference `$filename` is defined somewhere, and here will expand to a string

+* The date, in some form, should be written into the message. (Try it and see.) The XPath function [format-date](https://www.w3.org/TR/xpath-functions-31/#func-format-date) can also be used if we want a different format: for example, `current-date() => format-date('[D] [MNn] [Y]')`.
+* The variable reference `$filename` is defined somewhere, and here will expand to a string value due to the operation of the (attribute value) template.

-If you need to see actual curly braces, escape by doubling: `{{` for the single open and `}}` for the single close.

+If you need to see actual curly brackets, escape by doubling: `{{` for the single open and `}}` for the single close.

-Extra care must be taken with embedded XSLT and XQuery due to this feature, since their functioning will depend on correctly interpreting these within literal code. Yes, double escaping is sometimes necessary. (This can be tried with [a worksheet XProc](../../worksheets/NAMESPACE_worksheet.xpl).)

+One complication arises: because XSLT and XQuery support similar syntax, clashes can occur, since their functioning will depend on correctly interpreting the syntax within literal code. Yes, this means double escaping is sometimes necessary. (This can be tried with [a worksheet XProc](../../worksheets/NAMESPACE_worksheet.xpl).)

-Setting `expand-text` to `false` on an XProc element turns this behavior off: the braces become regular braces again. [The spec also describes](https://spec.xproc.org/3.0/xproc/#expand-text-attribute) a `p:inline-expand-text` attribute that can be used in places (namely inside literal XML provided in your XProc using `p:inline`) where the regular expand-text has no effect. Either setting can be used inside elements already set, resulting in “toggling” behavior (it can be turned on and off), as any `expand-text` applies to override settings on its ancestors.
+Alternatively, setting `expand-text` to `false` on an XProc element turns this behavior off: the brackets become regular brackets again. [The spec also describes](https://spec.xproc.org/3.0/xproc/#expand-text-attribute) an attribute `p:inline-expand-text` that can be used in places where the regular `expand-text` would interfere with a functional requirement (namely the representation of literal XML provided in your XProc using `p:inline`). Either of these settings can be used inside elements already set, resulting in “toggling” behavior (it can be turned on and off), as any `expand-text`, by applying to descendants, overrides settings on its ancestors. -## Designating an input at runtime by binding input ports +For the most part it is enough to know that the `expand-text` setting is “on” (`true`) by default, but it can be turned off (`false`) – and (for handling edge cases) back on, lower down in the hierarchy. + +## Designating inputs One potential problem with the pipelines we have looked at so far is that their inputs are hard-wired. While this is sometimes helpful, it should also be possible to apply a pipeline to an XML document (or other input) without having to designate the document inside the pipeline itself. The user or calling application should be able to say “run this pipeline, but this time with this input”. @@ -74,7 +76,7 @@ For example, the [CONVERT-OSCAL-XML-DATA](../../../projects/oscal-convert/CONVER ``` -By default, this pipeline will pick up and process the data it finds at path `data/catalog-model/xml/cat_catalog.xml`, relative to the stylesheet. But any call to this pipeline, whether directly or as a step in another pipeline, can override this. +By default, this pipeline will pick up and process the data it finds at path `data/catalog-model/xml/cat_catalog.xml`, relative to the pipeline instance (XProc file). But any call to this pipeline, whether directly or as a step in another pipeline, can override this. The Morgana processor defines [a command syntax for binding inputs to ports](https://www.xml-project.com/manual/ch01.html#R_ch1_s1_2). It looks like this (when used with the script deployed with this repository): @@ -82,13 +84,19 @@ The Morgana processor defines [a command syntax for binding input $ ../xp3.sh *PIPELINE.xpl* -input:*portname=path/to/a-document.xml* -input:*portname=path/to/another-document.xml* ``` -Here, two different `-input` arguments are given for the same port. You can have as many as needed if the port, like this one, has `sequence="true"`, meaning any number of documents (from zero to many) can be bound to the port, and the pipeline will accommodate. When more than one port is defined, one (only) can be designated as `primary="true"`, meaning it will be provided implicitly when a port connection is required (by a step) but not given in the pipeline. Notice that the name of the port must also appear, as in `-input:portname`, since pipelines can have ports supporting sequences, but also as many input ports as it needs, named differently, for documents playing different roles in the pipeline. In place of `portname` here, a common name for a port (conventional when it is the pipeline's only or primary input) is `source`. +Here, two different `-input` arguments are given for the same port. You can have as many as needed if the port, like this one, has `sequence="true"`, meaning any number of documents (zero, one or more) can be bound to the port, and the pipeline will accommodate. 
When more than one port is defined for a pipeline, one (only) can be designated as `primary="true"`, allowing it to be provided implicitly when a port connection is required (by a step) but not given in the pipeline. Notice that the name of the port must also appear in the command argument, as in `-input:portname`, since while pipelines can have ports supporting sequences, they will also have different ports, named differently, for documents playing different roles in the pipeline.
+
+In place of `portname` here, a common name for a port (conventional when it is the pipeline's only or primary input) is `source`. But you can also expect to see ports (especially secondary ports) with names like `schema`, `stylesheet` and `insertion`: port names that offer hints as to what the step does.
+
+A port designated with `sequence="true"` can be empty (no documents at all) and a process will run. But by default a single document is both expected and required.
+
+Among other things, this means that a pipeline that has `<p:input port="x"/>`, since it is not a sequence but also has no document, cannot be run unless a (single) document for the `x` port (as it is named here) is provided when it is invoked.

-### Binding to input ports vs p:load steps

+### Lightening the `p:load`

-XProc offers two ways to acquire data from outside the pipeline: by using `p:load` or by binding inputs to an input port using `p:input/p:document`. These are somewhat different in operation - errors produced by `p:load` cannot be detected until the pipeline is run, whereas failures with `p:input` should be detected when the pipeline itself is loaded and compiled (i.e. during *static analysis*), and processors may be able to apply different kinds of exception handling, fallbacks or support for redirects. (As always you can try, test and determine for yourself.) Apart from this distinction the two approaches have similar effects – whether to use one or the other depends often on how you expect the pipeline to be used and distributed, not on whether it works.

+As an alternative to binding inputs using `p:input/p:document` (on a pipeline definition) or `p:with-input` (on a step invocation), XProc offers another way to acquire data from outside the pipeline: by using a `p:load` step. This is somewhat different in operation: as it is a step in itself, errors produced by `p:load` cannot be detected until the pipeline is run, whereas failures with `p:input` should be detected when the pipeline itself is loaded and compiled (i.e. during *static analysis*), and processors may be able to apply different kinds of exception handling, fallbacks or support for redirects. (As always you can try, test and determine for yourself.) Apart from this distinction the two approaches have similar effects – whether to use one or the other depends often on how you expect the pipeline to be used, distributed, and maintained, since either can work in operation.

-Although one distinction is that p:document appears on input ports, which can be overridden, this does not mean that p:document can't be essentially “private” to a pipeline or pipeline step. For example, if you wish to acquire more than a single document, without p:load, known in advance (i.e. the file names can be hard-coded), make a step like this:

+Although one distinction is that `p:document` appears on input ports, which can be overridden (or rather, set dynamically), this does not mean that `p:document` cannot be essentially “private” to a pipeline or pipeline step.
For example, if you wish to acquire, without `p:load`, more than a single document known in advance (i.e. the file names can be hard-coded), provide your step (`p:identity` in this case) with inputs like so:

```

@@ -100,9 +108,9 @@ Although one distinction is that p:document appears on input ports, which can be

```

-This binds the documents to the input of an **identity** step (which supports a sequence), without exposing an input port in the main pipeline.

+This binds the documents to the input of the step (as `p:identity` supports a sequence, more than one is fine), without exposing an input port in the main pipeline.

-A more dynamic approach is sometimes useful: first, acquire a list of file names, for example:

+Combining the approaches permits another useful capability: first, acquire a list of file names, for example (here using `p:input/p:inline`):

```

@@ -130,37 +138,25 @@ One tradeoff is that the override mechanism will be different. We override the f

```

This makes the second approach especially appealing if the file list can be derived from some kind of metadata resource or, indeed, `p:directory-list`….

-## Identity pipeline testbed
-
-An identity or “near-identity” or modified-identity pipeline has its uses, including diagnostics. Since inputs and outputs are supposed to look the same, any changes they show between inputs and outputs can be revealing.
-
-They are also useful for testing features in your environment or setup, for example features for resource acquisition and disposition, that is, how you get data into your pipeline and then out again.
-
-Additionally, there are actually useful operations supported by a pipeline that presents its input unchanged with respect to its model. For example, it can be used to transcode a file from one encoding to another – changing nothing in the data, but rewriting it into a different character set. This is because with XProc, transcoding does not actually happen within the pipeline, but on its boundaries - when a file is read, or written (aka serialized). So internally, a pipeline set up to do this doesn't have any action to take.

+## Warning: do you know where your source files are?

-### 0.01 - what is a “document”

+As noted in the [101 Lesson Unit](oscal-convert_101.md), one of the advantages of using URIs, over and above the Internet itself, is that systems can support URI redirection when appropriate. This will ordinarily be in order to provide local (cached) copies of standard resources, thereby mitigating the need for copying files over the Internet. While this is a powerful and useful feature – arguably essential for systems at scale – it can present problems for transparency and debugging if the resource obtained by reference to a URI is not the same as the developer (or “contract”) expects.

-Just about any kind of digital input can be an XProc document. Keeping things simple and regular, XProc's concept of document is broad enough to encompass XML, HTML, JSON and other kinds of inputs including plain text and binaries. [Read more here](oscal-convert_402.md).

+A similar problem results from variations in URI syntax, both due to syntax itself and due to the fact that URIs can be relative file paths, so `file.xml` and `../file.xml` could be the same file, or not, depending on the context of evaluation.
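One way to see this concretely – a sketch only, usable in any XProc 3.0 pipeline context – is to force both references into absolute form and compare what comes back:

```
<!-- both relative references resolve against this element's base URI -->
<p:variable name="here" select="resolve-uri('file.xml', static-base-uri())"/>
<p:variable name="there" select="resolve-uri('../file.xml', static-base-uri())"/>
<p:identity message="Comparing {$here} with {$there}"/>
```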
-### 0.1 - loading documents known or discovered in advance +To help avoid or manage problems resulting from this (i.e., from features as bugs), XPath and XProc offer some useful functions: -The XProc step `p:load` can be used to load the resource indicated into the pipeline. +* XPath [resolve-uri()](https://www.w3.org/TR/xpath-functions-31/#func-resolve-uri) can be used to expand a relative URI into an absolute URI +* XProc [p:urify](https://spec.xproc.org/3.0/xproc/#f.urify) will normalize URIs and rewrite file system paths as URIs – very useful. +* In XProc 3.1, a new function [p:lookup-uri](https://spec.xproc.org/lastcall-2024-08/head/xproc/#f.lookup-uri) can query the processor's URI resolver regarding a URI, without actually retrieving its resource. This makes available to the developer what address is actually to be used when a URI is followed – detecting any redirection – and permits defensive code to be written when appropriate. -Watch out, since `p:load` with `href=""` – loading the resource at the location indicated by the empty string, `""` – will load the XProc file itself. This is conformant with rules for URL resolution. +## Probing error space – data conversions -### 0.2 - binding a document to an input port +Broadly speaking, problems encountered running these conversions (or indeed, transformations in general) fall into two categories, the distinction being simple, namely whether a bad outcome is due to an error in the processor and its logic, or in the data inputs provided. The term “error” here hides a great deal. So does “bad outcome”. One type of bad outcome takes the form of failures at runtime – the term “failure” again leaving questions open, while at the same time it seems fair to assume that not being able to conclude successfully is a bad thing. But other bad outcomes are not detectable at runtime. If inputs are bad (inconsistent with stated contracts such as data validation), processes can run *correctly* and deliver incorrect results: correctly representing inputs, in their incorrectness. Again, the term *correct* here is underspecified and underdefined, except in the case. -### 0.3 - loading documents discovered dynamically with `p:directory-list` +For these and other reasons we sometimes prefer to call them “exceptions”, while at the same time we know many errors are not actually errors in the process but in the inputs. We need reliable ways to tell this difference. A library of reliable source examples -- a test suite – is one asset that helps a great deal. Even short of unit tests, however, a great deal can be discovered when working with “bad inputs” interactively. This knowledge is especially valuable once we are dealing with examples that are only “normally bad”. -### 0.4 - saving results to the file system - -### 0.5 - exposing results on an output port - -## Probing error space - data conversions - -Broadly speaking, problems encountered running these conversions fall into two categories, the distinction being simple, namely whether a bad outcome is due to an error in the processor and its logic, or in the data inputs provided. The term “error” here hides a great deal. So does “bad outcome”. One type of bad outcome takes the form of failures at runtime - the term “failure” again leaving questions open, while at the same time it seems fair to assume that not being able to conclude successfully, is bad. But other bad outcomes are not detectable at runtime. 
If inputs are bad (inconsistent with stated contracts such as data validation), processes can run *correctly* and deliver incorrect results: correctly representing inputs, in their incorrectness. Again, the term *correct* here is underspecified and underdefined, except in the case. - -For these and other reasons we sometimes prefer to call them “exceptions”, while at the same time we know many errors are not actually errors in the process but in the inputs. We need reliable ways to tell this difference. A library of reliable source examples -- a test suite – is one asset that helps a great deal. Even short of unit tests, however, a great deal can be discovered when working with “bad inputs” interactively. +Some ideas on how to do this appear below. ### Converting broken XML or JSON @@ -170,30 +166,58 @@ Create a syntactically-invalid (not **well-formed**) XML or JSON document - or r ### Converting not-OSCAL -XML practitioners understand how XML can be well-formed and therefore legible for processing, without being a valid instance of a specific markup vocabulary. You can have XML, for example, without having OSCAL. +XML practitioners understand how XML can be well-formed and therefore legible for processing, without being a valid instance of a specific markup vocabulary. You can have XML, for example, without having OSCAL. This was discussed in [the previous lesson unit](oscal-convert_101.md). + +But a hands-on appreciation, through experience, of how this actually looks, is better than a merely intellectual understanding of why it must be. + +When providing XML that is not OSCAL to a process that expects OSCAL inputs, you should properly see either errors (exceptions), or bad results (outputs missing or wrongly expressed) or both. *A tutorial is the perfect opportunity to experiment and see.* -When providing XML that is not OSCAL to a process that expects OSCAL inputs, you should properly see either errors (exceptions), or bad results (outputs missing or wrongly expressed) or both. *Experiment and see!* +For example, try using the OSCAL XML-to-JSON pipeline on an XProc document (which is XML, but not OSCAL). -Detection of bad results is an important capability - why we have validation against external constraint sets such as schemas. A later unit will cover this – meanwhile, inquiries on the topic are welcome. +The interesting thing here is how permissive XProc is, unless we code it to be jealous. Detection of bad results is an important capability, which is why we also need to be able to *validate* data against external constraint sets such as schemas, also covered in more detail later. ### Converting broken OSCAL The same thing applies to attempting to process inputs when OSCAL is expected, yet the data sources fail to meet requirements in some important respect, sometimes even a subtle requirement, depending on the case. The more fundamental problem here is the definition of “correct” versus “broken”. -We begin generally with the stipulation that by “OSCAL” what we mean is, any XML (or JSON or YAML) instance conformant to an OSCAL schema, and thereby defined in such a manner as to enable their convertibility. The reasoning is thus somewhat circular. If we can convert it successfully, we can claim to know it as OSCAL (by virtue of the knowledge we demonstrate in the conversion). If we know it to be OSCAL by virtue of schema validation, we have assurances also regarding its convertibility. 
+We begin generally with the stipulation that by “OSCAL” what we mean is, any XML (or JSON or YAML) instance conformant to an OSCAL schema, and thereby defined in such a manner as to enable its convertibility. The reasoning is thus somewhat circular. If we can convert it successfully, we have a basis to claim it is OSCAL, by virtue of its *evident* conformance to OSCAL models in operation. If we know it to be OSCAL by virtue of schema validation, we have assurances also regarding its convertibility.

-This is because with respect to these model-based conversions, the OSCAL project also offers tools that can convert any schema-valid OSCAL XML into equivalent schema-valid JSON, while doing the same the other way, making OSCAL XML from OSCAL JSON. In either case, schema validation is invaluable for defining the boundaries of the conversion itself. Data that is not schema-valid, it is reasoned, cannot be qualified or described at all, so no straightforward mapping from arbitrary inputs can be specified. But a mapping can be specified for inputs that are known, namely OSCAL inputs. The converter respects the validation rule not by enforcing it directly, but rather by depending on it.

+In contrast, data that is not schema-valid (as can be reasoned) cannot be *confidently* and *completely* qualified or described at all, so only very simple (“global”, generic or “wildcard”) mappings from arbitrary inputs can be specified. But a mapping can be specified for inputs that are known, such as OSCAL inputs. An OSCAL converter respects the validation rules not by enforcing them directly, but rather by depending on the consistency they describe and constrain.

-Fortunately, by means of Schematron and transformations, XProc is an excellent tool not only for altering data sets, but also for detecting variances, either in inputs or its results, from any specifications that can be expressed in XPath. These capabilities – detection and amelioration – can be used together, and separately. When a pipeline cannot guarantee correct outputs, it can at least provide feedback.

+Fortunately, by means of Schematron and transformations, XProc is an excellent tool not only for altering data sets, but also for imposing such validation rules, by detecting variances, either in inputs or its results. XPath, the query language, becomes key. With XPath to identify features (both good and bad), and XProc for modifications, these capabilities – detection and amelioration – can be used together, and separately. When a pipeline cannot guarantee correct outputs, it can at least provide feedback.

-Altering XML to “break” it in various subtle ways is likely to happen by accident. Get used to the feeling by *making it happen* on purpose.

+Depending on the application and data sources, XML that is “broken” in various subtle ways is more or less inevitable. See what it looks like by making this happen on purpose.

## XProc diagnostic how-to

+These methods are noted above, but they are so important they should not be skipped.
+
### Emitting runtime messages

+Most XProc steps support a `message` attribute for designating a message to be emitted to the console or log. As shown, these also support attribute value template syntax for dynamic evaluation of XPath.
+
+For example, again using `p:identity`:
+
+```
+<p:identity message="checkpoint: {base-uri(.)} ({p:document-property(., 'content-type')})"/>
+```
+
+This step does not change the document, but reports its current Base URI and content-type at that point in the pipeline.
+ +This can be useful information since both those properties can (and should) change based on your pipeline's operations. + ### Saving out interim results -`p:store` +Learn to use the `p:store` step, if only because it is so useful for saving interim pipeline results to a place where they can be inspected. + +[Produce-FM6-22-chapter4](../../../projects/FM6-22-import/PRODUCE_FM6-22-chapter4.xpl) is a demonstration pipeline in this repo with a switch at the top level, in the form of an option named `writing-all`. When set to `true()`, it has the effect of activating a set of `p:store` steps within the pipeline using XProc's [use-when](https://spec.xproc.org/3.0/xproc/#use-when) feature, to write intermediate results. The resulting set of files is written into a `temp` directory to keep them separate from final results: they show the changes being made over the input data set, at useful points for tracing the pipeline's progress. ## Validate early and often + +One way to manage the problem of ensuring quality is to validate the inputs before processing, either as a dependent (prerequisite) process, or built into a pipeline. This enables a useful separation between problems resulting from bad inputs, and problems within the pipeline. Whatever you want to do with invalid inputs, including skipping or ignoring them, producing warnings or runtime exceptions, or even making corrections when possible and practical – all this can be defined in a pipeline much like anything else. + +Keep in mind that since XProc brings support for multiple schema languages plus XPath, “validation” could mean almost anything. This must be determined case by case. + +In the [publishing demonstration project folder](../../../projects/oscal-publish/publish-oscal-catalog.xpl) is an XProc that validates XML against an OSCAL schema before running steps to convert it to HTML for display in a browser. The same could be done for an XProc that converts OSCAL data into JSON – since OSCAL has both an XSD for XML and a JSON Schema for JSON, this could be done before the conversion, after, or both. + +Two projects in this repository (at time of writing) deal extensively with validation: [oscal-validate](../../../projects/oscal-validate/) and [schema-field-tests](../../../projects/schema-field-tests/). diff --git a/tutorial/sequence/Lesson03/oscal-convert_201.md b/tutorial/sequence/Lesson03/oscal-convert_201.md index 53192dd..ccb53ac 100644 --- a/tutorial/sequence/Lesson03/oscal-convert_201.md +++ b/tutorial/sequence/Lesson03/oscal-convert_201.md @@ -22,7 +22,7 @@ Pipelines throughout the repository serve as examples for the description that f ## XProc as XML -An XProc pipeline takes the form of an XML “document entity”. Unless you are concerned to write an XML parser (which is not very likely for XProc's natural constituency), this typically means an XML file, that is to say a file encoded in plain text (typically the UTF-8 serialization of Unicode, or alternatively another form of “plain text” supported by your system or environment), and following the rules of XML syntax. These rules include how elements and attributes and other XML features are encoded in **tags** that +An XProc pipeline takes the form of an XML “document entity”.
Unless you are concerned to write an XML parser (which is not very likely for XProc's natural constituency), this typically means an XML file, that is to say a file encoded in plain text (typically the UTF-8 serialization of Unicode, or alternatively another form of “plain text” supported by your system or environment), and following the rules of XML syntax. These rules include how elements and attributes and other XML features are encoded in **tags** that: * Follow the rules with respect to naming, whitespace, delimiters and reserved characters * Are correctly balanced, with an end tag for every start tag – for a `<start>` there must be a `</start>` (an end to the start). @@ -74,7 +74,7 @@ On `p:declare-step`, whether at the top or in a step definition within a pipelin The name makes it possible to reference the step by name. This is often useful and sometimes more or less essential, for example for providing input to one step from another step's output. (We say “more or less essential” because the processor will produce names for itself as a fallback, if it needs them, but these are brittle, somewhat opaque – such as `!1.2.3` – and more difficult to use than the names a developer gives.) -Understandably, the name of an XProc must be different from the names given to all the steps in the XProc (which must also be distinct). +Understandably, the name of an XProc must be different from the names given to all the steps in the XProc (which must also be distinct). This repository follows a rule that a step name should correspond to its file base name (i.e., without a filename suffix), so `identity_` for `identity_.xproc`, etc. But that is a rule for us, not for XProc in general. @@ -107,7 +107,7 @@ After imports, prologue and (optional) step declarations, the step sequence that One other complication: among the steps in the subpipeline, `p:variable` (a variable declaration) and `p:documentation` (for out-of-band documentation) are also permitted – these are not properly steps, but can be useful to have with them. -In summary: any XProc pipeline, viewed as a step declaration, can have the following -- +In summary: any XProc pipeline, viewed as a step declaration, can have the following: * Pipeline name and type assignment (if needed), given as attributes at the top * **Imports**: step declarations, step libraries and functions to make available @@ -190,7 +190,7 @@ But this is an important category, since such extensions may include XProc steps One answer: The [XSpec smoke test](./../../../smoketest/TEST-XSPEC.xpl) calls an extension step named `ox:execute-xspec`, defined in an imported pipeline. In this document, the prefix `ox` is bound to a utility namespace, `http://csrc.nist.gov/ns/oscal-xproc3`.
-In an XProc pipeline (library or step declaration) one may also see additional namespaces, including +In an XProc pipeline (library or step declaration) one may also see additional namespaces, including: * The namespaces needed for XSLT, XSD, or other supported technology * One or more namespaces deployed by the XProc author to support either steps or internal operations (for example, XSLT functions) diff --git a/tutorial/sequence/Lesson03/oscal-convert_401.md b/tutorial/sequence/Lesson03/oscal-convert_401.md index 62890f0..b56dfa9 100644 --- a/tutorial/sequence/Lesson03/oscal-convert_401.md +++ b/tutorial/sequence/Lesson03/oscal-convert_401.md @@ -8,7 +8,7 @@ ## Goals -Understand a little more about JSON and other data formats in an XML processing environment +Understand a little more about JSON and other data formats in an XML processing environment. ## Resources A [content-types worksheet XProc](../../worksheets/CONTENT-TYPE_worksheet.xpl) i The pipeline [READ-JSON-TESTING.xpl](../../worksheets/READ-JSON-TESTING.xpl) provides an experimental surface for working with functionality specifically related to JSON and XDM map objects. -Find more treatment in [the next lesson unit](oscal-convert_402.md). This is a topic you can also learn by through trial and error. - ## Exercise some options The worksheets just cited provide an opportunity to try out `content-type` configuration options. Note how you can specify a content type that will serve as a constraint on inputs and outputs, analogous in some ways to the type signature on a function. And the step `p:cast-content-type` serves to cast one content type to another, according to [rules given in the Specification](https://spec.xproc.org/3.0/steps/#c.cast-content-type). Note that for this step to work, both inputs and outputs must conform to certain requirements: not everything can be cast! @@ -27,4 +25,4 @@ The worksheets just cited provide an opportunity to try out `content-type` confi * Use the function `p:document-property(.,'content-type')` to bring back the content type of a document on a port or in a pipeline. In XPath, `.` refers to an item designated as the (dynamic) processing context: so `p:document-property($doc,'content-type')` works for any $doc considered by XProc to be or serve as a *document*. * Interestingly, this means we can expect to find content-type='application/json' whenever an XProc document is nothing more than an object or map – as can happen, by design. -[READ-JSON-TESTING.xpl](../../worksheets/READ-JSON-TESTING.xpl) is a sandbox for playing with JSON objects as XDM maps. The [content-types worksheet](../../worksheets/CONTENT-TYPE_worksheet.xpl) is set up for trying content-type options on inputs and outputs. +[READ-JSON-TESTING.xpl](../../worksheets/READ-JSON-TESTING.xpl) is a sandbox for playing with JSON objects as XDM maps. The [content-types worksheet](../../worksheets/CONTENT-TYPE_worksheet.xpl) is set up for trying content-type options on inputs and outputs. diff --git a/tutorial/sequence/Lesson04/courseware_101.md b/tutorial/sequence/Lesson04/courseware_101.md index e82bdfe..5687f3c 100644 --- a/tutorial/sequence/Lesson04/courseware_101.md +++ b/tutorial/sequence/Lesson04/courseware_101.md @@ -8,9 +8,9 @@ ## Goals -Understand better how this tutorial is produced +Understand better how this tutorial is produced.
-See an example of a small but lightweight and scalable publishing system can be implemented in XProc and XSLT +See an example of how a small but lightweight and scalable publishing system can be implemented in XProc and XSLT. ## Prerequisites diff --git a/tutorial/sequence/Lesson04/courseware_219.md b/tutorial/sequence/Lesson04/courseware_219.md index 962fefc..3b41025 100644 --- a/tutorial/sequence/Lesson04/courseware_219.md +++ b/tutorial/sequence/Lesson04/courseware_219.md @@ -12,7 +12,7 @@ Help yourself, your team and allies. Produce a useful spin-off from a task or problem you need to master anyway. -Learn not only by doing but by writing it down for yourself and others +Learn not only by doing but by writing it down for yourself and others. ## Prerequisites @@ -33,9 +33,9 @@ However, any text editor or programmers' coding environment also works (to whate Astute readers will have observed that a markup-based deployment invites editing. But the authoring or data acquisition model of this tutorial is not Markdown-based - Markdown is paradoxically not used for its intended purpose but as one of several **publication** formats for this data set, which is currently written in an XML-based HTML5 tag set defined for the project. By writing, querying and indexing in XHTML we can use XProc from the start. Extensibility and flexibility in publication is one of the strengths - to publish a new or rearranged tutorial sequence can be done with a few lines and commands. A drag and drop interface supporting XProc makes this even easier, while it is already installed and running under CI/CD, meaning both editorial and code quality checks can be done with every commit. -Improving a page is as simple as editing the copy found in XXX and XXX +Improving a page is as simple as editing the copy found in XXX and XXX ... -Making and deploying a new pages is a little harder: XXX +Making and deploying a new page is a little harder: XXX ... ### Apply Schematron to your edits diff --git a/tutorial/source/acquire/acquire_101_src.html b/tutorial/source/acquire/acquire_101_src.html index db56efb..f1a578c 100644 --- a/tutorial/source/acquire/acquire_101_src.html +++ b/tutorial/source/acquire/acquire_101_src.html @@ -8,9 +8,10 @@

101: Project setup and installation

Goals

-

Set up and run an XProc 3.0 pipeline in an XProc 3.0 engine. See the results.

-

With a little practice, become comfortable running XProc pipelines, seeing results on a console (command - line) window as well as in the file system.

+

Set up and run an XProc 3.0 pipeline in an XProc 3.0 engine.

+

Get some results. See them in the console (message tracebacks), the file system (new files acquired or produced), or both.

+

With a little practice, become comfortable running XProc pipelines.

After the first script acquires the XProc engine, we use XProc itself for subsequent downloads. Finishing the setup gets you started practicing with the pipelines.

@@ -21,8 +22,9 @@

Prerequisites

If ready to proceed, you have a system with Java installed offering a JVM (Java Virtual Machine) available on the command line (a JRE or JDK), version 8 or later.

Tip: check your Java version from the console using java --version.

-

Also, you have a live Internet connection and the capability to download and save resources (binaries and code libraries) for local use.

+

Also, you have an Internet connection available and the capability to download and save resources (binaries and code libraries) for local use. (There are no runtime dependencies on connecting, but some XProc pipelines make requests over http/s.)

You are comfortable entering commands on the command line (i.e. terminal or console window). For installation, you want a bash shell if available. On Windows, both WSL (Ubuntu) and Git Bash have been found to work. If you cannot use bash, the setup can be done by hand (downloading and @@ -33,6 +35,8 @@

Prerequisites

continue to use bash.

If you have already performed the setup as described in README and setup notes, this lesson unit will be a breeze.

+

Prior knowledge of XProc, XSLT or XML is not a prerequisite (for this or any lesson unit). If you are learning as we go – at any level – welcome, and please seek us out for help and feedback.

Resources

@@ -65,11 +69,12 @@

Step One: Setup

distribution.

After running the setup script, or performing the installation by hand, make sure you can run all the smoke tests successfully.

-

As noted in the docs, if you happen already to have Morgana XProc III, you do not need to download it again. Try skipping straight to the smoke tests. You can use a runtime script xp3.sh or xp3.bat as a model for your own, and adjust. Any reasonably recent version of Morgana should function if configured correctly, and we are interested if it does not.

+

As noted in the docs, if you happen already to have an XProc 3.0 processor, you do not need to download Morgana XProc III here. At time of writing (December 2024) this notably includes XML Calabash 3 (newly out in an alpha release). In any case, equipped with any conformant XProc 3.0/3.1 implementation, try skipping straight to the smoke tests. You can use a runtime script xp3.sh or xp3.bat as a model for your own, and adjust.

Shortcut

If you want to run through the tutorial exercises but you are unsure of how deeply you will delve, you @@ -101,51 +106,67 @@

Step Two: Confirm

Comments / review

Within this project as a whole, and within its subprojects, everything is done with XProc 3.0. The aim is to - make it possible to do anything needed with XProc, and moreover to make any one thing needed with a single - XProc pipeline, using a single script, which invokes an XProc processor to read and execute. This - simplicity, with the replicability that goes with it, is at the center of the argument for XProc.

+ make it possible to do anything needed with XProc, regarded as a general-purpose 'scripting' solution for the choreography of arbitrarily complex jobs, tasks and workflows. To support arbitrary complexity and scalability together, it must be very simple. This simplicity, with the composability that goes with it, is at the center of the argument for XProc.

+

You will see this simplicity at the level of top-level invocation XProc pipelines designed to serve as entry points. If things are done right, these will be fairly simple, well-encapsulated subroutines in potentially elegant arrangements. They in turn may call on libraries of XProc pipelines for well-defined tasks.

Effectively (and much more could be said about the processing stack, dependency management and so forth) what this means is that XProc provides the user and the developer (in either or both roles) with focused and concentrated points of control or adjustment. In the field – where software is deployed and used – things almost never just drop in. User interfaces, APIs, dependencies and platform quirks: all these constrain what users can do, and even developers are rarely as free as they would like to experiment and explore.

-

To the extent this is the case, this project only works if things are actually simple enough to pick up, - use, learn and adapt.

-

xp3.sh and xp3.bat represent attempts at this. Each of them (on its execution - platform) enables a user to run, without further configuration, the Morgana XProcIIIse processor on any XProc - 3.0 pipeline, assuming the appropriate platform for each (bash in the case of the shell script, - Windows batch command syntax for the bat file). Other platforms supporting Java (and hence - Morgana with its libraries) could be provided with similar scripts.

-

Such a script itself must be vanilla and generic: it simply invokes the processor with the designated - pipeline, and stands back. The logic of operations is entirely encapsulated in the XProc pipeline - designated. XProc 3.0 is both scalable and flexible enough to open a wide range of possibilities for data - processing, both XML-based and using other formats such as JSON and plain text. It is the intent of this - project not to explore and map this space – which is vast – but to show off enough XProc and related logic - (XSLT, XSpec) to show how this exploration can be done. We are an outfitter at the beginning of what we hope - will be many profitable voyages to places we have never been.

+

What is offered here is therefore both an example of a deployment of a demonstration solution set using an open-source tool (an XProc engine capable of running the pipelines we offer), doing things that are actually or potentially useful (with OSCAL data), and a set of pipelines that should in principle work as well in any other tool or software deployment supporting XProc 3.0.

+

But to the extent this imposes a requirement for both abstract and testable conformance (or at any rate for interoperability as a proxy for that), this project only works if things are actually simple enough to pick up, use, learn and adapt. xp3.sh and xp3.bat represent attempts at making a simple deployment, easy to emulate or, better yet, improve on.

+

Each of these scripts (on its execution platform) enables a user to run, without further configuration, the Morgana XProcIIIse processor on any XProc 3.0 pipeline, assuming the appropriate platform for each (bash in the case of the shell script, Windows batch command syntax for the bat file). Providing a similar script for XML Calabash remains (with apologies to NDW) a desideratum for this project as we post this version of the tutorial. Stay tuned!

+

In any case, such a script itself must be vanilla and generic: it will simply invoke the processor with the designated pipeline, and stand back. (Yes, runtime arguments and settings can be provided.) The logic of operations is entirely encapsulated in the XProc pipeline designated. XProc 3.0 is both scalable and flexible enough to open a wide range of possibilities for data processing, both XML-based and using other formats such as JSON and plain text. It is the intent of this project not to explore and map this space – which is vast – but to show off enough XProc and related logic (XSLT, XSpec) to show how this exploration can be done. We are an outfitter at the beginning of what we hope will be many profitable voyages to places we have never been.

When running from a command line

As simple examples, these scripts show only one way of running XProc. Keep in mind that even simple - scripts can be used in more than one way.

+ scripts can be used in more than one way.

For example, a pipeline can be executed from the project root:

$ ./xp3.sh smoketest/TEST-XPROC3.xpl

Alternatively, a pipeline can be executed from its home directory, for example if currently in the - smoketest directory (note the path to the script):

+ smoketest directory (note the path to the script):

$ ../xp3.sh TEST-XPROC3.xpl
-

This works the same ways on Windows, with adjustments:

+

This works the same ways on Windows, with adjustments:

> ..\xp3 TEST-XPROC3.xpl 

(On Windows a bat file suffix marks it as executable and does not have to be given explicitly when called.)

-

Windows users (and others to varying degrees) can set up a drag-and-drop based workflow – using your - mouse or pointer, select an XProc pipeline file and drag it to a shortcut for the executable (Windows - batch file). A command window opens to show the operation of the pipeline. See the Windows users (and others to varying degrees) can set up a drag-and-drop based workflow – + using your mouse or pointer, select an XProc pipeline file and drag it to a shortcut for the executable + (Windows batch file). A command window opens to show the operation of the pipeline. See the README for more information.

-

It is important to try things out since any of these methods can be the basis of a workflow.

+

It is important to try things out since any of these methods can be the basis of a workflow.

For the big picture, keep in mind that while the command line is useful for development and demonstration - – and however familiar XProc itself may become to the developer – to a great number of people it remains - obscure, cryptic and intimidating if not forbidding. Make yourself comfortable at the command line!

+ – and however familiar XProc itself may become to the developer – to a great number of people it remains, + like XProc, obscure, cryptic and intimidating if not forbidding.

+

This is a pity because (among other reasons) the kind of layered system we will see and build here is not + endless or infinitely complex. Begin by making yourself comfortable at the command line. See how the + pieces fit together by working them.

Then too, if you have something better, by all means use it. XProc-based systems, when integrated into tools or developer editors and environments, can look much nicer than tracebacks in a console window. The elegance and power we are trying to cultivate are at a deeper level. First and last, the focus must be on diff --git a/tutorial/source/acquire/acquire_102_src.html b/tutorial/source/acquire/acquire_102_src.html index 8f5b93d..b2680c1 100644 --- a/tutorial/source/acquire/acquire_102_src.html +++ b/tutorial/source/acquire/acquire_102_src.html @@ -8,8 +8,7 @@

102: Examining the setup

Goals

  • Look at some pipeline organization and syntax on the inside
  • -
  • Success and failure invoking XProc pipelines: an early chance to learn to die gracefully (to use the - gamers' idiom).
  • +
  • Success and failure invoking XProc pipelines: making friends with tracebacks
@@ -19,9 +18,12 @@

Prerequisites

similar pipelines.

This discussion assumes basic knowledge of coding, the Internet (including retrieving resources via file and http protocols), and web-based technologies including HTML.

-

XML knowledge is also assumed. In particular, XProc uses XPath - 3.1, the query language for XML. This latest version of XPath builds on XPath 1.0, so any XPath - experience will help. In general, any XSLT or XQuery experience will be invaluable.

+

XML knowledge is not assumed. This poses a special challenge since in addition to its XML-based + syntax, XProc uses the XML Data Model (XDM) along with + XPath 3.1, the query language for XML: together, a deep + topic. We make the assumption that if you already know XML, XPath, XSLT or XQuery, much will be familiar, + but you will be tolerant of some restatement for the sake of those who do not. (As we all start somewhere, + why not here.)

You will also need a programmer's plain text editor, XML/XSLT editor or IDE (integrated development environment) for more interactive testing of the code.

@@ -33,15 +35,17 @@

Prerequisites

Step One: Inspect the pipelines

The two groupings of pipelines used in setup and testing can be considered separately.

The key to understanding both groups is to know that once the initial Setup script is run, Morgana can be invoked directly, as paths and scripts are already in place. In doing so – before extension libraries are in place – it can use only basic XProc steps, but those are enough to start with.

script is run, your processor or engine (such as Morgana) can be invoked directly, as paths and scripts are already in place. In doing so – before extension libraries are in place – it can use only basic XProc steps, but those are enough to start with.

Specifically, the pipelines can acquire resources from the Internet, save them locally, and perform unarchiving (unzipping). Having been downloaded, each library provides software that the pipeline engine (Morgana) can use to do more.

Accordingly, the first group of pipelines (in the lib directory) has a single purpose, namely (together and separately) to download software to augment Morgana's feature set.
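In outline, such a download pipeline needs nothing exotic. The following is a hypothetical sketch only (the URL and file names are invented, and the repository's actual GRAB-*.xpl pipelines differ in detail): fetch an archive, unpack it, and save each unpacked file locally.
<p:declare-step version="3.0" xmlns:p="http://www.w3.org/ns/xproc" name="grab-sketch">
  <!-- Sketch, not a repository pipeline: the URL below is a placeholder -->
  <p:output port="result" sequence="true"/>
  <p:load href="https://example.org/some-library.zip"/>
  <p:unarchive/>
  <p:for-each>
    <!-- save each extracted document under lib/, keyed by its file name -->
    <p:store href="lib/{tokenize(base-uri(.), '/')[last()]}"/>
  </p:for-each>
</p:declare-step>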

+

If not using the open-source Morgana distribution, you can skip to smoke tests below, and see how far you get.

  • lib/GRAB-SAXON.xpl
  • lib/GRAB-SCHXSLT.xpl
  • @@ -50,11 +54,13 @@

    Step One: Inspect the pipelines

    Pipelines in a second group work similarly in that each one exercises and tests capabilities provided by software downloaded by a member of the first group.

    Take a look at these files. It may be helpful (for those getting used to it) to envision the XML syntax as a set of nested frames with labels and connectors.

    @@ -93,7 +99,12 @@

    For consideration

    software developers together, who must define problems to be solved before approaches to them can be found.

    The open questions are: who can use XProc pipelines; and how can they be made more useful? The questions - come up in an OSCAL context or any context where XML is demonstrably capable.

    + come up in an OSCAL context or any context where XML is demonstrably capable, or indeed anywhere we find the + necessity of handling data with digital tools has become inescapable.

    +

    In order to help answer this question, actual experience will be invaluable – part of our motive here. + Unless we can make the demonstration pipelines in this repository accessible, they cannot be reasoned about. + That accessibility requires not only open publication, but also use cases and user bases ready to take + advantage.

    Having completed and tested the setup you are ready for work with XProc: proceed to the next lesson.

diff --git a/tutorial/source/acquire/acquire_599_src.html b/tutorial/source/acquire/acquire_599_src.html index 70be721..53a28b6 100644 --- a/tutorial/source/acquire/acquire_599_src.html +++ b/tutorial/source/acquire/acquire_599_src.html @@ -5,23 +5,42 @@

599: Meeting XProc

+
+

Goals

+

Gain some more sense of context.

+

XProc is not a simple thing, with only one way in. The territory is vast, but it has also been well charted. And here we have a pathway marked in front of us.

+

Resources

A Declarative Markup Bibliography is - available on line for future reference on this theoretical topic.

+ available on line for future reference on this interesting topic.

Some observations

-

Because it is now centered on pipelines as much as on files and software packages, dependency - management is different from other technologies including Java and NodeJS – how so?

+

Because it is now centered on pipelines built out of combining capabilities of steps (which may be black boxes), as much as on files and software packages, dependency management when using XProc is different from other technologies including Java and NodeJS – how so?

MorganaXProc-III is implemented in Scala, and Saxon is built in Java, but otherwise distributions including the SchXSLT and XSpec distributions consist mainly of XSLT. This is either very good (with development and maintenance requirements in view), or not good at all.

-

Which is it, and what are the determining variables that tell you XProc is a good fit? How much of this is - due to the high-level, abstracted nature of 4GLs including both XSLT - 3.1 and XProc 3.0? Prior experience with XML-based systems and the problem domains in which they work well - is probably a factor. How much are the impediments technical, and how much are they due to culture?

+

If not using Morgana but another XProc engine (at time of writing, XML Calabash 3 has been published in alpha), there will presumably be analogous arrangements: contracts between the tool and its dependencies, software or components and capabilities bundled and unbundled.

+

So does this work well, on balance, and what are the determining variables that tell you XProc is a good fit for data processing, whether high touch, or at scale? How much of this is due to the high-level, abstracted nature of 4GLs including both XSLT 3.1 and XProc 3.0? Prior experience with XML-based systems and the problem domains in which they work well is probably a consideration. But maybe the more important blockers have to do with culture, states of knowledge, incorrect assumptions and outdated perceptions.

+

Will it always be that a developer determined to use XSLT will find a way, whereas a developer determined not to will find a way to refuse it? XProc in 2024 seems slow in adoption – maybe because everyone who would want it already has a functional equivalent in place.

+

In any case, it might also be that such neglect creates a market opportunity. Those who use these technologies without advertising the fact may have the most to gain. But building the commons is also a common responsibility.

+

It's all about the tools. Find ways to support the open-source developers and the software development operations that offer free tools and services.

Declarative markup in action

@@ -29,16 +48,33 @@

Declarative markup in action

depend, notably XProc and XSLT but not limited to these, are both nominally and actually conformant to externally specified standard technologies, i.e. XProc and XSLT respectively (as well as others), and reliant to the greatest possible extent on well-documented and accessible runtimes.

-

It is a tall order to ask that any code base should be both easy to integrate and use with others, and at - the same time, functionally complete and self-sufficient. Of these two, we are lucky to get one, even if we - are thoughtful enough to limit ourselves to building blocks. Because the world is complex, we are always +

Is it too much to expect that any code base should be both easy to integrate and use with others, and at the same time, functionally complete and self-sufficient? Of these two, we are lucky to get one, even if we are thoughtful enough to limit ourselves to building blocks. Because the world is complex, we are always throwing in one or another new dependency, along with new rule sets. The approach enabled by XML and openly-specified supporting specifications is to work by making everything as transparent as possible. We seek clarity and transparency at all levels (so nothing is downloaded behind the scenes, for example) while also documenting as thoroughly as we can, including with code comments.

-

Can any code base be fully self-explanatory and self-disclosing? Doubtful, even assuming those terms are - meaningful. But one can try and leave tracks and markers, at least. We call it code with the hope and - intent that it should be amenable to and rewarding of interpretation.

+

Can any code base be fully self-explanatory and self-disclosing? This may be doubtful even if we can agree + what those terms mean. At the same time, the attempt can be made: we can try and leave tracks and markers, + at least. We call it code with the hope and intent that it should be amenable to and rewarding of + interpretation.

+
+

Standards for documents

+

In addition to the web itself (in HTML and CSS), a number of important initiatives in recent decades have capitalized on the core principles of declarative markup:

+ +
\ No newline at end of file diff --git a/tutorial/source/courseware/courseware_101_src.html b/tutorial/source/courseware/courseware_101_src.html index 5bbdb21..ff165e7 100644 --- a/tutorial/source/courseware/courseware_101_src.html +++ b/tutorial/source/courseware/courseware_101_src.html @@ -1,16 +1,16 @@ - Practicum: Learn by Teaching + Courseware 101: Producing this tutorial

Courseware 101: Producing this tutorial

Goals

-

Understand better how this tutorial is produced

+

Understand better how this tutorial is produced.

See an example of a small but lightweight and scalable publishing system can be implemented in XProc and - XSLT

+ XSLT.

Prerequisites

diff --git a/tutorial/source/courseware/courseware_219_src.html b/tutorial/source/courseware/courseware_219_src.html index fd7136b..9b2b812 100644 --- a/tutorial/source/courseware/courseware_219_src.html +++ b/tutorial/source/courseware/courseware_219_src.html @@ -1,7 +1,7 @@ - Practicum: Learn by Teaching + Courseware 219: Learn by Teaching @@ -10,7 +10,7 @@

Courseware 219: Learn by Teaching

Goals

Help yourself, your team and allies.

Produce a useful spin-off from a task or problem you need to master anyway.

-

Learn not only by doing but by writing it down for yourself and others

+

Learn not only by doing but by writing it down for yourself and others.

Prerequisites

@@ -20,7 +20,7 @@

Prerequisites

Writing HTML by hand can be arduous; accordingly, for producing tutorial pages we have used a structured XML authoring environment (oXygen XML Author) with many features including styling in display; full control over styling; Schematron rules in the background along with UI support for content corrections according to those - rules; etc.

+ rules; etc. oXygen Author or its functional equivalent is highly recommended.

However, any text editor or programmers' coding environment also works (to whatever extent generic HTML is supported), and Schematrons applied to HTML files can be run in XProc (as described).

@@ -35,25 +35,48 @@

Resources

Improve or enhance a lesson or lesson unit

-

Astute readers will have observed that a markup-based deployment invites editing. But the authoring or data - acquisition model of this tutorial is not Markdown-based - Markdown is paradoxically not used for its - intended purpose but as one of several publication formats for this data set, which is currently - written in an XML-based HTML5 tag set defined for the project. By writing, querying and indexing in XHTML we - can use XProc from the start. Extensibility and flexibility in publication is one of the strengths - to - publish a new or rearranged tutorial sequence can be done with a few lines and commands. A drag and drop - interface supporting XProc makes this even easier, while it is already installed and running under CI/CD, - meaning both editorial and code quality checks can be done with every commit.

-

Improving a page is as simple as editing the copy found in XXX and XXX

-

Making and deploying a new pages is a little harder: XXX

+

Alert readers will have observed that a Markdown-based deployment invites editing. But the authoring or data acquisition model of this tutorial is not Markdown-based - Markdown is paradoxically not used for its usual purpose, but as one of several publication formats for this data set, which is currently written in an XML-based HTML5 tag set defined for the project. By writing, querying and indexing in XHTML we can use XProc from the start. The HTML dialect means things mostly just work using HTML tools such as web browsers, while we have transformations (provided in pipelines) to render into any other publication format we may happen to need: Markdown being only one of a range of choices.

+

Extensibility and flexibility is one of the strengths of this approach: publishing a new or rearranged tutorial sequence can be done with a few lines and commands. A drag and drop interface supporting XProc makes this even easier, while again XProc is already supported under CI/CD, meaning both editorial and code quality checks can be done with every commit by simply listing the appropriate pipeline with others.

+

The workflow supporting this publication model is simple. A set of lessons (lesson units) is gathered together in a single folder. That folder is listed in a directory file. To publish the tutorial, an XProc pipeline is executed that polls these directories and produces Markdown files corresponding to the inputs, only in a new sequence with links rewritten. Other pipelines can be run to update the directory to lesson units and other higher-level production such as a single-page HTML reading preview version. See the earlier treatment for more details.

+

Improving a page is as simple as editing the HTML source in the folder. Adding a page is as simple as copying and altering a page in place, and seeing to it the new page is valid to both HTML5 and project requirements.

Apply Schematron to your edits

+

A Schematron for lesson unit files also ensures that links are in place and project conventions are observed.

+

You will be grateful to do this interactively in an editor that supports Schematron in the background.

-

Create a new lesson unit ('area')

+

Create a new lesson unit (area)

+

Create a new folder and add it to the tutorial lesson plan XML.

+

When production pipelines are run, all HTML files present in the newly-listed folders will be included in the tutorial as lesson units. Within the folder they will be listed alphabetically. Be sure that your new HTML is also Schematron-valid.

Produce a new project and document it with a tutorial

+

You could make a new lesson topic with lessons on your own new project.

+

This repository is provided with a project template to make it easier to get started with new pipelines for new applications. Start a new project by copying this folder into the projects folder, renaming it, and proceeding to edit its file contents.

\ No newline at end of file diff --git a/tutorial/source/oscal-convert/oscal-convert_101_src.html b/tutorial/source/oscal-convert/oscal-convert_101_src.html index 90674b2..cb2ee8e 100644 --- a/tutorial/source/oscal-convert/oscal-convert_101_src.html +++ b/tutorial/source/oscal-convert/oscal-convert_101_src.html @@ -186,7 +186,7 @@

What could possibly go wrong?

Schema, making it straightforward to enforce this dependency at any point in a pipeline, whether applied to inputs or to pipeline results including interim results and pipeline outputs. Resource validation is described further in subsequent coverage including the next Maker lesson unit.

+ class="LessonUnit">Maker lesson unit.

The playing field is the Internet

File resources in XProc are designated and distinguished by URIs. Keep in mind that XProc in theory, and diff --git a/tutorial/source/oscal-convert/oscal-convert_201_src.html b/tutorial/source/oscal-convert/oscal-convert_201_src.html index e15690e..27f450a 100644 --- a/tutorial/source/oscal-convert/oscal-convert_201_src.html +++ b/tutorial/source/oscal-convert/oscal-convert_201_src.html @@ -10,7 +10,7 @@

201: Anatomy of an XProc pipeline

Goals

Get more in-depth information about XProc internals, including especially the parts of an XProc pipeline step, as a step.

-

This includes its imports, its prolog, subpipeline and steps.

+

This includes its imports, its prologue, subpipeline and steps.

Prerequisites

@@ -28,7 +28,7 @@

XProc as XML

is to say a file encoded in plain text (typically the UTF-8 serialization of Unicode, or alternatively another form of plain text supported by your system or environment), and following the rules of XML syntax. These rules include how elements and attributes and other XML features are encoded in tags - that

+ that:

  • Follow the rules with respect to naming, whitespace, delimiters and reserved characters
  • Are correctly balanced, with an end tag for every start tag – for a <start> there must @@ -41,8 +41,8 @@

    XProc as XML

    Thus they are also quickly and easily internalized, often in only a few minutes of working with XML.

    Over and above being XML, XProc has some rules of its own ...

    -

    XProc at the top XXX

    -

    XXX watch this space - has something rewritten the < characters?

    +

    XProc at the top

    +

    At the very top of an XProc file, expect to see something not unlike this:

    <p:declare-step version="3.0"
        xmlns:p="http://www.w3.org/ns/xproc"
        xmlns:ox="http://csrc.nist.gov/ns/oscal-xproc3"
    @@ -51,9 +51,9 @@ 

    XProc at the top XXX

    ... </p:declare-step>

    XProc pipelines are XML documents using the XProc vocabulary. At the top (paradoxically, we call this the - root), an XProc file will be either of p:declare-step or p:library. - XProc in this repository includes at least one p:library, and it might be nice to have more. - More on this below.

    + root), an XProc instance is identified by tagging it either of p:declare-step or + p:library. XProc in this repository includes at least one p:library, and it + might be nice to have more. More on this below.

    As noted next, the element at the top ordinarily provides namespace prefix bindings (namespace declaration attributes) along with a name and a type for the step.

    @@ -76,14 +76,14 @@

    @name and @type

    On p:declare-step, whether at the top or in a step definition within a pipeline, either or both a @name and a @type are permitted.

       type="ox:TEST-XPROC3"
    -   name="TEST-XPROC3">
    + name="TEST-XPROC3"

    The name makes it possible to reference the step by name. This is often useful and sometimes more or less essential, for example for providing input to one step from another step's output. (We say more or less essential because the processor will produce names for itself as a fallback, if it needs them, but these are brittle, somewhat opaque – such as !1.2.3 – and more difficult to use than the names a developer gives.)

    Understandably, the name of an XProc must be different from the names given to all the steps in the XProc - (which must also be distinct).

    + (which must also be distinct).

    This repository follows a rule that a step name should correspond to its file base name (i.e., without a filename suffix), so identity_ for identity_.xproc, etc. But that is a rule for us, not for XProc in general.

    @@ -93,19 +93,20 @@

    @name and @type

Prologue and body

-

Keep in mind that to build a pipeline is also to design and deploy a step, since any pipeline can be used as - a step, and any step may comprise, internally, a pipeline.

-

Since step definitions are more often out of line (in an external file) than inline (in the XProc - itself), learning XProc soon becomes an exercise in learning to use a toolkit of standard steps provided by - the standard vocabulary. The power of these steps comes not just through what they do as single operations – - which can be simple or complex – but in what they do when combined. Learning is accelerated by diving - in.

-

As described in the XProc 3.0 - specification, XProc step declarations can be divided into an initial set of elements for setup and - configuration, followed by a subpipeline, consisting of a sequence of steps to be executed – any - steps available, which could be anything. Think of the subpipeline as the working parts of the pipeline, - while the rest is all about how it is set up.

-

At a high level:

+

A pipeline will typically be a collection or sequence of steps, of arbitrary complexity. By this we mean that any step might be simple or complex in its operations; and the sequence may be short or long, or simple and singular or branching, multiple (with respect to inputs or outputs) and conditional. A pipeline provides such a collection of steps with an interface, indeed (if one dare say) semantics in the form of a specification, whether explicit or implicit, of what constitute valid inputs, expected outputs, and recognized runtime options. This interface is defined in the pipeline's prologue. The sequence or arrangement of steps, however long or short, constitutes the pipeline's body and serves as the pipeline's subpipeline.

+

Think of the subpipeline as the working parts of the pipeline, while the prologue dictates how they, as a body, are to be invoked, and with what kinds of results exposed to the calling system – that is, what kinds of information (results) are made available for disposition (as opposed to simply being visible as messages or result artifacts).

+

Before the prologue, of course, we see optional import declarations. Add to these any local step definitions (not common but not unheard of), and at a high level we see these element groups:

  • Imports (optional) - configuring where the processor can find logic to be called from external pipelines: p:import, p:import-functions
  • @@ -116,19 +117,21 @@

    Prologue and body

  • Subpipeline - step invocations, connected implicitly or explicitly, with supporting variable declarations and documentation
-

The list of elements that come in the three groups before the subpipeline is short, which helps: six in - total between p:import and p:declare-step. Everything coming after is part of the +

In total, the list of elements coming before the subpipeline is short, which helps: six in total between p:import and p:declare-step. Everything coming after is part of the subpipeline.
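Put together, a skeletal declaration shows the groups in order. This is a hypothetical sketch (the imported file and the option name are invented for illustration):
<p:declare-step version="3.0" xmlns:p="http://www.w3.org/ns/xproc" name="skeleton">
  <p:import href="lib/steps.xpl"/>           <!-- imports -->
  <p:input port="source" primary="true"/>    <!-- prologue: ports and options -->
  <p:output port="result"/>
  <p:option name="verbose" select="false()"/>
  <!-- (local step declarations would go here) -->
  <p:identity/>                              <!-- the subpipeline -->
</p:declare-step>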

-

Within this set of elements (all preceding, none following the subpipeline) XProc further distinguishes - between the imports for steps and functions, appearing first (elements p:import and - p:import-functions), followed by the prologue (p:input, - p:output, p:option).

-

The prologue is used to define ports and options for the pipeline. It can be thought of as the definition of - the interface for the step as a whole. Defining ports and options is how you give the users of the step with - the affordances or control points they need to use it. If only a single input is needed, a single input port - (named source) can be defined. But some pipelines require no input bindings since the acquire - data in other ways. If your pipeline is self-contained, its prologue can be empty.

+

Imports will be discussed later, or can be reasoned about readily from examples.

+

The prologue is used to define ports and options for the pipeline. Defining ports and options is how you give the users of the step the affordances or control points they need to use it. It is common and conventional to have a single input port named source as primary input. But some pipelines require no input bindings since they acquire data in other ways. If your pipeline is intended to be self-contained, its prologue can be empty. More commonly, ports and options are defined, at least to provide default settings.

+

Keep in mind that just because a pipeline has no exposed ports for inputs or outputs does not mean it does nothing. Among other things, pipelines can read and write (or be asked to write) arbitrary resources to a file system. Its exposed interfaces provide for functional composition: since they have inputs and outputs, pipelines can be used as steps in pipelines. But those interfaces do not in any way preclude its other operations. XProc is not side-effect free.

Following the prologue, a step may also have local step definitions (p:declare-step). One might think of these as an XProc equivalent of a macro: these locally-defined pipelines can be used internally for logic that is used repeatedly, or that warrants separating from the main pipeline for some @@ -136,9 +139,9 @@

Prologue and body

After imports, prologue and (optional) step declarations, the step sequence that follows comprises the subpipeline.

One other complication: among the steps in the subpipeline, p:variable (a variable declaration) - and p:documentation (for out-of-band documentation) are also permitted – these are not properly + and p:documentation (for pipeline documentation) are also permitted – these are not properly steps, but can be useful to have with them.

-

In summary: any XProc pipeline, viewed as a step declaration, can have the following --

+

In summary with more detail: any XProc pipeline, viewed as a step declaration, can have the following:

If you fall into neither of these categories – welcome, and congratulations on your perseverence at least. These pages are certainly available to be read and referenced even if you are not running any of the software: as part of the resource provided (the repository as a whole), they are open to anyone who finds them to be useful, including for specialized purposes.

-

So you are also welcome to read what follows if this is your first look at XProc, or if you are not after - the runtime, only the ideas, or if you have some other use in mind (such as perhaps testing an XProc - engine?).

-

If you fall into this category, however, you should also (one last time!) consider running some of the code - after all. You might be surprised at how easy and not-so-scary it really is.

Resources

@@ -75,7 +71,7 @@

A closer look

distribution.

Essentially, these all replicate and capture the work a developer must do to identify and acquire libraries. - Maintaining our dependencies this way - not quite, but almost by hand -- appears to have benefits for + Maintaining our dependencies this way – not quite, but almost by hand – appears to have benefits for system transparency and robustness.

The second group of pipelines is a bit more interesting. Each of the utilities provided for in packages just downloaded is tested by running a smoke test.

@@ -97,9 +93,10 @@

A closer look

testing framework useful for testing deployments of XSLT, XQuery and Schematron.

Any and each of these can be used as a black box by any competent operator, even without - understanding the internals. But this simplicity masks and manages complexity. XProc is XProc but its - capabilities are limited without XSLT, XQuery, Schematron, XSpec and others, an open-ended set of compatible - and complimentary technologies.

+ understanding the internals. But this simplicity masks and manages complexity. XProc is XProc but never just + that, since its capabilities are also extended by XSLT, XQuery, Schematron, XSpec and others, an open-ended + set of compatible and complimentary technologies that are even more powerful together than they are in + particular.

At the same time, common foundations make it possible to learn these technologies together and in tandem.

@@ -108,30 +105,32 @@

Survey

Each of the test pipelines exercises a simple sequence of operations. Open any XProc file in an editor or viewer where you can see the tags. Skim this section to get only a high-level view.

The aim here is demystification. Understand the parts to understand the whole. Reading the element names - also inscribes them in memory circuits where they can be recovered.

+ also inscribes them in (metaphorical) memory circuits where they will resonate later.

-

TEST-XPROC3

-

Examine the pipeline TEST-XPROC3.xpl. It breaks down as - follows:

+

TEST-XPROC3

+

Examine the pipeline TEST-XPROC3.xpl. It is a short + chain of two steps, with one output offered. It breaks down as follows:

  • p:output – An output port is defined. It specifies that when the process results are delivered, a couple of serialization rules are followed: the text is indented and written without an XML declaration at the top. With this port, the process outputs can be captured by the calling process (such as your script), or simply echoed to the console.
  • -
  • p:identity – An identity step does nothing with its input but simply passes it - along. This one is a little different from usual in that its inputs are given as literal (XML) - contents in the pipeline. Essentially, because this pipeline has this step, it does not need to load - or rely on any inputs, because its inputs are given here. The input is a single line of XML.
  • +
  • p:identity – An identity step does nothing with its input except pass it along. + This one has its input given as a literal (XML) fragment in the pipeline. Essentially, because this + pipeline has this step, it does not need to load or rely on any inputs, because its inputs are given + here. The input is a single line of XML.
  • p:namespace-delete – A namespace-delete step is used to strip an XML namespace - definition from the document bound to the identity step. This XML inherits namespaces from the - pipeline itself, but it has no elements or attributes that use it, so the namespace is unneeded and - its declaration comes through as noise. With this step the pipeline results are clean and simple.
  • + definition from the document coming back (resulting from) the prior identity step. When nothing else + is specifically designated as such, the input of any step is assumed to be the last step's results. In + this case, our XML inherits namespaces from the pipeline itself (where it is embedded), but it has no + elements or attributes that use it, so the namespace is unneeded and its declaration comes through as + noise. With this step the pipeline results are clean and simple.

When you run this pipeline, the CONGRATULATIONS document given in line will be echoed to the console, where designated outputs will appear if not otherwise directed.

-

TEST-XSLT

+

TEST-XSLT

This pipeline executes a simple XSLT transformation, in order to test that XSLT transformations can be successfully executed.

    @@ -152,13 +151,17 @@

    TEST-XSLT

    modified by the transformation.

    If your pipeline execution can't process the XSLT (perhaps Saxon is not installed, or the XSLT itself has a problem) you will get an error saying so.

    -

    Errors in XProc are reported by the Morgana engine using XML syntax. Among other things, this means they - can be captured and processed in pipelines.

    +

Errors in XProc are reported by the XProc engine, typically using XML syntax. (The exact format of errors depends on the processor, and they are very much worth comparing.) Among other things, this means that XML results (for example showing errors trapped in try/catch) can be captured and processed in pipelines.
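For instance, a pipeline might guard a transformation with an arrangement like the following sketch (the stylesheet name is a stand-in), passing the error document along instead of halting:
<p:try>
  <p:xslt>
    <p:with-input port="stylesheet" href="style.xsl"/>
  </p:xslt>
  <p:catch name="catch">
    <!-- on failure, the trapped error document becomes the result -->
    <p:identity>
      <p:with-input pipe="error@catch"/>
    </p:identity>
  </p:catch>
</p:try>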

-

TEST-SCHEMATRON

+

TEST-SCHEMATRON

Schematron is a language used to specify rules to apply to XML documents. In this case a small Schematron - is applied to a small XML.

+ is applied to a small XML. This flexible technology enables easy testing of XML against rule sets defined + either for particular cases in particular workflows, or for entire classes or sets of documents whose + rules are defined for standards and across systems.

  • p:output – An output port is designated for the results with the same settings.
  • p:validate-with-schematron – This is an XProc step specifically for evaluating an XML @@ -166,24 +169,22 @@

    TEST-SCHEMATRON

    one presents its own input, given as a literal XML document given in the pipeline document (using p:inline). A setting on this step provides for it to throw an error if the document does not conform to the rules. The Schematron file provided as input to this step, src/doing-well.sch, gives the rules. This flexible - technology enables easy testing of XML against rule sets defined either for particular cases in - particular workflows, or for entire classes or sets of documents.
  • + href="../../../smoketest/src/doing-well.sch">src/doing-well.sch, gives the rules.
  • p:namespace-delete – This step is used here as in the other tests for final cleanup of - the information produced.
  • + the information set produced (as a namespace-qualified XML document).
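Stripped of its surroundings, the heart of such a test might look like this sketch (with the input document elided); the setting mentioned above, by which nonconforming documents raise an error, is the assert-valid option:
<p:validate-with-schematron assert-valid="true">
  <p:with-input port="schema" href="src/doing-well.sch"/>
</p:validate-with-schematron>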
-

TEST-XSPEC

+

TEST-XSPEC

XSpec is a testing framework for XSLT, XQuery and Schematron. It takes the form of a vocabulary and a process (inevitably implemented in XSLT and XQuery) for executing queries, transformations, and validations, by running them over known inputs, comparing the results to expected results, and reporting the results of this comparison. XProc, built to orchestrate manipulations of XML contents, is well suited for running XSpec.

-

An XSpec instance (or document) defines a set of tests for a transformation or query module using the XSpec vocabulary. An XSpec implementation executes the tests and delivers the results. Since XSpec, like Schematron, reports its findings in XML, XProc can be useful both to manage the inputs and outputs, and to process the XSpec reports.

+

An XSpec instance (as a document in itself) defines a set of tests for a transformation or query module using the XSpec vocabulary. An XSpec implementation executes the tests and delivers the results. Since XSpec, like Schematron, reports its findings in XML, XProc can be useful both to manage the inputs and outputs, and to process the XSpec reports.
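The shape of an XSpec instance is easy to take in at a glance. This hypothetical scenario (names and values invented for illustration, not taken from the repository's tests) runs a stylesheet over a known input and states the expected output:

    <x:description xmlns:x="http://www.jenitennison.com/xslt/xspec"
                   stylesheet="congratulate.xsl">
       <x:scenario label="when transforming a simple document">
          <x:context>
             <doc/>
          </x:context>
          <x:expect label="a greeting is produced">
             <greeting>CONGRATULATIONS</greeting>
          </x:expect>
       </x:scenario>
    </x:description>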

  • p:import – calls to an external XProc file to make its step definitions available.
  • p:input – works as it does elsewhere, to declare inputs for the pipeline. In this case, @@ -202,9 +203,9 @@

    TEST-XSPEC

A not-so-simple pipeline

-

tl/dr - examine the Markdown file presenting an XProc Element directory. It is generated by a pipeline. Examine that pipeline to see XProc with real-world complexity.

+

Examine the Markdown file presenting an XProc Element directory. It is generated by a pipeline. Examine that pipeline to see XProc with real-world complexity.

The simple pipelines examined so far show how useful things can be done simply, while the pipeline architecture allows for great flexibility.

Simplicity and flexibility together enable complexity. Once it is factored out, a complex operation can be @@ -213,18 +214,18 @@

A not-so-simple pipeline

Next, take a look at a more complex example, the prototype pipeline PRODUCE-PROJECTS-ELEMENTLIST.xpl. Like the setup and smoke-test pipelines, this is a standalone pipeline (requiring no runtime bindings or settings): when this plan (step or pipeline) is executed, the processor acquires inputs, produces results for its operations, and writes those results to the file system. In this case the output it generates is stored as element-directory.md, a Markdown file (find the p:store step).

The result is a reference resource encoded in Markdown: an index of XProc elements used in pipelines in this repository. As Markdown, once reposted back into the repository, it can be viewed with any Markdown viewing application. The index lists XProc elements, i.e., the core of the XProc vocabulary: for any XProc element used anywhere among the projects listed, the listing shows the pipelines where it appears. Following the index, the resource also shows a list of (repository) project folders in a prescribed order, with their XProc files and whatever XProc elements appear first (within the entire sequence up to that point) within that file. Among other uses this is helpful for assessing coverage of tutorial lessons as it offers a (semi) ordered survey of their use of XProc (and other) elements.

For example, looking up p:store you can see all the pipelines that contain this common step. Or looking at the oscal-convert listing you can see the XProc steps appearing first in that project folder.
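The p:store step at the end of such a pipeline is a one-liner. A sketch (the serialization settings are illustrative, not necessarily those the pipeline uses):

    <!-- write the Markdown result out as plain text -->
    <p:store href="element-directory.md" serialization="map{'method' : 'text'}"/>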

@@ -247,15 +248,17 @@

PRODUCE-PROJECTS-ELEMENTLIST

performs.
  • Part of the complexity is due to a two-step process here. First, the file system is surveyed in locations named in an input configuration. Then all those resources (which happen to be XML using the XProc vocabulary) are indexed again, this time showing only first occurrences of elements within files given in the stipulated order. Both indexes are written into the results, showing how a single survey can support more than one analysis.
  • In particular, the index to first use is not simple. A great deal of the complexity of detailed operations has been off-loaded into XSLT transformation code, which in this pipeline can be seen embedded in the XProc, indeed occupying the greater part of the XML in the file. (About two thirds of the element count: you can usually recognize XSLT by the conventional xsl: element prefix.) This pipeline also has an XSLT called from an external file (toward the end). XProc can provide XSLT either way, and each has its advantages; see the sketch following this list. In this case, XSLT is left in place as literal embedded code, partly to show a borderline case. (It helps to use an editor or viewer with code folding.)
  • One good thing about seeing the XSLT here is you can get a good sense of what it looks like, whether embedded or kept externally. XSLT is not essential to XProc, but it very much expands its practical capabilities.
  • @@ -263,7 +266,7 @@
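To make the two options concrete, here is a schematic sketch (the file name is invented) of XSLT provided to p:xslt externally and inline:

    <!-- externally maintained XSLT, referenced by location -->
    <p:xslt>
       <p:with-input port="stylesheet" href="produce-listing.xsl"/>
    </p:xslt>

    <!-- the same arrangement with the XSLT embedded in the pipeline document -->
    <p:xslt>
       <p:with-input port="stylesheet">
          <xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
             <!-- templates here -->
          </xsl:stylesheet>
       </p:with-input>
    </p:xslt>

External files are easier to maintain and test on their own; embedded code keeps a pipeline self-contained.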

    PRODUCE-PROJECTS-ELEMENTLIST

    -

    XML syntax, XPath and XProc

    +

    Respecting XML syntax, XPath and XProc

Newcomers to XML may feel they are in deep water with XML syntax.

    In the context of XProc, this is actually not as hard as it looks:

      @@ -276,7 +279,7 @@

      XML syntax, XPath and XProc

  • XML vocabularies are typically qualified with namespaces to show, and to disambiguate, which XML application or language they belong to. The namespaces are indicated by name prefixes. So in this repository (and conventionally for XProc), any element prefixed p: is an XProc element, and another prefix or none indicates an extension or another vocabulary, such as may appear in XML being processed.
    <p:output port="result" serialization="map{'indent' : true(), 'omit-xml-declaration': true() }" />
    @@ -301,7 +304,7 @@

    Learning more about XProc

    simple steps in HTML, with code snips.

There is a book, Erik Siegel's XProc 3.0 Programmer's Reference (2020), and an excellent reference site by the same author. Like this resource, it is generated using XProc.

    \ No newline at end of file diff --git a/tutorial/source/walkthrough/walkthrough_219_src.html b/tutorial/source/walkthrough/walkthrough_219_src.html index 2c057f7..658afc5 100644 --- a/tutorial/source/walkthrough/walkthrough_219_src.html +++ b/tutorial/source/walkthrough/walkthrough_219_src.html @@ -20,9 +20,9 @@

    Goals

    Resources

    The same pipelines you ran in setup: Setup 101.

Also, XProc.org dashboard page.

Also, XProc index materials produced in this repository: XProc docs.

    XProc as XML

    @@ -184,7 +184,7 @@

    XML and the XDM: context and rationale

    Snapshot history: an XML time line

    -

    [TODO: complete this, or move it, or both]

    +

    [TODO: complete this, or move it, or both] ...

    diff --git a/tutorial/source/walkthrough/walkthrough_301_src.html b/tutorial/source/walkthrough/walkthrough_301_src.html index e093819..a13753d 100644 --- a/tutorial/source/walkthrough/walkthrough_301_src.html +++ b/tutorial/source/walkthrough/walkthrough_301_src.html @@ -54,7 +54,7 @@

    XProc for quality testing

    The tests themselves are so far fairly rudimentary – while paying for themselves in the consistency and quality they help enforce.

    -

    Pipelines useful for the developer:

    +

    Pipelines useful for the developer:

    -

    Pipelines run under CI/CD:

    +

    Pipelines run under CI/CD

  • HARDFAIL-XPROC3-HOUSE-RULES.xpl runs a pipeline applying the House Rules Schematron to every XProc listed in the imported FILESET pipeline, @@ -82,7 +82,11 @@

      XProc for quality testing

    -

    Additionally:

    +

    File set listings as step declarations

    +

These pipelines are used only as components in other pipelines, which import them. They are used to provide central control of file listings for batching purposes (processing or validation) - i.e., other pipelines can access these steps to get to the named resources. They can also be validated externally, for early detection of broken links.
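In outline, such a fileset pipeline can be as simple as an identity step over a set of document references, exposed as a sequence. In this sketch the step type and paths are invented for illustration:

    <p:declare-step version="3.0" type="ox:FILESET_EXAMPLE"
          xmlns:p="http://www.w3.org/ns/xproc"
          xmlns:ox="http://csrc.nist.gov/ns/oscal-xproc3">
       <p:output port="result" sequence="true"/>
       <p:identity>
          <p:with-input>
             <!-- one p:document per resource under central control -->
             <p:document href="../projects/some-project/PIPELINE-ONE.xpl"/>
             <p:document href="../projects/some-project/PIPELINE-TWO.xpl"/>
          </p:with-input>
       </p:identity>
    </p:declare-step>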

    • FILESET_XPROC3_HOUSE-RULES.xpl provides a list of resources (documents) to be made accessible to importing pipelines
    • diff --git a/tutorial/source/walkthrough/walkthrough_401_src.html b/tutorial/source/walkthrough/walkthrough_401_src.html index 1587d45..a5152dd 100644 --- a/tutorial/source/walkthrough/walkthrough_401_src.html +++ b/tutorial/source/walkthrough/walkthrough_401_src.html @@ -190,9 +190,11 @@

      Namespaces in and for your XSLT

    Text and attribute value syntax in embedded XSLT

If not yet conversant with XSLT, you can read more about this topic in an upcoming Lesson Unit on data conversion. Or you can avoid the problem by always using a p:document/@href to refer to XSLT kept out of line. If you like XSLT and are prone to plant it into your XProc (it is an excellent golden hammer), this applies to you.

XSLT practitioners know that within XSLT, in attributes and (in XSLT 3.0) within text (as directed), the curly brace signs { and } have special semantics as attribute or text value templates. Keeping them intact when embedding XSLT in XProc requires doubling them (and doubling again when embedding more than one level) or settings on either language's expand-text option. Searching the repository for the string value {{ (two open curlies together) will turn up instances of this – or skip ahead and try a worksheet XProc with some XSLT embedded.
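A sketch of the escape in action (names invented), showing XSLT embedded in XProc where an attribute value template must survive the trip:

    <p:xslt>
       <p:with-input port="stylesheet">
          <xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
             <xsl:template match="/*">
                <!-- doubled braces reach the XSLT as single braces, keeping the AVT intact -->
                <report count="{{ count(*) }}"/>
             </xsl:template>
          </xsl:stylesheet>
       </p:with-input>
    </p:xslt>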

    diff --git a/tutorial/tutorial-preview.html b/tutorial/tutorial-preview.html index 40bc5aa..d252310 100644 --- a/tutorial/tutorial-preview.html +++ b/tutorial/tutorial-preview.html @@ -20,7 +20,7 @@ th { width: clamp(10em, auto, 40em) } td { width: clamp(10em, auto, 40em); border-top: thin solid grey } -section.unit { width: clamp(45ch, 50%, 75ch); padding: 0.8em; outline: thin solid black; margin: 0.6em 0em } +section.unit { width: clamp(45ch, 100%, 75ch); padding: 0.8em; outline: thin solid black; margin: 0.6em 0em } section.unit h1:first-child { margin-top: 0em } .observer { background-color: honeydew ; grid-column: 2 } .maker { background-color: seashell ; grid-column: 3 } @@ -69,7 +69,7 @@

    When running from a command line

    -

    102: Examining the setup (~749) +

    102: Examining the setup (~740)

    Goals

    @@ -93,8 +93,11 @@

    For consideration

    -

    599: Meeting XProc (~391) +

    599: Meeting XProc (~622)

    +
    +

    Goals

    +

    Resources

    @@ -112,7 +115,7 @@

    Declarative markup in action

    -

    101: Unpacking XProc 3.0 (~2680) +

    101: Unpacking XProc 3.0 (~2678)

    Goals

    @@ -132,16 +135,24 @@

    A closer look

    Survey

TEST-XPROC3

TEST-XSLT

TEST-SCHEMATRON

TEST-XSPEC

    @@ -202,7 +213,7 @@

    Syntax tips

    -

    219: XProc, XML and XDM (the XML Data Model) (~2089) +

    219: XProc, XML and XDM (the XML Data Model) (~2090)

    Goals

    @@ -252,10 +263,13 @@

    Resources

    XProc for quality testing

    +

    Pipelines useful for the developer:

    +

    Pipelines run under CI/CD:

    +

    Additionally:

    About the XProc House Rules

    @@ -439,7 +453,7 @@

    Validate early and often

    -

    201: Anatomy of an XProc pipeline (~3006) +

    201: Anatomy of an XProc pipeline (~3136)

    Goals

    @@ -453,7 +467,7 @@

    Resources

    XProc as XML

    -

    XProc at the top XXX

    +

    XProc at the top

    Namespaces

    @@ -523,7 +537,7 @@

    Matching with namespace wildcard

    -

    400: What is an XProc document anyway (~1590) +

    400: What is an XProc document (~1589)

    Goals

    @@ -550,7 +564,7 @@

    Binaries and what-have-you

    -

    401: XProc, XML, JSON and content types (~296) +

    401: XProc, XML, JSON and content types (~275)

    Goals

    @@ -607,7 +621,7 @@

    -

    Courseware 219: Learn by Teaching (~395) +

    Courseware 219: Learn by Teaching (~703)

    Goals

    @@ -625,7 +639,7 @@

    Apply Schematron to your edits

    -

    Create a new lesson unit ('area')

    +

    Create a new lesson unit (area)

    Produce a new project and document it with a tutorial

    @@ -716,7 +730,7 @@

    Step One: Setup

As noted in the docs, if you happen already to have Morgana XProc III, you do not need to download it again. Try skipping straight to the smoke tests. You can use a runtime script xp3.sh or xp3.bat as a model for your own, and adjust. Any reasonably recent version of Morgana should function if configured correctly, and we are interested if it does not.

    Shortcut

    If you want to run through the tutorial exercises but you are unsure of how deeply you will delve, you @@ -774,20 +788,20 @@

    Comments / review

    When running from a command line

As simple examples, these scripts show only one way of running XProc. Keep in mind that even simple scripts can be used in more than one way.

    For example, a pipeline can be executed from the project root:

    $ ./xp3.sh smoketest/TEST-XPROC3.xpl

Alternatively, a pipeline can be executed from its home directory, for example if currently in the smoketest directory (note the path to the script):

    $ ../xp3.sh TEST-XPROC3.xpl
This works the same way on Windows, with adjustments:

    > ..\xp3 TEST-XPROC3.xpl 

(On Windows, the bat file suffix marks the file as executable, and the suffix does not have to be given explicitly when the script is called.)

    Windows users (and others to varying degrees) can set up a drag-and-drop based workflow – using your mouse or pointer, select an XProc pipeline file and drag it to a shortcut for the executable (Windows batch file). A command window opens to show the operation of the pipeline. See the README for more information.

It is important to try things out since any of these methods can be the basis of a workflow.

    For the big picture, keep in mind that while the command line is useful for development and demonstration – and however familiar XProc itself may become to the developer – to a great number of people it remains obscure, cryptic and intimidating if not forbidding. Make yourself comfortable at the command line!

    @@ -806,8 +820,7 @@

    102: Examining the setup

    Goals

    • Look at some pipeline organization and syntax on the inside
    • -
  • Success and failure invoking XProc pipelines: an early chance to learn to die gracefully (to use the gamers' idiom).
    • +
    • Success and failure invoking XProc pipelines: making friends with tracebacks.
    @@ -911,6 +924,11 @@

    For consideration

    id="acquire_599" data-track="learner">

    599: Meeting XProc

    +
    +

    Goals

    +

    Offer some more context; help reduce the intimidation factor.

    +

    XProc is not a simple thing, but a way in. The territory is vast, but the sky is always above us.

    +

    Resources

    @@ -920,14 +938,28 @@

    Resources

    Some observations

Because it is now centered on pipelines as much as on files and software packages, dependency management when using XProc is different from other technologies including Java and NodeJS – how so?

MorganaXProc-III is implemented in Scala, and Saxon is built in Java, but otherwise the distributions, including SchXSLT and XSpec, consist mainly of XSLT. This is either very good (with development and maintenance requirements in view), or not good at all.

-

Which is it, and what are the determining variables that tell you XProc is a good fit? How much of this is due to the high-level, abstracted nature of 4GLs including both XSLT 3.1 and XProc 3.0? Prior experience with XML-based systems and the problem domains in which they work well is probably a factor. How much are the impediments technical, and how much are they due to culture?

+

If not using Morgana but another XProc engine (at time of writing, XML Calabash 3 has been published in alpha), there will presumably be analogous arrangements: contracts between the tool and its dependencies, software or components and capabilities bundled and unbundled.

+

So does this work well, on balance, and what are the determining variables that tell you XProc is a good fit for data processing, whether high touch or at scale? How much of this is due to the high-level, abstracted nature of 4GLs including both XSLT 3.1 and XProc 3.0? Prior experience with XML-based systems and the problem domains in which they work well is probably a consideration. How much are the impediments technical, and how much are they due to culture and perceptions?

    +

Will it always be that a developer determined to use XSLT will find a way, whereas a developer determined not to will find a way to refuse it? XProc in 2024 seems slow in adoption – maybe because everyone who would want it already has a functional equivalent in place.

    +

This being said, going forward the principle remains that we gain an implicit advantage when we find ways of exploiting technology opportunities that our peers and competitors have decided to neglect. In essence, by leaving XML, XSLT and XProc off the table, developers who choose not to use them may actually be giving easy money to developers who are able to adopt and exploit this externality, where it works.

    +

It's all about the tools. Find ways to support the open-source developers and software development operations who offer free tools and services.

    Declarative markup in action

    @@ -935,9 +967,9 @@

    Declarative markup in action

    depend, notably XProc and XSLT but not limited to these, are both nominally and actually conformant to externally specified standard technologies, i.e. XProc and XSLT respectively (as well as others), and reliant to the greatest possible extent on well-documented and accessible runtimes.

Is it too much to expect that any code base should be both easy to integrate and use with others, and at the same time, functionally complete and self-sufficient? Of these two, we are lucky to get one, even if we are thoughtful enough to limit ourselves to building blocks. Because the world is complex, we are always throwing in one or another new dependency, along with new rule sets. The approach enabled by XML and openly-specified supporting specifications is to work by making everything as transparent as possible. We seek clarity and transparency at all levels (so nothing is downloaded behind the scenes, for example) while @@ -1064,7 +1096,9 @@

    Survey

    The aim here is demystification. Understand the parts to understand the whole. Reading the element names also inscribes them in memory circuits where they can be recovered.

TEST-XPROC3

    Examine the pipeline TEST-XPROC3.xpl. It breaks down as follows:

      @@ -1088,7 +1122,9 @@

      TEST-XPROC3

      console, where designated outputs will appear if not otherwise directed.

TEST-XSLT

    This pipeline executes a simple XSLT transformation, in order to test that XSLT transformations can be successfully executed.

    @@ -1114,7 +1150,9 @@

    TEST-XSLT

    can be captured and processed in pipelines.

TEST-SCHEMATRON

    Schematron is a language used to specify rules to apply to XML documents. In this case a small Schematron is applied to a small XML.

      @@ -1134,7 +1172,9 @@

      TEST-SCHEMATRON

TEST-XSPEC

    XSpec is a testing framework for XSLT, XQuery and Schematron. It takes the form of a vocabulary and a process (inevitably implemented in XSLT and XQuery) @@ -1169,10 +1209,9 @@

    TEST-XSPEC

    A not-so-simple pipeline

-

tl/dr - examine the Markdown file presenting an XProc Element directory. It is generated by a pipeline. Examine that pipeline to see XProc with real-world complexity.

+

Examine the Markdown file presenting an XProc Element directory. It is generated by a pipeline. Examine that pipeline to see XProc with real-world complexity.

    The simple pipelines examined so far show how useful things can be done simply, while the pipeline architecture allows for great flexibility.

    Simplicity and flexibility together enable complexity. Once it is factored out, a complex operation can be @@ -1472,10 +1511,8 @@

    Resources

    The same pipelines you ran in setup: Setup 101.

Also, XProc.org dashboard page.

Also, XProc index materials produced in this repository: XProc docs.

    XProc as XML

    @@ -1666,7 +1703,7 @@

    XML and the XDM: context and rationale

    Snapshot history: an XML time line

    -

    [TODO: complete this, or move it, or both]

    +

    [TODO: complete this, or move it, or both] ...

    @@ -2109,7 +2146,7 @@

    XProc for quality testing

    The tests themselves are so far fairly rudimentary – while paying for themselves in the consistency and quality they help enforce.

    -

    Pipelines useful for the developer:

    +

    Pipelines useful for the developer:

    -

    Pipelines run under CI/CD:

    +

    Pipelines run under CI/CD:

    -

    Additionally:

    +

    Additionally:

    • FILESET_XPROC3_HOUSE-RULES.xpl provides @@ -2666,7 +2703,8 @@

      What could possibly go wrong?

and JSON Schema, making it straightforward to enforce this dependency at any point in a pipeline, whether applied to inputs or to pipeline results including interim results and pipeline outputs. Resource validation is described further in subsequent coverage including the next Maker lesson unit.

      The playing field is the Internet

      @@ -3028,7 +3066,7 @@

      201: Anatomy of an XProc pipeline

      Goals

      Get more in-depth information about XProc internals, including especially the parts of an XProc pipeline step, as a step.

      -

      This includes its imports, its prolog, subpipeline and steps.

      +

      This includes its imports, its prologue, subpipeline and steps.

      Prerequisites

      @@ -3046,7 +3084,7 @@

      XProc as XML

is to say a file encoded in plain text (typically the UTF-8 serialization of Unicode, or alternatively another form of plain text supported by your system or environment), and following the rules of XML syntax. These rules include how elements and attributes and other XML features are encoded in tags that:

      • Follow the rules with respect to naming, whitespace, delimiters and reserved characters
      • Are correctly balanced, with an end tag for every start tag – for a <start> there must @@ -3059,8 +3097,8 @@

        XProc as XML

        Thus they are also quickly and easily internalized, often in only a few minutes of working with XML.

        Over and above being XML, XProc has some rules of its own ...

        -

        XProc at the top XXX

        -

        XXX watch this space - has something rewritten the < characters?

        +

        XProc at the top

        +

        At the very top of an XProc file, expect to see something not unlike this:

        <p:declare-step version="3.0"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ox="http://csrc.nist.gov/ns/oscal-xproc3"
        @@ -3069,9 +3107,9 @@ 

        XProc at the top XXX

        ... </p:declare-step>

XProc pipelines are XML documents using the XProc vocabulary. At the top (paradoxically, we call this the root), an XProc instance is identified by tagging it as either p:declare-step or p:library. XProc in this repository includes at least one p:library, and it might be nice to have more. More on this below.

        As noted next, the element at the top ordinarily provides namespace prefix bindings (namespace declaration attributes) along with a name and a type for the step.

        @@ -3095,14 +3133,14 @@

        @name and @type

        On p:declare-step, whether at the top or in a step definition within a pipeline, either or both a @name and a @type are permitted.

           type="ox:TEST-XPROC3"
        -   name="TEST-XPROC3">
        + name="TEST-XPROC3"

        The name makes it possible to reference the step by name. This is often useful and sometimes more or less essential, for example for providing input to one step from another step's output. (We say more or less essential because the processor will produce names for itself as a fallback, if it needs them, but these are brittle, somewhat opaque – such as !1.2.3 – and more difficult to use than the names a developer gives.)

Understandably, the name of an XProc must be different from the names given to all the steps in the XProc (which must also be distinct).

        This repository follows a rule that a step name should correspond to its file base name (i.e., without a filename suffix), so identity_ for identity_.xproc, etc. But that is a rule for us, not for XProc in general.

        @@ -3112,19 +3150,20 @@

        @name and @type

      Prologue and body

-

Keep in mind that to build a pipeline is also to design and deploy a step, since any pipeline can be used as a step, and any step may comprise, internally, a pipeline.

-

Since step definitions are more often out of line (in an external file) than inline (in the XProc itself), learning XProc soon becomes an exercise in learning to use a toolkit of standard steps provided by the standard vocabulary. The power of these steps comes not just through what they do as single operations – which can be simple or complex – but in what they do when combined. Learning is accelerated by diving in.

-

As described in the XProc 3.0 specification, XProc step declarations can be divided into an initial set of elements for setup and configuration, followed by a subpipeline, consisting of a sequence of steps to be executed – any steps available, which could be anything. Think of the subpipeline as the working parts of the pipeline, while the rest is all about how it is set up.

-

At a high level:

+

A pipeline will typically be a collection or sequence of steps, of arbitrary complexity. By this we mean that any step might be simple or complex in its operations; and the sequence may be short or long, simple and singular, or branching, multiple (with respect to inputs or outputs) and conditional. A pipeline provides such a collection of steps with an interface, indeed (if one dare say) semantics, in the form of a specification, whether explicit or implicit, of what constitute valid inputs, expected outputs, and recognized runtime options. This interface is defined in the pipeline's prologue. The sequence or arrangement of steps, however long or short, constitutes the pipeline's body and serves as the pipeline's subpipeline.

+

Think of the subpipeline as the working parts of the pipeline, while the prologue dictates how they, as a body, are to be invoked, and with what kinds of results exposed to the calling system – that is, what kinds of information (results) are made available for disposition (as opposed to simply being visible as messages or result artifacts).

+

Before the prologue, of course, we see optional import declarations. Add to these any local step definitions (not common but not unheard of), and at a high level we see these element groups:

      • Imports (optional) - configuring where the processor can find logic to be called from external pipelines: p:import, p:import-functions @@ -3137,27 +3176,30 @@

        Prologue and body

      • Subpipeline - step invocations, connected implicitly or explicitly, with supporting variable declarations and documentation
      -

The list of elements that come in the three groups before the subpipeline is short, which helps: six in total between p:import and p:declare-step. Everything coming after is part of the subpipeline.

+

In total, the list of elements coming before the subpipeline is short, which helps: six in total between p:import and p:declare-step. Everything coming after is part of the subpipeline.

      -

Within this set of elements (all preceding, none following the subpipeline) XProc further distinguishes between the imports for steps and functions, appearing first (elements p:import and p:import-functions), followed by the prologue (p:input, p:output, p:option).

      -

The prologue is used to define ports and options for the pipeline. It can be thought of as the definition of the interface for the step as a whole. Defining ports and options is how you give the users of the step the affordances or control points they need to use it. If only a single input is needed, a single input port (named source) can be defined. But some pipelines require no input bindings since they acquire data in other ways. If your pipeline is self-contained, its prologue can be empty.

      +

      Imports will be discussed later, or can be reasoned about readily from examples.

      +

The prologue is used to define ports and options for the pipeline. Defining ports and options is how you provide the users of the step with the affordances or control points they need to use it. It is common and conventional to have a single input port named source as primary input. But some pipelines require no input bindings since they acquire data in other ways. If your pipeline is intended to be self-contained, its prologue can be empty. More commonly, ports and options are defined, at least to provide default settings.

      +

Keep in mind that just because a pipeline has no exposed ports for inputs or outputs does not mean it does nothing. Among other things, pipelines can read and write (or be asked to write) arbitrary resources to a file system. Its exposed interfaces provide for functional composition: since they have inputs and outputs, pipelines can be used as steps in pipelines. But these interfaces do not in any way preclude such operations. XProc is not side-effect free.
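A schematic illustration of the layout, with port and option names that are conventional or invented rather than prescribed:

    <p:declare-step version="3.0" xmlns:p="http://www.w3.org/ns/xproc">
       <!-- prologue: the step's interface -->
       <p:input port="source" primary="true"/>
       <p:output port="result"/>
       <p:option name="verbose" select="false()"/>
       <!-- body (subpipeline): the working parts -->
       <p:identity/>
    </p:declare-step>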

      Following the prologue, a step may also have local step definitions (p:declare-step). One might think of these as an XProc equivalent of a macro: these locally-defined pipelines can be used internally for logic that is used repeatedly, or that warrants separating from the main pipeline for some other reason.

      After imports, prologue and (optional) step declarations, the step sequence that follows comprises the subpipeline.

One other complication: among the steps in the subpipeline, p:variable (a variable declaration) and p:documentation (for pipeline documentation) are also permitted – these are not properly steps, but can be useful to have with them.

      -

      In summary: any XProc pipeline, viewed as a step declaration, can have the following --

      +

      In summary with more detail: any XProc pipeline, viewed as a step declaration, can have the following:

      • Pipeline name and type assignment (if needed), given as attributes at the top
      • @@ -3204,10 +3246,10 @@

        XProc steps

        step (a pipeline can be an interface or wrapper to another pipeline, or to a simple operation), or a long sequence or complex choreography of steps. In either case it can become a relatively self-contained black box process available to other processes.

-

Accommodating this design, an XProc file considered as an XML instance is either of two things: a step declaration, or a collection of such declarations, a library. At the top level, recognize an XProc step declaration by the element p:declare-step (in the XProc namespace) and a library by the element p:library.

+

Accommodating this design, an XProc file considered as an XML instance (as noted) is either of two things: a step declaration, or a collection of such declarations, a library. At the top level, recognize an XProc step declaration by the element p:declare-step (in the XProc namespace) and a library by the element p:library.

        <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" 
             name="a-first-step">
         ...
        @@ -3220,7 +3262,8 @@ 

        XProc steps

The advantage of defining a step at the top level, rather than putting all steps into libraries, is that such a step can be invoked without prior knowledge of its type name, which is used by XProc to distinguish it from other steps. The pipeline simply needs to be presented to the processor, which does the rest. Your library of steps then looks very similar to your directory full of XProc files – and can be treated, where appropriate, as self-contained scripts encapsulating everything they need for a given runtime.

        Implications of XProc as XML

        Like any language using XML syntax, XProc depends on a conceptual relation between primitive constructs @@ -3267,18 +3310,19 @@

        Atomic and compound steps

Given an understanding of the organization of an XProc pipeline, the focus shifts to the steps themselves, which follow a common pattern. Briefly put, an atomic step is any step you use by simply invoking it with inputs and options: its logic is self-contained, and the operation it carries out is (at least conceptually) single and unified. A compound step, in contrast, combines one or more other steps in its own subpipelines and manages these together through a single interface, while providing – unlike an atomic step – some other functionality depending on the step.

XProc keeps things workable by providing only a few compound steps supporting the identified range of needs. This does not prove to be a practical limitation, since all steps including atomic steps can have multiple inputs and outputs, distinguished by type and role. (For example, a validation step might output both a copy of the input, potentially annotated, along with a validation report.) Atomic steps are not necessarily simple, and may include compound steps in their own subpipelines. All steps you define will be atomic steps. Accordingly, compound steps are not necessarily more complex than atomic steps: they are useful because they handle common contingencies such as splicing (with p:viewport), splitting (with p:for-each, performing an operation in parallel over a set of inputs rather than a single document), conditionals (p:if, p:choose) and exception handling (p:try with p:catch and p:finally).
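For a taste of a compound step at work, a sketch of p:for-each (the stylesheet reference is hypothetical): its subpipeline runs once for each document in the iterated sequence:

    <p:for-each>
       <p:with-input select="//section"/>
       <!-- each selected section is processed as its own document -->
       <p:xslt>
          <p:with-input port="stylesheet" href="per-section.xsl"/>
       </p:xslt>
    </p:for-each>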

        Here are all the compound steps. All others are atomic steps.

        • @@ -3349,7 +3393,7 @@

          Namespaces and extension steps

          ox is bound to a utility namespace, http://csrc.nist.gov/ns/oscal-xproc3.

In an XProc pipeline (library or step declaration) one may also see additional namespaces, including:

          • The namespaces needed for XSLT, XSD, or other supported technology
          • One or more namespaces deployed by the XProc author to support either steps or internal operations @@ -3436,7 +3480,7 @@

            Matching with namespace wildcard

            -

            400: What is an XProc document anyway

            +

            400: What is an XProc document

            Goals

            Learn how XProc seeks to process just about any kind of digital information.

            @@ -3614,7 +3658,7 @@

            Binaries and what-have-you

            401: XProc, XML, JSON and content types

            Goals

            -

            Understand a little more about JSON and other data formats in an XML processing environment

            +

            Understand a little more about JSON and other data formats in an XML processing environment.

            Resources

            @@ -3622,9 +3666,6 @@

            Resources

            trying out content-type options on XProc inputs and outputs.

The pipeline READ-JSON-TESTING.xpl provides an experimental surface for working with functionality specifically related to JSON and XDM map objects.
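For experiments of this kind, the content type can be asserted either on a port or when loading a resource. A sketch with an invented file name:

    <!-- accept only JSON on this port -->
    <p:input port="source" content-types="json"/>

    <!-- or load JSON explicitly; the result is an XDM map, not an XML tree -->
    <p:load href="data/example.json" content-type="application/json"/>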

            -

Find more treatment in the next lesson unit. This is a topic you can also learn through trial and error.

            Exercise some options

            @@ -3648,7 +3689,7 @@

            Exercise some options

READ-JSON-TESTING.xpl is a sandbox for playing with JSON objects as XDM maps. The content-types worksheet is set up for trying content-type options on inputs and outputs.

            @@ -3661,9 +3702,9 @@

            Exercise some options

            Courseware 101: Producing this tutorial

            Goals

            -

            Understand better how this tutorial is produced

            +

            Understand better how this tutorial is produced.

See an example of how a small but lightweight and scalable publishing system can be implemented in XProc and XSLT.

            Prerequisites

            @@ -3738,7 +3779,7 @@

            Courseware 219: Learn by Teaching

            Goals

            Help yourself, your team and allies.

            Produce a useful spin-off from a task or problem you need to master anyway.

            -

            Learn not only by doing but by writing it down for yourself and others

            +

            Learn not only by doing but by writing it down for yourself and others.

            Prerequisites

            @@ -3748,7 +3789,7 @@

            Prerequisites

Writing HTML by hand can be arduous; accordingly, for producing tutorial pages we have used a structured XML authoring environment (oXygen XML Author) with many features including styling in display; full control over styling; Schematron rules in the background along with UI support for content corrections according to those rules; etc. oXygen Author or its functional equivalent is highly recommended.

            However, any text editor or programmers' coding environment also works (to whatever extent generic HTML is supported), and Schematrons applied to HTML files can be run in XProc (as described).

            @@ -3763,25 +3804,50 @@

            Resources

        Improve or enhance a lesson or lesson unit

-

Astute readers will have observed that a markup-based deployment invites editing. But the authoring or data acquisition model of this tutorial is not Markdown-based - Markdown is paradoxically not used for its intended purpose but as one of several publication formats for this data set, which is currently written in an XML-based HTML5 tag set defined for the project. By writing, querying and indexing in XHTML we can use XProc from the start. Extensibility and flexibility in publication is one of the strengths - to publish a new or rearranged tutorial sequence can be done with a few lines and commands. A drag and drop interface supporting XProc makes this even easier, while it is already installed and running under CI/CD, meaning both editorial and code quality checks can be done with every commit.

-

Improving a page is as simple as editing the copy found in XXX and XXX

-

Making and deploying a new pages is a little harder: XXX

+

Alert readers will have observed that a Markdown-based deployment invites editing. But the authoring or data acquisition model of this tutorial is not Markdown-based - Markdown is paradoxically not used for its usual purpose, but as one of several publication formats for this data set, which is currently written in an XML-based HTML5 tag set defined for the project. By writing, querying and indexing in XHTML we can use XProc from the start. The HTML dialect means things mostly just work using HTML tools such as web browsers, while we have transformations (provided in pipelines) to render into any other publication format we may happen to need: Markdown being only one of a range of choices.

+

Extensibility and flexibility is one of the strengths of this approach: publishing a new or rearranged tutorial sequence can be done with a few lines and commands. A drag and drop interface supporting XProc makes this even easier, while again XProc is already supported under CI/CD, meaning both editorial and code quality checks can be done with every commit by simply listing the appropriate pipeline with others.

+

The workflow supporting this publication model is simple. A set of lessons (lesson units) is gathered together in a single folder. That folder is listed in a directory file. To publish the tutorial, an XProc pipeline is executed that polls these directories and produces Markdown files corresponding to the inputs, only in a new sequence with links rewritten. Other pipelines can be run to update the directory to lesson units and other higher-level production such as a single-page HTML reading preview version. See the earlier treatment for more details.

+

Improving a page is as simple as editing the HTML source in the folder. Adding a page is as simple as copying and altering a page in place, and seeing to it that the new page is valid to both HTML5 and project requirements.

        Apply Schematron to your edits

+

A Schematron for lesson unit files also ensures that links are in place and project conventions are observed.

+

You will be grateful to do this interactively in an editor that supports Schematron in the background.

        -

        Create a new lesson unit ('area')

        +

        Create a new lesson unit (area)

        +

        Create a new folder and add it to the tutorial lesson plan XML.

        +

When production pipelines are run, all HTML files present in the newly-listed folders will be included in the tutorial as lesson units. Within the folder they will be listed alphabetically. Be sure that your new HTML is also Schematron-valid.

        Produce a new project and document it with a tutorial

        +

        You could make a new lesson topic with lessons on your own new project.

        +

This repository is provided with a project template to make it easier to get started with new pipelines for new applications. Start a new project by copying this folder into the projects folder, renaming it, and proceeding to edit its file contents.

    diff --git a/tutorial/worksheets/CONTENT-TYPE_worksheet.xpl b/tutorial/worksheets/CONTENT-TYPE_worksheet.xpl index 117d79a..82989fa 100644 --- a/tutorial/worksheets/CONTENT-TYPE_worksheet.xpl +++ b/tutorial/worksheets/CONTENT-TYPE_worksheet.xpl @@ -44,11 +44,6 @@ - - - -