<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en-AU" xml:lang="en-AU" xmlns="http://www.w3.org/1999/xhtml">
<!-- This file was converted to xhtml by Writer2xhtml ver. 1.1.7. See http://writer2latex.sourceforge.net for more info. -->
<head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
<title></title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="" name="description" />
<meta content="" name="keywords" />
<link href="http://purl.org/dc/elements/1.1/" rel="schema.DC" />
<meta content="" name="DC.title" />
<meta content="" name="DC.subject" />
<meta content="" name="DC.description" />
<meta content="Tim Murphy" name="DC.creator" />
<meta content="2011-10-12T22:21:17" name="DC.date" />
<meta content="en" name="DC.language" />
<style media="all" type="text/css">
body {font-family:'Times New Roman',serif;font-size:16.0px}
span.BookTitle {font-variant:small-caps;font-weight:bold;letter-spacing:1.0px}
span.SourceText {font-family:'Bitstream Vera Sans Mono',monospace}
span.Captioncharacters {}
a:link {color:#0000ff;text-decoration:underline}
p.Standard {margin-left:0;margin-right:0;margin-top:0;margin-bottom:0;border:none;padding:0;font-size:16.0px}
p.Subtitle {margin-left:0;margin-right:0;margin-top:0;margin-bottom:0;border:none;padding:0;text-align:center;font-family:Cambria,serif;font-style:italic;font-size:16.0px;color:#4f81bd;letter-spacing:1.0px}
p.ListParagraph {margin-left:48.0px;margin-right:0;margin-top:0;margin-bottom:0;border:none;padding:0;text-indent:0;font-size:16.0px}
p.Title {margin-left:0;margin-right:0;margin-top:0;margin-bottom:19.9937px;border-top:none;border-bottom:1.3228347px solid #4f81bd;border-left:none;border-right:none;padding-top:0;padding-bottom:5.329134px;padding-left:0;padding-right:0;text-align:center;font-family:Cambria,serif;font-weight:bold;font-size:34.666668px;color:#17365d;letter-spacing:1.0px}
p.Contents3 {margin-left:37.71969px;margin-right:0;margin-top:0;margin-bottom:0;border:none;padding:0;text-indent:0;font-size:16.0px}
p.ContentsHeading {margin-left:0;margin-right:0;margin-top:32.012596px;margin-bottom:0;border:none;padding:0;font-family:Cambria,serif;font-weight:bold;font-size:21.333334px;color:#365f91}
p.Textbody {margin-left:0;margin-right:0;margin-top:0;margin-bottom:8.012598px;border:none;padding:0;font-size:16.0px}
p.Contents2 {margin-left:14.664569px;margin-right:0;margin-top:0;margin-bottom:6.6519685px;border:none;padding:0;text-indent:0;font-size:16.0px}
p.Contents1 {margin-left:0;margin-right:0;margin-top:0;margin-bottom:6.6519685px;border:none;padding:0;text-indent:0;font-size:16.0px}
p.PreformattedText {margin-left:0;margin-right:0;margin-top:0;margin-bottom:0;border:none;padding:0;font-family:'Bitstream Vera Sans Mono',monospace;font-size:13.333334px}
h1 {margin-left:0;margin-right:0;margin-top:32.012596px;margin-bottom:0;border:none;padding:0;font-family:Cambria,serif;font-weight:bold;font-size:18.666668px;color:#365f91;clear:left}
h2 {margin-left:0;margin-right:0;margin-top:13.341732px;margin-bottom:0;border:none;padding:0;font-family:Cambria,serif;font-weight:bold;font-size:17.333334px;color:#4f81bd;clear:left}
h3 {margin-left:0;margin-right:0;margin-top:13.341732px;margin-bottom:0;border:none;padding:0;font-family:Cambria,serif;font-weight:bold;font-size:16.0px;color:#4f81bd;clear:left}
.listlevel1WWNum5 {margin-top:0;margin-bottom:0;list-style-type:decimal;clear:left}
.listlevel2WWNum5 {margin-top:0;margin-bottom:0;list-style-type:lower-alpha;clear:left}
.listlevel3WWNum5 {margin-top:0;margin-bottom:0;list-style-type:lower-roman;clear:left}
.listlevel4WWNum5 {margin-top:0;margin-bottom:0;list-style-type:decimal;clear:left}
.listlevel5WWNum5 {margin-top:0;margin-bottom:0;list-style-type:lower-alpha;clear:left}
.listlevel6WWNum5 {margin-top:0;margin-bottom:0;list-style-type:lower-roman;clear:left}
.listlevel7WWNum5 {margin-top:0;margin-bottom:0;list-style-type:decimal;clear:left}
.listlevel8WWNum5 {margin-top:0;margin-bottom:0;list-style-type:lower-alpha;clear:left}
.listlevel9WWNum5 {margin-top:0;margin-bottom:0;list-style-type:lower-roman;clear:left}
.listlevel1WWNum4 {margin-top:0;margin-bottom:0;list-style-type:disc;clear:left}
.listlevel2WWNum4 {margin-top:0;margin-bottom:0;list-style-type:bullet;clear:left}
.listlevel3WWNum4 {margin-top:0;margin-bottom:0;list-style-type:square;clear:left}
.listlevel4WWNum4 {margin-top:0;margin-bottom:0;list-style-type:disc;clear:left}
.listlevel5WWNum4 {margin-top:0;margin-bottom:0;list-style-type:bullet;clear:left}
.listlevel6WWNum4 {margin-top:0;margin-bottom:0;list-style-type:square;clear:left}
.listlevel7WWNum4 {margin-top:0;margin-bottom:0;list-style-type:disc;clear:left}
.listlevel8WWNum4 {margin-top:0;margin-bottom:0;list-style-type:bullet;clear:left}
.listlevel9WWNum4 {margin-top:0;margin-bottom:0;list-style-type:square;clear:left}
.frameGraphics {margin-left:0;margin-right:0;margin-top:0;margin-bottom:0;border:none;padding:0;font-size:16.0px}
.frameGraphics p {margin-left:0;margin-right:0;margin-top:0;margin-bottom:0;font-size:16.0px}
</style>
</head>
<body dir="ltr">
<p class="Title" dir="ltr">YAML Path Language (YPath)</p>
<p class="Subtitle" dir="ltr">Version 0.2</p>
<p class="Standard" dir="ltr">Copyright © Peter Murphy 2011 <a href="mailto:[email protected]">[email protected]</a></p>
<p class="Standard" dir="ltr"><span class="BookTitle">Abstract</span></p>
<p class="Standard" dir="ltr">YPath is a language for addressing parts of a YAML document. </p>
<p class="Standard" dir="ltr"><span class="BookTitle">Status of this document</span></p>
<p class="Standard" dir="ltr">This document is a working draft. The content of this specification will be subject to change, especially as a response to user feedback on the yaml-core mailing list. All comments and criticism are encouraged, especially by implementers.</p>
<p class="Standard" dir="ltr"> </p>
<div id="TableofContents1">
<p class="ContentsHeading" dir="ltr">Table of Contents</p>
<p class="Contents1" dir="ltr"><span class="SectionNumber"></span><a href="#toc0">Introduction</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc1">Goals</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc2">YPath should be able to address any part of a YAML document</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc3">YPath should be as simple as possible</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc4">YPath should be intuitive to write and read</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc5">Data is more important than its expression</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc6">YPath should not be restrictive </a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc7">YPath results can be represented as YAML documents</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc8">YAML streams are not YAML documents, but YPath should support them anyway</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc9">Well defined YPath expressions should never throw errors on well defined YAML documents</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc10">Prerequisites</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc11">Prior Art</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc12">Relation to XPath</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc13">Similarities to XPath</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc14">Differences from XPath</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc15">Implementation</a></p>
<p class="Contents1" dir="ltr"><span class="SectionNumber"></span><a href="#toc16"></a></p>
<p class="Contents1" dir="ltr"><span class="SectionNumber"></span><a href="#toc17">Preview</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc18">The problem domain: musician data</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc19">Observations</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc20">Using YPath</a></p>
<p class="Contents1" dir="ltr"><span class="SectionNumber"></span><a href="#toc21">YPath Components</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc22">Document Matching</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc23">Value Uniqueness</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc24">Location Paths</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc25">Location Steps</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc26">Axis</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc27">Tag Predicates</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc28">Other Predicates</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc29">Set Operators in YPath</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc30">Abbreviated Steps</a></p>
<p class="Contents1" dir="ltr"><span class="SectionNumber"></span><a href="#toc31"></a></p>
<p class="Contents1" dir="ltr"><span class="SectionNumber"></span><a href="#toc32">Fundamentals of YPath</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc33">YAML Processing</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc34">YPath Processing</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc35">The Representation Graph Model</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc36">Representation Graphs</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc37">Node Properties</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc38">Misconceptions</a></p>
<p class="Contents3" dir="ltr"><span class="SectionNumber"></span><a href="#toc39">Limitations of the Representation Graph model</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc40">Evaluating YPath</a></p>
<p class="Contents1" dir="ltr"><span class="SectionNumber"></span><a href="#toc41"></a></p>
<p class="Contents1" dir="ltr"><span class="SectionNumber"></span><a href="#toc42">YPath Expressions</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc43">YPath character set</a></p>
<p class="Contents2" dir="ltr"><span class="SectionNumber"></span><a href="#toc44"></a></p>
</div>
<p class="Standard" dir="ltr"> </p>
<h1 dir="ltr" id="toc0"><a id="RefHeading12571012985828"></a>I<a id="Toc300681969"></a>ntroduction</h1>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath is a language designed to address parts of a YAML document. The name “YPath” is inspired by XPath, a URL-like path notation designed for finding information inside an XML document. </p>
<p class="Standard" dir="ltr">Like XPath, YPath operates on the abstract structure of a document, rather than its surface syntax. In particular, YPath acts on the representation graph of a YAML document, where data is represented as a rooted, connected, directed graph of nodes. Nodes can represent scalar objects such as strings or integers; nodes can also represent collections such as sequences or mappings, which in turn reference other nodes.</p>
<p class="Standard" dir="ltr">YPath expressions specify patterns for matching nodes. Given an initial node as an argument, an expression returns the set of nodes (possibly including the initial node itself) that match the pattern.</p>
<h2 dir="ltr" id="toc1"><a id="RefHeading12591012985828"></a>Goals</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">There are several goals for the YPath specification, and they are listed here in no particular order.</p>
<h3 dir="ltr" id="toc2"><a id="RefHeading12611012985828"></a>YPath should be able to address any part of a YAML document</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">All parts of a YAML document are represented in its representation graph, and YPath should be able to access any node inside it. Moreover, it should be possible to write YPath expressions that access exactly one node inside it chosen by the user. </p>
<p class="Standard" dir="ltr">One difference between YAML and most other data formats is that it can represent composite keys inside mappings. YPath should be able to access the node for such a key, and the node for its matching value inside the mapping. Finally, YPath should be able to access any node within a composite key.</p>
<h3 dir="ltr" id="toc3"><a id="RefHeading12631012985828"></a>YPath should be as simple as possible</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Too many specifications become bloated as multiple parties add their own desired features to them. In contrast, YPath is a language for accessing parts of a YAML document, and only parts of a YAML document. Features can be added, but only if they aid users in the primary goal: addressing any part of a YAML document. </p>
<h3 dir="ltr" id="toc4"><a id="RefHeading12651012985828"></a>YPath should be intuitive to write and read</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath should be based on concepts that are familiar to most users, and should use the most appropriate syntax for these concepts. For example, the forward slash character “/” indicates a descent of one layer into the data structure, the same purpose it serves in XPath.</p>
<h3 dir="ltr" id="toc5"><a id="RefHeading12671012985828"></a>Data is more important than its expression</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">The same data may be expressed in several different ways in a YAML document. For example, a string can be written in any of five scalar styles: plain, single-quoted, double-quoted, literal or folded. YPath users are more interested in extracting the data into a format they control than in the quoting or indentation used to represent it. For that reason, YPath should ignore presentation and concern itself only with the data. The representation graph is already part of the YAML specification, so YPath might as well utilise it. </p>
<h3 dir="ltr" id="toc6"><a id="RefHeading12691012985828"></a>YPath should not be restrictive </h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath expressions should be based on the full set of data in the representation graph model. For example, all nodes have a tag, so expressions should be able to target particular nodes based on that tag. In addition, regular expressions should be a part of the language; simple equality checking would not be flexible enough for many users. </p>
<h3 dir="ltr" id="toc7"><a id="RefHeading12711012985828"></a>YPath results can be represented as YAML documents</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath returns node-sets, which reference a subset of the original representation graph. Adding a node that represents the node-set, with references to each of its member nodes, yields a new representation graph. This can be serialized and presented as a YAML document. Alternatively, it can be constructed as a native data structure. </p>
<h3 dir="ltr" id="toc8"><a id="RefHeading12731012985828"></a>YAML streams are not YAML documents, but YPath should support them anyway</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">A YAML stream may consist of several YAML documents, and YAML processors act on streams. However, a YPath expression acts on the representation graph of a single document. The YPath syntax should therefore allow users to address particular documents in a stream. </p>
<h3 dir="ltr" id="toc9"><a id="RefHeading12751012985828"></a>Well defined YPath expressions should never throw errors on well defined YAML documents</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath expressions which contain syntax errors or are otherwise ill-defined cannot be used for processing. However, all well-defined YPath expressions can be used to parse arbitrary YAML documents, as long as they are well-defined in turn. The result may be an empty node set, but this is not an error according to the specification. </p>
<p class="Standard" dir="ltr">There are many components available in the YPath specification that are designed for specific types of data. For example, numerical comparison predicates such as "less than m" and "greater than n" evaluate nodes representing integers or floats, and their results contain only the nodes that satisfy the comparison. A comparison predicate acting on a node representing a string returns nothing, rather than throwing an error and failing. This gives YPath expression authors a lot more freedom in writing. They can concentrate on accessing the data they want, without pre-emptively adding filters to prevent errors.</p>
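<p class="Standard" dir="ltr">The "return nothing rather than fail" behaviour can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <span class="SourceText">less_than</span> is hypothetical, and plain Python values stand in for scalar nodes of a representation graph. It is not part of the YPath syntax.</p>

```python
from typing import Iterable, List, Union

# Assumption: plain Python values stand in for scalar nodes.
Scalar = Union[int, float, str, None]


def less_than(nodes: Iterable[Scalar], bound: float) -> List[Scalar]:
    """Keep only numeric nodes strictly below `bound`.

    Non-numeric nodes (strings, nulls) are skipped silently instead of
    raising an error, mirroring the goal described above.
    """
    result = []
    for value in nodes:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and bound > value:
            result.append(value)
    return result


print(less_than([3, 10.5, "guitar", None, 42], 11))  # [3, 10.5]
print(less_than(["vocals", "drums"], 11))            # []
```

<p class="Standard" dir="ltr">The second call receives only strings, so the result is an empty list rather than an exception.</p>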
<h2 dir="ltr" id="toc10"><a id="RefHeading12771012985828"></a>Prerequisites</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">The only prerequisites for understanding this specification are familiarity with the YAML specification (1.0, 1.1, or 1.2) and with Unicode. An understanding of XML and XPath may be helpful but is not essential.</p>
<h2 dir="ltr" id="toc11"><a id="RefHeading12791012985828"></a>Prior Art</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath uses many concepts found in YAML, such as its flow styles used to represent mappings and sequences. YPath also uses the same escape sequences that were "borrowed" from the C language.</p>
<p class="Standard" dir="ltr">YPath permits users to search and match strings using regular expressions. This technique was popularised by Ken Thompson's text editor ed and the text-search utility grep. Regular expressions are now available in many different programming languages. YPath tries to present the most commonly supported parts of the regular expression syntax.</p>
<p class="Standard" dir="ltr">YPath uses many concepts first popularised in XPath, such as the “axis-node test-predicate” model for location steps. Some parts of the syntax, such as the use of "/" for descent, are re-used here. </p>
<h2 dir="ltr" id="toc12"><a id="Toc300681970"></a><a id="RefHeading12811012985828"></a>Relation to XPath</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Although YPath borrows many concepts from XPath, there is no direct relationship between the two. However, many readers will be familiar with the XPath standard, and thus would be interested in a comparison with YPath.</p>
<h3 dir="ltr" id="toc13"><a id="RefHeading12831012985828"></a>Similarities to XPath</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">XPath acts on XML, which is a document-centric text format. The main ingredients of an XML document are elements, attributes and text, plus possibly processing instructions. XPath provides a mechanism for expressions to address these parts, and models the XML document as a tree of nodes, including element nodes, attribute nodes and text nodes. </p>
<p class="Standard" dir="ltr">In contrast, YAML is a data-centric text format, where the three primitives are mappings, sequences and scalars. YPath is designed to address these primitives, and models the document as a representation graph. </p>
<p class="Standard" dir="ltr">Nevertheless, the philosophy of both XPath and YPath is roughly the same: use expressions that reference the abstract, logical structure of a file, rather than the actual file positions of data within it. </p>
<p class="Standard" dir="ltr"> </p>
<h3 dir="ltr" id="toc14"><a id="RefHeading12851012985828"></a>Differences from XPath</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">However, there are significant differences between XPath and YPath.</p>
<ol class="listlevel1WWNum5">
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations and XPointer, two other specifications associated with XML. In contrast, YPath has no dependency or interaction with any other specification, apart from YAML.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">XPath (version 1.0) expressions can be evaluated to yield four types of objects: node sets, Booleans, numbers and strings. This information can then be used within XSLT. In contrast, YPath expressions only yield node sets. For similar reasons, YPath lacks the arithmetic operators, casting functions and variables present in XPath.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">An XML file contains exactly one XML document, and XPath acts on that document. In contrast, a YAML stream may contain many YAML documents. </p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">XPath classifies the root node of a document separately from the document element. In contrast, there is no specific classification for document roots in YAML. The root of a YAML document can be a mapping, a sequence or a scalar, and is classified as one of these types.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">XML elements and attributes have names, and XML elements may be assigned one or more namespaces (which are also represented as nodes in the XPath data model). In contrast, YAML nodes do not have names, and thus are not assigned to a namespace. YAML tag values may have namespaces, but this data is contained in the node for the tag, rather than treated independently. </p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">XPath expressions can be used to reference XML comments. YAML files have comments, but these are not part of the representation graph, and are ignored by YPath expressions. </p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">The nodes in an XML document have a clear hierarchical relationship between them. For example, one element can contain another or the reverse, but it is impossible for two elements to contain each other at the same time. In contrast, it is possible for YAML nodes to contain themselves through the use of anchors and aliases. This makes it easy to map “child” or “descendant” relationships, but less easy to find “parent” or “ancestor” relationships. For this reason, YPath does not support “parent” or “ancestor” axes.</p>
</li>
</ol>
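<p class="Standard" dir="ltr">The last point can be illustrated with a small Python sketch. It assumes a hypothetical <span class="SourceText">Node</span> class standing in for a representation graph node, and shows one way to collect descendants without looping forever when two nodes contain each other. It is a sketch of the idea only, not a prescribed evaluation algorithm.</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False keeps identity-based comparison
class Node:
    """Hypothetical stand-in for a representation graph node."""
    label: str
    children: List["Node"] = field(default_factory=list)


def descendants(start: Node) -> List[Node]:
    """Return every node reachable from `start` via child edges, each once."""
    seen = set()   # ids of nodes already visited, guards against cycles
    found = []
    stack = list(reversed(start.children))
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        found.append(node)
        stack.extend(reversed(node.children))
    return found


a = Node("a")
b = Node("b")
a.children.append(b)
b.children.append(a)  # b contains a, which contains b: a cycle

print([n.label for n in descendants(a)])  # ['b', 'a']
```

<p class="Standard" dir="ltr">Note that node "a" appears among its own descendants. In such a graph, "a" is also its own ancestor, and both nodes are each other's parent, which is why a “parent” axis has no clear meaning.</p>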
<p class="ListParagraph" dir="ltr" style="margin-left:0;margin-right:0;text-indent:0"> </p>
<h2 dir="ltr" id="toc15"><a id="Toc300681971"></a><a id="RefHeading12871012985828"></a>Implementation</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">There is no complete implementation of the YPath specification. A limited implementation can be found here:</p>
<p class="Standard" dir="ltr">http://pyyaml.org/browser/trunk/yaml/ypath.py?rev=71</p>
<p class="Standard" dir="ltr">http://pyyaml.org/browser/trunk/TestingSuite/ypath.yml?rev=71 </p>
<h1 dir="ltr" id="toc16"></h1>
<h1 dir="ltr" id="toc17" style="page-break-before:always"><a id="RefHeading12891012985828"></a>Preview</h1>
<h2 dir="ltr" id="toc18"><a id="RefHeading12911012985828"></a>The problem domain: musician data</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">This section is intended as a preview of what YPath can do. Here, we consider the example of a music lover who wants to create a program that stores a database of her favourite artists. She decides to use YAML for the configuration file, as it makes a good sketchpad for ideas. A first pass of the document looks like this.</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">---</p>
<p class="PreformattedText" dir="ltr">- &KC !muso {name: Kurt Cobain,</p>
<p class="PreformattedText" dir="ltr"> birth: 20-02-1967, </p>
<p class="PreformattedText" dir="ltr"> death: 05-04-1994,</p>
<p class="PreformattedText" dir="ltr"> plays: [vocals, guitar],</p>
<p class="PreformattedText" dir="ltr"> bio: 'Grunge musician. What a way to go.' }</p>
<p class="PreformattedText" dir="ltr">- &JH !muso</p>
<p class="PreformattedText" dir="ltr"> name: Josh Homme</p>
<p class="PreformattedText" dir="ltr"> birth: 17-05-1973</p>
<p class="PreformattedText" dir="ltr"> death:</p>
<p class="PreformattedText" dir="ltr"> plays: </p>
<p class="PreformattedText" dir="ltr"> - vocals</p>
<p class="PreformattedText" dir="ltr"> - guitars</p>
<p class="PreformattedText" dir="ltr"> bio: |</p>
<p class="PreformattedText" dir="ltr"> Big in the Palm Desert scene. </p>
<p class="PreformattedText" dir="ltr">- &DG !muso </p>
<p class="PreformattedText" dir="ltr"> name: Dave Grohl</p>
<p class="PreformattedText" dir="ltr"> birth: 14-01-1969</p>
<p class="PreformattedText" dir="ltr"> plays: [drums, guitar, vocals]</p>
<p class="PreformattedText" dir="ltr"> bio: "Cool dude. Been with a lot of bands."</p>
<p class="PreformattedText" dir="ltr"> collaborations: {[*KC, Krist Novoselic]: Nirvana,</p>
<p class="PreformattedText" dir="ltr"> [*JH, Nick Oliveri, Mark Lanegan]: Queens of the Stone Age,</p>
<p class="PreformattedText" dir="ltr"> [*JH, John Paul Jones]: Them Crooked Vultures}</p>
<p class="PreformattedText" dir="ltr">- &ДШ !muso</p>
<p class="PreformattedText" dir="ltr"> name: {en: Dmitri Shostakovich, ru: Дмитрий Шостакович, </p>
<p class="PreformattedText" dir="ltr"> de: Dmitri Schostakowitsch}</p>
<p class="PreformattedText" dir="ltr"> birth: 25-09-1906 # Gregorian, not Julian calendar date.</p>
<p class="PreformattedText" dir="ltr"> death: 09-08-1975</p>
<p class="PreformattedText" dir="ltr"> plays: [piano] # Compositions can go later.</p>
<p class="PreformattedText" dir="ltr"> bio: ></p>
<p class="PreformattedText" dir="ltr"> Got Three "Hero of the Soviet Union" medals.</p>
<p class="PreformattedText" dir="ltr"> Didn't get along with Stalin.</p>
<p class="PreformattedText" dir="ltr">...</p>
<h2 dir="ltr" id="toc19"><a id="RefHeading12931012985828"></a>Observations</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Music purists may be appalled by this example. There is no mention of the Foo Fighters, and John Paul Jones's <a href="http://www.ledzeppelin.com/">work with other artists curiously remains unexplored</a>. Nevertheless, there are several observations that can be made about how this anonymous music lover arranged her data. </p>
<ul style="margin-top:0;margin-bottom:0;list-style-type:disc;clear:left">
<li>
<p class="Standard" dir="ltr">The first observation is that the music lover has created a custom tag "muso" to represent musician data. She has implemented it as a mapping with the following keys: "name", "birth" date, "death" date, the instrument(s) the musician "plays", and a "bio" (-graphy). Last but not least is "collaborations", which lists other people the musician has worked with. All keys are strings.</p>
</li>
<li>
<p class="Standard" dir="ltr">However, not every key or value is present for each musician; "death" maps to a null in Josh Homme's record, and there is no "death" key for Dave Grohl. This is hardly surprising, as neither musician is dead at the time of writing. </p>
</li>
<li>
<p class="Standard" dir="ltr">The "collaborations" key corresponds to another mapping, where keys are sequences of musicians. In this mapping, each corresponding value is the name of the resulting band. YAML permits anything to be used as a mapping key, as long as there are no duplicates in one mapping.</p>
</li>
<li>
<p class="Standard" dir="ltr">In some cases, musicians are indicated by aliases to other records; in other cases, they are names. This is one reason that our music lover chose YAML: arbitrary data types can be mixed together in the one structure. </p>
</li>
<li>
<p class="Standard" dir="ltr">The corresponding value for "name" keys is generally a string. However, non-English musicians such as Dmitri Shostakovich may have completely different names in their own language. For this record, the corresponding value of "name" is a mapping, where <a id="firstHeading"></a>ISO 639-1 two-letter language codes map onto the relevant transcription. </p>
</li>
<li>
<p class="Standard" dir="ltr">YAML data is Unicode data. Our music lover could use '\uxxxx' escape sequences to represent the characters in Shostakovich's name, but it is far easier to write it in the original Cyrillic.</p>
</li>
<li>
<p class="Standard" dir="ltr">The "bio" key always maps onto strings. YAML allows strings to be styled in several different ways: as a single-quoted string, a double-quoted string, a folded scalar or a literal scalar. Our music lover has chosen each style for convenience. For example, the folded style allows her to put double quotes around "Hero of the Soviet Union" without escape sequences for the quote characters. The representation graph does not care about style information. Neither does YPath.</p>
</li>
</ul>
<h2 dir="ltr" id="toc20"><a id="RefHeading12951012985828"></a>Using YPath</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Given a well-formed YAML document, anyone can execute YPath expressions to extract data out of it. For the above YAML example, our music lover has the following use cases in mind.</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find musicians</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath only needs to return all data with the tag '!muso'. More formally, we want all descendant nodes of the root that are musicians. So we can use:</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">/*!muso</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">But YPath is a terse language for terse expressions, so the following would suffice.</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso</p>
<p class="PreformattedText" dir="ltr"> </p>
<p class="Standard" dir="ltr">In the example document above, the root node is a document, and its children are exactly the musician nodes in the document. So the author could also write the following to extract the same information:</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">/.</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">However, the result of evaluating "/." would not always be the same as evaluating "!muso". The author may add more nodes to the root sequence <span style="font-style:italic">which do not represent musicians</span>. Alternatively, the author could add information for other musicians elsewhere in the document, such as in the value of another musician's "collaborations" key. Either is permissible in YAML. If an author is looking for musician data, then his or her YPath expression should be explicit about it. That is why "!muso" is a lot safer. </p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find the names of all musicians </p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">A valid YPath expression would find all data with the tag '!muso'. For each structure found, YPath returns the corresponding value for key 'name'. More formally, this could be expressed in either of the following ways.</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso/value("name")</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">But we can use the <span style="font-style:italic">abbreviated</span> syntax. A plain, unquoted scalar in a YPath expression denotes a key in a mapping, and indicates that the corresponding value should be returned. So the following would be equivalent.</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso/name</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find the English names of all musicians</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">The data from the last use case is refined as follows: if the data is a string, return it. Otherwise, it is assumed that the data is a mapping, so return the value corresponding to the 'en' key. How can this be achieved? Not all name values are strings.</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath allows data to be restricted to types by using tags - the same tags as are in YAML itself. The results can be combined by using the "|" operator, which indicates set union. So the result looks like this.</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso/(name.!!str|name.!!map/en)</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find the Russian names of all musicians</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Regular expressions come to the rescue: we look for the values of "name" keys that contain Cyrillic characters. It is unnecessary to filter out name values which are mappings; regular expression predicates will ignore them.</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso/name.res("[А-Ӿ]")</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-style:italic">Note: the expression contains a Cyrillic "А" rather than a Latin "A".</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find all musicians that have collaborated with Dave Grohl</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath finds the unique record where the tag is '!muso' and the name is Dave Grohl. For this record, it finds the value for the "collaborations" key. This is a mapping, where all keys are sequences. YPath finds each key using "keys()", and returns all elements in each sequence using "*".</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso/name.!!str.="Dave Grohl"/collaborations/keys()/*</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find all musicians that have collaborated with Dave Grohl that are represented as strings</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">A YPath expression here would be similar to that for the last use case, with one refinement: "YPath returns all elements in each sequence <span style="font-style:italic">that are strings</span>."</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso/name.!!str.="Dave Grohl"/collaborations/keys()/*.!!str</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find all birthdates of musicians born before 1970</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">We can use abbreviated YPath forms to find the values of birthdates, and then filter the ones that are before 1970 by using the lessthan predicate. So we can write:</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso/birthdate.lessthan(01-01-1970)</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find all musicians born before 1970</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">The situation is different from above - YPath needs to return the musicians with birthdates before 1970, rather than returning the birthdates themselves. So the $ operator wraps the birthdate test to form a predicate that musicians must satisfy. Its contents are another YPath expression. The predicate is true if evaluating that expression yields a non-empty result, and false otherwise.</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso.$(birthdate.lessthan(01-01-1970))</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find all musicians who are dead</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath finds all musicians with the tag '!muso', and where the value for 'death' is a valid timestamp. </p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso.$(death.!!timestamp)</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-weight:bold">Find all musicians who worked with Dave Grohl and are dead</p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">A valid expression would start with the 'Find all musicians that have collaborated with Dave Grohl' use case, and refine it so that 'YPath returns all elements in each sequence <span style="font-style:italic">that have the tag !muso, and whose value for "death" is a valid timestamp</span>':</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">!muso/name.!!str.="Dave Grohl"/collaborations/keys()/$(death.!!timestamp)</p>
<h1 dir="ltr" id="toc21"><a id="RefHeading12971012985828"></a>YPath Components</h1>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">This section gives an overview of the functionality that YPath provides, and of the components that can be used to find content inside YAML streams.</p>
<p class="Textbody" dir="ltr">The primary parts of a YPath expression are document matching, value uniqueness, and the location path; only the last is required. Each will be described in turn. The location path is the "core" part of a YPath expression, where users specify what sort of data they want from a YAML document.</p>
<h2 dir="ltr" id="toc22"><a id="RefHeading12991012985828"></a>Document Matching</h2>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">A YAML stream can contain one document, or it can contain multiple documents. YPath expressions can be written to target specific documents inside a stream. This is done by writing a document matching component inside the YPath expression. This component is optional. If present, it always occurs at the start of the expression. If absent, the subsequent parts of the expression are taken to act on all documents in the stream.</p>
<p class="Textbody" dir="ltr">The document matching component can be written to select a document by its index. By convention, YPath uses 0-based indices, so an index of 0 is used to select the first document in the stream to be the target of YPath, an index of 1 for the second document, and so on. For convenience, users can write -1 to select the last document in the stream.</p>
<p class="Textbody" dir="ltr">Alternatively, users can write slices to address ranges of documents in a YAML stream. For example, "0:3" would select the first to fourth documents, while "4:-1" would select the fifth to last (if any). Each document selected (if it exists) becomes the target of the subsequent parts of the YPath expression, and the remainder are ignored. YPath then acts on the representation graph for each document selected, and creates a result set based on its components. </p>
<p class="Textbody" dir="ltr">It is possible for users to write a document matching component for documents that don't exist in a YAML stream, or for a stream that lacks YAML documents. In that case, the application should deliver a warning, but not an error. By contrast, a selected document that exists may still yield an empty result set; this is judged a successful result. </p>
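<p class="Textbody" dir="ltr">As an illustration only, the index and slice behaviour described above can be sketched in Python. The function name "select_documents" and its string argument are hypothetical, since the concrete YPath syntax is not yet defined. The sketch assumes that slices are inclusive at both ends, as the "0:3" example suggests, and that negative indices count back from the end of the stream.</p>

```python
import warnings


def select_documents(documents, matcher):
    """Return the documents picked by an index ("2") or inclusive slice ("0:3").

    Hypothetical helper: `documents` is a list with one entry per document in
    the stream, and `matcher` is the text of the document matching component.
    """
    count = len(documents)

    def resolve(text):
        # Negative indices count back from the end, so -1 is the last document.
        index = int(text)
        return index + count if index < 0 else index

    if ":" in matcher:
        first_text, last_text = matcher.split(":")
        first, last = resolve(first_text), resolve(last_text)
    else:
        first = last = resolve(matcher)

    # Keep only positions that really exist in the stream.
    selected = [documents[i] for i in range(max(first, 0), min(last, count - 1) + 1)]
    if not selected:
        # Missing documents produce a warning, not an error.
        warnings.warn("document matcher %r selected no documents" % matcher)
    return selected
```

<p class="Textbody" dir="ltr">With a stream of six documents, "0:3" picks the first four, "4:-1" picks the last two, and "9" picks nothing and raises a warning.</p>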
<p class="Textbody" dir="ltr">Documents in YAML streams are distinct from each other, as are their representation graphs - and each document's YPath result set is distinct from other documents' result sets. </p>
<h2 dir="ltr" id="toc23"><a id="RefHeading13011012985828"></a>Value Uniqueness</h2>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">YPath processors will take a YPath expression and a document, and return a result set. This is a set of all nodes in the document's representation graph that match the expression. YPath does not allow duplicate nodes in one result set - that is, nodes that occupy the same part of memory. However, it is possible to have two nodes with the same value. No nodes in a result set are identical to each other, but some may be equal to each other.</p>
<p class="Textbody" dir="ltr">For this reason, YPath allows users to specify expressions so that all values in a result set are unique. In particular, scalar values should be unique in value, while collections may be unique. This option exists because YPath users may not want duplicate values in their result set, and duplicates may cause errors when the results are passed elsewhere. Alternatively, users may indicate that duplicate values are permitted.</p>
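<p class="Textbody" dir="ltr">The difference between the two kinds of uniqueness can be sketched in Python, where "is" tests identity and "==" tests equality. The helper names below are hypothetical, and plain Python lists stand in for nodes of a representation graph.</p>

```python
def unique_by_identity(nodes):
    """Drop repeated references to the same node object."""
    seen, result = set(), []
    for node in nodes:
        if id(node) not in seen:
            seen.add(id(node))
            result.append(node)
    return result


def unique_by_value(nodes):
    """Drop nodes that are equal in value to an earlier node."""
    result = []
    for node in nodes:
        if not any(node == kept for kept in result):
            result.append(node)
    return result


first = ["Foo Fighters"]
second = ["Foo Fighters"]   # equal to `first`, but a different object
matches = [first, second, first]

by_identity = unique_by_identity(matches)   # keeps `first` and `second`
by_value = unique_by_value(matches)         # keeps `first` only
```

<p class="Textbody" dir="ltr">A result set always behaves like the identity version; the value version corresponds to the optional uniqueness component.</p>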
<h2 dir="ltr" id="toc24"><a id="RefHeading13031012985828"></a>Location Paths</h2>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">Location paths are the core part of the YPath syntax, as they are in XPath. The contents of a location path are evaluated relative to a context node - a node in the representation graph. Generally (but not always), this is the root node of a document.</p>
<p class="Textbody" dir="ltr">Each location path can be broken down into a series of steps, where each step is joined to the previous by the forward slash character "/". The first step is the leftmost part of the location path, and each step contains information that defines a set of nodes based on its relation to the context node. Each node in a set is used as a context node for the following step, and the node set for the following step is the union of all nodes found by evaluating that step. A YPath processor works its way along the location path until the last step is evaluated - and then the result for that step becomes the result for the location path.</p>
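<p class="Textbody" dir="ltr">The step-by-step evaluation can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a definitive algorithm: each step is modelled as a hypothetical function from one context node to a list of nodes, and native Python dicts and lists stand in for mapping and sequence nodes.</p>

```python
def evaluate_path(steps, context):
    """Evaluate a location path given as a list of step functions."""
    current = [context]
    for step in steps:
        found = []
        for node in current:
            for match in step(node):
                # Union by identity: a node appears at most once per node set.
                if not any(match is seen for seen in found):
                    found.append(match)
        # The node set of this step supplies the context nodes of the next.
        current = found
    return current


def value_of(key):
    """Build a step that returns the value stored under `key` in a dict."""
    return lambda node: [node[key]] if isinstance(node, dict) and key in node else []


def elements(node):
    """A step that returns the elements of a list."""
    return list(node) if isinstance(node, list) else []


document = [{"name": "Dave Grohl"}, {"name": "Josh Homme"}, "not a record"]
names = evaluate_path([elements, value_of("name")], document)
```

<p class="Textbody" dir="ltr">Here the first step yields the three elements of the root sequence, and the second step yields a name for each element that is a mapping with a "name" key; the plain string contributes nothing.</p>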
<p class="Textbody" dir="ltr">In XPath, there is a distinction between relative location paths, which do not start with a "/" character, and absolute location paths, which do. Absolute location paths are always evaluated relative to the root node. YPath also distinguishes between relative and absolute location paths, but it is almost always unnecessary to define an absolute location path in YPath. Unless otherwise specified, the context node for a YPath expression is the root node of a document.</p>
<h2 dir="ltr" id="toc25"><a id="RefHeading13051012985828"></a>Location Steps</h2>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">There are several parts that may be in a location step.</p>
<h3 dir="ltr" id="toc26"><a id="RefHeading13071012985828"></a>Axis</h3>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">An axis specifies the relationship between the nodes selected by the location step and the context node. There are four primary axes that can be specified.</p>
<p class="Textbody" dir="ltr">Self: the node selected by the location step is the context node itself.</p>
<p class="Textbody" dir="ltr">Keys: the nodes selected by the location step are the keys of the context node (if the node is a mapping). For scalar and sequence context nodes, a keys axis returns an empty node set.</p>
<p class="Textbody" dir="ltr">Values: the nodes selected by the location step are the values of the context node (if the node is a mapping). For scalar and sequence context nodes, a values axis returns an empty node set.</p>
<p class="Textbody" dir="ltr">Elem: the nodes selected by the location step are the elements of the context node (if the node is a sequence). For scalar and mapping context nodes, an elem axis returns an empty node set.</p>
<p class="Textbody" dir="ltr">Axes can be combined. For example, YPath allows users to combine the self and values axes. One common combination is "child", which is the combination of keys, values and elem. All nodes which are contained in another node (a mapping or sequence) are its children. Descendants are nodes which are children, grandchildren and so on. </p>
<p class="Textbody" dir="ltr">It is possible to refine axes by putting conditions on them. For sequence context nodes, it is possible to restrict the elements returned by the elem axis by their index in the sequence. Users can write a single index, or a slice, akin to the document matching component.</p>
<p class="Textbody" dir="ltr">For mapping context nodes, it is also possible to restrict the nodes returned by the "values" axis by their corresponding keys. There are two ways to do it.</p>
<p class="Textbody" dir="ltr">The first is to specify the key in the YPath expression. For complex keys, it is possible to write the key using YAML flow notation inside the expression. This can include "{", "}", "[", "]", ":", and even "&" and "*" characters for anchors and aliases! The values axis is then refined to return only the value with the matching key.</p>
<p class="Textbody" dir="ltr">The second method is to specify a second YPath expression which uses the key node as the context node, and is evaluated to search for particular descendants of the key. The values axis is then refined to return only those values whose keys yield a non-empty node set when the second YPath expression is evaluated. </p>
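<p class="Textbody" dir="ltr">The four primary axes, the "child" combination, and the first way of refining the values axis can be sketched in Python. The classes "Scalar", "Sequence" and "Mapping" and all function names below are hypothetical stand-ins for representation graph nodes; they are not part of any YAML library. The key refinement shown handles scalar keys only.</p>

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Scalar:
    value: str


@dataclass(eq=False)
class Sequence:
    items: list = field(default_factory=list)


@dataclass(eq=False)
class Mapping:
    pairs: list = field(default_factory=list)   # list of (key node, value node)


def self_axis(node):
    return [node]


def keys_axis(node):
    return [key for key, _ in node.pairs] if isinstance(node, Mapping) else []


def values_axis(node):
    return [value for _, value in node.pairs] if isinstance(node, Mapping) else []


def elem_axis(node):
    return list(node.items) if isinstance(node, Sequence) else []


def child_axis(node):
    # "child" combines the keys, values and elem axes.
    return keys_axis(node) + values_axis(node) + elem_axis(node)


def values_for_key(node, wanted):
    # Refine the values axis by key; scalar keys only, compared by content.
    if not isinstance(node, Mapping):
        return []
    return [value for key, value in node.pairs
            if isinstance(key, Scalar) and key.value == wanted]


# A mapping whose single key is a sequence, as in the "collaborations" example.
band = Scalar("Them Crooked Vultures")
members = Sequence([Scalar("Dave Grohl"), Scalar("Josh Homme")])
collaborations = Mapping([(members, band)])
record = Mapping([(Scalar("collaborations"), collaborations)])
```

<p class="Textbody" dir="ltr">Because a key can be any node, the mapping is stored as a list of key and value pairs rather than as a native dictionary, which in Python could not use a sequence as a key.</p>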
<h3 dir="ltr" id="toc27"><a id="RefHeading13091012985828"></a>Tag Predicates</h3>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">The node set returned by the axes may then be refined to allow only those nodes whose tags match a certain value. YPath expressions can be written to limit nodes by namespace (the URI for the tag), and by local name.</p>
<h3 dir="ltr" id="toc28"><a id="RefHeading13111012985828"></a>Other Predicates</h3>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">It is possible to add other predicates to a step, which then refine the node set defined by the axes and the tag predicates. Other predicates include (but are not limited to):</p>
<p class="Textbody" dir="ltr">Equal-to: Only returns those nodes which are equal to a given value; returns nothing otherwise.</p>
<p class="Textbody" dir="ltr">Not-equal-to: Only returns those nodes which are not equal to a value.</p>
<p class="Textbody" dir="ltr">Less-than, greater-than: Only returns those nodes which are less than or greater than a given value. This is useful for numerical and timestamp nodes. For other context nodes, these predicates return nothing.</p>
<p class="Textbody" dir="ltr">Ranges: Only returns those nodes which are between A and B in value.</p>
<p class="Textbody" dir="ltr">(Length: returns the length of strings in characters, collections in number of elements, etc. Not really a predicate, but should be passed to one.)</p>
<p class="Textbody" dir="ltr">Regexp: Returns nodes only when the regular expression is evaluated to match it.</p>
<p class="Textbody" dir="ltr">HasYPath: Returns nodes only when a given YPath expression evaluated on them yields a non-empty node set. This is useful when users are interested in nodes rather than their descendants, but need to pick nodes by their descendants.</p>
<p class="Textbody" dir="ltr">IsYPath: Returns nodes only when a given YPath expression evaluated on itself yields itself! Possible due to the wondrous recursive nature of YAML.</p>
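<p class="Textbody" dir="ltr">A few of these predicates can be sketched in Python as filters over a node set. The function names are hypothetical, plain Python values stand in for nodes, and a callable stands in for the nested YPath expression of HasYPath. The birthdates in the sample data are illustrative and are not copied from the example document.</p>

```python
import re
from datetime import date


def less_than(nodes, limit):
    """Keep nodes of the same type as `limit` that sort before it."""
    return [n for n in nodes if type(n) is type(limit) and n < limit]


def regexp(nodes, pattern):
    """Keep string nodes that the regular expression matches somewhere."""
    return [n for n in nodes if isinstance(n, str) and re.search(pattern, n)]


def has_ypath(nodes, expression):
    """Keep nodes for which `expression(node)` yields a non-empty result."""
    return [n for n in nodes if expression(n)]


musicians = [
    {"name": "Dave Grohl", "birthdate": date(1969, 1, 14)},
    {"name": "Josh Homme", "birthdate": date(1973, 5, 17)},
]

# Musicians, not birthdates: the inner test only decides which nodes survive.
born_before_1970 = has_ypath(
    musicians,
    lambda m: less_than([m["birthdate"]], date(1970, 1, 1)),
)

# Non-string nodes are silently ignored by the regexp predicate.
cyrillic = regexp(["Дмитрий", "Dmitri", {"en": "Dmitri"}], "[А-Ӿ]")
```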
<p class="Textbody" dir="ltr"> </p>
<p class="Standard" dir="ltr" style="font-style:italic">Note: the actual YPath syntax for each predicate remains to be defined. Users on the yaml-core mailing list will have their input into this process. For the time being, this specification is interested in what predicates are necessary in a useful YPath.</p>
<p class="Standard" dir="ltr" style="font-style:italic"> </p>
<h2 dir="ltr" id="toc29"><a id="RefHeading13131012985828"></a>Set Operators in YPath</h2>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">There are three operators in YPath to combine the results of YPath expressions. They are:</p>
<ul style="margin-top:0;margin-bottom:0;list-style-type:disc;clear:left">
<li>
<p class="Standard" dir="ltr">Set union, indicated by the vertical bar "|" symbol. The result of evaluating A | B is the union of the result set of evaluating A and the result set of evaluating B.</p>
</li>
<li>
<p class="Standard" dir="ltr">Set intersection, indicated by the ampersand "&" symbol. The result of evaluating A & B is the intersection of the result set of evaluating A and the result set of evaluating B.</p>
</li>
<li>
<p class="Standard" dir="ltr">Set complement, indicated by the caret "^" symbol. The result of evaluating ^A is the complement of the result set of evaluating A relative to the document set, taken here to mean the set of all nodes in the document.</p>
</li>
</ul>
<p class="Textbody" dir="ltr">Full location paths, location steps, and even sub-components of steps - such as axes or predicates - can be the arguments of set operators in YPath. Parentheses - "(" and ")" - can be used to combine the results of set operations in arbitrary order. </p>
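<p class="Textbody" dir="ltr">The three set operators can be sketched in Python over lists of nodes. This sketch assumes that membership is decided by node identity rather than by value, and that the complement is taken relative to a list of all nodes in the document; the function names are hypothetical.</p>

```python
def union(a, b):
    """Nodes in `a`, followed by nodes of `b` that are not already in `a`."""
    return a + [node for node in b if not any(node is x for x in a)]


def intersection(a, b):
    """Nodes of `a` that also occur in `b`."""
    return [node for node in a if any(node is x for x in b)]


def complement(a, universe):
    """Nodes of `universe` that do not occur in `a`."""
    return [node for node in universe if not any(node is x for x in a)]


n1, n2, n3 = ["a"], ["b"], ["a"]   # n1 and n3 are equal but not identical
all_nodes = [n1, n2, n3]

both = union([n1], [n2, n1])              # n1, n2
shared = intersection([n1, n2], [n2, n3])   # n2 only: n3 equals n1 but is another node
rest = complement([n1], all_nodes)          # n2, n3
```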
<h2 dir="ltr" id="toc30"><a id="RefHeading13151012985828"></a>Abbreviated Steps</h2>
<p class="Textbody" dir="ltr"> </p>
<p class="Textbody" dir="ltr">For convenience, YPath allows an abbreviated notation. If a step is a plain scalar (without quotes), then for a given context node, YPath returns:</p>
<ul style="margin-top:0;margin-bottom:0;list-style-type:disc;clear:left">
<li>
<p class="Standard" dir="ltr">The corresponding value for the key specified by the plain scalar - if the context node is a mapping, and/or,</p>
</li>
<li>
<p class="Standard" dir="ltr">The corresponding index or range of elements in the context node, if the context node is a sequence, and the plain scalar can be "cast" to an index or range. </p>
</li>
</ul>
<p class="Textbody" dir="ltr">For other situations, YPath returns nothing.</p>
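<p class="Textbody" dir="ltr">The rule for abbreviated steps can be sketched in Python. The function name is hypothetical, native dicts and lists stand in for mapping and sequence nodes, and for brevity the sketch handles single indices only, not ranges.</p>

```python
def abbreviated_step(node, scalar):
    """Apply a plain-scalar step to a context node (dict or list stand-ins)."""
    results = []
    if isinstance(node, dict) and scalar in node:
        # Mapping context node: return the value for the matching key.
        results.append(node[scalar])
    if isinstance(node, list):
        try:
            index = int(scalar)
        except ValueError:
            return results          # the scalar cannot be cast to an index
        if -len(node) <= index < len(node):
            results.append(node[index])
    return results
```

<p class="Textbody" dir="ltr">For a scalar context node, or for a key or index that does not exist, the function returns an empty list.</p>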
<h1 dir="ltr" id="toc31"></h1>
<h1 dir="ltr" id="toc32" style="page-break-before:always"><a id="Toc300681972"></a><a id="RefHeading13171012985828"></a>Fundamentals of YPath</h1>
<p class="Standard" dir="ltr">YPath is a language for addressing parts of a YAML document or stream, and processing YPath requires the processing of YAML. To aid understanding, the following sections may paraphrase chapter 3 of the YAML specification: "Processing YAML Information". For more information, refer to the <a href="http://www.yaml.org/spec/1.2/spec.html">YAML specification itself</a>.</p>
<h2 dir="ltr" id="toc33"><a id="RefHeading13191012985828"></a>YAML Processing</h2>
<p class="Textbody" dir="ltr"> </p>
<p class="Standard" dir="ltr">YAML is a text format used for presenting any native data structure in a file or stream. In theory, any data structure could be converted into a YAML file which preserves its structure in a human-readable format. The verb "dump" is used to refer to this conversion, and the verb "load" denotes the reverse - taking a YAML file and converting it to native structures. </p>
<p class="Standard" dir="ltr">The YAML specification suggests (but does not mandate) that conversion between a YAML stream and the native structures use two intermediate stages. One stage is the representation graph, where each node in the graph corresponds to a datum - a collection or an atomic object - inside the YAML stream. An analogy for the representation graph model is the DOM (Document Object Model) for XML documents. This will be described below in more detail. The other stage is the serialisation stage, where each node corresponds to an occurrence of a datum read inside the YAML stream - <span style="font-style:italic">in the order in which it is referenced</span>. This permits a serial event-based API to access it, analogous to how SAX can be used to parse XML. </p>
<p class="Standard" dir="ltr">The four stages of YAML processing - the native data structure, the representation graph, the serialisation event tree, and the character stream - are shown in the diagram below.</p>
<p class="Standard" dir="ltr"> </p>
<div style="text-align:center">
<img alt="graphics1" class="frameGraphics" id="graphics1graphic" src="ypathspec-img/ypathspec-img001.png" style="width:624.0px;height:243.06143px" />
</div>
<p class="Standard" dir="ltr"></p>
<p class="Subtitle" dir="ltr"><span class="Captioncharacters">Figure 3.1 - YAML Processing Overview. </span></p>
<p class="Subtitle" dir="ltr"><span class="Captioncharacters">Taken from figure 3.1 of the YAML 1.2 specification</span></p>
<h2 dir="ltr" id="toc34"><a id="RefHeading13211012985828"></a>YPath Processing</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath uses the Representation Graph layer as its information model. It has been chosen for several reasons.</p>
<ul style="margin-top:0;margin-bottom:0;list-style-type:disc;clear:left">
<li>
<p class="Standard" dir="ltr">The Representation Graph is a fairly simple model, with only three types of nodes to support. This contrasts with the DOM, where at the time of writing there are twelve types of nodes to support. In addition, the Representation Graph contains data, but otherwise does not specify any methods or interfaces to interact with other applications, unlike DOM. The representation graph model provides exactly what is necessary for YPath - no more and no less. </p>
</li>
<li>
<p class="Standard" dir="ltr">The Representation Graph model is part of the YAML specification itself, rather than specified through external documentation. This compares with XPath, where an Infoset Model had to be created specifically to support it. </p>
</li>
<li><p class="Standard" dir="ltr">Other layers in the YAML specification are unsuitable for providing a data model for YPath. </p><ul style="margin-top:0;margin-bottom:0;list-style-type:bullet;clear:left"><li><p class="Standard" dir="ltr">The presentation layer is not an abstract structure of a stream, and stylistic choices such as indentation levels should be ignored. </p></li><li><p class="Standard" dir="ltr">The serialisation layer also includes information that should be ignored, such as the order of keys in a stream. </p></li><li><p class="Standard" dir="ltr">Finally, the native data layer is unsuitable, as any YAML stream is likely to be implemented differently in different platforms and languages. For example, PHP does not even provide a Unicode string, and the implementation of Unicode strings in Python differs between 2.x and 3.x, and between Linux and Windows.</p></li></ul></li>
</ul>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">The necessity of the Representation Graph is one instance where YAML processing differs from YPath. As the YAML specification (3.1) states: </p>
<p class="Standard" dir="ltr">"A YAML processor need not expose the serialization or representation stages. It may translate directly between native data structures and a character stream (dump and load in the diagram above)."</p>
<p class="Standard" dir="ltr">YPath differs in that the representation stage must be exposed. However, the serialisation stage may or may not be exposed. The following sentences in the specification remain true for both YAML and YPath.</p>
<p class="Standard" dir="ltr">"However, such a direct translation should take place so that the native data structures are constructed only from information available in the representation. In particular, mapping key order, comments, and tag handles should not be referenced during composition." </p>
<p class="Standard" dir="ltr">From this, it is possible to construct a simplified YPath Processing overview.</p>
<p class="Standard" dir="ltr"> </p>
<div style="text-align:center">
<img alt="graphics3" class="frameGraphics" id="graphics3graphic" src="ypathspec-img/ypathspec-img002.png" style="width:598.1481px;height:281.6126px" />
</div>
<p class="Standard" dir="ltr"></p>
<p class="Subtitle" dir="ltr"><span class="Captioncharacters">Figure 3.2 - YPath Processing Overview </span></p>
<p class="Standard" dir="ltr">In this diagram, the serialisation layer has been elided, "parsing" is combined into composition, and serialisation is combined into presentation. Readers should not interpret this to mean that the Serialisation Event Tree is incompatible with YPath. Rather, YPath processors may compose Representation Graphs from YAML streams without necessarily passing through a serialisation phase. </p>
<h2 dir="ltr" id="toc35"><a id="RefHeading13231012985828"></a>The Representation Graph Model</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">A YAML stream can contain one or more documents. For each well-formed and valid document inside it, a YAML processor will compose a representation graph of data for it. Each graph is distinct from the others, and it is not possible for a graph to span multiple documents. The Representation Graph model is the data model used by YPath, and as in the last section, it may be helpful to paraphrase the YAML specification for YPath implementers. </p>
<p class="Standard" dir="ltr"> </p>
<h3 dir="ltr" id="toc36"><a id="RefHeading13251012985828"></a>Representation Graphs</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">A YAML document can present many different types of data as text. Some are scalars, such as integers and strings, and others are collections (such as sequences and mappings). Composing a Representation Graph means turning the data into a rooted, directed, connected graph of nodes, with each node representing a datum in the original document. All nodes fall into one of the following categories, or kinds:</p>
<ul class="listlevel1WWNum4">
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Mappings: a collection of unique keys, each mapping to a value. Keys and values are nodes in their own right.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Sequences: an ordered collection of elements. Each element is a node in its own right.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Scalars: data which cannot be represented as collections, such as integers, strings, floating point numbers and date time expressions. </p>
</li>
</ul>
<p class="Standard" dir="ltr"> </p>
<h3 dir="ltr" id="toc37"><a id="RefHeading13271012985828"></a>Node Properties</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">All nodes have a tag - a string - which identifies the type of the native data structure presented by the node. Tags may be global, and assigned a namespace - a URI indicating its family - which is contained in the tag property. Alternatively, tags may be local, and have no namespace. For example, "quaternion" is a local tag, while "tag:yaml.org,2002:str" is a global tag which YAML uses for strings, with "tag:yaml.org,2002" being the namespace for standard YAML tags. By convention, all tags are prefixed with the "!" character in YAML streams, to distinguish them from other content. However, it would be redundant to store this character in the tag property.</p>
<p class="Standard" dir="ltr">All scalar nodes also have a canonical format property. This is a Unicode character string which represents the same content as the data in the original document. For example, the canonical format for "0xf", "0x0f" and "15" (all representations of the number 15) is "15". Neither mappings nor sequences have canonical format properties. </p>
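<p class="Standard" dir="ltr">For the integer example above, the idea of a canonical format can be sketched in Python. The function name is hypothetical, and Python's own prefix rules are used as a stand-in; they are not identical to the integer formats that YAML accepts.</p>

```python
def canonical_int(text):
    """Return the canonical decimal form of an integer scalar."""
    # Base 0 lets Python infer the base from a prefix such as "0x".
    return str(int(text, 0))


forms = [canonical_int(text) for text in ("0xf", "0x0f", "15")]
```

<p class="Standard" dir="ltr">All three presentations collapse to the same canonical string, which is what allows two scalar nodes to be compared by content regardless of how they were written.</p>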
<p class="Standard" dir="ltr"> </p>
<h3 dir="ltr" id="toc38"><a id="RefHeading13291012985828"></a>Misconceptions</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">As stated above, all Representation Graphs for documents are "rooted, directed, connected" graphs of nodes. To expand on this:</p>
<ul style="margin-top:0;margin-bottom:0;list-style-type:disc;clear:left">
<li>
<p class="Standard" dir="ltr">Each document has one root node, which can be a mapping, a sequence or a scalar itself. This corresponds to the root datum of the original YAML document. If a document's root node is a scalar, then that scalar is the only data in the document.</p>
</li>
<li>
<p class="Standard" dir="ltr">Mapping nodes contain references directed towards both their key nodes and their value nodes. Sequence nodes contain references directed towards their element nodes. However, the reverse relationship does not hold, and the contained nodes may not refer to their containers. Scalar nodes are not collections, and thus do not contain references to other nodes.</p>
</li>
<li>
<p class="Standard" dir="ltr">A YAML document is composed to create one connected graph of nodes; it is impossible to create two or more unconnected graphs from one YAML document. </p>
</li>
</ul>
<p class="Standard" dir="ltr"> </p>
<div style="text-align:center">
<img alt="graphics2" class="frameGraphics" id="graphics2graphic" src="ypathspec-img/ypathspec-img003.png" style="width:624.0px;height:471.11813px" />
</div>
<p class="Standard" dir="ltr"></p>
<p class="Subtitle" dir="ltr"><span class="Captioncharacters">Figure 3.3 - YAML Representation Model. </span></p>
<p class="Subtitle" dir="ltr"><span class="Captioncharacters">Taken from figure 3.3 of the YAML 1.2 specification</span></p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Two or more nodes are equal to each other if they have the same tag and content. However, two or more nodes are identical only if they occupy the same part of memory. Note that identity implies equality, but not vice versa. </p>
<p class="Standard" dir="ltr">One important fact about YAML collections is that they can contain arbitrary information, including themselves. For example, a YAML sequence can consist of itself, a string, another sequence, the same string, and itself again. A YAML mapping may have one key-value pair consisting of a string mapping to the mapping itself, and another key-value pair of the mapping itself mapping to the same string. However, no two keys of a mapping can reference nodes which are equal in value.</p>
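<p class="Standard" dir="ltr">The distinction between equality and identity, and the self-containing sequence described above, can be sketched with ordinary Python objects. This is an analogy only: Python lists and strings stand in for nodes, tags are not modelled, and all variable names are hypothetical. A Python dict cannot use itself as a key, so the self-keyed mapping is not shown.</p>

```python
# Two separately built nodes with the same content: equal, not identical.
first = ["a", "b"]
second = ["a", "b"]
equal_but_distinct = (first == second) and (first is not second)

# The same node referenced twice: identical, and therefore equal.
alias = first
identical = alias is first

# A sequence consisting of itself, a string, another sequence,
# the same string, and itself again.
text = "shared string"
other = []
seq = []
seq.extend([seq, text, other, text, seq])

# Identity checks confirm the shape without walking the cycle.
contains_itself = seq[0] is seq and seq[4] is seq
same_string_twice = seq[1] is seq[3]
```

<p class="Standard" dir="ltr">Because such graphs contain cycles, any code that walks a representation graph needs to track the nodes it has already visited.</p>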
<p class="Standard" dir="ltr"> </p>
<h3 dir="ltr" id="toc39"><a id="Toc300681975"></a><a id="RefHeading13311012985828"></a>Limitations of the Representation Graph model</h3>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">There are presentation details in a YAML document that cannot be accessed by YPath. </p>
<ul class="listlevel1WWNum4">
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">YPath cannot identify or select collections based on whether they were expressed in block or flow form. </p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">YPath cannot identify or select strings or other scalars based on whether they were expressed as single-quoted, double-quoted, or otherwise. </p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">YPath cannot identify nodes based on their anchor value.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">YPath cannot return the value of directives such as TAG or YAML.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">YPath cannot return the value of comments. </p>
</li>
</ul>
<p class="Standard" dir="ltr">This is because such information is discarded when a document is composed into the representation model. </p>
<p class="Standard" dir="ltr"> </p>
<h2 dir="ltr" id="toc40"><a id="RefHeading13331012985828"></a>Evaluating YPath</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">The core operation of YPath is evaluation: taking a YPath expression, composing a stream into one or more representation graphs, and determining which nodes in each graph the expression selects. A YPath expression consists of the following parts:</p>
<p class="Standard" dir="ltr"> </p>
<ul class="listlevel1WWNum4">
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">A document matching part. This indicates which documents in the stream are accessed by the YPath expression.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">A uniqueness expression. This indicates whether duplicate elements with equal value can be returned by the expression, or whether all elements have to have unique values. </p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">The core YPath expression, which is used to match data in each document specified by the document matching part.</p>
</li>
</ul>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Evaluation has three possible outcomes:</p>
<p class="Standard" dir="ltr"> </p>
<ul class="listlevel1WWNum4">
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Error. This occurs if the YPath expression is not well formed or the YAML stream is not well formed.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Warning. This occurs if the YPath expression is well formed but no documents are matched in the stream.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Success. This occurs if the YPath expression is well formed and documents are matched inside the stream. A node set is returned for each document matched; all, any or none of them may be empty.</p>
</li>
</ul>
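<p class="Standard" dir="ltr">The three cases listed above can be sketched as a small classification function. This is a minimal sketch under stated assumptions: the names Outcome and classify and all parameter names are hypothetical, and the well-formedness checks themselves are assumed to have been carried out elsewhere.</p>

```python
from enum import Enum


class Outcome(Enum):
    ERROR = "error"
    WARNING = "warning"
    SUCCESS = "success"


def classify(expression_well_formed, stream_well_formed, matched_documents):
    """Map the conditions listed above onto one of the three outcomes.

    All names are hypothetical; matched_documents is the number of
    documents selected by the document matching part.
    """
    if not (expression_well_formed and stream_well_formed):
        return Outcome.ERROR
    if matched_documents == 0:
        return Outcome.WARNING
    # Still a success even if every returned node set is empty.
    return Outcome.SUCCESS
```

<p class="Standard" dir="ltr">Note that an empty node set is not a warning: the warning case applies only when no document is matched at all.</p>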
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Node sets can take two forms.</p>
<p class="Standard" dir="ltr"> </p>
<ul class="listlevel1WWNum4">
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">A set: a mapping consisting of unique keys with no matching values. The tag for this is !!set.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">A multiset: a sequence of nodes which may contain repeated values. The order of elements may match the order in which they were found in the original document.</p>
</li>
</ul>
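<p class="Standard" dir="ltr">The difference between the two forms can be sketched in Python, assuming for simplicity that the matched nodes are plain strings. The function names as_multiset and as_set are hypothetical. The set is modelled as a dict whose values are all None, mirroring a mapping of unique keys with no matching values.</p>

```python
def as_multiset(matched):
    """Keep every match, in the order it was found."""
    return list(matched)


def as_set(matched):
    """Keep one entry per distinct value.

    Modelled as a mapping whose keys are the matched values and whose
    values are all None, which is the shape of a !!set.
    """
    return {value: None for value in matched}


matches = ["red", "green", "red", "blue", "green"]
multiset = as_multiset(matches)
unique = as_set(matches)
```

<p class="Standard" dir="ltr">Here the multiset keeps all five matches, while the set keeps three keys. A Python dict happens to preserve first-insertion order, but a set is not required to be ordered.</p>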
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Each node set returned - whether set or multiset - is the root node of a new representation graph, and nodes returned by the YPath expression are elements of it. It is then possible to perform the following operations:</p>
<p class="Standard" dir="ltr"> </p>
<ul class="listlevel1WWNum4">
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Presentation: Each representation graph is presented as a YAML document, and a new stream is constructed by concatenating all of these documents together. </p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Construction: All representation graphs are constructed into native data.</p>
</li>
<li>
<p class="ListParagraph" dir="ltr" style="margin-left:24.0px;margin-right:0;text-indent:0">Representation: the reverse operation of construction. </p>
</li>
</ul>
<p class="Standard" dir="ltr"> </p>
<h1 dir="ltr" id="toc41"></h1>
<p class="Standard" dir="ltr"> </p>
<h1 dir="ltr" id="toc42"><a id="RefHeading13351012985828"></a>YPath Expressions</h1>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">This section describes the syntax of YPath. Like the YAML specification, this document uses parameterized BNF productions. Each BNF production is both named and numbered for easy reference. Whenever possible, basic structures are specified before the more complex structures that use them, in a “bottom up” fashion. </p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">The order of alternatives inside a production is significant. Subsequent alternatives are only considered when previous ones fail. In addition, production matching is expected to be greedy. Optional (?), zero-or-more (*) and one-or-more (+) patterns are always expected to match as much of the input as possible. </p>
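<p class="Standard" dir="ltr">Ordered alternatives and greedy repetition behave much like alternation and quantifiers in Python's re module, which can serve as a loose analogy. The patterns below are illustrative only and are not YPath productions.</p>

```python
import re

# Ordered choice: "a" is listed first and succeeds, so "ab" is never tried.
ordered = re.match(r"a|ab", "ab").group()

# Reordering the alternatives changes the result.
reordered = re.match(r"ab|a", "ab").group()

# Greedy repetition: a* consumes as many "a" characters as it can.
greedy = re.match(r"a*", "aaab").group()
```

<p class="Standard" dir="ltr">The first pattern matches only "a", the second matches "ab", and the third matches "aaa". This is why a production should list its longer or more specific alternatives before shorter ones that share a prefix.</p>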
<h2 dir="ltr" id="toc43"><a id="RefHeading13371012985828"></a>YPath character set</h2>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath expressions are strings consisting of Unicode characters, specifically printable Unicode characters, so YPath can reuse the same production as YAML.</p>
<p class="Standard" dir="ltr"> </p>
<p class="PreformattedText" dir="ltr">[1] <a id="cprintable"></a>c-printable <span class="SourceText">::=</span> #x9 | #xA | #xD | [#x20-#x7E] /* 8 bit */<br />| #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD] /* 16 bit */<br />| [#x10000-#x10FFFF] /* 32 bit */ </p>
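<p class="Standard" dir="ltr">Production [1] can be checked mechanically. The Python sketch below tests whether each character of a string falls inside the ranges of c-printable. The names is_c_printable and is_valid_ypath_text are hypothetical, and the check covers the character set only, not the grammar of YPath expressions.</p>

```python
# Ranges copied from production [1]; range() excludes its upper bound.
C_PRINTABLE_RANGES = (
    range(0x20, 0x7F),         # [#x20-#x7E]
    range(0xA0, 0xD800),       # [#xA0-#xD7FF]
    range(0xE000, 0xFFFE),     # [#xE000-#xFFFD]
    range(0x10000, 0x110000),  # [#x10000-#x10FFFF]
)
C_PRINTABLE_SINGLES = frozenset({0x9, 0xA, 0xD, 0x85})


def is_c_printable(char):
    """Return True if a single character matches production [1]."""
    code_point = ord(char)
    if code_point in C_PRINTABLE_SINGLES:
        return True
    return any(code_point in r for r in C_PRINTABLE_RANGES)


def is_valid_ypath_text(expression):
    """Return True if every character of the string is c-printable."""
    return all(is_c_printable(ch) for ch in expression)
```

<p class="Standard" dir="ltr">With this sketch, tab and NEL (#x85) are accepted, while NUL, DEL (#x7F), surrogates, #xFFFE and #xFFFF are rejected.</p>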
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">Note that this specification does not mandate the encoding used to represent YPath expressions. They could be used inside YAML streams (or other files), but may also be shared between applications, each of which has its own format for representing strings. </p>
<p class="Standard" dir="ltr"> </p>
<p class="Standard" dir="ltr">YPath expressions cannot contain the character #xFFFE, the byte-swapped counterpart of the byte order mark #xFEFF, because the order of bytes is determined by the medium used to contain the expression. Other characters excluded from a YPath string are the C0 and C1 control blocks (except for the tab, LF, CR and NEL characters), DEL (#x7F), the surrogate block #xD800-#xDFFF, and the noncharacter #xFFFF.</p>
<h2 dir="ltr" id="toc44"></h2>
<p class="Standard" dir="ltr"> </p>
</body>
</html>