Benito van der Zander edited this page Dec 16, 2017 · 5 revisions

In some situations the XQuery/Xidel syntax can produce very confusing output. These cases are explained here as warnings.

Using / on sequences rather than on sets

When the W3C designed XPath 2 / XQuery, there were two conflicting goals: extend the type system to have many different data types and arbitrary, ordered sequences of values, while still keeping everything compatible with XPath 1, whose type system only supported sets of nodes.

Thus they defined / on nodes to behave as if it returns a set of nodes (although the result is still a sequence, since XPath 2 does not have a set-of-nodes data type).

This means / will remove duplicates and reorder nodes to document order, which is often unexpected. For example, on the input document

<x>
 <a>a</a>
 <b>b</b>
</x>

A query (x/a,x/a) will return a twice, while x/(a,a) or (x,x)/a or (x,x)/(a,a) will only return a once.

(x,x)/(b,a,a) or (x,x)/(b,a,a,a) will only return two elements: one a followed by one b.

It implies that appending /. to a query can be used as duplicate removal operator as it has the set semantics of /, while . maps everything to itself.

On the other hand, (x,x)/(1,1) will return 1, 1, 1, 1: every x is mapped to two 1s. There was no need to stay compatible with XPath 1 for non-node output, since XPath 1 had no type for a collection of non-nodes.

Furthermore x/(a,1) is a type error. a still needs to behave as if it returns a set without duplicates, but 1 wants a sequence, so they cannot be mixed.

The most surprising caveat might be the construction of new nodes. x/(a,b,a,b)/<foo>{.}</foo> will wrap the elements in foo elements. This will return either <foo><a>a</a></foo>, <foo><b>b</b></foo> or <foo><b>b</b></foo>, <foo><a>a</a></foo>. Since the newly constructed <foo> elements have no common ancestor, each one is the root of its own tree. The relative order of nodes in different trees is implementation-dependent, so the output can come out in either order.

Solution: Always use the XPath 3 mapping operator !, unless you actually want the output to be reordered. ! returns all nodes in the order they occurred in the query. I.e. x!(b,a,a) will return one b followed by two a.
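
The difference between the two operators can be sketched outside of XQuery. The following Python model is only an illustration of the semantics described above, not Xidel's implementation; the names Node, child, seq, slash and bang are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    doc_pos: int          # position of the node in document order
    children: tuple = ()


def child(name):
    """Step selecting the child elements called `name`."""
    return lambda ctx: [c for c in ctx.children if c.name == name]


def seq(*steps):
    """Comma operator: concatenate the results of several steps, e.g. (b,a,a)."""
    return lambda ctx: [item for step in steps for item in step(ctx)]


def slash(context, step):
    """Model of E1/E2: node results are deduplicated and put in document order."""
    results = [item for ctx in context for item in step(ctx)]
    if all(isinstance(r, Node) for r in results):
        unique = {r.doc_pos: r for r in results}      # one entry per node
        return [unique[pos] for pos in sorted(unique)]
    if any(isinstance(r, Node) for r in results):
        raise TypeError("XPTY0018: path result mixes nodes and non-nodes")
    return results                                     # atomic values: plain sequence


def bang(context, step):
    """Model of E1!E2: plain concatenation, nothing removed or reordered."""
    return [item for ctx in context for item in step(ctx)]


def names(nodes):
    return [n.name for n in nodes]


# The example document <x><a>a</a><b>b</b></x>
a = Node("a", 1)
b = Node("b", 2)
x = Node("x", 0, (a, b))

print(names(slash([x, x], seq(child("b"), child("a"), child("a")))))  # ['a', 'b']
print(names(bang([x], seq(child("b"), child("a"), child("a")))))      # ['b', 'a', 'a']
print(slash([x, x], lambda ctx: [1, 1]))                              # [1, 1, 1, 1]
```

In this model, slash([x], seq(child("a"), lambda ctx: [1])) raises a TypeError, mirroring the type error of x/(a,1).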

Confusing the syntax of order by and group by

order by is used to sort the output of a FLWOR expression, and group by is used to group the output into sequences that share the same key, e.g. to find a group of all odd numbers and a group of all even numbers.

order by is followed by a sorting key, given as an expression that calculates that key. If you just want to sort the values, the key is the value (variable), e.g.:

$ xidel --xquery 'for $i in 1 to 10 order by $i return join($i)'
1
2
3
4
5
6
7
8
9
10

To reverse the sequence, you can negate the key (or use order by $i descending):

$ xidel --xquery 'for $i in 1 to 10 order by -$i return join($i)'
10
9
8
7
6
5
4
3
2
1

Or use any other expression, like just the last bit:

$ xidel --xquery 'for $i in 1 to 10 order by $i mod 2 return join($i)'
2
4
6
8
10
1
3
5
7
9

(Minor caveat 1: In Xidel sorting is always stable, but the XQuery standard does not require that for a plain order by, so other processors might use an unstable sort. stable order by is required to be stable everywhere.)

In XQuery, group by is followed by a key given as a variable, not as an expression. So the first, no-op example can be changed to group by, but the other two cannot.

$ xidel --xquery 'for $i in 1 to 10 group by $i return join($i)'
1
2
3
4
5
6
7
8
9
10

$ xidel --xquery 'for $i in 1 to 10 group by -$i return join($i)'
Error:
err:XPST0003: "$" expected, but "-" found
in: for $i in 1 to 10 group by - [<- error occurs before here] $i return join($i)

$ xidel --xquery 'for $i in 1 to 10 group by $i mod 2 return join($i)'
**** Processing: data:,<empty/> ****
Error:
err:XPST0003: Expected return 
in: for $i in 1 to 10 group by $i mod [<- error occurs before here]  2 return join($i)

(Minor caveat 2: In Xidel group by currently also sorts the output according to the key, which is an implementation detail. Other XQuery processors might return the groups of the first example in a different order, and a future Xidel might as well.)

If you want to group by an expression, you need to declare a new variable for that expression. One could use let, but XQuery allows a shortcut by putting the assignment of the expression to a temporary variable directly after the group by:

$ xidel --xquery 'for $i in 1 to 10 group by $temp := -$i return join($i)'
10
9
8
7
6
5
4
3
2
1

$ xidel --xquery 'for $i in 1 to 10 group by $temp := $i mod 2 return join($i)'
2 4 6 8 10
1 3 5 7 9

That last example is the first in which the grouping actually does anything. In all other examples $i just iterates over a sequence as in any normal for-loop; in that example $i is changed to contain a sequence of 5 values, all odd or all even, and the expression after return is only called twice.
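
The different effect of the two clauses on the number of results can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not how Xidel implements it; order_by and group_by are hypothetical helper names, and the model returns groups in first-seen order, whereas Xidel currently sorts them by key.

```python
def order_by(items, key):
    """Model of `order by`: same items, rearranged by a key expression."""
    return sorted(items, key=key)          # Python's sort is stable


def group_by(items, key):
    """Model of `group by`: one result per distinct key, each holding a sequence."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.values())           # first-seen key order (an assumption)


numbers = range(1, 11)

print(order_by(numbers, lambda i: i % 2))   # [2, 4, 6, 8, 10, 1, 3, 5, 7, 9]
print(group_by(numbers, lambda i: i % 2))   # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
print(len(group_by(numbers, lambda i: -i))) # 10
```

Sorting never changes how often the return expression runs, while grouping by $i mod 2 reduces ten iterations to two.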

The big caveat occurs now, when you use a temporary variable with order by and get:

$ xidel --xquery 'for $i in 1 to 10 order by $temp := -$i return join($i)'
** Current variable state: **
temp := -1
temp := -2
temp := -3
temp := -4
temp := -5
temp := -6
temp := -7
temp := -8
temp := -9
temp := -10

$ xidel --xquery 'for $i in 1 to 10 order by $temp := $i mod 2 return join($i)'
** Current variable state: **
temp := 1
temp := 0
temp := 1
temp := 0
temp := 1
temp := 0
temp := 1
temp := 0
temp := 1
temp := 0

An output that does not appear to make any sense.

However, it is very logical. $temp := $i mod 2 is actually a Xidel extension that assigns $i mod 2 to a global variable $temp. As a global variable, it participates neither in the FLWOR expression nor in any sorting, so the values are stored in the order in which the temporary variable is assigned. Furthermore, if there are any global variables, Xidel only prints the values of the global variables and ignores the output of the XQuery expression.

The solution is to not use temporary variables with order by: write the key expression directly after order by, or bind it with a let clause first.

Dynamic indices, e.g. getting a random element of a sequence

Xidel has an extension function random($x) that returns a random integer between 0 and $x - 1, so $sequence[random(count($sequence)) + 1] seems to be a good way to choose a random element of $sequence. However, this does not work. Rather than returning one random element, it returns a random subsequence: you can get an empty sequence or even the complete sequence.

The reason is that in XPath [] is not a get-array-element operator, but a sequence filter operator that is applied to every element in the sequence. This is obvious if the expression returns a boolean: for example, $sequence[. < 100] returns every element less than 100. But it is also true for an integer: $sequence[something-returning-integer] is equivalent to $sequence[position() = something-returning-integer]. So the random example is $sequence[position() = random(count($sequence)) + 1], and since random() is not deterministic, every position is compared against a different number and a random subsequence is picked.

The correct way to pick a random element is to use a variable:

 let $i := random(count($sequence)) + 1 return $sequence[$i]

(The equivalence is a semantic one. Xidel is smart enough to check whether the expression is constant, like a variable, and then evaluates it only once. However, non-deterministic functions are not constant.)
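
The filter semantics can be reproduced in a few lines of Python. This is a minimal sketch of the behaviour described above and not Xidel code; filter_by_number and pick_one are hypothetical names, and Python's random.Random stands in for Xidel's random().

```python
import random


def filter_by_number(sequence, number_expr):
    """Model of $sequence[expr] for a numeric expr: expr is evaluated per item."""
    return [item
            for position, item in enumerate(sequence, start=1)
            if position == number_expr()]


def pick_one(sequence, rng):
    """Model of `let $i := random(count($sequence)) + 1 return $sequence[$i]`."""
    i = rng.randrange(len(sequence)) + 1     # evaluated exactly once
    return filter_by_number(sequence, lambda: i)


rng = random.Random(0)                        # seeded so runs are repeatable
sequence = ["a", "b", "c", "d", "e"]

# Broken: a fresh random number is drawn for every position.
broken = [filter_by_number(sequence, lambda: rng.randrange(len(sequence)) + 1)
          for _ in range(1000)]
# Fixed: the index is drawn once and then compared against every position.
fixed = [pick_one(sequence, rng) for _ in range(1000)]

print(sorted({len(r) for r in broken}))       # several different lengths
print(sorted({len(r) for r in fixed}))        # [1]
```

The broken variant returns results of varying length, including empty ones, while the variable-based variant always returns exactly one element.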