You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In #3205, @itchyny changed the behaviour of foreach/reduce when UPDATE returns no outputs.
In this issue, I would like to discuss a more far-reaching modification of the behaviour of reduce/foreach based on the current description in the jq manual, which I call @itchyny's desugarings.
The jq manual describes reduce/foreach using desugarings given by @itchyny. In a nutshell, these desugarings suggest that reduce xs as $x (INIT; UPDATE) yields the following, when xs yields x1, ..., xn:
INIT
| x1 as $x | UPDATE
...
| xn as $x | UPDATE
Similarly for foreach xs as $x (INIT; UPDATE; EXTRACT):
INIT |
x1 as $x | UPDATE | EXTRACT,
...
xn as $x | UPDATE | EXTRACT,
empty
However, these desugarings correspond to the current jq behaviour only if UPDATE yields exactly one value.
The manual does not make this explicit, using rather vague explanations such as "So the effect is similar to running something like this". I find this kind of formulation quite unfortunate, leaving readers without understanding what is really going on.
Let me digress a bit to consider the following question: What would a user find "intuitive" when using reduce with UPDATE returning zero or more than one outputs? For this, I find it interesting to consider the question: What has a user learned so far about jq?
In particular, over time, users infer a number of principles from jq's behaviour:
jq filters can yield any number of outputs, between zero and infinite.
jq tries to consider all outputs of a filter, if possible.
I find the second point particularly important.
For example, a user learns that (1, 2) + (3, 4) evaluates to all combinations of sums.
Similarly, a user learns that running a function def f($x; $y): ...; f(1, 2; 3, 4) is not the same as running f(1; 3) or f(2; 4) --- again, it runs all combinations.
Also, a user learns that .[] | . + 1 takes all values returned by .[] and add 1 to every one of these values.
This behaviour is everywhere in jq, and I believe that most reasonably advanced jq users recognise it as a core principle of the language.1
Now let us look at the current implementation of reduce:
Even after @itchyny's change, reduce actually throws away all results of UPDATE except for the last, although it takes the effort to actually compute the outputs.
I would argue that this goes pretty much against user expectations in two ways --- first, reduce does not use all outputs
(violating the core principle that I postulated above), and in addition, it still actually computes all outputs.
You can see this by an example:
foreach is a bit better in that respect, because it pipes all outputs of UPDATE to EXTRACT, however, it also does not consider these outputs in future iterations of UPDATE.
In contrast, @itchyny's desugarings of reduce and foreach have a number of advantages:
They are the most compact description of what happens when UPDATE always yields a single output, which is the case most users care about anyway.
They are formulated using the jq language itself, whereas the behaviour currently implemented in jq needs to be described by pseudo-code (which is not currently done anywhere else in the jq manual).
@itchyny's desugarings consider all outputs of UPDATE, which I believe to align more closely with user expectations (as explained above).
When UPDATE returns infinitely many outputs, then executing @itchyny's desugarings yields outputs, whereas in the current jq behaviour, no outputs are yielded.
@itchyny's desugarings only use operators that can be used together with updates (| and ,), meaning that users can infer naturally how
updates with reduce/foreach on the left-hand side are executed. (Update with reduce/foreach on the LHS currently work only to a limited extent.)
Thinking about the disadvantages of @itchyny's desugarings, I can honestly think about two main issues:
Compatibility:
Existing jq scripts might depend on the current actual jq behaviour of handling UPDATE. However, even @itchyny's recent change (fix: reduce/foreach state variable should not be reset each iteration #3205) breaks compatibility to some degree, so perhaps this would be a good opportunity to go all the way here. Especially because so far, the current jq behaviour is not documented.
Performance:
When UPDATE may return more than one value, executing @itchyny's desugaring makes it necessary to store on the stack
a closure to retrieve the remaining values later. In particular, this can become a problem when we run foreach with
an infinite number of input elements $x, because each $x then increases the stack size, which effectively limits the number of input elements that we can process. However, I have not met this issue so far in practice.
TL;DR: I would like the jq maintainers to reconsider how reduce/foreach are to be executed when UPDATE yields not exactly one output. I hope to have made a convincing case for @itchyny's desugarings.
Having implemented them myself in jaq, I can speak from experience when saying that they are not that hard to implement and no users complained about them yet. :)
Footnotes
There are exceptions to this principle, but these usually exist for good reasons; for example, {a: 1} | .a |= (2, 3) yields only {"a":2}, because an object cannot have more than one value for a single field. ↩
The text was updated successfully, but these errors were encountered:
... reconsider how reduce/foreach are to be executed when UPDATE yields not exactly one output.
And so we see again the stability issues of three-body systems! The three here being the C, Go and Rust implementations.
And the "stability" issue here being compounded by the need for "backwards compatibility".
One way to navigate around these difficulties would be to adopt strategies and tactics for the short term, the medium term, and the long term. In the short-term, we could say that certain behaviors are currently "not specified in detail" (i.e. undefined but within certain constraints); in the medium term, we could work on getting agreement on detailed specifications for a "converged jq"; and in the long-term, the three implementations would converge according to each implementation's assurance of backward compatibility.
In #3205, @itchyny changed the behaviour of
foreach
/reduce
whenUPDATE
returns no outputs.In this issue, I would like to discuss a more far-reaching modification of the behaviour of
reduce
/foreach
based on the current description in the jq manual, which I call @itchyny's desugarings.The jq manual describes
reduce
/foreach
using desugarings given by @itchyny. In a nutshell, these desugarings suggest thatreduce xs as $x (INIT; UPDATE)
yields the following, whenxs
yieldsx1
, ...,xn
:Similarly for
foreach xs as $x (INIT; UPDATE; EXTRACT)
:However, these desugarings correspond to the current jq behaviour only if
UPDATE
yields exactly one value.The manual does not make this explicit, using rather vague explanations such as "So the effect is similar to running something like this". I find this kind of formulation quite unfortunate, leaving readers without understanding what is really going on.
Let me digress a bit to consider the following question: What would a user find "intuitive" when using
reduce
withUPDATE
returning zero or more than one outputs? For this, I find it interesting to consider the question: What has a user learned so far about jq?In particular, over time, users infer a number of principles from jq's behaviour:
I find the second point particularly important.
For example, a user learns that
(1, 2) + (3, 4)
evaluates to all combinations of sums.Similarly, a user learns that running a function
def f($x; $y): ...; f(1, 2; 3, 4)
is not the same as runningf(1; 3)
orf(2; 4)
--- again, it runs all combinations.Also, a user learns that
.[] | . + 1
takes all values returned by.[]
and add 1 to every one of these values.This behaviour is everywhere in jq, and I believe that most reasonably advanced jq users recognise it as a core principle of the language.1
Now let us look at the current implementation of
reduce
:Even after @itchyny's change,
reduce
actually throws away all results ofUPDATE
except for the last, although it takes the effort to actually compute the outputs.I would argue that this goes pretty much against user expectations in two ways --- first,
reduce
does not use all outputs(violating the core principle that I postulated above), and in addition, it still actually computes all outputs.
You can see this by an example:
foreach
is a bit better in that respect, because it pipes all outputs ofUPDATE
toEXTRACT
, however, it also does not consider these outputs in future iterations ofUPDATE
.In contrast, @itchyny's desugarings of
reduce
andforeach
have a number of advantages:UPDATE
always yields a single output, which is the case most users care about anyway.UPDATE
, which I believe to align more closely with user expectations (as explained above).UPDATE
returns infinitely many outputs, then executing @itchyny's desugarings yields outputs, whereas in the current jq behaviour, no outputs are yielded.|
and,
), meaning that users can infer naturally howupdates with
reduce
/foreach
on the left-hand side are executed. (Update withreduce
/foreach
on the LHS currently work only to a limited extent.)Thinking about the disadvantages of @itchyny's desugarings, I can honestly think about two main issues:
Existing jq scripts might depend on the current actual jq behaviour of handling
UPDATE
. However, even @itchyny's recent change (fix: reduce/foreach state variable should not be reset each iteration #3205) breaks compatibility to some degree, so perhaps this would be a good opportunity to go all the way here. Especially because so far, the current jq behaviour is not documented.When
UPDATE
may return more than one value, executing @itchyny's desugaring makes it necessary to store on the stacka closure to retrieve the remaining values later. In particular, this can become a problem when we run
foreach
withan infinite number of input elements
$x
, because each$x
then increases the stack size, which effectively limits the number of input elements that we can process. However, I have not met this issue so far in practice.TL;DR: I would like the jq maintainers to reconsider how
reduce
/foreach
are to be executed whenUPDATE
yields not exactly one output. I hope to have made a convincing case for @itchyny's desugarings.Having implemented them myself in jaq, I can speak from experience when saying that they are not that hard to implement and no users complained about them yet. :)
Footnotes
There are exceptions to this principle, but these usually exist for good reasons; for example,
{a: 1} | .a |= (2, 3)
yields only{"a":2}
, because an object cannot have more than one value for a single field. ↩The text was updated successfully, but these errors were encountered: