Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce/foreach behaviour when UPDATE yields not exactly one output #3211

Open
01mf02 opened this issue Nov 29, 2024 · 1 comment
Open

reduce/foreach behaviour when UPDATE yields not exactly one output #3211

01mf02 opened this issue Nov 29, 2024 · 1 comment

Comments

@01mf02
Copy link

01mf02 commented Nov 29, 2024

In #3205, @itchyny changed the behaviour of foreach/reduce when UPDATE returns no outputs.
In this issue, I would like to discuss a more far-reaching modification of the behaviour of reduce/foreach based on the current description in the jq manual, which I call @itchyny's desugarings.

The jq manual describes reduce/foreach using desugarings given by @itchyny. In a nutshell, these desugarings suggest that reduce xs as $x (INIT; UPDATE) yields the following, when xs yields x1, ..., xn:

INIT
| x1 as $x | UPDATE
...
| xn as $x | UPDATE

Similarly for foreach xs as $x (INIT; UPDATE; EXTRACT):

INIT |
x1 as $x | UPDATE | EXTRACT,
...
xn as $x | UPDATE | EXTRACT,
empty

However, these desugarings correspond to the current jq behaviour only if UPDATE yields exactly one value.
The manual does not make this explicit, using rather vague explanations such as "So the effect is similar to running something like this". I find this kind of formulation quite unfortunate, leaving readers without understanding what is really going on.

Let me digress a bit to consider the following question: What would a user find "intuitive" when using reduce with
UPDATE returning zero or more than one outputs? For this, I find it interesting to consider the question: What has a user learned so far about jq?

In particular, over time, users infer a number of principles from jq's behaviour:

  1. jq filters can yield any number of outputs, between zero and infinite.
  2. jq tries to consider all outputs of a filter, if possible.

I find the second point particularly important.
For example, a user learns that (1, 2) + (3, 4) evaluates to all combinations of sums.
Similarly, a user learns that running a function def f($x; $y): ...; f(1, 2; 3, 4) is not the same as running f(1; 3) or f(2; 4) --- again, it runs all combinations.
Also, a user learns that .[] | . + 1 takes all values returned by .[] and add 1 to every one of these values.
This behaviour is everywhere in jq, and I believe that most reasonably advanced jq users recognise it as a core principle of the language.1

Now let us look at the current implementation of reduce:
Even after @itchyny's change, reduce actually throws away all results of UPDATE except for the last, although it takes the effort to actually compute the outputs.
I would argue that this goes pretty much against user expectations in two ways --- first, reduce does not use all outputs
(violating the core principle that I postulated above), and in addition, it still actually computes all outputs.
You can see this by an example:

$ ./jq -n 'reduce (1, 2) as $x (0; 2, 3 | debug)'
["DEBUG:",2]
["DEBUG:",3]
["DEBUG:",2]
["DEBUG:",3]
3

foreach is a bit better in that respect, because it pipes all outputs of UPDATE to EXTRACT, however, it also does not consider these outputs in future iterations of UPDATE.

In contrast, @itchyny's desugarings of reduce and foreach have a number of advantages:

  • They are the most compact description of what happens when UPDATE always yields a single output, which is the case most users care about anyway.
  • They are formulated using the jq language itself, whereas the behaviour currently implemented in jq needs to be described by pseudo-code (which is not currently done anywhere else in the jq manual).
  • @itchyny's desugarings consider all outputs of UPDATE, which I believe to align more closely with user expectations (as explained above).
  • When UPDATE returns infinitely many outputs, then executing @itchyny's desugarings yields outputs, whereas in the current jq behaviour, no outputs are yielded.
  • @itchyny's desugarings only use operators that can be used together with updates (| and ,), meaning that users can infer naturally how
    updates with reduce/foreach on the left-hand side are executed. (Update with reduce/foreach on the LHS currently work only to a limited extent.)

Thinking about the disadvantages of @itchyny's desugarings, I can honestly think about two main issues:

  • Compatibility:
    Existing jq scripts might depend on the current actual jq behaviour of handling UPDATE. However, even @itchyny's recent change (fix: reduce/foreach state variable should not be reset each iteration #3205) breaks compatibility to some degree, so perhaps this would be a good opportunity to go all the way here. Especially because so far, the current jq behaviour is not documented.
  • Performance:
    When UPDATE may return more than one value, executing @itchyny's desugaring makes it necessary to store on the stack
    a closure to retrieve the remaining values later. In particular, this can become a problem when we run foreach with
    an infinite number of input elements $x, because each $x then increases the stack size, which effectively limits the number of input elements that we can process. However, I have not met this issue so far in practice.

TL;DR: I would like the jq maintainers to reconsider how reduce/foreach are to be executed when UPDATE yields not exactly one output. I hope to have made a convincing case for @itchyny's desugarings.
Having implemented them myself in jaq, I can speak from experience when saying that they are not that hard to implement and no users complained about them yet. :)

Footnotes

  1. There are exceptions to this principle, but these usually exist for good reasons; for example, {a: 1} | .a |= (2, 3) yields only {"a":2}, because an object cannot have more than one value for a single field.

@pkoppstein
Copy link
Contributor

@01mf02 wrote:

... reconsider how reduce/foreach are to be executed when UPDATE yields not exactly one output.

And so we see again the stability issues of three-body systems! The three here being the C, Go and Rust implementations.
And the "stability" issue here being compounded by the need for "backwards compatibility".

One way to navigate around these difficulties would be to adopt strategies and tactics for the short term, the medium term, and the long term. In the short-term, we could say that certain behaviors are currently "not specified in detail" (i.e. undefined but within certain constraints); in the medium term, we could work on getting agreement on detailed specifications for a "converged jq"; and in the long-term, the three implementations would converge according to each implementation's assurance of backward compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants