Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Coercion Rules #186

Open
JohnBrinkman opened this issue Nov 15, 2024 · 4 comments
Open

Update Coercion Rules #186

JohnBrinkman opened this issue Nov 15, 2024 · 4 comments

Comments

@JohnBrinkman
Copy link
Collaborator

With the proposed changes for issue #185 (Allow functions to process arrays) we will have many function calls that currently work, but will fail once many functions accept more parameter types. Those failures happen because of the coercion rules.

The changes proposed here modify our coercion rules so that many of those function calls will continue to work.
These modified rules are breaking changes, that would be combined with the changes of #185 to define json-formula 2.0.0

Proposal 1

Parameters to functions shall be coerced when there is a single viable coercion available. For example, if a null value is provided to a function that accepts a number or string, then coercion shall not happen, since a null value can be coerced to both types. Conversely if a string is provided to a function that accepts a number or array of numbers, then the string shall be coerced to a number, since there is no supported coercion to convert it to an array of numbers.

Proposal 2

Today, a null value can be coerced to all other data types: number (0), string (""), boolean (false), array ([]) and object ({}).

The problem is that in the context of function calls we never coerce nulls to a function parameter with multiple types, since nulls can be converted to anything.

For example, calling upper(null) will fail because upper() accepts either a string or array of strings, and null can be coerced to either.

We propose that we no longer coerce null values to empty arrays and empty objects.

It makes sense to convert nulls to a number, string or boolean. It makes less sense to convert to an array or object.

With this change many functions will not throw an error when passed a null. e.g. abs(null) will return zero rather than throw a TypeError.

@Eswcvlad
Copy link

It makes sense to convert nulls to a number, string or boolean. It makes less sense to convert to an array or object.

Not sure I agree. In all these cases type coercion is, pretty much, the same and returns the "default" value of the type. Whether it make sense or not is more context dependent.

For function arguments like T | T[] it make less sense to coerce null to an empty array instead of a scalar. But if a function accepts only an array, it seems pretty reasonable.

I would say it would make sense if either all of them threw an error (akin to a strict language with NullPointerException) or none of them did (akin to a more lenient language with default values). The main concern with the first approach, that you would need to wrap input/form data with notNull for robustness, which might not be ideal (unless there is an explicit default value for fields).

Another alternative might be to leave null type coercion rules as-is and add explicit null type handling for functions, so that it would just propagate null forward. For example, have upper(`null`) return null. In practice this will result in delaying type coercion further down the expression tree, where there might be less ambiguity.


Also it's not only null, which is problematic, but scalar -> array implicit conversions as well. It seems pretty weird, that round("7.6") would throw a TypeError, but round(`["7.6"]`) on the other hand doesn't...

It also means, that round(["7.6", "8.4"]) works, but ["7.6", "8.4"].map(@, &round(@)) doesn't, which seems odd, as the idea is for them to be synonymous.

Not sure, what the best solution for this is, besides removing those conversions as well...

@JohnBrinkman
Copy link
Collaborator Author

Another option for the null issues is that we adopt Proposal 2, but we change function signatures to explicitly allow null.
e.g. upper() could allow string | string[] | null. Not necessarily recommending the idea yet -- just putting another option on the table. Need to give this more thought.

On the other examples, as a result of "Proposal 1" they would not throw errors.
e.g. round("7.6") would work because round() accepts number | number[]. And since "7.6" cannot be coerced to number[], we can unambiguously coerce it to 7.6.

@Eswcvlad
Copy link

Another option for the null issues is that we adopt Proposal 2, but we change function signatures to explicitly allow null. e.g. upper() could allow string | string[] | null. Not necessarily recommending the idea yet -- just putting another option on the table. Need to give this more thought.

Yeah, I was also thinking about that. In that case you could "force" it to coerce to a scalar. Though, depending on the context, if null appeared instead of an array, it might be a bit unexpected.

Also, after some more thinking, my alternative with null forwarding might also not be ideal for consistency. As, for example, upper(`null`) would be null, but something like upper(["a", null]) would be ["A", ""], as array will be casted to string[]. So there are potential pitfalls there as well.

On the other examples, as a result of "Proposal 1" they would not throw errors. e.g. round("7.6") would work because round() accepts number | number[]. And since "7.6" cannot be coerced to number[], we can unambiguously coerce it to 7.6.

Oh... I was thinking this is similar to the integer situation, where there is no integer type per-se, but just number, and the function ignores the fractional part. But in this case number[] is, actually, just array and the function interprets elements as number. So a string would be coerced to array as per the type coercion table and then the function will handle the elements.

Don't remember, whether there is explicit information on coercing to T[] in the spec as opposed to just array. Might be worth adding, if it is not there.

@JohnBrinkman
Copy link
Collaborator Author

Don't remember, whether there is explicit information on coercing to T[] in the spec as opposed to just array. Might be worth adding, if it is not there.

Agreed. e.g. The spec indicates that we will coerce a number to a string, but doesn't explicitly say we'd convert a number array to a string array.

There is yet another possibility for handling nulls inside arrays. For aggregating numeric functions (max, min, sum, stddev etc) we should consider changing the behavior to ignore null entries in the array.
e.g. avg([2, 4, 6, null, null, null) should be 4. Whereas if we coerce null to zero it will be 2.
This would be consistent with spreadsheet behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants