-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New approach to function call matching #307
Conversation
which no more needs to return a value
@@ -20,12 +20,12 @@ export default function supportsCase(format: FormatFn) { | |||
|
|||
it('formats CASE ... WHEN with an expression', () => { | |||
const result = format( | |||
"CASE toString(getNumber()) WHEN 'one' THEN 1 WHEN 'two' THEN 2 WHEN 'three' THEN 3 ELSE 4 END;" | |||
"CASE trim(sqrt(2)) WHEN 'one' THEN 1 WHEN 'two' THEN 2 WHEN 'three' THEN 3 ELSE 4 END;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in several other tests I've switched over to using the some widely adopted function names, so the tests would work across all dialects.
SELECT IF(dq.id_discounter_shopping = 2, dq.value, dq.value / 100), | ||
IF (dq.id_discounter_shopping = 2, 'amount', 'percentage') FROM foo); | ||
SELECT COALESCE(dq.id_discounter_shopping = 2, dq.value, dq.value / 100), | ||
COALESCE (dq.id_discounter_shopping = 2, 'amount', 'percentage') FROM foo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched from IF
to COALESCE
which is supported in all dialects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting that IF isn't as universal as COALESCE
@@ -164,7 +165,7 @@ describe('PlSqlFormatter', () => { | |||
`); | |||
expect(result).toBe(dedent` | |||
WITH | |||
t1(id, parent_id) AS ( | |||
t1 (id, parent_id) AS ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure which way is better here. It's not a function call.
However it should be possible to improve this with special parsing of WITH clause in the future.
@@ -114,10 +114,11 @@ describe('BigQueryFormatter', () => { | |||
`); | |||
}); | |||
|
|||
// TODO: Possibly incorrect formatting of STRUCT<>() and ARRAY<>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently an unfortunate change in formatting.
This whole typed arrays/structs parsing needs a complete rewrite though. The current solution is a big hack. It should really be parsed inside Parser not hacked together inside postProcess() of tokenizer.
'JSON_GROUP_OBJECT', | ||
'JSON_EACH', | ||
'JSON_TREE', | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found lots of SQLite functions that we were missing.
This change also paves the way for implementing separate uppercasing option for function names: #237 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this overall makes sense but agree that the keyword lists need to be reviewed 👍
This one is part of standard, unlike LAG()
Previously the approach to detecting function calls was as follows:
whitespaceBefore
fieldRESERVED_KEYWORD
orIDENTIFIER
is followed by(
and there's no whitespace between them.This has been a great hack to format function calls without really having a knowledge of which words are function names. But this approach has the fundamental downside of it relying on the whitespace of original code to make formatting decisions, which leads to surprising behavior (see #140 and #243) where re-formatting an already formatted SQL produces a different result.
The new approach introduced in this PR goes as follows:
type: RESERVED_FUNCTION_NAME
tokens.RESERVED_FUNCTION_NAME
is followed by(
.This is mostly how one would expect this to work. However there are various caveats:
DECIMAL(5, 2)
. For now this is solved by also including data types (that can take parameters) also to the functions list. In the future should completely separate data types from functions.CAST
is used in various forms, but as its main use is inCAST(expr AS type)
, it's for now listed alongside functions.ANY
andSOME
can be used as aggregate functions, in which case they should be formatted asHAVING ANY(condition)
. But a way more common usage is as an operator:WHERE x = ANY (1, 2, 3)
. So I've opted to remove these from list of functions.NULLIF
andCOALESCE
have sometimes been listed as keywords, at other times as function names. Changed them to be function names in every dialect. That really hints of a general problem that these keyword/function-name lists need a proper review.The major downside of this new approach is, that:
foo(1, 2)
are now formatted asfoo (1, 2)
.But there's also a major benefit from parsing side:
So, in the end it's about tradeoffs. IMHO the sacrifices are worth the gains.