-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce TypeSignature::Comparable and update NullIf
signature
#13356
Conversation
Signed-off-by: Jay Zhan <[email protected]>
Signed-off-by: Jay Zhan <[email protected]>
@Omega359 ptal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concerns about the signature change causing queries that used to work to now fail. I am not certain that the signature change for nullif is for the better for that reason.
With respect to the TypeSignature::Boolean I left some comments.
let valid_types = valid_types.swap_remove(0); | ||
if let Some(t) = maybe_data_types_without_coercion(&valid_types, current_types) { | ||
return Ok(t); | ||
} | ||
} else { | ||
// TODO: Deprecate this branch after all signatures are well-supported (aka coercion are happend already) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: Deprecate this branch after all signatures are well-supported (aka coercion are happend already) | |
// TODO: Deprecate this branch after all signatures are well-supported (aka coercion has happened already) |
TypeSignature::Boolean(number) => { | ||
function_length_check(current_types.len(), *number)?; | ||
|
||
// Find common boolean type amongs given types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Find common boolean type amongs given types | |
// Find common boolean type amongst the given types |
function_length_check(current_types.len(), *number)?; | ||
|
||
// Find common boolean type amongs given types | ||
let mut valid_type = current_types.first().unwrap().to_owned(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm mistaken there is only one possible valid type here - Boolean doesn't have multiple types, does it? If so, I don't see the need for this variable nor the code below the for loop. valid_type must be DataType::Boolean, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to ensure the given types are all boolean, if it isn't we should return error here. If we got null, we return boolean
2, | ||
SUPPORTED_NULLIF_TYPES.to_vec(), | ||
signature: Signature::one_of( | ||
// Hack: String is at the beginning so the return type is String if both args are Nulls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I'd call that a hack tbh
---- | ||
1 | ||
|
||
query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This used to work, I just ran this locally against v43. I can't see a reason why this should no longer be supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select nullif(2, 'a');
This query failed in Postgres, I think we should return error for this query
# TODO: support numeric string | ||
# This query success in Postgres and DuckDB | ||
query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64 | ||
select nullif(2, '1'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also used to work.
query T
select nullif(2, '1');
----
2
Interesting that the type is text vs number but still, it did work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a hard issue to fix, since we can't tell the difference between Unknown type and String type currently
This query pass in main
too but it should not.
query T
select nullif('1'::varchar, 2);
----
1
The change did have regression on nullif(2, '1')
, but also fix nullif('1'::varchar, 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar issue in #13240 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong opinions against this PR because it fixes nullif('1'::varchar, 2)
and refines the TypeSignature
. We could file a ticket for it and address it in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked in #13285
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duckdb can run this query
D select nullif('2'::varchar, 2);
┌─────────────────────────────────┐
│ nullif(CAST('2' AS VARCHAR), 2) │
│ varchar │
├─────────────────────────────────┤
│ │
└─────────────────────────────────┘
😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me what is important is not strictly what other systems support - those may/could be a guide to what datafusion will support - but whether the provided arguments to a signature can be losslessly converted to what the signature accepts and whether it logically makes sense to do so.
I personally would rather be lenient for what is accepted and do casting/coercion as required than to be strict and push the onus onto the user to do that. That's just me though, I don't know if that is the general consensus of the community. Perhaps we should file a discussion ticket with the options and decide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just double checked on this branch and it seems good to me (supports things reasonably)
Running `target/debug/datafusion-cli`
DataFusion CLI v43.0.0
> select nullif(2, '1');
+----------------------------+
| nullif(Int64(2),Utf8("1")) |
+----------------------------+
| 2 |
+----------------------------+
1 row(s) fetched.
Elapsed 0.027 seconds.
> select nullif('1'::varchar, 2);
+----------------------------+
| nullif(Utf8("1"),Int64(2)) |
+----------------------------+
| 1 |
+----------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.
Signed-off-by: jayzhan211 <[email protected]>
TypeSignature::String(arg_count) => get_data_types(&NativeType::String) | ||
.into_iter() | ||
.map(|dt| vec![dt; *arg_count]) | ||
.collect::<Vec<_>>(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
# TODO: support numeric string | ||
# This query success in Postgres and DuckDB | ||
query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64 | ||
select nullif(2, '1'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong opinions against this PR because it fixes nullif('1'::varchar, 2)
and refines the TypeSignature
. We could file a ticket for it and address it in a follow-up.
Signed-off-by: jayzhan211 <[email protected]>
NullIf
signatureNullIf
signature
I revert to the previous function's behaviour since it makes more sense to me |
This is ready for review too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- I think this looks like a nice improvement.
I have some suggestions on improving the documentation to be more explicit about what TypeSignature::Comparable
means but I also think we could implement that as a follow on
function_length_check(current_types.len(), *num)?; | ||
let mut target_type = current_types[0].to_owned(); | ||
for data_type in current_types.iter().skip(1) { | ||
if let Some(dt) = comparison_coercion_numeric(&target_type, data_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like TypeSignature::Comparable
has different rules that, for example, =
or <
(it prefers numeric)
I think this is fine, but should probably be documented better / more explicitly as it might be expected
Also the coercion / defaulting to Utf8
might also be a good thing to document
# TODO: support numeric string | ||
# This query success in Postgres and DuckDB | ||
query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64 | ||
select nullif(2, '1'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just double checked on this branch and it seems good to me (supports things reasonably)
Running `target/debug/datafusion-cli`
DataFusion CLI v43.0.0
> select nullif(2, '1');
+----------------------------+
| nullif(Int64(2),Utf8("1")) |
+----------------------------+
| 2 |
+----------------------------+
1 row(s) fetched.
Elapsed 0.027 seconds.
> select nullif('1'::varchar, 2);
+----------------------------+
| nullif(Utf8("1"),Int64(2)) |
+----------------------------+
| 1 |
+----------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- I think this looks like a nice improvement.
I have some suggestions on improving the documentation to be more explicit about what TypeSignature::Comparable
means but I also think we could implement that as a follow on
@@ -113,6 +113,8 @@ pub enum TypeSignature { | |||
/// arguments like `vec![DataType::Int32]` or `vec![DataType::Float32]` | |||
/// since i32 and f32 can be casted to f64 | |||
Coercible(Vec<LogicalTypeRef>), | |||
/// The number of arguments that are comparable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to define what comparable
means in this context?
Signed-off-by: Jay Zhan <[email protected]>
Signed-off-by: Jay Zhan <[email protected]>
Which issue does this PR close?
Part of #13301
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?