Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce TypeSignature::Comparable and update NullIf signature #13356

Merged
merged 8 commits into from
Nov 22, 2024

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Nov 11, 2024

Which issue does this PR close?

Part of #13301

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) functions labels Nov 11, 2024
@jayzhan211 jayzhan211 requested a review from Weijun-H November 13, 2024 00:51
@jayzhan211
Copy link
Contributor Author

@Omega359 ptal

Copy link
Contributor

@Omega359 Omega359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about the signature change causing queries that used to work to now fail. I am not certain that the signature change for nullif is for the better for that reason.

With respect to the TypeSignature::Boolean I left some comments.

let valid_types = valid_types.swap_remove(0);
if let Some(t) = maybe_data_types_without_coercion(&valid_types, current_types) {
return Ok(t);
}
} else {
// TODO: Deprecate this branch after all signatures are well-supported (aka coercion are happend already)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: Deprecate this branch after all signatures are well-supported (aka coercion are happend already)
// TODO: Deprecate this branch after all signatures are well-supported (aka coercion has happened already)

TypeSignature::Boolean(number) => {
function_length_check(current_types.len(), *number)?;

// Find common boolean type amongs given types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Find common boolean type amongs given types
// Find common boolean type amongst the given types

function_length_check(current_types.len(), *number)?;

// Find common boolean type amongs given types
let mut valid_type = current_types.first().unwrap().to_owned();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm mistaken there is only one possible valid type here - Boolean doesn't have multiple types, does it? If so, I don't see the need for this variable nor the code below the for loop. valid_type must be DataType::Boolean, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to ensure the given types are all boolean, if it isn't we should return error here. If we got null, we return boolean

2,
SUPPORTED_NULLIF_TYPES.to_vec(),
signature: Signature::one_of(
// Hack: String is at the beginning so the return type is String if both args are Nulls
Copy link
Contributor

@Omega359 Omega359 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'd call that a hack tbh

----
1

query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to work, I just ran this locally against v43. I can't see a reason why this should no longer be supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select nullif(2, 'a');

This query failed in Postgres, I think we should return error for this query

# TODO: support numeric string
# This query success in Postgres and DuckDB
query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64
select nullif(2, '1');
Copy link
Contributor

@Omega359 Omega359 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also used to work.

query T
select nullif(2, '1');
----
2

Interesting that the type is text vs number but still, it did work.

Copy link
Contributor Author

@jayzhan211 jayzhan211 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a hard issue to fix, since we can't tell the difference between Unknown type and String type currently

This query pass in main too but it should not.

query T
select nullif('1'::varchar, 2);
----
1

The change did have regression on nullif(2, '1'), but also fix nullif('1'::varchar, 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar issue in #13240 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions against this PR because it fixes nullif('1'::varchar, 2) and refines the TypeSignature. We could file a ticket for it and address it in a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked in #13285

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duckdb can run this query

D select nullif('2'::varchar, 2);
┌─────────────────────────────────┐
│ nullif(CAST('2' AS VARCHAR), 2) │
│             varchar             │
├─────────────────────────────────┤
│                                 │
└─────────────────────────────────┘

😕

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me what is important is not strictly what other systems support - those may/could be a guide to what datafusion will support - but whether the provided arguments to a signature can be losslessly converted to what the signature accepts and whether it logically makes sense to do so.

I personally would rather be lenient for what is accepted and do casting/coercion as required than to be strict and push the onus onto the user to do that. That's just me though, I don't know if that is the general consensus of the community. Perhaps we should file a discussion ticket with the options and decide?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just double checked on this branch and it seems good to me (supports things reasonably)

     Running `target/debug/datafusion-cli`
DataFusion CLI v43.0.0
> select nullif(2, '1');
+----------------------------+
| nullif(Int64(2),Utf8("1")) |
+----------------------------+
| 2                          |
+----------------------------+
1 row(s) fetched.
Elapsed 0.027 seconds.

> select nullif('1'::varchar, 2);
+----------------------------+
| nullif(Utf8("1"),Int64(2)) |
+----------------------------+
| 1                          |
+----------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

@jayzhan211 jayzhan211 added the regression Something that used to work no longer does label Nov 13, 2024
Signed-off-by: jayzhan211 <[email protected]>
Comment on lines +299 to +302
TypeSignature::String(arg_count) => get_data_types(&NativeType::String)
.into_iter()
.map(|dt| vec![dt; *arg_count])
.collect::<Vec<_>>(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

# TODO: support numeric string
# This query success in Postgres and DuckDB
query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64
select nullif(2, '1');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions against this PR because it fixes nullif('1'::varchar, 2) and refines the TypeSignature. We could file a ticket for it and address it in a follow-up.

@jayzhan211 jayzhan211 marked this pull request as draft November 15, 2024 07:11
@jayzhan211 jayzhan211 marked this pull request as ready for review November 15, 2024 07:53
@jayzhan211 jayzhan211 marked this pull request as draft November 15, 2024 08:40
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 changed the title Introduce TypeSignature::Boolean and update NullIf signature Introduce TypeSignature::Comparable and update NullIf signature Nov 15, 2024
@jayzhan211
Copy link
Contributor Author

I revert to the previous function's behaviour since it makes more sense to me

@jayzhan211 jayzhan211 removed the regression Something that used to work no longer does label Nov 15, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review November 15, 2024 09:45
@jayzhan211
Copy link
Contributor Author

This is ready for review too

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- I think this looks like a nice improvement.

I have some suggestions on improving the documentation to be more explicit about what TypeSignature::Comparable means but I also think we could implement that as a follow on

THank you @Omega359 and @Weijun-H for the reviews

function_length_check(current_types.len(), *num)?;
let mut target_type = current_types[0].to_owned();
for data_type in current_types.iter().skip(1) {
if let Some(dt) = comparison_coercion_numeric(&target_type, data_type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like TypeSignature::Comparable has different rules that, for example, = or < (it prefers numeric)

I think this is fine, but should probably be documented better / more explicitly as it might be expected

Also the coercion / defaulting to Utf8 might also be a good thing to document

# TODO: support numeric string
# This query success in Postgres and DuckDB
query error DataFusion error: Error during planning: Internal error: Failed to match any signature, errors: Error during planning: The signature expected NativeType::String but received NativeType::Int64
select nullif(2, '1');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just double checked on this branch and it seems good to me (supports things reasonably)

     Running `target/debug/datafusion-cli`
DataFusion CLI v43.0.0
> select nullif(2, '1');
+----------------------------+
| nullif(Int64(2),Utf8("1")) |
+----------------------------+
| 2                          |
+----------------------------+
1 row(s) fetched.
Elapsed 0.027 seconds.

> select nullif('1'::varchar, 2);
+----------------------------+
| nullif(Utf8("1"),Int64(2)) |
+----------------------------+
| 1                          |
+----------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- I think this looks like a nice improvement.

I have some suggestions on improving the documentation to be more explicit about what TypeSignature::Comparable means but I also think we could implement that as a follow on

THank you @Omega359 and @Weijun-H for the reviews

@@ -113,6 +113,8 @@ pub enum TypeSignature {
/// arguments like `vec![DataType::Int32]` or `vec![DataType::Float32]`
/// since i32 and f32 can be casted to f64
Coercible(Vec<LogicalTypeRef>),
/// The number of arguments that are comparable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to define what comparable means in this context?

@jayzhan211 jayzhan211 merged commit 2f150f6 into apache:main Nov 22, 2024
25 checks passed
@jayzhan211 jayzhan211 deleted the nullif branch November 22, 2024 03:20
@jayzhan211
Copy link
Contributor Author

Thanks @alamb @Omega359 @Weijun-H

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants