Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrites PLUS and BITWISE_AND implementations using new modeling #1647

Merged
merged 6 commits into from
Nov 19, 2024

Conversation

johnedquinn
Copy link
Member

@johnedquinn johnedquinn commented Nov 14, 2024

Relevant Issues

Description

  • Rewrites PLUS and BITWISE_AND
    • Slightly updates how function instances are chosen. See FnResolver.
    • See ArithmeticDiadicOperator. It holds a table of pointers to functions given two argument types. The function is in charge of providing the closest function impl, but the engine is still in charge of coercions and final resolution.
  • Passes all tests, and now is even more accurate. See the new plusTests in PartiQLEvaluatorTests
  • Dynamic calls now hold functions, not functions instances since they haven't been resolved yet.

Performance Implications

This should make dynamic dispatches faster, since we now no longer need to loop over several candidates. They are consolidated. On top of that, the choosing of an implementation is internalized and uses a lookup table to find the most appropriate instance for PLUS and BITWISE_AND.

Edit: I ran some quick benchmarks to confirm this theory, and the assumption is correct. I've published these benchmarks via a tag on my fork. Comparing the new plus operator and the old minus operator:

Benchmark                               Mode  Cnt  Score    Error  Units
PartiQLBenchmark.minusIntDynamicStatic  avgt   20  0.150 ±  0.005  us/op
PartiQLBenchmark.minusIntStatic         avgt   20  0.010 ±  0.001  us/op
PartiQLBenchmark.plusIntDynamicStatic   avgt   20  0.081 ±  0.005  us/op
PartiQLBenchmark.plusIntStatic          avgt   20  0.010 ±  0.001  us/op

The static execution time remains the same (makes sense), but the dynamic invocation sees performance gains. An approximate speedup of 1.85. This translates to almost twice as fast. And, looking at the implementation of ExprCallDynamic, I can see how we can make this even faster 👍 .

Huge props to @rchowell for this modelling change.

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.03%. Comparing base (da22cf8) to head (aaa2a0c).
Report is 59 commits behind head on v1.

Additional details and impacted files
@@            Coverage Diff            @@
##                 v1    #1647   +/-   ##
=========================================
  Coverage     80.03%   80.03%           
  Complexity       47       47           
=========================================
  Files            19       19           
  Lines           506      506           
  Branches         23       23           
=========================================
  Hits            405      405           
  Misses           88       88           
  Partials         13       13           
Flag Coverage Δ
EXAMPLES 80.03% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@johnedquinn johnedquinn marked this pull request as ready for review November 14, 2024 19:14
@johnedquinn johnedquinn requested a review from rchowell November 14, 2024 19:14
* coercion groups.
*
* TODO: [UNKNOWN] should likely be removed in the future. However, it is needed due to literal nulls and missings.
* TODO: [DYNAMIC] should likely be removed in the future. This is currently only kept to map function signatures.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if VARIANT should go to the DYNAMIC coercion family – as this family effectively says, "we don't know what it is, you have to ask the value itself", but I also don't know exactly how that might effect the plumbing.

No changes requested, just trying to brainstorm how coercions of variants work / not happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I talked to Josh about this last week -- how our variant is more similar to dynamic than say Spark's variant (which requires explicit casts). I'll leave this for now, but I think I'm inclined to agree.

return matches.first().match
val match = matches.first()
val fn = match.match.getInstance(args.toTypedArray()) ?: return null
return FnMatch.Static(fn, match.mapping)
}

// TODO: Do we care about preferred types? This is a PostgreSQL concept.
// 5. Run through all candidates and keep those that accept preferred types (of the input data type's type category) at the most positions where type conversion will be required.

// 6. Find the highest precedence one. NOTE: This is a remnant of the previous implementation. Whether we want
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes request, just saying that it makes sense to have a "tie"-breaker otherwise we'll have another dynamic dispatch in a place that didn't have dynamic values. I think it makes sense to follow PostgreSQL to break ties when the number of exact matches is the same.

Comment on lines -350 to +273
Fn_PLUS__INT8_INT8__INT8,
Fn_PLUS__INT16_INT16__INT16,
Fn_PLUS__INT32_INT32__INT32,
Fn_PLUS__INT64_INT64__INT64,
Fn_PLUS__INT_INT__INT,
Fn_PLUS__FLOAT32_FLOAT32__FLOAT32,
Fn_PLUS__FLOAT64_FLOAT64__FLOAT64,
Fn_PLUS__DECIMAL_ARBITRARY_DECIMAL_ARBITRARY__DECIMAL_ARBITRARY,
FnPlus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@@ -32,6 +32,7 @@ internal abstract class Fn_COLL_AGG__BAG__ANY(
override fun getInstance(args: Array<PType>): Function.Instance = instance

private val instance = object : Function.Instance(
name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do instances need names? Maybe its obvious, but I don't think these names should be user provided. Makes me think Function needs a builder with .addInstance(...) which controls adding the names so we can't have a scenario where a Function returns an instance with a different name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some logic checking the contents of the signature. Specifically, checking for is_struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IS_ is out of spec and would be its own node anyways. Other than is_struct, is there a reason why instances would need names?

I can see it being useful for introspection. I don't think it should be removed, but I don't think it should be user-provided.

@johnedquinn johnedquinn merged commit 6a4b3e2 into partiql:v1 Nov 19, 2024
7 checks passed
@johnedquinn johnedquinn deleted the v1-fn-consolidation branch November 19, 2024 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants