GH-4878 optimise sub-select #4879

hmottestad · 2024-01-23T10:40:21Z

GitHub issue resolved: #4878

Briefly describe the changes proposed in this PR:

PR Author Checklist (see the contributor guidelines for more details):

my pull request is self-contained
I've added tests for the changes I made
I've applied code formatting (you can use mvn process-resources to format from the command line)
I've squashed my commits where necessary
every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

hmottestad · 2024-01-23T21:26:43Z

Develop branch

Benchmark                                                     Mode  Cnt    Score    Error  Units
QueryBenchmark.complexQuery                                   avgt    5    1.026 ±  0.024  ms/op
QueryBenchmark.different_datasets_with_similar_distributions  avgt    5    0.475 ±  0.003  ms/op
QueryBenchmark.groupByQuery                                   avgt    5    0.591 ±  0.002  ms/op
QueryBenchmark.long_chain                                     avgt    5  165.798 ±  2.056  ms/op
QueryBenchmark.lots_of_optional                               avgt    5   42.455 ±  0.284  ms/op
QueryBenchmark.minus                                          avgt    5  896.323 ± 26.089  ms/op
QueryBenchmark.nested_optionals                               avgt    5   62.486 ±  0.219  ms/op
QueryBenchmark.pathExpressionQuery1                           avgt    5    5.174 ±  0.081  ms/op
QueryBenchmark.pathExpressionQuery2                           avgt    5    0.491 ±  0.003  ms/op
QueryBenchmark.query_distinct_predicates                      avgt    5   51.896 ±  0.587  ms/op
QueryBenchmark.simple_filter_not                              avgt    5    2.072 ±  1.333  ms/op

This branch

Benchmark                                                     Mode  Cnt    Score    Error  Units
QueryBenchmark.complexQuery                                   avgt    5    1.061 ±  0.004  ms/op
QueryBenchmark.different_datasets_with_similar_distributions  avgt    5    0.472 ±  0.002  ms/op
QueryBenchmark.groupByQuery                                   avgt    5    0.606 ±  0.003  ms/op
QueryBenchmark.long_chain                                     avgt    5  172.897 ± 18.938  ms/op
QueryBenchmark.lots_of_optional                               avgt    5   44.313 ±  3.020  ms/op
QueryBenchmark.minus                                          avgt    5  914.117 ± 23.632  ms/op
QueryBenchmark.nested_optionals                               avgt    5   64.688 ±  2.071  ms/op
QueryBenchmark.pathExpressionQuery1                           avgt    5    5.386 ±  0.596  ms/op
QueryBenchmark.pathExpressionQuery2                           avgt    5    0.478 ±  0.027  ms/op
QueryBenchmark.query_distinct_predicates                      avgt    5   53.481 ±  1.733  ms/op
QueryBenchmark.simple_filter_not                              avgt    5    1.878 ±  0.519  ms/op

hmottestad · 2024-01-24T14:07:53Z

@JervenBolleman Any chance you could take a look at this PR? I added a test, but I haven't written a benchmark query yet. The benchmark run I did suggests that there is at least no significant performance degradation.
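
For readers comparing the two JMH tables above by hand, the following is a minimal sketch of one way to do it: compute the relative change per benchmark and check whether the reported `Score ± Error` intervals overlap. The class `BenchmarkDelta` and its methods are hypothetical helpers written for this illustration and are not part of RDF4J or JMH; the numbers are copied from a few rows of the tables.

```java
import java.util.Locale;

// Hypothetical helper for reading the two JMH tables above; not part of RDF4J or JMH.
final class BenchmarkDelta {

    // Relative change of the branch score versus the develop score, in percent.
    static double relativeChangePercent(double developScore, double branchScore) {
        return (branchScore - developScore) / developScore * 100.0;
    }

    // True when [score - error, score + error] of the two runs overlap.
    static boolean intervalsOverlap(double scoreA, double errorA, double scoreB, double errorB) {
        return scoreA - errorA <= scoreB + errorB && scoreB - errorB <= scoreA + errorA;
    }

    public static void main(String[] args) {
        String[] names = { "complexQuery", "long_chain", "minus", "simple_filter_not" };
        // { develop score, develop error, branch score, branch error }, all in ms/op
        double[][] rows = {
                { 1.026, 0.024, 1.061, 0.004 },
                { 165.798, 2.056, 172.897, 18.938 },
                { 896.323, 26.089, 914.117, 23.632 },
                { 2.072, 1.333, 1.878, 0.519 },
        };
        for (int i = 0; i < names.length; i++) {
            double[] r = rows[i];
            System.out.printf(Locale.ROOT, "%-20s %+6.2f%%  intervals overlap: %b%n",
                    names[i], relativeChangePercent(r[0], r[2]), intervalsOverlap(r[0], r[1], r[2], r[3]));
        }
    }
}
```

Overlapping intervals are only a rough heuristic with five iterations per benchmark, so this is a reading aid rather than a statistical test.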

JervenBolleman

I think this is a good improvement and worth committing even without a benchmark query. A one-off diff test is fine by me.

JervenBolleman · 2024-01-24T20:23:52Z

...on/src/test/java/org/eclipse/rdf4j/query/algebra/evaluation/impl/QueryCostEstimatesTest.java

-				+ LINE_SEP,
-				q.getTupleExpr().toString());
+		assertThat(q.getTupleExpr().toString()).isEqualToNormalizingNewlines("QueryRoot\n" +
+				"   Projection\n" +


Should we not have + LINE_SEP instead of \n to be consistent?

isEqualToNormalizingNewlines fixes the newline separators for us.

But might change it back actually, since it's the only place that tests the line separation aspect of the query plan.
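
As a rough illustration of why `\n` is enough in the expected string, here is a minimal plain-Java sketch of the comparison that AssertJ's `isEqualToNormalizingNewlines` performs: both strings have `\r\n` replaced by `\n` before they are compared. The class and method names below are hypothetical and this is not AssertJ's actual implementation.

```java
// Hypothetical stand-in for the behaviour discussed above; not AssertJ's implementation.
final class NewlineNormalizingCompare {

    // Replace Windows line endings with Unix line endings.
    static String normalizeNewlines(String text) {
        return text.replace("\r\n", "\n");
    }

    // Compare two strings after normalizing the line endings of both.
    static boolean equalsNormalizingNewlines(String actual, String expected) {
        return normalizeNewlines(actual).equals(normalizeNewlines(expected));
    }

    public static void main(String[] args) {
        // A plan rendered with the platform separator on Windows ...
        String actual = "QueryRoot\r\n   Projection\r\n";
        // ... still matches an expectation written with plain \n.
        String expected = "QueryRoot\n" + "   Projection\n";
        System.out.println(equalsNormalizingNewlines(actual, expected));
    }
}
```

The trade-off raised in the reply still applies: a comparison like this no longer verifies which line separator the query plan actually uses.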

JervenBolleman · 2024-01-24T20:27:00Z

...queryalgebra/model/src/main/java/org/eclipse/rdf4j/query/algebra/AbstractQueryModelNode.java

@@ -170,16 +170,16 @@ public void setTotalTimeNanosActual(long totalTimeNanosActual) {
 	/**
 	 * @return Human readable number. Eg. 12.1M for 1212213.4 and UNKNOWN for -1.
 	 */
-	static String toHumanReadbleNumber(double number) {
+	static String toHumanReadableNumber(double number) {


Thanks for catching this typo. Should we extract this into a utility at some point?

Could be moved to a more common module, but I don't think I want to do that now.
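
If the method were extracted later, a standalone utility could look roughly like the sketch below. The class name `HumanReadableNumbers`, the thresholds, and the rounding are assumptions made for this illustration; this is not the code in `AbstractQueryModelNode`.

```java
// Hypothetical utility sketch; thresholds and rounding are assumptions, not RDF4J's code.
final class HumanReadableNumbers {

    private HumanReadableNumbers() {
    }

    // Formats a size or cost estimate compactly; negative values mean "not known".
    static String toHumanReadableNumber(double number) {
        if (number > 1_000_000) {
            // Keep one decimal place, e.g. 12_100_000 becomes "12.1M".
            return Math.round(number / 100_000) / 10.0 + "M";
        } else if (number > 1_000) {
            return Math.round(number / 100) / 10.0 + "K";
        } else if (number >= 0) {
            return Long.toString(Math.round(number));
        }
        // Negative values and NaN fall through to here.
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(toHumanReadableNumber(12_100_000));
        System.out.println(toHumanReadableNumber(1_500));
        System.out.println(toHumanReadableNumber(12));
        System.out.println(toHumanReadableNumber(-1));
    }
}
```

Under these assumed thresholds, the Javadoc's sample input 1212213.4 would format as 1.2M, so the example in the comment may be worth revisiting if the method is ever moved.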

JervenBolleman · 2024-01-24T20:28:54Z

core/sail/memory/src/test/java/org/eclipse/rdf4j/sail/memory/QueryPlanRetrievalTest.java

-					+
-					"         │              o: Var (name=d)\n" +
-					"         └── StatementPattern (resultSizeActual=2) [right]\n" +
+					"         │        ╚══ LeftJoin (new scope) (BadlyDesignedLeftJoinIterator) (costEstimate=6.61, resultSizeEstimate=12, resultSizeActual=4) [right]\n"


This looks like a nice improvement in the query plan.

hmottestad · 2024-01-24T22:18:51Z

@JervenBolleman I actually found out that the QueryJoinOptimizer is able to optimise sub-selects, but not if there are multiple sub-selects or if there are any BIND clauses anywhere. I've made two benchmarks that will make sure that we don't break that optimisation later.

hmottestad force-pushed the GH-4878-optimise-sub-select branch from 909870f to a8b468f Compare January 23, 2024 15:11

GH-4878 optimise sub-selects

c1d042b

hmottestad force-pushed the GH-4878-optimise-sub-select branch from a8b468f to c1d042b Compare January 23, 2024 21:05

JoinVisitor needs to be possible to extend for developers who extend RDF4J

eb1376a

hmottestad requested a review from JervenBolleman January 24, 2024 14:06

JervenBolleman approved these changes Jan 24, 2024

add benchmark

c0aedf9

hmottestad force-pushed the GH-4878-optimise-sub-select branch from 5a31ca3 to c0aedf9 Compare January 24, 2024 22:17

adjustments based on code review

f1d136a

hmottestad enabled auto-merge (squash) January 25, 2024 10:55

hmottestad merged commit 5f67425 into develop Jan 25, 2024
8 checks passed

hmottestad deleted the GH-4878-optimise-sub-select branch January 25, 2024 11:05
