-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gh 5121 fedx bind left join #5122
Conversation
.../src/main/java/org/eclipse/rdf4j/federated/evaluation/join/ControlledWorkerBindJoinBase.java
Show resolved
Hide resolved
/** | ||
* A {@link LookAheadIteration} for processing bind left join results. | ||
* | ||
* Algorithm: | ||
* | ||
* <ul> | ||
* <li>execute left bind join using regular bound join query</li> | ||
* <li>process result iteration similar to {@link BoundJoinVALUESConversionIteration}</li> | ||
* <li>remember seen set of bindings (using index) and add original bindings to those, i.e. put to result return all | ||
* non-seen bindings directly from the input</li> | ||
* | ||
* | ||
* @author Andreas Schwarte | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add that this is used for OPTIONAL clauses. Is it used for anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in the docs. This is only for OPTIONALs
protected BindingSet convert(BindingSet bIn, int bIndex) throws QueryEvaluationException { | ||
QueryBindingSet res = new QueryBindingSet(); | ||
Iterator<Binding> bIter = bIn.iterator(); | ||
while (bIter.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a for-each look?
for (Binding bs : bIn){
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, can be simplified like that. I think when initially writing the bind join logic it was not possible
int bIndex = Integer.parseInt( | ||
bIn.getBinding(BoundJoinVALUESConversionIteration.INDEX_BINDING_NAME).getValue().stringValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the binding Value object ever a literal? So it would be possible to check if it's instanceof IntegerLiteral or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is always an integer literal. is passed as that in the VALUES clause. Have optimized it
.../src/main/java/org/eclipse/rdf4j/federated/evaluation/join/ControlledWorkerBindJoinBase.java
Show resolved
Hide resolved
...ion/src/main/java/org/eclipse/rdf4j/federated/evaluation/iterator/BindLeftJoinIteration.java
Outdated
Show resolved
Hide resolved
@hmottestad thanks for the feedback. I will get back to this PR soon (we are currently in the release finalization phase of our product and this requires most of my attention). Quick question: is there a timeline for the 5.1 release? I would like to see this change included (if I get it ready) such that we can make use of it in our next release. Ideally the 5.1 release would be available in the second half of november. What are the current project plans? |
We don't have any plans at the moment. I'm currently working on the last SHACL paths that aren't supported. I also have a PR for configuring the Apache HttpClient timeouts. And there is a request for adding support for configuring the document loader in the new JSON-LD 1.1 parser. Would really want the timeouts included, and if I start on the JSON-LD document loader then it shouldn't take more than a few weeks to get it merged. So we can say mid november for RDF4J 5.1. |
c058da3
to
40b03a5
Compare
Refactor the existing logic for executing bind joins into a reusable base class. This change mostly moves the implementation logic from the existing ControlledWorkerBindJoin class to a new intermediate implementation (with the goal to make it reusable in a second step for left joins). Note that the new bind join implementation no longer uses the ControlledWorkerJoin as base class, i.e. the decision of which join implementation to use is moved to the strategy. For backwards code compatibility the "ControlledWorkerBoundJoin" is kept, but no longer used. Instead the new code is in ControlledWorkerBindJoin.
Prepare to execute a specific implementation of a left join implementation through the federation strategy.
This change provides the implementation and activation for the left bind join operator. The algorithm is as follows: - execute left bind join using regular bound join query - process result iteration similar to BoundJoinVALUESConversionIteration - remember seen set of bindings (using index) and add original bindings to those, i.e. put to result return all non-seen bindings directly from the input Note that the terminology in literature has changed to "bind joins". Hence, for new classes and methods I try to follow that. Change is covered with some unit tests
Bind left joins for OPTIONAL can be disabled using the "enableOptionalAsBindJoin" flag in the federation config Integrate the switch between implementations in the unit test as parameterized test
- use for-each loop for iterating bindingset - use IntHashSet - use Literal#intValue instead of Integer#parseInt
40b03a5
to
01bc075
Compare
The PR should not be fully ready for review. I have addressed my remaining TODOS and the feedback from @hmottestad (Thanks for that btw). Can I ask for reviews on this? Note: when this is merged, I will create some additional improvements as part of orthogonal improvement issues (still to be created)
That sounds really great. Let's aim for that. Might be helpful for us in our product to test the 5.1 branch earlier in our product (to be able to react when there are still issues encountered during testing / using). Note: we cannot really use RDF4J snapshot versions. Would it be poissible to create a milestone build some time end of October for acceptance testing? |
I would like to merge this PR in the upcoming days, to continue with #5151 and #5152 Anybody interested in doing further reviews before merging (e.g. @hmottestad )? I will also ask a colleague of mine to help with a review, will likely be there early next week |
LGTM. Feel free to merge! |
thanks @hmottestad Have merged it. Unfortunately the merge commit message does not reference the PR correctly (and I messed up the commit message, has a lower case issue reference). Is this something that requires to be resolved? The individual commits below that merge commit are fine though |
Note: The PR is still WIP (some further changes, and especially unit test coverage will follow) however it is ready for early review. The goal is to have this available in RDF4J 5.1
GitHub issue resolved: #5121
Briefly describe the changes proposed in this PR:
The PR consists of multiple commits which add an implementation for bind left joins. It is recommended to follow the change commit by commit (as the first ones are preparational refactoring)
Commit 1: GH-5121: refactor the bind join logic into a reusable base class
Refactor the existing logic for executing bind joins into a reusable
base class.
This change mostly moves the implementation logic from the existing
ControlledWorkerBindJoin class to a new intermediate implementation
(with the goal to make it reusable in a second step for left joins).
Note that the bind join implementation no longer uses the
ControlledWorkerJoin as base class, i.e. the decision of which join
implementation to use is moved to the strategy.
For backwards code compatibility the "ControlledWorkerBoundJoin" is
kept, but no longer used. Instead the new code is in
ControlledWorkerBindJoin.
Commit 2: GH-5121: prepare execution of left joins in the federation strategy
Prepare to execute a specific implementation of a left join
implementation through the federation strategy.
Commit 3: GH-5121: implementation of left bind join operator
This change provides the implementation and activation for the left bind
join operator.
The algorithm is as follows:
to those, i.e. put to result return all non-seen bindings directly from
the input
Note that the terminology in literature has changed to "bind joins".
Hence, for new classes and methods I try to follow that.
Change is covered with a unit tests
Commit 4: GH-5121: configurability of bind left joins
Bind left joins for OPTIONAL can be disabled using the
"enableOptionalAsBindJoin" flag in the federation config
Integrate the switch between implementations in the unit test as
parameterized test
Commit 5: GH-5121: code simplifications in bind join implementation
TODOs
PR Author Checklist (see the contributor guidelines for more details):
mvn process-resources
to format from the command line)