Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve "exploit max length" for Dangling Operator #43

Open
falcaopetri opened this issue Sep 11, 2020 · 4 comments
Open

Improve "exploit max length" for Dangling Operator #43

falcaopetri opened this issue Sep 11, 2020 · 4 comments

Comments

@falcaopetri
Copy link
Contributor

Hi.

Maximum rule length is an interesting pruning technique introduced in AMIE+. In "Fast rule mining in ontological knowledge bases with AMIE+", we can read:

Maximum rule length: [...] for a not-yet-closed rule of length maxLen−1, AMIE+ will not apply the add-dangling-atom operator O_D, because this results in a non-closed rule, which will be neither output nor refined. In the same spirit, if the same rule contains more than two non-closed variables (see Sect. 3.3), AMIE+ will skip the application of the add-closing-atom operator O_C . This happens because an application of the operator O C can close at most two variables with one atom. This reasoning also applies to the instantiation operator O_I: rules with more than one non-closed variable are not refined with instantiated atoms, because the addition of an instantiated atom can close at most one variable.

Summing up, we have that:

  • Dangling operator checks if there are non-closed variables
  • Closing operator checks if there are more than two non-closed variables
  • Instantiation operator checks if there are more than one non-closed variable

I guess that dangling operator can be safely modified to consider if there are more than one non-closed variable, just as the instantiation operator.

Indeed, the dangling operator will:

  • Close an open variable
  • Add an open variable

Therefore, it can close at most one variable. The newly added variable may be closed by instantiation operator, but the previously opened variables cannot.

Best,
Antonio.

@falcaopetri
Copy link
Contributor Author

Hi.

I think the following logic may be related to "max length exploitation" for the Dangling Operator. I also feel that this may be a bug that incorrectly filters out candidate rules. The problem seems to occur only when mining constants, so it's hard to spot.

https://github.com/lajus/amie/blob/651455d0438e11ec68d83981c2c0dba309e4b984/mining/src/main/java/amie/mining/AMIE.java#L433-L448

The above code's idea seems to be that "the last atom added to a rule should not be generated by the dangling operator since the resulting rule will definitely be not-closed". Nonetheless, this ignores the possibility of applying the instantiation operator and make the rule closed.

Indeed, the code above seems to contradict the check done by the dangling operator itself:

https://github.com/lajus/amie/blob/651455d0438e11ec68d83981c2c0dba309e4b984/mining/src/main/java/amie/mining/assistant/DefaultMiningAssistant.java#L249-L257

@falcaopetri
Copy link
Contributor Author

0k, now I get it.

The above code's idea seems to be that "the last atom added to a rule should not be generated by the dangling operator since the resulting rule will definitely be not-closed". Nonetheless, this ignores the possibility of applying the instantiation operator and make the rule closed.

The highlighted afirmation is wrong. Instantiation operator has already been applied to the dangling's output and is at temporalOutputMap.get("instantiated"). The code I suggested to remove is just avoiding adding a rule which has max size and is surely unclosed (because of the newly added dangling variable). If added, the rule will be shortly filtered by AMIE's output check, and the final output will be the same.

I will drop PR #44 since it has already incomporated this wrong understanding.

Nonetheless, the modification to the dangling's exploitMaxLengthOption is still valid, since it:

  • Works better if not mining constants (by extending the pruning to query.getOpenVariables().size() == 0)
  • Works better if there is already more than 1 dangling variable (by adding pruning if query.getOpenVariables().size() > 1). This skips the application of the dangling operator to rules such as ?a <r1> ?f => ?a <r2> ?b.

@falcaopetri
Copy link
Contributor Author

I think I've finally found the real culprit for the performance boost shown in #46. This is MiningAssistant.getInstantiatedAtoms:

https://github.com/lajus/amie/blob/651455d0438e11ec68d83981c2c0dba309e4b984/mining/src/main/java/amie/mining/assistant/MiningAssistant.java#L793-L796

Note that if exploitMaxLengthOption is set, then... max length is not exploited (because the condition checked will be true).

Therefore, my proposed fix to this issue is: the current PR #46 + fixing the code above.
The PR is still useful since it discards rules that would be unnecessarily enqueued.

@lgalarra
Copy link
Collaborator

Hi Antonio,

Thanks for this nice inquiry. I agree with your analysis and I vote for merging your PR into AMIE's main branch. I am waiting for @lajus approval.

Best,
Luis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants