Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignoring equivalent rules when mining symmetric relations #40

Open
falcaopetri opened this issue Aug 5, 2020 · 3 comments
Open

Ignoring equivalent rules when mining symmetric relations #40

falcaopetri opened this issue Aug 5, 2020 · 3 comments

Comments

@falcaopetri
Copy link
Contributor

falcaopetri commented Aug 5, 2020

I've been wrongly (and consistently) using the term reflexive, but I am actually referring to the term symmetric. I updated the issue's title for clarity, but the whole discussion below regarding reflexive relations is actually about symmetric relations.

This is somewhat a continuation of #7. Since there is an open discussion there, I'm opening a new issue.

Contextualization

My KB has some reflexive-relations, i.e., if r is reflexive, r(x, y) ∈ KB <=> r(y, x) ∈ KB.

Mining rules with such relations yields many equivalent rules. In this issue, I will report my approach to prune/filter these duplicated rules. I would be happy if you could provide any comments that you think might be helpful.

Equivalent rules

Two cases yield equivalent rules. The first case is swapped reflexive relations' arguments.

Consider the following rules:

  1. functional_relation(a, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)
  2. functional_relation(a, X), reflexive_relation1(a, b) => reflexive_relation2(**b**, **a**)
  3. functional_relation(a, X), reflexive_relation1(**b**, **a**) => reflexive_relation2(a, b)

Note that Rules 2. and 3. are equivalent to 1., since they only differ by the parameter's order of a reflexive relation.

The second equivalent case is a little bit harder to spot. Consider the following rule:

  1. functional_relation(**b**, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)

This rule is also equivalent to 1. See the details below for proof.

Proof

Given 4., we can:

  • Swap the reflexive-relations: functional_relation(b, X), reflexive_relation1(b, a) => reflexive_relation2(b, a);
  • And then substitute ?a for ?b, and ?b for ?a: functional_relation(a, X), reflexive_relation1(a, b) => reflexive_relation2(a, b).

Of course, these two equivalent cases can occur simultaneously.

My approach

Given a set of known reflexive relations, my approach to detect (and prune) these equivalent rules is two-fold:

  1. Add a new bias to the Dangling and Closing Operators. This new bias imposes that if r is reflexive, its subject variable must be smaller than its object variable.

    E.g., we might add a reflexive_relation(a, b), but not reflexive_relation(b, a).

  2. Modify Rule.equals in order to consider all posible permutations of all reflexive relations within a rule. I.e., for rules R1 and R2, instead of only checking QueryEquivalence(R1, R2), check QueryEquivalence between R1 and each of get_all_reflexive_permutations(R2).

    Rule.equals example

    For example, let's do R1=Rule 1. and R2=Rule 4. We have that get_all_reflexive_permutations(R2) yields 4 permutations:

    • functional_relation(b, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)
    • functional_relation(b, X), reflexive_relation1(a, b) => reflexive_relation2(b, a)
    • functional_relation(b, X), reflexive_relation1(b, a) => reflexive_relation2(a, b)
    • functional_relation(b, X), reflexive_relation1(b, a) => reflexive_relation2(b, a)

    Each of them is compared with QueryEquivalence with R1. If any of them is equivalent to R1, then R2 is equivalent to R1.

Note that the new bias reduces the number of generated rules, while the new Rule.equals does not allow reflexive-equivalent rules in AMIEQueue.

Expectative

My expectative was that all these equivalent rules yielded the same metrics. This is true for support and body size, but not for PCA body size.

Indeed, I noticed that the two following mined rules presented different PCA confidence:

  1. functional_relation(a, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)
  2. functional_relation(b, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)

More specifically, the second rule had a smaller PCA body size, and therefore a greater PCA confidence.

Reality

When calculating the PCA body size, Rule 1. yields the PCA body size query:

  1. functional_relation(a, X), reflexive_relation1(a, b), reflexive_relation2(a, x9)

Rule 2. yields the PCA body size query:

  1. functional_relation(b, X), reflexive_relation1(a, b), reflexive_relation2(a, x9)

At this point, the two queries are not equaled anymore.

For example, let's instantiate the most restrictive atom in both queries:

  1. Instantiate a: functional_relation(CONST, X), reflexive_relation1(CONST, b), reflexive_relation2(CONST, x9)
  2. Instantiate b: functional_relation(CONST, X), reflexive_relation1(a, CONST), reflexive_relation2(a, x9)

i.e., in the first case, CONST must have a reflexive_relation2, while in the second case, it is a. Consequently, this yields different body sizes.

My workaround

As far as I thought, the new bias I introduced can still be used. The issue occurs when eliminating rules based on my new Rule.equals, since two rules might be equal, but yield different PCA Confidence.

My idea then is to maintain the new bias and add a new output check (similar to skyline). It works as follows:

  • Impose that if we add a new edge r(s, o) and r is reflexive, then KB.map(s) > KB.map(o);
  • Maintain the current Rule.equals, but also implement a Rule.reflexive_equals;
  • While adding a rule to the output set, check if there is already an equal rule using Rule.reflexive_equals;
  • If there is, keep the one with greater PCA Confidence.

Since the problem occurs only if the head relation is reflexive, a further optimization step is:

  • If the head relation is reflexive, do as said above (the workaround);
  • Otherwise, do as said before (my initial approach), since this [potentially] reduces the number of enqueued rules.

Is this too much of a hack? Or does it make sense within the reflexive "issue" I'm trying to fix?

The only problem I saw is that modifying the output set requires -oute to be specified.

Any idea is welcomed.

Best,
Antonio.

@lgalarra
Copy link
Collaborator

lgalarra commented Aug 6, 2020

Hi Antonio,

I fully agree with your observation and while this seems to require quite some hacks, I would not mind to have it as a mining assistant. I only have one remark:

Does your analysis assume that the inverse relationships are fully materialized?
If it is the case and we do not know the reflexive pairs, rule mining could spot those regularities trivially and use that information to do post-processing on the resulting rule set. If we do not assume full materialization of the inverse facts, then the reflexive pairs must be provided as input.

Best,
Luis

@falcaopetri
Copy link
Contributor Author

falcaopetri commented Aug 7, 2020

Hi,

I fully agree with your observation and while this seems to require quite some hacks, I would not mind to have it as a mining assistant.

I guess that most part of the hack comes from extending parts which are a little bit hard to currently extend in AMIE.
The new bias can be implemented in two lines of code, but requires copying-and-pasting the whole code for getDanglingAtoms and getClosingAtoms. The filtering using reflexive_equals I've mentioned has to be implemented in AMIE class, outside a MiningAssistant.

Does your analysis assume that the inverse relationships are fully materialized?
If it is the case and we do not know the reflexive pairs, rule mining could spot those regularities trivially and use that information to do post-processing on the resulting rule set. If we do not assume full materialization of the inverse facts, then the reflexive pairs must be provided as input.

I'm not sure if I understand what you meant. But yes, at least in my case/currently, I provide AMIE with both r(x, y) and r(y, x) facts.

Post-processing is indeed interesting, and would be done using the Rule.reflexive_equals I proposed. The new bias I've mentioned is trying to reduce (prune) the number of rules which will be filtered out in this post-processing.

For the new bias, I'd love to be able to create a "ReflexiveInvariantRule", similar to InjectiveRule (see #33 (comment)). As this is not currently supported, I will indeed implement a custom MiningAssistant.

In order to enable the special treatment to reflexive-relations, I see a few ways:

  1. CLI argument specifying the reflexive-relations and...
    1. AMIE asserts that all r(x,y) and r(y, x) have been provided; or
    2. AMIE "infers" all r(x,y) or r(y, x) not provided.
  2. AMIE parses owl:ReflexiveProperty using amie.data.Schema and...
    1. AMIE asserts that all r(x,y) and r(y, x) have been provided; or
    2. AMIE "infers" all r(x,y) or r(y, x) not provided.
  3. Auto-detecting reflexive (seems like a really bad idea).

I think that infering the inverse triple would also require checking for rdfs:subPropertyOf.

Option 1.i seems to follow:

The methodology is more: you provide all the (saturated) facts, and we compute the rules. (@lajus , #7 (comment))

On the other hand, amie.data.Schema already supports rdfs:subPropertyOf.

If you think that one of these (or another) approaches would be more close to get into upstream AMIE, I would gladly opt to implement it that way.

Regards,
Antonio

@lgalarra
Copy link
Collaborator

Hi Antonio,

I vote for implementing option 1.i as it is the simplest (the user can always run AMIE with 2 atoms to identify those patterns and then use that info to make the mining of longer rules faster), although from a user perspective 1.b is more useful. This functionality would have been useful for these cases (a), (b)
Some benchmark datasets used to evaluate link prediction approaches were inappropriate due to redundancy caused by reflexive relations.

@falcaopetri falcaopetri changed the title Ignoring equivalent rules when mining reflexive relations Ignoring equivalent rules when mining symmetric relations Sep 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants