Ignoring equivalent rules when mining symmetric relations #40

falcaopetri · 2020-08-05T12:35:12Z

I've been wrongly (and consistently) using the term `reflexive`, but I am actually referring to the term `symmetric`. I updated the issue's title for clarity, but the whole discussion below regarding `reflexive` relations is actually about `symmetric` relations.

This is somewhat a continuation of #7. Since there is an open discussion there, I'm opening a new issue.

Contextualization

My KB has some reflexive-relations, i.e., if r is reflexive, r(x, y) ∈ KB <=> r(y, x) ∈ KB.

Mining rules with such relations yields many equivalent rules. In this issue, I will report my approach to prune/filter these duplicated rules. I would be happy if you could provide any comments that you think might be helpful.

Equivalent rules

Two cases yield equivalent rules. The first case is swapped reflexive relations' arguments.

Consider the following rules:

functional_relation(a, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)
functional_relation(a, X), reflexive_relation1(a, b) => reflexive_relation2(**b**, **a**)
functional_relation(a, X), reflexive_relation1(**b**, **a**) => reflexive_relation2(a, b)

Note that Rules 2. and 3. are equivalent to 1., since they only differ by the parameter's order of a reflexive relation.

The second equivalent case is a little bit harder to spot. Consider the following rule:

functional_relation(**b**, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)

This rule is also equivalent to 1. See the details below for proof.

Proof

Given 4., we can:

Swap the reflexive-relations: functional_relation(b, X), reflexive_relation1(b, a) => reflexive_relation2(b, a);
And then substitute ?a for ?b, and ?b for ?a: functional_relation(a, X), reflexive_relation1(a, b) => reflexive_relation2(a, b).

Of course, these two equivalent cases can occur simultaneously.

My approach

Given a set of known reflexive relations, my approach to detect (and prune) these equivalent rules is two-fold:

Add a new bias to the Dangling and Closing Operators. This new bias imposes that if r is reflexive, its subject variable must be smaller than its object variable.

E.g., we might add a reflexive_relation(a, b), but not reflexive_relation(b, a).
Modify Rule.equals in order to consider all posible permutations of all reflexive relations within a rule. I.e., for rules R1 and R2, instead of only checking QueryEquivalence(R1, R2), check QueryEquivalence between R1 and each of get_all_reflexive_permutations(R2).
Rule.equals example

For example, let's do R1=Rule 1. and R2=Rule 4. We have that get_all_reflexive_permutations(R2) yields 4 permutations:
- functional_relation(b, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)
- functional_relation(b, X), reflexive_relation1(a, b) => reflexive_relation2(b, a)
- functional_relation(b, X), reflexive_relation1(b, a) => reflexive_relation2(a, b)
- functional_relation(b, X), reflexive_relation1(b, a) => reflexive_relation2(b, a)
Each of them is compared with QueryEquivalence with R1. If any of them is equivalent to R1, then R2 is equivalent to R1.

Note that the new bias reduces the number of generated rules, while the new Rule.equals does not allow reflexive-equivalent rules in AMIEQueue.

Expectative

My expectative was that all these equivalent rules yielded the same metrics. This is true for support and body size, but not for PCA body size.

Indeed, I noticed that the two following mined rules presented different PCA confidence:

functional_relation(a, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)
functional_relation(b, X), reflexive_relation1(a, b) => reflexive_relation2(a, b)

More specifically, the second rule had a smaller PCA body size, and therefore a greater PCA confidence.

Reality

When calculating the PCA body size, Rule 1. yields the PCA body size query:

functional_relation(a, X), reflexive_relation1(a, b), reflexive_relation2(a, x9)

Rule 2. yields the PCA body size query:

functional_relation(b, X), reflexive_relation1(a, b), reflexive_relation2(a, x9)

At this point, the two queries are not equaled anymore.

For example, let's instantiate the most restrictive atom in both queries:

Instantiate a: functional_relation(CONST, X), reflexive_relation1(CONST, b), reflexive_relation2(CONST, x9)
Instantiate b: functional_relation(CONST, X), reflexive_relation1(a, CONST), reflexive_relation2(a, x9)

i.e., in the first case, CONST must have a reflexive_relation2, while in the second case, it is a. Consequently, this yields different body sizes.

My workaround

As far as I thought, the new bias I introduced can still be used. The issue occurs when eliminating rules based on my new Rule.equals, since two rules might be equal, but yield different PCA Confidence.

My idea then is to maintain the new bias and add a new output check (similar to skyline). It works as follows:

Impose that if we add a new edge r(s, o) and r is reflexive, then KB.map(s) > KB.map(o);
Maintain the current Rule.equals, but also implement a Rule.reflexive_equals;
While adding a rule to the output set, check if there is already an equal rule using Rule.reflexive_equals;
If there is, keep the one with greater PCA Confidence.

Since the problem occurs only if the head relation is reflexive, a further optimization step is:

If the head relation is reflexive, do as said above (the workaround);
Otherwise, do as said before (my initial approach), since this [potentially] reduces the number of enqueued rules.

Is this too much of a hack? Or does it make sense within the reflexive "issue" I'm trying to fix?

The only problem I saw is that modifying the output set requires -oute to be specified.

Any idea is welcomed.

Best,
Antonio.

The text was updated successfully, but these errors were encountered:

lgalarra · 2020-08-06T20:59:39Z

Hi Antonio,

I fully agree with your observation and while this seems to require quite some hacks, I would not mind to have it as a mining assistant. I only have one remark:

Does your analysis assume that the inverse relationships are fully materialized?
If it is the case and we do not know the reflexive pairs, rule mining could spot those regularities trivially and use that information to do post-processing on the resulting rule set. If we do not assume full materialization of the inverse facts, then the reflexive pairs must be provided as input.

Best,
Luis

falcaopetri · 2020-08-07T17:47:30Z

Hi,

I fully agree with your observation and while this seems to require quite some hacks, I would not mind to have it as a mining assistant.

I guess that most part of the hack comes from extending parts which are a little bit hard to currently extend in AMIE.
The new bias can be implemented in two lines of code, but requires copying-and-pasting the whole code for getDanglingAtoms and getClosingAtoms. The filtering using reflexive_equals I've mentioned has to be implemented in AMIE class, outside a MiningAssistant.

Does your analysis assume that the inverse relationships are fully materialized?
If it is the case and we do not know the reflexive pairs, rule mining could spot those regularities trivially and use that information to do post-processing on the resulting rule set. If we do not assume full materialization of the inverse facts, then the reflexive pairs must be provided as input.

I'm not sure if I understand what you meant. But yes, at least in my case/currently, I provide AMIE with both r(x, y) and r(y, x) facts.

Post-processing is indeed interesting, and would be done using the Rule.reflexive_equals I proposed. The new bias I've mentioned is trying to reduce (prune) the number of rules which will be filtered out in this post-processing.

For the new bias, I'd love to be able to create a "ReflexiveInvariantRule", similar to InjectiveRule (see #33 (comment)). As this is not currently supported, I will indeed implement a custom MiningAssistant.

In order to enable the special treatment to reflexive-relations, I see a few ways:

CLI argument specifying the reflexive-relations and...
1. AMIE asserts that all r(x,y) and r(y, x) have been provided; or
2. AMIE "infers" all r(x,y) or r(y, x) not provided.
AMIE parses owl:ReflexiveProperty using amie.data.Schema and...
1. AMIE asserts that all r(x,y) and r(y, x) have been provided; or
2. AMIE "infers" all r(x,y) or r(y, x) not provided.
Auto-detecting reflexive (seems like a really bad idea).

I think that infering the inverse triple would also require checking for rdfs:subPropertyOf.

Option 1.i seems to follow:

The methodology is more: you provide all the (saturated) facts, and we compute the rules. (@lajus , #7 (comment))

On the other hand, amie.data.Schema already supports rdfs:subPropertyOf.

If you think that one of these (or another) approaches would be more close to get into upstream AMIE, I would gladly opt to implement it that way.

Regards,
Antonio

lgalarra · 2020-09-12T12:23:19Z

Hi Antonio,

I vote for implementing option 1.i as it is the simplest (the user can always run AMIE with 2 atoms to identify those patterns and then use that info to make the mining of longer rules faster), although from a user perspective 1.b is more useful. This functionality would have been useful for these cases (a), (b)
Some benchmark datasets used to evaluate link prediction approaches were inappropriate due to redundancy caused by reflexive relations.

falcaopetri changed the title ~~Ignoring equivalent rules when mining reflexive relations~~ Ignoring equivalent rules when mining symmetric relations Sep 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ignoring equivalent rules when mining symmetric relations #40

Ignoring equivalent rules when mining symmetric relations #40

falcaopetri commented Aug 5, 2020 •

edited

Loading

lgalarra commented Aug 6, 2020

falcaopetri commented Aug 7, 2020 •

edited

Loading

lgalarra commented Sep 12, 2020

Ignoring equivalent rules when mining symmetric relations #40

Ignoring equivalent rules when mining symmetric relations #40

Comments

falcaopetri commented Aug 5, 2020 • edited Loading

I've been wrongly (and consistently) using the term reflexive, but I am actually referring to the term symmetric. I updated the issue's title for clarity, but the whole discussion below regarding reflexive relations is actually about symmetric relations.

Contextualization

Equivalent rules

My approach

Expectative

Reality

My workaround

lgalarra commented Aug 6, 2020

falcaopetri commented Aug 7, 2020 • edited Loading

lgalarra commented Sep 12, 2020

falcaopetri commented Aug 5, 2020 •

edited

Loading

I've been wrongly (and consistently) using the term `reflexive`, but I am actually referring to the term `symmetric`. I updated the issue's title for clarity, but the whole discussion below regarding `reflexive` relations is actually about `symmetric` relations.

falcaopetri commented Aug 7, 2020 •

edited

Loading