Skip to content

Latest commit

 

History

History
333 lines (271 loc) · 11.2 KB

Implications.org

File metadata and controls

333 lines (271 loc) · 11.2 KB

Working with Implications

Implications are a fundamental means to express dependencies between attributes within formal contexts. Because of this importance, conexp-clj provides a wide range of functionality for implications. This page discusses basic usage of implications, the computation of some common bases and association rules. Exploration (which also utilizes implications) is discuss in [Exploration](Exploration).

Basic Operations on Implications

Implications are defined by means of the function make-implication, which gets two sets as arguments (the premise and the conclusion of the implication).

(make-implication #{1} #{2})
(#{1} ⟶ #{2})

The arguments can be anything seqable, for example vectors or lists; the arguments will be converted to sets automatically.

(make-implication [1] '(2))
(#{1} ⟶ #{2})

There is also a macro impl, which provides some kind of easier generation of implications.

(impl 1 2 3 ==> 4 5 6)
(#{1 3 2} ⟶ #{4 6 5})

Everything before the token ==> will be an element of the premise, and everything after it will be in the conclusion. Note that both can be empty.

[(impl ==> 4 5 6),
 (impl 1 2 3 ==> ),
 (impl 1 2 3) ; will produce a warning
]
WARNING: «impl» does not contain ==>
[(#{} ⟶ #{4 6 5}) (#{1 3 2} ⟶ #{}) (#{1 3 2} ⟶ #{})]

Having an implication, its premise and conclusion can be accessed by the same-named functions, premise and conclusion.

[(premise (make-implication #{1} #{2})),
 (conclusion (make-implication #{1} #{2}))]
[#{1} #{2}]

To test something for being an implication or not, one can make use of implication?

[(implication? (impl ==>)),
 (implication? 1)]
[true false]

Having an implication, one can test whether a set respects this implication, and whether this implication holds withing a formal context.

(def implication (impl 1 ==> 2))
[(respects? #{1 2} implication),
 (holds? implication (diag-context 5))]
[true false]

Implications are closely related to closure operators, and indeed every set of implications induces one (and every closure operator on sets can be described in this way). conexp-clj provides the function clop-by-implications to obtain this closure operator. Underlying this function is close-under-implications, which computes the inclusion-minimal set of a given set which respects all given implications.

(close-under-implications [(impl 1 ==> 2) (impl 2 ==> 3)] #{1})
#{1 3 2}

Based upon such closure operators, basic logical notions such as entailment for implications can be decided. The function follows? (or follows-semantically?, for historical reasons) provides this functionality.

(follows? (impl 1 ==> 3) [(impl 1 ==> 2) (impl 2 ==> 3)])
true

Moreover, one can decide whether sets of implications are equivalent (equivalent-implications?), i.e. have the same induced closure operator and whether a set of implications is minimal (minimal-implication-set?), i.e. whether no implications can be removed from the set without changing the induced closure operator. If a set is not minimal, a minimal subset can be generated using irredundant-subset.

A single implication can be unitary (unitary?), i.e. have a premise length of one. The set of all unitary implications can be returned by unitary-subset and the remaining implications can be found by non-unitary-subset. By use of unitary implications, a set of implications can be tested for ideal closure (ideal-closed?), i.e. for every implications A ⟶ B the closure of A under all non-unitary implications is closed under all unitary implications.

If a formal context is given, one can check whether a set of implications is sound (sound-implication-set?), i.e. whether all implications in this set hold in the given formal context, and whether the set of implications is complete (complete-implication-set?), i.e. every implication which holds in the formal context already follows from the given set of implications. However, note that checking for completeness is a rather expensive operation (at least, if the implications are sound).

Another interesting function is intersect-implicational-theories, which, given some sets of implications on a common base set, computes a set of implications whose induced closure operator corresponds to the intersection of the induced closure operators of the given sets of implications.

Computing Bases

Given a formal context ctx, one of the most basic operations in formal concept analysis is to compute a base of ctx, i.e. a sound and complete set of implications of ctx. Such a base is given by all implications A ⟶ A” for all subsets A of the attribute set of ctx. In conexp-clj, this base can be computed by

(let [ctx (diag-context 3)]
  (set-of (make-implication A (adprime ctx A)) | A (subsets (attributes ctx))))
#{(#{0} ⟶ #{}) (#{0 1} ⟶ #{2}) (#{1 2} ⟶ #{0}) (#{0 2} ⟶ #{1})
  (#{} ⟶ #{}) (#{0 1 2} ⟶ #{}) (#{2} ⟶ #{}) (#{1} ⟶ #{})}

but, of course, this base is rather large (always exponential in (count (attributes ctx))).

A better solution for this might (might) be the computation of the canonical base, a special base whose size is minimal among all sizes of bases of the formal context. This can be done using canonical-base (stem-base, as it was called In The Old Days):

(canonical-base (diag-context 3))
((#{1 2} ⟶ #{0}) (#{0 2} ⟶ #{1}) (#{0 1} ⟶ #{2}))
(canonical-base (adiag-context 3))
()

The result returned is a lazy sequence, and as such its elements are not computed if they are not needed. This can be handy when computing the canonical base of rather large contexts, as the computation can (and mostly is) very expensive. Indeed, even the canonical base can be exponential in the size of the given formal context (and thus exponential in the size of its attribute set). Instead of lazily calculating the canonical base an attribute incremental function is implementen with incremental-canonical-base (incremental-stem-base) for the initial context and add-attribute to add new attributes.

The implementation of canonical-base makes use of a more general algorithm, namely the computation of the canonical base of a given closure operator on sets. This functionality is exposed by canonical-base-from-clop.

If one has already computed a base of the formal context at hand, but instead wants to have the canonical base of the formal context instead, one can make use of the function canonical-base-from-base (stem-base-from-base) which computes this canonical base from the original one in time quadratic in its size.

Instead of computing the canonical base, one can also make use of the base of proper premises, which may be larger than the canonical base but easier to compute. This base can be computed using proper-premise-implications, and functions related to this are proper-conclusion, proper-premise?, proper-premises-for-attributes and proper-premises.

Various functions take the resulting bases as a argument for further computations. largest-extension-by-implications e.g. takes a closure system and grows it by use of implications.

All bases can be transformed into a ganter-base which is designed to increase the readability for humans without changing any of the contained knowledge.

Association Rules

A very prominent use of implications is in the form of association rules as employed by basic data-mining techniques. Within this scope, implications are tagged with their support and confidence within the given data set (here: a formal context), which are then used as measures of interestingness for an implication. The support of an implication is just the relative amount of objects whose extent includes both the premise and the conclusion of the implication. In other words, the support of an implication is just the amount of objects where this implication takes effect. The confidence of an implication is the “relative probability” that an object that satisfies the premise also satisfies the conclusion. Support and confidence can be computed by (guess what!) support and confidence.

The task is then, given a formal context and some lower bounds for support and confidence, to compute a base of all implications whose support and confidence is above these bounds. This can be achieved by Luxenburger’s base, implemented as luxenburger-base (luxenburger-basis). A function related to this is frequent-closed-itemsets.

(def rctx (make-context-from-matrix 10 10
                                    [0 0 0 1 0 1 0 0 1 0
                                     1 0 1 1 1 1 0 1 1 0
                                     1 1 1 0 1 1 1 0 1 0
                                     1 0 0 0 1 0 1 1 1 1
                                     1 0 1 0 1 0 0 0 1 0
                                     1 1 1 1 0 0 0 0 0 1
                                     0 0 0 0 0 0 0 0 0 0
                                     0 0 0 1 1 1 1 1 0 1
                                     1 0 0 1 1 1 1 0 0 0
                                     1 0 0 1 0 0 1 0 0 0]))
rctx
  |0 1 2 3 4 5 6 7 8 9 
--+--------------------
0 |. . . x . x . . x . 
1 |x . x x x x . x x . 
2 |x x x . x x x . x . 
3 |x . . . x . x x x x 
4 |x . x . x . . . x . 
5 |x x x x . . . . . x 
6 |. . . . . . . . . . 
7 |. . . x x x x x . x 
8 |x . . x x x x . . . 
9 |x . . x . . x . . . 

(luxenburger-base rctx 0.2 0.8)
((#{5} ⟶ #{3})
 (#{5} ⟶ #{4})
 (#{6} ⟶ #{4})
 (#{6} ⟶ #{0})
 (#{4} ⟶ #{0})
 (#{8} ⟶ #{0 4})
 (#{0 4} ⟶ #{8}))

Note that Luxenburger’s base does not include valid implications. If you want to have these as well, you can additionally compute the canonical base with the desired support threshold:

(canonical-base rctx #{} #(>= (support % rctx) 0.2))
((#{2} ⟶ #{0})
 (#{3 8} ⟶ #{5})
 (#{6 8} ⟶ #{0 4})
 (#{6 5} ⟶ #{4})
 (#{6 9} ⟶ #{7 4})
 (#{4 8} ⟶ #{0})
 (#{4 9} ⟶ #{7 6})
 (#{4 3} ⟶ #{5})
 (#{1} ⟶ #{0 2})
 (#{7} ⟶ #{4})
 (#{7 4 5} ⟶ #{3})
 (#{7 4 6} ⟶ #{9})
 (#{0 8} ⟶ #{4})
 (#{0 5} ⟶ #{4})
 (#{0 4 5 8} ⟶ #{2})
 (#{0 4 2} ⟶ #{8})
 (#{0 7 4} ⟶ #{8}))