Apriori: implement prune step of apriori-gen
The apriori-gen function described in section 2.1.1 of the Apriori paper
has two steps; the first (join) step was implemented in the previous commit.

The second step of apriori-gen is the prune step: it takes each candidate c
from the first step and checks that every (k-1)-tuple built by removing a
single element from c is in L(k-1).
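
As a minimal sketch of the prune condition (not the code in this commit,
which uses a prefix tree rather than a set), the check could look like the
following. The names `has_all_frequent_subsets`, `frequent_prev` and `l2` are
hypothetical and chosen only for illustration.

```python
from itertools import combinations


def has_all_frequent_subsets(candidate, frequent_prev):
    """Return True if every (k-1)-subset of `candidate` is in `frequent_prev`.

    `candidate` is a sorted k-tuple; `frequent_prev` is a set of sorted
    (k-1)-tuples standing in for L(k-1).
    """
    k = len(candidate)
    # combinations() preserves input order, so each subset stays sorted.
    return all(sub in frequent_prev for sub in combinations(candidate, k - 1))


# Hypothetical L(2): frequent 2-itemsets from the previous pass.
l2 = {(1, 2), (1, 3), (2, 3), (2, 4)}
print(has_all_frequent_subsets((1, 2, 3), l2))  # True
print(has_all_frequent_subsets((2, 3, 4), l2))  # False: (3, 4) is missing
```

The second candidate is rejected because one of its 2-subsets, (3, 4), is
not frequent, so the 3-itemset cannot be frequent either.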

As NumPy arrays are not hashable, we cannot use set() for itemset lookup,
so we define a very simple prefix tree class instead.
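
To illustrate why a prefix tree sidesteps the hashing problem, here is a
small sketch built on plain nested dicts. `build_trie` and `trie_contains`
are hypothetical helper names, and Python lists stand in for NumPy rows
(both are unhashable, while their individual items are hashable).

```python
def build_trie(itemsets):
    """Build a nested-dict prefix tree from equal-length itemsets."""
    root = {}
    for itemset in itemsets:
        node = root
        for item in itemset:
            # Only individual items are used as dict keys, never the whole
            # itemset, so the itemset itself does not need to be hashable.
            node = node.setdefault(item, {})
    return root


def trie_contains(root, itemset):
    """Return True if `itemset` follows an existing path from the root."""
    node = root
    for item in itemset:
        if item not in node:
            return False
        node = node[item]
    return True


rows = [[1, 2], [1, 3], [2, 3]]  # lists are unhashable, like NumPy arrays
try:
    set(rows)
except TypeError as exc:
    print("set() fails:", exc)

trie = build_trie(rows)
print(trie)                         # {1: {2: {}, 3: {}}, 2: {3: {}}}
print(trie_contains(trie, [1, 3]))  # True
print(trie_contains(trie, [3, 4]))  # False
```

Because every stored itemset has the same length, walking a query of that
length to the end implies an exact match; a shorter query would only match
as a prefix.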
dbarbier committed Jan 3, 2020
1 parent e34ff8c commit f8131a7
Showing 1 changed file with 52 additions and 4 deletions.
56 changes: 52 additions & 4 deletions mlxtend/frequent_patterns/apriori.py
@@ -9,6 +9,44 @@
 from ..frequent_patterns import fpcommon as fpc


+class _FixedLengthTrie:
+
+    """Fixed-length trie (prefix tree).
+
+    Parameters
+    ----------
+    combinations: list of itemsets
+        All combinations with enough support in the last step
+
+    Attributes
+    ----------
+    root : dict
+        Root node
+    """
+    __slots__ = ("root")
+
+    def __init__(self, combinations):
+        self.root = dict()
+        for combination in combinations:
+            current = self.root
+            for item in combination:
+                try:
+                    current = current[item]
+                except KeyError:
+                    next_node = dict()
+                    current[item] = next_node
+                    current = next_node
+
+    def __contains__(self, combination):
+        current = self.root
+        try:
+            for item in combination:
+                current = current[item]
+            return True
+        except KeyError:
+            return False
+
 def generate_new_combinations(old_combinations):
     """
     Generator of all combinations based on the last state of Apriori algorithm
@@ -32,8 +70,7 @@ def generate_new_combinations(old_combinations):
     -----------
     Generator of combinations based on the last state of Apriori algorithm.
     In order to reduce number of candidates, this function implements the
-    join step of apriori-gen described in section 2.1.1 of Apriori paper.
-    Prune step is not yet implemented.
+    apriori-gen function described in section 2.1.1 of Apriori paper.

     Examples
     -----------
@@ -43,15 +80,26 @@ def generate_new_combinations(old_combinations):
     """

     length = len(old_combinations)
+    trie = _FixedLengthTrie(old_combinations)
     for i, old_combination in enumerate(old_combinations):
         head_i = list(old_combination[:-1])
         j = i + 1
         while j < length:
             *head_j, tail_j = old_combinations[j]
             if head_i != head_j:
                 break
-            yield from old_combination
-            yield tail_j
+            # Prune old_combination+(item,) if any subset is not frequent
+            candidate = tuple(old_combination) + (tail_j,)
+            # No need to check the last two values, because test_candidate
+            # is then old_combinations[i] and old_combinations[j]
+            for idx in range(len(candidate) - 2):
+                test_candidate = list(candidate)
+                del test_candidate[idx]
+                if test_candidate not in trie:
+                    # early exit from for-loop skips else clause just below
+                    break
+            else:
+                yield from candidate
             j = j + 1
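
Putting the two steps together, the following standalone sketch mirrors the
loop in this diff under simplifying assumptions: it takes a lexicographically
sorted list of tuples, uses a set of tuples in place of `_FixedLengthTrie`,
and yields whole tuples instead of a flat stream of items. The name
`apriori_gen_sketch` is hypothetical.

```python
def apriori_gen_sketch(old_combinations):
    """Yield k-itemset candidates from sorted (k-1)-itemsets (join + prune)."""
    frequent = set(old_combinations)  # stand-in for the trie lookup
    length = len(old_combinations)
    for i, old_combination in enumerate(old_combinations):
        head_i = old_combination[:-1]
        for j in range(i + 1, length):
            *head_j, tail_j = old_combinations[j]
            # Join step: only itemsets sharing the first k-2 items combine.
            # The input is sorted, so the first mismatch ends the run.
            if head_i != tuple(head_j):
                break
            candidate = old_combination + (tail_j,)
            # Prune step: dropping either of the last two items gives back
            # old_combinations[j] or old_combinations[i], so skip those.
            for idx in range(len(candidate) - 2):
                subset = candidate[:idx] + candidate[idx + 1:]
                if subset not in frequent:
                    break  # an infrequent subset: discard the candidate
            else:
                yield candidate  # loop finished without break


l2 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]
print(list(apriori_gen_sketch(l2)))  # [(1, 2, 3), (1, 2, 4)]
```

In this example the join step alone would also produce (1, 3, 4) and
(2, 3, 4); both are pruned because (3, 4) is not among the input itemsets.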

