
Apriori - I'm only interested in one product / consequent. Can I speed up the apriori algorithm? #658

Open
soliverc opened this issue Jan 22, 2020 · 6 comments


@soliverc

For example, if I'm only interested in items frequently bought with eggs, is there a way to avoid generating itemsets for every other possible combination?
This takes a lot of time, and with the generated rule set I end up filtering the huge dataframe to find only rows where consequents == eggs, discarding everything else.
Can I speed this process up if I'm only interested in one consequent?
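For reference, the post-hoc filtering described above can be sketched as follows. This uses a plain-Python stand-in for the mined rule set (with mlxtend you would filter the `association_rules()` DataFrame, whose `consequents` column holds frozensets, in the same way):

```python
# Each mined rule maps an antecedent itemset to a consequent itemset.
# Toy stand-in for rows of a mined rule set.
rules = [
    {"antecedents": frozenset({"bread"}), "consequents": frozenset({"eggs"}), "confidence": 0.8},
    {"antecedents": frozenset({"milk"}), "consequents": frozenset({"butter"}), "confidence": 0.6},
    {"antecedents": frozenset({"bacon", "bread"}), "consequents": frozenset({"eggs"}), "confidence": 0.9},
]

# Keep only rules whose consequent is exactly {"eggs"}; every other
# rule was mined and then discarded, which is the wasted work at issue.
egg_rules = [r for r in rules if r["consequents"] == frozenset({"eggs"})]
```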

@rasbt (Owner) commented Jan 22, 2020

Sorry, there's currently no option for that. A "must_contain=('some item', 'some other item', ...)" feature could be a useful enhancement though.
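A minimal sketch of what such a (hypothetical, not-yet-implemented) `must_contain` filter could do, applied after the itemsets are generated — note this only prunes the output, not the search itself:

```python
def filter_itemsets(frequent_itemsets, must_contain):
    """Keep only itemsets that include every item in must_contain.
    Hypothetical post-processing step; not part of the mlxtend API."""
    required = frozenset(must_contain)
    return [s for s in frequent_itemsets if required <= s]

itemsets = [frozenset({"eggs"}), frozenset({"eggs", "bread"}), frozenset({"milk"})]
kept = filter_itemsets(itemsets, ("eggs",))  # only "eggs"-containing itemsets survive
```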

@harenbergsd (Contributor) commented

There are two things here. If you are only interested in frequent itemsets containing some item, a "must_contain"-type feature would be useful, and fairly straightforward to implement.

If you are interested in filtering on the consequent of the rules, that is more challenging. For example, say you are only interested in rules with consequent == eggs. You cannot mine only the frequent patterns containing "eggs", because the rule metrics rely on the supports of other itemsets — and to get those, you need patterns that may not contain "eggs".
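To make the dependency concrete: for a rule A → {eggs}, confidence = support(A ∪ {eggs}) / support(A), and the denominator is the support of an itemset that does not contain "eggs". A small illustration:

```python
# Support table as it would come out of a frequent-itemset miner.
support = {
    frozenset({"bread"}): 0.40,            # no "eggs" in this itemset...
    frozenset({"bread", "eggs"}): 0.30,
}

# confidence(bread -> eggs) = support({bread, eggs}) / support({bread})
conf = support[frozenset({"bread", "eggs"})] / support[frozenset({"bread"})]
# ...yet its support is needed in the denominator, which is why mining
# cannot be restricted to "eggs"-containing itemsets only.
```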

@rasbt (Owner) commented May 20, 2020

I agree with you, @harenbergsd. In addition, while adding a check for whether frequent itemsets contain a particular item may look useful in practice, I am not sure it would actually speed things up compared to pruning the resulting DataFrame. Since it is really just a check of whether an already-computed itemset gets added to the DataFrame or not, the main thing it would accomplish is reducing the memory footprint of the result.

@harenbergsd (Contributor) commented

I found this blog post which I believe talks about this sort of thing: https://wimleers.com/article/fp-growth-powered-association-rule-mining-with-support-for-constraints

Like you, I do wonder how much of a performance improvement you could get by doing this sort of thing. It's certainly an interesting idea: if you are only interested in a few consequents, you end up doing a lot of extra work and tossing results away. But it's tricky to know exactly what counts as the extra work. It could make a nice research project, depending on the existing literature in this area :)

@kno10 commented May 26, 2020

The performance of MLXtend's "apriori" is pretty poor.
It does not implement the optimizations of the original Apriori algorithm, such as the prefix join to avoid generating unnecessary candidates, or the hash tree; instead it is a fairly naive version that generates far too many candidates. See #644 — the runtime of MLXtend was 2 minutes versus just 150 ms for pyfim, so MLXtend was about 750x slower.
Hence, if performance is a problem for you, consider using the well-known, much faster "fim" module by Christian Borgelt (http://www.borgelt.net/pyfim.html) — or improve MLXtend's Apriori by first adding the missing optimizations that are part of the original algorithm.
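For reference, the prefix join mentioned above (from the original Apriori paper) generates k-candidates only from pairs of frequent (k-1)-itemsets that share their first k-2 items, followed by subset pruning. A minimal stdlib-only sketch of that candidate-generation step:

```python
def apriori_gen(frequent_km1):
    """Classic Apriori candidate generation: prefix join + subset prune.
    frequent_km1: collection of frozensets, all of the same size k-1."""
    frequent = set(frequent_km1)
    # Work on sorted tuples so "same (k-2)-prefix" is well defined.
    sorted_sets = sorted(tuple(sorted(s)) for s in frequent)
    candidates = set()
    for i, a in enumerate(sorted_sets):
        for b in sorted_sets[i + 1:]:
            if a[:-1] != b[:-1]:   # prefixes diverge; later b's diverge too (sorted)
                break
            cand = frozenset(a) | frozenset(b)
            # Prune: every (k-1)-subset of the candidate must itself be frequent.
            if all(cand - {item} in frequent for item in cand):
                candidates.add(cand)
    return candidates

cands = apriori_gen({frozenset({"a", "b"}), frozenset({"a", "c"}),
                     frozenset({"b", "c"}), frozenset({"b", "d"})})
# {a,b,c} survives; {b,c,d} is pruned because {c,d} is not frequent
```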

@rasbt (Owner) commented May 26, 2020

Yes, #646 should address that.
