Apriori - I'm only interested in one product / consequent. Can I speed up the apriori algorithm? #658
Comments
Sorry, there's currently no option for that. A "
There are two things here. If you are only interested in frequent itemsets containing some item, a "must_contain" type feature would be useful, and fairly straightforward to implement. If you are interested in filtering on the consequent in rules, that is more challenging. For example, say you are only interested in rules with consequent==eggs. You cannot mine only the frequent patterns containing "eggs", because the rule metrics rely on the supports of other itemsets, such as the antecedents. To get those, you need patterns which may not contain "eggs".
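To make the dependency concrete, here is a minimal sketch in plain Python (not mlxtend code) that computes the confidence of the rule {bread} -> {eggs}. The toy transactions and the `support` helper are assumptions made up for illustration. The point is only that the denominator is the support of {bread}, an itemset that does not contain "eggs".

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"eggs", "bread", "milk"},
    {"eggs", "bread"},
    {"bread", "milk"},
    {"eggs", "milk"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


# Rule {bread} -> {eggs}
sup_rule = support({"bread", "eggs"}, transactions)   # itemset contains "eggs"
sup_antecedent = support({"bread"}, transactions)     # itemset does NOT contain "eggs"

# confidence(A -> C) = support(A u C) / support(A)
confidence = sup_rule / sup_antecedent

print(f"support(bread, eggs) = {sup_rule}")
print(f"support(bread)       = {sup_antecedent}")
print(f"confidence           = {confidence}")
```

If only itemsets containing "eggs" had been mined, `sup_antecedent` would be missing and the confidence could not be computed without another pass over the data.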
I agree with you @harenbergsd. In addition, while adding a check for whether frequent itemsets contain a particular item may look useful in practice, I am not sure if that would necessarily speed things up (compared to pruning the resulting data frame). I think the major thing that it would accomplish is reducing the memory footprint of the resulting dataframe, because it really is just a check of whether an already computed itemset is to be added to the dataframe or not.
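As a rough illustration of that point, here is a naive level-wise search in plain Python. It is a sketch under stated assumptions, not mlxtend's implementation: the function name `frequent_itemsets`, the `must_contain` parameter, and the toy data are all hypothetical. The filter is applied only when an itemset is stored, so the number of candidates whose support gets counted is the same with or without it.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, must_contain=None):
    """Naive Apriori-style search.

    `must_contain` (hypothetical) only filters which itemsets are kept in
    the result; it does not prune the search itself.
    Returns (itemset -> support, number of candidates counted).
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    counted = 0

    while current:
        # Count the support of every candidate at this level.
        level = {}
        for cand in current:
            counted += 1
            sup = sum(cand <= t for t in transactions) / n
            if sup >= min_support:
                level[cand] = sup

        # Output filter: keep only itemsets that contain the required item.
        for itemset, sup in level.items():
            if must_contain is None or must_contain in itemset:
                result[itemset] = sup

        # Build (k+1)-candidates from ALL frequent k-itemsets, then drop any
        # candidate that has an infrequent k-subset.
        keys = list(level)
        k = len(keys[0]) + 1 if keys else 0
        joined = {a | b for a, b in combinations(keys, 2) if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]

    return result, counted


# Toy transactions, invented for illustration only.
transactions = [
    {"eggs", "bread", "milk"},
    {"eggs", "bread"},
    {"bread", "milk"},
    {"eggs", "milk"},
    {"bread"},
]

all_sets, n_all = frequent_itemsets(transactions, min_support=0.4)
egg_sets, n_eggs = frequent_itemsets(transactions, min_support=0.4, must_contain="eggs")

print(len(all_sets), n_all)    # itemsets kept, candidates counted
print(len(egg_sets), n_eggs)   # fewer itemsets kept, same candidates counted
```

In this sketch the filtered call returns fewer itemsets but counts exactly as many candidates, which matches the expectation that such a check mainly shrinks the result rather than the running time.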
I found this blog post which I believe talks about this sort of thing: https://wimleers.com/article/fp-growth-powered-association-rule-mining-with-support-for-constraints Like you, I do wonder how much perf improvement you could get by doing this sort of thing. It's certainly an interesting idea. You have to imagine that if you are only interested in a few consequents, you end up doing a lot of extra work and tossing stuff away. But it's tricky to know which part of the work is extra. Nice research project, depending on existing literature in this area :)
The performance of MLXtend "apriori" is pretty poor.
Yes, #646 should address that.
For example, if I'm only interested in items frequently bought with `eggs`, is there a way to avoid generating itemsets for every other possible combination? This takes a lot of time, and with the generated rule set I end up filtering the huge dataframe to find only rows where `consequents` == `eggs`, discarding everything else. Can I speed this process up if I'm only interested in one consequent?