[ENH] Rule induction (CN2) #1397
random = (np.random if random_seed is None
          else np.random.RandomState(random_seed))

def f(x): return random.choice((x == np.nanmax(x)).nonzero()[0])
Please add some vertical space here.
You can import bottlechest as bn and call bn.nanmax() instead, which should be about 50% faster.
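For readers following the snippet under review, here is a minimal NumPy-only sketch of the same idea: pick the index of the largest non-NaN value and break ties at random. The function name `random_argmax` and its signature are illustrative, not taken from the PR, and plain `np.nanmax` is used rather than the bottlechest variant suggested above.

```python
import numpy as np


def random_argmax(x, random_seed=None):
    """Return the index of a maximal non-NaN entry, breaking ties at random.

    Illustrative helper; the name and signature are assumptions.
    """
    # Use the global NumPy generator unless a seed is given.
    random = (np.random if random_seed is None
              else np.random.RandomState(random_seed))
    x = np.asarray(x, dtype=float)
    # Indices of all entries equal to the maximum (NaNs are ignored).
    candidates = (x == np.nanmax(x)).nonzero()[0]
    return random.choice(candidates)
```

With a fixed `random_seed` the choice among tied maxima is reproducible, which is presumably why the reviewed code accepts a seed.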
e6156d0 to 722b3a6
x[x == 0] = 1e-5
y[y == 0] = 1e-5
lrs = np.sum(x * np.log(x / y))
lrs = 2 * (lrs - np.sum(x) * np.log(np.sum(x) / np.sum(y)))
If x and y are arrays, which they are, you can call their .sum() methods directly, which is marginally faster and shorter to write.
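A sketch of what the suggestion could look like when applied to the lines under review. The wrapper function `likelihood_ratio_statistic` is hypothetical (the PR shows only the four statements), and copying the inputs to float arrays is an added assumption so that the zero replacement does not modify the caller's data.

```python
import numpy as np


def likelihood_ratio_statistic(x, y):
    """Likelihood ratio statistic of observed counts x against counts y.

    Hypothetical wrapper around the reviewed lines, using array methods.
    """
    # Copy to float arrays so the replacement below stays local (assumption).
    x = np.array(x, dtype=float)
    y = np.array(y, dtype=float)
    # Replace zeros by a small constant to avoid log(0) and division by zero.
    x[x == 0] = 1e-5
    y[y == 0] = 1e-5
    lrs = (x * np.log(x / y)).sum()
    return 2 * (lrs - x.sum() * np.log(x.sum() / y.sum()))
```

When `x` is proportional to `y` the statistic is zero up to rounding, which makes a convenient sanity check.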
return X[examples_to_keep], Y[examples_to_keep]


class RuleClassifier(Model):
class RuleClassifier(Model, metaclass=ABCMeta)
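To illustrate what the suggested metaclass buys, here is a small self-contained sketch. `Model` is a stand-in for the real base class, and the `predict` method and the `MajorityRuleClassifier` subclass are hypothetical; the PR's actual abstract methods are not shown in this conversation.

```python
from abc import ABCMeta, abstractmethod


class Model:
    """Stand-in for the real Model base class (assumption)."""


class RuleClassifier(Model, metaclass=ABCMeta):
    """Abstract base: cannot be instantiated until predict is implemented."""

    @abstractmethod
    def predict(self, X):
        """Return one prediction per row of X."""


class MajorityRuleClassifier(RuleClassifier):
    """Hypothetical concrete subclass that always predicts class 0."""

    def predict(self, X):
        return [0 for _ in X]
```

With `metaclass=ABCMeta`, calling `RuleClassifier()` directly raises `TypeError`, so a subclass that forgets to implement `predict` fails at instantiation rather than at prediction time.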
309b3f7 to 11e23e8
Implemented compact view, offering a more suitable one-line display of rules, regardless of their size. Only one column is manipulated to achieve the result (section size, resize mode, wrap). Other slight changes are included (more expressive relation signs, work on copy mode).
Enables users to save selected rules to clipboard, for later use.
Labels are now more expressive if compact view is ON.
The selection indexes are not really sorted. According to the sorting procedure, new selection indexes (QModelIndex) are determined/calculated and selected manually.
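The index recalculation described above can be sketched independently of Qt. This is not the widget's code: it only shows, with NumPy, how the new row positions of selected rows could be derived from a sort order. The function name `remap_selection` and its arguments are assumptions; in the widget the resulting rows would presumably be turned back into QModelIndex objects and selected.

```python
import numpy as np


def remap_selection(sort_keys, selected_rows):
    """Map selected row numbers to their positions after sorting by sort_keys.

    Toolkit-independent illustration; names are hypothetical.
    """
    # order[new_position] = old_row
    order = np.argsort(sort_keys, kind="stable")
    # Invert the permutation: new_position[old_row] = new_position
    new_position = np.empty_like(order)
    new_position[order] = np.arange(len(order))
    return sorted(int(new_position[row]) for row in selected_rows)
```

For example, with sort keys `[3, 1, 2]` the row that was first ends up last, so a selection of row 0 becomes a selection of row 2.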
To enable some specific widget behavior, the core of the widget was re-built. Selection now sticks not only when sorting, but also if 'compact view' is toggled. Copying rules to clipboard was cleaned up. Also implemented output signals. In general, the code is now more expressive and easier to follow.
A new method to compare rules was implemented. Comparing rules this way (by their covered_examples truth vectors) is not an obvious choice, but the implementation is fast, correct, and elegant. It works for all included learners, regardless of the chosen covering algorithm and rule ordering.
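A minimal sketch of comparing rules by their coverage, assuming each rule keeps a boolean vector with one entry per training example. The `Rule` class here is a hypothetical stand-in, and both the copy taken in the constructor and the `rules_equal` helper are assumptions rather than the PR's implementation.

```python
import numpy as np


class Rule:
    """Hypothetical stand-in for a rule holding a coverage truth vector."""

    def __init__(self, covered_examples):
        # np.array copies its input, so later changes to the caller's
        # array cannot alter this rule's coverage.
        self.covered_examples = np.array(covered_examples, dtype=bool)


def rules_equal(rule_a, rule_b):
    """Treat two rules as equal when they cover exactly the same examples."""
    return np.array_equal(rule_a.covered_examples, rule_b.covered_examples)
```

Under this definition, two rules with different conditions compare equal whenever they cover the same examples.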
…evel search procedure and the top-level control procedure.
The selection persists through ordering, compact view toggling, and restoring the original order. The solution is not yet optimal, however. Looking into QSortFilterProxyModel next.
PyTableModel will be updated at a later time to provide the same functionality.
Also included are some missing docstrings.
Small fix involving rule comparison. In rare occurrences, not having copied covered examples (numpy array) resulted in incorrect cmp result.
…allback function. Fixes a bug previously triggered when using Test & Score on Windows machines. Other instances using the generated learner will no longer affect the widget's progress bar.
7f2d9c8 to c678871
Having the data input is a good idea, but... For classification trees, I can have the schema File -> Classification tree -> Classification tree viewer -> Table. When I select a node in the viewer, I see the corresponding subset of the training data. If I try to do the same with CN2 rules, I have to connect File to the Viewer. Have the data stored with the rules (either always, or add it just in the widget), so the viewer can get it from there.
Could this PR be merged in its present state, with additional features added in subsequent PRs? Are there any obvious parts missing?
That was my plan, too: merge and perhaps add features later. If you'd do it, yes, please give it a quick check and merge.
The existing code is fully functional and ready to be merged as is. Some features are being worked on (including additional algorithms) and should be ready by the end of next week. Those can, however, be handled in subsequent PRs.
Merging this increased the "python setup.py test" run time on my computer from 17 to 33 seconds. @matevzkren, could you look into this?
A more general framework of replaceable individual components that can be fine-tuned to specific needs. To induce rules from examples, a divide-and-conquer strategy is applied. The CN2 and CN2Unordered algorithms have been implemented so far. Next up -> widgets!
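The overall control loop can be sketched as follows: find a rule, record it, remove the examples it covers, and repeat on what remains. This is a deliberately simplified illustration, not the PR's code. The rule finder below considers only single `attribute == value` conditions scored by purity and coverage, whereas CN2 proper uses a beam search with significance testing, and the names `find_best_rule` and `induce_rules` are hypothetical.

```python
import numpy as np


def find_best_rule(X, y):
    """Hypothetical rule finder: best single condition by (purity, coverage)."""
    best = None
    for attr in range(X.shape[1]):
        for value in np.unique(X[:, attr]):
            covered = X[:, attr] == value
            classes, counts = np.unique(y[covered], return_counts=True)
            majority = counts.argmax()
            quality = (counts[majority] / covered.sum(), covered.sum())
            if best is None or quality > best[0]:
                best = (quality, attr, value, classes[majority])
    return best[1:]


def induce_rules(X, y):
    """Top-level loop: learn a rule, drop covered examples, repeat."""
    X, y = np.asarray(X), np.asarray(y)
    rules = []
    while len(y) > 0:
        attr, value, target = find_best_rule(X, y)
        rules.append((attr, value, target))
        # Keep only the examples the new rule does not cover.
        examples_to_keep = X[:, attr] != value
        X, y = X[examples_to_keep], y[examples_to_keep]
    return rules
```

Because the rule finder is a separate function, it can be swapped for a different search or evaluation strategy without touching the loop, which is the kind of replaceable component the description refers to.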