[ENH] Rule induction (CN2) #1397

Merged
merged 38 commits into from
Sep 16, 2016

@matevzkren (Contributor) commented Jul 1, 2016

A more general framework built from replaceable individual components that can be fine-tuned to specific needs. Rules are induced from examples with a separate-and-conquer (covering) strategy. The CN2 and CN2Unordered algorithms have been implemented so far. Next up: widgets!
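The covering (separate-and-conquer) loop behind CN2-style learners can be sketched as follows. This is a simplified illustration, not the code from this PR: the names `find_best_rule` and `covering` are hypothetical, each rule is a single `feature == value` test, and the rule search is an exhaustive scan scored by the share of the target class, rather than CN2's beam search over conjunctions with its own evaluation heuristic.

```python
import numpy as np


def find_best_rule(X, y, target):
    """Hypothetical one-condition search: return the (feature, value) test
    whose covered examples contain the highest share of `target`.
    Real CN2 runs a beam search over conjunctions of conditions instead."""
    best, best_acc = None, -1.0
    for j in range(X.shape[1]):
        for v in np.unique(X[:, j]):
            covered = X[:, j] == v  # never empty: v occurs in column j
            acc = np.mean(y[covered] == target)
            if acc > best_acc:
                best, best_acc = (j, v), acc
    return best


def covering(X, y, target, max_rules=10):
    """Separate-and-conquer: learn a rule, remove the examples it covers,
    and repeat on the rest until no examples of `target` remain."""
    rules = []
    X, y = X.copy(), y.copy()
    while np.any(y == target) and len(rules) < max_rules:
        j, v = find_best_rule(X, y, target)
        covered = X[:, j] == v
        rules.append((j, v))
        # "Separate": drop every covered example before learning the next rule.
        X, y = X[~covered], y[~covered]
    return rules
```

With `X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])` and `y = np.array([1, 1, 0, 0])`, `covering(X, y, 1)` stops after a single rule (feature 0 equals 0), because that rule already covers both examples of class 1. Each iteration removes at least one example, so the loop always terminates.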

@matevzkren matevzkren added the gsoc label Jul 1, 2016
import numpy as np

random = (np.random if random_seed is None
          else np.random.RandomState(random_seed))

def f(x): return random.choice((x == np.nanmax(x)).nonzero()[0])

Please use some vspace here.

You can `import bottlechest as bn` and use `bn.nanmax()`, which is about 50% faster.
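The tie-breaking expression in the snippet above (pick an index of the maximum, choosing uniformly at random among ties) can be wrapped in a small self-contained helper. The name `random_argmax` and its signature are hypothetical; the body reuses the reviewed expression, with the `random_seed` handling moved into the function so the example runs on its own. Following the review comment, `np.nanmax` could be swapped for bottlechest's `nanmax`, assuming that package is installed.

```python
import numpy as np


def random_argmax(x, random_seed=None):
    """Return the index of a maximal element of `x`, ignoring NaNs and
    breaking ties uniformly at random (reproducibly if a seed is given)."""
    random = (np.random if random_seed is None
              else np.random.RandomState(random_seed))
    # Indices of all entries equal to the maximum; NaN == max is False,
    # so NaN positions are never candidates.
    candidates = (x == np.nanmax(x)).nonzero()[0]
    return random.choice(candidates)
```

For example, `random_argmax(np.array([1.0, 3.0, np.nan, 3.0]), random_seed=0)` returns either 1 or 3, and returns the same one on every call with the same seed.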

import numpy as np

x[x == 0] = 1e-5
y[y == 0] = 1e-5
lrs = np.sum(x * np.log(x / y))
lrs = 2 * (lrs - np.sum(x) * np.log(np.sum(x) / np.sum(y)))
@kernc (Contributor), Jul 1, 2016

If x and y are arrays, which they are, you can call .sum() methods directly, which is marginally faster and shorter to write.
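A sketch of the same likelihood ratio statistic written with the `.sum()` methods suggested here. It expands to $2\sum_i x_i \ln\frac{x_i / \sum x}{y_i / \sum y}$. The function name `likelihood_ratio_statistic` is hypothetical, and working on float copies of the inputs is a choice made in this sketch (not something the review asks for), so that the zero clamping does not modify the caller's arrays.

```python
import numpy as np


def likelihood_ratio_statistic(x, y):
    """LRS of an observed distribution x against an expected one y:
    2 * sum(x_i * log((x_i / sum(x)) / (y_i / sum(y))))."""
    # Float copies, so replacing zeros below does not mutate the inputs.
    x = np.array(x, dtype=float)
    y = np.array(y, dtype=float)
    # Replace zeros to avoid log(0) and division by zero.
    x[x == 0] = 1e-5
    y[y == 0] = 1e-5
    lrs = (x * np.log(x / y)).sum()
    return 2 * (lrs - x.sum() * np.log(x.sum() / y.sum()))
```

When `x` is proportional to `y` (for example `[2, 4]` against `[1, 2]`), the two terms cancel and the statistic is 0.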

@kernc (Contributor) commented Jul 1, 2016

return X[examples_to_keep], Y[examples_to_keep]


class RuleClassifier(Model):

from abc import ABCMeta

class RuleClassifier(Model, metaclass=ABCMeta):
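A minimal sketch of what the suggested `metaclass=ABCMeta` provides: once the base class declares an abstract method, it can no longer be instantiated directly, and only subclasses that implement the method can. The `Model` class below is a hypothetical stand-in, not Orange's, and `predict` and `FirstRuleClassifier` are illustrative names. Mixing in `ABCMeta` this way assumes the real `Model` does not already use an incompatible metaclass.

```python
from abc import ABCMeta, abstractmethod


class Model:
    """Hypothetical stand-in for the real Model base class."""
    def __call__(self, X):
        return self.predict(X)


class RuleClassifier(Model, metaclass=ABCMeta):
    """Abstract base: concrete rule classifiers must implement predict()."""
    @abstractmethod
    def predict(self, X):
        raise NotImplementedError


class FirstRuleClassifier(RuleClassifier):
    """Hypothetical concrete subclass that always predicts class 0."""
    def predict(self, X):
        return [0 for _ in X]
```

Calling `RuleClassifier()` raises `TypeError`, while `FirstRuleClassifier()([1, 2])` returns `[0, 0]`.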

@matevzkren matevzkren force-pushed the gsoc_rules_CN2 branch 3 times, most recently from 309b3f7 to 11e23e8 on July 6, 2016 09:32
Implemented a compact view that displays each rule on a single line, regardless of its size. Only one column is manipulated to achieve this (section size, resize mode, word wrap). Other minor changes: more expressive relation signs and further work on copy mode.
Enables users to copy selected rules to the clipboard for later use.
Labels are now more expressive when compact view is on.
The selection indexes are not actually sorted. Instead, new selection indexes (QModelIndex) are computed from the sorting order and selected manually.
The core of the widget was rebuilt to support some specific behavior: the selection now persists not only when sorting, but also when 'compact view' is toggled. Copying rules to the clipboard was cleaned up, and output signals were implemented. Overall, the code is now more expressive and easier to follow.
Implemented a new method for comparing rules: two rules are compared by their covered_examples truth vectors. Comparing rules this way is not obvious, but the implementation is fast and correct, and it works for all included learners regardless of the chosen covering algorithm and rule ordering.
…evel search procedure and the top-level control procedure.
The selection persists through ordering, compact view toggling, and restoring the original order. However, the solution is not yet optimal; looking into QSortFilterProxyModel next.
PyTableModel will be updated at a later time to provide the same functionality.
Also included are some missing docstrings.
Small fix to rule comparison: in rare cases, not copying the covered-examples numpy array produced an incorrect comparison result.
…allback function.

Fixes a bug that occurred when using Test & Score on Windows machines: other instances using the generated learner no longer affect the widget's progress bar.
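Two of the commit messages above describe comparing rules by their `covered_examples` truth vectors and a fix that involves copying that numpy array. The sketch below illustrates both ideas under stated assumptions: the hypothetical `Rule` class holds nothing but the boolean vector, and its `__eq__`/`__hash__` are one possible design, not the PR's actual code.

```python
import numpy as np


class Rule:
    """Hypothetical minimal rule: only the boolean vector of covered examples."""
    def __init__(self, covered_examples):
        # np.array copies by default, so later in-place changes to the
        # caller's array cannot silently alter this rule's comparisons.
        self.covered_examples = np.array(covered_examples, dtype=bool)

    def __eq__(self, other):
        # Two rules are equal if they cover exactly the same examples.
        return np.array_equal(self.covered_examples, other.covered_examples)

    def __hash__(self):
        # Defining __eq__ disables the default hash; hash the raw bytes instead.
        return hash(self.covered_examples.tobytes())
```

Without the copy, two rules built from the same mask would share one array, and mutating that mask afterwards would change the outcome of their comparison with any other rule.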
@janezd (Contributor) commented Aug 26, 2016

Having the data input is a good idea, but ...

For classification trees, I can build the schema File -> Classification tree -> Classification tree viewer -> Table. When I select a node in the viewer, I see the corresponding subset of the training data. If I try to do the same with CN2 rules, I have to connect File to the Viewer.

Store the data with the rules (either always, or by adding it only in the widget), so the viewer can get it from there.
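One way to read this suggestion, sketched under assumptions: the model object keeps a reference to its training data, so a viewer that receives only the model can output the examples covered by a selected rule. `RuleModel`, `instances`, and `covered_subset` are hypothetical names, and rules are modeled as plain functions that return a boolean mask over the data.

```python
import numpy as np


class RuleModel:
    """Hypothetical model that stores its training data alongside its rules."""
    def __init__(self, instances, rules):
        self.instances = instances  # training data kept with the rules
        self.rules = rules          # each rule: callable, data -> boolean mask

    def covered_subset(self, rule_index):
        """Return the training examples covered by the selected rule."""
        mask = self.rules[rule_index](self.instances)
        return self.instances[mask]
```

A viewer holding such a model could call `covered_subset` on selection, without a separate connection from File.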

@astaric (Member) commented Sep 8, 2016

Could this PR be merged in the present state and additional features added in subsequent PRs? Are there any obvious parts missing?

@janezd (Contributor) commented Sep 8, 2016

That was my plan, too: merge now and perhaps add features later. If you are going to do it, yes, please give it a quick check and merge.

@matevzkren (Contributor, Author)

The existing code is fully functional and ready to be merged as is. Some features are still being worked on (including additional algorithms) and should be ready by the end of next week. Those can, however, be handled in subsequent PRs.

@matevzkren matevzkren changed the title [WIP] [ENH] Rule induction (CN2) [ENH] Rule induction (CN2) Sep 9, 2016
@janezd janezd merged commit 9749ee2 into biolab:master Sep 16, 2016
@markotoplak (Member)

Merging this increased the run time of "python setup.py test" on my computer from 17 to 33 seconds. @matevzkren, could you look into this?

8 participants