[ENH] Rule induction (CN2) #1397

matevzkren · 2016-07-01T12:20:25Z

Implements a general framework of replaceable individual components that can be fine-tuned to specific needs. To induce rules from examples, a separate-and-conquer (covering) strategy is applied. The CN2 and CN2Unordered algorithms have been implemented so far. Next up: widgets!
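The covering strategy mentioned above can be sketched as a small loop. This is a toy illustration under stated assumptions, not the code in Orange/classification/rules.py. Examples are `(features_dict, label)` pairs, and a rule is a list of `(attribute, value)` conditions. The helpers `covers`, `find_best_rule`, `learn_rule_list` and `predict` are hypothetical names. `find_best_rule` replaces CN2's beam search with a one-condition greedy search scored by Laplace accuracy, and the loop removes every covered example, which corresponds to the ordered (decision-list) variant rather than CN2Unordered.

```python
from collections import Counter


def covers(conditions, features):
    """True if the example satisfies every (attribute, value) condition."""
    return all(features.get(attr) == value for attr, value in conditions)


def find_best_rule(examples):
    """Greedy stand-in for CN2's beam search: try every single
    attribute=value condition and keep the one whose covered examples give
    the highest Laplace-corrected accuracy for their majority class."""
    n_classes = len({label for _, label in examples})
    candidates = {(attr, value)
                  for features, _ in examples
                  for attr, value in features.items()}
    best = None
    for cond in sorted(candidates):  # sorted for deterministic tie-breaking
        covered = [label for features, label in examples
                   if covers([cond], features)]
        label, hits = Counter(covered).most_common(1)[0]
        laplace = (hits + 1) / (len(covered) + n_classes)
        if best is None or laplace > best[0]:
            best = (laplace, [cond], label)
    return best[1], best[2]


def learn_rule_list(examples, max_rules=10):
    """Covering loop: learn a rule, remove ("separate") the examples it
    covers, and repeat ("conquer") on what is left."""
    rules, remaining = [], list(examples)
    while remaining and len(rules) < max_rules:
        conditions, label = find_best_rule(remaining)
        rules.append((conditions, label))
        remaining = [(f, y) for f, y in remaining if not covers(conditions, f)]
    return rules


def predict(rules, features, default=None):
    """Ordered rule list: the first rule that covers the example decides."""
    for conditions, label in rules:
        if covers(conditions, features):
            return label
    return default
```

Each learned rule covers at least one remaining example, so the loop always terminates. The rule-finding step is the component that the framework described above is meant to make replaceable.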

kernc · 2016-07-01T12:24:17Z

Orange/classification/rules.py

+    random = (np.random if random_seed is None
+              else np.random.RandomState(random_seed))
+
+    def f(x): return random.choice((x == np.nanmax(x)).nonzero()[0])


Please use some vspace here.

You can import bottlechest as bn and use bn.nanmax(), which is about 50% faster.
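For reference, here is one way the reviewed snippet could look once it is spaced out and given comments. It is a sketch: the wrapper name `make_tie_breaking_argmax` is hypothetical, and plain `np.nanmax` is used because bottlechest may not be installed. The function returns the index of a maximal element, choosing uniformly at random among ties and ignoring NaNs.

```python
import numpy as np


def make_tie_breaking_argmax(random_seed=None):
    """Return f(x): the index of a maximal element of x, with ties broken
    uniformly at random and NaNs ignored."""
    # A seeded RandomState makes the tie-breaking reproducible.
    random = (np.random if random_seed is None
              else np.random.RandomState(random_seed))

    def f(x):
        x = np.asarray(x, dtype=float)
        # np.nanmax skips NaNs; bottlechest's bn.nanmax could be swapped in.
        ties = np.flatnonzero(x == np.nanmax(x))
        return random.choice(ties)

    return f
```

With a fixed seed, repeated runs pick the same winners among tied entries, which keeps experiments reproducible.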

kernc · 2016-07-01T12:42:26Z

Orange/classification/rules.py

+    x[x == 0] = 1e-5
+    y[y == 0] = 1e-5
+    lrs = np.sum(x * np.log(x / y))
+    lrs = 2 * (lrs - np.sum(x) * np.log(np.sum(x) / np.sum(y)))


If x and y are arrays (which they are), you can call their .sum() methods directly; it is marginally faster and shorter to write.
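A sketch of the likelihood ratio statistic from the diff, rewritten with `.sum()` as suggested. The function name is hypothetical. It assumes `x` holds the class counts covered by a rule and `y` the prior class counts. It copies its inputs to float arrays before patching zeros. This avoids silently modifying the caller's arrays. It also avoids a subtle pitfall if the counts arrive as integers: assigning `1e-5` into an integer array stores `0`.

```python
import numpy as np


def likelihood_ratio_statistic(x, y):
    """LRS comparing a rule's class distribution x with the prior class
    distribution y (both arrays of counts)."""
    # Float copies: the caller's arrays stay untouched, and 1e-5 is not
    # truncated to 0 as it would be in an integer array.
    x = np.array(x, dtype=float)
    y = np.array(y, dtype=float)
    # Avoid log(0) and division by zero, as in the reviewed diff.
    x[x == 0] = 1e-5
    y[y == 0] = 1e-5
    # ndarray.sum() methods instead of np.sum(...), per the review.
    lrs = (x * np.log(x / y)).sum()
    return 2 * (lrs - x.sum() * np.log(x.sum() / y.sum()))
```

When the rule's distribution is proportional to the prior, the statistic is zero. The more the rule skews the class distribution, the larger it grows.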

kernc · 2016-07-01T13:29:27Z

https://stackoverflow.com/questions/12590058/python-performance-with-global-variables-vs-local
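The linked question is about why local-variable lookups are cheaper than global (and attribute) lookups in CPython. The review does not say which code it refers to. The following is therefore only a generic illustration of the usual remedy: bind a frequently used global function to a local name before a hot loop. Both function names are hypothetical.

```python
import math


def total_log_global(values):
    # math.log is resolved via a global lookup plus an attribute lookup on
    # every iteration.
    total = 0.0
    for v in values:
        total += math.log(v)
    return total


def total_log_local(values):
    # Binding the function to a local name once turns each iteration's
    # lookup into a fast local-variable access.
    log = math.log
    total = 0.0
    for v in values:
        total += log(v)
    return total
```

Both functions compute the same result. Any speed difference should be measured, for example with timeit, rather than assumed.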

kernc · 2016-07-01T13:36:50Z

Orange/classification/rules.py

+        return X[examples_to_keep], Y[examples_to_keep]
+
+
+class RuleClassifier(Model):


class RuleClassifier(Model, metaclass=ABCMeta)
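A minimal sketch of what the suggested `metaclass=ABCMeta` buys. `Model` here is a hypothetical plain stand-in for Orange's base class, and the sketch assumes Orange's `Model` does not already define a conflicting metaclass. `OrderedRuleClassifier` and its placeholder `predict` are also hypothetical. With an `@abstractmethod`, the base class can no longer be instantiated by mistake.

```python
from abc import ABCMeta, abstractmethod


class Model:
    """Hypothetical stand-in for Orange's Model base class."""


class RuleClassifier(Model, metaclass=ABCMeta):
    """Abstract base: concrete rule classifiers must implement predict()."""

    def __init__(self, rule_list):
        self.rule_list = rule_list

    @abstractmethod
    def predict(self, X):
        """Return predictions for the examples in X."""


class OrderedRuleClassifier(RuleClassifier):
    def predict(self, X):
        # Placeholder behaviour for the sketch: always predict class 0.
        return [0 for _ in X]
```

Calling `RuleClassifier([])` raises `TypeError`, while the concrete subclass instantiates normally.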

Implemented a compact view, offering a more suitable one-line display of rules regardless of their size. Only one column is manipulated to achieve this (section size, resize mode, word wrap). Other small changes are included (more expressive relation signs, work on copy mode).

To enable some specific widget behavior, the core of the widget was rebuilt. The selection now persists not only when sorting but also when 'compact view' is toggled. Copying rules to the clipboard was cleaned up, and output signals were implemented. Overall, the code is now more expressive and easier to follow.
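The core of keeping a selection stable across sorting is index bookkeeping. Remember which source rows are selected, and after each sort select the view rows that now display them. The sketch below shows only that arithmetic, with hypothetical helper names and no Qt. In the widget, the resulting rows would still have to be wrapped in QModelIndex objects and selected through the selection model. This is the same mapping that QSortFilterProxyModel provides through mapFromSource() and mapToSource().

```python
def sort_order(keys, reverse=False):
    """Permutation mapping view row -> source row after sorting by keys."""
    return sorted(range(len(keys)), key=lambda i: keys[i], reverse=reverse)


def remap_selection(selected_source_rows, order):
    """View rows that display the given source rows under a sort order."""
    # Invert the permutation: source row -> view row.
    source_to_view = {source: view for view, source in enumerate(order)}
    return sorted(source_to_view[s] for s in selected_source_rows)
```

For example, sorting rule qualities `[0.3, 0.9, 0.1, 0.5]` ascending gives the order `[2, 0, 3, 1]`. Source rows 0 and 1 then appear at view rows 1 and 3.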

A new method for comparing rules was implemented. Comparing rules this way (by comparing their covered_examples truth vectors) is not obvious, but the implementation is fast, correct, and elegant. It works for all included learners, regardless of the chosen covering algorithm and rule ordering.
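A minimal sketch of equality based on covered-example truth vectors, as described above. The `Rule` class here is hypothetical and much smaller than the one in rules.py. It keeps only a readable description and a boolean coverage vector. Two rules compare equal when they cover exactly the same training examples, however their conditions are written.

```python
import numpy as np


class Rule:
    """Hypothetical minimal rule: a readable condition string plus a boolean
    vector marking which training examples the rule covers."""

    def __init__(self, description, covered_examples):
        self.description = description
        # np.array copies by default, so later in-place changes to the
        # caller's vector cannot alter this rule's coverage.
        self.covered_examples = np.array(covered_examples, dtype=bool)

    def __eq__(self, other):
        # Equal when exactly the same examples are covered.
        if not isinstance(other, Rule):
            return NotImplemented
        return np.array_equal(self.covered_examples, other.covered_examples)
```

Comparing two fixed-length boolean vectors costs the same for every learner and rule ordering. This is consistent with the claim above that the method works regardless of the covering algorithm.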

janezd · 2016-08-26T12:16:43Z

Having the data input is a good idea, but ...

For classification trees, I can use the schema File -> Classification tree -> Classification tree viewer -> Table. When I select a node in the viewer, I see the corresponding subset of the training data. To do the same with CN2 rules, I have to connect File to the viewer as well.

Store the data with the rules (either always, or only in the widget) so that the viewer can retrieve it from there.
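One way to read this suggestion, sketched under assumptions: the classifier keeps a reference to its training data, and each rule carries a boolean coverage vector over those rows. A viewer can then output the rows covered by the selected rules without a separate data input. `RuleClassifierWithData` and `selected_subset` are hypothetical names, not Orange API, and the sketch stores the data at construction time (the "always" option).

```python
import numpy as np


class RuleClassifierWithData:
    """Hypothetical rule classifier that keeps its training data."""

    def __init__(self, rules, X, Y):
        # rules: list of (name, boolean coverage vector over training rows)
        self.rules = rules
        self.X = X
        self.Y = Y


def selected_subset(classifier, selected_rule_indices):
    """Training rows covered by any of the selected rules."""
    mask = np.zeros(len(classifier.X), dtype=bool)
    for i in selected_rule_indices:
        mask |= classifier.rules[i][1]
    return classifier.X[mask], classifier.Y[mask]
```

The "only in the widget" option would attach the data after fitting instead. The selection logic stays the same either way.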

astaric · 2016-09-08T10:54:46Z

Could this PR be merged in the present state and additional features added in subsequent PRs? Are there any obvious parts missing?

janezd · 2016-09-08T21:14:03Z

That was my plan, too: merge now and perhaps add features later. If you're doing it, please give it a quick check and merge.

matevzkren · 2016-09-09T08:44:43Z

The existing code is fully functional and ready to be merged as is. Some features are being worked on (including additional algorithms) and should be ready by the end of next week. Those can, however, be handled in subsequent PRs.

markotoplak · 2016-09-16T11:03:55Z

Merging this increased the run time of "python setup.py test" on my computer from 17 to 33 seconds. @matevzkren, could you look into this?

matevzkren added the gsoc label Jul 1, 2016

kernc reviewed Jul 1, 2016
matevzkren force-pushed the gsoc_rules_CN2 branch from e6156d0 to 722b3a6 on July 1, 2016 12:42

kernc reviewed Jul 1, 2016
matevzkren force-pushed the gsoc_rules_CN2 branch 3 times, most recently from 309b3f7 to 11e23e8 on July 6, 2016 09:32

matevzkren added 22 commits August 26, 2016 11:55

Implemented 'copy to clipboard'

5269bc0

Enables users to save selected rules to clipboard, for later use.

Adjusted rule labels.

6c28c21

Labels are now more expressive if compact view is ON.

QSelectionModel can now be sorted.

297a61e

The selection indexes are not really sorted. According to the sorting procedure, new selection indexes (QModelIndex) are determined/calculated and selected manually.

Unified rule.is_significant() to accommodate cleaner code in the low-level search procedure and the top-level control procedure.

fdcb62e

Selection model sorting.

4287761

The selection persists through ordering, compact view toggling, and restoring the original order. The solution is not yet optimal, however; looking into QSortFilterProxyModel next.

Improved selection handling with QSortFilterProxyModel.

9b1e63e

PyTableModel will be updated at a later time to provide the same functionality.

Added RuleViewer report.

085a7ce

Tests improved (rules.py).

74d6df8

OWRules widget unit tests.

32c230f

RuleViewer widget unit tests.

a4ba445

Also included are some missing docstrings.

Progress bar (OWRules).

3fd2729

Included icons & small code improvements.

1159027

Rearranged RuleViewer buttons & checkbox.

fb44b00

OWRuleViewer: Added sizehint and replaced PyTableModel with QAbstractTableModel.

bfc716d

OWRuleViewer: rule quality format.

97a8ca3

rules module: _BaseCN2Learner and minor code clean-up.

b9e9c04

Documentation update.

f54ed26

Rule comparison FIX.

037019f

Small fix involving rule comparison. In rare cases, not copying the covered examples (a numpy array) resulted in an incorrect comparison (cmp) result.
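The exact code path is not shown here. A plausible version of the failure mode, as a sketch: if a rule stores a reference to a coverage vector that the search later narrows in place, the stored coverage changes along with it. A comparison based on it then gives the wrong answer. Taking a `.copy()` snapshots the coverage at the moment it is stored. All variable names are illustrative.

```python
import numpy as np

# A working coverage vector that the search refines in place.
working = np.array([True, True, False, True])

# Without a copy, the "stored" vector is the very same object ...
aliased = working
# ... whereas .copy() snapshots the current coverage.
snapshot = working.copy()

# Refining the rule narrows its coverage in place.
working &= np.array([True, False, False, True])

print(aliased is working)                 # True: the stored vector changed too
print(np.array_equal(snapshot, working))  # False: the copy kept the old coverage
```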

[FIX] OWRuleLearner: Progress bar updates are now handled through a callback function.

c678871

Fixes a bug previously triggered by Test & Score on Windows machines. Other instances using the generated learner no longer affect the widget's progress bar.
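A sketch of the callback pattern described in this commit. `HypotheticalRuleLearner` and its `fit` signature are illustrative, not the Orange learner API. Progress goes only to a callback that the caller passes in explicitly. The widget can therefore drive its progress bar, while other users of the same learner, such as Test & Score, pass nothing and trigger no widget updates.

```python
class HypotheticalRuleLearner:
    """Sketch: progress is reported only to a caller-supplied callback."""

    def __init__(self, n_steps=4):
        self.n_steps = n_steps

    def fit(self, data, progress_callback=None):
        rules = []
        for step in range(1, self.n_steps + 1):
            # ... one round of rule search on `data` would go here ...
            if progress_callback is not None:
                progress_callback(100 * step / self.n_steps)
        # Placeholder: a real learner would return a fitted classifier.
        return rules
```

The learner holds no reference to any widget, so copies of it in other workflows cannot touch the widget's progress bar.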

matevzkren force-pushed the gsoc_rules_CN2 branch from 7f2d9c8 to c678871 on August 26, 2016 12:00

matevzkren changed the title ~~[WIP] [ENH] Rule induction (CN2)~~ [ENH] Rule induction (CN2) Sep 9, 2016

janezd merged commit 9749ee2 into biolab:master Sep 16, 2016
