[ENH] OwLouvain: Add normalize data checkbox to PCA preprocessing #3573

pavlin-policar · 2019-02-02T09:57:00Z

Edited on 4.2.2019

Issue

In #3448 we found that normalizing the data can have beneficial effects for t-SNE. This likely also holds for Louvain clustering, when PCA preprocessing is applied. It also makes little sense to enable data normalization for PCA in one widget, but not another.

Description of changes

I tried to separate the PCA-specific parameters from the "Enable PCA preprocessing" checkbox by adding an indent. IMO it looks better than if there were no indent, but it's not very pretty.

Description of changes:

024d125 Add the option to skip zero centering to the normalization preprocessor. This will also be necessary later on, when I add normalization for sparse matrices.

This is already implemented in Orange.preprocess.preprocess.Scale, which seems to do exactly the same thing as Orange.preprocess.preprocess.Normalize. Indeed, it seems as though Scale actually uses the same normalization implementation. This is very confusing, as Normalize is just a slightly less powerful version of Scale, but has a much better name. It also seems that the only place Scale is used is in the Preprocess widget, while Normalize is used all over the place.

This is very confusing and IMO Scale should be renamed to Normalize, since they do the same thing, but Scale does it better. In any case, I am using Normalize here, mainly because I found Scale after implementing this.
495dfa8 Move the table_dense_sparse decorator into Orange.widgets.tests.utils. It's quite handy.
e1d91ef Add normalization support for sparse data. This is included here because it is not PCA specific. For sparse data, we only scale by SD and skip centering. For PCA, this is fine, because centering is applied in PCA itself. This is also all right where PCA is not used, because the table is used directly for computing pairwise distances between samples. Euclidean and Manhattan distances are independent of position, i.e. they are the same whether or not the points are centered. So applying only scaling is all right here as well.

Note that this is only correct as long as the distances are independent of absolute position. If we ever add other distances, this needs to be considered, e.g. the cosine distance doesn't have this property.
bb1f3e0 Disable PCA components slider if Apply PCA is not checked; it makes little sense to be enabled.
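
To illustrate the reasoning in e1d91ef, here is a minimal NumPy-only sketch (not the code from this PR). The helper names `scale_by_sd`, `pairwise` and `pairwise_cosine` and the small example matrix are made up for illustration. It checks that scaling by SD keeps zero entries at zero, and that centering afterwards leaves Euclidean and Manhattan distances unchanged but does change cosine distances.

```python
import numpy as np

# Small, mostly-zero example matrix (rows are samples, columns are features).
X = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 2.0, 5.0],
])


def scale_by_sd(X):
    """Divide each column by its standard deviation, without centering."""
    sd = X.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # leave constant columns untouched
    return X / sd


def pairwise(X, metric):
    """Pairwise Euclidean or Manhattan distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def pairwise_cosine(X):
    """Pairwise cosine distances between the rows of X."""
    norms = np.linalg.norm(X, axis=1)
    return 1.0 - (X @ X.T) / np.outer(norms, norms)


scaled = scale_by_sd(X)
centered = scaled - scaled.mean(axis=0)

# Scaling alone keeps zeros at zero; centering fills them in.
print(np.count_nonzero(X), np.count_nonzero(scaled), np.count_nonzero(centered))

# Translation-invariant distances do not care about centering ...
print(np.allclose(pairwise(scaled, "euclidean"), pairwise(centered, "euclidean")))
print(np.allclose(pairwise(scaled, "manhattan"), pairwise(centered, "manhattan")))

# ... but the cosine distance does.
print(np.allclose(pairwise_cosine(scaled), pairwise_cosine(centered)))
```

The first check is why skipping centering matters for sparse matrices: subtracting the column means would turn every zero into a nonzero value and make the matrix dense.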

Includes

Code changes
Tests
Documentation

codecov · 2019-02-02T10:08:56Z

Codecov Report

Merging #3573 into master will decrease coverage by <.01%.
The diff coverage is 87.5%.

@@            Coverage Diff             @@
##           master    #3573      +/-   ##
==========================================
- Coverage   83.97%   83.97%   -0.01%     
==========================================
  Files         370      370              
  Lines       66941    66960      +19     
==========================================
+ Hits        56215    56229      +14     
- Misses      10726    10731       +5

codecov · 2019-02-02T10:08:56Z

Codecov Report

Merging #3573 into master will increase coverage by 0.01%.
The diff coverage is 95.83%.

@@            Coverage Diff            @@
##           master   #3573      +/-   ##
=========================================
+ Coverage   83.99%     84%   +0.01%     
=========================================
  Files         370     370              
  Lines       67023   67098      +75     
=========================================
+ Hits        56298   56369      +71     
- Misses      10725   10729       +4

janezd · 2019-02-02T11:53:29Z

I tried to run the widget (using its "main"). It erred with

  File "/Users/janez/Dropbox/orange3/Orange/clustering/louvain.py", line 142, in fit
    graph, resolution=self.resolution, random_state=self.random_state
TypeError: best_partition() got an unexpected keyword argument 'random_state'

This is not related to this PR and may be a problem in my environment. However, the label remained "Running...". I think the widget should change the label to "Erred..." (is this the right verb? :) when optimization fails.

Unrelated to this PR, but can be fixed here since it involves a change in the same place: the error was reported in the status bar, yet it's an error in the code and should be shown as a crash. Similar to issue #3548.

pavlin-policar · 2019-02-02T13:11:26Z

Yes, you're completely right. This slipped my mind completely. I'll definitely add something to indicate an error, but I don't think repeating the error message shown in the status bar is needed. It might encourage users to look for error messages somewhere in the widget UI, while this would be completely specific to this widget. I think it's better to enforce one single place for the error message on the widget (despite them being kind of hard to notice sometimes).

As for the error you got, you likely have an old version of python-louvain, probably 0.11. I added random state support to the library, and it should be available in 0.13 or higher. So you probably just need to update the package version.
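
For anyone hitting the same TypeError, one way to check whether an installed function accepts a given keyword is to inspect its signature. This is only a sketch: `accepts_keyword` is a hypothetical helper, and the two stand-in functions below are simplified placeholders, not the real python-louvain signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer version of a function.
def old_best_partition(graph, resolution=1.0):
    return {}


def new_best_partition(graph, resolution=1.0, random_state=None):
    return {}


print(accepts_keyword(old_best_partition, "random_state"))
print(accepts_keyword(new_best_partition, "random_state"))
```

In a real environment, the same check could be applied to the `best_partition` function that louvain.py calls, assuming it is imported from the python-louvain package.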

janezd · 2019-02-02T14:53:14Z

Sure, no need to repeat the message. Just set the label to "Error" or something.

My louvain is indeed 0.11. Which is good, otherwise I wouldn't have spotted this glitch. :)

lanzagar · 2019-02-04T11:39:00Z

I would suggest the following:

rename PCA Preprocessing box to Preprocessing
put Normalize data above "Apply PCA preprocessing" and have it independent of PCA
maybe change Components -> PCA Components

lanzagar · 2019-02-04T11:40:17Z

And also, same comment as in t-SNE: why is normalization not supported for sparse data?

pavlin-policar · 2019-02-04T16:16:37Z

PCA Components reads as Principal component analysis components. Principal components would be better but that's too long, and the slider basically disappears. PCs can be unclear. I don't have any other ideas. I'll keep it as PCA Components for now.

pavlin-policar · 2019-02-04T17:00:46Z

I've updated the original PR description describing comments and changes.

Orange/preprocess/preprocess.py

lanzagar · 2019-02-05T09:16:45Z

Orange/widgets/unsupervised/owlouvainclustering.py

@@ -190,6 +208,7 @@ def cancel(self):
        self.__set_state_ready()

    def commit(self):
+        # pylint: disable=too-many-branches


I see no reason why this pylint check should be disabled for this function.
And I kind of agree with pylint that this function has a lot of ifs... Can we reduce them?
E.g. looking at the len(...attributes) < 1 check: wouldn't a better place for this be in set_data (and on error just set self.data to None, so it does not have to be checked ever again)?
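
A rough sketch of what this suggestion could look like, using a hypothetical stand-in class rather than the actual widget. The class name, the dict-based data representation and the error text are all assumptions made for illustration; none of this is Orange's API.

```python
class ClusteringWidgetSketch:
    """Hypothetical stand-in for a widget; not the real OWLouvainClustering."""

    def __init__(self):
        self.data = None
        self.error = None

    def set_data(self, data):
        # Validate once, at the point where the data enters the widget.
        self.error = None
        if data is not None and len(data["attributes"]) < 1:
            self.error = "Data has no features"
            data = None
        self.data = data

    def commit(self):
        # Only one check is left here: either there is usable data or not.
        if self.data is None:
            return None
        return len(self.data["rows"])


widget = ClusteringWidgetSketch()
widget.set_data({"attributes": [], "rows": [[], []]})
print(widget.data, widget.error)

widget.set_data({"attributes": ["x"], "rows": [[1.0], [2.0]]})
print(widget.commit(), widget.error)
```

With the validation in set_data, commit no longer needs a separate branch for the no-features case, which removes one of the ifs pylint complains about.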

pavlin-policar force-pushed the louvain-pca-normalize branch from 16a8d6b to e273604 Compare February 2, 2019 10:08

pavlin-policar force-pushed the louvain-pca-normalize branch 3 times, most recently from 39cf5b7 to a7a3b19 Compare February 2, 2019 13:59

pavlin-policar force-pushed the louvain-pca-normalize branch from a7a3b19 to b5acdc6 Compare February 3, 2019 17:49

pavlin-policar force-pushed the louvain-pca-normalize branch 3 times, most recently from 939f9e2 to de07f08 Compare February 4, 2019 16:59

pavlin-policar force-pushed the louvain-pca-normalize branch 2 times, most recently from 3b932f6 to bb1f3e0 Compare February 4, 2019 17:23

pavlin-policar mentioned this pull request Feb 4, 2019

[ENH] PCA: Remove SVD & add normalization for sparse #3581

Merged

3 tasks

lanzagar reviewed Feb 5, 2019


pavlin-policar force-pushed the louvain-pca-normalize branch 3 times, most recently from 46d64bf to ee5d4f3 Compare February 11, 2019 13:58

janezd assigned lanzagar Feb 14, 2019

pavlin-policar force-pushed the louvain-pca-normalize branch 5 times, most recently from acc2add to 8fc6215 Compare February 15, 2019 10:19

pavlin-policar force-pushed the louvain-pca-normalize branch 2 times, most recently from 470625e to fc1fad9 Compare February 15, 2019 10:27

pavlin-policar added 6 commits February 15, 2019 11:28

Normalize: Add option to skip zero-centering

5066373

Move table_dense_sparse test utility to Orange.widgets.tests.utils

5001e4d

OwLouvain: Enable normalization for sparse data

a624258

OwLouvain: Disable PCA slider if Apply PCA is unchecked

9e3a237

Preprocess: Fixup docstrings for Normalize

c1de7d9

OwLouvain: Move data preprocessing to separate function

cd31ed5

pavlin-policar force-pushed the louvain-pca-normalize branch from fc1fad9 to f065341 Compare February 15, 2019 10:30

OwLouvain: Properly compare new data with old without warnings

fe28eaf

pavlin-policar force-pushed the louvain-pca-normalize branch from f065341 to fe28eaf Compare February 15, 2019 10:42

lanzagar merged commit 59228cf into biolab:master Feb 15, 2019

pavlin-policar deleted the louvain-pca-normalize branch February 15, 2019 11:57
