-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] OwLouvain: Add normalize data checkbox to PCA preprocessing #3573
Conversation
16a8d6b
to
e273604
Compare
Codecov Report
@@ Coverage Diff @@
## master #3573 +/- ##
==========================================
- Coverage 83.97% 83.97% -0.01%
==========================================
Files 370 370
Lines 66941 66960 +19
==========================================
+ Hits 56215 56229 +14
- Misses 10726 10731 +5 |
Codecov Report
@@ Coverage Diff @@
## master #3573 +/- ##
=========================================
+ Coverage 83.99% 84% +0.01%
=========================================
Files 370 370
Lines 67023 67098 +75
=========================================
+ Hits 56298 56369 +71
- Misses 10725 10729 +4 |
I tried to run the widget (using its "main"). It erred with
This is not related to this PR and may be a problem in my environment. However, the label remained "Running...". I think the widget should change the label to "Erred..." (it this the right verb? :) when optimization fail. Unrelated to this PR, but can be fixed here since it involves a change in the same place: the error was reported in the status bar, yet it's an error in the code and should be shown as crash. Similar to issue #3548. |
Yes, you're completely right. This slipped my mind completely. I'll definitely add something to indicate an error, but I don't think repeating the error message shown in the status bar is needed. It might encourage users to look for error messages somewhere in the widget UI, while this would be completely specific to this widget. I think it's better to enforce one single place for the error message on the widget (despite them being kind of hard to notice sometimes). As far as the error you got, you likely have an old version of |
39cf5b7
to
a7a3b19
Compare
Sure, no need to repeat the message. Just set the label to "Error" or something. My louvain is indeed 0.11. Which is good, otherwise I wouldn't have spotted this glitch. :) |
a7a3b19
to
b5acdc6
Compare
I would suggest to:
|
And also, same comment as in t-SNE: why is normalization not supported for sparse data? |
PCA Components reads as Principal component analysis components. Principal components would be better but that's too long, and the slider basically disappears. PCs can be unclear. I don't have any other ideas. I'll keep it as PCA Components for now. |
939f9e2
to
de07f08
Compare
I've updated the original PR description describing comments and changes. |
3b932f6
to
bb1f3e0
Compare
@@ -190,6 +208,7 @@ def cancel(self): | |||
self.__set_state_ready() | |||
|
|||
def commit(self): | |||
# pylint: disable=too-many-branches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no reason why this pylint check should be disabled for this function.
And I kind of agree with pylint that this function has a lot of if
s... Can we reduce them?
E.g. looking at the len(...attributes) < 1
check - wouldn't a better place for this be in set_data (and on error just set self.data to None, so it does not have to be checked ever again).
46d64bf
to
ee5d4f3
Compare
acc2add
to
8fc6215
Compare
470625e
to
fc1fad9
Compare
fc1fad9
to
f065341
Compare
f065341
to
fe28eaf
Compare
Edited on 4.2.2019
Issue
In #3448 we found that normalizing the data can have beneficial effects for t-SNE. This likely also holds for Louvain clustering, when PCA preprocessing is applied. It also makes little sense to enable data normalization for PCA in one widget, but not another.
Description of changes
I tried to separate the PCA specific parameters from the "Enable PCA preprocessing" by adding an indent. IMO it looks better than if there we no indent, but it's not very pretty.
Description of changes:
024d125 Add the option to skip zero centering to the normalization preprocessor. This will also be necessary later on, when I add normalization for sparse matrices.
This is already implemented in
Orange.preprocess.preprocess.Scale
, which seems to do exactly the same thing asOrange.preprocess.preprocess.Normalize
. Indeed, it seems as thoughScale
actually uses the same normalization implementation. This is very confusing, asNormalize
is just a slightly less powerful version ofScale
, but having a much better name. It also seems that the only placeScale
is used is in the Preprocess widget, whileNormalize
is used all over the place.This is very confusing and IMO
Scale
should be renamed toNormalize
, since they do the same thing, butScale
does it better. In any case, I am usingNormalize
here, mainly because I foundScale
after implementing this.495dfa8 Moved
table_dense_sparse
decorator intoOrange.widget.testing.utils
. It's quite handy.e1d91ef Add normalization support for sparse data. This is included here because this is not PCA specific. We only scale by SD. For PCA, this is fine, because centering is applied in PCA. This is also allright where PCA is not used, because the table is directly used for computing pairwise distances between samples. Euclidean and Manhattan distance are independent of the position i.e. they are the same if the points are centered or not. So applying only scaling is all right here as well.
Note that this is only correct as long as the distances are independent of absolute position. If we ever add other distances, this needs to be considered e.g. the cosine distance doesn't have this property.
bb1f3e0 Disable PCA components slider if Apply PCA is not checked; it makes little sense to be enabled.
Includes