[ENH] Silhouette Plot: Add cosine distance #3176
Codecov Report

```diff
@@           Coverage Diff            @@
##           master    #3176    +/-   ##
==========================================
+ Coverage   82.48%   82.64%   +0.16%
==========================================
  Files         336      342       +6
  Lines       58338    59016     +678
==========================================
+ Hits        48118    48774     +656
- Misses      10220    10242      +22
```
When Cosine distance is selected, can you check the input domain to ensure it has no discrete columns? Either show an error and stop, or show a warning and drop them from the domain before computing the distance. The way that …

For instance, using

```
$ cat discrete-confound-a.tab
A	B	C
d	d	d
		class
a1	b1	+
a1	b2	+
a3	b3	-
a1	b2	-
a2	b3	+
a3	b4	-
$ cat discrete-confound-b.tab
A	B	C
d	d	d
		class
a0	a1	+
a1	a2	+
a3	a3	-
a1	b2	-
a2	b3	+
a3	b4	-
```

compare the output of

```python
import Orange
A = Orange.data.Table("discrete-confound-a.tab")
print(Orange.distance.Cosine(A).round(3))
```

with that of

```python
import Orange
B = Orange.data.Table("discrete-confound-b.tab")
A = Orange.data.Table("discrete-confound-a.tab")
print(Orange.distance.Cosine(A).round(3))
```
Not just that: it seems to me that the way Cosine currently treats discrete columns is just plain wrong. It basically distinguishes only the first value from all other values (which it treats as equal)?
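To make the effect concrete, here is a minimal NumPy sketch, assuming discrete values are stored as value indices and then clipped to 0 (first value) or 1 (any other value) before the cosine is computed. `clip_discrete` and `cosine_distances` are hypothetical helpers written for illustration; they are not Orange's implementation.

```python
import numpy as np

def clip_discrete(codes):
    # Hypothetical helper: map each discrete value index to 0 (first value)
    # or 1 (any other value), mimicking the behaviour described above.
    return np.clip(codes, 0, 1).astype(float)

def cosine_distances(X):
    # Pairwise cosine distance 1 - cos(x, y) between the rows of X.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms
    return 1.0 - unit @ unit.T

# Two discrete columns stored as value indices (0 = first value).
codes = np.array([
    [1, 2],  # second value of A, third value of B
    [2, 3],  # third value of A, fourth value of B
    [0, 1],  # first value of A, second value of B
])
D = cosine_distances(clip_discrete(codes))
print(D.round(3))
```

Rows 0 and 1 share no values at all, yet both clip to `[1, 1]`, so their distance is (numerically) zero; only row 2, which uses a first value, is separated from them.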
Currently, discrete features are only clipped (i.e. the first value vs. all other values), which can give misleading results. Until discrete features are handled better, Cosine should declare that it does not support them, so that a warning is displayed when it is used.
I have changed Cosine so that it no longer advertises support for categorical features until this is resolved. This also affects the Distances widget, where everything above applies as well: it now shows a warning and ignores categorical features for Cosine distance.
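The "warn and ignore categorical features" behaviour can be sketched as follows. This assumes the data is a NumPy matrix plus a boolean mask marking discrete columns; `cosine_continuous_only` and the `is_discrete` mask are hypothetical names (in Orange the mask would come from the domain's variables), not the code in this PR.

```python
import warnings
import numpy as np

def cosine_continuous_only(X, is_discrete):
    # Hypothetical helper: warn about and drop discrete columns, then compute
    # pairwise cosine distances on the remaining continuous columns.
    if is_discrete.any():
        warnings.warn(
            f"Cosine distance ignores {int(is_discrete.sum())} categorical feature(s)"
        )
    Xc = X[:, ~is_discrete].astype(float)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        unit = Xc / norms  # all-zero rows would become NaN here
    return 1.0 - unit @ unit.T

X = np.array([
    [1.0, 0.0, 3.0],  # column 1 holds a discrete value index
    [0.0, 2.0, 3.0],
])
mask = np.array([False, True, False])
D = cosine_continuous_only(X, mask)
```

Dropping the columns (rather than refusing to run) keeps the widget usable on mixed data, at the cost of silently losing information, which is why the warning matters.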
Added cosine distance to Silhouette Plot.
Handled NaN values in the computed distance matrix (e.g. those caused by all-zero vectors under cosine distance) by omitting the affected instances and showing a warning.
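A minimal sketch of the NaN handling, assuming that an instance with undefined distances (such as an all-zero vector under cosine) produces a row that is entirely NaN, while other rows only have NaN in that instance's column. `drop_undefined` is a hypothetical helper, and the matrix values are illustrative, not output from Orange.

```python
import warnings
import numpy as np

def drop_undefined(D):
    # Hypothetical helper: an instance whose whole row is NaN has undefined
    # distances; drop its row and column and warn about it.
    bad = np.isnan(D).all(axis=1)
    if bad.any():
        warnings.warn(f"{int(bad.sum())} instance(s) omitted: distances are undefined")
    keep = np.flatnonzero(~bad)
    return D[np.ix_(keep, keep)], keep

nan = np.nan
# Illustrative cosine distances for three instances; the second is all-zero.
D = np.array([
    [0.0, nan, 0.293],
    [nan, nan, nan],
    [0.293, nan, 0.0],
])
D_valid, kept = drop_undefined(D)
```

`kept` maps the rows of the reduced matrix back to the original instances, so their cluster labels can be selected the same way before computing silhouette scores.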