-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Update t-SNE widget #6345
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #6345 +/- ##
==========================================
- Coverage 87.78% 87.43% -0.36%
==========================================
Files 321 317 -4
Lines 69445 68455 -990
==========================================
- Hits 60962 59852 -1110
- Misses 8483 8603 +120 |
One more thing I'd like feedback on. Both t-SNE and MDS use ContextSettings, and use the "effective data" to open the context. The "effective data" is either taken from the distance matrix However, with t-SNE, a problem arises:
This happens because in both scenarios, the context being opened and saved is on the Iris data table. When a distance matrix is provided, the preprocessing controls are disabled and unchecked to indicate they aren't being used. However, when we re-open Iris using a data table (not distances), these settings would still be useful. Ideally, the context would remember whether or not we were using a This issue never arises in MDS because MDS has no such parameters. |
Don't set
That is, add You can do the same for others, e.g. PCA preprocessing. |
97875dd
to
693fc65
Compare
Thanks! I didn't know about this option. I believe this now works. It works properly in Orange, but I can't get my test |
32453ad
to
573ce27
Compare
If I remember correctly, this PR still has a bug where the widget crashes if, for instance, the data contains NaN, and Normalize data is not checked. I am speaking from memory from a few months ago, so it might not be this exact thing, but something related to this. I'll need to fix that before this PR can be merged. |
Bombs when "Apply PCA preprocessing" is switched off and the data contains unknowns. Try with HDI data set from Datasets widget. I suggest to use the same preprocessor for imputation as used with PCA initialization. |
When distances are provided as input, the slider with PCA components should be disabled as well (not to confuse the user that PCA applies to this data). |
573ce27
to
6d9a3ca
Compare
This should now be fixed. We use the default t-SNE preprocessor at the start of the pipeline now.
I can't reproduce this; it's always disabled for me. Could you maybe provide your Qt version?
In general, no. By default, t-SNE uses PCA initialization, which can only be computed when the original data matrix is available. If we pass in a distance matrix, we can't compute the PCA projection, so we instead initialize the t-SNE embedding using a spectral layout, which can be obtained from the similarity graph obtained from the distance matrix. So, in general, the answer is no. However, the results should be the same if we used the spectral initialization for both the Table input and Distance matrix input. However, I see that this is, in fact, not the case, and some rotation seems to be happening. I will need to look into this further. |
6d9a3ca
to
5ef6b03
Compare
5ef6b03
to
1da1967
Compare
I've finally tracked down the bug. It turns out that when testing comboboxes, we can't just write
in the tests. This doesn't update the widget settings. It seems as though we have to instead use the
No idea why this works, but I'm just glad to have tracked down this bug. This PR is now probably ready for review. |
Never mind. Apparently this times out on the CI servers. Does anyone have any idea how to fix this? @janezd @ales-erjavec @VesnaT |
Don't call the WidgetTest.show() method in the tests. |
b54f065
to
34c79d3
Compare
Oh my, I forgot to remove that... Sorry, that was very silly of me! Hopefully it should work now then. |
This is now (really) be ready for review. |
Description of changes
This PR revamps the t-SNE widget.
Changes include:
The main change is added support for a Distances signal, as requested by @BlazZupan, which required a large number of changes. I've tried to make the behaviour of the t-SNE widget as similar as possible to the MDS widget, which also supports both Data and Distances signals. Using a distance matrix defaults to spectral initialization, and ignores any preprocessing steps.
The only discernible difference between the two widgets is the following:
When the MDS widget gets a Distance matrix, it comptues the embedding. If it then gets an incompatible Data table, it will hide the embedding, which makes sense. If, however, the data table has other issues which get caught by MDS's error checking, e.g. the data table has less than two rows, it will show an error for that, and still keep the embedding visible. The t-SNE widget does not do this. The t-SNE widget shows the "Incompatible data" error and hides the embedding for any error.
I have also found a potential improvement for both MDS and t-SNE, which will not be included in this PR: If the widget gets a valid distance matrix, it computes the embedding. If, however, it then gets an incompatible data table, it clears everything. Once the offending data table is removed from the input, the embedding has to be recomputed again. If the widgets cached the embedding for a given distance matrix, even when the input data signal is invalid, the recomputation could be avoided.
Includes