Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] OWTSNE: Offload computation to separate thread #3604

Merged
merged 9 commits into from
May 9, 2019

Conversation

pavlin-policar
Copy link
Collaborator

@pavlin-policar pavlin-policar commented Feb 15, 2019

Issue

The t-SNE widget previously performed all computation inside the main thread, while simulating a thread i.e. it would pause every once in a while and handle user action. For larger data sets, this meant the widget would freeze for a while until the widget received control again.

Description of changes

The widget now offloads all of the computation into a separate thread, and the widget is only responsible for preparing a Task specification, which is then handled by the other thread.

Other changes include:

  • The widget now remembers partial results better, i.e. changing the exaggeration parameter doesn't recompute all the nearest neighbours again.
  • More descriptive status message and the progress bar now takes into account all the steps of the procedure.
  • Since t-SNE is deterministic, once the optimization is complete, pressing "Start" does nothing now, because the results will invariably the same.
  • The PCA slider now changes its maximum value based on the number of features in the data.
  • Add Info thing to indicate that the settings have been changed and the user should rerun the optimization.
  • I've made the t-SNE API a bit more granular so that it's possible to gain access to the affinities (nearest neighbors). In principle, this is a bad idea, because this means the widget is tied to my openTSNE implementation, but I think it's justified because 1. in case we ever opt for a different implementation, removing this from the widget requires little effort and 2. finding the nearest neighbors can take a very long time (up to half of all the computation time for larger data sets) therefore the widget should be smart enough to cache results wherever possible.
  • The widget now replots fairly frequently again. I did this because it seems that doing the work in a separate thread and drawing in the main one play well together, and the speed difference between updating often and updating rarely is small.
Includes
  • Code changes
  • Tests
  • Documentation

@pavlin-policar pavlin-policar force-pushed the tsne-threaded branch 2 times, most recently from d5097c9 to db0a98e Compare February 15, 2019 14:11
@codecov
Copy link

codecov bot commented Feb 15, 2019

Codecov Report

Merging #3604 into master will increase coverage by 0.05%.
The diff coverage is 96.64%.

@@            Coverage Diff             @@
##           master    #3604      +/-   ##
==========================================
+ Coverage    84.9%   84.95%   +0.05%     
==========================================
  Files         374      374              
  Lines       68852    69084     +232     
==========================================
+ Hits        58457    58692     +235     
+ Misses      10395    10392       -3

@pavlin-policar pavlin-policar force-pushed the tsne-threaded branch 2 times, most recently from dc1f337 to a329ccd Compare March 1, 2019 12:10

def resume(self):
self.__set_update_loop(self.tsne_iterator)
self.run_button.setText("Stop")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is not needed, since the text is already set in start().

@VesnaT
Copy link
Contributor

VesnaT commented Mar 5, 2019

Maybe you could write some tests for the TSNERunner. Each its method is easily testable. And it would also be good to check that the embedding is not modified inplace.

@pavlin-policar pavlin-policar changed the title [ENH] OWTSNE: Offload computation to separate thread [WIP][ENH] OWTSNE: Offload computation to separate thread Mar 29, 2019
@pavlin-policar pavlin-policar force-pushed the tsne-threaded branch 2 times, most recently from e628b78 to 077fc02 Compare April 5, 2019 08:47
@pavlin-policar pavlin-policar changed the title [WIP][ENH] OWTSNE: Offload computation to separate thread [ENH] OWTSNE: Offload computation to separate thread Apr 5, 2019
Copy link
Contributor

@VesnaT VesnaT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some observations:

  • OWFile (Iris) -> OWtSNE (Check Preserve global structure -> Info message appears) -> OWFile (Reload file) -> OWtSNE (Info message has disappeared)
  • OWFile (Iris) -> OWtSNE
    -> OWDataTable (Select 10 rows) -> OWtSNE (the same widget as above (input subset)) -> OWtSNE (Check Preserve global structure) -> OWDataTable (the same widget as above - reselect 10 rows) -> Crash
Traceback (most recent call last):
  File "/Users/vesna/orange3/Orange/canvas/scheme/widgetsscheme.py", line 1082, in process_signals_for_widget
    widget.handleNewSignals()
  File "/Users/vesna/orange3/Orange/widgets/visualize/utils/widget.py", line 461, in handleNewSignals
    self.graph.update_point_props()
  File "/Users/vesna/orange3/Orange/widgets/visualize/owscatterplotgraph.py", line 554, in update_point_props
    self.update_colors()
  File "/Users/vesna/orange3/Orange/widgets/visualize/owscatterplotgraph.py", line 932, in update_colors
    pen_data, brush_data = self.get_colors()
  File "/Users/vesna/orange3/Orange/widgets/visualize/owscatterplotgraph.py", line 826, in get_colors
    subset = self.master.get_subset_mask()
  File "/Users/vesna/orange3/Orange/widgets/visualize/utils/widget.py", line 482, in get_subset_mask
    dtype=np.bool, count=len(valid_data))
  File "/Users/vesna/orange3/Orange/widgets/visualize/utils/widget.py", line 481, in <genexpr>
    return np.fromiter((ex.id in self.subset_indices for ex in valid_data),
TypeError: unhashable type: 'numpy.ndarray'
  • When running tsne with several interruptions I get slightly different results than without interrupting calculation:
    OWFile (Iris) -> OWtSNE (Click Stop/Resume several times)
    -> OWtSNE
    -> Output Data vary

@VesnaT
Copy link
Contributor

VesnaT commented May 6, 2019

The following still happens:

  • OWFile (Iris) -> OWtSNE (Check Preserve global structure -> Info message appears) -> OWFile (Reload file) -> OWtSNE (Info message has disappeared)
  • OWFile (Iris) -> OWtSNE
    -> OWDataTable (Select 10 rows) -> OWtSNE (the same widget as above (input subset)) -> OWtSNE (Check Preserve global structure) -> OWDataTable (the same widget as above - reselect 10 rows) -> Crash

@pavlin-policar
Copy link
Collaborator Author

@VesnaT I fixed the two issues you found above. I've fixed them in a very simple and hacky way. I tried to find an elegant solution, but any of these would require touching the base widget and then fixing the other widgets.

@VesnaT VesnaT merged commit 3fe6160 into biolab:master May 9, 2019
@pavlin-policar pavlin-policar deleted the tsne-threaded branch May 9, 2019 06:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants