[ENH] Parallelized cross validation & other evaluation methods #1004

Merged
merged 5 commits into from
Jun 30, 2016


kernc
Contributor

@kernc kernc commented Feb 3, 2016

Parallel multiprocess execution of model evaluation methods.

Assumptions:

  • there is at least n_cpu * (data.nbytes + C) bytes of extra RAM available (C being a per-process overhead),
  • the passed arguments (learners, data) are picklable; otherwise, it falls back to a single thread.
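The two assumptions above can be checked up front before fanning out. Below is a minimal sketch, assuming the caller supplies the free-memory figure and the per-process overhead C, since the standard library has no portable free-RAM query. `plan_n_jobs` and its parameters are hypothetical names, not Orange's API.

```python
import pickle


def is_picklable(*objs):
    """Return True if the objects survive pickle.dumps (needed to ship them to a worker process)."""
    try:
        pickle.dumps(objs)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def plan_n_jobs(requested, args, data_nbytes, overhead_bytes, free_bytes):
    """Hypothetical helper: choose a worker count that honours both assumptions.

    - Memory: n * (data_nbytes + overhead_bytes) must fit into free_bytes.
    - Pickling: if any argument cannot be pickled, fall back to a single job.
    """
    if requested <= 1 or not is_picklable(*args):
        return 1
    per_worker = data_nbytes + overhead_bytes
    fits_in_memory = free_bytes // per_worker
    return max(1, min(requested, fits_in_memory))
```

For example, `plan_n_jobs(8, ([1, 2, 3],), 100, 50, 1000)` caps the request at 6 workers, while passing a lambda as an argument drops it to 1.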

@@ -28,7 +28,7 @@ def test_LinearSVM(self):
     def test_NuSVM(self):
         learn = NuSVMLearner(nu=0.01)
         res = CrossValidation(self.data, [learn], k=2)
-        self.assertGreater(CA(res)[0], 0.9)
+        self.assertGreater(CA(res)[0], 0.8)

@codecov-io

codecov-io commented Feb 3, 2016

Current coverage is 87.59%

Merging #1004 into master will decrease coverage by 0.12%

@@             master      #1004   diff @@
==========================================
  Files            75         75          
  Lines          7419       7458    +39   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           6508       6533    +25   
- Misses          911        925    +14   
  Partials          0          0          
  1. 3 files (not in diff) in Orange/data were modified.
    • Misses +2
    • Hits -26
  2. 2 files (not in diff) in ...range/classification were modified.
    • Hits +9
  3. 5 files (not in diff) in Orange were modified.
    • Misses +13
    • Hits -16


Powered by Codecov. Last updated by ae82d28...66531a4

@kernc
Contributor Author

kernc commented Feb 3, 2016

@BlazZupan
Contributor

Please add JOBLIB_START_METHOD=forkserver for MacOS X.
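For context, Python's own documentation notes that on macOS the `fork` start method should be considered unsafe because it can crash the child process; since Python 3.8 the default there is `spawn`. The request above relies on joblib honouring the `JOBLIB_START_METHOD` environment variable, which depends on the joblib version in use. As an assumption-labelled alternative, here is a minimal standard-library sketch that selects `forkserver` on macOS and the platform default elsewhere. `preferred_context` is a hypothetical helper.

```python
import multiprocessing as mp
import sys


def preferred_context(platform=sys.platform):
    """Hypothetical helper: 'forkserver' on macOS when available, else the platform default."""
    if platform == "darwin" and "forkserver" in mp.get_all_start_methods():
        return mp.get_context("forkserver")
    # Platform default; which method this is varies by OS and Python version.
    return mp.get_context()
```

A pool created with `preferred_context().Pool(...)` then uses that start method. As with any non-fork method, the calling script needs an `if __name__ == "__main__":` guard.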

@kernc
Contributor Author

kernc commented Feb 8, 2016

There's a lot of room for improvement in terms of condensing that code. But for now ...

Please test thoroughly. Someone (@ajdapretnar, @thocevar) should try it on Windows too.

@ajdapretnar
Contributor

I get this on Windows (zoo.tab, SVM classifier).

  File "d:\orange\orange3\Orange\widgets\evaluate\owtestlearners.py", line 573, in apply
    self._update_results()
  File "d:\orange\orange3\Orange\widgets\evaluate\owtestlearners.py", line 391, in _update_results
    self.data, learners, **common_args)
  File "d:\orange\orange3\Orange\evaluation\testing.py", line 458, in __init__
    with joblib.Parallel(n_jobs=n_jobs, backend=ctx) as parallel:
AttributeError: __exit__

@kernc
Contributor Author

kernc commented Feb 9, 2016

Should work now.

@thocevar
Contributor

The reason for "AttributeError: exit" was an old version of joblib (0.8 instead of 0.9). Otherwise, it works for me.
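The failure comes from the `with` statement itself: a `joblib.Parallel` that predates context-manager support has no `__exit__`. Below is a minimal sketch of a version-tolerant wrapper. It assumes the caller only needs to call the `Parallel` object, which older joblib versions also support. `parallel_session` and the two stand-in classes are hypothetical, and they are used here so the sketch runs without joblib installed.

```python
from contextlib import nullcontext


def parallel_session(parallel):
    """Use `parallel` as a context manager if it supports one, else wrap it in a no-op context."""
    if hasattr(parallel, "__enter__") and hasattr(parallel, "__exit__"):
        return parallel
    return nullcontext(parallel)  # yields the same object; nothing to clean up afterwards


# Hypothetical stand-ins for an old (no context manager) and a newer joblib.Parallel.
class OldParallel:
    def __call__(self, tasks):
        return [task() for task in tasks]


class NewParallel(OldParallel):
    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # do not suppress exceptions
```

With real joblib, the same pattern would read `with parallel_session(joblib.Parallel(n_jobs=2)) as parallel: ...`. Requiring joblib >= 0.9 outright is the simpler alternative.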

kernc changed the title from "testing.CrossValidation & co. parallelized across learners/folds/splits/iterations" to "[ENH] Parallelized cross validation & other evaluation methods" on Mar 16, 2016
kernc force-pushed the parallel-cv branch 2 times, most recently from 7e669db to 7fa3811 on March 31, 2016 at 15:04
BlazZupan removed their assignment on Apr 8, 2016
@kernc
Contributor Author

kernc commented Apr 21, 2016

Blocked on #1114.

kernc force-pushed the parallel-cv branch 5 times, most recently from 26aef6d to c02f849 on May 10, 2016 at 16:34
kernc assigned BlazZupan and astaric and unassigned BlazZupan and astaric on May 19, 2016
kernc changed the title from "[WIP] [ENH] Parallelized cross validation & other evaluation methods" to "[ENH] Parallelized cross validation & other evaluation methods" on May 19, 2016
kernc force-pushed the parallel-cv branch 4 times, most recently from 364b3bf to 90ade88 on May 20, 2016 at 18:36
@kernc
Contributor Author

kernc commented Jun 3, 2016

@BlazZupan, @astaric This is ready for another review.

Not sure why these lines aren't covered; an appropriate test has been added.

@astaric
Member

astaric commented Jun 3, 2016

For me, the GUI still freezes while the cross validation is being executed. Would it be possible to do it without blocking the GUI thread?

(The way widgets that download data, like https://github.com/biolab/orange-bio/blob/master/orangecontrib/bio/widgets3/OWBioMart.py#L710, do it?)
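One way to keep a GUI responsive is to run the evaluation on a worker thread and have the GUI thread poll for the result from its event loop, for instance from a timer. Below is a minimal standard-library sketch. `slow_cross_validation`, `start_evaluation` and `poll` are hypothetical names, and the first is only a stand-in for the real evaluation call. It does not show how the linked widget is implemented.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_cross_validation(n_folds):
    """Hypothetical stand-in for a long-running evaluation; returns one score per fold."""
    scores = []
    for fold in range(n_folds):
        time.sleep(0.01)  # simulate the cost of fitting and scoring one fold
        scores.append(fold / n_folds)
    return scores


executor = ThreadPoolExecutor(max_workers=1)


def start_evaluation(n_folds) -> Future:
    """Submit the work and return immediately, so the GUI thread is never blocked."""
    return executor.submit(slow_cross_validation, n_folds)


def poll(future, on_done):
    """Call periodically from the GUI thread; hands over the result on that same thread."""
    if future.done():
        on_done(future.result())
        return True
    return False
```

Polling is chosen over `Future.add_done_callback` because done-callbacks run on the worker thread, where GUI objects should not be touched.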

@BlazZupan
Contributor

BlazZupan commented Jun 4, 2016

Causes a RuntimeError

Can't run multiprocessing code without a __main__ guard.

It works with n_jobs=1, or with the main guard, but I guess the intended behaviour is that CrossValidation should keep working as before, unchanged, without a main guard.

@kernc
Contributor Author

kernc commented Jun 13, 2016

The default is now n_jobs=1 so it should work as before.
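Defaulting to n_jobs=1 means the serial path never starts a process, so existing scripts without a `__main__` guard keep working; only callers who opt in need the guard. Below is a minimal sketch of that shape using the standard library's `ProcessPoolExecutor`. `evaluate_folds` and `score_fold` are hypothetical names, not Orange's `CrossValidation`.

```python
from concurrent.futures import ProcessPoolExecutor


def score_fold(fold):
    """Hypothetical per-fold work; module-level so worker processes can unpickle it."""
    return fold * 10


def evaluate_folds(folds, n_jobs=1):
    """Serial by default; starts worker processes only when the caller explicitly asks."""
    folds = list(folds)
    if n_jobs == 1:
        return [score_fold(f) for f in folds]  # no processes, so no __main__ guard needed
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(score_fold, folds))  # map preserves input order
```

A script opting into parallelism would then call `evaluate_folds(range(10), n_jobs=4)` from inside `if __name__ == "__main__":`.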

@astaric joblib complains: 😬

UserWarning: Multiprocessing backed parallel loops cannot be nested below threads, setting n_jobs=1

FWIW, it doesn't stall at all with the 'fork' backend, e.g. on Linux. 😃
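According to the warning text, joblib refuses to nest process-based loops below threads and silently drops to one job. Below is a minimal sketch of the same kind of guard, written with `threading` only. It mimics the behaviour the warning describes and is not joblib's implementation. `resolve_n_jobs` is a hypothetical helper.

```python
import threading
import warnings


def resolve_n_jobs(requested):
    """Hypothetical helper: fall back to one job when called off the main thread."""
    if requested != 1 and threading.current_thread() is not threading.main_thread():
        warnings.warn(
            "process-based parallelism requested from a worker thread; using n_jobs=1",
            RuntimeWarning,
        )
        return 1
    return requested
```

This is why, in this sketch, moving the evaluation onto a background thread and running it in several processes do not combine directly.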

@kernc
Contributor Author

kernc commented Jun 29, 2016

@ajdapretnar, can you confirm this works well on Windows, and just merge it if it does? Thanks. 😄


7 participants