machine-learning.ipynb on http://pangeo.pydata.org RuntimeError #1

Open
raybellwaves opened this issue Jun 25, 2018 · 10 comments

@raybellwaves
Member

Tried to run the cell

from sklearn.externals import joblib

with joblib.parallel_backend('dask', scatter=[X, y]):
    grid_search.fit(X, y)

and got the output (it's long...)
Possibly the final RuntimeError is the main issue? It reads: Joblib backend requires either joblib >= '0.10.2' or sklearn > '0.17.1'. Please install or upgrade

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-085d2322fa37> in <module>()
      2 
      3 with joblib.parallel_backend('dask', scatter=[X, y]):
----> 4     grid_search.fit(X, y)

/opt/conda/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    637                                   error_score=self.error_score)
    638           for parameters, (train, test) in product(candidate_params,
--> 639                                                    cv.split(X, y, groups)))
    640 
    641         # if one choose to see train score, "out" will contain train score info

/opt/conda/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time

/opt/conda/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in retrieve(self)
    699                     self._output.extend(job.get(timeout=self.timeout))
    700                 else:
--> 701                     self._output.extend(job.get())
    702 
    703             except BaseException as exception:

/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in get()
    249 
    250         def get():
--> 251             return ref().result()
    252 
    253         future.get = get # monkey patch to achieve AsyncResult API

/opt/conda/lib/python3.6/site-packages/distributed/client.py in result(self, timeout)
    190                                   raiseit=False)
    191         if self.status == 'error':
--> 192             six.reraise(*result)
    193         elif self.status == 'cancelled':
    194             raise result

/opt/conda/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py in loads()
     57 def loads(x):
     58     try:
---> 59         return pickle.loads(x)
     60     except Exception:
     61         logger.info("Failed to deserialize %s", x[:10000], exc_info=True)

/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in <module>()
     38     _bases.append(ParallelBackendBase)
     39 if not _bases:
---> 40     raise RuntimeError("Joblib backend requires either `joblib` >= '0.10.2' "
     41                        " or `sklearn` > '0.17.1'. Please install or upgrade")
     42 

RuntimeError: Joblib backend requires either `joblib` >= '0.10.2'  or `sklearn` > '0.17.1'. Please install or upgrade
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d45c7b8>, <Future finished exception=CancelledError(['_fit_and_score-batch-c8bc3da59762435bb023dded3c77fb1c'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c8bc3da59762435bb023dded3c77fb1c']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d459f28>, <Future finished exception=CancelledError(['_fit_and_score-batch-c4ce3d7618034bec8f259a15b9b99b3f'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c4ce3d7618034bec8f259a15b9b99b3f']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6e527620>, <Future finished exception=CancelledError(['_fit_and_score-batch-4ca1e7b762c44a0d930e15f6c6a981f9'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-4ca1e7b762c44a0d930e15f6c6a981f9']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6edb52f0>, <Future finished exception=CancelledError(['_fit_and_score-batch-29b5dd78588d448a8eb6e33d0d7400ca'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-29b5dd78588d448a8eb6e33d0d7400ca']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6fddf950>, <Future finished exception=CancelledError(['_fit_and_score-batch-c0c51b4512904a449c9cd169b95b749e'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c0c51b4512904a449c9cd169b95b749e']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6edb11e0>, <Future finished exception=CancelledError(['_fit_and_score-batch-50ac41eee8364dcbb7b42e46ef9b0912'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-50ac41eee8364dcbb7b42e46ef9b0912']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6ed93378>, <Future finished exception=CancelledError(['_fit_and_score-batch-c20e4a9fc8654ae290286dbe6fab8c14'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c20e4a9fc8654ae290286dbe6fab8c14']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d45e048>, <Future finished exception=CancelledError(['_fit_and_score-batch-eea80eb9ac67456abbc3f6ab66742105'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-eea80eb9ac67456abbc3f6ab66742105']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6e527e18>, <Future finished exception=CancelledError(['_fit_and_score-batch-f9de1c20b4034245968ae293f0296956'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-f9de1c20b4034245968ae293f0296956']
@martindurant
Collaborator

@TomAugspurger, does this mean anything to you, perhaps a joblib/sklearn release schedule thing?

@TomAugspurger
Member

Did you import dask_ml.joblib, or import distributed.joblib first?

@raybellwaves
Member Author

raybellwaves commented Jun 25, 2018

The imports (in order) throughout the notebook are:

from dask_kubernetes import KubeCluster
from dask.distributed import Client, progress
import dask_ml.joblib  # register the distributed backend
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
import pandas as pd
from sklearn.externals import joblib
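
Since the traceback above appears to raise the RuntimeError while distributed/joblib.py is being imported inside pickle.loads, it may help to check what the worker processes can import, not just the notebook. The following is a minimal sketch under that assumption; check_importable is a hypothetical helper name, and it only reports what the process running it can import.

```python
import importlib


def check_importable(module_names):
    """Return {module_name: version string or failure message} for this process."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:
            # Covers ImportError as well as errors raised at import time,
            # such as the RuntimeError shown in the traceback above.
            report[name] = "FAILED: {}: {}".format(type(exc).__name__, exc)
        else:
            report[name] = getattr(module, "__version__", "unknown version")
    return report


if __name__ == "__main__":
    # Local check in the notebook process.
    print(check_importable(["sklearn", "joblib", "distributed.joblib"]))
    # With a connected dask.distributed Client named `client` (an assumption),
    # the same function could be run on every worker:
    #   client.run(check_importable, ["sklearn", "joblib", "distributed.joblib"])
```

If the notebook reports versions but some workers report FAILED entries, that would point at the worker image rather than the import order in the notebook.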

@TomAugspurger
Member

Thanks, that would have raised a different error anyway.

Will take a look later.

@dazzag24

Seeing the same issue in the example notebooks.

@TomAugspurger
Member

TomAugspurger commented Jul 20, 2018

Hopefully fixed by pangeo-data/helm-chart#51

You could maybe work around it by adding dask-ml to the worker-template.yaml, something like

    env:
      - name: EXTRA_CONDA_PACKAGES
        value: dask-ml

for now, but that isn't a long-term solution.
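
To see whether a workaround like this actually took effect, one option is to compare the packages visible to the notebook with those visible to each worker. This is only a sketch: find_mismatches is a hypothetical helper, the dictionaries are assumed to map package names to version strings, and the example values are made up rather than taken from the cluster in this issue.

```python
def find_mismatches(client_report, worker_reports):
    """List (worker, package, client_value, worker_value) tuples that differ.

    client_report:  {package_name: version} for the notebook process.
    worker_reports: {worker_address: {package_name: version}}.
    A package absent from a worker's report is shown as "missing".
    """
    mismatches = []
    for worker, report in sorted(worker_reports.items()):
        for package, client_value in sorted(client_report.items()):
            worker_value = report.get(package, "missing")
            if worker_value != client_value:
                mismatches.append((worker, package, client_value, worker_value))
    return mismatches


if __name__ == "__main__":
    # Made-up example values for illustration only.
    client_packages = {"dask-ml": "1.0", "scikit-learn": "2.0"}
    worker_packages = {
        "worker-a": {"scikit-learn": "2.0"},
        "worker-b": {"dask-ml": "1.0", "scikit-learn": "2.0"},
    }
    for row in find_mismatches(client_packages, worker_packages):
        print(row)
```

How the per-worker dictionaries are gathered is left open here; running a small reporting function on the workers through the distributed Client is one possibility.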

@rsignell-usgs
Member

This machine learning notebook is working fine on our http://pangeo.esipfed.org instance using this Dockerfile based solely on conda-forge.

@TomAugspurger
Member

TomAugspurger commented Jul 23, 2018 via email

@TomAugspurger
Member

Ah, of course my diagnosis is incorrect, since the example doesn't actually require dask-ml, just scikit-learn and distributed.

I'll do some further debugging...

@rsignell-usgs
Member

@TomAugspurger, we actually are using the notebook image for the workers too, so that old worker Dockerfile is misleading. The notebook environment contains dask-ml, which is required by the example notebook.
