Try dask-gateway #8
This is installed and configured; now waiting on https://controlsweb.nsls2.bnl.gov/trac/ticket/4279.
Some permissions were granted via the Trac ticket mentioned above; however, I am still not able to start a cluster. The logs are:
mrakitin@apcpu-master:/var/dask-gateway$ sudo -u dask /opt/dask-gateway/start-dask-gateway
[I 2020-02-24 10:41:07.608 DaskGateway] Starting dask-gateway-server - version 0.6.1
[D 2020-02-24 10:41:07.608 DaskGateway] Looking for /etc/dask-gateway/dask_gateway_config in /var/dask-gateway
[D 2020-02-24 10:41:07.618 DaskGateway] Loaded config file: /etc/dask-gateway/dask_gateway_config.py
[I 2020-02-24 10:41:07.627 DaskGateway] Cluster manager: 'dask_gateway_server.managers.jobqueue.slurm.SlurmClusterManager'
[I 2020-02-24 10:41:07.627 DaskGateway] Authenticator: 'dask_gateway_server.auth.DummyAuthenticator'
[D 2020-02-24 10:41:07.685 DaskGateway] Generating new cookie secret
[I 2020-02-24 10:41:07.686 DaskGateway] Generating new auth token for scheduler proxy
[I 2020-02-24 10:41:07.686 DaskGateway] Starting the Dask gateway scheduler proxy...
[I 2020-02-24 10:41:07.695 DaskGateway] Dask gateway scheduler proxy started at 'tls://apcpu-master:8786', api at 'http://127.0.0.1:52504'
[I 2020-02-24 10:41:07.994 DaskGateway] Generating new auth token for web proxy
[I 2020-02-24 10:41:07.994 DaskGateway] Starting the Dask gateway web proxy...
[I 2020-02-24 10:41:07.995 DaskGateway] Dask gateway web proxy started at 'http://apcpu-master:8080', api at 'http://127.0.0.1:38993'
[I 2020-02-24 10:41:08.001 DaskGateway] Gateway private API serving at http://127.0.0.1:58625
[D 2020-02-24 10:41:08.001 DaskGateway] Adding route '/gateway/' -> 'http://127.0.0.1:58625'
[D 2020-02-24 10:41:08.002 DaskGateway] Removed 0 expired clusters from the database
[I 2020-02-24 10:41:08.003 DaskGateway] Dask-Gateway started successfully!
[I 2020-02-24 10:41:08.003 DaskGateway] - Public address at http://apcpu-master:8080 or http://127.0.0.1:8080
[I 2020-02-24 10:41:08.003 DaskGateway] - Proxy address at tls://apcpu-master:8786 or tls://127.0.0.1:8786
[W 2020-02-24 10:42:40.640 DaskGateway] 401 GET /api/clusters/ (127.0.0.1) 0.56ms
[I 2020-02-24 10:42:40.645 DaskGateway] 200 GET /api/clusters/ (127.0.0.1) 1.43ms
[W 2020-02-24 10:42:49.178 DaskGateway] 401 POST /api/clusters/ (127.0.0.1) 0.48ms
[I 2020-02-24 10:42:49.378 DaskGateway] 201 POST /api/clusters/ (127.0.0.1) 199.05ms
[I 2020-02-24 10:42:49.378 DaskGateway] Starting cluster cd08d58f74c94e6a9a9099d84e197abc for user mrakitin...
[D 2020-02-24 10:42:49.498 DaskGateway] State update for cluster cd08d58f74c94e6a9a9099d84e197abc
[I 2020-02-24 10:42:49.501 DaskGateway] Cluster cd08d58f74c94e6a9a9099d84e197abc has started, waiting for connection
[D 2020-02-24 10:42:49.507 DaskGateway] Polling status of 1 jobs
[I 2020-02-24 10:43:09.401 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 20018.66ms
[D 2020-02-24 10:43:19.519 DaskGateway] Polling status of 1 jobs
[I 2020-02-24 10:43:29.922 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 20019.21ms
[W 2020-02-24 10:43:49.388 DaskGateway] Cluster cd08d58f74c94e6a9a9099d84e197abc startup timed out after 60.0 seconds
[I 2020-02-24 10:43:49.388 DaskGateway] Stopping cluster cd08d58f74c94e6a9a9099d84e197abc...
[D 2020-02-24 10:43:49.389 DaskGateway] Removing route '/gateway/clusters/cd08d58f74c94e6a9a9099d84e197abc'
[I 2020-02-24 10:43:49.389 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 18965.73ms
[D 2020-02-24 10:43:49.391 DaskGateway] Removing route '/cd08d58f74c94e6a9a9099d84e197abc'
[I 2020-02-24 10:43:49.484 DaskGateway] Stopped cluster cd08d58f74c94e6a9a9099d84e197abc
[I 2020-02-24 10:43:49.893 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 0.83ms
[I 2020-02-24 10:43:49.895 DaskGateway] 204 DELETE /api/clusters/cd08d58f74c94e6a9a9099d84e197abc (127.0.0.1) 0.44ms
apcpu-master:~$ conda activate dask-gateway
(/opt/conda_envs/dask-gateway) mrakitin@apcpu-master:~$ from dask_gateway import Gateway^C
(/opt/conda_envs/dask-gateway) mrakitin@apcpu-master:~$ ipython
Python 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.11.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from dask_gateway import Gateway
In [2]: gateway = Gateway("http://apcpu-master:8080")
In [3]: gateway.list_clusters()
Out[3]: []
In [4]: cluster = gateway.new_cluster()
---------------------------------------------------------------------------
GatewayClusterError Traceback (most recent call last)
<ipython-input-4-bf15802dc141> in <module>
----> 1 cluster = gateway.new_cluster()
/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in new_cluster(self, cluster_options, shutdown_on_close, **kwargs)
589 cluster_options=cluster_options,
590 shutdown_on_close=shutdown_on_close,
--> 591 **kwargs,
592 )
593
/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in __init__(self, address, proxy_address, auth, cluster_options, shutdown_on_close, asynchronous, loop, **kwargs)
759 shutdown_on_close=shutdown_on_close,
760 asynchronous=asynchronous,
--> 761 loop=loop,
762 )
763
/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in _init_internal(self, address, proxy_address, auth, cluster_options, cluster_kwargs, shutdown_on_close, asynchronous, loop, name)
851 self.status = "starting"
852 if not self.asynchronous:
--> 853 self.gateway.sync(self._start_internal)
854
855 @property
/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in sync(self, func, *args, **kwargs)
310 )
311 try:
--> 312 return future.result()
313 except BaseException:
314 future.cancel()
/opt/conda_envs/dask-gateway/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
433 raise CancelledError()
434 elif self._state == FINISHED:
--> 435 return self.__get_result()
436 else:
437 raise TimeoutError()
/opt/conda_envs/dask-gateway/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in _start_internal(self)
865 self._start_task = asyncio.ensure_future(self._start_async())
866 try:
--> 867 await self._start_task
868 except BaseException:
869 # On exception, cleanup
/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in _start_async(self)
883 # Connect to cluster
884 try:
--> 885 report = await self.gateway._wait_for_start(self.name)
886 except GatewayClusterError:
887 raise
/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in _wait_for_start(self, cluster_name)
524 raise GatewayClusterError(
525 "Cluster %r failed to start, see logs for "
--> 526 "more information" % cluster_name
527 )
528 elif report.status is ClusterStatus.STOPPED:
GatewayClusterError: Cluster 'cd08d58f74c94e6a9a9099d84e197abc' failed to start, see logs for more information
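As a reading aid for the server log above, here is a minimal sketch that measures how long each cluster sat between "Starting cluster" and "startup timed out". It uses only the standard library. The helper names `parse_line` and `startup_durations` are hypothetical and are not part of dask-gateway, and the regular expressions assume the log format shown in this issue (dask-gateway-server 0.6.1); other versions may log differently.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log in this issue:
# [I 2020-02-24 10:42:49.378 DaskGateway] message...
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"DaskGateway\] (?P<msg>.*)$"
)
START = re.compile(r"Starting cluster (\w+) for user")
TIMEOUT = re.compile(r"Cluster (\w+) startup timed out after ([\d.]+) seconds")


def parse_line(line):
    """Return (level, timestamp, message), or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return m.group("level"), ts, m.group("msg")


def startup_durations(lines):
    """Map cluster id -> seconds between 'Starting cluster' and the timeout."""
    started = {}
    durations = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _level, ts, msg = parsed
        m = START.match(msg)
        if m:
            started[m.group(1)] = ts
            continue
        m = TIMEOUT.match(msg)
        if m and m.group(1) in started:
            durations[m.group(1)] = (ts - started[m.group(1)]).total_seconds()
    return durations


# Three lines copied from the log in this issue.
sample = [
    "[I 2020-02-24 10:42:49.378 DaskGateway] Starting cluster cd08d58f74c94e6a9a9099d84e197abc for user mrakitin...",
    "[I 2020-02-24 10:42:49.501 DaskGateway] Cluster cd08d58f74c94e6a9a9099d84e197abc has started, waiting for connection",
    "[W 2020-02-24 10:43:49.388 DaskGateway] Cluster cd08d58f74c94e6a9a9099d84e197abc startup timed out after 60.0 seconds",
]
durations = startup_durations(sample)
print(durations)
```

For the log above this gives roughly 60 seconds for cluster cd08d58f74c94e6a9a9099d84e197abc, in line with the reported 60.0 second timeout. Together with the "has started, waiting for connection" message, that would be consistent with the job being submitted but the cluster never connecting back to the gateway, though the log alone does not confirm the cause.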
cc @cowanml, it's also tracked here.
@dhidas, here is some information we discussed via email.
We solved the issues during today's video call with @danielballan and @dhidas. I will post the working configuration somewhere under version control later.
https://gateway.dask.org/install-jobqueue.html
May supersede the implementation in #7.