Try dask-gateway #8

Open
mrakitin opened this issue Dec 20, 2019 · 5 comments
Comments

@mrakitin
Member

mrakitin commented Dec 20, 2019

https://gateway.dask.org/install-jobqueue.html

May supersede the implementation in #7.

@mrakitin
Member Author

dask-gateway is installed and configured; I am waiting on https://controlsweb.nsls2.bnl.gov/trac/ticket/4279.

@mrakitin
Member Author

Some permissions were granted via the Trac ticket mentioned above; however, I am still unable to start a cluster. The gateway server log and the client session are:

mrakitin@apcpu-master:/var/dask-gateway$ sudo -u dask /opt/dask-gateway/start-dask-gateway
[I 2020-02-24 10:41:07.608 DaskGateway] Starting dask-gateway-server - version 0.6.1
[D 2020-02-24 10:41:07.608 DaskGateway] Looking for /etc/dask-gateway/dask_gateway_config in /var/dask-gateway
[D 2020-02-24 10:41:07.618 DaskGateway] Loaded config file: /etc/dask-gateway/dask_gateway_config.py
[I 2020-02-24 10:41:07.627 DaskGateway] Cluster manager: 'dask_gateway_server.managers.jobqueue.slurm.SlurmClusterManager'
[I 2020-02-24 10:41:07.627 DaskGateway] Authenticator: 'dask_gateway_server.auth.DummyAuthenticator'
[D 2020-02-24 10:41:07.685 DaskGateway] Generating new cookie secret
[I 2020-02-24 10:41:07.686 DaskGateway] Generating new auth token for scheduler proxy
[I 2020-02-24 10:41:07.686 DaskGateway] Starting the Dask gateway scheduler proxy...
[I 2020-02-24 10:41:07.695 DaskGateway] Dask gateway scheduler proxy started at 'tls://apcpu-master:8786', api at 'http://127.0.0.1:52504'
[I 2020-02-24 10:41:07.994 DaskGateway] Generating new auth token for web proxy
[I 2020-02-24 10:41:07.994 DaskGateway] Starting the Dask gateway web proxy...
[I 2020-02-24 10:41:07.995 DaskGateway] Dask gateway web proxy started at 'http://apcpu-master:8080', api at 'http://127.0.0.1:38993'
[I 2020-02-24 10:41:08.001 DaskGateway] Gateway private API serving at http://127.0.0.1:58625
[D 2020-02-24 10:41:08.001 DaskGateway] Adding route '/gateway/' -> 'http://127.0.0.1:58625'
[D 2020-02-24 10:41:08.002 DaskGateway] Removed 0 expired clusters from the database
[I 2020-02-24 10:41:08.003 DaskGateway] Dask-Gateway started successfully!
[I 2020-02-24 10:41:08.003 DaskGateway] - Public address at http://apcpu-master:8080 or http://127.0.0.1:8080
[I 2020-02-24 10:41:08.003 DaskGateway] - Proxy address at tls://apcpu-master:8786 or tls://127.0.0.1:8786
[W 2020-02-24 10:42:40.640 DaskGateway] 401 GET /api/clusters/ (127.0.0.1) 0.56ms
[I 2020-02-24 10:42:40.645 DaskGateway] 200 GET /api/clusters/ (127.0.0.1) 1.43ms
[W 2020-02-24 10:42:49.178 DaskGateway] 401 POST /api/clusters/ (127.0.0.1) 0.48ms
[I 2020-02-24 10:42:49.378 DaskGateway] 201 POST /api/clusters/ (127.0.0.1) 199.05ms
[I 2020-02-24 10:42:49.378 DaskGateway] Starting cluster cd08d58f74c94e6a9a9099d84e197abc for user mrakitin...
[D 2020-02-24 10:42:49.498 DaskGateway] State update for cluster cd08d58f74c94e6a9a9099d84e197abc
[I 2020-02-24 10:42:49.501 DaskGateway] Cluster cd08d58f74c94e6a9a9099d84e197abc has started, waiting for connection
[D 2020-02-24 10:42:49.507 DaskGateway] Polling status of 1 jobs
[I 2020-02-24 10:43:09.401 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 20018.66ms
[D 2020-02-24 10:43:19.519 DaskGateway] Polling status of 1 jobs
[I 2020-02-24 10:43:29.922 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 20019.21ms
[W 2020-02-24 10:43:49.388 DaskGateway] Cluster cd08d58f74c94e6a9a9099d84e197abc startup timed out after 60.0 seconds
[I 2020-02-24 10:43:49.388 DaskGateway] Stopping cluster cd08d58f74c94e6a9a9099d84e197abc...
[D 2020-02-24 10:43:49.389 DaskGateway] Removing route '/gateway/clusters/cd08d58f74c94e6a9a9099d84e197abc'
[I 2020-02-24 10:43:49.389 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 18965.73ms
[D 2020-02-24 10:43:49.391 DaskGateway] Removing route '/cd08d58f74c94e6a9a9099d84e197abc'
[I 2020-02-24 10:43:49.484 DaskGateway] Stopped cluster cd08d58f74c94e6a9a9099d84e197abc
[I 2020-02-24 10:43:49.893 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 0.83ms
[I 2020-02-24 10:43:49.895 DaskGateway] 204 DELETE /api/clusters/cd08d58f74c94e6a9a9099d84e197abc (127.0.0.1) 0.44ms
apcpu-master:~$ conda activate dask-gateway
(/opt/conda_envs/dask-gateway) mrakitin@apcpu-master:~$ from dask_gateway import Gateway^C
(/opt/conda_envs/dask-gateway) mrakitin@apcpu-master:~$ ipython
Python 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:33:48)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.11.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from dask_gateway import Gateway

In [2]: gateway = Gateway("http://apcpu-master:8080")

In [3]: gateway.list_clusters()
Out[3]: []

In [4]: cluster = gateway.new_cluster()
---------------------------------------------------------------------------
GatewayClusterError                       Traceback (most recent call last)
<ipython-input-4-bf15802dc141> in <module>
----> 1 cluster = gateway.new_cluster()

/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in new_cluster(self, cluster_options, shutdown_on_close, **kwargs)
    589             cluster_options=cluster_options,
    590             shutdown_on_close=shutdown_on_close,
--> 591             **kwargs,
    592         )
    593

/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in __init__(self, address, proxy_address, auth, cluster_options, shutdown_on_close, asynchronous, loop, **kwargs)
    759             shutdown_on_close=shutdown_on_close,
    760             asynchronous=asynchronous,
--> 761             loop=loop,
    762         )
    763

/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in _init_internal(self, address, proxy_address, auth, cluster_options, cluster_kwargs, shutdown_on_close, asynchronous, loop, name)
    851             self.status = "starting"
    852         if not self.asynchronous:
--> 853             self.gateway.sync(self._start_internal)
    854
    855     @property

/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in sync(self, func, *args, **kwargs)
    310             )
    311             try:
--> 312                 return future.result()
    313             except BaseException:
    314                 future.cancel()

/opt/conda_envs/dask-gateway/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    433                 raise CancelledError()
    434             elif self._state == FINISHED:
--> 435                 return self.__get_result()
    436             else:
    437                 raise TimeoutError()

/opt/conda_envs/dask-gateway/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in _start_internal(self)
    865             self._start_task = asyncio.ensure_future(self._start_async())
    866         try:
--> 867             await self._start_task
    868         except BaseException:
    869             # On exception, cleanup

/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in _start_async(self)
    883         # Connect to cluster
    884         try:
--> 885             report = await self.gateway._wait_for_start(self.name)
    886         except GatewayClusterError:
    887             raise

/opt/conda_envs/dask-gateway/lib/python3.7/site-packages/dask_gateway/client.py in _wait_for_start(self, cluster_name)
    524                     raise GatewayClusterError(
    525                         "Cluster %r failed to start, see logs for "
--> 526                         "more information" % cluster_name
    527                     )
    528                 elif report.status is ClusterStatus.STOPPED:

GatewayClusterError: Cluster 'cd08d58f74c94e6a9a9099d84e197abc' failed to start, see logs for more information
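
For reference, the lifecycle of the failed cluster can be pulled out of the gateway log with a few lines of Python. This is only a sketch: the regular expression assumes the `[LEVEL date time DaskGateway] message` prefix seen in the log above, and `parse_line` and `cluster_timeline` are hypothetical helper names, not part of dask-gateway. The sample lines are copied from the log in this comment.

```python
import re
from datetime import datetime

# Assumed line prefix, as seen in the log above, e.g.
# "[W 2020-02-24 10:43:49.388 DaskGateway] message"
LOG_LINE = re.compile(
    r"^\[(?P<level>[DIWEC]) "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<app>\w+)\] (?P<msg>.*)$"
)


def parse_line(line):
    """Return (level, timestamp, message), or None for non-log lines."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return match["level"], ts, match["msg"]


def cluster_timeline(lines, cluster_id):
    """Collect messages that mention cluster_id, skipping HTTP access entries."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        level, ts, msg = parsed
        if cluster_id in msg and "/api/" not in msg:
            events.append((ts, level, msg))
    return events


cluster_id = "cd08d58f74c94e6a9a9099d84e197abc"
sample = [
    "[I 2020-02-24 10:42:49.378 DaskGateway] Starting cluster cd08d58f74c94e6a9a9099d84e197abc for user mrakitin...",
    "[I 2020-02-24 10:42:49.501 DaskGateway] Cluster cd08d58f74c94e6a9a9099d84e197abc has started, waiting for connection",
    "[I 2020-02-24 10:43:09.401 DaskGateway] 200 GET /api/clusters/cd08d58f74c94e6a9a9099d84e197abc?wait (127.0.0.1) 20018.66ms",
    "[W 2020-02-24 10:43:49.388 DaskGateway] Cluster cd08d58f74c94e6a9a9099d84e197abc startup timed out after 60.0 seconds",
]

events = cluster_timeline(sample, cluster_id)
# Seconds between the first and the last lifecycle message for this cluster
elapsed = (events[-1][0] - events[0][0]).total_seconds()
print(f"{len(events)} events, {elapsed:.2f} s from start request to timeout")
```

On these lines the gap between "Starting cluster" and the timeout warning comes out at about 60 seconds, matching the "timed out after 60.0 seconds" message: the gateway reports the cluster as started, but the connection it waits for is not logged before the deadline.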

@mrakitin
Member Author

cc @cowanml, it's also tracked here.

@mrakitin
Member Author

@dhidas, here is some information we discussed via email.

@mrakitin
Member Author

We resolved the issues during today's video call with @danielballan and @dhidas. I will post the working configuration under version control later.
