This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

[QST] test_numpy() fail with "rabit::Init is already called in this thread" #47

Open
ksangeek opened this issue Jul 1, 2019 · 2 comments

@ksangeek
Contributor

ksangeek commented Jul 1, 2019

I am using dask-xgboost 0.1.7 with xgboost 0.82.
test_core.py::test_numpy was failing for me, so I looked into the failure; my understanding so far is below. I am a bit amused, as these tests were passing for me last week and, AFAIR, with the same package versions!
I need some help to understand what is going on here.

  1. test_core.py::test_numpy failed with "rabit::Init is already called in this thread". These are the details from pdb:
$ pytest test_core.py::test_numpy
====================================== test session starts =======================================
platform linux -- Python 3.6.8, pytest-4.6.2, py-1.8.0, pluggy-0.12.0
rootdir: ./tests
plugins: cov-2.7.1, forked-1.0.2, xdist-1.28.0
collected 1 item

test_core.py
>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> ./tests/test_core.py(200)test_numpy()
-> dX = da.from_array(X, chunks=(2, 2))
(Pdb) n
> ./tests/test_core.py(201)test_numpy()
-> dy = da.from_array(y, chunks=(2,))
(Pdb)
> ./tests/test_core.py(202)test_numpy()
-> dbst = yield dxgb.train(c, param, dX, dy)
(Pdb)
[08:42:34] Tree method is automatically selected to be 'approx' for distributed training.
[08:42:34] Tree method is automatically selected to be 'approx' for distributed training.

> ./tests/test_core.py(203)test_numpy()
-> dbst = yield dxgb.train(c, param, dX, dy)  # we can do this twice
(Pdb)
[08:42:38] Tree method is automatically selected to be 'approx' for distributed training.
[08:42:38] Tree method is automatically selected to be 'approx' for distributed training.

> ./tests/test_core.py(205)test_numpy()
-> predictions = dxgb.predict(c, dbst, dX)
(Pdb)
rabit::Init is already called in this thread
  2. After seeing the comment # workaround for "Doing rabit call after Finalize" in the test case, I attempted to fix it with this change:
@@ -179,6 +179,7 @@ def test_dmatrix_kwargs(c, s, a, b):


 def _test_container(dbst, predictions, X_type):
+    xgb.rabit.init()  # workaround for "Doing rabit call after Finalize"
     dtrain = xgb.DMatrix(X_type(X), label=y)
     bst = xgb.train(param, dtrain)

@@ -195,7 +196,6 @@ def _test_container(dbst, predictions, X_type):

 @gen_cluster(client=True, timeout=None, check_new_threads=False)
 def test_numpy(c, s, a, b):
-    xgb.rabit.init()  # workaround for "Doing rabit call after Finalize"
     dX = da.from_array(X, chunks=(2, 2))
     dy = da.from_array(y, chunks=(2,))
     dbst = yield dxgb.train(c, param, dX, dy)

With this change, this particular test case passes, but it does not fix the failures when the whole test script is run. That still fails like this:

$ pytest
======================================================================================== test session starts =========================================================================================
platform linux -- Python 3.6.8, pytest-4.6.2, py-1.8.0, pluggy-0.12.0 -- ./anaconda3/envs/test-dask-xgb/bin/python
cachedir: .pytest_cache
rootdir: ./sandbox/dask-xgboost, inifile: setup.cfg
plugins: cov-2.7.1, forked-1.0.2, xdist-1.28.0
[gw0] linux Python 3.6.8 cwd: ./sandbox/dask-xgboost/dask_xgboost/tests
[gw0] Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:34:02)  -- [GCC 7.3.0]
gw0 [12]
scheduling tests via LoadScheduling
[gw0] [  8%] PASSED test_core.py::test_basic
[gw0] [ 16%] PASSED test_core.py::test_dmatrix_kwargs
[gw0] [ 25%] FAILED test_core.py::test_numpy
[gw0] [ 33%] FAILED test_core.py::test_scipy_sparse
[gw0] [ 41%] FAILED test_core.py::test_sparse
[gw0] [ 50%] PASSED test_core.py::test_errors
[gw0] [ 58%] FAILED test_core.py::test_classifier
[gw0] [ 66%] FAILED test_core.py::test_multiclass_classifier
[gw0] [ 75%] FAILED test_core.py::test_classifier_multi[array]
[gw0] [ 83%] FAILED test_core.py::test_classifier_multi[dataframe]
[gw0] [ 91%] FAILED test_core.py::test_regressor
[gw0] [100%] FAILED test_core.py::test_synchronous_api
./anaconda3/envs/test-dask-xgb/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 6 leaked semaphores to clean up at shutdown
..
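
For readers following along, here is a minimal sketch of the failure mode described above, and of one possible way to keep init and finalize calls balanced. It does not use xgboost at all: FakeRabit is a hypothetical stand-in that only mimics the per-thread "already called" check reported in this issue, and rabit_session is an illustrative helper, not part of dask-xgboost or xgboost. Whether balancing the calls like this would fix the test suite here has not been verified.

```python
import threading
from contextlib import contextmanager


class FakeRabit:
    """Hypothetical stand-in for xgb.rabit (assumption, not the real API).

    It only mimics the per-thread "init already called" check.
    """

    def __init__(self):
        self._state = threading.local()

    def init(self):
        # Raise if init() was already called in this thread without finalize().
        if getattr(self._state, "active", False):
            raise RuntimeError("rabit::Init is already called in this thread")
        self._state.active = True

    def finalize(self):
        self._state.active = False


@contextmanager
def rabit_session(rabit):
    """Illustrative helper: init() on entry, always finalize() on exit."""
    rabit.init()
    try:
        yield rabit
    finally:
        rabit.finalize()


rabit = FakeRabit()

# Two init() calls with no finalize() in between reproduce the error message.
rabit.init()
try:
    rabit.init()
    double_init_error = None
except RuntimeError as exc:
    double_init_error = str(exc)
rabit.finalize()

# Balanced sessions can be entered repeatedly in the same thread.
for _ in range(2):
    with rabit_session(rabit):
        pass
sessions_ok = True
```

In a real test suite the same idea could be expressed as a fixture wrapping each test, but that would depend on how the workers themselves call rabit during dxgb.train, which this sketch does not model.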
@TomAugspurger
Member

TomAugspurger commented Jul 1, 2019 via email

@ksangeek
Contributor Author

ksangeek commented Jul 2, 2019

@TomAugspurger Thanks for the link to your attempt.
Judging from the comments in #39 (comment), I expect that the work being done on low-level integration of dask in xgboost will not suffer from this issue.
