xgboost does not train from existing model in distributed environment #70
Comments
Do you know why that might be? Do things work if you use the native dask integration? https://xgboost.readthedocs.io/en/latest/tutorials/dask.html
I don't know why. The native dask integration from that link can train from an existing model. However, I have a different problem with it: in a distributed environment its performance is no better than random, versus good performance from dask-xgboost with the same parameters and data.
I'm not sure why that would be, but the usual recommendation is to create a
performance report:
https://distributed.dask.org/en/latest/diagnosing-performance.html#performance-reports
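The report mentioned above is generated with the `performance_report` context manager from `dask.distributed`. A self-contained sketch, using an in-process client and a toy workload as placeholders for a real cluster and training call:

```python
import dask.array as da
from dask.distributed import Client, performance_report


def profiled_sum(report_path="dask-report.html"):
    """Run a toy computation and write a Dask performance report to `report_path`."""
    # An in-process client keeps the sketch self-contained; in practice you
    # would connect to your existing distributed scheduler instead.
    with Client(processes=False) as client:
        with performance_report(filename=report_path):
            x = da.random.random((1_000, 1_000), chunks=(250, 250))
            return float(x.sum().compute())
```

Opening the resulting HTML file shows the task stream, worker profiles, and bandwidth, which is usually enough to see whether workers are idle or busy.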
I can create a performance report, but the thing is that training does not seem to happen and never finishes, even if I build just one tree. CPU usage on all workers is 2-6% (vs. ~100% CPU usage if I remove the parameter).
Just a blind guess, have you tried deleting … ? They will be wherever … .
Thanks for the suggestion. I had …
When continuing training xgboost from an existing model in a distributed environment with more than 3 workers, xgboost does not train: nothing happens on the workers and the call never finishes. But on a local cluster, or on a distributed cluster with fewer than 3 workers, training runs and completes.
import dask_xgboost as dxgb

dxgb.train(client, params, X_train, y_train,
           xgb_model=existing_model, ...)