Verify the benchmark of XgboostClassifier with initial xgboost #44

xiaozhongtian · 2019-06-21T16:14:01Z

Hello,
I think I may have found a bug in the XGBClassifier from dask_ml.xgboost.

```python
from sklearn.datasets import load_iris
import dask.dataframe as dd
import pandas as pd

dataset = load_iris()
train = dataset.data
target = dataset.target

pdf = pd.DataFrame(data=train, columns=["1", "2", "3", "4"])
pdf_y = pd.Series(target)

# Turn the multi-class problem into a binary one so the bug is easy to see.
pdf_y.replace(2, 1, inplace=True)

from xgboost import XGBClassifier

est = XGBClassifier(n_estimators=30, max_depth=7, verbosity=0, learning_rate=0.1)

est.fit(pdf, pdf_y)
est.score(pdf, pdf_y)
```

With plain (single-machine) xgboost, we easily get 100% accuracy.

```python
from dask_ml.xgboost import XGBClassifier
from distributed import Client

client = Client()
est = XGBClassifier(n_estimators=30, max_depth=7, verbosity=1, learning_rate=0.1)
df = dd.from_pandas(pdf, chunksize=640000)
df_y = dd.from_pandas(pdf_y, chunksize=640000).astype(int)
est.fit(df, df_y)
est.score(df, df_y)
```

With the same parameters and the same data, we only get 66% accuracy. The problem is that predict() returns 1 for every row, so the 66% figure is meaningless.
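The 66% is exactly what a constant predictor would score: after relabelling class 2 as 1, two thirds of the iris labels are 1. A quick pure-Python check of that arithmetic, assuming only the standard iris layout of 50 samples per class (no xgboost or dask involved):

```python
# Iris has 150 samples: 50 each of classes 0, 1 and 2.
labels = [0] * 50 + [1] * 50 + [2] * 50

# Same relabelling as pdf_y.replace(2, 1, inplace=True): class 2 becomes 1.
binary = [1 if y == 2 else y for y in labels]

# Accuracy of a constant "model" that always answers 1.
always_one = [1] * len(binary)
accuracy = sum(p == y for p, y in zip(always_one, binary)) / len(binary)
print(binary.count(1), len(binary), round(accuracy, 4))  # 100 150 0.6667
```

So a score of about 0.667 on this data says nothing more than "the model predicts the majority class everywhere".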

This is a minimal example that reproduces the bug. I tested my own project on the Titanic dataset and it shows the same problem.

```python
est.predict(df).compute()
```

This returns 1 for every row of df.
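To check whether a fitted model has collapsed to a single class, one can compare its accuracy with the majority-class baseline and count the distinct predicted labels. A minimal sketch; `summarize_predictions` is a hypothetical helper (not part of dask-ml or xgboost), and it assumes predictions and labels have already been brought into memory, e.g. with `.compute()`:

```python
from collections import Counter


def summarize_predictions(y_true, y_pred):
    """Hypothetical helper: compare accuracy with the majority-class baseline.

    Returns (accuracy, baseline, predicted_counts), where baseline is the
    accuracy of always predicting the most frequent true label.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    baseline = Counter(y_true).most_common(1)[0][1] / n
    return accuracy, baseline, Counter(y_pred)


# Toy example: a model that always predicts 1.
y_true = [0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 1, 1]

accuracy, baseline, counts = summarize_predictions(y_true, y_pred)
# A single predicted class and no gain over the baseline suggests a collapsed model.
collapsed = len(counts) == 1 and accuracy <= baseline
print(accuracy == baseline, collapsed, dict(counts))  # True True {1: 5}
```

With the objects from the report, the call would be `summarize_predictions(df_y.compute(), est.predict(df).compute())`.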


TomAugspurger · 2019-06-21T19:38:28Z

Does the same issue affect distributed XGBoost without dask (e.g. https://xgboost.readthedocs.io/en/release_0.72/tutorials/aws_yarn.html)?

xiaozhongtian · 2019-06-21T19:51:52Z

I haven't tried it yet; I may try it next Monday.
However, I found that https://xgboost.readthedocs.io/en/release_0.72/tutorials/aws_yarn.html no longer exists in the latest version of the docs:
https://xgboost.readthedocs.io/en/latest/tutorials/aws_yarn.html
That's really interesting.

xiaozhongtian changed the title ~~Verify the benchmark of gboostClissfier~~ Verify the benchmark of XgboostClissfier with initial xgboost Jun 21, 2019

xiaozhongtian changed the title ~~Verify the benchmark of XgboostClissfier with initial xgboost~~ Verify the benchmark of XgboostClassifier with initial xgboost Jun 21, 2019
