This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Verify the benchmark of XgboostClassifier with initial xgboost #44

Open
xiaozhongtian opened this issue Jun 21, 2019 · 2 comments

@xiaozhongtian

xiaozhongtian commented Jun 21, 2019

Hello,
I think I have found a bug in the XGBClassifier from dask_ml.xgboost.

from sklearn.datasets import load_iris
import dask.dataframe as dd
import pandas as pd
dataset = load_iris()
train = dataset.data
target = dataset.target

pdf = pd.DataFrame(data=train, columns=["1", "2", "3", "4"])
pdf_y = pd.Series(target)

# Collapse the multi-class problem into a binary one (class 2 becomes class 1)
# so the bug is easier to see.
pdf_y.replace(2, 1, inplace=True)

from xgboost import XGBClassifier
est = XGBClassifier(n_estimators=30, max_depth=7, verbosity=0, learning_rate=0.1)

est.fit(pdf, pdf_y)
est.score(pdf, pdf_y)

With the original (non-distributed) xgboost, we easily get 100% accuracy.

from dask_ml.xgboost import XGBClassifier
from distributed import Client

client = Client()
est = XGBClassifier(n_estimators=30, max_depth=7, verbosity=1, learning_rate=0.1)
df = dd.from_pandas(pdf, chunksize=640000)
df_y = dd.from_pandas(pdf_y, chunksize=640000).astype(int)
est.fit(df, df_y)
est.score(df, df_y)

With the same parameters and the same data, we only get 66% accuracy. The problem is that the estimator's predict() returns 1 for every row, so the 66% figure is meaningless.

This is a simple example to show the bug. I have also tested it in my project with the Titanic dataset, and it has the same problem.

est.predict(df).compute()
returns 1 for every row of df.
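
A quick way to see why the score comes out at about 66% is to compute the accuracy of a constant predictor on the relabelled target. The sketch below is only an illustration and does not call xgboost or dask at all. It assumes the standard iris target with 50 samples per class, and the names binary_target and constant_predictions are made up for this example.

```python
from collections import Counter

# Assumption: the standard iris target has 50 samples for each of classes 0, 1, 2.
target = [0] * 50 + [1] * 50 + [2] * 50

# Same relabelling as in the report: class 2 is merged into class 1.
binary_target = [1 if label == 2 else label for label in target]

# A degenerate "model" that predicts 1 for every row, as observed with predict().
constant_predictions = [1] * len(binary_target)

class_counts = Counter(binary_target)
matches = sum(p == y for p, y in zip(constant_predictions, binary_target))
accuracy = matches / len(binary_target)

print(class_counts)        # 100 rows labelled 1, 50 rows labelled 0
print(round(accuracy, 4))  # 0.6667
```

So a score of roughly 0.66 here is what you would get from always predicting the majority class, which is consistent with the distributed estimator returning 1 for every row.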

@xiaozhongtian xiaozhongtian changed the title Verify the benchmark of gboostClissfier Verify the benchmark of XgboostClissfier with initial xgboost Jun 21, 2019
@xiaozhongtian xiaozhongtian changed the title Verify the benchmark of XgboostClissfier with initial xgboost Verify the benchmark of XgboostClassifier with initial xgboost Jun 21, 2019
@TomAugspurger
Member

Does the same issue affect distributed XGBoost without dask (e.g. https://xgboost.readthedocs.io/en/release_0.72/tutorials/aws_yarn.html)?

@xiaozhongtian
Author

xiaozhongtian commented Jun 21, 2019

I haven't tried it yet; maybe I will try next Monday.
However, I found that https://xgboost.readthedocs.io/en/release_0.72/tutorials/aws_yarn.html does not exist in the latest version of the docs:
https://xgboost.readthedocs.io/en/latest/tutorials/aws_yarn.html
That's really interesting.
