Hello,
I think I may have found a bug in the XGBClassifier from dask_ml.xgboost.
from sklearn.datasets import load_iris
import dask.dataframe as dd
import pandas as pd

dataset = load_iris()
train = dataset.data
target = dataset.target
pdf = pd.DataFrame(data=train, columns=["1", "2", "3", "4"])
pdf_y = pd.Series(target)
# collapse the multi-class problem to a binary one to make the bug easier to show
pdf_y.replace(2, 1, inplace=True)
from xgboost import XGBClassifier

est = XGBClassifier(n_estimators=30, max_depth=7, verbosity=0, learning_rate=0.1)
est.fit(pdf, pdf_y)
est.score(pdf, pdf_y)
With plain xgboost, we easily get 100% accuracy.
from dask_ml.xgboost import XGBClassifier
from distributed import Client

client = Client()
est = XGBClassifier(n_estimators=30, max_depth=7, verbosity=1, learning_rate=0.1)
df = dd.from_pandas(pdf, chunksize=640000)
df_y = dd.from_pandas(pdf_y, chunksize=640000).astype(int)
est.fit(df, df_y)
est.score(df, df_y)
With the same parameters and the same data, we only get 66% accuracy, and the problem is that predict() returns 1 all the time, so the 66% score is meaningless. This is a simple example that shows the bug; I have also tested my own project on the Titanic dataset and it has the same problem.
est.predict(df).compute()
returns 1 for every row of df.
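A quick sanity check (a minimal sketch, assuming est.predict(df) returns a dask collection that can be computed to a NumPy array, as above) is to compare the distribution of predicted labels with the true labels. Note that always predicting 1 scores exactly 100/150 ≈ 66.7% here, since two thirds of the iris labels are 1 after replacing 2 with 1, which matches the 66% reported by score().

preds = est.predict(df).compute()       # predicted labels as a NumPy array
print(pd.Series(preds).value_counts())  # only the value 1 appears
print(pdf_y.value_counts())             # true labels: 100 ones and 50 zeros
print((pdf_y == 1).mean())              # ~0.667, the accuracy of a constant-1 predictor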