ColumnTransformer horizontally stacks the output #1000
Comments
Thanks for the bug report. Is the `reset_index(drop=True)` component necessary to reproduce that?
Let us know if you’re able to look into this some more.
On Aug 30, 2024, at 4:07 AM, reinierstorm wrote:
The dask ColumnTransformer stacks the different transformers. The following code (essentially #365) gives an undesirable output:
import pandas as pd
import dask.dataframe as dd
import dask_ml.compose
import dask_ml.preprocessing

df = pd.DataFrame({"A": pd.Categorical(["a", "a", "b", "a"]), "B": [1.0, 2, 4, 5]})
ddf = dd.from_pandas(df, npartitions=2).reset_index(drop=True)
ct = dask_ml.compose.ColumnTransformer(
    [
        ("A", dask_ml.preprocessing.OneHotEncoder(dtype='uint8'), ['A']),  # Example categorical feature
        ("B", dask_ml.preprocessing.RobustScaler(), ['B']),  # Numeric features
    ],
)
ct.fit_transform(ddf).compute()
The output I get is:
   A_a  A_b         B
0  1.0    0       NaN
1  1.0    0       NaN
0    0  1.0       NaN
1  1.0    0       NaN
0  NaN  NaN -1.000000
1  NaN  NaN -0.666667
0  NaN  NaN  0.000000
1  NaN  NaN  0.333333
The output should be like that of #365:
   A_a  A_b         B
0  1.0  0.0 -1.000000
1  1.0  0.0 -0.666667
0  0.0  1.0  0.000000
1  1.0  0.0  0.333333
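For anyone comparing the two tables: the reported output has the shape of a row-wise concatenation (8 rows, NaN blocks), while the expected output has the shape of a column-wise one (4 rows, no NaN). The pandas-only sketch below reproduces both shapes. The frames `encoded` and `scaled` are hypothetical stand-ins built from the values in this report, with a default index rather than the repeated 0/1 index shown above. It only illustrates the symptom and is not a claim about what dask-ml does internally.

```python
import pandas as pd

# Hypothetical stand-ins for the two transformer outputs (values taken from the report).
encoded = pd.DataFrame({"A_a": [1.0, 1.0, 0.0, 1.0], "A_b": [0.0, 0.0, 1.0, 0.0]})
scaled = pd.DataFrame({"B": [-1.0, -0.666667, 0.0, 0.333333]})

# Row-wise concatenation: columns are unioned and the missing cells become NaN,
# which matches the shape of the reported output.
stacked_rows = pd.concat([encoded, scaled], axis=0)

# Column-wise concatenation: rows are aligned on the index,
# which matches the shape of the expected output.
side_by_side = pd.concat([encoded, scaled], axis=1)

print(stacked_rows.shape, side_by_side.shape)
```

If the real transformer outputs end up with indexes that do not align, a column-wise concatenation can also introduce NaN values, so checking the index and divisions of each intermediate result may be a useful first debugging step.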
Environment:
dask-ml version: 2024.4.4
dask version: 2024.8.1
Python version: 3.10.14
Operating System: Ubuntu 23.04
Install method (conda, pip, source): pip
No, it is not.
Yes, I am able to look into this some more.