I have a training pipeline that tunes the imputation method as a hyperparameter.
The pipeline fails because sklearn's train_test_split(stratify=stratify_data) does not stratify adequately when the stratification columns contain NaN values.
I'm curious whether this seems like a scikit-lego feature people would want.
For more context, here's my attempt to stratify on columns that contain some NaNs. I am a beginner, so I'm open to better ideas, or to comments if this feature request is out of scope. Thanks in advance! I appreciate everyone's contributions to this package!
Strat attempt:
from sklearn.model_selection import train_test_split

X = result_df[feature_cols]
y = result_df['strokes_to_hole_out']
# Extract the columns for stratification
stratify_cols = ['from_location_scorer', 'from_location_laser']
stratify_data = result_df[stratify_cols]
# Split the data, using 'stratify_data' for stratification
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42, stratify=stratify_data)
Error I receive during training: Trial failed with exception: Found unknown categories ['blue'] in column 9 during transform
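One possible workaround, sketched here rather than offered as a scikit-lego or scikit-learn feature, is to treat NaN as its own category and collapse the stratification columns into a single label per row before splitting. The helper name `make_stratify_key`, the `"__missing__"` placeholder, and the toy `result_df` below are all assumptions made up for illustration; only the column names and the `train_test_split` arguments come from the attempt above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_stratify_key(df, cols, missing_label="__missing__"):
    """Hypothetical helper: one string label per row, with NaN as its own category."""
    # Replace NaN with a placeholder so missing values form a stratum of their own.
    filled = df[cols].astype(object).where(df[cols].notna(), missing_label)
    # Join the columns so the split stratifies on their combination.
    return filled.astype(str).agg("|".join, axis=1)


# Toy stand-in for result_df; the values are invented for this sketch.
result_df = pd.DataFrame({
    "from_location_scorer": ["green", "fairway", np.nan, "green", "fairway", np.nan] * 5,
    "from_location_laser": ["green", np.nan, "rough", "green", np.nan, "rough"] * 5,
    "distance": np.arange(30, dtype=float),
    "strokes_to_hole_out": np.tile([1, 2, 3], 10),
})

feature_cols = ["from_location_scorer", "from_location_laser", "distance"]
stratify_cols = ["from_location_scorer", "from_location_laser"]

X = result_df[feature_cols]
y = result_df["strokes_to_hole_out"]
strat_key = make_stratify_key(result_df, stratify_cols)

# The features keep their NaNs for the imputer; only the key is filled.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=strat_key
)
```

Because the key is built separately, the NaNs in `X` are left untouched for the imputation step. Note that `train_test_split` raises an error when a stratum has only one member, so very rare combinations may need to be merged into a shared bucket first. If the "unknown categories" error comes from a `OneHotEncoder` inside the pipeline, setting `handle_unknown="ignore"` on that encoder is another option to consider.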