Add stratify split support for multilabel #134
Labels
3 - Quality of Life: Not a priority
enhancement: New feature or request
NLP: Issue related to the NLP template
NUM: Issue related to the NUM template
Problem
We provide a stratified split function, but it only works on a single column.
In some multilabel problems, we would like to stratify our dataset on several columns.
Let's say I have 30% 1s for label A and 60% 1s for label B. I would like to split my dataset into several sets and still have these proportions in each new set. Today I can only apply stratification on a single label.
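To make the expected behaviour concrete, here is a minimal sketch of one possible approach: stratify on the combination of values across the label columns. The helper name `multi_column_stratified_split`, its parameters, and the toy data are hypothetical and are not the existing `stratified_split` signature. Rows are plain dicts only to keep the sketch dependency-free; the same grouping idea would apply to DataFrame columns.

```python
import random
from collections import defaultdict


def multi_column_stratified_split(rows, columns, test_ratio=0.25, seed=42):
    """Hypothetical helper: split rows so that each combination of values
    taken on `columns` keeps roughly the same share in both subsets."""
    rng = random.Random(seed)

    # Group row indices by the tuple of values on the stratification columns.
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[tuple(row[col] for col in columns)].append(idx)

    train_idx, test_idx = [], []
    for indices in groups.values():
        # Shuffle inside each group, then send the same fraction to the test set.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [rows[i] for i in sorted(train_idx)]
    test = [rows[i] for i in sorted(test_idx)]
    return train, test


def positive_rate(rows, column):
    """Share of rows where `column` equals 1."""
    return sum(row[column] for row in rows) / len(rows)


# Toy data (assumed for illustration): 30% of 1s for label A, 60% for label B.
counts = {(1, 1): 20, (1, 0): 10, (0, 1): 40, (0, 0): 30}
rows = [{"A": a, "B": b} for (a, b), n in counts.items() for _ in range(n)]

train, test = multi_column_stratified_split(rows, ["A", "B"], test_ratio=0.2)
```

On this toy data both subsets keep the 30% / 60% proportions. One caveat of this approach: with many label columns some combinations become very rare, and small groups can no longer be split proportionally, so another strategy (for instance iterative stratification) may be needed in that case.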
Concerned template
Solution
Adapt the stratified_split function to allow stratifying on several columns.
Additional context
Also update the 0_split_train_valid_test.py script.