Add stratify split support for multilabel #134

Open
2 of 5 tasks
LexABzH opened this issue Jan 25, 2023 · 0 comments
Labels
3 - Quality of Life Not a priority enhancement New feature or request NLP Issue related to the NLP template NUM Issue related to the NUM template

Comments

@LexABzH
Collaborator

LexABzH commented Jan 25, 2023

Problem

We provide a stratified split function, but it only works on a single column.
In some multilabel problems, we would like to stratify the dataset on several columns.

Say 30% of the rows have label A set to 1, and 60% have label B set to 1. I would like to split my dataset into several sets and keep these proportions in each new set. Today I can only apply stratification on a single label.

Concerned template

  • NLP template
  • NUM template
  • VISION template
  • API template
  • How templates are generated - Jinja

Solution

Adapt the stratified_split function to allow several columns to stratify on.
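
One possible approach, sketched below under stated assumptions, is to stratify on the combination of label values rather than on each label separately. This is not the current `stratified_split` implementation: the function name `multilabel_stratified_split`, its parameters, and the list-of-dicts input are all hypothetical and only illustrate the idea with the standard library.

```python
import random
from collections import defaultdict


def multilabel_stratified_split(rows, label_cols, test_size=0.2, seed=42):
    """Hypothetical helper: split rows so that every combination of
    values in `label_cols` keeps (roughly) the same share in both sets."""
    rng = random.Random(seed)

    # Group row indices by the tuple of values on all stratification columns.
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[tuple(row[col] for col in label_cols)].append(idx)

    train_idx, test_idx = [], []
    for indices in groups.values():
        # Shuffle inside each group, then cut it at the requested ratio.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Restore the original row order inside each set.
    train = [rows[i] for i in sorted(train_idx)]
    test = [rows[i] for i in sorted(test_idx)]
    return train, test


def rate(rows, col):
    """Share of rows where `col` equals 1."""
    return sum(row[col] for row in rows) / len(rows)


# Toy dataset matching the example above: 30% of 1s for A, 60% of 1s for B.
data = (
    [{"A": 1, "B": 1}] * 20
    + [{"A": 1, "B": 0}] * 10
    + [{"A": 0, "B": 1}] * 40
    + [{"A": 0, "B": 0}] * 30
)

train_set, test_set = multilabel_stratified_split(data, ["A", "B"], test_size=0.2)
print(rate(train_set, "A"), rate(train_set, "B"))
print(rate(test_set, "A"), rate(test_set, "B"))
```

On this toy dataset both sets keep the 30% / 60% proportions. The limitation of this approach is that rare label combinations (groups with only a handful of rows) cannot be split evenly, so with many labels another strategy such as iterative stratification may be needed.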

Additional context

Also update the scripts 0_split_train_valid_test.py.

@LexABzH LexABzH added enhancement New feature or request NLP Issue related to the NLP template NUM Issue related to the NUM template 3 - Quality of Life Not a priority labels Jan 25, 2023