
Suggested configuration for training NN models on highly imbalanced datasets #229

Open
henrykang7177 opened this issue Nov 30, 2022 · 1 comment

henrykang7177 commented Nov 30, 2022

Hi!

I have a binary classification dataset with a highly imbalanced label distribution (pos : neg = 1 : 200).

I tried applying the BERT code from the Neural Network Quick Start Tutorial directly to this dataset, with the validation metric set to "Macro-F1", but the trained model mostly predicts all negatives.

I am wondering whether there are parameters or configurations I could tune in LibMultiLabel to improve the model's performance on such an imbalanced dataset.

For your reference:

I also tried the linear method, where using train_cost_sensitive instead of train_1vsrest noticeably improved this issue: with train_cost_sensitive, the model predicts four times as many positive samples as with train_1vsrest. Both methods reach Micro-F1 and P@1 close to 0.99 (due to the dominating negative samples) but only around 0.5 Macro-F1. Roughly what I ran is sketched below.
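For concreteness, this is roughly how I compared the two linear trainers (a minimal sketch based on the linear quick start; the data paths are placeholders for my own dataset, and the exact API may differ across LibMultiLabel versions):

```python
import libmultilabel.linear as linear

# Load and vectorize the data (paths are placeholders for my dataset).
preprocessor = linear.Preprocessor(data_format='txt')
datasets = preprocessor.load_data('data/train.txt', 'data/test.txt')

# Plain one-vs-rest training vs. cost-sensitive training,
# which reweights the rare positive label.
model_ovr = linear.train_1vsrest(
    datasets['train']['y'], datasets['train']['x'], '')
model_cs = linear.train_cost_sensitive(
    datasets['train']['y'], datasets['train']['x'], '')

# Compare how many positive predictions each model makes:
# a decision value > 0 means the label is predicted.
for name, model in [('1vsrest', model_ovr), ('cost_sensitive', model_cs)]:
    preds = linear.predict_values(model, datasets['test']['x'])
    print(name, 'predicted positives:', (preds > 0).sum())
```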

Thanks!

cjlin1 (Collaborator) commented Nov 30, 2022 via email
