
Suggested configuration for training NN models on highly imbalanced datasets #229

Open
henrykang7177 opened this issue Nov 30, 2022 · 1 comment

henrykang7177 commented Nov 30, 2022

Hi!

I have a binary classification dataset with a highly imbalanced label distribution (pos : neg = 1 : 200).

I tried applying the BERT code from the Neural Network Quick Start Tutorial directly to this dataset, with the validation metric set to "Macro-F1", but the trained model mostly predicts all negatives.

I am wondering whether there are parameters or configurations I could tune in LibMultiLabel to improve the model's performance on such an imbalanced dataset.

For your reference:

I also tried the linear method, where using train_cost_sensitive instead of train_1vsrest noticeably improved this issue: with train_cost_sensitive, the model predicts four times as many positive samples as with train_1vsrest. Both methods reach Micro-F1 and P@1 close to 0.99 (due to the dominating negative samples) but only around 0.5 Macro-F1. Roughly what I ran is sketched below.
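For concreteness, this is roughly how I compared the two linear trainers (a minimal sketch based on the linear quick start; the data paths are placeholders for my own dataset, and the exact API may differ across LibMultiLabel versions):

```python
import libmultilabel.linear as linear

# Load and vectorize the data (paths are placeholders for my dataset).
preprocessor = linear.Preprocessor(data_format='txt')
datasets = preprocessor.load_data('data/train.txt', 'data/test.txt')

# Plain one-vs-rest training vs. cost-sensitive training,
# which reweights the rare positive label.
model_ovr = linear.train_1vsrest(
    datasets['train']['y'], datasets['train']['x'], '')
model_cs = linear.train_cost_sensitive(
    datasets['train']['y'], datasets['train']['x'], '')

# Compare how many positive predictions each model makes:
# a decision value > 0 means the label is predicted.
for name, model in [('1vsrest', model_ovr), ('cost_sensitive', model_cs)]:
    preds = linear.predict_values(model, datasets['test']['x'])
    print(name, 'predicted positives:', (preds > 0).sum())
```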

Thanks!

cjlin1 (Collaborator) commented Nov 30, 2022 via email
