
Overfitting of ridge regression? #1

Open
jhfoxliu opened this issue Jun 21, 2021 · 4 comments

Comments

@jhfoxliu

Hello, it seems that in the FP and BLAC experiments, ridge regression worked well on the validation set. However, both in my hands and in the examples from Ivan's re-implementation, ridge regression fits the training set well but performs poorly on the validation set. I suspect this is due to insufficient training of eUniRep. The question is: is it worth moving on to directed evolution with an overfit model?
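For context, here is a minimal sketch of the kind of train/validation check being described, using scikit-learn's RidgeCV on precomputed UniRep-style embeddings; the file names and the 80/20 split below are illustrative placeholders, not details from this thread:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# X: UniRep/eUniRep embeddings, one row per variant; y: measured fitness scores.
# Both files are hypothetical placeholders for whatever data you already have.
X = np.load("unirep_embeddings.npy")   # shape (n_variants, 1900)
y = np.load("fitness_scores.npy")      # shape (n_variants,)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# RidgeCV chooses the regularization strength by internal cross-validation.
model = RidgeCV(alphas=np.logspace(-3, 3, 13))
model.fit(X_train, y_train)

# A large gap between these two scores is the overfitting symptom described above.
print("train R^2:", model.score(X_train, y_train))
print("val   R^2:", model.score(X_val, y_val))
```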

@surgebiswas
Contributor

Hard to answer without more information.

A few things that would help me answer your question: What's the application you're working on here? What does train vs. val performance look like? Which ridge implementation are you using, and how are you doing hyperparameter selection?

@jhfoxliu
Author

I am training models for ADAR2. I could only find <10,000 closely related proteins, so I used ~60,000 sequences, including other editases, to re-train UniRep. I did the training with jax-unirep. The loss decreased very quickly, from 0.12 to ~0.02 within 10 epochs. I then used RidgeCV to fit the fitness scores of a set of single amino acid mutants (N=33).

Two figures are attached: the first is from my results, the second is from Ivan's notebook.

[Figures: ADAR (my results) and Ivan (from Ivan's notebook)]
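For reference, a minimal sketch of that top-model step with jax-unirep's get_reps plus RidgeCV; with only N=33 labeled mutants, leave-one-out cross-validation is one way to get a less noisy view of generalization than a single split. The sequences and scores below are placeholders:

```python
import numpy as np
from jax_unirep import get_reps
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Placeholders for the labeled single mutants and their fitness scores.
mutant_seqs = ["MKTAYIAKQR", "MKTAYVAKQR", "MKSAYIAKQR", "MKTAYIAKQW"]
fitness = np.array([0.8, 1.3, 0.5, 1.1])

# get_reps returns (h_avg, h_final, c_final); h_avg is the usual 1900-d representation.
# Whether/how evotuned parameters can be passed in depends on the jax-unirep version.
h_avg, h_final, c_final = get_reps(mutant_seqs)

model = RidgeCV(alphas=np.logspace(-3, 3, 13))
loo_pred = cross_val_predict(model, h_avg, fitness, cv=LeaveOneOut())
print("LOO Pearson r:", np.corrcoef(loo_pred, fitness)[0, 1])
```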

@surgebiswas
Contributor

surgebiswas commented Jun 23, 2021 via email

When you say "re-train," what do you mean? Evotune/fine-tune? How did you monitor the unsupervised loss while you were evotuning, and how did you use that information to decide when to stop?

@jhfoxliu
Author

jhfoxliu commented Jul 7, 2021

I have done additional runs over the past few days. It seems the global UniRep parameters can be broken within a few epochs if the learning rate is too high (1e-6 or 1e-5), so I am now evotuning with lr=1e-7. It looks much better now, but it will take some time to converge.
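A rough sketch of that kind of conservative evotuning run with jax-unirep's fit. The learning-rate keyword name and any holdout argument vary between jax-unirep versions, so treat the argument names below as assumptions and check the signature of your installed release:

```python
from jax_unirep import fit

# Placeholder list; in practice this would be the ~60,000 editase/ADAR-family sequences.
sequences = ["MKTAYIAKQR", "MKTAYVAKQR", "MKSAYIAKQR"]

# Keep some sequences aside so the unsupervised loss can be checked on data the
# weights never see; 10% is an arbitrary illustrative choice. How the held-out set
# is plugged into fit (if at all) depends on the jax-unirep version.
n_holdout = max(1, len(sequences) // 10)
holdout_seqs = sequences[:n_holdout]
train_seqs = sequences[n_holdout:]

# Low learning rate, as discussed above, to avoid breaking the global weights.
# NOTE: the keyword may be named differently (e.g. step_size) in your version of
# jax-unirep; this call is an assumption, not a verified signature.
evotuned_params = fit(
    train_seqs,
    n_epochs=10,
    learning_rate=1e-7,
)
```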
