Overfitting of ridge regression? #1
Hard to answer without more information. A few more details would help me answer your question.
I am training models for ADAR2. I only found <10,000 closely related proteins, so I used ~60,000 sequences, including other editases, to re-train UniRep. I did the training with jax-unirep. The loss decreased very quickly, from 0.12 to ~0.02 within 10 epochs. I then used RidgeCV to fit the fitness scores of a set of single-amino-acid mutants (N=33). Two figures are attached: the first is from my results, and the second is from Ivan's notebook.
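For reference, a minimal sketch of the RidgeCV fit described above, assuming `mutant_seqs` (the 33 single-mutant sequences) and `fitness` (their measured scores) are already defined; both names are placeholders, not part of any library:

```python
import numpy as np
from jax_unirep import get_reps
from sklearn.linear_model import RidgeCV

# mutant_seqs / fitness are hypothetical stand-ins for the 33 single
# mutants and their measured fitness scores mentioned above.
# get_reps returns average hidden state, final hidden state, and final
# cell state; each array has shape (n_sequences, 1900).
h_avg, h_final, c_final = get_reps(mutant_seqs)

# RidgeCV selects the regularization strength by (efficient) leave-one-out
# cross-validation over the supplied alpha grid.
model = RidgeCV(alphas=np.logspace(-3, 3, 13))
model.fit(h_avg, fitness)
print("chosen alpha:", model.alpha_)
```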
When you say "re-train", what do you mean? Evotune/fine-tune? How did you monitor the unsupervised loss while you were evotuning, and how did you use that information to know when to stop evotuning?

Surge Biswas
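For context, a minimal sketch of how held-out loss monitoring during evotuning might look with jax-unirep. `all_seqs` is a placeholder for the ~60,000 evotuning sequences, and the `holdout_seqs` argument should be checked against the installed jax-unirep version's `fit` signature:

```python
from jax_unirep import fit
from sklearn.model_selection import train_test_split

# all_seqs is a hypothetical list holding the evotuning sequences.
train_seqs, holdout_seqs = train_test_split(all_seqs, test_size=0.1, random_state=0)

# fit() reports the training loss each epoch; passing holdout_seqs (if the
# installed version supports it) also reports the loss on sequences the
# model never trains on, which is the signal for deciding when to stop.
evotuned_params = fit(
    sequences=train_seqs,
    n_epochs=10,
    step_size=1e-5,  # learning rate
    holdout_seqs=holdout_seqs,
)
```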
I have run additional experiments these days. It seems that the global UniRep parameters can be broken within a few epochs if the learning rate is too high (1e-6 or 1e-5). Hence I am now evotuning with lr=1e-7. It looks much better now, but I need more time for it to reach its best.
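Assuming the learning rate is exposed as `step_size` (as in the sketch above, with the same caveat about verifying the installed jax-unirep signature), the lower rate would look like:

```python
# Same hedged fit call as above, with the smaller learning rate that
# appeared to keep the global parameters stable.
evotuned_params = fit(
    sequences=train_seqs,
    n_epochs=50,      # a slower learning rate typically needs more epochs
    step_size=1e-7,   # the lr=1e-7 from the run described above
    holdout_seqs=holdout_seqs,
)
```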
Hello, it seems that in the FP and BLAC experiments, the ridge regression worked well on the validation set. However, both in my hands and in the examples from Ivan's re-implementation, the ridge regression successfully fits the training set but performs poorly on the validation set. I suspect this can be blamed on insufficient training of eUniRep. The question is: is it worth moving on to directed evolution with an overfit model?
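One way to quantify the train/validation gap described above is to compare Spearman correlations on a held-out split. A minimal sketch, reusing the hypothetical `h_avg` and `fitness` arrays from the RidgeCV example earlier in the thread:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# h_avg / fitness are the hypothetical representation and score arrays
# from the earlier sketch (shape (33, 1900) and (33,)).
X_tr, X_va, y_tr, y_va = train_test_split(
    h_avg, fitness, test_size=0.3, random_state=0
)

model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_tr, y_tr)
rho_tr = spearmanr(y_tr, model.predict(X_tr)).correlation
rho_va = spearmanr(y_va, model.predict(X_va)).correlation
print(f"train Spearman: {rho_tr:.2f}, validation Spearman: {rho_va:.2f}")
# A large train/validation gap is the overfitting pattern described above.
```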