How to embed MHC #2

Open
JingqiZhang1102 opened this issue May 24, 2024 · 2 comments

@JingqiZhang1102

Hello, I have a question regarding the MHC embeddings. I tried the following command:

python3 esm/scripts/extract.py esm1v_t33_650M_UR90S_1 /path/to/fasta_file.fasta /path/to/pt_files --repr_layers 33 --include mean

and got KeyError: '*01'.

I assume that the model esm1v_t33_650M_UR90S_1 may not be able to handle characters such as *. But based on 4_VDJDB_trainESMmodel.ipynb, mhclist contains MHC A information, and you were able to compute the embeddings of MHC. Could you elaborate on which model(s) have been used? Thank you in advance!

@JingqiZhang1102
Author

We found a website that provides protein sequences for MHC alleles. Is this how you obtained the MHC sequences to embed?
https://www.ebi.ac.uk/ipd/imgt/hla/alleles/

@xinformatics
Collaborator

Hi @JingqiZhang1102, you are correct. We used the MHC sequences from EBI and prepared a cleaned-up version of the FASTA files with correct HLA nomenclature. For the embeddings, the ESM1v model was used.

Hope it helps.
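
For anyone hitting the same KeyError, here is a minimal sketch of a pre-check that could be run on a FASTA file before calling extract.py. It assumes the error came from sequence lines that held allele names (such as A*01) instead of amino-acid residues, which this thread suggests but does not confirm. The helper names `parse_fasta` and `find_invalid_records`, the allowed-character set, and the example records are all hypothetical and not part of the ESM repository.

```python
# Hypothetical helpers for illustration; not part of the ESM repository.
# Assumption: a valid sequence uses only the 20 standard amino-acid letters
# plus the ambiguity/rare codes X, B, Z, U, O.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXBZUO")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_invalid_records(text):
    """Map each bad record's header to its sorted offending characters."""
    bad = {}
    for header, seq in parse_fasta(text):
        offending = sorted(set(seq.upper()) - VALID_RESIDUES)
        if offending:
            bad[header] = offending
    return bad


# Made-up records: the first stores an allele name where residues belong.
example = """>allele_name_as_sequence
A*01
>residues_only
MAVMAPRTLLLLLSGALALTQTWA
"""

print(find_invalid_records(example))
```

Any record reported here would need its sequence line replaced with the actual protein sequence (for example from the EBI page linked above) before embedding.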
