Protein Language Models

(sorted by number of parameters)

| Name | Params | Paper | Code | Notes |
|---|---|---|---|---|
| xTrimoPGLM | 100B | bioRxiv | Not available | |
| ESM2 | 8M - 15B | bioRxiv | Code | |
| ProGen2 | 151M - 6.4B | arXiv | Code | |
| ProtTrans | 420M - 3B | Paper | Code | BFD + UniRef50 |
| ProteinLM | 200M, 3B | arXiv | Code | |
| RITA | 85M - 1.2B | arXiv | Code | |
| ProGen1 | 1.2B | bioRxiv | Code | |
| Ankh | 450M, 1.15B | arXiv | Code | |
| ProtGPT2 | 738M | Paper | Code | |
| Tranception | 700M | Paper | Code | |
| ESM1 | 43M - 670M | Paper | Code | |
| PoET | 57M - 604M | arXiv | Not available | Only available through the OpenProtein.AI web app |
| DistilProtBert | 230M | bioRxiv | Code | |
| DARK | 128M | bioRxiv | | |
| PRoBERTa | 44M | Paper | Code | |
| TAPE | 38M | arXiv | Code | |
| ProteinBERT | 16M | Paper | Code, PyTorch | ~106M proteins from UniRef90; trained for 28 days over ~670M records (~6.4 passes over the dataset) |
| AminoBERT | | bioRxiv | Code | |
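
The row order above can be reproduced mechanically. The Python sketch below assumes that the sort key is each model's largest reported size, which matches the ordering of the rows. `max_params` is a hypothetical helper written for illustration and is not part of any of the listed codebases.

```python
import re

# Multipliers for the unit suffixes used in the "Params" column.
_UNITS = {"K": 1e3, "M": 1e6, "B": 1e9}


def max_params(cell: str) -> float:
    """Return the largest parameter count mentioned in a Params cell.

    Handles single sizes ("738M"), ranges ("8M - 15B") and lists
    ("450M, 1.15B"). Returns 0.0 for an empty cell.
    """
    sizes = [
        float(num) * _UNITS[unit]
        for num, unit in re.findall(r"(\d+(?:\.\d+)?)\s*([KMB])", cell)
    ]
    return max(sizes, default=0.0)


# A few (name, Params cell) rows taken from the table above.
rows = [
    ("Ankh", "450M, 1.15B"),
    ("ESM2", "8M - 15B"),
    ("ProteinBERT", "16M"),
    ("RITA", "85M - 1.2B"),
    ("AminoBERT", ""),
]

# Sort largest-first, the way the table is ordered.
for name, cell in sorted(rows, key=lambda r: max_params(r[1]), reverse=True):
    print(f"{name:12s} {max_params(cell):.3g}")
```

Rows with no reported size, such as AminoBERT, get a key of 0 and sink to the bottom. This matches where the table places them.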

Special-purpose pLMs

| Name | Params | Paper | Code | Notes |
|---|---|---|---|---|
| PeTriBERT | 40M | bioRxiv | Not available | Optimized for protein design |

Non-transformer-based sequence models

| Name | Params | Paper | Code | Notes |
|---|---|---|---|---|
| CARP | 600K - 640M | bioRxiv | Code | CNN |
| SeqVec | 93M | Paper | Code | Bidirectional LSTM; UniRef50 |
| UniRep | 90M | Paper | Code | mLSTM |
| ProSE | 24M | Paper | Code | LSTM |

pLMs specific to antibody and T-cell receptor sequences

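| Name | Params | Paper | Code | Notes |
|---|---|---|---|---|
| TCR-BERT | 100M | bioRxiv | Code | |
| AntiBERTa | 86M | Paper | Code | |
| AntiBERTy | 26M | arXiv | Code | |
| IgLM | 1.5M, 13M | bioRxiv | Code | |
| Sapiens | 0.6M | Paper | Code | |
| AbLang | | Paper | Code | |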

DNA language models

| Name | Params | Paper | Code | Notes |
|---|---|---|---|---|
| GenSLM | 25M - 25B | bioRxiv | Code | |
| Nucleotide Transformer | 500M - 2.5B | bioRxiv | Code | |
| GENA-LM | 110M - 336M | bioRxiv | Code | Inputs up to 36,000 base pairs |

Building on pLMs