Paper | Published in | Resources |
---|---|---|
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences | PNAS, 2021 | Code |
Language models enable zero-shot prediction of the effects of mutations on protein function | Advances in Neural Information Processing Systems, 2021 | Code |
Learning inverse folding from millions of predicted structures | ICML, 2022 | Code |
Evolutionary-scale prediction of atomic-level protein structure with a language model | Science, 2023 | Code |
Simulating 500 million years of evolution with a language model | bioRxiv, 2024 | Code |
Paper | Published in | Resources |
---|---|---|
MSA transformer | ICML, 2021 | Code |
Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval | ICML, 2022 | Code |
Leveraging protein language models for accurate multiple sequence alignments | Genome Research, 2023 | Code |
PoET: A generative model of protein families as sequences-of-sequences | NeurIPS, 2023 | Code |
Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes | Nature Communications, 2023 | Code |
Paper | Published in | Resources |
---|---|---|
A systematic study of joint representation learning on protein sequences and structures | arXiv preprint, 2023 | Code |
SaProt: Protein language modeling with structure-aware vocabulary | bioRxiv, 2023 | Code |
Simple, Efficient and Scalable Structure-aware Adapter Boosts Protein Language Models | arXiv preprint, 2024 | Code |
Multi-level Protein Structure Pre-training via Prompt Learning | ICLR, 2023 | Code |
Structure-informed protein language models are robust predictors for variant effects | Human Genetics, 2024 | N/A |
Integration of pre-trained protein language models into geometric deep learning networks | Communications Biology, 2023 | Code |
Structure-Informed Protein Language Model | arXiv preprint, 2024 | Code |
S-PLM: Structure-Aware Protein Language Model via Contrastive Learning Between Sequence and Structure | Advanced Science, 2024 | Code |
CCPL: Cross-modal Contrastive Protein Learning | Pattern Recognition, 2024 | N/A |
Paper | Published in | Resources |
---|---|---|
OntoProtein: Protein Pretraining With Gene Ontology Embedding | ICLR, 2022 | Code |
ProteinCLIP: enhancing protein language models with natural language | bioRxiv, 2024 | Code |
ProteinBERT: a universal deep-learning model of protein sequence and function | Bioinformatics, 2022 | Code |
Protein Representation Learning via Knowledge Enhanced Primary Structure Reasoning | ICLR, 2023 | Code |
MolBind: Multimodal Alignment of Language, Molecules, and Proteins | arXiv preprint, 2024 | N/A |
Paper | Published in | Resources |
---|---|---|
Prot2Text: Multimodal protein’s function generation with GNNs and transformers | AAAI, 2024 | Code |
ProTranslator: zero-shot protein function prediction using textual description | International Conference on Research in Computational Molecular Biology, 2022 | Code |
Multilingual translation for zero-shot biomedical classification using BioTranslator | Nature Communications, 2023 | Code |
BioT5: Enriching cross-modal integration in biology with chemical knowledge and natural language associations | EMNLP, 2023 | Code |
BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning | ACL, 2024 | Code |
ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts | ICML, 2023 | Code |
ProtChatGPT: Towards Understanding Proteins with Large Language Models | arXiv, 2024 | N/A |
ProteinChat: Towards Achieving ChatGPT-Like Functionalities on Protein 3D Structures | TechRxiv, 2023 | N/A |
Paper | Published in | Resources |
---|---|---|
Large language models generate functional protein sequences across diverse families | Nature Biotechnology, 2023 | Code |
ProtGPT2: Deep Unsupervised Language Model for Protein Design | Nature Communications, 2022 | Code |
ProGen2: Exploring the Boundaries of Protein Language Models | Cell Systems, 2023 | Code |
IgLM: Infilling Language Modeling for Antibody Sequence Design | Cell Systems, 2023 | Code |
PALM-H3: Targeted Antibody Generation for SARS-CoV-2 | Nature Communications, 2024 | Code |
Integrating protein language models and automatic biofoundry for enhanced protein evolution | Nature Communications, 2025 | Code |
Paper | Published in | Resources |
---|---|---|
ProtST: Multi-modality Learning of Protein Sequences and Biomedical Texts | ICML, 2023 | Code |
ProteinBERT: a universal deep-learning model of protein sequence and function | Bioinformatics, 2022 | Code |
BERTology meets biology: Interpreting attention in protein language models | arXiv preprint, 2020 | Code |
ProtTrans: Toward understanding the language of life through self-supervised learning | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021 | Code |
Modeling protein using large-scale pretrain language model | arXiv preprint, 2021 | Code |
Paper | Published in | Resources |
---|---|---|
ProstT5: Bilingual Modeling of Protein Sequence and Structure | bioRxiv, 2023 | Code |
Fold2Seq: A Joint Sequence–Fold Embedding-based Generative Model for Protein Design | ICML, 2021 | Code |
Ankh: Optimized Protein Language Model for Efficient Generation | arXiv, 2023 | Code |
Paper | Published in | Resources |
---|---|---|
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | arXiv, 2024 | Code |
ProteinChat: ChatGPT-like Functionalities on Protein 3D Structures | Authorea Preprints, 2023 | Code |
ProtChatGPT: Towards Understanding Proteins with Large Language Models | arXiv, 2024 | Code |
ProteinDT: A Text-guided Protein Design Framework | arXiv, 2023 | Code |
Paper | Published in | Resources |
---|---|---|
Artificial intelligence to solve the X-ray crystallography phase problem: a case study report | bioRxiv, 2021 | N/A |
Paper | Published in | Resources |
---|---|---|
FID-Net: A versatile deep neural network architecture for NMR spectral reconstruction and virtual decoupling | Journal of Biomolecular NMR, 2021 | Code |
Accelerated Nuclear Magnetic Resonance Spectroscopy with Deep Learning | Angewandte Chemie, 2020 | Code |
Paper | Published in | Resources |
---|---|---|
CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks | Nature Methods, 2021 | Code |
CryoGAN: A New Reconstruction Paradigm for Single-Particle Cryo-EM Via Deep Adversarial Learning | IEEE Transactions on Computational Imaging, 2021 | Code |
Deep learning-based mixed-dimensional Gaussian mixture model for characterizing variability in cryo-EM | Nature Methods, 2021 | Code |
3DFlex: determining structure and motion of flexible proteins from cryo-EM | Nature Methods, 2023 | Code |
CryoSTAR: leveraging structural priors and constraints for cryo-EM heterogeneous reconstruction | Nature Methods, 2024 | Code |
Dataset Name | Description | Resources |
---|---|---|
UniProtKB/Swiss-Prot | Manually curated protein database with detailed functional annotations | Link |
UniProtKB/TrEMBL | Automatically annotated protein database with computational analysis | Link |
UniRef Clusters | Clustered protein sequences for reduced redundancy and efficient searches | Link |
Pfam | Database of protein families and domains | Link |
PDB | Database of 3D structures of biological macromolecules | Link |
BFD | Large database of clustered protein sequences | Link |
UniParc | Non-redundant archive of protein sequences from public databases | Link |
PIR | Comprehensive annotated protein sequence database | Link |
AlphaFoldDB | Database of predicted protein structures using AI | Link |
Paper | Published in | Resources |
---|---|---|
Critical assessment of methods of protein structure prediction (CASP)—Round XV | Proteins: Structure, Function, and Bioinformatics, 2023 | Link |
ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design | NeurIPS, 2023 | Code |
Evaluating protein transfer learning with TAPE | NeurIPS, 2019 | Code |
CATH–a hierarchic classification of protein domain structures | Structure, 1997 | Link |
PEER: a comprehensive and multi-task benchmark for protein sequence understanding | NeurIPS, 2022 | Code |
ExplorEnz: the primary source of the IUBMB enzyme list | Nucleic Acids Research, 2009 | Link |
HIPPIE: Integrating protein interaction networks with experiment based quality scores | PLoS One, 2012 | Link |
A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | arXiv, 2024 | Code |