D 2024

Learning to design protein-protein interactions with enhanced generalization

BUSHUIEV, Anton, Roman BUSHUIEV, Petr KOUBA, Anatolii FILKIN, Marketa GABRIELOVA et al.

Basic information

Original name

Learning to design protein-protein interactions with enhanced generalization

Authors

BUSHUIEV, Anton, Roman BUSHUIEV, Petr KOUBA, Anatolii FILKIN, Marketa GABRIELOVA, Michal GABRIEL, Jiri SEDLAR, Tomas PLUSKAL, Jiří DAMBORSKÝ, Stanislav MAZURENKO and Josef SIVIC

Edition

12th International Conference on Learning Representations (ICLR 2024), 26 pp., 2024

Other information

Language

English

Type of outcome

Article in proceedings

Field of Study

10608 Biochemistry and molecular biology

Confidentiality degree

not subject to state or trade secrecy

Publication form

electronic version available online

Organization unit

Faculty of Science

Keywords in English

protein-protein interactions; protein design; generalization; self-supervised learning; equivariant 3D representations

Tags

International impact, Reviewed
Changed: 24/4/2024 10:22, Mgr. Marie Šípková, DiS.

Abstract

In the original

Discovering mutations that enhance protein-protein interactions (PPIs) is critical for advancing biomedical research and developing improved therapeutics. While machine learning approaches have substantially advanced the field, they often struggle to generalize beyond their training data in practical scenarios. The contributions of this work are three-fold. First, we construct PPIRef, the largest non-redundant dataset of 3D protein-protein interactions, enabling effective large-scale learning. Second, we leverage the PPIRef dataset to pre-train PPIformer, a new SE(3)-equivariant model that generalizes across diverse protein-binder variants. We fine-tune PPIformer to predict the effects of mutations on protein-protein interactions via a thermodynamically motivated adjustment of the pre-training loss function. Finally, we demonstrate the enhanced generalization of PPIformer by outperforming other state-of-the-art methods on new, non-leaking splits of standard labeled PPI mutational data and in independent case studies: optimizing a human antibody against SARS-CoV-2 and increasing the thrombolytic activity of staphylokinase.
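The abstract does not spell out how a pre-trained structure model is turned into a mutation-effect predictor. A minimal sketch, assuming the model outputs per-position amino-acid logits for masked interface positions: the log-odds between mutant and wild-type residues, summed over mutated positions and negated, gives a binding free-energy change estimate that is additive over positions and antisymmetric (reverting a mutation flips its sign). The function names, the alphabet ordering and the sign convention (negative ΔΔG = improved binding) are assumptions for illustration, not the authors' code or necessarily their exact formulation.

```python
import math
from typing import Dict, List, Tuple

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def ddg_log_odds(
    logits_per_position: Dict[int, List[float]],
    mutations: List[Tuple[int, str, str]],
) -> float:
    """Hypothetical ddG estimate from masked-residue logits.

    mutations: list of (position, wild_type, mutant) one-letter codes.
    Returns -sum(log p(mutant) - log p(wild_type)) over mutated positions,
    so a mutant the model prefers over the wild type yields a negative value
    (assumed here to mean enhanced binding).
    """
    score = 0.0
    for pos, wt, mt in mutations:
        logp = log_softmax(logits_per_position[pos])
        score += logp[AMINO_ACIDS.index(mt)] - logp[AMINO_ACIDS.index(wt)]
    return -score
```

For example, if position 0 strongly favours tryptophan, `ddg_log_odds(logits, [(0, "A", "W")])` is negative and the reverse substitution `(0, "W", "A")` gives the same magnitude with the opposite sign; multi-point mutations are scored as the sum of their single-point terms.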

Links

EF17_043/0009632, research and development project
Name: CETOCOEN Excellence
LM2023055, research and development project
Name: Czech National Infrastructure for Biological Data
Investor: Ministry of Education, Youth and Sports of the CR, ELIXIR-CZ: Czech National Infrastructure for Biological Data
LM2023069, research and development project
Name: RECETOX Research Infrastructure
Investor: Ministry of Education, Youth and Sports of the CR, RECETOX research infrastructure
857560, internal MU code
(CEP code: EF17_043/0009632)
Name: CETOCOEN Excellence (Acronym: CETOCOEN Excellence)
Investor: European Union, Spreading excellence and widening participation
90254, large research infrastructures
Name: e-INFRA CZ II