BUSHUIEV, Anton, Roman BUSHUIEV, Petr KOUBA, Anatolii FILKIN, Marketa GABRIELOVA, Michal GABRIEL, Jiri SEDLAR, Tomas PLUSKAL, Jiří DAMBORSKÝ, Stanislav MAZURENKO and Josef SIVIC. Learning to design protein-protein interactions with enhanced generalization. Online. In 12th International Conference on Learning Representations 2024. 2024, 26 pp.
Basic information
Original name Learning to design protein-protein interactions with enhanced generalization
Authors BUSHUIEV, Anton, Roman BUSHUIEV, Petr KOUBA, Anatolii FILKIN, Marketa GABRIELOVA, Michal GABRIEL, Jiri SEDLAR, Tomas PLUSKAL, Jiří DAMBORSKÝ, Stanislav MAZURENKO and Josef SIVIC.
Edition 12th International Conference on Learning Representations 2024, 26 pp. 2024.
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10608 Biochemistry and molecular biology
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
WWW URL
Organization unit Faculty of Science
Keywords in English protein-protein interactions; protein design; generalization; self-supervised learning; equivariant 3D representations
Tags International impact, Reviewed
Changed by Mgr. Marie Šípková, DiS., university ID (UČO) 437722. Changed: 24/4/2024 10:22.
Abstract
Discovering mutations enhancing protein-protein interactions (PPIs) is critical for advancing biomedical research and developing improved therapeutics. While machine learning approaches have substantially advanced the field, they often struggle to generalize beyond training data in practical scenarios. The contributions of this work are three-fold. First, we construct PPIRef, the largest and non-redundant dataset of 3D protein-protein interactions, enabling effective large-scale learning. Second, we leverage the PPIRef dataset to pre-train PPIformer, a new SE(3)-equivariant model generalizing across diverse protein-binder variants. We fine-tune PPIformer to predict effects of mutations on protein-protein interactions via a thermodynamically motivated adjustment of the pre-training loss function. Finally, we demonstrate the enhanced generalization of our new PPIformer approach by outperforming other state-of-the-art methods on new, non-leaking splits of standard labeled PPI mutational data and independent case studies optimizing a human antibody against SARS-CoV-2 and increasing the thrombolytic activity of staphylokinase.
Links
EF17_043/0009632, research and development project
Name: CETOCOEN Excellence
LM2023055, research and development project
Name: Česká národní infrastruktura pro biologická data
Investor: Ministry of Education, Youth and Sports of the CR, ELIXIR-CZ: Czech National Infrastructure for Biological Data
LM2023069, research and development project
Name: Výzkumná infrastruktura RECETOX
Investor: Ministry of Education, Youth and Sports of the CR, RECETOX research infrastructure
857560, MU internal code
(CEP code: EF17_043/0009632)
Name: CETOCOEN Excellence (Acronym: CETOCOEN Excellence)
Investor: European Union, Spreading excellence and widening participation
90254, large research infrastructures
Name: e-INFRA CZ II