2DProts: Family-wide 2D diagrams of protein secondary structure
Radka Svobodová
CEITEC MASARYK UNIVERSITY

Current trends: Number of available structures grows

Current trends: Size of deposited structures also grows

Current trends: Protein families are getting bigger

Analysis of individual structure
Analysis of a whole family

Protein families – members

Protein secondary structure: A clue for protein family analysis
▪ Comparison of protein family members
▪ Different species
▪ Different substituents
▪ Mutations
▪ Active and inactive forms
▪ Firm and flexible secondary structure elements
▪ Binding of ligands

Visualization of secondary structure in 2D: Solved in the past? Not for protein families!
ISSUE 1: Similar proteins have different 2D diagrams
[Figure: 1TQN and 1OG2, RMSD: 2.295 Å; Hera, PDBe]
ISSUE 2: Secondary structure elements close in 2D diagrams are far apart in reality
[Figure: 1TQN; Hera, PDBe]
ISSUE 3: 2D diagrams do not reflect the shape of a protein

Visualization of secondary structure in 2D: Solved in the past?
[Figure: 1ORW; HERA, 2DProts]

2DProts Protein family based 2D diagrams
Input:
▪ A CATH superfamily (e.g. 2.60.120.400)
▪ The list of its domains (e.g. 1gztA00, 1ourA00, ...)
▪ PDB structures of these domains
Step 1: For each domain in the given family, find its secondary structure elements (SSEs) via SecStrAnnotator and annotate them:
▪ Topologically equivalent SSEs have the same name.
Step 2: For each group of SSEs with the same name, compute the average length and the frequency of SSE occurrence.
Step 3: For each domain in the family:
Step 3.1: Try to select an appropriate starting layout among the previously computed domains.
Step 3.2: Group all β-strands into sheets and compute a 2D model of each individual sheet.
Step 3.3: Divide the helices and sheets into primary (common to most of the domains) and secondary (the remaining ones).
Step 3.4: Place all primary helices and sheets into the 2D diagram.
Step 3.5: Adjust the angles of the primary helices and sheets.
Step 3.6: Add all secondary helices and sheets into the 2D diagram.
Step 3.7: Adjust the angles of the secondary helices and sheets.
Step 4: Draw an individual 2D diagram for each domain and a common multiple 2D diagram for the whole family.

2DProts Database
▪ Precalculated 2D diagrams of domains from all CATH families
▪ Includes multiple 2D diagrams for a whole protein family
▪ Freely available at: http://ncbr.muni.cz/2DProts/
▪ Updated each week

2DProts outputs
2D diagram of a protein domain

2DProts outputs: Multiple 2D diagram of protein domains in a family
[Figure panels: With opacity; No opacity]

Superfamily: Dipeptidylpeptidase IV (2.140.10.30)
Superfamily: Iron dependent repressor (1.10.60.10)
Superfamily: Rhodopsin 7-helix transmembrane proteins (1.20.1070.10)
Superfamily: Aldolase class I (3.20.20.70)
[Figures, one per superfamily; labels: PROTEIN FAMILY, 2DProts, HERA, CATH, PROTEIN, Current solution]

2DProts Other features
▪ Precalculated results for CATH structural clusters
▪ Possibility to process user-defined sets of domains (e.g., selected by organism, resolution, experimental method, etc.)

2DProts User defined sets
[Figure: Aldolase class I (3.20.20.70); Archaea; Thermotoga maritima]

2DProts new features
Proteins
[Figures: 1gzt, 1aqd; 7c2l]

2DProts new features
AlphaFold

2DProts integration to CATH

Publications
Sillitoe I, ..., Berka K, Hutařová Vařeková I, Svobodová R., et al., 2021. CATH: increased structural coverage of functional space. Nucleic Acids Research, 49(D1), D266-D273.
Hutařová Vařeková, I., Hutař, J., Midlik, A., Horský, V., Hladká, E., Svobodová, R., & Berka, K. (2021). 2DProts: database of family-wide protein secondary structure diagrams. Bioinformatics.

Acknowledgement
Ivana Hutařová Vařeková, Jan Hutař, Adam Midlik, Karel Berka, Aliaksei Charesneu, Eva Hladká
ELIXIR and ELIXIR CZ
CATH, EBI: Ian Sillitoe, Christine A Orengo
EMBL-EBI, PDBe: Dr. Sameer Velankar

Central European Institute of Technology
Masaryk University
Kamenice 753/5
625 00 Brno, Czech Republic
www.ceitec.muni.cz | info@ceitec.muni.cz

Thank you for your attention
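
Supplementary sketch: Steps 2 and 3.3 of the algorithm slides (per-name SSE statistics, then the split into primary and secondary elements) might look roughly like the Python below. This is an illustration only, not the 2DProts implementation. The domain identifiers, SSE names, lengths, the function names, and the 0.5 frequency cut-off standing in for "most of the domains" are all assumptions made for this example.

```python
from collections import defaultdict


def sse_statistics(domains):
    """Compute per-name SSE statistics for a family (cf. Step 2).

    domains: dict mapping a domain id to a dict {SSE name: length in residues}.
    Returns a dict {SSE name: (average length, occurrence frequency)}, where
    the frequency is the fraction of domains that contain that SSE.
    """
    lengths = defaultdict(list)
    for sses in domains.values():
        for name, length in sses.items():
            lengths[name].append(length)
    n_domains = len(domains)
    return {
        name: (sum(values) / len(values), len(values) / n_domains)
        for name, values in lengths.items()
    }


def split_primary_secondary(stats, min_frequency=0.5):
    """Split SSE names into primary and secondary (cf. Step 3.3).

    min_frequency is a hypothetical cut-off for "present in most domains".
    """
    primary = sorted(n for n, (_, freq) in stats.items() if freq >= min_frequency)
    secondary = sorted(n for n, (_, freq) in stats.items() if freq < min_frequency)
    return primary, secondary


# Toy family with made-up domain ids, SSE names, and lengths.
family = {
    "domA": {"H1": 10, "E1": 5, "E2": 6},
    "domB": {"H1": 12, "E1": 7},
    "domC": {"H1": 14, "E1": 6, "H2": 8},
    "domD": {"H1": 12, "E1": 6},
}

stats = sse_statistics(family)
primary, secondary = split_primary_secondary(stats)
print(stats["H1"])   # (average length, frequency) of helix H1
print(primary, secondary)
```

In this toy family, H1 and E1 occur in every domain and end up primary, while E2 and H2 each occur in a single domain and end up secondary. In the layout steps, primary elements would be placed first (Steps 3.4 and 3.5) and secondary ones added afterwards (Steps 3.6 and 3.7).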