BIOINFORMATICS

DATABASES OF PROTEIN SEQUENCES
UniProtKB
• SWISS-PROT: high-quality manual annotation
• TrEMBL: automatic annotation (TrEMBL → SWISS-PROT)
PIR: USA

DATABASES OF DNA SEQUENCES
EMBL-Bank: Europe (EMBL-EBI), accessed through ENA (European Nucleotide Archive)
GenBank: USA, retrieved by ENTREZ
DDBJ: Japan, retrieved by ARSA, DBGet

STRUCTURE DATABASES
PDB
PDBsum: summaries and analyses
EDS (Uppsala): electron density maps
EMDataBank: 3D maps from electron microscopy
SCOP: fold-superfamily-family
CATH: class-architecture-topology-homology

PAIRWISE ALIGNMENT
(score = number of identical aligned residues, gap = number of gap positions)

DAGTKVSAEQIL
DAGTKECHQIL
score=5, gap=0

DAGTKVSAEQIL
DAGTKECH-QIL
score=8, gap=1

DAGTKVSAE--QIL
DAGTK---ECHQIL
score=9, gap=5

BLOSUM62
[Figure: BLOSUM62 substitution matrix, lower triangle. Residue order on both axes: C S T A G P D E Q N H R K M I L V W Y F. The individual matrix values are not recoverable from this extraction.]

PAIRWISE DATABASE SEARCH
Fast local similarity algorithms
• FASTA
• BLAST

MULTIPLE ALIGNMENT
Progressive algorithms
• CLUSTAL: evolutionary tree + pairwise alignment
• PSI-BLAST: hybrid (pairwise + multiple), iterative, sensitive
Databases: Pfam, PRINTS

STRUCTURE PREDICTION
• Secondary structure: PSIPRED
• Fold: threading
• Tertiary structure from homologous structure: homology modelling
• Tertiary structure from multiple sequence alignment: AlphaFold2
test sequence: PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK

ALPHAFOLD2
Article: Highly accurate protein structure prediction with AlphaFold
https://doi.org/10.1038/s41586-021-03819-2
Received: 11 May 2021. Accepted: 12 July 2021. Published online: 15 July 2021. Open access.
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli & Demis Hassabis
Nature, Vol. 596, 26 August 2021, p. 583

Positions 24 and 64 (g: small)
Species                  Common name   Pos. 24   Pos. 64
Physeter macrocephalus   sperm whale   hgq       hgvtv
Balaena mysticetus       bowhead       hgq       hgntv
Sus scrofa               pig           hgq       hgnti
Orycteropus afer afer    aardvark      hgq       hgttv
Equus caballus           horse         hgq       hgtvv
Homo sapiens             man           hgq       hgatv

Positions 26 and 116
Species                  Common name   Pos. 26   Pos. 116
Physeter macrocephalus   sperm whale   QDH       HSRH
Balaena mysticetus       bowhead       QDH       HSRH
Sus scrofa               pig           QEH       QSKH
Orycteropus afer afer    aardvark      QEH       QSKH
Equus caballus           horse         QEH       HSKH
Homo sapiens             man           QEH       QSKH

[Figure: compensating residue pairs at positions 26 and 116. A short D pairs with a long R; a long E pairs with a short K.]

[Figure: AlphaFold2 input stage. Input sequence → Genetic database search → (Pairing) → MSA]
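The paired changes at positions 26 and 116 (D with R in the two whales, E with K in the other four species) are the kind of signal a multiple sequence alignment carries about residue contacts. As a minimal sketch, the mutual information between the two alignment columns can be computed as below. The column strings are read off the slide under the assumption that position 26 is the D/E column and position 116 is the R/K column; the function name `mutual_information` is illustrative, and AlphaFold2 itself does not compute this statistic explicitly.

```python
from collections import Counter
from math import log2

# Residues for sperm whale, bowhead, pig, aardvark, horse, man (slide order).
# ASSUMPTION: position 26 is the D/E column, position 116 is the R/K column.
COLUMN_26 = "DDEEEE"
COLUMN_116 = "RRKKKK"


def mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information in bits between two equally long MSA columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must hold one residue per sequence")
    n = len(col_a)
    count_a = Counter(col_a)               # residue counts in column A
    count_b = Counter(col_b)               # residue counts in column B
    count_ab = Counter(zip(col_a, col_b))  # counts of co-occurring pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    print(f"MI(26, 116) = {mutual_information(COLUMN_26, COLUMN_116):.3f} bits")
```

With only six sequences this number is merely illustrative; a usable covariation signal needs a much deeper alignment.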
[Figure: plot with an "Energy" axis; tick values not recoverable]

48 blocks (no shared weights): the Evoformer stack
Input: MSA representation (s, r, c)
Each step is added back to its input (⊕):
• Row-wise gated self-attention with pair bias
• Column-wise gated self-attention
• Transition
• Outer product mean
• Triangle update using outgoing edges
• Triangle update using incoming edges
• Triangle self-attention around starting node
• Triangle self-attention around ending node
• Transition
Output: MSA representation (s, r, c)

8 blocks (shared weights): the structure module
Inputs:
• Single repr. (r, c)
• Pair representation (r, r, c)
• Backbone frames (r, 3x3) and (r, 3) (initially all at the origin)
Each block:
• IPA module → updated single repr. (r, c)
• Predict relative rotations and translations → updated backbone frames (r, 3x3) and (r, 3)
• Predict χ angles and compute all atom positions
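The backbone frames above are one rotation (3x3) and one translation (3) per residue, all starting at the origin and then refined by predicted relative rotations and translations. The sketch below shows only that bookkeeping, assuming each update is expressed in the residue's local frame and composed on the right. The names `identity_frame`, `compose` and `run_demo` are hypothetical, and the update is hard-coded here, whereas in AlphaFold2 a network predicts it from the single representation.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Frame = Tuple[Matrix, Vector]  # (rotation 3x3, translation 3)


def identity_frame() -> Frame:
    """Frame at the origin: identity rotation, zero translation."""
    return ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def compose(frame: Frame, update: Frame) -> Frame:
    """Apply a relative update given in the frame's local coordinates."""
    rot, trans = frame
    d_rot, d_trans = update
    new_rot = mat_mul(rot, d_rot)
    new_trans = [x + y for x, y in zip(mat_vec(rot, d_trans), trans)]
    return (new_rot, new_trans)


def run_demo(n_residues: int = 3, n_blocks: int = 2) -> List[Frame]:
    # All residues start at the origin, as on the slide.
    frames = [identity_frame() for _ in range(n_residues)]
    # HYPOTHETICAL update: rotate 90 degrees about z, then step 1 unit along local x.
    update: Frame = ([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                     [1.0, 0.0, 0.0])
    for _ in range(n_blocks):
        frames = [compose(f, update) for f in frames]
    return frames


if __name__ == "__main__":
    for rot, trans in run_demo():
        print(rot, trans)
```

Because every update is relative to the current frame, the second block's translation is rotated by the first block's rotation before being added.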