Lecture 10: Fitting and Building Models
1. Model building and fitting into EM maps
2. Comparative and homology modeling
3. Rigid-body fitting of atomic models
4. Flexible fitting of atomic models
5. Building models, hybrid methods
6. De novo model building

Model Building Approaches
D. Baker & A. Sali. Science 294, 93, 2001.

Comparative Modeling
• Many more sequences are available than structures.
• Many applications rely on structural information.
• Structure is often more conserved than sequence (evolution preserves function).
1) Assembly of rigid bodies (core, loops, sidechains)
2) Segment matching
3) Satisfaction of spatial restraints
A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993.
J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994.
A. Fiser, R. Do & A. Šali. Prot. Sci. 9, 1753, 2000.

• All information is combined into a single objective function (restraints are converted to an "energy" by taking the negative log).
• The function is optimized by conjugate gradients and simulated-annealing molecular dynamics, starting from the target sequence threaded onto the template structure(s).
Sánchez, R. & Sali, A. PNAS 95, 13597, 1998.

Model Accuracy vs. Sequence Identity
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
[Figure: X-ray structures superposed on models]
• High accuracy: NM23, 77% sequence identity; 147/148 Cα equivalent, RMSD 0.41 Å. Scope for improvement: sidechains.
• Medium accuracy: CRABP, 41% sequence identity; 122/137 Cα equivalent, RMSD 1.34 Å. Scope for improvement: sidechains, core backbone, loops.
• Low accuracy: EDN, 33% sequence identity; 90/134 Cα equivalent, RMSD 1.17 Å. Scope for improvement: sidechains, core backbone, loops, alignment, fold (SSE).

I-TASSER: Protein Structure Prediction
Roy, A. et al. Nat. Protocols 5, 725, 2010.
[Figure: I-TASSER workflow]
• Accuracy estimation: C-score > -1.5 => 90% correct topology.

Comparative Modeling
• Problem: comparative models are often inaccurate.
• Solution: use cryo-EM maps to assess the models by rigid-body density fitting.
• Problem: the structures may undergo conformational changes (induced fit, target-template differences).
• Solution: use flexible fitting to refine the structures in the map.
• Problem: the map resolution may be too low to place a component unambiguously.
• Solution: use additional information to determine the assembly architecture.
Topf & Sali. Curr. Opin. Struct. Biol. 2005.

Errors in Comparative Modeling
• Distortions and shifts of aligned regions
• Regions without a template
• Sidechain packing
• Incorrect templates
• Misalignments
• Rigid-body movements

Model Building Approaches: Information & EM-Map Resolution
[Figure: GroEL at 20 Å, 10 Å and 2 Å resolution (levels of detail)]
• Fitting of known structures (rigid-body fitting)
• Flexible fitting of known structures
• Building of de novo models

Rigid Body Fitting of Known Structures
The fit of a probe (the atomic model represented as density) in rotation R_a and translation r_k is scored by its cross-correlation with the EM map, summed over the J grid points r_j:
$$CC(R_a, \mathbf{r}_k) = \sum_{j=1}^{J} \rho_{EM}(\mathbf{r}_j)\,\rho_{probe}(R_a \mathbf{r}_j + \mathbf{r}_k)$$
[Figure: probe and native positions for each search strategy]
• LE: local exhaustive search (rotations only, or rotations + translations)
• MC: Monte Carlo in translation, with exhaustive rotation
• SMC: scanning of the map to find regions with high CC, followed by an LE or MC search
Topf, Baker, John, Chiu & Sali. J. Struct. Biol. 2005.

Rigid Body Fitting of Known Structures
[Figure: single-component fitting result and multi-component optimization result; 1cid:2rhe, 12% sequence identity; maps at 20 Å and 10 Å resolution]
• Native structure (0): 1.00
• Best-fitting model (1): 0.69

Rigid Body Fitting of Known Structures
Avoiding fitting clashes -> sequential fitting
Avoiding fitting clashes -> symmetric fitting
• Fit one monomer, taking into account clashes with the symmetrically placed monomers.
• This optimizes the correlation of the full symmetric assembly while moving only one monomer.
• Clashes are avoided because two overlapping monomers create double density, which correlates poorly with the experimental map. Clashes are therefore avoided implicitly, without introducing a special repulsion term.
• Fit command in Chimera: "fit #1 #0 res 20 sym true"
Sequential fitting:
• Fit the three monomers sequentially, subtracting density.
• Each monomer is fitted in turn after the other two are subtracted from the density.
• Repeat the last command to improve convergence.

MDFF: Flexible Fitting of Known Structures
• Additional potential from the EM map: [equation shown on slide]
• Protocol to refine a 6.8-Å EM map of the ribosome: [figure]
Video: https://youtu.be/_hysNlxDkXw

Rosetta with Low-Resolution Constraints
DiMaio, F. et al. J. Mol. Biol. 392, 181, 2009.
• Rosetta comparative modeling (EM density at 4-6 Å resolution)
• Rosetta: building a model from a Cα trace
• EM density maps at 10 Å resolution: homology model vs. crystal structure vs. Rosetta model
• EM density maps at 4-6 Å resolution: hand-made model vs. crystal structure vs. Rosetta model

EM-Fold: Refinement guided by EM map
Lindert, S. et al. Structure 17, 990, 2009.
Energy function terms:
- radius of gyration => increased compactness
- distance between amino-acid pairs => good side-chain distances
- solvation of individual amino acids => reasonable solvent exposure
- loop distance => proper loop closure
- pairing of β-strands => proper folding of β-sheets
- packing of secondary structure elements
- connectivity => reasonable placement of SSEs
- occupancy => good correspondence with the cryo-EM map
Benchmarking using PDB structures of about 300 amino acids:
- good prediction for 7 of 10 selected proteins (RMSD < 4 Å)
- accuracy is sensitive to the correct prediction of the SSEs
Example: final refinement of the Helicobacter cysteine-rich protein C
Protocol (EM map at 5-7 Å resolution): 60,000 models; 75 models; 100 runs per model

Application to the adenovirus protein IIIa
Lindert, S. et al. Microsc. Microanal. 17 (Suppl. 2), 2011.
6.8-Å map of the N-terminus of protein IIIa:
- 400 amino acids, predicted 68% α-helical
- 14 rod-like densities identified
Partial model after EM-Fold analysis:
- 11 confidently placed α-helices
- 3 α-helices and the loops are ambiguous
Validation of the model:
- density bump at the location of Trp27
- match of Tyr residues in two other helices
[Figure: model #1, model #2, model #3, homolog]

Application to a domain of the DNA-PK catalytic subunit (4128 amino acids, 135 helices)
- EM-Fold applied only to the HEAT-repeat motif, with 25 density rods

De Novo Model Building
Wang, R. et al. Nature Methods 12, 335, 2015.
1. Matching fragments into EM density
2. Evaluating sets of compatible fragments (score_total)
3. Simulated annealing with MC sampling
4. Iterative assembly of models
5. Completing models with RosettaCM
6. Model building with Buccaneer
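Step 3 of the de novo protocol (simulated annealing with MC sampling) picks one fragment per position so that the combined score is low. The sketch below shows only the generic Metropolis annealing loop. The slide does not give the scoring used by Wang et al., so the fit scores, the clash set, the penalty and the cooling schedule are all hypothetical.

```python
import math
import random
from itertools import product

# Hypothetical density-fit scores (lower is better) for 3 candidate fragments at each of 3 positions.
FIT = [[-1.0, -0.2, -0.8],
       [-0.5, -1.2, -0.1],
       [-0.9, -0.3, -1.1]]
# Hypothetical incompatible neighbours: (position i, candidate at i, candidate at i + 1).
CLASHES = {(0, 0, 1), (1, 1, 2)}
CLASH_PENALTY = 2.0

def fragment_energy(assign):
    """Sum of per-fragment fit scores plus a penalty for each clashing neighbour pair."""
    e = sum(FIT[i][c] for i, c in enumerate(assign))
    e += sum(CLASH_PENALTY for i in range(len(assign) - 1)
             if (i, assign[i], assign[i + 1]) in CLASHES)
    return e

def anneal(steps=5000, t_start=10.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns the best assignment visited."""
    rng = random.Random(seed)
    current = [rng.randrange(len(row)) for row in FIT]
    e_cur = fragment_energy(current)
    best, e_best = list(current), e_cur
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        trial = list(current)
        pos = rng.randrange(len(trial))
        trial[pos] = rng.randrange(len(FIT[pos]))  # MC move: resample the fragment at one position
        e_trial = fragment_energy(trial)
        # Accept downhill moves always, uphill moves with probability exp(-dE / T)
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / t):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return tuple(best), e_best

print(anneal())
```

In this toy case the greedy choice (the best fragment at every position) clashes twice. Annealing moves away from it to the lowest-scoring compatible set, which is (2, 1, 0) here.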
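The first EM-Fold energy term listed above, radius of gyration, rewards compact arrangements. The slide does not give EM-Fold's functional form. The sketch below only shows how compactness can be measured from coordinates (for example, SSE centres), and adds a hypothetical penalty, `compactness_penalty` with a user-chosen `rg_target`. That penalty is an assumption, not EM-Fold's term.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of 3D points from their centroid."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)

def compactness_penalty(coords, rg_target):
    """Hypothetical term: zero while Rg <= rg_target, quadratic in the excess otherwise."""
    excess = radius_of_gyration(coords) - rg_target
    return excess ** 2 if excess > 0 else 0.0

points = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
print(radius_of_gyration(points), compactness_penalty(points, rg_target=0.5))
```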
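The equation for the MDFF "additional potential from the EM map" did not survive extraction. The sketch below assumes the form usually given for MDFF. For each atom j at position r_j with map density Φ(r_j):
V_EM = ξ[1 − (Φ − Φ_thr)/(Φ_max − Φ_thr)] if Φ ≥ Φ_thr, and V_EM = ξ otherwise.
The total potential is U_EM = Σ_j w_j V_EM(r_j). The scaling ξ and the per-atom weights w_j (for example, atomic masses) are assumptions here. Map interpolation and forces are omitted, so the density at each atom is passed in directly.

```python
def mdff_atom_potential(phi, phi_thr, phi_max, xi=1.0):
    """Assumed MDFF form: lowest (0) at the density maximum, flat (xi) below the threshold."""
    if phi >= phi_thr:
        return xi * (1.0 - (phi - phi_thr) / (phi_max - phi_thr))
    return xi  # constant outside the density, so it exerts no force there

def mdff_total_potential(densities, weights, phi_thr, phi_max, xi=1.0):
    """U_EM = sum over atoms of w_j * V_EM at that atom's map density."""
    return sum(w * mdff_atom_potential(phi, phi_thr, phi_max, xi)
               for phi, w in zip(densities, weights))

# Three atoms: at the map maximum, halfway up, and outside the density (hypothetical values).
print(mdff_total_potential([1.0, 0.5, -1.0], [12.0, 14.0, 16.0], phi_thr=0.0, phi_max=1.0))
```

Because the potential is lowest where the map density is highest, minimizing it during molecular dynamics pulls atoms into the density.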
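The rigid-body score CC(R_a, r_k) = Σ_j ρ_EM(r_j) ρ_probe(R_a r_j + r_k) from the rigid-body fitting slides can be illustrated with a toy, unnormalized version on a 2D integer grid. This is a brute-force global search over every rotation in 90° steps and every integer shift. It is not the LE, MC or SMC procedure. Real fitting programs sample rotations far more finely and typically use normalized scores. The probe densities are made up.

```python
from itertools import product

# Hypothetical asymmetric probe density on integer grid points.
PROBE = {(0, 0): 3.0, (1, 0): 2.0, (2, 0): 1.0, (0, 1): 1.5}

# Rotations by multiples of 90 degrees are exact on an integer grid.
ROTATIONS = {
    0: ((1, 0), (0, 1)),
    90: ((0, -1), (1, 0)),
    180: ((-1, 0), (0, -1)),
    270: ((0, 1), (-1, 0)),
}

def apply(rot, point, shift):
    """Return rot @ point + shift for 2D integer points."""
    (a, b), (c, d) = rot
    x, y = point
    return (a * x + b * y + shift[0], c * x + d * y + shift[1])

def cross_correlation(em_map, probe, rot, shift):
    """CC(R, t) = sum over map grid points r_j of rho_EM(r_j) * rho_probe(R r_j + t)."""
    return sum(rho * probe.get(apply(rot, r, shift), 0.0) for r, rho in em_map.items())

def exhaustive_search(em_map, probe, max_shift=6):
    """Score every rotation and integer translation; return (best CC, angle, shift)."""
    best = None
    for angle, rot in ROTATIONS.items():
        for shift in product(range(-max_shift, max_shift + 1), repeat=2):
            cc = cross_correlation(em_map, probe, rot, shift)
            if best is None or cc > best[0]:
                best = (cc, angle, shift)
    return best

# Usage: synthesize a "map" by rotating the probe 90 degrees and shifting it by (4, 1).
em_map = {apply(ROTATIONS[90], x, (4, 1)): rho for x, rho in PROBE.items()}
print(exhaustive_search(em_map, PROBE))  # -> (16.25, 270, (-1, 4))
```

The search returns 270° rather than 90° because, as written, the formula maps map coordinates into the probe frame. It therefore recovers the inverse of the placement.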
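The comparative-modeling objective turns each restraint's probability density into an "energy" by taking the negative log. That step can be sketched for Gaussian distance restraints on three hypothetical 2D points. Plain gradient descent stands in for the conjugate-gradient and simulated-annealing MD optimization named on the slide. The restraint values are made up.

```python
import math

# Hypothetical restraints between three 2D points: (i, j, mean distance, std dev).
RESTRAINTS = [(0, 1, 3.8, 0.1), (1, 2, 3.8, 0.1), (0, 2, 6.0, 0.3)]
START = [(0.0, 0.0), (3.0, 1.0), (5.0, -1.0)]

def restraint_energy(coords):
    """Objective = sum of -log p(d_ij), where each p is a normal density."""
    total = 0.0
    for i, j, mu, sigma in RESTRAINTS:
        d = math.dist(coords[i], coords[j])
        # -log of the normal pdf, written out so badly violated restraints cannot underflow
        total += 0.5 * ((d - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))
    return total

def gradient(coords):
    """Analytic gradient of restraint_energy with respect to every coordinate."""
    grad = [[0.0, 0.0] for _ in coords]
    for i, j, mu, sigma in RESTRAINTS:
        d = math.dist(coords[i], coords[j])
        scale = (d - mu) / (sigma ** 2 * d)
        for k in range(2):
            g = scale * (coords[i][k] - coords[j][k])
            grad[i][k] += g
            grad[j][k] -= g
    return grad

def minimize(coords, step=1e-3, iterations=20000):
    """Fixed-step gradient descent (a stand-in for conjugate gradients)."""
    coords = [list(c) for c in coords]
    for _ in range(iterations):
        for c, g in zip(coords, gradient(coords)):
            c[0] -= step * g[0]
            c[1] -= step * g[1]
    return coords

final = minimize(START)
print(restraint_energy(START), restraint_energy(final))
```

A tight restraint (small σ) contributes a steep well, so it dominates the objective more than a loose one. This is how restraints of different reliability are weighted automatically.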