Lecture 10: Fitting and Building Models
1. Model building and fitting into EM maps
2. Comparative and homology modeling
3. Rigid body fitting of atomic models
4. Flexible fitting of atomic models
5. Building models, hybrid methods
6. De novo model building
Model Building Approaches
D. Baker & A. Sali. Science 294, 93, 2001.
Comparative Modeling
Many more sequences available than structures
Many applications rely on structural information
Structure is often more conserved than sequence
(evolution preserves function)
1)  Assembly of rigid bodies 
(core, loops, sidechains)
2)  Segment matching
3)  Satisfaction of spatial restraints
A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993.
J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994.
A. Fiser, R. Do & A. Šali, Prot. Sci., 9, 1753, 2000.
Comparative Modeling
All information is combined into a single objective function: each restraint is expressed as a probability density function (PDF), and the restraints are converted to an “energy” by taking the negative log of their PDFs.
The function is optimized by conjugate gradients and by molecular dynamics with simulated annealing, starting from the target sequence threaded onto the template structure(s).
Sánchez, R., Sali, A. PNAS (1998) 95, 13597
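To make the negative-log idea concrete, here is a minimal Python sketch, assuming Gaussian distance restraints between three points in 2D (all restraint values are hypothetical). It sums -ln p over all restraints to form one objective function F and minimizes it with plain gradient descent, which stands in for the conjugate-gradient and simulated-annealing MD optimization described above; it is not MODELLER's implementation.

```python
import math

# Each restraint: (point_a, point_b, mean_distance, sigma), i.e. a Gaussian PDF
# p(d) = N(d; mean_distance, sigma) on the distance between two points.

def neg_log_gaussian(d, mu, sigma):
    """Return -ln p(d) for a Gaussian restraint p(d) = N(d; mu, sigma)."""
    return (d - mu) ** 2 / (2 * sigma ** 2) + math.log(sigma * math.sqrt(2 * math.pi))

def objective(coords, restraints):
    """F = -sum_i ln p_i(d_i): all restraints combined into one 'energy'."""
    return sum(neg_log_gaussian(math.dist(coords[a], coords[b]), mu, sigma)
               for a, b, mu, sigma in restraints)

def gradient(coords, restraints):
    """Analytic dF/dx for 2D coordinates (assumes restrained points never coincide)."""
    grad = [[0.0, 0.0] for _ in coords]
    for a, b, mu, sigma in restraints:
        dx = coords[a][0] - coords[b][0]
        dy = coords[a][1] - coords[b][1]
        d = math.hypot(dx, dy)
        g = (d - mu) / (sigma ** 2 * d)  # (dF/dd) * (1/d)
        grad[a][0] += g * dx
        grad[a][1] += g * dy
        grad[b][0] -= g * dx
        grad[b][1] -= g * dy
    return grad

def minimize(coords, restraints, step=0.1, n_iter=2000):
    """Plain gradient descent: a simplified stand-in for conjugate gradients."""
    coords = [list(c) for c in coords]
    for _ in range(n_iter):
        for c, g in zip(coords, gradient(coords, restraints)):
            c[0] -= step * g[0]
            c[1] -= step * g[1]
    return coords

# Hypothetical restraints describing a 3-4-5 triangle, and a distorted start
restraints = [(0, 1, 3.0, 1.0), (0, 2, 4.0, 1.0), (1, 2, 5.0, 1.0)]
start = [(0.0, 0.0), (2.5, 0.0), (0.0, 3.5)]
model = minimize(start, restraints)
```

Because the three target distances are mutually consistent, the minimum of F satisfies all of them at once; with conflicting restraints the minimum would instead be a compromise weighted by 1/σ² of each restraint.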
Model Accuracy vs. Sequence Identity
Marti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, 291-325, 2000.
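Examples (X-ray structure vs. model):
HIGH ACCURACY: NM23, seq id 77%; Cα equiv 147/148, RMSD 0.41 Å. Scope for improvement: sidechains.
MEDIUM ACCURACY: CRABP, seq id 41%; Cα equiv 122/137, RMSD 1.34 Å. Scope for improvement: sidechains, core backbone, loops.
LOW ACCURACY: EDN, seq id 33%; Cα equiv 90/134, RMSD 1.17 Å. Scope for improvement: sidechains, core backbone, loops, alignment, fold (SSE).
(The RMSD is computed over the equivalent Cα atoms only, so the lower EDN value covers fewer residues.)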
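The two accuracy numbers can be computed as sketched below, assuming the model and X-ray Cα coordinates are already superposed and residue-matched. The 3.5 Å equivalence cutoff and the coordinates are illustrative assumptions, not values taken from the cited study.

```python
import math

def calpha_accuracy(model, xray, cutoff=3.5):
    """Compare superposed C-alpha coordinates of a model and an X-ray structure.

    Returns (n_equivalent, n_total, rmsd); the RMSD is taken over the
    equivalent pairs only, i.e. pairs closer than `cutoff` angstroms.
    """
    sq_devs = [math.dist(m, x) ** 2 for m, x in zip(model, xray)]
    equivalent = [s for s in sq_devs if s <= cutoff ** 2]
    rmsd = math.sqrt(sum(equivalent) / len(equivalent)) if equivalent else float("nan")
    return len(equivalent), len(sq_devs), rmsd

# Hypothetical 4-residue example: the last model C-alpha is 5 Å off and not equivalent
xray = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.3), (3.8, 0.4, 0.0), (7.6, 0.0, 0.0), (11.4, 5.0, 0.0)]
n_equiv, n_total, rmsd = calpha_accuracy(model, xray)
```

The badly placed residue lowers the equivalent count but does not inflate the RMSD, which is why both numbers are reported together.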
I‐TASSER: Protein Structure Prediction 
I‐TASSER workflow:
Accuracy estimation:
C-score > -1.5
=> correct topology in about 90% of cases
Roy, A. et al. (2010) Nat. Protocols, 5, 725.
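The C-score itself is not shown on the slide. As we understand the definition in Roy et al. (2010), it combines the size of the selected decoy cluster, the cluster's structural spread and the threading Z-scores: C-score = ln[(M/M_tot) · (1/⟨RMSD⟩) · ∏ Z(i)/Z0(i)]. The sketch below evaluates that expression with made-up inputs; treat it as an illustration, not as I-TASSER code.

```python
import math

def c_score(n_cluster, n_total, mean_rmsd, z_scores, z_cutoffs):
    """C-score = ln( M/M_tot * 1/<RMSD> * prod_i Z(i)/Z0(i) ).

    n_cluster : M, decoys in the selected structure cluster
    n_total   : M_tot, all decoys generated
    mean_rmsd : <RMSD>, average RMSD of cluster decoys to the cluster centroid (Å)
    z_scores  : Z(i), best threading Z-score from each threading program i
    z_cutoffs : Z0(i), program-specific Z-score cutoffs
    """
    value = n_cluster / n_total / mean_rmsd
    for z, z0 in zip(z_scores, z_cutoffs):
        value *= z / z0
    return math.log(value)

def likely_correct_topology(score, threshold=-1.5):
    """Apply the C-score > -1.5 rule of thumb from the slide."""
    return score > threshold

# Hypothetical inputs: a large, tight cluster vs. a small, diffuse one
score_good = c_score(800, 1000, 2.0, [2.0], [1.0])
score_poor = c_score(100, 1000, 8.0, [1.0], [1.0])
```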
Comparative Modeling
• Problem: comparative models are often inaccurate.
• Solution: use cryoEM maps to assess the models by
rigid density fitting.
• Problem: the structures may exhibit conformational changes
(induced fit, target-template differences).
• Solution: use flexible fitting to refine the structures in the
map.
• Problem: the resolution of the map can be too low for an
unambiguous placement of a component.
• Solution: use additional information to determine the assembly
architecture.
Topf & Sali. Curr Opin Struct Biol 2005.
Errors in Comparative Modeling
• Distortion and shifts of aligned regions
• Regions without a template
• Sidechain packing
• Incorrect templates
• Misalignments
• Rigid-body movements
Information & EM-map Resolution
GroEL at different resolutions (levels of detail): 20 Å, 10 Å, 2 Å
• Fitting of known structures (rigid-body fitting)
• Flexible fitting of known structures
• Building of de novo models
Rigid Body Fitting of Known Structures
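$$\mathrm{CC}(R_a, \mathbf{r}_k) = \sum_{j=1}^{J} \rho_{\mathrm{EM}}(\mathbf{r}_j)\, \rho_{\mathrm{probe}}(R_a \mathbf{r}_j + \mathbf{r}_k)$$
where R_a is a rotation, r_k a translation, and the sum runs over the J grid points r_j of the EM map.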
• LE: local exhaustive search (rotations only, or rotations + translations)
• MC: Monte Carlo search in translation, with exhaustive rotation
• SMC: scanning of the map to find regions with high CC, followed by an LE or MC search
Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.
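As a hedged illustration of the CC score and an LE-style exhaustive search, the sketch below works on a 2D toy grid, restricts R_a to 90° rotations and r_k to integer shifts, and stores densities in dictionaries. Real fitting tools sample 3D rotations finely, interpolate densities and usually normalize the CC; those details are omitted here.

```python
from itertools import product

def rotate90(point, k):
    """Rotate an integer 2D grid point by k * 90 degrees counterclockwise about the origin."""
    x, y = point
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def cross_correlation(em_map, probe, k, shift):
    """CC(R_a, r_k) = sum_j rho_EM(r_j) * rho_probe(R_a r_j + r_k) on sparse grids.

    em_map, probe: dicts mapping (x, y) grid points to density values.
    R_a is a rotation by k * 90 degrees; r_k is the integer translation `shift`.
    """
    total = 0.0
    for r_j, rho_em in em_map.items():
        x, y = rotate90(r_j, k)
        total += rho_em * probe.get((x + shift[0], y + shift[1]), 0.0)
    return total

def exhaustive_search(em_map, probe, max_shift=5):
    """Score all 4 rotations x all shifts in [-max_shift, max_shift]^2; keep the best."""
    shifts = range(-max_shift, max_shift + 1)
    best = (float("-inf"), None, None)
    for k, dx, dy in product(range(4), shifts, shifts):
        score = cross_correlation(em_map, probe, k, (dx, dy))
        if score > best[0]:
            best = (score, k, (dx, dy))
    return best

# Toy example: an asymmetric probe with distinct densities (so the optimum is unique),
# placed into the "map" so that rho_EM(r) = rho_probe(R r + t) for k = 1, t = (2, -1).
probe = {(0, 0): 1.0, (1, 0): 2.0, (2, 0): 3.0, (0, 1): 4.0}
true_k, true_shift = 1, (2, -1)
em_map = {rotate90((px - true_shift[0], py - true_shift[1]), -true_k): rho
          for (px, py), rho in probe.items()}
score, k, shift = exhaustive_search(em_map, probe)
```

The search recovers the placement used to build the map; MC and SMC differ only in how the (R_a, r_k) candidates are proposed, not in the score.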
Rigid Body Fitting of Known Structures
Native structure
(0)1.00
Best-fitting model
(1)0.69
Single-component fitting result
Multi-component optimization result
20 Å resolution
1cid:2rhe
12% seq. identity
10 Å resolution
Rigid Body Fitting of Known Structures
Avoiding fitting clashes → Symmetric fitting
• Fit one monomer while accounting for clashes with the symmetrically placed monomers.
• This optimizes the correlation of the full symmetric assembly by moving only one monomer.
• Clashes are avoided implicitly: if two monomers overlap, they create double density that correlates poorly with the experimental map, so no special repulsion term is introduced.
• Fit command in Chimera: "fit #1 #0 res 20 sym true".
Avoiding fitting clashes → Sequential fitting
• Fit the three monomers sequentially, subtracting density: each monomer is fitted in turn after the density of the other two has been subtracted from the map.
• Repeat the last command to improve convergence.
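The idea behind sequential fitting with density subtraction can be sketched in 1D: each monomer is simulated as a Gaussian blob (a hypothetical stand-in for a density simulated from the atomic model), and each copy is placed by an exhaustive translation scan against the map minus the other copies. The two-monomer setup and all numbers are assumptions chosen for clarity; this is not Chimera's implementation.

```python
import math

def blob(center, grid, sigma=2.0):
    """Simulated density of one monomer: a 1D Gaussian sampled on the grid (toy stand-in)."""
    return [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in grid]

def best_center(target, grid):
    """Exhaustive translation scan: the grid position whose blob correlates best with target."""
    def score(center):
        return sum(t * b for t, b in zip(target, blob(center, grid)))
    return max(grid, key=score)

def sequential_fit(em_map, grid, centers, rounds=3):
    """Fit each monomer in turn against the map minus the density of all the others."""
    centers = list(centers)
    for _ in range(rounds):  # repeating the cycle helps convergence
        for i in range(len(centers)):
            residual = list(em_map)
            for j, c in enumerate(centers):
                if j != i:  # subtract the other monomers' simulated density first
                    residual = [r - b for r, b in zip(residual, blob(c, grid))]
            centers[i] = best_center(residual, grid)
    return centers

grid = list(range(41))
em_map = [a + b for a, b in zip(blob(10, grid), blob(30, grid))]  # monomers at 10 and 30
fitted = sequential_fit(em_map, grid, centers=[12, 12])  # both copies start together
```

Both copies start at the same position; without the subtraction step, each copy could independently land on the same density peak (a clash), whereas here the first copy is pushed to the unoccupied peak.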
MDFF: Flexible Fitting of Known Structures
Additional potential from the EM map:
Protocol to refine a 6.8-Å EM map of the ribosome:
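The slide's formula for the additional potential did not survive extraction. In the MDFF method as published by Trabuco et al. (2008), the map Φ(r) is turned into a grid potential V_EM that is lowest where the density is highest and flat below a threshold Φ_thr, and the atoms feel U_EM = Σ_j w_j V_EM(r_j). The 1D sketch below evaluates that potential and the resulting force; the toy map, the atom weight (12, roughly a carbon mass) and the scaling factor ξ = 0.3 are illustrative assumptions.

```python
import math

def mdff_potential(phi, phi_thr, phi_max, xi=0.3):
    """MDFF-style grid potential V_EM at a point where the map has density phi.

    V_EM = xi * (1 - (phi - phi_thr) / (phi_max - phi_thr))   if phi >= phi_thr
    V_EM = xi                                                   otherwise
    The potential is lowest (0) at phi_max and flat below the threshold.
    """
    if phi >= phi_thr:
        return xi * (1.0 - (phi - phi_thr) / (phi_max - phi_thr))
    return xi

def em_energy(positions, weights, density, phi_thr, phi_max, xi=0.3):
    """U_EM = sum_j w_j * V_EM(r_j), here for atoms on a 1D axis."""
    return sum(w * mdff_potential(density(x), phi_thr, phi_max, xi)
               for x, w in zip(positions, weights))

def em_force(x, weight, density, phi_thr, phi_max, xi=0.3, h=1e-5):
    """Force f = -dU_EM/dx on one atom, by central finite differences."""
    u_plus = em_energy([x + h], [weight], density, phi_thr, phi_max, xi)
    u_minus = em_energy([x - h], [weight], density, phi_thr, phi_max, xi)
    return -(u_plus - u_minus) / (2 * h)

def toy_density(x):
    """Hypothetical 1D map: a single density peak at x = 0 with maximum 1."""
    return math.exp(-x * x / 2)
```

An atom placed on either flank of the peak is pushed toward x = 0, which is how the map drags the structure into density during the MD simulation.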
MDFF: Flexible Fitting of Known Structures
https://youtu.be/_hysNlxDkXw
Rosetta with Low-Resolution Constraints
Rosetta: comparative modeling (EM density at 4-6 Å resolution):
DiMaio, F. et al. (2009) J.Mol.Biol., 392, 181.
Rosetta: building a model from a Cα trace:
Rosetta with Low-Resolution Constraints
EM density maps at 10 Å resolution:
DiMaio, F. et al. (2009) J.Mol.Biol., 392, 181.
Homology model
Crystal structure
Rosetta model
EM density maps at 4-6 Å resolution:
Hand‐made model
Crystal structure
Rosetta model
EM‐Fold: Refinement guided by EM map
Lindert, S. et al. (2009) Structure, 17, 990.
Energy function terms:
‐ radius of gyration => increased compactness
‐ distances between AA pairs => reasonable side-chain distances
‐ solvation of individual AA => reasonable solvent exposure
‐ loop distance => proper closure of loops
‐ pairing of β-strands => proper folding of β-sheets
‐ packing of secondary structure elements
‐ connectivity => reasonable placement of SSEs
‐ occupancy => good correspondence with the cryoEM map
Benchmarking with PDB structures of about 300 AA:
‐ good prediction for 7 of 10 selected proteins (RMSD < 4 Å)
‐ accuracy is sensitive to the correct prediction of SSEs
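EM-Fold combines terms like those listed above into a single score. A minimal sketch of that pattern, assuming a plain weighted sum: the radius-of-gyration input is computed from hypothetical SSE center coordinates, while the occupancy value and both weights are made up. EM-Fold's actual terms are knowledge-based potentials that this sketch does not reproduce.

```python
import math

def radius_of_gyration(points):
    """Rg of a set of 3D points, e.g. the centers of placed SSEs (compactness input)."""
    n = len(points)
    center = tuple(sum(p[k] for p in points) / n for k in range(3))
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / n)

def total_score(term_values, weights):
    """Weighted sum of energy terms; lower is better (sign convention assumed)."""
    return sum(weights[name] * value for name, value in term_values.items())

# Hypothetical SSE centers (Å), term values and weights
sse_centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (10.0, 10.0, 0.0)]
terms = {"radius_of_gyration": radius_of_gyration(sse_centers), "occupancy": -12.0}
weights = {"radius_of_gyration": 0.5, "occupancy": 1.0}
score = total_score(terms, weights)
```

Keeping each term separate until the final weighted sum makes it easy to see which term drives a given placement, for example poor occupancy vs. a loose, non-compact arrangement.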
Example: Final refinement of the Helicobacter cysteine-rich protein C
Protocol (EM map at 5-7 Å resolution):
60,000 models → 75 models → 100 runs per model
EM‐Fold: Refinement guided by EM map
Lindert, S. et al. (2009) Structure, 17, 990.
Application to the adenovirus protein IIIa
6.8-Å map of the N-terminal region of protein IIIa
‐ 400 AA, predicted 68% α-helical
‐ identified 14 rod-like densities
A partial model after EM-Fold analysis
‐ 11 confidently placed α-helices
‐ 3 α-helices and loops are ambiguous
Validation of the model
‐ density bump at the location of Trp27
‐ match of Tyr residues in two other helices
Lindert, S. et al. (2011) Microsc. Microanal. 17 (Supp 2)
Panels: model #1, model #2, model #3, homolog
Application to a domain of the DNA-PK catalytic subunit (4128 AA, 135 helices)
- EM-Fold applied only to the HEAT-repeat motif, with 25 density rods
De Novo Model Building
Wang, R. et al. (2015) Nature Methods, 12, 335
1. Matching fragments into EM density
2. Evaluating sets of compatible fragments (score_total)
3. Simulated annealing with MC sampling
4. Iterative assembly of models
5. Completing models with RosettaCM
6. Model building with Buccaneer
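Step 3 can be illustrated with a toy Monte Carlo simulated-annealing search that picks one candidate fragment placement per position, trading each placement's density fit against pairwise compatibility with its neighbor, in the spirit of the combined score in step 2. The energies, the nearest-neighbor pair model and the cooling schedule are hypothetical assumptions, not the published protocol.

```python
import math
import random
from itertools import product

def total_energy(choice, density, pair):
    """Energy of one candidate placement per position (lower is better).

    density[i][c] : density-fit energy of candidate c at position i
    pair[i][a][b] : incompatibility penalty between candidate a at i and b at i + 1
    """
    energy = sum(density[i][c] for i, c in enumerate(choice))
    energy += sum(pair[i][choice[i]][choice[i + 1]] for i in range(len(choice) - 1))
    return energy

def anneal(density, pair, n_steps=2000, t_start=5.0, t_end=0.05, seed=0):
    """Monte Carlo simulated annealing over fragment choices; returns the best state seen."""
    rng = random.Random(seed)
    choice = [0] * len(density)
    energy = total_energy(choice, density, pair)
    best, best_energy = list(choice), energy
    for step in range(n_steps):
        temp = t_start * (t_end / t_start) ** (step / (n_steps - 1))  # geometric cooling
        trial = list(choice)
        i = rng.randrange(len(density))
        trial[i] = rng.randrange(len(density[i]))  # re-pick the fragment at one position
        trial_energy = total_energy(trial, density, pair)
        # Metropolis criterion: always accept downhill, sometimes accept uphill
        if trial_energy <= energy or rng.random() < math.exp((energy - trial_energy) / temp):
            choice, energy = trial, trial_energy
            if energy < best_energy:
                best, best_energy = list(choice), energy
    return best, best_energy

# Hypothetical toy scores: 3 positions, 2 candidate fragments each
density = [[-1.0, -0.8], [-0.5, -1.2], [-0.9, -1.0]]
pair = [[[0.0, 2.0], [0.0, 0.0]], [[0.0, 0.0], [1.5, 0.0]]]
best, best_energy = anneal(density, pair)
```

Picking the best density fit at each position independently would choose candidates 0, 1, 1, which clash at the first junction; the annealed selection should instead settle on the compatible set. For a problem this small, enumerating all 2³ combinations gives the same answer; annealing matters when the number of positions and candidates makes enumeration impossible.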