1
Synthetic protein biology I.
Introduction to protein engineering & computational
(de novo) protein design
Modifying protein structure and function for biotechnology,
biomedicine and basic research
Dr. Martin Marek
Loschmidt Laboratories
Faculty of Science, MUNI
Kamenice 5, bld. A13, room 332
martin.marek@recetox.muni.cz
2
What will we talk about
• Introduction to protein engineering and design
• definition, goals and applications
• Rational protein design (knowledge-based strategies)
• concepts, methodology, limitations, success stories
• Directed evolution (lab-based brute force engineering)
• strategies, methodology, disadvantages, success stories
• Integrative (combined) approaches
• the best of both approaches, beneficial synergy, examples
• Selection and screening technologies
• classical versus emerging technologies, unmet challenges
3
Introduction to protein engineering
Concepts
Methods
Applications
4
What protein engineering is
• Protein engineering
The activities to modify the structure and molecular function of a
protein so that it acquires new specific properties (stability,
substrate specificity, enantioselectivity etc.)
• Genetic engineering
The alteration of the genome of a target organism by laboratory
techniques.
5
The anatomy of protein structure
• Proteins are an important class of biological macromolecules which are
polymers of amino acids
• Biochemists have distinguished several levels of structural organization
of proteins
6
Protein synthesis: the ribosome
• The ribosomes are protein synthesizers
of the cell
• Made up of rRNAs and distinct
ribosomal proteins
• Arranged into two pieces:
– Small ribosomal subunit
– Large ribosomal subunit
7
Protein variety and functional diversity
• Single-domain proteins
• Multi-domain proteins
• Multi-subunit protein complexes
• DNA-binding proteins
• Protein-RNA complexes
• Sugar-binding protein
• Light-emitting proteins
• Integral membrane proteins
• Intrinsically disordered proteins
• Etc.
8
The basic concepts of protein engineering
• The amino acid sequence of a protein affects both its
structure and its function
• Thus, the ability to modify the sequence, and hence the structure and
activity, of individual proteins in a systematic way, opens up many
opportunities, both scientifically and biotechnologically
• Today, large DNA sequences can be synthesised de novo, allowing an
unprecedented ability to engineer proteins with novel functions
• However, the number of possible proteins (protein sequence space) is
far too large to test individually, so we need a ‘navigation system‘ to find
desirable molecular activities and other properties
9
Protein sequence space
• There are 400 possible dipeptides arranged in a 20 × 20 space, but that
expands to about 10^130 for even a small protein of 100 amino acids
• Most sequences in sequence space have no function, leaving relatively small
regions that are populated by naturally occurring proteins
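The numbers on this slide follow from 20 choices at each position. A minimal sketch of the arithmetic, assuming only the 20 standard amino acids (the function name is illustrative):

```python
import math

AMINO_ACIDS = 20  # standard proteinogenic amino acids


def sequence_space_size(length: int) -> int:
    """Number of distinct amino acid sequences of the given length."""
    return AMINO_ACIDS ** length


dipeptides = sequence_space_size(2)
small_protein = sequence_space_size(100)

print(dipeptides)                         # 400
print(f"{small_protein:.2e}")             # 1.27e+130
print(round(math.log10(small_protein)))   # 130
```

Python integers have arbitrary precision, so 20^100 is computed exactly before it is formatted.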
10
Major milestones in protein engineering
11
Protein engineering: goals
12
Why is protein engineering important for synthetic biology?
• Protein engineering is creating building blocks for synthetic
biology applications
13
Practical applications of protein engineering
Why should we bother to think about protein engineering?
14
The two major strategies of protein engineering
15
Rational protein design
Concepts
Methods
Applications
16
Rational protein design workflow
• Based on protein knowledge
Evolutionary history, natural variation
Bioinformatics (MSA, ASR etc.)
Biochemistry & biophysics
Structure & dynamics
Mode of action, molecular mechanism
• Similar to mechanical engineering
17
Rational protein design
BENEFITS
• Range of available techniques
• Controlled outcome
• Intellectually satisfying
• Increasing computational power
LIMITATIONS
• Deep understanding is essential
Evolutionary history, natural variation
Structure & dynamics
Mode of action, molecular mechanism
• Algorithms are not perfect
• High failure rate
Unsuccessful stories are not reported
18
Rational protein design
“Low-resolution“ design
• Engineered fusion proteins
• Split proteins
• Chemical modifications
• Antibody-drug conjugates
• Cyclisation
• Disulphides
“High-resolution“ design
• Manipulating existing proteins
• De novo protein design
• Computational modelling
19
“Low-resolution“ protein design
Concepts
Methods
Applications
20
Fusion protein technologies
• The fundamental idea is to combine protein units of defined function
(domains) to engineer a fusion protein with novel functionality
• Examples include biosensors, chimeric antibodies, signal transduction
components, DNA-binding transcription factors, cell biology applications,
structural biology applications, therapeutics etc.
21
Fusion proteins: considerations
• Remove the stop codon of the first gene
• Ligate (fuse) the genes together in frame
• Include linker codons
• Linker length and flexibility
• Distance between protein components
• Ability of the proteins to rotate relative to each other
• Protease resilience
• Ability for domains to fold
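The first three considerations can be sketched at the DNA level. This is a minimal illustration, not a cloning protocol: the toy gene sequences and the Gly-Gly-Gly-Gly-Ser linker are hypothetical, and the function names are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_stop(cds: str) -> str:
    """Remove a terminal stop codon from a coding sequence, if present."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def fuse_in_frame(first_cds: str, linker_codons: str, second_cds: str) -> str:
    """Join two coding sequences through a linker while keeping the reading frame."""
    linker = linker_codons.upper()
    second = second_cds.upper()
    if len(linker) % 3 != 0 or len(second) % 3 != 0:
        raise ValueError("linker and second gene must be whole codons")
    return strip_stop(first_cds) + linker + second


# Toy sequences (hypothetical, for illustration only)
gene_a = "ATGGCTAAATAA"      # Met-Ala-Lys-stop
linker = "GGTGGCGGTGGCAGC"   # Gly-Gly-Gly-Gly-Ser
gene_b = "ATGGAAGAACTGTAA"   # Met-Glu-Glu-Leu-stop

fusion = fuse_in_frame(gene_a, linker, gene_b)
codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
print(codons)
```

Only the stop codon of the second gene remains, so translation runs through the linker into the second domain. Linker length, protease resilience and folding still have to be judged experimentally.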
22
Examples of synthetic fusion protein applications
1. Substrate channeling was achieved by fusing
dihydroxyacetone (DHA) kinase with fructose-1,6-bisphosphate
(FBP) aldolase using a linker peptide
QGQGQ.
2. A protein On and Off switch was built by linking the
tetracycline repressor protein (TetR or rTetR) and the
transcription activator (LuxR∆N) together. Depending
on the presence of anhydrotetracycline, TetR/rTetR
undergoes a conformational change and binds to tetO,
which allows the fused LuxR∆N to bind the luxbox
sequence, thereby controlling downstream gene
expression.
3. A fusion protein-based biosensor was created for Ca2+
detection. The system consisted of a tandem fusion of
the cyan fluorescent protein (CFP), N-terminal fragment
of calmodulin (CaM), CaM-binding peptide from CaM-dependent
kinase (CKKp), C-terminal fragment of CaM
and yellow fluorescent protein (YFP). CaM is able to
bind Ca2+ and thus wraps around the fused CKKp,
which places the two fluorophores in close proximity to
enhance the FRET efficiency.
4. The dimeric structure of a typical Fc fusion protein. An
effector protein is covalently attached to an
immunoglobulin Fc domain that provides immune
functions and extends half-life.
Yu et al., Biotech. Adv. 1: 155-164 (2015)
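The Ca2+ biosensor in example 3 works because FRET efficiency falls off steeply with the distance between the two fluorophores. A minimal sketch using the standard Förster relation E = 1 / (1 + (r/R0)^6); the Förster radius of 5 nm and the distances are assumed, illustrative values, not measurements from the cited work.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Foerster relation: E = 1 / (1 + (r / R0)^6).

    r_nm  : donor-acceptor distance in nm
    r0_nm : Foerster radius in nm (assumed illustrative default)
    """
    if r_nm <= 0 or r0_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Hypothetical distances: compact (Ca2+-bound) versus extended sensor
for r in (3.0, 5.0, 8.0):
    print(f"r = {r:.1f} nm -> E = {fret_efficiency(r):.2f}")
```

At r = R0 the efficiency is exactly one half, and a change of a few nanometres moves the sensor between nearly full and nearly no energy transfer.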
23
Enhanced DNA polymerase processivity via protein fusion
Wang et al., Nucleic Acids Res. 32: 1197-1207 (2004)
• The fusion of a heterologous dsDNA binding protein to a polymerase can increase
processivity without compromising catalytic activity and enzyme stability.
• Polymerase processivity is limiting for the efficiency of PCR, such that the fusion
enzymes exhibit profound advantages over unmodified enzymes in PCR applications.
• This technology improved the performance of nucleic acid modifying enzymes.
24
Cabantous et al., Scientific Reports 3: 2854 (2013)
Split protein technology
• A promising approach for deconvoluting the role of macromolecular partnerships
is split-protein reassembly, also called protein fragment complementation.
• This approach relies on the appropriate fragmentation of protein reporters, such
as the green fluorescent protein or firefly luciferase, which when attached to
possible interacting partners can reassemble and regain function, thereby
confirming the partnership.
• Split-protein methods are effectively utilized for detecting protein-protein
interactions in cell-free systems, E. coli, yeast, mammalian cells, plants and
live animals.
25
Split protein technology: basic concept
• A generic split-protein system is shown where a functional protein is dissected into two
inactive fragments, purple and yellow.
• The attachment of two interacting proteins or protein domains brings the inactive
fragments into close proximity and overcomes the entropic cost of fragmentation.
• This leads to the reassembly or complementation of the fragments thus providing a
direct readout for the partnership between the interacting domains.
• Crystal structures of representative proteins which have been shown to be amenable to
interaction dependent reassembly.
Shekhawat and Ghosh,
Curr. Opin. Chem. Biol. 15: 789-797 (2012)
26
Chemical modification of proteins
• Formaldehyde
Extensive modification → inactivated protein toxins
• Poly-ethylene glycol
Flexible hydrophilic coat → solubility
Reduced accessibility → Protease resistance, non-antigenicity
Increased size → Serum half-life
• Fluorescent probes (Fluorescein, Cy5 etc.)
Labelling → tracking location in cell, dynamics
• Prosthetic catalytic groups
Modified reactivity → new catalytic properties
• Antibody-drug conjugates (ADCs)
Covalent linking of antibodies with small-molecule drugs
[Figure: chemical structures of poly-ethylene glycol, formaldehyde, fluorescein and selenocysteine]
27
Antibody-drug conjugates (ADCs)
• Antibody–drug conjugates (ADCs) are one of the
fastest growing classes of oncology therapeutics
• ADCs consist of recombinant monoclonal
antibodies (mAbs) that are covalently bound to
cytotoxic chemicals (known as warheads) via
synthetic linkers
• Such immunoconjugates combine the antitumour
potency of highly cytotoxic small-molecule drugs
(300–1,000 Da, with subnanomolar half-maximal
inhibitory concentration (IC50) values) with the
high selectivity, stability and favourable
pharmacokinetic profile of mAbs
• Alternative formats to mAbs, such as protein
scaffolds (designed ankyrin-repeat proteins
(DARPins), nanobodies, single-chain variable
fragments (scFvs) and peptide–drug conjugates),
dual-labelled ADCs and biparatopic drug
conjugates, present new research avenues
Beck et al., Nature Rev. Drug Discov.,
16: 315-337 (2017)
28
Antibody-drug conjugates: the mode of action
• Anti-human epidermal growth factor receptor 2 (HER2) monoclonal
antibody trastuzumab conjugated to the maytansinoid DM1 via a
nonreducible thioether linkage (MCC), with potential antineoplastic activity
• Antibody-drug conjugate composed of a CD30-directed monoclonal
antibody that is covalently linked to the antimicrotubule agent
monomethyl auristatin E (MMAE)
29
Protein cyclisation
• Termini of most proteins happen to be close together
• Cyclisation via head-to-tail linkage of the termini of a peptide chain occurs in only
a small percentage of proteins, but engenders the resultant cyclic proteins with
exceptional stability
• Engineering efforts to cyclise peptides or proteins to gain superior stability and/or
protease resistance
Daniel and Clark, Adv. Exp. Med.
Biol., vol. 1030, pp 229-225,
Springer, (2017)
30
An example of successful protein cyclisation
• Conotoxins are disulfide-rich peptides found in the venoms of marine snails (Conus)
• Pain killer activity by specific binding to ion channels
• They have attracted great attention from the pharmaceutical industry because of their
potential uses as drug leads, but like most peptides, conotoxins are susceptible to
proteolysis and typically are not orally bioavailable
• Multiple approaches have been used to
stabilise conotoxins to improve their
potential pharmaceutical use
• Specifically, the use of backbone
cyclisation dramatically improved their
stability in biological fluids and
protease resistance
Wu et al., Eur. J. Organ. Chem.
3462-3472 (2016)
31
Disulfide bond engineering to enhance protein stability
• Two cysteine residues in close proximity
will form a covalent bond
• Disulphide bond, disulphide bridge
• Significantly stabilizes protein tertiary
structure
• Engineering efforts to mutate two codons
into cysteines → creation of disulphide
bond in oxidising environment
• Considerations: inter-cysteine distance
and inter-cysteine orientation
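The distance consideration can be turned into a simple first-pass filter over residue pairs. A minimal sketch, assuming a Cβ-Cβ cutoff of 4.5 Å (an illustrative threshold, not a value from this lecture) and hypothetical coordinates; it does not check inter-cysteine orientation, which needs a proper geometric model.

```python
import math


def disulfide_candidate(cb_i, cb_j, max_cb_dist=4.5):
    """Return True if two C-beta positions (in Å) are close enough to
    consider mutating both residues to cysteine.

    max_cb_dist is an assumed, illustrative cutoff.
    """
    return math.dist(cb_i, cb_j) <= max_cb_dist


# Hypothetical C-beta coordinates of three residues (Å)
cb = {
    12: (10.0, 4.0, 7.5),
    47: (13.5, 5.0, 8.0),
    90: (22.0, 1.0, 3.0),
}

pairs = [(i, j) for i in cb for j in cb if i < j]
candidates = [(i, j) for i, j in pairs if disulfide_candidate(cb[i], cb[j])]
print(candidates)
```

Pairs that pass such a filter are only candidates; whether the bond forms and stabilises the fold must be tested in an oxidising environment.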
32
“High-resolution“ protein design
Concepts
Methods
Applications
33
Design of new protein functions
• Proteins fold to their lowest free
energy states
• For designing new proteins, we must
be able to calculate energies
reasonably accurately, and sample
protein conformations sufficiently to
find global minimum
• To design proteins with new functions,
we need hypotheses about
configurations of atoms necessary to
achieve desired function
• Finally, we have to test experimentally
all designed proteins
34
“High-resolution“ protein design: requirements
What knowledge is required for “high-resolution” protein engineering:
• determination of 3D structure, for mutagenesis-based engineering
• knowledge of protein folding rules for de novo engineering
• computational modelling techniques usually required
Computational methods important for protein engineering:
• modelling & visualization
• energy/thermodynamic calculations
• searching conformation and sequence spaces
• comparison with known protein structures/sequences
The basis of more automated analysis of structural perturbations than
our own “inspect and try” approach involves use of an energy function
to evaluate plausibility of candidate structures:
• $E_{tot} = E_{bond} + E_{angl} + E_{dihe} + E_{impr} + E_{VDW} + E_{elec} + E_{Hbond} + \dots$
• This may be evaluated using a force field (e.g. CHARMM, Amber) and
atomic coordinates available from simulation or modified PDB file
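Two of the terms in the total energy on this slide, E_VDW and E_elec, can be sketched for a handful of atoms. This is a toy illustration under stated assumptions: a 12-6 Lennard-Jones form, simple combination rules, an approximate Coulomb conversion constant, and made-up atom parameters. Real force fields such as CHARMM or Amber add the bonded terms, exclude directly bonded neighbours and use their own parameter sets.

```python
import math
from itertools import combinations

# Approximate conversion factor in kcal*Å/(mol*e^2); the exact value
# differs slightly between force fields.
COULOMB_CONST = 332.06


def lennard_jones(r, epsilon, r_min):
    """12-6 van der Waals term with a minimum of -epsilon at r = r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Electrostatic term between two point charges."""
    return COULOMB_CONST * q_i * q_j / (dielectric * r)


def nonbonded_energy(atoms):
    """Sum E_VDW + E_elec over all atom pairs (no exclusions, toy model)."""
    total = 0.0
    for a, b in combinations(atoms, 2):
        r = math.dist(a["xyz"], b["xyz"])
        eps = math.sqrt(a["epsilon"] * b["epsilon"])   # assumed combination rule
        r_min = a["r_min_half"] + b["r_min_half"]      # assumed combination rule
        total += lennard_jones(r, eps, r_min)
        total += coulomb(r, a["charge"], b["charge"])
    return total


# Hypothetical atoms: coordinates in Å, charges in e, epsilon in kcal/mol
toy_atoms = [
    {"xyz": (0.0, 0.0, 0.0), "charge": 0.4, "epsilon": 0.1, "r_min_half": 1.8},
    {"xyz": (3.6, 0.0, 0.0), "charge": -0.4, "epsilon": 0.1, "r_min_half": 1.8},
]
print(f"E_VDW + E_elec = {nonbonded_energy(toy_atoms):.2f} kcal/mol")
```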
35
Protein structure determination techniques
• Determining the 3D structure of proteins helps protein engineers to understand
the molecular mechanism of protein action and its biological function.
Structural biology methods
• X-ray crystallography
Crystallization required, no size limits,
challenging for highly flexible proteins
• Nuclear magnetic resonance (NMR) spectroscopy
Labelling required, suitable for smaller proteins, capturing protein
motions
• Cryo-electron microscopy
Automation, direct electron detectors, image processing
suitable for large protein complexes
36
Retrieval of atomic coordinates – Protein Data Bank (PDB)
The Protein Data Bank (PDB) is a database for the three-dimensional structural
data of large biological molecules (proteins, nucleic acids), determined by X-ray
crystallography, NMR spectroscopy and cryo-electron microscopy.
37
Protein Data Bank (PDB): overview
• The PDB is a key resource in structural
biology. Many other databases, for example
SCOP and CATH, use protein structures
deposited in the PDB
• Each structure published in PDB receives a
four-character alphanumeric identifier, its
PDB ID
• The file format initially used by the PDB was
called the PDB file format
• From 2019, mmCIF is the master format for
the PDB archive
38
PDB (mmCIF) file format
• PDB file format: The Protein Data Bank (PDB) format provides a standard
representation for macromolecular structure data derived from X-ray diffraction,
NMR and cryo-EM studies.
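The legacy PDB format is a fixed-column text format, so an ATOM record can be read by slicing. A minimal sketch covering only a subset of fields, with column positions taken from the published PDB format description; the example record is synthetic and is built with explicit field widths so that the columns line up.

```python
def parse_atom_record(line: str) -> dict:
    """Parse selected fields of a PDB ATOM/HETATM record (fixed columns)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "element": line[76:78].strip(),    # columns 77-78
    }


# Synthetic example record, assembled field by field
line = (
    "ATOM  " + f"{1:>5}" + " " + f"{' N':<4}" + " " + "MET" + " " + "A"
    + f"{1:>4}" + " " + "   "
    + f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}"
    + f"{1.00:6.2f}{9.67:6.2f}" + " " * 10 + f"{'N':>2}"
)

atom = parse_atom_record(line)
print(atom)
```

For real work, an established parser (for example in Biopython) and the mmCIF format are the safer choice, since mmCIF has no fixed-column limits.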
39
Retrieval of coordinate files from the PDB: considerations
• PDB (mmCIF) file format – 3D model
• Structure factors file (CIF) – Data file
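Coordinate files are retrieved by PDB ID. A minimal sketch that validates a four-character ID and builds a download address; the URL pattern is an assumption about the RCSB file server and should be checked against the current RCSB documentation. No network request is made here.

```python
import re

# Assumed base address of the RCSB file download service
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def rcsb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build a download URL for a coordinate file.

    pdb_id : four-character alphanumeric PDB ID
    fmt    : 'cif' (mmCIF) or 'pdb' (legacy PDB format)
    """
    if not re.fullmatch(r"[A-Za-z0-9]{4}", pdb_id):
        raise ValueError("PDB ID must be four alphanumeric characters")
    if fmt not in {"cif", "pdb"}:
        raise ValueError("format must be 'cif' or 'pdb'")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id.upper()}.{fmt}"


print(rcsb_download_url("4hhb"))
print(rcsb_download_url("4hhb", fmt="pdb"))
```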
40
Visualisation, modelling and computational tools
https://www.rcsb.org/pages/thirdparty/molecular_graphics
41
Selected visualisation and modelling software tools
https://www.rcsb.org/pages/thirdparty/molecular_graphics
• PyMOL
A free and open-source molecular graphics system for visualization, animation, editing, and publication-quality
imagery. PyMOL is scriptable and can be extended using the Python language. Supports Windows,
Mac OSX, Unix, and Linux
• Chimera
Interactive molecular modeling system, free to academic/non-profit; displays multiple sequence alignments
and associated structures, atom-type and H-bond identification, molecular dynamics trajectories (AMBER
format), and offers ligand-screening interface (DOCK), filter by number/position of H-bonds, and
extensibility to create custom modules - for Windows, Linux, Mac OS X, IRIX, and Tru64 Unix
• Swiss PDB viewer
A 3D graphics and molecular modeling program for the simultaneous analysis of multiple models and for
model-building into electron density maps. The software is available for Mac (OSX or PPC), Windows,
Linux, or SGI
• YASARA
A complete molecular graphics and modeling program, including interactive molecular dynamics
simulations, structure determination, analysis and prediction, docking, movies and eLearning for Windows,
Linux and MacOSX
• VMD
VMD (Visual Molecular Dynamics) runs on many platforms including MacOS X, and several versions of
Unix and Windows. VMD provides visualization, analysis, and Tcl/Python scripting features, and has
recently added sequence browsing and volumetric rendering features. VMD is distributed free of charge
• Foldit
Foldit is a crowdsourcing computer game based on protein modelling
42
De novo protein design: problem definition
• Given the desired three dimensional structure of a protein, design an
amino acid sequence that will assume that structure
• Of course, a precise set of atomic coordinates would determine
sequence. Usually we start with an approximate desired structure
• Alternatively, we may want to design for a particular function (e.g. ability
to bind a particular ligand)
• Protein design is the inverse of the protein folding problem!
43
Protein design workflow
• Computer calculation of optimal sequence for desired structure and
function
• Read off amino acid sequence of designed protein
• Back translate to DNA sequence, and make a gene
• Make a protein and assay
Computational protein design
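The back-translation step can be sketched with one codon per amino acid. The codon choices below are an illustrative assumption (one valid codon each from the standard genetic code), not an optimised codon-usage table for any particular host.

```python
# One valid codon per amino acid (illustrative choice, standard genetic code)
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Convert a one-letter amino acid sequence into a DNA coding sequence."""
    codons = []
    for aa in protein.upper():
        if aa not in PREFERRED_CODON:
            raise ValueError(f"unknown amino acid: {aa}")
        codons.append(PREFERRED_CODON[aa])
    dna = "".join(codons)
    if add_stop:
        dna += "TAA"
    return dna


designed = "MKV"  # hypothetical designed sequence
print(back_translate(designed))  # ATGAAAGTGTAA
```

In practice, gene synthesis services choose codons according to the expression host and avoid problematic motifs; the table here only shows the principle.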
44
Protein folding and design are two sides of the same coin
Khoury et al., Trends in Biotechnology 32: 99-109 (2013)
45
The key calculation: total energy of the protein
• Design along the backbone or scaffold
• Rotamer/backbone and rotamer/rotamer interaction energies
tabulated
• Given a target backbone geometry, we aim to select side-chain
rotamers at each position to minimize total protein energy:
$$E_T = \sum_{i} E_i(r_i) + \sum_{i \neq j} E_{ij}(r_i, r_j)$$
where: $r_i$ specifies both the amino acid at position i and its
side-chain geometry
$E_i$ is the self-energy of a particular rotamer $r_i$
$E_{ij}$ is the pair energy of rotamers $r_i$, $r_j$
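The minimisation over tabulated self and pair energies can be sketched by brute force for a tiny case. All energies below are made-up numbers, and the sketch assumes each pair of positions is counted once (i < j). Exhaustive enumeration grows exponentially with the number of positions, so real design programs rely on search methods such as Monte Carlo sampling or dead-end elimination instead.

```python
from itertools import product


def total_energy(assignment, self_energy, pair_energy):
    """Self energies plus pair energies, each pair (i < j) counted once."""
    n = len(assignment)
    energy = sum(self_energy[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_energy[(i, j)][assignment[i]][assignment[j]]
    return energy


def best_rotamers(self_energy, pair_energy):
    """Enumerate every rotamer combination and return the lowest-energy one."""
    choices = [range(len(e)) for e in self_energy]
    return min(
        product(*choices),
        key=lambda a: total_energy(a, self_energy, pair_energy),
    )


# Toy tables: 3 positions, 2 rotamers each (hypothetical values)
self_energy = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
pair_energy = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [3.0, -1.0]],
}

best = best_rotamers(self_energy, pair_energy)
print(best, total_energy(best, self_energy, pair_energy))
```

Here the favourable pair energy between positions 1 and 2 outweighs the small self-energy penalty at position 2, which is the kind of coupling that makes the problem combinatorial.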
46
A typical computational protein design workflow
• Initial backbone structures can be
either generated de novo or taken
from solved protein structures
• Sequences that stabilise
the designed backbone structure
are then computationally designed,
and the backbone may be permitted
to move as part of an iterative
design cycle
• Finally, promising designs are
selected for experimental
characterisation
MacDonald & Freemont, Biochemical Society
Transactions 44:1523-1529 (2016)
47
The protein backbone design and selection
• First step in de novo protein design is selection of target backbone
structures:
• This is as much art as science
• Often multiple target structures are selected, some won’t work
• Proteins can only adopt a limited set of backbone structures
• But we do not have a perfect description of that set
• Approaches used towards this goal:
• Use an experimentally determined backbone structure
• Assemble secondary structural elements by hand
• Use a fragment assembly program like Rosetta, and select
fragment combinations that fit approximate desired scaffolds
48
Example of successful backbone design
• Towards design of “Top7,” a protein with a novel fold
• Started with a 2D schematic, then used Rosetta fragment assembly
• Ended up with 172 backbone models that fit initial scaffold
Kuhlman et al., Science 302:1364-8 (2003)
Initial schematic of target scaffold: hexagons = β sheet;
squares = α helix; arrows = hydrogen bonds. Letters
indicate amino acids in final designed Top7 sequence
49
New topologies of de novo designed hyperstable peptides
Bhardwaj et al., Nature (2016): doi: 10.1038/nature19791
• Computational methods were developed
for accurate de novo design of
conformationally restricted peptides and
used to design 18-47 residue,
disulfide-crosslinked peptides, a subset of
which are heterochiral and/or N-C
backbone-cyclized.
• Both genetically encodable and noncanonical
peptides are exceptionally
stable to thermal and chemical
denaturation, and 12 experimentally
determined X-ray and NMR structures are
nearly identical to the computational
design models.
• The computational design methods and
stable scaffolds presented here provide
the basis for development of a new
generation of peptide-based drugs.
50
Computational design of a new enzyme
Lock and Key analogy: lock = enzyme; key = substrate
51
Computational design of a new enzyme
1. Disembodied residues are
placed to stabilise reaction
transition state - THEOZYME
2. PDB database is then searched
for protein structures
(backbones) with correct
orientations (geometries)
3. Remaining residues in the active site
are optimized for proper packing
and elimination of clashes
4. Final adjustments of overall
structure of tailor-made enzyme
Kries et al., Curr. Opin. in Chem. Biol.
17:221-8 (2013)
52
The hub for Rosetta modelling software
https://www.rosettacommons.org/
• The Rosetta software suite includes algorithms for computational modelling and
analysis of protein structures. It has enabled notable scientific advances in
computational biology, including de novo protein design, enzyme design, ligand docking,
and structure prediction of biological macromolecules and macromolecular complexes.
53
RosettaDesign
http://rosettadesign.med.unc.edu/
• Rosetta design can be used to identify sequences compatible with a given
protein backbone. Some of Rosetta design's successes include the design of a
novel protein fold, redesign of an existing protein for greater stability, increased
binding affinity between two proteins, and the design of novel enzymes.
54
What to do after the design is finished ...
… Experimental validation
55
• Many viral surface glycoproteins and cell surface receptors are homo-oligomers and thus
can potentially be targeted by geometrically matched homo-oligomers that engage all
subunits simultaneously to attain high avidity and/or lock subunits together
• A general strategy for the computational design of homo-oligomeric protein assemblies with
binding functionality precisely matched to homo-oligomeric target sites
• In the first step, a small protein is designed that binds a single site on the target. In the
second step, the designed protein is assembled into a homo-oligomer such that the
designed binding sites are aligned with the target sites
• This approach was used to design
high-avidity trimeric proteins that bind
influenza A hemagglutinin (HA) at its
conserved receptor binding site
• The designed trimers can both capture
and detect HA in a diagnostic format,
neutralize influenza in cell culture, and
completely protect mice
De novo design of trimeric influenza-neutralizing proteins
An example of computational protein design I.
Strauch et al., Nature Biotechnology,
35: 667-671 (2017)
56
• (a) A co-crystal structure of HSB.2A bound to
HK68 HA shows that the design binds to the
RBS as designed
• (b) Superposition of antibody C05 and the Tri-HSB.2A-HA
crystal structure
• (c) Close agreement is found for the contacts
with HA between the tip of HCDR3 of C05
(blue), the HSB designed model (gray) and the
bound crystal structure HSB.2A (orange)
• (d) Translational and rotational sampling of
trimeric protein scaffolds (gray, magenta, blue)
to identify trimers that connect the termini of
three HSB (orange) molecules bound to trimeric
HA (green)
• (e) EM reconstruction of Tri-HSB.1C bound to
HK68 HA
• (f) BLI titrations of H3 HK68 HA binding to
monomeric HSB.2 and trimer Tri-HSB.2
De novo design of trimeric influenza-neutralizing proteins
An example of computational protein design I.
Strauch et al., Nature Biotechnology,
35: 667-671 (2017)
57
How well does de novo protein design work?
• Impressive recent successes
• But, keep in mind that:
• successful protein design projects often involve creating and
experimentally testing dozens of candidate proteins to find one
• unsuccessful projects are not reported
• design of membrane proteins is still challenging
58
Computational protein design: the reality
• An increase in the sequence lengths of computationally designed and
structurally validated proteins
59
Great future for protein designers
60
Summary
Today Next lecture
61
Questions
63
Supplementary materials
64
Computational protein design: the glossary
• De novo protein design: computationally designed proteins that can fold into a
target structure with a desired function
• Intrinsic disorder: proteins or regions in a protein that do not have a
unique three-dimensional structure as a monomer at physiological conditions
• Knowledge-based energy function: an energy function derived from statistical or
statistical mechanical analysis of known protein structures
• Physical-based energy function: an energy function derived by the laws of physics
that is composed of many approximate terms
• Energy function: the scoring function that is minimized during iterative
protein design
• Local interaction: the interaction between amino acid residues that are
sequence neighbours
• Nonlocal interaction: the interaction between amino acid residues that are
located close to each other in three-dimensional space but far from each other
in their sequence positions
65
A designed metalloprotein using an unnatural amino acid
An example of computational protein design II.
• Genetically encoded unnatural amino acids could facilitate the design of
proteins and enzymes of novel function
• Rosetta was used to design metalloproteins in which the amino acid (2,2′-bipyridin-5-yl)alanine
(Bpy-Ala) is a primary ligand of a bound metal ion
• A buried metal binding site with octahedral coordination geometry consisting of Bpy-Ala,
two protein-based metal ligands, and two metal-bound water molecules
• Experimental characterization revealed a Bpy-Ala-mediated metalloprotein with the ability
to bind divalent cations including Co2+, Zn2+, Fe2+, and Ni2+, with a Kd for Zn2+ of ∼40 pM
Mills et al., J. Am. Chem. Soc., 135: 13393-13399 (2013)
[Figure labels: (2,2′-bipyridin-5-yl)alanine; theozyme for second-round design calculations]
66
• X-ray crystal structures of the designed protein bound to Co2+ and Ni2+ have RMSDs to the
design model of 0.9 and 1.0 Å respectively over all atoms in the binding site.
Mills et al., J. Am. Chem. Soc., 135: 13393-13399 (2013)
A designed metalloprotein using an unnatural amino acid
An example of computational protein design II.
X-ray crystallographic analysis of MB_07 bound to
Co2+ and Ni2+. Electron density from a 2Fo–Fc map
in the vicinity of the Bpy-Ala bound to Co2+ (a) and
Ni2+ (c) contoured at 1.0 σ. Density for Bpy-Ala, Co2+
or Ni2+ (pink and green spheres, respectively), D184,
E159, and a metal-bound water molecule (red
spheres) is visible.
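The RMSD values quoted for the binding site measure how far equivalent atoms sit from each other in the design model and in the crystal structure. A minimal sketch of the calculation, assuming the two coordinate sets are already superimposed and listed in the same atom order; the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two equally ordered,
    pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical binding-site atoms: design model versus crystal structure
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
crystal = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]

print(f"RMSD = {rmsd(model, crystal):.2f} Å")
```

Published comparisons first find the optimal superposition of the two structures (for example with the Kabsch algorithm); this sketch skips that step.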
67
Design to improve detoxification rates of nerve agents
An example of computational protein design III.
• Organophosphate pesticides rapidly inactivate acetylcholinesterase and are the most
toxic stockpile nerve agents
• An integrated computational and experimental approach was applied to increase
Brevundimonas diminuta phosphotriesterase’s (PTE) detoxification rate of V-agents by
5000-fold
• Computational models were built of the complex between PTE and V-agents. On the
basis of these models, the active site was redesigned to be complementary in shape to
VX and RVX and to include favorable electrostatic interactions with their choline-like
leaving group
• Five rounds of iterating between experiment
and model refinement led to variants that
hydrolyze the toxic SP isomers of all three
V-agents with kcat/KM values of up to
5 × 10^6 M^-1 min^-1 and also efficiently detoxify
G-agents
• These new catalysts provide the basis for
broad spectrum nerve agent detoxification
68
Design to improve detoxification rates of nerve agents
An example of computational protein design III.
• Computational models of wild-type PTE (a) and the 5th generation variant A53
(b) with the bound substrate model (the SP isomer of a VX-RVX hybrid). (c) The
designed pocket of the A53 variant is complementary to VX’s leaving group,
including charge complementarity to the choline-like moiety
69
Poly-ethylene glycol
Sherman et al., Adv. Drug Deliv. Rev. 60:59-68 (2008)
Uricase PEGylation to reduce immunogenicity
• Uricase is used for gout
treatment, but its high
immunogenicity is a limitation
• Attachment of PEG polymers via
lysine coupling to reduce side effects
• PEG number optimization
10-kDa optimal size
• 1000x reduced antigenicity upon
PEGylation
• Improved solubility and
increased serum half-life
70
Flow chart of the key steps in the design of novel proteins
71
Protein dynamics
Chao and Byrd, Em. Top. Life Sci., 2:93-105 (2018)
72
Design of proteins that exchange on functional timescales
• Proteins are intrinsically dynamic molecules that can exchange
between multiple conformational states, enabling them to carry
out complex molecular processes with extreme precision and
efficiency.
• Attempts to design novel proteins with tailored functions have
mostly failed to yield efficiencies matching those found in nature
because standard methods do not allow the design of exchange
between necessary conformational states on a functionally
relevant timescale.
• A broadly applicable computational method, termed meta-multistate
design, was developed to engineer protein dynamics.
• This methodology was able to design spontaneous exchange
between two novel conformations introduced into the global fold of
Streptococcal protein G domain β1.
• The designed proteins, named DANCERs, for dynamic and native
conformational exchangers, are stably folded and switch between
predicted conformational states on the millisecond timescale.
• The successful introduction of defined dynamics on functional
timescales opens the door to new applications requiring a protein
to spontaneously access multiple conformational states.
Davey et al., Nat. Chem. Biol. 13, 1280–1285 (2017)
73
The meta-multistate design framework for design of
conformational exchange
Davey et al., Nat. Chem. Biol. 13, 1280–1285 (2017)
• (a,b) Multistate design (MSD) with
an ensemble of backbone
templates approximating the
conformational landscape for
dynamic exchange between
targeted states
• (c) MSD also returned an energy
value for each microstate that
reflects its predicted stability.
• (d,e) Geometry-based analysis of
the rotamer-optimized microstates
(d) allowed assignment of each
microstate to major, minor, or
transition state regions of the
energy landscape (e).
• (f) Prediction of conformational
dynamics was done based on an
evaluation of the relative energies
of these states. For meta-MSD to
predict a sequence as dynamic, all
three states must be stable and
have an energy profile that is
compatible with exchange (for
example, sequence D).
74
Examples of computationally designed enzymes
75
Design of ancestral proteins
1. The hypothesis suggests that ancestral
proteins were able to withstand the harsh
conditions prevalent on earth at that time
2. In addition to their thermostability,
ancestral enzymes may have been
promiscuous with respect to
substrates. The evolution theory of
proteins holds that current proteins
evolved from low-specificity ancestral
proteins
3. Because of their low specificity, the
ancestral proteins evolved to become
more efficient at using specific substrates
4. Thus, the reconstruction of ancestral
sequences from multiple sequence
alignments and phylogenetic trees may
provide the opportunity to change
enzyme specificity
Akanuma, Life 7(3): 33 (2017)
76
Computational challenge in protein simulations
• Computer simulations can provide atomistic details of processes that are hard to observe
through experimental measurements. Biological processes are typically microseconds or even
milliseconds long but must be followed in computer simulations femtosecond by femtosecond,
so simulating them in full atomistic detail is computationally challenging.
• Despite ever-growing computing power, computer simulations cannot be used for the modelling
of large biomolecular systems over time scales long enough to be of biological interest.
• To overcome the challenge, coarse-grained models, in which multiple atomistic sites are
grouped into one site, have been developed. The models significantly reduce computational
cost and, thereby, enhance the speed of simulation
https://www.ks.uiuc.edu/Research/cgfolding/
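The size of the problem follows from simple arithmetic, and the coarse-graining idea can be shown in a few lines. In this sketch the 1 fs time step follows the "femtosecond by femtosecond" statement above; grouping consecutive atoms into one bead placed at their centre of mass is an assumption made for illustration, since real coarse-grained models choose the groups on chemical grounds.

```python
def steps_needed(duration_s, timestep_s=1e-15):
    """Number of integration steps needed to cover `duration_s` seconds."""
    return round(duration_s / timestep_s)


def coarse_grain(atoms, group_size):
    """Replace each run of `group_size` atoms by one bead at its centre of mass.

    `atoms` is a list of (mass, (x, y, z)) tuples.
    """
    beads = []
    for start in range(0, len(atoms), group_size):
        group = atoms[start:start + group_size]
        total_mass = sum(mass for mass, _ in group)
        centre = tuple(
            sum(mass * pos[k] for mass, pos in group) / total_mass
            for k in range(3)
        )
        beads.append((total_mass, centre))
    return beads


print(steps_needed(1e-6))   # steps for one microsecond
print(steps_needed(1e-3))   # steps for one millisecond

# Four hypothetical atoms reduced to two beads
atoms = [(12.0, (0.0, 0.0, 0.0)), (12.0, (2.0, 0.0, 0.0)),
         (16.0, (0.0, 0.0, 0.0)), (16.0, (0.0, 4.0, 0.0))]
print(coarse_grain(atoms, group_size=2))
```

Fewer interaction sites mean fewer pairwise interactions to evaluate per step, which is where the saving in computational cost comes from.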
77
“High-resolution” protein design workflow
• Protein design begins with a structure or complex and produces new sequences
• Design positions are chosen to be mutated. Next, the sequence may be aligned
to other homologous sequences to produce biological constraints on the
sequence space
• Sequence design is then performed and can be done using a single state or
multiple states. In this step, the structure being designed can remain fixed, with
only side-chain rotamers changing, or may be completely flexible
• The algorithms for sampling come from the same classes of techniques used in
protein folding
• Designed sequences may then be clustered and evaluated with a more detailed
scoring function
• Design produces one or many sequences that are predicted to fold into the input
structure, often with enhanced biophysical characteristics
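The sequence design step can be sketched as a Metropolis Monte Carlo search over the chosen design positions with the backbone held fixed. Everything specific here is an assumption made for illustration: the toy score (hydrophobic residues favoured at buried positions, polar residues elsewhere), the temperature, the number of steps and the starting sequence. Real design software scores side-chain rotamers on the backbone with a physical energy function.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFW")


def toy_energy(seq, buried):
    """Hypothetical score: -1 if residue type matches burial, +1 otherwise."""
    energy = 0
    for i, aa in enumerate(seq):
        matches = (aa in HYDROPHOBIC) == (i in buried)
        energy += -1 if matches else 1
    return energy


def design(start, design_positions, buried, steps=2000, kT=1.0, seed=0):
    """Mutate only `design_positions`; return the best sequence seen and its score."""
    rng = random.Random(seed)
    current = list(start)
    e_current = toy_energy(current, buried)
    best, e_best = current[:], e_current
    for _ in range(steps):
        trial = current[:]
        trial[rng.choice(design_positions)] = rng.choice(ALPHABET)
        e_trial = toy_energy(trial, buried)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current[:], e_current
    return "".join(best), e_best


sequence, score = design("GGGGGGGG", design_positions=[1, 2, 5, 6], buried={1, 2})
print(sequence, score)
```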
78
Rotamers
• A protein side chain can adopt many different
conformations
• They are mostly defined by the dihedral
angles (bond lengths and bond angles are
relatively fixed)
• Side chains are only permitted to adopt
a discrete set of statistically preferred
conformations: rotamers
• The figure shows dihedral angles in
glutamate: dihedral angles are the main
degrees of freedom for the backbone
(ϕ and ψ angles) and the side chain (χ angles)
of an amino acid. The number of χ angles varies between zero and four for
the 20 standard amino acids. The figure shows a ball-and-stick
representation of glutamate, which has three χ angles. The fading
conformations in the background illustrate a rotation around χ1.
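A dihedral angle is computed from the positions of four consecutive atoms; for χ1 these are N, Cα, Cβ and Cγ. The sketch below uses the standard atan2 formulation. The coordinates are made-up points chosen to give an easily checked angle, not real glutamate coordinates.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in the range -180..180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical positions standing in for N, CA, CB and CG
n, ca, cb, cg = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
print(dihedral(n, ca, cb, cg))
```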
79
Rotamer libraries
• Rotamer libraries are compiled by clustering the side-chain conformations of each
amino acid across a whole database of protein structures. Each cluster corresponds to one
rotamer and is represented in the library by a representative set of side-chain dihedral
angles
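The clustering idea can be illustrated for a single χ1 angle. In this sketch the three cluster centres (60°, 180°, -60°) are assumed in advance, each observation is assigned to the nearest centre, and each cluster is summarised by its circular mean and its frequency. The observed angles are invented; published rotamer libraries are built from large sets of high-resolution structures with more careful statistics.

```python
import math
from collections import defaultdict

CENTRES = (60.0, 180.0, -60.0)   # assumed staggered chi1 wells, in degrees


def circ_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def circ_mean(angles):
    """Mean of angles in degrees that respects the wrap-around at +/-180."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def build_library(chi1_values):
    """Map each centre to (representative chi1, frequency) for its cluster."""
    clusters = defaultdict(list)
    for chi in chi1_values:
        centre = min(CENTRES, key=lambda c: circ_diff(chi, c))
        clusters[centre].append(chi)
    total = len(chi1_values)
    return {centre: (circ_mean(members), len(members) / total)
            for centre, members in clusters.items()}


# Hypothetical chi1 observations for one amino acid type
observed = [-65.0, -58.0, -62.0, 175.0, -178.0, 63.0]
for centre, (angle, freq) in build_library(observed).items():
    print(centre, round(angle, 1), round(freq, 2))
```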
80
Principles for designing ideal protein structures
Koga et al., Nature 491: 222-227 (2012)
• Fundamental rules: (a) ββ-rule. L (left-handed) and R (right-handed) ββ-units are
illustrated. (b) βα-rule. P (parallel) and A (antiparallel) βα-units are illustrated. (c)
αβ-rule. (d) Chirality (L versus R) of a ββ-unit. The chirality is defined on the basis
of the orientation of the Cα-to-Cβ vector.
81
Energy functions and molecular force fields
• In structure-based computational
protein design, folds are represented
by the spatial coordinates of the
backbone atoms or design scaffold
• Protein design is done by placing amino acid
side chains along the scaffold
• Side chains are only permitted to
adopt a discrete set of statistically
preferred conformations: rotamers
• Rotamer/backbone and
rotamer/rotamer interaction energies
are tabulated
• These potential energies can then be
approximated by using any of the
standard force fields: CHARMM,
AMBER, GROMOS
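With tabulated energies, the total energy of a rotamer assignment is the sum of one rotamer/backbone term per position plus one rotamer/rotamer term per interacting pair of positions. The sketch below evaluates that sum and finds the best assignment by exhaustive enumeration. All numbers are invented, and the brute-force search is only feasible for tiny examples; real design programs use search methods that scale to many positions.

```python
from itertools import product


def total_energy(assignment, self_energy, pair_energy):
    """Sum the tabulated energies for one rotamer choice per position.

    self_energy[i][r]       rotamer/backbone energy of rotamer r at position i
    pair_energy[(i, j)][r][s]  rotamer/rotamer energy between positions i and j
    """
    energy = sum(self_energy[i][r] for i, r in enumerate(assignment))
    for (i, j), table in pair_energy.items():
        energy += table[assignment[i]][assignment[j]]
    return energy


def best_assignment(self_energy, pair_energy):
    """Exhaustively enumerate all rotamer combinations and return the lowest-energy one."""
    choices = [range(len(row)) for row in self_energy]
    return min(product(*choices),
               key=lambda a: total_energy(a, self_energy, pair_energy))


# Hypothetical tables: three positions with two rotamers each
self_energy = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_energy = {
    (0, 1): [[2.0, -1.0], [0.0, 0.5]],
    (1, 2): [[0.0, 0.0], [1.0, -0.5]],
}
best = best_assignment(self_energy, pair_energy)
print(best, total_energy(best, self_energy, pair_energy))
```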
82
Comparison of models with experimental structures
Koga et al., Nature 491: 222-227 (2012)
• Experimental validation by NMR
• Comparison of overall topology. Design
models (left) and NMR structures
(right); the Cα root mean squared
deviation (r.m.s.d.) between them is
indicated
• Comparison of core side-chain packing
in superpositions of design models
(rainbow) and NMR structures (grey)
• These results illuminate how the folding
funnels of natural proteins arise and
provide the foundation for engineering a
new generation of functional proteins
free from natural evolution
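The Cα r.m.s.d. used for this comparison is the square root of the mean squared distance between matched Cα atoms. The sketch below assumes the two structures have the same number of residues and have already been superimposed; the optimal superposition step is omitted. The coordinates are invented.

```python
import math


def ca_rmsd(model, experiment):
    """Root mean squared deviation between two lists of matched (x, y, z) points.

    Assumes the structures are already superimposed.
    """
    if len(model) != len(experiment):
        raise ValueError("both structures need the same number of C-alpha atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, experiment)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model))


# Hypothetical C-alpha coordinates in angstroms
design_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
nmr_structure = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(ca_rmsd(design_model, nmr_structure))
```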
83
Principles for designing ideal protein structures
Koga et al., Nature 491: 222-227 (2012)
• The paper introduces a new approach based on a set of rules relating secondary structure patterns to protein
tertiary motifs, which makes possible the design of funnel-shaped protein folding energy
landscapes leading into the target folded state
• Guided by these rules, the authors designed sequences predicted to fold into ideal protein structures
consisting of α-helices, β-strands and minimal loops
• Designs for five different topologies were found to be monomeric and very stable and to adopt
structures in solution nearly identical to the computational models
84
Strategies for the construction of synthetic fusion proteins
1. Linker-mediated tandem fusion is
achieved by joining two proteins in
a head-to-tail manner with a linker
peptide in between. The linker can be
selected from the natural linker
reservoir or artificially designed to
separate the fusion partners as well as
to maintain favorable interactions
between them.
2. Domain insertion is implemented
by inserting one domain into a host
domain through carefully selected
recombination sites.
3. Post-translational conjugation is
used to combine separately
expressed proteins to form a
branched architecture, in which
chemical reagents or enzymes
such as transglutaminase and
sortase A that recognize specific
amino acids or sequences are used
to cross-link these tagged proteins.
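Strategy 1 amounts to concatenating two sequences with a linker in between. The sketch below builds such a head-to-tail fusion at the protein sequence level, using repeats of the widely used flexible GGGGS unit as the default linker. The function name and the two partner sequences are placeholders, and no check is made that the chosen linker suits a particular pair of proteins.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def tandem_fusion(n_terminal, c_terminal, linker_unit="GGGGS", repeats=3):
    """Join two protein sequences head to tail with a repeated linker unit."""
    linker = linker_unit * repeats
    fusion = n_terminal + linker + c_terminal
    unknown = set(fusion) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return fusion


# Placeholder partner sequences, not real proteins
print(tandem_fusion("MKTAYIAK", "AVLSEQWR", repeats=2))
```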
85
Computational protein design: what is it?
https://www.ipd.uw.edu/tag/protein-design/
• Dissecting the rules that govern protein structures
• Implementation of these rules in computer programs such as Rosetta and Robetta
• Cracking the protein folding code makes it possible to model protein structures and design new
proteins with desired properties
86
Engineered fusion proteins help protein crystallographers
Hai & Christianson, Nat. Chem. Biol. 12:741-747 (2016)
Fusion tags used:
• Thioredoxin (Trx)
• Maltose-binding protein (MBP)
• Glutathione S-transferase (GST)
• Small ubiquitin-like modifier (SUMO)
• Polyhistidines (6xHis, 12xHis)
87
Imaging of small proteins displayed on protein scaffolds
Liu et al., PNAS 115: 3362-3367 (2018)
• New electron microscopy (EM) methods are making it possible to
view the structures of large proteins and nucleic acid complexes in
atomic detail, but the methods are difficult to apply to molecules
smaller than approximately 50 kDa.
• A protein below this size limit can be successfully visualized when it is attached to a
large protein scaffold designed to hold 12 copies of the attached
protein in symmetric and rigidly defined orientations.
88
The Rosetta modelling software: overview
• The Rosetta software suite includes algorithms for computational modelling and
analysis of protein structures. It has enabled notable scientific advances in computational
biology, including de novo protein design, enzyme design, ligand docking, and structure
prediction of biological macromolecules and macromolecular complexes.
• Rosetta is available to all non-commercial users for free and to commercial users for a fee.
• Rosetta development began in the laboratory of Dr. David Baker at the University of
Washington as a structure prediction tool but since then has been adapted to solve common
computational macromolecular problems.
• Development of Rosetta has expanded beyond the University of Washington to the members
of RosettaCommons, which include government laboratories, institutes, research centers,
and partner corporations.
• The Rosetta community has many goals for the software, such as:
– Understanding macromolecular interactions
– Designing custom molecules
– Developing efficient ways to search conformation and sequence space
– Finding broadly useful energy functions for various biomolecular representations
https://www.rosettacommons.org/