Masaryk University
Faculty of Science
Loschmidt Laboratories
Department of Experimental Biology
Molecular Modelling of Structure-Function
Relationships in Enzymes
Ph.D. Dissertation
David Bednář
Supervisors:
Prof. Mgr. Jiří Damborský, Dr.
Mgr. Jan Brezovský, Ph.D.
Brno 2017
Acknowledgements:
I would like to thank my supervisor Jiří Damborský for the opportunity to study in the
Loschmidt Laboratories, and for his expert guidance, numerous discussions and motivation
in solving difficult tasks.
I am very grateful to Jan Brezovský for his boundless patience, friendly approach and the
valuable advice he gave me throughout my studies.
I also thank all current and former colleagues who created a pleasant and inspiring
environment, for their help and for the discussions that no scientific work can do without.
Last but not least, I would like to thank my parents and loved ones for their support and
trust during my studies and in my personal life.
Bibliographic Entry
Author: Mgr. David Bednář, Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University
Title of dissertation: Molecular Modelling of Structure-Function Relationships in Enzymes
Study Programme: Biology
Field of Study: Molecular and Cellular Biology
Supervisor: prof. Mgr. Jiří Damborský, Dr.
Supervisor-Specialist: Mgr. Jan Brezovský, Ph.D.
Year of Defence: 2017
Keywords: molecular modelling; enzyme; protein engineering; stability
Bibliografický záznam (Bibliographic Entry, Czech version)
Author: Mgr. David Bednář, Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University
Title of dissertation: Molekulové modelování vztahů mezi strukturou a funkcí enzymů (Molecular Modelling of Structure-Function Relationships in Enzymes)
Study Programme: Biology
Field of Study: Molecular and Cellular Biology
Supervisor: prof. Mgr. Jiří Damborský, Dr.
Supervisor-Specialist: Mgr. Jan Brezovský, Ph.D.
Year of Defence: 2017
Keywords: molecular modelling; enzyme; protein engineering; stability
© David Bednář, Masaryk University 2017
CONTENT
Content
Abstract
Abstrakt
Motivation
INTRODUCTION
1 Protein Stability
1.1 Thermodynamic Stability
1.2 Kinetic Stability
1.3 Methods for Protein Stabilization
2 Molecular Modeling Methods
2.1 Molecular Docking
2.2 Molecular Dynamic Simulations
2.3 Hybrid Quantum Mechanics/Molecular Mechanics
3 Model Proteins
3.1 Haloalkane Dehalogenase DhaA
3.2 γ-Hexachlorocyclohexane Dehydrochlorinase LinA
3.3 Fibroblast Growth Factor 2
RESULTS
4 FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple Point Mutants
4.1 Abstract
4.2 Author Summary
4.3 Introduction
4.4 Results
4.5 Discussion
4.6 Methods
4.7 Supporting Information
5 Computer-Assisted Engineering of Hyperstable Fibroblast Growth Factor 2
5.1 Abstract
5.2 Main paper
5.3 Methods
5.4 Supporting Information
6 Balancing the Stability-Activity Trade-Off by Fine-Tuning Dehalogenase Access Tunnels
6.1 Abstract
6.2 Introduction
6.3 Results
6.4 Discussion
6.5 Conclusion
6.6 Experimental Section
6.7 Supporting Information
7 Site-Specific Analysis of Protein Hydration Based on Unnatural Amino Acid Fluorescence
7.1 Abstract
7.2 Introduction
7.3 Results and Discussion
7.4 Conclusions
7.5 Supplementary Information
8 Different Structural Origins of Haloalkane Dehalogenases' Enantioselectivity towards Linear β-Haloalkanes: Open-Solvated versus Occluded-Desolvated Active Sites
8.1 Abstract
8.2 Main Article
8.3 Supporting Information
Conclusions
References
Curriculum Vitae
List of Publications
ABSTRACT
This dissertation is devoted to studying the structure-function relationships of proteins
using the methods of molecular modeling. The opening chapter emphasizes the theoretical
basis of protein stability. Stability is one of the most important properties of proteins,
and their use in both basic research and industry depends on it. Stability can be divided
into thermodynamic stability (the energy difference between the folded and unfolded
states) and kinetic stability (separation of the relevant states by a high activation
energy). Several approaches to protein stabilization are discussed, ranging from purely
experimental methods based on directed evolution and saturation mutagenesis, through
identification of hot-spots, to sophisticated algorithms combining free energy calculations
with evolutionary inference.
The next two chapters of the introduction deal with the molecular modeling methods and the
model proteins used in the results section. Molecular dynamics is one of the most important
methods and can be used to describe protein stability, activity, folding, hydration,
enantioselectivity and substrate specificity. The other methods are molecular docking,
which predicts the binding modes and binding energy of the substrate-enzyme complex, and
hybrid quantum mechanics/molecular mechanics methods, which are used to describe the
reactivity of macromolecules with small molecules.
The results section consists of five parts. The first one (chapter 4) deals with the
computational method FireProt, applied for effective stabilization of proteins. FireProt
combines an evolutionary approach with a calculation of Gibbs free energy, complemented by
efficient filtering using a variety of in silico tools. The method was applied to the
haloalkane dehalogenase DhaA and the dehydrochlorinase LinA, resulting in thermostability
increases of 24 and 21 °C, respectively.
Chapter 5 is focused on the stabilization of human fibroblast growth factor FGF2. This
protein is involved in numerous regulatory processes, suggesting good applicability in
both medicine and basic research. Nowadays, the main application of FGF2 is its addition
to the medium used for stem cell cultivation. However, FGF2 is not very stable, with a
half-life between 10 and 12 hours. Stabilizing mutations were identified using free
energy calculations by Rosetta and FoldX in combination with the evolutionary
"back-to-consensus" approach. These hits were combined with mutations from saturation
mutagenesis libraries. The final nine-point mutant showed a thermostability increase of
19 °C and was stable in the cultivation medium for more than twenty days.
Chapters 6-8 deal with the characterization of the DhaA enzyme. Chapter 6 focuses on
stability-activity relationships. A stable four-point mutant of DhaA showed tolerance
to organic co-solvents, but had very low activity. A single mutation at the mouth of the
access tunnel significantly improved activity while preserving the thermostability and the
stability against organic co-solvents. This mutation increased the mobility of secondary
structures and significantly increased the diameter of the access tunnel.
Chapter 7 deals with a method for the analysis of protein hydration. A fluorescent
unnatural amino acid, which provides a different signal in the presence of a varying number
of water molecules, was introduced into the tunnel mouth of two dehalogenases. Molecular
dynamics revealed different hydration of the two dehalogenases, which correlates closely
with experimental fluorescence spectroscopy.
The last chapter, chapter 8, is devoted to the study of the enantioselectivity of
dehalogenases. Previous studies showed that the enantioselectivity of the DbjA dehalogenase
with 2-bromopentane is due to increased hydration in the accessible active site. Active
site hydration aligned the substrate with the hydrophobic wall of the protein, favoring the
conformation of the (R)-enantiomer. A five-point mutant of DhaA has active site properties
opposite to those of DbjA (a less hydrated and less accessible active site) and yet exhibits
the same enantioselectivity. The difference in the enantiodiscrimination of the
dehalogenases was explained using a combination of site-directed mutagenesis, kinetic
measurements, and molecular modeling.
ABSTRAKT
This dissertation deals with the study of structure-function relationships in proteins
using the methods of molecular modeling. The introductory chapters describe the theoretical
basis of protein stability. Stability is one of the most important properties of proteins,
and it determines how proteins can be used in both basic research and industry. Stability
is divided into thermodynamic stability (the energy difference between the folded and
unfolded states) and kinetic stability (separation of states by a high activation energy).
Approaches to protein stabilization are then discussed, ranging from purely experimental
methods based on directed and saturation mutagenesis, through the identification of
hot-spots, to sophisticated algorithms combining free energy calculations with evolutionary
relationships.
The following two chapters of the introduction deal with the molecular modeling methods and
the model proteins used in the results section. This part presents molecular dynamics as one
of the basic methods, widely applicable to describing the stability, activity, hydration,
enantioselectivity or substrate specificity of proteins. The other methods are molecular
docking, used to obtain the binding mode and binding energy between substrate and enzyme,
and the hybrid quantum mechanics/molecular mechanics method, used to describe the reactivity
of macromolecules.
The results section consists of five chapters. The first (chapter 4) describes the
computational method FireProt for effective protein stabilization. FireProt combines an
evolutionary approach with the calculation of Gibbs free energy, complemented by efficient
filtering using various in silico tools. The method was applied to the enzymes haloalkane
dehalogenase DhaA and dehydrochlorinase LinA, increasing their thermostability by 24 and
21 °C, respectively.
Chapter 5 is devoted to the stabilization of human fibroblast growth factor FGF2. This
protein takes part in numerous regulatory processes, and the stabilized molecule can be
expected to find good use in medicine and basic research. At present, the main and most
important application is the addition of FGF2 to the medium for stem cell cultivation.
However, the protein is not very stable, with a half-life between 10 and 12 hours.
Stabilizing mutations were selected using free energy calculations with the programs Rosetta
and FoldX and the evolutionary "back-to-consensus" approach. These were combined with
mutations obtained by saturation mutagenesis of individual positions. The result of the
project was a protein with thermostability increased by 19 °C and with stability in the
cultivation medium of more than twenty days.
The next three chapters deal with the characterization of various properties of the enzyme
DhaA. Chapter 6 addresses the relationship between stability and activity. A stable
four-point mutant of DhaA showed tolerance to organic solvents, but at the same time had
very low activity. A single mutation at the mouth of the access tunnel markedly improved
activity while preserving thermostability and stability against organic solvents. This
single mutation increased the mobility of the secondary structure motifs in its vicinity and
markedly enlarged the tunnel diameter.
Chapter 7 deals with a method for the analysis of protein hydration. A fluorescent unnatural
amino acid, which gives a different signal depending on the number and behavior of the
surrounding water molecules, was introduced into the tunnel mouth of two dehalogenases.
Molecular dynamics revealed different hydration of the two dehalogenases, which correlated
closely with experimental fluorescence spectroscopy. This newly developed technique for
labeling and studying protein hydration is broadly applicable to various proteins.
The last chapter, chapter 8, is devoted to the study of the enantioselectivity of
dehalogenases. A previous study showed that the enantioselectivity of the dehalogenase DbjA
with 2-bromopentane is caused by increased hydration in a well-accessible active site, which
leads to binding of the substrate along the hydrophobic wall of the protein in a
conformation favorable for the (R)-enantiomer. A five-point mutant of DhaA has the opposite
properties to DbjA (a less accessible and less hydrated active site) and yet shows the same
selectivity. The difference between the two modes of enantiodiscrimination in dehalogenases
was explained by combining site-directed mutagenesis, kinetic measurements and molecular
modeling.
MOTIVATION
Stability is an important property of every protein and determines its applicability in basic
research as well as in industry. There are many protein stabilization methods, but they
usually demand extensive experimental characterization. Methods for direct prediction of
stable multiple-point mutants are very rare and less reliable. A method for effective
protein stabilization that requires only a few variants to be characterized would
increase the applicability of many proteins.
Haloalkane dehalogenases are an important group of enzymes with many different potential
applications. Their ability to degrade dangerous environmental pollutants makes them
suitable for biodegradation, bioremediation or biosensing. A better understanding of their
reactivity would be useful for optimizing their properties and broadening their
applications.
Objectives of this dissertation:
• Development of a new method for fast and efficient protein stabilization
• Validation of this new method with three different model proteins
• In silico characterization of engineered haloalkane dehalogenase DhaA
PART I
INTRODUCTION
1 Protein Stability
Stability is a fundamental property affecting the function, activity or regulation of every
biomolecule. It determines the feasibility of applying proteins in both industry and
basic science [1]. In basic science, understanding protein stability is essential for
the optimization of expression, purification, formulation, storage and structural studies
of proteins. Protein stability is also of major interest in the biotechnology,
pharmaceutical and food industries. Proteins have to withstand harsh conditions during
technological processes, such as elevated temperatures, organic solvents used to solubilize
substrates, or disinfectants used to decrease the risk of microbial contamination. Further,
higher concentrations of additives like organic solvents, which are used to increase the
solubility of water-insoluble reactants and suppress their undesired hydrolysis, can have a
significant effect on protein stability [2]. With the increasing use of different enzymes in
research and industry, finding a wild-type enzyme with the required properties is often
impossible; therefore, many protein engineering techniques have been applied to improve the
applicability of enzymes by stabilizing their structures.
The stability of a native-state protein at standard conditions can be driven either by
thermodynamics (the most stable conformation) or kinetics (the most accessible
conformation) and is influenced by many different factors [3]. Therefore, protein stability
can have a different meaning for different people depending on their field of study. From a
biotechnology point of view, one would primarily consider the half-life of the protein as
the main measure of protein stability [4]. Structural biology, on the other hand, mostly
focuses on changes in the primary to quaternary structure of the protein, where stability is
defined as the change in free energy between individual states. Many factors may lead to
unfolding or denaturation of the native folded state: i) physical factors, e.g., temperature
and pressure; ii) chemical factors, e.g., pH, salt and co-solvents; iii) biological factors,
e.g., cleavage by proteases.
1.1 Thermodynamic Stability
The hypothesis concerning protein folding was originally postulated by the Nobel Prize
laureate Christian B. Anfinsen and is known as Anfinsen's dogma or the thermodynamic
hypothesis. Working with small globular proteins, Anfinsen observed that the native
structure of a protein is determined only by its amino acid sequence [5]. Under natural
conditions, the protein folds into a stable, unique and kinetically accessible conformation
with the minimum free energy. According to Levinthal's paradox, folding cannot proceed only
by sampling all possible conformations; instead, it is driven by the formation of local
interactions (nucleation centers), which decrease the folding time to a reasonable level
[6,7]. Anfinsen's hypothesis was experimentally confirmed, even though rare contradicting
cases exist in which a protein remains in a local energy minimum while the global minimum is
separated from it by a high activation energy.
Most of the smaller globular proteins fold by a two-state mechanism. The equilibrium
between the native folded state (F) and an ensemble of unfolded states (U) is defined by
the equilibrium constant of protein folding (K_eq, Equation 1).
$$\mathrm{Folded} \overset{K_{eq}}{\rightleftharpoons} \mathrm{Unfolded}$$
Equation 1
The equilibrium constant describes the thermodynamic balance between both states and can
be calculated from the folding (k_f) and unfolding (k_u) rates (Equation 2).
$$K_{eq} = \frac{k_f}{k_u}$$
Equation 2
A quantitative measure of protein stability at constant temperature and pressure is
then defined as the Gibbs free energy of folding (ΔG), the difference between the Gibbs
free energies of the native folded (G_f) and unfolded (G_u) states (Equation 3).
$$\Delta G = G_f - G_u$$
Equation 3
A negative change in the folding free energy signifies that the folded state is more stable
and will form spontaneously (Figure 1). The relation between the equilibrium
constant and the folding free energy change is defined by Equation 4.
$$\Delta G^{\circ} = -RT \ln K_{eq}$$
Equation 4
where R is the gas constant and T is the temperature.
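As a rough numerical illustration of Equation 4, the following Python sketch converts an equilibrium constant into a folding free energy and derives the folded fraction of a two-state protein. The function names, the default temperature of 298.15 K and the convention K_eq = [folded]/[unfolded] (chosen so that a negative ΔG favors the folded state) are assumptions made for this example only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def folding_free_energy(k_eq: float, temperature: float = 298.15) -> float:
    """Equation 4: dG = -R*T*ln(K_eq), returned in kcal/mol.

    Assumes K_eq = [folded]/[unfolded], so a negative result favors folding.
    """
    if k_eq <= 0:
        raise ValueError("K_eq must be positive")
    return -R_KCAL * temperature * math.log(k_eq)


def fraction_folded(delta_g: float, temperature: float = 298.15) -> float:
    """Population of the folded state in a two-state model (same convention)."""
    k_eq = math.exp(-delta_g / (R_KCAL * temperature))
    return k_eq / (1.0 + k_eq)
```

Under these assumptions, a folding free energy of zero corresponds to equal populations of both states, and each additional 1.4 kcal.mol⁻¹ of stability at room temperature shifts the equilibrium constant by roughly one order of magnitude.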
[Figure 1: energy diagram with energy on the vertical axis and the reaction coordinate on the horizontal axis, showing the Unfolded state, the Transition state and the Native state, the activation energies ΔG‡_folding and ΔG‡_unfolding, and the folding free energy ΔG.]
Figure 1. The energy diagram for the folding process of the two-state model. ΔG‡_folding and ΔG‡_unfolding show
the activation energy between the unfolded and transition state and between the native folded and transition
state, respectively. Adapted from [8].
The Gibbs free energy is a thermodynamic potential used to calculate the
maximum reversible work that may be performed by a system (Equation 5).
$$G(p,T) = U + pV - TS$$
Equation 5
where U is the internal energy, p the pressure, V the volume, T the thermodynamic
temperature and S the entropy.
From a stability point of view, the difference in free energy is the maximum work done
by a system that transforms reversibly from an initial state to a final state at constant
temperature and pressure. At the standard state, it can be written as Equation 6.
$$\Delta G^{\circ} = \Delta H^{\circ} - T\Delta S^{\circ}$$
Equation 6
where ΔH° is the enthalpic term and TΔS° the entropic term. ΔG has to be negative for
protein folding to be spontaneous and for the native protein structure to be stable. Both
terms result from individual interactions and other effects occurring in protein structures.
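To make the balance of the two terms in Equation 6 concrete, the sketch below evaluates ΔG at a given temperature and finds the temperature at which ΔG becomes zero. It assumes that ΔH and ΔS do not change with temperature, which is a simplification (real proteins show a heat capacity change on folding), and the numerical values are hypothetical, not taken from any protein discussed here.

```python
def gibbs_change(delta_h: float, delta_s: float, temperature: float) -> float:
    """Equation 6: dG = dH - T*dS (kcal/mol, with dS in kcal/(mol*K))."""
    return delta_h - temperature * delta_s


def midpoint_temperature(delta_h: float, delta_s: float) -> float:
    """Temperature (K) where dG = 0, assuming temperature-independent dH and dS."""
    if delta_s == 0:
        raise ValueError("delta_s must be non-zero")
    return delta_h / delta_s


# Hypothetical folding parameters, chosen only for illustration.
DELTA_H = -100.0  # kcal/mol, favorable enthalpy of folding
DELTA_S = -0.30   # kcal/(mol*K), unfavorable entropy of folding

if __name__ == "__main__":
    print(gibbs_change(DELTA_H, DELTA_S, 298.15))
    print(midpoint_temperature(DELTA_H, DELTA_S))
```

With these made-up numbers, a large favorable enthalpy is almost cancelled by the entropic term, leaving only a small negative ΔG at room temperature.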
1.1.1 Interactions Contributing to Protein Stability
Historically, it was believed that hydrogen bonds, which enable the formation of secondary
structures [9,10], are the most important forces for proper protein folding and for keeping
the protein in its native conformation. Later, the importance of hydrophobic interactions
(van der Waals forces between non-polar residues that form the compact protein core) was
strongly emphasized as the main driving force of protein folding [11]. Nowadays, both
effects, together with protein conformational flexibility, are considered to play the most
important role in determining protein stability (Figure 2) [12]. The individual
contributions of these interactions to protein stability are discussed below.
[Figure 2: diagram of the contributions to protein stability relative to ΔG = 0: Hydrophobic Effect, Hydrogen Bonds, Other Forces, Conformational Entropy and the resulting Net Stability.]
Figure 2. Protein folding diagram depicting the contributions to protein stability. The final net protein
stability under native conditions is usually marginal (5-15 kcal.mol⁻¹) [13]. It is a delicate balance between
the destabilizing effect of conformational entropy and stabilizing effects, mostly the hydrophobic effect
and hydrogen bonding.
Hydrophobic effect. On average, proteins bury about 85 % of their non-polar side chains to
shield them from contact with water molecules [14]. The protein core is more tightly
packed than water, and the van der Waals interaction energy of a -CH2- group was calculated
to be 1.3 kcal.mol⁻¹ lower in the protein than in water [15]. A mutational experiment on 13
proteins was performed to evaluate the contributions of individual side chains: a larger
non-polar amino acid was mutated to a smaller one, and the ΔΔG between the wild-type and
mutant proteins was measured [16]. It was estimated that the protein loses 1.1 kcal.mol⁻¹ on
average for every lost -CH2- group [17]. The decrease in the number of atom contacts and the
formation of voids within the protein structure lead to the loss of van der Waals
interactions, and therefore to lower stability [18]. Stabilizing hydrophobic effects are
also connected with an entropic contribution. Water molecules H-bonded to the protein
residues are released during protein folding. This results in an increase in the
conformational entropy of water and therefore contributes to protein stability [19].
Hydrophobic interactions contribute about 60 % of the total protein stability on average
[17].
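The per-group estimate quoted above can be turned into a back-of-the-envelope calculation. The sketch below multiplies the average value of 1.1 kcal.mol⁻¹ by the number of side-chain carbon atoms removed in a large-to-small aliphatic substitution. Counting every side-chain carbon (CH, CH2 or CH3) as one -CH2- equivalent, the residue table and the function name are assumptions of this example; the result is a crude average that ignores the local environment.

```python
CH2_PENALTY = 1.1  # kcal/mol lost per buried -CH2- group (average from the text)

# Side-chain carbon counts for a few aliphatic residues (assumption: each
# carbon is treated as one -CH2- equivalent).
SIDE_CHAIN_CARBONS = {"ALA": 1, "VAL": 3, "LEU": 4, "ILE": 4}


def estimated_destabilization(wild_type: str, mutant: str) -> float:
    """Rough stability loss (kcal/mol) for a large-to-small buried mutation."""
    try:
        lost = SIDE_CHAIN_CARBONS[wild_type] - SIDE_CHAIN_CARBONS[mutant]
    except KeyError as exc:
        raise ValueError(f"residue not in the illustrative table: {exc}") from exc
    if lost < 0:
        raise ValueError("this sketch only covers large-to-small substitutions")
    return lost * CH2_PENALTY
```

For instance, `estimated_destabilization("LEU", "ALA")` removes three carbon atoms, while `estimated_destabilization("ILE", "VAL")` removes one.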
Hydrogen bonds. Determining the real contribution of H-bonds to protein stability
is much more difficult than for hydrophobic interactions. Every residue in a folded
protein forms on average 1.1 hydrogen bonds [20], and 70 % of the peptide groups and 65 % of
the polar side chains are buried in the protein core and are not in contact with water [14].
It is still disputable whether burying the polar groups is energetically more advantageous
than forming van der Waals interactions and H-bonds with water molecules [12].
Studying the effect of H-bonds simply by mutating the bonding residue is not very
effective, because the hydrophobicity, the conformational entropy, and the packing of the
side chain in the protein are altered as well [21]. Therefore, experiments were carried out
using double mutant cycles, in which the H-bond donor and acceptor were mutated individually
and also together as a double-point mutant [22]. Double mutant cycles, as well as
experiments removing a side chain involved in an H-bond, gave very similar ΔΔG values. For
example, the stability of Tyr to Phe mutants was lowered by 1.4 ± 0.9 kcal.mol⁻¹ when the
Tyr side chain was hydrogen bonded and by only 0.2 ± 0.4 kcal.mol⁻¹ when it was not [12]. On
average, H-bonds contributed 1.2 ± 1.0 kcal.mol⁻¹. In the case of the Thr to Val mutation,
the contribution was 1.0 ± 1.2 kcal.mol⁻¹. The high standard deviations suggest that the
H-bond contribution is highly dependent on the environment. Interestingly, Tyr residues also
had a positive effect on stability in cases where they were not hydrogen bonded. This shows
that buried polar residues can frequently contribute to protein stability. This contribution
of improved packing may be larger than the stabilization of the unfolded state, where a
polar chain is stabilized by H-bonds with water [23].
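The logic of a double mutant cycle can be sketched in a few lines. The example below uses the commonly used additive formulation, in which the coupling energy is the stability change of the double mutant minus the sum of the two single-mutant changes; the text does not state the exact formula used in the cited work, so this formulation, the sign convention (ΔΔG = ΔG_mutant - ΔG_wild-type) and the function name are assumptions.

```python
def coupling_energy(ddg_donor: float, ddg_acceptor: float, ddg_double: float) -> float:
    """Coupling free energy (kcal/mol) from a double mutant cycle.

    All inputs are stability changes relative to the wild type. If the two
    residues do not interact, the effects are additive and the result is zero;
    a non-zero value estimates the interaction between the two positions.
    """
    return ddg_double - (ddg_donor + ddg_acceptor)
```

With hypothetical single-mutant changes of -1.5 and -1.2 kcal.mol⁻¹ and a double-mutant change of -1.6 kcal.mol⁻¹, the two effects are clearly non-additive, which is the signature of a direct interaction such as an H-bond between the mutated residues.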
The strength of an H-bond depends on its distance and geometry [24], but also on the
surrounding environment. H-bonds in non-polar environments with a lower dielectric
constant form stronger interactions, which can be more than 1 kcal.mol⁻¹ more stable (in
terms of ΔΔG) than solvent-exposed ones [25]. Also, charge-stabilized H-bonds are about 2
kcal.mol⁻¹ stronger than neutral ones, and their contribution to stability can be as high as
7 kcal.mol⁻¹ [26]. Altogether, H-bonds contribute about 40 % of the total protein stability
[17].
Other forces contributing to protein stability. Disulfide bridges can contribute significantly
to protein stability by reducing the conformational entropy of the unfolded state. The
thermostability (melting temperature) of proteins can usually be increased by several
°C [27,28], and in rare cases by up to 40 °C, by introducing a single disulfide bridge [29].
The effect of disulfide bridge formation can be reasonably estimated by Equation 7, which is
based on experimental observations for proteins with two-state unfolding:
$$\Delta S = -2.1 - \frac{3}{2} R \ln(n)$$
Equation 7
where n is the number of residues in the loop formed by the disulfide bridge and R is the
gas constant [27].
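Equation 7 is easy to evaluate numerically. The sketch below assumes that ΔS is expressed in cal.mol⁻¹.K⁻¹ (so R = 1.987 cal.mol⁻¹.K⁻¹), which the text does not state explicitly, and converts the entropy change of the unfolded state into a free energy term -TΔS at a chosen temperature. The function names and the default temperature are assumptions of this example.

```python
import math

R_CAL = 1.987204  # gas constant in cal/(mol*K); the unit choice is an assumption


def disulfide_entropy_change(n_residues: int) -> float:
    """Equation 7: dS = -2.1 - (3/2)*R*ln(n), in cal/(mol*K)."""
    if n_residues < 1:
        raise ValueError("the loop must contain at least one residue")
    return -2.1 - 1.5 * R_CAL * math.log(n_residues)


def entropic_stabilization(n_residues: int, temperature: float = 298.15) -> float:
    """-T*dS in kcal/mol: entropic destabilization of the unfolded state."""
    return -temperature * disulfide_entropy_change(n_residues) / 1000.0
```

The logarithmic dependence means that longer loops give a larger estimated effect, but with diminishing returns: doubling the loop length always adds the same increment.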
Charge-charge interactions are the strongest non-bonded interactions in proteins and are
expected to have a significant effect on protein stability [30]. It turns out, however, that
the stability increase achieved by charge mutations is always lower than predicted by
Coulomb's law. This suggests that not only the folded but also the unfolded state is
stabilized by charge-charge interactions, so their contribution to the total protein
stability is small [31]. Salt bridges (ion pairs), usually located on the surface,
contribute less than 1 kcal.mol⁻¹. A higher contribution of about 4 kcal.mol⁻¹ can be
obtained from buried salt bridges, but their numbers in proteins are rather small [12].
1.1.2 Effects Contributing to Protein Instability
Protein conformational entropy. The most important effect decreasing protein stability upon
folding is the loss of conformational freedom [19]. Compared to the highly dynamic unfolded
protein, which can adopt many different conformational states, residues in the folded state
are tightly packed in the protein core and restricted to one or a few conformations.
Determining the entropy loss is important for dissecting the energetics of protein
motions, including folding, conformational changes, and protein or ligand binding [32-34].
However, it is very difficult to measure conformational entropy directly. Therefore, mostly
molecular dynamics simulations and, recently, NMR relaxation methods [35] are applied to
identify changes in protein motions.
Baxa and coworkers [36] used molecular dynamics simulations to observe differences in
backbone and side chain motions between the folded and unfolded states of mammalian
ubiquitin. They observed that the average loss of entropy (TΔS) is 1.4 kcal.mol⁻¹ per
residue (at 300 K). The side chain entropy loss contributed only a small amount (0.2-0.3
kcal.mol⁻¹). The largest destabilizing effect was due to the loss of backbone flexibility. A
difference was observed depending on the position in a particular secondary structure (1.5,
1.0 and 1.2 kcal.mol⁻¹ for helix, sheet, and coil, respectively). There are many studies
determining the conformational entropy loss; some of them are in agreement with Baxa's
results [37,38], while others show different values varying from 1.7 to 3.6 kcal.mol⁻¹ per
residue [12,32,39,40]. Exact values can be context-dependent. The lack of reliable
experimental methods for determining conformational entropy makes prediction difficult.
1.2 Kinetic Stability
The majority of studies concerned with protein stability focus on thermodynamic
stability, whereas kinetic stability has been partially overlooked. This is understandable
for two reasons: (i) thermodynamic stability is easy to determine in vitro and (ii) many
computational methods are available that can estimate the effect of a mutation on protein
thermodynamic stability. Recently, more studies demonstrating kinetic control of protein
stability have appeared. Kinetic stability is studied in biotechnology, where stability in
harsh conditions is essential for technological applications. In medicine, a decrease in
kinetic stability leads to protein aggregation, accelerated degradation or formation of
amyloid fibers. These effects are linked with diseases caused by protein misfolding, such as
phenylketonuria [41], amyotrophic lateral sclerosis [42], or transthyretin amyloidoses [43].
In thermodynamic stability, only the folded and unfolded states are taken into consideration
in the two-state model. The equilibrium between the folded and unfolded states favors the
more stable folded state under physiological conditions. But there are also partially folded
or irreversibly denatured states (proteolysed or aggregated proteins), which can shift the
equilibrium between active and inactive states towards the latter. Therefore, the two-state
model should be replaced by the Lumry-Eyring model, to which the final irreversible state
(I) is added (Equation 8).
$$\mathrm{Folded} \overset{K_{eq}}{\rightleftharpoons} \mathrm{Unfolded} \overset{k}{\longrightarrow} \mathrm{Irreversible}$$
Equation 8
Biological activity has to be maintained also in cases when the folded state is not
sufficiently stable with respect to the unfolded or denatured states. A sufficiently high
barrier is necessary to maintain the protein in the functional state long enough to fulfill
its biological function [44]. To determine the height of the barrier, transition state
theory can be applied. The irreversible denaturation rate can be obtained from the Eyring
equation (Equation 9).
$$k = k_0 \exp\left(-\frac{\Delta G^{\ddagger}}{RT}\right)$$
Equation 9
where k_0 is the front factor, k is the rate constant for irreversible denaturation and ΔG‡
is the free energy difference between the native folded state and the transition state
(activation energy).
1.2.1 Types of Kinetic Stability
Kinetic stability can be of one of the following two types [45]:
i) The native folded state is thermodynamically more stable than the unfolded
states, but the unfolded states can be irreversibly modified (interactions with other
molecules, aggregation, proteolysis, etc.). Sooner or later (depending
on the denaturation rate constant), this leads to a situation where all of the protein is
irreversibly denatured, and therefore non-functional. Unless a high
energy barrier separates the two states, many different proteins would undergo
denaturation in a crowded or harsh in vivo environment.
ii) The folded state is not thermodynamically the most stable state
under physiological conditions. Then, even if irreversible
denaturation does not take place, the protein needs kinetic stabilization to
remain in the folded state. An example is the α-lytic protease, which is
synthesized with a terminal region providing a driving force for protein folding
(stabilization of about 26 kcal.mol⁻¹ for the folded state) [46]. The terminal region is
cleaved after folding is finished, and only the high unfolding barrier keeps
the enzyme in its active form.
1.2.2 The Relation Between Kinetic and Thermodynamic Stability
Depending on the type of kinetic stability, there can be a strong correlation, a weak
correlation, or no correlation at all between thermodynamic and kinetic stability [47]. For
the first type of kinetic stability, it depends on whether the formation of the final
irreversible state is the rate-limiting step. If so, the rate of irreversible state
formation depends directly on the equilibrium constant between the folded and unfolded
states. Therefore, any change in the thermodynamics of the native folded and unfolded states
directly affects the kinetic stability. If the formation of the irreversible state is fast
and unfolding is the rate-limiting step, then a change in thermodynamic stability may or
may not affect kinetic stability. If we thermodynamically stabilize residues that are
unfolded in the transition state, the activation energy will be higher and the stabilization
will propagate into kinetic stability. Thermodynamic stabilization in the parts of the
protein that remain structured in the transition state will not affect kinetic stability.
For the second type of kinetic stability, thermodynamic stabilization has no effect on
kinetic stability unless we stabilize the folded state to the level where it becomes the
global minimum.
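The first case described above, where formation of the irreversible state is rate-limiting, can be illustrated with a rapid pre-equilibrium approximation of the Lumry-Eyring scheme: the apparent inactivation rate is the irreversible rate constant multiplied by the fraction of unfolded protein. This approximation, the convention that the folding free energy is negative for a stable protein, and all names in the sketch are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def fraction_unfolded(delta_g_fold: float, temperature: float = 298.15) -> float:
    """Unfolded population in the folded/unfolded pre-equilibrium.

    delta_g_fold is the folding free energy in kcal/mol (negative = stable).
    """
    k_eq = math.exp(-delta_g_fold / (R_KCAL * temperature))  # [folded]/[unfolded]
    return 1.0 / (1.0 + k_eq)


def apparent_inactivation_rate(k_irr: float, delta_g_fold: float,
                               temperature: float = 298.15) -> float:
    """Apparent rate of irreversible loss when the U -> I step is rate-limiting."""
    return k_irr * fraction_unfolded(delta_g_fold, temperature)
```

In this regime, making the folding free energy more negative lowers the unfolded population and therefore slows irreversible inactivation, so thermodynamic stabilization translates directly into kinetic stability.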
1.2.3 Kinetic Stability of Natural Proteins
One of the first enzymes ever described as an exception to Anfinsen's thermodynamic
hypothesis was the α-lytic protease [46,48]. Later, many other proteins were shown to be
kinetically stabilized. Two approaches using high-throughput proteomic screening were
applied to identify how easily different proteins access the unfolded or partially folded
states. One technique was based on two-dimensional SDS-PAGE, in which resistance to
SDS (sodium dodecyl sulfate) was observed [49]. The second technique was based on protein
survival in a mixture with proteolytic enzymes [50]. Together, the two studies identified 81
kinetically stabilized proteins. Both methods focus mostly on highly stable proteins, and it
should be mentioned that kinetic stability depends on the height of a free energy barrier,
so proteins cannot be divided strictly into two groups. The two studies overlap in
only six proteins, which shows that kinetic stability can also be divided according to the
type of unfolded state that is targeted during irreversible denaturation. An interesting
common feature of the kinetically stable proteins in both studies was that most of them had
a solved X-ray structure. It was hypothesized that kinetic stability is beneficial
in harsh crystallographic conditions [45].
1.3 Methods for Protein Stabilization
Proteins are increasingly used in many fields, including biotechnology, medicine,
pharmacy, biocatalysis and nanomaterials⁵¹⁻⁵³. Most naturally occurring proteins
are only marginally stable, which limits their applicability. The difference between the
stability of the folded and unfolded states is often only 5-15 kcal·mol⁻¹, and even a single
mutation or change in conditions can significantly destabilize a protein⁵⁴. Therefore,
protein stabilization is necessary for all applications. Many methods for protein
stabilization have been established, based on approaches ranging from experimental
screening and rational design to bioinformatics and evolutionary analyses.
1.3.1 Random Mutagenesis
Completely random mutagenesis can lead to stable variants, but unfortunately the
majority of mutations are neutral or destabilizing⁵⁵. Therefore, a large number of mutants
has to be constructed to identify those with a significant stabilizing effect. Mutations
are introduced by DNA shuffling or error-prone PCR⁵⁶⁻⁵⁸, creating a large library of
individual mutants. Finding the required mutations depends on an effective high-throughput
screening or selection assay⁵⁹.
As an example, the Gene Site Saturation Mutagenesis (GSSM)⁶⁰,⁶¹ method was used to increase
the thermostability of haloalkane dehalogenase DhaA for biotechnological application. A large
library of more than 100,000 variants was constructed to cover all possible
single-point mutants. High-throughput screening for increased thermostability
resulted in an eight-point mutant with an 18 °C increase in melting temperature.
The efficiency of such methods is quite low because only a tiny fraction of the screened
mutants is stabilizing. Other disadvantages are the time and financial demands and the fact
that an appropriate screening method may not be available in every case. Microfluidic
techniques, which are currently on the rise, appear to be a potential solution for
decreasing the time and cost of large screenings⁶²,⁶³.
1.3.2 Hot-Spot Prediction
Another way to decrease experimental work and increase the efficiency of
detecting stabilizing mutations is to use computational approaches. Regions or residues
possessing a property linked with stability are identified, and experimental mutagenesis
takes place only in the relevant regions. Iterative saturation mutagenesis is then applied
at the selected positions and the best variants are identified. The advantage of this
approach is that it can decrease the number of experiments by a few orders of magnitude
compared to random mutagenesis, but a screening or selection technique is usually still
needed. Another disadvantage is that structural information has to be available.
One of these methods is B-fitter, which focuses on flexible parts of protein structures⁶⁴.
Thermophilic enzymes are characterized by a higher degree of rigidity resulting from better
packing and an increased number of interactions. Reetz and co-workers used B-factors
(Debye-Waller factors, temperature factors) from crystallographic analyses to
identify the most flexible regions and selected them for iterative saturation mutagenesis.
They applied their method to the lipase LipA from Bacillus subtilis and measured the
increase in T₅₀ (the temperature at which 50 % of the enzyme is irreversibly denatured). The
introduction of seven mutations into the wild-type enzyme (T₅₀ = 50 °C) abolished
irreversible denaturation even after heating to 100 °C. The strategy of focusing on residues
with high B-factors has been used successfully several times, either in combination with
experimental screening⁶⁵,⁶⁶ or with other stabilization approaches⁶⁷,⁶⁸.
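The ranking of residues by B-factor described above can be illustrated with a short sketch. It is not the B-fitter program itself: it only averages the temperature-factor column of PDB ATOM records per residue and sorts the result. The helper `atom_line` and the three-residue fragment are hypothetical and serve only to make the example self-contained.

```python
from collections import defaultdict


def atom_line(serial, name, resname, chain, resseq, bfactor):
    """Build one fixed-column PDB ATOM record (coordinates are dummies)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, chain, resseq, 0.0, 0.0, 0.0, 1.0, bfactor)


def mean_bfactor_per_residue(pdb_lines):
    """Average the B-factor column (columns 61-66) over the atoms of each residue."""
    values = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("ATOM"):
            # Key: chain identifier, residue number, residue name.
            key = (line[21], int(line[22:26]), line[17:20].strip())
            values[key].append(float(line[60:66]))
    return {key: sum(b) / len(b) for key, b in values.items()}


def most_flexible(pdb_lines, top_n):
    """Return the top_n residues ranked by decreasing mean B-factor."""
    means = mean_bfactor_per_residue(pdb_lines)
    return sorted(means, key=means.get, reverse=True)[:top_n]


# Hypothetical three-residue fragment, two atoms per residue.
demo = [
    atom_line(1, "N", "ALA", "A", 10, 12.0), atom_line(2, "CA", "ALA", "A", 10, 14.0),
    atom_line(3, "N", "LYS", "A", 11, 40.0), atom_line(4, "CA", "LYS", "A", 11, 44.0),
    atom_line(5, "N", "GLY", "A", 12, 25.0), atom_line(6, "CA", "GLY", "A", 12, 27.0),
]
print(most_flexible(demo, 2))
```

In practice the lines would be read from a crystal structure file, and the highest-ranked residues would be the candidates for saturation mutagenesis.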
A different strategy is used in HotSpot Wizard⁶⁹. The prediction of hot-spots for
mutagenesis is focused on functional regions such as the active site or the access tunnel.
The protocol is extended by the identification of functional and conserved residues. The
later version⁷⁰ adds the analysis of B-factors described above and stability prediction by
the back-to-consensus approach. Finally, HotSpot Wizard can also be used to design
libraries that vary the identified residues.
An effective approach for the stabilization of multimeric proteins is the optimization of
protein-protein interfaces⁷¹. Maugini and co-workers⁷² analyzed the interface interactions of
oligomeric thermophilic and hyperthermophilic enzymes. The results showed that the main
difference from mesophilic enzymes is an increased area of inter-domain
interactions and that hydrophobic interactions are the main driving force of
oligomerization in thermophilic enzymes. Therefore, improved packing of the interface is a
good approach for the stabilization of multimeric proteins.
Screening and selection techniques are demanding, and there is a need to identify
specific mutations directly, not only positions. Computational chemistry and bioinformatics
offer a solution: single- or even multiple-point mutants can be obtained by
calculating free energies and analyzing evolutionary relationships.
1.3.3 Evolution-Based Methods
Stabilization methods based on bioinformatics analyses of protein evolution are both
interesting and effective. Their biggest advantages are that they do not need any structural
information, are easy to use, and have much lower computational demands than
methods predicting the free energy. The bioinformatics analysis is based on a multiple
sequence alignment (MSA) of evolutionarily related sequences⁵⁴. For the prediction of
stabilizing mutations, the concept of consensus is used, based on the idea that a mutation to
the most common residue in the MSA is likely to be stabilizing. Two main methods use this
principle: (i) back-to-consensus and (ii) ancestral reconstruction.
The consensus approach simply mutates every residue in the target sequence that is not
the most frequent one at its position in the MSA. The success rate of identifying
stabilizing mutations for immunoglobulin domains was about 60 %⁷³,⁷⁴, but the rate may
differ between proteins. Usually, about half of the mutations are stabilizing⁷⁵. Consensus
residues that do not improve protein stability may be conserved because of their importance
for other properties, e.g., activity or folding, which are also of interest to protein
engineers. However, it is not possible to tell why a particular residue is conserved.
Moreover, the consensus approach can fail when the residue is not well conserved or when
it is coupled (correlated) with other residues. Excluding positions with low conservation
or with statistical correlation from the consensus approach can lead to a higher success
rate⁷⁵.
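The back-to-consensus idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a published protocol: the sequences are assumed to be already aligned, the function name `consensus_mutations` and the threshold `min_frequency` are hypothetical, and the toy alignment is invented. The threshold stands in for the exclusion of poorly conserved positions mentioned above; correlated positions are not handled.

```python
from collections import Counter


def consensus_mutations(target, msa, min_frequency=0.5):
    """List (position, wild_type, consensus, frequency) suggestions.

    target and every msa row must be pre-aligned strings of equal length;
    '-' marks a gap. Positions are 1-based alignment columns.
    """
    suggestions = []
    for i, wild_type in enumerate(target):
        column = [seq[i] for seq in msa if seq[i] != "-"]
        if not column or wild_type == "-":
            continue
        residue, count = Counter(column).most_common(1)[0]
        frequency = count / len(column)
        # Suggest a mutation only if the consensus differs from the target
        # and the column is conserved well enough.
        if residue != wild_type and frequency >= min_frequency:
            suggestions.append((i + 1, wild_type, residue, round(frequency, 2)))
    return suggestions


# Hypothetical toy alignment; the first row is the target sequence.
msa = ["MKTLV", "MRTLI", "MRTAI", "MRSLI", "MRT-I"]
print(consensus_mutations(msa[0], msa))
```

For this toy alignment the sketch proposes K2R and V5I, the two columns in which the target differs from a residue present in 80 % of the sequences.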
A similar approach is used for the reconstruction of ancestral proteins. Here the MSA is
extended by phylogenetic analysis: the MSA is used to construct a phylogenetic
tree and, based on the mutation rate, a sequence can be calculated for every internal
node of that tree⁷⁶. There are three main algorithms for ancestral sequence reconstruction:
(i) maximum parsimony, (ii) maximum likelihood and (iii) Bayesian reconstruction⁶³,⁷⁷⁻⁷⁹.
The correspondence between residues identified by the consensus and ancestral approaches is
often very high⁸⁰, but in some cases the latter is superior⁸¹. It seems possible that
ancestral reconstruction uses both sequence consensus information for well-conserved sites
and some sequence correlation information, which further improves its effectiveness⁵⁴.
The contribution of individual mutations in evolution-based approaches is usually small, but
the large number of identified mutations results in significant stability improvements.
1.3.4 Computational Methods
A large number of computational methods and protocols have been developed for the
prediction of stabilizing mutations. They range from not very precise sequence-based
methods⁸²,⁸³ to structure-based methods introducing prolines or disulfide bridges⁸⁴,
optimizing the protein core⁸⁵ or surface charges⁸⁶, or increasing surface polarity⁵⁴.
However, most of the methods focus on evaluating the change in Gibbs free energy upon
mutation, i.e., on calculating ΔΔG between the wild-type protein and a single-point mutant.
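For orientation, the quantity can be written out explicitly. The sign convention below is an assumption made for this sketch; individual tools and databases differ, and some report the free energy of unfolding instead, which reverses the sign.

```latex
\[
\Delta G_{\mathrm{fold}} = G_{\mathrm{folded}} - G_{\mathrm{unfolded}},
\qquad
\Delta\Delta G = \Delta G_{\mathrm{fold}}^{\mathrm{mutant}}
               - \Delta G_{\mathrm{fold}}^{\mathrm{wild\ type}}
\]
```

With this convention a negative ΔΔG marks a stabilizing mutation and a positive ΔΔG a destabilizing one.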
Calculating the exact energy change upon mutation raises two problems. Firstly, the
three-dimensional conformational space of a protein is too large; therefore, a reduced
rotamer library of preferred side-chain conformations has to be used⁸⁷. Secondly, precise
energy calculation by quantum mechanics is not feasible for a system as large as a protein;
therefore, molecular mechanics is employed. Force fields take their energy functions either
from (i) physics-based potentials, which analyze individual atom interactions (e.g.,
Rosetta, Eris, EGAD, CC/PBSA, ...)⁸⁸⁻⁹² and are computationally the most expensive but
provide very good accuracy⁹³, or from (ii) empirical or statistical potentials, which
are derived from statistical analysis of experimental data (e.g., FoldX, PoPMuSiC, PEAT-SA,
SDM, ...)⁹³⁻⁹⁶. Another large group comprises machine learning methods. Approaches such as
artificial neural networks, support vector machines or decision tree learning use various
weighted descriptors to predict protein stability and are implemented in I-Mutant,
Prethermut, AUTO-MUTE and MAESTRO⁹⁷⁻¹⁰¹. With such a large number of methods, it is
difficult to choose the one or the few that are appropriate for a particular problem.
Fortunately, several comparisons have already been made.
Potapov and co-workers¹⁰² published a comparison of six tools (CC/PBSA, EGAD, FoldX,
I-Mutant2.0, Rosetta, and Hunter). They calculated the correlation coefficient between the
predicted and experimentally measured change in stability (ΔΔG) upon mutation. The
evaluation was done on 2156 single-point mutations obtained from FoldX⁹⁴ and the ProTherm
database¹⁰³. The results showed that all methods are able to detect hot-spots or classify
mutations as stabilizing or destabilizing, but the exact ΔΔG value is not predicted as
precisely (average error 1.2 kcal·mol⁻¹). The correlation coefficient ranged between 0.59 for
EGAD and 0.26 for Rosetta. This significantly lower value for Rosetta led to a response from
Kellogg and co-workers⁸⁸, who presented 20 different protocols employing Rosetta
with correlation coefficients ranging from 0.04 to 0.69, depending on the settings.
This showed that (i) the Rosetta settings used by Potapov were not appropriate for energy
evaluation, (ii) Rosetta can perform even slightly better than all the other tools and
(iii) Rosetta is not very user friendly.
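The two figures of merit used in such benchmarks, the correlation coefficient and the average error, can be computed as in the following sketch. The function names and the five ΔΔG pairs are hypothetical and only illustrate the calculation; they are not data from the cited studies.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


def mean_absolute_error(predicted, measured):
    """Average absolute difference between prediction and experiment."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


# Hypothetical predicted and measured ddG values in kcal/mol (illustration only).
ddg_predicted = [0.8, -0.3, 2.1, 1.0, -1.2]
ddg_measured = [1.5, 0.2, 1.7, 2.4, -0.4]

print("r   =", round(pearson_r(ddg_predicted, ddg_measured), 2))
print("MAE =", round(mean_absolute_error(ddg_predicted, ddg_measured), 2))
```

A tool can reach a useful correlation and still miss individual values by about 1 kcal·mol⁻¹, which is why classification into stabilizing and destabilizing mutations tends to be more reliable than the absolute numbers.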
Khan and Vihinen¹⁰⁴ evaluated eleven online prediction tools (CUPSAT, Dmutant, FoldX,
I-Mutant2.0, two versions of I-Mutant3.0, MultiMutate, MUpro, SCide, Scpred, and SRide)
on the updated (2008) ProTherm database. Only mutations that had not been used for training
a particular tool were used for its evaluation. Therefore, the testing datasets ranged from
28 to 1784 mutations and differed between individual tools. CUPSAT performed best in the
prediction of stabilizing mutations, whereas I-Mutant3.0 and FoldX had the highest accuracy
in the prediction of destabilizing mutations.
The evaluation of FoldX, Rosetta, Eris, and I-Mutant3.0 performed by Thiltgen and
Goldstein¹⁰⁵ used a different comparison. They selected pairs of known structures of a wild
type and the corresponding single-point mutant and observed how consistent the predictions
for the forward and back mutations were. Rosetta performed significantly better than the
other three tools, showing a small systematic bias and low energy errors.
Recently, different tools and approaches have been combined into methods that increase the
success rate of identifying stabilizing mutations or directly predict stable multiple-point
mutants. One of these methods is FRESCO (Framework for Rapid Enzyme Stabilization by
Computational libraries)²⁸, which was applied to the model enzyme limonene epoxide hydrolase.
First, disulfide bridges were designed using the authors' own DDD algorithm²⁸. Secondly,
Rosetta and FoldX were used to identify potentially stabilizing single-point mutants. Those
mutations were subsequently filtered by very short (100 ps) MD simulations. The flexibility
of the introduced mutations was determined from the root-mean-square deviation (RMSD) of
atoms, and mutations increasing flexibility were discarded. Finally, 64 single-point
mutants were experimentally characterized and combined into the final 10-point mutant.
Its melting temperature was 35 °C higher and its half-life more than 250-fold longer than
those of the wild type. However, most of the stabilization came from a disulfide bridge and
from mutations at the interface of two subunits. The method would therefore be less
applicable to monomeric proteins, but it is an interesting approach that provides a high
percentage of stabilizing mutations.
The second method, PROSS (Protein Repair One Stop Shop)¹⁰⁶, enables the design of
multiple-point mutants directly, without the need to test single-point variants. The
method uses the Rosetta ConsensusDesignMover¹⁰⁷, a modified consensus approach (every
residue more frequent in the MSA than the wild-type residue is allowed in the design),
followed by energy evaluation in Rosetta. The most stable multiple-point mutants are
synthesized and experimentally characterized. PROSS was applied to five different enzymes
(acetylcholinesterase, histone deacylase, ADP-ribosylase, NAD⁺-dependent deacylase, DNA
cytosine-methyltransferase). Many of the designs showed an increase in thermostability
and/or expression level, but in some cases the protocol failed. On the other hand, activity
or functional yield was also improved for several proteins. This shows that consensus-based
methods have a very high success rate for protein improvement, but it is usually not
possible to determine which properties will be targeted by the selected mutations.
Another strategy belonging to the methods for direct prediction of stable multiple-point
mutants is FireProt (see chapter 4).
2 Molecular Modeling Methods
Molecular modeling is a very extensive topic that could fill several theses.
Therefore, I only briefly introduce here the methods used in the
projects compiled in this dissertation.
2.1 Molecular Docking
High-throughput techniques for protein characterization, together with the growing number of
protein structures solved by crystallography or nuclear magnetic resonance, have led to
increased interest in protein-ligand complexes. Molecular docking, as a basic tool for
virtual screening, has become one of the most important methods in drug discovery¹⁰⁸.
Compared to experimental screening, virtual screening can evaluate several
orders of magnitude more complexes, significantly decreasing the cost and increasing the
efficiency¹⁰⁹,¹¹⁰.
Molecular docking is a method that predicts the behavior of a small molecule in the active
site of a target protein by analyzing interactions at the atomic level¹¹¹. This can explain
biochemical processes or produce inputs for other computational methods. Docking is
composed of two steps: (i) prediction of the ligand position and conformation (sampling
methods) and (ii) evaluation of the binding energy (scoring functions). Both steps are very
demanding to evaluate precisely, and therefore many approximations have to be
introduced.
One of the important approximations is the partial or total rigidity of the receptor and
ligand. The first sampling algorithms handled both protein and ligand as rigid bodies¹¹².
The configuration space was reduced to the rotational and translational degrees of freedom
of the ligand, which significantly increased the speed of calculation, but the predictive
power was very low. Therefore, ligand flexibility¹¹³ and later also protein flexibility
were introduced to model ligand binding behavior in real proteins¹¹⁴⁻¹¹⁸.
2.1.1 Sampling Methods
Matching algorithm (MA). Both the ligand and the receptor active site are represented by
pharmacophores, i.e., physical and chemical features. Complementary pharmacophores are
placed so that they match, and the energy of the conformation is evaluated (Figure 3). MAs
are fast and are used for the enrichment of large libraries. MA is implemented in
SANDOCK¹¹⁹ or FLOG¹²⁰.
Figure 3. Scheme of the matching algorithm. Individual pharmacophores are represented by different colors.
MA places the ligand into the active site of the protein so that the corresponding colors match.
Incremental construction (IC). The ligand is divided at single bonds into individual
fragments, and one of these fragments (usually the largest one) is docked into the active
site as an anchor. The remaining fragments are then added one by one in different
orientations. IC is implemented in DOCK 4.0¹²¹, SLIDE¹²² or Hammerhead¹²³.
Stochastic methods. The search in the configuration space is done randomly by
modifying the ligand conformation or its position in the active site. The two main
stochastic methods are Monte Carlo and genetic algorithms¹¹¹.
Monte Carlo (MC) randomly places the ligand into the active site. The conformation of the
ligand is then changed and the energy of the newly obtained binding mode is evaluated.
The Metropolis acceptance criterion is used to decide whether the new conformation is
accepted. If the energy is lower, the conformational change is accepted. If the energy is
higher, it is accepted or rejected by comparing the Boltzmann factor of the energy penalty
with a random number from the interval between 0 and 1. This process is iterated until the
predefined number of steps is reached. MC was used in the first versions of AutoDock¹²⁴,
QXP¹²⁵ or ICM¹²⁶.
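The acceptance rule can be made concrete with a short sketch. It is a toy illustration, not the implementation used in any of the docking programs named above: the ligand pose is reduced to a single coordinate, and `toy_energy`, the step size and the value of kT are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def monte_carlo_search(energy, start, step, kt, n_steps, seed=0):
    """Random walk over a one-dimensional pose, keeping the best state seen."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(n_steps):
        trial = current + rng.uniform(-step, step)   # random change of the pose
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, kt, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def toy_energy(x):
    """Hypothetical energy profile with a single minimum at x = 2."""
    return (x - 2.0) ** 2


best_x, best_e = monte_carlo_search(toy_energy, start=-5.0, step=0.5, kt=0.6, n_steps=2000)
print(round(best_x, 2), round(best_e, 4))
```

Occasionally accepting uphill moves is what allows the search to leave local minima; a larger kT makes such moves more likely.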
Genetic algorithms (GA) are inspired by Darwin's theory of evolution. Different ligand
conformations undergo one of two operations: mutation (rotation around a single
bond) or crossover (exchange of ligand parts between pairs of conformations). All
conformations are evaluated by a scoring function and the favorable ones are used in the
next generation (Figure 4). GA is used in AutoDock4¹²⁷, DARWIN¹²⁸, or GOLD¹²⁹.
[Figure 4 labels: chromosome encoding translation (x, y, z), orientation and torsion angles; parents and children; crossover point; mutation; energies of the resulting conformations; selection of the next generation, with unfavorable conformations removed.]
Figure 4. Scheme of a genetic algorithm. Conformations from the parent generation undergo a rotation of a torsion
angle (mutation) or exchange a part of the structure with a different conformation (crossover). New
conformations with favorable energy pass into the next generation. Adopted from¹⁰⁹.
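The scheme in Figure 4 can be sketched as a toy program. This is an illustration under stated assumptions rather than the algorithm of any particular docking tool: the chromosome contains torsion angles only (no translation or orientation), `toy_score` is a hypothetical scoring function, and the population size, operator probabilities and number of generations are arbitrary.

```python
import math
import random


def toy_score(torsions):
    """Hypothetical scoring function: lowest (0) when all torsions are 60 degrees."""
    return sum(1.0 - math.cos(math.radians(t - 60.0)) for t in torsions)


def mutate(chromosome, rng):
    """Mutation: set one torsion angle to a new random value."""
    child = list(chromosome)
    child[rng.randrange(len(child))] = rng.uniform(-180.0, 180.0)
    return child


def crossover(parent_a, parent_b, rng):
    """One-point crossover: exchange chromosome tails between two parents."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]


def genetic_search(n_torsions=4, pop_size=30, generations=40, seed=1):
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_torsions)]
                  for _ in range(pop_size)]
    history = [min(map(toy_score, population))]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(population, 2)
            if rng.random() < 0.5:
                children.extend(crossover(a, b, rng))
            else:
                children.append(mutate(a, rng))
        # Selection: parents and children compete; the best pop_size survive.
        population = sorted(population + children, key=toy_score)[:pop_size]
        history.append(toy_score(population[0]))
    return population[0], history


best, history = genetic_search()
print(round(history[0], 3), "->", round(history[-1], 3))
```

Because the best parents are kept during selection, the best score in this sketch can never get worse from one generation to the next.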
2.1.2 Scoring Functions
Sampling algorithms are usually able to find a binding mode that is close to the
experimental one, but it is not easy to distinguish it from non-relevant modes. Scoring
functions are used to evaluate the binding energy of individual protein-ligand
configurations. They can be divided into force-field-based, empirical and knowledge-based
scoring functions¹³⁰.
Force-field-based scoring functions. These scoring functions use classical molecular
mechanics force fields¹³¹. The binding free energy is typically computed as the sum of the
non-bonded interactions (van der Waals and electrostatic) between the binding partners.
Later, they were supplemented by solvation energy or entropy terms. The advantage is that
improvements can be expected with the further development of modern force fields. On the
other hand, the calculation is still slow and the results are not significantly better than
those of less demanding scoring functions.
Empirical scoring functions. The binding energy is decomposed into components such as
hydrophobic interactions, hydrogen bonds, binding entropy, etc.¹¹¹ Individual terms are
weighted by coefficients obtained from regression analysis of experimentally determined
binding affinities of protein-ligand complexes. The energy terms of empirical scoring
functions are easy to calculate; speed is therefore the main advantage of this approach.
On the other hand, the energy evaluation can fail if similar cases were not present in the
training dataset.
Knowledge-based scoring functions. Crystal structures of protein-ligand complexes from the
PDB are statistically analyzed for pairwise interatomic distances. If a particular
atom-atom interaction occurs more frequently than in a random distribution, the interaction
is considered favorable. The advantages are computational simplicity and the fact that,
unlike the other methods, knowledge-based functions take into consideration all
protein-ligand interactions. These functions are sometimes combined with solvation and
entropy terms to increase their predictive power¹³².
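The statistical argument is commonly expressed through the inverse Boltzmann relation. The form below is a generic sketch, not the definition used by any specific scoring function; the symbols ρ_ij (observed pair density for atom types i and j at distance r) and ρ_ij^ref (density in the random reference state) are introduced here for illustration.

```latex
\[
w_{ij}(r) = -k_{\mathrm{B}} T \,\ln \frac{\rho_{ij}(r)}{\rho_{ij}^{\mathrm{ref}}(r)},
\qquad
\mathrm{Score} = \sum_{i \in \mathrm{protein}} \; \sum_{j \in \mathrm{ligand}} w_{ij}(r_{ij})
\]
```

A pair distance observed more often than in the reference state gives a ratio above one and therefore a negative, favorable contribution to the score.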
2.2 Molecular Dynamic Simulations
Molecular dynamics (MD) is a computational method that characterizes the time-dependent
behavior of a macromolecular system. Molecular movements are closely
connected to different molecular properties, and therefore MD has become one of the
most used computational techniques applied to various protein problems. It can be used
for studying protein stability¹³³⁻¹³⁵, activity (see chapter 6)¹³⁶⁻¹³⁸, folding¹³⁹⁻¹⁴¹,
unfolding¹⁴²,¹⁴³, protein hydration (see chapter 7)¹⁴⁴,¹⁴⁵, substrate binding¹⁴⁶,¹⁴⁷, product
release¹⁴⁸,¹⁴⁹ or enantioselectivity (see chapter 8)¹⁵⁰,¹⁵¹.
MD uses molecular mechanics force fields and Newton's equations of motion to calculate
changes in the protein structure over small time steps¹⁵²,¹⁵³. First, the input data have to
be selected (Figure 5). Initial coordinates are obtained from crystallography, NMR
spectroscopy or homology modeling. Initial velocities are drawn from the Maxwell-Boltzmann
distribution. The potential energy surface of the molecule is described by the chosen
force field. An example of a force field description is given in Equation 10.
[Figure 5 flowchart: the structure (coordinates and velocities) and the force field (definition of the potential function) are the inputs; the cycle of computing forces, solving Newton's equations of motion and updating coordinates, velocities and energies is repeated for each time step until the calculation is finished.]
Figure 5. Scheme of molecular dynamics simulation.
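\[
U = \sum_{\text{bonds}} \tfrac{1}{2} k_r (r - r_0)^2
  + \sum_{\text{angles}} \tfrac{1}{2} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{torsions}} \tfrac{V_n}{2} \left[ 1 + \cos(n\phi - \delta) \right]
  + \sum_{\text{improper}} V_{\text{imp}}
  + \sum_{\text{LJ}} 4\epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12}
      - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]
  + \sum_{\text{elec}} \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}}
\]
Equation 10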
where the first four terms (bond stretching, angle bending, dihedral and improper torsions)
describe energy contributions of bonded interactions and the last two terms (van der Waals
and electrostatics) of non-bonded interactions.
The simulated molecule is placed into a box and the whole system is solvated with explicit
water molecules¹⁵²,¹⁵³. Periodic boundary conditions (replication of the box in each
direction) are applied to simulate bulk water with a relatively small number of water
molecules. Each molecule that leaves the box is replaced by its periodic image entering
from the neighboring box.
The potential energy and the forces acting on every atom in the system are calculated
(Equation 10). Bonded terms are easy to calculate and their number is proportional to the
number of atoms. The most time-consuming part is the evaluation of non-bonded terms,
whose number rises with the square of the number of atoms. Therefore, several
approximations are used to decrease the computational demands.
Van der Waals interactions use the minimum image convention, in which each atom interacts
with only the nearest copy of every other atom. Further, a distance cutoff (at least 9 Å)
is applied, beyond which the forces are neglected¹⁵⁴. A switching function can be used so
that the energy approaches zero smoothly.
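The minimum image convention and the cutoff can be illustrated by a short sketch for a single atom pair. It assumes an orthorhombic box whose shortest edge is more than twice the cutoff, uses a plain truncation without a switching function, and all names and parameter values are hypothetical.

```python
import math


def minimum_image_vector(r_i, r_j, box):
    """Vector from atom i to the nearest periodic image of atom j (orthorhombic box)."""
    delta = []
    for a, b, length in zip(r_i, r_j, box):
        d = b - a
        d -= length * round(d / length)   # shift by a whole number of box lengths
        delta.append(d)
    return delta


def lennard_jones_with_cutoff(r_i, r_j, box, epsilon, sigma, cutoff):
    """Lennard-Jones energy of one pair, set to zero beyond the cutoff."""
    d = math.sqrt(sum(c * c for c in minimum_image_vector(r_i, r_j, box)))
    if d >= cutoff:
        return 0.0
    sr6 = (sigma / d) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Two atoms near opposite faces of a 30 x 30 x 30 box (hypothetical units: angstrom).
box = (30.0, 30.0, 30.0)
print(minimum_image_vector((1.0, 0.0, 0.0), (29.0, 0.0, 0.0), box))
print(lennard_jones_with_cutoff((1.0, 0.0, 0.0), (29.0, 0.0, 0.0), box,
                                epsilon=0.2, sigma=1.0, cutoff=9.0))
```

The two atoms are 28 units apart inside the box, but the nearest periodic image is only 2 units away, so the pair still contributes to the energy.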
Electrostatic interactions are long-ranged and thus cannot be switched off completely.
The Particle Mesh Ewald (PME) method¹⁵⁵ is used, which divides electrostatics into a
short-range (real space) and a long-range (reciprocal space) part. For the long-range part,
the charges are converted into a grid of density values. The potential is calculated on the
density grid, and the force on a particle is obtained from the position of its grid cell
and its position within the cell. This lowers the complexity to N log N. PME has become the
standard for treating long-range interactions, but other methods can be used in particular
situations, e.g., the isotropic periodic sum¹⁵⁶ or the pressure-based long-range
correction¹⁵⁷.
The forces acting on each atom are obtained by calculating all bonded and non-bonded
interactions¹⁵²,¹⁵³. Newton's equations of motion are deterministic, which means that we are
able to calculate the position of every atom after a time step (Δt) when we know the initial
coordinates and velocities of the atoms. The potential energy is a complicated function
(Equation 10) of all atom positions and the equations of motion have no analytical
solution. Therefore, numerical solutions are employed using a particular integrator (the
leap-frog¹⁵⁸, Verlet¹⁵⁹ or Beeman¹⁶⁰ algorithm).
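The numerical integration can be illustrated on the simplest possible system. The sketch below uses the velocity Verlet scheme, a common variant of the Verlet family chosen here for brevity, and applies it to a single particle on a harmonic spring in reduced units; the force constant, mass and time step are hypothetical values.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle in one dimension with the velocity Verlet scheme."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        x = x + dt * v_half                   # full-step position update
        f = force(x)                          # forces at the new position
        v = v_half + 0.5 * dt * f / mass      # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


K_SPRING = 1.0   # hypothetical force constant (reduced units)
MASS = 1.0       # hypothetical mass (reduced units)


def harmonic_force(x):
    """Force of a harmonic bond-like potential U = 0.5 * k * x**2."""
    return -K_SPRING * x


def total_energy(x, v):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, mass=MASS, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
print(abs(e_end - e_start))
```

The small drift of the total energy is the usual check that the time step is short enough; in an MD simulation the same loop runs over all atoms, with the forces derived from Equation 10.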
2.3 Hybrid Quantum Mechanics/Molecular Mechanics
Modeling enzymatic reactions is a difficult task. A classical MD simulation has a fixed
topology, and it is not possible to create or cleave chemical bonds using molecular
mechanics (MM)¹⁶¹ unless a special force field is employed¹⁶². On the other hand, quantum
mechanics (QM) is able to describe chemical reactivity and other electronic processes.
However, QM can only deal with hundreds of atoms since the computational complexity rises
with N³ or more, depending on the level of theory used (N is the number of atoms).
Applying QM directly to biomolecules is not practical; therefore, a hybrid QM/MM
method was introduced by Arieh Warshel¹⁶³. It combines the advantages of both methods: the
accuracy and ability to simulate chemical reactions of QM, and the speed of MM. The
region hosting the chemical reaction (substrate, co-factor, catalytic residues) is handled
by QM, whereas the rest of the system (most of the protein and the solvent) is treated by
MM (Figure 6).
Figure 6. An example of the division of a system into two parts. The QM part (green sticks) consists of a ligand and the
active site residues, separated between the CA and CB atoms from the MM part (grey lines).
Partitioning the system into QM and MM parts may be straightforward for non-covalently
bound molecules, but when the boundary crosses covalent bonds, the division has to be
treated by introducing specific parameters¹⁶¹. QM atoms that participate in bond
formation or breaking should not be involved in any bonded MM term. A good place to cut
is a nonpolar single aliphatic bond not involved in any conjugated interactions
(a peptide bond is not suitable).
The total potential cannot be obtained simply as the sum of the individual parts because
the two parts interact strongly. Therefore, coupling terms are introduced to describe the
interactions between them. Either subtractive (Equation 11) or additive (Equation 12)
QM/MM coupling can be applied¹⁶⁴.
\[
V_{\mathrm{TOTAL}} = V_{\mathrm{MM}}(\mathrm{QM} + \mathrm{MM}) + V_{\mathrm{QM}}(\mathrm{QM}) - V_{\mathrm{MM}}(\mathrm{QM})
\]
Equation 11
where the total potential energy is the sum of the potential of the whole system (QM+MM)
treated by the MM force field and the potential of the QM part described by QM theory. The
force field energy of the QM part is subsequently subtracted.
\[
V_{\mathrm{TOTAL}} = V_{\mathrm{QM}}(\mathrm{QM}) + V_{\mathrm{MM}}(\mathrm{MM}) + V_{\mathrm{QM\text{-}MM}}(\mathrm{QM} + \mathrm{MM})
\]
Equation 12
where, in contrast to Equation 11, only the MM part is treated by the force field and the
interaction between the QM and MM parts is treated explicitly at various levels of
sophistication. The boundary is described by special schemes¹⁶¹: (i) link-atom schemes
introduce an additional hydrogen atom that is not present in the real system, (ii)
boundary-atom schemes replace the first MM atom by a special atom that appears in both the
QM and MM calculations and (iii) localized-orbital schemes introduce a frozen orbital at
the QM/MM interface, replacing the cut bond.
3 Model Proteins
3.1 Haloalkane Dehalogenase DhaA
Haloalkane dehalogenases (HLDs) are mostly bacterial enzymes (EC 3.8.1.5) catalyzing the
hydrolytic cleavage of the carbon-halogen bond in chlorinated, brominated and iodinated
alkanes, cycloalkanes, alkenes, esters, ethers, alcohols, amides or acetonitriles¹⁶⁵⁻¹⁶⁷.
Many halogenated compounds are important environmental pollutants with toxic or
genotoxic effects on humans. Therefore, the ability of HLDs to degrade these substances
predetermines them for applications in bioremediation¹⁶⁸, detoxification of warfare
agents¹⁶⁹, biosensing¹⁷⁰, synthesis of optically pure compounds¹⁶⁶ and cell imaging¹⁷¹.
HLDs are well-characterized enzymes with reasonable stability and an extensively studied
reaction mechanism. A wealth of mechanistic information, together with broad applicability,
makes HLDs ideal model enzymes for enzymology or protein engineering studies.
DhaA is an HLD isolated from Rhodococcus rhodochrous NCIMB 13064¹⁷² and, together with all
other HLDs, belongs structurally to the α/β-hydrolase fold¹⁷³. DhaA consists of two
covalently bound domains (Figure 7A). The main domain is composed of a conserved
eight-stranded β-sheet surrounded by six α-helices and is connected to the α-helical cap
domain by two loops. The buried active site cavity is situated at the interface of the two
domains and is connected with the solvent by the main tunnel leading through the cap
domain. The catalytic residues of DhaA, the so-called catalytic pentad (Figure 7B), are
situated on the main domain and are conserved within the whole HLD-II subfamily¹⁷⁴. In
DhaA, the pentad consists of the catalytic triad (aspartate 106, glutamate 130, histidine
272)¹⁷⁵ and two halide-stabilizing residues (asparagine 41 and tryptophan 107)¹⁷⁶.
Figure 7. The structure of haloalkane dehalogenase DhaA and the catalytic pentad. A) The main domain is
shown in gray and the cap domain in black. The active site containing the catalytic pentad (red spheres) is situated
between the two domains. B) The arrangement of the catalytic residues within the active site.
The dehalogenation reaction proceeds by a two-step catalytic mechanism (Figure 8), in which
the first step is a bimolecular nucleophilic substitution, followed by hydrolysis in the
second step¹⁷⁵,¹⁷⁷,¹⁷⁸. Initially, the halogenated substrate binds near the catalytic
nucleophile at position 106 with the leaving halogen atom pointing towards the
halide-stabilizing residues. Aspartate 106 performs the nucleophilic attack, forming a
covalently bound enzyme-substrate intermediate and a halide ion stabilized by two hydrogen
bonds from the halide-stabilizing residues. The catalytic base (histidine 272), stabilized
by the catalytic acid (glutamate 130), takes a proton from one of the active site water
molecules. Subsequently, the hydroxyl anion hydrolyses the enzyme-substrate intermediate,
and the resulting alcohol and the halide ion leave the active site.
Figure 8. Two-step dehalogenation reaction of the DhaA enzyme. In the first step, an SN2 reaction is performed
by the nucleophile (Asp106) attacking the carbon atom carrying the halogen. The leaving halide ion is stabilized
by the halide-stabilizing residues and an alkyl-enzyme intermediate is formed. Subsequently, a water molecule
activated by the catalytic base (His272) hydrolyses the intermediate, forming the alcohol product and restoring
the enzyme to its original form.
3.2 γ-Hexachlorocyclohexane Dehydrochlorinase LinA
γ-Hexachlorocyclohexane dehydrochlorinase LinA is a unique enzyme (EC 4.5.1.B1) isolated
from the bacterium Sphingobium japonicum UT26, found in areas with soil contaminated
by γ-hexachlorocyclohexane (HCH, lindane)¹⁷⁹,¹⁸⁰. HCH was used worldwide as an insecticide
for many years before it was prohibited in most countries as a dangerous
environmental pollutant¹⁸¹,¹⁸² with toxic and potentially genotoxic effects on human health.
Its long persistence in the soil, together with its toxicity, makes HCH an important target
for degradation attempts using LinA, which catalyzes the first and the second step of the
HCH degradation pathway. LinA is followed by the LinB, LinC, LinD, LinE, LinF, LinG, LinH
and LinJ enzymes, which metabolize the substrate to succinyl-CoA and acetyl-CoA, which are
in turn metabolized in the citric acid cycle¹⁸³.
LinA is a 156-amino-acid dehydrochlorinase belonging to the α+β proteins [184], but it
shows very low amino acid identity with structurally similar proteins, e.g., 16% sequence
identity with scytalone dehydratase [185]. The biological unit is a homotrimer, and each chain
forms a cone-shaped barrel fold composed of a six-stranded β-sheet, four α-helices and the
C-terminal region interacting with the neighboring chain (Figure 9). The catalytic active site,
large enough to accommodate HCH, is situated inside the barrel of each chain. The
active site and the access tunnel are primarily formed by hydrophobic residues. The
catalytic residues histidine 73 and aspartate 25 form a so-called catalytic dyad.
Figure 9. The structure of γ-hexachlorocyclohexane dehydrochlorinase LinA. Each of the three subunits of the
homotrimer is highlighted in a different color. The active site containing the two catalytic residues is
represented by red spheres.
The C-terminal regions from the neighboring subunits are proposed to serve as molecular gates
that distinguish between the "open" unbound conformation and the "closed"
conformation. These conformational changes create a hydrophobic environment for the
bound substrate, as observed in scytalone dehydratase [184-186]. The proposed reaction
mechanism is a bimolecular elimination (E2) reaction (Figure 10). Histidine 73 functions as a
base and abstracts a proton from HCH or γ-pentachlorocyclohexene (PCCH). Histidine 73
is activated by aspartate 25, which increases its basicity and stabilizes the positive charge
that develops on the histidine once it has abstracted the proton. The other active-site residues
probably stabilize the transition state or direct the substrate into the proper binding mode,
but site-directed mutagenesis showed that they are not essential for the reaction [184,187].
Figure 10. Generalized reaction mechanism of dehydrochlorinase LinA. Firstly, HCH is converted to
pentachlorocyclohexene (PCCH). In the second step, PCCH is converted again by LinA to tetrachlorocyclohexadiene
(TCDN). TCDN is then metabolized by the second pathway enzyme haloalkane dehalogenase
LinB, followed by other enzymes or in the absence of LinB it undergoes spontaneous conversion to
trichlorobenzene (TCB).
3.3 Fibroblast Growth Factor 2
Fibroblast growth factors (FGF1-23) belong to one of the largest families of protein growth
factors, which can be found in a broad range of organisms from nematodes to humans [188-190].
In humans, FGFs have important functions in both embryos and adults. In embryonic
development, FGFs regulate many developmental processes like brain patterning,
branching morphogenesis, angiogenesis and limb development. In adults, FGFs show
important roles during wound healing or tissue repair and during regulation of metabolism
and homeostasis. All these functions make FGFs highly attractive for medical,
pharmaceutical or biotechnological applications.
Human fibroblast growth factor 2 (basic fibroblast growth factor, bFGF, FGF-β, FGF2) is a
pleiotropic regulator of cell proliferation, differentiation, and migration [190,191]. Recently, it
has been studied thoroughly for applications in medicine. In particular, angiogenic
stimulation by FGF2 can be used in tissue repair, and its overexpression in animals showed
a cardio-protective function against heart attack and better regeneration after reperfusion [192].
The promoted angiogenesis has a positive effect on artery diseases and on the healing of patients
suffering from ulcers [193]. FGF2 increases the regeneration of alveolar bone in patients
with periodontitis [194], and upregulation of FGF2 can be used for the treatment of mood
disorders [195]. From the biotechnological point of view, FGF2 is an essential component of
the culture medium used for the cultivation of human embryonic stem cells, where it prevents
differentiation of pluripotent cells [196]. The high instability of the protein at 37 °C complicates
the cultivation of stem cells: even daily media replacement does not prevent significant
fluctuation of the protein concentration. Protein engineering of FGF2 towards a
prolonged half-life has the potential to increase the efficiency and decrease the cost of stem
cell cultivation.
The human FGF2 gene encodes five isoforms that differ in length, subcellular
localization and function [197]. The basic protein is 155 amino acids long, with translation
starting classically from an AUG codon. Isoforms with higher molecular weight initiate translation
from non-standard codons. The structure is composed solely of β-strands, comprising a
12-stranded antiparallel β-sheet that forms a trigonal pyramidal structure (Figure 11) [198]. The
pleiotropic effect is achieved through different binding partners. Positions 13-30 and
106-115 represent FGF-receptor or heparin-binding sites [199,200].
Figure 11. The structure of human fibroblast growth factor 2.
PART II
RESULTS
4 FireProt: Energy- and Evolution-Based Computational Design of
Thermostable Multiple-Point Mutants
David Bednar1,4,†, Koen Beerens1,†, Eva Šebestová1, Jaroslav Bendl1,2,4, Sagar Khare3, Radka
Chaloupková1, Zbyněk Prokop1,5, Jan Brezovský1,4, David Baker6, Jiri Damborsky1,4,5,*
1 Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in
the Environment RECETOX, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic.
2 Department of Information Systems, Faculty of Information Technology, Brno University of Technology,
Bozetechova 1/2, 612 66 Brno, Czech Republic.
3 Department of Chemistry and Chemical Biology, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854,
USA.
4 International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech
Republic.
5 Enantis, Ltd., Palackeho trida 1802/129, 612 00 Brno, Czech Republic.
6 Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.
† These authors contributed equally to the work
PLoS Computational Biology, 2015, 11, e1004556
DOI: 10.1371/journal.pcbi.1004556
4.1 Abstract
There is great interest in increasing proteins' stability to enhance their utility as
biocatalysts, therapeutics, diagnostics and nanomaterials. Directed evolution is a powerful,
but experimentally strenuous approach. Computational methods offer attractive
alternatives. However, due to the limited reliability of predictions and potentially
antagonistic effects of substitutions, only single-point mutations are usually predicted in
silico, experimentally verified and then recombined in multiple-point mutants. Thus,
substantial screening is still required. Here we present FireProt, a robust computational
strategy for predicting highly stable multiple-point mutants that combines energy- and
evolution-based approaches with smart filtering to identify additive stabilizing mutations.
FireProt's reliability and applicability were demonstrated by validating its predictions against
656 mutations from the ProTherm database. We demonstrate that the thermostability of the
model enzymes haloalkane dehalogenase DhaA and γ-hexachlorocyclohexane
dehydrochlorinase LinA can be substantially increased (ΔTm = 24°C and 21°C) by
constructing and characterizing only a handful of multiple-point mutants. FireProt can be
applied to any protein for which a tertiary structure and homologous sequences are
available, and will facilitate the rapid development of robust proteins for biomedical and
biotechnological applications.
4.2 Author Summary
Proteins are increasingly used in numerous biotechnological applications. A key property
determining proteins' applicability is their stability under operating conditions. Natural
proteins can be stabilized by modification of their structure. Methods of molecular biology
allow introduction of modifications - mutations - to the protein structure at will, but it is
not straightforward where to mutate and which amino acid to introduce for better stability.
Computational methods can be used to predict stabilizing mutations.
Current computational methods predict libraries of single-point mutations,
which need to be constructed individually, tested and recombined, resulting in non-trivial
experimental effort. Here we present a robust computational strategy for predicting
multiple-point mutants, providing extremely stabilized proteins with a minimal
experimental effort.
4.3 Introduction
Proteins are increasingly used in biotechnological applications as therapeutics [51],
diagnostics [53], nanomaterials [51] and biocatalysts [52]. Despite numerous advantages, the utility
of proteins is frequently restricted by their limited stability under practical conditions, such
as high temperatures, extreme pH, or the presence of organic solvents or proteases. Their
thermostability is usually positively correlated with stability and performance in the
presence of denaturing agents [201], expression yield [202], serum survival time [203] and
shelf-life [204]. Thus, it is a key determinant of proteins' applicability in biotechnological processes.
High temperatures may also be required to prevent bacterial contamination during
enzymatic food processing [205]. Moreover, thermostable proteins can tolerate much larger
numbers of mutations than mesophilic variants and show enhanced evolvability in protein
engineering projects [206].
Protein engineering is frequently applied to obtain more stable proteins. If successful, such
efforts typically enhance the melting temperature (Tm) of engineered proteins by 2 to
15°C [204,207]. Extremely stabilized proteins with even greater increases in melting
temperature (ΔTm) have been engineered by incorporating multiple mutations, and several
outstanding increases of up to 35°C have been achieved using directed evolution
methods [205]. However, these methods generally require extensive experiments, including
screening up to 10⁸ colonies of organisms expressing mutant variants to identify stable
constructs, and appropriate high-throughput screening assays must be available [208]. A
currently popular strategy is saturation mutagenesis of hotspots identified by (semi)rational
approaches [204,205,209], such as the most flexible residues [207], tunnel-forming
residues [210], or residues at multimeric interfaces [71]. The selected hotspots are then
subjected to site-saturation mutagenesis (while leaving the rest of the protein unchanged)
to create smaller smart libraries, markedly reducing the required screening to thousands
of colonies.
A long-sought alternative to screening-based approaches is reliable in silico design of
stability-enhancing mutations. Numerous stable proteins have been computationally
engineered via diverse approaches (singly or in combination), e.g., identification of
back-to-consensus or ancestral mutations, calculation of changes in folding free energies upon
mutation, introduction of disulfide bridges and elimination of highly flexible
regions [204,205,209]. However, mutants generated using computational methods have rarely
surpassed the 15°C ΔTm threshold of outstanding stabilization as a result of neutral,
destabilizing or function-corrupting mutations that were predicted as stabilizing due to
the moderate accuracy of these methods [211,212]. To overcome this obstacle and provide
substantial stabilization, predicted mutations are usually introduced by site-directed
mutagenesis and tested individually. The most viable mutations are then recombined in
multiple-point mutants assuming they have additive effects, but this is often invalid due to
antagonistic epistatic effects of individual mutations [213]. For all those reasons, no
computational method capable of directly designing highly stable multiple-point mutants
has been previously published.
Here we introduce a strategy, FireProt, for computationally designing multiple-point
mutants, enabling significant improvements of protein stability with minimal experimental
effort. We demonstrate its power by stabilizing the model proteins haloalkane
dehalogenase (HLD) DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA. The
method's general applicability was further verified by validation against information from
the ProTherm database [103], demonstrating that it can be used to identify stabilizing
mutations in diverse proteins with known tertiary structures and homologous sequences
allowing phylogenetic analysis, and thus should have broad utility in protein stabilization
projects.
4.4 Results
4.4.1 Development of FireProt strategy for design of stabilizing multiple-point
mutants
The FireProt strategy for protein stabilization is based on combining the best multiple-point
mutants obtained from predictions of ΔΔG following mutation from a set of crystal
structures and evolutionary information derived from multiple sequence alignment (Figure
12). Additional pre- and post-processing filters are applied in both approaches to improve
prediction reliability and reduce the required computational effort.
[Figure 12 diagram. Starting point: target protein.
Energy-based approach: conservation and correlation analysis; FoldX prediction; Rosetta prediction;
interaction analysis; antagonistic effect prediction; multiple-point mutant design; structure and
activity check; stability determination.
Evolution-based approach: back-to-consensus analysis; FoldX prediction; interaction analysis;
multiple-point mutant design; structure and activity check; stability determination.
Both branches lead to the combined mutant.]
Figure 12. Workflow of the FireProt method. Individual steps involved in the energy- and evolution-based
approaches.
Dataset construction. FireProt's capacity to identify the most stabilizing single-point
mutations by either the energy- or the evolution-based approach was evaluated using a
dataset derived from the ProTherm database, employing the records with absolute ΔΔG
values > 0.5 kcal.mol⁻¹. This value was selected since the estimated experimental error for
measurements of ΔΔG is about 0.48 kcal.mol⁻¹ [214]. When multiple ΔΔG values were
available for a single mutation, the value determined closest to the standard experimental
conditions was retained. The ten most mutated proteins from the ProTherm database,
representing all four protein structural classes (Table S1), were selected to increase the
chance that predicted mutations had been experimentally characterized. The total number of
possible mutations in these 10 proteins was 30,058, but good quality experimental data
were available for only about 2% of the mutations, yielding a dataset of 656 mutations
(119 stabilizing and 537 destabilizing) at 337 different positions (Table S1).
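The dataset filter described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `Record` type, the toy values and the sign convention (negative ΔΔG taken as stabilizing, in line with the negative thresholds used later in the text) are assumptions made for the example. The last lines only check that the counts quoted in the paragraph are mutually consistent.

```python
from dataclasses import dataclass

DDG_CUTOFF = 0.5  # kcal/mol, the cutoff quoted in the text


@dataclass(frozen=True)
class Record:
    """Hypothetical container for one experimental measurement."""
    protein: str
    mutation: str  # e.g. "A10V"
    ddg: float     # kcal/mol; negative = stabilizing (assumed convention)


def build_dataset(records, cutoff=DDG_CUTOFF):
    """Keep records with |ddG| > cutoff and split them by sign."""
    kept = [r for r in records if abs(r.ddg) > cutoff]
    stabilizing = [r for r in kept if r.ddg < 0]
    destabilizing = [r for r in kept if r.ddg > 0]
    return stabilizing, destabilizing


# Toy records with invented values (not ProTherm data).
toy = [
    Record("P1", "A10V", -1.2),
    Record("P1", "L25A", 2.4),
    Record("P1", "S30T", 0.3),   # within experimental error, dropped
    Record("P2", "G5A", -0.5),   # exactly at the cutoff, dropped
]
stab, destab = build_dataset(toy)
print(len(stab), len(destab))

# Consistency of the counts reported in the text.
total = 119 + 537            # stabilizing + destabilizing
coverage = total / 30058     # fraction of all possible mutations
print(total, f"{coverage:.1%}")
```

With the reported counts, the covered fraction evaluates to roughly 2.2%, which matches the "about 2%" stated above.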
Optimization of energy-based approach. We evaluated the performance of four prediction
tools, FoldX [94], Rosetta [88], ERIS [89] and CUPSAT [215], using the ProTherm dataset (Table S2). The
purpose was to select a suitable combination of tools and thresholds of predicted energy
change upon mutation that would achieve very high precision (the fraction of mutations
predicted as stabilizing that are truly stabilizing) and simultaneously a very low
false positive rate (the fraction of truly destabilizing mutations that are incorrectly
predicted as stabilizing). The stability change thresholds were
tested in the range between -2.5 and +2.5 kcal.mol⁻¹ with a step size of 0.5 kcal.mol⁻¹.
According to this evaluation, Rosetta and FoldX achieved the highest precision, 76% and
67%, respectively, when using thresholds of -2 kcal.mol⁻¹ and -1 kcal.mol⁻¹ (Table S2).
The false positive rates for these thresholds for Rosetta and FoldX were 1% and 2%,
respectively (Table S2).
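The two metrics and the threshold scan can be written out as a short sketch. The function name, the toy predictions and the choice of calling a mutation stabilizing when the predicted ΔΔG is at or below the threshold are assumptions for illustration; the text does not say whether the comparison is strict.

```python
def precision_and_fpr(predicted_ddg, truly_stabilizing, threshold):
    """Precision and false positive rate at one ddG threshold.

    A mutation is called stabilizing when predicted ddG <= threshold
    (assumed convention; more negative means more stabilizing).
    """
    tp = fp = 0
    for ddg, is_stab in zip(predicted_ddg, truly_stabilizing):
        if ddg <= threshold:
            if is_stab:
                tp += 1
            else:
                fp += 1
    n_destabilizing = sum(1 for s in truly_stabilizing if not s)
    precision = tp / (tp + fp) if (tp + fp) else None
    fpr = fp / n_destabilizing if n_destabilizing else None
    return precision, fpr


# Thresholds from -2.5 to +2.5 kcal/mol in steps of 0.5, as in the text.
thresholds = [x / 2 for x in range(-5, 6)]

# Toy data with invented values.
predicted = [-3.0, -2.2, -1.4, -0.6, 0.2, 1.5, 2.8]
truth = [True, True, False, True, False, False, False]

for t in thresholds:
    print(t, precision_and_fpr(predicted, truth, t))
```

On the toy data, tightening the threshold from -1.0 to -2.0 removes the single false positive, which illustrates the trade-off reported for Rosetta and FoldX: stricter thresholds raise precision but recognize fewer mutations.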
Evaluation of energy-based approach. Combining the predictions of the two best
performing tools, using their optimal thresholds, with conservation analysis by the Rate4Site
tool [216] in the energy-based FireProt approach performed notably better than either
Rosetta or FoldX alone: FireProt achieved 100% precision and a 0% false positive rate. Such
high reliability in prediction was achieved at the expense of the number of recognized
stabilizing mutations. Rosetta and FoldX correctly identified 16 and 20 truly stabilizing
mutations out of 21 and 30 mutations predicted by these tools as stabilizing, respectively (Table S2).
However, only 8 mutations were agreed upon as stabilizing by both methods following the
FireProt strategy. Conservation analysis discarded 2 false positive mutations since they
targeted immutable positions with a high CONSURF conservation grade (> 8) [217,218]. The
remaining 6 mutations selected by FireProt as stabilizing were all true positives. In addition
to the 6 mutations included in the evaluation, FireProt predicted another 101 stabilizing
mutations for the 10 most mutated proteins for which, however, experimental data were not
available (Table S3).
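The combination rule described in this paragraph (both tools must agree, and highly conserved positions are dropped) can be sketched as follows. The helper names, the toy mutation lists and the grade table are hypothetical; only the grade cutoff of 8 and the counts in the final check come from the text.

```python
def position(mutation):
    """Extract the residue number from a label such as 'A10V'."""
    return int(mutation[1:-1])


def consensus_filter(foldx_hits, rosetta_hits, grade_by_position, max_grade=8):
    """Keep mutations predicted by both tools at positions with grade <= max_grade."""
    agreed = set(foldx_hits) & set(rosetta_hits)
    return sorted(m for m in agreed if grade_by_position[position(m)] <= max_grade)


# Toy inputs (invented).
foldx_hits = ["A10V", "S30L", "G55A", "T70I"]
rosetta_hits = ["A10V", "G55A", "T70I", "K90F"]
grades = {10: 3, 30: 5, 55: 9, 70: 8, 90: 2}
selected = consensus_filter(foldx_hits, rosetta_hits, grades)
print(selected)

# Percentages implied by the counts quoted in the text.
n_destabilizing = 537
reported = {
    "Rosetta": {"tp": 16, "predicted": 21},
    "FoldX": {"tp": 20, "predicted": 30},
    "FireProt": {"tp": 6, "predicted": 6},  # 8 agreed, 2 discarded
}
for tool, c in reported.items():
    precision = c["tp"] / c["predicted"]
    fpr = (c["predicted"] - c["tp"]) / n_destabilizing
    print(tool, round(100 * precision), round(100 * fpr))
```

The counts reproduce the rounded figures given above: 76% and 1% for Rosetta, 67% and 2% for FoldX, and 100% and 0% for the combined filter.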
Evaluation of evolution-based approach. In addition to the energy-based approach, the
evolution-based approach of FireProt strategy was also evaluated using the ProTherm
dataset. The back-to-consensus method [219] identified six potentially stabilizing mutations in
the 10 most mutated proteins for which experimental data were available. Four of these
mutations were true positives and two were false positives. The subsequent application of
the FoldX filter (Figure 12) correctly discarded one of the false positive mutations, giving a
precision of 80%.
Evaluation of prediction for multiple-point mutants. The last step in the development of
the FireProt workflow before its application towards a design of thermostable proteins was
to evaluate Rosetta's ability to predict the stability effects for multiple-point mutations. We
collected a consistent dataset of previously measured stability changes in 46 mutants of
DhaA enzyme for this purpose. Using this dataset a correlation of 0.81 between the
predicted ΔΔG and experimentally determined Tm values was observed, suggesting that
Rosetta could be employed for this purpose (Figure S 1).
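For readers who want to reproduce this kind of check on their own data, a Pearson correlation between predicted ΔΔG and measured Tm can be computed as sketched below. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the four data points are invented. Note that the sign of r depends on the ΔΔG sign convention: if negative ΔΔG means stabilization, a good predictor gives a negative r against Tm.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented example: perfectly linear, so |r| = 1.
predicted_ddg = [-4.0, -2.0, 0.0, 2.0]   # kcal/mol
measured_tm = [60.0, 55.0, 50.0, 45.0]   # degrees Celsius
r = pearson_r(predicted_ddg, measured_tm)
print(round(r, 3))
```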
4.4.2 Design of thermostable haloalkane dehalogenase DhaA
DhaA enzyme was selected as the first model protein due to the wealth of knowledge
available on mutants engineered towards higher thermostability, prolonged half-life and
stability in organic co-solvents that enables quantitative comparison of their performance
with the mutants designed by FireProt [220,221].
Energy-based approach. Out of 5,529 possible single-point mutations, 1,919 were at
positions with high evolutionary conservation (CONSURF grade > 8) or exhibiting
evolutionary correlation with other residues (Correlated Mutation Analysis score > 0.8
calculated by the Comulator tool [222]), indicating that these positions are functionally important.
All these mutations were discarded to avoid major changes in the enzyme's activity or
substrate specificity. FoldX analysis of ΔΔG for the remaining 3,610 mutations identified
151 potentially stabilizing single-point mutations. Rosetta calculation was then applied to
further decrease the number of potential false positives among these mutations,
resulting in 22 promising mutations (Table S4). A mutation disrupting a salt bridge and five
with antagonistic effects (ΔΔG of double-point mutants > -3.0 kcal.mol⁻¹) were also
discarded, leaving 16 potentially stabilizing mutations. Only the most favorable mutation
at each position was further analyzed for mutual interactions, leaving a final set of
eight potentially non-antagonistic stabilizing mutations: C128F, T148L, A172I, C176F,
D198W, V219W, C262L and D266F (Figure 13). Rosetta predicted high stability of the
recombined multiple-point mutant DhaA112 carrying all these mutations (Table 1).
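The step "only the most favorable mutation at each position" lends itself to a small sketch. The function name, mutation labels and ΔΔG values below are invented for illustration; the trailing checks only confirm that the counts in the paragraph add up. The observation that 5,529 equals 19 substitutions at 291 positions is an inference from the arithmetic, not a statement from the text.

```python
def best_per_position(ddg_by_mutation):
    """Return, for each position, the mutation with the lowest predicted ddG."""
    best = {}
    for mutation, ddg in ddg_by_mutation.items():
        pos = int(mutation[1:-1])  # 'A10V' -> 10
        if pos not in best or ddg < best[pos][1]:
            best[pos] = (mutation, ddg)
    return [best[pos][0] for pos in sorted(best)]


# Hypothetical candidates (labels and energies are not from the study).
candidates = {"A10V": -2.1, "A10I": -2.8, "S30L": -2.5, "G55F": -3.0, "G55W": -2.2}
kept = best_per_position(candidates)
print(kept)

# Bookkeeping of the filter cascade reported in the text.
after_conservation = 5529 - 1919   # mutations passed on to FoldX
after_interaction = 22 - 1 - 5     # salt-bridge and antagonistic cases removed
print(after_conservation, after_interaction, 5529 == 19 * 291)
```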
Evolution-based approach. The back-to-consensus approach employing simple consensus
and frequency ratio predictions identified 42 potentially stabilizing substitutions (Tables S5,
S6, S7 and S8). Of these, 22 were excluded by FoldX predictions as
potentially destabilizing (ΔΔG > 0.5 kcal.mol⁻¹), and seven were excluded to preserve residues
with side-chains involved in important interactions. In total, 13 back-to-consensus mutations passed
all of the applied filters and were combined into four multiple-point mutants (Figure 13).
These mutations were grouped according to their origin from the two applied
consensus techniques, using either the representative MSA of the HLD-II subfamily or the MSA of
the whole HLD family (see Methods for more details). DhaA100 featured the I136L, V184E and
V197E mutations, which were predicted by both simple consensus and frequency ratio
analyses of the representative MSA of the HLD-II subfamily. DhaA101 contained the
mutations E20S, F80R and A155P, which were predicted solely by the frequency ratio
analysis, while DhaA102 included the mutations L161M, I162V and D198S, which were
predicted solely by the simple consensus analysis. Finally, DhaA103 contained the V55L,
A127V, H188A and E191A mutations, which were predicted by simple consensus analysis
of the representative MSA of the whole HLD family. Interestingly, none of these mutants
was predicted to be more stable than the wild-type by Rosetta (Table 1).
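A back-to-consensus analysis of the kind named here can be sketched on a toy alignment. The two cutoffs (`min_consensus`, `min_ratio`), the function names and the alignment are illustrative assumptions; the actual parameters are given in the Methods, which are outside this excerpt. The sketch flags a position when the most frequent residue differs from the wild-type and either the simple consensus or the frequency ratio criterion is met.

```python
from collections import Counter


def column_frequencies(msa, index):
    """Residue frequencies in one alignment column, ignoring gaps."""
    residues = [seq[index] for seq in msa if seq[index] != "-"]
    total = len(residues)
    return {aa: n / total for aa, n in Counter(residues).items()}


def back_to_consensus(wild_type, msa, min_consensus=0.5, min_ratio=5.0):
    """Suggest wild-type -> consensus substitutions.

    min_consensus and min_ratio are hypothetical cutoffs for this sketch.
    Returns tuples (mutation, passes_simple_consensus, passes_frequency_ratio).
    """
    suggestions = []
    for i, wt in enumerate(wild_type):
        freqs = column_frequencies(msa, i)
        if not freqs:
            continue  # column contains only gaps
        consensus, f_cons = max(freqs.items(), key=lambda kv: kv[1])
        if consensus == wt:
            continue
        f_wt = freqs.get(wt, 0.0)
        simple = f_cons >= min_consensus
        ratio = f_wt == 0.0 or f_cons / f_wt >= min_ratio
        if simple or ratio:
            suggestions.append((f"{wt}{i + 1}{consensus}", simple, ratio))
    return suggestions


# Toy alignment (invented); sequences are assumed to be pre-aligned.
wild_type = "AKL"
msa = ["AKL", "ARL", "ARI", "ARI", "GRI"]
suggestions = back_to_consensus(wild_type, msa)
print(suggestions)

# Bookkeeping of the counts reported in the text.
print(42 - 22 - 7, 3 + 3 + 3 + 4)
```

Both bookkeeping expressions give 13, matching the number of mutations that passed all filters and the total carried by the four evolution-based mutants.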
Figure 13. Location of stabilizing mutations in designed enzymes. A) Locations of substitutions in
energy-based, evolution-based and combined mutants of DhaA enzyme. Substitutions in the multiple-point mutant
designed by the energy-based approach (DhaA112) are represented as orange spheres, while substitutions
in multiple-point mutants designed by the evolution-based approach are represented as red (DhaA100), blue
(DhaA101), green (DhaA102) and magenta (DhaA103) spheres. Mutations in the combined mutant (DhaA115)
are colored in orange and blue in correspondence with their original mutants (DhaA112 and DhaA101). B)
Locations of substitutions in energy-based and evolution-based mutants of LinA enzyme. Substitutions in the
multiple-point mutant designed by the energy-based approach (LinA01) are represented as orange spheres,
while substitutions in the multiple-point mutant designed by the evolution-based approach (LinA02) are
represented as blue spheres.
Table 1. Characteristics of predicted multiple-point mutants of DhaA
Method | Protein | Mutations | Rosetta ΔΔG (kcal.mol⁻¹) | DSC Tm (°C) | DSC ΔTm (°C) | Activity at 37°C [b] (nmol s⁻¹ mg⁻¹)
- | DhaAwt | - [a] | - [a] | 49.0 ± 0.7 | - [a] | 18.0
Energy-based | DhaA112 | C128F + T148L + A172I + C176F + D198W + V219W + C262L + D266F | -32.0 ± 1.4 | 65.2 ± 0.1 | +16.2 | 5.5
Evolution-based | DhaA100 | I136L + V184E + V197E | 1.1 ± 0.3 | 48.5 ± 0.4 | -0.6 | 9.4
Evolution-based | DhaA101 | E20S + F80R + A155P | 0.8 ± 0.1 | 58.6 ± 0.3 | +9.6 | 49.3
Evolution-based | DhaA102 | L161M + I162V + D198S | 2.3 ± 0.7 | 48.1 ± 0.1 | -0.9 | 9.4
Evolution-based | DhaA103 | V55L + A127V + H188A + E191A | 3.0 ± 1.0 | 51.4 ± 0.1 | +2.3 | 6.5
Combined | DhaA115 | E20S + F80R + C128F + T148L + A155P + A172I + C176F + D198W + V219W + C262L + D266F | -32.4 ± 1.0 | 73.6 ± 0.1 | +24.6 | 5.6
[a] not applicable; [b] activity determined with 1-iodohexane at 37°C and pH 8.6; ΔΔG - predicted change in Gibbs free energy;
DSC - Differential Scanning Calorimetry; DhaA115 combines mutations of DhaA101 and DhaA112
Characterization of mutants designed by FireProt. Expression and purification of all
constructed mutants resulted in good yields and protein purity. Far-UV CD spectra of the
wild-type and mutants show that none of the mutations caused significant changes in
secondary structure (Figure S2A), and all tested variants were active with 1-iodohexane at
37°C (Table 1). The constructed variants' thermostability was determined by thermally-induced
denaturation using DSC (Table 1) and CD spectroscopy (Figure S2A). A substantial
increase in melting temperature (ΔTm 16.2°C) was observed for variant DhaA112 designed
by the energy-based approach, indicating strong stabilization effects of the introduced
mutations (Table 1). The effect of the consensus substitutions was moderate: only two
tested variants, DhaA101 and DhaA103, showed positive thermostabilization effects (ΔTm
9.6 and 2.3°C), while the mutations in DhaA100 and DhaA102 had neutral or destabilizing
effects (Table 1). Combining the best energy-based mutant (DhaA112) with the best mutant
identified by the evolution-based approach (DhaA101) produced a final mutant, DhaA115.
Effects of the evolution-based substitutions were complementary and additive with those
predicted by the energy-based approach, giving an outstanding increase in thermostability:
ΔTm 24.6°C (Table 1).
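The additivity claim can be checked directly against the melting temperatures in Table 1. The short sketch below uses only the reported Tm values; the variable names are arbitrary, and "additive" is taken here to mean that the ΔTm of the combined mutant is compared with the sum of the ΔTm values of its two parents.

```python
TM_WILD_TYPE = 49.0  # degrees Celsius, DhaAwt (Table 1)

tm = {"DhaA112": 65.2, "DhaA101": 58.6, "DhaA115": 73.6}  # Table 1

# Change in melting temperature relative to the wild-type.
delta_tm = {name: round(value - TM_WILD_TYPE, 1) for name, value in tm.items()}

expected_if_additive = round(delta_tm["DhaA112"] + delta_tm["DhaA101"], 1)
observed = delta_tm["DhaA115"]

print(delta_tm)
print(expected_if_additive, observed, round(observed / expected_if_additive, 2))
```

The sum of the parental effects is 25.8°C against an observed 24.6°C, so the combined mutant reaches about 95% of the fully additive value.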
Characterization of the final combined mutant. The combined mutant DhaA115 was
characterized in detail in terms of its thermostability in the presence of organic co-solvents,
half-life at elevated temperature and temperature profile. The Tm determinations
show that the mutations had stabilizing effects in the presence of three organic co-solvents
comparable to those in pure buffer (ΔTm: 20 to 26°C; Figure 14A). The enhanced
thermostability was also reflected in a strongly improved half-life at 60°C (Figure 14B). After
seven days of incubation at 60°C, the mutant DhaA115 still retained about 50% of its initial
activity, while the wild-type was inactivated within six hours. Two inactivation phases
were observed for all DhaA variants: rapid initial inactivation followed by a slower decay of
activity (Figure 14B). The wild-type lost around 80% of its activity in the first, fast phase,
while the mutations in DhaA115 reduced the loss during the first inactivation phase to only
30%. The rate of inactivation during the first phase was comparable for both wild-type and
DhaA115, while the rate during the second phase was dramatically (100-fold) slower for
the mutant. Similar effects were observed with a previously reported stable variant of
DhaA63 (denoted Dhla8) [221] constructed using Gene Site-Saturation Mutagenesis (Figure
S3). The apparent optimal temperature shifted from 45°C for the wild-type enzyme to 65°C
for the mutant DhaA115, and its specific activity with 1-iodohexane at the optimum
temperature was 28% higher, but the shape of the temperature profile remained largely
unchanged (Figure 14C). Steady-state kinetic constants of the two enzymes determined at
their respective suboptimal temperatures with 1-iodohexane revealed comparable
catalytic properties (Table 2).
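One common way to describe two inactivation phases is a sum of two exponentials. The text does not state which model was fitted, so the sketch below is an assumption: the double-exponential form, the function names and the fast-phase rate constant are hypothetical, while the 30% fast-phase loss and the roughly 50% residual activity after seven days come from the paragraph above.

```python
import math


def residual_activity(t_hours, fast_fraction, k_fast, k_slow):
    """Fraction of initial activity left after t hours (double-exponential model)."""
    slow_fraction = 1.0 - fast_fraction
    return (fast_fraction * math.exp(-k_fast * t_hours)
            + slow_fraction * math.exp(-k_slow * t_hours))


def slow_rate_from_endpoint(t_hours, residual, fast_fraction):
    """Slow-phase rate constant implied by one late time point.

    Assumes the fast phase has decayed completely by t_hours.
    """
    return math.log((1.0 - fast_fraction) / residual) / t_hours


# DhaA115: about 30% lost in the fast phase, about 50% activity after 7 days at 60 C.
SEVEN_DAYS_H = 7 * 24
k_slow_mutant = slow_rate_from_endpoint(SEVEN_DAYS_H, residual=0.5, fast_fraction=0.3)
print(round(k_slow_mutant, 4))  # per hour

# k_fast = 1.0 per hour is a made-up placeholder; only its being fast matters here.
check = residual_activity(SEVEN_DAYS_H, fast_fraction=0.3, k_fast=1.0, k_slow=k_slow_mutant)
print(round(check, 3))
```

Under these assumptions the slow phase of the mutant corresponds to a rate constant of about 0.002 per hour. A 100-fold faster slow phase for the wild-type would then be about 0.2 per hour, which is at least in line with the reported inactivation within six hours.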
Table 2. Steady-state kinetic constants of DhaA wild-type and the final mutant DhaA115
determined with 1-iodohexane at 37°C and 57°C, respectively, and pH 8.6
Enzyme | kcat (s⁻¹) | K0.5 (µM) | n | Ksi (mM)
DhaAwt | 2.47 ± 0.01 | 12.00 ± 1.14 | 1.80 ± 0.04 | 0.53 ± 0.01
DhaA115 | 2.85 ± 0.03 | 5.00 ± 2.31 | 1.89 ± 0.01 |
K0.5 - concentration of substrate at half maximal velocity, kcat - catalytic constant, n - Hill coefficient, Ksi - substrate
inhibition constant
[Figure 14 plots. Panel A: conditions buffer, 50% DMSO, 20% acetone, 20% methanol. Panel B: x-axis Time (h).
Panel C: x-axis Temperature (°C).]
Figure 14. Biochemical properties of DhaA wild-type and the final mutant DhaA115. A) Melting
temperatures of DhaA wild-type (blue) and DhaA115 (red) in the presence of indicated solvents. B) Half-life
of DhaA wild-type (blue) and DhaA115 (red) determined at 60°C and pH 8.6 with the substrate 1-iodohexane.
C) Temperature profiles of DhaA wild-type (blue) and DhaA115 (red) determined at pH 8.6 with the substrate
1-iodohexane.
4.4.3 Design of thermostable γ-hexachlorocyclohexane dehydrochlorinase LinA
The γ-hexachlorocyclohexane dehydrochlorinase LinA was selected as the second
model protein to illustrate the broader applicability of the FireProt strategy to proteins with
very different characteristics: (i) LinA is natively a homotrimer (DhaA is a monomer), (ii) LinA
monomers form an α+β barrel fold (DhaA possesses an α/β-hydrolase fold), (iii) LinA is mainly
composed of β-sheets (α-helices and β-sheets are equally represented in DhaA) and (iv)
LinA, with 156 amino acids, is about half as long as DhaA (294 residues).
Energy-based approach. Out of 2,689 possible single-point mutations, 1,533 passed the
evolutionary conservation or correlation filter. FoldX analysis of ΔΔG for the remaining
mutations identified 68 potentially stabilizing single-point mutations. Subsequent Rosetta
calculation further decreased the number of promising mutations to 8 (Table S9). The
mutation D19M, which disrupts a salt bridge, and T133L, which has an antagonistic effect with
position D3, were also discarded, leaving 6 potentially stabilizing mutations at four positions. Only
the most favorable mutation at each position was further analyzed: D3I, S127Y, T133I and
A145H (Figure 13B). Rosetta predicted high stability of the recombined multiple-point
mutant LinA01 carrying all four mutations (Table 3).
Evolution-based approach. The back-to-consensus analysis identified 15 potentially stabilizing
substitutions (Table S10). Of these, 10 were excluded as destabilizing based on FoldX
predictions. Two more were excluded because mutation K20Y is in contact with the
halide-stabilizing residue and F113Y has a negative
effect on enzyme activity according to the UniProt database. The remaining 3 back-to-consensus
mutations passed all of the applied filters and were combined into a three-point mutant:
Y50F, F68W and A131V (Figure 13B).
Table 3. Characteristics of predicted multiple-point mutants of LinA
Method | Protein | Mutations | Rosetta ΔΔG (kcal.mol⁻¹) | DSC Tm (°C) | DSC ΔTm (°C) | Activity at 30°C [b] (µmol s⁻¹ mg⁻¹)
- | LinAwt | - [a] | - [a] | 41.4 ± 0.1 | | 0.21 (0.12 mM) [c]; 1.91 (0.38 mM) [c]
Energy-based | LinA01 | D3I + S127Y + T133I + A145H | -31.4 | 62.3 ± 0.4 | +20.9 | 0.32 (0.12 mM) [c]; 1.29 (0.34 mM) [c]
Evolution-based | LinA02 | Y50F + F68W + A131V | 0.1 | 37.7 ± 0.2 | -3.7 | 0.17 (0.11 mM) [c]; ND
[a] not applicable; [b] activity determined with γ-hexachlorocyclohexane at 30°C and pH 8.6; [c] initial γ-HCH concentration is
given since it affects the determined specific activity; ΔΔG - predicted change in Gibbs free energy; DSC - Differential
Scanning Calorimetry; ND - not determined
Characterization of mutants designed by FireProt. Expression and purification of all
constructed mutants resulted in good protein yields and purity. Comparison of the far-UV
CD spectra of LinA wild-type and its mutants shows that none of the mutations caused
significant changes in secondary structure (Figure S2B). Both LinA variants retained a level
of specific activity similar to that of LinAwt (Table 3), showing that the introduced mutations did not
alter activity negatively. The thermostability of the constructed LinA variants was
determined by thermally-induced denaturation using both DSC (Table 3) and CD
spectroscopy (Figure S2B). Similar to the energy-based DhaA variant, the energy-based
LinA variant (LinA01) showed a substantial increase in melting temperature (ΔTm +20.9°C),
again showing the strong stabilization effects of the introduced mutations (Table 3). The
evolution-based mutant (LinA02) showed a small decrease in thermostability (ΔTm -3.7°C),
indicating that the consensus residues identified at these positions are not conserved for
reasons of enzyme stability (Table 3). No combined mutant was constructed due to the
absence of stabilizing evolution-based mutations.
4.4.4 Structural basis of mutation effect
Visual inspection of mutant structures coupled with detailed analysis of their individual
energy terms calculated by Rosetta provided indications of the possible structural basis of
protein stabilization by mutations of DhaA115 and LinA01 (Table S11). These mutations
were introduced to various locations in the protein structure with different types of
secondary structures.
Stabilizing mutations in DhaA. Out of 11 mutations, 3 line a main transport tunnel, 3 are
buried in the protein core and 5 are exposed to solvent on the protein surface (Figure 13A
and Table S 11). The 8 mutations identified by the energy-based approach introduce
bulkier, more hydrophobic residues (Table S 11) that probably enhance stability by
improving packing of atoms in the protein interior and/or strengthening hydrophobic
interactions. The contributions to stability of the mutations proposed by the
evolution-based approach are more difficult to explain. The A155P mutation (at the fourth
most flexible position in the protein structure) could increase rigidity by introducing
proline into the affected loop and account for most of the observed stability improvement,
while the effects of the E20S and F80R mutations are probably either neutral or due to
restructuring of the charged network on the surface of DhaA (Table S 11).
Stabilizing mutations in LinA. Out of 4 stabilizing mutations, 2 are buried in the protein
core and 2 are exposed on the protein surface (Figure 13B and Table S 11). Consistent with
the observations for the stabilizing mutations introduced into DhaA, all 4 mutations
identified by the energy-based approach for LinA introduced bulkier and more hydrophobic
residues (Table S 11).
4.5 Discussion
The last decade has seen significant advances towards more rational approaches to reduce
the experimental effort required to engineer highly stable proteins (Figure 15 and Table S
12). As a contribution to these efforts we have developed a hybrid strategy integrating
energy-based and evolution-based approaches, with smart filtering of mutations that are
destabilizing or may impair enzymes' functions, enabling the identification of additively
stabilizing substitutions in multiple-point mutants. It is essential to correctly configure all
of the tools used in both the energy- and evolution-based approaches of the FireProt
workflow in order to achieve robust and reliable predictions. Therefore, individual steps of
the workflow were verified using a dataset featuring diverse proteins from the ProTherm
database. The predictions carried out for 656 mutations confirmed FireProt's precision:
the energy- and evolution-based approaches identified stabilizing mutations with success
rates of 100% and 80%, respectively. Strikingly, only one stabilizing mutation that exceeded
our thresholds was identified by both approaches, suggesting that they are highly
complementary. The potential downside of the stringent conditions imposed to avoid false
positives was that 92% of the available stabilizing mutations were discarded. However, the
remaining correctly identified stabilizing mutations should be more than sufficient to
construct highly stable catalysts (Table S 3).
When the energy-based approach was applied to DhaA and LinA enzymes, the removal of
conserved and correlated positions from analysis helped to avoid modification of
structurally and functionally important residues, thereby greatly reducing the number of
possible mutations requiring evaluation by computationally intensive free energy
calculation. Since FoldX computation is about an order of magnitude faster than Rosetta, it
was applied as a pre-filter, further reducing numbers of mutations to be analyzed by
Rosetta. Regarding the prediction of multiple-point mutants, simple recombination of the
most promising mutants could weaken stabilization, since strong antagonistic effects were
detected even at the level of double-point mutants. The thermostability enhancement for
the eight- and four-point mutants predicted by this approach, DhaA112 (ΔTm 16°C) and
LinA01 (ΔTm 21°C), both exceeded the threshold for outstanding stabilization, although
none of the introduced mutations optimized either hydrogen bonds or charge-charge
interactions. This may be due to sampling limited rotamer libraries during the calculations
and the requirement for both FoldX and Rosetta to unambiguously evaluate selected
mutations as stabilizing. FoldX and Rosetta employ simplified scoring functions and despite
using three protein structures for analysis, only limited protein flexibility is allowed,
implying that it should be possible to supplement mutations proposed by free energy
calculations with beneficial substitutions identified using different principles.
[Figure 15: schematic with the dimensions Experimental effort, Computational effort and Success rate]
Figure 15. Schematic comparison of protein stabilization methods. Examples of representative methods
with their characteristics and success rates are presented in Table S 12.
To this end, additional mutations were selected by the evolution-based approach. The
mutations predicted by the back-to-consensus method were filtered by FoldX to discard
mutations proposed due to function-related evolutionary constraints rather than structural
stabilization. This filtering step proved to be very important as over half of the mutations
were discarded as potentially destabilizing. Interestingly, all five multiple-point mutants
DhaA100-DhaA103 and LinA02 were predicted as destabilizing by Rosetta and had to be
tested experimentally. While this prediction was accurate for three of them (DhaA100,
DhaA102 and LinA02), the other two mutants (DhaA101 and DhaA103) were clearly more
stable than the wild-type. This result suggests that some underlying principles important
for stability detected by the back-to-consensus method are not captured by the applied
Rosetta protocol. We speculate that these may include larger backbone rearrangements,
interactions with ions present in the solvent, or other entropic contributions that are not
well accounted for in the current protocols. Experimental characterization of these
mutants by microcalorimetry, temperature-jump stopped-flow and protein crystallography
is currently ongoing in our laboratory. Despite its lower reliability, the evolution-based
approach should still be considered a useful supplement to the energy-based approach,
potentially enabling further improvement in the stability of designed proteins. The final
11-point mutant DhaA115 arising from this hybrid prediction strategy is one of the most
stable HLD proteins known to date (ΔTm > 24°C) [220,223].
We have compared our strategy against several methods providing exceptional protein
stabilization (Table S 12). The experimentally intensive protocols of directed evolution and
hot-spot predictions can provide engineered enzymes with comparable enhancement.
However, since their success rate is generally below 1%, stable proteins can only be
obtained after extensive screening. Notably, two of these studies also focused on
improving the stability of the enzyme DhaA. In one, an eight-point DhaA mutant with a
ΔTm of 18°C was obtained after screening all 121,000 possible variants [221]. We have
obtained a clearly superior enzyme after experimental evaluation of as few as six mutants,
highlighting the importance of removing mutations with antagonistic and uncertain
stabilizing effects. In the other study performed with DhaA, four hotspots in an access
tunnel were experimentally randomized, requiring experimental screening of 5,000
mutations [220], and the ΔTm for the best four-point mutant was 19°C.
Highly stable proteins have been obtained by in silico prediction of stabilizing effects of
single-point mutations in four recently published studies [85,211,224,225]. In one, 67
variants of epoxide hydrolase with mutations identified as potentially stabilizing by the
FRESCO method were experimentally tested, 24 were reportedly more stable than the parent
protein, and the variant with the best combination of mutations had remarkably enhanced
thermostability (ΔTm 36°C) [225]. Much of this enhancement arose from disulfide bridges at
the dimer interface, making this approach particularly suitable for multimeric proteins. In
another of the studies, four out of six engineered methionine aminopeptidases designed
by the RosettaVIP method were found to be stabilized and a combined five-point mutant
reportedly had a ΔTm of 18°C [85]. The authors noted that their final construct is still
less stable than the most thermostable native aminopeptidases and that the method is
particularly effective for mutagenesis of buried residues around internal cavities. In a
third study, a 12-point mutant of tobacco 5-epi-aristolochene synthase was generated using
the SCADS method with an impressive ΔTm (45°C), but at the expense of 98% of catalytic
activity at the optimal temperature [224]. In comparison to the methods applied in these
and other relevant
studies (Table S 12), FireProt affords a reduction of experimental screening effort due to
robust identification of stabilizing mutations and ensuring their additivity. In addition, it has
promising applicability to diverse proteins, potentially all proteins with known tertiary
structure and homologous sequences, due to the diverse locations of introduced mutations
and universal applicability of underlying principles.
In summary, the presented hybrid strategy FireProt affords rapid design of stable proteins.
Consideration of the additivity of identified potentially beneficial mutations enables
prediction of multiple-point mutants with significantly enhanced stability. Despite a
dramatic reduction in experimental effort, the workflow provided two proteins with
outstanding stability. One of them is an HLD with greater thermostability than all known
HLD enzymes, whether obtained from thermophilic organisms or engineered using extensive
combinatorial screening. Furthermore, owing to the smart filtering, this strategy is
affordable for users with limited access to powerful computer facilities. In addition,
implementation of the FireProt strategy in the web-based protein engineering tool HotSpot
Wizard [69] is currently ongoing in our laboratory to ensure user convenience.
4.6 Methods
4.6.1 Bioinformatics analysis
Construction of multiple sequence alignments. Sequences of six experimentally
characterized HLDs - DhaA, LinB, DrbA, DmbC, DhlA and DmbB - or the sequence of LinA
were used as queries for PSI-BLAST [226] searches against the nr NCBI database (versions
July 2009 and May 2015, respectively) [227], with threshold E-values of 10⁻¹⁰ and 10⁻¹⁵
for the initial BLAST search and inclusion of a sequence in the position-specific matrix,
respectively. Sequences collected after three PSI-BLAST iterations were clustered by
CD-HIT [228] at a 90% identity threshold. The resulting dataset (including 8,226 sequences
for DhaA and 946 for LinA) was clustered with CLANS [229] using default parameters and
varying threshold P-values. Sequences clustering with the query at a P-value of 10⁻²⁹ were
extracted and aligned with MUSCLE [230]. All artificial, incomplete or divergent sequences
were removed. The final multiple sequence alignment (MSA) of LinA contained 13 sequences.
The prepared MSA of the HLD protein family comprised 168 sequences. All sequences (47)
belonging to the HLD-II subfamily were then extracted to create an HLD-II subfamily dataset
and aligned with MUSCLE. To reduce possible bias toward highly similar sequences,
UniqueProt [231] was used (with an HSSP cut-off value of 40) to select representative
sequences from both datasets. The resulting representative HLD family and HLD-II subfamily
datasets comprised 87 and 27 sequences, respectively. Each representative dataset was then
aligned with MUSCLE [230].
Analysis of evolutionary conservation and correlation. The MSA of the whole HLD-II
subfamily or the MSA of LinA was used to estimate the level of conservation at individual
positions. Normalized evolutionary rates for each amino acid site of the MSA were
calculated by the Bayesian method implemented in Rate4Site v2.01 [216] with the WAG
evolutionary model [232] and maximum likelihood optimization of branch lengths using a
gamma model with four discrete categories. Calculated normalized evolutionary rates were
converted to ConSurf conservation grades [217,218]. Positions with a grade > 8 were
considered immutable. The MSA of the whole HLD family or the MSA of LinA was used to
identify positions with correlated mutations (with a threshold Correlated Mutation Analysis
score > 0.8), by applying the 3DM database's Comulator online tool [222].
Back-to-consensus analysis. Back-to-consensus mutations were selected by analyzing both
representative MSAs of HLDs and the MSA of LinA by the simple consensus and frequency
ratio approaches [219]. Residues from poorly aligned regions (DhaA residues 1-18, 131-179
and 279-293 from the HLD family MSA, and residues 1-14 from the HLD-II subfamily MSA) were
excluded from the analysis. The simple consensus analysis was performed using a
consensus cut-off of 0.5, meaning that a given residue must be present at a given position
in at least 50% of all analyzed sequences to be assigned as the consensus residue. Two
cut-offs were simultaneously applied in the frequency ratio analyses: a cut-off of 0.2 for
the maximum allowed ratio between the frequency of the target residue and the frequency of
the most conserved residue, and a minimal frequency of 0.4 for the most conserved residue.
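The two selection rules can be illustrated with a minimal Python sketch. The function name, the data layout and the choice to count gaps in the frequency denominator are assumptions made for illustration only; the toy alignment is invented, and the code is not the script used in this work.

```python
from collections import Counter

def back_to_consensus(msa, target, consensus_cutoff=0.5,
                      ratio_cutoff=0.2, min_top_freq=0.4):
    """Suggest back-to-consensus substitutions for `target`.

    msa    -- list of equal-length aligned sequences ('-' marks a gap)
    target -- aligned sequence of the protein to be stabilized
    Returns {column index: (wild-type residue, consensus residue, methods)}.
    """
    suggestions = {}
    for col, wild_type in enumerate(target):
        if wild_type == "-":
            continue
        counts = Counter(seq[col] for seq in msa if seq[col] != "-")
        if not counts:
            continue
        top_residue, top_count = counts.most_common(1)[0]
        if top_residue == wild_type:
            continue
        # Assumption: frequencies are relative to all sequences, gaps included.
        top_freq = top_count / len(msa)
        wt_freq = counts[wild_type] / len(msa)
        methods = []
        if top_freq >= consensus_cutoff:
            methods.append("simple consensus")
        if top_freq >= min_top_freq and wt_freq / top_freq <= ratio_cutoff:
            methods.append("frequency ratio")
        if methods:
            suggestions[col] = (wild_type, top_residue, methods)
    return suggestions

# Toy alignment of 25 sequences built column by column (illustrative data only).
columns = [
    "A" * 25,                             # fully conserved, nothing to suggest
    "K" + "R" * 15 + "E" * 9,             # R: 0.60, K: 0.04 -> both methods
    "L" + "V" * 11 + "I" * 7 + "M" * 6,   # V: 0.44, L: 0.04 -> frequency ratio only
    "D" * 5 + "E" * 15 + "N" * 5,         # E: 0.60, D: 0.20 -> simple consensus only
]
msa = ["".join(residues) for residues in zip(*columns)]
result = back_to_consensus(msa, target=msa[0])
```

In this sketch the third column is picked up only by the frequency ratio rule, because its most frequent residue stays below the 50% consensus cut-off while the target residue is rare.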
Compilation of a validation dataset. The validation dataset was compiled from ProTherm
records that include experimentally determined differences between the Gibbs free
energies of folding for mutant proteins and the corresponding wild type (ΔΔG). To increase
the reliability of the analysis, only records with absolute ΔΔG values of > 0.5 kcal.mol⁻¹
were included; the experimental error for measurements of ΔΔG is estimated to be about
0.48 kcal.mol⁻¹ [214]. When multiple ΔΔG values were available for a single mutation, only
the value determined under the experimental conditions closest to the physiological pH of 7
was retained. The dataset was also limited to mutations in the ten most mutated proteins
in ProTherm. Multiple sequence alignments for each protein in the dataset were
constructed using a protocol similar to that applied in the analysis of the model enzyme
DhaA. However, an automatic multi-step procedure was developed to circumvent the need
to manually select PSI-BLAST queries. First, the name of the relevant protein family was
found in the SCOP database [233]. Then, all members of the same protein family were
clustered by CD-HIT using an identity threshold of 90%. Finally, up to five representatives
of each resulting cluster were selected at random to constitute the set of PSI-BLAST queries.
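The three record filters described at the start of this paragraph can be sketched as follows. The record layout (dictionaries with `protein`, `mutation`, `ddg` and `ph` keys), the function name and the order in which the filters are applied are assumptions for illustration; the example records are invented and do not come from ProTherm.

```python
from collections import Counter

def compile_validation_set(records, min_abs_ddg=0.5, reference_ph=7.0, n_proteins=10):
    """Filter ProTherm-like records; each record is a dict with the keys
    'protein', 'mutation', 'ddg' (kcal/mol) and 'ph' (hypothetical layout)."""
    # 1) Drop measurements that lie within the experimental error.
    reliable = [r for r in records if abs(r["ddg"]) > min_abs_ddg]
    # 2) Keep one value per mutation: the one measured closest to pH 7.
    best = {}
    for r in reliable:
        key = (r["protein"], r["mutation"])
        if key not in best or abs(r["ph"] - reference_ph) < abs(best[key]["ph"] - reference_ph):
            best[key] = r
    # 3) Restrict the set to the most mutated proteins.
    counts = Counter(protein for protein, _ in best)
    kept = {protein for protein, _ in counts.most_common(n_proteins)}
    return [r for (protein, _), r in best.items() if protein in kept]

# Invented example records.
records = [
    {"protein": "2LZM", "mutation": "A1V", "ddg": -1.2, "ph": 5.0},
    {"protein": "2LZM", "mutation": "A1V", "ddg": -0.9, "ph": 7.2},
    {"protein": "2LZM", "mutation": "L2A", "ddg": 0.3, "ph": 7.0},
    {"protein": "2LZM", "mutation": "G4A", "ddg": 0.8, "ph": 7.0},
    {"protein": "1BNI", "mutation": "I3V", "ddg": 1.1, "ph": 6.3},
]
subset = compile_validation_set(records, n_proteins=1)
```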
4.6.2 Molecular modeling
Preparation of protein structures. Crystal structures of wild-type DhaA (PDB IDs: 1BN6,
1BN7 and 1CQW) and wild-type LinA (PDB ID: 3A76) were downloaded from the RCSB PDB
database [234]. PyMOL v1.4 [235] was used to model substitutions A172V, I209L and G292A in
the crystal structures of DhaA to ensure their correspondence with DhaA from
Rhodococcus rhodochrous (GI number 3114657). The crystal structures were then prepared
for predictions by removing ligands and water molecules. Missing side-chain atoms were
added by the <RepairPDB> module of FoldX 3.0 [94]. Repaired structures were minimized by
the minimize_with_cst module of Rosetta [88], with both backbone and side-chain
optimization enabled (-sc_min_only false), the distance for the full atom pair potential
set to 9 Å (-fa_max_dis 9.0), standard weights for the score function and a constraint
weight of 1 (-constraint_weight 1.0). Output from the minimization was used to create a
constraint file with the script convert_to_cst_file.sh.
Prediction of stability effects by FoldX. Stability effects of all possible single-point
mutations were estimated using the <BuildModel> module of FoldX. Calculations were
performed five times for each mutation following the recommended protocol (pH 7,
temperature 298 K, ionic strength 0.050 M, VdWDesign 2). Mutations with a predicted ΔΔG,
averaged over all three analyzed structures, smaller than -1.0 kcal.mol⁻¹ were considered
stabilizing, while a tighter criterion was applied for detecting potentially destabilizing
mutations (ΔΔG > 0.5 kcal.mol⁻¹).
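The two FoldX thresholds amount to a simple three-way classification, sketched below. The function name and the label "neutral" for values between the two thresholds are assumptions for illustration, and the input values are invented.

```python
from statistics import mean

def classify_foldx(ddg_per_structure, stabilizing_below=-1.0, destabilizing_above=0.5):
    """Classify a mutation from FoldX ddG values (kcal/mol), one per analyzed structure."""
    average = mean(ddg_per_structure)
    if average < stabilizing_below:
        return "stabilizing"
    if average > destabilizing_above:
        return "destabilizing"
    # Assumption: anything between the two thresholds is treated as neutral.
    return "neutral"

labels = [
    classify_foldx([-2.0, -2.4, -2.2]),   # average -2.2
    classify_foldx([0.3, 0.9, 0.6]),      # average 0.6
    classify_foldx([-0.5, 0.1, -0.2]),    # average -0.2
]
```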
Prediction of stability effects by Rosetta. Protocol 16 incorporating backbone flexibility
within the ddg_monomer module of Rosetta was applied according to Kellogg et al. [88]. The
soft-repulsive design energy function (-soft_rep_design weights) was used for repacking
side-chains (-sc_min_only false). Optimization was performed on each whole protein
without distance restriction (-local_opt_only false). The previously created constraint file
was used during backbone minimization (-min_cst true). Three rounds of optimization with
increasing weight on the repulsive term (-ramp_repulsive true) were applied. The
minimum energies from 20 and 50 iterations were used as the final parameters describing
the stability effects of single- and multiple-point mutations, respectively. Mutations with
ΔΔG averaged over all three analyzed DhaA structures or three chains of LinA < -2.0
kcal.mol⁻¹ were considered stabilizing. The additivity of stabilizing mutations was
evaluated by predicting the stability of variants with all pairs of potentially stabilizing
single-point mutations. Mutation pairs for which the respective double-point mutants showed
a predicted ΔΔG > -3.0 kcal.mol⁻¹ were considered antagonistic. The cumulative mutants
were then prepared by combining mutually additive single-point mutations, starting with
the most stabilizing mutation. If there was more than one stabilizing mutation at the
same position, the most favorable mutation was used.
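The combination rule described above can be read as a greedy procedure, sketched below. The function name, the data layout and the treatment of pairs without a prediction (assumed compatible) are illustrative assumptions. The single-point values are the Rosetta ΔΔG values listed for these mutations in Table S 4, whereas the double-mutant values are hypothetical.

```python
def build_cumulative_mutant(single_ddg, pair_ddg, antagonistic_above=-3.0):
    """Greedily combine mutually additive single-point mutations.

    single_ddg -- {"C128F": ddG, ...} for mutations already judged stabilizing
    pair_ddg   -- {frozenset({"C128F", "E20Q"}): ddG of the double mutant, ...}
    """
    def position(mutation):
        return int(mutation[1:-1])

    chosen = []
    # Most stabilizing (most negative ddG) first.
    for mutation in sorted(single_ddg, key=single_ddg.get):
        # Only the most favorable mutation per position is used.
        if any(position(mutation) == position(c) for c in chosen):
            continue
        # Assumption: a pair without a prediction is treated as compatible.
        if any(pair_ddg.get(frozenset((mutation, c)), float("-inf")) > antagonistic_above
               for c in chosen):
            continue
        chosen.append(mutation)
    return chosen

single_ddg = {"C128F": -8.45, "C128M": -2.96, "V219W": -3.04, "E20Q": -2.13}
pair_ddg = {  # hypothetical double-mutant predictions
    frozenset(("C128F", "V219W")): -11.0,
    frozenset(("C128F", "E20Q")): -2.0,
}
combined = build_cumulative_mutant(single_ddg, pair_ddg)
```

With these inputs C128M is skipped because position 128 is already occupied by the more favorable C128F, and E20Q is skipped because its hypothetical double mutant with C128F does not reach the -3.0 kcal.mol⁻¹ threshold.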
Analyses of interactions. Selected mutations were visually analyzed in PyMOL. All three
crystal structures of DhaA and three chains of LinA were analyzed for the presence of
side-chains involved in intramolecular salt bridges using the ESBRI server [236] and in
additional intramolecular interactions using the Protein Interactions Calculator server
[237]. An interaction had to be present in at least one of the analyzed structures or
chains to be considered important.
4.6.3 Construction of mutants and biochemical characterization
Subcloning. All reagents and primers were purchased from Sigma-Aldrich unless otherwise
specified. Restriction enzymes, T4 DNA ligase and accompanying buffers were purchased
from New England Biolabs. Genes encoding the tested Rhodococcus rhodochrous DhaA
(UniProt: P0A3G2) mutants were cloned into the pET21b (EMD Biosciences) (DhaAwt, 101,
110-112, 115-116) or pAQN vector [238] (DhaA63, 100-103) using the restriction enzyme pairs
NdeI/HindIII or BamHI/HindIII, respectively, followed by ligation with T4 DNA ligase
according to the supplier's protocol. The plasmid pET28-LinAwt encoding the wild-type LinA
from Sphingomonas paucimobilis UT26 (UniProt: P51697) was a gift from Dr. Yuji Nagata [239].
The genes encoding the mutants were cloned into pET28b (EMD Biosciences) using
the restriction enzymes NdeI and EcoRI, followed by ligation with T4 DNA ligase (Promega)
according to the supplier's protocol. Correct integration of the genes was verified by
sequencing (GATC Biotech) and analyzed using Clone Manager Professional (Sci-Ed
Software) and BioEdit (Ibis Biosciences) software. DhaA and LinA variants were expressed
with a C-terminal and N-terminal His6-tag, respectively, to facilitate purification.
Enzyme expression and purification. The His6-tagged DhaA and LinA mutants were
overexpressed in Escherichia coli BL21 (DE3) cells as previously described [238]. Proteins
were purified using Ni-NTA Superflow Cartridges (Qiagen) and a previously described
method [240]. Protein concentration was determined by assays with the Bradford reagent
(Sigma-Aldrich). The purity of the purified proteins was checked by sodium dodecyl sulfate
polyacrylamide gel electrophoresis (SDS-PAGE) followed by Coomassie Brilliant Blue R-250
staining.
Enzyme activity assays with DhaA. Reaction mixtures containing 12 µL of substrate in 12
mL of 100 mM glycine buffer (pH 8.6) were preheated at 37°C for 30 min, 240 µL of purified
enzyme (0.4-1.0 mg.mL⁻¹) was added to initiate the reaction, and the reaction was monitored
by withdrawing samples at periodic intervals (0-30 min). The samples were immediately mixed
with 35% nitric acid to terminate the reaction. The released halide ions were measured
spectrophotometrically at 460 nm after reaction with mercuric thiocyanate and ferric
ammonium sulfate [241]. Dehalogenating activity was quantified as the rate of halide product
formation per unit time. Temperature profiles of DhaAwt and DhaA115 were evaluated by
measuring their activity, as described above, at temperatures ranging from 20°C to 75°C in
three independent replicates. The operational stability was evaluated by measuring
residual activity after incubating 1 mL enzyme samples (1.0 mg.mL⁻¹) at 60°C in a heat block
(Biosan Pst-100 HL).
Residual activity was determined using a Microlab StarLet Manuload Liquid Handling Robot
(Hamilton). For the residual activity measurements, 1 mL of 100 mM glycine buffer (pH 8.6)
with 1 µL of 1,2-dibromoethane was incubated at 37°C for 30 min, then 50 µL of heat-treated
enzyme solution was added to start the reaction (DhaAwt 0.1 mg.mL⁻¹, DhaA115 and
DhaA63 1.0 mg.mL⁻¹). Samples (100 µL) were taken before enzyme addition (0 min) and
after 5, 10 and 15 min of reaction time. Samples were then transferred to wells of an MTP
microplate containing 10 µL of 35% HNO3 for inactivation. After all samples had been
collected, the halide product was detected as described earlier. OD460nm was then measured
using a Sunrise microplate reader (Tecan). Dehalogenation activities were quantified by the
slope of the regression between the product concentration and time.
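The slope-based quantification can be sketched as an ordinary least-squares fit of product concentration against time. The function name and the example readings are invented for illustration, and the text does not state which regression software was used.

```python
def activity_from_time_course(times_min, product_conc):
    """Least-squares slope of product concentration against time."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_c = sum(product_conc) / n
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_min, product_conc))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx

# Sampling times as in the protocol (0, 5, 10 and 15 min); concentrations are invented.
slope = activity_from_time_course([0, 5, 10, 15], [0.0, 0.1, 0.2, 0.3])
```

The slope has the units of the concentration readings per minute and can then be normalized to the amount of enzyme to obtain a specific activity.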
Enzyme activity assay with LinA. The activity of the LinA variants was tested with
γ-hexachlorocyclohexane (γ-HCH) at 30°C and analyzed using GC-MS. Saturating γ-HCH
substrate mixtures in 1 mL of 100 mM glycine buffer (pH 8.6) were preheated at 30°C for 30
min. Then 10 µL of purified enzyme (8.9-28.8 µg.mL⁻¹) was added to initiate the reaction,
which was monitored by withdrawing 1 µL samples at 15 min intervals (0-75 min). The
samples were immediately analyzed by GC-MS. A gas chromatograph equipped with the
PAL robotic tool change system enabled fully automated sample preparation, organic
extraction and analysis. The consumption of the substrate was quantified using a gas
chromatograph (Trace 1300, Thermo Scientific, USA) equipped with a TG-SQC capillary
column (30 m x 0.25 mm x 0.25 µm, Thermo Scientific, USA) and connected to a mass
spectrometer (ISQ™ LT Single Quadrupole, Thermo Scientific, USA). The 1 µL samples were
injected into a split-splitless inlet at 250°C with a split ratio of 1:50. The temperature
program was isothermal at 40°C for 1 min, followed by an increase to 250°C at 20°C.min⁻¹
and a hold for 4 min. The flow of carrier gas (He) was 1 mL.min⁻¹. The MS was operated in
SCAN mode (30 to 320 amu). The temperatures of the ion source and GC-MS transfer line were
200°C and 250°C, respectively. Dehydrochlorination activities were quantified by following
the decrease in concentration of γ-HCH over time.
Steady-state kinetic measurements. Substrate to product conversion by the action of
DhaAwt and DhaA115 was monitored using the isothermal titration microcalorimeter
VP-ITC (MicroCal, Piscataway, USA) at 37°C and 57°C, respectively. The substrate
1-iodohexane was dissolved in 100 mM glycine buffer (pH 8.6) and the solution was
allowed to reach thermal equilibrium in the reaction cell (1.4 mL). The reaction was
initiated by injecting 10 µL of enzyme solution containing either 22 µM DhaAwt or 24 µM
DhaA115 into the reaction cell. Enzymes were dialyzed overnight against the same glycine
buffer. The measured rate of heat change was assumed to be directly proportional to the
velocity of the enzymatic reaction according to Equation 13, where ΔH is the enthalpy of
the reaction, [S] is the substrate concentration, and V is the volume of the cell. ΔH was
determined by titrating the substrate into the reaction cell containing the enzyme. Each
reaction was allowed to proceed to completion. The integrated total heat of a reaction was
divided by the amount of injected substrate. The evaluated rate of substrate depletion
(-d[S]/dt) and corresponding substrate concentrations were then fitted by nonlinear
regression to kinetic models using Origin 6.1 (OriginLab, Massachusetts, USA).
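The conversion from measured heat flow to reaction velocity follows from rearranging the proportionality dQ/dt = -ΔH·V·d[S]/dt. The sketch below shows the arithmetic; the function names, the SI units and the numerical inputs are assumptions for illustration, with only the 1.4 mL cell volume taken from the text.

```python
def apparent_enthalpy(total_heat_j, substrate_mol):
    """Reaction enthalpy (J/mol): integrated heat divided by the amount of injected substrate."""
    return total_heat_j / substrate_mol

def depletion_rate(dq_dt_w, delta_h_j_per_mol, cell_volume_l=1.4e-3):
    """Return -d[S]/dt in mol/(L*s) from dQ/dt = -dH * V * d[S]/dt."""
    return dq_dt_w / (delta_h_j_per_mol * cell_volume_l)

# Invented values for an exothermic reaction.
delta_h = apparent_enthalpy(total_heat_j=-6.0e-4, substrate_mol=1.0e-8)
rate = depletion_rate(dq_dt_w=-8.4e-7, delta_h_j_per_mol=delta_h)
```

Pairs of such rates and the corresponding substrate concentrations are the data points that would then enter the nonlinear regression.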
Circular dichroism (CD) spectroscopy. CD spectra (190 to 260 nm) were obtained from
samples of the purified enzymes (0.20-0.25 mg.mL⁻¹ in 50 mM phosphate buffer, pH 7.5, in
a 0.1 cm quartz cuvette) to confirm their correct folding, using a Chirascan CD Spectrometer
equipped with a Peltier thermostat (Applied Photophysics). Each presented spectrum is the
baseline-corrected average of 5-10 scans. Mean residue ellipticity (θMRE) was calculated
using Equation 14.

$$\frac{dQ}{dt} = -\Delta H \cdot V \cdot \frac{d[S]}{dt}$$
Equation 13

$$\theta_{MRE} = \frac{\theta_{obs} \cdot M_w \cdot 100}{n \cdot c \cdot l}$$
Equation 14

where θobs is the observed ellipticity in degrees, Mw is the protein molecular weight, n is
the number of residues, l is the cell path length (0.1 cm), c is the protein concentration
and the factor 100 originates from conversion of the molecular weight to mg.dmol⁻¹ [242].
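As a worked example of the mean residue ellipticity calculation, θMRE = θobs·Mw·100/(n·c·l), the following sketch assumes the concentration is given in mg.mL⁻¹ and the molecular weight in g.mol⁻¹. The function name and the input values are invented for illustration.

```python
def mean_residue_ellipticity(theta_obs_deg, mol_weight, n_residues, conc_mg_ml, path_cm=0.1):
    """theta_MRE = theta_obs * Mw * 100 / (n * c * l).

    Assumed units: theta_obs in degrees, Mw in g/mol, c in mg/mL, l in cm.
    """
    return theta_obs_deg * mol_weight * 100 / (n_residues * conc_mg_ml * path_cm)

# Invented example: a 300-residue, 30 kDa protein at 0.2 mg/mL in a 0.1 cm cuvette.
theta_mre = mean_residue_ellipticity(-0.01, 30000, 300, 0.2)
```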
Thermal unfolding of the enzyme variants was followed by monitoring the ellipticity at 222
nm over the temperature range of 20°C to 90°C, with a resolution of 0.2°C and a heating
rate of 1°C/min. Recorded denaturation curves of the tested enzymes were fitted to sigmoid
curves (Boltzmann) using OriginPro8 software (OriginLab, Massachusetts, USA). The melting
temperatures (Tm) were evaluated from the collected data as the midpoint (x0) of the
normalized thermal transition.
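For orientation, the Boltzmann sigmoid commonly offered by Origin has the form below, where A1 and A2 are the initial and final plateau values and dx sets the width of the transition. This is the standard textbook form, given here as an assumption, since the exact parametrization used for the fits is not stated.

$$y(T) = A_2 + \frac{A_1 - A_2}{1 + e^{(T - x_0)/dx}}$$

At T = x0 the exponential term equals 1, so y(x0) = (A1 + A2)/2. The fitted parameter x0 is therefore the temperature at which the transition is half complete, which is why it is read as Tm.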
Differential scanning calorimetry (DSC). Melting temperatures of the purified enzymes
were determined by monitoring their heat capacity in solution (1.0 mg.mL⁻¹) in 50 mM
aqueous phosphate buffer (pH 7.5) and in the presence of three cosolvents: 20% acetone,
20% methanol and 50% DMSO (v/v). The measurements were acquired, after degassing, at
temperatures from 20 to 100°C using the VP-capillary differential scanning calorimetry
(DSC) system (MicroCal) and a 1°C.min⁻¹ heating rate. The melting point of each protein
was determined as the temperature at which the heat capacity curve peaked [243].
Acknowledgements
We kindly thank Dr. Yuji Nagata (Graduate School of Life Sciences, Tohoku University,
Japan) for providing the pET28-LinAwt plasmid.
4.7 Supporting Information
Figure S 1. Correlation between ΔTm and ΔΔG values of 46 DhaA mutants. The experimentally characterized
homogeneous set of DhaA mutants [220,238] employed during validation of the Rosetta approach is shown as blue
squares. The red line represents the trend in the experimentally characterized mutants (correlation
coefficient, 0.81).
Figure S 2. Far-UV CD spectra of the tested mutants and determined melting temperatures. A) Variants of
haloalkane dehalogenase DhaA. B) Variants of γ-hexachlorocyclohexane dehydrochlorinase LinA. The melting
temperatures (Tm) were evaluated as midpoints of the normalized thermal transitions.
Panel A legend, Tm [°C]: DhaAwt (50.3 ± 0.4), DhaA100 (51.5 ± 0.5), DhaA101 (57.0 ± 0.5), DhaA102 (49.2 ± 0.2),
DhaA103 (53.6 ± 0.5), DhaA112 (S5.5 ± 0.1), DhaA115 (71.8 ± 0.2).
Panel B legend, Tm [°C]: LinAwt (40.0 ± 0.9), LinA01 (S8.9 ± 0.8), LinA02 (36.3 ± 0.3).
Horizontal axis: λ (nm), 190-260.
Figure S 3. Half-life of DhaA63 determined at 60°C in 50 mM phosphate buffer pH 7.5.
Table S 1. Composition of single-point mutation dataset derived from ProTherm database.
                                                                              Number of mutations                    Number of
PDB ID  Protein                 Organism                    Structural class  Total  Stabilizing  Destabilizing  positions
2LZM    Lysozyme                Bacteriophage T4            α+β               155    25
1BNI    Barnase                 Bacillus amyloliquefaciens  α+β               124    4
1LZ1    Lysozyme                Human                       α+β               85     19
1VQB    Gene V                  Bacteriophage f1            all β             60     6
2CI2    Chymotrypsin inhibitor  Barley                      α+β               56     3
1CSP    Cold shock protein      Bacillus subtilis           all β             40     20
2RN2    Ribonuclease HI         Escherichia coli            α/β               38     21
1BVC    Myoglobin               Sperm whale                 all α             36     5
1RN1    Ribonuclease T1         Aspergillus oryzae          α+β               33     6
4LYZ    Lysozyme                Chicken                     α+β               29     10
Table S 2. Performance of four evaluated prediction tools at different decision thresholds.
Metric / Tool   Decision threshold [kcal.mol⁻¹]ᶜ
                2.5      2.0      1.5      1.0      0.5      0.0      -0.5     -1.0     -1.5     -2.0     -2.5
Precisionᵃ (ratios)
  FoldX         0.23     0.25     0.27     0.32     0.39     0.51     0.63     0.67     0.60     0.67     0.50
  Rosetta       0.26     0.28     0.31     0.35     0.42     0.49     0.59     0.65     0.71     0.76     0.75
  ERIS          0.29     0.31     0.33     0.34     0.38     0.39     0.40     0.39     0.43     0.36     0.36
  CUPSAT        0.21     0.22     0.24     0.27     0.29     0.29     0.27     0.20     0.14     0.09     0.03
Precisionᵃ (absolute values)
  FoldX         115/510  114/464  112/408  109/341  93/237   77/151   39/62    20/30    9/15     4/6      1/2
  Rosetta       114/438  111/394  108/344  99/281   92/218   74/151   62/105   46/71    25/35    16/21    9/12
  ERIS          97/331   89/285   80/240   67/195   57/151   50/127   37/92    27/70    17/40    10/28    8/22
  CUPSAT        119/569  118/529  116/479  110/415  98/343   77/266   55/206   28/141   12/86    5/55     1/29
False positive rateᵇ (ratios)
  FoldX         0.74     0.65     0.55     0.32     0.27     0.14     0.04     0.02     0.01     0.00     0.00
  Rosetta       0.60     0.53     0.44     0.35     0.23     0.14     0.08     0.05     0.02     0.01     0.01
  ERIS          0.44     0.37     0.30     0.34     0.18     0.15     0.10     0.08     0.04     0.03     0.03
  CUPSAT        0.84     0.77     0.68     0.57     0.46     0.35     0.28     0.21     0.14     0.09     0.05
False positive rateᵇ (absolute values)
  FoldX         395/537  350/537  296/537  232/537  144/537  74/537   23/537   10/537   6/537    2/537    1/537
  Rosetta       324/537  283/537  236/537  182/537  126/537  77/537   43/537   25/537   10/537   5/537    3/537
  ERIS          324/537  283/537  236/537  182/537  126/537  77/537   43/537   25/537   10/537   5/537    3/537
  CUPSAT        450/537  411/537  363/537  305/537  245/537  189/537  151/537  113/537  74/537   50/537   28/537
ᵃ - Precision (true positive/(true positive + false positive)) represents the ratio between the truly stabilizing
mutations and mutations predicted as stabilizing by a given tool
ᵇ - False positive rate (false positive/(true negative + false positive)) represents the fraction of destabilizing
mutations incorrectly predicted as stabilizing by a given tool from all truly destabilizing mutations
ᶜ - The threshold possessing the highest precision and at the same time the highest number of true positives for
individual tool is highlighted
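The two metrics defined in footnotes a and b can be reproduced from the absolute counts in the table. The sketch below recomputes the FoldX entries at the -1.0 kcal.mol⁻¹ threshold (20 truly stabilizing mutations among 30 predicted, 10 false positives among 537 destabilizing mutations); the function and variable names are illustrative.

```python
def precision(true_pos, false_pos):
    """Fraction of predicted-stabilizing mutations that are truly stabilizing."""
    return true_pos / (true_pos + false_pos)

def false_positive_rate(false_pos, true_neg):
    """Fraction of truly destabilizing mutations predicted as stabilizing."""
    return false_pos / (false_pos + true_neg)

# FoldX at the -1.0 kcal/mol threshold, counts taken from Table S 2.
foldx_precision = precision(true_pos=20, false_pos=10)
foldx_fpr = false_positive_rate(false_pos=10, true_neg=537 - 10)
```

Rounded to two decimals these values are 0.67 and 0.02, matching the ratios reported for FoldX at this threshold.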
Table S 3. Stabilizing mutations selected for the 10 most mutated proteins from ProTherm.
Number of mutations predicted as stabilizing by individual tool
PDB ID  Rosettaᵃ  FoldXᵇ  Rosetta and FoldXᶜ  FireProtᵈ
1BVC    105       119     26                  23
1LZ1    44        60      5                   4
1VQB    74        62      18                  7
2LZM    99        106     18                  16
4LYZ    150       39      6                   5
1BNI    132       118     19                  6
1CSP    31        51      4                   2
1RN1    97        207     23                  18
2CI2    48        37      10                  9
2RN2    111       120     25                  17
Total   891       919     154                 107
ᵃ - Number of Rosetta predictions with ΔΔG < -2 kcal.mol⁻¹
ᵇ - Number of FoldX predictions with ΔΔG < -1 kcal.mol⁻¹
ᶜ - Number of predictions with Rosetta ΔΔG < -2 kcal.mol⁻¹ and FoldX ΔΔG < -1 kcal.mol⁻¹
ᵈ - Number of predictions identified by FireProt using criteria defined in the Methods
Table S 4. Results of the energy-based analysis of DhaA.
Position Residue Mutation FoldX ΔΔG [kcal.mol⁻¹] Rosetta ΔΔG [kcal.mol⁻¹] Antagonistic effect Interactions Mutant
20 E Q -1.09 -2.13 C128F -
128 C
F
M
-2.21
-3.48
-8.45
-2.96
- DhaA112
148
W -1.09 -2.65 C128F -
148
T L -1.96 -2.00 - DhaA112
172 A V -1.92 -2.21 C176F -
172
A I -2.83 -2.16 - DhaA112
F -2.22 -7.07 - DhaA112
176 C
L
H
M
-2.01
-1.08
-2.51
-5.28
-4.82
-4.24
- -
187 D W -1.37 -2.58 - R190
W -1.36 -4.55 - DhaA112
198 D
F
Y
L
-1.98
-1.85
-1.92
-2.95
-2.75
-2.53
- -
217 N Y -2.38 -2.38 C128F -
219 V W -1.77 -3.04 - DhaA112
262 C
L
M
-1.64
-1.42
-4.93
-2.94
- DhaA112
266 D
Y -2.43 -2.90 C128F -
266 D
F -2.31 -2.41 - -
Table S 5. Results of the frequency ratio analysis of the HLD-II subfamily.
Position  Residue  Frequency  Res_TOPᵃ  Freq_TOPᵇ  Frequency ratio  FoldX ΔΔG [kcal.mol⁻¹]  Interactions  Mutant
20        E        0.07       S         0.44       0.17             0.38                    -             DhaA101
59        H        0.11       G         0.56       0.2              2.36                    -             -
77        L        0.07       I         0.44       0.17             1.90                    -             -
80        F        0.04       R         0.44       0.08             0.37                    -             DhaA101
128       C        0.04       F         0.41       0.09             -2.21                   L237          -
132       I        0.07       V         0.59       0.12             0.92                    -             -
155       A        0.07       P         0.44       0.17             -0.84                   -             DhaA101
159       R        0.07       E         0.78       0.1              -0.62                   E200          -
163       I        0.07       L         0.7        0.11             -0.37                   -             DhaA100
184       V        0.04       E         0.52       0.07             -0.59                   -             DhaA100
197       V        0.04       E         0.52       0.07             -0.20                   -             DhaA100
200       E        0.04       R         0.59       0.06             -0.59                   R159          -
203       W        0.15       L         0.78       0.19             0.67                    F152, N207    -
207       N        0.15       R         0.81       0.18             1.88                    F152, W203    -
218       I        0.07       V         0.7        0.11             0.56                    -             -
267       I        0.11       V         0.63       0.18             0.92                    -             -
278       N        0.07       S         0.44       0.17             1.91                    I267, L281    -
ᵃ - The most conserved residue at a given position of the multiple sequence alignment
ᵇ - Frequency of the most conserved residue at a given position of the multiple sequence alignment
Table S 6. Results of the simple consensus analysis of the HLD-II subfamily.
Position  Residue  Frequency  Res_TOP a  Freq_TOP b  FoldX ΔΔG [kcal.mol⁻¹]  Interactions  Mutant
36        L        0.26       V          0.59        1.26                    -             -
51        I        0.44       V          0.52        0.72                    -             -
59        H        0.11       G          0.56        2.36                    -             -
93        E        0.3        D          0.52        0.69                    R122          -
119       N        0.3        H          0.63        -0.80                   W115, R122    -
132       I        0.07       V          0.59        0.92                    -             -
159       R        0.07       E          0.78        -0.62                   E200          -
161       L        0.22       M          0.59        0.01                    -             DhaA102
162       I        0.3        V          0.56        0.48                    -             DhaA102
163       I        0.07       L          0.7         -0.37                   -             DhaA100
169       I        0.37       V          0.52        0.76                    -             -
184       V        0.04       E          0.52        -0.59                   -             DhaA100
197       V        0.04       E          0.52        -0.20                   -             DhaA100
198       D        0.19       S          0.56        -0.45                   -             DhaA102
200       E        0.04       R          0.59        -0.59                   R159          -
202       L        0.19       T          0.52        3.07                    -             -
203       W        0.15       L          0.78        0.67                    F152, N207    -
205       F        0.26       W          0.7         2.30                    -             -
207       N        0.15       R          0.81        1.88                    F152, W203    -
218       I        0.07       V          0.7         0.56                    -             -
241       G        0.15       A          0.67        0.80                    -             -
267       I        0.11       V          0.63        0.92                    -             -
273       Y        0.15       F          0.67        0.30                    N41           -
285       E        0.11       A          0.52        -0.23                   K263          -
a - The most conserved residue at a given position of the multiple sequence alignment
b - Frequency of the most conserved residue at a given position of the multiple sequence alignment
Table S 7. Results of the frequency ratio analysis of the HLD family.
Position  Residue  Frequency  Res_TOP a  Freq_TOP b  Frequency ratio  FoldX ΔΔG [kcal.mol⁻¹]  Interactions  Mutant
27        V        0.05       E          0.49        0.09             1.42                    -
188       H        0.07       A          0.51        0.14             -0.04                                 DhaA103
191       E        0.1        A          0.55        0.19             0.10                                  DhaA103
271       L        0.09       G          0.57        0.16             2.92
a - The most conserved residue at a given position of the multiple sequence alignment
b - Frequency of the most conserved residue at a given position of the multiple sequence alignment
Table S 8. Results of the simple consensus analysis of the HLD family.
Position  Residue  Frequency  Res_TOP a  Freq_TOP b  FoldX ΔΔG [kcal.mol⁻¹]  Interactions                    Mutant
44        S        0.3        W          0.69        12.12                   Y46                             -
55        V        0.16       L          0.53        0.31                    -                               DhaA103
109       S        0.21       G          0.71        2.13                    D106, I132                      -
111       L        0.37       I          0.55        1.26                    -                               -
127       A        0.25       V          0.54        -2.32                   -                               DhaA103
130       E        0.3        N          0.67        0.21                    V245, L246, I247, L271, H272    -
188       H        0.07       A          0.51        -0.04                   -                               DhaA103
191       E        0.1        A          0.55        0.10                    -                               DhaA103
209       L        0.13       I          0.54        0.64                    -                               -
244       G        0.31       D          0.68        14.59                   -                               -
271       L        0.09       G          0.57        2.92                    -                               -
273       Y        0.28       F          0.51        0.30                    N41                             -
a - The most conserved residue at a given position of the multiple sequence alignment
b - Frequency of the most conserved residue at a given position of the multiple sequence alignment
Table S 9. Results of the energy-based analysis of LinA.
Position  Residue  Mutation  FoldX ΔΔG [kcal.mol⁻¹]  Rosetta ΔΔG [kcal.mol⁻¹]  Interactions  Mutant
3         D        I         -1.234                  -3.017
3         D        L         -1.664                  -2.867
19        D        M         -1.460                  -2.388                   R79
127       S        Y         -2.165                  -1.952                                 LinA01
145       A        H         -1.254                  -4.888                                 LinA01
133       T        I         -2.138                  -3.423                                 LinA01
133       T        W         -2.494                  -3.308
133       T        L         -1.323                  -1.986
Table S 10. Results of the consensus analysis of LinA.
Position  Residue  Frequency  Res_TOP a  Freq_TOP b  FoldX ΔΔG [kcal.mol⁻¹]  UniProt database    Mutant
20        K        0.62       Y          0.15        -1.437                  Halide-stabilizing
23        A        0.69       G          0.23        1.778
32        L        0.62       F          0.38        4.379
50        Y        0.54       F          0.38        -0.510                                      LinA02
56        A        0.54       I          0.38        5.508
59        L        0.62       A          0.38        3.642
68        F        0.62       W          0.31        0.000                                       LinA02
80        L        0.54       V          0.38        1.368
88        V        0.62       A          0.38        2.694
96        L        0.77       C          0.15        2.777
109       I        0.62       V          0.23        0.665
113       F        0.69       Y          0.23        0.138                   Activity decrease
126       F        0.54       I          0.23        1.624
131       A        0.62       V          0.15        -1.492                                      LinA02
144       F        0.54       L          0.23        1.316
a - The most conserved residue at a given position of the multiple sequence alignment
b - Frequency of the most conserved residue at a given position of the multiple sequence alignment
Table S 11. Predicted effects of DhaA115 and LinA01 mutations on their stability.
Origin                   Enzyme  Mutation  FoldX ΔΔG [kcal.mol⁻¹]  Rosetta ΔΔG [kcal.mol⁻¹]  Location  Secondary structure  Rank a  Structural basis of stabilization
Energy-based approach    DhaA    C128F     -2.2                    -8.5                     Buried    Sheet                250     Improved packing
Energy-based approach    DhaA    T148L     -2.0                    -2.0                     Tunnel    Helix                41      Enhanced hydrophobic interactions
Energy-based approach    DhaA    A172I     -2.8                    -2.2                     Tunnel    Helix                151     Enhanced hydrophobic interactions
Energy-based approach    DhaA    C176F     -2.2                    -7.1                     Tunnel    Loop                 169     Improved packing
Energy-based approach    DhaA    D198W     -1.4                    -4.5                     Surface   Helix                112     Improved packing
Energy-based approach    DhaA    V219W     -1.8                    -3.0                     Buried    Helix                109     Improved packing
Energy-based approach    DhaA    C262L     -1.6                    -4.9                     Buried    Sheet                66      Enhanced hydrophobic interactions
Energy-based approach    DhaA    D266F     -2.3                    -2.4                     Surface   Sheet                115     Improved packing
Energy-based approach    LinA    D3I       -1.2                    -3.0                     Surface   Helix                151     Improved packing
Energy-based approach    LinA    S127Y     -2.2                    -2.0                     Buried    Sheet                58      Improved packing
Energy-based approach    LinA    T133I     -2.1                    -4.9                     Buried    Sheet                38      Enhanced hydrophobic interactions
Energy-based approach    LinA    A145H     -1.3                    -3.4                     Surface   Loop                 91      Improved packing
Evolution-based approach DhaA    E20S      0.4                     0.5                      Surface   Sheet                96      Neutral/electrostatics
Evolution-based approach DhaA    F80R      0.4                     1.6                      Surface   Helix                98      Neutral/electrostatics
Evolution-based approach DhaA    A155P     -0.8                    -0.7                     Surface   Loop                 4       Increased rigidity
a - The listed variants are ordered from the most flexible to the most rigid according to average residue B-factors
Table S 12. Examples of methods providing enzymes with outstanding stabilization.
Method           Principle                                                          Protocol        Tested / successful mutants  Enzyme                                Stability improvement  Relative activity a  Number of mutations  Reference
Directed evolution
GSSM             Saturating every position                                          SSM             121,000 / 10                 Haloalkane dehalogenase               ΔTm = +18°C            150%                 8                    221
GSSM             Saturating every position                                          SSM             74,000 / 10                  Xylanase                              ΔTm = +35°C            100%                 9                    244
RM               Introducing point mutations randomly                               epPCR           19,000 / 16                  Phosphite dehydrogenase               ΔT₅₀¹⁰ = +20°C         160%                 12                   245
Computational prediction of hotspots
B-FIT            Targeting the most flexible residues                               ISM             19,000 / 61                  Epoxide hydrolase                     ΔTm = +21°C            500%                 10                   207
HotSpot Wizard   Targeting tunnel residues                                          SSM             5,000 / 5                    Haloalkane dehalogenase               ΔTm = +19°C            40% b                4                    220
PISA             Targeting interface residue                                        ISM             4,000 / 17                   D-tagatose 3-epimerase                ΔT₅₀²⁰ = +23°C         64%                  8                    71
Computational prediction of single-point mutants
FRESCO           Disulfide bridge design, free energy calculations                  SDM             67 / 24                      Epoxide hydrolase                     ΔTm = +36°C            250%                 10                   211
ROSETTAVIP       Improving packing in protein interior by free energy calculation   SDM             6 / 4                        Methionine aminopeptidase             ΔTm = +18°C            70%                  5                    85
SCADS            Environmental energy optimization                                  SDM             1 / 1                        Tobacco 5-epi-aristolochene synthase  ΔTm = +45°C            2%                                        224
Computational prediction of multiple-point mutants
FIREPROT         Free energy calculations and consensus design                      Gene synthesis  6 / 4                        Haloalkane dehalogenase               ΔTm = +25°C            128%                 11                   This study
Tm - melting temperature; T₅₀ˣ - temperature at which 50% of activity is lost after x minutes of incubation; RM - random mutagenesis; epPCR - error-prone
polymerase chain reaction; SSM - site saturation mutagenesis; ISM - iterative saturation mutagenesis; SDM - site-directed mutagenesis
a - Activity of a mutant compared to the wild-type at temperatures optimal for each protein
b - Activities measured at 37°C in 40% DMSO
c - Predicted single-point mutations were recombined in a multipoint mut
5 Computer-Assisted Engineering of Hyperstable Fibroblast Growth Factor 2
Pavel Dvorak¹,²,#, David Bednář¹,²,#, Pavel Vanacek¹,²,#, Lukas Balek²,#, Livia Eiselleova³,#,
Veronika Štěpánková¹,⁴,⁵, Eva Šebestová¹,², Michaela Kunová Bosákova³, Žaneta Konečná³,
Stanislav Mazurenko¹,², Antonín Kuňka¹,², Daniel Horák⁶, Radka Chaloupková¹,²,⁴, Jan
Brezovsky¹,²,⁴, Pavel Krejci²,³,⁴, Zbyněk Prokop¹,²,⁴,⁵, Petr Dvorak³,⁴,*, Jiri Damborsky¹,²,⁴,⁵,*
¹ Loschmidt Laboratories, Centre for Toxic Compounds in the Environment RECETOX and Department of
Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
² Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
³ Department of Biology, Faculty of Medicine, Masaryk University, 625 00 Brno, Czech Republic
⁴ International Clinical Research Center, St. Anne's University Hospital, Pekařská 53, 656 91 Brno, Czech
Republic
⁵ Enantis Ltd., Biotechnology Incubator INBIT, Kamenice 34, 625 00 Brno, Czech Republic
⁶ Institute of Macromolecular Chemistry, Academy of Sciences of the Czech Republic, Heyrovskeho 2, 162 06
Prague 6, Czech Republic
# authors contributed equally
Manuscript under review in Scientific Reports
5.1 Abstract
Fibroblast growth factors (FGFs) play numerous regulatory roles in complex organisms,
and their corresponding therapeutic potential is of growing interest to academic and
industrial researchers alike. However, applications of these proteins are limited by their
low stability in vivo and in vitro¹⁹⁰. Here we tackle this problem using a generalizable
computer-assisted protein engineering strategy to create a modified FGF2 with nine
mutations displaying unprecedented stability and uncompromised biological function.
5.2 Main paper
The structurally related and highly conserved polypeptides from the family of FGFs are
involved in a number of physiological processes in diverse animal species including humans.
There has been a considerable effort in investigating members of the FGF family for
applications in pharmacy and bioengineering¹⁹⁰. Specifically, human FGF2 serves as a
pleiotropic regulator of proliferation, differentiation, migration, and survival in a variety of
cell types and has been studied as a promising agent in treatment of cardiovascular
diseases²⁴⁶, cancer²⁴⁷ and mood disorders²⁴⁸. It has also been shown to have efficacy in
ulcer, wound²⁴⁹,²⁵⁰ and epithelium healing²⁵¹ and is being routinely used as an essential
component of cultivation media for human embryonic stem cells²⁵².
The long-term maintenance of growth factors in the tissue or media is desirable for protein
therapies and stem cell culturing, but is hindered by low thermal stability of the molecules
and their limited half-life¹⁹⁰,²⁵³,²⁵⁴. Stability of FGFs was shown to be enhanced by
co-administration with heparin²⁵⁵, conjugation to heparin-mimicking polymers²⁵⁶,
encapsulation in microspheres¹⁹⁶, or fusion with proteoglycan²⁵⁷. However, these
strategies possess various limitations: they might significantly affect protein activity and
are of prime concern from both safety and economic standpoints. Development of soluble,
heparin-independent FGF analogues with improved stability is clearly needed for broader
use of these interesting molecules¹⁹⁰.
Protein engineering is a powerful approach for protein stabilization²²⁰,²⁵⁸. Stabilization by
engineering has been previously applied to highly unstable FGF1, providing up to 40-fold
improved mitogenic activity half-life by introducing stabilizing mutations at the N- and
C-terminus β-strand interactions of the β-barrel architecture²⁵⁹,²⁶⁰. Triple²⁶¹ and quintuple²⁶²
FGF2 mutants with improved stability and up to 10-fold prolonged activity in cell culture
were recently prepared by aligning protein sequences of wild-type FGF2 with previously
reported stabilized FGF1 mutants or by employing and combining individual stabilizing
mutations published elsewhere. Nevertheless, the true potential of state-of-the-art protein
engineering strategies has not yet been exploited in the design of stable FGFs. Here, we
describe engineering of a unique nine-point mutant of the low molecular weight isoform of FGF2
with melting temperature (Tm) increased by 19°C and in vitro functional half-life at 37°C
improved from 10 hours to more than 20 days. This was achieved by following a
computer-assisted engineering strategy combining energy-based and evolution-based analyses²⁶³
with focused directed evolution (Figure 16). We demonstrate that the developed molecule
holds great promise for both in vitro and in vivo applications.
Using our platform, we proposed 12 single-point mutations (R31L, R31W, V52T, H59F, C78Y,
N80G, L92Y, C96Y, S109E, R118W, T121K, and V125L) and 7 positions for randomization
(E54, C78, R90, S94, C96, T121, and S152) to stabilize FGF2 by computationally searching
for mutations that would minimize the Gibbs free energy of the native state combined with
a back-to-consensus analysis (Figure 16 step I, Figure S 4, Table S 5). Three out of twelve
most stabilizing substitutions were identified by the phylogenetic approach²¹⁹ and nine
mutations were predicted from energetics using FoldX⁹⁴ and Rosetta ddg_monomer⁸⁸
(Table S 13, Table S 14). Seven positions with the highest number of potentially stabilizing
mutations, as defined by the difference in their free energy (ΔΔG) < -1 kcal.mol⁻¹, were
considered for randomization in order to look for further beneficial substitutions
overlooked by the rational approach (Table S 5). Functionally relevant positions, namely
those residues within heparin or receptor binding sites, were excluded from the designs to
avoid mutations compromising activity.
Figure 16. Integrated strategy combining computational analyses with focused directed evolution for
engineering hyperstable FGF2. The workflow starts with the template wild-type molecule FGF2-G0. Initially,
12 computationally designed point mutants (FGF2-G1A) were constructed and tested leading to 7 stabilizing
substitutions at 6 different positions with ΔTm between 0.9 and 3.7°C. Six selected substitutions were
recombined giving rise to the second generation mutant FGF2-G2 with ΔTm of 15°C. Subsequently, 420 clones
from 7 focused site-saturation mutagenesis libraries were screened for enhanced thermostability and
retained biological activity (FGF2-G1B), revealing 14 beneficial substitutions at 5 randomized positions with
ΔTm between 0.3 and 2.9°C. Guided by computational predictions, 5 substitutions from rational design were
combined with 4 mutations from semi-rational design providing the third generation protein FGF2-G3 with
ΔTm of 19°C.
For the purpose of in vitro mutagenesis (Figure 16 steps II and IV), the gene encoding wild-type
FGF2 was subcloned into vector pET28b with a cleavable N-terminal His tag giving rise
to the recombinant variant designated FGF2-G0 (Table S 5). The numbering of mutations
used herein corresponds to the original sequence of wild-type FGF2 (c). The genes of 12
rationally designed single-point mutants representing the first generation of engineered
FGF2 (FGF2-G1A) were synthesized. All twelve mutants were produced in soluble form in
Escherichia coli BL21(DE3) in quantities similar to that of FGF2-G0 (± 5% of the total soluble
protein) and purified to homogeneity using affinity chromatography.
Biophysical characterization (Figure 16 step II) verified proper folding of all 12 mutants and
improved Tm for 7 out of 12 constructed variants (Table S 15). Six beneficial mutations
(R31L, V52T, H59F, L92Y, C96Y, and S109E) were recombined in the cumulative mutant
FGF2-G2 (Figure 16 step III), which was obtained at a yield of 20 mg.L⁻¹. The experimental
Tm of properly folded FGF2-G2 (68.0 ± 0.2°C) was about 14.5°C higher than the Tm of FGF2-G0
(53.5 ± 0.5°C). This value corresponds well with the theoretical sum of contributions of
individual substitutions, suggesting a strictly additive stabilizing effect of mutations in
FGF2-G2 (Table S 16).
In the next step, 7 computationally pre-selected positions in FGF2-G0 were randomized
(Figure 16 step IV) using fixed oligo technology, which reduced the screening effort needed
to only 60 clones per library to obtain full coverage. Crude extracts of individual E. coli
clones were prepared in microtiter plates and used to test simultaneously for biological
activity and stability of FGF2 mutants in a growth arrest assay with rat chondrosarcoma cells
(Figure S 6). Altogether, 33 new FGF2-G1B mutants were identified during the screening
and successfully produced in E. coli at levels similar to that of FGF2-G0. Thermal shift
assays with purified proteins revealed 14 stabilizing amino acid substitutions in 5 out of 7
randomized positions (Table S 17). Mutations showing an improved thermostability of at
least 1°C (E54D, S94I, C96N, and T121P) were computationally merged with existing
mutations in FGF2-G2 (R31L, V52T, H59F, L92Y, C96Y, and S109E, while replacing C96Y by
C96N) leading to the third generation variant FGF2-G3 (Figure 16 steps V and VI and Figure
17a).
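The screening effort quoted above can be related to library size with a simple sampling estimate. The sketch below assumes, purely for illustration, that each fixed-oligo library contains 20 equally likely variants (one codon per amino acid) and that clones are picked independently; the function names are hypothetical, and the text does not state which coverage criterion was actually used.

```python
from math import ceil, log


def expected_coverage(n_variants: int, n_clones: int) -> float:
    """Expected fraction of distinct variants sampled at least once.

    Assumes every variant is equally likely and clones are independent draws.
    """
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_clones


def clones_for_coverage(n_variants: int, target: float) -> int:
    """Smallest number of clones whose expected coverage reaches `target`."""
    return ceil(log(1.0 - target) / log(1.0 - 1.0 / n_variants))


if __name__ == "__main__":
    n_variants = 20        # assumption: one codon per proteinogenic amino acid
    clones_per_library = 60
    libraries = 7

    print(f"expected coverage: {expected_coverage(n_variants, clones_per_library):.3f}")
    print(f"clones for 95% expected coverage: {clones_for_coverage(n_variants, 0.95)}")
    print(f"total clones screened: {libraries * clones_per_library}")
```

Under these assumptions, 60 clones per library correspond to an expected coverage of roughly 95%, and 7 libraries of 60 clones account for the 420 clones screened.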
Half-life of secondary structure of FGF2 variants (in hours).
Temp. (°C)  G0     G2    G3
37          6.3    >24   >24
50          2.0    >24   >24
65          <0.1   2.6   >24
70          <0.1   0.3   1.6
Thermal unfolding
Figure 17. Biophysical and biological characteristics of FGF2 variants with boosted stability. (a) Structural
model of the most stable FGF2-G3 variant with visualized amino acid substitutions originating from rational
(yellow spheres) and semi-rational (red spheres) steps of the engineering strategy. (b) Circular dichroism (CD)
spectra of selected FGF2 variants exhibiting a broad positive peak centered near 227 nm and a minimum at
around 204 nm, characteristic for β-rich proteins of β-II type. (c) Half-life of the secondary structure of
selected FGF2 variants determined by CD spectroscopy. (d) The schematic unfolding pathway of variants
described by a three-step model with irreversible unfolding of native state (N) through one intermediate (I)
to denatured protein (D) followed by formation of aggregates (A); Gibbs free energies of the states depicted
in grey were not quantified because of irreversibility. (e) Determination of the in vitro functional half-life of
selected FGF2 variants by activation of ERK pathway. Both FGF2-G2 and FGF2-G3 retained most of their
original activity for 20 days of experiment, while FGF2-G0 lost half of its activity within initial 10 hours as
determined by densitometric analysis of the corresponding Western blots (Figure S 11). (f) FGF2-G2 and
FGF2-G3 support proliferation of human embryonic stem cells better than the wild type. Columns show means,
error bars represent standard error of the mean from three independent experiments. Student's t-test,
**p<0.01, *p<0.05. Abbreviations: MRE, mean residual ellipticity; Ctrl, control with no FGF2; h, hours; d, days.
Purified FGF2-G3, obtained with the yield of 11 mg.L⁻¹, was initially characterized in vitro
using a wide spectrum of biophysical methods (Figure 16 step VII). The properly folded
mutant (Figure 17b) exhibited a Tm of 72.2 ± 0.1°C and ΔTm of 18.7°C compared with FGF2-G0,
showing again the additive stabilizing effect of all new mutations, precisely predicted
by the free energy calculation (theoretical ΔTm of 19.2°C). In terms of structural integrity at
elevated temperatures, FGF2-G3 clearly outperformed both FGF2-G0 and FGF2-G2,
showing the half-life of its secondary structure of >24 h at 50°C and 1.6 h at 70°C (Figure
17c and Figure S 7). Fitting the data to a two-step unfolding mechanism suggested that
the gained stability was due to an increase in the Gibbs activation energy of the first unfolding
step, which is in excellent agreement with our overall computational strategy of lowering
the energy of the native state (Figure 22d, Supplementary Results, Table S 18 and Figure S
8, Figure S 9, Figure S 10).
Biological assays confirmed that the enhanced stability of engineered growth factors translated
into their prolonged activity in vitro and in vivo (Figure 16 step VII). Specifically, while the
activity of FGF2-G0 in an ERK1/2 assay with human embryonic stem cells dropped to 50 %
within the initial 10 ± 2 hours of pre-incubation in conditioned medium at 37°C (Figure 17e and
Figure S 11), which is in good agreement with data obtained previously by different
techniques¹⁹⁰,²⁶⁴, G2 and G3 mutants retained most of their activity for the full length of
the study period (20 days) at the physiological temperature. In the second assay, the
human embryonic stem cells were propagated in medium conditioned with each of the
tested FGF2 variants with no additional supplementation, and the cell numbers and
morphology were recorded for five consecutive passages (Figure 17f and Figure S 12).
While conditioned medium prepared with FGF2-G0 caused significant growth retardation,
the cells incubated in the medium with either G2 or G3 mutant gave rise to monolayers,
suggesting that repeated supplementation of the conditioned medium is not required with
these proteins. Immunostaining for Oct-4 and Nanog after five passages on Matrigel proved
that all tested FGF2 variants support expression of pluripotency markers (Figure S 13).
The biological activity of the stabilized variant was confirmed in vivo by injecting either
FGF2-G0 or FGF2-G3, sorbed into a non-biodegradable hydrogel, into the shaved dorsal
skin of telogenic C57BL/6 mice (Figure S 14 and Figure S 15). It is known that growth factors
from the FGF family have a positive effect on hair growth and hair follicle stimulation²⁶⁵. We
evaluated the degree of black pigmentation and hair growth photometrically by observing
the skin color for 20 days. Within 2 weeks, both FGF2 variants induced black coloration in the shaved
skin, with remarkably stronger hair growth induction in the group injected with thermostable
FGF2 (Figure 18a, Figure S 15). The control group with empty hydrogel showed significant
retardation in the transition from telogen to anagen stage as seen from skin coloration. The
length of hair plucked in the proximity of the injection site confirmed that thermostable FGF2
stimulated hair growth more significantly than FGF2-G0 and the control group
with no FGF2 applied (Figure 18b). No toxic effects of recombinant proteins were observed.
Figure 18. Effect of the selected FGF2 variants on hair growth promotion. (a) 7-week old C57BL/6 mice were
shaved and injected with empty hydrogel or hydrogel sorbed with FGF2-G0 or FGF2-G3 and the hair growth
was recorded during 20 consecutive days. (b) Hair length of C57BL/6 mice measured at day 20 post
application of FGF2 variants. Hairs (n=20) were plucked in the proximity of the injection site. Columns show
means, numbers within indicate the quantity of hairs counted per group, error bars show standard error of
the mean. Student's t-test, ***p<0.001 versus control group.
In summary, we constructed a hyperstable nine-point mutant FGF2-G3 using a hybrid
computational engineering strategy²⁶³ and focused directed evolution by constructing only
14 mutants and testing only 420 clones. FGF2-G3 shows a 19°C increase in melting
temperature and greater than 48-fold improved half-life at 37°C, representing the most
thermostable FGF2 with fully preserved biological activity known to date. Our hyperstable
FGF2 supports the undifferentiated growth of human embryonic stem cells, induces the
appropriate downstream signaling cascade, and stimulates hair growth in a mouse model,
demonstrating that its biological activity towards highly sensitive cells remained
unmodified. We anticipate this construct will be directly applicable to stem cell culturing¹⁹⁶,
and will find broad use in clinical medicine¹⁹⁰,²⁵⁶, cosmetics²⁶⁵, and dietary supplements.
The successful demonstration of rapid protein stabilization highlights the power of our
rational protein engineering strategy and should encourage wider use of the described
workflow for stabilizing growth factors and other protein therapeutics that are currently
being tested for use in cancer treatment, regenerative medicine, or a number of
metabolism-associated disorders¹⁹⁰.
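The fold improvement quoted in the summary appears to follow directly from the two functional half-lives reported in this chapter (about 10 hours for FGF2-G0 and more than 20 days for the stabilized variants). As a short worked check, with the symbol t_{1/2} introduced here only for illustration:

```latex
\frac{t_{1/2}(\text{FGF2-G3})}{t_{1/2}(\text{FGF2-G0})}
  > \frac{20\ \text{d} \times 24\ \text{h}\,\text{d}^{-1}}{10\ \text{h}}
  = \frac{480\ \text{h}}{10\ \text{h}}
  = 48
```

Because the activity of FGF2-G3 had not yet dropped to half by the end of the 20-day experiment, the ratio is a lower bound rather than an exact value.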
Acknowledgements
The work was supported by the Grant Agency of the Czech Republic (GA16-06096S and
GA15-23033S), Ministry of Education of the Czech Republic (LO1214, LQ1605, LM2015051,
LM2015047 and LM2015055), and Ministry of Health of the Czech Republic (15-33232A).
MetaCentrum and CERIT-SC are acknowledged for providing access to computing facilities
(LM2015042, LM2015085). The funders had no role in study design, data collection and
analysis, decision to publish, or preparation of the manuscript.
5.3 Methods
5.3.1 Prediction of stabilizing effect of single-point mutations by evolution-based approach
Multiple sequence alignment and evolutionary conservation analysis. The FGF2 isoform 3
protein sequence (UniProt identifier: P09038-2) was used as a query for a PSI-BLAST²²⁶ search
against the nr database of NCBI. PSI-BLAST was performed with the E-value threshold of 10⁻¹
for the initial BLAST search and the threshold of 10⁻⁵ for inclusion of the sequence in the
position specific matrix. Sequences collected after 3 iterations of PSI-BLAST were clustered
by CD-HIT²²⁸ at the 90% identity threshold. The resulting dataset of 554 sequences was
clustered with CLANS²²⁹ using default parameters and varying P-value thresholds.
Sequences clustered together with FGF2 at the P-value of 10⁻³⁰ were extracted and aligned
with the MUSCLE program²³⁰. The alignment was refined manually in BioEdit²⁶⁶. All
incomplete or diverged sequences were removed. The final alignment comprising 238
sequences was used to estimate the level of conservation of individual sites within the
FGF2-related proteins. Relative evolutionary rates for individual positions were calculated by the
Rate4Site v2.01 program²¹⁶ using the empirical Bayesian method²⁶⁷ and the WAG model of
evolution²³². The evolutionary rates were then converted to the ConSurf conservation
scale²¹⁷.
Selection of individual mutations. The multiple sequence alignment comprising 238 FGF2
protein sequences was used as an input for back-to-consensus analysis using the simple
consensus approach. The analysis was performed using the consensus cut-off of 0.5,
meaning that a given residue must be present at a given position in at least 50% of all
analysed sequences to be assigned as the consensus residue. Stability effects of all possible
single-point mutations in FGF2 protein were estimated by free energy calculations (see
following section for the calculation details). Only mutations with average Gibbs free
energy (ΔΔG) < 1 kcal.mol⁻¹ predicted by both FoldX⁹⁴ and Rosetta⁸⁸ were considered as
hot-spots for FGF2 stabilization. Functionally important sites of FGF2 were excluded because
mutations there are potentially deleterious for biological function. Results of the back-to-consensus
analysis are summarized in Supplementary Table S1. The numbering corresponds to the
sequence of wild-type human FGF2 (Supplementary Fig. 2c). Ten mutations were excluded
based on the high value of predicted ΔΔG or the high uncertainty of the prediction, and
three mutations were discarded from the design due to their location at positions
functionally important for heparin binding. Four single-point mutations V52T, N80G, L92Y
and S109E passed all criteria and were selected for experimental construction and
characterization.
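The simple consensus rule described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the toy alignment, the function names, and the convention that the query occupies the same columns as the alignment are all assumptions, and frequencies are computed over all sequences in the alignment, including those with a gap in the column.

```python
from collections import Counter


def consensus_residues(alignment, cutoff=0.5, gap="-"):
    """Map column index -> (residue, frequency) for columns whose most
    frequent residue is present in at least `cutoff` of all sequences."""
    n_seqs = len(alignment)
    result = {}
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        counts.pop(gap, None)              # gaps can never be the consensus
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        freq = count / n_seqs              # denominator: all analysed sequences
        if freq >= cutoff:
            result[col] = (residue, freq)
    return result


def back_to_consensus(query, alignment, cutoff=0.5):
    """List (position, wild type, consensus, frequency) where the aligned
    query differs from the consensus residue; positions are 1-based columns."""
    mutations = []
    for col, (res, freq) in sorted(consensus_residues(alignment, cutoff).items()):
        wild_type = query[col]
        if wild_type != "-" and wild_type != res:
            mutations.append((col + 1, wild_type, res, round(freq, 2)))
    return mutations


if __name__ == "__main__":
    # Hypothetical four-sequence alignment; the first sequence is the query.
    toy = ["MKVL", "MRTL", "MRTI", "MRTL"]
    print(back_to_consensus(toy[0], toy))
```

In the workflow described in the text, candidates produced this way would then be filtered by the predicted ΔΔG and by functional annotation before selection.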
5.3.2 Prediction of stabilizing effect of single-point mutations by energy-based approach, selection of positions for randomization
Available structures of FGF2 with resolution better than 2.20 Å (PDB-ID codes: 1BFG, 4FGF,
2FGF, 1BAS, 1BFB, 1BFC, 1BFF, 1EV2, 1FGA) were downloaded from the RCSB Protein Data
Bank²³⁴. The structures were visualized in the PyMOL molecular graphics system v 1.7.7.4
(Schrödinger LLC, USA) and prepared for analyses by removing ligands and water
molecules. Chain A was chosen in the case of multiple-chain structures. Missing atoms in
side chains were added by the <RepairPDB> module of FoldX⁹⁴.
Prediction of stability effects by FoldX. Stability effects of all possible single-point
mutations were estimated using the FoldX <BuildModel> module. Calculations were
performed 5 times for each mutant following the recommended protocol (pH 7,
temperature 298 K, ionic strength 0.050 M, VdWDesign 2). For each mutation, the total ΔΔG
value was calculated by averaging all ΔΔG values obtained for the respective mutation in all
analyzed FGF2-G0 crystal structures.
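The averaging step described above amounts to pooling every ΔΔG value obtained for a mutation, across repeated runs and across crystal structures, and taking the mean. A minimal sketch follows; the data layout, the function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
from statistics import mean


def total_ddg(ddg_by_structure):
    """Average all ddG values (kcal/mol) obtained for one mutation.

    `ddg_by_structure` maps a structure identifier to the list of ddG values
    from the repeated runs on that structure.
    """
    values = [v for runs in ddg_by_structure.values() for v in runs]
    if not values:
        raise ValueError("no ddG values supplied")
    return mean(values)


if __name__ == "__main__":
    # Hypothetical values for a single mutation in two structures.
    example = {"structure_1": [-1.0, -1.2], "structure_2": [-0.8, -1.0]}
    print(f"total ddG = {total_ddg(example):.2f} kcal/mol")
```

Pooling all values weights each run equally; averaging per structure first would give the same result only when every structure has the same number of runs, as is the case for 5 runs per structure.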
Prediction of stability effects by Rosetta. Repaired structures were minimized by the
minimize_with_cst module of Rosetta⁸⁸ with both backbone and side-chain optimization
enabled (-sc_min_only false), distance for full atom pair potential set to 9 Å (-fa_max_dis
9.0), standard weights for the score function and a constraint weight of 1
(-constraint_weight 1.0). Output from the minimization was used to constrain Cα atoms
with a harmonic function within 0.5 Å distance from the initial position in the crystal
structure. Protocol 16 incorporating backbone flexibility within the ddg_monomer module
of Rosetta was applied according to Kellogg and co-workers⁸⁸. The soft-repulsive design
energy function (soft_rep_design weights) was used for repacking side-chains
(-sc_min_only false). Optimization was performed on each whole protein without distance
restriction (-local_opt_only false). The previously created constraint file was used during
backbone minimization (-min_cst true). Three rounds of optimization with increasing
weight on the repulsive term (-ramp_repulsive true) were applied. The minimum energies
from 20 iterations were used as the final parameters describing the stability effects of
single-point mutations.
Selection of individual mutations. Mutations predicted as stabilizing by either FoldX or
Rosetta (ΔΔG < -1.0 kcal.mol⁻¹) and not significantly constrained during evolution
(ConSurf conservation score < 8) were selected for further analysis. In this way, the
potentially stabilizing mutations with only a limited influence on functional regions, e.g.,
heparin binding residues, were identified. Residues forming the FGF2/FGFR1 interface
(PDB-ID code 1CVS) and FGF2/FGFR2 interface (PDB-ID code 1EV2) were identified using
the PISA server²⁶⁸ and were discarded from the selection. Nine single-point substitutions
were selected for experimental construction and characterization: R31W, R31L, H59F,
C78Y, L92Y, C96Y, R118W, T121K and V125L (Supplementary Table 2). Interestingly, the
L92Y mutation was also identified by the evolution-based approach. The
numbering of these mutants corresponds to the sequence of wild-type human FGF2
(Supplementary Fig. 2c).
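The selection criteria in this paragraph can be read as a three-part filter: stabilizing according to at least one tool, not strongly conserved, and not located at a receptor-binding interface. The sketch below illustrates that logic only; the `Prediction` record, the function name and all numerical values are hypothetical and are not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    position: int
    wild_type: str
    mutant: str
    foldx_ddg: float      # kcal/mol
    rosetta_ddg: float    # kcal/mol
    consurf_score: int    # ConSurf grade, 1 (variable) to 9 (conserved)


def select_stabilizing(predictions, excluded_positions,
                       ddg_cutoff=-1.0, max_conservation=8):
    """Return mutation labels passing the energy, conservation and interface filters."""
    selected = []
    for p in predictions:
        stabilizing = p.foldx_ddg < ddg_cutoff or p.rosetta_ddg < ddg_cutoff
        variable = p.consurf_score < max_conservation
        if stabilizing and variable and p.position not in excluded_positions:
            selected.append(f"{p.wild_type}{p.position}{p.mutant}")
    return selected


if __name__ == "__main__":
    # Hypothetical predictions for an imaginary protein.
    toy = [
        Prediction(10, "A", "V", -1.5, -0.5, 3),   # stabilizing by one tool: kept
        Prediction(20, "G", "S", -0.2, -0.3, 2),   # not stabilizing: dropped
        Prediction(30, "K", "E", -2.0, -2.0, 9),   # too conserved: dropped
        Prediction(40, "T", "I", -1.2, -1.4, 4),   # interface residue: dropped
    ]
    print(select_stabilizing(toy, excluded_positions={40}))
```

Using a logical OR between the two tools mirrors the "either FoldX or Rosetta" wording of the paragraph; a stricter consensus of both tools would replace it with AND.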
Selection of positions for saturation mutagenesis. Positions for saturation mutagenesis
that should reveal additional stabilizing mutations were proposed using Rosetta
calculations. Every protein position was saturated by all twenty proteinogenic amino acids
and the number of stabilizing mutations (ΔΔG < -1.0 kcal.mol⁻¹) was identified for individual
positions. Positions with a conservation score > 7 or situated in the functional regions were
discarded. Seven positions with the highest number (> 3) of stabilizing mutations (E54, C78,
R90, S94, C96, T121, and S152) were selected for saturation mutagenesis (Supplementary
Table 2). Positions 31 and 59 were discarded from selection because significant
improvement in thermostability (ΔTm = 4°C and 3°C, respectively) was verified
experimentally for the mutations R31L and H59F (Supplementary Table 3). Therefore, the
probability of further considerable improvement was negligible.
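The position-ranking step described above can be sketched as a count of predicted stabilizing substitutions per position, followed by the conservation and functional-site filters. The data layout, function name and toy values below are assumptions made for illustration, not the scripts used in the study.

```python
from collections import defaultdict


def positions_for_saturation(ddg, conservation, functional_positions,
                             ddg_cutoff=-1.0, min_hits=3, max_conservation=7):
    """Rank positions by the number of predicted stabilizing substitutions.

    ddg:          {(position, mutant_residue): ddG in kcal/mol}
    conservation: {position: conservation score}
    Returns (position, n_stabilizing) pairs with more than `min_hits` hits,
    conservation <= `max_conservation`, outside functional regions,
    sorted by decreasing number of hits.
    """
    hits = defaultdict(int)
    for (position, _residue), value in ddg.items():
        if value < ddg_cutoff:
            hits[position] += 1
    kept = [
        (position, n) for position, n in hits.items()
        if n > min_hits
        and conservation[position] <= max_conservation
        and position not in functional_positions
    ]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical ddG values for four positions of an imaginary protein.
    toy_ddg = {
        (5, "A"): -1.5, (5, "C"): -1.2, (5, "D"): -2.0, (5, "E"): -1.1, (5, "F"): 0.3,
        (6, "A"): -1.5, (6, "C"): -1.2, (6, "D"): -2.0, (6, "E"): -1.1, (6, "F"): -1.3,
        (7, "A"): -1.5, (7, "C"): -1.3,
        (8, "A"): -1.5, (8, "C"): -1.2, (8, "D"): -2.0, (8, "E"): -1.1,
    }
    toy_conservation = {5: 3, 6: 9, 7: 2, 8: 4}
    print(positions_for_saturation(toy_ddg, toy_conservation, functional_positions={8}))
```

In the toy input, position 6 is rejected for conservation, position 7 for having too few stabilizing substitutions, and position 8 for lying in a functional region, leaving only position 5.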
5.3.3 Construction, production and characterization of single point FGF2-G1A variants
Twelve FGF2-G1A variants R31W, R31L, V52T, H59F, C78Y, N80G, L92Y, C96Y, S109E,
R118W, T121K and V125L were commercially synthesized (GeneArt/Life Technologies,
Germany) and subcloned in the Ndel and Xhol sites of pET28b-His-thrombin downstream
inducible T7 promotor. E.coli BL21(DE3) cells were transformed with expression vectors,
plated on agar plates with kanamycin (50 ug.ml"1
) and grown overnight at 37°C. Single
colonies were used to inoculate 10 ml of LB medium with kanamycin and cells were grown
overnight at 37°C. Overnight culture was used to inoculate 200 ml of LB medium with
kanamycin. Cells were cultivated at 37°C. The expression was induced with IPTG to a final
concentration of 0.25 mM. Cells were then cultivated overnight at 20°C. At the end of
cultivation, biomass was harvested by centrifugation and washed with purification buffer A
(20 mM di-potassium hydrogenphosphate and potassium dihydrogenphosphate, pH 7.5,
0.5 M NaCl, 10 mM imidazole).
Cells in suspension were disrupted by sonication using ultrasonic processor Hielscher
UP200S (Teltow, Germany) with 0.3 s pulses and 85 % amplitude. Cell lysate was
centrifuged for 1 h at 21,000 g at 4°C. FGF2 variants were purified from crude extracts using
single step nickel affinity chromatography. Crude extracts were applied to a 5 ml Ni-NTA
Superflow column (QIAGEN, USA). Column was attached to FPLC Akta (Amersham
Pharmacia Biotech, USA). The buffer system consisted of buffer A and buffer B (20 mM di-potassium
hydrogenphosphate and potassium dihydrogenphosphate, pH 7.5, 0.5 M NaCl,
500 mM imidazole). FGF2 proteins were eluted with a one-step increasing linear gradient
of 0 to 100 % buffer B in 20 column volumes. The presence of FGF2 in peak fractions was
proved by SDS-PAGE using 15 % polyacrylamide gel stained with Coomassie Brilliant Blue
R-250 dye (Fluka, Buchs, Switzerland). Fractions with FGF2 were pooled and concentration
of total protein was determined by Bradford method (Sigma-Aldrich, St. Louis, USA).
Precipitation of FGF2 variants was minimized by dialysis against 20 mM potassium
phosphate buffer containing 750 mM NaCl. Purified proteins were stored at 4°C.
Differential scanning calorimetry and circular dichroism spectroscopy. The
thermostability of FGF2-G1 mutants was determined by differential scanning calorimetry
(DSC) assay. Thermal unfolding of 1.0 mg.ml⁻¹ protein solutions in 50 mM phosphate buffer
(pH 7.5) with 750 mM sodium chloride was followed by monitoring the heat capacity using
the VP-capillary DSC system (GE Healthcare, USA). The measurements were performed at
temperatures from 20 to 80°C at a heating rate of 1°C.min⁻¹. Tm was evaluated as the top of
the Gaussian curve after manual setting of the baseline. Proper folding of all mutants was
verified by circular dichroism (CD) spectroscopy. CD spectra of mutants dialyzed in 50 mM
phosphate buffer pH 7.5 and diluted to the concentration of 0.2 mg.ml⁻¹ were recorded at
20°C using a spectropolarimeter Chirascan (Applied Photophysics, United Kingdom)
equipped with a Peltier thermostat. Data were collected from 200 to 260 nm, at 100
nm.min⁻¹, 1 s response time and 2 nm bandwidth using a 0.1 cm quartz cuvette. Each
spectrum is the average of five individual scans and is corrected for absorbance caused by
the buffer. Collected CD data were expressed in terms of the mean residue ellipticity.
Thermal unfolding was followed by monitoring the ellipticity at 232 nm over the
temperature range from 20 to 80°C at a heating rate of 1°C.min⁻¹. Recorded thermal
denaturation curves of FGF2 variants were normalized to represent signal changes
between approximately 0 and 1 and fitted to sigmoidal curves. The melting temperatures
were evaluated from the collected data as a midpoint of the normalized thermal transition.
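The normalization and midpoint evaluation can be illustrated with a short Python sketch. Note that it replaces the sigmoidal fit used in this work with plain linear interpolation between neighbouring points, so it is only an approximation of the procedure; the function names and the example values are assumptions made for illustration.

```python
def normalize(signal):
    """Rescale a signal so that it spans the interval from 0 to 1."""
    lo, hi = min(signal), max(signal)
    return [(s - lo) / (hi - lo) for s in signal]


def midpoint_temperature(temps, signal, level=0.5):
    """Estimate the temperature at which the normalized transition crosses `level`.

    Linear interpolation is used here instead of a sigmoidal fit.
    """
    norm = normalize(signal)
    # Orient the curve so that it rises with temperature.
    if norm[0] > norm[-1]:
        norm = [1.0 - y for y in norm]
    points = list(zip(temps, norm))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if y0 <= level <= y1 and y1 != y0:
            return t0 + (level - y0) * (t1 - t0) / (y1 - y0)
    raise ValueError("the curve never crosses the requested level")


# Made-up ellipticity-like values, for illustration only.
example_temps = [50, 52, 54, 56, 58]
example_signal = [0.0, 0.1, 0.4, 0.8, 1.0]
print(midpoint_temperature(example_temps, example_signal))
```

For noisy experimental curves, a fitted sigmoid as described in the text is preferable, because interpolation is sensitive to scatter near the midpoint.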
5.3.4 Free energy calculations, construction and thermostability analysis of FGF2-G2
mutant
All mutations improving the melting temperature by at least 0.5°C were selected for in silico
analysis using the Rosetta ddg_monomer application⁸⁸ as described earlier. In the case of two
stabilizing mutations on the same position, the mutation with the larger effect was chosen.
Selected mutations were then tested for potential antagonistic effect. The additivity of
stabilizing mutations was evaluated by predicting the stability of variants with all pairs of
stabilizing single-point mutations (Supplementary Table 4). Mutation pairs for which the
respective double-point mutants showed lower stability in the comparison with the sum of
both single-point mutants taken separately would be considered as antagonistic. However,
none of the double-point mutants had a difference > 1 kcal.mol⁻¹, suggesting the absence
of any significant antagonistic effects among the selected mutations. All mutations
improving Tm by at least 0.5°C (R31L, V52T, H59F, L92Y, C96Y and S109E) were combined
into 6-point mutant designated FGF2-G2. The gene of multiple-point mutant was
commercially synthesized (GeneArt/Life Technologies, Germany), subcloned into the NdeI
and XhoI sites of pET28b-His-Thrombin and expressed in E. coli BL21(DE3) cells as described
before. DSC was used to characterize the thermal stability of the protein purified by affinity
chromatography. DSC data collection was performed over a temperature range of 20°C-
100°C.
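The additivity test for pairs of stabilizing mutations can be sketched as follows. This is a minimal illustration under stated assumptions: negative ΔΔG is taken to mean stabilization, a pair is flagged when the predicted ΔΔG of the double mutant exceeds the sum of the single-mutant values by more than 1 kcal.mol⁻¹, and all names and numbers in the example are hypothetical.

```python
from itertools import combinations

ANTAGONISM_CUTOFF = 1.0  # kcal/mol, threshold taken from the text


def antagonistic_pairs(single_ddg, double_ddg, cutoff=ANTAGONISM_CUTOFF):
    """Return mutation pairs whose double mutant is less stable than additivity predicts.

    single_ddg -- dict mapping mutation name to predicted ddG
    double_ddg -- dict mapping frozenset({mutation_a, mutation_b}) to predicted ddG
    """
    flagged = []
    for a, b in combinations(sorted(single_ddg), 2):
        additive = single_ddg[a] + single_ddg[b]
        predicted = double_ddg[frozenset((a, b))]
        # A positive difference means the double mutant is less stable than expected.
        if predicted - additive > cutoff:
            flagged.append((a, b))
    return flagged


# Made-up ddG values in kcal/mol, for illustration only.
toy_single = {"R31L": -2.0, "H59F": -1.5, "L92Y": -1.0}
toy_double = {
    frozenset(("R31L", "H59F")): -3.4,
    frozenset(("R31L", "L92Y")): -1.5,
    frozenset(("H59F", "L92Y")): -2.6,
}
print(antagonistic_pairs(toy_single, toy_double))
```

With the toy numbers, only the L92Y/R31L pair is reported, because its double mutant is 1.5 kcal.mol⁻¹ less stable than the sum of the two single mutants.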
5.3.5 Construction and screening of focused site-saturation mutagenesis libraries
Altogether 7 focused site-saturation mutagenesis libraries were constructed commercially
using "Fixed Oligo" technology of GeneArt (Life Technologies, USA). The pET28b-His-thrombin::fgf2
was used as a template for randomization. Plasmid DNA was transformed
into E. coli XJb (DE3) Autolysis cells (Zymo Research, USA). Cells were streaked on LB agar
plates with kanamycin (50 µg.ml⁻¹) and incubated overnight at 37°C. Plates with colonies
carrying negative control (empty pET28b), positive control (plasmid pET28b-His-thrombin::fgf2-G2)
and background control (pET28b-His-thrombin::fgf2) were prepared
correspondingly.
Preparation of libraries for screening. Single colonies were used for inoculation of
individual wells in 1 ml 96 deep-well plates (Thermo Fisher Scientific, USA) containing 250
µl of LB medium with kanamycin (50 µg.ml⁻¹). Colonies were transferred by colony picking
robot CP7200 (Hudson Robotics, USA) or using sterile wooden toothpicks. Plates were
incubated overnight at 37°C with shaking (200 rpm) in shaking incubator NB-205 (N-Biotec,
South Korea) in a high-humidity chamber to avoid evaporation of the medium. After 16 h,
50 µl of culture from each well was transferred to a new microtiter plate containing 50
µl of sterile 40 % glycerol per well and the resulting plates were stored at -70°C as replicas.
Expression of chromosomally inserted λ lysozyme and mutant variants of FGF2 in the original
plates with the remaining 200 µl of overnight culture was induced by addition of 800 µl of fresh
LB medium with kanamycin, IPTG and L-arabinose to the final concentrations of 50 µg.ml⁻¹,
0.25 mM and 3 mM, respectively. Plates were incubated overnight at 20°C with shaking
(180 rpm). After 22 hrs, the plates were centrifuged for 20 min (3000 g, 4°C) using Sigma 6-
16K (Sigma Laborzentrifugen, Germany). Supernatant was drained using JP recirculating
water aspirator (VELP Scientifica, Italy). Whole microtiter plates with cell pellets were
frozen at -70°C. Then, plates were incubated for 20 min at room temperature and 100 µl of
lysis buffer (20 mM sodium phosphate buffer, 150 mM NaCl, pH 7.0) was added to
each well. Plates were incubated for 20 min at 30°C with shaking (200 rpm). Cell debris was
removed from resulting cell lysates.
Concentration of total soluble protein in crude extract in one well with negative control,
one well with positive control, one well with background control and in 6 randomly selected
wells with new FGF2-G1B mutants was determined for each plate containing one of the
libraries using Bradford reagent (Sigma Aldrich, USA). Samples of crude extracts from the
same wells were loaded on SDS polyacrylamide gels. The gels were analysed using GS-800
Calibrated Densitometer (Bio-Rad, USA) and the content of FGF2 in the total soluble protein
in each sample was determined. The concentration of FGF2 in crude extracts was calculated
based on the obtained data. The plates with crude extracts were stored at -70°C for further
use.
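The calculation of FGF2 concentration in a crude extract is not spelled out in the text. One plausible reading, sketched below, multiplies the total soluble protein concentration from the Bradford assay by the fraction of FGF2 determined by densitometry. The function name and the example values are assumptions.

```python
def fgf2_concentration(total_protein_mg_ml, fgf2_fraction):
    """Estimate the FGF2 concentration (mg/ml) in a crude extract.

    total_protein_mg_ml -- total soluble protein from the Bradford assay
    fgf2_fraction       -- share of FGF2 in total soluble protein (0 to 1), from densitometry
    """
    if not 0.0 <= fgf2_fraction <= 1.0:
        raise ValueError("fgf2_fraction must lie between 0 and 1")
    return total_protein_mg_ml * fgf2_fraction


# Hypothetical well: 2.0 mg/ml total protein, 15 % of which is FGF2.
print(fgf2_concentration(2.0, 0.15))
```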
Screening of biological activity of FGF2-G1B variants using rat chondrosarcoma growth-arrest
assay. Rat chondrosarcoma (RCS) cells are an immortalized, phenotypically stable cell
line that responds to minute concentrations of FGFs with potent growth arrest
accompanied by marked morphological changes and extracellular matrix degradation. FGF
receptor 3 (FGFR3) functions as a negative regulator of cell proliferation in this cell line. In
order to inhibit cell proliferation, FGF variants have to specifically induce FGFR signal
transduction, allowing FGF activity to be measured as the concentration
dependence of induced growth arrest. The major advantage of the RCS assay is the
exclusion of toxic chemicals and false-positive hits²⁶⁹. The high-throughput growth arrest
experiment was performed in a 96-well plate format with the cellular content determined
by simple crystal violet staining. Media with or without bacterial crude extracts with
variants of FGF2 in approximate concentration of 40 ng.ml⁻¹ were incubated at 41.5 °C for
48 h and mixed every 12 h within this period. RCS cells were seeded at a density of 250
cells per well in a 96-well plate one day before the treatment. Cells were treated with
preincubated FGF2 at a final concentration of 20 ng.ml⁻¹ for 4 days. Cells were washed with
PBS, fixed with 4% paraformaldehyde, washed again and stained with 0.025% crystal violet
for 1 hour. Stained cells were washed 3 times with distilled water. The dye was
dissolved in 33% acetic acid. Absorbance was measured at 570 nm (Supplementary Fig. 3).
The more stable the FGF2 variant present in the added crude extract, the more pronounced
the growth inhibition. Samples causing more significant growth inhibition than samples
containing FGF2-G0 were considered positive hits. E. coli clones containing FGF2
candidates were refreshed from glycerol replica plates. For each positive hit, four wells
with LB medium in a fresh 1 ml 96-well PP microtiter plate were inoculated and the whole
screening procedure was repeated. For each FGF2 candidate verified in the second round
of screening, 10 ml of LB medium with kanamycin was inoculated with corresponding E.
coli clone from glycerol replica plate and the cells were grown overnight at 37°C with
shaking. The overnight culture was used for isolation of plasmid DNA using GeneJET
Plasmid Miniprep kit (Thermo Fisher Scientific, USA) and fgf2-G1B genes were
commercially sequenced by the Sanger method (GATC Biotech, Germany). Resulting sequences
were aligned with the nucleotide sequence of FGF2-G0 using BioEdit³² for determination of
newly inserted mutations.
5.3.6 Small scale production and characterization of selected FGF2-G1B mutants
E. coli BL21(DE3) cells were transformed with pET28b-His-thrombin::fgf2x (where x
represents one of 33 new mutant variants), plated on LB agar plates with kanamycin (50
µg.ml⁻¹) and grown overnight at 37°C. Small-scale cell cultivations in 10 ml of LB medium
with kanamycin were conducted under the conditions described before. The biomass was
centrifuged at 10,000 g for 2 minutes at 4°C in a benchtop centrifuge Mikro 200 (Andreas
Hettich GmbH & Co.KG, Germany) and the cell pellet was frozen at -70°C. The pellets were
defrosted and resuspended in 600 µl of FastBreak Cell Lysis Reagent from MagneHis Protein
Purification System (Promega, USA) supplemented with NaCl to a concentration of 500 mM and
1 µl of DNase I (New England Biolabs, USA). The cells were incubated with shaking for 20
minutes at room temperature. The bacterial lysates were incubated with 30 µl of MagneHis
Ni-Particles beads for 2 minutes at room temperature. The beads were separated using
magnetic stand and the supernatants were carefully removed. To wash out unbound cell
proteins, 150 µl of MagneHis Binding/Wash Buffer with 500 mM NaCl was added. The
elution of bound proteins was performed by adding 105 µl of MagneHis Elution Buffer
containing 500 mM NaCl. The presence of FGF2-G1B variants in eluted fractions was proved
by SDS-PAGE as described before.
Determination of thermal stability of FGF2-G1B mutants. The thermal stability of FGF2-G1B
variants was verified by thermal shift assay²⁷⁰. FGF2-G0 was used as a background
control. The measurements were conducted in MicroAmp Fast Optical 96-well Reaction
Plate (Thermo Fisher Scientific, USA). Each reaction mixture of final volume of 25 µl was
composed of 2 µl of SYPRO Orange Protein Gel Stain (Thermo Fisher Scientific, USA),
purified FGF2 variant (2.5 mg.ml⁻¹) and the elution buffer (100 mM HEPES, 500 mM
imidazole and 500 mM NaCl, pH 7.5). The assay was performed using StepOnePlus Real-Time
PCR System (Applied Biosystems/Thermo Fisher Scientific, USA) with starting
temperature of 25°C (2 min initial equilibration) and ramping up in increments of 1°C to a
final temperature of 95°C. The Tm values were determined from obtained data using
Protein Thermal Shift software (Applied Biosystems/Thermo Fisher Scientific, USA;
Supplementary Table 5).
5.3.7 Free energy calculations, construction and purification of FGF2-G3 mutant
All mutations from screening improving the melting temperature by at least 1°C (E54D,
S94I, C96N, and T121P) were selected for in silico analysis using the Rosetta ddg_monomer
application⁸⁸ as described earlier. All double-point mutant combinations of newly identified
mutations with existing mutations from stable variant FGF2-G2 (R31L, V52T, H59F, L92Y,
C96Y, and S109E) were constructed in silico to predict potential additivity of these
individual mutations (data not shown). Predicted ΔΔG was compared with the sum of ΔΔG
of individual mutations but none of the double-point mutants had a difference > 1
kcal.mol⁻¹, again suggesting the absence of any antagonistic effects among selected
mutations. Consequently, 9-point mutant FGF2-G3 was designed and constructed
combining 4 most stabilizing substitutions obtained from screening with 5 substitutions
from FGF2-G2. In FGF2-G3 mutant, the substitution C96N was prioritized over C96Y due to
its higher individual stabilizing effect verified experimentally. Predicted improvement in Tm
for this new variant was 19.2°C. The gene of multiple-point mutant was commercially
synthesized (GeneArt/Life Technologies, Germany), subcloned into the NdeI and XhoI sites of
pET28b-His-Thrombin and expressed in E. coli BL21(DE3) cells as described before.
5.3.8 Characterization of biophysical properties of FGF2-G3 and its comparison with
FGF2-G0 and FGF2-G2 variants
DSC was used to characterize the thermal stability of the protein purified by affinity
chromatography. DSC data collection was performed over a temperature range of 20°C-
100°C. Proper folding of mutant was verified by CD spectroscopy as described earlier.
Circular dichroism spectroscopy. The structural integrity of FGF2-G0, FGF2-G2 and FGF2-G3
proteins was followed by monitoring the ellipticity over the wavelength range of 200 to
260 nm at the temperature 37, 50, 65 and 70°C for 24 h. Data were recorded in 2 minute
intervals with 1 nm bandwidth using a 0.1 cm quartz cuvette containing the protein.
Recorded denaturation curves (either single exponential, double exponential or
exponential linear combination function) of tested FGF2 variants were globally fitted to
exponential decay curves using OriginPro8 software (OriginLab, USA). Half-life of FGF2
secondary structure (t1/2, defined as the time required to reduce the initial value of ellipticity,
as a measure of protein secondary structure, to ½ of the original value) was evaluated from
the collected data using the decay time constant (τ) according to Equation 15:
$$ t_{1/2} = \frac{\ln 2}{\lambda} = \tau \ln 2 $$
Equation 15
where λ is the exponential decay constant. The t1/2 was evaluated from the data collected at
227 nm, where all spectra showed the ellipticity maxima (Supplementary Fig. 4). The
maximum time of measurement of 24 h was limited by the capacity of the compressed-nitrogen
cylinder used in the CD spectroscopy protocol. The values of t1/2 for FGF2-G0
at 65 and 70 °C were not determined because the proteins were denatured immediately at
the beginning of the measurement.
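The half-life relation of Equation 15 can be checked numerically with the short sketch below. It assumes a single-exponential decay with rate constant λ, or equivalently time constant τ = 1/λ; the function names and the example rate are illustrative only.

```python
import math


def half_life_from_rate(decay_constant):
    """Half-life for a single-exponential decay with rate constant lambda."""
    return math.log(2) / decay_constant


def half_life_from_tau(time_constant):
    """Half-life expressed through the time constant tau = 1 / lambda."""
    return time_constant * math.log(2)


# Hypothetical decay constant of 0.01 per minute.
example_rate = 0.01
print(half_life_from_rate(example_rate))
print(half_life_from_tau(1.0 / example_rate))
```

Both calls give the same value (about 69.3 min for the hypothetical rate), which illustrates that the two forms of the equation are equivalent.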
Fluorescence spectroscopy. Local conformational changes during thermal unfolding of
FGF2 variants were followed by monitoring fluorescence emission spectra using FluoroMax
spectrofluorometer (Horiba, Japan). The sample in quartz cuvette with a magnetic stirrer
inside was placed into a temperature-controlled holder, and fluorescence spectra excited
at 295 nm were recorded in 1 minute intervals from 310 to 410 nm with 1 nm bandwidth
and 0.1 s integration time from 30 to 90 °C. The actual temperature in the cell was
monitored using thermocouple controlled by Labview software (National Instruments,
USA). The spectrum of the buffer recorded at 30 °C was used as a blank and subsequently
subtracted from the sample data. Unfolding was followed at the emission maximum (347
nm) of the first scan. The concentration of all the samples was approx. 0.1 mg.ml⁻¹.
Differential scanning fluorescence (DSF). A slightly different experimental set-up was used
for monitoring fluorescence during thermal unfolding. The standard grade capillary
(NanoTemper, Germany) was filled with a sample and placed into the Prometheus NT.48
(NanoTemper, Germany). The samples were continually heated from 30 to 90 °C at
different scan rates (0.3, 0.5, 1, 2 and 4 °C.min⁻¹) and fluorescence signal excited at 295 nm was
followed at 335 and 350 nm. The concentration dependence of the unfolding curve was
checked by monitoring aliquots of different concentrations (1, 0.5, 0.25 and 0.125 mg.ml⁻¹).
Data analysis. The data were uploaded to an extension of CalFitter (Masaryk University,
Czech Republic) based on MATLAB 2014b (The MathWorks, United States) that allows
simultaneous global fit into unfolding curves. DSC data were numerically integrated to
derive the total heat absorbed during the transition. Then the signals from the four types
of measurement, namely CD ellipticity, DSC heat absorption, and two fluorescence
measurements, were normalized and fit globally (Supplementary Figs. 5 and 6). After the
initial fitting, the weighted least squares were calculated based on the sum of the squared
residuals of each curve; thereby, the contributions of each type of the measurements to
the sum of errors were the same at the optimal point. Regarding the parameters, linear
coefficients were allocated separately to each data set. The models with the minimum
number of intermediates were selected based on the quality of fit measured by normalized
residuals and visual inspection. Although the datasets of the wild type and the G2 mutant
were fitted reasonably well with just a three-step irreversible model, the G3 model had to
include one additional reversible pretransitional step to fit all the data sets perfectly, mainly
due to the DSC data and ratiometric data from DSF. Nonetheless, the effect of this step is
much less pronounced than the subsequent irreversible steps of unfolding in all the
datasets. The estimated values of the main parameters of the unfolding mechanism are
given in Supplementary Table 6.
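The weighting scheme used in the global fit can be illustrated by the following sketch. It shows one plausible reading of the description above, namely that each data set is weighted by the inverse of its sum of squared residuals from the initial fit, so that all measurement types contribute equally at that point. It is not the CalFitter implementation; the names and the residual values are assumptions.

```python
def sum_squared(residuals):
    """Sum of squared residuals of one curve."""
    return sum(r * r for r in residuals)


def equalizing_weights(residuals_by_dataset):
    """Weight each data set by the inverse of its sum of squared residuals."""
    return {name: 1.0 / sum_squared(res) for name, res in residuals_by_dataset.items()}


def weighted_objective(residuals_by_dataset, weights):
    """Weighted least-squares objective summed over all data sets."""
    return sum(
        weights[name] * sum_squared(res)
        for name, res in residuals_by_dataset.items()
    )


# Made-up residuals after an initial fit; the scales differ between methods.
initial_residuals = {
    "CD": [0.1, -0.2],
    "DSC": [2.0, 1.0, -1.0],
}
weights = equalizing_weights(initial_residuals)
print(weights)
print(weighted_objective(initial_residuals, weights))
```

With these weights each data set contributes one unit to the objective at the initial fit, regardless of the original scale of its signal.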
pH profile. Britton-Robinson (BR) buffers of different pH (5, 6, 7, 7.5, 8, 9, 10 and 11) were used
to determine the pH stability profile of the FGF2 variants. A 5 µl sample aliquot was
mixed with 95 µl of BR buffer of the appropriate pH, thoroughly vortexed and incubated for 3
h at 4 °C. Next, a standard grade capillary was filled with the mixture by capillary forces and placed
into the Prometheus NT.48. Samples were scanned from 30 to 90 °C at a scan rate of 1°C.min⁻¹.
5.3.9 Characterization of biological properties of FGF2-G3 and its comparison with G0
and G2 variants
Determination of biological activity half-life. FGF-receptors and their downstream
effectors including ERK1/2 are activated upon treatment with FGF2, contributing to
pluripotency of human embryonic stem cells (hESC)²⁷¹,²⁷². As the biological activity of FGF2
decreases at 37°C, ERK1/2 phosphorylation declines and hESC easily become primed to
differentiation. To test the thermal stability of FGF2 variants, the hESC medium prepared
without FGF2 was supplemented with FGF2-G0, FGF2-G2 or FGF2-G3 to the final
concentration of 10 ng.ml⁻¹ and pre-incubated at 37°C for 6 h, 12 h, 24 h, 2 d, 4 d - 20 d.
FGF2-starved hESC were treated with hESC medium containing pre-incubated FGF2 for two
hours and Western blotted for phosphorylated ERK1/2. Two representative blots per each
protein variant were analysed using ImageJ 1.50b (National Institutes of Health, USA) and
band densities were plotted as a function of pre-incubation time (Supplementary Fig. 8).
Data points were analysed using single exponential decay Equation 16:
$$ A/A_0 = \exp(-t/\tau) + c $$
Equation 16
where A/A0 is the relative density at time t, τ is the time constant, and c is the steady-state
level of density offset, with the help of OriginPro8 software (OriginLab, USA) and the half-life
(i.e. the time required for the loss of one-half of the initial activity) was determined for
FGF2-G0 protein variant.
Western blotting. Cells were lysed with 2x Laemmli buffer and the samples were boiled at
98°C for 10 minutes. Proteins were separated by SDS-PAGE and electrotransferred onto
Immobilon®-P transfer membrane (Merck Millipore, Germany). Membranes were then
blocked in 5% milk in TBS buffer and incubated with primary rabbit polyclonal antibody
P-44/42 MAPK (Cell Signaling Technology, USA) at 4°C overnight. The primary antibodies
included rabbit polyclonal anti-pERK1/2 and rabbit polyclonal anti-ERK1/2 (both Cell
Signaling Technology, USA). Next day, the membranes were incubated with donkey anti-rabbit
antibodies conjugated with horseradish peroxidase (Santa Cruz Biotechnology,
USA), and the protein bands were visualized using chemiluminescence detection reagent
Immobilon™ Western (Merck Millipore, Germany) on photographic paper (Agfa-Gevaert,
Belgium). After stripping, the membrane was re-probed with rabbit polyclonal antibody
p44/42 MAPK (Erk1/2) (Cell Signaling Technology, USA) against total signaling proteins.
Cell cultures. The hESC employed in this study were derived from blastocyst-stage embryos
obtained with informed consent of donors. A well characterized human ESC line (Adewumi,
2007) CCTL14 (Centre of Cell Therapy Line) in passages 65 - 75 was used. The hESC were
maintained under feeder-free conditions using Matrigel™ hESC-qualified Matrix (BD
Biosciences). Culture medium required for propagation of hESC grown on Matrigel was
medium conditioned by mitotically inactivated mouse embryonic fibroblasts (mEF). For
preparation of standard conditioned medium (CM), the complete hESC medium containing
4 ng.ml⁻¹ FGF2 is usually conditioned by mitotically inactivated mEF for 5-7 days and then
supplemented with 10 ng.ml⁻¹ of FGF2 to restore the growth factor concentration lost to
degradation. In our experiments, to test the long-term thermostability of FGF2, the CM was
prepared from medium containing 10 ng.ml⁻¹ of FGF2 with no supplementation
afterwards.
Proliferation assays. The hESC were plated into 24-well plates and propagated in presence
of each of the tested FGF2 for five passages and counted every three days after plating
using a Bürker chamber. Alternatively, cells were plated into 96-well plates and cultured in
the presence of various FGF2 for 6 days. Cells were then fixed in 4% paraformaldehyde (20
min, RT), stained with 0.1% crystal violet (60 min, RT), and destained with 33% acetic acid
(20 min with shaking). The absorbance of the supernatant was then measured at 570 nm
using plate reader (Supplementary Fig. 9).
Immunocytochemistry. The hESC were fixed with 4% paraformaldehyde (20 min, RT),
permeabilized with 0.1% Triton X-100 in PBS (20 min, RT), and incubated with primary
antibodies at 4°C overnight. Primary antibodies included goat polyclonal anti-Oct4 (Santa
Cruz Biotechnology, USA) and rabbit monoclonal anti-Nanog (Cell Signaling Technology,
USA). Next day, incubations with secondary antibodies conjugated to AlexaFluor488 or
AlexaFluor594 (Thermo Fisher Scientific, USA) were carried out at RT for 1 h. Coverslips
were mounted in DAPI-containing Mowiol (Sigma-Aldrich, USA). Microscopic analysis was
performed using Confocal LSM 700 microscope (Zeiss, Germany; Supplementary Fig. 10).
Preparation of hydrogel. Macroporous poly(2-hydroxyethyl methacrylate) (PHEMA)
microspheres of narrow particle size distribution were prepared by multi-step swelling
polymerization²⁷³. Briefly, the method is based on 0.7 µm monodisperse polystyrene seeds
which are swollen with activating agent (dibutyl phthalate), monomers (2-(methacryloyl)oxyethyl
acetate, 2-[(methoxycarbonyl)methoxy]ethyl methacrylate,
ethylene dimethacrylate), and porogen (cyclohexyl acetate). After benzoyl peroxide-initiated
and (hydroxypropyl)methyl cellulose-stabilized polymerization, hydrolysis, and
washing, the resulting PHEMA microspheres were 3 µm in diameter, with a narrow size
distribution (Supplementary Fig. 11) and contained 0.5 mmol COOH/g.
In vivo experiment. Animals used in the study of hair promoting activity, 7-week-old female
C57BL/6 mice, were obtained from Laboratory Animal Breeding and Experimental Facility
(Masaryk University, Brno, Czech Republic) and maintained on a standard laboratory diet
and water ad libitum. 17 animals in 3 randomized groups (n=5 or 6) were shaved using
depilatory cream (Veet, USA) at 7 weeks of age, at which all hair follicles were synchronized
in the quiescent telogen phase²⁷⁴. FGF2, both FGF2-G0 and thermostable variant FGF2-G3 (5 µg
per mouse), was dissolved in 0.1% human serum albumin and sorbed into the hydrogel by
continuous stirring for 2 hours at RT. The resulting suspension was applied topically on
dorsal skin of C57BL/6 mice with subcutaneous injection. Empty hydrogel with no FGF2 was
used as a control. Visible hair growth was recorded at days 1, 7, 13, 16 and 20
(Supplementary Fig. 12).
Hair length determination. To examine the effect of FGF2 in hydrogel on hair length, hairs
in the proximity of the injection site were plucked randomly with forceps at day 21 post
application. The average length of the 20 plucked hairs per mouse was measured manually
with a micrometer under a stereoscopic microscope and expressed in millimeters.
5.4 Supporting Information
Unfolding mechanism of FGF2. A two-step unfolding mechanism including one intermediate
was revealed for all tested FGF variants by the global fit of unfolding data (Figure S 8, Figure
S 9 and Table 6). The main difference between the variants lies in the Gibbs energy barrier of the
first irreversible step (ΔG‡1) with the value 111±0.3 kJ·mol⁻¹ at 25°C for the wild type FGF2-G0,
increased by 29.7±0.9 kJ·mol⁻¹ and 35.7±0.8 kJ·mol⁻¹ for the FGF2-G2 and FGF2-G3,
respectively. The predicted ΔΔG values from computer modeling were -26±10 kJ·mol⁻¹ and
-30±9 kJ·mol⁻¹ for FGF2-G2 and FGF2-G3, respectively, which is in good agreement with
the experimental data. The stability of both FGF2-G2 and FGF2-G3 can be attributed to the
increase in the Gibbs activation energy of the first unfolding step. Moreover, a pretransitional
step (Table S 18, Step 0) and slightly lower optimal pH range 6.0-7.5 for T1/2
was revealed for FGF2-G3, when compared with FGF2-G0 and FGF2-G2 variants preferring
pH range of 7.0-8.0 (Figure S 10).
Figure S 4. Proposed stabilizing positions in the structure of wild-type human FGF2. Positions selected by
energy-based approach for site-directed mutagenesis are shown as yellow spheres, while the positions
determined for randomization as red spheres. Positions selected by evolution-based approach are shown as
green spheres.
a
  1 ATGGGCAGCA GCCATCATCA TCATCATCAC AGCAGCGGCC TGGTGCCGCG CGGCAGCCAT
 61 ATGGCAGCCG GGAGCATCAC CACGCTGCCC GCCTTGCCCG AGGATGGCGG CAGCGGCGCC
121 TTCCCGCCCG GCCACTTCAA GGACCCCAAG CGGCTGTACT GCAAAAACGG GGGCTTCTTC
181 CTGCGCATCC ACCCCGACGG CCGAGTTGAC GGGGTCCGGG AGAAGAGCGA CCCTCACATC
241 AAGCTACAAC TTCAAGCAGA AGAGAGAGGA GTTGTGTCTA TCAAAGGAGT GTGTGCTAAC
301 CGTTACCTGG CTATGAAGGA AGATGGAAGA TTACTGGCTT CTAAATGTGT TACGGATGAG
361 TGTTTCTTTT TTGAACGATT GGAATCTAAT AACTACAATA CTTACCGGTC AAGGAAATAC
421 ACCAGTTGGT ATGTGGCACT GAAACGAACT GGGCAGTATA AACTTGGATC CAAAACAGGA
481 CCTGGGCAGA AAGCTATACT TTTTCTTCCA ATGTCTGCTA AGAGCTAGCT CGAG
b
  1 MGSSHHHHHH SSGLVPRGSH MAAGSITTLP ALPEDGGSGA FPPGHFKDPK RLYCKNGGFF
 61 LRIHPDGRVD GVREKSDPHI KLQLQAEERG VVSIKGVCAN RYLAMKEDGR LLASKCVTDE
121 CFFFERLESN NYNTYRSRKY TSWYVALKRT GQYKLGSKTG PGQKAILFLP MSAKS
c
  1 MAAGSITTLP ALPEDGGSGA FPPGHFKDPK RLYCKNGGFF LRIHPDGRVD GVREKSDPHI
 61 KLQLQAEERG VVSIKGVCAN RYLAMKEDGR LLASKCVTDE CFFFERLESN NYNTYRSRKY
121 TSWYVALKRT GQYKLGSKTG PGQKAILFLP MSAKS
Figure S 5. The nucleotide (a) and amino acid (b) sequences of FGF2-G0 with upstream sequences
in pET28b vector and amino acid sequence of wild-type FGF2 (c). Start codons and corresponding
methionines are in grey, 6xHis tag is in turquoise, thrombin cleavage recognition site is in magenta, stop
codon is in red, restriction sites of NdeI and XhoI are underlined, and mutated amino acid positions in the
sequence of wild-type FGF2 are in green.
[Bar chart; x axis: wells CTRL-, CTRL WT, CTRL+, G5-G8 and H1-H8; y axis scale: 0.0 to 1.0]
Figure S 6. Example of output data from screening of biological activity of mutated FGF2 variants in crude
extracts (CE) originating from the library FGF2-C96X. Coding on X axis corresponds to the wells of original
microtiter plate. CEs pre-incubated at 41.5°C were added to the rat chondrocytes grown in parallel microtiter
plates to the final concentration of FGF2 of 20 ng.ml⁻¹ and inhibition of growth of chondrocytes was compared
and inhibition of growth of chondrocytes was compared
to the samples containing controls by measuring the optical density. CTRL-, negative control, CE from E. coli
cells with empty pET28b plasmid; CTRL+, positive control, CE from E. coli cells producing FGF2-G1A mutant
R31L; CTRL WT, background control, CE from E. coli cells producing FGF2-G0. Black line represents the
threshold assigned corresponding to the background control CTRL WT. Clones G5 and H3, whose CE caused
statistically more significant growth arrest of rat chondrocytes than background control were selected for rescreening
as positive hits. Error bars represent deviations calculated from 2 replicated measurements.
[Three plots, panels a-c; x axis: Time (min)]
Figure S 7. Structural stability of selected FGF2 variants determined by CD spectroscopy. Structural stability
of FGF2-G0 at 50°C (a), FGF2-G2 (b) and FGF2-G3 (c) at 70 °C. Solid lines represent the best fit.
Step 1, Step 2, Step 3: N → I → D → Ag
[Plots, panels A-D; x axis: Temperature (°C)]
Figure S 8. Global fit of a three-step model for (A) the wild type, (B) the FGF2-G2 mutant, and the modelled
fractions of the states for (C) the wild type and (D) the FGF2-G2 mutant. The proposed mechanism of
unfolding for the two variants is on top (N, I, D, and Ag stand for the natural, intermediate, denatured, and
aggregated states, respectively). The model with three steps of unfolding was successfully fitted into all four
data sets: DSC (diamonds), CD (stars), equilibrium fluorescence (circles), and DSF (points). The fitted curves
are depicted in blue. The respective fractions of the states of unfolding are given in the bottom graphs: natural
(black), intermediate (blue), denatured (yellow), and aggregated (red).
[Figure S 9 plots. Unfolding scheme: N* ⇌ N → I → D → Ag (steps 0 to 3); x-axes: Temperature (°C)]
Figure S 9. Global fit of a four-step model for the FGF2-G3 mutant (A), DSF signal of FGF2-G3 mutant (B),
the modelled fractions of the states (C), and the modelled scan rate dependence (D). (A,C) The model with
four steps of unfolding was successfully fitted into all four data sets: DSC (diamonds), CD (stars), equilibrium
fluorescence (circles), and DSF (points). The fitted curves are depicted in blue. The respective fractions of the
states of unfolding are given in the bottom left graph: natural* (black), natural (brown), intermediate (blue),
denatured (yellow), and aggregated (red). (B) The ratio (stars) of 330 nm and 350 nm clearly indicates the
change in the signal for the transition area of the first step obtained from the global fit (55-70°C); however,
neither of the wavelengths (dots) separately indicated any significant change. (D) The predicted values of the
fully irreversible model (orange) provided poor fit into the data obtained at low scan rates (black) as
compared with the four-state model with a reversible step (blue).
[Figure S 10 plots, panels a and b: x-axis, pH (5.0 to 10.0)]
Figure S 10. The pH profile of thermal unfolding of the variants. (a) The Gibbs activation energy (ΔG*) of the
first step at temperature 25°C. (b) The temperatures at which half of the native state protein has already
undergone the first step (T1/2).
[Figure S 11 panels. (a) Western blots of pERK 1/2 and ERK 1/2 for FGF2-G0, FGF2-G2 and FGF2-G3 versus time of incubation at 37 °C. (b) Densitometry plots; x-axes: Time (hrs) and Time (days)]
Figure S 11. Determination of in vitro biological activity half-life of selected FGF2 variants by activation of
ERK pathway. (a) Representative Western blots for each of FGF2-G0, FGF2-G2 and FGF2-G3 variants. To verify
equal loading, stripped membranes were blotted with antibody recognizing all forms of the signaling proteins.
(b) Densitometric analysis of Western blots performed using ImageJ software. Data from analysis of two
independent blots are shown for each of the protein variants as circles and squares. The optimal fit of two
data sets is shown as a black line.
Figure S 12. Stabilized FGF2-G2 and FGF2-G3 support proliferation of hESC better than FGF2-G0 and
repeated supplementation of the conditioned media is not required. The hESC were propagated in
conditioned medium in the presence of each of the tested FGF2 variants, and the cell numbers (a) and
morphology (b) were recorded. Columns show means, error bars represent standard error of the mean from
three independent experiments. Scale bar 500 µm. Student's t-test, **p<0.01, *p<0.05.
[Figure S 13 image grid: columns Oct-4, Nanog, DAPI; rows No FGF2, FGF2-G0, FGF2-G2, FGF2-G3, No primary antibody]
Figure S 13. Stabilized FGF2-G2 and FGF2-G3 maintain pluripotency marker expression of hESC. The hESC
were propagated as monolayers on Matrigel and immunostained for pluripotency markers Oct-4 (red) and
Nanog (green). All of the tested FGF2 variants equally supported expression of Oct-4 and Nanog. Scale bar 100 µm.
Figure S 14. TEM micrograph of PHEMA microspheres.
[Image grid: columns No FGF2, FGF2-G0, FGF2-G3; rows Day 7, Day 13, Day 16, Day 20]
Figure S 15. Effect of the selected FGF2 variants on hair growth promotion. 7-week old C57BL/6 mice were
shaved and injected with empty hydrogel or hydrogel sorbed with FGF2-G0 or FGF2-G3 and the hair growth
was recorded during 20 consecutive days.
Table S 13. Back-to-consensus mutations identified in FGF2 using 50% consensus cut-off.
Mutations selected for experimental construction are highlighted in bold.
Residue  Position  Mutation  Frequency[a]  ΔΔG FoldX (kcal.mol⁻¹)  ΔΔG Rosetta (kcal.mol⁻¹)
P 22 L 0.59 -[b] -[b]
K 27 R 0.52 -[b] -[b]
R 42 Q 0.53 2.38 3.04
V 52 T 0.53 0.05 -0.70
Q 63 E 0.61 0.04 1.37
E 67 V 0.71 -0.09 -0.39
A 79 S 0.58 0.47 1.22
N 80 G 0.56 -1.21 -0.03
K 86 N 0.71 1.09 1.67
L 92 Y 0.75 0.03 -2.14
A 93 G 0.53 1.98 2.22
S 109 E 0.69 -0.26 0.51
K 128 N 0.51 0.02 1.22
R 129 K 0.58 0.02 -0.20
K 138 R 0.53 0.10 0.62
L 147 H 0.68 1.44 1.10
M 151 R 0.55 0.97 1.92
ΔΔG, change in Gibbs free energy upon mutation; [a] frequency of the most conserved residue at a given position
of the multiple sequence alignment; [b] unreliable prediction.
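The back-to-consensus selection summarized in Table S 13 can be illustrated with a minimal Python sketch. It assumes that the wild-type sequence is already aligned to the homologues, that gaps are ignored when computing frequencies, and that a position qualifies when the most frequent residue differs from the wild type and reaches the 50% cut-off. The function name `back_to_consensus` and the toy alignment are hypothetical and do not reproduce the actual FGF2 analysis.

```python
from collections import Counter


def back_to_consensus(wild_type, alignment, cutoff=0.5):
    """List (wt_residue, position, consensus_residue, frequency) suggestions.

    wild_type and every sequence in alignment must have the same aligned
    length; positions are reported as 1-based alignment columns.
    """
    suggestions = []
    for col, wt_res in enumerate(wild_type):
        # Assumption: gap characters are excluded from the frequency count.
        column = [seq[col] for seq in alignment if seq[col] != "-"]
        if not column or wt_res == "-":
            continue
        residue, count = Counter(column).most_common(1)[0]
        freq = count / len(column)
        if residue != wt_res and freq >= cutoff:
            suggestions.append((wt_res, col + 1, residue, round(freq, 2)))
    return suggestions


# Hypothetical four-sequence alignment, for illustration only.
toy_alignment = ["MKLV", "MRLV", "MRLI", "MRAV"]
hits = back_to_consensus("MKLV", toy_alignment)
```

In this toy case only the second column differs from the wild type with a sufficiently frequent consensus residue, so a single K-to-R suggestion is returned.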
Table S 14. The stabilizing mutations and positions for randomization selected based on the free
energy prediction, conservation analysis and visual inspection.
Residue  Position  Mutation  Stabilizing mutations  ΔΔG FoldX (kcal.mol⁻¹)  ΔΔG Rosetta (kcal.mol⁻¹)  Conservation score[a]
R 31 L - -2.0 - 7
R 31 W - - -4.0 7
E 54 X 6 - - 3
H 59 F - -1.7 - 3
C 78 Y - -1.0 - 3
C 78 X 15 - - 3
R 90 X 4 - - 4
L 92 Y - - -2.3 7
S 94 X 5 - - 7
C 96 Y - - -3.0 3
C 96 X 17 - - 3
R 118 W - - -1.6 3
T 121 K - -1.1 - 7
T 121 X 4 - - 7
V 125 L - -1.3 - 7
S 152 X 5 - - 3
ΔΔG, predicted change in Gibbs free energy upon mutation; [a] ConSurf conservation score calculated from
the multiple sequence alignment.
Table S 15. Thermostability of FGF2-G1A mutants determined by differential scanning
calorimetry. Mutations selected for construction of combined FGF2-G2 mutant are highlighted in
bold.
FGF2 variant  ΔTm (°C)  Design approach
R31W 1.3 energy-based
R31L 3.7 energy-based
V52T 2.0 evolution-based
H59F 3.1 energy-based
C78Y -0.5 energy-based
N80G -1.1 evolution-based
L92Y 1.3 both
C96Y 0.9 energy-based
S109E 0.9 evolution-based
R118W -0.9 energy-based
T121K -0.8 energy-based
V125L -5.3 energy-based
ΔTm, change in melting temperature upon mutation compared to FGF2-G0.
Table S 16. Prediction of additive effect of individual mutations in double-point mutants of FGF2
and in FGF2-G2.
Mutation 1  Mutation 2  Average energy (kcal.mol⁻¹)  Expected energy (kcal.mol⁻¹)  ΔE (kcal.mol⁻¹)
R31L C96Y -3.64 -3.74 0.10
R31L H59F -2.19 -1.96 -0.24
R31L L92Y -3.24 -3.06 -0.18
R31L S109E -0.61 -0.25 -0.36
R31L V52T -1.61 -1.62 0.00
V52T C96Y -3.27 -3.91 0.64
V52T H59F -1.51 -2.13 0.62
V52T L92Y -2.54 -3.23 0.69
V52T S109E 0.02 -0.43 0.45
H59F C96Y -3.97 -4.25 0.28
H59F L92Y -3.23 -3.57 0.35
H59F S109E -0.55 -0.77 0.22
L92Y C96Y -4.87 -5.35 0.48
L92Y S109E -1.69 -1.87 0.18
C96Y S109E -2.20 -2.55 0.34
FGF2-G2  Average energy (kcal.mol⁻¹)  Expected energy (kcal.mol⁻¹)  ΔE (kcal.mol⁻¹)
R31L+V52T+H59F+L92Y+C96Y+S109E -7.23 -7.74 0.51
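The ΔE column of Table S 16 appears to be the difference between the average predicted energy of the combined mutant and the expected energy, which is presumably the sum of the single-mutant predictions. The short Python sketch below checks this reading against three rows of the table; the function name `additivity_gap` is hypothetical, and small discrepancies in other rows (for example R31L/H59F) are consistent with rounding of the tabulated values.

```python
def additivity_gap(average_energy, expected_energy):
    """Return deltaE = average - expected (kcal/mol), rounded to 2 decimals.

    Values near zero suggest additive effects; positive values mean the
    combined mutant is predicted to be less stabilized than expected.
    """
    return round(average_energy - expected_energy, 2)


# (average energy, expected energy) copied from Table S 16.
rows = {
    "R31L+C96Y": (-3.64, -3.74),
    "V52T+C96Y": (-3.27, -3.91),
    "FGF2-G2": (-7.23, -7.74),
}
gaps = {name: additivity_gap(avg, exp) for name, (avg, exp) in rows.items()}
```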
Table S 17. Thermostability of FGF2-G1 mutants determined by thermal shift assay. Mutations
selected for construction of recombined FGF2-G3 mutant are highlighted in bold.
FGF2 variant  Tm (°C)¹  ΔTm (°C)  FGF2 variant  Tm (°C)¹  ΔTm (°C)
FGF2-G0 50.7 0.0 ±0.3 C96R 51.2 0.5 ±0.2
E54D 52.8 2.1 ±0.1 C96S 51.7 0.9 ±0.1
C78M 51.0 0.3 ±0.1 C96N 52.5 1.7 ±0.3
R90K 48.2 -2.6 ± 0.3 C96W 50.0 -0.7 ±0.5
R90A 48.1 -2.6 ± 0.4 T121C 49.8 -1.0 ±0.1
R90V 47.3 -3.4 ±0.2 T121F 49.0 -1.7 ±0.3
R90N 46.9 -3.9 ±0.0 T121A 50.9 0.1 ±0.1
S94I 52.3 1.5 ±0.3 T121P 53.6 2.9 ± 0.3
S94T 51.3 0.6 ±0.0 T121R 49.7 -1.1 ±0.0
S94M 49.7 -1.0 ±0.1 T121H 50.0 -0.7 ±0.1
S94V 51.1 0.3 ± 0.5 T121Q 51.8 1.0 ±0.3
S94N 49.7 -1.0 ±0.3 T121G 51.0 0.3 ±0.1
S94L 51.4 0.6 ±0.1 T121Y 47.9 -2.8 ±0.0
S94R 48.0 -2.7 ± 0.3 S152Q 48.5 -2.2 ±0.2
S94C 48.4 -2.4 ± 0.7 S152R 48.9 -1.8 ±0.1
S94G 50.7 0.0 ±0.4 S152N 49.7 -1.1 ±0.2
C96Q 52.1 1.3 ±0.2 S152V 47.8 -2.9 ± 0.4
n.d., not determined; Tm, melting temperature; ΔTm, change in melting temperature upon mutation; ¹The
average from three independent experiments ± standard deviation is presented.
Table S 18. Estimated parameters of the global fit based on the three- and four-step models. The
values are given with 95% confidence intervals calculated from the fitting under the assumption
of asymptotic normality of the residuals.
Step    Parameter     G0              G2              G3               Units
Step 0  ΔHvH          -               -               291.9 ± 8.3      kJ.mol⁻¹
Step 0  Tm            -               -               343.5 ± 0.3      K
Step 0  ΔG0(25°C)     -               -               38.6 ± 1.3       kJ.mol⁻¹
Step 1  Ea1           264.5 ± 1.2     461.4 ± 5.4     462.2 ± 4.5      kJ.mol⁻¹
Step 1  Tf1           347.6 ± 0.1     349.0 ± 0.1     354.4 ± 0.0      K
Step 1  ΔG*1(25°C)    110.9 ± 0.3     140.5 ± 0.9     146.6 ± 0.7      kJ.mol⁻¹
Step 2  Ea2           101.9 ± 1.2     207.3 ± 11.9    92.8 ± 18.2      kJ.mol⁻¹
Step 2  Tf2           393.1 ± 0.9     368.9 ± 0.4     410.2 ± 14.8     K
Step 2  ΔG*2(25°C)    97.9 ± 0.5      113.0 ± 2.5     98.6 ± 7.8       kJ.mol⁻¹
Step 3  Ea3           35.6 ± 0.7      232.6 ± 28.7    37.8 ± 10.7      kJ.mol⁻¹
Step 3  Tf3           658.7 ± 13.0    375.5 ± 1.0     613.5 ± 133.0    K
Step 3  ΔG*3(25°C)    92.8 ± 0.7      121.2 ± 6.5     92.7 ± 9.7       kJ.mol⁻¹
6 Balancing the Stability-Activity Trade-Off by Fine-Tuning
Dehalogenase Access Tunnels
Veronika Liškova,[1,2] David Bednář,[1,2] Dr. Tatyana Prudnikova,[3,4] Dr. Pavlina Řezačova,[5,6]
Dr. Tana Koudelakova,[1] Dr. Eva Šebestová,[1] Assoc. Prof. Ivana Kuta Smatanova,[3,4]
Dr. Jan Brezovský,[1] Dr. Radka Chaloupková,*[1] Prof. Jiří Damborský*[1,2]
1 Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic
Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00
Brno (Czech Republic)
2 International Clinical Research Center, St. Anne's University Hospital, Pekařská 53, 656 91 Brno
(Czech Republic)
3 Faculty of Science University of South Bohemia in Ceske Budějovice, Branisovska 31, 370 05 Ceske
Budějovice (Czech Republic)
4 Institute of Nanobiology and Structural Biology, Academy of Sciences of the Czech Republic, Zamek
136, 373 33 Nove Hrady (Czech Republic)
5 Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Videnska 1083, 142 20
Prague 4 (Czech Republic)
6 Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic,
Flemingovo nam. 2, 166 10 Prague 6 (Czech Republic)
ChemCatChem, 2015, 7, 648-659.
DOI: 10.1002/cctc.201402792
6.1 Abstract
A variant of the haloalkane dehalogenase DhaA with greatly enhanced stability and
tolerance of organic solvents but reduced activity was created by mutating four residues in
the access tunnel. To create a stabilized enzyme with superior catalytic activity, two of the
four originally modified residues were randomized. The resulting mutant F176G exhibited
32- and 10-times enhanced activity towards 1,2-dibromoethane in buffer and 40% DMSO,
respectively, while retaining high stability. Structural and molecular dynamics analyses
showed that the new variant exhibited superior activity because the F176G mutation
increased the radius of the tunnel's mouth and the mobility of α-helices lining the tunnel.
The new variant's tunnel was open in 48 % of snapshots, compared to 58 % for the wild-type,
but only 0.02 % for the original four-point variant. The delicate balance between activity
and stability of enzymes can be manipulated by fine-tuning the diameter and dynamics of
their access tunnels.
6.2 Introduction
Enzymes are natural catalysts with great potential for industrial applications. Most natural
enzymes are not tolerant of harsh conditions such as those associated with extremes of pH,
elevated temperatures, high salinity, or the presence of organic solvents. The development
of thermodynamically and kinetically stable enzymes that retain high activity under harsh
operating conditions has thus been a major but challenging goal in protein engineering over
the last few decades. Enzymes' structural stability is usually maintained via non-covalent
interactions including hydrogen bonds, salt bridges, hydrophobic interactions and van der
Waals forces, all of which help to enhance the robustness of these biocatalysts. As such,
the most common strategy for increasing enzyme stability is to introduce new interactions
that increase the activation energy of enzyme denaturation[66]. However, many protein
mutagenesis studies have shown that stability and function are often tightly related: mutations
that increase stability often reduce function and vice versa[275-278]. This negative
correlation between enzyme stability and functionality is known as the stability-activity
trade-off.
Several different protein engineering techniques can be used to manipulate enzymes'
properties, depending on the property of interest and the available information on the
enzyme's structure-function relationships. In general, there are three main strategies for
developing modified enzymes: rational design, directed evolution and semi-rational
design[279]. Rational design uses various computational tools to identify key residues and
predicts the effects of mutations based on knowledge of tertiary structure and structure-function
relationships. This approach usually generates limited numbers of mutants,
necessitating only a modest amount of laboratory work on screening and selection[1,280-281].
In contrast, directed evolution uses random mutagenesis to generate mutant libraries with
mutations across the entire gene sequence. These libraries are then screened or selected
to identify improved variants. This strategy does not require structural information about
the target protein but an efficient screening or selection assay is essential for the
identification of interesting hits[282,283]. Semi-rational design combines the benefits of the
directed evolution and rational design approaches. Specific residues or segments of the
enzyme's structure are identified by the rational approach and then subjected to
mutagenesis to create focused libraries that do not require laborious screening. At the
same time, these small focused libraries have much higher hit rates than those generated
in directed evolution, while including mutations that would be unlikely to be identified or
explored on the basis of computational results[204,284-286].
Haloalkane dehalogenases (HLDs; EC 3.8.1.5) are enzymes that catalyze the hydrolytic
cleavage of carbon-halogen bonds in halogenated hydrocarbons to yield the corresponding
alcohol, a proton and a halide[165]. HLDs have potential applications in bioremediation[287,288],
decontamination[169], industrial biocatalysis[166,167], biosensing[170,289] and cell imaging[171,290,291].
However, their utility in these applications is limited by their low stability and activity under
the harsh conditions that are often required[220,240,292].
A stabilized variant of the HLD DhaA from Rhodococcus rhodochrous NCIMB 13064[172] has
been created by Gene Site Saturation Mutagenesis (GSSM)[221]. Compared to wild-type
DhaA, the resulting highly stable ten-point DhaA mutant (named DhaA63 in Koudelakova
et al. 2013[220]) was 30,000 times more capable of refolding after denaturation at 55 °C.
However, its catalytic efficiency in aqueous buffer was six times lower than that of the wild-type
enzyme. Several other DhaA variants[220] with substantially improved stability and an
ability to tolerate the presence of the organic co-solvent dimethylsulfoxide (DMSO) were
obtained by a combination of random mutagenesis and focused directed evolution[220].
Detailed biochemical and structural analysis of these variants revealed that their
stabilization was largely due to mutations in the residues that form the access tunnel, which
connects the enzyme's buried active site to the surrounding solvent. While the tunnel
mutants exhibited increased stability and retained catalytic activity in 40% (v/v) DMSO,
their activity in buffer solutions was low. The DhaA80 variant, which carries four of the ten
substitutions found in DhaA63 (T148L, G171Q, A172V and C176F), exhibited 4000-fold
greater kinetic stability than the wild-type enzyme in 40% (v/v) DMSO and remained stable
at temperatures up to 16.4 °C higher than those inactivating the wild-type. However, the
catalytic activity of engineered DhaA80 towards 1,2-dibromoethane in a buffer solution
was reduced by two orders of magnitude. The stabilization of DhaA80 was due to the
introduction of four residues at the opening of the access tunnel, three of which were bulky
and hydrophobic. These mutations led to enhanced intramolecular hydrophobic packing at
the tunnel opening and possibly prevented the destabilization of the protein structure due
to the admission of DMSO molecules to the active site[220].
This work aimed to address these issues by enhancing the catalytic activity of the highly
stable DhaA80 variant in buffer solutions. Mutagenesis targeting two of the four tunnel
mouth residues that were replaced to create DhaA80 (V172 and F176) led to the
identification of the DhaA106 variant, which differs from DhaA80 in only a single residue
(F176G) but exhibits 32 times greater catalytic activity in a buffer while sacrificing only 4 °C
of thermal stability. Moreover, DhaA106 exhibited enhanced activity towards 26 out of 30
tested halogenated compounds and thus replicates the substrate specificity of the wild-type
enzyme. Crystallographic analysis followed by molecular modelling revealed that the
enhanced catalytic activity of DhaA106 is due to an increase in the diameter of the access
tunnel and the mobility of the adjacent secondary structure elements.
6.3 Results
6.3.1 Rational design of two focused libraries
Compared to the wild-type enzyme, the DhaA80 variant bears four substitutions (T148L,
G171Q, A172V and C176F) in the tunnel-lining residues. These were introduced during a
combined mutation and screening campaign that tested the activity of the mutated
enzymes towards 1,2-dibromoethane in 40% (v/v) DMSO. One of these tunnel residues
(Q171) showed particularly high variability in the study of Gray et al.[221] and was
randomized in our previous study with the goal of further enhancing the enzyme's
stability[220]. Several of the resulting variants exhibited improved thermostability at the
expense of catalytic activity. Previous saturation mutagenesis of T148 with the degenerate
codon NNK provided only the substitution by leucine in two independent libraries. We
therefore performed mutagenesis experiments targeting the other two tunnel residues
(V172 and F176), using DhaA80 as a template. In order to minimize losses of enzyme
robustness, the stabilization potentials of all possible combinations of substitutions,
including self-mutations in the two target positions, were investigated using FoldX[94].
Neutral or stabilizing effects were predicted for 15 of the 800 possible variants (Table S 19).
Val, Ile and Leu were stabilizing or neutral residues in position 172 while Leu, Met, Trp and
Phe were identified as the stabilizing or neutral residues in position 176. Degenerate
codons encoding all of the strongly stabilizing residues identified in the two positions were
designed using CASTER v2.0[293]. The resulting degenerate codons VTY and WKS were
used to construct a smart library in which positions 172 and 176 were simultaneously
saturated (library I). In addition, a second library (library II) was constructed in which the
codon for position 176, which had not been previously mutated by GSSM[221], was replaced
with the degenerate codon NNK, which encodes all of the standard amino acid residues.
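To make the degenerate codons concrete, the following Python sketch expands VTY, WKS and NNK with the IUPAC nucleotide codes and the standard genetic code. It is an illustration only and not the CASTER procedure; the helper names `expand` and `encoded_residues` are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate):
    """Return all concrete codons covered by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[base] for base in degenerate))]


def encoded_residues(degenerate):
    """Return the set of one-letter residues (and "*") that are encoded."""
    return {CODON_TABLE[codon] for codon in expand(degenerate)}


position_172 = encoded_residues("VTY")
position_176 = encoded_residues("WKS")
nnk = encoded_residues("NNK")
```

Under the standard code, VTY covers Ile, Leu and Val, WKS covers the four residues named for position 176 together with Cys, Ile, Arg and Ser, and NNK covers all twenty amino acids plus one stop codon.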
6.3.2 Screening of libraries and biochemical characterization of variant DhaA106
Both libraries were constructed using QuickChange Site-Directed Mutagenesis Kit (Agilent
Technologies, Santa Clara, USA). Colonies of the libraries were screened using a pH
indicator-based colorimetric assay optimized for the presence of the organic co-solvent
DMSO[294]. In total, the activities of 142 and 94 colonies from libraries I and II, respectively,
were tested against 1,2-dibromoethane in 52% (v/v) DMSO. The number of colonies tested
in each case represented at least 95 % of the total variation in each library. Two positive
hits were identified in each library. The hit with the greatest improvement in activity
relative to the template in the presence of DMSO and in buffer solution was the single-point
mutant F176G, which was designated DhaA106 (Table S 20). This variant was
expressed in a larger volume of Escherichia coli BL21 cells, purified to homogeneity and
subjected to detailed biochemical and structural analysis. Its properties were compared to
those of the template (DhaA80), the wild-type enzyme (DhaA) and the previously described
highly thermostable and solvent-tolerant DhaA63 variant.
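The stated library coverage can be sanity-checked with a simple sampling estimate. The sketch below assumes that variants are counted at the codon level (6 × 8 = 48 codon combinations for VTY/WKS in library I, 32 codons for NNK in library II) and that all variants are equally likely; the text does not state which estimator was actually used, and the function name `expected_coverage` is hypothetical.

```python
def expected_coverage(n_variants, n_colonies):
    """Expected fraction of equiprobable variants seen at least once."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_colonies


library_I = expected_coverage(6 * 8, 142)   # VTY x WKS codon combinations
library_II = expected_coverage(32, 94)      # NNK codons
```

Under these assumptions both libraries come out at roughly 95 % expected coverage, in line with the figure quoted above.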
Circular dichroism spectroscopy in the far-UV spectral region was used to assess the impact
of the single-point F176G mutation on the folding and secondary structure of DhaA106.
The circular dichroism spectrum of DhaA106 was identical to those of the other DhaA
variants (Figure S 16): all of the enzymes' spectra exhibited a single positive peak at 195 nm
and two negative maxima at 208 and 222 nm, features that are characteristic of α-helical
content[295]. Thermal denaturation experiments were performed to test the effect of the
F176G mutation on the thermal stability of DhaA106. The melting temperature of DhaA106
was 62.7 °C, which is around 4 °C and 6 °C lower than those of DhaA80 and DhaA63,
respectively. However, it is 12 °C higher than the value for DhaA (Table 4). This result was
not unexpected because the F176G mutation replaces a hydrophobic and sterically
demanding phenylalanine residue in the enzyme's tunnel mouth with a small glycine residue,
eliminating stabilizing hydrophobic and van der Waals interactions. The phenylalanine
residue was previously estimated to increase the melting temperature by 5 °C relative to
the wild-type enzyme[220].
Table 4. Melting temperatures of DhaA variants.
Variant   Tm [°C]
DhaA      50.4 ± 0.3[a]
DhaA63    68.3 ± 0.3[a]
DhaA80    66.8 ± 0.2[a]
DhaA106   62.7 ± 0.1
[a] Data from Koudelakova et al. 2013[296].
The specific activities of the purified DhaA variants were tested against 1,2-dibromoethane
in buffer solution and buffer solutions containing either 40% or 52% DMSO (v/v)
(Table S 21). In the absence of DMSO, the specific activity of
DhaA106 was 32- and 9-fold higher than that of DhaA80 and DhaA63, respectively. While
the activity of DhaA106 in a pure buffer was significantly greater than that of the template
DhaA80, it was 70 % lower than that of wild-type DhaA.
Interestingly, activity tests in the presence of DMSO revealed that DhaA106 was the most
active tested variant. Its activity in 40% (v/v) DMSO
was 10 times greater than the template's and twice as high as that of the most
temperature- and solvent-resistant variant DhaA63. Raising the DMSO concentration in the
reaction buffer to 52 % significantly reduced the activity of all DhaA variants but did not
change their order of activity in terms of co-solvent tolerance (DhaA106 > DhaA63 >
DhaA80 > DhaA). These results imply that the F176G mutation in the mouth of the enzyme's
access tunnel significantly enhanced its activity in aqueous environments and in the
presence of DMSO.
[Figure panels a), b) and c): bar charts for DhaA, DhaA63, DhaA80 and DhaA106]
To better understand the origin of this change in activity, steady-state kinetic constants
were determined for conversion of 1,2-dibromoethane by DhaA106 and compared to those
for DhaA80, DhaA63 and DhaA (Figure 19, Table 5 and Table 6). In an aqueous environment,
the F176G mutation significantly increased the enzyme's catalytic rate, suppressed
substrate inhibition and reduced the free enzyme's affinity for 1,2-dibromoethane. The
turnover number of DhaA106 in buffer was 32- and 5-fold higher than those for DhaA80
and DhaA63, respectively, and 3-fold lower than that of DhaA (Figure 19a, Table 5). Adding
40% (v/v) DMSO to the reaction mixture reduced the kcat values for all of the studied DhaA
variants while increasing their Km and Ksi constants, implying that the organic co-solvent
acts as a mixed inhibitor (Figure 19b, Table 6). Notably, DMSO very strongly reduced the
affinity of free DhaA106 and the enzyme-substrate complex for 1,2-dibromoethane.
DhaA106 consequently exhibited no substrate inhibition and had the highest catalytic rate
of the tested variants in the presence of DMSO (Table 6).
Table 5. Steady-state kinetic parameters of DhaA variants with 1,2-dibromoethane in buffer
solution.
Variant   Km [mM]          Ksi [mM]         kcat [s⁻¹]        kcat/Km [s⁻¹ mM⁻¹]
DhaA      3.56 ± 0.67[a]   3.56 ± 0.70[a]   28.67 ± 3.95[a]   8.05 ± 2.63[a]
DhaA63    1.70 ± 1.28[a]   0.24 ± 0.20[a]   2.23 ± 1.46[a]    1.31 ± 1.85[a]
DhaA80    0.13 ± 0.09[a]   0.41 ± 0.33[a]   0.34 ± 0.14[a]    2.62 ± 2.89[a]
DhaA106   0.89 ± 0.08[b]   1.28 ± 0.11      11.01 ± 0.23      12.37 ± 1.37
[a] Data from Koudelakova et al. 2013[296]. [b] Cooperativity with Hill coefficient n = 1.36 ± 0.13.
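The parameters in Table 5 can be related to reaction rates with the textbook substrate-inhibition equation, v/[E] = kcat·[S] / (Km + [S]·(1 + [S]/Ksi)). The Python sketch below assumes this standard form without cooperativity; the exact model fitted in the study is not given here, so the sketch does not apply to the DhaA106 row with its Hill coefficient. The function names are hypothetical.

```python
import math


def rate_per_enzyme(s, kcat, km, ksi):
    """Turnover rate (s^-1) at substrate concentration s (mM).

    Standard substrate-inhibition model; assumes no cooperativity.
    """
    return kcat * s / (km + s * (1.0 + s / ksi))


def optimal_substrate(km, ksi):
    """Substrate concentration (mM) that maximizes the rate in this model."""
    return math.sqrt(km * ksi)


# Table 5 values for wild-type DhaA and DhaA80 in buffer.
dhaa_rate_at_km = rate_per_enzyme(3.56, kcat=28.67, km=3.56, ksi=3.56)
dhaa80_optimum = optimal_substrate(km=0.13, ksi=0.41)
```

Because Km and Ksi are equal for wild-type DhaA, the rate at [S] = Km is exactly kcat/3 in this model, which shows how strongly substrate inhibition limits the observable rate.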
Table 6. Steady-state kinetic parameters of DhaA variants with 1,2-dibromoethane in 40% (v/v)
DMSO.
Variant   Km [mM]          Ksi [mM]           kcat [s⁻¹]       kcat/Km [s⁻¹ mM⁻¹]
DhaA      ND[a]            ND[a]              ND[a]            ND[a]
DhaA63    1.08 ± 0.26[b]   41.44 ± 14.14[b]   0.72 ± 0.06[b]   0.66 ± 0.22[b]
DhaA80    0.88 ± 0.16[b]   13.14 ± 2.73[b]    0.25 ± 0.02[b]   0.28 ± 0.07[b]
DhaA106   11.17 ± 1.38     NA[c]              3.14 ± 0.21      0.28 ± 0.05
[a] ND, data could not be collected under the comparable conditions due to protein instability. [b] Data from Koudelakova et
al. 2013[296]. [c] NA, not applicable.
Figure 19. Steady state kinetic profiles of DhaA variants (DhaA in blue, DhaA63 in yellow, DhaA80 in green
and DhaA106 in red) at 37 °C. a) in a buffer and b) in 40 % (v/v) DMSO. Note the different scales in the two
figures. The kinetic profile of DhaA in 40 % DMSO was measured under different conditions to the other
variants (the duration of the experiment was limited to 3 min) due to its very low stability in the experimental
environment.
The specific activity of DhaA106 was tested further using a set of 30 halogenated substrates
(Table S 22) to determine whether the F176G mutation
affected its activity in aqueous buffer towards substrates other than 1,2-dibromoethane.
The activity of DhaA106 was greater (between 2 and 49 times higher) than that of DhaA80
for all tested compounds. The enzyme was most active towards multisubstituted C2-C3
bromoalkanes including 1,2-dibromoethane; 1,2-dibromopropane; 1,2,3-tribromopropane
and 1,2-dibromo-3-chloropropane.
In addition, the activity of DhaA106 was comparable to or greater than that of DhaA for
more than half of the tested substrates. Principal component analysis (PCA) of the
transformed activity data set was used to explore the relationships between the individual
DhaA variants (Figure S 17). A similar analysis demonstrated that wild-type HLD enzymes
cluster into four distinct substrate specificity groups (SSGs)[223]. Like DhaA and DhaA80,
DhaA106 was found to belong to SSG-I (Figure S 17a). Enzymes in SSG-I are robust catalysts
with high activity towards brominated ethanes and propanes, and detectable activity
towards poorly degradable compounds such as 1,2-dichloroethane, 1,2-dichloropropane
and 1,2,3-trichloropropane[223].
Although all of the tested DhaA variants belong to the same SSG, their substrate
preferences differed to some extent as demonstrated by their different positions on the
PCA scores plot (Figure S 17a). The relative activity of DhaA106 was more than one order
of magnitude greater than that of DhaA80 for several substrates including 1-chlorobutane;
1,3-dichloropropane; 1,2-dibromoethane; 1,2-dibromopropane; 4-bromobutyronitrile;
1,2,3-tribromopropane and 1,2-dibromo-3-chloropropane. Unlike DhaA, DhaA106
exhibited decreased preference for substrates with longer alkyl chains such as 1-bromohexane;
1-iodohexane; and (1-bromomethyl)-cyclohexane, as well as disubstituted
C2-C3 haloalkanes such as 1,2-dibromoethane; 1,3-dibromopropane; and 2,3-dichloropropene.
6.3.3 Crystallographic analysis of DhaA106
The structure of DhaA106 was solved at the resolution of 1.69 Å (Table S 23) by molecular
replacement using the structure of DhaA14 (PDB ID 3G9X[297]) as a search model. The
resulting diffraction data enabled the localization of residues 4-295, showing that the
enzyme exists as a monomer in the crystal with a solvent content of approximately 41.96 %.
As expected, the overall structure of DhaA106 resembles that of DhaA[297,298], consisting
of an α/β-hydrolase core domain and a helical cap domain (Figure S 18). The core domain
is formed by a central twisted eight-stranded β-sheet (mostly parallel, with a single
antiparallel β2-strand) surrounded by six α-helices. The cap domain consists of five α-helices
linked by six loop insertions. The active site is located in a predominantly
hydrophobic cavity, at the interface between the core and the cap domains, connected to
the protein surface by two access tunnels. Visual inspection of the crystal structure
revealed that the F176G mutation changed the diameter of the main access tunnel and the
intramolecular contacts between its hydrophobic residues.
6.3.4 Molecular dynamics and access tunnels analysis
Molecular dynamics (MD) simulations were performed to further explore the structural
basis of the enhanced catalytic activity and reduced thermostability of DhaA106. Two
independent 200 ns long simulations were run for each of DhaA, DhaA63, DhaA80 and
DhaA106. CAVER 3.01[299] was then used to analyze 100,000 snapshots from each simulation
to identify the access tunnels and to provide information on the opening and closing of the
access tunnel as well as time-resolved changes in bottleneck radii. Aside from the main
access tunnels, slot tunnels were identified in the structures of all studied enzymes. The
slot tunnels showed less favorable geometric parameters than the main access tunnels, as
deduced from the tunnel width, length and curvature (Table S 24), indicating that the main
tunnel acts as the preferred pathway for transport of studied substrates and products. The
main access tunnel of DhaA was detected in 94 % of the snapshots taken during the
simulation and was open in 58 % of the snapshots. The average tunnel bottleneck radius
was 1.5 Å, with a maximum value of 3.1 Å (Figure 20, Table S 25). The secondary structures
of the DhaA cap domain exhibited substantial flexibility: the distance between the two
helices situated on the opposite sides of the tunnel (quantified in terms of the distance
between the Cα atoms of residues F144 and C176) ranged from 7.0 to 14.0 Å (Figure S 19).
The access tunnels of DhaA63 and DhaA80 showed very similar properties. The main access
tunnel was detectable in 6 % and 1 % of the snapshots for DhaA63 and DhaA80,
respectively. Similarly, the tunnel was only open in 0.05 % of the DhaA63 snapshots and
0.02 % of those for DhaA80. These mutants had identical average bottleneck radii of 1.1 Å,
with a maximal radius of 1.7 Å (Figure 20, Table S 25). The separation of the helices in the
cap domain was somewhat more constrained than in the wild-type protein, ranging from
8.0 to 13.0 Å in both cases (Figure S 19). Both of these variants have four bulky residues in
the main access tunnel that are not present in the wild-type enzyme and which serve to
restrict the tunnel's opening while making the cap domain more rigid. The C176F mutation
in the tunnel entrance was particularly important in narrowing the main pathway to the
active site in these two enzymes.
The modified access tunnel of DhaA106 is strikingly similar to that of wild-type DhaA. The
F176G mutation in DhaA106 created a void in the tunnel entrance that is not present in
DhaA63 and DhaA80, and enhanced the mobility of the cap domain's secondary elements.
The tunnel was detected in 86 % of the snapshots and was open in 48 % of them. The
average tunnel bottleneck radius was 1.4 Å, with a maximum value of 2.8 Å (Figure 20,
Table S7 in the Supporting information). The distance between helices ranged from 7.5 to
14.0 Å (Figure S 19). The F176G mutation removed some of the contacts between residues
within the access tunnel of DhaA106, explaining its lower thermodynamic stability
compared to the template DhaA80. However, because DhaA106 retains three stabilizing
mutations (T148L, G171Q and A172V), it is much more stable than the wild-type enzyme.
[Figure 20: cap domain structures (left) and bottleneck radius versus time, t/ns, for simulations MD1 and MD2 (right)]
Figure 20. Visualization of the representative structures of the main tunnels in the cap domains of the
studied enzymes and changes in the tunnel bottlenecks over time. Left: PyMOL 1.5 visualizations of the cap
domain residues (shown as green cartoon) and the tunnel (indicated by the red spheres). The side chains of
the residues located at the 148, 171, 172, and 176 positions (i.e. the positions mutated in DhaA63) in each
enzyme are represented by sticks and the location of residue 176 in DhaA106 is indicated by a black arrow.
Right: The evolution of the bottleneck radius (BR) in two independent 200 ns long molecular dynamics (MD)
simulations. The black horizontal lines indicate the threshold radius (1.4 Å) above which the tunnel was
considered to be open. The tunnels were analyzed using CAVER 3.01 [299].
6.4 Discussion
The development of new approaches for the rational engineering of stable catalysts that
retain catalytic activity is a key challenge in protein engineering [300]. This work aimed to
improve the catalytic activity of the recently constructed, highly stable and solvent-resistant
DhaA80 [296] while minimizing losses of thermodynamic stability. To this end, the effects of
all possible substitutions in the targeted tunnel positions (F176 and V172) were evaluated
using the computational tool FoldX [94]. A smart saturation mutagenesis library was then
constructed featuring enzyme variants incorporating every possible combination of the
predicted stabilizing or neutral access tunnel substitutions (library I). In addition, a second
library was constructed in which one of the target positions in the tunnel mouth (F176) was
randomized using site-saturation mutagenesis (library II). The best variant was DhaA106,
which was obtained by site-saturation mutagenesis and exhibited significantly enhanced
activity in the presence and absence of DMSO. The catalytic activity of DhaA106 towards
1,2-dibromoethane in buffer solution and in 40% (v/v) DMSO was 32 and 10 times greater,
respectively, than that of DhaA80, and its melting temperature (which reflects its
thermodynamic stability) was only 4 °C lower. DhaA106 also exhibited significantly enhanced
activity (relative to DhaA80) towards 26 of 29 additional halogenated compounds, showing
similar levels of activity to wild-type DhaA.
Sequencing of DhaA106 revealed that it contained a substitution that would be difficult to
design rationally. The variant carries a small glycine residue in the tunnel mouth in place of a
bulky phenylalanine. Glycine has only a hydrogen atom as its side chain, giving it much
more conformational freedom than other amino acids. Its low steric demand means that
adjacent residues have much more flexibility than they would otherwise [301, 302]. Because
high flexibility is often associated with low stability in proteins, one common strategy for
enhancing their stability is to rigidify their most flexible regions [64, 66, 303-305]. However, it is
necessary to maintain a balance between stability and flexibility in order to retain the
protein's biological functionality. Stability ensures an appropriate geometry for ligand
binding and prevents denaturation under physiological conditions, while flexibility is
necessary to allow catalysis at a metabolically appropriate rate [306-308].
In this work, replacing a bulky phenylalanine in the tunnel mouth of DhaA80 with the
smallest amino acid, glycine, led to a variant (DhaA106) in which the intramolecular
hydrophobic packing of the tunnel residues was partially disrupted, reducing the protein's
stability (ΔTm = -4 °C). However, this also increased the flexibility and mobility of the two
α-helices lining the main tunnel, increasing the chance of the tunnel being in an open state.
Consequently, the variant was more catalytically active than the template. The main access
tunnel of DhaA106 was identified in 86 % and found to be open in 48 % of the
molecular dynamics snapshots that were analyzed; the corresponding values for the
template enzyme DhaA80 were only 1 and 0.02 %. The frequency of tunnel opening in
DhaA106 was comparable to that of DhaA, whose main tunnel was identifiable in 94 % of
its snapshots and open in 58 %. We hypothesize that reopening of the access tunnel
facilitates the admission of the substrate to the active site or the release of the product,
while the remaining three bulky and hydrophobic mutations in the access tunnel [296]
significantly increase the enzyme's thermodynamic stability relative to the wild-type (ΔTm
= 12 °C).
It has previously been shown that modifying the size, physico-chemical properties and
dynamics of access tunnels by protein engineering can change the catalytic activity,
substrate specificity, enantioselectivity and stability of HLDs [238, 280, 296, 309, 310] and many
other enzymes with buried active sites [311] such as cytochrome P450s [312-316],
β-glucosidases [317], lipases [286, 318-322], esterases [323], and epoxide hydrolases [213, 324, 325].
Tunnel mouth engineering was shown to have profound effects on the activity and
specificity of the enzyme LinB from Sphingobium japonicum UT26 [280]. The residue L177,
located in the tunnel opening at the position corresponding to F176 in DhaA80, was
selected for saturation mutagenesis on the basis of structural and phylogenetic analyses.
The effects of the resulting mutations on the variants' catalytic activities differed greatly
between individual substrates [280]. Similar findings have been reported for epoxide hydrolases
from Agrobacterium radiobacter AD1 [325] and Aspergillus niger M200 [324], in which the
engineering of a single amino acid in the tunnel mouth led to improved enzyme activity and
enantioselectivity. As with DhaA106, the catalytic activity of LinB variants was generally
increased by introducing a small non-polar amino acid at position 177, whereas the
introduction of bulky aromatic or charged residues dramatically reduced activity towards
most substrates, including 1,2-dibromoethane. The small side chain of the introduced
glycine residue in the tunnel mouth of the LinB L177G mutant was proposed to increase
the radius of the tunnel mouth, thereby facilitating substrate entry and product release.
Conversely, the bulkier side chain of the introduced tryptophan residue in the LinB L177W
variant presumably blocked the mouth of the enzyme's main access tunnel, reducing its
catalytic activity [280]. A detailed analysis of 1,2-dibromoethane passage through the access
tunnel of the LinB L177W mutant confirmed that this mutation significantly reduced the
rate of product release [326]. Moreover, mutations at position 177 of LinB significantly affected
its thermal stability [310]. Similar observations were also reported for β-glucosidase from
Trichoderma reesei, whose activity and stability were significantly affected by mutations in
the substrate entrance region [317].
The influence of the tunnel-lining residues on the catalytic activity of DhaA towards
1,2,3-trichloropropane (TCP) has been studied extensively [221, 238, 309, 327, 328]. Independently
performed error-prone PCR experiments generated two double-point mutants,
G3D+C176F [221] and C176Y+Y273F [327], whose activities towards TCP are 4 and 3.5 times
greater than that of the wild-type, respectively. Interestingly, both mutants carried bulky
residues (tyrosine or phenylalanine) in the 176 position whereas the wild-type enzyme has
a comparatively small cysteine residue in this position. The mutants thus have much
narrower access tunnel entrances [309]. Molecular dynamics simulations and structure-based
enzyme design identified the 176 position and another four access tunnel residues as being
crucial for the activity of DhaA. Mutagenesis at these positions yielded a variant whose
catalytic activity and efficiency towards TCP were 32- and 26-times higher, respectively,
than those for the wild-type. This variant had bulky aromatic residues at four of the five
targeted positions, which restricted the access of water molecules to the active site cavity.
The rate-limiting step of TCP conversion in the resulting variant was shifted from
carbon-halogen bond cleavage to the release of the reaction products [238].
Sealing the access tunnel of DhaA with bulky residues has previously been identified as a
viable strategy for enhancing its thermodynamic stability and resistance to the organic
co-solvent DMSO [296]. The introduction of four bulky hydrophobic residues into the access
tunnel yielded a DhaA variant (DhaA80) with a closed tunnel exhibiting enhanced
intramolecular packing. This modification prevents destabilization of the protein's
structure due to the admission of DMSO into the active site. Rigidifying and narrowing the
tunnel in this way shields the interior of the protein from the organic solvent but
presumably also makes the exchange of substrate and product molecules between the
active site and bulk solvent more difficult. Stabilizing the protein in this way therefore
reduces its activity, demonstrating the need to strike a careful balance between protecting
the buried active site from solvent molecules (which may cause denaturation or compete
with the desired substrate) and retaining sufficient flexibility for catalytic activity.
In this study, we showed that it was possible to create a DhaA variant with a superior
trade-off between activity and stability relative to that seen in DhaA80. This was achieved by
replacing a sterically demanding phenylalanine residue at the mouth of the main access
tunnel with a small glycine residue. This substitution enhanced the flexibility of the two
α-helices that form the tunnel but did not greatly reduce the protein's resistance to organic
co-solvents or its tolerance of elevated temperatures because other bulky hydrophobic
residues inside the tunnel were retained. A potentially viable alternative strategy for
balancing activity and stability in this case would be to introduce a molecular gate in the
access tunnel. Molecular gates are dynamic protein structures that regulate substrate
access to the active site and product release while preventing the access of undesirable
solvent molecules and synchronizing processes occurring in distant parts of a protein [329].
Introducing a gate should protect the enzyme against irreversible inactivation under harsh
conditions while maintaining good catalytic performance. Additionally, the auxiliary slot
tunnels could be optimized to further improve the stability of DhaA106
without significantly compromising its activity.
6.5 Conclusion
We have demonstrated that the catalytic performance of the thermodynamically robust,
but less active, HLD variant DhaA80 can be greatly enhanced by fine-tuning the geometry
and dynamics of its access tunnel. A single-point mutation (F176G) in the tunnel mouth
yielded the new variant DhaA106, which exhibits greater flexibility in the secondary
structure elements that form the access tunnel. The activities of DhaA106 towards
1,2-dibromoethane in buffer solution and in 40% (v/v) DMSO were 32 and 10 times greater,
respectively, than those of the template enzyme DhaA80, whereas its melting
temperature (which reflects its thermodynamic stability) was reduced by only 4 °C. The
high stability of the template enzyme was preserved because DhaA106 retains three
previously introduced bulky residues in the tunnel interior that provide good hydrophobic
packing and prevent solvent molecules from accessing the active site. In addition to its
enhanced activity towards 1,2-dibromoethane, DhaA106 also exhibited enhanced activity
towards 26 out of 29 other halogenated compounds. These results suggest that a fine
balance between tunnel flexibility and tight hydrophobic packing, as well as a precisely
engineered tunnel diameter, is important for HLD activity and stability. Tunnel residues
are thus good targets for modification when seeking to balance the activity and stability of
catalysts with buried active sites.
6.6 Experimental Section
Library design and predicting the effects of mutations on enzyme stability. The structure
of DhaA80 (PDB ID 4F60) was downloaded from the RCSB PDB database [234]. The structure
was prepared for analysis by removing ligands and water molecules. Missing atoms in side
chains were added using the <RepairPDB> module of FoldX [94]. The stability effects of all
possible double-point mutations in positions F176 and V172 of DhaA80 were estimated
using the FoldX <BuildModel> module [94]. Two variants differing in the specified order of
mutations were considered for each double-point mutant (e.g. V172A, F176A and F176A,
V172A for the F176A+V172A mutant). Calculations were performed 5 times for each variant
following the recommended protocol (pH 7, temperature 298 K, ionic strength 0.050 M,
VdWDesign 2). All stabilized (ΔΔG < -1 kcal/mol) and neutral (-1 kcal/mol < ΔΔG < 1
kcal/mol) mutants were selected and the frequencies of individual residues at target
positions were counted. Suitable degenerate codons for saturation mutagenesis were
chosen using the CASTER v2.0 program [293]. The degenerate codons were selected to encode
all frequent residues from the double-point mutants without producing an excessively large
library.
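The selection rule above can be illustrated with a minimal Python sketch. It averages repeated ΔΔG predictions for one variant and labels the variant with the thresholds given in the text (stabilizing below -1 kcal/mol, neutral between -1 and 1 kcal/mol). The function name classify_ddg, the handling of values lying exactly at ±1 kcal/mol and the example numbers are assumptions made for illustration; they are not FoldX output.

```python
from statistics import mean, stdev

def classify_ddg(ddg_runs, threshold=1.0):
    """Average repeated ddG predictions (kcal/mol) and label the variant.

    Returns (label, mean, standard deviation). Treating a mean of exactly
    -threshold as neutral and +threshold as destabilizing is an assumption.
    """
    avg = mean(ddg_runs)
    sd = stdev(ddg_runs)
    if avg < -threshold:
        label = "stabilizing"
    elif avg < threshold:
        label = "neutral"
    else:
        label = "destabilizing"
    return label, avg, sd

# Hypothetical values for five repeated predictions of one variant.
label, avg, sd = classify_ddg([-0.52, -0.41, -0.60, -0.35, -0.57])
print(label, round(avg, 2), round(sd, 3))
```

Variants labelled stabilizing or neutral would then be the ones carried forward when counting residue frequencies at the target positions.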
Library construction. Saturation mutagenesis was performed using the QuikChange
Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara, USA). Positions 172 and 176
of DhaA80 were saturated simultaneously using the following oligonucleotides
(Sigma-Aldrich, St. Louis, USA):
5'-GCTTTCATCGAGCAAVTYCTCCCGAAAWKSGTCGTCCGTCCGCTTACG-3' (forward) and
5'-CGTAAGCGGACGGACGACSMWTTTCGGGAGRABTTGCTCGATGAAAGC-3' (reverse).
Position 176 was independently saturated using a pair of oligonucleotides (Sigma-Aldrich,
St. Louis, USA): 5'-CGAGCAAGTGCTCCCGAAANNKGTCGTCCGTCCGCTTAC-3' (forward) and
5'-GTAAGCGGACGGACGACMNNTTTCGGGAGCACTTGCTCG-3' (reverse). The entire plasmid
pAQN::dhaA80His6 served as a template for PCR and was amplified according to the
manufacturer's protocol. PCR was performed using 50 µl reaction mixtures containing 10
ng of template DNA, 5 pmol of each oligonucleotide, and 0.2 mM dNTPs in Phusion HF
buffer with 1.5 mM MgCl2 and 1 U of Phusion DNA Polymerase. PCR proceeded under the
following conditions: 30 s at 95 °C, and then 18 cycles of 30 s at 95 °C, 60 s at 55 °C and 300
s at 68 °C; followed by 10 min at 72 °C. PCR products were then treated with the
methylation-dependent endonuclease DpnI for 5 min at 37 °C. The resulting plasmids were
transformed into Escherichia coli XJb(DE3) cells (ZymoResearch, Orange, USA) using the
standard electroporation protocol [330]. Ten candidates from each library were randomly
selected for sequencing.
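To make the content of the two libraries easier to check, the degenerate codons in the forward oligonucleotides above (read here as VTY at position 172, WKS at position 176 and NNK for the site-saturation library) can be expanded with a short Python sketch. It uses the IUPAC nucleotide codes and the standard genetic code; the helper names expand and encoded_residues are hypothetical and are not part of CASTER.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand(degenerate_codon):
    """Return every concrete codon represented by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]

def encoded_residues(degenerate_codon):
    """Return the sorted residues ('*' marks a stop) encoded by a degenerate codon."""
    return sorted({CODON_TABLE[c] for c in expand(degenerate_codon)})

for codon in ("VTY", "WKS", "NNK"):
    print(codon, len(expand(codon)), "".join(encoded_residues(codon)))
```

Under the standard genetic code, VTY covers I, L and V, WKS covers C, F, I, L, M, R, S and W, and NNK covers all twenty amino acids plus a single stop codon in 32 codons, so the combined VTY/WKS library contains 6 x 8 = 48 codon combinations.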
Cultivation in microtiter plates (MTP) and preparation of lysates. MTP wells filled with 150
µl of Luria-Bertani (LB) medium with ampicillin added to a final concentration of 100 µg ml⁻¹
were inoculated with single colonies using sterile toothpicks. Four wells were
inoculated with E. coli XJb pAQN::dhaA80His6 cells to serve as positive controls for basal
activity measurement and another four wells were inoculated with E. coli XJb carrying an
empty vector (pAQN) to serve as negative controls in the epPCR library screening. Cultures
were grown overnight at 37 °C at 200 r.p.m. After 14 hrs of cultivation (OD600 = 0.4), 50 µl
of culture from each well of the cultivation plate was added to 50 µl of 30% (v/v) glycerol in
new 96-well plates to create a replica plate for storage. Then, 100 µl of fresh LB medium with
ampicillin, L-arabinose at a final concentration of 3 mM and IPTG at a final concentration of
0.5 mM was added to each well of the cultivation plate, which was incubated at 30 °C at
200 r.p.m. for 4 hrs. Cells were harvested and frozen at -80 °C.
Library screening. Library screening was performed using the modified pH colorimetric
assay described by Holloway et al. [294]. The assay is based on the detection of the protons
produced during the dehalogenation reaction. After 10 min at room temperature, 50 µl of
the lysis buffer (1 mM HEPES, 20 mM Na2SO4 and 1 mM EDTA, pH 8.2) was added to each
well of the defrosted plates. After 1 h of incubation at 100 r.p.m. at room temperature, cell
debris was removed from the lysate by centrifugation at 1,600 g for 20 min. Then, 20 µl of lysate
was transferred into each well of a new MTP and 180 µl of assay buffer [52% DMSO (v/v),
1 mM HEPES, 20 mM Na2SO4 and 1 mM EDTA, pH 8.2] containing 1,2-dibromoethane (DBE,
9.3 mM) was added. The substrate was incubated in the reaction buffer at 37 °C for 30 min
before starting the reaction. The MTP was sealed carefully with a lid and parafilm.
After 14 hrs of dehalogenation, the reaction mixture was diluted for detection using a buffer
solution containing the pH indicator phenol red (1 mM HEPES, 20 mM Na2SO4 and 1 mM EDTA,
50 µg ml⁻¹ phenol red, pH 8.2). The change in the color of the pH indicator
was estimated by spectrophotometry at 540 nm as described by Holloway et al. [294].
Expression and purification of proteins. Recombinant plasmids with the DhaA variants
were transformed into E. coli BL21(DE3). For overexpression, cells were grown at 37 °C to
an optical density (OD600) of about 0.6 in 1 L of LB medium containing ampicillin (100 µg ml⁻¹).
Protein expression was induced by adding IPTG to a final concentration of 0.5 mM in LB
medium and the temperature was decreased to 20 °C. Cells were harvested by
centrifugation for 10 min at 3,700 g after overnight cultivation. During harvesting, cells
were washed once with 50 mM phosphate buffer with 10 % glycerol (pH 7.5) and then
resuspended in equilibrating purification buffer (16.4 mM K2HPO4, 3.6 mM KH2PO4, 500
mM NaCl, 10 mM imidazole, pH 7.5). Harvested cells were kept at -80 °C. Defrosted cells
were disrupted by sonication with a Hielscher UP200S ultrasonic processor (Hielscher
Ultrasonics, Teltow, Germany) and C-terminally His-tagged enzymes were purified to
homogeneity using Ni-NTA Superflow Cartridges (Qiagen, Hilden, Germany) as described
previously [296]. The eluted proteins were dialyzed against 50 mM phosphate buffer (pH 7.5).
Protein concentrations were determined using the Bradford reagent (Sigma-Aldrich, St.
Louis, USA) with bovine serum albumin as a standard. The purity of the resulting proteins
was checked by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) in 15% polyacrylamide
gels. The gels were stained with Coomassie brilliant blue R-250 dye (Fluka, Buchs,
Switzerland) and the molecular mass of the proteins was determined using the Protein
Molecular Weight Marker (Fermentas, Burlington, Canada).
Circular dichroism (CD) spectroscopy. CD spectra were recorded at 20 °C using a Chirascan
spectropolarimeter (Applied Photophysics, Leatherhead, United Kingdom) equipped with
a Peltier thermostat. Data were collected from 185 to 260 nm, at 100 nm min⁻¹ with a 1 s
response time and 2 nm bandwidth using a 0.1 cm quartz cuvette. Each spectrum shown is
the average of five individual scans and was corrected for the buffer's absorbance.
Collected CD data were expressed in terms of the mean residue ellipticity. The thermal
unfolding of the enzymes was followed by monitoring their ellipticity at 222 nm during
heating from 20 to 80 °C at a rate of 1 °C min⁻¹, with a resolution of 0.1 °C. The resulting
thermal denaturation curves were roughly normalized to represent signal changes
between approximately 1 and 0 and fitted to sigmoidal curves using Origin 6.1 (OriginLab,
Northampton, USA). Melting temperatures (Tm) were calculated as the midpoints of the
enzymes' normalized thermal transitions.
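The idea of reading Tm from the midpoint of a normalized transition can be sketched in Python as follows. Note that this is a simplification: instead of the sigmoidal fit performed in Origin, it rescales the curve to run from 1 to 0 and linearly interpolates the temperature at which the signal crosses 0.5. The function names and the synthetic two-state curve (with a hypothetical Tm of 60 °C) are assumptions for illustration, not measured data.

```python
import math

def normalize(signal):
    """Rescale a denaturation curve to run from 1 (first point) to 0 (last point)."""
    start, end = signal[0], signal[-1]
    return [(s - end) / (start - end) for s in signal]

def midpoint_temperature(temps, signal):
    """Estimate Tm as the temperature where the normalized curve crosses 0.5."""
    y = normalize(signal)
    for i in range(1, len(y)):
        if y[i - 1] >= 0.5 > y[i]:
            # Linear interpolation between the two points bracketing 0.5.
            frac = (y[i - 1] - 0.5) / (y[i - 1] - y[i])
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("curve does not cross the half-transition level")

# Synthetic ellipticity trace sampled every 0.1 °C from 20 to 80 °C.
temps = [20 + 0.1 * i for i in range(601)]
signal = [-20 + 15 / (1 + math.exp(-(t - 60.0) / 1.5)) for t in temps]
tm = midpoint_temperature(temps, signal)
print(round(tm, 1))
```

For noisy experimental traces a fitted sigmoid, as used in the thesis, is the more robust choice because it draws on all data points rather than the two nearest the midpoint.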
Activity assay. Enzymatic activity was assayed using the colorimetric method developed by
Iwasaki et al. [241]. The release of halide ions was analyzed spectrophotometrically at 460 nm
using a SUNRISE microplate reader (Tecan, Grödig/Salzburg, Austria) after reaction with
mercuric thiocyanate and ferric ammonium sulfate. The reactions were performed at 37 °C
in 25-ml Reacti-flasks closed by Mininert valves. The reaction mixtures contained
1,2-dibromoethane dissolved in 10 ml of 100 mM glycine buffer (pH 8.6), 10 ml of 60 mM
glycine buffer with 40% (v/v) DMSO or 10 ml of 48 mM glycine buffer with 52% (v/v) DMSO.
The reactions were initiated by addition of enzyme and monitored by periodically
withdrawing 1 ml samples from the reaction mixture, immediately mixing them with 0.1
ml of 35% nitric acid to terminate the reaction, and analyzing the quenched samples
spectrophotometrically. Dehalogenation activities were quantified as rates of product
formation over time. Each activity was measured in 3-5 independent replicates and
expressed as a mean value with a standard error.
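As an illustration of how a rate of product formation and its replicate statistics might be obtained, the Python sketch below fits a least-squares line to a time course and reports the mean and standard error over replicates. The text does not state how the rates were computed, so the use of linear regression, the function names and all numbers are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def slope(times, values):
    """Least-squares slope of values against times."""
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den

def specific_activity(times_s, product_nmol, enzyme_mg):
    """Rate of product formation (nmol/s) divided by the enzyme mass (mg)."""
    return slope(times_s, product_nmol) / enzyme_mg

def mean_and_standard_error(replicates):
    """Mean and standard error of the mean for independent replicates."""
    return mean(replicates), stdev(replicates) / sqrt(len(replicates))

# Hypothetical time course: halide released (nmol) at each sampling time (s).
activity = specific_activity([0, 300, 600, 900], [0.0, 30.0, 60.0, 90.0], enzyme_mg=0.05)
avg, sem = mean_and_standard_error([2.0, 2.2, 1.8])
print(round(activity, 2), round(avg, 2), round(sem, 3))
```

Only the initial, linear part of a time course should be passed to such a fit; later points flatten as substrate is depleted.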
Principal component analysis. A matrix containing the activity data for nine wild-type HLDs
and two mutant HLDs with 30 substrates was analyzed by Principal Component Analysis [331].
The aim of the analysis was to uncover relationships between individual HLDs based on
their activities towards the standardized set of substrates [223]. Principal Component Analysis
was performed using STATISTICA 10.0 (StatSoft, Tulsa, USA). The raw data were
log-transformed and weighted relative to the individual enzyme's activity towards other
substrates prior to performing Principal Component Analysis in order to better discern the
enzyme specificity profiles [223]. These transformed data were used to identify substrate
specificity groups, i.e., groups of enzymes that exhibited similar specificity profiles
regardless of their overall specific activities.
Steady-state kinetics. Steady-state kinetic constants were determined at 37 °C in 25-ml
Reacti-flasks closed by Mininert valves using the method previously described by Iwasaki
et al. [241]. The reaction mixtures contained 1,2-dibromoethane dissolved in 10 ml of 100 mM
glycine buffer (pH 8.6) or 10 ml of 60 mM glycine buffer with 40% (v/v) DMSO. The activity
measurements were carried out using at least twelve different substrate concentrations
(0.2-20 mM). The initial concentration of 1,2-dibromoethane was determined by gas
chromatography using a Trace GC 2000 (Finnigan, San Jose, USA) equipped with a flame
ionization detector and a DB-FFAP 30 m x 0.25 mm x 0.25 µm capillary column (J&W
Scientific, Folsom, USA). The reaction was started by adding the enzyme. Samples were
periodically withdrawn over a 60 min measurement period, immediately quenched by
mixing with 0.1 ml of 35% nitric acid, and then analyzed. All data points corresponded to
the mean of 3 independent replicates. Kinetic parameters were determined by non-linear
curve fitting of the resulting data points using Origin 6.1 (OriginLab, Northampton, USA) with
the equation for Michaelis-Menten kinetics (Equation 17) and the Hill equation including
substrate inhibition (Equation 18) [240, 332], where Km is the Michaelis constant, K0.5 is the
substrate concentration at which half-maximal velocity is achieved according to the
cooperativity model, n is the Hill coefficient, Ksi is the inhibition constant and kcat is the
catalytic constant:

\[ \frac{v}{V_{\mathrm{lim}}} = \frac{[S]}{K_m + [S]} \]
Equation 17

\[ \frac{v}{V_{\mathrm{lim}}} = \frac{[S]^n}{K_{0.5}^n + [S]^n \left(1 + \frac{[S]}{K_{si}}\right)} \]
Equation 18
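The two rate laws can be written as short Python functions to show how the parameters interact. This is a sketch under stated assumptions: it takes the Michaelis-Menten form v = Vlim[S]/(Km + [S]) and a Hill equation with substrate inhibition of the form v = Vlim[S]^n/(K0.5^n + [S]^n(1 + [S]/Ksi)), where Vlim denotes the limiting velocity; the function names and the parameter values are hypothetical, and the fitting itself (done in Origin in this work) is not reproduced.

```python
def michaelis_menten(s, v_lim, k_m):
    """Reaction velocity for simple Michaelis-Menten kinetics."""
    return v_lim * s / (k_m + s)

def hill_substrate_inhibition(s, v_lim, k_half, n, k_si):
    """Hill equation with substrate inhibition (assumed form, see lead-in)."""
    return v_lim * s**n / (k_half**n + s**n * (1 + s / k_si))

# Hypothetical parameters: concentrations in mM, velocities in arbitrary units.
for s in (0.2, 2.0, 20.0):
    v_mm = michaelis_menten(s, v_lim=10.0, k_m=2.0)
    v_hill = hill_substrate_inhibition(s, v_lim=10.0, k_half=2.0, n=1.5, k_si=15.0)
    print(s, round(v_mm, 3), round(v_hill, 3))
```

With n = 1 and an infinitely large Ksi the second function reduces to the first, which is a convenient sanity check before fitting.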
Crystallographic analysis. Crystals of DhaA106 were obtained by the vapor diffusion
method in a sitting drop at room temperature. Crystals were grown from a drop prepared
by mixing 2 µl of the protein (10.6 mg ml⁻¹ in 50 mM Tris-HCl pH 7.5) with 2 µl of precipitant
solution (0.1 M sodium acetate trihydrate pH 4.8, 0.2 M ammonium acetate and 35% w/v
PEG 4000) and equilibrated against 300 µl of reservoir solution. Diffraction data were
collected at 100 K using a home-source X-ray diffraction station (rotating anode Nonius
FR591, Bruker-Nonius) equipped with a MAR345 detector (1.542 Å monochromatic fixed
wavelength), at a resolution of 1.69 Å. The diffraction data were processed using the XDS
program [333]. The structure of DhaA106 was solved by the molecular-replacement method
using the program MOLREP [334] and the structure of DhaA14 [335] (PDB ID 3G9X) as the search
model. Model refinement was carried out using the program REFMAC5 [336] from the CCP4
package (Collaborative Computational Project, Number 4, 1994), interspersed with manual
adjustments using Coot [337]. The quality of the model with respect to the experimental data
was assessed using the program SFCHECK [338]. All-atom contacts in the refined structure of
DhaA106 were validated using the internal tools of Coot [337] and the MOLPROBITY service [339].
Preparation of protein structures for simulations. The structures of DhaA and DhaA80
were downloaded from the RCSB PDB database (PDB ID 4E46 and 4F60), while the structure
of DhaA106 was obtained within this study (PDB ID 4WCV). All structures were prepared
for analysis by removing ligands and water molecules. Missing atoms in side chains were
added using the <RepairPDB> module of FoldX [94]. Repaired structures were minimized using
Rosetta's minimize_with_cst application. Optimization of both backbone and side chains was
enabled, the distance for full atom pair potential was set to 9 Å, and standard weights for
the energy function with a constraint weight of 1 were used. The output of the minimization
was processed using the script convert_to_cst_file.sh to create a constraint file [88].
Protocol 16, which incorporates backbone flexibility, within the ddg_monomer module of
Rosetta was applied with default settings to create a model of DhaA63 [88]. All four structures
were protonated using the H++ server at pH 7.5 [340]. Water molecules from the respective
crystal structures were added to the systems. In the case of DhaA63, non-overlapping
water molecules from the crystal structure of DhaA80 were used. Cl⁻ and Na⁺ ions were
added to a final concentration of 0.1 M using the Tleap module of AMBER 12 [341]. Using the
same module, an octahedral box of TIP3P water molecules [342] was added such that all solute
atoms within the system were at least 10 Å from the octahedron's surface.
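The link between a target salt concentration and a number of ions can be illustrated with a small worked calculation in Python. The text does not say how the ion counts were derived, so the approach below (concentration times solvent volume times the Avogadro constant) and the box volumes used are assumptions; neutralizing counter-ions for the protein's net charge are not considered.

```python
AVOGADRO = 6.02214076e23           # particles per mole
LITRES_PER_CUBIC_ANGSTROM = 1e-27  # 1 Å^3 = 1e-30 m^3 = 1e-27 L

def ion_pairs_for_concentration(volume_A3, molar=0.1):
    """Number of Na+/Cl- pairs giving the requested molarity in a volume in Å^3."""
    return round(molar * volume_A3 * LITRES_PER_CUBIC_ANGSTROM * AVOGADRO)

# Hypothetical solvent volumes in Å^3; real values depend on the solvated system.
for volume in (5.0e5, 1.0e6):
    print(volume, ion_pairs_for_concentration(volume))
```

For a solvent volume of one million cubic ångströms, 0.1 M corresponds to about 60 ion pairs, which gives a feel for the numbers involved in systems of this size.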
Molecular dynamics simulations. Energy minimization and molecular dynamics (MD)
simulations were carried out using the PMEMD module of AMBER 12 with the ff10 force
field [343]. Initially, the investigated systems were minimized by 500 steps of steepest descent
followed by 500 steps of conjugate gradient over five rounds with decreasing harmonic
restraints. The restraints were applied as follows: 500 kcal mol⁻¹ Å⁻² on all heavy atoms of
the protein, and then 500, 125, 25 and 0 kcal mol⁻¹ Å⁻² on backbone atoms only. The
subsequent MD simulations employed periodic boundary conditions, using the particle
mesh Ewald method to describe electrostatic interactions [155, 344], a 10 Å cut-off for
non-bonded interactions, and a 2 fs time step with the SHAKE algorithm to fix all bonds
containing hydrogens [345]. Equilibration simulations consisted of two steps: (i) 20 ps of
gradual heating from 0 to 300 K at constant volume, using a Langevin thermostat with a
collision frequency of 1.0 ps⁻¹, and with harmonic restraints of 5.0 kcal mol⁻¹ Å⁻² on the
positions of all protein atoms, and (ii) 2000 ps of unrestrained MD at 300 K using the
Langevin thermostat at a constant pressure of 1.0 bar using a pressure coupling constant
of 1.0 ps. Finally, two separate 200 ns long production MD simulations were run for each
system using the same settings as the second step of MD equilibration. Coordinates were
saved at intervals of 2 ps and the resulting trajectories were analyzed using the Cpptraj
module of AMBER 12, and visualized using PyMOL 1.5 (The PyMOL Molecular Graphics
System, Version 1.5.0.4, Schrödinger, LLC) and VMD 1.9.1 [346].
Tunnels analysis. Tunnels were analyzed using CAVER 3.01 [299]. 100,000 snapshots sampled
every 2 ps from 200 ns molecular dynamics simulations were used as input structures. Each
atom in the structure was approximated by 12+1 spheres. The tunnel search was
performed using a probe radius of 1.0 Å and tunnel opening (i.e. the ability to accommodate
water molecules) was assessed using a 1.4 Å probe; these values correspond to the program's
default settings. 100,000 randomly selected tunnels were clustered into 25 clusters using
hierarchical average link clustering with a clustering threshold of 5. The remaining tunnels
were assigned to individual clusters using supervised machine learning. The starting point,
initially specified by the ND2 atom of Asn 41, the OD2 atom of Asp 106, the NE1 atom of
Trp 107 and the NE2 atom of His 272, was automatically optimized to prevent its collision
with protein atoms.
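The per-tunnel statistics reported in this chapter (percentage of snapshots in which a tunnel was detected, percentage in which it was open, and the average and maximum bottleneck radius) can be summarized from a series of bottleneck radii as in the Python sketch below. The input format (one radius per snapshot, None where the tunnel was not found), the function name and the choice to count a radius equal to the 1.4 Å threshold as open are assumptions for illustration; the sketch does not reproduce CAVER's own output format.

```python
def tunnel_statistics(bottleneck_radii, open_threshold=1.4):
    """Summarize per-snapshot bottleneck radii (Å); None marks an undetected tunnel."""
    total = len(bottleneck_radii)
    detected = [r for r in bottleneck_radii if r is not None]
    n_open = sum(1 for r in detected if r >= open_threshold)
    return {
        "detected_percent": 100.0 * len(detected) / total,
        "open_percent": 100.0 * n_open / total,  # relative to all snapshots
        "mean_radius": sum(detected) / len(detected) if detected else None,
        "max_radius": max(detected) if detected else None,
    }

# Number of snapshots in one 200 ns trajectory saved every 2 ps.
snapshots_per_trajectory = int(200 * 1000 / 2)

# Hypothetical radii for four snapshots.
stats = tunnel_statistics([None, 1.0, 1.5, 2.0])
print(snapshots_per_trajectory, stats)
```

Both percentages are expressed relative to all analyzed snapshots, matching the way the detection and opening frequencies are quoted in the Discussion.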
Acknowledgements
The work was supported by the Grant Agency of the Czech Republic (P207/12/0775),
the Ministry of Education of the Czech Republic (LO1214 and LH14027) and the
European Regional Development Fund (ICRC CZ.1.05/1.1.00/02.0123). JB was supported by
the "Employment of Best Young Scientists for International Cooperation Empowerment"
(CZ.1.07/2.3.00/30.0037) project co-financed by the European Social Fund and the state
budget of the Czech Republic. MetaCentrum and CERIT-SC are acknowledged for providing
access to computing facilities (LM2010005 and CZ.1.05/3.2.00/08.0144). The authors
would like to express thanks to Tatsiana Holubeva for help with enzyme crystallization.
6.7 Supporting Information
Table S 19 Mutations of DhaA80 in the positions 172 and 176 with predicted neutral or stabilizing
effects.
F176   V172   ΔΔG [kcal mol⁻¹][a]   SD[b]   Variant[c]
W      I      -0.49                 0.132   a
W      I      -0.48                 0.049   b
F      I      -0.37                 0.141   a
W      V      -0.04                 0.076   a
F      V      -0.03                 0.057   a
W      V      -0.01                 0.069   b
F      V      0.47                  0.574   b
M      I      0.54                  0.080   a
M      I      0.6                   0.092   b
M      L      0.65                  0.238   a
L      I      0.67                  0.087   a
W      L      0.74                  0.740   a
F      I      0.83                  0.612   b
L      I      0.84                  0.071   b
M      L      0.98                  0.667   b
[a] ΔΔG predicted by FoldX, [b] standard deviation of FoldX predictions,
[c] for each mutant, two variants differing in the order of introducing the mutations were
evaluated: a - variant with lower ΔΔG, b - variant with higher ΔΔG.
Table S 20 Summary of mutations in studied DhaA variants.
           Position
Variant    78   80   148  171  172  176  227  240  291  292
DhaA       D    F    T    G    A    C    N    W    P    A
DhaA63     G    S    L    Q    V    F    T    Y    A    G
DhaA80     D    F    L    Q    V    F    N    W    P    A
DhaA106    D    F    L    Q    V    G    N    W    P    A
Table S 21 Specific activity of DhaA variants with 1,2-dibromoethane in three different
environments.
Variant    Aqueous buffer       40% DMSO             52% DMSO
           [nmol s⁻¹ mg⁻¹]      [nmol s⁻¹ mg⁻¹]      [nmol s⁻¹ mg⁻¹]
DhaA       181.25 ± 9.4         2.68 ± 0.51          ND[a]
DhaA63     6.57 ± 0.1           14.09 ± 0.52         1.0 ± 0.0
DhaA80     1.83 ± 0.12          3.69 ± 0.69          0.19 ± 0.11
DhaA106    60.20 ± 9.52         35.0 ± 1.8           3.34 ± 2.3
[a] ND, not detected
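The fold improvements of DhaA106 over DhaA80 quoted in the text can be recomputed from the mean values in Table S 21 with a few lines of Python. The dictionary layout and the function name are illustrative choices; the numbers are copied from the table and the standard errors are ignored.

```python
# Specific activities in nmol s^-1 mg^-1, copied from Table S 21.
SPECIFIC_ACTIVITY = {
    "DhaA80": {"buffer": 1.83, "40% DMSO": 3.69},
    "DhaA106": {"buffer": 60.20, "40% DMSO": 35.0},
}

def fold_change(variant, template, condition):
    """Ratio of the variant's specific activity to that of the template."""
    return SPECIFIC_ACTIVITY[variant][condition] / SPECIFIC_ACTIVITY[template][condition]

for condition in ("buffer", "40% DMSO"):
    print(condition, round(fold_change("DhaA106", "DhaA80", condition), 1))
```

The ratios come out at about 32.9 in buffer and 9.5 in 40% DMSO, close to the 32- and 10-fold improvements stated in the Discussion and Conclusion.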
Table S 22. Specific activities of DhaA, DhaA80 and DhaA106 with the set of thirty halogenated
substrates.
                                      Specific activity [nmol s⁻¹ mg⁻¹]
No.   Substrate                       DhaA[a]   DhaA80   DhaA106
4     1-chlorobutane                  12.78     0.57     14.13
6     1-chlorohexane                  6.46      2.78     7.84
18    1-bromobutane                   11.62     1.40     12.18
20    1-bromohexane                   13.89     2.44     8.15
28    1-iodopropane                   22.79     1.75     25.85
29    1-iodobutane                    14.84     1.33     11.41
31    1-iodohexane                    12.00     1.39     6.93
37    1,2-dichloroethane              1.08      1.89     2.94
38    1,3-dichloropropane             21.78     0.62     30.54
40    1,5-dichloropentane             8.58      2.35     7.21
47    1,2-dibromoethane               181.25    1.83     60.20
48    1,3-dibromopropane              20.01     1.12     9.08
52    1-bromo-3-chloropropane         22.19     1.23     16.78
54    1,3-diiodopropane               39.13     1.87     31.13
64    2-iodobutane                    6.99      2.47     12.49
67    1,2-dichloropropane             0.00      0.88     0.00
72    1,2-dibromopropane              36.51     1.52     59.60
76    2-bromo-1-chloropropane         19.45     ND[b]    43.74
80    1,2,3-trichloropropane          1.82      2.74     4.55
111   bis(2-chloroethyl)ether         9.10      3.04     25.91
115   chlorocyclohexane               0.70      1.34     1.02
117   bromocyclohexane                2.27      1.95     6.62
119   (1-bromomethyl)cyclohexane      2.27      0.87     1.05
137   1-bromo-2-chloroethane          74.90     2.72     44.83
138   chlorocyclopentane              5.29      0.87     7.34
141   4-bromobutyronitrile            39.63     2.55     57.01
154   1,2,3-tribromopropane           49.71     3.02     64.11
155   1,2-dibromo-3-chloropropane     45.08     2.66     57.56
209   3-chloro-2-methylpropene        15.48     1.44     12.82
225   2,3-dichloropropene             23.88     0.57     2.45
[a] Data from Koudelakova et al. 2013, [b] ND, not determined.
Table S 23. Diffraction data collection and refinement statistics of DhaA106.
X-ray diffraction data collection statistics
Space group                          P1
Cell parameters (Å, °)               a = 42.585, b = 44.477, c = 46.508
                                     α = 115.466, β = 98.790, γ = 109.122
Number of molecules in AU            1
Wavelength (Å)                       1.541790
Resolution (Å)                       1.69
Number of unique reflections         28416 (4407)
Redundancy                           2.78 (2.73)
Completeness (%)                     92.61 (89.3)
Rmerge[a]                            5.7 (13.65)
Average I/σ(I)                       18.76 (9.16)
Wilson B (Å²)                        14.578
Refinement statistics
Resolution range (Å)                 39.42-1.69 (1.733-1.689)
No. of reflections in working set    26995 (1841)
R value (%)[b]                       12.015
Rfree value (%)[c]                   17.573
RMSD bond length (Å)                 0.020
RMSD angle (°)                       1.846
No. of atoms in AU                   2,449
No. of protein atoms in AU           2,449
No. of water molecules in AU         477
No. of acetate ions in AU            2
No. of chloride ions in AU           1
Mean B value (Å²)                    10.459
Ramachandran plot statistics
Residues in favored regions (%)      97.1 (303/312)
Residues in allowed regions (%)      100 (313/313)
PDB code                             4WCV
The data in parentheses refer to the highest-resolution shell.
[a] $R_{\mathrm{merge}} = \sum_{hkl}\sum_{i} |I_i(hkl) - \langle I(hkl)\rangle| \,/\, \sum_{hkl}\sum_{i} I_i(hkl)$, where $I_i(hkl)$ is an individual intensity of the $i$th observation of reflection $hkl$ and $\langle I(hkl)\rangle$ is the average intensity of reflection $hkl$ with summation over all data, [b] $R\text{-value} = \sum ||F_o| - |F_c|| \,/\, \sum |F_o|$, where $F_o$ and $F_c$ are the observed and calculated structure factors, respectively, [c] Rfree is equivalent to R value but is calculated for 5 % of the reflections chosen at random and omitted from the refinement process.
Table S 24. Comparison of main and slot tunnels of DhaA variants calculated by CAVER 3.01 from
MD trajectories.
Enzyme    Tunnel  Average bottleneck  Maximal bottleneck  Average     Average    Average
                  radius [Å]          radius [Å]          length [Å]  curvature  throughput
DhaA      main    1.52                3.07                12.01       1.20       0.68
          slot    1.15                2.19                17.18       1.40       0.49
DhaA63    main    1.10                1.86                15.11       1.29       0.49
          slot    1.10                1.96                17.18       1.37       0.46
DhaA80    main    1.10                1.75                11.88       1.29       0.53
          slot    1.06                1.52                19.64       1.47       0.39
DhaA106   main    1.45                2.84                15.74       1.33       0.62
          slot    1.09                2.29                18.96       1.40       0.44
Table S 25. Characteristics of the main tunnel of DhaA variants calculated by CAVER 3.01 from
MD trajectories.
Enzyme    Tunnel detected [%]  Tunnel open [%]  Average bottleneck radius [Å]  Maximum bottleneck radius [Å]
DhaA      93.92                58.36            1.52                           3.07
DhaA63    6.38                 0.05             1.10                           1.72
DhaA80    1.14                 0.02             1.10                           1.75
DhaA106   86.21                47.53            1.45                           2.84
Figure S 16. Far-UV circular dichroism spectra of DhaA variants (DhaA, DhaA63, DhaA80 and DhaA106; x-axis: wavelength / nm).
Figure S 17. Statistical analysis of the substrate specificity data. a) The score plot t1/t2 from PCA with transformed dataset. The score plot is a two-dimensional window into the multidimensional space, where the objects (enzymes) with similar properties (specificity profiles) are collocated. The t1/t2 score plot describing 44.5 % of variance in the dataset shows the enzymes clustered in individual substrate specificity groups (SSGs). DhaA106 was clustered to the same substrate specificity group as both DhaA and DhaA80. b) The corresponding loading plot p1/p2 from PCA with transformed dataset showing the main substrates for each SSG. Numbering of the substrates is provided in Table S3.
Figure S18. The structure of DhaA106 determined by protein crystallography. Green cartoon represents the
secondary structure elements. The amino acid substitutions in the tunnel are shown as black sticks. The
tunnel calculated using CAVER 3.01 is shown in red.
Figure S 19. Time evolution of the distance between Cα atoms of the residues 144 and 176 at the end of the helices lining the main access tunnel (panels labelled MD1 and MD2; x-axis: t / ns). Horizontal black line represents the average distance in DhaA80 and DhaA63.
7 Site-Specific Analysis of Protein Hydration Based on Unnatural Amino Acid Fluorescence
Mariana Amaro1, Jan Brezovský2,4, Silvia Kováčova3,4, Jan Sýkora1, David Bednář2,4, Václav Němec3,4, Veronika Lišková2, Nagendra Prasad Kurumbang2, Koen Beerens2, Radka Chaloupková2, Kamil Paruch*3,4, Martin Hof*1, and Jiří Damborský*2,4
1 J. Heyrovsky Institute of Physical Chemistry of the ASCR, v. v. i., Academy of Sciences of the Czech Republic, Dolejskova 3, 182 23 Prague 8, Czech Republic
2 Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
3 Department of Chemistry, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
4 International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
Journal of the American Chemical Society, 2015, 137 (15), 4988-4992
DOI: 10.1021/jacs.5b01681
7.1 Abstract
Hydration of proteins profoundly affects their functions. We describe a simple and general
method for site-specific analysis of protein hydration based on the in vivo incorporation of
fluorescent unnatural amino acids and their analysis by steady-state fluorescence
spectroscopy. Using this method, we investigate the hydration of functionally important
regions of dehalogenases. The experimental results are compared to findings from
molecular dynamics simulations.
7.2 Introduction
Protein hydration is important in enzymatic catalysis [347] since it influences enzymes' kinetics [348] and enantioselectivity [349], protein folding [350], ligand-binding, and DNA-protein interactions [351]. Hydration has been studied using X-ray absorption [352], NMR spectroscopy [353], neutron scattering [354], dielectric relaxation spectroscopy [355], two-dimensional infrared spectroscopy [356], and time-resolved fluorescence spectroscopy [357]. All these techniques provide valuable information on the arrangement of water molecules in the vicinity of specific protein moieties. However, they require costly and highly specialized instrumentation. Here we present a new technique for studying protein hydration using very basic laboratory instruments. Our approach is based on a recently developed method for the site-specific incorporation of unnatural amino acids (UAAs) into protein structures [358] and the analysis of their properties using steady-state fluorescence (SSF) spectroscopy. Additionally, we developed a more economical synthesis of L-(7-hydroxycoumarin-4-yl)ethylglycine, whose preparation had represented a bottleneck for its wider use in protein labeling. Furthermore, for the first time a quantitative analysis of the hydroxycoumarin fluorescence within a specific protein region is performed, followed by newly established data deconvolution that enables us to estimate the level of protein hydration. We demonstrate its effectiveness by characterizing the molecular environments of UAAs in the tunnel mouths of two haloalkane dehalogenases and comparing these experimental results to the output of molecular dynamics (MD) simulations and previous reports [166, 357].
7.3 Results and Discussion
To deliver a UAA into a specific site encoded by a nonsense codon, the tRNA/aminoacyl-tRNA synthetase pair needs to be constructed. In this project we utilized L-(7-hydroxycoumarin-4-yl)ethylglycine [359], which contains the fluorophore 7-hydroxy-4-methylcoumarin (7H4MC). The photophysics of 7H4MC is environment-sensitive [360, 361]. Three forms of 7H4MC - neutral, anionic, complexed - exist in equilibrium (Figure S 20A), each with different excitation maxima. In addition, a tautomeric form can be created in the excited state via proton transfer between the spatially separated carbonyl and hydroxyl groups (Figure 21A, Figure S 20B). Accordingly, the SSF spectra of all four excited state forms are shifted with respect to each other and thus their contributions to the overall signal can be separated. As the equilibria (both in the ground and excited states) are governed by the hydration state of the dye, information on the contributions of the corresponding microenvironments to the SSF spectra can be gained.
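How shifted bands allow the contributions of the four forms to be separated can be illustrated with a minimal sketch. It assumes Gaussian band shapes with a single common width, centred at the approximate emission maxima quoted in this chapter (about 380, 420, 450 and 480 nm), and fits only the amplitudes by linear least squares. The band shape, the width `WIDTH_NM`, the function names and the synthetic spectrum are all illustrative assumptions; the decomposition procedure actually used for the measured spectra is not specified in this section.

```python
import math

# Approximate emission maxima (nm) of the four 7H4MC forms, as quoted in the text.
CENTERS = {"neutral": 380.0, "complexed": 420.0, "anionic": 450.0, "tautomer": 480.0}
WIDTH_NM = 25.0  # hypothetical common band width (Gaussian sigma), not from the source


def band(wavelength, center, width=WIDTH_NM):
    """Unit-height Gaussian band shape (an assumed line shape)."""
    return math.exp(-0.5 * ((wavelength - center) / width) ** 2)


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - rest) / aug[i][i]
    return solution


def decompose(wavelengths, intensities):
    """Least-squares amplitudes of the four bands, via the normal equations."""
    names = list(CENTERS)
    basis = [[band(w, CENTERS[name]) for w in wavelengths] for name in names]
    gram = [[sum(a * b for a, b in zip(bi, bj)) for bj in basis] for bi in basis]
    proj = [sum(a * y for a, y in zip(bi, intensities)) for bi in basis]
    return dict(zip(names, solve_linear(gram, proj)))


# Synthetic spectrum built from made-up amplitudes, used only to exercise the fit.
true_amplitudes = {"neutral": 0.30, "complexed": 0.20, "anionic": 0.45, "tautomer": 0.05}
grid = [340.0 + 2.0 * i for i in range(131)]  # 340-600 nm in 2 nm steps
spectrum = [sum(true_amplitudes[k] * band(w, CENTERS[k]) for k in CENTERS) for w in grid]
fitted = decompose(grid, spectrum)
```

With equal band widths the band areas are proportional to the fitted amplitudes, so relative contributions can be read directly from `fitted`. Real spectra would need noise handling and possibly asymmetric band shapes.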
Figure 21. (A) General reaction scheme of 7H4MC in the excited state. Neutral, anionic and complexed (for instance with an adjacent amino group) forms of the fluorophore can exist and/or coexist. Proton transfer (reaction III) can additionally take place, creating a tautomeric form. This proton transfer has been shown to be promoted by the presence of 'structured water' [361]. In bulk water, anion formation (reaction I) prevails because the pKa* of the excited state is below 1.5 [361]. The numbers accompanying the rippled arrows correspond to the emission maximum wavelength of each particular 7H4MC form. Red reaction arrows highlight the dominant pathways in the excited state. (B) Dependence of the hydration parameter on water content (w0) in AOT micelles. The graph illustrates the increase in the magnitude of the fluorescence emission of the anionic and tautomer forms of 7H4MC (the sum of the decomposed spectrum areas emitted from the two forms) with the increasing amount of water added to the reverse AOT micelles (w0). Imidazole was added in order to mimic the -NH⁺- and -NH- functional groups present in the protein matrix, resulting in the formation of the complex form. The areas were obtained by decomposition of the steady-state emission spectra. (C-D) Emission spectra of the UAA incorporated into DhaA:C176UAA and DbjA:G183UAA, respectively. The spectra were recorded at an excitation wavelength of 320 nm. The black curve represents the recorded emission spectrum; the green, grey, blue and orange curves represent its decomposition into the neutral, complexed, anionic and tautomeric components, respectively. The red curve stands for the result of the fit. (E) Hydration parameter upon thermal denaturation. The temperature dependence of the hydration parameter (A_A + A_T) was measured for both proteins investigated here. The Tm values given in the graph legend correspond to the melting temperature, which is the temperature at which 50% of the protein is unfolded.
To demonstrate how the fluorescence of 7H4MC can provide information on the extent of hydration of its vicinity, and on specific interactions between the fluorophore and proton donors or acceptors, we decomposed its SSF spectra in docusate sodium (AOT) reverse micelles. At low water:surfactant molar ratios (w0), water molecules associate strongly with the polar heads of the surfactant molecules, forming 'structured water'. As w0 increases, 'bulk' water is created inside the reverse micelles. Studies on the photophysics of 7H4MC in reverse micelles [360, 361] have shown that in the absence of water (w0 ≈ 0) the dye fluoresces mainly from its neutral form. Stepwise addition of water causes an increasing contribution from the tautomeric form, with the anionic form also becoming important as w0 increases further. The relative contributions of these particular 7H4MC forms can be determined by decomposing the corresponding SSF spectra (Figure S 21). At w0 ≈ 0, only the neutral form (emission maximum wavelength ~ 380 nm) is present (Table S 26, Figure S 21A). When w0 rises above 10, the tautomer band (~ 480 nm) appears and gradually becomes more intense, indicating that the dye is surrounded by an increasing number of water molecules bound to the detergent's polar headgroups. Note that even at very low w0 values, the complexed form (~ 420 nm) can be detected if proton donor or acceptor molecules are added to the AOT reverse micelles (Table S 27). Further increases in the water content produce bulk water inside the reverse micelles, sharply increasing the contribution of the anionic band (~ 450 nm; Figure S 21C). The anionic form is the dominant fluorescing species in bulk water. We utilized this sensitivity of 7H4MC photophysics and determined the sum of the contributions of the anionic and tautomer forms to the total emission spectrum (quantified by the parameter A_A + A_T), whose formation is conditioned by the presence of water. Indeed, A_A + A_T gradually rises with increasing water content w0 for three different AOT systems (Figure 21B, Table S 26 and Table S 27). This parameter can therefore be used as an indicator of the extent of hydration.
Prior to testing this approach in proteins, the synthesis of the UAA containing the 7H4MC fluorophore had to be optimized since gram quantities of UAA were required for the cultivation of recombinant E. coli. We realized that the UAA is poorly soluble in the mobile phase previously used for HPLC purification (< 400 mg/L) [18]. Therefore, we developed a new, significantly more economical protocol that avoids the use of HPLC and utilizes ion exchange chromatography followed by precipitation.
The fluorophore was inserted at specific locations within the sequences of two haloalkane dehalogenases, yielding DhaA:C176UAA and DbjA:G183UAA. The selected enzymes are functional variants derived from haloalkane dehalogenases DhaA from Rhodococcus rhodochrous NCIMB 13064 [33, 172] and DbjA from Bradyrhizobium japonicum USDA110 [362]. Circular dichroism analysis (Figure S 22) and activity testing (Table S 28) confirmed that introduction of the UAA did not disrupt the structure and the catalytic function of the enzymes. Although both enzymes catalyze nucleophilic substitution via the same mechanism, their substrate specificity and enantioselectivity differ substantially [166]. This is attributed to the different architectures of their active sites as well as access tunnels. The fluorescent UAA was inserted at positions 176 and 183 of DhaA and DbjA, respectively, to probe the hydration of their tunnels (Figure 22A), which are known to be important for enzyme function [3, 13]. Specifically, DhaA possesses a narrower tunnel and at the same time exhibits significantly lower enantioselectivity towards β-bromoalkanes than DbjA [13]. Recently, we have demonstrated that hydration and dynamics of the tunnel mouth are important determinants of DbjA enantioselectivity towards β-bromoalkanes [3]. Thermodynamic analysis revealed that enantiodiscrimination of β-bromoalkanes by DbjA and DhaA is differently influenced by individual thermodynamic contributions - differential activation enthalpy (ΔR-SΔH‡) and entropy (ΔR-SΔS‡). While the resolution of 2-bromopentane by DbjA is driven almost equally by both enthalpic and entropic terms (the entropic term represents 83% of the enthalpic term), the DhaA enantioselectivity is clearly dominated by enthalpy (Table 7). In both cases, the preferred (R)-enantiomer is favored by the enthalpy while the non-preferred (S)-enantiomer is favored by the entropy. Interestingly, the enthalpic and the entropic components of DbjA enantioselectivity are four times and seven times greater in absolute value, respectively, than those of DhaA.
Table 7. Enantioselectivity and its thermodynamic components for the kinetic resolution of 2-bromopentane by haloalkane dehalogenases DhaA and DbjA.
Enzyme    E value   ΔR-SΔH‡ (298 K)   TΔR-SΔS‡ (298 K)   ΔR-SΔG‡ (298 K)
                    [kJ mol⁻¹]        [kJ mol⁻¹]         [kJ mol⁻¹]
DhaA      20        -15.37            -8.22              -7.15
DbjA[a]   132       -69.51            -57.78             -11.73
E value - enzyme enantioselectivity, ΔR-SΔG‡ - differential free energy of activation between the (R)- and (S)-enantiomer of 2-bromopentane (ΔR-SΔG‡ = ΔR-SΔH‡ - TΔR-SΔS‡) and its enthalpic (ΔR-SΔH‡ at 298 K) and entropic (TΔR-SΔS‡ at 298 K) terms. [a] Data from Prokop et al. [13]
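The arithmetic behind Table 7 can be checked with a short sketch. It recomputes the differential free energy from the tabulated enthalpic and entropic terms and converts it to an enantiomeric ratio, assuming the relation ln E = -ΔR-SΔG‡/(RT), which is the form implied by the temperature dependence of ln E described in the Methods. The function names are illustrative only.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ddg(ddh, t_dds):
    """Differential activation free energy from its enthalpic and entropic terms."""
    return ddh - t_dds


def e_value(ddg_kj, temperature=298.0):
    """Enantiomeric ratio implied by ln E = -ddG/(RT) (assumed relation)."""
    return math.exp(-ddg_kj / (R_KJ * temperature))


# Enthalpic and entropic terms from Table 7 (kJ/mol at 298 K).
for enzyme, ddh, t_dds in [("DhaA", -15.37, -8.22), ("DbjA", -69.51, -57.78)]:
    g = ddg(ddh, t_dds)
    print(f"{enzyme}: ddG = {g:.2f} kJ/mol, implied E = {e_value(g):.0f}")
```

Under this assumption the implied E values come out somewhat below, but of the same order as, the tabulated E values of 20 and 132. Small differences of this kind are expected when tabulated terms are rounded or derived from fits.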
Figure 22. Local environment and hydration of the UAAs in DhaA:C176UAA and DbjA:G183UAA. (A)
Positions of the UAAs (indicated by the red surface patches) and the tunnel mouths (indicated by black
arrows) on the surface of the enzymes. (B) Local interactions of UAAs with hydrogen acceptors and amino
groups (purple sticks). The buried (UAA_3) and exposed (UAA_4) conformations of the UAA in DbjA:G183UAA
are indicated by green and yellow sticks, respectively. The conformation of the UAA in DhaA:C176UAA is
indicated by the orange stick model. (C) Hydration of UAAs. The blue surface represents enzyme regions that
are occupied by water molecules for at least 40 % of the total simulation time.
For the decomposition of the SSF spectrum into its separate components, the excitation wavelength was set to 320 nm in order to predominantly excite the neutral form of the fluorophore. The neutral form then populates the complexed, anionic and tautomeric forms (Figure 21A). As shown in Figure 21C, the fluorescence emission of DhaA:C176UAA originates from the neutral, complexed and anionic forms. The tautomer contributes only marginally to the emission spectrum. Conversely, significant tautomer emission was observed for DbjA:G183UAA (Figure 21D). The formation of the tautomer in DbjA:G183UAA indicates that the chromophore is exposed to 'structured water', i.e. water molecules with residence times longer than the fluorescence time scale. The presence of the tautomer was also confirmed by time-resolved fluorescence spectroscopy (Figure S 23). The decay curve contained a negative component due to the population of the tautomer from the excited neutral species. This feature of the fluorescence decay was not observed for DhaA:C176UAA. Because the tautomeric and anionic species are only formed in aqueous environments, their summed contributions to the overall signal reflect the hydration level within the dye's microenvironment. The anionic and tautomeric forms account for only 50% of the emission signal of DhaA:C176UAA. However, 70% of the emission signal of DbjA:G183UAA can be attributed to these forms (Figure 21E, Table S 29). This indicates that the microenvironment surrounding the UAA in DhaA:C176UAA is much less extensively hydrated than in DbjA:G183UAA. To further validate the approach, we thermally denatured the enzymes. The UAA originally buried within the protein interior becomes exposed to a more hydrated microenvironment upon denaturation. As expected, for both enzymes increasing temperature causes an increase in the contribution of the anion at the expense of the other forms, leading to the growth of the A_A + A_T parameter (Figure 21E, Table S 29).
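The summed contribution discussed above can be written as a one-line calculation. The sketch below assumes that the hydration parameter is the combined area of the anionic and tautomeric bands expressed as a fraction of the total emission area, which is how the 50% and 70% figures read. The band areas in the example are hypothetical values chosen only for illustration, not measured data.

```python
def hydration_parameter(areas):
    """Fraction of the total emission area carried by the anionic and tautomeric bands."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total band area must be positive")
    return (areas["anionic"] + areas["tautomer"]) / total


# Hypothetical decomposed band areas (arbitrary units), for illustration only.
example_areas = {"neutral": 30.0, "complexed": 20.0, "anionic": 45.0, "tautomer": 5.0}
print(hydration_parameter(example_areas))  # prints 0.5
```

A value near 0 would correspond to a dry microenvironment dominated by the neutral and complexed forms, and a value near 1 to emission dominated by the water-dependent forms.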
Replicated 200 ns MD simulations were performed for both the labelled and wild-type structures of the studied enzymes. The periodic-boundary NPT simulations were carried out in AMBER12 (University of California, San Francisco, 2012) using the ff10 force field (Table S 30, Figure S 24) [343, 363, 364]. The level of hydration within the tunnel mouth of the wild-type enzymes was about 1.7 times higher in DbjA than in DhaA (Table 8). The MD simulations show that the UAA is in one dominant conformation in the tunnel of DhaA:C176UAA (Figure S 25). Conversely, the wider tunnel mouth of DbjA:G183UAA allowed the UAA to adopt two equally relevant stable conformations, one buried in the tunnel (UAA_3) and one more exposed (UAA_4; Table S 31, Figure S 25). The simulations suggested that the UAA microenvironment is substantially less hydrated in DhaA:C176UAA than in both conformations of DbjA:G183UAA (Table 8, Figure 22C), which is consistent with our experimental results. Additionally, the simulations indicated that the fluorophore often forms hydrogen bonds with adjacent amino acid residues (Thr148 in DhaA:C176UAA and Glu146 in DbjA:G183UAA) and with amino acids whose side chains contain amino groups (His272 for DhaA:C176UAA and Arg179 or His139 for DbjA:G183UAA) (Figure 22B, Table S 32). This may facilitate the formation of the complexed form detected in our experiments. Finally, the simulations indicated that water molecules in the microenvironment of the buried conformation of DbjA:G183UAA have residence times of up to 60 ns, which is much longer than the fluorescence time scale (Figure S 26). This explains the strong contribution of the tautomeric form for this mutant. Such extremely long residence times are not found in structures without the UAA (Figure S 26). This suggests that the 'structured water' detected in DbjA:G183UAA is induced by the UAA itself locking several water molecules within the active site pocket.
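The two MD-derived quantities used here, the number of water molecules within 5 Å of the probe and the residence time of a water molecule, can be sketched as follows. This is a simplified illustration on plain coordinate tuples, not the analysis pipeline used for the reported simulations; the function names, the coordinates, the presence flags and the frame spacing are all made up.

```python
import math


def count_waters_within(probe_atoms, water_oxygens, cutoff=5.0):
    """Number of water oxygens within `cutoff` (Å) of any probe atom in one frame."""
    return sum(
        1
        for oxygen in water_oxygens
        if any(math.dist(oxygen, atom) <= cutoff for atom in probe_atoms)
    )


def longest_residence(present, frame_spacing_ns):
    """Longest uninterrupted stay (ns) of one water, given per-frame presence flags."""
    longest = current = 0
    for flag in present:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest * frame_spacing_ns


# Made-up coordinates (Å) and presence flags, for illustration only.
probe = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
waters = [(3.0, 0.0, 0.0), (0.0, 6.0, 0.0), (6.0, 0.0, 0.0)]
print(count_waters_within(probe, waters))  # prints 2
print(longest_residence([True, True, False, True, True, True], 0.5))  # prints 1.5
```

Averaging the per-frame count over a trajectory would give a mean and standard deviation of the kind listed in Table 8. This simple residence measure treats any single-frame absence as the end of a stay, a choice real analyses often relax.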
Table 8. Hydration of haloalkane dehalogenases DhaA, DbjA, DhaA:C176UAA and DbjA:G183UAA revealed by fluorescence spectroscopy and molecular dynamics simulations. Experimental parameters and results are presented in normal text; parameters and results obtained from MD simulations are presented in italics.
Parameter                                      DhaA:C176UAA       DbjA:G183UAA       DhaA                          DbjA
Overall contributions of anionic and           UAA less hydrated  UAA more hydrated  Less hydrated tunnel mouth*   More hydrated tunnel mouth*
tautomeric forms to the emission SSF spectra
Number of water molecules within 5 Å           9±2                2j±5 or 28±4       16±4                          27±4
(MD simulations)
*The enzymes have been characterized previously [11]. The methodology used was different from the SSF method.
7.4 Conclusions
Recent findings emphasizing the importance of dynamics and hydration in enzymatic catalysis and rational protein design [348, 349] have created a strong demand for methods that provide site-specific information on these factors. In our approach, site-specificity is guaranteed by using a UAA. SSF spectroscopic analysis of the hydroxycoumarin probe incorporated into the structure of the enzymes is a strikingly simple and universally applicable experimental procedure. The photophysics of the UAA provide qualitative information on the extent of hydration, as demonstrated for two HLDs with already characterized hydration levels [357]. Although MD simulations show that incorporation of the chromophore can influence the residence times of water molecules, the conclusions on the hydration levels are valid. Given the ongoing development of UAA technology, this method could potentially be used to analyze hydration at specific sites in a wide range of proteins.
Acknowledgement
Financial support from the Czech Science Foundation via grants P208/12/G016 (M.H. and J.S.) and P207/12/0775 (R.Ch.) and the Ministry of Education of the Czech Republic (LO1214; CZ.1.05/1.1.00/02.0123) is acknowledged. Moreover, M.H. acknowledges the Praemium Academiae Award from the Academy of Sciences of the Czech Republic. The work of J.B. and K.B. was supported by the Program of "Employment of Best Young Scientists for International Cooperation Empowerment" (CZ.1.07/2.3.00/30.0037) with co-financing from the European Social Fund and the state budget of the Czech Republic. MetaCentrum is acknowledged for providing access to their computing facilities, supported by the Ministry of Education of the Czech Republic (LM2010005). CERIT-SC is acknowledged for providing access to their computing facilities, under the program Center CERIT Scientific Cloud (CZ.1.05/3.2.00/08.0144).
7.5 Supplementary Information
7.5.1 Methods
Chemical synthesis of unnatural amino acid (UAA)
(2S)-1-Benzyl 7-ethyl 2-{[(benzyloxy)carbonyl]amino}-5-oxoheptanedioate
Carbonyldiimidazole (6.00 g, 37.00 mmol) was added in portions to (2S)-5-(benzyloxy)-4-{[(benzyloxy)carbonyl]amino}-5-oxopentanoic acid (12.50 g, 33.66 mmol) in anhydrous THF (120 mL) and the mixture was stirred under nitrogen at 25 °C for 90 min. Potassium ethyl malonate (5.50 g, 32.32 mmol) and magnesium chloride (6.00 g, 63.02 mmol) were then added and the mixture was stirred at 25 °C for 14 h. The reaction mixture was poured into water (500 mL) and extracted with diethyl ether (3 × 400 mL). The combined extracts were washed with a saturated aqueous solution of NaHCO3 (200 mL), dried over Na2SO4, filtered, and the solvent was evaporated. The product was obtained as a pale yellow solid (14.34 g, 96 %) and was used directly in the next step without additional purification.
An analytically pure sample can be obtained by flash column chromatography on silica gel (hexane/EtOAc, 1:3).
1H NMR (300 MHz, CDCl3): δ 1.26 (t, 3H); 1.86-2.08 (m, 1H); 2.11-2.28 (m, 1H); 2.48-2.76 (m, 2H); 3.35 (s, 2H); 4.17 (q, 2H); 4.30-4.49 (m, 1H); 5.10 (s, 2H); 5.16 (s, 2H); 5.38 (d, 1H).
L-(7-hydroxycoumarin-4-yl)ethylglycine
Methanesulfonic acid (19.7 mL) was added at 0 °C (ice bath) to a mixture of (2S)-1-benzyl 7-ethyl 2-{[(benzyloxy)carbonyl]amino}-5-oxoheptanedioate (4.90 g, 11.11 mmol) and powdered resorcinol (6.12 g, 55.55 mmol). The mixture was stirred at 0 °C for 5 min, then it was allowed to warm to 25 °C and stirred for 15 h. The resulting red solution was poured onto crushed ice (100 g) and water was added (100 mL). The mixture was extracted with diethyl ether (3 × 70 mL). The strongly acidic aqueous phase was loaded onto a column of Dowex 50WX8 (Aldrich, 100 g, diameter 4 cm, height 12 cm). The column was washed with water (800 mL) and then eluted with a 1 M aqueous solution of NaOH (total elution time: ca. 30 min). The dark red fractions were collected and concentrated aqueous HCl was added to adjust the pH to 6. The solution was concentrated to % of the volume and allowed to stand at 4 °C overnight. The red precipitate was collected by filtration, mixed with 4 mL of ice-cold water and the mixture was quickly filtered. The precipitate on the filter was collected and dried in a vacuum, yielding a pink solid (1.97 g).
The solid was suspended in anhydrous EtOH (200 mL), stirred at 70 °C for 15 min, and the precipitate was collected by filtration. The precipitate was suspended in anhydrous EtOH (200 mL), stirred at 70 °C for 15 min, and filtered. The solid was collected by filtration and dried in a vacuum to yield a pale pink solid (0.59 g, 19 %).
1H NMR (500 MHz, DMSO-d6): δ 1.91-2.10 (m, 2H); 2.77-2.95 (m, 2H); 6.08 (s, 1H); 6.73 (d, 2H); 6.85 (d, 1H); 7.70 (d, 1H).
13C{1H} NMR (125 MHz, D2O): δ 25.03; 27.07; 29.31; 54.34; 103.15; 109.11; 11.92; 114.03; 126.22; 128.70; 128.80; 173.97.
Plasmids, strains and chemicals
The plasmid (pEVOL-aaRS) carrying the engineered orthogonal tRNA and aminoacyl-tRNA synthetase pair was obtained from Professor Peter Schultz (The Scripps Research Institute, USA). Two target proteins, haloalkane dehalogenases DhaA and DbjA, were cloned in the pET21b vector for expression with a His6-tag to facilitate purification. Escherichia coli DH5α and E. coli BL21 (DE3) (Stratagene) were used as the regular cloning host and for protein expression, respectively. All antibiotics used and L-arabinose were obtained from Sigma-Aldrich and filter sterilized before use.
Construction of mutants by site directed mutagenesis
The corresponding codons for C176 of DhaA and G183 of DbjA were replaced by the amber stop codon (TAG) by inverse PCR. 5'-phosphorylated primers were designed based on the DhaA and DbjA nucleotide sequences and obtained from Sigma-Aldrich. Primer sequences were the following: DhaA-C176-for: 5'-TACGGAGGTCGAGATGGACCACTATCG-3' and DhaA-C176-rev: 5'-AGCGGACGGACGACCTATTTCGGG-3' for mutation of Cys176 in DhaA; and DbjA-G183-for: 5'-GCTCGGCGACGAAGAAATGGCG-3' and DbjA-G183-rev: 5'-TTGCGGACGATTCCCTAGGGCAGAAC-3' for mutation of G183 in DbjA. Briefly, 0.8 µl Pfu DNA polymerase (Promega), 5 µl of Pfu DNA polymerase buffer with MgSO4, 4 µl of dNTP mix (2.5 mM each), 1 µl of each primer at 1 µM concentration, 1 µl of plasmid template (pET21-dhaA or pET21-dbjA) and 37.2 µl of sterile water were added to a PCR tube. The thermocycler was set at 95°C for 2 min for denaturation, and 30 cycles of 95°C for 1 min for multiplication, 58°C for 30 s for annealing and 72°C for 12 min for elongation, followed by 72°C for 5 min for final extension. The PCR products were treated with 1-2 µl DpnI (New England Biolabs) for 2 h at 37°C and then purified using a PCR purification kit (Qiagen). Blunt end ligation was performed at 16°C overnight by adding 1 µl T4 DNA ligase and buffer (Promega) to 10 µl of purified PCR sample. Competent E. coli DH5α were chemically transformed with the ligation product. Colonies were selected on LB agar plates supplemented with 100 µg/ml of ampicillin. Plasmid DNA was isolated and the mutations were confirmed by sequencing. The final plasmids were named pET21-dhaA:C176TAG and pET21-dbjA:G183TAG (Cys176 and Gly183 replaced by the TAG stop codon, respectively).
Transformation and expression of protein variants
The expression host E. coli BL21 (DE3) was chemically transformed with pEVOL-aaRS and
the respective mutant plasmid. The colonies were selected on LB agar plates containing
ampicillin (100 µg/ml) and chloramphenicol (34 µg/ml). A single colony from each mutant
was picked and cultivated in 10 mL LB supplemented with ampicillin and chloramphenicol.
Fresh overnight culture (1 ml) was added to 1 L of LB medium with the appropriate
antibiotics and grown at 37°C, 105 rpm until OD600 reached ~0.5. Cells were harvested
aseptically by centrifugation at 4°C and the supernatant was discarded. Cell pellets were
resuspended in 1 L fresh LB containing the required antibiotics and 10 ml of UAA solution
was added for UAA feeding. For the UAA solution, 263 mg of UAA (containing the 7H4MC
fluorophore) was dissolved in 10 ml of 100 mM KOH, pH was adjusted to 7.0 by HCl and
then filter sterilized. The cultures were incubated again at 37°C, 105 rpm until the OD600
reached 0.8-1.0. The samples were then cooled to 20°C and expression was induced by
addition of IPTG and L-arabinose (0.5 mM and 0.02 % (w/v) final concentration,
respectively). Incubation was continued overnight at 20°C and finally the cells were
harvested by centrifugation. The UAA labeled protein was purified via standard His-tag
purification. Expression and purification were checked via SDS-PAGE.
Confirmation of protein structure by circular dichroism (CD) analysis
CD spectra were recorded at room temperature using a Chirascan spectrometer (Applied
Photophysics, UK). Data were collected from 185 to 260 nm (at 100 nm/min, 1 s response
time and 2 nm bandwidth) using a 0.1 cm quartz cuvette containing the enzymes in 50 mM
potassium phosphate buffer (pH 7.5). Each spectrum shown is the average of five to ten
individual scans and is corrected for absorbance caused by the buffer. CD data were
expressed in terms of the mean residue ellipticity (ΘMRE) using Equation 19:
$$\Theta_{MRE} = \frac{\Theta_{obs} \cdot M_w \cdot 100}{n \cdot c \cdot l}$$
Equation 19
where Θobs is the observed ellipticity in degrees, Mw is the protein molecular weight, n is the number of residues, l is the cell path length (0.1 cm), c is the protein concentration and the factor 100 originates from the conversion of the molecular weight to mg/dmol.
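The conversion to mean residue ellipticity can be sketched in a few lines. The sketch assumes the commonly used form ΘMRE = Θobs · Mw · 100 / (n · c · l), which matches the symbol definitions given here, with the concentration taken in mg/ml and the path length in cm. The function name and the numerical inputs are hypothetical and serve only to show the arithmetic.

```python
def mean_residue_ellipticity(theta_obs_deg, mol_weight, n_residues, path_cm, conc_mg_ml):
    """Mean residue ellipticity from an observed ellipticity (assumed standard form).

    theta_obs_deg: observed ellipticity in degrees
    mol_weight:    protein molecular weight (g/mol)
    n_residues:    number of residues
    path_cm:       cell path length in cm
    conc_mg_ml:    protein concentration in mg/ml (assumed unit)
    """
    return theta_obs_deg * mol_weight * 100.0 / (n_residues * conc_mg_ml * path_cm)


# Hypothetical inputs: 0.01 deg signal, 33 kDa protein of 300 residues,
# 0.1 cm cuvette, 0.2 mg/ml protein.
value = mean_residue_ellipticity(0.01, 33000.0, 300, 0.1, 0.2)
print(round(value, 3))  # prints 5500.0
```

Normalizing by the number of residues makes spectra of proteins of different size and concentration directly comparable, which is what allows the labelled and unlabelled variants to be overlaid.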
Confirmation of protein function by determination of specific activity
Enzymatic activity was assayed by the colorimetric method developed by Iwasaki and co-workers [241]. The release of halide ions was analyzed spectrophotometrically at 460 nm using the Sunrise microplate reader (Tecan, Austria) after reaction with mercuric thiocyanate and ferric ammonium sulfate. The dehalogenation reaction was performed with 1,2-dibromoethane as a substrate at 37 °C in 25 ml Reacti-flasks closed by Mininert valves. The reaction mixture contained 10 ml of glycine buffer (100 mM, pH 8.6) and 10 µl of the substrate 1,2-dibromoethane. The reaction was initiated by addition of the enzyme in a
final concentration of 0.15 µM. The reaction was monitored by withdrawing 1 ml samples
from the reaction mixture at periodic intervals and immediately mixing the removed
sample with 0.1 ml of 35% nitric acid to terminate the reaction. Dehalogenation activity
was quantified as the rate of product formation in time.
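A specific activity of the kind reported in Table S 28 can be derived from such a time course roughly as sketched below. This is a hedged illustration only: the function names, the example numbers and the simplification that the reaction volume stays constant despite sampling are assumptions, not details taken from the text.

```python
def linear_slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def specific_activity(times_s, halide_mM, reaction_volume_ml, enzyme_mg):
    """Return specific activity in umol s^-1 mg^-1 of enzyme.

    The rate of product formation is the slope of halide concentration
    versus time; 1 mM corresponds to 1 umol per ml of reaction mixture.
    """
    rate_mM_per_s = linear_slope(times_s, halide_mM)
    rate_umol_per_s = rate_mM_per_s * reaction_volume_ml
    return rate_umol_per_s / enzyme_mg


# Hypothetical time course: halide released in a 10 ml reaction with 0.05 mg enzyme.
times = [0.0, 60.0, 120.0, 180.0]
halide = [0.00, 0.03, 0.06, 0.09]
activity = specific_activity(times, halide, 10.0, 0.05)
```

Only the initial, linear part of the time course should be passed to such a function, since the slope is meaningful as an initial rate only while product formation is linear in time.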
Thermodynamic analysis of enzyme enantioselectivity
The temperature dependence of DhaA enantioselectivity (E value) was analyzed in 25 ml Reacti-Flasks
closed by Mininert valves containing 25 ml of 50 mM Tris-sulfate buffer (pH 8.2) and 10 µl of
racemic 2-bromopentane. The enzymatic reaction was initiated by the addition of an appropriate
amount of enzyme. Depending on its activity, the final concentration of enzyme was 0.7-3.7 µM. The
reaction was monitored by periodically withdrawing 0.5 ml samples from the reaction mixture. The
reaction was stopped by mixing the sample with 1 ml of diethyl ether containing 1,2-dichloroethane
as an internal standard. After the extraction, the diethyl ether phase was dried on a glass column
with sodium sulphate. The samples were analyzed using a Hewlett-Packard 6890 gas chromatograph
(Agilent, USA) equipped with a flame ionization detector and a chiral capillary column Chiraldex G-TA
(Alltech, USA).
The differences in activation enthalpy and entropy between the enantiomers, denoted ΔR-SΔH‡ and
ΔR-SΔS‡, respectively, were determined by studying the variation of the enantiomeric ratio with
temperature according to Equation 20:
$$\ln E = -\frac{\Delta_{R-S}\Delta H^{\ddagger}}{R} \cdot \frac{1}{T} + \frac{\Delta_{R-S}\Delta S^{\ddagger}}{R}$$
Equation 20
where R is the universal gas constant and T is the absolute temperature. ln E varies linearly with
the reciprocal temperature; therefore, -ΔR-SΔH‡/R and ΔR-SΔS‡/R were determined as the
slope and intercept of this linear dependence, respectively.
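The linear analysis of ln E against 1/T can be sketched as below. This is an illustrative least-squares fit under the relation ln E = -(ΔR-SΔH‡/R)(1/T) + ΔR-SΔS‡/R; the function name and the synthetic input values are hypothetical and are not measured data from this work.

```python
import math

R_GAS = 8.314462618  # universal gas constant, J mol^-1 K^-1


def fit_differential_activation(temps_K, e_values):
    """Fit ln E versus 1/T and return (ddH in J/mol, ddS in J/mol/K).

    slope = -ddH / R and intercept = ddS / R.
    """
    x = [1.0 / t for t in temps_K]
    y = [math.log(e) for e in e_values]
    n = len(x)
    if n < 2:
        raise ValueError("at least two temperatures are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope * R_GAS, intercept * R_GAS


# Synthetic check: generate E values from assumed parameters and recover them.
true_ddH = -15000.0  # J/mol, hypothetical
true_ddS = -25.0     # J/mol/K, hypothetical
temps = [293.0, 298.0, 303.0, 308.0, 313.0]
e_vals = [math.exp(-true_ddH / (R_GAS * t) + true_ddS / R_GAS) for t in temps]
fit_ddH, fit_ddS = fit_differential_activation(temps, e_vals)
```

Because the intercept is an extrapolation to 1/T = 0, far outside the measured range, the entropic term is typically much more sensitive to experimental scatter than the enthalpic term.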
Steady-state fluorescence spectroscopy
Steady-state fluorescence (SSF) spectra were recorded on Fluorolog-3 spectrofluorometer
(model FL3-11; HORIBA Jobin Yvon) equipped with a Xenon-arc lamp. All spectra were
collected in 1 nm steps (1 or 2 nm bandwidths were chosen for both the excitation and
emission monochromators depending on the signal strength) at 10 °C. The recorded
spectra were then fitted by means of nonlinear least-square procedure to the sum of
asymmetric peak functions, which are expressed as:
$$y = y_0 + A \exp\left[-\exp\left(-\frac{x - x_c}{w}\right) - \frac{x - x_c}{w} + 1\right]$$
Equation 21
where y0 stands for the offset, and xc, w and A represent the center of the band, its width
and amplitude, respectively. The above fitting function was chosen because it provided the
best fit results to the emission spectra of the individual 7H4MC forms (whose spectra were
obtained via dissolution in different solvents). The initial estimates for the parameter xc
were set to the values of 380 nm, 420 nm, 450 nm and 480 nm, which correspond to the
wavelengths of the emission maxima of the particular forms of 7H4MC. The xc parameter
was kept within the range of ± 8 nm with respect to the initial value during the fitting
procedure. The R-square parameter provided by the software OriginPro 8 (OriginLab
Corporation) was taken as a measure of the goodness of the fit.
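The model evaluated during such a decomposition can be sketched as follows. The sketch assumes that the asymmetric peak function has the extreme-value form y = y0 + A·exp(-exp(-z) - z + 1) with z = (x - xc)/w; this form, the function names and the band parameters in the example are assumptions, and the nonlinear least-squares optimization itself (performed in OriginPro in this work) is not reproduced.

```python
import math


def asymmetric_peak(x, y0, xc, w, amplitude):
    """Extreme-value type peak; equals y0 + amplitude at x == xc."""
    z = (x - xc) / w
    return y0 + amplitude * math.exp(-math.exp(-z) - z + 1.0)


def spectrum_model(x, y0, bands):
    """Sum of peaks sharing one offset; bands is a list of (xc, w, amplitude)."""
    return y0 + sum(asymmetric_peak(x, 0.0, xc, w, a) for xc, w, a in bands)


def clamp_center(xc, xc_initial, window_nm=8.0):
    """Keep a band center within +/- window_nm of its initial estimate."""
    return min(max(xc, xc_initial - window_nm), xc_initial + window_nm)


# Initial band centers named in the text; widths and amplitudes are hypothetical.
initial_centers = [380.0, 420.0, 450.0, 480.0]
bands = [(xc, 20.0, 1.0) for xc in initial_centers]
intensity_at_450 = spectrum_model(450.0, 0.0, bands)
```

The area under each fitted band, rather than its height, is what enters the tables of relative contributions, so a fit routine built on this model would integrate each band separately after optimization.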
Time-resolved fluorescence decays were measured using the time-correlated single photon
counting technique on an IBH 5000 U SPC instrument (HORIBA Jobin-Yvon, USA) equipped
with a cooled Hamamatsu R3809U-50 microchannel plate photomultiplier (Hamamatsu,
Japan) with 40 ps time resolution and a time setting of 7 ps per channel. Bandwidths for both
the excitation and emission monochromators were set to 8 nm. A 399 nm cut-off filter was
used to eliminate scattered light. Samples were excited at 373 nm with an IBH NanoLED-11
diode laser (80 ps fwhm) or at 340 nm with an IBH NanoLED N-340 (900 ps fwhm) with a
repetition frequency of 1 MHz. The detected signal was kept below 20 000 counts per
second in order to avoid shortening of the recorded lifetime due to the pile-up effect. The
experimental temperature was set to 10 °C. Fluorescence decays were fitted (by an iterative
reconvolution procedure with IBH DAS6 software) to a multiexponential function (Equation
22) convoluted with the experimental response function IRF ("prompt"), yielding sets of
lifetimes τi and corresponding amplitudes Ai. The average lifetimes <τ> were calculated
according to Equation 23.
$$I(t) = \left[\sum_i A_i e^{-t/\tau_i}\right] \otimes IRF$$
Equation 22
Equation 23
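The forward model used in iterative reconvolution, a multiexponential decay convoluted with the instrument response, can be sketched as below. This is a simplified discrete illustration, not the DAS6 algorithm: the function names and example parameters are hypothetical, and the parameter optimization and the averaging of lifetimes are deliberately left out.

```python
import math


def multiexp(t, amplitudes, lifetimes):
    """Sum of exponentials A_i * exp(-t / tau_i) for t >= 0."""
    return sum(a * math.exp(-t / tau) for a, tau in zip(amplitudes, lifetimes))


def convolve_with_irf(irf, amplitudes, lifetimes, dt):
    """Discrete convolution of the decay model with the measured IRF.

    irf holds the response in equally spaced channels of width dt;
    the result has the same number of channels.
    """
    result = []
    for k in range(len(irf)):
        total = 0.0
        for j in range(k + 1):
            total += irf[j] * multiexp((k - j) * dt, amplitudes, lifetimes)
        result.append(total)
    return result


# Hypothetical check: an ideal single-channel IRF returns the bare decay.
delta_irf = [1.0, 0.0, 0.0, 0.0]
amps = [2.0, 1.0]
taus = [1.0, 4.0]  # ns
model = convolve_with_irf(delta_irf, amps, taus, dt=0.5)
```

In a fit, the amplitudes and lifetimes would be varied until the convoluted model matches the recorded decay, which is why the width of the excitation pulse limits the shortest lifetime that can be resolved.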
Parameterization of UAA residue for force field calculations
The structure of the UAA residue was constructed in the extended and α-helical
conformations of backbone atoms using the Avogadro 1.0.3 program³⁶⁵. For both backbone
conformations, the lowest-energy conformations of the UAA side chain were identified
with a systematic rotor search, keeping the backbone part of the UAA residue fixed. The UAA
residue was then capped with N-methylamide (NME) and acetyl (ACE) residues for the
purpose of charge fitting. The geometry of the selected conformations (UAA1-4; Figure S 24)
was optimized employing the MP2/6-31G* wave function using the Gaussian09 program,
revision D.01³⁶⁶. The partial atomic charges of the novel residue were obtained using the RESP
ESP charge Derive (R.E.D.) Server 2.0³⁶⁷,³⁶⁸ at the HF/6-31G* level of theory using the
Gaussian09 program. The charges on the UAA residue were derived employing the RESP-A1A
charge model using a multi-conformation, multi-orientation RESP fit. The charges on the
capping NME and ACE residues were constrained to zero during the fitting procedure. The
charges on the four atoms forming the peptide bond (N26, H27, C28 and O29) were
constrained to the corresponding values of the electro-neutral residues from the force field
of Cornell et al.³⁶⁹ The atom types were derived by analogy with the force field of Cornell et
al. The only exception is the aromatic oxygen contained in the moiety of the fluorescent probe,
whose parameters were obtained from the study of VanBeek et al.³⁷⁰ Partial charges and
atom types of the UAA residue are summarized in Table S 30.
Preparation of protein structures for simulations
Structures of DhaA (PDB-ID: 4E46) and DbjA (PDB-ID: 3A2M, chain A) were downloaded
from the RCSB PDB database³⁷¹. All selected crystal structures were prepared for analysis
by removing ligands and water molecules. Both structures were protonated by the H++ server
at pH 7.5³⁴⁰. The viability of introducing the four selected conformations (UAA1-4) into both
enzymes was assessed using Pymol 1.7²³⁵. In the case of DbjA, three conformations (UAA1,
UAA3 and UAA4) fitted without serious steric clashes. In the case of DhaA, only a single
conformation (UAA2) was viable. All water molecules from the crystal structure of DbjA
that did not overlap with the protein structure were returned to the system. In the case of
DhaA, non-overlapping water molecules from the crystal structure 1CQW were added in
order to properly solvate the enzyme active site (structure 4E46 contains a bound ligand). Cl⁻
and Na⁺ ions were added to a final concentration of 0.1 M using the tLeap module of AMBER
12³⁴¹. Using the same module, an octahedron of TIP3P water molecules³⁴² was added at
a distance of 10 Å from any atom in the system.
Molecular dynamics simulation
Energy minimization and MD simulations were carried out in the PMEMD.CUDA module³⁷²,³⁷³
of AMBER12 using the ff10 force field³⁴³,³⁶³,³⁶⁴. Initially, the investigated systems were
minimized by 500 steps of steepest descent followed by 500 steps of conjugate gradient in
five rounds of decreasing harmonic restraints. The restraints were applied as follows: 500
kcal mol⁻¹ Å⁻² on all heavy atoms of the protein, and then 500, 125, 25 and 0 kcal mol⁻¹ Å⁻² on
backbone atoms only. The subsequent MD simulations employed periodic boundary
conditions, the particle mesh Ewald method for treatment of the electrostatic
interactions¹⁵⁵,³⁴⁴, a 10 Å cutoff and a 2 fs time step with the SHAKE algorithm to fix all bonds
containing hydrogens³⁴⁵. Equilibration simulations consisted of two steps: (i) 20 ps of
gradual heating from 0 to 300 K under constant volume using a Langevin thermostat with a
collision frequency of 1.0 ps⁻¹ and harmonic restraints of 5.0 kcal mol⁻¹ Å⁻² on the positions
of all protein atoms; (ii) 2000 ps of unrestrained MD at 300 K using the Langevin thermostat
and a constant pressure of 1.0 bar with a pressure coupling constant of 1.0 ps. Finally, two
separate 200 ns long production MD simulations were run for each system using the same
settings as in the second step of the equilibration MD. Coordinates were saved at 2 ps intervals
and the trajectories were analyzed using the Cpptraj module³⁷⁴ of AMBER12 and visualized in
Pymol 1.7 and VMD 1.9.1³⁴⁶. The total free energy of the enzymes, used as a proxy
for their stability, was calculated by the Molecular-Mechanics/Generalized-Born
Surface Area method. A total of 1000 snapshots, taken as every 100th frame of each MD
trajectory, were used in the analysis. The free energy was calculated by combining the gas-phase
energy contributions with solvation free energy components calculated from an implicit solvent
model. Input topologies of the sole enzymes were prepared by the tLeap module of AMBER12
using the ff10 force field. The following settings were used for the calculation: PBradii were set
to mbondi3, Generalized-Born model = 8 and saltcon = 0.1. The analysis was performed by
the Python script MMPBSA.py³⁷⁵ implemented in AmberTools13.
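The bookkeeping behind the snapshot selection, and the corresponding analysis settings, can be sketched as follows. The helper names are hypothetical, and the generated text is only an assumed reading of the stated settings in terms of the MMPBSA.py input variables interval, igb and saltcon; it is not the input file actually used in this work.

```python
def n_saved_frames(simulation_ns, save_interval_ps):
    """Number of coordinate frames written during a production run."""
    return round(simulation_ns * 1000.0 / save_interval_ps)


def n_snapshots(total_frames, stride):
    """Number of frames analysed when every stride-th frame is taken."""
    return total_frames // stride


def mmgbsa_input(interval, igb, saltcon):
    """Build an MM/GBSA input text with &general and &gb namelists."""
    return (
        "&general\n"
        f"  interval={interval},\n"
        "/\n"
        "&gb\n"
        f"  igb={igb}, saltcon={saltcon},\n"
        "/\n"
    )


frames = n_saved_frames(200.0, 2.0)   # 200 ns saved every 2 ps
snapshots = n_snapshots(frames, 100)  # every 100th frame
input_text = mmgbsa_input(interval=100, igb=8, saltcon=0.1)
```

With 200 ns trajectories saved every 2 ps, a stride of 100 frames corresponds to one snapshot per 200 ps, which is consistent with the 1000 snapshots per trajectory stated above.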
7.5.2 Supplementary Tables
Table S 26. Areas of the deconvoluted emission spectra of the different forms of 7H4MC when
embedded in AOT reverse micelles with different water content (w0)
w0    AN (%)   AT (%)   AA (%)   AA+AT (%)
0     100      0        0        0
2     100      0        0        0
5     92       8        0        8
10    68       18       14       32
20    49       25       26       51
40    45       7        48       55
AN, AT and AA stand for the areas of the decomposed emission spectra of the neutral, tautomeric and anionic
forms, respectively. The complex form is not created under the given conditions. The increase in the sum of
the areas corresponding to the anionic and tautomeric forms (AA+AT) with the growing water content (w0)
demonstrates that this parameter is suitable for qualitative characterization of the extent of hydration.
Table S 27. Areas of the deconvoluted emission spectra of the different forms of 7H4MC when
embedded in AOT reverse micelles with different content of imidazole aqueous solution (5 M, pH
~ 13 and pH ~ 7)
        pH 13                        pH 7
w0      AN (%)   AC (%)   AA (%)     AN (%)   AC (%)   AA (%)
0       100      0        0          100      0        0
0.2     57       31       12
0.5     44       33       23         57       31       12
1       38       33       29         45       33       22
2       33       37       30         37       34       29
5       18       11       71         32       33       35
10      9        9        82         17       11       72
Imidazole was added in order to mimic the -NH+- and -NH- functional groups present in the protein matrix.
This results in the formation of the complex form of 7H4MC (emission wavelength ~ 420 nm). AN, AC
and AA stand for the areas of the decomposed emission spectra of the neutral, complex and anionic forms,
respectively. The tautomeric form is not created under the given conditions. The contribution of the anionic
form AA increases with the growing w0 and therefore qualitatively reflects the degree of hydration of the
microenvironment surrounding the probe.
Table S 28. Specific activities of the wild-type haloalkane dehalogenases DhaA and DbjA and their variants
DhaA:C176UAA and DbjA:G183UAA, with incorporated unnatural amino acid, measured with
1,2-dibromoethane.
Enzyme          Specific activity [µmol s⁻¹ mg⁻¹ of enzyme]
DhaA            0.0648
DhaA:C176UAA    0.0493
DbjA            0.0928
DbjA:G183UAA    0.1154
Table S 29. Areas of the deconvoluted emission spectra of the different forms of the 7H4MC
fluorophore present in the UAA incorporated in DhaA and DbjA. Data were recorded at various
temperatures in order to follow the effect of the thermal denaturation of the protein on the
hydration parameter (AA+AT).
                AN (%)   AC (%)   AA (%)   AT (%)   AA+AT (%)
DhaA:C176UAA
10 °C           32       18       43       7        50
30 °C           25       19       56       0        56
50 °C           0        17       83       0        83
55 °C           0        13       87       0        87
65 °C           0        11       89       0        89
DbjA:G183UAA
10 °C           7        24       38       32       70
30 °C           5        25       49       22       71
50 °C           5        17       78       0        78
55 °C           0        19       81       0        81
60 °C           0        17       83       0        83
AT, AC, AN and AA stand for the areas of the decomposed emission spectra of the tautomeric, complex, neutral
and anionic forms, respectively. The intensities were corrected for the different quantum yields of the various
forms based on the lifetime values recorded at their corresponding wavelengths (see Figure S 27).
Table S 30. Atom types and partial charges for the UAA residue.
Atom ID   Atom type   Partial charge
1         CT          -0.1295
2         HC          0.0771
3         HC          0.0771
4         CA          0.0630
5         CA          -0.3605
6         HA          0.1538
7         CA          0.7456
8         OA          -0.3536
9         CA          0.2830
10        CA          0.0023
11        CA          -0.1637
12        HA          0.1666
13        CA          -0.2489
14        HA          0.1852
15        CA          0.3629
16        CA          -0.3933
17        HA          0.1877
18        OH          -0.5843
19        HO          0.4412
20        O           -0.5754
21        CT          -0.0119
22        HC          0.0434
23        HC          0.0434
24        CT          0.0159
25        H1          0.0873
26        N           -0.4157
27        H           0.2719
28        C           0.5973
29        O           -0.5679
Atom IDs are defined in Figure S 24. Atom types are based on Cornell et al. force field.
Table S 31. Stability of enzymes with incorporated UAA.
                             Energy [kcal/mol]
Enzyme   UAA conformation    MD1         MD2        Average
DhaA     UAA_2               -6572±2     -6554±2    -6563±3
DbjA     UAA_1               -7479±2*    -7474±2    -7474±2
DbjA     UAA_3               -7476±2     -7483±2    -7480±3
DbjA     UAA_4               -7475±2     -7490±2    -7483±3
* The conformation UAA_1 was unstable in this simulation and changed to the conformation UAA_3.
Table S 32. Potential hydrogen bonding involving UAA. Hydrogen bonds formed with the side chain
of the UAA are shaded.
                                                                        Average geometry of detected bond
Enzyme         UAA conformation   Acceptor   Donor      Occurrence [%]  Distance [Å]   Angle [°]
DhaA:C176UAA   UAA_2              Thr148     UAA        95±3            2.8            162.4
                                  UAA        His272     85±3            3.1            156.4
                                  Leu173     UAA        75±5            3.3            150.9
                                  UAA        Tyr273     78±3            3.5            156.5
                                  Ala172     UAA        28±7            3.0            147.2
                                  Ala145     UAA        2±1             3.3            143.44
DbjA:G183UAA   UAA_3              Glu146ᵃ    UAA        82±15           2.7            163.3
                                  Arg179     UAA        64±3            3.1            151.2
                                  UAA        Ile185     25±1            3.0            142.6
                                  Val180     UAA        16±1            3.2            143.8
                                  UAA        His139     12±2            3.6            148.1
               UAA_4              Arg179     UAA        80±5            3.1            152.6
                                  UAA        Ile185     13±8            3.0            141.8
                                  Val180     UAA        7±4             3.3            142.8
                                  UAA        Arg179ᵇ    5±1             3.2            151.5
ᵃ Hydrogen bond with Glu146 is formed via OE1 and OE2 atoms.
ᵇ Hydrogen bond with Arg179 is formed via HH11, HH21 and HE atoms.
7.5.3 Supplementary Figures
Figure S 20. General reaction scheme of 7H4MC in the ground and excited state. (A) Ground state
equilibrium of 7H4MC. Three forms of 7H4MC (neutral, anionic and a complex of the neutral form, for instance
with an adjacent amino group) can exist and/or coexist. The value of pKa for Reaction I is approximately 7.8.
The numbers accompanying the rippled arrows correspond to the absorption wavelengths of the particular
7H4MC form. (B) Neutral, anionic and complex forms can also occur in the excited state. In addition, proton
transfer (Reaction III) can take place, resulting in the formation of a tautomer. This proton transfer was shown
to be promoted by the presence of 'structured water'³⁶¹. In bulk, formation of the anion (Reaction I) prevails since
pKa* in the excited state decreases beyond 1.5³⁶¹. The numbers accompanying the rippled arrows correspond
to the emission wavelengths of the particular 7H4MC form. Red reaction arrows highlight the dominant
pathways in the excited state.
[Plot: x-axis wavelength (nm), 350-600]
Figure S 21. Emission spectra of 7H4MC incorporated in AOT reverse micelles at various water/surfactant
ratios (w0). Spectra were recorded at the excitation wavelength of 320 nm. The black curve represents the recorded
emission spectrum; the green, blue and orange curves represent its decomposition into the neutral, anionic
and tautomeric contributions, respectively. Panels (A), (B) and (C) depict the emission spectra recorded at w0 =
0, 10 and 40, respectively.
[Plot: x-axis wavelength (nm)]
Figure S 22. Far-UV CD spectra of wild type haloalkane dehalogenases and their variants DhaA:C176UAA
and DbjA:G183UAA containing the unnatural amino acid. The spectra confirm the proper folding of labeled
proteins.
[Plots, two panels: x-axis time (ns)]
Figure S 23. Fluorescence decays recorded for DbjA:G183UAA (panel A) and DhaA:C176UAA (panel B). The
excitation wavelength was set to 340 nm thus exciting preferentially the neutral form. The emission
wavelengths were positioned at 450 nm (black curve), 490 nm (red curve), and 520 nm (blue curve). The
emission wavelength of 450 nm corresponds to the anionic band, while the latter ones reflect preferentially
the tautomeric form. The recorded decays for DhaA:C176UAA (panel A) are independent of the emission
wavelength. In contrast, an increasing contribution of a negative component (with a time constant of 3.3 ns)
is observed at emission wavelengths 490 nm and 520 nm in the case of DbjA:G183UAA (panel B). This finding
indicates that the tautomeric form is created predominantly from the neutral form in the case of
DbjA:G183UAA, and does not have a significant contribution in the case of DhaA:C176UAA.
Figure S 24. Four lowest-energy conformations of the UAA residue employed for the parameterization.
[Figure panels: DhaA:C176UAA_2, DbjA:G183UAA_1, DbjA:G183UAA_3, DbjA:G183UAA_4]
Figure S 25. Conformations of UAA incorporated into the structures of DhaA and DbjA enzymes. Enzymes
are shown as gray cartoon. Initial conformation of UAA and two conformations of UAA averaged over two
MD simulations are shown as yellow, green and cyan sticks, respectively.
Figure S 26. Residence time of water molecules at the tunnel mouth of the studied enzymes. Water molecules
locked by UAA in DbjA:G183UAA_3 show significantly longer residence times.
[Plot: "UAA - lifetimes at 10 °C"; series: DhaA excited at 340 nm, DhaA excited at 370 nm, DbjA excited at
340 nm, DbjA excited at 370 nm; x-axis wavelength (nm), 380-520]
Figure S 27. Average fluorescence lifetimes of UAA incorporated in DhaA:C176UAA and DbjA:G183UAA.
Data were obtained by excitation at wavelengths of 340 and 370 nm. The average lifetime corresponding
to the neutral form (emission maximum at 380 nm) is approximately half of the values for all the other forms
(emission maxima at 420 nm, 450 nm and 480 nm for the complex, anionic and tautomeric forms, respectively).
It is therefore reasonable to assume that the quantum yield of the neutral form is reduced to half when
compared to all the other forms. This was taken into account when calculating the areas AN
in Table S 29.
8 Different Structural Origins of Haloalkane Dehalogenases'
Enantioselectivity towards Linear β-Haloalkanes: Open-Solvated
versus Occluded-Desolvated Active Sites
Veronika Liskova[1,2], Veronika Stepankova[1,2,3], David Bednář[1,2], Zbyněk Prokop[1,2], Jan
Brezovsky[1,2], Radka Chaloupkova[1,2], and Jiri Damborsky[1,2]
[1] Loschmidt Laboratories and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno
(Czech Republic)
[2] International Clinical Research Center, St. Anne's University Hospital, Pekařská 53, 656 91 Brno (Czech
Republic)
[3] Enantis s.r.o., Kamenice 34, 625 00 Brno (Czech Republic)
Manuscript under review in Angewandte Chemie
8.1 Abstract
Enzymatic enantiodiscrimination of linear β-haloalkanes is difficult because the substrates'
simple structures prevent directional interactions. Here we describe two distinct molecular
mechanisms for enantiodiscrimination of the β-haloalkane 2-bromopentane by haloalkane
dehalogenases. The highly enantioselective DbjA has a very open solvent-accessible active
site, while the engineered enzyme DhaA31 has a much more occluded and less solvated
cavity but achieves similar enantioselectivity. The enantioselectivity of DhaA31 is shown to
arise from steric hindrance imposed by two specific substitutions rather than hydration as
in DbjA.
8.2 Main Article
Enzymatic catalysis is a powerful tool for preparing optically pure chemicals²⁵⁸,³⁷⁶⁻³⁷⁸.
However, enzyme engineering is often required to develop biocatalysts enantioselective
with non-natural substrates.
Computational methods can greatly increase the effectiveness of engineering efforts¹⁵¹,³⁷⁹⁻³⁸³,
but even when such methods are applied, an understanding of the factors that govern
the active site's interactions with enantiomeric substrates is essential for rational
enantioselectivity engineering.
Haloalkane dehalogenases (HLDs; EC 3.8.1.5) catalyse the hydrolytic cleavage of carbon-halogen
bonds via an SN2-type mechanism (Figure S 28)¹⁷⁸,²⁹⁰. They are useful for the
enantiodiscrimination of β-bromoalkanes, α-bromoesters and α-bromoamides¹⁶⁶,¹⁶⁷,²³⁴,³⁸⁴,
yielding a wide variety of enantiopure intermediates useful in
the synthesis of drugs and other bioactive compounds³⁸⁴,³⁸⁵. Engineered
enantiocomplementary HLDs can also be used to prepare (R)- and (S)-2,3-dichloropropan-1-ol,
and thence the chiral building blocks (R)- and (S)-epichlorohydrin³⁸⁶. HLDs are good
model systems for studying the structural principles of enantioselectivity because
individual family members have different enantioselectivities (Figure 23) and their tertiary
structures have been determined at atomic resolution. There has been particular interest
in the enzymes DbjA from Bradyrhizobium japonicum³⁶² and DhaA from Rhodococcus
rhodochrous¹⁷², which belong to the same HLD subfamily but show very different
enantioselectivities toward β-bromoalkanes¹⁶⁶,³⁴⁹. While DbjA is highly enantioselective (E
= 174) towards racemic 2-bromopentane (2-BP), DhaA exhibits low enantioselectivity for
this substrate (E = 18)¹⁶⁶. Analyses of the enzymes' crystal structures revealed that the DbjA
active site is more open and solvent-accessible than that of DhaA³⁴⁹. Assuming that the
active site geometry is crucial for substrate recognition, it was proposed that DbjA's
enantioselectivity could be transferred to DhaA by transplanting the former enzyme's
active site into the latter³⁴⁹. To this end, 8 point mutations and an 11-amino-acid insertion
were introduced into DhaA to create the mutant DhaA12, whose active site and access
tunnel are almost identical to those of DbjA at the atomic level (Figure 23). Although the
transplantation yielded a correctly folded and functional enzyme, its enantioselectivity was
low. This was attributed to differences in dynamics and hydration, which are very difficult
to engineer rationally³⁴⁹.
[Figure panel labels: E = 179, occluded active site (DhaA31); E = 18 (DhaA); E = 174, wide-open active site (DbjA)]
Figure 23. Structural bases of HLD enantioselectivity towards linear β-haloalkanes. Top right: transplantation
of the 11 amino acid Extra Region (ER) and 8 point mutations from the highly enantioselective DbjA (which
has an open and solvent-accessible active site) into the weakly enantioselective DhaA yields the weakly
enantioselective DhaA12. The enantioselectivity of DbjA results from water-mediated interactions of the
2-bromopentane (2-BP) alkyl chain with the active site's hydrophobic wall¹⁶⁶, which are very difficult to
manipulate rationally. Top left: introducing 5 point mutations into DhaA yields the strongly enantioselective
DhaA31, whose active site is occluded and not solvent-accessible. A variant bearing only two of these
mutations, DhaA139, retains high enantioselectivity with a much more occluded and less open active site
than the similarly enantioselective DbjA. The enantioselectivities of DhaA31 and DhaA139 are due to the
hydrophobic substrate's interactions with the occluded active sites, which force (R)-2-BP into a reactive
configuration. Bottom: cross-sections of the crystal structures of DhaA31 (red; PDB ID 3RK4), DhaA (gray; PDB
ID 4HZG) and DbjA (blue; PDB ID 3A2M), highlighting the differences in the solvent-accessibility of their active
sites. The molecular surface of each enzyme is highlighted by the corresponding colour and black arrows show the
entrance to the active site of the enzymes. DhaA vs. DbjA and DhaA vs. DhaA31 exhibit 51 and 98 % sequence
identity, respectively, and share RMSD 0.54 and 0.09 Å, respectively.
Whereas systematic reengineering of the active site and main access tunnel in DhaA did
not increase enantioselectivity, another DhaA variant created by semi-rational engineering
aimed at enhancing enzyme activity with the toxic compound 1,2,3-trichloropropane,
DhaA31²³⁸, exhibits excellent enantioselectivity towards 2-BP (E = 179). This is remarkable
because DhaA31 is less structurally similar to DbjA than DhaA (Figure 23). DhaA31 bears
five point mutations (I135F, C176Y, V245F, L246I and Y273F) that insert four large aromatic
side chains into its access tunnels, narrowing them relative to DhaA and occluding the
active site. Its level of active site hydration, which is considered important for DbjA's
enantioselectivity³⁴⁹, is far lower than in that enzyme, suggesting a different structural basis
for enantioselectivity towards 2-BP. Here we present a systematic study on the molecular
basis of enantioselectivity in DbjA, DhaA and DhaA31 using thermodynamic analyses,
steady- and pre-steady-state kinetics measurements, site-directed mutagenesis, and
molecular modelling.
The enthalpic (ΔR-SΔH‡) and entropic (TΔR-SΔS‡293K) contributions to a catalyst's
enantioselectivity can be determined by studying the temperature dependence of the
enantioselectivity³⁸⁷. Such experiments with DhaA31 and DhaA revealed that mutations in
the former enzyme significantly increased the enthalpic contribution (Table 9; Figure S 29).
The preferential conversion of the (R)-enantiomer by DhaA31 is thus due to a high
differential activation enthalpy, which may be related to differences in the enantiomers'
orientation in the transition state. A similar explanation for the enantioselective conversion
of 2-BP was proposed for DbjA¹⁶⁶. The entropic component (which is around 83% as large
as the enthalpic component)¹⁶⁶ also contributes significantly to DbjA's enantioselectivity.
This is attributed to the binding site's high hydration³⁴⁹: water promotes hydrophobic
interactions between a wall in the active site and the two enantiomers' alkyl chains such
that (R)-2-BP binds exclusively in a reactive mode but (S)-2-BP adopts both reactive and
non-reactive binding modes¹⁶⁶. The entropic contribution for the weakly enantioselective,
less hydrated DhaA is much smaller³⁸⁸. If the DhaA and DhaA31 active sites have similar
levels of hydration, they should have similar entropic contributions. However, the entropic
component for DhaA31 is large (86% of the enthalpic component), like that of DbjA but
unlike that of DhaA. The differential activation entropy is a complex term covering
differences between the enantiomers in conformational degrees of freedom of the protein,
losses in conformational degrees of freedom of the substrate, or differential solvation
effects³⁸⁹. The contribution of entropy to DhaA31 enantiodiscrimination of 2-BP is probably
not connected with differences in the enantiomers' hydration, given the active site pocket's low
water accessibility (Figure 23), but rather with differences in the spatial freedom of (R)- and
(S)-2-BP inside the enzyme active site (Figure 23). The greater differential activation entropy
(relative to DhaA) indicates a more rigid transition state for the preferred than for the non-preferred
2-BP enantiomer and also a greater dependence of the protein's active site flexibility on the
temperature³⁸⁹.
Table 9. The enantioselectivity values (E) and thermodynamic components of DhaA, DhaA31 and
DbjA for the kinetic resolution of 2-BP.
Enzyme    E293K   E313K   ΔR-SΔH‡ [kJ mol⁻¹]   TΔR-SΔS‡293K [kJ mol⁻¹]   ΔR-SΔG‡293K [kJ mol⁻¹][a]
DhaA      18      13      -15.4                -8.1                      -7.2
DhaA31    179     17      -91.6                -78.6                     -13.0
DbjA[b]   174     28      -69.5                -56.8                     -11.7
[a] ΔR-SΔG‡293K stands for the differential free energy of activation between the (R)- and (S)-enantiomers of 2-BP
(ΔR-SΔG‡293K = ΔR-SΔH‡ - TΔR-SΔS‡293K), with its enthalpic (ΔR-SΔH‡) and entropic (TΔR-SΔS‡293K) terms;
[b] values for DbjA were published previously by Prokop et al.¹⁶⁶
Steady-state kinetic parameters were measured for the hydrolysis of the separated
enantiomers of 2-BP¹⁶⁶. The kinetic data for DhaA, DbjA and DhaA31 were fitted using a
competitive steady-state model (Table 10). The contribution of substrate inhibition was
included only for the (R)-enantiomer; substrate inhibition was not observed with the
(S)-enantiomer. Steady-state kinetics measurements revealed significant differences in the
absolute values of the kinetic parameters for DhaA31 and DbjA, suggesting that the
enzymes have distinct catalytic behaviour. Despite these dissimilarities, both enzymes'
enantioselectivities are primarily due to differences in their Km values for the two
enantiomers. DhaA31 exhibits similar kcat values for both enantiomers but its Km for the
(R)-enantiomer is 155-fold lower than for the (S)-enantiomer. A similar trend has been reported
for DbjA¹⁶⁶, whereas the weakly enantioselective DhaA has very similar Km values for both
enantiomers (Table 10). Km is the ratio of the maximum rate of decomposition of the
enzyme-substrate complex (which corresponds to bimolecular nucleophilic substitution in
HLDs)¹⁷⁸ to the apparent rate of substrate binding³⁹⁰.
Table 10. Steady-state kinetic parameters for the hydrolysis of (R)- and (S)-2-BP by DhaA, DhaA31
and DbjA at 20 °C (mean ± standard deviation).
          (R)-2-BP                                                (S)-2-BP
Enzyme    kcat [s⁻¹]        Km [mM]             Ksi [mM]          kcat [s⁻¹]      Km [mM]
DhaA      0.338 ± 0.002     0.0110 ± 0.0002     4.0 ± 0.2         0.044 ± 0.001   0.0159 ± 0.0008
DhaA31    0.0355 ± 0.0001   0.00011 ± 0.00003   5.10 ± 0.02       0.036 ± 0.001   0.017 ± 0.002
DbjA      0.269 ± 0.001     0.0100 ± 0.0001     1.421 ± 0.001     0.55 ± 0.02     1.28 ± 0.06
Data were analysed using a competitive steady-state model with substrate inhibition for the (R)-enantiomer.
The lower and upper limits are available in Table S 33.
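For two competing enantiomers, the enantiomeric ratio is commonly estimated as the ratio of their specificity constants, E = (kcat/Km)R / (kcat/Km)S. The sketch below applies this textbook relation to the DhaA31 means from Table 10; the relation and the function names are assumptions of this illustration, not a description of how the E values in Table 9 were determined.

```python
def specificity_constant(kcat_per_s, km_mM):
    """Return kcat/Km in s^-1 mM^-1."""
    if km_mM <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_mM


def enantiomeric_ratio(kcat_r, km_r, kcat_s, km_s):
    """Ratio of specificity constants of the (R)- and (S)-enantiomers."""
    return specificity_constant(kcat_r, km_r) / specificity_constant(kcat_s, km_s)


# Mean values for DhaA31 taken from Table 10 (uncertainties ignored).
e_dhaa31 = enantiomeric_ratio(kcat_r=0.0355, km_r=0.00011,
                              kcat_s=0.036, km_s=0.017)
km_fold_difference = 0.017 / 0.00011
```

With nearly identical kcat values for the two enantiomers, the estimate is governed almost entirely by the ratio of the Km values, which illustrates the point that the enantioselectivity of DhaA31 is primarily due to the difference in Km.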
Pre-steady-state rapid quench burst and stopped-flow fluorescence analyses of DhaA31
and DbjA with the separated 2-BP enantiomers were performed to determine whether the
HLDs' enantioselectivity towards 2-BP was governed by substrate binding or the
subsequent SN2 reaction. The kinetic mechanisms of DhaA31 and DbjA are quite similar:
global kinetic analyses (Figure S 30) indicated that hydrolysis of the alkyl-enzyme
intermediate (fa and fo) is rate-limiting for all reaction pathways (Table S 34, Figure S 31,
Figure S 32). However, neither this rate-limiting step nor substrate binding contributes
substantially to the enantiodiscrimination of 2-BP by DhaA31 and DbjA, which arises
primarily from carbon-halogen bond cleavage (ki and k$). The ratios of rates of this process
for (R)- versus (S)-2-BP were 72 for DhaA31 and 15 for DbjA, implying that both enzymes
preferentially convert the (R)-enantiomer because it undergoes a faster SN2 reaction.
Taken together with the thermodynamic data and previous findings on the kinetic mechanism of
HLDs¹⁶⁵,¹⁷⁸, these results suggest that the positioning of the (R)- and (S)-enantiomers in the active site
of DhaA31 dictates the enzyme-substrate complex's ability to reach the SN2 transition state,
in keeping with previous findings for DbjA¹⁶⁶.
Molecular docking was used to study the positioning of (R)-2-BP and (S)-2-BP in the DhaA31
active site, revealing reactive and non-reactive binding modes for each enantiomer (Figure
24). (R)-2-BP adopts a reactive binding mode when its alkyl chain is in close contact with
Asn41, whereas close contact with Trp107 corresponds to a non-reactive mode. The
opposite is true for (S)-2-BP.
Figure 24. Binding modes of 2-BP enantiomers determined by molecular docking. The substrates (R)- and
(S)-2-BP are shown in azure and violet, respectively; the catalytic nucleophile (Asp) and two halide-stabilizing
residues (Trp and Asn) of DhaA/DhaA31 are shown in green. [A] reactive binding mode with the
(R)-enantiomer's alkyl chain in close contact with Asn41, [B] non-reactive binding mode with the
(R)-enantiomer's alkyl chain in close contact with Trp107, [C] reactive binding mode with the (S)-enantiomer's
alkyl chain in close contact with Trp107, and [D] non-reactive binding mode with the (S)-enantiomer's alkyl
chain in close contact with Asn41.
QM/MM adiabatic mapping along the reaction coordinate revealed very large differences
in activation energy (~60 kcal mol⁻¹) between the two binding modes. The frequencies of
potentially reactive positions for each enzyme-enantiomer pair were determined by
computing the proportion of time that the enzyme-substrate complex spent in a near-attack
conformation (NAC) over two 60 ns MD simulations. In the case of DhaA31, the
NAC proportions were 5.5±1.1 % for (R)-2-BP and 0.6±0.9 % for (S)-2-BP. The corresponding
values for the weakly enantioselective DhaA were 0.6±0.6 % and 0.4±0.6 %, respectively;
the values for (R)- and (S)-2-BP thus did not differ significantly (Table S 35). Similar results
were obtained for DbjA, in which both enantiomers bound primarily at the same wall of
the active site but adopted mirror-image orientations with displaced chiral centres. During
the simulations with DbjA, the (R)-enantiomer was sampled exclusively in a reactive binding
mode but the (S)-enantiomer adopted both reactive and non-reactive binding modes [166].
The free access of water molecules to the DbjA active site seems to be important for its
binding of hydrophobic linear haloalkanes and their enantiodiscrimination. Molecular
dynamics simulations were performed on the complexes of DhaA31 with (R)- and (S)-2-BP
to better understand the molecular basis of its enantioselectivity. These simulations revealed
that the main driver of its enantioselectivity is not hydration, as in DbjA [166,349], but high
complementarity between the occluded active site and the (R)-enantiomer.
Complementary interactions with the active site's hydrophobic wall stabilize the reactive
binding mode of (R)-2-BP, increasing its frequency of occurrence three-fold relative to wild-type
DhaA (Table S 36). Both enantiomers bind weakly to the wild-type enzyme, leading to
rapid release from its less occluded active site (Table S 36).
Site-directed mutagenesis was used to construct DhaA variants with different combinations
of the five DhaA31 point mutations to identify those essential for differential binding of the
2-BP enantiomers (Figure 25, Table S 37). A DhaA variant bearing two of these mutations
in the main tunnel (C176Y+Y273F) was previously studied by Bosma et al. [327]. While both
mutations are believed to reduce the active site's size and affect substrate binding, this
variant's enantioselectivity (E = 25) is similar to that of wild-type DhaA. Therefore, three new
variants, each carrying one of the other three DhaA31 point mutations (I135F, V245F and L246I),
were constructed. Enantioselectivity measurements with 2-BP revealed that V245F and
L246I increased enantioselectivity 5- and 2-fold, respectively, but I135F had no effect.
Kinetic resolution profiling suggested that both V245F and L246I reduced the enzyme's Km
for the (R)-enantiomer. Combining the two beneficial mutations yielded the double-point
mutant DhaA139, whose enantioselectivity (E = 160) is close to that of DhaA31. Molecular
modelling indicates that the new bulky residues create steric hindrance in
the active site, promoting favourable interactions between (R)-2-BP and the active site
cavity. Consequently, the (S)-enantiomer only binds in a reactive configuration after the
(R)-enantiomer has been wholly consumed (Figure 25).
Figure 25. Kinetic resolution of 2-BP by DhaA, DhaA31, and DhaA variants with selected DhaA31 point
mutations. The central image shows the side-chains of mutated residues (in red) in the main tunnel (shown
in blue) and the slot tunnel (in green). The mutations C176Y and Y273F affect the main tunnel, residue 245 is
located at the interface of the tunnels and the active site, and residues 135 and 246 are in the slot tunnel.
The tunnels were analyzed using CAVER 3.01 [299]. Symbols: ● (S)-2-BP; ○ (R)-2-BP.
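The sequential consumption of the two enantiomers in a kinetic resolution can be illustrated with a minimal competitive Michaelis-Menten model. The sketch below is only an illustration under stated assumptions: the function names, the Runge-Kutta integrator and all parameter values are hypothetical and are not the fitting procedure or the fitted values used in this work.

```python
import math


def kinetic_resolution(r0, s0, e0, kcat_r, km_r, kcat_s, km_s, t_end, dt=1.0):
    """Integrate competitive Michaelis-Menten depletion of two enantiomers.

    Concentrations in mM, rate constants in 1/s, times in s.
    Returns a list of (time, [R], [S]) tuples.
    """
    def rates(r, s):
        # Both enantiomers compete for the same free enzyme.
        denom = 1.0 + r / km_r + s / km_s
        return (-e0 * kcat_r * (r / km_r) / denom,
                -e0 * kcat_s * (s / km_s) / denom)

    t, r, s = 0.0, r0, s0
    curve = [(t, r, s)]
    for _ in range(int(round(t_end / dt))):
        # Classical fourth-order Runge-Kutta step.
        a = rates(r, s)
        b = rates(r + 0.5 * dt * a[0], s + 0.5 * dt * a[1])
        c = rates(r + 0.5 * dt * b[0], s + 0.5 * dt * b[1])
        d = rates(r + dt * c[0], s + dt * c[1])
        r = max(r + dt * (a[0] + 2 * b[0] + 2 * c[0] + d[0]) / 6.0, 0.0)
        s = max(s + dt * (a[1] + 2 * b[1] + 2 * c[1] + d[1]) / 6.0, 0.0)
        t += dt
        curve.append((t, r, s))
    return curve


def apparent_selectivity(r0, s0, r, s):
    """ln(R/R0) / ln(S/S0), which equals E for this competitive model."""
    return math.log(r / r0) / math.log(s / s0)


# Hypothetical parameters chosen so that E = (0.04/0.0001)/(0.04/0.02) = 200.
curve = kinetic_resolution(0.375, 0.375, 0.002, 0.04, 0.0001, 0.04, 0.02, 6000.0)
t_half, r_half, s_half = next(p for p in curve if p[1] <= 0.5 * 0.375)
print(t_half, r_half, s_half, apparent_selectivity(0.375, 0.375, r_half, s_half))
```

With a large E, the slower enantiomer remains almost untouched until the faster one is nearly exhausted, which is the qualitative behaviour described for DhaA31.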
In summary, we have shown that the mechanistic origins of enantioselectivity in the
engineered HLD DhaA31 and the wild-type DbjA [166] are very different even though both
enzymes achieve similar enantiodiscrimination with the substrate 2-BP. In DbjA, the
preferred (R)-enantiomer adopts a reactive binding mode more frequently than the
non-preferred (S)-enantiomer due to water-promoted interactions of the alkyl chain with the
active site's hydrophobic wall. In DhaA31, two point mutations create an occluded active site
that complements (R)-2-BP; the non-preferred (S)-2-BP binds only after complete
conversion of (R)-2-BP. Our results show that an enzyme's enantioselectivity depends on
its hydration and dynamics as well as its structure, so one cannot assume that all members
of an enzyme family share the same structural basis of enantioselectivity. We also created
a highly enantioselective HLD from a non-selective enzyme via just two point mutations.
Whereas previous unsuccessful attempts to engineer an enantioselective DhaA involved
transplanting the wide-open and solvent-accessible DbjA active site, the
enantioselectivity-increasing mutations in DhaA31 occluded access to its active site [349].
A key future challenge will be to engineer an enantioselective "DbjA-type" enzyme by
modulating the active site's hydration; this will require new computational methods for
quantitative site-specific analysis of protein hydration.
Acknowledgements
The authors thank Prof. Petr Klan (Masaryk University, Brno) for help with the synthesis of enantiopure
substrates. Financial support is gratefully acknowledged from the Czech Grant Agency (16-06096S,
17-24321S), the Ministry of Education, Youth, and Sports of the Czech Republic (LH14027, LO1214,
LM2015051, LQ1605), and Masaryk University (MUNI/M/1888/2014). Access to the METACentrum
and CERIT-SC supercomputing facilities is highly appreciated (LM2015042 and LM2015085).
8.3 Supporting Information
8.3.1 Experimental section
Construction of DhaA variants. The DhaA variants were constructed using a megaprimer
PCR method with pET21b::dhaAwt as the initial template. The resulting dhaA L246I was
subsequently used for construction of the double-point mutant. The mutagenesis was
performed in two rounds of PCR. The 50 µl reaction mixtures contained 100 ng of
template DNA, 10 pmol of each oligonucleotide, 0.02 µM dNTPs (each) (New England
Biolabs, USA), and 2.5 U of Phusion HF DNA Polymerase (New England Biolabs, USA) in
Phusion HF buffer with 1.5 mM MgCl2 (New England Biolabs, USA). In the first round, a
linear product carrying the desired mutation was synthesised using the Fw and Rv
oligonucleotides. This product served as a megaprimer in the second round of PCR, in which
the whole plasmid was synthesised. The first round of PCR proceeded under the following
conditions: 3 min at 95°C, followed by 15 cycles of 30 s at 95°C, 30 s at 58°C or 52°C and
20 s or 26 s at 72°C. The second round of PCR included 25 cycles of 30 s at 95°C,
30 s at 68°C and 3 min 30 s at 72°C, followed by 10 min at 72°C. PCR products were then
treated with the methylation-dependent endonuclease DpnI (New England Biolabs, USA)
for 1 h at 37°C. The resulting plasmids were transformed by a heat-shock method into
Escherichia coli DH5α cells (ZymoResearch, USA) and amplified. The presence of the desired
mutations was confirmed by sequence analysis (GATC, Germany).
Sequence of primers used for construction of dhaA variants:
Constructed variant    Primer sequence (5'-3')
dhaA L246I             Fw: acacccggcgtaataatccccccggccgaag
                       Rv: agacccgtttagaggccccaaggggttatg
dhaA V245F             Fw: cacacccggctttctgatccccccggccgaag
                       Rv: agacccgtttagaggccccaaggggttatg
dhaA V245F, L246I      Fw: cccgcgaaattaatacgactcactataggg
                       Rv: tcaggccattcatcccaggtcggaatcggacgaataaattccatgc
Expression and purification of proteins. Recombinant plasmids coding for DbjA and DhaA
variants were transformed into Escherichia coli BL21(DE3). For overexpression, cells were
grown at 37°C to an optical density (OD600) of about 0.6 in 1 L of LB medium containing
ampicillin (100 µg·ml⁻¹). Protein expression was induced by adding IPTG to a final
concentration of 0.5 mM in LB medium and the temperature was decreased to 20°C. Cells
were harvested by centrifugation for 10 min at 3,700 g after overnight cultivation. During
harvesting, cells were washed once with 50 mM phosphate buffer with 10% glycerol (pH
7.5) and then resuspended in equilibrating purification buffer (16.4 mM K2HPO4, 3.6 mM
KH2PO4, 500 mM NaCl, 10 mM imidazole, pH 7.5). Harvested cells were kept at -80°C.
Defrosted cells were disrupted by sonication with a Hielscher UP200S ultrasonic processor
(Hielscher Ultrasonics, Teltow, Germany) and C-terminally His-tagged enzymes were purified
to homogeneity using Ni-NTA Superflow Cartridges (Qiagen, Germany) as described
previously [296]. The eluted proteins were dialyzed against phosphate buffer (50 mM, pH 7.5).
Protein concentrations were determined using the Bradford reagent (Sigma-Aldrich, USA)
with bovine serum albumin as a standard. The purity of the resulting proteins was checked
by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) in 15% polyacrylamide gels. The gels
were stained with Coomassie brilliant blue R-250 dye (Fluka, Switzerland) and the
molecular mass of the proteins was determined using the Protein Molecular Weight
Marker (Fermentas, Canada).
Activity assay. The dehalogenation activity was assayed using the colorimetric method
developed by Iwasaki et al. [290]. The release of halide ions was analysed
spectrophotometrically at 460 nm using a microplate reader (Tecan, Austria) after reaction
with mercuric thiocyanate and ferric ammonium sulfate. The reactions were performed at
37°C in 25-ml Reacti Flasks closed by Mininert Valves. The reaction mixtures were
composed of 10 ml of glycine buffer (100 mM, pH 8.6) and 10 µl of a substrate
(2-bromobutane, 2-bromopentane, 2-bromohexane, ethyl-2-bromopropionate or
methyl-2-bromobutyrate). Enzymatic reactions were initiated by addition of the enzyme and were
monitored by periodically withdrawing 1 ml samples from the reaction mixture. The
samples were immediately mixed with 0.1 ml of 35% nitric acid to terminate the reaction.
Dehalogenation activities were quantified as rates of product formation over time. Each
activity was measured in 3 independent replicates.
Enantioselectivity. The reaction mixture, composed of 40 ml of Tris-sulfate buffer (50 mM,
pH 8.2) and 10 µl of 2-bromopentane, was incubated for 30 min at 20°C in a water-bath
shaker (180 rpm). The enantioselectivity was analysed at 20°C in 25-ml Reacti Flasks closed
by Mininert Valves containing 25 ml of the reaction mixture. The enzymatic reaction was
initiated by addition of the enzyme. The reaction progress was monitored by periodically
withdrawing 0.5 ml samples from the reaction mixture. The reaction was stopped by
mixing the sample with 1 ml of diethyl ether containing 1,2-dichloroethane as an internal
standard. The samples were analysed using a gas chromatograph (Agilent, USA) equipped
with a flame ionization detector and a Chiraldex G-TA chiral capillary column (Alltech, USA).
Michaelis-Menten parameters were derived by fitting the progress curves obtained from
kinetic resolution experiments to a competitive kinetic model by numerical integration
using the software Micromath Scientist (MicroMath Research, USA). Enantioselectivity was
determined as the enantiomeric ratio (E) defined by Equation 24:
E = \frac{k_{cat}^{R} / K_{m}^{R}}{k_{cat}^{S} / K_{m}^{S}}
Equation 24
where kcat and Km represent the Michaelis-Menten parameters of the (R)- and (S)-enantiomers.
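As a worked illustration of the enantiomeric ratio, the short sketch below evaluates E as the ratio of the specificity constants kcat/Km of the two enantiomers. The function names are hypothetical, and placing the (R)-enantiomer in the numerator is an assumption based on it being the preferred enantiomer. The example numbers are the best-fit steady-state values from the Table S 33 row with the lowest Km for the (R)-enantiomer; an E value estimated this way need not coincide with the E values obtained from kinetic resolution experiments (Table S 37).

```python
def specificity_constant(kcat, km):
    """Catalytic efficiency kcat/Km (here in s^-1 mM^-1)."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def enantiomeric_ratio(kcat_r, km_r, kcat_s, km_s):
    """E = (kcat/Km of the (R)-enantiomer) / (kcat/Km of the (S)-enantiomer)."""
    return specificity_constant(kcat_r, km_r) / specificity_constant(kcat_s, km_s)


# Best-fit values copied from the Table S 33 row with the lowest (R)-enantiomer Km.
e_value = enantiomeric_ratio(kcat_r=0.0355, km_r=0.00011, kcat_s=0.036, km_s=0.017)
print(round(e_value))
```

Because the two kcat values are nearly equal in this example, almost all of the discrimination comes from the difference in Km.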
Steady-state kinetics. Substrate-to-product conversion by DhaA, DhaA31 and
DbjA was monitored using a VP-ITC isothermal titration microcalorimeter (MicroCal, USA).
The substrates (R)-2-bromopentane and (S)-2-bromopentane were dissolved in glycine buffer
(100 mM, pH 8.6) and allowed to reach thermal equilibrium in a 1.4-mL reaction cell.
The reaction was initiated by injecting the enzyme. All measurements were performed at
20 °C. The measured rate of heat change is directly proportional to the velocity of the
enzymatic reaction according to Equation 25:
\frac{dQ}{dt} = -\Delta H \cdot V \cdot \frac{d[S]}{dt}
Equation 25
where ΔH is the enthalpy of the reaction, [S] is the substrate concentration, and V is the
volume of the cell. ΔH was determined by titration of the substrate into the reaction cell
containing the enzyme. Each reaction was allowed to proceed to completion. The
integrated total heat of reaction was divided by the amount of injected substrate. The
evaluated rates of substrate depletion (-d[S]/dt) and the corresponding substrate
concentrations were then fitted using a competitive steady-state model.
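The conversion of calorimetric heat flow into a reaction rate can be sketched as follows. This is a minimal illustration, assuming the relation dQ/dt = -ΔH·V·d[S]/dt with ΔH taken as the total heat divided by the amount of substrate converted; the function names and all numerical values are hypothetical and the sign convention is an assumption.

```python
def apparent_enthalpy(total_heat_uj, substrate_umol):
    """Apparent molar enthalpy in uJ/umol (numerically equal to J/mol)."""
    return total_heat_uj / substrate_umol


def depletion_rate(heat_rate_uj_per_s, delta_h_j_per_mol, cell_volume_l):
    """Return -d[S]/dt in uM/s, assuming dQ/dt = -dH * V * d[S]/dt."""
    return heat_rate_uj_per_s / (delta_h_j_per_mol * cell_volume_l)


# Hypothetical numbers: 1.0 umol of substrate releasing 42,000 uJ in total,
# and an instantaneous heat rate of -5.88 uJ/s in a 1.4-mL cell.
delta_h = apparent_enthalpy(-42000.0, 1.0)      # -42,000 J/mol
rate = depletion_rate(-5.88, delta_h, 1.4e-3)   # about 0.1 uM/s
print(delta_h, rate)
```

Pairs of such rates and the corresponding substrate concentrations are the quantities that would then enter a steady-state fit.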
Pre-steady-state kinetics. The burst of the analysed reactions was monitored by rapid
quench flow experiments performed at 37°C in glycine buffer (100 mM, pH 8.6) using the
rapid quench flow instrument model QFM 400 (BioLogic, France). The reaction was started
by rapid mixing of 75 µl enzyme with 75 µl substrate solution and quenched with 100 µl 0.8
M H2SO4 after time intervals ranging from 5 ms to 3 s. The quenched mixture was directly
injected into 0.5 ml of ice-cold diethyl ether with 1,2-dichloroethane as the internal
standard. After extraction, the diethyl ether layer containing non-covalently bound
substrate and alcohol product was collected, dried on a short column containing anhydrous
Na2SO4 and analysed on an Agilent 7890 gas chromatograph (Agilent, USA) equipped with
a DB-FFAP capillary column (30 m × 0.25 mm × 0.25 µm, Phenomenex) and connected to
an Agilent 5975C mass spectrometer (Agilent, USA). The amount of halide in the water phase
was measured by an 861 Advanced Compact IC ion chromatograph equipped with a METROSEP
A Supp 5 column (Metrohm, Switzerland).
The fluorescence kinetic data were recorded using the stopped flow instrument SFM-300
(BioLogic, France) combined with a MOS-200 spectrometer equipped with a Xe arc lamp.
Fluorescence emission from tryptophan residues was observed through a 320 nm cut-off
filter upon excitation at 295 nm. All reactions were performed at 37°C in a glycine buffer of
pH 8.6.
Data Analysis and Statistics. All data were imported and fitted globally with the KinTek
Explorer program (KinTek Corporation). Data fitting used numerical integration of rate
equations from an input model, searching for a set of parameters that produces a minimum χ²
value using nonlinear regression based on the Levenberg-Marquardt method [391]. To
account for slight variations in the data, enzyme or substrate concentrations were slightly
adjusted (±10%) to derive best fits. In addition, substrate binding was assumed
to be in rapid equilibrium, and so the binding rate constant was fixed at 1,000 mM⁻¹ s⁻¹. By
allowing the dissociation rate to vary, calculations of equilibrium constants were then possible.
Residuals were normalized by the sigma value for each data point. The standard error (S.E.)
was calculated from the covariance matrix during nonlinear regression. In addition to S.E.
values, a more rigorous analysis of the variation of the kinetic parameters was accomplished
by confidence contour analysis using FitSpace Explorer (KinTek, USA). In this analysis,
the lower and upper limits for each parameter were derived from the confidence contours
for a χ² threshold at boundary 0.95 [392].
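The rapid-equilibrium treatment of substrate binding can be written out explicitly. The symbols k_on and k_off are introduced here only for illustration and do not appear in the original text; the numerical example uses K1 = 0.69 mM from Table S 34 and shows the dissociation rate constant that such a value would correspond to under the fixed association rate constant.

```latex
\[
  K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
  \quad\Longrightarrow\quad
  k_{\mathrm{off}} = K_d \, k_{\mathrm{on}},
  \qquad
  k_{\mathrm{on}} = 1000\ \mathrm{mM^{-1}\,s^{-1}}\ \text{(fixed)}
\]
% Illustration with K_1 = 0.69 mM (Table S 34):
\[
  k_{\mathrm{off}} = 0.69\ \mathrm{mM} \times 1000\ \mathrm{mM^{-1}\,s^{-1}}
                   = 690\ \mathrm{s^{-1}}
\]
```

Because k_on is fixed, only k_off is varied in the fit, and the equilibrium dissociation constant follows directly from the ratio.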
Molecular modelling. Preparation of the ligand structure. The three-dimensional
structures of (R)- and (S)-2-bromopentane were prepared in Avogadro [365]. Their partial
atomic charges were derived using the R.E.D. server [368]. Input geometries were optimized by
the Gaussian 09 D.01 program interfaced with this server and a multi-orientation RESP fit
with the RESP-A1A charge model was performed.
Preparation of protein structures. Two structures of DhaA from Rhodococcus rhodochrous
(PDB-ID: 4E46 and 4HZG) and two structures of the mutant DhaA31 (PDB-ID: 3RK4 and 4FWB)
were downloaded from the RCSB PDB database [371]. All these crystal structures were
prepared for analyses by removing ligands and water molecules. Missing heavy atoms in
side chains and protons were added using the H++ server at pH 7.5 [340].
Molecular docking. AutoDock atom types and Gasteiger charges were added to the protein and
ligands by MGLTools [393]. Precalculations of electrostatic potential energy, van der Waals,
H-bond and desolvation free energy terms for docking calculations were performed by AutoGrid
4.0 [127]. Grid maps of 80 × 80 × 80 points with a spacing of 0.25 Å were centred on the OD1
atom of the nucleophilic aspartate. These parameters were chosen to cover the active site and
the main tunnel. Substrates were docked into the enzyme using AutoDock 4.0 [127]. 250 runs
of the Lamarckian genetic algorithm were performed with initial population sizes of 50
and 300 using the following parameters: a maximum of 3 × 10⁶ energy evaluations and
30,000 generations, an elitism value of 1, a mutation rate of 0.02 and a crossover rate of 0.8.
The local search was performed by the Solis & Wets algorithm with at most 300 iterations [394].
MD simulations. Force field parameters for the docked conformations of ligands were
prepared by the antechamber module of AmberTools 14 with RESP charges obtained from
the R.E.D. server. Water molecules from the respective crystal structures were returned to the
systems. Cl⁻ and Na⁺ ions were added to a final concentration of 0.1 M using the tLEaP
module of AMBER 14 [341]. Using the same module, an octahedral box of TIP3P water
molecules [342] was added such that all atoms within the system were at least 10 Å from the
octahedron's surface. Energy minimisation and MD simulations were performed using
the PMEMD.CUDA module of AMBER14 [341] with the ff14SB force field [343] for proteins and
the general AMBER force field [15] for ligands. Initially, the investigated systems were minimised
by 500 steps of steepest descent followed by 500 steps of conjugate gradient over five
rounds with decreasing harmonic restraints. The restraints were applied as follows: 500
kcal·mol⁻¹·Å⁻² on all heavy atoms of the protein, and then 500, 125, 25 and 0 kcal·mol⁻¹·Å⁻²
on backbone atoms only. The subsequent MD simulations employed periodic boundary
conditions, using the particle mesh Ewald method to treat electrostatic interactions [155,344],
a 10 Å cut-off for non-bonded interactions, and a 2 fs time step with the SHAKE algorithm
to fix all bonds containing hydrogens [345]. Equilibration simulations consisted of two steps:
(i) 20 ps of gradual heating from 0 to 293 K at constant volume, using a Langevin thermostat
with a collision frequency of 1.0 ps⁻¹, and with harmonic restraints of 5.0 kcal·mol⁻¹·Å⁻² on
the positions of all protein atoms, and (ii) 2,000 ps at 293 K using the Langevin thermostat
at a constant pressure of 1.0 bar with a pressure coupling constant of 1.0 ps. Finally, two
separate 60 ns production MD simulations were run for each system using the same
settings as the second step of MD equilibration. Coordinates were saved at intervals of 2
ps and the resulting trajectories were analysed using the Cpptraj module of AMBER14,
and visualised using PyMOL 1.5 (The PyMOL Molecular Graphics System, Version 1.5.0.4
Schrödinger, LLC) and VMD 1.9.1 [346].
QM/MM adiabatic mapping of the dehalogenation. Adiabatic mapping along the
reaction coordinate was performed with the Sander module of AMBER14. The QM part of
the system contained the side-chains of the halide-stabilizing residues, the catalytic aspartate
and the ligand. The semiempirical PM6 Hamiltonian was used for the QM part [395] and the
ff14SB force field for the MM part of the system. The QM/MM boundary was treated through
explicit link atoms and the cutoff for the QM/MM charge interactions was set to 999 Å. A
restraint with a force constant of 1.0 kcal·mol⁻¹·Å⁻² was applied to the backbone. The reaction
coordinate was defined as the distance between the OD1 atom of the nucleophile and the C2
atom of the ligand. The system was driven along the reaction coordinate in 0.05 Å steps with
a restraint force constant of 5,000 kcal·mol⁻¹·Å⁻², each step consisting of 1,000 minimization
steps of the limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm [396].
Determination of near attack conformation (NAC). A NAC was defined by two geometric
criteria: the distance between the nucleophilic oxygen of the catalytic aspartate and the C2
atom of the ligand had to be within 3.41 Å [397], and the angle between the nucleophilic
oxygen of the catalytic aspartate, the C2 atom and the leaving bromine atom
of the ligand had to be higher than 157°.
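The two NAC criteria can be applied frame by frame to a trajectory. The following sketch assumes that Cartesian coordinates (in Å) of the nucleophilic oxygen, the C2 atom and the bromine atom have already been extracted for each frame; the function names are hypothetical and this is not the Cpptraj workflow used in this work.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees."""
    u = [x - y for x, y in zip(a, vertex)]
    v = [x - y for x, y in zip(c, vertex)]
    cos_theta = sum(p * q for p, q in zip(u, v)) / (
        distance(a, vertex) * distance(c, vertex))
    # Clamp to [-1, 1] to guard against rounding errors.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def is_nac(dist, angle, max_distance=3.41, min_angle=157.0):
    """True if the O...C2 distance and the O-C2-Br angle satisfy both criteria."""
    return dist <= max_distance and angle > min_angle


def nac_percentage(frames):
    """Percentage of (distance, angle) frames classified as NAC."""
    frames = list(frames)
    if not frames:
        return 0.0
    hits = sum(1 for d, a in frames if is_nac(d, a))
    return 100.0 * hits / len(frames)


# Hypothetical single frame: O, C2 and Br on a straight line, O...C2 = 3.0 Å.
o, c2, br = (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 0.0, 0.0)
print(is_nac(distance(o, c2), angle_deg(o, c2, br)))
```

With coordinates saved every 2 ps, a 60 ns simulation yields 30,000 frames to which such a classification would be applied.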
8.3.2 Supporting Information Figures and Tables
[Reaction scheme; R = alkoxycarbonyl or alkyl]
Figure S 28. Reaction mechanism of HLD with β-bromoalkanes. Enz-COO⁻: active site Asp [178,290].
[Plot against 1/T [K⁻¹] (0.0030-0.0036) with linear fits: y = 11017x - 32 (R² = 0.944); y = 1848x - 3 (R² = 0.917); a third fit with intercept -23 (R² = 0.993)]
Figure S 29. Temperature dependence of enantiomeric ratios determined for dehalogenation of 2-BP by
DhaA (red) [388], DhaA31 (green) and DbjA (blue).
Table S 33. Steady-state kinetic parameters for the hydrolysis of (R)- and (S)-2-BP by DhaA,
DhaA31 and DbjA at 20°C. Entries are given as value ± S.E. (lower limit; upper limit).
Enzyme  Substrate  kcat (s⁻¹)                       Km (mM)                                Ksi (mM)
DhaA    (R)-2-BP   0.338 ± 0.002 (0.336; 0.498)     0.0110 ± 0.0002 (0.0063; 0.0149)       4.0 ± 0.2 (0.8; 7.7)
DhaA    (S)-2-BP   0.044 ± 0.001 (0.044; 0.044)     0.0159 ± 0.0008 (0.0107; 0.0201)       -
DhaA31  (R)-2-BP   0.0355 ± 0.0001 (0.0354; 0.0408) 0.00011 ± 0.00003 (>0.00001; 0.00084)  5.10 ± 0.02 (2.09; 5.13)
DhaA31  (S)-2-BP   0.036 ± 0.001 (0.036; 0.036)     0.017 ± 0.002 (0.014; 0.018)           -
DbjA    (R)-2-BP   0.269 ± 0.001 (0.267; 0.298)     0.0100 ± 0.0001 (0.0036; 0.0173)       1.421 ± 0.001 (1.100; 1.450)
DbjA    (S)-2-BP   0.55 ± 0.02 (0.35; 1.04)         1.28 ± 0.06 (0.66; 2.92)               -
Data were analysed using a competitive steady-state model with substrate inhibition for the (R)-enantiomer.
Table S 34. Pre-steady-state kinetic parameters of 2-BP conversion by DhaA31 and DbjA.
Individual rate and equilibrium dissociation constants obtained by fitting of a competitive kinetic
model (Figure S 30) globally to steady-state and pre-steady-state kinetic data obtained at 37°C
and pH 8.6.
                      Substrate binding  C-Br bond cleavage (SN2)  Hydrolysis of intermediate (AdN)  Product release
(R)-2-bromopentane
Enzyme                K1 (mM)            k2 (s⁻¹)                  k3 (s⁻¹)                          K4 (mM)
DhaA31                0.69±0.03          400±20                    0.33±0.001                        >10
DbjA                  2.45±0.19          390±30                    0.92±0.01                         0.23±0.05
(S)-2-bromopentane
Enzyme                K5 (mM)            k6 (s⁻¹)                  k7 (s⁻¹)                          K8 (mM)
DhaA31                0.83±0.03          7.00±0.20                 0.436±0.001                       >10
DbjA                  7.10±0.60          26.00±2.00                0.75±0.01                         0.11±0.05
K1 and K5 - equilibrium dissociation constants for the complex of enzyme with (R)- and (S)-2-BP, respectively; k2
and k6 - rate constants for carbon-halogen bond cleavage in conversion of (R)- and (S)-2-BP, respectively; k3
and k7 - rate constants for hydrolysis of the alkyl-enzyme intermediate in conversion of (R)- and (S)-2-BP,
respectively; K4 and K8 - equilibrium dissociation constants for the enzyme-product complex in conversion of (R)-
and (S)-2-BP, respectively. Data were fitted globally to the competitive kinetic model (Figure S 30).
[Scheme with four steps: SUBSTRATE BINDING; CLEAVAGE OF C-Br BOND (SN2); HYDROLYSIS OF INTERMEDIATE (AdN); PRODUCT RELEASE.
(R)-pathway: E.S(R) -> E-I(R) -> E.P(R); (S)-pathway: E.S(S) -> E-I(S) -> E.P(S)]
Figure S 30. The kinetic model of 2-BP conversion by DhaA31 and DbjA at 37°C. E is free enzyme, E.S is
enzyme-substrate complex, E-I is covalently bound alkyl-enzyme intermediate (halide product is bound to the
intermediate), E.P is enzyme complex with both bromide and alcohol products. The subscript identifies the (R)-
or (S)-enantiomer of substrate (S), intermediate (I) and alcohol product (P).
Figure S 31. Kinetic analysis of DhaA31 reaction with 2-bromopentane. Total conversion of 1.17 and 1.29
mM (R)-2-bromopentane (A) and 1.13 and 1.29 mM (S)-2-bromopentane (B) by 0.9 µM DhaA31. Reaction
burst of halide (•) and alcohol (▲) product monitored upon mixing 160 µM DhaA31 with 350 µM
(R)-2-bromopentane (C) and 650 µM (S)-2-bromopentane (D). Kinetic resolution of 750 µM rac-2-bromopentane
by 2 µM DhaA31 (E), (R)-2-bromopentane (blue circles) and (S)-2-bromopentane (green circles). Stopped-flow
fluorescence traces recorded upon rapid mixing of 4 µM DhaA31 with 0 - 230 µM (S)-2-bromopentane (F);
each trace shows the average of ten individual experiments. All reactions were performed at 37°C and pH 8.6.
Solid lines represent the global fit to the kinetic data.
Figure S 32. Kinetic analysis of DbjAwt reaction with 2-bromopentane. Steady-state kinetics of
(R)-2-bromopentane (blue circles) and (S)-2-bromopentane (green circles) conversion by DbjAwt (A). Kinetic
resolution of 980 µM rac-2-bromopentane by 1 µM DbjAwt (B). Reaction burst of halide (•) and alcohol (▲)
product monitored upon mixing 120 µM DbjAwt with 350 µM (R)-2-bromopentane (C) and 160 µM DbjAwt
with 460 µM (S)-2-bromopentane (D). Stopped-flow fluorescence traces recorded upon rapid mixing of 155
µM DbjAwt with 50, 120 and 350 µM (R)-2-bromopentane (E) or (S)-2-bromopentane (F); each trace shows
the average of ten individual experiments. All reactions were performed at 37°C and pH 8.6. Solid lines represent
the global fit to the kinetic data.
Table S 35. Percentage of NACs for both substrates in molecular dynamics simulations.
Enzyme (R)-2-BP (%) (S)-2-BP (%)
DhaA 0.57±0.58 0.38±0.57
DhaA31 5.50±1.11 0.59±0.86
DbjA 0.31±0.11 0.17±0.02
NAC - near-attack configuration
Table S 36. Four categories of ligand positioning and NACs (NAC; the ground state configurations
that can convert to the transition state) identified within every category by molecular dynamics.
Enzyme  Substrate  Unstabilized [%]  NAC  Right [%]  NAC   Left [%]  NAC  Other [%]  NAC
DhaA    (R)        61                2    34         168   3         0    2          1
DhaA    (S)        82                1    6          5     10        297  2          5
DhaA31  (R)        0                 4    89         1610  6         0    5          37
DhaA31  (S)        17                3    52         1     10        169  21         4
NAC - near attack configuration
Table S 37. Specific activity and enantioselectivity (E values) of DhaA variants with 2-BP.
Variant        Specific activity [a]  E value [b]
DhaA           0.008 [c]              18
DhaA31         0.007                  179
C176Y+Y273F    0.005                  25
I135F          0.019                  27
L246I          0.025                  37
V245F          0.025                  88
V245F+L246I    0.016                  160
[a] nmol·s⁻¹·mg⁻¹ of enzyme at 37°C; [b] the E values were measured at 20°C; [c] data measured at room
temperature (Prokop et al. 2010) [166].
CONCLUSIONS
This dissertation deals with two important topics in structure-function relationships: (i)
protein stabilization (exemplified by haloalkane dehalogenase, γ-hexachlorocyclohexane
dehydrochlorinase LinA and fibroblast growth factor 2), and (ii) characterization of the
rationally engineered haloalkane dehalogenase DhaA. Current protein stabilization
methods focus mostly on identifying single-point mutants, each of which must be
characterized experimentally. The newly developed FireProt method combines individual
mutations directly into a final multiple-point mutant, so only a few proteins have to be
characterized. Its main advantages over other methods are therefore greater speed, lower
cost and reduced laboratory demands. However, FireProt is not easy to use: executing the
whole protocol requires considerable experience in bioinformatics and computational
chemistry. We are therefore developing a fully automated FireProt web server, which will
make this technique accessible to the broad scientific community. The knowledge gained
from protein stabilization was also applied to improve the thermodynamic stability of
human fibroblast growth factor 2 (FGF2) by 19°C.
This molecule is essential for stem cell cultivation, but because of its short half-life, the
cultivation medium has to be exchanged every day. Our stable variant remains active even
after twenty days, which significantly simplifies the cultivation of stem cells. The stable FGF2
molecule will find use in research and development, cosmetics and wound healing. The second
part of this Thesis focuses on different properties of the engineered haloalkane dehalogenase DhaA.
Emphasis is placed on the importance of its access tunnel, which has a significant impact on
several different properties. A new method for analysing protein hydration at the entry to
the access tunnel was proposed, based on fluorescence spectroscopy and the incorporation of
an unnatural amino acid. Hydration has a large effect on protein behaviour, but its
effect on enzymatic catalysis has so far often been neglected. We observed two different
structural bases of enantioselectivity in dehalogenases, one of them driven by
hydration. In silico methods for the analysis of water molecules would greatly aid the
description of enantioselectivity, enzyme kinetics, and possibly other interesting enzyme
properties.
REFERENCES
1. Eijsink, V. G. H. et al. Rational engineering of enzyme stability. J. Biotechnol. 113,105-120 (2004).
2. Stepankova, V., Vanacek, P., Damborsky, J. & Chaloupkova, R. Comparison of catalysis by haloalkane
dehalogenases in aqueous solutions of deep eutectic and organic solvents. Green Chem. 16, 2754-2761 (2014).
3. Lazaridis, T. & Karplus, M. Thermodynamics of protein folding: a microscopic view. Biophys. Chem. 100, 367-395
(2002).
4. Deller, M. C, Kong, L & Rupp, B. Protein stability: a crystallographer's perspective. Acta Crystallogr. Sect. FStruct.
Biol. Commun. 72, 72-95 (2016).
5. Anfinsen, C. B. Principles that govern the folding of protein chains. Science 181, 223-230 (1973).
6. Levinthal, C. Are There Pathways for Protein folding? Extrait du Journal de Chimie Physique 1968, 44
7. Rooman, M., Dehouck, Y., Kwasigroch, J. M., Biot, C. & Gilis, D. What is Paradoxical about Levinthal Paradox? J.
Biomol. Struct. Dyn. 20, 327-329 (2002).
8. Mukaiyama, A. & Takano, K. Slow Unfolding of Monomeric Proteins from Hyperthermophiles with Reversible
Unfolding. Int. J. Mol. Sei. 10,1369-1385 (2009).
9. Pauling, L, Corey, R. B. & Branson, H. R. The structure of proteins: Two hydrogen-bonded helical configurations of
the polypeptide chain. Proc. Natl. Acad. Sei. 37, 205-211 (1951).
10. Pauling, L. & Corey, R. B. Configurations of Polypeptide Chains With Favored Orientations Around Single Bonds.
Proc. Natl. Acad. Sei. U. S. A. 37, 729-740 (1951).
11. Kendrew, J. C Myoglobin and the Structure of Proteins. Science 139,1259-1266 (1963).
12. Pace, N. C, Scholtz, J. M. & Grimsley, G. R. Forces stabilizing proteins. FEBS Lett. 588, 2177-2184 (2014).
13. Huyghues-Despointes, B. M., Pace, C N., Englander, S. W. & Scholtz, J. M. Measuring the conformational stability
of a protein by hydrogen exchange. Methods Mol. Biol. Clifton NJ 168, 69-92 (2001).
14. Lesser, G. J. & Rose, G. D. Hydrophobicity of amino acid subgroups in proteins. Proteins Struct. Fund. Bioinforma.
8, 6-13 (1990).
15. Lazaridis, T., Archontis, G. & Karplus, M. in Advances in Protein Chemistry (ed. CB. Anfinsen, F. M. R., John T.Edsall
and David S.Eisenberg) 47, 231-306 (Academic Press, 1995).
16. Pace, C N. Evaluating contribution of hydrogen bonding and hydrophobic bonding to protein folding. Methods
Enzymol. 259, 538-554 (1995).
17. Pace, C.N. et al. Contribution of hydrophobic interactions to protein stability. J. Mol. Biol. 408, 514-528 (2011).
18. Baase, W. A., Liu, L., Tronrud, D. E. & Matthews, B. W. Lessons from the lysozyme of phage T4. Protein Sei. Puhl.
Protein Soc. 19, 631-641 (2010).
19. Doig, A. J. & Sternberg, M. J. Side-chain conformational entropy in protein folding. Protein Sei. Puhl. Protein Soc. 4,
2247-2251 (1995).
20. Sticke, D. F., Presta, L. G., Dill, K. A. & Rose, G. D. Hydrogen bonding in globular proteins. J. Mol. Biol. 226, 1143-
1159 (1992).
21. Bowie, J. U. Membrane Protein Folding: How important are hydrogen bonds? Curr. Opin. Struct. Biol. 21, 42-49
(2011).
22. Joh, N. H. et al. Modest stabilization by most hydrogen-bonded side-chain interactions in membrane proteins.
Nature 453, 1266-1270 (2008).
23. Takano, K., Scholtz, J. M., Sacchettini, J. C & Pace, C N. The Contribution of Polar Group Burial to Protein Stability
Is Strongly Context-dependent. J. Biol. Chem. 278, 31790-31795 (2003).
24. Mills, J. E. J. & Dean, P. M. Three-dimensional hydrogen-bond geometry and probability information from a crystal
survey. J. Comput. Aided Mol. Des. 10, 607-622 (1996).
25. Gao, J., Bosco, D. A., Powers, E. T. & Kelly, J. W. Localized thermodynamic coupling between hydrogen bonding
and microenvironment polarity substantially stabilizes proteins. Nat. Struct. Mol. Biol. 16, 684-690 (2009).
26. Cao, Z. & Bowie, J. U. An energetic scale for equilibrium H/D fractionation factors illuminates hydrogen bond free
energies in proteins. Protein Sci. Publ. Protein Soc. 23, 566-575 (2014).
27. Robinson, C. R. & Sauer, R. T. Striking stabilization of Arc repressor by an engineered disulfide bond. Biochemistry
(Mosc.) 39, 12494-12502 (2000).
28. Wijma, H. J. et al. Computationally designed libraries for rapid enzyme stabilization. Protein Eng. Des. Sci. PEDS 27,
49-58 (2014).
29. Dombkowski, A. A., Sultana, K. Z. & Craig, D. B. Protein disulfide engineering. FEBS Lett. 588, 206-212 (2014).
30. Grimsley, G. R. et al. Increasing protein stability by altering long-range coulombic interactions. Protein Sci. Publ.
Protein Soc. 8, 1843-1849 (1999).
31. Pace, C. N., Alston, R. W. & Shaw, K. L. Charge-charge interactions influence the denatured state ensemble and
contribute to protein stability. Protein Sci. Publ. Protein Soc. 9, 1395-1398 (2000).
32. Brady, G. P. & Sharp, K. A. Entropy in protein folding and in protein-protein interactions. Curr. Opin. Struct. Biol. 7,
215-221 (1997).
33. Tzeng, S.-R. & Kalodimos, C. G. Protein activity regulation by conformational entropy. Nature 488, 236-240 (2012).
34. Kasinath, V., Sharp, K. A. & Wand, A. J. Microscopic Insights into the NMR Relaxation-Based Protein Conformational
Entropy Meter. J. Am. Chem. Soc. 135, 15092-15100 (2013).
35. Wand, A. J. The dark energy of proteins comes to light: Conformational entropy and its role in protein function
revealed by NMR relaxation. Curr. Opin. Struct. Biol. 23, 75-81 (2013).
36. Baxa, M. C., Haddadian, E. J., Jumper, J. M., Freed, K. F. & Sosnick, T. R. Loss of conformational entropy in protein
folding calculated using realistic ensembles and its implications for NMR-based calculations. Proc. Natl. Acad. Sci.
U. S. A. 111, 15396-15401 (2014).
37. Thompson, J. B., Hansma, H. G., Hansma, P. K. & Plaxco, K. W. The Backbone Conformational Entropy of Protein
Folding: Experimental Measures from Atomic Force Microscopy. J. Mol. Biol. 322, 645-652 (2002).
38. Hu, X. & Kuhlman, B. Protein design simulations suggest that side-chain conformational entropy is not a strong
determinant of amino acid environmental preferences. Proteins Struct. Funct. Bioinforma. 62, 739-748 (2006).
39. D'Aquino, J. A. et al. The magnitude of the backbone conformational entropy change in protein folding. Proteins
25, 143-156 (1996).
40. Pace, C. N. et al. Conformational stability and thermodynamics of folding of ribonucleases Sa, Sa2 and Sa3. J. Mol.
Biol. 279, 271-286 (1998).
41. Martinez, A., Calvo, A. C., Teigen, K. & Pey, A. L. Rescuing proteins of low kinetic stability by chaperones and natural
ligands: phenylketonuria, a case study. Prog. Mol. Biol. Transl. Sci. 83, 89-134 (2008).
42. Lynch, S. M., Boswell, S. A. & Colon, W. Kinetic stability of Cu/Zn superoxide dismutase is dependent on its metal
ligands: implications for ALS. Biochemistry (Mosc.) 43, 16525-16531 (2004).
43. Hammarström, P., Wiseman, R. L., Powers, E. T. & Kelly, J. W. Prevention of transthyretin amyloid disease by
changing protein misfolding energetics. Science 299, 713-716 (2003).
44. Costas, M. et al. Between-Species Variation in the Kinetic Stability of TIM Proteins Linked to Solvation-Barrier Free
Energies. J. Mol. Biol. 385, 924-937 (2009).
45. Sanchez-Ruiz, J. M. Protein kinetic stability. Biophys. Chem. 148, 1-15 (2010).
46. Sohl, J. L., Jaswal, S. S. & Agard, D. A. Unfolded conformations of α-lytic protease are more stable than its native
state. Nature 395, 817-819 (1998).
47. Tur-Arlandis, G., Rodriguez-Larrea, D., Ibarra-Molero, B. & Sanchez-Ruiz, J. M. Proteolytic scanning calorimetry: a
novel methodology that probes the fundamental features of protein kinetic stability. Biophys. J. 98, L12-14 (2010).
48. Baker, D. & Agard, D. A. Kinetics versus Thermodynamics in Protein Folding. Biochemistry (Mosc.) 33, 7505-7509
(1994).
49. Xia, K. et al. Identifying the subproteome of kinetically stable proteins via diagonal 2D SDS/PAGE. Proc. Natl. Acad.
Sci. U. S. A. 104, 17329-17334 (2007).
50. Park, C., Zhou, S., Gilmore, J. & Marqusee, S. Energetics-based protein profiling on a proteomic scale: identification
of proteins resistant to proteolysis. J. Mol. Biol. 368, 1426-1437 (2007).
51. Khoury, G. A., Smadbeck, J., Kieslich, C. A. & Floudas, C. A. Protein folding and de novo protein design for
biotechnological applications. Trends Biotechnol. 32, 99-109 (2014).
52. Schmid, A., Hollmann, F., Park, J. B. & Bühler, B. The use of enzymes in the chemical industry in Europe. Curr. Opin.
Biotechnol. 13, 359-366 (2002).
53. Hartmann, M., Roeraade, J., Stoll, D., Templin, M. F. & Joos, T. O. Protein microarrays for diagnostic assays. Anal.
Bioanal. Chem. 393, 1407-1416 (2009).
54. Magliery, T. J. Protein stability: computation, sequence statistics, and new experimental methods. Curr. Opin.
Struct. Biol. 33, 161-168 (2015).
55. Kleina, L. G. & Miller, J. H. Genetic studies of the lac repressor. XIII. Extensive amino acid replacements generated
by the use of natural and synthetic nonsense suppressors. J. Mol. Biol. 212, 295-318 (1990).
56. Bergquist, P. L., Reeves, R. A. & Gibbs, M. D. Degenerate oligonucleotide gene shuffling (DOGS) and random drift
mutagenesis (RNDM): two complementary techniques for enzyme evolution. Biomol. Eng. 22, 63-72 (2005).
57. Rosic, N. N., Huang, W., Johnston, W. A., DeVoss, J. J. & Gillam, E. M. J. Extending the diversity of cytochrome P450
enzymes by DNA family shuffling. Gene 395, 40-48 (2007).
58. Sen, S., Dasu, V. V. & Mandal, B. Developments in Directed Evolution for Improving Enzyme Functions. Appl.
Biochem. Biotechnol. 143, 212-223 (2007).
59. Bershtein, S. & Tawfik, D. S. Advances in laboratory evolution of enzymes. Curr. Opin. Chem. Biol. 12, 151-158
(2008).
60. Gray, K. A. et al. Rapid Evolution of Reversible Denaturation and Elevated Melting Temperature in a Microbial
Haloalkane Dehalogenase. Adv. Synth. Catal. 343, 607-617 (2001).
61. Kretz, K. A. et al. Gene site saturation mutagenesis: a comprehensive mutagenesis approach. Methods Enzymol.
388, 3-11 (2004).
62. Sagar, D. M., Aoudjane, S., Gaudet, M., Aeppli, G. & Dalby, P. A. Optically induced thermal gradients for protein
characterization in nanolitre-scale samples in microfluidic devices. Sci. Rep. 3, 2130 (2013).
63. Yang, X. et al. A novel microfluidic system for the rapid analysis of protein thermal stability. Analyst 139, 2683-
2686 (2014).
64. Reetz, M. T., Carballeira, J. D. & Vogel, A. Iterative Saturation Mutagenesis on the Basis of B Factors as a Strategy
for Increasing Protein Thermostability. Angew. Chem. Int. Ed. 45, 7745-7751 (2006).
65. Zhang, J. et al. High-throughput screening of B factor saturation mutated Rhizomucor miehei lipase thermostability
based on synthetic reaction. Enzyme Microb. Technol. 50, 325-330 (2012).
66. Xie, Y. et al. Enhanced Enzyme Kinetic Stability by Increasing Rigidity within the Active Site. J. Biol. Chem. 289,
7994-8006 (2014).
67. Kim, H. S., Le, Q. A. T. & Kim, Y. H. Development of thermostable lipase B from Candida antarctica (CalB) through
in silico design employing B-factor and RosettaDesign. Enzyme Microb. Technol. 47, 1-5 (2010).
68. Le, Q. A. T., Joo, J. C., Yoo, Y. J. & Kim, Y. H. Development of thermostable Candida antarctica lipase B through
novel in silico design of disulfide bridge. Biotechnol. Bioeng. 109, 867-876 (2012).
69. Pavelka, A., Chovancova, E. & Damborsky, J. HotSpot Wizard: a web server for identification of hot spots in protein
engineering. Nucleic Acids Res. 37, W376-W383 (2009).
70. Bendl, J. et al. HotSpot Wizard 2.0: automated design of site-specific mutations and smart libraries in protein
engineering. Nucleic Acids Res. 44, W479-487 (2016).
71. Bosshart, A., Panke, S. & Bechtold, M. Systematic optimization of interface interactions increases the
thermostability of a multimeric enzyme. Angew. Chem. Int. Ed Engl. 52, 9673-9676 (2013).
72. Maugini, E., Tronelli, D., Bossa, F. & Pascarella, S. Structural adaptation of the subunit interface of oligomeric
thermophilic and hyperthermophilic enzymes. Comput. Biol. Chem. 33, 137-148 (2009).
73. Steipe, B., Schiller, B., Plückthun, A. & Steinbacher, S. Sequence statistics reliably predict stabilizing mutations in a
protein domain. J. Mol. Biol. 240, 188-192 (1994).
74. Wirtz, P. & Steipe, B. Intrabody construction and expression III: engineering hyperstable V(H) domains. Protein Sci.
Publ. Protein Soc. 8, 2245-2250 (1999).
75. Sullivan, B. J. et al. Stabilizing proteins from sequence statistics: the interplay of conservation and correlation in
triosephosphate isomerase stability. J. Mol. Biol. 420, 384-399 (2012).
76. Ashkenazy, H. et al. FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucleic Acids
Res. 40, W580-W584 (2012).
77. Swofford, D. L. & Maddison, W. P. Reconstructing ancestral character states under Wagner parsimony. Math.
Biosci. 87, 199-229 (1987).
78. Pagel, M. The Maximum Likelihood Approach to Reconstructing Ancestral Character States of Discrete Characters
on Phylogenies. Syst. Biol. 48, 612-622 (1999).
79. Krishnan, N. M., Seligmann, H., Stewart, C.-B., De Koning, A. P. J. & Pollock, D. D. Ancestral sequence reconstruction
in primate mitochondrial DNA: compositional bias and effect on functional inference. Mol. Biol. Evol. 21, 1871-
1883 (2004).
80. Watanabe, K., Ohkuri, T., Yokobori, S. & Yamagishi, A. Designing Thermostable Proteins: Ancestral Mutants of 3-Isopropylmalate
Dehydrogenase Designed by using a Phylogenetic Tree. J. Mol. Biol. 355, 664-674 (2006).
81. Risso, V. A., Gavira, J. A., Gaucher, E. A. & Sanchez-Ruiz, J. M. Phenotypic comparisons of consensus variants versus
laboratory resurrections of Precambrian proteins. Proteins Struct. Funct. Bioinforma. 82, 887-896 (2014).
82. Cheng, J., Randall, A. & Baldi, P. Prediction of protein stability changes for single-site mutations using support
vector machines. Proteins 62, 1125-1132 (2006).
83. Teng, S., Srivastava, A. K. & Wang, L. Sequence feature-based prediction of protein stability changes upon amino
acid substitutions. BMC Genomics 11, S5 (2010).
84. Yin, X. et al. Contribution of Disulfide Bridges to the Thermostability of a Type A Feruloyl Esterase from Aspergillus
usamii. PLoS ONE 10, (2015).
85. Borgo, B. & Havranek, J. J. Automated selection of stabilizing mutations in designed and natural proteins. Proc.
Natl. Acad. Sci. U. S. A. 109, 1494-1499 (2012).
86. Lawrence, M. S., Phillips, K. J. & Liu, D. R. Supercharging Proteins Can Impart Unusual Resilience. J. Am. Chem. Soc.
129, 10110 (2007).
87. Dunbrack, R. L. Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 12, 431-440 (2002).
88. Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes
in protein structure and stability. Proteins 79, 830-838 (2011).
89. Yin, S., Ding, F. & Dokholyan, N. V. Eris: an automated estimator of protein stability. Nat. Methods 4, 466-467
(2007).
90. Pokala, N. & Handel, T. M. Energy Functions for Protein Design: Adjustment with Protein-Protein Complex
Affinities, Models for the Unfolded State, and Negative Design of Solubility and Specificity. J. Mol. Biol. 347, 203-
227 (2005).
91. Benedix, A., Becker, C. M., de Groot, B. L., Caflisch, A. & Böckmann, R. A. Predicting free energy changes using
structural ensembles. Nat. Methods 6, 3-4 (2009).
92. Seeliger, D. & de Groot, B. L. Protein Thermostability Calculations Using Alchemical Free Energy Simulations.
Biophys. J. 98, 2309-2316 (2010).
93. Worth, C. L., Preissner, R. & Blundell, T. L. SDM—a server for predicting effects of mutations on protein stability
and malfunction. Nucleic Acids Res. 39, W215-W222 (2011).
94. Guerois, R., Nielsen, J. E. & Serrano, L. Predicting changes in the stability of proteins and protein complexes: a
study of more than 1000 mutations. J. Mol. Biol. 320, 369-387 (2002).
95. Dehouck, Y., Kwasigroch, J. M., Gilis, D. & Rooman, M. PoPMuSiC 2.1: a web server for the estimation of protein
stability changes upon mutation and sequence optimality. BMC Bioinformatics 12, 151 (2011).
96. Johnston, M. A., Søndergaard, C. R. & Nielsen, J. E. Integrated prediction of the effect of mutations on multiple
protein characteristics. Proteins 79, 165-178 (2011).
97. Capriotti, E., Fariselli, P., Calabrese, R. & Casadio, R. Predicting protein stability changes from sequences using
support vector machines. Bioinformatics 21, ii54-ii58 (2005).
98. Capriotti, E., Fariselli, P. & Casadio, R. A neural-network-based method for predicting protein stability changes
upon single point mutations. Bioinformatics 20, i63-i68 (2004).
99. Tian, J., Wu, N., Chu, X. & Fan, Y. Predicting changes in protein thermostability brought about by single- or multi-site
mutations. BMC Bioinformatics 11, 370 (2010).
100. Masso, M. & Vaisman, I. I. AUTO-MUTE: web-based tools for predicting stability changes in proteins due to single
amino acid replacements. Protein Eng. Des. Sci. 23, 683-687 (2010).
101. Laimer, J., Hofer, H., Fritz, M., Wegenkittl, S. & Lackner, P. MAESTRO - multi agent stability prediction upon point
mutations. BMC Bioinformatics 16, 116 (2015).
102. Potapov, V., Cohen, M. & Schreiber, G. Assessing computational methods for predicting protein stability upon
mutation: good on average but not in the details. Protein Eng. Des. Sci. 22, 553-560 (2009).
103. Kumar, M. D. S. et al. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid
interactions. Nucleic Acids Res. 34, D204-206 (2006).
104. Khan, S. & Vihinen, M. Performance of protein stability predictors. Hum. Mutat. 31, 675-684 (2010).
105. Thiltgen, G. & Goldstein, R. A. Assessing Predictors of Changes in Protein Stability upon Mutation Using Self-Consistency.
PLOS ONE 7, e46084 (2012).
106. Goldenzweig, A. et al. Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression
and Stability. Mol. Cell 63, 337-346 (2016).
107. Fleishman, S. J. et al. RosettaScripts: A Scripting Language Interface to the Rosetta Macromolecular Modeling Suite.
PLOS ONE 6, e20161 (2011).
108. Jorgensen, W. L. The many roles of computation in drug discovery. Science 303, 1813-1818 (2004).
109. Moitessier, N., Englebienne, P., Lee, D., Lawandi, J. & Corbeil, C. R. Towards the development of universal, fast and
highly accurate docking/scoring methods: a long way to go. Br. J. Pharmacol. 153, S7-S26 (2008).
110. Daniel, L., Buryska, T., Prokop, Z., Damborsky, J. & Brezovsky, J. Mechanism-based discovery of novel substrates of
haloalkane dehalogenases using in silico screening. J. Chem. Inf. Model. 55, 54-62 (2015).
111. Meng, X.-Y., Zhang, H.-X., Mezei, M. & Cui, M. Molecular Docking: A powerful approach for structure-based drug
discovery. Curr. Comput. Aided Drug Des. 7, 146-157 (2011).
112. Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R. & Ferrin, T. E. A geometric approach to macromolecule-ligand
interactions. J. Mol. Biol. 161, 269-288 (1982).
113. Perola, E., Walters, W. P. & Charifson, P. S. A detailed comparison of current docking and scoring methods on
systems of pharmaceutical relevance. Proteins 56, 235-249 (2004).
114. Morris, G. M. et al. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J.
Comput. Chem. 30, 2785-2791 (2009).
115. Sherman, W., Day, T., Jacobson, M. P., Friesner, R. A. & Farid, R. Novel procedure for modeling ligand/receptor
induced fit effects. J. Med. Chem. 49, 534-553 (2006).
116. Zhao, Y. & Sanner, M. F. FLIPDock: docking flexible ligands into flexible receptors. Proteins 68, 726-737 (2007).
117. Trott, O. & Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function,
efficient optimization, and multithreading. J. Comput. Chem. 31, 455-461 (2010).
118. Wong, C. F. Flexible receptor docking for drug discovery. Expert Opin. Drug Discov. 10, 1189-1200 (2015).
119. Burkhard, P., Taylor, P. & Walkinshaw, M. D. An example of a protein ligand found by database mining: description
of the docking method and its verification by a 2.3 Å X-ray structure of a thrombin-ligand complex. J. Mol. Biol.
277, 449-466 (1998).
120. Miller, M. D., Kearsley, S. K., Underwood, D. J. & Sheridan, R. P. FLOG: a system to select 'quasi-flexible' ligands
complementary to a receptor of known three-dimensional structure. J. Comput. Aided Mol. Des. 8, 153-174 (1994).
121. Ewing, T. J., Makino, S., Skillman, A. G. & Kuntz, I. D. DOCK 4.0: search strategies for automated molecular docking
of flexible molecule databases. J. Comput. Aided Mol. Des. 15, 411-428 (2001).
122. Zavodszky, M. I. & Kuhn, L. A. Side-chain flexibility in protein-ligand binding: the minimal rotation hypothesis.
Protein Sci. Publ. Protein Soc. 14, 1104-1114 (2005).
123. Welch, W., Ruppert, J. & Jain, A. N. Hammerhead: fast, fully automated docking of flexible ligands to protein
binding sites. Chem. Biol. 3, 449-462 (1996).
124. Goodsell, D. S. & Olson, A. J. Automated docking of substrates to proteins by simulated annealing. Proteins Struct.
Funct. Bioinforma. 8, 195-202 (1990).
125. McMartin, C. & Bohacek, R. S. QXP: powerful, rapid computer algorithms for structure-based drug design. J.
Comput. Aided Mol. Des. 11, 333-344 (1997).
126. Abagyan, R., Totrov, M. & Kuznetsov, D. ICM—A new method for protein modeling and design: Applications to
docking and structure prediction from the distorted native conformation. J. Comput. Chem. 15, 488-506 (1994).
127. Morris, G. M. et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy
function. J. Comput. Chem. 19, 1639-1662 (1998).
128. Taylor, J. S. & Burnett, R. M. DARWIN: a program for docking flexible molecules. Proteins 41, 173-191 (2000).
129. Verdonk, M. L., Cole, J. C., Hartshorn, M. J., Murray, C. W. & Taylor, R. D. Improved protein-ligand docking using
GOLD. Proteins 52, 609-623 (2003).
130. Kitchen, D. B., Decornez, H., Furr, J. R. & Bajorath, J. Docking and scoring in virtual screening for drug discovery:
methods and applications. Nat. Rev. Drug Discov. 3, 935-949 (2004).
131. Liu, J. & Wang, R. Classification of Current Scoring Functions. J. Chem. Inf. Model. 55, 475-482 (2015).
132. Huang, S.-Y. & Zou, X. Inclusion of solvation and entropy in the knowledge-based scoring function for protein-ligand
interactions. J. Chem. Inf. Model. 50, 262-273 (2010).
133. Pikkemaat, M. G., Linssen, A. B. M., Berendsen, H. J. C. & Janssen, D. B. Molecular dynamics simulations as a tool
for improving protein stability. Protein Eng. 15, 185-192 (2002).
134. Doss, C. G. P. et al. Screening of mutations affecting protein stability and dynamics of FGFR1—A simulation analysis.
Appl. Transl. Genomics 1, 37-43 (2012).
135. Chen, Z., Fu, Y., Xu, W. & Li, M. Molecular Dynamics Simulation of Barnase: Contribution of Noncovalent
Intramolecular Interaction to Thermostability. Math. Probl. Eng. 2013, e504183 (2013).
136. Bernardi, R. C., Cann, I. & Schulten, K. Molecular dynamics study of enhanced Man5B enzymatic activity.
Biotechnol. Biofuels 7, 83 (2014).
137. Osuna, S., Jimenez-Oses, G., Noey, E. L. & Houk, K. N. Molecular Dynamics Explorations of Active Site Structure in
Designed and Evolved Enzymes. Acc. Chem. Res. 48, 1080-1089 (2015).
138. Vettoretti, G. et al. Molecular Dynamics Simulations Reveal the Mechanisms of Allosteric Activation of Hsp90 by
Designed Ligands. Sci. Rep. 6, 23830 (2016).
139. Lindorff-Larsen, K., Piana, S., Dror, R. O. & Shaw, D. E. How fast-folding proteins fold. Science 334, 517-520 (2011).
140. Piana, S., Klepeis, J. L. & Shaw, D. E. Assessing the accuracy of physical models used in protein-folding simulations:
quantitative evidence from long molecular dynamics simulations. Curr. Opin. Struct. Biol. 24, 98-105 (2014).
141. Miao, Y., Feixas, F., Eun, C. & McCammon, J. A. Accelerated molecular dynamics simulations of protein folding. J.
Comput. Chem. 36, 1536-1549 (2015).
142. Seshasayee, A. S. N. High-Temperature unfolding of a trp-Cage mini-protein: a molecular dynamics simulation
study. Theor. Biol. Med. Model. 2, 7 (2005).
143. Lindorff-Larsen, K., Trbovic, N., Maragakis, P., Piana, S. & Shaw, D. E. Structure and Dynamics of an Unfolded
Protein Examined by Molecular Dynamics Simulation. J. Am. Chem. Soc. 134, 3787-3791 (2012).
144. Vogel, M. Temperature-Dependent Mechanisms for the Dynamics of Protein-Hydration Waters: A Molecular
Dynamics Simulation Study. J. Phys. Chem. B 113, 9386-9392 (2009).
145. Fogarty, A. C. & Laage, D. Water Dynamics in Protein Hydration Shells: The Molecular Origins of the Dynamical
Perturbation. J. Phys. Chem. B 118, 7715-7729 (2014).
146. Chen, Q., Luan, Z.-J., Cheng, X. & Xu, J.-H. Molecular Dynamics Investigation of the Substrate Binding Mechanism
in Carboxylesterase. Biochemistry (Mosc.) 54, 1841-1848 (2015).
147. Manjunath, K., Jeyakanthan, J. & Sekar, K. Catalytic pathway, substrate binding and stability in SAICAR synthetase:
A structure and molecular dynamics study. J. Struct. Biol. 191, 22-31 (2015).
148. Lawson, J. D., Pate, E., Rayment, I. & Yount, R. G. Molecular Dynamics Analysis of Structural Factors Influencing
Back Door Pi Release in Myosin. Biophys. J. 86, 3794-3803 (2004).
149. Choutko, A. & van Gunsteren, W. F. Molecular dynamics simulation of the last step of a catalytic cycle: Product
release from the active site of the enzyme chorismate mutase from Mycobacterium tuberculosis. Protein Sci. Publ.
Protein Soc. 21, 1672-1681 (2012).
150. Haeffner, F., Norin, T. & Hult, K. Molecular Modeling of the Enantioselectivity in Lipase-Catalyzed
Transesterification Reactions. Biophys. J. 74, 1251-1262 (1998).
151. Wijma, H. J., Marrink, S. J. & Janssen, D. B. Computationally efficient and accurate enantioselectivity modeling by
clusters of molecular dynamics simulations. J. Chem. Inf. Model. 54, 2079-2092 (2014).
152. Leach, A. R. Molecular Modelling: Principles and Applications. (Pearson Education, 2001).
153. Cramer, C. J. Essentials of Computational Chemistry: Theories and Models. (Wiley, 2002).
154. Piana, S. et al. Evaluating the Effects of Cutoffs and Treatment of Long-range Electrostatics in Protein Folding
Simulations. PLoS ONE 7, (2012).
155. Darden, T., York, D. & Pedersen, L. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J.
Chem. Phys. 98, 10089-10092 (1993).
156. Wu, X. & Brooks, B. R. Isotropic periodic sum: A method for the calculation of long-range interactions. J. Chem.
Phys. 122, 044107 (2005).
157. Lagüe, P., Pastor, R. W. & Brooks, B. R. Pressure-Based Long-Range Correction for Lennard-Jones Interactions in
Molecular Dynamics Simulations: Application to Alkanes and Interfaces. J. Phys. Chem. B 108, 363-368 (2004).
158. Hockney, R. W. Potential Calculation and Some Applications. Methods Comput. Phys. 9, 135-211 (1970).
159. Verlet, L. Computer 'Experiments' on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules.
Phys. Rev. 159, 98-103 (1967).
160. Beeman, D. Some multistep methods for use in molecular dynamics calculations. J. Comput. Phys. 20, 130-139
(1976).
161. Senn, H. M. & Thiel, W. QM/MM Methods for Biomolecular Systems. Angew. Chem. Int. Ed. 48, 1198-1229 (2009).
162. van Duin, A. C. T., Dasgupta, S., Lorant, F. & Goddard, W. A. ReaxFF: A Reactive Force Field for Hydrocarbons. J.
Phys. Chem. A 105, 9396-9409 (2001).
163. Warshel, A. & Levitt, M. Theoretical studies of enzymic reactions: Dielectric, electrostatic and steric stabilization
of the carbonium ion in the reaction of lysozyme. J. Mol. Biol. 103, 227-249 (1976).
164. Groenhof, G. Introduction to QM/MM simulations. Methods Mol. Biol. Clifton NJ 924, 43-66 (2013).
165. Damborský, J. et al. Structure-specificity relationships for haloalkane dehalogenases. Environ. Toxicol. Chem. SETAC
20, 2681-2689 (2001).
166. Prokop, Z. et al. Enantioselectivity of haloalkane dehalogenases and its modulation by surface loop engineering.
Angew. Chem. Int. Ed Engl. 49, 6111-6115 (2010).
167. Westerbeek, A. et al. Kinetic Resolution of α-Bromoamides: Experimental and Theoretical Investigation of Highly
Enantioselective Reactions Catalyzed by Haloalkane Dehalogenases. Adv. Synth. Catal. 353, 931-944 (2011).
168. Dvorak, P., Bidmanova, S., Damborsky, J. & Prokop, Z. Immobilized synthetic pathway for biodegradation of toxic
recalcitrant pollutant 1,2,3-trichloropropane. Environ. Sci. Technol. 48, 6859-6866 (2014).
169. Prokop, Z., Oplustil, F., DeFrank, J. & Damborský, J. Enzymes fight chemical weapons. Biotechnol. J. 1, 1370-80
(2006).
170. Bidmanova, S., Chaloupková, R., Damborsky, J. & Prokop, Z. Development of an enzymatic fiber-optic biosensor
for detection of halogenated hydrocarbons. Anal. Bioanal. Chem. 398, 1891-1898 (2010).
171. Mazzucchelli, S. et al. Orientation-controlled conjugation of haloalkane dehalogenase fused homing peptides to
multifunctional nanoparticles for the specific recognition of cancer cells. Angew. Chem. Int. Ed Engl. 52, 3121-3125
(2013).
172. Kulakova, A. N., Larkin, M. J. & Kulakov, L. A. The plasmid-located haloalkane dehalogenase gene from
Rhodococcus rhodochrous NCIMB 13064. Microbiol. Read. Engl. 143 (Pt 1), 109-115 (1997).
173. Ollis, D. L. et al. The alpha/beta hydrolase fold. Protein Eng. 5, 197-211 (1992).
174. Chovancová, E., Kosinski, J., Bujnicki, J. M. & Damborský, J. Phylogenetic analysis of haloalkane dehalogenases.
Proteins Struct. Fund. Bioinforma. 67, 305-316 (2007).
175. Verschueren, K. H., Seljée, F., Rozeboom, H. J., Kalk, K. H. & Dijkstra, B. W. Crystallographic analysis of the catalytic
mechanism of haloalkane dehalogenase. Nature 363, 693-698 (1993).
176. Boháč, M. et al. Halide-Stabilizing Residues of Haloalkane Dehalogenases Studied by Quantum Mechanic
Calculations and Site-Directed Mutagenesis. Biochemistry (Mosc.) 41, 14272-14280 (2002).
177. Prokop, Z. et al. Catalytic Mechanism of the Haloalkane Dehalogenase LinB from Sphingomonas paucimobilis UT26.
J. Biol. Chem. 278, 45094-45100 (2003).
178. Janssen, D. B. Evolving haloalkane dehalogenases. Curr. Opin. Chem. Biol. 8, 150-159 (2004).
179. Imai, R. et al. Dehydrochlorination of γ-Hexachlorocyclohexane (γ-BHC) by γ-BHC-Assimilating Pseudomonas
paucimobilis. Agric. Biol. Chem. 53, 2015-2017 (1989).
180. Nagata, Y. et al. Purification and characterization of γ-hexachlorocyclohexane (γ-HCH) dehydrochlorinase (LinA)
from Pseudomonas paucimobilis. Biosci. Biotechnol. Biochem. 57, 1582-1583 (1993).
181. Nagata, Y., Mori, K., Takagi, M., Murzin, A. G. & Damborský, J. Identification of protein fold and catalytic residues
of gamma-hexachlorocyclohexane dehydrochlorinase LinA. Proteins 45, 471-477 (2001).
182. Álvarez-Pedrerol, M. et al. Thyroid disruption at birth due to prenatal exposure to β-hexachlorocyclohexane.
Environ. Int. 34, 737-740 (2008).
183. Nagata, Y., Endo, R., Ito, M., Ohtsubo, Y. & Tsuda, M. Aerobic degradation of lindane (gamma-hexachlorocyclohexane)
in bacteria and its biochemical and molecular basis. Appl. Microbiol. Biotechnol. 76, 741-
752 (2007).
184. Okai, M. et al. Crystal Structure of γ-Hexachlorocyclohexane Dehydrochlorinase LinA from Sphingobium japonicum
UT26. J. Mol. Biol. 403, 260-269 (2010).
185. Chen, J. M., Xu, S. L., Wawrzak, Z., Basarab, G. S. & Jordan, D. B. Structure-Based Design of Potent Inhibitors of
Scytalone Dehydratase: Displacement of a Water Molecule from the Active Site. Biochemistry (Mosc.) 37, 17735-
17744 (1998).
186. Macwan, A. S. et al. Crystal Structure of the Hexachlorocyclohexane Dehydrochlorinase (LinA-Type2): Mutational
Analysis, Thermostability and Enantioselectivity. PLoS ONE 7, (2012).
187. Trantirek, L. et al. Reaction mechanism and stereochemistry of gamma-hexachlorocyclohexane dehydrochlorinase
LinA. J. Biol. Chem. 276, 7734-7740 (2001).
188. Ornitz, D. M. & Itoh, N. Fibroblast growth factors. Genome Biol. 2, reviews3005.1-reviews3005.12 (2001).
189. Itoh, N. & Ornitz, D. M. Evolution of the Fgf and Fgfr gene families. Trends Genet. 20, 563-569 (2004).
190. Beenken, A. & Mohammadi, M. The FGF family: biology, pathophysiology and therapy. Nat. Rev. Drug Discov. 8,
235-253 (2009).
191. Kim, H. S. Assignment of the human basic fibroblast growth factor gene FGF2 to chromosome 4 band q26 by
radiation hybrid mapping. Cytogenet. Cell Genet. 83, 73 (1998).
192. House, S. L. et al. Cardiac-Specific Overexpression of Fibroblast Growth Factor-2 Protects Against Myocardial
Dysfunction and Infarction in a Murine Model of Low-Flow Ischemia. Circulation 108, 3140-3148 (2003).
193. Aviles, R. J., Annex, B. H. & Lederman, R. J. Testing clinical therapeutic angiogenesis using basic fibroblast growth
factor (FGF-2). Br. J. Pharmacol. 140, 637-646 (2003).
194. Kitamura, M. et al. Periodontal Tissue Regeneration Using Fibroblast Growth Factor-2: Randomized Controlled
Phase II Clinical Trial. PLOS ONE 3, e2611 (2008).
195. Turner, C. A., Gula, E. L., Taylor, L. P., Watson, S. J. & Akil, H. Antidepressant-like effects of intracerebroventricular
FGF2 in rats. Brain Res. 1224, 63-68 (2008).
196. Lotz, S. et al. Sustained Levels of FGF2 Maintain Undifferentiated Stem Cell Cultures with Biweekly Feeding. PLoS
ONE 8, (2013).
197. Florkiewicz, R. Z. & Sommer, A. Human basic fibroblast growth factor gene encodes four polypeptides: three
initiate translation from non-AUG codons. Proc. Natl. Acad. Sci. 86, 3978-3981 (1989).
198. Zhang, J. D., Cousens, L. S., Barr, P. J. & Sprang, S. R. Three-dimensional structure of human basic fibroblast growth
factor, a structural homolog of interleukin 1 beta. Proc. Natl. Acad. Sci. U. S. A. 88, 3446-3450 (1991).
199. Baird, A., Schubert, D., Ling, N. & Guillemin, R. Receptor- and heparin-binding domains of basic fibroblast growth
factor. Proc. Natl. Acad. Sci. U. S. A. 85, 2324-2328 (1988).
200. Bikfalvi, A., Klein, S., Pintucci, G. & Rifkin, D. B. Biological roles of fibroblast growth factor-2. Endocr. Rev. 18, 26-
45 (1997).
201. Polizzi, K. M., Bommarius, A. S., Broering, J. M. & Chaparro-Riggers, J. F. Stability of biocatalysts. Curr. Opin. Chem.
Biol. 11, 220-225 (2007).
202. Ferdjani, S. et al. Correlation between thermostability and stability of glycosidases in ionic liquid. Biotechnol. Lett.
33, 1215-1219 (2011).
203. Gao, D. et al. Thermostable variants of cocaine esterase for long-time protection against cocaine toxicity. Mol.
Pharmacol. 75, 318-323 (2009).
204. Wijma, H. J., Floor, R. J. & Janssen, D. B. Structure- and sequence-analysis inspired engineering of proteins for
enhanced thermostability. Curr. Opin. Struct. Biol. 23, 588-594 (2013).
205. Bommarius, A. S. & Paye, M. F. Stabilizing biocatalysts. Chem. Soc. Rev. 42, 6534-6565 (2013).
206. Bloom, J. D., Labthavikul, S. T., Otey, C. R. & Arnold, F. H. Protein stability promotes evolvability. Proc. Natl. Acad.
Sci. U. S. A. 103, 5869-5874 (2006).
207. Gumulya, Y. & Reetz, M. T. Enhancing the thermal robustness of an enzyme by directed evolution: least favorable
starting points and inferior mutants can map superior evolutionary pathways. Chembiochem Eur. J. Chem. Biol. 12,
2502-2510 (2011).
208. Seitz, T. et al. Enhancing the stability and solubility of the glucocorticoid receptor ligand-binding domain by high-throughput
library screening. J. Mol. Biol. 403, 562-577 (2010).
209. Damborsky, J. & Brezovsky, J. Computational tools for designing and engineering biocatalysts. Curr. Opin. Chem.
Biol. 13, 26-34 (2009).
210. Koudelakova, T. et al. Engineering enzyme stability and resistance to an organic cosolvent by modification of
residues in the access tunnel. Angew. Chem. Int. Ed Engl. 52, 1959-1963 (2013).
211. Wijma, H. J. et al. Computationally designed libraries for rapid enzyme stabilization. Protein Eng. Des. Sci. 27, 49-
58 (2014).
212. Komor, R. S., Romero, P. A., Xie, C. B. & Arnold, F. H. Highly thermostable fungal cellobiohydrolase I (Cel7A)
engineered using predictive methods. Protein Eng. Des. Sci. PEDS 25, 827-833 (2012).
213. Reetz, M. T., Soni, P., Acevedo, J. P. & Sanchis, J. Creation of an amino acid network of structurally coupled residues
in the directed evolution of a thermostable enzyme. Angew. Chem. Int. Ed Engl. 48, 8268-8272 (2009).
214. Khatun, J., Khare, S. D. & Dokholyan, N. V. Can contact potentials reliably predict stability of proteins? J. Mol. Biol.
336, 1223-1238 (2004).
215. Parthiban, V., Gromiha, M. M. & Schomburg, D. CUPSAT: prediction of protein stability upon point mutations.
Nucleic Acids Res. 34, W239-242 (2006).
216. Pupko, T., Bell, R. E., Mayrose, I., Glaser, F. & Ben-Tal, N. Rate4Site: an algorithmic tool for the identification of
functional regions in proteins by surface mapping of evolutionary determinants within their homologues.
Bioinforma. Oxf. Engl. 18 Suppl 1, S71-77 (2002).
217. Landau, M. et al. ConSurf 2005: the projection of evolutionary conservation scores of residues on protein
structures. Nucleic Acids Res. 33, W299-302 (2005).
218. Ashkenazy, H., Erez, E., Martz, E., Pupko, T. & Ben-Tal, N. ConSurf 2010: calculating evolutionary conservation in
sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 38, W529-533 (2010).
219. Lehmann, M. et al. The consensus concept for thermostability engineering of proteins: further proof of concept.
Protein Eng. 15, 403-411 (2002).
220. Koudelakova, T. et al. Engineering enzyme stability and resistance to an organic cosolvent by modification of
residues in the access tunnel. Angew. Chem. Int. Ed Engl. 52, 1959-1963 (2013).
221. Gray, K. A. et al. Rapid Evolution of Reversible Denaturation and Elevated Melting Temperature in a Microbial
Haloalkane Dehalogenase. Adv. Synth. Catal. 343, 607-617 (2001).
222. Kuipers, R. K. P. et al. Correlated mutation analyses on super-family alignments reveal functionally important
residues. Proteins 76, 608-616 (2009).
223. Koudelakova, T. et al. Substrate specificity of haloalkane dehalogenases. Biochem. J. 435, 345-54 (2011).
224. Diaz, J. E. et al. Computational design and selections for an engineered, thermostable terpene synthase. Protein
Sci. Publ. Protein Soc. 20, 1597-1606 (2011).
225. Floor, R. J. et al. Computational library design for increasing haloalkane dehalogenase stability. Chembiochem Eur.
J. Chem. Biol. 15, 1660-1672 (2014).
226. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic
Acids Res. 25, 3389-3402 (1997).
227. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res.
40, D13-D25 (2012).
228. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide
sequences. Bioinforma. Oxf. Engl. 22, 1658-1659 (2006).
229. Frickey, T. & Lupas, A. CLANS: a Java application for visualizing protein families based on pairwise similarity.
Bioinforma. Oxf. Engl. 20, 3702-3704 (2004).
230. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC
Bioinformatics 5, 113 (2004).
231. Mika, S. & Rost, B. UniqueProt: Creating representative protein sequence sets. Nucleic Acids Res. 31, 3789-3791
(2003).
232. Whelan, S. & Goldman, N. A general empirical model of protein evolution derived from multiple protein families
using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691-699 (2001).
233. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. SCOP: a structural classification of proteins database for the
investigation of sequences and structures. J. Mol. Biol. 247, 536-540 (1995).
234. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235-42 (2000).
235. The PyMOL molecular graphics system, version 1.7, Schrödinger, LLC.
236. Costantini, S., Colonna, G. & Facchiano, A. M. ESBRI: A web server for evaluating salt bridges in proteins.
Bioinformation 3, 137-138 (2008).
237. Tina, K. G., Bhadra, R. & Srinivasan, N. PIC: Protein Interactions Calculator. Nucleic Acids Res. 35, W473-476 (2007).
238. Pavlova, M. et al. Redesigning dehalogenase access tunnels as a strategy for degrading an anthropogenic substrate.
Nat. Chem. Biol. 5, 727-33 (2009).
239. Okai, M. et al. Crystallization and preliminary X-ray analysis of gamma-hexachlorocyclohexane dehydrochlorinase
LinA from Sphingobium japonicum UT26. Acta Crystallograph. Sect. F Struct. Biol. Cryst. Commun. 65, 822-824
(2009).
240. Stepankova, V., Damborsky, J. & Chaloupkova, R. Organic co-solvents affect activity, stability and enantioselectivity
of haloalkane dehalogenases. Biotechnol. J. 8, 719-729 (2013).
241. Iwasaki, I., Utsumi, S. & Ozawa, T. New Colorimetric Determination of Chloride using Mercuric Thiocyanate and Ferric Ion.
Bull. Chem. Soc. Jpn. 25, 226 (1952).
242. Kelly, S. M., Jess, T. J. & Price, N. C. How to study proteins by circular dichroism. Biochim. Biophys. Acta 1751, 119-
139 (2005).
243. Ladbrooke, B. D., Williams, R. M. & Chapman, D. Studies on lecithin-cholesterol-water interactions by differential
scanning calorimetry and X-ray diffraction. Biochim. Biophys. Acta 150, 333-340 (1968).
244. Palackal, N. et al. An evolutionary route to xylanase process fitness. Protein Sci. Publ. Protein Soc. 13, 494-503
(2004).
245. Johannes, T. W., Woodyer, R. D. & Zhao, H. Directed evolution of a thermostable phosphite dehydrogenase for
NAD(P)H regeneration. Appl. Environ. Microbiol. 71, 5728-5734 (2005).
246. Seo, J. H., Yu, J. H., Suh, H., Kim, M.-S. & Cho, S.-R. Fibroblast growth factor-2 induced by enriched environment
enhances angiogenesis and motor function in chronic hypoxic-ischemic brain injury. PloS One 8, e74405 (2013).
247. Ilkow, C. S. et al. Reciprocal cellular cross-talk within the tumor microenvironment promotes oncolytic virus
activity. Nat. Med. 21, 530-536 (2015).
248. Turner, C. A., Clinton, S. M., Thompson, R. C., Watson, S. J. & Akil, H. Fibroblast growth factor-2 (FGF2)
augmentation early in life alters hippocampal development and rescues the anxiety phenotype in vulnerable
animals. Proc. Natl. Acad. Sci. U. S. A. 108, 8021-8025 (2011).
249. Ortega, S., Ittmann, M., Tsang, S. H., Ehrlich, M. & Basilico, C. Neuronal defects and delayed wound healing in mice
lacking fibroblast growth factor 2. Proc. Natl. Acad. Sci. U. S. A. 95, 5672-5677 (1998).
250. Sun, B. K., Siprashvili, Z. & Khavari, P. A. Advances in skin grafting and treatment of cutaneous wounds. Science
346, 941-945 (2014).
251. Song, X. et al. Growth Factor FGF2 Cooperates with Interleukin-17 to Repair Intestinal Epithelial Damage. Immunity
43, 488-501 (2015).
252. Levenstein, M. E. et al. Basic fibroblast growth factor support of human embryonic stem cell self-renewal. Stem
Cells Dayt. Ohio 24, 568-574 (2006).
253. Chen, G., Gulbranson, D. R., Yu, P., Hou, Z. & Thomson, J. A. Thermal stability of fibroblast growth factor protein is
a determinant factor in regulating self-renewal, differentiation, and reprogramming in human pluripotent stem
cells. Stem Cells Dayt. Ohio 30, 623-630 (2012).
254. Buchtova, M. et al. Instability restricts signaling of multiple fibroblast growth factors. Cell. Mol. Life Sci. CMLS 72,
2445-2459 (2015).
255. Furue, M. K. et al. Heparin promotes the growth of human embryonic stem cells in a defined serum-free medium.
Proc. Natl. Acad. Sci. U. S. A. 105, 13409-13414 (2008).
256. Nguyen, T. H. et al. A heparin-mimicking polymer conjugate stabilizes basic fibroblast growth factor. Nat. Chem. 5,
221-227 (2013).
257. Yoneda, A., Asada, M., Oda, Y., Suzuki, M. & Imamura, T. Engineering of an FGF-proteoglycan fusion protein with
heparin-independent, mitogenic activity. Nat. Biotechnol. 18, 641-644 (2000).
258. Bornscheuer, U. T. et al. Engineering the third wave of biocatalysis. Nature 485, 185-194 (2012).
259. Dubey, V. K., Lee, J., Somasundaram, T., Blaber, S. & Blaber, M. Spackling the crack: stabilizing human fibroblast
growth factor-1 by targeting the N and C terminus beta-strand interactions. J. Mol. Biol. 371, 256-268 (2007).
260. Blaber, S. I., Diaz, J. & Blaber, M. Accelerated healing in NONcNZO10/LtJ type 2 diabetic mice by FGF-1. Wound
Repair Regen. Off. Publ. Wound Heal. Soc. Eur. Tissue Repair Soc. 23, 538-549 (2015).
261. Jeong, S. S. Thermostable variants of fibroblast growth factors. (2012).
262. Nolle, V. FIBROBLAST GROWTH FACTOR MUTEINS WITH INCREASED ACTIVITY. (2015).
263. Bednar, D. et al. FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point
Mutants. PLoS Comput. Biol. 11, (2015).
264. Sprugel, K. H., McPherson, J. M., Clowes, A. W. & Ross, R. Effects of growth factors in vivo. I. Cell ingrowth into
porous subcutaneous chambers. Am. J. Pathol. 129, 601-613 (1987).
265. du Cros, D. L. Fibroblast growth factor influences the development and cycling of murine hair follicles. Dev. Biol.
156, 444-453 (1993).
266. Hall, T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT.
Nucleic Acids Symp. Ser. 41, 95-98 (1999).
267. Mayrose, I., Graur, D., Ben-Tal, N. & Pupko, T. Comparison of Site-Specific Rate-Inference Methods for Protein
Sequences: Empirical Bayesian Methods Are Superior. Mol. Biol. Evol. 21, 1781-1791 (2004).
268. Krissinel, E. Stock-based detection of protein oligomeric states in jsPISA. Nucleic Acids Res. 43, W314-W319 (2015).
269. Krejci, P., Pejchalova, K. & Wilcox, W. R. Simple, mammalian cell-based assay for identification of inhibitors of the
Erk MAP kinase pathway. Invest. New Drugs 25, 391-395 (2007).
270. Lo, M.-C. et al. Evaluation of fluorescence-based thermal shift assays for hit identification in drug discovery. Anal.
Biochem. 332, 153-159 (2004).
271. Dvorak, P. et al. Expression and Potential Role of Fibroblast Growth Factor 2 and Its Receptors in Human Embryonic
Stem Cells. STEM CELLS 23, 1200-1211 (2005).
272. Eiselleova, L. et al. A Complex Role for FGF-2 in Self-Renewal, Survival, and Adhesion of Human Embryonic Stem
Cells. STEM CELLS 27, 1847-1857 (2009).
273. Horák, D. et al. Use of magnetic hydrazide-modified polymer microspheres for enrichment of Francisella tularensis
glycoproteins. Soft Matter 8, 2775-2786 (2012).
274. Müller-Röver, S. et al. A Comprehensive Guide for the Accurate Classification of Murine Hair Follicles in Distinct
Hair Cycle Stages. J. Invest. Dermatol. 117, 3-15 (2001).
275. Beadle, B. M. & Shoichet, B. K. Structural bases of stability-function tradeoffs in enzymes. J. Mol. Biol. 321, 285-
296 (2002).
276. Schreiber, G., Buckle, A. M. & Fersht, A. R. Stability and function: two constraints in the evolution of barstar and
other proteins. Struct. Lond. Engl. 1993 2, 945-951 (1994).
277. Thomas, V. L., McReynolds, A. C. & Shoichet, B. K. Structural bases for stability-function tradeoffs in antibiotic
resistance. J. Mol. Biol. 396, 47-59 (2010).
278. Wang, X., Minasov, G. & Shoichet, B. K. Evolution of an antibiotic resistance enzyme constrained by stability and
activity trade-offs. J. Mol. Biol. 320, 85-95 (2002).
279. Davids, T., Schmidt, M., Böttcher, D. & Bornscheuer, U. T. Strategies for the discovery and engineering of enzymes
for biocatalysis. Curr. Opin. Chem. Biol. 17, 215-220 (2013).
280. Chaloupková, R. et al. Modification of activity and specificity of haloalkane dehalogenase from Sphingomonas
paucimobilis UT26 by engineering of its entrance tunnel. J. Biol. Chem. 278, 52622-8 (2003).
281. Holland, J. T. et al. Rational Redesign of Glucose Oxidase for Improved Catalytic Function and Stability. PLoS ONE
7, (2012).
282. Zhao, H., Chockalingam, K. & Chen, Z. Directed evolution of enzymes and pathways for industrial biocatalysis. Curr.
Opin. Biotechnol. 13, 104-110 (2002).
283. Dalby, P. A. Strategy and success for the directed evolution of enzymes. Curr. Opin. Struct. Biol. 21, 473-480 (2011).
284. Lehmann, M. & Wyss, M. Engineering proteins for thermostability: the use of sequence alignments versus rational
design and directed evolution. Curr. Opin. Biotechnol. 12, 371-375 (2001).
285. Chica, R. A., Doucet, N. & Pelletier, J. N. Semi-rational approaches to engineering enzyme activity: combining the
benefits of directed evolution and rational design. Curr. Opin. Biotechnol. 16, 378-384 (2005).
286. Brundiek, H. B., Evitt, A. S., Kourist, R. & Bornscheuer, U. T. Creation of a lipase highly selective for trans fatty acids
by protein engineering. Angew. Chem. Int. Ed Engl. 51, 412-4 (2012).
287. Stucki, G. & Thueer, M. Experiences of a Large-Scale Application of 1,2-Dichloroethane Degrading Microorganisms
for Groundwater Treatment. Environ. Sci. Technol. 29, 2339-2345 (1995).
288. Skopelitou, K., Georgakis, N., Efrose, R., Flemetakis, E. & Labrou, N. E. Sol-gel immobilization of haloalkane
dehalogenase from Bradyrhizobium japonicum for the remediation 1,2-dibromoethane. J. Mol. Catal. B Enzym. 97,
5-11 (2013).
289. Bidmanova, S., Hlavacek, A., Damborsky, J. & Prokop, Z. Conjugation of 5(6)-carboxyfluorescein and 5(6)-carboxynaphthofluorescein
with bovine serum albumin and their immobilization for optical pH sensing. Sens.
Actuators B Chem. 161, 93-99 (2012).
290. Koudelakova, T. et al. Haloalkane dehalogenases: biotechnological applications. Biotechnol. J. 8, 32-45 (2013).
291. Los, G. V. et al. HaloTag: a novel protein labeling technology for cell imaging and protein analysis. ACS Chem. Biol.
3, 373-382 (2008).
292. Chaloupkova, R., Prokop, Z., Sato, Y., Nagata, Y. & Damborsky, J. Stereoselectivity and conformational stability of
haloalkane dehalogenase DbjA from Bradyrhizobium japonicum USDA110: the effect of pH and temperature. FEBS
J. 278, 2728-2738 (2011).
293. Reetz, M. T. & Carballeira, J. D. Iterative saturation mutagenesis (ISM) for rapid directed evolution of functional
enzymes. Nat. Protoc. 2, 891-903 (2007).
294. Holloway, P., Trevors, J. T. & Lee, H. A colorimetric assay for detecting haloalkane dehalogenase activity. J.
Microbiol. Methods 32, 31-36 (1998).
295. Circular Dichroism and the Conformational Analysis of / G.D. Fasman / Springer.
296. Koudelakova, T. et al. Engineering enzyme stability and resistance to an organic cosolvent by modification of
residues in the access tunnel. Angew. Chem. Int. Ed Engl. 52, 1959-63 (2013).
297. Stsiapanava, A. et al. Atomic resolution studies of haloalkane dehalogenases DhaA04, DhaA14 and DhaA15 with
engineered access tunnels. Acta Crystallogr. D Biol. Crystallogr. 66, 962-9 (2010).
298. Newman, J. et al. Haloalkane dehalogenases: structure of a Rhodococcus enzyme. Biochemistry (Mosc.) 38, 16105-
16114 (1999).
299. Chovancova, E. et al. CAVER 3.0: a tool for the analysis of transport pathways in dynamic protein structures. PLoS
Comput. Biol. 8, e1002708 (2012).
300. Damborsky, J. & Brezovsky, J. Computational tools for designing and engineering enzymes. Curr. Opin. Chem. Biol.
19, 8-16 (2014).
301. Yan, B. X. & Sun, Y. Q. Glycine Residues Provide Flexibility for Enzyme Active Sites. J. Biol. Chem. 272, 3190-3194
(1997).
302. Neurath, H. The Role of Glycine in Protein Structure. J. Am. Chem. Soc. 65, 2039-2041 (1943).
303. Feller, G. Protein stability and enzyme activity at extreme biological temperatures. J. Phys. Condens. Matter Inst.
Phys. J. 22, 323101 (2010).
304. Jochens, H., Aerts, D. & Bornscheuer, U. T. Thermostabilization of an esterase by alignment-guided focussed
directed evolution. Protein Eng. Des. Sel. 23, 903-909 (2010).
305. Reetz, M. T., Soni, P., Fernandez, L., Gumulya, Y. & Carballeira, J. D. Increasing the stability of an enzyme toward
hostile organic solvents by directed evolution based on iterative saturation mutagenesis using the B-FIT method.
Chem. Commun. Camb. Engl. 46, 8657-8658 (2010).
306. Fields, P. A. Review: Protein function at thermal extremes: balancing stability and flexibility. Comp. Biochem.
Physiol. A. Mol. Integr. Physiol. 129, 417-431 (2001).
307. Cesarini, S., Bofill, C., Pastor, F. I. J., Reetz, M. T. & Diaz, P. A thermostable variant of P. aeruginosa cold-adapted
LipC obtained by rational design and saturation mutagenesis. Process Biochem. 47, 2064-2071 (2012).
308. Tsou, C. L. Conformational flexibility of enzyme active sites. Science 262, 380-381 (1993).
309. Banáš, P., Otyepka, M., Jeřábek, P., Petrek, M. & Damborský, J. Mechanism of enhanced conversion of 1,2,3-trichloropropane
by mutant haloalkane dehalogenase revealed by molecular modeling. J. Comput. Aided Mol. Des.
20, 375-83 (2006).
310. Damborsky, J. et al. Method of Thermostabilization of a Protein and/or Stabilization Towards Organic Solvents.
(2013).
311. Prokop, Z. et al. Engineering of protein tunnels: Keyhole-lock-key model for catalysis by the enzymes with buried
active sites. Protein Eng. Handb. 3, 421-464 (2012).
312. Nguyen, T.-A. et al. Improvement of Cyclophosphamide Activation by CYP2B6 Mutants: From in Silico to ex Vivo.
Mol. Pharmacol. 73, 1122-1133 (2008).
313. Fishelovitch, D., Shaik, S., Wolfson, H. J. & Nussinov, R. Theoretical characterization of substrate access/exit
channels in the human cytochrome P450 3A4 enzyme: involvement of phenylalanine residues in the gating
mechanism. J. Phys. Chem. B 113, 13018-13025 (2009).
314. Khan, K. K., He, Y. Q., Domanski, T. L. & Halpert, J. R. Midazolam oxidation by cytochrome P450 3A4 and active-site
mutants: an evaluation of multiple binding sites and of the metabolic pathway that leads to enzyme inactivation.
Mol. Pharmacol. 61, 495-506 (2002).
315. Wen, Z., Baudry, J., Berenbaum, M. R. & Schuler, M. A. Ile115Leu mutation in the SRS1 region of an insect
cytochrome P450 (CYP6B1) compromises substrate turnover via changes in a predicted product release channel.
Protein Eng. Des. Sel. PEDS 18, 191-199 (2005).
316. Carmichael, A. B. & Wong, L. L. Protein engineering of Bacillus megaterium CYP102. The oxidation of polycyclic
aromatic hydrocarbons. Eur. J. Biochem. 268, 3117-3125 (2001).
317. Lee, H.-L., Chang, C.-K., Jeng, W.-Y., Wang, A. H.-J. & Liang, P.-H. Mutations in the substrate entrance region of β-glucosidase
from Trichoderma reesei improve enzyme activity and thermostability. Protein Eng. Des. Sel. PEDS 25,
733-40 (2012).
318. Schmitt, J., Brocca, S., Schmid, R. D. & Pleiss, J. Blocking the tunnel: engineering of Candida rugosa lipase mutants
with short chain length specificity. Protein Eng. 15, 595-601 (2002).
319. Qian, Z., Horton, J. R., Cheng, X. & Lutz, S. Structural redesign of lipase B from Candida antarctica by circular
permutation and incremental truncation. J. Mol. Biol. 393, 191-201 (2009).
320. Marton, Z. et al. Mutations in the stereospecificity pocket and at the entrance of the active site of Candida
antarctica lipase B enhancing enzyme enantioselectivity. J. Mol. Catal. B Enzym. 65, 11-17 (2010).
321. Kamal, M. Z., Mohammad, T. A. S., Krishnamoorthy, G. & Rao, N. M. Role of Active Site Rigidity in Activity: MD
Simulation and Fluorescence Study on a Lipase Mutant. PLOS ONE 7, e35188 (2012).
322. Brundiek, H., Padhi, S. K., Kourist, R., Evitt, A. & Bornscheuer, U. T. Altering the scissile fatty acid binding site of
Candida antarctica lipase A by protein engineering for the selective hydrolysis of medium chain fatty acids. Eur. J.
Lipid Sci. Technol. 114, 1148-1153 (2012).
323. Schließmann, A., Hidalgo, A., Berenguer, J. & Bornscheuer, U. T. Increased Enantioselectivity by Engineering
Bottleneck Mutants in an Esterase from Pseudomonas fluorescens. ChemBioChem 10, 2920-2923 (2009).
324. Kotik, M., Štěpánek, V., Kyslík, P. & Marešová, H. Cloning of an epoxide hydrolase-encoding gene from Aspergillus
niger M200, overexpression in E. coli, and modification of activity and enantioselectivity of the enzyme by protein
engineering. J. Biotechnol. 132, 8-15 (2007).
325. van Loo, B. et al. Directed evolution of epoxide hydrolase from A. radiobacter toward higher enantioselectivity by
error-prone PCR and DNA shuffling. Chem. Biol. 11, 981-990 (2004).
326. Biedermannová, L. et al. A single mutation in a tunnel to the active site changes the mechanism and kinetics of
product release in haloalkane dehalogenase LinB. J. Biol. Chem. 287, 29062-29074 (2012).
327. Bosma, T., Damborský, J., Stucki, G. & Janssen, D. B. Biodegradation of 1,2,3-Trichloropropane through Directed
Evolution and Heterologous Expression of a Haloalkane Dehalogenase Gene. Appl. Environ. Microbiol. 68, 3582-
3587 (2002).
328. Klvana, M. et al. Pathways and mechanisms for product release in the engineered haloalkane dehalogenases
explored using classical and random acceleration molecular dynamics simulations. J. Mol. Biol. 392, 1339-1356
(2009).
329. Gora, A., Brezovsky, J. & Damborsky, J. Gates of Enzymes. Chem. Rev. 113, 5871-5923 (2013).
330. Sambrook, J. Molecular Cloning: A Laboratory Manual, Third Edition. (Cold Spring Harbor Laboratory Press, 2001).
331. Wold, S., Esbensen, K. & Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 2, 37-52 (1987).
332. Segel, I. H. Enzyme kinetics: behavior and analysis of rapid equilibrium and steady state enzyme systems. (Wiley,
1975).
333. Kabsch, W. Automatic processing of rotation diffraction data from crystals of initially unknown symmetry and cell
constants. J. Appl. Crystallogr. 26, 795-800 (1993).
334. Vagin, A. & Teplyakov, A. MOLREP: an Automated Program for Molecular Replacement. J. Appl. Crystallogr. 30,
1022-1025 (1997).
335. Stsiapanava, A. et al. Crystallization and preliminary X-ray diffraction analysis of the wild-type haloalkane
dehalogenase DhaA and its variant DhaA13 complexed with different ligands. Acta Crystallograph. Sect. F Struct.
Biol. Cryst. Commun. 67, 253-257 (2011).
336. Murshudov, G. N., Vagin, A. A. & Dodson, E. J. Refinement of Macromolecular Structures by the Maximum-Likelihood
Method. Acta Crystallogr. D Biol. Crystallogr. 53, 240-255 (1997).
337. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D Biol.
Crystallogr. 66, 486-501 (2010).
338. Vaguine, A. A., Richelle, J. & Wodak, S. J. SFCHECK: a unified set of procedures for evaluating the quality of
macromolecular structure-factor data and their agreement with the atomic model. Acta Crystallogr. D Biol.
Crystallogr. 55, 191-205 (1999).
339. Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D
Biol. Crystallogr. 66, 12-21 (2010).
340. Gordon, J. C. et al. H++: A server for estimating pKas and adding missing hydrogens to macromolecules. Nucleic
Acids Res. 33, W368-371 (2005).
341. Case, D. A. et al. AMBER 2016, University of California, San Francisco. (2016).
342. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential
functions for simulating liquid water. J. Chem. Phys. 79, 926-935 (1983).
343. Hornak, V. et al. Comparison of multiple Amber force fields and development of improved protein backbone
parameters. Proteins 65, 712-725 (2006).
344. Essmann, U. et al. A smooth particle mesh Ewald method. J. Chem. Phys. 103, 8577-8593 (1995).
345. Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. C. Numerical integration of the cartesian equations of motion of a
system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327-341 (1977).
346. Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J. Mol. Graph. 14, 33-38, 27-28 (1996).
347. Fogarty, A. C., Duboue-Dijon, E., Sterpone, F., Hynes, J. T. & Laage, D. Biomolecular hydration dynamics: a jump
model perspective. Chem. Soc. Rev. 42, 5672-5683 (2013).
348. Privett, H. K. et al. Iterative approach to computational enzyme design. Proc. Natl. Acad. Sci. U. S. A. 109, 3790-
3795 (2012).
349. Sykora, J. et al. Dynamics and hydration explain failed functional transformation in dehalogenase design. Nat.
Chem. Biol. 10, 428-430 (2014).
350. Sekhar, A., Vallurupalli, P. & Kay, L. E. Defining a length scale for millisecond-timescale protein conformational
exchange. Proc. Natl. Acad. Sci. U. S. A. 110, 11391-11396 (2013).
351. Levy, Y. & Onuchic, J. N. Water mediation in protein folding and molecular recognition. Annu. Rev. Biophys. Biomol.
Struct. 35, 389-415 (2006).
352. Grossman, M. et al. Correlated structural kinetics and retarded solvent dynamics at the metalloprotease active
site. Nat. Struct. Mol. Biol. 18, 1102-1108 (2011).
353. Nucci, N. V., Pometun, M. S. & Wand, A. J. Site-resolved measurement of water-protein interactions by solution
NMR. Nat. Struct. Mol. Biol. 18, 245-249 (2011).
354. Russo, D., Hura, G. & Head-Gordon, T. Hydration dynamics near a model protein surface. Biophys. J. 86, 1852-1862
(2004).
355. Oleinikova, A., Sasisanker, P. & Weingartner, H. What Can Really Be Learned from Dielectric Spectroscopy of
Protein Solutions? A Case Study of Ribonuclease A. J. Phys. Chem. B 108, 8467-8474 (2004).
356. King, J. T. & Kubarych, K. J. Site-specific coupling of hydration water and protein flexibility studied in solution with
ultrafast 2D-IR spectroscopy. J. Am. Chem. Soc. 134, 18705-18712 (2012).
357. Jesenská, A. et al. Nanosecond time-dependent Stokes shift at the tunnel mouth of haloalkane dehalogenases. J.
Am. Chem. Soc. 131, 494-501 (2009).
358. Summerer, D. et al. A genetically encoded fluorescent amino acid. Proc. Natl. Acad. Sci. 103, 9785-9789 (2006).
359. Mills, J. H., Lee, H. S., Liu, C. C., Wang, J. & Schultz, P. G. A genetically encoded direct sensor of antibody-antigen
interactions. Chembiochem Eur. J. Chem. Biol. 10, 2162-2164 (2009).
360. Choudhury, S. D. & Pal, H. Modulation of excited-state proton-transfer reactions of 7-hydroxy-4-methylcoumarin
in ionic and nonionic reverse micelles. J. Phys. Chem. B 113, 6736-6744 (2009).
361. Choudhury, S. D., Nath, S. & Pal, H. Excited-state proton transfer behavior of 7-hydroxy-4-methylcoumarin in AOT
reverse micelles. J. Phys. Chem. B 112, 7748-7753 (2008).
362. Sato, Y. et al. Two rhizobial strains, Mesorhizobium loti MAFF303099 and Bradyrhizobium japonicum USDA110,
encode haloalkane dehalogenases with novel structures and substrate specificities. Appl. Environ. Microbiol. 71,
4372-4379 (2005).
363. Joung, I. S. & Cheatham, 3rd, T. E. Determination of alkali and halide monovalent ion parameters for use in explicitly
solvated biomolecular simulations. J. Phys. Chem. B 112, 9020-9041 (2008).
364. Joung, I. S. & Cheatham, 3rd, T. E. Molecular dynamics simulations of the dynamic and energetic properties of
alkali and halide ions using water-model-specific ion parameters. J. Phys. Chem. B 113, 13279-13290 (2009).
365. Hanwell, M. D. et al. Avogadro: An advanced semantic chemical editor, visualization, and analysis platform. J.
Cheminformatics 4, 17 (2012).
366. Gaussian 09, Revision A.1, M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G.
Scalmani, V. Barone, B. Mennucci, G. A. Petersson, H. Nakatsuji, M. Caricato, X. Li, H. P. Hratchian, A. F. Izmaylov,
J. Bloino, G. Zheng, J. L. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima,
Y. Honda, O. Kitao, H. Nakai, T. Vreven, J. A. Montgomery, Jr., J. E. Peralta, F. Ogliaro, M. Bearpark, J. J. Heyd, E.
Brothers, K. N. Kudin, V. N. Staroverov, R. Kobayashi, J. Normand, K. Raghavachari, A. Rendell, J. C. Burant, S. S.
Iyengar, J. Tomasi, M. Cossi, N. Rega, J. M. Millam, M. Klene, J. E. Knox, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo,
R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, R. L. Martin, K.
Morokuma, V. G. Zakrzewski, G. A. Voth, P. Salvador, J. J. Dannenberg, S. Dapprich, A. D. Daniels, Ö. Farkas, J. B.
Foresman, J. V. Ortiz, J. Cioslowski, and D. J. Fox, Gaussian, Inc., Wallingford CT, 2009. (2017). Available at:
http://www.surfchem.fudan.edu.cn/teacher/lizh/Usefull_Files/g09/g_tech/g_ur/m_citation.htm. (Accessed:
10th February 2017)
367. Dupradeau, F.-Y. et al. The R.E.D. tools: advances in RESP and ESP charge derivation and force field library building.
Phys. Chem. Chem. Phys. PCCP 12, 7821-7839 (2010).
368. Vanquelef, E. et al. R.E.D. Server: a web service for deriving RESP and ESP charges and building force field libraries
for new molecules and molecular fragments. Nucleic Acids Res. 39, W511-517 (2011).
369. Cornell, W. D. et al. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic
Molecules J. Am. Chem. Soc. 1995, 117, 5179-5197. J. Am. Chem. Soc. 118, 2309-2309 (1996).
370. VanBeek, D. B., Zwier, M. C., Shorb, J. M. & Krueger, B. P. Fretting about FRET: correlation between kappa and R.
Biophys. J. 92, 4168-4178 (2007).
371. Rose, P. W. et al. The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res. 41,
D475-482 (2013).
372. Götz, A. W. et al. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized
Born. J. Chem. Theory Comput. 8, 1542-1555 (2012).
373. Le Grand, S., Götz, A. W. & Walker, R. C. SPFP: Speed without compromise: A mixed precision model for GPU
accelerated molecular dynamics simulations. Comput. Phys. Commun. 184, 374-380 (2013).
374. Roe, D. R. & Cheatham, T. E. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics
Trajectory Data. J. Chem. Theory Comput. 9, 3084-3095 (2013).
375. Miller, B. R. et al. MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. J. Chem. Theory
Comput. 8, 3314-3321 (2012).
376. Breuer, M. et al. Industrial methods for the production of optically active intermediates. Angew. Chem. Int. Ed
Engl. 43, 788-824 (2004).
377. Acevedo-Rocha, C. G., Agudo, R. & Reetz, M. T. Directed evolution of stereoselective enzymes based on genetic
selection as opposed to screening systems. J. Biotechnol. 191, 3-10 (2014).
378. Choi, J.-M., Han, S.-S. & Kim, H.-S. Industrial applications of enzyme biocatalysis: Current status and future aspects.
Biotechnol. Adv. 33, 1443-1454 (2015).
379. Siegel, J. B. et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder
reaction. Science 329, 309-313 (2010).
380. Heinisch, T. et al. Improving the Catalytic Performance of an Artificial Metalloenzyme by Computational Design. J.
Am. Chem. Soc. 137, 10414-10419 (2015).
381. Muthu, P., Chen, H. X. & Lutz, S. Redesigning human 2'-deoxycytidine kinase enantioselectivity for L-nucleoside
analogues as reporters in positron emission tomography. ACS Chem. Biol. 9, 2326-2333 (2014).
382. Wijma, H. J. et al. Enantioselective enzymes by computational design and in silico screening. Angew. Chem. Int. Ed
Engl. 54, 3726-3730 (2015).
383. Schopf, P. & Warshel, A. Validating computer simulations of enantioselective catalysis; reproducing the large steric
and entropic contributions in Candida Antarctica lipase B. Proteins 82, 1387-1399 (2014).
384. Westerbeek, A., Szymanski, W., Feringa, B. & Janssen, D. Dynamic kinetic resolution process employing haloalkane
dehalogenase. ACS Catal. 1, 1654-1660 (2011).
385. Patel, R. N. Biocatalysis: synthesis of chiral intermediates for drugs. Curr. Opin. Drug Discov. Devel. 9, 741-764
(2006).
386. van Leeuwen, J. G. E., Wijma, H. J., Floor, R. J., van der Laan, J.-M. & Janssen, D. B. Directed evolution strategies
for enantiocomplementary haloalkane dehalogenases: from chemical waste to enantiopure building blocks.
Chembiochem Eur. J. Chem. Biol. 13, 137-148 (2012).
387. Phillips, R. S. Temperature effects on stereochemistry of enzymatic reactions. Enzyme Microb. Technol. 14, 417-
419 (1992).
388. Amaro, M. et al. Site-specific analysis of protein hydration based on unnatural amino acid fluorescence. J. Am.
Chem. Soc. 137, 4988-4992 (2015).
389. Ottosson, J., Fransson, L., King, J. W. & Hult, K. Size as a parameter for solvent effects on Candida antarctica lipase
B enantioselectivity. Biochim. Biophys. Acta 1594, 325-334 (2002).
390. Johnson, K. A. in The Enzymes (ed. Sigman, D. S.) 20, 1-61 (Academic Press, 1992).
391. Johnson, K. A., Simpson, Z. B. & Blom, T. Global kinetic explorer: a new computer program for dynamic simulation
and fitting of kinetic data. Anal. Biochem. 387, 20-29 (2009).
392. Johnson, K. A., Simpson, Z. B. & Blom, T. FitSpace explorer: an algorithm to evaluate multidimensional parameter
space in fitting kinetic data. Anal. Biochem. 387, 30-41 (2009).
393. Sanner, M. F. Python: a programming language for software integration and development. J. Mol. Graph. Model.
17, 57-61 (1999).
394. Solis, F. J. & Wets, R. J.-B. Minimization by Random Search Techniques. Math. Oper. Res. 6, 19-30 (1981).
395. Rocha, G. B., Freire, R. O., Simas, A. M. & Stewart, J. J. P. RM1: a reparameterization of AM1 for H, C, N, O, P, S, F,
Cl, Br, and I. J. Comput. Chem. 27, 1101-1111 (2006).
396. Zhu, C., Byrd, R. H., Lu, P. & Nocedal, J. Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-scale Bound-constrained
Optimization. ACM Trans. Math. Softw. 23, 550-560 (1997).
397. Hur, S., Kahn, K. & Bruice, T. C. Comparison of formation of reactive conformers for the SN2 displacements by
CH3CO2- in water and by Asp124-CO2- in a haloalkane dehalogenase. Proc. Natl. Acad. Sci. U.S.A. 100, 2215-2219
(2003).
CURRICULUM VITAE
Personal information:
Name: David Bednář
Birth: 12.8.1986, Brno, Czech Republic
Nationality: Czech
Address: Svážná 2, Brno, 634 00
Tel.: +420 605 143 394
E-mail: 222755@mail.muni.cz
Education:
2009 - 2011 - MSc. in Molecular Biology and Genetics, Faculty of Science, Masaryk University
2007 - 2010 - Bc. in Biochemistry, Faculty of Science, Masaryk University
2006 - 2009 - Bc. in Molecular Biology and Genetics, Faculty of Science, Masaryk University
Awards:
2015 - Award of the Dean of Faculty of Science, Masaryk University
2014 - II. Prize for the Best Speaker, Meeting of Biochemists and Molecular Biologists, 2014, Brno
2011-2014 - Brno Ph.D. Talent Scholarship funded by the Brno City Municipality
Research Stay:
2 - 6/2014 - Center for Integrative Proteomics Research, Rutgers University, New Jersey, USA
Pedagogical Activities:
2011 - 2015 - Lecturer in Bioinformatics - practice, Faculty of Science, Masaryk University
2013 - 2015 - Assistant in Structural biology - practice, Faculty of Science, Masaryk University
2010, 2012, 2014 - Assistant in practice, Summer School of Protein Engineering, Masaryk University
2016 - Lecturer in Summer School of Protein Engineering, Masaryk University
LIST OF PUBLICATIONS
Bednář D.*, Beerens K.*, Šebestová E., Bendl J., Khare S., Chaloupková R., Prokop Z.,
Brezovský J., Baker D., Damborský J., 2015: FireProt: Energy- and Evolution-Based
Computational Design of Thermostable Multiple-Point Mutants. PLoS Computational
Biology 11: e1004556.
Dvorak P.*, Bednář D.*, Vanacek P.*, Bálek L.*, Eiselleova L.*, Štěpánková V., Šebestová E.,
Kunová Bosákova M., Konecna Z., Mazurenko S., Kuňka A., Horák D., Chaloupková R.,
Brezovský J., Krejci P., Prokop Z., Dvorak P., Damborský J., 2017: Computer-Assisted
Engineering of Hyperstable Fibroblast Growth Factor 2. Scientific Reports (under review).
Liškova V., Bednář D., Holubeva T., Prudnikova T., Řezačova P., Koudelakova T., Šebestová
E., Kuta Smatanova I., Brezovský J., Chaloupková R., Damborský J., 2015: Balancing the
Stability-Activity Trade-off by Fine-Tuning Dehalogenase Access Tunnels. ChemCatChem 7:
648-659.
Amaro M., Brezovský J., Kováčova S., Sýkora J., Bednář D., Němec V., Liškova V., Kurumbang
N., Beerens K., Chaloupková R., Paruch K., Hof M., Damborský J., 2015: Site-specific Analysis
of Protein Hydration Based on Unnatural Amino Acid Fluorescence. Journal of the American
Chemical Society 137: 4988-4992.
Liškova V., Štěpánková V., Bednář D., Prokop Z., Brezovský J., Chaloupková R., Damborský
J., 2017: Striking Difference in Structural Bases of Enantioselectivity of Haloalkane
Dehalogenases with Linear β-Haloalkanes: Wide-open versus Occluded Active Site.
Angewandte Chemie (under review).
Kozlíkova B., Šebestová E., Sustr V., Brezovský J., Strnad O., Daniel L., Bednář D., Pavelka A.,
Manak M., Bezděka M., Benes P., Kotry M., Gora A., Damborský J., Sochor J., 2014: CAVER
Analyst 1.0: Graphic Tool for Interactive Visualization and Analysis of Tunnels and Channels
in Protein Structures. Bioinformatics 30: 2684-2685.
Musil M., Stourac J., Bendl J., Brezovský J., Martinek T., Zendulka J., Bednář D., Damborský
J., 2017: FireProt: Web Server for Automated Design of Thermostable Proteins. Nucleic
Acids Research (under review).
Beerens K., Mazurenko S., Kunka A., Marques S. M., Hansen N., Musil M., Chaloupkova R.,
Waterman J., Brezovský J., Bednář D., Prokop Z., Damborský J., 2017: Evolutionary Analysis
is a Powerful Complement to Energy Calculations Allowing Entropy-Driven Stabilization.
ACS Catalysis (under review).
* authors contributed equally