Masaryk University
Faculty of Science
Loschmidt Laboratories
Department of Experimental Biology

Molecular Modelling of Structure-Function Relationships in Enzymes

Ph.D. Dissertation

David Bednář

Supervisors: Prof. Mgr. Jiří Damborský, Dr.
Mgr. Jan Brezovský, Ph.D.

Brno 2017

Acknowledgements: I would like to thank my supervisor Jiří Damborský for the opportunity to study at Loschmidt Laboratories, and for his expert guidance, numerous discussions and motivation in solving difficult problems. I am very grateful to Jan Brezovský for his boundless patience, friendly approach and the valuable advice he gave me throughout my studies. I also thank all current and former colleagues, who created a pleasant and inspiring environment, for their help and for the discussions without which no scientific work could be done. Last but not least, I would like to thank my parents and loved ones for their support and trust during my studies and in my personal life.

Bibliographic Entry

Author: Mgr. David Bednář, Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University
Title of dissertation: Molecular Modelling of Structure-Function Relationships in Enzymes
Study Programme: Biology
Field of Study: Molecular and Cellular Biology
Supervisor: prof. Mgr. Jiří Damborský, Dr.
Supervisor-Specialist: Mgr. Jan Brezovský, Ph.D.
Year of Defence: 2017
Keywords: molecular modelling; enzyme; protein engineering; stability

Bibliografický záznam (Bibliographic Entry, Czech version)

Author: Mgr. David Bednář, Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University
Title of dissertation: Molekulové modelování vztahů mezi strukturou a funkcí enzymů (Molecular Modelling of Structure-Function Relationships in Enzymes)
Study Programme: Biology
Field of Study: Molecular and Cellular Biology
Supervisor: prof. Mgr. Jiří Damborský, Dr.
Supervisor-Specialist: Mgr. Jan Brezovský, Ph.D.
Year of Defence: 2017
Keywords: molecular modelling; enzyme; protein engineering; stability

© David Bednář, Masaryk University 2017

CONTENT

Content
Abstract
Abstrakt
Motivation

INTRODUCTION
1 Protein Stability
1.1 Thermodynamic Stability
1.2 Kinetic Stability
1.3 Methods for Protein Stabilization
2 Molecular Modeling Methods
2.1 Molecular Docking
2.2 Molecular Dynamic Simulations
2.3 Hybrid Quantum Mechanics/Molecular Mechanics
3 Model Proteins
3.1 Haloalkane Dehalogenase DhaA
3.2 γ-Hexachlorocyclohexane Dehydrochlorinase LinA
3.3 Fibroblast Growth Factor 2

RESULTS
4 FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple Point Mutants
4.1 Abstract
4.2 Author Summary
4.3 Introduction
4.4 Results
4.5 Discussion
4.6 Methods
4.7 Supporting Information
5 Computer-Assisted Engineering of Hyperstable Fibroblast Growth Factor 2
5.1 Abstract
5.2 Main paper
5.3 Methods
5.4 Supporting Information
6 Balancing the Stability-Activity Trade-Off by Fine-Tuning Dehalogenase Access Tunnels
6.1 Abstract
6.2 Introduction
6.3 Results
6.4 Discussion
6.5 Conclusion
6.6 Experimental Section
6.7 Supporting Information
7 Site-Specific Analysis of Protein Hydration Based on Unnatural Amino Acid Fluorescence
7.1 Abstract
7.2 Introduction
7.3 Results and Discussion
7.4 Conclusions
7.5 Supplementary Information
8 Different Structural Origins of Haloalkane Dehalogenases' Enantioselectivity towards Linear β-Haloalkanes: Open-Solvated versus Occluded-Desolvated Active Sites
8.1 Abstract
8.2 Main Article
8.3 Supporting Information
Conclusions
References
Curriculum Vitae
List of Publications

ABSTRACT

This dissertation is devoted to studying the structure-function relationships of proteins using the methods of molecular modeling. In the opening chapter, the emphasis is on the theoretical basis of protein stability. Stability is one of the most important properties of proteins, and their use in both basic research and industry depends on it. Stability can be divided into thermodynamic stability (the energy difference between the folded and unfolded state) and kinetic stability (separation of the relevant states by a high activation energy). Several approaches to protein stabilization are discussed, ranging from purely experimental methods based on directed evolution and saturation mutagenesis, through the identification of hot-spots, to sophisticated algorithms combining free energy calculations with evolutionary inference. The next two chapters of the introduction deal with the molecular modeling methods and the model proteins used in the results section. Molecular dynamics is one of the most important methods and can be used to describe protein stability, activity, folding, hydration, enantioselectivity and substrate specificity.
The other methods are molecular docking, which predicts the binding modes and binding energy of the substrate-enzyme complex, and hybrid quantum mechanics/molecular mechanics methods, which are used to describe the reactivity of macromolecules with small molecules. The results section consists of five parts. The first one (chapter 4) deals with the computational method FireProt for effective protein stabilization. FireProt combines an evolutionary approach with calculation of the Gibbs free energy, complemented by efficient filtering using a variety of in silico tools. The method was applied to the haloalkane dehalogenase DhaA and the dehydrochlorinase LinA, increasing their thermostability by 24 and 21 °C, respectively. Chapter 5 is focused on the stabilization of human fibroblast growth factor FGF2. This protein is involved in numerous regulatory processes, suggesting good applicability in both medicine and basic research. Nowadays, the main application of FGF2 is its addition to the medium used for stem cell cultivation. However, FGF2 is not very stable, with a half-life between 10 and 12 hours. Stabilizing mutations were identified using free energy calculations by Rosetta and FoldX in combination with the evolutionary "back-to-consensus" approach. These hits were combined with mutations from saturation mutagenesis libraries. The final nine-point mutant showed a thermostability increase of 19 °C and was stable in the cultivation medium for more than twenty days. Chapters 6-8 deal with the characterization of the DhaA enzyme. Chapter 6 focuses on stability-activity relationships. A stable four-point mutant of DhaA showed tolerance to organic co-solvents but had very low activity. A single mutation at the mouth of the access tunnel significantly improved activity while preserving the thermostability and the stability against organic co-solvents.
This single mutation increased the mobility of secondary structure elements and significantly increased the diameter of the access tunnel. Chapter 7 deals with a method for the analysis of protein hydration. A fluorescent unnatural amino acid, which gives a different signal in the presence of a varying number of water molecules, was introduced into the tunnel mouth of two dehalogenases. Molecular dynamics revealed different hydration of the two dehalogenases, which correlated closely with experimental fluorescence spectroscopy. The last, eighth chapter is devoted to the study of the enantioselectivity of dehalogenases. Previous studies showed that the enantioselectivity of the DbjA dehalogenase with 2-bromopentane is due to increased hydration in the accessible active site. Active site hydration aligned the substrate with the hydrophobic wall of the protein, favoring the conformation of the (R)-enantiomer. A five-point mutant of DhaA has active site properties opposite to those of DbjA (a less hydrated and less accessible active site) and yet exhibits the same enantioselectivity. The difference in the enantiodiscrimination of the dehalogenases was explained using a combination of site-directed mutagenesis, kinetic measurements, and molecular modeling.

ABSTRAKT (Czech abstract, in English translation)

This dissertation studies structure-function relationships in proteins using molecular modeling methods. The opening chapters describe the theoretical basis of protein stability. Stability is one of the most important properties of proteins, and the possibilities for using proteins in both basic research and industry depend on it. Stability is divided into thermodynamic (the energy difference between the folded and unfolded state) and kinetic (separation of states by a high activation energy). Options for protein stabilization are then discussed, from purely experimental methods based on directed and saturation mutagenesis, through the identification of hot-spots, to sophisticated algorithms combining free energy calculations with evolutionary relationships.
The following two chapters of the introduction deal with the molecular modeling methods and the model proteins used in the results section. This part presents molecular dynamics as one of the basic methods, widely applicable to the description of protein stability, activity, hydration, enantioselectivity and substrate specificity. The other methods are molecular docking, used to obtain the binding mode and binding energy between a substrate and an enzyme, and the hybrid quantum mechanics/molecular mechanics method, used to describe the reactivity of macromolecules. The results section consists of five chapters. The first of them (chapter 4) describes the computational method FireProt for effective protein stabilization. FireProt combines an evolutionary approach with calculation of the Gibbs free energy, complemented by efficient filtering using a variety of in silico tools. The method was applied to the haloalkane dehalogenase DhaA and the dehydrochlorinase LinA, increasing their thermostability by 24 and 21 °C, respectively. Chapter 5 is devoted to the stabilization of human fibroblast growth factor FGF2. This protein is involved in numerous regulatory processes, and the stabilized molecule can be expected to find good use in medicine as well as in basic research. At present, the main and most important application is the addition of FGF2 to the medium for stem cell cultivation. However, the protein is not very stable, with a half-life between 10 and 12 hours. Stabilizing mutations were selected using free energy calculations by the programs Rosetta and FoldX together with the evolutionary "back-to-consensus" approach. They were combined with mutations obtained by saturation mutagenesis of individual positions. The project resulted in a protein with thermostability increased by 19 °C that was stable in the cultivation medium for more than twenty days. The next three chapters deal with the characterization of various properties of the DhaA enzyme. Chapter 6 is devoted to the relationship between stability and activity. A stable four-point mutant of DhaA showed tolerance to organic solvents but at the same time had very low activity.
A single mutation at the mouth of the access tunnel markedly improved activity while preserving the thermostability and the stability against organic solvents. This single mutation increased the mobility of the surrounding secondary structure motifs and markedly enlarged the tunnel diameter. Chapter 7 deals with a method for the analysis of protein hydration. A fluorescent unnatural amino acid, which gives a different signal depending on the number and behavior of the water molecules present, was introduced into the tunnel mouth of two dehalogenases. Molecular dynamics revealed different hydration of the two dehalogenases, which correlated closely with experimental fluorescence spectroscopy. This newly developed technique for labeling and studying protein hydration is widely applicable to different proteins. The last, eighth chapter is devoted to the study of the enantioselectivity of dehalogenases. A previous study showed that the enantioselectivity of the dehalogenase DbjA with 2-bromopentane is caused by increased hydration in the well-accessible active site, which leads to binding of the substrate along the hydrophobic wall of the protein in a conformation favorable for the (R)-enantiomer. A five-point mutant of DhaA has properties opposite to those of DbjA (a less accessible and less hydrated active site) and yet shows the same selectivity. The difference between the two modes of enantiodiscrimination in dehalogenases was explained by combining site-directed mutagenesis, kinetic measurements and molecular modeling.

MOTIVATION

Stability is an important property of every protein and determines its applicability in basic research as well as in industry. There are many protein stabilization methods, but they usually demand extensive experimental characterization. Methods for the direct prediction of stable multiple-point mutants are very rare and less reliable. Designing a method for effective protein stabilization in which only a few variants have to be characterized would increase the applicability of many proteins. Haloalkane dehalogenases are an important group of enzymes with many different possible applications.
Their ability to degrade dangerous environmental pollutants makes them suitable for application in biodegradation, bioremediation or biosensing. A better understanding of their reactivity would be useful for optimizing their properties and broadening their applications.

Objectives of this dissertation:
• Development of a new method for fast and efficient protein stabilization
• Validation of this new method with three different model proteins
• In silico characterization of engineered haloalkane dehalogenase DhaA

PART I
INTRODUCTION

1 Protein Stability

Stability is a fundamental property affecting the function, activity or regulation of every biomolecule. It determines the feasibility of applying proteins in both industry and basic science [1]. In basic science, understanding protein stability is essential for expression optimization, purification, formulation, storage and structural studies of proteins. Protein stability is also of major interest in the biotechnology, pharmaceutical and food industries. Proteins have to withstand harsh conditions during technological processes, such as elevated temperatures, organic solvents used to solubilize substrates, or disinfectants used to decrease the risk of microbial contamination. Further, higher concentrations of additives like organic solvents, which are used to increase the solubility of water-insoluble reactants and suppress their undesired hydrolysis, can have a significant effect on protein stability [2]. With the increasing use of different enzymes in research and industry, finding a wild type enzyme with the required properties is often impossible; therefore, many protein engineering techniques have been applied to improve the applicability of enzymes by stabilizing their structures. The stability of a native state protein at standard conditions can be driven either by thermodynamics (the most stable conformation) or kinetics (the most accessible conformation) and is influenced by many different factors [3].
Therefore, protein stability can have a different meaning for different people depending on their field of study. From a biotechnology point of view, the half-life of a protein is the primary measure of its stability [4]. Structural biology, on the other hand, mostly focuses on changes in the primary to quaternary structure of the protein, and stability is determined as the change in free energy between individual states. Many factors may lead to unfolding or denaturation of the native folded state: i) physical factors, e.g., temperature and pressure; ii) chemical factors, e.g., pH, salt and co-solvents; iii) biological factors, e.g., cleavage by proteases.

1.1 Thermodynamic Stability

The hypothesis concerning protein folding was originally postulated by the Nobel Prize laureate Christian B. Anfinsen and is known as Anfinsen's dogma or the thermodynamic hypothesis. Anfinsen observed with small globular proteins that the native structure of a protein is determined only by its amino acid sequence [5]. Under natural conditions, a protein folds into a stable, unique and kinetically accessible conformation with the minimum free energy. According to Levinthal's paradox, folding cannot proceed only by sampling all possible conformations; instead, the formation of local interactions (nucleation centers) decreases the folding time to a reasonable level [6,7]. Anfinsen's hypothesis was experimentally confirmed, even though rare contradicting cases exist in which a protein remains in a local energy minimum while the global minimum is separated by a high activation energy. Most of the smaller globular proteins fold by a two-state mechanism. The equilibrium between the native folded state (F) and an ensemble of unfolded states (U) is defined by the equilibrium constant of protein folding (Keq, Equation 1).
$$\text{Folded} \rightleftharpoons \text{Unfolded}$$ (Equation 1)

The equilibrium constant describes the thermodynamic balance between both states and can be calculated from the folding (kf) and unfolding (ku) rate constants (Equation 2).

$$K_{eq} = \frac{k_f}{k_u}$$ (Equation 2)

A quantitative measure of protein stability at constant temperature and pressure is then defined as the Gibbs free energy of folding (ΔG), the difference between the Gibbs free energy of the native folded state (Gf) and that of the unfolded state (Gu, Equation 3).

$$\Delta G = G_f - G_u$$ (Equation 3)

A negative change in the folding free energy signifies that the folded state is more stable and will form spontaneously (Figure 1). The relation between the equilibrium constant and the folding free energy change is given by Equation 4.

$$\Delta G^{\circ} = -RT \ln K_{eq}$$ (Equation 4)

where R is the gas constant and T the temperature.

[Figure 1: energy plotted against the reaction coordinate; labelled states: Unfolded, Transition state, Native; labelled energies: ΔG, ΔG‡folding, ΔG‡unfolding]

Figure 1. The energy diagram for the folding process of the two-state model. ΔG‡folding and ΔG‡unfolding show the activation energy between the unfolded and transition state and between the native folded and transition state, respectively. Adapted from [8].

The Gibbs free energy is a thermodynamic potential used to calculate the maximum reversible work that may be performed by a system (Equation 5).

$$G(p,T) = U + pV - TS$$ (Equation 5)

where U is the internal energy, p the pressure, V the volume, T the thermodynamic temperature and S the entropy. From a stability point of view, the difference in free energy is the maximum work done by a system that transforms reversibly from an initial state to a final state at constant temperature and pressure. At standard state, it can be written as Equation 6.

$$\Delta G^{\circ} = \Delta H^{\circ} - T\Delta S^{\circ}$$ (Equation 6)

where ΔH° is the enthalpic term and TΔS° the entropic term. ΔG has to be negative for protein folding to be spontaneous and for the native protein structure to be stable. Both terms result from individual interactions and other effects occurring in protein structures.
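The two-state relations above can be illustrated with a short numerical sketch. It assumes that the equilibrium constant is taken in the folding direction, Keq = kf/ku, so that a negative ΔG corresponds to a favored folded state as stated in the text. The function names, the default temperature of 298.15 K and any rate constants passed in are illustrative choices, not values from this work.

```python
import math

# Gas constant in kcal.mol^-1.K^-1, matching the energy units used in the text.
R_KCAL = 1.987204e-3


def folding_free_energy(k_f, k_u, temperature=298.15):
    """Return the folding free energy (kcal/mol) from rate constants.

    Assumes K_eq = k_f / k_u (Equation 2, folding direction) and
    delta_G = -R * T * ln(K_eq) (Equation 4).
    """
    k_eq = k_f / k_u
    return -R_KCAL * temperature * math.log(k_eq)


def fraction_folded(delta_g, temperature=298.15):
    """Return the equilibrium fraction of folded protein in a two-state model.

    Inverts Equation 4 to recover K_eq = [F]/[U], then uses
    fraction folded = K_eq / (1 + K_eq).
    """
    k_eq = math.exp(-delta_g / (R_KCAL * temperature))
    return k_eq / (1.0 + k_eq)
```

For example, hypothetical rate constants with kf/ku = 1000 give a folding free energy of about -4.1 kcal.mol⁻¹ at 298.15 K, and a folding free energy of zero corresponds to equal populations of the folded and unfolded states.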
1.1.1 Interactions Contributing to Protein Stability

Historically, it was believed that hydrogen bonds, which enable the formation of secondary structures [9,10], are the most important forces for proper protein folding and for keeping the protein in its native conformation. Later, the importance of hydrophobic interactions (van der Waals forces between the non-polar residues that form the compact protein core) was strongly emphasized as the main driving force of protein folding [11]. Nowadays, both effects, together with protein conformational flexibility, are considered to play the most important roles in determining protein stability (Figure 2) [12]. The individual contributions of these interactions to protein stability are discussed below.

[Figure 2: diagram of contributions relative to ΔG = 0; labels: Hydrophobic Effect, Hydrogen Bonds, Other Forces, Conformational Entropy, Net Stability]

Figure 2. Protein folding diagram depicting the contributions to protein stability. The final net protein stability in native conditions is usually marginal (5-15 kcal.mol⁻¹) [13]. It is a delicate balance between the destabilizing effect of conformational entropy and stabilizing effects, mostly the hydrophobic effect and hydrogen bonding.

Hydrophobic effect. On average, proteins bury about 85 % of their non-polar side chains to shield them from contact with water molecules [14]. The protein core is more tightly packed than water, and the van der Waals interaction energy of a -CH2- group was calculated to be 1.3 kcal.mol⁻¹ lower in the protein than in water [15]. A mutational experiment on 13 proteins was performed to evaluate individual side chain contributions: a larger non-polar amino acid was mutated into a smaller one, and the ΔΔG between the wild type and the mutant protein was measured [16]. It was estimated that a protein loses 1.1 kcal.mol⁻¹ on average for every lost -CH2- group [17].
The decrease in the number of atom contacts and the formation of voids within the protein structure lead to the loss of van der Waals interactions, and therefore to lower stability [18]. Stabilizing hydrophobic effects are also connected with an entropic contribution. Water molecules H-bonded to the protein residues are released during protein folding. This increases the entropy of the water and therefore contributes to protein stability [19]. On average, hydrophobic interactions contribute about 60 % of the total protein stability [17].

Hydrogen bonds. Determining the real contribution of H-bonds to protein stability is much more difficult than for hydrophobic interactions. Every residue in the folded protein forms on average 1.1 hydrogen bonds [20], and 70 % of the peptide groups and 65 % of the polar side chains are buried in the protein core, out of contact with water [14]. It is still disputed whether burying polar groups is energetically more advantageous than their van der Waals interactions and H-bonds with water molecules [12]. Studying the effect of H-bonds simply by mutating the bonding residue is of limited value because the hydrophobicity, the conformational entropy and the packing of the side chain in the protein are altered as well [21]. Therefore, experiments were carried out using double mutant cycles, in which the H-bond donor and acceptor were mutated individually and also together as a double-point mutant [22]. Double mutant cycles, as well as experiments removing a side chain involved in an H-bond, gave very similar ΔΔG values. For example, the stability of Tyr to Phe mutants was lowered by 1.4 ± 0.9 kcal.mol⁻¹ when the Tyr side chain was hydrogen bonded and by only 0.2 ± 0.4 kcal.mol⁻¹ when it was not [12]. On average, H-bonds contributed 1.2 ± 1.0 kcal.mol⁻¹. In the case of Thr to Val mutations, the contribution was 1.0 ± 1.2 kcal.mol⁻¹.
The high standard deviations suggest that the H-bond contribution is highly dependent on the environment. Interestingly, the Tyr hydroxyl group contributed positively to stability even in cases where it was not hydrogen bonded. This shows that buried polar residues can frequently contribute to protein stability. This contribution of improved packing may be larger than the stabilization of the unfolded state, in which a polar side chain is stabilized by H-bonds with water [23]. The strength of an H-bond depends on its distance and geometry [24], but also on the surrounding environment. H-bonds in non-polar environments with a lower dielectric constant are stronger and can be more than 1 kcal.mol⁻¹ more stabilizing than solvent-exposed ones [25]. Also, charge-stabilized H-bonds are about 2 kcal.mol⁻¹ stronger than neutral ones, and their stability contribution can be as high as 7 kcal.mol⁻¹ [26]. Altogether, H-bonds contribute about 40 % of the total protein stability [17].

Other forces contributing to protein stability. Disulfide bridges can contribute significantly to protein stability by reducing the conformational entropy of the unfolded state. The thermostability (melting temperature) of proteins can usually be increased by several °C [27,28], and in rare cases by as much as 40 °C, by introducing a single disulfide bridge [29]. The effect of disulfide bridge formation can be reasonably estimated by Equation 7, which is based on experimental observations for proteins with two-state unfolding:

$$\Delta S = -2.1 - \frac{3}{2} R \ln(n)$$ (Equation 7)

where n is the number of residues in the loop formed by the disulfide bridge and R is the gas constant [27]. Charge-charge interactions are the strongest non-bonded interactions in proteins and are expected to have a significant effect on protein stability [30]. It turns out that the stability increase from a charge mutation is always lower than predicted by Coulomb's law.
This suggests that not only the folded but also the unfolded state is stabilized by charge-charge interactions, so their contribution to total protein stability is small [31]. Salt bridges (ion pairs), usually located on the surface, contribute less than 1 kcal.mol⁻¹. A higher contribution of about 4 kcal.mol⁻¹ can be obtained from buried salt bridges, but their numbers in proteins are rather small [12].

1.1.2 Effects Contributing to Protein Instability

Protein conformational entropy. The most important effect decreasing protein stability upon folding is the loss of conformational freedom [19]. Compared to the highly dynamic unfolded protein, which can adopt many different conformational states, residues in the folded state are tightly packed in the protein core and restricted to one or a few conformations. Determining the entropy loss is important for dissecting the energetics of protein motions, including folding, conformational changes, and protein or ligand binding [32-34]. However, it is very difficult to measure conformational entropy directly. Therefore, mostly molecular dynamics simulations and, recently, NMR relaxation methods [35] are applied to identify changes in protein motions. Baxa and coworkers [36] used molecular dynamics simulations to observe differences in backbone and side chain motions between the folded and unfolded states of mammalian ubiquitin. They observed that the average loss of entropy (TΔS) is 1.4 kcal.mol⁻¹ per residue (at 300 K). Side chain entropy loss contributed only a small amount (0.2-0.3 kcal.mol⁻¹). The largest destabilizing effect was due to the loss of backbone flexibility. The loss differed according to the type of secondary structure in which the residue was located (1.5, 1.0 and 1.2 kcal.mol⁻¹ for helix, sheet, and coil, respectively).
There are many studies determining conformational entropy loss; some of them agree with Baxa's results [37,38], while others report different values varying from 1.7 to 3.6 kcal.mol⁻¹ per residue [12,32,39,40]. Exact values can be context-dependent. The lack of reliable experimental methods for determining conformational entropy makes prediction difficult.

1.2 Kinetic Stability

The majority of studies concerned with protein stability focus on thermodynamic stability, whereas kinetic stability has been partially overlooked. This is understandable for two reasons: (i) thermodynamic stability is easy to determine in vitro and (ii) many computational methods are available that can estimate the effect of a mutation on protein thermodynamic stability. Recently, more studies demonstrating kinetic control of protein stability have appeared. Kinetic stability is studied in biotechnology, where stability in harsh conditions is essential for technological applications. In medicine, a decrease in kinetic stability leads to protein aggregation, accelerated degradation or the formation of amyloid fibers. These effects are linked with diseases caused by protein misfolding, such as phenylketonuria [41], amyotrophic lateral sclerosis [42], or transthyretin amyloidoses [43]. In thermodynamic stability, only the folded and unfolded states are taken into consideration in the two-state model. The equilibrium between the folded and unfolded state favors the more stable folded state under physiological conditions. But there are also partially folded or irreversibly denatured states (proteolysed or aggregated proteins), which can shift the equilibrium between active and inactive states towards the latter.
Therefore, the two-state model should be replaced by the Lumry-Eyring model, to which a final irreversible state (I) is added (Equation 8).

$$\text{Folded} \overset{K_{eq}}{\rightleftharpoons} \text{Unfolded} \overset{k}{\longrightarrow} \text{Irreversible}$$ (Equation 8)

Biological activity has to be maintained even in cases when the folded state is not sufficiently stable with respect to the unfolded or denatured states. A sufficiently high barrier is necessary to keep the protein in the functional state long enough to fulfill its biological function [44]. To determine the height of the barrier, transition state theory can be applied. The irreversible denaturation rate can be obtained from the Eyring equation (Equation 9)

$$k = k_0 \exp\left(-\frac{\Delta G^{\ddagger}}{RT}\right)$$ (Equation 9)

where k0 is the front factor, k is the rate constant for irreversible denaturation and ΔG‡ is the free energy difference between the native folded state and the transition state (activation energy).

1.2.1 Types of Kinetic Stability

Kinetic stability can be one of the following two types [45]:

i) The native folded state is thermodynamically more stable than the unfolded states, but the latter can be irreversibly modified (interactions with other molecules, aggregation, proteolysis, etc.). Sooner or later (depending on the denaturation rate constant), this leads to a situation where all of the protein is irreversibly denatured, and therefore non-functional. Unless a high energy barrier separates the two states, many different proteins would undergo denaturation in a crowded or harsh in vivo environment.

ii) The second possibility is that the folded state is not thermodynamically the most stable one under physiological conditions. Then, even if irreversible denaturation does not take place, the protein needs kinetic stabilization to keep it in the folded state. An example is the α-lytic protease, which is synthesized with a terminal region providing a driving force for protein folding (stabilization of about 26 kcal.mol⁻¹ for the folded state) [46].
The terminal region is cleaved after folding is finished, and only the high unfolding barrier keeps the enzyme in its active form.

1.2.2 The Relation Between Kinetic and Thermodynamic Stability

Depending on the type of kinetic stability, there can be a strong correlation, a weak correlation or no correlation at all between thermodynamic and kinetic stability [47]. For the first type of kinetic stability, the relation depends on whether the formation of the final irreversible state is the rate-limiting step. If it is, the rate of irreversible state formation depends directly on the equilibrium constant between the folded and unfolded states. Therefore, any change in the thermodynamics of the native folded and unfolded states directly affects the kinetic stability. If the formation of the irreversible state is fast and unfolding is the rate-limiting step, then a change in thermodynamic stability may or may not affect kinetic stability. If we thermodynamically stabilize residues that are unfolded in the transition state, the activation energy will be higher and the stabilization will propagate into kinetic stability. Thermodynamic stabilization in parts of the protein that remain structured in the transition state will not affect kinetic stability. For the second type of kinetic stability, thermodynamic stabilization has no effect on kinetic stability unless the folded state is stabilized to the level at which it becomes the global minimum.

1.2.3 Kinetic Stability of Natural Proteins

One of the first enzymes ever described as an exception to Anfinsen's thermodynamic hypothesis was the α-lytic protease [46,48]. Later, many other proteins were shown to be kinetically stabilized. Two approaches using high-throughput proteomic screening were applied to identify how easily different proteins access the unfolded or partially folded states. One technique was based on two-dimensional SDS-PAGE, in which resistance to SDS (sodium dodecyl sulfate) was observed [49].
The second technique was based on protein survival in a mixture with proteolytic enzymes [50]. Together, the two studies identified 81 kinetically stabilized proteins. Both methods focus mostly on highly stable proteins, and it should be mentioned that kinetic stability depends on the height of a free energy barrier, so proteins cannot be strictly divided into two groups. The two studies overlap in only six proteins, which shows that kinetic stability can also be divided according to the type of unfolded state targeted during irreversible denaturation. An interesting common feature of the kinetically stable proteins in both studies was that most of them had a solved X-ray structure. It was hypothesized that kinetic stability is beneficial in harsh crystallographic conditions [45].

1.3 Methods for Protein Stabilization

Proteins are increasingly used in many different fields including biotechnology, medicine, pharmacy, biocatalysis and nanomaterials [51-53]. Most naturally occurring proteins are only marginally stable, which limits their applicability. The difference between the stability of the folded and unfolded state is often only 5-15 kcal.mol⁻¹, and even a single mutation or a change in conditions can significantly destabilize a protein [54]. Therefore, protein stabilization is necessary for all applications. Many methods for protein stabilization have been established based on different approaches, ranging from experimental screening and rational design to bioinformatics and evolutionary analyses.

1.3.1 Random Mutagenesis

Completely random mutagenesis can lead to stable variants, but unfortunately, the majority of mutations are neutral or destabilizing [55]. Therefore, a large number of mutants has to be constructed to identify those that have a significant stabilizing effect. Mutations are introduced by DNA shuffling or error-prone PCR [56-58], creating a large library of individual
Finding the required mutations depends on an effective high-throughput screening or selection assay [59]. As an example, the Gene Site Saturation Mutagenesis (GSSM) method [60, 61] was used to increase the thermostability of the haloalkane dehalogenase DhaA for biotechnological application. A large library of more than 100,000 variants was constructed to cover all possible single-point mutants. High-throughput screening for increased thermostability resulted in an eight-point mutant with an 18 °C increase in melting temperature. The efficiency of such methods is low because only a tiny fraction of the screened mutants is stabilizing. Other disadvantages are the time and financial demands, and the fact that an appropriate screening method may not be available in every case. Microfluidic techniques, which are currently on the rise, appear to be a potential solution for decreasing the time and cost of large screenings [62, 63].

1.3.2 Hot-Spot Prediction

Computational approaches offer another way to reduce experimental work and to detect stabilizing mutations more efficiently. Areas or residues possessing a property linked to stability are identified, and experimental mutagenesis takes place only in the relevant regions. Iterative saturation mutagenesis is then applied at the selected positions and the best variants are identified. The advantage of this approach is that it can decrease the number of experiments by a few orders of magnitude compared to random mutagenesis, but a screening or selection technique is usually still needed. Another disadvantage is that structural information has to be available. One of these methods is B-FITTER, which focuses on the flexible parts of protein structures [64]. Thermophilic enzymes are characterized by higher rigidity resulting from better packing and an increased number of different interactions.
Reetz and co-workers used B-factors (Debye-Waller factors, temperature factors) from crystallographic analyses to identify the most flexible regions and selected them for iterative saturation mutagenesis. They applied their method to the lipase LipA from Bacillus subtilis and measured the increase in T50 (the temperature at which 50 % of the enzyme is irreversibly denatured). The introduction of seven mutations into the wild-type enzyme (T50 = 50 °C) abolished irreversible denaturation even after heating to 100 °C. The strategy of focusing on residues with high B-factors has been used successfully several times, either in combination with experimental screening [65, 66] or with other stabilization approaches [67, 68].

A different strategy is used in HotSpot Wizard [69]. The prediction of hot spots for mutagenesis focuses on functional regions such as the active site or access tunnels. The protocol is extended by the identification of functional and conserved residues. The later version [70] adds the analysis of B-factors described above and stability prediction by the back-to-consensus approach. Finally, HotSpot Wizard can also be used to design libraries that vary the identified residues.

An effective approach for the stabilization of multimeric proteins is the optimization of protein-protein interfaces [71]. Maugini and co-workers [72] analyzed the interface interactions of oligomeric thermophilic and hyperthermophilic enzymes. The results showed that, compared to mesophilic enzymes, the difference lies mostly in an increased area of inter-domain interactions and that hydrophobic interactions are the main driving force of oligomerization in thermophilic enzymes. Therefore, improved packing of the interface is a good approach for the stabilization of multimeric proteins.

Screening or selection techniques are demanding, and there is a need to identify specific mutations directly, not only positions.
Computational chemistry and bioinformatics offer a solution: single- or even multiple-point mutants can be obtained by calculating free energies and analyzing evolutionary relationships.

1.3.3 Evolution-Based Methods

Stabilization methods based on bioinformatics analyses of protein evolution are interesting and effective. Their biggest advantages are that they do not need any structural information, are easy to use, and have much lower computational demands than methods predicting the free energy. The bioinformatics analysis is based on a multiple sequence alignment (MSA) of evolutionarily related sequences [54]. Stabilizing mutations are predicted using the concept of consensus, based on the idea that a mutation to the most common residue in the MSA is likely to be stabilizing. Two main methods use this principle: (i) back-to-consensus and (ii) ancestral reconstruction.

The consensus approach simply mutates every residue in the target sequence that is not the most frequent one at its position in the MSA. The success rate of identifying stabilizing mutations for immunoglobulin domains was about 60 % [73, 74], but the rate may differ between proteins. Usually, about half of the mutations are stabilizing [75]. Consensus residues that do not improve protein stability may be conserved because of their importance for other properties, e.g., activity or folding, which are also of interest to protein engineers. However, it is not possible to determine why a particular residue is conserved. Moreover, the consensus approach can fail when the residue is not well conserved or when it is coupled (correlated) with other residues. Excluding residues with lower conservation or with statistical correlation from the consensus approach can lead to a higher success rate [75].

A similar approach is also used for the reconstruction of ancestral proteins. Here, the MSA is extended by phylogenetic analysis. The MSA is used for the construction of a phylogenetic tree.
Based on the mutation rate, a sequence can be calculated for every internal node of that tree [76]. There are three main algorithms for ancestral sequence reconstruction: (i) maximum parsimony, (ii) maximum likelihood and (iii) Bayesian reconstruction [63, 77-79]. The agreement between residues identified by the consensus and ancestral approaches is often very high [80], but in some cases the latter is superior [81]. It seems possible that ancestral reconstruction uses both sequence consensus information for well-conserved sites and some sequence correlation information, which further improves its effectiveness [54]. The contribution of individual mutations in evolution-based approaches is usually small, but the high number of identified mutations results in significant stability improvements.

1.3.4 Computational Methods

A large number of computational methods and protocols have been developed for the prediction of stabilizing mutations. They range from not very precise sequence-based methods [82, 83] to structure-based methods that introduce prolines or disulfide bridges [84], optimize the protein core [85] or surface charges [86], or increase surface polarity [54]. However, most of the methods focus on the evaluation of the change in Gibbs free energy upon mutation, i.e., the calculation of ΔΔG between the wild-type protein and a single-point mutant. Calculating the exact energy change upon mutation poses two problems. Firstly, the three-dimensional conformational space of a protein is too large, and therefore a reduced rotamer library of preferred side-chain conformations has to be used [87]. Secondly, precise energy calculation by quantum mechanics is not feasible for a system as large as a protein, and therefore molecular mechanics is employed. Force fields take their energy functions either from i) physics-based potentials, which analyze individual atomic interactions (e.g., Rosetta, Eris, EGAD, CC/PBSA) [88-92].
They are computationally the most expensive but, on the other hand, provide very good accuracy [93]; or from ii) empirical or statistical potentials, which are derived from statistical analysis of experimental data (e.g., FoldX, PoPMuSiC, PEAT-SA, SDM) [93-96]. Another large group comprises machine learning methods. Approaches such as artificial neural networks, support vector machines or decision trees use differently weighted descriptors to predict protein stability and are implemented in I-Mutant, Prethermut, AUTO-MUTE and MAESTRO [97-101].

With such a large number of methods, it is difficult to choose the one or few that are appropriate for a particular problem. Fortunately, several comparisons have already been made. Potapov and co-workers [102] published a comparison of six tools (CC/PBSA, EGAD, FoldX, I-Mutant2.0, Rosetta and Hunter). They calculated the correlation coefficient between the predicted and experimentally measured change in stability (ΔΔG) upon mutation. The evaluation was done on 2156 single-point mutations obtained from the FoldX dataset [94] and the ProTherm database [103]. The results showed that all methods are able to detect hot spots or classify mutations as stabilizing or destabilizing, but the exact ΔΔG value is not predicted very precisely (average error 1.2 kcal·mol⁻¹). The correlation coefficient ranged from 0.26 for Rosetta to 0.59 for EGAD. The significantly lower value for Rosetta prompted a response from Kellogg and co-workers [88], who presented 20 different Rosetta protocols with correlation coefficients ranging from 0.04 to 0.69, depending on the settings. This showed that i) the Rosetta settings used by Potapov were not appropriate for energy evaluation, ii) Rosetta can perform even slightly better than all the other tools and iii) Rosetta is not very user-friendly.
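The two figures of merit used in such benchmarks, the correlation coefficient and the average error between predicted and measured ΔΔG, can be illustrated with a minimal sketch. The ΔΔG values below are hypothetical and serve only as an illustration; they are not taken from any of the cited studies, and the Pearson coefficient is assumed as the correlation measure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def mean_abs_error(xs, ys):
    """Average absolute deviation between predictions and measurements."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical ddG values in kcal/mol for six single-point mutants.
experimental = [1.2, -0.5, 2.3, 0.1, -1.0, 3.1]
predicted = [0.8, 0.2, 1.5, 0.6, -0.3, 2.0]

r = pearson_r(experimental, predicted)
mae = mean_abs_error(experimental, predicted)
print(f"r = {r:.2f}, mean absolute error = {mae:.2f} kcal/mol")
```

A tool can rank mutations well (high correlation) and still miss the absolute ΔΔG values (large average error), which is why both numbers are usually reported together.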
Khan and Vihinen [104] evaluated eleven online prediction tools (CUPSAT, Dmutant, FoldX, I-Mutant2.0, two versions of I-Mutant3.0, MultiMutate, MUpro, SCide, Scpred and SRide) on the updated (2008) ProTherm database. Only mutations that had not been used for training a particular tool were used for its evaluation. Therefore, the testing datasets ranged from 28 to 1784 mutations and differed between tools. CUPSAT performed best for the prediction of stabilizing mutations, whereas I-Mutant3.0 and FoldX had the highest accuracy for the prediction of destabilizing mutations. The evaluation performed by Thiltgen and Goldstein [105] on FoldX, Rosetta, Eris and I-Mutant3.0 used a different comparison. They selected pairs of known structures of a wild type and the corresponding single-point mutant and examined how consistent the predictions of the forward and back mutations were. Rosetta performed significantly better than the other three tools, showing a small systematic bias and low energy errors.

Recently, different tools and approaches have been combined into methods that increase the success rate of identifying stabilizing mutations or directly predict stable multiple-point mutants. One of these methods is FRESCO (Framework for Rapid Enzyme Stabilization by Computational libraries) [28], which was applied to the model enzyme limonene epoxide hydrolase. First, disulfide bridges were designed using the authors' own DDD algorithm [28]. Second, Rosetta and FoldX were used to identify potentially stabilizing single-point mutants. These mutations were subsequently filtered by very short (100 ps) MD simulations. The flexibility of the introduced mutations was determined from the atomic root-mean-square deviation (RMSD), and mutations increasing flexibility were discarded. Finally, 64 single-point mutants were experimentally characterized and combined into the final 10-point mutant. Its melting temperature was 35 °C higher and its half-life more than 250-fold longer than those of the wild type.
Nevertheless, most of the stabilization came from a disulfide bridge and from mutations at the interface of the two subunits. Therefore, the method may not be as applicable to monomeric proteins; however, it is an interesting method that provides a high percentage of stabilizing mutations.

The second method, PROSS (Protein Repair One Stop Shop) [106], enables the direct design of multiple-point mutants without the need to test single-point variants. The method uses the Rosetta ConsensusDesignMover [107], a modified consensus approach (every residue more frequent in the MSA than the wild-type residue is allowed in the design), followed by energy evaluation in Rosetta. The most stable multiple-point mutants are synthesized and experimentally characterized. PROSS was applied to five different enzymes (acetylcholinesterase, histone deacylase, ADP-ribosylase, NAD+-dependent deacylase and DNA cytosine-methyltransferase). Many of the designs showed an increase in thermostability and/or expression level, but in some cases the protocol failed. On the other hand, activity or functional yield was also improved for several proteins. This suggests that consensus-based methods have a very high success rate for protein improvement, but it is usually not possible to determine which properties will be affected by the selected mutations. Another strategy for the direct prediction of stable multiple-point mutants is FireProt (see chapter 4).

2 Molecular Modeling Methods

Molecular modeling methods are a very extensive topic that would be sufficient for several theses. Therefore, I only briefly introduce the methods used in the projects compiled in this dissertation.

2.1 Molecular Docking

High-throughput techniques for protein characterization, together with the growing number of protein structures solved by crystallography or nuclear magnetic resonance, have led to increased interest in protein-ligand complexes.
Molecular docking, as a basic tool for virtual screening, has become one of the most important methods in drug discovery [108]. Compared to experimental screening, virtual screening is able to evaluate several orders of magnitude more complexes, decreasing the cost and increasing the efficiency significantly [109, 110]. Molecular docking predicts the behavior of a small molecule in the active site of a target protein by analyzing interactions at the atomic level [111]. This can explain biochemical processes or produce inputs for other computational methods. Docking is composed of two steps: i) prediction of the ligand position and conformation (sampling methods) and ii) evaluation of the binding energy (scoring functions). Precise evaluation of both steps is very demanding, and therefore many approximations have to be introduced. One important approximation is the partial or total rigidity of the receptor and ligand. The first sampling algorithms handled both the protein and the ligand as rigid bodies [112]. The configuration space was reduced to the rotational and translational degrees of freedom of the ligand, which significantly increased the speed of the calculation, but the predictive power was very low. Therefore, ligand flexibility [113] and later also protein flexibility were introduced to model ligand binding in real proteins [114-118].

2.1.1 Sampling Methods

Matching algorithm (MA). Both the ligand and the receptor active site are represented by pharmacophores, i.e., physical and chemical features. Complementary pharmacophores are placed so that they match, and the energy of the conformation is evaluated (Figure 3). MAs are fast and are used for the enrichment of large libraries. MA is implemented in SANDOCK [119] and FLOG [120].

Figure 3. Scheme of the matching algorithm. Individual pharmacophores are represented by different colors. MA places the ligand into the active site of the protein so that the corresponding colors match.

Incremental construction (IC).
The ligand is divided at single bonds into individual fragments, and one of these fragments (usually the largest one) is docked into the active site as an anchor. The remaining fragments are then added sequentially in different orientations. IC is implemented in DOCK 4.0 [121], SLIDE [122] and Hammerhead [123].

Stochastic methods. The configuration space is searched randomly by modifying the ligand conformation or its position in the active site. The two main stochastic methods are Monte Carlo and genetic algorithms [111].

Monte Carlo (MC) randomly places the ligand into the active site. The conformation of the ligand is then changed and the energy of the newly obtained binding mode is evaluated. The Metropolis acceptance criterion is used to decide whether the new conformation is accepted. If the energy is lower, the change is accepted. If the energy is higher, the change is accepted only if the Boltzmann factor of the energy penalty, exp(-ΔE/kT), exceeds a random number drawn from the interval between 0 and 1; otherwise it is rejected. This process is iterated until the predefined number of steps is reached. MC was used in the first versions of AutoDock [124], QXP [125] and ICM [126].

Genetic algorithms (GA) are inspired by Darwin's theory of evolution. Different ligand conformations are subjected to one of two operations: mutation (rotation around a single bond) or crossover (exchange of ligand parts between pairs of conformations). All conformations are evaluated by a scoring function and the favorable ones are used in the next generation (Figure 4). GA is used in AutoDock4 [127], DARWIN [128] and GOLD [129].

[Figure 4 labels: chromosome (translation, orientation, torsion angles), parents, children, crossover point, mutation, selection of the next generation, removed conformations, next generation.]

Figure 4. Scheme of a genetic algorithm.
Conformations from the parent generation undergo a rotation of a torsion angle (mutation) or exchange part of the structure with a different conformation (crossover). New conformations with favorable energy pass into the next generation. Adapted from [109].

2.1.2 Scoring Functions

Sampling algorithms are usually able to find a binding mode close to the experimental one, but it is not easy to distinguish it from non-relevant modes. Scoring functions evaluate the binding energy of individual protein-ligand configurations. They can be divided into force-field-based, empirical and knowledge-based functions [130].

Force-field-based scoring functions. These scoring functions use classical molecular mechanics force fields [131]. The binding free energy is typically computed as the sum of non-bonded interactions (van der Waals and electrostatic) between the binding partners. Later, they were supplemented with solvation energy or entropy terms. The advantage is that improvements can be expected with the further development of modern force fields. On the other hand, the calculation is still slow and the results are not significantly better than those of less demanding scoring functions.

Empirical scoring functions. The binding energy is decomposed into components such as hydrophobic interactions, hydrogen bonds, binding entropy, etc. [111]. The individual terms are weighted by coefficients obtained from regression analysis of experimentally determined binding affinities of protein-ligand complexes. The energy terms of empirical scoring functions are easy to calculate, and speed is therefore the main advantage of this approach. On the other hand, the energy evaluation can fail if similar cases were not present in the training dataset.

Knowledge-based scoring functions. Crystal structures of protein-ligand complexes from the PDB are statistically analyzed for pairwise interatomic distances.
If a particular atom-atom interaction occurs more frequently than in a random distribution, the interaction is considered favorable. The advantage is computational simplicity and, unlike the other methods, knowledge-based functions take all protein-ligand interactions into consideration. These methods are sometimes combined with solvation and entropy terms to increase their predictive power [132].

2.2 Molecular Dynamic Simulations

Molecular dynamics (MD) is a computational method that characterizes the time-dependent behavior of a macromolecular system. Molecular movements are closely connected to many molecular properties, and MD has therefore become one of the most widely used computational techniques applied to various protein problems. It can be used for studying protein stability [133-135], activity (see chapter 6) [136-138], folding [139-141], unfolding [142, 143], protein hydration (see chapter 7) [144, 145], substrate binding [146, 147], product release [148, 149] or enantioselectivity (see chapter 8) [150, 151].

MD uses molecular mechanics force fields and Newton's equations of motion to calculate changes in protein structure over small time steps [152, 153]. First, the input data have to be selected (Figure 5). Initial coordinates are obtained from crystallography, NMR spectroscopy or homology modeling. Initial velocities are drawn from the Maxwell-Boltzmann distribution. The potential energy surface of the molecule is described by the chosen force field. An example of a force field description is given in Equation 10.

[Figure 5 flowchart: structure (coordinates and velocities) and force field (definition of the potential function) → compute forces → solve Newton's equations of motion → update coordinates, velocities and energies → repeat for the next time step → finish calculation.]

Figure 5. Scheme of molecular dynamics simulation.
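The cycle in Figure 5 (compute forces, solve the equations of motion, update coordinates and velocities) can be sketched for a toy system. The example below is only an illustration under stated assumptions: a single particle in a one-dimensional harmonic potential stands in for the force field, the velocity form of the Verlet integrator is chosen, and all names, units and parameter values are hypothetical.

```python
def harmonic_force(x, k=1.0):
    """Force of a 1D harmonic potential U = 0.5 * k * x**2 (toy 'force field')."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def run_md(x, v, mass=1.0, k=1.0, dt=0.01, n_steps=1000):
    """Propagate one particle with the velocity Verlet integrator.

    Returns a list of (time, position, velocity) tuples.
    """
    force = harmonic_force(x, k)                   # compute forces
    trajectory = [(0.0, x, v)]
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * force / mass       # half-step velocity update
        x = x + dt * v_half                        # update coordinates
        force = harmonic_force(x, k)               # recompute forces
        v = v_half + 0.5 * dt * force / mass       # complete velocity update
        trajectory.append((step * dt, x, v))
    return trajectory


trajectory = run_md(x=1.0, v=0.0)
t_end, x_end, v_end = trajectory[-1]
drift = total_energy(x_end, v_end) - total_energy(1.0, 0.0)
print(f"t = {t_end:.1f}, x = {x_end:.4f}, energy drift = {drift:.2e}")
```

In a real simulation the force routine evaluates the full potential of all atoms, but the structure of the loop stays the same. Monitoring the drift of the total energy is a common sanity check for the chosen time step.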
$$U = \sum_{\text{bonds}} k_b (r - r_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{torsions}} k_\phi \left[1 + \cos(n\phi - \delta)\right] + \sum_{\text{improper}} k_\omega (\omega - \omega_0)^2 + \sum_{\text{LJ}} 4\varepsilon_{ij} \left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \sum_{\text{elec}} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

Equation 10

where the first four terms (bond stretching, angle bending, dihedral and improper torsions) describe the energy contributions of bonded interactions and the last two terms (van der Waals and electrostatics) those of non-bonded interactions.

The simulated molecule is inserted into a box and the whole system is solvated by explicit water molecules [152, 153]. Periodic boundary conditions (replication of the box in each direction) are applied to simulate bulk water with a relatively small number of water molecules. Each molecule that leaves the box is replaced by its image entering from the neighboring box. The potential energy and the individual forces acting on every atom are calculated (Equation 10). Bonded terms are easy to calculate and their number is proportional to the number of atoms. The most time-consuming part is the evaluation of the non-bonded terms, whose complexity rises with the square of the number of atoms. Therefore, several approximations are used to decrease the computational demands. Van der Waals interactions use the minimum image convention, in which each atom interacts with only one copy (the nearest image) of every other atom. Further, a distance cutoff (at least 9 Å) is applied, beyond which the forces are neglected [154]. A switching function can be used so that the energy approaches zero smoothly. Electrostatic interactions are long-ranged and thus cannot be switched off completely. The Particle Mesh Ewald (PME) method [155] is used, which divides electrostatics into a short-range (real space) and a long-range (reciprocal space) part. The charges of the long-range part are converted into a grid of density values. The potential is calculated on this density grid, and the forces on a particle are assigned according to the position of its grid cell and its position within the cell. This lowers the complexity to N log N.
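The minimum image convention and the distance cutoff described above for van der Waals interactions can be sketched as follows. This is a minimal illustration, assuming a cubic box and a plain truncated Lennard-Jones term without a switching function. The parameter values (argon-like ε and σ, a 9 Å cutoff, a 30 Å box) are illustrative assumptions, not values taken from any particular force field or from this work.

```python
def minimum_image(delta, box):
    """Wrap one displacement component to the nearest periodic image."""
    return delta - box * round(delta / box)


def lj_energy(positions, box, epsilon=0.238, sigma=3.4, cutoff=9.0):
    """Truncated Lennard-Jones energy of atoms in a cubic periodic box.

    positions: list of (x, y, z) tuples in Angstrom
    box:       box edge in Angstrom (should be at least 2 * cutoff)
    epsilon:   well depth in kcal/mol (illustrative value)
    sigma:     zero-crossing distance in Angstrom (illustrative value)
    """
    energy = 0.0
    n_atoms = len(positions)
    for i in range(n_atoms):              # double loop over pairs: O(N^2)
        for j in range(i + 1, n_atoms):
            dist2 = 0.0
            for axis in range(3):
                d = minimum_image(positions[i][axis] - positions[j][axis], box)
                dist2 += d * d
            if dist2 < cutoff * cutoff:   # pairs beyond the cutoff are neglected
                sr6 = (sigma * sigma / dist2) ** 3
                energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


# Two atoms on opposite sides of the box interact through the periodic image.
r_min = 2 ** (1 / 6) * 3.4                # distance of the energy minimum
pair = [(0.0, 0.0, 0.0), (30.0 - r_min, 0.0, 0.0)]
print(f"{lj_energy(pair, box=30.0):.3f} kcal/mol")
```

The explicit double loop makes the quadratic cost of the non-bonded terms visible; production codes avoid it with neighbor lists, which are outside the scope of this sketch.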
PME has become the standard for treating long-range interactions, but other methods can be used in particular situations, e.g., the isotropic periodic sum [156] or the pressure-based long-range correction [157].

The forces acting on each atom are obtained by calculating all bonded and non-bonded interactions [152, 153]. Newton's equations of motion are deterministic, which means that we are able to calculate the position of every atom after a time step (Δt) when we know the initial coordinates and velocities of the atoms. The potential energy is a complicated function (Equation 10) of all atomic positions, and the equations of motion do not have an analytical solution. Therefore, numerical solutions are employed using a particular integrator (e.g., the leap-frog [158], Verlet [159] or Beeman [160] algorithms).

2.3 Hybrid Quantum Mechanics/Molecular Mechanics

Modeling enzymatic reactions is a difficult task. A classical MD simulation has a fixed topology, and chemical bonds cannot be created or cleaved using molecular mechanics (MM) [161] unless a special force field is employed [162]. Quantum mechanics (QM), on the other hand, is able to describe chemical reactivity and other electronic processes. However, QM can only deal with hundreds of atoms, since the complexity of the calculation rises with N³ or more, depending on the level of theory used (N is the number of atoms). Applying QM directly to biomolecules is not practical, and therefore a hybrid QM/MM method was introduced by Arieh Warshel [163]. It combines the advantages of both methods: the accuracy and ability to simulate chemical reactions of QM, and the speed of MM. The region hosting the chemical reaction (substrate, cofactor, catalytic residues) is handled by QM, whereas the rest of the system (most of the protein and the solvent) is treated by MM (Figure 6).

Figure 6. An example of system division into two parts.
The QM part (green sticks) consists of a ligand and the active site residues, which are separated from the MM part (grey lines) between the CA and CB atoms.

Partitioning the system into QM and MM parts may be straightforward for noncovalently bound molecules, but when covalent bonds cross the boundary, the division has to be treated by introducing specific parameters [161]. QM atoms that participate in bond formation or breaking should not be involved in any bonded MM term. A good place to cut is a nonpolar single aliphatic bond not involved in any conjugated interactions (a peptide bond is not suitable).

The total potential cannot be obtained simply as the sum of the individual parts because the two parts interact strongly. Therefore, coupling terms are introduced to describe the interactions between them. Subtractive (Equation 11) or additive (Equation 12) QM/MM coupling can be applied [164].

$$V_{\mathrm{TOTAL}} = V_{\mathrm{MM}}(\mathrm{QM+MM}) + V_{\mathrm{QM}}(\mathrm{QM}) - V_{\mathrm{MM}}(\mathrm{QM})$$

Equation 11

where the total potential energy is the sum of the potential of the whole system (QM+MM) treated by the MM force field and the potential of the QM part described by QM theory. The force field energy of the QM part is subsequently subtracted.

$$V_{\mathrm{TOTAL}} = V_{\mathrm{QM}}(\mathrm{QM}) + V_{\mathrm{MM}}(\mathrm{MM}) + V_{\mathrm{QM-MM}}(\mathrm{QM+MM})$$

Equation 12

where, compared to Equation 11, only the MM part is treated by the force field and the coupling between the QM and MM parts is treated explicitly at various levels of sophistication. The boundary is described by special schemes [161]: i) link-atom schemes introduce an additional hydrogen atom that is not present in the real system, ii) boundary-atom schemes replace the first MM atom with a special atom that appears in both the QM and MM calculations and iii) localized-orbital schemes introduce a frozen orbital at the QM/MM interface that replaces the cut bond.
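The bookkeeping of Equations 11 and 12 can be illustrated with a short sketch. All energy values below are hypothetical numbers chosen for illustration. The example also assumes that the MM energy of the whole system decomposes exactly into inner, outer and coupling contributions and ignores boundary corrections such as link atoms.

```python
def subtractive_qmmm(v_mm_whole, v_qm_inner, v_mm_inner):
    """Equation 11: V_MM(QM+MM) + V_QM(QM) - V_MM(QM)."""
    return v_mm_whole + v_qm_inner - v_mm_inner


def additive_qmmm(v_qm_inner, v_mm_outer, v_coupling):
    """Equation 12: V_QM(QM) + V_MM(MM) + V_QM-MM(QM+MM)."""
    return v_qm_inner + v_mm_outer + v_coupling


# Hypothetical energies in kcal/mol.
v_qm_inner = -1250.0     # QM region at the QM level
v_mm_inner = -35.0       # QM region at the MM level
v_mm_outer = -8200.0     # MM region at the MM level
v_mm_coupling = -60.0    # QM-MM interaction evaluated by the force field

# Assumption: the force field energy of the whole system is additive.
v_mm_whole = v_mm_inner + v_mm_outer + v_mm_coupling

e_sub = subtractive_qmmm(v_mm_whole, v_qm_inner, v_mm_inner)
e_add = additive_qmmm(v_qm_inner, v_mm_outer, v_mm_coupling)
print(e_sub, e_add)
```

Under these assumptions both schemes give the same total, which shows that the subtractive scheme describes the QM-MM interaction implicitly at the MM level. The additive scheme differs in practice when the coupling term is evaluated at a higher level, for example by including the MM point charges in the QM calculation.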
3 Model Proteins

3.1 Haloalkane Dehalogenase DhaA

Haloalkane dehalogenases (HLDs) are mostly bacterial enzymes (EC 3.8.1.5) catalyzing the hydrolytic cleavage of the carbon-halogen bond in chlorinated, brominated and iodinated alkanes, cycloalkanes, alkenes, esters, ethers, alcohols, amides or acetonitriles [165-167]. Many halogenated compounds are important environmental pollutants with toxic or genotoxic effects on humans. Therefore, the ability of HLDs to degrade these substances predestines them for applications in bioremediation [168], detoxification of warfare agents [169], biosensing [170], synthesis of optically pure compounds [166] and cell imaging [171]. HLDs are well-characterized enzymes with reasonable stability and an extensively studied reaction mechanism. The wealth of mechanistic information, together with broad applicability, makes HLDs ideal model enzymes for enzymology and protein engineering studies.

DhaA is an HLD isolated from Rhodococcus rhodochrous NCIMB 13064 [172] and, like all other HLDs, belongs structurally to the α/β-hydrolase fold [173]. DhaA consists of two covalently bound domains (Figure 7A). The main domain is composed of a conserved eight-stranded β-sheet surrounded by six α-helices and is connected to the α-helical cap domain by two loops. The buried active site cavity is situated at the interface of the two domains and is connected to the solvent by the main tunnel leading through the cap domain. The catalytic residues of DhaA, the so-called catalytic pentad (Figure 7B), are situated on the main domain and are conserved within the whole HLD-II subfamily [174]. In DhaA, the pentad consists of the catalytic triad (aspartate 106, glutamate 130, histidine 272) [175] and two halide-stabilizing residues (asparagine 41 and tryptophan 107) [176].

Figure 7. The structure of haloalkane dehalogenase DhaA and the catalytic pentad. A) The main domain is shown in gray and the cap domain in black.
The active site containing the catalytic pentad (red spheres) is situated between the two domains. B) The arrangement of the catalytic residues within the active site.

The dehalogenation reaction follows a two-step catalytic mechanism (Figure 8), in which a bimolecular nucleophilic substitution in the first step is followed by hydrolysis in the second step [175, 177, 178]. Initially, the halogenated substrate binds near the catalytic nucleophile at position 106 with the leaving halogen atom pointing towards the halide-stabilizing residues. Aspartate 106 performs the nucleophilic attack, forming a covalently bound enzyme-substrate intermediate and a halide ion stabilized by two hydrogen bonds from the halide-stabilizing residues. The catalytic base (histidine 272), stabilized by the catalytic acid (glutamate 130), takes a proton from one of the active site water molecules. Subsequently, the hydroxyl anion hydrolyzes the enzyme-substrate intermediate, and the formed alcohol and the halide ion leave the active site.

Figure 8. Two-step dehalogenation reaction of the DhaA enzyme. In the first step, an SN2 reaction is performed by the nucleophile (Asp106) attacking the carbon atom carrying the halogen. The leaving halide ion is stabilized by the halide-stabilizing residues and an alkyl-enzyme intermediate is formed. Subsequently, a water molecule activated by the catalytic base (His272) hydrolyzes the intermediate, forming the alcohol product and restoring the enzyme to its original form.

3.2 γ-Hexachlorocyclohexane Dehydrochlorinase LinA

γ-Hexachlorocyclohexane dehydrochlorinase LinA is a unique enzyme (EC 4.5.1.B1) isolated from the bacterium Sphingobium japonicum UT26, which was found in areas with soil contaminated by γ-hexachlorocyclohexane (HCH, lindane) [179, 180]. HCH was used worldwide as an insecticide for many years before it was prohibited in most countries as a dangerous environmental pollutant [181, 182] with toxic and potentially genotoxic effects on human health.
Its long persistence in soil, together with its toxicity, makes HCH an important target for degradation attempts using LinA, which catalyzes the first and the second step of the HCH degradation pathway. LinA is followed by the LinB, LinC, LinD, LinE, LinF, LinG, LinH and LinJ enzymes, which metabolize the substrate to succinyl-CoA and acetyl-CoA, which in turn enter the citric acid cycle [183].

LinA is a dehydrochlorinase 156 amino acids long that belongs to the α+β proteins [184], but it shows very low amino acid identity with structurally similar proteins, e.g., 16 % sequence identity with scytalone dehydratase [185]. The biological unit is a homotrimer, and each chain forms a cone-shaped barrel fold composed of a six-stranded β-sheet, four α-helices and a C-terminal region interacting with the neighboring chain (Figure 9). The catalytic active site, large enough to accommodate HCH, is situated inside the barrel of each chain. The active site and the access tunnel are formed primarily by hydrophobic residues. The catalytic residues histidine 73 and aspartate 25 form the so-called catalytic dyad.

Figure 9. The structure of γ-hexachlorocyclohexane dehydrochlorinase LinA. Each of the three subunits of the homotrimer is highlighted in a different color. The active site containing the two catalytic residues is represented by red spheres.

The C-terminal regions of the neighboring subunits are proposed to serve as molecular gates that switch between the "open" unbound conformation and the "closed" conformation. These conformational changes create a hydrophobic environment for the bound substrate, as observed in scytalone dehydratase [184, 186]. The proposed reaction mechanism is a bimolecular elimination (E2) reaction (Figure 10). Histidine 73 functions as a base and abstracts a proton from HCH or γ-pentachlorocyclohexene (PCCH). Histidine 73 is activated by aspartate 25, which increases its basicity and stabilizes the positive charge that develops on the histidine after the deprotonation.
The other active-site residues probably stabilize the transition state or direct the substrate into the proper binding mode, but site-directed mutagenesis showed that they are not essential for the reaction [184,187].

Figure 10. Generalized reaction mechanism of dehydrochlorinase LinA. First, HCH is converted to pentachlorocyclohexene (PCCH). In the second step, PCCH is converted, again by LinA, to tetrachlorocyclohexadiene (TCDN). TCDN is then metabolized by the second pathway enzyme, haloalkane dehalogenase LinB, followed by other enzymes; in the absence of LinB, it undergoes spontaneous conversion to trichlorobenzene (TCB).

3.3 Fibroblast Growth Factor 2

Fibroblast growth factors (FGF1-23) belong to one of the largest families of protein growth factors, found in a broad range of organisms from nematodes to humans [188-190]. In humans, FGFs have important functions in both embryos and adults. In embryonic development, FGFs regulate many processes such as brain patterning, branching morphogenesis, angiogenesis and limb development. In adults, FGFs play important roles in wound healing and tissue repair and in the regulation of metabolism and homeostasis. All these functions make FGFs highly attractive for medical, pharmaceutical and biotechnological applications. Human fibroblast growth factor 2 (basic fibroblast growth factor, bFGF, FGF-β, FGF2) is a pleiotropic regulator of cell proliferation, differentiation and migration [190,191]. Recently, it has been studied thoroughly for applications in medicine. In particular, angiogenic stimulation by FGF2 can be used in tissue repair, and its overexpression in animals provided cardioprotection against heart attack and better regeneration after reperfusion [192]. The promoted angiogenesis has a positive effect on artery diseases and on the healing of patients suffering from ulcers [193].
FGF2 increases the regeneration of alveolar bone in patients with periodontitis [194], and upregulation of FGF2 can be used for the treatment of mood disorders [195]. From the biotechnological point of view, FGF2 is an essential component of the culture medium used for cultivation of human embryonic stem cells, where it prevents differentiation of pluripotent cells [196]. The high instability of the protein at 37°C complicates the cultivation of stem cells: even daily media replacement does not prevent significant fluctuation of the protein concentration. Protein engineering of FGF2 towards a prolonged half-life has the potential to increase the efficiency and decrease the cost of stem cell cultivation.

The human FGF2 gene encodes five different isoforms whose length depends on subcellular localization and function [197]. The basic protein is 155 amino acids long, with translation starting conventionally from the AUG codon. Isoforms with higher molecular weight initiate translation from non-standard codons. The structure is composed solely of β-strands, comprising 12-stranded antiparallel β-sheets that form a trigonal pyramidal structure (Figure 11) [198]. The pleiotropic effect can be achieved through different binding partners. Positions 13-30 and 106-115 represent the FGF-receptor or heparin-binding sites [199,200].

Figure 11. The structure of human fibroblast growth factor 2.

PART II RESULTS

4 FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants

David Bednar^1,4,†, Koen Beerens^1,†, Eva Šebestová^1, Jaroslav Bendl^1,2,4, Sagar Khare^3, Radka Chaloupková^1, Zbyněk Prokop^1,5, Jan Brezovský^1,4, David Baker^6, Jiri Damborsky^1,4,5,*

1 Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic.
2 Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Bozetechova 1/2, 612 66 Brno, Czech Republic.
3 Department of Chemistry and Chemical Biology, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854, USA.
4 International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic.
5 Enantis, Ltd., Palackeho trida 1802/129, 612 00 Brno, Czech Republic.
6 Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.
† These authors contributed equally to the work.

PLoS Computational Biology, 2015, 11, e1004556
DOI: 10.1371/journal.pcbi.1004556

4.1 Abstract

There is great interest in increasing proteins' stability to enhance their utility as biocatalysts, therapeutics, diagnostics and nanomaterials. Directed evolution is a powerful, but experimentally strenuous approach. Computational methods offer attractive alternatives. However, due to the limited reliability of predictions and potentially antagonistic effects of substitutions, only single-point mutations are usually predicted in silico, experimentally verified and then recombined in multiple-point mutants. Thus, substantial screening is still required. Here we present FireProt, a robust computational strategy for predicting highly stable multiple-point mutants that combines energy- and evolution-based approaches with smart filtering to identify additive stabilizing mutations. FireProt's reliability and applicability were demonstrated by validating its predictions against 656 mutations from the ProTherm database. We demonstrate that the thermostability of the model enzymes haloalkane dehalogenase DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA can be substantially increased (ΔTm = 24°C and 21°C) by constructing and characterizing only a handful of multiple-point mutants.
FireProt can be applied to any protein for which a tertiary structure and homologous sequences are available, and will facilitate the rapid development of robust proteins for biomedical and biotechnological applications.

4.2 Author Summary

Proteins are increasingly used in numerous biotechnological applications. A key property determining proteins' applicability is their stability under operating conditions. Natural proteins can be stabilized by modification of their structure. Methods of molecular biology allow the introduction of modifications (mutations) into the protein structure at will, but it is not straightforward to decide where to mutate and which amino acid to introduce for better stability. Computational methods can be used to predict stabilizing mutations. Current computational methods predict libraries of single-point mutations, which need to be constructed individually, tested and recombined, resulting in non-trivial experimental effort. Here we present a robust computational strategy for predicting multiple-point mutants, providing extremely stabilized proteins with minimal experimental effort.

4.3 Introduction

Proteins are increasingly used in biotechnological applications as therapeutics [51], diagnostics [53], nanomaterials [51] and biocatalysts [52]. Despite numerous advantages, the utility of proteins is frequently restricted by their limited stability under practical conditions, such as high temperatures, extreme pH, or the presence of organic solvents or proteases. Their thermostability is usually positively correlated with stability and performance in the presence of denaturing agents [201], expression yield [202], serum survival time [203] and shelf-life [204]. Thus, it is a key determinant of proteins' applicability in biotechnological processes. High temperatures may also be required to prevent bacterial contamination during enzymatic food processing [205].
Moreover, thermostable proteins can tolerate much larger numbers of mutations than mesophilic variants and show enhanced evolvability in protein engineering projects [206].

Protein engineering is frequently applied to obtain more stable proteins. If successful, such efforts typically enhance the melting temperature (Tm) of engineered proteins by 2 to 15°C [204,207]. Extremely stabilized proteins with even greater increases in melting temperature (ΔTm) have been engineered by incorporating multiple mutations, and several outstanding increases of up to 35°C have been achieved using directed evolution methods [205]. However, these methods generally require extensive experiments, including screening up to 10^8 colonies of organisms expressing mutant variants to identify stable constructs, and appropriate high-throughput screening assays must be available [208]. A currently popular strategy is saturation mutagenesis of hotspots identified by (semi)rational approaches [204,205,209], such as the most flexible residues [207], tunnel-forming residues [210], or residues at multimeric interfaces [71]. The selected hotspots are then subjected to site-saturation mutagenesis (while leaving the rest of the protein unchanged) to create smaller smart libraries, markedly reducing the required screening to thousands of colonies.

A long-sought alternative to screening-based approaches is reliable in silico design of stability-enhancing mutations. Numerous stable proteins have been computationally engineered via diverse approaches (singly or in combination), e.g., identification of back-to-consensus or ancestral mutations, calculation of changes in folding free energies upon mutation, introduction of disulfide bridges and elimination of highly flexible regions [204,205,209].
However, mutants generated using computational methods have rarely surpassed the 15°C ΔTm threshold of outstanding stabilization, because neutral, destabilizing or function-corrupting mutations are predicted as stabilizing owing to the moderate accuracy of these methods [211,212]. To overcome this obstacle and provide substantial stabilization, predicted mutations are usually introduced by site-directed mutagenesis and tested individually. The most viable mutations are then recombined in multiple-point mutants assuming they have additive effects, but this assumption is often invalid due to antagonistic epistatic effects of individual mutations [213]. For all those reasons, no computational method capable of directly designing highly stable multiple-point mutants has been previously published.

Here we introduce a strategy, FireProt, for computationally designing multiple-point mutants, enabling significant improvements of protein stability with minimal experimental effort. We demonstrate its power by stabilizing the model proteins haloalkane dehalogenase (HLD) DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA. The method's general applicability was further verified by validation against information from the ProTherm database [103], demonstrating that it can be used to identify stabilizing mutations in diverse proteins with known tertiary structures and homologous sequences allowing phylogenetic analysis, and thus should have broad utility in protein stabilization projects.

4.4 Results

4.4.1 Development of FireProt strategy for design of stabilizing multiple-point mutants

The FireProt strategy for protein stabilization combines the best multiple-point mutants obtained from two sources: predictions of ΔΔG upon mutation calculated from a set of crystal structures, and evolutionary information derived from a multiple sequence alignment (Figure 12).
Additional pre- and post-processing filters are applied in both approaches to improve prediction reliability and reduce the required computational effort.

Figure 12. Workflow of the FireProt method. Individual steps involved in the energy- and evolution-based approaches, both starting from the target protein. Energy-based approach: conservation and correlation analysis, FoldX prediction, Rosetta prediction, interaction analysis, antagonistic effect prediction, multiple-point mutant design, structure and activity check, stability determination. Evolution-based approach: back-to-consensus analysis, FoldX prediction, interaction analysis, multiple-point mutant design, structure and activity check, stability determination. The two approaches converge in the combined mutant.

Dataset construction. FireProt's capacity to identify the most stabilizing single-point mutations by either the energy- or the evolution-based approach was evaluated using a dataset derived from the ProTherm database, employing the records with absolute ΔΔG values > 0.5 kcal.mol⁻¹. This value was selected since the estimated experimental error for measurements of ΔΔG is about 0.48 kcal.mol⁻¹ [214]. When multiple ΔΔG values were available for a single mutation, the value determined closest to the standard experimental conditions was retained. The ten most mutated proteins from the ProTherm database, representing all four protein structural classes (Table S1), were selected to increase the chance that predicted mutations would already be experimentally characterized. The total number of possible mutations in these 10 proteins was 30,058, but good-quality experimental data were available for only about 2% of the mutations, yielding a dataset of 656 mutations (119 stabilizing and 537 destabilizing) at 337 different positions (Table S1).

Optimization of energy-based approach. We evaluated the performance of four prediction tools, FoldX [94], Rosetta [88], ERIS [89] and CUPSAT [215], using the ProTherm dataset (Table S2).
The purpose was to select a suitable combination of tools and thresholds of predicted energy change upon mutation that would achieve very high precision (the fraction of truly stabilizing mutations among all mutations predicted as stabilizing) and simultaneously a very low false positive rate (the fraction of truly destabilizing mutations that are incorrectly predicted as stabilizing). The stability change thresholds were tested in the range between -2.5 and +2.5 kcal.mol⁻¹ with a step size of 0.5 kcal.mol⁻¹. According to this evaluation, Rosetta and FoldX achieved the highest precision, 76% and 67%, respectively, when using the thresholds of -2 kcal.mol⁻¹ and -1 kcal.mol⁻¹ (Table S2). The false positive rates at these thresholds for Rosetta and FoldX were 1% and 2%, respectively (Table S2).

Evaluation of energy-based approach. Combining the predictions of the two best-performing tools at their optimal thresholds with conservation analysis by the Rate4Site tool [216] in the energy-based FireProt approach performed notably better than either Rosetta or FoldX alone: FireProt achieved 100% precision and a 0% false positive rate. Such high reliability was achieved at the expense of the number of recognized stabilizing mutations. Rosetta and FoldX correctly identified 16 and 20 truly stabilizing mutations out of the 21 and 30 mutations predicted by these tools as stabilizing (Table S2). However, only 8 mutations were agreed upon as stabilizing by both methods following the FireProt strategy. Conservation analysis discarded 2 false positive mutations since they targeted immutable positions with a high ConSurf conservation grade (> 8) [217,218]. The remaining 6 mutations selected by FireProt as stabilizing were all true positives.
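
The precision and false positive rate quoted above can be reproduced from the counts given in the text. The following minimal Python sketch assumes that every mutation predicted as stabilizing but not confirmed belongs to the 537 destabilizing entries of the dataset; the function and variable names are illustrative and are not part of FireProt.

```python
# Counts are taken from the text; all names are illustrative, not FireProt code.
N_DESTABILIZING = 537  # destabilizing mutations in the ProTherm-derived dataset


def precision(true_pos: int, predicted_pos: int) -> float:
    """Fraction of mutations predicted as stabilizing that truly are stabilizing."""
    return true_pos / predicted_pos


def false_positive_rate(true_pos: int, predicted_pos: int,
                        n_negative: int = N_DESTABILIZING) -> float:
    """Fraction of truly destabilizing mutations predicted as stabilizing."""
    return (predicted_pos - true_pos) / n_negative


# tool -> (correctly predicted stabilizing, all predicted stabilizing)
PREDICTIONS = {
    "Rosetta": (16, 21),
    "FoldX": (20, 30),
    "FireProt (energy-based)": (6, 6),
}

for tool, (tp, pp) in PREDICTIONS.items():
    print(f"{tool}: precision {precision(tp, pp):.0%}, "
          f"false positive rate {false_positive_rate(tp, pp):.0%}")
```

With these counts the sketch yields 76% and 1% for Rosetta, 67% and 2% for FoldX, and 100% and 0% for the combined energy-based filter, in agreement with the values reported above.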
In addition to the 6 mutations included in the evaluation, FireProt predicted another 101 stabilizing mutations for the 10 most mutated proteins, for which, however, experimental data were not available (Table S3).

Evaluation of evolution-based approach. In addition to the energy-based approach, the evolution-based approach of the FireProt strategy was also evaluated using the ProTherm dataset. The back-to-consensus method [219] identified six potentially stabilizing mutations in the 10 most mutated proteins for which experimental data were available. Four of these mutations were true positives and two were false positives. The subsequent application of the FoldX filter (Figure 12) correctly discarded one of the false positive mutations, giving a precision of 80%.

Evaluation of prediction for multiple-point mutants. The last step in the development of the FireProt workflow, before its application to the design of thermostable proteins, was to evaluate Rosetta's ability to predict the stability effects of multiple-point mutations. For this purpose, we collected a consistent dataset of previously measured stability changes in 46 mutants of the DhaA enzyme. Using this dataset, a correlation of 0.81 between the predicted ΔΔG and experimentally determined Tm values was observed, suggesting that Rosetta could be employed for this purpose (Figure S1).

4.4.2 Design of thermostable haloalkane dehalogenase DhaA

The DhaA enzyme was selected as the first model protein because of the wealth of knowledge available on mutants engineered towards higher thermostability, prolonged half-life and stability in organic co-solvents, which enables quantitative comparison of their performance with the mutants designed by FireProt [220,221].

Energy-based approach.
Out of 5,529 possible single-point mutations, 1,919 were at positions with high evolutionary conservation (ConSurf grade > 8) or exhibiting evolutionary correlation with other residues (Correlated Mutation Analysis score > 0.8 calculated by the Comulator tool [222]), indicating that these positions are functionally important. All these mutations were discarded to avoid major changes in the enzyme's activity or substrate specificity. FoldX analysis of ΔΔG for the remaining 3,610 mutations identified 151 potentially stabilizing single-point mutations. Rosetta calculation was then applied to further decrease the number of potential false positives among these mutations, resulting in 22 promising mutations (Table S4). A mutation disrupting a salt bridge and five mutations with antagonistic effects (ΔΔG of double-point mutants > -3.0 kcal.mol⁻¹) were also discarded, leaving 16 potentially stabilizing mutations. Only the most favorable mutation at each position was further analyzed for mutual interactions, leaving a final set of eight potentially non-antagonistic stabilizing mutations: C128F, T148L, A172I, C176F, D198W, V219W, C262L and D266F (Figure 13). Rosetta predicted high stability of the recombined multiple-point mutant DhaA112 carrying all these mutations (Table 1).

Evolution-based approach. The back-to-consensus approach employing simple consensus and frequency ratio predictions identified 42 potentially stabilizing substitutions (Table S5, Table S6, Table S7, Table S8). Of these, 22 were excluded by FoldX predictions as potentially destabilizing (ΔΔG > 0.5 kcal.mol⁻¹), and seven were excluded to preserve residues with side-chains involved in important interactions. In total, 13 back-to-consensus mutations passed all of the applied filters and were combined into four multiple-point mutants (Figure 13).
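
The successive filters described above act as a funnel on the pool of candidate mutations. The short Python sketch below only tabulates the counts reported in the text and the fraction of the initial pool that survives each stage; the stage labels and the `report` helper are illustrative assumptions, not part of the FireProt implementation.

```python
# Counts are taken from the text; stage labels and names are illustrative.
energy_funnel = [
    ("all possible single-point mutations", 5529),
    ("outside conserved or correlated positions", 5529 - 1919),
    ("predicted as stabilizing by FoldX", 151),
    ("also predicted as stabilizing by Rosetta", 22),
    ("after removing 1 salt-bridge and 5 antagonistic mutations", 22 - 1 - 5),
    ("best mutation per position, mutually compatible", 8),
]

evolution_funnel = [
    ("back-to-consensus candidates", 42),
    ("not predicted as destabilizing by FoldX", 42 - 22),
    ("not involved in important interactions", 42 - 22 - 7),
]


def report(name, funnel):
    """Print each stage with the fraction of the initial pool retained."""
    start = funnel[0][1]
    print(name)
    for label, count in funnel:
        print(f"  {count:>5}  ({count / start:7.2%})  {label}")


report("Energy-based approach (DhaA)", energy_funnel)
report("Evolution-based approach (DhaA)", evolution_funnel)
```

The intermediate values computed here (3,610 and 16 mutations in the energy-based branch, 13 in the evolution-based branch) match the numbers stated in the text.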
These mutations were grouped according to their origin from the two applied consensus techniques, using either the representative MSA of the HLD-II subfamily or the MSA of the whole HLD family (see Methods for more details). DhaA100 featured the I136L, V184E and V197E mutations, which were predicted by both simple consensus and frequency ratio analyses of the representative MSA of the HLD-II subfamily. DhaA101 contained the mutations E20S, F80R and A155P, which were predicted solely by the frequency ratio analysis, while DhaA102 included the mutations L161M, I162V and D198S, which were predicted solely by the simple consensus analysis. Finally, DhaA103 contained the V55L, A127V, H188A and E191A mutations, which were predicted by simple consensus analysis of the representative MSA of the whole HLD family. Interestingly, none of these mutants was predicted by Rosetta to be more stable than the wild-type (Table 1).

Figure 13. Location of stabilizing mutations in designed enzymes. A) Locations of substitutions in energy-based, evolution-based and combined mutants of the DhaA enzyme. Substitutions in the multiple-point mutant designed by the energy-based approach (DhaA112) are represented as orange spheres, while substitutions in multiple-point mutants designed by the evolution-based approach are represented as red (DhaA100), blue (DhaA101), green (DhaA102) and magenta (DhaA103) spheres. Mutations in the combined mutant (DhaA115) are colored in orange and blue in correspondence with their original mutants (DhaA112 and DhaA101). B) Locations of substitutions in energy-based and evolution-based mutants of the LinA enzyme. Substitutions in the multiple-point mutant designed by the energy-based approach (LinA01) are represented as orange spheres, while substitutions in the multiple-point mutant designed by the evolution-based approach (LinA02) are represented as blue spheres.

Table 1. Characteristics of predicted multiple-point mutants of DhaA

| Method | Protein | Mutations | Rosetta ΔΔG (kcal.mol⁻¹) | DSC Tm (°C) | ΔTm (°C) | Activity at 37°C (b) (nmol s⁻¹ mg⁻¹) |
|---|---|---|---|---|---|---|
| | DhaAwt | (a) | (a) | 49.0 ± 0.7 | (a) | 18.0 |
| Energy-based | DhaA112 | C128F + T148L + A172I + C176F + D198W + V219W + C262L + D266F | -32.0 ± 1.4 | 65.2 ± 0.1 | +16.2 | 5.5 |
| Evolution-based | DhaA100 | I136L + V184E + V197E | 1.1 ± 0.3 | 48.5 ± 0.4 | -0.6 | 9.4 |
| Evolution-based | DhaA101 | E20S + F80R + A155P | 0.8 ± 0.1 | 58.6 ± 0.3 | +9.6 | 49.3 |
| Evolution-based | DhaA102 | L161M + I162V + D198S | 2.3 ± 0.7 | 48.1 ± 0.1 | -0.9 | 9.4 |
| Evolution-based | DhaA103 | V55L + A127V + H188A + E191A | 3.0 ± 1.0 | 51.4 ± 0.1 | +2.3 | 6.5 |
| Combined | DhaA115 | E20S + F80R + C128F + T148L + A155P + A172I + C176F + D198W + V219W + C262L + D266F | -32.4 ± 1.0 | 73.6 ± 0.1 | +24.6 | 5.6 |

(a) not applicable; (b) activity determined with 1-iodohexane at 37°C and pH 8.6; ΔΔG - predicted change in Gibbs free energy; DSC - Differential Scanning Calorimetry; DhaA115 combines the mutations of DhaA101 and DhaA112

Characterization of mutants designed by FireProt. Expression and purification of all constructed mutants resulted in good yields and protein purity. Far-UV CD spectra of the wild-type and the mutants show that none of the mutations caused significant changes in secondary structure (Figure S2A), and all tested variants were active with 1-iodohexane at 37°C (Table 1). The thermostability of the constructed variants was determined by thermally induced denaturation using DSC (Table 1) and CD spectroscopy (Figure S2A). A substantial increase in melting temperature (ΔTm 16.2°C) was observed for the variant DhaA112 designed by the energy-based approach, indicating strong stabilization effects of the introduced mutations (Table 1). The effect of the consensus substitutions was moderate: only two tested variants, DhaA101 and DhaA103, showed positive thermostabilization effects (ΔTm 9.6 and 2.3°C), while the mutations in DhaA100 and DhaA102 had neutral or destabilizing effects (Table 1).
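
As a simple illustration of how the ΔTm values are obtained, the Python sketch below recomputes them as the difference between the DSC melting temperature of each mutant and that of the wild-type, using the Tm values of Table 1, and flags variants exceeding the 15°C threshold of outstanding stabilization mentioned in the Introduction. The names are illustrative. Because the tabulated Tm values are rounded, the recomputed differences for DhaA100 and DhaA103 deviate by 0.1°C from the tabulated ΔTm.

```python
# Tm values (°C, DSC) as reported in Table 1; all names are illustrative.
TM_WILD_TYPE = 49.0
TM_MUTANTS = {
    "DhaA112": 65.2,
    "DhaA100": 48.5,
    "DhaA101": 58.6,
    "DhaA102": 48.1,
    "DhaA103": 51.4,
    "DhaA115": 73.6,
}
OUTSTANDING = 15.0  # °C, threshold of outstanding stabilization used in the text


def delta_tm(tm_mutant: float, tm_reference: float = TM_WILD_TYPE) -> float:
    """Change in melting temperature relative to the reference, in °C."""
    return round(tm_mutant - tm_reference, 1)


for name, tm in TM_MUTANTS.items():
    d = delta_tm(tm)
    note = " (outstanding)" if d > OUTSTANDING else ""
    print(f"{name}: dTm = {d:+.1f} °C{note}")
```

Only DhaA112 and DhaA115 exceed the 15°C threshold in this comparison.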
Combining the best energy-based mutant (DhaA112) with the best mutant identified by the evolution-based approach (DhaA101) produced a final mutant, DhaA115. The effects of the evolution-based substitutions were complementary and additive with those predicted by the energy-based approach, giving an outstanding increase in thermostability: ΔTm 24.6°C (Table 1).

Characterization of the final combined mutant. The combined mutant DhaA115 was characterized in detail in terms of its thermostability in the presence of organic co-solvents, half-life at elevated temperature and temperature profile. The Tm determinations show that the mutations had stabilizing effects in the presence of three organic co-solvents comparable to those in pure buffer (ΔTm: 20 to 26°C; Figure 14A). The enhanced thermostability was also reflected in a strongly improved half-life at 60°C (Figure 14B). After seven days of incubation at 60°C, the mutant DhaA115 still retained about 50% of its initial activity, while the wild-type became inactivated within six hours. Two inactivation phases were observed for all DhaA variants: rapid initial inactivation followed by a slower decay of activity (Figure 14B). The wild-type lost around 80% of its activity in the first, fast phase, while the mutations in DhaA115 reduced the loss during the first inactivation phase to only 30%. The rate of inactivation during the first phase was comparable for the wild-type and DhaA115, while the rate during the second phase was dramatically (100-fold) slower for the mutant. Similar effects were observed with a previously reported stable variant of DhaA63 (denoted Dhla8) [221] constructed using Gene Site-Saturation Mutagenesis (Figure S3). The apparent optimal temperature shifted from 45°C for the wild-type enzyme to 65°C for the mutant DhaA115, and its specific activity with 1-iodohexane at the optimum temperature was 28% higher, but the shape of the temperature profile remained largely unchanged (Figure 14C).
Steady-state kinetic constants of the two enzymes, determined with 1-iodohexane at their respective suboptimal temperatures, revealed comparable catalytic properties (Table 2).

Table 2. Steady-state kinetic constants of DhaA wild-type and the final mutant DhaA115 determined with 1-iodohexane at 37°C and 57°C, respectively, and pH 8.6

| Enzyme | kcat (s⁻¹) | K0.5 (µM) | n | Ksi (mM) |
|---|---|---|---|---|
| DhaAwt | 2.47 ± 0.01 | 12.00 ± 1.14 | 1.80 ± 0.04 | 0.53 ± 0.01 |
| DhaA115 | 2.85 ± 0.03 | 5.00 ± 2.31 | 1.89 ± 0.01 | |

K0.5 - concentration of substrate at half-maximal velocity; kcat - catalytic constant; n - Hill coefficient; Ksi - substrate inhibition constant

Figure 14. Biochemical properties of DhaA wild-type and the final mutant DhaA115. A) Melting temperatures of DhaA wild-type (blue) and DhaA115 (red) in the presence of the indicated solvents. B) Half-life of DhaA wild-type (blue) and DhaA115 (red) determined at 60°C and pH 8.6 with the substrate 1-iodohexane. C) Temperature profiles of DhaA wild-type (blue) and DhaA115 (red) determined at pH 8.6 with the substrate 1-iodohexane.

4.4.3 Design of thermostable γ-hexachlorocyclohexane dehydrochlorinase LinA

The γ-hexachlorocyclohexane dehydrochlorinase LinA was selected as the second model protein to illustrate the broader applicability of the FireProt strategy to proteins with very different characteristics: (i) LinA is natively a homotrimer (DhaA is a monomer), (ii) LinA monomers form an α+β barrel fold (DhaA possesses an α/β-hydrolase fold), (iii) LinA is mainly composed of β-sheets (α-helices and β-sheets are equally represented in DhaA) and (iv) with 156 amino acids, LinA is about half the length of DhaA (294 residues).

Energy-based approach. Out of 2,689 possible single-point mutations, 1,533 passed the evolutionary conservation or correlation filter.
FoldX analysis of ΔΔG for the remaining mutations identified 68 potentially stabilizing single-point mutations. Subsequent Rosetta calculation further decreased the number of promising mutations to 8 (Table S9). The mutation D19M, which disrupts a salt bridge, and T133L, which has an antagonistic effect with position D3, were also discarded, leaving 6 potentially stabilizing mutations at four positions. Only the most favorable mutation at each position was further analyzed: D3I, S127Y, T133I and A145H (Figure 13B). Rosetta predicted high stability of the recombined multiple-point mutant LinA01 carrying all four mutations (Table 3).

Evolution-based approach. The back-to-consensus analysis identified 15 potentially stabilizing substitutions (Table S10). Of these, 10 were excluded as destabilizing based on FoldX predictions. Two further mutations were excluded: K20Y contacts the halide-stabilizing residue, and F113Y has a negative effect on enzyme activity according to the UniProt database. The remaining 3 back-to-consensus mutations passed all of the applied filters and were combined into a three-point mutant: Y50F, F68W and A131V (Figure 13B).

Table 3. Characteristics of predicted multiple-point mutants of LinA

| Method | Protein | Mutations | Rosetta ΔΔG (kcal.mol⁻¹) | DSC Tm (°C) | ΔTm (°C) | Activity at 30°C (b) (µmol s⁻¹ mg⁻¹) |
|---|---|---|---|---|---|---|
| | LinAwt | (a) | (a) | 41.4 ± 0.1 | | 0.21 (0.12 mM) (c); 1.91 (0.38 mM) (c) |
| Energy-based | LinA01 | D3I + S127Y + T133I + A145H | -31.4 | 62.3 ± 0.4 | +20.9 | 0.32 (0.12 mM) (c); 1.29 (0.34 mM) (c) |
| Evolution-based | LinA02 | Y50F + F68W + A131V | 0.1 | 37.7 ± 0.2 | -3.7 | 0.17 (0.11 mM) (c); ND |

(a) not applicable; (b) activity determined with γ-hexachlorocyclohexane at 30°C and pH 8.6; (c) the initial γ-HCH concentration is given since it affects the determined specific activity; ΔΔG - predicted change in Gibbs free energy; DSC - Differential Scanning Calorimetry; ND - not determined

Characterization of mutants designed by FireProt. Expression and purification of all constructed mutants resulted in good protein yields and purity.
Comparison of the far-UV CD spectra of LinA wild-type and its mutants shows that none of the mutations caused significant changes in secondary structure (Figure S2B). Both LinA variants retained a level of specific activity similar to that of LinAwt (Table 3), showing that the introduced mutations did not affect activity negatively. The thermostability of the constructed LinA variants was determined by thermally induced denaturation using both DSC (Table 3) and CD spectroscopy (Figure S2B). Similar to the energy-based DhaA variant, the energy-based LinA variant (LinA01) showed a substantial increase in melting temperature (ΔTm +20.9°C), again showing the strong stabilization effects of the introduced mutations (Table 3). The evolution-based mutant (LinA02) showed a small decrease in thermostability (ΔTm -3.7°C), indicating that the consensus residues identified are not conserved to preserve the stability of the enzyme (Table 3). No combined mutant was constructed due to the absence of stabilizing evolution-based mutations.

4.4.4 Structural basis of mutation effect

Visual inspection of mutant structures, coupled with detailed analysis of their individual energy terms calculated by Rosetta, provided indications of the possible structural basis of protein stabilization by the mutations of DhaA115 and LinA01 (Table S11). These mutations were introduced at various locations in the protein structure with different types of secondary structure.

Stabilizing mutations in DhaA. Out of 11 mutated residues, 3 line the main transport tunnel, 3 are buried in the protein core and 5 are exposed to solvent on the protein surface (Figure 13A and Table S11). The 8 mutations identified by the energy-based approach introduce bulkier, more hydrophobic residues (Table S11) that probably enhance stability by improving the packing of atoms in the protein interior and/or strengthening hydrophobic interactions.
The contributions to stability of the mutations proposed by the evolution-based approach are more difficult to explain. The A155P mutation (at the fourth most flexible position in the protein structure) could increase rigidity by introducing a proline into the affected loop and account for most of the observed stability improvement, while the E20S and F80R mutations are probably neutral or restructure the charged network on the surface of DhaA (Table S11).

Stabilizing mutations in LinA. Out of 4 stabilizing mutations, 2 are buried in the protein core and 2 are exposed on the protein surface (Figure 13B and Table S11). In agreement with the observation for stabilizing mutations introduced into the DhaA enzyme, all 4 mutations identified by the energy-based approach for LinA introduced bulkier and more hydrophobic residues (Table S11).

4.5 Discussion

The last decade has seen significant advances towards more rational approaches that reduce the experimental effort required to engineer highly stable proteins (Figure 15 and Table S12). As a contribution to these efforts, we have developed a hybrid strategy integrating energy-based and evolution-based approaches with smart filtering of mutations that are destabilizing or may impair enzymes' functions, enabling the identification of additively stabilizing substitutions in multiple-point mutants.

It is essential to correctly configure all of the tools used in both the energy- and evolution-based approaches of the FireProt workflow in order to achieve robust and reliable predictions. Therefore, individual steps of the workflow were verified using a dataset featuring diverse proteins from the ProTherm database. The predictions carried out for 656 mutations confirmed FireProt's precision: the energy- and evolution-based approaches identified stabilizing mutations with success rates of 100% and 80%, respectively.
Strikingly, only one stabilizing mutation that exceeded our thresholds was identified by both approaches, suggesting that they are highly complementary. The potential downside of the stringent conditions imposed to avoid false positives was that 92% of the available stabilizing mutations were discarded. However, the remaining correctly identified stabilizing mutations should be more than sufficient to construct highly stable catalysts (Table S 3).

When the energy-based approach was applied to the DhaA and LinA enzymes, the removal of conserved and correlated positions from the analysis helped to avoid modification of structurally and functionally important residues, thereby greatly reducing the number of possible mutations requiring evaluation by computationally intensive free energy calculations. Since FoldX computation is about an order of magnitude faster than Rosetta, it was applied as a pre-filter, further reducing the number of mutations to be analyzed by Rosetta. Regarding the prediction of multiple-point mutants, simple recombination of the most promising single-point mutations could weaken stabilization, since strong antagonistic effects were detected even at the level of double-point mutants. The thermostability enhancements of the eight- and four-point mutants predicted by this approach, DhaA112 (ΔTm 16°C) and LinA01 (ΔTm 21°C), both exceeded the threshold for outstanding stabilization, although none of the introduced mutations optimized either hydrogen bonds or charge-charge interactions. This may be due to the sampling of limited rotamer libraries during the calculations and to the requirement for both FoldX and Rosetta to unambiguously evaluate selected mutations as stabilizing.
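The selection logic discussed above (FoldX pre-filter, Rosetta confirmation, exclusion of antagonistic pairs) can be illustrated with a minimal Python sketch. It is not the FireProt implementation: the function names are hypothetical, the thresholds are those quoted in the Methods, ranking candidates by their Rosetta ΔΔG is an assumption (the text only says "starting with the most stabilizing mutation"), and the pair ΔΔG values in the example are invented for illustration. Single-point values are taken from Table S 4.

```python
from typing import Dict, FrozenSet, List

FOLDX_CUTOFF = -1.0    # kcal/mol; FoldX threshold quoted in the Methods
ROSETTA_CUTOFF = -2.0  # kcal/mol; Rosetta threshold quoted in the Methods
PAIR_CUTOFF = -3.0     # kcal/mol; double mutants above this count as antagonistic


def position(mutation: str) -> int:
    """Residue number of a mutation written like 'C128F'."""
    return int(mutation[1:-1])


def design_multipoint(
    foldx: Dict[str, float],
    rosetta: Dict[str, float],
    pair_ddg: Dict[FrozenSet[str], float],
) -> List[str]:
    """Greedily combine mutually additive stabilizing mutations."""
    # Step 1: keep only mutations that both tools score as stabilizing.
    candidates = [
        m for m in rosetta
        if m in foldx and foldx[m] < FOLDX_CUTOFF and rosetta[m] < ROSETTA_CUTOFF
    ]
    # Step 2: walk from the most to the least stabilizing Rosetta prediction
    # (ranking by Rosetta is an assumption of this sketch).
    candidates.sort(key=lambda m: rosetta[m])
    chosen: List[str] = []
    for mut in candidates:
        if any(position(mut) == position(c) for c in chosen):
            continue  # only one mutation per position
        # Missing pair data is treated as antagonistic (conservative assumption).
        additive = all(
            pair_ddg.get(frozenset((mut, c)), 0.0) <= PAIR_CUTOFF for c in chosen
        )
        if additive:
            chosen.append(mut)
    return chosen


# Single-point values from Table S 4; pair values are hypothetical.
foldx = {"C128F": -2.21, "C128M": -3.48, "D266Y": -2.43, "D266F": -2.31, "E20Q": -1.09}
rosetta = {"C128F": -8.45, "C128M": -2.96, "D266Y": -2.90, "D266F": -2.41, "E20Q": -2.13}
pair_ddg = {
    frozenset(("C128F", "D266Y")): -1.0,   # hypothetical antagonistic pair
    frozenset(("C128F", "D266F")): -10.5,  # hypothetical additive pair
    frozenset(("C128F", "E20Q")): -2.0,    # hypothetical antagonistic pair
}
design = design_multipoint(foldx, rosetta, pair_ddg)
```

With these illustrative inputs the sketch keeps C128F and D266F: C128M is skipped because position 128 is already occupied, and D266Y and E20Q are skipped because their (hypothetical) double mutants with C128F fall above the pair cut-off.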
FoldX and Rosetta employ simplified scoring functions and, despite the use of three protein structures for the analysis, allow only limited protein flexibility. It should therefore be possible to supplement the mutations proposed by free energy calculations with beneficial substitutions identified using different principles.

Figure 15. Schematic comparison of protein stabilization methods in terms of experimental effort, computational effort and success rate. Examples of representative methods with their characteristics and success rates are presented in Table S 12.

To this end, additional mutations were selected by the evolution-based approach. The mutations predicted by the back-to-consensus method were filtered by FoldX to discard mutations proposed due to function-related evolutionary constraints rather than structural stabilization. This filtering step proved to be very important, as over half of the mutations were discarded as potentially destabilizing. Interestingly, all five multiple-point mutants (DhaA100 to DhaA103 and LinA02) were predicted as destabilizing by Rosetta and had to be tested experimentally. While this prediction was accurate for three of them (DhaA100, DhaA102 and LinA02), the other two mutants (DhaA101 and DhaA103) were clearly more stable than the wild type. This result suggests that some underlying principles important for stability that are detected by the back-to-consensus method are not captured by the applied Rosetta protocol. We speculate that these may include larger backbone rearrangements, interactions with ions present in the solvent, or other entropic contributions that are not well accounted for in the current protocols. Experimental characterization of these mutants by microcalorimetry, temperature-jump stopped-flow and protein crystallography is currently ongoing in our laboratory.
Despite its lower reliability, the evolution-based approach should still be considered a useful supplement to the energy-based approach, potentially enabling further improvement in the stability of designed proteins. The final 11-point mutant DhaA115 arising from this hybrid prediction strategy is one of the most stable HLD proteins known to date (ΔTm > 24°C)²²⁰,²²³.

We have compared our strategy against several methods providing exceptional protein stabilization (Table S 12). The experimentally intensive protocols of directed evolution and hot-spot prediction can provide engineered enzymes with comparable enhancement. However, since their success rate is generally below 1%, stable proteins can only be obtained after extensive screening. Notably, two of these studies also focused on improving the stability of the enzyme DhaA. In one, an eight-point mutant of DhaA with a ΔTm of 18°C was obtained after screening all 121,000 possible variants²²¹. We have obtained a clearly superior enzyme after experimental evaluation of as few as six mutants, highlighting the importance of removing mutations with antagonistic and uncertain stabilizing effects. In the other study performed with DhaA, four hotspots in an access tunnel were experimentally randomized, requiring experimental screening of 5,000 variants²²⁰, and the ΔTm for the best four-point mutant was 19°C.

Highly stable proteins have been obtained by in silico prediction of the stabilizing effects of single-point mutations in four recently published studies⁸⁵,²¹¹,²²⁴,²²⁵. In one, 67 variants of epoxide hydrolase with mutations identified as potentially stabilizing by the FRESCO method were experimentally tested, 24 were reportedly more stable than the parent protein, and the variant with the best combination of mutations had remarkably enhanced thermostability (ΔTm 36°C)²²⁵.
Much of this enhancement arose from disulfide bridges at the dimer interface, making this approach particularly suitable for multimeric proteins. In another of the studies, four out of six engineered methionine aminopeptidases designed by the RosettaVIP method were found to be stabilized, and a combined five-point mutant reportedly had a ΔTm of 18°C⁸⁵. The authors noted that their final construct is still less stable than the most thermostable native aminopeptidases and that the method is particularly effective for mutagenesis of buried residues around internal cavities. In a third study, a 12-point mutant of tobacco 5-epi-aristolochene synthase was generated using the SCADS method with an impressive ΔTm (45°C), but at the expense of 98% of the catalytic activity at the optimal temperature²²⁴. In comparison to the methods applied in these and other relevant studies (Table S 12), FireProt reduces the experimental screening effort through robust identification of stabilizing mutations and by ensuring their additivity. In addition, it has promising applicability to diverse proteins, potentially all proteins with a known tertiary structure and homologous sequences, due to the diverse locations of the introduced mutations and the universal applicability of the underlying principles.

In summary, the presented hybrid strategy FireProt affords rapid design of stable proteins. Consideration of the additivity of the identified potentially beneficial mutations enables prediction of multiple-point mutants with significantly enhanced stability. Despite a dramatic reduction in experimental effort, the workflow provided two proteins with outstanding stability. One of them is an HLD with greater thermostability than all known HLD enzymes, whether obtained from thermophilic organisms or engineered using extensive combinatorial screening. Furthermore, owing to the smart filtering, this strategy is affordable for users with limited access to powerful computer facilities.
In addition, implementation of the FireProt strategy in the web-based protein engineering tool HotSpot Wizard⁶⁹ is currently ongoing in our laboratory to ensure user convenience.

4.6 Methods

4.6.1 Bioinformatics analysis

Construction of multiple sequence alignments. Sequences of six experimentally characterized HLDs (DhaA, LinB, DrbA, DmbC, DhlA and DmbB) or the sequence of LinA were used as queries for PSI-BLAST²²⁶ searches against the nr NCBI database (versions July-2009 and May-2015, respectively)²²⁷, with threshold E-values of 10⁻¹⁰ and 10⁻¹⁵ for the initial BLAST search and for inclusion of a sequence in the position-specific matrix, respectively. Sequences collected after three PSI-BLAST iterations were clustered by CD-HIT²²⁸ at a 90% identity threshold. The resulting dataset (including 8,226 sequences for DhaA and 946 for LinA) was clustered with CLANS²²⁹ using default parameters and varying threshold P-values. Sequences clustering with the query at a P-value of 10⁻²⁹ were extracted and aligned with MUSCLE²³⁰. All artificial, incomplete or divergent sequences were removed. The final multiple sequence alignment (MSA) of LinA contained 13 sequences. The prepared MSA of the HLD protein family comprised 168 sequences. All sequences (47) belonging to the HLD-II subfamily were then extracted to create an HLD-II subfamily dataset and aligned with MUSCLE. To reduce possible bias toward highly similar sequences, UniqueProt²³¹ was used (with an HSSP cut-off value of 40) to select representative sequences from both datasets. The resulting representative HLD family and HLD-II subfamily datasets comprised 87 and 27 sequences, respectively. Each representative dataset was then aligned with MUSCLE²³⁰.

Analysis of evolutionary conservation and correlation. The MSA of the whole HLD-II subfamily or the MSA of LinA was used to estimate the level of conservation at individual positions.
Normalized evolutionary rates for each amino acid site of the MSA were calculated by the Bayesian method implemented in Rate4Site v2.01²¹⁶ with the WAG evolutionary model²³² and maximum likelihood optimization of branch lengths using a gamma model with four discrete categories. The calculated normalized evolutionary rates were converted to CONSURF conservation grades²¹⁷,²¹⁸. Positions with a grade > 8 were considered immutable. The MSA of the whole HLD family or the MSA of LinA was used to identify positions with correlated mutations (with a threshold Correlated Mutation Analysis score > 0.8) by applying the Comulator online tool of the 3DM database²²².

Back-to-consensus analysis. Back-to-consensus mutations were selected by analyzing both representative MSAs of HLDs and the MSA of LinA by the simple consensus and frequency ratio approaches²¹⁹. Residues from poorly aligned regions (DhaA residues 1-18, 131-179 and 279-293 from the HLD family MSA, and residues 1-14 from the HLD-II subfamily MSA) were excluded from the analysis. The simple consensus analysis was performed using a consensus cut-off of 0.5, meaning that a given residue must be present at a given position in at least 50% of all analyzed sequences to be assigned as the consensus residue. Two cut-offs were applied simultaneously in the frequency ratio analyses: a frequency cut-off of 0.2 for the maximum allowed ratio between the target and conserved residue frequencies, and a minimal frequency of 0.4 for the most conserved residue.

Compilation of a validation dataset. The validation dataset was compiled from ProTherm records that include experimentally determined differences between the Gibbs free energies of folding of mutant proteins and the corresponding wild type (ΔΔG). To increase the reliability of the analysis, only records with absolute ΔΔG values of > 0.5 kcal.mol⁻¹ were included; the experimental error of ΔΔG measurements is estimated²¹⁴ to be about 0.48 kcal.mol⁻¹.
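The simple consensus and frequency ratio criteria of the back-to-consensus analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and its return format are hypothetical, gaps are ignored when computing frequencies (the text does not say how gaps were handled), and the example column is invented so that it reproduces only the two frequencies reported for position 20 in Table S 5 (E 0.07, S 0.44 in 27 sequences).

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def back_to_consensus(
    column: Sequence[str],
    wild_type: str,
    consensus_cutoff: float = 0.5,  # simple consensus cut-off from the text
    ratio_cutoff: float = 0.2,      # maximum wild-type / consensus frequency ratio
    min_top_freq: float = 0.4,      # minimal frequency of the most conserved residue
) -> Optional[Dict[str, object]]:
    """Return the suggested substitution for one alignment column, if any."""
    counts = Counter(res for res in column if res != "-")  # gaps ignored (assumption)
    total = sum(counts.values())
    if total == 0:
        return None
    top_res, top_count = counts.most_common(1)[0]
    if top_res == wild_type:
        return None  # the wild type already carries the most conserved residue
    top_freq = top_count / total
    ratio = counts[wild_type] / top_count  # equals wild-type freq / consensus freq
    simple = top_freq >= consensus_cutoff
    by_ratio = top_freq >= min_top_freq and ratio <= ratio_cutoff
    if not (simple or by_ratio):
        return None
    return {"mutation_to": top_res, "simple_consensus": simple, "frequency_ratio": by_ratio}


# Hypothetical column with 27 residues: 2 x E (wild type), 12 x S, 13 others.
column_20 = ["E"] * 2 + ["S"] * 12 + ["A"] * 7 + ["T"] * 6
hit_20 = back_to_consensus(column_20, wild_type="E")
```

For this column the sketch suggests E to S by the frequency ratio criterion only (S is below the 50% simple consensus cut-off), which matches the way E20S is listed in Table S 5 but not in Table S 6.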
When multiple ΔΔG values were available for a single mutation, only the value determined under the experimental conditions closest to the physiological pH of 7 was retained. The dataset was also limited to mutations in the ten most mutated proteins in ProTherm. Multiple sequence alignments for each protein in the dataset were constructed using a protocol similar to that applied in the analysis of the model enzyme DhaA. However, an automatic multi-step procedure was developed to circumvent the need to manually select PSI-BLAST queries. First, the name of the relevant protein family was found in the SCOP database²³³. Then, all members of the same protein family were clustered by CD-HIT using an identity threshold of 90%. Finally, up to five representatives of each resulting cluster were selected at random to constitute the set of PSI-BLAST queries.

4.6.2 Molecular modeling

Preparation of protein structures. Crystal structures of wild-type DhaA (PDB ID: 1BN6, 1BN7 and 1CQW) and wild-type LinA (PDB ID: 3A76) were downloaded from the RCSB PDB database²³⁴. PyMOL v1.4²³⁵ was used to model the substitutions A172V, I209L and G292A in the crystal structures of DhaA to ensure their correspondence with DhaA from Rhodococcus rhodochrous (GI number 3114657). The crystal structures were then prepared for predictions by removing ligands and water molecules. Missing side-chain atoms were added by the module of FoldX 3.0⁹⁴. Repaired structures were minimized by the minimize_with_cst module of Rosetta⁸⁸, with optimization of both backbone and side chains enabled (-sc_min_only false), the distance for the full atom pair potential set to 9 Å (-fa_max_dis 9.0), standard weights for the score function and a constraint weight of 1 (-constraint_weight 1.0). Output from the minimization was used to create a constraint file by the script convert_to_cst_file.sh.

Prediction of stability effects by FoldX.
Stability effects of all possible single-point mutations were estimated using the module of FoldX. Calculations were performed five times for each mutation following the recommended protocol (pH 7, temperature 298 K, ionic strength 0.050 M, VdWDesign 2). Mutations with a predicted ΔΔG, averaged over all three analyzed structures, smaller than -1.0 kcal.mol⁻¹ were considered stabilizing, while a tighter criterion was applied for detecting potentially destabilizing mutations (ΔΔG > 0.5 kcal.mol⁻¹).

Prediction of stability effects by Rosetta. Protocol 16, which incorporates backbone flexibility within the ddg_monomer module of Rosetta, was applied according to Kellogg et al.⁸⁸. The soft-repulsive design energy function (-soft_rep_design weights) was used for repacking side chains (-sc_min_only false). Optimization was performed on each whole protein without distance restriction (-local_opt_only false). The previously created constraint file was used during backbone minimization (-min_cst true). Three rounds of optimization with increasing weight on the repulsive term (-ramp_repulsive true) were applied. The minimum energies from 20 and 50 iterations were used as the final parameters describing the stability effects of single- and multiple-point mutations, respectively. Mutations with a ΔΔG, averaged over all three analyzed DhaA structures or the three chains of LinA, of < -2.0 kcal.mol⁻¹ were considered stabilizing. The additivity of stabilizing mutations was evaluated by predicting the stability of variants carrying all pairs of potentially stabilizing single-point mutations. Mutation pairs for which the respective double-point mutants showed a predicted ΔΔG > -3.0 kcal.mol⁻¹ were considered antagonistic. The cumulative mutants were then prepared by combining mutually additive single-point mutations, starting with the most stabilizing mutation. If more than one stabilizing mutation was available at the same position, the most favorable mutation was used.

Analyses of interactions.
Selected mutations were visually analyzed in PyMOL. All three crystal structures of DhaA and the three chains of LinA were analyzed for the presence of side chains involved in intramolecular salt bridges by the ESBRI server²³⁶ and for additional intramolecular interactions using the Protein Interactions Calculator server²³⁷. An interaction had to be present in at least one of the analyzed structures or chains to be considered important.

4.6.3 Construction of mutants and biochemical characterization

Subcloning. All reagents and primers were purchased from Sigma-Aldrich unless otherwise specified. Restriction enzymes, T4 DNA ligase and the accompanying buffers were purchased from New England Biolabs. Genes encoding the tested Rhodococcus rhodochrous DhaA (Uniprot: P0A3G2) mutants were cloned into the pET21b vector (EMD Biosciences) (DhaAwt, 101, 110-112, 115-116) or the pAQN vector²³⁸ (DhaA63, 100-103) using the restriction enzyme pairs NdeI/HindIII or BamHI/HindIII, respectively, followed by ligation with T4 DNA ligase according to the supplier's protocol. The plasmid pET28-LinAwt encoding the wild-type LinA from Sphingomonas paucimobilis UT26 (Uniprot: P51697) was a gift from Dr. Yuji Nagata²³⁹. The genes encoding the mutants were cloned into pET28b (EMD Biosciences) using the restriction enzymes NdeI and EcoRI, followed by ligation with T4 DNA ligase (Promega) according to the supplier's protocol. Correct integration of the genes was verified by sequencing (GATC Biotech) and analyzed using the Clone Manager Professional (Sci-Ed Software) and BioEdit (Ibis Biosciences) software. DhaA and LinA variants were expressed with a C-terminal and an N-terminal His6-tag, respectively, to facilitate purification.

Enzyme expression and purification. The His6-tagged DhaA and LinA mutants were overexpressed in Escherichia coli BL21 (DE3) cells as previously described²³⁸. Proteins were purified using Ni-NTA Superflow Cartridges (Qiagen) and a previously described method²⁴⁰.
Protein concentration was determined by assays with the Bradford reagent (Sigma-Aldrich). The purity of the purified proteins was checked by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) followed by Coomassie Brilliant Blue R-250 staining.

Enzyme activity assays with DhaA. Reaction mixtures containing 12 µL of substrate in 12 mL of 100 mM glycine buffer (pH 8.6) were preheated at 37°C for 30 min, 240 µL of purified enzyme (0.4-1.0 mg.mL⁻¹) was added to initiate the reaction, and the reaction was monitored by withdrawing samples at periodic intervals (0-30 min). The samples were immediately mixed with 35% nitric acid to terminate the reaction. The released halide ions were measured spectrophotometrically at 460 nm after reaction with mercuric thiocyanate and ferric ammonium sulfate²⁴¹. Dehalogenating activity was quantified as the rate of halide product formation per unit time. Temperature profiles of DhaAwt and DhaA115 were evaluated by measuring their activity, as described above, at temperatures ranging from 20°C to 75°C in three independent replicates. The operational stability was evaluated by measuring residual activity after incubating 1 mL enzyme samples (1.0 mg.mL⁻¹) at 60°C in a heat block (Biosan Pst-100 HL). Residual activity was determined using a Microlab StarLet Manuload Liquid Handling Robot (Hamilton). For the residual activity measurements, 1 mL of 100 mM glycine buffer (pH 8.6) with 1 µL of 1,2-dibromoethane was incubated at 37°C for 30 min, then 50 µL of heat-treated enzyme solution was added to start the reaction (DhaAwt 0.1 mg.mL⁻¹, DhaA115 and DhaA63 1.0 mg.mL⁻¹). Samples (100 µL) were taken before enzyme addition (0 min) and after 5, 10 and 15 min of reaction time. Samples were then transferred to wells of an MTP microplate containing 10 µL of 35% HNO3 for inactivation. After all samples had been collected, the halide product was detected as described earlier. OD460nm was then measured using a Sunrise microplate reader (Tecan).
Dehalogenation activities were quantified by the slope of the regression between the product concentration and time.

Enzyme activity assay with LinA. The activity of the LinA variants was tested with γ-hexachlorocyclohexane (γ-HCH) at 30°C and analyzed using GC-MS. Saturating γ-HCH substrate mixtures in 1 mL of 100 mM glycine buffer (pH 8.6) were preheated at 30°C for 30 min. The reaction was initiated by adding 10 µL of purified enzyme (8.9-28.8 µg.mL⁻¹) and monitored by withdrawing 1 µL samples at 15 min intervals (0-75 min). The samples were immediately analyzed by GC-MS. A gas chromatograph equipped with the PAL robotic tool change system enabled fully automated sample preparation, organic extraction and analysis. The consumption of particular substrates was quantified using a gas chromatograph (Trace 1300, Thermo Scientific, USA) equipped with a TG-SQC capillary column (30 m x 0.25 mm x 0.25 µm, Thermo Scientific, USA) and connected to a mass spectrometer (ISQ™ LT Single Quadrupole, Thermo Scientific, USA). The 1 µL samples were injected into a split-splitless inlet at 250°C, with a split ratio of 1:50. The temperature program was isothermal at 40°C for 1 min, followed by an increase to 250°C at 20°C.min⁻¹ and a hold for 4 min. The flow of the carrier gas (He) was 1 mL.min⁻¹. The MS was operated in SCAN mode (30 to 320 amu). The temperatures of the ion source and the GC-MS transfer line were 200°C and 250°C, respectively. Dehydrochlorination activities were quantified by following the decrease in the concentration of γ-HCH over time.

Steady-state kinetic measurements. Substrate to product conversion by DhaAwt and DhaA115 was monitored using the isothermal titration microcalorimeter VP-ITC (MicroCal, Piscataway, USA) at 37°C and 57°C, respectively. The substrate 1-iodohexane was dissolved in 100 mM glycine buffer (pH 8.6) and the solution was allowed to reach thermal equilibrium in the reaction cell (1.4 mL).
The reaction was initiated by injecting 10 µL of enzyme solution containing either 22 µM DhaAwt or 24 µM DhaA115 into the reaction cell. Enzymes were dialyzed overnight against the same glycine buffer. The measured rate of heat change was assumed to be directly proportional to the velocity of the enzymatic reaction according to Equation 13:

$$\frac{dQ}{dt} = -\Delta H \, V \, \frac{d[S]}{dt} \qquad \text{(Equation 13)}$$

where ΔH is the enthalpy of the reaction, [S] is the substrate concentration, and V is the volume of the cell. ΔH was determined by titrating the substrate into the reaction cell containing the enzyme. Each reaction was allowed to proceed to completion. The integrated total heat of a reaction was divided by the amount of injected substrate. The evaluated rate of substrate depletion (-d[S]/dt) and the corresponding substrate concentrations were then fitted by nonlinear regression to kinetic models using Origin 6.1 (OriginLab, Massachusetts, USA).

Circular dichroism (CD) spectroscopy. CD spectra (190 to 260 nm) were obtained from samples of the purified enzymes (0.20-0.25 mg.mL⁻¹ in 50 mM phosphate buffer, pH 7.5, in a 0.1 cm quartz cuvette) to confirm their correct folding, using a Chirascan CD Spectrometer equipped with a Peltier thermostat (Applied Photophysics). Each presented spectrum is the baseline-corrected average of 5-10 scans. Mean residue ellipticity (θMRE) was calculated using Equation 14:

$$\theta_{MRE} = \frac{\theta_{obs} \, M_w \, 100}{n \, c \, l} \qquad \text{(Equation 14)}$$

where θobs is the observed ellipticity in degrees, Mw is the protein molecular weight, n is the number of residues, l is the cell path length (0.1 cm), c is the protein concentration and the factor 100 originates from the conversion²⁴² of the molecular weight to mg.dmol⁻¹. Thermal unfolding of the enzyme variants was followed by monitoring the ellipticity at 222 nm over the temperature range of 20°C to 90°C, with a resolution of 0.2°C and a heating rate of 1°C/min.
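As a worked illustration of Equations 13 and 14, the following Python sketch rearranges Equation 13 for the rate of substrate depletion and evaluates Equation 14 directly. The function names are hypothetical, all numerical inputs are invented example values rather than measurements from this work, and the concentration is assumed to be in mg.mL⁻¹ and the path length in cm, which gives θMRE in deg.cm².dmol⁻¹.

```python
def depletion_rate(heat_rate: float, delta_h: float, cell_volume: float) -> float:
    """Rate of substrate depletion, -d[S]/dt, from Equation 13.

    heat_rate   dQ/dt in J/s
    delta_h     reaction enthalpy in J/mol
    cell_volume cell volume in L
    Returns -d[S]/dt in mol/(L s).
    """
    # dQ/dt = -dH * V * d[S]/dt  =>  -d[S]/dt = (dQ/dt) / (dH * V)
    return heat_rate / (delta_h * cell_volume)


def mean_residue_ellipticity(
    theta_obs_deg: float,
    mol_weight: float,
    n_residues: int,
    conc_mg_per_ml: float,
    path_cm: float,
) -> float:
    """Mean residue ellipticity from Equation 14 (units assumed as in the lead-in)."""
    return theta_obs_deg * mol_weight * 100.0 / (n_residues * conc_mg_per_ml * path_cm)


# Invented example values, for illustration only.
rate = depletion_rate(heat_rate=-7e-6, delta_h=-50e3, cell_volume=1.4e-3)
mre = mean_residue_ellipticity(
    theta_obs_deg=-0.010, mol_weight=34000.0, n_residues=300,
    conc_mg_per_ml=0.2, path_cm=0.1,
)
```

With these example inputs, the depletion rate is about 1e-7 mol.L⁻¹.s⁻¹ and θMRE is about -5667 deg.cm².dmol⁻¹.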
The recorded denaturation curves of the tested enzymes were fitted to sigmoid (Boltzmann) curves using OriginPro 8 software (OriginLab, Massachusetts, USA). The melting temperatures (Tm) were evaluated from the collected data as the midpoint (x0) of the normalized thermal transition.

Differential scanning calorimetry (DSC). Melting temperatures of the purified enzymes were determined by monitoring their heat capacity in solution (1.0 mg.mL⁻¹) in 50 mM aqueous phosphate buffer (pH 7.5) and in the presence of three cosolvents: 20% acetone, 20% methanol and 50% DMSO (v/v). The measurements were acquired, after degassing, at temperatures from 20 to 100°C using the VP-capillary differential scanning calorimetry (DSC) system (MicroCal) and a heating rate of 1°C.min⁻¹. The melting point of each protein was determined as the temperature at which the heat capacity curve peaked²⁴³.

Acknowledgements

We kindly thank Dr. Yuji Nagata (Graduate School of Life Sciences, Tohoku University, Japan) for providing the pET28-LinAwt plasmid.

4.7 Supporting Information

Figure S 1. Correlation between ΔTm and ΔΔG values of 46 DhaA mutants. The experimentally characterized homogeneous set of DhaA mutants²²⁰,²³⁸ employed during validation of the Rosetta approach is shown as blue squares. The red line represents the trend in the experimentally characterized mutants (correlation coefficient, 0.81).

Figure S 2. Far-UV CD spectra (190 to 260 nm) of the tested mutants and determined melting temperatures. A) Variants of haloalkane dehalogenase DhaA: DhaAwt (50.3 ± 0.4°C), DhaA100 (51.5 ± 0.5°C), DhaA101 (57.0 ± 0.5°C), DhaA102 (49.2 ± 0.2°C), DhaA103 (53.6 ± 0.5°C), DhaA112 (65.5 ± 0.1°C), DhaA115 (71.8 ± 0.2°C). B) Variants of γ-hexachlorocyclohexane dehydrochlorinase LinA: LinAwt (40.0 ± 0.9°C), LinA01 (58.9 ± 0.8°C), LinA02 (36.3 ± 0.3°C). The melting temperatures (Tm) were evaluated as midpoints of the normalized thermal transitions.
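The midpoint evaluation mentioned for the CD melting curves can be illustrated with a short Python sketch. It is a simplified stand-in for the nonlinear Boltzmann fit performed in OriginPro, not a reproduction of it: the plateaus are estimated from the first and last points of the curve, the transition region is linearized with a logit transform, and the midpoint x0 is obtained by ordinary linear regression. The function names, the plateau window and the synthetic curve are assumptions made for this example.

```python
import math
from typing import List, Sequence


def boltzmann(x: float, a1: float, a2: float, x0: float, dx: float) -> float:
    """Boltzmann sigmoid: a1 and a2 are the low- and high-temperature plateaus."""
    return a2 + (a1 - a2) / (1.0 + math.exp((x - x0) / dx))


def melting_temperature(
    temps: Sequence[float], signal: Sequence[float], n_plateau: int = 10
) -> float:
    """Estimate the transition midpoint x0 from a thermal denaturation curve."""
    # Plateaus are approximated by the mean of the first and last points (assumption).
    a1 = sum(signal[:n_plateau]) / n_plateau
    a2 = sum(signal[-n_plateau:]) / n_plateau
    xs: List[float] = []
    zs: List[float] = []
    for t, y in zip(temps, signal):
        f = (y - a1) / (a2 - a1)  # normalized transition, 0 (folded) to 1 (unfolded)
        if 0.1 < f < 0.9:
            xs.append(t)
            zs.append(math.log(f / (1.0 - f)))  # logit(f) = (t - x0) / dx
    if len(xs) < 2:
        raise ValueError("too few points in the transition region")
    mean_x = sum(xs) / len(xs)
    mean_z = sum(zs) / len(zs)
    slope = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_z - slope * mean_x
    return -intercept / slope  # logit(f) = 0 at the midpoint


# Synthetic, noise-free curve sampled every 0.2 degrees from 20 to 90 degrees.
temps = [20.0 + 0.2 * i for i in range(351)]
signal = [boltzmann(t, -20.0, -5.0, 57.0, 1.5) for t in temps]
tm = melting_temperature(temps, signal)
```

On the synthetic curve the sketch recovers the midpoint that was used to generate it (57.0). Real data are noisy, so a full nonlinear least-squares fit of all four Boltzmann parameters, as done in the original analysis, is the more robust choice.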
Figure S 3. Half-life of DhaA63 determined at 60°C in 50 mM phosphate buffer, pH 7.5 (x-axis: time in hours, 0 to 180).

Table S 1. Composition of the single-point mutation dataset derived from the ProTherm database.

| PDB ID | Protein | Organism | Structural class | Number of mutations (total) | Number of mutations (stabilizing) |
|---|---|---|---|---|---|
| 2LZM | Lysozyme | Bacteriophage T4 | α+β | 155 | 25 |
| 1BNI | Barnase | Bacillus amyloliquefaciens | α+β | 124 | 4 |
| 1LZ1 | Lysozyme | Human | α+β | 85 | 19 |
| 1VQB | Gene V | Bacteriophage f1 | all β | 60 | 6 |
| 2CI2 | Chymotrypsin inhibitor | Barley | α+β | 56 | 3 |
| 1CSP | Cold shock protein | Bacillus subtilis | all β | 40 | 20 |
| 2RN2 | Ribonuclease HI | Escherichia coli | α/β | 38 | 21 |
| 1BVC | Myoglobin | Sperm whale | all α | 36 | 5 |
| 1RN1 | Ribonuclease T1 | Aspergillus oryzae | α+β | 33 | 6 |
| 4LYZ | Lysozyme | Chicken | α+β | 29 | 10 |

Table S 2. Performance of four evaluated prediction tools at different decision thresholds.
Column headers give the decision threshold [kcal.mol⁻¹]ᶜ.

| Metric | Tool | 2.5 | 2.0 | 1.5 | 1.0 | 0.5 | 0.0 | -0.5 | -1.0 | -1.5 | -2.0 | -2.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Precisionᵃ (ratios) | FoldX | 0.23 | 0.25 | 0.27 | 0.32 | 0.39 | 0.51 | 0.63 | 0.67 | 0.60 | 0.67 | 0.50 |
| Precisionᵃ (ratios) | Rosetta | 0.26 | 0.28 | 0.31 | 0.35 | 0.42 | 0.49 | 0.59 | 0.65 | 0.71 | 0.76 | 0.75 |
| Precisionᵃ (ratios) | ERIS | 0.29 | 0.31 | 0.33 | 0.34 | 0.38 | 0.39 | 0.40 | 0.39 | 0.43 | 0.36 | 0.36 |
| Precisionᵃ (ratios) | CUPSAT | 0.21 | 0.22 | 0.24 | 0.27 | 0.29 | 0.29 | 0.27 | 0.20 | 0.14 | 0.09 | 0.03 |
| Precisionᵃ (absolute values) | FoldX | 115/510 | 114/464 | 112/408 | 109/341 | 93/237 | 77/151 | 39/62 | 20/30 | 9/15 | 4/6 | 1/2 |
| Precisionᵃ (absolute values) | Rosetta | 114/438 | 111/394 | 108/344 | 99/281 | 92/218 | 74/151 | 62/105 | 46/71 | 25/35 | 16/21 | 9/12 |
| Precisionᵃ (absolute values) | ERIS | 97/331 | 89/285 | 80/240 | 67/195 | 57/151 | 50/127 | 37/92 | 27/70 | 17/40 | 10/28 | 8/22 |
| Precisionᵃ (absolute values) | CUPSAT | 119/569 | 118/529 | 116/479 | 110/415 | 98/343 | 77/266 | 55/206 | 28/141 | 12/86 | 5/55 | 1/29 |
| False positive rateᵇ (ratios) | FoldX | 0.74 | 0.65 | 0.55 | 0.32 | 0.27 | 0.14 | 0.04 | 0.02 | 0.01 | 0.00 | 0.00 |
| False positive rateᵇ (ratios) | Rosetta | 0.60 | 0.53 | 0.44 | 0.35 | 0.23 | 0.14 | 0.08 | 0.05 | 0.02 | 0.01 | 0.01 |
| False positive rateᵇ (ratios) | ERIS | 0.44 | 0.37 | 0.30 | 0.34 | 0.18 | 0.15 | 0.10 | 0.08 | 0.04 | 0.03 | 0.03 |
| False positive rateᵇ (ratios) | CUPSAT | 0.84 | 0.77 | 0.68 | 0.57 | 0.46 | 0.35 | 0.28 | 0.21 | 0.14 | 0.09 | 0.05 |
| False positive rateᵇ (absolute values) | FoldX | 395/537 | 350/537 | 296/537 | 232/537 | 144/537 | 74/537 | 23/537 | 10/537 | 6/537 | 2/537 | 1/537 |
| False positive rateᵇ (absolute values) | Rosetta | 324/537 | 283/537 | 236/537 | 182/537 | 126/537 | 77/537 | 43/537 | 25/537 | 10/537 | 5/537 | 3/537 |
| False positive rateᵇ (absolute values) | ERIS | 324/537 | 283/537 | 236/537 | 182/537 | 126/537 | 77/537 | 43/537 | 25/537 | 10/537 | 5/537 | 3/537 |
| False positive rateᵇ (absolute values) | CUPSAT | 450/537 | 411/537 | 363/537 | 305/537 | 245/537 | 189/537 | 151/537 | 113/537 | 74/537 | 50/537 | 28/537 |

ᵃ - Precision (true positive/(true positive + false positive)) represents the ratio between the truly stabilizing mutations and the mutations predicted as stabilizing by a given tool
ᵇ - False positive rate (false positive/(true negative + false positive)) represents the fraction of destabilizing mutations incorrectly predicted as stabilizing by a given tool out of all truly destabilizing mutations
ᶜ - The threshold with the highest precision and, at the same time, the highest number of true positives for an individual tool is highlighted

Table S 3.
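Stabilizing mutations selected for the 10 most mutated proteins from ProTherm. Values are the numbers of mutations predicted as stabilizing by the individual tools.

| PDB ID | Rosettaᵃ | FoldXᵇ | Rosetta and FoldXᶜ | FireProtᵈ |
|---|---|---|---|---|
| 1BVC | 105 | 119 | 26 | 23 |
| 1LZ1 | 44 | 60 | 5 | 4 |
| 1VQB | 74 | 62 | 18 | 7 |
| 2LZM | 99 | 106 | 18 | 16 |
| 4LYZ | 150 | 39 | 6 | 5 |
| 1BNI | 132 | 118 | 19 | 6 |
| 1CSP | 31 | 51 | 4 | 2 |
| 1RN1 | 97 | 207 | 23 | 18 |
| 2CI2 | 48 | 37 | 10 | 9 |
| 2RN2 | 111 | 120 | 25 | 17 |
| Total | 891 | 919 | 154 | 107 |

ᵃ - Number of Rosetta predictions with ΔΔG < -2 kcal.mol⁻¹
ᵇ - Number of FoldX predictions with ΔΔG < -1 kcal.mol⁻¹
ᶜ - Number of predictions with Rosetta ΔΔG < -2 kcal.mol⁻¹ and FoldX ΔΔG < -1 kcal.mol⁻¹
ᵈ - Number of predictions identified by FireProt using the criteria defined in the Methods

Table S 4. Results of the energy-based analysis of DhaA.

| Position | Residue | Mutation | FoldX ΔΔG [kcal.mol⁻¹] | Rosetta ΔΔG [kcal.mol⁻¹] | Antagonistic effect | Interactions | Mutant |
|---|---|---|---|---|---|---|---|
| 20 | E | Q | -1.09 | -2.13 | C128F | - | - |
| 128 | C | F | -2.21 | -8.45 | - | - | DhaA112 |
| 128 | C | M | -3.48 | -2.96 | - | - | - |
| 148 | T | W | -1.09 | -2.65 | C128F | - | - |
| 148 | T | L | -1.96 | -2.00 | - | - | DhaA112 |
| 172 | A | V | -1.92 | -2.21 | C176F | - | - |
| 172 | A | I | -2.83 | -2.16 | - | - | DhaA112 |
| 176 | C | F | -2.22 | -7.07 | - | - | DhaA112 |
| 176 | C | L | -2.01 | -5.28 | - | - | - |
| 176 | C | H | -1.08 | -4.82 | - | - | - |
| 176 | C | M | -2.51 | -4.24 | - | - | - |
| 187 | D | W | -1.37 | -2.58 | - | R190 | - |
| 198 | D | W | -1.36 | -4.55 | - | - | DhaA112 |
| 198 | D | F | -1.98 | -2.95 | - | - | - |
| 198 | D | Y | -1.85 | -2.75 | - | - | - |
| 198 | D | L | -1.92 | -2.53 | - | - | - |
| 217 | N | Y | -2.38 | -2.38 | C128F | - | - |
| 219 | V | W | -1.77 | -3.04 | - | - | DhaA112 |
| 262 | C | L | -1.64 | -4.93 | - | - | DhaA112 |
| 262 | C | M | -1.42 | -2.94 | - | - | - |
| 266 | D | Y | -2.43 | -2.90 | C128F | - | - |
| 266 | D | F | -2.31 | -2.41 | - | - | DhaA112 |

Table S 5. Results of the frequency ratio analysis of the HLD-II subfamily.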
| Position | Residue | Frequency | Res_TOPᵃ | Freq_TOPᵇ | Frequency ratio | FoldX ΔΔG [kcal.mol⁻¹] | Interactions | Mutant |
|---|---|---|---|---|---|---|---|---|
| 20 | E | 0.07 | S | 0.44 | 0.17 | 0.38 | - | DhaA101 |
| 59 | H | 0.11 | G | 0.56 | 0.2 | 2.36 | - | - |
| 77 | L | 0.07 | I | 0.44 | 0.17 | 1.90 | - | - |
| 80 | F | 0.04 | R | 0.44 | 0.08 | 0.37 | - | DhaA101 |
| 128 | C | 0.04 | F | 0.41 | 0.09 | -2.21 | L237 | - |
| 132 | I | 0.07 | V | 0.59 | 0.12 | 0.92 | - | - |
| 155 | A | 0.07 | P | 0.44 | 0.17 | -0.84 | - | DhaA101 |
| 159 | R | 0.07 | E | 0.78 | 0.1 | -0.62 | E200 | - |
| 163 | I | 0.07 | L | 0.7 | 0.11 | -0.37 | - | DhaA100 |
| 184 | V | 0.04 | E | 0.52 | 0.07 | -0.59 | - | DhaA100 |
| 197 | V | 0.04 | E | 0.52 | 0.07 | -0.20 | - | DhaA100 |
| 200 | E | 0.04 | R | 0.59 | 0.06 | -0.59 | R159 | - |
| 203 | W | 0.15 | L | 0.78 | 0.19 | 0.67 | F152, N207 | - |
| 207 | N | 0.15 | R | 0.81 | 0.18 | 1.88 | F152, W203 | - |
| 218 | I | 0.07 | V | 0.7 | 0.11 | 0.56 | - | - |
| 267 | I | 0.11 | V | 0.63 | 0.18 | 0.92 | - | - |
| 278 | N | 0.07 | S | 0.44 | 0.17 | 1.91 | I267, L281 | - |

ᵃ - The most conserved residue at a given position of the multiple sequence alignment
ᵇ - Frequency of the most conserved residue at a given position of the multiple sequence alignment

Table S 6. Results of the simple consensus analysis of the HLD-II subfamily.

| Position | Residue | Frequency | Res_TOPᵃ | Freq_TOPᵇ | FoldX ΔΔG [kcal.mol⁻¹] | Interactions | Mutant |
|---|---|---|---|---|---|---|---|
| 36 | L | 0.26 | V | 0.59 | 1.26 | - | - |
| 51 | I | 0.44 | V | 0.52 | 0.72 | - | - |
| 59 | H | 0.11 | G | 0.56 | 2.36 | - | - |
| 93 | E | 0.3 | D | 0.52 | 0.69 | R122 | - |
| 119 | N | 0.3 | H | 0.63 | -0.80 | W115, R122 | - |
| 132 | I | 0.07 | V | 0.59 | 0.92 | - | - |
| 159 | R | 0.07 | E | 0.78 | -0.62 | E200 | - |
| 161 | L | 0.22 | M | 0.59 | 0.01 | - | DhaA102 |
| 162 | I | 0.3 | V | 0.56 | 0.48 | - | DhaA102 |
| 163 | I | 0.07 | L | 0.7 | -0.37 | - | DhaA100 |
| 169 | I | 0.37 | V | 0.52 | 0.76 | - | - |
| 184 | V | 0.04 | E | 0.52 | -0.59 | - | DhaA100 |
| 197 | V | 0.04 | E | 0.52 | -0.20 | - | DhaA100 |
| 198 | D | 0.19 | S | 0.56 | -0.45 | - | DhaA102 |
| 200 | E | 0.04 | R | 0.59 | -0.59 | R159 | - |
| 202 | L | 0.19 | T | 0.52 | 3.07 | - | - |
| 203 | W | 0.15 | L | 0.78 | 0.67 | F152, N207 | - |
| 205 | F | 0.26 | W | 0.7 | 2.30 | - | - |
| 207 | N | 0.15 | R | 0.81 | 1.88 | F152, W203 | - |
| 218 | I | 0.07 | V | 0.7 | 0.56 | - | - |
| 241 | G | 0.15 | A | 0.67 | 0.80 | - | - |
| 267 | I | 0.11 | V | 0.63 | 0.92 | - | - |
| 273 | Y | 0.15 | F | 0.67 | 0.30 | N41 | - |
| 285 | E | 0.11 | A | 0.52 | -0.23 | K263 | - |

ᵃ - The most conserved residue at a given position of the multiple sequence alignment
ᵇ - Frequency of the most conserved residue at a given position of the multiple sequence alignment

Table S 7.
Results of the frequency ratio analysis of the HLD family.

| Position | Residue | Frequency | Res_TOPᵃ | Freq_TOPᵇ | Frequency ratio | FoldX ΔΔG [kcal.mol⁻¹] | Interactions | Mutant |
|---|---|---|---|---|---|---|---|---|
| 27 | V | 0.05 | E | 0.49 | 0.09 | 1.42 | - | |
| 188 | H | 0.07 | A | 0.51 | 0.14 | -0.04 | | DhaA103 |
| 191 | E | 0.1 | A | 0.55 | 0.19 | 0.10 | | DhaA103 |
| 271 | L | 0.09 | G | 0.57 | 0.16 | 2.92 | | |

ᵃ - The most conserved residue at a given position of the multiple sequence alignment
ᵇ - Frequency of the most conserved residue at a given position of the multiple sequence alignment

Table S 8. Results of the simple consensus analysis of the HLD family.

| Position | Residue | Frequency | Res_TOPᵃ | Freq_TOPᵇ | FoldX ΔΔG [kcal.mol⁻¹] | Interactions | Mutant |
|---|---|---|---|---|---|---|---|
| 44 | S | 0.3 | W | 0.69 | 12.12 | Y46 | - |
| 55 | V | 0.16 | L | 0.53 | 0.31 | - | DhaA103 |
| 109 | S | 0.21 | G | 0.71 | 2.13 | D106, I132 | - |
| 111 | L | 0.37 | I | 0.55 | 1.26 | - | - |
| 127 | A | 0.25 | V | 0.54 | -2.32 | - | DhaA103 |
| 130 | E | 0.3 | N | 0.67 | 0.21 | V245, L246, I247, L271, H272 | - |
| 188 | H | 0.07 | A | 0.51 | -0.04 | - | DhaA103 |
| 191 | E | 0.1 | A | 0.55 | 0.10 | - | DhaA103 |
| 209 | L | 0.13 | I | 0.54 | 0.64 | - | - |
| 244 | G | 0.31 | D | 0.68 | 14.59 | - | - |
| 271 | L | 0.09 | G | 0.57 | 2.92 | - | - |
| 273 | Y | 0.28 | F | 0.51 | 0.30 | N41 | - |

ᵃ - The most conserved residue at a given position of the multiple sequence alignment
ᵇ - Frequency of the most conserved residue at a given position of the multiple sequence alignment

Table S 9. Results of the energy-based analysis of LinA.

| Position | Residue | Mutation | FoldX ΔΔG [kcal.mol⁻¹] | Rosetta ΔΔG [kcal.mol⁻¹] | Interactions | Mutant |
|---|---|---|---|---|---|---|
| 3 | D | I | -1.234 | -3.017 | | |
| 3 | D | L | -1.664 | -2.867 | | |
| 19 | D | M | -1.460 | -2.388 | R79 | |
| 127 | S | Y | -2.165 | -1.952 | | LinA01 |
| 145 | A | H | -1.254 | -4.888 | | LinA01 |
| 133 | T | I | -2.138 | -3.423 | | LinA01 |
| 133 | T | W | -2.494 | -3.308 | | |
| 133 | T | L | -1.323 | -1.986 | | |

Table S 10. Results of the consensus analysis of LinA.
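| Position | Residue | Frequency | Res_TOP^a | Freq_TOP^b | FoldX ΔΔG [kcal.mol⁻¹] | UniProt database | Mutant |
|---|---|---|---|---|---|---|---|
| 20 | K | 0.62 | Y | 0.15 | -1.437 | Halide-stabilizing | |
| 23 | A | 0.69 | G | 0.23 | 1.778 | | |
| 32 | L | 0.62 | F | 0.38 | 4.379 | | |
| 50 | Y | 0.54 | F | 0.38 | -0.510 | | LinA02 |
| 56 | A | 0.54 | I | 0.38 | 5.508 | | |
| 59 | L | 0.62 | A | 0.38 | 3.642 | | |
| 68 | F | 0.62 | W | 0.31 | 0.000 | | LinA02 |
| 80 | L | 0.54 | V | 0.38 | 1.368 | | |
| 88 | V | 0.62 | A | 0.38 | 2.694 | | |
| 96 | L | 0.77 | C | 0.15 | 2.777 | | |
| 109 | I | 0.62 | V | 0.23 | 0.665 | | |
| 113 | F | 0.69 | Y | 0.23 | 0.138 | Activity decrease | |
| 126 | F | 0.54 | I | 0.23 | 1.624 | | |
| 131 | A | 0.62 | V | 0.15 | -1.492 | | LinA02 |
| 144 | F | 0.54 | L | 0.23 | 1.316 | | |

a - The most conserved residue at a given position of the multiple sequence alignment
b - Frequency of the most conserved residue at a given position of the multiple sequence alignment

Table S 11. Predicted effects of DhaA115 and LinA01 mutations on their stability.

| Origin | Enzyme | Mutation | FoldX ΔΔG [kcal.mol⁻¹] | Rosetta ΔΔG [kcal.mol⁻¹] | Location | Secondary structure | Rank^a | Structural basis of stabilization |
|---|---|---|---|---|---|---|---|---|
| Energy-based approach | DhaA | C128F | -2.2 | -8.5 | Buried | Sheet | 250 | Improved packing |
| Energy-based approach | DhaA | T148L | -2.0 | -2.0 | Tunnel | Helix | 41 | Enhanced hydrophobic interactions |
| Energy-based approach | DhaA | A172I | -2.8 | -2.2 | Tunnel | Helix | 151 | Enhanced hydrophobic interactions |
| Energy-based approach | DhaA | C176F | -2.2 | -7.1 | Tunnel | Loop | 169 | Improved packing |
| Energy-based approach | DhaA | D198W | -1.4 | -4.5 | Surface | Helix | 112 | Improved packing |
| Energy-based approach | DhaA | V219W | -1.8 | -3.0 | Buried | Helix | 109 | Improved packing |
| Energy-based approach | DhaA | C262L | -1.6 | -4.9 | Buried | Sheet | 66 | Enhanced hydrophobic interactions |
| Energy-based approach | DhaA | D266F | -2.3 | -2.4 | Surface | Sheet | 115 | Improved packing |
| Energy-based approach | LinA | D3I | -1.2 | -3.0 | Surface | Helix | 151 | Improved packing |
| Energy-based approach | LinA | S127Y | -2.2 | -2.0 | Buried | Sheet | 58 | Improved packing |
| Energy-based approach | LinA | T133I | -2.1 | -4.9 | Buried | Sheet | 38 | Enhanced hydrophobic interactions |
| Energy-based approach | LinA | A145H | -1.3 | -3.4 | Surface | Loop | 91 | Improved packing |
| Evolution-based approach | DhaA | E20S | 0.4 | 0.5 | Surface | Sheet | 96 | Neutral/electrostatics |
| Evolution-based approach | DhaA | F80R | 0.4 | 1.6 | Surface | Helix | 98 | Neutral/electrostatics |
| Evolution-based approach | DhaA | A155P | -0.8 | -0.7 | Surface | Loop | 4 | Increased rigidity |

a - The listed variants are ordered from the most flexible to the most rigid according to average residue B-factors

Table S 12.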
Examples of methods providing enzymes with outstanding stabilization.

| Approach | Method | Principle | Protocol (experimental work) | Tested / successful mutants (experimental work) | Enzyme | Stability improvement | Relative activity^a | Number of mutations | Reference |
|---|---|---|---|---|---|---|---|---|---|
| Directed evolution | GSSM | Saturating every position | SSM | 121,000 / 10 | Haloalkane dehalogenase | ΔTm = +18°C | 150% | 8 | 221 |
| Directed evolution | GSSM | Saturating every position | SSM | 74,000 / 10 | Xylanase | ΔTm = +35°C | 100% | 9 | 244 |
| Directed evolution | RM | Introducing point mutations randomly | epPCR | 19,000 / 16 | Phosphite dehydrogenase | ΔT50^10 = +20°C | 160% | 12 | 245 |
| Computational prediction of hotspots | B-FIT | Targeting the most flexible residues | ISM | 19,000 / 61 | Epoxide hydrolase | ΔTm = +21°C | 500% | 10 | 207 |
| Computational prediction of hotspots | HotSpot Wizard | Targeting tunnel residues | SSM | 5,000 / 5 | Haloalkane dehalogenase | ΔTm = +19°C | 40%^b | 4 | 220 |
| Computational prediction of hotspots | PISA | Targeting interface residues | ISM | 4,000 / 17 | D-tagatose 3-epimerase | ΔT50^20 = +23°C | 64% | 8 | 71 |
| Computational prediction of single-point mutants | FRESCO | Disulfide bridge design, free energy calculations | SDM | 67 / 24 | Epoxide hydrolase | ΔTm = +36°C | 250% | 10 | 211 |
| Computational prediction of single-point mutants | ROSETTAVIP | Improving packing in protein interior by free energy calculation | SDM | 6 / 4 | Methionine aminopeptidase | ΔTm = +18°C | 70% | 5 | 85 |
| Computational prediction of single-point mutants | SCADS | Environmental energy optimization | SDM | 1 / 1 | Tobacco 5-epi-aristolochene synthase | ΔTm = +45°C | 2% | | 224 |
| Computational prediction of multiple-point mutants | FIREPROT | Free energy calculations and consensus design | Gene synthesis | 6 / 4 | Haloalkane dehalogenase | ΔTm = +25°C | 128% | 11 | This study |

Tm - melting temperature; T50^X - temperature at which 50% of activity is lost after X minutes of incubation; RM - random mutagenesis; epPCR - error-prone polymerase chain reaction; SSM - site saturation mutagenesis; ISM - iterative saturation mutagenesis; SDM - site-directed mutagenesis
a - Activity of a mutant compared to the wild-type at temperatures optimal for each protein
b - Activities measured at 37°C in 40% DMSO
c - Predicted single-point mutations were
recombined in a multipoint mutant

5 Computer-Assisted Engineering of Hyperstable Fibroblast Growth Factor 2

Pavel Dvorak 1,2,*, David Bednář 1,2,**, Pavel Vanacek 1,2,*, Lukas Balek 2,#, Livia Eiselleova 3,#, Veronika Štěpánková 1,4,5, Eva Šebestová 1,2, Michaela Kunová Bosákova 3, Žaneta Konečná 3, Stanislav Mazurenko 1,2, Antonín Kuňka 1,2, Daniel Horák 6, Radka Chaloupková 1,2,4, Jan Brezovsky 1,2,4, Pavel Krejci 2,3,4, Zbyněk Prokop 1,2,4,5, Petr Dvorak 3,4,*, Jiri Damborsky 1,2,4,5,*

1 Loschmidt Laboratories, Centre for Toxic Compounds in the Environment RECETOX and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
2 Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
3 Department of Biology, Faculty of Medicine, Masaryk University, 625 00 Brno, Czech Republic
4 International Clinical Research Center, St. Anne's University Hospital, Pekařská 53, 656 91 Brno, Czech Republic
5 Enantis Ltd., Biotechnology Incubator INBIT, Kamenice 34, 625 00 Brno, Czech Republic
6 Institute of Macromolecular Chemistry, Academy of Sciences of the Czech Republic, Heyrovskeho 2, 162 06 Prague 6, Czech Republic
# authors contributed equally

Manuscript under review in Scientific Reports

5.1 Abstract

Fibroblast growth factors (FGFs) perform numerous regulatory functions in complex organisms, and their therapeutic potential is of growing interest to academic and industrial researchers alike. However, applications of these proteins are limited by their low stability in vivo and in vitro [190]. Here we tackle this problem using a generalizable computer-assisted protein engineering strategy to create a modified FGF2 with nine mutations displaying unprecedented stability and uncompromised biological function.
5.2 Main paper

The structurally related and highly conserved polypeptides of the FGF family are involved in a number of physiological processes in diverse animal species, including humans. Considerable effort has been invested in investigating members of the FGF family for applications in pharmacy and bioengineering [190]. Specifically, human FGF2 serves as a pleiotropic regulator of proliferation, differentiation, migration, and survival in a variety of cell types and has been studied as a promising agent in the treatment of cardiovascular diseases [246], cancer [247] and mood disorders [248]. It has also been shown to have efficacy in ulcer, wound [249,250] and epithelium healing [251] and is routinely used as an essential component of cultivation media for human embryonic stem cells [252]. The long-term maintenance of growth factors in the tissue or media is desirable for protein therapies and stem cell culturing, but is hindered by the low thermal stability of the molecules and their limited half-life [190,253,254]. The stability of FGFs has been enhanced by co-administration with heparin [255], conjugation to heparin-mimicking polymers [256], encapsulation in microspheres [196], or fusion with proteoglycan [257]. However, these strategies have various limitations: they might significantly affect protein activity and raise concerns from both safety and economic standpoints. Development of soluble, heparin-independent FGF analogues with improved stability is clearly needed for broader use of these interesting molecules [190].

Protein engineering is a powerful approach for protein stabilization [220,258]. Stabilization by engineering has previously been applied to the highly unstable FGF1, providing up to 40-fold improved mitogenic activity half-life by introducing stabilizing mutations at the N- and C-terminus β-strand interactions of the β-barrel architecture [259,260].
Triple [261] and quintuple [262] FGF2 mutants with improved stability and up to 10-fold prolonged activity in cell culture were recently prepared by aligning protein sequences of wild-type FGF2 with previously reported stabilized FGF1 mutants or by employing and combining individual stabilizing mutations published elsewhere. Nevertheless, the true potential of state-of-the-art protein engineering strategies has not yet been exploited in the design of stable FGFs. Here, we describe the engineering of a unique nine-point mutant of the low molecular weight isoform of FGF2 with melting temperature (Tm) increased by 19°C and in vitro functional half-life at 37°C improved from 10 hours to more than 20 days. This was achieved by following a computer-assisted engineering strategy combining energy-based and evolution-based analyses [263] with focused directed evolution (Figure 16). We demonstrate that the developed molecule holds great promise for both in vitro and in vivo applications.

Using our platform, we proposed 12 single-point mutations (R31L, R31W, V52T, H59F, C78Y, N80G, L92Y, C96Y, S109E, R118W, T121K, and V125L) and 7 positions for randomization (E54, C78, R90, S94, C96, T121, and S152) to stabilize FGF2 by computationally searching for mutations that would minimize the Gibbs free energy of the native state, combined with a back-to-consensus analysis (Figure 16 step I, Figure S 4, Table S 5). Three of the twelve most stabilizing substitutions were identified by the phylogenetic approach [219] and nine mutations were predicted from energetics using FoldX [94] and Rosetta ddg_monomer [88] (Table S 13, Table S 14). Seven positions with the highest number of potentially stabilizing mutations, defined by a change in free energy (ΔΔG) < -1 kcal.mol⁻¹, were considered for randomization in order to look for further beneficial substitutions overlooked by the rational approach (Table S 5).
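The choice of positions for randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_positions`, the dictionary layout and all ΔΔG values are invented for the example and do not come from the study; only the -1 kcal.mol⁻¹ cut-off and the exclusion of functionally relevant positions follow the text.

```python
from collections import Counter

DDG_CUTOFF = -1.0  # kcal/mol; substitutions below this are counted as stabilizing


def rank_positions(ddg_by_mutation, excluded_positions=frozenset(), top_n=7):
    """Count predicted stabilizing substitutions per position.

    ddg_by_mutation maps (position, new_residue) -> predicted ddG (hypothetical layout).
    Returns the top_n (position, count) pairs, highest count first.
    """
    counts = Counter()
    for (position, _new_residue), ddg in ddg_by_mutation.items():
        if position in excluded_positions:
            continue  # e.g. heparin- or receptor-binding residues
        if ddg < DDG_CUTOFF:
            counts[position] += 1
    return counts.most_common(top_n)


# Invented numbers, for illustration only.
toy_predictions = {
    (54, "D"): -1.4, (54, "Q"): -1.2, (54, "A"): -0.3,
    (90, "K"): -2.0,
    (31, "L"): -3.0, (31, "W"): -1.5, (31, "F"): -1.1,
}
print(rank_positions(toy_predictions, excluded_positions={31}, top_n=2))
```

Counting substitutions per position, rather than taking the single best ΔΔG, favours sites where many different residues are predicted to help, which is where a saturation library is most likely to pay off.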
Functionally relevant positions, namely those residues within heparin or receptor binding sites, were excluded from the designs to avoid mutations compromising activity.

[Figure 16: workflow diagram with the variants FGF2-G0, FGF2-G1A, FGF2-G2 and FGF2-G3]

Figure 16. Integrated strategy combining computational analyses with focused directed evolution for engineering hyperstable FGF2. The workflow starts with the template wild-type molecule FGF2-G0. Initially, 12 computationally designed point mutants (FGF2-G1A) were constructed and tested, leading to 7 stabilizing substitutions at 6 different positions with ΔTm between 0.9 and 3.7°C. Six selected substitutions were recombined, giving rise to the second generation mutant FGF2-G2 with ΔTm of 15°C. Subsequently, 420 clones from 7 focused site-saturation mutagenesis libraries were screened for enhanced thermostability and retained biological activity (FGF2-G1B), revealing 14 beneficial substitutions at 5 randomized positions with ΔTm between 0.3 and 2.9°C. Guided by computational predictions, 5 substitutions from rational design were combined with 4 mutations from semi-rational design, providing the third generation protein FGF2-G3 with ΔTm of 19°C.

For the purpose of in vitro mutagenesis (Figure 16 steps II and IV), the gene encoding wild-type FGF2 was subcloned into the vector pET28b with a cleavable N-terminal His tag, giving rise to the recombinant variant designated FGF2-G0 (Table S 5). The numbering of mutations used herein corresponds to the original sequence of wild-type FGF2 (c). The genes of 12 rationally designed single-point mutants representing the first generation of engineered FGF2 (FGF2-G1A) were synthesized. All twelve mutants were produced in soluble form in Escherichia coli BL21(DE3) in quantities similar to that of FGF2-G0 (± 5% of the total soluble protein) and purified to homogeneity using affinity chromatography.
Biophysical characterization (Figure 16 step II) verified proper folding of all 12 mutants and improved Tm for 7 out of 12 constructed variants (Table S 15). Six beneficial mutations (R31L, V52T, H59F, L92Y, C96Y, and S109E) were recombined in the cumulative mutant FGF2-G2 (Figure 16 step III), which was obtained at a yield of 20 mg.L⁻¹. The experimental Tm of properly folded FGF2-G2 (68.0 ± 0.2°C) was about 14.5°C higher than the Tm of FGF2-G0 (53.5 ± 0.5°C). This value corresponds well with the theoretical sum of the contributions of individual substitutions, suggesting a strictly additive stabilizing effect of the mutations in FGF2-G2 (Table S 16).

In the next step, 7 computationally pre-selected positions in FGF2-G0 were randomized (Figure 16 step IV) using fixed oligo technology, which reduced the screening effort to only 60 clones per library to obtain full coverage. Crude extracts of individual E. coli clones were prepared in microtiter plates and used to test simultaneously for biological activity and stability of FGF2 mutants in a growth arrest assay with rat chondrosarcoma cells (Figure S 6). Altogether, 33 new FGF2-G1B mutants were identified during the screening and successfully produced in E. coli at levels similar to that of FGF2-G0. Thermal shift assays with purified proteins revealed 14 stabilizing amino acid substitutions in 5 out of 7 randomized positions (Table S 17). Mutations improving thermostability by at least 1°C (E54D, S94I, C96N, and T121P) were computationally merged with the existing mutations in FGF2-G2 (R31L, V52T, H59F, L92Y, C96Y, and S109E, while replacing C96Y by C96N), leading to the third generation variant FGF2-G3 (Figure 16 steps V and VI and Figure 17a).

[Figure 17, panel table] Half-life of secondary structure of FGF2 variants (in hours).

| Temp. (°C) | G0 | G2 | G3 |
|---|---|---|---|
| 37 | 6.3 | >24 | >24 |
| 50 | 2.0 | >24 | >24 |
| 65 | <0.1 | 2.6 | >24 |
| 70 | <0.1 | 0.3 | 1.6 |

[Figure 17, panel label] Thermal unfolding

Figure 17.
Biophysical and biological characteristics of FGF2 variants with boosted stability. (a) Structural model of the most stable FGF2-G3 variant with visualized amino acid substitutions originating from the rational (yellow spheres) and semi-rational (red spheres) steps of the engineering strategy. (b) Circular dichroism (CD) spectra of selected FGF2 variants exhibiting a broad positive peak centered near 227 nm and a minimum at around 204 nm, characteristic of β-rich proteins of the β-II type. (c) Half-life of the secondary structure of selected FGF2 variants determined by CD spectroscopy. (d) The schematic unfolding pathway of the variants described by a three-step model with irreversible unfolding of the native state (N) through one intermediate (I) to denatured protein (D), followed by formation of aggregates (A); Gibbs free energies of the states depicted in grey were not quantified because of irreversibility. (e) Determination of the in vitro functional half-life of selected FGF2 variants by activation of the ERK pathway. Both FGF2-G2 and FGF2-G3 retained most of their original activity for the 20 days of the experiment, while FGF2-G0 lost half of its activity within the initial 10 hours, as determined by densitometric analysis of the corresponding Western blots (Figure S 11). (f) FGF2-G2 and FGF2-G3 support proliferation of human embryonic stem cells better than the wild type. Columns show means, error bars represent standard error of the mean from three independent experiments. Student's t-test, **p<0.01, *p<0.05. Abbreviations: MRE, mean residual ellipticity; Ctrl, control with no FGF2; h, hours; d, days.

Purified FGF2-G3, obtained with a yield of 11 mg.L⁻¹, was initially characterized in vitro using a wide spectrum of biophysical methods (Figure 16 step VII).
The properly folded mutant (Figure 17b) exhibited a Tm of 72.2 ± 0.1°C and a ΔTm of 18.7°C compared with FGF2-G0, again showing the additive stabilizing effect of all new mutations, precisely predicted by the free energy calculation (theoretical ΔTm of 19.2°C). In terms of structural integrity at elevated temperatures, FGF2-G3 clearly outperformed both FGF2-G0 and FGF2-G2, showing a half-life of its secondary structure of >24 h at 50°C and 1.6 h at 70°C (Figure 17c and Figure S 7). Fitting the data to a two-step unfolding mechanism suggested that the gained stability was due to an increase in the Gibbs activation energy of the first unfolding step, which is in excellent agreement with our overall computational strategy of lowering the energy of the native state (Figure 17d, Supplementary Results, Table S 18 and Figure S 8, Figure S 9, Figure S 10).

Biological assays confirmed that the enhanced stability of the engineered growth factors translated into their prolonged activity in vitro and in vivo (Figure 16 step VII). Specifically, while the activity of FGF2-G0 in an ERK1/2 assay with human embryonic stem cells dropped to 50% within the initial 10 ± 2 hours of pre-incubation in conditioned medium at 37°C (Figure 17e and Figure S 11), which is in good agreement with data obtained previously by different techniques [190,264], the G2 and G3 mutants retained most of their activity for the full length of the study period (20 days) at the physiological temperature. In the second assay, the human embryonic stem cells were propagated in medium conditioned with each of the tested FGF2 variants with no additional supplementation, and the cell numbers and morphology were recorded for five consecutive passages (Figure 17f and Figure S 12).
While conditioned medium prepared with FGF2-G0 caused significant growth retardation, the cells incubated in the medium with either the G2 or the G3 mutant gave rise to monolayers, suggesting that repeated supplementation of the conditioned medium is not required with these proteins. Immunostaining for Oct-4 and Nanog after five passages on Matrigel proved that all tested FGF2 variants support expression of pluripotency markers (Figure S 13).

The biological activity of the stabilized variant was confirmed in vivo by injecting either FGF2-G0 or FGF2-G3, sorbed into a non-biodegradable hydrogel, into the shaved dorsal skin of telogenic C57BL/6 mice (Figure S 14 and Figure S 15). Growth factors from the FGF family are known to have a positive effect on hair growth and hair follicle stimulation [265]. We evaluated the degree of black pigmentation and hair growth photometrically by observing the skin color for 20 days. Within 2 weeks, both FGF2 variants induced black coloration in the shaved skin, with remarkably stronger hair growth induction in the group injected with thermostable FGF2 (Figure 18a, Figure S 15). The control group with empty hydrogel showed significant retardation in the transition from the telogen to the anagen stage, as seen from skin coloration. The length of hairs plucked in the proximity of the injection site confirmed that thermostable FGF2 stimulated hair growth more strongly than FGF2-G0 and the control group with no FGF2 applied (Figure 18b). No toxic effects of the recombinant proteins were observed.

[Figure 18: panels a and b; photographs taken on day 1, day 13 and day 16]

Figure 18. Effect of the selected FGF2 variants on hair growth promotion. (a) 7-week-old C57BL/6 mice were shaved and injected with empty hydrogel or hydrogel sorbed with FGF2-G0 or FGF2-G3, and the hair growth was recorded during 20 consecutive days. (b) Hair length of C57BL/6 mice measured on day 20 post application of FGF2 variants. Hairs (n=20) were plucked in the proximity of the injection site.
Columns show means, numbers within indicate the quantity of hairs counted per group, error bars show standard error of the mean. Student's t-test, ***p<0.001 versus control group.

In summary, we constructed a hyperstable nine-point mutant FGF2-G3 using a hybrid computational engineering strategy [263] and focused directed evolution by constructing only 14 mutants and testing only 420 clones. FGF2-G3 shows a 19°C increase in melting temperature and a greater than 48-fold improved half-life at 37°C, representing the most thermostable FGF2 with fully preserved biological activity known to date. Our hyperstable FGF2 supports the undifferentiated growth of human embryonic stem cells, induces the appropriate downstream signaling cascade, and stimulates hair growth in a mouse model, demonstrating that its biological activity towards highly sensitive cells remained unmodified. We anticipate that this construct will be directly applicable to stem cell culturing [196], and will find broad use in clinical medicine [190,256], cosmetics [265], and dietary supplements. The successful demonstration of rapid protein stabilization highlights the power of our rational protein engineering strategy and should encourage wider use of the described workflow for stabilizing growth factors and other protein therapeutics that are currently being tested for use in cancer treatment, regenerative medicine, or a number of metabolism-associated disorders [190].

Acknowledgements

The work was supported by the Grant Agency of the Czech Republic (GA16-06096S and GA15-23033S), the Ministry of Education of the Czech Republic (LO1214, LQ1605, LM2015051, LM2015047 and LM2015055), and the Ministry of Health of the Czech Republic (15-33232A). MetaCentrum and CERIT-SC are acknowledged for providing access to computing facilities (LM2015042, LM2015085). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
5.3 Methods

5.3.1 Prediction of stabilizing effect of single-point mutations by evolution-based approach

Multiple sequence alignment and evolutionary conservation analysis. The FGF2 isoform 3 protein sequence (UniProt identifier: P09038-2) was used as a query for a PSI-BLAST [226] search against the nr database of NCBI. PSI-BLAST was performed with an E-value threshold of 10⁻¹ for the initial BLAST search and a threshold of 10⁻⁵ for inclusion of a sequence in the position-specific matrix. Sequences collected after 3 iterations of PSI-BLAST were clustered by CD-HIT [228] at the 90% identity threshold. The resulting dataset of 554 sequences was clustered with CLANS [229] using default parameters and varying P-value thresholds. Sequences clustered together with FGF2 at the P-value of 10⁻³⁰ were extracted and aligned with the MUSCLE program [230]. The alignment was refined manually in BioEdit [266]. All incomplete or diverged sequences were removed. The final alignment comprising 238 sequences was used to estimate the level of conservation of individual sites within the FGF2-related proteins. Relative evolutionary rates for individual positions were calculated by the Rate4Site v2.01 program [216] using the empirical Bayesian method [267] and the WAG model of evolution [232]. The evolutionary rates were then converted to the ConSurf conservation scale [217].

Selection of individual mutations. The multiple sequence alignment comprising 238 FGF2 protein sequences was used as an input for back-to-consensus analysis using the simple consensus approach. The analysis was performed using a consensus cut-off of 0.5, meaning that a given residue must be present at a given position in at least 50% of all analysed sequences to be assigned as the consensus residue. Stability effects of all possible single-point mutations in the FGF2 protein were estimated by free energy calculations (see following section for the calculation details).
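The simple consensus rule with a 0.5 cut-off can be illustrated with a minimal Python sketch. It is not the tool used in the study: the function name `back_to_consensus`, the convention that the query is the first aligned sequence, and the four toy sequences are all assumptions made for the example. Frequencies are taken over all sequences in the alignment, as the 50% criterion above states.

```python
from collections import Counter


def back_to_consensus(alignment, query_index=0, cutoff=0.5, gap="-"):
    """Suggest back-to-consensus substitutions for one sequence of an alignment.

    alignment: list of equal-length aligned sequences (strings).
    Returns (query_position, query_residue, consensus_residue, frequency) tuples,
    where query_position is 1-based and counts only non-gap query residues.
    """
    query = alignment[query_index]
    n_sequences = len(alignment)
    suggestions = []
    position = 0
    for column_index, query_residue in enumerate(query):
        if query_residue == gap:
            continue  # no residue to mutate at this column
        position += 1
        column = [sequence[column_index] for sequence in alignment]
        residue, count = Counter(column).most_common(1)[0]
        frequency = count / n_sequences
        # A consensus residue must reach the cut-off and differ from the query.
        if residue != gap and residue != query_residue and frequency >= cutoff:
            suggestions.append((position, query_residue, residue, frequency))
    return suggestions


# Invented toy alignment; the first sequence plays the role of the query.
toy_alignment = ["MKV", "MRV", "MRV", "MRL"]
print(back_to_consensus(toy_alignment))
```

In the toy alignment only the second column yields a suggestion, because R occurs in three of four sequences while the query carries K; candidates found this way would still need to pass the free energy filter described next.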
Only mutations with an average Gibbs free energy change (ΔΔG) < 1 kcal.mol⁻¹ predicted by both FoldX [94] and Rosetta [88] were considered as hot-spots for FGF2 stabilization. Functionally important sites of FGF2 were excluded because mutations there are potentially deleterious for biological function. Results of the back-to-consensus analysis are summarized in Supplementary Table S1. The numbering corresponds to the sequence of wild-type human FGF2 (Supplementary Fig. 2c). Ten mutations were excluded based on the high value of predicted ΔΔG or the high uncertainty of the prediction, and three mutations were discarded from the design due to their location at positions functionally important for heparin binding. Four single-point mutations V52T, N80G, L92Y and S109E passed all criteria and were selected for experimental construction and characterization.

5.3.2 Prediction of stabilizing effect of single-point mutations by energy-based approach, selection of positions for randomization

Available structures of FGF2 with resolution higher than 2.20 Å (PDB-ID codes: 1BFG, 4FGF, 2FGF, 1BAS, 1BFB, 1BFC, 1BFF, 1EV2, 1FGA) were downloaded from the RCSB Protein Data Bank [234]. The structures were visualized in the PyMOL molecular graphics system v 1.7.7.4 (Schrodinger LLC, USA) and prepared for analyses by removing ligands and water molecules. Chain A was chosen in the case of multiple-chain structures. Missing atoms in side chains were added by a module of FoldX [94].

Prediction of stability effects by FoldX. Stability effects of all possible single-point mutations were estimated using the FoldX module. Calculations were performed five times for each mutant following the recommended protocol (pH 7, temperature 298 K, ionic strength 0.050 M, VdWDesign 2). For each mutation, the total ΔΔG value was calculated by averaging all ΔΔG values obtained for the respective mutation in all analyzed FGF2-G0 crystal structures.

Prediction of stability effects by Rosetta.
Repaired structures were minimized by the minimize_with_cst module of Rosetta [88] with both backbone and side-chain optimization enabled (-sc_min_only false), the distance for the full-atom pair potential set to 9 Å (-fa_max_dis 9.0), standard weights for the score function and a constraint weight of 1 (-constraint_weight 1.0). Output from the minimization was used to constrain Cα atoms with a harmonic function within 0.5 Å of the initial position in the crystal structure. Protocol 16, incorporating backbone flexibility within the ddg_monomer module of Rosetta, was applied according to Kellogg and co-workers [88]. The soft-repulsive design energy function (soft_rep_design weights) was used for repacking side-chains (-sc_min_only false). Optimization was performed on each whole protein without distance restriction (-local_opt_only false). The previously created constraint file was used during backbone minimization (-min_cst true). Three rounds of optimization with increasing weight on the repulsive term (-ramp_repulsive true) were applied. The minimum energies from 20 iterations were used as the final parameters describing the stability effects of single-point mutations.

Selection of individual mutations. Mutations predicted as stabilizing by either the FoldX or the Rosetta tool (ΔΔG < -1.0 kcal.mol⁻¹) and not significantly constrained during evolution (ConSurf conservation score < 8) were selected for further analysis. In this way, potentially stabilizing mutations with only a limited influence on functional regions, e.g., heparin binding residues, were identified. Residues forming the FGF2/FGFR1 interface (PDB-ID code 1CVS) and the FGF2/FGFR2 interface (PDB-ID code 1EV2) were identified using the PISA server [268] and were discarded from the selection. Nine single-point substitutions were selected for experimental construction and characterization: R31W, R31L, H59F, C78Y, L92Y, C96Y, R118W, T121K and V125L (Supplementary Table 2).
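The three selection criteria above (stabilizing according to FoldX or Rosetta, ConSurf score below 8, and not part of a receptor interface) amount to a simple filter. The sketch below is a hypothetical illustration: the `Prediction` record, the function `select_candidates`, the mutation labels and every number in the toy data are invented, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    mutation: str        # label such as "A10V" (hypothetical)
    position: int        # residue number of the mutated site
    foldx_ddg: float     # kcal/mol
    rosetta_ddg: float   # kcal/mol
    consurf_score: int   # 1 (variable) to 9 (conserved)


def select_candidates(predictions, interface_positions,
                      ddg_cutoff=-1.0, max_conservation=8):
    """Keep mutations that pass the energy, conservation and interface filters."""
    selected = []
    for p in predictions:
        stabilizing = p.foldx_ddg < ddg_cutoff or p.rosetta_ddg < ddg_cutoff
        variable_enough = p.consurf_score < max_conservation
        outside_interface = p.position not in interface_positions
        if stabilizing and variable_enough and outside_interface:
            selected.append(p.mutation)
    return selected


# Invented values, for illustration only.
toy_predictions = [
    Prediction("A10V", 10, -1.3, -0.4, 3),   # passes: FoldX alone is enough
    Prediction("S20Y", 20, -0.2, -0.6, 2),   # fails: not stabilizing
    Prediction("G30W", 30, -2.5, -3.1, 9),   # fails: too conserved
    Prediction("K40L", 40, -1.8, -2.2, 4),   # fails: interface residue
]
print(select_candidates(toy_predictions, interface_positions={40}))
```

Accepting a mutation when either tool calls it stabilizing widens the candidate pool; the conservation and interface filters then remove the candidates most likely to harm function.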
Interestingly, the L92Y mutation had also been identified by the evolution-based approach. The numbering of these mutants corresponds to the sequence of wild-type human FGF2 (Supplementary Fig. 2c).

Selection of positions for saturation mutagenesis. Positions for saturation mutagenesis that should reveal additional stabilizing mutations were proposed using Rosetta calculations. Every protein position was saturated with all twenty proteinogenic amino acids and the number of stabilizing mutations (ΔΔG < -1.0 kcal.mol⁻¹) was determined for individual positions. Positions with a conservation score > 7 or situated in the functional regions were discarded. Seven positions with the highest number (> 3) of stabilizing mutations (E54, C78, R90, S94, C96, T121, and S152) were selected for saturation mutagenesis (Supplementary Table 2). Positions 31 and 59 were discarded from the selection because a significant improvement in thermostability (ΔTm = 4°C and 3°C, respectively) had been verified experimentally for the mutations R31L and H59F (Supplementary Table 3). Therefore, the probability of further considerable improvement was negligible.

5.3.3 Construction, production and characterization of single point FGF2-G1A variants

Twelve FGF2-G1A variants R31W, R31L, V52T, H59F, C78Y, N80G, L92Y, C96Y, S109E, R118W, T121K and V125L were commercially synthesized (GeneArt/Life Technologies, Germany) and subcloned into the NdeI and XhoI sites of pET28b-His-thrombin downstream of the inducible T7 promoter. E. coli BL21(DE3) cells were transformed with the expression vectors, plated on agar plates with kanamycin (50 µg.ml⁻¹) and grown overnight at 37°C. Single colonies were used to inoculate 10 ml of LB medium with kanamycin and the cells were grown overnight at 37°C. The overnight culture was used to inoculate 200 ml of LB medium with kanamycin. Cells were cultivated at 37°C. The expression was induced with IPTG to a final concentration of 0.25 mM. Cells were then cultivated overnight at 20°C.
At the end of cultivation, biomass was harvested by centrifugation and washed with purification buffer A (20 mM di-potassium hydrogenphosphate and potassium dihydrogenphosphate, pH 7.5, 0.5 M NaCl, 10 mM imidazole). Cells in suspension were disrupted by sonication using the ultrasonic processor Hielscher UP200S (Teltow, Germany) with 0.3 s pulses and 85% amplitude. The cell lysate was centrifuged for 1 h at 21,000 g at 4°C. FGF2 variants were purified from crude extracts using single-step nickel affinity chromatography. Crude extracts were applied to a 5 ml Ni-NTA Superflow column (QIAGEN, USA). The column was attached to an FPLC Akta system (Amersham Pharmacia Biotech, USA). The buffer system consisted of buffer A and buffer B (20 mM di-potassium hydrogenphosphate and potassium dihydrogenphosphate, pH 7.5, 0.5 M NaCl, 500 mM imidazole). FGF2 proteins were eluted with a one-step increasing linear gradient of 0 to 100% buffer B in 20 column volumes. The presence of FGF2 in peak fractions was confirmed by SDS-PAGE using a 15% polyacrylamide gel stained with Coomassie Brilliant Blue R-250 dye (Fluka, Buchs, Switzerland). Fractions with FGF2 were pooled and the concentration of total protein was determined by the Bradford method (Sigma-Aldrich, St. Louis, USA). Precipitation of FGF2 variants was minimized by dialysis against 20 mM potassium phosphate buffer containing 750 mM NaCl. Purified proteins were stored at 4°C.

Differential scanning calorimetry and circular dichroism spectroscopy. The thermostability of FGF2-G1 mutants was determined by a differential scanning calorimetry (DSC) assay. Thermal unfolding of 1.0 mg.ml⁻¹ protein solutions in 50 mM phosphate buffer (pH 7.5) with 750 mM sodium chloride was followed by monitoring the heat capacity using the VP-capillary DSC system (GE Healthcare, USA). The measurements were performed at temperatures from 20 to 80°C at a heating rate of 1°C.min⁻¹. Tm was evaluated as the top of the Gaussian curve after manual setting of the baseline.
Proper folding of all mutants was verified by circular dichroism (CD) spectroscopy. CD spectra of mutants dialyzed in 50 mM phosphate buffer pH 7.5 and diluted to a concentration of 0.2 mg.ml⁻¹ were recorded at 20°C using a Chirascan spectropolarimeter (Applied Photophysics, United Kingdom) equipped with a Peltier thermostat. Data were collected from 200 to 260 nm, at 100 nm.min⁻¹, with 1 s response time and 2 nm bandwidth, using a 0.1 cm quartz cuvette. Each spectrum is the average of five individual scans and is corrected for absorbance caused by the buffer. Collected CD data were expressed in terms of the mean residue ellipticity. Thermal unfolding was followed by monitoring the ellipticity at 232 nm over the temperature range from 20 to 80°C at a heating rate of 1°C.min⁻¹. Recorded thermal denaturation curves of FGF2 variants were normalized to represent signal changes between approximately 0 and 1 and fitted to sigmoidal curves. The melting temperatures were evaluated from the collected data as the midpoint of the normalized thermal transition.

5.3.4 Free energy calculations, construction and thermostability analysis of FGF2-G2 mutant

All mutations improving the melting temperature by at least 0.5°C were selected for in silico analysis using the Rosetta ddg_monomer application [88] as described earlier. In the case of two stabilizing mutations at the same position, the mutation with the larger effect was chosen. Selected mutations were then tested for potential antagonistic effects. The additivity of stabilizing mutations was evaluated by predicting the stability of variants with all pairs of stabilizing single-point mutations (Supplementary Table 4). Mutation pairs for which the respective double-point mutants showed lower stability in comparison with the sum of both single-point mutants taken separately would be considered antagonistic.
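The pairwise additivity test can be written down as a short check. The following Python sketch is an assumption-laden illustration, not the authors' script: the function name `antagonistic_pairs`, the dictionary layout and the numbers are invented, and the 1 kcal.mol⁻¹ tolerance is the difference threshold quoted in this section.

```python
def antagonistic_pairs(single_ddg, double_ddg, tolerance=1.0):
    """Flag mutation pairs whose double mutant is less stable than additivity predicts.

    single_ddg: mutation -> predicted ddG of the single-point mutant (kcal/mol).
    double_ddg: (mutation_1, mutation_2) -> predicted ddG of the double mutant.
    Returns (mutation_1, mutation_2, deviation) for deviations above the tolerance.
    """
    flagged = []
    for (first, second), ddg_pair in double_ddg.items():
        additive_expectation = single_ddg[first] + single_ddg[second]
        # Positive deviation: the pair is less stabilizing than the sum of singles.
        deviation = ddg_pair - additive_expectation
        if deviation > tolerance:
            flagged.append((first, second, deviation))
    return flagged


# Invented values, for illustration only.
toy_singles = {"A": -1.0, "B": -2.0, "C": -0.5}
toy_doubles = {("A", "B"): -2.8, ("A", "C"): 0.2}
for first, second, deviation in antagonistic_pairs(toy_singles, toy_doubles):
    print(f"{first}+{second}: {deviation:.1f} kcal/mol above the additive expectation")
```

An empty result from such a check corresponds to the outcome reported for the selected FGF2 mutations, where no pair exceeded the threshold.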
However, none of the double-point mutants showed a difference > 1 kcal.mol⁻¹, suggesting the absence of any significant antagonistic effects among the selected mutations. All mutations improving Tm by at least 0.5°C (R31L, V52T, H59F, L92Y, C96Y and S109E) were combined into a 6-point mutant designated FGF2-G2. The gene of the multiple-point mutant was commercially synthesized (GeneArt/Life Technologies, Germany), subcloned into the NdeI and XhoI sites of pET28b-His-Thrombin and expressed in E. coli BL21(DE3) cells as described before. DSC was used to characterize the thermal stability of the protein purified by affinity chromatography. DSC data were collected over a temperature range of 20°C to 100°C.

5.3.5 Construction and screening of focused site-saturation mutagenesis libraries

Altogether, 7 focused site-saturation mutagenesis libraries were constructed commercially using the "Fixed Oligo" technology of GeneArt (Life Technologies, USA). The pET28b-His-thrombin::fgf2 plasmid was used as a template for randomization. Plasmid DNA was transformed into E. coli XJb (DE3) Autolysis cells (Zymo Research, USA). Cells were spread on LB agar plates with kanamycin (50 µg.ml⁻¹) and incubated overnight at 37°C. Plates with colonies carrying the negative control (empty pET28b), positive control (plasmid pET28b-His-thrombin::fgf2-G2) and background control (pET28b-His-thrombin::fgf2) were prepared in the same way.

Preparation of libraries for screening. Single colonies were used to inoculate individual wells of 1 ml 96 deep-well plates (Thermo Fisher Scientific, USA) containing 250 µl of LB medium with kanamycin (50 µg.ml⁻¹). Colonies were transferred by the colony-picking robot CP7200 (Hudson Robotics, USA) or with sterile wooden toothpicks. Plates were incubated overnight at 37°C with shaking (200 rpm) in an NB-205 shaking incubator (N-Biotec, South Korea) in a high-humidity chamber to avoid evaporation of the medium.
After 16 h, 50 µl of culture from each well was transferred to a new microtiter plate containing 50 µl of sterile 40% glycerol per well, and the resulting plates were stored at -70°C as replicas. Expression of the chromosomally inserted λ lysozyme and of the FGF2 mutant variants in the original plates, containing the remaining 200 µl of overnight culture, was induced by adding 800 µl of fresh LB medium with kanamycin, IPTG and L-arabinose to final concentrations of 50 µg.ml⁻¹, 0.25 mM and 3 mM, respectively. Plates were incubated overnight at 20°C with shaking (180 rpm). After 22 h, the plates were centrifuged for 20 min (3000 g, 4°C) using a Sigma 6-16K centrifuge (Sigma Laborzentrifugen, Germany). The supernatant was drained using a JP recirculating water aspirator (VELP Scientifica, Italy). Whole microtiter plates with cell pellets were frozen at -70°C. The plates were then incubated for 20 min at room temperature, and 100 µl of lysis buffer (20 mM sodium phosphate buffer, 150 mM NaCl, pH 7.0) was added to each well. Plates were incubated for 20 min at 30°C with shaking (200 rpm). Cell debris was removed from the resulting cell lysates. For each plate containing one of the libraries, the concentration of total soluble protein in the crude extract was determined in one well with the negative control, one well with the positive control, one well with the background control and 6 randomly selected wells with new FGF2-G1B mutants using Bradford reagent (Sigma-Aldrich, USA). Samples of crude extracts from the same wells were loaded on SDS polyacrylamide gels. The gels were analysed using a GS-800 Calibrated Densitometer (Bio-Rad, USA) and the content of FGF2 in the total soluble protein of each sample was determined. The concentration of FGF2 in crude extracts was calculated from these data. The plates with crude extracts were stored at -70°C for further use.

Screening of biological activity of FGF2-G1B variants using rat chondrosarcoma growth-arrest assay.
Rat chondrosarcoma (RCS) cells are an immortalized, phenotypically stable cell line that responds to minute concentrations of FGFs with potent growth arrest accompanied by marked morphological changes and extracellular matrix degradation. FGF receptor 3 (FGFR3) functions as a negative regulator of cell proliferation in this cell line. To inhibit cell proliferation, FGF variants have to specifically induce FGFR signal transduction, which allows FGF activity to be measured through the concentration dependence of the induced growth arrest. The major advantage of the RCS assay is the exclusion of toxic chemicals and false-positive hits [269]. The high-throughput growth-arrest experiment was performed in a 96-well plate format, with the cellular content determined by simple crystal violet staining. Media with or without bacterial crude extracts containing FGF2 variants at an approximate concentration of 40 ng.ml⁻¹ were incubated at 41.5°C for 48 h and mixed every 12 h within this period. RCS cells were seeded at 250 cells per well in a 96-well plate one day before the treatment. Cells were treated with preincubated FGF2 at a final concentration of 20 ng.ml⁻¹ for 4 days. Cells were washed with PBS, fixed with 4% paraformaldehyde, washed again and stained with 0.025% crystal violet for 1 hour. Stained cells were washed 3 times with distilled water. The dye was dissolved from the cells in 33% acetic acid. Absorbance was measured at 570 nm (Supplementary Fig. 3). The more stable the FGF2 variant present in the added crude extract, the more pronounced was the growth inhibition. Samples causing more significant growth inhibition than samples containing FGF2-G0 were considered positive hits. E. coli clones containing FGF2 candidates were refreshed from glycerol replica plates. For each positive hit, four wells with LB medium in a fresh 1 ml 96-well PP microtiter plate were inoculated and the whole screening procedure was repeated.
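The hit-calling rule of the growth-arrest screen can be sketched as follows. This is a simplified illustration, not the evaluation actually used: it assumes that a lower crystal violet absorbance at 570 nm means stronger growth arrest, and it takes the mean absorbance of the FGF2-G0 background-control wells as the threshold. The function name `call_hits`, the well labels and all absorbance values are hypothetical.

```python
from statistics import mean


def call_hits(od570, background_wells, sample_wells):
    """Return sample wells showing stronger growth arrest than FGF2-G0.

    od570: {well label: absorbance at 570 nm}
    Assumption: lower absorbance means fewer cells, i.e. more FGF2
    activity survived the pre-incubation at elevated temperature.
    """
    threshold = mean(od570[w] for w in background_wells)
    return sorted(w for w in sample_wells if od570[w] < threshold)


# Illustrative readings only; not data from the thesis.
readings = {
    "CTRL_WT_1": 0.60, "CTRL_WT_2": 0.64,
    "G5": 0.35, "G6": 0.70, "H3": 0.41, "H4": 0.66,
}
hits = call_hits(readings, ["CTRL_WT_1", "CTRL_WT_2"], ["G5", "G6", "H3", "H4"])
```

A stricter rule, for example requiring the difference to exceed the spread of replicate wells, would reduce false positives before the rescreening round; the simple threshold is used here only to keep the sketch short.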
For each FGF2 candidate verified in the second round of screening, 10 ml of LB medium with kanamycin was inoculated with the corresponding E. coli clone from the glycerol replica plate and the cells were grown overnight at 37°C with shaking. The overnight culture was used for isolation of plasmid DNA using the GeneJET Plasmid Miniprep kit (Thermo Fisher Scientific, USA), and the fgf2-G1B genes were commercially sequenced by the Sanger method (GATC Biotech, Germany). The resulting sequences were aligned with the nucleotide sequence of FGF2-G0 using BioEdit [32] to identify the newly inserted mutations.

5.3.6 Small scale production and characterization of selected FGF2-G1B mutants

E. coli BL21(DE3) cells were transformed with pET28b-His-thrombin::fgf2x (where x represents one of 33 new mutant variants), plated on LB agar plates with kanamycin (50 µg.ml⁻¹) and grown overnight at 37°C. Small-scale cell cultivations in 10 ml of LB medium with kanamycin were conducted under the conditions described before. The biomass was centrifuged at 10,000 g for 2 minutes at 4°C in a Mikro 200 benchtop centrifuge (Andreas Hettich GmbH & Co. KG, Germany) and the cell pellet was frozen at -70°C. The pellets were thawed and resuspended in 600 µl of FastBreak Cell Lysis Reagent from the MagneHis Protein Purification System (Promega, USA) supplemented with NaCl to a concentration of 500 mM and 1 µl of DNase I (New England Biolabs, USA). The cells were incubated with shaking for 20 minutes at room temperature. The bacterial lysates were incubated with 30 µl of MagneHis Ni-Particles for 2 minutes at room temperature. The beads were separated using a magnetic stand and the supernatants were carefully removed. To wash out unbound cell proteins, 150 µl of MagneHis Binding/Wash Buffer with 500 mM NaCl was added. Bound proteins were eluted by adding 105 µl of MagneHis Elution Buffer containing 500 mM NaCl. The presence of FGF2-G1B variants in eluted fractions was confirmed by SDS-PAGE as described before.
Determination of thermal stability of FGF2-G1B mutants. The thermal stability of FGF2-G1B variants was verified by thermal shift assay [270]. FGF2-G0 was used as a background control. The measurements were conducted in MicroAmp Fast Optical 96-well Reaction Plates (Thermo Fisher Scientific, USA). Each reaction mixture, with a final volume of 25 µl, was composed of 2 µl of SYPRO Orange Protein Gel Stain (Thermo Fisher Scientific, USA), purified FGF2 variant (2.5 mg.ml⁻¹) and the elution buffer (100 mM HEPES, 500 mM imidazole and 500 mM NaCl, pH 7.5). The assay was performed using the StepOnePlus Real-Time PCR System (Applied Biosystems/Thermo Fisher Scientific, USA) with a starting temperature of 25°C (2 min initial equilibration), ramping up in increments of 1°C to a final temperature of 95°C. The Tm values were determined from the obtained data using Protein Thermal Shift software (Applied Biosystems/Thermo Fisher Scientific, USA; Supplementary Table 5).

5.3.7 Free energy calculations, construction and purification of FGF2-G3 mutant

All mutations from screening that improved the melting temperature by at least 1°C (E54D, S94I, C96N, and T121P) were selected for in silico analysis using the Rosetta ddg_monomer application [88] as described earlier. All double-point mutant combinations of the newly identified mutations with the existing mutations from the stable variant FGF2-G2 (R31L, V52T, H59F, L92Y, C96Y, and S109E) were constructed in silico to predict the potential additivity of these individual mutations (data not shown). The predicted ΔΔG was compared with the sum of the ΔΔG values of the individual mutations, but none of the double-point mutants showed a difference > 1 kcal.mol⁻¹, again suggesting the absence of any antagonistic effects among the selected mutations. Consequently, the 9-point mutant FGF2-G3 was designed and constructed by combining the 4 most stabilizing substitutions obtained from screening with 5 substitutions from FGF2-G2.
In the FGF2-G3 mutant, the substitution C96N was prioritized over C96Y because of its higher individual stabilizing effect, verified experimentally. The predicted improvement in Tm for this new variant was 19.2°C. The gene of the multiple-point mutant was commercially synthesized (GeneArt/Life Technologies, Germany), subcloned into the NdeI and XhoI sites of pET28b-His-Thrombin and expressed in E. coli BL21(DE3) cells as described before.

5.3.8 Characterization of biophysical properties of FGF2-G3 and its comparison with FGF2-G0 and FGF2-G2 variants

DSC was used to characterize the thermal stability of the protein purified by affinity chromatography. DSC data were collected over a temperature range of 20°C to 100°C. Proper folding of the mutant was verified by CD spectroscopy as described earlier.

Circular dichroism spectroscopy. The structural integrity of the FGF2-G0, FGF2-G2 and FGF2-G3 proteins was followed by monitoring the ellipticity over the wavelength range of 200 to 260 nm at 37, 50, 65 and 70°C for 24 h. Data were recorded in 2-minute intervals with 1 nm bandwidth using a 0.1 cm quartz cuvette containing the protein. Recorded denaturation curves of the tested FGF2 variants were globally fitted to exponential decay curves (single exponential, double exponential or exponential-linear combination function) using OriginPro8 software (OriginLab, USA). The half-life of the FGF2 secondary structure (t1/2, defined as the time required to reduce the initial value of ellipticity, as a measure of protein secondary structure, to ½ of the original value) was evaluated from the collected data via the decay constant (τ) using Equation 15:

$t_{1/2} = \frac{\ln 2}{\lambda} = \tau \ln 2$   (Equation 15)

where λ is the exponential decay constant. The t1/2 was evaluated from the data collected at 227 nm, where all spectra showed the ellipticity maxima (Supplementary Fig. 4).
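The relation between the decay constant and the half-life can be illustrated with a short sketch. It covers only the single-exponential case without an offset and estimates the decay constant by a least-squares line through the logarithm of the signal; the thesis used global fitting in OriginPro8, so this is a simplified stand-in. The function names and the synthetic data are hypothetical, and the signal is assumed to be positive (for example, the ellipticity magnitude relative to its final value).

```python
import math


def fit_decay_constant(times, signal):
    """Estimate lambda in signal = s0 * exp(-lambda * t) by linear
    regression of ln(signal) on time. Requires strictly positive signal."""
    if len(times) != len(signal) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    if any(s <= 0 for s in signal):
        raise ValueError("signal must be strictly positive")
    logs = [math.log(s) for s in signal]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return -sxy / sxx


def half_life(decay_constant):
    """t1/2 = ln(2) / lambda, equivalently tau * ln(2) with tau = 1 / lambda."""
    return math.log(2) / decay_constant


# Synthetic decay with lambda = 0.05 per minute (illustrative only).
times = [0, 10, 20, 30]
signal = [100.0 * math.exp(-0.05 * t) for t in times]
lam = fit_decay_constant(times, signal)
t_half = half_life(lam)
```

For double-exponential or exponential-linear decays the log-linear shortcut no longer applies, and a nonlinear fit of the full model is needed.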
The maximum measurement time of 24 h was limited by the capacity of the compressed-nitrogen cylinder used in the CD spectroscopy protocol. The t1/2 values for FGF2-G0 at 65 and 70°C were not determined because the protein was denatured immediately at the beginning of the measurement.

Fluorescence spectroscopy. Local conformational changes during thermal unfolding of FGF2 variants were followed by monitoring fluorescence emission spectra using a FluoroMax spectrofluorometer (Horiba, Japan). The sample, in a quartz cuvette with a magnetic stirrer inside, was placed into a temperature-controlled holder, and fluorescence spectra excited at 295 nm were recorded in 1-minute intervals from 310 to 410 nm with 1 nm bandwidth and 0.1 s integration time while heating from 30 to 90°C. The actual temperature in the cell was monitored using a thermocouple controlled by LabVIEW software (National Instruments, USA). The spectrum of the buffer recorded at 30°C was used as a blank and subtracted from the sample data. Unfolding was followed at the emission maximum (347 nm) of the first scan. The concentration of all samples was approximately 0.1 mg.ml⁻¹.

Differential scanning fluorescence (DSF). A slightly different experimental set-up was used for monitoring fluorescence during thermal unfolding. A standard-grade capillary (NanoTemper, Germany) was filled with a sample and placed into the Prometheus NT.48 (NanoTemper, Germany). The samples were continuously heated from 30 to 90°C at different scan rates (0.3, 0.5, 1, 2 and 4 °C.min⁻¹) and the fluorescence signal excited at 295 nm was followed at 335 and 350 nm. The concentration dependence of the unfolding curve was checked by monitoring aliquots of different concentrations (1, 0.5, 0.25 and 0.125 mg.ml⁻¹).

Data analysis. The data were uploaded to an extension of CalFitter (Masaryk University, Czech Republic) based on MATLAB 2014b (The MathWorks, United States) that allows simultaneous global fitting of unfolding curves.
DSC data were numerically integrated to derive the total heat absorbed during the transition. The signals from the four types of measurement, namely CD ellipticity, DSC heat absorption, and two fluorescence measurements, were then normalized and fitted globally (Supplementary Figs. 5 and 6). After the initial fitting, the weighted least squares were calculated based on the sum of the squared residuals of each curve, so that each type of measurement contributed equally to the sum of errors at the optimal point. Linear coefficients were allocated separately to each data set. The models with the minimum number of intermediates were selected based on the quality of fit, judged by normalized residuals and visual inspection. Although the data sets of the wild type and the G2 mutant were fitted reasonably well with just a three-step irreversible model, the G3 model had to include one additional reversible pretransitional step to fit all the data sets well, mainly because of the DSC data and the ratiometric data from DSF. Nonetheless, the effect of this step is much less pronounced than that of the subsequent irreversible steps of unfolding in all the data sets. The estimated values of the main parameters of the unfolding mechanism are given in Supplementary Table 6.

pH profile. Britton-Robinson (BR) buffers of different pH (5, 6, 7, 7.5, 8, 9, 10 and 11) were used to determine the pH stability profile of the FGF2 variants. A 5 µl sample aliquot was mixed with 95 µl of BR buffer of the appropriate pH, thoroughly vortexed and incubated for 3 h at 4°C. A standard-grade capillary was then filled with the mixture by capillary forces and placed into the Prometheus NT.48. Samples were scanned from 30 to 90°C at a scan rate of 1°C.min⁻¹.

5.3.9 Characterization of biological properties of FGF2-G3 and its comparison with G0 and G2 variants

Determination of biological activity half-life.
FGF receptors and their downstream effectors, including ERK1/2, are activated upon treatment with FGF2, contributing to the pluripotency of human embryonic stem cells (hESC) [271, 272]. As the biological activity of FGF2 decreases at 37°C, ERK1/2 phosphorylation declines and hESC easily become primed for differentiation. To test the thermal stability of FGF2 variants, hESC medium prepared without FGF2 was supplemented with FGF2-G0, FGF2-G2 or FGF2-G3 to a final concentration of 10 ng.ml⁻¹ and pre-incubated at 37°C for 6 h, 12 h, 24 h, 2 d, and 4 d to 20 d. FGF2-starved hESC were treated with hESC medium containing pre-incubated FGF2 for two hours and Western blotted for phosphorylated ERK1/2. Two representative blots per protein variant were analysed using ImageJ 1.50b (National Institutes of Health, USA) and band densities were plotted as a function of pre-incubation time (Supplementary Fig. 8). Data points were analysed using the single exponential decay Equation 16:

$A/A_0 = \exp(-t/\tau) + c$   (Equation 16)

where A/A0 is the relative density at time t, τ is the time constant, and c is the steady-state level of the density offset. The fit was performed with OriginPro8 software (OriginLab, USA), and the half-life (i.e. the time required for the loss of one-half of the initial activity) was determined for the FGF2-G0 protein variant.

Western blotting. Cells were lysed with 2x Laemmli buffer and the samples were boiled at 98°C for 10 minutes. Proteins were separated by SDS-PAGE and electrotransferred onto Immobilon®-P transfer membrane (Merck Millipore, Germany). Membranes were then blocked in 5% milk in TBS buffer and incubated with primary rabbit polyclonal antibody P-44/42 MAPK (Cell Signaling Technology, USA) at 4°C overnight. The primary antibodies included rabbit polyclonal anti-pERK1/2 and rabbit polyclonal anti-ERK1/2 (both Cell Signaling Technology, USA).
The next day, the membranes were incubated with donkey anti-rabbit antibodies conjugated with horseradish peroxidase (Santa Cruz Biotechnology, USA), and the protein bands were visualized using the chemiluminescence detection reagent Immobilon™ Western (Merck Millipore, Germany) on photographic paper (Agfa-Gevaert, Belgium). After stripping, the membrane was re-probed with rabbit polyclonal antibody p44/42 MAPK (Erk1/2) (Cell Signaling Technology, USA) against total signaling proteins.

Cell cultures. The hESC employed in this study were derived from blastocyst-stage embryos obtained with the informed consent of donors. A well-characterized human ESC line (Adewumi, 2007), CCTL14 (Centre of Cell Therapy Line), in passages 65 to 75 was used. The hESC were maintained under feeder-free conditions using Matrigel™ hESC-qualified Matrix (BD Biosciences). The culture medium required for propagation of hESC grown on Matrigel was medium conditioned by mitotically inactivated mouse embryonic fibroblasts (mEF). For preparation of standard conditioned medium (CM), the complete hESC medium containing 4 ng.ml⁻¹ FGF2 is usually conditioned by mitotically inactivated mEF for 5-7 days and then supplemented with 10 ng.ml⁻¹ of FGF2 to restore the growth factor concentration lost through degradation. In our experiments, to test the long-term thermostability of FGF2, the CM was prepared from medium containing 10 ng.ml⁻¹ of FGF2 with no supplementation afterwards.

Proliferation assays. The hESC were plated into 24-well plates, propagated in the presence of each of the tested FGF2 variants for five passages, and counted every three days after plating using a Bürker chamber. Alternatively, cells were plated into 96-well plates and cultured in the presence of the various FGF2 variants for 6 days. Cells were then fixed in 4% paraformaldehyde (20 min, RT), stained with 0.1% crystal violet (60 min, RT), and destained with 33% acetic acid (20 min with shaking).
The absorbance of the supernatant was then measured at 570 nm using a plate reader (Supplementary Fig. 9).

Immunocytochemistry. The hESC were fixed with 4% paraformaldehyde (20 min, RT), permeabilized with 0.1% Triton X-100 in PBS (20 min, RT), and incubated with primary antibodies at 4°C overnight. Primary antibodies included goat polyclonal anti-Oct4 (Santa Cruz Biotechnology, USA) and rabbit monoclonal anti-Nanog (Cell Signaling Technology, USA). The next day, incubations with secondary antibodies conjugated to AlexaFluor488 or AlexaFluor594 (Thermo Fisher Scientific, USA) were carried out at RT for 1 h. Coverslips were mounted in DAPI-containing Mowiol (Sigma-Aldrich, USA). Microscopic analysis was performed using a Confocal LSM 700 microscope (Zeiss, Germany; Supplementary Fig. 10).

Preparation of hydrogel. Macroporous poly(2-hydroxyethyl methacrylate) (PHEMA) microspheres of narrow particle size distribution were prepared by multi-step swelling polymerization [273]. Briefly, the method is based on 0.7 µm monodisperse polystyrene seeds, which are swollen with an activating agent (dibutyl phthalate), monomers (2-(methacryloyl)oxyethyl acetate, 2-[(methoxycarbonyl)methoxy]ethyl methacrylate, ethylene dimethacrylate), and a porogen (cyclohexyl acetate). After benzoyl peroxide-initiated and (hydroxypropyl)methyl cellulose-stabilized polymerization, hydrolysis, and washing, the resulting PHEMA microspheres were 3 µm in diameter, with a narrow size distribution (Supplementary Fig. 11), and contained 0.5 mmol COOH/g.

In vivo experiment. The animals used in the study of hair-promoting activity, 7-week-old female C57BL/6 mice, were obtained from the Laboratory Animal Breeding and Experimental Facility (Masaryk University, Brno, Czech Republic) and maintained on a standard laboratory diet and water ad libitum.
17 animals in 3 randomized groups (n=5 or 6) were shaved using depilatory cream (Veet, USA) at 7 weeks of age, at which point all hair follicles were synchronized in the quiescent telogen phase [274]. FGF2, both FGF2-G0 and the thermostable variant FGF2-G3 (5 µg per mouse), was dissolved in 0.1% human serum albumin and sorbed into the hydrogel by continuous stirring for 2 hours at RT. The resulting suspension was applied topically on the dorsal skin of C57BL/6 mice with subcutaneous injection. Empty hydrogel with no FGF2 was used as a control. Visible hair growth was recorded at days 1, 7, 13, 16 and 20 (Supplementary Fig. 12).

Hair length determination. To examine the effect of FGF2 in hydrogel on hair length, hairs in the proximity of the injection site were plucked randomly with forceps at day 21 post application. The average length of the 20 plucked hairs per mouse was measured manually with a micrometer under a stereoscopic microscope and expressed in millimeters.

5.4 Supporting Information

Unfolding mechanism of FGF2. A two-step unfolding mechanism including one intermediate was revealed for all tested FGF variants by the global fit of unfolding data (Figure S 8, Figure S 9 and Table 6). The main difference between the variants lies in the Gibbs energy barrier of the first irreversible step (ΔG‡1), with a value of 111 ± 0.3 kJ.mol⁻¹ at 25°C for the wild-type FGF2-G0, increased by 29.7 ± 0.9 kJ.mol⁻¹ and 35.7 ± 0.8 kJ.mol⁻¹ for FGF2-G2 and FGF2-G3, respectively. The predicted ΔΔG values from computer modeling were -26 ± 10 kJ.mol⁻¹ and -30 ± 9 kJ.mol⁻¹ for FGF2-G2 and FGF2-G3, respectively, which is in good agreement with the experimental data. The stability of both FGF2-G2 and FGF2-G3 can be attributed to the increase in the Gibbs activation energy of the first unfolding step.
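To give a feeling for what these barrier increases mean kinetically, the sketch below converts a Gibbs activation energy into a rate constant with the Eyring equation. This is an illustrative assumption: the text does not state which rate law underlies the fitted barriers, and a transmission coefficient of 1 is assumed. The function names are hypothetical; only the barrier values (111, 29.7 and 35.7 kJ.mol⁻¹ at 25°C) are taken from the text.

```python
import math

R = 8.314462618        # gas constant, J / (mol K)
K_B = 1.380649e-23     # Boltzmann constant, J / K
H = 6.62607015e-34     # Planck constant, J s


def eyring_rate(dg_kj_mol, temp_k=298.15):
    """Rate constant (1/s) from an activation Gibbs energy in kJ/mol,
    assuming the Eyring equation with transmission coefficient 1."""
    return (K_B * temp_k / H) * math.exp(-dg_kj_mol * 1000.0 / (R * temp_k))


def slowdown_factor(ddg_kj_mol, temp_k=298.15):
    """Factor by which a step slows down when its barrier rises by ddg."""
    return math.exp(ddg_kj_mol * 1000.0 / (R * temp_k))


k_wild_type = eyring_rate(111.0)      # first irreversible step, FGF2-G0
slow_g2 = slowdown_factor(29.7)       # barrier increase reported for FGF2-G2
slow_g3 = slowdown_factor(35.7)       # barrier increase reported for FGF2-G3
```

Under these assumptions, the barrier increases correspond to a slowdown of the first unfolding step by more than five orders of magnitude at 25°C, with the FGF2-G3 step slower again than that of FGF2-G2.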
Moreover, a pretransitional step (Table S 18, Step 0) and a slightly lower optimal pH range of 6.0-7.5 for T1/2 were revealed for FGF2-G3, compared with the FGF2-G0 and FGF2-G2 variants, which prefer a pH range of 7.0-8.0 (Figure S 10).

[Figure S 4 image; visible residue labels: R31, V52, E54]

Figure S 4. Proposed stabilizing positions in the structure of wild-type human FGF2. Positions selected by the energy-based approach for site-directed mutagenesis are shown as yellow spheres, while the positions selected for randomization are shown as red spheres. Positions selected by the evolution-based approach are shown as green spheres.

(a)
ATGGGCAGCA GCCATCATCA TCATCATCAC AGCAGCGGCC TGGTGCCGCG CGGCAGCCAT
ATGGCAGCCG GGAGCATCAC CACGCTGCCC GCCTTGCCCG AGGATGGCGG CAGCGGCGCC
TTCCCGCCCG GCCACTTCAA GGACCCCAAG CGGCTGTACT GCAAAAACGG GGGCTTCTTC
CTGCGCATCC ACCCCGACGG CCGAGTTGAC GGGGTCCGGG AGAAGAGCGA CCCTCACATC
AAGCTACAAC TTCAAGCAGA AGAGAGAGGA GTTGTGTCTA TCAAAGGAGT GTGTGCTAAC
CGTTACCTGG CTATGAAGGA AGATGGAAGA TTACTGGCTT CTAAATGTGT TACGGATGAG
TGTTTCTTTT TTGAACGATT GGAATCTAAT AACTACAATA CTTACCGGTC AAGGAAATAC
ACCAGTTGGT ATGTGGCACT GAAACGAACT GGGCAGTATA AACTTGGATC CAAAACAGGA
CCTGGGCAGA AAGCTATACT TTTTCTTCCA ATGTCTGCTA AGAGCTAGCT CGAG

(b)
MGSSHHHHHH SSGLVPRGSH MAAGSITTLP ALPEDGGSGA FPPGHFKDPK RLYCKNGGFF
LRIHPDGRVD GVREKSDPHI KLQLQAEERG VVSIKGVCAN RYLAMKEDGR LLASKCVTDE
CFFFERLESN NYNTYRSRKY TSWYVALKRT GQYKLGSKTG PGQKAILFLP MSAKS

(c)
MAAGSITTLP ALPEDGGSGA FPPGHFKDPK RLYCKNGGFF LRIHPDGRVD GVREKSDPHI
KLQLQAEERG VVSIKGVCAN RYLAMKEDGR LLASKCVTDE CFFFERLESN NYNTYRSRKY
TSWYVALKRT GQYKLGSKTG PGQKAILFLP MSAKS

Figure S 5. The nucleotide (a) and amino acid (b) sequences of FGF2-G0 with upstream sequences in the pET28b vector, and the amino acid sequence of wild-type FGF2 (c). Start codons and corresponding methionines are in grey, the 6xHis tag is in turquoise, the thrombin cleavage recognition site is in magenta, the stop codon is in red, the restriction sites of NdeI and XhoI are underlined, and mutated amino acid positions in the sequence of wild-type FGF2 are in green.

[Figure S 6 image: bar chart; x axis: CTRL-, CTRL WT, CTRL+, G5 to G8, H1 to H8]

Figure S 6. Example of output data from screening of the biological activity of mutated FGF2 variants in crude extracts (CE) originating from the library FGF2-C96X. Coding on the X axis corresponds to the wells of the original microtiter plate. CEs pre-incubated at 41.5°C were added to the rat chondrocytes grown in parallel microtiter plates to a final FGF2 concentration of 20 ng.ml⁻¹, and the inhibition of chondrocyte growth was compared to the samples containing controls by measuring the optical density. CTRL-, negative control, CE from E.
coli cells with empty pET28b plasmid; CTRL+, positive control, CE from E. coli cells producing FGF2-G1A mutant R31L; CTRL WT, background control, CE from E. coli cells producing FGF2-G0. The black line represents the threshold corresponding to the background control CTRL WT. Clones G5 and H3, whose CE caused statistically more significant growth arrest of rat chondrocytes than the background control, were selected for rescreening as positive hits. Error bars represent deviations calculated from 2 replicated measurements.

[Figure S 7 image: three panels of signal against time (min)]

Figure S 7. Structural stability of selected FGF2 variants determined by CD spectroscopy. Structural stability of FGF2-G0 at 50°C (a), FGF2-G2 (b) and FGF2-G3 (c) at 70°C. Solid lines represent the best fit.

[Figure S 8 image: scheme N → I → D → Ag (Steps 1 to 3); panels plotted against temperature (°C)]

Figure S 8. Global fit of a three-step model for (A) the wild type, (B) the FGF2-G2 mutant, and the modelled fractions of the states for (C) the wild type and (D) the FGF2-G2 mutant. The proposed mechanism of unfolding for the two variants is shown on top (N, I, D, and Ag stand for the native, intermediate, denatured, and aggregated states, respectively). The model with three steps of unfolding was successfully fitted to all four data sets: DSC (diamonds), CD (stars), equilibrium fluorescence (circles), and DSF (points). The fitted curves are depicted in blue. The respective fractions of the states of unfolding are given in the bottom graphs: native (black), intermediate (blue), denatured (yellow), and aggregated (red).

[Figure S 9 image: scheme N* ⇌ N → I → D → Ag (Steps 0 to 3); panels plotted against temperature (°C)]

Figure S 9.
Global fit of a four-step model for the FGF2-G3 mutant (A), DSF signal of the FGF2-G3 mutant (B), the modelled fractions of the states (C), and the modelled scan rate dependence (D). (A, C) The model with four steps of unfolding was successfully fitted to all four data sets: DSC (diamonds), CD (stars), equilibrium fluorescence (circles), and DSF (points). The fitted curves are depicted in blue. The respective fractions of the states of unfolding are given in the bottom left graph: native* (black), native (brown), intermediate (blue), denatured (yellow), and aggregated (red). (B) The ratio (stars) of the signals at 330 nm and 350 nm clearly indicates the change in signal in the transition region of the first step obtained from the global fit (55-70°C); however, neither of the wavelengths (dots) separately indicated any significant change. (D) The predicted values of the fully irreversible model (orange) provided a poor fit to the data obtained at low scan rates (black) compared with the four-state model with a reversible step (blue).

[Figure S 10 image: two panels plotted against pH (5.0 to 10.0)]

Figure S 10. The pH profile of thermal unfolding of the variants. (a) The Gibbs activation energy (ΔG‡) of the first step at 25°C. (b) The temperatures at which half of the native-state protein has already undergone the first step (T1/2).

[Figure panel: (a) FGF2-G0; time of incubation at 37 °C; pERK1/2 and ERK1/2 blots]

[...] DhaA63 > DhaA80 > DhaA). These results imply that the F176G mutation in the mouth of the enzyme's access tunnel significantly enhanced its activity in aqueous environments and in the presence of [...]

[Figure image: panels a), b) and c), bar charts comparing DhaA, DhaA63, DhaA80 and DhaA106]

To better understand the origin of this change in activity, steady-state kinetic constants were determined for the conversion of 1,2-dibromoethane by DhaA106 and compared to those for DhaA80, DhaA63 and DhaA (Figure 19, Table 5 and Table 6).
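The constants compared in Tables 5 and 6 (Km, Ksi and kcat) can be related to reaction rates with a short sketch. It assumes the common textbook rate law for uncompetitive substrate inhibition, v/[E] = kcat·S / (Km + S·(1 + S/Ksi)); the text does not spell out the exact equation used for fitting, and the cooperativity reported for DhaA106 is not modelled here. The function name is hypothetical; the example parameters are the DhaA80 values listed in Table 5.

```python
import math


def turnover_rate(substrate_mm, kcat, km, ksi=None):
    """Rate per enzyme (1/s) at a given substrate concentration (mM).

    Assumed model: Michaelis-Menten kinetics with optional substrate
    inhibition, v/[E] = kcat * S / (Km + S * (1 + S / Ksi)).
    With ksi=None the plain Michaelis-Menten equation is used.
    """
    if ksi is None:
        return kcat * substrate_mm / (km + substrate_mm)
    return kcat * substrate_mm / (km + substrate_mm * (1.0 + substrate_mm / ksi))


# DhaA80 in buffer, constants as listed in Table 5.
KCAT, KM, KSI = 0.34, 0.13, 0.41

# Under this model the rate peaks at S = sqrt(Km * Ksi) and falls beyond it.
s_optimum = math.sqrt(KM * KSI)
rate_at_optimum = turnover_rate(s_optimum, KCAT, KM, KSI)
rate_at_high_substrate = turnover_rate(2.0, KCAT, KM, KSI)
rate_without_inhibition = turnover_rate(2.0, KCAT, KM)
```

A large Ksi relative to Km pushes the optimum to high substrate concentrations, so that inhibition becomes practically invisible over the measured range.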
In an aqueous environment, the F176G mutation significantly increased the enzyme's catalytic rate, suppressed substrate inhibition and reduced the free enzyme's affinity for 1,2-dibromoethane. The turnover number of DhaA106 in buffer was 32- and 5-fold higher than those of DhaA80 and DhaA63, respectively, and 3-fold lower than that of DhaA (Figure 19a, Table 5). Adding 40% (v/v) DMSO to the reaction mixture reduced the kcat values of all the studied DhaA variants while increasing their Km and Ksi constants, implying that the organic co-solvent acts as a mixed inhibitor (Figure 19b, Table 6). Notably, DMSO very strongly reduced the affinity of free DhaA106 and of the enzyme-substrate complex for 1,2-dibromoethane. DhaA106 consequently exhibited no substrate inhibition and had the highest catalytic rate of the tested variants in the presence of DMSO (Table 6).

Table 5. Steady-state kinetic parameters of DhaA variants with 1,2-dibromoethane in buffer solution.

Variant    Km [mM]           Ksi [mM]          kcat [s⁻¹]         kcat/Km [s⁻¹.mM⁻¹]
DhaA       3.56 ± 0.67[a]    3.56 ± 0.70[a]    28.67 ± 3.95[a]    8.05 ± 2.63[a]
DhaA63     1.70 ± 1.28[a]    0.24 ± 0.20[a]    2.23 ± 1.46[a]     1.31 ± 1.85[a]
DhaA80     0.13 ± 0.09[a]    0.41 ± 0.33[a]    0.34 ± 0.14[a]     2.62 ± 2.89[a]
DhaA106    0.89 ± 0.08[b]    1.28 ± 0.11       11.01 ± 0.23       12.37 ± 1.37

[a] Data from Koudelakova et al. 2013 [296]. [b] Cooperativity with Hill coefficient n = 1.36 ± 0.13.

Table 6. Steady-state kinetic parameters of DhaA variants with 1,2-dibromoethane in 40% (v/v) DMSO.

Variant    Km [mM]           Ksi [mM]            kcat [s⁻¹]        kcat/Km [s⁻¹.mM⁻¹]
DhaA       ND[a]             ND[a]               ND[a]             ND[a]
DhaA63     1.08 ± 0.26[b]    41.44 ± 14.14[b]    0.72 ± 0.06[b]    0.66 ± 0.22[b]
DhaA80     0.88 ± 0.16[b]    13.14 ± 2.73[b]     0.25 ± 0.02[b]    0.28 ± 0.07[b]
DhaA106    11.17 ± 1.38      NA[c]               3.14 ± 0.21       0.28 ± 0.05

[a] ND, data could not be collected under comparable conditions due to protein instability. [b] Data from Koudelakova et al. 2013 [296]. [c] NA, not applicable.

Figure 19.
Steady-state kinetic profiles of DhaA variants (DhaA in blue, DhaA63 in yellow, DhaA80 in green and DhaA106 in red) at 37 °C a) in buffer and b) in 40% (v/v) DMSO. Note the different scales in the two panels. The kinetic profile of DhaA in 40% DMSO was measured under different conditions from the other variants (the duration of the experiment was limited to 3 min) because of its very low stability in this environment.

The specific activity of DhaA106 was tested further using a set of 30 halogenated substrates (Table S 22) to determine whether the F176G mutation affected its activity in aqueous buffer towards substrates other than 1,2-dibromoethane. The activity of DhaA106 was between 2 and 49 times higher than that of DhaA80 for all tested compounds. The enzyme was most active towards multisubstituted C2-C3 bromoalkanes including 1,2-dibromoethane; 1,2-dibromopropane; 1,2,3-tribromopropane and 1,2-dibromo-3-chloropropane. In addition, the activity of DhaA106 was comparable to or greater than that of DhaA for more than half of the tested substrates.

Principal component analysis (PCA) of the transformed activity data set was used to explore the relationships between the individual DhaA variants (Figure S 17). A similar analysis previously demonstrated that wild-type HLD enzymes cluster into four distinct substrate specificity groups (SSGs) [223]. Like DhaA and DhaA80, DhaA106 was found to belong to SSG-I (Figure S 17a). Enzymes in SSG-I are robust catalysts with high activity towards brominated ethanes and propanes, and detectable activity towards poorly degradable compounds such as 1,2-dichloroethane, 1,2-dichloropropane and 1,2,3-trichloropropane [223]. Although all of the tested DhaA variants belong to the same SSG, their substrate preferences differed to some extent, as demonstrated by their different positions on the PCA scores plot (Figure S 17a).
The relative activity of DhaA106 was more than one order of magnitude greater than that of DhaA80 for several substrates including 1-chlorobutane; 1,3-dichloropropane; 1,2-dibromoethane; 1,2-dibromopropane; 4-bromobutyronitrile; 1,2,3-tribromopropane and 1,2-dibromo-3-chloropropane. Unlike DhaA, DhaA106 exhibited decreased preference for substrates with longer alkyl chains such as 1-bromohexane; 1-iodohexane; and (1-bromomethyl)-cyclohexane, as well as disubstituted C2-C3 haloalkanes such as 1,2-dibromoethane; 1,3-dibromopropane; and 2,3-dichloropropene.

6.3.3 Crystallographic analysis of DhaA106

The structure of DhaA106 was solved at a resolution of 1.69 Å (Table S 23) by molecular replacement using the structure of DhaA14 (PDB ID 3G9X [297]) as a search model. The resulting diffraction data enabled the localization of residues 4-295, showing that the enzyme exists as a monomer in the crystal with a solvent content of approximately 41.96 %. As expected, the overall structure of DhaA106 resembles that of DhaA [297,298], consisting of an α/β-hydrolase core domain and a helical cap domain (Figure S 18). The core domain is formed by a central twisted eight-stranded β-sheet (mostly parallel, with a single antiparallel β2-strand) surrounded by six α-helices. The cap domain consists of five α-helices linked by six loop insertions. The active site is located in a predominantly hydrophobic cavity at the interface between the core and the cap domains, and is connected to the protein surface by two access tunnels. Visual inspection of the crystal structure revealed that the F176G mutation changed the diameter of the main access tunnel and the intramolecular contacts between its hydrophobic residues.

6.3.4 Molecular dynamics and access tunnels analysis

Molecular dynamics (MD) simulations were performed to further explore the structural basis of the enhanced catalytic activity and reduced thermostability of DhaA106.
Two independent 200 ns long simulations were run for each of DhaA, DhaA63, DhaA80 and DhaA106. CAVER 3.01 [299] was then used to analyze 100,000 snapshots from each simulation to identify the access tunnels and to provide information on the opening and closing of the access tunnel as well as time-resolved changes in bottleneck radii. Aside from the main access tunnels, slot tunnels were identified in the structures of all studied enzymes. The slot tunnels showed less favorable geometric parameters than the main access tunnels, as deduced from the tunnel width, length and curvature (Table S 24), indicating that the main tunnel acts as the preferred pathway for transport of the studied substrates and products.

The main access tunnel of DhaA was detected in 94 % of the snapshots taken during the simulation and was open in 58 % of the snapshots. The average tunnel bottleneck radius was 1.5 Å, with a maximum value of 3.1 Å (Figure 20, Table S 25). The secondary structures of the DhaA cap domain exhibited substantial flexibility: the distance between the two helices situated on opposite sides of the tunnel (quantified as the distance between the Cα atoms of residues F144 and C176) ranged from 7.0 to 14.0 Å (Figure S 19).

The access tunnels of DhaA63 and DhaA80 showed very similar properties. The main access tunnel was detectable in 6 % and 1 % of the snapshots for DhaA63 and DhaA80, respectively. Similarly, the tunnel was only open in 0.05 % of the DhaA63 snapshots and 0.02 % of those for DhaA80. These mutants had identical average bottleneck radii of 1.1 Å, with a maximal radius of 1.7 Å (Figure 20, Table S 25). The separation of the helices in the cap domain was somewhat more constrained than in the wild-type protein, ranging from 8.0 to 13.0 Å in both cases (Figure S 19).
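The tunnel statistics quoted above (fraction of snapshots in which the tunnel was detected, fraction in which it was open, and the average and maximal bottleneck radius) can be derived from a per-snapshot list of bottleneck radii. The following is a minimal sketch, not the CAVER implementation: the function name, the use of `None` for snapshots without a detected tunnel, and the choice to express the open fraction relative to all snapshots are assumptions. The 1.4 Å opening threshold is the one stated for Figure 20.

```python
def tunnel_stats(bottleneck_radii, open_threshold=1.4):
    """Summarize a tunnel from per-snapshot bottleneck radii (in angstroms).

    bottleneck_radii : list with one entry per snapshot; None means that no
                       tunnel was detected in that snapshot (assumed encoding)
    open_threshold   : radius at or above which the tunnel counts as open
    """
    n_total = len(bottleneck_radii)
    if n_total == 0:
        raise ValueError("at least one snapshot is required")

    detected = [r for r in bottleneck_radii if r is not None]
    n_open = sum(1 for r in detected if r >= open_threshold)

    return {
        "detected_fraction": len(detected) / n_total,
        "open_fraction": n_open / n_total,  # relative to all snapshots
        "mean_radius": sum(detected) / len(detected) if detected else None,
        "max_radius": max(detected) if detected else None,
    }


if __name__ == "__main__":
    # Toy trajectory of four snapshots, one of them without a detected tunnel.
    print(tunnel_stats([1.5, None, 1.0, 2.0]))
```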
Both of these variants have four bulky residues in the main access tunnel that are not present in the wild-type enzyme and which restrict the tunnel's opening while making the cap domain more rigid. The C176F mutation in the tunnel entrance was particularly important in narrowing the main pathway to the active site in these two enzymes.

The modified access tunnel of DhaA106 is strikingly similar to that of wild-type DhaA. The F176G mutation in DhaA106 created a void in the tunnel entrance that is not present in DhaA63 and DhaA80, and enhanced the mobility of the cap domain's secondary structure elements. The tunnel was detected in 86 % of the snapshots and was open in 48 % of them. The average tunnel bottleneck radius was 1.4 Å, with a maximum value of 2.8 Å (Figure 20, Table S 25). The distance between the helices ranged from 7.5 to 14.0 Å (Figure S 19). The F176G mutation removed some of the contacts between residues within the access tunnel of DhaA106, which explains its lower thermodynamic stability compared to the template DhaA80. However, because DhaA106 retains three stabilizing mutations (T148L, G171Q and A172V), it is much more stable than the wild-type enzyme.

[Figure residue: bottleneck radius traces for simulations MD1 and MD2; horizontal axes t/ns, 0 to 200]

Figure 20. Visualization of the representative structures of the main tunnels in the cap domains of the studied enzymes and changes in the tunnel bottlenecks over time. Left: PyMOL 1.5 visualizations of the cap domain residues (shown as green cartoon) and the tunnel (indicated by the red spheres). The side chains of the residues located at positions 148, 171, 172 and 176 (i.e. the positions mutated in DhaA63) in each enzyme are represented by sticks, and the location of residue 176 in DhaA106 is indicated by a black arrow.
Right: The evolution of the bottleneck radius (BR) in two independent 200 ns long molecular dynamics (MD) simulations. The black horizontal lines indicate the threshold radius (1.4 Å) above which the tunnel was considered to be open. The tunnels were analyzed using CAVER 3.01 [299].

6.4 Discussion

The development of new approaches for the rational engineering of stable catalysts that retain catalytic activity is a key challenge in protein engineering [300]. This work aimed to improve the catalytic activity of the recently constructed, highly stable and solvent-resistant variant DhaA80 [296] while minimizing losses of thermodynamic stability. To this end, the effects of all possible substitutions in the targeted tunnel positions (F176 and V172) were evaluated using the computational tool FoldX [94]. A smart saturation mutagenesis library was then constructed featuring enzyme variants incorporating every possible combination of the predicted stabilizing or neutral access tunnel substitutions (library I). In addition, a second library was constructed in which one of the target positions in the tunnel mouth (F176) was randomized using site-saturation mutagenesis (library II). The best variant was DhaA106, which was obtained by site-saturation mutagenesis and exhibited significantly enhanced activity both in the presence and in the absence of DMSO. The catalytic activity of DhaA106 towards 1,2-dibromoethane in buffer solution and in 40% (v/v) DMSO was 32 and 10 times greater, respectively, than that of DhaA80, and its melting temperature (which reflects its thermodynamic stability) was only 4 °C lower. DhaA106 also exhibited significantly enhanced activity (relative to DhaA80) towards 26 of 29 additional halogenated compounds, showing similar levels of activity to wild-type DhaA.

Sequencing of DhaA106 revealed that it contained a substitution that would be difficult to design rationally. The variant carries a small glycine residue in the tunnel mouth in place of a bulky phenylalanine.
Glycine contains a hydrogen atom as its side chain, giving it much more conformational freedom than other amino acids. Its low steric demand means that adjacent residues have much more flexibility than they would otherwise [301,302]. Because high flexibility is often associated with low stability in proteins, one common strategy for enhancing their stability is to rigidify their most flexible regions [64,66,303-305]. However, it is necessary to maintain a balance between stability and flexibility in order to retain the protein's biological functionality. Stability ensures an appropriate geometry for ligand binding and prevents denaturation under physiological conditions, while flexibility is necessary to allow catalysis at a metabolically appropriate rate [306-308].

In this work, replacing a bulky phenylalanine in the tunnel mouth of DhaA80 with the smallest amino acid, glycine, led to a variant (DhaA106) in which the intramolecular hydrophobic packing of the tunnel residues was partially disrupted, reducing the protein's stability (ΔTm = -4 °C). However, this also increased the flexibility and mobility of the two α-helices lining the main tunnel, increasing the chance of the tunnel being in an open state. Consequently, the variant was more catalytically active than the template. The main access tunnel of DhaA106 was identified in 86 % and found to be open in 48 % of the molecular dynamics snapshots that were analyzed; the corresponding values for the template enzyme DhaA80 were only 1 % and 0.02 %. The frequency of tunnel opening in DhaA106 was comparable to that of DhaA, whose main tunnel was identifiable in 94 % of its snapshots and open in 58 %.
We hypothesize that reopening of the access tunnel facilitates the admission of the substrate to the active site or the release of the product, while the remaining three bulky and hydrophobic mutations in the access tunnel [296] significantly increase the enzyme's thermodynamic stability relative to the wild-type (ΔTm = 12 °C).

It has previously been shown that modifying the size, physico-chemical properties and dynamics of access tunnels by protein engineering can change the catalytic activity, substrate specificity, enantioselectivity and stability of HLDs [238,280,296,309-310] and many other enzymes with buried active sites [311] such as cytochrome P450s [312-316], β-glucosidases [317], lipases [286,318-322], esterases [323], and epoxide hydrolases [213,324,325]. Tunnel mouth engineering was shown to have profound effects on the activity and specificity of the enzyme LinB from Sphingobium japonicum UT26 [280]. The residue L177, located in the tunnel opening at the position corresponding to F176 in DhaA80, was selected for saturation mutagenesis on the basis of structural and phylogenetic analyses. The effects of the resulting mutations on the variants' catalytic activities differed greatly for individual substrates [280]. Similar findings have been reported for epoxide hydrolases from Agrobacterium radiobacter AD1 [325] and Aspergillus niger M200 [324], in which the engineering of a single amino acid in the tunnel mouth led to improved enzyme activity and enantioselectivity. As with DhaA106, the catalytic activity of LinB variants was generally increased by introducing a small non-polar amino acid at position 177, whereas the introduction of bulky aromatic or charged residues dramatically reduced activity towards most substrates, including 1,2-dibromoethane.
The small side chain of the introduced glycine residue in the tunnel mouth of the LinB L177G mutant was proposed to increase the radius of the tunnel mouth, thereby facilitating substrate entry and product release. Conversely, the bulkier side chain of the introduced tryptophan residue in the LinB L177W variant presumably blocked the mouth of the enzyme's main access tunnel, reducing its catalytic activity [280]. A detailed analysis of 1,2-dibromoethane passage through the access tunnel of the LinB L177W mutant confirmed that this mutation significantly reduced the rate of product release [326]. Moreover, mutation in position 177 of LinB significantly affected its thermal stability [310]. Similar observations were also reported for the β-glucosidase from Trichoderma reesei, whose activity and stability were significantly affected by mutations in the substrate entrance region [317].

The influence of the tunnel-lining residues on the catalytic activity of DhaA towards 1,2,3-trichloropropane (TCP) has been studied extensively [221,238,309,327,328]. Independently performed error-prone PCR experiments generated two double-point mutants, G3D+C176F [221] and C176Y+Y273F [327], whose activities towards TCP are 4 and 3.5 times greater than that of the wild-type, respectively. Interestingly, both mutants carried bulky residues (tyrosine or phenylalanine) in position 176, whereas the wild-type enzyme has a comparatively small cysteine residue in this position. The mutants thus have much narrower access tunnel entrances [309]. Molecular dynamics simulations and structure-based enzyme design identified position 176 and another four access tunnel residues as being crucial for the activity of DhaA. Mutagenesis at these positions yielded a variant whose catalytic activity and efficiency towards TCP were 32 and 26 times higher, respectively, than those of the wild-type.
This variant had bulky aromatic residues at four of the five targeted positions, which restricted the access of water molecules to the active site cavity. The rate-limiting step of TCP conversion in the resulting variant was shifted from carbon-halogen bond cleavage to the release of the reaction products [238].

Sealing the access tunnel of DhaA with bulky residues has previously been identified as a viable strategy for enhancing its thermodynamic stability and resistance to the organic co-solvent DMSO [296]. The introduction of four bulky hydrophobic residues into the access tunnel yielded a DhaA variant (DhaA80) with a closed tunnel exhibiting enhanced intramolecular packing. This modification prevents destabilization of the protein's structure due to the admission of DMSO into the active site. Rigidifying and narrowing the tunnel in this way shields the interior of the protein from the organic solvent but presumably also makes the exchange of substrate and product molecules between the active site and bulk solvent more difficult. Stabilizing the protein in this way therefore reduces its activity, demonstrating the need to strike a careful balance between protecting the buried active site from solvent molecules (which may cause denaturation or compete with the desired substrate) and retaining sufficient flexibility for catalytic activity.

In this study, we showed that it was possible to create a DhaA variant with a superior trade-off between activity and stability relative to that seen in DhaA80. This was achieved by replacing a sterically demanding phenylalanine residue at the mouth of the main access tunnel with a small glycine residue. This substitution enhanced the flexibility of the two α-helices that form the tunnel but did not greatly reduce the protein's resistance to organic co-solvents or its tolerance of elevated temperatures, because other bulky hydrophobic residues inside the tunnel were retained.
A potentially viable alternative strategy for balancing activity and stability in this case would be to introduce a molecular gate in the access tunnel. Molecular gates are dynamic protein structures that regulate substrate access to the active site and product release while preventing the access of undesirable solvent molecules and synchronizing processes occurring in distant parts of a protein [329]. Introducing a gate should protect the enzyme against irreversible inactivation under harsh conditions while maintaining good catalytic performance. Additionally, the auxiliary slot tunnels could be optimized to further improve the stability of DhaA106 without significantly compromising its activity.

6.5 Conclusion

We have demonstrated that the catalytic performance of the thermodynamically robust, but less active, HLD variant DhaA80 can be greatly enhanced by fine-tuning the geometry and dynamics of its access tunnel. A single-point mutation (F176G) in the tunnel mouth yielded the new variant DhaA106, which exhibits greater flexibility in the secondary structure elements that form the access tunnel. The activities of DhaA106 towards 1,2-dibromoethane in buffer solution and in 40% (v/v) DMSO were 32 and 10 times greater, respectively, than those of the template enzyme DhaA80. However, its melting temperature (which reflects its thermodynamic stability) was reduced by only 4 °C. The high stability of the template enzyme was preserved because DhaA106 retains three previously introduced bulky residues in the tunnel interior that provide good hydrophobic packing and prevent solvent molecules from accessing the active site. In addition to its enhanced activity towards 1,2-dibromoethane, DhaA106 also exhibited enhanced activity towards 26 out of 29 other halogenated compounds.
These results suggest that a fine balance between tunnel flexibility and tight hydrophobic packing, as well as a precisely engineered tunnel diameter, are important for HLD activity and stability. Tunnel residues are thus good targets for modification when seeking to balance the activity and stability of catalysts with buried active sites.

6.6 Experimental Section

Library design and predicting the effects of mutations on enzyme stability. The structure of DhaA80 (PDB ID 4F60) was downloaded from the RCSB PDB database [234]. The structure was prepared for analysis by removing ligands and water molecules. Missing atoms in side chains were added using the module of FoldX [94]. The stability effects of all possible double-point mutations in positions F176 and V172 of DhaA80 were estimated using the FoldX module [94]. Two variants differing in the specified order of mutations were considered for each double-point mutant (e.g. V172A, F176A and F176A, V172A for the F176A+V172A mutant). Calculations were performed 5 times for each variant following the recommended protocol (pH 7, temperature 298 K, ionic strength 0.050 M, VdWDesign 2). All stabilized (ΔΔG < -1 kcal/mol) and neutral (-1 kcal/mol < ΔΔG < 1 kcal/mol) mutants were selected and the frequencies of individual residues at the target positions were counted. Suitable degenerate codons for saturation mutagenesis were chosen using the CASTER v2.0 program [293]. The degenerate codons were selected to encode all frequent residues from the double-point mutants without producing an excessively large library.

Library construction. Saturation mutagenesis was performed using the QuikChange Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara, USA). Positions 172 and 176 of DhaA80 were saturated simultaneously using the following oligonucleotides (Sigma-Aldrich, St. Louis, USA): 5'-GCTTTCATCGAGCAAVTYCTCCCGAAAWKSGTCGTCCGTCCGCTTACG-3' (forward) and 5'-CGTAAGCGGACGGACGACSMWTTTCGGGAGRABTTGCTCGATGAAAGC-3' (reverse).
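The amino acids encoded by a degenerate codon follow directly from the IUPAC nucleotide codes and the standard genetic code. The sketch below is an illustration of that expansion and is not the CASTER program; the function name is hypothetical, and reading VTY as the codon at position 172 and WKS as the codon at position 176 of the forward primer is an interpretation of the sequence given above.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def encoded_residues(degenerate_codon):
    """Return the sorted one-letter codes encoded by a degenerate codon."""
    choices = (IUPAC[base] for base in degenerate_codon.upper())
    codons = ("".join(bases) for bases in product(*choices))
    return "".join(sorted({CODON_TABLE[codon] for codon in codons}))


if __name__ == "__main__":
    for codon in ("VTY", "WKS", "NNK"):
        print(codon, encoded_residues(codon))
```

Under this reading, VTY encodes Ile, Leu and Val, and WKS encodes Cys, Phe, Ile, Leu, Met, Arg, Ser and Trp, which covers the residues listed for the two positions in Table S 19; NNK encodes all 20 amino acids plus one stop codon.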
Position 176 was independently saturated using a pair of oligonucleotides (Sigma-Aldrich, St. Louis, USA): 5'-CGAGCAAGTGCTCCCGAAANNKGTCGTCCGTCCGCTTAC-3' (forward) and 5'-GTAAGCGGACGGACGACMNNTTTCGGGAGCACTTGCTCG-3' (reverse). The entire plasmid pAQN::dhaA80His6 served as a template for PCR and was amplified according to the manufacturer's protocol. PCR was performed using 50 µl reaction mixtures containing 10 ng of template DNA, 5 pmol of each oligonucleotide, and 0.2 mM dNTPs in Phusion HF buffer with 1.5 mM MgCl2 and 1 U of Phusion DNA Polymerase. PCR proceeded under the following conditions: 30 s at 95 °C, then 18 cycles of 30 s at 95 °C, 60 s at 55 °C and 300 s at 68 °C, followed by 10 min at 72 °C. PCR products were then treated with the methylation-dependent endonuclease DpnI for 5 min at 37 °C. The resulting plasmids were transformed into Escherichia coli XJb(DE3) cells (ZymoResearch, Orange, USA) using the standard electroporation protocol [330]. Ten candidates from each library were randomly selected for sequencing.

Cultivation in microtiter plates (MTP) and preparation of lysates. MTP wells filled with 150 µl of Luria-Bertani (LB) medium with ampicillin added to a final concentration of 100 µg ml⁻¹ were inoculated with single colonies using sterile toothpicks. Four wells were inoculated with E. coli XJb pAQN::dhaA80His6 cells to serve as positive controls for basal activity measurement and another four wells were inoculated with E. coli XJb carrying an empty vector (pAQN) to serve as negative controls in the epPCR library screening. Cultures were grown overnight at 37 °C at 200 r.p.m. After 14 h of cultivation (OD600 = 0.4), 50 µl of culture from each cultivation plate was added to 50 µl of 30% (v/v) glycerol in new 96-well plates to create a replica plate for storage.
100 µl of fresh LB medium with ampicillin, L-arabinose at a final concentration of 3 mM and IPTG at a final concentration of 0.5 mM were added to each well of the cultivation plate and incubated at 30 °C at 200 r.p.m. for 4 h. Cells were harvested and frozen at -80 °C.

Library screening. Library screening was performed using the modified pH colorimetric assay described by Holloway et al. [294]. The assay is based on the detection of the protons produced during the dehalogenation reaction. After 10 min at room temperature, 50 µl of the lysis buffer (1 mM HEPES, 20 mM Na2SO4 and 1 mM EDTA, pH 8.2) was added to each well of the defrosted plates. Cell debris was removed from the lysate by centrifugation at 1,600 g for 20 min after 1 h of incubation at 100 r.p.m. at room temperature. 20 µl of lysate was transferred into each well of a new MTP and 180 µl of assay buffer [52% DMSO (v/v), 1 mM HEPES, 20 mM Na2SO4 and 1 mM EDTA, pH 8.2] containing 1,2-dibromoethane (DBE, 9.3 mM) was added. The substrate was incubated in the reaction buffer at 37 °C for 30 min before starting the reaction. The MTP was sealed carefully with a lid and parafilm. After 14 h of dehalogenation, the reaction mixture was diluted for detection using a buffer solution containing the pH indicator phenol red (1 mM HEPES, 20 mM Na2SO4 and 1 mM EDTA, 50 µg ml⁻¹ phenol red, pH 8.2). The change in the color of the pH indicator was estimated by spectrophotometry at 540 nm as described by Holloway et al. [294].

Expression and purification of proteins. Recombinant plasmids with the DhaA variants were transformed into E. coli BL21(DE3). For overexpression, cells were grown at 37 °C to an optical density (OD600) of about 0.6 in 1 L of LB medium containing ampicillin (100 µg ml⁻¹). Protein expression was induced by adding IPTG to a final concentration of 0.5 mM in LB medium and the temperature was decreased to 20 °C.
Cells were harvested by centrifugation for 10 min at 3,700 g after overnight cultivation. During harvesting, cells were washed once with 50 mM phosphate buffer with 10 % glycerol (pH 7.5) and then resuspended in equilibrating purification buffer (16.4 mM K2HPO4, 3.6 mM KH2PO4, 500 mM NaCl, 10 mM imidazole, pH 7.5). Harvested cells were kept at -80 °C. Defrosted cells were disrupted by sonication with a Hielscher UP200S ultrasonic processor (Hielscher Ultrasonics, Teltow, Germany) and C-terminally His-tagged enzymes were purified to homogeneity using Ni-NTA Superflow Cartridges (Qiagen, Hilden, Germany) as described previously [296]. The eluted proteins were dialyzed against 50 mM phosphate buffer (pH 7.5). Protein concentrations were determined using the Bradford reagent (Sigma-Aldrich, St. Louis, USA) with bovine serum albumin as a standard. The purity of the resulting proteins was checked by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) in 15% polyacrylamide gels. The gels were stained with Coomassie brilliant blue R-250 dye (Fluka, Buchs, Switzerland) and the molecular mass of the proteins was determined using the Protein Molecular Weight Marker (Fermentas, Burlington, Canada).

Circular dichroism (CD) spectroscopy. CD spectra were recorded at 20 °C using a Chirascan spectropolarimeter (Applied Photophysics, Leatherhead, United Kingdom) equipped with a Peltier thermostat. Data were collected from 185 to 260 nm at 100 nm min⁻¹ with a 1 s response time and 2 nm bandwidth using a 0.1 cm quartz cuvette. Each spectrum shown is the average of five individual scans and was corrected for the buffer's absorbance. Collected CD data were expressed in terms of the mean residue ellipticity. The thermal unfolding of the enzymes was followed by monitoring their ellipticity at 222 nm during heating from 20 to 80 °C at a rate of 1 °C min⁻¹, with a resolution of 0.1 °C.
The resulting thermal denaturation curves were roughly normalized to represent signal changes between approximately 1 and 0 and fitted to sigmoidal curves using Origin 6.1 (OriginLab, Northampton, USA). Melting temperatures (Tm) were calculated as the midpoints of the enzymes' normalized thermal transitions.

Activity assay. Enzymatic activity was assayed using the colorimetric method developed by Iwasaki et al. [241]. The release of halide ions was analyzed spectrophotometrically at 460 nm using a SUNRISE microplate reader (Tecan, Grödig/Salzburg, Austria) after reaction with mercuric thiocyanate and ferric ammonium sulfate. The reactions were performed at 37 °C in 25-ml Reacti-flasks closed by Mininert valves. The reaction mixtures contained 1,2-dibromoethane dissolved in 10 ml of 100 mM glycine buffer (pH 8.6), 10 ml of 60 mM glycine buffer with 40% (v/v) DMSO or 10 ml of 48 mM glycine buffer with 52% (v/v) DMSO. The reactions were initiated by addition of enzyme and monitored by periodically withdrawing 1 ml samples from the reaction mixture, immediately mixing them with 0.1 ml of 35% nitric acid to terminate the reaction, and analyzing the quenched samples spectrophotometrically. Dehalogenation activities were quantified as rates of product formation over time. Each activity was measured in 3-5 independent replicates and expressed as a mean value with its standard error.

Principal component analysis. A matrix containing the activity data for nine wild-type HLDs and two mutant HLDs with 30 substrates was analyzed by Principal Component Analysis [331]. The aim of the analysis was to uncover relationships between individual HLDs based on their activities towards the standardized set of substrates [223]. Principal Component Analysis was performed using STATISTICA 10.0 (StatSoft, Tulsa, USA).
The raw data were log-transformed and weighted relative to the individual enzyme's activity towards other substrates prior to performing Principal Component Analysis in order to better discern the enzyme specificity profiles [223]. These transformed data were used to identify substrate specificity groups, i.e., groups of enzymes that exhibited similar specificity profiles regardless of their overall specific activities.

Steady-state kinetics. Steady-state kinetic constants were determined at 37 °C in 25-ml Reacti-flasks closed by Mininert valves using the method previously described by Iwasaki et al. [241]. The reaction mixtures contained 1,2-dibromoethane dissolved in 10 ml of 100 mM glycine buffer (pH 8.6) or 10 ml of 60 mM glycine buffer with 40% (v/v) DMSO. The activity measurements were carried out using at least twelve different substrate concentrations (0.2-20 mM). The initial concentration of 1,2-dibromoethane was determined by gas chromatography using a Trace GC 2000 (Finnigan, San Jose, USA) equipped with a flame ionization detector and a DB-FFAP 30 m x 0.25 mm x 0.25 µm capillary column (J&W Scientific, Folsom, USA). The reaction was started by adding the enzyme. Samples were periodically withdrawn over a 60 min measurement period, immediately quenched by mixing with 0.1 ml of 35% nitric acid, and then analyzed. All data points corresponded to the mean of 3 independent replicates. Kinetic parameters were determined by non-linear curve fitting of the resulting data points using Origin 6.1 (OriginLab, Northampton, USA) with the equation for Michaelis-Menten kinetics (Equation 17) and the Hill equation including substrate inhibition (Equation 18) [240,332], where V_lim is the limiting velocity (V_lim = kcat [E]), Km is the Michaelis constant, K0.5 is the substrate concentration at which half-maximal velocity is achieved according to the cooperativity model, n is the Hill coefficient, Ksi is the inhibition constant and kcat is the catalytic constant:

$$\frac{v}{V_{lim}} = \frac{[S]}{K_m + [S]}$$

Equation 17

$$\frac{v}{V_{lim}} = \frac{[S]^n}{K_{0.5}^n + [S]^n \left(1 + \frac{[S]}{K_{si}}\right)}$$

Equation 18

Crystallographic analysis. Crystals of DhaA106 were obtained by the vapor diffusion method in a sitting drop at room temperature. Crystals were grown from a drop prepared by mixing 2 µl of the protein (10.6 mg ml⁻¹ in 50 mM Tris-HCl pH 7.5) with 2 µl of precipitant solution (0.1 M sodium acetate trihydrate pH 4.8, 0.2 M ammonium acetate and 35% w/v PEG 4000) and equilibrated against 300 µl of reservoir solution. Diffraction data were collected at 100 K to a resolution of 1.69 Å using a home-source X-ray diffraction station (rotating anode Nonius FR591, Bruker-Nonius; 1.542 Å monochromatic fixed wavelength) equipped with a MAR345 detector. The diffraction data were processed using the XDS program [333]. The structure of DhaA106 was solved by the molecular-replacement method using the program MOLREP [334] and the structure of DhaA14 [335] (PDB ID 3G9X) as the search model. Model refinement was carried out using the program REFMAC5 [336] from the CCP4 package (Collaborative Computational Project, Number 4, 1994), interspersed with manual adjustments using Coot [337]. The quality of the model with respect to the experimental data was assessed using the program SFCHECK [338]. All-atom contacts in the refined structure of DhaA106 were validated using the internal tools of Coot [337] and the MOLPROBITY service [339].

Preparation of protein structures for simulations. The structures of DhaA and DhaA80 were downloaded from the RCSB PDB database (PDB ID 4E46 and 4F60), while the structure of DhaA106 was obtained within this study (PDB ID 4WCV).
All structures were prepared for analysis by removing ligands and water molecules. Missing atoms in side chains were added using the module of FoldX [94]. Repaired structures were minimized using Rosetta's minimize_with_cst application. Optimization of both backbone and side chains was enabled, the distance for the full atom pair potential was set to 9 Å, and standard weights for the energy function with a constraint weight of 1 were used. The output of the minimization process was processed using the script convert_to_cst_file.sh to create a constraint file [88]. Protocol 16, which incorporates backbone flexibility, within the ddg_monomer module of Rosetta was applied with default settings to create a model of DhaA63 [88]. All four structures were protonated using the H++ server at pH 7.5 [340]. Water molecules from the respective crystal structures were added to the systems. In the case of DhaA63, non-overlapping water molecules from the crystal structure of DhaA80 were used. Cl⁻ and Na⁺ ions were added to a final concentration of 0.1 M using the Tleap module of AMBER 12 [341]. Using the same module, an octahedral box of TIP3P water molecules [342] was added such that all solute atoms within the system were at least 10 Å from the octahedron's surface.

Molecular dynamics simulations. Energy minimization and molecular dynamics (MD) simulations were carried out using the PMEMD module of AMBER 12 with the ff10 force field [343]. Initially, the investigated systems were minimized by 500 steps of steepest descent followed by 500 steps of conjugate gradient over five rounds with decreasing harmonic restraints. The restraints were applied as follows: 500 kcal mol⁻¹ Å⁻² on all heavy atoms of the protein, and then 500, 125, 25 and 0 kcal mol⁻¹ Å⁻² on backbone atoms only.
The subsequent MD simulations employed periodic boundary conditions, using the particle mesh Ewald method to describe electrostatic interactions [155,344], a 10 Å cut-off for nonbonded interactions, and a 2 fs time step with the SHAKE algorithm to fix all bonds containing hydrogens [345]. Equilibration simulations consisted of two steps: (i) 20 ps of gradual heating from 0 to 300 K at constant volume, using a Langevin thermostat with a collision frequency of 1.0 ps⁻¹ and harmonic restraints of 5.0 kcal mol⁻¹ Å⁻² on the positions of all protein atoms, and (ii) 2000 ps of unrestrained MD at 300 K using the Langevin thermostat at a constant pressure of 1.0 bar with a pressure coupling constant of 1.0 ps. Finally, two separate 200 ns production MD simulations were run for each system using the same settings as the second step of MD equilibration. Coordinates were saved at intervals of 2 ps and the resulting trajectories were analyzed using the Cpptraj module of AMBER 12, and visualized using PyMOL 1.5 (The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC) and VMD 1.9.1 [346].

Tunnel analysis. Tunnels were analyzed using CAVER 3.01 [299]. A total of 100,000 snapshots sampled every 2 ps from the 200 ns molecular dynamics simulations were used as input structures. Each atom in the structure was approximated by 12+1 spheres. The tunnel search was performed using a probe radius of 1.0 Å and tunnel opening (i.e. the ability to accommodate water molecules) was assessed using a 1.4 Å probe; these values correspond to the program's default settings. 100,000 randomly selected tunnels were clustered into 25 clusters using hierarchical average-link clustering with a clustering threshold of 5. The remaining tunnels were assigned to individual clusters using supervised machine learning.
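As a quick consistency check on the number of input structures, the snapshot count follows directly from the trajectory length and the saving interval quoted above:

$$\frac{200\ \mathrm{ns}}{2\ \mathrm{ps}} = \frac{200{,}000\ \mathrm{ps}}{2\ \mathrm{ps}} = 100{,}000\ \text{snapshots per trajectory}$$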
The starting point, initially specified by the ND2 atom of Asn 41, the OD2 atom of Asp 106, the NE1 atom of Trp 107 and the NE2 atom of His 272, was automatically optimized to prevent its collision with protein atoms.

Acknowledgements

The work was supported by the Grant Agency of the Czech Republic (P207/12/0775), the Ministry of Education of the Czech Republic (LO1214 and LH14027) and the European Regional Development Fund (ICRC CZ.1.05/1.1.00/02.0123). JB was supported by the "Employment of Best Young Scientists for International Cooperation Empowerment" (CZ.1.07/2.3.00/30.0037) project co-financed by the European Social Fund and the state budget of the Czech Republic. MetaCentrum and CERIT-SC are acknowledged for providing access to computing facilities (LM2010005 and CZ.1.05/3.2.00/08.0144). The authors would like to express thanks to Tatsiana Holubeva for help with enzyme crystallization.

6.7 Supporting Information

Table S 19 Mutations of DhaA80 in the positions 172 and 176 with predicted neutral or stabilizing effects.

F176 | V172 | ΔΔG [kcal mol⁻¹][a] | SD[b] | Variant[c]
W | I | -0.49 | 0.132 | a
W | I | -0.48 | 0.049 | b
F | I | -0.37 | 0.141 | a
W | V | -0.04 | 0.076 | a
F | V | -0.03 | 0.057 | a
W | V | -0.01 | 0.069 | b
F | V | 0.47 | 0.574 | b
M | I | 0.54 | 0.080 | a
M | I | 0.6 | 0.092 | b
M | L | 0.65 | 0.238 | a
L | I | 0.67 | 0.087 | a
W | L | 0.74 | 0.740 | a
F | I | 0.83 | 0.612 | b
L | I | 0.84 | 0.071 | b
M | L | 0.98 | 0.667 |

[a] ΔΔG predicted by FoldX, [b] standard deviation of FoldX predictions, [c] for each mutant, two variants differing in the order of introducing the mutations were evaluated: a - variant with lower ΔΔG, b - variant with higher ΔΔG.

Table S 20 Summary of mutations in studied DhaA variants.

Variant | Position: 78 | 80 | 148 | 171 | 172 | 176 | 227 | 240 | 291 | 292
DhaA | D | F | T | G | A | C | N | W | P | A
DhaA63 | G | S | L | Q | V | F | T | Y | A | G
DhaA80 | D | F | L | Q | V | F | N | W | P | A
DhaA106 | D | F | L | Q | V | G | N | W | P | A

Table S 21 Specific activity of DhaA variants with 1,2-dibromoethane in three different environments.
Variant | Aqueous buffer [nmol s⁻¹ mg⁻¹] | 40% DMSO [nmol s⁻¹ mg⁻¹] | 52% DMSO [nmol s⁻¹ mg⁻¹]
DhaA | 181.25 ± 9.4 | 2.68 ± 0.51 | ND[a]
DhaA63 | 6.57 ± 0.1 | 14.09 ± 0.52 | 1.0 ± 0.0
DhaA80 | 1.83 ± 0.12 | 3.69 ± 0.69 | 0.19 ± 0.11
DhaA106 | 60.20 ± 9.52 | 35.0 ± 1.8 | 3.34 ± 2.3

[a] ND, not detected

Table S 22. Specific activities of DhaA, DhaA80 and DhaA106 with the set of thirty halogenated substrates.

Specific activity in nmol s⁻¹ mg⁻¹.

No. | Substrate | DhaA[a] | DhaA80 | DhaA106
4 | 1-chlorobutane | 12.78 | 0.57 | 14.13
6 | 1-chlorohexane | 6.46 | 2.78 | 7.84
18 | 1-bromobutane | 11.62 | 1.40 | 12.18
20 | 1-bromohexane | 13.89 | 2.44 | 8.15
28 | 1-iodopropane | 22.79 | 1.75 | 25.85
29 | 1-iodobutane | 14.84 | 1.33 | 11.41
31 | 1-iodohexane | 12.00 | 1.39 | 6.93
37 | 1,2-dichloroethane | 1.08 | 1.89 | 2.94
38 | 1,3-dichloropropane | 21.78 | 0.62 | 30.54
40 | 1,5-dichloropentane | 8.58 | 2.35 | 7.21
47 | 1,2-dibromoethane | 181.25 | 1.83 | 60.20
48 | 1,3-dibromopropane | 20.01 | 1.12 | 9.08
52 | 1-bromo-3-chloropropane | 22.19 | 1.23 | 16.78
54 | 1,3-diiodopropane | 39.13 | 1.87 | 31.13
64 | 2-iodobutane | 6.99 | 2.47 | 12.49
67 | 1,2-dichloropropane | 0.00 | 0.88 | 0.00
72 | 1,2-dibromopropane | 36.51 | 1.52 | 59.60
76 | 2-bromo-1-chloropropane | 19.45 | ND[b] | 43.74
80 | 1,2,3-trichloropropane | 1.82 | 2.74 | 4.55
111 | bis(2-chloroethyl)ether | 9.10 | 3.04 | 25.91
115 | chlorocyclohexane | 0.70 | 1.34 | 1.02
117 | bromocyclohexane | 2.27 | 1.95 | 6.62
119 | (1-bromomethyl)cyclohexane | 2.27 | 0.87 | 1.05
137 | 1-bromo-2-chloroethane | 74.90 | 2.72 | 44.83
138 | chlorocyclopentane | 5.29 | 0.87 | 7.34
141 | 4-bromobutyronitrile | 39.63 | 2.55 | 57.01
154 | 1,2,3-tribromopropane | 49.71 | 3.02 | 64.11
155 | 1,2-dibromo-3-chloropropane | 45.08 | 2.66 | 57.56
209 | 3-chloro-2-methylpropene | 15.48 | 1.44 | 12.82
225 | 2,3-dichloropropene | 23.88 | 0.57 | 2.45

[a] Data from Koudelakova et al. 2013, [b] ND, not determined.

Table S 23. Diffraction data collection and refinement statistics of DhaA106.
X-ray diffraction data collection statistics
Space group | P1
Cell parameters (Å, °) | a = 42.585, b = 44.477, c = 46.508; α = 115.466, β = 98.790, γ = 109.122
Number of molecules in AU | 1
Wavelength (Å) | 1.541790
Resolution (Å) | 1.69
Number of unique reflections | 28416 (4407)
Redundancy | 2.78 (2.73)
Completeness (%) | 92.61 (89.3)
Rmerge[a] | 5.7 (13.65)
Average I/σ(I) | 18.76 (9.16)
Wilson B (Å²) | 14.578

Refinement statistics
Resolution range (Å) | 39.42-1.69 (1.733-1.689)
No. of reflections in working set | 26995 (1841)
R value (%)[b] | 12.015
Rfree value (%)[c] | 17.573
RMSD bond length (Å) | 0.020
RMSD angle (°) | 1.846
No. of atoms in AU | 2,449
No. of protein atoms in AU | 2,449
No. of water molecules in AU | 477
No. of acetate ions in AU | 2
No. of chloride ions in AU | 1
Mean B value (Å²) | 10.459

Ramachandran plot statistics
Residues in favored regions (%) | 97.1 (303/312)
Residues in allowed regions (%) | 100 (313/313)
PDB code | 4WCV

The data in parentheses refer to the highest-resolution shell. [a] $R_{merge} = \sum_{hkl}\sum_i |I_i(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_i I_i(hkl)$, where $I_i(hkl)$ is an individual intensity of the ith observation of reflection hkl and $\langle I(hkl)\rangle$ is the average intensity of reflection hkl with summation over all data, [b] R value $= \sum ||F_o| - |F_c|| / \sum |F_o|$, where $F_o$ and $F_c$ are the observed and calculated structure factors, respectively, [c] Rfree is equivalent to R value but is calculated for 5 % of the reflections chosen at random and omitted from the refinement process.

Table S 24. Comparison of main and slot tunnels of DhaA variants calculated by CAVER 3.01 from MD trajectories.

Enzyme | Tunnel | Average bottleneck radius [Å] | Maximal bottleneck radius [Å] | Average length [Å] | Average curvature | Average throughput
DhaA | main | 1.52 | 3.07 | 12.01 | 1.20 | 0.68
DhaA | slot | 1.15 | 2.19 | 17.18 | 1.40 | 0.49
DhaA63 | main | 1.10 | 1.86 | 15.11 | 1.29 | 0.49
DhaA63 | slot | 1.10 | 1.96 | 17.18 | 1.37 | 0.46
DhaA80 | main | 1.10 | 1.75 | 11.88 | 1.29 | 0.53
DhaA80 | slot | 1.06 | 1.52 | 19.64 | 1.47 | 0.39
DhaA106 | main | 1.45 | 2.84 | 15.74 | 1.33 | 0.62
DhaA106 | slot | 1.09 | 2.29 | 18.96 | 1.40 | 0.44

Table S 25.
Characteristics of the main tunnel of DhaA variants calculated by CAVER 3.01 from MD trajectories.

Enzyme | Tunnel detected [%] | Tunnel open [%] | Average bottleneck radius [Å] | Maximum bottleneck radius [Å]
DhaA | 93.92 | 58.36 | 1.52 | 3.07
DhaA63 | 6.38 | 0.05 | 1.10 | 1.72
DhaA80 | 1.14 | 0.02 | 1.10 | 1.75
DhaA106 | 86.21 | 47.53 | 1.45 | 2.84

Figure S 16. Far-UV circular dichroism spectra of DhaA variants (DhaA, DhaA63, DhaA80 and DhaA106; wavelength range 180-260 nm).

Figure S 17. Statistical analysis of the substrate specificity data. a) The score plot t1/t2 from PCA with transformed dataset. The score plot is a two-dimensional window into the multidimensional space, where the objects (enzymes) with similar properties (specificity profiles) are collocated. The t1/t2 score plot describing 44.5 % of variance in the dataset shows the enzymes clustered in individual substrate specificity groups (SSGs). DhaA106 was clustered to the same substrate specificity group as both DhaA and DhaA80. b) The corresponding loading plot p1/p2 from PCA with transformed dataset showing the main substrates for each SSG. Numbering of the substrates is provided in Table S3.

Figure S 18. The structure of DhaA106 determined by protein crystallography. Green cartoon represents the secondary structure elements. The amino acid substitutions in the tunnel are shown as black sticks. The tunnel calculated using CAVER 3.01 is shown in red.

Figure S 19. Time evolution of the distance between Cα atoms of the residues 144 and 176 at the end of the helices lining the main access tunnel (panels MD1 and MD2; distance in Å versus time in ns). Horizontal black line represents the average distance in DhaA80 and DhaA63.
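The distance traces summarised in Figure S 19 amount to a per-frame Euclidean distance between two Cα positions. The following is a minimal sketch in Python, assuming the coordinates of the two atoms have already been extracted from the trajectory into lists of (x, y, z) tuples in Å; the function names are hypothetical and this is not the script used in this work.

```python
import math
from statistics import mean


def ca_distance_series(ca_a, ca_b):
    """Per-frame distance between two atoms.

    ca_a, ca_b: sequences of (x, y, z) coordinates, one entry per frame.
    Returns a list with one distance per frame, in the input units.
    """
    if len(ca_a) != len(ca_b):
        raise ValueError("both atoms need the same number of frames")
    return [math.dist(a, b) for a, b in zip(ca_a, ca_b)]


def mean_distance(ca_a, ca_b):
    """Average distance over all frames (e.g. a reference line in a plot)."""
    return mean(ca_distance_series(ca_a, ca_b))


if __name__ == "__main__":
    # Two made-up frames, purely illustrative coordinates.
    atom_144 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    atom_176 = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
    print(ca_distance_series(atom_144, atom_176))
    print(mean_distance(atom_144, atom_176))
```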
7 Site-Specific Analysis of Protein Hydration Based on Unnatural Amino Acid Fluorescence

Mariana Amaro1, Jan Brezovský2,4, Silvia Kováčová3,4, Jan Sýkora1, David Bednář2,4, Václav Němec3,4, Veronika Lišková2, Nagendra Prasad Kurumbang2, Koen Beerens2, Radka Chaloupková2, Kamil Paruch*3,4, Martin Hof*1, and Jiří Damborský*2,4

1 J. Heyrovsky Institute of Physical Chemistry of the ASCR, v. v. i., Academy of Sciences of the Czech Republic, Dolejskova 3, 182 23 Prague 8, Czech Republic
2 Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
3 Department of Chemistry, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00 Brno, Czech Republic
4 International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic

Journal of the American Chemical Society, 2015, 137 (15), 4988-4992
DOI: 10.1021/jacs.5b01681

7.1 Abstract

Hydration of proteins profoundly affects their functions. We describe a simple and general method for site-specific analysis of protein hydration based on the in vivo incorporation of fluorescent unnatural amino acids and their analysis by steady-state fluorescence spectroscopy. Using this method, we investigate the hydration of functionally important regions of dehalogenases. The experimental results are compared to findings from molecular dynamics simulations.

7.2 Introduction

Protein hydration is important in enzymatic catalysis [347] since it influences enzymes' kinetics [348] and enantioselectivity [349], protein folding [350], ligand binding, and DNA-protein interactions [351].
Hydration has been studied using X-ray absorption [352], NMR spectroscopy [353], neutron scattering [354], dielectric relaxation spectroscopy [355], two-dimensional infrared spectroscopy [356], and time-resolved fluorescence spectroscopy [357]. All these techniques provide valuable information on the arrangement of water molecules in the vicinity of specific protein moieties. However, they require costly and highly specialized instrumentation. Here we present a new technique for studying protein hydration using very basic laboratory instruments. Our approach is based on a recently developed method for the site-specific incorporation of unnatural amino acids (UAAs) into protein structures [358] and the analysis of their properties using steady-state fluorescence (SSF) spectroscopy. Additionally, we developed a more economical synthesis of L-(7-hydroxycoumarin-4-yl)ethylglycine, whose preparation had represented a bottleneck for its wider use in protein labeling. Furthermore, for the first time a quantitative analysis of the hydroxycoumarin fluorescence within a specific protein region is performed, followed by a newly established data deconvolution that enables us to estimate the level of protein hydration. We demonstrate its effectiveness by characterizing the molecular environments of UAAs in the tunnel mouths of two haloalkane dehalogenases and comparing these experimental results to the output of molecular dynamics (MD) simulations and previous reports [166,357].

7.3 Results and Discussion

To deliver a UAA into a specific site encoded by a nonsense codon, the tRNA/aminoacyl-tRNA synthetase pair needs to be constructed. In this project we utilized L-(7-hydroxycoumarin-4-yl)ethylglycine [359], which contains the fluorophore 7-hydroxy-4-methylcoumarin (7H4MC). The photophysics of 7H4MC is environment-sensitive [360,361]. Three forms of 7H4MC (neutral, anionic and complexed) exist in equilibrium (Figure S 20A), each with different excitation maxima.
In addition, a tautomeric form can be created in the excited state via proton transfer between the spatially separated carbonyl and hydroxyl groups (Figure 21A, Figure S 20B). Accordingly, the SSF spectra of all four excited-state forms are shifted with respect to each other and thus their contributions to the overall signal can be separated. As the equilibria (both in the ground and excited states) are governed by the hydration state of the dye, information on the contributions of the corresponding microenvironments to the SSF spectra can be gained.

[...] and entropic (T·ΔR-SΔS‡ at 298 K) terms. [a] Data from Prokop et al.'3

Figure 22. Local environment and hydration of the UAAs in DhaA:C176UAA and DbjA:G183UAA. (A) Positions of the UAAs (indicated by the red surface patches) and the tunnel mouths (indicated by black arrows) on the surface of the enzymes. (B) Local interactions of UAAs with hydrogen acceptors and amino groups (purple sticks). The buried (UAA_3) and exposed (UAA_4) conformations of the UAA in DbjA:G183UAA are indicated by green and yellow sticks, respectively. The conformation of the UAA in DhaA:C176UAA is indicated by the orange stick model. (C) Hydration of UAAs. The blue surface represents enzyme regions that are occupied by water molecules for at least 40 % of the total simulation time.

For the decomposition of the SSF spectrum into its separate components, the excitation wavelength was set to 320 nm in order to predominantly excite the neutral form of the fluorophore. The neutral form then populates the complexed, anionic and tautomeric forms (Figure 21A). As shown in Figure 21C, the fluorescence emission of DhaA:C176UAA originates from the neutral, complexed and anionic forms. The tautomer contributes only marginally to the emission spectrum. Conversely, significant tautomer emission was observed for DbjA:G183UAA (Figure 21D).
The formation of the tautomer in DbjA:G183UAA indicates that the chromophore is exposed to 'structured water', i.e. water molecules with residence times longer than the fluorescence time scale. The presence of the tautomer was also confirmed by time-resolved fluorescence spectroscopy (Figure S 23). The decay curve contained a negative component due to the population of the tautomer from the excited neutral species. This feature of the fluorescence decay was not observed for DhaA:C176UAA. Because the tautomeric and anionic species are only formed in aqueous environments, their summed contributions to the overall signal reflect the hydration level within the dye's microenvironment. The anionic and tautomeric forms account for only 50% of the emission signal of DhaA:C176UAA. However, 70% of the emission signal of DbjA:G183UAA can be attributed to these forms (Figure 21E, Table S 29). This indicates that the microenvironment surrounding the UAA in DhaA:C176UAA is much less extensively hydrated than in DbjA:G183UAA. To further validate the approach, we thermally denatured the enzymes. The UAA originally buried within the protein interior becomes exposed to a more hydrated microenvironment upon denaturation. As expected, for both enzymes the increasing temperature causes an increase in the contribution of the anion at the expense of the other forms, leading to growth of the A_A + A_T parameter (Figure 21E, Table S 29). Replicated 200 ns MD simulations were performed for both labelled and wild-type structures of the studied enzymes. The periodic-boundary NPT simulations were carried out in AMBER 12 (University of California, San Francisco, 2012) using the ff10 force field (Table S 30, Figure S 24) [343,363,364]. The level of hydration within the tunnel mouth of the wild-type enzymes was about 1.7-times higher in DbjA than in DhaA (Table 8).
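This factor follows directly from the mean numbers of water molecules within 5 Å reported for the wild-type enzymes in Table 8 (27 for DbjA and 16 for DhaA), neglecting the stated uncertainties of ±4:

$$\frac{N_{\mathrm{DbjA}}}{N_{\mathrm{DhaA}}} = \frac{27}{16} \approx 1.7$$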
The MD simulations show that the UAA is in one dominant conformation in the tunnel of DhaA:C176UAA (Figure S 25). Conversely, the wider tunnel mouth of DbjA:G183UAA allowed the UAA to adopt two equally relevant stable conformations, one buried in the tunnel (UAA_3) and one more exposed (UAA_4; Table S 31, Figure S 25). The simulations suggested that the UAA microenvironment is substantially less hydrated in DhaA:C176UAA than in both variants of DbjA:G183UAA (Table 8, Figure 22C), which is consistent with our experimental results. Additionally, the simulations indicated that the fluorophore often forms hydrogen bonds with adjacent amino acid residues (Thr148 in DhaA:C176UAA and Glu146 in DbjA:G183UAA) and with amino acids whose side chains contain amino groups (His272 for DhaA:C176UAA and Arg179 or His139 for DbjA:G183UAA) (Figure 22B, Table S 32). This may facilitate the formation of the complexed form detected in our experiments. Finally, the simulations indicated that water molecules in the microenvironment of the buried conformation of DbjA:G183UAA have residence times of up to 60 ns, which is much longer than the fluorescence time scale (Figure S 26). This explains the strong contribution of the tautomeric form for this mutant. Such extremely long residence times are not found in structures without UAA (Figure S 26). This suggests that the "structured water" detected in DbjA:G183UAA is induced by the UAA itself locking several water molecules within the active site pocket.

Table 8. Hydration of haloalkane dehalogenases DhaA, DbjA, and DhaA:C176UAA and DbjA:G183UAA revealed by fluorescence spectroscopy and molecular dynamics simulations. Experimental parameters and results are presented in normal text; parameters and results obtained from MD simulations are presented in italics.
Parameter | DhaA:C176UAA | DbjA:G183UAA | DhaA | DbjA
Overall contributions of anionic and tautomeric forms to the emission SSF spectra | UAA less hydrated | UAA more hydrated | Less hydrated tunnel mouth* | More hydrated tunnel mouth*
Number of water molecules within 5 Å | 9±2 | 21±5 or 28±4 | 16±4 | 27±4

*The enzymes have been characterized previously [11]. The methodology used was different from the SSF method.

7.4 Conclusions

Recent findings emphasizing the importance of dynamics and hydration in enzymatic catalysis and rational protein design [348,349] have created a strong demand for methods that provide site-specific information on these factors. In our approach, site-specificity is guaranteed by using a UAA. SSF spectroscopic analysis of the hydroxycoumarin probe incorporated into the structure of the enzymes is a strikingly simple and universally applicable experimental procedure. The photophysics of the UAA provide qualitative information on the extent of hydration, as demonstrated for two HLDs with already characterized hydration levels [357]. Although MD simulations show that incorporation of the chromophore can influence the residence times of water molecules, the conclusions on the hydration levels are valid. Given the ongoing development of UAA technology, this method could potentially be used to analyze hydration at specific sites in a wide range of proteins.

Acknowledgement

Financial support from the Czech Science Foundation via grants P208/12/G016 (M.H. and J.S.) and P207/12/0775 (R.Ch.) and the Ministry of Education of the Czech Republic (LO1214; CZ.1.05/1.1.00/02.0123) is acknowledged. Moreover, M.H. acknowledges the Praemium Academiae Award from the Academy of Sciences of the Czech Republic. The work of J.B. and K.B. was supported by the program "Employment of Best Young Scientists for International Cooperation Empowerment" (CZ.1.07/2.3.00/30.0037) with co-financing from the European Social Fund and the state budget of the Czech Republic.
MetaCentrum is acknowledged for providing access to their computing facilities, supported by the Ministry of Education of the Czech Republic (LM2010005). CERIT-SC is acknowledged for providing access to their computing facilities, under the program Center CERIT Scientific Cloud (CZ.1.05/3.2.00/08.0144).

7.5 Supplementary Information

7.5.1 Methods

Chemical synthesis of unnatural amino acid (UAA)

(2S)-1-Benzyl 7-ethyl 2-{[(benzyloxy)carbonyl]amino}-5-oxoheptanedioate

Carbonyldiimidazole (6.00 g, 37.00 mmol) was added in portions to (2S)-5-(benzyloxy)-4-{[(benzyloxy)carbonyl]amino}-5-oxopentanoic acid (12.50 g, 33.66 mmol) in anhydrous THF (120 mL) and the mixture was stirred under nitrogen at 25 °C for 90 min. Potassium ethyl malonate (5.50 g, 32.32 mmol) and magnesium chloride (6.00 g, 63.02 mmol) were then added and the mixture was stirred at 25 °C for 14 h. The reaction mixture was poured into water (500 mL) and extracted with diethyl ether (3 x 400 mL). The combined extracts were washed with a saturated aqueous solution of NaHCO3 (200 mL), dried over Na2SO4 and filtered, and the solvent was evaporated. The product was obtained as a pale yellow solid (14.34 g, 96 %) and was used directly in the next step without additional purification. An analytically pure sample can be obtained by flash column chromatography on silica gel (hexane/EtOAc, 1:3). 1H NMR (300 MHz, CDCl3): δ 1.26 (t, 3H); 1.86-2.08 (m, 1H); 2.11-2.28 (m, 1H); 2.48-2.76 (m, 2H); 3.35 (s, 2H); 4.17 (q, 2H); 4.30-4.49 (m, 1H); 5.10 (s, 2H); 5.16 (s, 2H); 5.38 (d, 1H).

L-(7-hydroxycoumarin-4-yl)ethylglycine

Methanesulfonic acid (19.7 mL) was added at 0 °C (ice bath) to a mixture of (2S)-1-benzyl 7-ethyl 2-{[(benzyloxy)carbonyl]amino}-5-oxoheptanedioate (4.90 g, 11.11 mmol) and powdered resorcinol (6.12 g, 55.55 mmol). The mixture was stirred at 0 °C for 5 min, then allowed to warm to 25 °C and stirred for 15 h. The resulting red solution was poured onto crushed ice (100 g) and water was added (100 mL).
The mixture was extracted with diethyl ether (3 x 70 mL). The strongly acidic aqueous phase was loaded onto a column of Dowex 50WX8 (Aldrich, 100 g, diameter 4 cm, height 12 cm). The column was washed with water (800 mL) and then eluted with 1 M aqueous NaOH (total elution time: ca. 30 min). The dark red fractions were collected and concentrated aqueous HCl was added to adjust the pH to 6. The solution was concentrated to % of the volume and allowed to stand at 4 °C overnight. The red precipitate was collected by filtration, mixed with 4 mL of ice-cold water and the mixture was quickly filtered. The precipitate on the filter was collected and dried in a vacuum, yielding a pink solid (1.97 g). The solid was suspended in anhydrous EtOH (200 mL), stirred at 70 °C for 15 min, and the precipitate was collected by filtration. The precipitate was suspended in anhydrous EtOH (200 mL), stirred at 70 °C for 15 min, and filtered. The solid was collected by filtration and dried in a vacuum to yield a pale pink solid (0.59 g, 19 %). 1H NMR (500 MHz, DMSO-d6): δ 1.91-2.10 (m, 2H); 2.77-2.95 (m, 2H); 6.08 (s, 1H); 6.73 (d, 2H); 6.85 (d, 1H); 7.70 (d, 1H). 13C{1H} NMR (125 MHz, D2O): δ 25.03; 27.07; 29.31; 54.34; 103.15; 109.11; 11.92; 114.03; 126.22; 128.70; 128.80; 173.97.

Plasmids, strains and chemicals

The plasmid (pEVOL-aaRS) carrying the engineered orthogonal tRNA and aminoacyl-tRNA synthetase pair was obtained from Professor Peter Schultz (The Scripps Research Institute, USA). Two target proteins, the haloalkane dehalogenases DhaA and DbjA, were cloned in the pET21b vector for expression with a His6-tag to facilitate purification. Escherichia coli DH5α and E. coli BL21 (DE3) (Stratagene) were used as the regular cloning host and for protein expression, respectively. All antibiotics used and L-arabinose were obtained from Sigma-Aldrich and filter sterilized before use.
Construction of mutants by site-directed mutagenesis

The codons for C176 of DhaA and G183 of DbjA were replaced by the amber stop codon (TAG) by inverse PCR. 5'-phosphorylated primers were designed based on the DhaA and DbjA nucleotide sequences and obtained from Sigma-Aldrich. Primer sequences were the following: DhaA-C176-for: 5'-TACGGAGGTCGAGATGGACCACTATCG-3' and DhaA-C176-rev: 5'-AGCGGACGGACGACCTATTTCGGG-3' for mutation of Cys176 in DhaA; and DbjA-G183-for: 5'-GCTCGGCGACGAAGAAATGGCG-3' and DbjA-G183-rev: 5'-TTGCGGACGATTCCCTAGGGCAGAAC-3' for mutation of Gly183 in DbjA. Briefly, 0.8 µl of Pfu DNA polymerase (Promega), 5 µl of Pfu DNA polymerase buffer with MgSO4, 4 µl of dNTP mix (2.5 mM each), 1 µl of each primer at 1 µM concentration, 1 µl of plasmid template (pET21-dhaA or pET21-dbjA) and 37.2 µl of sterile water were added to a PCR tube. The thermocycler was set at 95°C for 2 min for denaturation, and 30 cycles of 95°C for 1 min for multiplication, 58°C for 30 s for annealing and 72°C for 12 min for elongation, followed by 72°C for 5 min for final extension. The PCR products were treated with 1-2 µl of DpnI (New England Biolabs) for 2 h at 37°C and then purified using a PCR purification kit (Qiagen). Blunt-end ligation was performed at 16°C overnight by adding 1 µl of T4 DNA ligase and buffer (Promega) to 10 µl of purified PCR sample. Competent E. coli DH5α cells were chemically transformed with the ligation product. Colonies were selected on LB agar plates supplemented with 100 µg/ml of ampicillin. Plasmid DNA was isolated and the mutations were confirmed by sequencing. The final plasmids were named pET21-dhaA:C176TAG and pET21-dbjA:G183TAG (Cys176 and Gly183 replaced by the TAG stop codon, respectively).

Transformation and expression of protein variants

The expression host E. coli BL21 (DE3) was chemically transformed with pEVOL-aaRS and the respective mutant plasmid.
The colonies were selected on LB agar plates containing ampicillin (100 µg/ml) and chloramphenicol (34 µg/ml). A single colony from each mutant was picked and cultivated in 10 ml of LB supplemented with ampicillin and chloramphenicol. Fresh overnight culture (1 ml) was added to 1 L of LB medium with the appropriate antibiotics and grown at 37°C, 105 rpm until the OD600 reached ~0.5. Cells were harvested aseptically by centrifugation at 4°C and the supernatant was discarded. Cell pellets were resuspended in 1 L of fresh LB containing the required antibiotics and 10 ml of UAA solution was added for UAA feeding. For the UAA solution, 263 mg of UAA (containing the 7H4MC fluorophore) was dissolved in 10 ml of 100 mM KOH, the pH was adjusted to 7.0 with HCl and the solution was filter sterilized. The cultures were incubated again at 37°C, 105 rpm until the OD600 reached 0.8-1.0. The samples were then cooled to 20°C and expression was induced by addition of IPTG and L-arabinose (0.5 mM and 0.02 % (w/v) final concentration, respectively). Incubation was continued overnight at 20°C and finally the cells were harvested by centrifugation. The UAA-labeled protein was purified via standard His-tag purification. Expression and purification were checked via SDS-PAGE.

Confirmation of protein structure by circular dichroism (CD) analysis

CD spectra were recorded at room temperature using a Chirascan spectrometer (Applied Photophysics, UK). Data were collected from 185 to 260 nm (at 100 nm/min, 1 s response time and 2 nm bandwidth) using a 0.1 cm quartz cuvette containing the enzymes in 50 mM potassium phosphate buffer (pH 7.5). Each spectrum shown is the average of five to ten individual scans and is corrected for absorbance caused by the buffer.
CD data were expressed in terms of the mean residue ellipticity (ΘMRE) using Equation 19:

$$\Theta_{MRE} = \frac{\Theta_{obs} \cdot M_w \cdot 100}{n \cdot c \cdot l}$$ Equation 19

where Θobs is the observed ellipticity in degrees, Mw is the protein molecular weight, n is the number of residues, l is the cell path length (0.1 cm), c is the protein concentration and the factor 100 originates from the conversion of the molecular weight to mg/dmol.

Confirmation of protein function by determination of specific activity

Enzymatic activity was assayed by the colorimetric method developed by Iwasaki and coworkers [241]. The release of halide ions was analyzed spectrophotometrically at 460 nm using the Sunrise microplate reader (Tecan, Austria) after reaction with mercuric thiocyanate and ferric ammonium sulfate. The dehalogenation reaction was performed with 1,2-dibromoethane as a substrate at 37 °C in 25 ml Reacti-Flasks closed by Mininert valves. The reaction mixture contained 10 ml of glycine buffer (100 mM, pH 8.6) and 10 µl of the substrate 1,2-dibromoethane. The reaction was initiated by addition of the enzyme to a final concentration of 0.15 µM. The reaction was monitored by withdrawing 1 ml samples from the reaction mixture at periodic intervals and immediately mixing each sample with 0.1 ml of 35% nitric acid to terminate the reaction. Dehalogenation activity was quantified as the rate of product formation over time.

Thermodynamic analysis of enzyme enantioselectivity

The temperature dependence of DhaA enantioselectivity (E value) was analyzed in 25 ml Reacti-Flasks closed by Mininert valves containing 25 ml of 50 mM Tris-sulfate buffer (pH 8.2) and 10 µl of racemic 2-bromopentane. The enzymatic reaction was initiated by the addition of appropriate amounts of enzyme. Depending on its activity, the final concentration of enzyme was 0.7-3.7 µM. The reaction was monitored by periodically withdrawing 0.5 ml samples from the reaction mixture.
The reaction was stopped by mixing the sample with 1 ml of diethyl ether containing 1,2-dichloroethane as an internal standard. After the extraction, the diethyl ether was dried on a glass column with sodium sulphate. The samples were analyzed using a Hewlett-Packard 6890 gas chromatograph (Agilent, USA) equipped with a flame ionization detector and a chiral capillary column Chiraldex G-TA (Alltech, USA). The differences in activation enthalpy and entropy between enantiomers, denoted ΔR-SΔH‡ and ΔR-SΔS‡, respectively, were determined by studying the variation of the enantiomeric ratio with temperature according to Equation 20:

$$\ln E = -\frac{\Delta_{R-S}\Delta H^{\ddagger}}{R}\cdot\frac{1}{T} + \frac{\Delta_{R-S}\Delta S^{\ddagger}}{R}$$ Equation 20

where R is the universal gas constant and T is the absolute temperature. ln E varies linearly with the reciprocal temperature; therefore -ΔR-SΔH‡/R and ΔR-SΔS‡/R were determined as the slope and intercept of this linear dependence, respectively.

Steady-state fluorescence spectroscopy

Steady-state fluorescence (SSF) spectra were recorded on a Fluorolog-3 spectrofluorometer (model FL3-11; HORIBA Jobin Yvon) equipped with a xenon-arc lamp. All spectra were collected in 1 nm steps (1 or 2 nm bandwidths were chosen for both the excitation and emission monochromators depending on the signal strength) at 10 °C. The recorded spectra were then fitted by a nonlinear least-squares procedure to the sum of asymmetric peak functions, which are expressed as:

$$y = y_0 + A\,\exp\!\left[-\exp\!\left(-\frac{x - x_c}{w}\right) - \frac{x - x_c}{w} + 1\right]$$ Equation 21

where y0 stands for the offset, and xc, w and A represent the center of the band, its width and amplitude, respectively. The above fitting function was chosen because it provided the best fit to the emission spectra of the individual 7H4MC forms (whose spectra were obtained via dissolution in different solvents).
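To make the shape of such a fitting function concrete, the following minimal Python sketch evaluates one asymmetric band and a sum of bands on a common offset. It assumes that the asymmetric peak has the form of the "Extreme" peak function, y = y0 + A·exp(-exp(-z) - z + 1) with z = (x - xc)/w; this form is an assumption made for illustration, and the function names `extreme_peak` and `spectrum_model` are hypothetical. It is not the fitting code used in this work.

```python
import math


def extreme_peak(x, y0, xc, w, amplitude):
    """Asymmetric peak (assumed 'Extreme' form).

    y0: offset, xc: band center, w: band width, amplitude: peak height.
    The maximum y0 + amplitude is reached at x == xc.
    """
    z = (x - xc) / w
    return y0 + amplitude * math.exp(-math.exp(-z) - z + 1.0)


def spectrum_model(x, y0, components):
    """Sum of asymmetric bands sharing one offset y0.

    components: iterable of (xc, w, amplitude) tuples, one per emitting form.
    """
    return y0 + sum(extreme_peak(x, 0.0, xc, w, a) for xc, w, a in components)


if __name__ == "__main__":
    # Illustrative band parameters only; widths and amplitudes are made up.
    bands = [(380.0, 15.0, 0.2), (420.0, 18.0, 0.3),
             (450.0, 20.0, 1.0), (480.0, 22.0, 0.4)]
    for wavelength in range(360, 561, 40):
        print(wavelength, round(spectrum_model(wavelength, 0.0, bands), 4))
```

In an actual decomposition, the parameters of each band would be optimized by a nonlinear least-squares routine, and the relative areas or amplitudes of the bands would give the contributions of the individual forms.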
The initial estimates of the parameter $x_c$ were set to 380 nm, 420 nm, 450 nm and 480 nm, which correspond to the wavelengths of the emission maxima of the particular forms of 7H4MC. During the fitting procedure, the $x_c$ parameter was kept within ±8 nm of its initial value. The R-square parameter provided by the software OriginPro 8 (OriginLab Corporation) was taken as a measure of the goodness of fit.

Time-resolved fluorescence decays were measured using the time-correlated single photon counting technique on an IBH 5000 U SPC instrument (HORIBA Jobin-Yvon, USA) equipped with a cooled Hamamatsu R3809U-50 microchannel plate photomultiplier (Hamamatsu, Japan) with 40 ps time resolution and a time setting of 7 ps per channel. Bandwidths for both the excitation and emission monochromators were set to 8 nm. A 399 nm cut-off filter was used to eliminate scattered light. Samples were excited at 373 nm with an IBH NanoLED-11 diode laser (80 ps fwhm) or at 340 nm with an IBH NanoLED N-340 (900 ps fwhm) at a repetition frequency of 1 MHz. The detected signal was kept below 20 000 counts per second in order to avoid shortening of the recorded lifetime due to the pile-up effect. The experimental temperature was set to 10 °C. Fluorescence decays were fitted (by an iterative reconvolution procedure with IBH DAS6 software) to a multiexponential function (Equation 22) convoluted with the experimental response function IRF ("prompt"), yielding sets of lifetimes $\tau_i$ and corresponding amplitudes $A_i$. The average lifetimes were calculated according to Equation 23.

$$I(t) = \sum_i A_i \, e^{-t/\tau_i} \otimes \mathrm{IRF}$$ (Equation 22)

Equation 23

Parameterization of UAA residue for force field calculations

The structure of the UAA residue was constructed in the extended and α-helical conformations of the backbone atoms using the Avogadro 1.0.3 program³⁶⁵.
For both backbone conformations, the lowest energy conformations of the UAA side-chain were identified with a systematic rotor search, keeping the backbone part of the UAA residue fixed. The UAA residue was then capped with N-methylamide (NME) and acetyl (ACE) residues for the purpose of charge fitting. The geometry of the selected conformations (UAA1-4; Figure S 24) was optimized employing the MP2/6-31G* wave function using the Gaussian09 program, revision D.01³⁶⁶. The partial atomic charges of the novel residue were obtained using the RESP ESP charge derive (R.E.D.) server 2.0³⁶⁷,³⁶⁸ at the HF/6-31G* level of theory using the Gaussian09 program. The charges on the UAA residue were derived employing the RESP-A1A charge model using a multi-conformation multi-orientation RESP fit. The charges on the capping NME and ACE residues were constrained to zero during the fitting procedure. The charges on the four atoms forming the peptide bond (N26, H27, C28 and O29) were constrained to the corresponding values of the electro-neutral residues from the force field of Cornell et al.³⁶⁹ The atom types were derived in analogy with the force field of Cornell et al. The only exception is the aromatic oxygen contained in the moiety of the fluorescent probe, whose parameters were obtained from the study of VanBeek et al.³⁷⁰ Partial charges and atom types of the UAA residue are summarized in Supplementary Table 5.

Preparation of protein structures for simulations

Structures of DhaA (PDB-ID: 4E46) and DbjA (PDB-ID: 3A2M, chain A) were downloaded from the RCSB PDB database³⁷¹. All selected crystal structures were prepared for analysis by removing ligands and water molecules. Both structures were protonated by the H++ server at pH 7.5³⁴⁰. The viability of introducing the four selected conformations (UAA1-4) into both enzymes was assessed using PyMOL 1.7²³⁵. In the case of DbjA, three conformations (UAA1, UAA3 and UAA4) fitted without serious steric clashes.
In the case of DhaA, only a single conformation (UAA2) was viable. All water molecules from the crystal structure of DbjA that did not overlap with the protein structure were returned to the system. In the case of DhaA, non-overlapping water molecules from the crystal structure 1CQW were added in order to properly solvate the enzyme active site (structure 4E46 contains a bound ligand). Cl⁻ and Na⁺ ions were added to a final concentration of 0.1 M using the tLeap module of AMBER 12³⁴¹. Using the same module, an octahedral box of TIP3P water molecules³⁴² was added, extending 10 Å from any atom in the system.

Molecular dynamics simulation

Energy minimization and MD simulations were carried out in the PMEMD.CUDA module³⁷²,³⁷³ of AMBER12 using the ff10 force field³⁴³,³⁶³,³⁶⁴. Initially, the investigated systems were minimized by 500 steps of steepest descent followed by 500 steps of conjugate gradient in five rounds of decreasing harmonic restraints. The restraints were applied as follows: 500 kcal·mol⁻¹·Å⁻² on all heavy atoms of the protein, and then 500, 125, 25 and 0 kcal·mol⁻¹·Å⁻² on backbone atoms only. The subsequent MD simulations employed periodic boundary conditions, the particle mesh Ewald method for treatment of the electrostatic interactions¹⁵⁵,³⁴⁴, a 10 Å cutoff and a 2 fs time step with the SHAKE algorithm to fix all bonds containing hydrogens³⁴⁵. Equilibration simulations consisted of two steps: (i) 20 ps of gradual heating from 0 to 300 K under constant volume using a Langevin thermostat with a collision frequency of 1.0 ps⁻¹ and harmonic restraints of 5.0 kcal·mol⁻¹·Å⁻² on the positions of all protein atoms; (ii) 2000 ps of unrestrained MD at 300 K using the Langevin thermostat and a constant pressure of 1.0 bar with a pressure coupling constant of 1.0 ps. Finally, two separate 200 ns long production MD simulations were run for each system using the same settings as the second step of the equilibration MD.
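The timings of this protocol translate directly into integration steps and stored frames. The short sketch below does the bookkeeping, assuming the 2 fs time step, the 2 ps saving interval used for the trajectories and the every-100th-frame sampling used in the free energy analysis; all variable and function names are illustrative only and are not AMBER keywords.

```python
FS_PER_PS = 1000

def steps_for(duration_ps, time_step_fs=2):
    """Number of integration steps needed to cover `duration_ps`."""
    return duration_ps * FS_PER_PS // time_step_fs

def frames_for(duration_ps, save_interval_ps=2):
    """Number of coordinate frames written during `duration_ps`."""
    return duration_ps // save_interval_ps

# Minimization: five rounds of 500 steepest-descent + 500 conjugate-gradient steps,
# with restraint force constants in kcal/mol/A^2.
minimization_rounds = [
    ("protein heavy atoms", 500),
    ("backbone", 500),
    ("backbone", 125),
    ("backbone", 25),
    ("backbone", 0),
]
total_minimization_steps = len(minimization_rounds) * (500 + 500)

heating_steps = steps_for(20)               # 20 ps of heating
equilibration_steps = steps_for(2000)       # 2000 ps of unrestrained equilibration
production_steps = steps_for(200 * 1000)    # 200 ns of production
production_frames = frames_for(200 * 1000)  # frames saved every 2 ps
snapshots_every_100th = production_frames // 100
```

Under these assumptions each 200 ns production run corresponds to 100,000 saved frames, so taking every 100th frame gives the 1000 snapshots per trajectory used for the free energy calculation.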
Coordinates were saved at 2 ps intervals and the trajectories were analyzed using the Cpptraj module³⁷⁴ of AMBER12 and visualized in PyMOL 1.7 and VMD 1.9.1³⁴⁶. The total free energy of the enzymes, used as a proxy for their stability, was calculated by the Molecular Mechanics/Generalized Born Surface Area method. From each MD trajectory, 1000 snapshots sampled at every 100th frame were used in the analysis. The free energy was calculated by combining the gas phase energy contributions with solvation free energy components calculated from an implicit solvent model. Input topologies of the enzymes alone were prepared by the tLeap module of AMBER12 using the ff10 force field. The following settings were used for the calculation: PBradii were set to mbondi3, Generalized Born model = 8 and saltcon = 0.1. The analysis was performed by the Python script MMPBSA.py³⁷⁵ implemented in AmberTools13.

7.5.2 Supplementary Tables

Table S 26. Areas of the deconvoluted emission spectra of the different forms of 7H4MC when embedded in AOT reverse micelles with different water content (w0)

w0    A_N (%)   A_T (%)   A_A (%)   A_A + A_T (%)
0     100       0         0         0
2     100       0         0         0
5     92        8         0         8
10    68        18        14        32
20    49        25        26        51
40    45        7         48        55

A_N, A_T and A_A stand for the area of the decomposed emission spectra of the neutral, tautomeric and anionic forms, respectively. The complex form is not created under the given conditions. The increase in the sum of the areas corresponding to the anionic and tautomeric forms (A_A + A_T) with growing water content (w0) demonstrates that this parameter is suitable for qualitative characterization of the extent of hydration.

Table S 27.
Areas of the deconvoluted emission spectra of the different forms of 7H4MC when embedded in AOT reverse micelles with different content of imidazole aqueous solution (5 M, pH ~ 13 and pH ~ 7)

        pH 13                            pH 7
w0      A_N (%)   A_C (%)   A_A (%)      A_N (%)   A_C (%)   A_A (%)
0       100       0         0            100       0         0
0.2     57        31        12
0.5     44        33        23           57        31        12
1       38        33        29           45        33        22
2       33        37        30           37        34        29
5       18        11        71           32        33        35
10      9         9         82           17        11        72

Imidazole was added in order to mimic the -NH⁺- and -NH- functional groups present in the protein matrix. This results in the formation of the complex form of 7H4MC (emission wavelength ~ 420 nm). A_N, A_C and A_A stand for the area of the decomposed emission spectra of the neutral, complex and anionic forms, respectively. The tautomeric form is not created under the given conditions. The contribution of the anionic form A_A increases with growing w0 and therefore reflects qualitatively the degree of hydration of the microenvironment surrounding the probe.

Table S 28. Specific activities of wild type haloalkane dehalogenases DhaA and DbjA and their variants DhaA:C176UAA and DbjA:G183UAA, with incorporated unnatural amino acid, measured with 1,2-dibromoethane.

Enzyme          Specific activity [µmol·s⁻¹·mg⁻¹ of enzyme]
DhaA            0.0648
DhaA:C176UAA    0.0493
DbjA            0.0928
DbjA:G183UAA    0.1154

Table S 29. Areas of the deconvoluted emission spectra of the different forms of the 7H4MC fluorophore present in the UAA incorporated in DhaA and DbjA. Data were recorded at various temperatures in order to follow the effect of the thermal denaturation of the protein on the hydration parameter (A_A + A_T).
                A_N (%)   A_C (%)   A_A (%)   A_T (%)   A_A + A_T (%)
DhaA:C176UAA
  10 °C         32        18        43        7         50
  30 °C         25        19        56        0         56
  50 °C         0         17        83        0         83
  55 °C         0         13        87        0         87
  65 °C         0         11        89        0         89
DbjA:G183UAA
  10 °C         7         24        38        32        70
  30 °C         5         25        49        22        71
  50 °C         5         17        78        0         78
  55 °C         0         19        81        0         81
  60 °C         0         17        83        0         83

A_T, A_C, A_N and A_A stand for the area of the decomposed emission spectra of the tautomeric, complex, neutral and anionic forms, respectively. The intensities were corrected for the different quantum yields of the various forms based on the lifetime values recorded at their corresponding wavelengths (see Figure S 27).

Table S 30. Atom types and partial charges for the UAA residue.

Atom ID   Atom type   Partial charge
1         CT          -0.1295
2         HC           0.0771
3         HC           0.0771
4         CA           0.0630
5         CA          -0.3605
6         HA           0.1538
7         CA           0.7456
8         OA          -0.3536
9         CA           0.2830
10        CA           0.0023
11        CA          -0.1637
12        HA           0.1666
13        CA          -0.2489
14        HA           0.1852
15        CA           0.3629
16        CA          -0.3933
17        HA           0.1877
18        OH          -0.5843
19        HO           0.4412
20        O           -0.5754
21        CT          -0.0119
22        HC           0.0434
23        HC           0.0434
24        CT           0.0159
25        H1           0.0873
26        N           -0.4157
27        H            0.2719
28        C            0.5973
29        O           -0.5679

Atom IDs are defined in Figure S 24. Atom types are based on the Cornell et al. force field.

Table S 31. Stability of enzymes with incorporated UAA.

                              Energy [kcal/mol]
Enzyme   UAA conformation     MD1         MD2        Average
DhaA     UAA_2                -6572±2     -6554±2    -6563±3
DbjA     UAA_1                -7479±2*    -7474±2    -7474±2
         UAA_3                -7476±2     -7483±2    -7480±3
         UAA_4                -7475±2     -7490±2    -7483±3

* The conformation UAA_1 was unstable in this simulation and changed to conformation UAA_3.

Table S 32. Potential hydrogen bonding involving UAA. Hydrogen bonds formed with the sidechain of the UAA are shaded.
                                                                           Average geometry of detected bond
Enzyme          UAA conformation   Acceptor    Donor       Occurrence [%]   Distance [Å]   Angle [°]
DhaA:C176UAA    UAA_2              Thr148      UAA         95±3             2.8            162.4
                                   UAA         His272      85±3             3.1            156.4
                                   Leu173      UAA         75±5             3.3            150.9
                                   UAA         Tyr273      78±3             3.5            156.5
                                   Ala172      UAA         28±7             3.0            147.2
                                   Ala145      UAA         2±1              3.3            143.44
DbjA:G183UAA    UAA_3              Glu146 (a)  UAA         82±15            2.7            163.3
                                   Arg179      UAA         64±3             3.1            151.2
                                   UAA         Ile185      25±1             3.0            142.6
                                   Val180      UAA         16±1             3.2            143.8
                                   UAA         His139      12±2             3.6            148.1
                UAA_4              Arg179      UAA         80±5             3.1            152.6
                                   UAA         Ile185      13±8             3.0            141.8
                                   Val180      UAA         7±4              3.3            142.8
                                   UAA         Arg179 (b)  5±1              3.2            151.5

(a) Hydrogen bond with Glu146 is formed via OE1 and OE2 atoms. (b) Hydrogen bond with Arg179 is formed via HH11, HH21 and HE atoms.

7.5.3 Supplementary Figures

Figure S 20. General reaction scheme of 7H4MC in the ground and excited state. (A) Ground state equilibrium of 7H4MC. Three forms of 7H4MC (neutral, anionic and a complex of the neutral form, for instance with an adjacent amino group) can exist and/or coexist. The value of pKa for Reaction I is approximately 7.8. The numbers accompanying the rippled arrows correspond to the absorption wavelengths of the particular 7H4MC form. (B) Neutral, anionic and complex forms can also occur in the excited state. In addition, proton transfer (Reaction III) can take place, resulting in the formation of a tautomer. This proton transfer was shown to be promoted by the presence of 'structured water'³⁶¹. In bulk, formation of the anion (Reaction I) prevails since pKa* in the excited state decreases beyond 1.5³⁶¹. The numbers accompanying the rippled arrows correspond to the emission wavelengths of the particular 7H4MC form. Red reaction arrows highlight the dominant pathways in the excited state.

[Plot: emission spectra; x-axis: wavelength (nm), 350-600 nm]

Figure S 21. Emission spectra of 7H4MC incorporated in AOT reverse micelles at various water/surfactant ratios (w0).
Spectra were recorded at the excitation wavelength of 320 nm. The black curve represents the recorded emission spectrum; the green, blue and orange curves represent its decomposition into the neutral, anionic and tautomeric contributions, respectively. Panels (A), (B) and (C) depict the emission spectra recorded at w0 = 0, 10 and 40, respectively.

4 and analysed on a gas chromatograph Agilent 7890 (Agilent, USA) equipped with a capillary column DB-FFAP (30 m × 0.25 mm × 0.25 µm, Phenomenex) and connected to a mass spectrometer Agilent 5975C (Agilent, USA). The amount of halide in the water phase was measured by the ion chromatograph 861 Advanced Compact IC equipped with a METROSEP A Supp 5 column (Metrohm, Switzerland). The fluorescence kinetic data were recorded using the stopped-flow instrument SFM-300 (BioLogic, France) combined with a MOS-200 spectrometer equipped with a Xe arc lamp. Fluorescence emission from tryptophan residues was observed through a 320 nm cut-off filter upon excitation at 295 nm. All reactions were performed at 37 °C in a glycine buffer of pH 8.6.

Data Analysis and Statistics. All data were imported and fitted globally with the KinTek Explorer program (KinTek Corporation). Data fitting used numerical integration of rate equations from an input model, searching for a set of parameters that produces a minimum χ² value using nonlinear regression based on the Levenberg-Marquardt method³⁹¹. To account for slight variations in the data, enzyme or substrate concentrations were slightly adjusted (±10%) to derive the best fits. In addition, substrate binding was assumed to be in rapid equilibrium, and so the binding rate constant was set to 1 000 mM⁻¹·s⁻¹. Allowing the dissociation rate to vary then made the calculation of equilibrium constants possible. Residuals were normalized by the sigma value for each data point. The standard error (S.E.) was calculated from the covariance matrix during nonlinear regression. In addition to S.E. values, a more rigorous analysis of the variation of the kinetic parameters was accomplished by confidence contour analysis using FitSpace Explorer (KinTek, USA). In this analysis, the lower and upper limits for each parameter were derived from the confidence contours for the χ² threshold at boundary 0.95³⁹².

Molecular modelling. Preparation of the ligand structure. The three-dimensional structures of (R)- and (S)-2-bromopentane were prepared in Avogadro³⁶⁵. Their partial atomic charges were derived by the R.E.D. server³⁶⁸. Input geometries were optimized by the Gaussian 09 D.01 program interfaced with this server, and a multi-orientation RESP fit with the RESP-A1A charge model was performed.

Preparation of protein structures. Two structures of DhaA from Rhodococcus rhodochrous (PDB-ID: 4E46 and 4HZG) and two structures of the mutant DhaA31 (PDB-ID: 3RK4 and 4FWB) were downloaded from the RCSB PDB database³⁷¹. All these crystal structures were prepared for analyses by removing ligands and water molecules. Missing heavy atoms in side chains and protons were added using the H++ server at pH 7.5³⁴⁰.

Molecular docking. AutoDock atom types and Gasteiger charges were added to the protein and ligands by MGLTools³⁹³. Precalculations of electrostatic potential energy, van der Waals, H-bond and desolvation free energy maps for docking calculations were performed by AutoGrid 4.0¹²⁷. The grid maps, with 80 × 80 × 80 grid points and a spacing of 0.25 Å, were centred on the OD1 atom of the nucleophilic aspartate. These parameters were chosen to cover the active site and the main tunnel. Substrates were docked into the enzyme using AutoDock 4.0¹²⁷. 250 runs of the Lamarckian genetic algorithm were performed with two different initial population sizes, 50 and 300, using the following parameters: a maximum of 3 × 10⁶ energy evaluations and 30,000 generations, elitism value 1, mutation rate 0.02 and crossover rate 0.8.
The local search was performed by the Solis & Wets algorithm with at most 300 iterations³⁹⁴.

MD simulations. Force field parameters for the docked conformations of ligands were prepared by the antechamber module of AmberTools 14 with RESP charges obtained from the R.E.D. server. Water molecules from the respective crystal structures were returned to the systems. Cl⁻ and Na⁺ ions were added to a final concentration of 0.1 M using the tLeap module of AMBER 14³⁴¹. Using the same module, an octahedral box of TIP3P water molecules³⁴² was added such that all atoms within the system were at least 10 Å from the octahedron's surface. Energy minimisation and MD simulations were performed using the PMEMD.CUDA module of AMBER14³⁴¹ with the ff14SB force field³⁴³ for proteins and the general amber force field[15] for ligands. Initially, the investigated systems were minimised by 500 steps of steepest descent followed by 500 steps of conjugate gradient over five rounds with decreasing harmonic restraints. The restraints were applied as follows: 500 kcal·mol⁻¹·Å⁻² on all heavy atoms of the protein, and then 500, 125, 25 and 0 kcal·mol⁻¹·Å⁻² on backbone atoms only. The subsequent MD simulations employed periodic boundary conditions, using the particle mesh Ewald method to treat electrostatic interactions¹⁵⁵,³⁴⁴, a 10 Å cut-off for non-bonded interactions, and a 2 fs time step with the SHAKE algorithm to fix all bonds containing hydrogens³⁴⁵. Equilibration simulations consisted of two steps: (i) 20 ps of gradual heating from 0 to 293 K at constant volume, using a Langevin thermostat with a collision frequency of 1.0 ps⁻¹ and harmonic restraints of 5.0 kcal·mol⁻¹·Å⁻² on the positions of all protein atoms, and (ii) 2,000 ps of unrestrained MD at 293 K using the Langevin thermostat at a constant pressure of 1.0 bar with a pressure coupling constant of 1.0 ps.
Finally, two separate 60 ns long production MD simulations were run for each system using the same settings as the second step of MD equilibration. Coordinates were saved at intervals of 2 ps and the resulting trajectories were analysed using the Cpptraj module of AMBER14 and visualised using PyMOL 1.5 (The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC) and VMD 1.9.1³⁴⁶.

QM/MM adiabatic mapping of the dehalogenation. An adiabatic mapping along the reaction coordinate was performed by the Sander module of AMBER14. The QM part of the system contained the side-chains of the halide-stabilizing residues, the catalytic aspartate and the ligand. The semiempirical PM6 Hamiltonian was used for the QM part³⁹⁵ and the ff14SB force field for the MM part of the system. The QM/MM boundary was treated through explicit link atoms and the cutoff for the QM/MM charge interactions was set to 999 Å. A constraint with a force constant of 1.0 kcal·mol⁻¹·Å⁻² was used for the backbone. The reaction coordinate was defined as the distance between the OD1 atom of the nucleophile and the C2 atom of the ligand. The system was driven along the reaction coordinate in 0.05 Å steps with a restraint force constant of 5,000 kcal·mol⁻¹·Å⁻², each step consisting of 1,000 minimization steps of the limited-memory Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm³⁹⁶.

Determination of near attack conformation (NAC). The distance between the nucleophilic oxygen of the catalytic aspartate and the C2 atom of the ligand has to be within 3.41 Å³⁹⁷. The angle between the nucleophilic oxygen of the catalytic aspartate, the C2 atom and the leaving bromine atom of the ligand has to be higher than 157°.

8.3.2 Supporting Information Figures and Tables

[Reaction scheme; R = alkoxycarbonyl or alkyl]

Figure S 28. Reaction mechanism of HLD with β-bromoalkanes. Enz-COO⁻: active site Asp¹⁷⁸,²⁹⁰.
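The two geometric criteria used in the methods for the near attack conformation (distance within 3.41 Å and attack angle higher than 157°) can be checked per trajectory frame as sketched below. This is a minimal illustration, not the analysis script used in this work: the function names are hypothetical and the coordinates are made-up test points given as (x, y, z) tuples in Å for the nucleophilic oxygen, the C2 atom and the bromine atom.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at the middle point b."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos_theta = dot / (math.hypot(*ba) * math.hypot(*bc))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding errors
    return math.degrees(math.acos(cos_theta))

def is_nac(o_nuc, c2, br, max_distance=3.41, min_angle=157.0):
    """True if both the O...C2 distance and the O...C2-Br angle criteria are met."""
    return (math.dist(o_nuc, c2) <= max_distance
            and angle_deg(o_nuc, c2, br) > min_angle)

def nac_percentage(frames):
    """Percentage of frames, given as (o_nuc, c2, br) triplets, that are NACs."""
    hits = sum(1 for o_nuc, c2, br in frames if is_nac(o_nuc, c2, br))
    return 100.0 * hits / len(frames)

# Made-up frames: two satisfy both criteria, one fails the angle, one the distance.
frames = [
    ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.95, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 1.95, 0.0)),
    ((0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (5.45, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.95, 0.0, 0.0)),
]
print(nac_percentage(frames))  # 50.0
```

Applied to the atom coordinates extracted from every saved MD frame, the same counting gives the percentage of NACs per simulation.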
[Plot: ln E versus 1/T [K⁻¹]; linear fits shown: y = 836[?]x - 23 (R² = 0.993), y = 11017x - 32 (R² = 0.944), y = 1848x - 3 (R² = 0.917)]

Figure S 29. Temperature dependence of enantiomeric ratios determined for dehalogenation of 2-BP by DhaA (red)³⁸⁸, DhaA31 (green) and DbjA (blue).

Table S 33. Steady-state kinetic parameters for the hydrolysis of (R)- and (S)-2-BP by DhaA, DhaA31 and DbjA at 20 °C. Values are given as value ± S.E. (lower limit; upper limit).

           (R)-2-BP                                                            (S)-2-BP
Enzyme     kcat (s⁻¹)         Km (mM)               Ksi (mM)         kcat (s⁻¹)       Km (mM)
DhaA       0.338 ± 0.002      0.0110 ± 0.0002       4.0 ± 0.2        0.044 ± 0.001    0.0159 ± 0.0008
           (0.336; 0.498)     (0.0063; 0.0149)      (0.8; 7.7)       (0.044; 0.044)   (0.0107; 0.0201)
DhaA31     0.0355 ± 0.0001    0.00011 ± 0.00003     5.10 ± 0.02      0.036 ± 0.001    0.017 ± 0.002
           (0.0354; 0.0408)   (>0.00001; 0.00084)   (2.09; 5.13)     (0.036; 0.036)   (0.014; 0.018)
DbjA       0.269 ± 0.001      0.0100 ± 0.0001       1.421 ± 0.001    0.55 ± 0.02      1.28 ± 0.06
           (0.267; 0.298)     (0.0036; 0.0173)      (1.100; 1.450)   (0.35; 1.04)     (0.66; 2.92)

Data were analysed using a competitive steady-state model with substrate inhibition for the (R)-enantiomer.

Table S 34. Pre-steady-state kinetic parameters of 2-BP conversion by DhaA31 and DbjA. Individual rate and equilibrium dissociation constants obtained by fitting of a competitive kinetic model (Figure S 30) globally to steady-state and pre-steady-state kinetic data obtained at 37 °C and pH 8.6.
                               Substrate binding   C-Br bond cleavage (SN2)   Hydrolysis of intermediate (AdN)   Product release
(R)-2-bromopentane   Enzyme    K1 (mM)             k2 (s⁻¹)                   k3 (s⁻¹)                           K4 (mM)
                     DhaA31    0.69±0.03           400±20                     0.33±0.001                         >10
                     DbjA      2.45±0.19           390±30                     0.92±0.01                          0.23±0.05
(S)-2-bromopentane   Enzyme    K5 (mM)             k6 (s⁻¹)                   k7 (s⁻¹)                           K8 (mM)
                     DhaA31    0.83±0.03           7.00±0.20                  0.436±0.001                        >10
                     DbjA      7.10±0.60           26.00±2.00                 0.75±0.01                          0.11±0.05

K1 and K5: equilibrium dissociation constants for the complex of enzyme with (R)- and (S)-2-BP, respectively; k2 and k6: rate constants for carbon-halogen bond cleavage in conversion of (R)- and (S)-2-BP, respectively; k3 and k7: rate constants for hydrolysis of the alkyl-enzyme intermediate in conversion of (R)- and (S)-2-BP, respectively; K4 and K8: equilibrium dissociation constants for the enzyme-product complex in conversion of (R)- and (S)-2-BP, respectively. Data were fitted globally to the competitive kinetic model (Figure S 30).

[Scheme: substrate binding, cleavage of C-Br bond (SN2), hydrolysis of intermediate (AdN), product release; E.S_R → E-I_R → E.P_R for the (R)-enantiomer and E.S_S → E-I_S → E.P_S for the (S)-enantiomer]

Figure S 30. The kinetic model of 2-BP conversion by DhaA31 and DbjA at 37 °C. E is free enzyme, E.S is the enzyme-substrate complex, E-I is the covalently bound alkyl-enzyme intermediate (the halide product is bound to the intermediate), E.P is the enzyme complex with both bromide and alcohol products. The subscript identifies the (R)- or (S)-enantiomer of substrate (S), intermediate (I) and alcohol product (P).

[Plots: panels with x-axes Time (min) and Time (s)]

Figure S 31. Kinetic analysis of DhaA31 reaction with 2-bromopentane. Total conversion of 1.17 and 1.29 mM (R)-2-bromopentane (A) and 1.13 and 1.29 mM (S)-2-bromopentane (B) by 0.9 µM DhaA31. Reaction burst of halide (●) and alcohol (▲) product monitored upon mixing 160 µM DhaA31 with 350 µM (R)-2-bromopentane (C) and 650 µM (S)-2-bromopentane (D).
Kinetic resolution of 750 µM rac-2-bromopentane by 2 µM DhaA31 (E), (R)-2-bromopentane (blue circles) and (S)-2-bromopentane (green circles). Stopped-flow fluorescence traces recorded upon rapid mixing of 4 µM DhaA31 with 0-230 µM (S)-2-bromopentane (F); each trace shows the average of ten individual experiments. All reactions were performed at 37 °C and pH 8.6. Solid lines represent the global fit to the kinetic data.

[Plot: Rate (mM·s⁻¹) versus Concentration (mM)]

Figure S 32. Kinetic analysis of DbjAwt reaction with 2-bromopentane. Steady-state kinetics of (R)-2-bromopentane (blue circles) and (S)-2-bromopentane (green circles) conversion by DbjAwt (A). Kinetic resolution of 980 µM rac-2-bromopentane by 1 µM DbjAwt (B). Reaction burst of halide (●) and alcohol (▲) product monitored upon mixing 120 µM DbjAwt with 350 µM (R)-2-bromopentane (C) and 160 µM DbjAwt with 460 µM (S)-2-bromopentane (D). Stopped-flow fluorescence traces recorded upon rapid mixing of 155 µM DbjAwt with 50, 120 and 350 µM (R)-2-bromopentane (E) or (S)-2-bromopentane (F); each trace shows the average of ten individual experiments. All reactions were performed at 37 °C and pH 8.6. Solid lines represent the global fit to the kinetic data.

Table S 35. Percentage of NACs for both substrates in molecular dynamics simulations.

Enzyme    (R)-2-BP (%)   (S)-2-BP (%)
DhaA      0.57±0.58      0.38±0.57
DhaA31    5.50±1.11      0.59±0.86
DbjA      0.31±0.11      0.17±0.02

NAC - near-attack configuration

Table S 36. Four categories of ligand positioning and NACs (NAC; the ground state configurations that can convert to the transition state) identified within every category by molecular dynamics.

Enzyme    Substrate   Unstabilized [%]   NAC    Right [%]   NAC    Left [%]   NAC    Other [%]   NAC
DhaA      (R)         61                 2      34          168    3          0      2           1
DhaA      (S)         82                 1      6           5      10         297    2           5
DhaA31    (R)         0                  4      89          1610   6          0      5           37
DhaA31    (S)         17                 3      52          1      10         169    21          4

NAC - near attack configuration

Table S 37. Specific activity and enantioselectivity (E values) of DhaA variants with 2-BP.
Enzyme          Specific activity[a]   E value[b]
DhaA            0.008[c]               18
DhaA31          0.007                  179
C176Y+Y273F     0.005                  25
I135F           0.019                  27
L246I           0.025                  37
V245F           0.025                  88
V245F+L246I     0.016                  160

[a] nmol·s⁻¹·mg⁻¹ of enzyme at 37 °C; [b] the E values were measured at 20 °C; [c] data measured at room temperature (Prokop et al. 2010)¹⁶⁶.

CONCLUSIONS

This dissertation deals with two important topics of structure-function relationships: (i) protein stabilization (exemplified with haloalkane dehalogenase, γ-hexachlorocyclohexane dehydrochlorinase LinA and fibroblast growth factor 2), and (ii) characterization of the rationally engineered haloalkane dehalogenase DhaA. Nowadays, protein stabilization methods focus mostly on the identification of single-point mutants, which are then characterized experimentally. The newly developed method FireProt is capable of combining individual mutations directly into the final multiple-point mutant, so only a few proteins have to be characterized. Its main advantages over other methods are therefore increased speed and lower cost and laboratory demands. However, FireProt is not easy to use, and considerable experience in bioinformatics and computational chemistry is needed to execute the whole protocol. Therefore, we are currently developing a fully automatic FireProt web server, which will make this technique accessible to a broad scientific community. The knowledge gained from protein stabilization was also applied to improve the thermodynamic stability of human fibroblast growth factor FGF2 by 19 °C. This molecule is essential for stem cell cultivation, but because of its short half-life, the cultivation medium has to be exchanged every day. Our stable variant stays active even after twenty days, which significantly simplifies the cultivation of stem cells. The stable FGF2 molecule will find use in research and development, cosmetics and wound healing. The second part of this thesis is focused on different properties of the engineered haloalkane dehalogenase DhaA.
The emphasis is put on the importance of its access tunnel, which has a significant impact on several different properties. A new method for the analysis of protein hydration at the entry to the access tunnel was proposed, based on fluorescence spectroscopy and the incorporation of an unnatural amino acid. Hydration has a large effect on protein behavior, but its effect on enzymatic catalysis has often been neglected up to now. We observed two different structural bases of enantioselectivity in dehalogenases, one of them being driven by hydration. In silico methods for the analysis of water molecules would greatly help in the description of enantioselectivity, enzyme kinetics and possibly other interesting enzyme properties.

REFERENCES

1. Eijsink, V. G. H. et al. Rational engineering of enzyme stability. J. Biotechnol. 113, 105-120 (2004).
2. Stepankova, V., Vanacek, P., Damborsky, J. & Chaloupkova, R. Comparison of catalysis by haloalkane dehalogenases in aqueous solutions of deep eutectic and organic solvents. Green Chem. 16, 2754-2761 (2014).
3. Lazaridis, T. & Karplus, M. Thermodynamics of protein folding: a microscopic view. Biophys. Chem. 100, 367-395 (2002).
4. Deller, M. C., Kong, L. & Rupp, B. Protein stability: a crystallographer's perspective. Acta Crystallogr. Sect. F Struct. Biol. Commun. 72, 72-95 (2016).
5. Anfinsen, C. B. Principles that govern the folding of protein chains. Science 181, 223-230 (1973).
6. Levinthal, C. Are There Pathways for Protein Folding? Extrait du Journal de Chimie Physique 1968, 44.
7. Rooman, M., Dehouck, Y., Kwasigroch, J. M., Biot, C. & Gilis, D. What is Paradoxical about Levinthal Paradox? J. Biomol. Struct. Dyn. 20, 327-329 (2002).
8. Mukaiyama, A. & Takano, K. Slow Unfolding of Monomeric Proteins from Hyperthermophiles with Reversible Unfolding. Int. J. Mol. Sci. 10, 1369-1385 (2009).
9. Pauling, L., Corey, R. B. & Branson, H. R. The structure of proteins: Two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. 37, 205-211 (1951).
10. Pauling, L. & Corey, R. B. Configurations of Polypeptide Chains With Favored Orientations Around Single Bonds. Proc. Natl. Acad. Sci. U. S. A. 37, 729-740 (1951).
11. Kendrew, J. C. Myoglobin and the Structure of Proteins. Science 139, 1259-1266 (1963).
12. Pace, C. N., Scholtz, J. M. & Grimsley, G. R. Forces stabilizing proteins. FEBS Lett. 588, 2177-2184 (2014).
13. Huyghues-Despointes, B. M., Pace, C. N., Englander, S. W. & Scholtz, J. M. Measuring the conformational stability of a protein by hydrogen exchange. Methods Mol. Biol. Clifton NJ 168, 69-92 (2001).
14. Lesser, G. J. & Rose, G. D. Hydrophobicity of amino acid subgroups in proteins. Proteins Struct. Funct. Bioinforma. 8, 6-13 (1990).
15. Lazaridis, T., Archontis, G. & Karplus, M. in Advances in Protein Chemistry (ed. C. B. Anfinsen, F. M. R., John T. Edsall and David S. Eisenberg) 47, 231-306 (Academic Press, 1995).
16. Pace, C. N. Evaluating contribution of hydrogen bonding and hydrophobic bonding to protein folding. Methods Enzymol. 259, 538-554 (1995).
17. Pace, C. N. et al. Contribution of hydrophobic interactions to protein stability. J. Mol. Biol. 408, 514-528 (2011).
18. Baase, W. A., Liu, L., Tronrud, D. E. & Matthews, B. W. Lessons from the lysozyme of phage T4. Protein Sci. Publ. Protein Soc. 19, 631-641 (2010).
19. Doig, A. J. & Sternberg, M. J. Side-chain conformational entropy in protein folding. Protein Sci. Publ. Protein Soc. 4, 2247-2251 (1995).
20. Stickle, D. F., Presta, L. G., Dill, K. A. & Rose, G. D. Hydrogen bonding in globular proteins. J. Mol. Biol. 226, 1143-1159 (1992).
21. Bowie, J. U. Membrane Protein Folding: How important are hydrogen bonds? Curr. Opin. Struct. Biol. 21, 42-49 (2011).
22. Joh, N. H. et al. Modest stabilization by most hydrogen-bonded side-chain interactions in membrane proteins. Nature 453, 1266-1270 (2008).
23. Takano, K., Scholtz, J. M., Sacchettini, J. C. & Pace, C. N. The Contribution of Polar Group Burial to Protein Stability Is Strongly Context-dependent. J. Biol. Chem. 278, 31790-31795 (2003).
24. Mills, J. E. J. & Dean, P. M. Three-dimensional hydrogen-bond geometry and probability information from a crystal survey. J. Comput. Aided Mol. Des. 10, 607-622.
25. Gao, J., Bosco, D. A., Powers, E. T. & Kelly, J. W. Localized thermodynamic coupling between hydrogen bonding and microenvironment polarity substantially stabilizes proteins. Nat. Struct. Mol. Biol. 16, 684-690 (2009).
26. Cao, Z. & Bowie, J. U. An energetic scale for equilibrium H/D fractionation factors illuminates hydrogen bond free energies in proteins. Protein Sci. Publ. Protein Soc. 23, 566-575 (2014).
27. Robinson, C. R. & Sauer, R. T. Striking stabilization of Arc repressor by an engineered disulfide bond. Biochemistry (Mosc.) 39, 12494-12502 (2000).
28. Wijma, H. J. et al. Computationally designed libraries for rapid enzyme stabilization. Protein Eng. Des. Sel. PEDS 27, 49-58 (2014).
29. Dombkowski, A. A., Sultana, K. Z. & Craig, D. B. Protein disulfide engineering. FEBS Lett. 588, 206-212 (2014).
30. Grimsley, G. R. et al. Increasing protein stability by altering long-range coulombic interactions. Protein Sci. Publ. Protein Soc. 8, 1843-1849 (1999).
31. Pace, C. N., Alston, R. W. & Shaw, K. L. Charge-charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci. Publ. Protein Soc. 9, 1395-1398 (2000).
32. Brady, G. P. & Sharp, K. A. Entropy in protein folding and in protein-protein interactions. Curr. Opin. Struct. Biol. 7, 215-221 (1997).
33. Tzeng, S.-R. & Kalodimos, C. G. Protein activity regulation by conformational entropy. Nature 488, 236-240 (2012).
34. Kasinath, V., Sharp, K. A. & Wand, A. J. Microscopic Insights into the NMR Relaxation-Based Protein Conformational Entropy Meter. J. Am. Chem. Soc. 135, 15092-15100 (2013).
35. Wand, A. J. The dark energy of proteins comes to light: Conformational entropy and its role in protein function revealed by NMR relaxation. Curr. Opin. Struct. Biol. 23, 75-81 (2013).
36. Baxa, M. C., Haddadian, E. J., Jumper, J. M., Freed, K. F. & Sosnick, T. R. Loss of conformational entropy in protein folding calculated using realistic ensembles and its implications for NMR-based calculations. Proc. Natl. Acad. Sci. U. S. A. 111, 15396-15401 (2014).
37. Thompson, J. B., Hansma, H. G., Hansma, P. K. & Plaxco, K. W. The Backbone Conformational Entropy of Protein Folding: Experimental Measures from Atomic Force Microscopy. J. Mol. Biol. 322, 645-652 (2002).
38. Hu, X. & Kuhlman, B. Protein design simulations suggest that side-chain conformational entropy is not a strong determinant of amino acid environmental preferences. Proteins Struct. Funct. Bioinforma. 62, 739-748 (2006).
39. D'Aquino, J. A. et al. The magnitude of the backbone conformational entropy change in protein folding. Proteins 25, 143-156 (1996).
40. Pace, C. N. et al. Conformational stability and thermodynamics of folding of ribonucleases Sa, Sa2 and Sa3. J. Mol. Biol. 279, 271-286 (1998).
41. Martinez, A., Calvo, A. C., Teigen, K. & Pey, A. L. Rescuing proteins of low kinetic stability by chaperones and natural ligands: phenylketonuria, a case study. Prog. Mol. Biol. Transl. Sci. 83, 89-134 (2008).
42. Lynch, S. M., Boswell, S. A. & Colon, W. Kinetic stability of Cu/Zn superoxide dismutase is dependent on its metal ligands: implications for ALS. Biochemistry (Mosc.) 43, 16525-16531 (2004).
43. Hammarström, P., Wiseman, R. L., Powers, E. T. & Kelly, J. W. Prevention of transthyretin amyloid disease by changing protein misfolding energetics. Science 299, 713-716 (2003).
44. Costas, M. et al. Between-Species Variation in the Kinetic Stability of TIM Proteins Linked to Solvation-Barrier Free Energies. J. Mol. Biol. 385, 924-937 (2009).
45. Sanchez-Ruiz, J. M. Protein kinetic stability. Biophys. Chem. 148, 1-15 (2010).
46.
Sohl, J. L., Jaswal, S. S. & Agard, D. A. Unfolded conformations of a-lytic protease are more stable than its native state. Nature 395, 817-819 (1998). 47. Tur-Arlandis, G., Rodriguez-Larrea, D., Ibarra-Molero, B. & Sanchez-Ruiz, J. M. Proteolytic scanning calorimetry: a novel methodology that probes the fundamental features of protein kinetic stability. Biophys. J. 98, L12-14 (2010). 48. Baker, D. & Agard, D. A. Kinetics versus Thermodynamics in Protein Folding. Biochemistry (Mose.) 33, 7505-7509 (1994). 49. Xia, K. etal. Identifying the subproteome of kinetically stable proteins via diagonal 2D SDS/PAGE. Proc. Natl. Acad. Sei. U. S. A. 104,17329-17334 (2007). 50. Park, C, Zhou, S., Gilmore, J. & Marqusee, S. Energetics-based protein profiling on a proteomic scale: identification of proteins resistant to proteolysis. J. Mol. Biol. 368, 1426-1437 (2007). 51. Khoury, G. A., Smadbeck, J., Kieslich, C A. & Floudas, C A. Protein folding and de novo protein design for biotechnological applications. Trends Biotechnol. 32, 99-109 (2014). 52. Schmid, A., Hollmann, F., Park, J. B. & Bühler, B. The use of enzymes in the chemical industry in Europe. Curr. Opin. Biotechnol. 13, 359-366 (2002). 53. Hartmann, M., Roeraade, J., Stoll, D., Templin, M. F. & Joos, T. 0. Protein microarrays for diagnostic assays. Anal. Bioanal. Chem. 393,1407-1416 (2009). 224 54. Magliery, T. J. Protein stability: computation, sequence statistics, and new experimental methods. Curr. Opin. Struct. Biol. 33,161-168 (2015). 55. Kleina, L G. & Miller, J. H. Genetic studies of the lac repressor. XIII. Extensive amino acid replacements generated by the use of natural and synthetic nonsense suppressors. J. Mol. Biol. 212, 295-318 (1990). 56. Bergquist, P. L, Reeves, R. A. & Gibbs, M. D. Degenerate oligonucleotide gene shuffling (DOGS) and random drift mutagenesis (RNDM): two complementary techniques for enzyme evolution. Biomol. Eng. 22, 63-72 (2005). 57. Rosic, N. N., Huang, W.Johnston, W. A., DeVoss, J. 
J. & Gillam, E. M.J. Extending the diversity of cytochrome P450 enzymes by DNA family shuffling. Gene 395, 40-48 (2007). 58. Sen, S., Dasu, V. V. & Mandal, B. Developments in Directed Evolution for Improving Enzyme Functions. Appl. Biochem. Biotechnol. 143, 212-223 (2007). 59. Bershtein, S. & Tawfik, D. S. Advances in laboratory evolution of enzymes. Curr. Opin. Chem. Biol. 12, 151-158 (2008). 60. Gray, K. A. et al. Rapid Evolution of Reversible Denaturation and Elevated Melting Temperature in a Microbial Haloalkane Dehalogenase. Adv. Synth. Catal. 343, 607-617 (2001). 61. Kretz, K. A. et al. Gene site saturation mutagenesis: a comprehensive mutagenesis approach. Methods Enzymol. 388, 3-11 (2004). 62. Sagar, D. M., Aoudjane, S., Gaudet, M., Aeppli, G. & Dalby, P. A. Optically induced thermal gradients for protein characterization in nanolitre-scale samples in microfluidic devices. Sei. Rep. 3, 2130 (2013). 63. Yang, X. et al. A novel microfluidic system for the rapid analysis of protein thermal stability. Analyst 139, 2683- 2686 (2014). 64. Reetz, M. T., Carballeira, J. D. & Vogel, A. Iterative Saturation Mutagenesis on the Basis of B Factors as a Strategy for Increasing Protein Thermostability. Angew. Chem. Int. Ed. 45, 7745-7751 (2006). 65. Zhang, J. etal. High-throughput screening of B factor saturation mutated Rhizomucor miehei lipase thermostability based on synthetic reaction. Enzyme Microb. Technol. 50, 325-330 (2012). 66. Xie, Y. et al. Enhanced Enzyme Kinetic Stability by Increasing Rigidity within the Active Site. J. Biol. Chem. 289, 7994-8006 (2014). 67. Kim, H. S., Le, Q. A. T. & Kim, Y. H. Development of thermostable lipase B from Candida antarctica (CalB) through in silico design employing B-factor and RosettaDesign. Enzyme Microb. Technol. 47,1-5 (2010). 68. Le, Q. A. T., Joo, J. C, Yoo, Y. J. & Kim, Y. H. Development of thermostable Candida antarctica lipase B through novel in silico design of disulfide bridge. Biotechnol. Bioeng. 
109, 867-876 (2012). 69. Pavelka, A., Chovancova, E. & Damborsky, J. HotSpot Wizard: a web server for identification of hot spots in protein engineering. Nucleic Acids Res. 37, W376-W383 (2009). 70. Bendl, J. et al. HotSpot Wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering. Nucleic Acids Res. 44, W479-487 (2016). 71. Bosshart, A., Panke, S. & Bechtold, M. Systematic optimization of interface interactions increases the thermostability of a multimeric enzyme. Angew. Chem. Int. Ed Engl. 52, 9673-9676 (2013). 72. Maugini, E., Tronelli, D., Bossa, F. & Pascarella, S. Structural adaptation of the subunit interface of oligomeric thermophilic and hyperthermophilic enzymes. Comput. Biol. Chem. 33, 137-148 (2009). 73. Steipe, B., Schiller, B., Plückthun, A. & Steinbacher, S. Sequence statistics reliably predict stabilizing mutations in a protein domain. 7. Mol. Biol. 240,188-192 (1994). 74. Wirtz, P. & Steipe, B. Intrabody construction and expression III: engineering hyperstable V(H) domains. Protein Sei. Puhl. Protein Soc. 8, 2245-2250 (1999). 75. Sullivan, B. J. et al. Stabilizing proteins from sequence statistics: the interplay of conservation and correlation in triosephosphate isomerase stability. J. Mol. Biol. 420, 384-399 (2012). 76. Ashkenazy, H. et al. FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucleic Acids Res. 40, W580-W584 (2012). 77. Swofford, D. L & Maddison, W. P. Reconstructing ancestral character states under Wagner parsimony. Math. Biosci. 87, 199-229 (1987). 78. Pagel, M. The Maximum Likelihood Approach to Reconstructing Ancestral Character States of Discrete Characters on Phylogenies. Syst. Biol. 48, 612-622 (1999). 225 79. Krishnan, N. M., Seligmann, H., Stewart, C.-B., De Koning, A. P. J. & Pollock, D. D. Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference. Mol. Biol. Evol. 21, 1871- 1883 (2004). 80. 
Watanabe, K., Ohkuri, T., Yokobori, S. & Yamagishi, A. Designing Thermostable Proteins: Ancestral Mutants of 3Isopropylmalate Dehydrogenase Designed by using a Phylogenese Tree. J. Mol. Biol. 355, 664-674 (2006). 81. Risso, V. A., Gavira, J. A., Gaucher, E. A. & Sanchez-Ruiz, J. M. Phenotypic comparisons of consensus variants versus laboratory resurrections of Precambrian proteins. Proteins Struct. Fund. Bioinforma. 82, 887-896 (2014). 82. Cheng, J., Randall, A. & Baldi, P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins 62,1125-1132 (2006). 83. Teng, S., Srivastava, A. K. & Wang, L Sequence feature-based prediction of protein stability changes upon amino acid substitutions. BMC Genomics 11, S5 (2010). 84. Yin, X. etal. Contribution of Disulfide Bridges to the Thermostability of a Type A Feruloyl Esterase from Aspergillus usamii. PLoSONE 10,(2015). 85. Borgo, B. & Havranek, J. J. Automated selection of stabilizing mutations in designed and natural proteins. Proc. Natl. Acad. Sei. U. S. A. 109, 1494-1499 (2012). 86. Lawrence, M. S., Phillips, K. J. & Liu, D. R. Supercharging Proteins Can Impart Unusual Resilience. J. Am. Chem. Soc. 129,10110 (2007). 87. Dunbrack, R. L. Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 12, 431-440 (2002). 88. Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830-838 (2011). 89. Yin, S., Ding, F. & Dokholyan, N. V. Eris: an automated estimator of protein stability. Nat. Methods 4, 466-467 (2007). 90. Pokala, N. & Handel, T. M. Energy Functions for Protein Design: Adjustment with Protein-Protein Complex Affinities, Models for the Unfolded State, and Negative Design of Solubility and Specificity. J. Mol. Biol. 347, 203- 227 (2005). 91. Benedix, A., Becker, C M., de Groot, B. L., Caflisch, A. & Böckmann, R. A. 
Predicting free energy changes using structural ensembles. Nat. Methods 6, 3-4 (2009). 92. Seeliger, D. & de Groot, B. L. Protein Thermostability Calculations Using Alchemical Free Energy Simulations. Biophys. J. 98, 2309-2316 (2010). 93. Worth, C L., Preissner, R. & Blundell, T. L. SDM—a server for predicting effects of mutations on protein stability and malfunction. Nucleic Acids Res. 39, W215-W222 (2011). 94. Guerois, R., Nielsen, J. E. & Serrano, L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol. 320, 369-387 (2002). 95. Dehouck, Y., Kwasigroch, J. M., Gilis, D. & Rooman, M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics 12,151 (2011). 96. Johnston, M. A., S0ndergaard, C R. & Nielsen, J. E. Integrated prediction of the effect of mutations on multiple protein characteristics. Proteins 79,165-178 (2011). 97. Capriotti, E., Fariselli, P., Calabrese, R. & Casadio, R. Predicting protein stability changes from sequences using support vector machines. Bioinformatics 21, Ü54-Ü58 (2005). 98. Capriotti, E., Fariselli, P. & Casadio, R. A neural-network-based method for predicting protein stability changes upon single point mutations. Bioinformatics 20, i63-i68 (2004). 99. Tian, J., Wu, N., Chu, X. & Fan, Y. Predicting changes in protein thermostability brought about by single- or multisite mutations. BMC Bioinformatics 11, 370 (2010). 100. Masso, M. & Vaisman, 1.1. AUTO-MUTE: web-based tools for predicting stability changes in proteins due to single amino acid replacements. Protein Eng. Des. Sei. 23, 683-687 (2010). 101. Laimer, J., Hofer, H., Fritz, M., Wegenkittl, S. & Lackner, P. MAESTRO - multi agent stability prediction upon point mutations. BMC Bioinformatics 16,116 (2015). 102. Potapov, V., Cohen, M. & Schreiber, G. 
Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng. Des. Sei. 22, 553-560 (2009). 226 103. Kumar, M. D. S. et al. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res. 34, D204-206 (2006). 104. Khan, S. & Vihinen, M. Performance of protein stability predictors. Hum. Mutat. 31, 675-684 (2010). 105. Thiltgen, G. & Goldstein, R. A. Assessing Predictors of Changes in Protein Stability upon Mutation Using SelfConsistency. PLOS ONE 7, e46084 (2012). 106. Goldenzweig, A. et al. Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression and Stability. Mol. Cell 63, 337-346 (2016). 107. Fleishman, S.J. etal. RosettaScripts: A Scripting Language Interface to the Rosetta Macromolecular Modeling Suite. PLOS ONE 6, e20161 (2011). 108. Jorgensen, W. L. The many roles of computation in drug discovery. Science 303,1813-1818 (2004). 109. Moitessier, N., Englebienne, P., Lee, D., Lawandi, J. & Corbeil, C R. Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go. Br. J. Pharmacol. 153, S7-S26 (2008). 110. Daniel, L., Buryska, T., Prokop, Z., Damborsky, J. & Brezovsky, J. Mechanism-based discovery of novel substrates of haloalkane dehalogenases using in silico screening. J. Chem. Inf. Model. 55, 54-62 (2015). 111. Meng, X.-Y., Zhang, H.-X., Mezei, M. & Cui, M. Molecular Docking: A powerful approach for structure-based drug discovery. Curr. Comput. Aided Drug Des. 7,146-157 (2011). 112. Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R. & Ferrin, T. E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 161, 269-288 (1982). 113. Perola, E., Walters, W. P. & Charifson, P. S. A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance. Proteins 56, 235-249 (2004). 114. Morris, G. M. et al. 
AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785-2791 (2009). 115. Sherman, W., Day, T., Jacobson, M. P., Friesner, R. A. & Farid, R. Novel procedure for modeling ligand/receptor induced fit effects. J. Med. Chem. 49, 534-553 (2006). 116. Zhao, Y. & Sanner, M. F. FLIPDock: docking flexible ligands into flexible receptors. Proteins 68, 726-737 (2007). 117. Trott, 0. & Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455-461 (2010). 118. Wong, C F. Flexible receptor docking for drug discovery. Expert Opin. Drug Discov. 10,1189-1200 (2015). 119. Burkhard, P., Taylor, P. & Walkinshaw, M. D. An example of a protein ligand found by database mining: description of the docking method and its verification by a 2.3 A X-ray structure of a thrombin-ligand complex. J. Mol. Biol. 277,449-466(1998). 120. Miller, M. D., Kearsley, S. K., Underwood, D. J. & Sheridan, R. P. FLOG: a system to select 'quasi-flexible' ligands complementary to a receptor of known three-dimensional structure. J. Comput. Aided Mol. Des. 8,153-174(1994). 121. Ewing, T. J., Makino, S., Skillman, A. G. & Kuntz, I. D. DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided Mol. Des. 15, 411-428 (2001). 122. Zavodszky, M. I. & Kuhn, L. A. Side-chain flexibility in protein-ligand binding: the minimal rotation hypothesis. Protein Sci. Publ. Protein Soc. 14,1104-1114 (2005). 123. Welch, W., Ruppert, J. & Jain, A. N. Hammerhead: fast, fully automated docking of flexible ligands to protein binding sites. Chem. Biol. 3, 449-462 (1996). 124. Goodsell, D. S. & Olson, A. J. Automated docking of substrates to proteins by simulated annealing. Proteins Struct. Fund. Bioinforma. 8,195-202 (1990). 125. McMartin, C & Bohacek, R. S. 
QXP: powerful, rapid computer algorithms for structure-based drug design. J. Comput. Aided Mol. Des. 11, 333-344 (1997). 126. Abagyan, R., Totrov, M. & Kuznetsov, D. ICM—A new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation. J. Comput. Chem. 15, 488-506 (1994). 127. Morris, G. M. et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19,1639-1662 (1998). 128. Taylor, J. S. & Burnett, R. M. DARWIN: a program for docking flexible molecules. Proteins 41,173-191 (2000). 129. Verdonk, M. L., Cole, J. C, Hartshorn, M. J., Murray, C W. & Taylor, R. D. Improved protein-ligand docking using GOLD. Proteins 52, 609-623 (2003). 227 130. Kitchen, D. B., Decornez, H., Furr, J. R. & Bajorath, J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discov. 3, 935-949 (2004). 131. Liu, J. & Wang, R. Classification of Current Scoring Functions. J. Chem. Inf. Model. 55, 475-482 (2015). 132. Huang, S.-Y. & Zou, X. Inclusion of solvation and entropy in the knowledge-based scoring function for proteinligand interactions. 7. Chem. Inf. Model. 50, 262-273 (2010). 133. Pikkemaat, M. G., Linssen, A. B. M., Berendsen, H. J. C & Janssen, D. B. Molecular dynamics simulations as a tool for improving protein stability. Protein Eng. 15,185-192 (2002). 134. Doss, C G. P. etal. Screening of mutations affecting protein stability and dynamics of FGFR1—A simulation analysis. Appl. Transl. Genomics 1, 37-43 (2012). 135. Chen, Z., Fu, Y., Xu, W. & Li, M. Molecular Dynamics Simulation of Barnase: Contribution of Noncovalent Intramolecular Interaction to Thermostability. Math. Probl. Eng. 2013, e504183 (2013). 136. Bernardi, R. C, Cann, I. & Schulten, K. Molecular dynamics study of enhanced Man5B enzymatic activity. Biotechnol. Biofuels 1, 83 (2014). 137. Osuna, S., Jimenez-Oses, G., Noey, E. L. & Houk, K. N. 
Molecular Dynamics Explorations of Active Site Structure in Designed and Evolved Enzymes. Acc. Chem. Res. 48,1080-1089 (2015). 138. Vettoretti, G. et al. Molecular Dynamics Simulations Reveal the Mechanisms of Allosteric Activation of Hsp90 by Designed Ligands. Sci. Rep. 6, 23830 (2016). 139. Lindorff-Larsen, K., Piana, S., Dror, R. 0. & Shaw, D. E. How fast-folding proteins fold. Science 334, 517-520 (2011). 140. Piana, S., Klepeis, J. L. & Shaw, D. E. Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Curr. Opin. Struct. Biol. 24, 98-105 (2014). 141. Miao, Y., Feixas, F., Eun, C & McCammon, J. A. Accelerated molecular dynamics simulations of protein folding. J. Comput. Chem. 36,1536-1549 (2015). 142. Seshasayee, A. S. N. High-Temperature unfolding of a trp-Cage mini-protein: a molecular dynamics simulation study. Theor. Biol. Med. Model. 2, 7 (2005). 143. Lindorff-Larsen, K., Trbovic, N., Maragakis, P., Piana, S. & Shaw, D. E. Structure and Dynamics of an Unfolded Protein Examined by Molecular Dynamics Simulation. J. Am. Chem. Soc. 134, 3787-3791 (2012). 144. Vogel, M. Temperature-Dependent Mechanisms for the Dynamics of Protein-Hydration Waters: A Molecular Dynamics Simulation Study. J. Phys. Chem. B 113, 9386-9392 (2009). 145. Fogarty, A. C & Laage, D. Water Dynamics in Protein Hydration Shells: The Molecular Origins of the Dynamical Perturbation. J. Phys. Chem. B 118, 7715-7729 (2014). 146. Chen, Q., Luan, Z.-J., Cheng, X. & Xu, J.-H. Molecular Dynamics Investigation of the Substrate Binding Mechanism in Carboxylesterase. Biochemistry (Mosc.j 54,1841-1848 (2015). 147. Manjunath, K., Jeyakanthan, J. & Sekar, K. Catalytic pathway, substrate binding and stability in SAICAR synthetase: A structure and molecular dynamics study. J. Struct. Biol. 191, 22-31 (2015). 148. Lawson, J. D., Pate, E., Rayment, I. & Yount, R. G. 
Molecular Dynamics Analysis of Structural Factors Influencing Back Door Pi Release in Myosin. Biophys. J. 86, 3794-3803 (2004). 149. Choutko, A. & van Gunsteren, W. F. Molecular dynamics simulation of the last step of a catalytic cycle: Product release from the active site of the enzyme chorismate mutase from Mycobacterium tuberculosis. Protein Sci. Publ. Protein Soc. 21, 1672-1681 (2012). 150. Haaffner, F., Norin, T. & Hult, K. Molecular Modeling of the Enantioselectivity in Lipase-Catalyzed Transesterification Reactions. Biophys. J. 74,1251-1262 (1998). 151. Wijma, H. J., Marrink, S. J. & Janssen, D. B. Computationally efficient and accurate enantioselectivity modeling by clusters of molecular dynamics simulations. J. Chem. Inf. Model. 54, 2079-2092 (2014). 152. Leach, A. R. Molecular Modelling: Principles and Applications. (Pearson Education, 2001). 153. Cramer, C J. Essentials of Computational Chemistry: Theories and Models. (Wiley, 2002). 154. Piana, S. et al. Evaluating the Effects of Cutoffs and Treatment of Long-range Electrostatics in Protein Folding Simulations. PLoS ONE 7, (2012). 155. Darden, T., York, D. & Pedersen, L. Particle mesh Ewald: An N-log(N) method for Ewald sums in large systems. J. Chem. Phys. 98,10089-10092 (1993). 228 156. Wu, X. & Brooks, B. Ft. Isotropic periodic sum: A method for the calculation of long-range interactions. J. Chem. Phys. 122, 044107 (2005). 157. Lagüe, P., Pastor, R. W. & Brooks, B. R. Pressure-Based Long-Range Correction for Lennard-Jones Interactions in Molecular Dynamics Simulations: Application to Alkanes and Interfaces. J. Phys. Chem. B 108, 363-368 (2004). 158. Hockney, R. W. Potential Calculation and Some Applications. Methods Comput Phys 9 135-2111970 (1970). 159. Verlet, L. Computer 'Experiments' on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules. Phys. Rev. 159, 98-103 (1967). 160. Beeman, D. Some multistep methods for use in molecular dynamics calculations. J. Comput. Phys. 
20, 130-139 (1976). 161. Senn, H. M. & Thiel, W. QM/MM Methods for Biomolecular Systems. Angew. Chem. Int. Ed. 48,1198-1229 (2009). 162. van Duin, A. C T., Dasgupta, S., Lorant, F. & Goddard, W. A. ReaxFF: A Reactive Force Field for Hydrocarbons. J. Phys. Chem. A 105, 9396-9409 (2001). 163. Warshel, A. & Levitt, M. Theoretical studies of enzymic reactions: Dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J. Mol. Biol. 103, 227-249 (1976). 164. Groenhof, G. Introduction to QM/MM simulations. Methods Mol. Biol. Clifton NJ 924, 43-66 (2013). 165. Damborský, J. etal. Structure-specificity relationships for haloalkane dehalogenases. Environ. Toxicol. Chem. SETAC 20, 2681-2689 (2001). 166. Prokop, Z. et al. Enantioselectivity of haloalkane dehalogenases and its modulation by surface loop engineering. Angew. Chem. Int. Ed Engl. 49, 6111-6115 (2010). 167. Westerbeek, A. et al. Kinetic Resolution of ot-Bromoamides: Experimental and Theoretical Investigation of Highly Enantioselective Reactions Catalyzed by Haloalkane Dehalogenases. Adv. Synth. Catal. 353, 931-944 (2011). 168. Dvorak, P., Bidmanova, S., Damborsky, J. & Prokop, Z. Immobilized synthetic pathway for biodegradation of toxic recalcitrant pollutant 1,2,3-trichloropropane. Environ. Sei. Technol. 48, 6859-6866 (2014). 169. Prokop, Z., Oplustil, F., DeFrank, J. & Damborský, J. Enzymes fight chemical weapons. Biotechnol. J. 1, 1370-80 (2006). 170. Bidmanova, S., Chaloupková, R., Damborsky, J. & Prokop, Z. Development of an enzymatic fiber-optic biosensor for detection of halogenated hydrocarbons. Anal. Bioanal. Chem. 398,1891-1898 (2010). 171. Mazzucchelli, S. et al. Orientation-controlled conjugation of haloalkane dehalogenase fused homing peptides to multifunctional nanoparticles for the specific recognition of cancer cells. Angew. Chem. Int. Ed Engl. 52, 3121-3125 (2013). 172. Kulakova, A. N., Larkin, M. J. & Kulakov, L. A. 
The plasmid-located haloalkane dehalogenase gene from Rhodococcus rhodochrous NCIMB 13064. Microbiol. Read. Engl. 143 ( Pt 1), 109-115 (1997). 173. Ollis, D. L. et al. The alpha/beta hydrolase fold. Protein Eng. 5, 197-211 (1992). 174. Chovancová, E., Kosinski, J., Bujnicki, J. M. & Damborský, J. Phylogenetic analysis of haloalkane dehalogenases. Proteins Struct. Fund. Bioinforma. 67, 305-316 (2007). 175. Verschueren, K. H., Seljée, F., Rozeboom, H. J., Kalk, K. H. & Dijkstra, B. W. Crystallographic analysis of the catalytic mechanism of haloalkane dehalogenase. Nature 363, 693-698 (1993). 176. Boháč, M. et al. Halide-Stabilizing Residues of Haloalkane Dehalogenases Studied by Quantum Mechanic Calculations and Site-Directed Mutagenesis. Biochemistry (Mose.) 41, 14272-14280 (2002). 177. Prokop, Z. etal. Catalytic Mechanism of the Haloalkane Dehalogenase LinBfromSphingomonas paucimobilis UT26. J. Biol. Chem. 278, 45094-45100 (2003). 178. Janssen, D. B. Evolving haloalkane dehalogenases. Curr. Opin. Chem. Biol. 8, 150-159 (2004). 179. Imai, R. et al. Dehydrochlorination of y-Hexachlorocyclohexane (y-BHC) by y-BHC-Assimilating Pseudomonas paucimobilis. Agric. Biol. Chem. 53, 2015-2017 (1989). 180. Nagata, Y. et al. Purification and characterization of y-hexachlorocyclohexane (y-HCH) dehydro-chlorinase (LinA) from Pseudomonas paucimobilis. Biosci. Biotechnol. Biochem. 57, 1582-1583 (1993). 181. Nagata, Y., Mori, K., Takagi, M., Murzin, A. G. & Damborský, J. Identification of protein fold and catalytic residues of gamma-hexachlorocyclohexane dehydrochlorinase LinA. Proteins 45, 471-477 (2001). 182. Álvarez-Pedrerol, M. et al. Thyroid disruption at birth due to prenatal exposure to ß-hexachlorocyclohexane. Environ. Int. 34, 737-740 (2008). 229 183. Nagata, Y., Endo, R., Ito, M., Ohtsubo, Y. & Tsuda, M. Aerobic degradation of lindane (gammahexachlorocyclohexane) in bacteria and its biochemical and molecular basis. Appl. Microbiol. Biotechnol. 76, 741- 752 (2007). 184. 
Okai, M. etal. Crystal Structure of y-Hexachlorocyclohexane Dehydrochlorinase LinAfrom Sphingobium japonicum UT26. J. Mol. Biol. 403, 260-269 (2010). 185. Chen, J. M., Xu, S. L, Wawrzak, Z., Basarab, G. S. & Jordan, D. B. Structure-Based Design of Potent Inhibitors of Scytalone Dehydratase: Displacement of a Water Molecule from the Active Site. Biochemistry (Mosc.) 37,17735- 17744(1998). 186. Macwan, A. S. etal. Crystal Structure of the Hexachlorocyclohexane Dehydrochlorinase (LinA-Type2): Mutational Analysis, Thermostability and Enantioselectivity. PLoS ONE 7, (2012). 187. Trantirek, L etal. Reaction mechanism and stereochemistry of gamma-hexachlorocyclohexane dehydrochlorinase LinA. J. Biol. Chem. 276, 7734-7740 (2001). 188. Ornitz, D. M. & Itoh, N. Fibroblast growth factors. Genome Biol. 2, reviews3005.1-reviews3005.12 (2001). 189. Itoh, N. & Ornitz, D. M. Evolution of the Fgf and Fgfr gene families. Trends Genet. 20, 563-569 (2004). 190. Beenken, A. & Mohammadi, M. The FGF family: biology, pathophysiology and therapy. Nat. Rev. Drug Discov. 8, 235-253 (2009). 191. Kim, H. S. Assignmentl of the human basic fibroblast growth factor gene FGF2 to chromosome 4 band q26 by radiation hybrid mapping. Cytogenet. Cell Genet. 83, 73 (1998). 192. House, S. L et al. Cardiac-Specific Overexpression of Fibroblast Growth Factor-2 Protects Against Myocardial Dysfunction and Infarction in a Murine Model of Low-Flow Ischemia. Circulation 108, 3140-3148 (2003). 193. Aviles, R. J., Annex, B. H. & Lederman, R. J. Testing clinical therapeutic angiogenesis using basic fibroblast growth factor (FGF-2). Br. J. Pharmacol. 140, 637-646 (2003). 194. Kitamura, M. et al. Periodontal Tissue Regeneration Using Fibroblast Growth Factor -2: Randomized Controlled Phase II Clinical Trial. PLOS ONE 3, e2611 (2008). 195. Turner, C A., Gula, E. L, Taylor, L. P., Watson, S. J. & Akil, H. Antidepressant-like effects of intracerebroventricular FGF2 in rats. Brain Res. 1224, 63-68 (2008). 196. Lotz, S. 
et al. Sustained Levels of FGF2 Maintain Undifferentiated Stem Cell Cultures with Biweekly Feeding. PLoS ONE 8, (2013). 197. Florkiewicz, R. Z. & Sommer, A. Human basic fibroblast growth factor gene encodes four polypeptides: three initiate translation from non-AUG codons. Proc. Natl. Acad. Sci. 86, 3978-3981 (1989). 198. Zhang, J. D., Cousens, L. S., Barr, P. J. & Sprang, S. R. Three-dimensional structure of human basic fibroblast growth factor, a structural homolog of interleukin 1 beta. Proc. Natl. Acad. Sci. U. S. A. 88, 3446-3450 (1991). 199. Baird, A., Schubert, D., Ling, N. & Guillemin, R. Receptor- and heparin-binding domains of basic fibroblast growth factor. Proc. Natl. Acad. Sci. U. S. A. 85, 2324-2328 (1988). 200. Bikfalvi, A., Klein, S., Pintucci, G. & Rifkin, D. B. Biological roles of fibroblast growth factor-2. Endocr. Rev. 18, 26- 45 (1997). 201. Polizzi, K. M., Bommarius, A. S., Broering, J. M. & Chaparro-Riggers, J. F. Stability of biocatalysts. Curr. Opin. Chem. Biol. 11, 220-225 (2007). 202. Ferdjani, S. et al. Correlation between thermostability and stability of glycosidases in ionic liquid. Biotechnol. Lett. 33,1215-1219 (2011). 203. Gao, D. et al. Thermostable variants of cocaine esterase for long-time protection against cocaine toxicity. Mol. Pharmacol. 75, 318-323 (2009). 204. Wijma, H. J., Floor, R. J. & Janssen, D. B. Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr. Opin. Struct. Biol. 23, 588-594 (2013). 205. Bommarius, A. S. & Paye, M. F. Stabilizing biocatalysts. Chem. Soc. Rev. 42, 6534-6565 (2013). 206. Bloom, J. D., Labthavikul, S. T., Otey, C R. & Arnold, F. H. Protein stability promotes evolvability. Proc. Natl. Acad. Sci. U. S. A. 103, 5869-5874 (2006). 207. Gumulya, Y. & Reetz, M. T. Enhancing the thermal robustness of an enzyme by directed evolution: least favorable starting points and inferior mutants can map superior evolutionary pathways. Chembiochem Eur. J. Chem. Biol. 
CURRICULUM VITAE

Personal information:
Name: David Bednář
Birth: 12.8.1986, Brno, Czech Republic
Nationality: Czech
Address: Svážná 2, Brno, 634 00
Tel.: +420 605 143 394
E-mail: 222755@mail.muni.cz

Education:
2009 - 2011 - MSc. in Molecular Biology and Genetics, Faculty of Science, Masaryk University
2007 - 2010 - Bc. in Biochemistry, Faculty of Science, Masaryk University
2006 - 2009 - Bc. in Molecular Biology and Genetics, Faculty of Science, Masaryk University

Awards:
2015 - Award of the Dean of Faculty of Science, Masaryk University
2014 - II. Prize for the Best Speaker, Meeting of Biochemists and Molecular Biologists, 2014, Brno
2011 - 2014 - Brno Ph.D. Talent Scholarship funded by the Brno City Municipality

Research Stage:
2-6/2014 - Center for Integrative Proteomics Research, Rutgers University, New Jersey, USA

Pedagogical Activities:
2011 - 2015 - Lector in Bioinformatics, practice - Faculty of Science, Masaryk University
2013 - 2015 - Assistant in Structural biology - practice, Faculty of Science, Masaryk University
2010, 2012, 2014 - Assistant in practice, Summer School of Protein Engineering, Masaryk University
2016 - Lector in Summer School of Protein Engineering, Masaryk University

LIST OF PUBLICATIONS

Bednář D.*, Beerens K.*, Šebestová E., Bendl J., Khare S., Chaloupková R., Prokop Z., Brezovsky J., Baker D., Damborsky J., 2015: FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants.
PLoS Computational Biology 11: e1004556.

Dvorak P.*, Bednář D.*, Vanacek P.*, Bálek L.*, Eiselleova L.*, Štěpánková V., Šebestová E., Kunová Bosákova M., Konecna Z., Mazurenko S., Kuňka A., Horák D., Chaloupková R., Brezovsky J., Krejci P., Prokop Z., Dvorak P., Damborsky J., 2017: Computer-Assisted Engineering of Hyperstable Fibroblast Growth Factor 2. Scientific Reports (under review).

Liškova V., Bednář D., Holubeva T., Prudnikova T., Řezačova P., Koudelakova T., Šebestová E., Kuta Smatanova I., Brezovsky J., Chaloupková R., Damborsky J., 2015: Balancing the Stability-Activity Trade-off by Fine-Tuning Dehalogenase Access Tunnels. ChemCatChem 7: 648-659.

Amaro M., Brezovsky J., Kováčova S., Sýkora J., Bednář D., Němec V., Liškova V., Kurumbang N., Beerens K., Chaloupková R., Paruch K., Hof M., Damborsky J., 2015: Site-specific Analysis of Protein Hydration Based on Unnatural Amino Acid Fluorescence. Journal of the American Chemical Society 137: 4988-4992.

Liškova V., Štěpánková V., Bednář D., Prokop Z., Brezovsky J., Chaloupková R., Damborsky J., 2017: Striking Difference in Structural Bases of Enantioselectivity of Haloalkane Dehalogenases with Linear β-Haloalkanes: Wide-open versus Occluded Active Site. Angewandte Chemie (under review).

Kozlíkova B., Šebestová E., Sustr V., Brezovsky J., Strnad O., Daniel L., Bednář D., Pavelka A., Manak M., Bezděka M., Benes P., Kotry M., Gora A., Damborsky J., Sochor J., 2014: CAVER Analyst 1.0: Graphic Tool for Interactive Visualization and Analysis of Tunnels and Channels in Protein Structures. Bioinformatics 30: 2684-2685.

Musil M., Stourac J., Bendl J., Brezovský J., Martinek T., Zendulka J., Bednář D., Damborsky J., 2017: FireProt: Web Server for Automated Design of Thermostable Proteins. Nucleic Acids Research (under review).

Beerens K., Mazurenko S., Kunka A., Marques S. M., Hansen N., Musil M., Chaloupkova R., Waterman J., Brezovský J., Bednář D., Prokop Z., Damborsky J., 2017: Evolutionary Analysis is a Powerful Complement to Energy Calculations Allowing Entropy-Driven Stabilization. ACS Catalysis (under review).

* authors contributed equally