Advanced molecular biology tools
Molecular cloning and gene assembly in biotechnology, biomedicine and basic research
Martin Marek
Loschmidt Laboratories, Faculty of Science, Masaryk University
Kamenice 5, bld. C13, room 332
martin.marek@recetox.muni.cz
Molecular Biotechnology (2022)

What is molecular biology?
Molecular biology studies biological macromolecules and the molecular mechanisms found in living systems, such as the molecular nature of the gene and the mechanisms of gene replication, mutation and expression.

Central dogma of molecular biology
• The central dogma of molecular biology states that DNA contains the instructions for making a protein, and that these instructions are copied into RNA.
• The RNA copy is then used as the template for making the protein.
• In short: DNA → RNA → Protein.

How DNA directs protein synthesis

Redundancy of the genetic code
• Degeneracy of codons is the redundancy of the genetic code: several different three-nucleotide codons can specify the same amino acid
• The genetic code is degenerate mainly at the third codon position
• The genetic code consists of 64 triplet codons specifying 20 canonical amino acids and 3 stop signals

What is genetic engineering?
• Genetic engineering is the modification of the genetic information present in a living cell
• Adding, substituting or removing genetic information in a given biological system necessarily requires physically introducing new information into the target cell. DNA is the physical support of genetic information.
• Therefore, genetic engineering relies on the generation of artificial DNA molecules that contain the information of interest and can be transferred to the target cells
• In molecular biotechnology, genetic engineering typically means taking a gene from one species and putting it into another species

Applications of molecular biotechnologies
Why should we bother to think about molecular biotechnologies?
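The DNA → RNA → protein flow and the codon redundancy described above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names `transcribe` and `translate` are illustrative, the input is assumed to be the coding (sense) strand, and the table is the standard genetic code written in the conventional T, C, A, G codon order.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ... GGG
# (third base varies fastest). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA sequence for a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first codon until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Number of codons that encode each amino acid (or stop signal).
degeneracy = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    print(len(CODON_TABLE), "codons in total")
    print(degeneracy["*"], "stop codons")
    print(translate(transcribe("ATGGCTTGGTAA")))
```

Counting the table entries reproduces the numbers on the slide: 64 codons, 3 stop signals and 20 amino acids, with leucine, serine and arginine each covered by six codons and methionine and tryptophan by one.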
Molecular biology in rational (computational) protein design

The key role of protein production in pharma industry
Protein production is a significant bottleneck in early phase drug discovery

Recombinant protein production workflow
• Gene synthesis and molecular cloning
• Protein expression
• Protein purification
• Protein characterization
• (Protein structure determination)

DNA synthesis & molecular cloning
Concepts, methods, applications

Structure of DNA
A) A single nucleotide. The phosphate and deoxyribose sugar form the backbone of DNA. The nitrogenous base (in this case adenine) is the information-carrying unit of each nucleic acid.
B) The structure of single-stranded DNA. In nature, enzymes form phosphodiester bonds (blue circles) that link the 5′ and 3′ carbons of adjacent deoxyribose sugars. Due to the modular nature of nucleotides, this chain can grow indefinitely.

Chemical synthesis of DNA
• Making DNA chemically rather than biologically was one of the first new technologies to be applied by the biotechnology industry. The ability to make short synthetic stretches of DNA is crucial to using DNA replication in laboratory techniques. DNA polymerase cannot synthesize DNA without a free 3′-OH end to elongate. Therefore, to use DNA polymerase in vitro, the researcher must supply a short primer. Such primers are used to sequence DNA, to amplify DNA with PCR, to introduce DNA mutations, and even to find genes in library screening.
• Technically, an oligonucleotide is any piece of DNA approx. 20 nucleotides in length, but today the term denotes a short piece of DNA (approx. 125 nucleotides) that is chemically synthesized.
• Unlike in vivo DNA synthesis, artificial (chemical) synthesis proceeds in the 3′ to 5′ direction.

Overview of the phosphoramidite approach
This 4-step cycle repeats until the oligo receives its final nucleotide.
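Because the cycle repeats once for every added nucleotide, any loss per cycle compounds with oligo length. The sketch below illustrates this under stated assumptions that are not in the slides: the per-cycle coupling efficiency is treated as constant, an oligo of n nucleotides is assumed to need n - 1 couplings (the first nucleotide is taken to be already on the solid support), and the efficiency values are purely illustrative.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of chains that reach full length after chemical synthesis.

    Assumes a constant per-cycle coupling efficiency and (length_nt - 1)
    coupling cycles; both are simplifying assumptions for illustration.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    # Illustrative efficiencies only; real values depend on chemistry and instrument.
    for efficiency in (0.99, 0.995):
        for length in (20, 60, 125):
            fraction = full_length_fraction(length, efficiency)
            print(f"{length:>4} nt at {efficiency:.3f}: {fraction:.1%} full length")
```

Under these assumptions the full-length fraction falls off geometrically with length, which is one way to see why chemically synthesized oligonucleotides are kept short.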
Automated and miniaturized oligonucleotide synthesis
Start-Coupling: the first phosphoramidite in the chain is attached to the solid surface with a catalyzed condensation reaction (think of this as linking pinkies when trying to hold hands).
Oxidation: the phosphite triester is unstable, so it is converted to a phosphate to improve sequence integrity (think of this as grabbing hold of the hand tightly).
Deblocking: the 5′ protecting group is removed under acidic conditions.
Coupling: the next phosphoramidite in the chain is coupled to the available -OH on the previously deblocked molecule in a catalyzed reaction.
Capping: coupling is not 100% efficient, so some chains fail to couple. These uncoupled sequences could create errors in the synthesized molecule, so an unreactive group is added to block their further extension.
Repetition: the deblocking, coupling, capping and oxidation cycle is repeated to extend the oligonucleotide in the desired sequence.

The schematic view of the oligonucleotide synthesis
(a) The synthesis process involves designing the target sequence, delivering chemical reagents through the inkjet printer, and oligonucleotide synthesis in the microreactor chip.
(b) A single microreactor is filled with silica beads, which increase the surface area available for synthesis. The beads are fixed in the microreactor by a sintering process.
(c) The oligonucleotide synthesis on the silica beads follows the four-step phosphoramidite strategy: deprotection, coupling, capping and oxidation.

Enzymatic DNA synthesis: polymerase chain reaction (PCR)
Amplification of DNA fragments of up to 20 kbp from a pre-existing template (genomic loci, cDNA library, cloned fragment etc.)

Gel electrophoresis
• Gel electrophoresis is a method for separation and analysis of macromolecules (DNA) and their fragments, based on their size and charge.
• It is used in molecular biology to separate a mixed population of DNA fragments by length and to estimate the size of DNA fragments.

Different PCR protocols

Real-time PCR / quantitative PCR (qPCR)
• It is a technique used to monitor the progress of a PCR reaction in real time.
• At the same time, a relatively small amount of PCR product (DNA, cDNA or RNA) can be quantified.
• The reaction is placed into a real-time PCR machine that watches the reaction occur with a camera or detector.
• The principle is to link the amplification of DNA to the generation of fluorescence, which can be detected with a camera during each PCR cycle.
• Hence, as the number of gene copies increases during the reaction, so does the fluorescence, indicating the progress of the reaction.

Digital PCR (dPCR)
• Digital PCR is a highly precise approach to sensitive and reproducible nucleic acid detection and quantification.
• Measurements are performed by dividing the sample into partitions, such that each individual reaction contains either zero or one or more target molecules.
• Each partition is analyzed after end-point PCR cycling for the presence (positive reaction) or absence (negative reaction) of a fluorescent signal, and the absolute number of molecules present in the sample is calculated. It does not rely on a standard curve for sample target quantification.
• Eliminating the reliance on standard curves reduces error and improves precision.
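The absolute count in digital PCR is usually obtained from the fraction of positive partitions with a Poisson correction, because a positive partition may hold more than one target molecule. A minimal sketch follows, assuming targets are distributed randomly and independently across equally sized partitions; the function name, the partition count and the partition volume are hypothetical illustration values, not taken from the slides or from any instrument.

```python
import math


def dpcr_quantify(positive_partitions, total_partitions, partition_volume_ul):
    """Estimate target copies from digital PCR partition counts.

    Poisson model: P(partition is negative) = exp(-lam), so
    lam = -ln(1 - fraction_positive) is the mean copies per partition.
    Returns (lam, total copies in analyzed partitions, copies per microliter).
    """
    if total_partitions <= 0:
        raise ValueError("total_partitions must be positive")
    if not 0 <= positive_partitions < total_partitions:
        raise ValueError("need at least one negative partition to quantify")
    fraction_positive = positive_partitions / total_partitions
    lam = -math.log(1.0 - fraction_positive)
    copies_total = lam * total_partitions
    copies_per_ul = lam / partition_volume_ul
    return lam, copies_total, copies_per_ul


if __name__ == "__main__":
    # Hypothetical run: 20,000 partitions of 0.001 uL each, 5,000 positive.
    lam, copies, conc = dpcr_quantify(5000, 20000, 0.001)
    print(f"mean copies per partition: {lam:.3f}")
    print(f"copies in analyzed volume: {copies:.0f}")
    print(f"concentration: {conc:.0f} copies/uL")
```

The estimate needs no standard curve, only the counts of positive and negative partitions. It fails by design when every partition is positive, since the sample is then too concentrated for the model to resolve.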
Digital PCR workflow
1. Sample dilution and PCR reaction mix setup (gDNA, cDNA; primers/probes; master mix). Blue: target; red: background.
2. PCR reaction partitioning into thousands of individual reactions
3. End-point PCR amplification of partitions
4. Readout and absolute quantification

Comparison between real-time qPCR and digital PCR

Reverse transcription polymerase chain reaction (RT-PCR)
• Conversion of RNA into cDNA using reverse transcriptase
• Amplification of cDNA using PCR
• cDNA is DNA that is synthesized from messenger RNA molecules. cDNA synthesis is catalyzed by an enzyme called reverse transcriptase, which uses RNA as a template for DNA synthesis. Reverse transcriptase was initially discovered in and isolated from a retrovirus. These viruses contain an RNA genome; therefore they need to produce a cDNA copy of their genome to be compatible with the host cell's molecular machinery.

Plasmids: essential tools for genetic engineering

What is a plasmid?
• PLASMIDS are “extrachromosomal” (not part of the chromosomes), circular pieces of DNA.
• Like chromosomes, they are double-stranded, which means they can easily be “unzipped” and copied (replicated).
• Plasmids use the host’s machinery (DNA polymerase), but they don’t have to wait for the host to divide to copy themselves, so many copies accumulate per cell.
• When the cell does divide, these copies are split between the daughter cells, so they inherit the plasmid as well.
• The plasmid can act as a VECTOR: a vehicle for delivering the genes we want into cells.
Always sequence your plasmid to double-check that the gene is correctly inserted!

Antibiotics as selectable markers

Restriction endonucleases
• A restriction enzyme, also called a restriction endonuclease, is a protein produced by bacteria that cleaves DNA at specific sites along the molecule.
• Restriction endonucleases cut the DNA double helix in very precise ways.
They cleave DNA into fragments at or near specific recognition sites within the molecule, known as restriction sites.
• They recognize specific base sequences in DNA and then cut each strand at a given place. Hence, they are also called ‘molecular scissors’.

Restriction endonucleases (restriction enzymes)
Type I enzymes cleave at sites remote from a recognition site; require both ATP and S-adenosyl-L-methionine to function; multifunctional proteins with both restriction and methylase activities.
Type II enzymes cleave within or at short specific distances from a recognition site; most require magnesium; single-function (restriction) enzymes independent of methylase.
Type III enzymes cleave at sites a short distance from a recognition site; require ATP (but do not hydrolyze it); S-adenosyl-L-methionine stimulates the reaction but is not required; exist as part of a complex with a modification methylase.
Type IV enzymes target modified DNA, e.g. methylated, hydroxymethylated and glucosyl-hydroxymethylated DNA.

Restriction cloning
The production of exact copies of a particular gene or DNA sequence using genetic engineering techniques is called gene cloning.

In vitro PCR-based cloning
• The DNA fragment (gene) of interest can be flanked by any restriction site

Golden Gate DNA assembly
Simultaneous and directional assembly of multiple DNA fragments into a single piece using Type IIS restriction enzymes and T4 DNA ligase.

Sequence and ligation independent cloning (SLIC)
• SLIC, or sequence and ligation independent cloning, does not use restriction enzymes or ligase.
• The DNA fragment to be cloned is PCR amplified with oligos whose 5′ termini contain about 25 bp of sequence homology to the ends of the destination vector. The destination vector is linearized either by restriction digest or by PCR amplification.

Gibson DNA assembly
Why Gibson Assembly?
• No need for specific restriction sites.
• Join almost any 2 fragments regardless of sequence.
• No scar between joined fragments.
• Fewer steps. One-tube reaction.
• Can combine many DNA fragments at once.

Traditional restriction cloning versus Gibson assembly

Large-scale de novo DNA synthesis
• Protein engineering
• Engineered metabolic pathways
• Synthetic biology
• Whole-genome syntheses
• DNA nanotechnology (DNA computers)

DNA synthesis prices

DNA mutagenesis
Concepts, methods, applications

Site-directed mutagenesis
• Site-directed mutagenesis is used to generate mutations that may produce a rationally designed protein with improved or special properties (i.e. protein engineering).
• The basic procedure requires the synthesis of a short DNA primer. This synthetic primer contains the desired mutation and is complementary to the template DNA around the mutation site, so it can hybridize with the DNA in the gene of interest.
• The mutation may be a single base change (a point mutation), multiple base changes, a deletion, or an insertion.
• The single-stranded primer is then extended by a DNA polymerase, which copies the rest of the gene. The copied gene thus contains the mutated site; it is then introduced into a host cell in a vector and cloned.
• Finally, clones are verified by DNA sequencing to confirm that they contain the desired mutation.

Site-saturation mutagenesis (SSM)
• Site-saturation mutagenesis is used to substitute targeted residues with any other naturally occurring amino acid
• The core of an SSM experiment lies in codon degeneracy, or randomness. A completely randomized codon (NNN, where N = A, C, G or T) results in a library of 64 different sequences encoding all 20 amino acids and 3 stop codons
• When an experiment targets multiple codons, the library size can be considerably higher, making it difficult to perform a complete screening (e.g. targeting three NNN codons gives 262,144 unique codon combinations)

Error-prone PCR (epPCR)
• The error rate of Taq DNA polymerase is 0.001-0.002 % per nucleotide per replication cycle under standard conditions, which is sufficient to create mutant libraries of large genes but not of small genes
• Error-prone PCR (epPCR) takes advantage of the inherently low fidelity of Taq DNA polymerase, which may be further decreased by adding Mn2+, increasing the Mg2+ concentration, and using unequal dNTP concentrations.
• The rate of mutagenesis achieved by error-prone PCR is in the range of 0.6-2.0 %
https://lifescience.canvaxbiotech.com/product/pickmutant-error-prone-pcr-kit/

Mutator strains
• Mutator strains of E. coli are deficient in one or more DNA repair genes, leading to single base substitutions at a rate of approximately 1 mutation per 1000 base pairs
• Generation of mutant libraries
• The process is simple

Insertion and deletion (InDel) mutagenesis
• Gain or loss of one or more nucleotides (other than a multiple of three) shifts the triplet reading frame and produces frameshift mutations
• Triplet InDel mutagenesis may trigger protein backbone changes essential for evolvability
• Insertion and deletion mutations can enhance proteins through structural rearrangements not possible by substitution mutations alone
Arpino et al., Structure 22: 889-898 (2014)
Using directed evolution, green fluorescent protein (GFP) was observed to tolerate residue deletions, particularly within short and long loops, helical elements, and at the termini of strands. A variant with G4 removed from a helix (EGFPG4Δ) conferred significantly higher cellular fluorescence.
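The library sizes quoted for site-saturation mutagenesis above follow from simple counting, and the error-prone PCR rate can be turned into an expected mutation load per gene in the same way. The sketch below is illustrative only: the function names are hypothetical, and it assumes the 0.6-2.0 % epPCR figure is a per-nucleotide rate over the final product, which the slides do not state explicitly.

```python
from itertools import product


def nnn_codons():
    """Enumerate every codon covered by a fully randomized NNN position."""
    return ["".join(codon) for codon in product("ACGT", repeat=3)]


def nnn_library_size(n_codons):
    """Number of distinct DNA sequences when n_codons positions are NNN."""
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return len(nnn_codons()) ** n_codons


def expected_mutations(gene_length_bp, mutation_rate_percent):
    """Expected nucleotide changes per gene copy (assumed per-nucleotide rate)."""
    return gene_length_bp * mutation_rate_percent / 100.0


if __name__ == "__main__":
    for sites in (1, 2, 3):
        print(f"{sites} NNN codon(s): {nnn_library_size(sites):,} variants")
    # Hypothetical 1,000 bp gene at the two ends of the quoted epPCR range.
    for rate in (0.6, 2.0):
        print(f"{rate} %: about {expected_mutations(1000, rate):.0f} mutations")
```

The counts grow as 64 to the power of the number of randomized codons, which is why screening becomes impractical after only a few targeted positions.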
DNA shuffling
• DNA shuffling is a method for in vitro recombination of homologous genes
• The genes to be recombined are randomly fragmented by DNase I, and fragments of the desired size are purified from an agarose gel
• These fragments are then reassembled using cycles of denaturation, annealing, and extension by a polymerase
• Recombination occurs when fragments from different parental templates anneal at a region of high sequence identity
• Following this reassembly reaction, PCR amplification with primers is used to generate full-length chimeras suitable for cloning into an expression vector
• Extending DNA shuffling to whole genomes is known as GENOME SHUFFLING

The CRISPR/Cas9 system
• Cas9 (CRISPR-associated protein 9) is a protein that plays a vital role in the immunological defense of bacteria against DNA viruses and that is used in genetic engineering. Its main function is to cut DNA, and therefore it can alter a cell's genome
• Structurally, Cas9 is an RNA-guided DNA endonuclease associated with the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) adaptive immunity system in Streptococcus pyogenes
• Cas9 performs this by unwinding foreign DNA and checking for sites complementary to the 20-nucleotide spacer region of the guide RNA
• If the DNA substrate is complementary to the guide RNA, Cas9 cleaves the invading DNA
Crystal structure of S. pyogenes Cas9 in complex with sgRNA and its target DNA at 2.5 Å resolution, Nishimasu et al., Cell 156: 935–49 (2014)

The CRISPR/Cas9 system: key elements
• Cas9 nuclease specifically cleaves double-stranded DNA, activating the double-strand break repair machinery
• In the absence of a homologous repair template, non-homologous end joining can result in indels that disrupt the target sequence
• Alternatively, precise mutations and knock-ins can be made by providing a homologous repair template and exploiting the homology-directed repair pathway

Advantages of CRISPR/Cas9-mediated mutagenesis
• The CRISPR/Cas9 system requires only the redesign of the crRNA to change target specificity
• This contrasts with other genome editing tools, including zinc-finger nucleases and TALENs, where redesign of the protein-DNA interface is required
• Furthermore, CRISPR/Cas9 enables rapid genome-wide interrogation of gene function by generating large gRNA libraries for genomic screening
Zinc-finger nucleases, TALENs, Cas9

The mini-RNA-guided endonuclease CRISPR-Cas12j3
• CRISPR-Cas12j is a recently identified family of miniaturized RNA-guided endonucleases from phages.
• These ribonucleoproteins provide a compact scaffold gathering all key activities of a genome editing tool.
• A site-directed mutagenesis analysis supports the DNA cutting mechanism, providing new avenues to redesign CRISPR-Cas12j nucleases for genome editing.

Molecular carpentry
• DNA polymerases
• DNases
• Exonucleases
• Restriction endonucleases
• Kinases
• Phosphatases
• Ligases
• Etc.

Molecular biology in protein technologies

Molecular Biotechnology Practicals
When: Tuesday, 27 September 2022, from 10:00 and from 14:00
Where: D36-308

Questions

Supplementary materials

Gateway cloning
The GATEWAY Cloning Technology is based on the site-specific recombination system used by phage λ to integrate its DNA into the E. coli chromosome. Both organisms have specific recombination sites, called attP in phage λ and attB in E. coli.
The integration process (lysogeny) is catalyzed by 2 enzymes: the phage λ encoded protein Int (Integrase) and the E. coli protein IHF (Integration Host Factor). Upon integration, the recombination between the attB (25 nt) and attP (243 nt) sites generates attL (100 nt) and attR (168 nt) sites that flank the integrated phage λ DNA. The process is reversible, and the excision is again catalyzed by Int and IHF, in combination with the phage λ protein Xis. The attL and attR sites surrounding the inserted phage DNA recombine site-specifically during the excision event to re-form the attP site in phage λ and the attB site in the E. coli chromosome.

Gateway cloning
The GATEWAY reactions are in vitro versions of the integration and excision reactions. To make the reactions directional, two slightly different and specific variants, att1 and att2, were developed for each recombination site. These sites react very specifically with each other. For instance, in the BP Reaction attB1 only reacts with attP1, resulting in attL1 and attR1, and attB2 only reacts with attP2, giving attL2 and attR2. The reverse reaction (LR Reaction) shows the same specificity.

Gateway cloning
• Lambda bacteriophage site-specific integration system

The CRISPR/Cas9 system on YouTube
https://www.youtube.com/watch?v=bXnWIk8FgKc
https://www.youtube.com/watch?v=OjNrbPMXyMA
https://www.youtube.com/watch?v=0dRT7slyGhs
https://www.youtube.com/watch?v=2pp17E4E-O8

Dr. Martin Marek
Loschmidt Laboratories, Faculty of Science, MUNI
Kamenice 5, bld. A13, room 332
martin.marek@recetox.muni.cz