Bacterial and viral genomics
David Smajs
Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic

Bacterial and viral genomics
• What is genomics?
• DNA sequencing approaches
• Genome size and structure, mutation rate
• Genomics of human pathogens: examples (the Black Death in Europe in 1347; Ebola; influenza)
• Changes in the human genome selected by pathogens

What is genomics?
Genomics is the study of the whole genomes of organisms, and it incorporates elements from genetics. Genomics combines recombinant DNA techniques, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the structure and function of genomes. In 1952, Alfred Hershey and Martha Chase demonstrated in a series of experiments that DNA, not protein, carries the heritable genetic traits. James Watson and Francis Crick discovered the double-helix structure of DNA in 1953.

1.1. What is genomics? Milestones
• 1952: Hershey and Chase, genetic information is encoded in DNA
• 1953: Watson and Crick, DNA structure
• 1972: Sanger started work on DNA sequencing
• 1977: first DNA virus sequenced (bacteriophage φX174)
• 1995: first bacterium sequenced (Haemophilus influenzae)
• 1996: first eukaryotic genome sequenced (Saccharomyces cerevisiae)
• 1998: first multicellular organism sequenced (Caenorhabditis elegans)
• 2003: human genome completed
• 2023: 506,618 genomes sequenced (Archaea: 5,371; Bacteria: 433,319; Eukarya: 49,843; Viruses: 18,085)

1.1. What is genomics? Milestones
[Figure: number of sequenced archaeal, bacterial, eukaryotic, and viral genomes over time (y-axis up to 250,000), spanning the Sanger sequencing and next-generation sequencing eras.]
[Figure, continued: metagenomes (high-throughput sequencing) are also plotted; x-axis 1998-2024.] Source: https://gold.jgi.doe.gov/

1.1. What is genomics? Milestones
Cost of sequencing the human genome
[Figure: cost of sequencing one human genome, 2001-2013, falling from about $95 million (2001) to about $5,000 (2013), with the steepest decline after the introduction of next-generation sequencing.]
NGS cost in 2024: $100 per human genome.

Generations of DNA sequencing technologies
First generation
• 1972: Sanger started work on DNA sequencing
• 1977: Sanger developed the dideoxy chain-termination method of DNA sequencing
• 1977: Maxam and Gilbert developed the chemical degradation method of DNA sequencing
• 1977: first DNA-based genome sequenced (bacteriophage φX174)
• 1995: first bacterium, Haemophilus influenzae, sequenced by the shotgun method
• 1996: Applied Biosystems developed automated DNA sequencing based on Sanger's method
• 1996: first eukaryotic genome (Saccharomyces cerevisiae) sequenced
• 2001: first human genome draft published by two independent teams
Second generation (2005-2017, in timeline order)
• Roche 454 GS-20, the first NGS platform (2005)
• Solexa Genome Analyzer, the second NGS platform
• Initiation of the 1000 Genomes Project
• Roche 454 GS-FLX and ABI SOLiD sequencer
• Illumina GA-II
• Roche 454 GS-FLX Titanium
• Roche 454 GS Junior
• SOLiD 5500 W and Illumina MiSeq
• Illumina HiSeq
• SOLiD 5500xl W and Illumina MiniSeq
• Roche 454 GS Junior+, Illumina NextSeq 500, and Illumina HiSeq X Ten
• Illumina iSeq 100
Third generation
• 2008: HeliScope (Helicos), the first commercial third-generation platform
• 2010-2016: PacBio RS and its successive chemistries (C1/C2 through P6-C4)
Fourth generation
• From 2014: Oxford Nanopore Technologies platforms
• 2018: commercialization of the PromethION platform by Oxford Nanopore Technologies

Main sequencing technologies and their characteristics
Technology (manufacturer) | Sequencing chemistry | Platform | Read length (bp) | Throughput (Gb/h) | Best suited for
Sanger | Dye terminator | ABI 3730xl | 700-900 | - | De novo and metagenomics
454 (Roche) | Pyrosequencing | GS FLX | 400-700 | 0.04 | De novo and metagenomics
454 (Roche) | Pyrosequencing | GS Junior | 400 | 0.004 | De novo and metagenomics
Solexa (Illumina) | Sequencing by synthesis with reversible terminators | GAIIx | 36-150 | 0.3 | Resequencing
Solexa (Illumina) | Sequencing by synthesis with reversible terminators | HiSeq 2000 | 36-100 | 2.9 | Resequencing
Solexa (Illumina) | Sequencing by synthesis with reversible terminators | MiSeq | 36-250 | 0.2 | Resequencing
SOLiD (ABI) | Sequencing by ligation | 5500xl | 35-75 | 1 | Resequencing
HeliScope (Helicos) | Sequencing by synthesis with virtual terminators | tSMS | 25-55 | 1 | Resequencing
Ion Torrent (Life Technologies) | Semiconductor sequencing | Ion Torrent PGM | 100-200 | 0.2 | Resequencing
Ion Torrent (Life Technologies) | Semiconductor sequencing | Ion Proton | 100-200 | 2.5 | Resequencing
PacBio (Pacific Biosciences) | SMRT technology | PacBio RS | 250-10,000 | 0.1 | Genome structure and metagenomics
Nanopore (Oxford Nanopore Technologies) | Ionic current sensing | GridION and MinION | 10,000-50,000 | - | De novo and genome structure

Genome Sequencing
[Figure: hierarchical genome sequencing. Genome (3 Gb); cut the genome into large pieces; clone them into BACs (100 kb); order the BACs based on sequence features (markers), i.e., mapping; cut each BAC again; sequence; assemble each BAC; assemble the entire sequence, e.g., TTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTT.]
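The final step of the figure, merging overlapping reads into one contiguous sequence, can be illustrated with a toy greedy overlap assembler. This is a minimal sketch under simplifying assumptions (error-free reads, all from the same strand, no repeats, no read buried inside another), not the algorithm of any particular assembler; `overlap`, `greedy_assemble`, and the `min_len` threshold are hypothetical names chosen for illustration. The four reads are those drawn in the figure.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 5) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap until none remains."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Reads as drawn in the BAC-assembly figure (one strand only).
reads = ["TTGTAAGTGAGAACA", "AGAACAGGACGTATGTGGT", "TGTGGTTTTCTACTCC", "CTACTCCTGTGTT"]
print(greedy_assemble(reads))
```

Greedy merging works for this example because every true overlap (6-7 nt) is longer than any chance overlap between unrelated reads. With repeats longer than the reads, a greedy choice can join the wrong pieces, which is one reason the hierarchical approach first orders BACs on a map.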
[Figure, continued: overlapping reads from one BAC (TTGTAAGTGAGAACA, AGAACAGGACGTATGTGGT, TGTGGTTTTCTACTCC, CTACTCCTGTGTT) are assembled into a contiguous sequence.]

DNA sequencing approaches
Sanger sequencing process (shotgun)
[Figure: genomic DNA is fragmented by ultrasound or enzyme, and the fragments are cloned into a plasmid and propagated in E. coli. The heat-denatured template (3'-AGTCATGAGTCC-5') with a primer is extended by DNA polymerase in the presence of dATP, dGTP, dCTP, and dTTP plus one ddNTP (e.g., ddGTP), producing terminated complementary strands such as 5'-TCAddG, 5'-TCAGTACTCAddG, and 5'-TCAGTACTCAGddG, which are separated by electrophoresis.]
(A) Genomic DNA is fragmented by ultrasound or enzymes, cloned into a vector (plasmid), and transformed into E. coli. (B) The dideoxy chain-termination method is used for sequencing. The reaction mixture contains template, primer, DNA polymerase, and dNTPs. In addition, four types of fluorescently labeled ddNTP (ddATP, ddTTP, ddCTP, and ddGTP) are added separately to four reactions. Because a ddNTP lacks the 3'-hydroxyl group, no phosphodiester bond can be formed with the next nucleotide, so incorporation of a ddNTP into the newly synthesized strand terminates its elongation. (C) After the reactions, the four reactions are loaded for electrophoresis, and the DNA sequence is determined from the positions of the products.
Zhang et al.
2021; https://doi.org/10.3389/fmicb.2021.766364

Sanger sequencing
[Figure: the DNA to be sequenced (5'-TCAGTAATGCCA-3') is synthesized on the template 3'-AGTCATTACGGT-5'. Each of four tubes contains polymerase, dNTPs, primer, and one labeled ddNTP (ddATP, ddGTP, ddCTP, or ddTTP) and yields fragments ending at every occurrence of that base; for example, the ddATP tube gives TCA, TCAGTA, TCAGTAA, and TCAGTAATGCCA. Electrophoresis separates the fragments by length, and reading the bands from the shortest fragment to the longest gives 5'-TCAGTAATGCCA-3'.]

Illumina Genome Analyzer workflow
[Figure: adapter ligation, surface attachment, bridge amplification, denaturation, single-base extension, and imaging. Source: Trends in Genetics.]
Illumina workflow. Starting from similar fragmentation and adapter-ligation steps, the library is added to a flow cell for bridge amplification (an isothermal process that amplifies each fragment into a cluster). The cluster fragments are denatured, annealed with a sequencing primer, and sequenced by synthesis using 3'-blocked, labeled nucleotides.

Helicos single-molecule sequencing method
[Figure: (a) fragmented DNA; (b) addition of a poly(A) tail; (c) capture of the template on the flow-cell surface; (d) addition of a single fluorescently labeled nucleotide; (e) detection of the light emitted upon nucleotide incorporation.]
Sample preparation and sequencing process for single-molecule sequencing of biological samples.
Zhao L, Deng L, Li G, Jin H, Cai J, et al. (2017) Single molecule sequencing of the M13 virus genome without amplification. PLOS ONE 12(12): e0188181.
https://doi.org/10.1371/journal.pone.0188181
A new class of third-generation sequencing platforms can directly measure DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to the GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias. (PLOS ONE)

PacBio sequencing
[Figure: fluorescence signal (a.u.) over time (705-745 s) during single-molecule real-time sequencing.]
Detection of methylated bases using PacBio sequencing
PacBio sequencing can detect modified bases, including m6A (also known as 6mA), by analyzing variation in the time between base incorporations in the read strand. The figure is adapted with permission from Pacific Biosciences; a.u. stands for arbitrary units.

Nanopore sequencing
MinION sequencing
• Available since May 2015
• 512 channels, 2,048 pores
• Theoretical output 50 Gb (run for 72 hours at 420 bases/second); 30 Gb corresponds to about 10x coverage of the human genome
• Nanopores read the whole length of the DNA presented to them; longest read so far: > 4 Mb
• Applications: whole genome, targeted sequencing, whole transcriptome, metagenomics
• Newest flow cell and chemistry: raw-read accuracy Q20 (1 error in 100), i.e., 99%; consensus accuracy Q50 (1 error in 100,000), i.e., 99.999%
• Real-time analysis
[Figure: MinION device; the sample is added to the flow cell here.]
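The accuracy figures above follow the Phred scale, Q = -10 log10(P_error), so Q20 corresponds to an error probability of 10^-2 and Q50 to 10^-5. The sketch below converts between the two and also reproduces the MinION throughput arithmetic, assuming (hypothetically) that all 512 channels sequence continuously at 420 bases/s for the full 72 hours; the function names are illustrative.

```python
import math


def phred_to_error(q: float) -> float:
    """Phred scale: Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p_error: float) -> float:
    """Inverse conversion from an error probability to a Phred score."""
    return -10 * math.log10(p_error)


def max_run_output(channels: int, bases_per_second: float, hours: float) -> float:
    """Upper bound on bases per run, assuming every channel sequences nonstop."""
    return channels * bases_per_second * hours * 3600


for q in (20, 50):
    err = phred_to_error(q)
    print(f"Q{q}: 1 error in {1 / err:,.0f} bases, accuracy {1 - err:.3%}")
print(f"MinION upper bound: {max_run_output(512, 420, 72) / 1e9:.1f} Gb")
```

Under that assumption the bound comes to about 55.7 Gb, in line with the ~50 Gb theoretical output quoted above; real runs yield less because not every channel is active for the whole run.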
[Figure: read length histogram (summary read length distribution; estimated read length in bases).]
• GridION: 5 flow cells, run concurrently or individually; 250 Gb; flexible, integrated compute
• PromethION: 48 flow cells, 14 Tb; large-scale, ultra-high throughput

Genome size and structure, mutation rate
[Figure: genome sizes (number of nucleotide pairs per haploid genome, log scale, about 10^5 to 10^11) in bacteria (Mycoplasma, E. coli), fungi (budding yeast), protists, plants (Arabidopsis, lily, Paris japonica), insects, molluscs, cartilaginous fish (shark), bony fish (Fugu, zebrafish, marbled lungfish), amphibians (newt), reptiles, birds, and mammals (human); organisms with mostly coding DNA are distinguished from those with mostly non-coding DNA.]
• The size of bacterial genomes ranges from 0.6 Mbp to >10 Mbp. Archaeal genomes are smaller, from 0.5 Mbp to 5.8 Mbp.
• For comparison, eukaryotic genomes range from 2.9 Mbp (Microsporidia) to 150,000 Mbp (P. japonica).
Source: http://book.bionumbers.org/how-big-are-genomes

Genome size and structure, mutation rate
[Figure: genome size vs. protein count across NCBI genomes (log-log plot of number of proteins against genome size in bp) for Archaea, Bacteria, Eukaryota, and Viruses.]

Organism | Genome size | Protein-coding genes
Model organisms
model bacterium E. coli | 4.6 Mbp | 4,300
budding yeast S. cerevisiae | 12 Mbp | 6,600
fission yeast S. pombe | 13 Mbp | 4,800
amoeba D. discoideum | 34 Mbp | 13,000
nematode C. elegans | 100 Mbp | 20,000
fruit fly D. melanogaster | 140 Mbp | 14,000
model plant A. thaliana | 140 Mbp | 27,000
moss P. patens | 510 Mbp | 28,000
mouse M. musculus | 2.8 Gbp | 20,000
human H. sapiens | 3.2 Gbp | 21,000
Viruses
hepatitis D virus (smallest known animal RNA virus) | 1.7 kb | 1
HIV-1 | 9.7 kbp | 9
influenza A | 14 kbp | 11
bacteriophage λ | 49 kbp | 66
Pandoravirus salinus (largest known viral genome) | 2.8 Mbp | 2,500
Bacteria
C. ruddii (smallest genome of an endosymbiont bacterium) | 160 kbp | 182
M. genitalium (smallest genome of a free-living bacterium) | 580 kbp | 470
H. pylori | 1.7 Mbp | 1,600
cyanobacterium S. elongatus | 2.7 Mbp | 3,000
methicillin-resistant S. aureus (MRSA) | 2.8 Mbp | 2,700
B. subtilis | 4.3 Mbp | 4,100
S. cellulosum (largest known bacterial genome) | 13 Mbp | 9,400
Eukaryotes, multicellular
pufferfish Fugu rubripes (smallest known vertebrate genome) | 400 Mbp | 19,000
poplar P. trichocarpa (first tree genome sequenced) | 500 Mbp | 46,000
corn Z. mays | 2.3 Gbp | 35,000
dog C. familiaris | 2.4 Gbp | 19,000
chimpanzee P. troglodytes | 3.3 Gbp | 19,000
wheat T. aestivum (hexaploid) | 16.8 Gbp | 95,000
marbled lungfish P. aethiopicus (largest known animal genome) | 130 Gbp | unknown
herb plant Paris japonica (largest known genome) | 150 Gbp | unknown
Source: http://book.bionumbers.org/how-big-are-genomes

There are globally between 0.8 and 1.6 million prokaryotic OTUs (16S rRNA gene at 97% similarity).
[Figure: (A, B) OTU accumulation curves against the number of studies (worldwide, America, and 1/4 of studies) for bacteria and archaea; (C, D) global richness by estimator. Annotations: GPC, 690,474 OTUs; SILVA NR99, 584,978 sequences; global estimate based on SILVA NR99 coverage, 717,109 OTUs; iChao2split global estimate, 935,387 OTUs on average; 1/4 of studies, 356,488 OTUs on average.]
Estimating global prokaryotic OTU richness. (A, B) Accumulation curves showing the number of bacterial (A) and archaeal (B) OTUs discovered, depending on the number of distinct studies included. Curves are averaged over 100 random subsamplings, and whiskers show the corresponding standard deviations.
Continuous curves were calculated using all studies (worldwide), while blue dashed curves were calculated using only studies performed in the Americas or near American coasts. (C, D) Global OTU richness of Bacteria (C) and Archaea (D), estimated using the iChao2, iChao2split, ICE, CatchAll, breakaway, and tWLRM estimators. The number of OTUs discovered by the GPC is included for comparison (last bar).

Bacterial genomics: Escherichia coli
"E. coli is a single species with numerous recognized roles, from lab workhorse to beneficial intestinal commensal or deadly pathogen. The extant strains have disparate lifestyles as a result of differential niche expansion since their divergence 25-40 million years ago, ten times longer than the estimated divergence between chimpanzees and humans." (Hendrickson 2009; DOI: 10.1371/journal.pgen.1000335)
Prokaryotic cell: E. coli is a Gram-negative, nonsporulating, facultative anaerobe. Cells are typically rod-shaped (2.0 x 0.5 µm). (https://www.intechopen.com/chapters/84764)

Escherichia coli: genome types
[Figure: E. coli genome types; the core genome comprises about 2,000 genes.]
• Genome: the set of genetic information (genes) in one cell (strain/organism).
• Core genome: the set of genes found in all strains.
• Pangenome: the set of all genes from all strains.
• Accessory (cloud) genome: the set of genes present in only one or some strains.
More than 50% of E. coli genetic information is variable.

Genome size and structure, mutation rate: genetically monomorphic pathogens
The T. pallidum genome (1.14 x 10^6 bp) shows minimal sequence diversity among syphilis strains (99.99% identity).
• There are two genomic groups (figure labels include the Gauthier and Samoa D strains).
Other examples of genetically monomorphic pathogens are Bacillus anthracis (anthrax), Yersinia pestis (plague), Burkholderia mallei (glanders), Escherichia coli O157:H7 (HUS), Mycobacterium tuberculosis (tuberculosis), Mycobacterium leprae (leprosy), and Salmonella enterica serovar Typhi (typhoid fever).
Mikalová et al. 2010, https://doi.org/10.1371/journal.pone.0015713

Genome size and structure, mutation rate: minimal bacterial genome
Mycoplasma genitalium (a human urogenital pathogen) has the smallest genome of all free-living organisms. Its genome is 580 kb long and contains only 470 genes. It has a minimal metabolism and little genomic redundancy. One third of its proteins have unknown function.
Synthetic genomes: JCVI-syn3.0, a synthetic bacterium, encodes only 473 genes (genome 531.56 kbp). A minimal set of 256 genes has been proposed.
Hutchison et al. 2016; https://doi.org/10.1126/science.aad6253
Mushegian AR, Koonin EV 1996; https://doi.org/10.1073/pnas.93.19.10268
[Figure: histogram of bacterial genome sizes (kbp).]
Distribution of 2,112 bacterial genome sizes. The average genome size was 3,451 kbp, and the standard deviation was 1,882 kbp. Genome sizes spanned an approximately 93-fold range.
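Summary statistics like those in the caption (mean, standard deviation, and fold range) are simple to compute. The sketch below applies them to a small, hand-picked subset of bacterial genome sizes taken from the bionumbers table earlier in this section; it does not reproduce the 2,112-genome data set, and the dictionary and function names are illustrative.

```python
from statistics import mean, stdev

# Illustrative subset of bacterial genome sizes (kbp) from the bionumbers table;
# NOT the 2,112-genome data set behind the histogram.
SIZES_KBP = {
    "C. ruddii": 160,
    "M. genitalium": 580,
    "H. pylori": 1_700,
    "S. elongatus": 2_700,
    "E. coli": 4_600,
    "S. cellulosum": 13_000,
}


def size_summary(sizes: list[float]) -> dict[str, float]:
    """Mean, sample standard deviation, and fold range (largest / smallest)."""
    return {
        "mean": mean(sizes),
        "sd": stdev(sizes),
        "fold_range": max(sizes) / min(sizes),
    }


summary = size_summary(list(SIZES_KBP.values()))
print({name: round(value, 1) for name, value in summary.items()})
```

In this small subset the single 13 Mbp genome of S. cellulosum pushes the standard deviation above the mean, a reminder that a few very large genomes dominate such summaries.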
[Figure: genome sizes (Mbp, log scale) of selected archaea (e.g., Methanosarcina acetivorans, Halobacterium salinarium, Sulfolobus solfataricus, Archaeoglobus fulgidus, Pyrococcus furiosus, Methanococcus jannaschii, Thermoplasma acidophilum, Nanoarchaeum equitans) and bacteria (e.g., Nostoc punctiforme, Myxococcus xanthus, Streptomyces coelicolor, Pseudomonas aeruginosa, Escherichia coli O157:H7 and K-12, Mycobacterium tuberculosis, Bacillus subtilis, Vibrio cholerae, Haemophilus influenzae, Rickettsia prowazekii, Mycoplasma pneumoniae, Mycoplasma genitalium).]
Not long ago it was thought that all prokaryotic genomes (both Bacteria and Archaea) were much smaller than eukaryotic genomes. However, new techniques for constructing physical maps and whole-genome sequencing have demonstrated tremendous diversity in the size and organization of prokaryotic genomes. The figure shows examples of genome sizes of Bacteria and Archaea. The size of bacterial chromosomes ranges from 0.6 Mbp to over 10 Mbp, and the size of archaeal chromosomes ranges from 0.5 Mbp to 5.8 Mbp. (For comparison, eukaryotic genomes range from 2.9 Mbp (Microsporidia) to well over 4,000 Mbp, although the largest genomes contain a tremendous amount of repetitive "junk" DNA.)

Genome rearrangements and evolution
Evolution of genomes caused by inversions. Different colored boxes represent the gene blocks.
The grey arrow shows the inverted region of the genome. Identifying the inversions can reveal the evolutionary history of the organisms, as depicted here.

[Figure: the TP0136 locus. The ancestral arrangement (TP0133, TP0134, TP0135, tRNA-Val, TP0136) is retained in TPA, TPE, and TEN strains; steps 1 and 2 lead to TPeL L2 (TP0133, TP0133a, TP0133b, tRNA-Val, TP0136), and step 3 leads to TPeC Cuniculi A (TP0133, TP0135, tRNA-Val, TP0136); the modular part of TP0136 is marked.]
A schematic representation of the evolution of the TP0136 locus in TPeL and TPeC treponemes. The evolution of this region required several steps, including two gene conversion events and one deletion. The part of TP0136 showing modular structure was not affected by these changes. (Strouhal et al. 2018)

Composition of repeat motif regions observed in the TPeL L2, L3, and TPeC Cuniculi A genomes.
[Figure: arrangement of repeat units (numbered 1-21) in Cuniculi A (Harper 2003), L2, and L3, with the amino acid sequences of the repeat unit types, all variants of an approximately 20-residue motif (e.g., EVEDVPKVVEPASERGGRE) differing at a few positions.]

Genome rearrangements and evolution
[Figure: genome comparison of Y. pseudotuberculosis IP32953 and Y. pestis.]

Black Death, Y. pestis
Some of the pre-existing conditions necessary for this occurrence included war, famine, weather, and interactions between people. At the end of the 13th century and into the first half of the 14th century, disastrous weather had severe implications worldwide (Little Ice Age). The Great Famine (1315-1317) struck all of Northern Europe, resulting in hunger and malnutrition. The Black Death first came to Europe in 1347, engulfing the continent in sickness and turmoil.
Transported by fleas (and, by extension, by rats stowed away on seafaring ships), the Black Death killed as much as 50 percent of the population in less than a decade, and outbreaks of plague continued to sicken Europeans until the disease disappeared in the early 19th century.
[Photo caption: "The 'Black Death' entered England in 1348 through this port. It killed 30-50% of the country's total population."]
Scientists think that plague bacteria circulate at low rates within populations of certain rodents without causing excessive rodent die-off. This is called the enzootic cycle. Occasionally, other species become infected, causing an outbreak among animals, called an epizootic. Humans are usually more at risk during, or shortly after, a plague epizootic.
The plague outbreaks in Asia were consistently followed by flare-ups in Europe roughly 15 years later, just enough time for Asia's rodent reservoirs to travel trade routes into Europe. In other words, European rats are not to blame for the Black Death; their furry Asian counterparts are.

[Figure: minimal spanning tree rooted with Y. pseudotuberculosis IP32953, with branches 0, 1, and 2 (nodes such as 0.PE2, 0.PE7, 1.ORI, 1.ANT, 2.ANT, 2.MED) and isolates colored by location: Madagascar, Northern Africa, USA, Germany, former USSR, Kurdistan/Turkey, South America, China, Southeastern Asia, India, Central/South Africa, other.]
Fully parsimonious minimal spanning tree of 933 SNPs for 282 isolates of Y. pestis, colored by location. The phylogenetic analysis suggests that Y. pestis evolved in or near China and spread through multiple radiations to Europe, South America, Africa, and Southeast Asia, leading to country-specific lineages that can be traced by lineage-specific SNPs. All 626 current isolates from the United States reflect one radiation, and 82 isolates from Madagascar represent a second radiation. Subsequent local microevolution of Y. pestis is marked by sequential, geographically specific SNPs.
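A minimal spanning tree (MST) like the Y. pestis tree connects isolates so that the total number of SNP differences along its branches is as small as possible. The sketch below applies Prim's algorithm to pairwise SNP (Hamming) distances between hypothetical haplotypes; it joins only observed isolates, without inferring ancestral nodes, and is not the software used in the original study. Because each SNP here has two alleles, a tree whose total length equals the number of variable sites needs no homoplasy, which is the sense in which such a tree is fully parsimonious.

```python
def snp_distance(a: str, b: str) -> int:
    """Number of SNP positions at which two haplotypes differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same SNP positions")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes: dict[str, str]) -> list[tuple[str, str, int]]:
    """Prim's algorithm on the complete graph of pairwise SNP distances."""
    names = list(haplotypes)
    in_tree = [names[0]]  # start from the first isolate (here, the outgroup root)
    edges = []
    while len(in_tree) < len(names):
        best = None
        for u in in_tree:
            for v in names:
                if v in in_tree:
                    continue
                d = snp_distance(haplotypes[u], haplotypes[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.append(best[1])
    return edges


def variable_sites(haplotypes: dict[str, str]) -> int:
    """Count SNP positions that are polymorphic among the isolates."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes.values()))


# Hypothetical haplotypes: one character per SNP position (0 = ancestral, 1 = derived).
isolates = {"root": "0000", "A": "1000", "B": "1100", "C": "0010", "D": "0011"}
tree = minimum_spanning_tree(isolates)
length = sum(d for _, _, d in tree)
print(tree, length, length == variable_sites(isolates))
```

This brute-force Prim's search is cubic in the number of isolates, which is fine for a teaching example; a study with hundreds of isolates would use a priority queue or dedicated phylogenetics software.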
Hare syphilis in Europe: more than half of the hare population is infected with T. paraluisleporidarum.
[Figure: network of the tested hare samples.]
MJ (median-joining) network of all tested samples based on concatenated sequences (TP0548 and TP0488). Only parsimony-informative sites (n = 63) were used to construct the network. Colors correspond to geographic region (N, North Germany; S, South Germany). While there is significant bootstrap support for the group of Swedish samples and Cuniculi A (USA), the bootstrap support for the two clusters in the figure is very low. There is no clear clustering of samples by geographical origin.

Viral genomics
Viral particles (virions) consist of (i) genetic material (DNA or RNA), (ii) a protein coat (capsid), which surrounds and protects the genetic material, and, in some cases, (iii) a lipid envelope. Virion morphology: helical, icosahedral, and complex structures.
[Figure: virions of hepatitis B virus, hepatitis A virus, parvovirus, and poliovirus, with scale bars. Source: https://en.wikipedia.org/wiki/Virus#Etymology]
[Figure: virus families grouped by genome type. dsDNA: Asfarviridae, Poxviridae (Chordopoxvirinae), Iridoviridae (Ranavirus, Lymphocystivirus), Polyomaviridae, Herpesviridae, Papillomaviridae, Adenoviridae; dsDNA (RT): Hepadnaviridae; ssDNA: Circoviridae, Parvoviridae (Parvovirinae); dsRNA: Reoviridae (Orthoreovirus, Orbivirus, Coltivirus, Rotavirus, Aquareovirus), Birnaviridae (Aquabirnavirus, Avibirnavirus); ssRNA (RT): Retroviridae; scale bar 100 nm. Genome-type schematic: RNA(+), RNA(-), DNA(+/-).] Source: https://viralzone.expasy.org/254
The positive-sense genome acts as mRNA and is directly translated into viral proteins. The negative-sense genome is complementary to mRNA molecules, which are synthesized by the viral RNA-dependent RNA polymerase (RdRp). https://slideplayer.com/slide/12253566/

[Figure: mutation rate vs. genome size (nt, log scale) for viroids, ssRNA(+), ssRNA(-), dsRNA, ssDNA, and dsDNA viruses, and bacteria.]
Drake's rule is a remarkably universal property of genomes from microbes to mammals: the number of (functional) mutations per genome per generation is approximately constant within a phylum, despite orders-of-magnitude differences in genome size and diverse population properties (shown in the figure as 0.0033 mutations per genome per generation).
Relationship between mutation rate and genome size, with major virus groups indicated. Values for viroids and bacteria, the two adjacent levels of biological complexity, are also plotted. The mutation rate is expressed as the number of substitutions per nucleotide per generation, defined as one cell infection for viruses (s/n/c).

[Figure: lethal mutagenesis. (A) Before application of a mutagen, a normal quasispecies distribution carries meaningful information. (B) After application of a mutagen, the mutation rate per replication passes the point of error catastrophe, producing random sequences with meaningless information. Copying fidelity: with higher fidelity, Seq. 1 (TCC TTC CAG ACC TAA) and Seq. 2 (TCC TTA CAG ACC TAA) differ at one position; with lower fidelity, Seq. 1 (TCC TTC CAG ACC TAA) and Seq. 2 (TCA TTA CAG ACT TAG) differ at four positions.]
[Figure: chemical structures of the ribavirin cis and trans conformers and their base pairing.]
The lethal mutagenesis mechanism of ribavirin. The ribavirin cis conformer can pair with uridine by mimicking adenosine, and the trans conformer can pair with cytidine by mimicking guanosine.

Ebola virus disease
• Ebola virus disease (EVD), formerly known as Ebola haemorrhagic fever, is a rare but severe, often fatal illness in humans.
• The virus is transmitted to people from wild animals and spreads in the human population through human-to-human transmission.
• The average EVD case fatality rate is around 50%. Vaccines to protect against Ebola have been developed and have been used to help control the spread of Ebola outbreaks in Guinea and in the Democratic Republic of the Congo (DRC).
• Early supportive care with rehydration and symptomatic treatment improves survival. Two monoclonal antibody treatments (Inmazeb and Ebanga) were approved by the US Food and Drug Administration in late 2020 for the treatment of Zaire ebolavirus infection in adults and children.
The 2014-2016 outbreak in West Africa was the largest Ebola outbreak since the virus was first discovered in 1976. The outbreak started in Guinea and then moved across land borders to Sierra Leone and Liberia.
The virus family Filoviridae includes, among others, the genera Cuevavirus, Marburgvirus, and Ebolavirus. Within the genus Ebolavirus, six species have been identified: Zaire, Bundibugyo, Sudan, Taï Forest, Reston, and Bombali.

[Figure: filovirus hosts and transmission routes, including the Egyptian rousette and Schreibers's long-fingered bat.]
Complete or coding-complete filovirus genome sequences have been obtained from cave-dwelling and house-dwelling bats and from highly diverse fish on the African, Asian, and European continents. The pathogenic potential of most filoviruses remains unclear, as does the transmission route of pathogenic filoviruses proven to infect humans and pigs or suspected to infect chimpanzees, duikers, and gorillas. Animals that have been proven to be infected by filoviruses are indicated in black; grey animals are suspected but unproven reservoirs of the indicated viruses. Solid arrows indicate highly likely transmission routes; dashed arrows indicate hypothesized transmission routes. BDBV, Bundibugyo virus; BOMV, Bombali virus; EBOV, Ebola virus; HUJV, Huangjiao virus; LLOV, Lloviu virus; MARV, Marburg virus; MLAV, Mengla virus; RAVV, Ravn virus; RESTV, Reston virus; SUDV, Sudan virus; TAFV, Taï Forest virus; XILV, Xilang virus.

[Figure: within- and between-country genomic relationships of Ebola virus.]

[Figure: an influenza virus particle.]
Influenza A viruses are classified into subtypes based on the properties of their hemagglutinin (H) and neuraminidase (N) surface proteins.
There are 18 different HA subtypes and 11 different NA subtypes. Subtypes are named by combining the H and N numbers, e.g., A(H1N1) and A(H3N2).

Influenza virus: 3 types
• RNA virus
• Antigenically distinct
• No cross-immunity
Influenza | Type A | Type B | Type C
Disease | ++ | ++ | -
Epidemics/pandemics | Epidemics and pandemics | Milder epidemics | No
Host | Humans and other species | Humans only | Humans only
Antigenic variation | Frequent | Infrequent | Stable

Viral genomics: influenza virus (Orthomyxoviridae)
• Four types: influenza A, B, C, and D
• ssRNA(-) segmented genome (8 segments, ~13.5 kb; 11 proteins)
• A high rate of random mutations (antigenic drift) is responsible for the emergence of new influenza variants each season (year).
Yeganeh et al. 2013; PMID: 23454774.

Viral genomics: influenza virus
Evolution of influenza through reassortment (antigenic shift)
If a host cell is infected with more than one influenza virus, the viral progeny contain random sets of the genome segments.
[Figure: the genome segments, including PB1 (encoding PB1 and PB1-F2), PA (encoding PA and PA-X), hemagglutinin (HA), and neuraminidase (NA); a natural double infection produces reassortant progeny.]
Liang 2023; https://doi.org/10.1080/21505594.2023.2223057
https://www.zkbs-online.de/ZKBS/EN/Meta/focus_topics/Influenzavkuses/Influenza_viruses_no

Viral genomics: influenza virus
Segment reassortment is responsible for the emergence of pandemic influenza variants.
[Figure: bird-to-human transmission. The 1957 Asian influenza (H2N2) arose from reassortment of an avian H2N2 virus with the circulating human H1N1 virus; a later reassortment of an avian H3 virus with the human H2N2 virus is also shown.]

Humans, animals and types of influenza A virus (characterized into subtypes)
• Swine flu (H1N1): may cause pandemics; the most common variant of swine flu; responsible for the 1918 Spanish flu, which killed ~50-100 million people worldwide; in 2009 a new strain of H1N1 emerged, causing a pandemic that killed ~15,000 people worldwide.
• Bird flu (as referred to in the media): no sustained human-to-human transmission; 622 human cases reported since 2003, with a 60% mortality rate.
• Subtype (influenza A viruses only): subtypes are denoted based on the outer protein spikes hemagglutinin (H) and neuraminidase (N).
• Strain: influenza A viruses are further characterized into strains, while influenza B viruses are divided into strains only.
Natalie Cormier 2015; https://bmcl.utm.utoronto.ca/~natalie/flufacts/

Viral genomics
Evolution of SARS-CoV-2 variants
• ssRNA(+) nonsegmented genome (~30 kb; 29 proteins)
• High rate of mutations (antigenic drift)
[Figure: clades 19A, 19B, 20A, 20B, 20C, 20D, 20E (EU1), and 20A.EU2, with spike mutations S:S98F, S:D80Y, and S:N439K marked; time axis November 2019 to August 2020; map of European countries including Bulgaria and Turkey (Istanbul, Ankara).]
Left, the tree shows a representative sample of isolates from Europe colored by clade/variant. Clade 20A and its daughter clades 20B and 20C (yellow) carry the mutation S:D614G. Variant 20E (EU1) (orange), with mutation S:A222V on an S:D614G background, emerged in early summer 2020 and became common in many European countries in autumn 2020. A separate variant (20A.EU2; blue) with mutation S:S477N became prevalent in France. Right, the proportion of sequences belonging to each variant per country.
Map data copyright Google, INEGI (2021).
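The right-hand panel reduces to counting sequences per clade within each country and dividing by that country's total. A minimal sketch with hypothetical metadata records of the form (country, clade); real surveillance analyses also have to account for uneven sampling between countries, which raw fractions do not.

```python
from collections import Counter, defaultdict


def variant_proportions(records):
    """records: iterable of (country, clade) pairs, one per sequenced genome.

    Returns {country: {clade: fraction of that country's sequences}}.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for country, clade in records:
        counts[country][clade] += 1
    proportions = {}
    for country, clade_counts in counts.items():
        total = sum(clade_counts.values())
        proportions[country] = {clade: n / total for clade, n in clade_counts.items()}
    return proportions


# Hypothetical metadata, not real surveillance data.
records = [
    ("France", "20A.EU2"),
    ("France", "20A.EU2"),
    ("France", "20E (EU1)"),
    ("Spain", "20E (EU1)"),
]
print(variant_proportions(records))
```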
Changes in human genome selected by pathogens
[Figure: timeline of recent human evolution with the estimated emergence of malaria, tuberculosis, smallpox, leprosy, cholera and AIDS; ~200,000 years ago, modern humans emerge in Africa; ~100,000-50,000 years ago, migrations within Africa and out of Africa; ~12,000 years ago, early agriculture (Neolithic demographic transition); ~2,200 years ago, the Silk Road links Africa, Europe and Asia; ~500 years ago, European colonization of the Americas begins; today, globalization. Nature Reviews Genetics]
Key events in recent human evolution (boxes outlined in black) are juxtaposed with the estimated ages of infectious disease emergence (boxes outlined in red). The fragmentation of the human lineage into genetically and geographically distinct populations (blue lines) accelerates with migration out of Africa. Later, these populations started mixing more (blue shaded regions between the populations) along trade routes (such as the Silk Road), through colonization and, nowadays, through high rates of global travel. Smallpox is caused by variola virus.
Common erythrocyte variants that affect resistance to malaria. Gene (protein; function): reported genetic association with malaria
• FY (Duffy antigen; chemokine receptor): the FY*0 allele completely protects against P. vivax infection.
• G6PD (glucose-6-phosphate dehydrogenase; enzyme that protects against oxidative stress): G6PD deficiency protects against severe malaria.
• GYPA (glycophorin A; sialoglycoprotein): GYPA-deficient erythrocytes are resistant to invasion by P. falciparum.
• GYPB (glycophorin B; sialoglycoprotein): GYPB-deficient erythrocytes are resistant to invasion by P. falciparum.
• GYPC (glycophorin C; sialoglycoprotein): GYPC-deficient erythrocytes are resistant to invasion by P. falciparum.
• HBA (α-globin; component of hemoglobin): α+ thalassemia protects against severe malaria but appears to enhance mild malaria episodes in some environments.
• HBB (β-globin; component of hemoglobin): HbS and HbC alleles protect against severe malaria; the HbE allele reduces parasite invasion.
• HP (haptoglobin; hemoglobin-binding protein present in plasma, not in erythrocytes): the haptoglobin 1-1 genotype is associated with susceptibility to severe malaria in Sudan and Ghana.
• SLC4A1 (CD233, erythrocyte band 3 protein; chloride/bicarbonate exchanger): a deletion causes ovalocytosis but protects against cerebral malaria.
The American Journal of Human Genetics
[Figure: Estimated tuberculosis (TB) incidence rates, 2011]
Cystic fibrosis
1 in 3,000 children are born with CF, and 2% of people carry one mutant gene.
[Figure: Classic and nonclassic cystic fibrosis. Classic CF (no functional CFTR protein): chronic sinusitis, severe chronic bacterial infection of the airways, severe hepatobiliary disease. Exon 9 splicing pattern and clinical phenotype (from normal to nonclassic CF) by CFTR (TG)m(T)n genotype]
Cystic fibrosis gene protects against tuberculosis
Between 1600 and 1900, TB caused 20% of all deaths in Europe.
The Lübeck disaster (1929-1933) is a unique event in the history of tuberculosis, in which 251 newborns were accidentally infected with a virulent strain of Mycobacterium tuberculosis. The disaster happened while BCG was being introduced as an anti-TB vaccine. An exemplary multidisciplinary investigation showed that it was caused by accidental contamination of the BCG vaccine preparations with virulent M. tuberculosis, not by BCG itself.
[Figure: disease severity of the infected infants (no symptoms with positive tuberculin skin test (TST), mild, moderate or severe illness, death) by month of assessment]
There were nearly 200,000 cases of leprosy around the world at the beginning of 2012.
[Figure: world map of reported leprosy cases]
Leprosy and the adaptation of human Toll-like receptor 1
Population differentiation at TLR1 I602S: the protective, dysfunctional 602S allele is rare in Africa but expands to become the dominant allele among individuals of European descent.
This supports the hypothesis that this locus may be under selection from mycobacteria or other pathogens that are recognized by TLR1 and its co-receptors.
[Figure: frequencies of the 602S and 602I alleles across populations]
Wong SH, Gochhait S, Malhotra D, Pettersson FH, Teo YY, et al. (2010) Leprosy and the Adaptation of Human Toll-Like Receptor 1. PLoS Pathog 6(7): e1000979. doi:10.1371/journal.ppat.1000979
[Figure: Toll-like receptor structure with leucine-rich repeats (LRRs)]
TLR2 is a Toll-like receptor of the immune system: a membrane receptor, expressed on the surface of certain cells, that recognizes foreign substances and passes appropriate signals to the cells of the immune system. The TLR2 Arg753Gln polymorphism was significantly associated with pneumonia in patients with acute myeloid leukemia (AML) and with resistance to syphilis.
[Figure: adults (15-49 years of age) living with HIV]
HIV epidemic
[Figure: HIV entry: (1) Env on the viral membrane (gp120 with variable loop 3, and gp41); (2) CD4 binding; (3) coreceptor binding; (4) membrane fusion via the fusion peptide and six-helix bundle formation]
The CCR5 locus (encoding an HIV coreceptor) shows that historical epidemics have been important in shaping the genomes of humans and other primate species. It has been projected that if the HIV epidemic continues for another 100 years, it will leave a signature on the human genome at the CCR5 locus and related HIV resistance loci. Although higher HLA-C expression protects against HIV progression, it also increases the risk of the inflammatory disorder Crohn's disease, which highlights the potential health repercussions of pathogen-driven selection.
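The projection that a continuing epidemic would leave a signature at CCR5 rests on standard one-locus selection theory: if carriers of a protective allele survive better, the allele's frequency rises each generation. A minimal sketch of the textbook diploid viability-selection recurrence, p' = (p²·w_AA + pq·w_Aa) / w̄, is given below. The starting frequency, fitness values and the 4-generation horizon (roughly 100 years at an assumed ~25 years per generation) are hypothetical placeholders, not estimates from the slides.

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """One generation of viability selection at a biallelic locus.

    p is the frequency of allele A (e.g. a protective variant). Zygotes
    are assumed to form in Hardy-Weinberg proportions (random mating) and
    then survive with relative fitness w_AA, w_Aa, w_aa.
    """
    q = 1.0 - p
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def trajectory(p0, generations, w_AA, w_Aa, w_aa):
    """Allele frequency in each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], w_AA, w_Aa, w_aa))
    return freqs


# Hypothetical: recessive protection (only AA homozygotes are resistant),
# 5% survival advantage, starting frequency 10%, 4 generations.
recessive = trajectory(0.10, 4, w_AA=1.05, w_Aa=1.0, w_aa=1.0)
# Same advantage, but dominant (one copy of A is enough).
dominant = trajectory(0.10, 4, w_AA=1.05, w_Aa=1.05, w_aa=1.0)
print([round(p, 4) for p in recessive])
print([round(p, 4) for p in dominant])
```

With these placeholder values the allele frequency changes only slightly over a few generations, and a recessive advantage acts more slowly than a dominant one when the allele is rare, because most copies then sit in heterozygotes.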
The CCR5 chemokine receptor is exploited by HIV-1 to gain entry into CD4+ T cells. A deletion mutation (Δ32) confers resistance against HIV by abolishing expression of the receptor on the cell surface. The allele exists at appreciable frequencies only in Europe, and within Europe the frequency is higher in the north.
Fig. 1. A. A schematic representation of the Viking hypothesis. The red square represents a Scandinavian origin of the allele. The black arrows represent dissemination of CCR5-Δ32 by Vikings southwards toward France and the Mediterranean, eastwards toward Russia, and northwest toward Iceland. Contour lines and color represent the frequency in Europe at an intermediate stage of the allele's migration out of Scandinavia. B. The modern-day observed allele frequencies. Squares mark locations of sampled allele frequencies, and color within the squares denotes the observed frequencies. Contour lines represent interpolated allele frequencies.
The selective rise of the CCR5-Δ32 allele has been proposed to result from smallpox, caused by the poxvirus variola (Variola major). Estimates suggest that the CCR5-Δ32 deletion arose around 2,000 years ago, with a range from 375 to 4,800 years ago.
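To relate the CCR5-Δ32 age estimates above to the number of generations over which selection could have acted, the years can be divided by a generation time. The 25-year generation time below is an assumption (a common round figure), not a value from the slides; a different choice scales all results proportionally.

```python
# Assumed human generation time in years (not from the slides).
GENERATION_TIME_YEARS = 25

# CCR5-Δ32 age estimates quoted above, in years before present.
estimates = {"low": 375, "point": 2000, "high": 4800}

# Convert each age estimate into an approximate number of generations.
generations = {k: v / GENERATION_TIME_YEARS for k, v in estimates.items()}

for label, g in generations.items():
    print(f"{label}: {estimates[label]} years ≈ {g:.0f} generations")
```

Under this assumption the estimates span roughly 15 to 192 generations, which is the window a selection model for CCR5-Δ32 would have to work within.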