Human genome
1000 telephone directories

Genetics, Genomics

[Figure: Human Genome Project timeline, including the Bermuda principles for rapid and open data release, the establishment of the Chinese National Human Genome Centers (in Beijing and Shanghai), and an executive order banning genetic discrimination in U.S. federal employment]

[Figure: Cost of Sequencing the Human Genome, 2001-2013 (cost per genome in US$, falling from $95,263,072 in 2001 to roughly $5,000 by 2013)]

Completion of the genome
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004 Oct 21;431(7011):931-45.
The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. The human genome appears to encode only 20,000-25,000 protein-coding genes.
A recent study noted more than 160 euchromatic gaps, of which 50 were closed. [10] However, numerous gaps remain in the heterochromatic parts of the genome, which are much harder to sequence because of their many repeats and other intractable sequence features.

How was the genome sequenced?
What's in a genome?
What are the applications to medicine and biology?
> Genes underlying genetic diseases: positional cloning (30 genes)
> Paralogous genes (achromatopsia: CNGA3, CNGB3); (971 known genes => 286 paralogous genes)
> Drug targets: a recent compendium lists 483 targets, 18 of them newly identified (Alzheimer's disease: β-amyloid is generated by processing of APP by BACE; BACE2 lies in the obligatory Down syndrome region of chromosome 21)
> General biology: bitter taste, a new family of G-protein-coupled receptors
[Figure: Mapping the cancer genome]
What are the genetic differences between Modern Man and other species?
What other projects follow the HGP?
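The Build 35 figures above can be combined into a few rough derived quantities. The following is a minimal back-of-envelope sketch; the function name `finishing_stats` is hypothetical, and the derived numbers are estimates computed from the quoted figures, not values reported in the paper. An error rate of 1 per 100,000 bases over 2.85 billion bases implies roughly 28,500 residual errors. The 341 gaps correspond to about 8.4 Mb of finished sequence per gap, and 99% coverage implies a euchromatic genome of about 2.88 Gb.

```python
def finishing_stats(bases: float, gaps: int, error_rate: float,
                    euchromatic_coverage: float) -> dict[str, float]:
    """Back-of-envelope quantities derived from a genome-finishing summary."""
    return {
        # expected residual base errors across the finished sequence
        "expected_errors": bases * error_rate,
        # finished sequence per gap: a crude ratio, not a contig-length statistic
        "bases_per_gap": bases / gaps,
        # euchromatic genome size implied by the coverage fraction
        "implied_euchromatin": bases / euchromatic_coverage,
    }


# Build 35 figures quoted in the text.
stats = finishing_stats(bases=2.85e9, gaps=341, error_rate=1 / 100_000,
                        euchromatic_coverage=0.99)
for name, value in stats.items():
    print(f"{name}: {value:,.0f}")
```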
Genome Sequencing
Genome: 3 Gb
1. Cut the genome into large pieces and clone them into BACs (~100 kb each).
2. Order the BACs based on sequence features (markers) = mapping.
3. Cut each BAC again and sequence the fragments.
[Figure: sequencing trace with base calls, positions 150-170]
4. Assemble each BAC from overlapping reads:
TTGTAAGTGAGAACA
         AGAACAGGACGTATGTGGT
                      TGTGGTTTTCTACTCC
                               CTACTCCTGTGTT
5. Assemble the entire sequence.

What does the sequence mean?
TCACAATTTAGACATCTAGTCTTCCACTTAAGCATATTTAGATTGTTTCCAGTTTTCAGCTTTTATGACTAAATCTTCTAAAATTGTTTTTCCCTAAATGTATATTTTAATTTGTCTCAGGAGTAGAATTTCTGAGTCATAAAGCGGT CATATGTATAAATTTTAGGTGCCTCATAGCTCTTCAAATAGTCATCCCATTTTATACATCCAGGCAATATATGAGAGTTCTTGGTGCTCCACATCTTAGCTAGGATTTGATGTCAACCAGTCTCTTTAATTTAGATATTCTAGTACAT ACAAAATAATACCTCAGTGTAACCTCTGTTTGTATTTCCCTTGATTAACTGATGCTGAGCACATCTTCATGTGCTTATTGACCATTAATTAGTCTTATTTGTTAAATGTCTCAAATATTTTATACAGTTTTACATTGTGTTATTCATT TTTTAAAAAATTCATTTTAGGTTATATGTATGTGTGTGTCAAAGTGTGTGTACATCTATTTGATATATGTATGTCTATATATTCTGGATACCATCTCTGTTTCATGCATTGCATATATATTTGCCTATTTAGTGGTTTATCTTTTCAT TTTCTTTTGGTATCTTTTCATTAGAAATGTTATTTATTTTGAGTAAGTAACATTTAATATATTCTGTAACATTTAATGAATCATTTTATGTTATGTTTAGTATTAAATTTCTGAAAACATTCTATGTATTCTACTAGAATTGTCATAA TTTTATCTTTTATATACATTGATATTTTTATGTCAAATATGTAGGTATGTGATATTATGCACATGGTTTTAATTCAGTTAATTGTTCTTCCAGATGTTTGTACCATTCCAACATCATTTAAATCATTAAATGAAAAGCCTTTCCTTAC TAGCTAGCCAGCTTTGAAAATCCATTCATAGGGTTTGTGTTAATATATTTTTGTTCTTTTTTTTCCTTTCTACTGATCTCTTTATATTAATACCTACTGTGGCTTTATATGAAGTCATGGAATAATACGTAGTAAGCCCTCTAACACT GTTCTGTTACTGTTGTTATTGTTTTCTCAGGGTACTTTGAAATATTCGAGATTTTATTATTTTTTAGTAGCCTAGATTTCAAGATTGTTTTGACGATCAATTTTTGAATCAATTGTCAATATTTTTAGTAATAAAATGATGATTTTTG ATTGGAAATACATTAAATCTATAAGCCAAATTGGAGATTATTGATATATTAACAAAAATGAGTTTTCCAGTCCATGAATGTATGCACATTATAAAATTCATTCTTAAGTATGTCATTTTTTAAGTTTTAGTTTCAGCAGTATATGTTT GTTACATAGGTAAACTCCTGTCATGGGGGTTAGTTGTACAGGTTATTTTATCATCCAGGCATAAAGCCCAGTACCCAGTAGTTATCTTTTCTGCTCCTCTCCCTCCTGTCACCCTCCACTCTCAAGTAGACCCCAGTTTCTGTTGTTC
TCTTCTTTGCATTAATGACTTCTCATCATTTAGATTGCACTTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTTAGTTTGCTAAGGATAACCACCTCCATCTCCATCCATGTTCCCACAAAAGACATGATCTCCTTTT TTATGGCTGCATATTATTCCATGGTATATATGTACCACATTTTCTTTATCCAATCTGTCATTGATGGACATTTAGGTTGTTTCCACATCATTGCCGTTGTAAATACTGCTGCAGTGAATATTCGTGTGTATGTCTTTATGGTAGAATG ATTTATATTCCTCTGGGTATATTTCCAAGTAATGGGATGGTTGGGTCAAATGGTAATTCTGCTTTTAGCTTTTTGAGGAATTGCCATATTGCCTTTCACAACGGTTGAACTAATTTATACTCCCAAGAGTGTATAAGTTGTTCCTTTT TCTCTGCAACCTCGACATCACCTGTTATTTATGACTTTTATATAATAGCCATTCTGCTGGTCTGAGATGGTATCTCATTATGATTTTGATTTGCATTTCTCTAATGCTCAGTGATATTGAGCTTGGCTGCATATATGTCTTCTTTTAA AAATATCTGTTCATGTCCTTTGCCTAATTTATAACGGGGTTGTTTGTTTTTCTCTTGTAAATTTGTTTAAGTTCCTTATAGATTCTAGGTATTAAACCTTTTTTCAGAGGCGTGGCTTGCAAATATTTTCTCCCATTCTATAGGTTGT CTGTTTATTCTGTTGATAGTTTCCCTTGCTGTGCAGAAGCTCTTAACTTTAATTAGATCCGACTTGTCAATTTTTGCTTTGGTCGCAATTGCTTTTGATGTTATTGTCGTGAAATCTTTGCTAGTTCTTAGGTCCAGGATGATATTGC CCAAGTTGTCTTCCAGGGCTTTTATAATTTTGGATTTTACATTTAAGTCTTAATATATTTATTAAATTTGTTAGGGTTTCAGGATACAAGGACAATATAGCAGCAAACAATGTAAAAGTAAAATCTGAAAAATAATAGAAAACAGTTT AATTGAACACTTTACCATTATGTAATGCCCTTCTTTGTCTTTCCTGATCTTTGTTGGTTTGAAGTTCAAAAAAGACAAACTTAATGGTACAATAGGTATTGTAGATTTCAGGACTTTCTGTATAAAATATTTTGTATATATGAATAGA TCATTTTTTATTTCCAGTCTTTAAACATTTTCTTAACATTTTCTTCTATTGCTTCACTTCACTCGCTAGGACCATCAGGACAGTGTTGAACAGAAATTGTCAGACTGATCATCACAACTTTTTCTAGATTTTAGAAGGAAATTTTTCT TTATTTCAACATAAAGCAGCATGTTAATGCCAAGTTTTAATATGTGTTATCAGATTGAAATTTTTTTGTATATTTCTACATTACCAAGAATTTTTAGCAAGAGTTTTTGTTGAGTTTTAATTTAAAAATCATTTGTTAATTTCATCTG ATTTTTTTATTTCTCTTTTTACCTTAAGAGATTAAACTGACTACAGATTGAATATAAACAAACAAACAAACAAACAAAAACTCTAAAATGCTGTGGATCAACACCACTTAGTAATTTGTATACTTGGATTCAATTTGCTGAAATTTTG TTAGACATTTTTGCGTCGATATTTATGAGGGATGTTGATCTGTAAAAGTATTAAAATGCCTTTGACAGATTTTGATAGCAGTGTTATTCTGGCCTAATAAATCAAACTGAGGTATGATCCTTCCTTTTCTATTTCTTAATAGCATTTT TAAAATTGGTGGTTTTTTCCTTCCTTAGTGAAATTTACCAGCAAAGTAACAGGCCTTATATTTCTCTTGTGGAAATATTTTAATTTCAAATTAATGGTATTTTGTTCTTGTAGGGTGGTAATTTTCTCTGTGTTTGGTCTTAATGGAC 
TCTTAGCTGATCACCCAGTTACTCAGCGAGGTCTCTTCACTCTGGAAGAGCTGGAACTCCAGTGTGTTTTAGTGCAGCATGACCACGGGTATTACCGTTCAACATTTAGGCTTTATCAGTGATAACTATTTGTCCTCATGGAGTTTTT GCCGCTGGGCCTACACAGTTTAGGCTTCAGCTTAGAACACATAATGAATTCTTATGCAGATTTCTGCCCACCTTTGACCTTTCATGATTTCCTCTTCTTGGGTAAGCTGCCTTATTAATCTGATACACTTCAGCAGTCCAGAACTACA CTCTTTCCCTTCTCTGCTCTTGGAGATGACTCTTTTGTCTGAGATTCACTTTGCTGTGCTGAAAAAGAAAAGTGCTTCAAGGAAGATACCAAGGAAAATCACAGGGCTCATTTATGTATTTCTCTTCTTTCAAGGACTACAGCTTTGT GTTGCCTATGTTCAATTTCTGAAAATAATTAGAGCATATATACTCTGTGTGAGAAGGCAAATCCAGACAGTTAGTTTGTATGACTAGAAGCAGAAGTCTACATGGAGAATTTTACTTAACTGTGTTATAGTTTCTTTAATTATTTCAA GAGTATGTTTAATGTTCCACAGATCTCATTCTATAAATCTTTATCATCTTAGAGCTCTGATACTATTTAGAATTACTATTCCTTCAAATAAGAGATTAGAAACAGGGTTATATTTGGGGTAGGTTGACTTACTTTTCTGGGAACCAAA GCATATTAAATTGACCAGTTTTAACACACTTCTATGTATGCACAAAGATATATATTTACATTCTGCAAAATCATTCTTTCCTTTTTGAATTTGAAAAGGATCTTTGGTATACAGATATTCAATAGCCAGCCTGAAGATTCATTTGAAT TCATTTAATGTTTAGATTCACTACATGAAATGATCCAGAAGAGAGTACTCAAATATAAGTATCTATAACGATGGAAATATACATCTCCACTGCCCAAGATGGTAGTCATGAGTCAATATTGATCATGTGAGACGTGGCAAGTGTTACT CAGGGTCTCAATATTTAAATGTATTAAGCTTTAATTAATGTAAATTTGAATTTAGCAAAACATGTATAGCTTGTGGTTACTGTTTTATTCAGTGCCAATATAGAACATTTCCATGATTACAGAAAGTTATCTTAGAATACTCAGTTCT GGACTATTTTATCTGGCTAAATTAAATGTTAAAATATTACAAATTCATCTTCAGGCTGGCTGTTGAATATTTTTATAGCAAAAGTCATTTATAAATTTAAAACTCAAATAATTATCTTTTTCAATATGTAAAATATGTCTTTACATAT TCTACTCCCTTCTTACATACATATTCTGATGTAACATAGGTATTCTCTTATTCATGCACACTGAAATGACAACATAAATAATTTTACTAAGTGTCACCATATAAAAAACTTTGAACAAAATCAGATTATATCACTGTGGATATTTCTA TTTTGAACTAACTTAGATGATAATTTTAATCTATATCCTAGATGAACTTTAAATCAATAAAATCTCTCAATGGTGTTATAAATCTCAAGCCATTAGCCACTGATTATCCCATTTTTATTCTTTTCATATTAATTTTATTGCCATGTAT GAATGCTGTAGCATCCATGTTTAAATACTAGTTAACAAAATGCACTGGCATCAGATACAATAAGGATGAAATGAGATATAATTAGGACTCTGGTAACACACATAAAATTGGAAAGATACCCTGAAATTCAAGCCAAGAAGATATTTAT CCAGCTTATTTTATTTTGAGACAGAGTCTTGCTCTCTCACTCAGGCTGGAGTGCAGTGGACCATTCTAGGCTCGCTCCAACCTCTGTCTCCCAAATTGAAGTAATTCTCGTGCCTCAATCTCCCGAGTAGCTGGGATTACAGGCATGT 
GTCACCAAGCCTGGCTGATTTTTGTAGTTTTAGTAGAGACGGGGTTTCACCATGATGGCCAGGCTGGTCTTGAACTCCTGGCCTCAAGTGACTGGAACACCTCGGCCTCCTAAAGTGCTGGGATTACAGACGAGAGCCACTGAACAGC TTTGATCCAACTTATTTGGATGAATGAGTTACATATTTTACATTAAATCTGTTATTGTGATAATTCTTCATGTTATTTTCCATGTATAGATTTATATATAATGTAATTTTAATTTTTTTTCACCGGAGAGTATAAACAACAATTATTT TATAAACAGGATAATAAAAATAAGACAAAAATTGTTGAAATGTCTTCATTTGACTACTAACTTTTTACATGTTTGTTACTTTGAAGCTGTTATCAATACTTGTGATGTATTACAATTAAGTAAAGATTTAAAGATGCCATTTTTAACT TATTATGACACAAAGTCTATAAATTCTTATATTTTGAGATTTGTATTTAAATAACTTGTGAAATTTAATTTTAAAATAAAATTTCTTCTATGGATTGGTCTTCAATCGAGGCATAAAAAGGAATATAACAGTGTGGCACTATAACTTC TATATTGAATTTCTATATTATTTAACACAATTATAATTTTGCTAATGAATTGTAATGTTTTTAAAAAGCTAGGTGAATTTTATTAAATTCATTACATGGCGATAACACAGAGAAAACATTTTGGGGATTCTTTTAAAATGGTATGTAC AAAAGCTTAAAAGTTGTTATGTAGTGGCAGAGATAAAAAAGTAAAACAAAAAAAAGCTTAAAAGTTTGCTTTACTATTTATAGGCTCATAAGTGTAAGTGTGCCAGAAAATGAAAAAGAAAGGAGAGAAATTATAAATAACTGTGTGG AAAACACAGATAAAGCATAAAGATAGAATATAAAGATAGAAGCATTTTAATATGAGGCAGTGATGGCTTTTTGAAGAATCCCAACTAAGGACCTACTTTTAGTTAATAAATAATATGTTTCTAATCCCTATATTGTCCACAGCAACCT TTTTAGGACATGGAGCAGTGACTATGAGTGCCAGAAGGCAAGAGTAGAAGCAATTGTAAAATCATGAACACTAGTTTGTAAAATCCTCACTGAGATATAATATCTGTTTGCCTCTACCTTAGAATTATTAATGTCTTGAGGGCTGGGA
A very small piece of chromosome 21

What's in a genome?
Genes (i.e., protein-coding)
But... only <2% of the human genome encodes proteins.
Other than protein-coding genes, what is there?
• genes for noncoding RNAs (rRNA, tRNA, miRNAs, etc.)
• structural sequences (scaffold attachment regions)
• regulatory sequences
• "junk" (including transposons, retroviral insertions, etc.)
[Figure: composition of the human genome (about 41% C or G overall): LINEs, SINEs, DNA transposon fossils, retrovirus-like elements, segmental duplications, simple sequence repeats, heterochromatin, introns and protein-coding regions]
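The "Assemble each BAC" step of the sequencing workflow earlier in this section joins overlapping reads into one contiguous sequence. The following is a minimal sketch that assumes error-free reads and uses a greedy longest-overlap merge. This is a simplification: real assemblers use overlap or de Bruijn graphs and must cope with sequencing errors and repeats. The helper names `overlap` and `greedy_assemble` are hypothetical. Run on the four reads from the slide, it rebuilds a 44-base stretch that occurs in the chromosome 21 sequence shown above.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest proper suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap.

    Returns the remaining contigs; more than one means a gap was left.
    """
    contigs = [r.upper() for r in reads]
    while len(contigs) > 1:
        best_k, best_pair = 0, None
        for i, j in permutations(range(len(contigs)), 2):
            k = overlap(contigs[i], contigs[j], min_len)
            if k > best_k:
                best_k, best_pair = k, (i, j)
        if best_pair is None:
            break  # no remaining overlaps: leave separate contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)
    return contigs


reads = ["ttgtaagtgagaaca", "AGAACAGGACGTATGTGGT",
         "TGTGGTTTTCTACTCC", "CTACTCCTGTGTT"]
contigs = greedy_assemble(reads)
print(contigs)  # ['TTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTT']
```

The greedy choice (always merge the best-overlapping pair first) is the classic shortcut for the shortest-common-superstring problem. It works here because the reads are short, exact and non-repetitive. In repetitive human DNA it can join the wrong copies, which is one reason the project first mapped BACs and assembled each one separately.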
[Figure: genome fractions: genes, repeats, unique sequence]
Pseudogenes
• 70% processed pseudogenes (retrotransposed genes)
• 30% unprocessed pseudogenes (duplicated genes)
• ~20,000 in total (Torrents et al. Genome Res. 2003;13:2559-67)
[Figure: human protein-coding genes by protein class (class; number of genes; % of total)]
isomerases; 94; 0.5%
receptors; 1076; 6.3%
storage proteins; 15; 0.1%
structural proteins; 280; 1.6%
surfactants; 15; 0.1%
cell junction proteins; 67; 0.4%
chaperones; 130; 0.8%
transcription factors; 2067; 12.0%
phosphatases; 230; 1.3%
membrane traffic proteins; 321; 1.9%
transfer/carrier proteins; 248; 1.4%
hydrolases; 454; 2.6%
defense/immunity proteins; 107; 0.6%
calcium-binding proteins; 63; 0.4%
viral proteins; 7; 0.0%
unclassified; 4061; 23.6%
extracellular matrix proteins; 72; 0.4%
proteases; 476; 2.8%
cytoskeletal proteins; 441; 2.6%
transporters; 1098; 6.4%
transmembrane receptor regulatory/adaptor proteins; 84; 0.5%
transferases; 1512; 8.8%
oxidoreductases; 550; 3.2%
lyases; 104; 0.6%
cell adhesion molecules; 93; 0.5%
ligases; 260; 1.5%
nucleic acid binding; 1466; 8.5%
signaling molecules; 961; 5.6%
enzyme modulators; 857; 5.0%

What are the applications to medicine and biology?
> Genes underlying genetic diseases
> 2000: 1,300 genes for diseases with simple Mendelian inheritance
> 2010: 2,900 genes
> About 1,800 remain
> 1,100 genes associated with 165 common diseases (including infectious diseases)
> (e.g., IBD: 70-100 genes)
> Paralogous genes (achromatopsia: CNGA3, CNGB3); (971 known genes => 286 paralogous genes)
> Drug targets: a recent compendium lists 483 targets, 18 of them newly identified (Alzheimer's disease: β-amyloid is generated by processing of APP by BACE; BACE2 lies in the obligatory Down syndrome region of chromosome 21)

Mutation rate in humans
1. Error rate in DNA replication (1.0 × 10^-10 per bp). Given that the human genome is 3.2 × 10^9 bp, there are about 30 cell generations between zygote and egg cells and about 400 cell divisions between zygote and mature sperm.
Thus, in males, the sperm cells have about 128 new mutations and the haploid egg genome has about 10 new mutations, for a total of 138 new mutations in every new zygote.
2. Phylogenetic method (humans vs. chimpanzees): 112-160 new mutations in every new generation.
3. Direct method (human families): 56-103 new mutations in every new generation, but only 60-80% of the genome sequence is reliable.
More than 100 new mutations in every new generation.

Number of mutations throughout humanity per generation
10-100 mutations per person per generation × 7 × 10^9 people ≈ 10^11-10^12 mutations per generation globally.
10^11-10^12 mutations per generation spread over 3 × 10^9 bp of the human genome ≈ tens to hundreds of people with a de novo appearance of any specific mutation.
(3 × 10^9)^2 possible 2-bp combinations / 10^12 mutations per generation > 10^6 generations for humanity to randomly reach a specific 2-base-pair mutation.
We find that each single-base-pair mutation is explored dozens of times in every generation, but that a specific combination of two base pairs would require an unrealistic number of generations to occur at random.

McLean et al., 2011, Nature
A search for complete deletions of sequences otherwise highly conserved between chimpanzees and other mammals identified 510 such deletions in humans. They fall almost exclusively in non-coding regions and are enriched near genes involved in steroid hormone signalling and neural function. One deletion removes a sensory vibrissae and penile spine enhancer from the human androgen receptor (AR) gene, a molecular change correlated with the anatomical loss of androgen-dependent sensory vibrissae and penile spines in the human lineage. Another deletion removes a forebrain subventricular zone enhancer near the tumour suppressor gene growth arrest and DNA-damage-inducible, gamma (GADD45G), a loss correlated with expansion of specific brain regions in humans.

Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome.
Nature 2005 Sep 1;437(7055):69-87.
Thirty-five million single-nucleotide changes, five million insertion/deletion events, and various chromosomal rearrangements.
98.6% identity to the human genome sequence
Differences in gene/exon structures

Phylogenetic trees
[Figure: phylogenetic trees of savanna elephant (L. africana), forest elephant (Loxodonta cyclotis), mammoth (M. primigenius), Indian elephant (E. maximus), Neanderthal (Homo neanderthalensis), human (Homo sapiens), chimpanzee (Pan troglodytes), bonobo (Pan paniscus) and gorilla (Gorilla gorilla)]

Apparent differences between humans and great apes in the incidence or severity of medically important conditions (excluding differences explained by obvious anatomical differences).
Medical condition | Humans | Great apes
Definite
HIV progression to AIDS | Common | Very rare
Influenza A symptomatology | Moderate to severe | Mild
Hepatitis B/C late complications | Moderate to severe | Mild
P. falciparum malaria | Susceptible | Resistant
Menopause | Universal | Rare
Likely
E. coli K99 gastroenteritis | Resistant | Sensitive?
Alzheimer's disease pathology | Complete | Incomplete
Coronary atherosclerosis | Common | Uncommon
Epithelial cancers | Common | Rare

[EXPERIMENT] SCANNING THE GENOME
To find the parts of our genome that make us human, the author wrote a computer program that searched for the DNA sequences that have changed the most since humans and chimpanzees diverged from their last common ancestor. Topping the list was a 118-letter snippet of code known as human accelerated region 1 (HAR1). This region of the genome changed very little for most of vertebrate evolution, with chimp and chicken sequences differing by just two letters. Human and chimp HAR1s, however, differ by 18 letters, suggesting that HAR1 acquired an important new function in humans.
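The scan described above ranks regions by how much they changed on the human lineage relative to how conserved they are elsewhere. The following is a minimal sketch of that idea. It assumes the human, chimp and chicken sequences are already aligned without gaps. It uses a crude hypothetical score (human-chimp differences minus chimp-chicken differences) in place of a proper statistical test. The function names and toy sequences are illustrative only. With the HAR1 counts quoted in the text, the score would be 18 - 2 = 16.

```python
def substitutions(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_human_accelerated(
    regions: dict[str, tuple[str, str, str]],
) -> list[tuple[str, int, int]]:
    """Rank regions by change on the human lineage relative to conservation.

    `regions` maps a region name to aligned (human, chimp, chicken) sequences.
    Hypothetical score: (human vs chimp differences) - (chimp vs chicken
    differences). A real scan would use a statistical test and handle
    alignment gaps and local mutation-rate variation.
    """
    scored = []
    for name, (human, chimp, chicken) in regions.items():
        human_changes = substitutions(human, chimp)
        background = substitutions(chimp, chicken)
        scored.append((name, human_changes, background))
    # Highest score first; ties broken by region name for a stable order.
    return sorted(scored, key=lambda r: (r[2] - r[1], r[0]))


# Hypothetical toy alignments (10 bp), not real HAR1 sequence.
toy_regions = {
    "region_A": ("TCGAACGTAG", "ACGTACGTAC", "ACGTACGTAC"),
    "region_B": ("GGGGCCCCAT", "GGGGCCCCAT", "GGGGCCCCAA"),
}
ranking = rank_human_accelerated(toy_regions)
print(ranking)  # [('region_A', 3, 0), ('region_B', 0, 1)]
```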
[Figure: HAR1 sequences of human, chimp and chicken placed on a tree (common ancestor of humans and chimps; common ancestor of humans and chickens), highlighting changes in the human sequence relative to that of the chimp and changes in the chimp sequence relative to that of the chicken]

[FINDINGS] DISTINCTIVE DNA
Efforts to uncover uniquely human DNA have yielded a number of sequences that are distinctive in humans as compared with chimpanzees. A partial list of these sequences and some of their functions follows below.
sequence: HAR1. What it does: Active in the brain; may be necessary for development of the cerebral cortex, which is especially large in humans. Possibly also involved in sperm production.
sequence: FOXP2. What it does: Facilitates formation of words by the mouth, enabling modern human speech.
sequence: ASPM. What it does: Controls brain size, which has more than tripled over the course of human evolution.
sequence: AMY1. What it does: Facilitates digestion of starch, which may have enabled early humans to exploit novel foods.
sequence: LCT. What it does: Permits digestion of milk sugar in adulthood, allowing people to make milk from domesticated animals a dietary staple.
sequence: HAR2. What it does: Drives gene activity in the wrist and thumb during development, activity that may have given the hand enough dexterity to make and use complex tools.
BRAIN SHAPERS: Changes to certain genome sequences can have dramatic effects on the brain.
Mutation of the ASPM gene, for example, leads to markedly reduced brain size (middle) compared with a normal brain (top), suggesting that this gene played a key role in the evolution of large brain size in humans. Malfunctions in the neurons in which HAR1 is active during development, meanwhile, can lead to a severe disorder in which the brain's cerebral cortex fails to fold properly (bottom), hinting that HAR1 is essential for the formation of a healthy cortex.

Evolution of FOXP2

Genetic differences between Modern Man (Homo sapiens) and other species

Neanderthal Man (Homo neanderthalensis)
• an extinct member of the Homo genus
• appeared in Europe 500,000 years ago; lived in Europe and West Asia
• became extinct about 30,000 years ago
• interbreeding with modern humans took place roughly 80,000 to 50,000 years ago in the Middle East

Neanderthal Man DNA
• Bone found in 1980 in Vindija Cave, Croatia
• Radioisotope dating: 38,310 ± 2,130 years
http://www.nature.com/nature/journal/v444/n7117/pdf/444254a.pdf
Genetic identity: 99.5%
1-4% of the genome of people from Eurasia was contributed by Neanderthals.
[Figure: taxonomic classification of sequence reads: Pseudomonadales (1,470; 0.6%), Burkholderiales (1,912; 0.8%), … (8,408), Primates (15,701; 6.2%), Actinomycetales (17,213; 6.8%), Rhizobiales (1,230; 0.5%), Enterobacteriales (788; 0.3%), Poales (429; 0.2%), Rhodocyclales (394; 0.2%), all other orders (6,559; 2.6%), no hit (200,829; 79.0%)]

Homo floresiensis, the "Hobbit"
[Figure: map of Southeast Asia and Indonesia]
…000 - 13,000 years

The Denisovan is an extinct species or subspecies of human in the genus Homo. Pending its status as either species or subspecies, it currently carries the temporary names Homo sp. Altai and Homo sapiens ssp. Denisova. In March 2010, scientists announced the discovery of a finger bone fragment of a juvenile female who lived about 41,000 years ago, found in the remote Denisova Cave in the Altai Mountains in Siberia. Analysis of the mitochondrial DNA (mtDNA) of the finger bone showed it to be genetically distinct from the mtDNAs of Neanderthals and modern humans. Genomic evidence suggests that Denisovans shared a common origin with Neanderthals, that they ranged from Siberia to Southeast Asia, and that they lived among and interbred with the ancestors of some modern humans. About 3% to 5% of the DNA of Melanesians and Aboriginal Australians derives from Denisovans.

African Cradle
Geneticists agree that modern humans arose some 200,000 years ago in Africa; fossils were found in Omo Kibish, Ethiopia.
Sites in Israel hold the earliest remains of modern humans outside Africa, but that group went no farther, dying out about 90,000 years ago.

Genetic diversity of Modern Man
[Figure: (a) mtDNA HVS I tree; unrooted and pruned]

[Figure (Nature Reviews | Genetics): timeline of recent human evolution: modern humans emerge in Africa (~200,000 years ago); migrations within Africa and out of Africa (~100,000-50,000 years ago); early agriculture, the Neolithic demographic transition (12,000 years ago); the Silk Road links Africa, Europe and Asia (2,200 years ago); European colonization of the Americas begins (500 years ago); globalization (today); shown alongside the emergence of malaria, tuberculosis, smallpox, leprosy, cholera and AIDS]
Key events in recent human evolution (boxes outlined in black) are juxtaposed with the estimated ages of infectious disease emergence (boxes outlined in red). The fragmentation of the human lineage into genetically and geographically distinct populations (blue lines) accelerates with migration out of Africa. Later, these populations started mixing more (blue shaded regions between the populations) along trade routes (such as the Silk Road), through colonization and, today, through high rates of global travel.

Common Erythrocyte Variants That Affect Resistance to Malaria
Protein | Function | Reported genetic associations with malaria
Duffy antigen | Chemokine receptor | FY*0 allele completely protects against P. vivax infection.
G6PD (glucose-6-phosphate dehydrogenase) | Enzyme that protects against oxidative stress | G6PD deficiency protects against severe malaria.
GYPA (glycophorin A) | Sialoglycoprotein | GYPA-deficient erythrocytes are resistant to invasion by P. falciparum.
GYPB (glycophorin B) | Sialoglycoprotein | GYPB-deficient erythrocytes are resistant to invasion by P. falciparum.
GYPC (glycophorin C) | Sialoglycoprotein | GYPC-deficient erythrocytes are resistant to invasion by P. falciparum.
α-Globin | Component of hemoglobin | α+ thalassemia protects against severe malaria but appears to enhance mild malaria episodes in some environments.
β-Globin | Component of hemoglobin | HbS and HbC alleles protect against severe malaria. HbE allele reduces parasite invasion.
Haptoglobin | Hemoglobin-binding protein present in plasma (not erythrocyte) | Haptoglobin 1-1 genotype is associated with susceptibility to severe malaria in Sudan and Ghana.
SLC4A1 (CD233, erythrocyte band 3 protein) | Chloride/bicarbonate exchanger | Deletion causes ovalocytosis but protects against cerebral malaria.

Cystic fibrosis
1 in 3,000 children are born with CF, and 2% of people carry one mutant gene.
[Figure: functional vs. non-functional CFTR protein in cystic fibrosis]

ENCODE (Encyclopedia of DNA Elements)
The genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts.
For the last decade, geneticists have run a seemingly endless stream of genome-wide association studies (GWAS), attempting to understand the genetic basis of disease. These studies have produced a long list of SNPs (variants at specific DNA letters) that correlate with the risk of different conditions. The ENCODE team mapped all of these to their data. They found that only a minority of the SNPs lie within protein-coding areas. They also showed that, compared with random SNPs, the disease-associated ones are 60 percent more likely to lie within functional, non-coding regions. This suggests that many of these variants control the activity of different genes, and it provides many fresh leads for understanding how they affect our risk of disease.

HapMap Project
> The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors.
> DNA samples from 269 people from Africa, Japan, China and the USA
> 10 million nucleotide sites where people most frequently differ: SNPs (99.9%)
> SNPs are present in haplotypes. A haplotype is a set of single-nucleotide polymorphisms (SNPs) on a single chromosome (one member of a chromosome pair) that are statistically associated (300,000-600,000 haplotypes).

Within the body of a healthy adult, microbial cells are estimated to outnumber human cells by a factor of ten. These communities, however, remain largely unstudied, leaving their influence upon human development, physiology, immunity, and nutrition almost entirely unknown.
The Human Microbiome: Our Second Genome
The human microbiome is a source of genetic diversity, a modifier of disease, an essential component of immunity, and a functional entity that influences metabolism and modulates drug interactions.
Human Microbiome pilot projects: disease/symptoms
Skin: psoriasis, acne vulgaris, atopic dermatitis
GIT: obesity, Crohn's disease, esophageal cancer, necrotizing enterocolitis, ulcerative colitis, irritable bowel syndrome
Urogenital tract: bacterial vaginosis, STD
Systemic: immunodeficiencies, febrile illness

HIV epidemic
The CCR5 locus shows that historical epidemics have been important in shaping the genomes of humans and other primate species. It has been projected that if the HIV epidemic continues for another 100 years, it will leave a signature on the human genome at the CCR5 locus and related HIV resistance loci. Although higher HLA-C expression protects against HIV progression, it also increases the risk of the inflammatory disorder Crohn's disease, which highlights the potential for health repercussions of pathogen-driven selection.
The human leukocyte antigen (HLA) system is the locus of genes that encode proteins on the surface of cells that are responsible for regulation of the immune system in humans.
The HLA genes are the human versions of the major histocompatibility complex (MHC) genes that are found in most vertebrates. HLAs corresponding to MHC class I (A, B, and C) present peptides from inside the cell. HLAs corresponding to MHC class II (DP, DM, DOA, DOB, DQ, and DR) present antigens from outside of the cell to T lymphocytes. HLAs corresponding to MHC class III encode components of the complement system.
HLAs are important in disease defense. They are the major cause of organ transplant rejection. They may protect against cancers, or fail to protect against them if down-regulated by an infection. Mutations in HLA may be linked to autoimmune disease (examples: type I diabetes, coeliac disease). HLA may also be related to people's perception of the odor of other people, and may be involved in mate selection: at least one study found a lower-than-expected rate of HLA similarity between spouses in an isolated community.
The MHC represents the most polymorphic gene cluster in humans, and more than 2,700 alleles have been described for the most variable gene, HLA-B. Increased diversity at HLA class I genes (compared with the genome average) is observed in populations living in geographic regions where pathogen diversity is also high.
[Figure (Nature Reviews Immunology): diverse distribution of HLA-B alleles worldwide]

Inflammatory bowel disease
Caused by autoimmune attacks on the gastrointestinal system. 163 distinct loci have been significantly associated with IBD. Seven of the eight leprosy susceptibility loci are also associated with increased IBD risk.

Coeliac disease
Coeliac disease is a strongly heritable (~80%) inflammatory intestinal disorder triggered by gluten consumption. It occurs at 1-2% in Europe and up to 6% in the North African Sahrawi. Individuals who are homozygous for the coeliac risk allele (~22% of the European population) have stronger activation of the NOD2 pathway and a 3-5-fold higher pro-inflammatory cytokine response to lipopolysaccharide.
Better protection against bacterial infection may have conferred a selective advantage that outweighed the increased risk of coeliac disease.

Non-autoimmune disease: kidney disease
African Americans suffer from kidney disease, including focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD), at higher rates than European Americans. Two independent coding variants in APOL1 are strongly associated with FSGS (odds ratio = 10.5) and H-ESKD (odds ratio = 7.3). In vitro assays showed that the kidney disease-associated variants lyse T. brucei rhodesiense, the trypanosome parasite that causes the most acute, virulent form of sleeping sickness.

[Figure: diploid chromosome numbers: common ancestor (48), human (46), Neanderthal (46?), Denisovan (46), gorilla (48), bonobo (48), chimpanzee (48); chromosome fusion leading to human chromosome 2]

The Wallace Line, or Wallace's Line, is a faunal boundary line drawn in 1859 by the British naturalist Alfred Russel Wallace that separates the ecozones of Asia and Wallacea, a transitional zone between Asia and Australia.
Nature Reviews | Genetics
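The APOL1 figures in the kidney disease paragraph are odds ratios. The following is a minimal sketch of how an odds ratio is computed from a 2×2 case-control table, as a reminder of what such a number means. The counts are hypothetical, chosen only so that the result matches the magnitude reported for FSGS. They are not data from the APOL1 studies, and `odds_ratio` is an illustrative helper name.

```python
def odds_ratio(cases_exposed: int, controls_exposed: int,
               cases_unexposed: int, controls_unexposed: int) -> float:
    """Odds ratio from a 2x2 case-control table.

    Odds of disease among risk-genotype carriers (a / b) divided by the odds
    among non-carriers (c / d), which simplifies to (a * d) / (b * c).
    """
    if controls_exposed == 0 or cases_unexposed == 0:
        raise ValueError("a zero cell makes the odds ratio undefined")
    return (cases_exposed * controls_unexposed) / (controls_exposed * cases_unexposed)


# Hypothetical counts (NOT from the APOL1 studies): carriers and non-carriers
# of a risk genotype among disease cases and among healthy controls.
or_fsgs = odds_ratio(cases_exposed=70, controls_exposed=20,
                     cases_unexposed=30, controls_unexposed=90)
print(f"odds ratio = {or_fsgs:.1f}")  # odds ratio = 10.5
```

An odds ratio of 1 means the genotype is equally common among cases and controls. Values far above 1, like those reported for APOL1, mean that carriers have much higher odds of disease.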