CG920 Genomics
Lesson 2: Genes Identification

Jan Hejátko
Functional Genomics and Proteomics of Plants, Mendel Centre for Plant Genomics and Proteomics, Central European Institute of Technology (CEITEC), Masaryk University, Brno
hejatko@sci.muni.cz, www.ceitec.muni.cz

Investments in Education Development. This presentation is co-financed by the European Social Fund and the state budget of the Czech Republic (European Union; Ministry of Education, Youth and Sports; Operational Programme Education for Competitiveness).

Literature
Literature sources for Chapter 02:
■ Grotewold, E. (ed.) (2003) Plant Functional Genomics. Humana Press, Totowa, New Jersey.
■ Majoros, W.H., Pertea, M., Antonescu, C. and Salzberg, S.L. (2003) GlimmerM, Exonomy, and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Research, 31(13).
■ Singh, G. and Lykke-Andersen, J. (2003) New insights into the formation of active nonsense-mediated decay complexes. Trends in Biochemical Sciences, 28, 464.
■ Wang, L. and Wessler, S.R. (1998) Inefficient reinitiation is responsible for upstream open reading frame-mediated translational repression of the maize R gene. Plant Cell, 10, 1733.
■ de Souza et al. (1998) Toward a resolution of the introns early/late debate: only phase zero introns are correlated with the structure of ancient proteins. PNAS, 95, 5094.
■ Feuillet and Keller (2002) Comparative genomics in the grass family: molecular characterization of grass genome structure and evolution. Ann Bot, 89, 3-10.
■ Frobius, A.C., Matus, D.Q. and Seaver, E.C. (2008) Genomic organization and expression demonstrate spatial and temporal Hox gene colinearity in the lophotrochozoan Capitella sp. I. PLoS One, 3, e4004.

Outline
■ Forward and Reverse Genetics Approaches
□ Differences between the approaches used for identification of genes and their function
■ Identification of Genes Ab Initio
□ Structure of genes and searching for them
□ Genomic colinearity and genomic homology
■ Experimental Genes Identification
□ Constructing gene-enriched libraries using methylation filtration technology
□ EST libraries

Forward and Reverse Genetics

Forward vs. Reverse Genetics
Revolution in understanding the term "gene":
■ "classical" (forward) genetics approaches: from a mutant phenotype to the gene responsible for it
■ "reverse genetics" approaches: from a known gene sequence to its function

Identification of the Role of the ARR21 Gene
■ Hypothetical signal transducer in the two-component system of Arabidopsis
■ Mutant identified by searching databases of insertional mutants (SINS, sequenced insertion sites) using BLAST
■ Expression of ARR21 in the wild type, and inhibition of its expression in the insertional mutant, confirmed at the RNA level
■ Phenotype analysis of the insertional mutant

[Figure: recent model of cytokinin (CK) signaling via the multistep phosphorelay (MSP) pathway: AHK sensor histidine kinases (ahk2, ahk3, cre1/ahk4/wol) in the plasma membrane, HPt proteins (ahp1-6), and response regulators (arr1-24) in the nucleus, acting through regulation of transcription and interaction with effector proteins]

Isolation of the insertional mutant: BLAST search of the SINS database, hit of insert SINS 01_09_64 in ARR21:

Query: 80     tcctagcgttcatgagcgtaccatacttgacaanagagaacgtagccagccatttacagg 139
Sbjct: 58319  tcctagcgttcatgagcgtaccatacttgacaagagagaacgtagccagccatttacagg 58378  (Arr21: 1830)
Query: 140    tttgatatctcttgtcaaaaatgtttttggattttactgt 179
Sbjct: 58379  tttgatatctcttgtcaaaaatgtttttggattttactgt 58418  (Arr21: 1890)

Localization of the dSpm insertion in the ARR21 genomic sequence by sequencing of PCR products
[Figure: ARR21 gene map with the start codon (ATG), primers 16k and 16d, and the dSpm insertion site; labels 1727 bp and 1728 bp]

Analysis of expression
[Figure: RT-PCR of ARR21 in the wild type and in the insertional mutant. ACTIN2 (20 and 25 cycles; primers aktU1-aktL1) serves as a control; ARR21 was amplified for 40 cycles with primer pairs 2UI-2LII, 1UII-1LI and 2UI-dsLb; water and genomic DNA are included as controls]

Phenotype analysis of the mutant
Analysis of sensitivity to plant growth regulators and light:
■ 2,4-D and kinetin
■ ethylene
■ light of various wavelengths
[Figure: dose response to kinetin, 3 to 1000 µg/l]
No alterations were found, neither in flowering nor in the number of seeds.

Possible reasons for the absence of a phenotype
■ Functional redundancy within the gene family?
■ Phenotype only under specific conditions
[Figure: homology of Arabidopsis ARR genes; legend: □ ARR-A, ■ ARR-B, • at least one EST found]

Summary
■ The ARR21 gene was identified by comparative analysis of the Arabidopsis genome
■ Its function was predicted from sequence analysis
■ Site-specific expression of ARR21 was demonstrated at the RNA level
■ Identifying the function of ARR21 in Arabidopsis development by insertional mutagenesis was not successful, probably because of functional redundancy within the gene family

Identification of Genes Ab Initio
■ Structure of genes and searching for them

RNA Splicing
[Figure: splicing of an intron between the 5' splice site and the 3' splice site; conserved regions marked]

Identification of Genes Ab Initio
■ Omitting the 5' and 3' UTRs (the search targets the coding sequence)
■ Identification of the translation start (ATG) and stop codons (TAG, TAA, TGA)
■ Finding donor (typically GT) and acceptor (typically AG) splice sites
■ Using various statistical models (e.g. the Hidden Markov Model, HMM; see recommended literature, Majoros et al., 2003) to evaluate and score the identified donor and acceptor sites

Splicing Site Prediction
Programs for splice site prediction (specificity approximately 35 %):
□ GeneSplicer (http://www.tigr.org/tdb/GeneSplicer/gene_spl.html)
□ SplicePredictor (http://deepc2.psi.iastate.edu/cgi-bin/sp.cgi)

SplicePredictor (BCB, Iowa State University): a method to identify potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models (an older method using logit-linear models is also available). Sequences should be in the one-letter nucleotide code, upper or lower case; all other characters are ignored during input.
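A minimal Python sketch of the first ab initio steps listed above: it scans the forward strand for ORFs (ATG up to the first in-frame TAA, TAG or TGA) and lists every GT and AG dinucleotide as a candidate donor or acceptor. The function names and the donor consensus AG|GTAAGT used for the toy score are assumptions for illustration, not taken from any of the programs named here; real predictors replace that crude score with statistical models (HMMs, the Bayesian models of SplicePredictor, or neural networks).

```python
# Toy sketch of two early ab initio steps: ORF scanning and candidate splice sites.
# Function names and the donor consensus below are illustrative assumptions.

STOP_CODONS = {"TAA", "TAG", "TGA"}
DONOR_CONSENSUS = "AGGTAAGT"  # assumed exon AG | intron GTAAGT, positions -2..+6


def find_orfs(seq, min_codons=1):
    """Return sorted 0-based, half-open (start, end) ORFs on the forward strand.

    Each ORF runs from an ATG to the end of the first in-frame stop codon;
    ATGs nested inside an ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the sequence ends
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)


def candidate_donors(seq):
    """Index of the G of every GT, with the number of matches to DONOR_CONSENSUS."""
    seq = seq.upper()
    hits = []
    for i in range(2, len(seq) - 5):  # keeps the -2..+6 window inside seq
        if seq[i:i + 2] == "GT":
            window = seq[i - 2:i + 6]
            score = sum(a == b for a, b in zip(window, DONOR_CONSENSUS))
            hits.append((i, score))
    return hits


def candidate_acceptors(seq):
    """Index of the first exon nucleotide after every AG (intron ...AG | exon)."""
    seq = seq.upper()
    return [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
```

For example, find_orfs("CCATGAAATAGCC") returns [(2, 11)], the ORF ATG AAA TAG in the third frame, and candidate_donors("CAGGTAAGTCC") returns [(3, 8)], a GT whose window matches the assumed consensus at all eight positions.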
SplicePredictor also accepts multiple sequences in FASTA format (sequences separated by identifier lines of the form ">SQ;name_of_sequence comments") or in GenBank format. The genomic DNA sequence can be pasted into the form, uploaded as a file, or retrieved by its GenBank accession number.

SplicePredictor output (program version of February 13, 2005) for the 9490-bp genomic sequence, analysed from position 1 to 9490 with these settings: species model Homo sapiens; 2-class Bayesian model; prediction cutoff 2 ln(BF) = 3.00; local pruning on; non-canonical sites not scored.
The output lists potential donor (D) and acceptor (A) sites with their position, flanking sequence and probability, for example:
■ donors: 5384 ttgGTaaga (0.941), 5745 gcgGTaaga (0.991)
■ acceptors: 5004 ttttttttttgccAGag (0.996), 6135 ggtctattattatAGgt (0.999)
The sequence is also displayed with restriction sites and the conceptual translation of open reading frames, including a short upstream ORF (uORF) and the start of the coding region (mmvkvtklvasrf).

Splicing Site Prediction (continued)
□ NetGene2 (http://www.cbs.dtu.dk/services/NetGene2/)

NetGene2 Server (Center for Biological Sequence Analysis, CBS): produces neural network predictions of splice sites in human, C. elegans and A. thaliana DNA. A single sequence is submitted either as a FASTA file or by pasting it into the form, after choosing the organism (here A. thaliana). Submitted sequences are kept confidential and erased immediately after processing.

NetGene2 v. 2.4 prediction. Length: 9490 nucleotides.
Composition: 31.8% A, 17.0% C, 19.6% G, 31.7% T, 0.0% X; 36.5% G+C.
Donor splice sites, direct strand (position; confidence where legible): 1704 (0.87), 1906 (0.99), 3582 (1.00), 3765 (1.00), 4134 (0.74), 4619 (0.74), 4915, 5356 (0.87), 5384 (1.00), 5809, 6057 (1.00), 6096 (0.74), 7369 (1.00), 7886 (0.74), 9323
Acceptor splice sites, direct strand: 1213, 1221 (0.87), 1373, 1487 (0.81), 4254 (1.00), 4832 (0.54), 5004, 5472 (0.96), 6135 (1.00), 6490, 6744 (0.59), 7447 (0.96), 7780 (0.76), 7786

RNA Splicing and Adaptation
■ Flexibility in splice-site recognition in plants in practice: an example of the developmental plasticity of (not only) plants
■ Identification of a mutant (pis1) with a point mutation (G→A transition) exactly at the splice site at the 5' end of the 4th exon, i.e. at the acceptor site of intron 3
[Figure: genomic sequence of the PDR gene around exons 3 and 4 in the wild type and in pis1, with restriction sites, the region deleted in pis1, and the exon 4 reading frames (wild type: LTLLLGPPSCGKTTLLKALSGNLENNLK; pis1: CGKTTLLKALSGNLENNLK)]
■ RT-PCR analysis showed a fragment shorter than expected for the normally spliced cDNA
[Figure: RT-PCR products in the wild type and pis1 with primer pairs PDR U1a/PDR L1 and PDR U1b/PDR L1b; size markers 100-500 bp]
■ Sequencing of this fragment then suggested alternative splicing at the closest possible splice site in exon 4
■ Similar defense mechanisms have been demonstrated in other organisms as well, e.g. nonsense-mediated decay in eukaryotes: instability of mutant mRNAs carrying a premature stop codon located more than 50-55 nt upstream of the last exon-exon junction (see recommended literature, Singh and Lykke-Andersen, 2003)

Identification of Genes Ab Initio
■ Programs for exon prediction
□ 4 types of exons (according to their location in the gene): initial, internal, terminal, single
□ The programs predict splice sites and also take into account the structure of the given exon type
• initial:
□ GENSCAN (http://genes.mit.edu/GENSCAN.html)
□ GeneMark.hmm (http://opal.biology.gatech.edu/GeneMark/)
• internal:
□ MZEF (http://rulai.cshl.org/tools/genefinder/)

GENSCAN: the New GENSCAN Web Server at MIT, for identification of complete gene structures in genomic DNA. The server provides access to the program GENSCAN for predicting the locations and exon-intron structures of genes in genomic sequences from a variety of organisms. It accepts sequences up to 1 million base pairs (1 Mbp) in length; for a large number of sequences, or if the web server causes problems, a local copy of the program or the GENSCAN email server can be used.
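The pis1 case and the mRNA-instability rule can be sketched as follows, assuming the simplest model in which splicing moves to the next AG downstream of the destroyed acceptor. The function names, the toy sequence and the 50-nt default threshold are illustrative assumptions; the commonly cited rule measures the distance from the premature stop codon to the last exon-exon junction, and the real fate of a transcript depends on further factors.

```python
# Sketch: consequences of destroying an intron-terminal AG (as in pis1).
# Assumes splicing moves to the next AG downstream; names are hypothetical.


def shifted_acceptor(pre_mrna, exon_start):
    """Return the 0-based index of the first exon nucleotide used after splicing.

    exon_start is the wild-type exon start; the search for AG begins at the
    original acceptor dinucleotide (exon_start - 2), so an intact AG there is
    used as normal.
    """
    k = pre_mrna.upper().find("AG", exon_start - 2)
    return None if k == -1 else k + 2


def frame_effect(lost_nt):
    """Losing a multiple of three nucleotides keeps the reading frame."""
    return "in frame" if lost_nt % 3 == 0 else "frameshift"


def triggers_nmd(exon_lengths, stop_end, threshold=50):
    """EJC-type rule: is the stop codon more than `threshold` nt upstream of the
    last exon-exon junction?

    exon_lengths: exon lengths of the spliced mRNA in 5'->3' order.
    stop_end: 1-based mRNA position of the last nucleotide of the stop codon.
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, the rule does not apply
    last_junction = sum(exon_lengths[:-1])  # last nucleotide before the final junction
    return last_junction - stop_end > threshold


wild_type = "TTTGCAG" + "CTTAACAGGTT"  # toy intron end + exon start (index 7)
mutant = "TTTGCAA" + "CTTAACAGGTT"     # G->A transition destroys the AG

new_start = shifted_acceptor(mutant, 7)
lost = new_start - 7
print(shifted_acceptor(wild_type, 7), new_start, lost, frame_effect(lost))  # 7 15 8 frameshift
print(triggers_nmd([100, 80, 120], stop_end=100))  # True: 80 nt upstream
print(triggers_nmd([100, 80, 120], stop_end=150))  # False: 30 nt upstream
```

In the toy sequence the mutation turns ...GCAG into ...GCAA, the next AG lies 6 nt inside the exon, and 8 exon nucleotides are lost, which shifts the reading frame.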
Input: organism, optional suboptimal exon cutoff, optional sequence name and print options; the DNA sequence (one-letter code, upper or lower case, spaces and numbers ignored) is uploaded as a file or pasted into the form, and the results can optionally be sent by email.

GENSCAN output for sequence CKI1 (GENSCAN 1.0): 9490 bp, 36.53% C+G, isochore 1 (0-43% C+G), parameter matrix Arabidopsis.smat

Predicted genes/exons (P = exon probability):
Gn.Ex  Type  S  Begin    End   Len      P
 1.00  Prom  +   1497   1536    40
 1.01  Init  +   3708   3764    57  0.499
 1.02  Intr  +   3894   4133   240  0.713
 1.03  Intr  +   4255   4914   660  0.771
 1.04  Intr  +   5005   5383   379  0.772
 1.05  Intr  +   5473   6056   584  0.722
 1.06  Intr  +   6136   7368  1233  0.977
 1.07  Term  +   7448   7660   213  0.999
 1.08  PlyA  +   7910   7915     6
 2.03  PlyA  -   7976   7971     6
 2.02  Term  -   8793   8050   744  0.997
 2.01  Init  -   9253   8936   318  0.999

Suboptimal exons with probability > 0.100:
Exnum  Type  S  Begin    End   Len      P
S.001  Init  +   1867   1905    39  0.298
S.002  Init  +   2374   2442    69  0.132
S.003  Intr  +   3894   4110   217  0.177
S.004  Intr  +   4352   4914   563  0.187
S.005  Intr  +   5005   5379   375  0.212
S.006  Intr  +   5442   6056   615  0.208

[Figure: GENSCAN map of the predicted genes along the sequence; key: initial exon, internal exon, terminal exon, single-exon gene, optimal exon, suboptimal exon]

Regulation of Translation
• Splicing in untranslated regions: an important regulatory part of genes
• Translational repression by short ORFs in the 5' UTR, identified e.g. in maize (Wang and Wessler, 1998; see recommended literature for additional information)
• In the case of CKI1, an attempt was made to prove this mechanism of regulation using transgenic lines carrying uidA under the control of two versions of the promoter (unconfirmed so far)
[Figure: the CKI1 uORF ATGaaaagagcttttTAG (m k r a f, stop) followed by the main ORF ATGatggtgaaagttaca... (m m v k v t ...)]
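A sketch of how a short upstream ORF like the CKI1 example (ATGaaaagagcttttTAG, m k r a f) can be found in a 5' UTR. It assumes the UTR sequence is known and contains only A, C, G and T, and it reports only uORFs that terminate inside the UTR; uORFs overlapping the main start codon would need separate handling. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(cds):
    """Translate the complete codons of cds with the standard code (A/C/G/T only)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def find_uorfs(utr5):
    """Return (start, end, peptide) for each ATG in the 5' UTR whose ORF ends
    with an in-frame stop codon inside the UTR (0-based, half-open; stop included)."""
    seq = utr5.upper()
    uorfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append((i, j + 3, translate(seq[i:j])))
                break
    return uorfs


print(find_uorfs("ATGAAAAGAGCTTTTTAG"))  # [(0, 18, 'MKRAF')]
print(translate("ATGATGGTGAAAGTTACA"))   # MMVKVT
```

The two printed results reproduce the translations shown on the slide: the uORF peptide MKRAF and the start of the main ORF, MMVKVT.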
Regulation of Translation (continued)
• Functional purpose of splicing in untranslated regions: an important regulatory part of genes
[Figure: the two uidA reporter constructs. In the first, uidA is joined via a BamHI site to the uORF start codon (...GATAATATGAGGATCC..., encoding (M)RIPRVGQSLMLRPVETPT...); in the second (labelled -2739), it is joined after the first 12 codons of the main ORF (MMVKVTKLVASR, followed by RIPRVGQSLMLRPVETPT...); the intron/exon boundary is marked]

Gene Modelling
Programs for gene modelling
□ Programs that also take other parameters into account, e.g. continuity of ORFs:
□ GENSCAN (http://genes.mit.edu/GENSCAN.html): very good for prediction of exons in coding regions (tested on the gene PDR9, where GENSCAN identified all 23 (!) exons)
□ GeneMark.hmm (http://opal.biology.gatech.edu/GeneMark/)
□ GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm/)

GeneMark
GeneMark™: a family of gene prediction programs provided by Mark Borodovsky's Bioinformatics Group at the Georgia Institute of Technology, Atlanta, Georgia.
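The "continuity of ORFs" criterion can be checked on the GENSCAN prediction for CKI1 (gene 1, exons 1.01-1.07). A minimal sketch: derive the introns from the exon coordinates and test whether the total coding length is a multiple of three. That is a necessary, not sufficient, condition for one uninterrupted ORF; a complete check would also translate the spliced sequence and look for internal stop codons. The only assumption is the coordinate convention (1-based, inclusive, as in the GENSCAN table); the function names are hypothetical.

```python
# GENSCAN gene 1 for CKI1: exons 1.01-1.07 (1-based, inclusive), copied from the table.
CKI1_EXONS = [(3708, 3764), (3894, 4133), (4255, 4914), (5005, 5383),
              (5473, 6056), (6136, 7368), (7448, 7660)]


def introns(exons):
    """Introns lie between the end of one exon and the start of the next."""
    return [(end + 1, next_start - 1)
            for (_, end), (next_start, _) in zip(exons, exons[1:])]


def coding_length(exons):
    """Total length of the exons, i.e. of the predicted coding sequence."""
    return sum(end - start + 1 for start, end in exons)


def frame_is_consistent(exons):
    """Necessary condition for one uninterrupted ORF: total length divisible by 3."""
    return coding_length(exons) % 3 == 0


print(introns(CKI1_EXONS))
print(coding_length(CKI1_EXONS), frame_is_consistent(CKI1_EXONS))  # 3366 True
```

The derived intron starts (3765, 4134, 4915, 5384, 6057, 7369) coincide with donor positions in the NetGene2 output, and five of the six intron ends (all except 3893) with its acceptor positions, assuming NetGene2 reports the first and the last intron nucleotide, respectively.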
[Screenshot: GeneMark web page]
■ Gene prediction in bacteria and archaea: the GeneMark and GeneMark.hmm programs can be run in parallel. If the species of interest is not in the list of available models, use either the Heuristic models option or, if the sequence is longer than 1 Mb, generate models with the self-training program GeneMarkS. Both options generate models that are then used by GeneMark.hmm and GeneMark in parallel.
■ Gene prediction in eukaryotes: the GeneMark and GeneMark.hmm programs can be run in parallel.
■ Gene prediction in ESTs and cDNAs, and in viruses (virus database VIOLIN)
■ Gene prediction programs of the Borodovsky group: GeneMark, GeneMark.hmm, Frame-by-Frame, GeneMarkS, Heuristic models
■ Eukaryotic GeneMark.hmm references: Borodovsky M. and Lukashin A. (unpublished); Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M. (2005) Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 33(20), 6494-6506.
■ Update October 2005: added pre-built models of eukaryotic GeneMark.hmm ES-3.0 (E: eukaryotic; S: self-training; 3.0: version)

GeneMark.hmm submission form
□ Input: sequence title (optional) and the CKI1 genomic sequence, pasted into the Sequence field or uploaded as a file
□ Species: A. thaliana, model ES-3.0
□ Output options: email address (required for graphical output or for sequences longer than 400000 bp); generate PDF graphics (screen) or PostScript graphics (email); print GeneMark 2.4 predictions in addition to GeneMark.hmm predictions; translate predicted genes into protein
□ Start GeneMark.hmm

GeneMark result (eukaryotic GeneMark.hmm, version 3.9)
Sequence name: CKI1
Sequence length: 5043 bp
G+C content: 38.79%
Run: Thu Oct 1 11:09:24 2009
Predicted genes/exons:
Gene Exon Strand Type Start End Length
1 1 + Initial 963 … …
1 2 + Internal 1155 1394 240
1 3 + Internal 1516 2175 660
1 4 + Internal 2266 2644 379
1 5 + Internal 2734 3317 584
1 6 + Internal 3397 4629 1233
1 7 + Terminal 4709 4921 213
[Figure: GeneMark.hmm graphical output for CKI1 (GeneMark.hmm prediction Thu Nov 10 03:23:47 EST 2005; order 5, window 96, step 12)]

Genomic Homologies
■ Searching for genes according to homologies with known sequences
■ Comparison with EST databases
□ BLASTN (http://www.ncbi.nlm.nih.gov/BLAST/, http://workbench.sdsc.edu/)
■ Comparison with protein databases
□ BLASTX (http://www.ncbi.nlm.nih.gov/BLAST/, http://workbench.sdsc.edu/)
□ GeneWise (http://www.ebi.ac.uk/Wise2/)
These tools compare a protein sequence with genomic DNA (after translating the DNA), so the amino acid sequence is needed.
■ Comparison with homologous genome sequences from related species
□ VISTA/AVID (http://www.lbl.gov/Tech-Transfer/techs/lbnl1690.html)
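BLASTX compares a nucleotide query with a protein database by translating the query in all six reading frames. This is a minimal sketch of that translation step only: it does not perform the alignment, and the codon table is written out by hand (standard genetic code) rather than taken from a library.

```python
"""Six-frame translation of genomic DNA (the first step of a BLASTX-style search)."""

BASES = "TCAG"
# standard genetic code; amino acids listed in TCAG order for bases 1, 2, 3
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int = 0) -> str:
    """Translate from `offset`; '*' marks stop codons, 'X' unknown codons."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(offset, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Protein translations of frames +1..+3 (given strand) and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    # CKI1::uidA junction from the "Regulation of translation" slide, from its ATG
    junction = ("ATGATGGTGAAAGTTACAAAGCTTGTGGCTTCACGTCGG"
                "ATCCCCCGGGTAGGTCAGTCCCTTATGTTACGTCCTGTAGAAACCCCAACC")
    print(six_frames(junction)["+1"])  # prints: MMVKVTKLVASRRIPRVGQSLMLRPVETPT
```

The frame labels +1 to +3 and -1 to -3 follow a common convention. A real BLASTX search would then align each translated frame against the protein database, which this sketch does not attempt.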
Genomic Colinearity
Genomes of related species (despite large differences) share similarities in sequence organization: this information can be used to identify genes in related species when searching databases.
General scheme of work when applying genomic colinearity (also called "comparative genomics") to the experimental identification of genes in related species:
□ Mapping small genomes using low-copy DNA markers (e.g. RFLP)
□ Using these markers to identify orthologous genes (genes in different species derived from a common ancestral gene, typically with the same or a similar function) in related species
□ A small genome (e.g. rice, 466 Mbp) can be used as a guide: low-copy molecular markers (e.g. RFLP) linked to the gene of interest are identified, and these regions are then used as probes to screen BAC libraries when identifying the orthologous regions of large genomes (e.g. barley: 5 Gbp, or wheat: 16 Gbp)
[Figure (Feuillet and Keller, 2002): genomic colinearity among grass genomes. A: maize (2500 Mbp) and rice (400 Mbp), scale labels 140 kb and 20 kb; B: hexaploid wheat (16 000 Mbp), barley (5000 Mbp) and rice (400 Mbp), scale label 50 kb; C: high gene density, scale label 1 Mb]

Genomic Colinearity
■ Can mostly be used for grass species (e.g. using related genes of barley, wheat, rice and maize)
■ Small genome reorganizations (deletions, duplications, inversions, translocations smaller than a few cM) are then detected by detailed comparative sequence analysis
■ During evolution, related species have diverged, mostly in non-coding regions (invasion of retrotransposons etc.)
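Colinearity means that orthologous markers occur in the same order in both genomes, so one simple way to quantify it is to find the longest chain of marker pairs whose order is conserved. This is a minimal sketch under stated assumptions: each orthologous marker is given as a hypothetical (position in the small guide genome, position in the large genome) pair with distinct positions, and the chain is found with a longest-increasing-subsequence search. It is a simplified illustration, not the mapping procedure described on the slides.

```python
"""Longest order-preserving chain of orthologous markers (colinearity sketch)."""

from bisect import bisect_left


def longest_colinear_chain(pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the longest chain of (pos_a, pos_b) marker pairs whose order is
    conserved in both genomes.

    Assumes each pair is one orthologous marker with distinct positions in
    each genome. Pairs are sorted by genome A, then the longest strictly
    increasing subsequence of genome-B positions is found in O(n log n).
    """
    ordered = sorted(pairs)
    tails: list[int] = []       # smallest B position ending a chain of length k+1
    tail_idx: list[int] = []    # index into `ordered` of that chain end
    prev = [-1] * len(ordered)  # back-pointers used to rebuild the chain
    for i, (_, pos_b) in enumerate(ordered):
        k = bisect_left(tails, pos_b)
        if k == len(tails):
            tails.append(pos_b)
            tail_idx.append(i)
        else:
            tails[k] = pos_b
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    chain = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    # hypothetical marker positions (guide genome, large genome)
    markers = [(1, 10), (2, 20), (3, 15), (4, 30), (5, 40), (6, 5)]
    print(longest_colinear_chain(markers))
    # prints: [(1, 10), (3, 15), (4, 30), (5, 40)]
```

Markers left out of the chain are candidates for the small rearrangements mentioned above (inversions, translocations), which would still need detailed sequence comparison to confirm.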
Genomic Colinearity
Genomic colinearity of HOX genes in animals
■ Transcription factors controlling body organisation along the anterior-posterior axis
■ The position of the genes in the genome corresponds to their spatial expression during development
■ Interspecies conservation
[Figure: Hox and ParaHox genes of Capitella compared with other animals (e.g. Branchiostoma, Tribolium, Euprymna), arranged by paralog group from anterior (PG1-2) to posterior (PG9-14), together with Gsx, Xlox, Cdx, Mox and Eve]

Methylation Filtration
■ Preparation of gene-enriched libraries by methylation filtration technology
■ Genes are (mostly!) hypomethylated, whereas non-coding regions are methylated
■ Uses a bacterial restriction-modification system that recognizes methylated DNA, with the restriction enzymes McrA and McrBC
□ McrBC recognizes a methylated cytosine preceded by a purine (G or A)
□ For cleavage, two such sites separated by 40-2000 bp are required

Methylation Filtration
Preparation of gene-enriched libraries by methylation filtration technology
Scheme of work during preparation of BAC genome libraries using methylation filtration:
□ Preparation of genomic DNA free of organelle DNA (chloroplasts and mitochondria)
□ Fragmentation of DNA (1-4 kbp) and ligation of adaptors
□ Preparation of BAC libraries in an mcrBC+ strain of E. coli (methylated fragments are cleaved, so the surviving clones carry mainly hypomethylated, gene-rich DNA)
□ Selection of positive clones
Limited usage: enrichment of coding DNA is only approx. 5-10%

EST Libraries
Preparation of EST libraries
□ Isolation of mRNA
□ Reverse transcription
□ Ligation of linkers and synthesis of the second cDNA strand
□ Cloning into a suitable bacterial vector
□ Transformation into bacteria and isolation of DNA (amplification of DNA)
□ Sequencing using primers specific for the plasmid used
□ Depositing the sequencing results in a public database
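The McrBC rule from the Methylation Filtration slides can be expressed as a simple filter over cloned fragments. This is a minimal sketch under several assumptions: the methylation state of each fragment is supplied as a hypothetical set of 0-based positions of methylated cytosines, only one strand is considered, the distance is measured between the methylated cytosines of two RmC half-sites, and the 40-2000 bp window is taken from the slide.

```python
"""Predict which cloned fragments McrBC would restrict (methylation filtration sketch)."""


def rmc_sites(seq: str, methylated: set[int]) -> list[int]:
    """Sorted 0-based positions of methylated C preceded by a purine (A or G)."""
    seq = seq.upper()
    return [
        i for i in sorted(methylated)
        if 0 < i < len(seq) and seq[i] == "C" and seq[i - 1] in "AG"
    ]


def mcrbc_cleavable(seq: str, methylated: set[int],
                    min_dist: int = 40, max_dist: int = 2000) -> bool:
    """True if two RmC half-sites are min_dist..max_dist bp apart."""
    sites = rmc_sites(seq, methylated)
    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            distance = b - a
            if distance > max_dist:
                break  # sites are sorted, so later ones are even farther away
            if distance >= min_dist:
                return True
    return False


def methylation_filter(fragments: list[tuple[str, str, set[int]]]) -> list[str]:
    """Names of fragments that survive in an mcrBC+ host (not cleavable)."""
    return [name for name, seq, meth in fragments
            if not mcrbc_cleavable(seq, meth)]


if __name__ == "__main__":
    seq = "AC" + "T" * 98 + "GC"  # two RmC candidates, 100 bp apart
    frags = [("hypomethylated", seq, set()), ("methylated", seq, {1, 101})]
    print(methylation_filter(frags))  # prints: ['hypomethylated']
```

Fragments that McrBC would cut are lost from the library, so the surviving clones are enriched for hypomethylated, mostly genic DNA.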
Fundamentals of Genomics II, Identification of Genes

Discussion