Submitting a DNA sequence to a primary database: GenBank/EMBL/DDBJ

The most important nucleic acid and protein sequence databases
• Each of the three main bioinformatics centres maintains a genomic database of nucleic acid sequences and of the corresponding proteins translated from them.
  - EMBL Nucleotide Sequence Database (at the EBI): 1980
  - GenBank (at the NCBI): 1982
  - DDBJ (The DNA Data Bank of Japan): 1984
• Three separate databases arose because each continent needed fast access to a sequence database at a time when high-speed communication networks were not yet developed.

International collaboration of the sequence databases
The databases share the same data:
  - GenBank (NCBI, NIH), searched through Entrez
  - EMBL (EBI), searched through SRS
  - DDBJ (CIB, NIG), searched through getentry

GenBank divisions
https://www.ncbi.nlm.nih.gov/genbank/htgs/division
ftp://ftp.ncbi.nlm.nih.gov/genbank

GenBank Database Divisions
GenBank divisions fall into two general categories, described in the Genome Research (1997) 7(10) article by Ouellette and Boguski, "Database Divisions and Homology Search Files: A Guide for the Perplexed". The "Organismal" category includes databases pertaining to sequences derived from specific organisms, and the "Functional" databases pertain to different types of sequence data being collected. A sequence record exists in only one GenBank division. For example, the HTG division includes unfinished sequences (phases 0, 1, and 2) being generated from several different organisms. As a sequence is updated to phase 3, it is moved into the appropriate organismal division. For instance, human phase 3 (finished) HTG sequences are located in the PRI division.
The GenBank divisions listed here represent the location of the annotated sequence records; for homology search purposes the records are reformatted and stored in the BLAST databases. The database divisions currently available, and the related BLAST databases, are listed below. An example of a submission (one accession number) that has progressed through phase 1, phase 2, and phase 3 is available (Examples).

Organismal divisions:

Database  Division                           BLAST      Example
BCT       Bacterial sequences                nr, month
PRI       Primate sequences                  nr, month  Human Phase 3
ROD       Rodent sequences                   nr, month
MAM       Other mammalian sequences          nr, month
VRT       Other vertebrate sequences         nr, month
INV       Invertebrate sequences             nr, month  Drosophila, C. elegans Phase 3
PLN       Plant and fungal sequences         nr, month  Arabidopsis Phase 3
VRL       Viral sequences                    nr, month
PHG       Phage sequences                    nr, month
RNA       Structural RNA sequences           nr, month
SYN       Synthetic and chimeric sequences   nr, month
UNA       Unannotated sequences              nr, month

Functional divisions:

Database  Division                           BLAST         Example
EST       Expressed Sequence Tags            dbest, month
STS       Sequence Tagged Sites              dbsts, month
GSS       Genome Survey Sequences            dbgss, month
HTG       High Throughput Genomic sequences  htgs, month   All organisms: Phase 0, 1, and 2

Identifying a record in the primary sequence databases
• GenBank
• EMBL-Bank (European Nucleotide Archive, ENA)
• DDBJ
• Accession number
• GI number (GenBank Identifier)

LOCUS       AY870395                 553 bp    DNA     linear   BCT 30-JAN-2005
DEFINITION  Macrococcus brunensis strain CCM 4811 60 kDa chaperonin (cpn60)
            gene, partial cds.
ACCESSION   AY870395
VERSION     AY870395.1  GI:58119461

LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, 84-94 (2004)
REFERENCE   2  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville,
            MD 20705, USA
REFERENCE   3  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville,
            MD 20705, USA
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:27804758.
FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene            1..1931
                     /gene="AFS1"
     CDS             54..1784
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO22848.2"
                     /db_xref="GI:32265058"
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt

A traditional GenBank record has three parts: Header, Feature Table, Sequence.

How do data get into the databases?
Submitting data through a WWW portal
- BankIt (GenBank)
  • Submission Portal (https://www.ncbi.nlm.nih.gov/WebSub/)
- Webin (EMBL/European Nucleotide Archive)
  • http://www.ebi.ac.uk/ena/submit
- Sakura (DDBJ)
  • http://www.ddbj.nig.ac.jp/sub/websub-e.html

Stand-alone PC applications
- Sequin
  • http://www.ncbi.nlm.nih.gov/Sequin/download/seq_download.html
  - for longer, manually annotated sequences
  - for phylogenetic, population or mutation studies that include a sequence alignment
- tbl2asn
  - batch submission
  - command-line program for Mac and Unix
  - automates the creation of sequence records
  - intended for complete genomes, ESTs, STSs and large batches of sequences

https://www.ncbi.nlm.nih.gov/genbank/submit/

How to submit data to GenBank
The most important source of new data for GenBank is direct submissions from scientists. GenBank depends on its contributors to help keep the database as comprehensive, current, and accurate as possible. NCBI provides timely and accurate processing and biological review of new entries and updates to existing entries, and is ready to assist authors who have new data to submit.

Receiving an Accession Number for your Manuscript
Most journals require that DNA and amino acid sequences cited in articles be submitted to a public sequence repository (DDBJ/EMBL/GenBank - INSDC) as part of the publication process. Data exchange between DDBJ, EMBL and GenBank occurs daily, so it is only necessary to submit the sequence to one database, whichever is most convenient, without regard for where the sequence may be published. Sequence data submitted in advance of publication can be kept confidential if requested. GenBank will provide accession numbers for submitted sequences, usually within two working days.
This accession number serves as an identifier for your submitted data and allows the community to retrieve the sequence upon reading the journal article. The accession number should be included in your manuscript, preferably in a footnote on the first page of the article, or as required by individual journal procedures.

Submissions to GenBank
There are several options for submitting data to GenBank:
• BankIt, a WWW-based submission tool with wizards to guide the submission process.
• Sequin, NCBI's stand-alone submission tool with wizards to guide the submission process, available by FTP for use on Mac, PC, and Unix platforms.
• tbl2asn, a command-line program that automates the creation of sequence records for submission to GenBank using many of the same functions as Sequin. It is used primarily for submission of complete genomes and large batches of sequences and is available by FTP for use on Mac, PC and Unix platforms.
• Submission Portal, a unified system for multiple submission types. Currently only ribosomal RNA (rRNA) or rRNA-ITS sequences can be submitted with the GenBank component of this tool. This will be expanded in the future to include other types of GenBank submissions. Genome and Transcriptome Assemblies can be submitted through the Genomes and TSA portals, respectively.
• Barcode Submission Tool, a WWW-based tool for the submission of sequences and trace read data for Barcode of Life projects based on the COI gene.

BankIt, Submission Portal and Barcode Submission Tool entries are automatically submitted to GenBank. Submissions made with Sequin or tbl2asn must be mailed to gb-sub@ncbi.nlm.nih.gov. Large files which may be truncated during mailing with conventional mail tools should be submitted directly using Sequin MacroSend. You can subscribe to be notified of updates to the submission tools.
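The accession number described above is what a reader of the article uses to retrieve the record. The sketch below splits an identifier such as AY182241.2 into accession and version and builds a retrieval URL for it. It is only an illustration under stated assumptions: the slides do not mention NCBI E-utilities, the regular expression is a simplified pattern that covers the accessions shown in this text rather than every INSDC format, and the function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (an assumption: not named in the slides).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Simplified pattern: 1-2 capital letters, 5-8 digits, optional ".version".
# Real accession rules are more varied; this is illustrative only.
_ACCESSION = re.compile(r"^([A-Z]{1,2}\d{5,8})(?:\.(\d+))?$")


def split_accession(identifier):
    """Return (accession, version) where version is an int or None."""
    match = _ACCESSION.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {identifier!r}")
    accession, version = match.groups()
    return accession, int(version) if version else None


def efetch_url(identifier):
    """Build a URL that requests the record as a GenBank flat file."""
    split_accession(identifier)  # validate before building the URL
    query = urlencode({
        "db": "nuccore",       # nucleotide database
        "id": identifier.strip(),
        "rettype": "gb",       # GenBank flat-file format
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


if __name__ == "__main__":
    print(split_accession("AY182241.2"))
    print(efetch_url("AY870395.1"))
```

The code only constructs the URL and performs no network request, so it can be run offline; fetching the URL is left to the caller.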
There are specialized, streamlined procedures for batch submissions of sequences, such as EST and GSS sequences.

Genome submission portal
https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/

Prokaryotic and Eukaryotic Genomes Submission Guide
Both WGS and non-WGS genomes, including gapless complete bacterial chromosomes, can be submitted via the Submission Portal. You will be asked to choose whether the genome being submitted is considered WGS or not. The differences for GenBank purposes are:
non-WGS
• Each chromosome is in a single sequence and there are no extra sequences
• Each sequence in the genome must be assigned to a chromosome or plasmid or organelle
• Plasmids and organelles can still be in multiple pieces
WGS
• One or more chromosomes are in multiple pieces and/or some sequences are not assembled into chromosomes

http://www.ebi.ac.uk/ena/submit

Submit and update: Submitting and updating data
We offer a number of services through which data (including updates) can be submitted to the European Nucleotide Archive (ENA). These technologies provide options appropriate for the scale and frequency of submission, the expertise and capacity of the submitter and the nature of the data to be transferred.
The choices below lead users most directly to the appropriate submission route:
• Submit read data
• Submit assembled sequence and/or annotation (no partial or complete assemblies)
• Submit genome assemblies (contigs/scaffolds/chromosomes)
• Email ENA helpdesk

Protocols for submission to a nucleotide database
• Standard
• ESTs (expressed sequence tags) and GSSs (genome survey sequences)
• Complete Microbial or Eukaryotic Genomes
• Whole Genome Shotgun (WGS)
• High-Throughput Genomic Sequences (HTGs)
• Transcriptome Shotgun Assembly (TSA)
• Third Party Annotation (TPA)
  - records that refine existing sequences deposited in the databases by other authors
  - strict requirement for direct experimental evidence

Sequences that are not accepted by the primary databases
• sequences without a physical (biological) counterpart, e.g. consensus sequences
• genomic sequences of several exons without the intron sequences
• sequences <200 bp (except patent sequences)
• primer sequences (these can be submitted to NCBI's Probe database)
• protein-only sequences (these can be submitted to UniProt/Swiss-Prot)
• sequences assembled from genomic sequence and mRNA and presented as a single sequence

Types of standard annotated sequences (nucleotide sequence database)
• prokaryotic genes and genomes
• eukaryotic genes and genomes
• mRNA sequences
• rRNA and/or ITS
• viral sequences
• transposons and insertion sequences
• microsatellites
• pseudogenes
• cloning vectors
• phylogenetic or population studies (alignments)
• non-coding RNA

Whole-genome sequences: BioSample & BioProject
[Diagram: an Umbrella BioProject groups a Genome BioProject, a Transcriptome BioProject and an Epigenome BioProject; the data sets of these projects are linked to BioSample 1 and BioSample 2.]

Whole Genome Shotgun (WGS)
• WGS sequencing projects are whole genomes or chromosomes sequenced with the whole-genome shotgun strategy
• DDBJ/EMBL/GenBank accept both complete and incomplete genomes
• WGS projects may be annotated; automatic annotation with the NCBI pipeline can be requested
• The parts of a WGS project are contigs, which must not contain gaps
• Optional: an AGP file shows how the contigs, separated by gaps, are arranged on the chromosome
• Optional: BAM or FASTQ files can be uploaded to the SRA (Sequence Read Archive)

Sequence Read Archive (SRA)
Sequence Read Archive (SRA) makes biological sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets. The SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms, including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, Complete Genomics, and Pacific Biosciences SMRT.

Automatic annotation
• NCBI Prokaryotic Genome Annotation Pipeline (PGAP)
  - https://www.ncbi.nlm.nih.gov/genome/annotation_prok/process/
• NCBI Eukaryotic Genome Annotation Pipeline
  - https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/
• Other servers for automatic annotation
  - RAST: http://rast.nmpdr.org/

NCBI Prokaryotic Genome Annotation Pipeline
The NCBI Prokaryotic Genome Annotation Pipeline (PGAP) is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids). Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements. NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. The first version of the NCBI Prokaryotic Genome Annotation Pipeline was developed in 2001 and is regularly upgraded to improve structural and functional annotation quality (Haft DH et al. 2018, Tatusova T et al. 2016). Recent improvements utilize curated protein profile hidden Markov models (HMMs), including TIGRFAMs and new HMMs for antimicrobial resistance proteins, and curated complex domain architectures for functional annotation of proteins. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.

Related documentation:
• Annotation process
• Annotation standards
• Assemblies excluded from RefSeq
• Release notes

GenBank
The NCBI prokaryotic annotation pipeline is available as a service for GenBank submitters. The pipeline is capable of annotating both complete genomes and draft WGS genomes consisting of multiple contigs. You can request PGAP annotation when you submit your genome to GenBank. Both WGS and non-WGS genomes, including gapless complete bacterial chromosomes, can be submitted via the Submission Portal.
You will be asked to choose whether the genome being submitted is considered WGS or not. The differences for GenBank purposes are:
non-WGS:
• Each chromosome is in a single sequence and there are no extra sequences
• Each sequence in the genome must be assigned to a chromosome or plasmid or organelle
• Plasmids and organelles can still be in multiple pieces
WGS:
• One or more chromosomes are in multiple pieces and/or some sequences are not assembled into chromosomes

The NCBI Eukaryotic Genome Annotation Pipeline
The NCBI Eukaryotic Genome Annotation Pipeline provides content for various NCBI resources including Nucleotide, Protein, BLAST, Gene and the Genome Data Viewer genome browser. This page provides an overview of the annotation process. Please refer to the Eukaryotic Genome Annotation chapter of the NCBI Handbook for algorithmic details. The pipeline uses a modular framework for the execution of all annotation tasks, from the fetching of raw and curated data from public repositories (sequence and Assembly databases), through the alignment of sequences and the prediction of genes, to the submission of the accessioned annotation products to public databases. Core components of the pipeline are alignment programs (Splign and ProSplign) and an HMM-based gene prediction program (Gnomon) developed at NCBI.

Important features of the pipeline include:
• flexibility and speed
• higher weight given to curated evidence than non-curated evidence
• utilization of RNA-Seq for gene prediction
• production of models that compensate for assembly issues
• tracking of gene loci from one annotation to the next
• ability to co-annotate multiple assemblies for the same organism

The products of an annotation run (chromosome, scaffolds and model transcripts and proteins) are labeled with an Annotation Release number.
The Annotation Release name is the combination of the organism name and Annotation Release number (e.g. NCBI Pongo abelii Annotation Release 103) and is used throughout NCBI as a way to uniquely identify annotation products originating from the same annotation run.

Contents
• Process
  - Source of genome assemblies
  - Masking
  - Transcript alignments
  - RNA-Seq read alignments
  - Protein alignments
  - Model prediction
  - Curated RefSeq genomic sequence alignments
  - Choosing the best models for a gene
  - Protein naming and determination of locus type
  - Assignment of GeneIDs
  - Annotation of small RNAs

RAST (Rapid Annotation using Subsystem Technology) Server
http://rast.nmpdr.org/

Metadata in the SRA
Data files are submitted together with metadata:
- Study
- Experiment
- Sample
- Run
- Analysis
- ethically sensitive data (EGA)

Example of an SRA record with a microbial genome
https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9600155

Procedure for a standard-type GenBank submission
• http://www.ncbi.nlm.nih.gov/books/NBK51157/ (The GenBank Submissions Handbook)

BankIt
[Screenshot: BankIt start page at http://www.ncbi.nlm.nih.gov/WebSub/ with the "New Submission" option and a list of completed submissions, e.g. ID 1391012 submitted 15 Sep 2010 10:35:52, with a link to download the record file.]

Sequin: preparing a sequence submission
https://www.ncbi.nlm.nih.gov/Sequin/
[Screenshot: "Welcome to Sequin" window, Sequin Application Version 6.00 Standard Release [Oct 27 2005]. The database for submission is selected (GenBank, EMBL or DDBJ) before choosing "Start New Submission" or "Read Existing Record".]
[Screenshot: "Sequence Format" form.
 Submission type: Single Sequence, Gapped Sequence, Segmented Sequence, Population Study, Phylogenetic Study, Mutation Study, Environmental Samples, Batch Submission.
 Sequence data format: FASTA (no alignment) or Alignment (FASTA+GAP, NEXUS, PHYLIP, etc.).
 Submission category: Original Submission or Third Party Annotation.]

Requirements for every sequence submission
• contact information
[Screenshot: Sequin "Submitting Authors" form with the tabs Submission, Contact, Authors and Affiliation. The Contact tab asks for first name, middle initial, last name, phone, fax and e-mail (country code required for non-U.S. phone numbers). The Affiliation tab is filled in with the sample values Institution: Oxbridge University; Department: Evolutionary Biology Department; Address: 1859 Tennis Court Lane; City: Camford; Zip/Postal Code: OX1 2BH; Country: United Kingdom.]

Further requirements for a sequence submission
• Information on the release date
• Information on relevant publications
• Description of the source of the sequence
• The sequence itself
  - type and topology of the molecule
  - annotation of the sequence features

Description of the sequence source
• organism: the full, unabbreviated scientific name
  Example: [organism=Drosophila melanogaster]
• lineage: taxonomic classification of the organism (according to the NCBI Taxonomy database)
  http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Root
• molecule: given as "DNA" or "RNA".
  Example: [molecule=DNA]
• moltype: can take the following values
  Example: [moltype=Genomic DNA]
  - Genomic DNA
  - Genomic RNA
  - Precursor RNA
  - mRNA [cDNA]
  - Ribosomal RNA
  - Transfer RNA
  - Small nuclear RNA
  - Small cytoplasmic RNA
  - Other-Genetic
  - cRNA
  - Small nucleolar RNA
• topology

Description of the sequence source 2
• location: can take the following values
  Example: [location=mitochondrion]
  - genomic
  - chloroplast
  - kinetoplast
  - mitochondrion
  - plastid
  - macronuclear
  - extrachromosomal
  - plasmid
  - cyanelle
  - proviral
  - virion
  - nucleomorph
  - apicoplast
  - leucoplast
  - proplastid
  - endogenous-virus
  - hydrogenosome
• Genetic code (http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c)

Description of the sequence source 3
Further source modifiers:
acronym, anamorph, authority, biotype, biovar, breed, cell-line, cell-type, chemovar, chromosome, clone, clone-lib, collected-by, common, country, cultivar, dev-stage, ecotype, endogenous-virus-name, forma, forma-specialis, fwd-pcr-primer-name, fwd-pcr-primer-seq, genotype, group, haplotype, identified-by, isolate, isolation-source, lab-host, lat-lon, map, note, pathovar, plasmid-name, plastid-name, pop-variant, rev-pcr-primer-name, rev-pcr-primer-seq, segment, serogroup, serotype, serovar, sex, specific-host, specimen-voucher, strain, sub-species, subclone, subgroup, substrain, subtype, synonym, teleomorph, tissue-lib, tissue-type, type, variety

Sequence format
• Nucleic acid sequences and the sequences of the encoded proteins are prepared in FASTA format

Nucleotide Sequence:
>ABC-1 [organism=Saccharomyces cerevisiae][strain=ABC][clone=1]
ATTGCGTTATGGAAATTCGAAACTGCCAAATACTATGTCACCATCATTGA
TGCACCTGGACACAGAGATTTCATCAAGAACATGATCACTGGTACTT

Protein Sequences:
>4E-I [gene=eIF4E] [protein=eukaryotic initiation factor 4E-I]
MQSDFHRMKNFANPKSMFKTSAPSTEQGRPEPPTSAAAPAEAKDVKPKEDPQETGEPAGN...
>4E-II [gene=eIF4E] [protein=eukaryotic initiation factor 4E-II]
MWLETEKTSAPSTEQGRPEPPTSAAAPAEAKDVKPKEDPQETGEPAGNTATTTAPAGDD...

Gapped sequence
>m_gagei [organism=Mansonia gagei] Mansonia gagei nadh dehydrogenase ...
atggagcatacatatcaatattcatggatcataccgtttgtgccacttccaattcctattttaataggaa
ttggactcctactttttccgacggcaacaaaaaatcttcgtcgtatgtgggctcttcccaatattttatt
gttaagtatagttatgattttttcggtcgatctgtccattcagcaaataaataaaagttctatctatcaa
tatgtatggtcttggaccatcaataatgatttttctttcgagtttggctactttattgattcgcttacct
>?200                                             <- length of the gap
ggtataataacagtattattaggggctactttagctcttgc
tcaaaaagatattaagaggggtttagcctattctacaatgtcccaactgggttatatgatgttagctcta
ggtatggggtcttatcgagccgctttatttcatttgattactcatgcttattcgaaggcattgttgtttt
taggatccggatccgttattcattccatggaagctattgttggatattctccagataaaagccagaatat
ggtttttatgggcggtttaagaaagcatgtgccaattacacaaattgcttttttagtgggtacactttct
ctttgtggtattccaccccttgcttgtttttggtccaaagatgaaattcttagtgacagctggttgt
>?unk100                                          <- gap of unknown length
tcaataaaactatggggtaaagaagaacaaaaaataattaacagaaattttcgtttatctcctttattaa
tattaacgatgaataataatgagaagccatatagaattggtgataatgtaaaaaaaggggctcttattac
tattacgagttttggctacaagaaggctttttcttatcctcatgaatcggataatactatgctatttcct
atgcttatattggctctatttactttttttgttggagccatagcaattccttttaatcaagaaggactac
atttggatatattatccaaattattaactccatctataaatcttttacatcaaaattcaaatgattttga
ggattggtatcaatttttaacaaatgcaactctttcagtgagtatagcctgtttcggaatatttacagca
ttccttttatataagcctttttattcatctttacaaaatttgaacttactaaatttattttcgaaagggg
gtcctaaaagaatttttttggataaaataatatacttgatatacgattggtcatataatcgtggttacat

Sequence alignment
• FASTA+GAP
>ABC-1 [organism=Saccharomyces cerevisiae][strain=ABC][clone=1]
---ATTGCGTTATGGAAATTCGAAACTGCCAAATACTATGTCACCATCAT
TGATGCACCTGGACACAGAGATTTCATCAAGAACATGATCACTGGTACTT
>ABC-2 [organism=Saccharomyces cerevisiae][strain=ABC][clone=2]
GATATTGCTTTATGGAAATTCGAAACTGCCAAATACTATGTCACCATCAT
TGATGCACCTGGACACAGAAATTTCATCAAGAACATGATCACTGGTACTT
>ABC-3 [organism=Saccharomyces cerevisiae][strain=ABC][clone=3]
---ATTGCTTTATGGAAATTCGAAACTGCCAAATACTATGTTA-------
TGATGCACCTGGACACAGAGATTTCATCAAAAACATGATCACTGGTACTT

• PHYLIP
3 100
ABC-1     ---ATTGCGT TATGGAAATT CGAAACTGCC AAATACTATG TCACCATCAT
ABC-2     GATATTGCTT TATGGAAATT CGAAACTGCC AAATACTATG TCACCATCAT
ABC-3     ---ATTGCTT TATGGAAATT CGAAACTGCC AAATACTATG TTA-------

          TGATGCACCT GGACACAGAG ATTTCATCAA GAACATGATC ACTGGTACTT
          TGATGCACCT GGACACAGAA ATTTCATCAA GAACATGATC ACTGGTACTT
          TGATGCACCT GGACACAGAG ATTTCATCAA AAACATGATC ACTGGTACTT
>[organism=Saccharomyces cerevisiae][strain=ABC][clone=1]
>[organism=Saccharomyces cerevisiae][strain=ABC][clone=2]
>[organism=Saccharomyces cerevisiae][strain=ABC][clone=3]

[Screenshot: Sequin record view in GenBank format for the example submission eIF4E (2881 bp, DNA, linear, INV, 27-OCT-2005): "Drosophila melanogaster eukaryotic initiation factor 4E (eIF4E) gene, alternative splice products, complete cds." The source feature spans 1..2881 with /organism="Drosophila melanogaster", /mol_type="genomic DNA" and /strain="Oregon R"; the gene and CDS features are located at join(201..224,1550..1920,1986..2085,2317..2404,2466..2629), with /gene="eIF4E", /codon_start=1 and /product="eukaryotic initiation factor 4E-II".]
[Screenshot: Sequin sequence view and graphic view of eIF4E (positions 1 to 2881) showing the gene eIF4E and the CDS features for eukaryotic initiation factor 4E-I and 4E-II.]
[Screenshot: Sequin "Coding Region" dialog. Product tab: genetic code Standard, protein length 248, protein product 4E-II. Location tab: intervals 201-224, 1550-1920, 1986-2085 and 2317-2404 on the plus strand for eIF4E, with the options "Retranslate on Accept" and "Synchronize Partials".]

Annotating the sequence itself
Encoded proteins
- CDS: the interval, with any incompleteness at the N- or C-terminus
- gene: the interval corresponding to the CDS, for experimentally confirmed genes
- mRNA: the interval including the 5'-UTR and 3'-UTR
Encoded structural RNAs

Example sequences

mRNA or cDNA sequence
• Coding regions including the initiation and termination codons
• Protein name
• Gene name
• Protein sequence

Homo sapiens prolidase (PEPD) mRNA, complete cds.
FEATURES             Location/Qualifiers
     source          1..1888
                     /organism="Homo sapiens"
                     /chromosome="19"
                     /map="19q12-q13.2"
                     /cell_type="fibroblasts"
     mRNA            1..1888
                     /gene="PEPD"
     gene            1..1888
                     /gene="PEPD"
     CDS             17..1498
                     /gene="PEPD"
                     /EC_number="3.4.13.9"
                     /note="imidodipeptidase"
                     /product="prolidase"

Prokaryotic gene sequence
• Coding intervals
• Protein name
• Gene name, if known
• Amino acid sequence

Escherichia coli RecA protein (recA) gene, complete cds
FEATURES             Location/Qualifiers
     source          1..3300
                     /organism="Escherichia coli"
                     /strain="K-12"
     gene            783..1961
                     /gene="recA"
     CDS             783..1961
                     /gene="recA"
                     /function="DNA repair protein"
                     /product="RecA protein"

Eukaryotic gene sequence
• Intervals of the coding regions, including the start and stop codons, and the intervals of all introns
• Protein name
• Gene name, if known
• Amino acid sequence

Caenorhabditis elegans tyrosine kinase PTK-2 (ptk-2) gene, complete cds.
FEATURES             Location/Qualifiers
     source          1..3180
                     /organism="Caenorhabditis elegans"
     gene            211..3011
                     /gene="ptk-2"
     mRNA            join(211..288,533..703,763..890,940..1024,
                     1084..1380,1838..1962,2018..2099,2301..3011)
                     /gene="ptk-2"
                     /product="protein kinase PTK-2"
     CDS             join(250..288,533..703,763..890,940..1024,
                     1084..1380,1838..1962,2018..2099,2301..2456)
                     /gene="ptk-2"
                     /product="protein kinase PTK-2"

Ribosomal RNA and internal transcribed spacers
• Names of any structural RNAs (e.g. tRNA-Ile, 16S ribosomal RNA)
• Names of the spacer regions (e.g. internal transcribed spacer 1, 16S/23S intergenic spacer)
• Nucleotide positions

Saccharomyces cerevisiae 18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene and internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence.
FEATURES             Location/Qualifiers
     source          1..540
                     /organism="Saccharomyces cerevisiae"
                     /strain="UMD 334"
     rRNA            <1..5
                     /product="18S ribosomal RNA"
     misc_RNA        6..178
                     /product="internal transcribed spacer 1"
     rRNA            179..377
                     /product="5.8S ribosomal RNA"
     misc_RNA        378..519
                     /product="internal transcribed spacer 2"
     rRNA            520..>540
                     /product="28S ribosomal RNA"

Promoter region
• Name of the protein or gene to which the promoter and its 5' and 3' flanking sequences belong
• Intervals of the transcribed and coding sequences, if present

Homo sapiens enhancer-binding protein 2 (EBP2) gene, promoter region and partial cds.
FEATURES             Location/Qualifiers
     source          1..3061
                     /organism="Homo sapiens"
                     /chromosome="15"
                     /map="15q13"
                     /cell_line="H441"
                     /tissue_type="lung"
     gene            1..>3061
                     /gene="EBP2"
     promoter        1..2947
                     /gene="EBP2"
     TATA_signal     2918..2923
                     /gene="EBP2"
     mRNA            2948..>3061
                     /gene="EBP2"
                     /product="enhancer-binding protein 2"
     5'UTR           2948..3010
                     /gene="EBP2"
     CDS             3011..>3061
                     /gene="EBP2"
                     /product="enhancer-binding protein 2"

Transposon or insertion sequence
• Specific name of the element
• Nucleotide positions
• Names and intervals of the encoded gene products, if present (e.g. transposase)
• Positions and intervals of other features (e.g. LTRs, repeat regions)

Bacillus subtilis transposon BLT transposase (tnpA) gene, complete cds
FEATURES             Location/Qualifiers
     source          1..1221
                     /organism="Bacillus subtilis"
                     /strain="RS2"
     source          21..1127
                     /organism="Bacillus subtilis"
                     /strain="RS2"
                     /transposon="BLT"
     repeat_region   21..61
                     /rpt_type=inverted
     gene            128..1034
                     /gene="tnpA"
     CDS             128..1034
                     /gene="tnpA"
                     /product="transposase"
     repeat_region   1085..1127
                     /rpt_type=inverted

Repeat regions
• Intervals of the repetitive sequences
• Repeat family (e.g. Alu, Mer)
• Repeat type (tandem, inverted, flanking, terminal, direct, dispersed, or other)
• Repeat unit: description of the intervals if the sequence contains more than one repeat

Homo sapiens repeat regions
FEATURES             Location/Qualifiers
     source          1..2050
                     /organism="Homo sapiens"
                     /chromosome="6"
                     /map="6q25"
     repeat_region   8..126
                     /rpt_type=dispersed
                     /rpt_family="B2"
     repeat_region   197..344
                     /rpt_type="direct"
                     /rpt_unit="197..220"
     repeat_region   389..673
                     /rpt_family="AluSx"
                     /rpt_type=dispersed
     repeat_region   847..876
                     /note="microsatellite BT21"
                     /rpt_type="tandem"
                     /rpt_unit="ca"
     repeat_region   1000..2000
                     /rpt_family="human endogenous retrovirus K-10"

Cloning vector
• Unique name of the vector
• Coding intervals, names of genes and proteins

Cloning vector pRB223, complete sequence
FEATURES             Location/Qualifiers
     source          1..4361
                     /organism="Cloning vector pRB223"
     gene            86..1276
                     /gene="tet"
     CDS             86..1276
                     /gene="tet"
                     /product="tetracycline resistance protein"
     RBS             1905..1909
                     /note="Shine-Dalgarno sequence"
     rep_origin      2535
     gene            complement(3293..4194)
                     /gene="bla"
     CDS             complement(3293..4153)
                     /gene="bla"
                     /product="beta-lactamase"
     misc_feature    4069..4125
                     /note="multiple cloning site"
     RBS             complement(4161..4165)
                     /gene="bla"
                     /note="Shine-Dalgarno sequence"
     promoter        complement(4188..4194)
                     /gene="bla"

Bacteriophage lysis module; endolysin and HNH endonuclease genes, complete CDS
FEATURES             Location/Qualifiers
     source          1..3165
                     /organism="Staphylococcus bacteriophage 812"
                     /virion
                     /mol_type="genomic DNA"
                     /strain="phi812"
                     /lab_host="Staphylococcus aureus CCM 4028"
                     /type="wild type"
     gene            654..3017
                     /gene="lyt812"
     CDS             join(654..1449,2329..3017)
                     /gene="lyt812"
                     /experiment="peptide sequencing"
                     /note="Lyt812"
                     /codon_start=1
                     /transl_table=11
                     /product="endolysin"
                     /translation="MAKTQAEI...............
" join(1239..1449,2329..2576) /gene="lyt812" /note="SM00644; Ami_2; This family includes zinc amidases that have N-acetylmuramoyl-L-alanine amidase activity; Region: Ami_2 " 1450..2328 /gene="lyt812" /s tandard_name="ly1812-11" /experiment="cDNA synthesis and sequencing" 1617..2117 /gene="lyt812" /note="ORFI-812HI" /codon_start=l /transl_table=ll /product="putative HNH endonuclease Příklady některých dalších modifikací deskriptorů • Title - Informace vyskytující se v databázi v DEFINITION LINE • Comment - Poznámka k různým vlastnostem • Technique - Umožňuje výběr techniky použité pro vytvoření nebo experimentální evidenci vlastností sekvence Přehled deskriptorů pro popis vlastností sekvence (http://www.ncbi.nlm.nih.gov/Banklt/help.htmn attenuator • misc_RNA • S_region C-region • misc_signal • satellite CAAT signal • misc_structure • scRNA CDS • modified base • sig peptide conflict • mRNA • snRNA D-loop • N_region • snoRNA D-segment • old_sequence • source enhancer • operon • stem loop exon • oriT • STS gap • polyAsignal • TATA_signal GC_signal • polyA_site • terminator gene • precursorRNA • transit peptide iDNA • prim_transcript • tRNA intron • primer_bind • unsure J_segment • promoter • V_region LTR • protein bind • V_segment mat_peptide • RBS • variation misc_binding • repeat_region • 3'clip misc_difference • repeat_unit • 3'UTR misc_feature • rep origin • 5'clip misc_recomb • rRNA • 5'UTR