Using Internet Resources in the Study of Microorganisms
doc. RNDr. Milan Bartoš, Ph.D.
bartosm@vfu.cz
Faculty of Science, Masaryk University (MU), 2012
Lecture outline
1) Working with sequence data
2) Basic publicly available databases
3) Working with the NCBI website
4) How sequence similarity is assessed
5) The BLAST search tool
6) Multiple alignment: the CLUSTAL program
Recommended reading
Cvrčková F. (2006): Úvod do praktické bioinformatiky [Introduction to Practical Bioinformatics], Academia, Praha
http://www.ncbi.nlm.nih.gov/
Working with sequence data
Sequence data = a written record of the primary sequence of macromolecules, i.e. DNA (RNA) and proteins
> DNA and RNA are written in the 5' to 3' direction
> Proteins are written from the N-terminus to the C-terminus
> One-letter codes (per IUPAC) are used
Abbreviations for nucleic acids (DNA, RNA)
Code	Base	Code	Base
A	Adenine	K	G, T (keto)
C	Cytosine	M	A, C (amino)
G	Guanine	B	C, G, T (not A)
T	Thymine	D	A, G, T (not C)
U	Uracil	H	A, C, T (not G)
R	A, G (purine)	V	A, C, G (not T, U)
Y	C, T (pyrimidine)	N	any base
S	G, C (strong)	-	gap
W	A, T (weak)		
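The ambiguity codes in the table above can be turned into a small lookup. The following is only a minimal sketch: the names IUPAC_DNA, expand and matches are made up for illustration, and U is treated as T, which is an assumption that suits DNA-only work.

```python
# IUPAC nucleotide codes mapped to the bases each code can stand for.
# Assumption for this sketch: U is treated as T (DNA alphabet only).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(pattern):
    """Return every unambiguous DNA sequence covered by an IUPAC pattern."""
    results = [""]
    for code in pattern.upper():
        results = [prefix + base for prefix in results for base in IUPAC_DNA[code]]
    return results


def matches(pattern, sequence):
    """Check whether a concrete sequence is compatible with an IUPAC pattern."""
    sequence = sequence.upper().replace("U", "T")
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC_DNA[code] for code, base in zip(pattern.upper(), sequence))


print(expand("AYG"))            # ['ACG', 'ATG']
print(matches("GCNR", "GCTA"))  # True
print(matches("GCNR", "GCTC"))  # False, because R allows only A or G
```

Expanding a pattern grows quickly with every ambiguous position, so matches is the better choice for long sequences.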
Abbreviations for proteins
Code	Abbreviation	Amino acid	Code	Abbreviation	Amino acid
A	Ala	Alanine	P	Pro	Proline
C	Cys	Cysteine	Q	Gln	Glutamine
D	Asp	Aspartate	R	Arg	Arginine
E	Glu	Glutamate	S	Ser	Serine
F	Phe	Phenylalanine	T	Thr	Threonine
G	Gly	Glycine	V	Val	Valine
H	His	Histidine	W	Trp	Tryptophan
I	Ile	Isoleucine	Y	Tyr	Tyrosine
K	Lys	Lysine	X	Xxx	any
L	Leu	Leucine	B	Asx	Asp, Asn
M	Met	Methionine	Z	Glx	Glu, Gln
N	Asn	Asparagine			
Ways of writing sequences
Raw data (raw format)
> Some programs can accept and process them
> They are not suitable for long-term storage, however
Specialized formats
> The basic public databases can convert between them
Simple formats: FASTA
> Preferably without spaces or special characters
>gi|291219937|ref|NM_001888.3| Homo sapiens crystallin, mu (CRYM), transcript variant 1, mRNA
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTTGTCTGCCACTATGTGAGATATACCTT
TCACCTTCTGCCGTGATTGTGAGGCCTCCTCAGCCACGTGGAACTGTAAAAACTCCTGGAA
GAAAAGATCCTGCAATTT
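A FASTA record like the one above is easy to read programmatically: one header line starting with ">", followed by sequence lines. The sketch below is a minimal reader, not a full implementation of the format; parse_fasta is a hypothetical helper name, and splitting the header on "|" assumes the NCBI-style header shown on the slide.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Drop stray spaces so the sequence becomes one unbroken string.
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>gi|291219937|ref|NM_001888.3| Homo sapiens crystallin, mu (CRYM), transcript variant 1, mRNA
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTTGTCTGCCACTATGTGAGATATACCTT
TCACCTTCTGCCGTGATTGTGAGGCCTCCTCAGCCACGTGGAACTGTAAAAACTCCTGGAA
GAAAAGATCCTGCAATTT
"""

header, sequence = parse_fasta(example)[0]
accession = header.split("|")[3]  # fourth field of an NCBI-style header
print(accession)
print(len(sequence), "bases")
```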
FASTA and Word
Things to watch out for
> Save the file in "plain text" format
> Do not use tabs or other foreign characters
> Turn off the AutoCorrect and AutoText functions as well as smart cut and paste
Font
I recommend the "Courier New" font: every letter takes up the same width
Courier New 24
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTT
AAAGTTTACCCCTCAAAGGGACGTGTTCGAAAGAA
Arial 24
TTTCAAATGGGGAGTTTCCCTGCACAAGCTTTCTT
AAAGTTTACCCCTCAAAGGGACGTGTTCGAAAGAA
Caution: the abbreviations for nucleic acids (NA) and proteins are identical in some cases!
Input formats for computer processing must be specified so that the program can tell whether the data are NA or protein.
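The overlap between the two alphabets can be demonstrated in a few lines. This is only an illustrative sketch: the letter sets follow the code tables earlier in the lecture, and possible_types is a hypothetical helper, not part of any real tool.

```python
# One-letter codes taken from the nucleic acid and protein tables.
NUCLEIC_ACID_LETTERS = set("ACGTURYSWKMBDHVN")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYXBZ")


def possible_types(sequence):
    """List every molecule type whose alphabet covers the whole sequence."""
    letters = set(sequence.upper())
    kinds = []
    if letters <= NUCLEIC_ACID_LETTERS:
        kinds.append("nucleic acid")
    if letters <= PROTEIN_LETTERS:
        kinds.append("protein")
    return kinds


print(possible_types("GATTACA"))     # ['nucleic acid', 'protein'] - ambiguous
print(possible_types("MKVLAAGIVG"))  # ['protein']
print(possible_types("ACGU"))        # ['nucleic acid']
# Only U is exclusive to the nucleic acid alphabet:
print(sorted(NUCLEIC_ACID_LETTERS - PROTEIN_LETTERS))  # ['U']
```

Because almost every nucleotide code is also a valid amino acid code, a program cannot reliably guess the molecule type from the letters alone; this is why the type has to be stated explicitly.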
Molecular biology databases
European Bioinformatics Institute (EBI) in the United Kingdom
EMBL, 1980 www.ebi.ac.uk
National Center for Biotechnology Information (NCBI), established within the National Library of Medicine (NLM) in the USA
GenBank, 1982 www.ncbi.nlm.nih.gov
Center for Information Biology (CIB), a department of the National Institute of Genetics (NIG) in Japan
DDBJ, 1984
www.cib.nig.ac.jp
GenBank/EMBL/DDBJ
> They exchange information with one another
> They are freely available
> They accept new sequences from genome centers and from laboratories engaged in sequencing
Anyone can publish a sequence in these databases!
Protein sequence databases
The SWISS-PROT database, founded at the University of Geneva in 1986
Maintained by the Swiss Institute of Bioinformatics (SIB)
www.expasy.org
Contains automatically added translations of sequences from EMBL
The PDB database (Protein Data Bank)
Archives and analyzes protein structures and complexes of informational biomacromolecules
http://www.rcsb.org/pdb/home/home.do
Working with the NCBI database
www.ncbi.nlm.nih.gov
[Screenshot: NCBI (National Center for Biotechnology Information) home page. Left-hand menu: NCBI Home, Resource List (A-Z), All Resources, Chemicals & Bioassays, Data & Software, DNA & RNA, Domains & Structures, Genes & Expression, Genetics & Medicine, Genomes & Maps, Homology, Literature, Proteins, Sequence Analysis, Taxonomy, Training & Tutorials, Variation. Get Started links: Tools, Downloads, How-To's, Submissions. Popular Resources: PubMed, Bookshelf, PubMed Central, PubMed Health, BLAST, Nucleotide, Genome, SNP, Gene, Protein, PubChem. NCBI Announcements: New Microbial BLAST Page, 12 Jun 2012.]
Working with the NCBI database
[Screenshot: the All Resources page with the tabs All, Databases, Downloads, Submissions, Tools and How To]
Tools
1000 Genomes Browser
An interactive graphical viewer that allows users to explore variant calls, genotype calls and supporting evidence (such as aligned sequence reads) that have been produced by the 1000 Genomes Project.
ASN.1 Format Summary
An International Standards Organization (ISO) data representation format used to achieve interoperability between platforms. For data specifications and conversion tools, see NCBI Data Specification below.
Amino Acid Explorer
This tool allows users to explore the characteristics of amino acids by comparing their structural and chemical properties, predicting protein sequence changes caused by mutations, viewing common substitutions, and browsing the functions of given residues in conserved domains.
Assembly Archive
Links the raw sequence information found in the Trace Archive with assembly information found in publicly available sequence repositories (GenBank/EMBL/DDBJ). The Assembly Viewer allows a user to see the multiple sequence alignments as well as the actual sequence chromatogram.
BLAST Link (BLink)
A link option on protein records that displays the results of a pre-computed BLAST search of that protein against all other protein sequences at NCBI.
BLAST Microbial Genomes
Performs a BLAST search for similar sequences from selected complete eukaryotic and prokaryotic genomes.
BLAST RefSeqGene
Performs a BLAST search of the genomic sequences in the RefSeqGene/LRG set. The default display provides ready navigation to review alignments in the Graphics display.
BLAST Tutorials and Guides
This page links to a number of BLAST-related tutorials and guides, including a selection guide for BLAST algorithms, descriptions of BLAST output formats, explanations of the parameters for stand-alone BLAST, directions for setting up stand-alone BLAST on local machines and using the BLAST URL API.
Working with the NCBI database
[Screenshot: the BLAST (Basic Local Alignment Search Tool) page for microbial genomes, blastn suite, with the tabs blastn, blastx and tblastn. Form sections: Enter Query Sequence (accession number(s), gi(s) or FASTA sequence(s); query subrange From/To; file upload; job title), Choose Search Set (database: complete or draft genomes; organism; Entrez query) and Program Selection.]
You have reached the BLAST search tool
Other interesting tools
Searching for STSs
This interactive tool allows users to build E-utility URLs, either from a form or by hand, and then view their raw output. The tool provides a simple environment for testing E-utility URLs before including them in applications.
E-Utilities
Tools that provide access to data within NCBI's Entrez system outside of the regular web query interface. They provide a method of automating Entrez tasks within software applications. Each utility performs a specialized retrieval task, and can be used simply by writing a specially formatted URL.
Ebot
A tool that allows users to construct an E-utility analysis pipeline using an online form, and then generates a Perl script to execute the pipeline.
Electronic PCR (e-PCR)
A computational procedure that is used to identify sequence tagged sites (STSs) within DNA sequences. e-PCR looks for potential STSs in DNA sequences by searching for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing that could represent the PCR primers used to generate known STSs.
Frequency-weighted Link (FLink)
FLink is a tool that enables you to link from a group of records in a source database to a ranked list of associated records in a destination database based on frequency-weighted statistics.
Gene Expression Omnibus (GEO) BLAST
Tool for aligning a query sequence (nucleotide or protein) to GenBank sequences included on microarray or SAGE platforms in the GEO database.
Gene Plot
A tool for pairwise comparison of two prokaryotic genomes that displays pairs of protein homologs that are symmetrical best hits between the two genomes.
Genetic Codes
Displays the genetic codes for organisms in the Taxonomy database in tables and on a taxonomic tree.
Genome BLAST
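The E-Utilities entry above says that each utility can be used simply by writing a specially formatted URL. A minimal sketch of what such a URL could look like for the ESearch utility follows. The helper name esearch_url and the example query are illustrative assumptions, and the script only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, count_only=False):
    """Build an ESearch URL for an Entrez database and a query term."""
    params = {"db": db, "term": term}
    if count_only:
        # Ask only for the number of matching records.
        params["rettype"] = "count"
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


# Hypothetical query: how many nucleotide records exist for one organism?
print(esearch_url("nucleotide", "Thermus aquaticus[Organism]", count_only=True))
```

Opening the printed URL in a browser returns the raw XML answer, which is the same output a script would receive.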
Other interesting tools
Comparing two prokaryotic genomes: Gene Plot
Other interesting tools
Genetic code tables: Genetic Codes
Other interesting tools
Designing primers for PCR
PSSM Viewer
Allows users to display, sort, subset and download position-specific score matrices (PSSMs) either from CDD records or from Position Specific Iterated (PSI)-BLAST protein searches. The tool also can align a query protein to the PSSM and highlight positions of high conservation.
Phenotype-Genotype Integrator (PheGenI)
Supports finding human phenotype/genotype relationships with queries by phenotype, chromosome location, gene, and SNP identifiers. Currently includes information from dbGaP, the NHGRI GWAS Catalog, and GTeX. Displays results on the genome, on sequence, or in tables for download.
Primer-BLAST
The Primer-BLAST tool uses Primer3 to design PCR primers to a sequence template. The potential products are then automatically analyzed with a BLAST search against user specified databases, to check the specificity to the target intended.
ProSplign
A utility for computing alignment of proteins to genomic nucleotide sequence. It is based on a variation of the Needleman Wunsch global alignment algorithm and specifically accounts for introns and splice signals. Due to this algorithm, ProSplign is accurate in determining splice sites and tolerant to sequencing errors.
PubChem Power User Gateway (PUG)
PUG provides access to PubChem services via a programmatic interface. PUG allows users to download data, initiate chemical structure searches, standardize chemical structures and interact with the E-utilities. PUG can be accessed using either standard URLs or via SOAP.
PubChem Standardization Service
Standardization, in PubChem terminology, is the processing of chemical structures in the same way used to create PubChem Compound records from contributors' original structures. This service lets users see how PubChem would handle any structure they would like to submit.
PubChem Structure Search
PubChem Structure Search allows the PubChem Compound Database to be queried by chemical structure or chemical
Primer-BLAST
[Screenshot: the NCBI Primer-BLAST form for finding primers specific to your PCR template (using Primer3 and BLAST). PCR Template: enter an accession, gi, or FASTA sequence (a RefSeq record is preferred) or upload a FASTA file; optional From/To ranges for the forward and reverse primer. Primer Parameters: use my own forward primer (5'->3' on plus strand); use my own reverse primer (5'->3' on minus strand); PCR product size Min 70, Max 1000; number of primers to return 5; primer melting temperatures Min 57.0, Opt 60.0, Max 63.0, Max Tm difference 3. Exon/intron selection: exon junction span and exon junction match; a RefSeq mRNA sequence as PCR template input is required for the options in this section.]
Let us look at this page in detail
Design primers for identifying the 16S rRNA gene of Borrelia burgdorferi by PCR
> In the sequence input box, enter the accession number of a 16S rRNA sequence, e.g. HQ433693.1
> Use the default settings or change the parameters as you see fit
Sample result
Primer-BLAST
[Screenshot: Primer-BLAST results page]
Input PCR template: HQ433693.1 Borrelia burgdorferi strain QSYSP3 16S ribosomal RNA gene, partial sequence
Range: 1 - 481
Specificity of primers: primers may not be specific to the input PCR template as targets were found in selected database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, environmental samples or phase 0, 1 or 2 HTGS sequences)
Summary of primer pairs
	Sequence (5'->3')	Template strand	Length	Start	Stop	Tm	GC%	Self complementarity	Self 3' complementarity
Forward primer	GCGAAAGCCTGACGGAGCGA	Plus	20	322	341	59.77	65.00	3.00	0.00
Sample result
Detailed primer reports
Primer pair 1
	Sequence (5'->3')	Template strand	Length	Start	Stop	Tm	GC%	Self complementarity	Self 3' complementarity
Forward primer	GCGAAAGCCTGACGGAGCGA	Plus	20	322	341	59.77	65.00	3.00	0.00
Reverse primer	ATTACCGCGGCTGCTGGCAC	Minus	20	478	459	60.39	65.00	6.00	2.00
Product length	157
Products on intended target
>HQ433693.1 Borrelia burgdorferi strain QSYSP3 16S ribosomal RNA gene, partial sequence
product length = 157
Forward primer  1    GCGAAAGCCTGACGGAGCGA  20
Template        322  ....................  341
Reverse primer  1    ATTACCGCGGCTGCTGGCAC  20
Template        478  ....................  459
Products on potentially unintended templates
>EU135595.1 Borrelia valaisiana strain QSYSP3 16S ribosomal RNA gene, partial sequence
product length = 157
Forward primer  1    GCGAAAGCCTGACGGAGCGA  20
Template        350  ....................  369
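Two of the numbers in the report can be checked by hand: the GC content of each primer and the product length implied by the template positions 322 (forward) and 478 (reverse). The sketch below does just that; gc_percent and product_length are hypothetical helper names. The Tm values are not recomputed here, since Primer-BLAST obtains them from Primer3.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def product_length(forward_start, reverse_start):
    """Amplicon length from the template positions of the two primer 5' ends."""
    return reverse_start - forward_start + 1


forward = "GCGAAAGCCTGACGGAGCGA"
reverse = "ATTACCGCGGCTGCTGGCAC"

print(gc_percent(forward))       # 65.0
print(gc_percent(reverse))       # 65.0
print(product_length(322, 478))  # 157
```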
Find the sequence HQ433693.1 (16S rRNA of Borrelia burgdorferi) and mark the positions of the primers found on it
1) Enter "Borrelia burgdorferi 16S" in the BLAST search tool
2) Find the sequence HQ433693.1
3) You can also enter the accession number directly in the search box
Result
AGCATGCAAGTCAAACGGGATGTAGCAATACATCTAGTGGCGAAC
GGGTGAGTAACGCGTGGATGATCTACCTATGAGATGGGGATAACT
ATTAGAAATAGTAGCTAATACCGAATAAAGTCAATTAATTTGTTA
ATTGATGAAAGGAAGCCTTTAAAGCTTCGCTTGTAGATGAGTCTG
CGTCTTATTAGTTAGTTGGTAGGGTAAATGCCTACCAAGGCGATG
ATAAGTAACCGGCCTGAGAGGGTGAACGGTCACACTGGAACTGAG
ACACGGTCCAGACTCCTACGGGAGGCAGCAGCTAAGAATCTTCCG
CAATGGGCGAAAGCCTGACGGAGCGACACTGCGTGAATGAAGAAG
GTCGAAAGATTGTAAAATTCTTTTATAAATGAGGAATAAGCTTTG
TAGGAAATGACAAAGTGATGACGTTAATTTATGAATAAGCCCCGG
CTAATTACGTGCCAGCAGCCGCGGTAATACG
Forward 322-341
5'- GCGAAAGCCTGACGGAGCGA - 3'
Reverse 478-459
5'- ATTACCGCGGCTGCTGGCAC - 3'
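The primer positions can also be found with a short script instead of by eye. The sketch below searches the 481-nt sequence from the slide for the forward primer and for the reverse complement of the reverse primer; locate and reverse_complement are hypothetical helper names, and positions are 1-based and reported in the primer's own 5' to 3' direction, as in the Primer-BLAST report.

```python
# The 481-nt partial 16S rRNA sequence shown on the slide (HQ433693.1).
TEMPLATE = "".join([
    "AGCATGCAAGTCAAACGGGATGTAGCAATACATCTAGTGGCGAAC",
    "GGGTGAGTAACGCGTGGATGATCTACCTATGAGATGGGGATAACT",
    "ATTAGAAATAGTAGCTAATACCGAATAAAGTCAATTAATTTGTTA",
    "ATTGATGAAAGGAAGCCTTTAAAGCTTCGCTTGTAGATGAGTCTG",
    "CGTCTTATTAGTTAGTTGGTAGGGTAAATGCCTACCAAGGCGATG",
    "ATAAGTAACCGGCCTGAGAGGGTGAACGGTCACACTGGAACTGAG",
    "ACACGGTCCAGACTCCTACGGGAGGCAGCAGCTAAGAATCTTCCG",
    "CAATGGGCGAAAGCCTGACGGAGCGACACTGCGTGAATGAAGAAG",
    "GTCGAAAGATTGTAAAATTCTTTTATAAATGAGGAATAAGCTTTG",
    "TAGGAAATGACAAAGTGATGACGTTAATTTATGAATAAGCCCCGG",
    "CTAATTACGTGCCAGCAGCCGCGGTAATACG",
])

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate(template, primer):
    """Return (strand, start, stop) for a primer, or None if it is absent."""
    pos = template.find(primer)
    if pos != -1:
        return ("plus", pos + 1, pos + len(primer))
    pos = template.find(reverse_complement(primer))
    if pos != -1:
        # On the minus strand the primer's 5' end has the higher coordinate.
        return ("minus", pos + len(primer), pos + 1)
    return None


print(locate(TEMPLATE, "GCGAAAGCCTGACGGAGCGA"))  # ('plus', 322, 341)
print(locate(TEMPLATE, "ATTACCGCGGCTGCTGGCAC"))  # ('minus', 478, 459)
```

Only the first exact match is reported; a real specificity check, as Primer-BLAST performs, also has to consider mismatches and other templates.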
Other interesting tools
Taxonomy
A utility for computing cDNA-to-genomic sequence alignments. It is based on a variation of the Needleman-Wunsch global alignment algorithm and specifically accounts for introns and splice signals. Due to this algorithm, Splign is accurate in determining splice sites and tolerant to sequencing errors.
TaxPlot
A tool for comparing genomes on the basis of the protein sequences they encode. To use TaxPlot, one selects a reference genome and two species for comparison. Pre-computed BLAST results are then used to plot a point for each predicted protein in the reference genome, based on the best alignment with proteins in each of the two genomes being compared.
Taxonomy Browser
Supports searching the taxonomy tree using partial taxonomic names, common names, wild cards and phonetically similar names. For each taxonomic node, the tool provides links to all data in Entrez for that node, displays the lineage, and provides links to external sites related to the node.
Taxonomy Common Tree
Generates a taxonomic tree for a selected group of organisms. Users can upload a file of taxonomy IDs or names, or they can enter names or IDs directly.
Taxonomy Statistics
Displays the number of taxonomic nodes in the database for a given rank and date of inclusion.
Taxonomy Status Reports
Displays the current status of a set of taxonomic nodes or IDs.
Variation Reporter
A tool designed to search for and report human sequence variation data from dbSNP and dbVar. Individual variations or batch files can be submitted in HGVS, GVF or BED formats. Related information will be retrieved and reported in a downloadable table containing variation identifiers, nucleotide and cytogenetic band locations on various genomic assemblies, allele type and minor allele frequencies, predicted functional consequences (missense, nonsense, frameshift, splice site, etc.), reported clinical significance, and relevant citations.
VecScreen
A system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. VecScreen searches a
How many DNA sequence records and how many protein sequence records does the database hold for the species Thermus aquaticus?
At the end of June 2012 there were 338 DNA records and 562 (5 641) protein records
How to work with the tools
[Screenshot: the All Resources page with the How To tab selected]
How To
Find bioassays in which a given drug is active
Find bioassays that test a particular disease or protein target
Submit data to NCBI
Save text searches and set up automated searches with E-mail
Download NCBI Software
Retrieve all sequences for an organism or taxon
Find the function of a gene or gene product
Find expression patterns
Find genes associated with a phenotype or disease
Compare protein homologs between two microbial genomes
View/download features around an object or between two objects on a chromosome
Find sequenced genomes, including those in progress, for a taxonomic group
Download the complete genome for an organism
Display genomic annotation graphically
Submit sequence data to NCBI
Convert feature coordinates between genomic assemblies
Determine conserved synteny between the genomes of two organisms
Find a homolog for a gene in another organism
Obtain the full text of an article
(we will see this later)
Comparing proteins between two genomes
How to: Compare protein homologs between two microbial genomes
Starting with the Prokaryotic Genome Project homepage...
FOR TWO ORGANISMS
1. Scroll down to find the genome of interest.
2. Click the NC_ accession link from the RefSeq column.
3. Click GenePlot (if available) from the BLAST homologs column of the resulting table interface.
4. Select the two organisms of choice and then click "Compare Selected Pair".
FOR THREE ORGANISMS
1. Proceed as in Steps 1 and 2 above.
2. Select TaxPlot from the BLAST homologs column of the resulting table interface.
3. Select two other organisms from the drop-down menus below the selected genome of interest.
4. Click the "compare" button located just below the graphical plot.
How to work with the tools (continued)
Find articles about a topic similar to that in a given article
View the 3D structure of a protein
Find a curated version of a sequence record (NCBI Reference Sequence)
Align two or more 3D structures to a given structure
Find published information on a gene or sequence
Find transcript sequences for a gene
Link from an object on a map to another resource
Design PCR primers and check them for specificity
Automate BLAST searches performed on NCBI servers
Obtain genomic sequence for/near a gene, marker, transcript or protein
Compare your sequence to the RefSeqGene/LRG standard
Run BLAST software on a local computer
Submit multiple query sequences in a single BLAST search
Find the complete taxonomic lineage for an organism
Generate a Common Tree for a set of taxa
Complete an NCBI tutorial
Find out what's new at NCBI
Learn about an NCBI resource
Learn about the basics of molecular biology and bioinformatics
Download a large, custom set of records from NCBI
Find human variations associated with a phenotype or disease (clinical association)
View a mutation site in a 3D structure
View all SNPs associated with a gene
View genotype frequency data for a gene, disease or short genetic variation
The PubMed database
How to: Obtain the full text of an article
Please note that there is a YouTube tutorial about this.
Starting with an abstract in PubMed...
1. Search the PubMed with a search term, author name, or PubMed ID. Author name can be entered as follows: smith aj[au].
2. Click on the title of an entry of interest.
3. Look for icons in the upper-right-hand corner of the record:
■ Click on the PubMed Central link or a Publisher's link to access the full text of the article. Articles in PubMed Central are freely available. Articles on Publisher's websites are either freely available or can be accessed with a fee. Contact the specific publisher for questions about their site.
■ For PubMed records with no icons in the upper-right-hand corner, Loansome Doc can be accessed to order the article following these directions: PubMed Help.
Databáze Pub Med
Resources M  How To M
PublQed
US National Library of Mediáne National institutes of Health
PubMed
Advanced
MyNCBI Sign In
Help
PubMed
PubMed comprises more than 21 million citations for biomedical literature from MEDLINE, life science journals, and online books Citations may include links to full-text content from PubMed Central and publisher web sites.
Find publications on Deinococcus radiodurans
How many reviews does the database contain?
1) At the end of June 2012 there were about 962 publications
2) Of these, 52 were reviews
3) Note that only some of them are freely available
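The same PubMed query can also be expressed as a URL for NCBI's E-utilities (ESearch). The sketch below only builds the URLs and sends nothing over the network; the helper name `pubmed_search_url` and the use of the `review[pt]` publication-type tag to restrict the query to reviews are illustrative choices, not part of the lecture.

```python
from urllib.parse import urlencode

# Base address of the ESearch service of NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, reviews_only=False, retmax=20):
    """Build an ESearch URL for PubMed (hypothetical helper, no request is made)."""
    if reviews_only:
        # [pt] is the PubMed field tag for Publication Type.
        term = f"({term}) AND review[pt]"
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


all_url = pubmed_search_url("Deinococcus radiodurans")
review_url = pubmed_search_url("Deinococcus radiodurans", reviews_only=True)
print(all_url)
print(review_url)
```

Opening either URL in a browser returns an XML document whose Count element holds the number of matching records; the counts will differ from the June 2012 figures quoted above.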
3D protein structures
How to: View the 3D structure of a protein
Starting with...
A PDB CODE
1. Go to the Structure Home Page.
2. …
3. Click the structure image, and on the resulting page click the "Structure View in Cn3D" button.
A PDB-FORMAT FILE THAT IS NOT IN PDB
1. Go to the VAST search page.
2. Enter or browse for the PDB file name and click the Submit button.
3. Click the "View 3D Structure" button on the next page.
A PROTEIN ACCESSION NUMBER (e.g. NP_000240) OR SEQUENCE
1. Use the Finding a Structural Template guide to find the most appropriate PDB structure.
2. Continue with step 1 under "a PDB code" above.
3D protein structures
Three dimensional structures provide a wealth of information on the biological function and the evolutionary history of macromolecules. They can be used to examine sequence-structure-function relationships, interactions, active sites, and more.
Find the structure of a mycobacterial catalase
How many records do you find?
1) Search term "catalase Mycobacterium"
2) At the end of June 2012 there were 46, all obtained from X-ray crystallographic data, none by NMR
Comparing a sequence with reference sequences
How to: Compare your sequence to the RefSeqGene/LRG standard
Starting with...
1. From the RefSeqGene homepage, click on RefSeqGene BLAST in the Tools section.
2. …
3. Review the results as aligned to the RefSeqGene records by clicking on the Graphics in the Descriptions table.
4. If you submitted more than one query sequence and would like to review the alignment of a particular sequence, click on "Configure", select your chosen alignment and remove the check box in front of the alignment you don't want displayed. Then click on "Configure" at the bottom of the page to apply your revised selections.
5. If you identify any differences between your sequence and the RefSeqGene, you can evaluate whether others have reported sequence variation in that region by reviewing the variation annotated on the RefSeqGene.
Comparing a sequence with reference sequences
[Screenshot: the RefSeqGene Nucleotide BLAST (blastn) form with the fields "Enter Query Sequence" (accession number(s), gi(s), or FASTA sequence(s)), "Job Title" and "Choose Search Set"; database: Reference genomic sequences (refseq_genomic)]
Copy the sequence below and compare it with the database of reference sequences. Which organism does it belong to?
1) ATGAGTGAAATGAAATGCCCTTATGACCATACCAACTTGACCATGAGTAATGGCGCGCCTGTTATTGACA
2) ACCAAAATTCAATGACCGCAGGTGCCAGAGGGCCACTGCTTGCCCAAGATTTATGGCTCAATGAAAAATT
3) AGCCGACTTTGCCCGTGAGGTCATTCCAGAACGCCGCATGCACGCCAAAGGCTCAGGCGCATTTGGCACA
4) TTCACGGTAACGCACGACATCACCCAATACACCCGTGCTAAGATTTTTAGTGAAGTTGGCAAAAAAACTG
5) AGATGTTCGCTCGTTTTACCACCGTAGCAGGCGAGCGGGGGGCGGCGGACGCTGAGCGTGATATCCGTGG
6) TTTTGCCCTAAAATTCTACACCGAAGAGGGTAATTGGGACATGGTGGGTAATAACACGCCTGTTTTCTTT
7) TTAAGAGACCCAAAAAAATTCCCTGATTTAAATAAAGCGGTCAAACGAGACCCACGCACCAACATGCGTT
8) CTGCCACCAATAACTGGGATTTTTGGACACTGCTGCCAGAGGCGTTTCATCAGGTGACCATTGTGATGAG
9) CGACCGTGGCATTCCTAAATCTTACCGTCATATGCACGGCTTTGGCTCGCACACTTATAGCTTTATCAAT
10) GCTGATAATGAACGCTTTTGGGTCAAATTTCACTTTCGCACCCAACAAGGCATTGAAAATCTAACCGATG
11) CCGAAGCTGAAATGGTGGTTGGTAAAGACCGTGAGAGCAATCAGCGTGATTTGTTTGATGCCATTGAGCG
12) TGGCGATTTCCCAAAATGGACAATGTATGTGCAAATCATGCCAGAAACCGATGCCCAAACTGTGCCTTAT
13) CACCCATTTGATTTAACCAAAGTGTGGCCAAAAGGCGACTATCCGCTCATTGAAGTGGGTGAGTTTGAGT
14) TAAATAAAAATCCTGAAAACTTCTTTTTAGACGTTGAACAATCCGCTTTTGCCCCAAGCAACCTAGTCCC
15) GGGCATCAGTGTGTCCCCTGACCGCATGCTCCAAGCACGCCTATTTAACTATGCTGATGCGCAGCGTTAT
16) CGTTTGGGCGTCAATCGTAACCAAATTCCAGTGAATGCCCCACGCTGTCCTGTGTACTCAAACCAAAGAG
17) ACGGACAAGGGCGAGTGGGCGATAACTATGGCGGTCGTCCGCACTATGAACCGAACAGTTTTGGACAATG
18) GCAAGACCAGCCGCATTTGGCTGAACCAGCATTAAAAATTCATGGCGATGCTAAGTTTTGGGATTATCGT
19) GAGAATGATGATGATTATTTTAGCCAACCCAGAGCCTTGTTTGAGTTGATGAGCGATGAGCAAAAACAGG
20) CGTTATTTGGTAATACGGCTCGTGCGATGGGCGATGCCCCTGATTTTATTAAATACCGCCATATCCGTAA
21) TTGCGATAAATGCCACCCTGATTATGCCATGGGTGTGGCCAAAGCGTTAGGCCTTACGGTTGAAGATGCC
22) AAAAATGCGTATGAGAGCGACCCTGCTCGCCATCTGCCCAGCTTTTTATA
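Before pasting, the line counters ("1)", "2)", ...) and any stray spaces are best removed from the sequence. A minimal sketch of such a clean-up follows; the function name `numbered_lines_to_fasta` and the header text are made up for illustration, and the two-line example is only a shortened excerpt of the sequence above.

```python
import re


def numbered_lines_to_fasta(text, header="query", width=70):
    """Strip leading 'N) ' counters and whitespace and return one FASTA record."""
    chunks = []
    for line in text.splitlines():
        line = re.sub(r"^\s*\d+\)\s*", "", line)   # drop a counter such as '12) '
        chunks.append(re.sub(r"\s+", "", line))     # drop spaces inside the line
    seq = "".join(chunks).upper()
    # Accept only IUPAC nucleotide codes; anything else is probably a copy error.
    unexpected = set(seq) - set("ACGTUNRYSWKMBDHV")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{header}\n{body}\n"


example = """1) ATGAGTGAAATG
2) CCGAAGCTG AAATGG"""
fasta = numbered_lines_to_fasta(example, header="unknown_sequence")
print(fasta)
```

The resulting text can be pasted directly into the "Enter Query Sequence" box of a BLAST form.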
You should get something like the result shown on the next slide
Distribution of 5 BLAST hits on the query sequence
Sequences producing significant alignments:
Accession	Description	Max score	Total score	Query coverage	E-value	Max ident
NC_014147.1	Moraxella catarrhalis RH4 chromosome, complete genome	2808	2808	100%	0.0	100%
NC_015460.1	Gallibacterium anatis UMN179 chromosome, complete genome	763	763	83%	0.0	78%
NC_009524.1	Psychrobacter sp. PRwf-1 chromosome, complete genome	695	695	87%	0.0	76%
NC_014752.1	Neisseria lactamica 020-06 chromosome, complete genome	553	553	89%	7e-153	74%
NC_010332.1	Lysinibacillus sphaericus C3-41 chromosome, complete genome	333	333	56%	1e-86	74%
Working with the NCBI database
www.ncbi.nlm.nih.gov
Welcome to NCBI
The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
Get Started
Tools: Analyze data using NCBI software
Downloads: Get NCBI data or software
How-To's: Learn how to accomplish specific tasks at NCBI
Submissions: Submit data to GenBank or other NCBI databases
Instructions for submitting your own data
How to: Submit data to NCBI
Starting with...
SEQUENCE DATA
For guidance on the submission process for your sequence(s), please see How To: Submit sequence data to NCBI. Your data will be submitted to one of the following databases:
GenBank
Sequence Read Archive (SRA)
dbSNP
dbVar
GEO
MICROARRAY DATA
If you have microarray data from clinical studies that require controlled access, you should submit your data to dbGaP. For all other microarray data, you should submit your data to GEO via GEO's Submission page.
BIOASSAY DATA, SUBSTANCE OR SEQUENCE-BASED REAGENTS
BioAssay data and chemical substance information should be submitted to PubChem via their PubChem Deposition Gateway.
Assessing sequence similarity
We look for homologous sequences that arose in the course of evolution
Task: Which pair is more similar, sequences A and B or sequences C and D?
Starting sequences
A = ATTGCTCTGT
B = ATAGCTCGGT
C = ATTGCACTGTAATGCCATGT
D = ATTGCTCTGAAATGCCCTGT
Assessing sequence similarity
We place the sequences side by side = alignment
A = ATTGCTCTGT
    || |||| ||
B = ATAGCTCGGT
| = match, blank = mismatch
C = ATTGCACTGTAATGCCATGT
    ||||| ||| |||||| |||
D = ATTGCTCTGAAATGCCCTGT
Assessing sequence similarity
Calculating the normalized similarity value (score)
A = ATTGCTCTGT
    || |||| ||
B = ATAGCTCGGT
S_AB = (8 × 1 + 2 × 0) / 10 = 0.80
where 8 = number of matches, 1 = value of a match, 2 = number of mismatches, 0 = value of a mismatch, 10 = number of positions
Assessing sequence similarity
C = ATTGCACTGTAATGCCATGT
    ||||| ||| |||||| |||
D = ATTGCTCTGAAATGCCCTGT
S_CD = (17 × 1 + 3 × 0) / 20 = 0.85
0.85 > 0.80 -> C and D are more similar to each other
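The calculation for both pairs can be reproduced with a few lines of Python. This is a minimal sketch of the ungapped score defined on the slides (match value 1, mismatch value 0, divided by the number of positions); the function name `similarity` is an illustrative choice.

```python
def similarity(seq_a, seq_b, match=1.0, mismatch=0.0):
    """Normalized similarity of two equally long, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("this ungapped score needs two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    mismatches = len(seq_a) - matches
    # (number of matches x match value + number of mismatches x mismatch value) / positions
    return (matches * match + mismatches * mismatch) / len(seq_a)


s_ab = similarity("ATTGCTCTGT", "ATAGCTCGGT")
s_cd = similarity("ATTGCACTGTAATGCCATGT", "ATTGCTCTGAAATGCCCTGT")
print(f"S_AB = {s_ab:.2f}, S_CD = {s_cd:.2f}")
```

Because the score is divided by the number of positions, pairs of different length (10 and 20 positions here) can be compared directly.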
Global and local alignment
The problem of sequences of different length, or of very different sequences of the same length
Global alignment
> The sequences are aligned along their whole length, even at the cost of introducing gaps
> Suitable only for related sequences
> Suitable for multiple alignments
Local alignment
> The sequences are aligned only where they are very similar; the rest is ignored
> Suitable for unrelated sequences
> For similar sequences it gives the same result as a global alignment
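The difference between the two approaches can be illustrated with the classical dynamic-programming algorithms: Needleman-Wunsch for global and Smith-Waterman for local alignment. The slides do not name these algorithms at this point, so the sketch below is an illustration only; the scoring values (match +1, mismatch -1, gap -1) and the test sequences are arbitrary assumptions, and only the best score is computed, not the alignment itself.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best alignment score found by dynamic programming.

    local=False: global alignment (Needleman-Wunsch), whole length of both sequences.
    local=True:  local alignment (Smith-Waterman), best-matching region only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment has to pay for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)   # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]


global_small = alignment_score("ACGT", "AGT")                    # one gap is needed
local_score = alignment_score("TTTACGTTT", "GGACGGG", local=True)
global_score = alignment_score("TTTACGTTT", "GGACGGG")
print(global_small, local_score, global_score)
```

For the second pair the local score stays high because only the shared "ACG" region is counted, whereas the global score is pulled down by the unrelated flanks.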
Global and local alignment
Global alignment
SLAV----------APATNIK-------PIQNYR-I------AKSETQRYMVIE
SLAVYTYIEFVRANAPATNIKSECVRAAPIQNYRRVEHVRATAKSETQRYMVIE
Local alignment
SLAVYTYIEFVRANAPATNIKSECVRAAPIQNYRRVEHVRATAKSETQRYMVIE
-------------NAPATNIKSECVRA-PIQNYRRVEHVRA-------------
Dot plot
A graphical map of sequence similarities; an aid for choosing the type of alignment
[Figure: dot plots of the sequence ATTGATCGGTCTTG against a second sequence. Left panel: all matches found. Right panel: after filtering out short diagonals.]
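A dot plot and the filtering of short diagonals can be sketched in plain Python. The function names and the minimum run length of 3 are assumptions made for this illustration; real dot-plot programs usually filter with a sliding window and a score threshold instead of requiring exact runs.

```python
def dot_matrix(horizontal, vertical):
    """One row per residue of `vertical`; True where the two residues are identical."""
    return [[v == h for h in horizontal] for v in vertical]


def filter_short_diagonals(dots, min_run=3):
    """Keep only dots that lie on a diagonal run of at least `min_run` dots."""
    rows = len(dots)
    cols = len(dots[0]) if dots else 0
    kept = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Process each run once, starting from its top-left end.
            if not dots[i][j] or (i > 0 and j > 0 and dots[i - 1][j - 1]):
                continue
            length = 0
            while i + length < rows and j + length < cols and dots[i + length][j + length]:
                length += 1
            if length >= min_run:
                for k in range(length):
                    kept[i + k][j + k] = True
    return kept


def render(dots, horizontal, vertical):
    """Text rendering: '*' for a dot, '.' for an empty cell."""
    lines = ["  " + horizontal]
    for v, row in zip(vertical, dots):
        lines.append(v + " " + "".join("*" if d else "." for d in row))
    return "\n".join(lines)


seq = "ATTGC"
raw = dot_matrix(seq, seq)
filtered = filter_short_diagonals(raw)
print(render(raw, seq, seq))
print()
print(render(filtered, seq, seq))
```

In the unfiltered plot the two neighbouring T residues produce isolated off-diagonal dots; after filtering only the main diagonal remains.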
Choosing the alignment algorithm
[Figure: dot plots of the compared sequence pairs]
A global alignment is possible only for the pair A-B
Search tools
FASTA
> A model heuristic algorithm
> Created in 1988
> Little used today; more powerful methods exist
BLAST
> The most widely used heuristic algorithm
> Created in 1990
> About 6× faster than FASTA
BLAST
Basic Local Alignment Search Tool http://blast.ncbi.nlm.nih.gov/Blast.cgi
This search tool goes through the whole database, and we have already used it several times
BLAST
Basic BLAST
Choose a BLAST program to run.
nucleotide blast: Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
protein blast: Search protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast, delta-blast
blastx: Search protein database using a translated nucleotide query
tblastn: Search translated nucleotide database using a protein query
tblastx: Search translated nucleotide database using a translated nucleotide query
Specialized BLAST
Choose a type of specialized search (or database name in parentheses.)
□ Make specific primers with Primer-BLAST
□ Search trace archives
□ Find conserved domains in your sequence (cds)
□ Find sequences with similar conserved domain architecture (cdart)
□ Search sequences that have gene expression profiles (GEO)
□ Search immunoglobulins (IgBLAST)
□ Search using SNP flanks
□ Screen sequence for vector contamination (vecscreen)
□ Align two (or more) sequences using BLAST (bl2seq)
□ Search protein or nucleotide targets in PubChem BioAssay
Uses of the BLAST variants
Program	Query	Database	Level of comparison	Use
blastn	DNA	DNA	DNA	Finding identical DNA sequences
blastp	protein	protein	protein	Finding homologous proteins
blastx	DNA*	protein	protein	Finding genes and homologous proteins in new DNA
tblastn	protein	DNA*	protein	Finding genes in uncharacterized DNA
tblastx	DNA*	DNA*	protein	Studying gene structure
* Translated DNA sequences are compared in all reading frames
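The choice of program follows mechanically from the type of the query and of the database. A small lookup sketch is given below; the function `choose_blast` and its `translate_both` flag are hypothetical names used only for this illustration.

```python
# (query type, database type) -> program, for the four unambiguous combinations
BLAST_PROGRAMS = {
    ("dna", "dna"): "blastn",
    ("protein", "protein"): "blastp",
    ("dna", "protein"): "blastx",      # query translated in all reading frames
    ("protein", "dna"): "tblastn",     # database translated in all reading frames
}


def choose_blast(query, database, translate_both=False):
    """Pick a BLAST program for the given query and database types."""
    query, database = query.lower(), database.lower()
    if translate_both:
        # tblastx compares DNA with DNA, but at the protein level.
        if (query, database) != ("dna", "dna"):
            raise ValueError("tblastx needs a DNA query and a DNA database")
        return "tblastx"
    return BLAST_PROGRAMS[(query, database)]


print(choose_blast("DNA", "protein"))
print(choose_blast("DNA", "DNA", translate_both=True))
```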
Data files
They are uniform across all the databases mentioned
> Every record has an accession code (Accession Number): a variable number of letters and digits, depending on which database accepted the record. It works rather like a personal identification number
> On publication in GenBank the record receives a unique GI number (GenInfo Identifier), comparable to an identity card number
> The authors of the primary record can revise it, which creates versions; the first version is numbered 1
> A change of version changes the GI number
> All versions are kept
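The accession, version and GI number can be separated with a regular expression. The sketch below assumes the "ACCESSION.version GI:number" layout of the VERSION line; the pattern and the names `SeqVersion` and `parse_version` are illustrative, the pattern is not an official grammar, and the GI part is treated as optional. The first example line is copied from the record header on the slides, the second identifier from the FASTA header shown earlier in the lecture.

```python
import re
from typing import NamedTuple, Optional


class SeqVersion(NamedTuple):
    accession: str
    version: int
    gi: Optional[int]


# Assumed layout: optional 'VERSION' keyword, ACCESSION.version, optional 'GI:number'.
_VERSION_RE = re.compile(r"^(?:VERSION\s+)?([A-Z]{1,6}_?\d+)\.(\d+)(?:\s+GI:(\d+))?$")


def parse_version(line):
    """Split a VERSION line (or a bare 'accession.version') into its parts."""
    found = _VERSION_RE.match(line.strip())
    if found is None:
        raise ValueError(f"not an accession.version identifier: {line!r}")
    accession, version, gi = found.groups()
    return SeqVersion(accession, int(version), int(gi) if gi else None)


record = parse_version("VERSION     AF369936.1  GI:14210032")
refseq = parse_version("NM_001888.3")
print(record)
print(refseq)
```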
Record header
[Screenshot: the Nucleotide record "Mycobacterium avium insertion element hot spot flanking region FR300" (GenBank: AF369936.1); the labels on the slide mark the accession number, the name, the record type, the version and the GI number]
LOCUS       AF369936     312 bp    DNA     linear   BCT 27-MAY-2001
DEFINITION  Mycobacterium avium insertion element hot spot flanking region FR300.
ACCESSION   AF369936
VERSION     AF369936.1  GI:14210032
gb = GenBank, emb = EMBL, dbj = DDBJ
Sometimes the same region is sequenced independently by several groups; it is then present in the database in several forms, with different accession numbers and often under different names!
Anatomy of a database record
[Screenshot: the Nucleotide record "Mycobacterium avium insertion element hot spot flanking region FR300", GenBank: AF369936.1]
LOCUS       AF369936     312 bp    DNA     linear   BCT 27-MAY-2001
DEFINITION  Mycobacterium avium insertion element hot spot flanking region
            FR300.
ACCESSION   AF369936
VERSION     AF369936.1  GI:14210032
KEYWORDS    .
SOURCE      Mycobacterium avium
  ORGANISM  Mycobacterium avium
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium avium complex (MAC).
REFERENCE   1  (bases 1 to 312)
  AUTHORS   Bartos,M., Svastova,P., Dvorska,L., Weston,R.T. and Pavlik,I.
  TITLE     Insertion element IS901 hot spot FR300
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 312)
  AUTHORS   Bartos,M., Svastova,P., Dvorska,L., Weston,R.T. and Pavlik,I.
  TITLE     Direct Submission
  JOURNAL   Submitted (13-APR-2001) Department of Bacteriology, Veterinary
            Research Institute, Hudcova 70, Brno 621 32, Czech Republic
FEATURES             Location/Qualifiers
     source          1..312
                     /organism="Mycobacterium avium"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:1764"
                     1..312
                     /note="insertion element hot spot flanking region FR300;
                     contains hot spot for IS901 insertion."
ORIGIN
[312 bp nucleotide sequence, not legible in the extracted slide]
Anatomy of a database record
Mycobacterium avium FR300
Neisseria gonorrhoeae
The bl2seq program
Comparing two or more sequences
Specialized BLAST
Choose a type of specialized search (or database name in parentheses.)
□ Make specific primers with Primer-BLAST
□ Search trace archives
□ Find conserved domains in your sequence (cds)
□ Find sequences with similar conserved domain architecture (cdart)
□ Search sequences that have gene expression profiles (GEO)
□ Search immunoglobulins (IgBLAST)
□ Search using SNP flanks
□ Align two (or more) sequences using BLAST (bl2seq)
□ Search SRA transcript and genomic libraries
□ Constraint Based Protein Multiple Alignment Tool
□ Needleman-Wunsch Global Sequence Alignment Tool
□ Search RefSeqGene
□ Search WGS sequences grouped by organism
The bl2seq program
[Screenshot: the blastn form with "Align two or more sequences" ticked. Fields: Enter Query Sequence, Enter Subject Sequence (each by accession number, gi or FASTA sequence, or by file upload), Query subrange, Subject subrange, and Program Selection: highly similar sequences (megablast), more dissimilar sequences (discontiguous megablast), somewhat similar sequences (blastn)]
Result of comparing two sequences
[Screenshot: BLAST 2 sequences result page. Query: nucleic acid, length 774. Subject: nucleic acid, length 589. Program: BLASTN 2.2.26+. Below it: the Graphic Summary with the color key for alignment scores and the Dot Matrix View of the query plotted against the subject]
Result of comparing two sequences
[Screenshot: Descriptions table and Alignments section of the same result. First alignment: Score = 1057 bits, Expect = 0.0, Strand = Plus/Plus; a second, shorter alignment follows]
Identities = the fraction of identical positions
Result of comparing two sequences
[Screenshot: the same Descriptions table and Alignments section, with the Score line highlighted]
Score (the similarity value found) = if it reaches the chosen threshold (cutoff), the program records the alignment as an HSP (high-scoring segment pair); otherwise it discards it
Result of comparing two sequences
[Screenshot: the same Descriptions table and Alignments section, with the Expect value of the second alignment highlighted]
Expectancy, E-value (expectation value) = 8e-45, i.e. 8 × 10^-45; values below 0.001 are considered significant
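The two quantities read off a BLAST report, the E-value and the identities, are easy to check programmatically. A minimal sketch follows: the 0.001 cutoff is the one quoted above, the identity counts 368/371 are those of the practice exercise later in the lecture, and the function names are illustrative.

```python
def is_significant(e_value, cutoff=1e-3):
    """True if the E-value is below the significance cutoff."""
    return e_value < cutoff


def percent_identity(identical, aligned_positions):
    """Identities as a percentage of the aligned positions."""
    if aligned_positions <= 0:
        raise ValueError("aligned_positions must be positive")
    return 100.0 * identical / aligned_positions


e_value = float("8e-45")             # BLAST prints E-values in this notation
print(is_significant(e_value))       # far below the cutoff
print(is_significant(0.05))          # too high to count as evidence
print(f"{percent_identity(368, 371):.0f}%")
```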
Something extra for practising BLAST
Search the database and find out which organism the following sequence belongs to
GCTTTCGCACATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG TACTCTAGACTCCCAGTCTGAAATGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCCCAGTTATTCCGATTAACGCT CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGGCATGGCTGCGTCAGGGTTCCC CCCATTGCGCAATATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGACCGTGTCTCA GTTCCAGTGTGGCTGGTCATCCTCTCAGACCAGCTAGAGATCGCAGGCTTGGTAGG CCTTTACCCCACCAACTACCTAATCCCACTTGGGCTCATCTTATGGCAGGTGGCCC TAAGGTCCCACCCTTTCCTCCTCAGAGAATACGCGGTATTAGCTGCAGTTTCCCAC AGTTATCCCCCTCCATAAGCCAGATTCCCAAGCATTACTCACCCGTCCGCCACTCG TCAGCAAAGAAAGCAAGCTTTCTTCCTGCTACCGTTCGACTTGCATGTGTTAAGCC TGCCGCCAGCGTTCAATCTGAGCCAGGATCAACNTCTTTCTCCAAA
It should be Pasteurella multocida
Compare these two sequences: do they belong to the same species?
GCTTTCGCACATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG TACTCTAGACTCCCAGTCTGAAATGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCCCAGTTATTCCGATTAACGCT CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGG
GCTTTCGCGCATGAGCGTCAGTACATTCCCAAGGGGCTGCCTTCGCCTTCGGTATT CCTCCACATCTCTACGCATTTCACCGCTACACGTGGAATTCTACCCCTCCCTAAAG TACTCTAGACTCCCAGTCTGAAAAGCAGTTCCCAAGTTAAGCTCGGGGATTTCACA TCTCACTTAAAAGTCCGCCTGCGTGCCCTTTACGCGCAGTTATTCCGATTAACGCT CGCACCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGTGCTTCTTCTGT AATTAACGTCAATGATGCTATCTATTTAACAACATCCCTTCCTCATTACCGAAAGA ACTTTACAACCCGAAGGCCTTCTTCATTCACGCGG
YES, identity 368/371, 99%
We dealt with this more than enough in the third-year practical classes
Multiple alignment
> One example of its use is comparing several sequences at once
CLUSTAL
> CLUSTAL W = generally available
> CLUSTAL X = CLUSTAL W with a graphical interface for Windows
> CLUSTAL OMEGA = the latest version
http://www.clustal.org
Summary
1) Working with sequence data
2) Basic publicly available databases
3) Working with the NCBI pages
4) How sequence similarity is assessed
5) The BLAST search tool
6) Multiple alignment: the CLUSTAL program