Week 5: Sequence Alignment
Introduction to Bioinformatics (LF:DSIB01)

BLAST - Basic Local Alignment Search Tool
Journal of Molecular Biology (JMB), Volume 215, Issue 3, published 1990, ~80,000 citations
-BLAST is a method for performing local alignment
-It starts from a short seed (word) taken from the query that must match the reference, then extends the alignment around it
-Extension stops once the alignment score falls a set amount below the best score reached so far

-Step 1: Break the query into short words of a specific length W
-Step 2: Search for each word in a database built from the reference
-Step 3: Keep the seeds that pass the score threshold and extend them
-Step 4: Calculate E (Expect value): the number of alignments scoring at least this well that would be expected by chance in a database of this size (lower is better)

BLAST - Basic Local Alignment Search Tool
•https://blast.ncbi.nlm.nih.gov/Blast.cgi
-Useful information about the alignment(s)
-Works well for a single sequence

BLAST variations – old standalone programs
•BLASTN - Compares a DNA query to a DNA database. Searches both strands automatically. It is optimized for speed rather than sensitivity.
•BLASTP - Compares a protein query to a protein database.
•BLASTX - Compares a DNA query to a protein database by translating the query in all 6 possible reading frames (3 from each strand of the DNA) and comparing each translation against the database.
•TBLASTN - Compares a protein query to a DNA database translated in all 6 possible reading frames.
•TBLASTX - Compares the proteins encoded in a DNA query to the proteins encoded in a DNA database, in the 6*6 possible frame combinations of query and database sequences (note that each combination of frames may score differently).
•BLAST2 - Also called advanced BLAST. It can perform gapped alignments.
•PSI-BLAST - (Position-Specific Iterated) Performs iterative database searches.
(1999)

BLAST variations – web services and API
(2019)

BWA - Burrows-Wheeler Aligner
BWA, 2009
•BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome.
•Variants: BWA-backtrack for short sequences (up to 100 bp); BWA-SW and BWA-MEM for longer sequences (70 bp to 1,000,000 bp)
•Uses the Burrows-Wheeler Transform to make alignment faster

Burrows Wheeler Transform
Figure 1 a) Append the end marker $ to the string, sort every rotation (cyclic shift) of it, then take the last column. That column is the transform.
Each character in the last column is effectively sorted by the text that comes AFTER it (its right context).

Burrows Wheeler Transform
Important property: LF (Last-First) mapping
The i-th occurrence of a letter in the Last column is the same text character as the i-th occurrence of that letter in the First column, so the relative order of identical letters is the same in both columns.
Next step: sort the table by letter AND index: $ A1 A2 A3 C1 C2 G1

Burrows Wheeler Transform - Reversing
Figure 1 b) The original sequence can be recreated by walking backwards through the transform, following the LF mapping from one row to the next.

Burrows Wheeler Transform
Figure 1 c) For any subsequence we can quickly find the locations of all its exact matches by working backwards through the transform, one query character at a time.
With precomputed occurrence counts, each backward step costs close to O(1), so lookup time depends on the query length rather than on the genome size.
https://www.youtube.com/watch?v=4n7NPk5lwbI
https://www.youtube.com/watch?v=kvVGj5V65io
Explanation of BWT for genomic indexing by Ben Langmead (creator of Bowtie)

Bowtie
Bowtie, 2009
Bowtie 2, 2012

TopHat2
TopHat2, 2013 (built on Bowtie 2, 2012)
•TopHat2 works well with gaps + introns

STAR
STAR, 2013
•STAR works well with gaps + introns
•Preferable for mapping RNA-seq reads
•Faster than TopHat2

Clustal Omega (Multiple Sequence Alignment)

Exercise
•Global Alignment: Use Alignment Matrix
•Global Alignment: Align Position Weight Matrices
•https://bit.ly/3427HeE
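
Worked sketch: Burrows Wheeler Transform
The three Burrows Wheeler Transform slides (build the transform, reverse it with LF mapping, search backwards) can be tried out on a toy string. The code below is a minimal sketch only. The function names (bwt, build_index, inverse_bwt, count_matches) are hypothetical, and the example string ACAACG is an assumption chosen to match the letter counts $ A1 A2 A3 C1 C2 G1. Sorting all rotations is fine for a short string but is not how BWA or Bowtie build their genome indexes.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform by sorting all rotations (toy sizes only)."""
    assert "$" not in text, "'$' is reserved as the end marker"
    s = text + "$"                      # '$' sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)   # last column


def build_index(last: str):
    """Precompute the tables that make each LF step a constant-time lookup."""
    # first_pos[c]: row where letter c starts in the sorted First column
    first_pos, total = {}, 0
    for c in sorted(set(last)):
        first_pos[c] = total
        total += last.count(c)
    # occ[c][i]: how many times c occurs in last[:i]
    occ = {c: [0] * (len(last) + 1) for c in first_pos}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first_pos, occ


def inverse_bwt(last: str) -> str:
    """Rebuild the original text by following LF mapping backwards."""
    first_pos, occ = build_index(last)
    row, out = 0, []                    # row 0 is the rotation starting with '$'
    while last[row] != "$":
        ch = last[row]                  # character preceding the current row
        out.append(ch)
        row = first_pos[ch] + occ[ch][row]   # LF: same character, First column
    return "".join(reversed(out))


def count_matches(last: str, pattern: str) -> int:
    """Backward search: number of exact occurrences of pattern in the text."""
    first_pos, occ = build_index(last)
    lo, hi = 0, len(last)               # current range of matching rows
    for ch in reversed(pattern):        # consume the pattern right to left
        if ch not in first_pos:
            return 0
        lo = first_pos[ch] + occ[ch][lo]
        hi = first_pos[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


transform = bwt("ACAACG")
print(transform)                        # GC$AAAC
print(inverse_bwt(transform))           # ACAACG
print(count_matches(transform, "AC"))   # 2
```

Reporting where the matches are, rather than only how many there are, would additionally need a suffix array (or a sampled one), which this sketch leaves out.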

Thank you for your attention! 60-minute lunch break.
CEITEC, www.ceitec.eu, @CEITEC_Brno