2016
Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets
KREJČÍ, Adam, TR HUPP, Matej LEXA, Bořivoj VOJTĚŠEK, Petr MÜLLER et al.
Basic information
Original title
Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets
Authors
KREJČÍ, Adam (203 Czech Republic), TR HUPP (826 United Kingdom of Great Britain and Northern Ireland), Matej LEXA (703 Slovakia, belonging to the institution), Bořivoj VOJTĚŠEK (203 Czech Republic) and Petr MÜLLER (203 Czech Republic)
Edition
Bioinformatics, Oxford, Oxford University Press, 2016, 1367-4803
Other information
Language
English
Type of outcome
Article in a scholarly journal
Field of study
10201 Computer sciences, information science, bioinformatics
Country of publisher
United Kingdom of Great Britain and Northern Ireland
Confidentiality
is not subject to a state or trade secret
Impact factor
Impact factor: 7.307
RIV identification code
RIV/00216224:14330/16:00089377
Organizational unit
Faculty of Informatics
UT WoS
000368357800002
Keywords in English
phage display; sequence logo; clustering
Tags
International impact, Reviewed
Changed: 13 March 2018 14:02, doc. Ing. Matej Lexa, Ph.D.
Abstract
In the original language
Motivation: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on the protein surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, which yields vast numbers of peptide sequences that may contain information on multiple protein-protein interactions. Processing such datasets is a complex but essential task for large-scale studies of protein-protein interactions.
Results: The software tool presented in this article rapidly identifies multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generates multiple sequence alignments of the identified clusters. The method was applied to a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as to a new dataset, an order of magnitude larger, containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from the original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible.