MÍČ, Vladimír, Tomáš RAČEK, Aleš KŘENEK and Pavel ZEZULA. Similarity Search for an Extreme Application: Experience and Implementation. In Nora Reyes, Richard Connor, Nils Kriege, Daniyal Kazempour, Ilaria Bartolini, Erich Schubert, Jian-Jia Chen. Similarity Search and Applications: 14th International Conference, SISAP 2021, Dortmund, Germany, September 29 - October 1, 2021, Proceedings. Cham: Springer, 2021, p. 265-279. ISBN 978-3-030-89656-0. Available from: https://dx.doi.org/10.1007/978-3-030-89657-7_20.
Basic information
Original name Similarity Search for an Extreme Application: Experience and Implementation
Authors MÍČ, Vladimír (203 Czech Republic, belonging to the institution), Tomáš RAČEK (203 Czech Republic, belonging to the institution), Aleš KŘENEK (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, guarantor, belonging to the institution).
Edition Cham, Similarity Search and Applications: 14th International Conference, SISAP 2021, Dortmund, Germany, September 29 - October 1, 2021, Proceedings, p. 265-279, 15 pp. 2021.
Publisher Springer
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Switzerland
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
Impact factor 0.402 in 2005
RIV identification code RIV/00216224:14330/21:00122667
Organization unit Faculty of Informatics
ISBN 978-3-030-89656-0
ISSN 0302-9743
Doi http://dx.doi.org/10.1007/978-3-030-89657-7_20
UT WoS 000722252200020
Keywords in English Similarity search in metric space;Efficiency;Distance distribution;Dimensionality curse;Extreme distance function
Tags DISA, firank_B
Tags International impact, Reviewed
Changed by RNDr. Vladimír Míč, Ph.D., učo 359890. Changed: 23/11/2021 14:16.
Abstract
Contemporary challenges for efficient similarity search include complex similarity functions, the curse of dimensionality, and large sizes of descriptive features of data objects. This article reports our experience with a database of protein chains which form an (almost) metric space and demonstrate the following extreme properties. Evaluation of the pairwise similarity of protein chains can take as long as tens of minutes, and has a variance of six orders of magnitude. Minimising the number of similarity comparisons is thus crucial, so we propose a generic three-stage search engine for this purpose. We improve the median search time by a factor of 73 in comparison with the search engine currently employed for the protein database in practice.
Links
EF16_019/0000822, research and development project. Name: Centrum excelence pro kyberkriminalitu, kyberbezpečnost a ochranu kritických informačních infrastruktur