Detailed Information on Publication Record
2021
Similarity Search for an Extreme Application: Experience and Implementation
MÍČ, Vladimír, Tomáš RAČEK, Aleš KŘENEK and Pavel ZEZULA
Basic information
Original name
Similarity Search for an Extreme Application: Experience and Implementation
Authors
MÍČ, Vladimír (203 Czech Republic, belonging to the institution), Tomáš RAČEK (203 Czech Republic, belonging to the institution), Aleš KŘENEK (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, guarantor, belonging to the institution)
Edition
Cham, Similarity Search and Applications: 14th International Conference, SISAP 2021, Dortmund, Germany, September 29 - October 1, 2021, Proceedings, p. 265-279, 15 pp. 2021
Publisher
Springer
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Switzerland
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
Impact factor
0.402 in 2005
RIV identification code
RIV/00216224:14330/21:00122667
Organization unit
Faculty of Informatics
ISBN
978-3-030-89656-0
ISSN
UT WoS
000722252200020
Keywords in English
Similarity search in metric space;Efficiency;Distance distribution;Dimensionality curse;Extreme distance function
Tags
International impact, Reviewed
Changed: 23/11/2021 14:16, RNDr. Vladimír Míč, Ph.D.
Abstract
In the original language
Contemporary challenges for efficient similarity search include complex similarity functions, the curse of dimensionality, and large sizes of descriptive features of data objects. This article reports our experience with a database of protein chains which form an (almost) metric space and demonstrate the following extreme properties. Evaluation of the pairwise similarity of protein chains can take even tens of minutes, and has a variance of six orders of magnitude. Minimising the number of similarity comparisons is thus crucial, so we propose a generic three-stage search engine for this purpose. We improve the median search time by a factor of 73 in comparison with the search engine currently employed for the protein database in practice.
Links
EF16_019/0000822, research and development project