D 2021

Similarity Search for an Extreme Application: Experience and Implementation

MÍČ, Vladimír, Tomáš RAČEK, Aleš KŘENEK and Pavel ZEZULA

Basic information

Original name

Similarity Search for an Extreme Application: Experience and Implementation

Authors

MÍČ, Vladimír (203 Czech Republic, belonging to the institution), Tomáš RAČEK (203 Czech Republic, belonging to the institution), Aleš KŘENEK (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, guarantor, belonging to the institution)

Edition

Cham, Similarity Search and Applications: 14th International Conference, SISAP 2021, Dortmund, Germany, September 29 - October 1, 2021, Proceedings, p. 265-279, 15 pp. 2021

Publisher

Springer

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Switzerland

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14330/21:00122667

Organization unit

Faculty of Informatics

ISBN

978-3-030-89656-0

ISSN

UT WoS

000722252200020

Keywords in English

Similarity search in metric space;Efficiency;Distance distribution;Dimensionality curse;Extreme distance function

Tags

International impact, Reviewed
Changed: 23/11/2021 14:16, RNDr. Vladimír Míč, Ph.D.

Abstract

In the original

Contemporary challenges for efficient similarity search include complex similarity functions, the curse of dimensionality, and large descriptive features of data objects. This article reports our experience with a database of protein chains that form an (almost) metric space and exhibit the following extreme properties. Evaluating the pairwise similarity of protein chains can take tens of minutes, and the evaluation time varies by six orders of magnitude. Minimising the number of similarity comparisons is thus crucial, so we propose a generic three-stage search engine to address it. We improve the median search time 73-fold compared with the search engine currently used in practice for the protein database.

Links

EF16_019/0000822, research and development project
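Name: Centrum excelence pro kyberkriminalitu, kyberbezpečnost a ochranu kritických informačních infrastruktur (Centre of Excellence for Cybercrime, Cybersecurity and Protection of Critical Information Infrastructures)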