Detailed Information on Publication Record
2018
Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries
NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA
Basic information
Original name
Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries
Authors
NÁLEPA, Filip (203 Czech Republic, guarantor, belonging to the institution), Michal BATKO (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution)
Edition
Berlin, Heidelberg, Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII, p. 61-88, 28 pp. 2018
Publisher
Springer
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Switzerland
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
Impact factor
Impact factor: 0.402 in 2005
RIV identification code
RIV/00216224:14330/18:00101119
Organization unit
Faculty of Informatics
ISBN
978-3-662-58383-8
ISSN
Keywords in English
stream processing; similarity search
Tags
International impact, Reviewed
Changed: 30/4/2019 07:28, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
The current era of digital data explosion calls for the employment of content-based similarity search techniques, since traditional searchable metadata like annotations are not always available. In our work, we focus on a scenario where similarity search is used in the context of stream processing, which is one of the suitable approaches for dealing with huge amounts of data. Our goal is to maximize the throughput of processed queries while a slight delay is acceptable. We propose a technique that dynamically reorders the queries coming from the stream in order to use our caching mechanism in huge data spaces more effectively. We were able to achieve significantly higher throughput compared to the baseline in which no reordering and no caching were used. Moreover, our proposal does not incur any additional precision loss of the similarity search, as opposed to some other caching techniques. In addition to throughput maximization, we also study the potential of trading off throughput for low delays (waiting times). The proposed technique can be parameterized by the amount of throughput that may be sacrificed.
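The general idea of reordering buffered queries so that a cache is reused more often can be illustrated with a small sketch. This is not the authors' algorithm: the LRU cache of data partitions, the overlap-based selection rule, and the names `LRUCache`, `process_stream`, and `max_skips` are all assumptions made for illustration. Each query is modeled simply as a list of partition ids it needs to read, and `max_skips` is a hypothetical stand-in for the delay bound mentioned in the abstract.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache of partition ids (hypothetical stand-in for a data cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Touch a partition; return True on a cache hit, False on a miss."""
        hit = key in self.items
        if hit:
            self.items.move_to_end(key)
        else:
            self.items[key] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict least recently used
        return hit


def process_stream(queries, cache_capacity, max_skips, reorder=True):
    """Process (query_id, partitions) pairs; return (processing order, cache hits).

    Assumption: all queries are already buffered. With reorder=True the next
    query is the one whose partitions overlap most with the cache, unless the
    oldest query has already been overtaken max_skips times.
    """
    cache = LRUCache(cache_capacity)
    # Each buffer entry: [query_id, partitions, times_overtaken]
    buffer = [[qid, parts, 0] for qid, parts in queries]
    order, hits = [], 0

    def overlap(entry):
        return sum(1 for p in entry[1] if p in cache.items)

    while buffer:
        if not reorder or buffer[0][2] >= max_skips:
            pick = 0  # FIFO, or the oldest query has waited long enough
        else:
            # max() returns the earliest index on ties, so arrival order wins
            pick = max(range(len(buffer)), key=lambda i: overlap(buffer[i]))
        for entry in buffer[:pick]:
            entry[2] += 1  # every earlier query was overtaken once more
        qid, parts, _ = buffer.pop(pick)
        for p in parts:
            hits += cache.access(p)
        order.append(qid)
    return order, hits


stream = [("q1", ["A", "B"]), ("q2", ["C", "D"]),
          ("q3", ["A", "B"]), ("q4", ["C", "D"])]
fifo_order, fifo_hits = process_stream(stream, 2, 2, reorder=False)
new_order, new_hits = process_stream(stream, 2, 2, reorder=True)
print(fifo_order, fifo_hits)
print(new_order, new_hits)
```

In this toy stream, queries that touch the same partitions alternate, so a two-entry cache is of no use in arrival order. Letting a later query overtake an earlier one groups them and turns half of the partition reads into cache hits. Setting `max_skips` to 0 falls back to arrival order, which mirrors the trade-off between throughput and waiting time only in a very loose sense.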
Links
GA16-18889S, research and development project