Towards Faster Similarity Search by Dynamic Reordering of
Streamed Queries

D 2018

NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA

Basic information

Original title

Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries

Authors

NÁLEPA, Filip (203 Czech Republic, guarantor, belonging to the institution), Michal BATKO (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution)

Edition

Berlin, Heidelberg, Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII, pp. 61-88, 28 pp., 2018

Publisher

Springer

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Switzerland

Confidentiality

is not subject to a state or trade secret

Publication form

printed version "print"

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14330/18:00101119

Organization unit

Faculty of Informatics

ISBN

978-3-662-58383-8

ISSN

DOI

http://dx.doi.org/10.1007/978-3-662-58384-5_3

Keywords in English

stream processing; similarity search

Tags

International impact, Reviewed

Changed: 30 April 2019 07:28, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

The current era of digital data explosion calls for the employment of content-based similarity search techniques, since traditional searchable metadata such as annotations are not always available. In our work, we focus on a scenario where similarity search is used in the context of stream processing, which is one of the suitable approaches to dealing with huge amounts of data. Our goal is to maximize the throughput of processed queries while a slight delay is acceptable. We propose a technique that dynamically reorders the queries coming from the stream in order to use our caching mechanism in huge data spaces more effectively. We were able to achieve significantly higher throughput compared to the baseline in which no reordering and no caching were used. Moreover, our proposal does not incur any additional precision loss of the similarity search, as opposed to some other caching techniques. In addition to throughput maximization, we also study the potential of trading off throughput for low delays (waiting times). The proposed technique can be parameterized by the amount of throughput that can be sacrificed.

Links

GA16-18889S, research and development project

Name: Analytika pro velká nestrukturovaná data (Acronym: Big Data Analytics for Unstructured Data)

Investor: Czech Science Foundation (Grantová agentura ČR), Big Data Analytics for Unstructured Data

Cite

NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA. Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Berlin, Heidelberg: Springer, 2018, pp. 61-88. ISBN 978-3-662-58383-8. Available from: https://dx.doi.org/10.1007/978-3-662-58384-5_3.

@inproceedings{1430396,
   author = {Nálepa, Filip and Batko, Michal and Zezula, Pavel},
   address = {Berlin, Heidelberg},
   booktitle = {Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII},
   doi = {10.1007/978-3-662-58384-5_3},
   keywords = {stream processing; similarity search},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Berlin, Heidelberg},
   isbn = {978-3-662-58383-8},
   pages = {61-88},
   publisher = {Springer},
   title = {Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries},
   year = {2018}
}

TY  - JOUR
ID  - 1430396
AU  - Nálepa, Filip
AU  - Batko, Michal
AU  - Zezula, Pavel
PY  - 2018
TI  - Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries
PB  - Springer
CY  - Berlin, Heidelberg
SN  - 9783662583838
KW  - stream processing
KW  - similarity search
N2  - The current era of digital data explosion calls for the employment of content-based similarity search techniques, since traditional searchable metadata such as annotations are not always available. In our work, we focus on a scenario where similarity search is used in the context of stream processing, which is one of the suitable approaches to dealing with huge amounts of data. Our goal is to maximize the throughput of processed queries while a slight delay is acceptable. We propose a technique that dynamically reorders the queries coming from the stream in order to use our caching mechanism in huge data spaces more effectively. We were able to achieve significantly higher throughput compared to the baseline in which no reordering and no caching were used. Moreover, our proposal does not incur any additional precision loss of the similarity search, as opposed to some other caching techniques. In addition to throughput maximization, we also study the potential of trading off throughput for low delays (waiting times). The proposed technique can be parameterized by the amount of throughput that can be sacrificed.
ER  -

NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA. Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries. In \textit{Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII}. Berlin, Heidelberg: Springer, 2018, pp.~61-88. ISBN~978-3-662-58383-8. Available from: https://dx.doi.org/10.1007/978-3-662-58384-5\_{}3.
