Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries

NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA. Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Berlin, Heidelberg: Springer. pp. 61-88. ISBN 978-3-662-58383-8. doi:10.1007/978-3-662-58384-5_3. 2018.

Basic information
Original title	Towards Faster Similarity Search by Dynamic Reordering of Streamed Queries
Authors	NÁLEPA, Filip (203 Czech Republic, guarantor, belonging to the institution), Michal BATKO (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution).
Edition	Berlin, Heidelberg, Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII, pp. 61-88, 28 pp. 2018.
Publisher	Springer

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Switzerland
Confidentiality	not subject to a state or trade secret
Publication form	printed version "print"
Impact factor	0.402 in 2005
RIV code	RIV/00216224:14330/18:00101119
Organizational unit	Faculty of Informatics
ISBN	978-3-662-58383-8
ISSN	0302-9743
Doi	http://dx.doi.org/10.1007/978-3-662-58384-5_3
Keywords in English	stream processing; similarity search
Tags	International impact, Reviewed
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 30. 4. 2019 07:28.

Abstract

The current era of digital data explosion calls for content-based similarity search techniques, since traditional searchable metadata such as annotations are not always available. In our work, we focus on a scenario where similarity search is used in the context of stream processing, which is one of the suitable approaches for dealing with huge amounts of data. Our goal is to maximize the throughput of processed queries, given that a slight delay is acceptable. We propose a technique that dynamically reorders the queries coming from the stream in order to use our caching mechanism in huge data spaces more effectively. We achieved significantly higher throughput compared to the baseline in which neither reordering nor caching was used. Moreover, our proposal does not incur any additional precision loss of the similarity search, as opposed to some other caching techniques. In addition to throughput maximization, we also study the potential of trading off throughput for low delays (waiting times). The proposed technique can be parameterized by the amount of throughput that can be sacrificed.
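
The general idea of reordering buffered queries so that a cache is hit more often can be illustrated with a small toy model. The sketch below is not the authors' algorithm: it assumes that each query is simply the set of data partitions it needs, that the cache is a plain LRU cache of partition ids, and that the scheduler greedily picks the buffered query with the largest overlap with the cache. The names `LRUCache`, `process`, `buffer_size` and `max_wait` are hypothetical, and `max_wait` (a cap on how many positions a query may be postponed) is only a stand-in for the throughput/delay trade-off mentioned in the abstract, which the paper parameterizes differently.

```python
from collections import OrderedDict, deque


class LRUCache:
    """Tiny LRU cache of partition ids (a stand-in for cached data buckets)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def __contains__(self, key):
        return key in self.items

    def touch(self, key):
        """Access a partition; return True on a hit, False on a miss."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used id
        return False


def process(stream, cache_size, buffer_size, max_wait=None):
    """Process queries (sets of partition ids) and return (order, misses).

    buffer_size=1 gives plain FIFO processing. With a larger buffer, the
    query sharing the most partitions with the cache is processed first.
    max_wait (assumed knob) forces the oldest buffered query once it has
    been postponed by at least that many positions.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    cache = LRUCache(cache_size)
    pending = deque(enumerate(stream))  # (arrival index, partitions)
    buffer = []                         # kept in arrival order
    order, misses = [], 0
    while pending or buffer:
        while pending and len(buffer) < buffer_size:
            buffer.append(pending.popleft())
        step = len(order)
        oldest = buffer[0]
        if max_wait is not None and step - oldest[0] >= max_wait:
            chosen = oldest  # bound the waiting time
        else:
            # Greedy choice: largest cache overlap, ties go to the oldest.
            chosen = max(
                buffer,
                key=lambda q: (sum(p in cache for p in q[1]), -q[0]),
            )
        buffer.remove(chosen)
        for partition in sorted(chosen[1]):
            if not cache.touch(partition):
                misses += 1
        order.append(chosen[0])
    return order, misses


if __name__ == "__main__":
    # Two query types that evict each other's partitions when alternated.
    stream = [{1, 2}, {3, 4}] * 3
    print("FIFO:     ", process(stream, cache_size=2, buffer_size=1))
    print("Reordered:", process(stream, cache_size=2, buffer_size=6))
```

On this toy stream, FIFO processing misses the cache on every partition access (12 misses), while the reordered run groups similar queries together and incurs only 4 misses, at the cost of postponing the second query by two positions. Reordering changes only the processing order, not the answers to the queries, which is consistent with the abstract's remark that no additional precision is lost.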

Links
GA16-18889S, research and development project	Name: Analytika pro velká nestrukturovaná data (Acronym: Big Data Analytics for Unstructured Data)
GA16-18889S, research and development project	Investor: Czech Science Foundation, Big Data Analytics for Unstructured Data
