Nearest-neighbor Search from Large Datasets using Narrow
Sketches

D 2022

Nearest-neighbor Search from Large Datasets using Narrow Sketches

NAOYA, Higuchi, Imamura YASUNOBU, Vladimír MÍČ, Shinohara TAKESHI, Hirata KOUICHI et. al.

Základní údaje

Originální název

Nearest-neighbor Search from Large Datasets using Narrow Sketches

Autoři

NAOYA, Higuchi, Imamura YASUNOBU, Vladimír MÍČ (203 Česká republika, garant, domácí), Shinohara TAKESHI, Hirata KOUICHI a Kuboyama TETSUJI

Vydání

Portugal, Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM, od s. 401-410, 10 s. 2022

Nakladatel

SciTePress

Další údaje

Jazyk

angličtina

Typ výsledku

Stať ve sborníku

Obor

10201 Computer sciences, information science, bioinformatics

Stát vydavatele

Portugalsko

Utajení

není předmětem státního či obchodního tajemství

Forma vydání

tištěná verze "print"

Odkazy

URL

Kód RIV

RIV/00216224:14330/22:00125541

Organizační jednotka

Fakulta informatiky

ISBN

978-989-758-549-4

DOI

http://dx.doi.org/10.5220/0010817600003122

UT WoS

000819122200044

Klíčová slova anglicky

Narrow Sketch;Nearest-neighbor Search;Large Dataset;Sketch Enumeration;Partially Restored Distance

Štítky

DISA, firank_B

Příznaky

Mezinárodní význam, Recenzováno

Změněno: 14. 5. 2024 12:44, RNDr. Pavel Šmerk, Ph.D.

Anotace

V originále

We consider the nearest-neighbor search on large-scale high-dimensional datasets that cannot fit in the main memory. Sketches are bit strings that compactly express data points. Although it is usually thought that wide sketches are needed for high-precision searches, we use relatively narrow sketches such as 22-bit or 24-bit, to select a small set of candidates for the search. We use an asymmetric distance between data points and sketches as the criteria for candidate selection, instead of traditionally used Hamming distance. It can be considered a distance partially restoring quantization error. We utilize an efficient one-by-one sketch enumeration in the order of the partially restored distance to realize a fast candidate selection. We use two datasets to demonstrate the effectiveness of the method: YFCC100M-HNfc6 consisting of about 100 million 4,096 dimensional image descriptors and DEEP1B consisting of 1 billion 96 dimensional vectors. Using a standard desktop computer, we condu cted a nearest-neighbor search for a query on datasets stored on SSD, where vectors are represented by 8-bit integers. The proposed method executes the search in 5.8 seconds for the 400GB dataset YFCC100M, and 0.24 seconds for the 100GB dataset DEEP1B, while keeping the recall of 90%.

Návaznosti

EF16_019/0000822, projekt VaV

Název: Centrum excelence pro kyberkriminalitu, kyberbezpečnost a ochranu kritických informačních infrastruktur

Citovat

NAOYA, Higuchi, Imamura YASUNOBU, Vladimír MÍČ, Shinohara TAKESHI, Hirata KOUICHI a Kuboyama TETSUJI. Nearest-neighbor Search from Large Datasets using Narrow Sketches. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM. Portugal: SciTePress, 2022, s. 401-410. ISBN 978-989-758-549-4. Dostupné z: https://dx.doi.org/10.5220/0010817600003122.

@inproceedings{1841719,
   author = {Naoya, Higuchi and Yasunobu, Imamura and Míč, Vladimír and Takeshi, Shinohara and Kouichi, Hirata and Tetsuji, Kuboyama},
   address = {Portugal},
   booktitle = {Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
   doi = {http://dx.doi.org/10.5220/0010817600003122},
   keywords = {Narrow Sketch;Nearest-neighbor Search;Large Dataset;Sketch Enumeration;Partially Restored Distance},
   howpublished = {tištěná verze "print"},
   language = {eng},
   location = {Portugal},
   isbn = {978-989-758-549-4},
   pages = {401-410},
   publisher = {SciTePress},
   title = {Nearest-neighbor Search from Large Datasets using Narrow Sketches},
   url = {https://www.scitepress.org/PublicationsDetail.aspx?ID=s5xL4A2YSOs=&t=1},
   year = {2022}
}

TY  - JOUR
ID  - 1841719
AU  - Naoya, Higuchi - Yasunobu, Imamura - Míč, Vladimír - Takeshi, Shinohara - Kouichi, Hirata - Tetsuji, Kuboyama
PY  - 2022
TI  - Nearest-neighbor Search from Large Datasets using Narrow Sketches
PB  - SciTePress
CY  - Portugal
SN  - 9789897585494
KW  - Narrow Sketch;Nearest-neighbor Search;Large Dataset;Sketch Enumeration;Partially Restored Distance
UR  - https://www.scitepress.org/PublicationsDetail.aspx?ID=s5xL4A2YSOs=&t=1
N2  - We consider the nearest-neighbor search on large-scale high-dimensional datasets that cannot fit in the main memory. Sketches are bit strings that compactly express data points. Although it is usually thought that wide sketches are needed for high-precision searches, we use relatively narrow sketches such as 22-bit or 24-bit, to select a small set of candidates for the search. We use an asymmetric distance between data points and sketches as the criteria for candidate selection, instead of traditionally used Hamming distance. It can be considered a distance partially restoring quantization error. We utilize an efficient one-by-one sketch enumeration in the order of the partially restored distance to realize a fast candidate selection. We use two datasets to demonstrate the effectiveness of the method: YFCC100M-HNfc6 consisting of about 100 million 4,096 dimensional image descriptors and DEEP1B consisting of 1 billion 96 dimensional vectors. Using a standard desktop computer, we condu cted a nearest-neighbor search for a query on datasets stored on SSD, where vectors are represented by 8-bit integers. The proposed method executes the search in 5.8 seconds for the 400GB dataset YFCC100M, and 0.24 seconds for the 100GB dataset DEEP1B, while keeping the recall of 90%.
ER  -

NAOYA, Higuchi, Imamura YASUNOBU, Vladimír MÍČ, Shinohara TAKESHI, Hirata KOUICHI a Kuboyama TETSUJI. Nearest-neighbor Search from Large Datasets using Narrow Sketches. In \textit{Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM}. Portugal: SciTePress, 2022, s.~401-410. ISBN~978-989-758-549-4. Dostupné z: https://dx.doi.org/10.5220/0010817600003122.

Podrobný výpis o publikaci