Speeding up Continuous kNN Join by Binary Sketches

D 2018

NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA

Basic information

Original name

Speeding up Continuous kNN Join by Binary Sketches

Authors

NÁLEPA, Filip (203 Czech Republic, guarantor, belonging to the institution), Michal BATKO (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution)

Edition

Cham, Advances in Data Mining, p. 183-198, 16 pp. 2018

Publisher

Springer

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Switzerland

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

Impact factor

0.402 in 2005

RIV identification code

RIV/00216224:14330/18:00100950

Organization unit

Faculty of Informatics

ISBN

978-3-319-95785-2

ISSN

DOI

http://dx.doi.org/10.1007/978-3-319-95786-9_14

UT WoS

000469337800014

Keywords in English

continuous kNN similarity join; binary sketches

Abstract

In the original

Real-time recommendation is a necessary component of current social applications. It is responsible for suggesting relevant newly published data to users based on their preferences. By representing the users and the published data in a metric space, each user can be recommended their k nearest neighbors among the published data, i.e., a kNN join is computed. In this work, we address the frequent requirement that only recently published data be subject to recommendation; thus, a sliding time window is defined, and only data published within the limits of the window can be recommended. Due to the large numbers of both users and published data items, it becomes a challenging task to continuously update the results of the kNN join as new data enter and leave the sliding window. We propose a binary sketch-based approximation technique suited especially to cases where the metric distance computation is an expensive operation (e.g., the Euclidean distance in high-dimensional vector spaces). It applies cheap Hamming distances to skip over 90% of the expensive metric distance computations. As revealed by our experiments on 4,096-dimensional vectors, the proposed approach significantly outperforms the existing approaches it was compared with.
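The filtering idea in the abstract can be illustrated with a minimal sketch. This record does not describe how the authors construct their sketches or maintain the join under the sliding window, so everything below is an assumption made for illustration: the random-hyperplane sign sketch, the fixed-size candidate shortlist, and all names (`make_hyperplanes`, `sketch`, `approx_knn`, `candidates`) are hypothetical and not taken from the paper. The code only shows the general pattern of ranking window items by a cheap Hamming distance and computing the expensive metric distance for a small shortlist.

```python
import heapq
import math
import random


def make_hyperplanes(dim, bits, seed=0):
    """Assumed sketching setup: `bits` random hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def sketch(vec, hyperplanes):
    """Pack one bit per hyperplane (sign of the dot product) into an int."""
    code = 0
    for plane in hyperplanes:
        code <<= 1
        if sum(p * v for p, v in zip(plane, vec)) >= 0.0:
            code |= 1
    return code


def hamming(a, b):
    """Cheap distance between two sketches: the number of differing bits."""
    return bin(a ^ b).count("1")


def euclidean(a, b):
    """Stand-in for the expensive metric distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approx_knn(query, items, hyperplanes, k, candidates):
    """Approximate kNN of `query` among `items` (the current window contents).

    Only the `candidates` items closest to the query in Hamming distance are
    refined with the expensive metric; all other items are skipped.
    Returns (sorted list of (distance, index), number of metric computations).
    """
    q_code = sketch(query, hyperplanes)
    # Recomputed here for brevity; a real system would cache one sketch per item.
    codes = [sketch(item, hyperplanes) for item in items]
    shortlist = heapq.nsmallest(
        candidates, range(len(items)), key=lambda i: hamming(q_code, codes[i])
    )
    refined = sorted((euclidean(query, items[i]), i) for i in shortlist)
    return refined[:k], len(shortlist)


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 32
    window = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
    user = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    planes = make_hyperplanes(dim, bits=64)
    neighbors, computed = approx_knn(user, window, planes, k=5, candidates=20)
    print("metric computations:", computed, "of", len(window))
    print("approximate 5-NN indices:", [i for _, i in neighbors])
```

With `candidates=20` and 200 items in the window, 90% of the metric distance computations are skipped by construction. The result is approximate: a true neighbor whose sketch happens to differ in many bits can be missed, and the skip rate the authors report depends on their own technique and data rather than on this toy setup.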

Links

GA16-18889S, research and development project

Name: Analytika pro velká nestrukturovaná data (Acronym: Big Data Analytics for Unstructured Data)

Investor: Czech Science Foundation

Cite

NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA. Speeding up Continuous kNN Join by Binary Sketches. Online. In Petra Perner. Advances in Data Mining. Cham: Springer, 2018, p. 183-198. ISBN 978-3-319-95785-2. Available from: https://dx.doi.org/10.1007/978-3-319-95786-9_14.

@inproceedings{1418218,
   author = {Nálepa, Filip and Batko, Michal and Zezula, Pavel},
   address = {Cham},
   booktitle = {Advances in Data Mining},
   doi = {10.1007/978-3-319-95786-9_14},
   editor = {Petra Perner},
   keywords = {continuous kNN similarity join; binary sketches},
   howpublished = {electronic version available online},
   language = {eng},
   location = {Cham},
   isbn = {978-3-319-95785-2},
   pages = {183-198},
   publisher = {Springer},
   title = {Speeding up Continuous kNN Join by Binary Sketches},
   year = {2018}
}

TY  - CONF
ID  - 1418218
AU  - Nálepa, Filip
AU  - Batko, Michal
AU  - Zezula, Pavel
PY  - 2018
TI  - Speeding up Continuous kNN Join by Binary Sketches
PB  - Springer
CY  - Cham
SN  - 9783319957852
KW  - continuous kNN similarity join
KW  - binary sketches
N2  - Real-time recommendation is a necessary component of current social applications. It is responsible for suggesting relevant newly published data to users based on their preferences. By representing the users and the published data in a metric space, each user can be recommended their k nearest neighbors among the published data, i.e., a kNN join is computed. In this work, we address the frequent requirement that only recently published data be subject to recommendation; thus, a sliding time window is defined, and only data published within the limits of the window can be recommended. Due to the large numbers of both users and published data items, it becomes a challenging task to continuously update the results of the kNN join as new data enter and leave the sliding window. We propose a binary sketch-based approximation technique suited especially to cases where the metric distance computation is an expensive operation (e.g., the Euclidean distance in high-dimensional vector spaces). It applies cheap Hamming distances to skip over 90% of the expensive metric distance computations. As revealed by our experiments on 4,096-dimensional vectors, the proposed approach significantly outperforms the existing approaches it was compared with.
ER  -

NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA. Speeding up Continuous kNN Join by Binary Sketches. Online. In Petra Perner. \textit{Advances in Data Mining}. Cham: Springer, 2018, p.~183-198. ISBN~978-3-319-95785-2. Available from: https://dx.doi.org/10.1007/978-3-319-95786-9\_{}14.
