Detailed Information on Publication Record
2018
Speeding up Continuous kNN Join by Binary Sketches
NÁLEPA, Filip, Michal BATKO and Pavel ZEZULA
Basic information
Original name
Speeding up Continuous kNN Join by Binary Sketches
Authors
NÁLEPA, Filip (203 Czech Republic, guarantor, belonging to the institution), Michal BATKO (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution)
Edition
Cham, Advances in Data Mining, p. 183-198, 16 pp. 2018
Publisher
Springer
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Switzerland
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
Impact factor
0.402 in 2005
RIV identification code
RIV/00216224:14330/18:00100950
Organization unit
Faculty of Informatics
ISBN
978-3-319-95785-2
ISSN
UT WoS
000469337800014
Keywords in English
continuous kNN similarity join; binary sketches
Tags
International impact, Reviewed
Changed: 13/5/2020 19:24, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
Real-time recommendation is a necessary component of current social applications. It is responsible for suggesting relevant newly published data to users based on their preferences. By representing the users and the published data in a metric space, each user can be recommended their k nearest neighbors among the published data, i.e., the kNN join is computed. In this work, we address the frequent requirement that only recently published data be subject to recommendation; thus a sliding time window is defined, and only data published within the limits of the window can be recommended. Due to the large amounts of both users and published data, it becomes a challenging task to continuously update the results of the kNN join as new data enter and leave the sliding window. We propose a binary sketch-based approximation technique suited especially to cases where the metric distance computation is an expensive operation (e.g., the Euclidean distance in high-dimensional vector spaces). It applies cheap Hamming distances to skip over 90% of the expensive metric distance computations. As revealed by our experiments on 4,096-dimensional vectors, the proposed approach significantly outperforms the existing approaches it was compared with.
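The filter-and-refine idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sketching method (signs of random hyperplane projections), the function names, and the `candidates` parameter are all assumptions made for illustration, and the sliding-window maintenance is left out. Cheap Hamming distances between binary sketches select a small candidate set, and the expensive Euclidean distance is evaluated only for those candidates.

```python
import heapq
import math
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplanes (assumed sketching scheme, not the paper's)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def sketch(vec, hyperplanes):
    """Pack one sign bit per hyperplane into a Python int."""
    s = 0
    for i, h in enumerate(hyperplanes):
        if sum(a * b for a, b in zip(vec, h)) >= 0.0:
            s |= 1 << i
    return s


def hamming(a, b):
    """Hamming distance between two bit strings stored as ints."""
    return bin(a ^ b).count("1")


def euclidean(a, b):
    """The 'expensive' metric distance in this illustration."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approx_knn(query, data, data_sketches, hyperplanes, k, candidates):
    """Return indices of approximate k nearest neighbors of `query` in `data`."""
    q_sketch = sketch(query, hyperplanes)
    # Filter: keep the `candidates` objects with the smallest Hamming distance.
    cand_ids = heapq.nsmallest(
        candidates,
        range(len(data)),
        key=lambda i: hamming(q_sketch, data_sketches[i]),
    )
    # Refine: compute the metric distance only for the surviving candidates.
    return heapq.nsmallest(k, cand_ids, key=lambda i: euclidean(query, data[i]))
```

In this sketch the fraction of skipped metric distance computations is 1 - candidates / len(data); a smaller candidate set saves more work at the cost of a less accurate result.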
Links
GA16-18889S, research and development project