NOVÁK, David and Pavel ZEZULA. Performance Study of Independent Anchor Spaces for Similarity Searching. The Computer Journal. Oxford, UK: Oxford University Press, 2014, vol. 57, No 11, p. 1741-1755. ISSN 0010-4620. Available from: https://dx.doi.org/10.1093/comjnl/bxt114.
Other formats:   BibTeX LaTeX RIS
Basic information
Original name Performance Study of Independent Anchor Spaces for Similarity Searching
Authors NOVÁK, David (203 Czech Republic, guarantor, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution).
Edition The Computer Journal, Oxford, UK, Oxford University Press, 2014, 0010-4620.
Other information
Original language English
Type of outcome Article in a journal
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher United Kingdom of Great Britain and Northern Ireland
Confidentiality degree is not subject to a state or trade secret
WWW publisher site
Impact factor Impact factor: 0.787
RIV identification code RIV/00216224:14330/14:00073219
Organization unit Faculty of Informatics
Doi http://dx.doi.org/10.1093/comjnl/bxt114
UT WoS 000344649500011
Keywords in English similarity search; metric space; multi-index; efficiency; robustness
Tags DISA, Metric Space
Tags International impact, Reviewed
Changed by Changed by: RNDr. David Novák, Ph.D., učo 4335. Changed: 8/1/2015 15:14.
Abstract
This work targets the problem of search efficiency vs. answer quality of approximate metric-based similarity search. We especially focus on techniques based on recursive Voronoi-like partitioning or, from another perspective, on pivot permutations. These techniques use sets of reference objects (anchors/pivots) to partition the metric space into cells of close data items. Instead of refining the search space by enlarging the anchor set of a single index, we propose to divide a large pivot set into several subsets and build multiple indexes with independent space partitioning; at query time, the overall search costs are also divided among the separate indexes. Our thorough experimental study on three different real datasets uncovers drawbacks of excessive increase of a single pivot set size—such partitioning refinement can be counterproductive beyond a certain number of pivots. Our approach overcomes the root causes of this limitation and increases the answer quality while preserving the search costs. Further, we address the question of robustness of the answer quality, which can be significantly improved by utilization of independent anchor spaces.
Links
GBP103/12/G084, research and development projectName: Centrum pro multi-modální interpretaci dat velkého rozsahu
Investor: Czech Science Foundation
VG20122015073, research and development projectName: Efektivní vyhledávání v rozsáhlých biometrických datech (Acronym: EFBIO)
Investor: Ministry of the Interior of the CR
PrintDisplayed: 27/4/2024 14:59