Other formats:
BibTeX
LaTeX
RIS
@article{905348, author = {Novák, David and Batko, Michal and Zezula, Pavel}, article_number = {4}, doi = {http://dx.doi.org/10.1016/j.is.2010.10.002}, keywords = {Metric space; Similarity search; Data structure; Approximation; Scalability}, language = {eng}, issn = {0306-4379}, journal = {Information Systems}, title = {Metric index: an efficient and scalable solution for precise and approximate similarity search}, volume = {36}, year = {2011} }
TY - JOUR ID - 905348 AU - Novák, David - Batko, Michal - Zezula, Pavel PY - 2011 TI - Metric index: an efficient and scalable solution for precise and approximate similarity search JF - Information Systems VL - 36 IS - 4 SP - 721-733 EP - 721-733 PB - Elsevier SN - 03064379 KW - Metric space KW - Similarity search KW - Data structure KW - Approximation KW - Scalability N2 - Metric space is a universal and versatile model of similarity that can be applied in various areas of information retrieval. However, a general, efficient, and scalable solution for metric data management is still a resisting research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering, thus reaching high search performance while having constant building costs per object. The heart of the M-Index is a general mapping mechanism that enables to actually store the data in established structures such as the B+ - tree or even in a distributed storage. We implemented the M-Index with the B+ - tree and performed experiments on two datasets - the first is an artificial set of vectors and the other is a real-life dataset composed of a combination of five MPEG-7 visual descriptors extracted from a database of up to several million digital images. The experiments put several M-Index variants under test and compare them with established techniques for both precise and approximate similarity search. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Further, the M-Index demonstrates excellent ability to keep similar data close in the index which makes its approximation algorithm very efficient - maintaining practically constant response times while preserving a very high recall as the dataset grows and even beating approaches designed purely for approximate search. ER -
NOVÁK, David, Michal BATKO and Pavel ZEZULA. Metric index: an efficient and scalable solution for precise and approximate similarity search. \textit{Information Systems}. Elsevier, 2011, vol.~36, No~4, p.~721-733. ISSN~0306-4379. Available from: https://dx.doi.org/10.1016/j.is.2010.10.002.
|