Similarity Grid for Searching in Metric Spaces

D 2005

BATKO, Michal, Claudio GENNARO and Pavel ZEZULA

Basic information

Original name

Similarity Grid for Searching in Metric Spaces

Name in Czech

Podobnostní GRID pro hledání v metrických prostorech

Authors

BATKO, Michal (203 Czech Republic), Claudio GENNARO (380 Italy) and Pavel ZEZULA (203 Czech Republic, guarantor)

Edition

Berlin, Peer-to-Peer, Grid, and Service-Orientation in Digital Library Architectures: 6th Thematic Workshop of the EU Network of Excellence DELOS. Revised Selected Papers. LNCS 3664, p. 25-44, 20 pp. 2005

Publisher

Springer-Verlag Heidelberg

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

20206 Computer hardware and architecture

Country of publisher

Germany

Confidentiality degree

is not subject to a state or trade secret

RIV identification code

RIV/00216224:14610/05:00013400

Organization unit

Institute of Computer Science

ISBN

3-540-28711-6

UT WoS

000232268700003

Keywords in English

distributed data; scalable structures; similarity search; metric space

Abstract

In the original

Similarity search in metric spaces is an important paradigm for content-based retrieval in many applications. Existing centralized search structures can speed up retrieval, but they do not scale to large volumes of data because the response time increases linearly with the size of the searched file. The proposed GHT* index is a scalable and distributed structure. By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity range queries in data sets of arbitrary size. The structure also scales well with respect to the growing volume of retrieved data. Moreover, the small amount of replicated routing information on each server grows only logarithmically. At the same time, the potential for interquery parallelism increases as data sets grow, because the relative number of servers used by individual queries decreases. All these properties are verified by experiments on a prototype system using real-life data sets.
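
The record does not describe how GHT* works internally, so the following is only a minimal sketch of the two ideas the abstract relies on: a similarity range query in a metric space, and splitting the data so that a query need not contact every server. The pruning rule is the textbook generalized-hyperplane test, which is assumed here as an illustration and is not taken from the paper. All names (`range_query_scan`, `partitions_to_visit`, `range_query_partitioned`, the pivots `p1` and `p2`, and the two-server layout) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """One example metric; any distance obeying the triangle inequality works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def range_query_scan(data: Sequence[Point], q: Point, r: float,
                     d: Metric = euclidean) -> List[Point]:
    """Centralized baseline: one distance computation per stored object,
    so the cost grows linearly with the size of the data."""
    return [o for o in data if d(q, o) <= r]


def partitions_to_visit(q: Point, r: float, p1: Point, p2: Point,
                        d: Metric = euclidean) -> List[int]:
    """Generalized-hyperplane test (assumed, not from the record).
    Partition 0 holds objects at least as close to p1 as to p2.
    By the triangle inequality it can contain a result only if
    d(q, p1) - r <= d(q, p2) + r, and symmetrically for partition 1."""
    d1, d2 = d(q, p1), d(q, p2)
    visit = []
    if d1 - r <= d2 + r:
        visit.append(0)
    if d2 - r <= d1 + r:
        visit.append(1)
    return visit


def range_query_partitioned(data: Sequence[Point], q: Point, r: float,
                            p1: Point, p2: Point,
                            d: Metric = euclidean) -> List[Point]:
    """Hypothetical two-server layout: split by the hyperplane between
    p1 and p2, then scan only the partitions the test cannot rule out."""
    servers: Tuple[List[Point], List[Point]] = ([], [])
    for o in data:
        servers[0 if d(o, p1) <= d(o, p2) else 1].append(o)
    hits: List[Point] = []
    for i in partitions_to_visit(q, r, p1, p2, d):
        hits.extend(range_query_scan(servers[i], q, r, d))
    return hits


data = [(0.0, 1.0), (2.0, 0.0), (9.0, 0.0), (6.0, 1.0), (1.0, 1.0)]
p1, p2 = (0.0, 0.0), (10.0, 0.0)
print(partitions_to_visit((1.0, 0.0), 1.0, p1, p2))
print(range_query_partitioned(data, (1.0, 0.0), 1.0, p1, p2))
```

In this toy example a query close to `p1` with a small radius is answered by one partition only, which is the kind of effect the abstract refers to when it says that individual queries use a decreasing share of the servers. A real distributed index would apply such a test recursively and route messages between machines, which this sketch does not attempt.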

In Czech (translated)

Similarity search in a centralized environment proves insufficient in terms of scalability. GHT* is a distributed structure for similarity search based on metric spaces that achieves practically constant response time for arbitrarily large data.

Links

1ET100300419, research and development project

Name: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu

Investor: Academy of Sciences of the Czech Republic, Intelligent Models, Algorithms, Methods and Tools for the Semantic Web (realization)

Cite

BATKO, Michal, Claudio GENNARO and Pavel ZEZULA. Similarity Grid for Searching in Metric Spaces. In Peer-to-Peer, Grid, and Service-Orientation in Digital Library Architectures: 6th Thematic Workshop of the EU Network of Excellence DELOS. Revised Selected Papers. LNCS 3664. Berlin: Springer-Verlag Heidelberg, 2005, p. 25-44. ISBN 3-540-28711-6.

@inproceedings{580521,
   author = {Batko, Michal and Gennaro, Claudio and Zezula, Pavel},
   address = {Berlin},
   booktitle = {Peer-to-Peer, Grid, and Service-Orientation in Digital Library Architectures: 6th Thematic Workshop of the EU Network of Excellence DELOS. Revised Selected Papers. LNCS 3664},
   keywords = {distributed data; scalable structures; similarity search; metric space},
   language = {eng},
   location = {Berlin},
   isbn = {3-540-28711-6},
   pages = {25--44},
   publisher = {Springer-Verlag Heidelberg},
   title = {Similarity Grid for Searching in Metric Spaces},
   year = {2005}
}

TY  - CONF
ID  - 580521
AU  - Batko, Michal
AU  - Gennaro, Claudio
AU  - Zezula, Pavel
PY  - 2005
TI  - Similarity Grid for Searching in Metric Spaces
PB  - Springer-Verlag Heidelberg
CY  - Berlin
SN  - 3540287116
KW  - distributed data
KW  - scalable structures
KW  - similarity search
KW  - metric space
N2  - Similarity search in metric spaces represents an important paradigm for content-based retrieval of many applications. Existing centralized search structures can speed-up retrieval, but they do not scale up to large volume of data because the response time is linearly increasing with the size of the searched file. The proposed GHT* index is a scalable and distributed structure. By exploiting parallelism in a dynamic network of computers, the GHT* achieves practically constant search time for similarity range queries in data-sets of arbitrary size. The structure also scales well with respect to the growing volume of retrieved data. Moreover, a small amount of replicated routing information on each server increases logarithmically. At the same time, the potential for interquery parallelism is increasing with the growing data-sets because the relative number of servers utilized by individual queries is decreasing. All these properties are verified by experiments on a prototype system using real-life data-sets.
ER  -

BATKO, Michal, Claudio GENNARO and Pavel ZEZULA. Similarity Grid for Searching in Metric Spaces. In \textit{Peer-to-Peer, Grid, and Service-Orientation in Digital Library Architectures: 6th Thematic Workshop of the EU Network of Excellence DELOS. Revised Selected Papers. LNCS 3664}. Berlin: Springer-Verlag Heidelberg, 2005, p.~25--44. ISBN~3-540-28711-6.
