J 2012

Large-scale similarity data management with distributed Metric Index

NOVÁK, David, Michal BATKO and Pavel ZEZULA

Basic information

Original name

Large-scale similarity data management with distributed Metric Index

Name in Czech

Zpracování rozsáhlých kolekcí podobnostních dat pomocí distribuovaného metrického indexu

Authors

NOVÁK, David (203 Czech Republic, belonging to the institution), Michal BATKO (203 Czech Republic, guarantor, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution)

Edition

Information Processing and Management, ELSEVIER, 2012, 0306-4573

Other information

Language

English

Type of outcome

Article in a professional journal

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

United States of America

Confidentiality degree

Not subject to a state or trade secret

Impact factor

0.817

RIV identification code

RIV/00216224:14330/12:00057505

Organization unit

Faculty of Informatics

UT WoS

000307682100005

Keywords in English

Distributed data structures; Performance tuning; Similarity search; Scalability; Peer-to-peer structured networks; Metric space

Tags

International impact, Reviewed
Changed: 23/4/2013 12:17, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Metric space is a universal and versatile model of similarity that can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management remains an open research challenge. In this work, we aim to take an important step towards a management system that can scale to data collections of billions of objects. We propose a distributed index structure for similarity data management called the Metric Index (M-Index), which can answer queries in either a precise or an approximate manner. This technique can use any distributed hash table that supports interval queries as its underlying index. We have performed numerous experiments to test various settings of the M-Index structure, and we have demonstrated its usability by developing a full-featured, publicly available Web application.

Links

GAP103/10/0886, research and development project
Name: Vizuální vyhledávání obrázků na Webu (Acronym: VisualWeb)
Investor: Czech Science Foundation, Content-based Image Retrieval on the Web Scale
GPP202/10/P220, research and development project
Name: Podobnostní vyhledávání s konstantní škálovatelností (Acronym: SIM-SCALE)
Investor: Czech Science Foundation
VF20102014004, research and development project
Name: Multimediální analýza (Acronym: Multimediální analýza)
Investor: Ministry of the Interior of the CR