Combining Metric Features in Large Collections

D 2008

BATKO, Michal, Petra KOHOUTKOVÁ and Pavel ZEZULA

Basic information

Original name

Combining Metric Features in Large Collections

Name in Czech

Kombinování metrických charakteristik ve velkých kolekcích dat

Authors

BATKO, Michal (203 Czech Republic, guarantor), Petra KOHOUTKOVÁ (203 Czech Republic) and Pavel ZEZULA (203 Czech Republic)

Edition

Los Alamitos CA, Washington, Tokyo, 1st International Workshop on Similarity Search and Applications (SISAP 2008), p. 79-86, 8 pp. 2008

Publisher

IEEE Computer Society

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Mexico

Confidentiality degree

is not subject to a state or trade secret

References:

URL

RIV identification code

RIV/00216224:14330/08:00024185

Organization unit

Faculty of Informatics

ISBN

978-0-7695-3101-4

UT WoS

000255509900009

Keywords in English

similarity search; complex query; p2p network; approximation

Abstract

In the original

Current information systems are required to process complex digital objects, which are typically characterized by multiple descriptors. Since the values of many descriptors belong to non-sortable domains, they are effectively comparable only by a sort of similarity. Moreover, scalability is very important in the current digital-explosion age. Therefore, we propose a distributed extension of the well-known threshold algorithm for the peer-to-peer paradigm. The technique allows answering similarity queries that combine multiple similarity measures, and due to its peer-to-peer nature it is highly scalable. We also explore possibilities of approximate evaluation strategies, where some relevant results can be lost in favor of increasing the efficiency by an order of magnitude. To reveal the strengths and weaknesses of our approach, we have experimented with a database of 1.6 million images from Flickr, comparing the content of the images by five similarity measures from the MPEG-7 standard. To the best of our knowledge, the experience with such a huge real-life dataset is quite unique.
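
For orientation, the classic centralized threshold algorithm that the abstract refers to can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' distributed peer-to-peer extension: the function name `threshold_top_k`, the use of a plain sum as the monotone aggregate, the toy data, and the requirement that every object appears in every per-measure list are all assumptions made for the example.

```python
import heapq


def threshold_top_k(sorted_lists, k, aggregate=sum):
    """Return the k objects with the smallest aggregated distance.

    sorted_lists: one list per similarity measure, each holding
    (object_id, distance) pairs in ascending order of distance.
    Assumption: every object appears in every list, and `aggregate`
    is monotone (a plain sum here).
    """
    # Random-access tables: distance of any object under each measure.
    lookup = [dict(lst) for lst in sorted_lists]
    scores = {}  # object_id -> aggregated distance

    # Each round performs one sorted access on every list in parallel.
    for row in zip(*sorted_lists):
        for obj, _ in row:
            if obj not in scores:
                # Random access to fetch the object's other distances.
                scores[obj] = aggregate(table[obj] for table in lookup)

        # No unseen object can have an aggregate below this threshold.
        threshold = aggregate(dist for _, dist in row)
        top = heapq.nsmallest(k, scores.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] <= threshold:
            return top

    # Lists exhausted: every object has been seen.
    return heapq.nsmallest(k, scores.items(), key=lambda item: item[1])


# Toy data with two hypothetical measures over four objects.
color = [("a", 0.1), ("b", 0.2), ("c", 0.5), ("d", 0.9)]
shape = [("b", 0.1), ("c", 0.3), ("a", 0.6), ("d", 0.8)]
result = threshold_top_k([color, shape], k=2)
print([obj for obj, _ in result])
```

In this toy run the scan stops after three rounds, without ever reaching object "d" under sorted access, because the second-best aggregate already lies below the threshold.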

In Czech

Článek popisuje rozšíření existujícího "prahovacího" algoritmu pro prostředí peer-to-peer sítí. Technika umožňuje řešit podobnostní dotazy kombinující několik podobnostních měřítek a díky využití peer-to-peer technologie je vysoce škálovatelná. Dále jsou v článku rozebírány přínosy aproximativní strategie. Výsledky jsou ověřeny na databázi s 1,6 miliony obrázků ze systému Flickr.

Links

GP201/08/P507, research and development project

Name: Komplexní podobnostní dotazy nad rozsáhlými objemy dat

Investor: Czech Science Foundation, Complex similarity searching in very large data collections

1ET100300419, research and development project

Name: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu

Investor: Academy of Sciences of the Czech Republic, Intelligent Models, Algorithms, Methods and Tools for the Semantic Web (realization)
