Crawling, Indexing, and Similarity Searching Images on the Web

D 2008

BATKO, Michal, Fabrizio FALCHI, Claudio LUCCHESE, David NOVÁK, Raffaele PEREGO et al.

Basic information

Original name

Crawling, Indexing, and Similarity Searching Images on the Web

Name in Czech

Získávání, indexování a podobnostní vyhledávání obrázků na webu

Authors

BATKO, Michal (203 Czech Republic, belonging to the institution), Fabrizio FALCHI (380 Italy), Claudio LUCCHESE (380 Italy), David NOVÁK (203 Czech Republic, belonging to the institution), Raffaele PEREGO (380 Italy), Fausto RABITTI (380 Italy), Jan SEDMIDUBSKÝ (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, guarantor)

Edition

Mondello, Proceedings of the Sixteenth Italian Symposium on Advanced Database Systems, p. 382-389, 8 pp. 2008

Publisher

Salvatore Gaglio, Ignazio Infantino, Domenico Sacca

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Italy

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/08:00024250

Organization unit

Faculty of Informatics

ISBN

978-88-6122-154-3

Keywords in English

similarity search; content-based image retrieval; metric space; MPEG-7 descriptors; peer-to-peer search network

Abstract

In the original

In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images, to show the possibility to scale Content-based Image Retrieval (CBIR) systems towards the Web size. First, we had to tackle the non-trivial process of image crawling and descriptive feature extraction, performed by using the European EGEE computer GRID, building a test collection, the first of such scale, that will be opened to the research community for experiments and comparisons. Then, we had to develop indexing and searching mechanisms which can scale up to these volumes and answer similarity queries in real-time. The results of our experiments are very encouraging for future applications.

In Czech (English translation)

In this paper, we present our experience gained from building an experimental similarity search system over a data set of more than 50 million images. First, we had to solve the non-trivial process of acquiring the images and their descriptors in order to create a test collection, the first of such scale, which will be available to all researchers for experiments and comparisons. Then, we had to develop indexing and searching mechanisms that scale to such volumes while answering similarity queries in real time. The results of our experiments are very promising for future applications.

Links

GD102/05/H050, research and development project

Name: Integrovaný přístup k výchově studentů DSP v oblasti paralelních a distribuovaných systémů

Investor: Czech Science Foundation, Integrated approach to education of PhD students in the area of parallel and distributed systems

GP201/08/P507, research and development project

Name: Komplexní podobnostní dotazy nad rozsáhlými objemy dat

Investor: Czech Science Foundation, Complex similarity searching in very large data collections

1ET100300419, research and development project

Name: Inteligentní modely, algoritmy, metody a nástroje pro vytváření sémantického webu

Investor: Academy of Sciences of the Czech Republic, Intelligent Models, Algorithms, Methods and Tools for the Semantic Web (realization)
