Building a Web-scale Image Similarity Search System

BATKO, Michal, Fabrizio FALCHI, Claudio LUCCHESE, David NOVÁK, Raffaele PEREGO, Fausto RABITTI, Jan SEDMIDUBSKÝ and Pavel ZEZULA. Building a Web-scale Image Similarity Search System. Online. Multimedia Tools and Applications. Springer Netherlands, 2010, vol. 47, No 3, p. 599-629. ISSN 1380-7501. [cited 2024-04-23]

Basic information
Original name	Building a Web-scale Image Similarity Search System
Name in Czech	Budování rozsáhlého systému pro podobnostní vyhledávání v obrázcích
Authors	BATKO, Michal (203 Czech Republic, belonging to the institution), Fabrizio FALCHI (380 Italy), Claudio LUCCHESE (380 Italy), David NOVÁK (203 Czech Republic, belonging to the institution), Raffaele PEREGO (380 Italy), Fausto RABITTI (380 Italy), Jan SEDMIDUBSKÝ (203 Czech Republic, guarantor, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution)
Edition	Multimedia Tools and Applications, Springer Netherlands, 2010, 1380-7501.

Other information
Original language	English
Type of outcome	Article in a journal
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	United States of America
Confidentiality degree	is not subject to a state or trade secret
Impact factor	0.914
RIV identification code	RIV/00216224:14330/10:00042682
Organization unit	Faculty of Informatics
UT WoS	000275800200012
Keywords in English	similarity search; content-based image retrieval; metric space; MPEG-7 descriptors; peer-to-peer search network
Tags	DISA, International impact, Reviewed
Changed by	RNDr. Pavel Šmerk, Ph.D., university ID (učo) 3880. Changed: 10/3/2016 11:28.

Abstract

As the number of digital images grows rapidly and Content-based Image Retrieval (CBIR) gains in popularity, CBIR systems should leap towards Web-scale datasets. In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images. The first major challenge we faced was obtaining a collection of images of this scale together with the corresponding descriptive features. We tackled the non-trivial process of image crawling and extraction of several MPEG-7 descriptors. The result of this effort is a test collection, the first of such scale, open to the research community for experiments and comparisons. The second challenge was to develop indexing and searching mechanisms able to scale to the target size and to answer similarity queries in real time. We achieved this goal by creating sophisticated centralized and distributed structures based purely on the metric space model of data. Combining them has resulted in an extremely flexible and scalable solution. In this paper, we study in detail the performance of this technology and how it evolves as the data volume grows by three orders of magnitude. The results of the experiments are very encouraging and promising for future applications.
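
The abstract states that the index structures rely purely on the metric space model, meaning they use only a distance function that satisfies the metric postulates. The following is a minimal illustrative sketch of that idea, not the structures built in the paper: it shows generic pivot-based filtering, in which the triangle inequality gives a lower bound on the distance between a query and an object, so that some objects can be discarded without computing the real distance. The class name PivotIndex, the L1 distance, and the toy two-dimensional vectors are all assumptions made for illustration; the record does not specify the actual descriptors' distance functions or index layout.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Manhattan distance; it satisfies the metric postulates."""
    return sum(abs(x - y) for x, y in zip(a, b))


class PivotIndex:
    """Hypothetical toy index that precomputes object-to-pivot distances."""

    def __init__(self, data: List[Vector], pivots: List[Vector], dist: Metric):
        self.data = data
        self.pivots = pivots
        self.dist = dist
        # One row per object, one column per pivot.
        self.table = [[dist(obj, p) for p in pivots] for obj in data]
        # Number of query-to-object distances evaluated by the last query.
        self.computations = 0

    def range_query(self, q: Vector, radius: float) -> List[Tuple[float, Vector]]:
        """Return (distance, object) pairs with distance <= radius, nearest first."""
        self.computations = 0
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        hits = []
        for obj, row in zip(self.data, self.table):
            # Triangle inequality: |d(q, p) - d(obj, p)| <= d(q, obj) for any pivot p.
            lower_bound = max(
                (abs(qp - op) for qp, op in zip(q_to_pivots, row)), default=0.0
            )
            if lower_bound > radius:
                continue  # obj cannot be within the radius; skip the real distance
            self.computations += 1
            d = self.dist(q, obj)
            if d <= radius:
                hits.append((d, obj))
        return sorted(hits, key=lambda hit: hit[0])


# Toy usage: four 2-D points stand in for image descriptors.
data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (10.0, 10.0)]
index = PivotIndex(data, pivots=[(0.0, 0.0)], dist=l1_distance)
hits = index.range_query((1.0, 0.0), radius=2.0)
```

In this toy run the two distant points are discarded from the pivot table alone, so only two of the four query-to-object distances are evaluated. Because the filter depends only on the metric postulates, the same code would work with any other metric in place of l1_distance.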

Abstract (translated from Czech)

With the enormous growth in the number of digital images, there is a need to design systems able to search large image collections by content. In this article, we present our experience gained from building an experimental similarity search system on a dataset containing more than 50 million images. First, we had to solve the non-trivial process of acquiring images and their descriptors in order to create a test collection, the first of such scale, which will be available to all researchers for experiments and comparisons. Then we had to develop indexing and searching mechanisms that scale to such volumes while answering similarity queries in real time. The results of our experiments are very promising for future applications.

Links
GA201/09/0683, research and development project	Name: Vyhledávání v rozsáhlých multimediálních databázích
GA201/09/0683, research and development project	Investor: Czech Science Foundation, Similarity Searching in Very Large Multimedia Databases
GD102/09/H042, research and development project	Name: Matematické a inženýrské metody pro vývoj spolehlivých a bezpečných paralelních a distribuovaných počítačových systémů
GD102/09/H042, research and development project	Investor: Czech Science Foundation
GP201/08/P507, research and development project	Name: Komplexní podobnostní dotazy nad rozsáhlými objemy dat
GP201/08/P507, research and development project	Investor: Czech Science Foundation, Complex similarity searching in very large data collections
1M0545, research and development project	Name: Institut Teoretické Informatiky
1M0545, research and development project	Investor: Ministry of Education, Youth and Sports of the CR, Institute for Theoretical Computer Science
