NOVÁK, David, Michal BATKO and Pavel ZEZULA. Large-scale similarity data management with distributed Metric Index. Online. Information Processing and Management. ELSEVIER, 2012, vol. 48, No 5, p. 855-872. ISSN 0306-4573. Available from: https://dx.doi.org/10.1016/j.ipm.2010.12.004. [cited 2024-04-23]
Basic information
Original name Large-scale similarity data management with distributed Metric Index
Name in Czech Zpracování rozsáhlých kolekcí podobnostních dat pomocí distribuovaného metrického indexu
Authors NOVÁK, David (203 Czech Republic, belonging to the institution), Michal BATKO (203 Czech Republic, guarantor, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution)
Edition Information Processing and Management, ELSEVIER, 2012, 0306-4573.
Other information
Original language English
Type of outcome Article in a journal
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher United States of America
Confidentiality degree is not subject to a state or trade secret
Impact factor 0.817
RIV identification code RIV/00216224:14330/12:00057505
Organization unit Faculty of Informatics
DOI http://dx.doi.org/10.1016/j.ipm.2010.12.004
UT WoS 000307682100005
Keywords in English Distributed data structures; Performance tuning; Similarity search; Scalability; Peer-to-peer structured networks; Metric space
Tags DISA
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 23/4/2013 12:17.
Abstract
Metric space is a universal and versatile model of similarity that can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management remains an open research challenge. In this work, we try to make an important step towards such a management system, one that would be able to scale to data collections of billions of objects. We propose a distributed index structure for similarity data management called the Metric Index (M-Index), which can answer queries in either a precise or an approximate manner. This technique can take advantage of any distributed hash table that supports interval queries and use it as an underlying index. We have performed numerous experiments to test various settings of the M-Index structure, and we have demonstrated its usability by developing a full-featured, publicly available Web application.
Links
GAP103/10/0886, research and development project
Name: Vizuální vyhledávání obrázků na Webu (Acronym: VisualWeb)
Investor: Czech Science Foundation, Content-based Image Retrieval on the Web Scale
GPP202/10/P220, research and development project
Name: Podobnostní vyhledávání s konstantní škálovatelností (Acronym: SIM-SCALE)
Investor: Czech Science Foundation
VF20102014004, research and development project
Name: Multimediální analýza (Acronym: Multimediální analýza)
Investor: Ministry of the Interior of the CR