ANTOL, Matej, Miriama JÁNOŠOVÁ and Vlastislav DOHNAL. Metric hull as similarity-aware operator for representing unstructured data. Pattern Recognition Letters. Amsterdam: Elsevier, 2021, vol. 149, September 2021, p. 91-98. ISSN 0167-8655. Available from: https://dx.doi.org/10.1016/j.patrec.2021.05.011.
Basic information
Original name Metric hull as similarity-aware operator for representing unstructured data
Authors ANTOL, Matej (703 Slovakia, belonging to the institution), Miriama JÁNOŠOVÁ (703 Slovakia, belonging to the institution) and Vlastislav DOHNAL (203 Czech Republic, guarantor, belonging to the institution).
Edition Pattern Recognition Letters, Amsterdam, Elsevier, 2021, 0167-8655.
Other information
Original language English
Type of outcome Article in a journal
Field of Study 10200 1.2 Computer and information sciences
Country of publisher Netherlands
Confidentiality degree is not subject to a state or trade secret
Impact factor 4.757
RIV identification code RIV/00216224:14330/21:00121873
Organization unit Faculty of Informatics
Doi http://dx.doi.org/10.1016/j.patrec.2021.05.011
UT WoS 000680052800013
Keywords in English Similarity operators; Metric space; Data aggregation
Tags AIS-Q2, data representation, DISA, LMI, metric data, metric hull, similarity search
Tags International impact, Reviewed
Changed by doc. RNDr. Vlastislav Dohnal, Ph.D., učo 2952. Changed: 19/4/2022 12:11.
Abstract
Similarity searching has become widely utilized in many online services processing unstructured and complex data, e.g., Google Images. Metric spaces are often applied to model and organize such data by their mutual similarity. As top-k queries provide only a local view on data, a data analyst must pose multiple requests to observe the entire dataset. Thus, group-by operators for metric data have been proposed. These operators identify groups by respecting a given similarity constraint and produce a set of objects per group. The analyst can then tediously browse these sets directly, but representative members may provide better insight. In this paper, we focus on concise representations of metric datasets. We propose a novel concept of a metric hull which encompasses a given set by selecting a few objects. Testing an object to be part of the set is then made much faster. We verify this concept on synthetic Euclidean data and real-life image and text datasets and show its effectiveness and scalability. The metric hulls provide much faster and more compact representations when compared with commonly used ball representations.
Links
EF16_019/0000822, research and development project. Name: Centre of Excellence for Cybercrime, Cybersecurity and Protection of Critical Information Infrastructures
MUNI/A/1549/2020, internal MU code. Name: Involvement of Faculty of Informatics students in the international scientific community 21 (Acronym: SKOMU)
Investor: Masaryk University
MUNI/A/1573/2020, internal MU code. Name: Applied research: search, analysis and visualization of large-scale data, natural language processing, artificial intelligence for biomedical image analysis.
Investor: Masaryk University