ANTOL, Matej, Jaroslav OĽHA, Terézia SLANINÁKOVÁ and Vlastislav DOHNAL. Learned metric index - proposition of learned indexing for unstructured data. Information Systems. Elsevier, 2021, vol. 100, No 101774, p. 1-12. ISSN 0306-4379. Available from: https://dx.doi.org/10.1016/j.is.2021.101774.
Basic information
Original name Learned metric index - proposition of learned indexing for unstructured data
Authors ANTOL, Matej (703 Slovakia, guarantor, belonging to the institution), Jaroslav OĽHA (703 Slovakia, belonging to the institution), Terézia SLANINÁKOVÁ (703 Slovakia, belonging to the institution) and Vlastislav DOHNAL (203 Czech Republic, belonging to the institution).
Edition Information Systems, Elsevier, 2021, 0306-4379.
Other information
Original language English
Type of outcome Article in a journal
Field of Study 10200 1.2 Computer and information sciences
Country of publisher Netherlands
Confidentiality degree is not subject to a state or trade secret
Impact factor 3.180
RIV identification code RIV/00216224:14330/21:00118915
Organization unit Faculty of Informatics
Doi http://dx.doi.org/10.1016/j.is.2021.101774
UT WoS 000649115200005
Keywords in English Index structures;Learned index;Unstructured data;Content-based search;Metric space
Tags AIS-Q2, approximate search, DISA, index structures, LMI, metric data, similarity search
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 28/4/2022 09:52.
Abstract
The main paradigm of similarity searching in metric spaces has remained mostly unchanged for decades - data objects are organized into a hierarchical structure according to their mutual distances, using representative pivots to reduce the number of distance computations needed to efficiently search the data. We propose an alternative to this paradigm, using machine learning models to replace pivots, thus posing similarity search as a classification problem, which stands in for numerous expensive distance computations. Even a relatively naive implementation of this idea is more than competitive with state-of-the-art methods in terms of speed and recall, proving the concept as viable and showing great potential for its future development.
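The idea in the abstract, routing a query with a classifier instead of comparing it against pivots, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LearnedBucketIndex`, the softmax-regression model, the toy 2-D data, and the assumption that bucket labels are already given by some existing partitioning are all hypothetical choices made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance; stands in for an expensive metric function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LearnedBucketIndex:
    """Hypothetical one-level index: a classifier predicts the bucket of an object."""

    def __init__(self, data, labels, n_buckets, epochs=50, lr=0.1):
        dim = len(data[0])
        self.weights = [[0.0] * dim for _ in range(n_buckets)]
        self.bias = [0.0] * n_buckets
        # Assumption: labels come from some pre-existing partitioning of the data.
        self.buckets = [[] for _ in range(n_buckets)]
        for obj, label in zip(data, labels):
            self.buckets[label].append(obj)
        # Train softmax regression with plain SGD on the cross-entropy loss.
        for _ in range(epochs):
            for obj, label in zip(data, labels):
                probs = self.predict_proba(obj)
                for k in range(n_buckets):
                    grad = probs[k] - (1.0 if k == label else 0.0)
                    self.bias[k] -= lr * grad
                    for d in range(dim):
                        self.weights[k][d] -= lr * grad * obj[d]

    def predict_proba(self, obj):
        """Return one probability per bucket for the given object."""
        scores = [sum(w * x for w, x in zip(ws, obj)) + b
                  for ws, b in zip(self.weights, self.bias)]
        return softmax(scores)

    def search(self, query, k=1, n_probe=1):
        """Scan only the n_probe most probable buckets; return (results, scanned)."""
        probs = self.predict_proba(query)
        order = sorted(range(len(probs)), key=lambda i: -probs[i])
        candidates = [o for b in order[:n_probe] for o in self.buckets[b]]
        candidates.sort(key=lambda o: dist(query, o))
        return candidates[:k], len(candidates)


# Toy data: three clusters of 30 points each, labeled by cluster of origin.
random.seed(0)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data, labels = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(30):
        data.append((random.gauss(cx, 0.5), random.gauss(cy, 0.5)))
        labels.append(label)

index = LearnedBucketIndex(data, labels, n_buckets=3)
query = (4.8, 5.1)
approx, scanned = index.search(query, k=1, n_probe=1)      # approximate search
exact, scanned_all = index.search(query, k=1, n_probe=3)   # scans every bucket
print(scanned, scanned_all)
```

In this sketch, raising `n_probe` trades more distance computations for higher recall; with `n_probe` equal to the number of buckets the search degenerates to an exact linear scan.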
Links
GA19-02033S, research and development project. Name: Search, analytics, and annotation of human motion data streams
Investor: Czech Science Foundation
LM2018140, research and development project. Name: e-Infrastruktura CZ (Acronym: e-INFRA CZ)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/1549/2020, internal MU code. Name: Involvement of Faculty of Informatics students in the international scientific community 21 (Acronym: SKOMU)
Investor: Masaryk University
MUNI/A/1573/2020, internal MU code. Name: Applied research: search, analysis, and visualization of large-scale data, natural language processing, artificial intelligence for biomedical image analysis.
Investor: Masaryk University