J 2021

Learned metric index - proposition of learned indexing for unstructured data

ANTOL, Matej, Jaroslav OĽHA, Terézia SLANINÁKOVÁ and Vlastislav DOHNAL

Basic information

Original name

Learned metric index - proposition of learned indexing for unstructured data

Authors

ANTOL, Matej (703 Slovakia, guarantor, belonging to the institution), Jaroslav OĽHA (703 Slovakia, belonging to the institution), Terézia SLANINÁKOVÁ (703 Slovakia, belonging to the institution) and Vlastislav DOHNAL (203 Czech Republic, belonging to the institution)

Edition

Information Systems, Elsevier, 2021, 0306-4379

Other information

Language

English

Type of outcome

Article in a specialist periodical

Field of Study

10200 1.2 Computer and information sciences

Country of publisher

Netherlands

Confidentiality degree

is not subject to a state or trade secret

Impact factor

Impact factor: 3.180

RIV identification code

RIV/00216224:14330/21:00118915

Organization unit

Faculty of Informatics

UT WoS

000649115200005

Keywords in English

Index structures;Learned index;Unstructured data;Content-based search;Metric space

Tags

International impact, Reviewed
Changed: 28/4/2022 09:52, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

The main paradigm of similarity searching in metric spaces has remained mostly unchanged for decades - data objects are organized into a hierarchical structure according to their mutual distances, using representative pivots to reduce the number of distance computations needed to efficiently search the data. We propose an alternative to this paradigm, using machine learning models to replace pivots, thus posing similarity search as a classification problem, which stands in for numerous expensive distance computations. Even a relatively naive implementation of this idea is more than competitive with state-of-the-art methods in terms of speed and recall, proving the concept as viable and showing great potential for its future development.
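The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes Euclidean vectors, a k-means-style partitioning as the source of training labels, and a small softmax classifier as the learned model. All names (`LearnedIndex`, `partition`, `SoftmaxClassifier`, `max_buckets`) are hypothetical. At query time the classifier ranks the buckets, and distances are computed only for objects in the most probable ones.

```python
import math
import random


def dist(a, b):
    """Euclidean distance; stands in for any metric distance function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition(data, n_buckets, rng, rounds=5):
    """Assumed labelling step: a few k-means rounds that assign each object to a bucket."""
    centers = rng.sample(data, n_buckets)
    labels = [0] * len(data)
    for _ in range(rounds):
        labels = [min(range(n_buckets), key=lambda c: dist(p, centers[c])) for p in data]
        for c in range(n_buckets):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxClassifier:
    """Tiny multinomial logistic regression trained with plain SGD."""

    def __init__(self, dim, n_classes):
        # One weight vector per class; the last weight is the bias term.
        self.w = [[0.0] * (dim + 1) for _ in range(n_classes)]

    def predict_proba(self, x):
        xb = list(x) + [1.0]
        return softmax([sum(wi * xi for wi, xi in zip(w, xb)) for w in self.w])

    def fit(self, xs, ys, epochs=30, lr=0.5):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                probs = self.predict_proba(x)
                xb = list(x) + [1.0]
                for c, w in enumerate(self.w):
                    grad = probs[c] - (1.0 if c == y else 0.0)
                    for j in range(len(w)):
                        w[j] -= lr * grad * xb[j]


class LearnedIndex:
    """Buckets of objects plus a classifier that predicts which bucket a query belongs to."""

    def __init__(self, data, n_buckets=4, seed=0):
        rng = random.Random(seed)
        labels = partition(data, n_buckets, rng)
        self.model = SoftmaxClassifier(len(data[0]), n_buckets)
        self.model.fit(data, labels)
        self.buckets = [[] for _ in range(n_buckets)]
        for p, l in zip(data, labels):
            self.buckets[l].append(p)

    def knn(self, query, k=1, max_buckets=1):
        # Classification replaces pivot-based navigation: rank buckets by predicted probability.
        probs = self.model.predict_proba(query)
        order = sorted(range(len(probs)), key=lambda c: -probs[c])
        # Distances are computed only for objects in the visited buckets.
        candidates = [p for c in order[:max_buckets] for p in self.buckets[c]]
        return sorted(candidates, key=lambda p: dist(query, p))[:k]


rng = random.Random(1)
data = [[rng.random(), rng.random()] for _ in range(200)]
index = LearnedIndex(data, n_buckets=4)
query = [0.3, 0.7]
approx = index.knn(query, k=5, max_buckets=1)  # approximate: one bucket visited
exact = index.knn(query, k=5, max_buckets=4)   # all buckets visited: same as a full scan
```

In this sketch, `max_buckets` trades recall for speed: visiting fewer buckets means fewer distance computations but may miss true neighbours, while visiting every bucket reduces to an exhaustive scan.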

Links

GA19-02033S, research and development project
Name: Search, analytics, and annotation of human motion data streams
Investor: Czech Science Foundation
LM2018140, research and development project
Name: e-Infrastruktura CZ (Acronym: e-INFRA CZ)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/1549/2020, internal MU code
Name: Involvement of Faculty of Informatics students in the international scientific community 21 (Acronym: SKOMU)
Investor: Masaryk University
MUNI/A/1573/2020, internal MU code
Name: Applied research: search, analysis and visualization of large-scale data, natural language processing, artificial intelligence for biomedical image analysis.
Investor: Masaryk University