D 2025

On the Costs and Benefits of Learned Indexing for Dynamic High-Dimensional Data

SLANINÁKOVÁ, Terézia; Jaroslav OĽHA; David PROCHÁZKA; Matej ANTOL; Vlastislav DOHNAL et al.

Basic information

Original name

On the Costs and Benefits of Learned Indexing for Dynamic High-Dimensional Data

Edition

1st ed. Cham, Big Data Analytics and Knowledge Discovery 27th International Conference, DaWaK 2025, Bangkok, Thailand, August 25–27, 2025, Proceedings, p. 251-258, 8 pp. 2025

Publisher

Springer Cham

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10200 1.2 Computer and information sciences

Country of publisher

Switzerland

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

Organization unit

Faculty of Informatics

ISBN

978-3-032-02214-1

Keywords in English

Learned indexing; Dynamization; Dynamic datasets; kNN search; ANN search

Tags

International impact, Reviewed
Changed: 21/8/2025 14:23, doc. RNDr. Vlastislav Dohnal, Ph.D.

Abstract

In the original language

One of the main challenges within the growing research area of learned indexing is the lack of adaptability to dynamically expanding datasets. This paper explores the dynamization of a static learned index for complex data through operations such as node splitting and broadening, enabling efficient adaptation to new data. Furthermore, we evaluate the trade-offs between static and dynamic approaches by introducing an amortized cost model to assess query performance in tandem with the build costs of the index structure, enabling experimental determination of when a dynamic learned index outperforms its static counterpart. We apply the dynamization method to a static learned index and demonstrate that, owing to its superior scaling, the dynamic index quickly outperforms the static implementation in terms of overall costs as the database grows.
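
The idea of weighing build costs against query costs as the database grows can be illustrated with a minimal sketch. This is not the cost model from the paper: it assumes, purely for illustration, that a static index is rebuilt from scratch after every batch of insertions (cost proportional to the current dataset size), while a dynamic index pays only for the newly inserted objects. All function names, parameters, and cost constants below are hypothetical.

```python
def cumulative_costs(batch_sizes, queries_per_batch,
                     static_build_per_obj, dynamic_insert_per_obj,
                     static_query_cost, dynamic_query_cost):
    """Return a list of (dataset_size, static_total, dynamic_total) tuples.

    Assumption (not from the paper): the static index is fully rebuilt
    after each batch, the dynamic index only absorbs the new batch.
    """
    size = 0
    static_total = 0.0
    dynamic_total = 0.0
    history = []
    for batch in batch_sizes:
        size += batch
        # Static: rebuild over the whole dataset, then answer queries.
        static_total += static_build_per_obj * size
        static_total += queries_per_batch * static_query_cost
        # Dynamic: insert only the new objects, then answer queries.
        dynamic_total += dynamic_insert_per_obj * batch
        dynamic_total += queries_per_batch * dynamic_query_cost
        history.append((size, static_total, dynamic_total))
    return history


def first_crossover(history):
    """Smallest dataset size at which the dynamic total drops below the static total."""
    for size, static_total, dynamic_total in history:
        if dynamic_total < static_total:
            return size
    return None


# Hypothetical cost constants in arbitrary units.
history = cumulative_costs(
    batch_sizes=[100] * 6,
    queries_per_batch=10,
    static_build_per_obj=1.0,
    dynamic_insert_per_obj=3.0,
    static_query_cost=1.0,
    dynamic_query_cost=2.0,
)
crossover = first_crossover(history)
```

With these made-up constants the dynamic variant is more expensive per inserted object and per query, yet its cumulative cost falls below the static one once repeated rebuilds dominate. The actual break-even point depends entirely on measured costs.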

Links

GF23-07040K, research and development project
Name: Naučené indexy pro podobností hledání
Investor: Czech Science Foundation, Learned Indexing for Similarity Searching, Lead Agency
LM2018131, research and development project
Name: Česká národní infrastruktura pro biologická data (Acronym: ELIXIR-CZ)
Investor: Ministry of Education, Youth and Sports of the CR, Czech National Infrastructure for Biological Data
LM2018140, research and development project
Name: e-Infrastruktura CZ (Acronym: e-INFRA CZ)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/1638/2024, internal MU code
Name: Umělá inteligence a správa komplexních rozsáhlých dat
Investor: Masaryk University, Artificial intelligence in large-scale data management