2026
Kubernetes Scheduling with Checkpoint/Restore: Challenges and Open Problems
SPIŠAKOVÁ, Viktória; Radostin STOYANOV; Lukáš HEJTMÁNEK; Dalibor KLUSÁČEK; Adrian REBER et al.
Basic information
Original title
Kubernetes Scheduling with Checkpoint/Restore: Challenges and Open Problems
Authors
SPIŠAKOVÁ, Viktória; Radostin STOYANOV; Lukáš HEJTMÁNEK; Dalibor KLUSÁČEK; Adrian REBER and Rodrigo BRUNO
Edition
Cham (Switzerland), Job Scheduling Strategies for Parallel Processing, pp. 41-62, 22 pp. 2026
Publisher
Springer
Other information
Language
English
Type of outcome
Proceedings paper
Field of study
10200 1.2 Computer and information sciences
Country of publisher
Switzerland
Confidentiality
is not subject to a state or trade secret
Publication form
printed version "print"
Impact factor
Impact factor: 0.402 in 2005
Marked to be transferred to RIV
Yes
Organization unit
Faculty of Informatics
ISBN
978-3-032-10506-6
ISSN
EID Scopus
Keywords in English
Checkpoint and Restore; Kubernetes; Containers; Resource Management; Scheduling
Tags
Flags
International impact, Reviewed
Changed: 2 April 2026 14:32, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original language
Efficient resource management and scheduling have been persistent challenges since the early days of computing and remain critical to this day. The widespread adoption of containers managed by orchestrators like Kubernetes has introduced new dimensions to this challenge. Despite the lightweight nature and minimal overhead of containers, they still suffer from utilization inefficiencies due to overprovisioning. Existing scheduling techniques are not enough to meet these demands, and there is a growing need for orchestration and scheduling policies that support advanced preemption, migration, and fault tolerance. Well-established container checkpoint/restore (C/R) mechanisms, implemented through tools like CRIU, offer a promising solution for improving resource scheduling efficiency. However, these mechanisms remain only partially integrated with platforms like Kubernetes. In this paper, we explore the use cases for general C/R, examine the current state, and delve into the open problems and challenges associated with native integration into Kubernetes. We propose potential solutions to these challenges, offering a pathway towards more efficient resource management to better meet the needs of today's computational landscape. While scheduling efficiency is considered critical in HPC clusters, serverless and deep learning platforms also benefit directly from these optimizations.
Links
LM2018140, R&D project