D 2026

Kubernetes Scheduling with Checkpoint/Restore: Challenges and Open Problems

SPIŠAKOVÁ, Viktória; Radostin STOYANOV; Lukáš HEJTMÁNEK; Dalibor KLUSÁČEK; Adrian REBER et al.

Basic information

Original title

Kubernetes Scheduling with Checkpoint/Restore: Challenges and Open Problems

Authors

SPIŠAKOVÁ, Viktória; Radostin STOYANOV; Lukáš HEJTMÁNEK; Dalibor KLUSÁČEK; Adrian REBER and Rodrigo BRUNO

Edition

Cham (Switzerland), Job Scheduling Strategies for Parallel Processing, pp. 41-62, 22 pp., 2026

Publisher

Springer

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10200 1.2 Computer and information sciences

Country of publisher

Switzerland

Confidentiality

is not subject to a state or trade secret

Publication form

printed version "print"

Impact factor

Impact factor: 0.402 in 2005

Marked for transfer to RIV

Yes

Organization unit

Faculty of Informatics

ISBN

978-3-032-10506-6

ISSN

EID Scopus

Keywords in English

Checkpoint and Restore; Kubernetes; Containers; Resource Management; Scheduling

Tags

Flags

International impact, Reviewed
Changed: 2 April 2026 14:32, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Efficient resource management and scheduling have been persistent challenges since the early days of computing and remain critical to this day. The widespread adoption of containers managed by orchestrators like Kubernetes has introduced new dimensions to this challenge. Despite the lightweight nature and minimal overhead of containers, they still suffer from utilization inefficiencies due to overprovisioning. Existing scheduling techniques are not enough to meet these demands, and there is a growing need for orchestration and scheduling policies that support advanced preemption, migration, and fault tolerance. Well-established container checkpoint/restore (C/R) mechanisms, implemented through tools like CRIU, offer a promising solution for improving resource scheduling efficiency. However, these mechanisms remain only partially integrated with platforms like Kubernetes. In this paper, we explore the use cases for general C/R, examine the current state, and delve into the open problems and challenges associated with native integration into Kubernetes. We propose potential solutions to these challenges, offering a pathway towards more efficient resource management to better meet the needs of today's computational landscape. While scheduling efficiency is considered critical in HPC clusters, serverless and deep learning platforms also benefit directly from these optimizations.

Links

LM2018140, research and development project
Name: e-Infrastruktura CZ (Acronym: e-INFRA CZ)
Investor: Ministry of Education, Youth and Sports of the Czech Republic, e-Infrastruktura CZ