D 2014

Understanding the Importance of Interactions among Job Scheduling Policies

TÓTH, Šimon and Dalibor KLUSÁČEK

Basic information

Original name

Understanding the Importance of Interactions among Job Scheduling Policies

Authors

TÓTH, Šimon (203 Czech Republic, guarantor, belonging to the institution) and Dalibor KLUSÁČEK (203 Czech Republic)

Edition

Brno, Czech Republic, Memics 2014, p. 144-145, 2 pp. 2014

Publisher

NOVPRESS

Other information

Language

English

Type of outcome

Article in proceedings

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/14:00073853

Organization unit

Faculty of Informatics

ISBN

978-80-214-5022-6

Keywords in English

Scheduling; Queues; Fairshare; Simulation

Tags

Reviewed
Changed: 10/4/2015 10:39, RNDr. Šimon Tóth

Abstract

In the original

Many studies in the past two decades have focused on the problem of efficient job scheduling in large computational systems. While many new scheduling algorithms have been proposed, mainstream resource management systems and schedulers still use only a limited set of scheduling policies. Typically, the core of the system is based on the simple First Come First Served (FCFS) approach, and backfilling (a trivial optimization of FCFS that increases utilization) is usually the most advanced option available. Given that backfilling was proposed back in 1995, there is clearly some misunderstanding between the research community and system administrators concerning "what is really important". In this work, recently presented at the Euro-Par conference, we show that operating a production scheduler is far more complex than just choosing a proper scheduling algorithm. Drawing on our experience with the Czech National Grid Infrastructure MetaCentrum, we explain several additional challenges that arise when searching for a functional solution. These problems stem from the fact that real systems must meet far more complicated requirements than those typically considered in classical research papers. In fact, production systems need to balance various policies that are put in place to satisfy both resource providers and users. While many works address these policies separately, e.g., fairshare for fair resource allocation, the complex interactions between policies are not properly discussed in the literature. In our work, we describe how to approach these interactions when developing site-specific policies. Notably, we describe how (priority) queues interact with scheduling algorithms, fairshare and anti-starvation mechanisms. Moreover, we present a case study describing how detailed simulations were used to find a new configuration for MetaCentrum that significantly increased its performance.
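The abstract characterizes backfilling only as a trivial optimization of FCFS. To make that concrete, the following Python sketch compares plain FCFS with the widely known EASY variant of backfilling. It is an illustration only, not the MetaCentrum scheduler or the authors' simulator, and it assumes that all jobs are submitted at time 0, that runtime estimates are exact, and that the system is a single pool of interchangeable CPUs. The `Job` class and the `simulate` function are hypothetical names introduced here.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str      # hypothetical job identifier (must be unique)
    cpus: int      # CPUs requested (assumed <= total CPUs)
    runtime: int   # assumed to equal the user's runtime estimate


def simulate(jobs, total_cpus, backfill):
    """Return {job name: start time}; all jobs are queued at t=0 in list order."""
    queue = list(jobs)
    running = []  # min-heap of (end_time, cpus) for running jobs
    free, t, start = total_cpus, 0, {}

    def run(job):
        nonlocal free
        start[job.name] = t
        heapq.heappush(running, (t + job.runtime, job.cpus))
        free -= job.cpus

    while queue:
        # Plain FCFS: start jobs from the head of the queue while they fit.
        while queue and queue[0].cpus <= free:
            run(queue.pop(0))
        if queue and backfill:
            head = queue[0]
            # Shadow time: earliest moment enough CPUs are released for the head job.
            avail, shadow = free, t
            for end, cpus in sorted(running):
                if avail >= head.cpus:
                    break
                avail += cpus
                shadow = end
            extra = avail - head.cpus  # CPUs the head job will not need at shadow time
            # EASY rule: a later job may jump ahead only if it cannot delay the head job.
            for job in list(queue[1:]):
                if job.cpus > free:
                    continue
                ends_before_shadow = t + job.runtime <= shadow
                if ends_before_shadow or job.cpus <= extra:
                    queue.remove(job)
                    run(job)
                    if not ends_before_shadow:
                        extra -= job.cpus
        if queue:
            # Advance to the next completion and release all CPUs freed at that instant.
            t, cpus = heapq.heappop(running)
            free += cpus
            while running and running[0][0] == t:
                free += heapq.heappop(running)[1]
    return start


if __name__ == "__main__":
    jobs = [Job("A", 2, 10), Job("B", 4, 5), Job("C", 2, 8), Job("D", 1, 20)]
    print("FCFS:", simulate(jobs, 4, backfill=False))
    print("EASY:", simulate(jobs, 4, backfill=True))
```

On this four-CPU example, FCFS starts job C at time 15 because it waits behind the wide job B. With backfilling, C starts at time 0 in the gap before B, and B still starts at time 10. Production schedulers layer queues, fairshare and anti-starvation rules on top of such a core, and that layering is where the interactions discussed in the abstract arise.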

Links

GAP202/12/0306, research and development project
Name: Dyschnet - Dynamic planning and scheduling of computational and network resources (Acronym: Dyschnet)
Investor: Czech Science Foundation
MUNI/A/0855/2013, internal MU code
Name: Large-scale computing systems: models, applications and verification III. (Acronym: FI MAV III.)
Investor: Masaryk University, Category A