Using hardware performance counters to speed up autotuning convergence on GPUs

J 2022


FILIPOVIČ, Jiří, Jana HOZZOVÁ, Amin NEZARAT, Jaroslav OĽHA and Filip PETROVIČ

Basic information

Original title

Using hardware performance counters to speed up autotuning convergence on GPUs

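Authors

FILIPOVIČ, Jiří (203 Czech Republic, guarantor, belonging to the institution), Jana HOZZOVÁ (703 Slovakia, belonging to the institution), Amin NEZARAT (364 Iran, belonging to the institution), Jaroslav OĽHA (703 Slovakia, belonging to the institution) and Filip PETROVIČ (703 Slovakia, belonging to the institution)

Edition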

Journal of Parallel and Distributed Computing, Elsevier, 2022, 0743-7315

Other information

Language

English

Type of outcome

Article in a journal

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Netherlands

Confidentiality

is not subject to a state or trade secret

Links

URL

Impact factor

Impact factor: 3.800

RIV code

RIV/00216224:14610/22:00125022

Organization unit

Institute of Computer Science

DOI

http://dx.doi.org/10.1016/j.jpdc.2021.10.003

UT WoS

000711621300002

Keywords in English

Auto-tuning; Search method; Performance counters; CUDA

Tags

J-Q1, rivok

Flags

International impact, Reviewed

Changed: 20 March 2023 12:40, doc. RNDr. Jiří Filipovič, Ph.D.

Abstract

In the original

Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly-performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in processed data or migration to different hardware. In this paper, we introduce a novel method for searching generic tuning spaces. The tuning spaces can contain tuning parameters changing any user-defined property of the source code. The method takes advantage of collecting hardware performance counters (also known as profiling counters) during empirical tuning. Those counters are used to navigate the searching process towards faster implementations. The method requires the tuning space to be sampled on any GPU. It builds a problem-specific model, which can be used during autotuning on various, even previously unseen inputs or GPUs. Using a set of five benchmarks, we experimentally demonstrate that our method can speed up autotuning when an application needs to be ported to different hardware or when it needs to process data with different characteristics. We also compared our method to state of the art and show that our method is superior in terms of the number of searching steps and typically outperforms other searches in terms of convergence time.
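The idea of letting performance counters steer the search can be illustrated with a small, purely synthetic sketch. It is not the authors' algorithm: the tuning space, the `run_and_profile` and `predicted_counters` functions, the two counters `mem` and `ins`, and the greedy selection rule are all hypothetical stand-ins chosen for illustration. The sketch assumes that each empirical run returns a runtime plus counters, and that a model built from earlier sampling (possibly on another GPU) can predict counters for untested configurations.

```python
import itertools

# Hypothetical tuning space: two parameters of an imaginary kernel.
SPACE = [
    {"block": b, "unroll": u}
    for b, u in itertools.product([32, 64, 128, 256], [1, 2, 4, 8])
]


def run_and_profile(cfg):
    """Stand-in for an empirical run: returns (runtime, counters).

    The formulas are synthetic; a real tuner would launch the kernel and
    read hardware performance counters from a profiler.
    """
    mem = 256 / cfg["block"]   # synthetic memory-subsystem stress
    ins = 8 / cfg["unroll"]    # synthetic instruction-issue stress
    runtime = max(mem, ins) + 0.1 * (mem + ins)
    return runtime, {"mem": mem, "ins": ins}


def predicted_counters(cfg):
    """Stand-in for a model built from prior sampling of the tuning space.

    The factor 2 mimics a different GPU: absolute values differ, but the
    relation between tuning parameters and counters is preserved.
    """
    return {"mem": 2 * 256 / cfg["block"], "ins": 2 * 8 / cfg["unroll"]}


def counter_guided_search(start, steps):
    """Greedy sketch: profile a configuration, find the counter that
    indicates the bottleneck, then test the unvisited configuration that
    the model predicts to relieve that bottleneck the most."""
    current = start
    best_cfg, best_time = None, float("inf")
    visited = []
    for _ in range(steps):
        runtime, counters = run_and_profile(current)
        visited.append(current)
        if runtime < best_time:
            best_cfg, best_time = current, runtime
        bottleneck = max(counters, key=counters.get)
        candidates = [c for c in SPACE if c not in visited]
        if not candidates:
            break

        def score(c):
            # Primary: predicted bottleneck counter; tie-break: total stress.
            pred = predicted_counters(c)
            return (pred[bottleneck], sum(pred.values()))

        current = min(candidates, key=score)
    return best_cfg, best_time, visited


if __name__ == "__main__":
    cfg, time, path = counter_guided_search(SPACE[0], steps=4)
    print("best configuration:", cfg, "runtime:", time)
    print("tested configurations:", path)
```

In this toy setting the model is exact up to scaling, so the search jumps to a good configuration almost immediately; with a real model and real counters the guidance would be noisier, and a practical search would likely mix such guidance with exploration.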

Related projects

EF16_013/0001802, research and development project

Name: CERIT Scientific Cloud

MUNI/A/1145/2021, internal MU code

Name: Large-Scale Computing Systems: Models, Applications and Verification XI. (Acronym: SV-FI MAV XI.)

Investor: Masaryk University, Large-Scale Computing Systems: Models, Applications and Verification XI.

Cite

FILIPOVIČ, Jiří, Jana HOZZOVÁ, Amin NEZARAT, Jaroslav OĽHA and Filip PETROVIČ. Using hardware performance counters to speed up autotuning convergence on GPUs. Journal of Parallel and Distributed Computing. Elsevier, 2022, vol. 160, February, pp. 16-35. ISSN 0743-7315. Available from: https://dx.doi.org/10.1016/j.jpdc.2021.10.003.

@article{1799520,
   author = {Filipovič, Jiří and Hozzová, Jana and Nezarat, Amin and Oľha, Jaroslav and Petrovič, Filip},
   month = {February},
   pages = {16--35},
   doi = {10.1016/j.jpdc.2021.10.003},
   keywords = {Auto-tuning; Search method; Performance counters; CUDA},
   language = {eng},
   issn = {0743-7315},
   journal = {Journal of Parallel and Distributed Computing},
   title = {Using hardware performance counters to speed up autotuning convergence on GPUs},
   url = {https://www.sciencedirect.com/science/article/pii/S0743731521001945?via%3Dihub},
   volume = {160},
   year = {2022}
}

TY  - JOUR
ID  - 1799520
AU  - Filipovič, Jiří
AU  - Hozzová, Jana
AU  - Nezarat, Amin
AU  - Oľha, Jaroslav
AU  - Petrovič, Filip
PY  - 2022
TI  - Using hardware performance counters to speed up autotuning convergence on GPUs
JF  - Journal of Parallel and Distributed Computing
VL  - 160
IS  - February
SP  - 16
EP  - 35
PB  - Elsevier
SN  - 0743-7315
KW  - Auto-tuning
KW  - Search method
KW  - Performance counters
KW  - CUDA
UR  - https://www.sciencedirect.com/science/article/pii/S0743731521001945?via%3Dihub
N2  - Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly-performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in processed data or migration to different hardware. In this paper, we introduce a novel method for searching generic tuning spaces. The tuning spaces can contain tuning parameters changing any user-defined property of the source code. The method takes advantage of collecting hardware performance counters (also known as profiling counters) during empirical tuning. Those counters are used to navigate the searching process towards faster implementations. The method requires the tuning space to be sampled on any GPU. It builds a problem-specific model, which can be used during autotuning on various, even previously unseen inputs or GPUs. Using a set of five benchmarks, we experimentally demonstrate that our method can speed up autotuning when an application needs to be ported to different hardware or when it needs to process data with different characteristics. We also compared our method to state of the art and show that our method is superior in terms of the number of searching steps and typically outperforms other searches in terms of convergence time.
ER  -

FILIPOVIČ, Jiří, Jana HOZZOVÁ, Amin NEZARAT, Jaroslav OĽHA and Filip PETROVIČ. Using hardware performance counters to speed up autotuning convergence on GPUs. \textit{Journal of Parallel and Distributed Computing}. Elsevier, 2022, vol.~160, February, pp.~16--35. ISSN~0743-7315. Available from: https://dx.doi.org/10.1016/j.jpdc.2021.10.003.
