FILIPOVIČ, Jiří, Jana HOZZOVÁ, Amin NEZARAT, Jaroslav OĽHA and Filip PETROVIČ. Using hardware performance counters to speed up autotuning convergence on GPUs. Journal of Parallel and Distributed Computing. Elsevier, 2022, vol. 160, February, p. 16-35. ISSN 0743-7315. Available from: https://dx.doi.org/10.1016/j.jpdc.2021.10.003.
Basic information
Original name Using hardware performance counters to speed up autotuning convergence on GPUs
Authors FILIPOVIČ, Jiří (203 Czech Republic, guarantor, belonging to the institution), Jana HOZZOVÁ (703 Slovakia, belonging to the institution), Amin NEZARAT (364 Islamic Republic of Iran, belonging to the institution), Jaroslav OĽHA (703 Slovakia, belonging to the institution) and Filip PETROVIČ (703 Slovakia, belonging to the institution).
Edition Journal of Parallel and Distributed Computing, Elsevier, 2022, 0743-7315.
Other information
Original language English
Type of outcome Article in a journal
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Netherlands
Confidentiality degree is not subject to a state or trade secret
Impact factor 3.800
RIV identification code RIV/00216224:14610/22:00125022
Organization unit Institute of Computer Science
DOI https://dx.doi.org/10.1016/j.jpdc.2021.10.003
UT WoS 000711621300002
Keywords in English Auto-tuning; Search method; Performance counters; CUDA
Tags J-Q1, rivok
Tags International impact, Reviewed
Changed by RNDr. Jiří Filipovič, Ph.D., učo 72898. Changed: 20/3/2023 12:40.
Abstract
Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimizing code for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in the processed data or migration to different hardware. In this paper, we introduce a novel method for searching generic tuning spaces. The tuning spaces can contain tuning parameters changing any user-defined property of the source code. The method takes advantage of collecting hardware performance counters (also known as profiling counters) during empirical tuning and uses them to steer the search towards faster implementations. The method requires the tuning space to be sampled on any GPU; from those samples it builds a problem-specific model, which can then be used during autotuning on various, even previously unseen, inputs or GPUs. Using a set of five benchmarks, we experimentally demonstrate that our method can speed up autotuning when an application needs to be ported to different hardware or when it needs to process data with different characteristics. We also compare our method to the state of the art and show that it is superior in terms of the number of search steps and typically outperforms other searches in terms of convergence time.
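The workflow sketched in the abstract — collect performance counters during empirical tuning, build a problem-specific model from sampled configurations, and use it to steer the search towards faster implementations — can be illustrated with a minimal sketch. This is NOT the authors' implementation: the tuning space, the synthetic `measure` function, the counter values, and the hard-coded `model_score` rule are all hypothetical stand-ins for illustration only (the paper trains its model from real profiling data).

```python
import random

# Hypothetical tuning space: each configuration sets user-defined
# source-code parameters (names and values are illustrative).
SPACE = [{"block": b, "unroll": u}
         for b in (32, 64, 128, 256) for u in (1, 2, 4, 8)]

def measure(cfg):
    """Stand-in for one empirical run of the tuned kernel.

    Returns a synthetic (runtime, performance counters) pair; a real
    setup would execute the kernel and read hardware counters.
    """
    runtime = 1.0 / cfg["block"] + 0.02 * abs(cfg["unroll"] - 4)
    counters = {"dram_reads": 1024 // cfg["block"],
                "inst_issued": 100 * cfg["unroll"]}
    return runtime, counters

def model_score(counters):
    """Toy problem-specific model: lower predicted score = faster.

    Here the rule "less DRAM traffic is faster" is hard-coded; the
    paper instead learns the counters-to-performance relation from
    tuning-space samples collected on any GPU.
    """
    return counters["dram_reads"]

def guided_search(seed=0, profile_budget=6, timing_budget=3):
    """Profile a random sample to collect counters, then spend the
    remaining budget only on the configurations the model ranks best."""
    rng = random.Random(seed)
    profiled = [(cfg, *measure(cfg))
                for cfg in rng.sample(SPACE, profile_budget)]
    # Rank sampled configurations by the counter-based model.
    profiled.sort(key=lambda rec: model_score(rec[2]))
    best_cfg, best_time = None, float("inf")
    for cfg, runtime, _ in profiled[:timing_budget]:
        if runtime < best_time:
            best_cfg, best_time = cfg, runtime
    return best_cfg, best_time
```

The point of the sketch is the budget split: counters gathered on cheap profiling runs let the search discard unpromising configurations before fully timing them, which is how counter guidance reduces the number of search steps.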
Links
EF16_013/0001802, research and development project. Name: CERIT Scientific Cloud
MUNI/A/1145/2021, internal MU code. Name: Large-scale computing systems: models, applications and verification XI. (Acronym: SV-FI MAV XI.)
Investor: Masaryk University