Towards a Benchmarking Suite for Kernel Tuners

TØRRING, Jacob O, Ben VAN WERKHOVEN, Filip PETROVIČ, Floris-Jan WILLEMSEN, Jiří FILIPOVIČ and Anne C ELSTER. Towards a Benchmarking Suite for Kernel Tuners. Online. In 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Place not stated: IEEE, 2023, p. 724-733. ISBN 979-8-3503-1199-0. Available from: https://dx.doi.org/10.1109/IPDPSW59300.2023.00124.


TY  - JOUR
ID  - 2306557
AU  - Tørring, Jacob O
AU  - van Werkhoven, Ben
AU  - Petrovič, Filip
AU  - Willemsen, Floris-Jan
AU  - Filipovič, Jiří
AU  - Elster, Anne C
PY  - 2023
TI  - Towards a Benchmarking Suite for Kernel Tuners
PB  - IEEE
CY  - not stated
SN  - 9798350311990
KW  - autotuning
KW  - benchmarking
UR  - https://ieeexplore.ieee.org/abstract/document/10196663
N2  - As computing systems become more complex, combining CPUs and GPUs, it is becoming harder and harder for programmers to keep their codes optimized as the hardware gets updated. Autotuners try to alleviate this by hiding as many architecture-based optimization details as possible from the end user, so that the code can be used efficiently across different generations of systems. Several autotuning frameworks have emerged, but comparative analysis of these related works is scarce, owing to the significant manual effort required to port a tunable kernel from one tuner to another. In this article we introduce a new benchmark suite for evaluating the performance of optimization algorithms used by modern autotuners targeting GPUs. The suite contains tunable GPU kernels that are representative of real-world applications, allowing for comparisons between optimization algorithms and the examination of code optimization, search space difficulty, and performance portability. Our framework facilitates easy integration of new autotuners and benchmarks by defining a shared problem interface. Our benchmark suite is evaluated based on five characteristics: convergence rate, local minima centrality, optimal speedup, Permutation Feature Importance (PFI), and performance portability. The results show that optimization parameters greatly impact performance and that global optimization is needed. The importance of each parameter is consistent across GPU architectures; however, the specific values need to be optimized for each architecture. Our portability study highlights the crucial importance of autotuning each application for a specific target architecture. The results reveal that simply transferring the optimal configuration from one architecture to another can result in performance ranging from 58.5% to 99.9% of the optimal performance, depending on the GPU architecture. This highlights the importance of autotuning in modern computing systems and the value of our benchmark suite in facilitating the study of optimization algorithms and their effectiveness in achieving optimal performance for specific target architectures.
ER  -

Basic information
Original name	Towards a Benchmarking Suite for Kernel Tuners
Authors	TØRRING, Jacob O (578 Norway), Ben VAN WERKHOVEN (528 Netherlands), Filip PETROVIČ (703 Slovakia, belonging to the institution), Floris-Jan WILLEMSEN (528 Netherlands), Jiří FILIPOVIČ (203 Czech Republic, guarantor, belonging to the institution) and Anne C ELSTER (578 Norway).
Edition	not stated, 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), p. 724-733, 10 pp. 2023.
Publisher	IEEE

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	United States of America
Confidentiality degree	is not subject to a state or trade secret
Publication form	electronic version available online
WWW	URL
RIV identification code	RIV/00216224:14610/23:00131587
Organization unit	Institute of Computer Science
ISBN	979-8-3503-1199-0
ISSN	2164-7062
Doi	http://dx.doi.org/10.1109/IPDPSW59300.2023.00124
UT WoS	001055030700088
Keywords in English	autotuning; benchmarking
Tags	rivok
Tags	International impact, Reviewed
Changed by	Mgr. Alena Mokrá, učo 362754. Changed: 20/3/2024 14:59.

Abstract

As computing systems become more complex, combining CPUs and GPUs, it is becoming harder and harder for programmers to keep their codes optimized as the hardware gets updated. Autotuners try to alleviate this by hiding as many architecture-based optimization details as possible from the end user, so that the code can be used efficiently across different generations of systems. Several autotuning frameworks have emerged, but comparative analysis of these related works is scarce, owing to the significant manual effort required to port a tunable kernel from one tuner to another. In this article we introduce a new benchmark suite for evaluating the performance of optimization algorithms used by modern autotuners targeting GPUs. The suite contains tunable GPU kernels that are representative of real-world applications, allowing for comparisons between optimization algorithms and the examination of code optimization, search space difficulty, and performance portability. Our framework facilitates easy integration of new autotuners and benchmarks by defining a shared problem interface. Our benchmark suite is evaluated based on five characteristics: convergence rate, local minima centrality, optimal speedup, Permutation Feature Importance (PFI), and performance portability. The results show that optimization parameters greatly impact performance and that global optimization is needed. The importance of each parameter is consistent across GPU architectures; however, the specific values need to be optimized for each architecture. Our portability study highlights the crucial importance of autotuning each application for a specific target architecture. The results reveal that simply transferring the optimal configuration from one architecture to another can result in performance ranging from 58.5% to 99.9% of the optimal performance, depending on the GPU architecture.
This highlights the importance of autotuning in modern computing systems and the value of our benchmark suite in facilitating the study of optimization algorithms and their effectiveness in achieving optimal performance for specific target architectures.
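
As an illustration of the portability figures quoted in the abstract, the sketch below shows one plausible way to compute the fraction of optimal performance obtained when the best configuration found on one GPU is reused on another. It assumes performance is inversely proportional to kernel runtime. The function names, parameter tuples, and runtimes are hypothetical and are not taken from the paper or its benchmark suite.

```python
def best_config(runtimes):
    """Return the configuration with the lowest measured runtime."""
    return min(runtimes, key=runtimes.get)


def transfer_performance(source, target):
    """Fraction of the target GPU's optimal performance that is reached
    when the source GPU's best configuration is reused unchanged.

    Assumes performance is proportional to 1 / runtime, so the ratio is
    (best runtime on target) / (runtime of transferred config on target).
    """
    cfg = best_config(source)
    return min(target.values()) / target[cfg]


# Hypothetical runtimes in milliseconds, keyed by (block_size, tile_size).
# These values are made up for illustration only.
gpu_a = {(32, 1): 4.0, (64, 2): 2.5, (128, 4): 3.0}
gpu_b = {(32, 1): 5.0, (64, 2): 4.0, (128, 4): 3.2}

print(f"A -> B: {transfer_performance(gpu_a, gpu_b):.1%}")
print(f"B -> A: {transfer_performance(gpu_b, gpu_a):.1%}")
```

A value of 100% would mean the transferred configuration is also optimal on the target; lower values indicate how much performance is lost by skipping autotuning on the target architecture.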

Links
LM2023054, research and development project	Name: e-Infrastruktura CZ
LM2023054, research and development project	Investor: Ministry of Education, Youth and Sports of the CR
