V originále
Autotuning, the practice of automatic tuning of code to provide performance portability, has received increased attention in the research community, especially in high performance computing. Ensuring high performance on a variety of hardware usually means modifications to the code, often via different values of a selected set of parameters, such as tiling size, loop unrolling factor or data layout. However, the search space of all possible combinations of these parameters can be enormous. Traditional search methods often fail to find a well-performing set of parameter values quickly. We have found that certain properties of tuning spaces do not vary much when hardware is changed. In this paper, we demonstrate that it is possible to use historical data to reliably predict the number of tuning steps necessary to find a well-performing configuration, and to reduce the size of the tuning space. We evaluate our hypotheses on a number of GPU-accelerated benchmarks written in CUDA and OpenCL.