Other formats:
BibTeX
LaTeX
RIS
@article{1306828,
  author = {Filipovič, Jiří and Madzin, Matúš and Fousek, Jan and Matyska, Luděk},
  title = {Optimizing CUDA code by kernel fusion: application on BLAS},
  journal = {The Journal of Supercomputing},
  volume = {71},
  number = {10},
  pages = {3934--3957},
  year = {2015},
  publisher = {Springer US},
  issn = {0920-8542},
  doi = {10.1007/s11227-015-1483-z},
  url = {http://link.springer.com/article/10.1007/s11227-015-1483-z},
  keywords = {GPU; CUDA; BLAS; Kernel fusion; Code generation},
  language = {eng}
}
TY  - JOUR
ID  - 1306828
AU  - Filipovič, Jiří
AU  - Madzin, Matúš
AU  - Fousek, Jan
AU  - Matyska, Luděk
PY  - 2015
TI  - Optimizing CUDA code by kernel fusion: application on BLAS
JF  - The Journal of Supercomputing
VL  - 71
IS  - 10
SP  - 3934
EP  - 3957
PB  - Springer US
SN  - 0920-8542
KW  - GPU
KW  - CUDA
KW  - BLAS
KW  - Kernel fusion
KW  - Code generation
UR  - http://link.springer.com/article/10.1007/s11227-015-1483-z
N2  - Contemporary GPUs have significantly higher arithmetic throughput than a memory throughput. Hence, many GPU kernels are memory bound and cannot exploit arithmetic power of the GPU. Examples of memory-bound kernels are BLAS-1 (vector–vector) and BLAS-2 (matrix–vector) operations. However, when kernels share data, kernel fusion can improve memory locality by placing shared data, originally passed via off-chip global memory, into a faster, but distributed on-chip memory. In this paper, we show how kernels performing map, reduce or their nested combinations can be fused automatically by our source-to-source compiler. To demonstrate the usability of the compiler, we have implemented several BLAS-1 and BLAS-2 routines and show how the performance of their sequences can be improved by fusions. Compared with similar sequences using CUBLAS, our compiler is able to generate code that is up to 2.24x faster for the examples tested.
ER  -
FILIPOVIČ, Jiří, Matúš MADZIN, Jan FOUSEK and Luděk MATYSKA. Optimizing CUDA code by kernel fusion: application on BLAS. \textit{The Journal of Supercomputing}. Springer US, 2015, vol.~71, no.~10, pp.~3934--3957. ISSN~0920-8542. Available from: https://dx.doi.org/10.1007/s11227-015-1483-z.