Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches

FILIPOVIČ, Jiří, Jan PLHÁK and David STŘELÁK. Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches. In Waleed Smari. Proceedings of IEEE International Conference on High Performance Computing & Simulation. Place not specified: IEEE, 2015, pp. 47-54. ISBN 978-1-4673-7812-3. Available from: https://dx.doi.org/10.1109/HPCSim.2015.7237020.


Basic information
Original title	Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches
Title in Czech	Akcelerace dRMSD výpočtu a efektivní užití GPU cache
Authors	FILIPOVIČ, Jiří (203 Czech Republic, guarantor, belonging to the institution), Jan PLHÁK (203 Czech Republic, belonging to the institution) and David STŘELÁK (203 Czech Republic, belonging to the institution).
Edition	Place not specified, Proceedings of IEEE International Conference on High Performance Computing & Simulation, pp. 47-54, 8 pp. 2015.
Publisher	IEEE

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Netherlands
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
RIV identification code	RIV/00216224:14330/15:00083460
Organizational unit	Faculty of Informatics
ISBN	978-1-4673-7812-3
DOI	http://dx.doi.org/10.1109/HPCSim.2015.7237020
UT WoS	000375684100006
Keywords (in Czech)	RMSD; GPU; optimalizace kódu; cache
Keywords in English	RMSD; GPU; code optimization; cache
Tags	firank_B
Flags	International impact, Reviewed
Changed by	doc. RNDr. Jiří Filipovič, Ph.D., učo 72898. Changed: 13. 7. 2016 11:10.

Abstract

In this paper, we introduce a GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5 % and 91.6 % of the shared-memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve performance.
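
For readers unfamiliar with the metric, the following is a minimal CPU reference sketch of a 1:1 dRMSD computation in Python. It is not the authors' GPU implementation. It assumes the common definition of dRMSD as the root mean square of the differences between corresponding intra-molecular atom-pair distances, normalized by the number of atom pairs; the paper may use a different normalization. The function name `drmsd` and the representation of a conformation as a list of (x, y, z) tuples are illustrative assumptions.

```python
import math


def drmsd(a, b):
    """Distance RMSD between two conformations of the same molecule.

    a, b: sequences of (x, y, z) tuples, one per atom, in the same atom order.
    Assumed definition: sqrt(mean over pairs i<j of (d_ij(a) - d_ij(b))**2).
    """
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two atoms")

    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Distance between atoms i and j within each conformation.
            d_a = math.dist(a[i], a[j])
            d_b = math.dist(b[i], b[j])
            total += (d_a - d_b) ** 2

    pairs = n * (n - 1) // 2
    return math.sqrt(total / pairs)


# Illustrative data: a right triangle, a translated copy, and a copy scaled by 2.
tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in tri]
scaled = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in tri]

print(drmsd(tri, shifted))  # translation leaves internal distances unchanged
print(drmsd(tri, scaled))
```

Under this definition, the input is only N atom positions per conformation, while the work is proportional to N(N-1)/2 atom pairs, so each loaded coordinate is reused many times. This reuse is consistent with the strong memory locality described in the abstract.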

Links
EE2.3.30.0037, research and development project	Name: Employing the best young scientists to develop international cooperation
