FILIPOVIČ, Jiří, Jan PLHÁK and David STŘELÁK. Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches. Online. In Waleed Smari. Proceedings of IEEE International Conference on High Performance Computing & Simulation. IEEE, 2015. p. 47-54. ISBN 978-1-4673-7812-3. Available from: https://dx.doi.org/10.1109/HPCSim.2015.7237020. [cited 2024-04-23]
Basic information
Original name Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches
Name in Czech Akcelerace dRMSD výpočtu a efektivní užití GPU cache
Authors FILIPOVIČ, Jiří (203 Czech Republic, guarantor, belonging to the institution), Jan PLHÁK (203 Czech Republic, belonging to the institution) and David STŘELÁK (203 Czech Republic, belonging to the institution)
Edition not stated, Proceedings of IEEE International Conference on High Performance Computing & Simulation, p. 47-54, 8 pp. 2015.
Publisher IEEE
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Netherlands
Confidentiality degree is not subject to a state or trade secret
Publication form printed version
RIV identification code RIV/00216224:14330/15:00083460
Organization unit Faculty of Informatics
ISBN 978-1-4673-7812-3
DOI http://dx.doi.org/10.1109/HPCSim.2015.7237020
UT WoS 000375684100006
Keywords (in Czech) RMSD; GPU; optimalizace kódu; cache
Keywords in English RMSD; GPU; code optimization; cache
Tags firank_B
Tags International impact, Reviewed
Changed by RNDr. Jiří Filipovič, Ph.D., učo 72898. Changed: 13/7/2016 11:10.
Abstract
In this paper, we introduce the GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5 % and 91.6 % of the shared memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve the performance.
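Illustrative sketch (not the authors' code): a minimal CUDA kernel for the 1:1 dRMSD of two conformations a and b of the same n-atom molecule, in the spirit of the shared-memory ("conservative") variant mentioned in the abstract. It tiles atom coordinates through shared memory so that each pairwise distance reuses on-chip data. The names (drmsd_kernel, drmsd), the tile size TILE = 128, and the launch configuration are assumptions made for this sketch, not values from the paper.

// Sketch only: dRMSD = sqrt( sum_{i<j} (|a_i-a_j| - |b_i-b_j|)^2 / (n(n-1)/2) ).
// Launch drmsd_kernel with blockDim.x == TILE.
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define TILE 128   // assumed tile/block size for this sketch

__global__ void drmsd_kernel(const float3 *a, const float3 *b, int n, float *partial)
{
    __shared__ float3 sa[TILE], sb[TILE];
    __shared__ float red[TILE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float3 ai = (i < n) ? a[i] : make_float3(0.f, 0.f, 0.f);
    float3 bi = (i < n) ? b[i] : make_float3(0.f, 0.f, 0.f);
    float acc = 0.0f;

    // Sweep all atoms j in tiles staged in shared memory.
    for (int tile = 0; tile < n; tile += TILE) {
        int j = tile + threadIdx.x;
        if (j < n) { sa[threadIdx.x] = a[j]; sb[threadIdx.x] = b[j]; }
        __syncthreads();
        int lim = min(TILE, n - tile);
        for (int k = 0; k < lim; ++k) {
            int jj = tile + k;
            if (i < n && jj > i) {                 // count each pair {i,j} once
                float dax = ai.x - sa[k].x, day = ai.y - sa[k].y, daz = ai.z - sa[k].z;
                float dbx = bi.x - sb[k].x, dby = bi.y - sb[k].y, dbz = bi.z - sb[k].z;
                float da = sqrtf(dax*dax + day*day + daz*daz);   // |a_i - a_j|
                float db = sqrtf(dbx*dbx + dby*dby + dbz*dbz);   // |b_i - b_j|
                float d = da - db;
                acc += d * d;
            }
        }
        __syncthreads();
    }

    // Block-wide tree reduction of per-thread partial sums (blockDim.x is a power of two).
    red[threadIdx.x] = acc;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) red[threadIdx.x] += red[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = red[0];
}

// Host wrapper: sums per-block partials and finishes the dRMSD formula.
float drmsd(const float3 *d_a, const float3 *d_b, int n)
{
    int blocks = (n + TILE - 1) / TILE;
    float *d_partial;
    cudaMalloc(&d_partial, blocks * sizeof(float));
    drmsd_kernel<<<blocks, TILE>>>(d_a, d_b, n, d_partial);
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_partial);
    double sum = 0.0;
    for (float p : partial) sum += p;
    return (float)std::sqrt(sum / (n * (n - 1) / 2.0));
}

The cache-based variants discussed in the paper would replace the explicit shared-memory staging with ordinary global loads blocked so that the working set fits in the L1/texture cache; this sketch only shows the baseline shared-memory pattern.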
Links
EE2.3.30.0037, research and development project. Name: Zaměstnáním nejlepších mladých vědců k rozvoji mezinárodní spolupráce (Employing the best young scientists to develop international cooperation)