Detailed Information on Publication Record
2015
Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches
FILIPOVIČ, Jiří, Jan PLHÁK and David STŘELÁK
Basic information
Original name
Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches
Name in Czech
Akcelerace dRMSD výpočtu a efektivní užití GPU cache
Authors
FILIPOVIČ, Jiří (203 Czech Republic, guarantor, belonging to the institution), Jan PLHÁK (203 Czech Republic, belonging to the institution) and David STŘELÁK (203 Czech Republic, belonging to the institution)
Edition
not specified, Proceedings of IEEE International Conference on High Performance Computing & Simulation, p. 47-54, 8 pp. 2015
Publisher
IEEE
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Netherlands
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
RIV identification code
RIV/00216224:14330/15:00083460
Organization unit
Faculty of Informatics
ISBN
978-1-4673-7812-3
UT WoS
000375684100006
Keywords (in Czech)
RMSD; GPU; optimalizace kódu; cache
Keywords in English
RMSD; GPU; code optimization; cache
Tags
International impact, Reviewed
Changed: 13/7/2016 11:10, doc. RNDr. Jiří Filipovič, Ph.D.
Abstract
In the original
In this paper, we introduce GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that use GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5 % and 91.6 % of shared memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggested optimization techniques to improve performance.
Links
EE2.3.30.0037, research and development project