Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches

D 2015

FILIPOVIČ, Jiří, Jan PLHÁK and David STŘELÁK

Basic information

Original name

Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches

Name in Czech

Akcelerace dRMSD výpočtu a efektivní užití GPU cache

Authors

FILIPOVIČ, Jiří (203 Czech Republic, guarantor, belonging to the institution), Jan PLHÁK (203 Czech Republic, belonging to the institution) and David STŘELÁK (203 Czech Republic, belonging to the institution)

Edition

Place not specified, Proceedings of IEEE International Conference on High Performance Computing & Simulation, p. 47-54, 8 pp. 2015

Publisher

IEEE

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Netherlands

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/15:00083460

Organization unit

Faculty of Informatics

ISBN

978-1-4673-7812-3

DOI

http://dx.doi.org/10.1109/HPCSim.2015.7237020

UT WoS

000375684100006

Keywords (in Czech)

RMSD; GPU; optimalizace kódu; cache

Keywords in English

RMSD; GPU; code optimization; cache

Abstract

In the original

In this paper, we introduce a GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5% and 91.6% of the shared-memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve performance.
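
For orientation, the quantity named in the abstract can be sketched on the CPU. The snippet below is a minimal reference, not the authors' GPU implementation. It assumes the commonly used definition of distance RMSD: the root mean square of the differences between corresponding intra-molecular atom-pair distances of two conformations, averaged over all pairs i < j. This record does not state the exact normalization used in the paper, and the function names `pairwise_distances` and `drmsd` are hypothetical.

```python
import math


def pairwise_distances(coords):
    """Return the Euclidean distances d_ij for all atom pairs with i < j."""
    n = len(coords)
    return [
        math.dist(coords[i], coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def drmsd(a, b):
    """Distance RMSD of two conformations that share the same atom ordering.

    Assumed definition (not confirmed by this record):
    sqrt( sum_{i<j} (d_ij(a) - d_ij(b))^2 / (N * (N - 1) / 2) )
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("conformations need the same number (>= 2) of atoms")
    da = pairwise_distances(a)
    db = pairwise_distances(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


if __name__ == "__main__":
    conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    conf_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(drmsd(conf_a, conf_b))
```

Under this definition each atom's coordinates are reused in N - 1 pair distances, so the amount of arithmetic grows quadratically with N while the input grows only linearly. This is consistent with the abstract's remark that the computation has strong memory locality.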

Links

EE2.3.30.0037, research and development project

Name: Zaměstnáním nejlepších mladých vědců k rozvoji mezinárodní spolupráce (Employing the best young scientists to develop international cooperation)

Cite

FILIPOVIČ, Jiří, Jan PLHÁK and David STŘELÁK. Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches. In Waleed Smari. Proceedings of IEEE International Conference on High Performance Computing & Simulation. Place not specified: IEEE, 2015, p. 47-54. ISBN 978-1-4673-7812-3. Available from: https://dx.doi.org/10.1109/HPCSim.2015.7237020.

@inproceedings{1306977,
   author = {Filipovič, Jiří and Plhák, Jan and Střelák, David},
   address = {not specified},
   booktitle = {Proceedings of IEEE International Conference on High Performance Computing \& Simulation},
   doi = {10.1109/HPCSim.2015.7237020},
   editor = {Waleed Smari},
   keywords = {RMSD; GPU; code optimization; cache},
   howpublished = {printed version "print"},
   language = {eng},
   location = {not specified},
   isbn = {978-1-4673-7812-3},
   pages = {47--54},
   publisher = {IEEE},
   title = {Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches},
   year = {2015}
}

TY  - CONF
ID  - 1306977
AU  - Filipovič, Jiří
AU  - Plhák, Jan
AU  - Střelák, David
PY  - 2015
TI  - Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches
PB  - IEEE
CY  - not specified
SN  - 9781467378123
KW  - RMSD
KW  - GPU
KW  - code optimization
KW  - cache
N2  - In this paper, we introduce a GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5% and 91.6% of the shared-memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve performance.
ER  -

FILIPOVIČ, Jiří, Jan PLHÁK and David STŘELÁK. Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches. In Waleed Smari. \textit{Proceedings of IEEE International Conference on High Performance Computing \&{} Simulation}. Place not specified: IEEE, 2015, p.~47--54. ISBN~978-1-4673-7812-3. Available from: https://dx.doi.org/10.1109/HPCSim.2015.7237020.
