Acceleration of 3D reconstruction in cryo-EM
David Střelák, Carlos Óscar Sorzano, José Maria Carazo, Jiří Filipovič
CERIT Scientific Cloud, Fall 2020
European Union, European Regional Development Fund, Investing in your future, 2007-13 OP Research and Development for Innovation

Outline: Introduction, 3D Reconstruction, Evaluation, Conclusion, Lessons Learned

Visualization of Molecules
Several methods can visualize molecules at the atomic level:
► X-ray diffraction
► nuclear magnetic resonance (NMR)
► cryo-electron microscopy (cryo-EM)
Cryo-EM has some advantages over the other methods:
► it captures molecules in their natural environment (diffraction requires crystallization)
► it is usable for large molecules (NMR is restricted to smaller proteins)

Cryo-electron Microscopy
Cryo-EM has developed rapidly in recent years:
► in 2012, there were only four structures at near-atomic resolution
► in 2015, 115 structures were determined
► this progress was enabled by direct electron detectors, vitreous ice and image reconstruction methods
In 2017, the Nobel Prize in Chemistry was awarded for cryo-EM:
► Jacques Dubochet, Joachim Frank, Richard Henderson
► Joachim Frank received his prize for the image processing methods that obtain a 3D structure from electron microscope data

Cryo-electron Microscopy
[Illustration: © Martin Högbom/The Royal Swedish Academy of Sciences]

Specimens in the Ice
[Figure]

Image Analysis in Cryo-EM
Reconstruction of the 3D volume is challenging:
► the electron beam damages the specimen, so it must be weak; as a result, the signal-to-noise ratio is very low
► the surrounding water adds another source of noise
► specimens are captured in random positions, possibly with conformational changes
► when a specimen is captured multiple times, the image moves and deforms between captures

Image of Specimens
[Figure]

Aligning Images
[Figure]

3D Volume Reconstruction
[Figure: 2D image → 3D volume]

Our Focus
Image reconstruction is very computationally demanding:
► it requires at least thousands of CPU hours
► 3D reconstruction is one of the main bottlenecks
We focus on accelerating 3D volume reconstruction in the Xmipp software:
► developed at the Spanish National Center for Biotechnology (CNB-CSIC)
► in production use, not a toy prototype

Getting a 3D Volume from Images?
Central slice theorem:
► let $i$ be a real-space projection image, which integrates information about the 3D volume $v$ along the viewing direction
► let $I$ be the Fourier transform of the image $i$, and $V$ the Fourier transform of $v$
► $I$ forms a slice of $V$ with the same orientation that $i$ has with respect to $v$; moreover, the slice $I$ passes through the center of $V$
So we need to transform our images into Fourier space, assemble the 3D Fourier volume, and transform the volume back to real space.

3D Volume Reconstruction
[Figure]

3D Volume Reconstruction
We need to estimate the orientation of each 2D image:
► the orientations are computed iteratively
► the bottleneck is creating the 3D volume from the 2D images
We have accelerated the creation of the 3D volume on GPUs.
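The central slice theorem can be checked numerically on a small example. The sketch below is a 2D analogue, assuming an axis-aligned projection so that no interpolation is needed: summing an image along y gives a 1D projection whose discrete Fourier transform equals the $k_y = 0$ line (the central slice) of the image's 2D transform. The helper names `dft`, `dft2` and `project_y` are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath

def dft(xs):
    """Naive 1D discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]

def dft2(img):
    """2D DFT of img[y][x], returned as F[ky][kx]."""
    rows = [dft(row) for row in img]               # transform along x: rows[y][kx]
    cols = [dft(list(col)) for col in zip(*rows)]  # then along y: cols[kx][ky]
    return [list(r) for r in zip(*cols)]           # transpose back to F[ky][kx]

def project_y(img):
    """Real-space projection along y: sum each column of the image."""
    return [sum(col) for col in zip(*img)]

# A small, arbitrary real-valued test object.
image = [[(3 * x + 5 * y) % 7 + 0.5 * x * y for x in range(8)] for y in range(8)]

central_slice = dft2(image)[0]                 # line k_y = 0 passes through the center
transform_of_projection = dft(project_y(image))

# Central slice theorem: both sequences agree up to rounding error.
max_error = max(abs(a - b) for a, b in zip(central_slice, transform_of_projection))
print(f"max difference: {max_error:.2e}")
```

In 3D the same identity holds for a tilted image plane, which is why each Fourier-transformed 2D image has to be inserted into the 3D Fourier volume at its estimated orientation, requiring interpolation at non-integer positions.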
State-of-the-art
Multiple papers deal with GPU acceleration of 3D reconstruction, all of them implementing a scatter method:
► GPU threads are associated with the 2D pixels of the image
► each thread computes the projection of its pixel into the volume (resulting in a floating-point position)
► the pixel value is distributed into multiple voxels (integer positions) using interpolation

State-of-the-art
[Figure]

State-of-the-art
Drawbacks of the scatter pattern:
► race conditions in writing (distances within a voxel are up to √3× longer than the distance between two pixels, so several pixels can write into the same voxel), which requires atomic writes
► some incorrect optimizations that remove the atomics have been published
► frequent writes into the 3D domain with poor spatial locality

The Gather Pattern
The image value is computed for each 3D voxel:
► each voxel is written only once
► there are no race conditions, as threads share only reads of the image
► image data are interpolated (we obtain a floating-point position in the image), so they are accessed multiple times
► memory locality is much better (we now repeat accesses into the 2D image, not the 3D volume)

The Gather Pattern
[Figure]

The Gather Pattern
Projecting 3D voxels onto the 2D image:
► when going into image space, we get a position in the image and a z-distance from the image
► many voxels do not hit the image (the z-distance is too large, or they fall outside the image boundaries)
► we have $O(n^2)$ pixels but $O(n^3)$ voxels, so most voxels are not used

The Gather Pattern
Projection plane optimization:
► we look at the image from the plane orthogonal to a coordinate axis (XY, XZ or YZ) that maximizes the projected image size
► the iteration space is reduced to the projection plane
► for each point of the projection plane, we compute the distance to the image and start processing voxels from there

The Gather Pattern
[Figure]
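The gather pattern and the projection plane optimization can be sketched on the CPU as follows. This is a minimal sketch, not the Xmipp code: it assumes a single image plane through the volume center, real values instead of complex Fourier coefficients, bilinear interpolation in the image, and a hypothetical linear weight $1 - |w|/r$ over the z-distance $w$ in place of the real interpolation kernel. `gather_all_voxels` tests every voxel, while `gather_projection_plane` iterates over the plane orthogonal to the axis with the largest normal component and, for each column, walks only the voxels within distance $r$ of the image. Both fill the volume identically, but the second visits far fewer voxels.

```python
import math

def rotation(ax, ay):
    """Rows of Ry(ay) @ Rx(ax): in-plane image axes e1, e2 and the image normal."""
    cx, sx, cy, sy = math.cos(ax), math.sin(ax), math.cos(ay), math.sin(ay)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    return [[sum(ry[i][k] * rx[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def bilinear(img, pu, pv):
    """Bilinear interpolation at a floating-point pixel position (caller checks bounds)."""
    i, j = math.floor(pu), math.floor(pv)
    fu, fv = pu - i, pv - j
    return ((1 - fu) * (1 - fv) * img[i][j] + fu * (1 - fv) * img[i + 1][j]
            + (1 - fu) * fv * img[i][j + 1] + fu * fv * img[i + 1][j + 1])

def contribute(vol, img, rot, idx, r):
    """Gather step for one voxel: read from the image, write the voxel once."""
    h, c = len(vol) // 2, len(img) // 2
    x = [i - h for i in idx]                               # centered voxel position
    u, v, w = (sum(rot[row][k] * x[k] for k in range(3)) for row in range(3))
    pu, pv = u + c, v + c                                  # position in the image
    if abs(w) > r or not (0 <= pu < len(img) - 1 and 0 <= pv < len(img) - 1):
        return                                             # voxel does not hit the image
    weight = 1.0 - abs(w) / r                              # hypothetical z-distance falloff
    vol[idx[0]][idx[1]][idx[2]] += weight * bilinear(img, pu, pv)

def gather_all_voxels(vol, img, rot, r):
    """Naive gather: test all O(n^3) voxels; returns the number visited."""
    n = len(vol)
    for idx in ((i, j, k) for i in range(n) for j in range(n) for k in range(n)):
        contribute(vol, img, rot, idx, r)
    return n ** 3

def gather_projection_plane(vol, img, rot, r):
    """Iterate over the projection plane, walking only voxels near the image."""
    n, h, normal = len(vol), len(vol) // 2, rot[2]
    axis = max(range(3), key=lambda d: abs(normal[d]))     # chooses XY, XZ or YZ
    a, b = [d for d in range(3) if d != axis]
    reach = r / abs(normal[axis])                          # |w| <= r  <=>  |x_axis - x0| <= reach
    visited = 0
    for ia in range(n):
        for ib in range(n):                                # on a GPU: one thread per (ia, ib)
            # Coordinate along `axis` where the image plane crosses this column.
            x0 = -(normal[a] * (ia - h) + normal[b] * (ib - h)) / normal[axis]
            lo = max(0, math.floor(x0 - reach) - 1 + h)    # +-1 margin against rounding
            hi = min(n - 1, math.ceil(x0 + reach) + 1 + h)
            for ic in range(lo, hi + 1):
                idx = [0, 0, 0]
                idx[a], idx[b], idx[axis] = ia, ib, ic
                contribute(vol, img, rot, idx, r)
                visited += 1
    return visited

n, r = 16, 1.5
img = [[math.sin(0.3 * i) + math.cos(0.2 * j) for j in range(n)] for i in range(n)]
rot = rotation(0.4, 0.9)
vol_naive = [[[0.0] * n for _ in range(n)] for _ in range(n)]
vol_plane = [[[0.0] * n for _ in range(n)] for _ in range(n)]
visited_naive = gather_all_voxels(vol_naive, img, rot, r)
visited_plane = gather_projection_plane(vol_plane, img, rot, r)
print(visited_naive, visited_plane, vol_naive == vol_plane)
```

Because every voxel is written by exactly one iteration of the two outer loops, the GPU version of this loop nest needs no atomic writes, unlike the scatter method.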
Basic GPU Implementation
The gather pattern can be rewritten for the GPU directly:
► one GPU thread is assigned to one point of the projection plane
Optimization opportunities:
► the advanced interpolation method may be computationally demanding
► the GPU cache system is limited in maintaining data locality

Interpolation
Our computation is a kind of stencil, but with floating-point positions:
► interpolation coefficients cannot be precomputed easily
[Figure]

Interpolation
We have implemented two strategies:
► on-the-fly interpolation
► a precomputed table with very fine steps (originally in Xmipp), which can be cached or preloaded into shared memory

Explicit Caching of Image Data
A thread block accesses only a part of the image:
► this part can be cached in fast shared memory
► however, its size varies with the image rotation
► we upper-bound the image size by $\lceil \sqrt{2}\,\sqrt{3}\,(b + 2r) \rceil$, where $b$ is the thread block size and $r$ is the interpolation radius
► shared memory is allocated to this upper bound prior to GPU kernel execution
► for each image, its axis-aligned bounding box (AABB) is computed and a region of the proper size is preloaded into shared memory
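The two interpolation strategies can be contrasted with a small sketch. It assumes an order-0 Kaiser-Bessel window as the interpolation kernel (a common choice for Fourier-space gridding; the slides do not name the kernel or its parameters), with illustrative values for the radius and the shape parameter `alpha`. `kaiser_bessel` evaluates the kernel on the fly, while the hypothetical `KernelTable` class samples it once at very fine steps and answers each query with a single nearest-sample lookup, which is what makes the table a candidate for caching or preloading into shared memory.

```python
import math

def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2 * k)) ** 2        # ratio of consecutive series terms
        total += term
    return total

def kaiser_bessel(dist, radius, alpha):
    """On-the-fly strategy: evaluate the (assumed) order-0 Kaiser-Bessel kernel."""
    if dist >= radius:
        return 0.0
    return bessel_i0(alpha * math.sqrt(1.0 - (dist / radius) ** 2)) / bessel_i0(alpha)

class KernelTable:
    """Precomputed strategy: sample the kernel once at very fine steps, then look up."""

    def __init__(self, radius, alpha, steps=10_000):
        self.radius, self.steps = radius, steps
        self.values = [kaiser_bessel(radius * i / steps, radius, alpha)
                       for i in range(steps + 1)]

    def __call__(self, dist):
        if dist >= self.radius:
            return 0.0
        return self.values[round(dist / self.radius * self.steps)]   # nearest sample

RADIUS, ALPHA = 1.9, 15.0                            # illustrative values only
table = KernelTable(RADIUS, ALPHA)
samples = [RADIUS * t / 997 for t in range(1000)]    # also covers distances beyond RADIUS
max_error = max(abs(table(d) - kaiser_bessel(d, RADIUS, ALPHA)) for d in samples)
print(f"table size: {len(table.values)}, max lookup error: {max_error:.2e}")
```

The trade-off is the one named on the slide: the table replaces a series evaluation per voxel with one memory read, at the cost of memory traffic that must be served from cache or shared memory to pay off.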
Additional Optimizations
Register consumption optimization:
► many parameters are turned into template parameters or macros
► this allows higher GPU parallelism
CPU-GPU load balancing:
► the CPU prepares images for the GPU, and a single core is not powerful enough to do so
► multiple threads prepare images and share GPU time, which also allows overlapping data copies with computation

Autotuning
Parameter values:
BLOCK_DIM: 8, 12, 16, 20, 24, 28, 32
ATOMICS: 0, 1
GRID_DIM_Z: 1, 4, 8, 16
PRECOMP_INT: 0, 1
SHARED_INT: 0, 1
SHARED_IMG: 0, 1
TILE_SIZE: 1, 2, 4, 8

Architecture
► Process #0 distributes tasks (batches of samples) to the worker processes (Process #1, ...)
► in each worker process, a Thread Manager distributes samples to CPU threads #1 to #n
► processing pipeline: 2D Fourier → 3D Fourier → 3D real

Sphere
[Figure]

Sphere with Indexing Bug
[Figure: Xmipp viewer showing slices 1 to 54 of the 64 × 64 × 64 volume ../27.7/spere_mpi.vol]
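The autotuning parameters listed on the Autotuning slide define a discrete search space of 7 · 2 · 4 · 2 · 2 · 2 · 4 = 1792 configurations. The sketch below enumerates that space and runs an exhaustive search with a caller-supplied benchmark function; it is an illustration, not the tuner used by the authors. Two parts are hypothetical: the validity rule, which assumes SHARED_INT means "interpolation table in shared memory" and therefore requires PRECOMP_INT, and `synthetic_cost`, a stand-in for timing a compiled kernel that is not measured data.

```python
from itertools import product

# Parameter names and values from the Autotuning slide.
SPACE = {
    "BLOCK_DIM": [8, 12, 16, 20, 24, 28, 32],
    "ATOMICS": [0, 1],
    "GRID_DIM_Z": [1, 4, 8, 16],
    "PRECOMP_INT": [0, 1],
    "SHARED_INT": [0, 1],
    "SHARED_IMG": [0, 1],
    "TILE_SIZE": [1, 2, 4, 8],
}

def configurations(space):
    """Yield every combination of parameter values as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

def is_valid(cfg):
    """Hypothetical constraint: a shared-memory interpolation table must be precomputed."""
    return not (cfg["SHARED_INT"] and not cfg["PRECOMP_INT"])

def tune(space, benchmark, valid=is_valid):
    """Exhaustive search: benchmark each valid configuration, keep the fastest."""
    best_cfg, best_time = None, float("inf")
    for cfg in configurations(space):
        if not valid(cfg):
            continue
        elapsed = benchmark(cfg)   # in practice, e.g.: build the kernel with these macros and time it
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time

def synthetic_cost(cfg):
    """Hypothetical stand-in for a kernel timing; NOT measured data."""
    return (abs(cfg["BLOCK_DIM"] - 16) + cfg["ATOMICS"] + abs(cfg["GRID_DIM_Z"] - 8)
            + (1 - cfg["PRECOMP_INT"]) + cfg["SHARED_INT"] + (1 - cfg["SHARED_IMG"])
            + abs(cfg["TILE_SIZE"] - 4))

total = sum(1 for _ in configurations(SPACE))
valid_count = sum(1 for cfg in configurations(SPACE) if is_valid(cfg))
best, cost = tune(SPACE, synthetic_cost)
print(total, valid_count, best, cost)
```

Passing the parameters as macros or template arguments, as on the Additional Optimizations slide, means each configuration is a separately compiled kernel, so pruning invalid combinations before benchmarking saves real compile and run time.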