Image analysis IV
C9940 3-Dimensional Transmission Electron Microscopy
S1007 Doing structural biology with the electron microscope
April 24, 2017

Outline
Image analysis III
More on FFTs
Classification
• Review of multivariate data analysis
• Classification in 2D
• Classification in 3D
Resolution estimation
• Fourier Shell Correlation
• Expectation value of noise
• “Gold standard” resolution

Some simple 2D Fourier transforms: a row of points
Some simple 2D Fourier transforms: a series of lines
Some simple 2D Fourier transforms: a 2D lattice

Single point
If the point were infinitely sharp, the FFT would be flat.

Some simple 1D transforms: a sharp point (Dirac delta function)
http://en.labs.wikimedia.org/wiki/Basic_Physics_of_Nuclear_Medicine/Fourier_Methods

[Figure sequence: two points, three points, five points; one row, two rows, three rows, five rows; full lattice (animation)]

What if?
Molecule: g(x)
Lattice: f(x)
Set a molecule down at every lattice point.
Adapted from David DeRosier

Convolution: a review
f(x) ↔ F(X)
g(x) ↔ G(X)
f(x)•g(x) (convolution) ↔ F(X) G(X)
Cross-correlation: F*(X) G(X)

What if?
Hint: [Diagram repeated: f(x), g(x), f(x)•g(x) and the transforms F(X), G(X), F(X) G(X)]

Classification

Reiteration of the problem
8 classes of faces, 64x64 pixels
With noise added
Before we can average the data, we should first find homogeneous subsets.
[Figure: average of the noisy images]

Multivariate data analysis (MDA), or Multivariate statistical analysis (MSA)
[Figure: 4x4 image with pixels numbered 1 to 16]
Our 16-pixel image can be reorganized into a 16-coordinate vector.

MDA: Reconstituted images
Linear combinations of these images will give us approximations of the images that make up the data.
c0 (Average) + c1 (Eigenimage #1) + c2 (Eigenimage #2) + c3 (Eigenimage #3) + ...
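
The two MDA ideas above (an image reorganized into a vector, and images reconstituted from the average plus weighted eigenimages) can be sketched numerically. This is a minimal sketch, not the procedure of any particular package: the two 4x4 patterns, the noise level, and the helper name `reconstitute` are all made up for illustration, the eigenimages are obtained here from an SVD of the mean-subtracted data, and the coefficient on the average (c0 on the slide) is fixed to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 noisy 4x4 images drawn from two made-up patterns.
pattern_a = np.zeros((4, 4))
pattern_a[:, :2] = 1.0          # left half bright
pattern_b = np.zeros((4, 4))
pattern_b[:2, :] = 1.0          # top half bright
labels = rng.integers(0, 2, size=200)
images = np.where(labels[:, None, None] == 0, pattern_a, pattern_b)
images = images + rng.normal(scale=0.3, size=images.shape)

# Reorganize each 16-pixel image into a 16-coordinate vector.
vectors = images.reshape(len(images), -1)      # shape (200, 16)

# Eigenimages: principal axes of the mean-subtracted data (rows of vt).
average = vectors.mean(axis=0)
centered = vectors - average
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
eigenimages = vt                               # shape (16, 16), orthonormal rows


def reconstitute(vector, n_terms):
    """Approximate one image as average + c1*eig1 + ... + cn*eign."""
    coeffs = eigenimages[:n_terms] @ (vector - average)
    return average + coeffs @ eigenimages[:n_terms]


# One eigenimage gives a rough approximation; all 16 recover the image.
approx_1 = reconstitute(vectors[0], 1).reshape(4, 4)
approx_all = reconstitute(vectors[0], 16).reshape(4, 4)
error_1 = np.linalg.norm(approx_1 - images[0])
error_all = np.linalg.norm(approx_all - images[0])
```

With only a few eigenimages the reconstituted image keeps the dominant variation and drops much of the noise, which is why a small number of coefficients per image can be enough for classification.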
Phantom images of worm hemoglobin

MDA of worm hemoglobin
[Figure: average, with panels labeled +c0 and -c0, +c1 to +c5, and -c1 to -c5]
[Figure: 4x4 image with pixels numbered 1 to 16]

Classification
How do we categorize/classify the images?

K-means classification
A number K of images are chosen as seeds.
BAD: Some clusters may be overrepresented/underrepresented.

Diday's method of moving centers
We will note the images that always “travel” together, and will call them a class.

Dendrogram

Hierarchical ascendant classification
“Images”
All images are represented.
The dendrogram will be too heavily branched to interpret without truncation.

Binary-tree viewer
BAD: Information about the height of the branch is lost.

Classification in 3D

Classification: Reference-based classification vs. Maximum likelihood (ML3D)
Reference-based classification:
• Possible conformations must be known.
• The combination of parameters (shift, rotation, class) is chosen from the highest correlation value.
• Possible reference bias
ML3D:
• Possible conformations are not known.
• The probability of the occurrence of the parameters (shift, rotation, class) is maximized.
• Random, data-dependent
RELION is a variation of maximum likelihood.

Seeding ML3D classification
There will be slight differences in the reconstructions.
We will iteratively maximize the likelihood of a particle belonging to a particular class.
[Figure: images]
We split the data set into K classes at random.

How good is our reconstruction?
[Figure: images, “odd” reconstruction, “even” reconstruction]
We split the data set into halves and compare them.

How do we evaluate the quality of a reconstruction?
Fourier Shell Correlation (FSC)
Properties:
- Fourier terms have amplitude + phase.
- Correlation values range from -1 to +1.
- Noise should give an average of 0.
- The comparison is done as a function of spatial frequency (or “resolution”).
[Figure: Reconstruction1 and Reconstruction2, term 1 and term 2]

Fourier Shell Correlation curve

FSC curve with expectation value of noise
Why does σ vary with spatial frequency?

Random walks: Why signal-to-noise improves with √N

The “Drunkard's walk”
Let's conduct an experiment.
We're going to assume that each step is random and independent of previous steps.
[Figure: number line from -4 to 4, positions at t=1 to t=6]

The teetotaler's walk
[Figure: number line from -4 to 4, positions at t=1 to t=4]

Expectation value
The expected distance that “noise” travels increases with √N.
However, this is slower than the distance that “signal” travels, which increases with N.
Thus, as we collect more data, the SNR increases as N/√N = √N.

Random walks: more information

Expectation values and how they relate to resolution criteria
With small N, behavior is more unpredictable.
One resolution criterion was to compare the FSC to, say, 3*σ.
BUT: The σ value describes the behavior of unaligned noise.

Review: model bias
[Figure: N = 128, N = 256, N = 512, N = 1024, N = 2048; original]
Model bias can yield false correlations. A false correlation in real space is equivalent to a false correlation in Fourier space.

Refinement: classical and “gold standard”
[Figure: images, “odd” reconstruction, “even” reconstruction]
OLD STRATEGY: merge & refine orientations
“GOLD STANDARD”: refinement1, refinement2

Different resolution criteria
FSC=0.5
FSC=0.143
FSC=0.333
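
The comparison of two half-set reconstructions shell by shell can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used by any reconstruction package: the function name `fourier_shell_correlation`, the Gaussian test volume, the noise level, and the choice of one-voxel-wide shells are all illustrative. For each shell it computes the real part of the sum of F1·F2* divided by the square root of (sum of |F1|²)·(sum of |F2|²), then reports the first shell that falls below one of the thresholds listed above.

```python
import numpy as np


def fourier_shell_correlation(vol1, vol2):
    """Return the FSC per integer-radius shell for two equal-sized cubic volumes."""
    assert vol1.shape == vol2.shape
    f1 = np.fft.fftshift(np.fft.fftn(vol1))
    f2 = np.fft.fftshift(np.fft.fftn(vol2))

    # Integer shell index (distance from the Fourier origin) for every voxel.
    n = vol1.shape[0]
    axis = np.arange(n) - n // 2
    zz, yy, xx = np.meshgrid(axis, axis, axis, indexing="ij")
    shell = np.rint(np.sqrt(xx**2 + yy**2 + zz**2)).astype(int)

    # Keep only complete shells, i.e. radii below the Nyquist index n // 2.
    n_shells = n // 2
    mask = shell < n_shells
    idx = shell[mask]
    cross = np.bincount(idx, weights=(f1[mask] * np.conj(f2[mask])).real,
                        minlength=n_shells)
    power1 = np.bincount(idx, weights=np.abs(f1[mask]) ** 2, minlength=n_shells)
    power2 = np.bincount(idx, weights=np.abs(f2[mask]) ** 2, minlength=n_shells)
    return cross / np.sqrt(power1 * power2)


# Hypothetical half-set reconstructions: same signal, independent noise.
rng = np.random.default_rng(0)
n = 32
axis = np.arange(n) - n // 2
zz, yy, xx = np.meshgrid(axis, axis, axis, indexing="ij")
signal = np.exp(-(xx**2 + yy**2 + zz**2) / (2 * 3.0**2))
half1 = signal + rng.normal(scale=0.05, size=signal.shape)
half2 = signal + rng.normal(scale=0.05, size=signal.shape)

fsc = fourier_shell_correlation(half1, half2)

# First shell whose correlation drops below a chosen threshold (0.143 here).
below = np.nonzero(fsc < 0.143)[0]
first_shell_below = int(below[0]) if below.size else None
```

At low spatial frequency the shared signal dominates and the curve stays near +1; at high spatial frequency only the independent noise remains and the curve scatters around 0, with less scatter in the outer shells because they contain more Fourier terms.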