Bi7740: Scientific computing
Non-square linear systems

Vlad Popovici
popovici@iba.muni.cz
Institute of Biostatistics and Analyses, Masaryk University, Brno

Outline
1 Non-square systems
  - The underdetermined case
  - The overdetermined case
2 Numerical methods for the LS problem
  - Orthogonal transformations
  - Singular Value Decomposition
  - Total least squares
3 Comparison of various decompositions

Systems of linear equations

General form: Ax = b,

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}$$

- if m < n: underdetermined case; find a minimum-norm solution
- if m > n: overdetermined case; minimize the squared error
- if m = n: determined case; already discussed

Reminder

- two vectors y, z are orthogonal if y^T z = 0
- the span of a set of n independent vectors is span({v_1, ..., v_n}) = { Σ_{i=1}^n α_i v_i | α_i ∈ R }
- the row (column) space of a matrix A is the linear subspace generated (spanned) by the rows (columns) of A; its dimension equals rank(A) ≤ min(m, n)
- by definition, span(A) is the column space of A and can be written as C(A) = { v ∈ R^m : v = Ax, x ∈ R^n }, i.e., the space of all vectors obtained through multiplication by the matrix
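The column-space definition can be checked numerically: b ∈ span(A) exactly when appending b to A does not increase the rank. A minimal Matlab sketch with made-up data:

% b lies in C(A) iff rank([A b]) == rank(A)
% (rank() estimates the numerical rank via the SVD)
A  = [1 0; 0 1; 0 0];
b1 = [2; 3; 0];             % lies in span(A)
b2 = [2; 3; 1];             % does not
rank([A b1]) == rank(A)     % true
rank([A b2]) == rank(A)     % false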
Underdetermined case: m < n

- there are more variables than equations, hence the solution is not unique
- assume the rows of A to be linearly independent
- then any vector x ∈ R^n can be decomposed as x = x⁺ + x⁻, where x⁺ lies in the row space of A and x⁻ in the null space of A (orthogonal to the former):
  x⁺ = A^T α,   Ax⁻ = 0

- this leads to A(x⁺ + x⁻) = AA^T α + Ax⁻ = AA^T α = b
- AA^T is an m × m nonsingular matrix, so AA^T α = b has a unique solution α₀ = (AA^T)⁻¹ b
- the corresponding minimal-norm solution to the original system is x⁺₀ = A^T (AA^T)⁻¹ b
- note, however, that the orthogonal component x⁻ remains unspecified
- the matrix A^T (AA^T)⁻¹ is called the right pseudo-inverse of A (right: A · A^T (AA^T)⁻¹ = I); Matlab: pinv()

Example: let A = [1 2] and b = [3] (hence m = 1).

- solution space: x₂ = −(1/2)x₁ + 3/2 is a solution, for any x₁ ∈ R
- x⁺ = A^T α = [1 2]^T α (row space)
- Ax⁻ = 0 ⇒ [1 2][x₁⁻ x₂⁻]^T = 0 ⇒ x₂⁻ = −(1/2)x₁⁻ (null space)
- the minimal-norm solution is the intersection of the solution space with the row space; it is the closest vector to the origin among all vectors in the solution space: x⁺₀ = [0.6 1.2]^T

Overdetermined case: m > n

- with m > n, the columns of A span an (at most) n-dimensional subspace of R^m, so in general b ∉ span(A) and the system has no exact solution
- one needs some other criterion to call a solution acceptable
- the least squares (LS) solution x₀ minimizes the squared Euclidean norm of the residual vector:
  x₀ = arg min_x ‖r‖₂² = arg min_x ‖b − Ax‖₂²

Solution to the LS problem

From a linear system problem, we arrived at an optimization problem with objective function
  J = (1/2)‖b − Ax‖₂² = (1/2)(b − Ax)^T (b − Ax).
Setting the derivative with respect to x to zero,
  ∂J/∂x = A^T b − A^T Ax = 0,
leads to the normal equations A^T Ax = A^T b, with the solution
  x₀ = (A^T A)⁻¹ A^T b.
A† = (A^T A)⁻¹ A^T is the left pseudo-inverse of A.
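Both pseudo-inverses are easy to verify in Matlab; the overdetermined data below are made up for illustration:

% Underdetermined example from above: minimum-norm solution
A = [1 2];  b = 3;
pinv(A) * b                 % [0.6; 1.2], as computed by hand
A' * ((A * A') \ b)         % right pseudo-inverse applied explicitly

% Overdetermined case (made-up data): left pseudo-inverse vs. normal equations
A = [1 1; 1 2; 1 3];  b = [1; 2; 2];
(A' * A) \ (A' * b)         % solving the normal equations
pinv(A) * b                 % same solution via pinv()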
Solution to the LS problem: geometric interpretation

- let y = Ax₀, where x₀ is the LS solution
- the residual r = b − y is orthogonal to span(A)

LS data approximation

Model: y = c₃x² + c₂x + c₁. Problem: find the cᵢ when the points (xᵢ, yᵢ) are given.

x = linspace(-2, 2, 100);
y = 2 * x.^2 - x + 3;               % known model
yn = y + 2*(rand(1, 100) - 0.5);    % add some noise

A = [ones(100,1), x', x'.^2];       % Vandermonde matrix
coef = pinv(A) * y';                % solution, no noise in y
coefn = pinv(A) * yn';              % solution, uniform noise in y

hold on;
plot(x, yn, 'b.');
plot(x, coef(1) + coef(2) .* x + coef(3) .* x.^2, 'k-');
plot(x, coefn(1) + coefn(2) .* x + coefn(3) .* x.^2, 'r-');
hold off;

Condition number

- if rank(A) = n (the columns are independent), the condition number is cond(A) = ‖A‖₂ ‖A†‖₂
- by convention, if rank(A) < n, cond(A) = ∞
- for non-square matrices, the condition number measures the closeness to rank deficiency

Numerical methods for the LS problem

- the LS solution can be obtained using the pseudo-inverse, A† = (A^T A)⁻¹ A^T, or by solving the normal equations A^T Ax = A^T b, which form a system of n equations
- A^T A is symmetric positive definite, so it admits a Cholesky decomposition, A^T A = LL^T

Issues with the normal equations method

- floating-point computation of A^T A and A^T b may lead to information loss
- the sensitivity of the solution is worsened, since cond(A^T A) = [cond(A)]²

Example: let

$$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}$$

with ε ∈ R₊ and ε < √ε_mach. Then, in floating-point arithmetic,

$$A^T A = \begin{bmatrix} 1+\epsilon^2 & 1 \\ 1 & 1+\epsilon^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

which is singular!
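The example is easy to reproduce in Matlab; here ε = 10⁻⁹, which is below √ε_mach ≈ 1.5·10⁻⁸ in double precision:

e = 1e-9;                 % smaller than sqrt(eps) ~ 1.5e-8
A = [1 1; e 0; 0 e];
B = A' * A                % rounds to [1 1; 1 1] in double precision
rank(B)                   % 1: numerically singular
cond(A)^2                 % ~2e18: the squared conditioning of A
% chol(B) now fails, since B is not numerically positive definite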
Augmented systems

- idea: find the solution and the residual as the solution of an extended system, under the orthogonality requirement
- the new system is

$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}$$

- despite requiring more storage and not being positive definite, it allows more freedom in choosing pivots for the LU decomposition
- it is useful in some cases, but not much used in practice

Orthogonal transformations

- a matrix Q is orthogonal if Q^T Q = I
- multiplication of a vector by an orthogonal matrix does not change its Euclidean norm:
  ‖Qv‖₂² = (Qv)^T Qv = v^T Q^T Qv = v^T v = ‖v‖₂²
- so, multiplying the two sides of the system by Q does not change the solution
- again: try to transform the system so that it is easy to solve, e.g. a triangular system

- an upper triangular overdetermined (m > n) LS problem has the form

$$\begin{bmatrix} R \\ 0 \end{bmatrix} x \approx \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

  where R is an n × n upper triangular matrix and b is partitioned accordingly
- the residual becomes ‖r‖₂² = ‖b₁ − Rx‖₂² + ‖b₂‖₂²
- to minimize the residual, one has to minimize ‖b₁ − Rx‖₂² (since ‖b₂‖₂² is fixed); this leads to the system Rx = b₁, which can be solved by back-substitution
- the residual becomes ‖r‖₂² = ‖b₂‖₂² and x is the LS solution

QR factorization

- problem: find an m × m orthogonal matrix Q such that an m × n matrix A can be written as

$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$$

  where R is n × n upper triangular
- the new problem to solve is

$$Q^T A x = \begin{bmatrix} R \\ 0 \end{bmatrix} x \approx \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = Q^T b$$

- if Q is partitioned as Q = [Q₁ Q₂] with Q₁ having n columns, then A = Q [R; 0] = Q₁R is called the reduced QR factorization of A (Matlab: [Q,R] = qr(A, 0))
- the columns of Q₁ form an orthonormal basis of span(A), and the columns of Q₂ form an orthonormal basis of span(A)⊥
- Q₁Q₁^T is the orthogonal projector onto span(A)
- the solution to the initial problem is given by the solution to the square system Q₁^T Ax = Q₁^T b

QR factorization - remember: in general, for an m × n matrix A with m > n, the factorization is A = QR, where
- Q is an orthogonal matrix: Q^T Q = I ⇔ Q⁻¹ = Q^T
- R is an upper triangular matrix
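These properties are easy to verify numerically; a small sketch with made-up data:

A = [1 1; 1 2; 1 3; 1 4];  b = [1; 2; 2; 3];
[Q1, R] = qr(A, 0);          % reduced QR: A = Q1*R, Q1 is 4x2
norm(Q1' * Q1 - eye(2))      % ~0: orthonormal columns
norm(A - Q1 * R)             % ~0: the factorization reproduces A
y = Q1 * (Q1' * b);          % projection of b onto span(A)
norm(A' * (b - y))           % ~0: the residual is orthogonal to span(A)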
- solving the normal equations (for the LS solution) A^T Ax = A^T b comes down to solving Rx = Q^T b
- HOMEWORK: prove the above statement.

Example

A = [3 -6; 4 -8; 0 1];
b = [-1 7 2]';
[Q1,R] = qr(A,0)
d = Q1' * b;
bksolve(R, d)      % remember the back-substitution method

% equivalent (linear regression!):
regress(b, A)

A statistical perspective

Changing the notation a bit, the linear model is
  E[y] = Xβ,   Cov(y) = σ²I.
It can be shown that the best linear unbiased estimator is
  β̂ = (X^T X)⁻¹ X^T y = R⁻¹ Q^T y
for a decomposition X = QR. Then ŷ = QQ^T y. (Gauss-Markov theorem: the LS estimator has the lowest variance among all unbiased linear estimators.) Also,
  Var(β̂) = (X^T X)⁻¹ σ² = (R^T R)⁻¹ σ²,
where σ̂² = ‖y − ŷ‖₂² / (m − n − 1).

Computing the QR factorization

- similarly to the LU factorization, we nullify the entries under the diagonal, column by column
- now, use orthogonal transformations:
  - Householder transformations
  - Givens rotations
  - Gram-Schmidt orthogonalization
- Matlab: qr()

Householder transformations

  H = I − 2 (vv^T)/(v^T v),   v ≠ 0

- H is orthogonal and symmetric: H = H^T = H⁻¹
- v is chosen such that, for a given vector a,

$$Ha = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \alpha \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \alpha e_1$$

- this leads to v = a − αe₁ with α = ±‖a‖₂, where the sign is chosen to avoid cancellation

Householder QR factorization

- apply Householder transformations to nullify the entries below the diagonal
- the process is applied to each of the n columns and produces a transformation of the form

$$H_n \cdots H_1 A = \begin{bmatrix} R \\ 0 \end{bmatrix}$$

  where R is n × n upper triangular; then take Q = H₁ ⋯ Hₙ
- note that multiplying H with a vector u is much cheaper than a general matrix-vector multiplication:
  Hu = (I − 2 (vv^T)/(v^T v)) u = u − 2 ((v^T u)/(v^T v)) v
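A minimal, unoptimized sketch of Householder QR in Matlab (house_qr is a hypothetical name, not a built-in; for real work use qr()):

function [Q, R] = house_qr(A)
% Minimal Householder QR sketch: Q is accumulated as H1*...*Hn
[m, n] = size(A);
Q = eye(m);
R = A;
for k = 1:n
    a = R(k:m, k);
    v = a;
    % sign chosen to avoid cancellation; treat sign(0) as +1
    if a(1) >= 0, s = 1; else s = -1; end
    v(1) = v(1) + s * norm(a);
    if norm(v) > 0
        v = v / norm(v);
        % apply H = I - 2*v*v' to the trailing submatrix and accumulate in Q
        R(k:m, k:n) = R(k:m, k:n) - 2 * v * (v' * R(k:m, k:n));
        Q(:, k:m)   = Q(:, k:m)   - 2 * (Q(:, k:m) * v) * v';
    end
end
end

A quick check: with A = randn(5, 3), both norm(A - Q*R) and norm(Q'*Q - eye(5)) should be at roundoff level.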
Gram-Schmidt orthogonalization

- idea: given two vectors a₁ and a₂, we seek orthonormal vectors q₁ and q₂ having the same span
- method: subtract from a₂ its projection on a₁ and normalize the resulting vectors
- applying this to each column of A yields the classical Gram-Schmidt procedure

Algorithm: classical Gram-Schmidt

for k = 1 to n do
    q_k ← a_k
    for j = 1 to k − 1 do
        r_jk ← q_j^T a_k
        q_k ← q_k − r_jk q_j
    end for
    r_kk ← ‖q_k‖₂
    q_k ← q_k / r_kk
end for

The resulting matrices Q (with the q_k as columns) and R (with elements r_jk) form the reduced QR factorization of A.

Modified Gram-Schmidt procedure

- improved orthogonality in finite precision
- reduced storage requirements
- HOMEWORK: implement the procedure below in Matlab

Algorithm: modified Gram-Schmidt

for k = 1 to n do
    r_kk ← ‖a_k‖₂
    q_k ← a_k / r_kk
    for j = k + 1 to n do
        r_kj ← q_k^T a_j
        a_j ← a_j − r_kj q_k
    end for
end for

Further topics on QR factorization

- if rank(A) < n then R is singular and there are multiple solutions x; choose the x with the smallest norm
- in limited precision, the computed rank can be lower than the theoretical one, leading to highly sensitive solutions; an alternative is the SVD method (later)
- there exists a version, QR with column pivoting, that at each step reduces the remaining column with the largest Euclidean norm; this improves stability in rank-deficient scenarios
- another factorization method: Givens rotations, which introduce one zero at a time

Singular Value Decomposition (SVD)

- the SVD of an m × n matrix A has the form A = UΣV^T, where U is an m × m orthogonal matrix, V is an n × n orthogonal matrix, and Σ is an m × n diagonal matrix with

$$\Sigma_{ij} = \begin{cases} 0 & i \neq j \\ \sigma_i \geq 0 & i = j \end{cases}$$

- the σᵢ are usually ordered such that σ₁ ≥ ⋯ ≥ σₙ and are called the singular values of A
- the columns uᵢ of U and vᵢ of V are called the left and right singular vectors of A, respectively
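In Matlab, svd() returns the factorization directly; a quick check of the definition on a made-up matrix:

A = [3 -6; 4 -8; 0 1];
[U, S, V] = svd(A);
norm(A - U * S * V')    % ~0: the factorization reproduces A
diag(S)'                % singular values, in decreasing order
sum(diag(S) > max(size(A)) * eps(S(1,1)))   % numerical rank, SVD-based tolerance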
- the minimum-norm solution to Ax ≈ b is

$$x = \sum_{\sigma_i \neq 0} \frac{u_i^T b}{\sigma_i} v_i$$

- for ill-conditioned or rank-deficient problems, the sum should be taken only over the "large enough" σ's (those σᵢ above some tolerance)
- Euclidean norm: ‖A‖₂ = maxᵢ{σᵢ}
- Euclidean condition number: cond(A) = maxᵢ{σᵢ} / minᵢ{σᵢ}
- rank of A: rank(A) = #{σᵢ > 0}

Pseudoinverse (again)

- the pseudoinverse of an m × n matrix A with SVD A = UΣV^T is A⁺ = VΣ⁺U^T, where Σ⁺ is the n × m diagonal matrix with

$$[\Sigma^+]_{ii} = \begin{cases} 1/\sigma_i & \sigma_i > 0 \\ 0 & \text{otherwise} \end{cases}$$

- the pseudoinverse always exists, and the minimum-norm solution to Ax ≈ b is x = A⁺b
- if A is square and nonsingular, A⁻¹ = A⁺

SVD and the subspaces relevant to A

- the uᵢ for which σᵢ > 0 form an orthonormal basis of span(A)
- the uᵢ for which σᵢ = 0 form an orthonormal basis of the orthogonal complement of span(A)
- the vᵢ for which σᵢ = 0 form an orthonormal basis of the null space of A
- the vᵢ for which σᵢ > 0 form an orthonormal basis of the orthogonal complement of the null space of A

SVD and matrix approximation

- A can be re-written as A = UΣV^T = σ₁u₁v₁^T + ⋯ + σₙuₙvₙ^T
- let Eᵢ = uᵢvᵢ^T; Eᵢ has rank 1 and requires only m + n storage locations, and the product Eᵢx requires only m + n multiplications
- assuming σ₁ ≥ σ₂ ≥ ⋯ ≥ σₙ, keeping the largest k singular values gives the closest approximation of A of rank k:

$$A \approx \sum_{i=1}^{k} \sigma_i E_i$$

- many applications in image processing, data compression, cryptography, etc.
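A quick numerical check of the rank-k approximation: the 2-norm of the approximation error equals the first discarded singular value, σ_{k+1}.

A = magic(6);          % any example matrix
[U, S, V] = svd(A);
k = 2;
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';   % best rank-k approximation
norm(A - Ak)           % equals the (k+1)-th singular value ...
S(k+1, k+1)            % ... as predicted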
Example: image compression

Matlab: [U,S,V] = svd(X)

[Figure: original image and its approximations using 1, 2, 3, 4, 5 and 10 terms]

X = imread('face01.png');
X = double(X) ./ 255.0;

[U,S,V] = svd(X);

imshow(U(:,1) * S(1,1) * V(:,1)');
imshow(U(:,1:5) * S(1:5, 1:5) * V(:,1:5)');
imshow(U(:,1:10) * S(1:10, 1:10) * V(:,1:10)');

Total least squares

- Ax ≈ b
- ordinary least squares applies when the error affects only b
- what if there is error (uncertainty) in A as well?
- total least squares (TLS) minimizes the orthogonal distances, rather than the vertical distances, between the model and the data
- it can be computed using the SVD of the augmented matrix [A, b]; see the sketch below
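A minimal sketch of the classical TLS solution via the SVD of [A, b]; it assumes the smallest singular value of [A, b] is simple and that the last component of the corresponding right singular vector is nonzero (the data are made up):

% TLS solution from the right singular vector associated with the
% smallest singular value of [A b]
A = [1 1; 1 2; 1 3; 1 4];
b = [1.1; 1.9; 3.2; 3.9];
[m, n] = size(A);
[~, ~, V] = svd([A b]);
x_tls = -V(1:n, end) / V(n+1, end)   % total least squares solution
x_ls  = A \ b                        % ordinary LS, for comparison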
Comparison: work effort

- computing A^T A requires about n²m/2 multiplications, and solving the resulting symmetric system about n³/6 more
- the LS solution by Householder QR requires about mn² − n³/3 multiplications
- so for m ≫ n, the Householder method requires about twice as much work as the normal equations
- the cost of the SVD is ≈ (4 … 10) × (mn² + n³), depending on the implementation

Comparison: precision

- the relative error for the normal equations is ∼ [cond(A)]²; if cond(A) ≈ 1/√ε_mach, the Cholesky factorization will break
- the Householder method has a relative error ∼ cond(A) + ‖r‖₂ [cond(A)]², which is the best achievable for LS problems
- the Householder method breaks (in the back-substitution step) only for cond(A) ≈ 1/ε_mach
- while the Householder method is more general and more accurate than the normal equations, it may not always be worth the additional cost

Comparison: precision, cont'd

- for (nearly) rank-deficient problems, the pivoting Householder method produces a useful solution, while the normal equations method fails
- the SVD is more precise and more robust than the Householder method, but much more expensive computationally
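The precision differences are easy to observe on an ill-conditioned problem; an illustrative sketch on a Vandermonde-type matrix (the exact error values will vary):

m = 100;  n = 10;
t = linspace(0, 1, m)';
A = t .^ (0:n-1);                    % ill-conditioned monomial basis (R2016b+ expansion)
x_true = ones(n, 1);
b = A * x_true;
cond(A)                              % large

x_ne  = (A' * A) \ (A' * b);         % normal equations
[Q, R] = qr(A, 0);
x_qr  = R \ (Q' * b);                % QR (Householder-based in Matlab)
x_svd = pinv(A) * b;                 % SVD-based pseudoinverse

norm(x_ne - x_true) / norm(x_true)   % typically the worst of the three
norm(x_qr - x_true) / norm(x_true)
norm(x_svd - x_true) / norm(x_true)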