Numerical Methods for Differential Equations

YA YAN LU
Department of Mathematics
City University of Hong Kong
Kowloon, Hong Kong

Contents

1 ODE IVP: Explicit One-step Methods
  1.1 Introduction
  1.2 Euler and Runge-Kutta Methods
  1.3 Local truncation error and order
  1.4 Embedded Runge-Kutta methods
2 ODE IVP: Implicit One-step Methods
  2.1 Stiff equations
  2.2 Implicit one-step methods
3 ODE IVP: Multi-step Methods
  3.1 Explicit multi-step methods
  3.2 Implicit multi-step methods
4 ODE IVP: Stability Concepts
  4.1 Zero stability
  4.2 Absolute stability
5 ODE Boundary Value Problems
  5.1 The shooting method
  5.2 Finite difference methods
  5.3 The finite element method
6 Finite Difference Methods for Parabolic PDEs
  6.1 Introduction
  6.2 Classical explicit method
  6.3 Crank-Nicolson method
  6.4 Stability analysis
  6.5 Alternating direction implicit method
7 Finite Difference Methods for Hyperbolic PDEs
  7.1 First order hyperbolic equations
  7.2 Explicit methods for wave equation
  7.3 Maxwell's equations
8 Finite Difference Methods for Elliptic PDEs
  8.1 Finite difference method for Poisson equation
  8.2 Fast Poisson solver based on FFT
  8.3 Classical iterative methods
  8.4 Conjugate gradient method
    8.4.1 1-D optimization problem
    8.4.2 Subspace minimization problem
    8.4.3 Orthogonal residual
    8.4.4 The next conjugate direction
    8.4.5 The method
    8.4.6 Rate of convergence
67 3 Chapter 1 ODE IVP: Explicit One-step Methods 1.1 Introduction In this chapter, we study numerical methods for initial value problems (IVP) of ordinary differential equations (ODE). The first step is to re-formulate your ODE as a system of first order ODEs: dy dt = f(t,y) for t > t0 (1.1) with the initial condition y(t0) = y0 (1.2) where t is the independent variable, y = y(t) is the unknown function of t, y0 is the given initial condition, and f is a given function of t and y which describes the differential equation. High order differential equations can also be written as a first order system by introducing the derivatives as new functions. Our numerical methods can be used to solve any ordinary differential equations. We only need to specify the function f. The variable t is discretized, say tj for j = 0,1,2,..., then we determine yj ≈ y(tj) for j = 1,2,3,.... The first class of methods (Runge-Kutta methods) involve one-step. If yj is calculated, then we construct yj+1 from yj. Previous values such as yj−1 are not needed. Since this is an IVP and for the first step, we have y0 only at t0, then we can find y1, y2, ..., in a sequence. The one-step methods are vary natural. A higher order method gives a more accurate numerical solution than a lower order method for a fixed step size. But a higher order one-step method requires more evaluations of the f function. For example, the first order Euler’s method requires only one evaluation of f, i.e., f(tj,yj), but a fourth order Runge-Kutta method requires four evaluations of f. For a large scale problem, the computation of f could be time consuming. Thus, it is desirable to have high order methods that require only one evaluation of f in each step. This is not possible in a onestep method. But it is possible in a multi-step method. Therefore, the main advantage of the multi-step method is that they are efficient. However, they are more difficult to use. For one-step methods, we will introduce implicit methods. These are methods designed for the socalled “stiff” ODEs. If an explicit method is used for a stiff ODE and the step size is not small enough, the error (between the exact and the numerical solution) may grow very fast. For these stiff ODEs, the 4 implicit methods are useful. The situation is the same for multi-step methods. We also need implicit multi-step methods for stiff ODEs. We will also introduce the embedded Runge-Kutta methods. These are methods that combine two methods together, so that the step size can be automatically chosen for a desired accuracy. There are also multi-step methods that allow automatic selection of the step size. But they are more complicated and we will not cover them. Consider the following example. We have the following differential equation for u = u(t): u′′′ +sin(t) 1+(u′′)2u′ + u 1+e−t = t2 (1.3) for t > 0, with the initial conditions: u(0) = 1, u′ (0) = 2, u′′ (0) = 3. (1.4) We can introduce a vector y y(t) =   u(t) u′(t) u′′(t)   and write down the equation for y as y′ = f(t,y) =   u′ u′′ −sin(t) 1+(u′′)2 u′ −u/(1+e−t)+t2   The initial condition is y(0) = [1,2,3]. Here is a simple MATLAB program for the above function f. function k = f(t, y) % remember y is a column vector of three components. k = zeros(3,1); k(1) = y(2); k(2) = y(3); k(3) = -sin(t) * sqrt(1+y(3)^2) * y(2) - y(1)/(1 + exp(-t)) + t^2; In the MATLAB program, y(1), y(2), y(3) are the three components of the vector y. They are u(t), u′(t) and u′′(t), respectively. 
They are different from y(1), y(2) and y(3) which are the vectors y evaluated at t = 1, t = 2 and t = 3. Notice that we also have y(0), which is the initial value of y. But we do not have y(0). Anyway, the components of y are only used inside the MATLAB programs. A numerical method is usually given for the general system (1.1-1.2). We specify the system of ODEs by writing a program for the function f, then the same numerical method can be easily used for solving many different differential equations. 1.2 Euler and Runge-Kutta Methods Numerical methods start with a discretization of t by t0, t1, t2, ..., say tj = t0 + jh 5 where h is the step size. Numerical methods are formulas for y1, y2, y3, ..., where yj is the approximate solution at tj. We use y(tj) to denote the (unknown) exact solution, thus yj ≈ y(tj). Please notice that when y is a vector, y1, y2, ..., are also vectors. In particular, y1 is not the first component of y vector, y2 is not the 2nd component of the y vector. The components of y are only explicitly given inside the MATLAB programs as y(1), y(2), etc. Euler’s method: yj+1 = yj +hf(tj,yj). (1.5) Since y0 is the known initial condition, the above formula allows us o find y1, y2, etc, in a sequence. The Euler’s method can be easily derived as follows. First, we assume h is small and consider the Taylor expansion: y(t1) = y(t0 +h) = y(t0)+hy′ (t0)+... Now, we know that y′(t0) = f(t0,y(t0)). If we keep only the first two terms of the Taylor series, we obtain the first step of Euler’s method: y1 = y0 +hf(t0,y0), where y(t1) is replaced by the “numerical solution” y1, etc. The general step from tj to tj+1 is similar. Here is a MATLAB program for the Euler’s method: function y1 = eulerstep(h, t0, y0) % This is one step of the Euler’s method. It is % given for the first step, but any other step % is just the same. You need the MATLAB function % f to specify the system of ODEs. y1 = y0 + h* f(t0, y0) Now, let us solve (1.3-1.4) from t = 0 to t = 1 with the step size h = 0.01. For this purpose, we need to write a main program. In the main program, we specify the initial conditions, initial time t0, final time and the total number of steps. The step size can then be calculated. Here is the MATLAB program. % The main program to solve (1.3)-(1.4) from t=0 to % t = 1 by Euler’s method. % initial time t0 = 0; % final time tfinal = 1; % number of steps nsteps = 100; % step size 6 h = (tfinal - t0)/ nsteps; % initial conditions y = [1, 2, 3]’; % set the variable t. t = t0 % go through the steps. for j= 1 : nsteps y = eulerstep(h, t, y) t = t + h % saved output for u(t) only, i.e. the first component of y. tout(j) = t; u(j) = y(1); end % draw a figure for the solution u. plot(tout, u) Now, insider MATLAB, in a folder containing the three programs: f.m, eulerstep.m, eulermain.m, if we type eulermain, we will see a solution curve. That is the solid curve in Fig. 1.1. This is for the 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 1.5 2 2.5 3 3.5 4 4.5 5 t u(t) Numerical solutions by Euler’s method using h=0.01, 0.1, 0.2 Figure 1.1: Numerical solutions of (1.3) and (1.4) by Euler’s method. The solid curve is for h = 0.01. The “+” is for h = 0.1 and the “o” is for h = 0.2. case of h = 0.01. We also want to see what happens if h is 0.2 and 0.1. For this purpose, we change nsteps to 5 and 10, then use plot(tout, u, ’o’) and plot(tout, u, ’+’) to show the results. All three plots are shown in the Fig. 1.1. The Euler’s method is not very accurate. 
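Since the exact solution of (1.3)-(1.4) is not available, one simple way to see how accurate the Euler solution is, and how it improves as h decreases, is to repeat the computation with two different step sizes and compare the results. The following fragment is only a sketch built on the f.m and eulerstep.m programs above; the difference between the two computed values of u(1) serves as a rough error indicator, and for Euler's method it decreases roughly in proportion to h.

% Sketch: compare Euler solutions of (1.3)-(1.4) at t = 1
% computed with 100 steps and with 200 steps.
t0 = 0; tfinal = 1; y0 = [1, 2, 3]';
uend = zeros(1,2);
for run = 1 : 2
  nsteps = 100 * run;              % 100 steps, then 200 steps
  h = (tfinal - t0)/nsteps;
  t = t0; y = y0;
  for j = 1 : nsteps
    y = eulerstep(h, t, y);
    t = t + h;
  end
  uend(run) = y(1);                % first component of y, i.e. u at t = 1
end
disp(abs(uend(2) - uend(1)))       % rough indicator of the error in u(1)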
To obtain a numerical solution with an acceptable accuracy, we have to use a very small step size h. A small step size h implies a larger number of steps, thus more 7 computing time. It is desirable to develop methods that are more accurate than Euler’s method. If we look at the Taylor series again, we have y(t1) = y(t0 +h) = y(t0)+hy′ (t0)+ h2 2 y′′ (t0)+ h3 6 y′′′ (t0)+... This can be written as y(t1)−y(t0) h = y′ (t0)+ h 2 y′′ (t0)+ h2 6 y′′′ (t0)+... (1.6) Actually, the right hand side is a more accurate approximation for y′(t0 +h/2), since y′ (t0 + h 2 ) = y′ (t0)+ h 2 y′′ (t0)+ h2 8 y′′′ (t0)+... The first two terms on the right hand sides of the above two equations are identical, although the third terms involving y′′′(t0) are different. Thus, y(t1)−y(t0) h ≈ y′ (t0 + h 2 ) = f t0 + h 2 ,y(t0 + h 2 ) The right hand side now involves y(t0 + h/2). Of course, this is now known, because we only have y(t0). The idea is that we can use Euler’s method (with half step size h/2) to get an approximate y(t0 +h/2), then use the above to get an approximation of y(t1). The Euler approximation for y(t0 +h/2) is y(t0)+h/2f(t0,y0). Therefore, we have k1 = f(t0,y0) (1.7) k2 = f(t0 + h 2 ,y0 + h 2 k1) (1.8) y1 = y0 +hk2. (1.9) This is the first step of the so-called midpoint method. The general step is obtained by simply replacing t0, y0 and y1 by tj, yj and yj+1, respectively. The right hand side of (1.6) can also be approximated by (y′(t0)+y′(t1))/2, because y′(t0)+y′(t1) 2 = y′ (t0)+ h 2 y′′ (t0)+ h2 4 y′′′ (t0)+... Therefore, we have y(t1)−y(t0) h ≈ y′(t0)+y′(t1) 2 . We can replace y′(t0) and y′(t1) by f(t0,y(t0)) and f(t1,y(t1)), but of course, we do not know y(t1), because that is what we are trying to solve. But we can use Euler’s method to get the first approximation of y(t1) and use it in f(t1,y(t1)), then use the above to get the second (and better) approximation of y(t1). This can be summarized as k1 = f(t0,y0) (1.10) k2 = f(t0 +h,y0 +hk1) (1.11) y1 = y0 + h 2 (k1 +k2). (1.12) 8 This is the first step of the so-called modified Euler’s method. The general step from tj to tj+1 is easily obtained by replacing the subscripts 0 and 1 by j and j +1, respectively. Similarly, the right hand side of (1.6) can be approximated by Ay′ (t0)+By′ (t0 +αh), where α is a given constant, 0 < α ≤ 1, the coefficients A and B can be determined, such that the above matches the first two terms of the right hand side of (1.6). We obtain A = 1− 1 2α , B = 1 2α . Then y′(t0 +αh) = f(t0 +αh,y(t0 +αh)) and we use Euler’s method to approximate y(t0 +αh). That is y(t0 +αh) ≈ y(t0)+αhf(t0,y(t0)). Finally, we obtain the following general 2nd order Runge-Kutta Methods: k1 = f(tj,yj) (1.13) k2 = f(tj +αh,yj +αhk1) (1.14) yj+1 = yj +h 1− 1 2α k1 + 1 2α k2 (1.15) Since α is an arbitrary parameter, there are infinitely many 2nd order Runge-Kutta methods. The midpoint method and the modified Euler’s method correspond to α = 1/2 and α = 1, respectively. In this formula, k1 and k2 are temporary variables, they are different for different steps. There are many other Runge-Kutta methods (3rd order, 4th order and higher order). The following classical 4th order Runge-Kutta method is widely used, because it is quite easy to remember. k1 = f(tj,yj) (1.16) k2 = f(tj + h 2 ,yj + h 2 k1) (1.17) k3 = f(tj + h 2 ,yj + h 2 k2) (1.18) k4 = f(tj +h,yj +hk3) (1.19) yj+1 = yj + h 6 (k1 +2k2 +2k3 +k4) (1.20) We have mentioned the order of a method above. This concept will be explained in the next section. 
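Since (1.13)-(1.15) covers all the 2nd order Runge-Kutta methods at once, it may be helpful to see it as a single MATLAB function before looking at the specific midpoint implementation that follows. The function name rk2step and the extra input alpha below are only for this illustration; they are not used in the other programs in these notes.

function y1 = rk2step(h, t0, y0, alpha)
% One step of the general 2nd order Runge-Kutta method (1.13)-(1.15).
% alpha = 1/2 gives the midpoint method, alpha = 1 gives the modified
% Euler's method. The MATLAB function f specifies the system of ODEs.
k1 = f(t0, y0);
k2 = f(t0 + alpha*h, y0 + alpha*h*k1);
y1 = y0 + h*( (1 - 1/(2*alpha))*k1 + 1/(2*alpha)*k2 );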
Next, we consider a MATLAB implementation of the midpoint method. For this purpose, we write the following function called midptstep which is saved in the file called midptstep.m.

function y1 = midptstep(h, t0, y0)
% This is the midpoint method (one of the second order Runge-Kutta methods).
% It is given for the first step, but any other step is just the same.
% You need the MATLAB function f to specify the system of ODEs.
k1 = f(t0, y0);
k2 = f(t0+h/2, y0 + (h/2)*k1);
y1 = y0 + h* k2;

To solve the same differential equation (1.3)-(1.4), we need the earlier MATLAB function f and a main program. We can write a main program by copying the main program eulermain for Euler's method. The new main program midptmain is different from eulermain only in one line. The line

y = eulerstep(h, t, y)

is now replaced by

y = midptstep(h, t, y)

You can see that writing a program for a new method is very easy, since we have separated the differential equation (in f.m) and the numerical method (in eulerstep.m or midptstep.m) from the main program. In Fig. 1.2, we show the numerical solution u(t) for (1.3)-(1.4) calculated by the midpoint method with h = 0.01 and h = 0.2. You can see that the midpoint solution obtained with h = 0.2 is much more accurate than the Euler solution with the same h.

Figure 1.2: Numerical solutions by the midpoint method. The solid curve is for h = 0.01. The "o" is for h = 0.2.

1.3 Local truncation error and order

When a numerical method is used to solve a differential equation, we want to know how accurate the numerical solution is. We will denote the exact solution as y(t), thus y(tj) is the exact solution at tj. The numerical solution at tj is denoted as yj; therefore, we are interested in the following error:

ej = |y(tj) - yj|.

We do not expect to be able to know ej exactly, because we do not have the exact solution in general. Therefore, we will be happy to have some estimates (such as approximate formulas or inequalities) for ej. However, even this is not so easy. The reason is that the error accumulates. Let us look at the steps. We start with y0 = y(t0) which is exact, then we calculate y1 which approximates y(t1), then we calculate y2 which approximates y(t2), etc. Notice that when we calculate y2, we use y1, not y(t1). The numerical solution y1 has some error, and this error will influence y2. Therefore, the error e2 depends on e1. Similarly, the error at the third step, i.e., e3, depends on the error at step 2, etc. As a result, it is rather difficult to estimate ej. The numerical methods given in the previous sections can be written in the following general form:

yj+1 = φ(tj,h,yj),   (1.21)

where φ is some function related to the function f which defines the differential equation. For example, the Euler's method is φ(tj,h,yj) = yj + h f(tj,yj). The midpoint method is φ(tj,h,yj) = yj + h f(tj + h/2, yj + (h/2) f(tj,yj)). If we have the exact solution, we can put the exact solution y(t) into (1.21). That is, we replace yj and yj+1 by y(tj) and y(tj+1) in (1.21). When this is done, the two sides of (1.21) will not be equal, so we should consider

Tj+1 = y(tj+1) - φ(tj,h,y(tj)).   (1.22)

The above Tj+1 is the so-called local truncation error. If we know the exact solution y(t), then we can calculate Tj. In reality, we do not know the exact solution, but we can understand how Tj+1 depends on the step size h by studying the Taylor series of Tj+1.
We are interested in the local truncation error because it can be estimated and it gives information on the true error. Therefore, we will try to do a Taylor series for Tj+1 at tj, assuming h is small. In fact, we only need to calculate the first non-zero term of the Taylor series: Tj+1 = Chp+1 +... where the integer p is the order of the method, C is a coefficient that depends on tj, y(tj), y′(tj), f(tj,y(tj)), etc. But C does not depend on the step size h. The above formula for Tj+1 gives us information on how Tj+1 varies with the step size h. Because h is supposed to be small, we notice that a larger p implies that |Tj+1| will be smaller. Therefore, the method will be more accurate if p is larger. We notice that |T1| = e1, because y0 = y(t0), thus y1 = φ(t0,h,y0) = φ(t0,h,y(t0)). However, it is clear that |Tj| = ej for j > 1. When we try to work out the first non-zero term of the Taylor series of Tj+1, we work on the general equation (1.1). This is for the local truncation error at tj+1. But the general case at tj+1 has no real difference with the special case at t1. If we work out the Taylor series for T1, we automatically know the result at Tj+1. The integer p (that is the order of the method) should be the same. In the coefficient C, we just need to replace t0, y(t0), f(t0,y(t0)), ... by tj, y(tj), f(tj,y(tj)), ... Now, let us work out the local truncation error for Euler’s method. The method is yj+1 = yj + hf(tj,yj) = φ(tj,h,yj). Thus, T1 = y(t1)−φ(t0,h,y(t0)) = y(t1)−y1. 11 We have a Taylor expansion for y(t1) at t0: y(t1) = y(t0)+hy′ (t0)+ h2 2 y′′ (t0)+... Notice that y′(t0) = f(t0,y(t0)). Therefore, T1 = h2 2 y′′ (t0)+... The power of h is p+1 for p = 1. Therefore, the Euler’s method is a first order method. We can show that the local truncation error of the general 2nd order Runge-Kutta methods is T1 = h3 4 2 3 −α y′′′ +αy′′ ∂ f ∂y t=t0 +... As an example, we prove the result for the midpoint method (α = 1/2). The local truncation error is T1 = h3 1 24 y′′′ + 1 8 y′′ ∂ f ∂y t=t0 +O(h4 ) Proof: First, since the differential equation is y′ = f(t,y). We use the chain rule and obtain: y′′ = ft + fyy′ = ft + f fy y′′′ = ftt + ftyy′ +[ f]′ fy + f[ fy]′ = ftt + f fty +[ ft + fyy′ ] fy + f[ fty + fyyy′ ] = ftt +2f fty + f2 fyy +[ ft + f fy] fy = ftt +2f fty + f2 fyy +y′′ fy Now for y1 using the midpoint method, we have k1 = f(t0,y0) = y′ (t0) k2 = f(t0 + h 2 ,y0 + h 2 k1) = f(t0 + h 2 ,y0 + h 2 y′ (t0)). Now, we need Taylor expansion for functions of two variables. In general, we have f(t +δ,y+∆) = f(t,y)+δ ft(t,y)+∆ fy(t,y) + δ2 2 ftt(t,y)+δ∆ fty(t,y)+ ∆2 2 fyy(t,y)+... Now, for k2, apply the above Taylor formula and use f to denote f(t0,y0) = y′(t0), we have k2 = f + h 2 ft + h 2 y′ fy + h2 8 ftt + h2y′ 4 fty + h2(y′)2 8 fyy +O(h3 ) = y′ + h 2 y′′ + h2 8 [y′′′ −y′′ fy]+O(h3 ). Here y, f and their derivatives are all evaluated at t0. Notice that y(t0) = y0. Therefore, y1 = y+hk2 = y+hy′ + h2 2 y′′ + h3 8 [y′′′ −y′′ fy]+O(h4 ) Use the Taylor expansion y(t1) = y(t0 +h) = y+hy′ + h2 2 y′′ + h3 6 y′′′ +O(h4 ) and the definition for T1, we have T1 = h3 6 y′′′ − h3 8 [y′′′ −y′′ fy]+O(h4 ) = h3 1 24 y′′′ + 1 8 y′′ fy +O(h4 ). 12 1.4 Embedded Runge-Kutta methods Some differential equations may have solutions that change rapidly in some time intervals and change relatively slowly in other time intervals. As an example, we consider the Van der Pol equation: u′′ +u = µ(1−u2 )u′ , t > 0. 
For µ = 6 and the initial conditions u(0) = 1, u′(0) = 0, we use the midpoint method with h = 0.04 and solve the equation from t = 0 to t = 40. The solution is given in Fig. 1.3. It appears that we should not 0 5 10 15 20 25 30 35 40 −2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 t u(t) Solution of the Van der Pol equation Figure 1.3: A solution of the Van der Pol equation for µ = 6 and u(0) = 1, u′(0) = 0. keep the step size h as a constant. Rather, we should only use a small h when the solution changes with time rapidly. A numerical method that automatically selects the step size in each step is an adaptive method. A class of adaptive method for solving differential equations is the so-called embedded Runge-Kutta methods. An embedded Runge-Kutta method uses two ordinary Runge-Kutta methods for comparing the numerical solutions and selecting the step size. Moreover, the two methods in an embedded method typically share the evaluation of f (we are solving y′ = f(t,y)). Therefore, the required computation effort is minimized. Here is a 3rd order Runge-Kutta method k1 = f(tj,yj) (1.23) k2 = f(tj +h,yj +hk1) (1.24) k3 = f(tj + h 2 ,yj + h 4 (k1 +k2)) (1.25) yj+1 = yj + h 6 (k1 +4k3 +k2) (1.26) 13 The cost of this method is mainly related to the calculation of k1, k2 and k3. That is, three evaluations of f. With the above k1 and k2, we can use the 2nd order Runge-Kutta method (α = 1, the modified Euler’s method) to get a less accurate solution at tj+1: y∗ j+1 = yj + h 2 (k1 +k2). (1.27) Although we are not going to use y∗ j+1 as the numerical solution at tj+1, we can still use y∗ j+1 to compare with the 3rd order solution yj+1. If their difference is too large, we reject the solution and use a smaller stepsize h to repeat the calculation. If their difference is small enough, we will accept yj+1. But we also use this information to suggest a step size for the next step. A user must specify a small number ε (called the error tolerance) to control the error for selecting the step size. The difference between yj+1 and y∗ j+1 is e = ||yj+1 −y∗ j+1|| = h 3 ||k1 −2k3 +k2||. (1.28) Since y may be a vector, we have used a vector norm above. To understand the formula for changing the step size, we consider the first step and the exact solution y(t1) at t1. The local truncation errors give us y(t1)−y∗ 1 = C1h3 +... y(t1)−y1 = C2h4 +... for some C1 and C2 related to the solution at t0, its derivatives, the function f and its partial derivatives at t1. Thus, we have y1 −y∗ 1 = C1h3 +... Therefore, e ≈ ||C1||h3 . (1.29) Although we do not know C1, the above relationship allows us to design a stepsize selection method based on the user specified error tolerance ε. If e ≤ ε, we accept y1, otherwise, we reject y1 and repeat this step. The current step size used for calculating y1 and y∗ 1 is h, how should we choose a new step size? We have enew ≈ ||C1||h3 new < ε. Compare this with (1.29), we have ||C1|| h3 new ||C1|| h3 < ε e or hnew < h ε e 1/3 To satisfy the above inequality, we use h := 0.9h ε e 1/3 (1.30) to reset the stepsize. Now, if e ≤ ε, we accept t1 = t0 +h and y1, but we also use formula (1.30) to reset the stepsize. This gives rise to the possibility to increase the stepsize when the original h is too small (so that e is much smaller than ε). 
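The algorithm and the program erk23.m given below assume, as before, a MATLAB function f.m for the system being solved. For the Van der Pol example that opened this section, a possible f.m is sketched here, writing the second order equation as a first order system with y(1) = u and y(2) = u'; this particular file is only an illustration and is not listed elsewhere in these notes.

function k = f(t, y)
% Van der Pol equation u'' + u = mu*(1-u^2)*u' as a first order system,
% with y(1) = u and y(2) = u'.  Here mu = 6, as in Figure 1.3.
mu = 6;
k = zeros(2,1);
k(1) = y(2);
k(2) = -y(1) + mu*(1 - y(1)^2)*y(2);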
Algorithm: to solve y′ = f(t,y) from t0 to tend with error tolerance ε and initial condition y(t0) = y0, 14 initialize t = t0, y = y0, ε, h (initial step size) while t < tend k1 = f(t,y) k2 = f(t +h,y+hk1) k3 = f(t +h/2,y+ h 4(k1 +k2)) e = h 3||k1 −2k3 +k2|| if e ≤ ε, then y = y+ h 6(k1 +4k3 +k2) t = t +h output t, y end if h = 0.9h(ε/e)1/3 end Notice that the formula for resetting h is outside the “if...end if” loop. That is, whether the calculation is accepted or not, h will always be changed. As an example, we consider y′ = y−ty2 , t > 0, (1.31) with initial condition y(0) = 1. If we use ε = 10−5 and the initial step size h = 0.5, we get k1 = 1, k2 = 0.375, k3 ≈ 0.8286, e ≈ 0.047. Since e > ε, this step is rejected. We have the new step size h ≈ 0.0269 and k2 ≈ 0.9985, k3 ≈ 0.9996, e ≈ 6.4194×10−5 . Thus, the new step is accepted and t1 ≈ 0.0269, y1 ≈ 1.0268. The numerical solution of this differential equation is shown in Fig. 1.4. A MATLAB program (erk23.m) for the embedded Runge-Kutta method is given below. function [tout, yout] = erk23(t0, tfinal, y0, tiny, h0) % This is the embedded Runge-Kutta method using a 3rd order % Runge-Kutta method and a 2nd order Runge-Kutta method. % We are solving y’ = f(t, y), where y is a column vector % of functions of t. % Input: t0, the initial time % tfinal, the final time % y0, a column vector of the initial conditions, i.e., y(t0) = y0. % tiny, the small parameter for error tolerance % h0, initial time step % Output: tout, a row vector for the discrete time steps 15 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 t y(t) Figure 1.4: Numerical solution of (1.31) by embedded Runge-Kutta method. % yout, a matrix for solutions of y at various various time. t = t0; y = y0; h = h0; tout = t0; yout = y0; while t < tfinal k1 = f(t, y); k2 = f(t+h, y + h*k1); k3 = f(t+h/2, y + h*(k1+k2)/4); E = (h/3)*norm(k1-2*k3 +k2); if E <= tiny y = y + (h/6)*(k1 + 4*k3 + k2); t = t + h; tout = [tout, t]; yout = [yout, y]; end h = 0.9 * h * (tiny/E)^(1/3); end This program requires f.m which specifies the differential equation. 16 Chapter 2 ODE IVP: Implicit One-step Methods 2.1 Stiff equations The Euler’s method and the Runge-Kutta methods in previous sections are explicit methods. For the step from tj to tj+1, the numerical solution yj+1 has an explicit formula. In the section on local truncation errors, we write down such a formula by yj+1 = φ(tj,h,yj). It turns out that the explicit methods have some difficulties for some differential equations. These are the so-called stiff differential equations. First, we consider the following example: y′ +sin(t) = −200(y−cos(t)), (2.1) with initial condition y(0) = 0. The exact solution is y(t) = cos(t)−e−200t . As t increases, the exact solution converges to cos(t) rapidly. Let us use the Euler’s method for this equation. The numerical solutions are obtained with h = 0.008 and h = 1/99 in Fig. 2.1. We observe that the numerical solution looks reasonable for large t if h = 0.008. There are large errors at the first a few steps, then the error decrease rapidly. In fact, this is true for h < 0.01. If h = 1/99, we can see that the error oscillates and grows exponentially in t. If we replace the Euler’s method by a higher order Runge-Kutta (explicit) method, we still have similar difficulties. While the above example appears to be a toy problem, we also have more realistic examples. Let us consider the heat equation ut = uxx, 0 < x < L, t > 0. 
(2.2) This is one of the simplest partial differential equations (PDEs) and it will be studied again in Chapter 6. This equation is usually solved with two boundary conditions and one initial condition. For example, we have u(0,t) = a and u(L,t) = b, t ≥ 0, u(x,0) = f(x), 0 < x < L, 17 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0 0.5 1 1.5 2 t y(t) step size h = 0.008 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 −5 0 5 step size h = 1/99 t y(t) Figure 2.1: Comparison of the exact and the numerical solutions of equation (2.1). The numerical solutions are obtained by the Euler’s method where f is a given function of x. We can solve this initial and boundary value problem by separation of variables. We have u(x,t) = u∞(x)+ ∞ ∑ j=1 ˆgj sin jπx L e−( jπ/L)2t , where u∞(x) = a+(b−a) x L , ˆgj = 2 L Z L 0 [ f(x)−u∞(x)]sin jπx L dx. Notice that the solution converges rapidly to the time-independent (steady) solution u∞ as t → ∞. The steady solution is determined by the boundary conditions only and it is a linear function of x. In Chapter 6, we will study a number of numerical methods for this equation. For the moment, we will use a simple method to discretize the variable x and approximate this PDE by a system of ODEs. We discretize x by xi = i∆x for i = 0,1,2,...,m+1 and ∆x = L m+1 , denote ui = u(xi,t) and approximate uxx(xi,t) by uxx(xi,t) ≈ ui−1 −2ui +ui+1 (∆x)2 , 18 then the heat equation is approximated by the following system of ODEs: d dt         u1 u2 ... um−1 um         = 1 (∆x)2        −2 1 1 −2 1 1 ... ... ... −2 1 1 −2                u1 u2 ... um−1 um         + 1 (∆x)2        a 0 ... 0 b        . (2.3) Since only x is discretized, we call the above approximation a semi-discretization. Originally, the heat equation is defined on the two-dimensional domain {(x,t)|0 < x < L,t > 0}, now we are approximating u only on the lines: x = xi for t > 0. We call such a process that turns PDE to a system of ODEs the method of lines. In the following, we let L = 1, a = 1, b = 2 and f(x) = 0 for 0 < x < L, and solve the above system by the 4th order classical Runge-Kutta method. The right hand side of the ODE system is given in f.m: function k=f(t,u) % ODE system for semi-discretized heat equation. % u_t = u_xx, 00, % u=a at x=0, u=b at x=L. global L a b m = length(u); dx = L/(m+1); s = 1/(dx)^2; k(1)=s*(a-2*u(1)+u(2)); k(m)=s*(u(m-1)-2*u(m)+b); k(2:m-1)=s*(u(1:m-2)-2*u(2:m-1)+u(3:m)); The 4th order classical Runge-Kutta method is given in rk4step.m: function y1 = rk4step(h,t0,y0); k1 = f(t0,y0); k2 = f(t0+h/2, y0 + (h/2)*k1); k3 = f(t0+h/2, y0 + (h/2)*k2); k4 = f(t0+h, y0+ h*k3); y1 = y0 + (h/6)*(k1+2*k2+2*k3+k4); The main program is given below. global L a b L=1; a=1; b=2; % discretizing x by m points in (0,L) m = 99; dx = L/(m+1); x = dx*(1:m); % simple initial condition u = 0. 19 u = zeros(1,m); % solve from t=0 to t=0.05 with nsteps tzero = 0; tfinal = 0.05; nsteps = 718; % try 717, 716 h = (tfinal - tzero)/nsteps for j=1:nsteps t = (j-1)*h; u = rk4step(h,t,u); end % draw the solution at t=tfinal plot([0,x,L],[a,u,b]) We have tried two steps h = 0.05/718 ≈ 6.964×10−5 and h = 0.05/716 ≈ 6.983×10−5. The smaller h gives a satisfactory solution, while the larger h gives an incorrect solution with wild oscillations. 
Compared with the grid size ∆x = 0.01, the time step size h = 6.964 × 10−5 appears to be extremely 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −1 −0.5 0 0.5 1 1.5 2 x u(x,0.05) Figure 2.2: Numerical solutions of the heat equation at t = 0.05 by a 4th order Runge-Kutta method with step size h = 0.05/718 and h = 0.05/716. small. The concept of “stiff” differential equation is not rigorously defined. Suppose we have an exact solution y(t) of a differential equation. Fix a time t∗, the differential equation has infinitely many other solutions for t > t∗. If these other solutions converge to y(t) rapidly for t > t∗, then we may say that this differential equation is stiff at t∗. If a nearby solution, say ˜y(t), converges to y(t) rapidly, the derivative of ˜y can be large (in absoluate value). Numerically, this can be difficult to catch. For explicit method, the error can decrease if the step size is sufficiently small. But the error may increase exponentially, if the step size is not small enough. 20 2.2 Implicit one-step methods For stiff differential equations, we need the “implicit” methods. One step implicit methods can be written as yj+1 = φ(tj,h,yj,yj+1). (2.4) Notice that yj+1 is what we want to calculate, but it also appears in the right hand side. To be more precise, the yj+1 in the right hand side only appears inside the function f. Remember that we are trying to solve y′ = f(t,y). The method is called implicit, because we do not have an explicit formula for yj+1. Instead, we have to solve an equation to find yj+1. If the differential equation is complicated, an implicit method can be very difficult to use. When applied to stiff differential equations, the implicit methods behave better. We can use a large step size. Of course, if the step size is large, the numerical solution may be not very accurate. But at least the error is under control. Next, we list some one step implicit methods. Backward Euler’s method: yj+1 = yj +hf(tj+1,yj+1). (2.5) This is a first order implicit method. Notice that yj+1 also appears in the right hand side in f. Trapezoid method: yj+1 = yj + h 2 [ f(tj,yj)+ f(tj+1,yj+1)] (2.6) This is one of the most widely used implicit method. It is a 2nd order method. Implicit midpoint method: yj+1 = yj +hf tj + h 2 , 1 2 (yj +yj+1) (2.7) Again, this is a 2nd order implicit method. Notice that yj+1 also appears in the right hand side. The implicit midpoint method is equavelent to the so-called 2nd order Gauss method: k1 = f tj + h 2 ,yj + h 2 k1 (2.8) yj+1 = yj +hk1. (2.9) This time, k1 is implicitly given in the first equation. If we eliminate k1, the method can still be written as (2.10). Now, let us solve the differential equation (2.1) by the implicit midpoint method. Using the step size h = 0.02, we obtain the numerical results shown as the little circles in Fig. 2.3. Besides the first a few steps, the numerical solutions are pretty accurate. The implicit methods given in this section are one-step method, since yj+1 depends on yj only (does not depend on earlier solutions, such as yj−1). The local truncation error and the order of the method can be defined as before. For the general implicit method (2.10), we first calculate ˜yj+1 by changing yj to the exact solution y(tj), i.e., ˜yj+1 = φ(tj,h,y(tj), ˜yj+1), then the local truncation error is Tj+1 = y(tj+1)− ˜yj+1. 21 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Figure 2.3: Comparision of the exact solution and the numerical solution by the implicit midpoint method with h = 0.02 for (2.1). 
This definition is somewhat complicated, since ˜yj+1 must be solved from an equation. Notice that a one-step implicit method can be written as yj+1 = φ(tj,h,yj,yj+1) = yj +hΦ(tj,yj,yj+1), (2.10) where Φ is related to f. We can approximate ˜yj+1 by ˆyj+1 = φ(tj,h,y(tj),y(tj+1)) = y(tj)+hΦ(tj,y(tj),y(tj+1)). Notice that ˆyj+1 is given explicitly. This gives rise to the modified definition of local truncation error: ˆTj+1 = y(tj+1)− ˆyj+1 = y(tj+1)−φ(tj,h,y(tj),y(tj+1)). It appears that we just need to insert the exact solution into the numerical formula to get the local truncation error. It can be proved that the original and the modified definitions give the same first nonzero term in their Taylor expansions. That is, if we assume the time step h is small and work out the first a few terms of the Taylor expansions, then Tj+1 = Chp+1 +Dhp+2 +... ˆTj+1 = Chp+1 + ˆDhp+2 +... Since we are only interested in the first non-zero term of the local truncation error, we can use ˆTj+1 to replace the original Tj+1. As before, the method has order p, if the first non-zero term of the Taylor series of the local truncation error is proportional to hp+1. For example, the local truncation error of the backward Euler’s method is ˆTj+1 = y(tj+1)−y(tj)−hf(tj+1,y(tj+1)) = y(tj+1)−y(tj)−hy′ (tj+1) = − h2 2 y′′ (tj)+... Therefore, the backward Euler’s method is a first order method. 22 Chapter 3 ODE IVP: Multi-step Methods 3.1 Explicit multi-step methods In Runge-Kutta methods, the solution at tj+1 is based on the solution at tj. That is yj+1 is calculated from yj. In multi-step methods, yj+1 is calculated from the solutions yj, yj−1, etc. Multi-step methods are more difficult to program, but they are more efficient than the Runge-Kutta methods. To present the multi-step methods, we need the following notation: fk = f(tk,yk) for any integer k. The Adams-Bashforth methods are the most widely used explicit multi-step methods: AB2 (2nd order) yj+1 = yj +h 3 2 fj − 1 2 fj−1 (3.1) AB3 (3rd order) yj+1 = yj +h 23 12 fj − 16 12 fj−1 + 5 12 fj−2 . (3.2) AB4 (4th order) yj+1 = yj +h 55 24 fj − 59 24 fj−1 + 37 24 fj−2 − 9 24 fj−3 (3.3) The Adams-Bashforth methods are derived in the following steps. 1. The following formula is exact Z tj+1 tj y′ (t) dt = y(tj+1)−y(tj) = Z tj+1 tj f(t,y(t)) dt 2. Find a polynomial interpolation for f based on the points (tj, fj), (tj−1, fj−1),... 3. Replace f by its polynomial aproximation in step 2, then integrate. 23 The method AB2 is a 2-step method, and it is also a second order method. The concept of order is related to the concept of local truncation error (LTE). If we write an explicit multi-step method as yj+1 = φ(tj,h,yj,yj−1,...) for some function φ related to the differential equation y′ = f(t,y), the LTE is defined as Tj+1 = y(tj+1)−φ(tj,h,y(tj),y(tj−1),...) where y(t) is the exact solution of differential equation. As an example, we consider the method AB2. We have φ(tj,h,y(tj),y(tj−1)) = y(tj)+h 3 2 f(tj,y(tj))− 1 2 f(tj−1,y(tj−1)) = y(tj)+h 3 2 y′ (tj)− 1 2 y′ (tj−1) To find the order, we need the first non-zero term of the Taylor series of the LTE. Since y′ (tj−1) = y′ (tj −h) = y′ (tj)−hy′′ (tj)+ h2 2 y′′′ (tj)+... we obtain φ(tj,h,y(tj),y(tj−1)) = y(tj)+hy′ (tj)+ h2 2 y′′ (tj)− h3 4 y′′′ (tj)+... On the other hand, y(tj+1) = y(tj +h) = y(tj)+hy′ (tj)+ h2 2 y′′ (tj)+ h3 6 y′′′ (tj)+... Therefore, Tj+1 = 5h3 12 y′′′ (tj)+... If the LTE is related to h as Tj+1 = Chp+1 +... then the order of the numerical method is p. For AB2, the order is 2. 
In other words, AB2 is a second order method. Let us consider the method AB4 (which is a fourth order method). To evaluate yj+1, we need fj, fj−1, fj−2 and fj−3. However, we only need to calculate fj in this step, because the other three values of f (i.e. fj−1, fj−2 and fj−3) are already calculated in the previous steps. In comparision, in each step of the 4th order Runge-Kutta method, we need to evaluate f four times (i.e., k1, k2, k3 and k4). For large scale problems, usually the most expensive part of the calculation is to evaluate the function f. Therefore, AB4 is roughly four times faster than a 4th order Runge-Kutta method. A MATLAB program for the method AB4 is given below: % We implement the 4th order Adams-Bashforth’s method here. A % constant step size h is used. The differential equation is y’ = f(t,y), % where f is the name (a string) of the function f(t,y). Notice that y 24 % and f are supposed to be column vectors. % Input: % t0 --- the intial time % y0 --- the initial values (a column vector) % tfinal --- the final time % steps --- the total number of steps. % Output: % t --- a row vector for the discretized time % y --- a matrix for solutions at various time function [t, y] = myab4(t0, y0, tfinal, f, steps) % setup the step size. h = (tfinal - t0)/steps; % setup the vector for output. n = length(y0); t = t0 : h: tfinal; y = zeros(n, steps+1); y(:,1) = y0; % first 3 steps by the classical 4th order Runge-Kutta method. [y(:,2), f1] = myrk4a(f, h, y(:,1), t(1)); [y(:,3), f2] = myrk4a(f, h, y(:,2), t(2)); [y(:,4), f3] = myrk4a(f, h, y(:,3), t(3)); % calculate the remaining steps by AB4 for j=4:steps f4 = feval(f, t(j), y(:,j)); y(:,j+1) = y(:,j) + (h/24)*(-9*f1 + 37*f2-59*f3+55*f4); f1 = f2; f2 = f3; f3 = f4; end % The 4th order classical Runge-Kutta method function [y1, k1] = myrk4a(f, h, y, t) k1 = feval(f, t, y); k2 = feval(f, t+0.5*h, y+0.5*h*k1); k3 = feval(f, t+0.5*h, y+0.5*h*k2); k4 = feval(f, t+h, y+h*k3); y1 = y + (h/6) * (k1 + 2*k2 + 2*k3 + k4); Next, we use AB4 to solve the following Lorenz system: y′ 1 = 10(y2 −y1) 25 y′ 2 = −y1y3 +28y1 −y2 y′ 3 = y1y2 − 8 3 y3. This is implemented in the following MATLAB program lorenz.m: % The Lorenz system function k = lorenz(t,y) k = zeros(3,1); k(1) = 10 * (y(2) - y(1)); k(2) = - y(1)*y(3) + 28*y(1) - y(2); k(3) = y(1) * y(2) - (8/3) * y(3); Now, we solve the Lorenz system from t = 0 to t = 40 with the following main program. % the main program to solve the Lorenz system by AB4. % initial time t0 = 0; % final time tfinal = 40; % initial conditions (column vector): y0 = [-11.3360, -16.0335, 24.4450]’ ; % total number of steps steps = 2000; % call the function myab4 [t, y] = myab4(t0, y0, tfinal, ’lorenz’, steps); The solutions are plotted in Fig. 3.1. The Lorenz system exhibits chaos. If you think about the solution as a trajectory in the 3-D space of y1, y2 and y3, then the trajectory does not approach a fixed point or a closed loop. If the trajectory approaches a fixed point, the solutions (as functions of t) tend to constants. If the trajectory approaches a closed loop, then the solutions become periodic as t → ∞. But the solutions of the Lorenz equation is non-periodic as t → ∞. 3.2 Implicit multi-step methods The Adams-Bashforth methods, like the explicit Runge-Kutta methods, have difficulties for stiff differential equations. Some one-step implicit methods are introduced in Chapter 2. In the following, we develop some implicit multi-step methods. 
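Before turning to the implicit multi-step formulas, it may help to recall what "implicit" means computationally: yj+1 appears inside f, so every step requires solving a (generally nonlinear) equation. The following sketch shows one way this might be done for the backward Euler method (2.5) of Chapter 2, using a simple fixed-point iteration started from the explicit Euler value. The function name beulerstep and the fixed iteration count are assumptions made for this illustration, and for genuinely stiff problems a Newton iteration would normally be used instead.

function y1 = beulerstep(h, t0, y0)
% One step of the backward Euler method (2.5): y1 = y0 + h*f(t0+h, y1).
% The implicit equation for y1 is solved by a simple fixed-point iteration,
% which is assumed to converge; stiff problems normally require Newton's method.
y1 = y0 + h*f(t0, y0);          % initial guess: explicit Euler
for iter = 1 : 20
    y1 = y0 + h*f(t0 + h, y1);  % fixed-point update
end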
The Adams-Moulton methods are implicit multi-steps methods and they are derived similarly as the Adams-Bashforth methods. We start with y(tj+1)−y(tj) = Z tj+1 tj f(t,y(t))dt and use polynomial interpolation for f. For Adams-Moulton methods, we include (tj+1, fj+1) as an interpolation point. If we only use two points: (tj, fj) and (tj+1, fj+1) for approximating f, then we get 26 0 5 10 15 20 25 30 35 40 −20 −10 0 10 20 t y 1 0 5 10 15 20 25 30 35 40 −40 −20 0 20 40 t y 2 0 5 10 15 20 25 30 35 40 0 20 40 60 t y 3 Figure 3.1: Solutions of the Lorenz system by AB4. a single-step implicit method. This is the Trapezoid method given in section 2.2 (which can be regarded as AM1). If f is approximated by its polynomial interpolation using the three points: (tj+1, fj+1), (tj, fj), (tj−1, fj−1) we get the following 2-step Adams-Moulton method (AM2): yj+1 = yj +h 5 12 fj+1 + 8 12 fj − 1 12 fj−1 . (3.4) The above is a 3rd order method. The 3-step Adams-Moulton method (AM3) has a fourth order of accuracy. It is given as follows: yj+1 = yj +h 9 24 fj+1 + 19 24 fj − 5 24 fj−1 + 1 24 fj−2 . The Admas-Moulton methods are useful when they are used together with the Admas-Bashforth methods as the so-called Predictor-Corrector methods. In this a method, an explicit method is used to calculate a solution at tj+1, say ˜yj+1, then it is improved by an implicit method. In the implicit method, 27 fj+1 is replaced by f(tj+1, ˜yj+1). Here is the 3rd order Adams predictor-corrector method ˜yj+1 = yj +h 23 12 fj − 16 12 fj−1 + 5 12 fj−2 (3.5) yj+1 = yj +h 5 12 f(tj+1, ˜yj+1)+ 8 12 fj − 1 12 fj−1 . (3.6) Overall, this is still an explicit method. Notice that the two methods involed in the above predictorcorrector method both are 3rd order and the resulting method is also 3rd order. A class of useful implicit multi-step method is the Backward Differentiation Formulas (BDF). The derivation is as follows: 1. Write down a polynomial Q(t) that interpolates (tj+1,yj+1), (tj,yj), (tj−1,yj−1),... BDF2 is a 2-step method, so Q is based on the above three points. BDF3 is a 3-step method, so (tj−2,yj−2) is also needed. 2. Replace y′ = f(t,y) at tj+1 by Q′ (tj+1) = f(tj+1,yj+1) Consider BDF2, we have Q(t) = yj+1 (t −tj)(t −tj−1) (tj+1 −tj)(tj+1 −tj−1) +yj (t −tj+1)(t −tj−1) (tj −tj+1)(tj −tj−1) +yj−1 (t −tj)(t −tj+1) (tj−1 −tj)(tj−1 −tj+1) Take a derivative and set t = tj+1, we get 3 2h yj+1 − 2 h yj + 1 2h yj−1 = f(tj+1,yj+1) or yj+1 − 4 3 yj + 1 3 yj−1 = 2h 3 f(tj+1,yj+1). The method BDF3 can be similarly derived. We have yj+1 − 18 11 yj + 9 11 yj−1 − 2 11 yj−2 = 6 11 hf(tj+1,yj+1). For an implicit multi-step method, given as yj+1 = φ(tj,h,yj+1,yj,yj−1,...), the local truncation error is defined as Tj+1 = y(tj+1)−φ(tj,h,y(tj+1),y(tj),y(tj−1),...). Going through a Taylor series, we may find Tj+1 = Chp+1 +... then the order of the method is p. 28 Chapter 4 ODE IVP: Stability Concepts 4.1 Zero stability There are many other multi-step methods. Some of them have higher order than the methods in the previous chapter (for the same number of steps). The following explicit multi-step method yj+1 +4yj −5yj−1 = h[4fj +2fj−1] (4.1) is a third order 2-step method. This can be verified by calculating its local truncation error. We have Tj+1 = y(tj+1)+4y(tj)−5y(tj−1)−h[4f(tj,y(tj))+2f(tj−1,y(tj−1))] = y(tj+1)+4y(tj)−5y(tj−1)−h 4y′ (tj)+2y′ (tj−1) . Now, if we insert the Taylor series of y(tj+1), y(tj−1) and y′(tj−1) at t = tj, we obtain Tj+1 = h4 6 y(4) (tj)+... Since the power of h is 4 = 3+1, this is a third order method. 
Notice that the 2-step AB2 is only a second order method. It thus seems that the above method would be more useful than AB2. This is not the case, since it can not solve the simplest differential equation y′ = 0 The solution of the above should be y = const. If the initial condition is y(t0) = y0, then y(t) = y0 for all t. For the 2-step method (4.1), we must assume that y1 ≈ y(t1) is also given. We assume that y1 ≈ y0, but y1 = y0 This can happen, if we have somehow computed y1 with a small error. Then (4.1) is simple yj+1 +4yj −5yj−1 = 0 for j = 1,2,3,.... This linear recurrence relationship can be solved. The general solution is yj = C1λ j 1 +C2λ j 2, 29 where C1 and C2 are constants, λ1 and λ2 are the solutions of λ2 +4λ−5 = 0. Therefore, λ1 = 1, λ2 = −5. Thus the general solution is yj = C1 +C2(−5)j . Now, we can try to determine C1 and C2 from y0 = C1 +C2 y1 = C1 −5C2. We have C1 = 5y0 +y1 6 , C2 = y0 −y1 6 . If y0 = y1, then C2 = 0, thus lim j→∞ |yj| = ∞. Therefore, the error grows exponentially fast and the method (4.1) is useless. Let us write a numerical method for ODE IVPs in the following general form: k ∑ l=0 αlyj+l = αkyj+k +...+α1yj+1 +α0yj = hΦ(yj+k,...,yj+1,yj,tj;h). (4.2) This is a k-step method. The right hand side is related to the function f, i.e., the right hand side of the ODE. In general, the above method is implicit, since yj+k is also in the right hand side. Furthermore, we may require that αk = 1. Notice that we have shifted the subscripts, so that terms like yj−1 do not appear. In fact, method (4.1) is now written as yj+2 +4yj+1 −5yj = h[4fj+1 +2fj]. In any case, we may ask when the general method (4.2) is zero stable. If the method is applied to y′ = 0, then we just have the left hand side: αkyj+k +...+α1yj+1 +α0yj = 0. (4.3) Consider a special solution yj = ζj, we obtain ρ(ζ) = k ∑ l=0 αlζl = 0. For zero-stability, we require that all solutions of the linear recurrence (4.3) must be bounded for all j and for all initial conditions. Therefore, the roots of the polynomial ρ(ζ), i.e. the zeros of the polynomial ρ(ζ) or the solutions of the above equation, must satisfy |ζ| ≤ 1, and |ζ| = 1 only if ζ is a simple root. 30 4.2 Absolute stability For stiff differential equations, it is desirable to have A-stable numerical methods. A numerical method is called A-stable (which means absolutely stable), if when it is applied to y′ = ay, t > 0, y(0) = y0, where a is any complex number with Re(a) < 0, the numerical solution yj → 0 as j → ∞, for any step size h > 0. Notice that the exact solution of the above equation y(t) = y0eat → 0, as t → ∞, since Re(a) < 0. Therefore, the A-stable numerical methods have the correct behavior for large t. An explicit numerical method can never be A-stable. When an explicit method is applied to y′ = ay, the numerical solution converges to zero if h is small enough, otherwise, the numerical solution diverges exponentially. The implicit methods presented in Chapter 2 are all A-stable. When applied to y′ = ay, the backward Euler’s method gives yj+1 = 1 1−ah yj, or yj = 1 1−ah j y0. Since Re(a) < 0, |1−ah| > 1, thus, yj → 0 as j → ∞. For the implicit midpoint method and the Trapzoid method, we get yj+1 = 1+ah/2 1−ah/2 yj. Therefore, yj = 1+ah/2 1−ah/2 j y0. Since Re(a) < 0, we have 1− ah 2 > 1+ ah 2 . Therefore, yj → 0 as j → ∞. Other implicit methods may or may not be A-stable. In fact, the Adams-Moulton methods are not A-stable. Let us consider the third order method AM2. 
If we apply the method to y′ = ay, where a is a complex constant with a negative real part, we get (1−5s)yj+1 −(1+8s)yj +syj−1 = 0, where s = ah/12. The general solution of the linear recurrence relationship is yj = Aλ j 1 +Bλ j 2 where λ1 and λ2 satisfies (1−5s)λ2 −(1+8s)λ+s = 0. 31 If λ1 or λ2 satisfies |λ| > 1 for some s, then |yj| → ∞ as j → ∞ for the given s (thus the given step size h). If that is the case, the method will not be A-stable. This is indeed the case for real a < 0 when s < −1/2 or h > −6/a. Therefore, the method is not A-stable. The 2-step BDF formula (i.e., BDF2) is A-stable, but the k-step BDF formulae for k ≥ 3 are not A-stable. Furthermore, for BDF formulae, we also need to check their zero-stability. It turns out that the BDF formulae are zero-stable only if the number of steps k ≤ 6. Although the BDF formulae are not A-stable for 3 ≤ k ≤ 6, they are A(0)-stable. A numerical method is called A(0)-stable, if when it is applied to y′ = ay for any real and negative a, the numerical solution always satisfies yj → 0 as j → ∞ for any step size h > 0. It is clear that A(0)-stable is a weaker condition than A-stable. If a method is A-stable, then it is certainly A(0)-stable, but the reverse is not true. Notice that the A-stable condition checks the solution for all complex a with a real and negative imaginary part, but it includes a real a as a special case. Actually, we can have more information if we calculate the region of absolute stability of a numerical method. This concept is again related to y′ = ay for complex a, but it is also related to the step size h. As we see from the earlier calculations, the numerical solution for this equation is closely related to z = ah. Therefore, we define the region of absolute stability as a region in the complex z plane, where z = ah. It is defined as those values of z such that the numerical solutions of y′ = ay satisfy yj → 0 as j → ∞ for any initial conditions. For the explicit Runge-Kutta methods in Chapter 1, the Adams-Bashforth methods, the Adams-Moulton methods, and the BDF methods, we show the regions of absolute stability in the extra handout. What about the three implicit methods in Chapter 2? The backward Euler’s method is identified as BDF1, the trapezoid method is identified as AM1, and the region of absolute stability for implicit midpoint method is identical to that of the trapezoid method. With this concept, we realize that a method is A-stable, if its region of absolute stability includes the left half of the complex z-plane, and a method is A(0)-stable, if its region of absolute stability includes the negative half of the real line in the complex z-plane. Furthermore, we can say that one method is more stable than the other method, if the first method has a larger absolute stability region. As a little exercise, we consider the interval of absolute stability on the real axis of z. For y = ay, the 4th order Runge-Kutta method gives yj+1 = 1+z+ z2 2 + z3 6 + z4 24 yj On the real axis z = ah is real, the interval is thus defined as 1+z+ z2 2 + z3 6 + z4 24 < 1. We solve the end points of the interval from 1+z+ z2 2 + z3 6 + z4 24 = ±1. The case of 1 gives z = 0 and z = −2.7853, the case −1 has no real roots. Therefore, the interval on the real axis (of the region of absolute stability) is −2.7853 < z < 0. 
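The end point -2.7853 quoted above is easy to check numerically. The short MATLAB fragment below is a sketch of such a check: it computes the real roots of 1 + z + z^2/2 + z^3/6 + z^4/24 = 1 with the built-in roots function and evaluates the growth factor just inside and just outside the interval.

% Check of the stability interval end point for the classical 4th order
% Runge-Kutta method: solve 1 + z + z^2/2 + z^3/6 + z^4/24 = 1 for real z.
r = roots([1/24, 1/6, 1/2, 1, 0]);    % roots of z + z^2/2 + z^3/6 + z^4/24
r = r(abs(imag(r)) < 1e-12)           % real roots: 0 and approximately -2.7853
g = @(z) 1 + z + z.^2/2 + z.^3/6 + z.^4/24;
abs(g(-2.7))                          % about 0.88, inside the interval
abs(g(-2.8))                          % about 1.02, outside the interval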
While our numerical methods are designed for the general first order system y′ = f(t,y) where y is in general a vector, we only considered the absolute stability concept for y′ = ay where y is a scalar and 32 a is a constant. Therefore, it is natural to ask whether this concept is relevant. First, we consider the linear equations: y′ = Ay where A is a square matrix and it is t-independent. In that case, the matrix A has eigenvalues λ1, λ2, ..., the corresponding right eigenvectors p1, p2, ..., and left eigenvectors wT 1 , wT 2 , ..., where T denotes the transpose operation. That is Apj = λj pj, wT j A = λjwT j , j = 1,2,... As y is a column vector of functions, we can multiply the row vector wT j and obtain wT j y′ = wT j Ay = λjwT j y, If we define a scalar function gj = wT j y, then g′ j = λjgj. This equation has the same form as the simple equation y′ = ay studied earlier. If we assume Re(λj) < 0 for all j, then the analytic solution satisfies y → 0 as t → ∞. In order to have numerical solutions that converge to zero, we must make sure that λjh, for all j, are in the region of absolute stability. This type of argument goes though for the linear system of ODEs with an inhomogeneous term: y′ = Ay+b, where b is a vector. This is exactly the semi-discretized form (2.3) of heat equation discussed in section 2.1. At that time, we did not explain why the method is stable for step size h = 0.05/718 and unstable for h = 0.05/716. We can explain this, if we calculate the eigenvalues of coefficient matrix in (2.3) and then consider the region of absolute stability of the 4th order Runge-Kutta method. Actually, since the eigenvalues are real, we only need to consider the intersection of the absolute stability region with the real axis. If the eigenvalues are λj (all real and negative), then the numerical method is stable if the step size h satisfies |λj|h < 2.7853. It turns out that the eigenvalues of the coefficient matrix in (2.3) are λj = − 4 (∆x)2 sin2 jπ 2(m+1) , j = 1,2,...,m. The one with the largest absolute value is λm. For m = 99, ∆x = 0.01, we have λm ≈ −39990.13. Therefore, we need −39990.13h < 2.7853 or h < 6.964968×10−5. This is satisfied for h = 0.05/718 but not h = 0.05/717 or h = 0.05/716. For the more general system y′ = f(t,y), the absolute stability concept is useful if we think of approximating f(t,y) at any fixed time tj by a linear system of ODEs using Taylor expansion. But the approximate linear system changes as tj changes. 33 Chapter 5 ODE Boundary Value Problems 5.1 The shooting method Consider a 2nd order ordinary differential equation with two boundary conditions y′′ = f(x,y,y′ ), a < x < b y(a) = α y(b) = β, where a, b, α, β are given constants, y is the unknown function of x, f is a given function that specifies the differential equation. This is a two-point boundary value problem. An initial value problem (IVP) would require that the two conditions be given at the same value of x. For example, y(a) = α and y′(a) = γ. Because the two separate boundary conditions, the above two-point boundary value problem (BVP) is more difficult to solve. The basic idea of “shooting method” is to replace the above BVP by an IVP. But of course, we do not know the derivative of y at x = a. But we can guess and then further improve the guess iteratively. More precisely, we treat y′(a) as the unknown, and use secant method or Newton’s method (or other methods for solving nonlinear equations) to determine y′(a). 
We introduce a function u, which is a function of x, but it also depends on a parameter t. Namely, u = u(x;t). We use u′ and u′′ to denote the partial derivative of u, with respect to x. We want u to be exactly y, if t is properly chosen. But u is defined for any t, by u′′ = f(x,u,u′ ) u(a;t) = α u′ (a;t) = t. If you choose some t, you can then solve the above IVP of u. In general u is not the same as y, since u′(a) = t = y′(a). But if t is y′(a), then u is y. Since we do not know y′(a), we determine it from the boundary condition at x = b. Namely, we solve t from: φ(t) = u(b;t)−β = 0. 34 If a solution t is found such that φ(t) = 0, that means u(b;t) = β. Therefore, u satisfies the same two boundary conditions at x = a and x = b, as y. In other words, u = y. Thus, the solution t of φ(t) = 0 must be t = y′(a). If we can solve the IVP of u (for arbitrary t) analytically, we can write down a formula for φ(t) = u(b;t) − β. Of course, this is not possible in general. However, without an analytic formula, we can still solve φ(t) = 0 numerically. For any t, a numerical method for IVP of u can be used to find an approximate value of u(b;t) (thus φ(t)). The simplest method is to use the secant method. tj+1 = tj − tj −tj−1 φ(tj)−φ(tj−1) φ(tj), j = 1,2,3,... For that purpose, we need two initial guesses: t0 and t1. We can also use Newton’s method: tj+1 = tj − φ(tj) φ′(tj) , j = 0,1,2,... We need a method to calculate the derivative φ(t). Since φ(t) = u(b;t)−β, we have φ′ (t) = ∂u ∂t (b;t)−0 = ∂u ∂t (b;t). If we define v(x;t) = ∂u/∂t, we have the following IVP for v: v′′ = fu(x,u,u′ ) v+ fu′(x,u,u′ ) v′ v(a;t) = 0 v′ (a;t) = 1. Here v′ and v′′ are the first and 2nd order partial derivatives of v, with respect to x. The above set of equations are obtained from taking partial derivative with respect to x for the system for u. The chain rule is used to obtain the differential equation of v. Now, we have φ′(t) = v(b;t). Here is the algorithm for the shooting method which involves Newton’s method for solving φ(t) = 0: t0 = initial guess for y′(a). for j = 0,1,2,... solve the following system numerically from x = a to x = b u′′ = f(x,u,u′) u|x=a = α u′|x=a = tj v′′ = fu(x,u,u′)v+ fu′ (x,u,u′)v′ v|x=a = 0 v′|x=a = 1. set tj+1 = tj − u|x=b−β v|x=b . If we want to use the methods developed in the previous chapter to solve the above system of two 2nd order equations for u and v, we need to introduce a vector z = (u,u′,v,v′)T and write the differential equation as z′ = F(x,z) for some vector F. The initial condition is z(a) = (α,tj,0,1)T . 35 The shooting method is also applicable to eigenvalue problem: y′′ = f(x,y,y′ ,λ), a < x < b, y(a) = 0, y(b) = 0, where f satisfies the condition f(x,0,0,λ) = 0 and more generally, f is homogeneous in y, i.e., f(x,cy,cy′ ,λ) = c f(x,y,y′ ,λ). for any constant c. Notice that y = 0 is always a solution of the above boundary value problem. In fact, an eigenvalue problem is a special boundary value problem satisfying (1) y = 0 is always a solution, (2) there is a parameter called λ in the equation (or boundary condition). The eigenvalue problem is to determine non-zero solutions which exist only for special values of λ. The solutions of eigenvalue problems are the pairs {λ,y(x)}, where λ is the eigenvalue and y is the eigenfunction. Usually, there are many (may be infinite) eigenvalues and eigenfunctions. To use the shooting method, we consider the initial value problem u′′ = f(x,u,u′ ,λ), x > a, u(a;λ) = 0, u′ (a;λ) = 1, where λ is considered as a parameter. 
Since the solution depends on λ, we use the notation u = u(x;λ), but u′ and u′′ represent the first and second order derivatives with respect to x. For any given λ, we can solve the above initial value problem. Now suppose λ satisfies the condition φ(λ) = u(b;λ) = 0, then y(x) = u(x,λ) is the eigenfunction we are looking for, and λ is the corresponding eigenvalue. Therefore, we just have to use secant or Newton’s method to solve λ from the equation φ(λ) = 0. If a secant method is used, we just have to solve initial value problems for different iterates of λ. If Newton’s method is used, we must evaluate φ′(λ) for given λ. Therefore, we need v(x;λ) = ∂u ∂λ (x;λ). We need an initial value problem for v. This can be obtained by taking partial derivative with respect to λ for the initial value problem of u. We have v′′ = fu(x,u,u′ ,λ)v+ fu′ (x,u,u′ ,λ)v′ + fλ(x,u,u′ ,λ), x > a, v(a;λ) = 0, v′ (a;λ) = 0. Notice that we have been using the chain rule (of Calculus) to get the equation for v. Now, you can solve the initial value problem for v (together with u), then evaluate φ′(λ) for any given λ. 36 5.2 Finite difference methods The basic idea of “finite difference method” is to replace the derivatives in a differential equation by “difference approximations”. To approximate the derivative f′(x0), we can use the left side of the following equations: 1. Forward difference: f(x0 +h)− f(x0) h = f′ (x0)+ h 2 f′′ (x0)+... 2. Backward difference: f(x0)− f(x0 −h) h = f′ (x0)− h 2 f′′ (x0)+... 3. Central difference: f(x0 +h)− f(x0 −h) 2h = f′ (x0)+ h2 6 f′′′ (x0)+... 4. Central difference using half step: f(x0 +0.5h)− f(x0 −0.5h) h = f′ (x0)+ h2 24 f′′′ (x0)+... 5. Three-point formulas: − f(x0 +2h)+4f(x0 +h)−3f(x0) 2h = f′ (x0)− h2 3 f′′′ (x0)+... f(x0 −2h)−4f(x0 −h)+3f(x0) 2h = f′ (x0)− h2 3 f′′′ (x0)+... For the second order derivative, we have: f(x0 +h)−2f(x0)+ f(x0 −h) h2 = f′′ (x0)+ h2 12 f(4) (x0)+... We consider the following two-point BVP of a linear 2nd order ODE: y′′ + p(x)y′ +q(x)y = r(x), a < x < b y(a) = α y(b) = β. Let x0 = a, xj = x0 + jh, and xn+1 = b, we obtain h = b−a n+1 . We are looking for yj for j = 1,2,...,n, where yj ≈ y(xj). 37 We also let y0 = y(x0) = y(a) = α and yn+1 = y(xn+1) = y(b) = β. Thus, y0 and yn+1 are known. The derivatives at xj can be approximated by y′ (xj) ≈ y(xj+1)−y(xj−1) 2h y′′ (xj) ≈ y(xj−1)−2y(xj)+y(xj+1) h2 . These are the central difference approximations. Therefore, the 2nd order differential equation is discretized by yj−1 −2yj +yj+1 h2 + p(xj) yj+1 −yj−1 2h +q(xj)yj = r(xj) for j = 1,2,...,n. This can be written as 1− h 2 p(xj) yj−1 +[−2+h2 q(xj)]yj + 1+ h 2 p(xj) yj+1 = h2 r(xj). We define aj = −2+h2 q(xj) bj = 1− p(xj)h/2 cj = 1+ p(xj)h/2 dj = h2 r(xj) and obtain         a1 c1 b2 a2 c2 b3 ... ... ... ... cn−1 bn an                 y1 y2 y3 ... yn         =         d1 −b1y(a) d2 d3 ... dn −cny(b)         . If we have the boundary condition y′ (b) = β then we let x0 = a, xj = x0 + jh, and xn = b. Thus h = b−a n . While y0 = y(x0) = y(a) = α is known, yn = y(xn) = y(b) is not known. Therefore, we have the n unknowns: y1,y2,...,yn, where yj ≈ y(xj). Imagine that there is xn+1 = b+ h and yn+1 ≈ y(xn+1), then we can write down the approximation of the differential equation at x = b as bnyn−1 +anyn +cnyn+1 = dn 38 where aj, bj, cj and dj are defined by the same set of formulas (even though xj here is different). The boundary condition y′(b) = β can be discretized as yn+1 −yn−1 2h = y′ (b) = β. 
We can then solve yn+1 and subsititute to the equation obtained earlier. This leads to (bn +cn)yn−1 +anyn = dn −2βhcn Therefore, with this new boundary condition at x = b and the new definition of h and xj, we have         a1 c1 b2 a2 c2 b3 ... ... ... ... cn−1 bn +cn an                 y1 y2 y3 ... yn         =         d1 −b1y(a) d2 d3 ... dn −2hcny′(b)         . 5.3 The finite element method Consider the following linear second order ODE boundary value problem: u′′ + p(x)u′ +q(x)u = r(x) for a < x < b u(a) = α, u(b) = β, where a, b, α and β are given constants, p, q and r are given functions. We start with the discretization: a = x0 < x1 < ... < xn−1 < xn < xn+1 = b, then try to find numerical solutions on the grid: uj ≈ u(xj) for j = 0,1,...,n,n+1. From the boundary conditions, we have u0 = α, un+1 = β. Thus the unknowns are u1, u2, ..., un. The finite element method provides us an approach for computing these n unknowns. The finite element method relies on an integral relation derived from the differential equation. Let ϕ be a differentiable function of x satisfying ϕ(a) = ϕ(b) = 0, we can multiply ϕ to the differential equation of u and integrate. That is Z b a ϕ(x)[u′′ + p(x)u′ +q(x)u−r(x)] dx = 0. If we use integration by parts for the term involving u′′, we obtain Z b a −ϕ′ u′ + p(x)ϕu′ +q(x)ϕu−r(x)ϕ dx = 0. (5.1) Now, we consider the basis function φj(x) defined as the continuous piecewise linear function satisfying φj(xj) = 1, φj(xk) = 0 if k = j. 39 More precisely, we have φj(x) =    (x−xj−1)/(xj −xj−1), xj−1 < x ≤ xj, (xj+1 −x)/(xj+1 −xj), xj < x < xj+1, 0, otherwise. The derivative of this function is piecewise constant. We have φ′ j(x) =    1/(xj −xj−1), xj−1 < x < xj, −1/(xj+1 −xj), xj < x < xj+1, 0, otherwise. The piecewise linear function obtained by connecting (xj,uj) by line segments is u(n) (x) = n+1 ∑ j=0 ujφj(x) (5.2) Obviously, u(n) is an approximation for u(x). If we plug u(n) into the differential equation, we will not get an exact identity. In fact, u(n) does not even have derivative at the the grid points. In the finite element method, we replace u in (5.1) by u(n) and replace ϕ in (5.1) by φk, for k = 1,2,...,n. This gives rise to n equations for the n unknowns u1, u2, ..., un. These equations can be written as n+1 ∑ j=0 ak juj = bk where ak j = Z b a −φ′ k(x)φ′ j(x)+ p(x)φk(x)φ′ j(x)+q(x)φk(x)φj(x) dx, bk = Z b a φk(x)r(x) dx. for k = 1,2,...,n. If | j − k| > 1, we observe that φk and φj are non-zero only on intervals that do not overlap. This leads to ak j = 0 if | j −k| > 1. Therefore, we have      a11 a12 a21 a22 ... ... ... an−1,n an,n−1 ann           u1 u2 ... un      =         b1 −a10u0 b2 ... bn−1 bn −an,n+1un+1         . This is a tridiagonal system that can be solved in O(n) operations by a special version of Gaussian elimination with partial pivoting. The formula for ak j can be further simplified if we integrate as much as possible and use approximations for p(x) and q(x) when necessary. We have akk ≈ − 1 h − 1 H + 1 2 p(xk−1/2)− p(xk+1/2) + 1 3 h q(xk−1/2)+H q(xk+1/2) ak,k−1 ≈ 1 h − 1 2 p(xk−1/2)+ h 6 q(xk−1/2) ak,k+1 ≈ 1 H + 1 2 p(xk+1/2)+ H 6 q(xk+1/2) bk ≈ 1 2 h r(xk−1/2)+H r(xk+1/2) 40 where h = xk −xk−1 H = xk+1 −xk xk±1/2 = 1 2 (xk +xk±1) The above formulae are exact when p and q are constants. For the more general p and q, we have used their midpoint values on each interval. 
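As an illustration, the following MATLAB sketch assembles the tridiagonal finite element system from the approximate formulas above on a possibly non-uniform grid and solves it. The functions p, q, r and the boundary values are chosen only for illustration, and MATLAB's backslash is used in place of a hand-written O(n) tridiagonal solver.

% Finite element method for u'' + p u' + q u = r, u(a)=alpha, u(b)=beta,
% using the approximate element formulas above (midpoint values of p, q, r).
% The functions p, q, r below are only illustrative.
p = @(x) 0*x;  q = @(x) -ones(size(x));  r = @(x) -ones(size(x));
a = 0; b = 1; alpha = 0; beta = 0;
n = 99;  x = linspace(a, b, n+2)';     % grid a = x0 < x1 < ... < x_{n+1} = b
A = zeros(n,n);  rhs = zeros(n,1);
for k = 1:n                            % x(k+1) in MATLAB corresponds to x_k in the text
    h  = x(k+1) - x(k);                % h = x_k - x_{k-1}
    H  = x(k+2) - x(k+1);              % H = x_{k+1} - x_k
    xm = (x(k)   + x(k+1))/2;          % x_{k-1/2}
    xp = (x(k+1) + x(k+2))/2;          % x_{k+1/2}
    akk   = -1/h - 1/H + (p(xm) - p(xp))/2 + (h*q(xm) + H*q(xp))/3;
    akkm1 =  1/h - p(xm)/2 + h*q(xm)/6;       % a_{k,k-1}
    akkp1 =  1/H + p(xp)/2 + H*q(xp)/6;       % a_{k,k+1}
    A(k,k) = akk;  rhs(k) = (h*r(xm) + H*r(xp))/2;
    if k > 1, A(k,k-1) = akkm1; else, rhs(k) = rhs(k) - akkm1*alpha; end
    if k < n, A(k,k+1) = akkp1; else, rhs(k) = rhs(k) - akkp1*beta;  end
end
u = A \ rhs;                           % interior values u_1, ..., u_n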
Furthermore, if p(x) = 0, the resulting tridiagonal coefficient matrix is symmetric. 41 Chapter 6 Finite Difference Methods for Parabolic PDEs 6.1 Introduction For scientitic and engineering applications, it is often necessary to solve partial differential equations. Most partial differential equations for practical problems cannot be solved analytically. Therefore, numerical methods for partial differential equations are extremely important. In this chapter, we study numerical methods for simple parabolic partial differential equations. The simplest parabolic partial differential equation (PDE) is ut = auxx, (6.1) where a is a positive constant. Often, this is called the heat equation, when u represents the temperature of a thin rod. Here, x is the spatial variable along the axis of the rod. We assume that the cross section of the rod is very small and the temperature in the cross section is constant. Then, u is only a function of x and time t. Equation (6.1) is also called the diffusion equation. In this case, we consider a thin tube with water and ink inside. The variable u then represents the density of ink in the tube. As the tube is assumed to have a very small cross section, u is assumed to depend only on x and t. Because of this interprtetation, the coefficient a is called the diffusion coefficient. Equation (6.1) must be solved with some boundary conditions and an initial condition. Assume that the rod is given by 0 < x < L (the length of the rod is L), we solve (6.1) with the following two boundary conditions: u(0,t) = α, u(L,t) = β, (6.2) and the following initial condition: u(x,0) = g(x). (6.3) Here, α and β are given constants, g is a given function of x. As t → ∞, the temperature setttles down to a time independent (i.e., steady) solution: lim t→∞ u(x,t) = u∞(x) = α+ x L (β−α). 42 The above solution gives a linear profile that changes from u = α at one end of the rod to u = β at the other end of the rod. For t < ∞, we have the following time dependent solution: u(x,t) = u∞(x)+ ∞ ∑ k=1 cke−a(kπ/L)2t sin kπx L , where the coefficients {ck} can be determined from the intial condition u(x,0) = g(x). If the rod is not uniform in the x direction (different part of the rod may be made from differential materials), the coefficient a is no longer a constant. Therefore, we consider the following general parabolic equation: ut = a(x)uxx +b(x)ux +c(x)u+d(x), 0 < x < L. (6.4) Here, a, b, c and d are given functions of x, the term d(x) corresponds to some heat source in the rod. We can solve the above equation with the initial condition (6.3), the boundary conditions (6.2) or the following boundary conditions: ux(0,t) = e0u(0,t)+ f0, ux(L,t) = e1u(L,t)+ f1, (6.5) where e0, f0, e1 and f1 are given constants. 6.2 Classical explicit method We consider equation (6.4) with the boundary conditions (6.2) and the initial condition (6.3). First, we discretize x and t by xj = j∆x, ∆x = L n+1 , tk = k∆t for some integer n and some ∆t > 0. We will use the notation uk j to represent the numerical solution. That is, uk j ≈ u(xj,tk). From the initial condition (6.3), we have u0 j = g(xj), j = 0,1,2,...,n+1. From the boundary conditions (6.2), we obtain uk 0 = α, uk n+1 = β. Our objective is to find uk j for k > 0 and for j = 1,2,...,n. For the derivatives in (6.4), we have the following difference approximantions: ut(xj,tk) ≈ u(xj,tk+1)−u(xj,tk) ∆t , ux(xj,tk) ≈ u(xj+1,tk)−u(xj−1,tk) 2∆x , uxx(xj,tk) ≈ u(xj+1,tk)−2u(xj,tk)+u(xj−1,tk) (∆x)2 . 
43 Notice that for the time derivative, we only use the first order forward difference formula. If we insert these difference formulas into the differential equation (6.4) and replace u(xj,tk) by uk j, etc, we obtain: 1 ∆t (uk+1 j −uk j) = aj (∆x)2 (uk j+1 −2uk j +uk j−1)+ bj 2∆x (uk j+1 −uk j−1)+cjuk j +dj (6.6) Here, aj = a(xj), bj = b(xj), cj = c(xj) and dj = d(xj). The above is an explicit formula for uk+1 j . Numerical implementation of the above is very simple. We need a loop in k, for k = 0,1,2,.... Inside this loop, we need another loop for j. It is for j = 1,2,...,n. Since we have used the first order forward difference formula to approximate ut, we have an O(∆t) error from the discretization of t. We have used the second order central difference formulas for ux and uxx, thus we have an O((∆x)2) error from the discretization of x. If we keep ∆t = O((∆x)2), then the errors from the discretizations of t and x are roughly the same magnitude. This suggests that the time step should be small. However, there are more serious reasons. Notice that the forward difference approximation corresponds to Euler’s method for ODE. It is not suitable for stiff differential equations. Here, we have a PDE, but if we discretize x first by the central difference formulas (and keep the original continuous t), we obtain a system of ODEs. If we then discretize the system by Euler’s method, we get exactly the same method as (6.6). It turns out that our system of ODEs (obtained with a discretization in x only) is a stiff system. Thus, if ∆t is not small, the error will grow exponentially. To avoid the use of very small ∆t, we need an implicit method. 6.3 Crank-Nicolson method If we define the half time step: tk+1/2 = tk + ∆t 2 , then u(xj,tk+1)−u(xj,tk) ∆t ≈ ut(xj,tk+1/2) is a second order formula. To discretize (6.4), we use the same difference formulas for the x-derivatives, but also do an average between the tk and tk+1 time levels. More precisely, we have 1 ∆t (uk+1 j −uk j) = aj 2(∆x)2 (uk j+1 −2uk j +uk j−1 +uk+1 j+1 −2uk+1 j +uk+1 j−1) + bj 4∆x (uk j+1 −uk j−1 +uk+1 j+1 −uk+1 j−1)+ cj 2 (uk j +uk+1 j )+dj. (6.7) For boundary conditions (6.2), we have uk 0 = α, uk n+1 = β for all k. We can then re-write the above numerical method as A      uk+1 1 uk+1 2 ... uk+1 n      = B      uk 1 uk 2 ... uk n      + p, (6.8) 44 where A and B are tridiagonal matrices given by aj j = 1+ aj∆t (∆x)2 − cj∆t 2 , (6.9) aj,j−1 = − aj∆t 2(∆x)2 + bj∆t 4∆x , (6.10) aj,j+1 = − aj∆t 2(∆x)2 − bj∆t 4∆x , (6.11) bj j = 1− aj∆t (∆x)2 + cj∆t 2 = 2−aj j, (6.12) bj,j−1 = aj∆t 2(∆x)2 − bj∆t 4∆x = −aj,j−1, (6.13) bj,j+1 = aj∆t 2(∆x)2 + bj∆t 4∆x = −aj,j+1 (6.14) and p is the following vector p =         d1∆t −a10uk+1 0 +b10uk 0 d2∆t ... dn−1∆t dn∆t −an,n+1uk+1 n+1 +bn,n+1uk n+1         =         d1∆t −2a10α d2∆t ... dn−1∆t dn∆t −2an,n+1β         . Since the matrices A and B are tridiagonal, we have ajk = 0, bjk = 0, if | j −k| ≥ 2. Therefore, for each step, we need to solve a linear system with a tridiagonal matrix. This can be done efficiently in O(n) operations. The Crank-Nicolson method corresponds to the “implicit midpoint” method for ODE IVP. If we discretize the x variable only for (6.4), we obtain a system of ODEs. If we then apply the implicit midpoint method, we get the Crank-Nicolson method. 6.4 Stability analysis The stability analysis is to find out for what values of ∆t and ∆x, the numerical method is stable. 
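Before carrying out the stability analysis, here is a minimal MATLAB sketch of the Crank-Nicolson method (6.7)-(6.8) with the Dirichlet boundary conditions (6.2). The coefficient functions, initial data and step sizes below are made up for illustration, and full matrices with backslash are used instead of a dedicated O(n) tridiagonal solver.

% Crank-Nicolson method (6.7)-(6.8) for u_t = a(x) u_xx + b(x) u_x + c(x) u + d(x)
% with u(0,t) = alpha, u(L,t) = beta and u(x,0) = g(x).  Illustrative data only.
afun = @(x) 1 + 0*x;  bfun = @(x) 0*x;  cfun = @(x) 0*x;  dfun = @(x) 0*x;
g = @(x) 1 - 2*abs(x - 0.5);  L = 1;  alpha = 0;  beta = 0;
n = 99;  dx = L/(n+1);  x = (1:n)'*dx;       % interior grid points
dt = 0.001;  nsteps = 500;
aj = afun(x); bj = bfun(x); cj = cfun(x); dj = dfun(x);
% tridiagonal matrices A and B of (6.9)-(6.14)
main  = 1 + aj*dt/dx^2 - cj*dt/2;            % a_{jj}
lower = -aj*dt/(2*dx^2) + bj*dt/(4*dx);      % a_{j,j-1}
upper = -aj*dt/(2*dx^2) - bj*dt/(4*dx);      % a_{j,j+1}
A = diag(main) + diag(lower(2:n),-1) + diag(upper(1:n-1),1);
B = 2*eye(n) - A;                            % since B = 2I - A entry by entry
pvec = dj*dt;
pvec(1) = pvec(1) - 2*lower(1)*alpha;        % boundary contributions
pvec(n) = pvec(n) - 2*upper(n)*beta;
u = g(x);                                    % initial values at the interior points
for k = 1:nsteps
    u = A \ (B*u + pvec);                    % one Crank-Nicolson step (6.8)
end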
Let us consider the simple constant coefficient heat equation ut = auxx, where a is a positive constant. Let x and t be discretized as xj = j∆x, tk = k∆t for integers j and k. Let uk j be a numerical solution for u(xj,tk). That is uk j ≈ u(xj,tk). 45 The classical explicit method described in section 3.1.1 gives 1 ∆t (uk+1 j −uk j) = a (∆x)2 (uk j+1 −2uk j +uk j−1). (6.15) To understand the stability of this method, we consider special solutions of the following form uk j = ρk eiβj∆x , (6.16) where β is an arbitrary constant, ρ is to be determined. If |ρ| > 1, then the solution grows exponentially in time, so the numerical method is unstable. Otherwise, it is stable. The purpose of stability analysis is to find out the values of ∆t and ∆x such that |ρ| ≤ 1. If we insert the (6.16) into (6.15), we can solve ρ in terms of β, ∆x and ∆t. We have 1 ∆t (ρ−1) = a (∆x)2 (ei˜β −2+e−i˜β ), where ˜β = β∆x. This can be simplified to ρ = 1−4 a∆t (∆x)2 sin2 ˜β 2 . We can see that ρ ≤ 1 for any real β. However, it is possible to have ρ < −1. For stability, we require that |ρ| ≤ 1 for all choice of β. This is guaranteed if 4 a∆t (∆x)2 ≤ 2. This gives rise to ∆t ≤ (∆x)2 2a . This is the condition on ∆t for stability of the numerical method. If the above is not valid, then, we can find a β, such that ρ < −1. In that case, the numerical method becomes unstable. Since there is a condition on ∆t for the method to be stable, we call such a method conditionally stable. The following MATLAB program illustrates the stability and instability for ∆t = 1/20000 and ∆t = 1/19997 respectively. In this example, we have a = 1 and ∆x = 0.01. % we will solve the heat equation u_t = u_{xx} for 0 < x < 1, with % zero boundary conditions at x=0 and x=1 and the initial condition: % u(x, 0) = 1- 2 |x - 0.5|. % we choose dx = 0.01 dx = 0.01; x = 0: dx: 1; m = length(x); u = 1 - 2 * abs(x - 0.5); u(1) = 0; u(m) = 0; 46 % we solve up to t = 1. steps = 19997; % unstable % steps = 20000; % stable dt = 1/steps; s = dt/(dx*dx); for k= 0 : steps-1 b = u(1); for j=2:m-1 a = u(j); u(j) = a + s*(u(j+1)-2*a+ b); b = a; end end plot(x, u) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 1 2 3 4 5 x 10 −5 ∆ t = 1/20000 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 1 2 3 4 5 x 10 −5 ∆ t = 1/19997 Figure 6.1: Stability of the classical explicit method for the heat equation. Next, we perform a stability analysis for the Crank-Nicolson method. For ut = auxx, we have the 47 following discrete formula: 1 ∆t (uk+1 j −uk j) = a 2(∆x)2 (uk j+1 −2uk j +uk j−1 +uk+1 j+1 −2uk+1 j +uk+1 j−1). (6.17) For the special solution uk j = ρkeiβj∆x, we have 1 ∆t (ρ−1) = a 2(∆x)2 [ei˜β −2+e−i˜β +ρ(ei˜β −2+e−i˜β )] This is reduced to ρ = 1−s 1+s , s = 2a∆t (∆x)2 sin2 ˜β 2 , where ˜β = β∆x. Clearly, s ≥ 0, therefore, |ρ| ≤ 1 for all choice of β, ∆t and ∆x. Therefore, Crank-Nicolson is always stable. Since there is no condition for stability, we call such a method unconditionally stable. 6.5 Alternating direction implicit method In this section, we consider the heat equation with two spatial varriables: ut = a(uxx +uyy), (x,y) ∈ Ω, t > 0. (6.18) The initial condition is u(x,y,0) = f(x,y), (x,y) ∈ Ω. (6.19) We assume that u satisfy the so-called Dirichlet boundary condition. u(x,y,t) = g(x,y), (x,y) ∈ ∂Ω. (6.20) That is, u is given on ∂Ω — the boundary of Ω. We will consider the case where Ω is the square: Ω = {(x,y) | 0 < x < L, 0 < y < L}. We can discretize x and y by xi = ih, yj = jh, h = L n+1 , i, j = 0,1,...,n,n+1 and discretize t by tk = k∆t for k = 0,1,2,.... 
As in the previous section, we have a classical explicit method that uses forward difference approximation for ut and central difference approximation for uxx and uyy. This gives rise to uk+1 ij −uk ij ∆t = a h2 uk i−1,j +uk i+1,j +uk i,j−1 +uk i,j+1 −4uk ij , (6.21) where uk ij ≈ u(xi,yj,tk), etc. The method is very easy to use. Given the numerical solutions at time level tk, we can simply evaluate uk+1 ij for i = 1,2,...,n and j = 1,2,...,n to obtain the numerical solutions at time level tk+1. However, this method is only stable when ∆t satisfies ∆t ≤ h2 4a . 48 The method is unstable if ∆t > h2/(4a). In any event, since there is a condition on ∆t for stability, we say that this method is “conditionally” stable. This stability condition is derived for constant coefficient a and the boundary condition is ignored. Notice that ∆t must be related to the square of h for stability. Therefore, we have to use a very small time step. Similar to the previous section, we also have the Crank-Nicolson method: uk+1 ij −uk ij ∆t = a 2h2 uk i−1,j +uk i+1,j +uk i,j−1 +uk i,j+1 −4uk ij +uk+1 i−1,j +uk+1 i+1,j +uk+1 i,j−1 +uk+1 i,j+1 −4uk+1 ij . This is an implicit method. Given the solution at tk, we must solve n2 unknowns: uk+1 ij for i = 1,2,...,n and j = 1,2,...,n. If we put all these n2 unknowns into one long vector of length n2, we have a linear system of equations with a coefficient matrix of size n2 ×n2. It is not so easy to solve this large system efficiently. This is particularly the case when a is repplaced by a(x,y) or aij = a(xi,yj) in the discrtized numerical method. As before, Crank-Nicolson method has good stability property. That is, the method is always stable, or unconditionally stable. Here, we introduce a method which is a lot easier to solve and is also unconditionally stable. This is the Alternating Direction Implicit (ADI) method. The method was originally developed in the 50’s. Instead of solving one large linear system of n2 unknowns, we need to solve n linear systems each with n unknowns. This is achieved by separating the x and y directions. We present the method without discretizing x and y. If t is discretized, the Crank-Nicolson method is uk+1 −uk ∆t = a 2 (∂2 x +∂2 y)(uk+1 +uk ), where uk is a function of x and y, and uk ≈ u(x,y,tk). Thus, 1− a∆t 2 (∂2 x +∂2 y) uk+1 = 1+ a∆t 2 (∂2 x +∂2 y) uk . (6.22) The Crank-Nicolson method has a second order error. If we put the exact solution into the above equation, then the error term is O((∆t)3). That is, 1− a∆t 2 (∂2 x +∂2 y) u(x,y,tk+1) = 1+ a∆t 2 (∂2 x +∂2 y) u(x,y,tk)+O((∆t)3 ). Now, we add a2(∆t)2 4 ∂2 x∂2 yuk+1 to the left hand side of (6.22) and add a2(∆t)2 4 ∂2 x∂2 yuk to the right hand side of (6.22). Then, we can factor both sides of the new equation and obtain 1− a∆t 2 ∂2 x 1− a∆t 2 ∂2 y uk+1 = 1+ a∆t 2 ∂2 x 1+ a∆t 2 ∂2 y uk . (6.23) Since u(x,y,tk+1) = u(x,y,tk)+O(∆t), we have a2(∆t)2 4 ∂2 x∂2 yu(x,y,tk+1)− a2(∆t)2 4 ∂2 x∂2 yu(x,y,tk)+O((∆t)3 ). 49 This implies that (6.23) is still a 2nd order method. Let s = a∆t/2, we have uk+1 = 1−s∂2 y −1 1+s∂2 x 1−s∂2 x 1+s∂2 y uk . (6.24) This gives rise to following procedure for computing uk+1: 1. evaluate v by v = uk +s∂2 yuk . (6.25) 2. evlaute w by w = v+s∂2 xv. (6.26) 3. solve a new v from v−s∂2 xv = w. (6.27) 4. solve uk+1 from uk+1 −s∂2 yuk+1 = v. (6.28) Now, let us consider each of these sub-steps. Let us discretize the x and y variables as before. For (6.25), we use central difference approximation for the second derivative in y. 
This leads to vij = uk ij + s h2 uk i,j−1 −2uk ij +uk i,j+1 . We can simply evaluate vij for i = 1,2,...,n and for j = 1,2,...,n. Similarly, for w in (6.26), we have wij = vij + s h2 (vi−1,j −2vij +vi+1,j). We can evaluate wij for i = 1,2,...,n and for j = 1,2,...,n. Notice that we need v0j and vn+1,j. These are related to the boundary conditions for u. We simply use the boundary condition of u as the boundary condition of v. Now, for the new v satisfying (6.27), we are solving a boundary value problem. Since the y-derivative is not involved. We can solve for each yj separately. That is, we solve v1j, v2j, ..., vnj from vij − s h2 (vi−1,j −2vij +vi+1,j) = wij, i = 1,2,...,n. This can be writted as a linear system for n unknowns. That is      c b b c ... ... ... b b c           v1j v2j ... vnj      =      w1j w2j ... wnj      +      −bv0j 0 ... −bvn+1,j      where c = 1+ 2s h2 , b = − s h2 . Furthermore, we need to use the boundary condition for u as the boundary condition for v again. We let v0j = g(0,yj), vn+1,j = g(L,yj). 50 For (6.28), we also have a two-point boundary value problem in only one variable. We can discretize (6.28) as uk+1 ij − s h2 uk+1 i,j−1 −2uk+1 ij +uk+1 i,j+1 = vij, j = 1,2,...,n. This can be written as a linear system:      c b b c ... ... ... b b c           uk+1 i1 uk+1 i2 ... uk+1 in      =      vi1 vi2 ... vin      +      −buk+1 i0 0 ... −buk+1 i,n+1      Here, uk+1 i0 and uk+1 i,n+1 come from the boundary condition: uk+1 i0 = g(xi,0), uk+1 i,n+1 = g(xi,L). If we store uk+1 ij , vij as matrices, the n unknowns for a fixed xi is actually a row vector. Thus, we can replace the above system by its transpose. [uk+1 i1 ,uk+1 i2 ,...,uk+1 in ]      c b b c ... ... ... b b c      = [vi1,vi2,...,vin ]+[−buk+1 i0 ,0,...,−buk+1 i,n+1 ] Using matrix notations, let Uk+1, V and W be the n× n matrices whose (i, j) entries are uk+1 ij , vij and wij, respectively, we have TV = W +B1, Uk+1 T = V +B2, where T is the tridiagonal matrix T =      c b b c ... ... ... b b c      , B1 and B2 are matrices related to the boundary conditions: B1 = −b        g(0,y1) g(0,y2) ... g(0,yn) 0 0 ... ... 0 0 g(L,y1) g(L,y2) ... g(L,yn)        , B2 = −b         g(x1,0) 0 ... 0 g(x1,L) g(x2,0) 0 ... 0 g(x2,L) ... ... ... ... g(xn,0) 0 ... 0 g(xn,L)         . It can be proved that the ADI method is still unconditionally stable. 51 Chapter 7 Finite Difference Methods for Hyperbolic PDEs 7.1 First order hyperbolic equations In this section, we consider hyperbolic equations given as ut +a(x,t)ux = 0, (7.1) and ut +[ f(u)]x = 0. (7.2) Notice that Eq. (7.1) is a first order linear equation, while Eq. (7.2), where f is given function of u, is a nonlinear equation in general. As an example of the nonlinear hyperbolic equation, we have the Burger’s equation ut +uux = 0. This can be written as (7.2) for f(u) = u2/2. We start with (7.1) assuming that a is a non-zero constant. Let us consider the following three numerical methods uk+1 j −uk j ∆t +a uk j+1 −uk j−1 2∆x = 0, (7.3) uk+1 j −uk j ∆t +a uk j+1 −uk j ∆x = 0, (7.4) uk+1 j −uk j ∆t +a uk j −uk j−1 ∆x = 0. (7.5) In all three methods, the time derivative is approximated by the forward difference scheme. For the partial derivative in x, we have used the central difference approximation in (7.3), the forward difference approximation in (7.4) and the backward difference approximation in (7.5). 
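All three schemes are straightforward to implement; which of them can actually be used depends on the stability analysis that follows. As an example, here is a minimal MATLAB sketch of scheme (7.5) for a constant a > 0; the initial pulse, the inflow condition u(0,t) = 0 and the step sizes are chosen only for illustration.

% Scheme (7.5) for u_t + a u_x = 0 with constant a > 0 on 0 < x < 1.
a  = 1;  dx = 0.01;  x = 0:dx:1;  m = length(x);
dt = 0.008;  s = a*dt/dx;                 % s = a dt / dx
u  = exp(-200*(x - 0.3).^2);              % initial condition u(x,0)
for k = 1:50
    unew = u;
    unew(1) = 0;                          % inflow boundary condition at x = 0
    for j = 2:m
        unew(j) = u(j) - s*(u(j) - u(j-1));   % backward difference in x, eq. (7.5)
    end
    u = unew;
end
plot(x, u)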
To find the stability of these methods, we follow the standard procedure as in the Von Neumann stability analysis. We look for a special solution gievn as uk j = ρk eiβxj = ρk eij˜β , (7.6) 52 where xj = j∆x and ˜β = β∆x. If we insert the special solution into a finite difference method, we obtain a relation between ρ and ˜β. Now for a fixed ∆t, if there is at least one ˜β, such that |ρ| > 1, then the numerical method is unstable for that ∆t. If |ρ| ≤ 1 for all ˜β, then the numerical method is stable for that ∆t. Furthermore, if a numerical method is unstable for all ∆t > 0, so the method is completely useless, we call the method unconditionally unstable. If the method is stable for small ∆t (usually given by an inequality) and unstable for large ∆t, then we call the method conditionally stable. If the method is stable for all ∆t > 0, then we call the method unconditionally stable. For these three methods, (7.3) is unconditionally unstable. If a > 0, then (7.4) is unconditionally unstable and (7.5) is conditionally stable. If a < 0, then (7.4) is conditionally stable and (7.5) is unconditionally unstable. Here, let us prove that (7.5) is conditionally stable if a > 0. For uk j given in (7.6), Eq. (7.5) gives ρ−1+s(1−e−i˜β ) = 0, where s = a∆t/∆x. That is ρ = 1−s+s cos ˜β−is sin ˜β. Therefore, |ρ|2 = 1−2s(1−s)(1−cos ˜β). If 0 ≤ s ≤ 1, then |ρ|2 ≤ 1, then |ρ| ≤ 1, therefore the numerical method is stable. If s > 1, we can choose ˜β = π/2 such that cos ˜β = 0, then |ρ|2 = 1−2s(1−s) = 1+2s(s−1) > 1, therefore the method is unstable. In conclusion, (7.5) is conditionally stable and the stability condition is s = a∆t ∆x ≤ 1. Here, we have already assumed a > 0, thus s > 0. As a result of the stability analysis, we have to use (7.4) or (7.5) selectively, depending on the sign of a. For a general a = a(x,t), we have the following upwind scheme for (7.1): uk+1 j = uk j −sk j(uk j+1 −uk j), if a(xj,tk) < 0, (7.7) uk+1 j = uk j −sk j(uk j −uk j−1), if a(xj,tk) > 0, (7.8) where sk j = a(xj,tk)∆t/∆x. For the nonlinear equation (7.2), the upwind scheme is uk+1 j −uk j ∆t + f(uk j+1)− f(uk j) ∆x = 0, if f(uk j+1)− f(uk j) uk j+1 −uk j < 0, (7.9) uk+1 j −uk j ∆t + f(uk j)− f(uk j−1) ∆x = 0, if f(uk j)− f(uk j−1) uk j −uk j−1 > 0. (7.10) The principle here is ∂ f(u) ∂x = f′ (u) ∂u ∂x , therefore, f′(u) = d f/du plays the role of a in (7.1). Actually, f′(u) may change sign, so that the two conditions ak j+ 1 2 = f(uk j+1)− f(uk j) uk j+1 −uk j < 0, ak j− 1 2 = f(uk j)− f(uk j−1) uk j −uk j−1 > 0, 53 may be satisfied simultaneously. Therefore, we merge the two equations into one: uk+1 j = uk j − ∆t 2∆x [1−sgn(ak j+1/2)][ f(uk j+1)− f(uk j)]+(1+sgn(ak j−1/2)][ f(uk j)− f(uk j−1)] , (7.11) where sgn(z) =    1, z > 0, 0, z = 0, −1, z < 0. The upwind scheme is only a first order numerical method (first order in both t and x). Next, we introduce the second order Lax-Wendroff method. The basic idea of the Lax-Wendroff method is to use the Taylor series: u(x,t +∆t) = u(x,t)+∆t ut(x,t)+ (∆t)2 2 utt (x,t)+... but we only keep the first three terms in the Taylor series. Let us start with (7.1) assuming that a is a constant. We have ut = −aux, utt = −autx = a2 uxx. Therefore, u(x,t +∆t) ≈ u(x,t)−a∆t ux(x,t)+ (a∆t)2 2 uxx(x,t). Now, we can use central difference approximations and obtain uk+1 j = uk j − s 2 (uk j+1 −uk j−1)+ s2 2 (uk j−1 −2uk j +uk j+1), (7.12) where s = a∆t/∆x. We can carry out a stability analysis for (7.12). 
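Before that stability analysis, here is a minimal MATLAB sketch of the constant coefficient Lax-Wendroff scheme (7.12). The initial pulse, the step sizes and the simple treatment of the two end points are chosen only for illustration.

% Lax-Wendroff scheme (7.12) for u_t + a u_x = 0 with constant a.
% The end points are simply held fixed, which is adequate while the
% solution stays away from them.
a  = 1;  dx = 0.01;  x = 0:dx:1;  m = length(x);
dt = 0.008;  s = a*dt/dx;
u  = exp(-200*(x - 0.3).^2);          % initial condition
for k = 1:50
    unew = u;
    for j = 2:m-1
        unew(j) = u(j) - (s/2)*(u(j+1) - u(j-1)) ...
                       + (s^2/2)*(u(j-1) - 2*u(j) + u(j+1));
    end
    u = unew;
end
plot(x, u)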
Insert uk j as in (7.6) into (7.12), we obtain ρ = 1−is sin ˜β+s2 (cos ˜β−1), and |ρ|2 = 1+4s2 (s2 −1)sin4 ˜β 2 . This leads to the conclusion that Lax-Wendroff method is conditionally stable and the stability condition is |s| ≤ 1 or |a|∆t ∆x ≤ 1. Now, let us consider the Lax-Wendroff method for (7.1) where a varies with x and t. From ut = −aux, we obtain utt = −(aux)t = −atux +a(aux)x. Therefore, u(x,t +∆t) ≈ u−a∆tux + (∆t)2 2 [−atux +a(aux)x] = u−∆t(a+ ∆t 2 at)ux + (∆t)2 2 a(aux)x ≈ u−∆ta(x,t +∆t/2)ux + (∆t)2 2 a(aux)x, 54 where u = u(x,t), ux = ux(x,t) and a = a(x,t). Now, we can use the central difference scheme and obtain uk+1 j = uk j − ν 2 a k+1/2 j (uk j+1 −uk j−1)+ ν2 2 ak j ak j+1/2(uk j+1 −uk j)−ak j−1/2(uk j −uk j−1) , (7.13) where ν = ∆t ∆x , ak j = a(xj,tk), a k+1/2 j = a(xj,tk + ∆t 2 ),... Finally, we consider the Lax-Wendroff for the nonlinear equation (7.2). Let us denote a = a(u) = f′(u) = d f/du, then from ut = −[ f(u)]x, we have utt = −[ f(u)]tx = [afx]x, and u(x,t +∆t) ≈ u−∆t fx + (∆t)2 2 [a(u) fx]x, where u = u(x,t) above. Now, using the centeral difference approximations, we obtain uk+1 j = uk j − ν 2 f(uk j+1)− f(uk j−1) + ν2 2 a(uk j+1/2) f(uk j+1)− f(uk j) −a(uk j−1/2) f(uk j)− f(uk j−1) , (7.14) where ν = ∆t/∆x and uk j+1/2 = uk j+1 +uk j 2 , uk j−1/2 = uk j +uk j−1 2 . 7.2 Explicit methods for wave equation In this section, we will consider the linear wave equation utt = c2 (x)uxx. (7.15) As in Chapter 6, we discretize x and t by xj and tk, respectively. The time step size is ∆t and the spatial grid size is ∆x. The following method is based on central difference approximations for both utt and uxx. Let uk j ≈ u(xj,tk), we have uk+1 j −2uk j +uk−1 j (∆t)2 = c2 (xj) uk j+1 −2uk j +uk j−1 (∆x)2 . Let sj = c(xj)∆t ∆x 2 , then uk+1 j = (2−2sj)uk j +sj(uk j+1 +uk j−1)−uk−1 j . This is an explicit 2-step (or three time-level) method. It is an explicit method. Sometimes, it is called the leap-frog method. Equation (7.15) is solved with two initial conditions. If we start the equation at t = 0, then the two initial conditions are u(x,0) = f(x), ut(x,0) = g(x). (7.16) 55 For t0 = 0, we have u0 j = u(xj,t0) = f(xj). An approximation at t1 can be obtained from the first three terms of the Taylor series: u(x,t1) ≈ u(x,t0)+(∆t)ut(x,t0)+ (∆t)2 2 utt(x,t0) = f(x)+(∆t)g(x)+ (∆t)2 2 c2 (x)∂xx f(x). With a central difference approximation, we obtain u1 j = f(xj)+g(xj)∆t + (∆t)2c2(xj) 2(∆x)2 [ f(xj−1)−2f(xj)+ f(xj+1)]. Next, we perform a stability analysis assuming that c(x) is a constant. When c is a constatn, we have s = (c∆t/∆x)2. If we insert uk j = ρk eiβxj = ρk eij˜β , xj = j∆x, ˜β = β∆x, into the numerical method, we obtain ρ = 2−2s+2scos ˜β− 1 ρ . This gives rise to ρ2 −2γρ+1 = 0, where γ = 1−2ssin2 ˜β 2 . The two solutions of ρ are ρ = γ± γ2 −1. If |γ| > 1, then we always have one ρ such that |ρ| > 1, thus the method is unstable. If |γ| ≤ 1, then γ = γ± 1−γ2i. We can see that |γ| = 1 exactly. Therefore, the stability condition is |γ| ≤ 1. Obviously, we have γ ≤ 1, but we also need γ ≥ −1. This must be true for any ˜β. Thus, we can choose ˜β = π, so that sin ˜β 2 = 1. Then, the condition γ ≥ −1 implies s ≤ 1. That is ∆t ≤ ∆x c . (7.17) Notice that this stability condition is not so restrictive as the stability condition of the classical explicit method (forward difference in time) for the heat equation. For the heat equation, we need ∆t on the order of (∆x)2 for stability. Here, we need ∆t on the order of ∆x for stability. 
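As an illustration of this explicit method, the following MATLAB sketch solves the wave equation (7.15) with zero boundary values, using the Taylor series formula above for the first step. The wave speed, the initial data and the step sizes are made up, and ∆t is chosen to satisfy the stability condition just derived.

% Leap-frog method for u_tt = c^2(x) u_xx, u(x,0) = f(x), u_t(x,0) = g(x),
% with u = 0 at both ends.  Illustrative data only.
c = @(x) 1 + 0*x;  f = @(x) exp(-100*(x - 0.5).^2);  g = @(x) 0*x;
dx = 0.01;  x = (0:dx:1)';  m = length(x);
dt = 0.008;                                   % satisfies dt <= dx / max c(x)
s  = (c(x)*dt/dx).^2;
uold = f(x);  uold(1) = 0;  uold(m) = 0;      % u at t_0 = 0
u = f(x) + dt*g(x);                           % u at t_1 from the Taylor series
u(2:m-1) = u(2:m-1) + (s(2:m-1)/2).*(uold(1:m-2) - 2*uold(2:m-1) + uold(3:m));
u(1) = 0;  u(m) = 0;
for k = 2:100
    unew = zeros(m,1);                        % boundary values stay zero
    unew(2:m-1) = (2 - 2*s(2:m-1)).*u(2:m-1) ...
                  + s(2:m-1).*(u(3:m) + u(1:m-2)) - uold(2:m-1);
    uold = u;  u = unew;
end
plot(x, u)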
In conclusion, the leapfrog method is conditionally stable, with the above stability condition. If c is a function of x, we interprete the above as ∆t ≤ ∆x maxc(x) . 56 A generalization of this method for wave equation with two spatial variables is straight forward. Consider the wave equation utt = c2 (x,y)[uxx +uyy]. (7.18) If we use the central difference approximation for all second derivatives in the above equation, we obtain 1 (∆t)2 uk+1 ij −2uk ij +uk−1 ij = c2(xi,yj) h2 uk i−1,j +uk i+1,j +uk i,j−1 +uk i,j+1 −4uk ij . Here, we assume that ∆x = ∆y = h. 7.3 Maxwell’s equations The 2-D wave equation (7.18) is a special case of the Maxwell’s equations for electromagnetic waves. Under some simplifying conditions, the Maxwell’s equations are ∇×E = −µ ∂H ∂t (7.19) ∇×H = ε ∂E ∂t , (7.20) where E is the electric field, H is the magnetic field, t is the time, µ is the magnetic permeability and ε is the electric permittivity. In general, µ and ε are functions of the spatial variables: x, y and z. If we consider two dimensional case, we assume µ and ε are functions of x and y only, and E and H has no z-dependence (i.e. ∂zE = 0 and ∂zH = 0). In this case, there is a special two solution given by E =   0 0 Ez  , H =   Hx Hy 0  . That is to say, the electric field E has only one non-zero component in the z direction, the z-component of the magnetic field H is zero and the x and y components of H are non-zero. Here, Ez is not the partial derivative of E with respect to z, it is the z-component of the vector E. Similarly, Hx and Hy are the xand y-components of H. Now, the Maxwell’s equations can be simplified to µ ∂Hx ∂t = − ∂Ez ∂y , (7.21) µ ∂Hy ∂t = ∂Ez ∂x , (7.22) ε ∂Ez ∂t = ∂Hy ∂x − ∂Hx ∂y . (7.23) We can eliminate Hx and Hy. This leads to ∂2Ez ∂t2 = c2 ∂2Ez ∂x2 + ∂2Ez ∂y2 . where c = 1/ √ εµ is the speed of light in the medium. While the speed of light in vacuum (c0) is a constant, here c is the speed of light in the medium and it is still a function of x and y. 57 For the Maxwell’s equations, Kane Yee introduced a famuous numerical method based on central difference for first order derivatives in 1966. It is convenient to present this method in the first order system (7.21-7.23). Let us discretize t, x and y by the step size ∆t and grid sizes ∆x and ∆y, respectively. Therefore, tk = t0 +k∆t, xi = x0 +i∆x, yj = y0 + j∆y. However, we also need to half steps and half grids. Namely, tk+1/2 = t0 +(k +0.5)∆t, xi+1/2 = x0 +(i+0.5)∆x, yj+1/2 = y0 +(j +0.5)∆y. For Ez, we try to calculate its approximation at xi, yj and tk+1/2. Namely, Ez| k+1/2 ij ≈ Ez(xi,yj,tk+1/2). Similarly, we have Hx|k i,j+1/2 ≈ Hx(xi,yj+1/2,tk), Hy|k i+1/2,j ≈ Hx(xi+1/2,yj,tk). The discretization of Ez, Hx and Hy are shown in Fig. (7.1). Ez are evaluated at the grid points marked by “o”, Hx are evaluated at the gird points marked by “⋄” and Hy corresponds to “∗”. With this type of 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Figure 7.1: Discrete grid points for Ez (marked by “o”), Hx (marked by “⋄”) and Hy (marked by “∗”). staggered grid, we can discretize the Maxwell’s equations with second order finite difference method. Yee’s finite difference time domain (FDTD) method is µ ∆t Hx|k+1 i,j+1/2 −Hx|k i,j+1/2 = − 1 ∆y Ez| k+1/2 i,j+1 −Ez| k+1/2 i,j , (7.24) 58 µ ∆t Hy|k+1 i+1/2,j −Hy|k i+1/2,j = 1 ∆x Ez| k+1/2 i+1,j −Ez| k+1/2 i,j , (7.25) ε ∆t Ez| k+1/2 ij −Ez| k−1/2 ij = 1 ∆x Hy|k i+1/2,j −Hy|k i−1/2,j − 1 ∆y Hx|k i,j+1/2 −Hx|k i,j−1/2 . 
(7.26) This is an explicit method, we can use (7.26) to calculate Ez at time level tk+1/2, we can use (7.24) and (7.25) to calculate Hx and Hy at time level tk+1. The method is in fact identical to the earlier method for a single scalar wave equation. However, the more general Yee’s method for full Maxwell’s equations cannot be written in a single unknown function. But this formulation using Hx, Hy and Ez allows us to treat boundary conditions easily. 59 Chapter 8 Finite Difference Methods for Elliptic PDEs 8.1 Finite difference method for Poisson equation Consider the Poisson’s equation in the unit square Ω = {(x,y)|0 < x < 1,0 < y < 1}: uxx +uyy = F(x,y) for (x,y) ∈ Ω with some boundary condition u = g for (x,y) ∈ ∂Ω. Here, ∂Ω denots the four edges of the unit square and g is a function defined on ∂Ω. We can discretize the problem with a second order finite difference method. Let δ = 1/(n+1) and uij ≈ u(ih, jh), we have the following discretized formula at (ih, jh): ui−1,j −2uij +ui+1,j δ2 + ui,j−1 −2uij +ui,j+1 δ2 = Fij = F(iδ, jδ). or −4uij +ui−1,j +ui+1,j +ui,j−1 +ui,j+1 = fij (8.1) where fij = δ2Fij. Notice that from the boundary conditions, we have known values for u0j = u(0, jδ), un+1,j = u(1, jδ), ui0 = u(iδ,0), ui,n+1 = u(iδ,1). Therefore, we have n2 unknowns and we need n2 equations. This is exactly what we have, if we choose i = 1,2,...,n and j = 1,2,...,n in (8.1). For the n2 unknowns, we can order them in a large vector. For example, we can define U = [u11,u21,...,un1,u12,u22,...,un2,......,unn]T , then (8.1) can be written as AU = b, (8.2) where b is related to a bij which is related to Fij and the boundary conditions. In fact, bij = δ2 Fij if 1 < i < n, 1 < j < n. 60 But near the boundary, i.e., i = 1 or i = n or j = 1 or j = n, we need to include some terms from the boundary conditions. b1j = δ2 F1j −u0j for 1 < j < n bnj = δ2 Fnj −un+1,j for 1 < j < n bi1 = δ2 Fi1 −ui0 for 1 < i < n bin = δ2 Fin −ui,n+1 for 1 < i < n. At the four corners, we have to define bij to include two nearby points from the boundary: b11 = δ2 F11 −(u01 +u10) bn1 = δ2 Fn1 −(un+1,1 +un0) b1n = δ2 F1n −(u0n +u1,n+1) bnn = δ2 Fnn −(un,n+1 +un+1,n). The vector b can be obtained from bij in the same way uij is ordered to give U. The coefficient matrix A is an n2 ×n2 matrix. It cannot be efficiently solved by Gaussian elimination directly, since the standard Gaussian elimination algorithm will require O((n2)3) = O(n6) operations. Actually, the matrix A has at most five non-zeros in each row and A has a bandwidth of O(n). Using Gasssian elimination for banded matrices, the required number of operations is reduced to O(n4). 8.2 Fast Poisson solver based on FFT Here, we describe a FFT based fast algorithm which requires only O(n2 logn) operations. The method uses the discrete sine transform (which is related to the Discrte Fourier Transform) to obtain a system that can be easily solved. The discrete sine transform is gj = n ∑ k=1 ˆgk sin jkπ n+1 , j = 1,2,...,n, ˆgk = 2 n+1 n ∑ j=1 gj sin jkπ n+1 , k = 1,2,...,n. If we introduce an n×n matrix S, whose ( j,k) entry is sin jkπ n+1, then S−1 = 2 n+1 S. Now, for uij and bij, we will fix i, then use discrete sine transform. We have uij = n ∑ k=1 ˆuik sin jkπ n+1 , bij = n ∑ k=1 ˆbik sin jkπ n+1 . If we insert these into (8.2), we obtain n ∑ k=1 −4ˆuik sin jkπ n+1 + ˆui−1,k sin jkπ n+1 + ˆui+1,k sin jkπ n+1 + ˆuik sin ( j −1)kπ n+1 + ˆuik sin ( j +1)kπ n+1 = n ∑ k=1 ˆbik sin jkπ n+1 . 
61 This can be simplified to n ∑ k=1 (−4+2cos kπ n+1 ) ˆuik + ˆui−1,k + ˆui+1,k sin jkπ n+1 = δ2 n ∑ k=1 ˆfik sin jkπ n+1 . Therefore, (−4+2cos kπ n+1 ) ˆuik + ˆui−1,k + ˆui+1,k = ˆbik For i = 1 or i = n, the above equation should be modified to remove the term ˆui−1,k or ˆui+1,k, respectively. Now, if we fix k, we can solve ˆuik (for all i) from the above equation.      α 1 1 α ... ... ... 1 1 α           ˆu1k ˆu2k ... ˆunk      =      ˆb1k ˆb2k ... ˆbnk      for α = −4 + 2cos kπ n+1. This is a tridiagonal system and it can be solved in O(n) operations. Since we have to do this for all k, the total operations required here is O(n2). But we first need to calculate ˆbik from bij, based on the discrete sine transform. This can be done in O(n2 logn) operations. Once we found ˆuik, we can use discrte sine transform to find uij. Again, this requires O(n2 logn) operations. Since n2 logn is larger than n2, the overall operations required to solve the Poisson equation is thus O(n2 logn). This is the FFT based “fast Poisson solver”. 8.3 Classical iterative methods Although the FFT-based fast Poisson solver is very efficient, it cannot be generalized to more general equations with variable coefficients. Notice that the matrix A in Eq. (8.2) is sparse, i.e., most of its entries are zero and only a few non-zeros in each row or column. Iterative methods produce a sequence of approximate solutions that converge to the exact solution. Since the matrix A is sparse, it is possible to develop some iterative methods for solving AU = b. We start by writing the matrix A as three parts: the diagonal D, the strictly lower triangular part −L and the strictly upper triangular part −R, such that A = D−L−R. The minus sign in front of L and R are introduced for convenience. Now, AU = b is identical to DU = (L+R)U +b and U = D−1 (L+R)U +D−1 b, This leads to the Jacobi iterative method: U( j+1) = D−1 (L+R)U( j) +D−1 b, j = 0,1,2,... (8.3) We have to start with an initial guess U(0), after that we can use (8.3) to calculate U(1), U(2), ... We can prove that for the finite difference approximation of the Poisson equation, i.e., for (8.2), Jacobi iteration converges. To prove the convergence, we need to show that all eigenvalues of D−1(L+R) have 62 magnitude less than 1. Meanwhile, we can also write (8.2) as (D−L)U = RU +b. Therefore, we have the following Gauss-Seidel iterative method: solve U( j+1) from (D−L)U( j+1) = RU( j) +b. (8.4) Notice that D − L is a lower triangular matrix, therefore, it is easy to solve a linear system with a coefficient matrix D − L. Again, for the discrete Poisson equation, we can prove that Gauss-Seidel iterative method converges. For this purpose, it is necessary to show that the eigenvalues of (D−L)−1R has megnitude less than 1. Finally, we can multiply (8.2) by a parameter ω, then add DU to both sides: DU +ω(D−L−R)U = DU +ωb. This can be written as (D−ωL)U = [(1−ω)D+ωR]U +ωb. This leads to the following Successive Overrelaxation (SOR) method developed by Young and Frankel in 1950: solve U( j+1) from (D−ωL)U( j+1) = [(1−ω)D+ωR]U( j) +ωb. (8.5) For the discrete Poisson equation, SOR method converges if 0 < ω < 2. The optimal parameter is ω = 2 1+sin(πδ) , where δ = 1/(n + 1) is the grid size as in section 8.1. These three iterative methods are all classical iterative methods. The conjugate gradient method, introduced by Hestenes and Stiefel in 1952, is a modern iterative method with a faster convergence rate. 
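As an illustration, the following MATLAB sketch applies the SOR iteration with the optimal parameter ω above to the discrete Poisson equation (8.1), sweeping over the grid so that the matrices D, L and R are never formed explicitly. The right hand side and the zero boundary values are chosen only so that the exact solution is known; setting ω = 1 reduces each sweep to a Gauss-Seidel sweep.

% SOR iteration for the discrete Poisson equation (8.1) on the unit square.
% F is chosen so that the exact solution is u = sin(pi x) sin(pi y).
n = 49;  delta = 1/(n+1);  xg = (0:n+1)*delta;
[X, Y] = ndgrid(xg, xg);                           % X(i,j) = x_i, Y(i,j) = y_j
Ffun = @(x,y) -2*pi^2*sin(pi*x).*sin(pi*y);
f = delta^2 * Ffun(X, Y);                          % f_{ij} = delta^2 F_{ij}
u = zeros(n+2);                                    % includes the zero boundary values
omega = 2/(1 + sin(pi*delta));                     % optimal SOR parameter
for sweep = 1:300
    for i = 2:n+1
        for j = 2:n+1
            ugs = (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1) - f(i,j))/4;
            u(i,j) = (1-omega)*u(i,j) + omega*ugs; % omega = 1 gives Gauss-Seidel
        end
    end
end
fprintf('max error: %.2e\n', max(max(abs(u - sin(pi*X).*sin(pi*Y)))));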
The discrete Poisson equation (8.2) can also be efficiently solved by the multi-grid method, where the numerical solutions with larger grid sizes are used to improve the approximation of the numerical solution at the smallest grid size. 63 8.4 Conjugate gradient method The conjugate gradient method is a method for solving Ax = b, where A is a symmetric positive definite matrix. Here the size of A is large, thus a direct method by Cholesky decomposition (related to the LU decomposition) is expensive. But A is sparse — only very few non-zeros for each row or each column, thus it is efficient to multiply A with any given vector. It is an iterative method that produces the sequence of approximations: x1, x2, x3, .... Let A be m×m, define the Krylov space by Kn =< b,Ab,A2 b,...,An−1 b > This is the vector space spanned by the vectors b, Ab, ..., An−1b. It is the “column space” of the Krylov matrix Kn = [b,Ab,A2 b,...,An−1 b]. The conjugate gradient method finds xn ∈ Kn which solves the minimization problem min x∈Kn (x−x∗)T A(x−x∗) where x∗ = A−1b is the exact solution. However, since (x−x∗)T A(x−x∗) = 2φ(x)−bT A−1 b, for φ(x) = 1 2 xT Ax−xT b. It is equivalent to say that xn solves min x∈Kn φ(x). 8.4.1 1-D optimization problem For a given point xn−1 and a given direction pn−1, we have a line that passes through xn−1 along the direction of pn−1. The points on the line are given by xn−1 +αpn−1 for α ∈ R Alternatively, we denote this line by xn−1+ < pn−1 > where < pn−1 > is a 1-D vector space. We can minimize the function φ along this line min x∈xn−1+ φ(x) = min α∈R φ(xn−1 +αpn−1) Now, φ(xn−1 +αpn−1) is a quadratic polynomial of α, its minimum is reached at αn = rT n−1 pn−1 pT n−1Apn−1 The minimum is obtained at xn−1 +αn pn−1. If xn−1 happens to be a conjugate gradient iteration, i.e., xn−1 minimizes φ(x) in Kn−1. The above procedure gives ˜xn = xn−1 +αn pn−1 Of course, ˜xn is usually not xn which minimizes φ in Kn. However, we will find a special way of choosing pn−1, such that ˜xn = xn. 64 8.4.2 Subspace minimization problem We now look for xn ∈ Kn such that φ(xn) = min x∈Kn φ(x) We assume that Kn has the following basis p0, p1,..., pn−1 Now, min x∈Kn φ(x) = min α1,α2,...,αn∈R φ(α1 p0 +α2 p1 +...+αnpn−1) To find the minimum, we solve the system ∂φ ∂αi = 0 for i = 1,2,...,n. In fact, ∂φ ∂αi = pT i−1A(α1 p0 +α2 p1 +...+αnpn−1)− pT i−1b Therefore, we have the system for α1, α2, ..., αn: C      α1 α2 ... αn      =      pT 0 b pT 1 b ... pT n−1b      where the (i+1, j +1) entry of C is pT i Apj. If we assume that pT i Apj = 0 if i = j then the matrix C is diagonal and αi is easily solved αi = pT i−1b pT i−1Api−1 . Furthermore, if we assume that {p0, p1,..., pi−1} is a basis for Ki for all i (we only assume that for i = n earlier), then xn−1 = α1 p0 +α2 p1 +...+αn−1pn−2 is the conjugate gradient iteration that minimizes φ in Kn−1 and xn = xn−1 +αn pn−1 Indeed, you can show that the formula for αn here is equivalent to the formula in last section. Therefore, the subspace minimization problem can be solved by 1-D optimization process under these assumptions on the search vectors p0, p1, ..., pn−1. 65 8.4.3 Orthogonal residual Clearly, we need a simple way to find these vectors p0, p1, .... It turns out that the following property on the residual is very important. Let xn be the n-th conjugate gradient iteration, rn = b− Axn be the residual, then rn ⊥ Kn. 
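This orthogonality is easy to check numerically. The following MATLAB sketch builds a random symmetric positive definite matrix, computes the minimizer of φ over Kn directly from the subspace minimization problem of section 8.4.2, and verifies that the residual is orthogonal to Kn. The data are arbitrary, and the Krylov columns are normalized only to avoid very large entries; this does not change the space they span.

% Numerical check that r_n = b - A x_n is orthogonal to K_n,
% where x_n minimizes phi over the Krylov space K_n.
m = 200;  rng(1);
A = randn(m);  A = A'*A + m*eye(m);            % a symmetric positive definite matrix
b = randn(m,1);
n = 5;
K = zeros(m,n);  K(:,1) = b/norm(b);
for j = 2:n, K(:,j) = A*K(:,j-1); K(:,j) = K(:,j)/norm(K(:,j)); end
y  = (K'*A*K) \ (K'*b);          % the equations dphi/dalpha_i = 0 of section 8.4.2
xn = K*y;                        % minimizer of phi over K_n
rn = b - A*xn;
disp(norm(K'*rn))                % should be nearly zero (rounding errors only)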
8.4.4 The next conjugate direction Suppose xj is the conjugate gradient iteration that solves the subspace minimization problem minx∈K j φ(x), it is not difficult to realize that Kn =< x1,x2,...,xn >=< r0,r1,...,rn−1 > where r0 = b−Ax0 = b. We also assume that K j =< p0, p1,..., pj−1 > for j ≤ n The question now is how to choose pn, such that • Kn+1 =< p0, p1,..., pn >; • pT n Apj = 0 for j = 0, 1, 2, ..., n−1. To satisfy the first condition, we realize that rn = b−Axn is in Kn+1 (and not in Kn), therefore, we can choose pn = rn +a component in Kn to satisfy the second condition. The component in Kn can be written as βn pn−1 +(∗)pn−2 +...+(∗)p0 since {p0, p1,..., pn−1} is a basis of Kn. We use the condition pT j Apn = pT n Apj = 0 (since A = AT ) to find the coefficients. For j ≤ n−2, we have 0 = pT j Apn = pT j Arn +(∗)pT j Apj Now, pT j Arn = rT n (Apj) = 0, since pj ∈ Kn−1 or Apj ∈ Kn (and rn ⊥ Kn as in the last section), therefore, (∗) = 0. Meanwhile, we obtain pn = rn +βn pn−1 for βn = − rT n Apn−1 pT n−1Apn−1 66 8.4.5 The method We now have the following conjugate gradient method: • Let x0 = 0, r0 = b, p0 = r0. • For n = 1,2,3,... αn = rT n−1rn−1 pT n−1Apn−1 xn = xn−1 +αn pn−1 rn = rn−1 −αnApn−1 βn = rT n rn rT n−1rn−1 pn = rn +βnpn−1 We notice that the formulas for αn and βn are different. But they are equivalent to the formulas in previous sections. One step of this algorithm requires • Evaluate v = Apn−1; • 2m operations for pT n−1v = pT n−1Apn−1; • 2m operations for xn = xn−1 +αn pn−1; • 2m operations for rn = rn−1 −αnv = rn−1 −αnApn−1; • 2m operations for rT n rn; • 2m operations for pn = rn +βn pn−1 This is a total of 10m operations, plus one matrix vector multiplication. Exercise: Using the standard notations for the Conjugate Gradient method, where xn is the n-th iteration of the approximate solution (for Ax = b, assuming x0 = 0), rn is the residual, pn is the n-th A-conjugate direction, show that αn = pT n−1b pT n−1Apn−1 = rT n−1 pn−1 pT n−1Apn−1 = rT n−1rn−1 pT n−1Apn−1 βn = − rT n Apn−1 pT n−1Apn−1 = rT n rn rT n−1rn−1 . 8.4.6 Rate of convergence For a vector y and the symmetric positive definite matrix A, we define ||y||A = yT Ay. Now, for the conjugate gradient method, we can prove ||xn −x∗||A ||x0 −x∗||A ≤ 2 √ κ−1 √ κ+1 n (8.6) where x∗ = A−1b, x0 = 0, κ = λ1/λm, λ1 and λm are the largest and smallest eigenvalues of A, respec- tively. 67
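Finally, here is a minimal MATLAB sketch of the conjugate gradient method exactly as listed in section 8.4.5. As a test problem we take the discrete Poisson equation of section 8.1; the matrix A in (8.2) is negative definite, so the method is applied to −AU = −b, and MATLAB's gallery('poisson',n) is used to generate this symmetric positive definite five-point matrix.

% Conjugate gradient method of section 8.4.5 for A x = b, A symmetric
% positive definite.  Test matrix: minus the discrete Poisson matrix of (8.2).
n  = 50;                                % grid with n^2 interior points
A  = gallery('poisson', n);             % sparse s.p.d. five-point matrix
b  = ones(n^2, 1);                      % an illustrative right hand side
x = zeros(n^2,1);  r = b;  p = r;  rr = r'*r;
for k = 1:500
    v     = A*p;                        % the only matrix-vector product per step
    alpha = rr/(p'*v);
    x     = x + alpha*p;
    r     = r - alpha*v;
    rrnew = r'*r;
    if sqrt(rrnew) < 1e-10*norm(b), break; end
    beta  = rrnew/rr;
    p     = r + beta*p;
    rr    = rrnew;
end
fprintf('CG stopped after %d steps, residual = %.2e\n', k, norm(b - A*x));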