11 Stochastic processes and an introduction to stochastic differential equations

11.1 DETERMINISTIC AND STOCHASTIC DIFFERENTIAL EQUATIONS

A differential equation usually expresses a relation between a function and its derivatives. For example, if t > t0 represents time, and the rate of growth of a quantity y(t) is proportional to the amount y(t) already present, then we have

dy/dt = ky,   (11.1)

where k is a constant of proportionality. Equation (11.1) is called a first-order differential equation because the highest-order derivative appearing is the first derivative. It is also called linear because both y and its derivative occur raised to the power 1.

Equation (11.1) may be viewed as a prescription or mathematical model for finding y at all times subsequent to (or before) a given time t0 at which the value y0 of y is known. This is expressed in the solution of (11.1),

y(t) = y0 e^{k(t − t0)},   (11.2)

which has the same form as the Malthusian population growth law of Section 9.1. It is also a formula for finding an asset value with compound interest when the initial value is y0.

In the natural sciences (biology, chemistry, physics, etc.), differential equations have provided a concise method of summarizing physical principles. An important example of a nonlinear first-order differential equation is Verhulst's logistic equation:

dy/dt = ry(1 − y/y*),   (11.3)

with r > 0. This equation is frequently used to model the growth of populations of organisms. The quantity y* is called the carrying capacity whereas r is called the intrinsic growth rate. It will be seen in Exercise 1 that the solution of (11.3) which passes through the value y0 at time t = t0 is

y(t) = y* / [1 + (y*/y0 − 1) e^{−r(t − t0)}].   (11.4)

Figure 11.1 shows how populations evolve for different starting values. As t → ∞ the population approaches the value y* asymptotically, which explains the term carrying capacity. Since its inception by the Belgian mathematician Verhulst (1838), the logistic equation has been used for many different populations, including those of cancer cells (Thompson and Brown, 1987) as well as human populations over countries (Pearl and Reed, 1920), continents and the world (Tuckwell and Koziol, 1992, 1993).

Figure 11.1 Showing solutions of a logistic differential equation for various initial population sizes.
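To make this concrete, here is a minimal Python sketch (an illustration added to this text, not from the original; the parameter values are arbitrary choices) that integrates (11.3) by Euler's method and checks the result against the closed form (11.4):

```python
import numpy as np

# Logistic growth: dy/dt = r*y*(1 - y/ystar)  -- equation (11.3)
r, ystar, y0, t0 = 0.5, 100.0, 10.0, 0.0   # illustrative parameter values

def logistic_exact(t):
    """Closed-form solution (11.4) through y0 at t = t0."""
    return ystar / (1.0 + (ystar / y0 - 1.0) * np.exp(-r * (t - t0)))

# Simple Euler integration of (11.3) for comparison
dt, T = 0.01, 20.0
ts = np.arange(t0, T, dt)
y = np.empty_like(ts)
y[0] = y0
for i in range(1, len(ts)):
    y[i] = y[i-1] + dt * r * y[i-1] * (1.0 - y[i-1] / ystar)

print("max |Euler - exact| =", np.abs(y - logistic_exact(ts)).max())
# Both curves approach the carrying capacity ystar as t grows.
```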
The differential equations (11.1) and (11.3) we have thus far considered are called deterministic because a given initial value determines the solution completely for all subsequent times. The behaviour of the solution is totally predictable and there are no chance elements. Put another way, the trajectory y(t) is fixed (it is a particular function) and there are no haphazard or random fluctuations.

Deterministic differential equations proved to be extremely powerful in some branches of classical physics and chemistry, but at the beginning of the twentieth century the study of atomic and subatomic systems indicated that deterministic theories were inadequate. Thus quantum mechanics, which is fundamentally probabilistic, was formulated to describe changes in very small systems (see, for example, Schiff, 1955). Furthermore, in complex systems containing millions or billions of interacting particles, the application of deterministic methods would have been so laborious that scientists also devised probabilistic methods for them. Such considerations for large collections of atoms or molecules led to the discipline of statistical mechanics (see, for example, Reichl, 1980).

In the latter part of the twentieth century quantitative methods have become increasingly widely used in the study of intrinsically complex systems such as arise in biology and economics. The use of deterministic methods is limited, so there has been a large and rapid development in the application of probabilistic methods. One such very useful concept has been that of stochastic differential equations.

In the case of deterministic differential equations which are useful for quantitatively describing the evolution of natural systems, the solution is uniquely determined, usually by imposing a starting value and possibly other constraints. In the case of stochastic differential equations there are several possible trajectories or paths over which the system of interest may evolve. It is not known which of these trajectories will be followed, but one can often find the probabilities associated with the various paths. The situation is similar to that in the simple random walk which we studied in Chapter 7, except that in most cases the time variable is continuous rather than discrete. We could say that the quantity we are looking at wanders all over the place in a random and thus unpredictable fashion.

Physical examples of quantities which might be modelled with stochastic differential equations are illustrated in Figs 11.2a and 11.2b. In the first of these we show a record of fluctuations in the electrical potential difference across the membrane of a nerve cell in a cat's spinal cord (a spinal motorneurone, which receives messages from the brain and sends messages to a muscle fibre which may result in a movement). In the second example, the weekly variations in the price of an industrial share are shown from May 1990 to January 1993.

Figure 11.2a The three records on the left (A) show the fluctuations in the resting electrical potential difference across a nerve cell membrane (time scale: 1 msec). These fluctuations can be modelled with a stochastic differential equation involving a Wiener process - see section 12.7. On the right (B) is shown a histogram of amplitudes of the fluctuations, fitted with a normal density (from Jack, Redman and Wong, 1981).

Figure 11.2b Here are shown the fluctuations in the price of a share (Coles-Myer Limited) from week to week over a period of a few years (5/1/90 to 29/1/93, prices between about $3.00 and $6.00). Such fluctuations can also be modelled using a stochastic differential equation - see section 12.7.

11.2 THE WIENER PROCESS (BROWNIAN MOTION)

The most useful stochastic differential equations have proven to be those which involve either Wiener processes or Poisson processes. When Wiener processes are involved, the solutions are usually continuous, whereas when Poisson processes are involved the solutions exhibit jumps. Most of our discussion focuses on continuous processes, so our immediate concern is to define Wiener processes and discuss their properties.

In Section 7.8 we considered a simple random walk and let the step size get smaller as the rate of occurrence of the steps increased. We took this to the limit of zero step sizes and an infinite rate of occurrence, but did so in such a way that the variance at any time neither vanished nor became unbounded. In fact, the variance of the limiting random process at time t was made to equal t. The symbol we employ for the limiting process, which we call the Wiener process, is W = {W(t), t ≥ 0}.
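The limiting construction just described is easy to imitate numerically. The following Python sketch (an added illustration; the seed and step counts are arbitrary) builds approximate Wiener paths from independent N(0, Δt) increments and checks that W(t) is centred at 0 with variance t:

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate standard Wiener paths on [0, T] by summing independent
# N(0, dt) increments (the continuum limit of the scaled random walk).
T, n, npaths = 1.0, 1000, 5000
dt = T / n
increments = rng.normal(0.0, np.sqrt(dt), size=(npaths, n))
W = np.cumsum(increments, axis=1)          # W[:, k] approximates W((k+1)*dt)

# The defining property: W(t) is N(0, t).  Check mean and variance at t = T.
print("mean W(T) =", W[:, -1].mean())      # should be near 0
print("var  W(T) =", W[:, -1].var())       # should be near T = 1
```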
However, this process can be defined in a more general way which makes no reference to limiting operations. In this section we give this more general definition and discuss some of the elementary yet important properties of W. Before we give this definition we will define a large class of processes to which both the Wiener process and the Poisson process belong. This consists of those processes whose behaviour during any time interval is independent of their behaviour during any non-overlapping time interval. We will restrict our attention to processes whose index set (see section 7.1) is continuous.

Definition
Let X = {X(t)} be a random process with a continuous parameter set [0, T], where 0 < T < ∞. Let n ≥ 2 be an integer and suppose 0 ≤ t0 < t1 < t2 < ··· < tn ≤ T. Then X is said to be a random process with independent increments if the n random variables

X(t1) − X(t0), X(t2) − X(t1), ..., X(tn) − X(t_{n−1})

are independent.

Thus, increments in X which occur in disjoint time intervals are independent. This implies that the evolution of the process after any time s > 0 is independent of its history up to and including s. Thus any process with independent increments is a Markov process, as will be shown formally in the exercises. The converse is not true. We have already encountered one example of an independent-increment process in section 9.2 - the Poisson process.

Before defining a Wiener process, we mention that if the distributions of the increments of a process in various time intervals depend only on the lengths of those intervals and not on their locations (i.e., their starting values), then the increments are said to be stationary. In section 9.2 we saw that for a Poisson process N = {N(t), t ≥ 0}, the random increment N(t2) − N(t1) is Poisson distributed with a parameter proportional to the length of the interval (t1, t2]. Thus a Poisson process has stationary independent increments.

Definition
A standard Wiener process W = {W(t), t ≥ 0} on [0, T] is a process with stationary independent increments such that for any 0 ≤ t1 < t2 ≤ T, the increment W(t2) − W(t1) is a Gaussian random variable with mean zero and variance equal to t2 − t1; i.e.,

E[W(t2) − W(t1)] = 0,
Var[W(t2) − W(t1)] = t2 − t1.

Furthermore, W(0) = 0 with probability 1.

The probability density p(x; t1, t2) of the increment of W in the interval (t1, t2] is defined through

Pr{W(t2) − W(t1) ∈ (x, x + Δx]} = p(x; t1, t2)Δx + o(Δx).

From the definition of W we see that this is given by

p(x; t1, t2) = (1/√(2π(t2 − t1))) exp(−x²/(2(t2 − t1))).   (11.5)

In the case t1 = 0, it is seen that the random variable W(t2) has mean 0 and variance t2. Thus, for any t > 0, W(t) is a Gaussian random variable with mean 0 and variance t, so that its probability density p(x; t) is given by the simple expression

p(x; t) = (1/√(2πt)) exp(−x²/(2t)).

The word 'standard' in the definition refers to the fact that the mean is zero, the variance at t is t and the initial value is zero.
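As a quick numerical illustration of (11.5) (an addition to the text; the times and sample size are arbitrary), one can draw many increments and compare their normalized histogram with the Gaussian density:

```python
import numpy as np

rng = np.random.default_rng(1)
t1, t2 = 0.5, 2.0          # arbitrary interval (t1, t2]
n = 500_000

# By definition the increment W(t2) - W(t1) is N(0, t2 - t1).
inc = rng.normal(0.0, np.sqrt(t2 - t1), n)

# Compare a normalized histogram with the density (11.5).
edges = np.linspace(-4, 4, 81)
hist, _ = np.histogram(inc, bins=edges, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
p = np.exp(-mid**2 / (2 * (t2 - t1))) / np.sqrt(2 * np.pi * (t2 - t1))
print("max |histogram - p(x; t1, t2)| =", np.abs(hist - p).max())
```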
Sample paths
It can be proved for the random process defined above that the sample paths or trajectories are continuous with probability one. Sample paths are also called realizations and correspond to a 'value' of the process when an experiment is performed. That is, supposing it is possible to observe a standard Wiener process over the time interval [0, T], we would see, with probability one, a continuous function starting at the origin, wandering around haphazardly and reaching some random end-value W(T) - as in Fig. 11.3.

Figure 11.3 A depiction of a few sample paths for a Wiener process.

Note, however, that there are possibly discontinuous paths, but these have zero probability associated with them. Usually attention is restricted to those paths which are in fact continuous, and continuity of sample paths is often included in the definition; this is a convenient way to discard the problem of the discontinuous paths. Although the probability of finding a continuous trajectory for W is one, the probability is zero that at any time t ∈ [0, T] the path is differentiable. This is considered to be a pathological property and is one reason why a study of the Wiener process has been so interesting to mathematicians. This, and the fact that sample paths have unbounded variation, are proved and elaborated on in, for example, Hida (1980). An elementary consideration is given in Exercise 3.

Mean value and covariance function
An important property of a random process X is its mean at time t, E(X(t)), which is often called its mean value function, being a function of t alone. We have the mean and variance of W(t) immediately from the above definition. To further understand the behaviour of a random process, it is useful to know how its value at any time is connected with its value at any other time. Although knowing the joint probability distribution of these values would be ideal, we may be content with a rougher indication. To this end we make the following definition.

Definition
The covariance function of a random process is the covariance (cf. Chapter 1) of the values of the process at two arbitrary times.

Note that sometimes the covariance function is called an autocovariance function to distinguish it from a covariance between two different processes. It is also useful to define a class of processes whose covariance function depends only on the difference between the times at which it is evaluated and not on their location.

Definition
If the covariance function Cov(X(s), X(t)) depends only on |t − s|, the random process X is said to be covariance stationary. Other terms for this are wide-sense stationary or weakly stationary.

If X is a weakly stationary process, we may put Cov(X(s), X(s + t)) = R(t). We can see for such a process that (see Exercises):
(a) the mean value function is a constant; and
(b) the covariance function is an even function: R(t) = R(−t).

In the case of a standard Wiener process we will see that the following is true.

The covariance function of a standard Wiener process is

Cov(W(s), W(t)) = min(s, t),

where min(·,·) is defined as the smaller of the two arguments.

Proof We utilize the fact that the increments of a Wiener process over disjoint (non-overlapping) time intervals are independent random variables and hence have covariance equal to zero. With s < t we have

Cov[W(s), W(t) − W(s)] = 0.

The quantity we seek can be written Cov[W(s), W(t) − W(s) + W(s)]. But in general, if A, B and C are three random variables (see Exercises),

Cov[A, B + C] = Cov[A, B] + Cov[A, C].

Thus,

Cov[W(s), W(t)] = Cov[W(s), W(t) − W(s)] + Cov[W(s), W(s)]
               = Cov[W(s), W(s)]
               = Var[W(s)]
               = s.

Had t been less than s we would have obtained t instead of s. Hence the covariance is the smaller of s and t, which proves the result.

Note that the Wiener process is therefore not covariance stationary, since min(s, t) is not a function of |t − s| alone. For further information on the topics we have dealt with in this section, the reader may consult Papoulis (1965), Parzen (1962) and Yaglom (1973).
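The covariance formula just proved lends itself to a simple simulation check. The following Python sketch (added for illustration; the grid sizes and seed are arbitrary) estimates Cov(W(s), W(t)) from simulated paths and compares it with min(s, t):

```python
import numpy as np

rng = np.random.default_rng(2)

T, n, npaths = 2.0, 200, 20_000
dt = T / n
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(npaths, n)), axis=1)

s_idx, t_idx = 49, 149               # s = 0.5, t = 1.5 on this grid
Ws, Wt = W[:, s_idx], W[:, t_idx]
cov_hat = np.mean(Ws * Wt) - Ws.mean() * Wt.mean()
print("estimated Cov(W(0.5), W(1.5)) =", cov_hat)   # near min(s, t) = 0.5
```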
11.3 WHITE NOISE

Although the Wiener process is of central importance in the theory of stochastic differential equations, there is a useful related concept, called white noise, which we introduce in this section.

The paths traced out by a Wiener process are with probability one not differentiable. However, it is often convenient to talk about the derivative of W as if it did exist. We use the symbol w(t) for the 'derivative' of W(t), and we call the random process w = {w(t), t ≥ 0} (Gaussian) white noise. However, it must be remembered that, strictly speaking, this process does not have a well-defined meaning - it is nevertheless heuristically useful.

The word noise, of course, refers to unwanted signals. If you are in a crowded cafeteria or football stadium or surrounded by dense city traffic, close your eyes and listen: you will hear a noise that seems an amorphous assortment of meaningless sounds; you generally won't be able to pick out particular signals unless they originate close by. This kind of background noise is an acoustic approximation to white noise. Sound engineers have devices called white noise generators which are used to test the acoustic properties of rooms - the basic idea is to subject the chamber to all frequencies at once.

The mean value and covariance functions of white noise can be obtained from those of a Wiener process - as will be seen in the exercises. These turn out to be

E[w(t)] = 0,
Cov[w(s), w(t)] = δ(t − s).   (11.6)

Thus the covariance is zero whenever s ≠ t and is very large when s = t.

Covariance functions are often decomposed to see if there are regularities present, especially in the form of periodicities or harmonics of various frequencies. Such a decomposition is done using the following definition. Note that we restrict our attention to real-valued processes.

Definition
The spectral density S(k) of a covariance-stationary random process whose covariance function is R(t), t ≥ 0, is given by the integral

S(k) = ∫_{−∞}^{∞} cos(kt) R(t) dt.   (11.7)

The reader may recognize this as the Fourier transform of R(t), recalling that the latter is here an even function of t. Another name for S(k) is the power spectrum - it indicates the contributions from various frequencies to the total activity of the process.

A knowledge of the spectral density can be used to obtain the covariance function using the following inversion formula, which is proved in courses of analysis (see for example Wylie, 1960):

R(t) = (1/2π) ∫_{−∞}^{∞} S(k) cos(kt) dk.   (11.8)

Let us see how various harmonics in R(t) manifest themselves in S(k). Suppose S(k) were very much concentrated around the single frequency k0, so we might put S(k) = δ(k − k0). Then

R(t) = (1/2π) ∫_{−∞}^{∞} δ(k − k0) cos(kt) dk = (1/2π) cos(k0 t),

where we have used the substitution property of the delta function (formula (3.13)). Thus we see that a very large peak in the spectral density S(k) comes about at k0 if there is a single dominant frequency k0/2π in the covariance function R(t).

Let us consider white noise w(t) from this point of view. We have from Equation (11.6), R(t) = δ(t). Substituting this in the definition of the spectral density gives

S(k) = ∫_{−∞}^{∞} cos(kt) δ(t) dt = 1,

where we have used the substitution property and the fact that cos(0) = 1. This tells us that the spectral density of white noise is a constant, independent of the frequency. That is, all frequencies contribute equally, from −∞ to ∞, whence we can see the analogy with 'white light'. Hence the description of the derivative of a Wiener process as (Gaussian) white noise. It is realized that it is not physically possible to have frequencies over such a huge range. In engineering practice white noise generators have cut-off frequencies at finite values - they are called band-limited white noise. Sometimes white noise is called delta-correlated noise.
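In discrete time the flat spectrum can be seen numerically: independent Gaussian samples play the role of band-limited white noise, and their averaged periodogram is approximately constant across frequencies. The sketch below is an added illustration under these assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(3)

# Discrete 'white noise': i.i.d. N(0, 1) samples.  The averaged periodogram
# should be roughly constant in frequency, the discrete counterpart of
# the flat spectral density S(k) = 1.
nseq, nlen = 400, 1024
x = rng.normal(0.0, 1.0, size=(nseq, nlen))
periodograms = np.abs(np.fft.rfft(x, axis=1))**2 / nlen
S_hat = periodograms.mean(axis=0)

# Away from the endpoints the estimate hovers around the variance, 1.
print("mean level:", S_hat[1:-1].mean(), " spread:", S_hat[1:-1].std())
```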
11.4 THE SIMPLEST STOCHASTIC DIFFERENTIAL EQUATIONS - THE WIENER PROCESS WITH DRIFT

In this section we will take a first look at stochastic differential equations involving Wiener processes. A more detailed account will be given in the next chapter.

The increment in a standard Wiener process in a small time interval (t, t + Δt] is

ΔW(t) = W(t + Δt) − W(t),

and we know from above that ΔW is normally distributed with mean zero and variance Δt. We use a similar notation as in differential calculus and use the symbol dW(t) or dW to indicate the limiting increment or stochastic differential as Δt → 0. The simplest stochastic differential equation involving a Wiener process is thus

dX = dW,   (11.9)

which states that the increments in X are those of W. The solution of (11.9) is

X(t) = X(0) + W(t),

which states that the value of the process X at time t, namely the random variable X(t), is equal to the sum of two random variables: the initial value X(0) and the value of a standard Wiener process at time t. Equation (11.9) is interpreted more rigorously as the corresponding integral statement

∫_0^t dX(t') = ∫_0^t dW(t'),

whose meaning will be explained in section 12.5. This gives

X(t) − X(0) = W(t) − W(0) = W(t),

which is the same as the solution given above, because from the definition, W(0) = 0, identically.

Notice that when writing stochastic differential equations involving a Wiener process, we usually avoid writing time derivatives because, as we have seen, these do not, strictly speaking, exist. However, we can, if we are careful in our interpretation, just as well write (11.9) as

dX/dt = w,

where w is white noise.

We may perform simple algebraic operations on a standard Wiener process. For example, we can form a new process whose value at time t is obtained by multiplying W(t) by a constant σ, usually assumed to be positive; adding a linear function of time μt, where μ can be negative, zero, or positive; and giving a particular initial value X(0) = x0:

X(t) = x0 + μt + σW(t).   (11.10)

This defines a Wiener process with drift μt and variance parameter σ. The drift function μt here is linear, though any other deterministic function of time can be added instead of μt. For the random process defined by (11.10) we write the stochastic differential equation

dX = μ dt + σ dW,   (11.11)

and say that (11.10) is a solution of (11.11) with a particular initial value.

The following properties of a Wiener process with drift will be verified in the exercises:

E[X(t)] = x0 + μt,
Cov[X(s), X(t)] = σ² min(s, t),
Var[X(t)] = σ²t.
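These properties can be checked by simulation. The following Python sketch (an added illustration; the parameter values x0 = 1, μ = 0.5, σ = 1 are arbitrary choices) builds paths of (11.10) and compares the sample mean and variance of X(T) with x0 + μT and σ²T:

```python
import numpy as np

rng = np.random.default_rng(4)

x0, mu, sigma = 1.0, 0.5, 1.0        # illustrative parameters
T, n, npaths = 1.0, 500, 10_000
dt = T / n

# Wiener process with drift, equation (11.10): X(t) = x0 + mu*t + sigma*W(t)
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(npaths, n)), axis=1)
t = np.arange(1, n + 1) * dt
X = x0 + mu * t + sigma * W

print("E[X(T)]  : sample", X[:, -1].mean(), " theory", x0 + mu * T)
print("Var[X(T)]: sample", X[:, -1].var(),  " theory", sigma**2 * T)
```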
To obtain the probability density function for the Wiener process with drift, as defined by (11.10), we note, as proven in introductory probability theory, that linear operations on a Gaussian random variable produce another Gaussian random variable. Thus X(t) must be a Gaussian random variable with mean and variance as given above. Its probability density function, conditioned on an initial value x0, is defined through either

p(x, t | x0) = lim_{Δx→0} Pr{x < X(t) ≤ x + Δx | X(0) = x0} / Δx

or

p(x, t | x0)Δx ≈ Pr{x < X(t) ≤ x + Δx | X(0) = x0},   t > 0, −∞ < x0, x < ∞.

This density must be given by

p(x, t | x0) = (1/√(2πσ²t)) exp(−(x − x0 − μt)²/(2σ²t)).   (11.12)

(Note that when dealing with continuous random variables as we are here, we can put < rather than ≤ in inequalities because single points make no contribution.)

Figure 11.4 A depiction of a few sample paths for a Wiener process with drift X(t) = x0 + μt + σW(t) with x0 = 1, μ = ½, and σ = 1.

In anticipation of the material in the next chapter, we mention that the function p(x, t | x0), given in (11.12), satisfies a simple partial differential equation called a heat equation. This will be familiar to students either from calculus or physics courses and here takes the form

∂p/∂t = −μ ∂p/∂x + (σ²/2) ∂²p/∂x²,   (11.13)

as will be verified in the exercises. It can be seen, therefore, that asserting that the probability density of a Markov process satisfies this partial differential equation is, for all intents and purposes, the same as saying that the process is a Wiener process with drift. Figure 11.4 illustrates how a Wiener process with drift might behave in the case of a positive drift, with drift parameter μ = ½ and variance parameter σ = 1 when x0 = 1.
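The verification left to the exercises can be sketched in a few lines (a worked computation added here for convenience). Writing u = x − x0 − μt and v = σ²t, the density (11.12) is p = (2πv)^{−1/2} e^{−u²/(2v)}, and differentiating:

```latex
% Verification that (11.12) solves the heat equation (11.13),
% with u = x - x_0 - \mu t and v = \sigma^2 t.
\begin{align*}
\frac{\partial p}{\partial x} &= -\frac{u}{v}\,p, \qquad
\frac{\partial^2 p}{\partial x^2} = \left(\frac{u^2}{v^2} - \frac{1}{v}\right) p, \\[4pt]
\frac{\partial p}{\partial t} &= \left[\frac{\mu u}{v}
  + \frac{\sigma^2}{2}\left(\frac{u^2}{v^2} - \frac{1}{v}\right)\right] p
  = -\mu \frac{\partial p}{\partial x}
  + \frac{\sigma^2}{2}\,\frac{\partial^2 p}{\partial x^2}.
\end{align*}
```

The last equality is exactly (11.13).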
11.5 TRANSITION PROBABILITIES AND THE CHAPMAN-KOLMOGOROV EQUATION

Before considering a wide class of random processes which can be succinctly described in the language of stochastic differential equations, we will lay the groundwork for an analytical approach to studying their properties.

We saw in Chapter 8 that the fundamental descriptive quantity for Markov chains in discrete time was the set (or matrix) of transition probabilities. For the processes we considered, it was sufficient to specify the one-step transition probabilities, as the probabilities of all other transitions could be obtained from them. In particular, if the initial probability distribution was specified, the probability distribution of the process could be obtained at any time point - see Equation (8.11). Similarly, in Chapter 9, we saw that a set of transition probabilities could be used to quantitatively describe the evolution of Markov chains in continuous time.

The processes we are concerned with here are Markov processes in continuous time which take on a continuous set of values. The evolution of such processes is also specified by giving a set of transition probabilities, as alluded to in the case of a Wiener process with drift. In general, let {X(t), t ≥ 0} be such a process. Then the transition probability distribution function gives the probability distribution of the value of the process at a particular time, conditioned on a known value of the process at some earlier time.

Definition
Let X be a continuous time random process taking on a continuous set of values. The transition probability distribution function P(y, t | x, s), with s ≤ t, is the distribution function of X(t) conditioned on the event X(s) = x. Thus,

P(y, t | x, s) = Pr{X(t) ≤ y | X(s) = x}.

If the initial value of the process is random with probability density f(x), then to find the distribution of X(t) at t > 0 we have to integrate over all possible initial values x, weighted with f(x)dx and with the probability of a transition from x to y.

12.5 STOCHASTIC INTEGRALS AND STOCHASTIC DIFFERENTIAL EQUATIONS

We have seen in section 11.4 that a Wiener process with drift can be characterized by the stochastic differential equation

dX = μ dt + σ dW.

The correct interpretation of this equation is in terms of an integral involving a Wiener process - called a stochastic integral. There are a large number of integrals which one may define in connection with random processes. Mathematical complexities arise when integrals involving W are considered because of the irregular properties of the paths of W. This means that the methods of defining integrals given in real-variable calculus courses cannot be used. We will consider stochastic integrals very briefly and somewhat superficially - there are numerous technical accounts - see for example Gihman and Skorohod (1972), Arnold (1974), Liptser and Shiryayev (1977) or Oksendal (1985).

Our main purpose is to enable the reader to understand and know how to use a stochastic differential equation of the general form

dX(t) = f(X(t), t) dt + g(X(t), t) dW(t).

Equivalently, dropping the reference to t in the random processes, we can write this as

dX = f(X, t) dt + g(X, t) dW,   (12.20)

where f and g are real-valued functions, W is a standard Wiener process and X is a random process which in cases of interest will be a diffusion process. However, it must be stated at the outset that (12.20) does not always have a unique interpretation. This situation arises for the following reason. Equation (12.20) is interpreted correctly as implying the stochastic integral equation

X(t) = X(0) + ∫_0^t f(X(t'), t') dt' + ∫_0^t g(X(t'), t') dW(t'),   (12.21)

and the process X so defined is called a solution of the stochastic differential equation (12.20). Although the first integral here presents no problems, there are many ways of defining the second one,

∫_0^t g(X(t'), t') dW(t'),

which is called a stochastic integral. Furthermore, the different definitions can lead to various solutions, X, with quite different properties. Despite this apparent ambiguity, there are two useful definitions which are most commonly employed - the Ito stochastic integral and the Stratonovich stochastic integral; and there is a simple relation between these two.

A note on notation. It is preferable in (12.20) not to 'divide' throughout by dt, because as we have seen, the derivatives of W and hence of X do not exist in the usual sense. However, as long as we keep that in mind, it is possible to display (12.20) as a stochastic differential equation involving white noise w, the 'derivative' with respect to time t of W (see section 11.3):

dX/dt = f(X, t) + g(X, t)w,

or perhaps even

dX/dt = f(X, t) + g(X, t)Ẇ.

Stochastic differential equations written in this form are often called Langevin equations.

Let us now make an important observation on the stochastic differential equation (12.20). If the function g is identically zero, the differential equation is deterministic and can be written in the usual way

dX/dt = f(X, t).

Assuming the initial value X(0) = x0 is not random, then X(t) is non-random for all t and this equation is solved in the usual way. We expect that the behaviour of solutions of this deterministic equation would be related to those of the stochastic differential equation (12.20), and be close to them when the noise term g is small. We would be correct in believing that, in particular, the expected value E[X(t)] of the solution of (12.20) would not be very far, in most cases, from the solution of the deterministic equation.

This can be illustrated nicely with the Wiener process with linear drift μt and variance parameter σ. From section 11.4, this process has the stochastic differential equation (11.11):

dX = μ dt + σ dW.

Here f(X, t) = μ and g(X, t) = σ.
If we put σ = 0 we obtain the deterministic differential equation

dX/dt = μ.

The solution of this with initial value x0 is X(t) = x0 + μt, and this, as seen in section 11.4, is equal to the mean value function of the process satisfying (11.11). The added noise makes the paths of X very irregular, but the mean value is still x0 + μt. This was depicted in Fig. 11.4.

Heuristic interpretation
Before proceeding more formally, let us describe roughly how we can understand an equation of the form (12.20). This can perhaps best be accomplished by writing the related difference equation

ΔX = f(X, t)Δt + g(X, t)ΔW.   (12.22)

Here we may regard the (random) increment in X in the time interval (t, t + Δt] as having two components. The first component is equal to the value of f(X, t) at the beginning of the time interval multiplied by the length Δt of the time interval. The second component is the value of g(X, t) at the beginning of the time interval, multiplied by the (random) increment ΔW = W(t + Δt) − W(t) that occurs in a standard Wiener process in Δt. As we have seen, ΔW is a Gaussian random variable with mean zero and variance Δt. We have essentially outlined a method of simulation of the stochastic differential equation (12.20) - this will be elaborated on below. It should be realized, however, that even though the functions f and g are functions in the usual deterministic sense, both the components of the increment in X, namely fΔt and gΔW, are random variables, because they depend on the random value of X at time t.
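The difference equation (12.22) is exactly the recipe behind the simplest simulation scheme for (12.20), usually called the Euler-Maruyama method. Here is a minimal Python sketch (an added illustration; the drift and noise functions in the example are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(5)

def euler_maruyama(f, g, x0, T, n):
    """Simulate dX = f(X,t) dt + g(X,t) dW on [0, T] via (12.22)."""
    dt = T / n
    t, x = 0.0, x0
    path = [x0]
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))        # N(0, dt) increment of W
        x = x + f(x, t) * dt + g(x, t) * dW      # dX = f dt + g dW
        t += dt
        path.append(x)
    return np.array(path)

# Example (a hypothetical choice): dX = -2X dt + dW, started at X(0) = 1.
path = euler_maruyama(f=lambda x, t: -2.0 * x,
                      g=lambda x, t: 1.0,
                      x0=1.0, T=5.0, n=5000)
print("X(T) =", path[-1])
```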