show that the probability of extinction is

x = \frac{1 - \sqrt{1 - 4p(1-p)}}{2(1-p)}.

8. Use Fig. 10.4b to show graphically that x_n → x* when the mean number of offspring per individual is greater than 1.

9. A branching process has initially one individual. Use the law of total probability in the form

Pr(extinction) = \sum_k Pr(extinction | k descendants) Pr(k descendants)

to deduce that the extinction probability x is a solution of x = P(x).

10. Let {X_n, n = 0, 1, 2, ...} be a branching process with X_0 = 1 and with the number of offspring per individual 0, 1, 2 with probabilities p, q, r, respectively, where p + q + r = 1 and p, q, r > 0. Show that if q + 2r > 1, the probability of extinction is

x = \frac{1 - q - \sqrt{(1-q)^2 - 4pr}}{2r}.

11. Assume, very roughly speaking, that a human population is a branching process. What is the probability of extinction if the proportions of families having 0, 1 or 2 children are 0.2, 0.4 and 0.4, respectively?

11 Stochastic processes and an introduction to stochastic differential equations

11.1 DETERMINISTIC AND STOCHASTIC DIFFERENTIAL EQUATIONS

A differential equation usually expresses a relation between a function and its derivatives. For example, if t ≥ t_0 represents time, and the rate of growth of a quantity y(t) is proportional to the amount y(t) already present, then we have

dy/dt = ky,    (11.1)

where k is a constant of proportionality. Equation (11.1) is called a first-order differential equation because the highest order derivative appearing is the first derivative. It is also called linear because both y and its derivative occur raised to the power 1. Equation (11.1) may be viewed as a prescription or mathematical model for finding y at all times subsequent to (or before) a given time t_0 at which the value y_0 of y is known. This is expressed in the solution of (11.1),

y(t) = y_0 e^{k(t - t_0)},    (11.2)

which has the same form as the Malthusian population growth law of Section 9.1. It is also a formula for finding an asset value with compound interest when the initial value is y_0.

In the natural sciences (biology, chemistry, physics, etc.), differential equations have provided a concise method of summarizing physical principles. An important example of a nonlinear first-order differential equation is Verhulst's logistic equation,

dy/dt = ry(1 - y/y*),    (11.3)

with r > 0 and y* > 0. This equation is frequently used to model the growth of populations of organisms. The quantity y* is called the carrying capacity whereas r is called the intrinsic growth rate. It will be seen in Exercise 1 that the solution of (11.3) which passes through the value y_0 at time t = t_0 is

y(t) = \frac{y^*}{1 + (y^*/y_0 - 1) e^{-r(t - t_0)}}.    (11.4)

Figure 11.1 shows how populations evolve for different starting values. As t → ∞ the population approaches the value y* asymptotically, which explains the term carrying capacity. Since its inception by the Belgian mathematician Verhulst (1838), the logistic equation has been used for many different populations, including those of cancer cells (Thompson and Brown, 1987) as well as human populations of countries (Pearl and Reed, 1920), continents and the world (Tuckwell and Koziol, 1992, 1993).

The differential equations (11.1) and (11.3) we have considered thus far are called deterministic because a given initial value determines the solution completely for all subsequent times. The behaviour of the solution is totally predictable and there are no chance elements. Put another way, the trajectory y(t) is fixed (it is a particular function) and there are no haphazard or random fluctuations.
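As a quick illustrative check of the closed-form solution (11.4), the following sketch compares it with a direct numerical (forward Euler) integration of (11.3). It is not part of the original text; the function names and all parameter values are arbitrary choices made for illustration.

import math

def logistic_exact(t, t0, y0, r, ystar):
    # Closed-form solution (11.4) of dy/dt = r*y*(1 - y/ystar) with y(t0) = y0
    return ystar / (1.0 + (ystar / y0 - 1.0) * math.exp(-r * (t - t0)))

def logistic_euler(t, t0, y0, r, ystar, n_steps=100000):
    # Simple forward-Euler integration of (11.3) for comparison
    dt = (t - t0) / n_steps
    y = y0
    for _ in range(n_steps):
        y += dt * r * y * (1.0 - y / ystar)
    return y

r, ystar, y0, t0 = 0.5, 100.0, 5.0, 0.0   # arbitrary illustrative values
for t in (1.0, 5.0, 10.0, 20.0):
    print(t, logistic_exact(t, t0, y0, r, ystar), logistic_euler(t, t0, y0, r, ystar))

The two computed values agree closely at each t, and both approach the carrying capacity y* = 100 as t increases.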
Figure 11.1 Solutions of the logistic differential equation for various initial values.

Deterministic differential equations proved to be extremely powerful in some branches of classical physics and chemistry, but at the beginning of the twentieth century the study of atomic and subatomic systems indicated that deterministic theories were inadequate. Thus quantum mechanics, which is fundamentally probabilistic, was formulated to describe changes in very small systems (see, for example, Schiff, 1955). Furthermore, in complex systems containing millions or billions of interacting particles, the application of deterministic methods would have been so laborious that scientists also devised probabilistic methods for them. Such considerations for large collections of atoms or molecules led to the discipline of statistical mechanics (see, for example, Reichl, 1980).

In the latter part of the twentieth century quantitative methods have become increasingly widely used in the study of intrinsically complex systems such as arise in biology and economics. The use of deterministic methods is limited, so there has been a large and rapid development in the application of probabilistic methods. One such very useful concept has been that of stochastic differential equations.

In the case of deterministic differential equations, which are useful for quantitatively describing the evolution of natural systems, the solution is uniquely determined, usually by imposing a starting value and possibly other constraints. In the case of stochastic differential equations there are several possible trajectories or paths over which the system of interest may evolve. It is not known which of these trajectories will be followed, but one can often find the probabilities associated with the various paths.

Figure 11.2a The three records on the left (A) show the fluctuations in the resting electrical potential difference across a nerve cell membrane. These fluctuations can be modelled with a stochastic differential equation involving a Wiener process - see section 12.7. On the right (B) is shown a histogram of amplitudes of the fluctuations, fitted with a normal density (from Jack, Redman and Wong, 1981).

Figure 11.2b Here are shown the fluctuations in the price of a share (Coles-Myer Limited) from week to week over a period of a few years. Such fluctuations can also be modelled using a stochastic differential equation - see section 12.7.

The situation is similar to that in the simple random walk which we studied in Chapter 7, except that in most cases the time variable is continuous rather than discrete. We could say that the quantity we are looking at wanders all over the place in a random and thus unpredictable fashion. Physical examples of quantities which might be modelled with stochastic differential equations are illustrated in Figs 11.2a and 11.2b. In the first of these we show a record of fluctuations in the electrical potential difference across the membrane of a nerve cell in a cat's spinal cord (a spinal motoneurone which receives messages from the brain and sends messages to a muscle fibre, which may result in a movement). In the second example, the weekly variations in the price of an industrial share are shown from May 1990 to January 1993.
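The contrast between the two kinds of model can be made concrete with a small simulation. The sketch below is an illustrative aside, not part of the original text: it uses the discrete-time simple random walk of Chapter 7 as a stand-in for a stochastic model, and all names and parameter values are arbitrary. The deterministic recursion gives the same trajectory on every run, whereas repeated runs of the random walk generally give different paths.

import random

def deterministic_path(y0, k, dt, n_steps):
    # Euler steps for the deterministic equation dy/dt = k*y: one fixed trajectory
    y, path = y0, [y0]
    for _ in range(n_steps):
        y += k * y * dt
        path.append(round(y, 4))
    return path

def random_walk_path(n_steps, p=0.5):
    # Simple random walk (Chapter 7): step +1 with probability p, -1 otherwise
    x, path = 0, [0]
    for _ in range(n_steps):
        x += 1 if random.random() < p else -1
        path.append(x)
    return path

print(deterministic_path(1.0, 0.1, 0.1, 10))   # identical every time it is run
for _ in range(3):
    print(random_walk_path(10))                # typically a different path each time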
11.2 THE WIENER PROCESS (BROWNIAN MOTION)

The most useful stochastic differential equations have proven to be those which involve either Wiener processes or Poisson processes. When Wiener processes are involved, the solutions are usually continuous, whereas when Poisson processes are involved the solutions exhibit jumps. Most of our discussion focuses on continuous processes, so that our immediate concern is to define Wiener processes and discuss their properties.

In Section 7.8 we considered a simple random walk and let the step size get smaller as the rate of their occurrence increased. We took this to the limit of zero step sizes and an infinite rate of occurrence, but did so in such a way that the variance at any time neither vanished nor became unbounded. In fact, the variance of the limiting random process at time t was made to equal t. The symbol we employ for the limiting process, which we call the Wiener process, is W = {W(t), t ≥ 0}. However, this process can be defined in a more general way, which makes no reference to limiting operations. In this section we will give this more general definition, and discuss some of the elementary yet important properties of W.

Before we give this definition we will define a large class of processes to which both the Wiener process and Poisson process belong. This consists of those processes whose behaviour during any time interval is independent of their behaviour during any non-overlapping time interval. We will restrict our attention to processes whose index set (see section 7.1) is continuous.

Definition
Let X = {X(t)} be a random process with a continuous parameter set [0, T], where 0 < T < ∞. Let n ≥ 2 be an integer and suppose 0 ≤ t_0 < t_1 < t_2 < ⋯ < t_n ≤ T. Then X is said to be a random process with independent increments if the n random variables

X(t_1) - X(t_0), X(t_2) - X(t_1), ..., X(t_n) - X(t_{n-1})

are independent.

Thus, increments in X which occur in disjoint time intervals are independent. This implies that the evolution of the process after any time s > 0 is independent of the history up to and including s. Thus any process with independent increments is a Markov process, as will be shown formally in the exercises. The converse is not true. We have already encountered one example of an independent-increment process in section 9.2 - the Poisson process.

Before defining a Wiener process, we mention that if the distributions of the increments of a process in various time intervals depend only on the lengths of those intervals and not their locations (i.e., their starting values), then the increments are said to be stationary. In section 9.2 we saw that for a Poisson process N = {N(t), t ≥ 0}, the random increment N(t_2) - N(t_1) is Poisson distributed with a parameter proportional to the length of the interval (t_1, t_2]. Thus a Poisson process has stationary independent increments.

Definition
A standard Wiener process W = {W(t), t ≥ 0}, on [0, T], is a process with stationary independent increments such that for any 0 ≤ t_1 < t_2 ≤ T, the increment W(t_2) - W(t_1) is a Gaussian random variable with mean zero and variance equal to t_2 - t_1; i.e.,

Var[W(t_2) - W(t_1)] = t_2 - t_1.

Furthermore, with probability 1, W(0) = 0.

The probability density p(x; t_1, t_2) of the increment of W in the interval (t_1, t_2] is defined through

Pr{W(t_2) - W(t_1) ∈ (x, x + Δx]} = p(x; t_1, t_2)Δx + o(Δx).
From the definition of W we see that this is given by

p(x; t_1, t_2) = \frac{1}{\sqrt{2\pi(t_2 - t_1)}} \exp\left( -\frac{x^2}{2(t_2 - t_1)} \right).    (11.5)

In the case t_1 = 0, it is seen that the random variable W(t_2) has mean 0 and variance t_2. Thus, for any t > 0, W(t) is a Gaussian random variable with mean 0 and variance t, so that its probability density p(x; t) is given by the simple expression

p(x; t) = \frac{1}{\sqrt{2\pi t}} \exp\left( -\frac{x^2}{2t} \right).

The word 'standard' in the definition refers to the fact that the mean is zero, the variance at t is t and the initial value is zero.

Sample paths

It can be proved for the random process defined above that the sample paths or trajectories are continuous with probability one. Sample paths are also called realizations and correspond to a 'value' of the process when an experiment is performed. That is, supposing it is possible to observe a standard Wiener process over the time interval [0, T], we would see, with probability one, a continuous function starting at the origin, wandering around haphazardly and reaching some random end-value W(T) - as in Fig. 11.3.

Figure 11.3 A depiction of a few sample paths for a Wiener process.

Note, however, that there are possibly discontinuous paths but these have zero probability associated with them. Usually, attention is restricted to those paths which are in fact continuous and in fact continuity of sample paths is often included in the definition. This is a convenient way to discard the problem of the discontinuous paths. Although the probability of finding a continuous trajectory for W is one, the probability is zero that at any time t ∈ [0, T] the path is differentiable. This is considered to be a pathological property and is one reason why a study of the Wiener process has been so interesting to mathematicians. This, and the fact that sample paths have unbounded variation, are proved and elaborated on in, for example, Hida (1980). An elementary consideration is given in Exercise 3.

Mean value and covariance function

An important property of a random process X is its mean at time t, E(X(t)), which is often called its mean value function, being a function of t alone. We have the mean and variance of W(t) immediately from the above definition. To further understand the behaviour of a random process, it is useful to know how its value at any time is connected with its value at any other time. Although knowing the joint probability distribution of these values would be nice, we may be content with a rougher indication. To this end we make the following definition.

Definition
The covariance function of a random process is the covariance (cf. Chapter 1) of the values of the process at two arbitrary times.

Note that sometimes the covariance function is called an autocovariance function to distinguish it from a covariance between two different processes. It is also useful to define a class of processes whose covariance function depends only on the difference between the times at which it is evaluated and not on their location.

Definition
If the covariance function Cov(X(s), X(t)) depends only on |t - s|, the random process X is said to be covariance stationary. Other terms for this are wide-sense stationary or weakly stationary.

If X is a weakly stationary process, we may put Cov(X(s), X(s + τ)) = R(τ). We can see for such a process that (see Exercises):
(a) the mean value function is a constant; and,
(b) the covariance function is an even function: R(τ) = R(-τ).
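Before turning to the covariance function of the Wiener process itself, the defining construction can be illustrated by simulation. The sketch below is not part of the original text; it builds approximate Wiener values from independent Gaussian increments of variance equal to the step length, and checks that the sampled values at time T have mean near 0 and variance near T. Function names, the seed and all sample sizes are arbitrary choices.

import random

def wiener_endpoint(T, n_steps):
    # Approximate W(T): start at W(0) = 0 and add independent N(0, dt) increments
    dt = T / n_steps
    w = 0.0
    for _ in range(n_steps):
        w += random.gauss(0.0, dt ** 0.5)
    return w

random.seed(0)
T, n_paths, n_steps = 1.0, 5000, 200
samples = [wiener_endpoint(T, n_steps) for _ in range(n_paths)]
mean = sum(samples) / n_paths
var = sum((w - mean) ** 2 for w in samples) / n_paths
print(mean, var)   # sample mean near 0, sample variance near T = 1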
In the case of a standard Wiener process we will see that the following is true.

The covariance function of a standard Wiener process is

Cov(W(s), W(t)) = min(s, t),

where min(·, ·) is defined as the smaller of the two arguments.

Proof
We utilize the fact that the increments of a Wiener process over disjoint (non-overlapping) time intervals are independent random variables and hence have covariance equal to zero. With s < t we have

Cov[W(s), W(t) - W(s)] = 0.

The quantity we seek can be written

Cov[W(s), W(t) + W(s) - W(s)].

But in general, if A, B and C are three random variables (see Exercises),

Cov[A, B + C] = Cov[A, B] + Cov[A, C].

Thus,

Cov[W(s), W(t)] = Cov[W(s), W(t) - W(s)] + Cov[W(s), W(s)] = Cov[W(s), W(s)] = Var[W(s)] = s.

Had t been less than s we would have obtained t instead of s. Hence the covariance is the smaller of s and t, which proves the result.

Note that the Wiener process is therefore not covariance-stationary, as the covariance of W(s) and W(t) depends directly on the magnitude of the smaller of s and t. For further information on the topics we have dealt with in this section, the reader may consult Papoulis (1965), Parzen (1962) and Yaglom (1973).

11.3 WHITE NOISE

Although the Wiener process is of central importance in the theory of stochastic differential equations, there is a useful related concept, called white noise, which we introduce in this section.

The paths traced out by a Wiener process are, with probability one, not differentiable. However, it is often convenient to talk about the derivative of W as if it did exist. We use the symbol w(t) for the 'derivative' of W(t), and we call the random process w = {w(t), t ≥ 0} (Gaussian) white noise. However, it must be remembered that, strictly speaking, this process does not have a well-defined meaning - it is nevertheless heuristically useful.

The word noise, of course, refers to unwanted signals. If you are in a crowded cafeteria or football stadium, or surrounded by dense city traffic, and you close your eyes and listen, you will hear a noise that seems an amorphous assortment of meaningless sounds; you generally won't be able to pick out particular signals unless they originate close by. This kind of background noise is an acoustic approximation to white noise. Sound engineers have devices called white noise generators which are used to test the acoustic properties of rooms - the basic idea is to subject the chamber to all frequencies at once.

The mean value and covariance functions of white noise can be obtained from those of a Wiener process - as will be seen in the exercises. These turn out to be

E[w(t)] = 0,    Cov[w(s), w(t)] = δ(t - s).    (11.6)

Thus the covariance is zero whenever s ≠ t and is very large when s = t.

Covariance functions are often decomposed to see if there are regularities present, especially in the form of periodicities or harmonics of various frequencies. Such a decomposition is done using the following definition. Note that we restrict our attention to real-valued processes.

Definition
The spectral density S(k) of a covariance-stationary random process whose covariance function is R(t), t > 0, is given by the integral

S(k) = \int_{-\infty}^{\infty} \cos(kt) R(t) \, dt.    (11.7)

The reader may recognize this as the Fourier transform of R(t), recalling that the latter is here an even function of t. Another name for S(k) is the power spectrum - it indicates the contributions from various frequencies to the total activity of the process.
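To see the definition in action, the short numerical sketch below evaluates the integral in (11.7) by the trapezoidal rule. It is an illustrative aside, not part of the original text: the covariance function R(t) = e^{-|t|} cos(k_0 t), the value k_0 = 5 and the integration settings are hypothetical choices. The computed spectral density is largest near k = k_0, the frequency of the oscillation present in R, anticipating the discussion of harmonics that follows.

import math

K0 = 5.0   # hypothetical dominant frequency in the covariance function

def R(t):
    # A hypothetical covariance function: a damped oscillation at frequency K0
    return math.exp(-abs(t)) * math.cos(K0 * t)

def spectral_density(k, t_max=40.0, n=40000):
    # Trapezoidal approximation to (11.7); since R is even, integrate over [0, t_max]
    # and double the result (the integrand is negligible beyond t_max here).
    dt = t_max / n
    total = 0.5 * R(0.0)
    for i in range(1, n + 1):
        t = i * dt
        weight = 0.5 if i == n else 1.0
        total += weight * math.cos(k * t) * R(t)
    return 2.0 * dt * total

for k in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(k, round(spectral_density(k), 4))   # the largest value occurs near k = K0 = 5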
A knowledge of the spectral density can be used to obtain the covariance function using the following inversion formula, which is proved in courses of analysis (see, for example, Wylie, 1960),

R(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S(k) \cos(kt) \, dk.    (11.8)

Let us see how various harmonics in R(t) manifest themselves in S(k). Suppose S(k) were very much concentrated around the single frequency k_0, so we might put S(k) = δ(k - k_0). Then

R(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \delta(k - k_0) \cos(kt) \, dk = \frac{1}{2\pi} \cos(k_0 t),

where we have used the substitution property of the delta function (formula (3.13)). Thus we see that a very large peak in the spectral density S(k) comes about at k_0 if there is a single dominant frequency k_0/2π in the covariance function R(t).

Let us consider white noise w(t) from this point of view. We have, from Equation (11.6), R(t) = δ(t). Substituting this in the definition of the spectral density gives

S(k) = \int_{-\infty}^{\infty} \cos(kt) \delta(t) \, dt = 1,

where we have used the substitution property and the fact that cos(0) = 1. This tells us that the spectral density of white noise is a constant, independent of the frequency. That is, all frequencies contribute equally, from -∞ to ∞, whereby we can see the analogy with 'white light'. Hence the description of the derivative of a Wiener process as (Gaussian) white noise. It is realized that it is not physically possible to have frequencies over such a huge range. In engineering practice white noise generators have cut-off frequencies at finite values - they are called band-limited white noises. Sometimes white noise is called delta-correlated noise.

11.4 THE SIMPLEST STOCHASTIC DIFFERENTIAL EQUATIONS - THE WIENER PROCESS WITH DRIFT

In this section we will take a first look at stochastic differential equations involving Wiener processes. A more detailed account will be given in the next chapter.

The increment in a standard Wiener process in a small time interval (t, t + Δt] is

ΔW(t) = W(t + Δt) - W(t),

and we know from above that ΔW is normally distributed with mean zero and variance Δt. We use a similar notation as in differential calculus and use the symbol dW(t) or dW to indicate the limiting increment or stochastic differential as Δt → 0. The simplest stochastic differential equation involving a Wiener process is thus

dX = dW,    (11.9)

which states that the increments in X are those of W. The solution of (11.9) is

X(t) = X(0) + W(t),    (11.10)

which states that the value of the process X at time t, namely the random variable X(t), is equal to the sum of two random variables: the initial value X(0) and the value of a standard Wiener process at time t. Equation (11.9) is interpreted more rigorously as the corresponding integral

\int_0^t dX(t') = \int_0^t dW(t'),

whose meaning will be explained in section 12.5. This gives

X(t) - X(0) = W(t) - W(0) = W(t),

which is the same as (11.10) because, from the definition, W(0) = 0, identically.

Notice that when writing stochastic differential equations involving a Wiener process, we usually avoid writing time derivatives because, as we have seen, these do not, strictly speaking, exist. However, we can, if we are careful in our interpretation, just as well write (11.9) as

dX/dt = w(t),

where w is white noise.

We may perform simple algebraic operations on a standard Wiener process. For example, we can form a new process whose value at time t is obtained by multiplying W(t) by a constant σ > 0, adding a nonrandom drift term μt, and starting from an initial value x_0. The resulting process,

X(t) = x_0 + μt + σW(t),    (11.11)

is called a Wiener process with drift; μ is called the drift parameter and σ the variance parameter. The transition probability density p(x, t|x_0) of X(t), given the initial value X(0) = x_0, is defined through

Pr{X(t) ∈ (x, x + Δx] | X(0) = x_0} = p(x, t|x_0)Δx + o(Δx),    t > 0, -∞ < x_0, x < ∞.
This density must be given by

p(x, t|x_0) = \frac{1}{\sqrt{2\pi\sigma^2 t}} \exp\left( -\frac{(x - x_0 - \mu t)^2}{2\sigma^2 t} \right).    (11.12)

(Note that when dealing with continuous random variables, as we are here, we can put < rather than ≤ in inequalities because single points make no contribution.)

Figure 11.4 A depiction of a few sample paths for a Wiener process with drift, X(t) = x_0 + μt + σW(t), with x_0 = 1, μ = ½ and σ = 1.

In anticipation of the material in the next chapter, we mention that the function p(x, t|x_0) given in (11.12) satisfies a simple partial differential equation called a heat equation. This will be familiar to students either from calculus or physics courses and here takes the form

\frac{\partial p}{\partial t} = -\mu \frac{\partial p}{\partial x} + \frac{\sigma^2}{2} \frac{\partial^2 p}{\partial x^2},    (11.13)

as will be verified in the exercises. It can be seen, therefore, that asserting that the probability density of a Markov process satisfies this partial differential equation is, for all intents and purposes, the same as saying that the process is a Wiener process with drift. Figure 11.4 illustrates how a Wiener process with drift might behave in the case of a positive drift, with drift parameter μ = ½ and variance parameter σ = 1 when x_0 = 1.

11.5 TRANSITION PROBABILITIES AND THE CHAPMAN-KOLMOGOROV EQUATION

Before considering a wide class of random processes which can be succinctly described in the language of stochastic differential equations, we will lay the groundwork for an analytical approach to studying their properties.

We saw in Chapter 8 that the fundamental descriptive quantity for Markov chains in discrete time was the set (or matrix) of transition probabilities. For the processes we considered, it was sufficient to specify the one-step transition probabilities, as the probabilities of all other transitions could be obtained from them. In particular, if the initial probability distribution was specified, the probability distribution of the process could be obtained at any time point - see Equation (8.11). Similarly, in Chapter 9, we saw that a set of transition probabilities could be used to quantitatively describe the evolution of Markov chains in continuous time.

The processes we are concerned with here are Markov processes in continuous time which take on a continuous set of values. The evolution of such processes is also specified by giving a set of transition probabilities, as alluded to in the case of a Wiener process with drift. In general, let {X(t), t ≥ 0} be such a process. Then the transition probability distribution function gives the probability distribution of the value of the process at a particular time, conditioned on a known value of the process at some earlier time.

Definition
Let X be a continuous time random process taking on a continuous set of values. The transition probability distribution function P(y, t|x, s), with s ≤ t, is the distribution function of X(t) conditioned on the event X(s) = x.

Thus,

P(y, t|x, s) = Pr{X(t) ≤ y | X(s) = x},

and, when this distribution function is differentiable in y, the corresponding transition probability density is p(y, t|x, s) = ∂P(y, t|x, s)/∂y.

If the initial value X(0) is a random variable with probability density f(x), then to find the probability distribution of X(t) at a time t > 0, we have to integrate over all possible initial values x, weighted with f(x)dx and with the probability of a transition from x to y:

Pr{y < X(t) ≤ y + Δy} = Δy \int_{-\infty}^{\infty} p(y, t|x, 0) f(x) \, dx + o(Δy).

For a Markov process, transitions over successive time intervals combine according to the Chapman-Kolmogorov equation,

p(y, t|x, s) = \int_{-\infty}^{\infty} p(y, t|z, u) p(z, u|x, s) \, dz,    s < u < t,

which states that a transition from x at time s to y at time t must pass through some intermediate value z at any intermediate time u.

To study such a process one may work with the random process {X(t), t ≥ 0} directly, or one may work with the transition probability functions. The latter approach is called the analytical approach and is often more useful than the direct approach because it involves solving differential equations, which is a long and much-studied discipline. The direct approach usually involves stochastic integrals which we shall consider in section 12.5.

REFERENCES

Hida, T. (1980). Brownian Motion. Springer-Verlag, New York.
Jack, J.J.B., Redman, S.J. and Wong, K. (1981). The components of synaptic potentials evoked in cat spinal motoneurones by impulses in single group Ia afferents. J. Physiol., 321, 65-96.
Papoulis, A. (1965). Probability, Random Variables and Stochastic Processes. McGraw-Hill, New York.
Parzen, E. (1962). Stochastic Processes. Holden-Day, San Francisco.
Pearl, R. and Reed, L.J. (1920). On the rate of growth of the population of the United States since 1790 and its mathematical representation. Proc. Natl. Acad. Sci. USA, 6, 275-288.
Reichl, L.E. (1980). A Modern Course in Statistical Physics. University of Texas Press, Austin.
Schiff, L.I. (1955). Quantum Mechanics. McGraw-Hill, New York.
Thompson, J.R. and Brown, B.W. (eds) (1987). Cancer Modeling. Marcel Dekker, New York.
Tuckwell, H.C. and Koziol, J.A. (1992). World population. Nature, 359, 200.
Tuckwell, H.C. and Koziol, J.A. (1993). World and regional populations. BioSystems, 31, 59-63.
Verhulst, P.F. (1838). Notice sur la loi que la population suit dans son accroissement. Corr. Math. Phys., 10, 113-121.
Wylie, C.R. (1960). Advanced Engineering Mathematics. McGraw-Hill, New York.
Yaglom, A.M. (1973). An Introduction to the Theory of Stationary Random Functions. Dover, New York.

EXERCISES

1. Prove that the solution of the logistic differential equation (11.3) is in fact given by (11.4). (Hint: Separate the variables by putting the equation in the form f(y)dy = g(t)dt, and integrate each side.)

2. Show that a continuous time process with independent increments is a Markov process. (Hint: It will suffice to proceed as in Exercise 7.2; examine Pr(X(t_3) = z | X(t_2) = y, X(t_1) = x), where t_1 < t_2 < t_3.)