INTRODUCTION TO THE SERIES

The Handbooks in Finance are intended to be a definitive source for comprehensive and accessible information in the field of finance. Each individual volume in the series should present an accurate self-contained survey of a sub-field of finance, suitable for use by finance and economics professors and lecturers, professional researchers, graduate students and as a teaching supplement. The goal is to have a broad group of outstanding volumes in various areas of finance.

Chapter 1

HEAVY TAILS IN FINANCE FOR INDEPENDENT OR MULTIFRACTAL PRICE INCREMENTS

BENOIT B. MANDELBROT
Sterling Professor of Mathematical Sciences, Yale University, New Haven, CT 06520-8283, USA

Handbook of Heavy Tailed Distributions in Finance, Edited by S.T. Rachev
© 2003 Elsevier Science B.V. All rights reserved

Contents

Abstract
1. Introduction: A path that led to model price by Brownian motion (Wiener or fractional) of a multifractal trading time
1.1. From the law of Pareto to infinite moment "anomalies" that contradict the Gaussian "norm"
1.2. A scientific principle: scaling invariance in finance
1.3. Analysis alone versus statistical analysis followed by synthesis and graphic output
1.4. Actual implementation of scaling invariance by multifractal functions: it requires additional assumptions that are convenient but not a matter of principle, for example, separability and compounding
2. Background: the Bernoulli binomial measure and two random variants: shuffled and canonical
2.1. Definition and construction of the Bernoulli binomial measure
2.2. The concept of canonical random cascade and the definition of the canonical binomial measure
2.3. Two forms of conservation: strict and on the average
2.4. The term "canonical" is motivated by statistical thermodynamics
2.5. In every variant of the binomial measure one can view all finite (positive or negative) powers together, as forming a single "class of equivalence"
2.6. The full and folded forms of the address plane
2.7. Alternative parameters
3. Definition of the two-valued canonical multifractals
3.1. Construction of the two-valued canonical multifractal in the interval [0,1]
3.2. A second special two-valued canonical multifractal: the unifractal measure on the canonical Cantor dust
3.3. Generalization of a useful new viewpoint: when considered together with their powers from $-\infty$ to $\infty$, all the TVCM parametrized by either $p$ or $1 - p$ form a single class of equivalence
3.4. The full and folded address planes
3.5. Background of the two-valued canonical measures in the historical development of multifractals
4. The limit random variable $\Omega = \mu([0,1])$, its distribution and the star functional equation
4.1. The identity $EM = 1$ implies that the limit measure has the "martingale" property, hence the cascade defines a limit random variable $\Omega = \mu([0,1])$
4.2. Questions
4.3. Exact stochastic renormalizability and the "star functional equation" for $\Omega$
4.4. Metaphor for the probability of large values of $\Omega$, arising in the theory of discrete time branching processes
4.5. To a large extent, the asymptotic measure $\Omega$ of a TVCM is large if, and only if, the pre-fractal measure $\mu_k([0,1])$ has become large during the very first few stages of the generating cascade
5. The function $\tau(q)$: motivation and form of the graph
5.1. Motivation of $\tau(q)$
5.2. A generalization of the role of $\Omega$: middle- and high-frequency contributions to microrandomness
5.3. The expected "partition function" $\sum E\mu^q(d_i t)$
5.4. Form of the $\tau(q)$ graph
5.5. Reducible and irreducible canonical multifractals
6. When $u > 1$, the moment $E\Omega^q$ diverges if $q$ exceeds a critical exponent $q_{\mathrm{crit}}$ satisfying $\tau(q) = 0$; $\Omega$ follows a power-law distribution of exponent $q_{\mathrm{crit}}$
6.1. Divergent moments, power-law distributions and limits to the ability of moments to determine a distribution
6.2. Discussion
6.3. An important apparent "anomaly": in a TVCM, the $q$-th moment of $\Omega$ may diverge
6.4. An important role of $\tau(q)$: if $q > 1$ the $q$-th moment of $\Omega$ is finite if, and only if, $\tau(q) > 0$; the same holds for $\mu(dt)$ whenever $dt$ is a dyadic interval
6.5. Definition of $q_{\mathrm{crit}}$; proof that in the case of TVCM $q_{\mathrm{crit}}$ is finite if, and only if, $u > 1$
6.6. The exponent $q_{\mathrm{crit}}$ can be considered as a macroscopic variable of the generating process
7. The quantity $\alpha$: the original Hölder exponent and beyond
7.1. The Bernoulli binomial case and two forms of the Hölder exponent: coarse-grained (or coarse) and fine-grained
7.2. In the general TVCM measure, $\alpha \neq \tilde{\alpha}$, and the link between "$\alpha$" and the Hölder exponent breaks down; one consequence is that the "doubly anomalous" inequalities $\alpha_{\min} < 0$, hence $\tilde{\alpha} < 0$, are not excluded
8. The full function $f(\alpha)$ and the function $\rho(\alpha)$
8.1. The Bernoulli binomial measure: definition and derivation of the box dimension function $f(\alpha)$
8.2. The "entropy ogive" function $f(\alpha)$; the role of statistical thermodynamics in multifractals and the contrast between equipartition and concentration
8.3. The Bernoulli binomial measure, continued: definition and derivation of a function $\rho(\alpha) = f(\alpha) - 1$ that originates as a rescaled logarithm of a probability
8.4. Generalization of $\rho(\alpha)$ to the case of TVCM; the definition of $f(\alpha)$ as $\rho(\alpha) + 1$ is indirect but significant because it allows the generalized $f$ to be negative
8.5. Comments in terms of probability theory
8.6. Distinction between "center" and "tail" theorems in probability
8.7. The reason for the anomalous inequalities $f(\alpha) < 0$ and $\alpha < 0$ is that, by the definition of a random variable $\mu(dt)$, the sample size is bounded and is prescribed intrinsically; the notion of supersampling
8.8. Excluding the Bernoulli case $p = 1/2$, TVCM faces either one of two major "anomalies": for $p > 1/2$, one has $f(\alpha_{\min}) = 1 + \log_2 p > 0$ and $f(\alpha_{\max}) = 1 + \log_2(1 - p) < 0$; for $p < 1/2$, the opposite signs hold
8.9. The "minor anomalies" $f(\alpha_{\max}) > 0$ or $f(\alpha_{\min}) > 0$ lead to sample function with a clear "ceiling" or "floor"
9. The fractal dimension $D = \tau'(1) = 2[-pu\log_2 u - (1 - p)v\log_2 v]$ and multifractal concentration
9.1. In the Bernoulli binomial measures weak asymptotic negligibility holds but strong asymptotic negligibility fails
9.2. For the Bernoulli or canonical binomials, the equation $f(\alpha) = \alpha$ has one and only one solution; that solution satisfies $D > 0$ and is the fractal dimension of the "carrier" of the measure
9.3. The notion of "multifractal concentration"
9.4. The case of TVCM with $p < 1/2$, allows $D$ to be positive, negative, or zero
10. A noteworthy and unexpected separation of roles, between the "dimension spectrum" and the total mass $\Omega$; the former is ruled by the accessible $\alpha$ for which $f(\alpha) > 0$, the latter, by the inaccessible $\alpha$ for which $f(\alpha) < 0$
10.1. Definitions of the "accessible ranges" of the variables: $q$'s from $q_{\min}$ to $q_{\max}$ and $\alpha$'s from $\alpha_{\min}$ to $\alpha_{\max}$; the accessible functions $\tau(q)$ and $f(\alpha)$
10.2. A confrontation
10.3. The simplest cases where $f(\alpha) > 0$ for all $\alpha$, as exemplified by the canonical binomial
10.4. The extreme case where $f(\alpha) < 0$ and $\alpha < 0$ both occur, as exemplified by TVCM when $u > 1$
10.5. The intermediate case where $\alpha_{\min} > 0$ but $f(\alpha) < 0$ for some values of $\alpha$
11. A broad form of the multifractal formalism that allows $\alpha < 0$ and $f(\alpha) < 0$
11.1. The broad "multifractal formalism" confirms the form of $f(\alpha)$ and allows $f(\alpha) < 0$ for some $\alpha$
11.2. The Legendre and inverse Legendre transforms and the thermodynamical analogy
Acknowledgments
References

Abstract

This chapter has two goals.
Section 1 sketches the history of heavy tails in finance through the author's three successive models of the variation of a financial price: mesofractal, unifractal and multifractal. The heavy tails occur, respectively, in the marginal distribution only (Mandelbrot, 1963), in the dependence only (Mandelbrot, 1965), or in both (Mandelbrot, 1997). These models increase in the scope of the "principle of scaling invariance", which the author has used since 1957. The mesofractal model is founded on the stable processes that date to Cauchy and Lévy. The unifractal model uses the fractional Brownian motions introduced by the author. By now, both are well understood. To the contrary, one of the key features of the multifractals (Mandelbrot, 1974a, b) remains little known. Using the author's recent work, introduced for the first time in this chapter, the exposition can be unusually brief and mathematically elementary, yet cover all the key features of multifractality. It is restricted to very special but powerful cases: (a) the Bernoulli binomial measure, which is classical but presented in a little-known fashion, and (b) a new two-valued "canonical" measure. The latter generalizes Bernoulli and provides an especially short path to negative dimensions, divergent moments, and divergent (i.e., long range) dependence. All those features are now obtained as separately tunable aspects of the same set of simple construction rules.

My work in finance is well documented in easily accessible sources, many of them reproduced in Mandelbrot (1997 and also in 2001a, b, c, d). That work having expanded and been commented upon by many authors, a survey of the literature is desirable, but this is a task I cannot undertake now.
However, it was a pleasure to yield to the entreaties of this Handbook's editors with a text in which a new technical contribution is preceded by an introductory sketch followed by a simple new presentation of an old feature that used to be dismissed as "technical", but now moves to center stage.

The history of heavy tails in finance began in 1963. While acknowledging that the successive increments of a financial price are interdependent, I assumed independence as a first approximation and combined it with the principle of scaling invariance. This led to (Lévy) stable distributions for the price changes. The tails are very heavy, in fact, power-law distributed with an exponent $\alpha < 2$. The multifractal model advanced in Mandelbrot (1997) extends scale invariance to allow for dependence. Readily controllable parameters generate tails that are as heavy as desired and can be made to follow a power-law with an exponent in the range $1 < \alpha < \infty$. This last result, an essential one, involves a property of multifractals that was described in Mandelbrot (1974a, b) but remains little known among users. The goal of the example described after the introduction is to illustrate this property in a very simple form.

1. Introduction: A path that led to model price by Brownian motion (Wiener or fractional) of a multifractal trading time

Given a financial price record $P(t)$ and a time lag $dt$, define $L(t, dt) = \log P(t + dt) - \log P(t)$. The 1900 dissertation of Louis Bachelier introduced Brownian motion as a model of $P(t)$. In later publications, however, Bachelier acknowledged that this is a very rough first approximation: he recognized the presence of heavy tails and did not rule out dependence. But until 1963, no one had proposed a model of the heavy tails' distribution.

1.1. From the law of Pareto to infinite moment "anomalies" that contradict the Gaussian "norm"

All along, the search for a model was inspired by a finding rooted in economics outside of finance.
Indeed, the distribution of personal incomes proposed in 1896 by Pareto involved tails that are heavy in the sense of following a power-law distribution $\Pr\{U > u\} = u^{-\alpha}$. However, almost nobody took this income distribution seriously. The strongest "conventional wisdom" argument against Pareto was that the value $\alpha = 1.7$ that he claimed leads to the variance of $U$ being infinite. Infinite moments have been a perennial issue both before my work and (unfortunately) ever since. Partly to avoid them, Pareto volunteered an exponential multiplier, resulting in $\Pr\{U > u\} = u^{-\alpha} \exp(-u)$.

Also, Herbert A. Simon expressed a universally held view when he asserted in 1953 that infinite moments are (somehow) "improper". But in fact, the exponential multipliers are not needed and infinite moments are perfectly proper and have important consequences. In multifractal models, depending on specific features, variance can be either finite or infinite. In fact, all moments can be finite, or they can be finite only up to a critical power $q_{\mathrm{crit}}$ that may be 3, 4, or any other value needed to represent the data.

Beginning in the late 1950s, a general theme of my work has been that the uses of statistics must be recognized as falling into at least two broad categories. In the "normal" category, one can use the Gaussian distribution as a good approximation, so that the common replacement of the term "Gaussian" by "normal" is fully justified. To the contrary, in the category one can call "abnormal" or "anomalous", the Gaussian is very misleading, even as an approximation.

To underline this distinction, I have long suggested (to little effect up to now) that the substance of the so-called ordinary central limit theorem would be better understood if it were relabeled as the center limit theorem. Indeed, that theorem concerns the center of the distribution, while the anomalies concern the tails.
Following up on this vocabulary, the generalized central limit theorem that yields Lévy stable limits would be better understood if called a tail limit theorem. This distinction becomes essential in Section 8.5.

Be that as it may, I came to believe in the 1950s that the power-law distribution and the associated infinite moments are key elements that distinguish economics from classical physics. This distinction grew by being extended from independent to highly dependent random variables. In 1997, it became ready to be phrased in terms of randomness and variability falling in one of several distinct "states". The "mild" state prevails for classical errors of observation and for sequences of near-Gaussian and near-independent quantities. To the contrary, phenomena that present deep inequality necessarily belong to the "wild" state of randomness.

1.2. A scientific principle: scaling invariance in finance

A second general theme of my work is the "principle" that financial records are invariant by dilating or reducing the scales of time and price in ways suitably related to each other. There is no need to believe that this principle is exactly valid, nor that its exact validity could ever be tested empirically. However, a proper application of this principle has provided the basis of models or scenarios that can be called good because they satisfy all the following properties: (a) they closely model reality, (b) they are exceptionally parsimonious, being based on very few very general a priori assumptions, and (c) they are creative in the following sense: extensive and correct predictions arise as consequences of a few assumptions; when those assumptions are changed the consequences also change. By contrast, all too many financial models start with Brownian motion, then build upon it by including in the input every one of the properties that one wishes to see present in the output.

1.3. Analysis alone versus statistical analysis followed by synthesis and graphic output

The topic of multifractal functions has grown into a well-developed analytic theory, making it easy to apply the multifractal formalism blindly. But it is far harder to understand it and draw consequences from its output. In particular, statistical techniques for handling multifractals are conspicuous by their near-total absence. After they become actually available, their applicability will have to be investigated carefully.

A chastening example is provided by the much simpler question of whether or not financial series exhibit global (long range) dependence. My claim that they do was largely based on R/S analysis, which at this point relies heavily on graphical evidence. Lo (1991) criticized this conclusion very severely as being subjective. Also, a certain alternative test that Lo described as "objective" led to a mixed pattern of "they do" and "they do not". This pattern being practically impossible to interpret, Lo took the position that the simpler outcome has not been shown wrong, hence one can assume that long range dependence is absent.

Unfortunately, the "objective test" in question assumed the margins to be Gaussian. Hence, Lo's experiment did not invalidate my conclusion; it only showed that the test is not robust and had repeatedly failed to recognize long range dependence. The proper conclusion is that careful graphic evidence has not yet been superseded. The first step is to attach special importance to models for which sample functions can be generated.

1.4. Actual implementation of scaling invariance by multifractal functions: it requires additional assumptions that are convenient but not a matter of principle, for example, separability and compounding

By and large, an increase in the number and specificity of the assumptions leads to an increase in the specificity of the results.
It follows that generality may be an ideal unto itself in mathematics, but in the sciences it competes with specificity, hence typically with simplicity, familiarity, and intuition.

In the case of multifractal functions, two additional considerations should be heeded. The so-called multifractal formalism (to be described below) is extremely important. But it does not by itself specify a random function closely enough to allow analysis to be followed by synthesis. Furthermore, multifractal functions are so new that it is best, in a first stage, to be able to rely on existing knowledge while pursuing a concrete application.

For these and related reasons, my study of multifractals in finance has relied heavily on two special cases. One is implemented by the recursive "cartoons" investigated in Mandelbrot (1997) and in much greater detail in Mandelbrot (2001c). The other uses compounding. This process begins with a random function $F(\theta)$ in which the variable $\theta$ is called an "intrinsic time". In the key context of financial prices, $\theta$ is called "trading time". The possible functions $F(\theta)$ include all the functions that have been previously used to model price variation. Foremost is the Wiener Brownian motion $B(t)$ postulated by Bachelier. The next simplest are the fractional Brownian motion $B_H(t)$ and the Lévy stable "flight" $L(t)$.

A separate step selects for the intrinsic trading time a scale-invariant random function of the physical "clock time" $t$. Mandelbrot (1972) recommended for the function $\theta(t)$ the integral of a multifractal measure. This choice was developed in Mandelbrot (1997) and Mandelbrot, Calvet and Fisher (1997).

In summary, one begins with two statistically independent random functions $F(\theta)$ and $\theta(t)$, where $\theta(t)$ is non-decreasing. Then one creates the "compound" function $F[\theta(t)] = \varphi(t)$. Choosing $F(\theta)$ and $\theta(t)$ to be scale-invariant ensures that $\varphi(t)$ will be scale-invariant as well.
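As a hedged illustration of the compounding just described, the following Python sketch builds a compound function $\varphi(t) = B[\theta(t)]$ on a dyadic grid of clock times. Everything specific in it is an assumption made for illustration and is not taken from the text: the function names `cascade_increments` and `compound_path`, the use of a shuffled binomial cascade (of the kind defined in Section 2.1) to supply the increments of the trading time, the value $u = 0.7$, and the choice of Wiener Brownian motion for $F$.

```python
import math
import random


def cascade_increments(k, u, rng):
    """Return the 2**k cell masses of a shuffled binomial cascade on [0, 1].

    These masses serve as the increments d(theta) of an illustrative
    trading time; they are positive and add up to 1.
    """
    masses = [1.0]
    for _ in range(k):
        split = []
        for m in masses:
            # The share u goes left or right with equal probability;
            # the complementary share 1 - u goes to the other half.
            left = u if rng.random() < 0.5 else 1.0 - u
            split.append(m * left)
            split.append(m * (1.0 - left))
        masses = split
    return masses


def compound_path(k=10, u=0.7, seed=0):
    """Sample theta(t) and phi(t) = B[theta(t)] at the clock times i / 2**k."""
    rng = random.Random(seed)
    theta = [0.0]
    phi = [0.0]
    for d_theta in cascade_increments(k, u, rng):
        theta.append(theta[-1] + d_theta)
        # Over a trading-time step d_theta, a Wiener Brownian motion has a
        # Gaussian increment of mean 0 and variance d_theta, drawn
        # independently of the trading time itself.
        phi.append(phi[-1] + rng.gauss(0.0, math.sqrt(d_theta)))
    return theta, phi


if __name__ == "__main__":
    theta, phi = compound_path(k=10, u=0.7, seed=0)
    print(len(theta), theta[-1])
```

The trading time is generated first and the Gaussian increments afterwards, which is one simple way to respect the independence of $F$ and $\theta$ required above. Replacing the Gaussian draw by increments of another scale-invariant process would change $F$ without touching the trading time.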
A limitation of compounding as defined thus far is that it demands independence of $F$ and $\theta$, and therefore restricts the scope of the compound function.

In a well-known special case called Bochner subordination, the increments of $\theta(t)$ are independent. As shown in Mandelbrot and Taylor (1967), it follows that $B[\theta(t)]$ is a Lévy stable process, i.e., the mesofractal model. This approach has become well known. The tails it creates are heavy and do follow a power-law distribution but there are at least two drawbacks. The exponent $\alpha$ is at most 2, a clearly unacceptable restriction in many cases, and the increments are independent.

Compounding beyond subordination was introduced because it allows $\alpha$ to take any value $> 1$ and the increments to exhibit long term dependence. All this is discussed elsewhere (Mandelbrot, 1997 and more recent papers).

The goal of the remainder of this chapter is to use a specially designed simple case to explain how a multifractal measure suffices to create a power-law distribution. The idea is that $L(t, dt) = d\varphi(t)$ where $\varphi = B_H[\theta(t)]$. Roughly, $d\theta(t)$ is $|dB_H|^{1/H}$. In the Wiener Brownian case, $H = 1/2$ and $d\theta$ is the "local variance". This is how a price that fluctuates up and down is reduced to a positive measure.

2. Background: the Bernoulli binomial measure and two random variants: shuffled and canonical

The prototype of all multifractals is nonrandom: it is a Bernoulli binomial measure. Its well-known properties are recalled in this section, then Section 3 introduces a random "canonical" version. Also, all Bernoulli binomial measures being powers of one another, a broader viewpoint considers them as forming a single "class of equivalence".

2.1. Definition and construction of the Bernoulli binomial measure

A multiplicative nonrandom cascade. A recursive construction of the Bernoulli binomial measures involves an "initiator" and a "generator". The initiator is the interval [0,1] on which a unit of mass is uniformly spread.
This interval is recursively split into halves, yielding dyadic intervals of length $2^{-k}$. The generator consists of a single parameter $u$, variously called multiplier or mass. The first stage spreads mass over the halves of every dyadic interval, with unequal proportions. Applied to [0,1], it leaves the mass $u$ in [0,1/2] and the mass $v = 1 - u$ in [1/2,1]. The $(k + 1)$-th stage begins with dyadic intervals of length $2^{-k}$, each split in two subintervals of length $2^{-k-1}$. A proportion equal to $u$ goes to the left subinterval and the proportion $v$, to the right.

After $k$ stages, let $\varphi_0$ and $\varphi_1 = 1 - \varphi_0$ denote the relative frequencies of 0's and 1's in the finite binary development $t = 0.\beta_1\beta_2\ldots\beta_k$. The "pre-binomial" measure in the dyadic interval $[dt] = [t, t + 2^{-k}]$ takes the value $\mu_k(dt) = u^{k\varphi_0} v^{k\varphi_1}$, which will be called "pre-multifractal". This measure is distributed uniformly over the interval. For $k \to \infty$, this sequence of measures $\mu_k(dt)$ has a limit $\mu(dt)$, which is the Bernoulli binomial multifractal.

Shuffled binomial measure. The proportion equal to $u$ now goes to either the left or the right subinterval, with equal probabilities, and the remaining proportion $v$ goes to the remaining subinterval. This variant must be mentioned but is not interesting.

2.2. The concept of canonical random cascade and the definition of the canonical binomial measure

Mandelbrot (1974a, b) took a major step beyond the preceding constructions.

The random multiplier $M$. In this generalization every recursive construction can be described as follows. Given the mass $m$ in a dyadic interval of length $2^{-k}$, the two subintervals of length $2^{-k-1}$ are assigned the masses $M_1 m$ and $M_2 m$, where $M_1$ and $M_2$ are independent realizations of a random variable $M$ called multiplier. This $M$ is equal to $u$ or $v$ with probabilities $p = 1/2$ and $1 - p = 1/2$. The Bernoulli and shuffled binomials both impose the constraint that $M_1 + M_2 = 1$.
The canonical binomial does not. It follows that the canonical mass in each interval of duration $2^{-k}$ is multiplied in the next stage by the sum $M_1 + M_2$ of two independent realizations of $M$. That sum is either $2u$ (with probability $p^2$), or 1 (with probability $2(1 - p)p$), or $2v$ (with probability $(1 - p)^2$). Writing $p$ instead of 1/2 in the Bernoulli case and its variants complicates the notation now, but will soon prove advantageous: the step to the TVCM will simply consist in allowing $0 < p < 1$.

2.3. Two forms of conservation: strict and on the average

Both the Bernoulli and shuffled binomials repeatedly redistribute mass, but within a dyadic interval of duration $2^{-k}$, the mass remains exactly conserved in all stages beyond the $k$-th. That is, the limit mass $\mu(dt)$ in a dyadic interval satisfies $\mu_k(dt) = \mu(dt)$. In a canonical binomial, to the contrary, the sum $M_1 + M_2$ is not identically 1; only its expectation is 1. Therefore, canonical binomial constructions preserve mass on the average, but not exactly.

The random variable $\Omega$. In particular, the mass $\mu([0,1])$ is no longer equal to 1. It is a basic random variable denoted by $\Omega$ and discussed in Section 4. Within a dyadic interval $dt$ of length $2^{-k}$, the cascade is simply a reduced-scale version of the overall cascade. It transforms the mass $\mu_k(dt)$ into a product of the form $\mu(dt) = \mu_k(dt)\,\Omega(dt)$ where all the $\Omega(dt)$ are independent realizations of the same variable $\Omega$.

2.4. The term "canonical" is motivated by statistical thermodynamics

As is well known, statistical thermodynamics finds it valuable to approximate large systems as juxtapositions of parts, the "canonical ensembles", whose energy only depends on a common temperature and not on the energies of the other parts. Microcanonical ensembles' energies are constrained to add to a prescribed total energy.
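The contrast between strict (microcanonical-like) and average (canonical) conservation described in Section 2.3 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names `bernoulli_total`, `canonical_total` and `mean_canonical_total` are hypothetical, $p$ is fixed at 1/2 with $v = 1 - u$ as in the canonical binomial, and the total mass is computed after a finite number $k$ of stages.

```python
import random


def bernoulli_total(k, u):
    """Total mass on [0, 1] after k stages of the Bernoulli binomial cascade."""
    masses = [1.0]
    for _ in range(k):
        # Left half receives the share u, right half the share 1 - u.
        masses = [m * w for m in masses for w in (u, 1.0 - u)]
    return sum(masses)


def canonical_total(k, u, rng):
    """Total mass after k stages of the canonical binomial cascade (p = 1/2)."""
    v = 1.0 - u
    masses = [1.0]
    for _ in range(k):
        # Each half gets its own independent multiplier, equal to u or v
        # with probability 1/2, so M1 + M2 may be 2u, 1, or 2v.
        masses = [m * (u if rng.random() < 0.5 else v)
                  for m in masses for _half in range(2)]
    return sum(masses)


def mean_canonical_total(k, u, runs, seed=0):
    """Average of the canonical total mass over independent cascades."""
    rng = random.Random(seed)
    return sum(canonical_total(k, u, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    print(bernoulli_total(8, 0.6))
    print(mean_canonical_total(8, 0.6, runs=2000))
```

In this sketch the Bernoulli total stays at 1 up to rounding, while individual canonical totals fluctuate from one cascade to the next and only their average stays near 1.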
In the study of multifractals, the use of this metaphor should not obscure the fact that the multiplication of canonical factors introduces strong dependence among $\mu(dt)$ for different intervals $dt$.

2.5. In every variant of the binomial measure one can view all finite (positive or negative) powers together, as forming a single "class of equivalence"

To any given real exponent $g \neq 1$ and multipliers $u$ and $v$ corresponds a multiplier $M_g$ that can take either of two values $u_g = \psi u^g$ with probability $p$, and $v_g = \psi v^g$ with probability $1 - p$. The factor $\psi$ is meant to ensure $p u_g + (1 - p) v_g = 1/2$. Therefore, $\psi[p u^g + (1 - p) v^g] = 1/2$, that is, $\psi = 1/[2EM^g]$. The expression $2EM^g$ will be generalized and encountered repeatedly, especially through the expression

$\tau(q) = -\log_2[p u^q + (1 - p) v^q] - 1 = -\log_2[2EM^q]$.

This is simply a notation at this point but will be justified in Section 5. It follows that $\psi = 2^{\tau(g)}$, hence $u_g = u^g 2^{\tau(g)}$ and $v_g = v^g 2^{\tau(g)}$. Assume $u > v$. As $g$ ranges from 0 to $\infty$, $u_g$ ranges from 1/2 to 1 and $v_g$ ranges from 1/2 to 0; the inequality $u_g > v_g$ is preserved. To the contrary, as $g$ ranges from 0 to $-\infty$, $u_g < v_g$. For example, $g = -1$ yields

$u_g = \frac{1/u}{1/u + 1/v} = v$ and $v_g = \frac{1/v}{1/u + 1/v} = u$.

Thus, inversion leaves both the shuffled and the canonical binomial measures unchanged. For the Bernoulli binomial, it only changes the direction of the time axis. Altogether, every Bernoulli binomial measure can be obtained from any other as a reduced positive or negative power. If one agrees to consider a measure and its reduced powers as equivalent, there is only one Bernoulli binomial measure.

In concrete terms relative to non-infinitesimal dyadic intervals, the sequences representing $\log \mu$ for different values of $g$ are mutually affine. Each is obtained from the special case $g = 1$ by a multiplication by $g$ followed by a vertical translation.

2.6. The full and folded forms of the address plane

In anticipation of TVCM, the point of coordinates $u$ and $v$ will be called the address of a binomial measure in a full address space. In that plane, the locus of the Bernoulli measures is the interval defined by $0 < v$, $0 < u$, and $u + v = 1$.

The folded address space will be obtained by identifying the measures $(u,v)$ and $(v,u)$, and representing both by one point. The locus of the Bernoulli measures becomes the interval defined by the inequalities $0 < v < u$ and $u + v = 1$.

2.7. Alternative parameters

In its role as parameter added to $p = 1/2$, one can replace $u$ by the ("information-theoretical") fractal dimension $D = -u\log_2 u - v\log_2 v$, which can be chosen at will in the open interval ]0,1[. The value of $D$ characterizes the "set that supports" the measure. It received a new application in the new notion of multifractal concentration described in Mandelbrot (2001c).

More generally, the study of all multifractals, including the Bernoulli binomial, is filled with fractal dimensions of many other sets. All are unquestionably positive. One of the newest features of the TVCM will prove to be that they also allow negative dimensions.

3. Definition of the two-valued canonical multifractals

3.1. Construction of the two-valued canonical multifractal in the interval [0,1]

The TVCM are called two-valued because, as with the Bernoulli binomial, the multiplier $M$ can only take two possible values $u$ and $v$. The novelties are that $p$ need not be 1/2, the multipliers $u$ and $v$ are not bounded by 1, and the inequality $u + v \neq 1$ is acceptable.

For $u + v \neq 1$, the total mass cannot be preserved exactly. Preservation on the average requires

$EM = pu + (1 - p)v = 1/2$,

hence $0 < p = (1/2 - v)/(u - v) < 1$.

The construction of TVCM is based upon a recursive subdivision of the interval [0,1] into equal intervals. The point of departure is, once again, a uniformly spread unit mass. The first stage splits [0,1] into two parts of equal lengths.
On each, mass is poured uniformly, with the respective densities $M_1$ and $M_2$ that are independent copies of $M$. The second stage continues similarly with the intervals [0,1/2] and [1/2,1].

3.2. A second special two-valued canonical multifractal: the unifractal measure on the canonical Cantor dust

The identity $EM = 1/2$ is also satisfied by $u = 1/(2p)$ and $v = 0$. In this case, let the lengths and number of non-empty dyadic cells after $k$ stages be denoted by $dt = 2^{-k}$ and $N_k$. The random variable $N_k$ follows a simple birth and death process leading to the following alternative.

When $p > 1/2$, $EN_k = (EN_1)^k = (2p)^k = (dt)^{-\log_2(2p)}$. To be able to write $EN_k = (dt)^{-D}$, it suffices to introduce the exponent $D = \log_2(2p)$. It satisfies $D > 0$ and defines a fractal dimension.

When $p < 1/2$, to the contrary, the number of non-empty cells almost surely vanishes asymptotically. At the same time, the formal fractal dimension $D = \log_2(2p)$ satisfies $D < 0$.

3.3. Generalization of a useful new viewpoint: when considered together with their powers from $-\infty$ to $\infty$, all the TVCM parametrized by either $p$ or $1 - p$ form a single class of equivalence

To take the key case, the multiplier $M_{-1}$ takes the values

$u_{-1} = \frac{1/u}{2(p/u + (1 - p)/v)} = \frac{v}{2(v + u) - 1}$ and $v_{-1} = \frac{u}{2(v + u) - 1}$.

It follows that $p u_{-1} + (1 - p) v_{-1} = 1/2$ and $u_{-1}/v_{-1} = v/u$. In the full address plane, the relations imply the following: (a) the point $(u_{-1}, v_{-1})$ lies on the extension beyond (1/2,1/2) of the interval from $(u,v)$ to (1/2,1/2) and (b) the slopes of the intervals from 0 to $(u,v)$ and from 0 to $(u_{-1}, v_{-1})$ are inverse of one another. It suffices to fold the full phase diagram along the diagonal to achieve $v > u$. The point $(u_{-1}, v_{-1})$ will be the intersection of the interval corresponding to the probability $1 - p$ and of the interval joining 0 to $(u,v)$.

3.4. The full and folded address planes

In the full address plane, the locus of all the points $(u,v)$ with fixed $p$ has the equation $pu + (1 - p)v = 1/2$.
This is the negatively sloped interval joining the points $(0, 1/[2(1 - p)])$ and $(1/[2p], 0)$. When $(u,v)$ and $(v,u)$ are identified, the locus becomes the same interval plus the negatively sloped interval from $(0, 1/[2p])$ to $(1/[2(1 - p)], 0)$.

In the folded address plane, the locus is made of two shorter intervals from (1/2,1/2) to both $(1/[2p], 0)$ and $(1/[2(1 - p)], 0)$. In the special case $u + v = 1$ corresponding to $p = 1/2$, the two shorter intervals coincide.

Those two intervals correspond to TVCM in the same class of equivalence. Starting from an arbitrary point on either interval, positive moments correspond to points on the same interval and negative moments, to points of the other. Moments for $g > 1$ correspond to points to the left on the same interval; moments for $0 < g < 1$, to points to the right on the same interval; negative moments to points on the other interval.

For $p \neq 1/2$, the class of equivalence of $p$ includes a measure that corresponds to $u = 1$ and $v = [1/2 - \min(p, 1 - p)]/[\max(p, 1 - p)]$. This novel and convenient universal point of reference requires $p \neq 1/2$. In terms to be explained below, it corresponds to $\alpha_{\min} = -\log u = 0$.

3.5. Background of the two-valued canonical measures in the historical development of multifractals

The construction of TVCM is new but takes a well-defined place among the three main approaches to the development of a theory of multifractals.

General mathematical theories came late and have the drawback that they are accessible to few non-mathematicians and many are less general than they seem.

The heuristic presentation in Frisch and Parisi (1985) and Halsey et al. (1986) came after Mandelbrot (1974a, b) but before most of the mathematics. Most importantly for this paper's purpose, those presentations fail to include significantly random constructions, hence cannot yield measures following the power-law distribution.
Both the mathematical and the heuristic approaches seek generality and only later consider the special cases. To the contrary, a third approach, the first historically, began in Mandelbrot (1974a, b) with the careful investigation of a variety of special random multiplicative measures. I believe that each feature of the general theory continues to be best understood when introduced through a special case that is as general as needed, but no more. The general theory is understood very easily when it comes last. In pedagogical terms, the "third way" associates with each distinct feature of multifractals a special construction, often one that consists of generalizing the binomial multifractal in a new direction. TVCM is part of a continuation of that effective approach; it could have been investigated much earlier if a clear need had been perceived.

4. The limit random variable $\Omega = \mu([0,1])$, its distribution and the star functional equation

4.1. The identity $EM = 1$ implies that the limit measure has the "martingale" property, hence the cascade defines a limit random variable $\Omega = \mu([0,1])$

We cannot deal with martingales here, but positive martingales are mathematically attractive because they converge (almost surely) to a limit. But the situation is complicated because the limit depends on the sign of $D = 2[-pu\log_2 u - (1-p)v\log_2 v]$. Under the condition $D > 0$, which is discussed in Section 9, what seemed obvious is confirmed: $\Pr\{\Omega > 0\} > 0$, conservation on the average continues to hold as $k \to \infty$, and $\Omega$ is either non-random, or is random and satisfies the identity $E\Omega = 1$. But if $D < 0$, one finds that $\Omega = 0$ almost surely and conservation on the average holds for finite $k$ but fails as $k \to \infty$. The possibility that $\Omega = 0$ arose in mathematical esoterica and seemed bizarre, but is unavoidably introduced into concrete science.

4.2. Questions

(A) Which feature of the generating process dominates the tail distribution of $\Omega$? It is shown in Section 6 to be the sign of $\max(u,v) - 1$.
(B) Which feature of the generating process allows $\Omega$ to have a high probability of being either very large or very small? Section 6 will show that the criterion is that the function $\tau(q)$ becomes negative for large enough $q$.

(C) Divide $[0,1]$ into $2^k$ intervals of length $2^{-k}$. Which feature of the generating process determines the relative distribution of the overall $\Omega$ among those small intervals? This relative distribution motivated the introduction of the functions $f(\alpha)$ and $\rho(\alpha)$, and is discussed in Section 8.

(D) Are the features discussed under (B) and (C) interdependent? Section 10 will address this issue and show that, even when $\Omega$ has a high probability of being large, its value does not affect the distribution under (C).

4.3. Exact stochastic renormalizability and the "star functional equation" for $\Omega$

Once again, the masses in $[0,1/2]$ and $[1/2,1]$ take, respectively, the forms $M_1\Omega_1$ and $M_2\Omega_2$, where $M_1$ and $M_2$ are two independent realizations of the random variable $M$, and $\Omega_1$ and $\Omega_2$ are two independent realizations of the random variable $\Omega$. Adding the two parts yields
\[ \Omega \equiv \Omega_1 M_1 + \Omega_2 M_2. \]
This identity in distribution, now called the "star equation", combines with $E\Omega = 1$ to determine $\Omega$. It was introduced in Mandelbrot (1974a, b) and has since then been investigated by several authors, for example by Durrett and Liggett (1983). A large bibliography is found in Liu (2002). In the special case where $M$ is non-random, the star equation reduces to the equation due to Cauchy whose solutions have become well-known: they are the Cauchy-Lévy stable distributions.

4.4. Metaphor for the probability of large values of $\Omega$, arising in the theory of discrete time branching processes

A growth process begins at $t = 0$ with a single cell. Then, at every integer instant of time, every cell splits into a random non-negative number $N_1$ of cells. At time $k$, one deals with a clone of $N_k$ cells. All those random splittings are statistically independent and identically distributed.
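As a rough illustration of this growth process, the following Python sketch simulates clone sizes and divides them by the expected size. The offspring law is an assumption made only for this example: each cell yields two candidate cells, each kept independently with probability $p$ (as for the non-empty cells of Section 3.2), so that $EN_1 = 2p$. The helper names `clone_size` and `normalized_sizes` are hypothetical.

```python
import random


def clone_size(k, p, rng):
    """Return N_k, the clone size after k splittings.

    Assumed offspring law (illustrative only): every cell produces two
    candidate cells, each kept independently with probability p.
    """
    n = 1
    for _ in range(k):
        # Each of the n current cells contributes two candidates.
        n = sum(1 for _ in range(2 * n) if rng.random() < p)
        if n == 0:  # the clone is extinct and stays extinct
            break
    return n


def normalized_sizes(k, p, trials, seed=0):
    """Simulate `trials` clones and divide each size by (E N_1)^k = (2p)^k."""
    rng = random.Random(seed)
    expected = (2.0 * p) ** k
    return [clone_size(k, p, rng) / expected for _ in range(trials)]


if __name__ == "__main__":
    sizes = normalized_sizes(k=10, p=0.75, trials=2000)
    print("sample mean of the normalized size:", sum(sizes) / len(sizes))
    print("smallest and largest:", min(sizes), max(sizes))
```

Under these assumptions the sample mean stays close to 1 while individual normalized sizes remain widely scattered, and clones that die out contribute the value 0.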
The normalized clone size, defined as $N_k/(EN_1)^k$, has an expectation equal to 1. The sequence of normalized sizes is a positive martingale, hence (as already mentioned) converges to a limit random variable. When $EN_1 > 1$, that limit does not reduce to 0 and is random for a very intuitive reason. As long as clone size is small, its growth very much depends on chance, therefore the normalized clone size is very variable. However, after a small number of splittings, a law of large numbers comes into force, the effects of chance become negligible, and the clone grows near-exponentially. That is, the randomness in the relative number of family members can be very large but acts very early.

4.5. To a large extent, the asymptotic measure $\Omega$ of a TVCM is large if, and only if, the pre-fractal measure $\mu_k([0,1])$ has become large during the very first few stages of the generating cascade

Such behavior is suggested by the analogy to a branching process, and analysis shows that such is indeed the case. After the first stage, the measures $\mu_1([0,1/2])$ and $\mu_1([1/2,1])$ are both equal to $u^2$ with probability $p^2$, $uv$ with probability $2p(1-p)$, and $v^2$ with probability $(1-p)^2$. Extensive simulations were carried out for large $k$ in "batches", and the largest, medium, and smallest measures were recorded for each batch. Invariably, the largest (resp., smallest) started from a high (resp., low) overall level.

5. The function $\tau(q)$: motivation and form of the graph

So far $\tau(q)$ was nothing but a notation. It is important as it is the special form taken for TVCM by a function that was first defined for an arbitrary multiplier in Mandelbrot (1974a, b). (Actually, the little appreciated Figure 1 of that original paper did not include $q < 0$ and worked with $-\tau(q)$, but the opposite sign came to be generally adopted.)

5.1. Motivation of $\tau(q)$

After $k$ cascade stages, consider an arbitrary dyadic interval of duration $dt = 2^{-k}$.
For the $k$-approximant TVCM measure $\mu_k(dt)$, the $q$-th power has an expected value equal to $[pu^q + (1-p)v^q]^k = \{EM^q\}^k$. Its logarithm of base 2 is
\[ \log_2\{[pu^q + (1-p)v^q]^k\} = k\log_2[pu^q + (1-p)v^q] = \log_2(dt)\,[\tau(q) + 1]. \]
Hence
\[ E\mu_k^q(dt) = (dt)^{\tau(q)+1}. \]

5.2. A generalization of the role of $\Omega$: middle- and high-frequency contributions to microrandomness

Exactly the same cascade transforms the measure in $dt$ from $\mu_k(dt)$ to $\mu(dt)$ and the measure in $[0,1]$ from 1 to $\Omega$. Hence, one can write
\[ \mu(dt) = \mu_k(dt)\,\Omega(dt). \]
In this product, frequencies of wavelength $> dt$, to be described as "low", contribute $\mu_k([0,1])$, and frequencies of wavelength $< dt$, to be described as "high", contribute $\Omega$.

Fig. 1. The full phase diagram of TVCM with coordinates $u$ and $v$. The isolines of the quantity $p$ are straight intervals from $(1/\{2(1-p)\},0)$ to $(0,1/\{2p\})$. The values $p$ and $1-p$ are equivalent and the corresponding isolines are symmetric with respect to the main bisector $u = v$. The acceptable part of the plane excludes the points $(u,v)$ such that either $\max(u,v) < 1/2$ or $\min(u,v) > 1/2$. Hence, the relevant part of this diagram is made of two infinite half-strips reducible to one another by folding along the bisector. The folded phase diagram of TVCM corresponds to $v < 0.5 < u$. It shows the following curves. The isolines of $1-p$ and $p$ are straight intervals that start at the point $(1,1)$ and end at the points $(1/\{2p\},0)$ and $(1/\{2(1-p)\},0)$. The isolines of $D$ start on the interval $1/2 < u < 1$ of the $u$-axis and continue to the point $(\infty,0)$. The isolines of $q_{\mathrm{crit}}$ start at the point $(1,0)$ and continue to the point $(\infty,0)$. The Bernoulli binomial measure corresponds to $p = 1/2$ and the canonical Cantor measure corresponds to the half line $v = 0$, $u > 1/2$.

5.3. The expected "partition function" $\sum E\mu^q(d_i t)$

Section 6 will show that $E\Omega^q$ need not be finite. But if it is, the limit measure $\mu(dt) = \mu_k(dt)\Omega(dt)$ satisfies
\[ E\mu^q(dt) = (dt)^{\tau(q)+1}\,E\Omega^q. \]
The interval $[0,1]$ subdivides into $1/dt$ intervals $d_i t$ of common length $dt$.
The sum of the $q$-th moments over those intervals, that is, the expected partition function, takes the form
\[ \sum_i E\mu^q(d_i t) = (dt)^{\tau(q)}\,E\Omega^q. \]

Estimation of $\tau(q)$ from a sample. It is affected by the prefactor $\Omega$ insofar as one must estimate both $\tau(q)$ and $\log E\Omega^q$.

5.4. Form of the $\tau(q)$ graph

Due to conservation on the average, $EM = pu + (1-p)v = 1/2$, hence $\tau(1) = -\log_2[1/2] - 1 = 0$. An additional universal value is $\tau(0) = -\log_2(1) - 1 = -1$. For other values of $q$, $\tau(q)$ is a cap-convex continuous function satisfying $\tau(q) < -1$ for $q < 0$. For TVCM, a more special property is that $\tau(q)$ is asymptotically linear: assuming $u > v$, and letting $q \to \infty$,
\[ \tau(q) \sim -\log_2 p - 1 - q\log_2 u \quad\text{and}\quad \tau(-q) \sim -\log_2(1-p) - 1 + q\log_2 v. \]
The sign of $u - 1$ affects the sign of $\log_2 u$, a fact that will be very important in Section 6.

Moving as little as possible beyond these properties. The very special $\tau$ function of the TVCM is simple, but Figure 2 suffices to bring out every one of the delicate possibilities first reported in Mandelbrot (1974a), where $-\tau(q)$ is plotted in that little appreciated Figure 1. Other features of $\tau$ deserve to be mentioned. Direct proofs are tedious and the short proofs require the multifractal formalism that will only be described in Section 11.

Fig. 2. The function $\tau(q)$ for $p = 3/4$ and varying $g$. By arbitrary choice, the value $g = 1$ is assigned to $u = 1$, from which follows that $g = -1$ is assigned to the case $v = 1$. Behavior of $\tau(q)$ for the value $g > 0$: as $q \to -\infty$, the graph of $\tau(q)$ is asymptotically tangent to $\tau = -q\log_2 v$; as $q \to \infty$, the graph of $\tau(q)$ is asymptotically tangent to $\tau = -q\log_2 u$. Those properties are widely believed to describe the main facts about $\tau(q)$. But for TVCM they do not. Thus, $\tau(q)$ is also tangent to $\tau = q\alpha^*_{\max}$ and $\tau = q\alpha^*_{\min}$. Beyond those points of tangency, $f$ becomes $< 0$. For $g > 1$, that is, for $u > 1$, $\tau(q)$ has a maximum. Values of $q$ beyond this maximum correspond to $\alpha_{\min} < 0$.
Because of the cap-convexity of $\tau(q)$, the equation $\tau(q) = 0$ may, in addition to the "universal" value $q = 1$, have a root $q_{\mathrm{crit}} > 1$. For $u > 2.5$, one deals with a very different phenomenon also first described in Mandelbrot (1974a, b). One finds that the construction of TVCM leads to a measure that degenerates to 0.

The quantity $D(q) = \tau(q)/(q - 1)$. This popular expression is often called a "generalized dimension", a term too vague to mean anything. $D(q)$ is obtained by extending the line from $(q,\tau)$ to $(1,0)$ to its intercept with the line $q = 0$. It plays the role of a critical embedding codimension for the existence of a finite $q$-th moment. This topic cannot be discussed here but is treated in Mandelbrot (2003).

The ratio $\tau(q)/q$ and the "accessible" values of $q$. Increase $q$ from $-\infty$ to 0 then to $+\infty$. In the Bernoulli case, $\tau(q)/q$ increases from $\alpha_{\max}$ to $\infty$, jumps down to $-\infty$ for $q = 0$, then increases again from $-\infty$ to $\alpha_{\min}$. For TVCM with $p \neq 1/2$, the behavior is very different. For example, let $p < 1/2$. As $q$ increases from 1 to $\infty$, $\tau(q)$ increases from 0 to a maximum $\alpha^*_{\max}$, then decreases. In a way explored in Section 10, the values of $\alpha > \alpha^*_{\max}$ are not "accessible".

5.5. Reducible and irreducible canonical multifractals

Once again, being "canonical" implies conservation on the average. When there exists a microcanonical (conservative) variant having the same function $f(\alpha)$, a canonical measure can be called "reducible". The canonical binomial is reducible because its $f(\alpha)$ is shared by the Bernoulli binomial. Another example introduced in Mandelbrot (1989b) is the "Erice" measure, in which the multiplier $M$ is uniformly distributed on $[0,1]$. But the TVCM with $p \neq 1/2$ is not reducible. In the interval $[0,1]$ subdivided in the base $b = 2$, reducibility demands a multiplier $M$ whose distribution is symmetric with respect to $M = 1/2$. Since $u > 0$, this implies $u < 1$.

6.
When $u > 1$, the moment $E\Omega^q$ diverges if $q$ exceeds a critical exponent $q_{\mathrm{crit}}$ satisfying $\tau(q) = 0$; $\Omega$ follows a power-law distribution of exponent $q_{\mathrm{crit}}$

6.1. Divergent moments, power-law distributions and limits to the ability of moments to determine a distribution

This section injects a concern that might have been voiced in Sections 4 and 5. The canonical binomial and many other examples satisfy the following properties, which everyone takes for granted and no one seems to think about: (a) $\Omega = 1$, $E\Omega^q < \infty$, (b) $\tau(q) > 0$ for all $q > 1$, and (c) $\tau(q)/q$ increases monotonically as $q \to \pm\infty$. Many presentations of fractals take those properties for granted in all cases. In fact, as this section will show, the TVCM with $u > 1$ lead to the "anomalous" divergence $E\Omega^q = \infty$ and the "inconceivable" inequality $\tau(q) < 0$ for $q_{\mathrm{crit}} < q < \infty$. Also, the monotonicity of $\tau(q)/q$ fails for all TVCM with $p \neq 1/2$.

Since Pareto in 1897, infinite moments have been known to characterize the power-law distributions of the form $\Pr\{X > x\} = L(x)\,x^{-q_{\mathrm{crit}}}$. But in the case of TVCM and other canonical multifractals, the complicating factor $L(x)$ is absent. One finds that when $u > 1$, the overall measure $\Omega$ follows a power law of exponent $q_{\mathrm{crit}}$ determined by $\tau(q)$.

6.2. Discussion

The power-law "anomalies" have very concrete consequences deduced in Mandelbrot (1997) and discussed, for example, in Mandelbrot (2001c). But does all this make sense? After all, $\tau(q)$ and the formal $q$-th moments are given by simple formulas and are finite for all parameters. The fact that those values cannot actually be observed raises a question. Are high moments lost by being unobservable? In fact, they are "latent" but can be made "actual" by the process of "embedding" studied elsewhere.

An additional comment is useful. The fact that high moments are non-observable does not express a deficiency of TVCM but a limitation of the notion of moment.
Features ordinarily expressed by moments must be expressed by other means.

6.3. An important apparent "anomaly": in a TVCM, the $q$-th moment of $\Omega$ may diverge

Let us elaborate. From long past experience, physicists' and statisticians' natural impulse is to define and manipulate moments without envisioning or voicing the possibility of their being infinite. This lack of concern cannot extend to multifractals. The distribution of the TVCM within a dyadic interval introduces an additional critical exponent $q_{\mathrm{crit}}$ that satisfies $q_{\mathrm{crit}} > 1$. When $1 < q_{\mathrm{crit}} < \infty$, which is a stronger requirement than $D > 0$, the $q$-th moment of $\mu(dt)$ diverges for $q > q_{\mathrm{crit}}$. A stronger result holds: the TVCM cascade generates a measure whose distribution follows the power law of exponent $q_{\mathrm{crit}}$.

Comment. The heuristic approach to non-random multifractals fails to extend to random ones; in particular, it fails to allow $q_{\mathrm{crit}} < \infty$. This makes it incomplete from the viewpoint of finance and several other important applications. The finite $q_{\mathrm{crit}}$ has been around since Mandelbrot (1974a, b) (where it appears under a different notation) and triggered a substantial literature in mathematics. But it is linked with events so extraordinarily unlikely as to appear incapable of having any perceptible effect on the generated measure. The applications continue to neglect it, perhaps because it is ill-understood. A central goal of TVCM is to make this concept well-understood and widely adopted.

6.4. An important role of $\tau(q)$: if $q > 1$ the $q$-th moment of $\Omega$ is finite if, and only if, $\tau(q) > 0$; the same holds for $\mu(dt)$ whenever $dt$ is a dyadic interval

By definition, after $k$ levels of iteration, the following symbolic equality relates independent realizations of $M$ and $\mu$. That is, it does not link random variables but distributions:
\[ \mu_k([0,1]) = M\mu_{k-1}([0,1]) + M\mu_{k-1}([0,1]). \]
Conservation on the average is expressed by the identity $E\mu_{k-1}([0,1]) = 1$. In addition, we have the following recursion relative to the second moment.
\[ E\mu_k^2([0,1]) = 2\,EM^2\,E\mu_{k-1}^2([0,1]) + 2\,[EM]^2\,[E\mu_{k-1}([0,1])]^2. \]
Since $EM = 1/2$ and $E\mu_{k-1}([0,1]) = 1$, the second term to the right reduces to $1/2$. Now let $k \to \infty$. The necessary and sufficient condition for the variance of $\mu_k([0,1])$ to converge to a finite limit is $2EM^2 < 1$, in other words $\tau(2) = -\log_2 EM^2 - 1 > 0$. When such is the case, Kahane and Peyrière (1976) gave a mathematically rigorous proof that there exists a limit measure $\mu([0,1])$ satisfying the formal expression
\[ E\mu^2([0,1]) = \frac{1}{2(1 - 2^{-\tau(2)})}. \]
Higher integer moments satisfy analogous recursion relations. That is, knowing that all moments of order up to $q - 1$ are finite, the moment of order $q$ is finite if and only if $\tau(q) > 0$. The moments of non-integer order $q$ are more delicate to handle, but they too are finite if, and only if, $\tau(q) > 0$.

6.5. Definition of $q_{\mathrm{crit}}$; proof that in the case of TVCM $q_{\mathrm{crit}}$ is finite if, and only if, $u > 1$

Section 5.4 noted that the graph of $\tau(q)$ is always cap-convex and that, for large $q > 0$,
\[ \tau(q) \sim -\log_2[pu^q] - 1 = -\log_2 p - 1 - q\log_2 u. \]
The dependence of $\tau(q)$ on $q$ is ruled by the sign of $u - 1$, as follows.
* The case when $u < 1$, hence $\alpha_{\min} > 0$. In this case, $\tau(q)$ is monotone increasing and $\tau(q) > 0$ for $q > 1$. This behavior is exemplified by the Bernoulli binomial.
* The case when $u > 1$, hence $\alpha_{\min} < 0$. In this case, one has $\tau(q) < 0$ for large $q$. In addition to the root $q = 1$, the equation $\tau(q) = 0$ has a second root that is denoted by $q_{\mathrm{crit}}$.

Comment. In terms of the function $f(\alpha)$ graphed on Figure 3, the values 1 and $q_{\mathrm{crit}}$ are the slopes of the two tangents drawn to $f(\alpha)$ from the origin $(0,0)$. Within the class of equivalence of any $p$ and $1 - p$, the parameter $g$ can be "tuned" so that $q_{\mathrm{crit}}$ begins by being $> 1$ then converges to 1; if so, it is seen that $D$ converges to 0.
* Therefore, the conditions $q_{\mathrm{crit}} = 1$ and $D = 0$ describe the same "anomaly".

In Figure 1, isolines of $q_{\mathrm{crit}}$ are drawn for $q_{\mathrm{crit}} = 1, 2, 3$, and 4. When $q = 1$ is the only root, it is convenient to say that $q_{\mathrm{crit}} = \infty$.
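The definition of $q_{\mathrm{crit}}$ can be illustrated by a short numerical sketch. The Python code below evaluates $\tau(q) = -\log_2[pu^q + (1-p)v^q] - 1$, with $v$ fixed by $pu + (1-p)v = 1/2$, and locates the second root of $\tau(q) = 0$ by bisection. The function names `tau`, `dimension_D` and `q_crit`, the bisection method and the cutoff `q_max` are choices made for this example, not part of the original text; the sketch assumes $0 < p < 1$ and $0 < u \le 1/(2p)$.

```python
import math


def _v_from(p, u):
    """Second multiplier value v, fixed by conservation p*u + (1-p)*v = 1/2."""
    if not 0.0 < p < 1.0 or u <= 0.0 or p * u > 0.5:
        raise ValueError("need 0 < p < 1 and 0 < u <= 1/(2p)")
    return (0.5 - p * u) / (1.0 - p)


def tau(q, p, u):
    """tau(q) = -log2(p*u**q + (1-p)*v**q) - 1 (use q > 0 when v == 0)."""
    v = _v_from(p, u)
    return -math.log2(p * u**q + (1.0 - p) * v**q) - 1.0


def dimension_D(p, u):
    """D = tau'(1) = 2*[-p*u*log2(u) - (1-p)*v*log2(v)], with 0*log2(0) = 0."""
    v = _v_from(p, u)
    term_v = (1.0 - p) * v * math.log2(v) if v > 0.0 else 0.0
    return -2.0 * (p * u * math.log2(u) + term_v)


def q_crit(p, u, q_max=1e6):
    """Second root of tau(q) = 0 above q = 1, or math.inf when u <= 1."""
    if u <= 1.0:
        return math.inf  # q = 1 is the only root
    if dimension_D(p, u) <= 0.0:
        raise ValueError("D <= 0: no root above q = 1")
    hi = 2.0
    while tau(hi, p, u) >= 0.0:  # widen the bracket until tau turns negative
        hi *= 2.0
        if hi > q_max:
            raise ValueError("no sign change found below q_max")
    lo = 1.0  # tau > 0 just to the right of 1 because D > 0
    for _ in range(200):  # bisection on the sign of tau
        mid = 0.5 * (lo + hi)
        if tau(mid, p, u) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with $p = 1/4$ and $u = (1 + \sqrt{3})/2$, so that $v = (3 - \sqrt{3})/6$ and $EM^2 = 1/2$, the call `q_crit(0.25, (1 + math.sqrt(3)) / 2)` returns a value close to 2.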
This isoset $q_{\mathrm{crit}} = \infty$ is made of the half-line $\{v = 1/2$ and $u > 1/2\}$ and of the square $\{0 < v < 1/2,\ 1/2 < u < 1\}$.

6.6. The exponent $q_{\mathrm{crit}}$ can be considered as a macroscopic variable of the generating process

Any set of two parameters that fully describes a TVCM can be called "microscopic". All the quantities that are directly observable and can be called macroscopic are functions of those two parameters. For the general canonical multifractal, a full specification requires a far larger number of microscopic quantities but the same number of macroscopic ones. Some of the latter characterize each sample, but others, for example $q_{\mathrm{crit}}$, characterize the population.

Fig. 3. The functions $f(\alpha)$ for $p = 3/4$ and varying $g$. All those graphs are linked by horizontal reductions or dilations followed by translation and further self-affinity. It is widely anticipated that $f(\alpha) > 0$ holds in all cases, but for the TVCM this anticipation fails, as shown in this figure. For $g > 0$ (resp., $g < 0$) the left endpoint of $f(\alpha)$ (resp., the right endpoint) satisfies $f(\alpha) < 0$ and the other endpoint, $f(\alpha) > 0$.

7. The quantity $\alpha$: the original Hölder exponent and beyond

The multiplicative cascades, common to the Bernoulli and canonical binomials and TVCM, involve successive multiplications. An immediate consequence is that both the basic $\mu(dt)$ and its probability are most intrinsically viewed through their logarithms. A less obvious fact is that a normalizing factor $1/\log(dt)$ is appropriate in each case. An even less obvious fact is that the normalizations $\log\mu/\log dt$ and $\log P/\log dt$ are of far broader usefulness in the study of multifractals. The exact extent of their domain of usefulness is beyond the goal of this chapter, but we keep to some special cases that can be treated fully by elementary arguments.

7.1.
The Bernoulli binomial case and two forms of the Hölder exponent: coarse-grained (or coarse) and fine-grained

Recall that due to conservation, the measure in an interval of length $dt = 2^{-k}$ is the same after $k$ stages and in the limit, namely, $\mu(dt) = \mu_k(dt)$. As a result, the coarse-grained Hölder exponent can be defined in either of two ways,
\[ \alpha(dt) = \frac{\log\mu(dt)}{\log(dt)} \quad\text{and}\quad \tilde\alpha(dt) = \frac{\log\mu_k(dt)}{\log(dt)}. \]
The distinction is empty in the Bernoulli case but will prove essential for the TVCM. In terms of the relative frequencies $\varphi_0$ and $\varphi_1$ defined in Section 2.1,
\[ \alpha(dt) = \tilde\alpha(dt) = \alpha(\varphi_0,\varphi_1) = -\varphi_0\log_2 u - \varphi_1\log_2 v = -\varphi_0(\log_2 u - \log_2 v) - \log_2 v. \]
Since $u > v$, one has
\[ 0 < \alpha_{\min} = -\log_2 u \le \alpha = \tilde\alpha \le \alpha_{\max} = -\log_2 v < \infty. \]
In particular, $\alpha > 0$, hence $\tilde\alpha > 0$. As $dt \to 0$, so does $\mu(dt)$, and a formal inversion of the definition of $\alpha$ yields
\[ \mu(dt) = (dt)^{\alpha}. \]
This inversion reveals an old mathematical pedigree. Redefine $\varphi_0$ and $\varphi_1$ from denoting the finite frequencies of 0 and 1 in an interval, into denoting the limit frequencies at an instant $t$. The instant $t$ is the limit of an infinite sequence of approximating intervals of duration $2^{-k}$. The function $\mu([0,t])$ is non-differentiable because $\lim_{dt\to 0}\mu(dt)/dt$ is not defined and cannot serve to define the local density of $\mu$ at the instant $t$. The need for an alternative measure of the roughness of a singularity first arose around 1870 in mathematical esoterica due to L. Hölder. In fractal/multifractal geometry this expression merged with a very concrete exponent due to H.E. Hurst and is continually being generalized. It follows that for the Bernoulli binomial measure, it is legitimate to interpret the coarse $\alpha$s as finite-difference surrogates of the local (infinitesimal) Hölder exponents.

7.2. In the general TVCM measure, $\alpha \neq \tilde\alpha$, and the link between "$\alpha$" and the Hölder exponent breaks down; one consequence is that the "doubly anomalous" inequalities $\alpha_{\min} < 0$, hence $\tilde\alpha < 0$, are not excluded

A Hölder (Hurst) exponent is necessarily positive.
Hence negative $\tilde\alpha$s cannot be interpreted as Hölder exponents. Let us describe the heuristic argument that leads to this paradox and then show that $\tilde\alpha < 0$ is a serious "anomaly": it shows that the link between "some kind of $\alpha$" and the Hölder exponent requires a searching look. The resolution of the paradox is very subtle and is associated with the finite $q_{\mathrm{crit}}$ introduced in Section 6.5.

Once again, except in the Bernoulli case, $\Omega \neq 1$ and $\mu(dt) = \mu_k(dt)\Omega(dt)$, hence
\[ \alpha(dt) = \tilde\alpha(dt) + \frac{\log\Omega(dt)}{\log dt}. \]
In the limit $dt \to 0$ the term $\log\Omega/\log(dt)$ tends to 0, hence it seems that $\alpha = \tilde\alpha$. Assume $u > 1$, hence $\alpha_{\min} < 0$, and consider an interval where $\tilde\alpha(dt) < 0$. The formal equality "$\mu_k(dt) = (dt)^{\tilde\alpha}$" seems to hold and to imply that "the" mass in an interval increases as the interval length $\to 0$. On casual inspection, this is absurd. On careful inspection, it is not, simply because the variable $dt = 2^{-k}$ and the function $\mu_k(dt)$ both depend on $k$. For example, consider the point $t$ for which $\varphi_0 = 1$. Around this point, one has $\mu_k = u\mu_{k-1} > \mu_{k-1}$. This inequality is not paradoxical.

Furthermore, Section 8 shows that the theory of the multiplicative measures introduces $\tilde\alpha$ intrinsically and inevitably and allows $\tilde\alpha < 0$. Those seemingly contradictory properties will be reexamined in Section 9. Values $\tilde\alpha(dt) < 0$ will be seen to have a positive probability but one so minute that they can never be observed in the way $\alpha > 0$ are observed. But they affect the distribution of the variable $\Omega$ examined in Section 4, therefore are observed indirectly.

8. The full function $f(\alpha)$ and the function $\rho(\alpha)$

8.1. The Bernoulli binomial measure: definition and derivation of the box dimension function $f(\alpha)$

The number of intervals of length $2^{-k}$ leading to $\varphi_0$ and $\varphi_1$ is $N(k,\varphi_0,\varphi_1) = k!/[(k\varphi_0)!(k\varphi_1)!]$, and $dt$ is the reduction ratio $r$ from $[0,1]$ to an interval of duration $dt$. Therefore, the expression
\[ f(k,\varphi_0,\varphi_1) = -\frac{\log N(k,\varphi_0,\varphi_1)}{\log(dt)} = -\frac{\log\{k!/[(k\varphi_0)!(k\varphi_1)!]\}}{\log(dt)} \]
is of the form $f(k,\varphi_0,\varphi_1) = -\log N/\log r$. Fractal geometry calls this the "box similarity dimension" of a set. This is one of several forms taken by fractal dimension. More precisely, since the boxes belong to a grid, it is a grid fractal dimension.

The dimension function $f(\alpha)$. For large $k$, the leading term in the Stirling approximation of the factorial yields
\[ \lim_{k\to\infty} f(k,\varphi_0,\varphi_1) = f(\varphi_0,\varphi_1) = -\varphi_0\log_2\varphi_0 - \varphi_1\log_2\varphi_1. \]

8.2. The "entropy ogive" function $f(\alpha)$; the role of statistical thermodynamics in multifractals and the contrast between equipartition and concentration

Eliminate $\varphi_0$ and $\varphi_1$ between the functions $f$ and $\alpha = -\varphi_0\log_2 u - \varphi_1\log_2 v$. This yields in parametric form a function, $f(\alpha)$. Note that $0 \le f(\alpha) \le \min\{\alpha,1\}$. Equality to the right is achieved when $\varphi_0 = u$. The value $\alpha$ where $f = \alpha$ is very important and will be discussed in Section 9. In terms of the reduced variable $\varphi_0 = (\alpha_{\max} - \alpha)/(\alpha_{\max} - \alpha_{\min})$, the function $f(\alpha)$ becomes the "ogive"
\[ \tilde f(\varphi_0) = -\varphi_0\log_2\varphi_0 - (1-\varphi_0)\log_2(1-\varphi_0). \]
This $\tilde f(\varphi_0)$ can be called a universal function. The $f(\alpha)$ corresponding to fixed $p$ and varying $g$ are affine transforms of $\tilde f(\varphi_0)$, therefore of one another.

The ogive function $\tilde f$ first arose in thermodynamics as an entropy and in 1948 (with Shannon) entered communication theory as an information. Its occurrence here is the first of several roles the formalism of thermodynamics plays in the theory of multifractals.

An essential but paradoxical feature. Equilibrium thermodynamics is a study of various forms of near-equality; for example, it postulates the equipartition of states on a surface in phase space or of energy among modes. In sharp contrast, multifractals are characterized by extreme inequality between the measures in different intervals of common duration $dt$. Upon more careful examination, the paradox dissolves by being turned around: the main tools of thermodynamics can handle phenomena well beyond their original scope.

8.3.
The Bernoulli binomial measure, continued: definition and derivation of a function $\rho(\alpha) = f(\alpha) - 1$ that originates as a rescaled logarithm of a probability

The function $f(\alpha)$ never fully specifies the measure. For example, it does not distinguish between the Bernoulli, shuffled and canonical binomials. The function $f(\alpha)$ can be generalized by being deduced from a function $\rho(\alpha) = f(\alpha) - 1$ that will now be defined. Instead of dimensions, that deduction relies on probabilities. In the Bernoulli case, the derivation of $\rho$ is a minute variant of the argument in Section 8.1, but, contrary to the definition of $f$, the definition of $\rho$ easily extends to TVCM and other random multifractals.

In the Bernoulli binomial case, the probability of hitting an interval leading to $\varphi_0$ and $\varphi_1$ is simply
\[ P(k,\varphi_0,\varphi_1) = N(k,\varphi_0,\varphi_1)\,2^{-k} = \frac{k!}{(k\varphi_0)!(k\varphi_1)!}\,2^{-k}. \]
Consider the expression
\[ \rho(k,\varphi_0,\varphi_1) = -\frac{\log[P(k,\varphi_0,\varphi_1)]}{\log(dt)}, \]
which is a rescaled but not averaged form of entropy. For large $k$, Stirling yields
\[ \lim_{k\to\infty}\rho(k,\varphi_0,\varphi_1) = \rho(\varphi_0,\varphi_1) = -\varphi_0\log_2\varphi_0 - \varphi_1\log_2\varphi_1 - 1 = f(\alpha) - 1. \]

8.4. Generalization of $\rho(\alpha)$ to the case of TVCM; the definition of $f(\alpha)$ as $\rho(\alpha) + 1$ is indirect but significant because it allows the generalized $f$ to be negative

Comparing the arguments in Sections 8.1 and 8.3 links the concepts of fractal dimension and of minus log (probability). However, when $f(\alpha)$ is reported through $f(\alpha) = \rho(\alpha) + 1$, the latter is not a mysterious "spectrum of singularities". It is simply the peculiar but proper way a probability distribution must be handled in the case of multifractal measures. Moreover, there is a major a priori difference exploited in Section 10. Minus log (probability) is not subjected to any bound. To the contrary, every one of the traditional definitions of fractal dimension (including Hausdorff-Besicovitch or Minkowski-Bouligand) necessarily yields a positive value.

The point is that the dimension argument in Section 8.1 does not carry over to TVCM, but the probability argument does carry over as follows. The probability of hitting an interval leading to $\varphi_0$ and $\varphi_1$ now changes to
\[ P(k,\varphi_0,\varphi_1) = p^{k\varphi_0}(1-p)^{k\varphi_1}\,\frac{k!}{(k\varphi_0)!(k\varphi_1)!}. \]
One can now form the expression
\[ \rho(k,\varphi_0,\varphi_1) = -\frac{\log[P(k,\varphi_0,\varphi_1)]}{\log(dt)}. \]
Stirling now yields
\[ \rho(\varphi_0,\varphi_1) = \lim_{k\to\infty}\rho(k,\varphi_0,\varphi_1) = \{-\varphi_0\log_2\varphi_0 - \varphi_1\log_2\varphi_1\} + \{\varphi_0\log_2 p + \varphi_1\log_2(1-p)\}. \]
In this sum of two terms marked by braces, we know that the first one transforms (by horizontal stretching and translation) into the entropy ogive. The second is a linear function of $\alpha$, namely $\varphi_0[\log_2 p - \log_2(1-p)] + \log_2(1-p)$. It transforms the entropy ogive by an affinity in which the line joining the two support endpoints changes from horizontal to inclined. The overall affinity solely depends on $p$, but $\varphi_0$ depends explicitly on $u$ and $v$. This affinity extends to all values of $p$. Another property familiar from the binomial extends to all values of $p$. For all $u$ and $v$, the graphs of $\rho(\alpha)$, hence of $f(\alpha)$, have a vertical slope for $q = \pm\infty$. Alternatively,
\[ \rho(\varphi_0,\varphi_1) = -\varphi_0\log_2[\varphi_0/p] - \varphi_1\log_2[\varphi_1/(1-p)]. \]

8.5. Comments in terms of probability theory

Roughly speaking, the measure $\mu$ is a product of random variables, while the limit theorems of probability theory are concerned with sums. The definition of $\alpha$ as $\log\mu(dt)/\log(dt)$ replaces a product of random variables $M$ by a weighted sum of random variables of the form $\log M$. Let us now go through this argument step by step in greater rigor and generality. One needs a cumbersome restatement of $\mu_k(dt)$.

The low frequency factor of $\mu_k(dt)$ and the random variable $H_{\mathrm{low}}$. Consider once again a dyadic cell of length $2^{-k}$ that starts at $t = 0.\beta_1\beta_2\ldots\beta_k$. The first $k$ stages of the cascade can be called of low frequency because they involve multipliers that are constant over dyadic intervals of length $dt = 2^{-k}$ or longer. These stages yield
\[ \mu_k(dt) = M(\beta_1)\,M(\beta_1,\beta_2)\cdots M(\beta_1,\ldots,\beta_k) = \prod M. \]
We transform $\mu_k(dt)$ into the low frequency random variable
\[ H_{\mathrm{low}} = \frac{\log[\mu_k(dt)]}{\log(dt)} = \frac{1}{k}\left[-\log_2 M(\beta_1) - \log_2 M(\beta_1,\beta_2) - \cdots\right]. \]
We saw in Section 4.5 that the first few values of $M$ largely determine the distribution of $\Omega$. But the last expression involves an operation of averaging in which the first terms contributing to $\mu(dt)$ are asymptotically washed out.

8.6. Distinction between "center" and "tail" theorems in probability

The quantity $\tilde\alpha_k(dt) = -\varphi_0\log_2 u - \varphi_1\log_2 v$ is the average of a sum of variables $-\log M$; but why is it that its distribution is not Gaussian and that the graph of $\rho(\alpha)$ is an entropy ogive rather than a parabola? The law of large numbers tells us that $\tilde\alpha_k(dt)$ almost surely converges to its expectation, which tells us very little. A tempting heuristic argument continues as follows. The central limit theorem is believed to ensure that for small $dt$, $H_{\mathrm{low}}(dt)$ becomes Gaussian, therefore the graph of $\log p(dt)$ should be expected to be a parabola. This being granted, why is it that the Stirling approximation yields an entropy ogive, not a parabola?

In fact, there is no paradox of any kind. While the central limit theorem is indeed central to probability theory, all it asserts in this context is that, asymptotically, the Gaussian rules the center of the distribution, its "bell". Renormalizations reduce this center to the immediate neighborhood of the top of the $\rho(\alpha)$ graph, and the central limit theorem is correct in asserting that the top of the entropy ogive is locally parabolic. But in the present context this information is of little significance. We need instead an alternative that is concerned only with the tail behavior, which it ought to blow up. For this and many other reasons, it would be an excellent idea to speak of center, not central, limit theorem. The tail limit theorem is due to H. Cramér and asserts that the tail consisting in the bulk of the graph is not a parabola but an entropy ogive.

8.7.
The reason for the anomalous inequalities $f(\alpha) < 0$ and $\alpha < 0$ is that, by the definition of a random variable $\mu(dt)$, the sample size is bounded and is prescribed intrinsically; the notion of supersampling

The inequality $\rho(\alpha) < -1$ characterizes events whose probability is extraordinarily small. The finding that this inequality plays a significant role was not anticipated, remains difficult to understand and appreciate, and demands comment. The common response is that even extremely low probability events are captured if one simply takes a sufficiently long sample of independent values. But this is impossible, even if one forgets that, in the present uncommon context, the values are extremely far from being statistically independent. Indeed, the choice of the duration $dt = 2^{-k}$ has two effects. Not only does it fix the distribution of $\mu(dt)$, but it also sets the sample size at the value $N = 1/dt = 2^k$. Roughly speaking, a sample of size $N$ can only reveal values having a probability greater than $1/N$, which means $\rho(\alpha) > -1$.

In summary, it is true that decreasing $dt$ to $2^{-k-1}$ increases the sample size. But it also changes the distribution and does so in such a way that the bound $\rho = -1$ remains untouched. This bound excludes items of information that correspond to $f(\alpha) < 0$ (for example, the value of $q_{\mathrm{crit}}$ when finite). Those items remain hidden and latent in the sense that they cannot be inferred from one sample of values of $\mu(dt)$. Ways of revealing those values, supersampling and embedding, are examined in Mandelbrot (1989b, 1995) and the forthcoming Mandelbrot (2003). Figure 3 shows, for $p = 3/4$, how the graph of $f(\alpha)$ depends on $g$.

8.8.
Excluding the Bernoulli case $p = 1/2$, TVCM faces either one of two major "anomalies": for $p > 1/2$, one has $f(\alpha_{\min}) = 1 + \log_2 p > 0$ and $f(\alpha_{\max}) = 1 + \log_2(1 - p) < 0$; for $p < 1/2$, the opposite signs hold

The fact that the values of $\rho(\alpha_{\min}) = f(\alpha_{\min}) - 1$ and $\rho(\alpha_{\max}) = f(\alpha_{\max}) - 1$ are logarithms of probabilities confirms and extends the definition of $\rho(\alpha) = f(\alpha) - 1$ as a limit rescaled probability. Here, those endpoint values of $f(\alpha)$ are independent of $g$, and the affinity that deduces them from the entropy ogive (with ends on the horizontal axis) characterizes the class of equivalence of $p$ and $1-p$.

If, and only if, $p = 1/2$ and $u + v = 1$, that is, in the familiar Bernoulli binomial case, one has $\rho(\alpha_{\min}) = \rho(\alpha_{\max}) = \log_2(1/2) = -1$, hence $f(\alpha_{\min}) = f(\alpha_{\max}) = 0$. When $u + v \neq 1$, one of the endpoints satisfies $f > 0$ and the other satisfies $f < 0$. Sections 8.9 and 10 shall examine the sharply differing consequences of those inequalities.

8.9. The "minor anomalies" $f(\alpha_{\max}) > 0$ or $f(\alpha_{\min}) > 0$ lead to sample functions with a clear "ceiling" or "floor"

Suppose that $f(\alpha_{\min}) = 0$ and $f(\alpha_{\max}) = 0$, as is the case for $p = 1/2$. Then, using terms often applied to the printed page (but after it has been turned 90° to the side), the sample functions are "non-justified" or "ragged" for both high and low values. That is, the values tend to be unequal; one is clearly larger than all others, a second is clearly the second largest, etc.

To the contrary, TVCM with $p \neq 1/2$ yield either $f(\alpha_{\max}) > 0$ or $f(\alpha_{\min}) > 0$. Sample functions have a conspicuous "ceiling" (resp., a "floor"). That is, a largest (resp., smallest) value is attained repeatedly for values of $t$ belonging to a set of positive dimension. To use the printers' vocabulary, when one side is "ragged" the other is "justified". On visual inspection of the data, the ceiling is always visible; the floor merges with the time axis, except when one plots $\log[\mu(dt)]$.

9.
The fractal dimension $D = \tau'(1) = 2[-pu\log_2 u - (1 - p)v \log_2 v]$ and multifractal concentration

The function $f(\alpha)$ satisfies $f(\alpha) \le \alpha$, with equality $f(\alpha) = \alpha$ when $\alpha = D = \tau'(1)$. From the value of $\alpha = D$ follows one of the most important properties of multifractals. Mandelbrot (2001d) proposed to call it "multifractal concentration". This section will first examine its opposite, which is asymptotic negligibility.

9.1. In the Bernoulli binomial measures weak asymptotic negligibility holds but strong asymptotic negligibility fails

Recall that during construction, the total binomial measure of $[0,1]$ remains constant and equal to 1. But the first few stages of construction make its distribution very unequal, and a few values stand out as sharp spikes. After $k$ stages, the maximum measure is $u^k$, which is far larger than the minimum measure $v^k$. From the relations $2^{-k} = dt$, $2^k = N$, $-\log_2 u = \alpha_{\min} < 1$, and $-\log_2 v = \alpha_{\max} > 1$, it follows that

$$u^k = b^{(-\log_b u)(-k)} = (dt)^{\alpha_{\min}} = N^{-\alpha_{\min}}.$$

In words: even the maximum $u^k$ tends to 0. This is a weak form of asymptotic negligibility following a power-law.

The preceding result holds for every multifractal for which there is an $\alpha_{\min} > 0$ that plays the same role as in the binomial case. (In more general multifractals the same role is held by some $\alpha^*_{\min} > \max\{\alpha_{\min}, 0\}$.) Similarly, the total contribution of any fixed number of largest spikes is asymptotically negligible.

9.2. For the Bernoulli or canonical binomials, the equation $f(\alpha) = \alpha$ has one and only one solution; that solution satisfies $D > 0$ and is the fractal dimension of the "carrier" of the measure

We now proceed to the total contribution of a number of spikes that is no longer fixed but increases with $N$. In the simplest of all possible worlds, many spikes would have been more or less equal to the largest, and the sum of all the other spikes would have been negligible. If so, the sum of $N^{\alpha_{\min}}$ spikes would have been of the order of $N^{\alpha_{\min}} N^{-\alpha_{\min}} = 1$.
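The weak negligibility of Section 9.1 can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the Bernoulli binomial measure with multipliers $u > 1/2$ and $v = 1 - u$, and the function name `largest_spikes_total` and the value $u = 0.7$ are illustrative choices. It compares $u^k$ with $N^{-\alpha_{\min}}$ and adds up a fixed number of the largest cells.

```python
import math

def largest_spikes_total(u: float, k: int, m: int) -> float:
    """Total mass of the m largest cells after k stages of the Bernoulli
    binomial cascade with multipliers u > 1/2 and v = 1 - u (assumed setup)."""
    v = 1.0 - u
    total, remaining = 0.0, m
    for j in range(k + 1):            # j = number of factors v in a cell's product
        count = math.comb(k, j)       # number of cells with mass u**(k-j) * v**j
        take = min(count, remaining)  # cells are visited from largest to smallest
        total += take * u ** (k - j) * v ** j
        remaining -= take
        if remaining == 0:
            break
    return total

u = 0.7                               # illustrative value, not from the text
alpha_min = -math.log2(u)
for k in (5, 10, 20, 40):
    N = 2 ** k
    # columns: k, maximum cell u^k, N^(-alpha_min), total of the 10 largest cells
    print(k, u ** k, N ** (-alpha_min), largest_spikes_total(u, k, m=10))
```

The second and third columns agree, and the last column shrinks as $k$ grows, which is the statement that any fixed number of largest spikes carries an asymptotically negligible share of the measure.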
While the world is actually more complicated, there is an element of orderliness. The equality $\varphi_0 = u$ is achieved for $\alpha = f(\alpha) = -u\log_2 u - v \log_2 v = D$. For finite but large $k$, it follows that

$$\mu(k,\varphi_0,\varphi_1) \approx 2^{-k\alpha} = 2^{-kD} \quad\text{and}\quad N(k,\varphi_0,\varphi_1) \approx 2^{kf(\alpha)} = 2^{kD}.$$

Hence, $\mu(k,\varphi_0,\varphi_1)\,N(k,\varphi_0,\varphi_1)$ is approximately equal to 1. Actually, this product is necessarily $< 1$ but the difference tends to 0 as $k \to \infty$. That is, an increasingly overwhelming bulk of the measure tends to "concentrate" in the cells where $\alpha = D$. The remainder is small, but in the theory of multifractals even very small remainders are extremely significant for some purposes.

9.3. The notion of "multifractal concentration"

A key feature of multifractals is a subtle interaction between number and size that is elaborated upon in Mandelbrot (2001d). Section 9.2 showed that the contributions that are large are too few to matter. The small contributions are very numerous, but so extremely small that their total contribution is negligible as well. The bulk of the measure is found in a rather inconspicuous intermediate range one can call "mass carrying".

Since $D > \alpha_{\min}$, the $N^D$ spikes of size $N^{-D}$ are far smaller than the largest one. Separately, each is asymptotically negligible. But their number $N^D$ is exactly large enough to ensure that their total contribution is nearly equal to the overall measure 1. When a sample is plotted, this range does not stand out but it makes a perfect match between size and frequency.

Practically, the number of visible peaks is so small compared to $N^D$ that a combination of the peaks and the intermediate range is still of the order of $N^D$. The combined range has the advantage of simplicity, since it includes the $N^D$ largest values. Note that the peaks tend to be located in the midst of stretches of values of intermediate size.

9.4.
The case of TVCM with $p < 1/2$ allows $D$ to be positive, negative, or zero

Using the alternative expression for $f(\alpha)$ given in Section 8.4, the identity $f(\alpha) = \alpha$ demands the equality of the two expressions

$$f(\alpha) = -\varphi_0 \log_2 \frac{\varphi_0}{p} - \varphi_1 \log_2 \frac{\varphi_1}{1 - p} \quad\text{and}\quad \alpha = -\varphi_0 \log_2 u - \varphi_1 \log_2 v.$$

The solution is, obviously, $\varphi_0 = pu$ and $\varphi_1 = (1 - p)v$. The sum $\varphi_0 + \varphi_1$ is 1, as it must. Hence, $D = -pu\log_2 u - (1 - p)v \log_2 v$, as announced. The novelty is that TVCM allow $D > 0$, $D = 0$, and $D < 0$.

Familiar role of $D$ under the inequality $D > 0$. Mandelbrot (1974a, b) obtained the following criterion, which has become widely known and includes the TVCM case. When positive, $D$ is the fractal dimension of the "set that supports" the measure. Figure 1 shows isolines of $D$ for $D = 0$, $1/4$, $1/2$, and $3/4$. The isoline for $D = 1$ is made of the interval $\{u = 1,\ 0 < v < 1\}$ and the half-line $\{v = 1,\ u \ge 1\}$. The key result is that, contrary to the Bernoulli binomial case, the half-line $1 < q < \infty$ subdivides into up to three subranges of values.

Largely unfamiliar consequence of the inequality $D < 0$. For all non-random multifractals, $\tau'(1) > 0$. A casual acquaintance with multifractals takes for granted that this is not changed by randomness. But Mandelbrot (1974a, b) also allows for an alternative possibility, which has so far remained little known. The example of TVCM shows that, in a canonical case, the formally evaluated $D$ can be negative. In the example of TVCM, $D$ is negative when the point $(u,v)$ falls in a domain to the bottom right of the folded phase diagram in Figure 1. The consequences of $D < 0$ are drastic: the multifractal reduces to 0 almost surely and is called degenerate.

A classical "pathological limit" as metaphor. This limit behavior of the distribution of $\Omega$ seems incompatible with the fact that $E\Omega = 1$ by definition. But in fact, no contradiction is observed. A convincing idea of the distribution is provided for each p, by the behavior of the g limit of the weights ug2(g) and vg2(g).
This recalls a classical counterexample of analysis, namely, the behavior for $k \to \infty$ of the variable $P_k$ defined as follows: $P_k = k$ with the probability $1/k$ and $P_k = 0$ with the probability $1 - 1/k$. For finite $k$, one has $EP_k = 1$. But in the limit $k \to \infty$, $P = 0$, hence $EP = 0$, so that in the limit the expectation drops discontinuously from 1 to 0. In practice, the preasymptotic measure is extremely small with a high probability and huge with a tiny probability.

The condition $D = 0$. It defines the threshold of degeneracy.

10. A noteworthy and unexpected separation of roles, between the "dimension spectrum" and the total mass $\Omega$; the former is ruled by the accessible $\alpha$ for which $f(\alpha) > 0$, the latter, by the inaccessible $\alpha$ for which $f(\alpha) < 0$

Brought together, Sections 4, 7, 8, and 9 imply, in plain words, that what you do not necessarily see may affect you significantly. This section serves to underline that the notion of canonical multifractal is very subtle and deserves to be well understood and further discussed.

10.1. Definitions of the "accessible ranges" of the variables: $q$s from $q^*_{\min}$ to $q^*_{\max}$ and $\alpha$s from $\alpha^*_{\min}$ to $\alpha^*_{\max}$; the accessible functions $\tau^*(q)$ and $f^*(\alpha)$

Mandelbrot (1995) introduced the function $f^*(\alpha) = \max\{0, f(\alpha)\}$. That is,
* In the interval $[\alpha^*_{\min}, \alpha^*_{\max}]$ where $f(\alpha) > 0$, $f^*(\alpha) = f(\alpha)$;
* When $f(\alpha) \le 0$, $f^*(\alpha) = 0$.

The graph of $f^*(\alpha)$ is identical to that of $f(\alpha)$ except that the "tails" with $f < 0$ are truncated so that $f^* \ge 0$. In terms of $\tau(q)$, the equality $f(\alpha) = 0$ corresponds to lines that are tangent to the graph of $\tau(q)$ and also go through $(0,0)$. In the most general case, those lines' slopes are $\alpha^*_{\min}$ and $\alpha^*_{\max}$ and the points of contact are denoted by $q^*_{\max}$ (satisfying $q^*_{\max} > 0$) and $q^*_{\min}$ (satisfying $q^*_{\min} < 0$). Therefore, the function $f^*(\alpha)$ corresponds to the following truncated function $\tau^*(q)$.
* When $q < q^*_{\min}$, $\tau^*(q) = \alpha^*_{\max}\, q$;
* When $q^*_{\min} < q < q^*_{\max}$, $\tau^*(q) = \tau(q)$;
* When $q > q^*_{\max}$, $\tau^*(q) = \alpha^*_{\min}\, q$.
In other words, the graph of $\tau^*$ is identical to that of $\tau$ except that beyond $q^*_{\max}$ or $q^*_{\min}$ it follows the tangents that go through the origin. Therefore it is straight. For the TVCM, one has either $\alpha^*_{\max} = \alpha_{\max}$ with $q^*_{\min} = -\infty$, or $\alpha^*_{\min} = \alpha_{\min}$ with $q^*_{\max} = \infty$.

10.2. A confrontation

Section 4 noted that the largest values of $\mu([0,1])$ are generated when a sample cascade begins with a few large values. Section 7 noted that the value of $\mu([0,1])$, irrespective of size, ceases, for $k \to \infty$, to have any impact on $\alpha$. Section 8 noted that, again for $k \to \infty$, values of $\alpha$ such that $f(\alpha) < 0$ have a vanishing probability of being observed. Section 10.1 followed up by defining the accessible function $f^*(\alpha)$. Section 9 returned to large values of $\mu([0,1])$ and noted their association with $q_{\text{crit}} < \infty$. The values of $\alpha$ they involve satisfy $\alpha < 0$, hence a fortiori $f(\alpha) < 0$. Those values do not occur in multifractal decomposition, yet they are extremely important.

10.3. The simplest cases where $f(\alpha) > 0$ for all $\alpha$, as exemplified by the canonical binomial

Here, the large values of $\mu$ are ruled by the left-most part of the graph of $f(\alpha)$. That is, the same graph controls those large values and the distribution of $\mu([0,1])$ among the $1/dt$ intervals of length $dt$.

10.4. The extreme case where $f(\alpha) < 0$ and $\alpha < 0$ both occur, as exemplified by TVCM when $u > 1$

Due to the inequality $f(\alpha) \le \alpha$, the graph of $f(\alpha)$ never intersects the quadrant where $\alpha < 0$ and $f > 0$. The key unexpected fact is that the portions of $f(\alpha)$ within other quadrants play more or less separate roles. In the TVCM case, those quadrants are parts of one (analytically simple) function. But in general they are nearly independent of each other.

The function $f^*(\alpha)$ was defined as having a graph that lies in the non-anomalous quadrant $\alpha > 0$ and $f > 0$.
This $f^*$ determines completely the multifractal decomposition of our TVCM measure, in particular, the dimension $D$ and the exponents $q^*_{\min}$, $q^*_{\max}$, $\alpha^*_{\min}$ and $\alpha^*_{\max}$.

To the contrary, $q_{\text{crit}}$ is entirely determined by the doubly anomalous left tail located in the quadrant characterized by $f(\alpha) < 0$ and $\alpha < 0$. A priori, it was quite unexpected that this quadrant should exist and play any role, least of all a central role, in the theory of multifractals. But in fact, $q_{\text{crit}}$ has a major effect on the distribution, hence the value of the total measure in an interval.

10.5. The intermediate case where $\alpha_{\min} > 0$ but $f(\alpha) < 0$ for some values of $\alpha$

When $p < 1/2$, but $u < 1$ so that $q_{\text{crit}} = \infty$ and all moments are finite, large values of $\Omega$ have a much lower probability than when $u > 1$. As always, however, their probability distribution continues to be determined by the left tail of the probability graph where $f < 0$.

11. A broad form of the multifractal formalism that allows $\alpha < 0$ and $f(\alpha) < 0$

The collection of rules that relate $\tau(q)$ to $f(\alpha)$ is called "multifractal formalism". TVCM was specifically designed to understand multifractals directly, thus avoiding all formalism. However, random multifractals more general than TVCM demand their own broad multifractal formalism. Once again, the most widely known form of the multifractal formalism does not allow randomness and yields $f(\alpha) > 0$, but the broad formalism first introduced in Mandelbrot (1974a, b) concerns a generalized function for which $f(\alpha) < 0$ is allowed.

11.1. The broad "multifractal formalism" confirms the form of $f(\alpha)$ and allows $f(\alpha) < 0$ for some $\alpha$

Through a point on the graph of coordinates $q$ and $\tau(q)$, draw the tangent to that graph. Under wide conditions, the tangent's slope is $\alpha(q)$ and its intercept by the ordinate axis is $-f(q)$. Thus

$$\alpha(q) = \frac{d\tau(q)}{dq} \quad\text{and}\quad -f(q) = \tau(q) - q\,\frac{d\tau(q)}{dq}.$$

Through the quantities $\alpha(q)$ and $f(q)$, a function $f(\alpha)$ is defined by using $q$ as parameter. The slope $f'(\alpha)$ is the inverse of the function $\alpha(q)$.
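The parametric definition of $f(\alpha)$ through $q$ can be made concrete with a short numerical sketch. It is illustrative only and rests on assumptions not stated in this section: the closed form $\tau(q) = -\log_2[pu^q + (1-p)v^q] - 1$ for a TVCM and the normalization $pu + (1-p)v = 1/2$, chosen because they reproduce $D = \tau'(1) = 2[-pu\log_2 u - (1-p)v\log_2 v]$ from the heading of Section 9. The names `tau`, `alpha`, `f_of_q` and the parameter values are hypothetical.

```python
import math

def tau(q: float, p: float, u: float, v: float) -> float:
    """Assumed TVCM form: tau(q) = -log2(p*u**q + (1-p)*v**q) - 1."""
    return -math.log2(p * u ** q + (1 - p) * v ** q) - 1

def alpha(q: float, p: float, u: float, v: float) -> float:
    """alpha(q) = d tau / dq, differentiated analytically."""
    num = p * u ** q * math.log2(u) + (1 - p) * v ** q * math.log2(v)
    den = p * u ** q + (1 - p) * v ** q
    return -num / den

def f_of_q(q: float, p: float, u: float, v: float) -> float:
    """f(q) = q*alpha(q) - tau(q), i.e. minus the tangent's intercept."""
    return q * alpha(q, p, u, v) - tau(q, p, u, v)

# Illustrative parameters satisfying p*u + (1-p)*v = 1/2 (an assumption).
P, U, V = 0.75, 0.6, 0.2
for q in (-20.0, -2.0, 0.0, 1.0, 2.0, 20.0):
    # each row is one point (alpha, f) of the parametric curve f(alpha)
    print(q, alpha(q, P, U, V), f_of_q(q, P, U, V))
```

Under these assumptions the row $q = 1$ gives $\alpha = f = D$, and for large $|q|$ the values of $f$ approach the endpoint values $1 + \log_2 p$ and $1 + \log_2(1 - p)$ quoted in Section 8.8, one of which is negative.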
The tangent of slope $f'(\alpha)$ intersects the line $\alpha = 0$ at the point of ordinate $-\tau(q)$. The tangent's equation being $-\tau(q) + q\alpha$, its intersection with the bisector satisfies the condition $-\tau + q\alpha = \alpha$, hence $D(q) = \tau(q)/(q - 1)$. This is the critical embedding dimension discussed in Section 5.4.

11.2. The Legendre and inverse Legendre transforms and the thermodynamical analogy

The transforms that replace $q$ and $\tau(q)$ by $\alpha$ and $f(\alpha)$, or conversely, are due to Legendre. They play a central role in thermodynamics, as does already the argument that yielded $f(\alpha)$ and $\rho(\alpha)$ in the original formalism introduced in Mandelbrot (1974a, b).

Acknowledgments

The figures in this chapter were prepared by Evgenyi Vilenkin, Yale class of 2003.

References

Chernoff, H., 1952. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics 23, 493–507.
Cootner, P.H. (Ed.), 1964. The Random Character of Stock Market Prices. MIT Press, Cambridge, MA.
Daniels, H.E., 1954. Saddlepoint approximations in statistics. The Annals of Mathematical Statistics 25, 631–649.
Durrett, R., Liggett, T.M., 1983. Fixed points of the smoothing transformation. Zeitschrift für Wahrscheinlichkeitstheorie 64, 275–301.
Frisch, U., Parisi, G., 1985. Fully developed turbulence and intermittency. In: Ghil, M. (Ed.), Turbulence and Predictability in Geophysical Fluid Dynamics and Climate Dynamics, North-Holland, pp. 84–86. Excerpted in Mandelbrot (1999a).
Halsey, T.C., Jensen, M.H., Kadanoff, L.P., Procaccia, I., Shraiman, B.I., 1986. Fractal measures and their singularities: the characterization of strange sets. Physical Review A 33, 1141–1151.
Hentschel, H.G.E., Procaccia, I., 1983. The infinite number of generalized dimensions of fractals and strange attractors. Physica (Utrecht) 8D, 435–444.
Kahane, J.-P., Peyrière, J., 1976. Sur certaines martingales de B. Mandelbrot.
Advances in Mathematics 22, 131–145. Translated in Mandelbrot (1999a) as Chapter N17.
Kolmogorov, A.N., 1962. A refinement of previous hypotheses concerning the local structure of turbulence in a viscous incompressible fluid at high Reynolds number. Journal of Fluid Mechanics 13, 82–85.
Liu, Q.S., 2002. An extension of a fundamental equation of Poincaré and Mandelbrot. Asian Journal of Mathematics 6, 145–168.
Lo, A.W., 1991. Long-term memory in stock market prices. Econometrica 59, 1279–1313.
Mandelbrot, B.B., 1963. The variation of certain speculative prices. Journal of Business (Chicago) 36, 394–419. Reprinted in Cootner (1964), as Chapter E14 of Mandelbrot (1997), in Telser (2000), and several other collections of papers on finance.
Mandelbrot, B.B., 1965. Une classe de processus stochastiques homothétiques à soi; application à la loi climatologique de H.E. Hurst. Comptes Rendus (Paris) 260, 3274–3277. Translated as Chapter H9 of Mandelbrot (2002).
Mandelbrot, B.B., 1967. The variation of some other speculative prices. Journal of Business (Chicago) 40, 393–413. Reprinted as Chapter E14 of Mandelbrot (1997), pp. 419–443, in Telser (2000), and several other collections of papers on finance.
Mandelbrot, B.B., 1972. Possible refinement of the lognormal hypothesis concerning the distribution of energy dissipation in intermittent turbulence. In: Rosenblatt, M., Van Atta, C. (Eds.), Statistical Models and Turbulence. Springer-Verlag, New York, pp. 333–351. Reprinted in Mandelbrot (1999a) as Chapter N14.
Mandelbrot, B.B., 1974a. Intermittent turbulence in self similar cascades; divergence of high moments and dimension of the carrier. Journal of Fluid Mechanics 62, 331–358. Reprinted in Mandelbrot (1999a) as Chapter N15.
Mandelbrot, B.B., 1974b. Multiplications aléatoires itérées et distributions invariantes par moyenne pondérée aléatoire. Comptes Rendus (Paris) A 278, 289–292 and 355–358. Reprinted in Mandelbrot (1999a) as Chapter N16.
Mandelbrot, B.B., 1982.
The Fractal Geometry of Nature. Freeman, New York.
Mandelbrot, B.B., 1984. Fractals in physics: squid clusters, diffusions, fractal measures and the unicity of fractal dimension. Journal of Statistical Physics 34, 895–930.
Mandelbrot, B.B., 1989a. Multifractal measures, especially for the geophysicist. Pure and Applied Geophysics 131, 5–42.
Mandelbrot, B.B., 1989b. A class of multinomial multifractal measures with negative (latent) values for the "dimension" f(α). In: Pietronero, L. (Ed.), Fractals' Physical Origin and Properties. Plenum, New York, pp. 3–29.
Mandelbrot, B.B., 1990a. Negative fractal dimensions and multifractals. Physica A 163, 306–315.
Mandelbrot, B.B., 1990b. New "anomalous" multiplicative multifractals: left-sided f(α) and the modeling of DLA. Physica A 168, 95–111.
Mandelbrot, B.B., 1990c. Limit lognormal multifractal measures. In: Gotsman, E.A., Ne'eman, Y., Voronel, A. (Eds.), Frontiers of Physics: Landau Memorial Conference. Pergamon, New York, pp. 309–340.
Mandelbrot, B.B., 1995. Negative dimensions and Hölders, multifractals and their Hölder spectra, and the role of lateral preasymptotics in science. In: Bonami, A., Peyrière, J. (Eds.), J.P. Kahane's meeting, Paris, 1993. The Journal of Fourier Analysis and Applications, 409–432.
Mandelbrot, B.B., 1997. Fractals and Scaling in Finance: Discontinuity, Concentration, Risk (Selecta Volume E). Springer-Verlag.
Mandelbrot, B.B., 1999a. Multifractals and 1/f Noise: Wild Self-Affinity in Physics (Selecta Volume N). Springer-Verlag.
Mandelbrot, B.B., 1999b. A multifractal walk through Wall Street. Scientific American, February issue, 50–53.
Mandelbrot, B.B., 2001a. Scaling in financial prices, I: Tails and dependence. Quantitative Finance 1, 113–124. Reprint: Farmer, D., Geanakoplos, J. (Eds.), Beyond Efficiency and Equilibrium. Oxford University Press, UK, 2002.
Mandelbrot, B.B., 2001b. Scaling in financial prices, II: Multifractals and the star equation.
Quantitative Finance 1, 124–130. Reprint: Farmer, D., Geanakoplos, J. (Eds.), Beyond Efficiency and Equilibrium. Oxford University Press, UK, 2002.
Mandelbrot, B.B., 2001c. Scaling in financial prices, III: Cartoon Brownian motions in multifractal time. Quantitative Finance 1, 427–440.
Mandelbrot, B.B., 2001d. Scaling in financial prices, IV: Multifractal concentration. Quantitative Finance 1, 558–559.
Mandelbrot, B.B., 2001e. Stochastic volatility, power-laws and long memory. Quantitative Finance 1, 427–440.
Mandelbrot, B.B., 2002. Gaussian Self-Affinity and Fractals (Selecta Volume H). Springer-Verlag.
Mandelbrot, B.B., 2003, forthcoming.
Mandelbrot, B.B., Calvet, L., Fisher, A., 1997. The multifractal model of asset returns. Large deviations and the distribution of price changes. The multifractality of the Deutschmark/US Dollar exchange rate. Discussion Papers numbers 1164, 1165, and 1166 of the Cowles Foundation for Economics at Yale University, New Haven, CT. Available on the web at the following addresses. http://papers.ssrn.com/sol3/paper.taf? ABSTRACT ID=78588. http://papers.ssrn.com/sol3/paper.taf? ABSTRACT ID=78606. http://papers.ssrn.com/sol3/paper.taf? ABSTRACT ID=78628.
Mandelbrot, B.B., Taylor, H.M., 1967. On the distribution of stock price differences. Operations Research 15, 1057–1062.
Obukhov, A.M., 1962. Some specific features of atmospheric turbulence. Journal of Fluid Mechanics 13, 77–81.
Telser, L. (Ed.), 2000. Classic Futures: Lessons from the Past for the Electronic Age. Risk Books, London.

Chapter 2

FINANCIAL RISK AND HEAVY TAILS

BRENDAN O. BRADLEY and MURAD S. TAQQU
Department of Mathematics and Statistics, Boston University, 111 Cummington Street, Boston, MA 02215, USA
e-mail: bbradley@bu.edu, murad@bu.edu
web: http://math.bu.edu/people/murad

Contents

Abstract
1. Introduction
2. Historical perspective
2.1. Risk and utility
2.2. Markowitz mean-variance portfolio theory
2.3. CAPM and APT
2.4. Empirical evidence
3.
Value at risk
3.1. Computation of VaR
3.1.1. Historical simulation VaR
3.1.2. Parametric VaR
3.1.3. Monte Carlo VaR
3.2. Parameter estimation
3.2.1. Historical volatility
3.2.2. ARCH/GARCH volatilities
3.2.3. Implied volatilities
3.2.4. Extreme value theory
4. Risk measures
4.1. Coherent risk measures
4.2. Expected shortfall
5. Portfolios and dependence
5.1. Copulas
5.2. Measures of dependence
5.2.1. Linear correlation
5.2.2. Rank correlation
5.2.3. Comonotonicity
5.2.4. Tail dependence
5.3. Elliptical distributions
6. Univariate extreme value theory
6.1. Limit law for maxima
6.2. Block maxima method
6.3. Using the block maxima method for stress testing
6.4. Peaks over threshold method
6.4.1. Semiparametric approach
6.4.2. Fully parametric approach
6.4.3. Numerical illustration
6.4.4. A GARCH-EVT model for risk
7. Stable Paretian models
7.1. Stable portfolio theory
7.2. Stable asset pricing
Acknowledgments
References

Handbook of Heavy Tailed Distributions in Finance, Edited by S.T. Rachev. 2003 Elsevier Science B.V. All rights reserved.

Abstract

It is of great importance for those in charge of managing risk to understand how financial asset returns are distributed. Practitioners often assume for convenience that the distribution is normal. Since the 1960s, however, empirical evidence has led many to reject this assumption in favor of various heavy-tailed alternatives. In a heavy-tailed distribution the likelihood that one encounters significant deviations from the mean is much greater than in the case of the normal distribution. It is now commonly accepted that financial asset returns are, in fact, heavy-tailed. The goal of this survey is to examine how these heavy tails affect several aspects of financial portfolio theory and risk management.
We describe some of the methods that one can use to deal with heavy tails and we illustrate them using the NASDAQ composite index.

1. Introduction

Financial theory has long recognized the interaction of risk and reward. The seminal work of Markowitz (1952) made explicit the trade-off of risk and reward in the context of a portfolio of financial assets. Others such as Sharpe (1964), Lintner (1965), and Ross (1976), have used equilibrium arguments to develop asset pricing models such as the capital asset pricing model (CAPM) and the arbitrage pricing theory (APT), relating the expected return of an asset to other risk factors. A common theme of these models is the assumption of normally distributed returns. Even the classic Black and Scholes option pricing theory (Black and Scholes, 1973) assumes that the return distribution of the underlying asset is normal. The problem with these models is that they do not always comport with the empirical evidence. Financial asset returns often possess distributions with tails heavier than those of the normal distribution. As early as 1963, Mandelbrot (1963) recognized the heavy-tailed, highly peaked nature of certain financial time series. Since that time many models have been proposed to model heavy-tailed returns of financial assets.

The implication that returns of financial assets have a heavy-tailed distribution may be profound to a risk manager in a financial institution. For example, $3\sigma$ events may occur with a much larger probability when the return distribution is heavy-tailed than when it is normal. Quantile based measures of risk, such as value at risk, may also be drastically different if calculated for a heavy-tailed distribution. This is especially true for the highest quantiles of the distribution associated with very rare but very damaging adverse market movements.

This chapter serves as a review of the literature.
In Section 2, we examine financial risk from an historical perspective. We review risk in the context of the mean-variance portfolio theory, CAPM and the APT, and briefly discuss the validity of their assumption of normality. Section 3 introduces the popular risk measure called value at risk (VaR). The computation of VaR often involves estimating a scale parameter of a distribution. This scale parameter is usually the volatility of the underlying asset. It is sometimes regarded as constant, but it can also be made to depend on the previous observations as in the popular class of ARCH/GARCH models.

In Section 4, we discuss the validity of several risk measures by reviewing a proposed set of properties suggested by Artzner, Delbaen, Eber and Heath (1999) that any sensible risk measure should satisfy. Measures satisfying these properties are said to be coherent. The popular measure VaR is, in general, not coherent, but the expected shortfall measure is. The expected shortfall, in addition to being coherent, gives information on the expected size of a large loss. Such information is of great interest to the risk manager.

In Section 5, we return to risk, portfolios and dependence. Copulas are introduced as a tool for specifying the dependence structure of a multivariate distribution separately from the univariate marginal distributions. Different measures of dependence are discussed including rank correlations and tail dependence. Since the use of linear correlation in finance is ubiquitous, we introduce the class of elliptical distributions. Linear correlation is shown to be the canonical measure of dependence for this class of multivariate distributions and the standard tools of risk management and portfolio theory apply.

Since the risk manager is concerned with extreme market movements we introduce extreme value theory (EVT) in Section 6.
We review the fundamentals of EVT and argue that it shows great promise in quantifying risk associated with heavy-tailed distributions. Lastly, in Section 7, we examine the use of stable distributions in finance. We reformulate the mean-variance portfolio theory of Markowitz and the CAPM in the context of the multivariate stable distribution.

2. Historical perspective

2.1. Risk and utility

Perhaps the most cherished tenet of modern day financial theory is the trade-off between risk and return. This, however, was not always the case, as Bernstein's (1996) narrative on risk indicates. In fact, investment decisions used to be based primarily on expected return. The higher the expected return, the better the investment. Risk considerations were involved in the investment decision process, but only in a qualitative way: stocks are more risky than bonds, for example. Thus any investor considering only the expected payoff $EX$ of a game (investment) would, in practice, be willing to pay a fee equal to $EX$ for the right to play.

The practice of basing investment decisions solely on expected return is problematic, however. Consider the game known today as the Saint Petersburg Paradox, introduced in 1728 by Nicholas Bernoulli. The game involves flipping a fair coin and receiving a payoff of $2^{n-1}$ roubles¹ if the first head appears on the $n$th toss of the coin. The longer tails appears, the larger the payoff. While in this game the expected payoff is infinite, no one would be willing to wager an infinite sum to play, hence the paradox. Investment decisions cannot be made on the basis of expected return alone.

Daniel Bernoulli, Nicholas' cousin, proposed a solution to the paradox ten years later. He believed that, instead of trying to maximize their expected wealth, investors want to maximize their expected utility of wealth. The notion of utility is now widespread in economics.² A utility function $U: \mathbb{R} \to \mathbb{R}$ indicates how desirable a quantity of wealth $W$ is.
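The contrast between expected payoff and expected utility in the Saint Petersburg game can be illustrated numerically. The sketch below is not from the original text and makes two simplifying assumptions: the utility is logarithmic, $U(W) = \ln W$, and it is applied to the payoff alone rather than to total wealth. The function names are hypothetical.

```python
import math

def truncated_expected_payoff(n_max: int) -> float:
    """Expected payoff when the game is stopped after n_max tosses.
    The first head on toss n has probability 2**-n and pays 2**(n-1)."""
    return sum(0.5 ** n * 2 ** (n - 1) for n in range(1, n_max + 1))

def truncated_expected_log_utility(n_max: int) -> float:
    """Expected log utility of the payoff (assumed utility U(W) = ln W)."""
    return sum(0.5 ** n * math.log(2 ** (n - 1)) for n in range(1, n_max + 1))

for n_max in (10, 50, 100, 200):
    payoff = truncated_expected_payoff(n_max)
    utility = truncated_expected_log_utility(n_max)
    # the certainty equivalent is the sure payoff with the same utility
    print(n_max, payoff, utility, math.exp(utility))
```

Each toss adds one half to the expected payoff, so the truncated expectation grows without bound, whereas the expected log utility converges to $\ln 2$. Under these assumptions the game is worth only as much as a sure payoff of 2 roubles.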
One generally agrees that the utility function $U$ should have the following properties:
(1) $U$ is continuous and differentiable over some domain $D$.
(2) $U'(W) > 0$ for all $W \in D$, meaning investors prefer more wealth to less.
(3) $U''(W) < 0$ for all $W \in D$, meaning investors are risk averse. Each additional dollar of wealth adds less to the investor's utility when wealth is large than when wealth is small.

In other words, $U$ is smooth and concave over $D$. An investor can use his utility function to express his level of risk aversion.

¹ In fact, it was ducats (Bernstein, 1996).
² For introductions to utility theory see for example Ingersoll (1987) or Huang and Litzenberger (1988).

2.2. Markowitz mean-variance portfolio theory

In 1952, while a graduate student at the University of Chicago, Harry Markowitz (1952) produced his seminal work on portfolio theory connecting risk and reward. He defined the reward of the portfolio as the expected return and the risk as its standard deviation or variance.³ Since the expectation operator is linear, the portfolio's expected return is simply given by the weighted sum of the individual assets' expected returns. The variance operator, however, is not linear. This means that the risk of a portfolio, as measured by the variance, is not equal to the weighted sum of risks of the individual assets. This provides a way to quantify the benefits of diversification.

We briefly describe Markowitz' theory in its classical setting where we assume that the assets' distribution is multivariate normal. We will relax this assumption in the sequel. For example, in Section 5.3, we will suppose that the distribution is elliptical and, in Section 7.1, that it is an infinite variance stable distribution.

Consider a universe with $n$ risky assets with random rates of return $X = (X_1,\ldots,X_n)$, with mean $\mu = (\mu_1,\ldots,\mu_n)$, covariance matrix $\Sigma$ and portfolio weights $w = (w_1,\ldots,w_n)$.
If $X$ is assumed to have a multivariate normal distribution $X \sim N(\mu,\Sigma)$, then the return distribution of the portfolio $X_p = w^T X$ is also normally distributed, $X_p \sim N(\mu_p,\sigma_p^2)$ where $\mu_p = w^T\mu$ and $\sigma_p^2 = w^T\Sigma w$. The problem is to find the portfolio of minimum variance that achieves a minimum level $a$ of expected return:

$$\min_w\ w^T \Sigma w \quad\text{such that}\quad w^T\mu \ge a, \qquad e^T w = 1. \tag{1}$$

Here $e = (1,\ldots,1)$ and $T$ denotes a transpose. The last condition in (1), $e^T w = \sum_{i=1}^n w_i = 1$, indicates that the portfolio is fully invested. Additional restrictions are usually added on the weights⁴ and the problem is generally solved through quadratic programming.

³ In practice, one minimizes the variance, but it is convenient to view risk as measured by the standard deviation.
⁴ For example, $w_i \ge 0$, in other words no short selling. Without the additional constraints, the problem can be solved as a system of linear equations.

Fig. 1. The efficient frontier $(\sigma_p,\mu_p)$. In the case when only risky assets $R$ are available, the frontier traces out a convex curve in risk-return space. The inclusion of a risk-free asset $r$ has a profound effect on the efficient set. In this case, all efficient portfolios will consist of linear combinations of $r$ and some risky portfolio $R$, where $(\sigma_R,\mu_R)$ lies on the efficient frontier.

By varying the minimum level $a$ of expected return, a set of portfolios $X_p$ is chosen, each of which is optimal in the sense that an investor cannot achieve a greater expected return, $\mu_p = EX_p$, without increasing his risk, $\sigma_p$. The set of optimal portfolios corresponds to a convex curve $(\sigma_p, EX_p)$ called the efficient frontier. Any rational investor making decisions based only on the mean and variance of the distribution of returns of a portfolio would only choose to own portfolios on this efficient frontier.
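As the footnote to problem (1) notes, without additional constraints on the weights the problem reduces to a system of linear equations. The sketch below illustrates this under an extra assumption that is not in the text: the return constraint is treated as binding, $w^T\mu = a$, which is the relevant case on the efficient part of the frontier. The helper names, the small Gaussian elimination routine and the numerical inputs are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def min_variance_weights(mu, sigma, a):
    """Weights minimizing w' Sigma w subject to w' mu = a and e' w = 1,
    from the first-order (Lagrange) conditions written as a linear system."""
    n = len(mu)
    A = [[2.0 * sigma[i][j] for j in range(n)] + [mu[i], 1.0] for i in range(n)]
    A.append(list(mu) + [0.0, 0.0])   # row enforcing w' mu = a
    A.append([1.0] * n + [0.0, 0.0])  # row enforcing e' w = 1
    b = [0.0] * n + [a, 1.0]
    return solve(A, b)[:n]            # drop the two Lagrange multipliers

def portfolio_variance(w, sigma):
    n = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))

# Illustrative inputs (made up for this example).
mu = [0.05, 0.08, 0.12]
sigma = [[0.04, 0.006, 0.0], [0.006, 0.09, 0.012], [0.0, 0.012, 0.16]]
w = min_variance_weights(mu, sigma, a=0.09)
print(w, portfolio_variance(w, sigma))
```

Repeating the computation over a range of target levels $a$ and plotting the resulting pairs $(\sigma_p, \mu_p)$ would trace out the risky-asset frontier. With a no short selling constraint $w_i \ge 0$ the linear system no longer suffices and a quadratic programming solver is needed.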
The specific portfolio he chooses depends on his level of risk aversion.[5]

If the universe of assets also includes a risk-free asset which the investor may borrow and lend without constraint, then the optimal portfolio is a linear combination of the risk-free asset $r$ and a certain risky portfolio $X_R$ on the efficient frontier. As shown in Figure 1, this line is tangent to the convex risky asset efficient frontier at the point $(\sigma_R, EX_R)$. The risky portfolio therefore maximizes the slope of this linear combination,

$$\max_w \; \frac{E(X_R) - r}{\sigma_{X_R}}. \tag{2}$$

Again, the specific weights given to the risk-free and risky assets depend on the individual investor's level of risk aversion.

[5] One can reconcile maximizing expected utility with the mean-variance portfolio theory of Markowitz, but one has to assume either a quadratic utility function or that returns are multivariate normal or, more generally, elliptical. (Elliptical distributions are introduced in Section 5.3.) For example, if returns are multivariate normal and if $X_{p_1}$ and $X_{p_2}$ are the returns of two linear portfolios with the same expected return, then for all utility functions $U$ with properties listed in Section 2.1, $EU(X_{p_1}) \ge EU(X_{p_2})$ if and only if $\sigma_{p_1}^2 \le \sigma_{p_2}^2$. See for example Ingersoll (1987).

2.3. CAPM and APT

The mean-variance portfolio theory of Markowitz describes the construction of an optimal portfolio, in the mean-variance sense, for an individual investor. It requires only estimates of each asset's mean return and of the covariances between assets.[6] If all investors act in a way consistent with Markowitz's theory, then under additional assumptions, one will be able to learn something about the trade-off between risk and return in a market in equilibrium.[7] This is what the CAPM does.
The capital asset pricing model (CAPM) is an equilibrium pricing model [see Sharpe (1964) and Lintner (1965)] which relates the expected return of an asset to the risk-free return, to the market's expected return and to the covariance between the market and the asset. In addition to assuming that market participants use the mean-variance framework, the model makes two additional major assumptions. First, the market is assumed frictionless. This means that securities are infinitely divisible, there exist no transaction costs, no taxes, and there are no trading restrictions. Second, the investors' beliefs are homogeneous. This means investors agree on mean returns and covariances for all assets in the market.

The efficient frontier in Figure 1 depends on the investor's beliefs. Under the CAPM assumptions, since all investors assume the same expected returns and covariances for all assets in the market, they all have the same (risky) efficient frontier. However, the individual investor's choice of the optimal risky portfolio still depends on the investor's own level of risk aversion. Additionally, with the inclusion of a risk-free asset, we saw that the investor's portfolio becomes dramatically simpler. Each investor need own only two assets: the risk-free asset and an optimal risky portfolio, with the relative weights depending on the investor's appetite for risk. But since each investor holds the same optimal portfolio of risky assets, and since the market is assumed to be in equilibrium, this optimal risky portfolio must be the market portfolio. Thus Figure 1 applies with $R = M$, where $M$ denotes the market portfolio. $M$ consists of all risky assets held in proportion to their overall market capitalization.

Letting $X_M$ denote the return on the market portfolio, $X_i$ denote the return of asset $i$, and $r$ denote the risk-free return, the CAPM establishes the following relationship:

$$E(X_i - r) = \beta_i E(X_M - r), \tag{3}$$

where

$$\beta_i = \frac{\mathrm{Cov}(X_i, X_M)}{\mathrm{Var}\,X_M}. \tag{4}$$

The CAPM thus relates in a linear way the expected premium $EX_i - r$ of holding the risky asset $i$ over the risk-free asset to the expected premium $EX_M - r$ of holding the market portfolio over the risk-free asset. The constant of proportionality is the asset's beta. The coefficient $\beta_i$ is a measure of asset $i$'s sensitivity to the market portfolio. The expected premium for asset $i$ is greater than that of the market if $\beta_i > 1$ and less if $\beta_i < 1$. But if $\beta_i > 1$, then the risk will be greater. Indeed, if we assume that

$$X_i - r = \beta_i (X_M - r) + \varepsilon_i, \tag{5}$$

where $\varepsilon_i$ is such that $E\varepsilon_i = 0$ and $\mathrm{Cov}(\varepsilon_i, X_M) = 0$, then we have (3) and

$$\sigma_{X_i}^2 = \beta_i^2 \sigma_{X_M}^2 + \sigma_{\varepsilon_i}^2. \tag{6}$$

Equation (5) is often known as a single factor model for asset returns. Notice from (6) that the asset's risk is the sum of two terms, the systematic or market risk $\beta_i^2 \sigma_{X_M}^2$ and the unsystematic or residual risk $\sigma_{\varepsilon_i}^2$. For a portfolio $X_p$ with weights $w = (w_1,\dots,w_n)$, one gets similarly $\sigma_{X_p}^2 = \beta_p^2 \sigma_{X_M}^2 + \sigma_{\varepsilon_p}^2$ where $\beta_p = \sum_{i=1}^n w_i \beta_i$. If one additionally assumes that $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \ne j$ then the residual risk is

$$\sigma_{\varepsilon_p}^2 = \sum_{i=1}^n w_i^2 \sigma_{\varepsilon_i}^2. \tag{7}$$

It is bounded by $c/n$ for some constant $c$ if, for example, $w_i = 1/n$ and the residual variances $\sigma_{\varepsilon_i}^2$ are bounded by $c$; hence the portfolio's residual risk can be greatly reduced by diversification. The investor, for example, is only rewarded for bearing systematic or market risk, that is, he can expect a higher return than the market only by holding a portfolio which is riskier ($\beta_p > 1$) than the market.

[6] For a universe of $n$ assets it is necessary to compute $n(n-1)/2 + n$ covariances. This means that if the universe under consideration consists of $n = 1000$ assets, it is necessary to estimate over 500,000 covariances.
[7] By market equilibrium, we mean a market place where security prices are set so that supply equals demand.

In the CAPM, all assets are exposed to a single common source of randomness, namely the market.
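The decomposition of portfolio risk into systematic and residual parts, and the $c/n$ bound on the residual part, can be checked numerically. The sketch below is only an illustration under the single factor model (5) with uncorrelated residuals; the function name `risk_decomposition` and the parameter values are hypothetical.

```python
def risk_decomposition(weights, betas, resid_vars, market_var):
    """Return (beta_p, systematic variance, residual variance) of a portfolio.

    Assumes the single factor model with residuals that are uncorrelated
    with each other and with the market, so that equations (6) and (7) apply.
    """
    beta_p = sum(w * b for w, b in zip(weights, betas))
    systematic = beta_p ** 2 * market_var
    residual = sum(w * w * v for w, v in zip(weights, resid_vars))
    return beta_p, systematic, residual


# Equally weighted portfolios of increasing size, all betas equal to 1.
market_var = 0.0225   # hypothetical market variance
resid_var = 0.04      # hypothetical residual variance, same for every asset
for n in (10, 100, 1000):
    w = [1.0 / n] * n
    beta_p, sys_var, res_var = risk_decomposition(
        w, [1.0] * n, [resid_var] * n, market_var)
    # The residual variance equals resid_var / n; the systematic part stays put.
    print(n, round(beta_p, 6), round(sys_var, 6), res_var)
```

With equal weights the residual variance shrinks like $1/n$ while the systematic variance is unchanged, which is the sense in which only market risk remains in a well diversified portfolio.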
The arbitrage pricing theory (APT) model, due to Ross (1976), is a generalization of the CAPM in which assets are exposed to a larger number of common sources of randomness. The APT differs from the CAPM in that the mean-variance framework that led to (5) is now replaced by the assumption of a multifactor model

$$X_i = \alpha_i + \beta_{i1} f_1 + \cdots + \beta_{ik} f_k + \varepsilon_i \tag{8}$$

for generating security returns. All assets are exposed to the $k$ sources of randomness $f_j$, $j = 1,\dots,k$, called factors. Additionally, each asset $i$ is exposed to its own specific source of randomness $\varepsilon_i$. The equilibrium argument used in the CAPM led to the central result (3). In the APT, the equilibrium assumption takes a slightly different form, namely, one assumes that the market is free of arbitrage. The major result of the APT then relates the expected premium of asset $i$ to its exposure $\beta_{ij}$ to factor $j$, and to each factor premium $\lambda_j$, $j = 1,\dots,k$. Specifically

$$EX_i = r + \beta_{i1}\lambda_1 + \cdots + \beta_{ik}\lambda_k, \tag{9}$$

where $\lambda_j$, $j = 1,\dots,k$, is the expected premium investors demand for bearing the risk of factor $j$. Notice that the factor premiums $\lambda_j$ are the same for each security, and it is the exposure $\beta_{ij}$ to each factor that depends on the security. Additionally, if $k = 1$ in (8) and if we assume the existence of a risk-free asset $r$, $f_1 = X_M$ and that the $\varepsilon_i$ are uncorrelated with each other and the market, then $\lambda_1 = E(X_M - r)$ and we get back the CAPM.

Fig. 2. Left: Empirical probability density function (pdf) for NASDAQ standardized returns (solid) versus the normal distribution (dot-dash) over the period February 1971 to February 2001. Right: Corresponding quantile-quantile (QQ) plot with quantiles of the normal distribution on the abscissa and empirical quantiles on the ordinate. Returns are expressed as a %.

2.4. Empirical evidence

Markowitz's mean-variance portfolio theory, as well as the CAPM and APT models, rely either explicitly or implicitly on the assumption of normally distributed asset returns.[8] Today, with long histories of price/return data available for a great many financial assets, it is easy to see that this assumption is inadequate. Empirical evidence suggests that asset returns have distributions which are heavier-tailed than the normal distribution. Figure 2 illustrates this for the NASDAQ.[9] The quantile-quantile (QQ) plot[10] shows clearly that the distribution tails of the NASDAQ are heavier than the tails of the normal distribution. As early as 1963, Mandelbrot (1963) and Fama (1965) rejected the assumption of normality in favor of other heavier-tailed distributions. In his 1963 paper, Mandelbrot not only confirmed the poor fit of the normal distribution, but proposed the model which is known today as the stable model for asset returns.

[8] As noted before, the multivariate normal assumption is consistent with maximizing expected utility.
[9] The daily NASDAQ time series, the corresponding returns and their maxima and minima are displayed in Figure 16. The time series starts in February 1971 and ends February 2001 (actually from February 08, 1971 to January 26, 2001). The corresponding empirical statistics can be found in Table 1.
[10] A quantile-quantile (QQ) plot is a graphical check to see if two distributions are of the same type. Two random variables $X$ and $Y$ are said to be of the same type if their distributions are the same up to a change in location and scale, that is, $X \stackrel{d}{=} aY + b$ for some $a \in \mathbb{R}^+$, $b \in \mathbb{R}$. Since the QQ plot plots quantiles of two distributions, if they are of the same type, the plot should be linear. In this case we are checking whether the empirical distribution of NASDAQ standardized returns and the hypothesized normal distribution are of the same type.

Table 1
Empirical statistics for daily returns (as %) of several financial assets: the S&P 500 index, the USD/British pound exchange rate, the Thai Baht/USD exchange rate and the NASDAQ composite index

Asset     Period            Mean    Std. dev.  Skewness  Kurtosis[11]  Min     Max
S&P 500   01/1951-03/2001   0.033   0.870      -1.61     43.9          -22.9   8.71
USD/GBP   02/1985-02/2001   0.006   0.677      0.043     3.40          -4.13   4.59
TB/USD    02/1985-03/2001   0.011   0.663      4.22      158           -8.57   17.8
NASDAQ    02/1971-02/2001   0.044   1.08       -0.523    15.5          -12.0   13.3

Fig. 3. Ratio of tail probabilities $P(T > k\sigma)/P(X > k\sigma)$ plotted in units of $k$. Here $T \sim t_4$ and $X$ is normal, both with variance $\sigma^2$. $T$ is more likely to take large values than $X$.

Recall that if the normal distribution is valid, then about 95% of the observations would lie within two standard deviations of the mean, and about 99.7% would lie within three standard deviations of the mean. In financial time series, large returns (both positive and negative) occur far too often to be compatible with the normal distribution assumption. The distributions of financial return series are characterized not only by heavy tails, but also by a high peakedness at the center. In econometric terminology, they are said to be leptokurtic.

To the risk manager trying to guard against large losses, the deviation from normality cannot be neglected. Suppose for example that daily returns are distributed as a Student-t distribution with 4 degrees of freedom (denoted $t_4$) and a variance given by $\sigma^2$. Since this distribution has a much heavier tail than a normal distribution with the same variance, as one moves farther out into the tail of the distribution, rare events occur much more frequently. Figure 3 shows how much more likely rare events occur under the $t_4$ assumption than under the normal, when rare is defined in terms of standard deviations.

[11] In this chapter we use as definition of kurtosis $K(X) = E(X - EX)^4/(\mathrm{Var}\,X)^2 - 3$, so that the normal distribution has a kurtosis of zero.
Heavy tails, therefore, will lead to positive kurtosis.

3. Value at risk

In the early 1990s, a number of financial institutions (J.P. Morgan, Bankers Trust, ...) proposed a new risk measure to quantify by a single number the firm's aggregate exposure to market risk. This measure, commonly known today as value at risk (VaR), is now used to measure not only market risk but other forms of risk to which the firm is exposed, such as credit, operational, liquidity, and legal risk. VaR is defined as the loss of a financial position over a time horizon $\tau$ that would be exceeded with small probability $1 - \alpha$, that is,

$$P(\text{Loss} > \mathrm{VaR}) \le 1 - \alpha. \tag{10}$$

The confidence level $\alpha$ is typically a large number[12] between 0.95 and 1. To define VaR precisely, let $X$ be the random variable whose cumulative distribution function $F_X$ describes the negative profit and loss distribution (P&L) of the risky financial position at the specified horizon time $\tau$. Negative values of $X$ correspond now to profits and positive values of $X$ correspond to losses. This is a useful convention in risk management since there is then no ambiguity when discussing large losses (large values of $X$ correspond to large losses). Formally, value at risk is a quantile of the probability distribution $F_X$, that is, roughly, the $x$ corresponding to a given value of $0 < \alpha = F_X(x) < 1$.

Definition 3.1. Let $X$ be the random variable whose cumulative distribution function $F_X$ describes the negative profit and loss distribution (P&L) of the risky financial position at the specified horizon time $\tau$ (so that losses are positive). Then, for a confidence level $0 < \alpha < 1$,

$$\mathrm{VaR}_\alpha(X) = \inf\{x \mid F_X(x) \ge \alpha\}. \tag{11}$$

We set, avoiding technicalities, $\mathrm{VaR}_\alpha(X) = F_X^{-1}(\alpha)$, where $F_X^{-1}$ denotes the inverse function of $F_X$[13] (see Figure 4). Hence the value $\mathrm{VaR}_\alpha(X)$ over the horizon time $\tau$ would be exceeded on the average $100(1 - \alpha)$ times every 100 time periods.
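Definition (11) can be applied directly to a loss distribution with jumps, where the ordinary inverse of $F_X$ does not exist. The following minimal sketch evaluates $\inf\{x \mid F_X(x) \ge \alpha\}$ for a discrete loss distribution; the function name `value_at_risk`, the small numerical tolerance and the two-point example are illustrative assumptions.

```python
def value_at_risk(losses, probs, alpha):
    """Smallest loss x with F_X(x) >= alpha for a discrete loss distribution.

    losses and probs list the possible losses and their probabilities.
    A tiny tolerance guards against floating-point round-off in the
    cumulative sum.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    cumulative = 0.0
    for x, p in sorted(zip(losses, probs)):
        cumulative += p
        if cumulative >= alpha - 1e-12:
            return x
    raise ValueError("probabilities do not sum to one")


# Hypothetical position with a discontinuous payoff:
# no loss with probability 0.96, a loss of 100 with probability 0.04.
losses = [0.0, 100.0]
probs = [0.96, 0.04]
print(value_at_risk(losses, probs, 0.95))
print(value_at_risk(losses, probs, 0.99))
```

In this example the 95% VaR is zero because $F_X(0) = 0.96 \ge 0.95$, while the 99% VaR jumps to the full loss of 100, which shows how sensitive a quantile can be to the chosen confidence level when the distribution has atoms.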
[12] In statistics, $\alpha$ and $1 - \alpha$ are usually interchanged because $\alpha$, in statistics, denotes typically the Type 1 hypothesis testing error and is chosen small. The corresponding confidence level is then $1 - \alpha$.
[13] This is strictly correct when $F_X$ is strictly increasing and continuous. Otherwise, one needs to use the generalized inverse of $F_X$, denoted $F_X^{\leftarrow}$, and defined as $F_X^{\leftarrow}(\alpha) = \inf\{x \mid F_X(x) \ge \alpha\}$, $0 < \alpha < 1$. The definition (11) of $\mathrm{VaR}_\alpha(X)$ is then $\mathrm{VaR}_\alpha(X) = F_X^{\leftarrow}(\alpha)$. Thus, if $F_X(x) = \alpha$ for $x_0 \le x \le x_1$, then $\mathrm{VaR}_\alpha(X) = F_X^{\leftarrow}(\alpha) = x_0$.

Fig. 4. $\mathrm{VaR}_\alpha(X)$ for different cumulative distribution functions (cdfs) of the loss distribution $X$. The cdf on the right corresponds to an asset with discontinuous payoff, for example a binary option. See Definition 3.1.

Because of its intuitive appeal and simplicity, it is no surprise that VaR has become the de facto standard risk measure used around the world today. For example, today VaR is frequently used by regulators to determine minimum capital adequacy requirements. In 1995, the Basle Committee on Banking Supervision[14] suggested that banks be allowed to use their own internal VaR models for the purpose of determining minimum capital reserves. The internal models approach of the Basle Committee is a ten day VaR at the $\alpha = 99\%$ confidence level multiplied by a safety factor of at least 3. Thus if VaR = 1M, the institution is required to have at least 3M in reserve in a safe account.

The safety factor of three is an effort by regulators to ensure the solvency of their institutions. It has also been argued, see Stahl (1997) or Danielsson et al. (1998), that the safety factor of three comes from the heavy-tailed nature of the return distribution. Since most VaR calculations are based on the simplifying assumption that the distribution of returns is normal,[15] how badly does this assumption affect VaR? Assume that the Profit and Loss (P&L) distribution is symmetric and has finite variance $\sigma^2$.
Then regardless of the actual distribution, if $X$ represents the random loss over the specified horizon time with mean zero, Chebyshev's inequality gives

$$P[X > c\sigma] \le \frac{1}{2c^2}.$$

The factor $1/2$ comes from symmetry: Chebyshev's inequality bounds $P[|X| > c\sigma]$ by $1/c^2$, and the two tails carry equal probability. So if we are interested in VaR bounds for $\alpha = 0.99$, setting $1/(2c^2) = 0.01$ gives $c = 7.071$, and this implies $\mathrm{VaR}^{\max}_{\alpha=0.99}(X) = 7.071\sigma$. If the VaR calculation were done under the assumption of normality (Gaussian distribution) then $\mathrm{VaR}^{\mathrm{Ga}}_{\alpha=0.99}(X) = 2.326\sigma$, and so if the true distribution is indeed heavy-tailed with finite variance then the correction for $\mathrm{VaR}_{\alpha=0.99}$ of three is reasonable, since $3 \times 2.326 = 6.978$.

[14] See Basle Committee on Banking Supervision (1995a, 1995b). Basle is a city in Switzerland. In French, Basle is Bâle, in German, it is Basel. Basle is the old name for the city. The accent in Bâle stands for the s that has been dropped from Basle.
[15] See for example the RiskMetrics manual (RiskMetrics, 1996).

3.1. Computation of VaR

Before we discuss how $\mathrm{VaR}_\alpha(X)$ is computed, we need to say a few words about $X$. Typically $X$ represents the risk of some aggregated position which is influenced by many underlying risk factors $Y_1,\dots,Y_d$,

$$X = f(Y_1,\dots,Y_d). \tag{12}$$

The functional form of the dependence of $X$ on the factors $Y_1,\dots,Y_d$ is usually not known exactly, but it may be approximated in several standard ways depending on the nature of the position. For example, $f$ is linear in the case of a portfolio of straight equity positions. The function $f$ is non-linear, for example, if the portfolio contains a call option on an equity since the value of the call changes non-linearly with respect to a change in the underlying asset. The usual procedure is to approximate the change in the call's value with respect to its underlying by the option's delta. For small changes in the underlying such an approximation is reasonable. However, for large changes in the underlying, the approximation can be quite bad.
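To illustrate how the quality of the delta approximation depends on the size of the move, the sketch below compares the exact change in a call's value with the delta approximation. It assumes, purely for illustration, that the call is priced by the Black-Scholes formula, which the chapter does not prescribe; the function names and the parameter values (strike, rate, volatility, maturity) are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def _d1(S, K, r, sigma, T):
    return (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))


def bs_call(S, K, r, sigma, T):
    """Black-Scholes price of a European call (illustrative pricing model)."""
    d1 = _d1(S, K, r, sigma, T)
    d2 = d1 - sigma * sqrt(T)
    return S * _N.cdf(d1) - K * exp(-r * T) * _N.cdf(d2)


def bs_delta(S, K, r, sigma, T):
    """Sensitivity of the call price to the underlying."""
    return _N.cdf(_d1(S, K, r, sigma, T))


def delta_error(S, dS, K=100.0, r=0.05, sigma=0.2, T=0.25):
    """Exact change in the call value minus its delta approximation."""
    exact = bs_call(S + dS, K, r, sigma, T) - bs_call(S, K, r, sigma, T)
    approx = bs_delta(S, K, r, sigma, T) * dS
    return exact - approx


for move in (1.0, 5.0, 20.0):
    print(move, round(delta_error(100.0, move), 4))
```

Because the call price is convex in the underlying, the error is positive and grows quickly with the size of the move, which is the behaviour described in the text.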
In an effort to improve the approximation, a second order term is sometimes added, the option's gamma. This second order approximation is referred to as the delta-gamma approximation.

In practice, the VaR of a risky position $X$ is calculated in one of three ways: through historical simulation, through a parametric model, or through some sort of Monte Carlo simulation. Each way involves assumptions and approximations and it is the responsibility of the user to be aware of them. The risk manager who blindly performs the model calculations does so at his or her peril. For a full treatment of the commonly used procedures for the calculation of VaR, see Jorion (2001), Dowd (1998) or Wilson (1998). See Duffie and Pan (1997) for a discussion of heavy tails and VaR calculations. We now describe the three ways of calculating VaR.

3.1.1. Historical simulation VaR

The historical simulation model uses the historical returns of assets currently held in the portfolio in order to calculate VaR.[16] First, returns over the horizon time $\tau$ are constructed for each asset in the portfolio using historical price information. Then portfolio returns are computed using the current weight distribution of assets as though the portfolio had been held during the whole historical period which is being sampled. The VaR is then read from the historical sample by using the order statistics. For example, if 1000 time periods are sampled, then 1000 portfolio returns are calculated, one for each time period. Let $X_p^{(1)} \ge X_p^{(2)} \ge \cdots \ge X_p^{(1000)}$ be the order statistics of these returns, where losses are positive. Then $\mathrm{VaR}_{\alpha=0.95}(X_p) = X_p^{(50)}$. The size of the sample is chosen by the user, but may be constrained by the available data for some of the assets currently held.

[16] Over a fixed time horizon, VaR may be reported in units of rate of return (%) or of currency (profit and loss) since these are essentially the same, up to multiplication by the initial wealth/value.

The model is simple to implement and has several advantages. Since it is based on historical prices it allows for a non-linear dependence between assets in the portfolio and underlying risk factors. Also, since it uses historical returns, it allows for the presence of heavy tails without making assumptions on the probability distributions of returns of the assets in the portfolio. There is therefore no model risk. In addition, there is no need to worry about the dependence structure of assets within the portfolio since it is already reflected in the price and return data.

The drawbacks are typical of models involving historical data. There may not be enough data available and there may be no reason to believe that the future will look like the past. For example, if the user would like to compute VaR for regulatory requirements, then $\tau = 10$ days. With about 260 business days in a year, there are only 26 such observations per year, so four years' worth of data are required to get about 100 historical simulations. This is the absolute minimum necessary to calculate VaR with $\alpha = 0.99$, since with 100 data points, there is but a single observation in the tail. If one or several of the assets in the portfolio have insufficient histories then adjustments must be made. For example, some practitioners bootstrap from the shorter return histories in order to take advantage of the longer histories on other assets.

When working only with historical data it is important to realize that we are assuming that the future will look like the past. If this assumption is likely to be unrealistic, the VaR estimate may be dangerously off the mark. For instance, if the sample period or window is devoid of large price changes, then our historical VaR will be low. But it will be large if there were large price fluctuations during the sample period. As large price fluctuations leave the sample window, the VaR will change accordingly.
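The order-statistic rule of historical simulation can be sketched in a few lines. The sketch assumes losses are recorded as positive numbers and reads off the $k$-th largest loss with $k = n(1-\alpha)$ rounded to the nearest integer; the rounding convention and the function name `historical_var` are assumptions made for this illustration, and other conventions are in use.

```python
def historical_var(losses, alpha):
    """Historical-simulation VaR from a sample of losses (losses positive).

    Sorts the sample in decreasing order, X(1) >= X(2) >= ..., and returns
    the k-th largest loss with k = round(n * (1 - alpha)), at least 1.
    """
    ordered = sorted(losses, reverse=True)
    k = max(int(round(len(ordered) * (1.0 - alpha))), 1)
    return ordered[k - 1]


# Deterministic toy sample of 1000 losses: 1, 2, ..., 1000.
sample = list(range(1, 1001))
print(historical_var(sample, 0.95))  # the 50th largest loss
print(historical_var(sample, 0.99))  # the 10th largest loss
```

With 1000 observations the 95% VaR is the 50th largest loss, as in the text, and the 99% VaR rests on only ten tail observations, which is why short samples give unstable estimates at high confidence levels.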
The historical VaR estimate is therefore highly variable and does not take into account the current financial climate. The deficiencies of historical simulation notwithstanding, its ease of use makes it the most popular method for VaR calculations.

3.1.2. Parametric VaR

The parametric VaR model assumes that the returns possess a specific distribution, usually normal. The parameters of the distribution are estimated using either historical data or forward looking option data.

Example 3.1. Assume that over the desired time horizon $\tau$ the (negative) return distribution of a portfolio is given by $F_X \sim N(\mu_\tau, \sigma_\tau^2)$. Then the value at risk of portfolio $X$ for horizon $\tau$ and confidence level $\alpha > 0.5$ is given by

$$\mathrm{VaR}_\alpha(X) = \inf\{x \mid F_X(x) \ge \alpha\} = F_X^{-1}(\alpha) = \mu_\tau + \sigma_\tau \Phi^{-1}(\alpha),$$

where $\Phi^{-1}(\alpha)$ is the $\alpha$ quantile of the standard normal distribution. More generally, if the (negative) return distribution of $X$ is any $F_X$ with finite mean $\mu_\tau$ and finite variance $\sigma_\tau^2$, then

$$\mathrm{VaR}_\alpha(X) = \mu_\tau + \sigma_\tau q_\alpha, \tag{13}$$

where $q_\alpha$ is the $\alpha$ quantile of the standardized version of $X$. In other words, $q_\alpha = F_{\tilde X}^{-1}(\alpha)$ where $\tilde X = (X - \mu_\tau)/\sigma_\tau$.

If the VaR is computed under the assumption that returns are light-tailed, say normal, when in fact they are heavy-tailed, say $t_\nu$ (Student-t distribution with $\nu$ degrees of freedom), the risk may be seriously underestimated for high confidence levels. This is because for large $\alpha$, $F_{\text{normal}}^{-1}(\alpha) < F_{t_\nu}^{-1}(\alpha)$, so that the value of $x$ that achieves $F_{\text{normal}}(x) = \alpha$ is smaller than the value of $x$ that achieves $F_{t_\nu}(x) = \alpha$. It is thus very important that the return distribution be modelled well. A wide variety of parametric distributions can be considered.

Within the portfolio context, the most easily implemented parametric model is the so-called delta-normal method, where the joint distribution of the risk factor returns is multivariate normal and the returns of the portfolio are assumed to be a linear function of the returns of the underlying risk factors.
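Formula (13) makes the effect of the distributional assumption easy to quantify. The sketch below compares the standardized quantile $q_\alpha$ of the normal distribution with that of a Student-t distribution with 4 degrees of freedom rescaled to unit variance (the variance of $t_4$ is 2). It relies on the closed-form cdf of $t_4$ and a simple bisection; the function names and the choice $\nu = 4$ are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def t4_cdf(t):
    """Closed-form cdf of the Student-t distribution with 4 degrees of freedom."""
    s = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / sqrt(s)) * (1.0 - (t * t / s) / 12.0)


def t4_quantile(alpha, lo=-1000.0, hi=1000.0):
    """Invert t4_cdf by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t4_cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t4_standardized_quantile(alpha):
    """Quantile of t4 rescaled to unit variance (Var t4 = 2)."""
    return t4_quantile(alpha) / sqrt(2.0)


def parametric_var(mu, sigma, q_alpha):
    """Equation (13): VaR = mu + sigma * q_alpha."""
    return mu + sigma * q_alpha


mu, sigma = 0.0, 1.0  # hypothetical mean and standard deviation of the loss
for alpha in (0.95, 0.99, 0.999):
    var_normal = parametric_var(mu, sigma, NormalDist().inv_cdf(alpha))
    var_t4 = parametric_var(mu, sigma, t4_standardized_quantile(alpha))
    print(alpha, round(var_normal, 3), round(var_t4, 3))
```

At equal variance the normal VaR is actually the larger of the two at the 95% level, but it falls increasingly short of the $t_4$ VaR at 99% and beyond, which is why the underestimation is a concern specifically for high confidence levels.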
Under the delta-normal assumptions the portfolio returns are themselves normally distributed.

Example 3.2. Take a portfolio of equities whose (negative) returns are given by $X_p = w_1 X_1 + \cdots + w_n X_n$ where $w_i$ is the weight given to asset $i$ and $X_i$ is the asset's (negative) return over the horizon in question. Assume $(X_1,\dots,X_n) \sim N(0,\Sigma)$. Then, for $\alpha \in (0.5,1)$,

$$\mathrm{VaR}_\alpha(X_p) = \Phi^{-1}(\alpha)\sqrt{w^T \Sigma w} = \sqrt{\overrightarrow{\mathrm{VaR}}^{\,T} \rho \, \overrightarrow{\mathrm{VaR}}},$$

where $\overrightarrow{\mathrm{VaR}} = (\mathrm{VaR}_\alpha(w_1 X_1),\dots,\mathrm{VaR}_\alpha(w_n X_n))$ is the vector of the individual weighted asset VaRs and $\rho$ is the asset return correlation matrix. See Dowd (1998) for details.

When the number of assets is large, the central limit theorem is often invoked in defense of the normal model. Even if the individual asset returns are non-normal, the central limit theorem tells us that the weighted sum of many assets should be approximately normal. This argument may be disposed of in various ways. Consider, for example, the empirical distribution of daily returns of a large diversified index such as the NASDAQ, which is clearly heavy-tailed (see Figure 2). From a probabilistic point of view it is not at all obvious that the assumptions of the central limit theorem are satisfied. For example, if the returns do not have finite variance, there may be convergence to the class of stable distributions.

The class of stable distributions (also known as $\alpha$-stable or stable Paretian) may be defined in a variety of ways. More will be said about them in Section 7. We define, at this stage, a stable distribution as the only possible limiting distribution of appropriately normalized sums of independent random variables.

Definition 3.2. The random variable $X$ has a stable distribution if there exist a sequence of i.i.d. random variables $\{Y_i\}$ and constants $\{a_n\} \subset \mathbb{R}$ and $\{b_n\} \subset \mathbb{R}^+$ such that

$$\frac{Y_1 + \cdots + Y_n}{b_n} - a_n \xrightarrow{d} X \quad \text{as } n \to \infty. \tag{14}$$

The stable distribution of $X$ in (14) is characterized by four parameters $(\alpha,\sigma,\beta,\mu)$ and we write $X \sim S_\alpha(\sigma,\beta,\mu)$.
The parameter $\alpha \in (0,2]$ is called the index of stability or the tail exponent and controls the decay in the tails of the distribution. The remaining parameters $\sigma$, $\beta$, $\mu$ control scale, skewness, and location respectively. If the $Y_i$ have finite variance (the case in the usual CLT) then $\alpha = 2$ and the distribution of $X$ is Gaussian. For all $\alpha \in (0,2)$ the distribution is non-Gaussian stable and possesses heavy tails.

Example 3.3. Properties of weekly returns of the Nikkei 225 Index over a 12 year period are examined in Mittnik, Rachev and Paolella (1998). The authors fit the return distribution using a number of parametric distributions, including the normal, Student-t and stable. According to various measures of goodness of fit, the partially asymmetric Weibull, Student-t and the asymmetric stable provide the best fit. The fit by the normal is shown to be relatively poor. The stable distribution, in addition, fits best the tail quantiles of the empirical distribution, which is a result most relevant to the calculation of VaR.

The central limit theorem typically assumes independence. Although it has extensions to allow for mild dependence, this dependence must be sufficiently weak. In fact, for a given number of assets, the greater the dependence, the worse the normal approximation. This affects the speed of the convergence. Since a VaR calculation involves the tails of the distribution, it is most important that the approximation hold in the tails. However, even when the conditions for the central limit theorem hold, the convergence in the tail is known to be very slow. The normal approximation may then only be valid in the central part of the distribution. In this case, the return distribution may be better approximated by a heavier-tailed distribution such as the Student-t or hyperbolic, whose use in finance is becoming more common. The hyperbolic distribution is a subclass of the class of generalized hyperbolic distributions.
The generalized hyperbolic distributions were introduced in 1977 by Barndorff-Nielsen (1977) in order to explain empirical findings in geology. Today these distributions are becoming popular in finance, and in particular in risk management. Two subclasses, the hyperbolic and the inverse Gaussian, are most commonly used. Both these subclasses may be shown to be mixtures of Gaussians. As such, they possess heavier tails than the normal distribution but not as heavy as the stable distribution. For an introduction to generalized hyperbolic distributions in finance, see for example Eberlein and Keller (1995), Eberlein and Prause (2002) or Shiryaev (1999).

3.1.3. Monte Carlo VaR

Monte Carlo procedures are perhaps the most flexible methods for computing VaR. The risk manager specifies a model for the underlying risk factors, which somehow incorporates their dependence. For example, the risk factors in (12) may be described by the stochastic differential equation

$$dY_t^{(i)} = Y_t^{(i)}\left(\mu_t^{(i)}\,dt + \sigma_t^{(i)}\,dW_t^{(i)}\right), \tag{15}$$

for $i = 1,\dots,d$, where $W_t = (W_t^{(1)},\dots,W_t^{(d)})$ is a multivariate Wiener process. Once parameters of the model are estimated, for example by using historical data or option implied estimates, thousands of paths are computer generated for each risk factor. Each set of simulated paths for the risk factors yields a portfolio path and the portfolio is priced accordingly. Each computed price of the portfolio represents a point on the portfolio's return distribution. After many such points are obtained the portfolio's VaR may then be read off the simulated distribution.

This method has the advantage of being extremely versatile. It allows for heavy tails, non-linear payoffs and a great many other user specifications. Within the Monte Carlo framework, risk managers may use their own pricing models to determine non-linear payoffs under many different scenarios for the underlying risk factors.
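The procedure can be sketched for the simplest possible case: a single risk factor following (15) with constant drift and volatility, and a position that is just one unit of that factor. Under these assumptions the terminal value over the horizon can be simulated exactly, and the simulated VaR can be checked against the closed-form lognormal quantile. All names and parameter values below are hypothetical choices for the illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_losses(y0, mu, sigma, tau, n_paths, seed=0):
    """Simulate losses y0 - Y_tau for a factor with constant mu and sigma.

    Uses the exact solution of the SDE with constant coefficients:
    Y_tau = y0 * exp((mu - sigma^2 / 2) * tau + sigma * sqrt(tau) * Z).
    """
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * tau
    vol = sigma * math.sqrt(tau)
    return [y0 - y0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
            for _ in range(n_paths)]


def var_from_losses(losses, alpha):
    """Read VaR off the simulated loss distribution via order statistics."""
    ordered = sorted(losses, reverse=True)
    k = max(int(round(len(ordered) * (1.0 - alpha))), 1)
    return ordered[k - 1]


def lognormal_var(y0, mu, sigma, tau, alpha):
    """Closed-form VaR for the same model, used only as a sanity check."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return y0 - y0 * math.exp((mu - 0.5 * sigma ** 2) * tau
                              + sigma * math.sqrt(tau) * z)


y0, mu, sigma, tau = 100.0, 0.05, 0.2, 10.0 / 260.0  # ten-day horizon
losses = simulate_losses(y0, mu, sigma, tau, n_paths=100_000, seed=1)
print(round(var_from_losses(losses, 0.99), 3),
      round(lognormal_var(y0, mu, sigma, tau, 0.99), 3))
```

For a realistic portfolio the line computing the loss would be replaced by a full revaluation of every position under each simulated scenario, which is where most of the computational effort goes.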
The method also has the advantage of allowing for time varying parameters within the risk factor processes. See for example Broadie and Glasserman (1998).

There are two major drawbacks to Monte Carlo methods. First, they are computationally very expensive. Thousands of simulations of the risk factors may have to be carried out for results to be trusted. For a portfolio with a large number of assets this procedure may quickly become unmanageable, since each asset within the portfolio must be valued using these simulations. Second, the method is prone to model risk. The risk factors and the pricing models of assets with non-linear payoffs may both be mis-specified. And, as is the case of the parametric VaR, there is the risk of mis-specifying the model parameters.

3.2. Parameter estimation

The parametric and Monte Carlo VaR methods require parameters to be estimated. When one is interested in short time horizons, the primary goal is to estimate the volatility and covariance/correlation.[17] We outline some of the common estimation techniques here.

3.2.1. Historical volatility

There are two different approaches to modelling volatility and covariance using only historical data. The more common approach gives constant weights to each data point. It assumes that volatility and covariance are constant over time. The other approach attempts to address the fact that volatility and covariance are time dependent by giving more weight to the more recent data points in the sample window.

First assume that variances and covariances do not change over time. Take a large window of length $n$ in which historical data on the risk factors are available. Let $Y_{i,t_k}$ be the return of factor $i$ at time period $t_k$. The variance of factor $i$ and covariance of factors $i$ and $j$ are then computed by giving equal weights to each data point in the past.
The $n$-period estimates at time $T$ for the variance and covariance are

$$\hat\sigma_i^2 = \frac{1}{n-1}\sum_{t=T-n}^{T-1}\left(Y_{i,t} - \hat Y_i\right)^2, \quad \text{where } \hat Y_i = \frac{1}{n}\sum_{t=T-n}^{T-1} Y_{i,t}, \tag{16}$$

and

$$\hat\sigma_{i,j} = \frac{1}{n-1}\sum_{t=T-n}^{T-1}\left(Y_{i,t} - \hat Y_i\right)\left(Y_{j,t} - \hat Y_j\right) \tag{17}$$

respectively.[18] Since equal weight is given to each data point in the sample, the estimated volatility and covariance change only slowly. If one keeps the window length fixed, the estimated values will rise or fall as new large returns enter the sample period and old large returns leave it. This means that even a single extreme return will affect the estimates in the same way, whether it occurred at time $T-1$ or time $T-n$. The estimated variance and covariance, therefore, are greatly influenced by the choice of the window size $n$.

[17] For example, over short time horizons, the mean return is usually assumed to be zero.
[18] The normalization constant $n - 1$ gives an unbiased estimate. It is sometimes replaced by $n$ in order to correspond to the maximum likelihood estimate.

Another stylized fact of financial time series, however, is that volatility itself is volatile. With this in mind, another historical estimate of variance and covariance uses a weighting scheme which gives more weight to more recent observations. The corresponding estimates of variance and covariance are

$$\hat\sigma_i^2(T) = \sum_{t=T-n}^{T-1}\alpha_t\left(Y_{i,t} - \hat Y_i\right)^2, \qquad \hat\sigma_{i,j}(T) = \sum_{t=T-n}^{T-1}\alpha_t\left(Y_{i,t} - \hat Y_i\right)\left(Y_{j,t} - \hat Y_j\right),$$

where the weights $\alpha_t$, $\sum_{t=T-n}^{T-1}\alpha_t = 1$, are chosen to reflect current volatility conditions. In particular, more weight is given to recent observations: $1 > \alpha_{T-1} > \alpha_{T-2} > \cdots > \alpha_{T-n} > 0$. The model using exponentially decreasing weights, such as that used by RiskMetrics, is probably the most popular. In RiskMetrics, the volatility estimator is given by

$$\hat\sigma_i^2(T) = (1-\lambda)\sum_{t=1}^{n}\lambda^{t-1}\left(Y_{i,T-t} - \hat Y_i\right)^2, \tag{18}$$

where the decay factor $\lambda$ is chosen to best match a large group of assets.[19] The covariance estimate is similar. RiskMetrics chooses $\lambda = 0.94$ in the case of daily returns.
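The difference between the equally weighted and the exponentially weighted estimators is easiest to see on a toy series with a single shock. The sketch below assumes a zero mean return, as is common over short horizons, and uses hypothetical function names and data; it evaluates the exponentially weighted sum directly rather than through any particular vendor implementation.

```python
def equal_weight_variance(returns):
    """Sample variance with the n - 1 normalization, as in (16)."""
    n = len(returns)
    mean = sum(returns) / n
    return sum((y - mean) ** 2 for y in returns) / (n - 1)


def ewma_variance(returns, lam=0.94):
    """Exponentially weighted variance in the spirit of (18).

    returns is ordered from oldest to newest. The t-th most recent
    observation receives weight (1 - lam) * lam**(t - 1). The mean return
    is assumed to be zero (an assumption of this sketch).
    """
    return (1.0 - lam) * sum(
        lam ** (t - 1) * y * y
        for t, y in enumerate(reversed(returns), start=1))


old_shock = [5.0] + [0.0] * 9      # large return at the start of the window
recent_shock = [0.0] * 9 + [5.0]   # same return, but in the latest period

print(equal_weight_variance(old_shock), equal_weight_variance(recent_shock))
print(ewma_variance(old_shock), ewma_variance(recent_shock))
```

The equally weighted estimate is the same wherever the shock sits in the window, whereas the exponentially weighted estimate is markedly larger when the shock is recent, which is the behaviour the weighting scheme is designed to produce.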
19 In this estimate it is assumed that the decay parameter $\lambda$ and window length $n$ are such that the approximation $\sum_{t=1}^{n}\lambda^{t-1} \approx \frac{1}{1-\lambda}$ is valid.

The choice (18) allows one to forecast the next period's volatility given the current information, and hence to make parametric VaR calculations given the current information. To see this, assume that the time $T$ (negative) return distribution $X_T$ is being modelled by
\[
X_T \overset{d}{=} \sigma_T Z_T, \tag{19}
\]
where $Z_t$, $t \in \mathbb{Z}$, is an innovation process, that is, a sequence of i.i.d. mean zero and unit variance random variables. Letting $\mathcal{F}_t$ denote the filtration,20 we have
\[
\sigma^2_{T+1\mid\mathcal{F}_T} = (1-\lambda)\sum_{t=0}^{\infty}\lambda^{t}X_{T-t}^2
= (1-\lambda)X_T^2 + \lambda(1-\lambda)\bigl(X_{T-1}^2 + \lambda X_{T-2}^2 + \lambda^2 X_{T-3}^2 + \cdots\bigr)
= (1-\lambda)X_T^2 + \lambda\,\sigma^2_{T\mid\mathcal{F}_{T-1}}.
\]
This allows us to make our VaR calculation depend on the conditional return distribution $F_{X_{T+1}\mid\mathcal{F}_T}$. If $\mathrm{VaR}^{T+1}_\alpha(X)$ denotes the estimated value at risk for $X$ at confidence level $\alpha$ for the period $T+1$ at time $T$, then, by (19),
\[
\mathrm{VaR}^{T+1}_\alpha(X) = \sigma_{T+1\mid\mathcal{F}_T}\, q_\alpha,
\]
where $q_\alpha$ is the $\alpha$ quantile of the innovation $Z_{T+1}$. In RiskMetrics $Z_t$ is $N(0,1)$, in which case the return process $X_t$ is conditionally normal.21

The modelling of the volatility using exponential weights and the assumption of conditional normality have two major effects. First, the volatility estimator, which is now truly time varying, attempts to account for the local volatility conditions by giving more weight to the most recent observations. Second, and less obviously but no less profoundly, it affects the calculation of VaR. Even though the conditional return distribution may be assumed to be normal (thin-tailed) within the VaR calculation, the unconditional return distribution will typically have heavier tails than the normal. This result is not surprising since we may think of our time $t$ return as being sampled from a normal distribution with changing variance.
This means that our unconditional distribution is more likely to fit the empirical returns and thus to provide a better estimate of the true VaR.

20 Conditioning over $\mathcal{F}_T$ means conditioning over all the observations $X_1,\dots,X_T$.
21 RiskMetrics allows the assumption of conditional normality to be relaxed in favor of heavier-tailed conditional distributions. For example, the conditional distribution of returns may be a mixture of normals or a generalized error distribution, that is, a double-sided exponential.

3.2.2. ARCH/GARCH volatilities

The ARCH/GARCH class of conditional volatility models was first proposed by Engle (1982) and Bollerslev (1986) respectively. We will again assume that the (negative) return process to be modelled is of the form (19), where the $Z_t$ are i.i.d. mean zero, unit variance random variables representing the innovations of the return process. In the GARCH($p,q$) model,22 the conditional variance is given by
\[
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i X_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2 .
\]
In its most common form, $Z_t \sim N(0,1)$, so that the returns are conditionally normal. Just as in the exponentially weighted model for volatility (see Section 3.2.1), the GARCH model with a conditionally normal return distribution can lead to heavy tails in the unconditional return distribution. In the case of the GARCH(1,1) model
\[
X_t = \sigma_t Z_t, \quad\text{where } Z_t \sim N(0,1) \text{ i.i.d.}, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1\sigma_{t-1}^2,
\]
it is straightforward to show that under certain conditions23 the unconditional centered kurtosis is given by
\[
K = \frac{\mathrm{E}X_t^4}{(\mathrm{E}X_t^2)^2} - 3 = \frac{6\alpha_1^2}{1 - \beta_1^2 - 2\alpha_1\beta_1 - 3\alpha_1^2},
\]
which for most financial return series will be greater than zero. For example, in the case of a stationary ARCH(1) model, $X_t = \sqrt{\alpha_0 + \alpha_1 X_{t-1}^2}\,Z_t$, with $\alpha_0 > 0$ and $\alpha_1 \in (0, 2e^{\gamma})$, where $\gamma$ is Euler's constant,24 Embrechts, Klüppelberg and Mikosch (1997) show that the unconditional distribution is formally heavy-tailed, that is
\[
P(X > x) \sim c\,x^{-\alpha}, \quad x \to \infty, \tag{20}
\]
where $\alpha/2 > 0$ is the unique solution to the equation
\[
h(u) = \frac{(2\alpha_1)^u}{\sqrt{\pi}}\,\Gamma\!\left(u + \tfrac{1}{2}\right) = 1.
\]
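The two closed-form statements above can be checked numerically. The sketch below is a hedged illustration and not part of the cited results: the function names are hypothetical, the kurtosis function simply evaluates the displayed formula under the two conditions stated in the text, and the tail index in (20) is found by bisection on $\ln h(u)$, assuming (as the text states) that the positive root is unique.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def garch11_excess_kurtosis(alpha1, beta1):
    """Unconditional centered kurtosis K of a conditionally normal GARCH(1,1)."""
    denom = 1.0 - beta1 ** 2 - 2.0 * alpha1 * beta1 - 3.0 * alpha1 ** 2
    if alpha1 + beta1 >= 1.0 or denom <= 0.0:
        raise ValueError("stationarity or fourth-moment condition violated")
    return 6.0 * alpha1 ** 2 / denom


def log_h(u, alpha1):
    """ln h(u) for h(u) = (2*alpha1)**u * Gamma(u + 1/2) / sqrt(pi)."""
    return u * math.log(2.0 * alpha1) + math.lgamma(u + 0.5) - 0.5 * math.log(math.pi)


def arch1_tail_index(alpha1, tol=1e-12):
    """Tail index alpha in (20): twice the positive root u of h(u) = 1."""
    if not 0.0 < alpha1 < 2.0 * math.exp(EULER_GAMMA):
        raise ValueError("alpha1 must lie in (0, 2*exp(gamma))")
    lo = 1e-9                      # ln h is negative just to the right of zero
    if log_h(lo, alpha1) >= 0.0:
        raise ValueError("root too close to zero for this simple bracket")
    hi = 1.0
    while log_h(hi, alpha1) <= 0.0:  # expand until the sign changes
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if log_h(mid, alpha1) < 0.0:
            lo = mid
        else:
            hi = mid
    return 2.0 * (0.5 * (lo + hi))


if __name__ == "__main__":
    print(garch11_excess_kurtosis(0.1, 0.8))
    for a1 in (0.3, 0.5, 0.9):
        print(a1, arch1_tail_index(a1))
```

Working with $\ln h$ through `math.lgamma` rather than with $h$ itself avoids overflow of the gamma function when $\alpha_1$ is small and the root is large.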
22 The ARCH($p$) model first proposed by Engle is equivalent to the GARCH($p$,0) model later proposed by Bollerslev. The advantage of the GARCH model over the ARCH model is that it requires fewer parameters to be estimated, because AR models (ARCH) of high order are often less parsimonious than ARMA models (GARCH) of lower order.
23 These conditions are $\alpha_1 + \beta_1 < 1$ to guarantee stationarity, and $3\alpha_1^2 + 2\alpha_1\beta_1 + \beta_1^2 < 1$ for $K > 0$. Both are generally met in financial time series.
24 Euler's constant is given by $\gamma = \lim_{n\to\infty}\bigl(\sum_{k=1}^{n}\frac{1}{k} - \ln n\bigr)$ and is approximately 0.577.

The ARCH/GARCH models allow both for volatility clustering (periods of large volatility) and for heavy tails. The GARCH(1,1) estimated volatility process $\hat\sigma_t$ for the NASDAQ is displayed in Figure 5. The assumption of conditional normality can be checked, for example, by examining a QQ plot of the ex post innovations, that is $\hat Z_t = X_t/\hat\sigma_t$. Figure 6 displays the QQ plot of $\hat Z_t$ in the traditional, conditionally normal GARCH(1,1) model for the NASDAQ. The fit of the conditionally normal GARCH(1,1) model in the lower tail is poor, showing that the lower tail of $\hat Z_t$ is heavier than that of the normal distribution.

Fig. 5. GARCH(1,1) volatilities $\hat\sigma_t$ for NASDAQ.

Fig. 6. Quantile-quantile (QQ) plot of the conditionally normal GARCH(1,1) standardized ex post innovations for NASDAQ with the $N(0,1)$ distribution.

If the distribution of the historical innovations $Z_{t-n},\dots,Z_t$ is heavier-tailed than the normal, one can modify the model to allow a heavy-tailed conditional distribution $F_{X_{t+1}\mid\mathcal{F}_t}$.25 In Panorska, Mittnik and Rachev (1995) and Mittnik, Paolella and Rachev (1997), returns on the Nikkei index are modelled using an ARMA-GARCH model of the form
\[
X_t = a_0 + \sum_{i=1}^{r} a_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{s} b_j\varepsilon_{t-j} \tag{21}
\]
(contrast with (19)), where $\varepsilon_t = \sigma_t Z_t$, with $Z_t$ an i.i.d. location zero, unit scale heavy-tailed random variable.
The conditional distribution of the return series $F_{X_t\mid\mathcal{F}_{t-1}}$ is given by the distribution type of $Z_t$. The ARMA structure in (21) is used to model the conditional mean $\mathrm{E}(X_t\mid\mathcal{F}_{t-1})$ of the return series $X_t$. The GARCH structure is imposed on the scale parameter26 $\sigma_t$ through
\[
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2 .
\]
Several choices for the distribution of $Z_t$ are tested. In the case where the $Z_t$ are realizations from a stable distribution, the GARCH model used is
\[
\sigma_t = \alpha_0 + \sum_{i=1}^{p}\alpha_i|\varepsilon_{t-i}| + \sum_{j=1}^{q}\beta_j\sigma_{t-j},
\]
and the index of stability of the stable distribution is constrained to be greater than one. Using several goodness of fit measures, the authors find that it is better to model the conditional distribution of returns for the Nikkei than the unconditional distribution, since the unconditional distribution cannot capture the observed temporal dependencies of the return series.27 Within the tested models for $Z_t$, the partially asymmetric Weibull, the Student-t, and the asymmetric stable all outperform the normal.

25 For example, the GARCH module in the statistical software package SPlus allows for three different non-Gaussian conditional distributions. As long as the user can estimate the GARCH parameters, usually through maximum likelihood, there are virtually no limits to the choice of the conditional distribution.

In order to perform reliable value at risk calculations one must model the tail of the distribution of $Z_t$ particularly well. The Anderson-Darling (AD) statistic can be used to measure goodness of fit in the tails. Letting $F_{\mathrm{emp}}(x)$ and $F_{\mathrm{hyp}}(x)$ denote the empirical and hypothesized parametric distributions respectively, the AD statistic
\[
\mathrm{AD} = \sup_{x\in\mathbb{R}} \frac{|F_{\mathrm{emp}}(x) - F_{\mathrm{hyp}}(x)|}{\sqrt{F_{\mathrm{hyp}}(x)\bigl(1 - F_{\mathrm{hyp}}(x)\bigr)}}
\]
gives more weight to the tails of the distribution. Using this statistic, as well as others, the authors propose the asymmetric stable distribution as the best of the tested models for performing VaR calculations at high quantiles.
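As an illustration of how the supremum form of the AD statistic above can be evaluated on a sample, consider the following Python sketch. It is a minimal sketch under our own assumptions: the function name is hypothetical, the hypothesized distribution is taken to be standard normal purely for the example, and the supremum is evaluated at the jump points of the empirical cdf, where, for a continuous hypothesized cdf, it is attained.

```python
import math
from statistics import NormalDist


def ad_sup_statistic(sample, cdf):
    """Supremum-type AD statistic for a sample and a hypothesized cdf.

    At the i-th order statistic the empirical cdf jumps from (i-1)/n to i/n,
    so both one-sided values are compared with the hypothesized cdf.
    """
    xs = sorted(sample)
    n = len(xs)
    best = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        if f <= 0.0 or f >= 1.0:
            continue  # the weight is undefined where the hypothesized cdf is 0 or 1
        scale = math.sqrt(f * (1.0 - f))
        best = max(best, abs((i - 1) / n - f) / scale, abs(i / n - f) / scale)
    return best


if __name__ == "__main__":
    nd = NormalDist()
    n = 100
    # Hypothetical "ideal" sample placed at the normal quantiles.
    ideal = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Same sample with its smallest value replaced by an extreme observation.
    with_outlier = [-8.0] + ideal[1:]
    print(ad_sup_statistic(ideal, nd.cdf))
    print(ad_sup_statistic(with_outlier, nd.cdf))
```

Because the denominator vanishes in the tails of the hypothesized distribution, a single observation far out in the tail can dominate the statistic, which is precisely the sensitivity that makes it useful for VaR at high quantiles.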
The class of ARCH/GARCH models has become increasingly popular for computing VaR. The modelling of the conditional distribution has two immediate benefits. First, it allows the predicted volatility (or scaling) to use local information, i.e., it allows for volatility clustering. Second, since volatility is allowed to be volatile, the unconditional distribution will typically not be thin-tailed. This is true, as we have seen, even when the conditional distribution is normal.

26 In their model $\sigma_t$ is to be interpreted as a scale parameter, not necessarily a volatility, since for some of the distributional choices for $Z_t$, the variance may not exist.
27 The type of the conditional distribution is that of $Z_t$; the unconditional distribution is that of $X_t$.

There now exist many generalizations of the class of ARCH/GARCH models. Models such as EGARCH, HGARCH, AGARCH, and others, all attempt to use the local volatility structure to better predict future volatility while trying to account for other observed phenomena. See Bollerslev, Chou and Kroner (1992) for a review.

The time series of returns $\{X_t\}_{t\in\mathbb{Z}}$ in (19) is generally assumed to be stationary. In a recent paper, Mikosch and Stărică (2000) show that this assumption is not supported, at least globally, by the S&P 500 from 1953 to 1990 and the DEM/USD foreign exchange rate from 1975 to 1982. The authors show that when using a GARCH model the parameters must be updated to account for changes of structure (changes in the unconditional variance) of the time series. A method for detecting these changes is also proposed. Additionally, they show that the long range dependence behavior associated with the absolute return series, another of the so-called stylized facts of financial time series, may only be an artifact of structural changes in the series, that is, of non-stationarity.
Stochastic volatility models are not limited to the class of ARCH/GARCH models and their generalizations. Other models may involve additional sources of randomness. For example, the model of Hull and White (1987)
\[
dY_t = \phi Y_t\,dt + \sigma_t Y_t\,dW_t^{(1)}, \qquad dV_t = \mu V_t\,dt + \xi V_t\,dW_t^{(2)},
\]
where $\sigma_t^2 = V_t$ and $(W_t^{(1)}, W_t^{(2)})$ is a bivariate Wiener process, introduces a second source of randomness through the volatility. The two sources of randomness $W_t^{(1)}$ and $W_t^{(2)}$ need not be uncorrelated. Again, the introduction of a stochastic scaling generally leads to an unconditional return distribution which is leptokurtic. See Shiryaev (1999) for an introduction to stochastic volatility models in discrete and continuous time.

3.2.3. Implied volatilities

The parametric VaR calculation requires a forecast of the volatility. All of the models examined so far have used historical data. One may prefer to use a forward-looking data set instead of historical data in the forecast of volatility, for example options data, which provide the market estimate of future volatility. To do so, one could use the implied volatility derived from the Black-Scholes model. In this model, European call option prices $C_t = C(S_t, K, r, \sigma, T-t)$ are an increasing function of the volatility $\sigma$. The stock price $S_t$ at time $t$, the strike price $K$, the interest rate $r$ and the time to expiration $T-t$ are known at time $t$. Since $\sigma$ is the only unknown parameter/variable, we may then use the observed market price $C_t$ to solve for $\sigma$. This estimate of $\sigma$ is commonly called the (Black-Scholes) implied volatility.

The Black-Scholes model, however, is imperfect. While $\sigma$ should be constant, one typically observes that $\sigma$ depends on the time to expiration $T-t$ and on the strike price $K$. For fixed $T-t$, the implied volatility $\sigma = \sigma(T-t, K)$ as a function of the strike price $K$ is often convex, a phenomenon known as the volatility smile.
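Since the call price is increasing in $\sigma$, solving for the implied volatility is a one-dimensional root-finding problem. The following Python sketch illustrates this with simple bisection. It is a sketch under stated assumptions: the function names, the bracketing interval and the numerical inputs are hypothetical choices of ours, and the pricing formula is the standard Black-Scholes formula for a European call on a non-dividend-paying stock.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cdf


def bs_call_price(s, k, r, sigma, tau):
    """Black-Scholes price of a European call with time to expiration tau."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * _PHI(d1) - k * math.exp(-r * tau) * _PHI(d2)


def implied_volatility(price, s, k, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    """Solve bs_call_price(..., sigma, ...) = price for sigma by bisection.

    The bracket [lo, hi] is an assumption of this sketch; an observed price
    outside the corresponding price range is rejected.
    """
    if not bs_call_price(s, k, r, lo, tau) <= price <= bs_call_price(s, k, r, hi, tau):
        raise ValueError("price is not attainable within the volatility bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call_price(s, k, r, mid, tau) < price:
            lo = mid   # model price too low: volatility must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical at-the-money option: S = K = 100, r = 5%, one year to expiry.
    market_price = bs_call_price(100.0, 100.0, 0.05, 0.2, 1.0)
    print(market_price, implied_volatility(market_price, 100.0, 100.0, 0.05, 1.0))
```

Repeating the calculation across strikes for a fixed expiration, using observed market prices, is how one would trace out the dependence of the implied volatility on $K$.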
To obtain volatility estimates it is common to use at-the-money options, where $S_t = K$, since they are the most actively traded and hence are thought to provide the most accurate estimates.

3.2.4. Extreme value theory

Since VaR calculations are only concerned with the tails of a probability distribution, techniques from Extreme Value Theory (EVT) may be particularly effective. Proponents of EVT have made compelling arguments for its use in calculating VaR and for risk management in general. We will discuss EVT in Section 6.

4. Risk measures

We have considered two different measures of risk: standard deviation and value at risk. Standard deviation, used by Markowitz and others, is still commonly used in portfolio theory today. The second measure, VaR, is the standard measure used today by regulators and investment banks. We detailed some of the computational issues surrounding these measures but have not discussed their validity.

It is easy to criticize standard deviation and value at risk. Even in Markowitz's pioneering work on portfolio theory, the shortcomings of standard deviation as a risk measure were recognized. In Markowitz (1959), an entire chapter is devoted to semi-variance28 as a potential alternative. In Artzner et al. (1997), for example, measures based on standard deviation are criticized for their inability to describe rare events, and VaR is criticized for its inability to aggregate risks in a logical manner. In two now famous papers (Artzner et al., 1997, 1999) on financial risk, the authors propose a set of properties any reasonable risk measure should satisfy. Any risk measure which satisfies these properties is called coherent. We shall now introduce these properties and indicate why the risk measures described above are not coherent.

4.1. Coherent risk measures

Suppose that the financial position of an investor will lead at time $T$ to a loss $X$,29 which is a random variable. Let $G$ be the set of all such $X$.
A risk measure $\rho$ is defined as a mapping from $G$ to $\mathbb{R}$. Intuitively, for a given potential loss $X$ in the future we may think of $\rho(X)$ as the minimum amount of cash that we need to invest prudently today (in a reference instrument) to be allowed to take the position $X$.30 A risk measure may be coherent or not.

28 In order to put the accent on (negative) returns above the mean, semi-variance is defined as $\mathrm{E}\bigl[(X - \mathrm{E}X)^2\,\mathbf{1}_{\{X > \mathrm{E}X\}}\bigr]$.
29 Losses are positive and profits negative. This is at odds with the authors' original notation.
30 The authors refer to $X$ as risk and axiomatically define acceptance sets, which are sets of acceptable risks, and proceed to define measures of risk as describing the risk's proximity to the acceptance set.

Definition 4.1. Given a reference instrument with return $r$, possibly random, a risk measure $\rho$ satisfying the following four axioms is said to be coherent:

Translation invariance. For all $X \in G$ and all $\alpha \in \mathbb{R}$, we have $\rho(X + \alpha r) = \rho(X) + \alpha$. Equivalently, $\rho(X - \alpha r) = \rho(X) - \alpha$: since losses are positive, adding the amount $\alpha$ to the position, and investing it prudently, reduces the overall risk of the position by $\alpha$.

Subadditivity. For all $X_1$ and $X_2 \in G$, $\rho(X_1 + X_2) \le \rho(X_1) + \rho(X_2)$. Hence a merger does not create extra risk. This is the basis for diversification.

Positive homogeneity. For all $\lambda \ge 0$ and all $X \in G$, $\rho(\lambda X) = \lambda\rho(X)$. This requires that the risk scales with the size of a position. If the size of a position renders it illiquid, then this should be considered when modelling the future net worth.

Monotonicity. For all $X$ and $Y \in G$ with $X \le Y$, we have $\rho(X) \le \rho(Y)$. If the future net loss is greater, then the position is more risky.

The term coherent measure of risk has found its way into the risk management vernacular. It is defined, for example, in the second edition of Philippe Jorion's Value at Risk (Jorion, 2001). Note that the axioms of translation invariance and monotonicity rule out standard deviation as a coherent measure of risk.
Indeed, since $\sigma_{X+\alpha r} = \sigma_X$, translation invariance fails, and since $\sigma$ penalizes the investor for large profits as well as large losses, monotonicity fails as well. Consider, for example, two portfolios $X$ and $Y$ which are identical except for a free lottery ticket held in $Y$. We have $X \ge Y$, since there is no down-side to the free ticket and therefore the potential losses in $Y$ are smaller than in $X$. Nevertheless, the standard deviation measure assigns to $Y$ a higher risk, hence monotonicity fails. Markowitz's alternative risk measure, semi-variance, is not coherent either because it is not subadditive.

4.2. Expected shortfall

VaR is not a coherent measure of risk because it fails to be subadditive in general. One can indeed easily construct scenarios [see Albanese (1997)] where for two positions $X$ and $Y$ it is true that
\[
\mathrm{VaR}_\alpha(X + Y) > \mathrm{VaR}_\alpha(X) + \mathrm{VaR}_\alpha(Y).
\]
This is contrary to the risk manager's intuition that the overall risk of different trading desks is bounded by the sum of their individual risks. In short, VaR fails to aggregate risks in a logical manner. In addition, VaR tells us nothing about the size of the loss that exceeds it. Two distributions may have the same VaR yet be dramatically different in the tail. Hence neither the standard deviation nor VaR is coherent. On the other hand, the expected shortfall, also called tail conditional expectation, is a coherent risk measure. Intuitively, the expected shortfall addresses the question: given that we will have a bad day, how bad do we expect it to be? It is a more conservative measure than VaR and looks at the average of all losses that exceed VaR. Formally, the expected shortfall for risk $X$ and high confidence level $\alpha$ is defined as follows:

Definition 4.2. Let $X$ be the random variable whose distribution function $F_X$ describes the negative profit and loss distribution (P&L) of the risky financial position at the specified horizon time $\tau$ (thus losses are positive).
Then the expected shortfall for $X$ is
\[
S_\alpha(X) = \mathrm{E}\bigl[X \mid X > \mathrm{VaR}_\alpha(X)\bigr]. \tag{22}
\]
Suppose, for example, that a portfolio's risk is to be calculated through simulation. If 1000 simulations are run, then for $\alpha = 0.95$, the portfolio's VaR would be the smallest of the 50 largest losses. The corresponding expected shortfall would be estimated by the numerical average of these 50 largest losses.

Expected shortfall, therefore, tells us something about the expected size of a loss exceeding VaR. It is subadditive, coherent and puts fewer restrictions on the distribution of $X$, requiring only a finite first moment to be well defined. Additionally, it may be reconciled with the idea of maximizing expected utility. Levy and Kroll (1978) show that for all utility functions $U$ with the properties described in Section 2.1 and all random variables $X$ and $Y$ (representing losses),
\[
\mathrm{E}U(-X) \ge \mathrm{E}U(-Y) \iff S_\alpha(X) \le S_\alpha(Y) \quad\text{for all } \alpha \in (0,1).
\]
Expected shortfall can be used in portfolio theory as a replacement for the standard deviation if the distribution of $X$ is normal or, more generally, elliptical. As we will see in Section 5.3, in this case any positive homogeneous, translation invariant risk measure will yield the same optimal linear portfolio for the same level of expected return.

Unlike standard deviation, expected shortfall, as defined in (22), does not measure deviation from the mean. Bertsimas, Lauprete and Samarov (2000) define shortfall31 as
\[
s_\alpha(X) = \mathrm{E}\bigl[X \mid X > \mathrm{VaR}_\alpha(X)\bigr] - \mathrm{E}X. \tag{23}
\]
The subtraction of the mean makes it more similar to the standard deviation $\sigma_X = \sqrt{\mathrm{E}(X - \mathrm{E}X)^2}$ and again, as far as portfolio theory is concerned, in the case of elliptical distributions, one obtains the same optimal portfolio for the same level of expected return if one uses $s_\alpha$ to measure risk. In fact, it can be shown that for a linear portfolio $X_p = w_1X_1 + \cdots + w_nX_n$ of multivariate normally distributed returns $X \sim N(\mu, \Sigma)$,
\[
s_\alpha(X_p) = \frac{\phi\bigl(\Phi^{-1}(\alpha)\bigr)}{1-\alpha}\,\sigma_p,
\]
where $\sigma_p = \sqrt{w^{\mathrm{T}}\Sigma w}$ is the standard deviation of $X_p$, and $\phi(x)$ and $\Phi(x)$ are, respectively, the pdf and cdf of a standard normal random variable evaluated at $x$.
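The simulation recipe described above, and the closed form for normally distributed losses, can be sketched in a few lines of Python. This is an illustration under our own assumptions: the function names are hypothetical, the simulated losses are standard normal draws chosen only for the example, and the empirical estimates follow the "smallest of the 50 largest losses" convention used in the text.

```python
import random
from statistics import NormalDist


def empirical_var_es(losses, alpha=0.95):
    """VaR and expected shortfall from simulated losses (losses are positive).

    With n losses, the k = n(1 - alpha) largest are kept: VaR is the smallest
    of them and the expected shortfall estimate is their average.
    """
    ordered = sorted(losses)
    k = round(len(ordered) * (1.0 - alpha))
    if k < 1:
        raise ValueError("not enough simulations for this confidence level")
    tail = ordered[-k:]
    return tail[0], sum(tail) / k


def normal_var_es(alpha, mu=0.0, sigma=1.0):
    """Closed-form VaR and expected shortfall for N(mu, sigma^2) losses."""
    nd = NormalDist()
    q = nd.inv_cdf(alpha)
    var = mu + sigma * q
    # Expected shortfall = mean + shortfall s_alpha, using the normal formula.
    es = mu + sigma * nd.pdf(q) / (1.0 - alpha)
    return var, es


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only so that reruns are reproducible
    simulated = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    print("simulated  :", empirical_var_es(simulated, 0.95))
    print("closed form:", normal_var_es(0.95))
```

With only 50 observations in the tail, the simulated estimates can differ noticeably from the closed-form values, which illustrates why many simulations are needed before such tail estimates can be trusted.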
In other words,
\[
\arg\min_{Aw=b} w^{\mathrm{T}}\Sigma w = \arg\min_{Aw=b} s_\alpha\bigl(w^{\mathrm{T}}X\bigr)
\]
for all $\alpha \in (0,1)$, where $Aw = b$ is any set of linear constraints, including constraints that do not require all portfolios to have the same mean. Note, however, that $s_\alpha$ is not coherent since it violates the axioms of translation invariance and monotonicity.

31 We still assume losses are positive. This is at odds with the authors' notation.

5. Portfolios and dependence

The measure of dependence most popular in the financial community is linear correlation.32 Its popularity may be traced back to Markowitz's mean-variance portfolio theory since, under the assumption of multivariate normality, the correlation is the canonical measure of dependence. Outside of the world of multivariate normal distributions, correlation as a measure of dependence may lead to misleading conclusions (see Section 5.2.1).33

The linear correlation between two random variables $X$ and $Y$, defined by
\[
\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\sigma_Y}, \tag{24}
\]
is a measure of linear dependence between $X$ and $Y$. The word linear is used because, when variances are finite, $|\rho(X,Y)| = 1$ if and only if $Y$ is an affine transformation of $X$ almost surely, that is, if $Y = aX + b$ a.s. for some constants $a \in \mathbb{R}\setminus\{0\}$ and $b \in \mathbb{R}$. When the distribution of returns $X$ is multivariate normal, the dependence structure of the returns is determined completely by the covariance matrix $\Sigma$ or, equivalently, by the correlation matrix $\rho$. One has $\Sigma = [\sigma]\rho[\sigma]$, where $[\sigma]$ is a diagonal matrix with the standard deviations $\sigma_j$ on the diagonal.

When returns are not multivariate normal, linear correlation may no longer be a meaningful measure of dependence. To deal with potential alternatives, we will introduce the concept of copulas, describe various measures of dependence and focus on elliptical distributions. For additional details and proofs, see Embrechts, McNeil and Straumann (2001), Lindskog (2000b), Nelsen (1999), Joe (1997) and Fang, Kotz and Ng (1990).

32 Also known as Pearson's correlation.
33 Linear correlation is actually the canonical measure of dependence for the class of elliptical distributions. This class will be introduced shortly and may be thought of as an extension of multivariate normal distributions.

5.1. Copulas

When $X = (X_1,\dots,X_n) \sim N(\mu,\Sigma)$, the distribution of any linear portfolio of the $X_j$'s is normal with known mean and variance. In the non-normal case, the joint distribution of $X$,
\[
F(x_1,\dots,x_n) = P(X_1 \le x_1,\dots,X_n \le x_n),
\]
is not fully described by its mean and covariance. One would like, however, to describe the joint distribution by specifying separately the marginal distributions, that is, the distributions of the components $X_1,\dots,X_n$, and the dependence structure. One can do this with copulas.

Definition 5.1. An $n$-copula is any function $C : [0,1]^n \to [0,1]$ satisfying the following properties:
(1) For every $u = (u_1,\dots,u_n)$ in $[0,1]^n$ we have that $C(u) = 0$ if at least one component $u_j = 0$, and $C(u) = u_j$ if $u = (1,\dots,1,u_j,1,\dots,1)$.
(2) For every $a, b \in [0,1]^n$ such that $a \le b$,
\[
\sum_{i_1=1}^{2}\cdots\sum_{i_n=1}^{2}(-1)^{i_1+\cdots+i_n}\,C(u_{1i_1},\dots,u_{ni_n}) \ge 0, \tag{25}
\]
where $u_{j1} = a_j$ and $u_{j2} = b_j$ for $j = 1,\dots,n$.

Corollary 5.1 below provides a concrete way to construct copulas. It is based on the following theorem due to Sklar [see Sklar (1996), Nelsen (1999)], which states that by using copulas one can separate the dependence structure of the multivariate distribution from the marginal behavior.

Theorem 5.1 (Sklar). Let $F$ be an $n$-dimensional distribution function with marginals $X_j \sim F_j$ for $j = 1,\dots,n$. Then there exists an $n$-copula $C : [0,1]^n \to [0,1]$ such that for every $x = (x_1,\dots,x_n) \in \mathbb{R}^n$,
\[
F(x_1,\dots,x_n) = C\bigl(F_1(x_1),\dots,F_n(x_n)\bigr). \tag{26}
\]
Furthermore, if the $F_j$ are continuous then $C$ is unique. Conversely, if $C$ is an $n$-copula and the $F_j$ are distribution functions, then $F$ in (26) is an $n$-dimensional distribution function with marginals $F_j$.

The function $C$ is called the copula of the multivariate distribution of $X$.
Assuming continuity of the marginals $F_j$, $j = 1,\dots,n$, we see that the copula $C$ of $F$ is the joint distribution of the uniform transformed variables $F_j(X_j)$,
\[
C(u_1,\dots,u_n) = F\bigl(F_1^{-1}(u_1),\dots,F_n^{-1}(u_n)\bigr). \tag{27}
\]
Corollary 5.1. If the $F_j$ are the cdfs of $U(0,1)$ random variables, then $x_j = F_j(x_j)$, $0 < x_j < 1$, and (26) becomes $F(x_1,\dots,x_n) = C(x_1,\dots,x_n)$. Therefore the copula $C$ may be thought of as the cumulative distribution function (cdf) of a random vector with uniform marginals.

Copulas allow us to model the joint distribution of $X$ in two natural steps. First, one models the univariate marginals $X_j$. Second, one chooses a copula that characterizes the dependence structure of the joint distribution. Any $n$-dimensional distribution function with $U(0,1)$ marginals can serve as a copula. The following examples relate familiar multivariate distributions to their associated copulas and marginals.

Example 5.1. Suppose $X_1,\dots,X_n$ are independent. Then
\[
F(x_1,\dots,x_n) = P(X_1 \le x_1,\dots,X_n \le x_n) = P(X_1 \le x_1)\cdots P(X_n \le x_n) = F_1(x_1)\cdots F_n(x_n).
\]
Hence, in the case of independence, $C(u_1,\dots,u_n) = u_1\cdots u_n$ for all $(u_1,\dots,u_n) \in [0,1]^n$.

Example 5.2. Suppose $(X_1,\dots,X_n)$ is multivariate standard normal with linear correlation matrix $\rho$. Let $\Phi(z) = P(Z \le z)$ for $Z \sim N(0,1)$. Then
\[
F(x_1,\dots,x_n) = P(X_1 \le x_1,\dots,X_n \le x_n) = P\bigl(F_1(X_1) \le F_1(x_1),\dots,F_n(X_n) \le F_n(x_n)\bigr) = C^{\mathrm{Ga}}_\rho\bigl(\Phi(x_1),\dots,\Phi(x_n)\bigr),
\]
where
\[
C^{\mathrm{Ga}}_\rho(u_1,\dots,u_n) = \frac{1}{\sqrt{|\rho|(2\pi)^n}}\int_{-\infty}^{\Phi^{-1}(u_1)}\!\!\cdots\int_{-\infty}^{\Phi^{-1}(u_n)} e^{-\frac{1}{2}s^{\mathrm{T}}\rho^{-1}s}\,ds \tag{28}
\]
is called the multivariate Gaussian copula.

Example 5.3. Suppose $(X_1,\dots,X_n)$ is multivariate $t$ with $\nu$ degrees of freedom and linear correlation matrix $\rho$.34 Let $t_\nu(x) = P(T \le x)$ where $T \sim t_\nu$. Then
\[
F(x_1,\dots,x_n) = P(X_1 \le x_1,\dots,X_n \le x_n) = P\bigl(F_1(X_1) \le F_1(x_1),\dots,F_n(X_n) \le F_n(x_n)\bigr) = C^{t}_{\nu,\rho}\bigl(t_\nu(x_1),\dots,t_\nu(x_n)\bigr),
\]
where
\[
C^{t}_{\nu,\rho}(u_1,\dots,u_n) = \frac{\Gamma\bigl(\frac{\nu+n}{2}\bigr)}{\Gamma\bigl(\frac{\nu}{2}\bigr)\sqrt{|\rho|(\nu\pi)^n}}\int_{-\infty}^{t_\nu^{-1}(u_1)}\!\!\cdots\int_{-\infty}^{t_\nu^{-1}(u_n)}\left(1 + \frac{s^{\mathrm{T}}\rho^{-1}s}{\nu}\right)^{-(\nu+n)/2} ds \tag{29}
\]
is called the multivariate $t$ copula.

34 Its cdf is given by (29) where the upper limits $t_\nu^{-1}(u_1),\dots,t_\nu^{-1}(u_n)$ are replaced by $x_1,\dots,x_n$ respectively. A multivariate $t$ is easy to generate.
Generate a multivariate normal with covariance matrix $\Sigma$ and divide it by $\sqrt{\chi^2_\nu/\nu}$, where $\chi^2_\nu$ is an independent chi-squared random variable with $\nu$ degrees of freedom.

In Examples 5.2 and 5.3, $|\rho|$ denotes the determinant of the matrix $\rho$. In these examples, the copulas were introduced through the joint distribution, but it is important to remember that the copula characterizes the dependence structure of the multivariate distribution through (26). The Gaussian and $t$ copulas (28) and (29) exist separately from their associated multivariate distributions.

Example 5.4. The bivariate Gumbel copula $C^{\mathrm{Gu}}_\beta$ is given by
\[
C^{\mathrm{Gu}}_\beta(u_1,u_2) = \exp\Bigl(-\bigl[(-\ln u_1)^{1/\beta} + (-\ln u_2)^{1/\beta}\bigr]^{\beta}\Bigr), \tag{30}
\]
where $0 < \beta \le 1$ is a parameter controlling the dependence; $\beta \to 0^+$ implies perfect dependence (see Section 5.2.3), and $\beta = 1$ implies independence.

Example 5.5. The bivariate Clayton copula $C^{\mathrm{Cl}}_\beta$ is given by
\[
C^{\mathrm{Cl}}_\beta(u_1,u_2) = \bigl(u_1^{-\beta} + u_2^{-\beta} - 1\bigr)^{-1/\beta}, \tag{31}
\]
where $0 < \beta < \infty$ is a parameter controlling the dependence; $\beta \to 0^+$ implies independence, and $\beta \to \infty$ implies perfect dependence. This copula family is sometimes referred to as the Kimeldorf and Sampson family.

Both the Gumbel and Clayton copulas are strict Archimedean copulas. Archimedean copulas are defined as follows. Let $\varphi : [0,1] \to [0,\infty]$ with $\varphi(0) = \infty$ and $\varphi(1) = 0$ be a continuous, convex, strictly decreasing function. The transformation $\varphi^{-1}\circ\varphi$ maintains the uniform 1-dimensional distribution since $\varphi^{-1}(\varphi(u)) = u$, $u \in [0,1]$. To obtain a 2-dimensional distribution function use, instead of $\varphi^{-1}(\varphi(u))$, $u \in [0,1]$, the function $\varphi^{-1}(\varphi(u) + \varphi(v))$, $u, v \in [0,1]$.

Definition 5.2. A strict Archimedean copula with generator $\varphi$ is of the form
\[
C(u,v) = \varphi^{-1}\bigl(\varphi(u) + \varphi(v)\bigr), \quad u, v \in [0,1]. \tag{32}
\]
Example 5.6. The function $\varphi(t) = (-\ln t)^{1/\beta}$, $0 < \beta \le 1$, generates the bivariate Gumbel copula $C^{\mathrm{Gu}}_\beta$ (see Example 5.4).

Example 5.7. The function $\varphi(t) = (t^{-\beta} - 1)/\beta$, $\beta > 0$, generates the bivariate Clayton copula $C^{\mathrm{Cl}}_\beta$ (see Example 5.5).

Example 5.8.
The function $\varphi(t) = -\ln\bigl((e^{-\beta t} - 1)/(e^{-\beta} - 1)\bigr)$, $\beta \in \mathbb{R}\setminus\{0\}$, generates the bivariate Frank copula
\[
C^{\mathrm{Fr}}_\beta(u,v) = -\frac{1}{\beta}\ln\left(1 + \frac{(e^{-\beta u} - 1)(e^{-\beta v} - 1)}{e^{-\beta} - 1}\right)
\]
[see Frank (1979)].

If $\varphi(0) < \infty$, then the term strict in Definition 5.2 is dropped and $\varphi^{-1}(s)$ in (32) is replaced by the pseudo-inverse $\varphi^{[-1]}(s)$, which equals $\varphi^{-1}(s)$ if $0 \le s \le \varphi(0)$ and is zero otherwise.

Example 5.9. The function $\varphi(t) = 1 - t$, $t \in [0,1]$, satisfies $\varphi(0) = 1$ and hence $\varphi^{[-1]}(t) = \max(1 - t, 0)$. It generates the non-strict Archimedean copula $C(u,v) = \max(u + v - 1, 0)$.

The class of Archimedean copulas has many nice properties, including various simple multivariate extensions. For more on Archimedean copulas see Lindskog (2000b), Nelsen (1999), Joe (1997) and Embrechts, Lindskog and McNeil (2001).

Figure 7 illustrates how the choice of a copula can affect the joint distribution. Each panel shows contours of constant density of a bivariate distribution $(X,Y)$ with standard normal marginals and linear correlation 0.7. The differences between the distributions are due to the choice of the copula. [For an introduction on the choice of a copula, see Frees and Valdez (1998).]

Fig. 7. Contours of constant density for different bivariate distributions with standard normal marginals. All have roughly the same linear correlation, and differ only in their copula. Clockwise from upper left: Gaussian, $t_2$, Gumbel, Clayton. See Examples 5.2-5.5 for the copula definitions.

The following theorem provides a bound for the joint cdf.

Theorem 5.2 (Fréchet). Let $F$ be the joint cdf of a distribution with univariate marginals $F_1,\dots,F_n$. Then for all $x \in \mathbb{R}^n$,
\[
\underbrace{\max\Bigl\{0,\ F_1(x_1) + \cdots + F_n(x_n) - (n-1)\Bigr\}}_{C_L(F_1(x_1),\dots,F_n(x_n))}
\le \underbrace{F(x_1,\dots,x_n)}_{C(F_1(x_1),\dots,F_n(x_n))}
\le \underbrace{\min\bigl\{F_1(x_1),\dots,F_n(x_n)\bigr\}}_{C_U(F_1(x_1),\dots,F_n(x_n))}.
\]
The function $C_U(u_1,\dots,u_n)$ is a copula for all $n \ge 2$, but the function $C_L(u_1,\dots,u_n)$ is a copula for $n = 2$ only.
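The construction (32) and the bivariate Fréchet bounds lend themselves to a short numerical illustration. The Python sketch below is a hedged example rather than a reference implementation: the helper names are hypothetical, the inverse generators are written out by hand from the generators in Examples 5.6, 5.7 and 5.9, and the arguments are restricted to $(0,1]$ so that the logarithm and the negative powers are defined.

```python
import math


def archimedean(phi, phi_inv):
    """Bivariate Archimedean copula C(u, v) = phi_inv(phi(u) + phi(v)), as in (32)."""
    def copula(u, v):
        return phi_inv(phi(u) + phi(v))
    return copula


def clayton(beta):
    """Clayton copula from the generator phi(t) = (t**(-beta) - 1) / beta, beta > 0."""
    return archimedean(lambda t: (t ** (-beta) - 1.0) / beta,
                       lambda s: (1.0 + beta * s) ** (-1.0 / beta))


def gumbel(beta):
    """Gumbel copula from the generator phi(t) = (-ln t)**(1/beta), 0 < beta <= 1."""
    return archimedean(lambda t: (-math.log(t)) ** (1.0 / beta),
                       lambda s: math.exp(-(s ** beta)))


# Non-strict case of Example 5.9: phi(t) = 1 - t with pseudo-inverse max(1 - s, 0).
lower_bound = archimedean(lambda t: 1.0 - t, lambda s: max(1.0 - s, 0.0))


def frechet_bounds(u, v):
    """Bivariate Frechet bounds C_L(u, v) and C_U(u, v)."""
    return max(u + v - 1.0, 0.0), min(u, v)


if __name__ == "__main__":
    u, v = 0.3, 0.6
    low, high = frechet_bounds(u, v)
    print("bounds        :", low, high)
    print("independence  :", gumbel(1.0)(u, v))   # beta = 1 gives u * v
    print("Gumbel(0.5)   :", gumbel(0.5)(u, v))
    print("Clayton(2.0)  :", clayton(2.0)(u, v))
    print("Example 5.9   :", lower_bound(u, v))
```

Passing the generator and its inverse as functions keeps the code close to Definition 5.2; a new Archimedean family can be tried simply by supplying another pair.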
If $n = 2$, the copulas $C_L$ and $C_U$ are the bivariate cdfs of the random vectors $(U, 1-U)$ and $(U, U)$ respectively, where $U \sim U(0,1)$.

Another important property of copulas is their invariance under an increasing transformation of the marginals.

Theorem 5.3. Let $X_1,\dots,X_n$ be continuous random variables with copula $C$. Let $\alpha_1,\dots,\alpha_n$ be strictly increasing transformations. Then the random vector $(\alpha_1(X_1),\dots,\alpha_n(X_n))$ has the same copula $C$ as $(X_1,\dots,X_n)$.

5.2. Measures of dependence

As already mentioned, linear correlation is the only measure of dependence involved in the mean-variance portfolio theory. This theory assumes, either implicitly or explicitly, that returns are multivariate normal. This assumption seems implausible today given the many complex financial products in the marketplace and the empirical evidence against normality. Without the restrictive assumption of normality, is linear correlation still an appropriate measure of dependence?

Linear correlation is often used in the financial community to describe any form of dependence. As illustrated in Embrechts, McNeil and Straumann (2001, 1999), linear correlation is often a very misunderstood measure of dependence. Consider the following example.

Example 5.10. Figure 8 represents 10000 simulations from bivariate distributions $(X,Y)_L$ and $(X,Y)_R$. In both cases $X$ and $Y$ have a standard normal distribution with (approximately) the same linear correlation $\rho \approx 0.7$. Thus, on the basis of the marginal distributions and linear correlation, the two distributions are indistinguishable.

The two distributions are, however, clearly different. If positive values represent losses, the distribution on the right is clearly of greater concern to the risk manager since large losses in $X$ and $Y$ occur simultaneously. The two distributions differ only in their copula. In the figure on the left the dependence structure is given by the bivariate Gaussian copula. Since the marginals are standard normal, this means that the distribution is the bivariate standard normal distribution with the given correlation coefficient. The copula in the figure on the right is the Gumbel copula given in (30) with $\beta = 1/2$. Various values of $\beta$ were tried until the simulation sample linear correlation was 0.7.

Fig. 8. Simulation of 10000 realizations from bivariate distributions both with standard normal marginals and linear correlation of 0.7. The distribution on the left has a Gaussian copula, on the right a Gumbel copula. Compare the shapes with those illustrated in Figure 7, where the population distribution is used.

We now briefly describe several measures of dependence which may be useful to the risk manager. Again the reader is encouraged to look at the above references, especially Embrechts, McNeil and Straumann (2001), for details.

5.2.1. Linear correlation

The linear correlation coefficient $\rho$, defined in (24), is a commonly misused measure of dependence. To illustrate the confusion involved in interpreting it, consider the following classic example. Let $X \sim N(0,\sigma^2)$ and let $Y = X^2$. Then $\rho(X,Y) = 0$, since $\mathrm{Cov}(X,X^2) = \mathrm{E}X^3 = 0$, yet clearly $X$ and $Y$ are dependent. Unless we are willing to make certain assumptions about the multivariate distribution, linear correlation can therefore be a misleading measure of dependence.

Since the copula of a multivariate distribution describes its dependence structure we would like to use measures of dependence which are copula-based. Linear correlation is not such a measure.

5.2.2. Rank correlation

Two well-known rank correlation measures which are copula-based and have better properties than linear correlation are Kendall's tau and Spearman's rho.

Definition 5.3. Let $(X_1,Y_1)$ and $(X_2,Y_2)$ be two independent copies of $(X,Y)$. Then Kendall's tau, denoted $\rho_\tau$, is given by
\[
\rho_\tau(X,Y) = P\bigl[(X_1 - X_2)(Y_1 - Y_2) > 0\bigr] - P\bigl[(X_1 - X_2)(Y_1 - Y_2) < 0\bigr].
\]
If the marginal distributions $F_X$ and $F_Y$ of $X$ and $Y$ are continuous and if $F$ is the bivariate distribution function of $(X,Y)$ with copula $C$, then $\rho_\tau$ can be expressed in terms of $C$ as follows [see Embrechts, McNeil and Straumann (2001)]:
\[
\rho_\tau(X,Y) = 4\int_0^1\!\!\int_0^1 C(u,v)\,dC(u,v) - 1.
\]
Definition 5.4. Let $X \sim F_X$ and $Y \sim F_Y$. Spearman's correlation, denoted $\rho_S$, is the linear correlation of $F_X(X)$ and $F_Y(Y)$, that is,
\[
\rho_S(X,Y) = \rho\bigl(F_X(X), F_Y(Y)\bigr).
\]
Spearman's correlation can also be expressed in a form similar to Definition 5.3 [see Lindskog (2000b)]. Let $(X_1,Y_1)$, $(X_2,Y_2)$ and $(X_3,Y_3)$ be three independent copies of $(X,Y)$. Then
\[
\rho_S(X,Y) = 3\Bigl(P\bigl[(X_1 - X_2)(Y_1 - Y_3) > 0\bigr] - P\bigl[(X_1 - X_2)(Y_1 - Y_3) < 0\bigr]\Bigr).
\]
If the marginal distributions are continuous, $\rho_S$ is related to the copula of the joint distribution as follows:
\[
\rho_S(X,Y) = 12\int_0^1\!\!\int_0^1 C(u,v)\,du\,dv - 3.
\]
Whereas linear correlation is a measure of linear dependence, both Kendall's tau and Spearman's rho are measures of monotonic dependence. Since they are copula-based, they are invariant under strictly increasing transformations.35 Indeed, if $\alpha_1$, $\alpha_2$ are strictly increasing transformations, then
\[
\rho_\tau\bigl(\alpha_1(X_1),\alpha_2(X_2)\bigr) = \rho_\tau(X_1,X_2), \qquad
\rho_S\bigl(\alpha_1(X_1),\alpha_2(X_2)\bigr) = \rho_S(X_1,X_2),
\]
but, in general, $\rho\bigl(\alpha_1(X_1),\alpha_2(X_2)\bigr) \ne \rho(X_1,X_2)$.

35 Recall that invariance under increasing transformations is a property of copulas.

5.2.3. Comonotonicity

An additional important property of these rank correlations is their handling of perfect dependence. By perfect dependence we mean intuitively that $X$ and $Y$ are monotone functions of the same source of randomness. Recall that in the bivariate case, the Fréchet bounds $C_L$ and $C_U$ in Theorem 5.2 are themselves copulas. The following theorem shows that if the copula is $C_L$ or $C_U$ then $X$ and $Y$ are perfectly dependent.

Theorem 5.4 (Embrechts, McNeil and Straumann, 2001). Suppose that the copula $C$ of $(X,Y)$ is either $C_L$ or $C_U$. Then there exist monotone functions $\alpha$ and $\beta$ and a random variable $Z$ such that
\[
(X,Y) \overset{d}{=} \bigl(\alpha(Z),\beta(Z)\bigr).
\]
If $C = C_L$ then $\alpha$ and $\beta$ are increasing and decreasing respectively. If $C = C_U$, then both $\alpha$ and $\beta$ are increasing.

$X$ and $Y$ are said to be countermonotonic if they have copula $C_L$. If they have copula $C_U$, they are said to be comonotonic. In fact, when $F_X$ and $F_Y$ are continuous,

$$C = C_L \iff Y = T(X) \text{ a.s.}, \quad T = F_Y^{-1} \circ (1 - F_X),$$
$$C = C_U \iff Y = T(X) \text{ a.s.}, \quad T = F_Y^{-1} \circ F_X.$$

Kendall's tau and Spearman's rho handle perfect dependence in a reasonable manner, as the following theorem shows.

Theorem 5.5 (Embrechts, McNeil and Straumann, 2001). Let $(X,Y) \sim F$ with continuous marginals and copula $C$. Then

$$\rho_\tau(X,Y) = -1 \iff \rho_S(X,Y) = -1 \iff C = C_L \iff X \text{ and } Y \text{ are countermonotonic},$$
$$\rho_\tau(X,Y) = 1 \iff \rho_S(X,Y) = 1 \iff C = C_U \iff X \text{ and } Y \text{ are comonotonic}.$$

The following theorem due to Höffding and Fréchet deals with linear correlation. See Embrechts, McNeil and Straumann (2001) for its proof.

Theorem 5.6. Let $(X,Y)$ be a random vector with non-degenerate marginals $F_X$ and $F_Y$ and unspecified dependence structure. If $X$ and $Y$ have finite variance, then
(1) The set of possible linear correlations is a closed interval $[\rho_{\min},\rho_{\max}]$ with $\rho_{\min} < 0 < \rho_{\max}$.
(2) The extremal linear correlation $\rho = \rho_{\min}$ is attained iff $X$ and $Y$ are countermonotonic; $\rho = \rho_{\max}$ is attained iff $X$ and $Y$ are comonotonic.
(3) $\rho_{\min} = -1 \iff X$ and $-Y$ are of the same type;$^{36}$ $\rho_{\max} = 1 \iff X$ and $Y$ are of the same type.

$^{36}$ Recall that two random variables are of the same type if their distributions are the same up to a change in location and scale.

The following example shows that linear correlation does not handle perfect dependence in a reasonable manner.

Fig. 9. Range of maximal and minimal linear correlation in Example 5.11. The x-axis is in units of $\sigma$. As $\sigma$ increases, both the maximal and minimal linear correlations tend to zero.

Example 5.11 (Embrechts, McNeil and Straumann, 2001). Let $X \sim \text{Lognormal}(0,1)$ and $Y \sim \text{Lognormal}(0,\sigma^2)$ with $\sigma > 0$. By Theorem 5.6, $\rho = \rho_{\min}$ and $\rho = \rho_{\max}$ when $X$ and $Y$ are countermonotonic and comonotonic respectively.
By Theorem 5.4, $(X,Y) \stackrel{d}{=} (\alpha(Z),\beta(Z))$, and in fact, $(X,Y) \stackrel{d}{=} (e^Z, e^{-\sigma Z})$ when $X$ and $Y$ are countermonotonic and $(X,Y) \stackrel{d}{=} (e^Z, e^{\sigma Z})$ when $X$ and $Y$ are comonotonic, where $Z \sim N(0,1)$. Hence $\rho_{\min} = \rho(e^Z, e^{-\sigma Z})$ and $\rho_{\max} = \rho(e^Z, e^{\sigma Z})$. Using the properties of the lognormal distribution, these maximal and minimal correlations can be evaluated explicitly and one gets

$$\rho_{\min} = \frac{e^{-\sigma} - 1}{\sqrt{(e - 1)(e^{\sigma^2} - 1)}}, \qquad \rho_{\max} = \frac{e^{\sigma} - 1}{\sqrt{(e - 1)(e^{\sigma^2} - 1)}}.$$

As $\sigma$ increases, the maximal and minimal linear correlation both tend to zero even though $X$ and $Y$ are monotonic functions of the same source of randomness. This is illustrated in Figure 9.

5.2.4. Tail dependence

There is a saying in finance that in times of stress all correlations go to one.$^{37}$ While it shows that the financial community uses linear correlation to describe any measure of dependence, it can also serve as motivation for the next measure of dependence, known as tail dependence. Bivariate tail dependence measures the amount of dependence in the upper and lower quadrant tail of the distribution. This is of great interest to the risk manager trying to guard against concurrent bad events in the tails.

$^{37}$ See Cizeau, Potters and Bouchaud (2001) for example.

Definition 5.5. Let $X \sim F_X$ and $Y \sim F_Y$ and observe that as $\alpha \to 1^-$, $F_X^{-1}(\alpha) \to \infty$ and $F_Y^{-1}(\alpha) \to \infty$. The coefficient of upper tail dependence $\lambda_U$ is

$$\lambda_U(X,Y) = \lim_{\alpha \nearrow 1} P\big(Y > F_Y^{-1}(\alpha) \,\big|\, X > F_X^{-1}(\alpha)\big) \qquad (33)$$

provided the limit exists. If $\lambda_U = 0$, then $X$ and $Y$ are said to be asymptotically independent in the upper tail. If $\lambda_U \in (0,1]$, then $X$ and $Y$ are asymptotically dependent in the upper tail. The coefficient of lower tail dependence $\lambda_L$ is similarly defined:

$$\lambda_L(X,Y) = \lim_{\alpha \to 0^+} P\big(Y < F_Y^{-1}(\alpha) \,\big|\, X < F_X^{-1}(\alpha)\big).$$

Since

$$\lambda_U(X,Y) = \lim_{\alpha \to 1^-} \frac{1 - P\big(X \le F_X^{-1}(\alpha)\big) - P\big(Y \le F_Y^{-1}(\alpha)\big) + P\big(X \le F_X^{-1}(\alpha),\, Y \le F_Y^{-1}(\alpha)\big)}{1 - P\big(X \le F_X^{-1}(\alpha)\big)},$$

$\lambda_U$, as well as $\lambda_L$, can be expressed in terms of copulas. Let $(X,Y)$ have continuous distribution $F$ with copula $C$.
It is easily seen that the coefficient of upper tail dependence $\lambda_U$ can be expressed as

$$\lambda_U(X,Y) = \lim_{\alpha \to 1^-} \frac{\bar{C}(\alpha,\alpha)}{1 - \alpha}, \qquad (34)$$

where $\bar{C}(\alpha,\alpha) = 1 - 2\alpha + C(\alpha,\alpha)$.$^{38}$ Similarly,

$$\lambda_L(X,Y) = \lim_{\alpha \to 0^+} \frac{C(\alpha,\alpha)}{\alpha}.$$

$^{38}$ If $(U_1,U_2)^T \sim C$ then $\bar{C}(u_1,u_2) = P(U_1 > u_1, U_2 > u_2) = 1 - u_1 - u_2 + C(u_1,u_2)$.

Example 5.12. Recall the simulation Example 5.10. In this example, both distributions had the same marginal distributions with the same linear correlation. Yet the distributions were clearly different in the upper tail. This difference came from the choice of copula and may now be quantified by using the notion of upper tail dependence. In Figure 8 on the left, $F(x,y) = C^{Ga}_\rho(\Phi(x),\Phi(y))$, where $\Phi$ denotes the standard $N(0,1)$ cdf and $C^{Ga}_\rho$ is given by (28); that is, the distribution is a bivariate standard normal with linear correlation $\rho = 0.7$. The coefficient of upper tail dependence can be calculated explicitly,$^{39}$

$$\lambda_U(X,Y) = 2 \lim_{x \to \infty} \bar{\Phi}\left(x\, \frac{\sqrt{1 - \rho}}{\sqrt{1 + \rho}}\right) = 0,$$

which is a general characteristic of Gaussian copulas. This means that if we go far enough out into the tail then extreme events occur independently in $X$ and $Y$. In the figure on the right, $F(x,y) = C^{Gu}_\beta(\Phi(x),\Phi(y))$, with $C^{Gu}_\beta$ given by (30), where the dependence parameter $\beta$ was chosen to give (approximately) the same linear correlation.$^{40}$ In the case of the Gumbel copula a simple calculation shows that for all $0 < \beta < 1$, the coefficient of upper tail dependence is

$$\lambda_U(X,Y) = 2 - 2^{\beta}.$$

Hence, for the Gumbel copula, $\lambda_U \neq 0$ for $0 < \beta < 1$.

Suppose the risk manager tries to account for heavy tails of a distribution by simply modelling the joint distribution as a multivariate $t$. He will not get $\lambda_U = 0$ as in the case of the multivariate normal distribution.

Example 5.13. If $(X,Y) \sim t_\nu$ with any linear correlation $\rho \in (-1,1)$ then it can be shown (Embrechts, McNeil and Straumann, 2001) that

$$\lambda_U(X,Y) = 2\, \bar{t}_{\nu+1}\left(\sqrt{\nu + 1}\, \frac{\sqrt{1 - \rho}}{\sqrt{1 + \rho}}\right).$$

Hence for all $\rho \in (-1,1)$ there is upper tail dependence of the bivariate $t$.
The stronger the linear correlation and the lower the degrees of freedom, the stronger the upper tail dependence.

5.3. Elliptical distributions

There are distributions other than multivariate normal where linear correlation can be used effectively. These are the spherical, or more generally, the elliptical distributions. Elliptical distributions extend in a natural way the class of multivariate normal distributions. Linear correlation (when it exists) will still be the canonical measure of dependence, yet elliptical distributions can display heavy tails.

$^{39}$ $\bar{\Phi}(x) = 1 - \Phi(x)$, and, below, $\bar{t}_\nu(x) = 1 - t_\nu(x)$.

$^{40}$ The dependence parameter $\beta$ of the bivariate Gumbel copula is related to Kendall's tau by $\rho_\tau = 1 - \beta$.

We shall define first the spherical distributions. These extend the class of standard multivariate normal distributions with zero correlations (Fang, Kotz and Ng, 1990; Embrechts, McNeil and Straumann, 2001).

Definition 5.6. The random vector $X \in \mathbb{R}^n$ is said to be spherically distributed if

$$\Gamma X \stackrel{d}{=} X \quad \forall\, \Gamma \in O(n),$$

where $O(n)$ is the group of $n \times n$ orthogonal matrices. In other words, the distribution of $X$ is invariant under rotation of the coordinates. Here are further characterizations.

Theorem 5.7. The random vector $X \in \mathbb{R}^n$ has a spherical distribution iff its characteristic function $\phi_X$ satisfies one of the following equivalent conditions:
(1) $\phi_X(\Gamma^T t) = \phi_X(t)$ $\forall\, \Gamma \in O(n)$;
(2) There exists a function $\phi(\cdot): \mathbb{R}_+ \to \mathbb{R}$ such that $\phi_X(t) = \phi(t^T t)$, that is, $\phi_X(t) = \phi\big(\sum_{i=1}^n t_i^2\big)$, where $t = (t_1,\ldots,t_n)$.

Alternatively, spherical distributions admit a stochastic representation, namely, $X \in \mathbb{R}^n$ has a spherical distribution iff there exists a non-negative random variable $R$ and a random vector $U$, independent of $R$ and uniformly distributed over the unit hypersphere $S_n = \{s \in \mathbb{R}^n \mid \|s\| = 1\}$, such that

$$X \stackrel{d}{=} R\,U. \qquad (35)$$

Example 5.14. Let $X \sim N(0,I_n)$. Then

$$\phi_X(t) = e^{-(1/2)\, t^T t} = e^{-(1/2) \sum_{i=1}^n t_i^2},$$

and so $\phi(u) = e^{-u/2}$. Additionally, $R \sim \sqrt{\chi^2_n}$, that is, $R^2 \sim \chi^2_n$, in the stochastic representation (35).
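The stochastic representation (35) can be checked numerically in the normal case of Example 5.14. The following is a minimal sketch, not part of the original text: it uses only the Python standard library, the function names are hypothetical, and it relies on the standard fact that a normalised Gaussian vector is uniformly distributed on the unit hypersphere.

```python
import math
import random


def sample_uniform_on_sphere(n, rng):
    """Draw U uniformly on the unit hypersphere in R^n (normalised Gaussian vector)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:  # guard against the (measure-zero) all-zero draw
            return [v / norm for v in g]


def sample_normal_via_ru(n, rng):
    """Draw X = R * U as in (35), with R^2 ~ chi-square(n) independent of U."""
    u = sample_uniform_on_sphere(n, rng)
    # R is the norm of an independent standard Gaussian vector,
    # so R^2 is chi-square distributed with n degrees of freedom.
    r = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
    return [r * v for v in u]


rng = random.Random(0)
draws = [sample_normal_via_ru(3, rng) for _ in range(20000)]

# Empirical second moments: for N(0, I_n) the diagonal entries should be
# close to 1 and the off-diagonal entries close to 0.
m11 = sum(x[0] * x[0] for x in draws) / len(draws)
m12 = sum(x[0] * x[1] for x in draws) / len(draws)
print(round(m11, 2), round(m12, 2))
```

Replacing the law of `r` by another non-negative distribution would, under the same representation, produce other spherical laws with the same uniform direction `u`.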
The function $\phi$ is called the characteristic generator of the spherical distribution. We write $X \sim S_n(\phi)$ to indicate that $X \in \mathbb{R}^n$ is spherically distributed with generator $\phi$. Note that if $X$ possesses a density, then Theorem 5.7 requires that it is of the form

$$f(x) = g(x^T x) = g\Big(\sum_{i=1}^n x_i^2\Big)$$

for some non-negative function $g$. The surfaces of constant density are spheres in $\mathbb{R}^n$.

Table 2
Partial list of spherical distributions used in finance

| Type | pdf $f(x)$ or ch.f. $\phi(t)$ |
|---|---|
| Normal | $f(x) = c \exp(-x^T x/2)$ |
| $t$ | $f(x) = c\,(1 + x^T x/\nu)^{-(\nu+n)/2}$ |
| Logistic | $f(x) = c \exp(-x^T x)/[1 + \exp(-x^T x)]^2$ |
| Scale mixture | $f(x) = c \int_0^\infty t^{-n/2} \exp(-x^T x/2t)\, dG(t)$, $G(t)$ a c.d.f. |
| Stable laws | $\phi(t) = \exp\{r\,(t^T t)^{\alpha/2}\}$, $0 < \alpha \le 2$ and $r < 0$ |

Example 5.15. If $X \in \mathbb{R}^n$ has a multivariate $t$ distribution with $\nu$ degrees of freedom and zero correlation, then

$$f(x) = \frac{\Gamma\big(\frac{\nu+n}{2}\big)}{\Gamma\big(\frac{\nu}{2}\big)(\nu\pi)^{n/2}} \left(1 + \frac{x^T x}{\nu}\right)^{-(\nu+n)/2}.$$

$X$ is therefore spherically distributed. Table 2 gives a partial list of the spherical distributions used in finance.

Recall that if $X \sim N(0,I_n)$, then $Y = \mu + AX$ has a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma = AA^T$. Elliptical distributions are defined from spherical distributions in a similar manner. They are affine transformations of spherical distributions.

Definition 5.7. Let $X \in \mathbb{R}^n$, $\mu \in \mathbb{R}^n$, and $\Sigma \in \mathbb{R}^{n \times n}$. Then $X$ has an elliptical distribution with parameters $\mu$ and $\Sigma$ if

$$X \stackrel{d}{=} \mu + AY,$$

where $Y \sim S_k(\phi)$, $A \in \mathbb{R}^{n \times k}$, and $\Sigma = AA^T$ with $\mathrm{rank}(\Sigma) = k$.

Since the characteristic function of $X$ may be written

$$\phi_X(t) = e^{i t^T \mu}\, \phi(t^T \Sigma\, t),$$

we use the notation $X \sim E_n(\mu,\Sigma,\phi)$. In this representation only $\mu$ is uniquely determined. Since both $\Sigma$ and $\phi$ are determined only up to a positive constant, $\Sigma$ may be chosen to be the covariance matrix if variances are finite (which we assume here). An elliptically distributed random variable $X \sim E_n(\mu,\Sigma,\phi)$ is thus described by its mean, covariance matrix and its characteristic generator. If $X$ possesses a density, then it is of the form

$$f(x) = |\Sigma|^{-1/2}\, g\big((x - \mu)^T \Sigma^{-1} (x - \mu)\big) \qquad (36)$$

so that contours of constant density are ellipsoids in $\mathbb{R}^n$.$^{41}$

$^{41}$ For example if $\mathrm{rank}(\Sigma) = n$ and $Y$ has density of the form $g(y^T y)$.

The following theorem describes some properties of linear combinations, marginal distributions and conditional distributions of elliptical distributions.

Theorem 5.8 (Fang, Kotz and Ng, 1990). Let $X \sim E_n(\mu,\Sigma,\phi)$.
(1) If $B \in \mathbb{R}^{m \times n}$ and $\nu \in \mathbb{R}^m$, then

$$\nu + BX \sim E_m(\nu + B\mu,\, B\Sigma B^T,\, \phi).$$

Hence any linear combination of elliptically distributed variates is elliptical with the same characteristic generator.
(2) Partition $X$, $\mu$ and $\Sigma$ into

$$X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

where $X^{(1)} \in \mathbb{R}^m$, $\mu^{(1)} \in \mathbb{R}^m$ and $\Sigma_{11} \in \mathbb{R}^{m \times m}$, $0 < m < n$. Then

$$X^{(1)} \sim E_m\big(\mu^{(1)},\Sigma_{11},\phi\big), \qquad X^{(2)} \sim E_{n-m}\big(\mu^{(2)},\Sigma_{22},\phi\big).$$

Hence all marginals of an elliptical distribution are also elliptical with the same generator.
(3) Partition $X$, $\mu$ and $\Sigma$ as above and assume that $\Sigma$ is strictly positive definite. Then

$$X^{(1)} \,\big|\, X^{(2)} = x_0^{(2)} \;\sim\; E_m\big(\mu_{1.2},\Sigma_{11.2},\tilde{\phi}\big),$$

where

$$\mu_{1.2} = \mu^{(1)} + \Sigma_{12}\Sigma_{22}^{-1}\big(x_0^{(2)} - \mu^{(2)}\big), \qquad \Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$

Hence the conditional distribution of $X^{(1)}$ given $X^{(2)}$ is also elliptical, though with a different generator.$^{42}$

$^{42}$ The form of the generator $\tilde{\phi}$ can be related to $\phi$ through the stochastic representation of an elliptically distributed random vector in (35). See Fang, Kotz and Ng (1990) for details.

The importance of the class of elliptical distributions to risk management can be seen in the following theorem. It indicates that the standard approaches to risk management apply to a linear portfolio with elliptically distributed risk factors.

Theorem 5.9 (Embrechts, McNeil and Straumann, 2001). Suppose $X \sim E_n(\mu,\Sigma,\phi)$ with finite variances for all univariate marginals. Let

$$\mathcal{P} = \Big\{Z = \sum_{i=1}^n w_i X_i \;\Big|\; w_i \in \mathbb{R}\Big\}$$

be the set of all linear portfolios. Then:
(1) (Subadditivity of VaR.) For any two portfolios $Z_1, Z_2 \in \mathcal{P}$ and $0.5 \le \alpha < 1$,

$$\mathrm{VaR}_\alpha(Z_1 + Z_2) \le \mathrm{VaR}_\alpha(Z_1) + \mathrm{VaR}_\alpha(Z_2).$$
(2) (Equivalence of variance and any other positive homogeneous risk measure.) Let $\varrho$ be any real valued, positive homogeneous risk measure depending only on the distribution of a random variable $X$. Then for $Z_1, Z_2 \in \mathcal{P}$,

$$\varrho(Z_1 - EZ_1) \le \varrho(Z_2 - EZ_2) \iff \sigma^2_{Z_1} \le \sigma^2_{Z_2}.$$

(3) (Markowitz risk minimizing portfolio.) Let $\varrho$ be as in (2), but also translation invariant, and let

$$\mathcal{E} = \Big\{Z = \sum_{i=1}^n w_i X_i \;\Big|\; w_i \in \mathbb{R},\; \sum_{i=1}^n w_i = 1,\; EZ = r\Big\}$$

be the subset of portfolios with the same expected return $r$. Then

$$\operatorname*{argmin}_{Z \in \mathcal{E}} \varrho(Z) = \operatorname*{argmin}_{Z \in \mathcal{E}} \sigma^2_Z.$$

The theorem$^{43}$ states that:
* For any linear portfolio of elliptical risk factors, VaR is a coherent measure of risk.
* If the risk factors are elliptical, the linear correlation is the canonical measure of dependence.
* For elliptical risk factors, the Markowitz mean-variance optimal portfolio, for a given level of expected return, will be the same regardless of whether the risk measure is given by the variance, VaR, expected shortfall or any other positive homogeneous, translation invariant risk measure. Hence, all the usual techniques of portfolio theory and risk management apply.
* It may be strange at first that the expected shortfall $S_\alpha(X)$, for example, which does not involve subtraction of the mean (see (22)), can be used instead of the variance in Markowitz' risk minimization portfolio theory. This is because one considers a set of portfolios $\mathcal{E}$, all of the same mean. Since $S_\alpha(X - EX) = S_\alpha(X) - EX$ and since $EX$ is the same for all portfolios $X$ in $\mathcal{E}$, the term $EX$ can be ignored.

$^{43}$ Because of the importance of Theorem 5.9 and because its proof is illuminating and straightforward we shall sketch it. It is based on the observation that $(Z_1,Z_2)$ is elliptical and so the portfolios $Z_1$, $Z_2$ and $Z_1 + Z_2$ are all of the same type. Let $q_\alpha$, $1/2 < \alpha < 1$, denote the $\alpha$ quantile of the corresponding standardized distribution. Then

$$\mathrm{VaR}_\alpha(Z_1) = EZ_1 + \sigma_{Z_1} q_\alpha, \quad \mathrm{VaR}_\alpha(Z_2) = EZ_2 + \sigma_{Z_2} q_\alpha, \quad \mathrm{VaR}_\alpha(Z_1 + Z_2) = EZ_1 + EZ_2 + \sigma_{Z_1+Z_2}\, q_\alpha,$$

but $\sigma_{Z_1+Z_2} \le \sigma_{Z_1} + \sigma_{Z_2}$ and $q_\alpha > 0$, proving (1). Next, note that there exists $a > 0$ such that $Z_1 - EZ_1 \stackrel{d}{=} a(Z_2 - EZ_2)$, so that $a \le 1 \iff \sigma^2_{Z_1} \le \sigma^2_{Z_2}$. Since the risk measure $\varrho$ is assumed positive homogeneous and depends only on the distribution of $Z$, $\varrho(Z_1 - EZ_1) = \varrho(a(Z_2 - EZ_2)) = a\,\varrho(Z_2 - EZ_2)$ and hence

$$\varrho(Z_1 - EZ_1) \le \varrho(Z_2 - EZ_2) \iff a \le 1 \iff \sigma^2_{Z_1} \le \sigma^2_{Z_2}, \qquad (37)$$

which proves (2). Now consider only portfolios in $\mathcal{E}$. Then (37) holds with $EZ_1 = EZ_2 = r$. However, using translation invariance of $\varrho$, $\varrho(Z_j - r) = \varrho(Z_j) - r$ for $j = 1,2$. This gives

$$\varrho(Z_1) \le \varrho(Z_2) \iff \sigma^2_{Z_1} \le \sigma^2_{Z_2},$$

proving (3).

Note that elliptical distributions are not required to be thin-tailed. The multivariate normal is but one elliptical distribution. The risk manager may well feel that the risk factors under consideration are better modelled using a heavy-tailed elliptical distribution.$^{44}$ The usual techniques then apply, but the risk of a linear portfolio will be greater than if the risk factors were assumed multivariate normal.

$^{44}$ In a recent paper, Lindskog (2000a) compares estimators for linear correlation showing that the standard covariance estimator (17) performs poorly for heavy-tailed elliptical data. Several alternatives are proposed and compared.

6. Univariate extreme value theory

Managing extreme market risk is a goal of any financial institution or individual investor. In an effort to guarantee solvency, financial regulators require most financial institutions to maintain a minimum level of capital in reserve. The recommendation of the Basle Committee (1995b) of a minimum capital reserve requirement based on VaR is an attempt to manage extreme market risks. Recall that VaR is nothing more than a quantile of a probability distribution. The minimum capital reserve is then a multiple of this high quantile, usually computed with $\alpha = 0.99$. Therefore it is very important to attempt to model correctly the tail of the probability distribution of returns (profits and losses).
The primary difficulty is that we are trying to model events about which we know very little. By definition, these events are rare. The model must allow for these rare but very damaging events. Extreme value theory (EVT) approaches the modelling of these rare and damaging events in a statistically sound way. Once the risks have been modelled they may be measured. We will use VaR and Expected Shortfall to measure them.

Extreme value theory has its roots in hydrology, where, for example, one needed to compute how high a sea dyke had to be to guard against a 100 year storm. EVT has recently found its way into the financial community. The reader interested in a solid background may now consult various texts on EVT such as Embrechts, Klüppelberg and Mikosch (1997), Reiss and Thomas (2001) and Beirlant, Teugels and Vynckier (1996). For discussions of the use of EVT in risk management, see Embrechts (2000) and Diebold, Schuermann and Stroughair (2000).

The modelling of extremes may be done in two different ways: modelling the maximum of a collection of random variables, and modelling the largest values over some high threshold. We start, for historical reasons, with the first method, called block maxima.

6.1. Limit law for maxima

The Fisher-Tippett theorem is one of two fundamental theorems in EVT. It does for the maxima of i.i.d. random variables what the central limit theorem does for sums. It provides the limit law for maxima.

Theorem 6.1 (Fisher-Tippett, 1928). Let $(X_n)$ be a sequence of i.i.d. random variables with distribution $F$. Let $M_n = \max(X_1,\ldots,X_n)$. If there exist norming constants $c_n > 0$ and $d_n \in \mathbb{R}$ and some non-degenerate distribution function $H$ such that

$$\frac{M_n - d_n}{c_n} \xrightarrow{d} H,$$

then $H$ is one of the following three types:

$$\text{Fréchet:}\quad \Phi_\alpha(x) = \begin{cases} 0, & x \le 0, \\ \exp\{-x^{-\alpha}\}, & x > 0, \end{cases} \qquad \alpha > 0,$$

$$\text{Weibull:}\quad \Psi_\alpha(x) = \begin{cases} \exp\{-(-x)^{\alpha}\}, & x \le 0, \\ 1, & x > 0, \end{cases} \qquad \alpha > 0,$$

$$\text{Gumbel:}\quad \Lambda(x) = \exp\{-e^{-x}\}, \qquad x \in \mathbb{R}.$$

The distributions $\Phi_\alpha$, $\Psi_\alpha$ and $\Lambda$ are called standard extreme value distributions.
The expressions given above are cumulative distribution functions. The Weibull is usually defined as having support $(0,\infty)$ but, in the context of extreme value theory, it has support on $(-\infty,0)$, as indicated in the theorem. These distributions are related:

$$X \sim \Phi_\alpha \iff \ln X^\alpha \sim \Lambda \iff -\frac{1}{X} \sim \Psi_\alpha.$$

Fig. 10. Densities of the generalized extreme value distribution $H_\xi$. Left: Weibull with $\xi = -0.5$. Middle: Gumbel with $\xi = 0$. Right: Fréchet with $\xi = 0.5$.

A one-parameter representation of these distributions (due to Jenkinson and von Mises) will be useful. The reparameterized version is called the generalized extreme value (GEV) distribution:

$$H_\xi(x) = \begin{cases} \exp\{-(1 + \xi x)^{-1/\xi}\}, & \xi \neq 0, \\ \exp\{-e^{-x}\}, & \xi = 0, \end{cases}$$

where $1 + \xi x > 0$. The standard extreme value distributions $\Phi_\alpha$, $\Psi_\alpha$ and $\Lambda$ follow by taking $\xi = \alpha^{-1} > 0$, $\xi = -\alpha^{-1} < 0$ and $\xi = 0$ respectively.$^{45}$ Their densities are sketched in Figure 10. The parameter $\xi$ is the shape parameter of $H_\xi$. Since for any random variable $X \sim F_X$ and constants $\mu \in \mathbb{R}$ and $\sigma > 0$, the distribution function of $\tilde{X} = \mu + \sigma X$ is given by $F_{\tilde{X}}(x) = F_X((x - \mu)/\sigma)$, we can add location and scale parameters to the above parameterization, and consider

$$H_{\xi,\mu,\sigma}(x) = H_\xi\left(\frac{x - \mu}{\sigma}\right).$$

$^{45}$ Consider, for example, the Fréchet distribution where $\xi = \alpha^{-1} > 0$. Since the support of $H_\xi$ is $1 + \xi x > 0$, one has $H_{\alpha^{-1}}(x) = \exp\{-(1 + \alpha^{-1}x)^{-\alpha}\} = \Phi_\alpha(1 + \alpha^{-1}x)$ for $1 + \alpha^{-1}x > 0$.

If the Fisher-Tippett theorem holds, then we say that $F$ is in the maximum domain of attraction of $H$ and write $F \in \mathrm{MDA}(H)$. Most distributions in statistics are in $\mathrm{MDA}(H_\xi)$ for some $\xi$. If $F \in \mathrm{MDA}(H_\xi)$ and $\xi = 0$, or $F \in \mathrm{MDA}(H_\xi)$ and $\xi < 0$, then $F$ is said to be thin-tailed or short-tailed respectively. Thin-tailed distributions ($\xi = 0$) include the normal, exponential, gamma and lognormal. Short-tailed distributions ($\xi < 0$) have a finite right-hand end point and include the uniform and beta distributions. The heavy-tailed distributions, those in the domain of attraction of the Fréchet distribution, $F \in \mathrm{MDA}(H_\xi)$ for $\xi > 0$, are of particular interest in finance.
They are characterized in the following theorem due to Gnedenko.

Theorem 6.2 (Gnedenko, 1943). The distribution function $F \in \mathrm{MDA}(H_\xi)$ for $\xi > 0$ if and only if $\bar{F}(x) = 1 - F(x) = x^{-1/\xi} L(x)$ for some slowly varying function $L$.$^{46}$

Distributions such as the Student-$t$, $\alpha$-stable and Pareto are in this class. Note that if $X \sim F$ with $F \in \mathrm{MDA}(H_\xi)$, $\xi > 0$, then all moments $E X^\beta$ are infinite for $\beta > 1/\xi$. Note also that $\xi < 1$ corresponds to $\alpha > 1$, where $\alpha$ is as in Theorem 6.1.

6.2. Block maxima method

We now explain the block maxima method, where one assumes in practice that the maximum is distributed as $H_{\xi,\mu,\sigma}$. The implementation of this method requires a great deal of data. Let $X_1,X_2,\ldots,X_{mn}$ be daily (negative) returns and divide them into $m$ adjacent blocks of size $n$. Choose the block size $n$ large enough so that our limiting theorem results apply to $M_n^{(j)} = \max(X_{(j-1)n+1},\ldots,X_{(j-1)n+n})$ for $j = 1,\ldots,m$. Our data set must then be long enough to allow for $m$ blocks of length $n$. There are three parameters, $\xi$, $\mu$ and $\sigma$, which need to be estimated, using for example maximum likelihood based on the extreme value distribution. The value of $m$ must be sufficiently large as well, to allow for a reasonable confidence in the parameter estimation. This is the classic bias-variance trade-off: for a finite data set, increasing the number of blocks $m$, which reduces the variance, decreases the block size $n$, which increases the bias.

Once the GEV model $H_{\xi,\mu,\sigma}$ is fit using $M_n^{(1)},\ldots,M_n^{(m)}$, we may estimate quantities of interest. For example, assuming $n = 261$ trading days per year, we may want to find $R_{261,k}$, the daily loss we expect to be exceeded in one year every $k$ years.$^{47}$ If this loss is exceeded in a given day, this day is viewed as an exceedance day and the year to which the day belongs is regarded as an exceedance year. While an exceedance year has at least one exceedance day, we are not concerned here with the total number of exceedance days in that year.
This would involve taking into consideration the propensity of extremes to form clusters. Since we want $M_{261}$ to be less than $R_{261,k}$ for $k - 1$ of $k$ years, $R_{261,k}$ is the $1 - 1/k$ quantile of $M_{261}$:

$$R_{261,k} = \inf\Big\{r \;\Big|\; P(M_{261} \le r) \ge 1 - \frac{1}{k}\Big\}. \qquad (38)$$

$^{46}$ The function $L$ is said to be slowly varying (at infinity) if $\lim_{x \to \infty} L(tx)/L(x) = 1$, $\forall\, t > 0$.

$^{47}$ Note the obvious hydrological analogy: how high to build a sea dyke to guard against a $k$ year storm.

If we assume that $M_{261}$ has approximately the $H_{\xi,\mu,\sigma}$ distribution, the quantile $R_{261,k}$ is given by

$$R_{261,k} = H^{-1}_{\xi,\mu,\sigma}\Big(1 - \frac{1}{k}\Big) \qquad (39)$$

$$\phantom{R_{261,k}} = \mu + \frac{\sigma}{\xi}\left[\Big(-\ln\Big(1 - \frac{1}{k}\Big)\Big)^{-\xi} - 1\right], \qquad \xi \neq 0, \qquad (40)$$

since the inverse function of $y = \exp\{-(1 + \xi x)^{-1/\xi}\}$ is $x = (1/\xi)[(-\ln y)^{-\xi} - 1]$.

Confidence intervals for $R_{261,k}$ may also be constructed using profile log-likelihood functions. The idea is as follows. The GEV distribution $H_{\xi,\mu,\sigma}$ depends on three parameters. Substitute $R_{261,k}$ for $\mu$ using (40) and denote the reparameterized $H_{\xi,\mu,\sigma}$ as $H_{\xi,R_{261,k},\sigma}$, after some abuse of notation. Then obtain the log-likelihood $L(\xi,R_{261,k},\sigma \mid M_1,\ldots,M_m)$ for our $m$ observations from $H_{\xi,R_{261,k},\sigma}$. Take $H_0$: $R_{261,k} = r$ as the null hypothesis in an asymptotic likelihood ratio test and let $\Theta_0 = (\xi \in \mathbb{R},\, R_{261,k} = r,\, \sigma \in \mathbb{R}_+)$ and $\Theta = (\xi \in \mathbb{R},\, R_{261,k} \in \mathbb{R},\, \sigma \in \mathbb{R}_+)$ be the constrained and unconstrained parameter spaces respectively. Then under certain regularity conditions we have that

$$-2\Big[\sup_{\Theta_0} L(\theta \mid M_1,\ldots,M_m) - \sup_{\Theta} L(\theta \mid M_1,\ldots,M_m)\Big] \sim \chi^2_1$$

as $m \to \infty$, where $\theta = (\xi,R_{261,k},\sigma)$ and $\chi^2_1$ is a chi-squared distribution with one degree of freedom. Let $L(\hat{\xi},r,\hat{\sigma}) = \sup_{\Theta_0} L(\theta \mid M_1,\ldots,M_m)$ and $L(\hat{\xi},\hat{R}_{261,k},\hat{\sigma}) = \sup_{\Theta} L(\theta \mid M_1,\ldots,M_m)$ denote the constrained and unconstrained maximum log-likelihood values respectively. The $\alpha$ confidence interval for $R_{261,k}$ is the set

$$\Big\{r:\; L(\hat{\xi},r,\hat{\sigma}) \ge L(\hat{\xi},\hat{R}_{261,k},\hat{\sigma}) - \tfrac{1}{2}\chi^2_1(\alpha)\Big\},$$

that is, the set of $r$ for which the null hypothesis cannot be rejected at level $\alpha$. See McNeil (1998a) or Këllezi and Gilli (2000) for details.

Example 6.1. We have 7570 data points for the NASDAQ, which we subdivided into $m = 31$ blocks of roughly $n = 261$ trading days.
(The last block, which corresponds to January 2001, has relatively few trading days, but was included because of the large fluctuations.) Estimating the GEV distribution by maximum likelihood leads to $\hat{\xi} = 0.319$, $\hat{\mu} = 2.80$ and $\hat{\sigma} = 1.38$. The value of $\hat{\xi}$ corresponds to $\hat{\alpha} = 1/\hat{\xi} = 3.14$, which is in the expected range for financial data. The GEV fit is not perfect (see Figure 11). Choosing $k = 20$ yields an estimate of the twenty year return level $\hat{R}_{261,20} = 9.62\%$. Figure 12, which displays the log-likelihood corresponding to the null hypothesis that $R_{261,20} = r$, where $r$ is displayed on the abscissa, also provides the corresponding confidence interval.

Fig. 11. The GEV distribution $H_{\hat{\xi},\hat{\mu},\hat{\sigma}}$ fitted using the 31 annual maxima of daily (negative, as %) NASDAQ returns.

Fig. 12. The profile log-likelihood curve for the 20 year return level $R_{261,20}$ for NASDAQ. The abscissa displays return levels (as %) and the ordinate displays log-likelihoods. The point estimate $\hat{R}_{261,20} = 9.62\%$ corresponds to the location of the maximum and the asymmetric 95% confidence interval, computed using the profile log-likelihood curve, is (6.79%, 21.1%).

6.3. Using the block maxima method for stress testing

For the purpose of stress testing (worst case scenario), it is the high quantiles of the daily return distribution $F$ that we are interested in, not those of $M_n$. If the $X_i \sim F$ have a continuous distribution, we have

$$P(M_n \le R_{n,k}) = 1 - \frac{1}{k}.$$

If they are also i.i.d.,

$$P(M_n \le R_{n,k}) = \big(P(X \le R_{n,k})\big)^n,$$

where $X \sim F$, and hence

$$P(X \le R_{n,k}) = \Big(1 - \frac{1}{k}\Big)^{1/n}. \qquad (41)$$

This means that $R_{n,k}$ is the $(1 - 1/k)^{1/n}$ quantile of the marginal distribution $F$. Suppose we would like to calculate VaR at very high quantiles for the purposes of stress testing. The block size $n$ has been fixed for the calibration of the model. This leaves the parameter $k$ for the $R_{n,k}$ return level free.
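The return-level formula (40) and the relation (41) are easy to evaluate directly. The sketch below is an illustration only, with hypothetical function names; it plugs in the rounded maximum likelihood estimates quoted in Example 6.1, so the result should be close to, but need not exactly equal, the reported 9.62% (which was presumably computed from unrounded estimates).

```python
import math


def gev_return_level(xi, mu, sigma, k):
    """1 - 1/k quantile of a GEV(xi, mu, sigma) law for xi != 0, as in (40)."""
    if xi == 0:
        raise ValueError("this sketch only covers the case xi != 0")
    p = 1.0 - 1.0 / k
    return mu + (sigma / xi) * ((-math.log(p)) ** (-xi) - 1.0)


def daily_quantile_level(n, k):
    """Level (1 - 1/k)^(1/n) of the daily distribution implied by (41) under i.i.d."""
    return (1.0 - 1.0 / k) ** (1.0 / n)


# Rounded estimates from Example 6.1 (NASDAQ, blocks of n = 261 days).
xi_hat, mu_hat, sigma_hat = 0.319, 2.80, 1.38

level = gev_return_level(xi_hat, mu_hat, sigma_hat, k=20)
alpha = daily_quantile_level(n=261, k=20)
print(f"20 year return level: {level:.2f}%  (daily quantile level {alpha:.4f})")
```

The second function makes explicit why a modest-looking choice such as $k = 20$ already corresponds to a very high quantile of the daily loss distribution.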
High quantiles, $x_\alpha = F^{-1}(\alpha)$, of $F$ may then be computed from (41) by choosing $\alpha = (1 - 1/k)^{1/n}$, that is $k = 1/(1 - \alpha^n)$. Hence

$$\mathrm{VaR}_\alpha(X) = R_{n,k}, \quad \text{where } k = \frac{1}{1 - \alpha^n}. \qquad (42)$$

For the NASDAQ data, our choice of $k = 20$ corresponds to $\alpha = 0.9998$ and $\mathrm{VaR}_{\alpha=0.9998}(X) = R_{261,20} = 9.62\%$. In practice $\alpha$ is given, and one chooses $k = 1/(1 - \alpha^n)$, then computes $R_{n,k}$ using (40) and thus one obtains $\mathrm{VaR}_\alpha(X) = R_{n,k}$.

We assumed independence but, in finance, this assumption is not realistic. At best, the marginal distribution $F$ can be viewed as stationary. For the extension of the Fisher-Tippett theorem to stationary time series see Leadbetter, Lindgren and Rootzén (1983, 1997) and McNeil (1998a). See McNeil (1998b) for a non-technical example pertaining to the block maxima method and the market crash of 1987.

6.4. Peaks over threshold method

The more modern approach to modelling extreme events is to focus not only on the largest (maximum) events, but on all events greater than some large preset threshold. This is referred to as peaks over threshold (POT) modelling. We will discuss two approaches to POT modelling currently found in the literature. The first is a semiparametric approach based on a Hill type estimator of the tail index (Beirlant, Teugels and Vynckier, 1996; Danielsson and de Vries, 1997, 2000; Mills, 1999). The second is a fully parametric approach based on the generalized Pareto distribution (Embrechts, Klüppelberg and Mikosch, 1997; McNeil and Saladin, 1997; Embrechts, Resnick and Samorodnitsky, 1999).

6.4.1. Semiparametric approach

Recall that $F_X$ is in the maximum domain of attraction of the Fréchet distribution if and only if $\bar{F}_X(x) = x^{-\alpha}L(x)$ for some slowly varying function $L$. Suppose $F_X$ is the distribution function of a loss distribution over some time horizon, where we would like to calculate a quantile based risk measure such as VaR. Assume for simplicity that the distribution of large losses is of Pareto type

$$P(X > x) = c\,x^{-\alpha}, \qquad \alpha > 0,\; x > x_0. \qquad (43)$$

The semiparametric approach uses a Hill type estimator for $\alpha$ and order statistics of historical data to invert and solve for VaR.

We first focus on VaR. Let $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(n)}$ be the order statistics of an historical sample of losses of size $n$, assumed i.i.d. with distribution $F_X$. If $X$ is of Pareto type in the tail and $X_{(k+1)}$ is a high order statistic then for $x > X_{(k+1)}$,

$$\frac{\bar{F}_X(x)}{\bar{F}_X(X_{(k+1)})} = \left(\frac{x}{X_{(k+1)}}\right)^{-\alpha}.$$

The empirical distribution function estimator $\hat{\bar{F}}_X(X_{(k+1)}) = k/n$ suggests the following estimator of $F_X$ in the upper tail,

$$\hat{F}_X(x) = 1 - \frac{k}{n}\left(\frac{x}{X_{(k+1)}}\right)^{-\hat{\alpha}} \quad \text{for } x > X_{(k+1)}.$$

By inverting this relation, one can express $x$ in terms of $\hat{F}_X(x)$, so that fixing $q = \hat{F}_X(x)$ one gets$^{48}$ $x = \widehat{\mathrm{VaR}}_q(X)$. The value of $q$ should be large, namely, $q = \hat{F}_X(x) > \hat{F}(X_{(k+1)}) = 1 - k/n$. This yields

$$\widehat{\mathrm{VaR}}_q(X) = X_{(k+1)}\left(\frac{n}{k}(1 - q)\right)^{-1/\hat{\alpha}}. \qquad (44)$$

We obtained an estimator for VaR but it depends on $k$ through $X_{(k+1)}$, on the sample size $n$ and on $\hat{\alpha}$. To estimate $\alpha$, Hill (1975) proposed the following estimator $\hat{\alpha}^{(\mathrm{Hill})}$, which is also dependent on the order statistics and sample size:

$$\hat{\alpha}^{(\mathrm{Hill})} = \hat{\alpha}^{(\mathrm{Hill})}_{k,n} = \left(\frac{1}{k}\sum_{i=1}^{k} \ln X_{(i)} - \ln X_{(k+1)}\right)^{-1}. \qquad (45)$$

The consistency and asymptotic normality properties of this $\hat{\alpha}^{(\mathrm{Hill})}$ estimator are known in the i.i.d. case and for certain stationary processes. There are, however, many issues surrounding Hill-type estimators; see for example Beirlant, Teugels and Vynckier (1996), Embrechts, Klüppelberg and Mikosch (1997) and Drees, de Haan and Resnick (2000).

To obtain $\widehat{\mathrm{VaR}}_q(X)$, one also needs to choose the threshold level $X_{(k+1)}$ or, equivalently, $k$. Danielsson et al. (2001) provide an optimal choice for $k$ by means of a two stage bootstrap method. Even in this case, however, optimal means merely minimizing the asymptotic mean squared error, which leaves the user uncertain as to how to proceed in the finite sample case. Traditionally the choice of $k$ is done visually by constructing a Hill plot.
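The estimators (45) and (44) translate almost line by line into code. The following is a minimal sketch under stated assumptions: the function names and the four-point toy sample are hypothetical and chosen only so that the arithmetic can be followed by hand; real use would involve a long series of positive losses and a carefully chosen $k$.

```python
import math


def hill_alpha(losses, k):
    """Hill estimate (45) of the tail index alpha based on the k largest losses."""
    x = sorted(losses, reverse=True)  # x[0] = X_(1) >= x[1] = X_(2) >= ...
    if not 1 <= k < len(x) or x[k] <= 0:
        raise ValueError("need 1 <= k < n and a positive threshold X_(k+1)")
    # Average log-excess of the k largest losses over the threshold X_(k+1).
    mean_log_excess = sum(math.log(x[i]) - math.log(x[k]) for i in range(k)) / k
    return 1.0 / mean_log_excess


def hill_var(losses, k, q):
    """Tail quantile estimate (44); only meaningful for q > 1 - k/n."""
    x = sorted(losses, reverse=True)
    n = len(x)
    if q <= 1.0 - k / n:
        raise ValueError("q must exceed 1 - k/n")
    alpha_hat = hill_alpha(losses, k)
    return x[k] * ((n / k) * (1.0 - q)) ** (-1.0 / alpha_hat)


# Hypothetical toy sample: losses e^0, e^1, e^2, e^3, with k = 2.
toy = [math.exp(j) for j in range(4)]
print(hill_alpha(toy, 2))      # log-excesses are 2 and 1, so alpha_hat = 1 / 1.5
print(hill_var(toy, 2, 0.75))  # threshold X_(3) = e scaled by 0.5 ** (-1.5)
```

Evaluating `hill_alpha` over a range of `k` values gives the ordinates of the Hill plot discussed next.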
The Hill plot $\{(k, \hat{\alpha}^{(\mathrm{Hill})}_{k,n}): k = 1,\ldots,n-1\}$ is a visual check for the optimal choice of $k$. The choice of $k$, and therefore of $\hat{\alpha}^{(\mathrm{Hill})}_{k,n}$, is inferred from a stable region of the plot since in the Pareto case, where (43) holds, $\hat{\alpha}^{(\mathrm{Hill})}_{n-1,n}$ is the maximum likelihood estimator for $\alpha$. In the more general case

$$1 - F(x) \sim x^{-\alpha}L(x), \qquad x \to \infty,\; \alpha > 0, \qquad (46)$$

where $L$ is a slowly varying function, the traditional Hill plot is often difficult to interpret. Resnick and Stărică (1997) suggest an alternative plot, called an AltHill plot, obtained by plotting $\{(\theta, \hat{\alpha}^{(\mathrm{Hill})}_{\lceil n^\theta \rceil,n}): 0 \le \theta < 1\}$, where $\lceil n^\theta \rceil$ denotes the smallest integer greater than or equal to $n^\theta$. This plot has the advantage of stretching the left-hand side of the plot, which corresponds to smaller values of $k$, often making the choice of $k$ easier. See Figure 13 for examples of the Hill and AltHill plots for the ordered negative returns $X_{(j)}$ for the NASDAQ.

$^{48}$ We write here $\mathrm{VaR}_q$ and not $\mathrm{VaR}_\alpha$ since $\alpha$ now represents the heavy-tail exponent.

Fig. 13. Hill plots for the NASDAQ data set. Left: The Hill plot $\{(k, \hat{\alpha}^{(\mathrm{Hill})}_{k,n}): k = 1,\ldots,n-1\}$. Right: The AltHill plot $\{(\theta, \hat{\alpha}^{(\mathrm{Hill})}_{\lceil n^\theta \rceil,n}): 0 \le \theta < 1\}$. The Hill plot is difficult to read, whereas the AltHill plot gives the user an estimate of $\hat{\alpha}_{\mathrm{AltHill}} \approx 3$.

6.4.2. Fully parametric approach

The fully parametric approach uses the generalized Pareto distribution (GPD) and the second fundamental theorem in EVT by Pickands, Balkema and de Haan. The GPD is a two-parameter distribution

$$G_{\xi,\beta}(x) = \begin{cases} 1 - \left(1 + \xi\,\dfrac{x}{\beta}\right)^{-1/\xi}, & \xi \neq 0, \\[2mm] 1 - \exp\left(-\dfrac{x}{\beta}\right), & \xi = 0, \end{cases}$$

where an additional parameter $\beta > 0$ has been introduced. The support of $G_{\xi,\beta}(x)$ is $x \ge 0$ for $\xi \ge 0$ and $0 \le x \le -\beta/\xi$ for $\xi < 0$. The distribution is heavy-tailed when $\xi > 0$. GPD distributions with $\beta = 1$ are displayed in Figure 14.

Fig. 14. GPD distribution functions $G_{\xi,\beta}$, all with $\beta = 1$. Left: $\xi = -0.5$. Middle: $\xi = 0$. Right: $\xi = 0.5$, which corresponds to a location adjusted Pareto distribution with $\alpha = 2$.

Definition 6.1.
Let $X \sim F$ with right end-point $x_F = \sup\{x \in \mathbb{R} \mid F(x) < 1\}$. For any high threshold $u < x_F$ define the excess distribution function

$$F_u(x) = P(X - u \le x \mid X > u) \quad \text{for } 0 \le x < x_F - u. \tag{47}$$

The mean excess function of $X$ is then

$$e_X(u) = E(X - u \mid X > u). \tag{48}$$

If $X$ has exceeded the high level $u$, $F_u(x)$ measures the probability that it did not exceed it by more than $x$. Note that for $0 \le x < x_F - u$, we may express $F_u(x)$ in terms of $F$,

$$F_u(x) = \frac{F(u + x) - F(u)}{1 - F(u)},$$

and the mean excess function $e_X(u)$ may be expressed as a function of the excess distribution $F_u$ as

$$e_X(u) = \int_0^{x_F - u} x \, dF_u(x).$$

The following theorem relates $F_u$ to a GPD through the maximum domain of attraction of a GEV distribution. In fact, it completely characterizes the maximum domain of attraction of $H_\xi$.

Theorem 6.3 (Pickands, 1975, Balkema and de Haan, 1974). Let $X \sim F$. Then for every $\xi \in \mathbb{R}$, $X \in \mathrm{MDA}(H_\xi)$ if and only if

$$\lim_{u \uparrow x_F} \ \sup_{0 < x < x_F - u} \left| F_u(x) - G_{\xi,\beta(u)}(x) \right| = 0$$

for some positive function $\beta$.

Writing $\bar F = 1 - F$ and $\bar F_u = 1 - F_u$, the expression for $F_u$ in terms of $F$ above gives

$$\bar F(x) = \bar F(u)\, \bar F_u(x - u) \quad \text{for } x > u. \tag{49}$$

Assuming that $u$ is sufficiently large, we may then approximate $F_u$ by $G_{\xi,\beta(u)}$ and use the empirical estimator for $\bar F(u)$,

$$\hat{\bar F}(u) = \frac{N_u}{n}, \quad \text{where } N_u = \sum_{i=1}^{n} 1_{\{X_i > u\}}$$

and where $n$ is the total number of observations. The upper tail of $F(x)$ may then be estimated by

$$\hat F(x) = 1 - \hat{\bar F}(x) = 1 - \frac{N_u}{n}\left(1 + \hat\xi\, \frac{x - u}{\hat\beta}\right)^{-1/\hat\xi} \quad \text{for all } x > u. \tag{50}$$

This approach allows us to extrapolate beyond the available data, which would not be possible had we chosen an empirical estimator for $F(x)$, $x > u$. We can therefore deal with potentially catastrophic events which have not yet occurred.

The parameters $\xi$ and $\beta$ of the GPD $G_{\xi,\beta(u)}$ may be estimated by using, for example, maximum likelihood once the threshold $u$ has been chosen. The data points that are used in the maximum likelihood estimation are $X_{i_1} - u, \ldots, X_{i_k} - u$, where $X_{i_1}, \ldots, X_{i_k}$ are the observations that exceed $u$. Again there is a bias-variance trade-off in the choice of $u$. To choose a value for $u$, a graphical tool known as the mean excess plot $(u, e_X(u))$ is often used.

The mean excess plot relies on the following theorem for generalized Pareto distributions.
Theorem 6.4 (Embrechts, Klüppelberg and Mikosch, 1997). Suppose $X$ has a GPD distribution with parameters $\xi < 1$ and $\beta$. Then, for $u < x_F$,

$$e_X(u) = \frac{\beta + \xi u}{1 - \xi}, \qquad \beta + u\xi > 0.$$

The restriction $\xi < 1$ implies that the heavy-tailed distribution must have at least a finite mean. If the threshold $u$ is large enough so that $F_u$ is approximately $G_{\xi,\beta}$ then, by Theorem 6.4, the plot $(u, e_X(u))$ is linear in $u$. How then is one to pick $u$? The mean excess plot is a graphical tool for examining the relationship between the possible threshold $u$ and the mean excess function $e_X(u)$, and for identifying the values of $u$ where there is linearity. In practice it is not $e_X(u)$ but its sample version

$$\hat e_X(u) = \frac{\sum_{i=1}^{n} (X_i - u)^+}{\sum_{i=1}^{n} 1_{\{X_i > u\}}}$$

which is plotted against $u$. After using the mean excess plot to pick the upper threshold $u$, one obtains an estimator of the tail of the distribution by applying (50). For the NASDAQ data, since linearity seems to start at relatively small values of $u$ (Figure 15), we choose $u = 1.59$, which corresponds to the 95% quantile of the empirical NASDAQ return distribution.

Fig. 15. Sample mean excess plot $(u, \hat e_X(u))$ for NASDAQ.

To obtain $\mathrm{VaR}_\alpha(X)$ for $\mathrm{VaR}_\alpha(X) > u$, one simply inverts the tail estimator (50), which yields

$$\widehat{\mathrm{VaR}}_\alpha(X) = u + \frac{\hat\beta}{\hat\xi}\left(\left(\frac{n}{N_u}(1 - \alpha)\right)^{-\hat\xi} - 1\right). \tag{51}$$

Since expected shortfall is a risk measure with better technical properties than VaR, we would like to find an estimator for it which uses our GPD model of the tail. Recalling the definitions of the expected shortfall (22) and the mean excess function (48), we have that

$$S_\alpha(X) = \mathrm{VaR}_\alpha(X) + e_X(\mathrm{VaR}_\alpha(X)).$$

Since the excess distribution $F_u$ is approximated by a GPD $G_{\xi,\beta(u)}$ with $\xi < 1$ then, applying Theorem 6.4, we get for $\mathrm{VaR}_\alpha(X) > u$,

$$S_\alpha(X) = \mathrm{VaR}_\alpha(X) + \frac{\beta + \xi(\mathrm{VaR}_\alpha(X) - u)}{1 - \xi} = \frac{\beta + \mathrm{VaR}_\alpha(X) - \xi u}{1 - \xi}.$$

This suggests the following estimator for expected shortfall,

$$\hat S_\alpha(X) = \frac{\hat x_\alpha}{1 - \hat\xi} + \frac{\hat\beta - \hat\xi u}{1 - \hat\xi}, \tag{52}$$

where $\hat x_\alpha = \widehat{\mathrm{VaR}}_\alpha(X)$ may be obtained by using (51).
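The tail estimator (50) and the risk estimators (51) and (52) can be checked against each other numerically. The sketch below assumes $\xi \ne 0$ and $\xi < 1$; the function names and the input values are illustrative assumptions, not estimates taken from the chapter.

```python
def gpd_tail_cdf(x, u, xi, beta, n, n_u):
    """Tail estimator (50): estimate of F(x) for x > u (assumes xi != 0)."""
    return 1.0 - (n_u / n) * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)


def gpd_var(alpha, u, xi, beta, n, n_u):
    """VaR estimator (51); meaningful for alpha > 1 - n_u / n, so that VaR > u."""
    return u + (beta / xi) * (((n / n_u) * (1.0 - alpha)) ** (-xi) - 1.0)


def gpd_expected_shortfall(alpha, u, xi, beta, n, n_u):
    """Expected shortfall estimator (52); requires xi < 1."""
    var_alpha = gpd_var(alpha, u, xi, beta, n, n_u)
    return var_alpha / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)


if __name__ == "__main__":
    # Illustrative inputs only: threshold, GPD parameters, sample size, number of exceedances.
    u, xi, beta, n, n_u = 1.5, 0.2, 1.0, 10000, 500
    alpha = 0.99
    var_alpha = gpd_var(alpha, u, xi, beta, n, n_u)
    print("VaR:", var_alpha)
    print("Expected shortfall:", gpd_expected_shortfall(alpha, u, xi, beta, n, n_u))
    # Plugging the VaR back into (50) should recover alpha, since (51) is the inverse of (50).
    print("F_hat(VaR):", gpd_tail_cdf(var_alpha, u, xi, beta, n, n_u))
```

The round trip through `gpd_tail_cdf` is a quick sanity check that (51) really inverts (50) for the chosen inputs.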
As in the case of block maxima, confidence intervals for $\mathrm{VaR}_\alpha$ and $S_\alpha$ may be constructed using profile log-likelihood functions.

6.4.3. Numerical illustration

To illustrate the usefulness of EVT in risk management, we consider the following example. Let $X_1, \ldots, X_n$ represent the daily negative returns of the NASDAQ index over most of its history, from February 1971 to February 2001, which gives a time series of $n = 7570$ data points. The price and return series are displayed in Figure 16. Let $X_{(1)}, \ldots, X_{(n)}$ be the corresponding order statistics. Suppose the risk manager wants to obtain value at risk and expected shortfall estimates of the returns on the index at some high quantile. Assume that $\{X_i\}_{i=1}^{n}$ are i.i.d. so that Theorem 6.1 holds. Then, using Theorem 6.3, we model the tail of the excess distribution $F_u$ by a GPD $G_{\xi,\beta}$ and use (49) to model the distribution $F(x)$ of the observations for all $x > u$.

Fig. 16. Time series of NASDAQ daily prices, (log) returns, and annual maxima and minima of daily returns, given as a percent, for the period February 1971 (when the index was created) to February 2001. If $P_t$ is the price (level) at time $t$, the returns are defined as $100 \ln(P_t / P_{t-1})$ and expressed as %. The crash of 1987 is clearly visible. The NASDAQ price level peaked in March of 2000.

We use Theorem 6.4 and the sample mean excess plot, Figure 15, to pick the high threshold $u = 1.59\%$. This leaves us with $k = 379$ observations from which we estimate the parameters of the GPD by maximum likelihood. The estimates are $\hat\xi = 0.189$ and $\hat\beta = 0.915$. The model fit is checked by using a QQ plot displayed in Figure 17. Accepting the model, we go on to calculate the value at risk and expected shortfall for various high quantiles by using (51) and (52). The results for the NASDAQ are plotted in Figure 18 (solid lines).
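The threshold choice in this illustration rests on the sample mean excess function. A minimal sketch of that function follows, together with a simulated GPD sample on which the linear form of Theorem 6.4 can be checked. The simulator `simulate_gpd` and its seed are assumptions made for the example; the parameter values in the demo are the estimates reported above, used here only to generate hypothetical data.

```python
import random


def sample_mean_excess(data, u):
    """Sample mean excess e_hat(u): average of X_i - u over the observations with X_i > u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold u")
    return sum(excesses) / len(excesses)


def mean_excess_curve(data, thresholds):
    """Points (u, e_hat(u)) of the sample mean excess plot for the given thresholds."""
    return [(u, sample_mean_excess(data, u)) for u in thresholds]


def simulate_gpd(n, xi, beta, seed=0):
    """Hypothetical data: GPD(xi, beta) draws by inversion, for xi != 0."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a negative power.
    return [(beta / xi) * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]


if __name__ == "__main__":
    xi, beta = 0.189, 0.915
    data = simulate_gpd(100000, xi, beta)
    for u, e_hat in mean_excess_curve(data, [0.0, 0.5, 1.0, 1.5, 2.0]):
        theoretical = (beta + xi * u) / (1.0 - xi)  # Theorem 6.4
        print(f"u = {u:.1f}  sample = {e_hat:.3f}  theoretical = {theoretical:.3f}")
```

On real data one would plot the curve over many thresholds and look for the region where it becomes approximately linear.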
If one had assumed that the observations were normally distributed (dashed lines), both the VaR and the expected shortfall would have been significantly underestimated for high quantiles. For example, at the $\alpha = 0.99$ confidence level, $\mathrm{VaR}_\alpha(X) = 6.59\%$ under the normal model versus $\mathrm{VaR}_\alpha(X) = 8.19\%$ for the GPD model. For the expected shortfall, the difference is even more dramatic: $S_\alpha(X) = 7.09\%$ for the normal model versus $S_\alpha(X) = 10.8\%$ for the GPD model. This is to be expected, since under the assumption of normality it may be shown (Embrechts, Klüppelberg and Mikosch, 1997) that

$$\frac{S_\alpha}{\mathrm{VaR}_\alpha} \to 1 \quad \text{as } \alpha \to 1^-,$$

whereas for the GPD model

$$\frac{S_\alpha}{\mathrm{VaR}_\alpha} \to \frac{1}{1 - \xi} \quad \text{as } \alpha \to 1^-.$$

These results indicate that for very high quantiles, the expected shortfall $S_\alpha$ and the value at risk $\mathrm{VaR}_\alpha$ are comparable under normality, but for the heavy-tailed GPD with $0 < \xi < 1$, $S_\alpha$ tends to be larger than $\mathrm{VaR}_\alpha$.

Fig. 17. For the NASDAQ return data (as %), there were 379 exceedances above the high threshold $u = 1.59\%$. These are fitted with a GPD distribution $G_{\hat\xi,\hat\beta}$ with $\hat\xi = 0.189$ and $\hat\beta = 0.915$. Left: The fitted GPD distribution (dark curve) and the empirical one (dotted curve). Right: QQ plot of sample quantiles versus the quantiles of the fitted $G_{\hat\xi,\hat\beta}$ distribution.

Fig. 18. Risk estimates for NASDAQ in percent returns versus $\alpha$. Left: Value at risk $\mathrm{VaR}_\alpha$, for GPD (solid) and normal (dashed). Right: Expected shortfall $S_\alpha$, for GPD (solid) and normal (dashed). The parameters of the GPD are fitted by maximum likelihood using 30 years of data. The sample mean and volatility of the normal distribution are computed by (16) using the most recent year of daily observations.

6.4.4. A GARCH-EVT model for risk

In order to invoke Theorems 6.1 and 6.3 in the numerical illustration above it was necessary to assume that the (negative) returns $\{X_t\}_{t \in \mathbb{Z}}$ were i.i.d. However, from inspection of Figures 16 and 19, it is apparent that this assumption is unrealistic.
The time series of returns is characterized by periods of varying volatility, that is, the time series is heteroscedastic. The heteroscedasticity of the time series may cause problems for the estimation of the parameters of the GPD model, since we would expect the high threshold $u$ to be violated more often during periods of high volatility. Smith (2000) suggests using Bayesian techniques to model time-varying GPD parameters. In this section, we review a model proposed by McNeil and Frey (2000), which extends the EVT methodology to models of financial time series that allow for stochastic volatility, and we apply this model to the NASDAQ data set.

Fig. 19. Sample autocorrelation functions with lags on the abscissa and sample autocorrelation on the ordinate: returns (top left), squared returns (bottom left), GARCH innovations (top right), squared GARCH innovations (bottom right). The sample consists of 1000 daily returns for the NASDAQ ending February 2001. Horizontal lines indicate the 95% confidence bands ($\pm 1.96/\sqrt{n}$) corresponding to Gaussian white noise.

Recall from Section 3.2.2 that the standard GARCH(1,1) model is given by [49]

$$X_t = \sigma_t Z_t, \quad \text{where } Z_t \sim F_Z \text{ i.i.d.}, \tag{53}$$

$$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \beta_1 \sigma_{t-1}^2. \tag{54}$$

Since the time $t + 1$ volatility $\sigma_{t+1}$ is known at time $t$, we have that

$$\mathrm{VaR}_\alpha(X_{t+1} \mid \mathcal{F}_t) := \inf\{x \in \mathbb{R} \mid F_{X_{t+1} \mid \mathcal{F}_t}(x) \ge \alpha\} = \sigma_{t+1} z_\alpha, \tag{55}$$

where $z_\alpha = F_Z^{-1}(\alpha)$. The same argument shows that the conditional expected shortfall is

$$S_\alpha(X_{t+1} \mid \mathcal{F}_t) := E\left[X_{t+1} \mid X_{t+1} > \mathrm{VaR}_\alpha(X_{t+1} \mid \mathcal{F}_t), \mathcal{F}_t\right] = \sigma_{t+1} E(Z \mid Z > z_\alpha).$$

Traditionally the innovation distribution $F_Z$ is assumed normal. Figures 6 and 20 show that this assumption may still underestimate the tails of the loss portion of the distribution. McNeil and Frey propose a two-step procedure to estimate VaR and expected shortfall of the conditional distribution. First they use a GARCH(1,1) model for the volatility of the (negative) return series $\{X_t\}$. This gives a series of model-implied innovations $Z_t = X_t / \sigma_t$.
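The first step, running the recursion (54) to obtain volatilities and implied innovations, can be sketched as follows. This is a simplified illustration rather than McNeil and Frey's procedure: the parameters are taken as given instead of being estimated, and starting the recursion at the unconditional variance $\alpha_0 / (1 - \alpha_1 - \beta_1)$ is an assumption of the sketch. The function names and the short return series in the demo are hypothetical.

```python
import math
from statistics import NormalDist


def garch11_filter(returns, alpha0, alpha1, beta1, sigma2_init=None):
    """Run (54) over the returns; give back volatilities, innovations and the next-day volatility."""
    if sigma2_init is None:
        if alpha1 + beta1 >= 1.0:
            raise ValueError("alpha1 + beta1 >= 1: pass sigma2_init explicitly")
        sigma2_init = alpha0 / (1.0 - alpha1 - beta1)  # unconditional variance (assumed start value)
    sigma2 = sigma2_init
    sigmas, innovations = [], []
    for x in returns:
        sigma = math.sqrt(sigma2)
        sigmas.append(sigma)
        innovations.append(x / sigma)                         # Z_t = X_t / sigma_t
        sigma2 = alpha0 + alpha1 * x * x + beta1 * sigma2     # equation (54)
    return sigmas, innovations, math.sqrt(sigma2)


def conditional_var(sigma_next, z_alpha):
    """Conditional VaR (55): sigma_{t+1} times the alpha-quantile of the innovations."""
    return sigma_next * z_alpha


if __name__ == "__main__":
    # Hypothetical negative returns in percent; parameter values are illustrative.
    returns = [0.4, -1.2, 2.5, -0.3, 3.1, -0.8]
    sigmas, innovations, sigma_next = garch11_filter(returns, 0.08, 0.18, 0.81)
    z_normal = NormalDist().inv_cdf(0.99)  # quantile under the traditional normal assumption
    print("next-day volatility:", sigma_next)
    print("conditional VaR at 0.99 (normal innovations):", conditional_var(sigma_next, z_normal))
```

In the GARCH-EVT approach the normal quantile in the last line would be replaced by a quantile estimated from the innovations with the GPD tools of Section 6.4.2.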
Second, EVT is used to model the tails of the distribution of these innovations. This approach has the obvious benefit that the resulting innovations $Z_t$ are much closer to satisfying the requirements of Theorems 6.1 and 6.3 than is the original series. We illustrate the methodology with an example using the NASDAQ data.

Fig. 20. QQ plots versus the normal for returns (left) and innovations (right) in Figure 19. Notice that the lower (loss) tail of the innovations is still heavier than that of the normal distribution.

[49] Since the NASDAQ series appears to have a zero conditional mean, we do not set $X_t = \mu_t + \sigma_t Z_t$ and model the mean $\mu_t$, for example as an AR(1) process $\mu_t = \phi X_{t-1}$.

[50] We keep the sample size moderate in order to avoid the IGARCH effect, that is $\alpha_1 + \beta_1 = 1$, corresponding to non-stationarity. See Mikosch and Stărică (2000) for details.

[51] The term pseudo refers to the fact that one is not maximizing the true likelihood.

(1) Let $(x_{t-n+1}, \ldots, x_{t-1}, x_t)$ be $n$ daily negative returns of the NASDAQ. We take [50] $n = 1000$ and use pseudo-maximum-likelihood (PML) to estimate the model parameters $\hat\theta = (\hat\alpha_0, \hat\alpha_1, \hat\beta_1)$ in (54) under the assumption [51] that $F_Z$ is normal in (53). The parameter vector $\hat\theta$ depends on the true distribution of $(X_{t-n+1}, \ldots, X_{t-1}, X_t)$, which is assumed stationary, and on the distribution $F_Z$ used to compute the likelihood function. [52] When we assume $F_Z$ is normal we fit a model whose distributional assumptions we do not believe. Under standard regularity conditions this is justified, since $\hat\theta$ is a consistent estimator of $\theta$ (in fact, asymptotically normal) even if $F_Z$ is non-normal. See Gouriéroux (1997) and references therein for details.

(2) The model innovations $(z_{t-n+1}, \ldots, z_{t-1}, z_t) = (x_{t-n+1}/\hat\sigma_{t-n+1}, \ldots, x_{t-1}/\hat\sigma_{t-1}, x_t/\hat\sigma_t)$ are now calculated. If the model is tenable, these innovations should be i.i.d. Figure 19 shows that while the i.i.d.
assumption is not realistic for the series of returns, it is defensible for the series of innovations. [53] While the returns appear uncorrelated, their squares clearly are not, and hence the returns are dependent. The GARCH innovations and their squares appear uncorrelated. The i.i.d. assumption is therefore more tenable.

(3) Examination of the QQ plot of the innovations in Figure 20 reveals that the loss tail is heavier than that of the normal. Therefore the EVT tools of Section 6.4.2 are now applied to the innovations $(z_{t-n+1}, \ldots, z_{t-1}, z_t)$. Let $z_{(1)} \le \cdots \le z_{(n)}$ be the order statistics of the innovation sample. We choose the threshold $u = 1.79$, again corresponding to the 95% quantile of the empirical distribution of innovations, which leaves $k = 50$ observations $(z_{(n-k+1)}, \ldots, z_{(n)})$ from which to estimate the GPD parameters by maximum likelihood. The estimates are $\hat\xi = 0.323$ and $\hat\beta = 0.364$. Observe that $\hat\xi = 0.323$ corresponds to a heavier tail than $\hat\xi = 0.189$, which we found in Section 6.4.3. We are fitting here, however, over a particularly volatile period of 1000 days of the NASDAQ ending February 2001, whereas in Section 6.4.3 we considered nearly 30 years' worth of returns, where for the majority of the time the NASDAQ was significantly less volatile (see Figure 16).

Fig. 21. Backtest results for the GARCH-EVT methodology of McNeil and Frey. Under the assumption that the model correctly estimates the conditional quantiles, we expect violations 5% and 1% of the time for $\alpha = 0.95$ and $\alpha = 0.99$ respectively. VaR for $\alpha = 0.95$ and $\alpha = 0.99$ are given by the solid and dotted lines respectively. We obtain 5.8% violations of the $\alpha = 0.95$ level and 1% violations of the $\alpha = 0.99$ level.

[52] The condition $\alpha_1 + \beta_1 < 1$ is sufficient for stationarity of the GARCH model. We found $\hat\alpha_0 = 0.080$, $\hat\alpha_1 = 0.181$ and $\hat\beta_1 = 0.811$. However, as indicated in the sequel, the GARCH model is constantly updated, and hence is never used on an infinite horizon.

[53] Ljung-Box tests also found no evidence against the i.i.d. assumption for the innovations.

Since the model is assumed stationary, we could, in principle, use the estimated GARCH parameters to compute $\hat\sigma_{t+1} \mid \mathcal{F}_t$ using (54) for $t$ beyond February 2001. Using $z_\alpha$ corresponding to the GPD distribution $G_{\xi,\beta}$, we would obtain, by using (55), $\mathrm{VaR}_\alpha(X_{t+1} \mid \mathcal{F}_t)$ for $t$ beyond February 2001. In practice, however, stationarity is not always assured, and in any case one wants to use the most recent data available in order to calibrate the model.

In order to backtest the methodology we use the most recent 500 days in our NASDAQ data set. For each day $t + 1$ in this data set we use the previous $n = 1000$ days' (negative) returns $(X_{t-n+1}, \ldots, X_{t-1}, X_t)$ to calibrate the model and estimate $\mathrm{VaR}_\alpha(X_{t+1} \mid \mathcal{F}_t)$ for $\alpha = 0.95$ and $\alpha = 0.99$ using the steps above. We compare $\mathrm{VaR}_\alpha(X_{t+1} \mid \mathcal{F}_t)$ with the actual loss $x_{t+1}$. A violation, at the $\alpha$ level, is said to occur whenever $x_{t+1} > \mathrm{VaR}_\alpha(X_{t+1} \mid \mathcal{F}_t)$. Results for the period ending February 2001 are given in Figure 21.

7. Stable Paretian models

The works of Mandelbrot (1963) and Fama (1965) introduced the use of stable distributions to finance. The excessively peaked and heavy-tailed nature of the return distribution led the authors to reject the standard hypothesis of normally distributed returns in favor of the stable distribution. Since this time, the stable distribution has been used to model both the unconditional and conditional return distributions. In addition, portfolio theories and market equilibrium models have been constructed using it. For an in-depth introduction to the general properties of stable distributions see Samorodnitsky and Taqqu (1994) and the upcoming text Nolan (2001). A major reference for applications in finance is Rachev and Mittnik (2000).

In Definition 3.2, the stable distribution $S_\alpha(\sigma, \beta, \mu)$ is defined as the limiting distribution of the sum of i.i.d. random variables.
Like the normal distribution, stable distributions are closed under addition, and are often defined by this property. Recall that if $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent then $X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$. Similarly, for stable random variables, if $X_1 \sim S_\alpha(\sigma_1, \beta_1, \mu_1)$ and $X_2 \sim S_\alpha(\sigma_2, \beta_2, \mu_2)$ are independent, then $X_1 + X_2 \sim S_\alpha(\sigma, \beta, \mu)$ where

$$\sigma = \left(\sigma_1^\alpha + \sigma_2^\alpha\right)^{1/\alpha}, \qquad \beta = \frac{\beta_1 \sigma_1^\alpha + \beta_2 \sigma_2^\alpha}{\sigma_1^\alpha + \sigma_2^\alpha}, \qquad \mu = \mu_1 + \mu_2.$$

It is in this sense that the stable distribution is a natural heavy-tailed alternative to the normal distribution. However, a common criticism of stable distributions is that their tails are too heavy. One has $P(X > x) \sim c x^{-\alpha}$ as $x \to \infty$. For $0 < \alpha < 2$, this implies that $E|X|^p < \infty$ if $0 < p < \alpha$. In particular, $EX^2 = \infty$, that is, all non-Gaussian stable distributions have infinite variance.

The stable distributions can be defined and parameterized in different ways. One way to specify a stable distribution is through its characteristic function. This is helpful since in general there exists no closed form for the probability density function, [54] which historically has been an impediment to their widespread use. Today, however, there are efficient computer programs to evaluate their densities using fast Fourier transform methods (Rachev and Mittnik, 2000; Nolan, 2001).

Definition 7.1. A random variable $X$ is said to have a stable distribution if there are parameters $\alpha \in (0, 2]$, $\sigma \in [0, \infty)$, $\beta \in [-1, 1]$ and $\mu \in \mathbb{R}$ such that its characteristic function has the following form:

$$\Phi_X(t) = \begin{cases} \exp\left\{-\sigma^\alpha |t|^\alpha \left(1 - i\beta (\operatorname{sign} t) \tan\dfrac{\pi\alpha}{2}\right) + i\mu t\right\} & \text{for } \alpha \ne 1, \\[2mm] \exp\left\{-\sigma |t| \left(1 + i\beta \dfrac{2}{\pi} (\operatorname{sign} t) \ln|t|\right) + i\mu t\right\} & \text{for } \alpha = 1. \end{cases} \tag{56}$$

If both the skewness and location parameters $\beta$ and $\mu$ are zero, $X$ is said to be symmetric stable, which is denoted $X \sim S\alpha S$, and its characteristic function takes the simple form

$$\Phi_X(t) = e^{-\sigma^\alpha |t|^\alpha}.$$

If $X \sim S\alpha S$, then it is characterized completely by its index of stability $\alpha$ and its scale parameter $\sigma$. If $\alpha = 2$, the Gaussian case, then the scale parameter is $\sigma = \sqrt{\tfrac{1}{2}\operatorname{Var}(X)}$.

[54] The exceptions to this rule are the distributions $S_2(\sigma, 0, \mu)$, $S_1(\sigma, 0, \mu)$ and $S_{1/2}(\sigma, 1, \mu)$, which correspond to the Gaussian, Cauchy and Lévy distributions respectively.

7.1. Stable portfolio theory

In Section 2.2 we introduced the mean-variance portfolio theory of Markowitz. The model assumed that the distribution of asset returns is multivariate normal, and provides efficient portfolios, that is, portfolios with maximum expected return for a given level of risk, where risk is measured by the variance of the portfolio. It is possible to extend the ideas of portfolio theory to the case where asset returns have a multivariate stable distribution, even though variances are now infinite. We need first to define a stable random vector and specify its characteristic function.

Definition 7.2. The random vector $\mathbf{X} = (X_1, \ldots, X_n)$ is said to be a stable random vector in $\mathbb{R}^n$ if for any $a, b > 0$ there exist $c > 0$ and $\mathbf{d} \in \mathbb{R}^n$ such that

$$a\mathbf{X}^{(1)} + b\mathbf{X}^{(2)} \stackrel{d}{=} c\mathbf{X} + \mathbf{d}, \tag{57}$$

where $\mathbf{X}^{(j)}$, $j = 1, 2$, are independent copies of $\mathbf{X}$.

The constants in (57) are related by $c^\alpha = a^\alpha + b^\alpha$, where $\alpha \in (0, 2]$ is the index of stability. Setting $n = 1$ in (57) yields one of the alternate definitions of a stable random variable alluded to earlier. In the case of a stable random vector, the scale and skewness parameters $\sigma$ and $\beta$ are replaced by a finite measure $\Gamma_{\mathbf{X}}$ on the unit hypersphere in $\mathbb{R}^n$. For convenience here, let $(\cdot, \cdot)$ denote the inner product, so that $(\mathbf{t}, \mathbf{s}) = \sum_{i=1}^{n} t_i s_i$. [55]

Theorem 7.1. Let $0 < \alpha < 2$. Then $\mathbf{X} = (X_1, \ldots, X_n)$ is a stable random vector with index of stability $\alpha$ if and only if there exists a finite measure $\Gamma_{\mathbf{X}}$ on the unit hypersphere $S_n = \{\mathbf{s} \in \mathbb{R}^n \mid \|\mathbf{s}\| = 1\}$ and a vector $\boldsymbol{\mu} \in \mathbb{R}^n$ such that

$$\Phi_\alpha(\mathbf{t}) = \begin{cases} \exp\left\{-\displaystyle\int_{S_n} |(\mathbf{t}, \mathbf{s})|^\alpha \left(1 - i \operatorname{sign}((\mathbf{t}, \mathbf{s})) \tan\dfrac{\pi\alpha}{2}\right) \Gamma_{\mathbf{X}}(d\mathbf{s}) + i(\mathbf{t}, \boldsymbol{\mu})\right\}, & \alpha \ne 1, \\[3mm] \exp\left\{-\displaystyle\int_{S_n} |(\mathbf{t}, \mathbf{s})| \left(1 + i \dfrac{2}{\pi} \operatorname{sign}((\mathbf{t}, \mathbf{s})) \ln|(\mathbf{t}, \mathbf{s})|\right) \Gamma_{\mathbf{X}}(d\mathbf{s}) + i(\mathbf{t}, \boldsymbol{\mu})\right\}, & \alpha = 1. \end{cases} \tag{58}$$

The pair $(\Gamma_{\mathbf{X}}, \boldsymbol{\mu})$ is unique. The measure $\Gamma_{\mathbf{X}}$ is called the spectral measure of the stable random vector $\mathbf{X}$ and specifies the dependence structure.
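If $\mathbf{X}$ is $S\alpha S$ in $\mathbb{R}^n$, then the characteristic function takes the simple form

$$\Phi_\alpha(\mathbf{t}) = \exp\left\{-\int_{S_n} |(\mathbf{t}, \mathbf{s})|^\alpha \, \Gamma_{\mathbf{X}}(d\mathbf{s})\right\},$$

where $\Gamma_{\mathbf{X}}$ is the unique symmetric spectral measure. The expression in (58) for the characteristic function is also valid for the normal case $\alpha = 2$. When $\alpha = 2$, it reduces to $\Phi_2(\mathbf{t}) = \exp\{-\int_{S_n} |(\mathbf{t}, \mathbf{s})|^2 \, \Gamma_{\mathbf{X}}(d\mathbf{s})\}$, but in this case $\Gamma_{\mathbf{X}}$ is no longer unique. To get a feeling for $\Gamma_{\mathbf{X}}$, suppose $\mathbf{X} = (X_1, X_2)$ and that the distribution is Gaussian. Then

$$\int_{S_2} (\mathbf{t}, \mathbf{s})^2 \, \Gamma_{(X_1, X_2)}(d\mathbf{s}) = \int_{S_2} (t_1 s_1 + t_2 s_2)^2 \, \Gamma_{(X_1, X_2)}(d\mathbf{s}) = t_1^2 \sigma_1^2 + 2 t_1 t_2 \sigma_{1,2} + t_2^2 \sigma_2^2,$$

where

$$\sigma_i^2 = \int_{S_2} s_i^2 \, \Gamma_{(X_1, X_2)}(d\mathbf{s}), \quad i = 1, 2, \qquad \sigma_{1,2} = \int_{S_2} s_1 s_2 \, \Gamma_{(X_1, X_2)}(d\mathbf{s}),$$

and where integration over the circle $S_2$ means integration on $\{\mathbf{s} = (s_1, s_2) \mid s_1^2 + s_2^2 = 1\}$. One recognizes the normal characteristic function with $\operatorname{Var} X_1 = 2\sigma_1^2$, $\operatorname{Var} X_2 = 2\sigma_2^2$ and $\operatorname{Cov}(X_1, X_2) = 2\sigma_{1,2}$. Since different choices of $\Gamma_{(X_1, X_2)}$ can yield the same values for $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{1,2}$, the choice of $\Gamma_{\mathbf{X}}$ is not unique in the Gaussian case.

[55] Previously we wrote $\mathbf{t}^T \mathbf{s}$ instead of $(\mathbf{t}, \mathbf{s})$.

As in the case of a normal random vector, if $\mathbf{X}$ is multivariate stable with index of stability $0 < \alpha < 2$, then all linear combinations of the components of $\mathbf{X}$ are stable with the same $\alpha$. So, if $\mathbf{X}$ is a stable random vector in $\mathbb{R}^n$ and $\mathbf{w} \in \mathbb{R}^n$, we know that $Y = (\mathbf{w}, \mathbf{X}) = \sum_{i=1}^{n} w_i X_i$ is $S_\alpha(\sigma_Y, \beta_Y, \mu_Y)$. Using the characteristic function (58), it can be shown [see Samorodnitsky and Taqqu (1994), Example 2.3.4] that

$$\sigma_Y = \left(\int_{S_n} |(\mathbf{w}, \mathbf{s})|^\alpha \, \Gamma_{\mathbf{X}}(d\mathbf{s})\right)^{1/\alpha}, \tag{59}$$

$$\beta_Y = \frac{\int_{S_n} |(\mathbf{w}, \mathbf{s})|^\alpha \operatorname{sign}(\mathbf{w}, \mathbf{s}) \, \Gamma_{\mathbf{X}}(d\mathbf{s})}{\int_{S_n} |(\mathbf{w}, \mathbf{s})|^\alpha \, \Gamma_{\mathbf{X}}(d\mathbf{s})}, \tag{60}$$

$$\mu_Y = \begin{cases} (\mathbf{w}, \boldsymbol{\mu}) & \text{for } \alpha \ne 1, \\[1mm] (\mathbf{w}, \boldsymbol{\mu}) - \dfrac{2}{\pi} \displaystyle\int_{S_n} (\mathbf{w}, \mathbf{s}) \ln|(\mathbf{w}, \mathbf{s})| \, \Gamma_{\mathbf{X}}(d\mathbf{s}) & \text{for } \alpha = 1. \end{cases} \tag{61}$$

In the mean-variance portfolio theory, the risk to be minimized for any level of expected return is given by the portfolio's variance. If the asset returns are assumed multivariate stable with index of stability $0 < \alpha < 2$, then the variance is infinite and cannot be used. In the stable portfolio theory, it is assumed that $1 < \alpha < 2$, $E\mathbf{X} = \boldsymbol{\mu}$ and that $\mathbf{X} - \boldsymbol{\mu}$ is $S\alpha S$. Let $\mathbf{w}$ be the vector of weights for the risky portfolio $X_p = (\mathbf{w}, \mathbf{X})$.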
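When the spectral measure is discrete, say $\Gamma_{\mathbf{X}} = \sum_j \gamma_j \delta_{\mathbf{s}_j}$ with atoms $\mathbf{s}_j$ on the unit sphere, the integrals in (59) to (61) become finite sums. The sketch below evaluates them for $\alpha \ne 1$. The discrete form of the measure, the function name and the example atoms are assumptions made for illustration; they are not taken from the chapter.

```python
import math


def linear_combination_parameters(w, atoms, masses, mu, alpha):
    """Scale, skewness and location of Y = (w, X) from (59)-(61), for alpha != 1.

    The spectral measure is assumed discrete: mass masses[j] at the unit vector atoms[j].
    """
    if alpha == 1.0:
        raise ValueError("the alpha = 1 case of (61) is not handled in this sketch")
    total = 0.0         # integral of |(w, s)|^alpha
    signed_total = 0.0  # integral of |(w, s)|^alpha * sign((w, s))
    for s, gamma in zip(atoms, masses):
        projection = sum(wi * si for wi, si in zip(w, s))
        term = abs(projection) ** alpha * gamma
        total += term
        signed_total += math.copysign(term, projection)
    if total == 0.0:
        raise ValueError("Y is degenerate: the spectral measure gives zero scale")
    sigma_y = total ** (1.0 / alpha)
    beta_y = signed_total / total
    mu_y = sum(wi * mi for wi, mi in zip(w, mu))
    return sigma_y, beta_y, mu_y


if __name__ == "__main__":
    # Symmetric measure concentrated on the coordinate axes (hypothetical example).
    atoms = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    masses = [0.25, 0.25, 0.25, 0.25]
    print(linear_combination_parameters((0.5, 0.5), atoms, masses, (0.1, 0.3), alpha=1.5))
```

For a symmetric measure such as the one in the demo, the skewness of every linear combination is zero, in line with the $S\alpha S$ assumption of the stable portfolio theory.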
Given the relationship between the scale parameter and the variance in the Gaussian case (that is, stable with $\alpha = 2$), it is natural to use the scale parameter $\sigma_{X_p}$ of the resulting stable distribution instead of the standard deviation. It is given by (59). This brings us to the corresponding stable portfolio problem:

$$\min_{\mathbf{w}} \ \sigma_{X_p} = \left(\int_{S_n} |(\mathbf{w}, \mathbf{s})|^\alpha \, \Gamma_{\mathbf{X}}(d\mathbf{s})\right)^{1/\alpha} \quad \text{such that } (\mathbf{w}, \boldsymbol{\mu}) \ge a, \quad (\mathbf{w}, \mathbf{e}) = 1. \tag{62}$$

The risk measure $\sigma_{X_p} = \sigma_{(\mathbf{w}, \mathbf{X})}$ is a convex function of $\mathbf{w}$, and the problem is generally solved using sequential quadratic programming. See Belkacem (1997) and Rachev and Mittnik (2000) and references therein for details of the procedure and on the estimation of the index of stability, spectral measure and scale parameters. If a risk-free asset is included in the asset universe, then we end up with a maximization problem similar to (2) in Section 2.2, but where the risk measure is the scale parameter $\sigma_{X_p}$ of the risky portfolio.

7.2. Stable asset pricing

Since there exists a portfolio theory under the assumption of a multivariate stable distribution of asset returns ($1 < \alpha < 2$), it is natural to ask whether there exists an analogous CAPM. The answer is positive, and it was first introduced by Fama (1970). For recent descriptions of the stable CAPM see Belkacem, Lévy Véhel and Walter (1996) and, of course, Rachev and Mittnik (2000).

The assumptions behind the stable CAPM are the same as in the Gaussian case in Section 2.3, with the assumption of joint normality of asset returns replaced by that of jointly stable asset returns with index of stability $\alpha \in (1, 2)$. That is, we assume $E\mathbf{X} = \boldsymbol{\mu}$ and that $\mathbf{X} - \boldsymbol{\mu}$ is $S\alpha S$. Recall from the traditional CAPM and Equations (3) and (4) that the expected premium of holding the risky asset $i$ over the riskless asset is proportional to the expected premium of holding the market portfolio over the riskless asset. The constant of proportionality was the risky asset's beta given by (4).
In the stable CAPM, we require an alternative measure of dependence since covariances do not exist. Naturally, the scale parameter $\sigma$ replaces the standard deviation. The covariation is a natural alternative to the covariance in the stable case when $1 < \alpha < 2$. This measure possesses many, but not all, of the useful properties of covariance in the Gaussian case. We define the covariation and present several of its properties. Details may be found in Samorodnitsky and Taqqu (1994) and Rachev and Mittnik (2000).

Definition 7.3. Let $X_1$ and $X_2$ be jointly $S\alpha S$ with $1 < \alpha \le 2$ and let $\Gamma_{(X_1, X_2)}$ be the spectral measure of the random vector $(X_1, X_2)$. The covariation of $X_1$ on $X_2$ is given by

$$[X_1, X_2]_\alpha = \int_{S_2} s_1 s_2^{\langle \alpha - 1 \rangle} \, \Gamma_{(X_1, X_2)}(d\mathbf{s}), \tag{63}$$

where $s^{\langle p \rangle}$ denotes the signed power $s^{\langle p \rangle} = |s|^p (\operatorname{sign} s)$.

In the Gaussian case $\alpha = 2$ it reduces to

$$[X_1, X_2]_2 = \tfrac{1}{2} \operatorname{Cov}(X_1, X_2). \tag{64}$$

Note, however, that whereas in the Gaussian case the dependence structure is fully characterized by the covariance, in the stable case one needs to use $\Gamma_{\mathbf{X}}$, and the covariation does not fully characterize the dependence structure.

We now derive the stable CAPM under the preceding assumptions, following Belkacem, Lévy Véhel and Walter (1996). Consider a portfolio of a riskless asset with rate of return $r$ and a risky asset $X_i$, with weights $w$ and $1 - w$ respectively. The expected rate of return of the portfolio $X_p = wr + (1 - w)X_i$ is then $EX_p = wr + (1 - w)EX_i$, and its risk, as given by its scale parameter, is $\sigma_p = (1 - w)\sigma_i$. [56] The risk-return trade-off is then given by

$$EX_p = r + \frac{EX_i - r}{\sigma_i}\, \sigma_p \tag{65}$$

after setting $w = 1 - \sigma_p / \sigma_i$. Under the assumptions of CAPM, investors have homogeneous beliefs, that is, they all agree on the multivariate stable parameters. This means that all investors hold the market portfolio (as in Section 2.3) as their risky asset, and the risk-return trade-off (65) becomes

$$EX_p = r + \frac{EX_M - r}{\sigma_M}\, \sigma_p, \tag{66}$$

where $X_M$ and $\sigma_M$ are the rate of return and scale parameter respectively of the market.
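Definition 7.3 can be made concrete for a discrete symmetric spectral measure on the unit circle, for which the integral in (63) is a finite sum. In the sketch below the discrete measure, the function names and the example atoms are illustrative assumptions. The helper `scale_power` uses (59) with a unit weight vector, which gives $\sigma_{X_i}^\alpha = \int_{S_2} |s_i|^\alpha \, \Gamma(d\mathbf{s})$.

```python
import math


def signed_power(s, p):
    """Signed power s^<p> = |s|^p * sign(s)."""
    return math.copysign(abs(s) ** p, s)


def covariation(atoms, masses, alpha):
    """Covariation [X1, X2]_alpha of (63) for a discrete spectral measure on the unit circle."""
    if not 1.0 < alpha <= 2.0:
        raise ValueError("the covariation is defined here for 1 < alpha <= 2")
    return sum(s1 * signed_power(s2, alpha - 1.0) * gamma
               for (s1, s2), gamma in zip(atoms, masses))


def scale_power(atoms, masses, alpha, component):
    """sigma^alpha of one component, from (59) with a unit weight vector."""
    return sum(abs(s[component]) ** alpha * gamma for s, gamma in zip(atoms, masses))


if __name__ == "__main__":
    r = math.sqrt(0.5)
    # All mass on the diagonal: X1 and X2 coincide (hypothetical example).
    diagonal = [(r, r), (-r, -r)]
    print(covariation(diagonal, [0.5, 0.5], alpha=1.5))
    print(scale_power(diagonal, [0.5, 0.5], alpha=1.5, component=1))
    # All mass on the axes: independent components, so the covariation vanishes.
    axes = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    print(covariation(axes, [0.25] * 4, alpha=1.5))
```

In the diagonal example the two printed values agree, which reflects the general identity $[X, X]_\alpha = \sigma_X^\alpha$. With $\alpha = 2$ the covariation reduces to $\sum_j s_{1j} s_{2j} \gamma_j = \sigma_{1,2}$, consistent with (64).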
Now consider the suboptimal portfolio $X_p = wX_i + (1 - w)X_M$ obtained by adding to the market portfolio a certain position in asset $i$ (the portfolio is optimal if $w = 0$). Since $\mathbf{X} - \boldsymbol{\mu}$ is $S\alpha S$, we know that $X_i - \mu_i$ and $X_M - \mu_M$ are jointly $S\alpha S$. By properties of symmetric stable random vectors this means that $X_p \sim S_\alpha(\sigma_p, 0, \mu_p)$, where the scale and location parameters are given by (59) and (61), that is

$$\sigma_p^\alpha = \int_{S_2} |w s_1 + (1 - w) s_2|^\alpha \, \Gamma_{(X_i, X_M)}(ds_1, ds_2), \tag{67}$$

$$\mu_p = EX_p = w\mu_i + (1 - w)\mu_M, \tag{68}$$

respectively. Differentiating with respect to $w$ gives

$$\frac{\partial \mu_p}{\partial w} = \mu_i - \mu_M, \tag{69}$$

$$\frac{\partial \sigma_p}{\partial w} = \frac{1}{\alpha \sigma_p^{\alpha - 1}} \frac{\partial \sigma_p^\alpha}{\partial w} = \frac{1}{\sigma_p^{\alpha - 1}} \int_{S_2} (s_1 - s_2) \left(w s_1 + (1 - w) s_2\right)^{\langle \alpha - 1 \rangle} \Gamma_{(X_i, X_M)}(ds_1, ds_2). \tag{70}$$

So evaluating (69) and (70) at $w = 0$ and using Definition 7.3 we get

$$\left.\frac{\partial \mu_p}{\partial \sigma_p}\right|_{w=0} = \left.\frac{\partial \mu_p / \partial w}{\partial \sigma_p / \partial w}\right|_{w=0} = \frac{\sigma_M^{\alpha - 1} (\mu_i - \mu_M)}{[X_i, X_M]_\alpha - \sigma_M^\alpha}, \tag{71}$$

since at $w = 0$ the portfolio $X_p$ becomes $X_M$ and $\sigma_p$ becomes $\sigma_M$. Moreover, in market equilibrium the trade-off between risk and return is given by (66), so that the slope $\partial \mu_p / \partial \sigma_p$ at $w = 0$ is given by $(\mu_M - r)/\sigma_M$ (see Figure 22). Hence

$$\frac{\mu_M - r}{\sigma_M} = \frac{\sigma_M^{\alpha - 1} (\mu_i - \mu_M)}{[X_i, X_M]_\alpha - \sigma_M^\alpha}. \tag{72}$$

This may be rewritten in the familiar CAPM form (3) as

$$E(X_i - r) = \beta_i E(X_M - r),$$

where now, in the stable case,

$$\beta_i = \frac{[X_i, X_M]_\alpha}{\sigma_M^\alpha}. \tag{73}$$

Note that if we assume Gaussian returns, then $\mathbf{X} - \boldsymbol{\mu}$ is $S\alpha S$ with $\alpha = 2$, and by using (64) we recover

$$\beta_i = \frac{\operatorname{Cov}(X_i, X_M)}{\operatorname{Var}(X_M)},$$

that is, the traditional CAPM result.

[56] Note that if $X \sim S_\alpha(\sigma, \beta, \mu)$ then $aX + b \sim S_\alpha(|a|\sigma, \operatorname{sign}(a)\beta, a\mu + b)$ if $1 < \alpha < 2$.

Fig. 22. The stable efficient frontier. The portfolio $X_p = wX_i + (1 - w)X_M$ is suboptimal, and hence must be dominated by the efficient frontier.

Acknowledgments

We would like to thank Paul Embrechts and Filip Lindskog for many valuable comments which led to an improved exposition of this material. This research was partially supported by the NSF Grant ANI-9805623 at Boston University.

References

Albanese, C., 1997. Credit exposure. Diversification risk and coherent VaR. Preprint. Department of Mathematics, University of Toronto.
Artzner, P., Delbaen, F., Eber, J.M., Heath, D., 1997. Thinking coherently. RISK 10 (11).
Artzner, P., Delbaen, F., Eber, J.M., Heath, D., 1999. Coherent measures of risk. Mathematical Finance 9 (3), 203-228.
Barndorff-Nielsen, O.E., 1977. Exponentially decreasing distributions for the logarithm of particle size. Proceedings of the Royal Society London. Series A.
Basle Committee on Banking Supervision, 1995a. An internal model-based approach to market risk capital requirements. Technical report. Basle Committee on Banking Supervision, Basle, Switzerland.
Basle Committee on Banking Supervision, 1995b. Planned supplement to the capital accord to incorporate market risks. Technical report. Basle Committee on Banking Supervision, Basle, Switzerland.
Beirlant, J., Teugels, J.L., Vynckier, P., 1996. Practical Analysis of Extreme Values. Leuven University Press.
Belkacem, L., 1997. How to select optimal portfolio in α-stable markets. Preprint. INRIA.
Belkacem, L., Lévy Véhel, J., Walter, C., 1996. CAPM, risk and portfolio selection in stable markets. Preprint. INRIA.
Bernstein, P.L., 1996. Against the Gods: The Remarkable Story of Risk. Wiley.
Bertsimas, D., Lauprete, G.J., Samarov, A., 2000. Shortfall as a risk measure: Properties, optimization and application. Preprint. MIT.
Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 637-654.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31 (1), 34-105.
Bollerslev, T., Chou, R.Y., Kroner, K.F., 1992. ARCH modelling in finance: A review of the theory and empirical evidence. Journal of Econometrics 52, 307-327.
Broadie, M., Glasserman, P., 1998. Simulation for option pricing and risk management. In: Alexander, C. (Ed.), Risk Management and Analysis. Wiley.
Cizeau, P., Potters, M., Bouchaud, J.-P., 2001. Correlation structure of extreme stock returns. Quantitative Finance 1, 217-222.
Danielsson, J., Hartmann, P., De Vries, C.G., 1998. The cost of conservatism: Extreme returns, value at risk and the Basle multiplication factor. RISK 11, 101-103.
Danielsson, J., De Haan, L., Peng, L., De Vries, C.G., 2001. Using a bootstrap method to choose the sample fraction in the tail index estimation. Journal of Multivariate Analysis 76, 226-248.
Danielsson, J., de Vries, C., 1997. Beyond the sample: Extreme quantile and probability estimation. Preprint, LSE.
Danielsson, J., de Vries, C., 2000. Value at risk and extreme returns. In: Embrechts, P. (Ed.), Extremes and Integrated Risk Management. Risk Books, pp. 85-106.
Diebold, F.X., Schuermann, T., Stroughair, J.D., 2000. Pitfalls and opportunities in the use of extreme value theory in risk management. Journal of Risk Finance 1 (Winter), 30-36.
Dowd, K., 1998. Beyond Value at Risk: The New Science of Risk Management. Wiley.
Drees, H., de Haan, L., Resnick, S., 2000. How to make a Hill plot. The Annals of Statistics 28 (1), 254-274.
Duffie, D., Pan, J., 1997. An overview of value at risk. Journal of Derivatives 4 (3), 7-49.
Eberlein, E., Keller, U., 1995. Hyperbolic distributions in finance. Bernoulli 1, 281-299.
Eberlein, E., Prause, K., 2002. The generalized hyperbolic model: Financial derivatives and risk measures. In: Geman, H. (Ed.), Mathematical Finance - Bachelier Congress 2000. Springer-Verlag, pp. 245-267.
Embrechts, P., 2000. Extreme value theory: Potential and limitations as an integrated risk management tool. Derivatives Use, Trading & Regulation 6, 449-456.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events for Insurance and Finance. Springer, Berlin.
Embrechts, P., Lindskog, F., McNeil, A., 2003. Modelling dependence with copulas and applications to risk management. In: Heavy Tailed Distributions in Finance. Elsevier. In this volume.
Embrechts, P., McNeil, A.J., Straumann, D., 1999. Correlation: Pitfalls and alternatives.
RISK 12 (5), 69-71.
Embrechts, P., McNeil, A.J., Straumann, D., 2001. Correlation and dependence in risk management: Properties and pitfalls. In: Dempster, M., Moffatt, H.K. (Eds.), Risk Management: Value at Risk and Beyond. Cambridge University Press.
Embrechts, P., Resnick, S.I., Samorodnitsky, G., 1999. Extreme value theory as a risk management tool. North American Actuarial Journal 3, 30-41.
Engle, R.F., 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987-1008.
Fama, E.F., 1965. The behavior of stock market prices. Journal of Business 38 (1), 34-105.
Fama, E.F., 1970. Risk, return and equilibrium. Journal of Political Economy 79 (1), 30-55.
Fang, K.T., Kotz, S., Ng, K.W., 1990. Symmetric Multivariate and Related Distributions. Chapman & Hall.
Frank, M., 1979. On the simultaneous associativity of F(x,y) and x + y - F(x,y). Aequationes Mathematicae 19, 194-226.
Frees, E.W., Valdez, E.A., 1998. Understanding relationships using copulas. North American Actuarial Journal 2 (1), 1-25.
Gouriéroux, C., 1997. ARCH Models and Financial Applications. Springer-Verlag.
Hill, B.M., 1975. A simple general approach to inference about the tail of a distribution. The Annals of Statistics 3 (5), 1163-1174.
Huang, C., Litzenberger, R., 1988. Foundations of Financial Economics. North-Holland, New York.
Hull, J., White, A., 1987. The pricing of options on assets with stochastic volatilities. Journal of Finance 2, 281-300.
Ingersoll, J.E., 1987. Theory of Financial Decision Making. Rowman & Littlefield.
Joe, H., 1997. Multivariate Models and Dependence Concepts. Chapman & Hall.
Jorion, P., 2001. Value at Risk: The New Benchmark for Controlling Market Risk, 2nd edition. McGraw-Hill.
Këllezi, E., Gilli, M., 2000. Extreme value theory for tail-related risk measures. Preprint. University of Geneva.
Leadbetter, M.R., Lindgren, G., Rootzén, H., 1983. Extremes and Related Properties of Random Sequences and Processes.
Springer-Verlag.
Levy, H., Kroll, Y., 1978. Ordering uncertain options with borrowing and lending. Journal of Finance 33, 553-573.
Lindskog, F., 2000a. Linear correlation estimation. Preprint. ETH, Zürich.
Lindskog, F., 2000b. Modelling dependence with copulas. Master's thesis. ETH, Zürich.
Lintner, J., 1965. The valuation of risky assets and the selection of risky investment in stock portfolios and capital budgets. Review of Economics and Statistics 47, 13-37.
Mandelbrot, B.B., 1963. The variation of certain speculative prices. Journal of Business 36, 392-417.
Markowitz, H., 1952. Portfolio selection. Journal of Finance 7, 77-91.
Markowitz, H., 1959. Portfolio Selection: Efficient Diversification of Investments. Wiley.
McNeil, A., 1998a. Calculating quantile risk measures for financial return series using extreme value theory. Preprint. ETH, Zürich.
McNeil, A., 1998b. On extremes and crashes. RISK, January.
McNeil, A., Frey, R., 2000. Estimation of tail-related risk measures for heteroscedastic financial time series: An extreme value approach. Journal of Empirical Finance 7, 271-300.
Mikosch, T., Stărică, C., 2000. Change of structure in financial time series, long range dependence and the GARCH model. Preprint.
McNeil, A., Saladin, T., 1997. The peaks over threshold method for estimating high quantiles of loss distributions. In: Proceedings of the 28th International ASTIN Colloquium.
Mills, T.C., 1999. The Econometric Modelling of Financial Time Series. Cambridge University Press.
Mittnik, S., Paolella, M.S., Rachev, T., 1997. Modelling the persistence of conditional volatilities with GARCH-stable processes. Preprint. University of California, Santa Barbara.
Mittnik, S., Rachev, T., Paolella, M.S., 1998. Stable Paretian modelling in finance: Some empirical and theoretical aspects. In: Adler et al. (Eds.), A Practical Guide to Heavy Tails. Birkhäuser.
Nelsen, R.B., 1999. An Introduction to Copulas. Springer, New York.
Nolan, J., 2001. Stable Distribution: Models for Heavy-Tailed Data. Birkhäuser, forthcoming.
Panorska, A.K., Mittnik, S., Rachev, T., 1995. Stable GARCH models for financial time series. Applied Mathematics Letters 8, 33-37.
Rachev, S., Mittnik, S., 2000. Stable Paretian Models in Finance. Wiley.
Reiss, R.-D., Thomas, M., 2001. Statistical Analysis of Extreme Values, 2nd edition. Birkhäuser.
Resnick, S., Stărică, C., 1997. Smoothing the Hill estimator. Advances in Applied Probability, 29.
RiskMetrics, 1996. Technical document. Technical report. JP Morgan.
Ross, S.A., 1976. The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341-360.
Samorodnitsky, G., Taqqu, M., 1994. Stable Non-Gaussian Random Processes. Chapman & Hall.
Sharpe, W.F., 1964. Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance 19, 425-442.
Shiryaev, A.N., 1999. Essentials of Stochastic Finance. World Scientific.
Sklar, A., 1996. Random variables, distribution functions, and copulas - a personal look backward and forward. In: Rüschendorf et al. (Eds.), Distributions with Fixed Marginals and Related Topics. Institute of Mathematical Sciences, Hayward, CA.
Smith, R., 2000. Measuring risk with extreme value theory. In: Embrechts, P. (Ed.), Extremes and Integrated Risk Management. Risk Books, pp. 19-35.
Stahl, G., 1997. Three cheers. RISK 10, 67-69.
Wilson, T.C., 1998. Value at risk. In: Alexander, C. (Ed.), Risk Management and Analysis. Wiley.

Chapter 3
MODELING FINANCIAL DATA WITH STABLE DISTRIBUTIONS
JOHN P. NOLAN
Department of Mathematics and Statistics, American University

Contents
Abstract
1. Basic facts about stable distributions
2. Appropriateness of stable models
3. Computation, simulation, estimation and diagnostics
4. Applications to financial data
5. Multivariate stable distributions
6. Multivariate computation, simulation, estimation and diagnostics
7. Multivariate application
8.
Classes of multivariate stable distributions
9. Operator stable distributions
10. Discussion
References

Handbook of Heavy Tailed Distributions in Finance, Edited by S.T. Rachev
2003 Elsevier Science B.V. All rights reserved

Abstract

Stable distributions are a class of probability distributions that allow heavy tails and skewness. In addition to theoretical reasons for using stable laws, they are a rich family that can accurately model different kinds of financial data. We review the basic facts, describe programs that make it practical to use stable distributions, and give examples of these distributions in finance. A non-technical introduction to multivariate stable laws is also given.

1. Basic facts about stable distributions

Stable distributions are a class of probability laws that have intriguing theoretical and practical properties. Their application to financial modeling comes from the fact that they generalize the normal (Gaussian) distribution and allow heavy tails and skewness, which are frequently seen in financial data. In this chapter, we focus on the basic definition and properties of stable laws, and show how they can be used in practice. We give no proofs; interested readers can find these in Zolotarev (1986), Samorodnitsky and Taqqu (1994), Janicki and Weron (1994), Uchaikin and Zolotarev (1999), Rachev and Mittnik (2000) and Nolan (2003).

The defining characteristic, and the reason for the term stable, is that they retain their shape (up to scale and shift) under addition: if $X, X_1, X_2, \ldots, X_n$ are independent, identically distributed stable random variables, then for every $n$

$$X_1 + X_2 + \cdots + X_n \stackrel{d}{=} c_n X + d_n \tag{1}$$

for some constants $c_n > 0$ and $d_n$. The symbol $\stackrel{d}{=}$ means equality in distribution, i.e., the right- and left-hand sides have the same distribution. The law is called strictly stable if $d_n = 0$ for all $n$.
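As an informal check of (1), the sketch below compares the empirical quartiles of a rescaled sum of standard Cauchy variables with those of a single Cauchy variable. It relies on the standard fact that for the standard Cauchy law the constants in (1) are $c_n = n$ and $d_n = 0$. The helper names (`cauchy_draw`, `sample_quantile`), the seed and the sample sizes are illustrative choices, not part of the chapter.

```python
import math
import random

def cauchy_draw(rng):
    """Draw from the standard Cauchy law by inverting its CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def sample_quantile(values, p):
    """Crude empirical quantile: the order statistic at index p*(n-1)."""
    ordered = sorted(values)
    return ordered[int(p * (len(ordered) - 1))]

rng = random.Random(2003)   # fixed seed so the run is repeatable
n, reps = 5, 40000          # illustrative sizes, not tuned

# Left side of (1): X_1 + ... + X_n, rescaled by c_n = n (with d_n = 0).
rescaled_sums = [sum(cauchy_draw(rng) for _ in range(n)) / n
                 for _ in range(reps)]
# Right side of (1): a single copy of X.
single_draws = [cauchy_draw(rng) for _ in range(reps)]

# The two columns of quantiles should roughly agree.
for p in (0.25, 0.50, 0.75):
    print(p, sample_quantile(rescaled_sums, p), sample_quantile(single_draws, p))
```

The exact quartiles of the standard Cauchy law are -1, 0 and 1, so both columns should be close to those values, up to sampling error.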
Some authors use the term sum stable to emphasize the stability under addition and to distinguish it from other concepts, e.g., max-stable, min-stable, etc. The normal distributions satisfy this property: the sum of independent normals is normal. Likewise the Cauchy laws and the Lévy laws (see below) satisfy this property.

The class of all laws that satisfy (1) is described by four parameters, which we call $(\alpha, \beta, \gamma, \delta)$; see Figure 1 for some density graphs. In general, there are no closed form formulas for stable densities $f$ and cumulative distribution functions $F$, but there are now reliable computer programs for working with these laws. The parameter $\alpha$ is called the index of the law, the index of stability or the characteristic exponent, and must be in the range $0 < \alpha \le 2$. The constant $c_n$ in (1) must be of the form $n^{1/\alpha}$. The parameter $\beta$ is called the skewness of the law, and must be in the range $-1 \le \beta \le 1$. If $\beta = 0$, the distribution is symmetric; if $\beta > 0$, it is skewed toward the right; if $\beta < 0$, it is skewed toward the left. The parameters $\alpha$ and $\beta$ determine the shape of the distribution. The parameter $\gamma$ is a scale parameter; it can be any positive number. The parameter $\delta$ is a location parameter; it shifts the distribution right if $\delta > 0$, and left if $\delta < 0$.

A confusing issue with stable parameters is that there are multiple definitions of what the parameters mean. There are at least 10 different definitions of stable parameters, see Nolan (2003). The reader should be careful in reading the literature and verify what parameterization is being used. We will describe two different parameterizations, which we denote by $S(\alpha,\beta,\gamma,\delta_0;0)$ and $S(\alpha,\beta,\gamma,\delta_1;1)$. The first is what we will use in all our applications, because it has better numerical behavior and a more intuitive meaning. The second parameterization is more commonly used in the literature, so it is important to understand it. The parameters $\alpha$, $\beta$ and $\gamma$ have the same meaning in the two parameterizations; only

Fig. 1.
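Standardized stable densities for different $\alpha$ and $\beta$ in the $S(\alpha,\beta,1,0;0)$ parameterization. The top graph includes a Lévy$(1,-1) = S(1/2,1,1,0;0) = S(1/2,1,1,-1;1)$ graph and the middle graph includes a Cauchy$(1,0) = S(1,0,1,0;0) = S(1,0,1,0;1)$ graph.

the location parameter is different. To distinguish between the two, we will sometimes use a subscript to indicate which parameterization is being used: $\delta_0$ for the location parameter in the $S(\alpha,\beta,\gamma,\delta_0;0)$ parameterization and $\delta_1$ for the location parameter in the $S(\alpha,\beta,\gamma,\delta_1;1)$ parameterization.

Definition 1. A random variable $X$ is $S(\alpha,\beta,\gamma,\delta_0;0)$ if it has characteristic function

$$E\exp(iuX) = \begin{cases} \exp\left(-\gamma^\alpha |u|^\alpha \left[1 + i\beta \tan\frac{\pi\alpha}{2}\,(\operatorname{sign} u)\left(|\gamma u|^{1-\alpha} - 1\right)\right] + i\delta_0 u\right), & \alpha \ne 1, \\ \exp\left(-\gamma |u| \left[1 + i\beta \frac{2}{\pi}(\operatorname{sign} u) \ln(\gamma|u|)\right] + i\delta_0 u\right), & \alpha = 1. \end{cases} \tag{2}$$

Definition 2. A random variable $X$ is $S(\alpha,\beta,\gamma,\delta_1;1)$ if it has characteristic function

$$E\exp(iuX) = \begin{cases} \exp\left(-\gamma^\alpha |u|^\alpha \left[1 - i\beta \tan\frac{\pi\alpha}{2}\,(\operatorname{sign} u)\right] + i\delta_1 u\right), & \alpha \ne 1, \\ \exp\left(-\gamma |u| \left[1 + i\beta \frac{2}{\pi}(\operatorname{sign} u) \ln|u|\right] + i\delta_1 u\right), & \alpha = 1. \end{cases} \tag{3}$$

The location parameters are related by

$$\delta_0 = \begin{cases} \delta_1 + \beta\gamma \tan\frac{\pi\alpha}{2}, & \alpha \ne 1, \\ \delta_1 + \beta\frac{2}{\pi}\gamma \ln\gamma, & \alpha = 1, \end{cases} \qquad \delta_1 = \begin{cases} \delta_0 - \beta\gamma \tan\frac{\pi\alpha}{2}, & \alpha \ne 1, \\ \delta_0 - \beta\frac{2}{\pi}\gamma \ln\gamma, & \alpha = 1. \end{cases} \tag{4}$$

Note that if $\beta = 0$, the parameterizations coincide. When $\alpha \ne 1$ and $\beta \ne 0$, the parameterizations differ by a shift $\beta\gamma\tan\frac{\pi\alpha}{2}$, which gets infinitely large as $\alpha \to 1$. In particular, the mode of a $S(\alpha,\beta,\gamma,\delta_1;1)$ density tends toward $\infty$ (if $\operatorname{sign}(\alpha - 1)\beta > 0$) or $-\infty$ (otherwise) as $\alpha \to 1$. When $\alpha$ is near 1, computing stable densities and cumulatives in this range is numerically difficult and estimating parameters is unreliable. From the applied point of view, it is preferable to use the $S(\alpha,\beta,\gamma,\delta_0;0)$ parameterization, which is jointly continuous in all four parameters. The arguments for using the $S(\alpha,\beta,\gamma,\delta_1;1)$ parameterization are historical precedent and algebraic simplicity. It seems unavoidable that both parameterizations will be used, so users of stable distributions should know both and state clearly which they are using.

There are three cases where one can write down closed form expressions for the density and verify directly that they are stable: the normal, Cauchy and Lévy distributions.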
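The conversion (4) between the two location parameters is easy to mis-apply near $\alpha = 1$, so a small sketch may help. The two function names below are hypothetical helpers introduced only for illustration; they implement (4) directly and are not taken from the STABLE program.

```python
import math

def delta0_from_delta1(alpha, beta, gamma, delta1):
    """Location in the 0-parameterization, from equation (4)."""
    if alpha != 1:
        return delta1 + beta * gamma * math.tan(math.pi * alpha / 2)
    return delta1 + beta * (2 / math.pi) * gamma * math.log(gamma)

def delta1_from_delta0(alpha, beta, gamma, delta0):
    """Location in the 1-parameterization, from equation (4)."""
    if alpha != 1:
        return delta0 - beta * gamma * math.tan(math.pi * alpha / 2)
    return delta0 - beta * (2 / math.pi) * gamma * math.log(gamma)

# The Levy(1, -1) law from the Figure 1 caption: S(1/2, 1, 1, -1; 1)
# should map to a 0-parameterization location of (numerically) zero.
print(delta0_from_delta1(0.5, 1.0, 1.0, -1.0))

# The size of the shift between the parameterizations as alpha -> 1.
for alpha in (0.9, 0.99, 0.999):
    print(alpha, delta0_from_delta1(alpha, 1.0, 1.0, 0.0))
```

The second loop illustrates why the 1-parameterization is awkward numerically: with $\beta = \gamma = 1$ the shift $\tan\frac{\pi\alpha}{2}$ grows without bound as $\alpha$ approaches 1.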
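Example 1 (Normal or Gaussian distributions). $X \sim N(\mu,\sigma^2)$ if it has a density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty.$$

Gaussian laws are stable with $\alpha = 2$ and $\beta = 0$; more precisely $N(\mu,\sigma^2) = S(2,0,\sigma/\sqrt{2},\mu;0) = S(2,0,\sigma/\sqrt{2},\mu;1)$.

Example 2 (Cauchy distributions). $X \sim \text{Cauchy}(\gamma,\delta)$ if it has density

$$f(x) = \frac{1}{\pi}\,\frac{\gamma}{\gamma^2 + (x-\delta)^2}, \quad -\infty < x < \infty.$$

Cauchy laws are stable with $\alpha = 1$ and $\beta = 0$; more precisely, $\text{Cauchy}(\gamma,\delta) = S(1,0,\gamma,\delta;0) = S(1,0,\gamma,\delta;1)$.

Example 3 (Lévy distributions). $X \sim \text{Lévy}(\gamma,\delta)$ if it has density

$$f(x) = \sqrt{\frac{\gamma}{2\pi}}\,\frac{1}{(x-\delta)^{3/2}} \exp\left(-\frac{\gamma}{2(x-\delta)}\right), \quad \delta < x < \infty.$$

These are stable with $\alpha = 1/2$, $\beta = 1$; $\text{Lévy}(\gamma,\delta) = S(1/2,1,\gamma,\gamma+\delta;0) = S(1/2,1,\gamma,\delta;1)$.

The graphs in Figure 1 show several qualitative features of stable laws. First, stable distributions have densities and are unimodal. These facts are not obvious: since there is no general formula for stable densities, indirect arguments must be used and it is quite involved to prove unimodality. Second, the $-\beta$ curve is a reflection of the $\beta$ curve. Third, when $\alpha$ is small, the skewness is significant; when $\alpha$ is large, the skewness parameter matters less and less.

The support of a stable density is either all of $(-\infty,\infty)$ or a half-line. The latter case occurs if and only if $0 < \alpha < 1$ and $\beta = +1$ or $-1$. More precisely, the support of the density $f(x|\alpha,\beta,\gamma,\delta;k)$ for a $S(\alpha,\beta,\gamma,\delta;k)$ law is

$$\begin{cases} [\delta - \gamma\tan\frac{\pi\alpha}{2}, \infty), & \alpha < 1,\ \beta = +1,\ k = 0, \\ (-\infty, \delta + \gamma\tan\frac{\pi\alpha}{2}], & \alpha < 1,\ \beta = -1,\ k = 0, \\ [\delta, \infty), & \alpha < 1,\ \beta = +1,\ k = 1, \\ (-\infty, \delta], & \alpha < 1,\ \beta = -1,\ k = 1, \\ (-\infty, \infty), & \text{otherwise.} \end{cases}$$

In particular, to model a positive distribution, a $S(\alpha,1,\gamma,0;1)$ distribution with $\alpha < 1$ is used.

When $\alpha = 2$, the normal law has light tails and all moments exist. Except for the normal law, all stable laws have heavy tails with an asymptotic power law (Pareto) decay. The term stable Paretian distributions is used to distinguish the $\alpha < 2$ cases from the normal case. For $X \sim S(\alpha,\beta,1,0;0)$ with $0 < \alpha < 2$ and $-1 < \beta \le 1$, as $x \to \infty$,

$$P(X > x) \sim c_\alpha (1+\beta) x^{-\alpha}, \qquad f(x|\alpha,\beta;0) \sim \alpha c_\alpha (1+\beta) x^{-(\alpha+1)},$$

where $c_\alpha = \Gamma(\alpha)\sin(\frac{\pi\alpha}{2})/\pi$. When $\beta = -1$, the right tail decays faster than any power.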
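The tail approximation can be checked against the two heavy-tailed closed form cases above. The sketch below compares the exact upper tail probability of the standard Cauchy law ($\alpha = 1$, $\beta = 0$) and of the Lévy(1, 0) law ($\alpha = 1/2$, $\beta = 1$) with $c_\alpha(1+\beta)x^{-\alpha}$. The function names are illustrative; the exact tail formulas used are the standard ones for these two laws, and a location shift does not change the leading-order tail.

```python
import math

def tail_constant(alpha):
    """c_alpha = Gamma(alpha) * sin(pi * alpha / 2) / pi, for 0 < alpha < 2."""
    return math.gamma(alpha) * math.sin(math.pi * alpha / 2) / math.pi

def pareto_tail(x, alpha, beta):
    """Leading-order approximation to P(X > x) when gamma = 1."""
    return tail_constant(alpha) * (1 + beta) * x ** (-alpha)

def cauchy_sf(x):
    """Exact P(X > x) for the standard Cauchy law, x > 0."""
    return math.atan2(1.0, x) / math.pi

def levy_sf(x):
    """Exact P(X > x) for the Levy(1, 0) law, x > 0."""
    return math.erf(1.0 / math.sqrt(2.0 * x))

# Ratios exact / approximate; both columns should approach 1 as x grows.
for x in (10.0, 100.0, 1000.0):
    print(x,
          cauchy_sf(x) / pareto_tail(x, 1.0, 0.0),
          levy_sf(x) / pareto_tail(x, 0.5, 1.0))
```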
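The left tail behavior is similar by the symmetry property mentioned above.

One consequence of these heavy tails is that only certain moments exist. This is not a property restricted to stable laws: any distribution with power law decay will not have certain moments. When $\alpha < 2$, it can be shown that the variance does not exist, and when $\alpha \le 1$, the mean does not exist. If we use fractional moments, then the $p$-th absolute moment

$$E|X|^p = \int_{-\infty}^{\infty} |x|^p f(x)\,dx$$

exists if and only if $p < \alpha$. We stress that this is a population moment, and by definition it is finite when the integral just above converges. If the tails are too heavy, the integral will diverge. In contrast, the sample moments of all orders will exist: one can always compute the variance of a sample. The problem is that it does not tell you much about stable laws, because the sample variance does not converge to a well-defined population moment (unless $\alpha = 2$).

If $X, X_1, X_2$ are i.i.d. stable, then for any $a, b > 0$,

$$aX_1 + bX_2 \stackrel{d}{=} cX + d,$$

for some $c > 0$, $-\infty < d < \infty$. This condition is equivalent to (1) and can be taken as a definition of stability. More generally, linear combinations of independent stable laws with the same $\alpha$ are stable: if $X_j \sim S(\alpha,\beta_j,\gamma_j,\delta_j;k)$ for $j = 1,\ldots,n$ are independent, then

$$a_1X_1 + a_2X_2 + \cdots + a_nX_n \sim S(\alpha,\beta,\gamma,\delta;k), \tag{5}$$

where

$$\beta = \frac{\sum_{j=1}^n \beta_j (\operatorname{sign} a_j)|a_j\gamma_j|^\alpha}{\sum_{j=1}^n |a_j\gamma_j|^\alpha}, \qquad \gamma^\alpha = \sum_{j=1}^n |a_j\gamma_j|^\alpha,$$

and

$$\delta = \begin{cases} \sum_j a_j\delta_j + \tan\frac{\pi\alpha}{2}\left[\beta\gamma - \sum_j \beta_j a_j\gamma_j\right], & k = 0,\ \alpha \ne 1, \\ \sum_j a_j\delta_j + \frac{2}{\pi}\left[\beta\gamma\ln\gamma - \sum_j \beta_j a_j\gamma_j \ln|a_j\gamma_j|\right], & k = 0,\ \alpha = 1, \\ \sum_j a_j\delta_j, & k = 1,\ \alpha \ne 1, \\ \sum_j a_j\delta_j - \frac{2}{\pi}\sum_j \beta_j a_j\gamma_j \ln|a_j|, & k = 1,\ \alpha = 1. \end{cases}$$

The location formula is written here so that it agrees with (3) and (4): the $k = 1$ cases follow from multiplying the characteristic functions (3), and the $k = 0$ cases then follow from the conversion (4). This is a generalization of (1): it allows different skewness, scales and locations in the terms. It is essential that all the $\alpha$s are the same: adding two stable random variables with different $\alpha$s does not give a stable law.

2. Appropriateness of stable models

Stable distributions have been proposed as a model for many types of physical and economic systems. There are several reasons for using a stable distribution to describe a system.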
The first is where there are solid theoretical reasons for expecting a non-Gaussian stable model, e.g., reflection off a rotating mirror yielding a Cauchy distribution, hitting times for a Brownian motion yielding a Lévy distribution, the gravitational field of stars yielding the Holtsmark distribution; see Feller (1975) and Uchaikin and Zolotarev (1999) for these and other examples. The second reason is the Generalized Central Limit Theorem, see below, which states that the only possible non-trivial limit of normalized sums of independent identically distributed terms is stable. It is argued that some observed quantities are the sum of many small terms, e.g., the price of a stock, and hence a stable model should be used to describe such systems. The third argument for modeling with stable distributions is empirical: many large data sets exhibit heavy tails and skewness. The strong empirical evidence for these features combined with the Generalized Central Limit Theorem is used to justify the use of stable models. Examples in finance and economics are given in Mandelbrot (1963), Fama (1965), Embrechts, Klüppelberg and Mikosch (1997), and Rachev and Mittnik (2000). Such data sets are poorly described by a Gaussian model; some can be well described by a stable distribution.

The classical Central Limit Theorem says that the normalized sum of independent, identically distributed terms with a finite variance converges to a normal distribution. The Generalized Central Limit Theorem shows that if the finite variance assumption is dropped, the only possible resulting limits are stable. Let $X_1, X_2, X_3, \ldots$ be a sequence of independent, identically distributed random variables. There exist constants $a_n > 0$, $b_n$ and a non-degenerate random variable $Z$ with

$$a_n(X_1 + \cdots + X_n) - b_n \xrightarrow{d} Z \tag{6}$$

if and only if $Z$ is stable. A random variable $X$ is in the domain of attraction of $Z$ if there exist constants $a_n > 0$, $b_n$ such that (6) holds when $X_1, X_2, X_3, \ldots$
are independent identically distributed copies of $X$. The Generalized Central Limit Theorem says that the only possible distributions with a domain of attraction are stable. Characterizations of distributions in the domain of attraction of a stable law are in terms of tail probabilities. The simplest is: if $X$ is a random variable with $x^\alpha P(|X| > x) \to c > 0$ for some $0 < \alpha < 2$ as $x \to \infty$, then $X$ is in the domain of attraction of an $\alpha$-stable law.

Even if we accept that large data sets have heavy tails, is it ever reasonable to use a stable model? One of the arguments against using stable models is that they have infinite variance, which is inappropriate for real data that have bounded range. However, bounded data are routinely modeled by normal distributions, which have infinite support. The only justification for this is that the normal distribution gives a usable description of the shape of the distribution, even though it is clearly inappropriate on the tails for any problem with naturally bounded data. The same justification can be used for stable models: does a stable fit give an accurate description of the shape of the distribution? The variance is one measure of spread; the scale $\gamma$ in a stable model is another. Perhaps practitioners are so used to using the variance as the measure of spread that they automatically retreat from models without a variance. The parameters $\delta$ and $\gamma$ can play the roles of location and scale usually played by the mean and the variance. For the normal distribution, the first and second moments completely specify the distribution; for most distributions they do not.

We propose that the practitioner approach this dispute as an agnostic. The fact is that until recently we have not really been able to compare data sets to a proposed stable model. The next section shows that estimation of all four stable parameters is feasible and that there are methods to assess whether a stable model accurately describes the data.
In some cases there are solid theoretical reasons for believing that a stable model is appropriate; in other cases we will be pragmatic: if a stable distribution describes the data accurately and parsimoniously with four parameters, then we accept it as a model for the observed data.

3. Computation, simulation, estimation and diagnostics

Until recently, it was difficult to use stable laws in practical problems because of computational difficulties. Most of these difficulties have been resolved by the program STABLE (see footnote 1), which can compute stable densities, cumulative distribution functions and quantiles. The basic methods used in the program are described in Nolan (1997). Later improvements to the program include incorporating the Chambers, Mallows and Stuck (1976) method of simulating stable random variables, improved accuracy in the calculations, and estimation of stable parameters from data sets. Except for $\alpha$ close to 0, it is now possible to quickly and accurately work with stable distributions. We will not discuss details of these programs here, but will focus on the practical problems of estimation and assessing goodness of fit.

The basic estimation problem for stable laws is to estimate the four parameters $(\alpha,\beta,\gamma,\delta)$ from an i.i.d. sample $X_1, X_2, \ldots, X_n$. Because of numerical problems with the 1-parameterization, we will always use the 0-parameterization in estimation. If desired, the parameter $\delta_1$ can be estimated by using (4). There are several methods available for this basic estimation problem: the quantile method of McCulloch (1986), the fractional moment method of Ma and Nikias (1995), the sample characteristic function (SCF) method of Kogon and Williams (1998) based on ideas of Koutrouvelis, and maximum likelihood (ML) estimation of DuMouchel (1971) and Nolan (2001).
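To make the simulation step mentioned above concrete, here is a minimal sketch of the symmetric ($\beta = 0$) case of the formula usually attributed to Chambers, Mallows and Stuck (1976). It is an illustration only, not the STABLE program's implementation; the function name and the seed are assumptions. Since $\beta = 0$, the 0- and 1-parameterizations coincide, and the draws have characteristic function $\exp(-|u|^\alpha)$.

```python
import math
import random

def symmetric_stable_draw(alpha, rng):
    """One draw from S(alpha, 0, 1, 0), for 0 < alpha <= 2."""
    u = math.pi * (rng.random() - 0.5)   # uniform angle on (-pi/2, pi/2)
    w = rng.expovariate(1.0)             # unit exponential
    while w == 0.0:                      # guard against a zero divisor
        w = rng.expovariate(1.0)
    if alpha == 1:
        return math.tan(u)               # Cauchy case
    return (math.sin(alpha * u) / math.cos(u) ** (1 / alpha)
            * (math.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

rng = random.Random(1976)                # fixed seed for repeatability
draws = [symmetric_stable_draw(1.5, rng) for _ in range(10000)]
ordered = sorted(draws)
# The sample median should be near 0; the extremes show the heavy tails.
print(ordered[len(ordered) // 2], ordered[0], ordered[-1])
```

With $\alpha = 2$ the same formula reduces to a normal draw with variance 2, in agreement with $N(\mu,\sigma^2) = S(2,0,\sigma/\sqrt{2},\mu;0)$ from Example 1.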
These methods have been compared in a large simulation study by Ojeda (2001), who found that the ML estimates are almost always the most accurate, with the SCF estimates next best, followed by the quantile method, and finally the moment method. The ML method has the added advantage that one can give large sample confidence intervals for the parameters, based on numerical computations of the Fisher information matrix.

Perhaps just as important as methods of estimation are diagnostics for assessing the fit. While a Kolmogorov-Smirnov goodness-of-fit test statistic can be computed, giving a correct significance level to such a test when comparing a data set to a fitted distribution is an involved problem. However, one can adapt standard exploratory data analysis graphical techniques to informally evaluate the closeness of a stable fit. We have found that comparing smoothed data density plots to a proposed fit gives a good sense of how good the fit is near the center of the data. P-P plots allow a comparison over the range of the data. For technical reasons we recommend the "variance stabilized" P-P plot of Michael (1983). We found Q-Q plots less satisfactory for comparing heavy tailed data to a proposed fit. One reason for this is visual: by definition a heavy tailed data set will have many more extreme values than a typical sample from a finite variance population. This forces a Q-Q plot to be visually compressed, with a few extreme values dominating the plot. Also, the heavy tails imply that the extreme order statistics will have a lot of variability, and hence deviations from an ideal straight line Q-Q plot are hard to assess. The next section shows some examples of these techniques on financial data; more examples can be found in Nolan (1999, 2001).

Footnote 1: The program STABLE is available at www.mathstat.american.edu by following the "Faculty" link to the author's homepage.

There are methods for more complicated estimation problems involving stable laws.
For example, regression models with stable residuals have been described by McCulloch (1998) for the symmetric stable case and Ojeda (2001) for the general case. The problem of analyzing time series with stable noise is discussed in Section II of Adler, Feldman and Taqqu (1998), in Nikias and Shao (1995), and in Rachev and Mittnik (2000). McCulloch (1996) and Rachev and Mittnik (2000) give methods of pricing options under stable models.

4. Applications to financial data

The first example we consider is the British Pound vs. German Mark exchange rate. The data set has daily exchange rates for the 16 year period from 2 January 1980 to 21 May 1996. The log of the ratio of successive exchange rates was computed as $y_t = \ln(x_{t+1}/x_t)$, yielding 4,274 $y_t$ values. The ML parameter estimates with 95% confidence intervals are $1.495 \pm 0.047$ for $\alpha$, $-0.182 \pm 0.085$ for $\beta$, $0.00244 \pm 0.00008$ for $\gamma$ and $0.00019 \pm 0.00013$ for $\delta_0$. Figure 2 shows a P-P plot and density for the data vs. the stable fit. The third curve in the density plot is the normal/Gaussian fit to the data.

The next example is another exchange rate, this time from a developing country. This data set consists of monthly exchange rates between the US Dollar and the Tanzanian Shilling, from January 1975 to September 1997. The logs of the ratios of successive exchange rates were computed as above for this monthly data, giving a data set with $n = 213$ points. The ML parameter estimates with 95% confidence intervals are $1.088 \pm 0.185$ for $\alpha$, $0.112 \pm 0.251$ for $\beta$, $0.0300 \pm 0.0055$ for $\gamma$ and $0.00501 \pm 0.00621$ for $\delta_0$. The more extreme fluctuations of the Tanzanian Shilling exchange rate show up in the smaller estimate of $\alpha$ and in the larger estimate of $\gamma$. Figure 3 shows the diagnostics, with the third curve again showing a normal/Gaussian fit.

The third example is from the stock market. McCulloch (1997) analyzed 40 years of monthly stock price data from the Center for Research in Security Prices (CRSP).
The data set is 480 values of the CRSP value-weighted stock index, including dividends and adjusted for inflation. The ML estimates with 95% confidence intervals are $1.855 \pm 0.110$ for $\alpha$, $-0.558 \pm 0.615$ for $\beta$, $2.711 \pm 0.213$ for $\gamma$, and $0.871 \pm 0.424$ for $\delta_0$. Figure 4 shows the goodness of fit.

Stable distributions may be a useful tool in Value at Risk (VaR) calculations. The goal of VaR calculations is to assess the risk in an asset by estimating population quantiles. Stable distributions have two advantages over normal distributions: they can explicitly model both the heavier tails and the asymmetry that are frequently found in financial data. Sometimes the normal distribution can give reasonable VaR estimates, because the sample variance is inflated by the extreme values in the sample. If one is lucky, the poorly fitting normal distribution may approximate certain quantiles well, at the cost of poorly approximating other quantiles. Additionally, some practitioners compensate for the heavy tail behavior by "adjusting" a normal quantile estimate by some empirical factor. If a stable distribution gives a more accurate fit to the sample, then it is more likely to accurately predict the VaR values. In order to compare different fits, a plot like Figure 5 can be useful. It uses the Deutsche Mark exchange rate data (log ratios of successive values) described above.

Fig. 2. P-P plot and density plot for Pound vs. Mark exchange rate data. On the density plot, the dotted curve is the smoothed data, the solid curve is the stable fit, the dashed curve is a normal fit.

Fig. 3. P-P plot and density plot for the Tanzanian Shilling/US Dollar exchange rate.

Fig. 4. P-P plot and densities for the CRSP stock price data.

Fig. 5. VaR comparison of quantiles for the Deutsche Mark exchange rate data (circles), quantiles predicted by the stable fit (solid line), and quantiles predicted by the normal distribution (dotted line).

5.
Multivariate stable distributions

This section is about $d$-dimensional stable laws. Such random vectors will be denoted by $X = (X_1,\ldots,X_d)$. The definition of stability is the same as in (1): for i.i.d. $X, X_1, X_2, \ldots$,

$$X_1 + X_2 + \cdots + X_n \stackrel{d}{=} a_nX + b_n, \tag{7}$$

for some $a_n > 0$ and some vector $b_n \in \mathbb{R}^d$. As in one dimension, an equivalent definition is that $aX_1 + bX_2 \stackrel{d}{=} cX + d$ for all $a, b > 0$. If $X$ is a stable random vector, then every one-dimensional projection $\langle u, X\rangle = \sum_i u_iX_i$ is a one-dimensional stable random variable with the same index $\alpha$ for every $u$. The phrase "jointly stable" is sometimes used to stress the fact that the definition forces all the components $X_j$ to be univariate $\alpha$-stable with one $\alpha$. Conversely, suppose $X$ is a random vector with the property that every one-dimensional projection $\langle u, X\rangle$ is one-dimensional stable, e.g., $\langle u, X\rangle \sim S(\alpha(u),\beta(u),\gamma(u),\delta(u);1)$. Then there is one $\alpha$ that is the index of all projections, i.e., $\alpha(u) = \alpha$ is constant. If $\alpha \ge 1$, then $X$ is stable. If $\alpha < 1$ and the location parameter function $\delta(u)$ and the vector of location parameters $\delta = (\delta_1,\delta_2,\ldots,\delta_d)$ of the components $X_1, X_2, \ldots, X_d$ (all in the 1-parameterization) are related by

$$\delta(u) = \langle u, \delta\rangle, \tag{8}$$

then $X$ is stable. The point here is that we have a way of determining joint stability in terms of univariate stability and, when $\alpha < 1$, Equation (8). We note that (8) holds automatically when $\alpha > 1$, so the condition is only required when $\alpha < 1$. Furthermore, (8) is necessary when $\alpha \ne 1$, so it cannot be dropped. There are examples, e.g., Section 2.2 of Samorodnitsky and Taqqu (1994), where $\alpha < 1$ and all one-dimensional projections are stable, but (8) fails and $X$ is not jointly stable.

One way of parameterizing multivariate stable distributions is to use the above results about one-dimensional projections. For any vector $u \in \mathbb{R}^d$,

$$\langle u, X\rangle \sim S(\alpha,\beta(u),\gamma(u),\delta(u);k), \quad k = 0, 1.$$

Thus we know the (univariate) characteristic function of $\langle u, X\rangle$ for every $u$, and hence the joint characteristic function of $X$.
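Therefore $\alpha$ and the functions $\beta(\cdot)$, $\gamma(\cdot)$ and $\delta(\cdot)$ completely characterize the joint distribution. In fact, knowing these functions on the sphere $S_d = \{u \in \mathbb{R}^d: |u| = 1\}$ characterizes the distribution. The functions $\beta(\cdot)$, $\gamma(\cdot)$ and $\delta(\cdot)$ must satisfy certain regularity conditions.

The standard way of describing multivariate stable distributions is in terms of a finite measure $\Lambda$ on $S_d$, called the spectral measure. Let $X = (X_1,\ldots,X_d)$ be jointly stable, say

$$\langle u, X\rangle \sim S(\alpha,\beta(u),\gamma(u),\delta(u);k), \quad k = 0, 1.$$

Then there exists a finite measure $\Lambda$ on $S_d$ and a location vector $\delta \in \mathbb{R}^d$ with

$$\gamma(u) = \left(\int_{S_d} |\langle u, s\rangle|^\alpha\,\Lambda(ds)\right)^{1/\alpha}, \qquad \beta(u) = \frac{\int_{S_d} |\langle u, s\rangle|^\alpha \operatorname{sign}\langle u, s\rangle\,\Lambda(ds)}{\int_{S_d} |\langle u, s\rangle|^\alpha\,\Lambda(ds)}, \tag{9}$$

$$\delta(u) = \begin{cases} \langle u, \delta\rangle, & k = 1,\ \alpha \ne 1, \\ \langle u, \delta\rangle - \frac{2}{\pi}\int_{S_d} \langle u, s\rangle \ln|\langle u, s\rangle|\,\Lambda(ds), & k = 1,\ \alpha = 1, \\ \langle u, \delta\rangle + \tan\frac{\pi\alpha}{2}\,\beta(u)\gamma(u), & k = 0,\ \alpha \ne 1, \\ \langle u, \delta\rangle - \frac{2}{\pi}\int_{S_d} \langle u, s\rangle \ln|\langle u, s\rangle|\,\Lambda(ds) + \frac{2}{\pi}\beta(u)\gamma(u)\ln\gamma(u), & k = 0,\ \alpha = 1. \end{cases}$$

Thus another way to parameterize is $X \sim S(\alpha,\Lambda,\delta;k)$, $k = 0, 1$. If one knows $\Lambda$, then the above equations specify the parameter functions $\beta(\cdot)$, $\gamma(\cdot)$ and $\delta(\cdot)$. Going the other direction is more difficult. If one recognizes a certain form for the parameter functions, then one can specify the spectral measure. In the general case, one can numerically invert the map $\Lambda \to (\beta(\cdot),\gamma(\cdot),\delta(\cdot))$ to get a discrete approximation to $\Lambda$.

It is possible for $X$ to be non-degenerate, but singular. For example, $X = (X_1, 0)$ is formally a two-dimensional stable distribution if $X_1$ is univariate stable, but it is supported on a one-dimensional subspace. In what follows, we will always assume that $X$ is non-singular, that is, it has a density on $\mathbb{R}^d$. It can be shown that the following are equivalent: (i) $X$ is non-singular, (ii) $\gamma(u) > 0$ for all non-zero $u \in \mathbb{R}^d$, and (iii) $\operatorname{span}\operatorname{support}(\Lambda) = \mathbb{R}^d$.

For $\alpha \ge 1$, the support of a non-singular stable $X$ is all of $\mathbb{R}^d$. When $\alpha < 1$, it can be all of $\mathbb{R}^d$ or a cone, depending on the spectral measure. For $A$ a subset of $\mathbb{R}^d$, define

$$\mathrm{CCH}(A) = \text{closed convex hull of } A = \text{closure of } \{x = a_1b_1 + \cdots + a_nb_n \in \mathbb{R}^d : a_1,\ldots,a_n \in A,\ b_1,\ldots,b_n \ge 0\}.$$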
Note that we only take positive linear combinations of elements of $A$, so this is not generally the closed span of $A$. The translate of a cone is denoted by $\mathrm{CCH}(A) + \delta = \{x + \delta : x \in \mathrm{CCH}(A)\}$. Then the support of $X \sim S(\alpha,\Lambda,\delta;1)$ is

$$\operatorname{support} X = \begin{cases} \mathrm{CCH}(\operatorname{support}(\Lambda)) + \delta, & \alpha < 1, \\ \mathbb{R}^d, & \alpha \ge 1. \end{cases}$$

For example, in the two-dimensional case, if the spectral measure is supported in the first quadrant, $\alpha < 1$, and $\delta = 0$, then the support of the corresponding stable distribution is contained in the first quadrant, i.e., both components are positive.

The tail behavior of $X$ is easiest to describe in terms of the spectral measure. It is best stated in polar form: let $A \subset S_d$, then

$$\lim_{r \to \infty} \frac{P(X \in \mathrm{CCH}(A),\ |X| > r)}{P(|X| > r)} = \frac{\Lambda(A)}{\Lambda(S_d)}.$$

The tail behavior of the densities is more intricate. In the radially symmetric case, $f(x) \sim c|x|^{-(d+\alpha)}$ as $|x| \to \infty$. In other cases, the tails can behave very differently in different directions. For example, in the bivariate independent case, the joint density factors as $f(x_1,x_2) = f_1(x_1)f_2(x_2)$. The one-dimensional results above show $f(x,0) \sim c_1x^{-(1+\alpha)}$ along the $x$-axis, but $f(x,x) \sim c_2x^{-2(1+\alpha)}$ along the diagonal. The general case is complicated, depending on the nature (discrete, continuous) and spread of the spectral measure.

Fig. 6. Density surface and level curves for "triangle" example of a bivariate stable law.

Fig. 7. Contour plots for bivariate stable densities with independent $S(\alpha,\beta,1,0;1)$ components. The plots show $\alpha = 0.6$, $\beta = 0$ in upper left, $\alpha = 0.6$, $\beta = 1$ in upper right, $\alpha = 1.6$, $\beta = 0$ in lower left, and $\alpha = 1.6$, $\beta = 1$ in lower right.

We now give some examples of bivariate stable densities; see the next section for information on their computation. In all cases, the shift vector $\delta = 0$.

Example 4. The first example uses $\alpha = 1.2$ and a discrete spectral measure with three unit point masses, distributed on the unit circle at angles $\pi/3$, $\pi$ and $-\pi/3$. A plot of the density surface and level curves is given in Figure 6.
The triangular spread of the spectral measure shows up in the triangular shape of the level curves. The contour plot reveals more about the shape of the surface, so the following examples will show only the contour plots.

Example 5. Figure 7 shows the contour plots of the independent components cases when $\alpha = 0.6, 1.6$ and $\beta = 0, 1$. Note that the upper right graph has $\alpha < 1$ and is supported in the first quadrant.

Example 6. Figure 8 shows a mix of different contours, mostly to show the range of possibilities. The upper left plot shows an elliptically contoured stable distribution with $\alpha = 1.5$ and "covariation matrix"

$$R = \begin{pmatrix} 1.0 & 0.7 \\ 0.7 & 1.0 \end{pmatrix}.$$

The upper right plot shows an $\alpha = 0.8$ stable distribution with discrete spectral measure having point masses at angles $-\pi/9$, $\pi/6$, $\pi/3$, $\pi/2$ and uniform weight $\lambda_j = 0.3$. The lower left plot uses $\alpha = 0.7$ with a discrete spectral measure with point masses at angles $\pi/9$, $4\pi/9$, $10\pi/9$, $13\pi/9$ of weight 0.75, 1, 0.25, 1. The lower right plot uses the same discrete spectral measure as the lower left, but with $\alpha = 1.5$.

Fig. 8. Contours of miscellaneous bivariate stable distributions.

There are some general statements that can be made about the qualitative behavior of multivariate stable densities. For fixed $\alpha$, central behavior is determined by the overall spread of the spectral measure: if the spectral mass is highly concentrated, the density is close to singular, with large values near the center; if the spectral mass is more evenly spread around the sphere, the density is less peaked. On the tails, behavior is determined by the exact distribution of the spectral measure, with the contour lines bulging out in directions where the spectral measure is concentrated. This tail effect is more pronounced for small values of $\alpha$, where distributions can be highly skewed, and becomes less pronounced as $\alpha$ approaches 2, where contours are all rounded into ellipses.

6.
Multivariate computation, simulation, estimation and diagnostics

The computational problems are challenging, and not solved for general multivariate stable distributions. The problems are caused both by the usual difficulties of working in $d$ dimensions and by the complexity of the possible distributions: spectral measures are an uncountable set of "parameters". The graphs above were computed by the program MVSTABLE (available at the same web-site noted above), which only works in 2 dimensions and has limited accuracy. Density calculations are based on either numerically inverting the characteristic function as described in Nolan and Rajput (1995) or numerically implementing the symmetric formulas in Abdul-Hamid and Nolan (1998).

One class of accessible models is when the spectral measure is discrete with a finite number of point masses:
$$\Lambda(\cdot) = \sum_{j=1}^{n} \lambda_j\, 1_{\{\cdot\}}(s_j). \tag{10}$$
This class is dense in the space of all stable distributions: given an arbitrary spectral measure $\Lambda_1$, there is a concrete formula for $n$ and a discrete spectral measure $\Lambda_2$ such that the densities of the corresponding stable distributions are uniformly close on $\mathbb{R}^d$. In the case of a discrete spectral measure, the parameter functions $\beta(\cdot)$, $\gamma(\cdot)$ and $\delta(\cdot)$ are computed as finite sums, rather than $(d-1)$-dimensional integrals, which makes all computations easier. It also makes simulation simple in an arbitrary dimension: $X \sim S(\alpha, \Lambda, \delta; k)$, where $\Lambda$ is given by (10), can be simulated by the vector sum
$$X \stackrel{d}{=} \sum_{j=1}^{n} \lambda_j^{1/\alpha} Z_j s_j + \delta,$$
where $Z_1, \ldots, Z_n$ are i.i.d. univariate $S(\alpha, 1, 1, 0; k)$ random variables.

Another example where computations are more accessible is the elliptically contoured, or sub-Gaussian, stable distributions described in Section 8. Such densities are easier to compute and simulation is straightforward. Certain sub-stable distributions are also easy to simulate: if $\alpha < \alpha_1$, $X$ is strictly $\alpha_1$-stable and $A$ is positive $(\alpha/\alpha_1)$-stable, then $A^{1/\alpha_1} X$ is $\alpha$-stable.
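The vector-sum representation for a discrete spectral measure can be sketched in a few lines of Python. This is a minimal illustration and not the MVSTABLE code: it assumes $\alpha \neq 1$ and the 1-parameterization, and it draws the univariate $S(\alpha, 1, 1, 0; 1)$ variables with the Chambers-Mallows-Stuck formula, which is a choice made here rather than something prescribed by the text. The function names `skewed_stable` and `simulate_discrete` are hypothetical.

```python
import numpy as np

def skewed_stable(alpha, size, rng):
    """Draw S(alpha, 1, 1, 0; 1) variates (beta = 1, unit scale), alpha != 1,
    using the Chambers-Mallows-Stuck formula (assumed sampler)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit exponential
    t = np.tan(np.pi * alpha / 2)                  # beta * tan(pi*alpha/2), beta = 1
    b = np.arctan(t) / alpha
    s = (1 + t ** 2) ** (1 / (2 * alpha))
    return (s * np.sin(alpha * (v + b)) / np.cos(v) ** (1 / alpha)
            * (np.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))

def simulate_discrete(alpha, lam, s, delta, n, rng):
    """Simulate n vectors X = sum_j lam_j^(1/alpha) Z_j s_j + delta.

    lam   : (m,) point-mass weights of the discrete spectral measure
    s     : (m, d) unit vectors carrying the point masses
    delta : (d,) shift vector
    """
    lam = np.asarray(lam, dtype=float)
    s = np.asarray(s, dtype=float)
    z = skewed_stable(alpha, (n, lam.size), rng)   # i.i.d. Z_j, one column per mass
    return (z * lam ** (1 / alpha)) @ s + np.asarray(delta, dtype=float)

# Usage: alpha < 1, point masses in the first quadrant, zero shift.
rng = np.random.default_rng(0)
angles = np.array([np.pi / 6, np.pi / 3])
rays = np.column_stack([np.cos(angles), np.sin(angles)])
x = simulate_discrete(0.6, [1.0, 1.0], rays, [0.0, 0.0], 1000, rng)
```

With $\alpha < 1$, zero shift and all point masses in the first quadrant, every simulated vector lies in the first quadrant, in line with the support result stated earlier.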
Since sums and shifts of multivariate stables are also multivariate stable, one can combine these different classes to simulate a large class of multivariate stable laws. There are several methods of estimating for multivariate stable distributions. If you know the distribution is isotropic (radially symmetric), then Problem 4, p. 44 of Nikias and Shao (1995) gives a way to estimate and then the constant scale function/uniform spectral measure from fractional moments. In general one should let the data speak for itself, and see if the spectral measure is constant. The general techniques involve some estimate of and some estimate of the spectral measure ^ = m k=1 k1{}(sk), sk Sd. Rachev and Xin (1993) and Cheng and Rachev (1995) use the fact that the directional tail behavior of multivariate stable distributions is Pareto, and base an estimate of on this. Nolan, Panorska and McCulloch (2001) define two other estimates of , one based on the joint empirical/sample ch. f. and one based on the one-dimensional projections of the data. Using the fact that one-dimensional projections are univariate stable gives a way of assessing whether a multivariate data set is stable by looking at just one-dimensional projections of the data. Fit projections in multiple directions using the univariate techniques described above, and see if they are well described by a univariate stable fit. If so, and if the 's are the same for every direction (and if < 1, the location parameters satisfy (8)), then a multivariate stable model is appropriate. We will illustrate this in examples below. For the purposes of comparing two multivariate stable distributions, the parameter functions (,(u), (u),(u)) are more useful than itself. This is because the distribution of X depends more on how distributes mass around the sphere than exactly on the measure. 
Two spectral measures can be far away in the traditional total variation norm (e.g., one can be discrete and the other continuous), but their corresponding parameter functions and densities can be very close. The diagnostics suggested for assessing stability of a multivariate data set are: * Project the data in a variety of directions u and use the univariate diagnostics described in Section 3 on each of those distributions. Bad fits in any direction indicate that the data is not stable. * For each direction u, estimate the parameter functions (u), (u), (u), (u) by ML estimation. The plot of (u) should be a constant, significant departures from this indicate that the data has different decay rates in different directions. (Note that (t) will be a constant iff the distribution is isotropic.) * Assess the goodness-of-fit by computing a discrete ^ by one of the methods above. Substitute the discrete ^ in (9) to compute parameter functions. If it differs from the one obtained above by projection, then either the data is not jointly stable, or not enough points were chosen in the discrete spectral measure approximation. These techniques are illustrated in the next section. Ch. 3: Modeling Financial Data 123 Fig. 9. Projection diagnostics for the German Mark and Japanese Yen exchange rates. 124 J.P. Nolan 7. Multivariate application Here we will examine the joint distribution of the German Mark and the Japanese Yen. The data set is the one described above in the univariate example. We are interested in both assessing whether the joint distribution is bivariate stable and in estimating the fit. Figure 9 shows a sequence of smoothed density, q­q plot and variance stabilized p­p plot for projections in 8 different directions: /2, /3, /4, /6, 0, -/6, -/4, -/3. (We restrict to the right half plane because projections in the left half plane are reflections of those in the right half plane.) These projections are similar to Figure 2, in fact the fifth Fig. 10. 
Estimation results for the German Mark and Japanese Yen exchange rates. Ch. 3: Modeling Financial Data 125 row of Figure 9 is exactly the same as Figure 2. Except on the extreme tails, the stable fit does a good job of describing the data. The projection functions (t), (t), (t), and (t) were estimated and used to compute an estimate of the spectral measure using the projection method. The results are shown in Figure 10. It shows a discrete estimate of the spectral measure (with m = 100 evenly spaced point masses) in polar form, a cumulative plot of the spectral measure in rectangular form, and then four plots for the parameter estimates ((t),(t), (t),(t)). Also on the (t) plot is a horizontal line showing the average value of all the estimated indices which is taken as the estimate of the common that should come from a jointly stable distribution. The plots of (t) and (t) also show the skewness and scale functions computed from the estimated spectral measure substituted into (9). These curves, which are based on a joint estimate of the spectral measure, are indistinguishable from the direct, separate estimates of the directional parameters. The fitted spectral measure was used to plot the fitted bivariate density shown in Figure 11. The spread of the spectral measure is spiky, and masks a pattern that is more obvious in the density surface: the approximate elliptical contours of the fitted density. This suggests modeling the data by a sub-Gaussian stable distribution, a topic discussed in the next section. Some comments on these plots. The polar plots of the spectral measure show a unit circle and lines connecting the points (j ,rj ), where j = 2(j -1)/m and rj = 1 +(j /max), where max = maxj . The polar plots are spiky, because we are estimating a discrete object. What should be looked at is the overall spread of mass, not specific spikes in the plot. 
In cases where the spectral measure is really smooth, it may be appropriate to smooth these plots out to better show iťs true nature. In cases where the measure is discrete, i.e., the independent case, then one wants to emphasize the spikes. So there is no satisfactory general solution and we just plot the raw data. Finally, most graphing programs will set vertical scale so that the graph fills the graph. This emphasizes minor fluctuations in the data that are not of practical significance. In the graphs below, the vertical scales for the parameter functions (t), (t), (t) are respectively [0,2], [-1,1], and [0,1.2 × max (t)]. These bounds show how the functions vary Fig. 11. Estimated density surface and level curves for a bivariate stable fit to the German Mark and Japanese Yen exchange rates. 126 J.P. Nolan over their possible range. For (t), we used the bounds [-1.2×max|(t)|,1.2×max|(t)|], which visually exaggerates the changes in (t). A scale that depends on max (t) may be more appropriate. 8. Classes of multivariate stable distributions There may be cases where we believe that a multivariate sample has certain structure. If so, we can fit a stable model that takes this into account. This may give a more parsimonious fit to the model, especially if the data set is high dimensional. Below we fill focus on elliptically contoured distributions and see that it is computationally accessible. The idea here is to estimate an and a matrix R so that the scale function is closely approximated by (u) = (uRu)/2. The principle can be generalized to other special classes of distributions. Given some parametric model for the scale function (), one can fit parameters, or use a nonparametric model (smoothing or loess) for the scale. Or, one can assume a special form of the spectral measure (), which determines the scale function (). The methods of estimation described above do this implicitly, by assuming is discrete as in (10). This can be adapted in many ways. 
If we assume the components of the data are independent, then we can only allow point masses at "poles", i.e., where the coordinate axes intersect the sphere. If we assume the spectral measure is concentrated on some smaller region, then one can allow point masses only in that region. If we assume the spectral measure is continuous, then one can use some particular model for its density, say as a sum of terms like (ds) = n k=1 k(s)ds, where the density terms k() in the sum have some accessible form. If the goal is a computationally accessible model, then an ad hoc approach may be useful. First compute a fit using a discrete spectral measure. If there are clearly defined point masses that are isolated, then include them and try to model the rest as an elliptical model, or using some spectral density. Since the foreign exchange data seems to be approximately elliptically contoured, there may be interest in categorizing such stable distributions. The main practical advantage to this is that all d-dimensional elliptically contoured stable distributions are parameterized by and a symmetric, positive definite d × d matrix. Since the matrix is symmetric, there are a total of 1 + d(d + 1)/2 parameters. This is quite different from the general stable case, which involves an infinite dimensional spectral measure. Even a discrete approximating measure involves a much larger number of terms: if a "polar grid" is used with each of the angle directions divided up evenly with k subintervals, then there are kd-1 point masses to be estimated. For X an non-singular symmetric -stable random vector, the following are equivalent: * X is elliptically contoured around the origin. * X is sub-Gaussian, i.e., X d = A1/2G, where A S(,1,,0;1) and G N(0,R). * The characteristic function is E exp(iu X) = exp(-(uRuT)/2), for some symmetric, positive definite matrix R. There is a "random volatility" interpretation of sub-Gaussian distributions. 
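The sub-Gaussian representation $X \stackrel{d}{=} A^{1/2} G$ in the list above can be illustrated with a short simulation. This is a sketch under stated assumptions: $0 < \alpha < 2$, $A$ is drawn as a positive $(\alpha/2)$-stable variable with scale $2\cos(\pi\alpha/4)^{2/\alpha}$ (chosen here so that the characteristic function comes out as $\exp(-(uRu^T)^{\alpha/2})$), and the Chambers-Mallows-Stuck formula is used as the sampler. The names `positive_stable` and `sub_gaussian` are hypothetical.

```python
import numpy as np

def positive_stable(a, scale, size, rng):
    """Draw totally skewed (beta = 1) a-stable variates with 0 < a < 1.
    These are positive; Chambers-Mallows-Stuck formula (assumed sampler)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    b = np.pi / 2                                  # arctan(tan(pi*a/2)) / a for a < 1
    s = np.cos(np.pi * a / 2) ** (-1 / a)          # (1 + tan^2(pi*a/2))^(1/(2a))
    z = (s * np.sin(a * (v + b)) / np.cos(v) ** (1 / a)
         * (np.cos(v - a * (v + b)) / w) ** ((1 - a) / a))
    return scale * z

def sub_gaussian(alpha, R, n, rng):
    """Simulate n rows of X = sqrt(A) * G with G ~ N(0, R), 0 < alpha < 2."""
    R = np.asarray(R, dtype=float)
    scale = 2 * np.cos(np.pi * alpha / 4) ** (2 / alpha)   # assumed scale of A
    A = positive_stable(alpha / 2, scale, n, rng)          # random "volatility"
    L = np.linalg.cholesky(R)
    G = rng.standard_normal((n, R.shape[0])) @ L.T         # rows ~ N(0, R)
    return np.sqrt(A)[:, None] * G

def empirical_chf(X, u):
    """Real part of the empirical characteristic function at u."""
    return np.mean(np.cos(X @ np.asarray(u, dtype=float)))

# Usage: the matrix R of Example 6 with alpha = 1.5.
rng = np.random.default_rng(0)
R = np.array([[1.0, 0.7], [0.7, 1.0]])
X = sub_gaussian(1.5, R, 200_000, rng)
```

Comparing `empirical_chf(X, u)` with $\exp(-(uRu^T)^{\alpha/2})$ for a few directions $u$ gives a quick sanity check of the simulated sample.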
Think of G as an underlying multivariate normal model for the returns on d assets with random scale Ch. 3: Modeling Financial Data 127 A1/2. In general, A can be any positive random variable, but the product will be -stable only when A is itself a positive (/2)-stable random variable. Computations with elliptically contoured stable distributions is much simpler than the general stable case. All calculations are essentially reduced to one-dimensional problems: the linear transformation Y = R-1/2X gives a radially symmetric distribution. With a radially symmetric density, one only needs to compute it along some one-dimensional ray. In symbols, f (x) = det(R)-1/2f (|R-1/2xT|,0,0,...,0) = c(R)g(|R-1/2x |). The univariate function g can be computed for arbitrary dimension d by numerically evaluating the univariate integral g(x) = (2)-d/2 0 e-x2/(2t) f t 2 ,1,2 cos 4 2/ ,0;1 dt. We next describe ways of assessing a d-dimensional data set to see if it is approximately sub-Gaussian and then estimating the parameters of a sub-Gaussian vector. First perform a one-dimensional stable fit to each coordinate of the data using one of the methods described above, to get estimates ^i = (^i, ^i, ^i, ^i). If the i's are significantly different, then the data is not jointly -stable, so it cannot be sub-Gaussian. Likewise, if the i's are not all close to 0, then the distribution is not symmetric and it cannot be sub- Gaussian. If the i 's are all close, form a pooled estimate of = ( d i=1 i)/d = average of the indices of each component. Then shift the data by ^ = (^1, ^2,..., ^d) so the distribution is centered at the origin. Next, test for sub-Gaussian behavior. This can be accomplished by examining twodimensional projections because of the following result. If X is a d-dimensional subGaussian -stable random vector, then every two-dimensional projection Y = (Y1,Y2) = (a1 X,a2 X), (11) (a1,a2 Rd) is a two-dimensional sub-Gaussian -stable random vector. 
Conversely, suppose X is a d-dimensional -stable random vector with the property that every twodimensional projection of form (11) is non-singular sub-Gaussian. Then d-dimensional X is non-singular sub-Gaussian -stable. Estimating the d(d + 1)/2 parameters (upper triangular part) of R can be done in at least two ways. For the first method, set rii = 2 i , i.e., the square of the scale parameter of the i-th coordinate. Then estimate rij by analyzing the pair (Xi,Xj ) and take rij = ( 2(1,1)-rii -rjj )/2, where (1,1) is the scale parameter of (1,1)(Xi,Xj ) = Xi +Xj . This involves estimating d + d(d - 1)/2 = d(d + 1)/2 one-dimensional scale parameters. For the second method, note that if X is -stable sub-Gaussian, then E exp(iu X) = exp(-(uRuT)/2), so -lnE exp(iu X) 2/ = uRuT = i u2 i rii + 2 i 1) while I(t) = Sd ,1 t,s (ds). (3) Here, Sd is the unit sphere in Rd, is a finite measure on Sd, called the spectral measure, the quantity t,s = j tj sj is the inner product in Rd, and ,(u) = |u| 1 - i sign(u)tan 2 for = 1, |u| 1 + i 2 sign(u)log|u| for = 1. (4) We denote the distribution of a stable r.v. X with the ch.f. (2) by S(m, ). Ch. 4: Statistical Issues in Modeling Multivariate Stable Portfolios 135 The index of stability (0,2] determines the tail of the stable law and can be thought of as a shape parameter. When = 2 we obtain the special case of multivariate normal distribution, while when < 2, the probability P(Xj > x) associated with each component Xj of an -stable r.v. X decreases like the power function x- as x increases to infinity. Spectral measure controls the dependence among the components of X. The latter are independent if and only if is discrete and concentrated on the intersection of Sd with the coordinate axes. 2.1. Domains of attraction Stable laws are the only possible limiting distributions of scalar-normalized sums of i.i.d. random vectors. A random vector X is said to be in the domain of attraction of a multivariate stable r.v. 
Y if for some an > 0 and bn Rd the following convergence in distribution holds an(X1 + + Xn) + bn d Y as n , (5) where the Xi's are i.i.d. copies of X. By the stability property (1) it is clear that any stable r.v. belongs to its own domain of attraction. The domain of attraction of a stable law with index = 2 (the normal law) includes all distributions with finite second moments for which the convergence in (5) coincides with the classical Central Limit Theorem. The domain of attraction of a nonnormal stable law admits the following characterization due to Rvaˇceva1 (1962), and plays a crucial role in estimating the spectral measure . Proposition 2.1. A random vector X on Rd belongs to the domain of attraction of some full2 stable S(m, ) law with < 2 if and only if V (r) = P(||X|| > r) is regularly varying at infinity with index - and P X X D given X > r = P(X/ X D, X > r) V (r) (D) (Sd) (6) as r for all Borel subsets D of the sphere Sd with (D) = 0. In other words, the tail behavior of X in the direction of D is determined by the spectral measure of the set D. 1 The original proof in Rvaˇceva (1962) seems to contain an error; for a corrected proof and a more modern treatment (in terms of regular variation), see Meerschaert and Scheffler (2001). 2 The probability distribution of a random vector X on Rd is full if t,X is nondegenerate for every t = 0. 136 T.J. Kozubowski et al. 2.2. Strictly stable and symmetric stable vectors A r.v. X is strictly stable if the relation (1) is valid with Dn = 0. This holds if the shift vector m is zero for = 1 and if Sd s (ds) = 0 (7) if = 1 [see, e.g., Samorodnitsky and Taqqu (1994)]. A r.v. X is said to be symmetric stable if it is stable and the probabilities P(X A) and P(-X A) are the same for all Borel sets A of Rd. Then, the spectral measure of X is symmetric and the ch.f. (2) reduces to (t) = e - Sd | t,s | (ds) . (8) 2.3. One-dimensional case In one dimension, the unit sphere is the set {-1,1} and the ch.f. 
(2) reduces to
$$\Phi(t) = E e^{itX} = e^{i\mu t - \sigma^{\alpha} \omega_{\alpha,\beta}(t)}, \tag{9}$$
where the parameter $\alpha$ is the index of stability as before, $\beta \in [-1, 1]$ is the skewness parameter, the parameters $\mu \in \mathbb{R}$ and $\sigma > 0$ control location and scale, respectively, and $\omega_{\alpha,\beta}$ is given by (4). We shall use the notation $S_\alpha(\sigma, \beta, \mu)$ to denote the stable distribution given by the ch.f. (9). Strictly stable laws in one dimension correspond to $\mu = 0$ for $\alpha \neq 1$ and $\beta = 0$ for $\alpha = 1$. Symmetric univariate stable laws are strictly stable with $\beta = \mu = 0$. Stable distributions are supported on the entire real line, except when $\alpha < 1$ and $|\beta| = 1$, when we obtain totally skewed distributions concentrated on $(\mu, \infty)$ for $\beta = 1$ and $(-\infty, \mu)$ for $\beta = -1$.

The following moment formula from Samorodnitsky and Taqqu (1994) is useful in estimating parameters of multivariate stable laws [cf. Nikias and Shao (1995)].

Proposition 2.2. Let $X \sim S_\alpha(\sigma, \beta, 0)$ with $\alpha \in (0, 2)$ and $\beta = 0$ for $\alpha = 1$. Then for any $p \in (0, \alpha)$ we have
$$E|X|^p = \sigma^p C, \tag{10}$$
where
$$C = C(\alpha, \beta, p) = \frac{2^{p-1}\, \Gamma(1 - p/\alpha)}{p \int_0^\infty u^{-p-1} \sin^2 u \, du} \left(1 + \beta^2 \tan^2 \frac{\pi\alpha}{2}\right)^{p/(2\alpha)} \cos\!\left(\frac{p}{\alpha} \arctan\!\left(\beta \tan \frac{\pi\alpha}{2}\right)\right), \tag{11}$$
and $\Gamma(\cdot)$ in (11) denotes the gamma function.

2.4. Discrete spectral measure

An important special class of stable laws are those with a discrete spectral measure. A measure $\Gamma$ is discrete if
$$\Gamma(A) = \sum_{j=1}^{k} \delta_{s_j}(A)\, \gamma_j, \tag{12}$$
where the $s_j$'s are $k$ points on the unit sphere, $\delta_s$ denotes a point mass at $s$,
$$\delta_s(A) = \begin{cases} 1 & \text{if } s \in A, \\ 0 & \text{otherwise,} \end{cases} \tag{13}$$
and $\gamma_j > 0$ for $j = 1, 2, \ldots, k$. If the spectral measure has form (12), then the corresponding ch.f. is straightforward to compute, because in this case $I$ in (3) takes the form
$$I(t) = \sum_{j=1}^{k} \omega_{\alpha,1}(\langle t, s_j \rangle)\, \gamma_j. \tag{14}$$
Because of the simple form of their ch.f.'s, stable laws with discrete spectral measures are much easier to handle in practice than the general ones. In particular, their computer simulation is straightforward, whereas exact algorithms for simulation of general stable vectors are not available.
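The finite sum (14) is easy to evaluate numerically. The sketch below assumes the function in (4) has the form $\omega_{\alpha,\beta}(u) = |u|^\alpha (1 - i\beta\,\mathrm{sign}(u)\tan(\pi\alpha/2))$ for $\alpha \neq 1$ and $|u|(1 + i\beta(2/\pi)\,\mathrm{sign}(u)\log|u|)$ for $\alpha = 1$, and that the ch.f. is $\exp(-I(t) + i\langle t, m\rangle)$. The names `omega`, `exponent_I` and `char_fn` are hypothetical.

```python
import numpy as np

def omega(u, alpha, beta):
    """Assumed form of the function in (4), applied elementwise to u."""
    u = np.asarray(u, dtype=float)
    if alpha != 1:
        return np.abs(u) ** alpha * (1 - 1j * beta * np.sign(u) * np.tan(np.pi * alpha / 2))
    safe = np.where(u == 0, 1.0, np.abs(u))        # avoid log(0); the term is 0 there
    return np.abs(u) * (1 + 1j * beta * (2 / np.pi) * np.sign(u) * np.log(safe))

def exponent_I(t, alpha, s, gamma):
    """I(t) = sum_j omega_{alpha,1}(<t, s_j>) * gamma_j, as in (14)."""
    proj = np.asarray(s, dtype=float) @ np.asarray(t, dtype=float)
    return np.sum(omega(proj, alpha, 1.0) * np.asarray(gamma, dtype=float))

def char_fn(t, alpha, s, gamma, m):
    """Characteristic function exp(-I(t) + i<t, m>) for a discrete measure."""
    t = np.asarray(t, dtype=float)
    return np.exp(-exponent_I(t, alpha, s, gamma) + 1j * (t @ np.asarray(m, dtype=float)))

# Usage: symmetric point masses on the coordinate axes (independent,
# symmetric components), alpha = 1.5, zero shift.
s = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
gamma = np.full(4, 0.5)
phi = char_fn([0.3, -0.4], 1.5, s, gamma, [0.0, 0.0])
```

For this symmetric measure the imaginary parts cancel and the result equals $\exp(-(|t_1|^{1.5} + |t_2|^{1.5}))$, which is the product of two univariate symmetric stable ch.f.'s.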
The simulation of stable variates with discrete spectral measure is based on the following representation from Modarres and Nolan (1994) [see also Samorodnitsky and Taqqu (1994), Example 2.3.6].

Proposition 2.3. Let $X \sim S_\alpha(m, \Gamma)$ with $\Gamma$ of the form (12). Then
$$X \stackrel{d}{=} \begin{cases} m + \sum_{j=1}^{k} \gamma_j^{1/\alpha} V_j s_j & \text{if } \alpha \neq 1, \\ m + \sum_{j=1}^{k} \gamma_j \left(V_j + \frac{2}{\pi} \log \gamma_j\right) s_j & \text{if } \alpha = 1, \end{cases} \tag{15}$$
where the $V_j$'s are i.i.d. totally skewed, one-dimensional standard stable variables $S_\alpha(1, 1, 0)$.

Since there exist exact algorithms for simulating one-dimensional stable variates [see, e.g., Weron (1996)], representation (15) can be used to generate $d$-dimensional stable vectors with discrete spectral measure.

Another important aspect of stable laws with discrete spectral measure is their role in approximating general stable distributions. As shown in Byczkowski, Nolan and Rajput (1993), every stable distribution can be approximated by one with a discrete spectral measure.

Proposition 2.4. Given a stable vector $X \sim S_\alpha(m, \Gamma)$ in $\mathbb{R}^d$ with density $p$, for every $\varepsilon > 0$ there exists a positive integer $k = k(\varepsilon, d, \alpha, \Gamma)$, points $s_1, \ldots, s_k$ on the unit sphere $S_d$, and positive constants $\gamma_1, \ldots, \gamma_k$ such that
$$\sup_{x \in \mathbb{R}^d} |p(x) - p^{*}(x)| < \varepsilon, \tag{16}$$
where $p^{*}$ is the density of the stable distribution on $\mathbb{R}^d$ with a discrete spectral measure $\Gamma^{*}$ given by (12).

The value of $k$ is given explicitly in Byczkowski, Nolan and Rajput (1993). Because of the above approximation, in practice one usually restricts attention to laws with discrete spectral measure; see Nolan (1998) for further discussion.

2.5. Linear combinations and risk of a financial portfolio

Return on a $d$-asset portfolio can be modeled as a linear combination
$$\langle b, X \rangle = b_1 X_1 + \cdots + b_d X_d \tag{17}$$
of the stable vector of returns on individual assets $X$ and the vector of weights $b$ indicating the portion with which each asset enters the portfolio. The properties of a portfolio can then be studied via properties of linear combinations of stable random variables.
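When the spectral measure is discrete as in (12), the parameters of the portfolio $\langle b, X \rangle$ defined in (18)-(20) become finite sums over the point masses, for example $\sigma_b^\alpha = \sum_j |\langle b, s_j \rangle|^\alpha \gamma_j$. The following sketch computes them under that assumption. The function name `portfolio_params` is hypothetical, and the code assumes that $b$ is not orthogonal to every $s_j$.

```python
import numpy as np

def portfolio_params(b, alpha, s, gamma, m):
    """Scale, skewness and shift of <b, X> for a discrete spectral measure
    with point masses gamma_j at unit vectors s_j and shift vector m."""
    b = np.asarray(b, dtype=float)
    s = np.asarray(s, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    proj = s @ b                                   # <b, s_j> for each point mass
    w = np.abs(proj) ** alpha * gamma              # |<b, s_j>|^alpha * gamma_j
    sigma = w.sum() ** (1 / alpha)                 # portfolio "risk"
    beta = (w * np.sign(proj)).sum() / w.sum()     # portfolio skewness
    mu = b @ np.asarray(m, dtype=float)            # shift for alpha != 1
    if alpha == 1:
        nz = proj != 0                             # u*log|u| -> 0 as u -> 0
        mu -= (2 / np.pi) * np.sum(proj[nz] * np.log(np.abs(proj[nz])) * gamma[nz])
    return sigma, beta, mu

# Usage: two independent totally skewed assets, equally weighted.
sigma_b, beta_b, mu_b = portfolio_params(
    b=[0.5, 0.5], alpha=1.5,
    s=[[1.0, 0.0], [0.0, 1.0]], gamma=[1.0, 1.0], m=[0.1, 0.3])
```

In this example the risk is $(2 \cdot 0.5^{1.5})^{1/1.5} \approx 0.794$, smaller than the unit scale of each asset, which illustrates the diversification effect for $\alpha > 1$.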
It is well known that all linear transformations (17), which include marginal distributions of stable vectors, are again stable. In particular, linear combinations of a stable r.v. $X = (X_1, \ldots, X_d) \sim S_\alpha(m, \Gamma)$ are univariate stable $S_\alpha(\sigma_b, \beta_b, \mu_b)$, where
$$\sigma_b = \left( \int_{S_d} |\langle b, s \rangle|^{\alpha}\, \Gamma(ds) \right)^{1/\alpha}, \tag{18}$$
$$\beta_b = \frac{\int_{S_d} |\langle b, s \rangle|^{\alpha}\, \mathrm{sign}(\langle b, s \rangle)\, \Gamma(ds)}{\int_{S_d} |\langle b, s \rangle|^{\alpha}\, \Gamma(ds)}, \tag{19}$$
$$\mu_b = \begin{cases} \langle b, m \rangle & \text{for } \alpha \neq 1, \\ \langle b, m \rangle - \frac{2}{\pi} \int_{S_d} \langle b, s \rangle \log |\langle b, s \rangle|\, \Gamma(ds) & \text{for } \alpha = 1. \end{cases} \tag{20}$$
The parameter $\sigma_b$ is often called the risk of a stable portfolio. Note that information about the spectral measure is necessary in order to estimate that risk. For the motivation and more discussion of the definition of risk, see Rachev and Mittnik (2000).

2.6. Densities

All fully $d$-dimensional stable laws are absolutely continuous and admit bounded unimodal densities. In general there are no closed form expressions for stable densities. For numerical stable density computations one can use the following integral representation of stable densities due to Abdul-Hamid and Nolan (1998).

Proposition 2.5. Let $X \sim S_\alpha(m, \Gamma)$ be a nondegenerate stable random vector in $\mathbb{R}^d$ with $d \ge 1$, and let $\sigma_s$, $\beta_s$ and $\mu_s$ be given by (18)-(20). Then the density of $X$ admits the following form:

(i) For $\alpha \neq 1$,
$$p(x) = \int_{S_d} g_{\alpha,d}\!\left( \frac{\langle x - m, s \rangle}{\sigma_s},\ \beta_s \right) \sigma_s^{-d}\, ds, \tag{21}$$
where
$$g_{\alpha,d}(v, \beta) = \frac{1}{(2\pi)^d} \int_0^\infty \cos\!\left( vu - \beta u^{\alpha} \tan \frac{\pi\alpha}{2} \right) u^{d-1} e^{-u^{\alpha}}\, du. \tag{22}$$

(ii) For $\alpha = 1$,
$$p(x) = \int_{S_d} g_{1,d}\!\left( \frac{\langle x - m, s \rangle - \mu_s + (2/\pi)\, \beta_s \sigma_s \log \sigma_s}{\sigma_s},\ \beta_s \right) \sigma_s^{-d}\, ds, \tag{23}$$
where
$$g_{1,d}(v, \beta) = \frac{1}{(2\pi)^d} \int_0^\infty \cos\!\left( vu - \frac{2}{\pi} \beta u \log u \right) u^{d-1} e^{-u}\, du. \tag{24}$$

As remarked by Nolan (1998), this representation is more suitable for approximating multivariate stable densities than the numerical inversion of the stable ch.f. [see Nolan and Rajput (1995)], since $g_{\alpha,d}$ is a function of two variables regardless of the dimension $d$ and it is the same for any stable random vector.

2.7. An alternative parameterization

Note that the spectral measure $\Gamma$ is not necessarily a probability measure on $S_d$.
An alternative parameterization introduces a scale parameter
$$\sigma = (\Gamma(S_d))^{1/\alpha} \tag{25}$$
and the normalized measure
$$\tilde{\Gamma}(ds) = \sigma^{-\alpha}\, \Gamma(ds), \tag{26}$$
so that $\tilde{\Gamma}(S_d) = 1$ [see, e.g., Davydov and Paulauskas (1999)]. With this new normalized spectral measure, the ch.f. (2) takes the form
$$\Phi(t) = e^{-\sigma^{\alpha} I(t) + i \langle t, m \rangle}, \tag{27}$$
where $I$ is as before (with $\tilde{\Gamma}$ in place of $\Gamma$). We now have four parameters: the stability index $\alpha \in (0, 2]$, the scale parameter $\sigma > 0$, the shift parameter $m \in \mathbb{R}^d$, and the normalized spectral measure $\tilde{\Gamma}$. We shall use the notation $S_\alpha(\sigma, m, \tilde{\Gamma})$ for the distribution corresponding to the ch.f. (27).

2.8. Association

A strong form of positive dependence of the components of a $d$-dimensional r.v. $X = (X_1, \ldots, X_d)$ is association, introduced in Esary, Proschan and Walkup (1967). The components of $X$ are said to be associated if for any functions $f, g : \mathbb{R}^d \to \mathbb{R}$, nondecreasing in each coordinate, we have
$$\mathrm{Cov}\big(f(X), g(X)\big) \ge 0 \tag{28}$$
whenever the covariance exists. Normal variables are associated if and only if they are nonnegatively correlated (Pitt, 1982). Association of stable variables has been characterized in terms of the spectral measure in Lee, Rachev and Samorodnitsky (1990a).

Proposition 2.6. Let $X = (X_1, \ldots, X_d) \sim S_\alpha(m, \Gamma)$, where $0 < \alpha < 2$. Then $X_1, \ldots, X_d$ are associated if and only if
$$\Gamma(S_d^{-}) = 0, \tag{29}$$
where
$$S_d^{-} = \{ s = (s_1, \ldots, s_d) \in S_d : \text{for some } i, j \in \{1, \ldots, d\},\ s_i > 0 \text{ and } s_j < 0 \}. \tag{30}$$
Thus, bivariate stable vectors are associated if and only if their corresponding spectral measure is concentrated on the first and third quadrants.

Remark. Other notions of positive dependence include positive upper orthant dependence (PUOD) and positive lower orthant dependence (PLOD). The variables $X_1, \ldots, X_d$ are PUOD if
$$P(X_1 > x_1, \ldots, X_d > x_d) \ge P(X_1 > x_1) \cdots P(X_d > x_d) \tag{31}$$
for any $x_1, \ldots, x_d$, and they are PLOD if
$$P(X_1 \le x_1, \ldots, X_d \le x_d) \ge P(X_1 \le x_1) \cdots P(X_d \le x_d), \tag{32}$$
so that the variables are likely to take on larger or smaller values together.
It is well known that association implies both PUOD and PLOD, but one cannot in general reverse these implications. However, as shown in Lee, Rachev and Samorodnitsky (1990a), for stable random vectors association is equivalent to PUOD and also to PLOD, so that all of the above notions of positive dependence are equivalent.

The components of $X = (X_1, \ldots, X_d)$ are said to be negatively associated if for any $1 \le k < d$ and any functions $f : \mathbb{R}^k \to \mathbb{R}$, $g : \mathbb{R}^{d-k} \to \mathbb{R}$, nondecreasing in each coordinate, we have
$$\mathrm{Cov}\big(f(Y), g(Z)\big) \le 0 \tag{33}$$
whenever the covariance exists, where $Y$ and $Z$ are any $k$- and $(d-k)$-dimensional subvectors of $X$ [see Alam and Saxena (1982)]. The negative association of stable random vectors was characterized in Lee, Rachev and Samorodnitsky (1990a).

Proposition 2.7. Let $X = (X_1, \ldots, X_d) \sim S_\alpha(m, \Gamma)$, where $0 < \alpha < 2$. Then $X_1, \ldots, X_d$ are negatively associated if and only if
$$\Gamma(S_d^{+}) = 0, \tag{34}$$
where
$$S_d^{+} = \{ s = (s_1, \ldots, s_d) \in S_d : s_i s_j > 0 \text{ for some } i \neq j \}. \tag{35}$$
Thus, a bivariate stable vector has negatively associated components if and only if the corresponding spectral measure is concentrated on the second and fourth quadrants.

3. Estimation of the index of stability

In this section we address the issue of estimating the tail index $\alpha$. We start with the case when the sample comes from a univariate $\alpha$-stable distribution, and then consider a more general situation where the observations are not necessarily stable, but asymptotically have a stable-Pareto tail with index $\alpha$, that is
$$P(X_1 > x) = 1 - F(x) \sim x^{-\alpha} L(x), \tag{36}$$
where $L$ is some slowly varying function. Given a multivariate heavy tailed data set $X_1, \ldots, X_n$, one can apply the methods of this section to one-dimensional samples corresponding to the norms $\|X_j\|$ or the projections $\langle X_j, b \rangle$ for some $b \in \mathbb{R}^d$.

3.1.
Estimation of univariate stable parameters Estimating the parameters of stable distributions is a challenging problem due to the fact that the densities and distributions functions of these laws are not available in closed form. Various estimation methods have been developed over the last 30 years, most of them requiring numerical approximations. Since the stable characteristic function can be written in a closed form, several estimation techniques are based on fitting the sample characteristic function to its theoretical counterpart. The substantial collection of papers in this area started with Press (1972b), and include Arad (1980), Feuerverger and McDunnough (1977, 1981a, b), Kogon and Williams (1998), Koutrouvelis (1980, 1981), Paulson and Delehanty (1984, 1985), Paulson, Holcomb and Leitch (1975). As noted by McCulloch (1996), these estimation procedures were reported by practitioners to have high efficiency relative to the maximum likelihood approach. However, some of these methods are quite complex and require the practitioner to choose certain arbitrary parameters. A discussion and comparative study of these approaches can be found in Kogon and Williams (1998). The maximum likelihood (ML) method for the stable case was first proposed by DuMouchel (1971, 1973), who also discussed the asymptotic properties of the estimators. To approximate the loglikelihood function DuMouchel (1971) employed fast Fourier transform (FFT) for the central part of the data and series expansions for the tails. See also DuMouchel (1975, 1983) for numerical approximation of the Fisher information matrix and further comments on this approach. Since this early work, various numerical procedures for approximating stable densities have been developed, which now permit an efficient computation of the likelihood function without the grouping procedure of DuMouchel (1971). For the ML in the symmetric case, see Brorsen and Yang (1990), McCulloch (1979, 1998). 
Asymmetric stable ML was treated in Brorsen and Preckel (1993), Liu and Brorsen (1995), Mittnik et al. (1999), Nolan (2001), Stuck (1976). As noted in Mittnik et al. (1999), one advantage of the ML approach over most other methods is its ability to handle generalizations to dependent or not identically distributed data arising in financial modeling (for example, regression or various time series models with stable disturbances). An implementation of the ML method for such generalizations can be found in Liu and Brorsen (1995) (stable GARCH), Mittnik, Rachev and Paolella (1998) (ARMA models driven by asymmetric stable distributions), and Brorsen and Preckel (1993), McCulloch (1998) (linear regression). In the last section of our chapter, we utilize the maximum likelihood numerical procedures of Nolan (1998), applicable for the most general i.i.d. stable case (available on the author's web site). Numerous other methods of estimating stable parameters have been suggested. Perhaps the most commonly used estimators in empirical work are quantile procedures of Fama and Roll (1971) for the symmetric case and their modifications to the general case obtained by McCulloch (1986). Buckle (1995) proposed sampling based Bayesian inference for stable laws, see also Qiou and Ravishanker (1995), Ravishanker and Qiou (1998) for further extensions and discussion of the Bayesian approach. Nikias and Shao (1995) derived moment estimators based on sample fractional moments. Computationally simple estimators based on the modified method of scoring were proposed in Ch. 4: Statistical Issues in Modeling Multivariate Stable Portfolios 143 Klebanov, Melamed and Rachev (1994). For further references on estimating stable parameters, see, e.g., McCulloch (1996), Rachev and Mittnik (2000). Comparative studies of various estimators for stable parameters include Akgiray and Lamoureux (1989) and more recent Höpfner and Rüschendorf (1999), Kogon and Williams (1998). 3.2. 
Estimation of the tail index

Assume that we have a one-dimensional random sample $X_1, \ldots, X_n$ satisfying (36) and belonging to the domain of attraction of an $\alpha$-stable distribution. There is a large body of literature concerning estimation of the tail index $\alpha$. Many common estimators of $\alpha$ are based on a subset of the sample order statistics,
$$X_{(1)} \le \cdots \le X_{(n)}. \tag{37}$$
Below we sketch a few standard and some recent methods for estimating $\alpha$ and give references for many others.

3.2.1. The Hill estimator

The Hill estimator [see Hill (1975)], along with its various modifications, is perhaps the most common way of estimating the tail thickness of a financial data set [see, e.g., Jansen and de Vries (1991), Koedijk, Schafgans and de Vries (1990), Loretan and Phillips (1994), Phillips (1993)]. The estimator uses the $k$ largest order statistics,
$$\hat{\alpha}_{\mathrm{Hill}} = \left[ \frac{1}{k} \sum_{j=1}^{k} \left( \log X_{(n+1-j)} - \log X_{(n-k)} \right) \right]^{-1}, \tag{38}$$
and arises as the conditional maximum likelihood estimator for the Pareto distribution $P(X > x) = C x^{-\alpha}$. With the proper choice of the sequence $k = k(n)$, the estimator is consistent and asymptotically normal; see, e.g., Beirlant and Teugels (1989), Csörgő and Mason (1985), de Haan and Resnick (1998), Deheuvels, Haeusler and Mason (1988), Goldie and Smith (1987), Haeusler and Teugels (1985), Hall (1982), Hall and Welsh (1984, 1985), Mason (1982). For further discussion and extensions, see, e.g., Csörgő, Deheuvels and Mason (1985), Csörgő and Viharos (1995), Dekkers and de Haan (1993), Dekkers, Einmahl and de Haan (1989).

An obvious problem with the Hill estimator and its generalizations discussed below is the practical choice of $k$. Generally, we must have
$$k \to \infty \quad \text{and} \quad \frac{k}{n} \to 0 \quad \text{as } n \to \infty \tag{39}$$
to achieve strong consistency and asymptotic normality. In practice, one usually plots values of the estimator against the values of $k$ (obtaining the so-called Hill plot) and looks for a stabilization (flat spot) in the graph.
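The Hill estimator (38) and the values needed for a Hill plot can be computed as follows. This is a minimal sketch: it assumes the $k+1$ largest observations are positive, and the function names `hill_estimator` and `hill_plot_values` are hypothetical. The usage example draws an exact Pareto sample with $\alpha = 1.5$ by inverse transform, which is an illustrative choice and not data from the chapter.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the tail index from the k largest order statistics."""
    xs = np.sort(np.asarray(x, dtype=float))       # X_(1) <= ... <= X_(n)
    n = xs.size
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    top = xs[n - k:]                               # X_(n-k+1), ..., X_(n)
    threshold = xs[n - k - 1]                      # X_(n-k)
    return 1.0 / np.mean(np.log(top) - np.log(threshold))

def hill_plot_values(x, ks):
    """Hill estimates for a range of k; plot these against ks for a Hill plot."""
    return np.array([hill_estimator(x, k) for k in ks])

# Usage: exact Pareto sample, P(X > x) = x^(-1.5) for x >= 1.
rng = np.random.default_rng(1)
sample = (1.0 - rng.uniform(size=100_000)) ** (-1 / 1.5)
alpha_hat = hill_estimator(sample, 2000)
ks = np.arange(100, 5001, 100)
curve = hill_plot_values(sample, ks)
```

For an exact Pareto sample the curve is flat around the true index over a wide range of $k$. For stable data with $\alpha$ close to 2 the flat region can be short or misleading, which is why the choice of $k$ matters in practice.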
An alternative, more informative method of producing a Hill plot is described in Drees, de Haan and Resnick (2000) and Resnick and Stărică (1997). We refer the readers to Danielsson, Jansen and de Vries (1996), Embrechts, Klüppelberg and Mikosch (1997), Kratz and Resnick (1996), Mittnik and Paolella (1999), Rachev and Mittnik (2000), Resnick (1998), Resnick and Stărică (1997) and references therein for more details on this and related tail estimators.

3.2.2. A shifted Hill's estimator

Noting that the Hill estimator is scale invariant but not shift invariant, Aban and Meerschaert (2001) proposed a modification that is shift invariant. Their method consists of conditional maximum likelihood estimation for the shifted Pareto distribution $P(X > x) = C(x - s)^{-\alpha}$, and yields the estimators

$$\hat\alpha = \left( \frac{1}{k} \sum_{j=1}^{k} \log\bigl(X^*_{(j)} - \hat s\bigr) - \log\bigl(X^*_{(k+1)} - \hat s\bigr) \right)^{-1}, \tag{40}$$

$$\hat c = \frac{k}{n} \bigl(X^*_{(k+1)} - \hat s\bigr)^{\hat\alpha}, \tag{41}$$

where $\hat s$ is obtained by solving the equation

$$\hat\alpha \bigl(X^*_{(k+1)} - \hat s\bigr)^{-1} = (\hat\alpha + 1)\, k^{-1} \sum_{j=1}^{k} \bigl(X^*_{(j)} - \hat s\bigr)^{-1} \tag{42}$$

over the set $\hat s < X^*_{(k+1)}$. Here the starred variables indicate the order statistics taken in decreasing order:

$$X^*_{(1)} \ge \cdots \ge X^*_{(n)}. \tag{43}$$

Numerical procedures are required to compute the estimators.

3.2.3. The Pickands estimator and its modifications

Pickands (1975) introduced a tail estimator of the form

$$\hat\alpha_{\mathrm{Pick}} = \frac{\log 2}{\log\bigl(X_{(n-k+1)} - X_{(n-2k+1)}\bigr) - \log\bigl(X_{(n-2k+1)} - X_{(n-4k+1)}\bigr)}, \quad 4k < n, \tag{44}$$

see also Drees (1996), Rosen and Weissman (1996). Noting its poor performance on samples from stable distributions, Mittnik and Rachev (1996) introduced a modification of (44) based on the Bergström expansion of the stable distribution function [see Bergström (1952), and also Janicki and Weron (1994)]. Their unconditional Pickands estimator is of the form

$$\hat\alpha_{\mathrm{UP}} = \frac{\log 2}{\log X_{(n-k+1)} - \log X_{(n-2k+1)}}. \tag{45}$$

We refer the readers to Rachev and Mittnik (2000) for further discussion of the practical performance and other modifications of the Pickands estimator.

3.2.4.
Least-squares estimators

Taking the logarithm of both sides in relation (36), we observe that for large values of $x$ the points with abscissa $\log x$ and ordinate $\log(1 - F(x))$ should approximately fall on a straight line with slope $-\alpha$. Using the $k$ largest order statistics $X_{(n+1-j)}$, $j = 1,\ldots,k$, we can examine the plot of

$$\log X_{(n+1-j)} \quad \text{versus} \quad \log\frac{j}{n} \approx \log\bigl(1 - F(X_{(n+1-j)})\bigr) \tag{46}$$

and visually estimate the slope of the resulting line. This graphical approach was suggested by Mandelbrot (1963b). Using these upper order statistics one can estimate the slope by the classical least-squares method [see Kratz and Resnick (1996), Schultze and Steinebach (1996)]. Below we briefly describe the estimators obtained in Schultze and Steinebach (1996).

Assuming that in (36) we have $L(x) = e^{c}$ (which is the case for stable distributions), Schultze and Steinebach (1996) applied the method of least squares to estimate the intercept $c/\alpha$ and the slope $1/\alpha$ of a straight line fit to

$$\log X_{(n+1-j)} \approx \frac{c}{\alpha} + \frac{1}{\alpha} \log\frac{n}{j}, \quad j = 1,\ldots,k. \tag{47}$$

This resulted in the following estimator of $\alpha$:

$$\hat\alpha^{(1)}_{\mathrm{LS}} = \left[ \frac{1}{k} \sum_{j=1}^{k} \log\frac{n}{j} \log X_{(n+1-j)} - \frac{1}{k^2} \sum_{j=1}^{k} \log\frac{n}{j} \sum_{j=1}^{k} \log X_{(n+1-j)} \right]^{-1} \times \left[ \frac{1}{k} \sum_{j=1}^{k} \log^2\frac{n}{j} - \left( \frac{1}{k} \sum_{j=1}^{k} \log\frac{n}{j} \right)^{2} \right]. \tag{48}$$

Another estimator was obtained in Schultze and Steinebach (1996) by the least-squares method under the assumption of zero intercept in (47):

$$\hat\alpha^{(2)}_{\mathrm{LS}} = \frac{\sum_{j=1}^{k} \log^2\frac{n}{j}}{\sum_{j=1}^{k} \log\frac{n}{j} \log X_{(n+1-j)}}. \tag{49}$$

Finally, Schultze and Steinebach (1996) proposed yet another estimator of $\alpha$, resulting from expressing (47) in the form

$$\alpha \log X_{(n+1-j)} \approx c + \log\frac{n}{j}, \quad j = 1,\ldots,k, \tag{50}$$

and minimizing the sum of squares $\sum_{j=1}^{k} \bigl( \alpha \log X_{(n+1-j)} - c - \log\frac{n}{j} \bigr)^{2}$. This produced:

$$\hat\alpha^{(3)}_{\mathrm{LS}} = \left[ \frac{1}{k} \sum_{j=1}^{k} \log\frac{n}{j} \log X_{(n+1-j)} - \frac{1}{k^2} \sum_{j=1}^{k} \log\frac{n}{j} \sum_{j=1}^{k} \log X_{(n+1-j)} \right] \times \left[ \frac{1}{k} \sum_{j=1}^{k} \log^2 X_{(n+1-j)} - \left( \frac{1}{k} \sum_{j=1}^{k} \log X_{(n+1-j)} \right)^{2} \right]^{-1}. \tag{51}$$

Consistency and asymptotic normality of the above estimators are established in Schultze and Steinebach (1996) and Csörgő and Viharos (1997), respectively [see also Kratz and Resnick (1996) for similar results on their QQ estimator].

3.2.5. The M-S method

Meerschaert and Scheffler (1998) introduced a simple robust estimator for the tail index $\alpha$ that is based on the asymptotics of the sum and uses the entire sample, not just the largest order statistics. The estimator is based on the idea that if the $X_i$'s are i.i.d. and belong to the domain of attraction of an $\alpha$-stable law with $0 < \alpha < 2$ (and their distribution function satisfies (36)), then their sample variance,

$$\hat\sigma^2 = \frac{1}{n} \sum_{j=1}^{n} \bigl( X_j - \bar X \bigr)^2, \tag{52}$$

converges, after normalization, to an $\alpha/2$-stable (totally skewed) r.v. $Y$:

$$n^{1 - 2/\alpha}\, \hat\sigma^2 \xrightarrow{d} Y. \tag{53}$$

Taking the logarithm on both sides of (53) we obtain the convergence

$$2 \log n \left( \frac{1}{\hat\alpha} - \frac{1}{\alpha} \right) \xrightarrow{d} \log Y, \tag{54}$$

where

$$\frac{1}{\hat\alpha} = \frac{\log n + \log \hat\sigma^2}{2 \log n} \tag{55}$$

is the Meerschaert-Scheffler (M-S) estimator of $1/\alpha$. The estimator is consistent and its asymptotic distribution is that of $\log Y$ for some totally skewed, positive $\alpha/2$-stable r.v. $Y$. Moreover, the estimator applies to certain dependent data. Comparing its performance with that of Hill's estimator, Meerschaert and Scheffler (1998) concluded that it works as well as the latter in most cases, and substantially better when applied to stable data; see Meerschaert and Scheffler (1998) for further details.

4. Estimation of the stable spectral measure

4.1. Tail estimators

A method of estimating the spectral measure of a stable r.v. $Y$ based on a random sample

$$X_1,\ldots,X_n \tag{56}$$

from the domain of attraction of $Y$ was proposed by Rachev and Xin (1993) and Cheng and Rachev (1995). The method, referred to as the Rachev-Xin-Cheng (RXC) method by Nolan and Panorska (1997), is based on the limiting relation in Proposition 2.1. To estimate $\Gamma(D)$, where $\Gamma$ is the (normalized) spectral measure of $Y$ [cf.
parameterization (27)], choose a large value of $r$ and calculate the proportion of the $X_i$'s with norm exceeding $r$ that belong to the set $D$ when normalized, that is

$$\hat\Gamma(D) = \frac{\#\{X_i : X_i/\|X_i\| \in D \text{ and } \|X_i\| > r\}}{\#\{X_i : \|X_i\| > r\}}. \tag{57}$$

Equivalently, we can choose an integer $k = k(n) \le n/2$ and consider the set

$$X_{i_1},\ldots,X_{i_k} \tag{58}$$

of the $k$ largest order statistics connected with the corresponding sample of the norms:

$$\|X_1\|,\ldots,\|X_n\|. \tag{59}$$

Then, the RXC estimator of $\Gamma$ is the discrete measure on $S_d$ that assigns the mass of $1/k$ to each of the unit vectors

$$\frac{X_{i_1}}{\|X_{i_1}\|},\ldots,\frac{X_{i_k}}{\|X_{i_k}\|}. \tag{60}$$

The authors suggest taking about 20% of the largest order statistics. Under appropriate technical conditions the estimator is strongly consistent and asymptotically normal.

A similar method was proposed by Davydov et al. (2000) and discussed further in Davydov and Paulauskas (1999). We refer to this approach as the Davydov-Paulauskas-Rackauskas (DPR) method. Assume that the sample (56) is actually from an $\alpha$-stable distribution with a zero shift vector $m$ and a symmetric (normalized) spectral measure $\Gamma$, and that the sample size $n$ is a perfect square, $n = k^2$ for some integer $k$. The method consists of splitting the data into $k$ groups of $k$ variables each and choosing the vector with the largest norm within each group, leading to a set of $k$ vectors $X_{i_1},\ldots,X_{i_k}$, and again estimating $\Gamma$ by the empirical measure based on the unit vectors (60). The consistency and asymptotic normality of the resulting estimators,

$$\hat\Gamma(D) = \frac{1}{k} \sum_{j=1}^{k} I_D\!\left( \frac{X_{i_j}}{\|X_{i_j}\|} \right), \tag{61}$$

are established in Davydov and Paulauskas (1999).

Neither the RXC nor the DPR method assumes any prior knowledge of $\alpha$, and both are well suited for the $S_\alpha(m, \sigma, \Gamma)$ parameterization, as they provide estimators for the normalized spectral measure. Once the spectral measure and the index $\alpha$ are estimated, the scale parameter $\sigma$ can be estimated by the methods described in Section 5.

4.2.
The empirical characteristic function method

The method described below, proposed in Nolan, Panorska and McCulloch (2001) and investigated in Nolan and Panorska (1997), assumes that the sample comes from an $\alpha$-stable distribution with shift vector $m$ equal to zero. First, estimate the index of stability $\alpha$ and center the data by the sample mean (if $\alpha > 1$) or sample median (if $\alpha < 1$). In Nolan, Panorska and McCulloch (2001) the value of $\alpha$ was estimated by the average $\frac{1}{d} \sum_{j=1}^{d} \hat\alpha_j$, where $\hat\alpha_j$ is an estimate of the index obtained from the univariate sample $X_{1j},\ldots,X_{nj}$ (the quantile method of McCulloch (1986) was used to obtain these). Then, the method uses the sample to estimate the exponent $I$ of the stable ch.f. (2) (with $m = 0$):

$$\hat I_n(t) = -\log \hat\Phi_n(t), \tag{62}$$

where the quantity $\hat\Phi_n$ is the sample characteristic function,

$$\hat\Phi_n(t) = \frac{1}{n} \sum_{j=1}^{n} e^{i \langle t, X_j \rangle}. \tag{63}$$

For some grid $t_1,\ldots,t_k \in S_d$, the quantity

$$\hat I_{\mathrm{ECF}} = \bigl( \hat I_n(t_1),\ldots,\hat I_n(t_k) \bigr)' \tag{64}$$

is the empirical ch.f. (ECF) estimate of $I$. If $\Gamma$ is a discrete measure of the form (12), then the exponent $I$ is given by (14), and we can estimate $\gamma = (\gamma_1,\ldots,\gamma_k)$ by solving the following system of linear equations:

$$I = A\gamma, \tag{65}$$

where $I = \hat I_{\mathrm{ECF}}$ is the estimate of $I$ given by (64) and $A$ is a $k \times k$ (complex) matrix $[a_{ij}]_{i,j=1,\ldots,k}$ with

$$a_{ij} = \omega_{\alpha,1}\bigl( \langle t_i, s_j \rangle \bigr). \tag{66}$$

If the grid is chosen so that the inverse of $A$ exists, then the solution of the system (65) is $\gamma = A^{-1} I$.

For a general spectral measure, divide the unit sphere into $k$ non-overlapping patches $A_j$ with some central points $s_j$, where $j = 1,\ldots,k$, and consider an approximation of $\Gamma$ of the form (12), where $\gamma_j = \Gamma(A_j)$ (which is always possible in view of Proposition 2.4). When $d = 2$, it is convenient to take the arcs

$$A_j = \left( \frac{2\pi (j - 3/2)}{k},\ \frac{2\pi (j - 1/2)}{k} \right], \quad j = 1,\ldots,k, \tag{67}$$

centered at

$$s_j = \left( \cos\frac{2\pi (j - 1)}{k},\ \sin\frac{2\pi (j - 1)}{k} \right) \in S_d, \quad j = 1,\ldots,k. \tag{68}$$

We would again estimate $I$ by (64) and solve the system (65) to obtain the estimates of the weights $\gamma_j$.
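The first half of the ECF method, evaluating (62)-(64) on the grid (68) for bivariate data, can be sketched as follows. This is an illustrative fragment under stated assumptions: the function names are hypothetical, the data are assumed to be already centered, the grid $t_i$ is taken equal to the points $s_j$, and the complex logarithm is the principal branch. The matrix $A$ of (66) is not built here because its entries depend on the function defined in (14).

```python
import numpy as np

def unit_circle_grid(k):
    """Grid points s_j = (cos(2*pi*(j-1)/k), sin(2*pi*(j-1)/k)), j = 1..k, eq. (68)."""
    angles = 2.0 * np.pi * np.arange(k) / k
    return np.column_stack((np.cos(angles), np.sin(angles)))

def sample_chf(data, t_grid):
    """Sample characteristic function, eq. (63), evaluated at each row of t_grid."""
    inner = t_grid @ data.T                 # shape (k, n): entries <t_i, X_j>
    return np.exp(1j * inner).mean(axis=1)

def ecf_exponent(data, t_grid):
    """ECF estimate of the exponent, eqs. (62) and (64), using the principal log."""
    return -np.log(sample_chf(data, t_grid))

# Sanity check on the alpha = 2 boundary case (an assumption made for illustration):
# for N(0, 2*Id) data the ch.f. is exp(-|t|^2), so the exponent equals 1 on the unit circle.
rng = np.random.default_rng(0)
gaussian_data = rng.normal(scale=np.sqrt(2.0), size=(50_000, 2))
exponent_on_grid = ecf_exponent(gaussian_data, unit_circle_grid(8))
```

With heavy-tailed data the same functions apply unchanged; only the interpretation of the resulting exponent through (14) and the linear system (65) differs.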
As reported in Nolan and Panorska (1997), in practice there are some problems with the direct implementation of the above method: the matrix $A$ may be ill-conditioned and the solution of the system (65) may include negative or complex numbers (although the values of $\gamma_j$ must be real and positive). Thus, in practice one should restate the problem as a constrained quadratic programming problem,

$$\text{minimize } \|I - A\gamma\|^2 = (I - A\gamma)'(I - A\gamma) \quad \text{subject to } \gamma \ge 0, \tag{69}$$

which guarantees a nonnegative solution $\gamma$. We refer the readers to Nolan (1998), Nolan and Panorska (1997), Nolan, Panorska and McCulloch (2001) for examples and further discussion of these issues.

4.3. The projection method

The projection (PROJ) method was introduced in McCulloch (1994) and studied in Nolan and Panorska (1997), Nolan, Panorska and McCulloch (2001). As before, assume that the data have been shifted so that the parameter $m$ is zero. The method is similar to the ECF method, since we estimate the weights $\gamma_j$ at $s_j$ of a discrete spectral measure of the form (12) by solving the linear system of equations (65). However, the PROJ method uses a different estimate of $I$, obtained from estimators of univariate stable parameters applied to the one-dimensional samples

$$\langle X_1, t_j \rangle,\ldots,\langle X_n, t_j \rangle, \quad j = 1,\ldots,k, \tag{70}$$

where $t_1,\ldots,t_k \in S_d$ is a suitably chosen grid on the unit sphere. More precisely, for each $t \in R^d$ the r.v. $\langle X_1, t \rangle$ is one-dimensional stable with parameters given by (18)-(20) and ch.f.

$$\phi(u) = E e^{i u \langle t, X \rangle} = E e^{i \langle u t, X \rangle} = \Phi(ut) = e^{-I(ut)}, \tag{71}$$

where $I$ is the characteristic exponent of the $X_j$'s. Now, we can estimate the scale $\hat\sigma(t_j)$ and skewness $\hat\beta(t_j)$ (and also the shift $\hat\mu(t_j)$ if $\alpha = 1$) of the univariate stable law corresponding to the sample (70), and use them to estimate the ch.f. (9) of this univariate law.
Then, we can equate the above estimate with the right-hand side of (71) with $u = 1$ to estimate the quantity $I$ on the grid $t_1,\ldots,t_k$:

$$\hat I_n(t_j) = \begin{cases} \hat\sigma^{\alpha}(t_j) \left( 1 - i \hat\beta(t_j) \tan\dfrac{\pi\alpha}{2} \right) & \text{for } \alpha \ne 1, \\[2mm] \hat\sigma(t_j) \bigl( 1 - i \hat\mu(t_j) \bigr) & \text{for } \alpha = 1. \end{cases} \tag{72}$$

For the index $\alpha$, McCulloch (1994) recommends using the pooled estimate obtained by averaging the univariate estimates obtained for each of the univariate samples (70). Thus, the PROJ estimate of $I$ on the grid $t_1,\ldots,t_k$ is the quantity

$$\hat I_{\mathrm{PROJ}} = \bigl( \hat I_n(t_1),\ldots,\hat I_n(t_k) \bigr)'. \tag{73}$$

Now, the weights $\gamma_j$ of the spectral measure are obtained as before by solving the system (65). For examples and further discussion, please see McCulloch (1994), Nolan and Panorska (1997), Nolan, Panorska and McCulloch (2001).

5. Estimation of the scale parameter

Let us consider the problem of estimating the scale parameter $\sigma$ of the stable $S_\alpha(\sigma, m, \Gamma)$ distribution given by the ch.f. (27) (where $\Gamma$ is the normalized spectral measure). As before, we shall assume that the distribution is strictly stable with $\alpha > 1$, so that $m = 0$. We extend the moment estimators of Davydov and Paulauskas (1999), who considered the case of a symmetric spectral measure. Note that if $X \sim S_\alpha(\sigma, 0, \Gamma)$ then $Y = \sigma^{-1} X \sim S_\alpha(1, 0, \Gamma)$, so that for any $0 < p < \alpha$ we have

$$E\|X\|^p = \sigma^p\, C(\alpha, \Gamma, p), \tag{74}$$

where

$$C(\alpha, \Gamma, p) = E\|Y\|^p \tag{75}$$

is independent of $\sigma$ and can be computed for given values of $\alpha$ and $\Gamma$. Then, approximating $E\|X\|^p$ by the corresponding sample moment we obtain

$$\hat\sigma_n = \left( \frac{1}{n\, C(\alpha, \Gamma, p)} \sum_{j=1}^{n} \|X_j\|^p \right)^{1/p}. \tag{76}$$

Alternatively, we might use a moment estimator for univariate stable variables on the i.i.d. observations

$$\langle X_1, t \rangle,\ldots,\langle X_n, t \rangle \tag{77}$$

for some $t \in R^d$. Then, by (18), (19), the above variables are univariate $S_\alpha(\sigma_t, \beta_t, 0)$, where

$$\sigma_t = \sigma \left( \int_{S_d} |\langle t, s \rangle|^{\alpha}\, \Gamma(ds) \right)^{1/\alpha} \tag{78}$$

and

$$\beta_t = \frac{\int_{S_d} |\langle t, s \rangle|^{\alpha}\, \mathrm{sign}(\langle t, s \rangle)\, \Gamma(ds)}{\int_{S_d} |\langle t, s \rangle|^{\alpha}\, \Gamma(ds)}. \tag{79}$$

Then, for any $0 < p < \alpha$, we have

$$E|\langle X, t \rangle|^p = \sigma^p\, C(\alpha, \beta_t, p)\, C_1(\alpha, \Gamma, p), \tag{80}$$

where $C(\alpha, \beta_t, p)$ is given by (11) and

$$C_1(\alpha, \Gamma, p) = \left( \int_{S_d} |\langle t, s \rangle|^{\alpha}\, \Gamma(ds) \right)^{p/\alpha}. \tag{81}$$

Now, approximating (80) with the sample $p$-moment we obtain the following estimator of $\sigma$:

$$\tilde\sigma_n = \left( \frac{1}{n\, C(\alpha, \beta_t, p)\, C_1(\alpha, \Gamma, p)} \sum_{j=1}^{n} |\langle X_j, t \rangle|^p \right)^{1/p}. \tag{82}$$

6. Extensions to other stable models

In this section we briefly discuss two generalizations of multivariate stable laws that often compete with them in modeling financial data: the $\nu$-stable laws that arise as limiting distributions in the random summation scheme, and operator stable laws arising as limits in ordinary summation (5) but normalized by linear operators $a_n$.

6.1. $\nu$-stable laws

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random vectors in $R^d$ and let $\nu_p$, $p \in (0,1)$, be a family of integer-valued random variables independent of the $X_i$'s. Assuming that $\nu_p$ converges to infinity (in probability) as $p \to 0$, we can study the limiting distributions of the random sums

$$a_p \sum_{j=1}^{\nu_p} (X_j + b_p), \tag{83}$$

where $a_p > 0$ and $b_p \in R^d$. It follows from transfer theorems [see, e.g., Rosiński (1976)] that if the variables $p\nu_p$ converge in distribution to a positive r.v. $Z$ with the Laplace transform $\lambda(s) = E\exp(-sZ)$ and the $X_j$'s are in the domain of attraction of some $\alpha$-stable distribution with ch.f. $\Phi$, then the random sums (83) will converge to a random variable with the ch.f. of the form

$$\Psi(t) = \lambda\bigl( -\log \Phi(t) \bigr). \tag{84}$$

The variables with the ch.f. (84), referred to as the $\nu$-stable laws [see, e.g., Klebanov and Rachev (1996), Kozubowski and Panorska (1998, 1999b)], can be described by the same parameters as the corresponding stable laws: the tail index $\alpha$, location vector $m$, and spectral measure $\Gamma$. Strictly $\nu$-stable laws are given by (84) with a strictly stable ch.f. $\Phi$. We use the notation $\nu_\alpha(m, \Gamma)$ for the distribution corresponding to the ch.f. (84) with $\Phi$ given by (2). The $\nu$-stable laws are essentially location-scale mixtures of stable laws [see, e.g., Kozubowski and Panorska (1998)] and for a light-tailed r.v. $Z$ have the same tail behavior as the corresponding stable laws.
More precisely, the tail behavior of each coordinate of a $\nu$-stable r.v. $X$ is of the form $P(X_k > x) = O(x^{-\alpha})$ as $x \to \infty$ under the following conditions [see Kozubowski and Panorska (1996, 1998)]:

* $EZ < \infty$ if $X$ is strictly $\nu$-stable,
* $EZ^{\alpha \vee 1} < \infty$ and $\alpha \ne 1$, or $E|Z \log Z| < \infty$ and $\alpha = 1$, if $X$ is not strictly $\nu$-stable.

Under the above conditions, the same tail behavior applies to every linear combination $\langle X, b \rangle$ of $X$, the order statistics of the vector $X$ (as well as their absolute values), and the norm $\|X\|$; see Kozubowski and Panorska (1998) for details. Note that these conditions are satisfied, for example, by the geometric stable laws discussed below.

Remark. Although the tails of $\nu$-stable laws are essentially of the same type as those of stable distributions, $\nu$-stable densities may behave very differently near the mode than their stable counterparts (they may be more peaked, or even infinite), which may lead to an improved fit when modeling financial data.

Kozubowski and Panorska (1999b) showed that if the spectral measure is discrete, then truly $d$-dimensional $\nu$-stable random vectors admit a representation similar to that of stable laws given in Proposition 2.3:

Proposition 6.1. Let $Y \sim \nu_\alpha(m, \Gamma)$ with $\Gamma$ of the form (12) and $0 < \alpha < 2$. Then

$$Y \stackrel{d}{=} \begin{cases} Z m + Z^{1/\alpha} \displaystyle\sum_{j=1}^{k} \gamma_j^{1/\alpha} V_j s_j & \text{if } \alpha \ne 1, \\[3mm] Z m + Z \displaystyle\sum_{j=1}^{k} \left( V_j + \frac{2}{\pi} \log(\gamma_j Z) \right) \gamma_j s_j & \text{if } \alpha = 1, \end{cases} \tag{85}$$

where the $V_j$'s are i.i.d. totally skewed, one-dimensional standard stable variables $S_\alpha(1, 1, 0)$, independent of $Z$.

Thus, $\nu$-stable random variates are straightforward to simulate if $\Gamma$ is discrete. Distributions with general $\Gamma$ can be approximated by those with a discrete spectral measure [see Kozubowski and Panorska (1999b)] as in the stable case, so that in practice we can restrict attention to the case with discrete $\Gamma$.

6.1.1. Geometric stable laws

An important special case is that of the limiting distributions of (83) when the variables $\nu_p$ are geometric with mean $1/p$, in which case the variables $p\nu_p$ converge to a standard exponential variable with the Laplace transform $\lambda(s) = (1 + s)^{-1}$.
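The convergence of $p\nu_p$ to a standard exponential variable can be checked numerically by comparing an empirical Laplace transform with $(1+s)^{-1}$. The sketch below is only an illustration: the helper names, the value $p = 0.001$, the sample size, and the use of NumPy's geometric generator (supported on $\{1, 2, \ldots\}$ with mean $1/p$) are assumptions made here.

```python
import numpy as np

def scaled_geometric(p, size, rng):
    """Draw p * nu_p, where nu_p is geometric on {1, 2, ...} with mean 1/p."""
    return p * rng.geometric(p, size=size)

def empirical_laplace_transform(z, s):
    """Monte Carlo estimate of E exp(-s Z) from a sample z."""
    return np.exp(-s * z).mean()

rng = np.random.default_rng(1)
z = scaled_geometric(p=0.001, size=200_000, rng=rng)

s_values = np.array([0.5, 1.0, 2.0])
empirical = np.array([empirical_laplace_transform(z, s) for s in s_values])
limit = 1.0 / (1.0 + s_values)   # Laplace transform of the standard exponential law
```

For small $p$ the two arrays agree up to Monte Carlo error, which is the sense in which the geometric random sum in (83) leads to the exponential mixing variable $Z$ in (84).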
We then obtain the class of geometric stable (GS) laws $GS_\alpha(m, \Gamma)$ with the ch.f.

$$\Psi(t) = \bigl( 1 + I(t) - i \langle t, m \rangle \bigr)^{-1}, \tag{86}$$

where $m \in R^d$ and $I$ is given by (3). In financial applications, where these laws have been successfully applied [see, e.g., Kozubowski and Panorska (1999a), Kozubowski and Rachev (1994), Mittnik and Rachev (1991, 1993a)], the r.v. $\nu_p$ represents the moment when the probabilistic structure governing the returns changes, so that the random sum

$$\sum_{j=1}^{\nu_p} X_j \tag{87}$$

represents the total return up to this random time. In the case $\alpha = 2$, we obtain the multivariate Laplace distribution [see, e.g., Kozubowski and Podgórski (2000)], which may be particularly well suited for financial applications due to its simplicity and flexibility [see, e.g., Kozubowski and Podgórski (2001)], although the tails of these laws, while heavier than Gaussian tails, are not as heavy as those of stable and geometric stable laws. More information on the theory and applications of GS laws can be found in Kozubowski and Rachev (1999).

6.1.2. Statistical issues

Most estimation procedures for stable laws can be extended to the corresponding $\nu$-stable distributions. For simplicity we consider the problem of estimating $\alpha$ and $\Gamma$ of a strictly geometric stable distribution given by the ch.f. (86) with $m = 0$ and $\alpha \ne 2$, based on a random sample

$$Y_1,\ldots,Y_n. \tag{88}$$

For estimating $\alpha$, the tail estimators of Section 3.2 can be applied to one-dimensional samples corresponding to (88) by taking the norms of the $Y_i$'s or their projections $\langle Y_i, b \rangle$ for some $b \in R^d$. These apply regardless of whether the sample is actually geometric stable or only belongs to a geometric stable domain of attraction. Alternatively, assuming that the $Y_i$'s are geometric stable, one can use estimators for univariate geometric stable parameters [see, e.g., Kozubowski (1983, 2001), Rachev and Mittnik (2000)] applied to the projections $\langle Y_i, b \rangle$.
To estimate the spectral measure $\Gamma$, one can use the RXC tail estimator discussed in Section 4.1, since geometric stable distributions have the same domains of attraction as the corresponding stable laws (those with the same $\alpha$ and $\Gamma$); see, e.g., Klebanov and Rachev (1996). Alternatively, the empirical characteristic function method discussed in Section 4.2 can be modified to accommodate the geometric stable case. Assuming that the sample (88) is from a GS distribution, we estimate the exponent $I$ of the GS ch.f. (86) as follows:

$$\hat I_n(t) = \frac{1}{\hat\Phi_n(t)} - 1, \tag{89}$$

where $\hat\Phi_n$ is the sample characteristic function (63) based on the $Y_i$'s. The rest is the same as in the stable case. For some grid $t_1,\ldots,t_k \in S_d$, the quantity (64) is the empirical ch.f. (ECF) estimate of $I$. If $\Gamma$ is a discrete measure of the form (12), then $I$ is given by (14), and we can estimate $\gamma = (\gamma_1,\ldots,\gamma_k)$ by solving the system of linear equations of the form (65), where $I = \hat I_{\mathrm{ECF}}$ is the estimate of $I$ given by (64) and $A$ is a $k \times k$ (complex) matrix with the entries specified in (66). If the inverse of $A$ exists, then the solution of the system (65) is $\gamma = A^{-1} I$. To avoid the same numerical problems as in the stable case, in practice one should restate the problem as the constrained quadratic programming problem (69). The projection method of Section 4.3 can be extended similarly.

6.2. Operator stable laws

If we have heavy-tailed multivariate data with different tail indexes in different directions, then the multivariate stable (as well as the $\nu$-stable) laws are no longer appropriate for modeling such data. Instead, we can consider the class of multivariate laws with stable marginal distributions, introduced in Resnick and Greenwood (1979), that arise as limiting distributions in the summation scheme (5) where the scaling factors are diagonal matrices, $\mathrm{diag}(a_{n1},\ldots,a_{nd})$, for some positive $a_{ni}$'s.
The resulting limiting marginally stable random vectors $X$ possess a stability property similar to (1),

$$X_1 + \cdots + X_n \stackrel{d}{=} n^{E} X + D_n, \tag{90}$$

where the $X_i$'s are i.i.d. copies of $X$, $E$ is a diagonal matrix

$$E = \mathrm{diag}\left( \frac{1}{\alpha_1},\ldots,\frac{1}{\alpha_d} \right), \quad 0 < \alpha_i \le 2,\ i = 1,\ldots,d, \tag{91}$$

called the characteristic exponent of $X$, and $n^{E}$ denotes the diagonal matrix

$$n^{E} = \mathrm{diag}\bigl( n^{1/\alpha_1},\ldots,n^{1/\alpha_d} \bigr). \tag{92}$$

Remark. More general operator stable (OS) laws arise as the limits in (5) when the sums are normalized by some linear operators $a_n$ [see Sharpe (1969)]. For a comprehensive review of the theory of OS laws see Jurek and Mason (1993).

Marginally stable OS laws satisfying (90) with the characteristic exponent $E$ of the form (91) can be described in terms of their characteristic function. If all $\alpha_i$'s are strictly between one and two, we have

$$\Phi(t) = \exp\left\{ C \int_{S_d} \int_0^{\infty} \left( e^{i \langle t, r^{E} s \rangle} - 1 - i \langle t, r^{E} s \rangle \right) \frac{dr}{r^2}\, \Gamma(ds) + i \langle t, m \rangle \right\}, \tag{93}$$

where $m \in R^d$ is the shift parameter, $C > 0$ controls the scale, and $\Gamma$ is a probability measure on the unit sphere $S_d$ (the normalized spectral measure, also called the mixing measure). If all the $\alpha_i$'s of the characteristic exponent $E$ in (91) are equal, then (93) reduces to the stable ch.f. with the same spectral measure. We use the notation $OS(m, C, E, \Gamma)$ to denote the distributions with the ch.f. (93) with $E$ given by (91). Similarly to the stable case, the measure $\Gamma$ determines the dependence among the components of a marginally stable vector. For example, if $X \sim OS(m, C, E, \Gamma)$ is positively or negatively associated, then the spectral measure satisfies the condition (29) or (34), respectively [see Mittnik, Rachev and Rüschendorf (1999)].

6.2.1. Statistical issues

Estimating the parameters of an $OS(m, C, E, \Gamma)$ distribution is similar to the stable case. Since all marginal distributions are univariate stable, one can obtain estimates of the $\alpha_i$'s by using the methods for univariate stable laws (see Section 3.1) for each of the $d$ samples

$$X_{1j},\ldots,X_{nj}, \quad j = 1,\ldots,d. \tag{94}$$

For samples from a domain of attraction of an OS law we can again consider the univariate samples (94) and apply the methods of Section 3.2, or use the moment estimator of $E$ based on the sample covariance matrix [see Meerschaert and Scheffler (1999)].

To estimate $C$ and $\Gamma$, one can use a generalization of the tail estimator of the spectral measure for stable laws described in Section 4.1 [see Mittnik, Rachev and Rüschendorf (1999), Scheffler (1999)]. First, write each of the nonzero data points in the unique form

$$X_i = \tau(X_i)^{E} s_i, \tag{95}$$

where $\tau(X_i) > 0$ is the "radius" of $X_i$ and $s_i$ is a point on the unit sphere $S_d$ [these are the so-called Jurek coordinates, see Jurek (1984)]. Next, for some integer $k = k(n)$ consider the $k$ largest of the $\tau(X_i)$'s, that is, the $k$ largest order statistics

$$\tau(X_{i_1}),\ldots,\tau(X_{i_k}) \tag{96}$$

corresponding to the random sample

$$\tau(X_1),\ldots,\tau(X_n). \tag{97}$$

Then, the estimator of $\Gamma$ is the discrete measure on $S_d$ that assigns the mass of $1/k$ to each of the unit vectors

$$s_{i_1},\ldots,s_{i_k} \tag{98}$$

corresponding to these order statistics via (95). Thus, the probability assigned by the estimated spectral measure to a set $A \subset S_d$ is the fraction of the points (98) falling in the set $A$. The corresponding estimator of $C$ is

$$\hat C = \frac{k}{n}\, Y, \tag{99}$$

where $Y$ is the $k$-th largest of the values (97). More details regarding the estimation of $\Gamma$ (including the asymptotic properties of the estimators) can be found in Mittnik, Rachev and Rüschendorf (1999), Scheffler (1999).

7. Applications

In this section we present an example of fitting bivariate financial data sets with stable models. We fit a bivariate stable model and a bivariate operator stable model to two data sets. Our data consist of 1162 daily DAX30 Index (DAX), FTSE100 Index (UKX), and S&P500 Index (SPX) prices for the period from 1/1/95 to 11/3/99.
The raw indexes are first transformed into log-returns by taking natural logarithms of the quotients of their consecutive values. We analyze the log-returns (1161 observations) $X_t = \ln(Y_t / Y_{t-1})$, where the $Y_t$'s are the raw daily index prices. The goal is to fit reasonable models to the bivariate vectors (DAX, UKX) and (UKX, SPX). This section is modeled after Nolan and Panorska (1997).

We start with Exploratory Data Analysis (EDA), which focuses on general properties of the data with particular attention to the amount of variability in each data set. We first plot the log-returns of the individual indexes as time series (see Figure 1). The plots show a relatively large number of high spikes in the returns, which points to high volatility and the possibility that the log-returns' innovations are non-Gaussian. The next step is to plot density histograms (the total area under a density histogram equals one) of the log-returns and check for indications of long tails, which again suggest more variability than allowed by a Gaussian distribution. It helps at this time to fit a Gaussian distribution to the data and overlay the histogram with the fitted Gaussian density curve. Fitting a Gaussian model amounts to estimating its mean and standard deviation from the data using the sample mean and sample standard deviation. We also check for unimodality and symmetry of the data. The density histograms of the univariate log-returns overlaid with the Gaussian (and stable) models' densities are presented in Figure 2. We note that the histograms are much more peaked in the center and have heavier tails than the Gaussian models. Since the histograms are fairly symmetric and unimodal, the two distributional problems (sharp center peaks and long tails) with the Gaussian model could be alleviated by the stable approach.

Fig. 1. Time series plots of the daily log-returns for the three indexes (1/1/95-11/3/99). Top panel: DAX log-returns. Middle panel: UKX log-returns. Bottom panel: SPX log-returns.

The next step in model fitting is deciding whether we should use stable or operator stable models. To answer that question we have to estimate the tail indexes for all three financial indexes' log-returns. If the $\alpha$'s for a pair of indexes are the same, we fit a stable law to their bivariate distribution; otherwise we work with an operator stable model. To fit univariate stable models to the indexes' log-returns we estimated their parameters using the maximum likelihood procedure of Nolan (2001) and its numerical implementation (STABLE 2.16) due to Nolan and available on his web page. We report the parameters according to the parametrization used by Samorodnitsky and Taqqu (see (9)). Estimation results are summarized in Table 1. We used STABLE 2.16 to compute densities of the stable models with the estimated parameters. To evaluate and compare the stable and Gaussian fits we overlaid density histograms of the data with the stable and Gaussian densities of the models. The results appear in Figure 2. We note a much better fit of the stable models.

From now on we will work under the assumption that the individual stock indexes have univariate stable distributions. Since the tail parameters for DAX and UKX and for DAX and SPX are different, we model the bivariate distribution of DAX and UKX using an operator stable distribution. The tail parameters for UKX and SPX appear to be the same, and thus we will fit a bivariate stable model to the UKX and SPX data. To fit these bivariate models we need estimates of the spectral measures for both the DAX-UKX and UKX-SPX portfolios. To fit an operator stable distribution, we estimated the spectral measure using the method described in Section 6.2.
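The preliminary EDA steps described above (log-returns, Gaussian fit by sample mean and standard deviation, and a rough check for heavy tails) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the short price array is made up for the example and is not the DAX, UKX, or SPX data, and sample excess kurtosis is used here as a simple diagnostic that the chapter itself does not employ.

```python
import numpy as np

def log_returns(prices):
    """Log-returns X_t = ln(Y_t / Y_{t-1}) from a series of raw prices Y_t."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[1:] / prices[:-1])

def gaussian_fit(returns):
    """Gaussian model fitted by the sample mean and sample standard deviation."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean(), returns.std(ddof=1)

def excess_kurtosis(returns):
    """Sample excess kurtosis; clearly positive values hint at tails heavier than Gaussian."""
    returns = np.asarray(returns, dtype=float)
    centered = returns - returns.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2 - 3.0

# Hypothetical price series, used only to show the calling convention.
example_prices = [100.0, 101.5, 99.8, 100.2, 104.9, 103.1, 103.0, 97.4]
example_returns = log_returns(example_prices)
mu_hat, sigma_hat = gaussian_fit(example_returns)
```

A series of $n$ prices yields $n - 1$ log-returns, which matches the 1162 prices and 1161 observations reported above.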
The estimated cumulative normalized (total mass equal to one) spectral measure in radian coordinates is presented in Figure 3. Since the spectral measure seems to be concentrated on the first and third quadrants, we conclude that these variables are positively associated (see our comments in Section 6.2). Conversion to Jurek coordinates was performed using a Fortran program due to Meerschaert (personal communication); all other numerical and graphical work was done by the authors in Splus2000 Professional.

To estimate the bivariate stable spectral measure for the UKX-SPX portfolio we used all four methods described in Section 4: the tail estimators (RXC and DPR), the projection method (PROJ) and the empirical characteristic function method (ECF). The numerical implementation of the RXC, PROJ and ECF estimation procedures was done with the program MVSTABLE (Version 2.0) of Nolan, available on J.P. Nolan's web site (see footnote 3), with 40 weights, that is, using a 40-point estimation grid on the unit circle. The DPR estimator was computed for the first 1,156 ($= 34^2$) observed vectors of UKX-SPX log-returns (from 1/1/95 to 10/26/99). Numerical work for the DPR estimator was done by the authors.

Table 1

Index   alpha   beta   gamma    delta
UKX     1.28    0      0.0055   0.0004
DAX     1.57    0.31   0.0076   0.0041
SPX     1.28    0      0.0058   0.001

Footnote 3: See http://academic2.american.edu/jpnolan/ for Stable 2.16 and MVSTABLE programs.

Fig. 2. Density histograms with Gaussian (solid line) and stable (dashed line) fitted densities. Top panel: DAX log-returns. Middle panel: UKX log-returns. Bottom panel: SPX log-returns.

Fig. 3. Estimated cumulative (normalized) operator spectral measure for (DAX, UKX) data.

Fig. 4. Estimates of the stable spectral measure for (UKX, SPX) vector: solid line, RXC estimator; dotted line, ECF estimator; long-dashed line, PROJ estimator; short-dashed line, DPR estimator.
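For bivariate data, the RXC tail estimator of Section 4.1 and the cumulative spectral measure in angular coordinates can be sketched as below. This is not the MVSTABLE implementation: the function names are hypothetical, the Euclidean norm is assumed, the 20% fraction follows the suggestion quoted in Section 4.1, and the angular bins start at angle zero rather than being centered on the grid points as in (67).

```python
import numpy as np

def rxc_directions(data, fraction=0.2):
    """Unit vectors of the k observations with the largest Euclidean norm.

    The RXC estimate places mass 1/k on each returned direction.
    """
    data = np.asarray(data, dtype=float)
    norms = np.linalg.norm(data, axis=1)
    k = max(1, int(fraction * len(data)))
    largest = np.argsort(norms)[-k:]        # indices of the k largest norms
    return data[largest] / norms[largest, None]

def cumulative_spectral_measure(directions, grid_size=40):
    """Cumulative estimated mass over angles in [0, 2*pi), on grid_size equal arcs."""
    angles = np.mod(np.arctan2(directions[:, 1], directions[:, 0]), 2.0 * np.pi)
    edges = np.linspace(0.0, 2.0 * np.pi, grid_size + 1)
    counts, _ = np.histogram(angles, bins=edges)
    return edges[1:], np.cumsum(counts) / len(directions)

# Tiny hand-made sample, used only to show the calling convention.
toy_data = np.array([[3.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.1, 0.1]])
toy_directions = rxc_directions(toy_data, fraction=0.5)
toy_angles, toy_cumulative = cumulative_spectral_measure(toy_directions, grid_size=4)
```

Mass concentrated at angles in the first and third quadrants shows up as jumps of the cumulative curve on $(0, \pi/2)$ and $(\pi, 3\pi/2)$, which is the pattern read off the figures as positive association.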
The graph of the estimated cumulative normalized spectral measure in radian coordinates for UKX-SPX is given in Figure 4. As the spectral measure appears to be concentrated in the first and third quadrants, we believe that UKX and SPX are positively associated (see Proposition 2.6).

To check the goodness of fit of our model we suggest the methods described in Nolan and Panorska (1997). These include plotting parameters (e.g., scale) of one-dimensional projections of the sample (in several directions) computed first directly from the projected sample and then using the estimate of the spectral measure. A good fit will be indicated by general agreement between the parameters of these projections computed using the two different methods. For a more detailed discussion of the choice of grid size and its relationship with the goodness of fit, we refer the reader to Nolan and Panorska (1997).

To summarize, we performed EDA and fit two data sets, (DAX, UKX) and (UKX, SPX), with bivariate operator stable and stable models. The indexes seem to be positively associated, which is important information in constructing a portfolio. The estimates of the stable spectral measures can be used to estimate the risk of a portfolio using the methods described in Section 2.5.

Acknowledgment

We thank Dr. Mark Meerschaert, Department of Mathematics, University of Nevada, for helpful comments and for his Fortran routines. The research of the first two authors was partially supported by NSF grant DMS-0139927.

References

Aban, I.B., Meerschaert, M.M., 2001. Shifted Hill's estimator for heavy tails. Communications in Statistics. Simulation and Computation 30 (4), 949–962.
Abdul-Hamid, H., Nolan, J.P., 1998. Multivariate stable densities as functions of one dimensional projections. Journal of Multivariate Analysis 67, 80–89.
Akgiray, V., Lamoureux, C.G., 1989. Estimation of stable law parameters: A comparative study. Journal of Business and Economic Statistics 7, 85–93.
Alam, K., Saxena, K.M.L., 1982. Positive dependence in multivariate distributions. Communications in Statistics A10, 1183–1186.
Arad, R.W., 1980. Parameter estimation for symmetric stable distribution. Econometric Reviews 21, 209–220.
Bassi, F., Embrechts, P., Kafetzaki, M., 1988. Risk management and quantile estimation. In: Adler, R.J., Feldman, R.E., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, pp. 111–130.
Bawa, V.C., Elton, E.L., Gruber, M.J., 1979. Simple rules for optimal portfolio selection in stable Paretian markets. Journal of Finance 34, 1041–1047.
Beirlant, J., Teugels, J.L., 1989. Asymptotic normality of Hill's estimator. In: Extreme Value Theory, Oberwolfach, 1987, Lecture Notes in Statistics, Vol. 51. Springer-Verlag, New York, pp. 148–155.
Belkacem, L., Véhel, J., Walter, C., 2000. CAPM, risk and portfolio selection in α-stable markets. Fractals 8 (1), 99–115.
Bergström, H., 1952. On some expansions of stable distributions. Arkiv för Mathematik II 18, 375–378.
Brorsen, B.W., Preckel, P.V., 1993. Linear regression with stably distributed residuals. Communications in Statistics. Theory and Methods 22, 659–667.
Brorsen, W.B., Yang, S.R., 1990. Maximum likelihood estimates of symmetric stable distribution parameters. Communications in Statistics. Simulation and Computation 19 (4), 1459–1464.
Buckle, D.J., 1995. Bayesian inference for stable distributions. Journal of the American Statistical Association 90, 605–613.
Byczkowski, T., Nolan, J.P., Rajput, B., 1993. Approximation of multidimensional stable densities. Journal of Multivariate Analysis 46 (1), 13–31.
Chamberlain, T.W., Cheung, C.S., Kwan, C.C., 1990. Optimal portfolio selection using the general multi-index model: A stable-Paretian framework. Decision Sciences 21, 563–571.
Cheng, B., Rachev, S.T., 1995. Multivariate stable future prices. Mathematical Finance 5 (2), 133–153.
Csörgő, S., Deheuvels, P., Mason, D.M., 1985. Kernel estimates of the tail index of a distribution. The Annals of Statistics 13, 1050–1077.
Csörgő, S., Mason, D.M., 1985. Central limit theorems for sums of extreme values. Mathematical Proceedings of the Cambridge Philosophical Society 98, 547–558.
Csörgő, S., Viharos, L., 1995. On the asymptotic normality of Hill's estimator. Mathematical Proceedings of the Cambridge Philosophical Society 118, 375–382.
Csörgő, S., Viharos, L., 1997. Asymptotic normality of least-squares estimators of tail indices. Bernoulli 3 (3), 351–370.
Danielsson, J., Jansen, D.W., de Vries, C.G., 1996. The method of moments ratio estimator for the tail shape parameter. Communications in Statistics. Theory and Methods 25 (4), 711–720.
Davydov, Yu., Paulauskas, V., 1999. On the estimation of the parameters of multivariate stable distributions. Acta Applicandae Mathematicae 58, 107–124.
Davydov, Yu., Paulauskas, V., Rackauskas, A., 2000. More on p-stable convex sets in Banach spaces. Journal of Theoretical Probability 13 (1), 39–64.
de Haan, L., Resnick, S., 1998. On asymptotic normality of the Hill estimator. Communications in Statistics. Stochastic Models 14 (4), 849–866.
Deheuvels, P., Haeusler, E., Mason, D.M., 1988. Almost sure convergence of the Hill's estimator. Mathematical Proceedings of the Cambridge Philosophical Society 104, 371–381.
Dekkers, A., de Haan, L., 1993. Optimal choice of sample fraction in extreme-value estimation. Journal of Multivariate Analysis 47, 173–195.
Dekkers, A., Einmahl, J., de Haan, L., 1989. A moment estimator for the index of an extreme-value distribution. The Annals of Statistics 17 (4), 1833–1855.
Drees, H., 1996. Refined Pickands estimators with bias correction. Communications in Statistics. Theory and Methods 25 (4), 837–851.
Drees, H., de Haan, L., Resnick, S.I., 2000. How to make a Hill plot. The Annals of Statistics 28 (1), 254–274.
Dostoglou, S.A., Rachev, S.T., 1999. Stable distributions and the term structure of interest rates. Mathematical and Computer Modelling 29, 57–60.
DuMouchel, W., 1971. Stable distributions in statistical inference. Ph.D. Thesis. University of Michigan, Ann Arbor, MI.
DuMouchel, W., 1973. On the asymptotic normality of the maximum-likelihood estimate when sampling from a stable distribution. The Annals of Statistics 1, 948–957.
DuMouchel, W., 1975. Stable distributions in statistical inference 2: Information from stably distributed samples. Journal of the American Statistical Association 70, 386–393.
DuMouchel, W., 1983. Estimating the stable index in order to measure tail thickness: A critique. The Annals of Statistics 11, 1019–1031.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modeling Extremal Events for Insurance and Finance. Springer, Berlin.
Esary, J., Proschan, F., Walkup, D., 1967. Association of random variables with applications. The Annals of Mathematical Statistics 38, 1466–1474.
Fama, E.F., 1965a. The behavior of stock market prices. Journal of Business 38, 34–105.
Fama, E.F., 1965b. Portfolio analysis in a stable Paretian market. Management Science 11, 404–419.
Fama, E., Roll, R., 1971. Parameter estimates for symmetric stable distributions. Journal of the American Statistical Association 66, 331–338.
Feuerverger, A., McDunnough, P., 1977. The empirical characteristic function and its applications. The Annals of Statistics 5, 88–97.
Feuerverger, A., McDunnough, P., 1981a. On the efficiency of empirical characteristic function procedures. Journal of the Royal Statistical Society. Series B 43, 20–27.
Feuerverger, A., McDunnough, P., 1981b. On efficient inference in symmetric stable laws and processes. In: Csörgő, M., Dawson, D.A., Rao, N.J.K., Saleh, A.K. (Eds.), Statistics and Related Topics. North-Holland, Amsterdam, pp. 109–122.
Gamba, A., 1999. Portfolio analysis with symmetric stable Paretian returns. In: Canestrelli, E. (Ed.), Current Topics in Quantitative Finance. Springer-Verlag, Heidelberg, pp. 48–69.
Gamrowski, B., Rachev, S.T., 1996. Testing the validity of value-at-risk measures. In: Heyde, C., et al. (Eds.), Applied Probability. Springer-Verlag, Berlin, pp. 307–320.
Goldie, C.M., Smith, R.L., 1987. Slow variation with remainder: Theory and applications. The Quarterly Journal of Mathematics. Oxford. Second Series 38, 45–71.
Haeusler, E., Teugels, J.L., 1985. On the asymptotic normality of Hill's estimator for the exponent of regular variation. The Annals of Statistics 13, 743–756.
Hall, P., 1982. On some simple estimates of an exponent of regular variation. Journal of the Royal Statistical Society. Series B 44, 37–42.
Hall, P., Welsh, A.H., 1984. Best attainable rates of convergence for estimates of regular variation. The Annals of Statistics 12, 1079–1083.
Hall, P., Welsh, A.H., 1985. Adaptive estimates of parameters of regular variation. The Annals of Statistics 13, 331–341.
Heathcote, C.R., Rachev, S.T., Cheng, B., 1995. Testing multivariate symmetry. Journal of Multivariate Analysis 54 (1), 91–112.
Hill, B., 1975. A simple general approach to inference about the tail of a distribution. The Annals of Statistics 3, 1163–1173.
Höpfner, R., Rüschendorf, L., 1999. Comparison of estimators in stable models. Mathematical and Computer Modelling 29, 145–160.
Hurst, S.H., Platen, E., Rachev, S.T., 1999. Option pricing for a logstable asset price model. Mathematical and Computer Modelling 29, 105–119.
Janicki, A., Popova, I., Ritchken, R., Woyczynski, W., 1997. Option pricing bounds in an α-stable security market. Communications in Statistics. Stochastic Models 13, 817–839.
Janicki, A., Weron, A., 1994. Simulation and Chaotic Behavior of α-Stable Stochastic Processes. Marcel Dekker, New York.
Jansen, D.W., de Vries, C.G., 1991. On the frequency of large stock returns: Putting booms and busts into perspective. Review of Economic Statistics 73, 18–24.
Jurek, Z., 1984. Polar coordinates in Banach spaces. Bulletin of the Polish Academy of Sciences. Mathematics 32, 61–66.
Jurek, Z., Mason, J.D., 1993. Operator-Limit Distributions in Probability Theory. Wiley, New York.
Karandikar, R., Rachev, S.T., 1995. A generalized binomial model and option formulae for subordinated stock-price processes. Probability and Mathematical Statistics 15, 427–447.
Klebanov, L.B., Melamed, J.A., Rachev, S.T., 1994. On the joint estimation of stable law parameters. In: Anastassiou, G., Rachev, S.T. (Eds.), Approximation, Probability and Related Fields. Plenum Press, New York, pp. 315–320.
Klebanov, L.B., Rachev, S.T., 1996. Sums of random number of random variables and their approximations with ν-accompanying infinitely divisible laws. Serdica Mathematical Journal 22, 471–496.
Koedijk, K.G., Schafgans, M.M., de Vries, C.G., 1990. The tail index of exchange rate returns. Journal of International Economics 29, 93–116.
Kogon, S.M., Williams, D.B., 1998. Characteristic function based estimation of stable distribution parameters. In: Adler, R.J., Feldman, R.E., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, pp. 311–335.
Koutrouvelis, I.A., 1980. Regression-type estimation of the parameters of stable laws. Journal of the American Statistical Association 75, 918–928.
Koutrouvelis, I.A., 1981. An iterative procedure for the estimation of the parameters of the stable law. Communications in Statistics. Simulation and Computation 10, 17–28.
Kozubowski, T.J., 1993. Estimation of the parameters of geometric stable laws. Technical Report No. 253. Department of Statistics and Applied Probability, University of California, Santa Barbara; appeared in: Mathematical and Computer Modelling 29 (10–12), 241–253, 1999.
Kozubowski, T.J., 2001. Fractional moment estimation of Linnik and Mittag-Leffler parameters. Mathematical and Computer Modelling 34, 1023–1035.
Kozubowski, T.J., Panorska, A.K., 1996. On moments and tail behavior of ν-stable random variables. Statistics & Probability Letters 29, 307–315.
Kozubowski, T.J., Panorska, A.K., 1998. Weak limits for multivariate random sums. Journal of Multivariate Analysis 67, 398–413.
Kozubowski, T.J., Panorska, A.K., 1999a. Multivariate geometric stable distributions in financial applications. Mathematical and Computer Modelling 29, 83–92.
Kozubowski, T.J., Panorska, A.K., 1999b. Simulation of geometric stable and other limiting multivariate distributions arising in random summation scheme. Mathematical and Computer Modelling 29, 255–262.
Kozubowski, T.J., Podgórski, K., 2000. A multivariate and asymmetric generalization of Laplace distribution. Computational Statistics 15, 531–540.
Kozubowski, T.J., Podgórski, K., 2001. Asymmetric Laplace laws and modeling financial data. Mathematical and Computer Modelling 34, 1003–1021.
Kozubowski, T.J., Rachev, S.T., 1994. The theory of geometric stable distributions and its use in modeling financial data. European Journal of Operational Research 74, 310–324.
Kozubowski, T.J., Rachev, S.T., 1999. Multivariate geometric stable laws. Journal of Computational Analysis and Applications 1 (4), 349–385.
Kratz, M., Resnick, S.I., 1996. The QQ-estimator for heavy tails. Communications in Statistics. Stochastic Models 12 (4), 699–724.
Lee, M.-L.T., Rachev, S.T., Samorodnitsky, G., 1990a. Association of Stable Random Variables. The Annals of Probability 18, 1759–1764.
Lee, M.-L.T., Rachev, S.T., Samorodnitsky, G., 1990b. Dependence of stable random variables. In: Stochastic Inequalities. IMS Lecture Notes–Monograph Series, Vol. 22, pp. 219–234.
Liu, S.M., Brorsen, B.W., 1995. Maximum likelihood estimation of a GARCH-stable model. Journal of Applied Econometrics 10 (3), 273–285.
Loretan, M., Phillips, P.C.B., 1994. Testing the covariance stationarity of heavy-tailed time series. Journal of Empirical Finance 1, 211–248.
Mandelbrot, B.B., 1963a. New methods of statistical economics. Journal of Political Economics 71, 421–440.
Mandelbrot, B.B., 1963b. The variation of certain speculative prices. Journal of Business 36, 394–419.
Mandelbrot, B., 1967. The variation of some other speculative prices. Journal of Business 40, 393–413.
Mason, D.M., 1982. Laws of large numbers for sums of extreme values. The Annals of Probability 10, 754–764.
McCulloch, J.H., 1979. Linear regression with symmetric stable residuals. Working Paper 63. Economics Department, Ohio State University.
McCulloch, J.H., 1986. Simple consistent estimators of stable distribution parameters. Communications in Statistics. Simulation and Computation 15, 1109–1136.
McCulloch, J.H., 1994. Estimation of bivariate stable spectral densities. Unpublished Manuscript. Department of Economics, Ohio State University.
McCulloch, J.H., 1996. Financial applications of stable distributions. In: Maddala, G.S., Rao, C.R. (Eds.), Statistical Methods in Finance, Handbook in Statistics, Vol. 14. Elsevier, Amsterdam, pp. 393–425.
McCulloch, J.H., 1998. Linear regression with stable disturbances. In: Adler, R.J., Feldman, R.E., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, pp. 359–376.
Meerschaert, M.M., Scheffler, H.P., 1998. A simple robust estimation method for the thickness of heavy tails. Journal of Statistical Planning and Inference 71, 19–34.
Meerschaert, M.M., Scheffler, H.P., 1999. Moment estimator for random vectors with heavy tails. Journal of Multivariate Analysis 71, 145–159.
Meerschaert, M.M., Scheffler, H.P., 2001. Limit Theorems for Sums of Independent Random Vectors. Wiley, New York.
Mittnik, S., Paolella, M.S., 1999. A simple estimator for the characteristic exponent of the stable Paretian distribution. Mathematical and Computer Modelling 29, 161–176.
Mittnik, S., Rachev, S.T., 1991. Alternative multivariate stable distributions and their applications to financial modelling. In: Cambanis, S., et al. (Eds.), Stable Processes and Related Topics. Birkhäuser, Boston, pp. 107–119.
Mittnik, S., Rachev, S.T., 1993a. Modeling asset returns with alternative stable distributions. Econometric Reviews 12 (3), 261–330.
Mittnik, S., Rachev, S.T., 1993b. Reply to comments on Modeling asset returns with alternative stable distributions, and some extensions. Econometric Reviews 12, 347–389.
Mittnik, S., Rachev, S.T., 1996. Tail estimation of the stable index α. Applied Mathematics Letters 9 (3), 53–56.
Mittnik, S., Rachev, S.T., Doganoglu, T., Chenyao, D., 1999. Maximum likelihood estimation of stable Paretian models. Mathematical and Computer Modelling 29, 275–293.
Mittnik, S., Rachev, S.T., Paolella, M.S., 1998. Stable Paretian modeling in finance: Some empirical and theoretical aspects. In: Adler, R.J., Feldman, R.E., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, pp. 79–110.
Mittnik, S., Rachev, S.T., Rüschendorf, L., 1999. Test of association between multivariate stable vectors. Mathematical and Computer Modelling 29, 181–195.
Modarres, R., Nolan, J.P., 1994. A method for simulating stable random vectors. Computers & Statistics 9, 11–19.
Nikias, C.L., Shao, M., 1995. Signal Processing with α-Stable Distributions and Applications. Wiley, New York.
Nolan, J.P., 1998. Multivariate stable distributions: Approximation, estimation, simulation and identification. In: Adler, R.J., Feldman, R.E., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, pp. 509–525.
Nolan, J.P., 2001. Maximum likelihood estimation and diagnostics for stable distributions. In: Barndorff-Nielsen, O., Mikosch, T., Resnick, S. (Eds.), Lévy Processes. Birkhäuser, Boston, pp. 379–400.
Nolan, J.P., Panorska, A.K., 1997. Data analysis for heavy tailed multivariate samples. Communications in Statistics. Stochastic Models 13, 687–702.
Nolan, J.P., Panorska, A.K., McCulloch, J.H., 2001. Estimation of stable spectral measures. Mathematical and Computer Modelling 34, 1113–1122.
Nolan, J.P., Rajput, B., 1995. Calculation of multidimensional stable densities. Communications in Statistics. Simulation and Computation 24, 551–556.
Paulson, A.S., Delehanty, T.A., 1984. Some properties of modified integrated squared error estimators for the stable laws. Communications in Statistics. Simulation and Computation 13, 337–365.
Paulson, A.S., Delehanty, T.A., 1985. Modified weighted squared error estimation procedures with special emphasis on the stable laws. Communications in Statistics. Simulation and Computation 14, 927–972.
Paulson, A.S., Holcomb, E.W., Leitch, R., 1975. The estimation of the parameters of the stable law. Biometrika 62, 163–170.
Phillips, P.C.B., 1993. Comments on "Modeling asset returns with alternative stable distributions, and some extensions." Econometric Reviews 12, 331–338.
Pickands, J., 1975. Statistical inference using extreme-order statistics. The Annals of Statistics 3, 1–13.
Pitt, L., 1982. Positively correlated normal variables are associated. The Annals of Probability 10 (2), 496–499.
Press, S.J., 1972a. Multivariate stable distributions. Journal of Multivariate Analysis 2, 444–462.
Press, S.J., 1972b. Estimation of univariate and multivariate stable distributions. Journal of the American Statistical Association 67, 842–846.
Press, S.J., 1982. Applied Multivariate Analysis, 2nd edition. Krieger, Malabar.
Qiou, Z., Ravishanker, N., 1995. Bayesian inference for stable law parameters. Technical Report. University of Connecticut.
Rachev, S.T., Han, S., 2000. Portfolio management with stable distributions. Mathematical Methods of Operations Research 51 (2), 341–352.
Rachev, S., Mittnik, S., 2000. Stable Paretian Models in Finance. Wiley, Chichester.
Rachev, S.T., Rüschendorf, L., 1994. On the Cox–Ross and Rubinstein model for option pricing. Theory of Probability and its Applications 39, 150–190.
Rachev, S.T., Samorodnitsky, G., 1993. Option pricing formula for speculative prices modelled by subordinated stochastic processes. Pliska 19, 175–190.
Rachev, S.T., Xin, H., 1993. Test for association of random variables in the domain of attraction of multivariate stable law. Probability and Mathematical Statistics 14 (1), 125–141.
Ravishanker, N., Qiou, Z., 1998. Bayesian inference for time series with infinite variance stable innovations. In: Adler, R.J., Feldman, R.E., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, pp. 259–280.
Resnick, S.I., 1997. Heavy tail modeling and teletraffic data. The Annals of Statistics 25 (5), 1805–1869.
Resnick, S.I., 1998. Why non-linearities can ruin heavy-tailed modeler's day. In: Adler, R.J., Feldman, R.E., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, pp. 219–239.
Resnick, S., Greenwood, P., 1979. A bivariate stable characterization and domains of attraction. Journal of Multivariate Analysis 9, 206–221.
Resnick, S.I., Stărică, C., 1997. Smoothing the Hill estimator. Advances in Applied Probability 29, 271–293.
Rosen, O., Weissman, I., 1996. Comparison of estimation methods in extreme value theory. Communications in Statistics. Theory and Methods 25 (4), 759–773.
Rosiński, J., 1976. Weak compactness of laws of random sums of identically distributed random vectors in Banach spaces. Colloquium Mathematicum 35, 313–325.
Rvačeva, E., 1962. On domains of attraction of multidimensional distributions. In: Select. Transl. Math. Stat. Prob., Vol. 2. American Mathematical Society, Providence, RI, pp. 183–205.
Samorodnitsky, G., Taqqu, M., 1994. Stable Non-Gaussian Random Processes. Chapman & Hall, New York.
Scheffler, H.P., 1999. On the estimation of the spectral measure of certain nonnormal operator stable laws. Statistics & Probability Letters 43 (4), 385–392.
Schultze, J., Steinebach, J., 1996. On least squares estimates of an exponential tail coefficient. Statistics & Decisions 14, 353–372.
Sharpe, M., 1969. Operator-stable probability distributions on vector groups. Transactions of the American Mathematical Society 136, 51–65.
Stuck, B.W., 1976. Distinguishing stable probability measures. Part I: Discrete time. Bell System Technical Journal 55, 1125–1182.
Uchaikin, V.V., Zolotarev, V.M., 1999. Chance and Stability: Stable Distributions and their Applications. VSP, Utrecht.
Weron, R., 1996. On the Chambers–Mallows–Stuck method for simulating skewed stable random variables. Statistics & Probability Letters 28, 165–171.
Ziemba, W.T., 1974. Choosing investment portfolios when the returns have stable distributions. In: Hammer, P.L., Zoutendijk, G. (Eds.), Mathematical Programming in Theory and Practice. North Holland, Amsterdam, pp. 443–482.

Chapter 5

JUMP-DIFFUSION MODELS

WOLFGANG J. RUNGGALDIER
Dipartimento di Matematica Pura ed Applicata, Università di Padova, 7 Via Belzoni, I-35131, Padova, Italy
web: http://www.math.unipd.it/runggaldier/index.html

Contents
Abstract
Keywords
1. Introduction
2. Preliminaries
2.1. Univariate point processes (Poisson jump processes)
2.2. Multivariate and marked point processes
2.3. Martingale representation
2.4. Exponential formula; generalized Itô formula
2.5. Absolutely continuous transformation of measures
3. Market models with jump-diffusions
3.1. Asset-price and term structure models with additive jumps
3.1.1. Asset price models with jumps
3.1.2. Term structure models with jumps
3.2. Jump-diffusion models driven by hidden jump processes
3.3. Asset prices as diffusions sampled at the jump times of a jump process
4. Martingale measures: Existence and uniqueness (Market price of risk and market completion)
4.1. The case of jump-diffusion asset price models
4.2. The case of jump-diffusion term structure models
5. Hedging in jump-diffusion market models
5.1. Hedging when the market is completed
5.1.1. Asset-price models
5.1.2. Term structure models
5.2. Hedging when the market is not complete
6. Pricing in jump-diffusion models
6.1. General aspects
6.2. Computational aspects
References

Handbook of Heavy Tailed Distributions in Finance, Edited by S.T. Rachev
© 2003 Elsevier Science B.V. All rights reserved

170 W.J. Runggaldier

Abstract

We discuss jump-diffusion type models for financial markets as well as methods for pricing and hedging of contingent claims in such markets. We consider both asset price and term structure models, and deal also with situations when there is a stochastic volatility correlated with the jumps and when one has very small time scales, i.e., high frequency data. To make the presentation as self-contained as possible, in a preliminary section we recall some basic notions from stochastic analysis for jump-diffusions.

Keywords

jump-diffusions, Poisson point processes, marked point processes, Cox processes, hidden processes, martingale measures, market price of risk, pricing and hedging in incomplete markets, market completion, risk minimization, stochastic volatility, high frequency data, computing expectations of functionals of jump-diffusions

Ch. 5: Jump-Diffusion Models 171

1. Introduction

Most of the standard literature in Finance, in particular for pricing and hedging of contingent claims, is based on the assumption that the prices of the underlying assets follow a diffusion-type process, in particular a geometric Brownian motion (GBM).
Documentation from various empirical studies shows that such models are inadequate, both in their descriptive power and because of the mispricing that they may induce. The contributions to the present volume deal with various generalizations of the basic GBM; here we concentrate on the fact that returns of various asset prices and interest rates may exhibit jumps. We thus study possible superpositions of jump and diffusion processes, namely what are called jump-diffusion processes. Jump-diffusions form a particular class of Lévy processes. Our purpose here is not to study the general case of a Lévy driving process, but rather to concentrate on the specific aspects of the subclass of jump-diffusions. Jump-diffusion models also have some intuitive appeal in that they let prices and interest rates change continuously most of the time, while taking into account the fact that from time to time larger jumps may occur that cannot be adequately modeled by pure diffusion-type processes.

Among the earlier empirical studies documenting a jumping behaviour in prices and interest rates, one may quote Ball and Torous (1985) and Jorion (1988). There are also studies, such as Babbs and Webber (1997), putting forward specific sources of jumps in interest rates, like moves by central banks. On the other hand, a first approach developing further the basic Black and Scholes (BS) model with the inclusion of jumps appears to be that of Merton (1976). Since the introduction of jumps in the BS model implies that derivative prices are no longer determined by the principle of absence of arbitrage alone, Merton solved the pricing problem by assuming that the jump risk was not systematic. This was later criticized by showing that such an assumption is equivalent to the existence of a market portfolio that contains the underlying asset and that does not exhibit jumps [for a discussion on this point see, e.g., Björk and Näslund (1998)].
Further studies then appeared showing that jumps in stock returns are indeed systematic. Another early approach is that in Cox and Ross (1976), where the market remains complete, however, since the authors consider just a simple jump-type process with fixed jump amplitude and thus with a single source of randomness. One of the major purposes of this chapter is to give an overview of the state of the art of jump-diffusion modeling in stock and bond markets as well as of the corresponding approaches for pricing and hedging.

It was further documented in empirical studies [see, e.g., Bakshi, Cao and Chen (1997)] that a combination of jumps and stochastic volatility leads to even better fits and makes it possible to avoid implied volatility skews. Stochastic volatility models are treated elsewhere, so in this chapter we limit ourselves to stochastic volatility in conjunction with jump-diffusion modeling, also because empirical documentation gives evidence of a jump-type behaviour in the volatility and of a correlation between jumps in volatility and jumps in prices. In fact, as mentioned, e.g., in Naik (1993), it is natural to expect that, if the volatility jumps, the price should jump as well. A further purpose of the present chapter is then to discuss issues related to such more general jump-diffusion-stochastic-volatility modeling.

On very small time scales actual prices do not really change continuously over time, but rather at discrete random points in time in reaction to trade and/or significant new information. We shall show that such situations can also be captured by models featuring a combination of diffusion and jump processes that is, however, different from the canonical jump-diffusion processes.

The outline of the chapter is as follows.
In Section 2 we recall some preliminary notions from stochastic analysis for jump-diffusion processes, such as a martingale representation result and generalized versions of the Itô formula as well as of the Girsanov measure transformation. We limit ourselves to those notions that will be used in the sequel.

In Section 3 we then describe various market models based on jump-diffusion representations. More precisely, in line with the introductory remarks above, we shall consider first canonical jump-diffusion models for stock and bond markets, then jump-diffusions correlated with stochastic volatility and, finally, combinations of diffusions and jumps to describe high frequency data.

In Section 4 we discuss existence and uniqueness of martingale measures in a jump-diffusion setting, exhibiting also the market price of (jump-diffusion) risk. Some emphasis is given to the notion of completion of the market as a tool to obtain a unique martingale measure. In this context it is also pointed out that uniqueness of a martingale measure does not necessarily imply completeness of the market in the sense of hedging, namely that every claim can be duplicated by a self-financing portfolio.

In Section 5 we then concentrate on hedging in jump-diffusion market models with two goals in mind: first, to investigate whether and when a market that has been completed to yield a unique martingale measure is also complete in the sense of hedging; second, to study the hedging problem when the market cannot be completed or market completion is inappropriate. In such cases there is always some residual risk, and so one may want to choose a strategy that minimizes a criterion related to this risk.

Finally, Section 6 is devoted to the problem of pricing in jump-diffusion market models. With jumps and/or stochastic volatility, the market is incomplete.
The principle of absence of arbitrage alone is then insufficient to define a price uniquely, and so the preference structure of investors has to come into play to determine a pricing measure. From the point of view of pure pricing, the problem reduces formally to that of determining a specific martingale measure. In Section 6.1 we mention various approaches to this effect from the literature, in particular approaches based on market completion and on the relationship between the choice of a hedging criterion and that of a martingale measure. Once a martingale measure has been chosen, there remains the problem of actually computing the expectation of the discounted claim; this is dealt with in Section 6.2.

Unavoidably, this overview of the state of the art may turn out to be incomplete, and it reflects the specific interests and competences of the author. Among the topics that are not discussed here, we mention only American-type options in a jump-diffusion setting [for this see, e.g., Mulinacci (1996), Pham (1997)] and portfolio optimization [see, e.g., Framstad, Oeksendal and Sulem (2001) and references therein]. The same has to be said about the references to the literature: while we have tried to take into account a good deal of recent papers on the subject, we have quoted only a small selection of previous papers in order to keep the list within a reasonable size. Still, we hope to have succeeded in giving a sufficiently comprehensive account of models and methods related to jump-diffusions in financial markets.

2. Preliminaries

In this section we recall basic definitions and results needed for the study of jump-diffusion models, limiting ourselves to multivariate (univariate) and marked point processes and assuming that the reader is familiar with the corresponding notions concerning diffusion processes.
In addition to the basic definitions we recall here a martingale representation result and discuss the Itô formula and Girsanov's measure transformation, generalized to jump-diffusion processes. The main reference for this section is Brémaud (1981), from which most of the contents of the section are taken.

2.1. Univariate point processes (Poisson jump processes)

A point process is intended to describe events that occur randomly over time. It can be represented as a sequence of nonnegative random variables $0 = T_0 < T_1 < T_2 < \cdots$, where the generic $T_n$ is the $n$-th instant of occurrence of an event. One makes the usual assumption of nonexplosion, according to which $T_\infty = \lim \uparrow T_n = +\infty$. The process may equivalently be represented via its associated counting process $N_t$, where $N_t = n$ if $t \in [T_n, T_{n+1})$, $n \ge 0$, or, equivalently,

$$N_t = \sum_{n \ge 1} \mathbf{1}_{\{T_n \le t\}}. \tag{1}$$

It counts the number of events up to and including time $t$. The nonexplosion condition becomes $N_t < \infty$ for $t \ge 0$. Both $T_n$ and $N_t$ are defined on some probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\mathcal{F}_t$ to which $N_t$ is adapted.

A point process $N_t$ is called a Poisson point process if
(i) $N_0 = 0$;
(ii) $N_t$ is a process with independent increments;
(iii) $N_t - N_s$ is a Poisson random variable with a given parameter $\Lambda_{s,t}$.

Usually one assumes $\Lambda_{s,t} = \int_s^t \lambda_u \, du$ for a deterministic function $\lambda_t$; the latter is called the intensity of the Poisson point process $N_t$. If $\mathcal{F}_t$ is the filtration $\mathcal{F}^N_t$ generated by $N_t$, and $\lambda_t \equiv 1$, then $N_t$ is called a standard Poisson process. It is also easily seen that, if $N_t$ is a Poisson process with constant intensity $\lambda_t \equiv \lambda$, then the $T_{n+1} - T_n$ are i.i.d. exponential random variables with parameter $\lambda$.
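As a numerical illustration of the two equivalent representations above, the following minimal sketch simulates the jump times $T_n$ of a Poisson process with constant intensity by summing i.i.d. exponential inter-arrival times, and then evaluates the counting process of equation (1). The function names `simulate_jump_times` and `counting_process`, the seed, and the parameter values are assumptions made for this example only; they do not come from the chapter or from Brémaud (1981).

```python
import bisect
import random
from typing import List


def simulate_jump_times(lam: float, horizon: float, seed: int = 0) -> List[float]:
    """Jump times T_1 < T_2 < ... <= horizon for a constant intensity lam (illustrative)."""
    rng = random.Random(seed)
    times: List[float] = []
    t = rng.expovariate(lam)  # T_1 - T_0, with T_0 = 0
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(lam)  # T_{n+1} - T_n is Exp(lam), independent of the past
    return times


def counting_process(times: List[float], t: float) -> int:
    """N_t = number of n >= 1 with T_n <= t, as in equation (1); times must be sorted."""
    return bisect.bisect_right(times, t)


if __name__ == "__main__":
    lam, horizon = 2.0, 1000.0
    jumps = simulate_jump_times(lam, horizon, seed=42)
    # N_t / t should be close to lam for a long horizon.
    print(counting_process(jumps, horizon) / horizon)
```

Because `bisect_right` counts all entries less than or equal to `t`, a jump occurring exactly at time `t` is included, in line with the remark that $N_t$ counts events up to and including time $t$.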
A natural interpretation of the intensity and of the exponential inter-arrival property comes from relating the above setup to the usual Poisson model, which is based on the following assumptions:
(a) the probability of one change/jump in an interval of length $\Delta$ is $\lambda\Delta + o(\Delta)$;
(b) the probability of two or more changes/jumps in an interval of length $\Delta$ is $o(\Delta)$;
(c) the numbers of changes/jumps in nonoverlapping intervals are stochastically independent.

In this setup one can in fact consider two "dually" related random variables:
(1) A discrete random variable $X$ describing the number of changes/jumps in a time interval of given length $T$ and having as distribution the usual Poisson distribution, i.e.,
\[
P\{X = k\} = \frac{(\lambda T)^k}{k!}\, e^{-\lambda T}, \quad k \in \mathbb{N}.
\]
(2) A continuous random variable $T$ describing the time that is needed to obtain $k$ successive changes/jumps and for which the distribution is of the Gamma type with density
\[
f_T(t) = \frac{\lambda^k}{\Gamma(k)}\, t^{k-1} e^{-\lambda t}, \quad t > 0.
\]

The parameter $\lambda$ is the same in both cases and corresponds to the $\lambda$ in assumption (a) above.

It will be convenient to consider also the case when the intensity of a Poisson process is itself an adapted process driven by some background process. This can be explained by a two-step randomization procedure: first one draws at random a trajectory of the background process, say $Z_t$; then one generates a Poisson process with intensity $\lambda_t = \lambda(t, Z_t)$, where the explicit dependence on $t$ makes it possible to incorporate seasonality effects. We now have a Poisson process $N_t$ conditionally on $Z_t$, and it is called a doubly stochastic Poisson process, or a Cox process [see Cox (1955)]. Formally, we require that the random intensity $\lambda_t$ is $\mathcal{F}_0$-measurable, i.e., $\mathcal{F}^Z_\infty \subset \mathcal{F}_0$. For additional details on the intensity of a Poisson process we refer to Brémaud (1981).
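The two-step randomization behind a Cox process can be sketched as follows. This is an illustration under several assumptions that are not in the text: the background path $Z$ is taken piecewise constant on unit intervals with values in a small finite set, the intensity is $\lambda(t,z) = z\,(1 + 0.5\sin 2\pi t)$ to mimic a seasonal pattern, and the second step uses thinning of a dominating homogeneous Poisson process, a standard simulation device that the chapter itself does not discuss. All names and constants are hypothetical.

```python
import math
import random

LEVELS = [0.5, 2.0]             # possible values of Z (assumption)
HORIZON = 10.0                  # time horizon (assumption)
LAM_MAX = 1.5 * max(LEVELS)     # upper bound for the intensity below


def sample_background(horizon, levels, rng):
    """Step 1: draw a piecewise-constant path of Z, one level per unit interval."""
    return [rng.choice(levels) for _ in range(math.ceil(horizon))]


def intensity(t, z):
    """lambda(t, z): level z modulated by a seasonal factor in [0.5, 1.5]."""
    return z * (1.0 + 0.5 * math.sin(2.0 * math.pi * t))


def cox_jump_times(z_path, horizon, lam_max, rng):
    """Step 2: given the path of Z, generate jump times with intensity
    lambda(t, Z_t) by thinning a Poisson process of constant rate lam_max."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)          # next candidate time
        if t > horizon:
            return times
        z = z_path[min(int(t), len(z_path) - 1)]
        # accept the candidate with probability lambda(t, Z_t) / lam_max
        if rng.random() * lam_max <= intensity(t, z):
            times.append(t)


rng = random.Random(0)
z_path = sample_background(HORIZON, LEVELS, rng)
cox_times = cox_jump_times(z_path, HORIZON, LAM_MAX, rng)
```

Drawing the whole path of $Z$ before any jump time is generated mirrors the requirement that the intensity be $\mathcal{F}_0$-measurable: the jumps of $N_t$ never feed back into $Z_t$.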
Notice that the above characterizations (i)-(iii) of a Poisson process parallel those of a Wiener process: both are processes with independent increments; the increments of a Wiener process are normally distributed, while those of a Poisson process are Poisson distributed. The Wiener process is the basic building block for processes with continuous trajectories; the Poisson process is a basic building block for processes with jumping trajectories. On the other hand, while the Wiener process is itself a martingale, a Poisson process as such is not. It becomes a martingale if one subtracts from $N_t$ the process given by its mean. Indeed,

\[
M_t := N_t - \int_0^t \lambda_s\,ds \tag{2}
\]

is an $\mathcal{F}_t$-martingale by the $\mathcal{F}_0$-measurability of $\lambda_t$, assuming in addition that $E\{\int_0^t \lambda_u\,du\} < \infty$. By (iii) one then has in fact

\[
E\{N_t - N_s \mid \mathcal{F}_s\} = E\Big\{\int_s^t \lambda_u\,du \,\Big|\, \mathcal{F}_s\Big\}, \tag{3}
\]

which implies that $E\{N_t\} < \infty$ and that $M_t$ in (2) is an $\mathcal{F}_t$-martingale. Equality (3) admits a generalization in the form

\[
E\Big\{\int_0^\infty C_s\,dN_s\Big\} = E\Big\{\int_0^\infty C_s \lambda_s\,ds\Big\}, \tag{4}
\]

which has to be valid for all nonnegative, $\mathcal{F}_t$-predictable processes $C_t$ and as such characterizes a doubly stochastic Poisson process with intensity $\lambda_t$ [see Brémaud (1981)].

2.2. Multivariate and marked point processes

Let $T_n$ be a (univariate) point process and $Y_n$, $n \ge 1$, a sequence of random variables with values in $\{1, 2, \ldots, K\}$, all defined on the same $(\Omega, \mathcal{F}, P)$. For each $k = 1, \ldots, K$ we may then consider the counting process

\[
N_t(k) := \sum_{n \ge 1} \mathbf{1}_{\{T_n \le t\}} \mathbf{1}_{\{Y_n = k\}}.
\]

Each $N_t(k)$ is a univariate point process and the various $N_t(k)$'s have no common jumps, i.e., $\Delta N_t(k)\,\Delta N_t(h) = 0$ for all $t \ge 0$ and all $k \ne h$. Analogously to the case of univariate point processes, here too we have two equivalent representations, either as the double sequence $(T_n, Y_n)_{n \ge 1}$ or as the $K$-vector process $N_t = (N_t(1), \ldots, N_t(K))$, and this process is called a multivariate, more precisely a $K$-variate, point process.
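The passage from the double sequence $(T_n, Y_n)$ to the $K$-vector process can be made concrete with a few lines of code. The sketch below uses a small hand-written sequence of jump times and marks (purely illustrative data) and a hypothetical helper `multivariate_counts`.

```python
def multivariate_counts(times, marks, t, K):
    """Return [N_t(1), ..., N_t(K)] for the double sequence (T_n, Y_n).

    Each event with T_n <= t is counted in exactly one component,
    namely the one selected by its mark Y_n in {1, ..., K}.
    """
    counts = [0] * K
    for tn, yn in zip(times, marks):
        if tn <= t:
            counts[yn - 1] += 1
    return counts


# Illustrative data (assumption): four events with marks in {1, 2, 3}.
times = [0.4, 1.1, 2.5, 3.0]
marks = [2, 1, 2, 3]

counts = multivariate_counts(times, marks, t=2.5, K=3)
total = sum(counts)   # equals N_t, the number of events up to time t
```

Because every event carries a single mark, the components never jump together, and their sum recovers the underlying univariate counting process.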
As in the univariate case, here too we shall mainly use the representation as the $K$-vector process $N_t$, and we have formula (2) with $M_t$ a $K$-vector and $\lambda_t$ the $K$-vector intensity process whose components are the individual intensities of the components $N_t(k)$ of $N_t$.

Considering the representation $(T_n, Y_n)$, we may interpret $T_n$ as the $n$-th occurrence of some phenomenon and $Y_n$ as an attribute or mark of this phenomenon. We may then speak of $(T_n, Y_n)$ as a marked point process, or space-time point process, and extend its definition to allow $Y_n$ to take values in a general measurable mark space $(E, \mathcal{E})$. We summarize the foregoing in the following

Definition 2.1. An $E$-marked point process is a double sequence $(T_n, Y_n)_{n \ge 1}$ where
(i) $T_n$ is a (univariate) point process;
(ii) $Y_n$ is a sequence of $E$-valued random variables.

Obviously, the univariate and multivariate point processes are special cases of a marked point process.

Generalizing the representation of a multivariate point process in the form of the $K$-vector process $N_t$, we associate to each $A \in \mathcal{E}$ the counting process

\[
N_t(A) := \sum_{n \ge 1} \mathbf{1}_{\{T_n \le t\}} \mathbf{1}_{\{Y_n \in A\}}
\]

and let simply $N_t := N_t(E)$. Considering the filtration $\mathcal{F}^N_t := \sigma\{N_s(A);\ s \le t,\ A \in \mathcal{E}\}$, define the associated (random) counting measure

\[
p((0,t], A) = N_t(A), \quad t \ge 0,\ A \in \mathcal{E}, \tag{5}
\]

which is $\sigma$-finite under the assumption of nonexplosion of $T_n$. This measure makes it possible to obtain more concise expressions via integrals of the form

\[
\int_0^t \int_E H(s,y)\,p(ds,dy) = \sum_{n \ge 1} H(T_n, Y_n)\,\mathbf{1}_{\{T_n \le t\}} = \sum_{n=1}^{N_t} H(T_n, Y_n). \tag{6}
\]

Again, we may represent an $E$-marked point process equivalently as the double sequence $(T_n, Y_n)$ or as the counting measure $p(ds,dy)$.

To introduce now the intensity process in this more general setup, assume that for each $A \in \mathcal{E}$ the point process $N_t(A)$ admits the intensity $\lambda_t(A)$.
This then leads to a measure-valued intensity $\lambda_t(dy)$ so that, generalizing (4), one has

\[
E\Big\{\int_0^\infty \int_E H(s,y)\,p(ds,dy)\Big\} = E\Big\{\int_0^\infty \int_E H(s,y)\,\lambda_s(dy)\,ds\Big\}, \tag{7}
\]

that has to be valid for all nonnegative $\mathcal{F}_t$-predictable $E$-marked processes $H$ (given a filtration $\mathcal{F}_t$ on $\Omega$, $\mathcal{F}_t$-predictability here means measurability with respect to $\mathcal{P}(\mathcal{F}_t) \otimes \mathcal{E}$, where $\mathcal{P}(\mathcal{F}_t)$ is the predictable $\sigma$-field on $(0,\infty) \times \Omega$). We have also the generalization of (2) in the form

\[
q(ds,dy) = p(ds,dy) - \lambda_s(dy)\,ds, \tag{8}
\]

where $q(ds,dy)$ is a (signed) measure-valued martingale in the sense that $\int_0^t \int_E H(s,y)\,q(ds,dy)$ is a $(P, \mathcal{F}_t)$-martingale (local martingale) for each $\mathcal{F}_t$-predictable $E$-marked process $H$ satisfying appropriate integrability conditions.

The most common form of intensity is

\[
\lambda_t(dy) = \lambda_t\, m_t(dy), \tag{9}
\]

where $\lambda_t$ is nonnegative $\mathcal{F}_t$-predictable and represents the intensity of the Poisson process $N_t(E)$, while $m_t(dy)$ is a probability measure on $E$ (typically, the $Y_n$ will be i.i.d., independent of $N_t(E)$). The pair $(\lambda_t, m_t(dy))$ is called the $(P, \mathcal{F}_t)$-local characteristics of $p(ds,dy)$.

Notice finally that, as in the univariate case, we may let $\lambda_t(dy)$ depend on some driving $\mathcal{F}_0$-measurable random process $Z_t$, leading to a doubly stochastic marked point process. If, in the representation (9), $\lambda_t$ is a deterministic time function, the marked point process is called a marked Poisson process.

2.3. Martingale representation

Martingale representation results are widely used in Finance, especially when it comes to solving hedging problems. For pure "Wiener-martingales" we have in fact the well-known result that every square integrable martingale with respect to the filtration generated by a Wiener process is, up to an additive constant, a stochastic integral of the Ito type. We shall now recall a corresponding result for point-process martingales that we formulate in the most general case of a marked point process. We have in fact the following theorem [see Theorem VIII, T8 in Brémaud (1981)]

Theorem 2.2.
Let $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ be a probability space satisfying the "usual assumptions" where $\mathcal{F}_t = \mathcal{F}_0 \vee \mathcal{F}^p_t$ with $\mathcal{F}^p_t$ the filtration generated by a marked point process, represented by the counting measure $p(dt,dy)$. Then any $(P, \mathcal{F}_t)$-martingale $M_t$ admits the representation

\[
M_t = M_0 + \int_0^t \int_E H(s,y)\,q(ds,dy) \tag{10}
\]

with $q(\cdot)$ as in (8) and $H$ an integrable (with respect to $\lambda_t(dy)$) $\mathcal{F}_t$-predictable $E$-marked process. This representation is essentially unique.

In the case of a multivariate (and therefore also univariate) point process, the representation (10) becomes

\[
M_t = M_0 + \sum_{k=1}^K \int_0^t H_s(k)\,\big[dN_s(k) - \lambda_s(k)\,ds\big], \tag{11}
\]

where $[H_t(1), \ldots, H_t(K)]$ is $\mathcal{F}_t$-predictable with $H_t(k)$ integrable with respect to $\lambda_t(k)$.

This representation result can be generalized according to Jacod and Shiryaev (1987) to include martingales that are simultaneously "Wiener" and point-process martingales and that will have some relevance later on.

Theorem 2.3. Given a Wiener process $w_t$ and a marked point process $p(ds,dy)$, let

\[
\mathcal{F}_t := \sigma\big\{w_s,\ p((0,s],A),\ B;\ 0 \le s \le t,\ A \in \mathcal{E},\ B \in \mathcal{N}\big\}
\]

with $\mathcal{N}$ the collection of $P$-null sets from $\mathcal{F}$. Then any $(P, \mathcal{F}_t)$-local martingale $M_t$ has the representation

\[
M_t = M_0 + \int_0^t \phi_s\,dw_s + \int_0^t \int_E H(s,y)\,\big[p(ds,dy) - \lambda_s(dy)\,ds\big], \tag{12}
\]

where $\phi_t$ is predictable and square integrable and $H$ is an $\mathcal{F}_t$-predictable $E$-marked process, integrable with respect to $\lambda_t(dy)$.

2.4. Exponential formula; generalized Ito formula

With the definition of a marked point process and of integrals in the form of (6), we may now consider processes of the general type

\[
X_t = X_0 + \int_0^t \alpha_s\,ds + \int_0^t \beta_s\,dw_s + \int_0^t \int_E \gamma(s,y)\,p(ds,dy) \tag{13}
\]

that are called jump-diffusion processes and where the coefficients satisfy the implicit integrability conditions, $\beta_t$ is adapted and $\gamma(t,y)$ is predictable in the sense as defined previously.
As usual, we may rewrite (13) in differential form and consider, more specifically, differential equations of the type

\[
dX_t = X_{t-}\Big[\alpha_t\,dt + \beta_t\,dw_t + \int_E \gamma(t,y)\,p(dt,dy)\Big], \tag{14}
\]

where we write $X_{t-}$ with $t-$ because of the predictability requirement in the last coefficient and where $\gamma(t,y) > -1$. Notice that [see (6)] the last term in (14) can also be written as

\[
\int_E \gamma(t,y)\,p(dt,dy) = \gamma(t, Y_t)\,dN_t, \tag{15}
\]

where $N_t = N_t(E) = p((0,t],E)$ is the total number of jumps and $Y_t$ denotes the piecewise constant, left-continuous time interpolation of the sequence $Y_n$. Notice also that, in the case of a multivariate (in particular univariate) point process, this last term in (14) takes the form

\[
\int_E \gamma(t,y)\,p(dt,dy) = \sum_{k=1}^K \gamma_t(k)\,dN_t(k). \tag{16}
\]

We shall not discuss here in detail equations of the form (14), in particular the uniqueness of their solutions, but limit ourselves to showing that a solution to (14) is given by the following

Exponential formula

\[
\begin{aligned}
X_t &= X_0 \exp\Big\{\int_0^t \big[\alpha_s - \tfrac12 \beta_s^2\big]\,ds + \int_0^t \beta_s\,dw_s + \int_0^t \log\big(1 + \gamma(s,Y_s)\big)\,dN_s\Big\} \\
&= X_0 \exp\Big\{\int_0^t \big[\alpha_s - \tfrac12 \beta_s^2\big]\,ds + \int_0^t \beta_s\,dw_s\Big\} \prod_{n=1}^{N_t} \big(1 + \gamma(T_n,Y_n)\big).
\end{aligned} \tag{17}
\]

While the diffusion part in this expression follows from the usual Ito formula, the jump part follows from the so-called exponential formula of Stieltjes-Lebesgue Calculus [see Theorem T4 of Appendix A4 in Brémaud (1981)], but it can also be obtained from the generalized Ito formula as we shall show next. For this purpose let a process $X_t$ satisfy the general Equation (13). Given a $C^{1,2}$-function $F(t,X)$, we have the generalized Ito formula

\[
\begin{aligned}
dF(t,X_t) ={}& F_t(\cdot)\,dt + F_X(\cdot)\,\alpha_t\,dt + \tfrac12 F_{XX}(\cdot)\,\beta_t^2\,dt + F_X(\cdot)\,\beta_t\,dw_t \\
&+ \big[F\big(t, X_{t-} + \gamma(t,Y_t)\big) - F(t, X_{t-})\big]\,dN_t
\end{aligned} \tag{18}
\]

that, in the specific case of (14), becomes

\[
\begin{aligned}
dF(t,X_t) ={}& F_t(\cdot)\,dt + F_X(\cdot)\,X_t\alpha_t\,dt + \tfrac12 F_{XX}(\cdot)\,X_t^2\beta_t^2\,dt + F_X(\cdot)\,X_t\beta_t\,dw_t \\
&+ \big[F\big(t, X_{t-}(1 + \gamma(t,Y_t))\big) - F(t, X_{t-})\big]\,dN_t
\end{aligned} \tag{19}
\]

and where, again, $N_t = N_t(E) = p((0,t],E)$ and $(\cdot)$ stands for $(t, X_t)$; the subscripts in $F$ denote partial derivatives.
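The exponential formula (17) can be checked path by path in a simple special case. The sketch below assumes constant coefficients $\alpha$ and $\beta$ and a fixed, hand-chosen list of jump times and proportional jump sizes $\gamma(T_n, Y_n) > -1$ (all illustrative, as are the function names). It compares the closed form (17) with the value obtained by solving (14) interval by interval: a geometric Brownian motion between jumps, multiplied by $1 + \gamma$ at each jump.

```python
import math
import random


def closed_form(x0, alpha, beta, t, w_t, gammas):
    """X_t from the exponential formula (17) with constant alpha and beta.

    `gammas` lists the jump sizes gamma(T_n, Y_n) of all jumps with T_n <= t.
    """
    prod = 1.0
    for g in gammas:
        prod *= 1.0 + g
    return x0 * math.exp((alpha - 0.5 * beta ** 2) * t + beta * w_t) * prod


def piecewise(x0, alpha, beta, t, jump_times, gammas, w_at):
    """Solve (14) interval by interval along the same Brownian path."""
    x, s, w_s = x0, 0.0, 0.0
    for tn, g in zip(jump_times, gammas):
        w_n = w_at[tn]
        # diffusion part on [s, tn)
        x *= math.exp((alpha - 0.5 * beta ** 2) * (tn - s) + beta * (w_n - w_s))
        # proportional jump at tn: X_{tn} = X_{tn-} (1 + gamma)
        x *= 1.0 + g
        s, w_s = tn, w_n
    return x * math.exp((alpha - 0.5 * beta ** 2) * (t - s) + beta * (w_at[t] - w_s))


# Illustrative inputs (assumptions, not taken from the text).
rng = random.Random(1)
jump_times = [0.3, 0.7, 1.6]
gammas = [0.10, -0.25, 0.05]
t_end = 2.0

# Brownian path sampled at the jump times and at t_end.
w_at, w, s = {}, 0.0, 0.0
for u in jump_times + [t_end]:
    w += math.sqrt(u - s) * rng.gauss(0.0, 1.0)
    w_at[u] = w
    s = u

x_formula = closed_form(1.0, 0.05, 0.2, t_end, w_at[t_end], gammas)
x_stepwise = piecewise(1.0, 0.05, 0.2, t_end, jump_times, gammas, w_at)
```

The two values agree up to floating-point rounding, and both are positive because every factor $1 + \gamma$ is positive.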
Notice that, if (19) is written in integral form, for the last term on the right we have the two equivalent representations

\[
\int_0^t \big[F\big(s, X_{s-}(1 + \gamma(s,Y_s))\big) - F(s, X_{s-})\big]\,dN_s = \sum_{n=1}^{N_t} \big[F(T_n, X_{T_n}) - F(T_n, X_{T_n^-})\big],
\]

where the right-hand side remains the same also in the more general case of (18).

We shall now use the generalized Ito formula (19) to obtain the solution (17) of Equation (14). Choosing $F(t,X) = \log X$, from (19) and (14) we have

\[
dF = \alpha_t\,dt - \tfrac12 \beta_t^2\,dt + \beta_t\,dw_t + \log\big(1 + \gamma(t,Y_t)\big)\,dN_t,
\]

from which

\[
\log X_t = \log X_0 + \int_0^t \big[\alpha_s - \tfrac12 \beta_s^2\big]\,ds + \int_0^t \beta_s\,dw_s + \int_0^t \log\big(1 + \gamma(s,Y_s)\big)\,dN_s, \tag{20}
\]

i.e., we obtain (17) by taking the exponential on both sides in (20).

2.5. Absolutely continuous transformation of measures

We recall that in the classical case of Wiener driven diffusion processes, the Girsanov-type measure transformation concerns a translation of the Wiener process that in turn induces a change in the drift of the diffusion equation. In view of its generalization below, we recall here the basic result of Girsanov's transformation, conveniently reformulated for a finite time horizon $t \in [0,T]$.

Theorem 2.4 (Girsanov's measure transformation). Given a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with $\mathcal{F} = \bigvee_t \mathcal{F}_t$, let $t \in [0,T]$ with $T$ given and $\theta_t$ be a square integrable predictable process. Define $L = (L_t)$ by

\[
dL_t = L_t \theta_t\,dw_t, \quad L_0 = 1, \tag{21}
\]

and suppose that, for all $t$, $E^P\{L_t\} = 1$. Then there exists a probability measure $Q$ on $\mathcal{F}$, equivalent to $P$, with $dQ = L_T\,dP$ such that

\[
dw_t = \theta_t\,dt + dw^Q_t, \tag{22}
\]

where $w^Q_t$ is a $Q$-Wiener process. Conversely, if $\mathcal{F}_t = \mathcal{F}^w_t$, then every probability measure $Q$, equivalent to $P$, has the above structure.

Notice that the second statement relies on martingale representation and thus requires the filtration $\mathcal{F}_t$ to be the one generated by the Wiener process.

As mentioned, Girsanov's measure transformation makes it possible to change the drift in a diffusion equation.
In fact, suppose that under $P$ we have $dX_t = a_t X_t\,dt + \sigma_t X_t\,dw_t$ and that we would like to change to a measure $Q \sim P$ ($\sim$ meaning equivalent to), under which the same $X_t$ satisfies $dX_t = r_t X_t\,dt + \sigma_t X_t\,dw^Q_t$. In this case just take $\theta_t = \sigma_t^{-1}(r_t - a_t)$.

If, besides a Wiener process $w_t$, we now have also a marked point process represented by a counting measure $p(dt,dy)$, a Girsanov-type measure transformation makes it possible, in addition to the translation of the Wiener process, to perform also a change in the intensity process of the point process part. We have [see Theorem VIII, T10 in Brémaud (1981), see also Björk, Kabanov and Runggaldier (1997)]

Theorem 2.5. On the finite time interval $[0,T]$ let $p(dt,dy)$ be an $E$-marked point process with $(P, \mathcal{F}_t)$-local characteristics $(\lambda_t, m_t(dy))$. Let $\psi_t \ge 0$ be $\mathcal{F}_t$-predictable and $h_t(y) \ge 0$ an $\mathcal{F}_t$-predictable $E$-indexed process such that, $P$-a.s. and for all $t \in [0,T]$,

\[
\int_0^t \psi_s \lambda_s\,ds < \infty; \qquad \int_E h_t(y)\,m_t(dy) = 1.
\]

Define $L_t = L^{(1)}_t L^{(2)}_t$ where $L^{(1)}_t$ satisfies (21) and $L^{(2)}_t$ satisfies

\[
dL^{(2)}_t = \int_E \big[\psi_t h_t(y) - 1\big]\, L^{(2)}_{t-}\,q(dt,dy) \tag{23}
\]

with $q(dt,dy) = p(dt,dy) - \lambda_t m_t(dy)\,dt$, the martingale measure associated with $p(dt,dy)$. If $E^P\{L^{(2)}_t\} = 1$ for all $t$, then all the statements of Theorem 2.4 hold true in addition to the fact that $p(dt,dy)$ has the $(Q, \mathcal{F}_t)$-local characteristics $(\psi_t \lambda_t,\ h_t(y) m_t(dy))$.

Notice that, using (21) and (23), we have for the Radon-Nikodym derivative $L_t$

\[
\begin{aligned}
dL_t &= d\big(L^{(1)}_t L^{(2)}_t\big) = L^{(1)}_{t-}\,dL^{(2)}_t + L^{(2)}_t\,dL^{(1)}_t \\
&= L_t \theta_t\,dw_t + L_{t-} \int_E \big[\psi_t h_t(y) - 1\big]\,q(dt,dy), \quad L_0 = 1.
\end{aligned} \tag{24}
\]

Using the exponential formula (17), we have that a solution of (24) is given by

\[
L_t = \exp\Big\{-\tfrac12 \int_0^t \theta_s^2\,ds + \int_0^t \theta_s\,dw_s\Big\} \times \exp\Big\{\int_0^t \int_E \big[1 - \psi_s h_s(y)\big]\,\lambda_s m_s(dy)\,ds\Big\} \prod_{n=1}^{N_t} \psi_{T_n} h_{T_n}(Y_n). \tag{25}
\]

In the case of a multivariate (in particular univariate) point process $(N_t(1), \ldots, N_t(K))$ with $(P, \mathcal{F}_t)$-intensities $(\lambda_t(1), \ldots, \lambda_t(K))$, consider an $\mathcal{F}_t$-predictable process $(\psi_t(1), \ldots, \psi_t(K))$ such that, $P$-a.s. and for $t \in [0,T]$, $\sum_{k=1}^K \int_0^t \psi_s(k) \lambda_s(k)\,ds < \infty$.
Define then $L^{(2)}_t$ by

\[
dL^{(2)}_t = \sum_{k=1}^K \big[\psi_t(k) - 1\big]\, L^{(2)}_{t-}\,\big[dN_t(k) - \lambda_t(k)\,dt\big] \tag{26}
\]

instead of by (23), making also corresponding changes in (24) and (25) for the Radon-Nikodym derivative $L_t$, namely

\[
L_t = \exp\Big\{-\tfrac12 \int_0^t \theta_s^2\,ds + \int_0^t \theta_s\,dw_s\Big\} \prod_{k=1}^K \Big[\exp\Big\{\int_0^t \big(1 - \psi_s(k)\big) \lambda_s(k)\,ds\Big\} \prod_{n=1}^{N_t(k)} \psi_{T_n(k)}(k)\Big]. \tag{27}
\]

Then, under $Q$, the intensities become $(\psi_t(1)\lambda_t(1), \ldots, \psi_t(K)\lambda_t(K))$. Notice, finally, that a condition to have $E^P\{L^{(2)}_t\} = 1$ can be found in Theorem VIII, T11 of Brémaud (1981).

3. Market models with jump-diffusions

In this section we introduce various jump-diffusion type models that were studied in the literature and that we shall be dealing with in the sequel. In the first two subsections we discuss, for asset price and term structure models respectively, the canonical jump-diffusion models in which there are two additive terms: a diffusion term and a jump term. In the last two subsections we then discuss diffusion/jump-diffusion models with stochastic volatility, where the latter is also described in terms of a jumping process. In addition, in the last subsection we model asset price behaviour on very small time scales where actual prices do not change continuously in time but rather at discrete random time points in reaction to trades and significant information. This then leads to a rather peculiar combination of diffusion and jump processes.

3.1. Asset-price and term structure models with additive jumps

As mentioned in the Introduction, the asset price evolution can perhaps be adequately described by a GBM for most of the time, but from time to time a large jump may occur and this cannot be adequately captured by a GBM. It thus appears natural to introduce models where a jump process is superimposed on a GBM, e.g., by adding a jump term to the diffusion term. In a first subsection we discuss this modeling issue in the context of asset prices, while in the second subsection we concentrate on interest rate modeling.

3.1.1.
Asset price models with jumps

In this section we adapt the outline of Section 7.2 in Lamberton and Lapeyre (1997). Let the price $S_t$ of a risky asset jump at the random times $T_1, \ldots, T_n, \ldots$ and suppose that the relative/proportional change in its value at a jump time is given by $Y_1, \ldots, Y_n, \ldots$ respectively. We may then assume that, between two jump times, the price $S_t$ follows a Black and Scholes model for a Wiener process $w_t$, that $T_n$ are the jump times of a Poisson process $N_t$ with intensity $\lambda_t$ and that $Y_n$ is a sequence of random variables with values in $(-1, \infty)$. This description can be formalized by letting, on the intervals $[T_n, T_{n+1})$,

\[
dS_t = S_t(\mu_t\,dt + \sigma_t\,dw_t) \tag{28}
\]

while, at $t = T_n$, the jump is given by $\Delta S_n = S_{T_n} - S_{T_n^-} = S_{T_n^-} Y_n$ so that

\[
S_{T_n} = S_{T_n^-}(1 + Y_n), \tag{29}
\]

which, by the assumption that $Y_n > -1$, always leads to positive values of the prices. Using the standard Ito formula to obtain the solution to (28) as well as a recursive argument based on (29), it is easily seen that, at the generic time $t$, $S_t$ can be given the following equivalent representations

\[
\begin{aligned}
S_t &= S_0 \exp\Big\{\int_0^t \Big[\mu_s - \frac{\sigma_s^2}{2}\Big]\,ds + \int_0^t \sigma_s\,dw_s\Big\} \prod_{n=1}^{N_t} (1 + Y_n) \\
&= S_0 \exp\Big\{\int_0^t \Big[\mu_s - \frac{\sigma_s^2}{2}\Big]\,ds + \int_0^t \sigma_s\,dw_s + \sum_{n=1}^{N_t} \log(1 + Y_n)\Big\} \\
&= S_0 \exp\Big\{\int_0^t \Big[\mu_s - \frac{\sigma_s^2}{2}\Big]\,ds + \int_0^t \sigma_s\,dw_s + \int_0^t \log(1 + Y_s)\,dN_s\Big\},
\end{aligned} \tag{30}
\]

where, as before, $Y_t$ is obtained from $Y_n$ by a piecewise constant and left-continuous time interpolation. By the generalized Ito formula (19), the process $S_t$ in (30) is easily seen to be a solution of

\[
dS_t = S_{t-}\big[\mu_t\,dt + \sigma_t\,dw_t + Y_t\,dN_t\big]. \tag{31}
\]

This equation corresponds to (28) with the addition of a jump term and is a particular case of the general jump-diffusion model (14) ((15)) when $\gamma(t,y) = y$. In what follows we shall thus consider the more general version of (31) given by

\[
dS_t = S_{t-}\big[\mu_t\,dt + \sigma_t\,dw_t + \gamma(t, Y_t)\,dN_t\big] \tag{32}
\]

that corresponds to (14) in the version of (15) and can thus equivalently be represented as

\[
dS_t = S_{t-}\Big[\mu_t\,dt + \sigma_t\,dw_t + \int_E \gamma(t,y)\,p(dt,dy)\Big]. \tag{33}
\]

If the marked point process is in particular a multivariate (or univariate) point process $(N_t(1), \ldots, N_t(K))$, then (32) ((33)) takes the form (see also (16))

\[
dS_t = S_{t-}\Big[\mu_t\,dt + \sigma_t\,dw_t + \sum_{k=1}^K \gamma_t(k)\,dN_t(k)\Big]. \tag{34}
\]

We finally point out that the marked point process in (32) ((33)) may be doubly stochastic in the sense specified in Sections 2.1 and 2.2, and this allows for further flexibility when it comes to modeling.

Remark 3.1. Occasionally, in the financial literature one finds model (32) ((33)) written in the form $dS_t = S_{t-}[\mu_t\,dt + \sigma_t\,dw_t + dJ_t]$, where, in the specific case when (32) reduces to (31), $J_t := \sum_{n=1}^{N_t} Y_n$, while in the general case $J_t := \sum_{n=1}^{N_t} \gamma(T_n, Y_n)$. Furthermore, in models of the form (31) one may find the last term $Y_t\,dN_t$ written as $(Y_t - 1)\,dN_t$; in this latter case, instead of (29), we would then have $S_{T_n} = S_{T_n^-} Y_n = S_{T_n^-} Y_{T_n}$.

3.1.2. Term structure models with jumps

Among the basic objects in term structure models we have the zero-coupon bonds with prices $p(t,T)$ (the price, at $t$, of a bond maturing at $T$), forward rates $f(t,T)$ (the rate, contracted at $t$, for instantaneous borrowing at $T$), and the short rate $r(t)$. There exist some well-known relationships among these quantities, in particular

\[
f(t,T) = -\frac{\partial \log p(t,T)}{\partial T}; \qquad r(t) = f(t,t).
\]

Since interest rates, and therefore also bond prices, may indeed jump, one may consider the following jump-diffusion models for the above three quantities

\[
dr(t) = a_t\,dt + b_t\,dw_t + \int_E c(t,y)\,p(dt,dy), \tag{35}
\]

\[
df(t,T) = \alpha(t,T)\,dt + \sigma(t,T)\,dw_t + \int_E \delta(t,T;y)\,p(dt,dy), \tag{36}
\]

\[
dp(t,T) = p(t-,T)\Big[m(t,T)\,dt + v(t,T)\,dw_t + \int_E n(t,T;y)\,p(dt,dy)\Big], \tag{37}
\]

where the differential is with respect to the time argument $t$, not the maturity $T$. Notice that only (37) has the factor $p(t-,T)$ also on the right-hand side. This guarantees (see the exponential formula (17)) positivity of $p(t,T)$, as it should be since $p(t,T)$ is the price of an asset; the interest rates $r(t)$, $f(t,T)$ need not necessarily be positive.
Given the well-known relationships between the three quantities in (35)-(37), there obviously has to exist a relationship also between the coefficients in these models. This relationship can be found in Proposition 2.2 of Björk, Kabanov and Runggaldier (1997).

So far we have mentioned only continuously compounded interest rates. In financial markets discretely compounded or simple rates, such as LIBOR rates, also play an important role. Given a fixed accrual period $\delta$, denote by $L(t,T)$ the forward rate, contracted at $t < T$, for the interval from $T$ to $T + \delta$. Jump-diffusion models for $L(t,T)$ are studied in Glasserman and Kou (1999) under the form

\[
dL(t,T) = L(t-,T)\big[\mu_L(t,T)\,dt + \sigma_L(t,T)\,dw_t + dJ(t,T)\big], \tag{38}
\]

where (see Remark 3.1) $J(t,T) = \sum_{n=1}^{N_t} \gamma(T_n, Y_n)$ for a given marked point process represented by the double sequence $(T_n, Y_n)$ [for a more general setup beyond jump-diffusions see Jamshidian (1999)]. Notice that the relationship

\[
L(t,T) = \frac{1}{\delta}\Big[\exp\Big\{\int_T^{T+\delta} f(t,s)\,ds\Big\} - 1\Big] \tag{39}
\]

between discretely and continuously compounded forward rates induces a relationship between the coefficients of the corresponding dynamic equations (36) and (38).

3.2. Jump-diffusion models driven by hidden jump processes

As mentioned in the Introduction, empirical studies have led to consider also combinations of jumps and stochastic volatility, where the volatility presents a jump-type behaviour and is possibly also correlated with the jumps in the prices. As pointed out in Naik (1993), it is in fact natural to expect that, when the volatility jumps, the price should jump as well. One can capture these aspects by a jump-diffusion model where the coefficients depend on a hidden/latent jump process $Z_t$ that also affects the intensity of the marked point process in the jump term (doubly stochastic marked point process).
Formally, and limiting ourselves to asset price models of the form of (33) (that are equivalent to (32) and include (34)), we then have

\[
dS_t = S_{t-}\Big[\mu_t(Z_t)\,dt + \sigma_t(Z_t)\,dw_t + \int_E \gamma(t,y;Z_{t-})\,p(dt,dy)\Big], \tag{40}
\]

where $Z_t$ is any jump process with non-predictable jumps (could also be a Markov jump process) and $p(dt,dy)$ is the counting measure of a doubly stochastic marked point process with intensity $\lambda_t(Z_{t-}, dy)$. Notice that $Z_t$ affects the jump part both through the intensity as well as through the proportional jump sizes and it affects them in a predictable way.

3.3. Asset prices as diffusions sampled at the jump times of a jump process

As was mentioned in the Introduction, on very small time scales the real asset prices do not change continuously over time, but rather only at discrete random points in time in reaction to trades and/or significant new information. This makes jump processes attractive also for modeling high frequency data and here we give a description of such a modeling approach according to Frey and Runggaldier (2001, 1999). Marked point processes as models for high frequency data were also studied independently by various authors in the recent literature [see, e.g., Geman, Madan and Yor (1999), Rogers and Zane (1998), Rydberg and Shephard (1999)]. The models in Frey and Runggaldier (2001, 1999) are more in the spirit of jump-diffusions in that they consider a combination, although not an additive one, of a diffusion and a jump process as follows: given is a background price process of the diffusion type and this process is then sampled according to the random jump times of a jump process. This setup allows also to incorporate a possible correlation between (stochastic) volatility and price jumps in the way mentioned in the previous section, by letting again $Z_t$ be a hidden process that drives the volatility of the background diffusion process and at the same time also the intensity of the (doubly stochastic) jump process that determines the random sampling times.
In more formal terms, the logarithm $\xi_t$ of the background price process is supposed to satisfy

\[
d\xi_t = v_t(Z_t)\,dw_t \tag{41}
\]

with $w_t$ a Wiener process independent of $Z_t$. The process $Z_t$ is the hidden or latent state variable process that can be interpreted as modeling the rate at which new information is absorbed by the market. It may be given as a diffusion or as a finite state Markov process. Next consider a univariate doubly stochastic Poisson process (a Cox process) $N_t$ with intensity $\lambda_t = \lambda_t(Z_{t-})$. The time dependence of this $\lambda$ as well as of $v$ in (41) is introduced to incorporate systematic patterns in trading activity. The actual price process is now such that its logarithm $L_t$ satisfies

\[
L_t = \xi_{T_{n-1}} \quad \text{for } t \in [T_{n-1}, T_n) \tag{42}
\]

with $T_n$ the jump times of $N_t$. The given model can thus be interpreted as a stochastic volatility model, evaluated at random times $T_n$. It is easily seen that the process $L_t$ in (42) satisfies

\[
dL_t = \big(\xi_t - \xi_{T_{N_{t-}}}\big)\,dN_t, \tag{43}
\]

where $T_{N_{t-}}$ is the time of the last jump strictly prior to $t$. The process $L_t$ is thus a marked point process with local characteristics $\big(\lambda_t(Z_t),\ N\big(0, \int_{T_{N_{t-}}}^t v_s^2(Z_s)\,ds\big)\big)$, where $N(m, \sigma^2)$ denotes a Gaussian r.v. with mean $m$ and variance $\sigma^2$. Notice that we may choose an intensity of the form

\[
\lambda_t(Z_t) = \lambda^{(1)}_t + \lambda^{(2)}_t Z_t \tag{44}
\]

so that $N_t$ can be seen as the sum $N_t = N^{(1)}_t + N^{(2)}_t$ of two independent jump processes: $N^{(1)}_t$ with deterministic intensity $\lambda^{(1)}_t$ corresponding to noise trading and $N^{(2)}_t$ corresponding to informed trading.

One interesting aspect of the above model is that it makes it clear how sample path properties matter when it comes to volatility estimation: the volatility in a diffusion model, i.e., its quadratic variation, can be approximated arbitrarily well by the sum of the observed squared increments. For the given piecewise constant processes the empirical quadratic variation is useless for volatility estimation, even if computed over very small time intervals.
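A minimal simulation sketch of the sampled model (41)-(42) follows. It makes strong simplifying assumptions that are not in the text: the hidden process $Z_t$ is frozen, so that both the volatility $v$ and the sampling intensity $\lambda$ are constants, and the background log-price starts at zero. The function names and parameter values are illustrative only.

```python
import bisect
import math
import random


def simulate_sampled_log_price(v, lam, horizon, rng):
    """Return (times, levels) with levels[n] = xi_{T_n}, T_0 = 0, xi_0 = 0.

    The background log-price xi is a driftless Brownian motion with
    constant volatility v; it is observed only at the jump times T_n of
    a Poisson process with constant intensity lam.
    """
    times, levels = [0.0], [0.0]
    t, xi = 0.0, 0.0
    while True:
        gap = rng.expovariate(lam)
        if t + gap > horizon:
            return times, levels
        t += gap
        # increment of xi over the gap: Gaussian with variance v^2 * gap
        xi += v * math.sqrt(gap) * rng.gauss(0.0, 1.0)
        times.append(t)
        levels.append(xi)


def observed_log_price(times, levels, t):
    """L_t as in (42): the value of xi at the last sampling time <= t."""
    return levels[bisect.bisect_right(times, t) - 1]


# Illustrative parameters (assumptions).
times, levels = simulate_sampled_log_price(v=0.2, lam=5.0, horizon=10.0,
                                           rng=random.Random(2))
mid_value = observed_log_price(times, levels, 0.5 * (times[0] + times[1]))
```

Between two sampling times the observed log-price stays flat, and it moves only when $N_t$ jumps, which is the piecewise constant behaviour referred to above.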
We finally point out that the definition given in Section 2 of a doubly stochastic Poisson process, in particular the requirement that $\lambda_t$ be $\mathcal{F}_0$-measurable, has the consequence that $N_t$ and $Z_t$ cannot have common jumps and that the actual trading activity, namely the realization of the point process $N_t$, does not affect the law of $Z_t$. In economic terms this means that, in the given model, trading is caused purely by exogenous factors such as fundamental information, and not by the observed past trading activity.

4. Martingale measures: Existence and uniqueness (Market price of risk and market completion)

In each of the models discussed in Section 3, individual asset prices are driven by at least two independent sources of randomness so that the corresponding market models are incomplete. Based on the extended Girsanov-type measure transformation recalled in Section 2.5, in this section we shall discuss existence and, where applicable, uniqueness of martingale measures, thereby exhibiting also the market price of (jump-diffusion) risk. Uniqueness of the martingale measure will be mainly related to completion of the market. We want to point out that, as will be shown in more detail in Section 5 on hedging, it is not necessarily true that, if a market is completed to yield a unique martingale measure, then it is also genuinely complete in the sense that every contingent claim can be duplicated by a self-financing portfolio. In fact, for a marked point process with an infinite mark space, i.e., with an infinite number of sources of randomness, it will be shown in Section 5.1.2 that uniqueness of the martingale measure implies only some form of approximate completeness.

In this Section 4 we shall limit ourselves to the jump-diffusion asset price and term structure models of Section 3.1.
In Section 4.1 below we treat the case of jump-diffusion models for asset prices and show that the market can relatively easily be completed to yield a unique martingale measure if the jump part corresponds to a marked point process with a finite number of marks (multivariate point processes). For an infinite number of marks the situation is studied in more detail in Section 4.2 below in the context of term structure models.

4.1. The case of jump-diffusion asset price models

We start with a jump-diffusion model where the jump part corresponds to a univariate Poisson point process with $P$-intensity $\lambda_t$, namely (see (34) for $K = 1$)

\[
dS_t = S_{t-}\big[\mu_t\,dt + \sigma_t\,dw_t + \gamma_t\,dN_t\big] = S_{t-}\big[(\mu_t + \gamma_t\lambda_t)\,dt + \sigma_t\,dw_t + \gamma_t\,dM_t\big] \tag{45}
\]

with (see (2)) $M_t = N_t - \int_0^t \lambda_s\,ds$ the $P$-martingale corresponding to $N_t$. The Radon-Nikodym derivative for an absolutely continuous change of measure from $P$ to $Q$, that implies a translation of the Wiener process by $\theta_t$ and a change of the Poisson intensity from $\lambda_t$ to $\psi_t\lambda_t$, is (see (27) for $K = 1$)

\[
L_t = \exp\Big\{\int_0^t \big[(1 - \psi_s)\lambda_s - \tfrac12 \theta_s^2\big]\,ds + \int_0^t \theta_s\,dw_s + \int_0^t \log \psi_s\,dN_s\Big\}. \tag{46}
\]

Defining the Wiener and Poisson martingales $w^Q_t$ and $M^Q_t$ by (see (22) and (2))

\[
dw^Q_t = dw_t - \theta_t\,dt, \qquad dM^Q_t = dN_t - \psi_t\lambda_t\,dt, \tag{47}
\]

the dynamics of $S_t$ under $Q$ become

\[
dS_t = S_{t-}\big[(\mu_t + \theta_t\sigma_t + \psi_t\gamma_t\lambda_t)\,dt + \sigma_t\,dw^Q_t + \gamma_t\,dM^Q_t\big]. \tag{48}
\]

Taking as numeraire the usual money market account $B_t$, where $dB_t = r_t B_t\,dt$, we immediately see that $Q$ is a martingale measure, i.e., a measure under which $\bar S_t := B_t^{-1} S_t$ is a martingale, if $\theta_t$ and $\psi_t \ge 0$ are chosen such that $\mu_t + \theta_t\sigma_t + \psi_t\gamma_t\lambda_t = r_t$. From here we see that, for each pair $(\theta_t, \psi_t)$ with $\psi_t \ge 0$ arbitrary and

\[
\theta_t = \sigma_t^{-1}\big(r_t - \mu_t - \psi_t\gamma_t\lambda_t\big), \tag{49}
\]

we obtain a martingale measure, i.e., we can obtain infinitely many martingale measures, one for each choice of $\psi_t$.

Concerning the market price of risk $\rho_t$, from (45) and (49) we have

\[
\rho_t := \mu_t + \gamma_t\lambda_t - r_t = \gamma_t\lambda_t - \theta_t\sigma_t - \psi_t\gamma_t\lambda_t = -\theta_t\sigma_t - \gamma_t\lambda_t(\psi_t - 1), \tag{50}
\]

from where we see that $(-\theta_t)$ can be interpreted as risk premium per unit of diffusion volatility, whereas $-\lambda_t(\psi_t - 1)$ can be interpreted as risk premium per unit of jump volatility. On an arbitrage-free market all assets have, at a given time $t$, the same diffusion- and jump-risk premia and they determine, via the Girsanov transformation, i.e., via (46), the equivalent martingale measure $Q$.

We obtained infinitely many martingale measures because, for a single risky asset, we had two independent sources of randomness. One may thus expect that, by adding a further asset, one can complete the market to obtain a unique martingale measure. Consider then, in addition to $S_t$ in (45), an asset with price $\tilde S_t$ satisfying

\[
d\tilde S_t = \tilde S_{t-}\big[\tilde\mu_t\,dt + \tilde\sigma_t\,dw_t + \tilde\gamma_t\,dN_t\big]. \tag{51}
\]

Notice that $\tilde S_t$ could correspond to the price of a derivative asset with underlying $S_t$. In fact, if one is given the explicit expression of this derivative price in terms of $S_t$, i.e., $\tilde S_t = F(t, S_t)$, then (51) is straightforwardly obtained from (45) by use of the generalized Ito formula (19).

Since the two risk premia $\theta_t$ and $\lambda_t(\psi_t - 1)$ have to be the same for all assets, we may impose (49) on both assets with prices $S_t$ and $\tilde S_t$ respectively, namely

\[
\theta_t = \sigma_t^{-1}\big(r_t - \mu_t - \psi_t\gamma_t\lambda_t\big) = \tilde\sigma_t^{-1}\big(r_t - \tilde\mu_t - \psi_t\tilde\gamma_t\lambda_t\big), \tag{52}
\]

from where one immediately gets

\[
\psi_t\lambda_t = \frac{r_t(\sigma_t - \tilde\sigma_t) + (\tilde\sigma_t\mu_t - \sigma_t\tilde\mu_t)}{\tilde\gamma_t\sigma_t - \gamma_t\tilde\sigma_t}. \tag{53}
\]

Inserting this expression in (49) it follows that

\[
\theta_t = \frac{\gamma_t(\tilde\mu_t - r_t) - \tilde\gamma_t(\mu_t - r_t)}{\tilde\gamma_t\sigma_t - \gamma_t\tilde\sigma_t}. \tag{54}
\]

We have thus obtained unique risk premia and, consequently, a unique martingale measure provided the coefficients in (45) and (51) are such that $\tilde\gamma_t\sigma_t - \gamma_t\tilde\sigma_t \ne 0$ and that $\psi_t\lambda_t$ in (53) is positive. With the unique martingale measure we may expect to have also obtained a complete market in the sense that, by investing in a self-financing way in the two assets with prices $S_t$ and $\tilde S_t$, one can duplicate any claim. In Section 5.1.1 we shall show that, for the given market model, this is indeed the case.
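As a numerical illustration of (52)-(54), the sketch below computes the unique pair $(\theta_t, \psi_t\lambda_t)$ for two assets with constant coefficients and verifies that, under the resulting measure, both assets have drift $r$. The function name and all parameter values are made up for the example and carry no empirical meaning.

```python
def unique_risk_premia(r, mu, sigma, gamma, mu2, sigma2, gamma2):
    """Solve (52) for theta and psi*lambda, following (53) and (54).

    (mu, sigma, gamma) are the coefficients of the first asset as in (45);
    (mu2, sigma2, gamma2) are those of the second asset as in (51).
    """
    det = gamma2 * sigma - gamma * sigma2
    if det == 0.0:
        raise ValueError("the second asset does not complete the market")
    psi_lam = (r * (sigma - sigma2) + (sigma2 * mu - sigma * mu2)) / det
    theta = (gamma * (mu2 - r) - gamma2 * (mu - r)) / det
    return theta, psi_lam


# Illustrative constant coefficients (assumptions).
r = 0.03
mu, sigma, gamma = 0.10, 0.20, -0.10
mu2, sigma2, gamma2 = 0.09, 0.30, 0.20

theta, psi_lam = unique_risk_premia(r, mu, sigma, gamma, mu2, sigma2, gamma2)

# Drift of each asset under Q, see (48): mu + theta*sigma + psi*lambda*gamma.
drift_1 = mu + theta * sigma + psi_lam * gamma
drift_2 = mu2 + theta * sigma2 + psi_lam * gamma2
```

With these numbers the determinant is 0.07 and the implied intensity under $Q$ is positive, so both conditions stated after (54) are met.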
It is easily seen that, if the jump part in the jump-diffusion model corresponds to a multivariate Poisson process, i.e., if instead of (45) we have (see (34))

$$dS_t = S_{t-}\left[\alpha_t\,dt + \sigma_t\,dw_t + \sum_{k=1}^K \gamma_t(k)\,dN_t(k)\right] = S_{t-}\left[\left(\alpha_t + \sum_{k=1}^K \gamma_t(k)\lambda_t(k)\right)dt + \sigma_t\,dw_t + \sum_{k=1}^K \gamma_t(k)\,dM_t(k)\right] \tag{55}$$

with $M_t(k) = N_t(k) - \int_0^t \lambda_s(k)\,ds$, then the previous results admit a straightforward extension. In particular, (49) becomes

$$\theta_t = \sigma_t^{-1}\left[r_t - \alpha_t - \sum_{k=1}^K \gamma_t(k)\lambda_t(k)\psi_t(k)\right] \tag{56}$$

and the market price of risk is

$$\rho_t = -\sigma_t\theta_t - \sum_{k=1}^K \gamma_t(k)\lambda_t(k)\,[\psi_t(k) - 1]. \tag{57}$$

This time the generic $k$-th term $\gamma_t(k)\lambda_t(k)(\psi_t(k) - 1)$ on the right can be interpreted as risk premium per unit of jump volatility of type $k$. Again we obtain infinitely many martingale measures by choosing freely $\psi_t(k) \ge 0$ $(k = 1,\ldots,K)$, and $\theta_t$ according to (56). Having now $K + 1$ independent sources of randomness, we may expect that one can complete the market by adding $K$ further assets to obtain a unique equivalent martingale measure. This can be done along the lines of (52)-(54), although this time the calculations are more complicated and the conditions on the coefficients more cumbersome.

Finally, we consider the more general model (33) (or, equivalently, (32)) with a possibly infinite number of marks. Using the $P$-martingale measure $q(\cdot)$ in (8), by analogy to (45) and (55) we may rewrite (33) as

$$dS_t = S_{t-}\left[\alpha_t\,dt + \sigma_t\,dw_t + \int_E \gamma(t,y)\,p(dt,dy)\right] = S_{t-}\left[\left(\alpha_t + \int_E \gamma(t,y)\lambda_t(dy)\right)dt + \sigma_t\,dw_t + \int_E \gamma(t,y)\,q(dt,dy)\right]. \tag{58}$$

Using the particular form of the intensity given in (9), we also have

$$\int_E \gamma(t,y)\lambda_t(dy) = \gamma_t\lambda_t \quad\text{with}\quad \gamma_t = \int_E \gamma(t,y)\,m_t(dy) \tag{59}$$

and so (58) becomes, quite analogously to (45),

$$dS_t = S_{t-}\left[(\alpha_t + \gamma_t\lambda_t)\,dt + \sigma_t\,dw_t + \int_E \gamma(t,y)\,q(dt,dy)\right]. \tag{60}$$

Consider then, instead of (46), the more general Radon-Nikodym derivative (25) that we rewrite here in the form analogous to (46) as

$$L_t = \exp\left\{ \int_0^t \left[(1 - \psi_s\bar h_s)\lambda_s - \tfrac{1}{2}\theta_s^2\right] ds + \int_0^t \theta_s\,dw_s + \int_0^t \log\big(\psi_s h_s(Y_s)\big)\,dN_s \right\}, \tag{61}$$

where $\bar h_s = \int_E h_s(y)\,m_s(dy)$.
Define next the Wiener and jump martingales $w_t^Q$ and $q^Q(dt,dy)$ by (see (22) and (8), (9) as well as (47))

$$dw_t^Q = dw_t - \theta_t\,dt, \qquad q^Q(dt,dy) = p(dt,dy) - \psi_t\lambda_t h_t(y)\,m_t(dy)\,dt. \tag{62}$$

The dynamics of $S_t$ under $Q$ then become

$$dS_t = S_{t-}\left[(\alpha_t + \sigma_t\theta_t + \psi_t\lambda_t\hat\gamma_t)\,dt + \sigma_t\,dw_t^Q + \int_E \gamma(t,y)\,q^Q(dt,dy)\right], \tag{63}$$

where $\hat\gamma_t = \int_E \gamma(t,y)h_t(y)\,m_t(dy)$. The measure $Q$ is now a martingale measure if $\theta_t$ and $\psi_t \ge 0$ as well as $h_t(y) \ge 0$ are chosen so that $\alpha_t + \sigma_t\theta_t + \psi_t\lambda_t\hat\gamma_t = r_t$, which leads to the following relation corresponding to (49)

$$\theta_t = \sigma_t^{-1}(r_t - \alpha_t - \psi_t\lambda_t\hat\gamma_t). \tag{64}$$

Again, this leads to infinitely many martingale measures but, unless the mark space is finite, to complete the market in order to obtain a unique equivalent martingale measure one needs infinitely many assets. We shall discuss this situation in more detail in the context of bond markets in the next subsection. To complete the analogy with the previous cases, notice that this time the market price of risk becomes (by (60) and (64))

$$\rho_t := \alpha_t + \gamma_t\lambda_t - r_t = \gamma_t\lambda_t - \sigma_t\theta_t - \psi_t\lambda_t\hat\gamma_t = -\sigma_t\theta_t - \lambda_t(\psi_t\hat\gamma_t - \gamma_t) = -\sigma_t\theta_t - \lambda_t\int_E \gamma(t,y)\,[\psi_t h_t(y) - 1]\,m_t(dy). \tag{65}$$

This time one may interpret $[\psi_t h_t(y) - 1]\,m_t(dy)$ as risk premium per unit of jump volatility of type $y$.

In this latter context of a more general model of type (45) we want to point out that a methodology to obtain all equivalent martingale measures has also been worked out in Prigent (2001).

We close this subsection by mentioning that, depending on the purpose, one can single out some specific martingale measures among the various possible ones in a jump-diffusion model where the market has not been completed. As an example, the construction of the so-called minimal martingale measure in a univariate Poisson jump-diffusion model can be found in Runggaldier and Schweizer (1995). From a more practical point of view, an obvious possibility is always that of calibrating the model to market data.

4.2. The case of jump-diffusion term structure models

Consider first a term structure model where, under a given measure $P$, the (continuously compounded) forward rates $f(t,T)$ and the (zero coupon) bond prices $p(t,T)$ satisfy (36) and (37) respectively, namely

$$df(t,T) = \alpha(t,T)\,dt + \sigma(t,T)\,dw_t + \int_E \delta(t,T;y)\,p(dt,dy), \tag{66}$$

$$dp(t,T) = p(t-,T)\left[m(t,T)\,dt + v(t,T)\,dw_t + \int_E n(t,T;y)\,p(dt,dy)\right]. \tag{67}$$

We shall also make the ad hoc assumption that all objects are specified in a way that guarantees the validity of the various operations to be performed, such as differentiation under the integral sign and interchange of the order of integration. For later use we recall from Björk, Kabanov and Runggaldier (1997) the relationship between the coefficients in (66) and (67): if $f(t,T)$ satisfies (66), then $p(t,T)$ satisfies (67) with

$$m(t,T) = r(t) + A(t,T) + \tfrac{1}{2}S(t,T)^2, \qquad v(t,T) = S(t,T), \qquad n(t,T;y) = e^{D(t,T;y)} - 1, \tag{68}$$

where $r(t) = f(t,t)$ is the short rate and

$$A(t,T) = -\int_t^T \alpha(t,s)\,ds, \qquad S(t,T) = -\int_t^T \sigma(t,s)\,ds, \qquad D(t,T;y) = -\int_t^T \delta(t,s;y)\,ds. \tag{69}$$

In the given bond market there are, at least theoretically, infinitely many assets, namely the bonds for all possible maturities $T > t$. A martingale measure $Q$ is now a measure under which all these bond prices, discounted with respect to the money market account, are (local) martingales. We are therefore not even sure whether in such a given market model there exists a martingale measure and so our first purpose is to investigate the existence of such a measure.
Following essentially Björk, Kabanov and Runggaldier (1997) and considering general marked point processes, we also take the general form of the Radon-Nikodym derivative $L_t$, namely (see (24) where, for simplicity, we put $\psi_t \equiv 1$)

$$dL_t = L_t\theta_t\,dw_t + L_{t-}\int_E [h_t(y) - 1]\,q(dt,dy), \tag{70}$$

where (see (8) and (9))

$$q(dt,dy) = p(dt,dy) - \lambda_t m_t(dy)\,dt, \tag{71}$$

i.e., we assume that, under $P$, the local characteristics of the marked point process $p(dt,dy)$ are $(\lambda_t, m_t(dy))$. By Theorem 2.5 we know that, under the measure $Q$ that corresponds to $L_t$ in (70), the local characteristics become $(\lambda_t, h_t(y)m_t(dy))$ so that, defining (see also (62))

$$dw_t^Q = dw_t - \theta_t\,dt, \qquad q^Q(dt,dy) = p(dt,dy) - \lambda_t h_t(y)\,m_t(dy)\,dt \tag{72}$$

the bond prices $p(t,T)$ satisfy, under $Q$, the dynamics

$$dp(t,T) = p(t-,T)\left\{\left[m(t,T) + v(t,T)\theta_t + \lambda_t\int_E n(t,T;y)h_t(y)\,m_t(dy)\right]dt + v(t,T)\,dw_t^Q + \int_E n(t,T;y)\,q^Q(dt,dy)\right\}. \tag{73}$$

A necessary condition for the existence of a martingale measure $Q$ is then that there exist a predictable process $\theta_t$ and a predictable $E$-indexed process $h_t(y) \ge 0$ such that the conditions of Theorem 2.5 hold and

$$m(t,T) + v(t,T)\theta_t + \lambda_t\int_E n(t,T;y)h_t(y)\,m_t(dy) = r(t). \tag{74}$$

Notice that this implies for the market price of risk a relation analogous to (65), namely

$$\rho_t := m(t,T) + \lambda_t\int_E n(t,T;y)\,m_t(dy) - r(t) = -v(t,T)\theta_t - \lambda_t\int_E n(t,T;y)\,[h_t(y) - 1]\,m_t(dy). \tag{75}$$

We shall now translate condition (74), involving the coefficients of (67), into a condition involving the coefficients of (66), namely of the forward rates. Using (68), condition (74) becomes

$$A(t,T) + \tfrac{1}{2}S(t,T)^2 + S(t,T)\theta_t + \int_E h_t(y)\,\nu(t,T;dy) = 0 \tag{76}$$

with $\nu(t,T;dy) := (e^{D(t,T;y)} - 1)\lambda_t m_t(dy)$ and with $A$, $S$ and $D$ as in (69).

When building a term structure model it is often convenient to specify all objects directly under a martingale measure $Q$ and this obviously imposes some restrictions on the coefficients in the models.
Concentrating on forward rates, assume that we want model (66) to be valid under a martingale measure $Q$, i.e., we are postulating that $P = Q$ and so we have to choose $\theta_t \equiv 0$, $h_t(y) \equiv 1$. Notice now that (76) has to hold for all maturities so that, inserting the above choices of $\theta_t$ and $h_t(y)$ and differentiating with respect to $T$, we obtain (using also (69)) the following necessary condition

$$\alpha(t,T) = \sigma(t,T)\int_t^T \sigma(t,s)\,ds - \int_E \delta(t,T;y)\,e^{D(t,T;y)}\lambda_t h_t(y)\,m_t(dy) \tag{77}$$

which is a clear extension of the classical Heath-Jarrow-Morton drift condition for the pure diffusion case.

Having investigated the existence of a martingale measure, we may next look for conditions implying its uniqueness. Concentrating again on forward rates, a necessary condition for the existence of a martingale measure has been seen to be the existence of a predictable $\theta_t$ and a predictable $E$-indexed $h_t(y) \ge 0$ such that relation (76) holds. Quite obviously then, if (76) admits a unique solution in $\theta_t$ and $h_t(y)$, the martingale measure is unique.

To formalize this fact, consider the following linear operator [for technical details, that for simplicity we neglect here, we refer to Björk, Kabanov and Runggaldier (1997)]

$$K_t : (\theta, h(y)) \mapsto S(t,\cdot)\,\theta + \int_E h(y)\,\big(e^{D(t,\cdot;y)} - 1\big)\lambda_t\,m_t(dy). \tag{78}$$

The operator $K_t$ is an integral operator of the first kind and we refer to it as the martingale operator. The martingale measure is then unique if and only if, $dP \times dt$-a.e., we have

$$\operatorname{Ker} K_t = 0. \tag{79}$$

We may now wonder whether, in the present context of infinitely many sources of randomness, the uniqueness of the martingale measure implies completeness in the sense that every contingent claim can be replicated by a self-financing portfolio. The answer is no; in fact, as we shall mention in Section 5.1.2 below, we obtain only a form of approximate completeness.
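To make the drift condition (77) concrete, here is a minimal Python sketch that evaluates its right-hand side for a finite mark space, under the choice of a kernel identically equal to 1 (so that the model is specified directly under Q). The function names `jump_hjm_drift` and `_trapezoid`, the callables `sigma(t, s)` and `delta(t, s, y)`, and the numbers used are all hypothetical; the integrals over maturities are approximated by a plain trapezoidal rule.

```python
import math


def _trapezoid(f, a, b, n=1000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    if b == a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def jump_hjm_drift(t, T, sigma, delta, marks, mark_probs, lam):
    """Forward-rate drift implied by (77) with h identically 1.

    sigma(t, s): diffusion volatility of f(t, s)            (assumed callable)
    delta(t, s, y): jump size of f(t, s) for mark y         (assumed callable)
    marks, mark_probs: finite mark space and its distribution m_t
    lam: jump intensity at time t
    """
    # Diffusion part: sigma(t,T) * integral of sigma(t,s) over [t, T]
    drift = sigma(t, T) * _trapezoid(lambda s: sigma(t, s), t, T)
    # Jump part: sum over marks of delta * exp(D) * lam * m(y),
    # where D(t,T;y) = -integral of delta(t,s;y) over [t, T] as in (69)
    for y, m_y in zip(marks, mark_probs):
        D = -_trapezoid(lambda s: delta(t, s, y), t, T)
        drift -= delta(t, T, y) * math.exp(D) * lam * m_y
    return drift


# Constant volatility and mark-dependent constant jump sizes (made-up values)
sigma_const = lambda t, s: 0.01
delta_const = lambda t, s, y: y
alpha_0_5 = jump_hjm_drift(0.0, 5.0, sigma_const, delta_const,
                           marks=[0.002, -0.001], mark_probs=[0.5, 0.5], lam=2.0)
```

For constant coefficients the integrals are available in closed form, which gives a quick consistency check of the numerical value.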
We finally remark that the relationship (39) between discretely and continuously compounded forward rates has allowed Glasserman and Kou (1999) to carry over the results just mentioned for continuously compounded forward rates to the case of simple forwards. In fact, a model of the term structure of simple forwards $L(t,T)$ (see (38)) is defined in Glasserman and Kou (1999) to be arbitrage-free if it can be embedded in an arbitrage-free model of instantaneous forwards $f(t,T)$ via (39).

5. Hedging in jump-diffusion market models

In the previous section we have seen that, as a consequence of its incompleteness, in a jump-diffusion market model we have in general infinitely many martingale measures. We have then investigated the method of market completion as a tool to obtain a unique martingale measure. On the other hand, from the second fundamental theorem of asset pricing one has that, in general, if a market admits a unique equivalent martingale measure, then it is also complete in the sense that every contingent claim can be hedged by a self-financing portfolio.

We shall investigate the hedging problem in a jump-diffusion market model having in mind two goals. For the first goal, in the context of asset price models, we shall show in Section 5.1.1 that completed market models with a unique martingale measure are complete also in the sense of hedging if there are only a finite number of marks for the jumping component (there is a finite number of sources of randomness). If however there are an infinite number of marks (an infinite number of sources of randomness) then, in the context of bond markets, in Section 5.1.2 we shall show that the completed market models with a unique martingale measure are only approximately complete in the sense of hedging.
In the context of the first goal we also want to add here that Jensen (1999) approximates a given jump-diffusion market model, having an infinite number of marks, by a sequence of jump-diffusion models with a finite number of marks that are therefore complete also in the sense of hedging.

For the second goal, in Section 5.2 we shall consider the case when one cannot have a complete market or when it is not appropriate to complete it. In such a case one has to determine the hedging strategy according to some specific hedging criterion. We shall consider the (local) risk minimization and the related minimum variance criteria and show that they lead to hedging strategies that are quite natural extensions of those in complete markets. While so far only the models of Section 3.1 have been further investigated, the discussion in Section 5.2 will center mainly around the model of Section 3.3.

In part, this section can also be seen as preliminary to the next Section 6 on pricing. In fact, if a market is complete in the sense of hedging, then by the criterion of absence of arbitrage the initial value of the self-financing and hedging strategy has to correspond to the arbitrage-free price of the contingent claim. If the market cannot be completed, the criterion of absence of arbitrage alone is not sufficient to define a price and the preference structure of the investors has to come into play. Since, typically, the initial value of a hedging portfolio satisfying a specific hedging criterion can be expressed as expectation of the discounted claim under a specific martingale measure, the choice of a hedging criterion implies also the choice of a martingale measure and thus of a pricing kernel. We shall discuss these issues in more detail in Section 6.1 below.

5.1. Hedging when the market is completed

5.1.1. Asset-price models

In this subsection we consider the univariate jump-diffusion model of Section 4.1.
We have seen that if, in addition to the asset with price $S_t$ satisfying (45), we consider also the asset with price $\bar S_t$ satisfying (51), with coefficients such that $\psi_t\lambda_t$ in (53) is positive and $\sigma_t\bar\gamma_t - \bar\sigma_t\gamma_t \ne 0$, then there exists a unique martingale measure $Q$ corresponding to the choice of $\psi_t\lambda_t$ and $\theta_t$ according to (53) and (54). Basing ourselves on Jeanblanc-Piqué and Pontier (1990), we show now that in this situation any claim can be hedged with a self-financing portfolio.

Given a maturity $T$, consider as claim a (square-integrable) random variable $H_T$, measurable with respect to $\mathcal{F}_T$, where $\mathcal{F}_t := \sigma\{S_0, \bar S_0, w_s, N_s;\ s \le t\}$, completed with the null sets. In addition to the two risky assets with prices $S_t$ and $\bar S_t$, we suppose given also a nonrisky asset, whose price we take for simplicity identically equal to 1 (equivalent to assuming all prices discounted with respect to the nonrisky asset). An investment strategy is then a triple $\phi_t = [\xi_t, \bar\xi_t, \eta_t]$, where $\eta_t$ denotes the number of units of the nonrisky asset held in the portfolio at time $t$ and $\xi_t$, $\bar\xi_t$ are the number of shares of the two risky assets respectively. Let $\xi_t$, $\bar\xi_t$ be predictable and $\eta_t$ be adapted. The value, at time $t$, of a portfolio corresponding to the strategy $\phi$ is then

$$V_\phi(t) = \xi_t S_t + \bar\xi_t\bar S_t + \eta_t. \tag{80}$$

We want $\phi$ to be such that the corresponding portfolio is self-financing and duplicates the claim, i.e., that it satisfies

$$dV_\phi(t) = \xi_t\,dS_t + \bar\xi_t\,d\bar S_t, \qquad V_\phi(T) = H_T. \tag{81}$$

It follows from Section 4.1 that, under the unique martingale measure $Q$, the discounted prices of the two risky assets, that for simplicity we continue denoting by $S_t$ and $\bar S_t$, are the martingales satisfying

$$dS_t = S_{t-}\,[\sigma_t\,dw_t^Q + \gamma_t\,dM_t^Q], \qquad d\bar S_t = \bar S_{t-}\,[\bar\sigma_t\,dw_t^Q + \bar\gamma_t\,dM_t^Q], \tag{82}$$

where $w_t^Q$ and $M_t^Q$ are as in (47) with $\psi_t\lambda_t$ and $\theta_t$ according to (53) and (54). Replacing $dS_t$ and $d\bar S_t$ from (82) in (81), it follows that also $V_\phi(t)$ is a $(Q, \mathcal{F}_t)$-martingale satisfying

$$V_\phi(t) = V_\phi(0) + \int_0^t [\xi_s S_s\sigma_s + \bar\xi_s\bar S_s\bar\sigma_s]\,dw_s^Q + \int_0^t [\xi_s S_{s-}\gamma_s + \bar\xi_s\bar S_{s-}\bar\gamma_s]\,dM_s^Q. \tag{83}$$

Consider next the $(Q, \mathcal{F}_t)$-martingale

$$M(t) := E^Q\{H_T \mid \mathcal{F}_t\}. \tag{84}$$

By the martingale representation theorem (see Theorem 2.3 applied here to the particular case of a univariate Poisson point process) there exist two $\mathcal{F}_t$-predictable processes $\zeta_t^{(1)}$ and $\zeta_t^{(2)}$ such that

$$M(t) = M(0) + \int_0^t \zeta_s^{(1)}\,dw_s^Q + \int_0^t \zeta_s^{(2)}\,dM_s^Q. \tag{85}$$

Comparing (83) and (85), one sees immediately that, by putting

$$V_\phi(0) = M(0) = E^Q\{H_T \mid \mathcal{F}_0\} \tag{86}$$

and choosing $\xi_t$, $\bar\xi_t$ such that (integrating with respect to a Wiener process one may change $S_t$ into $S_{t-}$)

$$\xi_t S_{t-}\sigma_t + \bar\xi_t\bar S_{t-}\bar\sigma_t = \zeta_t^{(1)}, \qquad \xi_t S_{t-}\gamma_t + \bar\xi_t\bar S_{t-}\bar\gamma_t = \zeta_t^{(2)} \tag{87}$$

we have $V_\phi(t) = M(t)$. Since $M(T) = H_T$ by definition, with the choices (86) and (87) we obtain a self-financing and hedging strategy (the value of $\eta_t$ follows from (80)). Notice that, in order to obtain a unique solution of (87), we have to require that $\sigma_t\bar\gamma_t - \bar\sigma_t\gamma_t \ne 0$, which is exactly one of the conditions required after (53) and (54) to obtain a unique equivalent martingale measure.

What we have just shown is an existence result leading to the completeness (in the sense of hedging) of the given market when the martingale measure is unique. To actually determine the hedging strategy, we need an explicit expression for the processes $\zeta_t^{(1)}$ and $\zeta_t^{(2)}$ that, in the case of a simple claim of the form $H_T = H(S_T, \bar S_T)$, can be obtained by analogy to the pure diffusion case using the generalized Ito formula (19). Due to the Markov property of $(S_t, \bar S_t)$, we may in fact put

$$M(t) = M(t; S_t, \bar S_t) = E^Q\{H(S_T, \bar S_T) \mid \mathcal{F}_t\}. \tag{88}$$

Formula (19) then leads to

$$\begin{aligned} dM(t) ={}& \Big\{ M_t(\cdot) + \tfrac{1}{2}M_{SS}(\cdot)S_{t-}^2\sigma_t^2 + \tfrac{1}{2}M_{\bar S\bar S}(\cdot)\bar S_{t-}^2\bar\sigma_t^2 + M_{S\bar S}(\cdot)S_{t-}\bar S_{t-}\sigma_t\bar\sigma_t \\ &\quad + \big[ M(t; S_{t-}(1+\gamma_t), \bar S_{t-}(1+\bar\gamma_t)) - M(t; S_{t-}, \bar S_{t-}) - M_S(\cdot)S_{t-}\gamma_t - M_{\bar S}(\cdot)\bar S_{t-}\bar\gamma_t \big]\psi_t\lambda_t \Big\}\,dt \\ &+ \big[ M_S(\cdot)S_t\sigma_t + M_{\bar S}(\cdot)\bar S_t\bar\sigma_t \big]\,dw_t^Q \\ &+ \big[ M(t; S_{t-}(1+\gamma_t), \bar S_{t-}(1+\bar\gamma_t)) - M(t; S_{t-}, \bar S_{t-}) \big]\,dM_t^Q. \end{aligned} \tag{89}$$

Since $M(t)$ is a $Q$-martingale, the drift (finite variation) term in (89) has to vanish and so it follows from (89) and (85) that

$$\zeta_t^{(1)} = M_S(t; S_t, \bar S_t)S_t\sigma_t + M_{\bar S}(t; S_t, \bar S_t)\bar S_t\bar\sigma_t, \qquad \zeta_t^{(2)} = M(t; S_{t-}(1+\gamma_t), \bar S_{t-}(1+\bar\gamma_t)) - M(t; S_{t-}, \bar S_{t-}). \tag{90}$$

For a related result see also Shirakawa (1990).

We conclude this subsection by pointing out that, analogously to Section 4.1, the procedure that we have described here for the case of a univariate point process can quite naturally be extended to the case of multivariate point processes, provided the market is completed with the addition of an appropriate number of further assets.

5.1.2. Term structure models

We consider the term structure model discussed in Section 4.2 assuming that the condition for uniqueness of the martingale measure given by the injectivity (see (79)) of the integral operator $K_t$ in (78) is satisfied. This subsection is mainly based on Björk, Kabanov and Runggaldier (1997) [see also Jarrow and Madan (1999) for a related approach]. In this market, where the basic assets are zero-coupon bonds with prices $p(t,T)$ for any maturity $T > t$ in addition to a nonrisky asset (money market account $B_t$), we have first to define a portfolio.

Definition 5.1. On the given bond market a portfolio is a pair $(\eta_t, \xi_t(dT))$ where
(i) $\eta_t$ is predictable;
(ii) for each $t$, $\xi_t(\cdot)$ is a signed finite measure on $[t, \infty)$.

Intuitively, $\eta_t$ is the number of units of the riskfree asset held in the portfolio at time $t$, and $\xi_t(dT)$ is the "number" of bonds, with maturities in $[T, T + dT)$, held at time $t$. Some integrability assumptions are also required, but we leave them here as implicit.

The value process of the portfolio $(\eta, \xi)$, discounted with respect to $B_t$, is

$$V_t(\eta, \xi) = \eta_t + \int_t^\infty p(t,T)\,\xi_t(dT) \tag{91}$$

where, with some abuse of notation, we denote by $p(t,T)$ also the discounted value of a $T$-bond.

Definition 5.2. The portfolio $(\eta, \xi)$ is self-financing if

$$dV_t(\eta, \xi) = \int_t^\infty \xi_t(dT)\,dp(t,T). \tag{92}$$

The integral on the right-hand side of (92) needs an appropriate definition. Justified by the development in Björk et al. (1997), we shall simply replace here $dp(t,T)$ in (92) by its expression under the (unique) martingale measure.
To obtain this expression, recall the condition (77) (or, equivalently, (76) with $\theta_t = 0$, $h_t(y) = 1$) on the coefficients of the forward rate dynamics in order that these dynamics hold under a martingale measure. Translating, via (68), these conditions back to the bond price dynamics and taking also into account the definition of $q^Q(dt,dy)$ in (72), one has

$$dp(t,T) = p(t-,T)\left[S(t,T)\,dw_t^Q + \int_E \big(e^{D(t,T;y)} - 1\big)\,q^Q(dt,dy)\right] \tag{93}$$

(recall that we take here for $p(t,T)$ the discounted values).

Given a contingent claim $H_T \in \mathcal{F}_T$, that we assume here to be bounded, the conditions for self-financing and perfect hedging can be expressed as (combining (92) with (93))

$$\begin{aligned} V_t(\eta, \xi) ={}& V_0(\eta, \xi) + \int_0^t \int_s^\infty \xi_s(dT)\,p(s,T)S(s,T)\,dw_s^Q \\ &+ \int_0^t \int_E \int_s^\infty \xi_s(dT)\,p(s-,T)\big(e^{D(s,T;y)} - 1\big)\,q^Q(ds,dy), \\ V_T(\eta, \xi) ={}& H_T, \end{aligned} \tag{94}$$

where the inner integral is with respect to $T$ and the outer with respect to $s$.

Paralleling the development in the previous Section 5.1.1, consider next the $(Q, \mathcal{F}_t)$-martingale

$$M(t) := E^Q\{H_T \mid \mathcal{F}_t\} \tag{95}$$

which, by the martingale representation Theorem 2.3, admits the representation (see (12) under the measure $Q$)

$$M(t) = M(0) + \int_0^t \zeta_s\,dw_s^Q + \int_0^t \int_E \zeta_H(s,y)\,q^Q(ds,dy) \tag{96}$$

for predictable (and appropriately integrable) $\zeta$ and $\zeta_H$. Comparing (94) with (96) one sees that, by putting

$$V_0(\eta, \xi) = M(0) = E^Q\{H_T \mid \mathcal{F}_0\} \tag{97}$$

and choosing $\xi_t(dT)$ such that

$$\int_t^\infty \xi_t(dT)\,p(t,T)S(t,T) = \zeta_t, \qquad \int_t^\infty \xi_t(dT)\,p(t-,T)\big(e^{D(t,T;y)} - 1\big) = \zeta_H(t,y) \tag{98}$$

we have $V_t(\eta, \xi) = M(t)$ and, in particular, $V_T(\eta, \xi) = H_T$, i.e., we have obtained a self-financing and hedging strategy (the value of $\eta_t$ follows from (91)).

Everything now hinges upon the (unique) solvability of (98). To this effect consider the integral operator $K_t^*$ implicit in the left-hand side of (98), namely

$$K_t^* : \xi \mapsto \left( \int_t^\infty p(t,T)S(t,T)\,\xi(dT),\ \int_t^\infty p(t-,T)\big(e^{D(t,T;\cdot)} - 1\big)\,\xi(dT) \right) \tag{99}$$

so that the conditions (98) become

$$K_t^*\,\xi = \big(\zeta_t,\ \zeta_H(t,\cdot)\big). \tag{100}$$

The integral operator $K_t^*$ will be called the hedging operator and the market is complete if $K_t^*$ is surjective.
Combining this result with that of Section 4.2 on the uniqueness of the martingale measure, namely (79), we may synthesize them into

Proposition 5.3. For the given term structure model (66), (67) we have that
(i) the martingale measure is unique if the martingale operators $K_t$ in (78) are injective;
(ii) the market is complete if the hedging operators $K_t^*$ in (99) are surjective.

It turns out that the operators $K_t^*$ are adjoint to $K_t$. If the spaces on which they act are finite-dimensional, then the injectivity of $K_t$ implies surjectivity of $K_t^*$ and thus uniqueness of the martingale measure implies completeness. Unfortunately, our spaces here are infinite-dimensional and so, due to the duality relationship $(\operatorname{Ker} K)^\perp = \operatorname{cl}(\operatorname{Im} K^*)$ between bounded linear operators, the injectivity of $K_t$ implies only that the image of $K_t^*$ is dense. In other words, the uniqueness of the martingale measure implies only an approximate completeness. For details we refer to Björk, Kabanov and Runggaldier (1997).

For the case when the mark space $E$ is infinite, Björk, Kabanov and Runggaldier (1997) also give a characterization of the hedgeable claims, based on a Laplace-transform technique and under assumptions that hold, e.g., in the case of an affine term structure. When the mark space $E$ is finite, it is furthermore shown in Björk, Kabanov and Runggaldier (1997) that, under appropriate assumptions, any claim can be hedged with a finite number of bonds, whose maturities can be chosen in an essentially arbitrary way and such that they remain fixed as the running time $t$ varies.

5.2. Hedging when the market is not complete

If one cannot have a complete market or market completion is not appropriate, one has to accept some residual risk, due either to non-self-financing or nonperfect hedging, and choose an investment strategy that minimizes the unhedgeable risk.
For this purpose various criteria have been proposed and here we describe one such criterion for the case of a slight variant of the market model described in Section 3.3. We assume here that the actual price $S_t$ of the risky asset satisfies a model of the form of (41), namely

$$dS_t = S_t\sqrt{v_t(Z_t)}\,dw_t, \tag{101}$$

where $Z_t$ is supposed to be a diffusion-type process of the form

$$dZ_t = \mu_t(Z_t)\,dt + \beta_t(Z_t)\,d\bar w_t \tag{102}$$

for a Wiener process $\bar w_t$, independent of $w_t$. Given a univariate, doubly stochastic Poisson process $N_t$ with intensity $\lambda_t = \lambda_t(Z_t)$, suppose that the prices of the risky asset can only be observed at the jump times $T_n$ of $N_t$, i.e., the observation process $Y_t$ is given by (see (43))

$$dY_t = \big(S_t - S_{T_{N_{t-}}}\big)\,dN_t \tag{103}$$

so that the information of the hedger can be modeled by the filtration $\mathcal{F}_t^Y = \sigma\{N_s, Y_s;\ s \le t\} \subset \mathcal{F}_t = \sigma\{S_0, Z_0, w_s, \bar w_s, N_s;\ s \le t\}$.

Notice that the only difference with respect to the model described in Section 3.3 is that here the actual price process $S_t$ varies continuously in time according to (101), but is observed only at the discrete time points $T_n$; there, the process according to (101) is only a background process and the actual price process is given by the values of the background process, sampled at the time points $T_n$ according to (103).

Notice also that, according to (101), the process $S_t$ is implicitly assumed to be a $(P, \mathcal{F}_t)$-martingale. On one hand, this will make our hedging procedure below applicable; on the other hand it can be justified by assuming that [see, e.g., Becherer (2001)] $S_t$ is discounted with respect to a $P$-numeraire portfolio, which is a tradable numeraire such that the discounted assets become martingales with respect to the original measure $P$.

Our hedging criterion will be that of (local) risk minimization according to Föllmer and Sondermann (1986) and Föllmer and Schweizer (1991), which keeps the requirement of perfect hedging and relaxes the self-financing requirement into mean self-financing.
More precisely, considering as strategy a pair $(\eta_t, \xi_t)$ of $\mathcal{F}_t^Y$-predictable processes with $\eta_t$ and $\xi_t$ denoting the number of units of the numeraire and the given asset respectively, that are held in the portfolio at time $t$, we give the following

Definition 5.4. Assuming prices are discounted with respect to the numeraire, define
$V_t = V_t(\eta, \xi) := \xi_t S_t + \eta_t$ as value process,
$C_t = C_t(\eta, \xi) := V_t - \int_0^t \xi_s\,dS_s$ as cost process.

Notice that, if $C_t(\eta, \xi) = \text{const.}$, the strategy $(\eta, \xi)$ is self-financing. We shall now relax this assumption by allowing $C_t(\eta, \xi)$ to be a $(P, \mathcal{F}_t^Y)$-martingale and, given a (square-integrable) claim $H(S_T)$ (already discounted with respect to the numeraire), determine a hedging strategy $(\eta, \xi)$ that, for all $t = T_n$ $(n = 1, 2, \ldots)$, minimizes

$$R_t^Y(\eta, \xi) := E\big\{ \big(C_T(\eta, \xi) - C_t(\eta, \xi)\big)^2 \mid \mathcal{F}_t^Y \big\} \tag{104}$$

with respect to the hedging strategies $(\eta, \xi)$ for which $C_t(\eta, \xi)$ is a $(P, \mathcal{F}_t^Y)$-martingale. The strategy $(\eta, \xi)$ will be called an $\mathcal{F}_t^Y$-risk minimizing strategy. Notice that there is a close relationship between risk minimizing strategies in the sense just specified and variance-minimizing strategies that are self-financing and minimize the variance of the residual hedging error.

To compute an $\mathcal{F}_t^Y$-risk minimizing strategy we shall proceed in two steps following Frey and Runggaldier (1999) [see also Fischer, Platen and Runggaldier (1999) and Frey (2000)].

In the first step we determine an $\mathcal{F}_t$-risk minimizing strategy, namely a risk minimizing strategy where the (hypothetical) information of the hedger corresponds to the full filtration $\mathcal{F}_t$, instead of the subfiltration $\mathcal{F}_t^Y$. For this purpose define the $P$-martingale

$$g(t, S_t, Z_t) := E\{H(S_T) \mid \mathcal{F}_t\}, \tag{105}$$

where the notation is justified by the Markov property of $(S_t, Z_t)$.
Assuming sufficient regularity of $g(\cdot)$, we proceed analogously to the last part of Section 5.1.1, applying Ito's formula to $g(t, S_t, Z_t)$ and thereby obtaining

$$\begin{aligned} H(S_T) ={}& g(0, S_0, Z_0) + \int_0^T \big[g_t(\cdot) + g_Z(\cdot)\mu_t(\cdot)\big]\,dt + \int_0^T \big[\tfrac{1}{2}g_{SS}(\cdot)v_t(\cdot)S_t^2 + \tfrac{1}{2}g_{ZZ}(\cdot)\beta_t^2(\cdot)\big]\,dt \\ &+ \int_0^T g_S(\cdot)\,dS_t + \int_0^T g_Z(\cdot)\beta_t(\cdot)\,d\bar w_t. \end{aligned} \tag{106}$$

Since $g(t, S_t, Z_t)$ is a $P$-martingale, the finite variation terms in (106) vanish, leading to

$$H(S_T) = g(0, S_0, Z_0) + \int_0^T g_S(t, S_t, Z_t)\,dS_t + M_T^H \tag{107}$$

which is of the form of a Kunita-Watanabe decomposition of $H(S_T)$, namely a decomposition of the form

$$H(S_T) = H_0 + \int_0^T \xi_t^H\,dS_t + M_T^H, \tag{108}$$

where $M^H$ is a $P$-martingale that, due to the independence of $w_t$ and $\bar w_t$, is orthogonal to the $P$-martingale $S$. It then follows from Föllmer and Sondermann (1986) and Föllmer and Schweizer (1991) that the $\mathcal{F}_t$-risk minimizing strategy is given by

$$\xi_t^F = \xi_t^H = g_S(t, S_t, Z_t), \qquad \eta_t^F = g(t, S_t, Z_t) - \xi_t^F S_t \tag{109}$$

so that $V_t(\eta^F, \xi^F) = g(t, S_t, Z_t)$. This strategy appears as a very natural extension of the classical Black-Scholes strategy in the pure diffusion case. Notice that, to actually determine $(\eta_t^F, \xi_t^F)$ and its value, one needs to compute $g(t, S_t, Z_t)$, which can be achieved either by computing the expectation in (105) (numerical simulations may be used) or by solving the PDE that results from (106) by setting the finite variation terms equal to zero. Details can be found in Frey and Runggaldier (1999).

Coming to the second step, it follows from a general result in Schweizer (1994) [see also Di Masi, Platen and Runggaldier (1995)] that the $\mathcal{F}_t^Y$-risk minimizing strategy is obtained by projecting the $\mathcal{F}_t$-risk minimizing strategy onto the subfiltration $\mathcal{F}_t^Y$. This projection property, which is due to the quadratic nature of the risk minimizing criterion, makes the criterion very attractive whenever one has to deal with partial information.
More precisely, the $\mathcal{F}_t^Y$-risk minimizing strategy $(\eta, \xi)$ is given by

$$\xi_t = \frac{E\{v_t(Z_t)S_t^2\,\xi_t^F(S_t, Z_t) \mid \mathcal{F}_{t-}^Y\}}{E\{v_t(Z_t)S_t^2 \mid \mathcal{F}_{t-}^Y\}}, \qquad \eta_t = E\{H(S_T) - \xi_t S_t \mid \mathcal{F}_t^Y\}. \tag{110}$$

Notice that, according to the model, the hedger will compute the strategy $(\eta, \xi)$ only at the jump times $T_n$ of $N_t$, when he receives new information [for details and a stochastic filtering-type algorithm to compute the projection in (110) see again Frey and Runggaldier (1999)].

We close the section by mentioning that, for a standard jump-diffusion model of the type of Section 3.1.1 with a marked point process, a self-financing strategy that minimizes the variance of the residual hedging error can be found in Chapter 7 of Lamberton and Lapeyre (1997).

6. Pricing in jump-diffusion models

6.1. General aspects

With the introduction of jumps and/or stochastic volatility the market becomes incomplete. Consequently, the principle of absence of arbitrage does not lead to a uniquely defined price. One actually obtains an entire range of prices [see Eberlein and Jacod (1997), Bellamy and Jeanblanc (2000)] and the preference structure of the investors has to come into play to determine the pricing measure. From the point of view of pure pricing, the problem then reduces to determining a specific martingale measure or, equivalently, the market price of risk. To this effect there are various possibilities and in this section we mention some of them, the last two of which will be discussed in more detail.

(i) Historically it appears that a first approach to pricing in markets that are incomplete due to jumps in the prices and to a jumping volatility has been based on general equilibrium with a representative agent [see, e.g., Ahn and Thompson (1988), Naik and Lee (1990), Ahn (1992)].
(ii) A somewhat related and rather recent approach is that of pricing by utility maximization, in which the density of the martingale measure (the pricing kernel) is related to the marginal utility of terminal wealth [see, e.g., Frittelli (2000) and the references therein; for a specific jump-diffusion setting see Miyahara (1998)].

(iii) An alternative possibility is given by more econometric-type approaches based on estimating/filtering the market price of risk on the basis of market data. Related to such an approach is the approach described in Herzel (1998) for a diffusion model with a volatility that may jump at a random time and where the price of a European call turns out to be a monotone function of a parameter characterizing the martingale measures. There exists then a unique value of this parameter consistent with the option price, which allows all the other derivatives to be priced consistently with this option. This corresponds basically to completing the market with the given option.

(iv) Approaches based on market completion. In the previous Section 4 we have discussed various ways to complete both stock as well as bond markets of the jump-diffusion type. As we have seen, this completion always leads to a unique martingale measure, but it does not necessarily imply also completeness in the sense that every claim can be hedged with a self-financing portfolio. On the other hand, the uniqueness alone of an equivalent martingale measure is already sufficient to obtain a unique arbitrage-free price of a claim as the expectation of its discounted value under this measure. In all cases where one achieves also completeness in the sense of hedging (essentially all cases except when there are an infinite number of sources of randomness) then, always by absence of arbitrage, the (unique) initial value of the self-financing and hedging portfolio has to coincide with the price computed as expectation under the unique martingale measure.
The approach based on market completion has been widely used and implemented in various economic setups and here we mention just Shirakawa (1990, 1991), Jeanblanc-Piqué and Pontier (1990), Naik (1993), Mercurio and Runggaldier (1993), Jarrow and Madan (1995, 1999). It has the advantage of leading to a unique price on the basis of the principle of absence of arbitrage alone, without having to make assumptions on a non-priced jump risk and without the need to introduce a general equilibrium model. On the other hand it requires that the stochastic evolution of more than just the underlying asset be specified and, without specific criteria, the completion may occasionally be rather arbitrary.

(v) In the previous Section 5, in the context of hedging it was mentioned that, if the market cannot be completed, then one has to accept some residual risk and it becomes natural to determine the hedging strategy on the basis of a risk minimization criterion. On the other hand, in the previous point (iv) we recalled the fact that, in a complete/completed market, the initial value of a self-financing and hedging portfolio has to coincide with the arbitrage-free price of the claim. By analogy, it then appears natural to define as price of a claim in a noncomplete market the initial value of a portfolio that is optimal with respect to a given hedging criterion. Quite typically, the initial value of such a portfolio turns out to be the expectation of the discounted value of the given claim under a specific martingale measure. In other words, there is a correspondence between hedging criteria and martingale measures and the choice of a specific pricing measure can be based on the choice of a specific hedging criterion. An approach along these lines appears thus related to the pricing approach by utility maximization, mentioned in point (ii) above.
As an example, let us point out that the criterion of risk minimization discussed in Section 5.2 leads to the so-called minimal martingale measure that was already mentioned at the end of Section 4.1. It has been further shown in Runggaldier and Schweizer (1995) that, if in a jump-diffusion model claims are priced according to the minimal martingale measure, then convergence of asset prices implies convergence of option prices. This stability result for prices computed according to the minimal martingale measure makes the risk minimization criterion discussed in Section 5.2 an attractive criterion for hedging. [For further extensions of this stability property see Prigent (1999), Hubalek and Schachermayer (1998).]

6.2. Computational aspects

Assume that for a jump-diffusion model we have selected a specific martingale measure according to one of the approaches mentioned in Section 6.1. We then have to compute the expectation of the (discounted value of the) claim under this martingale measure. In this section we mention some of the possible methods to accomplish this.

We consider first the univariate jump-diffusion model (45) under a generic martingale measure $Q$ with intensity of the Poisson process $N_t$ given by $\lambda_t\psi_t$. If $Q$ corresponds to the unique martingale measure obtained from a market completion as in Section 4.1, then $\lambda_t\psi_t$ has to be taken according to (53). For simplicity we assume that all the prices are already discounted, so that we can put $r_t \equiv 0$. The dynamics of $S_t$ under $Q$ are given by (see (48), (49), (47))

$$dS_t = S_{t-}\left[-\varphi_t\lambda_t\psi_t\,dt + \sigma_t\,dw_t^Q + \varphi_t\,dN_t\right]. \tag{111}$$

We want to compute the value of a European call option, namely $E^Q\{(S_T-K)^+\}$. For this purpose we adapt an approach from Mercurio and Runggaldier (1993), assuming first that in (111) we have $\varphi_t \equiv \varphi$, i.e., the jump coefficient is constant [for this case see also Aase (1988)]. We have

$$E^Q\{(S_T-K)^+\} = E^Q\left\{E^Q\{(S_T-K)^+ \mid N_T\}\right\}. \tag{112}$$

For a fixed $k$, i.e., when $N_T = k$ ($k = 0,1,\ldots$), using the exponential formula (17) for the specific case when (14) is given by (111), we have

$$S_T^{(k)} = S_0\,e^{k\log(1+\varphi)}\exp\left\{-\int_0^T\left(\varphi\lambda_s\psi_s + \tfrac12\sigma_s^2\right)ds + \int_0^T\sigma_s\,dw_s^Q\right\}, \tag{113}$$

namely

$$\log S_T^{(k)} \sim N\left(\,\cdot\,;m_T,\bar\sigma_T^2\right) \tag{114}$$

with

$$m_T = \log S_0 + k\log(1+\varphi) - \int_0^T\left(\varphi\lambda_s\psi_s + \tfrac12\sigma_s^2\right)ds, \qquad \bar\sigma_T^2 = \int_0^T\sigma_s^2\,ds, \tag{115}$$

i.e., $S_T^{(k)}$ is lognormal, with $m_T$ and $\bar\sigma_T^2$ the mean and variance of $\log S_T^{(k)}$. Next compute (with $\Phi(\cdot)$ the cumulative standard Gaussian distribution function)

$$\begin{aligned}
V_0^{(k)} := E^Q_{S_0}\left\{\left(S_T^{(k)}-K\right)^+\right\}
&= \int_{\log K}^{+\infty}\left(e^x-K\right)dN\left(x;m_T,\bar\sigma_T^2\right)\\
&= \frac{1}{\sqrt{2\pi\bar\sigma_T^2}}\int_{\log K}^{+\infty}e^x\,e^{-\frac{1}{2\bar\sigma_T^2}(x-m_T)^2}\,dx - \frac{K}{\sqrt{2\pi\bar\sigma_T^2}}\int_{\log K}^{+\infty}e^{-\frac{1}{2\bar\sigma_T^2}(x-m_T)^2}\,dx\\
&= e^{m_T+\frac12\bar\sigma_T^2}\,\Phi\left(\frac{m_T+\bar\sigma_T^2-\log K}{\bar\sigma_T}\right) - K\,\Phi\left(\frac{m_T-\log K}{\bar\sigma_T}\right)\\
&:= (1+\varphi)^k\,G(k,S_0)
\end{aligned} \tag{116}$$

with

$$G(k,S_0) = S_0\exp\left\{-\int_0^T\varphi\lambda_s\psi_s\,ds\right\}\Phi(x) - \frac{K}{(1+\varphi)^k}\,\Phi(y), \tag{117}$$

where

$$x = \frac{\log\left(S_0(1+\varphi)^k/K\right) + \int_0^T\left(-\varphi\lambda_s\psi_s + \tfrac12\sigma_s^2\right)ds}{\sqrt{\int_0^T\sigma_s^2\,ds}}, \qquad y = x - \sqrt{\int_0^T\sigma_s^2\,ds}. \tag{118}$$

Coming back to (112) we then have

$$E^Q\{(S_T-K)^+\} = E^Q\left\{V_0^{(N_T)}\right\} = \sum_{k=0}^{\infty}(1+\varphi)^k\,G(k,S_0)\,\frac{H^k}{k!}\,e^{-H} \tag{119}$$

with $H = \int_0^T\lambda_s\psi_s\,ds$. Notice that, for actual computations, the infinite sum on the right-hand side of (119) has to be truncated at a sufficiently large positive integer.

The result for $\varphi_t \equiv \varphi$ can be easily extended to the case when $\varphi_t$ is a piecewise constant deterministic time function. To this effect, given a positive integer $m$ and a subdivision $0 = t_0^m < t_1^m < \cdots < t_m^m = T$, let

$$\varphi_t^{(m)} = \varphi_0\,1_{\{0\}}(t) + \sum_{j=1}^{m}\varphi_j\,1_{(t_{j-1}^m,\,t_j^m]}(t); \qquad \varphi_j > -1. \tag{120}$$

Furthermore, let $P_j$ ($j = 1,\ldots,m$) be independent Poisson random variables with parameters $H_j = \int_{t_{j-1}^m}^{t_j^m}\lambda_s\psi_s\,ds$. The generalization of formula (119) is then

$$E^Q\{(S_T-K)^+\} = \sum_{k_1,\ldots,k_m=0}^{\infty}\exp\left\{\sum_{j=1}^{m}k_j\log(1+\varphi_j)\right\}G(k_1,\ldots,k_m,S_0)\prod_{j=1}^{m}\frac{(H_j)^{k_j}}{(k_j)!}\,e^{-H_j} \tag{121}$$

with

$$G(k_1,\ldots,k_m,S_0) = S_0\exp\left\{-\int_0^T\varphi_s^{(m)}\lambda_s\psi_s\,ds\right\}\Phi(x) - \frac{K}{\prod_{j=1}^{m}(1+\varphi_j)^{k_j}}\,\Phi(y), \tag{122}$$

where

$$x = \frac{\log\left(S_0\prod_{j=1}^{m}(1+\varphi_j)^{k_j}/K\right) + \int_0^T\left(-\varphi_s^{(m)}\lambda_s\psi_s + \tfrac12\sigma_s^2\right)ds}{\sqrt{\int_0^T\sigma_s^2\,ds}}, \qquad y = x - \sqrt{\int_0^T\sigma_s^2\,ds}. \tag{123}$$

Coming finally to the case of a more general deterministic time function $\varphi_t$ for the jump coefficient, we assume that there exist piecewise constant deterministic time functions $\varphi_t^{(m)}$ and $\sigma_t^{(m)}$ such that

$$\varphi_t^{(m)} \uparrow \varphi_t, \qquad \sigma_t^{(m)} \to \sigma_t \qquad \text{as } m \to \infty. \tag{124}$$

Consider then a sequence of fictitious risky assets, whose (discounted) values $S_t^{(m)}$ are martingales with respect to the same martingale measure $Q$ as is $S_t$ in (111), namely they satisfy

$$dS_t^{(m)} = S_{t-}^{(m)}\left[-\varphi_t^{(m)}\lambda_t\psi_t\,dt + \sigma_t^{(m)}\,dw_t^Q + \varphi_t^{(m)}\,dN_t\right]. \tag{125}$$

For each of the processes $S_t^{(m)}$ we can compute

$$v_0^{(m)} = E^Q\left\{\left(S_T^{(m)}-K\right)^+\right\} \tag{126}$$

according to (121)-(123). In Mercurio and Runggaldier (1993) it is now shown that

$$\lim_{m\to\infty}v_0^{(m)} = v_0 = E^Q\{(S_T-K)^+\}, \tag{127}$$

i.e., if $\varphi_t$ is a generic time function that can be approximated from below by a sequence of piecewise constant time functions, then the corresponding option value can be approximated arbitrarily closely by computable expressions. In Mercurio and Runggaldier (1993) it is also shown that, for given $m$, $v_0^{(m)}$ can be interpreted as the initial value of a mean self-financing and risk-minimizing portfolio in the sense of Section 5.2 when the asset price evolves in discrete time according to the process $S_t^{(m)}$ of (125), evaluated at the discrete time points $t_j$. In line with the last part of point (v) of Section 6.1, we may thus consider the approximating values $v_0^{(m)}$ as option values themselves, computed according to the minimal martingale measure.

After having discussed the univariate jump-diffusion model (45), we now turn to the general jump-diffusion model with a marked point process, which can equivalently be represented either by (32) or by (33). We opt here for the representation (32), i.e.,

$$dS_t = S_{t-}\left[\alpha_t\,dt + \sigma_t\,dw_t + \gamma(t,Y_t)\,dN_t\right]. \tag{128}$$

In what follows we shall make the further

Assumption 6.1.
(i) $\gamma(t,Y_t) \equiv \gamma(Y_t)$, i.e., $\gamma$ is independent of the current time;
(ii) considering the representation of the marked point process as a double sequence $(T_n,Y_n)$, assume that $T_n$ is independent of $Y_n$ and that the $Y_n$ form a sequence of independent random variables, the generic one $Y_n$ having law $m(dy)$.

The driving marked point process thus has local $P$-characteristics $(\lambda_t, m(dy))$.

Suppose that we have chosen a specific martingale measure $Q$ and that we want to compute $v_0 = E^Q\{H(S_T)\}$ where, typically, we may have $H(S) = (S-K)^+$. For this purpose, in what follows we adapt a procedure from Chapter 7 in Lamberton and Lapeyre (1997). Recall first from Theorem 2.5 that a general absolutely continuous measure transformation from $P$ to $Q$ transforms the $P$-local characteristics into $Q$-local characteristics of the form $(\lambda_t\psi_t, h_t(y)\,m(dy))$. Recalling furthermore (63) with (62) and (64), it is easily seen that, under the measure $Q$ corresponding to the above local characteristics, the discounted value of $S_t$ satisfies

$$dS_t = S_{t-}\left[-\bar\gamma_t\bar\lambda_t\,dt + \sigma_t\,dw_t^Q + \gamma(Y_t)\,dN_t\right], \tag{129}$$

where we have put $\bar\gamma_t = \int_E\gamma(y)\,h_t(y)\,m(dy)$ and $\bar\lambda_t = \lambda_t\psi_t$. Using the exponential formula (17) to integrate (129), which is of the form (14) with the representation (15), one immediately finds that, for a given initial asset price $S_0$, the value $v_0(S_0)$ of the claim $H(S_T)$ is given by

$$v_0(S_0) = E^Q\left\{H\left(S_0\exp\left[-\int_0^T\left(\bar\gamma_t\bar\lambda_t + \frac{\sigma_t^2}{2}\right)dt + \int_0^T\sigma_t\,dw_t^Q\right]\prod_{n=1}^{N_T}\left(1+\gamma(Y_n)\right)\right)\right\}. \tag{130}$$

Next let

$$V(S_0) := E^Q\left\{H\left(S_0\exp\left[-\int_0^T\frac{\sigma_t^2}{2}\,dt + \int_0^T\sigma_t\,dw_t^Q\right]\right)\right\} \tag{131}$$

so that, for $H(S) = (S-K)^+$, $V(S_0)$ is given by the Black-Scholes formula, i.e., $V(S_0) = BS(S_0)$. With the use of $V(S_0)$ we can now write

$$\begin{aligned}
v_0(S_0) &= E^Q\left\{V\left(S_0\exp\left[-\int_0^T\bar\gamma_t\bar\lambda_t\,dt\right]\prod_{n=1}^{N_T}\left(1+\gamma(Y_n)\right)\right)\right\}\\
&= \sum_{k=0}^{\infty}E^Q\left\{V\left(S_0\exp\left[-\int_0^T\bar\gamma_t\bar\lambda_t\,dt\right]\prod_{n=1}^{k}\left(1+\gamma(Y_n)\right)\right)\right\}\frac{H^k}{k!}\,e^{-H},
\end{aligned} \tag{132}$$

where, due to the local characteristics under $Q$, we have $H = \int_0^T\lambda_s\psi_s\,ds$ and where the expectation is with respect to the joint distribution of the $Y_n$, which in Assumption 6.1 were supposed to be independent. This latter expectation can be computed explicitly in special cases; in more complicated cases one has to use simulations. Again, for the actual computations, the infinite sum has to be truncated at a sufficiently large positive integer.

We close this section by mentioning that in Glasserman and Kou (1999), for the term structure models of simple forwards in the jump-diffusion setup described therein, the authors study the pricing of some derivative securities after having characterized arbitrage-free dynamics. The derivative prices are also used to investigate what types of patterns in implied volatilities are produced through jumps.

References

Aase, K.K., 1988. Contingent claim valuation when the security price is a combination of an Ito process and a random point process. Stochastic Processes and their Applications 28, 185-220.
Ahn, C., 1992. Option pricing when jump risk is systematic. Mathematical Finance 2, 299-308.
Ahn, C.M., Thompson, H.E., 1988. Jump-diffusion processes and the term structure of interest rates. Journal of Finance 43, 155-174.
Babbs, S., Webber, N., 1997. Term structure modelling under alternative official regimes. In: Dempster, M.H.A., Pliska, S.R. (Eds.), Mathematics of Derivative Securities. Cambridge University Press.
Bakshi, G., Cao, C., Chen, Z., 1997. Empirical performance of alternative option pricing models. Journal of Finance 52 (5), 2003-2049.
Ball, C.A., Torous, W.N., 1985. On jumps in common stock prices and their impact on call option pricing. Journal of Finance 40 (1), 155-173.
Becherer, D., 2001. The numeraire portfolio for unbounded semimartingales. Finance and Stochastics 5, 327-341.
Bellamy, N., Jeanblanc, M., 2000. Incompleteness of markets driven by a mixed diffusion. Finance and Stochastics 4, 201-222.
Björk, T., Kabanov, Yu., Runggaldier, W.J., 1997. Bond market structure in the presence of marked point processes. Mathematical Finance 40 (1), 211-239.
Björk, T., Di Masi, G.B., Kabanov, Yu., Runggaldier, W.J., 1997.
Towards a general theory of bond markets. Finance and Stochastics 1, 141-174.
Björk, T., Näslund, B., 1998. Diversified portfolios in continuous time. European Financial Review 1, 361-387.
Brémaud, P., 1981. Point Processes and Queues: Martingale Dynamics. Springer-Verlag, New York.
Cox, D., 1955. Some statistical methods connected with series of events. Journal of the Royal Statistical Society. Series B 17, 129-164.
Cox, J.C., Ross, S.A., 1976. The valuation of options for alternative stochastic processes. Journal of Financial Economics 3, 145-166.
Di Masi, G.B., Platen, E., Runggaldier, W.J., 1995. Hedging of options under discrete observation on assets with stochastic volatility. In: Seminar on Stochastic Analysis, Random Fields and Applications. In: Progress in Probability, Vol. 36. Birkhäuser, pp. 359-364.
Eberlein, E., Jacod, J., 1997. On the range of options prices. Finance and Stochastics 2, 131-140.
Fischer, P., Platen, E., Runggaldier, W.J., 1999. Risk-minimizing hedging strategies under partial information. In: Seminar on Stochastic Analysis, Random Fields and Applications. In: Progress in Probability, Vol. 45. Birkhäuser, pp. 173-186.
Föllmer, H., Schweizer, M., 1991. Hedging of contingent claims under incomplete information. In: Davis, M.H.A., Elliott, R.J. (Eds.), Applied Stochastic Analysis. In: Stochastic Monographs, Vol. 5. Gordon and Breach, London, pp. 389-414.
Föllmer, H., Sondermann, D., 1986. Hedging of non-redundant contingent claims. In: Hildenbrand, W., Mas-Colell, A. (Eds.), Contributions to Mathematical Economics, North-Holland, pp. 205-223.
Framstad, N.C., Oeksendal, B., Sulem, A., 2001. Optimal consumption and portfolio in a jump-diffusion market with proportional transaction costs. Journal of Mathematical Economics 35, 233-257.
Frey, R., 2000. Risk minimization with incomplete information in a model for high frequency data. Mathematical Finance 10, 215-225.
Frey, R., Runggaldier, W.J., 1999. Risk-minimizing hedging strategies under restricted information: The case of stochastic volatility models observable only at discrete random times. Mathematical Methods Operations Research 50, 339-350.
Frey, R., Runggaldier, W.J., 2001. A nonlinear filtering approach to volatility estimation with a view towards high frequency data. International Journal of Theoretical and Applied Finance 4 (2), 199-210.
Frittelli, M., 2000. The minimal entropy martingale measure and the valuation problem in incomplete markets. Mathematical Finance 10, 39-52.
Geman, H., Madan, D., Yor, M., 1999. Asset prices are Brownian motions: only in business time. In: Avellaneda, M. (Ed.), Quantitative Analysis in Financial Markets. World Scientific, Singapore.
Glasserman, P., Kou, S.G., 1999. The term structure of simple forward rates with jump risk. Preprint. Columbia University.
Herzel, S., 1998. A simple model for option pricing with jumping stochastic volatility. International Journal of Theoretical and Applied Finance 1, 487-505.
Hubalek, F., Schachermayer, W., 1998. When does convergence of asset price processes imply convergence of option prices? Mathematical Finance 5, 385-403.
Jacod, J., Shiryaev, A.N., 1987. Limit Theorems for Stochastic Processes. Springer-Verlag, Berlin.
Jamshidian, F., 1999. LIBOR market model with semimartingales. Working Paper. Net Analytic Ltd., London.
Jarrow, R., Madan, D., 1995. Option pricing using the term structure of interest rates to hedge systematic discontinuous asset returns. Mathematical Finance 5, 311-336.
Jarrow, R., Madan, D., 1999. Hedging contingent claims on semimartingales. Finance and Stochastics 3, 111-134.
Jeanblanc-Piqué, M., Pontier, M., 1990. Optimal portfolio for a small investor in a market model with discontinuous prices. Applied Mathematical Optimization 22, 287-310.
Jensen, B., 1999. Option pricing in the jump-diffusion model with a random jump amplitude: A complete market approach. Centre for Analytical Finance. WP Series No. 42, Aarhus.
Jorion, P., 1988. On jump processes in the foreign exchange and stock markets. Review of Financial Studies 1 (4), 427-445.
Lamberton, D., Lapeyre, B., 1997. Introduction au Calcul Stochastique Appliqué à la Finance. Ellipses. English translation by Chapman and Hall.
Mercurio, F., Runggaldier, W.J., 1993. Option pricing for jump-diffusions: Approximations and their interpretation. Mathematical Finance 3, 191-200.
Merton, R., 1976. Option pricing when the underlying stock returns are discontinuous. Journal of Financial Economics 5, 125-144.
Miyahara, Y., 1998. Minimal entropy martingale measures of jump type processes in incomplete asset markets. WP Nagoya City University.
Mulinacci, S., 1996. An approximation of American option prices in a jump-diffusion model. Stochastic Processes and their Applications 62, 1-17.
Naik, V., 1993. Option valuation and hedging strategies with jumps in the volatility of asset returns. Journal of Finance 48 (5), 1969-1984.
Naik, V., Lee, M., 1990. General equilibrium pricing of options on the market portfolio with discontinuous returns. Review of Financial Studies 3, 493-521.
Pham, H., 1997. Optimal stopping, free boundary and American option in a jump diffusion model. Applied Mathematical Optimization 35, 145-164.
Prigent, J.L., 1999. Incomplete markets: convergence of option values under the minimal martingale measure. Advances in Applied Probabilities 31, 1058-1077.
Prigent, J.L., 2001. Option pricing with a general marked point process. Mathematics of Operations Research 26 (1), 50-66.
Rogers, L.C.G., Zane, O., 1998. Designing models for high frequency data. Preprint. University of Bath.
Runggaldier, W.J., Schweizer, M., 1995. Convergence of option values under incompleteness. In: Seminar on Stochastic Analysis, Random Fields and Applications. In: Progress in Probability, Vol. 36. Birkhäuser, pp. 365-384.
Rydberg, T., Shephard, N., 1999.
A modelling framework for prices and trades made at the New York stock exchange. Nuffield College working paper series 1999-W14.
Schweizer, M., 1994. Risk minimizing hedging strategies under restricted information. Mathematical Finance 4, 327-342.
Shirakawa, H., 1990. Security market model with Poisson and diffusion type return process. Institute of Human and Social Sciences, Tokyo Institute of Technology.
Shirakawa, H., 1991. Interest rate option pricing with Poisson-Gaussian forward rate curve processes. Mathematical Finance 1, 77-94.

Chapter 6

HYPERBOLIC PROCESSES IN FINANCE

BO MARTIN BIBBY
Department of Mathematics and Physics, The Royal Veterinary and Agricultural University, Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark

MICHAEL SØRENSEN
Department of Statistics and Operations Research, Institute of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, DK-2100 København Ø, Denmark

Contents
Abstract
1. Hyperbolic and related distributions
1.1. The generalized hyperbolic distribution
1.2. The generalized inverse Gaussian distribution
1.3. Statistical inference
2. Lévy processes
3. Stochastic differential equations
3.1. Diffusion models
3.2. Statistical inference for diffusion processes
3.3. Ornstein-Uhlenbeck processes
3.4. Compound processes
4. Stochastic volatility models
Acknowledgment
Appendix
References

Abstract

Distributions that have tails heavier than the normal distribution are ubiquitous in finance. For purposes such as risk management and derivative pricing it is important to use relatively simple models that can capture the heavy tails and other relevant features of financial data.
A class of distributions that is very often able to fit the distributions of financial data is the class of generalized hyperbolic distributions. This has been established in numerous investigations, see, e.g., Eberlein and Keller (1995), Bibby and Sørensen (1997), Hurst (1997), Eberlein, Keller and Prause (1998), Rydberg (1999), Küchler et al. (1999), Jiang (2000), and Barndorff-Nielsen and Shephard (2001c). The class of generalized hyperbolic distributions includes the standard hyperbolic distributions, the normal inverse Gaussian distributions, the scaled t-distributions and the variance-gamma distributions. The use of scaled t-distributions in finance was studied by Praetz (1972) and Blattberg and Gonedes (1974), while Madan and Seneta (1990) introduced the variance-gamma distributions in the financial literature. The normal distribution appears as a limit of generalized hyperbolic distributions. The tail behaviour of the generalized hyperbolic distributions thus spans a range from Gaussian tails via exponential tails to the power tails of the t-distributions.

In Section 1 we present the generalized hyperbolic distributions and their most important properties. We also discuss the generalized inverse Gaussian distributions, which play an important role in the theory of generalized hyperbolic distributions and processes. This class of distributions is also of interest in its own right as a model of positive quantities in finance. Its right-hand tail behaviour spans a range from exponential decrease to a Pareto tail.

In the following sections we present a number of stochastic process models for which the marginal distributions or the distributions of increments (or both) are generalized hyperbolic. The models are increasingly complex and are thus able to fit an increasing number of the stylized features of financial data. The well-established features of financial data are reviewed, for instance, in Barndorff-Nielsen (1998) and Rydberg (2000).
In Section 2 we discuss Lévy process models, while in Section 3 we discuss models defined by stochastic differential equations. These include classical diffusion models and Ornstein-Uhlenbeck models driven by Lévy processes as well as superpositions of such models. In the final Section 4 we present generalized hyperbolic stochastic volatility models.

1. Hyperbolic and related distributions

In this section we present the generalized hyperbolic distributions and describe their most important properties. We will also discuss the generalized inverse Gaussian distributions, which play an important role in the theory of generalized hyperbolic distributions and processes. As mentioned earlier, this class of distributions is also of independent interest as a model of positive quantities in finance. We will present a few examples of how well these distributions fit financial data.

1.1. The generalized hyperbolic distribution

The generalized hyperbolic distributions were introduced by Barndorff-Nielsen (1977) and include, among others, the hyperbolic distributions, the normal-inverse Gaussian (NIG) distributions, the scaled t-distributions and the variance-gamma distributions. We shall discuss these sub-classes in more detail later. First we present the generalized hyperbolic distributions and their properties.

A generalized hyperbolic distribution has five parameters. If $X$ follows a generalized hyperbolic distribution we write $X \sim H(\lambda,\alpha,\beta,\delta,\mu)$. The probability density function of a generalized hyperbolic distribution is given by

$$\frac{(\gamma/\delta)^{\lambda}}{\sqrt{2\pi}\,K_{\lambda}(\delta\gamma)}\;\frac{K_{\lambda-1/2}\left(\alpha\sqrt{\delta^2+(x-\mu)^2}\right)}{\left(\sqrt{\delta^2+(x-\mu)^2}\,/\,\alpha\right)^{1/2-\lambda}}\;e^{\beta(x-\mu)}, \qquad x\in\mathbb{R}, \tag{1}$$

where $\gamma^2 = \alpha^2-\beta^2$, and $K_{\lambda}$ is the modified Bessel function of the third kind with index $\lambda$. Definitions and results concerning Bessel functions are collected in an appendix. The parameter domain for the class of generalized hyperbolic distributions is given by

$$\begin{aligned}
&\delta \ge 0,\ \alpha > 0,\ \alpha^2 > \beta^2, &&\text{if } \lambda > 0,\\
&\delta > 0,\ \alpha > 0,\ \alpha^2 > \beta^2, &&\text{if } \lambda = 0,\\
&\delta > 0,\ \alpha \ge 0,\ \alpha^2 \ge \beta^2, &&\text{if } \lambda < 0.
\end{aligned}$$

In all cases $\mu\in\mathbb{R}$.
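As a rough numerical illustration of the density (1), the following Python sketch evaluates it for $\delta > 0$ and $\alpha^2 > \beta^2$ and checks that it integrates to one. The names `bessel_k`, `gh_pdf` and `trapezoid`, the quadrature settings and the parameter values are illustrative assumptions and do not come from the chapter. The Bessel function is computed from the standard integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ so that only the standard library is needed; this is a sketch, not a production-quality routine.

```python
import math


def bessel_k(nu, x, steps=400):
    """K_nu(x) for x > 0 via the trapezoidal rule applied to
    K_nu(x) = int_0^inf exp(-x*cosh(t)) * cosh(nu*t) dt."""
    # Pick a cut-off where the integrand is negligible relative to exp(-x).
    t_max = 1.0
    while x * (math.cosh(t_max) - 1.0) - abs(nu) * t_max < 40.0:
        t_max += 0.5
    h = t_max / steps
    total = 0.5 * math.exp(-x)  # t = 0 endpoint, half weight
    for i in range(1, steps + 1):
        t = i * h
        weight = 0.5 if i == steps else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return h * total


def gh_pdf(x, lam, alpha, beta, delta, mu):
    """Generalized hyperbolic density (1); assumes delta > 0 and alpha^2 > beta^2."""
    gamma = math.sqrt(alpha ** 2 - beta ** 2)
    q = math.sqrt(delta ** 2 + (x - mu) ** 2)
    norm = (gamma / delta) ** lam / (math.sqrt(2.0 * math.pi) * bessel_k(lam, delta * gamma))
    # Dividing by (q/alpha)^(1/2 - lam) equals multiplying by (q/alpha)^(lam - 1/2).
    return norm * bessel_k(lam - 0.5, alpha * q) * (q / alpha) ** (lam - 0.5) * math.exp(beta * (x - mu))


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for f on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


# Illustrative parameter values (lambda = 1 corresponds to the hyperbolic sub-class).
mass = trapezoid(lambda x: gh_pdf(x, 1.0, 2.0, 0.5, 1.0, 0.0), -30.0, 40.0, 1400)
print("total probability mass:", mass)
```

Setting `lam=-0.5` in the same function gives the normal-inverse Gaussian sub-class mentioned above; the boundary cases of the parameter domain are deliberately not handled here.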
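If $\delta = 0$ or $\alpha^2 = \beta^2$ the generalized hyperbolic density in (1) is defined as the limit expression obtained by using (A.5). Note that if $\beta$ is equal to zero, the distribution is symmetric.

The class of generalized hyperbolic distributions is closed under affine transformation. That is, if $X \sim H(\lambda,\alpha,\beta,\delta,\mu)$ and $Y$ is defined as $Y = aX + b$, for some positive $a$, then we have that

$$Y \sim H\left(\lambda,\ \frac{\alpha}{a},\ \frac{\beta}{a},\ a\delta,\ a\mu+b\right). \tag{2}$$

From (2) we also see that the parameter $\lambda$ is invariant under affine transformations of a generalized hyperbolic random variable.

From (A.3) it follows that the mode points for the generalized hyperbolic distribution are solutions to the equation

$$\frac{x-\mu}{\sqrt{\delta^2+(x-\mu)^2}}\;\frac{K_{\lambda-3/2}\left(\alpha\sqrt{\delta^2+(x-\mu)^2}\right)}{K_{\lambda-1/2}\left(\alpha\sqrt{\delta^2+(x-\mu)^2}\right)} = \frac{\beta}{\alpha}. \tag{3}$$

If $\beta = 0$, it follows immediately that the distribution is unimodal with mode point $\mu$. If $\lambda \ge \frac32$, the ratio of the modified Bessel functions in (3) increases monotonically from 0 to 1, and therefore the distribution is unimodal. See Blæsild (1978) for further discussion of features of the generalized hyperbolic density function.

The Laplace transform of the generalized hyperbolic distribution is given by

$$L(z) = \frac{e^{\mu z}\,\gamma^{\lambda}\,K_{\lambda}(\delta\gamma_z)}{\gamma_z^{\lambda}\,K_{\lambda}(\delta\gamma)}, \qquad |\beta+z| < \alpha, \tag{4}$$

where $\gamma_z^2 = \alpha^2-(\beta+z)^2$. From (A.3) we get that

$$EX = \mu + \frac{\delta\beta}{\gamma}\,\frac{K_{\lambda+1}(\delta\gamma)}{K_{\lambda}(\delta\gamma)}, \tag{5}$$

and

$$\mathrm{Var}X = \frac{\delta}{\gamma}\,\frac{K_{\lambda+1}(\delta\gamma)}{K_{\lambda}(\delta\gamma)} + \frac{\beta^2\delta^2}{\gamma^2}\left(\frac{K_{\lambda+2}(\delta\gamma)}{K_{\lambda}(\delta\gamma)} - \frac{K_{\lambda+1}^2(\delta\gamma)}{K_{\lambda}^2(\delta\gamma)}\right). \tag{6}$$

Expressions for the skewness and kurtosis involve modified Bessel functions in a rather complicated way and can be found in Barndorff-Nielsen and Blæsild (1980).

Sometimes it is useful to reparametrize the generalized hyperbolic density in terms of the parameters $\lambda$, $\varpi$, $\zeta$, $\delta$, and $\mu$, where $\varpi = \beta/\gamma$ and $\zeta = \delta\gamma$. Using this parametrization, the generalized hyperbolic density has the form

$$\frac{\sqrt{\zeta}}{\sqrt{2\pi}\,\delta\,K_{\lambda}(\zeta)}\;\frac{K_{\lambda-1/2}\left(\zeta\sqrt{1+\varpi^2}\,\sqrt{1+((x-\mu)/\delta)^2}\right)}{\left(\sqrt{1+((x-\mu)/\delta)^2}\,/\,\sqrt{1+\varpi^2}\right)^{1/2-\lambda}}\;e^{\varpi\zeta(x-\mu)/\delta}, \qquad x\in\mathbb{R}. \tag{7}$$

The parameters $\lambda$, $\varpi$, and $\zeta$ are invariant under affine transformations of a random variable following the generalized hyperbolic distribution. More precisely, the result equivalent to (2) is that $Y \sim H(\lambda,\varpi,\zeta,a\delta,a\mu+b)$.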
From this result we see that $\delta$ is a scaling parameter and $\mu$ is a location parameter.

In Figure 1 generalized hyperbolic densities are drawn for different values of $\lambda$, $\varpi$, and $\zeta$. In all cases the mean value is 0 and the variance is 1. The tail behaviour of the distributions is more easily seen in Figure 2, where the logarithms of the same densities are plotted.

Fig. 1. Generalized hyperbolic densities with mean 0 and variance 1 for different values of the parameters $\lambda$, $\varpi$, and $\zeta$.

Fig. 2. The logarithm of generalized hyperbolic densities with mean 0 and variance 1 for different values of the parameters $\lambda$, $\varpi$, and $\zeta$.

We shall now consider the important special cases of the generalized hyperbolic distribution mentioned earlier.

The hyperbolic distributions form the subclass obtained when $\lambda$ is equal to 1. With $\lambda$ equal to 1 in (1), we get the following expression for the density of a hyperbolic distribution,

$$\frac{\gamma}{2\alpha\delta K_1(\delta\gamma)}\exp\left\{-\alpha\sqrt{\delta^2+(x-\mu)^2} + \beta(x-\mu)\right\}, \qquad x\in\mathbb{R}. \tag{8}$$

From (8) we see that the logarithm of the density of a hyperbolic distribution is a hyperbola, which should be compared to the parabolic log-density of the normal distribution. The name of the hyperbolic distribution stems from this observation. In fact, the definition of the hyperbolic distributions was inspired by the empirical finding by the founding father of the physics of wind-blown sand, Brigadier R.A. Bagnold, that the log-density of the distribution of the logarithm of the grain size of natural sand deposits looks more like a hyperbola than like a parabola, as had previously been assumed by geomorphologists, see Bagnold (1941).

For the hyperbolic distributions, Equation (3), which determines the mode points of the generalized hyperbolic distribution, simplifies to

$$\frac{x-\mu}{\sqrt{\delta^2+(x-\mu)^2}} = \frac{\beta}{\alpha},$$

implying that the distribution is unimodal with mode point $x = \mu + \delta\beta/\gamma$.
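The hyperbola shape of the log-density and the mode formula can be checked with a few lines of Python. The sketch below works with the log-density of (8) up to its additive normalising constant, so no Bessel function is needed; the function names, the grid and the parameter values are illustrative assumptions rather than anything taken from the chapter.

```python
import math


def hyperbolic_log_kernel(x, alpha, beta, delta, mu):
    """Log of the hyperbolic density (8), omitting the additive normalising constant."""
    return -alpha * math.sqrt(delta ** 2 + (x - mu) ** 2) + beta * (x - mu)


def hyperbolic_mode(alpha, beta, delta, mu):
    """Closed-form mode point mu + delta*beta/gamma, with gamma^2 = alpha^2 - beta^2."""
    gamma = math.sqrt(alpha ** 2 - beta ** 2)
    return mu + delta * beta / gamma


# Illustrative parameter values.
alpha, beta, delta, mu = 2.0, 0.5, 1.0, 0.1

# Brute-force maximiser on a grid with spacing 0.001 around mu.
grid = [mu - 5.0 + 1e-3 * i for i in range(10001)]
numeric_mode = max(grid, key=lambda x: hyperbolic_log_kernel(x, alpha, beta, delta, mu))

# Far out in the tails the log-density approaches the asymptotes of the hyperbola,
# with slopes -(alpha - beta) on the right and alpha + beta on the left.
right_slope = hyperbolic_log_kernel(1e4 + 1.0, alpha, beta, delta, mu) - hyperbolic_log_kernel(1e4, alpha, beta, delta, mu)
left_slope = hyperbolic_log_kernel(-1e4 + 1.0, alpha, beta, delta, mu) - hyperbolic_log_kernel(-1e4, alpha, beta, delta, mu)

print("mode (formula):", hyperbolic_mode(alpha, beta, delta, mu))
print("mode (grid):   ", numeric_mode)
print("tail slopes:   ", left_slope, right_slope)
```

The two asymptotic slopes make the contrast with the normal distribution concrete: the log-density falls off linearly in both tails, so the tails are exponential rather than Gaussian.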
Letting $\delta$ tend to zero and using (A.5), we get the asymmetric Laplace distribution as a special case of the hyperbolic distribution, that is,

$$\frac{\alpha^2-\beta^2}{2\alpha}\,e^{\beta(x-\mu)-\alpha|x-\mu|}, \qquad x\in\mathbb{R}.$$

The normal distribution can also be obtained as a limit case of the hyperbolic distribution. Letting $\delta\to\infty$ and $\alpha\to\infty$ in such a way that $\delta/\alpha\to\sigma^2$, we get, using (A.6), the normal density:

$$\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \qquad x\in\mathbb{R}.$$

According to Barndorff-Nielsen et al. (1985), the skewness ($\gamma_1$) and the kurtosis ($\gamma_2$) satisfy, for large values of $\delta\gamma$ and small values of $\beta/\alpha$,

$$(\gamma_1,\gamma_2) \approx \left(3\chi,\ 3\xi^2\right),$$

where

$$\chi = \frac{\beta/\alpha}{\sqrt{1+\delta\gamma}} \qquad\text{and}\qquad \xi = \frac{1}{\sqrt{1+\delta\gamma}}.$$

Based on this observation Barndorff-Nielsen et al. (1985) suggested that the parameters $\chi$ and $\xi$ are natural measures of asymmetry and "kurtosis" for the hyperbolic distribution. Note that they are invariant under location-scale transformations. The parameters $\chi$ and $\xi$ vary in the so-called shape triangle defined by

$$\left\{(\chi,\xi)\in\mathbb{R}^2 \mid 0 \le |\chi| < \xi < 1\right\}. \tag{9}$$

Note that the normal distribution is obtained as a limit distribution when $\xi\to 0$, and the (possibly skew) Laplace distribution when $\xi\to 1$. In Figure 3 hyperbolic log-density functions are plotted for different values of $\chi$ and $\xi$ in the shape triangle.

Fig. 3. Hyperbolic log densities with mean 0 and variance 1 for different values of the parameters $\chi$ and $\xi$ ($-0.8, -0.6, \ldots, 0.8$ for $\chi$ and $0.0, 0.25, \ldots, 1.0$ for $\xi$). The log densities are placed at the corresponding values of $\chi$ and $\xi$.

In Figure 4 a histogram based on 2666 observations of the daily returns of IBM stock (returns are increments on a logarithmic scale of the stock prices) in the period from 1 January 1990 to 20 March 2000 is given. Each point indicates the mid-point of the top of a column in the histogram. The best fitting generalized hyperbolic, hyperbolic, and normal densities are superimposed on the histogram. The parameter values corresponding to the generalized hyperbolic density are $\alpha = 5.174$, $\beta = 0.0048$, $\delta = 0.0262$, $\mu = 0.0002$, and $\lambda = -1.933$. For the hyperbolic density the parameter values are $\alpha = 82.26$, $\beta = 3.725$, $\delta = 0.0060$, and $\mu = -0.0007$. In Figure 5 the logarithms of the same histogram points and the same densities are plotted. Log-histograms and log-densities are very useful when the interest is focussed on tail behaviour.

Fig. 4. A histogram of 2666 daily IBM-stock returns. Superimposed are the best fitting generalized hyperbolic, hyperbolic, and normal densities.

Fig. 5. The logarithm of the histogram in Figure 4 of 2666 daily IBM-stock returns. Superimposed are the logarithms of the best fitting generalized hyperbolic, hyperbolic, and normal densities. The parameter values are as in Figure 4.

From Figures 4 and 5 it is evident that a heavy-tailed distribution such as a generalized hyperbolic or hyperbolic distribution provides a good fit to the data, and certainly a much better fit than the normal distribution, in particular in the tails. A plot like Figure 5, which emphasizes differences in tail behaviour, reveals that the extreme tails of the histogram are a bit heavier than those of the fitted generalized hyperbolic distribution. There is no reason to be overly concerned about this minor discrepancy because, first, it should be remembered that it is measured on a logarithmic scale and, secondly, the two log-histogram points in the extreme left tail are based on only 1 and 2 observations, respectively, while each of the two points in the extreme right tail represents 2 observations.

The normal-inverse Gaussian (NIG) distributions form the subclass obtained for $\lambda$ equal to $-\frac12$. The density of the normal-inverse Gaussian distribution is given by

$$\frac{\alpha\delta}{\pi}\,e^{\delta\gamma}\,\frac{K_1\left(\alpha\sqrt{\delta^2+(x-\mu)^2}\right)}{\sqrt{\delta^2+(x-\mu)^2}}\,e^{\beta(x-\mu)}, \qquad x\in\mathbb{R}. \tag{10}$$
If the distribution of $X$ has density function (10), we write $X \sim \mathrm{NIG}(\alpha,\beta,\delta,\mu)$. If we let $\alpha$ tend to zero, it follows from (A.5) that the NIG-distribution converges to the Cauchy distribution with location parameter $\mu$ and scale parameter $\delta$. The Laplace transform of a NIG-distribution is especially simple:

$$L(z) = e^{\mu z+\delta(\gamma-\gamma_z)}, \qquad |\beta+z| < \alpha, \tag{11}$$

where $\gamma_z^2 = \alpha^2-(\beta+z)^2$. Expressions for the mean and variance are also simple in the case of a NIG-distribution:

$$EX = \mu + \frac{\delta\beta}{\gamma}, \qquad \mathrm{Var}X = \frac{\delta\alpha^2}{\gamma^3}.$$

The third and fourth cumulants, which determine the skewness and the kurtosis, are $3\beta\delta\alpha^2\gamma^{-5}$ and $3\delta\alpha^2(\alpha^2+4\beta^2)\gamma^{-7}$, respectively. Although these expressions are quite simple, it is also informative for the NIG-distributions to use the shape triangle, which can be defined in complete analogy with that for the hyperbolic distributions, see, e.g., Rydberg (1997). In Figure 6 NIG log-density functions are drawn for different values of $\chi$ and $\xi$ in the shape triangle defined in the same way as for the hyperbolic distribution.

Fig. 6. Normal-inverse Gaussian log densities with mean 0 and variance 1 for different values of the parameters $\chi$ and $\xi$ ($-0.8, -0.6, \ldots, 0.8$ for $\chi$ and $0.0, 0.25, \ldots, 1.0$ for $\xi$). The log-densities are placed in the shape triangle at the corresponding values of $\chi$ and $\xi$.

Last, but not least, the class of normal-inverse Gaussian distributions is closed under convolution when the parameters $\alpha$ and $\beta$ are fixed, that is, if $X_1$ and $X_2$ are independent with $X_i \sim \mathrm{NIG}(\alpha,\beta,\delta_i,\mu_i)$, $i = 1,2$, then we have that

$$X_1 + X_2 \sim \mathrm{NIG}(\alpha,\beta,\delta_1+\delta_2,\mu_1+\mu_2). \tag{12}$$

Only two subclasses of the generalized hyperbolic distributions are closed under convolution. The other class with this important property is the class of variance-gamma (VG) distributions, which is obtained when $\delta$ is equal to 0. This is only possible when $\lambda > 0$ and $\alpha > |\beta|$. The variance-gamma distributions (with $\beta = 0$) were introduced in the financial literature by Madan and Seneta (1990). Another and perhaps more natural name for the full class is the normal-gamma (NG) distributions.
The density function is given by

$$\frac{\gamma^{2\lambda}}{\sqrt{\pi}\,\Gamma(\lambda)\,(2\alpha)^{\lambda-1/2}}\,|x-\mu|^{\lambda-1/2}\,K_{\lambda-1/2}\left(\alpha|x-\mu|\right)e^{\beta(x-\mu)}, \qquad x\in\mathbb{R}, \tag{13}$$

where $\Gamma$ denotes the gamma function. If $X$ follows a variance-gamma distribution, we write $X \sim \mathrm{VG}(\lambda,\alpha,\beta,\mu)$. The reader is reminded that the parameter domain is $\lambda > 0$, $\alpha > |\beta| \ge 0$ and $\mu\in\mathbb{R}$. The Laplace transform of a VG-distribution is simple:

$$L(z) = e^{\mu z}\left(\frac{\gamma}{\gamma_z}\right)^{2\lambda}, \qquad |\beta+z| < \alpha, \tag{14}$$

where again $\gamma_z^2 = \alpha^2-(\beta+z)^2$. From (14) (or from (5) and (6)) it easily follows that

$$EX = \mu + \frac{2\beta\lambda}{\gamma^2}, \qquad \mathrm{Var}X = \frac{2\lambda}{\gamma^2}\left(1+\frac{2\beta^2}{\gamma^2}\right).$$

The class of variance-gamma distributions is closed under convolution when $\alpha$ and $\beta$ are fixed. If $X_1$ and $X_2$ are independent random variables such that $X_i \sim \mathrm{VG}(\lambda_i,\alpha,\beta,\mu_i)$, $i = 1,2$, then we have that

$$X_1 + X_2 \sim \mathrm{VG}(\lambda_1+\lambda_2,\alpha,\beta,\mu_1+\mu_2). \tag{15}$$

This convolution property follows from (14). By (A.6), the tails of a VG-distribution decrease as $|x|^{\lambda-1}e^{-\alpha|x|+\beta x}$ when $x\to\pm\infty$.

The logarithms of the densities of variance-gamma distributions are plotted for different values of $\lambda$ in Figure 7. In all cases $\beta = 0$, the mean is zero, and the variance is one. This figure reveals a disadvantage of the class of VG-distributions. The probability density is very peaked at the centre for $\lambda < 1$, while for $\lambda \ge 1$ the tail behaviour does not fit the tails found in typical financial data, like those in Figure 5, as well as other generalized hyperbolic distributions such as the NIG-distribution do.

We will finally consider the subclass of the generalized hyperbolic distributions that is obtained when $\alpha = |\beta|$, or equivalently $\gamma = 0$. This is only possible when $\lambda < 0$ and $\delta > 0$. It is convenient to introduce the reparametrization $\nu = -2\lambda$. For $\gamma = 0$ we obtain the density function

$$\frac{\delta^{\nu}}{\sqrt{\pi}\,2^{(\nu-1)/2}\,\Gamma(\nu/2)}\;\frac{K_{(\nu+1)/2}\left(|\beta|\sqrt{\delta^2+(x-\mu)^2}\right)}{\left(\sqrt{\delta^2+(x-\mu)^2}\,/\,|\beta|\right)^{(\nu+1)/2}}\;e^{\beta(x-\mu)}, \qquad x\in\mathbb{R}, \tag{16}$$

where $\nu > 0$, $\delta > 0$, $\beta\in\mathbb{R}$ and $\mu\in\mathbb{R}$. A natural name for this distribution is the asymmetric scaled t-distribution, as will soon be clear. From (A.6) it follows that when $\beta$ is positive, the left-hand tail decreases as $|x|^{-(\nu/2+1)}e^{2\beta x}$, while the right-hand tail decreases as $x^{-(\nu/2+1)}$.
When $\beta$ is negative, the behaviour of the two tails is interchanged. The expectation exists provided $\nu > 2$, and the variance exists when $\nu > 4$. More generally, the $n$-th moment exists when $\nu > 2n$. The Laplace transform of the distribution given by (16) is

$$e^{\mu z}\,\frac{\bigl(\delta\sqrt{-z(z+2\beta)}\bigr)^{\nu/2} K_{\nu/2}\bigl(\delta\sqrt{-z(z+2\beta)}\bigr)}{\Gamma(\nu/2)\,2^{\nu/2-1}} \tag{17}$$

with domain $-2\beta < z \le 0$ when $\beta > 0$ and $0 \le z < -2\beta$ when $\beta < 0$. When $\beta = 0$, the domain is the set $\{0\}$, and we obtain the density function

$$\frac{\Gamma((\nu+1)/2)}{\sqrt{\pi}\,\delta\,\Gamma(\nu/2)}\Bigl(1 + \bigl((x-\mu)/\delta\bigr)^2\Bigr)^{-(\nu+1)/2}, \qquad x \in \mathbb{R},$$

which is the well-known density of the scaled t-distribution with $\nu$ degrees of freedom.

Fig. 7. The logarithm of the densities of variance-gamma distributions with $\beta = 0$, mean 0, and variance 1 for different values of the parameter $\lambda$.

1.2. The generalized inverse Gaussian distribution

The second class of distributions that we consider in this section is the class of generalized inverse Gaussian (GIG) distributions. The GIG-distributions are described by three parameters and defined on the positive half axis. The generalized inverse Gaussian density is of the form

$$\frac{(\gamma/\delta)^{\lambda}}{2K_{\lambda}(\delta\gamma)}\,x^{\lambda-1}\exp\Bigl\{-\tfrac12\bigl(\delta^2 x^{-1} + \gamma^2 x\bigr)\Bigr\}, \qquad x > 0. \tag{18}$$

The parameter domain is given by

$$\delta > 0,\ \gamma \ge 0 \ \text{ if } \lambda < 0; \qquad \delta > 0,\ \gamma > 0 \ \text{ if } \lambda = 0; \qquad \delta \ge 0,\ \gamma > 0 \ \text{ if } \lambda > 0.$$

The class of generalized inverse Gaussian distributions was first proposed in 1946 by Étienne Halphen, who used it to model the distribution of the monthly flow of water in hydroelectric stations, see Seshadri (1997). The class was rediscovered by Sichel (1973), who used it to construct mixtures of Poisson distributions, and by Barndorff-Nielsen (1977), who used it to construct the class of generalized hyperbolic distributions, but also realized its broad usefulness and initiated an in-depth study of the class. We shall return to the relation to the generalized hyperbolic distributions later. The generalized inverse Gaussian distributions were briefly mentioned by Good (1953) as an intermediate between Pearson's curves of Type III and V.
The class of generalized inverse Gaussian distributions was investigated extensively in Jørgensen (1982). Using (A.5) we see that for $\lambda > 0$ and $\gamma > 0$ the gamma distribution emerges as limit distribution when $\delta$ tends to zero, that is, we get the following density for positive $\lambda$ and $\gamma$:

$$\frac{(\gamma^2/2)^{\lambda}}{\Gamma(\lambda)}\,x^{\lambda-1} e^{-\gamma^2 x/2}, \qquad x > 0.$$

Similarly, the inverse gamma distribution with density given by

$$\frac{(\delta^2/2)^{-\lambda}}{\Gamma(-\lambda)}\,x^{\lambda-1} e^{-(\delta^2/2)/x}, \qquad x > 0,$$

is obtained when $\gamma$ tends to zero for $\lambda < 0$ and $\delta > 0$. This distribution has a tail of the Pareto type. Finally, for $\lambda = -\tfrac12$ we get the inverse Gaussian distribution with density function given by

$$\frac{\delta}{\sqrt{2\pi x^3}}\,e^{-\gamma^2 (x - \delta/\gamma)^2/(2x)}, \qquad x > 0.$$

The generalized inverse Gaussian distributions are unimodal with mode point given by

$$\frac{\lambda - 1 + \sqrt{(\lambda-1)^2 + \delta^2\gamma^2}}{\gamma^2} \ \text{ if } \gamma > 0, \qquad \frac{\delta^2}{2(1-\lambda)} \ \text{ if } \gamma = 0.$$

If $X$ has a generalized inverse Gaussian distribution, we write $X \sim \mathrm{GIG}(\lambda,\delta,\gamma)$. In Figure 8 generalized inverse Gaussian densities are plotted for different values of $\lambda$ and $\omega = \delta\gamma$. In all cases the variance is 1.

Fig. 8. Generalized inverse Gaussian densities with variance 1 for different values of the parameters $\lambda$ and $\omega = \delta\gamma$.

The Laplace transform of the $\mathrm{GIG}(\lambda,\delta,\gamma)$-distribution is

$$L(z) = \frac{K_{\lambda}\bigl(\omega\sqrt{1 - 2z/\gamma^2}\bigr)}{K_{\lambda}(\omega)\,(1 - 2z/\gamma^2)^{\lambda/2}} \tag{19}$$

for $\delta > 0$ and $\gamma > 0$. The domain of $L$ is $z < \gamma^2/2$ when $\lambda \ge 0$ and $z \le \gamma^2/2$ when $\lambda < 0$. In the cases $\delta = 0$ or $\gamma = 0$, the Laplace transform is obtained from (19) by (A.5). For $\delta = 0$,

$$L(z) = \left(1 - \frac{2z}{\gamma^2}\right)^{-\lambda}, \qquad z < \frac{\gamma^2}{2},$$

which is the well-known Laplace transform of the gamma-distribution. For $\gamma = 0$ we obtain

$$L(z) = \frac{2K_{\lambda}\bigl(\sqrt{-2\delta^2 z}\bigr)}{\Gamma(-\lambda)\,(-\delta^2 z/2)^{\lambda/2}}, \qquad z \le 0.$$

For positive values of $\delta$ and $\gamma$ the moments of $X$ are given by

$$EX^j = \left(\frac{\delta}{\gamma}\right)^{j}\frac{K_{\lambda+j}(\omega)}{K_{\lambda}(\omega)}, \qquad j = 1,2,\ldots. \tag{20}$$

When either $\delta$ or $\gamma$ is zero, the moments of $X$ are also known and are obtained as limits of (20). The variance of $X$ is given by

$$\mathrm{Var}X = \left(\frac{\delta}{\gamma}\right)^{2}\left(\frac{K_{\lambda+2}(\omega)}{K_{\lambda}(\omega)} - \frac{K_{\lambda+1}^2(\omega)}{K_{\lambda}^2(\omega)}\right). \tag{21}$$

In Figure 9 a histogram of 307 monthly observations of interest rates in the period from June 1964 to December 1989 is given along with a fitted generalized inverse Gaussian density corresponding to the parameter values $\gamma = 0.2693$, $\delta = 11.23$, and $\lambda = -7.0707$. More precisely, the data are annualized monthly yields of U.S. one-month Treasury bills. The same data set was studied in Chan et al. (1992).

Fig. 9. A histogram of 307 monthly interest rates. The generalized inverse Gaussian density with parameters $\gamma = 0.2693$, $\delta = 11.23$, and $\lambda = -7.0707$ is superimposed.

There is the following important relationship between the generalized hyperbolic distribution and the generalized inverse Gaussian distribution, which was, in fact, how the generalized hyperbolic distribution was originally derived in Barndorff-Nielsen (1977). The generalized hyperbolic distribution is a normal variance-mean mixture where the mixing distribution is generalized inverse Gaussian. What is meant by this is that if $X \mid W = w \sim N(\mu + \beta w, w)$ and $W \sim \mathrm{GIG}(\lambda,\delta,\gamma)$, then the marginal distribution of $X$ will be generalized hyperbolic, $X \sim H(\lambda,\alpha,\beta,\mu,\delta)$, where $\alpha^2 = \beta^2 + \gamma^2$. This property provides a possible interpretation of non-Gaussian stochastic variation described by a generalized hyperbolic distribution. As special cases we have that the normal-inverse Gaussian distribution appears when the mixing distribution is an inverse Gaussian distribution, and the variance-gamma distribution emerges as a normal variance-mean mixture where the mixing distribution is a gamma distribution. This explains the names of the distributions. The asymmetric scaled t-distribution is a normal variance-mean mixture with an inverse gamma mixing distribution. As a special case we get the well-known result that the t-distribution is a normal variance mixture ($\beta = 0$) with an inverse gamma mixing distribution.
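The mixture representation gives a direct way to simulate from these laws. As a hedged illustration for the NIG case, the sketch below draws $W$ from the inverse Gaussian law $\mathrm{GIG}(-\tfrac12,\delta,\gamma)$ and then sets $X = \mu + \beta W + \sqrt{W}\,\varepsilon$ with $\varepsilon$ standard normal. The inverse Gaussian sampler uses the transformation method of Michael, Schucany and Haas, which is a technique brought in here and not one discussed in the text; the function names are hypothetical.

```python
import math
import random


def sample_inverse_gaussian(delta, gamma, rng):
    """Draw W ~ GIG(-1/2, delta, gamma): inverse Gaussian with mean delta/gamma.

    Transformation method of Michael, Schucany and Haas, written for the
    (mean, shape) form with shape = delta**2.
    """
    mean = delta / gamma
    shape = delta**2
    y = rng.gauss(0.0, 1.0) ** 2
    # Smaller root of the quadratic linking y to the inverse Gaussian variate.
    x = mean + mean**2 * y / (2 * shape) - (mean / (2 * shape)) * math.sqrt(
        4 * mean * shape * y + mean**2 * y**2
    )
    # Choose between the two roots with the appropriate probability.
    if rng.random() <= mean / (mean + x):
        return x
    return mean**2 / x


def sample_nig(alpha, beta, mu, delta, rng):
    """Draw X ~ NIG via the normal variance-mean mixture X | W = w ~ N(mu + beta*w, w)."""
    gamma = math.sqrt(alpha**2 - beta**2)
    w = sample_inverse_gaussian(delta, gamma, rng)
    return mu + beta * w + math.sqrt(w) * rng.gauss(0.0, 1.0)


# Illustrative use with hypothetical parameter values.
rng = random.Random(1)
draws = [sample_nig(2.0, 0.5, 0.0, 1.0, rng) for _ in range(5)]
print(draws)
```

With a large sample the empirical mean and variance should be close to $\mu + \delta\beta/\gamma$ and $\delta\alpha^2/\gamma^3$, which gives a simple sanity check on the sampler.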
The mixing result implies that there is the following simple relationship between the Laplace transform, $L_X$, of the generalized hyperbolic distribution $H(\lambda,\alpha,\beta,\mu,\delta)$ and that of the $\mathrm{GIG}(\lambda,\delta,\sqrt{\alpha^2-\beta^2})$-distribution, $L_W$:

$$L_X(z) = e^{\mu z} L_W\bigl(\beta z + \tfrac12 z^2\bigr).$$

Barndorff-Nielsen and Halgreen (1977) showed that generalized inverse Gaussian distributions are infinitely divisible. Using that the generalized hyperbolic distributions are normal variance-mean mixtures with generalized inverse Gaussian mixing distributions, they also proved that generalized hyperbolic distributions are infinitely divisible. Halgreen (1979) showed that generalized hyperbolic distributions and generalized inverse Gaussian distributions are even self-decomposable. In the following section, the properties of infinite divisibility and self-decomposability will turn out to be important because they allow the construction of certain hyperbolic stochastic process models.

1.3. Statistical inference

Inference for the parameters when dealing with independent and identically generalized hyperbolic or generalized inverse Gaussian distributed observations should be based on the likelihood function. The C-program HYP described in Blæsild and Sørensen (1992) can be used for maximum likelihood estimation in the situation where independent and identically (possibly multi-dimensional) hyperbolic distributed observations are considered. The program HYP also has the facility of basing the inference on the multinomial likelihood function obtained by only observing the number of observations in given intervals. More precisely, if $I_1,\ldots,I_k$ are disjoint intervals with union the entire real line and $y_j$ denotes the number of observations in $I_j$, $j = 1,\ldots,k$, then the multinomial log-likelihood function is given by

$$\ell(\alpha,\beta,\mu,\delta) = \sum_{j=1}^{k} y_j \log p_j, \tag{22}$$

where $p_j$ is the probability that a hyperbolic distributed random variable takes a value in $I_j$, that is,

$$p_j = \int_{I_j} \frac{\gamma}{2\alpha\delta K_1(\delta\gamma)} \exp\Bigl\{-\alpha\sqrt{\delta^2 + (x-\mu)^2} + \beta(x-\mu)\Bigr\}\,dx, \qquad j = 1,\ldots,k. \tag{23}$$

Inference based on grouped observations from other distributions can of course be carried out in a similar way using (22) and the equivalent of (23). Küchler et al. (1999) note that if the observations are not independent, then inference based on the multinomial likelihood function for grouped observations will be more robust to effects of the dependence than inference based on the original likelihood function for independent observations.

2. Lévy processes

A homogeneous Lévy process $X$ is a stochastic process with $X_0 = 0$ and with the property that its increments over non-overlapping time intervals are independent. Moreover, the increment, $X_{t+s} - X_s$, over any time interval of length $t$ has the same distribution as $X_t$. The homogeneous Lévy processes are also called processes with independent, stationary increments or additive processes. The mathematical theory of Lévy processes can be found in Bertoin (1996) or Sato (1999). An example of a Lévy process that is well-known from, for instance, the Black-Scholes-Merton option pricing theory is the Brownian motion (or Wiener process), where the increments are normally distributed. For every generalized hyperbolic distribution there exists a homogeneous Lévy process $X$ such that the probability distribution of the value of the process, $X_t$, at a fixed time point $t$ is that particular generalized hyperbolic distribution. A thorough review of the theory of these generalized hyperbolic Lévy processes and their application in finance can be found in Eberlein (2001), see also Prause (1999) and Eberlein and Raible (2001).
The distributions that can appear as the distribution of the instantaneous value of a homogeneous Lévy process are exactly those that have the property called infinite divisibility. As mentioned in Section 1 the generalized hyperbolic distributions are infinitely divisible. Usually, the distribution of the value $X_s$ at a time point $s$ different from $t$ will not be generalized hyperbolic. However, in the case of the NIG and VG distributions, the convolution properties (12) and (15) imply that the value of the Lévy process will be NIG-distributed, respectively VG-distributed, at all time points. This makes the NIG and VG Lévy processes more natural generalized hyperbolic Lévy processes than the other generalized hyperbolic Lévy processes. Simulation of the NIG Lévy process was studied in Rydberg (1997).

A generalized hyperbolic Lévy process can be written in the form

$$X_t = t\,E(X_1) + Z_t,$$

where $Z_t$ is a pure jump martingale with infinitely many small jumps in every finite time interval, however small. The behaviour of $Z_t$ is reflected in the so-called Lévy measure, see (27) and the discussion following this formula. The Lévy measure of the generalized hyperbolic distribution has density

$$q(x) = \begin{cases} \dfrac{e^{\beta x}}{|x|}\left(\displaystyle\int_0^{\infty} \frac{\exp\bigl(-|x|\sqrt{2y + \alpha^2}\bigr)}{\pi^2 y\bigl(J_{\lambda}^2(\delta\sqrt{2y}) + Y_{\lambda}^2(\delta\sqrt{2y})\bigr)}\,dy + \lambda e^{-\alpha|x|}\right) & \text{if } \lambda \ge 0, \\[2ex] \dfrac{e^{\beta x}}{|x|}\displaystyle\int_0^{\infty} \frac{\exp\bigl(-|x|\sqrt{2y + \alpha^2}\bigr)}{\pi^2 y\bigl(J_{-\lambda}^2(\delta\sqrt{2y}) + Y_{-\lambda}^2(\delta\sqrt{2y})\bigr)}\,dy & \text{if } \lambda < 0. \end{cases} \tag{24}$$

Here $J_{\lambda}$ and $Y_{\lambda}$ denote Bessel functions of the first and second kind, respectively, see the appendix. The Lévy measure was essentially found by Halgreen (1979), see also Prause (1999). For the NIG-distribution this expression simplifies to

$$q(x) = \pi^{-1}\delta\alpha\,|x|^{-1} K_1\bigl(\alpha|x|\bigr)\,e^{\beta x}, \tag{25}$$

where $K_1$ is a modified Bessel function of the third kind. The behaviour near zero is particularly important, so the following expansion for generalized hyperbolic distributions (Raible, 2000) is useful:

$$x^2 q(x) = \frac{\delta}{\pi} + \frac{\lambda + 1/2}{2}\,|x| + \frac{\delta\beta}{\pi}\,x + o(|x|) \tag{26}$$

as $x \to 0$.
We see that for every generalized hyperbolic distribution the Lévy measure has infinite mass in every neighbourhood of the origin. The process $Z_t$ is given by

$$Z_t = \int_0^t \int_{\mathbb{R}\setminus\{0\}} x\,\bigl(\mu^X(du,dx) - q(x)\,du\,dx\bigr), \tag{27}$$

where the integer-valued random measure $\mu^X$ is defined by

$$\mu^X(dt,dx) = \sum_{s>0} 1_{\{\Delta X_s \ne 0\}}\,\varepsilon_{(s,\Delta X_s)}(dt,dx).$$

Here $\varepsilon_a$ denotes the Dirac measure at $a$, and $\Delta X_s = X_s - X_{s-}$ is the jump of the process $X$ at time $s$ (for most time points $\Delta X_s = 0$). Integrals of the type (27) are treated in, e.g., Jacod and Shiryaev (1987) or Protter (1990). The random measure $\mu^X$ is Poissonian with intensity measure $q(x)\,dx\,dt$. This implies that for any closed interval $A$ that does not contain the origin, the number of jumps in the time interval $[0,t]$ with a size that belongs to $A$, i.e., $N_t^A = \mu^X([0,t],A)$, is a Poisson process with intensity $\int_A q(x)\,dx$, which is a finite number. In particular, $N_t^A$ is Poisson distributed with mean value $t\int_A q(x)\,dx$. As the boundary of the interval $A$ tends to zero, the mean value goes to infinity, cf. (26). It is interesting to note that a generalized hyperbolic Lévy process has no continuous Brownian motion component and has infinitely many jumps on every time interval.

The generalized hyperbolic Lévy processes do, however, have a nice relation to the Brownian motion. Let $B$ be a standard Brownian motion, and let $\tau(t)$ be a Lévy process for which the distribution of $\tau(1)$ is a generalized inverse Gaussian distribution. Then the process

$$X_t = \mu t + \beta\tau(t) + B(\tau(t)) \tag{28}$$

is a generalized hyperbolic Lévy process. Because $\tau(1)$ is generalized inverse Gaussian distributed, and hence concentrated on the positive half axis, the increments of $\tau$ can only be positive. The process $\tau$ is therefore increasing and can be interpreted as a time that increases with a randomly varying speed. A process with this property is called a subordinator, and the construction (28) is called subordination.
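The subordination construction (28) can be sketched on a time grid. The example below is an illustration under stated assumptions, not the method of the text: it takes the variance-gamma case, where the subordinator $\tau$ is a gamma process with $\tau(1)$ gamma distributed with shape $\lambda$ and scale $2/\gamma^2$, because gamma variates are available in the Python standard library. The function name `vg_levy_path` and the grid parameters are hypothetical.

```python
import math
import random


def vg_levy_path(lam, alpha, beta, mu, n_steps, dt, rng):
    """Simulate (t, tau(t), X_t) on a grid for X_t = mu*t + beta*tau(t) + B(tau(t)).

    tau is a gamma subordinator: its increment over a step of length dt is
    gamma distributed with shape lam*dt and scale 2/gamma^2.
    """
    gamma_sq = alpha**2 - beta**2
    if lam <= 0 or gamma_sq <= 0:
        raise ValueError("need lam > 0 and alpha > |beta|")
    t = tau = x = 0.0
    path = [(t, tau, x)]
    for _ in range(n_steps):
        d_tau = rng.gammavariate(lam * dt, 2.0 / gamma_sq)  # business-time increment
        # Given d_tau, the Brownian increment B(tau + d_tau) - B(tau) is N(0, d_tau).
        x += mu * dt + beta * d_tau + math.sqrt(d_tau) * rng.gauss(0.0, 1.0)
        tau += d_tau
        t += dt
        path.append((t, tau, x))
    return path


# Illustrative use with hypothetical parameter values.
path = vg_levy_path(1.5, 2.0, 0.5, 0.1, n_steps=50, dt=0.02, rng=random.Random(1))
print(path[-1])
```

Because the gamma family is closed under convolution for a fixed scale, the grid does not introduce a discretization error in the law of $X_t$ at the grid points; for a general generalized inverse Gaussian subordinator the increments over a step are not available in this simple form.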
The randomly increasing time has been interpreted as an operational time or a business time reflecting, for instance, the volume of trade at an exchange. Sometimes a lot is happening at the exchange and the business time increases rapidly. At other times the exchange is tranquil and the business time moves only slowly. That the distribution of $X_1$ is generalized hyperbolic follows because this distribution is a variance-mean mixture of normal distributions where the mixing distribution is the generalized inverse Gaussian distribution, see Section 1.2. The fact that a Lévy process $\tau$ exists such that $\tau(1)$ is generalized inverse Gaussian distributed follows because these distributions are infinitely divisible, as mentioned in Section 1.

In the case of a NIG-distribution, the construction by subordination can be done in the following simple way (Barndorff-Nielsen, 1998). Let $(U_t,V_t)$ be a two-dimensional standard Brownian motion starting at $(0,0)$ and with drift vector $(\beta,\gamma)$, where $\gamma > 0$. Let $\tau(t)$ denote the first time the second component $V$ attains the value $\delta t > 0$ with $\delta > 0$. Then $\{\tau(t): t > 0\}$ is an inverse Gaussian Lévy process, and $X_t = \mu t + U_{\tau(t)}$ is a NIG-Lévy process. Specifically, $X_t$ is $\mathrm{NIG}(\alpha,\beta,\mu t,\delta t)$ distributed, where $\alpha = \sqrt{\beta^2 + \gamma^2}$.

Construction of financial models by subordination was first proposed by Praetz (1972), who used a scaled t-distribution to model stock returns and obtained a good fit to weekly returns from the Sydney Stock Exchange. This is a particular example of a generalized hyperbolic distribution where the mixing distribution is an inverse gamma distribution, see Section 1.1. Praetz attributed the mixing of normals to the change in activity at the exchange. Clark (1973) and Epps and Epps (1976) found that there is a dependency between trading volume and the variance of returns, but did not suggest generalized hyperbolic models. These findings have been confirmed by Ané and Geman (2000).
In Madan and Seneta (1990), Madan and Milne (1991) and Madan and Chang (1996) the so-called variance gamma model is introduced and studied as a model for share market returns. This model is the generalized hyperbolic Lévy process with a gamma mixing distribution. For a discussion of the subordination approach in finance, see, e.g., Hurst, Platen and Rachev (1997).

The use of generalized hyperbolic Lévy processes to model the prices of stocks and other assets and the corresponding theory of option pricing has been thoroughly investigated by Eberlein and Keller (1995), Keller (1997), Eberlein, Keller and Prause (1998) and Eberlein and Prause (2002). Eberlein and Jacod (1997) proved that the set of equivalent martingale measures is large and that the corresponding price range is the entire non-arbitrage interval. A theory of the term structure of interest rates based on the hyperbolic Lévy process was developed in Eberlein and Raible (1999). A useful review can be found in Eberlein (2001).

For the processes discussed in this section, estimation based on observations at equidistant discrete time points is as easy as estimation for independent generalized hyperbolic distributed observations, because the increments of the process between the observation times are independent. Usually one would use a Lévy process for which the increments are generalized hyperbolic and then estimate the parameters, for instance by means of the computer program mentioned in Section 1.3. A simple check of the fit of the model to the data can be made as follows. If, for instance, the data are daily observations, then it should be checked that the distributions of the increments over a number of suitably chosen longer time spans, calculated from the estimated model, fit the corresponding increments calculated from the data. For the NIG and VG Lévy processes these distributions are simply given by the formulae (12) and (15). For an example of this procedure, see Eberlein (2001).

3. Stochastic differential equations

In this section we present various methods for constructing diffusion processes with generalized hyperbolic and generalized inverse Gaussian marginal distributions. A diffusion process is the solution of a stochastic differential equation driven by a Wiener process. Estimation of parameters based on discrete-time observations of a diffusion process is considered too. Furthermore, we consider Ornstein-Uhlenbeck type processes driven by Lévy processes and models given as sums of processes defined by stochastic differential equations.

3.1. Diffusion models

We consider a one-dimensional diffusion process $\{X_t\}$ and suppose that it is the unique weak solution to the stochastic differential equation

$$dX_t = b(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dW_t, \tag{29}$$

where $\sigma(x;\theta)$ is positive for all $x$ in the state space $(l,r)$ ($-\infty \le l < r \le \infty$) and all $\theta$ in some $p$-dimensional parameter space $\Theta$. We will focus on ergodic diffusions and denote the density of the corresponding invariant probability measure by $\mu_\theta$. Diffusion processes with a specific marginal distribution are typically constructed by determining the drift $b$ and the diffusion coefficient $\sigma$ so that the invariant distribution is of the required type. This method will result in the appropriate marginal distribution for large values of $t$, or for all $t$ provided that the initial distribution is equal to the invariant distribution (i.e., $X_0 \sim \mu_\theta$). Under mild conditions we have the following relationship between the drift, diffusion coefficient, and the density of the invariant distribution,

$$2b(x;\theta) - v'(x;\theta) = v(x;\theta)\,\frac{\mu_\theta'(x)}{\mu_\theta(x)}, \qquad l < x < r,\ \theta \in \Theta, \tag{30}$$

where $v$ denotes the squared diffusion coefficient, $v(x;\theta) = \sigma^2(x;\theta)$. Using (30), Bibby and Sørensen (2001) discussed a method for constructing diffusion processes with a prescribed marginal (invariant) distribution.
Letting the drift be given by

$$b(x;\theta) = \tfrac12\,v(x;\theta)\,\frac{d}{dx}\log\bigl[v(x;\theta) f(x)\bigr],$$

where $f$ is a function that is integrable on the interval $(l,r)$, it was shown under some regularity conditions that the diffusion process given by (29) has invariant density proportional to $f$, irrespective of the choice of the function $v$. Bibby and Sørensen (2001) also considered the special case where $v(x;\theta) = \sigma^2 f(x)^{-\kappa}$, $\sigma^2 > 0$, $\kappa \in [0,1]$, in particular the situation where the invariant density was hyperbolic. This led to the following stochastic differential equation,

$$dX_t = \tfrac12\sigma^2(1-\kappa) f(X_t)^{-\kappa}\left[\beta - \frac{\alpha(X_t - \mu)}{\sqrt{\delta^2 + (X_t - \mu)^2}}\right]dt + \sigma f(X_t)^{-\kappa/2}\,dW_t, \tag{31}$$

where $f$ is proportional to the hyperbolic density function given by (8), that is,

$$f(x) = \exp\Bigl\{-\alpha\sqrt{\delta^2 + (x-\mu)^2} + \beta(x-\mu)\Bigr\}.$$

Note that the drift is towards the mode point of the hyperbolic distribution, $\mu + \delta\beta/\gamma$. The diffusion process given by (31) was successfully used to describe the logarithm of the price of VW-stocks after a linear trend had been subtracted. In Bibby and Sørensen (1997) the special case where $\kappa = 1$ was considered in the situation of a hyperbolic invariant density. Note that this results in a diffusion process with no drift, that is, the solution to the stochastic differential equation given by

$$dX_t = \sigma\exp\Bigl\{\tfrac12\alpha\sqrt{\delta^2 + (X_t - \mu)^2} - \tfrac12\beta(X_t - \mu)\Bigr\}\,dW_t. \tag{32}$$

It turns out that this is an example of a local martingale which is not a martingale. The hyperbolic diffusion process given as the solution of (32) was also fitted successfully to the logarithm of stock prices (minus a linear trend) in Bibby and Sørensen (1997). The construction leading to the hyperbolic diffusion (31) can obviously be made similarly for any generalized hyperbolic distribution. In the special case $\kappa = 1$, this was done in Rydberg (1999), where the corresponding NIG-diffusion was fitted successfully to stock prices (minus a linear trend). In Küchler et al. (1999) a hyperbolic diffusion process with constant diffusion coefficient was discussed.
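The drift in (31) can be checked numerically against the general recipe $b = \tfrac12 v\,(\log(vf))'$. The sketch below is an illustration only: the class name `HyperbolicDiffusion`, the finite-difference step and the parameter values are assumptions, and the derivative is approximated by a central difference rather than computed analytically.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperbolicDiffusion:
    """Diffusion with v(x) = sigma2 * f(x)**(-kappa) and hyperbolic f, cf. (31)."""

    alpha: float
    beta: float
    delta: float
    mu: float
    sigma2: float
    kappa: float

    def log_f(self, x):
        """Logarithm of the unnormalized hyperbolic density f."""
        u = x - self.mu
        return -self.alpha * math.sqrt(self.delta**2 + u**2) + self.beta * u

    def v(self, x):
        """Squared diffusion coefficient sigma2 * f(x)**(-kappa)."""
        return self.sigma2 * math.exp(-self.kappa * self.log_f(x))

    def drift_recipe(self, x, h=1e-5):
        """General recipe b = (1/2) v d/dx log(v f), by central differences."""

        def log_vf(y):
            return math.log(self.v(y)) + self.log_f(y)

        slope = (log_vf(x + h) - log_vf(x - h)) / (2 * h)
        return 0.5 * self.v(x) * slope

    def drift_closed_form(self, x):
        """Drift coefficient as written in (31)."""
        u = x - self.mu
        pull = self.beta - self.alpha * u / math.sqrt(self.delta**2 + u**2)
        return 0.5 * (1 - self.kappa) * self.v(x) * pull


# Illustrative use with hypothetical parameter values.
model = HyperbolicDiffusion(alpha=2.0, beta=0.5, delta=1.0, mu=0.0, sigma2=0.04, kappa=0.5)
print(model.drift_recipe(0.3), model.drift_closed_form(0.3))
```

For $\kappa = 1$ the closed-form drift vanishes identically, in line with (32), and for $\kappa < 1$ it changes sign at the mode $\mu + \delta\beta/\gamma$.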
A constant diffusion coefficient corresponds to letting the function $v$ be equal to a constant $\sigma^2$, or to $\kappa = 0$ in (31), and gives the following stochastic differential equation,

$$dX_t = \tfrac12\sigma^2\left[\beta - \frac{\alpha(X_t - \mu)}{\sqrt{\delta^2 + (X_t - \mu)^2}}\right]dt + \sigma\,dW_t. \tag{33}$$

The hyperbolic diffusion process given by (33) was first proposed in Barndorff-Nielsen (1978). For values of $\kappa$ between the two extremes 0, corresponding to stationarity being obtained by pure reversion, and 1, where stationarity is obtained by pure diffusion, both these effects are present to varying degrees.

Sørensen (1997b) considers the construction of diffusion processes with a generalized inverse Gaussian invariant distribution. If $v$ is a positive function, then the solution to the stochastic differential equation

$$dX_t = \Bigl\{v(X_t)v'(X_t) + \tfrac12 v(X_t)^2\bigl[(\lambda - 1)X_t^{-1} - \tfrac12\gamma^2 + \tfrac12\delta^2 X_t^{-2}\bigr]\Bigr\}\,dt + v(X_t)\,dW_t \tag{34}$$

will have a generalized inverse Gaussian invariant density given by (18) under suitable regularity conditions on $v$. Note that in (34) the function $v$ is the diffusion coefficient itself. The focus in Sørensen (1997b) is on the special case where $v(x) = \sigma x^{\kappa}$ for constants $\kappa \ge 0$ and $\sigma > 0$. With this choice of diffusion coefficient, the diffusion process is the solution to the stochastic differential equation given by

$$dX_t = \bigl(\beta_1 X_t^{2\kappa-1} - \beta_2 X_t^{2\kappa} + \beta_3 X_t^{2(\kappa-1)}\bigr)\,dt + \sigma X_t^{\kappa}\,dW_t, \tag{35}$$

where

$$\beta_1 = \tfrac12\sigma^2(\lambda - 1) + \sigma^2\kappa, \qquad \beta_2 = \tfrac14(\sigma\gamma)^2, \qquad \beta_3 = \tfrac14(\sigma\delta)^2.$$

Note that if $\kappa = \tfrac12$ and $\beta_3 = 0$, then the diffusion process is the solution to

$$dX_t = (\beta_1 - \beta_2 X_t)\,dt + \sigma\sqrt{X_t}\,dW_t, \tag{36}$$

that is, the Cox-Ingersoll-Ross process (CIR-process) used in finance to model short term interest rates, see Cox, Ingersoll Jr. and Ross (1985).

A completely different way of constructing hyperbolic diffusion models was proposed in Jensen and Pedersen (1999). These authors consider processes given by $X_t = h(Y_t)$, where $Y$ is a stationary Ornstein-Uhlenbeck process: $dY_t = -\theta Y_t\,dt + \sigma\,dW_t$ with $\theta > 0$ and $\sigma > 0$. Suppose $F$ is the distribution function of a given probability distribution, and let $\Phi$ denote the distribution function of the standard normal distribution.
If $\sigma^2 = 2\theta$, so that the stationary distribution of $Y$ is standard normal, and $h(y) = F^{-1}(\Phi(y))$, then $X_t$ will have the distribution function $F$. If, in particular, $F$ is the distribution function of a generalized hyperbolic distribution, we obtain a generalized hyperbolic diffusion process. Unfortunately, there is no explicit expression for the distribution function of a generalized hyperbolic distribution. An advantage of this approach is that there is an expression for the transition density involving the function $h$. Since the distribution function of a generalized hyperbolic distribution, and hence $h$, can be calculated numerically, it is relatively easy to calculate the likelihood function, which is usually not the case for diffusion models. A disadvantage is that the drift and diffusion coefficients of the diffusion process $X$ are not explicit functions.

3.2. Statistical inference for diffusion processes

Inference for discretely observed diffusion processes is made difficult by the fact that the likelihood function is generally not tractable. In recent years many different methods have been proposed to overcome this obstacle. We will here briefly discuss the methods most commonly used in connection with financial data. For an excellent overview of a wide variety of procedures for estimating parameters based on discretely observed diffusions, see H. Sørensen (2000).

Approximate likelihood methods are considered by Pedersen (1995), Aït-Sahalia (2002), and Poulsen (1999). In Pedersen (1995) it is shown that the likelihood function can be calculated to any given precision using simulations and the Euler approximation in a clever way. Unfortunately, the method is very computer intensive. Honoré (1997) successfully applied the Pedersen method to the CKLS-model for interest rates (proposed by Chan et al. (1992)). In Aït-Sahalia (2002) an analytical approximation to the likelihood function based on a truncated Hermite expansion is developed.
Poulsen (1999) obtained an approximation to the likelihood function by numerically solving the Chapman-Kolmogorov forward equations. He used his method to fit the CKLS-model to interest rate data. Asymptotic results for the maximum likelihood estimator based on discrete time observations of a diffusion model were derived in Dacunha-Castelle and Florens-Zmirou (1986).

Inference for diffusion processes based on martingale estimating functions is considered in Bibby and Sørensen (1995, 1996, 1997). For observations $X_{t_1},X_{t_2},\ldots,X_{t_n}$ the martingale estimating functions introduced in Bibby and Sørensen (1995, 1996) are of the form

$$G_n(\theta) = \sum_{i=1}^{n} g_i(X_{t_{i-1}};\theta)\bigl[X_{t_i} - E_\theta(X_{t_i}\mid X_{t_{i-1}})\bigr] + \sum_{i=1}^{n} h_i(X_{t_{i-1}};\theta)\Bigl[\bigl(X_{t_i} - E_\theta(X_{t_i}\mid X_{t_{i-1}})\bigr)^2 - \mathrm{Var}_\theta(X_{t_i}\mid X_{t_{i-1}})\Bigr]. \tag{37}$$

Note that in analogy with the unknown score function, $G_n$ is a sum of functions of consecutive pairs of observations, and $G_n$ is a martingale with respect to the natural filtration. The conditional expectations in (37) can easily be calculated using simulations, and an estimator for the parameter $\theta$ is then obtained by solving the equation $G_n(\theta) = 0$. In Bibby and Sørensen (1995) the resulting estimator is shown to be consistent and asymptotically normal as the number of observations tends to infinity. An optimal choice of the functions $g_i$ and $h_i$, as well as simpler approximately optimal functions that are useful in practice, are given in Bibby and Sørensen (1995, 1996).

As mentioned earlier, the hyperbolic diffusion process given by (32) was fitted to the log-prices of stocks after a linear trend had been subtracted in Bibby and Sørensen (1997). The parameters in this hyperbolic diffusion model were estimated using the martingale estimating function

$$K_n(\theta) = \sum_{i=1}^{n} \frac{\dot v(X_{t_{i-1}};\theta)}{(t_i - t_{i-1})\,v(X_{t_{i-1}};\theta)^3}\Bigl[(X_{t_i} - X_{t_{i-1}})^2 - E_\theta\bigl((X_{t_i} - X_{t_{i-1}})^2 \mid X_{t_{i-1}}\bigr)\Bigr],$$

where $v$ is the squared diffusion coefficient and a dot denotes differentiation with respect to the parameter $\theta$.
This is an approximately optimal modification of (37) taking into account that the diffusion has no drift.

Kessler and Sørensen (1999) considered martingale estimating functions based on eigenfunctions of the infinitesimal generator of the diffusion process. The advantage of such martingale estimating functions is that they are adapted to concrete models and are easy to calculate in cases where the eigenfunctions are explicitly known. Unfortunately this is not often the case.

It is usually easy to obtain an estimator from a simple estimating function of the form

$$F_n(\theta) = \sum_{i=1}^{n} f(X_{t_i};\theta),$$

where the function $f$ satisfies

$$\int_l^r f(x;\theta)\,\mu_\theta(x)\,dx = 0,$$

with $\mu_\theta$ denoting the density of the invariant probability measure. Such simple estimating functions were studied by Hansen and Scheinkman (1995), Kessler (2000), and Jacobsen (2001). The advantage of these estimating functions is that they are indeed simple and fast to work with, because it is straightforward to find functions $f$ with the required property explicitly. The main disadvantages are that only parameters appearing in the invariant density can be estimated using simple estimating functions, and that the estimators may be far from efficient because the dependence structure in the data is ignored. An improved version of the simple estimating function, where each term in the sum depends on a pair of consecutive observations, was considered by Hansen and Scheinkman (1995) and Jacobsen (2001). Optimality questions were treated in Kessler (2000) and Jacobsen (2001). For the improved version it is also not possible to estimate all parameters, see the discussion in Hansen and Scheinkman (1995). A review of estimating function inference for diffusion models can be found in Sørensen (1997a) and Bibby, Jacobsen and Sørensen (2002).

Indirect inference procedures based on auxiliary models and extensive simulations were proposed by Gouriéroux, Monfort and Renault (1993) and Gallant and Tauchen (1996).
These procedures have gained some popularity in the finance literature under the name of the efficient method of moments. However, the quality of the estimators depends on the choice of the auxiliary model, which is not a straightforward matter.

Finally, Bayesian MCMC-methods have been applied to diffusion models by Eraker (2001) and Elerian, Chib and Shephard (2001). In these methods, the likelihood function is calculated in a way similar to that in Pedersen (1995).

3.3. Ornstein-Uhlenbeck processes

A stochastic process $X$ is called a process of the Ornstein-Uhlenbeck type if it satisfies a stochastic differential equation of the form

$$dX_t = -\lambda X_t\,dt + dZ_t, \tag{38}$$

where $\lambda > 0$ and where the driving process $Z$ is a homogeneous Lévy process. It is not difficult to see that

$$X_t = e^{-\lambda t}X_0 + \int_0^t e^{-\lambda(t-s)}\,dZ_s. \tag{39}$$

If $X$ is stationary and square integrable, the autocorrelation function of $X$ is

$$\rho(u) = \exp(-\lambda u). \tag{40}$$

When the process $Z$ is the standard Wiener process, the solution $X$ is the usual Ornstein-Uhlenbeck process. Ornstein-Uhlenbeck type processes have been studied by Wolfe (1982), Sato and Yamazato (1982, 1984) and Sato, Watanabe and Yamazato (1994); see also Jurek and Vervaat (1983), Jurek and Mason (1993), and Barndorff-Nielsen, Jensen and Sørensen (1998). A necessary and sufficient condition for (38) to have a stationary solution is that $E(\log(1 + |Z(1)|)) < \infty$.

For every generalized hyperbolic distribution there exists a stationary Ornstein-Uhlenbeck type process such that for all $t \ge 0$ the distribution of $X_t$ is the given generalized hyperbolic distribution. The same is true for all generalized inverse Gaussian distributions. This is because these distributions have the property called self-decomposability, as discussed in Section 1. The Lévy process driving the NIG Ornstein-Uhlenbeck type process was studied by Barndorff-Nielsen (1998), while the process driving the symmetric variance-gamma Ornstein-Uhlenbeck type process was found by Jiang (2000).
For symmetric distributions, the driving Lévy process is, in the case of the NIG Ornstein-Uhlenbeck process, the sum of a NIG Lévy process and a compound Poisson process, while for the variance-gamma Ornstein-Uhlenbeck process it is simply a compound Poisson process.

As for most ordinary diffusion processes, the likelihood function is usually not explicitly available for processes of the Ornstein-Uhlenbeck type. Since these processes are Markov processes, a simple and natural approach to statistical inference goes via estimating functions based on conditional moments defined in analogy with those discussed in Section 3.2.

3.4. Compound processes

Quite often, the exponentially decreasing autocorrelation function (40) is too simple to fit financial data. However, models with a much more flexible covariance structure are easily obtained by summing independent Ornstein-Uhlenbeck type processes, as was proposed by Barndorff-Nielsen, Jensen and Sørensen (1998). The process

$$X_t = X_t^{(1)} + \cdots + X_t^{(m)}, \tag{41}$$

where the processes $X_t^{(i)}$, $i = 1,\ldots,m$, are independent Ornstein-Uhlenbeck type processes given by

$$dX_t^{(i)} = -\lambda_i X_t^{(i)}\,dt + dZ_t^{(i)} \tag{42}$$

for independent Lévy processes $Z_t^{(i)}$, $i = 1,\ldots,m$, has an autocorrelation function of the form

$$\rho(u) = \phi_1\exp(-\lambda_1 u) + \cdots + \phi_m\exp(-\lambda_m u), \tag{43}$$

where $\phi_i$ is proportional to the variance of $X_t^{(i)}$, and $\phi_1 + \cdots + \phi_m = 1$. A much better fit to financial data than that obtained by (40) can often be obtained even for $m = 2$. Examples can be found in Barndorff-Nielsen, Jensen and Sørensen (1998) and Barndorff-Nielsen and Shephard (2001c).

For every generalized hyperbolic distribution and for every generalized inverse Gaussian distribution there exists a stationary process $X$ of the form (41), (42) such that for all $t \ge 0$ the distribution of $X_t$ is that particular distribution. Again this is because these distributions are self-decomposable, see Barndorff-Nielsen, Jensen and Sørensen (1998).
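The autocorrelation function (43) is easy to illustrate by simulation. The sketch below makes a simplifying assumption that is not the focus of the text: each component is driven by a Wiener process, so that it is an ordinary Gaussian Ornstein-Uhlenbeck process and can be simulated exactly on a grid as a first-order autoregression. The function names, weights and rates are hypothetical.

```python
import math
import random


def superposition_acf(u, weights, rates):
    """Autocorrelation (43): sum of weights[i] * exp(-rates[i] * u)."""
    return sum(w * math.exp(-lam * u) for w, lam in zip(weights, rates))


def simulate_ou_sum(weights, rates, n, step, rng):
    """Sum of independent stationary Gaussian OU processes observed every `step`.

    Component i has stationary variance weights[i] and mean-reversion rate
    rates[i]; the exact transition is X_{k+1} = a*X_k + sqrt(w*(1 - a^2))*eps
    with a = exp(-rate*step).
    """
    states = [rng.gauss(0.0, math.sqrt(w)) for w in weights]
    decay = [math.exp(-lam * step) for lam in rates]
    noise_sd = [math.sqrt(w * (1 - a * a)) for w, a in zip(weights, decay)]
    out = []
    for _ in range(n):
        out.append(sum(states))
        states = [a * s + sd * rng.gauss(0.0, 1.0)
                  for s, a, sd in zip(states, decay, noise_sd)]
    return out


def empirical_acf(xs, lag):
    """Sample autocorrelation at a positive integer lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


# Illustrative use: a fast and a slow component (hypothetical values).
weights, rates = (0.7, 0.3), (2.0, 0.2)
xs = simulate_ou_sum(weights, rates, n=20000, step=1.0, rng=random.Random(1))
for lag in (1, 5):
    print(lag, empirical_acf(xs, lag), superposition_acf(lag, weights, rates))
```

The fast component dominates the decay at short lags and the slow component at long lags, which is the flexibility that a single exponential (40) lacks.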
More complex types of superpositions of Ornstein–Uhlenbeck type processes were investigated in Barndorff-Nielsen (2001).

The construction (41) can also be made for diffusion models with linear drift and non-linear diffusion coefficient, see Bibby, Skovgaard and Sørensen (2002). As an example, suppose we want a stationary stochastic process with autocorrelation function (43) for given values of $\phi_1,\dots,\phi_m$ and $\lambda_1,\dots,\lambda_m$, and such that the marginal distribution of $X_t$ is a gamma distribution with shape parameter $\kappa$ and scale parameter $\theta$. This can be obtained by defining $m$ independent processes as the stationary solutions to

$$dX_t^{(i)} = -\lambda_i\bigl(X_t^{(i)} - \phi_i\kappa\theta\bigr)\,dt + \sqrt{2\lambda_i\theta X_t^{(i)}}\,dW_t^{(i)}, \tag{44}$$

$i = 1,\dots,m$. Each of the processes $X_t^{(i)}$ is a CIR-process, (36), which is a particular example of the generalized inverse Gaussian diffusions given by (35). Since $X_t^{(i)}$ is gamma distributed with shape parameter $\phi_i\kappa$ and scale parameter $\theta$, it follows that $X_t$ defined by (41) has the required gamma distribution, and since the autocorrelation function of $X_t^{(i)}$ is $\exp(-\lambda_i u)$, the autocorrelation function of the sum $X_t$ is given by (43). This construction will come in handy in Section 4, where processes of the type (41) will be used as models for stochastic volatility.

Empirical autocorrelations that might be interpreted as an indication of long range dependence may often alternatively be approximated very well by autocorrelation functions of the type (43). However, if a model with genuine long range dependence is desirable, a NIG-process of this type can be constructed as follows. Let $X^{(i)}$, $i = 1,2,\dots$, be a sequence of independent NIG Ornstein–Uhlenbeck processes with NIG-parameters $(\alpha,\beta,0,\delta_i)$, where $\delta_i \sim i^{-1-2(1-H)}$ for some $H \in (0,1)$, and all with the same value of the drift parameter $\lambda$. Barndorff-Nielsen (1998) showed that the process

$$X_t = \sum_{i=1}^{\infty} X_{t/i}^{(i)}, \tag{45}$$

which is stationary and well-defined as a mean-square limit, has as its marginal distribution the NIG distribution with parameters $(\alpha,\beta,0,\delta)$, where $\delta = \sum_{i=1}^{\infty}\delta_i$.
Moreover, its autocorrelation function $r(u)$ satisfies $r(u) \sim L(u)\,u^{-2(1-H)}$ for some slowly varying function $L$. Thus if $\tfrac{1}{2} < H < 1$, the process $X$ exhibits long range dependence with exponent $H$. The construction of long range dependent processes by a sum of the type (45) is similar to a construction proposed by Cox (1984). Almost the same construction was used in Barndorff-Nielsen, Jensen and Sørensen (1990). The construction (45) can also be applied to a sequence of independent stationary NIG-diffusions given as solutions of stochastic differential equations defined in analogy to (31).

Likelihood inference for the various compound processes considered here is complicated by the fact that the likelihood function is not explicitly available. A feasible alternative is provided by prediction-based estimating functions, see M. Sørensen (2000).

4. Stochastic volatility models

The Black–Scholes model for the logarithm of an asset price is

$$dX_t = (\mu + \beta\sigma^2)\,dt + \sigma\,dW_t. \tag{46}$$

A generalization that takes into account the empirical finding that the volatility $\sigma^2$ varies randomly over time is the stochastic volatility process

$$dX_t = (\mu + \beta v_t)\,dt + \sqrt{v_t}\,dW_t. \tag{47}$$

Here the volatility $v_t$ is a stochastic process that cannot be observed directly. If the data are observations at the time points $i\Delta$, $i = 0,1,2,\dots,n$, then the returns $Y_i = X_{i\Delta} - X_{(i-1)\Delta}$ can be written in the form

$$Y_i = \mu\Delta + \beta S_i + \sqrt{S_i}\,A_i, \tag{48}$$

where

$$S_i = \int_{(i-1)\Delta}^{i\Delta} v_t\,dt, \tag{49}$$

and where the $A_i$ are independent, standard normally distributed random variables. If the integrated volatility $S_i$ is independent of $A_i$, and if it is generalized inverse Gaussian distributed, then the distribution of the return $Y_i$ is generalized hyperbolic. This follows from the representation of the generalized hyperbolic distributions as variance–mean mixtures of normal distributions mentioned in Section 1.2.
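The mixture representation (48) is easy to try out by simulation. The sketch below is illustrative only and rests on two assumptions that are not in the text: the integrated volatilities $S_i$ are drawn independently, and they follow a gamma law, which is the limiting member of the generalized inverse Gaussian family that leads to variance-gamma returns and happens to be available in the Python standard library. All parameter values and function names are hypothetical.

```python
import math
import random


def simulate_mixture_returns(n, mu, beta, delta, shape, scale, rng):
    """Draw returns Y = mu*delta + beta*S + sqrt(S)*A as in (48).

    Assumption: S is gamma(shape, scale) and independent of A ~ N(0, 1).
    """
    returns = []
    for _ in range(n):
        s = rng.gammavariate(shape, scale)  # integrated volatility S_i > 0
        a = rng.gauss(0.0, 1.0)             # standard normal shock A_i
        returns.append(mu * delta + beta * s + math.sqrt(s) * a)
    return returns


def excess_kurtosis(xs):
    """Sample kurtosis minus 3 (zero for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(2024)
mu, beta, delta = 0.05, -0.5, 1.0   # hypothetical drift, skewness and time step
shape, scale = 2.0, 0.5             # hypothetical gamma law with E(S) = 1

ys = simulate_mixture_returns(200_000, mu, beta, delta, shape, scale, rng)
sample_mean = sum(ys) / len(ys)

print("sample mean:", round(sample_mean, 3),
      "theoretical mean:", mu * delta + beta * shape * scale)
print("excess kurtosis:", round(excess_kurtosis(ys), 2))
```

The positive excess kurtosis comes entirely from the randomness of $S_i$: conditionally on $S_i$ the return is normal, so fixing $S_i$ at a constant would bring the excess kurtosis back to zero.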
Unfortunately, no continuous time process $v$ with the property that the integrated volatility (49) is exactly generalized inverse Gaussian distributed is presently known. Therefore we will instead consider models where the volatility process $v$ is stationary with $v_t$ generalized inverse Gaussian distributed. For small values of $\Delta$, the distribution of $S_i$ will then be close to a generalized inverse Gaussian distribution, and hence the distribution of $Y_i$ will be close to a generalized hyperbolic distribution. Thus we obtain models that are not exactly generalized hyperbolic, but which have marginal distributions with much the same tail properties when $\Delta$ is not too large.

When $\Delta$ tends to infinity, the distribution of $\Delta^{-1/2}(Y_i - \mu\Delta - \beta S_i) = \sqrt{S_i/\Delta}\,A_i$ tends to a normal distribution with mean zero and variance equal to the mean volatility, $E(v_t)$, provided that the process $v$ is ergodic. This is in accordance with the empirical finding that the distribution of returns over short periods has heavy tails and is well approximated by a generalized hyperbolic distribution, whereas the distribution of returns over long periods is close to a normal distribution. Limit theorems relating, for small $\Delta$, the distribution of $Y_i$ to the generalized hyperbolic distribution obtained by assuming that $S_i$ is exactly generalized inverse Gaussian distributed are given in Genon-Catalot, Jeantheau and Larédo (1998). A rather different type of discrete time stochastic volatility model with exactly generalized hyperbolic distributed returns was proposed in Barndorff-Nielsen (1997).

It should be noted that stochastic volatility models can be interpreted as being obtained by subordination. Here the operational time or business time is the integral of the volatility process, $\tau(t) = \int_0^t v_s\,ds$, which can be interpreted as discussed in Section 2.
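The long-horizon normal limit stated above follows in two steps from ergodicity and from the independence of $A_i$ and $S_i$. A short sketch of the argument, using the notation of (48) and (49) and assuming that $E(v_t)$ is finite:

```latex
% Step 1: by stationarity and ergodicity of v, the time average of the
% volatility over one sampling interval converges in probability.
\frac{S_i}{\Delta}
  = \frac{1}{\Delta}\int_{(i-1)\Delta}^{i\Delta} v_t\,dt
  \;\xrightarrow{\;P\;}\; E(v_t)
  \qquad (\Delta \to \infty).

% Step 2: A_i is standard normal and independent of S_i, so by
% Slutsky's lemma
\Delta^{-1/2}\bigl(Y_i - \mu\Delta - \beta S_i\bigr)
  = \sqrt{S_i/\Delta}\;A_i
  \;\xrightarrow{\;d\;}\; N\bigl(0,\,E(v_t)\bigr).
```

The mixing over $S_i$ that produces heavy tails for small $\Delta$ therefore averages out over long horizons.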
A simple specification of the volatility process $v$ is to assume that it is one of the stationary and ergodic generalized inverse Gaussian diffusions defined in Section 3 as the solution of (35). A particularly simple choice is to assume that $v$ is the stationary CIR-model given by (36), for which $v_t$ is gamma distributed, so that a variance-gamma stochastic volatility model is obtained. This model was proposed by Hull and White (1988) and was considered further by Heston (1993). Its advantage is that it is relatively tractable analytically. For instance, all moments and mixed moments can be found explicitly, see, e.g., M. Sørensen (2000). A problem is that because of the linear drift, the autocorrelation function is an exponential function, whereas it is a well-established empirical fact that the autocorrelation function of the volatility process decreases more slowly than a single exponential function.

Under relatively weak regularity conditions a diffusion model has an exponentially decreasing autocorrelation function. A sufficient condition is that it is $\rho$-mixing, for which simple conditions are given in Genon-Catalot, Jeantheau and Larédo (2000). For this reason, stochastic volatility models with a diffusion volatility process can usually not fit the autocorrelation of the volatility process well. In applications where the autocorrelation of the volatility process is important, a solution is to use the construction in Section 3.4, i.e., to define the volatility process as the sum

$$v_t = v_t^{(1)} + \cdots + v_t^{(m)}, \tag{50}$$

where $v_t^{(1)},\dots,v_t^{(m)}$ are independent CIR-processes, with $v_t^{(i)}$ defined like the process $X^{(i)}$ given by (44). Also in this case a variance-gamma model is obtained, which is exactly as analytically tractable as the variance-gamma model just discussed, but the autocorrelation structure of the volatility process (50) is given by (43) and is thus very flexible. This approach is studied for more general diffusion models in Bibby and Sørensen (2002).
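The bookkeeping behind the sum (50) can be checked with a few lines of arithmetic. The sketch below assumes that each component is a CIR diffusion of the form $dv = -\lambda_i(v - m_i)\,dt + \sqrt{c_i v}\,dW$ with mean level $m_i = \phi_i\kappa\theta$ and $c_i = 2\lambda_i\theta$, where `kappa` and `theta` are names chosen here for the gamma shape and scale of the target marginal. It uses the standard fact that such a diffusion has a stationary gamma law with shape $2\lambda_i m_i / c_i$ and scale $c_i / (2\lambda_i)$. The numerical values are hypothetical.

```python
def cir_component_parameters(kappa, theta, weights, rates):
    """Return (rate, mean level, squared diffusion coefficient) per component.

    Assumption: component i mean-reverts to phi_i * kappa * theta and has
    diffusion term sqrt(2 * lambda_i * theta * v) dW.
    """
    return [(lam, phi * kappa * theta, 2.0 * lam * theta)
            for phi, lam in zip(weights, rates)]


def stationary_gamma(rate, level, diff_sq):
    """Stationary (shape, scale) of dv = -rate*(v - level)dt + sqrt(diff_sq*v)dW."""
    return 2.0 * rate * level / diff_sq, diff_sq / (2.0 * rate)


kappa, theta = 3.0, 0.04   # hypothetical target gamma shape and scale
weights = [0.7, 0.3]       # phi_i in (43)
rates = [2.0, 0.05]        # lambda_i in (43)

components = cir_component_parameters(kappa, theta, weights, rates)
laws = [stationary_gamma(*component) for component in components]

# Independent gammas with a common scale add up to a gamma whose shape is
# the sum of the component shapes.
total_shape = sum(shape for shape, _ in laws)
variances = [shape * scale ** 2 for shape, scale in laws]
shares = [v / sum(variances) for v in variances]

print("component laws (shape, scale):", laws)
print("shape of the sum:", total_shape)
print("variance shares:", shares)
```

The variance shares reproduce the weights $\phi_i$, so the sum has both the required gamma marginal and the autocorrelation function (43), while the rates $\lambda_i$ can be chosen freely to match the empirical decay.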
It has been found empirically that for equities a fall in the price is associated with an increase in the future volatility. This phenomenon is referred to as leverage, see Black (1976) and Nelson (1991). Stochastic volatility models of the form (47), where the Wiener process driving the price process is independent of the volatility process, as we have so far assumed, cannot deal with leverage, because for such a model the future fluctuations of the volatility are independent of the present price. We can, however, easily generalize the model to allow for the leverage phenomenon. Again we let the volatility process $v$ be given by (50), and denote the Wiener process driving the $k$th CIR-process $v_t^{(k)}$ by $B^{(k)}$. Then we define the log-price process by

$$dX_t = (\mu + \beta v_t)\,dt + \sqrt{v_t}\,d\tilde{W}_t, \tag{51}$$

where $\tilde{W}$ is the standard Wiener process

$$\tilde{W}_t = \frac{W_t + \gamma B_t}{\sqrt{1+\gamma^2}} \quad\text{with } \gamma \in \mathbb{R} \text{ and}\quad B_t = \frac{B_t^{(1)} + \cdots + B_t^{(m)}}{\sqrt{m}}. \tag{52}$$

A lengthy calculation shows that for $\beta = 0$ the covariance between $Y_i$ and $Y_{i+j}^2$ ($j \ge 1$) is

$$\frac{\gamma}{\sqrt{1+\gamma^2}}\,\frac{1}{\sqrt{m}}\sum_{k=1}^{m} b_k\,e^{-\lambda_k\Delta j}.$$

Here

$$b_k = \sqrt{2\theta}\;e^{\lambda_k\Delta}\bigl(1 - e^{-\lambda_k\Delta}\bigr)^2\lambda_k^{-3/2}\,E\Bigl(\sqrt{v_1^{(k)} v_1}\Bigr) > 0,$$

where $\theta$ is the scale parameter of the gamma distribution of the volatility, and $\lambda_k$ is the speed of reversion of the $k$th volatility component. We see that the correlation between $Y_i$ and $Y_{i+j}^2$ is negative if $\gamma < 0$, which is exactly what we wanted. For $\gamma = 0$ there is no leverage effect, as expected. Note that the effect decreases as $j$ tends to infinity. The decrease is of the same type as that of the autocorrelation function (43), but with different weights. It is thus very flexible and can in particular be slow.

Barndorff-Nielsen and Shephard (2001b, c) proposed to model the volatility process $v$ as an Ornstein–Uhlenbeck type process, i.e., a solution to the stochastic differential equation (38). Such a process can be chosen stationary with a generalized inverse Gaussian marginal distribution, as discussed in Section 3.3.
Processes of this type have the advantage that the drift is linear and the coefficient in front of the driving Lévy process is constant, which, analogous to the situation for the classical Wiener-driven Ornstein–Uhlenbeck process, implies an unusual analytic tractability. For instance the integrated volatility, which is a key quantity in finance, has the simple structure

$$\int_s^t v_u\,du = \lambda^{-1}\bigl[(Z_t - Z_s) - (v_t - v_s)\bigr],$$

where $s < t$, and where $Z$ is the driving Lévy process. This relation implies, for instance, that stochastic volatility processes of this type can be simulated as accurately as the volatility process can be simulated. This is because the random variables $S_i$, given by (49), are simple functions of the processes $Z$ and $v$. An efficient method of simulating Ornstein–Uhlenbeck type processes is based on results by Rosiński (1991, 2001), see the exposition in Barndorff-Nielsen and Shephard (2001b).

Barndorff-Nielsen and Shephard (2001a) have studied the distributional properties of integrated Ornstein–Uhlenbeck type processes in detail. For the Ornstein–Uhlenbeck type volatility process with inverse Gaussian marginal distributions they found that while the integrated volatility process is not distributed exactly as the inverse Gaussian distribution, its tails have the same behaviour as this distribution. This implies that the returns will have the expected NIG tail behaviour.

For an Ornstein–Uhlenbeck type volatility process $v$, the autocorrelations of the discrete time processes $S_i$ and $Y_i^2$ have the following simple form. Here $S_i$ is given by (49), while $Y_i$ denotes the return given by (48):

$$\operatorname{cor}(S_i, S_{i+j}) = d\,\exp\{-\lambda\Delta(j-1)\}, \tag{53}$$

and

$$\operatorname{cor}\bigl(Y_i^2, Y_{i+j}^2\bigr) = c\,\exp\{-\lambda\Delta(j-1)\}, \tag{54}$$

where

$$1 \ge d = \frac{[1-\exp(-\lambda\Delta)]^2}{2[\exp(-\lambda\Delta) - 1 + \lambda\Delta]} \ge c = \frac{[1-\exp(-\lambda\Delta)]^2}{6[\exp(-\lambda\Delta) - 1 + \lambda\Delta] + 2(\lambda\Delta)^2(\xi/\omega)^2} \ge 0,$$

with $\xi$ and $\omega^2$ denoting the mean and variance of the volatility $v_t$.
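The constants in (53) and (54) depend on the model only through the product $\lambda\Delta$ and the ratio $\xi/\omega$, so they are simple to tabulate. A minimal sketch follows; the function names and the numerical inputs are hypothetical, and `math.expm1` is used only to avoid cancellation when $\lambda\Delta$ is small.

```python
import math


def d_factor(x):
    """Constant d in (53), with x = lambda * Delta > 0."""
    one_minus_exp = -math.expm1(-x)          # 1 - exp(-x)
    bracket = math.expm1(-x) + x             # exp(-x) - 1 + x
    return one_minus_exp ** 2 / (2.0 * bracket)


def c_factor(x, mean_vol, var_vol):
    """Constant c in (54); mean_vol = xi and var_vol = omega^2."""
    one_minus_exp = -math.expm1(-x)
    bracket = math.expm1(-x) + x
    return one_minus_exp ** 2 / (6.0 * bracket + 2.0 * x ** 2 * mean_vol ** 2 / var_vol)


def acf_integrated_vol(j, x):
    """cor(S_i, S_{i+j}) for j >= 1, equation (53)."""
    return d_factor(x) * math.exp(-x * (j - 1))


def acf_squared_returns(j, x, mean_vol, var_vol):
    """cor(Y_i^2, Y_{i+j}^2) for j >= 1, equation (54)."""
    return c_factor(x, mean_vol, var_vol) * math.exp(-x * (j - 1))


x = 0.1                       # hypothetical lambda * Delta
mean_vol, var_vol = 1.0, 0.5  # hypothetical xi and omega^2

print("d =", round(d_factor(x), 5), " c =", round(c_factor(x, mean_vol, var_vol), 5))
for j in (1, 5, 20):
    print(j, round(acf_integrated_vol(j, x), 4),
          round(acf_squared_returns(j, x, mean_vol, var_vol), 4))
```

Both correlations decay at the same geometric rate $e^{-\lambda\Delta}$, and the squared returns start from the lower level $c$ because the normal shocks $A_i$ add noise that the integrated volatility does not carry.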
Therefore, as discussed in Barndorff-Nielsen and Shephard (2001c), $S$ and $Y^2$ are constrained ARMA(1,1) processes with common autoregressive parameter, and with the moving average root being stronger for $S$ than for $Y^2$. The ARMA structure implies that the return process $Y$ is weak GARCH(1,1) in the sense of Drost and Nijman (1993). Note that the formulae (53) and (54) also hold for the stochastic volatility model discussed above, where the volatility process is a CIR-diffusion. Hence for this model, the processes $S$ and $Y^2$ have the same ARMA structure.

Barndorff-Nielsen and Shephard (2001c) also proposed a model with a Lévy-driven Ornstein–Uhlenbeck volatility process that allows for the leverage phenomenon. The log-price is modelled by

$$dX_t = (\mu + \beta v_t)\,dt + \sqrt{v_t}\,dW_t + \gamma\,d\bar{Z}_t, \tag{55}$$

where $\bar{Z}_t = Z_t - E(Z_t)$ is the centered version of the Lévy process $Z$ that drives the volatility process. This model has properties similar to those of the model with leverage discussed above (when $m = 1$). It is not a generalized hyperbolic model in the sense of the other stochastic volatility models in this section because of the term $\gamma\,d\bar{Z}_t$. It is not clear to what extent the model is approximately hyperbolic. A complication is that the log-price process is a diffusion with jumps rather than a classical diffusion process driven by a Wiener process.

As already mentioned in Section 3.3, the autocorrelation function of an Ornstein–Uhlenbeck type process decreases exponentially, which, as also mentioned earlier, is faster than what is typically found in financial data. Volatility processes of the form (50), where $v_t^{(1)},\dots,v_t^{(m)}$ are independent, stationary Ornstein–Uhlenbeck type processes such that the marginal distribution of $v$ is a generalized inverse Gaussian distribution, have a much more flexible autocorrelation structure. That such a volatility process exists was discussed in Section 3.4.
Stochastic volatility models of this type often provide a much better fit to financial data. An example of this is given in Barndorff-Nielsen and Shephard (2001c). Models where the volatility process is a sum of independent Ornstein–Uhlenbeck processes are also analytically tractable.

Statistical inference for stochastic volatility models cannot easily be based on the likelihood function, as it is not explicitly available and quite hard to simulate. Harvey, Ruiz and Shephard (1994) proposed a pseudo-likelihood method based on a Gaussian approximation that allowed them to apply the Kalman filter. More recently, likelihood based methods for stochastic volatility models have been proposed by Kim, Shephard and Chib (1998), and simulation based Bayesian methods using Markov chain Monte Carlo have been developed by Elerian, Chib and Shephard (2001) and Eraker (2001). A new and quite simple way of obtaining an approximate likelihood function for stochastic volatility models, which seems very promising, has been proposed by H. Sørensen (2001). The method takes advantage of the fact that lag-$k$ conditional densities are relatively easy to obtain by simulation for stochastic volatility models. Other methods are the indirect inference methods of Gouriéroux, Monfort and Renault (1993), Gallant and Tauchen (1996), and Gallant and Long (1997). The prediction-based estimating functions of M. Sørensen (2000) can be applied to all models discussed in this section, while the estimators proposed by Genon-Catalot, Jeantheau and Larédo (1999), based on limit results (where the time between observations goes to zero) in Genon-Catalot, Jeantheau and Larédo (1998), are developed for volatility processes of the diffusion type. Recently, methods based on realized volatility have been proposed, see Gloter (1999) and Barndorff-Nielsen and Shephard (2002).
Surveys that discuss the literature on stochastic volatility models up to 1995 can be found in Ghysels, Harvey and Renault (1996) and Shephard (1996).

Acknowledgment

We are grateful to Ole E. Barndorff-Nielsen for his several useful comments on an earlier version of this chapter. The research of Michael Sørensen was supported by MaPhySto, The Centre for Mathematical Physics and Stochastics, funded by a grant from The Danish National Research Foundation, and by the European Commission through the DYNSTOCH Network under the Human Potential Programme. Michael Sørensen was also supported by the Centre for Analytical Finance and the Danish Mathematical Finance Network, both financed by the Danish Social Science Research Council. The data were put at our disposal by the Centre for Analytical Finance.

Appendix

In this appendix a few definitions and results concerning Bessel functions are collected. The modified Bessel function of the third kind with index $\lambda \in \mathbb{R}$ can be defined by the following integral representation,

$$K_\lambda(x) = \frac{1}{2}\int_0^\infty u^{\lambda-1} e^{-x(u+u^{-1})/2}\,du, \quad x > 0.$$

The modified Bessel function has the following properties:

$$K_{-\lambda}(x) = K_\lambda(x), \tag{A.1}$$

$$K_{\lambda+1}(x) = \frac{2\lambda}{x}K_\lambda(x) + K_{\lambda-1}(x), \tag{A.2}$$

$$K_\lambda'(x) = -\frac{\lambda}{x}K_\lambda(x) - K_{\lambda-1}(x). \tag{A.3}$$

For $\lambda = n + 1/2$, $n = 0,1,2,\dots$, we have that

$$K_{n+1/2}(x) = \sqrt{\frac{\pi}{2x}}\,e^{-x}\left(1 + \sum_{i=1}^{n}\frac{(n+i)!}{(n-i)!\,i!}(2x)^{-i}\right). \tag{A.4}$$

For small values of the argument it holds that

$$K_\lambda(x) \sim \Gamma(\lambda)\,2^{\lambda-1}x^{-\lambda}, \quad x \downarrow 0, \quad\text{if } \lambda > 0. \tag{A.5}$$

Similarly, we have for large values of the argument that

$$K_\lambda(x) = \sqrt{\frac{\pi}{2x}}\,e^{-x}\left(1 + \frac{4\lambda^2-1}{8x} + \frac{(4\lambda^2-1)(4\lambda^2-9)}{2!\,(8x)^2} + \frac{(4\lambda^2-1)(4\lambda^2-9)(4\lambda^2-25)}{3!\,(8x)^3} + \cdots\right). \tag{A.6}$$

The Bessel function of the first kind with index $\lambda \in \mathbb{R}$ can be defined for $x > 0$ by

$$J_\lambda(x) = \frac{1}{\pi}\int_0^\pi \cos\bigl(x\sin(u) - \lambda u\bigr)\,du - \frac{\sin(\lambda\pi)}{\pi}\int_0^\infty e^{-x\sinh(u) - \lambda u}\,du.$$

For $\lambda > -\tfrac{1}{2}$ we have

$$J_\lambda(x) = \frac{2(x/2)^\lambda}{\sqrt{\pi}\,\Gamma(\lambda + 1/2)}\int_0^1 (1-u^2)^{\lambda-1/2}\cos(xu)\,du, \quad x \in \mathbb{R},$$

where $\Gamma$ denotes the gamma function.
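The half-integer formula (A.4) and the recurrence (A.2) can be checked numerically against the integral representation of the modified Bessel function of the third kind. The sketch below is only an illustration: it uses the substitution $u = e^t$, which turns the integral into $\int_0^\infty \cosh(\lambda t)\,e^{-x\cosh t}\,dt$, and evaluates it with a plain trapezoidal rule. The truncation point, step count and function names are arbitrary choices made here.

```python
import math


def bessel_k_integral(index, x, upper=8.0, steps=4000):
    """K_index(x) from the integral representation after substituting u = exp(t).

    Trapezoidal rule on [0, upper]; the integrand is negligible beyond
    that point for the moderate x used in this sketch.
    """
    h = upper / steps
    total = 0.5 * math.exp(-x)  # integrand at t = 0 (cosh(0) = 1), half weight
    for k in range(1, steps + 1):
        t = k * h
        term = math.cosh(index * t) * math.exp(-x * math.cosh(t))
        total += 0.5 * term if k == steps else term
    return h * total


def bessel_k_half_integer(n, x):
    """Closed form (A.4) for K_{n+1/2}(x), n = 0, 1, 2, ..."""
    series = 1.0
    for i in range(1, n + 1):
        coeff = math.factorial(n + i) / (math.factorial(n - i) * math.factorial(i))
        series += coeff * (2.0 * x) ** (-i)
    return math.sqrt(math.pi / (2.0 * x)) * math.exp(-x) * series


x = 1.5
for n in range(3):
    closed = bessel_k_half_integer(n, x)
    numeric = bessel_k_integral(n + 0.5, x)
    print(n, closed, numeric)

# Recurrence (A.2) with lambda = 3/2: K_{5/2} = (3/x) K_{3/2} + K_{1/2}.
recurrence_gap = (bessel_k_half_integer(2, x)
                  - (3.0 / x) * bessel_k_half_integer(1, x)
                  - bessel_k_half_integer(0, x))
print("recurrence gap:", recurrence_gap)
```

For $n = 0$ the closed form reduces to $\sqrt{\pi/(2x)}\,e^{-x}$, which is the value behind the explicit NIG density.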
The Bessel function of the second kind with index $\lambda \in \mathbb{R}$ can be defined for $x > 0$ by

$$Y_\lambda(x) = \frac{1}{\pi}\int_0^\pi \sin\bigl(x\sin(u) - \lambda u\bigr)\,du - \frac{1}{\pi}\int_0^\infty \bigl(e^{\lambda u} + e^{-\lambda u}\cos(\lambda\pi)\bigr)\,e^{-x\sinh(u)}\,du.$$

The function $Y_\lambda(x)$ is often alternatively denoted $N_\lambda(x)$ and is sometimes called Weber's function. The relationship between $J_\lambda(x)$ and $Y_\lambda(x)$ is

$$Y_\lambda(x) = \frac{J_\lambda(x)\cos(\lambda\pi) - J_{-\lambda}(x)}{\sin(\lambda\pi)}.$$

In connection with the NIG-distribution, it is useful to know that

$$J_{1/2}(x) = \sqrt{\frac{2}{\pi x}}\,\sin(x) \quad\text{and}\quad Y_{1/2}(x) = -\sqrt{\frac{2}{\pi x}}\,\cos(x).$$

References

Aït-Sahalia, Y., 2002. Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach. Econometrica 70, 223–262.
Ané, T., Geman, H., 2000. Order flow, transaction clock and normality of asset returns. Journal of Finance 55, 2259–2284.
Bagnold, R.A., 1941. The Physics of Blown Sand and Desert Dunes. Methuen, London.
Barndorff-Nielsen, O.E., 1977. Exponentially decreasing distributions for the logarithm of particle size. Proceedings of the Royal Society London. Series A 353, 401–419.
Barndorff-Nielsen, O.E., 1978. Hyperbolic distributions and distributions on hyperbolae. Scandinavian Journal of Statistics 5, 151–157.
Barndorff-Nielsen, O.E., 1997. Normal inverse Gaussian distributions and stochastic volatility modelling. Scandinavian Journal of Statistics 24, 1–13.
Barndorff-Nielsen, O.E., 1998. Processes of normal inverse Gaussian type. Finance and Stochastics 2, 41–68.
Barndorff-Nielsen, O.E., 2001. Superposition of Ornstein–Uhlenbeck type processes. Theory of Probability and its Applications, 45.
Barndorff-Nielsen, O.E., Blæsild, P., 1980. Hyperbolic distributions and ramifications: Contributions to theory and application. Research Report 68. Department of Theoretical Statistics, Institute of Mathematics, University of Aarhus.
Barndorff-Nielsen, O.E., Blæsild, P., Jensen, J.L., Sørensen, M., 1985. The fascination of sand. In: Atkinson, A.C., Fienberg, S.E. (Eds.), A Celebration of Statistics. Springer, New York, pp. 57–87.
Barndorff-Nielsen, O.E., Halgreen, C., 1977. Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 38, 309–312.
Barndorff-Nielsen, O.E., Jensen, J.L., Sørensen, M., 1990. Parametric modelling of turbulence. Philosophical Transactions of the Royal Society of London. Series A 332, 439–455.
Barndorff-Nielsen, O.E., Jensen, J.L., Sørensen, M., 1998. Some stationary processes in discrete and continuous time. Advances in Applied Probability 30, 989–1007.
Barndorff-Nielsen, O.E., Shephard, N., 2001a. Integrated OU processes and non-Gaussian OU-based stochastic volatility models. Research report. MaPhySto, University of Aarhus. Scandinavian Journal of Statistics, forthcoming.
Barndorff-Nielsen, O.E., Shephard, N., 2001b. Modelling by Lévy processes for financial econometrics. In: Barndorff-Nielsen, O.E., Mikosch, T., Resnick, S. (Eds.), Lévy Processes: Theory and Applications. Birkhäuser, Boston.
Barndorff-Nielsen, O.E., Shephard, N., 2001c. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial econometrics (with discussion). Journal of the Royal Statistical Society. Series B 63, 167–241.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society. Series B 64, 253–280.
Bertoin, J., 1996. Lévy Processes. Cambridge University Press.
Bibby, B.M., Jacobsen, M., Sørensen, M., 2002. Estimating functions for discretely sampled diffusion-type models. In: Aït-Sahalia, Y., Hansen, L.P. (Eds.), Handbook of Financial Econometrics. North-Holland, Amsterdam, forthcoming.
Bibby, B.M., Skovgaard, I.M., Sørensen, M., 2002. Diffusion-type models with given marginals and autocorrelation function. Preprint. Department of Theoretical Statistics, University of Copenhagen, submitted.
Bibby, B.M., Sørensen, M., 1995. Martingale estimation functions for discretely observed diffusion processes. Bernoulli 1, 17–39.
Bibby, B.M., Sørensen, M., 1996. On estimation for discretely observed diffusions: A review. Theory of Stochastic Processes 2, 49–56.
Bibby, B.M., Sørensen, M., 1997. A hyperbolic diffusion model for stock prices. Finance and Stochastics 1, 25–41.
Bibby, B.M., Sørensen, M., 2001. Simplified estimating functions for diffusion models with a high-dimensional parameter. Scandinavian Journal of Statistics 28 (1), 99–112.
Bibby, B.M., Sørensen, M., 2002. Flexible stochastic volatility models of the diffusion type. In preparation.
Black, F., 1976. Studies of stock price volatility changes. Proceedings of Business and Economics Statistics Section American Statistical Association 177–181.
Blæsild, P., 1978. The shape of the generalized inverse Gaussian and hyperbolic distributions. Research Report 37. Department of Theoretical Statistics, Institute of Mathematics, University of Aarhus.
Blæsild, P., Sørensen, M.K., 1992. HYP: A computer program for analyzing data by means of the hyperbolic distribution. Research Report No. 248. Department of Theoretical Statistics, Institute of Mathematics, University of Aarhus.
Blattberg, R.C., Gonedes, N., 1974. A comparison of the stable and student distributions as models for stock prices. Journal of Business 47, 244–280.
Chan, K.C., Karolyi, G.A., Longstaff, F.A., Sanders, A.B., 1992. An empirical comparison of alternative models of the short-term interest rate. Journal of Finance 47, 1209–1227.
Clark, P.K., 1973. A subordinated stochastic process with finite variance for speculative prices. Econometrica, 41.
Cox, D.R., 1984. Long-range dependence: A review. In: David, H.A., David, H.T. (Eds.), Statistics: An Appraisal. Iowa State University Press.
Cox, J.C., Ingersoll, J.E., Jr., Ross, S.A., 1985. A theory of the term structure of interest rates. Econometrica 53 (2), 385–407.
Dacunha-Castelle, D., Florens-Zmirou, D., 1986. Estimation of the coefficients of a diffusion from discrete observations. Stochastics 19, 263–284.
Drost, F.C., Nijman, T.E., 1993. Temporal aggregation of GARCH processes. Econometrica 61, 909–927.
Eberlein, E., 2001. Application of generalized hyperbolic Lévy motions to finance. In: Barndorff-Nielsen, O.E., Mikosch, T., Resnick, S. (Eds.), Lévy Processes: Theory and Applications. Birkhäuser, Boston, pp. 319–337.
Eberlein, E., Jacod, J., 1997. On the range of option prices. Finance and Stochastics 1, 131–140.
Eberlein, E., Keller, U., 1995. Hyperbolic distributions in finance. Bernoulli 1, 281–299.
Eberlein, E., Keller, U., Prause, K., 1998. New insights into smile, mispricing and value at risk: The hyperbolic model. Journal of Business 71, 371–406.
Eberlein, E., Prause, K., 2002. The generalized hyperbolic model: Financial derivatives and risk measures. In: Geman, H., Madan, D., Pliska, S., Vorst, T. (Eds.), Mathematical Finance: Bachelier Congress 2000. Springer, Heidelberg, pp. 245–267.
Eberlein, E., Raible, S., 1999. Term structure models driven by general Lévy processes. Mathematical Finance 9, 31–53.
Eberlein, E., Raible, S., 2001. Some analytic facts on the generalized hyperbolic model. In: Casacuberta, C., et al. (Eds.), Proceedings of the Third European Meeting of Mathematicians, Vol. II. In: Progress in Mathematics, Vol. 202. Birkhäuser, Boston, pp. 367–378.
Elerian, O., Chib, S., Shephard, N., 2001. Likelihood inference for discretely observed non-linear diffusions. Econometrica 69, 959–993.
Epps, T.W., Epps, M.L., 1976. The stochastic dependence of security price changes and transaction volumes: Implications for the mixture-of-distributions hypothesis. Econometrica 44, 305–321.
Eraker, B., 2001. MCMC analysis of diffusion models with application to finance. Journal of Business and Economic Statistics 19, 177–191.
Gallant, A.R., Long, J.R., 1997. Estimating stochastic differential equations efficiently by minimum chi-square. Biometrika 84, 125–141.
Gallant, A.R., Tauchen, G., 1996. Which moments to match? Econometric Theory 12, 657–681.
Genon-Catalot, V., Jeantheau, T., Larédo, C., 1998. Limit theorems for discretely observed stochastic volatility models. Bernoulli 4, 283–303.
Genon-Catalot, V., Jeantheau, T., Larédo, C., 1999. Parameter estimation for discretely observed stochastic volatility models. Bernoulli 5, 855–872.
Genon-Catalot, V., Jeantheau, T., Larédo, C., 2000. Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli 6, 1051–1079.
Ghysels, E., Harvey, A.C., Renault, E., 1996. Stochastic volatility. In: Rao, C.R., Maddala, G.S. (Eds.), Statistical Methods in Finance. North-Holland, Amsterdam, pp. 119–191.
Gloter, A., 1999. Parameter estimation for a hidden diffusion. Preprint 20/99. University of Marne-la-Vallée.
Good, I.J., 1953. The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–264.
Gouriéroux, C., Monfort, A., Renault, E., 1993. Indirect inference. Journal of Applied Econometrics 8, S85–S118.
Halgreen, C., 1979. Self-decomposability of generalized inverse Gaussian and hyperbolic distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 47, 13–17.
Hansen, L.P., Scheinkman, J.A., 1995. Back to the future: Generating moment implications for continuous-time Markov processes. Econometrica 63, 767–804.
Harvey, A.C., Ruiz, E., Shephard, N., 1994. Multivariate stochastic variance models. Review of Economic Studies 61, 247–264.
Heston, S.L., 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6, 327–343.
Honoré, P., 1997. Maximum likelihood estimation of non-linear continuous-time term-structure models. Working Paper 1997-7. Department of Finance, Aarhus School of Business.
Hull, J., White, A., 1988. An analysis of the bias in option pricing caused by a stochastic volatility. Advances in Futures and Options Research 3, 29–61.
Hurst, S.R., 1997. On the stochastic dynamics of stock market volatility. Ph.D. Thesis. Australian National University.
Hurst, S.R., Platen, E., Rachev, S.T., 1997. Subordinated market index models: A comparison. Financial Engineering and the Japanese Markets 4, 97–124.
Jacobsen, M., 2001. Discretely observed diffusions; classes of estimating functions and small $\Delta$-optimality. Scandinavian Journal of Statistics 28 (1), 123–150.
Jacod, J., Shiryaev, A.N., 1987. Limit Theorems for Stochastic Processes. Springer, New York.
Jensen, J.L., Pedersen, J., 1999. Ornstein–Uhlenbeck type processes with non-normal distribution. Journal of Applied Probability 36, 389–402.
Jiang, W., 2000. Some simulation-based models towards mathematical finance. Ph.D. Thesis. University of Aarhus.
Jørgensen, B., 1982. Statistical Properties of the Generalized Inverse Gaussian Distribution. Lecture Notes in Statistics, Vol. 9. Springer-Verlag, New York.
Jurek, Z.J., Mason, J.D., 1993. Operator-Limit Distributions in Probability Theory. Wiley, New York.
Jurek, Z.J., Vervaat, W., 1983. An integral representation for self-decomposable Banach space valued random variables. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 62, 247–262.
Keller, U., 1997. Realistic modelling of financial derivatives. Ph.D. Thesis. Universität Freiburg.
Kessler, M., 2000. Simple and explicit estimating functions for a discretely observed diffusion process. Scandinavian Journal of Statistics 27, 65–82.
Kessler, M., Sørensen, M., 1999. Estimating equations based on eigenfunctions for a discretely observed diffusion process. Bernoulli 5, 299–314.
Kim, S., Shephard, N., Chib, S., 1998. Stochastic volatility: Likelihood inference and comparison with ARCH models. Review of Economic Studies 65, 361–393.
Küchler, U., Neumann, K., Sørensen, M., Streller, A., 1999. Stock returns and hyperbolic distributions. Mathematical and Computer Modelling 29, 1–15.
Madan, D.B., Chang, E.C., 1996. Volatility smiles, skewness premia and risk metrics: Application of a four parameter closed form generalization of geometric Brownian motion to the pricing of options. Presented at a Conference on Mathematical Finance, University of Aarhus, Denmark.
Madan, D.B., Milne, F., 1991. Option pricing with V.G. martingale components. Mathematical Finance 1, 39–55.
Madan, D.B., Seneta, E., 1990. The variance gamma (V.G.) model for share market returns. Journal of Business 63, 511–524.
Nelson, D.B., 1991. Conditional heteroskedasticity in asset pricing: A new approach. Econometrica 59, 347–370.
Pedersen, A.R., 1995. A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics 22, 55–71.
Poulsen, R., 1999. Approximate maximum likelihood estimation of discretely observed diffusion processes. Working Paper 29. Centre for Analytical Finance, Aarhus.
Praetz, P.D., 1972. The distribution of share prices. Journal of Business 45, 49–55.
Prause, K., 1999. The generalized hyperbolic model: Estimation, financial derivatives, and risk measure. Ph.D. Thesis. Universität Freiburg.
Protter, P., 1990. Stochastic Integration and Differential Equations: A New Approach. Springer, New York.
Raible, S., 2000. Lévy processes in finance: Theory, numerics, and empirical facts. Ph.D. Thesis. Universität Freiburg.
Rosiński, J., 1991. On a class of infinitely divisible processes represented as mixtures of Gaussian processes. In: Cambanis, S., Samorodnitsky, G., Taqqu, M.S. (Eds.), Stable Processes and Related Topics. Birkhäuser, Boston, pp. 27–41.
Rosiński, J., 2001. Series representations of Lévy processes from the perspective of point processes. In: Barndorff-Nielsen, O.E., Mikosch, T., Resnick, S. (Eds.), Lévy Processes: Theory and Applications. Birkhäuser, Boston.
Rydberg, T.H., 1997. The normal inverse Gaussian Lévy process: Simulation and approximation. Communications in Statistics. Stochastic Models 13, 887–910.
Rydberg, T.H., 1999. Generalized hyperbolic diffusion processes with applications in finance. Mathematical Finance 9, 183–201.
Rydberg, T.H., 2000. Realistic statistical modelling of financial data. International Statistical Review 68, 233–258.
Sato, K., 1999. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press.
Sato, K., Watanabe, T., Yamazato, M., 1994. Recurrence conditions for multivariate processes of Ornstein–Uhlenbeck type. Journal of the Mathematical Society of Japan 46, 245–265.
Sato, K., Yamazato, M., 1982. Stationary processes of the Ornstein–Uhlenbeck type. In: Ito, K., Prohorov, J.V. (Eds.), Probability Theory and Mathematical Statistics. In: Lecture Notes in Mathematics, Vol. 1021. Springer-Verlag, Berlin.
Sato, K., Yamazato, M., 1984. Operator-selfdecomposable distributions as limit distributions of processes of the Ornstein–Uhlenbeck type. Stochastic Processes and their Applications 17, 73–100.
Seshadri, V., 1997. Halphen's laws. In: Kotz, S., Read, C.B., Banks, D.L. (Eds.), Encyclopedia of Statistical Sciences, Update Volume 1. Wiley, New York, pp. 302–306.
Shephard, N., 1996. Statistical aspects of ARCH and stochastic volatility. In: Cox, D.R., Hinkley, D.V., Barndorff-Nielsen, O.E. (Eds.), Time Series Models in Econometrics, Finance and other Fields. Chapman and Hall, London, pp. 1–67.
Sichel, H.S., 1973. Statistical evaluation of diamondiferous deposits. Journal of South African Institut Min. Metall. 76, 235–243.
Sørensen, H., 2000. Inference for diffusion processes and stochastic volatility models. Ph.D. Thesis. Department of Statistics and Operations Research, University of Copenhagen.
Sørensen, H., 2001. Simulated likelihood approximations for stochastic volatility models. Preprint 1. Department of Theoretical Statistics, University of Copenhagen. Forthcoming in Scandinavian Journal of Statistics.
Sørensen, M., 1997a. Estimating functions for discretely observed diffusions: A review. In: Basawa, I.V., Godambe, V.P., Taylor, R.L. (Eds.), Selected Proceedings of the Symposium on Estimating Functions. In: IMS Lecture Notes, Monograph Series, Vol. 32. Institute of Mathematical Statistics, Hayward.
Sørensen, M., 1997b. Exponential family inference for diffusion models. Research Report No. 383. Department of Theoretical Statistics, Institute of Mathematics, University of Aarhus.
Sørensen, M., 2000. Prediction-based estimating functions. Econometrics Journal 3, 123–147.
Wolfe, S.J., 1982. On a continuous analogue of the stochastic difference equation $X_n = \rho X_{n-1} + B_n$. Stochastic Processes and their Applications 12, 301–312.

Chapter 7

STABLE MODELING OF MARKET AND CREDIT VALUE AT RISK

SVETLOZAR T. RACHEV
Department of Statistics and Applied Probability, University of California, Santa Barbara, USA
Institute of Statistics and Mathematical Economics, University of Karlsruhe, Germany
e-mail: rachev@lsoe-4.wiwi.uni-karlsruhe.de

EDUARDO S. SCHWARTZ
Anderson School of Management, University of California, Los Angeles, USA

IRINA KHINDANOVA
Colorado School of Mines

Contents

Abstract
1. Introduction
2. "Normal" modeling of VaR
2.1. VaR for a single asset
2.2. Portfolio VaR
3. A finance-oriented description of stable distributions
3.1. Parameters and properties of stable distributions
3.2. Estimation of parameters of stable distributions
3.2.1. Tail estimation
3.2.2. Entire-distribution modeling
3.2.3. Tail estimation: Fast Fourier transform method
4. VaR estimates for stable distributed financial returns
4.1. In-sample evaluation of VaR estimates
4.2. Forecast-evaluation of VaR estimates
5. Stable modeling and risk assessment for individual credit returns
6. Portfolio credit risk for independent credit returns
7. Stable modeling of portfolio risk for symmetric dependent credit returns
8. Stable modeling of portfolio risk for skewed dependent credit returns
9. One-factor model of portfolio credit risk
10. Credit risk evaluation for portfolio assets
11. Portfolio credit risk
11.1. Independent credit risks
11.2. Symmetric dependent credit risks
11.3. Skewed dependent credit risks
12. Conclusions
Appendix A. Stable modeling of credit returns in figures
Appendix B. Tables
Appendix C. OLS credit risk evaluation for portfolio assets in figures
Appendix D. GARCH credit risk evaluation for portfolio assets in figures
Acknowledgments
References

Abstract

The chapter examines the use of stable Paretian distributions in modeling market and credit Value at Risk (VaR). The in-sample- and forecast-evaluations show that stable market VaR modeling outperforms the "normal" modeling for high values of the VaR confidence level. The chapter also develops a new technique for estimating correlation, constructs a new method for simulating portfolio values, and assesses portfolio VaR in various cases of credit instruments' distributions: independent, symmetric dependent, and skewed dependent.

1. Introduction

One of the most important tasks of financial institutions is evaluating the exposure to market and credit risks. Market risks arise from variations in prices of equities, commodities, exchange rates, and interest rates. Credit risks refer to potential losses that might occur because of a change in the counterparty's credit quality such as a rating migration or a default.
The dependence on market and credit risks can be measured by changes in the portfolio value, or profits and losses. A commonly used methodology for estimation of risks is the Value at Risk (VaR). In the text below, the market VaR denotes the VaR measurements associated with market risks, and the credit VaR denotes the VaR linked to credit risks. A VaR measure is the highest possible loss over a certain period of time at a given confidence level. For example, if the daily VaR for a given portfolio of assets is reported to be $2 million at the 95 percent confidence level, it means that, without abrupt changes in the market conditions, one-day losses will exceed $2 million 5 percent of the time. Formally, a VaR \(= \mathrm{VaR}_{t,\tau}\) is defined as the upper bound of the one-sided confidence interval:

\[ \Pr[\Delta P(\tau) < -\mathrm{VaR}] = 1 - c, \tag{1} \]

where \(c\) is the confidence level and \(\Delta P(\tau) = \Delta P_t(\tau)\) is the relative change (return) in the portfolio value over the time horizon \(\tau\): \(\Delta P_t(\tau) = P(t+\tau) - P(t)\), where \(P(t) = \log S(t)\), \(S(t)\) is the portfolio value at \(t\), the time period is \([t, T]\) with \(T - t = \tau\), and \(t\) is the current time.

The essence of the VaR computations is estimation of low quantiles in the portfolio return distributions. The VaR techniques suggest different ways of constructing the portfolio return distributions. The traditional methods are the parametric method, historical simulation, Monte Carlo simulation, and stress-testing. One of the parametric approaches, the variance-covariance method, is based on the normal assumption for the distribution of financial returns. However, financial data often violate the normality assumption: the empirical observations exhibit "fat" tails and excess kurtosis. The historical method does not impose distributional assumptions, but it is not reliable in estimating low quantiles of \(\Delta P\) when there are few observations in the tails. The performance of the Monte Carlo method depends on the quality of the distributional assumptions on the underlying risk factors.
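As a minimal sketch of Equation (1), assuming only a finite sample of portfolio returns is available, the VaR at confidence level c can be read off as the negative of the empirical (1 - c) quantile. The function name empirical_var and the order-statistic convention used for the sample quantile are illustrative assumptions, not part of the chapter.

```python
import math


def empirical_var(returns, c):
    """Empirical VaR: the negative of the (1 - c) sample quantile of returns.

    Assumption: the quantile is taken as the order statistic with rank
    ceil((1 - c) * n); other quantile conventions give slightly different values.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("confidence level c must lie strictly between 0 and 1")
    ordered = sorted(returns)
    # Rank (1-based) of the order statistic used as the (1 - c) quantile.
    rank = max(1, math.ceil((1.0 - c) * len(ordered)))
    return -ordered[rank - 1]


# Hypothetical one-period returns, for illustration only.
sample_returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.0, 0.01, 0.01, 0.02, 0.03]
var_85 = empirical_var(sample_returns, 0.85)
```

With so few observations the estimate rests on one or two tail points, which is the weakness of the historical method noted above.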
A well-known methodology for constructing credit portfolio return distributions is the CreditMetrics™ product of J.P. Morgan.[1] It is based on the rating transition model of Jarrow, Lando and Turnbull (1997) and on the assumption that joint credit quality changes are driven by joint movements of firms' asset values.

[1] See Gupton, Finger and Bhatia (1997).

The existing methods do not provide satisfactory evaluation of VaR. The main drawback is inadequate approximation of the distributional forms of portfolio returns. Given the nature (heavy tails, excess kurtosis, and skewness[2]) of empirical financial data, the stable Paretian distributions seem to be the most appropriate distributional models.[3]

The chapter examines the use of stable Paretian distributions in modeling market and credit VaR. The stable distributions are described by four parameters: tail index, skewness, location, and scale. Modeling with such parameters captures fat tails and skewness of distributions. Empirical analysis reported here confirms that stable modeling indeed captures heavy-tailedness and asymmetry of financial returns and, therefore, produces more accurate risk estimates. The in-sample and forecast evaluations show that stable market VaR modeling outperforms the "normal" modeling for high values of the VaR confidence level.

The stable distributions possess the additivity property: a linear combination of independent stable (or jointly stable) random variables with stability index \(\alpha\) is again a stable random variable with the same \(\alpha\). The additivity property provides analytic formulas for the parameters of portfolio returns. In the case of independent instruments, the formulas are simple and can be used for estimating portfolio risk without simulations. An analyst can employ "independent" risk measurements as lower bounds of portfolio risk estimates. A symmetric stable random variable can be interpreted as a transformation of a normal random variable.
Based on this property, a new technique is developed here for estimating correlation. A stable random variable can be decomposed into the "symmetry" and "skewness" parts. Building on this feature, we construct a new method for simulating a distribution of portfolio values. We apply this method to portfolio risk evaluation in various cases of credit instruments' distributions: independent, symmetric dependent, and skewed dependent.

The remainder of the chapter is organized as follows. In Section 2 we discuss computation of VaR using the variance-covariance method, which is based on the normality assumption for the distribution of financial returns. Section 3 provides a finance-oriented description of stable distributions. In Section 4 we estimate the market VaR measurements employing normal and stable modeling of financial returns.[4] Section 5 investigates stable modeling of credit returns and discusses risk assessment for individual credit instruments. Section 6 considers portfolio risk estimation for independent portfolio assets and derives lower bounds for risk measurements. Sections 7 and 8 present, respectively, evaluation of portfolio risk in two cases of dependent portfolio instruments: symmetric and skewed. Section 9 describes the main framework of the one-factor model. Section 10 discusses credit risk evaluation for portfolio assets. Section 11 explains portfolio credit risk estimation. Section 12 states conclusions.

[2] Skewness is most pronounced in distributions of value changes of credit instruments. For references, see Gupton, Finger and Bhatia (1997), Federal Reserve System Task Force on Internal Credit Risk Models (1998), Basle Committee on Banking Supervision (1999).

[3] Cheng and Rachev (1995), Chobanov et al. (1996), Fama (1965), Gamrowski and Rachev (1994, 1995a, b), Mandelbrot (1962, 1963a, b, 1967), McCulloch (1996), Mittnik and Rachev (1991, 1993a, b), Mittnik, Rachev and Chenyao (1996), Mittnik, Rachev and Paolella (1998).
[4] See also Gamrowski and Rachev (1996).

2. "Normal" modeling of VaR

From the definition of VaR \(= \mathrm{VaR}_{t,\tau}\) in Equation (1), the VaR values are obtained from the probability distribution of portfolio value returns:

\[ 1 - c = F_{\Delta P}(-\mathrm{VaR}) = \int_{-\infty}^{-\mathrm{VaR}} f_{\Delta P}(x)\,dx, \]

where \(F_{\Delta P}(x) = \Pr(\Delta P \le x)\) is the cumulative distribution function (cdf) of portfolio returns in one period, and \(f_{\Delta P}(x)\) is the probability density function (pdf) of \(\Delta P\).[5] If the changes in the portfolio value are characterized by a parametric distribution, VaR can be computed using the distribution parameters. In this section we review "normal" modeling, a parametric method based on the normal distribution assumption. It is often called the variance-covariance method. We describe applications of the methodology for computing the VaR of a single asset and the portfolio VaR.

2.1. VaR for a single asset

Assume that a portfolio consists of a single asset, which depends only on one risk factor. Traditionally, in this setting, the distribution of asset returns is assumed to be the univariate normal distribution, identified by two parameters: the mean, \(\mu\), and the standard deviation, \(\sigma\). The problem of calculating VaR is then reduced to finding the \((1 - c)\)-th percentile of the standard normal distribution, \(z_{1-c}\):

\[ 1 - c = \int_{-\infty}^{X^*} g(x)\,dx = \int_{-\infty}^{z_{1-c}} \varphi(z)\,dz = N(z_{1-c}), \quad \text{with } X^* = z_{1-c}\,\sigma + \mu, \]

where \(\varphi(z)\) is the standard normal density function, \(N(z)\) is the cumulative normal distribution function, \(X\) is the portfolio return, \(g(x)\) is the normal density function for returns with mean \(\mu\) and standard deviation \(\sigma\), and \(X^*\) is the lowest return at a given confidence level \(c\).

In many applications investors assume that the expected return \(\mu\) equals 0. This assumption is based on the conjecture that the magnitude of \(\mu\) is substantially smaller than the magnitude of the standard deviation \(\sigma\) and, therefore, can be ignored. Then we have \(X^* = z_{1-c}\,\sigma\) and, therefore,

\[ \mathrm{VaR} = -Y_0 X^* = -Y_0\, z_{1-c}\,\sigma, \]

where \(Y_0\) is the initial portfolio value.
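The single-asset formula above can be sketched in a few lines. This is an illustrative sketch only: the function name normal_var and the numerical inputs are assumptions, and the optional nonzero mean simply uses the lowest return X* = z_{1-c} sigma + mu stated above.

```python
from statistics import NormalDist


def normal_var(y0, sigma, c, mu=0.0):
    """Single-asset VaR under the normal assumption: VaR = -Y0 * X*.

    X* = z_{1-c} * sigma + mu is the lowest return at confidence level c;
    with mu = 0 this reduces to VaR = -Y0 * z_{1-c} * sigma.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("confidence level c must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - c)  # z_{1-c}; negative when c > 0.5
    lowest_return = z * sigma + mu
    return -y0 * lowest_return


# Hypothetical inputs: a 1,000,000 position with 1% daily volatility.
var_95 = normal_var(1_000_000, 0.01, 0.95)
var_99 = normal_var(1_000_000, 0.01, 0.99)
```

Because z_{1-c} is negative for confidence levels above 50 percent, the resulting VaR is a positive loss figure that grows with both the volatility and the confidence level.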
[5] If \(f_{\Delta P}(x)\) does not exist, then VaR can be obtained from the cdf \(F_{\Delta P}\).

2.2. Portfolio VaR

If a portfolio consists of many assets, the computation of VaR is performed in several steps. Portfolio assets are decomposed into "building blocks", which depend on a finite number of risk factors. Exposures of the portfolio securities are combined into risk categories. Then, the total portfolio risk is obtained by aggregating risk factors and their correlations. We denote:
* \(X_p\) is the portfolio return in one period,
* \(N\) is the number of assets in the portfolio,
* \(X_i\) is the \(i\)-th asset return in one period (\(\tau = 1\)), \(X_i = \Delta P_i(1) = P_i(1) - P_i(0)\), where \(P_i\) is the log-spot price of asset \(i\), \(i = 1, \ldots, N\). More generally, \(X_i\) can be the risk factor that enters linearly[6] in the portfolio return.
* \(w_i\) is the \(i\)-th asset's weight in the portfolio, \(i = 1, \ldots, N\).

The portfolio return is

\[ X_p = \sum_{i=1}^{N} w_i X_i. \]

In matrix notation, \(X_p = w^T X\), where \(w = (w_1, w_2, \ldots, w_N)^T\) and \(X = (X_1, X_2, \ldots, X_N)^T\). Then the portfolio variance is

\[ V(X_p) = w^T \Sigma w = \sum_{i=1}^{N} w_i^2 \sigma_{ii} + \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} w_i w_j \rho_{ij} \sigma_i \sigma_j, \]

where \(\sigma_{ii}\) is the variance of returns on the \(i\)-th asset, \(\sigma_i\) is the standard deviation of returns on the \(i\)-th asset, \(\rho_{ij}\) is the correlation between the returns on the \(i\)-th and the \(j\)-th assets, and \(\Sigma\) is the covariance matrix, \(\Sigma = [\sigma_{ij}]\), \(1 \le i \le N\), \(1 \le j \le N\).

If all asset returns are jointly normally distributed, the portfolio return, as a linear combination of normal variables, is also normally distributed. The portfolio VaR based on the normal distribution assumption is

\[ \mathrm{VaR} = -Y_0\, z_{1-c}\, \sigma(X_p), \]

where \(\sigma(X_p)\) is the portfolio standard deviation (the portfolio volatility), \(\sigma(X_p) = \sqrt{V(X_p)}\).

[6] If the risk factor does not enter linearly (as in the case of an option), then a linear approximation is used.

Thus, risk can be represented by a combination of linear exposures to normally distributed factors.
Hence, estimation of risk reduces to evaluation of the covariance matrix of portfolio risk factors (in the simplest case, individual asset returns). The simplicity of normal modeling explains its common use for VaR computation despite the fact that financial data often violate the normality assumption. We conjecture that stable distributions are more adequate distributional models. In the following sections we analyze the stable modeling of market and credit VaR. We begin the analysis by providing a finance-oriented description of stable distributions.

3. A finance-oriented description of stable distributions

In this part we describe parameters and some finance-oriented properties of stable distributions. We also examine methods of estimating parameters of stable laws.

3.1. Parameters and properties of stable distributions

A random variable \(R\) is said to be stable[7] if for any \(a > 0\) and \(b > 0\) there exist constants \(c > 0\) and \(d \in \mathbb{R}\) such that

\[ aR_1 + bR_2 \stackrel{d}{=} cR + d, \]

where \(R_1\) and \(R_2\) are independent copies of \(R\) and \(\stackrel{d}{=}\) denotes equality in distribution. In general, stable distributions do not have closed form expressions for the density and distribution functions. Stable random variables \(R\) are commonly described by their characteristic functions:

\[ \Phi_R(\theta) = E\exp(iR\theta) = \exp\left\{ -\sigma^\alpha |\theta|^\alpha \left( 1 - i\beta\,\mathrm{sign}(\theta)\tan\frac{\pi\alpha}{2} \right) + i\mu\theta \right\}, \quad \text{if } \alpha \neq 1, \]

\[ \Phi_R(\theta) = E\exp(iR\theta) = \exp\left\{ -\sigma |\theta| \left( 1 + i\beta\,\frac{2}{\pi}\,\mathrm{sign}(\theta)\ln|\theta| \right) + i\mu\theta \right\}, \quad \text{if } \alpha = 1, \]

where \(\alpha\) is the index of stability, \(0 < \alpha \le 2\), \(\beta\) is the skewness parameter, \(-1 \le \beta \le 1\), \(\sigma\) is the scale parameter, \(\sigma \ge 0\), and \(\mu\) is the location parameter, \(\mu \in \mathbb{R}\). To indicate the dependence of a stable random variable \(R\) on its parameters, we write \(R \sim S_\alpha(\sigma, \beta, \mu)\). If the index of stability \(\alpha = 2\), then the stable distribution reduces to the Gaussian distribution.

[7] Often \(R\) is called \(\alpha\)-stable or Pareto stable or Pareto-Lévy stable (for \(\alpha < 2\)).
In empirical studies, the modeling of financial return data is done typically with stable distributions having \(1 < \alpha < 2\).[8] Stable distributions are unimodal, and the smaller \(\alpha\) is, the stronger the leptokurtic feature of the distribution (the peak of the density becomes higher and the tails are heavier). Thus, the index of stability can be interpreted as a measure of kurtosis. When \(\alpha > 1\), the location parameter \(\mu\) measures the mean of the distribution. If the skewness parameter \(\beta = 0\), the distribution of \(R\) is symmetric and the characteristic function is

\[ \Phi_R(\theta) = E\exp(iR\theta) = \exp\left\{ -\sigma^\alpha |\theta|^\alpha + i\mu\theta \right\}. \]

If \(\beta > 0\), the distribution is skewed to the right. If \(\beta < 0\), the distribution is skewed to the left. Larger magnitudes of \(\beta\) indicate stronger skewness. If \(\beta = 0\) and \(\mu = 0\), then the stable random variable \(R\) is called symmetric \(\alpha\)-stable (s\(\alpha\)s). The scale parameter \(\sigma\) (the volatility) allows any stable random variable \(R\) to be expressed as \(R = \sigma R_0\), where \(R_0\) has a unit scale parameter and the same index of stability and skewness parameter as \(R\). The scale parameter generalizes the definition of standard deviation. The stable analog of variance is the variation: \(v_\alpha = \sigma^\alpha\).

In VaR estimations we are interested in the behavior of the distributions in the tails. The tails of the stable (non-Gaussian) distributions have a power decay and are characterized by the following properties:

\[ \lim_{\lambda \to +\infty} \lambda^\alpha P(R > \lambda) = k_\alpha \frac{1+\beta}{2}\,\sigma^\alpha \quad \text{and} \quad \lim_{\lambda \to +\infty} \lambda^\alpha P(R < -\lambda) = k_\alpha \frac{1-\beta}{2}\,\sigma^\alpha, \]

where

\[ k_\alpha = \frac{1-\alpha}{\Gamma(2-\alpha)\cos(\pi\alpha/2)}, \ \text{if } \alpha \neq 1, \qquad k_\alpha = \frac{2}{\pi}, \ \text{if } \alpha = 1.^{[9]} \]

The \(p\)-th absolute moment, \(E|R|^p = \int_0^\infty P(|R|^p > x)\,dx\), is
* finite if \(p < \alpha\) or \(\alpha = 2\), and
* infinite otherwise.

Thus, the second moment of any non-Gaussian stable distribution is infinite.

[8] The financial returns modeled with \(\alpha\)-stable laws exhibit finite means but infinite variances.

[9] Note that, in contrast to the normal case, the tails of the non-Gaussian (Pareto) stable distributions are much fatter, which will be an important issue in estimating VaR.
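To make the tail properties concrete, the following sketch evaluates the constant k_alpha and the resulting power-law approximation of the tail probability. It is an illustration under stated assumptions: the function names are hypothetical, and the approximation is only the leading asymptotic term, so it is meaningful for large thresholds and alpha < 2. The Cauchy law (alpha = 1, beta = 0, sigma = 1), whose tail is known in closed form, serves as a check.

```python
import math


def tail_constant(alpha):
    """Constant k_alpha in the stable tail law, for 0 < alpha < 2."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 0 and 2")
    if alpha == 1.0:
        return 2.0 / math.pi
    return (1.0 - alpha) / (math.gamma(2.0 - alpha) * math.cos(math.pi * alpha / 2.0))


def stable_tail_approx(lam, alpha, beta, sigma, upper=True):
    """Leading-order approximation of P(R > lam) or P(R < -lam) for large lam."""
    skew_weight = (1.0 + beta) / 2.0 if upper else (1.0 - beta) / 2.0
    return tail_constant(alpha) * skew_weight * sigma ** alpha * lam ** (-alpha)


# Check against the standard Cauchy law, whose upper tail is 1/2 - arctan(lam)/pi.
lam = 100.0
approx = stable_tail_approx(lam, alpha=1.0, beta=0.0, sigma=1.0)
exact = 0.5 - math.atan(lam) / math.pi
```

The two values agree closely at this threshold, while a normal tail at a comparable number of scale units would be smaller by many orders of magnitude.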
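Stable distributions possess the additivity property: a linear combination of independent stable random variables with stability index \(\alpha\) is again a stable random variable with the same \(\alpha\).[10]

Example. If \(R_1, R_2, \ldots, R_n\) are independent stable random variables with stability index \(\alpha\), \(R_i \sim S_\alpha(\sigma_i, \beta_i, \mu_i)\), then \(R = \sum_{i=1}^{n} w_i R_i\) is a stable random variable with the same \(\alpha\) and parameters:

(a) if \(\alpha \neq 1\),

\[ \sigma = \left( (|w_1|\sigma_1)^\alpha + \cdots + (|w_n|\sigma_n)^\alpha \right)^{1/\alpha}, \]

\[ \beta = \frac{\mathrm{sign}(w_1)\beta_1(|w_1|\sigma_1)^\alpha + \cdots + \mathrm{sign}(w_n)\beta_n(|w_n|\sigma_n)^\alpha}{(|w_1|\sigma_1)^\alpha + \cdots + (|w_n|\sigma_n)^\alpha}, \]

\[ \mu = w_1\mu_1 + \cdots + w_n\mu_n; \]

(b) if \(\alpha = 1\),

\[ \sigma = |w_1|\sigma_1 + \cdots + |w_n|\sigma_n, \]

\[ \beta = \frac{\mathrm{sign}(w_1)\beta_1|w_1|\sigma_1 + \cdots + \mathrm{sign}(w_n)\beta_n|w_n|\sigma_n}{|w_1|\sigma_1 + \cdots + |w_n|\sigma_n}, \]

\[ \mu = w_1\mu_1 + \cdots + w_n\mu_n - \frac{2}{\pi}\left( w_1 \ln|w_1|\,\sigma_1\beta_1 + \cdots + w_n \ln|w_n|\,\sigma_n\beta_n \right). \]

Since the Pareto-stable distributions have infinite variances, one cannot estimate risk by variance and dependence by correlations. We shall introduce notions similar to variance and covariance for stable laws. These notions are based on the multivariate assumptions of stable distributions.

A random vector \(R\) of dimension \(d\) is stable if for any \(a > 0\) and \(b > 0\) there exist \(c > 0\) and a \(d\)-dimensional vector \(D\) such that

\[ aR_1 + bR_2 \stackrel{d}{=} cR + D, \]

where \(R_1\) and \(R_2\) are independent copies of \(R\). If a random vector is stable with \(\alpha > 1\), then all components of the vector are stable with the same index of stability \(\alpha\), and any linear combination (for example, portfolio returns) is again stable.[11]

[10] This property is shared only by normal and stable laws, and is the main advantage of the use of stable laws for portfolio returns.

[11] We shall model the dependence structure of the vector of returns \((R_1, \ldots, R_d)\) of a portfolio by assuming that \((R_1, \ldots, R_d)\) is an \(\alpha\)-stable vector.

The characteristic function of a \(d\)-dimensional stable vector is given by:

(a) if \(\alpha \neq 1\),

\[ \Phi_R(\theta) = \Phi_R(\theta_1, \theta_2, \ldots, \theta_d) = E\exp\{ i\theta^T R \} = \exp\left\{ -\int_{S_d} |\theta^T s|^\alpha \left( 1 - i\,\mathrm{sign}(\theta^T s)\tan\frac{\pi\alpha}{2} \right) \Gamma(ds) + i\theta^T\mu \right\}, \]

(b) if \(\alpha = 1\),

\[ \Phi_R(\theta) = \exp\left\{ -\int_{S_d} |\theta^T s| \left( 1 + i\,\frac{2}{\pi}\,\mathrm{sign}(\theta^T s)\ln|\theta^T s| \right) \Gamma(ds) + i\theta^T\mu \right\}, \]

where \(\Gamma\) is a bounded nonnegative measure on the unit sphere \(S_d\), \(s\) is the integrand unit vector (\(s \in S_d\)), and \(\mu\) is the shift vector.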
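Returning to the univariate Example above, case (a) (alpha different from 1) can be sketched as follows. The function name combine_stable and the sample inputs are illustrative assumptions; the sketch covers only independent components and does not handle alpha = 1, which needs the extra logarithmic term in the location parameter.

```python
import math


def combine_stable(alpha, weights, sigmas, betas, mus):
    """Parameters (sigma, beta, mu) of sum_i w_i R_i for independent
    R_i ~ S_alpha(sigma_i, beta_i, mu_i), case alpha != 1."""
    if alpha == 1.0:
        raise ValueError("alpha = 1 requires the separate formulas of case (b)")
    # (|w_i| sigma_i)^alpha for each component.
    scaled = [(abs(w) * s) ** alpha for w, s in zip(weights, sigmas)]
    total = sum(scaled)
    if total == 0.0:
        raise ValueError("at least one component must have a nonzero scaled weight")
    sigma = total ** (1.0 / alpha)
    beta = sum(math.copysign(1.0, w) * b * t
               for w, b, t in zip(weights, betas, scaled)) / total
    mu = sum(w * m for w, m in zip(weights, mus))
    return sigma, beta, mu


# Hypothetical equally weighted two-asset portfolio with alpha = 1.5.
sigma_p, beta_p, mu_p = combine_stable(
    1.5, weights=[0.5, 0.5], sigmas=[1.0, 1.0], betas=[0.0, 0.0], mus=[0.1, 0.3]
)
```

For alpha = 2 the scale formula reduces to the familiar square-root rule for independent normal variables, whereas for alpha < 2 the scale of the equally weighted portfolio shrinks more slowly, so diversification is less effective.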
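The measure \(\Gamma\) is named a spectral measure. Let \(H\) be the distribution function of \(\Gamma\). Then, the characteristic function in polar coordinates is as follows:

(a) if \(\alpha \neq 1\),

\[ \Phi_R(\theta) = \exp\left\{ -|\theta|^\alpha \int_0^\pi \int_0^\pi \cdots \int_0^{2\pi} |\cos(\theta,\varphi)|^\alpha \left( 1 - i\,\mathrm{sgn}(\cos(\theta,\varphi))\tan\frac{\pi\alpha}{2} \right) dH(\varphi) + i\theta^T\mu \right\}, \]

(b) if \(\alpha = 1\),

\[ \Phi_R(\theta) = \exp\left\{ -|\theta| \int_0^\pi \int_0^\pi \cdots \int_0^{2\pi} |\cos(\theta,\varphi)| \left( 1 + i\,\frac{2}{\pi}\,\mathrm{sgn}(\cos(\theta,\varphi))\ln\bigl(|\theta|\,|\cos(\theta,\varphi)|\bigr) \right) dH(\varphi) + i\theta^T\mu \right\}, \]

where, for \(\theta\) given by its polar coordinates,

\[ \theta = \left( \rho\sin\theta_1 \cdots \sin\theta_{d-1},\ \rho\sin\theta_1 \cdots \sin\theta_{d-2}\cos\theta_{d-1},\ \rho\sin\theta_1 \cdots \sin\theta_{d-3}\cos\theta_{d-2},\ \ldots,\ \rho\cos\theta_1 \right), \quad \rho = |\theta|, \]

we denote

\[ \cos(\theta,\varphi) = \prod_{i=1}^{d-1} \sin\theta_i\sin\varphi_i + \prod_{i=1}^{d-2} \sin\theta_i\sin\varphi_i\,\cos\theta_{d-1}\cos\varphi_{d-1} + \cdots + \cos\theta_1\cos\varphi_1. \]

If \(\alpha > 1\), then \(\mu\) is the mean vector, \(\mu = ER\). The scale parameter of a linear combination of the components of a stable vector \(R\) satisfies the relation:

\[ \sigma^\alpha(w^T R) = \sigma^\alpha(w_1R_1 + \cdots + w_dR_d) = \int_{S_d} |w^T s|^\alpha\,\Gamma(ds). \]

Viewing \(R = (R_1, \ldots, R_d)\) as the vector of individual returns in a portfolio with weights \(w_1, \ldots, w_d\), \(\sigma(w^T R)\) will be the portfolio risk measure. As we defined above, \(v_\alpha = \sigma^\alpha\) is the variation, the stable equivalent of variance. Similarly to the traditional interpretation of covariance as an indicator of dependence, one can use the covariation to estimate the dependence between two s\(\alpha\)s distributions:

\[ [R_1; R_2]_\alpha = \frac{1}{\alpha} \left. \frac{\partial \sigma^\alpha(w_1R_1 + w_2R_2)}{\partial w_1} \right|_{w_1 = 0;\ w_2 = 1} = \int_{S_2} s_1 s_2^{\langle \alpha - 1 \rangle}\,\Gamma(ds), \]

where \((R_1, R_2)\) is an s\(\alpha\)s vector (\(1 < \alpha\)) and \(x^{\langle k \rangle} = |x|^k\,\mathrm{sgn}(x)\) (signed power). The matrix of covariations \([R_i; R_j]_\alpha\), \(1 \le i \le d\), \(1 \le j \le d\), determines the dependence structure among the individual returns in the portfolio.

3.2. Estimation of parameters of stable distributions[12]

We shall examine the methods of estimating the stable parameters and their applicability in VaR computations, where the primary concern is the tail behavior of distributions. It has been proposed that it is more useful to evaluate directly the tail index (the index of stability) instead of fitting the whole distribution. The latter method is claimed to negatively affect the estimation of the tail behavior by its use of "center" observations.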
We shall describe both approaches: tail estimation and entire-distribution modeling. We suggest a method that combines the two techniques: it is designed for fitting the overall distribution with greater emphasis on the tails.

3.2.1. Tail estimation

Tail estimators for the index of stability \(\alpha\) are based on the asymptotic Pareto tail behavior of stable distributions.[13] We shall consider the following estimators of tail thickness: the Hill, the Pickands, and the modified unconditional Pickands.[14]

[12] For additional references on estimation of the four parameters of stable univariate laws, see Chobanov et al. (1996), Gamrowski and Rachev (1994, 1995a, b), Klebanov, Melamed and Rachev (1994), Kozubowski and Rachev (1994), McCulloch (1996), Mittnik and Rachev (1991), Rachev and SenGupta (1993). For the multivariate case, that is, estimation of the spectral measure, the index of stability, the covariation, and tests for dependence of stable distributed returns, see Cheng and Rachev (1995), Gamrowski and Rachev (1994, 1995a, b, 1996), Heathcote, Cheng and Rachev (1995), Mittnik and Rachev (b), Rachev and Xin (1993).

[13] See Section 3.1.

[14] For details on the Hill, Pickands, and the modified unconditional Pickands estimators, see Mittnik, Paolella and Rachev (1998c) and references therein.

The Hill estimator[15] is described by

\[ \hat\alpha_{\mathrm{Hill}} = \frac{1}{\frac{1}{k}\sum_{j=1}^{k} \ln(X_{n+1-j:n}) - \ln X_{n-k:n}}, \]

where \(X_{j:n}\) denotes the \(j\)-th order statistic of the sample \(X_1, \ldots, X_n\);[16] the integer \(k\) indicates where the tail area "starts". The selection of \(k\) is complicated by a tradeoff: it must be adequately small so that \(X_{n-k:n}\) is in the tail of the distribution; but if it is too small, the estimator is not accurate. The disadvantage of the estimator is the need to specify the order statistic \(X_{n-k:n}\) explicitly. It is proved that, for stable Paretian distributions, the Hill estimator is consistent and asymptotically normal.
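The Hill formula above translates directly into code. The sketch below is illustrative: the function name hill_estimator is an assumption, and the check uses an exact Pareto sample (for which the estimator behaves well) rather than stable data, where convergence is known to be much slower.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index from the k largest observations.

    Uses the order statistics X_{n+1-j:n}, j = 1..k, and the threshold X_{n-k:n}.
    """
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    ordered = sorted(sample)
    threshold = ordered[n - k - 1]  # X_{n-k:n} in 0-based indexing
    if threshold <= 0.0:
        raise ValueError("the threshold order statistic must be positive")
    # Mean log-excess of the k largest observations over the threshold.
    mean_log = sum(math.log(ordered[n - j]) for j in range(1, k + 1)) / k
    return 1.0 / (mean_log - math.log(threshold))


# Illustration on an exact Pareto sample with tail index 1.5 (an assumption
# made for the check; it is not one of the chapter's data sets).
rng = random.Random(0)
pareto_sample = [rng.paretovariate(1.5) for _ in range(20000)]
alpha_hat = hill_estimator(pareto_sample, k=2000)
```

Repeating the call over a range of k values produces the kind of "Hill plot" used to study the tradeoff in the choice of k.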
Mittnik, Paolella and Rachev (1998c) found that the small sample performance of \(\hat\alpha_{\mathrm{Hill}}\) does not resemble its asymptotic behavior, even for \(n > 10\,000\) (see Figure 1[17]). It is necessary to have enormous data series in order to obtain unbiased estimates of \(\alpha\); for example, with \(\alpha = 1.9\), reasonable estimates are produced only for \(n > 100\,000\) (see Figure 2[18]).

Alternatives to the Hill estimator are the Pickands and the modified unconditional Pickands estimators. The "original" Pickands estimator[19] takes the form

\[ \hat\alpha_{\mathrm{Pick}} = \frac{\ln 2}{\ln(X_{n-k+1:n} - X_{n-2k+1:n}) - \ln(X_{n-2k+1:n} - X_{n-4k+1:n})}, \quad 4k < n. \]

The Pickands estimator requires choice of the optimal \(k\), which depends on the true unknown \(\alpha\). Mittnik and Rachev (1996) proposed a new tail estimator named "the modified unconditional Pickands (MUP) estimator", \(\hat\alpha_{\mathrm{MUP}}\). An estimate of \(\alpha\) is obtained by applying the nonlinear least squares method to the following system:

\[ k_2 = X_2 X_1^{-1} k_1 + \varepsilon, \]

where

\[ X_1 = \begin{pmatrix} X_{n-k+1:n}^{-\alpha} & X_{n-k+1:n}^{-2\alpha} \\ X_{n-2k+1:n}^{-\alpha} & X_{n-2k+1:n}^{-2\alpha} \end{pmatrix}, \quad X_2 = \begin{pmatrix} X_{n-3k+1:n}^{-\alpha} & X_{n-3k+1:n}^{-2\alpha} \\ X_{n-4k+1:n}^{-\alpha} & X_{n-4k+1:n}^{-2\alpha} \end{pmatrix}, \]

\[ k_1 = \begin{pmatrix} k - 1 \\ 2k - 1 \end{pmatrix}, \quad \text{and} \quad k_2 = \begin{pmatrix} 3k - 1 \\ 4k - 1 \end{pmatrix}. \]

[15] Hill (1975).

[16] Given a sample of observations \(X_1, \ldots, X_n\), we rearrange the sample in increasing order \(X_{1:n} \le \cdots \le X_{n:n}\); then the \(j\)-th order statistic is equal to \(X_{j:n}\).

[17] In Figure 1, the true value of \(\alpha\) is 1.9, the sample size is \(n = 10\,000\); the x-axis shows values of \(k\) from 1 to \(n/2 = 5000\). Notice that the estimator \(\hat\alpha = \hat\alpha(k(n), n)\) is unbiased when \(k(n)/n \to 0\) as \(n \to \infty\). So, unbiasedness of the estimator requires very small values of \(k\). However, for a small value of \(k\), the variance of the estimator is large. A close look at the estimator \(\hat\alpha(k, n)\) suggests a value of \(\hat\alpha\) around 2.2, whereas \(\alpha = 1.9\).

[18] In Figure 2, the true \(\alpha\) is again 1.9, the sample size is \(n = 500\,000\), \(k = 1, \ldots, n/2 = 250\,000\). One can see that, for very small values of \(k\), \(\hat\alpha \approx 1.9\).

[19] Pickands (1975).

Fig. 1. Hill estimator for 10 000 standard stable observations with index \(\alpha = 1.9\).

Fig. 2. Hill estimator for 500 000 standard stable observations with index \(\alpha = 1.9\).

Mittnik, Paolella and Rachev (1998c) found that the optimal \(k\) for \(\hat\alpha_{\mathrm{MUP}}\) is far less dependent on \(\alpha\) than in the case of either the Hill or Pickands estimators. Studies demonstrated that \(\hat\alpha_{\mathrm{MUP}}\) is approximately unbiased for \(\alpha \in [1.00, 1.95)\) and nearly normally distributed for large sample sizes. The MUP estimator appears to be useful in empirical analysis.

3.2.2. Entire-distribution modeling

We shall describe the following methods of estimating stable parameters by fitting the entire distribution: quantile approaches, characteristic function (CF) techniques, and maximum likelihood (ML) methods.

Fama and Roll (1971) suggested the first quantile approach based on observed properties of stable quantiles. Their method was designed for evaluating parameters of symmetric stable distributions with index of stability \(\alpha > 1\). The estimators exhibited a small asymptotic bias. McCulloch (1986) offered a modified quantile technique, which provided consistent and asymptotically normal estimators of all four stable parameters, for \(\alpha \in [0.6, 2.0]\) and \(\beta \in [-1, 1]\). The estimators are derived using functions of five sample quantiles: the 5%, 25%, 50%, 75%, and 95% quantiles. Since the estimators do not consider observations in the tails (below the 5% quantile and above the 95% quantile), the McCulloch method does not appear to be suitable for estimating parameters in VaR modeling.

Characteristic function techniques are built on fitting the sample CF to the theoretical CF. Press (1972a, b) proposed several CF methods: the minimum distance, the minimum \(r\)-th mean distance, and the method of moments. Koutrouvelis (1980, 1981) developed the iterative regression procedure.
Kogon and Williams (1998) modified the Koutrouvelis method by eliminating iterations and limiting the estimation to a common frequency interval.[20] CF estimators are consistent and under certain conditions are asymptotically normal.[21]

Maximum likelihood methods for estimating stable parameters differ in the way the stable density is computed. DuMouchel (1971) evaluated the density by grouping data and applying the fast Fourier transform to "center" values and asymptotic expansions in the tails. Mittnik, Rachev and Paolella (1998) calculated the density at equally spaced grid points via a fast Fourier transform of the characteristic function, and at intermediate points by linear interpolation. Nolan (1998a) computed the density using numerical approximation of integrals in the Zolotarev integral formulas for the stable density.[22] DuMouchel (1973) proved that the ML estimator is consistent and asymptotically normal. In Section 4 we analyze the applicability of the ML method in VaR estimations.

[20] For additional references, see Arad (1980), Feuerverger and McDunnough (1981), Mittnik, Rachev and Paolella (1998), Paulson, Holcomb and Leitch (1975).

[21] Heathcote, Cheng and Rachev (1995).

[22] For additional references, see Mittnik et al. (1997).

3.2.3. Tail estimation: Fast Fourier transform method

Tail estimation using the Fourier Transform (FT) method is based on fitting the characteristic function in a neighborhood of the origin \(t = 0\). Here we use the classical tail estimate:

\[ P\left( X \le -\frac{1}{a} \right) \le P\left( |X| \ge \frac{1}{a} \right) \le \frac{K}{a} \int_0^a \left( 1 - \mathrm{Re}\{f_X(t)\} \right) dt, \quad \text{for all } a > 0, \]

where \(\mathrm{Re}\{f_X(t)\}\) is the real part of the characteristic function \(f_X(t)\) and the constant \(K = 1/(1 - \sin 1) < 7\). Precise estimation of the characteristic function guarantees accurate tail estimation, which leads to an adequate evaluation of VaR.
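As an illustrative numerical check of the classical tail estimate, the sketch below evaluates the right-hand side with a trapezoid rule and compares it with the exact tail of a standard Cauchy law, whose characteristic function is exp(-|t|). The function names, the choice of the Cauchy example, and the integration rule are assumptions made for illustration.

```python
import math

# Constant of the tail estimate, K = 1 / (1 - sin 1), roughly 6.31.
K = 1.0 / (1.0 - math.sin(1.0))


def cf_tail_bound(a, real_cf, steps=10_000):
    """Upper bound (K / a) * integral_0^a (1 - Re f_X(t)) dt on P(|X| >= 1/a),
    with the integral approximated by the trapezoid rule."""
    h = a / steps
    total = 0.5 * ((1.0 - real_cf(0.0)) + (1.0 - real_cf(a)))
    total += sum(1.0 - real_cf(i * h) for i in range(1, steps))
    return K / a * total * h


def cauchy_cf(t):
    """Characteristic function of the standard Cauchy law (real valued)."""
    return math.exp(-abs(t))


a = 0.1
bound = cf_tail_bound(a, cauchy_cf)
exact_tail = 1.0 - (2.0 / math.pi) * math.atan(1.0 / a)  # P(|X| >= 10)
```

The bound depends on the characteristic function only near the origin, on the interval (0, a], which is why the FT method concentrates on fitting it there.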
Suppose that the distribution of returns \(r\) is symmetric \(\alpha\)-stable,[23] that is, the characteristic function of \(r\) is given by

\[ f_r(t) = E e^{irt} = e^{i\mu t - |ct|^\alpha}. \]

If \(\alpha > 1\),[24] then, given observations \(r_1, \ldots, r_n\), we estimate \(\mu\) by the sample mean

\[ \hat\mu = \bar r = \frac{1}{n} \sum_{i=1}^{n} r_i. \]

For large values of \(n\), the characteristic function of the observations \(R_i = r_i - \bar r\) approaches \(f_R(t) = e^{-|ct|^\alpha}\). Consider the empirical characteristic function of the centered observations:

\[ \hat f_{R,n}(t) = \frac{1}{n} \sum_{k=1}^{n} e^{iR_k t}. \]

Because the theoretical characteristic function, \(f_R(t)\), is real and positive, we take

\[ \hat f_{R,n}(t) = \mathrm{Re}\left\{ \frac{1}{n} \sum_{k=1}^{n} e^{iR_k t} \right\} = \frac{1}{n} \sum_{k=1}^{n} \cos(R_k t). \]

Now the problem of estimating \(\alpha\) and \(c\) is reduced to determining \(\hat\alpha\) and \(\hat c\) such that

\[ \int_0^M \left| \hat f_{R,n}(t) - f_R(t, \hat\alpha, \hat c) \right| dt = \int_0^M \left| \frac{1}{n} \sum_{k=1}^{n} \cos(R_k t) - e^{-(\hat c t)^{\hat\alpha}} \right| dt \]

is minimal, where \(M\) is a sufficiently large value.

The FT method is carried out in the following steps:

Step 1. Given the asset returns \(r_1, \ldots, r_n\), compute the centered returns \(R_i = r_i - \bar r\), \(i = 1, \ldots, n\), where \(\bar r = \frac{1}{n} \sum_{i=1}^{n} r_i\).

Step 2. Construct the sample characteristic function

\[ \hat f(t_j) = \frac{1}{n} \sum_{k=1}^{n} \cos(R_k t_j), \]

where \(t_j = j\tau/\nu\), \(j = 1, \ldots, \nu\), \(\tau\) is the maximal value of \(t\), and \(\nu\) is the number of grid points on \((0, \tau]\).[25]

Step 3. Search for the best \(\hat\alpha\) and \(\hat c\) such that

\[ \sum_{j=1}^{\nu} \left| \frac{1}{n} \sum_{k=1}^{n} \cos(R_k t_j) - e^{-(\hat c t_j)^{\hat\alpha}} \right| \]

is minimal.

[23] Empirical evidence suggests that \(\beta\) does not play a significant role for VaR estimation.

[24] As we have already observed, in all financial return data, fitting an \(\alpha\)-stable model results in \(\alpha > 1\), which implies existence of the first moment.

4. VaR estimates for stable distributed financial returns

In this section we consider a stable VaR model, which assumes that the portfolio return distribution follows a stable law. We derive "stable" VaR estimates and analyze their properties applying in-sample and forecast evaluations. We use "normal" VaR measurements as benchmarks for investigating characteristics of "stable" VaR measurements.
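The three steps of the FT method described in Section 3.2.3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the plain grid search over candidate values, the uniform grid, and the grid sizes are all assumptions. The check uses simulated normal returns, for which the symmetric stable fit should recover an index near 2 and a scale near 1/sqrt(2) times the standard deviation, since exp(-t^2/2) = exp(-(ct)^2) with c = 1/sqrt(2).

```python
import math
import random


def empirical_cf(centered, t):
    """Real part of the sample characteristic function at t."""
    return sum(math.cos(r * t) for r in centered) / len(centered)


def ft_fit(returns, t_max, n_grid, alphas, scales):
    """Grid-search fit of exp(-(c t)^alpha) to the sample characteristic function."""
    # Step 1: center the returns by the sample mean.
    mean = sum(returns) / len(returns)
    centered = [r - mean for r in returns]
    # Step 2: sample characteristic function on the grid t_j = j * t_max / n_grid.
    grid = [j * t_max / n_grid for j in range(1, n_grid + 1)]
    ecf = [empirical_cf(centered, t) for t in grid]
    # Step 3: search for the (alpha, c) pair with the smallest sum of absolute errors.
    best = None
    for alpha in alphas:
        for c in scales:
            loss = sum(abs(f - math.exp(-((c * t) ** alpha))) for f, t in zip(ecf, grid))
            if best is None or loss < best[0]:
                best = (loss, alpha, c)
    return best[1], best[2]


# Illustration on simulated standard normal returns (hypothetical data).
rng = random.Random(1)
simulated = [rng.gauss(0.0, 1.0) for _ in range(4000)]
candidate_alphas = [1.0 + 0.05 * i for i in range(21)]   # 1.00, 1.05, ..., 2.00
candidate_scales = [0.30 + 0.02 * i for i in range(46)]  # 0.30, 0.32, ..., 1.20
alpha_hat, c_hat = ft_fit(simulated, t_max=3.0, n_grid=30,
                          alphas=candidate_alphas, scales=candidate_scales)
```

A practical implementation would replace the exhaustive grid search with a numerical optimizer and could refine the grid of t values near the origin to put more weight on the tails.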
We conduct analysis for various financial data sets:
* the Yen/British Pound (BP) exchange rate,
* the BP/US$ exchange rate,
* the Deutsche Mark (DM)/BP exchange rate,
* the S&P 500 index,
* the DAX30 index,
* the CAC40 index,
* the Nikkei 225 index,
* the Dow Jones Commodities Price Index (DJCPI).

A short description of the data is given in Table 1.

4.1. In-sample evaluation of VaR estimates

In this part we evaluate stable and normal VaR models by examining distances between the VaR estimates and the empirical VaR measures. By the formal definition of VaR in Equation (1), VaR estimates, \(\mathrm{VaR}_{t,\tau}\), are such that

\[ \Pr[\Delta P_t(\tau) < -\mathrm{VaR}_{t,\tau}] \le 1 - c, \tag{2} \]

where \(c\) is the confidence level, \(\Delta P_t(\tau)\) is the relative change in the portfolio value over the time horizon \(\tau\), i.e., \(\Delta P_t(\tau) = R_{t,\tau}\) is the portfolio return at moment \(t\) over the time horizon \(\tau\), and \(t\) is the current time.

[25] For computation purposes, we have chosen the maximal value of \(t\) as \(\tau = 20\) and the number of grid points as \(\nu = 10\,000\). In the realization of the FT method we selected the following grid steps \(h_t\): if \(0 \le t \le 1\), \(h_t = 20/50\,000\); if \(t > 1\), \(h_t = 20/1000\). In order to emphasize the tail behavior, we refined the mesh near \(t = 0\) and named that approach FT-Tail (FTT): if \(0 \le t \le 0.1\), \(h_t = 20/100\,000\); if \(0.1 \le t \le 1.0\), \(h_t = 20/10\,000\); if \(t > 1\), \(h_t = 20/1000\). The numerical results are reported in Section 4.

Table 1
Financial data series

Series       Source        Number of observations   Time period        Frequency
Yen/BP       Datastream    6285                     1.02.74-1.30.98    Daily (D)
BP/US$       D. Hindanov   6157                     1.03.74-1.30.98    D
DM/BP        Datastream    6285                     1.02.74-1.30.98    D
S&P 500      Datastream    7327                     1.01.70-1.30.98    D
DAX30        Datastream    8630                     1.04.65-1.30.98    D
CAC40        Datastream    2756                     7.10.87-1.30.98    D
Nikkei 225   Datastream    4718                     1.02.80-1.30.98    D
DJCPI        Datastream    5761                     1.02.76-1.30.98    D

For the purpose of testing VaR models, financial regulators advise choosing a time horizon of one day, so we take \(\tau = 1\). In the text below, if the time horizon is not stated explicitly, it is assumed to equal one day.
At each time $t$, an estimate $\widehat{VaR}_t$ is obtained using the lw most recent observations of portfolio returns $R_{t-1}, R_{t-2}, \dots, R_{t-lw}$:

$$\widehat{VaR}_t = VaR(R_{t-1}, R_{t-2}, \dots, R_{t-lw}). \tag{3}$$

The lw parameter is called the window length. In this subsection, VaR is estimated employing the entire sample of observations, i.e., lw = N, where N is the sample size. Hence, we do not point out the present time $t$.

We obtain "stable" ("normal") VaR measurements at the confidence level $c$ in two steps: (i) fitting empirical data by a stable (normal) distribution, (ii) calculating a VaR as the negative of the $(1 - c)$-th quantile of the fitted stable (normal) distribution.

"Stable" fitting is implemented using three methods: maximum likelihood (ML), Fourier Transform (FT), and Fourier Transform-Tail (FTT).[26] Estimated parameters of densities and corresponding confidence intervals are presented in Table 2. In the FT and FTT fitting we assume that distributions of returns are symmetric, i.e., the skewness parameter $\beta$ is equal to zero. Since the index of stability $\alpha > 1$ for our data series, the location parameter $\mu$ is approximated by the sample mean. The ML estimates were computed applying the STABLE program by J.P. Nolan.[27]

The confidence intervals (CI) for the FT and FTT parameter estimates were derived using a bootstrap method with 1000 replications.[28] Empirical analysis showed that a set of 1000 replications is: (i) satisfactory for constructing 95% CI; (ii) insufficient for obtaining reliable 99% CI.

[26] Evaluation of parameters of stable distributions is provided in Section 3.2.
[27] The STABLE program is described in Nolan (1997).
[28] For references on bootstrapping, see Heathcote, Cheng and Rachev (1995); for discussion on CI based on ML parameter estimates, see Nolan (1998a).
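The bootstrap step can be illustrated with a generic percentile bootstrap. The text does not specify which bootstrap variant was used, so the percentile method, the function name `bootstrap_ci`, and the toy data below are assumptions made purely for illustration. Note that with 1000 replications the 99% bounds rest on only about five replications in each tail, which is one plausible reason why such intervals are less stable than the 95% ones.

```python
import random
import statistics


def bootstrap_ci(data, statistic, level=0.95, n_rep=1000, seed=0):
    """Percentile bootstrap CI: resample with replacement, re-estimate, take quantiles."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_rep)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_rep)
    hi_idx = int((1.0 + level) / 2.0 * n_rep) - 1
    return reps[lo_idx], reps[hi_idx]


# Toy usage: CI for the mean of a hypothetical sample (not data from the chapter).
sample = [float(v) for v in range(1, 101)]
print(bootstrap_ci(sample, statistics.fmean, level=0.95))
print(bootstrap_ci(sample, statistics.fmean, level=0.99))
```

In the setting of Table 2 the `statistic` argument would be a parameter estimator such as the FT fit, applied to each resampled return series.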
Table 2. Parameters of stable and normal densities [a]

| Series | Normal mean | Normal standard deviation | Stable method | $\alpha$ | $\beta$ | $\mu$ | $\sigma$ |
|---|---|---|---|---|---|---|---|
| Yen/BP | -0.012 | 0.649 | ML | 1.647 | -0.170 | -0.023 | 0.361 |
| | | | FT | 1.61 | | -0.018 | 0.34 |
| | | | 95% CI | [1.57,1.66] | | [-0.095,0.015] | [0.33,0.36] |
| | | | 99% CI | [1.55,1.68] | | [-0.178,0.025] | [0.33,0.37] |
| | | | FTT | 1.50 | | -0.018 | 0.32 |
| | | | 95% CI | [1.46,1.55] | | [-0.131,0.034] | [0.31,0.34] |
| | | | 99% CI | [1.44,1.64] | | [-0.261,0.070] | [0.31,0.39] |
| BP/US$ | 0.006 | 0.658 | ML | 1.582 | 0.038 | 0.007 | 0.349 |
| | | | FT | 1.57 | | 0.006 | 0.33 |
| | | | 95% CI | [1.53,1.65] | | [-0.096,0.045] | [0.32,0.36] |
| | | | 99% CI | [1.51,1.75] | | [-0.393,0.065] | [0.32,0.47] |
| | | | FTT | 1.45 | | 0.006 | 0.31 |
| | | | 95% CI | [1.41,1.51] | | [-0.134,0.070] | [0.30,0.33] |
| | | | 99% CI | [1.40,1.62] | | [-0.388,0.097] | [0.30,0.47] |
| DM/BP | -0.012 | 0.489 | ML | 1.590 | -0.195 | -0.018 | 0.256 |
| | | | FT | 1.60 | | -0.012 | 0.24 |
| | | | 95% CI | [1.54,1.75] | | [-0.064,0.013] | [0.23,0.26] |
| | | | 99% CI | [1.53,1.75] | | [-0.165,0.022] | [0.23,0.27] |
| | | | FTT | 1.45 | | -0.012 | 0.23 |
| | | | 95% CI | [1.41,1.55] | | [-0.114,0.038] | [0.22,0.26] |
| | | | 99% CI | [1.40,1.77] | | [-0.402,0.061] | [0.22,0.40] |
| S&P 500 | 0.032 | 0.930 | ML | 1.708 | 0.004 | 0.036 | 0.512 |
| | | | FT | 1.82 | | 0.032 | 0.54 |
| | | | 95% CI | [1.78,1.84] | | [-0.013,0.057] | [0.53,0.54] |
| | | | 99% CI | [1.77,1.84] | | [-0.062,0.067] | [0.53,0.55] |
| | | | FTT | 1.60 | | 0.032 | 0.48 |
| | | | 95% CI | [1.56,1.65] | | [-0.066,0.078] | [0.47,0.49] |
| | | | 99% CI | [1.54,1.66] | | [-0.120,0.095] | [0.46,0.50] |
| DAX30 | 0.026 | 1.002 | ML | 1.823 | -0.084 | 0.027 | 0.592 |
| | | | FT | 1.84 | | 0.026 | 0.60 |
| | | | 95% CI | [1.81,1.88] | | [-0.015,0.050] | [0.59,0.60] |
| | | | 99% CI | [1.80,1.89] | | [-0.050,0.057] | [0.58,0.62] |
| | | | FTT | 1.73 | | 0.026 | 0.57 |
| | | | 95% CI | [1.69,1.77] | | [-0.031,0.061] | [0.56,0.58] |
| | | | 99% CI | [1.68,1.79] | | [-0.124,0.073] | [0.56,0.59] |
| CAC40 | 0.028 | 1.198 | ML | 1.784 | -0.153 | 0.027 | 0.698 |
| | | | FT | 1.79 | | 0.028 | 0.70 |
| | | | 95% CI | [1.73,1.85] | | [-0.050,0.088] | [0.68,0.73] |
| | | | 99% CI | [1.71,1.87] | | [-0.174,0.103] | [0.67,0.74] |
| | | | FTT | 1.76 | | 0.028 | 0.69 |
| | | | 95% CI | [1.71,1.84] | | [-0.053,0.091] | [0.67,0.72] |
| | | | 99% CI | [1.69,1.87] | | [-0.394,0.101] | [0.66,0.77] |
| Nikkei 225 | 0.020 | 1.185 | ML | 1.444 | -0.093 | -0.002 | 0.524 |
| | | | FT | 1.58 | | 0.02 | 0.59 |
| | | | 95% CI | [1.53,1.64] | | [-0.127,0.102] | [0.57,0.62] |
| | | | 99% CI | [1.52,1.67] | | [-0.421,0.130] | [0.57,0.69] |
| | | | FTT | 1.30 | | 0.02 | 0.49 |
| | | | 95% CI | [1.26,1.47] | | [-0.451,0.316] | [0.47,0.69] |
| | | | 99% CI | [1.05,1.67] | | [-1.448,0.860] | [0.47,1.10] |
| DJCPI | 0.006 | 0.778 | ML | 1.569 | -0.060 | 0.003 | 0.355 |
| | | | FT | 1.58 | | 0.006 | 0.35 |
| | | | 95% CI | [1.53,1.66] | | [-0.026,0.100] | [0.34,0.37] |
| | | | 99% CI | [1.52,1.67] | | [-0.140,0.120] | [0.33,0.39] |
| | | | FTT | 1.49 | | 0.006 | 0.33 |
| | | | 95% CI | [1.44,1.55] | | [-0.160,0.062] | [0.32,0.36] |
| | | | 99% CI | [1.44,1.69] | | [-0.396,0.100] | [0.32,0.46] |

[a] The CIs right below the estimates are the 95% CIs, the next CIs are the 99% CIs.

In our experiments, sets of 1000 replications generated: (i) 95% CIs whose bounds coincided up to two decimal places for two of the three estimated parameters and varied slightly for the third; (ii) varying 99% CIs, with insignificant variation of the left limits.

VaR measurements were calculated at confidence levels $c = 99\%$ and $c = 95\%$. The 99% (95%) VaR was determined as the negative of the 1% (5%) quantile. For calculating stable quantiles we used our program, built on the Zolotarev integral representation form of the cumulative distribution function. The 99% and 95% VaR estimates are reported in Tables 3 and 4, respectively. Biases of stable and normal VaR measurements are provided in Table 5.[29]

We accompany our computations with plots of:
* daily price levels,
* daily returns,
* fitted empirical, normal, and stable densities with the ML, FT, and FTT estimated parameters,
* daily empirical, normal, and stable VaR* estimates at the 99% and 95% confidence levels.[30]

Combined plots of densities and VaR estimation are displayed in Figures 3-10.

[29] Biases are computed by subtracting the empirical VaR from the model VaR estimates.
[30] The VaR* numbers are the negative values of the VaR estimates, VaR* = -VaR.
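For the normal benchmark, the rule "VaR is the negative of the (1 - c)-th quantile" can be checked directly from the parameters in Table 2. The sketch below is illustrative only: the function names are hypothetical, and the order-statistic convention used for the empirical VaR is an assumption, since the text does not state one. Stable quantiles need a numerical routine (the text mentions a program based on the Zolotarev integral representation) and are not attempted here.

```python
import math
from statistics import NormalDist


def normal_var(mean, std, c):
    """VaR at confidence level c as the negative of the (1 - c)-quantile of N(mean, std^2)."""
    return -NormalDist(mean, std).inv_cdf(1.0 - c)


def empirical_var(returns, c):
    """Negative of an empirical (1 - c)-quantile (order-statistic convention assumed)."""
    ordered = sorted(returns)
    k = max(math.ceil((1.0 - c) * len(ordered) - 1e-9) - 1, 0)
    return -ordered[k]


# S&P 500 row of Table 2: normal mean 0.032, standard deviation 0.930.
print(round(normal_var(0.032, 0.930, 0.99), 3))  # close to the 2.131 listed in Table 3
print(round(normal_var(0.032, 0.930, 0.95), 3))  # close to the 1.497 listed in Table 4
```

Small differences in the last digit against the printed tables are to be expected, because the inputs here are the rounded parameters from Table 2.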
In order to illustrate that confidence intervals for the FT parameter estimates are sufficiently narrow, we show stable densities and VaR measures at boundary values of the confidence intervals for $\hat\alpha_{Yen,FT}$ and $\hat\sigma_{Yen,FT}$ in Figures 11-14.

As Figures 3-10 demonstrate, the VaR estimates obtained at confidence level $c = 95\%$ seem to belong to the area between the "tail" and the "center". The VaR at level $c = 99\%$ is clearly in the tail area. Hence, we compare the performance of stable and normal models separately for the cases $c = 95\%$ and $c = 99\%$.

In general, the stable modeling (ML, FT, and FTT) provided evaluations of the 99% VaR greater than the empirical 99% VaR (see Figures 3-10, Tables 3 and 5). It underestimated the sample 99% VaR in the applications of two methods: FT for the CAC40, S&P 500, and DAX30 indices, and ML for the DAX30 index. Even where the stable VaR estimates were biased downwards, they were closer to the empirical VaR than the normal estimates (see Table 5). Among the methods of stable approximation, the FT method provided the most accurate VaR estimates for 7 of the 8 data sets (see Table 5). For all analyzed data sets, the normal modeling underestimated the empirical 99% VaR. Stable modeling provided more accurate 99% VaR estimates: the mean absolute bias[31] under the stable (FT) method is 42% smaller than under the normal method.

At the 95% confidence level, the stable VaR estimates were lower than the empirical VaR for all data sets. The normal VaR measurements exceeded the empirical VaR, except for the Yen/BP exchange rate series (see Table 6). For the exchange rate series (Yen/BP, BP/US$, and DM/BP), the normal method resulted in more exact VaR estimates. For the S&P 500, DAX30, CAC40, and DJCPI indices, stable methods underestimated VaR, though the estimates were closer to the empirical VaR than the normal estimates. Mean absolute biases under stable and normal modeling are of comparable magnitudes.
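The comparison of mean absolute biases can be reproduced from the entries of Table 5. The sketch below takes the bias of a method on a series to be the model VaR minus the empirical VaR and averages absolute biases over the eight series; the dictionary layout and names are illustrative choices, and the inputs are the rounded values printed in the table.

```python
# Biases of the 99% VaR estimates, copied from Table 5.
# Column order: Normal, Stable-ML, Stable-FT, Stable-FTT.
BIAS_99 = {
    "Yen/BP":     (-0.451, 0.268, 0.133, 0.515),
    "BP/US$":     (-0.248, 0.447, 0.426, 0.894),
    "DM/BP":      (-0.340, 0.330, 0.031, 0.507),
    "S&P 500":    (-0.162, 0.266, -0.093, 0.691),
    "DAX30":      (-0.258, -0.100, -0.189, 0.182),
    "CAC40":      (-0.308, 0.127, -0.049, 0.076),
    "Nikkei 225": (-0.691, 1.408, 0.414, 2.585),
    "DJCPI":      (-0.249, 0.393, 0.232, 0.550),
}
METHODS = ("Normal", "ML", "FT", "FTT")


def mean_absolute_bias(biases):
    """Average of |bias| over all series, per method."""
    n = len(biases)
    return {m: sum(abs(row[k]) for row in biases.values()) / n
            for k, m in enumerate(METHODS)}


mab = mean_absolute_bias(BIAS_99)
reduction = 1.0 - mab["FT"] / mab["Normal"]
print({m: round(v, 3) for m, v in mab.items()})
print(round(reduction, 2))  # relative reduction of FT versus normal, about 0.42
```

The normal and FT averages come out at 0.338 and 0.196, which is where the 42% figure quoted above originates.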
In-sample examination of VaR models showed:
* the stable modeling generally results in conservative and accurate 99% VaR estimates, which is preferred by financial institutions and regulators,[32]
* the normal approach leads to overly optimistic forecasts of losses in the 99% VaR estimation,
* from a conservative point of view, the normal modeling is acceptable for the 95% VaR estimation,
* the stable models underestimate the 95% VaR. In fact, the stable 95% VaR measurements are closer to the empirical VaR than the normal 95% VaR measurements.

The next step in evaluating VaR models is analysis of their forecasting characteristics.

[31] Let $b_{m,s}$ be a bias of a VaR estimate: $b_{m,s} = VaR_{m,s} - VaR_{Empirical,s}$. The mean absolute bias equals $MAB_m = \left(\sum_{s=1}^{8} |b_{m,s}|\right)/8$, where $m$ denotes normal, stable-ML, stable-FT, and stable-FTT methods, and $s$ a series.
[32] In the 99% VaR estimation for data series from Table 1, mean absolute bias under the stable modeling was 42% smaller than under the normal modeling.

Fig. 3. VaR estimation for the DM/BP exchange rate.
Fig. 4. VaR estimation for the Yen/BP exchange rate.
Fig. 5. VaR estimation for the BP/US$ exchange rate.
Fig. 6. VaR estimation for the CAC40 index.
Fig. 7. VaR estimation for the Nikkei 225 index.
Fig. 8. VaR estimation for the S&P 500 index.
Fig. 9. VaR estimation for the DAX30 index.
Fig. 10. VaR estimation for the DJCPI index.
Fig. 11. Stable fitting at limiting values of a confidence interval for alpha.
Fig. 12. VaR estimation at limiting values of a confidence interval for alpha.
Fig. 13. Stable fitting at limiting values of a confidence interval for sigma.
Fig. 14. VaR estimation at limiting values of a confidence interval for sigma.

Table 3. Empirical, normal, and stable 99% VaR estimates [a]

| Series | Empirical | Normal | Stable ML | Stable FT | Stable FTT |
|---|---|---|---|---|---|
| Yen/BP | 1.979 | 1.528 | 2.247 | 2.112 | 2.494 |
| 95% CI | | | | [1.968, 2.252] | [2.276, 2.736] |
| 99% CI | | | | [1.919, 2.415] | [2.230, 2.836] |
| BP/US$ | 1.774 | 1.526 | 2.221 | 2.200 | 2.668 |
| 95% CI | | | | [2.014, 2.412] | [2.436, 2.925] |
| 99% CI | | | | [1.956, 2.593] | [2.358, 3.029] |
| DM/BP | 1.489 | 1.149 | 1.819 | 1.520 | 1.996 |
| 95% CI | | | | [1.190, 1.712] | [1.792, 2.211] |
| 99% CI | | | | [1.179, 1.742] | [1.700, 2.329] |
| S&P 500 | 2.293 | 2.131 | 2.559 | 2.200 | 2.984 |
| 95% CI | | | | [2.117, 2.358] | [2.757, 3.243] |
| 99% CI | | | | [2.106, 2.470] | [2.700, 3.336] |
| DAX30 | 2.564 | 2.306 | 2.464 | 2.375 | 2.746 |
| 95% CI | | | | [2.260, 2.502] | [2.557, 2.949] |
| 99% CI | | | | [2.240, 2.569] | [2.523, 2.997] |
| CAC40 | 3.068 | 2.760 | 3.195 | 3.019 | 3.144 |
| 95% CI | | | | [2.753, 3.364] | [2.788, 3.504] |
| 99% CI | | | | [2.682, 3.520] | [2.700, 3.841] |
| Nikkei 225 | 3.428 | 2.737 | 4.836 | 3.842 | 6.013 |
| 95% CI | | | | [3.477, 4.254] | [5.190, 6.701] |
| 99% CI | | | | [3.367, 4.453] | [4.658, 19.950] |
| DJCPI | 2.053 | 1.804 | 2.446 | 2.285 | 2.603 |
| 95% CI | | | | [1.955, 2.423] | [2.382, 2.870] |
| 99% CI | | | | [1.916, 2.474] | [2.288, 3.035] |

[a] The CIs right below the estimates are the 95% CIs, the next CIs are the 99% CIs.

Table 4. Empirical, normal, and stable 95% VaR estimates [a]

| Series | Empirical | Normal | Stable ML | Stable FT | Stable FTT |
|---|---|---|---|---|---|
| Yen/BP | 1.103 | 1.086 | 1.033 | 0.968 | 0.995 |
| 95% CI | | | | [0.926,1.047] | [0.937,1.132] |
| 99% CI | | | | [0.911,1.186] | [0.911,1.329] |
| BP/US$ | 1.038 | 1.077 | 0.981 | 0.944 | 0.986 |
| 95% CI | | | | [0.898,1.072] | [0.917,1.158] |
| 99% CI | | | | [0.876,1.599] | [0.895,1.588] |
| DM/BP | 0.806 | 0.816 | 0.772 | 0.687 | 0.748 |
| 95% CI | | | | [0.652,0.749] | [0.695,0.894] |
| 99% CI | | | | [0.641,0.894] | [0.678,1.418] |
| S&P 500 | 1.384 | 1.497 | 1.309 | 1.308 | 1.319 |
| 95% CI | | | | [1.275,1.361] | [1.265,1.423] |
| 99% CI | | | | [1.265,1.411] | [1.246,1.503] |
| DAX30 | 1.508 | 1.623 | 1.449 | 1.451 | 1.452 |
| 95% CI | | | | [1.415,1.500] | [1.405,1.521] |
| 99% CI | | | | [1.402,1.533] | [1.395,1.650] |
| CAC40 | 1.819 | 1.943 | 1.756 | 1.734 | 1.734 |
| 95% CI | | | | [1.653,1.837] | [1.647,1.845] |
| 99% CI | | | | [1.621,1.944] | [1.616,2.288] |
| Nikkei 225 | 1.856 | 1.929 | 1.731 | 1.666 | 1.840 |
| 95% CI | | | | [1.570,1.839] | [1.582,2.512] |
| 99% CI | | | | [1.558,2.280] | [1.500,5.022] |
| DJCPI | 1.066 | 1.274 | 1.031 | 0.994 | 1.011 |
| 95% CI | | | | [0.888,1.047] | [0.944,1.188] |
| 99% CI | | | | [0.870,1.200] | [0.915,1.615] |

[a] The CIs right below the estimates are the 95% CIs, the next CIs are the 99% CIs.

Table 5. Biases of normal and stable 99% VaR estimates (99% $VaR_m$ - 99% $VaR_{Empirical}$)

| Series | Normal | Stable ML | Stable FT | Stable FTT |
|---|---|---|---|---|
| Yen/BP | -0.451 | 0.268 | 0.133 | 0.515 |
| BP/US$ | -0.248 | 0.447 | 0.426 | 0.894 |
| DM/BP | -0.340 | 0.330 | 0.031 | 0.507 |
| S&P 500 | -0.162 | 0.266 | -0.093 | 0.691 |
| DAX30 | -0.258 | -0.100 | -0.189 | 0.182 |
| CAC40 | -0.308 | 0.127 | -0.049 | 0.076 |
| Nikkei 225 | -0.691 | 1.408 | 0.414 | 2.585 |
| DJCPI | -0.249 | 0.393 | 0.232 | 0.550 |
| Mean absolute bias | 0.338 | 0.416 | 0.196 | 0.750 |

Table 6. Biases of normal and stable 95% VaR estimates (95% $VaR_m$ - 95% $VaR_{Empirical}$) [a]

| Series | Normal | Stable ML | Stable FT | Stable FTT |
|---|---|---|---|---|
| Yen/BP | -0.017 | -0.070 | -0.135 | -0.108 |
| BP/US$ | 0.039 | -0.057 | -0.094 | -0.052 |
| DM/BP | 0.010 | -0.034 | -0.119 | -0.058 |
| S&P 500 | 0.113 | -0.075 | -0.076 | -0.065 |
| DAX30 | 0.115 | -0.059 | -0.057 | -0.056 |
| CAC40 | 0.124 | -0.063 | -0.085 | -0.085 |
| Nikkei 225 | 0.073 | -0.125 | -0.190 | -0.016 |
| DJCPI | 0.208 | -0.035 | -0.072 | -0.055 |
| Mean absolute bias | 0.087 | 0.065 | 0.104 | 0.070 |

[a] $m$ denotes normal, stable-ML, stable-FT, and stable-FTT methods.

4.2. Forecast-evaluation of VaR estimates

In this section we investigate the forecasting properties of stable and normal VaR modeling by comparing predicted VaR with observed returns. We test the null hypothesis that Equation (1) for a time horizon of 1 day ($\tau = 1$) holds at any time $t$:

$$\Pr[\Delta P_t < -VaR_t] = 1 - c, \tag{4}$$

where $\Delta P_t$ is the relative change (return) in the portfolio value, i.e., $\Delta P_t = R_t$ is the portfolio return at moment $t$, $VaR_t$ is the VaR measure at time $t$, $c$ is the VaR confidence level, $t$ is the current time, $t \in [1, T]$, and $T$ is the length of the testing interval.

The test is performed by checking whether $\Pr[R_t < -\widehat{VaR}_t]$ is reasonably close to $1 - c$, where $\widehat{VaR}_t$ is the estimate of $VaR_t$. Recall that $\widehat{VaR}_t$ is computed using the last lw observations.[33] Let $b_t$ be the indicator function $\mathbf{1}\{R_t < -\widehat{VaR}_t\}$, $1 \le t \le T$. If Equation (4) holds, then

$$b_t = \mathbf{1}\{R_t < -\widehat{VaR}_t\} = \begin{cases} 1, & \text{with probability } 1 - c, \\ 0, & \text{with probability } c. \end{cases}$$

Let us denote by $E$ the number of exceedings ($R_t < -\widehat{VaR}_t$)[34] over the testing interval $[1, T]$. If Equation (4) is valid, then the variable $E = \sum_{t=1}^{T} b_t$ has a binomial distribution. We can formulate a testing rule: reject the null hypothesis at level of significance $x$ if

$$\sum_{t=0}^{E}\binom{T}{t}(1-c)^t c^{T-t} \le \frac{x}{2} \quad\text{or}\quad \sum_{t=0}^{E}\binom{T}{t}(1-c)^t c^{T-t} \ge 1 - \frac{x}{2}.$$

[33] See Equation (3).
[34] In nominal levels, an exceeding implies a case when actual losses exceeded the predicted losses.

Table 7. Admissible VaR exceedings and exceeding frequencies

| VaR confidence level, c | Length of a testing interval, T | Admissible exceedings E, significance level x = 5% | Admissible exceedings E, x = 1% | Admissible frequencies E/T (%), x = 5% | Admissible frequencies E/T (%), x = 1% |
|---|---|---|---|---|---|
| 95% | 500 | [17,33] | [14,36] | [3.40,6.60] | [2.80,7.20] |
| 95% | 1500 | [61,89] | [56,94] | [4.07,5.93] | [3.73,6.27] |
| 99% | 500 | [2,8] | [0,10] | [0.40,1.60] | [0.00,2.00] |
| 99% | 1500 | [9,21] | [6,23] | [0.60,1.40] | [0.40,1.53] |

For large $T$ and sufficiently high VaR confidence levels, the binomial distribution can be approximated by the normal distribution.
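The exact binomial rule stated above translates directly into code. The following is a minimal sketch with hypothetical function names; the binomial probabilities are evaluated through log-gamma terms purely as a numerical convenience, to avoid overflow for long testing intervals.

```python
import math


def count_exceedings(returns, var_forecasts):
    """Number of days on which the return fell below minus the forecast VaR."""
    return sum(1 for r, v in zip(returns, var_forecasts) if r < -v)


def binomial_cdf(e, T, p):
    """P(X <= e) for X ~ Binomial(T, p), with 0 < p < 1."""
    total = 0.0
    for t in range(e + 1):
        log_pmf = (math.lgamma(T + 1) - math.lgamma(t + 1) - math.lgamma(T - t + 1)
                   + t * math.log(p) + (T - t) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total


def reject_null(E, T, c, x):
    """Two-sided rule: reject if the binomial CDF at E is <= x/2 or >= 1 - x/2."""
    cdf = binomial_cdf(E, T, 1.0 - c)
    return cdf <= x / 2.0 or cdf >= 1.0 - x / 2.0


# 99% VaR, T = 500: 5 exceedings are compatible with the null, 25 are not.
print(reject_null(5, 500, 0.99, 0.05), reject_null(25, 500, 0.99, 0.05))
```

In a backtest, `count_exceedings` would be applied to the realized returns and the rolling VaR forecasts over the testing interval, and its result passed to `reject_null`.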
By this normal approximation, the testing rule for large $T$ is: reject the null hypothesis at level of significance $x$ if

$$E < T(1-c) - z_{1-x/2}\sqrt{T(1-c)c} \quad\text{or}\quad E > T(1-c) + z_{1-x/2}\sqrt{T(1-c)c},$$

where $z_p$ is the $p$-quantile of the standard normal distribution. The bounds of admissible VaR exceedings $E$ and exceeding frequencies $E/T$ for testing at significance levels of 5% and 1% are provided in Table 7.

We examined forecasting properties of stable and normal VaR models for the data series described in Table 1. In the testing procedures we considered the following parameters:
* window lengths lw = 260 observations (data over 1 year) and lw = 1560 observations (data over 6 years),
* lengths of testing intervals T = 500 days and T = 1500 days.

Evaluation results are reported in Tables 8 and 9. Numbers that are outside of the acceptable ranges are indicated in bold font.

From Table 8 we can see that normal models for the 99% VaR computations commonly produce numbers of exceedings above the acceptable range, which implies that normal modeling significantly underestimates VaR (losses). At a window length of 260 observations, stable modeling is not satisfactory. It provided a permissible number of exceptions only for the BP/US$ and DJCPI series. At a sample size of 1560 and a testing interval of 500 observations, exceedings by the stable-FT method are outside of the admissible interval for the S&P 500, DAX30, and CAC40 indices. Testing on the longer interval with T = 1500 showed that the numbers of "stable" exceptions are within the permissible range. Table 8 demonstrates that increasing the window length from 260 observations to 1560 observations reduces the number of stable-FT exceedings. In contrast, extending the window length for normal models does not decrease $E$ and, in some cases, even elevates it. The results illustrate that stable modeling outperforms normal modeling in the 99% VaR estimations.

The 95% VaR normal estimates (except the DAX30 series), obtained using 260 observations, are within the permissible range.
Increasing the window length generally worsens the normal VaR measurements. The stable-FT method provided acceptable 95% VaR estimates for the Yen/BP and BP/US$ exchange rates and the CAC40 and Nikkei 225 indices.

A study of the predictive power of VaR models suggests that:
* the normal modeling significantly underestimates 99% VaR,
* the stable method results in reasonable 99% VaR estimates,
* 95% normal measurements are in the admissible range for the window length of 260 observations. Increasing lw to 1560 observations might deteriorate the precision of the estimates.

Table 8. 99% VaR exceedings

| Series | Length of a testing interval, T | Window 260 obs.: Normal E | Normal E/T (%) | FT E | FT E/T (%) | Window 1560 obs.: Normal E | Normal E/T (%) | FT E | FT E/T (%) |
|---|---|---|---|---|---|---|---|---|---|
| Yen/BP | 500 | 15 | 3.00 | 13 | 2.60 | 10 | 2.00 | 2 | 0.40 |
| | 1500 | 40 | 1.67 | 34 | 2.27 | 45 | 3.00 | 21 | 1.40 |
| BP/US$ | 500 | 10 | 2.00 | 5 | 1.00 | 1 | 0.20 | 0 | 0.00 |
| | 1500 | 26 | 1.73 | 13 | 0.86 | 17 | 1.33 | 5 | 0.33 |
| DM/BP | 500 | 18 | 3.60 | 14 | 2.80 | 17 | 3.40 | 8 | 1.60 |
| | 1500 | 45 | 3.00 | 33 | 2.20 | 50 | 3.33 | 19 | 1.27 |
| S&P 500 | 500 | 17 | 3.40 | 13 | 2.60 | 25 | 5.00 | 13 | 2.60 |
| | 1500 | 35 | 2.33 | 27 | 1.80 | 28 | 1.87 | 14 | 0.93 |
| DAX30 | 500 | 21 | 4.20 | 14 | 2.80 | 19 | 3.80 | 18 | 3.60 |
| | 1500 | 41 | 2.73 | 29 | 1.93 | 25 | 1.67 | 20 | 1.33 |
| CAC40 | 500 | 16 | 3.20 | 14 | 2.80 | 14 | 2.80 | 13 | 2.60 |
| | 1500 | 34 | 2.27 | 29 | 1.93 | 17 | 1.63 | 19 | 1.27 |
| Nikkei 225 | 500 | 15 | 3.00 | 14 | 2.80 | 13 | 2.60 | 7 | 1.40 |
| | 1500 | 31 | 2.07 | 23 | 1.53 | 26 | 1.73 | 10 | 0.67 |
| DJCPI | 500 | 12 | 2.40 | 7 | 1.40 | 15 | 3.00 | 10 | 2.00 |
| | 1500 | 29 | 1.93 | 15 | 1.00 | 28 | 1.87 | 17 | 1.13 |

5. Stable modeling and risk assessment for individual credit returns

Recall that the stable distributions are characterized by four parameters: $\alpha$ (tail index), $\beta$ (skewness), $\mu$ (location), and $\sigma$ (scale). Modeling with such parameters allows for heavy tails and skewness of the distributions. Our empirical analysis confirms that, indeed: (i) credit returns exhibit asymmetry and heavy tails; (ii) stable modeling captures these features of the returns.
Table 9. 95% VaR exceedings

| Series | Length of a testing interval, T | Window 260 obs.: Normal E | Normal E/T (%) | FT E | FT E/T (%) | Window 1560 obs.: Normal E | Normal E/T (%) | FT E | FT E/T (%) |
|---|---|---|---|---|---|---|---|---|---|
| Yen/BP | 500 | 35 | 7.00 | 38 | 7.60 | 27 | 5.40 | 31 | 6.2 |
| | 1500 | 94 | 6.27 | 104 | 6.93 | 109 | 7.27 | 122 | 8.13 |
| BP/US$ | 500 | 33 | 6.60 | 45 | 9.00 | 10 | 2.00 | 17 | 3.40 |
| | 1500 | 73 | 4.87 | 96 | 6.40 | 46 | 3.07 | 57 | 3.80 |
| DM/BP | 500 | 32 | 6.40 | 38 | 7.60 | 29 | 5.80 | 37 | 7.40 |
| | 1500 | 89 | 5.93 | 114 | 7.60 | 105 | 7.00 | 139 | 9.27 |
| S&P 500 | 500 | 34 | 6.80 | 39 | 7.80 | 43 | 8.60 | 47 | 9.40 |
| | 1500 | 79 | 5.27 | 98 | 6.53 | 62 | 4.13 | 69 | 4.60 |
| DAX30 | 500 | 47 | 9.40 | 50 | 10 | 42 | 8.40 | 45 | 9.00 |
| | 1500 | 98 | 6.53 | 109 | 7.27 | 62 | 4.13 | 79 | 5.27 |
| CAC40 | 500 | 32 | 6.40 | 34 | 6.80 | 31 | 6.20 | 32 | 6.40 |
| | 1500 | 81 | 5.40 | 87 | 5.80 | 51 | 4.90 | 82 | 5.47 |
| Nikkei 225 | 500 | 37 | 7.40 | 40 | 8.00 | 28 | 5.60 | 33 | 6.60 |
| | 1500 | 85 | 5.67 | 90 | 6.00 | 68 | 4.53 | 87 | 5.80 |
| DJCPI | 500 | 29 | 5.80 | 35 | 7.00 | 37 | 7.40 | 46 | 9.20 |
| | 1500 | 70 | 4.67 | 93 | 6.20 | 77 | 5.13 | 108 | 7.20 |

The "assets" used in the study are the Merrill Lynch indices of the US government and corporate bonds with maturities from one to 10 years and credit ratings from "BB" to "AAA". Returns on indices are modeled as stable-distributed: $R_i \sim S_{\alpha_i}(\sigma_i, \beta_i, \mu_i)$, where $i = 1,\dots,21$. Some analysis of the indices is provided in Table 10. Daily returns series are illustrated in Figure 15 and in Appendix A.

The benchmark for assessment of the stable model properties is the "normal" model, i.e., approximation of returns by normal distributions. By categorization of stable distributions, a normal distribution has a tail index $\alpha = 2$ and a symmetric distribution has a skewness parameter $\beta = 0$. Values of $\alpha < 2$ indicate thicker tails than the tails of the normal distribution. In general, the smaller $\alpha$ is, the heavier the tails and the higher the peak of the density. If $\beta < 0$, the distribution is skewed to the left. If $\beta > 0$, the distribution is skewed to the right. Larger absolute magnitudes of $\beta$ point to stronger skewness.

The stable and normal parameter estimates for the bond indices are presented in Table 10.
For all 21 considered indices, the tail index $\alpha$ is less than two, which reveals heavy-tailedness, and the skewness parameter $\beta$ is below zero, which implies skewness to the left. The fitted empirical, stable, and normal densities of indices are displayed in Figure 16 and in Appendix A.

Table 10. Normal and stable parameter estimates of bond indices

| Index [a] | Rating or issuer | Maturity (years) | Normal mean | Normal st. dev. | Stable tail index $\alpha$ | Stable skewness $\beta$ | Stable location $\mu$ | Stable scale $\sigma$ |
|---|---|---|---|---|---|---|---|---|
| G102 | US gov-t | 1-3 | 0.026 | 0.096 | 1.696 | -0.160 | 0.029 | 0.055 |
| G202 | US gov-t | 3-5 | 0.030 | 0.204 | 1.739 | -0.134 | 0.036 | 0.122 |
| G302 | US gov-t | 5-7 | 0.032 | 0.275 | 1.781 | -0.134 | 0.032 | 0.169 |
| G402 | US gov-t | 7-10 | 0.033 | 0.352 | 1.808 | -0.172 | 0.033 | 0.218 |
| C1A1 | AAA | 1-3 | 0.027 | 0.096 | 1.654 | -0.080 | 0.053 | 0.027 |
| C2A1 | AAA | 3-5 | 0.029 | 0.175 | 1.695 | -0.112 | 0.029 | 0.099 |
| C3A1 | AAA | 5-7 | 0.032 | 0.249 | 1.710 | -0.116 | 0.031 | 0.145 |
| C4A1 | AAA | 7-10 | 0.032 | 0.319 | 1.739 | -0.155 | 0.031 | 0.190 |
| C1A2 | AA | 1-3 | 0.028 | 0.099 | 1.686 | -0.105 | 0.027 | 0.056 |
| C2A2 | AA | 3-5 | 0.029 | 0.177 | 1.722 | -0.111 | 0.029 | 0.104 |
| C3A2 | AA | 5-7 | 0.032 | 0.250 | 1.757 | -0.121 | 0.032 | 0.150 |
| C4A2 | AA | 7-10 | 0.033 | 0.325 | 1.778 | -0.148 | 0.033 | 0.198 |
| C1A3 | A | 1-3 | 0.028 | 0.098 | 1.688 | -0.135 | 0.027 | 0.056 |
| C2A3 | A | 3-5 | 0.030 | 0.180 | 1.702 | -0.122 | 0.029 | 0.104 |
| C3A3 | A | 5-7 | 0.032 | 0.255 | 1.743 | -0.133 | 0.033 | 0.151 |
| C4A3 | A | 7-10 | 0.033 | 0.333 | 1.753 | -0.167 | 0.033 | 0.199 |
| C1A4 | BBB | 1-3 | 0.029 | 0.112 | 1.653 | -0.113 | 0.029 | 0.054 |
| C2A4 | BBB | 3-5 | 0.032 | 0.183 | 1.662 | -0.042 | 0.033 | 0.096 |
| C3A4 | BBB | 5-7 | 0.034 | 0.249 | 1.690 | -0.125 | 0.035 | 0.140 |
| C4A4 | BBB | 7-10 | 0.035 | 0.316 | 1.694 | -0.136 | 0.035 | 0.180 |
| H0A1 | BB | 1-3 | 0.027 | 0.185 | 1.686 | -0.252 | 0.042 | 0.104 |

[a] Each index set, except H0A1, includes 2418 daily observations from 3.13.90 to 7.29.99. Source of index series: Merrill Lynch, used with permission.

In order to assess the riskiness of the individual credit series and the properties of stable modeling in credit risk evaluation, the 99% and 95% Value at Risk (VaR) measurements were computed. The stable and normal VaR estimates are reported in Table 11.
Normal VaR measurements are given for comparison purposes. The differences between empirical and modeled VaR are given in Appendix B, Table B.1. The VaR evaluation for the bond indices is illustrated in Figure 17 and in Appendix A.

Results of VaR estimations lead to the following conclusions:[35]
* since credit returns have skewed and heavy-tailed distributions, VaR measurements provide a more adequate indication of risk than symmetric measurements (standard deviation or, in the case of stable distributions, the scale parameter) do,
* the stable modeling produces conservative and accurate 99% VaR estimates, which is preferred by financial institutions and regulators. "Conservative" VaR estimates exceed the empirical VaR, which implies that forecasts of losses were greater than observed losses,
* the stable modeling underestimates the 95% VaR,
* the normal modeling gives overly optimistic forecasts of losses in the 99% VaR estimation,
* the normal modeling is acceptable for the 95% VaR estimation.

The stable modeling for high values of the VaR confidence level is superior because it adequately describes heavy tails and skewness in the data. Our empirical analysis demonstrates advantages of stable modeling in evaluation of the riskiness of single credit returns series. The next step is to examine properties of stable modeling in evaluation of portfolio risk.

[35] This section computes "in-sample" VaR. Hence, the conclusions discuss in-sample VaR properties.

Fig. 15. H0A1 daily returns.
Fig. 16. Stable and normal fitting of the H0A1 index.
Fig. 17. VaR estimation for the H0A1 index.

6. Portfolio credit risk for independent credit returns

Suppose that a portfolio includes $n$ credit assets. Then, the portfolio return is given by

$$R_P = \sum_{i=1}^{n} w_i R_i,$$

where $R_i$ is the return on the $i$-th asset, $w_i$ is the weight of the $i$-th asset, $i = 1,\dots,n$, and $\sum_{i=1}^{n} w_i = 1$.
Table 11. Empirical, normal, and stable VaR estimates for bond indices

| Index | 99% VaR: Empirical | 99% VaR: Normal | 99% VaR: Stable | 95% VaR: Empirical | 95% VaR: Normal | 95% VaR: Stable |
|---|---|---|---|---|---|---|
| G102 | 0.242 | 0.198 | 0.275 | 0.127 | 0.132 | 0.119 |
| G202 | 0.518 | 0.446 | 0.576 | 0.303 | 0.306 | 0.283 |
| G302 | 0.739 | 0.609 | 0.747 | 0.412 | 0.421 | 0.399 |
| G402 | 0.928 | 0.785 | 0.932 | 0.545 | 0.545 | 0.518 |
| C1A1 | 0.238 | 0.196 | 0.284 | 0.129 | 0.130 | 0.119 |
| C2A1 | 0.437 | 0.377 | 0.509 | 0.244 | 0.258 | 0.236 |
| C3A1 | 0.687 | 0.548 | 0.734 | 0.369 | 0.378 | 0.353 |
| C4A1 | 0.883 | 0.712 | 0.931 | 0.480 | 0.494 | 0.467 |
| C1A2 | 0.237 | 0.201 | 0.285 | 0.132 | 0.134 | 0.125 |
| C2A2 | 0.443 | 0.382 | 0.505 | 0.254 | 0.261 | 0.244 |
| C3A2 | 0.663 | 0.550 | 0.689 | 0.373 | 0.380 | 0.355 |
| C4A2 | 0.870 | 0.722 | 0.890 | 0.482 | 0.501 | 0.474 |
| C1A3 | 0.237 | 0.207 | 0.286 | 0.135 | 0.134 | 0.125 |
| C2A3 | 0.469 | 0.390 | 0.530 | 0.260 | 0.267 | 0.248 |
| C3A3 | 0.705 | 0.560 | 0.719 | 0.376 | 0.386 | 0.361 |
| C4A3 | 0.893 | 0.741 | 0.949 | 0.487 | 0.514 | 0.485 |
| C1A4 | 0.262 | 0.231 | 0.290 | 0.124 | 0.155 | 0.119 |
| C2A4 | 0.478 | 0.392 | 0.511 | 0.243 | 0.268 | 0.228 |
| C3A4 | 0.711 | 0.545 | 0.741 | 0.361 | 0.375 | 0.343 |
| C4A4 | 0.862 | 0.702 | 0.960 | 0.467 | 0.486 | 0.451 |
| H0A1 | 0.557 | 0.403 | 0.570 | 0.258 | 0.277 | 0.245 |

The modeling in this section assumes that the distributions of $R_i$ are: (i) independent $\alpha$-stable and (ii) characterized by the same index of stability, $R_i \sim S_\alpha(\sigma_{R_i}, \beta_{R_i}, 0)$,[36] $i = 1,\dots,n$. The additivity property of independent stable random variables provides analytic formulas for the parameters of portfolio returns $R_P$. The formulas lead to estimates of portfolio parameters and risk without simulations. In practice, the "independent" risk measurements are lower bounds of portfolio risk.

[36] We assume that $\alpha > 1$ (this assumption is always supported by the empirical studies) and the mean $\mu_{R_i} = 0$. If $\mu_{R_i} \ne 0$, we "center" the $R_i$ observations: $R'_i = R_i - \mu_{R_i}$.

By the additivity property of stable distributions, a linear combination of independent stable random variables is again a stable random variable. Therefore, $R_P = \sum_{i=1}^{n} w_i R_i$ follows a stable law:

$$R_P \sim S_\alpha(\sigma_{R_P}, \beta_{R_P}, 0),$$

where $\alpha$ is the tail index, $\sigma_{R_P}$ is the scale parameter, and $\beta_{R_P}$ is the skewness parameter:

$$\sigma_{R_P} = \left(\sum_{i=1}^{n} \left(|w_i|\,\sigma_{R_i}\right)^{\alpha}\right)^{1/\alpha}, \qquad \beta_{R_P} = \frac{\sum_{i=1}^{n} \operatorname{sign}(w_i)\,\beta_{R_i}\left(|w_i|\,\sigma_{R_i}\right)^{\alpha}}{\sum_{i=1}^{n} \left(|w_i|\,\sigma_{R_i}\right)^{\alpha}}.$$

Thus, the distribution of the portfolio returns is characterized by three parameters: tail index (index of stability) $\alpha$, skewness $\beta_{R_P}$, and scale $\sigma_{R_P}$. The parameter $\alpha$ is exogenous. Estimation of $\beta_{R_P}$ and $\sigma_{R_P}$ and of the portfolio VaR can be implemented in three steps:

Step 1: Find estimates of $\sigma_{R_i}$ and $\beta_{R_i}$ by stable fitting of the sets $R_{it}$, $t = 1,\dots,T$, $i = 1,\dots,n$.

Step 2: Evaluate the portfolio parameters $\sigma_{R_P}$ and $\beta_{R_P}$:

$$\hat\sigma_{R_P} = \left(\sum_{i=1}^{n} \left(|w_i|\,\hat\sigma_{R_i}\right)^{\alpha}\right)^{1/\alpha}, \tag{5}$$

$$\hat\beta_{R_P} = \frac{\sum_{i=1}^{n} \operatorname{sign}(w_i)\,\hat\beta_{R_i}\left(|w_i|\,\hat\sigma_{R_i}\right)^{\alpha}}{\sum_{i=1}^{n} \left(|w_i|\,\hat\sigma_{R_i}\right)^{\alpha}}. \tag{6}$$

Step 3: Having estimates of the parameters of the portfolio credit risk, $\hat\sigma_{R_P}$ and $\hat\beta_{R_P}$, compute the portfolio VaR as the negative of the appropriate quantile of the $R_P$-distribution.

As an illustration of the method, portfolio risk is estimated for equally weighted returns on indices of the investment grade corporate bonds: C1A1, C2A1, C3A1, C4A1, C1A2, C2A2, C3A2, C4A2, C1A3, C2A3, C3A3, C4A3, C1A4, C2A4, C3A4, and C4A4.[37] A description of the indices is given in Table 10 of Section 5. By assumption, the indices are (i) characterized by the same tail index and (ii) independent. Fix $\alpha$ at 1.708, the average of the $\alpha$ values for the bond return series (see Table 10), and recalculate the other stable parameters: $\beta_{R_i}$, $\mu_{R_i}$, and $\sigma_{R_i}$. New estimates are reported in Table 12. The condition requiring the same tail index for all analyzed series does not appear to be very restrictive: the new parameter estimates (given in Table 12) do not differ much from the previous parameter estimates (reported in Table 10); the new stable VaR estimates (see Table B.2 in Appendix B) are close to the initial stable VaR measurements (Table 11). The $\mu$ estimates are all small. Further on, we shall assume $\mu = 0$. Portfolio parameters following formulas (5), (6) are $\hat\sigma_{R_P} = 0.659$, $\hat\beta_{R_P} = -0.125$. Thus, $R_P \sim S_{1.708}(0.659, -0.125, 0)$.

[37] A digit after letter "C" denotes the maturity band: 1 - from 1 to 3 years, 2 - from 3 to 5 years, 3 - from 5 to 7 years, 4 - from 7 to 10 years; a digit after letter "A" denotes credit rating: 1 - "AAA", 2 - "AA", 3 - "A", 4 - "BBB".

Table 12. Stable fitting of the bond indices with fixed $\alpha$ (stable parameters at $\alpha = 1.708$)

| Bond index | Maturity (years) | $\beta$ | $\mu$ | $\sigma$ |
|---|---|---|---|---|
| C1A1 | 1-3 | -0.084 | 0.027 | 0.054 |
| C2A1 | 3-5 | -0.111 | 0.029 | 0.099 |
| C3A1 | 5-7 | -0.116 | 0.031 | 0.144 |
| C4A1 | 7-10 | -0.146 | 0.031 | 0.188 |
| C1A2 | 1-3 | -0.107 | 0.027 | 0.057 |
| C2A2 | 3-5 | -0.105 | 0.029 | 0.103 |
| C3A2 | 5-7 | -0.098 | 0.033 | 0.148 |
| C4A2 | 7-10 | -0.128 | 0.032 | 0.194 |
| C1A3 | 1-3 | -0.144 | 0.027 | 0.057 |
| C2A3 | 3-5 | -0.120 | 0.030 | 0.104 |
| C3A3 | 5-7 | -0.125 | 0.032 | 0.149 |
| C4A3 | 7-10 | -0.151 | 0.032 | 0.196 |
| C1A4 | 1-3 | -0.118 | 0.029 | 0.054 |
| C2A4 | 3-5 | -0.045 | 0.033 | 0.098 |
| C3A4 | 5-7 | -0.128 | 0.035 | 0.141 |
| C4A4 | 7-10 | -0.143 | 0.035 | 0.180 |

The portfolio c% VaR is calculated as the negative of the $(1 - c)$-th quantile of the $R_P$-distribution. For the analyzed portfolio, the 99% VaR equals 3.518 and the 95% VaR equals 1.757.

As credit returns typically have a non-negative dependence structure, the assumption of independence for single credit returns results in the lowest VaR measurement, the lower bound for portfolio VaR estimates. The upper bound of the portfolio VaR measurements is given by the non-diversified VaR, the sum of the stand-alone VaR values.[38] For our portfolio, the non-diversified stable 99% VaR is 9.813 and the non-diversified stable 95% VaR is 4.733. The analysis in Section 5 showed that the 99% stable VaR estimates slightly exceed the empirical 99% VaR, whereas the 95% stable VaR evaluation underestimates the empirical 95% VaR. Therefore, 9.813 is an upward-biased estimate of the portfolio non-diversified 99% VaR and 4.733 is a downward-biased measurement of the portfolio non-diversified 95% VaR.
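The closed-form aggregation of Section 6 is easy to check numerically. The sketch below evaluates the scale and skewness formulas with the Table 12 parameters and sums the stand-alone stable VaR figures from Table 11. One caveat, labeled as an inference rather than something the text states: the reported portfolio values (0.659, -0.125) and the non-diversified VaR figures are matched when every weight is set to 1, i.e. the portfolio return is the plain sum of the sixteen index returns. Function and variable names are illustrative.

```python
import math

ALPHA = 1.708  # common tail index fixed in the text

# (beta, sigma) per index, copied from Table 12.
TABLE_12 = {
    "C1A1": (-0.084, 0.054), "C2A1": (-0.111, 0.099), "C3A1": (-0.116, 0.144),
    "C4A1": (-0.146, 0.188), "C1A2": (-0.107, 0.057), "C2A2": (-0.105, 0.103),
    "C3A2": (-0.098, 0.148), "C4A2": (-0.128, 0.194), "C1A3": (-0.144, 0.057),
    "C2A3": (-0.120, 0.104), "C3A3": (-0.125, 0.149), "C4A3": (-0.151, 0.196),
    "C1A4": (-0.118, 0.054), "C2A4": (-0.045, 0.098), "C3A4": (-0.128, 0.141),
    "C4A4": (-0.143, 0.180),
}
# Stand-alone stable VaR of the same 16 indices, copied from Table 11.
STABLE_VAR_99 = [0.284, 0.509, 0.734, 0.931, 0.285, 0.505, 0.689, 0.890,
                 0.286, 0.530, 0.719, 0.949, 0.290, 0.511, 0.741, 0.960]
STABLE_VAR_95 = [0.119, 0.236, 0.353, 0.467, 0.125, 0.244, 0.355, 0.474,
                 0.125, 0.248, 0.361, 0.485, 0.119, 0.228, 0.343, 0.451]


def portfolio_scale_skew(weights, sigmas, betas, alpha):
    """Scale and skewness of a weighted sum of independent alpha-stable returns."""
    terms = [(abs(w) * s) ** alpha for w, s in zip(weights, sigmas)]
    total = sum(terms)
    sigma_p = total ** (1.0 / alpha)
    beta_p = sum(math.copysign(1.0, w) * b * t
                 for w, b, t in zip(weights, betas, terms)) / total
    return sigma_p, beta_p


betas = [b for b, _ in TABLE_12.values()]
sigmas = [s for _, s in TABLE_12.values()]
unit_weights = [1.0] * len(sigmas)  # assumption inferred from the reported numbers

sigma_p, beta_p = portfolio_scale_skew(unit_weights, sigmas, betas, ALPHA)
print(round(sigma_p, 3), round(beta_p, 3))
print(round(sum(STABLE_VAR_99), 3), round(sum(STABLE_VAR_95), 3))
```

The portfolio VaR itself still requires a quantile of the stable law with these parameters, which this sketch does not compute.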
[38] The stand-alone VaR is the VaR for the individual asset.

7. Stable modeling of portfolio risk for symmetric dependent credit returns

In this section we suppose that distributions of credit returns are symmetric $\alpha$-stable and dependent. We interpret a symmetric stable random variable as a transformation of a normal random variable. Based on this interpretation, we develop a new methodology for correlation estimation. We apply the methodology for portfolio risk assessment.

We evaluate portfolio risk by determining portfolio VaR: (i) simulating a distribution of the $R_P = \sum_{i=1}^{n} w_i R_i$ values; (ii) finding a certain quantile of the $R_P$ distribution, say, the 1% quantile, which corresponds to the 99% VaR confidence level.

The aim of simulations is to project possible portfolio return values $R_P$ at time $T + 1$ given: (i) observations of individual returns over time: $R_{i1}, R_{i2}, \dots, R_{iT}$, $i = 1,\dots,n$; (ii) weights of portfolio assets $w_1,\dots,w_n$. The simulations must account for dependence among individual credit returns $R_i$, $i = 1,\dots,n$. A traditional approach to quantifying dependence is to calculate the covariance matrix. Under the $\alpha$-stable assumption for the distributions of $R_i$, the covariance matrix cannot be computed, because second moments are infinite for $\alpha < 2$.

We suggest a new method for deriving the dependence (association) structure. The method assumes that the $R_i$ are symmetric strictly stable: $R_i \sim S_{\alpha_{R_i}}(\sigma_{R_i}, 0, 0)$. A symmetric $\alpha$-stable (S$\alpha$S) random variable can be interpreted as a random rescaling transformation of a normal random variable (see Property 1 below). If a collection of S$\alpha$S variables is obtained by applying a similar transformation to dependent normal variables, the dependence structure among the variables will remain. Thus, the dependence among S$\alpha$S random variables can be explained by the dependence among the underlying normal random variables.
Property 1.39 Assume that: (i) G is a normal random variable with a zero mean: G S2(G,0,0) = N 0,22 G , (ii) Y is a symmetric -stable random variable, < 2: Y S(Y ,0,0), (iii) S is a positive 2 -stable random variable: S S/2 2 Y 2 G cos 4 2/ ,1,0 , (iv) S and G are independent. Then, the symmetric -stable random variable Y can be represented as a random rescaling transformation of the normal random variable G: Y = S1/2 G. Simulations of the portfolio return values RP can be divided into two fragments: 39 Property 1 is a slightly modified version of Proposition 1.3.1 in Samorodnitsky and Taqqu (1994). 292 S.T. Rachev et al. (i) generating individual returns Ri with the same dependence structure as the Ri's. We derive the dependence among Ri supposing that Ri SRi (Ri ,0,0). Based on Property 1, Ri can be expressed as a transformation of a normal random variable: Ri = S 1/2 i Gi, (7) where Gi S2(Gi ,0,0) = N 0,22 Gi , (8) Si SRi /2 2 Ri 2 Gi cos 4 2/Ri ,1,0 , (9) Si is independent of Gi, i = 1,...,n. Random rescaling transformations of normal variables Gi into Ri preserve the dependence structure. Hence, the dependence among Ri can be explained by the dependence among Gi, i = 1,...,n. Based on this property, we generate dependent normal variables Gi, maintaining the initial dependence,40 then, we generate Ri = S 1/2 i Gi, where Si is a simulated value of Si ; (ii) computing RP = n i=1 wiRi. The simulations are performed according to the following algorithm:41 Step 1: Estimate stable parameters of Ri: Ri , Ri , Ri .42 Step 2: "Center" the Ri observations: R i = Ri - Ri . Further on, we shall assume Ri = 0 and consider R i as Ri : Ri SRi (Ri ,0,0), i = 1,...,n. Step 3: Assume: (i) Ri can be decomposed according to expressions (7)­(9); (ii) the covariance matrix of (Gi)1 i n is equal to the covariance matrix of truncated (Ri)1 i n. 
Evaluate the covariance matrix of $(G_i)_{1 \le i \le n}$ at time $T+1$, $\Sigma_{T+1} = \{c^2_{ij,T+1|T}\}$, $i = 1, \ldots, n$, $j = 1, \ldots, n$, using exponential weighting:
$$c^2_{i,T+1|T} = (1-\lambda)\sum_{k=0}^{K} \lambda^k R^2_{i,T-k}, \tag{10}$$
$$c^2_{ij,T+1|T} = (1-\lambda)\sum_{k=0}^{K} \lambda^k R_{i,T-k} R_{j,T-k}, \tag{11}$$
where $T+1|T$ denotes a forecast for time $T+1$ conditional on information up to time $T$; $\lambda$ is a decay factor, $0 < \lambda < 1$; $K$ is the number of lagged observations. Exponential weighting (10), (11) accounts for volatility and correlation clustering (GARCH effects).⁴³ Formulas (10), (11) can be expressed in recursive (GARCH-type) form:⁴⁴
$$c^2_{i,T+1|T} = \lambda c^2_{i,T|T-1} + (1-\lambda) R^2_{i,T},$$
$$c^2_{ij,T+1|T} = \lambda c^2_{ij,T|T-1} + (1-\lambda) R_{i,T} R_{j,T}.$$
Step 4: Generate a value of the multivariate normal random variable $G = (G_1, G_2, \ldots, G_n)$ with the covariance matrix $\Sigma_{T+1}$.
Step 5: Simulate values of the stable random variables
$$S_i \sim S_{\alpha_{R_i}/2}\left(\frac{2\sigma_{R_i}^2}{c_i^2}\left(\cos\frac{\pi\alpha_{R_i}}{4}\right)^{2/\alpha_{R_i}}, 1, 0\right), \quad i = 1, \ldots, n.$$
Step 6: Compute $R_i = S_i^{1/2} G_i$, $i = 1, \ldots, n$.
Step 7: Calculate $R_P = \sum_{i=1}^{n} w_i R_i$.
Step 8: Repeat Steps 4-7 a large number of times to form an $R_P$-distribution.
Obtain a portfolio VaR measurement as the negative of a specified quantile of the $R_P$-distribution.

⁴⁰ The variables $G_i$, which enter formulas (7) and (8), are not observable. We suppose that the dependence structure of the Gaussian variables $(G_i)_{1 \le i \le n}$ is "inherited" from the dependence structure of the truncated values of the stable variables $(R_i)_{1 \le i \le n}$. Because we believe that the "outliers" are very important for the description of the dependence structure, we take the truncation value for $R_i$ sufficiently large.
⁴¹ The algorithm is implemented in the Mercury Software Package for Market Risk (VaR). See Rachev et al. (1999).
⁴² This section assumes $\beta_{R_i} = 0$.

We evaluate portfolio risk for equally weighted returns on indices of investment grade corporate bonds: C1A1, C2A1, C3A1, C4A1, C1A2, C2A2, C3A2, C4A2, C1A3, C2A3, C3A3, C4A3, C1A4, C2A4, C3A4, and C4A4.
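Steps 3-8 of the algorithm above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used for the reported results: the function names are hypothetical, no truncation is applied in Step 3, all available observations are used ($K = T - 1$), and the positive stable variable of Step 5 is drawn with Kanter's representation, which is our choice of sampler rather than something the text prescribes.

```python
import math
import random

def ewma_cov(returns, decay):
    """Exponentially weighted second moments, formulas (10)-(11).

    returns: list of T rows (oldest first), each a list of n centered returns.
    All T rows are used, i.e. K = T - 1 (an assumption of this sketch).
    """
    n = len(returns[0])
    cov = [[0.0] * n for _ in range(n)]
    for k, row in enumerate(reversed(returns)):      # k = 0 is the newest row
        w = (1.0 - decay) * decay ** k
        for i in range(n):
            for j in range(n):
                cov[i][j] += w * row[i] * row[j]
    return cov

def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def positive_stable(index, rng):
    """Kanter's sampler: A > 0 with E[exp(-t*A)] = exp(-t**index), 0 < index < 1.

    In the S_alpha(sigma, beta, mu) notation, A ~ S_index(cos(pi*index/2)**(1/index), 1, 0).
    """
    u = rng.uniform(0.0, math.pi)
    e = rng.expovariate(1.0)
    return (math.sin(index * u) / math.sin(u) ** (1.0 / index)
            * (math.sin((1.0 - index) * u) / e) ** ((1.0 - index) / index))

def simulate_portfolio(returns, alpha, sigma, weights, decay, n_sims, seed=0):
    """Simulated portfolio returns under the symmetric stable model (alpha_i < 2)."""
    rng = random.Random(seed)
    cov = ewma_cov(returns, decay)                   # Step 3 (no truncation)
    low = cholesky(cov)
    n = len(weights)
    sims = []
    for _ in range(n_sims):                          # Step 8
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]  # Step 4
        rp = 0.0
        for i in range(n):
            # Step 5: S_i = (2 sigma_i^2 / c_i^2) * A, with A of index alpha_i / 2
            s = 2.0 * sigma[i] ** 2 / cov[i][i] * positive_stable(alpha[i] / 2.0, rng)
            rp += weights[i] * math.sqrt(s) * g[i]   # Steps 6-7
        sims.append(rp)
    return sims

def value_at_risk(sims, level=0.99):
    """VaR as the negative of the empirical (1 - level) quantile."""
    ordered = sorted(sims)
    return -ordered[int((1.0 - level) * len(ordered))]
```

A call such as `value_at_risk(simulate_portfolio(history, alphas, sigmas, weights, 0.94, 10000))` would then give a 99% VaR estimate, where `history`, `alphas` and `sigmas` stand for the centered return observations and the fitted stable parameters of Steps 1-2.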
A description of the indices is given in Table 10 of Section 5. We assume that returns on these indices are symmetric $\alpha$-stable. We compute the 99% and 95% VaR measurements in two procedures:
(i) simulation of portfolio returns following the algorithm described above;
(ii) calculation of the 99% (95%) VaR as the negative of the 1% (5%) quantile.
In Step 3 of the portfolio return simulations, the derivation of the covariance matrix $\Sigma_{T+1}$, we used different truncation points and decay factor values. In order to estimate the accuracy of the simulations, we calculate the Kolmogorov distance (KD) and Anderson-Darling (AD) statistics:
$$KD = \sup_x \left|F_e(x) - F_s(x)\right|, \qquad AD = \sup_x \frac{\left|F_e(x) - F_s(x)\right|}{\sqrt{F_e(x)\left(1 - F_e(x)\right)}},$$
where $F_e(x)$ is the empirical cumulative distribution function (cdf) and $F_s(x)$ is the simulated cdf. The computation results are summarized in Table 13.

⁴³ The exponential weighting methodology follows the RiskMetrics exponentially weighted moving average model. See Longerstaey and Zangari (1996).
⁴⁴ The formulas are adapted from Longerstaey and Zangari (1996).

Table 13
Portfolio VaR for symmetric dependent credit returns

Decay    Truncation   Portfolio VaR         Kolmogorov   Anderson-
factor   points (%)   99% VaR    95% VaR    distance     Darling
0.85     10-90        7.508      4.886      3.880        0.086
         5-95         7.777      5.153      3.736        0.093
         No           8.286      5.346      4.859        0.111
0.94     10-90        7.793      5.147      3.556        0.081
         5-95         8.076      5.248      4.362        0.104
         1-99         8.389      5.434      5.650        0.128
         No           8.114      5.252      5.212        0.117
0.975    10-90        8.028      5.036      3.452        0.077
         5-95         8.166      5.318      9.085        0.234
         1-99         8.469      5.493      5.805        0.130
         No           8.516      5.470      7.274        0.167

The 99% VaR estimates in Table 13 are within the 99% VaR range (3.518, 9.813) derived in Section 6. At each truncation band, increasing the decay factor generally leads to higher values of the 99% VaR; the only exception is the no-truncation case, where the VaR dips at the decay factor of 0.94. At each value of the decay factor, in general, reducing the number of truncated observations produced higher VaR numbers.
We explain the latter observation by positive correlation in the tails (concurrent extreme events): considering a larger number of tail observations results in a higher VaR. The KD and AD statistics, in general, decline with smaller decay factors.

We examine how the selection of the decay factor and the truncation method affects the estimation of marginal risks. The marginal risk is the risk added by an asset to the portfolio risk. It is computed as the difference between the portfolio risk with the analyzed asset and the portfolio risk without it. We report the results in Table 14. The decay factor of 0.85 produces no cases of "Marginal VaR > Stand-alone VaR" and no cases of "Within one maturity band, higher ratings contribute more risk". In sum, the decay factor of 0.85 results in lower KD and AD statistics and does not lead to irregular cases; the no-truncation method better accounts for correlation in the tails. Hence, we would recommend the decay factor of 0.85 and the no-truncation method.

In Table 15 we report the marginal 99% VaR, the stand-alone 99% VaR, and the diversification effects (the stand-alone VaR less the marginal VaR) at the decay factor of 0.85 with the no-truncation method. The marginal VaR estimates of Table 15 are consistent with the expectation that, for a given credit rating, bonds with longer maturities contribute more risk. Having the marginal VaR numbers, we can identify concentration risks. We find that the C4A3 bond index makes the highest addition to the portfolio 99% VaR: the C4A3 marginal VaR of 0.920 exceeds all other marginal VaRs. Marginal risks for all bond indices are smaller than stand-alone risks, which indicates that diversification indeed reduces risk. From Table 15, we notice that the C4A1 and C3A4 bond indices have the highest diversification effects.

Table 14
Marginal risk for symmetric dependent credit returns

Decay    Truncation   Cases: Marginal VaR >   Cases: Higher ratings assets
factor   (%)          Stand-alone VaR         contribute more risk
0.85     10-90        0                       0
         5-95         0                       0
         No           0                       0
0.94     10-90        0                       0
         5-95         0                       0
         1-99         3                       2
         No           0                       2
0.975    10-90        0                       0
         5-95         0                       4
         1-99         2                       4
         No           3                       4

Table 15
Marginal VaR, stand-alone 99% VaR, and diversification effects for bond indices (decay factor = 0.85, no truncation)

Bond indices   Marginal VaR   Stand-alone VaR   Diversification effect
C1A1           0.199          0.284             0.085
C2A1           0.338          0.509             0.171
C3A1           0.572          0.734             0.162
C4A1           0.713          0.931             0.218
C1A2           0.245          0.285             0.040
C2A2           0.494          0.505             0.011
C3A2           0.575          0.689             0.114
C4A2           0.788          0.890             0.102
C1A3           0.190          0.286             0.096
C2A3           0.403          0.530             0.127
C3A3           0.592          0.719             0.127
C4A3           0.920          0.949             0.029
C1A4           0.185          0.290             0.105
C2A4           0.338          0.511             0.173
C3A4           0.522          0.741             0.219
C4A4           0.803          0.960             0.157

We have studied stable modeling of portfolio risk under the assumptions of independent and of symmetric dependent instruments. In the next section we consider portfolio risk evaluation in the most general case: skewed dependent instruments.

8. Stable modeling of portfolio risk for skewed dependent credit returns

We quantify portfolio risk $R_P$ by generating a distribution of its possible values and deriving a portfolio VaR from the constructed distribution of $R_P$. In the case of portfolio assets with skewed dependent credit returns, simulations of the $R_P$ values should reflect the "cumulative" skewness and maintain the dependence (association) among the returns. In order to do that, we decompose single credit returns $R_i$ into two independent parts: the first part accounts for dependence and the second for skewness. Then we obtain the portfolio dependence and skewness components by separately aggregating the dependence and skewness parts of the individual credit returns.
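Simulations of the portfolio credit return values $R_P$ can be divided into three portions: (i) generation of the portfolio dependence component, maintaining the dependence structure among individual credit returns; (ii) generation of the portfolio skewness component; and (iii) computation of $R_P$ as a sum of the two generated components. Explanations of our methodology are provided below.

A stable random variable $R \sim S_\alpha(\sigma, \beta, 0)$ can be decomposed (in distribution) into two independent stable random variables $R^{(1)}$ and $R^{(2)}$:
$$R \stackrel{d}{=} R^{(1)} + R^{(2)},$$
where $R^{(1)} \sim S_\alpha(\sigma_1, \beta_1, 0)$, $R^{(2)} \sim S_\alpha(\sigma_2, \beta_2, 0)$,
$$\sigma = \left(\sigma_1^\alpha + \sigma_2^\alpha\right)^{1/\alpha}, \tag{12}$$
$$\beta = \frac{\beta_1\sigma_1^\alpha + \beta_2\sigma_2^\alpha}{\sigma_1^\alpha + \sigma_2^\alpha}. \tag{13}$$
Suppose that: (i) $R^{(1)}$ is a symmetric stable variable: $\beta_1 = 0$; (ii) $\sigma_1 = \sigma_2 = \sigma^*$. Then formulas (12) and (13) reduce to the following expressions:
$$\sigma = 2^{1/\alpha}\sigma^*, \tag{14}$$
$$\beta = \tfrac{1}{2}\beta_2. \tag{15}$$
From Equations (14) and (15), we have $\sigma^* = 2^{-1/\alpha}\sigma$ and $\beta_2 = 2\beta$. In sum, a stable random variable $R \sim S_\alpha(\sigma, \beta, 0)$ can be decomposed (in distribution) into two independent stable random variables, a symmetric $R^{(1)}$ and a skewed $R^{(2)}$:
$$R \stackrel{d}{=} R^{(1)} + R^{(2)}, \tag{16}$$
where
$$R^{(1)} \sim S_\alpha\left(2^{-1/\alpha}\sigma, 0, 0\right), \tag{17}$$
$$R^{(2)} \sim S_\alpha\left(2^{-1/\alpha}\sigma, 2\beta, 0\right). \tag{18}$$

Using methodology (16)-(18), we can divide the individual credit returns $R_i \sim S_{\alpha_{R_i}}(\sigma_{R_i}, \beta_{R_i}, 0)$ into "dependence" and "skewness" parts. First, we partition $R_i$ into "symmetry" and "skewness" fragments:
$$R_i \stackrel{d}{=} R_i^{(1)} + R_i^{(2)},$$
where
$$R_i^{(1)} \sim S_{\alpha_{R_i}}\left(2^{-1/\alpha_{R_i}}\sigma_{R_i}, 0, 0\right), \qquad R_i^{(2)} \sim S_{\alpha_{R_i}}\left(2^{-1/\alpha_{R_i}}\sigma_{R_i}, 2\beta_{R_i}, 0\right),$$
and the parts $R_i^{(1)}$ and $R_i^{(2)}$ are independent, $i = 1, \ldots, n$. Second, we suppose that: (i) the $R_i^{(1)}$, $i = 1, \ldots, n$, are dependent and (ii) the $R_i^{(2)}$, $i = 1, \ldots, n$, are independent. Consequently, the symmetric terms $R_i^{(1)}$ explain the dependence (association) among the $R_i$'s and the terms $R_i^{(2)}$ account for the skewness of the $R_i$'s.

Based on Property 1 (see Section 7), $R_i^{(1)} \sim S_{\alpha_{R_i}}(2^{-1/\alpha_{R_i}}\sigma_{R_i}, 0, 0)$ can be written as a transformation of a normal random variable:
$$R_i^{(1)} = S_i^{1/2} G_i,$$
where
$$G_i \sim S_2(\sigma_{G_i}, 0, 0) = N(0, 2\sigma_{G_i}^2),$$
$$S_i \sim S_{\alpha_{R_i}/2}\left(2^{-2/\alpha_{R_i}}\frac{\sigma_{R_i}^2}{\sigma_{G_i}^2}\left(\cos\frac{\pi\alpha_{R_i}}{4}\right)^{2/\alpha_{R_i}}, 1, 0\right),$$
and $S_i$ is independent of $G_i$, $i = 1, \ldots, n$.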
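As a consistency check on decomposition (16)-(18), one can substitute the parameters of the two parts back into the general sum formulas (12) and (13). The short derivation below is our own verification, not part of the original argument; it also makes explicit that the split is available only when $|\beta| \le 1/2$, because the skewness $2\beta$ of the second part must lie in $[-1, 1]$.

```latex
\sigma_1 = \sigma_2 = 2^{-1/\alpha}\sigma, \quad \beta_1 = 0, \quad \beta_2 = 2\beta
\;\Longrightarrow\;
\left(\sigma_1^{\alpha} + \sigma_2^{\alpha}\right)^{1/\alpha}
  = \left(\tfrac{1}{2}\sigma^{\alpha} + \tfrac{1}{2}\sigma^{\alpha}\right)^{1/\alpha}
  = \sigma,
\qquad
\frac{\beta_1\sigma_1^{\alpha} + \beta_2\sigma_2^{\alpha}}{\sigma_1^{\alpha} + \sigma_2^{\alpha}}
  = \frac{0 + 2\beta \cdot \tfrac{1}{2}\sigma^{\alpha}}{\sigma^{\alpha}}
  = \beta .
```

For illustration (the numbers are hypothetical), $R \sim S_{1.5}(0.05, -0.08, 0)$ would split into a symmetric part with scale $2^{-1/1.5} \cdot 0.05 \approx 0.0315$ and a skewed part with the same scale and skewness $-0.16$.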
Random rescaling transformations of the normal variables $G_i$ into $R_i^{(1)}$ maintain the dependence structure. Therefore, from the dependence among the $G_i$'s we can determine the dependence among the $R_i^{(1)}$, or the dependence among the $R_i$. Adding separately the dependence and skewness terms of the $R_i$'s, we obtain the two components of the portfolio returns $R_P$:
$$R_P = R_P^{(1)} + R_P^{(2)}, \tag{19}$$
where
$$R_P^{(1)} = \sum_{i=1}^{n} w_i R_i^{(1)} = \sum_{i=1}^{n} w_i S_i^{1/2} G_i$$
is the "dependence" component and
$$R_P^{(2)} = \sum_{i=1}^{n} w_i R_i^{(2)}$$
is the "skewness" component.

We simulate the $R_P$ values based on decomposition (19): $R_P = R_P^{(1)} + R_P^{(2)}$. The simulations are executed according to the following algorithm:⁴⁵
Step 1: Estimate the stable parameters of $R_i$: $\alpha_{R_i}$, $\sigma_{R_i}$, $\beta_{R_i}$, $\mu_{R_i}$.
Step 2: "Center" the $R_i$ observations: $R_i' = R_i - \mu_{R_i}$. Further on, we shall assume $\mu_{R_i} = 0$ and consider $R_i'$ as $R_i$: $R_i \sim S_{\alpha_{R_i}}(\sigma_{R_i}, \beta_{R_i}, 0)$, $i = 1, \ldots, n$.
Step 3: Evaluate the covariance matrix of the normal random variables $(G_i)_{1 \le i \le n}$ at time $T+1$, $\Sigma_{T+1} = \{c^2_{ij,T+1|T}\}$, $i = 1, \ldots, n$, $j = 1, \ldots, n$, using exponential weighting:
$$c^2_{i,T+1|T} = (1-\lambda)\sum_{k=0}^{K} \lambda^k R^2_{i,T-k},$$
$$c^2_{ij,T+1|T} = (1-\lambda)\sum_{k=0}^{K} \lambda^k R_{i,T-k} R_{j,T-k},$$
where $T+1|T$ denotes a forecast for time $T+1$ conditional on information up to time $T$; $\lambda$ is a decay factor, $0 < \lambda < 1$; $K$ is the number of lagged observations.
Step 4: Generate a value of the multivariate normal random variable $G = (G_1, G_2, \ldots, G_n)$ with the covariance matrix $\Sigma_{T+1}$.
Step 5: Simulate values of the stable random variables
$$S_i \sim S_{\alpha_{R_i}/2}\left(2^{1-2/\alpha_{R_i}}\frac{\sigma_{R_i}^2}{c_i^2}\left(\cos\frac{\pi\alpha_{R_i}}{4}\right)^{2/\alpha_{R_i}}, 1, 0\right), \quad i = 1, \ldots, n.$$
Step 6: Compute $R_i^{(1)} = S_i^{1/2} G_i$, $i = 1, \ldots, n$.
Step 7: Generate $R_i^{(2)} \sim S_{\alpha_{R_i}}(2^{-1/\alpha_{R_i}}\sigma_{R_i}, 2\beta_{R_i}, 0)$, $i = 1, \ldots, n$.
Step 8: Calculate $R_P = \sum_{i=1}^{n} w_i R_i^{(1)} + \sum_{i=1}^{n} w_i R_i^{(2)}$.
Step 9: Repeat Steps 4-8 a large number of times to form an $R_P$-distribution.
Derive a portfolio VaR estimate as the negative of a chosen quantile of the $R_P$-distribution.

We implement the suggested procedure (Step 1-Step 9) for the risk assessment of the same portfolio of indices as in Section 7.
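Step 7 and the second sum of Step 8 require draws from a skewed stable law. The sketch below shows one way this could be done; it is not taken from the original implementation. It assumes the Chambers-Mallows-Stuck representation for $S_\alpha(\sigma, \beta, 0)$ with $\alpha \ne 1$, and the function names `split_stable`, `stable_rv` and `skewness_component` are hypothetical.

```python
import math
import random

def split_stable(alpha, sigma, beta):
    """Parameters (sigma, beta) of the symmetric and skewed parts, formulas (17)-(18)."""
    if abs(beta) > 0.5:
        raise ValueError("the split needs |beta| <= 1/2 so that |2*beta| <= 1")
    s = 2.0 ** (-1.0 / alpha) * sigma
    return (s, 0.0), (s, 2.0 * beta)

def stable_rv(alpha, sigma, beta, rng):
    """One draw from S_alpha(sigma, beta, 0), alpha != 1 (Chambers-Mallows-Stuck)."""
    v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    t = beta * math.tan(math.pi * alpha / 2.0)
    b = math.atan(t) / alpha
    scale = (1.0 + t * t) ** (1.0 / (2.0 * alpha))
    x = (scale * math.sin(alpha * (v + b)) / math.cos(v) ** (1.0 / alpha)
         * (math.cos(v - alpha * (v + b)) / w) ** ((1.0 - alpha) / alpha))
    return sigma * x

def skewness_component(alpha, sigma, beta, weights, rng):
    """One draw of sum_i w_i * R_i^(2) with independent skewed parts (Steps 7-8)."""
    total = 0.0
    for a, s, b, w in zip(alpha, sigma, beta, weights):
        _, (s2, b2) = split_stable(a, s, b)
        total += w * stable_rv(a, s2, b2, rng)
    return total
```

In a full simulation, each draw of `skewness_component` would be added to a draw of the dependence component, which can be generated as in Section 7 with the scale $2^{-1/\alpha_{R_i}}\sigma_{R_i}$ in place of $\sigma_{R_i}$.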
We suppose that returns on the indices are dependent skewed $\alpha$-stable. The portfolio VaR estimates are presented in Table 16. The 99% portfolio VaR estimates fall within the 99% VaR range (3.518, 9.813) of Section 6. From Table 16, the VaR magnitude generally: (i) increases when the decay factor increases from 0.85 to 0.94; (ii) declines when the decay factor changes from 0.94 to 0.975. Thus, the decay factor of 0.94 leads to more conservative VaR estimates. The 1%-99% truncation band appears to produce the lowest KD and AD statistics. Based on our observations, we would recommend employing the decay factor of 0.94 and the 1%-99% truncation band in VaR derivations under the assumption of skewed dependent credit returns.

⁴⁵ This algorithm is an extended version of the algorithm in Section 7.

Table 16
Portfolio VaR for skewed dependent credit returns

Decay    Truncation   Portfolio VaR         Kolmogorov   Anderson-
factor   points (%)   99% VaR    95% VaR    distance     Darling
0.85     10-90        4.939      2.904      7.22         0.20
         5-95         5.380      3.162      5.64         0.18
         No           5.449      3.236      5.43         0.17
0.94     10-90        5.101      3.009      6.53         0.19
         5-95         5.456      3.248      5.24         0.17
         1-99         5.596      3.363      4.70         0.14
         No           5.455      3.231      5.13         0.17
0.975    10-90        5.112      3.021      6.54         0.19
         5-95         5.416      3.238      5.34         0.17
         1-99         5.471      3.307      4.37         0.14
         No           5.298      3.238      5.43         0.15

We computed marginal VaRs for the same combinations of the decay factor and the truncation band as in Table 16. The marginal VaR estimates were smaller than the corresponding stand-alone VaR measurements, which supports the feasibility of the suggested procedure for simulating portfolio returns.

We have applied stable modeling to the total risk assessment of credit returns. Below we analyze stable modeling of isolated credit risk.

9. One-factor model of portfolio credit risk

In this section we outline a one-factor model for quantifying portfolio credit risk.
The model is built on two postulates: (i) the constituent parts of the credit returns are the credit-risk-free part and the credit risk premium; (ii) the credit risk spread follows a stable law. Applying the one-factor model, in the following sections we quantify credit risk for single instruments and then estimate portfolio credit risk as a cumulative result of stable distributed individual credit risks.

As in the previous sections, we assume that a portfolio includes $n$ assets. Then the portfolio return is given by
$$R_P = \sum_{i=1}^{n} w_i R_i,$$
where $R_i$ is the return on the $i$-th asset, $w_i$ is the weight of the $i$-th asset, $i = 1, \ldots, n$, and $\sum_{i=1}^{n} w_i = 1$. We conjecture that the individual returns $R_i$ depend on one credit-risk-free factor $Y_i$:
$$R_i = a_i + b_i Y_i + U_i, \tag{20}$$
where $a_i$ and $b_i$ are constants and $U_i$ is the residual representing compensation for credit risk and random noise,⁴⁶ $i = 1, \ldots, n$.

Suppose the $i$-th portfolio instrument is a corporate bond of maturity $\tau$ with returns $R_i$. There are two possible choices for an underlying credit-risk-free factor $Y_i$: (i) returns on a Treasury bond of the same maturity $\tau$; (ii) returns on a $\tau$-year bond with a credit rating AAA. Then the spread $U_i = R_i - a_i - b_i Y_i$ reflects charges for credit risk. If the $j$-th portfolio asset is a swap with a counterparty that has a low credit rating, say BBB, we can choose, as an underlying factor $Y_j$, returns on a similar swap with a company that has a credit rating AAA:
$$R_j = a_j + b_j Y_j + U_j,$$
where the term $U_j$ accounts for the credit risk of the BBB swap.

We impose the following assumptions on the components of model (20):
(i) Credit risk spreads $U_i$ are strictly stable, $U_i \sim S_{\alpha_{U_i}}(\sigma_{U_i}, \beta_{U_i}, 0)$,⁴⁷ $\alpha_{U_i} > 1$.
(ii) Default-free factors $Y_i$ are strictly stable, $Y_i \sim S_{\alpha_{Y_i}}(\sigma_{Y_i}, \beta_{Y_i}, 0)$,⁴⁸ $\alpha_{Y_i} > 1$.
(iii) $U_i$ and $Y_i$ are independent of each other, $i = 1, \ldots, n$.
Then the portfolio return $R_P$ can be decomposed into three components:
$$R_P = A + Y_P + U_P,$$
where $Y_P$ expresses the aggregate effect of the underlying factors, $U_P$ represents the portfolio credit risk, and
$$A = \sum_{i=1}^{n} w_i a_i, \qquad Y_P = \sum_{i=1}^{n} w_i b_i Y_i, \qquad U_P = \sum_{i=1}^{n} w_i U_i.$$
We evaluate the portfolio credit risk $U_P$ in two steps: (i) quantifying the credit risk of each asset, $U_i$; (ii) estimating $U_P$ as a cumulative result of the individual $U_i$, $i = 1, \ldots, n$. Section 10 discusses credit risk evaluation for single portfolio assets. Section 11 examines portfolio credit risk estimation under the assumptions of independent, symmetric dependent, and skewed dependent credit risks.

⁴⁶ We interpret the yield spread as the credit risk premium and include the noise factor in the credit risk part. The noise factor could incorporate taxability, liquidity, and other premiums.
⁴⁷ The shift of $U_i$ is, in fact, incorporated in $a_i$.
⁴⁸ $Y_i$ is the centered return. If the returns of portfolio instruments, $Z_i$, are non-centered, then we take $Y_{it} = Z_{it} - \bar{Z}_i$, $t = 1, \ldots, T$.

10. Credit risk evaluation for portfolio assets

Approximations of the credit risk premium values $U_i$ for portfolio assets can be obtained using model (20):
$$U_{it} = R_{it} - \hat{a}_i - \hat{b}_i Y_{it}, \tag{21}$$
where $\hat{a}_i$ and $\hat{b}_i$ are the OLS estimates,
$$\hat{a}_i = \frac{\sum_{t=1}^{T} Y_{it}^2 \sum_{t=1}^{T} R_{it} - \sum_{t=1}^{T} Y_{it} \sum_{t=1}^{T} R_{it} Y_{it}}{T\sum_{t=1}^{T} Y_{it}^2 - \left(\sum_{t=1}^{T} Y_{it}\right)^2}, \tag{22}$$
$$\hat{b}_i = \frac{T\sum_{t=1}^{T} R_{it} Y_{it} - \sum_{t=1}^{T} Y_{it} \sum_{t=1}^{T} R_{it}}{T\sum_{t=1}^{T} Y_{it}^2 - \left(\sum_{t=1}^{T} Y_{it}\right)^2}, \tag{23}$$
$i = 1, \ldots, n$; $t = 1, \ldots, T$. The estimators $\hat{a}_i$ and $\hat{b}_i$, given by expressions (22) and (23), are unbiased.⁴⁹

We analyze the credit risk of corporate bonds by applying the one-factor model (20). Assume that returns on an index of US corporate bonds, $R_i$, are described by returns on a credit-risk-free factor, $Y_i$, and a credit spread, $U_i$:
$$R_i = a_i + b_i Y_i + U_i,$$
where $a_i$ and $b_i$ are constants, $i = 1, \ldots, 16$.
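Expressions (21)-(23) translate directly into code. The following minimal sketch uses hypothetical function names and is not the routine used for the reported estimates; it computes the OLS coefficients from the sums appearing in (22) and (23) and returns the residual series that serves as the credit risk premium.

```python
def ols_fit(y, r):
    """OLS intercept and slope of returns r on the factor y, formulas (22)-(23)."""
    t = len(y)
    sum_y, sum_r = sum(y), sum(r)
    sum_yy = sum(v * v for v in y)
    sum_ry = sum(a * b for a, b in zip(r, y))
    denom = t * sum_yy - sum_y * sum_y
    a_hat = (sum_yy * sum_r - sum_y * sum_ry) / denom
    b_hat = (t * sum_ry - sum_y * sum_r) / denom
    return a_hat, b_hat

def credit_risk_premiums(y, r):
    """Residual series U_t = R_t - a_hat - b_hat * Y_t, formula (21)."""
    a_hat, b_hat = ols_fit(y, r)
    return [ri - a_hat - b_hat * yi for yi, ri in zip(y, r)]
```

For the toy inputs `y = [0.0, 1.0, 2.0, 3.0]` and `r = [1.0, 3.1, 4.9, 7.0]`, the fit gives an intercept of 1.03 and a slope of 1.98, and the residuals sum to zero, as OLS residuals with an intercept must.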
We examine returns on the same 16 indices as in Section 5 (see Table 10): $R_i \in \{R_{C1A1}, R_{C2A1}, R_{C3A1}, R_{C4A1}, R_{C1A2}, R_{C2A2}, R_{C3A2}, R_{C4A2}, R_{C1A3}, R_{C2A3}, R_{C3A3}, R_{C4A3}, R_{C1A4}, R_{C2A4}, R_{C3A4}, R_{C4A4}\}$. We choose, as the corresponding credit-risk-free factors, returns on the indices of US government bonds in the same maturity band: $Y_i \in \{R_{G1O2}, R_{G2O2}, R_{G3O2}, R_{G4O2}\}$.⁵⁰ For example, if we consider returns on the index of bonds with maturity from one to three years, $R_{C1A1}$, then the returns on the index of government bonds with maturity from one to three years, $R_{G1O2}$, serve as the underlying credit-risk-free factor.

We approximate the percentage return values of the individual credit risks $U_i$ following approach (21): (i) run OLS regressions of model (20); (ii) compute the residual series $U_i$. The coefficients of the OLS regressions are given in Appendix B, Table B.3. The resulting OLS credit risk premiums $U_i$ are plotted in Figure 18 and in the figures of Appendix C. Empirical densities of $U_i$ are shown in Figure 19 and in Appendix C. We observe that the credit risk spread series $U_i$ exhibit volatility clusters and heavy tails. Such behavior can be captured by stable and GARCH models.

Stable modeling of the credit risk premiums $U_i$ yielded $\alpha < 1.6$, $\beta \approx 0$, and $\mu \approx 0$ (see Table 17). These parameter estimates indicate that the credit risk spreads of the corporate bond indices are fat-tailed and almost symmetric. Table 17 presents the following $\alpha$ and $\beta$ values for the credit risks of the bond indices with a maturity band from one to three years:
AAA bonds: $\alpha = 1.333$ and $\beta = 0.011$;
AA bonds: $\alpha = 1.379$ and $\beta = 0.030$;
A bonds: $\alpha = 1.393$ and $\beta = -0.021$;
BBB bonds: $\alpha = 1.412$ and $\beta = 0.004$.

⁴⁹ For an analysis of the asymptotic properties of the OLS estimators (22) and (23) under the stable distribution assumption for the disturbance term, see Götzenberger, Rachev and Schwartz (1999).
⁵⁰ The digit after the letter "G" denotes the maturity band: 1 - from 1 to 3 years, 2 - from 3 to 5 years, 3 - from 5 to 7 years, 4 - from 7 to 10 years.

Fig. 18. OLS credit risk premium of the C1A1 bond index.

Table 17
Stable and normal fitting of the OLS credit risk premiums of bond indices

OLS credit risk   Maturity   Normal                        Stable
of bond indices   (years)    Mean    Standard deviation    alpha    beta      mu       sigma
C1A1              1-3        0.0     0.045                 1.333    0.011     0.000    0.017
C2A1              3-5        0.0     0.075                 1.528    -0.089    -0.001   0.033
C3A1              5-7        0.0     0.096                 1.590    -0.023    0.000    0.047
C4A1              7-10       0.0     0.116                 1.456    -0.026    0.000    0.051
C1A2              1-3        0.0     0.037                 1.379    0.030     0.001    0.015
C2A2              3-5        0.0     0.064                 1.523    -0.074    0.000    0.029
C3A2              5-7        0.0     0.086                 1.591    -0.060    0.000    0.044
C4A2              7-10       0.0     0.110                 1.426    0.005     0.001    0.050
C1A3              1-3        0.0     0.038                 1.393    -0.021    0.000    0.015
C2A3              3-5        0.0     0.069                 1.483    -0.084    0.000    0.029
C3A3              5-7        0.0     0.098                 1.519    -0.073    0.000    0.042
C4A3              7-10       0.0     0.124                 1.366    -0.017    0.001    0.048
C1A4              1-3        0.0     0.074                 1.412    0.004     0.001    0.018
C2A4              3-5        0.0     0.096                 1.527    -0.024    0.001    0.033
C3A4              5-7        0.0     0.113                 1.552    -0.077    0.000    0.048
C4A4              7-10       0.0     0.1424                1.480    -0.055    0.001    0.055

Fig. 19. Stable and normal fitting of C1A1 OLS-credit-risks.

Plots of the stable and normal fitting of the OLS credit risk spreads $U_i$ are shown in Figure 19 and in Appendix C. The figures demonstrate that stable modeling captures the excess kurtosis and heavy tails of the credit risks $U_i$ well.

The GARCH approach models the clustering of volatilities and fat tails by expressing the conditional variance as an explicit function of past information:
$$R_{i,t} = a_i + b_i Y_{i,t} + U_{i,t}, \tag{24}$$
where
$$U_{i,t} = \sigma_{i,t}\varepsilon_{i,t}, \tag{25}$$
$$\varepsilon_{i,t} \sim N(0, 1), \tag{26}$$
$$\sigma_{i,t}^2 = c_i + \sum_{j=1}^{p} \gamma_{i,j}\sigma_{i,t-j}^2 + \sum_{j=1}^{q} \delta_{i,j} U_{i,t-j}^2, \tag{27}$$
$i = 1, \ldots, n$; $t = 1, \ldots, T$. We shall call model (24)-(27) a GARCH(p,q)-normal model because it is based on the normality assumption for the disturbance term.
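Recursion (27) can be illustrated with a short filter that turns a residual series into conditional variances. This is a sketch only: the function name is hypothetical, and the initialization of the pre-sample terms (every pre-sample variance and squared residual is set to the same value `sigma2_0`) is an assumption of the example, since the text does not specify one.

```python
def garch_variance(residuals, c, gamma, delta, sigma2_0):
    """Conditional variances from recursion (27).

    gamma: coefficients on lagged conditional variances (length p)
    delta: coefficients on lagged squared residuals (length q)
    sigma2_0: value used for all pre-sample variances and squared residuals
    """
    p, q = len(gamma), len(delta)
    sig2 = []
    for t in range(len(residuals)):
        v = c
        for j in range(1, p + 1):
            v += gamma[j - 1] * (sig2[t - j] if t - j >= 0 else sigma2_0)
        for j in range(1, q + 1):
            v += delta[j - 1] * (residuals[t - j] ** 2 if t - j >= 0 else sigma2_0)
        sig2.append(v)
    return sig2
```

With `residuals = [0.1, -0.2, 0.0]`, `c = 0.01`, `gamma = [0.5]`, `delta = [0.3]` and `sigma2_0 = 0.04`, the filter returns approximately 0.042, 0.034 and 0.039, which shows how a large residual at one date raises the variance forecast at the next.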
In order to detect GARCH dependencies, we examine the sample autocorrelation and partial autocorrelation functions of the squared residuals $U_{i,t}^2$. Visual inspection of the correlograms suggests values of $p$ and $q$. Applying the Box-Jenkins methodology, we find that $p = q = 1$ is adequate to capture the temporal dependence of volatilities:
$$\sigma_{i,t}^2 = c_i + \gamma_i\sigma_{i,t-1}^2 + \delta_i U_{i,t-1}^2. \tag{28}$$
The coefficients of model (24)-(26) and (28) with $R_i \in \{R_{C1A1}, R_{C2A1}, R_{C3A1}, R_{C4A1}, R_{C1A2}, R_{C2A2}, R_{C3A2}, R_{C4A2}, R_{C1A3}, R_{C2A3}, R_{C3A3}, R_{C4A3}, R_{C1A4}, R_{C2A4}, R_{C3A4}, R_{C4A4}\}$ and $Y_i \in \{R_{G1O2}, R_{G2O2}, R_{G3O2}, R_{G4O2}\}$ are reported in Appendix B, Table B.4. Densities of the GARCH(1,1)-normal residuals
$$U_{i,t} = \sqrt{c_i + \gamma_i\sigma_{i,t-1}^2 + \delta_i U_{i,t-1}^2}\;\varepsilon_{i,t}$$
are displayed in Figure 20 and in Appendix D. The graphs demonstrate that the GARCH credit risk series have lower peaks.

Fig. 20. Credit risks: OLS and GARCH.

In the portfolio context, implementation of GARCH models is computationally complex because the number of parameters increases rapidly as the portfolio expands.⁵¹ Hence, we evaluate the portfolio credit risk $U_P$ based on stable modeling of the individual credit risks, accounting for GARCH effects by exponential weighting of observations.⁵² In the estimation of $U_P$, we separately investigate the cases of independent, symmetric dependent, and skewed dependent credit risks of portfolio instruments.

⁵¹ For references on multivariate GARCH, see Engle and Kroner (1995).
⁵² The approach of modeling time-varying volatilities by exponential weighting follows the RiskMetrics exponentially weighted moving average model described in Morgan (1995).

11. Portfolio credit risk

In this section we follow the one-factor model of Section 9 and evaluate portfolio credit risk as a cumulative effect of stable distributed individual credit risks. We impose different assumptions on their distributions: independent, symmetric dependent, and skewed dependent.
We demonstrate the approach on a portfolio of equally weighted OLS credit risk premiums from Section 10.

11.1. Independent credit risks

Suppose the credit risk premiums are: (i) characterized by the same tail index $\alpha$; (ii) independent. Then, by the additivity property of stable variables (see Section 3), the portfolio credit risk $U_P = \sum_{i=1}^{n} w_i U_i$ is stably distributed: $U_P \sim S_\alpha(\sigma_{U_P}, \beta_{U_P}, 0)$, where $\alpha$ is the tail index, $\sigma_{U_P}$ is the scale parameter, $\beta_{U_P}$ is the skewness parameter,
$$\sigma_{U_P} = \left(\sum_{i=1}^{n}\left(|w_i|\sigma_{U_i}\right)^{\alpha}\right)^{1/\alpha}, \tag{29}$$
$$\beta_{U_P} = \frac{\sum_{i=1}^{n}\mathrm{sign}(w_i)\,\beta_{U_i}\left(|w_i|\sigma_{U_i}\right)^{\alpha}}{\sum_{i=1}^{n}\left(|w_i|\sigma_{U_i}\right)^{\alpha}}. \tag{30}$$

Consider a portfolio of equally weighted OLS credit risk premiums from Section 10. Assume that the credit risk premiums are independent and have the same tail index $\alpha$. We take $\alpha = 1.472$, the average of the $\alpha$ values for the credit risk premium series (see Table 17), and recompute the other stable parameters: $\beta_{U_i}$, $\mu_{U_i}$, and $\sigma_{U_i}$. The new estimates are reported in Table 18. As with returns on bond indices, the condition of the same tail index for all analyzed credit risk series does not seem to be very restrictive: the new parameter estimates (Table 18) do not deviate much from the previous parameter estimates (Table 17). Since the obtained estimates of $\mu$ are very small, we assume $\mu = 0$. We evaluate the portfolio parameters applying formulas (29), (30): $\hat\sigma_{U_P} = 0.015$, $\hat\beta_{U_P} = -0.038$. Thus, $U_P \sim S_{1.472}(0.015, -0.038, 0)$. The 99% (95%) credit VaR is derived as the negative of the 1% (5%) quantile of the $U_P$-distribution: the 99% (95%) VaR equals 0.125 (0.046). Having analytic formulas for the $U_P$ parameters, we obtained estimates of portfolio credit risk without simulations.

Table 18
Stable fitting of the OLS credit risk premiums with fixed alpha

OLS credit risk   Maturity   Stable parameters at alpha = 1.472
of bond indices   (years)    beta      mu       sigma
C1A1              1-3        0.000     0.000    0.018
C2A1              3-5        -0.090    -0.001   0.032
C3A1              5-7        -0.019    0.000    0.045
C4A1              7-10       -0.019    0.001    0.052
C1A2              1-3        0.023     0.001    0.015
C2A2              3-5        -0.072    -0.001   0.029
C3A2              5-7        -0.039    0.000    0.042
C4A2              7-10       -0.004    0.000    0.051
C1A3              1-3        -0.040    0.000    0.015
C2A3              3-5        -0.084    0.000    0.029
C3A3              5-7        -0.067    0.000    0.041
C4A3              7-10       -0.032    0.001    0.049
C1A4              1-3        -0.010    0.001    0.019
C2A4              3-5        0.011     0.001    0.033
C3A4              5-7        -0.071    -0.001   0.046
C4A4              7-10       -0.053    0.001    0.055

11.2. Symmetric dependent credit risks

In order to assess portfolio credit risk, we obtain the portfolio credit VaR. It is computed in two steps: (i) simulating a distribution of the values $U_P = \sum_{i=1}^{n} w_i U_i$; (ii) inferring the portfolio credit VaR from the simulated $U_P$ distribution. This section examines the case of symmetric individual credit risks $U_i$: $U_i \sim S_{\alpha_{U_i}}(\sigma_{U_i}, 0, 0)$, $i = 1, \ldots, n$. We simulate $U_P$ applying the methodology from Section 7:
(i) generate individual credit risks $\tilde U_i$ with the same dependence structure as the $U_i$'s. We express $U_i$ as a transformation of a normal random variable:
$$U_i = S_i^{1/2} G_i,$$
where
$$G_i \sim S_2(\sigma_{G_i}, 0, 0) = N(0, 2\sigma_{G_i}^2),$$
$$S_i \sim S_{\alpha_{U_i}/2}\left(\frac{\sigma_{U_i}^2}{\sigma_{G_i}^2}\left(\cos\frac{\pi\alpha_{U_i}}{4}\right)^{2/\alpha_{U_i}}, 1, 0\right),$$
and $S_i$ is independent of $G_i$, $i = 1, \ldots, n$. The dependence among the $U_i$ can be explained by the dependence among the $G_i$, $i = 1, \ldots, n$. We form dependent normal variables $\tilde G_i$, preserving the initial dependence. Next, we generate $\tilde U_i = \tilde S_i^{1/2} \tilde G_i$, where $\tilde S_i$ is a simulated value of $S_i$;
(ii) calculate $\tilde U_P = \sum_{i=1}^{n} w_i \tilde U_i$.
A portfolio credit VaR can be measured from the $\tilde U_P$-distribution.

Table 19
Portfolio credit VaR for symmetric credit risks

Decay    Truncation   Portfolio VaR         Kolmogorov   Anderson-
factor   points (%)   99% VaR    95% VaR    distance     Darling
0.85     10-90        3.502      1.918      8.071        0.210
         5-95         3.710      1.896      8.898        0.228
         No           3.396      1.856      7.692        0.199
0.94     10-90        3.594      1.963      7.680        0.200
         5-95         3.643      1.941      8.162        0.209
         1-99         3.476      1.975      8.847        0.227
         No           3.321      1.792      6.736        0.164
0.975    10-90        3.623      1.877      7.578        0.194
         5-95         3.435      1.943      9.085        0.234
         1-99         3.578      2.004      9.665        0.254
         No           3.293      1.739      7.174        0.167
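Returning briefly to the independent case of Section 11.1, formulas (29) and (30) can be checked numerically against the Table 18 estimates. The sketch below is illustrative; the function name `portfolio_stable_params` is hypothetical, and the inputs are the $\beta$ and $\sigma$ columns of Table 18 with equal weights of 1/16.

```python
import math

def portfolio_stable_params(alpha, weights, sigmas, betas):
    """Scale and skewness of a weighted sum of independent stable variables
    sharing the tail index alpha, formulas (29)-(30)."""
    powers = [(abs(w) * s) ** alpha for w, s in zip(weights, sigmas)]
    total = sum(powers)
    sigma_p = total ** (1.0 / alpha)
    beta_p = sum(math.copysign(1.0, w) * b * p
                 for w, b, p in zip(weights, betas, powers)) / total
    return sigma_p, beta_p

# beta and sigma columns of Table 18, in the order C1A1, C2A1, ..., C4A4
TABLE18_BETA = [0.000, -0.090, -0.019, -0.019, 0.023, -0.072, -0.039, -0.004,
                -0.040, -0.084, -0.067, -0.032, -0.010, 0.011, -0.071, -0.053]
TABLE18_SIGMA = [0.018, 0.032, 0.045, 0.052, 0.015, 0.029, 0.042, 0.051,
                 0.015, 0.029, 0.041, 0.049, 0.019, 0.033, 0.046, 0.055]

sigma_up, beta_up = portfolio_stable_params(1.472, [1.0 / 16] * 16,
                                            TABLE18_SIGMA, TABLE18_BETA)
```

With these inputs the computed scale and skewness round to 0.015 and -0.038, in agreement with the portfolio parameters quoted in Section 11.1.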
As an illustration of the approach, we estimate credit risk for a portfolio of equally weighted OLS credit risk premiums of bond indices (see Section 10), assuming they are symmetric.⁵³ The estimation results are presented in Table 19. The portfolio credit VaR does not demonstrate a definite pattern of dependence on the decay factor. For each decay factor, reducing the number of truncated observations does not seem to affect the portfolio credit VaR in a particular fashion. The no-truncation method led to the smallest VaR measurements. Possibly, the credit risk residuals of the investment grade indices have negative correlations in the far tails; taking into account more observations with negative correlations reduces the VaR estimates. Since the decay factor does not influence the VaR results in a specific way and the KD and AD statistics are smaller under the no-truncation approach, in the further analysis we consider the no-truncation method and arbitrarily select the decay factor of 0.85. The computation of the marginal VaR, stand-alone VaR, and diversification effects for the no-truncation approach and the decay factor of 0.85 is summarized in Table 20. From Table 20, the highest contributions to portfolio credit risk are made by the C4A4, C4A3, and C4A2 bond indices: their marginal 99% VaRs equal 0.366, 0.295, and 0.296, respectively. The credit risk premium of the C4A1 index displays the largest diversification effect.

⁵³ The symmetry proposition is plausible: the skewness parameters of the credit risk premiums of bond indices are small (see Table 17).

Table 20
Marginal VaR, stand-alone 99% VaR, and diversification effects for credit risk premiums of bond indices (decay factor = 0.85, no truncation)

Bond indices   Marginal VaR   Stand-alone VaR   Diversification effect
C1A1           0.175          0.191             0.016
C2A1           0.203          0.251             0.048
C3A1           0.162          0.305             0.143
C4A1           0.145          0.441             0.296
C1A2           0.024          0.148             0.124
C2A2           0.153          0.222             0.069
C3A2           0.180          0.290             0.110
C4A2           0.296          0.453             0.157
C1A3           0.013          0.149             0.136
C2A3           0.097          0.244             0.147
C3A3           0.203          0.325             0.122
C4A3           0.295          0.507             0.212
C1A4           0.079          0.168             0.089
C2A4           0.091          0.243             0.152
C3A4           0.142          0.346             0.204
C4A4           0.366          0.457             0.091

11.3. Skewed dependent credit risks

For the estimation of portfolio risk for skewed dependent credit risks, we propose to employ the approach of Section 8: (i) split the individual credit risks $U_i$ into dependence and skewness parts; (ii) find the portfolio dependence and skewness components by combining the dependence and skewness parts of the single credit risks; (iii) evaluate the portfolio credit risk as a sum of the dependence and skewness fragments. Details are given below.

We divide the individual credit risks $U_i \sim S_{\alpha_{U_i}}(\sigma_{U_i}, \beta_{U_i}, 0)$ into "dependence" and "skewness" parts, applying methodology (16)-(18) (see Section 8):
$$U_i \stackrel{d}{=} U_i^{(1)} + U_i^{(2)},$$
where
$$U_i^{(1)} \sim S_{\alpha_{U_i}}\left(2^{-1/\alpha_{U_i}}\sigma_{U_i}, 0, 0\right), \qquad U_i^{(2)} \sim S_{\alpha_{U_i}}\left(2^{-1/\alpha_{U_i}}\sigma_{U_i}, 2\beta_{U_i}, 0\right),$$
and the parts $U_i^{(1)}$ and $U_i^{(2)}$ are independent, $i = 1, \ldots, n$. We assume that: (i) the $U_i^{(1)}$, $i = 1, \ldots, n$, are dependent and (ii) the $U_i^{(2)}$, $i = 1, \ldots, n$, are independent. Then the symmetric components $U_i^{(1)}$ explain the dependence (association) among the $U_i$'s and the components $U_i^{(2)}$ capture the skewness of the $U_i$'s.

By Property 1 (see Section 7), $U_i^{(1)} \sim S_{\alpha_{U_i}}(2^{-1/\alpha_{U_i}}\sigma_{U_i}, 0, 0)$ can be interpreted as a transformation of a normal random variable:
$$U_i^{(1)} = S_i^{1/2} G_i,$$
where
$$G_i \sim S_2(\sigma_{G_i}, 0, 0) = N(0, 2\sigma_{G_i}^2),$$
$$S_i \sim S_{\alpha_{U_i}/2}\left(2^{-2/\alpha_{U_i}}\frac{\sigma_{U_i}^2}{\sigma_{G_i}^2}\left(\cos\frac{\pi\alpha_{U_i}}{4}\right)^{2/\alpha_{U_i}}, 1, 0\right),$$
and $S_i$ is independent of $G_i$, $i = 1, \ldots, n$. Random rescaling transformations of the normal variables $G_i$ into $U_i^{(1)}$ maintain the dependence structure. Hence, we can derive the dependence among the $U_i^{(1)}$, or the dependence among the $U_i$, from the dependence among the $G_i$'s.
Combining separately the dependence and skewness terms of the $U_i$'s, we obtain the two components of the portfolio credit risk $U_P$:
$$U_P = U_P^{(1)} + U_P^{(2)},$$
$$U_P^{(1)} = \sum_{i=1}^{n} w_i U_i^{(1)} = \sum_{i=1}^{n} w_i S_i^{1/2} G_i, \qquad U_P^{(2)} = \sum_{i=1}^{n} w_i U_i^{(2)},$$
where $U_P^{(1)}$ is the "dependence" component and $U_P^{(2)}$ is the "skewness" component. The portfolio credit risk can be evaluated as a sum of the dependence and skewness fragments.

We have suggested methodologies for portfolio credit risk assessment and demonstrated their application to the analysis of returns on bond indices. The methodologies can be employed for the risk evaluation of any financial instruments that have fat-tailed and/or skewed distributions.

12. Conclusions

Value-at-Risk (VaR) measurements are widely applied to estimate the exposure to market and credit risks. The traditional approaches to VaR computation (the delta method, historical simulation, Monte Carlo simulation, and stress-testing) do not provide a satisfactory evaluation of possible losses. The delta-normal methods do not describe financial data with heavy tails well; hence, they underestimate VaR measurements in the tails. Historical simulation does not produce robust VaR estimates, since it is not reliable in approximating low quantiles with a small number of observations in the tails. The stress-testing VaR estimates are subjective. The Monte Carlo VaR numbers might be affected by model misspecification.

This work proposes the application of stable distributions in market and credit VaR estimation. Our empirical analysis verifies that stable modeling captures the skewness and heavy tails of market and credit returns and of isolated credit risks well. The superior fit makes it possible to derive more accurate risk estimates.
The in-sample and forecast evaluation shows that stable VaR modeling outperforms normal modeling for high values of the VaR confidence level:
* the stable modeling generally produces conservative and accurate 99% VaR estimates, which is preferred by financial institutions and regulators,
* the normal method leads to overly optimistic forecasts of losses in the 99% VaR estimation,
* the normal modeling is acceptable for the 95% VaR estimation.

Based on the properties of stable distributions, we design new methods for correlation estimation and for simulating portfolio values. We employ the methods in the evaluation of portfolio and marginal VaR for three cases of the credit returns: independent, symmetric dependent, and skewed dependent.

We suggest a one-factor model of credit risks. Applying the one-factor model, we quantify credit risk for individual assets and then assess portfolio credit risk as an aggregate effect of stable distributed individual credit risks.

The stable Paretian model, while sharing the main properties of the normal distribution leading to the CLT (Central Limit Theorem), provides at the same time a superior fit in modeling market and credit VaR. However, additional research is needed. Future work in this direction will be the construction of models that capture the features of empirical financial data such as heavy tails, time-varying volatility, and short- and long-range dependence.54 In order to describe thick tails, one can employ conditional heteroskedastic models based on the stable hypothesis.55 ARMA-stable-GARCH models can incorporate both heavy tails and time-varying volatility.56 The fractional-stable GARCH model can capture all observed phenomena in financial data: heavy tails, time-varying volatility, and short- and long-range dependence. An analysis of VaR estimation with ARMA--stable, ARMA-stable-GARCH, and fractional-stable GARCH models will be provided elsewhere.
54 For some preliminary results see Liu and Brorsen (1995), Mittnik, Rachev and Paolella (1998), Mittnik, Paolella and Rachev (1997, 1998a, b), Panorska, Mittnik and Rachev (1995).
55 These models are named as ARMA--stable models.
56 For discussion of stable-GARCH models see Panorska, Mittnik and Rachev (1995) and Mittnik, Paolella and Rachev (1997).

Appendix A. Stable modeling of credit returns in figures

Fig. A.1. G302 daily returns.
Fig. A.2. Stable and normal fitting of the G302 index.
Fig. A.3. VaR estimation for the G302 index.
Fig. A.4. C3A2 daily returns.
Fig. A.5. Stable and normal fitting of the C3A2 index.
Fig. A.6. VaR estimation for the C3A2 index.
Fig. A.7. C4A2 daily returns.
Fig. A.8. Stable and normal fitting of the C4A2 index.
Fig. A.9. VaR estimation for the C4A2 index.
Fig. A.10. C3A3 daily returns.
Fig. A.11. Stable and normal fitting of the C3A3 index.
Fig. A.12. VaR estimation for the C3A3 index.

Appendix B. Tables

Table B.1
Deviations of VaR estimates for bond indices

         99% VaR (model) - 99% VaR (empirical)   95% VaR (model) - 95% VaR (empirical)
Index    Normal     Stable                       Normal     Stable
G102     -0.044     0.033                        0.005      -0.008
G202     -0.072     0.058                        0.003      -0.020
G302     -0.130     0.008                        0.009      -0.013
G402     -0.143     0.004                        0.000      -0.027
C1A1     -0.042     0.046                        0.001      -0.010
C2A1     -0.060     0.072                        0.014      -0.008
C3A1     -0.139     0.047                        0.009      -0.016
C4A1     -0.171     0.048                        0.014      -0.013
C1A2     -0.036     0.048                        0.002      -0.007
C2A2     -0.061     0.062                        0.007      -0.010
C3A2     -0.113     0.026                        0.007      -0.018
C4A2     -0.148     0.020                        0.019      -0.008
C1A3     -0.030     0.049                        -0.001     -0.010
C2A3     -0.079     0.061                        0.007      -0.012
C3A3     -0.145     0.014                        0.010      -0.015
C4A3     -0.152     0.056                        0.027      -0.002
C1A4     -0.031     0.028                        0.031      -0.005
C2A4     -0.086     0.033                        0.025      -0.015
C3A4     -0.166     0.030                        0.014      -0.018
C4A4     -0.160     0.098                        0.019      -0.016
H0A1     -0.154     0.013                        0.019      -0.013

Table B.2
Stable VaR estimates for bond indices with fixed α

             99% VaR                          95% VaR
Bond index   Different α   Fixed α = 1.708    Different α   Fixed α = 1.708
C1A1         0.284         0.257              0.119         0.116
C2A1         0.509         0.494              0.236         0.233
C3A1         0.734         0.732              0.353         0.351
C4A1         0.931         0.979              0.467         0.471
C1A2         0.285         0.273              0.125         0.123
C2A2         0.505         0.517              0.244         0.245
C3A2         0.689         0.747              0.355         0.360
C4A2         0.890         1.003              0.474         0.485
C1A3         0.286         0.277              0.125         0.124
C2A3         0.530         0.523              0.248         0.247
C3A3         0.719         0.763              0.361         0.365
C4A3         0.949         1.022              0.485         0.491
C1A4         0.290         0.260              0.119         0.116
C2A4         0.511         0.471              0.228         0.224
C3A4         0.741         0.716              0.343         0.340
C4A4         0.960         0.934              0.451         0.447

Table B.3
Coefficients of OLS regressions

Dependent variable   Variables   Coeff.      Dependent variable   Variables   Coeff.
RC1A1                C           0.004723    RC1A3                C           0.003887
                     RG102       0.882424                         RG102       0.946025
RC2A1                C           0.006183    RC2A3                C           0.005709
                     RG202       0.770132                         RG202       0.816271
RC3A1                C           0.005550    RC3A3                C           0.005051
                     RG302       0.835640                         RG302       0.853295
RC4A1                C           0.003735    RC4A3                C           0.003806
                     RG402       0.847226                         RG402       0.877039
RC1A2                C           0.003357    RC1A4                C           0.006401
                     RG102       0.951165                         RG102       0.874032
RC2A2                C           0.005733    RC2A4                C           0.009603
                     RG202       0.808308                         RG202       0.760162
RC3A2                C           0.004730    RC3A4                C           0.008296
                     RG302       0.853315                         RG302       0.804311
RC4A2                C           0.004118    RC4A4                C           0.007725
                     RG402       0.868154                         RG402       0.803091

Table B.4
GARCH-normal coefficients

                                                           Variance equation
Dependent variable   Variables   Coeff.     Std. errors    Variables   Coeff.      Std. errors
RC1A1                C           0.003581   0.000525       C           2.50E-05    2.02E-06
                     RG102       0.937996   0.004895       ARCH(1)     0.116681    0.003541
                                                           GARCH(1)    0.885367    0.002486
RC2A1                C           0.004948   0.000981       C           7.67E-05    5.76E-06
                     RG202       0.838944   0.003466       ARCH(1)     0.130119    0.005720
                                                           GARCH(1)    0.870004    0.004714
RC3A1                C           0.004199   0.001331       C           0.000152    1.66E-05
                     RG302       0.893949   0.003650       ARCH(1)     0.130479    0.003970
                                                           GARCH(1)    0.866746    0.002716
RC4A1                C           0.004014   0.001539       C           0.000355    3.22E-05
                     RG402       0.887583   0.003538       ARCH(1)     0.153744    0.008756
                                                           GARCH(1)    0.830941    0.008452
RC1A2                C           0.002746   0.000411       C           4.77E-06    7.97E-07
                     RG102       0.946016   0.003830       ARCH(1)     0.096428    0.002594
                                                           GARCH(1)    0.914737    0.002586
RC2A2                C           0.004229   0.000885       C           1.34E-05    2.38E-06
                     RG202       0.890123   0.003547       ARCH(1)     0.056718    0.002501
                                                           GARCH(1)    0.943510    0.001441
RC3A2                C           0.002970   0.001078       C           0.000609    4.66E-05
                     RG302       0.894861   0.003899       ARCH(1)     0.289805    0.017996
                                                           GARCH(1)    0.669240    0.015835
RC4A2                C           0.003420   0.001329       C           0.000302    2.42E-05
                     RG402       0.918195   0.003240       ARCH(1)     0.180168    0.009135
                                                           GARCH(1)    0.817444    0.007086
RC1A3                C           0.002271   0.000421       C           7.06E-06    9.92E-07
                     RG102       1.003079   0.003215       ARCH(1)     0.137045    0.003494
                                                           GARCH(1)    0.887812    0.002061
RC2A3                C           0.005204   0.000664       C           2.01E-05    3.21E-06
                     RG202       0.903683   0.002247       ARCH(1)     0.124285    0.004417
                                                           GARCH(1)    0.905287    0.002271
RC3A3                C           0.005840   0.001114       C           0.000223    2.20E-05
                     RG302       0.915408   0.003059       ARCH(1)     0.253670    0.007199
                                                           GARCH(1)    0.777935    0.004480
RC4A3                C           0.004076   0.001308       C           0.000792    3.16E-05
                     RG402       0.942102   0.002728       ARCH(1)     0.401945    0.015612
                                                           GARCH(1)    0.639974    0.007830
RC1A4                C           0.002450   0.00570        C           -3.27E-06   5.58E-07
                     RG102       1.036861   0.003468       ARCH(1)     0.101918    0.001997
                                                           GARCH(1)    0.945209    0.000666
RC2A4                C           0.007017   0.000839       C           5.77E-05    3.97E-06
                     RG202       0.879618   0.003199       ARCH(1)     0.231563    0.006013
                                                           GARCH(1)    0.841086    0.002770
RC3A4                C           0.007452   0.001276       C           3.99E-05    7.17E-06
                     RG302       0.893645   0.003645       ARCH(1)     0.101316    0.003132
                                                           GARCH(1)    0.907304    0.002295
RC4A4                C           0.007402   0.001393       C           0.000194    1.72E-05
                     RG402       0.887104   0.002892       ARCH(1)     0.179030    0.005716
                                                           GARCH(1)    0.840838    0.003809

Appendix C. OLS credit risk evaluation for portfolio assets in figures

Fig. C.1. OLS credit risk premium for the C1A2 bond index.
Fig. C.2. Stable and normal fitting of C1A2 OLS-credit-risks.
Fig. C.3. OLS credit risk premium for the C1A3 bond index.
Fig. C.4. Stable and normal fitting of C1A3 OLS-credit-risks.
Fig. C.5. OLS credit risk premium for the C3A3 bond index.
Fig. C.6. Stable and normal fitting of C3A3 OLS-credit-risks.
Fig. C.7. OLS credit risk premium for the C1A4 bond index.
Fig. C.8. Stable and normal fitting of C1A4 OLS-credit-risks.

Appendix D. GARCH credit risk evaluation for portfolio assets in figures

Fig. D.1. C2A1 credit risks: OLS and GARCH.
Fig. D.2. C3A1 credit risks: OLS and GARCH.
Fig. D.3. C4A1 credit risks: OLS and GARCH.
Fig. D.4. C1A2 credit risks: OLS and GARCH.

Acknowledgments

We thank C. Marinelli of Columbia University and, especially, B. Racheva-Iotova of Bravo Consulting Group for computational assistance. We also thank Kristina Tetereva of University of Karlsruhe, Germany, for providing data.
References

Arad, R.W., 1980. Parameter estimation for symmetric stable distributions. International Economic Review 21, 209–220.
Basle Committee on Banking Supervision, 1999. Credit risk modelling: Current practices and applications.
Cheng, B., Rachev, S.T., 1995. Multivariate stable futures prices. Journal of Mathematical Finance 5, 133–153.
Chobanov, G., Mateev, P., Mittnik, S., Rachev, S.T., 1996. Modeling the distribution of highly volatile exchange-rate time series. In: Robinson, P., Rosenblatt, M. (Eds.), Time Series. Springer-Verlag, pp. 130–144.
DuMouchel, W.H., 1971. Stable distributions in statistical inference. Ph.D. dissertation. Department of Statistics, Yale University.
DuMouchel, W.H., 1973. On the asymptotic normality of the maximum-likelihood estimate when sampling from a stable distribution. The Annals of Statistics 1, 948–957.
Engle, R.F., Kroner, K., 1995. Multivariate simultaneous generalized GARCH. Econometric Theory, 122–150.
Fama, E., 1965. The behavior of stock market prices. Journal of Business 38, 34–105.
Fama, E., Roll, R., 1971. Parameter estimates for symmetric stable distributions. Journal of the American Statistical Association 66, 331–338.
Federal Reserve System Task Force on Internal Credit Risk Models, 1998. Credit risk models at major U.S. banking institutions: Current state of the art and implications for assessment of capital adequacy.
Feuerverger, A., McDunnough, P., 1981. On efficient inference in symmetric stable laws and processes. In: Csörgö, M., et al. (Eds.), Statistics and Related Topics. North-Holland, Amsterdam.
Gamrowski, B., Rachev, S.T., 1994. Stable models in testable asset pricing. In: Approximation, Probability, Related Fields. Plenum Press, New York, pp. 223–236.
Gamrowski, B., Rachev, S.T., 1995a. A testable version of the Pareto-stable CAPM. Technical Report 292. Department of Statistics, Applied Probability, University of California, Santa Barbara.
Gamrowski, B., Rachev, S.T., 1995b. Financial models using stable laws. In: Prohorov, Yu.V. (Ed.), Probability Theory, its Applications. In: Surveys in Applied, Industrial Mathematics, Vol. 2, pp. 556–604.
Gamrowski, B., Rachev, S.T., 1996. Testing the validity of value at risk measures. In: Heyde, Prohorov, Pyke, Rachev (Eds.), Athens Conference on Applied Probability and Time Series, Vol. I: Applied Probability. Springer-Verlag, pp. 307–320.
Götzenberger, G., Rachev, S.T., Schwartz, E., 1999. Performance measurements: The stable Paretian approach. Working Paper. University of Karlsruhe, Germany.
Gupton, G.M., Finger, C.C., Bhatia, M., 1997. CreditMetrics™ – Technical Document. JP Morgan, New York.
Heathcote, C.R., Cheng, B., Rachev, S.T., 1995. Testing multivariate symmetry. Journal of Multivariate Analysis 54, 91–112.
Hill, B.M., 1975. A simple general approach to inference about the tail of a distribution. The Annals of Statistics 3 (5), 1163–1174.
Jarrow, R.A., Lando, D., Turnbull, S.M., 1997. A Markov model for the term structure of credit risk spreads. Review of Financial Studies 10 (2), 481–523.
JP Morgan, 1995. RiskMetrics, 3rd edition. JP Morgan.
Klebanov, L.B., Melamed, J.A., Rachev, S.T., 1994. On the joint estimation of stable law parameters. In: Approximation, Probability, and Related Fields. Plenum Press, New York, pp. 315–320.
Kogon, S.M., Williams, D.B., 1998. Characteristic function based estimation of stable distribution parameters. In: Adler, R., et al. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques, Applications. Birkhäuser, Boston, pp. 311–335.
Koutrouvelis, I.A., 1980. Regression-type estimation of the parameters of stable laws. Journal of American Statistical Association 75, 918–928.
Koutrouvelis, I.A., 1981. An iterative procedure for the estimation of the parameters of stable laws. Communications in Statistics. Simulation and Computation B10, 17–28.
Kozubowski, T.J., Rachev, S.T., 1994. The theory of geometric stable distributions and its use in modeling financial data. European Journal of Operations Research: Financial Modeling 74, 310–324.
Liu, S.-M., Brorsen, B.W., 1995. Maximum likelihood estimation of a GARCH-stable model. Journal of Applied Econometrics 10, 273–285.
Longerstaey, J., Zangari, P., 1996. RiskMetrics™ Technical Document, 4th edition. JP Morgan, New York.
Mandelbrot, B.B., 1962. Sur certains prix spéculatifs: Faits empiriques et modèle basé sur les processus stables additifs de Paul Lévy. Comptes Rendus 254, 3968–3970.
Mandelbrot, B.B., 1963a. New methods in statistical economics. Journal of Political Economy 71, 421–440.
Mandelbrot, B.B., 1963b. The variation of certain speculative prices. Journal of Business 36, 394–419.
Mandelbrot, B.B., 1967. The variation of some other speculative prices. Journal of Business 40, 393–413.
McCulloch, J.H., 1986. Simple consistent estimators of stable distribution parameters. Communications in Statistics. Simulation and Computation 15, 1109–1136.
McCulloch, J.H., 1996. Financial applications of stable distributions. In: Maddala, G.S., Rao, C.R. (Eds.), Handbook of Statistics – Statistical Methods in Finance. Elsevier, Amsterdam, Vol. 14, pp. 393–425.
Mittnik, S., Paolella, M.S., Rachev, S.T., 1997. Modeling the persistence of conditional volatilities with GARCH-stable processes. Manuscript. Institute of Statistics, Econometrics, University of Kiel, Germany.
Mittnik, S., Paolella, M.S., Rachev, S.T., 1998a. Unconditional and conditional distributional models for the Nikkei's index. Asia-Pacific Financial Markets 5, 99–128.
Mittnik, S., Paolella, M.S., Rachev, S.T., 1998b. The prediction of down-side risk with GARCH-stable models. Technical Report. Institute of Statistics, Econometrics, University of Kiel, Germany.
Mittnik, S., Paolella, M.S., Rachev, S.T., 1998c. A tail estimator for the index of the stable Paretian distribution. Communications in Statistics. Theory and Methods 27, 1239–1262.
Mittnik, S., Rachev, S.T., 1991. Alternate multivariate stable distributions, their applications to financial modeling. In: Cambanis, S., et al. (Eds.), Stable Processes and Related Topics. Birkhäuser, Boston, pp. 107–119.
Mittnik, S., Rachev, S.T., 1993a. Modeling asset returns with alternative stable distributions. Econometric Reviews 12 (3), 261–330.
Mittnik, S., Rachev, S.T., 1993b. Reply to comments on "Modeling asset returns with alternate stable laws" and some extensions. Econometric Reviews 12 (3), 347–389.
Mittnik, S., Rachev, S.T., 1996. Tail estimation of the stable index α. Applied Mathematics Letters 9, 53–56.
Mittnik, S., Rachev, S.T., Chenyao, D., 1996. Distribution of exchange rates: A geometric summation-stable model. In: Proceedings of the Seminar on Data Analysis. Sozopol, Bulgaria, September 12–17, 1996.
Mittnik, S., Rachev, S.T., Doganoglu, T., Chenyao, D., 1997. Maximum likelihood estimation of stable Paretian models. Manuscript. Institute of Statistics, Econometrics, University of Kiel, Germany.
Mittnik, S., Rachev, S.T., Paolella, M.S., 1998. Stable Paretian modeling in finance: Some empirical and theoretical aspects. In: Adler, R., et al. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Boston, pp. 79–110.
Nolan, J.P., 1997. Numerical computation of stable densities, distribution functions. Communications in Statistics. Stochastic Models 13 (4), 759–774.
Nolan, J.P., 1998a. Maximum likelihood estimation and diagnostics for stable distributions. Working Paper. Department of Mathematics, Statistics, American University.
Nolan, J.P., 1998b. Parameterizations of stable distributions. Statistics & Probability Letters 38, 187–195.
Panorska, A.K., Mittnik, S., Rachev, S.T., 1995. Stable GARCH models for financial time series. Applied Mathematics Letters 815, 33–37.
Paulson, A.S., Holcomb, E.W., Leitch, R.A., 1975. The estimation of the parameters of the stable laws.
Biometrika 62, 163–170.
Pickands, J., 1975. Statistical inference using extreme order statistics. The Annals of Statistics 3 (1), 119–131.
Press, J.S., 1972a. Estimation of univariate and multivariate stable distributions. Journal of American Statistical Association 67 (340), 842–846.
Press, J.S., 1972b. Applied Multivariate Analysis. Holt, Rinehart, Winston, New York.
Rachev, S.T., Racheva-Jotova, B., Hristov, B., Mandev, I., 1999. Technical documentation of Mercury 1.0, Software package for market risk (VaR) modeling of stable distributed financial returns.
Rachev, S.T., SenGupta, A., 1993. Laplace–Weibull mixtures for modeling price changes. Management Science 1029–1038.
Rachev, S.T., Xin, H., 1993. Test on association of random variables in the domain of attraction of multivariate stable law. Probability and Mathematical Statistics 14 (1), 125–141.
Samorodnitsky, G., Taqqu, M.S., 1994. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, New York.

Chapter 8
MODELLING DEPENDENCE WITH COPULAS AND APPLICATIONS TO RISK MANAGEMENT
PAUL EMBRECHTS, FILIP LINDSKOG and ALEXANDER MCNEIL
Department of Mathematics, ETHZ, CH-8092 Zürich, Switzerland
web: www.math.ethz.ch/finance

Contents
1. Introduction
2. Copulas
2.1. Mathematical introduction
2.2. Sklar's Theorem
2.3. The Fréchet–Hoeffding bounds for joint distribution functions
2.4. Copulas and random variables
3. Dependence concepts
3.1. Linear correlation
3.2. Perfect dependence
3.3. Concordance
3.4. Kendall's tau and Spearman's rho
3.5. Tail dependence
4. Marshall–Olkin copulas
4.1. Bivariate Marshall–Olkin copulas
4.2. A multivariate extension
4.3. A useful modelling framework
5. Elliptical copulas
5.1. Elliptical distributions
5.2. Gaussian copulas
5.3. t-copulas
6. Archimedean copulas
6.1. Definitions
6.2. Properties
6.3. Kendall's tau revisited
6.4. Tail dependence revisited
6.5. Multivariate Archimedean copulas
7. Modelling extremal events in practice
7.1. Insurance risk
7.2. Market risk
References

Research of the second author was supported by Credit Suisse Group, Swiss Re and UBS AG through RiskLab, Switzerland. The third author acknowledges financial support from Swiss Re.

1. Introduction

Integrated Risk Management (IRM) is concerned with the quantitative description of risks to a financial business. Whereas the qualitative aspects of IRM are extremely important, in the present contribution we concentrate only on the quantitative ones. Since the emergence of Value-at-Risk (VaR) in the early nineties and its various generalisations and refinements more recently, regulators and banking and insurance professionals have built up a huge system aimed at making the global financial system safer. Whereas the steps taken have no doubt been very important in increasing overall risk awareness, questions have continually been asked concerning the quality of the safeguards as constructed. All quantitative models are based on assumptions vis-à-vis the markets to which they are to be applied. Standard hedging techniques require a high level of liquidity of the underlying instruments, and prices quoted for many financial products are often based on "normal" conditions. The latter may be interpreted in a more economic sense, or more specifically as referring to the distributional (i.e., normal, Gaussian) behaviour of some underlying data. Especially for IRM, deviations from the "normal" would constitute a prime source of investigation. Hence the classical literature is full of deviations from the so-called random walk (Brownian motion) model, and heavy tails appear prominently.
The latter has for instance resulted in the firm establishment of Extreme Value Theory (EVT) as a standard tool within IRM. Within market risk management, the so-called stylised facts of econometrics summarise this situation: market data returns tend to be uncorrelated but dependent, they are heavy tailed, extremes appear in clusters, and volatility is random. Our contribution aims at providing tools for going one step further: what would be the stylised facts of dependence in financial data? Is there a way of understanding so-called normal (i.e., Gaussian) dependence, and how can we construct models which allow us to go beyond normal dependence? Other problems we would like to understand better are spillover, the behaviour of correlations under extreme market movements, the pros and cons of linear correlation as a measure of dependence, and the construction of risk measures for functions of dependent risks. One example concerning the latter is the following: suppose we have two VaR numbers corresponding to two different lines of business. In order to cover the joint position, can we just add the VaR numbers? Under which conditions is this always the upper bound? What can go wrong if these conditions are not fulfilled? A further type of risk where dependence plays a crucial role is credit risk: how to define, stress test and model default correlation. The present chapter does not solve the above problems; it does, however, present tools which are crucial for the construction of solutions. The notion we concentrate on is that of the copula, well known for some time within the statistics literature. The word copula first appeared in the statistics literature in 1959 (Sklar, 1959), although similar ideas and results can be traced back to Hoeffding (1940). Copulas allow us to construct models which go beyond the standard ones at the level of dependence.
They yield an ideal tool to stress test a wide variety of portfolios and products in insurance and finance for extreme moves in correlation and more general measures of dependence. As such, they are gradually becoming an extra, but crucial, element of best-practice IRM.

After Section 2, in which we define the concept of a copula in full generality, we turn in Section 3 to an overview of the most important notions of dependence used in IRM. Sections 4, 5 and 6 introduce the most important families of copulas and their properties, both methodological and with respect to simulation. Throughout these sections, we stress the importance of the techniques introduced within an IRM framework. Finally, in Section 7 we discuss some specific examples. We would like to stress that the present chapter only gives a first introduction, aimed at bringing together from the extensive copula world those results which are immediately usable in IRM. Topics not included are statistical estimation of copulas and the modelling of dependence, through copulas, in a dynamic environment. As such, the topics listed correspond to a one-period point of view. Various extensions are possible; the interested reader is referred to the bibliography for further reading.

2. Copulas

The standard "operational" definition of a copula is a multivariate distribution function defined on the unit cube $[0,1]^n$, with uniformly distributed marginals. This definition is very natural if one considers how a copula is derived from a continuous multivariate distribution function; indeed, in this case the copula is simply the original multivariate distribution function with transformed univariate marginals. This definition, however, masks some of the problems one faces when constructing copulas using other techniques, i.e., it does not say what is meant by a multivariate distribution function.
For that reason, we start with a slightly more abstract definition, returning to the "operational" one later. Below, we follow Nelsen (1999) in concentrating on general multivariate distributions at first and then studying the special properties of the copula subset. For further details we refer to Nelsen (1999).

Throughout this chapter, for a function $H$, we denote by $\mathrm{Dom}\,H$ and $\mathrm{Ran}\,H$ the domain and range, respectively, of $H$. Furthermore, a function $f$ will be called increasing whenever $x \le y$ implies that $f(x) \le f(y)$. We may also refer to this as $f$ being nondecreasing. A statement about points of a set $S \subset \mathbb{R}^n$, where $S$ is typically the real line or the unit cube $[0,1]^n$, is said to hold almost everywhere if the set of points of $S$ where the statement fails to hold has Lebesgue measure zero.

2.1. Mathematical introduction

Definition 2.1. Let $S_1,\dots,S_n$ be nonempty subsets of $\overline{\mathbb{R}}$, where $\overline{\mathbb{R}}$ denotes the extended real line $[-\infty,\infty]$. Let $H$ be a real function of $n$ variables such that $\mathrm{Dom}\,H = S_1 \times \cdots \times S_n$, and for $a \le b$ ($a_k \le b_k$ for all $k$) let $B = [a,b]$ ($= [a_1,b_1] \times \cdots \times [a_n,b_n]$) be an $n$-box whose vertices are in $\mathrm{Dom}\,H$. Then the $H$-volume of $B$ is given by

$$V_H(B) = \sum \operatorname{sgn}(c)\, H(c),$$

where the sum is taken over all vertices $c$ of $B$, and $\operatorname{sgn}(c)$ is given by

$$\operatorname{sgn}(c) = \begin{cases} 1, & \text{if } c_k = a_k \text{ for an even number of } k\text{'s}, \\ -1, & \text{if } c_k = a_k \text{ for an odd number of } k\text{'s}. \end{cases}$$

Equivalently, the $H$-volume of an $n$-box $B = [a,b]$ is the $n$-th order difference of $H$ on $B$,

$$V_H(B) = \Delta_a^b H(t) = \Delta_{a_n}^{b_n} \cdots \Delta_{a_1}^{b_1} H(t),$$

where the $n$ first order differences are defined as

$$\Delta_{a_k}^{b_k} H(t) = H(t_1,\dots,t_{k-1},b_k,t_{k+1},\dots,t_n) - H(t_1,\dots,t_{k-1},a_k,t_{k+1},\dots,t_n).$$

Definition 2.2. A real function $H$ of $n$ variables is $n$-increasing if $V_H(B) \ge 0$ for all $n$-boxes $B$ whose vertices lie in $\mathrm{Dom}\,H$.

Suppose that the domain of a real function $H$ of $n$ variables is given by $\mathrm{Dom}\,H = S_1 \times \cdots \times S_n$, where each $S_k$ has a smallest element $a_k$. We say that $H$ is grounded if $H(t) = 0$ for all $t$ in $\mathrm{Dom}\,H$ such that $t_k = a_k$ for at least one $k$.
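The vertex sum in Definition 2.1 is easy to evaluate numerically. The following Python sketch is only an illustration: the function name `h_volume` and the two test functions are choices made here, not part of the chapter. It computes the H-volume of a box by summing signed values of H over the 2^n vertices, and uses the result to probe the n-increasing property of Definition 2.2 on a single box.

```python
import math
from itertools import product


def h_volume(H, a, b):
    """H-volume of the box [a, b]: signed sum of H over the 2^n vertices."""
    n = len(a)
    total = 0.0
    for picks in product((0, 1), repeat=n):  # 0 selects a_k, 1 selects b_k
        vertex = [b[k] if picks[k] else a[k] for k in range(n)]
        lower_count = n - sum(picks)  # number of coordinates equal to a_k
        sign = 1 if lower_count % 2 == 0 else -1
        total += sign * H(vertex)
    return total


# For H(t) = t1 * t2 * t3 the H-volume is the ordinary volume of the box.
a, b = (0.2, 0.1, 0.5), (0.7, 0.4, 1.0)
vol = h_volume(math.prod, a, b)  # side lengths 0.5, 0.3, 0.5

# H(t) = -t1 * t2 gives a negative volume on the unit square, so it is
# not 2-increasing.
not_increasing = h_volume(lambda t: -t[0] * t[1], (0.0, 0.0), (1.0, 1.0))
print(vol, not_increasing)
```

A single box with negative H-volume is enough to rule out the n-increasing property, whereas confirming it requires an argument covering all boxes.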
If each $S_k$ is nonempty and has a greatest element $b_k$, then $H$ has marginals, and the one-dimensional marginals of $H$ are the functions $H_k$ with $\mathrm{Dom}\,H_k = S_k$ and with $H_k(x) = H(b_1,\dots,b_{k-1},x,b_{k+1},\dots,b_n)$ for all $x$ in $S_k$. Higher-dimensional marginals are defined in an obvious way. One-dimensional marginals are just called marginals.

Lemma 2.1. Let $S_1,\dots,S_n$ be nonempty subsets of $\overline{\mathbb{R}}$, and let $H$ be a grounded $n$-increasing function with domain $S_1 \times \cdots \times S_n$. Then $H$ is increasing in each argument.

Lemma 2.2. Let $S_1,\dots,S_n$ be nonempty subsets of $\overline{\mathbb{R}}$, and let $H$ be a grounded $n$-increasing function with marginals and domain $S_1 \times \cdots \times S_n$. Then, if $x = (x_1,\dots,x_n)$ and $y = (y_1,\dots,y_n)$ are any points in $S_1 \times \cdots \times S_n$,

$$\bigl|H(x) - H(y)\bigr| \le \sum_{k=1}^{n} \bigl|H_k(x_k) - H_k(y_k)\bigr|.$$

For the proof, see Schweizer and Sklar (1983).

Definition 2.3. An $n$-dimensional distribution function is a function $H$ with domain $\overline{\mathbb{R}}^n$ such that $H$ is grounded, $n$-increasing and $H(\infty,\dots,\infty) = 1$.

It follows from Lemma 2.1 that the marginals of an $n$-dimensional distribution function are distribution functions, which we denote $F_1,\dots,F_n$.

Definition 2.4. An $n$-dimensional copula is a function $C$ with domain $[0,1]^n$ such that
(1) $C$ is grounded and $n$-increasing.
(2) $C$ has marginals $C_k$, $k = 1,2,\dots,n$, which satisfy $C_k(u) = u$ for all $u$ in $[0,1]$.

Note that for any $n$-copula $C$, $n \ge 3$, each $k$-dimensional marginal of $C$ is a $k$-copula. Equivalently, an $n$-copula is a function $C$ from $[0,1]^n$ to $[0,1]$ with the following properties:
(1) For every $u$ in $[0,1]^n$, $C(u) = 0$ if at least one coordinate of $u$ is 0, and $C(u) = u_k$ if all coordinates of $u$ equal 1 except $u_k$.
(2) For every $a$ and $b$ in $[0,1]^n$ such that $a_i \le b_i$ for all $i$, $V_C([a,b]) \ge 0$.

Since copulas are joint distribution functions (on $[0,1]^n$), a copula $C$ induces a probability measure on $[0,1]^n$ via

$$V_C\bigl([0,u_1] \times \cdots \times [0,u_n]\bigr) = C(u_1,\dots,u_n)$$

and a standard extension to arbitrary (not necessarily $n$-boxes) Borel subsets of $[0,1]^n$.
A standard result from measure theory says that there is a unique probability measure on the Borel subsets of $[0,1]^n$ which coincides with $V_C$ on the set of $n$-boxes of $[0,1]^n$. This probability measure will also be denoted $V_C$.

From Definition 2.4 it follows that a copula $C$ is a distribution function on $[0,1]^n$ with uniformly distributed (on $[0,1]$) marginals. The following theorem follows directly from Lemma 2.2.

Theorem 2.1. Let $C$ be an $n$-copula. Then for every $u$ and $v$ in $[0,1]^n$,

$$\bigl|C(v) - C(u)\bigr| \le \sum_{k=1}^{n} |v_k - u_k|.$$

Hence $C$ is uniformly continuous on $[0,1]^n$.

2.2. Sklar's Theorem

The following theorem is known as Sklar's Theorem. It is perhaps the most important result regarding copulas, and is used in essentially all applications of copulas.

Theorem 2.2. Let $H$ be an $n$-dimensional distribution function with marginals $F_1,\dots,F_n$. Then there exists an $n$-copula $C$ such that for all $x$ in $\overline{\mathbb{R}}^n$,

$$H(x_1,\dots,x_n) = C\bigl(F_1(x_1),\dots,F_n(x_n)\bigr). \qquad (2.1)$$

If $F_1,\dots,F_n$ are all continuous, then $C$ is unique; otherwise $C$ is uniquely determined on $\mathrm{Ran}\,F_1 \times \cdots \times \mathrm{Ran}\,F_n$. Conversely, if $C$ is an $n$-copula and $F_1,\dots,F_n$ are distribution functions, then the function $H$ defined above is an $n$-dimensional distribution function with marginals $F_1,\dots,F_n$.

For the proof, see Sklar (1996).

From Sklar's Theorem we see that for continuous multivariate distribution functions, the univariate marginals and the multivariate dependence structure can be separated, and the dependence structure can be represented by a copula.

Let $F$ be a univariate distribution function. We define the generalized inverse of $F$ as

$$F^{-1}(t) = \inf\{x \in \mathbb{R} \mid F(x) \ge t\}$$

for all $t$ in $[0,1]$, using the convention $\inf \emptyset = -\infty$.

Corollary 2.1. Let $H$ be an $n$-dimensional distribution function with continuous marginals $F_1,\dots,F_n$ and copula $C$ (where $C$ satisfies (2.1)). Then for any $u$ in $[0,1]^n$,

$$C(u_1,\dots,u_n) = H\bigl(F_1^{-1}(u_1),\dots,F_n^{-1}(u_n)\bigr).$$

Without the continuity assumption, care has to be taken; see Nelsen (1999) or Marshall (1996).
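To see why care is needed without continuity, it can help to evaluate the generalized inverse for a step function. The Python sketch below is an illustration under stated assumptions: it uses an empirical distribution function built from a small made-up sample, and the names `empirical_cdf` and `generalized_inverse` are introduced here and do not come from the chapter.

```python
import math
from bisect import bisect_right


def empirical_cdf(sample):
    """Right-continuous step function F(x) = #{x_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        return bisect_right(xs, x) / n

    return F


def generalized_inverse(sample):
    """F^{-1}(t) = inf{x : F(x) >= t} for the empirical distribution function."""
    xs = sorted(sample)
    n = len(xs)

    def F_inv(t):
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        if t == 0.0:
            return -math.inf  # F(x) >= 0 holds for every real x
        for k in range(1, n + 1):
            if k / n >= t:  # smallest order statistic with F >= t
                return xs[k - 1]

    return F_inv


# Made-up sample with a tie, so that F has jumps of different sizes.
sample = [3.0, 1.0, 2.0, 2.0, 5.0]
F = empirical_cdf(sample)
F_inv = generalized_inverse(sample)
print(F_inv(0.5), F(F_inv(0.5)))  # F(F_inv(t)) can exceed t at a jump
```

For a continuous F one has F(F^{-1}(t)) = t on (0,1), which is what Corollary 2.1 relies on. For the step function above only the inequality F(F^{-1}(t)) ≥ t survives, which is one way to see that the copula is pinned down only on the product of the ranges of the marginals.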
Example 2.1. Let $\Phi$ denote the standard univariate normal distribution function and let $\Phi_R^n$ denote the standard multivariate normal distribution function with linear correlation matrix $R$. Then

$$C(u_1,\dots,u_n) = \Phi_R^n\bigl(\Phi^{-1}(u_1),\dots,\Phi^{-1}(u_n)\bigr)$$

is the Gaussian or normal $n$-copula.

2.3. The Fréchet–Hoeffding bounds for joint distribution functions

Consider the functions $M^n$, $\Pi^n$ and $W^n$ defined on $[0,1]^n$ as follows:

$$M^n(u) = \min(u_1,\dots,u_n),$$
$$\Pi^n(u) = u_1 \cdots u_n,$$
$$W^n(u) = \max(u_1 + \cdots + u_n - n + 1,\, 0).$$

The functions $M^n$ and $\Pi^n$ are $n$-copulas for all $n \ge 2$, whereas the function $W^n$ is not a copula for any $n \ge 3$, as shown in the following example.

Example 2.2. Consider the $n$-cube $[1/2,1]^n \subset [0,1]^n$.

$$V_{W^n}\bigl([1/2,1]^n\bigr) = \max(1 + \cdots + 1 - n + 1,\, 0) - n \max\bigl(\tfrac12 + 1 + \cdots + 1 - n + 1,\, 0\bigr) + \binom{n}{2} \max\bigl(\tfrac12 + \tfrac12 + 1 + \cdots + 1 - n + 1,\, 0\bigr) + \cdots + \max\bigl(\tfrac12 + \cdots + \tfrac12 - n + 1,\, 0\bigr) = 1 - \frac{n}{2} + 0 + \cdots + 0.$$

Hence $W^n$ is not a copula for $n \ge 3$.

The following theorem is called the Fréchet–Hoeffding bounds inequality (Fréchet, 1957).

Theorem 2.3. If $C$ is any $n$-copula, then for every $u$ in $[0,1]^n$,

$$W^n(u) \le C(u) \le M^n(u).$$

For more details, including geometrical interpretations, see Mikusinski, Sherwood and Taylor (1992). Although the Fréchet–Hoeffding lower bound $W^n$ is never a copula for $n \ge 3$, it is the best possible lower bound in the following sense.

Theorem 2.4. For any $n \ge 3$ and any $u$ in $[0,1]^n$, there is an $n$-copula $C$ (which depends on $u$) such that $C(u) = W^n(u)$.

For the proof, see Nelsen (1999), p. 42.

We denote by $\overline{C}$ the joint survival function for $n$ random variables with joint distribution function $C$, i.e., if $(U_1,\dots,U_n)^T$ has distribution function $C$, then $\overline{C}(u_1,\dots,u_n) = P\{U_1 > u_1,\dots,U_n > u_n\}$.

Definition 2.5. If $C_1$ and $C_2$ are copulas, $C_1$ is smaller than $C_2$ (written $C_1 \prec C_2$) if

$$C_1(u) \le C_2(u) \quad \text{and} \quad \overline{C}_1(u) \le \overline{C}_2(u)$$

for all $u$ in $[0,1]^n$.

Note that in the bivariate case,

$$\overline{C}(u_1,u_2) = 1 - u_1 - u_2 + C(u_1,u_2),$$

and hence $C_1(u_1,u_2) \le C_2(u_1,u_2)$ if and only if $\overline{C}_1(u_1,u_2) \le \overline{C}_2(u_1,u_2)$.
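Returning to Example 2.2 and Theorem 2.3, both can be checked numerically. In the Python sketch below the function names are illustrative choices rather than notation from the chapter. The volume of the cube [1/2,1]^n under W^n is computed by grouping the vertices by the number j of coordinates equal to 1/2, which mirrors the expansion in Example 2.2, and the bounds of Theorem 2.3 are checked on a coarse grid for the product copula.

```python
from itertools import product
from math import comb, prod


def M(u):
    """Frechet-Hoeffding upper bound."""
    return min(u)


def Pi(u):
    """Product (independence) copula."""
    return prod(u)


def W(u):
    """Frechet-Hoeffding lower bound."""
    return max(sum(u) - len(u) + 1.0, 0.0)


def w_volume_half_cube(n):
    """W-volume of [1/2, 1]^n, grouping vertices by j coordinates equal to 1/2.

    Such a vertex has sign (-1)^j, W-value max(1 - j/2, 0), and there are
    comb(n, j) vertices of this kind.
    """
    return sum((-1) ** j * comb(n, j) * max(1.0 - j / 2.0, 0.0)
               for j in range(n + 1))


volumes = {n: w_volume_half_cube(n) for n in (2, 3, 4)}  # equals 1 - n/2

# Check W <= Pi <= M on a coarse grid in dimension 3.
grid = [i / 4 for i in range(5)]
bounds_hold = all(W(u) <= Pi(u) + 1e-12 and Pi(u) <= M(u) + 1e-12
                  for u in product(grid, repeat=3))
print(volumes, bounds_hold)
```

The negative volumes for n = 3 and n = 4 reproduce the conclusion of Example 2.2, while the zero volume for n = 2 is consistent with the lower bound being a copula in the bivariate case. The grid check is a spot check only and is no substitute for the proof of Theorem 2.3.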
The Fréchet-Hoeffding lower bound $W^2$ is smaller than every 2-copula, and every $n$-copula is smaller than the Fréchet-Hoeffding upper bound $M^n$. This partial ordering of the set of copulas is called a concordance ordering. It is a partial ordering since not every pair of copulas is comparable in this order. However, many important parametric families of copulas are totally ordered. We call a one-parameter family $\{C_\theta\}$ positively ordered if $C_{\theta_1} \prec C_{\theta_2}$ whenever $\theta_1 \le \theta_2$. Examples of such one-parameter families will be given later.

2.4. Copulas and random variables

Let $X_1,\dots,X_n$ be random variables with continuous distribution functions $F_1,\dots,F_n$ and joint distribution function $H$. Then $(X_1,\dots,X_n)^T$ has a unique copula $C$, where $C$ is given by (2.1). The standard copula representation of the distribution of the random vector $(X_1,\dots,X_n)^T$ then becomes
\[ H(x_1,\dots,x_n) = P\{X_1 \le x_1,\dots,X_n \le x_n\} = C(F_1(x_1),\dots,F_n(x_n)). \]
The transformations $X_i \mapsto F_i(X_i)$ used in the above representation are usually referred to as the probability-integral transformations (to uniformity) and form a standard tool in simulation methodology.

Since $X_1,\dots,X_n$ are independent if and only if $H(x_1,\dots,x_n) = F_1(x_1) \cdots F_n(x_n)$ for all $x_1,\dots,x_n$ in $\mathbb{R}$, the following result follows from Theorem 2.2.

Theorem 2.5. Let $(X_1,\dots,X_n)^T$ be a vector of continuous random variables with copula $C$. Then $X_1,\dots,X_n$ are independent if and only if $C = \Pi^n$.

One nice property of copulas is that for strictly monotone transformations of the random variables, copulas are either invariant or change in certain simple ways. Note that if the distribution function of a random variable $X$ is continuous, and if $\alpha$ is a strictly monotone function whose domain contains $\operatorname{Ran}X$, then the distribution function of the random variable $\alpha(X)$ is also continuous.

Theorem 2.6. Let $(X_1,\dots,X_n)^T$ be a vector of continuous random variables with copula $C$.
If, for $k = 1,\dots,n$, $\alpha_k$ is strictly increasing on $\operatorname{Ran}X_k$, then also $(\alpha_1(X_1),\dots,\alpha_n(X_n))^T$ has copula $C$.

Proof: Let $F_1,\dots,F_n$ denote the distribution functions of $X_1,\dots,X_n$ and let $G_1,\dots,G_n$ denote the distribution functions of $\alpha_1(X_1),\dots,\alpha_n(X_n)$, respectively. Let $(X_1,\dots,X_n)^T$ have copula $C$, and let $(\alpha_1(X_1),\dots,\alpha_n(X_n))^T$ have copula $C_\alpha$. Since $\alpha_k$ is strictly increasing,
\[ G_k(x) = P\{\alpha_k(X_k) \le x\} = P\{X_k \le \alpha_k^{-1}(x)\} = F_k(\alpha_k^{-1}(x)) \]
for any $x$ in $\mathbb{R}$, hence
\[
\begin{aligned}
C_\alpha(G_1(x_1),\dots,G_n(x_n)) &= P\{\alpha_1(X_1) \le x_1,\dots,\alpha_n(X_n) \le x_n\} \\
&= P\{X_1 \le \alpha_1^{-1}(x_1),\dots,X_n \le \alpha_n^{-1}(x_n)\} \\
&= C(F_1(\alpha_1^{-1}(x_1)),\dots,F_n(\alpha_n^{-1}(x_n))) \\
&= C(G_1(x_1),\dots,G_n(x_n)).
\end{aligned}
\]
Since $X_1,\dots,X_n$ are continuous, $\operatorname{Ran}G_1 = \cdots = \operatorname{Ran}G_n = [0,1]$. Hence it follows that $C_\alpha = C$ on $[0,1]^n$.

From Theorem 2.2 we know that the copula function $C$ "separates" an $n$-dimensional distribution function from its univariate marginals. The next theorem will show that there is also a function, $\widehat{C}$, that separates an $n$-dimensional survival function from its univariate survival marginals. Furthermore, this function can be shown to be a copula, and this survival copula can rather easily be expressed in terms of $C$ and its $k$-dimensional marginals.

Theorem 2.7. Let $(X_1,\dots,X_n)^T$ be a vector of continuous random variables with copula $C_{X_1,\dots,X_n}$. For $i = 1,\dots,n$, let $\alpha_i$ be strictly monotone on $\operatorname{Ran}X_i$, and let $(\alpha_1(X_1),\dots,\alpha_n(X_n))^T$ have copula $C_{\alpha_1(X_1),\dots,\alpha_n(X_n)}$. Furthermore, let $\alpha_k$ be strictly decreasing for some $k$. Without loss of generality let $k = 1$. Then
\[ C_{\alpha_1(X_1),\dots,\alpha_n(X_n)}(u_1,u_2,\dots,u_n) = C_{\alpha_2(X_2),\dots,\alpha_n(X_n)}(u_2,\dots,u_n) - C_{X_1,\alpha_2(X_2),\dots,\alpha_n(X_n)}(1 - u_1,u_2,\dots,u_n). \]

Proof: For $i = 1,\dots,n$, let $X_i$ have distribution function $F_i$ and let $\alpha_i(X_i)$ have distribution function $G_i$.
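Then
\[
\begin{aligned}
C_{\alpha_1(X_1),\alpha_2(X_2),\dots,\alpha_n(X_n)}(G_1(x_1),\dots,G_n(x_n))
&= P\{\alpha_1(X_1) \le x_1,\dots,\alpha_n(X_n) \le x_n\} \\
&= P\{X_1 > \alpha_1^{-1}(x_1),\alpha_2(X_2) \le x_2,\dots,\alpha_n(X_n) \le x_n\} \\
&= P\{\alpha_2(X_2) \le x_2,\dots,\alpha_n(X_n) \le x_n\} \\
&\quad - P\{X_1 \le \alpha_1^{-1}(x_1),\alpha_2(X_2) \le x_2,\dots,\alpha_n(X_n) \le x_n\} \\
&= C_{\alpha_2(X_2),\dots,\alpha_n(X_n)}(G_2(x_2),\dots,G_n(x_n)) \\
&\quad - C_{X_1,\alpha_2(X_2),\dots,\alpha_n(X_n)}(F_1(\alpha_1^{-1}(x_1)),G_2(x_2),\dots,G_n(x_n)) \\
&= C_{\alpha_2(X_2),\dots,\alpha_n(X_n)}(G_2(x_2),\dots,G_n(x_n)) \\
&\quad - C_{X_1,\alpha_2(X_2),\dots,\alpha_n(X_n)}(1 - G_1(x_1),G_2(x_2),\dots,G_n(x_n)),
\end{aligned}
\]
from which the conclusion follows directly.

By using the two theorems above recursively it is clear that the copula $C_{\alpha_1(X_1),\dots,\alpha_n(X_n)}$ can be expressed in terms of the copula $C_{X_1,\dots,X_n}$ and its lower-dimensional marginals. This is exemplified below.

Example 2.3. Consider the bivariate case. Let $\alpha_1$ be strictly decreasing and let $\alpha_2$ be strictly increasing. Then
\[ C_{\alpha_1(X_1),\alpha_2(X_2)}(u_1,u_2) = u_2 - C_{X_1,\alpha_2(X_2)}(1 - u_1,u_2) = u_2 - C_{X_1,X_2}(1 - u_1,u_2). \]
Let $\alpha_1$ and $\alpha_2$ be strictly decreasing. Then
\[
\begin{aligned}
C_{\alpha_1(X_1),\alpha_2(X_2)}(u_1,u_2) &= u_2 - C_{X_1,\alpha_2(X_2)}(1 - u_1,u_2) \\
&= u_2 - \bigl(1 - u_1 - C_{X_1,X_2}(1 - u_1,1 - u_2)\bigr) \\
&= u_1 + u_2 - 1 + C_{X_1,X_2}(1 - u_1,1 - u_2).
\end{aligned}
\]
Here $C_{\alpha_1(X_1),\alpha_2(X_2)}$ is the survival copula, $\widehat{C}$, of $(X_1,X_2)^T$, i.e.,
\[ \overline{H}(x_1,x_2) = P\{X_1 > x_1,X_2 > x_2\} = \widehat{C}(\overline{F}_1(x_1),\overline{F}_2(x_2)). \]
Note also that the joint survival function of $n$ $U(0,1)$ random variables whose joint distribution function is the copula $C$ is $\overline{C}(u_1,\dots,u_n) = \widehat{C}(1 - u_1,\dots,1 - u_n)$.

The mixed $k$th order partial derivatives of a copula $C$, $\partial^k C(u)/\partial u_1 \cdots \partial u_k$, exist for almost all $u$ in $[0,1]^n$. For such $u$, $0 \le \partial^k C(u)/\partial u_1 \cdots \partial u_k \le 1$. For details, see Nelsen (1999, p. 11). With this in mind, let
\[ C(u_1,\dots,u_n) = A_C(u_1,\dots,u_n) + S_C(u_1,\dots,u_n), \]
where
\[
\begin{aligned}
A_C(u_1,\dots,u_n) &= \int_0^{u_1} \cdots \int_0^{u_n} \frac{\partial^n}{\partial s_1 \cdots \partial s_n} C(s_1,\dots,s_n)\,ds_1 \cdots ds_n, \\
S_C(u_1,\dots,u_n) &= C(u_1,\dots,u_n) - A_C(u_1,\dots,u_n).
\end{aligned}
\]
Unlike multivariate distributions in general, the marginals of a copula are continuous, hence a copula has no individual points $u$ in $[0,1]^n$ for which $V_C(u) > 0$. If $C = A_C$ on $[0,1]^n$, then $C$ is said to be absolutely continuous. In this case $C$ has density $\frac{\partial^n}{\partial u_1 \cdots \partial u_n} C(u_1,\dots,u_n)$.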
If $C = S_C$ on $[0,1]^n$, then $C$ is said to be singular, and $\frac{\partial^n}{\partial u_1 \cdots \partial u_n} C(u_1,\dots,u_n) = 0$ almost everywhere in $[0,1]^n$. The support of a copula is the complement of the union of all open subsets $A$ of $[0,1]^n$ with $V_C(A) = 0$. When $C$ is singular its support has Lebesgue measure zero, and conversely. However, a copula can have full support without being absolutely continuous. Examples of such copulas are the so-called Marshall-Olkin copulas, which are presented later.

Example 2.4. Consider the bivariate Fréchet-Hoeffding upper bound $M$ given by $M(u,v) = \min(u,v)$ on $[0,1]^2$. It follows that
\[ \frac{\partial^2}{\partial u\,\partial v} M(u,v) = 0 \]
everywhere on $[0,1]^2$ except on the main diagonal (which has Lebesgue measure zero), and $V_M(B) = 0$ for every rectangle $B$ in $[0,1]^2$ entirely above or below the main diagonal. Hence $M$ is singular.

One of the main aims of this chapter is to present effective algorithms for random variate generation from the various copula families studied. The properties of the specific copula family are often essential for the efficiency of the corresponding algorithm. We now present a general algorithm for random variate generation from copulas. Note, however, that in most cases it is not an efficient one to use.

Consider the general situation of random variate generation from the $n$-copula $C$. Let
\[ C_k(u_1,\dots,u_k) = C(u_1,\dots,u_k,1,\dots,1), \quad k = 2,\dots,n - 1, \]
denote $k$-dimensional marginals of $C$, with $C_1(u_1) = u_1$ and $C_n(u_1,\dots,u_n) = C(u_1,\dots,u_n)$. Let $U_1,\dots,U_n$ have joint distribution function $C$. Then the conditional distribution of $U_k$ given the values of $U_1,\dots,U_{k-1}$ is given by
\[
C_k(u_k \mid u_1,\dots,u_{k-1}) = P\{U_k \le u_k \mid U_1 = u_1,\dots,U_{k-1} = u_{k-1}\}
= \frac{\partial^{k-1} C_k(u_1,\dots,u_k)}{\partial u_1 \cdots \partial u_{k-1}} \bigg/ \frac{\partial^{k-1} C_{k-1}(u_1,\dots,u_{k-1})}{\partial u_1 \cdots \partial u_{k-1}},
\]
given that the numerator and denominator exist and that the denominator is not zero. The following algorithm generates a random variate $(u_1,\dots,u_n)^T$ from $C$. As usual, let $U(0,1)$ denote the uniform distribution on $[0,1]$.

Algorithm 2.1.
* Simulate a random variate $u_1$ from $U(0,1)$.
* Simulate a random variate $u_2$ from $C_2(\cdot \mid u_1)$.
...
* Simulate a random variate $u_n$ from $C_n(\cdot \mid u_1,\dots,u_{n-1})$.

This algorithm is in fact a particular case of what is called "the standard construction". The correctness of the algorithm can be seen from the fact that for independent $U(0,1)$ random variables $Q_1,\dots,Q_n$,
\[ \bigl(Q_1,\ C_2^{-1}(Q_2 \mid Q_1),\ \dots,\ C_n^{-1}(Q_n \mid Q_1, C_2^{-1}(Q_2 \mid Q_1),\dots)\bigr)^T \]
has distribution function $C$. To simulate a value $u_k$ from $C_k(\cdot \mid u_1,\dots,u_{k-1})$ in general means simulating $q$ from $U(0,1)$, from which $u_k$ can be obtained from the equation $q = C_k(u_k \mid u_1,\dots,u_{k-1})$ by numerical rootfinding. When $C_k^{-1}(q \mid u_1,\dots,u_{k-1})$ has a closed form (and hence there is no need for numerical rootfinding) this algorithm can be recommended.

Example 2.5. Let the copula $C_\theta$ be given by $C_\theta(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$, for $\theta > 0$. Then
\[
\begin{aligned}
C_{2|1}(v \mid u) = \frac{\partial C_\theta}{\partial u}(u,v)
&= -\frac{1}{\theta}\,(u^{-\theta} + v^{-\theta} - 1)^{-1/\theta - 1}\,(-\theta u^{-\theta - 1}) \\
&= (u^{\theta})^{(-1-\theta)/\theta}\,(u^{-\theta} + v^{-\theta} - 1)^{-1/\theta - 1} \\
&= \bigl(1 + u^{\theta}(v^{-\theta} - 1)\bigr)^{(-1-\theta)/\theta}.
\end{aligned}
\]
Solving the equation $q = C_{2|1}(v \mid u)$ for $v$ yields
\[ C_{2|1}^{-1}(q \mid u) = v = \bigl((q^{-\theta/(1+\theta)} - 1)\,u^{-\theta} + 1\bigr)^{-1/\theta}. \]
The following algorithm generates a random variate $(u,v)^T$ from the above copula $C_\theta$.
* Simulate two independent random variates $u$ and $q$ from $U(0,1)$.
* Set $v = ((q^{-\theta/(1+\theta)} - 1)u^{-\theta} + 1)^{-1/\theta}$.

3. Dependence concepts

Copulas provide a natural way to study and measure dependence between random variables. As a direct consequence of Theorem 2.6, copula properties are invariant under strictly increasing transformations of the underlying random variables. Linear correlation (or Pearson's correlation) is most frequently used in practice as a measure of dependence. However, since linear correlation is not a copula-based measure of dependence, it can often be quite misleading and should not be taken as the canonical dependence measure. Below we recall the basic properties of linear correlation, and then continue with some copula-based measures of dependence.

3.1. Linear correlation

Definition 3.1. Let $(X,Y)^T$ be a vector of random variables with nonzero finite variances.
The linear correlation coefficient for $(X,Y)^T$ is
\[ \rho(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}}, \tag{3.1} \]
where $\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y)$ is the covariance of $(X,Y)^T$, and $\operatorname{Var}(X)$ and $\operatorname{Var}(Y)$ are the variances of $X$ and $Y$.

Linear correlation is a measure of linear dependence. In the case of perfect linear dependence, i.e., $Y = aX + b$ almost surely for $a \in \mathbb{R} \setminus \{0\}$, $b \in \mathbb{R}$, we have $|\rho(X,Y)| = 1$. More important is that the converse also holds. Otherwise, $-1 < \rho(X,Y) < 1$. Furthermore, linear correlation has the property that
\[ \rho(\alpha X + \beta, \gamma Y + \delta) = \operatorname{sign}(\alpha\gamma)\,\rho(X,Y), \]
for $\alpha,\gamma \in \mathbb{R} \setminus \{0\}$, $\beta,\delta \in \mathbb{R}$. Hence linear correlation is invariant under strictly increasing linear transformations. Linear correlation is easily manipulated under linear operations. Let $A$, $B$ be $m \times n$ matrices, let $a,b \in \mathbb{R}^m$, and let $X$, $Y$ be random $n$-vectors. Then
\[ \operatorname{Cov}(AX + a, BY + b) = A\operatorname{Cov}(X,Y)B^T. \]
From this it follows that for $\alpha \in \mathbb{R}^n$,
\[ \operatorname{Var}(\alpha^T X) = \alpha^T \operatorname{Cov}(X)\,\alpha, \]
where $\operatorname{Cov}(X) := \operatorname{Cov}(X,X)$. Hence the variance of a linear combination is fully determined by pairwise covariances between the components, a property which is crucial in portfolio theory.

Linear correlation is a popular but also often misunderstood measure of dependence. Its popularity stems from the ease with which it can be calculated and from the fact that it is a natural scalar measure of dependence in elliptical distributions (with well known members such as the multivariate normal and the multivariate $t$-distribution). However, most random variables are not jointly elliptically distributed, and using linear correlation as a measure of dependence in such situations might prove very misleading. Even for jointly elliptically distributed random variables there are situations where using linear correlation, as defined by (3.1), does not make sense. We might choose to model some scenario using heavy-tailed distributions such as $t_2$-distributions. In such cases the linear correlation coefficient is not even defined because of infinite second moments.

3.2. Perfect dependence

For every $n$-copula $C$ we know from the Fréchet-Hoeffding inequality (Theorem 2.3) that
\[ W^n(u_1,\dots,u_n) \le C(u_1,\dots,u_n) \le M^n(u_1,\dots,u_n). \]
Furthermore, for $n = 2$ the upper and lower bounds are themselves copulas, and we have seen that $W$ and $M$ are the bivariate distribution functions of the random vectors $(U,1 - U)^T$ and $(U,U)^T$, respectively, where $U \sim U(0,1)$ (i.e., $U$ is uniformly distributed on $[0,1]$). In this case we say that $W$ describes perfect negative dependence and $M$ describes perfect positive dependence.

Theorem 3.1. Let $(X,Y)^T$ have one of the copulas $W$ or $M$. Then there exist two monotone functions $\alpha,\beta : \mathbb{R} \to \mathbb{R}$ and a random variable $Z$ so that
\[ (X,Y) =_d (\alpha(Z),\beta(Z)), \]
with $\alpha$ increasing and $\beta$ decreasing in the former case ($W$) and both $\alpha$ and $\beta$ increasing in the latter case ($M$). The converse of this result is also true.

For a proof, see Embrechts, McNeil and Straumann (2002). In a different form this result was already in Fréchet (1951).

Definition 3.2. If $(X,Y)^T$ has the copula $M$ then $X$ and $Y$ are said to be comonotonic; if it has the copula $W$ they are said to be countermonotonic.

Note that if any of $F$ and $G$ (the distribution functions of $X$ and $Y$, respectively) have discontinuities, so that the copula is not unique, then $W$ and $M$ are possible copulas. In the case of $F$ and $G$ being continuous, a stronger version of the result can be stated:
\[
\begin{aligned}
C = W &\iff Y = T(X) \text{ a.s.}, \quad T = G^{-1} \circ (1 - F) \text{ decreasing}, \\
C = M &\iff Y = T(X) \text{ a.s.}, \quad T = G^{-1} \circ F \text{ increasing}.
\end{aligned}
\]
Other characterizations of comonotonicity can be found in Denneberg (1994).

3.3. Concordance

Let $(x,y)^T$ and $(\tilde{x},\tilde{y})^T$ be two observations from a vector $(X,Y)^T$ of continuous random variables. Then $(x,y)^T$ and $(\tilde{x},\tilde{y})^T$ are said to be concordant if $(x - \tilde{x})(y - \tilde{y}) > 0$, and discordant if $(x - \tilde{x})(y - \tilde{y}) < 0$.

The following theorem can be found in Nelsen (1999, p. 127). Many of the results in this section are direct consequences of this theorem.

Theorem 3.2.
Let $(X,Y)^T$ and $(\tilde{X},\tilde{Y})^T$ be independent vectors of continuous random variables with joint distribution functions $H$ and $\tilde{H}$, respectively, with common marginals $F$ (of $X$ and $\tilde{X}$) and $G$ (of $Y$ and $\tilde{Y}$). Let $C$ and $\tilde{C}$ denote the copulas of $(X,Y)^T$ and $(\tilde{X},\tilde{Y})^T$, respectively, so that $H(x,y) = C(F(x),G(y))$ and $\tilde{H}(x,y) = \tilde{C}(F(x),G(y))$. Let $Q$ denote the difference between the probability of concordance and discordance of $(X,Y)^T$ and $(\tilde{X},\tilde{Y})^T$, i.e., let
\[ Q = P\{(X - \tilde{X})(Y - \tilde{Y}) > 0\} - P\{(X - \tilde{X})(Y - \tilde{Y}) < 0\}. \]
Then
\[ Q = Q(C,\tilde{C}) = 4 \iint_{[0,1]^2} \tilde{C}(u,v)\,dC(u,v) - 1. \]

Proof: Since the random variables are all continuous, $P\{(X - \tilde{X})(Y - \tilde{Y}) < 0\} = 1 - P\{(X - \tilde{X})(Y - \tilde{Y}) > 0\}$ and hence $Q = 2P\{(X - \tilde{X})(Y - \tilde{Y}) > 0\} - 1$. But
\[ P\{(X - \tilde{X})(Y - \tilde{Y}) > 0\} = P\{X > \tilde{X},Y > \tilde{Y}\} + P\{X < \tilde{X},Y < \tilde{Y}\}, \]
and these probabilities can be evaluated by integrating over the distribution of one of the vectors $(X,Y)^T$ or $(\tilde{X},\tilde{Y})^T$. Hence
\[
\begin{aligned}
P\{X > \tilde{X},Y > \tilde{Y}\} &= P\{\tilde{X} < X,\tilde{Y} < Y\} \\
&= \iint_{\mathbb{R}^2} P\{\tilde{X} < x,\tilde{Y} < y\}\,dC(F(x),G(y)) \\
&= \iint_{\mathbb{R}^2} \tilde{C}(F(x),G(y))\,dC(F(x),G(y)).
\end{aligned}
\]
Employing the probability-integral transforms $u = F(x)$ and $v = G(y)$ then yields
\[ P\{X > \tilde{X},Y > \tilde{Y}\} = \iint_{[0,1]^2} \tilde{C}(u,v)\,dC(u,v). \]
Similarly,
\[
\begin{aligned}
P\{X < \tilde{X},Y < \tilde{Y}\} &= \iint_{\mathbb{R}^2} P\{\tilde{X} > x,\tilde{Y} > y\}\,dC(F(x),G(y)) \\
&= \iint_{\mathbb{R}^2} \bigl(1 - F(x) - G(y) + \tilde{C}(F(x),G(y))\bigr)\,dC(F(x),G(y)) \\
&= \iint_{[0,1]^2} \bigl(1 - u - v + \tilde{C}(u,v)\bigr)\,dC(u,v).
\end{aligned}
\]
But since $C$ is the joint distribution function of a vector $(U,V)^T$ of $U(0,1)$ random variables, $E(U) = E(V) = 1/2$, and hence
\[ P\{X < \tilde{X},Y < \tilde{Y}\} = 1 - \frac12 - \frac12 + \iint_{[0,1]^2} \tilde{C}(u,v)\,dC(u,v) = \iint_{[0,1]^2} \tilde{C}(u,v)\,dC(u,v). \]
Thus
\[ P\{(X - \tilde{X})(Y - \tilde{Y}) > 0\} = 2 \iint_{[0,1]^2} \tilde{C}(u,v)\,dC(u,v), \]
and the conclusion follows.

Corollary 3.1. Let $C$, $\tilde{C}$, and $Q$ be as given in Theorem 3.2. Then
(1) $Q$ is symmetric in its arguments: $Q(C,\tilde{C}) = Q(\tilde{C},C)$.
(2) $Q$ is nondecreasing in each argument: if $C \prec C'$, then $Q(C,\tilde{C}) \le Q(C',\tilde{C})$.
(3) Copulas can be replaced by survival copulas in $Q$, i.e., $Q(C,\tilde{C}) = Q(\widehat{C},\widehat{\tilde{C}})$.

The following definition can be found in Scarsini (1984).

Definition 3.3.
A real valued measure $\kappa$ of dependence between two continuous random variables $X$ and $Y$ whose copula is $C$ is a measure of concordance if it satisfies the following properties:
(1) $\kappa$ is defined for every pair $X,Y$ of continuous random variables.
(2) $-1 \le \kappa_{X,Y} \le 1$, $\kappa_{X,X} = 1$ and $\kappa_{X,-X} = -1$.
(3) $\kappa_{X,Y} = \kappa_{Y,X}$.
(4) If $X$ and $Y$ are independent, then $\kappa_{X,Y} = \kappa_{\Pi} = 0$.
(5) $\kappa_{-X,Y} = \kappa_{X,-Y} = -\kappa_{X,Y}$.
(6) If $C$ and $\tilde{C}$ are copulas such that $C \prec \tilde{C}$, then $\kappa_C \le \kappa_{\tilde{C}}$.
(7) If $\{(X_n,Y_n)\}$ is a sequence of continuous random variables with copulas $C_n$, and if $\{C_n\}$ converges pointwise to $C$, then $\lim_{n \to \infty} \kappa_{C_n} = \kappa_C$.

Let $\kappa$ be a measure of concordance for continuous random variables $X$ and $Y$. As a consequence of Definition 3.3, if $Y$ is almost surely an increasing function of $X$, then $\kappa_{X,Y} = \kappa_M = 1$, and if $Y$ is almost surely a decreasing function of $X$, then $\kappa_{X,Y} = \kappa_W = -1$. Moreover, if $\alpha$ and $\beta$ are almost surely strictly increasing functions on $\operatorname{Ran}X$ and $\operatorname{Ran}Y$ respectively, then $\kappa_{\alpha(X),\beta(Y)} = \kappa_{X,Y}$.

3.4. Kendall's tau and Spearman's rho

In this section we discuss two important measures of dependence (concordance) known as Kendall's tau and Spearman's rho. They provide perhaps the best alternatives to the linear correlation coefficient as a measure of dependence for nonelliptical distributions, for which the linear correlation coefficient is inappropriate and often misleading. For more details about Kendall's tau and Spearman's rho and their estimators (sample versions) we refer to Kendall and Stuart (1979), Kruskal (1958), Lehmann (1975), Capéraà and Genest (1993). For other interesting scalar measures of dependence see Schweizer and Wolff (1981).

Definition 3.4. Kendall's tau for the random vector $(X,Y)^T$ is defined as
\[ \tau(X,Y) = P\{(X - \tilde{X})(Y - \tilde{Y}) > 0\} - P\{(X - \tilde{X})(Y - \tilde{Y}) < 0\}, \]
where $(\tilde{X},\tilde{Y})^T$ is an independent copy of $(X,Y)^T$.

Hence Kendall's tau for $(X,Y)^T$ is simply the probability of concordance minus the probability of discordance.

Theorem 3.3. Let $(X,Y)^T$ be a vector of continuous random variables with copula $C$.
Then Kendall's tau for $(X,Y)^T$ is given by
\[ \tau(X,Y) = Q(C,C) = 4 \iint_{[0,1]^2} C(u,v)\,dC(u,v) - 1. \]
Note that the integral above is the expected value of the random variable $C(U,V)$, where $U,V \sim U(0,1)$ with joint distribution function $C$, i.e., $\tau(X,Y) = 4E(C(U,V)) - 1$.

Definition 3.5. Spearman's rho for the random vector $(X,Y)^T$ is defined as
\[ \rho_S(X,Y) = 3\bigl(P\{(X - \tilde{X})(Y - Y') > 0\} - P\{(X - \tilde{X})(Y - Y') < 0\}\bigr), \]
where $(X,Y)^T$, $(\tilde{X},\tilde{Y})^T$ and $(X',Y')^T$ are independent copies.

Note that $\tilde{X}$ and $Y'$ are independent. Using Theorem 3.2 and the first part of Corollary 3.1 we obtain the following result.

Theorem 3.4. Let $(X,Y)^T$ be a vector of continuous random variables with copula $C$. Then Spearman's rho for $(X,Y)^T$ is given by
\[ \rho_S(X,Y) = 3Q(C,\Pi) = 12 \iint_{[0,1]^2} uv\,dC(u,v) - 3 = 12 \iint_{[0,1]^2} C(u,v)\,du\,dv - 3. \]
Hence, if $X \sim F$ and $Y \sim G$, and we let $U = F(X)$ and $V = G(Y)$, then
\[
\begin{aligned}
\rho_S(X,Y) &= 12 \iint_{[0,1]^2} uv\,dC(u,v) - 3 = 12E(UV) - 3 \\
&= \frac{E(UV) - 1/4}{1/12} = \frac{\operatorname{Cov}(U,V)}{\sqrt{\operatorname{Var}(U)}\sqrt{\operatorname{Var}(V)}} = \rho(F(X),G(Y)).
\end{aligned}
\]
In the next theorem we will see that Kendall's tau and Spearman's rho are concordance measures according to Definition 3.3.

Theorem 3.5. If $X$ and $Y$ are continuous random variables whose copula is $C$, then Kendall's tau and Spearman's rho satisfy the properties in Definition 3.3 for a measure of concordance.

For a proof, see Nelsen (1999, p. 137).

Example 3.1. Kendall's tau and Spearman's rho for the random vector $(X,Y)^T$ are invariant under strictly increasing componentwise transformations. This property does not hold for linear correlation. It is not difficult to construct examples; the following construction is instructive in its own right. Let $X$ and $Y$ be standard exponential random variables with copula $C$, where $C$ is a member of the Farlie-Gumbel-Morgenstern family, i.e., $C$ is given by
\[ C(u,v) = uv + \theta uv(1 - u)(1 - v), \]
for some $\theta$ in $[-1,1]$. The joint distribution function $H$ of $X$ and $Y$ is given by
\[ H(x,y) = C(1 - e^{-x},1 - e^{-y}). \]
Let $\rho$ denote the linear correlation coefficient.
Then
\[ \rho(X,Y) = \frac{E(XY) - E(X)E(Y)}{\sqrt{\operatorname{Var}(X)}\sqrt{\operatorname{Var}(Y)}} = E(XY) - 1, \]
where
\[
\begin{aligned}
E(XY) &= \int_0^\infty \int_0^\infty xy\,dH(x,y) \\
&= \int_0^\infty \int_0^\infty xy\bigl((1 + \theta)e^{-x-y} - 2\theta e^{-2x-y} - 2\theta e^{-x-2y} + 4\theta e^{-2x-2y}\bigr)\,dx\,dy \\
&= 1 + \frac{\theta}{4}.
\end{aligned}
\]
Hence $\rho(X,Y) = \theta/4$. But
\[
\begin{aligned}
\rho(1 - e^{-X},1 - e^{-Y}) = \rho_S(X,Y) &= 12 \iint_{[0,1]^2} C(u,v)\,du\,dv - 3 \\
&= 12 \iint_{[0,1]^2} \bigl(uv + \theta uv(1 - u)(1 - v)\bigr)\,du\,dv - 3 \\
&= 12\Bigl(\frac14 + \frac{\theta}{36}\Bigr) - 3 = \frac{\theta}{3}.
\end{aligned}
\]
Hence $\rho(X,Y)$ is not invariant under strictly increasing transformations of $X$ and $Y$, and therefore linear correlation is not a measure of concordance.

Although the properties listed under Definition 3.3 are useful, there are some additional properties that would make a measure of concordance even more useful. Recall that for a random vector $(X,Y)^T$ with copula $C$,
\[
\begin{aligned}
C = M &\implies \tau_C = \rho_C = 1, \\
C = W &\implies \tau_C = \rho_C = -1.
\end{aligned}
\]
The following theorem states that the converse is also true.

Theorem 3.6. Let $X$ and $Y$ be continuous random variables with copula $C$, and let $\kappa$ denote Kendall's tau or Spearman's rho. Then the following are true:
(1) $\kappa(X,Y) = 1 \iff C = M$.
(2) $\kappa(X,Y) = -1 \iff C = W$.

For a proof, see Embrechts, McNeil and Straumann (2002).

From the definitions of Kendall's tau and Spearman's rho it follows that both are increasing functions of the value of the copula under consideration. Thus they are increasing with respect to the concordance ordering given in Definition 2.5. Moreover, for continuous random variables all values in the interval $[-1,1]$ can be obtained for Kendall's tau or Spearman's rho by a suitable choice of the underlying copula. This is, however, not the case with linear correlation, as is shown in the following example from Embrechts, McNeil and Straumann (2002).

Example 3.2. Let $X \sim \mathrm{LN}(0,1)$ (Lognormal) and $Y \sim \mathrm{LN}(0,\sigma^2)$, $\sigma > 0$. Then $\rho_{\min} = \rho(e^Z,e^{-\sigma Z})$ and $\rho_{\max} = \rho(e^Z,e^{\sigma Z})$, where $Z \sim N(0,1)$. $\rho_{\min}$ and $\rho_{\max}$ can be calculated, yielding
\[ \rho_{\min} = \frac{e^{-\sigma} - 1}{\sqrt{e - 1}\sqrt{e^{\sigma^2} - 1}}, \qquad \rho_{\max} = \frac{e^{\sigma} - 1}{\sqrt{e - 1}\sqrt{e^{\sigma^2} - 1}}, \]
from which it follows that $\lim_{\sigma \to \infty} \rho_{\min} = \lim_{\sigma \to \infty} \rho_{\max} = 0$. Hence the linear correlation coefficient can be almost zero, even if $X$ and $Y$ are comonotonic or countermonotonic.
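To make Example 3.2 concrete, the following sketch evaluates the two closed-form expressions for a few values of $\sigma$. It is only an illustration of the formulas above; the function name `attainable_correlations` and the chosen values of $\sigma$ are assumptions, and `math.expm1` is used merely to compute $e^x - 1$ accurately.

```python
import math


def attainable_correlations(sigma):
    """Return (rho_min, rho_max) for X ~ LN(0, 1) and Y ~ LN(0, sigma^2)."""
    # Common denominator sqrt(e - 1) * sqrt(exp(sigma^2) - 1).
    denominator = math.sqrt((math.e - 1.0) * math.expm1(sigma ** 2))
    rho_min = math.expm1(-sigma) / denominator  # countermonotonic case
    rho_max = math.expm1(sigma) / denominator   # comonotonic case
    return rho_min, rho_max


for sigma in (0.5, 1.0, 2.0, 4.0):
    rho_min, rho_max = attainable_correlations(sigma)
    print(f"sigma={sigma}: rho_min={rho_min:.4f}, rho_max={rho_max:.4f}")
```

For $\sigma = 1$ the upper value equals 1, since $Y$ is then a linear function of $X$ in the comonotonic case; as $\sigma$ grows, both attainable extremes shrink towards zero even though the dependence remains perfect.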
Kendall's tau and Spearman's rho are measures of dependence between two random variables. However, the extension to higher dimensions is obvious: we simply write pairwise correlations in an $n \times n$ matrix in the same way as is done for linear correlation.

3.5. Tail dependence

The concept of tail dependence relates to the amount of dependence in the upper-right-quadrant tail or lower-left-quadrant tail of a bivariate distribution. It is a concept that is relevant for the study of dependence between extreme values. It turns out that tail dependence between two continuous random variables $X$ and $Y$ is a copula property, and hence the amount of tail dependence is invariant under strictly increasing transformations of $X$ and $Y$.

Definition 3.6. Let $(X,Y)^T$ be a vector of continuous random variables with marginal distribution functions $F$ and $G$. The coefficient of upper tail dependence of $(X,Y)^T$ is
\[ \lim_{u \nearrow 1} P\{Y > G^{-1}(u) \mid X > F^{-1}(u)\} = \lambda_U, \]
provided that the limit $\lambda_U \in [0,1]$ exists. If $\lambda_U \in (0,1]$, $X$ and $Y$ are said to be asymptotically dependent in the upper tail; if $\lambda_U = 0$, $X$ and $Y$ are said to be asymptotically independent in the upper tail.

Since $P\{Y > G^{-1}(u) \mid X > F^{-1}(u)\}$ can be written as
\[ \frac{1 - P\{X \le F^{-1}(u)\} - P\{Y \le G^{-1}(u)\} + P\{X \le F^{-1}(u),Y \le G^{-1}(u)\}}{1 - P\{X \le F^{-1}(u)\}}, \]
an alternative and equivalent definition (for continuous random variables), from which it is seen that the concept of tail dependence is indeed a copula property, is the following, which can be found in Joe (1997, p. 33).

Definition 3.7. If a bivariate copula $C$ is such that
\[ \lim_{u \nearrow 1} \frac{1 - 2u + C(u,u)}{1 - u} = \lambda_U \]
exists, then $C$ has upper tail dependence if $\lambda_U \in (0,1]$, and upper tail independence if $\lambda_U = 0$.

Example 3.3. Consider the bivariate Gumbel family of copulas given by
\[ C_\theta(u,v) = \exp\bigl(-[(-\ln u)^\theta + (-\ln v)^\theta]^{1/\theta}\bigr), \]
for $\theta \ge 1$. Then
\[ \frac{1 - 2u + C_\theta(u,u)}{1 - u} = \frac{1 - 2u + \exp(2^{1/\theta}\ln u)}{1 - u} = \frac{1 - 2u + u^{2^{1/\theta}}}{1 - u}, \]
and hence
\[ \lim_{u \nearrow 1} \frac{1 - 2u + C_\theta(u,u)}{1 - u} = 2 - \lim_{u \nearrow 1} 2^{1/\theta} u^{2^{1/\theta} - 1} = 2 - 2^{1/\theta}. \]
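The limit in Example 3.3 can be illustrated numerically by evaluating the ratio of Definition 3.7 at points $u$ close to 1. This is a rough sketch rather than a proof: the value $\theta = 2$, the evaluation points and the function names are assumptions chosen for illustration.

```python
import math


def gumbel_copula(u, v, theta):
    """Bivariate Gumbel copula for theta >= 1 and u, v in (0, 1]."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def upper_tail_ratio(copula, u):
    """(1 - 2u + C(u, u)) / (1 - u); its limit as u -> 1 is lambda_U."""
    return (1.0 - 2.0 * u + copula(u, u)) / (1.0 - u)


theta = 2.0
closed_form = 2.0 - 2.0 ** (1.0 / theta)

# Evaluate the ratio at u = 1 - 10^{-k} for k = 2, ..., 6.
approximations = [
    upper_tail_ratio(lambda a, b: gumbel_copula(a, b, theta), 1.0 - 10.0 ** (-k))
    for k in range(2, 7)
]
for k, value in zip(range(2, 7), approximations):
    print(f"u = 1 - 1e-{k}: ratio = {value:.6f} (limit {closed_form:.6f})")
```

Pushing $u$ much closer to 1 than this is not advisable in floating-point arithmetic, because the numerator suffers from cancellation.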
Thus for $\theta > 1$, $C_\theta$ has upper tail dependence.

For copulas without a simple closed form an alternative formula for $\lambda_U$ is more useful. An example is given in the case of the Gaussian copula
\[ C_R(u,v) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi\sqrt{1 - R_{12}^2}} \exp\Bigl\{-\frac{s^2 - 2R_{12}st + t^2}{2(1 - R_{12}^2)}\Bigr\}\,ds\,dt, \]
where $-1 < R_{12} < 1$ and $\Phi$ is the univariate standard normal distribution function. Consider a pair of $U(0,1)$ random variables $(U,V)$ with copula $C$. First note that $P\{V \le v \mid U = u\} = \partial C(u,v)/\partial u$ and $P\{V > v \mid U = u\} = 1 - \partial C(u,v)/\partial u$, and similarly when conditioning on $V$. Then
\[
\begin{aligned}
\lambda_U &= \lim_{u \nearrow 1} \frac{\overline{C}(u,u)}{1 - u} = -\lim_{u \nearrow 1} \frac{d\overline{C}(u,u)}{du} \\
&= -\lim_{u \nearrow 1} \Bigl(-2 + \frac{\partial}{\partial s} C(s,t)\Big|_{s=t=u} + \frac{\partial}{\partial t} C(s,t)\Big|_{s=t=u}\Bigr) \\
&= \lim_{u \nearrow 1} \bigl(P\{V > u \mid U = u\} + P\{U > u \mid V = u\}\bigr).
\end{aligned}
\]
Furthermore, if $C$ is an exchangeable copula, i.e., $C(u,v) = C(v,u)$, then the expression for $\lambda_U$ simplifies to
\[ \lambda_U = 2 \lim_{u \nearrow 1} P\{V > u \mid U = u\}. \]

Example 3.4. Let $(X,Y)^T$ have the bivariate standard normal distribution function with linear correlation coefficient $\rho$. That is, $(X,Y)^T \sim C(\Phi(x),\Phi(y))$, where $C$ is a member of the Gaussian family given above with $R_{12} = \rho$. Since copulas in this family are exchangeable,
\[ \lambda_U = 2 \lim_{u \nearrow 1} P\{V > u \mid U = u\}, \]
and because $\Phi$ is a distribution function with infinite right endpoint,
\[ \lim_{u \nearrow 1} P\{V > u \mid U = u\} = \lim_{x \to \infty} P\{\Phi^{-1}(V) > x \mid \Phi^{-1}(U) = x\} = \lim_{x \to \infty} P\{X > x \mid Y = x\}. \]
Using the well known fact that $Y \mid X = x \sim N(\rho x,1 - \rho^2)$ we obtain
\[ \lambda_U = 2 \lim_{x \to \infty} \overline{\Phi}\Bigl(\frac{x - \rho x}{\sqrt{1 - \rho^2}}\Bigr) = 2 \lim_{x \to \infty} \overline{\Phi}\Bigl(\frac{x\sqrt{1 - \rho}}{\sqrt{1 + \rho}}\Bigr), \]
from which it follows that $\lambda_U = 0$ for $R_{12} < 1$. Hence the Gaussian copula $C$ with $\rho < 1$ does not have upper tail dependence.

The concept of lower tail dependence can be defined in a similar way. If the limit $\lim_{u \searrow 0} C(u,u)/u = \lambda_L$ exists, then $C$ has lower tail dependence if $\lambda_L \in (0,1]$, and lower tail independence if $\lambda_L = 0$. For copulas without a simple closed form an alternative formula for $\lambda_L$ is more useful. Consider a random vector $(U,V)^T$ with copula $C$.
Then
\[
\begin{aligned}
\lambda_L &= \lim_{u \searrow 0} \frac{C(u,u)}{u} = \lim_{u \searrow 0} \frac{dC(u,u)}{du} \\
&= \lim_{u \searrow 0} \Bigl(\frac{\partial}{\partial s} C(s,t)\Big|_{s=t=u} + \frac{\partial}{\partial t} C(s,t)\Big|_{s=t=u}\Bigr) \\
&= \lim_{u \searrow 0} \bigl(P\{V < u \mid U = u\} + P\{U < u \mid V = u\}\bigr).
\end{aligned}
\]
Furthermore, if $C$ is an exchangeable copula, i.e., $C(u,v) = C(v,u)$, then the expression for $\lambda_L$ simplifies to
\[ \lambda_L = 2 \lim_{u \searrow 0} P\{V < u \mid U = u\}. \]
Recall that the survival copula of two random variables with copula $C$ is given by
\[ \widehat{C}(u,v) = u + v - 1 + C(1 - u,1 - v), \]
and the joint survival function for two $U(0,1)$ random variables whose joint distribution function is $C$ is given by
\[ \overline{C}(u,v) = 1 - u - v + C(u,v) = \widehat{C}(1 - u,1 - v). \]
Hence it follows that
\[ \lim_{u \nearrow 1} \frac{\overline{C}(u,u)}{1 - u} = \lim_{u \nearrow 1} \frac{\widehat{C}(1 - u,1 - u)}{1 - u} = \lim_{u \searrow 0} \frac{\widehat{C}(u,u)}{u}, \]
so the coefficient of upper tail dependence of $C$ is the coefficient of lower tail dependence of $\widehat{C}$. Similarly, the coefficient of lower tail dependence of $C$ is the coefficient of upper tail dependence of $\widehat{C}$.

4. Marshall-Olkin copulas

In this section we discuss a class of copulas called Marshall-Olkin copulas. To be able to derive these copulas and present explicit expressions for rank correlation and tail dependence coefficients without tedious calculations, we begin with bivariate Marshall-Olkin copulas. We then continue with the general $n$-dimensional case and suggest applications of Marshall-Olkin copulas to the modelling of dependent risks. For further details about Marshall-Olkin distributions we refer to Marshall and Olkin (1967). Similar ideas are contained in Muliere and Scarsini (1987).

4.1. Bivariate Marshall-Olkin copulas

Consider a two-component system where the components are subject to shocks, which are fatal to one or both components. Let $X_1$ and $X_2$ denote the lifetimes of the two components. Furthermore, assume that the shocks follow three independent Poisson processes with parameters $\lambda_1,\lambda_2,\lambda_{12} \ge 0$, where the index indicates whether the shocks affect only component 1, only component 2 or both.
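Then the times $Z_1$, $Z_2$ and $Z_{12}$ of occurrence of these shocks are independent exponential random variables with parameters $\lambda_1$, $\lambda_2$ and $\lambda_{12}$, respectively. Hence
\[ \overline{H}(x_1,x_2) = P\{X_1 > x_1,X_2 > x_2\} = P\{Z_1 > x_1\}P\{Z_2 > x_2\}P\{Z_{12} > \max(x_1,x_2)\}. \]
The univariate survival functions for $X_1$ and $X_2$ are $\overline{F}_1(x_1) = \exp(-(\lambda_1 + \lambda_{12})x_1)$ and $\overline{F}_2(x_2) = \exp(-(\lambda_2 + \lambda_{12})x_2)$. Furthermore, since $\max(x_1,x_2) = x_1 + x_2 - \min(x_1,x_2)$,
\[
\begin{aligned}
\overline{H}(x_1,x_2) &= \exp\bigl(-(\lambda_1 + \lambda_{12})x_1 - (\lambda_2 + \lambda_{12})x_2 + \lambda_{12}\min(x_1,x_2)\bigr) \\
&= \overline{F}_1(x_1)\overline{F}_2(x_2)\min\bigl(\exp(\lambda_{12}x_1),\exp(\lambda_{12}x_2)\bigr).
\end{aligned}
\]
Let $\alpha_1 = \lambda_{12}/(\lambda_1 + \lambda_{12})$ and $\alpha_2 = \lambda_{12}/(\lambda_2 + \lambda_{12})$. Then $\exp(\lambda_{12}x_1) = \overline{F}_1(x_1)^{-\alpha_1}$ and $\exp(\lambda_{12}x_2) = \overline{F}_2(x_2)^{-\alpha_2}$, and hence the survival copula of $(X_1,X_2)^T$ is given by
\[ \widehat{C}(u_1,u_2) = u_1u_2\min(u_1^{-\alpha_1},u_2^{-\alpha_2}) = \min(u_1^{1-\alpha_1}u_2,\ u_1u_2^{1-\alpha_2}). \]
This construction leads to a copula family given by
\[
C_{\alpha_1,\alpha_2}(u_1,u_2) = \min(u_1^{1-\alpha_1}u_2,\ u_1u_2^{1-\alpha_2}) =
\begin{cases}
u_1^{1-\alpha_1}u_2, & u_1^{\alpha_1} \ge u_2^{\alpha_2}, \\
u_1u_2^{1-\alpha_2}, & u_1^{\alpha_1} \le u_2^{\alpha_2}.
\end{cases}
\]
This family is known as the Marshall-Olkin family. Marshall-Olkin copulas have both an absolutely continuous and a singular component. Since
\[
\frac{\partial^2}{\partial u_1\,\partial u_2} C_{\alpha_1,\alpha_2}(u_1,u_2) =
\begin{cases}
(1 - \alpha_1)\,u_1^{-\alpha_1}, & u_1^{\alpha_1} > u_2^{\alpha_2}, \\
(1 - \alpha_2)\,u_2^{-\alpha_2}, & u_1^{\alpha_1} < u_2^{\alpha_2},
\end{cases}
\]
the mass of the singular component is concentrated on the curve $u_1^{\alpha_1} = u_2^{\alpha_2}$ in $[0,1]^2$, as seen in Figure 1.

Fig. 1. A simulation from the Marshall-Olkin copula with $\lambda_1 = 1.1$, $\lambda_2 = 0.2$ and $\lambda_{12} = 0.6$.

Kendall's tau and Spearman's rho are quite easily evaluated for this copula family. For Spearman's rho, applying Theorem 3.4 yields
\[
\begin{aligned}
\rho_S(C_{\alpha_1,\alpha_2}) &= 12 \iint_{[0,1]^2} C_{\alpha_1,\alpha_2}(u,v)\,du\,dv - 3 \\
&= 12 \int_0^1 \Bigl(\int_0^{u^{\alpha_1/\alpha_2}} u^{1-\alpha_1}v\,dv + \int_{u^{\alpha_1/\alpha_2}}^1 uv^{1-\alpha_2}\,dv\Bigr)\,du - 3 \\
&= \frac{3\alpha_1\alpha_2}{2\alpha_1 + 2\alpha_2 - \alpha_1\alpha_2}.
\end{aligned}
\]
To evaluate Kendall's tau we use the following theorem, a proof of which is found in Nelsen (1999, p. 131).

Theorem 4.1. Let $C$ be a copula such that the product $(\partial C/\partial u)(\partial C/\partial v)$ is integrable on $[0,1]^2$. Then
\[ \iint_{[0,1]^2} C(u,v)\,dC(u,v) = \frac12 - \iint_{[0,1]^2} \frac{\partial}{\partial u} C(u,v)\,\frac{\partial}{\partial v} C(u,v)\,du\,dv. \]
Using Theorems 3.3 and 4.1 we obtain
\[
\begin{aligned}
\tau(C_{\alpha_1,\alpha_2}) &= 4 \iint_{[0,1]^2} C_{\alpha_1,\alpha_2}(u,v)\,dC_{\alpha_1,\alpha_2}(u,v) - 1 \\
&= 4\Bigl(\frac12 - \iint_{[0,1]^2} \frac{\partial}{\partial u} C_{\alpha_1,\alpha_2}(u,v)\,\frac{\partial}{\partial v} C_{\alpha_1,\alpha_2}(u,v)\,du\,dv\Bigr) - 1 \\
&= \frac{\alpha_1\alpha_2}{\alpha_1 + \alpha_2 - \alpha_1\alpha_2}.
\end{aligned}
\]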
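As a sanity check on the closed-form expression for Spearman's rho, the sketch below approximates $12\iint C_{\alpha_1,\alpha_2}\,du\,dv - 3$ with a simple midpoint rule and compares it with the formula, using the shock intensities quoted in the caption of Fig. 1. The midpoint rule, the grid size and all function names are assumptions made for illustration only.

```python
def marshall_olkin_copula(u, v, alpha1, alpha2):
    """C(u, v) = min(u^(1 - alpha1) * v, u * v^(1 - alpha2))."""
    return min(u ** (1.0 - alpha1) * v, u * v ** (1.0 - alpha2))


def alphas(lambda1, lambda2, lambda12):
    """Map shock intensities to the copula parameters (alpha1, alpha2)."""
    return lambda12 / (lambda1 + lambda12), lambda12 / (lambda2 + lambda12)


def spearman_rho_closed_form(alpha1, alpha2):
    return 3.0 * alpha1 * alpha2 / (2.0 * alpha1 + 2.0 * alpha2 - alpha1 * alpha2)


def kendall_tau_closed_form(alpha1, alpha2):
    return alpha1 * alpha2 / (alpha1 + alpha2 - alpha1 * alpha2)


def spearman_rho_numeric(alpha1, alpha2, steps=400):
    """Midpoint-rule approximation of 12 * (integral of C over [0,1]^2) - 3."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        for j in range(steps):
            v = (j + 0.5) * h
            total += marshall_olkin_copula(u, v, alpha1, alpha2)
    return 12.0 * total * h * h - 3.0


a1, a2 = alphas(1.1, 0.2, 0.6)  # intensities from the caption of Fig. 1
print("rho_S closed form:", spearman_rho_closed_form(a1, a2))
print("rho_S numeric:    ", spearman_rho_numeric(a1, a2))
print("tau closed form:  ", kendall_tau_closed_form(a1, a2))
```

The numerical and closed-form values should agree up to the discretization error of the grid; the case $\alpha_1 = \alpha_2 = 1$ reduces the copula to $M$ and both formulas to 1.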
Thus, all values in the interval $[0,1]$ can be obtained for $\rho_S(C_{\alpha_1,\alpha_2})$ and $\tau(C_{\alpha_1,\alpha_2})$. The Marshall-Olkin copulas have upper tail dependence. Without loss of generality assume that $\alpha_1 > \alpha_2$; then
\[
\begin{aligned}
\lim_{u \nearrow 1} \frac{\overline{C}(u,u)}{1 - u} &= \lim_{u \nearrow 1} \frac{1 - 2u + u^2\min(u^{-\alpha_1},u^{-\alpha_2})}{1 - u} \\
&= \lim_{u \nearrow 1} \frac{1 - 2u + u^2u^{-\alpha_2}}{1 - u} \\
&= \lim_{u \nearrow 1} \bigl(2 - 2u^{1-\alpha_2} + \alpha_2u^{1-\alpha_2}\bigr) = \alpha_2,
\end{aligned}
\]
and hence $\lambda_U = \min(\alpha_1,\alpha_2)$ is the coefficient of upper tail dependence.

4.2. A multivariate extension

We now present the natural multivariate extension of the bivariate Marshall-Olkin family. Consider an $n$-component system, where each nonempty subset of components is assigned a shock which is fatal to all components of that subset. Let $S$ denote the set of nonempty subsets of $\{1,\dots,n\}$. Let $X_1,\dots,X_n$ denote the lifetimes of the components, and assume that shocks assigned to different subsets $s$, $s \in S$, follow independent Poisson processes with intensities $\lambda_s$. Let $Z_s$, $s \in S$, denote the time of first occurrence of a shock event for the shock process assigned to subset $s$. Then the occurrence times $Z_s$ are independent exponential random variables with parameters $\lambda_s$, and $X_j = \min_{s : j \in s} Z_s$ for $j = 1,\dots,n$. There are in total $2^n - 1$ shock processes, each in one-to-one correspondence with a nonempty subset of $\{1,\dots,n\}$.

Example 4.1. Let $n = 4$. Then
\[
\begin{aligned}
X_1 &= \min(Z_1,Z_{12},Z_{13},Z_{14},Z_{123},Z_{124},Z_{134},Z_{1234}), \\
X_2 &= \min(Z_2,Z_{12},Z_{23},Z_{24},Z_{123},Z_{124},Z_{234},Z_{1234}), \\
X_3 &= \min(Z_3,Z_{13},Z_{23},Z_{34},Z_{123},Z_{134},Z_{234},Z_{1234}), \\
X_4 &= \min(Z_4,Z_{14},Z_{24},Z_{34},Z_{124},Z_{134},Z_{234},Z_{1234}).
\end{aligned}
\]
If for example $\lambda_{13} = 0$, then $Z_{13} = \infty$ almost surely.

We now turn to the question of random variate generation from Marshall-Olkin $n$-copulas. Order the $l := |S| = 2^n - 1$ nonempty subsets of $\{1,\dots,n\}$ in some arbitrary way, $s_1,\dots,s_l$, and set $\lambda_k := \lambda_{s_k}$ (the parameter of $Z_{s_k}$) for $k = 1,\dots,l$. The following algorithm generates random variates from the Marshall-Olkin $n$-copula.

Algorithm 4.1.
* Simulate $l$ independent random variates $v_1,\dots,v_l$ from $U(0,1)$.
* Set $x_i = \min_{1 \le k \le l,\ i \in s_k,\ \lambda_k \ne 0}(-\ln v_k/\lambda_k)$, $i = 1,\dots,n$.
* Set $\Lambda_i = \sum_{k=1}^{l} 1\{i \in s_k\}\lambda_k$, $i = 1,\dots,n$.
* Set $u_i = \exp(-\Lambda_ix_i)$, $i = 1,\dots,n$.

Then $(x_1,\dots,x_n)^T$ is an $n$-variate from the $n$-dimensional Marshall-Olkin distribution and $(u_1,\dots,u_n)^T$ is an $n$-variate from the corresponding Marshall-Olkin $n$-copula. Furthermore, $\Lambda_i$ is the shock intensity "felt" by component $i$.

Since the $(i,j)$-bivariate marginal of a Marshall-Olkin $n$-copula is a Marshall-Olkin copula with parameters
\[ \alpha_i = \frac{\sum_{s : i \in s, j \in s} \lambda_s}{\sum_{s : i \in s} \lambda_s} \quad\text{and}\quad \alpha_j = \frac{\sum_{s : i \in s, j \in s} \lambda_s}{\sum_{s : j \in s} \lambda_s}, \]
the Kendall's tau and Spearman's rho rank correlation matrices are easily evaluated. The $(i,j)$ entries are given by
\[ \frac{\alpha_i\alpha_j}{\alpha_i + \alpha_j - \alpha_i\alpha_j} \quad\text{and}\quad \frac{3\alpha_i\alpha_j}{2\alpha_i + 2\alpha_j - \alpha_i\alpha_j}, \]
respectively. As seen above, evaluating the rank correlation matrix given the full parameterization of the Marshall-Olkin $n$-copula is straightforward. However, given a (Kendall's tau or Spearman's rho) rank correlation matrix we cannot in general obtain a unique parameterization of the copula. By setting the shock intensities for subgroups with more than two elements to zero, we obtain perhaps the most natural parameterization of the copula in this situation. However, this also means that the copula only has bivariate dependence.

4.3. A useful modelling framework

In general the huge number of parameters for high-dimensional Marshall-Olkin copulas makes them unattractive for high-dimensional risk modelling. However, we now give an example of how an intuitively appealing and more easily parameterized model for dependent loss frequencies can be set up, for which the survival copula of times to first losses is a Marshall-Olkin copula.

Suppose we are interested in insurance losses occurring in several different lines of business or several different countries. In credit-risk modelling we might be interested in losses related to the default of various different counterparties or types of counterparty. A natural approach to modelling this dependence is to assume that all losses can be related to a series of underlying and independent shock processes.
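For reference, the four steps of Algorithm 4.1 from Section 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: subsets are represented as `frozenset` objects, intensities are passed in a dictionary, subsets with zero intensity are simply skipped (their shock never occurs), and every component is assumed to belong to at least one subset with positive intensity. The function names are hypothetical.

```python
import math
import random
from itertools import combinations


def nonempty_subsets(n):
    """All nonempty subsets of {1, ..., n} as frozensets (2^n - 1 of them)."""
    items = range(1, n + 1)
    return [frozenset(c) for size in items for c in combinations(items, size)]


def sample_marshall_olkin(n, intensities, rng):
    """One draw (x, u) following the steps of Algorithm 4.1.

    intensities maps frozenset subsets to lambda_s >= 0; subsets that are
    missing or have zero intensity are skipped.
    """
    shock_times = {}
    for s in nonempty_subsets(n):
        lam = intensities.get(s, 0.0)
        if lam > 0.0:
            v = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
            shock_times[s] = -math.log(v) / lam
    x, u = [], []
    for i in range(1, n + 1):
        relevant = [s for s in shock_times if i in s]
        x_i = min(shock_times[s] for s in relevant)          # first fatal shock
        big_lambda_i = sum(intensities[s] for s in relevant)  # intensity felt by i
        x.append(x_i)
        u.append(math.exp(-big_lambda_i * x_i))
    return x, u


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
lambdas = {frozenset({1}): 1.1, frozenset({2}): 0.2, frozenset({1, 2}): 0.6}
x, u = sample_marshall_olkin(2, lambdas, rng)
print(x, u)
```

Enumerating all $2^n - 1$ subsets is only practical for small $n$; for a sparse parameterization one would iterate over the keys of the intensity dictionary instead.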
In insurance these shocks might be natural catastrophes; in credit-risk modelling they might be a variety of underlying economic events. When a shock occurs this may cause losses of several different types; the common shock causes the numbers of losses of each type to be dependent. It is commonly assumed that the different varieties of shocks arrive as independent Poisson processes, in which case the counting processes of the losses are also Poisson and can be handled easily analytically. In reliability such models are known as fatal shock models, when the shock always destroys the component, and nonfatal shock models, when components have a chance of surviving the shock. A good basic reference on such models is Barlow and Proschan (1975).

Suppose there are $m$ different types of shocks and for $e = 1,\dots,m$, let $\{N^{(e)}(t),\ t \ge 0\}$ be a Poisson process with intensity $\lambda^{(e)}$ recording the number of events of type $e$ occurring in $(0,t]$. Assume further that these shock counting processes are independent. Consider losses of $n$ different types and for $j = 1,\dots,n$, let $\{N_j(t),\ t \ge 0\}$ be a counting process that records the frequency of losses of the $j$th type occurring in $(0,t]$. At the $r$th occurrence of an event of type $e$ the Bernoulli variable $I^{(e)}_{j,r}$ indicates whether a loss of type $j$ occurs. The vectors

$$I^{(e)}_r = \bigl(I^{(e)}_{1,r},\dots,I^{(e)}_{n,r}\bigr)^{\mathrm T}$$

for $r = 1,\dots,N^{(e)}(t)$ are considered to be independent and identically distributed with a multivariate Bernoulli distribution. In other words, each new event represents a new independent opportunity to incur a loss but, for a fixed event, the loss trigger variables for losses of different types may be dependent. The form of the dependence depends on the specification of the multivariate Bernoulli distribution, with independence as a special case.
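The construction described so far can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulate_loss_counts` and its parameters are hypothetical, and the loss indicators are drawn independently across loss types for a given event, which is just one special case of the multivariate Bernoulli distribution the text allows.

```python
import random


def simulate_loss_counts(t, intensities, probs, rng):
    """Simulate the loss counts (N_1(t), ..., N_n(t)) of a common shock model.

    t           : time horizon, losses are counted in (0, t]
    intensities : intensities[e] is the Poisson intensity of shock type e
    probs       : probs[e][j] is the probability that an event of type e
                  triggers a loss of type j (indicators drawn independently
                  across loss types; an assumption of this sketch)
    rng         : a random.Random instance
    """
    n = len(probs[0])
    counts = [0] * n
    for lam, p in zip(intensities, probs):
        # Generate the event times of this shock process in (0, t]
        # from exponential inter-arrival times with rate lam.
        clock = rng.expovariate(lam)
        while clock <= t:
            # Each event is a new opportunity to incur a loss of each type.
            for j in range(n):
                if rng.random() < p[j]:
                    counts[j] += 1
            clock += rng.expovariate(lam)
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    # Two shock types, three loss types (all numbers are made up).
    print(simulate_loss_counts(5.0, [0.5, 2.0],
                               [[0.9, 0.9, 0.1], [0.1, 0.3, 0.8]], rng))
```

Setting all trigger probabilities to one makes every shock hit every loss type, so all counts coincide; this is the fatal-shock extreme of the model.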
We use the following notation for $p$-dimensional marginal probabilities of this distribution (the subscript $r$ is dropped for simplicity):

$$P\bigl(I^{(e)}_{j_1} = i_{j_1},\dots,I^{(e)}_{j_p} = i_{j_p}\bigr) = p^{(e)}_{j_1,\dots,j_p}(i_{j_1},\dots,i_{j_p}), \qquad i_{j_1},\dots,i_{j_p} \in \{0,1\}.$$

We also write $p^{(e)}_j(1) = p^{(e)}_j$ for one-dimensional marginal probabilities, so that in the special case of conditional independence we have

$$p^{(e)}_{j_1,\dots,j_p}(1,\dots,1) = \prod_{k=1}^{p} p^{(e)}_{j_k}.$$

The counting processes for events and losses are thus linked by

$$N_j(t) = \sum_{e=1}^{m} \sum_{r=1}^{N^{(e)}(t)} I^{(e)}_{j,r}.$$

Under the Poisson assumption for the event processes and the Bernoulli assumption for the loss indicators, the loss processes $\{N_j(t),\ t \ge 0\}$ are clearly Poisson themselves, since they are obtained by superposing $m$ independent (possibly thinned) Poisson processes generated by the $m$ underlying event processes. The random vector $(N_1(t),\dots,N_n(t))^{\mathrm T}$ can be thought of as having a multivariate Poisson distribution.

The presented nonfatal shock model has an equivalent fatal shock model representation, i.e., of the type presented in Section 4.2. Hence the random vector $(X_1,\dots,X_n)^{\mathrm T}$ of times to first losses of different types, where $X_j = \inf\{t \ge 0 \mid N_j(t) > 0\}$, has an $n$-dimensional Marshall–Olkin distribution whose survival copula is a Marshall–Olkin $n$-copula. From this it follows that Kendall's tau, Spearman's rho and coefficients of tail dependence for $(X_i,X_j)^{\mathrm T}$ can be easily calculated. For more details on this model, see Lindskog and McNeil (2001).

5. Elliptical copulas

The class of elliptical distributions provides a rich source of multivariate distributions which share many of the tractable properties of the multivariate normal distribution and enables modelling of multivariate extremes and other forms of nonnormal dependences. Elliptical copulas are simply the copulas of elliptical distributions.
Simulation from elliptical distributions is easy, and as a consequence of Sklar's Theorem so is simulation from elliptical copulas. Furthermore, we will show that rank correlation and tail dependence coefficients can be easily calculated. For further details on elliptical distributions we refer to Fang, Kotz and Ng (1987) and Cambanis, Huang and Simons (1981).

5.1. Elliptical distributions

Definition 5.1. If $X$ is an $n$-dimensional random vector and, for some $\mu \in \mathbb{R}^n$ and some $n \times n$ nonnegative definite, symmetric matrix $\Sigma$, the characteristic function $\varphi_{X-\mu}(t)$ of $X - \mu$ is a function of the quadratic form $t^{\mathrm T}\Sigma t$, $\varphi_{X-\mu}(t) = \phi(t^{\mathrm T}\Sigma t)$, we say that $X$ has an elliptical distribution with parameters $\mu$, $\Sigma$ and $\phi$, and we write $X \sim E_n(\mu,\Sigma,\phi)$.

When $n = 1$, the class of elliptical distributions coincides with the class of one-dimensional symmetric distributions. A function $\phi$ as in Definition 5.1 is called a characteristic generator.

Theorem 5.1. $X \sim E_n(\mu,\Sigma,\phi)$ with $\operatorname{rank}(\Sigma) = k$ if and only if there exist a random variable $R \ge 0$ independent of $U$, a $k$-dimensional random vector uniformly distributed on the unit hypersphere $\{z \in \mathbb{R}^k \mid z^{\mathrm T}z = 1\}$, and an $n \times k$ matrix $A$ with $AA^{\mathrm T} = \Sigma$, such that

$$X =_d \mu + RAU.$$

For the proof of Theorem 5.1 and the relation between $R$ and $\phi$ see Fang, Kotz and Ng (1987) or Cambanis, Huang and Simons (1981).

Example 5.1. Let $X \sim N_n(0,I_n)$. Since the components $X_i \sim N(0,1)$, $i = 1,\dots,n$, are independent and the characteristic function of $X_i$ is $\exp(-t_i^2/2)$, the characteristic function of $X$ is

$$\exp\Bigl\{-\tfrac{1}{2}\bigl(t_1^2 + \dots + t_n^2\bigr)\Bigr\} = \exp\Bigl\{-\tfrac{1}{2} t^{\mathrm T}t\Bigr\}.$$

From Theorem 5.1 it then follows that $X \sim E_n(0,I_n,\phi)$, where $\phi(u) = \exp(-u/2)$.

If $X \sim E_n(\mu,\Sigma,\phi)$, where $\Sigma$ is a diagonal matrix, then $X$ has uncorrelated components (if $0 < \operatorname{Var}(X_i) < \infty$). If $X$ has independent components, then $X \sim N_n(\mu,\Sigma)$. Note that the multivariate normal distribution is the only one among the elliptical distributions where uncorrelated components imply independent components. A random vector $X \sim E_n(\mu,\Sigma,\phi)$ does not necessarily have a density.
If $X$ has a density it must be of the form $|\Sigma|^{-1/2} g\bigl((X - \mu)^{\mathrm T}\Sigma^{-1}(X - \mu)\bigr)$ for some nonnegative function $g$ of one scalar variable. Hence the contours of equal density form ellipsoids in $\mathbb{R}^n$.

Given the distribution of $X$, the representation $E_n(\mu,\Sigma,\phi)$ is not unique. It uniquely determines $\mu$ but $\Sigma$ and $\phi$ are only determined up to a positive constant. More precisely, if $X \sim E_n(\mu,\Sigma,\phi)$ and $X \sim E_n(\mu^*,\Sigma^*,\phi^*)$, then

$$\mu^* = \mu, \qquad \Sigma^* = c\Sigma, \qquad \phi^*(\cdot) = \phi(\cdot/c),$$

for some constant $c > 0$.

In order to find a representation such that $\operatorname{Cov}(X) = \Sigma$, we use Theorem 5.1 to obtain

$$\operatorname{Cov}(X) = \operatorname{Cov}(\mu + RAU) = A\,E(R^2)\operatorname{Cov}(U)\,A^{\mathrm T},$$

provided that $E(R^2) < \infty$. Let $Y \sim N_n(0,I_n)$. Then $Y =_d \|Y\|\,U$, where $\|Y\|$ is independent of $U$. Furthermore $\|Y\|^2 \sim \chi^2_n$, so $E(\|Y\|^2) = n$. Since $\operatorname{Cov}(Y) = I_n$ we see that if $U$ is uniformly distributed on the unit hypersphere in $\mathbb{R}^n$, then $\operatorname{Cov}(U) = I_n/n$. Thus $\operatorname{Cov}(X) = AA^{\mathrm T}E(R^2)/n$. By choosing the characteristic generator $\phi^*(s) = \phi(s/c)$, where $c = E(R^2)/n$, we get $\operatorname{Cov}(X) = \Sigma$. Hence an elliptical distribution is fully described by $\mu$, $\Sigma$ and $\phi$, where $\phi$ can be chosen so that $\operatorname{Cov}(X) = \Sigma$ (if $\operatorname{Cov}(X)$ is defined). If $\operatorname{Cov}(X)$ is obtained as above, then the distribution of $X$ is uniquely determined by $E(X)$, $\operatorname{Cov}(X)$ and the type of its univariate marginals, e.g., normal or $t_4$, say.

Theorem 5.2. Let $X \sim E_n(\mu,\Sigma,\phi)$, let $B$ be a $q \times n$ matrix and $b \in \mathbb{R}^q$. Then

$$b + BX \sim E_q\bigl(b + B\mu,\ B\Sigma B^{\mathrm T},\ \phi\bigr).$$

Proof: By Theorem 5.1, $b + BX$ has the stochastic representation $b + BX =_d b + B\mu + RBAU$.

Partition $X$, $\mu$ and $\Sigma$ into

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

where $X_1$ and $\mu_1$ are $r \times 1$ vectors and $\Sigma_{11}$ is an $r \times r$ matrix.

Corollary 5.1. Let $X \sim E_n(\mu,\Sigma,\phi)$. Then

$$X_1 \sim E_r(\mu_1,\Sigma_{11},\phi), \qquad X_2 \sim E_{n-r}(\mu_2,\Sigma_{22},\phi).$$

Hence marginal distributions of elliptical distributions are elliptical and of the same type (with the same characteristic generator). The next result states that the conditional distribution of $X_1$ given the value of $X_2$ is also elliptical, but in general not of the same type as $X_1$.

Theorem 5.3. Let $X \sim E_n(\mu,\Sigma,\phi)$ with $\Sigma$ strictly positive definite. Then

$$X_1 \mid X_2 = x \ \sim\ E_r\bigl(\tilde\mu,\tilde\Sigma,\tilde\phi\bigr),$$

where $\tilde\mu = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x - \mu_2)$ and $\tilde\Sigma = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Moreover, $\tilde\phi = \phi$ if and only if $X \sim N_n(\mu,\Sigma)$.

For the proof and details about $\tilde\phi$, see Fang, Kotz and Ng (1987). For the extension to the case where $\operatorname{rank}(\Sigma) < n$, see Cambanis, Huang and Simons (1981).

The following lemma states that linear combinations of independent, elliptically distributed random vectors with the same dispersion matrix (up to a positive constant) remain elliptical.

Lemma 5.1. Let $X \sim E_n(\mu,\Sigma,\phi)$ and $\tilde X \sim E_n(\tilde\mu, c\Sigma, \tilde\phi)$ for $c > 0$ be independent. Then for $a,b \in \mathbb{R}$, $aX + b\tilde X \sim E_n(a\mu + b\tilde\mu, \Sigma, \phi^*)$ with $\phi^*(u) = \phi(a^2u)\,\tilde\phi(b^2cu)$.

Proof: By Definition 5.1, it is sufficient to show that for all $t \in \mathbb{R}^n$

$$\varphi_{aX+b\tilde X-a\mu-b\tilde\mu}(t) = \varphi_{a(X-\mu)}(t)\,\varphi_{b(\tilde X-\tilde\mu)}(t) = \phi\bigl((at)^{\mathrm T}\Sigma(at)\bigr)\,\tilde\phi\bigl((bt)^{\mathrm T}(c\Sigma)(bt)\bigr) = \phi\bigl(a^2 t^{\mathrm T}\Sigma t\bigr)\,\tilde\phi\bigl(b^2c\, t^{\mathrm T}\Sigma t\bigr).$$

As usual, let $X \sim E_n(\mu,\Sigma,\phi)$. Whenever $0 < \operatorname{Var}(X_i), \operatorname{Var}(X_j) < \infty$,

$$\rho(X_i,X_j) := \frac{\operatorname{Cov}(X_i,X_j)}{\sqrt{\operatorname{Var}(X_i)\operatorname{Var}(X_j)}} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\Sigma_{jj}}}.$$

This explains why linear correlation is a natural measure of dependence between random variables with a joint nondegenerate ($\Sigma_{ii} > 0$ for all $i$) elliptical distribution. Throughout this section we call the matrix $R$, with $R_{ij} = \Sigma_{ij}/\sqrt{\Sigma_{ii}\Sigma_{jj}}$, the linear correlation matrix of $X$. Note that this definition is more general than the usual one and in this situation (elliptical distributions) makes more sense. Since an elliptical distribution is uniquely determined by $\mu$, $\Sigma$ and $\phi$, the copula of a nondegenerate elliptically distributed random vector is uniquely determined by $R$ and $\phi$.

One practical problem with elliptical distributions in multivariate risk modelling is that all marginals are of the same type. To construct a realistic multivariate distribution for some given risks, it may be reasonable to choose a copula of an elliptical distribution but different types of marginals (not necessarily elliptical). One big drawback with such a model seems to be that the copula parameter $R$ can no longer be estimated directly from data.
Recall that for nondegenerate elliptical distributions with finite variances, $R$ is just the usual linear correlation matrix. In such cases, $R$ can be estimated using (robust) linear correlation estimators. One such robust estimator is provided by the next theorem. For nondegenerate nonelliptical distributions with finite variances and elliptical copulas, $R$ does not correspond to the linear correlation matrix. However, since the Kendall's tau rank correlation matrix for a random vector is invariant under strictly increasing transformations of the vector components, and the next theorem provides a relation between the Kendall's tau rank correlation matrix and $R$ for nondegenerate elliptical distributions, $R$ can in fact easily be estimated from data.

Theorem 5.4. Let $X \sim E_n(\mu,\Sigma,\phi)$ with $P\{X_i = \mu_i\} < 1$ and $P\{X_j = \mu_j\} < 1$. Then

$$\tau(X_i,X_j) = \Bigl(1 - \sum_{x \in \mathbb{R}} \bigl(P\{X_i = x\}\bigr)^2\Bigr)\frac{2}{\pi}\arcsin(R_{ij}), \qquad (5.1)$$

where the sum extends over all atoms of the distribution of $X_i$. If $\operatorname{rank}(\Sigma) \ge 2$, then (5.1) simplifies to

$$\tau(X_i,X_j) = \Bigl(1 - \bigl(P\{X_i = \mu_i\}\bigr)^2\Bigr)\frac{2}{\pi}\arcsin(R_{ij}).$$

For a proof, see Lindskog, McNeil and Schmock (2001). Note that if $P\{X_i = \mu_i\} = 0$ for all $i$, which is true for, e.g., multivariate $t$- or normal distributions with strictly positive definite dispersion matrices $\Sigma$, then

$$\tau(X_i,X_j) = \frac{2}{\pi}\arcsin(R_{ij})$$

for all $i$ and $j$.

The nonparametric estimator of $R$, $\sin(\pi\hat\tau/2)$ (dropping the subscripts for simplicity), provided by the above theorem, inherits the robustness properties of the Kendall's tau estimator and is an efficient (low variance) estimator of $R$ for both elliptical distributions and nonelliptical distributions with elliptical copulas.

5.2. Gaussian copulas

The copula of the $n$-variate normal distribution with linear correlation matrix $R$ is

$$C^{\mathrm{Ga}}_R(u) = \Phi^n_R\bigl(\Phi^{-1}(u_1),\dots,\Phi^{-1}(u_n)\bigr),$$

where $\Phi^n_R$ denotes the joint distribution function of the $n$-variate standard normal distribution with linear correlation matrix $R$, and $\Phi^{-1}$ denotes the inverse of the distribution function of the univariate standard normal distribution. Copulas of the above form are called Gaussian copulas. In the bivariate case the copula expression can be written as

$$C^{\mathrm{Ga}}_R(u,v) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi(1 - R_{12}^2)^{1/2}} \exp\Bigl\{-\frac{s^2 - 2R_{12}st + t^2}{2(1 - R_{12}^2)}\Bigr\}\, ds\, dt.$$

Note that $R_{12}$ is simply the usual linear correlation coefficient of the corresponding bivariate normal distribution. Example 3.4 shows that Gaussian copulas do not have upper tail dependence. Since elliptical distributions are radially symmetric, the coefficients of upper and lower tail dependence are equal. Hence Gaussian copulas do not have lower tail dependence.

We now address the question of random variate generation from the Gaussian copula $C^{\mathrm{Ga}}_R$. For our purpose, it is sufficient to consider only strictly positive definite matrices $R$. Write $R = AA^{\mathrm T}$ for some $n \times n$ matrix $A$; if $Z_1,\dots,Z_n \sim N(0,1)$ are independent, then

$$\mu + AZ \sim N_n(\mu,R).$$

One natural choice of $A$ is the Cholesky decomposition of $R$. The Cholesky decomposition of $R$ is the unique lower-triangular matrix $L$ with $LL^{\mathrm T} = R$. Furthermore, Cholesky decomposition routines are implemented in most mathematical software. This provides an easy algorithm for random variate generation from the Gaussian $n$-copula $C^{\mathrm{Ga}}_R$.

Algorithm 5.1.
* Find the Cholesky decomposition $A$ of $R$.
* Simulate $n$ independent random variates $z_1,\dots,z_n$ from $N(0,1)$.
* Set $x = Az$.
* Set $u_i = \Phi(x_i)$, $i = 1,\dots,n$.
* $(u_1,\dots,u_n)^{\mathrm T} \sim C^{\mathrm{Ga}}_R$.

As usual $\Phi$ denotes the univariate standard normal distribution function.

5.3. t-copulas

If $X$ has the stochastic representation

$$X =_d \mu + \frac{\sqrt{\nu}}{\sqrt{S}}\, Z, \qquad (5.2)$$

where $\mu \in \mathbb{R}^n$, $S \sim \chi^2_\nu$ and $Z \sim N_n(0,\Sigma)$ are independent, then $X$ has an $n$-variate $t_\nu$-distribution with mean $\mu$ (for $\nu > 1$) and covariance matrix $\frac{\nu}{\nu-2}\Sigma$ (for $\nu > 2$). If $\nu \le 2$ then $\operatorname{Cov}(X)$ is not defined.
In this case we just interpret $\Sigma$ as being the shape parameter of the distribution of $X$.

The copula of $X$ given by (5.2) can be written as

$$C^t_{\nu,R}(u) = t^n_{\nu,R}\bigl(t_\nu^{-1}(u_1),\dots,t_\nu^{-1}(u_n)\bigr),$$

where $R_{ij} = \Sigma_{ij}/\sqrt{\Sigma_{ii}\Sigma_{jj}}$ for $i,j \in \{1,\dots,n\}$ and where $t^n_{\nu,R}$ denotes the distribution function of $\sqrt{\nu}\,Y/\sqrt{S}$, where $S \sim \chi^2_\nu$ and $Y \sim N_n(0,R)$ are independent. Here $t_\nu$ denotes the (equal) marginals of $t^n_{\nu,R}$, i.e., the distribution function of $\sqrt{\nu}\,Y_1/\sqrt{S}$. In the bivariate case the copula expression can be written as

$$C^t_{\nu,R}(u,v) = \int_{-\infty}^{t_\nu^{-1}(u)} \int_{-\infty}^{t_\nu^{-1}(v)} \frac{1}{2\pi(1 - R_{12}^2)^{1/2}} \Bigl\{1 + \frac{s^2 - 2R_{12}st + t^2}{\nu(1 - R_{12}^2)}\Bigr\}^{-(\nu+2)/2} ds\, dt.$$

Note that $R_{12}$ is simply the usual linear correlation coefficient of the corresponding bivariate $t_\nu$-distribution if $\nu > 2$.

If $(X_1,X_2)^{\mathrm T}$ has a standard bivariate $t$-distribution with $\nu$ degrees of freedom and linear correlation matrix $R$, then $X_2 \mid X_1 = x$ is $t$-distributed with $\nu + 1$ degrees of freedom and

$$E(X_2 \mid X_1 = x) = R_{12}x, \qquad \operatorname{Var}(X_2 \mid X_1 = x) = \Bigl(\frac{\nu + x^2}{\nu + 1}\Bigr)\bigl(1 - R_{12}^2\bigr).$$

This can be used to show that the $t$-copula has upper (and, because of radial symmetry, equal lower) tail dependence. Writing $\bar t_{\nu+1} = 1 - t_{\nu+1}$ for the survival function,

$$\lambda_U = 2\lim_{x \to \infty} P(X_2 > x \mid X_1 = x) = 2\lim_{x \to \infty} \bar t_{\nu+1}\Bigl(\Bigl(\frac{\nu+1}{\nu+x^2}\Bigr)^{1/2}\frac{x - R_{12}x}{\sqrt{1 - R_{12}^2}}\Bigr) = 2\lim_{x \to \infty} \bar t_{\nu+1}\Bigl(\Bigl(\frac{\nu+1}{\nu/x^2+1}\Bigr)^{1/2}\frac{\sqrt{1 - R_{12}}}{\sqrt{1 + R_{12}}}\Bigr) = 2\,\bar t_{\nu+1}\Bigl(\sqrt{\nu+1}\,\frac{\sqrt{1 - R_{12}}}{\sqrt{1 + R_{12}}}\Bigr).$$

From this it is also seen that the coefficient of upper tail dependence is increasing in $R_{12}$ and decreasing in $\nu$, as one would expect. Furthermore, the coefficient of upper (lower) tail dependence tends to zero as the number of degrees of freedom tends to infinity for $R_{12} < 1$. Coefficients of upper tail dependence for the bivariate $t$-copula are given in Table 1. The last row represents the Gaussian copula, i.e., no tail dependence.

Table 1

    ν \ R12    -0.5    0       0.5     0.9     1
    2          0.06    0.18    0.39    0.72    1
    4          0.01    0.08    0.25    0.63    1
    10         0.00    0.01    0.08    0.46    1
    ∞          0       0       0       0       1

It should be mentioned that the expression given above is just a special case of a general formula for the coefficient(s) of tail dependence for elliptical distributions with tail dependence.
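As a worked illustration of the closed-form expression for the tail dependence coefficient, the following sketch evaluates $2\,\bar t_{\nu+1}\bigl(\sqrt{\nu+1}\sqrt{1-R_{12}}/\sqrt{1+R_{12}}\bigr)$ numerically. It is not taken from the text: the function names are hypothetical, and the Student $t$ survival function is approximated with a composite Simpson rule so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of the univariate Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(x, df, steps=2000):
    """P(T > x) for x >= 0, using composite Simpson integration on [0, x].

    steps must be even; the distribution is symmetric, so P(T > 0) = 1/2.
    """
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_copula_upper_tail(nu, r12):
    """Coefficient of upper tail dependence of the bivariate t-copula."""
    if r12 >= 1.0:
        return 1.0
    arg = math.sqrt(nu + 1) * math.sqrt(1 - r12) / math.sqrt(1 + r12)
    return 2 * t_sf(arg, nu + 1)


if __name__ == "__main__":
    # Recompute the finite-nu rows of Table 1.
    for nu in (2, 4, 10):
        row = [t_copula_upper_tail(nu, r) for r in (-0.5, 0.0, 0.5, 0.9, 1.0)]
        print(nu, ["%.2f" % v for v in row])
```

Because the coefficient depends on the data only through $\nu$ and $R_{12}$, a table such as Table 1 can be regenerated for any grid of parameter values.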
It turns out that if $\Sigma_{ii} > 0$ for all $i$ and $-1 < \Sigma_{ij}/\sqrt{\Sigma_{ii}\Sigma_{jj}} < 1$ for all $i \ne j$, then the bivariate marginal distributions of an elliptically distributed random vector $X =_d \mu + RAU \sim E_n(\mu,\Sigma,\phi)$ have tail dependence if and only if $R$ is so-called regularly varying (at $\infty$). For more details, see Hult and Lindskog (2002), and for details about regular variation in general see Resnick (1987) or Embrechts, Mikosch and Klüppelberg (1997).

Equation (5.2) provides an easy algorithm for random variate generation from the $t$-copula, $C^t_{\nu,R}$.

Algorithm 5.2.
* Find the Cholesky decomposition $A$ of $R$.
* Simulate $n$ independent random variates $z_1,\dots,z_n$ from $N(0,1)$.
* Simulate a random variate $s$ from $\chi^2_\nu$ independent of $z_1,\dots,z_n$.
* Set $y = Az$.
* Set $x = \frac{\sqrt{\nu}}{\sqrt{s}}\, y$.
* Set $u_i = t_\nu(x_i)$, $i = 1,\dots,n$.
* $(u_1,\dots,u_n)^{\mathrm T} \sim C^t_{\nu,R}$.

Figures 2 and 3 show samples from bivariate distributions with Gaussian and $t$-copulas. In Figure 2, we have contrasted a real example (BMW-Siemens daily return data) with simulated data using marginal $t_4$ tails, corresponding Kendall's tau (0.5) and varying copulas. Note that the Gaussian copula does not get the extreme joint tail observations clearly present in the real data. The $t_2$-copula seems to be able to do a much better job in that respect. Indeed the $t_2$-generated scatter plot shows most of the graphical features in the real data. Note that these examples were only introduced to highlight the simulation procedures and do not constitute a detailed statistical analysis. Figure 3 (a simulated example) further highlights the difference between the Gaussian and $t$-copulas, this time with standard normal marginals.

The algorithms presented for the Gaussian and $t$-copulas are fast and easy to implement. We want to emphasize the potential usefulness of $t$-copulas as an alternative to Gaussian copulas. Both Gaussian and $t$-copulas are easily parameterized by the linear correlation matrix, but only $t$-copulas yield dependence structures with tail dependence.

Fig. 2. The upper left plot shows BMW-Siemens daily log returns from 1989 to 1996. The other plots show samples from bivariate distributions with $t_4$-marginals and Kendall's tau 0.5.

Fig. 3. Samples from two distributions with standard normal marginals, $R_{12} = 0.8$ but different dependence structures. $(X_1,Y_1)^{\mathrm T}$ has a Gaussian copula and $(X_2,Y_2)^{\mathrm T}$ has a $t_2$-copula.

6. Archimedean copulas

The copula families we have discussed so far have been derived from certain families of multivariate distribution functions using Sklar's Theorem. We have seen that elliptical copulas are simply the distribution functions of componentwise transformed elliptically distributed random vectors. Since simulation from elliptical distributions is easy, so is simulation from elliptical copulas. There are however drawbacks: elliptical copulas do not have closed form expressions and are restricted to have radial symmetry ($C = \hat C$). In many finance and insurance applications it seems reasonable that there is a stronger dependence between big losses (e.g., a stock market crash) than between big gains. Such asymmetries cannot be modelled with elliptical copulas.

In this section we discuss an important class of copulas called Archimedean copulas. This class of copulas is worth studying for a number of reasons. Many interesting parametric families of copulas are Archimedean and the class of Archimedean copulas allows for a great variety of different dependence structures. Furthermore, in contrast to elliptical copulas, all commonly encountered Archimedean copulas have closed form expressions. Unlike the copulas discussed so far these copulas are not derived from multivariate distribution functions using Sklar's Theorem. A consequence of this is that we need somewhat technical conditions to assert that multivariate extensions of Archimedean 2-copulas are proper $n$-copulas.
A further disadvantage is that multivariate extensions of Archimedean copulas in general suffer from lack of free parameter choice in the sense that some of the entries in the resulting rank correlation matrix are forced to be equal. At the end of this section we present one possible multivariate extension of Archimedean copulas. For other multivariate extensions we refer to Joe (1997).

There is much written about Archimedean copulas. For some background on bivariate Archimedean copulas see Genest and MacKay (1986b). For parameter estimation and a discussion on other statistical questions we refer to Genest and Rivest (1993). Good references on Archimedean copulas in general are Genest and MacKay (1986a), Nelsen (1999) and Joe (1997). See also the webpage http://www.mat.ulaval.ca/pages/genest/ for further related work.

6.1. Definitions

We begin with a general definition of Archimedean copulas, which can be found in Nelsen (1999, p. 90). As our aim is the construction of multivariate extensions of Archimedean 2-copulas, this general definition will later prove to be a bit more general than needed.

Definition 6.1. Let $\varphi$ be a continuous, strictly decreasing function from $[0,1]$ to $[0,\infty]$ such that $\varphi(1) = 0$. The pseudo-inverse of $\varphi$ is the function $\varphi^{[-1]} : [0,\infty] \to [0,1]$ given by

$$\varphi^{[-1]}(t) = \begin{cases} \varphi^{-1}(t), & 0 \le t \le \varphi(0), \\ 0, & \varphi(0) \le t \le \infty. \end{cases}$$

Note that $\varphi^{[-1]}$ is continuous and decreasing on $[0,\infty]$, and strictly decreasing on $[0,\varphi(0)]$. Furthermore, $\varphi^{[-1]}(\varphi(u)) = u$ on $[0,1]$, and

$$\varphi\bigl(\varphi^{[-1]}(t)\bigr) = \begin{cases} t, & 0 \le t \le \varphi(0), \\ \varphi(0), & \varphi(0) \le t \le \infty. \end{cases}$$

Finally, if $\varphi(0) = \infty$, then $\varphi^{[-1]} = \varphi^{-1}$.

Theorem 6.1. Let $\varphi$ be a continuous, strictly decreasing function from $[0,1]$ to $[0,\infty]$ such that $\varphi(1) = 0$, and let $\varphi^{[-1]}$ be the pseudo-inverse of $\varphi$. Let $C$ be the function from $[0,1]^2$ to $[0,1]$ given by

$$C(u,v) = \varphi^{[-1]}\bigl(\varphi(u) + \varphi(v)\bigr). \qquad (6.1)$$

Then $C$ is a copula if and only if $\varphi$ is convex.

For a proof, see Nelsen (1999, p. 91). Copulas of the form (6.1) are called Archimedean copulas. The function $\varphi$ is called a generator of the copula.
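Definition 6.1 and formula (6.1) translate almost directly into code. The sketch below is a minimal illustration, not part of the original text: the helper names are hypothetical, and the two generators used to exercise it, $\varphi(t) = -\ln t$ (strict, giving the product copula) and $\varphi(t) = 1 - t$ (nonstrict, giving the lower bound $W$), are chosen here because both are convex and have simple inverses.

```python
import math


def pseudo_inverse(phi_inv, phi_at_zero):
    """Pseudo-inverse of a generator, following Definition 6.1.

    phi_inv     : ordinary inverse of the generator on [0, phi(0)]
    phi_at_zero : the value phi(0), possibly math.inf for a strict generator
    """
    def phi_pinv(t):
        return phi_inv(t) if t <= phi_at_zero else 0.0
    return phi_pinv


def archimedean_copula(phi, phi_inv, phi_at_zero):
    """Return C(u, v) = phi^[-1](phi(u) + phi(v)), i.e. formula (6.1)."""
    phi_pinv = pseudo_inverse(phi_inv, phi_at_zero)

    def copula(u, v):
        return phi_pinv(phi(u) + phi(v))
    return copula


def neg_log(t):
    """Strict generator phi(t) = -ln t, with phi(0) = infinity."""
    return math.inf if t == 0 else -math.log(t)


# Strict generator: the pseudo-inverse equals the ordinary inverse exp(-s).
product_copula = archimedean_copula(neg_log, lambda s: math.exp(-s), math.inf)

# Nonstrict generator phi(t) = 1 - t with phi(0) = 1: the pseudo-inverse
# is max(1 - t, 0), so the copula is max(u + v - 1, 0).
lower_bound = archimedean_copula(lambda t: 1.0 - t, lambda s: 1.0 - s, 1.0)

if __name__ == "__main__":
    print(product_copula(0.3, 0.5))   # u * v
    print(lower_bound(0.3, 0.5))      # truncated at zero
    print(lower_bound(0.8, 0.7))      # u + v - 1
```

The only place where strict and nonstrict generators differ is the truncation in `pseudo_inverse`; for a strict generator the branch returning zero is never taken for finite arguments.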
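If $\varphi(0) = \infty$, we say that $\varphi$ is a strict generator. In this case, $\varphi^{[-1]} = \varphi^{-1}$ and $C(u,v) = \varphi^{-1}(\varphi(u) + \varphi(v))$ is said to be a strict Archimedean copula.

Example 6.1. Let $\varphi_\theta(t) = (-\ln t)^\theta$, where $\theta \ge 1$. Clearly $\varphi_\theta(t)$ is continuous and $\varphi_\theta(1) = 0$. Since $\varphi'_\theta(t) = -\theta(-\ln t)^{\theta-1}\frac{1}{t}$, $\varphi_\theta$ is a strictly decreasing function from $[0,1]$ to $[0,\infty]$. Moreover $\varphi''_\theta(t) \ge 0$ on $[0,1]$, so $\varphi_\theta$ is convex, and $\varphi_\theta(0) = \infty$, so $\varphi_\theta$ is a strict generator. From (6.1) we get

$$C_\theta(u,v) = \varphi_\theta^{-1}\bigl(\varphi_\theta(u) + \varphi_\theta(v)\bigr) = \exp\Bigl(-\bigl[(-\ln u)^\theta + (-\ln v)^\theta\bigr]^{1/\theta}\Bigr).$$

Furthermore $C_1 = \Pi$ and $\lim_{\theta \to \infty} C_\theta = M$ (recall that $\Pi(u,v) = uv$ and $M(u,v) = \min(u,v)$). This copula family is called the Gumbel family. As shown in Example 3.3 this copula family has upper tail dependence.

Example 6.2. Let $\varphi_\theta(t) = (t^{-\theta} - 1)/\theta$, where $\theta \in [-1,\infty) \setminus \{0\}$. This gives the Clayton family

$$C_\theta(u,v) = \max\Bigl(\bigl[u^{-\theta} + v^{-\theta} - 1\bigr]^{-1/\theta},\ 0\Bigr).$$

For $\theta > 0$ the copulas are strict and the copula expression simplifies to

$$C_\theta(u,v) = \bigl(u^{-\theta} + v^{-\theta} - 1\bigr)^{-1/\theta}. \qquad (6.2)$$

The Clayton family has lower tail dependence for $\theta > 0$, and $C_{-1} = W$, $\lim_{\theta \to 0} C_\theta = \Pi$ and $\lim_{\theta \to \infty} C_\theta = M$. Since most of the following results are results for strict Archimedean copulas we will refer to (6.2) as the Clayton family.

Example 6.3. Let $\varphi_\theta(t) = -\ln\bigl((e^{-\theta t} - 1)/(e^{-\theta} - 1)\bigr)$, where $\theta \in \mathbb{R} \setminus \{0\}$. This gives the Frank family

$$C_\theta(u,v) = -\frac{1}{\theta}\ln\Bigl(1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\Bigr).$$

Frank copulas are strict Archimedean copulas. Furthermore $\lim_{\theta \to -\infty} C_\theta = W$, $\lim_{\theta \to 0} C_\theta = \Pi$ and $\lim_{\theta \to \infty} C_\theta = M$. Members of the Frank family are the only Archimedean copulas which satisfy the equation $C(u,v) = \hat C(u,v)$ for so-called radial symmetry; see Frank (1979) for details.

Example 6.4. Let $\varphi(t) = 1 - t$ for $t$ in $[0,1]$. Then $\varphi^{[-1]}(t) = 1 - t$ for $t$ in $[0,1]$, and $0$ for $t > 1$; i.e., $\varphi^{[-1]}(t) = \max(1 - t, 0)$. Since $C(u,v) = \max(u + v - 1, 0) =: W(u,v)$, we see that the bivariate Fréchet–Hoeffding lower bound $W$ is Archimedean.

6.2. Properties

The results in the following theorem will enable us to formulate multivariate extensions of Archimedean copulas.

Theorem 6.2. Let $C$ be an Archimedean copula with generator $\varphi$. Then
(1) $C$ is symmetric, i.e., $C(u,v) = C(v,u)$ for all $u,v$ in $[0,1]$.
(2) $C$ is associative, i.e., $C(C(u,v),w) = C(u,C(v,w))$ for all $u,v,w$ in $[0,1]$.

Proof: The first part follows directly from (6.1). For (2),

$$C\bigl(C(u,v),w\bigr) = \varphi^{[-1]}\Bigl(\varphi\bigl(\varphi^{[-1]}(\varphi(u) + \varphi(v))\bigr) + \varphi(w)\Bigr) = \varphi^{[-1]}\bigl(\varphi(u) + \varphi(v) + \varphi(w)\bigr) = \varphi^{[-1]}\Bigl(\varphi(u) + \varphi\bigl(\varphi^{[-1]}(\varphi(v) + \varphi(w))\bigr)\Bigr) = C\bigl(u,C(v,w)\bigr).$$

The associativity property of Archimedean copulas is not shared by copulas in general, as shown by the following example.

Example 6.5. Let $C_\theta$ be a member of the bivariate Farlie–Gumbel–Morgenstern family of copulas, i.e., $C_\theta(u,v) = uv + \theta uv(1 - u)(1 - v)$, for $\theta \in [-1,1]$. Then

$$C_\theta\Bigl(\tfrac{1}{4}, C_\theta\bigl(\tfrac{1}{2},\tfrac{1}{3}\bigr)\Bigr) \ne C_\theta\Bigl(C_\theta\bigl(\tfrac{1}{4},\tfrac{1}{2}\bigr),\tfrac{1}{3}\Bigr)$$

for all $\theta \in [-1,1] \setminus \{0\}$. Hence the only member of the bivariate Farlie–Gumbel–Morgenstern family of copulas that is Archimedean is $\Pi$.

Theorem 6.3. Let $C$ be an Archimedean copula generated by $\varphi$ and let

$$K_C(t) = V_C\bigl(\{(u,v) \in [0,1]^2 \mid C(u,v) \le t\}\bigr).$$

Then for any $t$ in $[0,1]$,

$$K_C(t) = t - \frac{\varphi(t)}{\varphi'(t^+)}. \qquad (6.3)$$

For a proof, see Nelsen (1999, p. 102).

Corollary 6.1. If $(U,V)^{\mathrm T}$ has distribution function $C$, where $C$ is an Archimedean copula generated by $\varphi$, then the function $K_C$ given by (6.3) is the distribution function of the random variable $C(U,V)$.

The next theorem will provide the basis for a general algorithm for random variate generation from Archimedean copulas. Before the theorem can be stated we need an expression for the density of an absolutely continuous Archimedean copula. From (6.1) it follows that

$$\varphi'\bigl(C(u,v)\bigr)\frac{\partial}{\partial u}C(u,v) = \varphi'(u), \qquad \varphi'\bigl(C(u,v)\bigr)\frac{\partial}{\partial v}C(u,v) = \varphi'(v),$$

$$\varphi''\bigl(C(u,v)\bigr)\frac{\partial}{\partial u}C(u,v)\frac{\partial}{\partial v}C(u,v) + \varphi'\bigl(C(u,v)\bigr)\frac{\partial^2}{\partial u\,\partial v}C(u,v) = 0,$$

and hence

$$\frac{\partial^2}{\partial u\,\partial v}C(u,v) = -\frac{\varphi''(C(u,v))\frac{\partial}{\partial u}C(u,v)\frac{\partial}{\partial v}C(u,v)}{\varphi'(C(u,v))} = -\frac{\varphi''(C(u,v))\,\varphi'(u)\,\varphi'(v)}{[\varphi'(C(u,v))]^3}.$$

Thus, when $C$ is absolutely continuous, its density is given by

$$\frac{\partial^2}{\partial u\,\partial v}C(u,v) = -\frac{\varphi''(C(u,v))\,\varphi'(u)\,\varphi'(v)}{[\varphi'(C(u,v))]^3}. \qquad (6.4)$$

Theorem 6.4. Under the hypotheses of Corollary 6.1, the joint distribution function $H(s,t)$ of the random variables $S = \varphi(U)/[\varphi(U) + \varphi(V)]$ and $T = C(U,V)$ is given by $H(s,t) = sK_C(t)$ for all $(s,t)$ in $[0,1]^2$.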
Hence $S$ and $T$ are independent, and $S$ is uniformly distributed on $[0,1]$.

Proof: [This proof, for the case when $C$ is absolutely continuous, can be found in Nelsen (1999, p. 104). For the general case, see Genest and Rivest (1993).] The joint density $h(s,t)$ of $S$ and $T$ is given by

$$h(s,t) = \frac{\partial^2}{\partial u\,\partial v}C(u,v)\,\Bigl|\frac{\partial(u,v)}{\partial(s,t)}\Bigr|,$$

where $\partial^2 C(u,v)/\partial u\,\partial v$ is given by (6.4) and $\partial(u,v)/\partial(s,t)$ denotes the Jacobian of the transformation $\varphi(u) = s\varphi(t)$, $\varphi(v) = (1 - s)\varphi(t)$. But

$$\frac{\partial(u,v)}{\partial(s,t)} = \frac{\varphi(t)\,\varphi'(t)}{\varphi'(u)\,\varphi'(v)},$$

which is nonpositive because $\varphi \ge 0$ and $\varphi' < 0$, and hence

$$h(s,t) = \Bigl(-\frac{\varphi''(t)\,\varphi'(u)\,\varphi'(v)}{[\varphi'(t)]^3}\Bigr)\Bigl(-\frac{\varphi(t)\,\varphi'(t)}{\varphi'(u)\,\varphi'(v)}\Bigr) = \frac{\varphi''(t)\,\varphi(t)}{[\varphi'(t)]^2}.$$

Therefore

$$H(s,t) = \int_0^s \int_0^t \frac{\varphi''(y)\,\varphi(y)}{[\varphi'(y)]^2}\, dy\, dx = s\Bigl[y - \frac{\varphi(y)}{\varphi'(y)}\Bigr]_0^t = sK_C(t),$$

from which the conclusion follows.

An application of Theorem 6.4 is the following algorithm for generating random variates $(u,v)^{\mathrm T}$ whose joint distribution is an Archimedean copula $C$ with generator $\varphi$.

Algorithm 6.1.
* Simulate two independent $U(0,1)$ random variates $s$ and $q$.
* Set $t = K_C^{-1}(q)$, where $K_C$ is the distribution function of $C(U,V)$.
* Set $u = \varphi^{[-1]}(s\varphi(t))$ and $v = \varphi^{[-1]}((1 - s)\varphi(t))$.

Note that the variates $s$ and $t$ correspond to the random variables $S$ and $T$ in Theorem 6.4 and from the proof it follows that this algorithm yields the desired result.

Example 6.6. Consider the Archimedean copula family given by

$$C_\theta(u,v) = \Bigl(1 + \bigl[(u^{-1} - 1)^\theta + (v^{-1} - 1)^\theta\bigr]^{1/\theta}\Bigr)^{-1},$$

generated by $\varphi_\theta(t) = (t^{-1} - 1)^\theta$ for $\theta \ge 1$. To generate a random variate from $C_\theta$ we simply apply Algorithm 6.1 with

$$\varphi_\theta(t) = (t^{-1} - 1)^\theta, \qquad \varphi_\theta^{-1}(t) = \bigl(t^{1/\theta} + 1\bigr)^{-1}, \qquad K_C^{-1}(t) = \frac{\theta + 1}{2} - \sqrt{\Bigl(\frac{\theta + 1}{2}\Bigr)^2 - \theta t}.$$

6.3. Kendall's tau revisited

Recall that Kendall's tau for a copula $C$ can be expressed as a double integral of $C$. This double integral is in most cases not straightforward to evaluate. However, for an Archimedean copula, Kendall's tau can be expressed as a (one-dimensional) integral of the generator and its derivative, as shown in the following theorem from Genest and MacKay (1986a).

Theorem 6.5. Let $X$ and $Y$ be random variables with an Archimedean copula $C$ generated by $\varphi$. Kendall's tau of $X$ and $Y$ is given by

$$\tau_C = 1 + 4\int_0^1 \frac{\varphi(t)}{\varphi'(t)}\, dt. \qquad (6.5)$$

Proof: Let $U$ and $V$ be $U(0,1)$ random variables with joint distribution function $C$, and let $K_C$ denote the distribution function of $C(U,V)$. Then from Theorem 3.3 we have

$$\tau_C = 4E\bigl(C(U,V)\bigr) - 1 = 4\int_0^1 t\, dK_C(t) - 1 = 4\Bigl(\bigl[tK_C(t)\bigr]_0^1 - \int_0^1 K_C(t)\, dt\Bigr) - 1 = 3 - 4\int_0^1 K_C(t)\, dt.$$

From Theorem 6.3 and Corollary 6.1 it follows that

$$K_C(t) = t - \frac{\varphi(t)}{\varphi'(t^+)}.$$

Since $\varphi$ is convex, $\varphi'(t^+)$ and $\varphi'(t^-)$ exist for all $t$ in $(0,1)$ and the set $\{t \in (0,1) \mid \varphi'(t^+) \ne \varphi'(t^-)\}$ is at most countable (and hence has Lebesgue measure zero). Hence

$$\tau_C = 3 - 4\int_0^1 \Bigl(t - \frac{\varphi(t)}{\varphi'(t^+)}\Bigr) dt = 1 + 4\int_0^1 \frac{\varphi(t)}{\varphi'(t)}\, dt.$$

Example 6.7. Consider the Gumbel family with generator $\varphi_\theta(t) = (-\ln t)^\theta$, for $\theta \ge 1$. Then $\varphi_\theta(t)/\varphi'_\theta(t) = (t\ln t)/\theta$. Using Theorem 6.5 we can calculate Kendall's tau for the Gumbel family:

$$\tau_\theta = 1 + 4\int_0^1 \frac{t\ln t}{\theta}\, dt = 1 + \frac{4}{\theta}\Bigl(\Bigl[\frac{t^2}{2}\ln t\Bigr]_0^1 - \int_0^1 \frac{t}{2}\, dt\Bigr) = 1 + \frac{4}{\theta}\Bigl(0 - \frac{1}{4}\Bigr) = 1 - \frac{1}{\theta}.$$

As a consequence, in order to have Kendall's tau equal to 0.5 in Figure 2 (the Gumbel case), we put $\theta = 2$.

Example 6.8. Consider the Clayton family with generator $\varphi_\theta(t) = (t^{-\theta} - 1)/\theta$, for $\theta \in [-1,\infty) \setminus \{0\}$. Then $\varphi_\theta(t)/\varphi'_\theta(t) = (t^{\theta+1} - t)/\theta$. Using Theorem 6.5 we can calculate Kendall's tau for the Clayton family:

$$\tau_\theta = 1 + 4\int_0^1 \frac{t^{\theta+1} - t}{\theta}\, dt = 1 + \frac{4}{\theta}\Bigl(\frac{1}{\theta + 2} - \frac{1}{2}\Bigr) = \frac{\theta}{\theta + 2}.$$

Example 6.9. Consider the Frank family presented in Example 6.3. It can be shown that [see, e.g., Genest (1987)] Kendall's tau is $\tau_\theta = 1 - 4(1 - D_1(\theta))/\theta$, where $D_k(x)$ is the Debye function, given by

$$D_k(x) = \frac{k}{x^k}\int_0^x \frac{t^k}{e^t - 1}\, dt$$

for any positive integer $k$.

6.4. Tail dependence revisited

For Archimedean copulas, tail dependence can be expressed in terms of the generators.

Theorem 6.6. Let $\varphi$ be a strict generator such that $\varphi^{-1}$ belongs to the class of Laplace transforms of strictly positive random variables. If $\varphi^{-1\prime}(0)$ is finite, then $C(u,v) = \varphi^{-1}(\varphi(u) + \varphi(v))$ does not have upper tail dependence.
If $C$ has upper tail dependence, then $\varphi^{-1\prime}(0) = -\infty$ and the coefficient of upper tail dependence is given by

$$\lambda_U = 2 - 2\lim_{s \searrow 0}\frac{\varphi^{-1\prime}(2s)}{\varphi^{-1\prime}(s)}.$$

Proof: [This proof can be found in Joe (1997, p. 103).] Note that

$$\lim_{u \nearrow 1}\frac{\bar C(u,u)}{1 - u} = \lim_{u \nearrow 1}\frac{1 - 2u + \varphi^{-1}(2\varphi(u))}{1 - u} = 2 - 2\lim_{u \nearrow 1}\frac{\varphi^{-1\prime}(2\varphi(u))}{\varphi^{-1\prime}(\varphi(u))} = 2 - 2\lim_{s \searrow 0}\frac{\varphi^{-1\prime}(2s)}{\varphi^{-1\prime}(s)}.$$

If $\varphi^{-1\prime}(0) \in (-\infty,0)$, then the limit is zero and $C$ does not have upper tail dependence. Since $\varphi^{-1\prime}(0)$ is the negative of the expectation of a strictly positive random variable, $\varphi^{-1\prime}(0) < 0$, from which the conclusion follows.

The additional condition on the generator might seem somewhat strange. It will however prove quite natural when we turn to the construction of multivariate Archimedean copulas. Furthermore, the condition is satisfied by the majority of the commonly encountered Archimedean copulas.

Example 6.10. The Gumbel copulas are strict Archimedean with generator $\varphi(t) = (-\ln t)^\theta$. Hence $\varphi^{-1}(s) = \exp(-s^{1/\theta})$ and its derivative is $\varphi^{-1\prime}(s) = -s^{1/\theta - 1}\exp(-s^{1/\theta})/\theta$. Using Theorem 6.6 we get

$$\lambda_U = 2 - 2\lim_{s \searrow 0}\frac{\varphi^{-1\prime}(2s)}{\varphi^{-1\prime}(s)} = 2 - 2^{1/\theta}\lim_{s \searrow 0}\frac{\exp(-(2s)^{1/\theta})}{\exp(-s^{1/\theta})} = 2 - 2^{1/\theta},$$

see also Example 3.3.

Theorem 6.7. Let $\varphi$ be as in Theorem 6.6. The coefficient of lower tail dependence for the copula $C(u,v) = \varphi^{-1}(\varphi(u) + \varphi(v))$ is equal to

$$\lambda_L = 2\lim_{s \to \infty}\frac{\varphi^{-1\prime}(2s)}{\varphi^{-1\prime}(s)}.$$

The proof is similar to that of Theorem 6.6.

Example 6.11. Consider the Clayton family given by $C_\theta(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$, for $\theta > 0$. This strict copula family has generator $\varphi_\theta(t) = (t^{-\theta} - 1)/\theta$. It follows that $\varphi_\theta^{-1}(s) = (1 + \theta s)^{-1/\theta}$. Using Theorems 6.6 and 6.7 shows that $\lambda_U = 0$ and that the coefficient of lower tail dependence is given by

$$\lambda_L = 2\lim_{s \to \infty}\frac{\varphi_\theta^{-1\prime}(2s)}{\varphi_\theta^{-1\prime}(s)} = 2\lim_{s \to \infty}\frac{(1 + 2\theta s)^{-1/\theta - 1}}{(1 + \theta s)^{-1/\theta - 1}} = 2^{-1/\theta}.$$

Example 6.12. Consider the Frank family given by

$$C_\theta(u,v) = -\frac{1}{\theta}\ln\Bigl(1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\Bigr)$$

for $\theta \in \mathbb{R} \setminus \{0\}$. This strict copula family has generator $\varphi_\theta(t) = -\ln\bigl((e^{-\theta t} - 1)/(e^{-\theta} - 1)\bigr)$. It follows that

$$\varphi_\theta^{-1}(s) = -\frac{1}{\theta}\ln\bigl(1 - (1 - e^{-\theta})e^{-s}\bigr) \quad \text{and} \quad \varphi_\theta^{-1\prime}(s) = -\frac{1}{\theta}\,\frac{(1 - e^{-\theta})e^{-s}}{1 - (1 - e^{-\theta})e^{-s}}.$$

Since $\varphi_\theta^{-1\prime}(0) = -(e^\theta - 1)/\theta$ is finite, the Frank family does not have upper tail dependence according to Theorem 6.6. Furthermore, members of the Frank family are radially symmetric, i.e. $C = \hat C$, and hence the Frank family does not have lower tail dependence.

6.5. Multivariate Archimedean copulas

In this section we look at the construction of one particular multivariate extension of Archimedean 2-copulas. For other multivariate extensions see Joe (1997). It should be noted that in order to show that other multivariate extensions are proper copulas, we essentially have to go through the same arguments as those given below.

The expression for the $n$-dimensional product copula $\Pi^n$, with $u = (u_1,\dots,u_n)^{\mathrm T}$, can be written as

$$\Pi^n(u) = u_1 \cdots u_n = \exp\bigl(-[(-\ln u_1) + \dots + (-\ln u_n)]\bigr).$$

This naturally leads to the following generalization of (6.1):

$$C^n(u) = \varphi^{[-1]}\bigl(\varphi(u_1) + \dots + \varphi(u_n)\bigr). \qquad (6.6)$$

In the 3-dimensional case,

$$C^3(u_1,u_2,u_3) = \varphi^{[-1]}\Bigl(\varphi \circ \varphi^{[-1]}\bigl(\varphi(u_1) + \varphi(u_2)\bigr) + \varphi(u_3)\Bigr) = C\bigl(C(u_1,u_2),u_3\bigr),$$

and in the 4-dimensional case,

$$C^4(u_1,\dots,u_4) = \varphi^{[-1]}\Bigl(\varphi \circ \varphi^{[-1]}\bigl[\varphi \circ \varphi^{[-1]}\bigl(\varphi(u_1) + \varphi(u_2)\bigr) + \varphi(u_3)\bigr] + \varphi(u_4)\Bigr) = C\bigl(C^3(u_1,u_2,u_3),u_4\bigr) = C\bigl(C(C(u_1,u_2),u_3),u_4\bigr).$$

Whence in general, for $n \ge 3$, $C^n(u_1,\dots,u_n) = C(C^{n-1}(u_1,u_2,\dots,u_{n-1}),u_n)$. This technique of constructing higher-dimensional copulas generally fails. But since Archimedean copulas are symmetric and associative it seems more likely that $C^n$ as defined above, given certain additional properties of $\varphi$ (and $\varphi^{[-1]}$), is indeed a copula for $n \ge 3$.

Definition 6.2. A function $g(t)$ is completely monotone on the interval $I$ if it has derivatives of all orders which alternate in sign, i.e., if it satisfies

$$(-1)^k\frac{d^k}{dt^k}g(t) \ge 0$$

for all $t$ in the interior of $I$ and $k = 0,1,2,\dots$.

If $g : [0,\infty) \to [0,\infty)$ is completely monotone on $[0,\infty)$ and there is a $t \in [0,\infty)$ such that $g(t) = 0$, then $g(t) = 0$ for all $t \in [0,\infty)$.
Hence if the pseudo-inverse $\varphi^{[-1]}$ of an Archimedean generator $\varphi$ is completely monotone, then $\varphi^{[-1]}(t) > 0$ for all $t \in [0,\infty)$ and hence $\varphi^{[-1]} = \varphi^{-1}$. The following theorem from Kimberling (1974) gives necessary and sufficient conditions for the function (6.6) to be an $n$-copula.

Theorem 6.8. Let $\varphi$ be a continuous strictly decreasing function from $[0,1]$ to $[0,\infty]$ such that $\varphi(0) = \infty$ and $\varphi(1) = 0$, and let $\varphi^{-1}$ denote the inverse of $\varphi$. If $C^n$ is the function from $[0,1]^n$ to $[0,1]$ given by (6.6), then $C^n$ is an $n$-copula for all $n \ge 2$ if and only if $\varphi^{-1}$ is completely monotone on $[0,\infty)$.

This theorem can be partially extended to the case where $\varphi$ is nonstrict and $\varphi^{[-1]}$ is $m$-monotone on $[0,\infty)$ for some $m \ge 2$, that is, the derivatives of $\varphi^{[-1]}$ alternate in sign up to and including the $m$th order on $[0,\infty)$. Then the function $C^n$ given by (6.6) is an $n$-copula for $2 \le n \le m$. However, for most practical purposes, the class of strict generators $\varphi$ such that $\varphi^{-1}$ is completely monotone is a rich enough class.

The following corollary shows that the generators suitable for extensions to arbitrary dimensions of Archimedean 2-copulas correspond to copulas which can model only positive dependence.

Corollary 6.2. If the inverse $\varphi^{-1}$ of a strict generator $\varphi$ of an Archimedean copula $C$ is completely monotone, then $C \succ \Pi$, i.e., $C(u,v) \ge uv$ for all $u$, $v$ in $[0,1]$.

For a proof, see Nelsen (1999, p. 122).

While it is simple to generate $n$-copulas of the form given by (6.6), they suffer from a very limited dependence structure: since all $k$-marginals are identical, they are distribution functions of $n$ exchangeable $U(0,1)$ random variables. One would like to have a multivariate extension of the Archimedean 2-copula given by (6.1) which allows for nonexchangeability. Such multivariate extensions are discussed in Joe (1997). We will now discuss one such extension in detail. Since any multivariate extension should contain (6.6) as a special case, clearly the necessary conditions for (6.6) to be a copula have to be satisfied.
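As an informal numerical illustration of (6.6) and Corollary 6.2, the following Python sketch evaluates the exchangeable Archimedean $n$-copula for the Clayton generator of Example 6.11. Its inverse $(1 + \theta s)^{-1/\theta}$ is the Laplace transform of a Gamma random variable and hence completely monotone, so Theorem 6.8 applies. The function names, the parameter value and the evaluation points are illustrative assumptions, not part of the text.

```python
def clayton_generator(t, theta):
    """Clayton generator phi(t) = (t**(-theta) - 1) / theta, for theta > 0 and t in (0, 1]."""
    return (t ** (-theta) - 1.0) / theta


def clayton_inverse(s, theta):
    """Inverse generator phi^{-1}(s) = (1 + theta*s)**(-1/theta), for s >= 0."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def archimedean_copula(u, theta):
    """Exchangeable n-copula (6.6): phi^{-1}(phi(u_1) + ... + phi(u_n))."""
    return clayton_inverse(sum(clayton_generator(ui, theta) for ui in u), theta)


theta = 2.0                       # arbitrary illustrative parameter
u1, u2, u3 = 0.3, 0.6, 0.8        # arbitrary evaluation point

direct = archimedean_copula([u1, u2, u3], theta)
# Recursive form C(C(u1, u2), u3) of the 3-dimensional case
nested = archimedean_copula([archimedean_copula([u1, u2], theta), u3], theta)
print(abs(direct - nested) < 1e-12)

# Corollary 6.2: C(u, v) >= u*v, checked on a coarse grid only
grid = [i / 10 for i in range(1, 10)]
dominates_product = all(
    archimedean_copula([u, v], theta) >= u * v - 1e-15 for u in grid for v in grid
)
print(dominates_product)
```

A grid check of this kind is of course no proof; it only makes the exchangeability and the positive dependence of (6.6) tangible for one family.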
In the light of Theorem 6.8, we restrict ourselves to strict generators. The expression for the general multivariate extension of (6.1) we will now discuss is notationally complex. For that reason we will discuss sufficient conditions for the 3- and 4-dimensional extensions to be proper 3- and 4-copulas respectively. The pattern and conditions indicated generalize in an obvious way to higher dimensions. The 3-dimensional generalization of (6.1) is
$$\varphi_1^{-1}\left(\varphi_1 \circ \varphi_2^{-1}\left(\varphi_2(u_1) + \varphi_2(u_2)\right) + \varphi_1(u_3)\right), \tag{6.7}$$
where $\varphi_1$ and $\varphi_2$ are generators of strict Archimedean copulas. The 4-dimensional generalization of (6.1) is
$$\varphi_1^{-1}\left(\varphi_1 \circ \varphi_2^{-1}\left(\varphi_2 \circ \varphi_3^{-1}\left(\varphi_3(u_1) + \varphi_3(u_2)\right) + \varphi_2(u_3)\right) + \varphi_1(u_4)\right), \tag{6.8}$$
where $\varphi_1$, $\varphi_2$ and $\varphi_3$ are generators of strict Archimedean copulas. The expressions (6.7) and (6.8) can be written as $C_1(C_2(u_1,u_2),u_3)$ and $C_1(C_2(C_3(u_1,u_2),u_3),u_4)$, respectively, where $C_i$ denotes an Archimedean copula generated by $\varphi_i$.

If the generators $\varphi_i$ are chosen so that certain conditions are satisfied, then multivariate copulas can be obtained such that each bivariate marginal has the form (6.1) for some $\varphi_i$. However, the number of distinct generators $\varphi_i$ among the $n(n-1)/2$ bivariate marginals is only $n - 1$, so that the resulting dependence structure is one of partial exchangeability. Clearly the generators have to satisfy the necessary conditions for the $n$-copula given by (6.6) in order to make (6.7) and (6.8) valid copula expressions. What other conditions are needed to make these proper copulas? To answer that question we now introduce function classes $L_n$ and $L_n^*$. Let
$$L_n = \left\{\phi : [0,\infty) \to [0,1] \;\middle|\; \phi(0) = 1,\ \phi(\infty) = 0,\ (-1)^j \phi^{(j)} \ge 0,\ j = 1,\dots,n\right\},$$
$n = 1,2,\dots,\infty$, with $L_\infty$ being the class of Laplace transforms of strictly positive random variables. Also introduce
$$L_n^* = \left\{\omega : [0,\infty) \to [0,\infty) \;\middle|\; \omega(0) = 0,\ \omega(\infty) = \infty,\ (-1)^{j-1} \omega^{(j)} \ge 0,\ j = 1,\dots,n\right\},$$
$n = 1,2,\dots,\infty$. Note that $\varphi^{-1} \in L_1$ if $\varphi$ is the generator of a strict Archimedean copula. The functions in $L_n^*$ are usually compositions of the form $\psi^{-1} \circ \phi$ with $\psi, \phi \in L_1$.
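The 3-dimensional expression (6.7) can be evaluated directly once two generators are fixed. The sketch below uses the Gumbel generator $(-\ln t)^{\theta}$ of Example 6.10 for both levels, with parameters chosen so that $\theta_1 \le \theta_2$ (the ordering required for this family in Example 6.13). Function names and parameter values are illustrative assumptions. Setting one argument to 1 exposes the bivariate marginals, which is a convenient sanity check on the nesting.

```python
import math


def gumbel_generator(t, theta):
    """Gumbel generator phi(t) = (-ln t)**theta, for theta >= 1 and t in (0, 1]."""
    return (-math.log(t)) ** theta


def gumbel_inverse(s, theta):
    """Inverse generator phi^{-1}(s) = exp(-s**(1/theta)), for s >= 0."""
    return math.exp(-(s ** (1.0 / theta)))


def gumbel_pair(u, v, theta):
    """Bivariate Gumbel copula of the form (6.1)."""
    return gumbel_inverse(gumbel_generator(u, theta) + gumbel_generator(v, theta), theta)


def nested_gumbel3(u1, u2, u3, theta1, theta2):
    """Expression (6.7), i.e. C_1(C_2(u1, u2), u3), with Gumbel generators."""
    inner = gumbel_pair(u1, u2, theta2)       # C_2(u1, u2)
    return gumbel_pair(inner, u3, theta1)     # C_1(inner, u3)


theta1, theta2 = 1.5, 3.0                     # illustrative values with theta1 <= theta2
value = nested_gumbel3(0.3, 0.6, 0.8, theta1, theta2)

# (1,2) marginal: set u3 = 1; (1,3) marginal: set u2 = 1
margin_12 = nested_gumbel3(0.3, 0.6, 1.0, theta1, theta2)
margin_13 = nested_gumbel3(0.3, 1.0, 0.8, theta1, theta2)
print(value, margin_12, margin_13)
```

With $\theta_1 = \theta_2$ the function reduces to the exchangeable form (6.6), so the nested construction is a genuine generalization rather than a separate model.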
Note also that with this notation, the necessary and sufficient condition for (6.6) to be a proper copula is that $\varphi^{-1} \in L_n$ and that, if (6.6) is a copula for all $n$, then $\varphi^{-1}$ must be completely monotone and hence be a Laplace transform of a strictly positive random variable.

It turns out that if $\varphi_1^{-1}$ and $\varphi_2^{-1}$ are completely monotone (Laplace transforms of strictly positive random variables) and $\varphi_1 \circ \varphi_2^{-1} \in L_\infty^*$, then (6.7) is a proper copula. Note that (6.7) has a (1,2) bivariate marginal of the form (6.1) with generator $\varphi_2$ and (1,3) and (2,3) bivariate marginals of the form (6.1) with generator $\varphi_1$. Also (6.6) is the special case of (6.7) with $\varphi_1 = \varphi_2$. The 3-dimensional copula in (6.7) has a (1,2) bivariate marginal copula which is larger than the (1,3) and (2,3) bivariate marginal copulas (which are identical).

As one would expect, there are similar conditions for the 4-dimensional case. If $\varphi_1^{-1}$, $\varphi_2^{-1}$ and $\varphi_3^{-1}$ are completely monotone (Laplace transforms of strictly positive random variables) and $\varphi_1 \circ \varphi_2^{-1}$ and $\varphi_2 \circ \varphi_3^{-1}$ are in $L_\infty^*$, then (6.8) is a proper copula. Note that all 3-dimensional marginals of (6.8) have the form (6.7) and all bivariate marginals have the form (6.1). Clearly the idea underlying (6.7) and (6.8) generalizes to higher dimensions.

Example 6.13. Let $\varphi_i(t) = (-\ln t)^{\theta_i}$ with $\theta_i \ge 1$ for $i = 1,\dots,n$, i.e., the generators of Gumbel copulas. What conditions do we have to impose on $\theta_1,\dots,\theta_n$ in order to obtain an $n$-dimensional extension of the Gumbel family of the form indicated above (expressions (6.7) and (6.8))? It should first be noted that $\varphi_i^{-1} \in L_\infty$ for all $i$, so (6.6) with the above generators gives an $n$-copula for all $n \ge 2$. Secondly, $\varphi_i \circ \varphi_{i+1}^{-1}(t) = t^{\theta_i/\theta_{i+1}}$. If $\theta_i/\theta_{i+1} \notin \mathbb{N}$, then the $n$th derivative of $\varphi_i \circ \varphi_{i+1}^{-1}(t)$ is given by
$$\frac{\theta_i}{\theta_{i+1}} \cdots \left(\frac{\theta_i}{\theta_{i+1}} - (n-1)\right) t^{\theta_i/\theta_{i+1} - n}.$$
Hence if $\theta_i/\theta_{i+1} \notin \mathbb{N}$, then $\varphi_i \circ \varphi_{i+1}^{-1} \in L_\infty^*$ if and only if $\theta_i/\theta_{i+1} < 1$. If $\theta_i/\theta_{i+1} \in \mathbb{N}$, then $\varphi_i \circ \varphi_{i+1}^{-1} \in L_\infty^*$ if and only if $\theta_i/\theta_{i+1} = 1$.
Hence an $n$-dimensional extension of the Gumbel family of the form indicated above, given by
$$\exp\left\{-\left(\left[(-\ln u_1)^{\theta_2} + (-\ln u_2)^{\theta_2}\right]^{\theta_1/\theta_2} + (-\ln u_3)^{\theta_1}\right)^{1/\theta_1}\right\}$$
in the 3-dimensional case, is a proper $n$-copula if $\theta_1 \le \cdots \le \theta_n$.

Example 6.14. Consider the Archimedean copula family given by
$$C_\theta(u,v) = \left(1 + \left[(u^{-1} - 1)^{\theta} + (v^{-1} - 1)^{\theta}\right]^{1/\theta}\right)^{-1},$$
generated by $\varphi_\theta(t) = (t^{-1} - 1)^{\theta}$ for $\theta \ge 1$. Set $\varphi_i(t) = \varphi_{\theta_i}(t)$ for $i = 1,\dots,n$. Can the above copulas be extended to $n$-copulas of the form indicated by (6.7) and (6.8), and if so under what conditions on $\theta_1,\dots,\theta_n$? By calculating derivatives of $\varphi_i^{-1}$ and $\varphi_i \circ \varphi_{i+1}^{-1}$ it follows that $\varphi_i^{-1} \in L_\infty$ and $\varphi_i \circ \varphi_{i+1}^{-1} \in L_\infty^*$ if and only if $\theta_i/\theta_{i+1} \le 1$. Hence the $n$-dimensional extensions of the above copulas are $n$-copulas if $\theta_1 \le \cdots \le \theta_n$. Copulas of the above form have upper and lower tail dependence, with coefficients of upper and lower tail dependence given by $2 - 2^{1/\theta}$ and $2^{-1/\theta}$ respectively. One limiting factor for the usefulness of this copula family might be that it only allows for a limited range of positive dependence, as seen from the expression for Kendall's tau given by $\tau = 1 - 2/(3\theta)$, for $\theta \ge 1$.

Note that the results presented in this section hold for strict Archimedean copulas. With some additional constraints most of the results can be generalized to hold also for nonstrict Archimedean copulas. However, for practical purposes it is sufficient to consider only strict Archimedean copulas. This basically means (there are exceptions such as the Frank family) that we consider copula families with only positive dependence. Furthermore, risk models are often designed to model positive dependence, since in some sense it is the "dangerous" dependence: assets (or risks) move in the same direction in periods of extreme events.

7. Modelling extremal events in practice

7.1. Insurance risk

Consider a portfolio consisting of $n$ risks $X_1,\dots,X_n$, representing potential losses in different lines of business for an insurance company.
Suppose that the insurance company, in order to reduce the risk in its portfolio, seeks protection against simultaneous big losses in different lines of business. One suitable reinsurance contract might be the one which pays the excess losses $X_i - k_i$ for $i \in B \subseteq \{1,\dots,n\}$ (where $B$ is some prespecified set of business lines), given that $X_i > k_i$ for all $i \in B$. Hence the payout function $f$ is given by
$$f\left((X_i,k_i);\ i \in B\right) = \left(\prod_{i \in B} 1_{\{X_i > k_i\}}\right)\left(\sum_{i \in B} (X_i - k_i)\right). \tag{7.1}$$
In order to price this contract the seller (reinsurer) would typically need to estimate $E(f((X_i,k_i);\ i \in B))$. Without loss of generality let $B = \{1,\dots,l\}$ for $l \le n$. If the joint distribution $H$ of $X_1,\dots,X_l$ could be accurately estimated, calculating the expected value of (7.1) (possibly by using numerical methods) would not be difficult. Unfortunately, accurate estimation of $H$ is seldom possible due to lack of reliable data. It is more realistic, and we will assume this, that the data available allow for estimation of the marginals $F_1,\dots,F_n$ of $H$ and pairwise rank correlations. The probability of payout is given by
$$\bar{H}(k_1,\dots,k_l) = \hat{C}\left(\bar{F}_1(k_1),\dots,\bar{F}_l(k_l)\right), \tag{7.2}$$
where $\bar{H}$ and $\hat{C}$ denote the joint survival function and survival copula of $X_1,\dots,X_l$. If the thresholds are chosen to be quantiles of the $X_i$'s, i.e., if $k_i = F_i^{-1}(\alpha_i)$ for all $i$, then the right-hand side of (7.2) simplifies to $\hat{C}(1 - \alpha_1,\dots,1 - \alpha_l)$. In a reinsurance context, these quantile levels are often given as return periods and are known to the underwriter.

For a specific copula family, Kendall's tau estimates can typically be transformed into an estimate of the copula parameters. For Gaussian (elliptical) $n$-copulas this is due to the relation $R_{ij} = \sin(\pi\tau(X_i,X_j)/2)$, where $R_{ij} = \Sigma_{ij}/\sqrt{\Sigma_{ii}\Sigma_{jj}}$ with $\Sigma$ being the dispersion matrix of the corresponding normal (elliptical) distribution.
For the multivariate extension of the Gumbel family presented in Section 6.5 this is due to the relation $\theta = 1/(1 - \tau(X_i,X_j))$, where $\theta$ denotes the copula parameter for the bivariate Gumbel copula of $(X_i,X_j)^T$. Hence, once a copula family is decided upon, calculating the probability of payout or the expected value of the contract is easy. However, there is much uncertainty in choosing a suitable copula family representing the dependence between potential losses for the $l$ lines of business. The data may give indications of properties such as tail dependence, but these should be combined with careful consideration of the nature of the underlying loss-causing mechanisms.

To show the relevance of good dependence modelling, we will consider marginal distributions and pairwise rank correlations to be given and compare the effect of the Gaussian and Gumbel copula on the probability of payout and expected value of the contract. To be able to interpret the results more easily, we make some further simplifications: let $X_i \sim F$ for all $i$, where $F$ is the distribution function of the standard Lognormal distribution $LN(0,1)$, let $k_i = k$ for all $i$ and let $\tau(X_i,X_j) = 0.5$ for all $i \ne j$. Then,
$$\bar{H}(k,\dots,k) = 1 + (-1)\binom{l}{1} C_1(F(k)) + \cdots + (-1)^l \binom{l}{l} C_l(F(k),\dots,F(k)),$$
where $C_m$, for $m = 1,\dots,l-1$, are $m$-dimensional marginals of $C = C_l$ (the copula of $(X_1,\dots,X_l)$). In the Gaussian case,
$$C_m(F(k),\dots,F(k)) = \Phi^m_{R_m}\left(\Phi^{-1}(F(k)),\dots,\Phi^{-1}(F(k))\right),$$
where $\Phi^m_{R_m}$ denotes the distribution function of $m$ multivariate normally distributed random variables with linear correlation matrix $R_m$ with off-diagonal entries $\rho_l = \sin(\pi\, 0.5/2) = 1/\sqrt{2}$. Writing $\Phi^m_{\rho_l}$ for this distribution function, $\Phi^m_{\rho_l}(\Phi^{-1}(F(k)),\dots,\Phi^{-1}(F(k)))$ can be calculated by numerical integration using the fact that [see Johnson and Kotz (1972, p. 48)]
$$\Phi^m_{\rho_l}(a,\dots,a) = \int_{-\infty}^{\infty} \phi(x) \left[\Phi\left(\frac{a - \sqrt{\rho_l}\,x}{\sqrt{1 - \rho_l}}\right)\right]^m dx,$$
where $\phi$ denotes the univariate standard normal density function. In the Gumbel case,
$$C_m(F(k),\dots,F(k)) = \exp\left\{-\left([-\ln F(k)]^{\theta} + \cdots + [-\ln F(k)]^{\theta}\right)^{1/\theta}\right\} = F(k)^{m^{1/\theta}},$$
where $\theta = 1/(1 - 0.5) = 2$.
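The two payout probabilities can be compared numerically with a few lines of Python. This is a minimal sketch under stated assumptions: the integral is approximated by a composite Simpson rule on $[-10,10]$ with 4000 steps (both numbers are arbitrary choices), the function names are illustrative, and the identity $\Phi^{-1}(F(k)) = \ln k$ for the $LN(0,1)$ marginal is used to avoid inverting $F$ explicitly.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()          # standard normal: pdf, cdf, inv_cdf


def gaussian_diagonal(a, m, rho, steps=4000, limit=10.0):
    """Equicorrelated normal cdf at (a, ..., a) via the one-factor integral (Simpson rule)."""
    h = 2.0 * limit / steps
    total = 0.0
    for i in range(steps + 1):
        x = -limit + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        inner = STD_NORMAL.cdf((a - math.sqrt(rho) * x) / math.sqrt(1.0 - rho))
        total += weight * STD_NORMAL.pdf(x) * inner ** m
    return total * h / 3.0


def gumbel_diagonal(u, m, theta):
    """Gumbel C_m(u, ..., u) = u ** (m ** (1/theta))."""
    return u ** (m ** (1.0 / theta))


def payout_probability(l, diagonal):
    """Inclusion-exclusion: sum over m of (-1)^m * binom(l, m) * C_m(F(k), ..., F(k))."""
    return sum((-1) ** m * math.comb(l, m) * diagonal(m) for m in range(l + 1))


tau = 0.5
rho = math.sin(math.pi * tau / 2.0)     # Gaussian copula correlation, 1/sqrt(2)
theta = 1.0 / (1.0 - tau)               # Gumbel parameter, 2
u = 0.99                                # F(k) at the 99% quantile
a = STD_NORMAL.inv_cdf(u)               # Phi^{-1}(F(k)) = ln k

p_gauss = payout_probability(5, lambda m: gaussian_diagonal(a, m, rho))
p_gumbel = payout_probability(5, lambda m: gumbel_diagonal(u, m, theta))
print(p_gauss, p_gumbel, p_gumbel / p_gauss)
```

The alternating sum subtracts numbers close to 1, so the integral has to be accurate to many digits; a coarser quadrature would be a poor choice here even though it looks adequate for each term in isolation.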
For illustration, let $l = 5$, i.e., we consider 5 different lines of business. Figure 4 shows payout probabilities (probabilities of joint exceedances) for thresholds $k \in [0,15]$, when the dependence structure among the potential losses is given by a Gaussian copula (lower curve) and a Gumbel copula (upper curve). If we let $k = F^{-1}(0.99) \approx 10.25$, i.e., payout occurs when all 5 losses exceed their respective 99% quantile, then Figure 5 shows that if one would choose a Gaussian copula when the true dependence structure between the potential losses $X_1,\dots,X_5$ is given by a Gumbel copula, the probability of payout is underestimated by almost a factor of 8.

Fig. 4. Probability of payout for $l = 5$ when the dependence structure is given by a Gaussian copula (lower curve) and Gumbel copula (upper curve).

Fig. 5. Ratios of payout probabilities (Gumbel/Gaussian) for $l = 3$ (lower curve) and $l = 5$ (upper curve).

Fig. 6. Estimates of $E(f(X_1,X_2,k))$ for Gaussian (lower curve) and Gumbel (upper curve) copulas.

Figure 6 shows estimates of $E(f(X_1,X_2,k))$ for $k = 1,\dots,18$. The lower curve shows estimates for the expectation when $(X_1,X_2)^T$ has a Gaussian copula and the upper curve when $(X_1,X_2)^T$ has a Gumbel copula. The estimates are sample means from samples of size 150000. Since $F^{-1}(0.99) \approx 10.25$, Figure 6 shows that if one would choose a Gaussian copula when the true dependence between the potential losses $X_1$ and $X_2$ is given by a Gumbel copula, the expected loss to the reinsurer is underestimated by a factor of 2.

7.2. Market risk

We now consider the problem of measuring the risk of holding an equity portfolio over a short time horizon (one day, say) without the possibility of rebalancing. More precisely, consider a portfolio of $n$ equities with current value given by
$$V_t = \sum_{i=1}^{n} \beta_i S_{i,t},$$
where $\beta_i$ is the number of units of equity $i$ and $S_{i,t}$ is the current price of equity $i$.
Let $\Delta_{t+1} = -(V_{t+1} - V_t)/V_t$, the (negative) relative loss over time period $(t,t+1]$, be our aggregate risk. Then
$$\Delta_{t+1} = \sum_{i=1}^{n} \gamma_{i,t}\,\delta_{i,t+1},$$
where $\gamma_{i,t} = \beta_i S_{i,t}/V_t$ is the portion of the current portfolio value allocated to equity $i$, and $\delta_{i,t+1} = -(S_{i,t+1} - S_{i,t})/S_{i,t}$ is the (negative) relative loss over time period $(t,t+1]$ of equity $i$. We will highlight the techniques introduced by studying the effect of different distributional assumptions for $\delta := (\delta_{1,t+1},\dots,\delta_{n,t+1})^T$ on the aggregate risk $\Delta := \Delta_{t+1}$.

The classical distributional assumption on $\delta$, widely used within market risk management, is that of multivariate normality. However, in general the empirical distribution of $\delta$ has (one-dimensional) marginal distributions which are heavier tailed than the normal distribution. Furthermore, there is an even more critical problem with multivariate normal distributions in this context. Extreme falls in equity prices are often joint extremes, in the sense that a big fall in one equity price is accompanied by simultaneous big falls in other equity prices. This is for instance seen in Figure 7, an example already encountered in Figure 2. Loosely speaking, a problem with the multivariate normal distributions (or models based on them) is that they do not assign a high enough probability of occurrence to the event in which many things go wrong at the same time: the "perfect storm" scenario. More precisely, daily equity return data often indicate that the underlying dependence structure has the property of tail dependence, a property which we know Gaussian copulas lack.

Fig. 7. Daily log returns from 1989 to 1996.

Suppose $\delta$ is modelled by a multivariate normal distribution $N_n(\mu,\Sigma)$, where $\mu$ and $\Sigma$ are estimated from historical prices of the equities in the portfolio. There seems to be much agreement on the fact that the quantiles of $\Delta = \gamma^T\delta \sim N(\gamma^T\mu, \gamma^T\Sigma\gamma)$ do not
capture the portfolio risk due to extreme market movements; see for instance Embrechts, Mikosch and Klüppelberg (1997), Embrechts (2000) and the references therein. Therefore, different stress test solutions have been proposed. One such "solution" is to choose $\mu_s$ and $\Sigma_s$ in such a way that $\delta_s \sim N_n(\mu_s,\Sigma_s)$ represents the distribution of the relative losses of the different equities under more adverse market conditions. The aim is that the quantiles of $\Delta_s = \gamma^T\delta_s \sim N(\gamma^T\mu_s, \gamma^T\Sigma_s\gamma)$ should be more realistic risk estimates. To judge this approach we note that
$$\frac{F_s^{-1}(\alpha) - \gamma^T\mu_s}{F^{-1}(\alpha) - \gamma^T\mu} = \sqrt{\frac{\gamma^T\Sigma_s\gamma}{\gamma^T\Sigma\gamma}},$$
where $F$ and $F_s$ denote the distribution functions of $\Delta$ and $\Delta_s$ respectively. Hence the effect of this is simply a translation and scaling of the quantile curve $F^{-1}(\alpha)$.

As a comparison, let $\delta^*$ have a $t_4$-distribution with mean $\mu^*$ and covariance matrix $\Sigma$ and let $\Delta^*$ be the corresponding portfolio return. Furthermore let $n = 10$, $\mu_i = \mu_{s,i} = \mu^*_i = 0$, $\gamma_i = 1/10$ for all $i$ and let $\tau(\delta_i,\delta_j) = \tau(\delta^*_i,\delta^*_j) = 0.4$, $\tau(\delta_{s,i},\delta_{s,j}) = 0.6$, $\Sigma_{ij} = \sin(\pi\tau(\delta_i,\delta_j)/2)$, $\Sigma_{s,ij} = 1.5\sin(\pi\tau(\delta_{s,i},\delta_{s,j})/2)$ for all $i,j$. Then Figure 8 shows from lower to upper the quantile curves of $\Delta$, $\Delta_s$ and $\Delta^*$ respectively. If $\Delta^*$ were the true portfolio return, Figure 8 shows that the approach described above would eventually underestimate the quantiles of the portfolio return. It should be noted that this is not mainly due to the heavier tailed $t_4$-marginals. This can be seen in Figure 9, which shows quantile curves of $\Delta^*$ and $\Delta' = \gamma^T\delta'$, where $\delta'$ is a random vector with $t_4$-marginals, a Gaussian copula, $E(\delta') = E(\delta^*)$ and $\mathrm{Cov}(\delta') = \mathrm{Cov}(\delta^*)$.

Fig. 8. Quantile curves: $\mathrm{VaR}_{\Delta}(\alpha)$, $\mathrm{VaR}_{\Delta_s}(\alpha)$ and $\mathrm{VaR}_{\Delta^*}(\alpha)$ from lower to upper.

Fig. 9. Quantile curves: $\mathrm{VaR}_{\Delta'}(\alpha)$ and $\mathrm{VaR}_{\Delta^*}(\alpha)$ from lower to upper.

There are of course numerous alternative applications of copula techniques to integrated risk management. Besides the references already quoted, also see Embrechts, Hoeing and Juri (2001), where the calculation of Value-at-Risk bounds for functions of dependent risks is discussed.
The latter paper also contains many more relevant references to this important topic.

References

Barlow, R., Proschan, F., 1975. Statistical Theory of Reliability and Life Testing. Holt, Rinehart & Winston, New York.
Cambanis, S., Huang, S., Simons, G., 1981. On the theory of elliptically contoured distributions. Journal of Multivariate Analysis 11, 368-385.
Capéra, P., Genest, C., 1993. Spearman's rho is larger than Kendall's tau for positively dependent random variables. Journal of Nonparametric Statistics 2, 183-194.
Denneberg, D., 1994. Non-Additive Measure and Integral. Kluwer Academic, Boston.
Embrechts, P., 2000. The bell curve is wrong: so what? In: Embrechts, P. (Ed.), Extremes and Integrated Risk Management. Risk Waters Group, pp. xxv-xxviii.
Embrechts, P., Hoeing, A., Juri, A., 2001. Using copulae to bound the value-at-risk for functions of dependent risks. ETH preprint.
Embrechts, P., McNeil, A., Straumann, D., 2002. Correlation and dependence in risk management: Properties and pitfalls. In: Dempster, M.A.H. (Ed.), Risk Management: Value at Risk and Beyond. Cambridge University Press, Cambridge, pp. 176-223.
Embrechts, P., Mikosch, T., Klüppelberg, C., 1997. Modelling Extremal Events for Insurance and Finance. Springer, Berlin.
Fang, K.-T., Kotz, S., Ng, K.-W., 1987. Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
Frank, M.J., 1979. On the simultaneous associativity of f(x,y) and x + y - f(x,y). Aequationes Mathematicae 19, 194-226.
Fréchet, M., 1951. Sur les tableaux de corrélation dont les marges sont données. Annales de l'Université de Lyon, Sciences Mathématiques et Astronomie 14, 53-77.
Fréchet, M., 1957. Les tableaux de corrélation dont les marges et des bornes sont données. Annales de l'Université de Lyon, Sciences Mathématiques et Astronomie 20, 13-31.
Genest, C., 1987. Frank's family of bivariate distributions. Biometrika 74, 549-555.
Genest, C., MacKay, J., 1986a. The joy of copulas: Bivariate distributions with uniform marginals. The American Statistician 40, 280-283.
Genest, C., MacKay, R.J., 1986b. Copules archimédiennes et familles de lois bidimensionnelles dont les marges sont données. The Canadian Journal of Statistics 14, 145-159.
Genest, C., Rivest, L.-P., 1993. Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association 88, 1034-1043.
Hoeffding, W., 1940. Massstabinvariante Korrelationstheorie. Schriften des Mathematischen Seminars und des Instituts für Angewandte Mathematik der Universität Berlin 5, 181-233.
Hult, H., Lindskog, F., 2002. Multivariate extremes, aggregation and dependence in elliptical distributions. Advances in Applied Probability 34, 587-608.
Joe, H., 1997. Multivariate Models and Dependence Concepts. Chapman & Hall, London.
Johnson, N.L., Kotz, S., 1972. Distributions in Statistics: Continuous Multivariate Distributions. Wiley, New York.
Kendall, M., Stuart, A., 1979. Handbook of Statistics. Griffin, London.
Kimberling, C.H., 1974. A probabilistic interpretation of complete monotonicity. Aequationes Mathematicae 10, 152-164.
Kruskal, W.H., 1958. Ordinal measures of association. Journal of the American Statistical Association 53, 814-861.
Lehmann, E.L., 1975. Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco.
Lindskog, F., McNeil, A., 2001. Common Poisson shock models: Applications to insurance and credit risk modelling. ETH preprint.
Lindskog, F., McNeil, A., Schmock, U., 2001. A note on Kendall's tau for elliptical distributions. ETH preprint.
Marshall, A.W., 1996. Copulas, marginals and joint distributions. In: Rüschendorf, L., Schweizer, B., Taylor, M.D. (Eds.), Distributions with Fixed Marginals and Related Topics. Institute of Mathematical Statistics, Hayward, CA, pp. 213-222.
Marshall, A.W., Olkin, I., 1967. A multivariate exponential distribution. Journal of the American Statistical Association 62, 30-44.
Mikusinski, P., Sherwood, H., Taylor, M., 1992. The Fréchet bounds revisited. Real Analysis Exchange 17, 759-764.
Muliere, P., Scarsini, M., 1987. Characterization of a Marshall-Olkin type class of distributions. Annals of the Institute of Statistical Mathematics 39, 429-441.
Nelsen, R., 1999. An Introduction to Copulas. Springer, New York.
Resnick, S.I., 1987. Extreme Values, Regular Variation and Point Processes. Springer, New York.
Scarsini, M., 1984. On measures of concordance. Stochastica 8, 201-218.
Schweizer, B., Sklar, A., 1983. Probabilistic Metric Spaces. North-Holland, New York.
Schweizer, B., Wolff, E., 1981. On nonparametric measures of dependence for random variables. The Annals of Statistics 9, 879-885.
Sklar, A., 1959. Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris 8, 229-231.
Sklar, A., 1996. Random variables, distribution functions, and copulas: a personal look backward and forward. In: Rüschendorf, L., Schweizer, B., Taylor, M.D. (Eds.), Distributions with Fixed Marginals and Related Topics. Institute of Mathematical Statistics, Hayward, CA, pp. 1-14.

Chapter 9

PREDICTION OF FINANCIAL DOWNSIDE-RISK WITH HEAVY-TAILED CONDITIONAL DISTRIBUTIONS

STEFAN MITTNIK
Institute of Statistics, University of Munich, Akademiestr. 1, D-80799 Munich, Germany
Ifo Institute for Economic Research, Munich, Germany
Center for Financial Studies, Frankfurt, Germany

MARC S. PAOLELLA
Institute of Statistics and Econometrics, University of Kiel, Olshausenstr. 40, D-24098 Kiel, Germany

Contents
Abstract
1. Introduction
2. GARCH-stable processes
3. Modeling exchange-rate returns
3.1. Approximate maximum likelihood estimation
3.2. Estimation results and volatility persistence
3.3. Goodness of fit
4. Prediction of densities and downside risk
5. Conclusions
References

The research of S. Mittnik was supported by the Deutsche Forschungsgemeinschaft.
Abstract

The use of GARCH models with stable Paretian innovations in financial modeling has been recently suggested in the literature. This class of processes is attractive because it allows for conditional skewness and leptokurtosis of financial returns without ruling out normality. This contribution illustrates their usefulness in predicting the downside risk of financial assets in the context of modeling foreign exchange rates and demonstrates their superiority over the use of normal or Student's t GARCH models.

1. Introduction

Risk managers of financial institutions are particularly interested in the left (i.e., downside) tail of the return distribution of financial assets. To assess the short-term exposure to market risks, they are required to evaluate future shortfall probabilities or value-at-risk levels of financial investments. Such estimates can be based on the distribution of the returns themselves. For example, ever since the pioneering works of Mandelbrot (1963) and Fama (1965) there have been numerous studies investigating the appropriateness of the stable Paretian distribution for modeling the unconditional distribution of asset returns [for an overview, see, for example, Mittnik and Rachev (1993), McCulloch (1997)]. However, short-term prediction often benefits substantially when taking conditional volatility into account. The GARCH class of conditional models has been widely and, from both an academic and an applied perspective, successfully used to model returns on financial assets [see Palm (1997), Gouriéroux (1997), for surveys].
Although a stationary GARCH model with normally distributed innovations gives rise to an unconditional distribution with higher (possibly nonexistent) kurtosis than the normal, it is often found that residuals from estimated GARCH models of financial return data still tend to exhibit nonnegligible kurtosis. To allow for this, other fatter tailed distributions for GARCH innovations have been considered in the literature, most notably the Student's t. Only very recently has the stable Paretian distribution been considered in the context of modeling the conditional heteroscedastic distribution of asset returns. Special cases of the model considered herein were developed by McCulloch (1985), Nelson (1990), Panorska, Mittnik and Rachev (1995), and Mittnik, Rachev and Paolella (1998), while a more general case was examined in Liu and Brorsen (1995), Paolella (1999) and Mittnik, Paolella and Rachev (2000, 2002).

Like the Student's t, the stable Paretian distribution includes the normal distribution as a special, limiting case and permits heavy-tailed distributions for GARCH innovations. However, the stable Paretian distribution allows for skewness, an attractive property in financial applications not shared by the Student's t. In addition to this practical aspect, the stable Paretian distribution also has the appealing theoretical property that it is the only valid distribution that arises as a limiting distribution of sums of independent, identically distributed (iid) random variables. This is highly desirable, given that error terms in econometric models are usually interpreted as random variables that represent the sum of the external effects not being captured by the model.
This contribution investigates the use of asymmetric stable Paretian power GARCH models for modeling downside risk and demonstrates that this model class is more suitable than the class of Student's t GARCH models, particularly when one uses a goodness-of-fit criterion that focuses on the tails of the conditional distribution.

The remainder is organized as follows. Section 2 discusses GARCH processes with stable Paretian innovations and stationarity conditions. Section 3 reconsiders the empirical analysis of the five exchange-rate series in Liu and Brorsen (1995) using the appropriate measure for persistence of volatility and compares the goodness of fit of the estimated stable Paretian and Student's t GARCH models. The problem of out-of-sample conditional density prediction with particular focus on predicting downside market risk is considered in Section 4. Section 5 concludes.

2. GARCH-stable processes

Sequence $y_t$ is said to be a stable Paretian power GARCH process or, in short, an $S_{\alpha,\beta}^{\delta}$GARCH$(r,s)$ process [see Panorska, Mittnik and Rachev (1995), Paolella (1999), Rachev and Mittnik (2000)], if
$$y_t = \mu + c_t\varepsilon_t, \qquad \varepsilon_t \overset{\text{iid}}{\sim} S_{\alpha,\beta}(0,1) \tag{1}$$
and
$$c_t^{\delta} = \theta_0 + \sum_{i=1}^{r} \theta_i |y_{t-i} - \mu|^{\delta} + \sum_{j=1}^{s} \phi_j c_{t-j}^{\delta}, \tag{2}$$
where $S_{\alpha,\beta}(0,1)$ denotes the standard asymmetric stable Paretian distribution with stable index $\alpha$, skewness parameter $\beta \in [-1,1]$, zero location parameter, and unit scale parameter. There exist several notational varieties of the stable Paretian distribution; we use the same as in Samorodnitsky and Taqqu (1994) and Rachev and Mittnik (2000), whereby
$$\int_{-\infty}^{\infty} e^{itx}\,dH(x) = \begin{cases} \exp\left\{-c^{\alpha}|t|^{\alpha}\left[1 - i\beta\,\mathrm{sign}(t)\tan\dfrac{\pi\alpha}{2}\right] + i\mu t\right\}, & \text{if } \alpha \ne 1, \\[2ex] \exp\left\{-c|t|\left[1 + i\beta\,\dfrac{2}{\pi}\,\mathrm{sign}(t)\ln|t|\right] + i\mu t\right\}, & \text{if } \alpha = 1, \end{cases} \tag{3}$$
is the characteristic function and $H$ denotes the distribution function corresponding to $S_{\alpha,\beta}(\mu,c)$. The density is symmetric for $\beta = 0$ and skewed to the right (left) for $\beta > 0$ ($\beta < 0$). Stable index $\alpha$, which, in general, assumes values in interval $(0,2]$, determines the tail-thickness of the distribution.
As $\alpha$ approaches 2, tails become thinner; and for $\alpha = 2$ the standard stable Paretian distribution coincides with the normal distribution $N(0,2)$. For $\alpha < 2$, $\varepsilon_t$ does not possess moments of order $\alpha$ or higher.

Mittnik, Paolella and Rachev (2002) derived sufficient conditions under which the $S_{\alpha,\beta}^{\delta}$GARCH$(r,s)$ process has a unique strictly stationary solution. In the notation of (2), these are given by $1 < \alpha \le 2$, $0 < \delta < \alpha$, $\theta_0 > 0$, $\theta_i \ge 0$, $i = 1,\dots,r$, $r \ge 1$, $\phi_j \ge 0$, $j = 1,\dots,s$, $s \ge 0$, and that the volatility persistence, $V_S$, defined by
$$V_S := E|Z|^{\delta} \sum_{i=1}^{r} \theta_i + \sum_{j=1}^{s} \phi_j \tag{4}$$
for $Z \sim S_{\alpha,\beta}(0,1)$, satisfies
$$V_S \le 1. \tag{5}$$
If $1 < \alpha \le 2$ and $0 < \delta < \alpha$, they also showed that
$$\lambda_{\alpha,\beta,\delta} := E|Z|^{\delta} = \frac{1}{\kappa_{\delta}}\,\Gamma\left(1 - \frac{\delta}{\alpha}\right)\left(1 + \tau_{\alpha,\beta}^{2}\right)^{\delta/(2\alpha)} \cos\left(\frac{\delta}{\alpha}\arctan\tau_{\alpha,\beta}\right), \tag{6}$$
where $\tau_{\alpha,\beta} := \beta\tan(\alpha\pi/2)$ and
$$\kappa_{\delta} = \begin{cases} \Gamma(1-\delta)\cos\dfrac{\pi\delta}{2}, & \text{if } \delta \ne 1, \\[2ex] \dfrac{\pi}{2}, & \text{if } \delta = 1. \end{cases} \tag{7}$$
Restrictions $1 < \alpha \le 2$ and $0 < \delta < \alpha$ not only appear to be satisfied for the data sets used below, but also for other, even more volatile series, such as stock price indices and East Asian currencies [see Mittnik, Rachev and Paolella (1998), Mittnik, Paolella and Rachev (2000), respectively].

Analogous to the ordinary normal GARCH model (Engle and Bollerslev, 1986), we say that $y_t$ is an integrated $S_{\alpha,\beta}^{\delta}$GARCH$(r,s)$ process, denoted $S_{\alpha,\beta}^{\delta}$IGARCH$(r,s)$, if, in (5), $V_S = 1$. In practice, the estimated volatility persistence, $\hat{V}_S$, tends to be quite close to one for highly volatile series, so that an integrated model might offer a reasonable data description. Because both finite sample and even asymptotic properties of $\hat{V}_S$ and the associated likelihood ratio test statistics are not known [see, however, Mittnik, Paolella and Rachev (2000)], it is not immediately clear how one can test for an integrated process. Instead of formally testing, we suggest fitting both models and examining the change in various goodness-of-fit statistics, most notably the Anderson-Darling statistic, which is particularly relevant for assessing the models' ability to successfully model the value-at-risk (see Section 3.3 below).

3. Modeling exchange-rate returns

To examine the appropriateness of the stable GARCH hypothesis, we model returns [1] on five daily spot foreign exchange rates against the U.S. dollar, namely the British pound, Canadian dollar, German mark, Japanese yen, and the Swiss franc. The choice of exchange rates allows us to compare our more general GARCH specification to that used by Liu and Brorsen (1995), who set $\delta = \alpha$ in (2). However, our sample is somewhat larger than theirs, covering the period January 2, 1980 to July 28, 1994, yielding series of lengths 3681, 3682, 3661, 3621, and 3678, respectively.

[1] We define the return $r_t$ in period $t$ by $r_t = 100 \times (\ln P_t - \ln P_{t-1})$, where $P_t$ is the exchange rate at time $t$.

Serial correlation was found to be negligible, and, as is common in practice, a GARCH$(r,s)$ specification with $r = s = 1$ was sufficient to capture serial correlation in the absolute returns. Therefore, we specify a model of the form
$$r_t = \mu + c_t\varepsilon_t, \tag{8}$$
$$c_t^{\delta} = \theta_0 + \theta_1|r_{t-1} - \mu|^{\delta} + \phi_1 c_{t-1}^{\delta} \tag{9}$$
for each of the five currencies.

3.1. Approximate maximum likelihood estimation

Evaluation of the probability density function (pdf) and, thus, the likelihood function of the $S_{\alpha,\beta}$ distribution is nontrivial, because it lacks an analytic expression. The maximum likelihood (ML) estimate of parameter vector $\theta = (\mu, c_0, \theta_0, \theta_1, \phi_1, \alpha, \beta, \delta)$ for the $S_{\alpha,\beta}^{\delta}$GARCH$(1,1)$ models (8), (9) is obtained by maximizing the logarithm of the likelihood function
$$L(\theta; r_1,\dots,r_T) = \prod_{t=1}^{T} c_t^{-1} S_{\alpha,\beta}\left(\frac{r_t - \mu}{c_t}\right), \tag{10}$$
where $c_0$ denotes the unknown initial value of $c_t$. The ML estimation we conduct is approximate in the sense that the stable Paretian density function $S_{\alpha,\beta}((r_t - \mu)/c_t)$ needs to be approximated. To do so, we follow the algorithm of Mittnik et al. (1999), which approximates the stable Paretian density via fast Fourier transform of the characteristic function.
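The moment formula (6) with (7), the persistence measure (4) and the recursion (9) are straightforward to code. The following Python sketch is offered under stated assumptions: the function names are hypothetical, `kappa` stands for the normalizing constant in (7), the first element of `returns` is treated as the presample observation $r_0$, and all parameter values in the usage lines are made up for illustration rather than taken from the estimation results.

```python
import math


def kappa(delta):
    """Constant in (7): Gamma(1 - delta) * cos(pi*delta/2), or pi/2 when delta == 1."""
    if delta == 1.0:
        return math.pi / 2.0
    return math.gamma(1.0 - delta) * math.cos(math.pi * delta / 2.0)


def abs_moment(alpha, beta, delta):
    """E|Z|^delta for Z ~ S_{alpha,beta}(0,1) as in (6); needs 1 < alpha <= 2, 0 < delta < alpha."""
    tau = beta * math.tan(alpha * math.pi / 2.0)
    return (
        math.gamma(1.0 - delta / alpha)
        * (1.0 + tau * tau) ** (delta / (2.0 * alpha))
        * math.cos(delta / alpha * math.atan(tau))
        / kappa(delta)
    )


def volatility_persistence(alpha, beta, delta, thetas, phis):
    """V_S in (4): E|Z|^delta * sum(theta_i) + sum(phi_j), excluding theta_0."""
    return abs_moment(alpha, beta, delta) * sum(thetas) + sum(phis)


def scale_path(returns, mu, theta0, theta1, phi1, delta, c0):
    """Recursion (9); returns[0] plays the role of r_0 and c0 that of c_0 (an assumption)."""
    c_pow = c0 ** delta                      # c_{t-1}^delta
    path = []
    for r_prev in returns[:-1]:
        c_pow = theta0 + theta1 * abs(r_prev - mu) ** delta + phi1 * c_pow
        path.append(c_pow ** (1.0 / delta))
    return path                              # scales c_1, ..., c_T


# Illustrative (made-up) parameter values
alpha, beta, delta = 1.8, -0.2, 1.3
lam = abs_moment(alpha, beta, delta)
print(lam, volatility_persistence(alpha, beta, delta, [0.05], [0.9]))
print(scale_path([0.1, -0.4, 0.3, 1.2], 0.0, 0.01, 0.05, 0.9, delta, 0.5))
```

For $\alpha = 2$ the innovation is $N(0,2)$, so `abs_moment(2.0, 0.0, 1.0)` should reproduce the normal mean absolute deviation $2/\sqrt{\pi}$, which is a quick way to check an implementation of (6).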
DuMouchel (1973) shows that the ML estimator of the parameters of the stable density is consistent and asymptotically normal with the asymptotic covariance matrix being given by the inverse of the Fisher information matrix. Approximate standard errors of the estimates can be obtained via numerical approximation of the Hessian matrix.

Below, we will demonstrate that, for the five series under consideration, the $S_{\alpha,\beta}^{\delta}$GARCH(r,s) model outperforms its Student's t counterpart. However, it is of practical interest to know at least three things before adopting a new and more complex method: first, how easy the stable ML estimation routine is to implement; second, whether it is numerically well-behaved; and third, how fast it performs. When implemented in high-level software that provides both FFT and linear interpolation routines (such as Matlab and S-Plus), the algorithm becomes a straightforward programming exercise. Our experience has shown that the method is extremely well behaved, giving rise to numerical problems only for grossly misspecified and/or overspecified models (for which the Student's t GARCH model also has difficulties) or, in the case of the more general class of ARMA-GARCH models, when there is near zero-pole cancellation in the ARMA structure, a well-known difficulty in ARMA estimation.

The satisfactory behavior of the algorithm is actually not surprising for at least two reasons. First, there is no explicit numerical integration involved [as in the approach of Liu and Brorsen (1995)] and, second, the method can be made arbitrarily accurate by the choice of several tuning constants [recommendations for which are given in Mittnik et al. (1999)]. Nevertheless, it is clear that the method will take longer than the (essentially closed form) evaluation of the Student's t density.
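To make concrete what "approximating the density from the characteristic function" involves, the sketch below inverts the characteristic function by plain trapezoidal integration. This is deliberately not the FFT algorithm of Mittnik et al. (1999); it is a slower stand-in chosen for brevity. It assumes the standard parameterization with characteristic function $\exp\{-|t|^{\alpha}[1 - i\beta\,\mathrm{sign}(t)\tan(\pi\alpha/2)]\}$ for an index different from one, and the function name `stable_pdf` and the tuning constants `t_max` and `n_steps` are hypothetical.

```python
import math


def stable_pdf(x, alpha, beta, t_max=40.0, n_steps=4000):
    """Approximate the standard stable density at x (alpha != 1).

    Uses f(x) = (1/pi) * int_0^inf exp(-t^alpha)
                * cos(x*t - beta*tan(pi*alpha/2)*t^alpha) dt,
    truncated at t_max and evaluated with the trapezoidal rule.
    """
    w = beta * math.tan(math.pi * alpha / 2.0)
    h = t_max / n_steps

    def integrand(t):
        ta = t ** alpha
        return math.exp(-ta) * math.cos(x * t - w * ta)

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for k in range(1, n_steps):
        total += integrand(k * h)
    return total * h / math.pi
```

For an index of 2 the result can be checked against the $N(0,2)$ density; for other parameter values the accuracy depends on the step size and truncation point, which is the role played by the tuning constants mentioned above.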
For the series considered in this paper, use of a quasi-Newton minimization algorithm (BFGS, as implemented in Matlab 5.2) with convergence tolerance of $10^{-4}$ resulted in convergence after about 150 to 350 function evaluations (including gradient calculations). Rather contrary to our initial expectations (and fears), the choice of initial values is of surprisingly little importance. Given any "reasonable" set of values, say $\alpha > 1.4$, $|\beta| < 0.7$, $|\mu| < 0.2$, $\theta_0 > 0$, $\theta_1 > 0$ and $\phi_1 > 0.2$, convergence to the same respective maxima occurred for all five exchange-rate series under consideration, and also for the vast majority of trials from simulation experiments. From a purely numerical standpoint then, the method appears both highly reliable and "stable".

Evaluation of the GARCH recursion requires presample values $\epsilon_0$ and $c_0$. Following Nelson and Cao (1992), one could set those to their unconditional expected values, i.e.,

$$\hat c_0 = \frac{\hat\theta_0}{1 - \hat\lambda_{\hat\alpha,\hat\beta,\hat\delta} \sum_{i=1}^{r} \hat\theta_i - \sum_{j=1}^{s} \hat\phi_j} \quad \text{and} \quad \hat\epsilon_0 = \hat\lambda \hat c_0. \qquad (11)$$

In the IGARCH case, (11) will be invalid, so we instead estimate $c_0$ as an additional parameter. In fact, we chose to do this for all models considered here, as (11) will clearly be problematic for nearly integrated GARCH models.

For the integrated model $S_{\alpha,\beta}^{\delta}$IGARCH(1,1), the restriction $\phi_1 = 1 - \lambda_{\alpha,\beta,\delta}\,\theta_1$ needs to be imposed. Notice that this entails evaluation of (4) at each iteration, as $\phi_1$ is also dependent on values $\hat\alpha$, $\hat\beta$ and $\hat\delta$.

We compare the $S_{\alpha,\beta}^{\delta}$GARCH model to the most commonly used heavy-tailed variant of the GARCH model, the Student's t-GARCH models in power form, say $t_{\nu}^{\delta}$GARCH(r,s), given by

$$r_t = \mu + c_t \epsilon_t, \qquad \epsilon_t \overset{\text{iid}}{\sim} t(\nu), \qquad (12)$$

$$c_t^{\delta} = \theta_0 + \sum_{i=1}^{r} \theta_i |r_{t-i} - \mu|^{\delta} + \sum_{j=1}^{s} \phi_j c_{t-j}^{\delta}, \qquad (13)$$

where $t(\nu)$ refers to the Student's t distribution with $\nu$ degrees of freedom, i.e.,

$$f(x;\nu) = K_\nu \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2} \qquad (14)$$

and

$$K_\nu = \frac{\Gamma((\nu+1)/2)\,\nu^{-1/2}}{\sqrt{\pi}\,\Gamma(\nu/2)}.$$
(15)

Assuming $0 < \delta < \nu$ and $\nu > 1$,² taking unconditional expectations of $c_t^{\delta}$ in (13) shows that $E c_t^{\delta}$ exists if $E|T|^{\delta} \sum_{i=1}^{r} \theta_i + \sum_{j=1}^{s} \phi_j < 1$, where $T \sim t(\nu)$ and

$$\lambda_{\nu,\delta} := E|T|^{\delta} = \nu^{\delta/2}\, \frac{\Gamma\!\left(\frac{\delta+1}{2}\right) \Gamma\!\left(\frac{\nu-\delta}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}. \qquad (16)$$

Analogous to (4), the measure of volatility persistence for $t_{\nu}^{\delta}$GARCH(r,s) models is defined to be

$$V_t := \lambda_{\nu,\delta} \sum_{i=1}^{r} \theta_i + \sum_{j=1}^{s} \phi_j. \qquad (17)$$

Similar remarks regarding treatment of presample values and the imposing of the IGARCH constraint apply to the Student's t model as well.

² The condition $\nu > 1$ is analogous to requiring $\alpha > 1$ in the stable Paretian case and implies existence of a finite first moment of the innovations.

3.2. Estimation results and volatility persistence

The parameter estimates of the models are presented in Table 1. Noteworthy are the estimates of the skewness parameter $\beta$: all $\hat\beta$ values are (statistically) significantly different from zero, although those for the British pound and German mark series are quite close to zero. In addition, when $|\beta| < 0.3$ and $\alpha$ is over 1.8, the amount of skewness is, for practical purposes, slight. Skewness is most pronounced for the Japanese yen, for which $\hat\alpha = 1.81$ and $\hat\beta = -0.418$.

Table 1
GARCH parameter estimates (a)

                 Intercept   GARCH parameters                   Distribution parameters          Persistence
                 μ           θ_0         θ_1        φ_1         δ          Shape     Skew        measure V (b)
British   S_α,β  -9.773e-3   8.085e-3    0.04132    0.9171      1.359      1.850     -0.1368     0.984
                 (0.012)     (2.39e-3)   (6.42e-3)  (0.0118)    (0.0892)   (0.0245)  (0.0211)
          t      -2.312e-3   0.01190     0.06373    0.9071      1.457      6.218     -           0.976
                 (0.010)     (3.56e-3)   (0.0115)   (0.0200)    (0.167)    (0.615)
Canadian  S_α,β  5.167e-3    1.034e-3    0.04710    0.9164      1.404      1.823     0.3577      1.001
                 (0.0614)    (3.12e-4)   (6.63e-3)  (0.0118)    (0.0143)   (0.0104)  (0.0209)
          t      -2.240e-3   7.774e-4    0.06112    0.9118      1.793      5.900     -           0.992
                 (3.83e-3)   (6.90e-4)   (5.98e-3)  (7.27e-3)   (0.0150)   (0.0801)
German    S_α,β  2.580e-3    0.01525     0.05684    0.8971      1.101      1.892     -0.06779    0.969
                 (0.016)     (1.61e-3)   (3.44e-3)  (7.42e-3)   (9.78e-3)  (0.0216)  (0.0184)
          t      6.643e-3    0.01812     0.07803    0.8938      1.261      7.297     -           0.969
                 (9.21e-4)   (2.25e-3)   (6.45e-3)  (4.43e-3)   (0.147)    (0.186)
Japanese  S_α,β  -0.01938    4.518e-3    0.06827    0.8865      1.337      1.814     -0.4175     1.002
                 (0.0166)    (1.12e-3)   (7.91e-3)  (0.0124)    (0.0132)   (0.0107)  (8.80e-3)
          t      5.318e-3    9.949e-3    0.07016    0.8756      1.816      5.509     -           0.972
                 (8.87e-3)   (3.03e-3)   (0.0119)   (0.0205)    (0.162)    (0.461)
Swiss     S_α,β  -2.677e-3   0.01595     0.04873    0.9115      1.041      1.902     -0.2836     0.971
                 (0.0124)    (3.30e-3)   (6.84e-3)  (0.0132)    (0.144)    (0.0206)  (0.0722)
          t      8.275e-3    0.02099     0.06825    0.9061      1.159      8.294     -           0.968
                 (0.0118)    (3.91e-3)   (6.85e-3)  (7.25e-3)   (0.179)    (0.933)

(a) Estimated models: $r_t = \mu + c_t \epsilon_t$, $c_t^{\delta} = \theta_0 + \theta_1 |r_{t-1} - \mu|^{\delta} + \phi_1 c_{t-1}^{\delta}$. "Shape" denotes the degrees of freedom parameter $\nu$ for the Student's t distribution and stable index $\alpha$ for the stable Paretian distribution; "Skew" refers to the stable Paretian skewness parameter $\beta$. Standard deviations resulting from ML estimation are given in parentheses.
(b) V corresponds to $V_S$ in the stable Paretian and $V_t$ in the Student's t case. V = 1 implies an IGARCH model.

The persistence-of-volatility measure given in the last column of Table 1 reflects the speed with which volatility shocks die out. A V-value near one is indicative of an integrated GARCH process, in which volatility shocks have persistent effects. Under the $S_{\alpha,\beta}$ assumption, the models for the Canadian dollar ($V_S = \hat\lambda_{\hat\alpha,\hat\beta,\hat\delta}\,\hat\theta_1 + \hat\phi_1 = 1.001$) and Japanese yen ($V_S = 1.002$) series would suggest that they are very close to being integrated. Under the Student's t assumption, $V_t = \hat\lambda_{\hat\nu,\hat\delta}\,\hat\theta_1 + \hat\phi_1 = 0.992$ for the Canadian dollar, which is also rather close to being integrated, while $V_t$ is only 0.972 for the Japanese yen. Thus, for these two currencies, the indications regarding persistence of volatility differ under the two distributional assumptions. For the other currencies, the measures are strikingly close, most notably for the German mark ($V_S = V_t = 0.969$) and the Swiss franc ($V_S = 0.971$, $V_t = 0.968$). It is interesting to note that, for each of these two currencies, the log-likelihood values $L_t$ and $L_S$ are also extremely close.
These are discussed further in the next section.

For all five series, we also estimated the models with the IGARCH condition imposed. Table 2 shows the resulting parameter estimates. Not surprisingly, for those models for which the persistence measure was close to unity, the IGARCH-restricted parameter estimates differ very little. For the remaining models, the greatest changes occur with the power parameter $\delta$ and, to a lesser extent, the shape parameters $\alpha$ and $\nu$. The former increase, while the latter decrease under IGARCH restrictions.

Table 2
IGARCH parameter estimates (a)

                 Intercept   IGARCH parameters                  Distribution parameters
                 μ           θ_0         θ_1        φ_1         δ          Shape     Skew
British   S_α,β  -0.01023    7.050e-3    0.03781    0.9114      1.598      1.846     -0.1340
                 (0.0103)    (1.79e-3)   (5.64e-3)  -           (0.0677)   (0.0224)  (0.0147)
          t      -3.033e-3   4.237e-3    0.05774    0.9130      1.949      5.543
                 (0.0101)    (1.68e-3)   (9.83e-3)  -           (0.264)    (0.484)
Canadian  S_α,β  5.148e-3    1.115e-3    0.04689    0.9154      1.404      1.823     0.3578
                 (3.65e-3)   (2.14e-4)   (5.71e-3)  -           (0.0143)   (0.0105)  (0.0209)
          t      -2.098e-3   4.998e-4    0.06468    0.9146      1.796      5.890
                 (3.48e-3)   (1.37e-4)   (7.54e-3)  -           (0.0226)   (0.0838)
German    S_α,β  8.959e-3    9.666e-3    0.04518    0.8896      1.676      1.881     0.03944
                 (0.0113)    (1.85e-3)   (6.10e-3)  -           (0.0662)   (0.0217)  (0.0930)
          t      8.851e-3    5.505e-3    0.08124    0.9003      1.741      6.560
                 (0.0106)    (1.60e-3)   (0.0106)   -           (0.231)    (0.676)
Japanese  S_α,β  -0.01932    4.814e-3    0.06768    0.8858      1.336      1.814     -0.4175
                 (8.44e-3)   (9.75e-4)   (7.68e-3)  -           (0.0751)   (0.0226)  (0.0151)
          t      6.136e-3    5.611e-3    0.06036    0.8689      2.314      5.066
                 (8.57e-3)   (1.31e-3)   (0.0112)   -           (0.224)    (0.410)
Swiss     S_α,β  3.823e-3    0.01111     0.03700    0.9009      1.724      1.889     -0.1703
                 (0.0127)    (2.65e-3)   (5.40e-3)  -           (0.0419)   (0.0169)  (0.137)
          t      9.130e-3    2.047e-3    0.07125    0.9347      1.166      8.194
                 (0.0119)    (8.34e-4)   (9.13e-3)  -           (9.79e-3)  (0.0996)

(a) Estimated models: $r_t = \mu + c_t \epsilon_t$, $c_t^{\delta} = \theta_0 + \theta_1 |r_{t-1} - \mu|^{\delta} + (1 - \lambda\theta_1) c_{t-1}^{\delta}$ with IGARCH condition $\hat\phi_1 = 1 - \hat\lambda \hat\theta_1$ imposed. See footnote to Table 1 for further details.
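The persistence measures in (4) and (17) and the IGARCH restriction can be sketched numerically as follows. The code assumes the moment expressions (6)-(7) for the stable case and (16) for the Student's t case; the function names are hypothetical, and the parameter values in the usage example are the British pound estimates reported in Table 1.

```python
import math


def stable_abs_moment(alpha, beta, delta):
    """E|Z|^delta for a standard stable Z, following (6)-(7).

    Intended for 1 < alpha <= 2 and 0 < delta < alpha.
    """
    tau = beta * math.tan(alpha * math.pi / 2.0)
    if delta == 1.0:
        psi = math.pi / 2.0
    else:
        psi = math.gamma(1.0 - delta) * math.cos(math.pi * delta / 2.0)
    return (
        math.gamma(1.0 - delta / alpha) / psi
        * (1.0 + tau * tau) ** (delta / (2.0 * alpha))
        * math.cos(delta / alpha * math.atan(tau))
    )


def student_abs_moment(nu, delta):
    """E|T|^delta for T ~ t(nu), following (16); requires 0 < delta < nu."""
    return (
        nu ** (delta / 2.0)
        * math.gamma((delta + 1.0) / 2.0)
        * math.gamma((nu - delta) / 2.0)
        / (math.sqrt(math.pi) * math.gamma(nu / 2.0))
    )


def persistence(lam, thetas, phis):
    """Volatility persistence: lam * sum(theta_i) + sum(phi_j)."""
    return lam * sum(thetas) + sum(phis)


def igarch_phi1(lam, theta1):
    """Value of phi_1 implied by the IGARCH(1,1) restriction V = 1."""
    return 1.0 - lam * theta1
```

```python
# British pound estimates from Table 1
v_s = persistence(stable_abs_moment(1.850, -0.1368, 1.359), [0.04132], [0.9171])
v_t = persistence(student_abs_moment(6.218, 1.457), [0.06373], [0.9071])
print(v_s, v_t)
```

The two values should come out close to the persistence measures of 0.984 and 0.976 reported for the British pound, up to the rounding of the tabulated estimates.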
It should also be noted that the restriction $\delta = \alpha$, imposed by Liu and Brorsen (1995) when estimating GARCH-stable models for the same five currencies, is not supported by our results. This is important because, if $\delta \ge \alpha$, the unconditional first moment of $c_t$ is infinite for any $\alpha < 2$. The knife-edge specification $\delta = \alpha$ not only induces conceptual difficulties, but also leads to a highly volatile evolution of the $c_t$ series in practical work. For our estimates, we obtain $\hat\delta < \hat\alpha$, which suggests that conditional volatility $c_t^{\delta}$ is a well-defined quantity in the sense that $E(c_t^{\delta} \mid r_{t-1}, r_{t-2}, \ldots) < \infty$ for $V_S < 1$.

3.3. Goodness of fit

We employ three likelihood-based criteria and one empirical CDF-based criterion for comparing the goodness of fit of the candidate models. The first is the maximum log-likelihood value obtained from ML estimation. This value may be viewed as an overall measure of goodness of fit and allows us to judge which candidate is more likely to have generated the data. The second is the AICC [Hurvich and Tsai (1989), see also Brockwell and Davis (1991), Equation (9.3.7)] given by

$$\text{AICC} = -2L + \frac{2T(k+1)}{T - k - 2}, \qquad (18)$$

where $k$ denotes the number of estimated parameters and $T$ the number of observations. This is the bias-corrected information criterion of Akaike (1973), which corrects the latter's tendency to overfit. The third, the SBC (Schwarz, 1978), defined as

$$\text{SBC} = -2L + \frac{k \log(T)}{T}, \qquad (19)$$

is a similar penalizing strategy which is commonly used. The fourth criterion is the Anderson-Darling statistic [Anderson and Darling (1952), see also Press et al. (1991), and Tanaka (1996)], given by

$$\text{AD} = \sup_{x \in \mathbb{R}} \frac{|F_s(x) - \hat F(x)|}{\sqrt{\hat F(x)\,(1 - \hat F(x))}}, \qquad (20)$$

where $\hat F(x)$ denotes the cdf of the estimated parametric density, and $F_s(x)$ is the empirical sample distribution, i.e.,

$$F_s(x) = \frac{1}{T} \sum_{t=1}^{T} I_{(-\infty,x]}\!\left(\frac{r_t - \hat\mu}{\hat c_t}\right),$$

where $I(\cdot)$ is the usual indicator function.
The AD statistic weights discrepancies appropriately across the whole support of the distribution. This is especially important if one is interested in determining conditional shortfall probabilities, i.e., the probability of large investment losses, or so-called value-at-risk measures, where one focuses on the left tail of the conditional return distribution.

Table 3 displays the aforementioned goodness-of-fit measures for the estimated models. In both the unrestricted and IGARCH restricted cases, the inference suggested from the maximum log-likelihood value $L$, and the AICC and SBC are identical. This is not too surprising, given the large ratio of observations to parameters, and the fact that there is only one parameter difference between the Student's t and stable Paretian GARCH models.

Table 3
Goodness-of-fit measures of estimated models (a)

                        L                     AICC               SBC                AD
                        S_α,β      t          S_α,β    t         S_α,β    t         S_α,β    t
Britain      GARCH      -3842.0    -3828.6    7700.0   7671.2    7684.0   7657.2    0.0375   0.0244
             IGARCH     -3842.3    -3837.1    7698.6   7686.2    7684.6   7674.2    0.0417   0.0420
Canada       GARCH      -159.92    -152.25    335.9    318.5     319.9    304.5     0.0532   0.0571
             IGARCH     -159.97    -153.71    334.0    319.4     320.0    307.4     0.0529   0.0633
Germany      GARCH      -3986.5    -3986.2    7989.0   7986.4    7973.0   7972.4    0.0368   0.345
             IGARCH     -3989.9    -3999.4    7993.8   8010.8    7979.8   7998.8    0.0506   0.200
Japan        GARCH      -3178.7    -3333.7    6373.4   6681.4    6357.4   6667.4    0.0401   0.0986
             IGARCH     -3178.8    -3334.6    6371.6   6681.2    6357.6   6669.2    0.0394   0.0793
Switzerland  GARCH      -4308.6    -4308.1    8633.2   8630.2    8617.2   8616.2    0.0457   0.287
             IGARCH     -4314.2    -4325.0    8642.4   8662.0    8628.4   8650.0    0.0460   0.278

(a) L refers to the maximum log-likelihood value; AICC is the corrected AIC criterion (18); SBC is the Schwarz Bayesian criterion (19); and AD is the Anderson-Darling statistic (20).

It appears that $L$ significantly favors the Student's t distribution for the British pound (with values, in obvious notation, $L_t = -3828.6$ and $L_S = -3842.0$) and the Canadian dollar ($L_t = -152.25$, $L_S = -159.92$).
For the German mark ($L_t = -3986.2$, $L_S = -3986.5$) and the Swiss franc ($L_t = -4308.1$, $L_S = -4308.6$), the log-likelihood values, AICC and SBC are very close, albeit larger for the Student's t. On the other hand, the $S_{\alpha,\beta}$ assumption is favored quite strongly for the Japanese yen with $L_S = -3178.7$ as compared to $L_t = -3333.7$.

For the British pound, the AD statistic ($\text{AD}_t = 0.0244$, $\text{AD}_S = 0.0375$) slightly favors the Student's t model, in agreement with $L$, although the difference is relatively small. The AD statistics for the remaining countries all favor the stable Paretian model, particularly for the German mark ($\text{AD}_t = 0.345$, $\text{AD}_S = 0.0368$), the Japanese yen ($\text{AD}_t = 0.0986$, $\text{AD}_S = 0.0401$) and the Swiss franc ($\text{AD}_t = 0.287$, $\text{AD}_S = 0.0457$). The usual caveat applies, in that, statistically speaking, it is not clear to what extent these differences are significant. However, given virtually identical log-likelihood values, but AD statistics which are several times smaller for the $S_{\alpha,\beta}$ distribution, one might safely conclude that, particularly in the tails of the conditional distribution, the $S_{\alpha,\beta}$ model offers a distinct advantage, irrespective of its desirable theoretical properties which are not shared by the Student's t distribution.

Fig. 1. Comparison of the variance adjusted differences between the sample and fitted distribution functions.

For each currency and both distributional assumptions, Figure 1 plots the values

$$\text{AD}_t = \frac{|F_s(\hat\epsilon_{t:T}) - \hat F(\hat\epsilon_{t:T})|}{\sqrt{\hat F(\hat\epsilon_{t:T})\,(1 - \hat F(\hat\epsilon_{t:T}))}}, \qquad t = 1,\ldots,T,$$

where $T$ is the sample size and $\hat\epsilon_{t:T}$ denotes the sorted GARCH-filtered residuals. In most cases, most notably for the Student's t GARCH model of the German, Japanese and Swiss currency returns, the maximum absolute value of the $\text{AD}_t$ occurs in the (left) tail of the distribution.
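The goodness-of-fit criteria (18)-(20) can be sketched as follows. The function names are hypothetical. The AD function evaluates the weighted discrepancy only at the sorted residuals, as in the pointwise values plotted in Figure 1, and the standard normal cdf is used purely as a stand-in for the fitted cdf. The parameter count k = 7 used in the usage note is an assumption, chosen because it reproduces the tabulated British pound values; the chapter does not state k explicitly.

```python
import math


def aicc(loglik, k, T):
    """Corrected AIC as in (18)."""
    return -2.0 * loglik + 2.0 * T * (k + 1) / (T - k - 2)


def sbc(loglik, k, T):
    """SBC in the form printed in (19), with the penalty divided by T."""
    return -2.0 * loglik + k * math.log(T) / T


def ad_statistic(residuals, cdf):
    """Largest variance-adjusted gap between the empirical and fitted cdf,
    evaluated at the sorted residuals."""
    ordered = sorted(residuals)
    T = len(ordered)
    worst = 0.0
    for i, e in enumerate(ordered, start=1):
        F = cdf(e)
        if 0.0 < F < 1.0:  # skip points where the weight is undefined
            worst = max(worst, abs(i / T - F) / math.sqrt(F * (1.0 - F)))
    return worst


def std_normal_cdf(x):
    """Standard normal cdf, used here only as an example fitted cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

With a log-likelihood of -3842.0, T = 3681 and the assumed k = 7, `aicc` and `sbc` give values that agree with the 7700.0 and 7684.0 reported for the British pound stable GARCH model to the precision shown in Table 3.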
Turning now to the IGARCH-restricted fits, it is clear that the log-likelihood values must necessarily decrease, since none of the unrestricted GARCH models precisely satisfied the IGARCH restrictions. However, for the $S_{\alpha,\beta}$ model of the Canadian dollar ($L = -159.97$) and Japanese yen ($L = -3178.8$), the log-likelihoods are very close to their unrestricted counterparts. This was expected, as the IGARCH condition for the unrestricted models of these two currencies was nearly met. Somewhat surprising, however, is the small decrease in AD values for the $S_{\alpha,\beta}$ model of the Canadian dollar ($\text{AD}_S = 0.0529$) and Japanese yen ($\text{AD}_S = 0.0394$). Particularly for the latter two currencies, stable IGARCH models appear to describe the daily returns quite plausibly.

4. Prediction of densities and downside risk

Decisions on financial investments are typically based on the expected return and the expected risk of the assets under consideration. Rather than adhering to the conventional mean-variance criterion, recent risk management concepts for financial institutions focus on the downside risk or the value-at-risk of a financial position due to market movements. In this context, a typical question would be: what is the probability that the value of a financial position will drop by 50% or more over the next period, i.e., $\Pr(r_{t+1} < -0.50)$? Alternatively, one may ask what is the threshold or value-at-risk, $-z(\gamma)$, under which a position will not fall with a probability of $100(1 - \gamma)\%$; i.e., find $-z(\gamma)$ such that $\Pr(r_{t+1} < -z(\gamma)) = \gamma$.

Under unconditional normality, it would be sufficient to simply predict the conditional mean and variance to answer such questions. However, for GARCH processes driven by nonnormal, asymmetric and, possibly, infinite-variance innovations, the predictive conditional density

$$\hat f_{t+1|t}(r_{t+1}) = f\!\left(\frac{r_{t+1} - \mu(\hat\theta_t)}{c_{t+1}(\hat\theta_t)} \,\Big|\, r_t, r_{t-1}, \ldots\right) \qquad (21)$$

needs to be computed.
In (21), $\hat\theta_t$ refers to the estimated parameter vector using the sample information up to and including period $t$; and $c_{t+1}(\cdot)$ is obtained from the conditional-scale recursion (2) using $\hat\theta_t$.³ Multistep density predictions,

$$\hat f_{t+n|t}(r_{t+n}) = f\!\left(\frac{r_{t+n} - \mu(\hat\theta_t)}{c_{t+n}(\hat\theta_t)} \,\Big|\, r_t, r_{t-1}, \ldots\right), \qquad (22)$$

are obtained by recursive application of (2) with unobserved quantities being replaced by their conditional expectations.

For each of the five currencies under consideration, we evaluate $\hat f_{t+1|t}(r_{t+1})$, $t = 2000,\ldots,T-1$, for the $S_{\alpha,\beta}^{\delta}$GARCH(1,1) and $t_{\nu}^{\delta}$GARCH(1,1) models, as well as the conventional GARCH(1,1) model with normal innovations.⁴ We re-estimate (via ML estimation) the model parameters at each step, as would typically be done in actual applications.

The overall density forecasting performance of competing models can be compared by evaluating their conditional densities at the future observed value $r_{t+1}$, i.e., $\hat f_{t+1|t}(r_{t+1})$. A model will fare well in such a comparison if realization $r_{t+1}$ is near the mode of $\hat f_{t+1|t}(\cdot)$ and if the mode of the conditional density is more peaked. The conditional densities are determined not only by the specification of the mean and GARCH equations, but also by the distributional choice for the innovations. Table 4 presents the means, standard deviations and medians of the density values $\hat f_{t+1|t}(r_{t+1})$, $t = 2000,\ldots,T-1$, for each currency.
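The forecasting step can be sketched for a (1,1) model as follows. Two points are assumptions rather than statements from the chapter: first, that "replacing unobserved quantities by their conditional expectations" means substituting the expected absolute power moment of the innovation times the forecast scale (raised to the same power) for each future absolute deviation; second, that the predictive density of the return carries the usual Jacobian factor one over the scale, as in the likelihood (10). All function names are hypothetical, and the normal density is only an example innovation density.

```python
import math


def forecast_scales(c_t, abs_dev_t, theta0, theta1, phi1, delta, lam, horizon):
    """Scale forecasts for periods t+1, ..., t+horizon in a (1,1) model.

    Step 1 uses the observed absolute deviation |r_t - mu|. Later steps
    replace the unobserved |r - mu|^delta by lam * c^delta, where lam is
    the expected absolute delta-th moment of the innovation (an assumed
    reading of the text).
    """
    c_pow = theta0 + theta1 * abs_dev_t ** delta + phi1 * c_t ** delta
    out = [c_pow ** (1.0 / delta)]
    for _ in range(horizon - 1):
        c_pow = theta0 + (lam * theta1 + phi1) * c_pow
        out.append(c_pow ** (1.0 / delta))
    return out


def predictive_density(r_next, mu, c_next, density):
    """Location-scale predictive density evaluated at r_next,
    including the 1/c_next Jacobian factor."""
    return density((r_next - mu) / c_next) / c_next


def std_normal_pdf(z):
    """Standard normal density, used only as an example innovation density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
```

Evaluating `predictive_density` at each realized return and averaging over the forecast period gives the kind of predictive likelihood summary discussed next.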
Based on the means, values corresponding to the $S_{\alpha,\beta}$ and Student's t assumptions are extremely close, with the Student's t values nevertheless larger in each case. Based on the medians, however, the stable Paretian model is (slightly) favored by the British, Canadian and German currencies. Notice that this is contrary to the model selection based on the goodness of fit measures; both AICC and AD statistics favored use of stable Paretian innovations for the Japanese yen and Student's t innovations for the British pound.

Table 4
Comparison of overall predictive performance (a)

                              British   Canadian   German   Japanese   Swiss
Mean                Normal    0.4198    1.1248     0.4064   0.4796     0.3713
                    t         0.4429    1.1871     0.4258   0.5207     0.3851
                    S_α,β     0.4380    1.1798     0.4213   0.5173     0.3820
Standard deviation  Normal    0.1934    0.5697     0.1888   0.1988     0.1620
                    t         0.2325    0.6802     0.2151   0.2782     0.1840
                    S_α,β     0.2189    0.6482     0.2016   0.2662     0.1771
Median              Normal    0.4291    1.0824     0.4178   0.5172     0.3942
                    t         0.4483    1.1500     0.4452   0.5261     0.4069
                    S_α,β     0.4493    1.1730     0.4477   0.5242     0.4041

(a) The entries represent average predictive likelihood values, $\sum_{t=2000}^{T-1} \hat f_{t+1|t}(r_{t+1})$.

³ A conditionally varying location parameter, $\mu_t$, would be handled analogously.
⁴ Since the sample sizes, $T$, of the five currencies vary, the number of forecasts ranges from 1,621 to 1,682.

Next, we examine how well the models predict the downside risk. Consider the value-at-risk implied by a particular model, $M$, namely

$$\Pr\!\left(r_{t+1} \le -z_{t+1}^{M}(\gamma)\right) = \gamma, \qquad t = 2000,\ldots,T-1. \qquad (23)$$

For a correctly specified model we expect $100\gamma\%$ of the observed $r_{t+1}$-values to be less than or equal to the implied threshold-values $-z_{t+1}(\gamma)$. If the observed frequency

$$\hat\gamma_M := \frac{1}{T - 2000} \sum_{t=2000}^{T-1} I_{(-\infty,\,-z_{t+1}^{M}(\gamma)]}(r_{t+1})$$

is less (higher) than $\gamma$, then model $M$ tends to overestimate (underestimate) the risk of the currency position; i.e., the implied absolute $z_{t+1}^{M}(\gamma)$-values tend to be too large (small).
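The observed frequency defined above is simply the share of out-of-sample returns that fall at or below the model-implied thresholds. A minimal sketch, with the hypothetical name `hit_frequency` and with the thresholds passed in as the (negative) return levels implied by the model for each forecast period:

```python
def hit_frequency(returns, thresholds):
    """Share of returns at or below their model-implied thresholds.

    returns[i] is the realized return and thresholds[i] the corresponding
    one-step-ahead threshold (a negative number for a downside level).
    """
    if len(returns) != len(thresholds):
        raise ValueError("returns and thresholds must have equal length")
    hits = sum(1 for r, q in zip(returns, thresholds) if r <= q)
    return hits / len(returns)
```

Comparing the result with the nominal shortfall probability indicates whether the model tends to overstate or understate downside risk.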
The predictive performance for assessing the downside risk achieved by the normal, Student's t and stable Paretian models are compared in Table 5 for the shortfall probabilities $\gamma = 0.01, 0.025, 0.05, 0.10$.

Table 5
Comparison of predictive performance for downside risk (a)

100γ    Model     British   German    Canadian   Japanese   Swiss
1.0     Normal    1.9036    1.5051    1.3674     1.9124     1.4899
        t         1.3682    0.9031    0.7134     1.4189     1.3707
        S_α,β     1.3682    0.9031    1.3080     1.3572     1.2515
2.5     Normal    3.0339    2.6490    2.3187     2.8994     3.2777
        t         2.8554    2.9500    2.1403     3.2079     3.3969
        S_α,β     2.9149    2.9500    2.4970     2.5910     3.1585
5.0     Normal    4.7591    4.5756    3.6266     4.9969     4.7676
        t         5.1160    5.2378    3.9834     5.7372     5.0656
        S_α,β     5.1160    5.2378    5.0535     5.2437     5.0656
10.0    Normal    8.3879    9.2113    8.5612     8.0814     8.9392
        t         9.8751    10.6562   9.9287     10.3023    10.8462
        S_α,β     9.6966    10.4154   10.2259    9.8088     10.2503

(a) The entries show the observed frequencies $\hat\gamma_M = (T - 2000)^{-1} \sum_{t=2000}^{T-1} I_{(-\infty,\,-z_{t+1}^{M}(\gamma)]}(r_{t+1})$ multiplied by 100. For a correctly specified model, we expect $\hat\gamma_M \approx \gamma$.

A comparison of the stable Paretian and Student's t GARCH models over the five currencies and four cutoff values, $\gamma$, shows that, in 4 out of the 20 cases, the Student's t GARCH model outperforms that of the stable Paretian, while the latter is more accurate in 11 cases, sometimes considerably so (as for the Canadian dollar with $\gamma = 0.025$ and $0.05$). The remaining 5 cases are tied.
Table 6 presents summary measures⁵ for the predictive performance of the three models across all five currencies in the form of the mean error

$$\text{ME}_M(\gamma) = \frac{1}{5} \sum_{i=1}^{5} 100\,(\hat\gamma_{M,i} - \gamma),$$

mean absolute error

$$\text{MAE}_M(\gamma) = \frac{1}{5} \sum_{i=1}^{5} 100\,|\hat\gamma_{M,i} - \gamma|$$

and the mean squared error

$$\text{MSE}_M(\gamma) = \frac{1}{5} \sum_{i=1}^{5} 100^2\,(\hat\gamma_{M,i} - \gamma)^2.$$

Table 6
Summary measures for the predictive performance (a)

100γ        Model     ME(γ)      MAE(γ)    MSE(γ)
1.0         Normal    0.6357     0.6357    0.4558
            t         0.1549     0.3083    0.1080
            S_α,β     0.2376     0.2764    0.0861
2.5         Normal    0.3357     0.4083    0.2209
            t         0.4101     0.5540    0.3527
            S_α,β     0.3223     0.3235    0.1633
5.0         Normal    -0.4548    0.4548    0.4357
            t         0.0280     0.4346    0.3302
            S_α,β     0.1433     0.1433    0.0273
10.0        Normal    -1.3638    1.3638    2.0195
            t         0.3217     0.4002    0.2517
            S_α,β     0.0794     0.2772    0.0830
Aggregate   Normal    -0.2118    0.7156    0.7830
            t         0.2287     0.4243    0.2607
            S_α,β     0.1956     0.2551    0.0899

(a) Shown are the mean error (ME), mean absolute error (MAE) and mean squared error (MSE) of the observed extreme-tail frequencies from Table 5 across the five currencies. The bottom panel is the aggregate over all $\gamma$-values considered.

⁵ The measures are evaluated for $100\hat\gamma$ rather than $\hat\gamma$ because the resulting scales of the reported values enhance readability.

The ME's for the normal show that it underestimates the probability of extreme downturns ($\text{ME}_{\text{Normal}}(\gamma) > 0$ for $\gamma = 0.01, 0.025$) and overestimates the probability of moderate downturns ($\text{ME}_{\text{Normal}}(\gamma) < 0$ for $\gamma = 0.05, 0.10$). With one exception, the ME's of the stable Paretian and Student's t GARCH models are smaller (in absolute terms) than those for the normal. However, they are always positive, indicating, on average, slight underprediction of the downturn probabilities. For $\gamma = 0.01$ and $\gamma = 0.05$, the Student's t model has smaller ME than the stable Paretian model. This is due to the Student's t model's offsetting prediction error for the Canadian dollar for these $\gamma$-values. While the ME's indicate possible systematic prediction bias, the MAEs and MSEs reflect the size of the prediction error.
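The three summary measures can be reproduced directly from the observed frequencies in Table 5. A minimal sketch, with the hypothetical name `summary_errors`, taking the frequencies and the nominal level both in percent:

```python
def summary_errors(observed_pct, nominal_pct):
    """Mean error, mean absolute error and mean squared error of observed
    tail frequencies (in percent) around the nominal level (in percent)."""
    errors = [x - nominal_pct for x in observed_pct]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return me, mae, mse
```

```python
# Normal model at the 1% level, five currencies, values from Table 5
me, mae, mse = summary_errors([1.9036, 1.5051, 1.3674, 1.9124, 1.4899], 1.0)
print(me, mae, mse)
```

Up to rounding, this gives the 0.6357, 0.6357 and 0.4558 reported for the normal model in the first panel of Table 6.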
With respect to both measures, the stable Paretian model dominates those of both the normal and the Student's t for all $\gamma$-values considered. This is also evident from the bottom panel of Table 6, which aggregates the summary measures over all $\gamma$-values considered. In the aggregate, the model using the stable Paretian innovation assumption outperforms those using the normal and Student's t in terms of all three summary measures.

5. Conclusions

Power GARCH processes driven by either stable Paretian or Student's t innovations have been evaluated and compared in the context of predicting downside market risk, an activity which is particularly important for risk managers of financial institutions. For all five exchange-rate series considered, the asymmetric stable Paretian distributional assumption was found to be superior.

While there exist several popular model classes designed to parsimoniously and effectively fit financial return data, the GARCH class of models is arguably the most common. Furthermore, the usual assumption, and that which is implemented in popular software packages, is that the driving innovations are either normally or Student's t distributed. The former is the "standard" assumption in financial and even most econometric or statistical models, but fails demonstrably in empirical applications [see, e.g., Palm (1997), Gouriéroux (1997), and the references therein]. Indeed, normality is a special, limiting case of the stable Paretian distribution, which, otherwise, allows for fatter-than-normal tails and skewness, these being precisely two of the typical "stylized facts" associated with financial returns data. The Student's t assumption does allow for fatter tails, but is restricted to being symmetric.
The latter restraint can actually be overcome if more general Student's t-like distributions are used (Paolella, 1999; Mittnik and Paolella, 2000), but these suggestions, while often providing admirable in- and out-of-sample fits, do not possess the theoretical property of summability, common only to the stable Paretian (and, thus, normal) class of distributions.

With respect to the summability property, one might argue that the value of stable Paretian models is, as shown here, their improved forecasting ability as compared to competing models, with such "theoretical niceties" as summability being largely irrelevant. In a larger context, however, the summability property can often be judiciously used when building more complex financial models such as those used in portfolio analysis. In such models, the ad hoc nature of, say, the Student's t distribution can become quite problematic. Further discussion along these lines and a test for the summability property in the context of GARCH models has been proposed in Paolella (2001) and further applied in Mittnik, Paolella and Rachev (2000).

References

Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Petrov, B., Csaki, F. (Eds.), 2nd International Symposium on Information Theory. Akademiai Kiado, Budapest, pp. 267-281.
Anderson, T., Darling, D., 1952. Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. The Annals of Mathematical Statistics 23, 193-212.
Brockwell, P., Davis, R., 1991. Time Series: Theory and Methods. Springer-Verlag.
DuMouchel, W., 1973. On the asymptotic normality of the maximum-likelihood estimate when sampling from a stable distribution. The Annals of Statistics 1, 948-957.
Engle, R.F., Bollerslev, T., 1986. Modelling the persistence of conditional variances. Econometric Reviews 5, 1-50; 81-87.
Fama, E., 1965. The behavior of stock market prices.
Journal of Business 38, 34-105.
Gouriéroux, C., 1997. ARCH Models and Financial Applications. Springer, New York.
Hurvich, C., Tsai, C., 1989. Regression and time series model selection in small samples. Biometrika 76, 297-307.
Liu, S., Brorsen, B.W., 1995. Maximum likelihood estimation of a GARCH-stable model. Journal of Applied Econometrics 10, 273-285.
Mandelbrot, B., 1963. The variation of certain speculative prices. Journal of Business 36, 394-419.
McCulloch, J.H., 1985. Interest-risk sensitive deposit insurance premia: stable ACH estimates. Journal of Banking and Finance 9, 137-156.
McCulloch, J.H., 1997. Financial applications of stable distributions. In: Maddala, G., Rao, C. (Eds.), Handbook of Statistics, Vol. 14. Elsevier, Amsterdam.
Mittnik, S., Paolella, M.S., 2000. Conditional density and value-at-risk prediction of Asian currency exchange rates. Journal of Forecasting 19, 313-333.
Mittnik, S., Paolella, M.S., Rachev, S.T., 2000. Diagnosing and treating the fat tails in financial returns data. Journal of Empirical Finance 7, 389-416.
Mittnik, S., Paolella, M.S., Rachev, S.T., 2002. Stationarity of stable power-GARCH processes. Journal of Econometrics 106, 97-107.
Mittnik, S., Rachev, S., 1993. Modeling asset returns with alternative stable models. Econometric Reviews 12, 261-330.
Mittnik, S., Rachev, S.T., Doganoglu, T., Chenyao, D., 1999. Maximum likelihood estimation of stable Paretian models. Mathematical and Computer Modelling 29, 275-293.
Mittnik, S., Rachev, S.T., Paolella, M.S., 1998. Stable Paretian modeling in finance. In: Adler, R.J., Feldman, R.E., Taqqu, M.S. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques for Analyzing Heavy Tailed Distributions. Birkhäuser, Boston.
Nelson, D., 1990. Stationarity and persistence in the GARCH(1,1) model. Econometric Theory 6, 318-344.
Nelson, D.B., Cao, C.Q., 1992. Inequality constraints in the univariate GARCH model.
Journal of Business and Economic Statistics 10 (2), 229-235.
Palm, F.C., 1997. GARCH models of volatility. In: Maddala, G., Rao, C. (Eds.), Handbook of Statistics, Vol. 14. Elsevier, Amsterdam.
Panorska, A.K., Mittnik, S., Rachev, S.T., 1995. Stable GARCH models for financial time series. Applied Mathematics Letters 8 (5), 33-37.
Paolella, M., 1999. Tail estimation and conditional modeling of heteroscedastic time-series. Ph.D. Thesis. Institute of Statistics and Econometrics, Christian Albrechts University at Kiel. Pro Business, Berlin.
Paolella, M.S., 2001. Testing the stable Paretian assumption. Mathematical and Computer Modelling 34, 1095-1112.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1991. Numerical Recipes in Fortran: The Art of Scientific Computing, 2nd edition. Cambridge University Press, New York.
Rachev, S.T., Mittnik, S., 2000. Stable Paretian Models in Finance. Wiley, Chichester.
Samorodnitsky, G., Taqqu, M., 1994. Stable Non-Gaussian Random Processes, Stochastic Models with Infinite Variance. Chapman & Hall, London.
Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics 6, 461-464.
Tanaka, K., 1996. Time Series Analysis, Nonstationary and Noninvertible Distribution Theory. Wiley, New York.

Chapter 10

STABLE NON-GAUSSIAN MODELS FOR CREDIT RISK MANAGEMENT

BERNHARD MARTIN
Institute of Statistics and Mathematical Economics, University of Karlsruhe, Germany

SVETLOZAR T. RACHEV
Department of Statistics and Applied Probability, University of California, Santa Barbara, USA
Institute of Statistics and Mathematical Economics, University of Karlsruhe, Germany
e-mail: rachev@lsoe-4.wiwi.uni-karlsruhe.de

EDUARDO S. SCHWARTZ
Anderson School of Management, University of California, Los Angeles, USA

Contents

Abstract
1. Stable modeling in credit risk - recent advances
2. A one-factor model for stable credit returns
2.1. Credit risk evaluation for single assets
2.2.
A stable portfolio model with independent credit returns
2.3. A stable portfolio model with dependent credit returns
3. Comparison of empirical results
3.1. The observed portfolio data
3.2. Generating comparable risk-free bonds from the yield curve
3.3. Fitting the empirical time series for $R_i$, $Y_i$, and $U_i$
3.4. CVaR-results for the independence assumption
3.5. CVaR-results for the dependence assumption
4. The detection and measurement of long-range dependence
4.1. Fractal processes and the Hurst-Exponent
4.2. The Aggregated Variance Method
4.3. Absolute Values of the Aggregated Series
4.4. Classical R/S analysis
4.5. The modified approach by Lo
4.6. The statistic of Mansfield, Rachev and Samorodnitsky (MRS)
4.7. Empirical results for long-range dependence in credit data
5. Conclusion
References

Abstract

Unlike the credit risk models based on the normal assumption, the model in this chapter assumes credit returns to follow a stable distribution. As empirical studies show, the daily returns of a bond and its credit spread obey a stable law, exhibiting peaked, heavy tailed, and skewed distributions. This implies the application of stable Credit Value-at-Risk (CVaR) in order to obtain a more precise measure for a bond's risk compared to normal Credit Value-at-Risk.

Describing the returns of a financial instrument subject to credit risk, our model is based on the one-factor model proposed by Rachev, Schwartz and Khindanova (2000). It separates the risky asset into a default free component subject to interest risk and a component that represents the default risk. For this model, we change the definition of the bonds' returns and derive the risk of individual corporate bonds directly from their historical market prices.
Thus, we avoid constructing yield curves that map the individual credit risk of the observed risky bonds. In an empirical example with a portfolio of two corporate bonds, we compare the stable and normal Credit Value-at-Risk. Furthermore, we analyze the effect of accounting for the dependence among different instruments, compared with the independence assumption. The second part of the chapter analyzes the presence of long-range dependence (LRD) in credit returns using time series of corporate bond indices. For the detection of LRD, we apply the classical R/S analysis of Mandelbrot and Wallis (1968), the statistic of Lo (1991), and the Mansfield–Rachev–Samorodnitsky (MRS) statistic (1999). Our results show that the Hurst-Exponent is greater than 0.5 for all four time series. Under the Gaussian assumption, the LRD hypothesis is significant for two of the four time series. Allowing a tail index $\alpha$ of less than 2, $1 < \alpha < 2$, the MRS statistic also indicates significant LRD for these two series.

1. Stable modeling in credit risk – recent advances

Academics and practitioners¹ have examined the application of stable distributions for modeling asset returns. As is well documented in the literature on empirical finance,² changes in the value of a financial asset are heavy-tailed and peaked, whereas the mass of the commonly used normal distribution is located around its center. Therefore, the normal distribution³ fails to model crashes and strong upturns in financial markets.

Recent research has also examined the returns of instruments subject to credit risk. Those studies⁴ found that credit returns are also peaked and heavy-tailed. Moreover, they turned out to be skewed. Rachev, Schwartz and Khindanova (2000) suggested the application of stable distributions for credit instruments to capture those properties.
As explained above, for stable distributions, the peakedness and the heavy tails are determined by the stability index $\alpha$, whereas the parameter $\beta$ is responsible for skewness or asymmetry.

In the following, we propose a model that describes the returns of individual corporate bonds, assuming that they follow a stable law. We are especially interested in determining the Value-at-Risk (VaR) of such financial instruments subject to credit risk for a given time horizon. VaR is a measure of the riskiness of an asset and determines the economic capital required for holding the asset.⁵ VaR models seek to measure the maximum loss of value on a given asset or liability over a given time period at a given confidence level (e.g., 95%).

The VaR is defined as a threshold on the price change of the instrument over the observed time horizon. The return over time horizon $\tau$ is expected to fall below that threshold with a probability of $1 - c$. That is, with a probability of $1 - c$ the returns are expected to be less than $-\mathrm{VaR}_c$.⁶ The VaR is expressed as

$$P\bigl(\Delta p(\tau) \le -\mathrm{VaR}_c\bigr) = 1 - c, \tag{1}$$

with
* $\Delta p(\tau)$: price change over time horizon $\tau$;
* $c$: confidence level of VaR, e.g., 95%;
* The probability that losses exceed $\mathrm{VaR}_c$ is $1 - c$.

¹ See the work of Mandelbrot (1963), Fama (1965a, b), Fama and Roll (1971).
² For example, see Rachev and Mittnik (2000).
³ The application of the normal distribution for financial returns goes back to the work of Bachelier (1900).
⁴ Federal Reserve System Task Force on Internal Credit Risk Models (1998), Basel Committee on Banking Supervision (1999).
⁵ Saunders (1999).
⁶ The VaR is defined as a positive number.

2. A one-factor model for stable credit returns

In their model for credit returns, Rachev, Schwartz and Khindanova (2000) assumed a linear relationship between the returns of a risky credit instrument and the returns of a comparable risk-free credit instrument.
For such a credit instrument $i$, the returns are described by

$$R_i = a_i + b_i Y_i + U_i, \tag{2}$$

where
* $R_i$ are the log-returns of an asset $i$ that is subject to credit risk;
* $Y_i$ are the log-returns of a risk-free asset;
* $U_i$ is the disturbance. It represents the spread or the premium for the credit risk;
* $a_i$ and $b_i$ are constants which are obtained by ordinary least squares (OLS) estimation.

In this linear model, the returns of both the risky ($R_i$) and the risk-free ($Y_i$) credit instrument are assumed to follow a strictly stable law. Moreover, the disturbance term ($U_i$) is a strictly stable random variable:
* $U_i \sim S_{\alpha}(\sigma, \beta, \mu)$, $1 < \alpha < 2$;
* $Y_i \sim S_{\alpha'}(\sigma', \beta', \mu')$, $1 < \alpha' < 2$.

For credit instruments, the log-return $R_{i,t}$ at time $t$ is defined as

$$R_{i,t} = \log\frac{P_{i,t,T}}{P_{i,t-1,T-1}}, \tag{3}$$

where $P_{i,t,T}$ is the price of an instrument $i$ subject to credit risk with maturity date $T$ evaluated at time $t$. The log-returns of the riskless asset $Y_{i,t}$ are determined by

$$Y_{i,t} = \log\frac{B_{i,t,T}}{B_{i,t-1,T-1}}, \tag{4}$$

with $B_{i,t,T}$ as the price of the risk-free asset with maturity date $T$ evaluated at time $t$. This means that all prices used for the calculation of the returns are determined on the basis of constant time-to-maturity. Therefore, the time series of log-returns (both $Y_{i,t}$ and $R_{i,t}$) is calculated such that the time-to-maturity is the same for all $t$.

It must be noted that $Y_{i,t}$ and $R_{i,t}$ are not directly observable for individual bonds whose market price movements are recorded on a daily basis. The prices $B_{i,t,T}, B_{i,t-1,T-1}, B_{i,t-2,T-2}, \ldots$ are calculated from the yield curve of riskless treasury bonds, and $P_{i,t,T}, P_{i,t-1,T-1}, P_{i,t-2,T-2}, \ldots$ are derived from a yield curve generated from risky bonds representing a similar level of credit risk (e.g., having equal credit ratings). Such an approach enables us to deal with constant time-to-maturity. This is crucial, because for the prices of individual bonds, time-to-maturity decreases with increasing time $t$.
However, a decreasing time-to-maturity does have an effect on the credit returns. Thus, the advantage of the approach in (3) and (4) is that we do not have to account for the influence of a changing time-to-maturity.

The effect of changing time-to-maturity on credit returns can be demonstrated by a small example with two riskless zero-bonds: the first one has a time-to-maturity of one year, the other one has a time-to-maturity of two years. Furthermore, the term structure is assumed to be flat, and therefore, both securities have equal yields. If the yield of both increases by the same percentage, then the price of the two-year bond reacts more sensitively than the price of the one-year bond.

However, the approach of modelling the returns as in (3) and (4) is very difficult to implement in practice. Historical data of daily yield curves is available for treasury bonds, but it is practically impossible to observe a time series $P_{t,T}, P_{t-1,T-1}, P_{t-2,T-2}, \ldots$ for an individual bond. We would have to define a number of different credit risk categories and assign individual bonds with different maturities to those.⁷ We would use the prices of the bonds assigned to the same risk category in order to generate the corresponding yield curve.⁸

In order to avoid such difficulties, we look for a more practical way to define the credit returns. Obviously, a risk manager would prefer to fit a model to the observed real prices of a bond, rather than to prices derived from yield curves that have to be generated first. Moreover, each yield curve only represents an average credit risk level. Our approach, proposed in the following paragraph, determines the individual credit risk of the analyzed bond.

A new approach to define the returns $R_i$ and $Y_i$. From the historical yield curve data of treasury bonds, we can construct daily prices for any riskless bond with given coupon, coupon dates, and maturity.
Thus, we can generate a corresponding riskless bond $i$ with identical specifications for each risky corporate bond $i$. We define the return $R_{i,t}$ of a risky corporate bond as its actual (observable) daily price movement:

$$R_{i,t} = \log\frac{P_{i,t,T}}{P_{i,t-1,T}}. \tag{5}$$

Here, time-to-maturity is no longer kept fixed. The return $R_{i,t}$ is that of an individual bond $i$ with fixed maturity date $T$. The riskless returns $Y_{i,t}$ are defined the same way:

$$Y_{i,t} = \log\frac{B_{i,t,T}}{B_{i,t-1,T}}. \tag{6}$$

This riskless bond $i$ has the same specifications (maturity, coupon, coupon dates) as the risky bond $i$. With this new approach, the original linear risk-return relation $R_i = a_i + b_i Y_i + U_i$ remains, but its components $R_i$, $Y_i$, and $U_i$ now have a different meaning. $R_i$ and $Y_i$ are individual bond returns, and the disturbance $U_i$ incorporates both the credit spread and the risk of time-to-maturity.

⁷ For example, one can use the rating grades assigned by Standard & Poor's or Moody's to define a risk category.
⁸ For example, see McCulloch (1971, 1975).

For all empirical examinations in this chapter, we used the model with the returns defined in (5) and (6). In the following, we present a brief summary of the advantages and disadvantages of both approaches.

The model whose returns are defined by Equations (3) and (4) avoids the problem of changing time-to-maturity. This is its main advantage. The disadvantage of such an approach is that yield curves have to be modelled for a number of different risk levels (e.g., corporate credit ratings) and for the risk-free (treasury) bonds. After fitting the parameters $a$ and $b$ of Equation (2), we can simulate future scenarios for each yield curve by integrating a model for the riskless returns. Such a framework would enable us to simulate future daily returns for each time-to-maturity. With the simulated yield curves, we would then be able to calculate the future returns of individual bonds.
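The return definitions (5) and (6), together with the linear relation (2), can be illustrated by a minimal sketch. The price series below are hypothetical numbers chosen only for illustration (they are not the chapter's data), and the function names `log_returns`, `ols_fit`, and `residuals` are assumptions of this sketch. The regression uses the standard closed-form OLS solution for a single regressor.

```python
import math

def log_returns(prices):
    """Daily log-returns log(P_t / P_{t-1}) of one bond's observed prices, as in (5) and (6)."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def ols_fit(y, r):
    """Closed-form OLS estimates (a_hat, b_hat) for r_t = a + b * y_t + u_t."""
    n = len(y)
    sum_y, sum_r = sum(y), sum(r)
    sum_yy = sum(v * v for v in y)
    sum_ry = sum(rv * yv for rv, yv in zip(r, y))
    denom = n * sum_yy - sum_y ** 2
    a_hat = (sum_yy * sum_r - sum_y * sum_ry) / denom
    b_hat = (n * sum_ry - sum_y * sum_r) / denom
    return a_hat, b_hat

def residuals(y, r, a_hat, b_hat):
    """Disturbances u_t = r_t - a_hat - b_hat * y_t."""
    return [rv - a_hat - b_hat * yv for yv, rv in zip(y, r)]

# Hypothetical daily prices of a risky bond and its matching riskless bond.
risky_prices = [101.20, 101.05, 101.40, 100.90, 101.10, 101.35]
riskless_prices = [104.00, 103.90, 104.20, 103.85, 104.00, 104.15]

R = log_returns(risky_prices)
Y = log_returns(riskless_prices)
a_hat, b_hat = ols_fit(Y, R)
U = residuals(Y, R, a_hat, b_hat)
```

Because the regression includes an intercept, the residuals `U` sum to zero and are uncorrelated with `Y` in sample. In the chapter's setting, the resulting `U` and `Y` series would then be passed to a stable fit.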
With our model defined by the returns in (5) and (6), we neither need to construct yield curves for a number of risk levels (of risky bonds), nor do we have to simulate future representations of those yield curves by applying a complex term structure model. Thus, we can directly simulate future returns of individual bonds by generating representations of $Y_i$ and $U_i$ according to their fitted distributions.

The advantage of the chosen approach is that we can work with the actual historical price data and spread information of the individual bonds, instead of generating yield curves, each for a certain risk grade. Such yield curves only represent the average of a risk grade. Studies found that in some cases a higher rated bond can even have a larger credit spread than bonds with a lower rating grade. This is because the range of credit spreads within a given rating grade can be relatively wide and the spread ranges of neighboring grades usually overlap. A reason for this effect could be that the market values the creditworthiness of an issuer differently than the rating agencies do. Sometimes the market can anticipate a change in the credit quality of an issuer before the rating agencies react.

The construction of a yield curve for a given credit grade usually requires data from a large number of bonds of various issuers. A yield curve for a single issuer can be calculated only for large corporations with many issued bonds.

2.1. Credit risk evaluation for single assets

In order to obtain the Credit Value-at-Risk (CVaR) for a bond $i$ over a time horizon of one period, we perform the following steps:
* We create a corresponding risk-free treasury bond with equal maturity, coupon, and coupon dates.
* The estimates for $a_i$ and $b_i$ are calculated by OLS estimation.
As in Rachev, Khindanova and Schwartz, the estimates are given by

$$\hat a_i = \frac{\sum_{t=1}^{T} Y_{it}^2 \sum_{t=1}^{T} R_{it} - \sum_{t=1}^{T} Y_{it} \sum_{t=1}^{T} R_{it} Y_{it}}{T \sum_{t=1}^{T} Y_{it}^2 - \bigl(\sum_{t=1}^{T} Y_{it}\bigr)^2}, \tag{7}$$

$$\hat b_i = \frac{T \sum_{t=1}^{T} R_{it} Y_{it} - \sum_{t=1}^{T} Y_{it} \sum_{t=1}^{T} R_{it}}{T \sum_{t=1}^{T} Y_{it}^2 - \bigl(\sum_{t=1}^{T} Y_{it}\bigr)^2}, \tag{8}$$

where $i = 1,\ldots,N$; $t = 1,\ldots,T$. With the estimates $\hat a_i$ and $\hat b_i$, we obtain the residuals $U_i$,

$$U_i = R_i - \hat a_i - \hat b_i Y_i. \tag{9}$$

* Finally, we perform a stable fit for $U_i$ and $Y_i$.
* In order to calculate the CVaR of asset $i$ for one period, we simulate 1000 representations of $R_i = a_i + b_i Y_i + U_i$.

2.2. A stable portfolio model with independent credit returns

Suppose there are $n$ different credit instruments $i$ (bonds) in a portfolio, and let $v_i$ be the weight of security $i$ within the portfolio.⁹ The return of the portfolio is given by

$$R_p = \sum_{i=1}^{n} v_i R_i, \tag{10}$$

with

$$R_p = \sum_{i=1}^{n} v_i (a_i + b_i Y_i + U_i) = \sum_{i=1}^{n} v_i a_i + \sum_{i=1}^{n} v_i b_i Y_i + \sum_{i=1}^{n} v_i U_i, \tag{11}$$

and

$$\sum_{i=1}^{n} v_i = 1. \tag{12}$$

$R_p$ can be expressed by

$$R_p = \sum_{i=1}^{n} v_i a_i + Y_p + U_p, \tag{13}$$

with $Y_p$ and $U_p$ given by

$$Y_p = \sum_{i=1}^{n} v_i b_i Y_i \tag{14}$$

and

$$U_p = \sum_{i=1}^{n} v_i U_i. \tag{15}$$

The constant $a_p$ of the total portfolio is

$$a_p = \sum_{i=1}^{n} v_i a_i. \tag{16}$$

⁹ $v_i$ can also be negative in case short-selling is allowed.

As we assume the $R_i$ to be driven by independent $\alpha$-stable distributions, this also means that both the $U_i$ and the $Y_i$, $i = 1,\ldots,n$, are independent of each other. We further assume that both the $U_i$ and the $Y_i$, $i = 1,\ldots,n$, are characterized by a common index of stability ($\bar\alpha$ for the $U_i$, $\bar\alpha'$ for the $Y_i$). A common stability index allows an easy analytical solution for the parameters of the distributions for $U_p$ and $Y_p$. For the properties of stable distributions, see Samorodnitsky and Taqqu (1994).

The common index of stability is calculated as an average of the stability indices of the distributions of the individual $U_i$ and $Y_i$, weighted according to formula (10):

$$\bar\alpha = \frac{\sum_{i=1}^{n} |v_i|\,\alpha_i}{\sum_{i=1}^{n} |v_i|} \tag{17}$$

and

$$\bar\alpha' = \frac{\sum_{i=1}^{n} |v_i|\,\alpha'_i}{\sum_{i=1}^{n} |v_i|}. \tag{18}$$

With the common stability index, the parameters $\sigma$, $\beta$, $\mu$ have to be re-estimated for the individual $U_i$ and $Y_i$ first. The assumption of independent returns gives us an analytical solution for the portfolio's $U_p$ and $Y_p$. The parameters of $U_p$ and $Y_p$ are then determined by the following expressions:

$$\sigma_{U_p} = \Bigl(\sum_{i=1}^{n} \bigl(|v_i|\,\sigma_{U_i}\bigr)^{\bar\alpha}\Bigr)^{1/\bar\alpha}, \tag{19}$$

$$\beta_{U_p} = \frac{\sum_{i=1}^{n} \bigl[\operatorname{sign}(v_i)\,\beta_{U_i}\,\bigl(|v_i|\,\sigma_{U_i}\bigr)^{\bar\alpha}\bigr]}{\sum_{i=1}^{n} \bigl(|v_i|\,\sigma_{U_i}\bigr)^{\bar\alpha}}, \tag{20}$$

$$\mu_{U_p} = \sum_{i=1}^{n} v_i\,\mu_{U_i}, \tag{21}$$

$$\sigma_{Y_p} = \Bigl(\sum_{i=1}^{n} \bigl(|v_i \hat b_i|\,\sigma_{Y_i}\bigr)^{\bar\alpha'}\Bigr)^{1/\bar\alpha'}, \tag{22}$$

$$\beta_{Y_p} = \frac{\sum_{i=1}^{n} \bigl[\operatorname{sign}(v_i \hat b_i)\,\beta_{Y_i}\,\bigl(|v_i \hat b_i|\,\sigma_{Y_i}\bigr)^{\bar\alpha'}\bigr]}{\sum_{i=1}^{n} \bigl(|v_i \hat b_i|\,\sigma_{Y_i}\bigr)^{\bar\alpha'}}, \tag{23}$$

$$\mu_{Y_p} = \sum_{i=1}^{n} v_i\,\mu_{Y_i}. \tag{24}$$

The portfolio's returns $R_p$ are given by (13).

2.3. A stable portfolio model with dependent credit returns

This section introduces a solution for modelling the dependence between credit returns on the one hand, and integrating the skewness property of their distributions on the other hand. Each variable $U_i$ and $Y_i$ is split into a dependent symmetric component and an independent skewed component. Both components are independent of each other:

$$U_i = U_i^{(1)} + U_i^{(2)}, \tag{25}$$

$$Y_i = Y_i^{(1)} + Y_i^{(2)}. \tag{26}$$

Using $U_i$ as an example, we show the derivation of the parameters for the two independent components. Both components are defined to have identical stability indices:

$$U_i^{(1)} \sim S_{\alpha}(\sigma_1, \beta_1, 0), \tag{27}$$

$$U_i^{(2)} \sim S_{\alpha}(\sigma_2, \beta_2, 0). \tag{28}$$

Because of the independence of $U_i^{(1)}$ and $U_i^{(2)}$, the parameter values of $U_i$ are calculated as follows:

$$\sigma = \bigl(\sigma_1^{\alpha} + \sigma_2^{\alpha}\bigr)^{1/\alpha}, \tag{29}$$

$$\beta = \frac{\beta_1 \sigma_1^{\alpha} + \beta_2 \sigma_2^{\alpha}}{\sigma_1^{\alpha} + \sigma_2^{\alpha}}. \tag{30}$$

$U_i^{(1)}$ is symmetric, therefore $\beta_1 = 0$. We also set equal values for the scale parameters, $\sigma_1 = \sigma_2 = \sigma^*$. Thus, the parameters of $U_i$ are:

$$\sigma = 2^{1/\alpha}\,\sigma^*, \tag{31}$$

$$\beta = \tfrac{1}{2}\,\beta_2. \tag{32}$$

Summing up the results for the parameters, we have: $\sigma_1 = \sigma_2 = \sigma^* = 2^{-1/\alpha}\sigma$, $\beta_2 = 2\beta$ ($\beta_2$ is for the skewed component $U_i^{(2)}$), and $\beta_1 = 0$ ($\beta_1$ is for the symmetric component $U_i^{(1)}$),

$$U_i^{(1)} \sim S_{\alpha}\bigl(2^{-1/\alpha}\sigma,\, 0,\, 0\bigr), \tag{33}$$

$$U_i^{(2)} \sim S_{\alpha}\bigl(2^{-1/\alpha}\sigma,\, 2\beta,\, 0\bigr). \tag{34}$$

Analogously, $Y_i$ is split into $Y_i^{(1)} + Y_i^{(2)}$, and their parameters are obtained the same way.
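The parameter rules for sums of independent stable variables, used for the portfolio in (19) and (20) and for the two-component split in (29) to (34), can be checked numerically with a short sketch. The function names are assumptions of this sketch. The inputs for the portfolio check are the re-estimated Table 6 values from Section 3.4 (common index 1.10, weights 0.5 each), so the result should land close to the $U_p$ row of Table 8. The guard on $|\beta| \le 1/2$ is an added observation, not a statement from the chapter: it follows because the skewed component receives $\beta_2 = 2\beta$, and a skewness parameter must lie in $[-1, 1]$.

```python
def stable_sum_params(weights, sigmas, betas, alpha):
    """Scale and skewness of sum_i w_i * X_i for independent alpha-stable X_i,
    following the pattern of (19) and (20)."""
    terms = [(abs(w) * s) ** alpha for w, s in zip(weights, sigmas)]
    total = sum(terms)
    sign = lambda w: (w > 0) - (w < 0)
    sigma = total ** (1.0 / alpha)
    beta = sum(sign(w) * b * t for w, b, t in zip(weights, betas, terms)) / total
    return sigma, beta

def split_components(sigma, beta, alpha):
    """Split S_alpha(sigma, beta, 0) into a symmetric and a skewed part, as in (33) and (34)."""
    if abs(beta) > 0.5:
        # beta_2 = 2 * beta must remain a valid skewness parameter in [-1, 1].
        raise ValueError("split requires |beta| <= 0.5")
    sigma_star = 2.0 ** (-1.0 / alpha) * sigma
    return (sigma_star, 0.0), (sigma_star, 2.0 * beta)

# Portfolio disturbance U_p from the Table 6 inputs (sigma and beta of U_1, U_2).
alpha_u = 1.10
sigma_up, beta_up = stable_sum_params(
    [0.5, 0.5], [0.0007, 0.0011], [-0.0047, 0.0828], alpha_u
)

# Round trip: split one variable, then recombine the parts with unit weights.
(sym_sigma, sym_beta), (skew_sigma, skew_beta) = split_components(0.0011, 0.0828, alpha_u)
sigma_back, beta_back = stable_sum_params(
    [1.0, 1.0], [sym_sigma, skew_sigma], [sym_beta, skew_beta], alpha_u
)
```

Recombining the two components with unit weights recovers the original scale and skewness, which is exactly the content of (29) and (30) specialized to $\beta_1 = 0$ and $\sigma_1 = \sigma_2$.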
The return of the credit instrument $i$ is then given by

$$R_{i,t} = a + b\,\bigl(Y_{i,t}^{(1)} + Y_{i,t}^{(2)}\bigr) + U_{i,t}^{(1)} + U_{i,t}^{(2)}. \tag{35}$$

The symmetric components $Y_{i,t}^{(1)}$ and $U_{i,t}^{(1)}$ are used to incorporate the dependence among the $n$ assets. The dependence structure of the S$\alpha$S¹⁰ vectors $(U_1^{(1)}, U_2^{(1)}, \ldots, U_n^{(1)})$ and $(Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_n^{(1)})$ is modelled by representing them as sub-Gaussian vectors. Thus, $(U_1^{(1)}, U_2^{(1)}, \ldots, U_n^{(1)})$ is represented as

$$\bigl(U_1^{(1)}, U_2^{(1)}, \ldots, U_n^{(1)}\bigr) \stackrel{d}{=} \bigl(A^{1/2} G_1, A^{1/2} G_2, \ldots, A^{1/2} G_n\bigr), \tag{36}$$

where $A$ is a totally skewed $\alpha/2$-stable random variable with

$$A \sim S_{\alpha/2}\Bigl(\bigl(\cos\tfrac{\pi\alpha}{4}\bigr)^{2/\alpha},\, 1,\, 0\Bigr)$$

and $G = (G_1, G_2, \ldots, G_n)$ is an $n$-dimensional Gaussian zero mean random vector. Let $R_{ij} = E\,G_i G_j$, $i,j = 1,\ldots,n$, be the covariances within the vector $G = (G_1, G_2, \ldots, G_n)$.

Then $(U_1^{(1)}, U_2^{(1)}, \ldots, U_n^{(1)})$ is generated by simulating a representation of the Gaussian vector $G$ with correlated elements $G_1, G_2, \ldots, G_n$ and an independent representation of the $\alpha/2$-stable random variable $A$.¹¹ The generation of the vector $(Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_n^{(1)})$ is performed analogously.

¹⁰ An S$\alpha$S vector is a symmetric $\alpha$-stable random vector.
¹¹ There are various ways to model the dependence. For example, see Rachev, Khindanova and Schwartz (2000).

3. Comparison of empirical results

In order to illustrate the effects of the different assumptions on our stable credit modelling, we present an empirical example. We chose a portfolio consisting of two corporate bonds and calculated its daily Credit Value-at-Risk (CVaR) for the independent and the dependent case.

3.1. The observed portfolio data

As our sample portfolio, two corporate bonds (country: US market; currency: US dollars) were selected. Both bonds pay coupons twice a year. Historical prices were obtained from Bloomberg¹² for the past four years (March 14th 1996 up to March 13th 2000). According to their credit ratings, the bonds exhibit considerable credit risk.
For our portfolio, we assume that we hold one unit of each security. Both have a nominal value of USD 100 (see Table 1).

Table 1
Bonds selected for sample-portfolio

Corporation                 Coupon   Rating (S&P/Moody's)   Maturity
Pennzoil (Bond 1)           10.25    BBB+/Baa2              11/05
United Airlines (Bond 2)    9.0      BB+/Baa2               12/03

3.2. Generating comparable risk-free bonds from the yield curve

First, we calculate the daily returns of the above listed bonds using market prices. Then, for each bond a corresponding riskless bond is generated in order to derive the values for the $Y_i$. The corresponding riskless bond has the same specifications (maturity, coupon, coupon date) as the risky corporate bond. The history of daily prices of these artificial treasury bonds was calculated from the daily yield curves. The treasury yield curve for each day was approximated by the prices of 9 risk-free zero bonds with maturities of 0.25, 0.5, 1, 2, 3, 4, 5, 7, and 10 years. These 9 points were interpolated by a natural cubic spline algorithm.¹³

¹² Bloomberg Information System, Corporate Bonds Section.
¹³ Burden and Faires (1997).

With the obtained daily yield curves, we can generate historical prices for our artificial treasury bonds and calculate their daily returns according to (6). Next, we perform the linear regressions to estimate the parameters $a$ and $b$ of the equations $R_i = a_i + b_i Y_i + U_i$. The resulting $\hat a_i$ and $\hat b_i$ are OLS estimates (see Equations (7) and (8), and Table 2).

Table 2
Estimates for a and b

Corporation                 $\hat a_i$   $\hat b_i$
Pennzoil (Bond 1)           0.0000       0.9262
United Airlines (Bond 2)    0.0000       0.9878

With the values $\hat b$ and $\hat a$, the estimates for the disturbances $U_i$ can be calculated: $U_i = R_i - \hat a - \hat b Y_i$. We now have the empirical distributions for $R_i$, $Y_i$, and $U_i$. Next, we apply both a stable and a normal fit to those.

During the available sample period from 1996 to 2000, time-to-maturity for the observed bonds reduces by 41% and 52%.
The question arises whether this has a systematic effect on the fitted parameters of the $Y_i$ over time. However, in our case empirical analysis found no evidence for this.

3.3. Fitting the empirical time series for $R_i$, $Y_i$, and $U_i$

For the stable fit, we applied Maximum Likelihood Estimation (MLE) to obtain the four parameters of the distribution. The stable densities were approximated via the Fast Fourier Transform.¹⁴ The procedure was implemented with Matlab 5.3.

Table 3
Parameters for R fitted with stable and normal distribution

                            Stable                                 Normal
Corporation                 alpha    beta      sigma     mu        mean      std-dev
Pennzoil (Bond 1)           1.5451   -0.0690   0.0019    0.0000    -0.0001   0.0041
United Airlines (Bond 2)    1.5199   -0.0744   0.00164   0.0000    -0.0001   0.0035

Table 4
Parameters for Y fitted with stable and normal distribution

                            Stable                                 Normal
Corporation                 alpha    beta      sigma     mu        mean      std-dev
Pennzoil (Bond 1)           1.3639   -0.0297   0.0012    0.0000    -0.0001   0.0027
United Airlines (Bond 2)    1.2811   0.0062    0.0009    0.0000    -0.0001   0.0022

Table 5
Parameters for the disturbance U fitted with stable and normal distribution

                            Stable                                 Normal
Corporation                 alpha    beta      sigma     mu        mean      std-dev
Pennzoil (Bond 1)           1.0348   -0.0247   0.0006    0.0000    0.0001    0.0027
United Airlines (Bond 2)    1.1663   0.0117    0.0008    0.0000    0.0000    0.0032

¹⁴ For example, see Rachev and Mittnik (2000).

The parameters of the stable and Gaussian distributions fitted for the $R_i$, $Y_i$, and $U_i$ are shown in Tables 3, 4, and 5.

3.4. CVaR-results for the independence assumption

The assumption of independence between the bonds in our portfolio leads to the application of the equations in Section 2.2. We perform the stable fit for both the $Y_i$ and the $U_i$ based on a common stability index, and select $\bar\alpha = 1.10$ for the $U_i$, and $\bar\alpha' = 1.32$ for the $Y_i$. Re-estimating the parameters by a stable fit with the common stability indices, we obtain the results presented in Tables 6 and 7.
The parameters of the portfolio's $U_p$ and $Y_p$, given by

$$U_p = v_1 U_1 + v_2 U_2 \quad\text{and}\quad Y_p = v_1 \hat b_1 Y_1 + v_2 \hat b_2 Y_2, \tag{37}$$

are determined by the relationships presented in Section 2.2. With $v_1 = v_2 = 0.5$, we have

$$U_p = 0.5\,U_1 + 0.5\,U_2 \quad\text{and}\quad Y_p = 0.5\,\hat b_1 Y_1 + 0.5\,\hat b_2 Y_2. \tag{38}$$

Table 6
Parameters for the disturbance U fitted with stable and normal distribution assuming $\bar\alpha = 1.10$

                            Stable
Corporation                 alpha    beta      sigma    mu
Pennzoil (Bond 1)           1.1000   -0.0047   0.0007   0.0000
United Airlines (Bond 2)    1.1000   0.0828    0.0011   0.0000

Table 7
Parameters for Y fitted with stable and normal distribution assuming $\bar\alpha' = 1.32$

                            Stable
Corporation                 alpha    beta      sigma    mu
Pennzoil (Bond 1)           1.3200   0.0089    0.0013   0.0001
United Airlines (Bond 2)    1.3200   -0.0430   0.0010   0.0000

Table 8
Stable parameters for portfolio $U_p$ and $Y_p$

         Stable
         alpha    beta      sigma    mu
$Y_p$    1.3200   -0.0137   0.0009   0.0000
$U_p$    1.1000   0.0497    0.0008   0.0000

The results for the parameters of $U_p$ and $Y_p$ are given in Table 8. Their calculation is performed according to Equations (19)–(24). The resulting equation describing the portfolio's returns is

$$R_p = 0.5\,(\hat a_1 + \hat a_2) + Y_p + U_p = 0.0000 + Y_p + U_p. \tag{39}$$

Based on this, we can simulate 1000 daily returns. This yields the daily Credit Value-at-Risk of the portfolio. For the stable model with independence assumption, we obtain a one-day CVaR of 0.67% at the 95% level and a one-day CVaR of 2.24% at the 99% level.

So far, we have assumed both bonds to be independent of each other. However, empirical examinations exhibit strong dependence among the $Y_i$ and among the $U_i$. Therefore, the following section presents the results of the model in Section 2.3, incorporating dependence among the $U_i$ and dependence among the $Y_i$.

3.5. CVaR-results for the dependence assumption

The Gaussian covariances and correlations between the $U_i$ and $Y_i$ of our example portfolio are presented in Tables 9–12.
The modelling of the dependent case, as demonstrated in the former section, is performed by splitting both the $Y_i$ and the $U_i$ into two components. The first component includes the dependence, which is modelled by a sub-Gaussian random vector. The second component exhibits the skewness (see Section 2.3). Table 13 provides the Credit Value-at-Risk (CVaR) for the 95% and 99% level with a horizon of one day, comparing both stable models (independent and dependent case) with the empirical data.

Table 9
cov($Y_i$, $Y_j$), i,j = 1,2

cov($Y_i$, $Y_j$) (×10⁻⁴)   $Y_1$     $Y_2$
$Y_1$                        0.7699    0.5821
$Y_2$                        0.5821    0.4785

Table 10
cov($U_i$, $U_j$), i,j = 1,2

cov($U_i$, $U_j$) (×10⁻⁴)   $U_1$     $U_2$
$U_1$                        0.1038    0.0850
$U_2$                        0.0850    0.0748

Table 11
cor($Y_i$, $Y_j$), i,j = 1,2

cor($Y_i$, $Y_j$)   $Y_1$     $Y_2$
$Y_1$                1.0000    0.9591
$Y_2$                0.9591    1.0000

Table 12
cor($U_i$, $U_j$), i,j = 1,2

cor($U_i$, $U_j$)   $U_1$     $U_2$
$U_1$                1.0000    0.9653
$U_2$                0.9653    1.0000

Table 13
Stable portfolio Credit Value-at-Risk (one-day) as log-price and percentage price changes

               95%                                     99%
               log-price change   perc. change (%)    log-price change   perc. change (%)
Empirical      0.0054             0.54                0.0108             1.08
Dependent      0.0060             0.60                0.0242             2.40
Independent    0.0068             0.67                0.0226             2.24

Table 14
Gaussian portfolio Credit Value-at-Risk (one-day) as log-price and percentage price changes

               95%                                     99%
               log-price change   perc. change (%)    log-price change   perc. change (%)
Dependent      0.0061             0.61                0.0087             0.87
Independent    0.0044             0.44                0.0063             0.63

For comparison, Table 14 presents the CVaR assuming the returns to follow a Gaussian law. The results for CVaR confirm the earlier findings¹⁵ that for credit returns the Gaussian VaR is only acceptable for the 95% level, but underestimates the 99% level. The stable VaR is also appropriate for the 95% level, but it is a more conservative measure for the 99% level.
This is desirable because the empirical VaR tends to underestimate the true VaR due to the low number of observations in the tails.¹⁶

To calculate the CVaR for longer horizons, e.g., 10 days, we would have to build the 10-day returns for both the corporate bonds and their corresponding treasury bonds from the empirical data, and fit the above models to those data. It has to be pointed out that longer horizons cannot be handled by taking the one-day return model and extending it to the desired horizon by simply applying a Lévy process with independent increments. Subsequent observations of the returns are not i.i.d., as volatility clustering can be observed and long-memory effects might occur. Thus, the volatility for a multiple-day horizon cannot be obtained by a simple scaling approach.¹⁷ Longer forecast horizons require new types of models, and sample data should be available for longer periods.

So far we have dealt with the phenomenon of heavy tails in credit returns. Two others, volatility clustering and long-range dependence, have already been mentioned. The following part of the chapter explains the phenomenon of long-range dependence. It introduces the theory and possible ways of detection. Finally, we examine such behavior for credit return data.

¹⁵ Rachev, Schwartz and Khindanova (2000).
¹⁶ See Rachev and Mittnik (2000).
¹⁷ See Christoffersen, Diebold and Schuermann (1998).

Fig. 1. Stable models for independent and dependent case: density of daily log-returns (portfolio).

Fig. 2. Stable models for independent and dependent case: density of daily log-returns (portfolio), tails on left-hand side.

4. The detection and measurement of long-range dependence

Time series can have a long memory. Such series are not independent and identically distributed. This phenomenon is often referred to as burstiness in the literature.¹⁸ The underlying stochastic processes for such burstiness are called fractal.
Fractal processes with a long memory are called persistent. A common characteristic of those fractal processes is that their space time is governed parsimoniously by power-law distributions. This effect is called the "Noah-Effect", explaining the occurrence of heavy tails and infinite variance. It can be observed as the tendency of time series towards abrupt and discontinuous changes. Another property of fractal processes is hyperbolically decaying autocorrelations, which is known as the "Joseph-Effect". It is the tendency of a persistent time series to have trends and cycles. The examination of fractal processes in finance has become a popular topic over the years.¹⁹

For a long-memory process, we observe that larger-than-average representations are more likely to be followed by larger-than-average representations than by lower-than-average representations.

Hurst developed a statistic to examine the long memory of a stochastic process. As significant autocorrelations are often not visible, he came up with a new methodology to provide a measure (the Hurst-Exponent) for long-range dependence within a time series.

Owing to the failures of traditional capital market theory, which is largely based on the theory of martingales, researchers found that markets do not follow a purely random walk. In response, the fractal market hypothesis was developed. The existence of self-similar structures is a major component of it. For self-similar processes, small increments of time are statistically similar to larger increments of time.

Self-similarity is defined as follows:²⁰ Let $X_t$ be a stochastic process with continuous time $t$. $X_t$ is self-similar with self-similarity parameter $H$ ($H$-ss) if the re-scaled process with time scale $ct$, $c^{-H} X_{ct}$, is equal in distribution to the original process $X_t$,

$$X_t \stackrel{d}{=} c^{-H} X_{ct}. \tag{40}$$

This means that, for a sequence of time points $t_1,\ldots,t_k$ and a positive stretch factor $c$, the distribution of $c^{-H}(X_{ct_1},\ldots,X_{ct_k})$ is identical to that of $(X_{t_1},\ldots,X_{t_k})$.
In other words, the path covered by a self-similar process always looks the same, regardless of the scale at which it is observed. In terms of financial data this means: no matter whether we have intraday, daily, weekly, or monthly data, the plots of the resulting processes look similar. For further information on self-similarity we refer to Samorodnitsky and Taqqu (1994), or Beran (1994).

¹⁸ Willinger, Taqqu and Erramilli (1996).
¹⁹ For example, we refer to Mandelbrot (1997a, b, 1999), and Peters (1994).
²⁰ Beran (1994).

4.1. Fractal processes and the Hurst-Exponent

First, we consider a process without a long memory. A perfect example is Standard Brownian Motion, which is characterized as a standard random walk.²¹ Commonly known is Einstein's "to the one-half" rule, describing the distance covered by a particle driven by Standard Brownian Motion. It states that the distance between consecutive values of the observed time series of this particle is proportional to the square root of time:²²

$$R \propto T^{0.5}. \tag{41}$$

The power of 0.5 refers to the Hurst-Exponent, which is already known as the self-similarity parameter. For Standard Brownian Motion, the Hurst-Exponent $H$ is equal to 0.5, which means that we have an unbiased random walk. A process with a Gaussian limiting distribution but a Hurst-Exponent $H$ different from 0.5 is called Fractional Brownian Motion. Fractional Brownian Motion differs from Standard Brownian Motion by the fact that it is a biased random walk. The odds are biased in one direction or the other.

Definition of Fractional Brownian Motion.²³ Let us assume a self-similar Gaussian process $X_t$, $t \in \mathbb{R}$, having mean zero and the autocovariance function

$$\operatorname{Cov}(X_{t_1}, X_{t_2}) = \tfrac{1}{2}\bigl(|t_1|^{2H} + |t_2|^{2H} - |t_1 - t_2|^{2H}\bigr)\operatorname{Var} X(1), \tag{42}$$

where $H$ is the self-similarity parameter and $H \in (0,1)$. Such a process is called a Fractional Brownian Motion. For $H = 1/2$ it becomes a Standard Brownian Motion.
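The autocovariance function (42) is easy to evaluate directly, and doing so makes the special case $H = 1/2$ concrete. The following is a minimal sketch; the function name `fbm_cov` and the default $\operatorname{Var} X(1) = 1$ are assumptions of this sketch.

```python
def fbm_cov(t1, t2, hurst, var1=1.0):
    """Autocovariance (42) of Fractional Brownian Motion with Var X(1) = var1."""
    two_h = 2.0 * hurst
    return 0.5 * (abs(t1) ** two_h + abs(t2) ** two_h - abs(t1 - t2) ** two_h) * var1

# For H = 1/2 and positive times, (42) reduces to min(t1, t2), the covariance
# of Standard Brownian Motion.
cov_bm = fbm_cov(2.0, 5.0, 0.5)

# Setting t1 = t2 = t gives Var X(t) = |t|^(2H) * Var X(1), the variance scaling
# implied by self-similarity.
var_at_3 = fbm_cov(3.0, 3.0, 0.7)
```

With $H = 1/2$ the expression inside the parentheses becomes $t_1 + t_2 - |t_1 - t_2| = 2\min(t_1, t_2)$ for positive times, which is why the Standard Brownian Motion covariance is recovered.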
The increments of Fractional Brownian Motion, $Y_j = B_H(j + 1) - B_H(j)$, $j \in \mathbb{Z}$, form a stationary sequence $Y_j$ which is called Fractional Gaussian Noise.

Fractional Gaussian Noise. A sequence of Fractional Gaussian Noise has the following properties:
(i) its mean is zero,
(ii) its variance is $E Y_j^2 = E B_H^2(1) = \sigma_0^2$, and
(iii) its autocovariance function is

$$r(j) = \frac{\sigma_0^2}{2}\bigl((j + 1)^{2H} - 2 j^{2H} + (j - 1)^{2H}\bigr),$$

where $j \in \mathbb{Z}$, $j \ge 0$, and $r(j) = r(-j)$ for $j < 0$.

²¹ See Campbell, Lo and McKinlay (1997).
²² Peters (1994).
²³ Samorodnitsky and Taqqu (1994).

For $j \to \infty$, $r(j)$ behaves like a power function:

$$\lim_{j \to \infty} r(j) = 0. \tag{43}$$

The autocorrelations are given by

$$\rho(j) = \tfrac{1}{2}\bigl((j + 1)^{2H} - 2 j^{2H} + (j - 1)^{2H}\bigr), \tag{44}$$

where $j \ge 0$ and $\rho(j) = \rho(-j)$ for $j < 0$. As $j$ tends to infinity, $\rho(j)$ is equivalent to $H(2H - 1) j^{2H-2}$.²⁴ In the presence of long memory, $1/2 < H < 1$, the correlations decay to zero so slowly that they are no longer summable:

$$\sum_{j=-\infty}^{\infty} \rho(j) = \infty. \tag{45}$$

For $H = 1/2$, i.e., a Gaussian i.i.d. process, all correlations at non-zero lags are zero. For $0 < H < 1/2$, the correlations are summable, and it holds that

$$\sum_{j=-\infty}^{\infty} \rho(j) = 0. \tag{46}$$

$H = 1$ implies $\rho(j) = 1$. For $H > 1$, the condition $-1 \le \rho(j) \le 1$ is violated. For $0 < H < 1$, a Gaussian process with mean zero and the given autocovariance function is self-similar and has stationary increments ($H$-sssi). The above autocovariance function is shared by all Gaussian $H$-sssi processes.

Fractional processes with stable innovations. There are many different extensions of Fractional Brownian Motion to the $\alpha$-stable case with $\alpha < 2$. Most common is the so-called Linear Fractional Stable Motion, or Linear Fractional Lévy Motion. In analogy to the Gaussian case with $\alpha = 2$, the increments of Linear Fractional Stable Motion²⁵ show long-range dependence for $H > 1/\alpha$. LRD for $\alpha < 1$ does not exist, as $H$ must lie in $(0,1)$. Processes with $H = 1/\alpha$ are called $\alpha$-stable Lévy Motion, whose increments $X(t_{j+1}) - X(t_j)$ are all mutually independent.
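Returning to the Gaussian case, the autocorrelation (44) of Fractional Gaussian Noise and the summability statements (45) and (46) can be explored numerically. The sketch below uses assumed function names and arbitrary illustrative values of $H$. It relies on the observation that the symmetric partial sum over lags $-n,\ldots,n$ telescopes to $(n+1)^{2H} - n^{2H}$, which tends to zero for $H < 1/2$ and grows without bound for $H > 1/2$.

```python
def fgn_autocorr(j, hurst):
    """Autocorrelation (44) of Fractional Gaussian Noise at integer lag j."""
    j = abs(j)  # rho(j) = rho(-j)
    two_h = 2.0 * hurst
    return 0.5 * ((j + 1) ** two_h - 2.0 * j ** two_h + abs(j - 1) ** two_h)

def autocorr_partial_sum(hurst, n):
    """Sum of rho(j) over the lags j = -n, ..., n."""
    return 1.0 + 2.0 * sum(fgn_autocorr(j, hurst) for j in range(1, n + 1))

# Ratio of the exact autocorrelation to its asymptotic form H(2H-1) j^(2H-2).
hurst, lag = 0.8, 1000
asymptotic = hurst * (2.0 * hurst - 1.0) * lag ** (2.0 * hurst - 2.0)
ratio = fgn_autocorr(lag, hurst) / asymptotic

# Partial sums: shrinking towards 0 for H < 1/2, growing for H > 1/2.
sum_antipersistent = autocorr_partial_sum(0.3, 10_000)
sum_persistent_short = autocorr_partial_sum(0.8, 1_000)
sum_persistent_long = autocorr_partial_sum(0.8, 10_000)
```

The ratio should be close to one at large lags, while the partial sums for a persistent series keep increasing as more lags are included, which is the non-summability expressed in (45).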
For α-stable Lévy processes with infinite variance, we have to interpret carefully the value obtained for H and how it is related to the parameter d, which measures the degree of long-range dependence.

24 Beran (1994).
25 Samorodnitsky and Taqqu (1994).

H, the Hurst-Exponent, is the scaling parameter and describes asymptotic self-similarity. For finite variance processes, the relation between H and d is

$$H = d + \frac{1}{2}. \tag{47}$$

For processes with infinite variance (α < 2), the relation is

$$H = d + \frac{1}{\alpha}. \tag{48}$$

If d > 0, the time series is governed by a long-memory process.

There are a number of methods to distinguish a purely random time series from a fractal one. For example, the classical R/S analysis26 determines the parameter H of a time series. The resulting graph is called the pox-plot of R/S or the rescaled-adjusted-range plot. Before describing the classical R/S method, we briefly explain two other methods to derive the Hurst-Exponent H: the Aggregated Variance Method and the similar method of Absolute Values of the Aggregated Series.27

4.2. The Aggregated Variance Method

The original time series $X = (X_i, i = 1,\dots,N)$ is divided into blocks of m elements each. The index k labels the block. The aggregated series is calculated as the mean of each block:

$$X^{(m)}(k) = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i \quad \text{with } k = 1,2,\dots,\frac{N}{m}. \tag{49}$$

After building the aggregated series, we get the sample variance of $X^{(m)}(k)$ as

$$\operatorname{Var} X^{(m)} = \frac{1}{N/m} \sum_{k=1}^{N/m} \left( X^{(m)}(k) \right)^2 - \left( \frac{1}{N/m} \sum_{k=1}^{N/m} X^{(m)}(k) \right)^2. \tag{50}$$

The procedure is repeated for different values of $m \in \{m_i, i \ge 1\}$. The chosen values for m should be equidistant on a log-scale, i.e., $m_{i+1}/m_i = C$. As $X^{(m)}$ scales like $m^{H-1}$, the sample variance $\operatorname{Var} X^{(m)}$ behaves like $m^{2H-2}$. Thus, in a log-log plot of $\operatorname{Var} X^{(m)}$ against m, the points form a straight line with slope 2H - 2.

26 Mandelbrot and Wallis (1968).
27 Teverovsky, Taqqu and Willinger (1995) as well as Teverovsky, Taqqu and Willinger (1998).
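The steps of the Aggregated Variance Method, (49) and (50) followed by a log-log regression, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names are hypothetical, the slope is fitted by ordinary least squares, and the test series is simulated Gaussian white noise, for which an estimate near H = 0.5 would be expected.

```python
import math
import random


def block_means(series, m):
    """Aggregated series (49): means of consecutive non-overlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def sample_variance(values):
    """Sample variance as in (50): mean of squares minus squared mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) / n - mean * mean


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def hurst_aggregated_variance(series, block_sizes):
    """Estimate H from the slope 2H - 2 of log Var X^(m) against log m."""
    log_m = [math.log(m) for m in block_sizes]
    log_var = [math.log(sample_variance(block_means(series, m))) for m in block_sizes]
    return 1.0 + ols_slope(log_m, log_var) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    # Block sizes equidistant on a log scale (ratio C = 2).
    print(hurst_aggregated_variance(noise, [4, 8, 16, 32, 64, 128]))
```

The block sizes double from one value to the next, so that they are equidistant on a log scale as the method requires.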
4.3. Absolute Values of the Aggregated Series

This method is similar to the Aggregated Variance Method explained above. Starting again with the aggregated series, we calculate the average of the absolute values of the aggregated series:

$$\frac{1}{N/m} \sum_{k=1}^{N/m} \left| X^{(m)}(k) \right|. \tag{51}$$

If the original series has a long-range dependence parameter H, the log-log plot of the statistic against m yields a line of slope H - 1.

4.4. Classical R/S analysis

Let us assume we have a time series of N consecutive values. $Y(n) = \sum_{i=1}^{n} X_i$, $n \ge 1$, is the partial sum and $S^2(n) = \frac{1}{n} \sum_{i=1}^{n} [X_i - n^{-1}Y(n)]^2$, $n \ge 1$, is the corresponding sample variance. We define $Z(t) = Y(t) - \frac{t}{n} Y(n)$. The rescaled-adjusted-range statistic or R/S statistic is defined by

$$\frac{R}{S}(n) = \frac{1}{S(n)} \left[ \max_{0 \le t \le n} Z(t) - \min_{0 \le t \le n} Z(t) \right]. \tag{52}$$

R/S is called the rescaled adjusted range because the partial sums are adjusted to have mean zero and the range is expressed in terms of the local standard deviation. For large n, the expected value of the statistic approaches $c_1 n^H$:

$$E[R/S(n)] \sim c_1 n^H, \tag{53}$$

where $c_1$ is a positive, finite constant that does not depend on n. In the case of long-range dependence in a Gaussian process, the values for H lie in the interval (0.5, 1.0). For an i.i.d. Gaussian process (i.e., pure random walk) or a short-range dependent process, the value of R/S(n) approaches $c_2 n^{0.5}$, where $c_2$ is independent of n, finite, and positive:

$$E[R/S(n)] \sim c_2 n^{0.5}. \tag{54}$$

The practical application of the R/S analysis is performed graphically. It is described in Mandelbrot and Wallis (1968). With this procedure, K different estimates of R/S(n) are obtained. It starts by dividing the total sample of N consecutive values into K blocks, each of size N/K. We define

$$k(m) = \frac{(m-1)N}{K} + 1 \tag{55}$$

as the starting points of each block, where K is the total number of blocks and m = 1,...,K is the current block number. Now we compute R(n,k(m))/S(n,k(m)) for each lag n such that k(m) + n < N.
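For a single block of observations, the statistic in (52) can be sketched as below. This is a minimal illustration with a function name of our own choosing (`rs_statistic`); the block is passed in as a plain list, and selecting the block start k(m) and the lag n is left to the caller.

```python
import math


def rs_statistic(x):
    """R/S statistic (52) for one block of observations x_1, ..., x_n."""
    n = len(x)
    total = sum(x)  # Y(n)
    mean = total / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # S(n)
    if s == 0.0:
        raise ValueError("constant block: S(n) is zero")
    z_max = z_min = 0.0  # Z(0) = 0 is included in the max and the min
    partial = 0.0
    for t, v in enumerate(x, start=1):
        partial += v  # Y(t)
        z = partial - t / n * total  # Z(t) = Y(t) - (t / n) Y(n)
        z_max = max(z_max, z)
        z_min = min(z_min, z)
    return (z_max - z_min) / s


if __name__ == "__main__":
    print(rs_statistic([1.0, -1.0, 1.0, -1.0]))
    print(rs_statistic([1.0, 2.0, 3.0]))
```

Evaluating this function on sub-blocks of increasing length n and regressing log(R/S) on log(n) gives the slope that serves as the estimate of H.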
All data points before k(m) are ignored in order to avoid the influence of particular short-range dependence in the data. Plotting log(R(n,k(m))/S(n,k(m))) for each block versus log(n), we can estimate the slope of the fitted straight line.

The classical R/S analysis is quite robust against variations in the marginal distribution of the data. This is also true for data with infinite variance. Having calculated the Hurst-Exponent H and the stability index α of the process innovations, the long-range dependence parameter d is obtained by

$$d = H - \frac{1}{2}, \tag{56}$$

for finite variance (α = 2), and by

$$d = H - \frac{1}{\alpha}, \tag{57}$$

for infinite variance (α < 2). Long-range dependence occurs if d is greater than 0.

The R/S analysis is a nonparametric tool for examining long-memory effects. There is no requirement for the time series' underlying limiting distribution. In the case of an underlying Gaussian process (α = 2), a Hurst-Exponent of H = 0.5 implies that there is no long-range dependence among the elements of the time series. For 0.5 < H < 1, a Gaussian time series is called persistent.28 A persistent time series is characterized by long-memory effects. If long memory is present, the effects occur regardless of the scale of the time series. All daily changes are correlated with all future daily changes, and all weekly changes are correlated with all future weekly changes. The fact that there is no characteristic time scale is an important property of fractal time series.

0 < H < 0.5 signals an antipersistent system for finite variance. Such a system reverses itself more frequently than a purely random one. At first glance, it looks like a mean-reverting process. But this would actually require a stable mean, which is not the case in such systems.

4.5. The modified approach by Lo

Hurst's R/S statistic turned out to react sensitively to short-memory processes.
Thus, Lo (1991) modified the classical R/S statistic to make it robust to short-range dependence.29 Lo's statistic focuses only on lag n = N, the length of the series.30 Multiple lags are not analyzed; the statistic does not vary n over several lags < N.

28 Peters (1994).
29 Lo (1991).
30 Teverovsky, Taqqu and Willinger (1998).

Compared to the graphical R/S method, which delivers an estimate of the parameter H, Lo's modified statistic just indicates the presence of long-range dependence, but does not deliver an estimate of the Hurst-Exponent. The statistic tests the null hypothesis H0: no long-range dependence. Instead of the ordinary sample standard deviation S for normalization, there is an adjusted standard deviation $S_q$ in the denominator. $S_q$ removes the effect of short-term memory from the statistic. As the R/S statistic is known to respond very sensitively to short-range dependence, the influence of short-range dependence can be offset by normalizing R with a weighted sum of short-lag autocovariances. To the variance $S^2$ Lo added weighted autocovariances up to order q.31 His modified statistic $V_q(N)$ is defined by

$$V_q(N) = N^{-1/2} \frac{R(N)}{S_q(N)}, \tag{58}$$

with

$$S_q(N) = \sqrt{ S^2 + 2 \sum_{j=1}^{q} w_j(q) \hat{\gamma}_j }, \tag{59}$$

where $\hat{\gamma}_j$ is the autocovariance of order j for the observed time series. $w_j(q)$ is defined as

$$w_j(q) = 1 - \frac{j}{q+1} \quad \text{with } q < N. \tag{60}$$

The statistic $V_q(N)$ is applied in a hypothesis test. It checks whether the null hypothesis of the test can be rejected or not, given a certain confidence level. The two hypotheses are:
* H0: no long-range dependence present in the observed data, $0 < H \le 0.5$.
* H1: long-range dependence is present in the data, 0.5 < H < 1.
The statistic assumes a Gaussian process (α = 2). If the value of $V_q(N)$ lies inside the interval [0.809, 1.862], H0 is accepted, as the statistic is in the 95% acceptance region. For $V_q(N)$ outside the interval [0.809, 1.862], H0 is rejected.
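Equations (58) to (60) and the acceptance rule can be sketched as follows. This is an illustration under our own assumptions rather than the authors' Matlab code: the function names are hypothetical, the autocovariances are normalized by N, and q = 0 is allowed as a convenience that reduces $S_q$ to the ordinary standard deviation S.

```python
import math


def adjusted_range(x):
    """R(N): range of the mean-adjusted partial sums Z(t), t = 0, ..., N."""
    n = len(x)
    mean = sum(x) / n
    z = z_max = z_min = 0.0  # Z(0) = 0
    for v in x:
        z += v - mean  # Z(t) = Y(t) - (t / N) Y(N)
        z_max = max(z_max, z)
        z_min = min(z_min, z)
    return z_max - z_min


def autocovariance(x, lag):
    """Sample autocovariance of order lag, normalized by N (lag 0 gives S^2)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n)) / n


def lo_statistic(x, q):
    """Lo's modified R/S statistic V_q(N) from (58)-(60)."""
    n = len(x)
    if not 0 <= q < n:
        raise ValueError("q must satisfy 0 <= q < N")
    s_q_squared = autocovariance(x, 0)  # S^2
    for j in range(1, q + 1):
        weight = 1.0 - j / (q + 1.0)  # w_j(q) from (60)
        s_q_squared += 2.0 * weight * autocovariance(x, j)
    if s_q_squared <= 0.0:
        raise ValueError("adjusted variance is not positive")
    return adjusted_range(x) / (math.sqrt(n) * math.sqrt(s_q_squared))


def accepts_h0(v, lower=0.809, upper=1.862):
    """True if V_q(N) lies in the 95% acceptance region quoted in the text."""
    return lower <= v <= upper


if __name__ == "__main__":
    sample = [1.0, -1.0, 1.0, -1.0]
    for q in (0, 1):
        v = lo_statistic(sample, q)
        print(q, v, accepts_h0(v))
```

The toy series is far too short for the asymptotic critical values to be meaningful; it only shows how the weighted autocovariances enter the denominator.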
Lo's results are asymptotic, assuming $N \to \infty$ and $q = q(N) \to \infty$.32 However, in practice the sample size is finite and the value of the statistic depends on the chosen q. Thus, the question arises: what is the proper value of q for performing the hypothesis test? Andrews (1991) has developed a data-driven method for choosing q:33

$$q_{opt} = \left[ \left( \frac{3N}{2} \right)^{1/3} \left( \frac{2\hat{\rho}}{1 - \hat{\rho}^2} \right)^{2/3} \right], \tag{61}$$

where [ ] denotes the greatest integer not exceeding the value inside the brackets and $\hat{\rho}$ is the first-order autocorrelation coefficient. Therefore, choosing Andrews' q assumes that the true underlying process is AR(1).

31 Peters (1994).
32 Teverovsky, Taqqu and Willinger (1998).
33 See Lo (1991).

Critique of Lo's statistic. Lo's statistic is applied by calculating $V_q$ for a number of lags q and plotting those values against q. The confidence interval for accepting H0 at the 95% confidence level is plotted as well. Simulations have shown that the acceptance of H0 (and therefore the value of $V_q(N)$) varies significantly with q. Teverovsky, Taqqu and Willinger (1998) found that the longer the time series and the larger the value of q, the less likely H0 is to be rejected. Whereas Lo's statistic just checks for the significance of long-range dependence, the graphical method of the classical R/S provides relatively good estimates of H. For small q the results of $V_q$ usually vary strongly. Then a range of stability follows, after the so-called "extra" short-range dependence has been eliminated and the only effect measurable for the statistic would be long-range dependence. Applying the statistic to Fractional Brownian Motion with H > 0.5, which is a purely long-range dependent process without short-memory effects, $V_q$ is expected to stabilize at very low values of q. Unfortunately, this could not be confirmed by the testing of Teverovsky, Taqqu and Willinger (1998). Moreover, they demonstrate that, if q is large enough, the following holds for $V_q(N)$ and $q^{1/2-H}$:

$$V_q(N) \sim q^{1/2-H}.$$
(62)

For H > 0.5, $V_q$ decreases with increasing q. Even for strongly fractional processes with time series containing 10000 samples, Teverovsky, Taqqu and Willinger found that, with increasing values of q, the probability that $V_q$ lies inside the 95% acceptance interval of H0, so that the null hypothesis is accepted, grows. To mention three cases only: for q = 500 and H = 0.9 the null hypothesis (no long-range dependence) is accepted in 90% of the cases for Fractional Brownian Motion, in 92% for FARIMA(0.5,d,0), and in 94% for FARIMA(0.9,d,0).34

Lo's test is very conservative in rejecting the null hypothesis. It works for short-range dependence, but in cases of long-range dependence it mostly accepts the null hypothesis. The statistic of Lo is certainly an improvement over the short-range sensitive classical R/S, but it should not be used in isolation without comparing its results with other tests for LRD.

In practical applications, the question of a proper choice of q remains. The value of Andrews' data-driven $q_{opt}$ depends on the econometric model underlying the observed time series, but the appropriate model is not known in advance. Andrews' choice rests on the assumption that the time series obeys an AR(1) process.

34 FARIMA(0.5,d,0) means a fractional ARIMA process with an AR(1) coefficient of 0.5 and an MA(1) coefficient of 0.

It used to be common to assess long-range dependence by looking at the rate at which the autocorrelations decay. With a Hurst-Exponent H greater than 0.5 the correlations are no longer summable. Such non-summability of autocorrelations used to be seen as a convenient way of establishing long-range dependence. But there are pitfalls: if the underlying process is considered to follow a stable law with α < 2, a second moment does not exist and therefore autocorrelations do not exist either.

It can be concluded that, when testing for long-range dependence, the application of a single technique is insufficient.

4.6.
The statistic of Mansfield, Rachev and Samorodnitsky (MRS)

Long-range dependence means that a time series exhibits a certain kind of order over a long period. Instead of pure chaos with no rule in the price movements of an asset, we can find periods of time whose sample mean differs significantly from the theoretical mean. The stronger the long-memory effects in the time series, the longer the intervals of the series whose mean deviates from the expected value.

Mansfield, Rachev and Samorodnitsky (1999) concentrate on this property of LRD-exhibiting time series, which is valid regardless of the assumed underlying stochastic model. The authors define a statistic that delivers the length of the longest interval within the time series where the sample mean lies beyond a certain threshold. The threshold is set greater than the finite mean $EX_i$ of the whole time series. Furthermore, the time series is assumed to follow a stationary ergodic process.

Expressed in mathematical terms, the statistic is defined as

$$R_n(A) = \sup\left\{ j - i : 0 \le i < j \le n, \ \frac{X_{i+1} + \cdots + X_j}{j - i} \in A \right\}, \tag{63}$$

which is defined for every n = 1,2,.... If the supremum is taken over the empty set, the statistic is defined to be equal to zero. The set A is defined either as

$$A = (\theta, \infty) \quad \text{with } \theta > \mu, \tag{64}$$

or as

$$A = (-\infty, \theta) \quad \text{with } \theta < \mu, \tag{65}$$

where μ is the theoretical mean of the time series. $R_n(-\infty,\theta)$ and $R_n(\theta,\infty)$ are interpreted as "greatest lengths of time intervals when the system runs under effective load that is different from the nominal load".35 In the following, the examination is restricted to $R_n(\theta,\infty)$.

35 Mansfield, Rachev and Samorodnitsky (1999).

A theoretical way to examine a time series for long-range dependence would be the log-log plot of $R_n(\theta,\infty)$ versus n. In the case of long-range dependence, the slope of the plot would be expected to be greater than 1/α, with α as the tail index. However, α is not known in advance.
Therefore, Mansfield, Rachev and Samorodnitsky developed a statistic that does not rely on an a priori tail index. They defined

$$W_n(\theta) = \frac{R_n(\theta,\infty)}{M_n}, \tag{66}$$

where $M_n = \max(X_1,\dots,X_n)$ is the largest of the first n observations, $n \ge 1$. This statistic is self-normalizing: the denominator compensates for the effects of the tail index α. In the case of short-range dependence, the ratio $W_n(\theta)$ approaches a weak limit as $n \to \infty$. In the case of long-range dependence, $R_n$ grows faster than $M_n$ and the statistic diverges.

For visualization, the statistic $W_n(\theta)$ is plotted against θ. Its limiting distribution is independent of θ. A difficult task is the selection of the proper range of θ. It has to be determined empirically by looking where the values of $W_n(\theta)$ stabilize. Once the value of the statistic is at least 19 for a certain θ, long-range dependence is present at a significance level of 0.05.

4.7. Empirical results for long-range dependence in credit data

For our empirical examination of long-memory effects in daily credit return data, we use the returns of bond indices provided by Merrill Lynch.36 We have selected four indices with time series of daily index prices from January 1988 to April 2000. Each index represents a number of bonds with similar properties (see explanation in Table 15). As the analysis of long-memory effects requires large data samples, an important criterion for the selection of an index was the available sample size. The sample sizes are listed in Table 16.

We apply three different methods for estimating the self-similarity parameter H and two methods performing a hypothesis test regarding the presence of LRD. As explained before, we have chosen (i) the "Aggregated Variance Method", (ii) the method "Absolute Values of Aggregated Series", (iii) the classical R/S analysis developed by Mandelbrot and Wallis, (iv) Lo's modified R/S statistic, and (v) the statistic of Mansfield, Rachev and Samorodnitsky (MRS).
All these methods have been implemented with Matlab 5.3. Methods (i)-(iii) provide an estimate of the Hurst-Exponent H. Method (iv) tests whether the null hypothesis "no long-range dependence" has to be accepted or rejected at a given confidence level. Method (v) is also a hypothesis test; however, contrary to Lo's test, it works independently of the tail index.

36 The time series were obtained via Bloomberg's Index Section.

Table 15
Explanation of the selected indices

Index   Explanation
X0H0    High Yield 175
C8B0    Corporates C rated, cash pay
J0A3    AAA-AA rated corporates, time to maturity 15 yrs
C0A0    US Corporate master

Table 16
Data sets used for testing LRD

Index   No. of observations   Starting date   Ending date
X0H0    3083                  10-31-86        04-30-00
C8B0    3470                  10-31-86        04-30-00
J0A3    2920                  08-04-88        04-30-00
C0A0    3472                  01-04-88        04-30-00

Table 17
Results for Aggregated Variance and Absolute Values of the Aggregated Series

Index   H for Aggreg. Variance   H for Abs. Values of Aggreg. Ser.
X0H0    0.7632                   0.7596
C8B0    0.5527                   0.5511
J0A3    0.8070                   0.8022
C0A0    0.5856                   0.5838

To test the index returns for long-range dependence, we computed the daily changes of the index log-prices

$$r_t = \log(p_t) - \log(p_{t-1}). \tag{67}$$

The results of the methods "Aggregated Variance" and "Absolute Values of the Aggregated Series". For methods (i) and (ii), we plotted the values of the statistic over m (number of elements in each block), with m ranging from 10 up to 40. Finally, we determined the slope of the data points in order to obtain H. The values for H are printed in Table 17. Both methods calculate Hurst-Exponents greater than 0.5 for all observed indices. Thus, under the Gaussian assumption, the underlying processes are long-memory processes. X0H0 and J0A3 show strong LRD, whereas C8B0 and C0A0 have a weaker long memory.

The results of classical R/S and Lo's statistic.
As we only have about 3000 observations for each time series, we do not divide the data set into several blocks for the classical R/S statistic. Thus, we choose K = 1. The results of classical R/S and the values of Lo's statistic $V_q$ (for q we chose a range of 1,...,50) are presented in Table 18. We plotted both the log(R/S)-log(n) and the $V_q$-q graphs for our observed indices X0H0, C8B0, J0A3, and C0A0 (see Figures 3-6 for classical R/S, and Figure 7 for Lo's test).

Table 18
Results for the classical R/S statistic and Lo's test

Index   Fitted H (classical R/S)   Range of Vq (q = 1,...,50)   Optimal q (Andrews)
X0H0    0.7579                     [1.74, 3.44]                 11
C8B0    0.4874                     [1.33, 1.40]                 6
J0A3    0.9213                     [2.04, 4.29]                 10
C0A0    0.4493                     [1.23, 1.40]                 n/a

The second column of Table 18 presents the Hurst-Exponent estimated with the R/S statistic. The third column presents the intervals in which the values of Lo's $V_q$ are located for q = 1,...,50. The fourth column provides the optimal lag q, determined by Andrews' data-driven method.37

The results of the R/S statistic are similar to those obtained by the Aggregated Variance Method and the Absolute Values of the Aggregated Series. The time series of X0H0 and J0A3 exhibit strong LRD according to their Hurst-Exponent H. This is supported by the result of Lo's test, which rejects the null hypothesis "no LRD" at the 95% level. However, for C8B0 and C0A0, the Hurst-Exponent already lies in the area of antipersistence. Another interesting finding is that for C0A0, which has the lowest value of H, the sample autocorrelation of order 1 is negative. Therefore, we cannot calculate the optimal q for C0A0.

The results of the statistic by Mansfield, Rachev and Samorodnitsky (MRS). Figures 8-11 show the plots of $W_n(\theta)$ over the range of θ.
For the time series of the index X0H0, we found that the statistic $W_n(\theta)$ increases linearly with θ in the range [0.5e-4, 3.5e-4] (the empirical mean of the whole series is 0.497e-4). The value of $W_n(\theta)$ reaches levels of about 19 and then declines until it stabilizes at a level of about 1 (see Figure 8). This result clearly indicates the presence of LRD. The presence of long memory is significant at the 0.05 level once the value of the statistic is at least 19. Thus, the MRS-statistic supports the LRD-hypothesis for X0H0. Lo's statistic and classical R/S also indicate long-range dependence for the index X0H0, but this was based on the assumption that the underlying process of the time series follows a Gaussian law, i.e., that α = 2. The MRS-statistic, however, is independent of α.

The second bond index whose returns exhibited strong LRD in the former tests was the J0A3 index (AAA-AA rated corporates; see Table 15). Its empirical mean is -3.2e-4. We observe a sharp increase of W(θ) for θ in [0, 6.5e-4] up to a value of about 15, and it finally drops to a level of about 1 as well. Thus, the hypothesis of long-range dependence can also be confirmed for the J0A3 series, as the MRS-statistic also exhibits significant values (see Figure 9). However, the significance is not as strong as for the X0H0 series.

37 See Lo (1991).

Fig. 3. Plot of log(R/S)-log(n) for X0H0.
Fig. 4. Plot of log(R/S)-log(n) for J0A3.
Fig. 5. Plot of log(R/S)-log(n) for C0A0.
Fig. 6. Plot of log(R/S)-log(n) for C8B0.
Fig. 7. The plots of Lo's statistic for X0H0, J0A3, C0A0, and C8B0.
Fig. 8. Plot of W(θ)-θ for X0H0.
Fig. 9. Plot of W(θ)-θ for J0A3.
Fig. 10. Plot of W(θ)-θ for C0A0.
Fig. 11. Plot of W(θ)-θ for C8B0.
The returns of the two other indices, C0A0 and C8B0, do not exhibit long-range dependence with the W(θ)-statistic, and this is consistent with the results of the formerly applied tests. The returns of the C8B0 index show a higher probability for the LRD-hypothesis than the returns of C0A0; however, neither is significant. Thus, for both indices C0A0 and C8B0, there is no significant indication of long-range dependence with the MRS-statistic (see Figures 10 and 11).

5. Conclusion

In the first part of this chapter we have shown the superior performance of Value-at-Risk models based on stable distributions compared to Gaussian models. Furthermore, we have presented a modified model for credit returns which makes practical implementation easier. In the second part of the chapter we have studied long-range dependence in credit return data.

In Section 3 we have illustrated that the stable distribution approximates the tail of the empirical distribution of credit returns much better. This is especially important for Value-at-Risk (VaR) applications. VaR has become increasingly important for risk management. The stable VaR exhibits excellent performance for the high quantiles (i.e., 99% VaR). While the Gaussian 99% VaR underestimates the empirical VaR, the stable 99% VaR slightly overestimates it. Thus, the heavy-tailedness of time series of credit returns, as well as their skewness, is captured well by non-Gaussian stable distributions. The stability indices of the fitted corporate bond returns lie in the range of 1.5-1.6, which clearly indicates heavy-tailedness. In this context, a slightly modified model for credit returns has also been presented which can be implemented without building yield curves for various rating grades. It makes a practical application less burdensome.
The other phenomenon that has been analyzed in this work is the long-memory property of credit returns (Section 4). A sign of long memory is the "burstiness" of plotted time series. Long-range dependence is characterized by hyperbolically decaying autocorrelations and the property that large (small) observations are more likely to be followed by large (small) observations than by small (large) ones.

While three of the chosen tests measure the Hurst-Exponent, the other two are hypothesis tests checking the significance of the LRD-hypothesis. Applying the methods "Aggregated Variance" and "Absolute Values of Aggregated Series", all four analyzed time series exhibit a Hurst-Exponent H greater than 0.5, which means long-range dependence under the Gaussian assumption. For two of the four credit return series, the modified R/S statistic of Lo confirms LRD to be significant. This is remarkable because Lo's test tends to confirm the null hypothesis "no LRD" for large sample sizes and increasing lag q, even when the actual process is strongly long-range dependent.38

Also allowing for infinite variance (α < 2), we apply the MRS-statistic. It analyzes a process for LRD without relying on the tail index. For the X0H0 and J0A3 series, for which LRD has been confirmed by Lo's test, the MRS-statistic W(θ) also indicates significant long memory. This is probably the strongest result of our LRD studies: long-range dependence in credit returns is also found to be significant in combination with the non-Gaussian stable assumption.

38 Teverovsky, Taqqu and Willinger (1998).

Our examinations have only focused on the returns. However, for other financial series, such as stock prices, LRD has also been discovered in the trading time process, as demonstrated by Marinelli et al. (1999).
The use of bond indices for the empirical examination, instead of individual bonds, is advantageous in two respects. First, each index incorporates numerous bonds of a certain market segment. Thus, the obtained results can be considered a widespread phenomenon. If only a small number of bonds within the observed indices exhibited such an effect, it would probably fade away. Second, LRD-analysis requires large samples, which are more readily available for indices than for single bonds.

Finally, we can conclude that the issue of long memory cannot be neglected for time series of credit returns. The increments of the underlying stochastic process are not i.i.d. With the LRD found in the time series of credit returns, and having demonstrated that the distribution of credit returns is better captured by stable non-Gaussian models, we obtain a powerful tool for generating accurate forecasts of Value-at-Risk for longer horizons.

References

Andrews, D., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-858.
Bachelier, L., 1900. Théorie de la spéculation. Annales de l'École Normale Supérieure Series 3 (17), 28-86. English translation: Cootner, P.H. (Ed.), The Random Character of Stock Market Prices. MIT Press, Cambridge, MA, 1964.
Basle Committee on Banking Supervision, 1988. International Convergence of Capital Measurement and Capital Standards. Bank for International Settlements.
Basle Committee on Banking Supervision, 1999. Credit Risk Modeling: Current Practices and Applications. Bank for International Settlements.
Beran, J., 1994. Statistics for Long-Memory Processes. In: Monographs on Statistics and Applied Probability, Vol. 61. Chapman and Hall.
Burden, R.C., Faires, J.D., 1997. Numerical Analysis. Sixth edition. Brooks/Cole.
Campbell, J.Y., Lo, A.W., MacKinlay, A.C., 1997. The Econometrics of Financial Markets. Princeton University Press, Princeton, NJ.
Christoffersen, P.F., Diebold, F.X., Schuermann, T., 1998. Horizon problems and extreme events in financial risk management. Economic Policy Review, Federal Reserve Bank of New York, 4.
Fama, E.F., 1965a. The behavior of stock market prices. Journal of Business 38, 34-105.
Fama, E.F., 1965b. Portfolio analysis in a stable Paretian market. Management Science 11, 404-419.
Fama, E.F., 1970. Risk, return, and equilibrium. Journal of Political Economy 78, 30-55.
Fama, E.F., Roll, R., 1971. Parameter estimates for symmetric stable distributions. Journal of the American Statistical Association 66, 331-338.
Federal Reserve System Task Force on Internal Credit Models, 1998. Credit risk models at major US banking institutions: Current state of the art and implications for assessment of capital adequacy.
Gamrowski, B., Rachev, S.T., 1994. Stable models in testable asset pricing. In: Anastassiou, G., Rachev, S.T. (Eds.), Approximation, Probability, and Related Fields. Plenum, New York, pp. 223-236.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press.
Hull, J.C., 2000. Options, Futures, and Other Derivatives. Prentice-Hall.
Hurst, S.R., Platen, E., Rachev, S.T., 1997. Subordinated market index models: A comparison. Financial Engineering and the Japanese Markets 4, 97-124.
Lo, A.W., 1991. Long-term memory in stock market prices. Econometrica 59 (5), 1279-1313.
Lux, T., 1999. Multi-fractal processes as models for financial returns: A first assessment. Discussion Paper. University of Bonn.
Mandelbrot, B., 1963. The variation of certain speculative prices. Journal of Business 36, 394-419.
Mandelbrot, B., 1997a. Fractals and Scaling in Finance, Discontinuity, Concentration, Risk. Springer, New York.
Mandelbrot, B., 1997b. Fractals, Hasard, et Finance. Flammarion, Paris.
Mandelbrot, B., 1999. Survey of multifractality in Finance. Cowles Foundation Discussion Paper No. 1238. Yale University, New Haven.
Mandelbrot, B., Wallis, J., 1968. Noah, Joseph, and operational hydrology. Water Resources Research 4, 909-918.
Mansfield, P., Rachev, S.T., Samorodnitsky, G., 1999. Long strange segments of a stochastic process and long range dependence. The Annals of Applied Probability, to appear.
Marinelli, C., Rachev, S.T., Roll, R., 1999. Subordinated exchange rate models: Evidence for heavy-tailed distributions and long-range dependence.
Marinelli, C., Rachev, S.T., Roll, R., Göppl, H., 1999. Subordinated stock price models: Heavy tails and long-range dependence in the high-frequency Deutsche Bank price record.
McCulloch, H., 1971. Measuring the term structure of interest rates. Journal of Business 44 (1), 19-31.
McCulloch, H., 1975. The tax-adjusted yield curve. Journal of Finance 30 (3), 811-830.
Mittnik, S., Rachev, S.T., Paolella, M.S., 1997. Stable Paretian models in finance: Some theoretical and empirical aspects. In: Adler, R., et al. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques for Analyzing Heavy Tailed Distributions. Birkhäuser, Boston.
Peters, E.E., 1994. Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. Wiley.
Rachev, S.T., Mittnik, S., 2000. Stable Paretian Models in Finance. In: Wiley Series in Financial Economics. Wiley, New York.
Rachev, S.T., Samorodnitsky, G., 1999. Long strange segments in long range dependent moving average. Stochastic Processes and their Applications, to appear.
Rachev, S.T., Schwartz, E., Khindanova, I., 2000. Stable modeling of Value-at-Risk. In: Stable Models in Finance. Pergamon Press, to appear.
Samorodnitsky, G., Taqqu, M.S., 1994. Stable Non-Gaussian Random Processes. Chapman & Hall, New York.
Saunders, A., 1999. Credit Risk Measurement. New Approaches to Value at Risk and Other Paradigms. In: Wiley Frontiers in Finance. Wiley.
Taqqu, M.S., Teverovsky, V., 1998. On estimating the intensity of long-range dependence in finite and infinite variance time series.
In: Adler, R.J., Feldman, R., Taqqu, M.S. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Springer-Verlag.
Teverovsky, V., Taqqu, M.S., Willinger, W., 1995. Estimators for long-range dependence: An empirical study. Fractals 3 (4), 785-788.
Teverovsky, V., Taqqu, M.S., Willinger, W., 1998. A critical look at Lo's modified R/S statistic. Journal of Statistical Planning and Inference, to appear.
Willinger, W., Taqqu, M.S., Erramilli, A., 1996. A bibliographical guide to self-similar traffic and performance modeling for modern high speed networks. In: Kelly, F.P., Zachary, S., Ziedins, I. (Eds.), Stochastic Networks: Theory and Applications. Clarendon Press, Oxford, pp. 339-366.
Willinger, W., Taqqu, M.S., Teverovsky, V., 1999. Stock Market prices and Long Range Dependence. Online Publication, Springer-Verlag, Berlin.

Chapter 11

MULTIFACTOR STOCHASTIC VARIANCE MODELS IN RISK MANAGEMENT: MAXIMUM ENTROPY APPROACH AND LÉVY PROCESSES

ALEXANDER LEVIN
Group Risk Management, TD Bank Financial Group, Toronto
e-mail: Alex.Levin@td.com

ALEXANDER TCHERNITSER
Bank of Montreal, Toronto
e-mail: Alexander.Tchernitser@bmo.com

Contents
Abstract
1. Review of market risk models
1.1. Market risk management and Value-at-Risk
1.2. Statistical properties of the market risk factors
1.3. A short review of stochastic volatility models
2. Single-factor stochastic variance model
2.1. Maximum entropy approach and Lévy processes
2.2. Generalized Gamma Variance model
2.3. Mean-reverting stochastic variance model
3. Multifactor stochastic variance model
3.1. Requirements for multifactor VaR models
3.2. "Naïve" multifactor model
3.3. Elliptical stochastic variance model
3.4. Independent stochastic variances for the principal components
3.5. A model with correlated stochastic variances
3.5.1. Example 1. Joint distribution for DEM/USD and JPY/USD FX rates
3.5.2. Example 2. Twenty risk factors
3.6.
Calibration for the GSV model 472 Acknowledgment 477 References 477 The views expressed in this chapter are those of the authors and not necessarily of the Bank of Montreal. Handbook of Heavy Tailed Distributions in Finance, Edited by S.T. Rachev 2003 Elsevier Science B.V. All rights reserved 444 A. Levin and A. Tchernitser Abstract This chapter investigates a class of multifactor non-normalmodels for Market Risk Management, and, specifically, for Value-at-Risk (VaR) calculations, with stochastic variance (SV) driven by Lévy processes. Relevant statistical and dynamic properties for the risk factors are discussed. A short review of the Market Risk Management requirements and stochastic models for VaR is presented. In the case of one asset, a broad class of pure jump Generalized Gamma processes for the SV is derived from the Maximum Entropy principle. The corresponding family of Lévy processes for the risk factors (RF) possesses skewed leptokurtic marginal distributions with a wide range of heavy tails, from exponential and sub-exponential (stretched exponential) to polynomial. The introduced extended Generalized Gamma Variance family is a two shape parameter class of conditionally normal symmetric distributions (there is the third shape parameter in the case of non-zero skewness) with the SV represented as an arbitrary power (positive, zero or negative) of a gamma distribution. It includes normal, Variance Gamma (Generalized Laplace), Student t, and Weibull Variance Mixture distributions as special cases. Ornstein­Uhlenbeck type processes for the SV driven by positive Lévy noise and the corresponding term structure of the RF kurtosis and quantiles are considered for the purpose of modelling non-linear dependence in the asset returns. A general framework for constructing multidimensional conditionally Gaussian stochastic processes with the correlated multivariate stochastic variances that follow Lévy processes is considered. 
This methodology allows for different shape and tail behavior of the marginal RF and linear sub-portfolio distributions, exact fit into the RF correlation structure, and proper non-linear scaling of VaR for different holding periods. The empirical evidence presented for different markets confirms good agreement between the model and historical RF distributions. Effective numerical calibration and Monte Carlo simulation procedures are developed.

1. Review of market risk models

1.1. Market risk management and Value-at-Risk

Market Risk Management deals with the risk of potential portfolio losses due to adverse changes in the price of financial instruments caused by stochastic fluctuations of the market variables (JP Morgan, 1996; Basle Committee on Banking Supervision, 1997; Jorion, 2001; Crouhy, Galai and Mark, 2001). There are many types of general market and specific risk factors (RF) with different distributional properties and stochastic behavior in the foreign exchange, interest rate, commodity and equity markets. Market variables include, for example, stock prices, equity indices, spot foreign exchange rates, commodity prices, as well as complex aggregate structures: interest rate curves, commodity futures price curves, credit spread curves, implied volatility surfaces (e.g., European option implied volatility as a function of strike and maturity) or "cubes" (e.g., swaption implied volatility as a function of underlying swap tenor, swaption maturity and strike). Also, there are such "wild" and "exotic" market variables as, for example, electricity prices and interest rate or foreign exchange rate cross-correlations (changes in the latter variables affect the spread and cross-currency option prices).
Proper modelling of the multivariate future RF distributions is important for financial institutions for the purposes of accurate estimation of the market risk, identification of risk concentration, development of trading and hedging strategies, portfolio optimization, consistent measurement of the risk adjusted performance for different units (Risk Adjusted Return On Capital (RAROC) and Capital-at-Risk methodologies), setting up the trading limits, calculation of the regulatory capital (Basle Committee on Banking Supervision, 1997), and back-testing of the market risk models required by regulators (Basle Committee on Banking Supervision, 1996). Many financial institutions need to consistently estimate market risk for large portfolios and sub-portfolios (aggregation levels) that comprise hundreds of thousands of instruments dependent on thousands of risk factors in all markets. These portfolios usually include sub-portfolios of options, which magnify and non-linearly transform deviations of the underlyings.

Modern Market Risk Management is interested in comprehensive modelling of the multidimensional risk factor stochastic processes and marginal distributions for different time horizons rather than static multivariate distributions for some fixed holding period. This interest comes from the requirements to capture liquidity risk for many instrument types with varying liquidation periods [see Crouhy, Galai and Mark (2001)], estimate intraday risk for some frequently rebalanced positions, consistently evaluate VaR for the one-day and ten-day time horizons prescribed by BIS documents (Basle Committee on Banking Supervision, 1996, 1997) for back-testing and regulatory capital calculations respectively, and actively and dynamically manage risk. This problem points to the importance of adequately modelling the non-linear dependence in the underlying returns observed in the market in order to capture a proper VaR term profile.
Along with the RF volatilities (standard deviations of daily changes) and correlations combined with the portfolio sensitivities [Greeks, Hull (1999)], the most widely accepted methodology for measuring market risk is the Value-at-Risk approach. The VaR can be defined as the worst possible loss in the portfolio value over a given holding period (1 or 10 days) at the 99% confidence level (Jorion, 2001; Crouhy, Galai and Mark, 2001). Essentially, a mathematical model for VaR consists of two main parts: (1) modelling of proper multivariate risk factor distributions (processes) for the required time horizons; (2) evaluation of the portfolio (linear instruments, options and other derivatives) changes for the risk factor scenarios to produce a portfolio distribution. The evaluation part can be based on a full revaluation of the instrument prices or on partial revaluation methodologies [for example, the Delta–Gamma–Vega approximation (Hull, 1999)]. Regulators also require complementing the VaR analysis with stress testing (scenarios for crashes, extreme movements in the market, stresses of volatilities and correlations, etc.).

Traditional methods of VaR calculation are the analytical (variance–covariance) method (JP Morgan, 1996), historical simulation [combined with some bootstrapping procedures or other non-parametric methods (Crouhy, Galai and Mark, 2001)], and the parametric Monte Carlo simulation approach [see Duffie and Pan (1997)]. Primarily developed for the "normal" market conditions (multivariate Gaussian distribution for the risk factors), the variance–covariance method can be applied only to linear portfolios. The variance–covariance method can be extended from multivariate normal to non-normal elliptical RF distributions (see Section 3.3). VaR for option portfolios is usually calculated based on simulation approaches.
In this chapter, we concentrate on the parametric modelling of the RF distributions based on the Monte Carlo simulation procedures given an appropriate portfolio valuation methodology.

There are some market risk measures other than VaR closely related to the tails of the RF probability distributions, for example, Expected Shortfall [see Mausser and Rosen (2000)]. The Expected Shortfall is defined as an average loss calculated from the losses that exceed VaR. The Expected Shortfall, as a conditional mathematical expectation, is an example of the so-called coherent risk measures [see Artzner et al. (1999)] that, contrary to VaR, possess a natural subadditivity property (the total risk of the entire portfolio should be less than or equal to the sum of the risks of all sub-portfolios). In some cases, Expected Shortfall reflects the market risk better than VaR (it answers the question: what is the average of the worst case losses that occur at the corresponding confidence level?). This market risk measure is more sensitive to the tail behavior than VaR.

In general, it is wrong to say that only the tails of the underlying RF distributions are important for VaR or other risk measures. For example, the left tail for a portfolio of some barrier options or even European near at-the-money options may mostly depend on the central part of the underlying distribution. Therefore, it is necessary to accurately model all parts of the RF distributions, including peaks at the origin and tails.

Due to the short time horizons utilized in Market Risk Management (1–10 business days), in contrast to Credit Risk Management with its usual time horizons of years (Crouhy, Galai and Mark, 2001; Duffie and Pan, 2001), the market risk factors are defined as daily log-returns, relative or absolute changes in the underlying prices, rates or implied volatilities, rather than these underlyings themselves.
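To make the two tail measures above concrete, the following is a minimal sketch of how VaR and Expected Shortfall might be estimated from a sample of simulated or historical portfolio changes. The function name `var_and_es`, the sign convention (losses reported as positive numbers) and the simple order-statistic estimator are illustrative assumptions, not the procedure used by the authors.

```python
import random


def var_and_es(pnl, q=0.01):
    """Empirical VaR and Expected Shortfall at tail probability q.

    pnl: list of portfolio value changes (a loss is a negative change).
    Returns (VaR, ES) as positive loss figures.
    """
    losses = sorted(-x for x in pnl)              # losses in ascending order
    n_tail = max(1, int(round(q * len(losses))))  # number of worst outcomes kept
    tail = losses[-n_tail:]                       # the worst q-fraction of outcomes
    var = tail[0]                                 # smallest loss inside the tail
    es = sum(tail) / len(tail)                    # average loss at or beyond VaR
    return var, es


# Illustration on standard normal P&L (hypothetical data).
_rng = random.Random(0)
_pnl = [_rng.gauss(0.0, 1.0) for _ in range(20000)]
demo_var, demo_es = var_and_es(_pnl, q=0.01)
```

By construction the Expected Shortfall is never smaller than VaR at the same tail probability, which is one way to see why it reacts more strongly to the shape of the tail.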
Such long-term effects as mean-reversion in the interest rate, commodity price, and implied volatility dynamics (with characteristic times of 1–20 years) are not taken into account in the VaR modelling. Most financial variables are positive (although spreads and interest rate differentials may be negative). Except in some rare situations (e.g., Japanese interest rates), daily changes for the underlyings are much less than 100% of the notional values, and, therefore, there is no need to apply any positive transformations to the market variables, like exponential or square transformations. Heuristically, this means that in most cases one can use "linear" RF simulation models for the VaR calculation.

1.2. Statistical properties of the market risk factors

There is extensive empirical evidence that historical daily return distributions for different underlyings in the foreign exchange, interest rate, commodity, and equity markets have high peaks, "fat" tails (excess kurtosis, Figures 1 and 2) and skewness (right graph on Figure 2), in contrast to the normal distribution [see, for example, Mandelbrot (1960), Fama (1965), Duffie and Pan (1997), Müller, Dacorogna and Pictet (1998), Barndorff-Nielsen and Shephard (2000b), Rachev and Mittnik (2000), Bouchaud and Potters (2000), Cont (2001)]. Also, it is well known that the volatility of these financial variables varies stochastically with clustering (Bollerslev, Engle and Nelson, 1994) (see Figure 3). These distributional properties have a significant impact on Risk Management, specifically on VaR. A standard methodology usually used for the VaR calculation (JP Morgan, 1996) exploits a multivariate normal distribution as a proxy for the RF distributions. The standard model corresponds to stable market conditions when one can neglect large jumps of the underlyings and volatility fluctuations.
This results in underestimation of the actual VaR by the standard methodology and in back-testing breaches. A comprehensive RF simulation model should additionally capture the following important features observed in the market:

– different distributional shapes for different risk factors and markets (for example, short interest rates have much heavier tails, higher peaks and kurtosis than long term rates even for the same interest rate curve, Figure 1; some commodity price distributions deviate more from normal than others);
– an anomalously small normalization effect for large diversified portfolios compared to the one predicted by the Central Limit Theorem (for example, the S&P 500 Industrial Index or TSE 300 Index (Figure 2), viewed as large portfolios of stocks, have markedly non-normal distributions with kurtosis about ten). This phenomenon points to a non-linear dependence between different risk factors [see also Embrechts, McNeil and Straumann (1999)];
– normalization of the risk factor distributions for longer holding periods [for example, ten-day return distributions are significantly closer to normal than daily return distributions; on the other hand, intraday change distributions are clearly more distant from normal than daily ones (Müller, Dacorogna and Pictet, 1998; Cont, Potters and Bouchaud, 1997; Mantegna and Stanley, 2000)]. A decreasing term structure of kurtosis points to the same effect (Duffie and Pan, 1997; Bouchaud and Potters, 2000);
– volatility clustering and non-linear time dependence in risk factor returns (for example, statistically significant autocorrelation in squares of virtually uncorrelated daily returns, see top graph on Figure 3 and Figure 10 in Section 2.3).

Fig. 1. Variety of distributional shapes for CAD BA interest rate daily returns.

Fig. 2. Distributions for the CAD/USD FX and TSE 300 daily log-returns.

1.3. A short review of stochastic volatility models

In this chapter we restrict consideration of the SV models to the case of continuous time models. Time series approaches (ARCH, GARCH, etc.) (Bollerslev, 1986; Bollerslev, Engle and Nelson, 1994) are beyond the scope of the chapter.

L. Bachelier introduced the normal distribution and Brownian motion in finance in his Ph.D. Thesis (Bachelier, 1900) more than one hundred years ago. Brownian motion [that corresponds to a standard model for VaR (JP Morgan, 1996)] was rediscovered in finance in Osborne (1959), and then replaced by a Geometric Brownian motion for modelling of the stock dynamics (Samuelson, 1965). Without any doubt, the Black–Scholes–Merton (Black and Scholes, 1973) option pricing model has become a main tool in modern finance.

Fig. 3. Volatility clustering and large deviations in CAD/USD FX rate daily returns.

Since the well-known investigations of Mandelbrot (1960, 1963) and Fama (1965) on stable processes in the market, researchers have developed different approaches for modelling the abnormal behavior of the market variables. Fat-tailed distributions and jumps in the risk factors have usually been modelled by jump-diffusion processes (Merton, 1976, 1990; Bates, 1996; Kou, 2000), processes with diffusion stochastic volatility (Hull and White, 1987; Heston, 1993; Stein and Stein, 1991; Bates, 1991; Melino and Turnbull, 1990), mixtures of normal and other distributions (Duffie and Pan, 1997; Rachev and SenGupta, 1993; Albanese, Levin and Ching-Ming Chao, 1997), and other methods (Hull and White, 1998; Sornette, Simonetti and Andersen, 2000). Also, different types of non-Gaussian Lévy processes were used to describe the dynamics of underlyings [we refer to Bertoin (1996), Feller (1966), Lukacs (1970) and Sato (1999) for the theory of infinitely divisible distributions and Lévy processes].
Stable Paretian models in finance were considered in Mandelbrot (1960, 1963), Fama (1965), McCulloch (1978, 1996), Mittnik and Rachev (1989), Willinger, Taqqu and Teverovsky (1999), Rachev and Mittnik (2000), and other works [see also Samorodnitsky and Taqqu (1994), Janicki and Weron (1994), Nolan (1998) for the theory, simulation and estimation of stable processes]. Since the pioneering paper of Clark (1973), there has been a large body of research on subordinated Lévy processes in finance: the VG model (Madan and Seneta, 1990; Madan and Milne, 1991; Madan, 1999); Hyperbolic and Generalized Hyperbolic models (Barndorff-Nielsen, 1977, 1978, 1997, 1998; Eberlein and Keller, 1995; Embrechts, McNeil and Straumann, 1999; Eberlein and Raible, 1999) [see also Marinelli, Rachev and Roll (1999), Rachev and Mittnik (2000)]. The fine structure of asset returns from a Lévy process point of view was considered in Carr et al. (2000), Geman, Madan and Yor (1999, 1998) (CGMY model), Mantegna and Stanley (2000), Bouchaud and Potters (2000), Boyarchenko and Levendorskii (2000), Barndorff-Nielsen and Levendorskii (2001) (Truncated Lévy Flight). A general theory of conditionally normal stochastic variance and stochastic time change models is considered in Steutel (1970, 1973), Rosiński (1991), Maejima and Rosiński (2000), Barndorff-Nielsen and Pérez-Abreu (2000).

Most papers discuss the one-dimensional case with applications to option pricing. However, multidimensional models with a large number of risk factors are of significance for Risk Management. This chapter presents a new class of multivariate VaR models with the SV driven by Lévy processes.

2. Single-factor stochastic variance model

2.1. Maximum entropy approach and Lévy processes

Let a risk factor X denote a t-day absolute return, relative return, or log-return of the underlying market variable.
Value-at-Risk over a given holding period t with a specified confidence level q (usually, q = 1%) is defined as the q-quantile of the distribution of the portfolio changes during the period t. For the standard model, the RF probability density function is normal with given constant mean and variance. We consider a class of conditionally normal models where the variance V of the risk factor X is stochastic rather than constant.

The stochastic variance of the underlying returns is not directly observable in the market. Generally, the most reliable information about the SV is its average value over some period of time. It can be estimated from the sampling variance of the underlying returns. Under conditions of uncertainty, it is reasonable to adopt a conservative approach, i.e., choose a probability distribution for the SV that provides the most uncertain outcomes given only information about the average value. A well-known measure of uncertainty associated with probability distributions is entropy (Kagan, Linnik and Rao, 1973). Therefore, it is reasonable to determine the SV distribution from the Maximum Entropy principle. The proposed single-factor SV model is based on the following assumptions (Levin and Tchernitser, 1999a):

Assumption 1. The density function, $p_X(x,T)$, of the risk factor $X = X(T)$ for some holding period $T$ is normal conditional upon the stochastic variance $V = V(T)$ that possesses a probability density function $p_V(v,T)$, $v \ge 0$, i.e.,

$$p_X(x,T) = \int_0^\infty \frac{1}{\sqrt{2\pi v}} \exp\left(-\frac{(x - \theta v - \mu T)^2}{2v}\right) p_V(v,T)\,dv. \tag{1}$$

Parameter $\mu T$ specifies a constant part of the mean for the conditional normal distribution, and parameter $\theta$ defines a shift in the mean proportional to the SV. As is shown later, $\theta$ determines the correlation between the RF and SV that results in a skewed RF distribution. The case $\theta = 0$ corresponds to a symmetric distribution.
Linear dependence of the shift term $\theta v$ on $v$ in the mean of the normal density is important for the further construction of a Lévy process for the RF. The stochastic representation for X is as follows:

$$X(T) = \sqrt{V(T)}\,Z + \theta V(T) + \mu T, \qquad Z \sim N(0,1), \tag{2}$$

with Z being a standard normal random variable independent of $V(T)$.

Assumption 2. The average variance $E\{V(T)\}$ for the holding period T is known and equal to $\bar V$:

$$E\{V(T)\} = \int_0^\infty v\,p_V(v,T)\,dv = \bar V. \tag{3}$$

Assumption 3. The probability density function $p_V(v,T)$ of the stochastic variance $V(T)$ is defined by the Maximum Entropy principle given the average variance (3):

$$H(p_V) = -\int_0^\infty p_V(v)\ln p_V(v)\,dv \to \max_{p_V(v) \ge 0}. \tag{4}$$

The optimization problem (4) for the SV density $p_V(v)$ subject to the constraint on the average variance (3) and the standard normalization constraint $\int_0^\infty p_V(v)\,dv = 1$ has the exponential density

$$p_V(v) = \frac{1}{\bar V}\exp\left(-\frac{v}{\bar V}\right)$$

as a solution calculated by the Lagrange multiplier method (Kagan, Linnik and Rao, 1973). According to the Law of Total Probability, the unconditional density (1) of the risk factor $X(T)$ has the following form:

$$p_X(x,T) = \frac{1}{\gamma \bar V} \exp\bigl(-\gamma\,|x - \mu T| + \theta\,(x - \mu T)\bigr), \qquad \gamma = \sqrt{\theta^2 + 2/\bar V}. \tag{5}$$

Distribution (5) is known as the skewed double exponential (Laplace) distribution (Kotz, Kozubowski and Podgórski, 2001). This distribution has a sharp peak, exponential tails and non-zero skewness for $\theta \ne 0$. The kurtosis of a symmetric Laplace distribution is always equal to 6, in contrast to 3 for a normal distribution. Historical distributions of daily returns for many market variables, such as the CAD/USD FX rate (Figure 2), JPY/USD FX rate, S&P 500 Index, TSE 300 Index (Figure 2), NYMEX Natural Gas futures prices, some LIBOR rates, etc., have a similar leptokurtic shape (Levin and Tchernitser, 1999a; Kotz, Kozubowski and Podgórski, 2001).
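The passage from the exponential variance to the Laplace risk factor can be checked by simulation. The sketch below draws the variance from an exponential distribution with a given mean, applies the conditionally normal representation (2), and estimates the sample kurtosis, which should be close to 6 in the symmetric case. The names `simulate_laplace_rf`, `v_bar`, `theta` and `mu_T` (standing for the average variance, the skew parameter and the constant part of the mean) are illustrative assumptions.

```python
import math
import random


def simulate_laplace_rf(n, v_bar, theta=0.0, mu_T=0.0, seed=0):
    """Draw n risk factor values X = sqrt(V) Z + theta V + mu_T,
    with V exponential with mean v_bar and Z standard normal."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = rng.expovariate(1.0 / v_bar)  # maximum-entropy (exponential) variance
        z = rng.gauss(0.0, 1.0)           # independent standard normal shock
        out.append(math.sqrt(v) * z + theta * v + mu_T)
    return out


def sample_variance_and_kurtosis(xs):
    """Plain (biased) sample variance and kurtosis m4 / m2^2."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m2, m4 / (m2 * m2)
```

With `theta = 0` the sample variance should approach `v_bar` and the kurtosis should approach 6; a non-zero `theta` produces a skewed sample.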
In the case of a linear portfolio and a symmetric Laplace distribution for the RF, the impact of non-normality on VaR can be estimated as

$$\frac{\mathrm{VaR}_{\mathrm{Laplace}}}{\mathrm{VaR}_{\mathrm{Normal}}} = -\frac{\ln(2q)}{\sqrt{2}\,z_q},$$

where $z_q$ is the standardized normal quantile for the confidence level q, taken with a positive sign. For the case q = 1% ($z_q = 2.3263$), $\mathrm{VaR}_{\mathrm{Laplace}}$ for a linear portfolio is 19% higher than the standard $\mathrm{VaR}_{\mathrm{Normal}}$. The impact on VaR is even more pronounced for non-linear instruments. For example, for a non-linear perfectly delta-hedged option portfolio, $\Pi(x)$, within the Delta–Gamma approximation for the portfolio changes, $\Pi(x) = 0.5\,\Gamma x^2$, the corresponding formulas for VaR are as follows:

$$\mathrm{VaR}_{\mathrm{Laplace}} = -\frac{\Gamma \bar V}{4}\,\ln^2(q), \qquad \mathrm{VaR}_{\mathrm{Normal}} = -\frac{\Gamma \bar V}{2}\,\bigl(z_{q/2}\bigr)^2.$$

This results in a $\mathrm{VaR}_{\mathrm{Laplace}}$ number 60% higher than $\mathrm{VaR}_{\mathrm{Normal}}$ (Levin and Tchernitser, 1999a).

The exponential distribution for the SV was derived from the Maximum Entropy principle for some unspecified holding period T. To calculate VaR for different holding periods t, a stochastic process for the risk factor X is required. The standard normal model assumes that the risk factor X follows a Wiener process with independent stationary Gaussian increments. The simplest extension of this assumption is that the RF follows a Lévy process, i.e., a stochastic process with independent stationary (not necessarily Gaussian) increments. It can be shown (Rosiński, 1991) that within the class of conditionally normal models (2) this assumption is equivalent to the following assumption on the SV:

Assumption 4. The total stochastic variance $V(t)$ in (2) follows a positive increasing Lévy process.

The exponential distribution for $V(T)$ is infinitely divisible. It uniquely determines a positive increasing pure jump Gamma process [see Sato (1999)] for the total stochastic variance $V(t)$, $t > 0$, with a Gamma probability density function

$$p_{V(t)}(v) = \frac{v^{\lambda t - 1}}{\Gamma(\lambda t)\,\alpha^{\lambda t}} \exp\left(-\frac{v}{\alpha}\right), \tag{6}$$

where $\lambda = 1/T$, $\alpha = \bar V$.
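The 19% and 60% figures quoted above for q = 1% can be reproduced in a few lines. This is only a numerical check of the two ratios, with the normal quantiles taken as positive numbers; the function names are illustrative.

```python
import math
from statistics import NormalDist


def laplace_to_normal_var_linear(q):
    """VaR ratio for a linear portfolio: -ln(2q) / (sqrt(2) * z_q)."""
    z_q = -NormalDist().inv_cdf(q)  # positive standard normal quantile
    return -math.log(2.0 * q) / (math.sqrt(2.0) * z_q)


def laplace_to_normal_var_gamma(q):
    """VaR ratio for a delta-hedged (pure gamma) portfolio:
    (ln(q)^2 / 4) / (z_{q/2}^2 / 2)."""
    z_half = -NormalDist().inv_cdf(q / 2.0)  # two-sided quantile z_{q/2}
    return math.log(q) ** 2 / (2.0 * z_half ** 2)


ratio_linear = laplace_to_normal_var_linear(0.01)  # about 1.19
ratio_gamma = laplace_to_normal_var_gamma(0.01)    # about 1.60
```

Both ratios depend only on q: the average variance and the portfolio Gamma cancel because the two models are matched to the same variance.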
Assumptions 1–4 define the corresponding Lévy process for the risk factor $X(t)$ with the following probability density function:

$$p_X(x,t) = \sqrt{\frac{2}{\pi\,\gamma^{2\lambda t - 1}}}\;\frac{\exp(\theta y)}{\Gamma(\lambda t)\,\alpha^{\lambda t}}\;|y|^{\lambda t - 1/2}\,K_{\lambda t - 1/2}\bigl(\gamma |y|\bigr), \qquad y = x - \mu t. \tag{7}$$

Here $\gamma$ is defined in (5), $\Gamma(\cdot)$ is a gamma function, and $K_\eta(y)$ is a modified Bessel function of the third kind of order $\eta$, $K_{-\eta}(y) = K_\eta(y)$ (Abramowitz and Stegun, 1972). Distribution (7) is known as a Bessel K-function distribution (Johnson, Kotz and Balakrishnan, 1994) or as a Generalized Laplace distribution (Kotz, Kozubowski and Podgórski, 2001). Essentially, the SV model derived from the Maximum Entropy principle is equivalent to the Variance Gamma (VG) model [Gamma stochastic time change model, see Madan and Seneta (1990), Madan and Milne (1991), Geman and Ané (1996)]. The tail asymptotic behavior and the behavior at the origin for the density (7) follow from known asymptotics for the modified Bessel function $K_\eta(y)$ (Abramowitz and Stegun, 1972):

$$K_\eta(y) \underset{y \to \infty}{\sim} \sqrt{\frac{\pi}{2y}}\,e^{-y}, \qquad K_\eta(y) \underset{y \to 0}{\sim} \Gamma(\eta)\,2^{\eta - 1}\,y^{-\eta}, \quad \eta > 0, \qquad K_0(y) \underset{y \to 0}{\sim} -\ln(y).$$

The RF density (7) has exponential tails for all t and a wide range of shapes at the origin, from an almost normal "bell" shape (for large $\lambda t \gg 1$) to a highly peaked ($0.5 < \lambda t \le 1$) and even unbounded shape ($0 < \lambda t \le 0.5$) (see Figure 4). The skewed Laplace density (5) is a special case of (7) for $t = T$. The Bessel K-function family of distributions possesses finite moments of all orders. The characteristic function of the risk factor driven by the Gamma process has a simple form

$$\varphi_{X(t)}(\omega) = E\,e^{i\omega X(t)} = \frac{\exp(i\mu t\,\omega)}{\bigl(1 - i\alpha\theta\omega + \alpha\omega^2/2\bigr)^{\lambda t}}. \tag{8}$$

Fig. 4. Probability densities for the Gamma SV model.

The Lévy density from the Lévy–Khintchine representation of $\varphi_{X(t)}(\omega)$ that characterizes the intensity of jumps of different sizes x has the following closed form [see Sato (1999)]:

$$k(x) = \frac{\lambda}{|x|} \exp\left(-\sqrt{\theta^2 + 2/\alpha}\;|x| + \theta x\right).$$

The RF distribution (7) tends to a normal distribution for $t \to +\infty$. This normalization effect is important for a proper VaR scaling from short holding periods to longer ones.
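The characteristic function (8) follows in two steps from the conditionally normal representation (2) and the Gamma law (6) of the total variance. The derivation below is a sketch; it writes $\mu$ for the drift, $\theta$ for the skew parameter, and $\lambda t$ and $\alpha$ for the shape and scale of the Gamma variance, which are the symbol choices assumed in this reconstruction.

```latex
% Step 1: condition on the total variance V(t) = v. X(t) is then normal
% with mean \mu t + \theta v and variance v, so
\mathbb{E}\!\left[e^{i\omega X(t)} \,\middle|\, V(t) = v\right]
  = \exp\!\left(i\omega\mu t + \left(i\omega\theta - \tfrac{1}{2}\omega^{2}\right) v\right).

% Step 2: Laplace transform of a Gamma variable with shape \lambda t and scale \alpha:
\mathbb{E}\!\left[e^{sV(t)}\right] = (1 - \alpha s)^{-\lambda t},
  \qquad \operatorname{Re} s < 1/\alpha .

% Step 3: substitute s = i\omega\theta - \omega^{2}/2 (its real part is \le 0)
% and average Step 1 over V(t):
\varphi_{X(t)}(\omega)
  = \frac{\exp(i\mu t\,\omega)}
         {\left(1 - i\alpha\theta\omega + \tfrac{1}{2}\alpha\omega^{2}\right)^{\lambda t}} .
```

Since the time t enters only through the factor $\exp(i\mu t\omega)$ and the exponent $\lambda t$, the characteristic function over a period $t_1 + t_2$ is the product of those over $t_1$ and $t_2$, which is the independent stationary increments property of a Lévy process.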
The total variance $D_X(t)$ is proportional to time, as it is for any Lévy process with finite variance (Feller, 1966) (a "square root of time" rule for the volatility is valid). However, contrary to the Gaussian case, the ratios of q-quantiles and standard deviation for the RF distributions (7) are not constant for different holding periods t. For example, the standardized 1%-quantile ($\mathrm{VaR}_{\mathrm{VG}}$) is higher for a shorter holding period than the same 1%-quantile for a longer holding period (Figure 5).

Fig. 5. VG model 1%-VaR term structure with respect to 1% Normal VaR = 2.33.

The entropy for the SV distribution standardized by time t (the mean of a standardized SV is equal to 1 for all t) has its maximum at $t = T$ (Figure 6), which corresponds to the exponential distribution. This property may be explained by the transition of the standardized Gamma density from the delta-function at 0 to the delta-function at 1 as time t passes. Heuristically, this evolution of shape for the SV density corresponds to a transition from the state of maximum certainty at time 0 to the limiting state of maximum certainty at $t = \infty$ (with the limiting normal density for the standardized RF).

Fig. 6. Evolution of standardized Gamma SV density and entropy.

The following expressions provide a connection between the first four moments of the RF distribution and those of the SV distribution (Levin and Tchernitser, 2000a):

$$\begin{aligned}
m_X(t) &= \mu t + \theta\,m_V(t), \qquad D_X(t) = m_V(t) + \theta^2 D_V(t),\\
m_{3,X}(t) &= \theta\bigl[3 D_V(t) + \theta^2 m_{3,V}(t)\bigr],\\
m_{4,X}(t) &= 3 m_V^2(t) + 3 D_V(t) + 6\theta^2 m_V(t) D_V(t) + 6\theta^2 m_{3,V}(t) + \theta^4 m_{4,V}(t).
\end{aligned} \tag{9}$$

The expressions (9) for the moments are valid for conditional normal models of the form (1) provided that the distribution $p_{V(t)}(v)$ for the stochastic variance $V(t)$ possesses moments up to the fourth order. Parameter $\theta$ controls the skewness of the RF distribution and defines the correlation $\rho_{X,V}$ between the risk factor X and its stochastic variance V:

$$\rho_{X,V} = \frac{\theta\sqrt{D_V}}{\sqrt{m_V + \theta^2 D_V}}.$$
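The moment relations (9) translate directly into code. The sketch below takes the mean and central moments of the stochastic variance as inputs and returns the corresponding risk factor moments, the kurtosis and the correlation with the variance. The helper `gamma_sv_moments` uses the standard central moments of a Gamma variable with shape `lam * t` and scale `alpha`; all function and parameter names are illustrative assumptions.

```python
import math


def gamma_sv_moments(lam, alpha, t):
    """Mean and 2nd-4th central moments of a Gamma variable
    with shape lam * t and scale alpha."""
    k = lam * t
    return k * alpha, k * alpha ** 2, 2.0 * k * alpha ** 3, 3.0 * k * (k + 2.0) * alpha ** 4


def rf_moments(mu, theta, t, m_v, d_v, m3_v, m4_v):
    """Risk factor moments from stochastic variance moments, relations (9)."""
    mean = mu * t + theta * m_v
    var = m_v + theta ** 2 * d_v
    m3 = theta * (3.0 * d_v + theta ** 2 * m3_v)
    m4 = (3.0 * m_v ** 2 + 3.0 * d_v + 6.0 * theta ** 2 * m_v * d_v
          + 6.0 * theta ** 2 * m3_v + theta ** 4 * m4_v)
    rho = theta * math.sqrt(d_v) / math.sqrt(var)  # correlation between X and V
    return {"mean": mean, "var": var, "m3": m3, "m4": m4,
            "kurtosis": m4 / var ** 2, "rho": rho}
```

For example, `rf_moments(0.0, 0.0, 1.0, *gamma_sv_moments(1.0, 1.0, 1.0))` describes the symmetric Laplace case and returns a kurtosis of 6, while a ten times longer holding period with the same parameters gives a kurtosis of 3.3.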
A parameter estimation procedure (model calibration) with respect to the four parameters $\mu$, $\theta$, $\lambda$, and $\alpha$ can be based either on the Maximum Likelihood approach or on the method of moments, given four sampling central moments for the $T_1$-day underlying returns and analytical expressions for the moments of the Gamma stochastic variance (Johnson, Kotz and Balakrishnan, 1994):

$$m_V(T_1) = \lambda\alpha T_1, \qquad D_V(T_1) = \lambda\alpha^2 T_1, \qquad m_{3,V}(T_1) = 2\lambda\alpha^3 T_1, \qquad m_{4,V}(T_1) = 3\lambda\alpha^4 T_1(\lambda T_1 + 2).$$

Equations (9) can be used for the model calibration by the method of moments. Note that the time T, corresponding to the maximum entropy for the SV density, can be recovered from the calibrated parameter $\lambda$ as $T = 1/\lambda$. It follows from (6) and (9) that the term structure of the RF variance and kurtosis for the symmetric case of the Gamma SV model ($\theta = 0$) is:

$$D_X(t) = \lambda\alpha t, \qquad \mathrm{Kurt}_X(t) - 3 = \frac{3}{\lambda t}. \tag{10}$$

2.2. Generalized Gamma Variance model

Some market variables exhibit jumps as large as 5 to 10 daily standard deviations (Fama, 1965; Bouchaud and Potters, 2000; Mantegna and Stanley, 2000; Cont, 2001). Over the corresponding periods of observation, such events have a significantly lower theoretical probability of occurring, not only for the normal model, but even for the Gamma SV model with exponential tails. Extremely large jumps in the risk factors have often been described by distributions with polynomial tails, specifically by stable distributions (Mandelbrot, 1960, 1963; Mittnik and Rachev, 1989, 2000). However, stable Paretian distributions do not have finite variance (volatility). This contradicts the majority of empirical observations [see Müller, Dacorogna and Pictet (1998)]. Also, volatility is a main tool in financial risk management and pricing. Therefore, heavy tailed distributions with finite variance are of considerable interest for finance applications.
An example of such a distribution widely discussed in the financial literature is the Student t-distribution (Platen, 1999; Albanese, Levin and Ching-Ming Chao, 1997; Rachev and Mittnik, 2000). A new family of RF distributions introduced below includes the t-distribution as a special case.

The symmetric Gamma SV model considered above has only one shape parameter, $\lambda$, that controls both the tails and the central part of the distribution. It seems that one shape parameter is insufficient to distinguish between sources of high kurtosis: whether it comes from heavy tails or a high peak. It is possible to show that for a class of conditional normal models the tail asymptotics of the RF distribution depend upon the tail asymptotics of the corresponding SV distribution. Therefore, a more general SV model that allows for separate control of the tails and the peak should describe large deviations of the risk factors more successfully.

Note that the Gamma SV density (6) can be formally derived from the Maximum Entropy principle (4) without Assumption 4. Instead, one can use a constraint on the logarithmic moment $E\{\ln(V)\}$ in addition to the condition on the average variance $E\{V\}$ (Kagan, Linnik and Rao, 1973). Essentially, this logarithmic constraint defines a power behavior of the SV density at the origin, while the constraint on $E\{V\}$ defines the exponential tail behavior. The condition on average variance can be replaced by a more flexible condition to accommodate information on a generalized moment of any power for the SV (Levin and Tchernitser, 2000a, b). For example, one can assume that the average volatility is known instead of the average variance. This approximately corresponds to a constraint on the fractional moment $E\{\sqrt{V}\}$ instead of $E\{V\}$. Hence, we can formally define the entropy maximization problem (4) with two essential constraints

$$\int_0^\infty \ln(v)\,p_V(v)\,dv = c_0, \qquad \int_0^\infty v^{1/\nu}\,p_V(v)\,dv = c_1 \tag{11}$$

and a standard normalization constraint for a probability density function.
The use of the Maximum Entropy approach with a constraint on the generalized moment $E\{V^{1/\nu}\}$, $\nu \in R^1$, allows for a desirable generalization of the Gamma SV model to a broad class of models with a wide range of heavy tails, from exponential and sub-exponential (stretched exponential) to polynomial (Levin and Tchernitser, 2000a, b). A solution of the maximization problem (4), (11) is the Generalized Gamma density for the stochastic variance V:

$$p_V(v) = \frac{v^{\lambda/\nu - 1}}{|\nu|\,\alpha^{\lambda}\,\Gamma(\lambda)} \exp\left(-\frac{v^{1/\nu}}{\alpha}\right). \tag{12}$$

The corresponding stochastic representation for V is the $\nu$-th power of a Gamma distributed random variable $G$ with the density (6) [see Johnson, Kotz and Balakrishnan (1994)]:

$$V = G^{\nu}. \tag{13}$$

Stochastic representations (2) and (13) allow for an effective Monte Carlo simulation procedure for the RF given well-known simulation procedures for normal and gamma random variables (Fishman, 1996).

The Generalized Gamma distribution is a very flexible class of distributions with two shape parameters $\lambda$ and $\nu$. This class includes Gamma ($\nu = 1$), Inverse Gamma ($\nu = -1$), and Weibull ($\lambda = 1$, $\nu > 0$) distributions as special cases. It is known that the Generalized Gamma distribution is infinitely divisible for these three representatives [see Grosswald (1976), Ismail (1977), Sato (1999)] and for positive max(,1) (Ismail and Kelker, 1979). Therefore, for these cases the Generalized Gamma distribution produces Lévy processes for the SV. We do not know if the Generalized Gamma distribution is infinitely divisible for arbitrary values of $\nu \in R^1$, nor whether there is a closed form representation for the characteristic function. Hence, we apply the distribution (12) to describe the returns for the shortest holding period available, say one day, and then construct an additive SV process for a longer holding period, say 10 days, by summing up the independent Generalized Gamma distributed random variables.
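The simulation procedure described above can be sketched as follows: each daily variance is drawn as a power of a gamma variable, as in (13), the daily variances are summed over the holding period, and the risk factor is then generated from the conditionally normal representation (2). The parameter names `lam`, `alpha`, `nu`, `theta` and `mu` (gamma shape and scale, the power, the skew and the daily drift) are assumptions made for illustration, and this is not the authors' production procedure.

```python
import math
import random


def ggv_return(rng, days, lam, alpha, nu, theta=0.0, mu=0.0):
    """One simulated risk factor value over a holding period of `days` days.

    Daily variances are independent G ** nu with G ~ Gamma(shape=lam, scale=alpha);
    their sum is the total variance used in the conditionally normal step.
    """
    v_total = sum(rng.gammavariate(lam, alpha) ** nu for _ in range(days))
    z = rng.gauss(0.0, 1.0)  # standard normal shock, independent of the variance
    return math.sqrt(v_total) * z + theta * v_total + mu * days


def simulate_ggv(n, days, lam, alpha, nu, theta=0.0, mu=0.0, seed=0):
    """Draw n independent holding-period returns."""
    rng = random.Random(seed)
    return [ggv_return(rng, days, lam, alpha, nu, theta, mu) for _ in range(n)]
```

For instance, `simulate_ggv(20000, 10, lam=1.0, alpha=1.0, nu=1.0)` produces ten-day returns for the Gamma variance special case, whose sample variance should be close to 10; setting `nu=-1.0` gives polynomial (Student t type) tails for the one-day returns.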
An analytical formula for the moments of the Generalized Gamma distribution is readily available:

\[ E\{V^k\} = \beta^{\nu k}\, \frac{\Gamma(\alpha + \nu k)}{\Gamma(\alpha)} \]

(the condition for the $k$-th moment to exist is $\alpha + \nu k > 0$). The corresponding RF density $p_X(x)$ is given by the integral (1) with the SV density $p_V(v)$ of the form (12). We call this density the Generalized Gamma Variance (GGV) density. Unfortunately, in the general case there is no closed analytical form for the density $p_X(x)$. However, we consider an effective numerical procedure for calculating the integral (1) to be as good as, for example, the "closed form" formula (7) involving special K-Bessel functions. Effective asymptotic expansion methods (Olver, 1974; Abramowitz and Stegun, 1972) can be applied for the numerical calculations.¹

¹ An effective numerical procedure and software for the GGV density calculation were developed by Xiaofang Ma.

In the case of a symmetric GGV density, there is an analytical formula for the moments of any fractional order $k$ (finite for $\alpha + \nu k/2 > 0$):

\[ E\{|X|^k\} = \frac{2^{k/2}\, \beta^{\nu k/2}\, \Gamma((k+1)/2)\, \Gamma(\alpha + \nu k/2)}{\sqrt{\pi}\, \Gamma(\alpha)}. \tag{14} \]

The moments cease to exist for some combinations of negative values of $\nu$ and $k > 0$ because of the polynomial tails of the GGV density.

Below, we provide some results for a symmetric density $p_X(x)$. There are some known special analytical cases for $p_X(x)$: (i) $\nu = -1$ corresponds to the t-distribution with $2\alpha$ degrees of freedom; (ii) $\nu = 0$ corresponds to the Gaussian distribution; (iii) $\nu = +1$ corresponds to the K-Bessel function distribution (7). Table 1 presents a summary of results for the Generalized Gamma Variance model, including the constraint on the generalized moment in the Maximum Entropy principle (column 1), the SV stochastic representation (column 2), the corresponding RF density (column 3), and the asymptotics for the tails of the RF density (column 4).
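Formula (14) is easy to evaluate numerically, and the special cases listed above give simple checks. The sketch below assumes the form of (14) with Gamma shape `alpha`, scale `beta` and power `nu`; the function names are hypothetical. The kurtosis $E\{X^4\}/E\{X^2\}^2$ does not depend on the scale, so `beta` is fixed at one there.

```python
import math


def ggv_abs_moment(k, alpha, beta, nu):
    """E|X|^k for the symmetric GGV density, following the form of (14)."""
    a = alpha + nu * k / 2.0
    if a <= 0.0:
        # Polynomial tails: the moment of this order does not exist.
        raise ValueError("moment of order k does not exist")
    return (2.0 ** (k / 2.0) * beta ** (nu * k / 2.0)
            * math.gamma((k + 1) / 2.0) * math.gamma(a)
            / (math.sqrt(math.pi) * math.gamma(alpha)))


def ggv_kurtosis(alpha, nu):
    """Kurtosis E{X^4} / E{X^2}^2; the scale parameter cancels."""
    return ggv_abs_moment(4, alpha, 1.0, nu) / ggv_abs_moment(2, alpha, 1.0, nu) ** 2


if __name__ == "__main__":
    print(ggv_kurtosis(3.0, 0.0))    # Gaussian case, nu = 0
    print(ggv_kurtosis(5.0, -1.0))   # t-distribution with 2 * alpha = 10 degrees of freedom
    print(ggv_kurtosis(2.0, 1.0))    # K-Bessel (Gamma variance) case
```

For $\nu = -1$ the result agrees with the textbook kurtosis $3(n-2)/(n-4)$ of a t-distribution with $n = 2\alpha$ degrees of freedom, and for $\nu = 1$ it reduces to $3(1 + 1/\alpha)$.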
Some market variables are better described by distributions with polynomial tails, while others are better described by distributions with semi-heavy tails (exponential and sub-exponential) [see Platen (1999), Rachev and Mittnik (2000), Duffie and Pan (1997)]. The GGV model is capable of accommodating both types of behavior. Values $\nu < 0$ correspond to power-law tails; the GGV density is finite at zero for all $\nu < 0$. Values $\nu > 0$ correspond to exponential and sub-exponential tails. In this case, the tails are far lighter and moments of all orders exist. The range $\nu > 1$ corresponds to the class of stretched exponential densities $p_X(x)$. A specific class of stretched exponential distributions based on a modified Weibull density was considered in Sornette, Simonetti and Andersen (2000). Figure 7 shows the RF GGV density $p_X(x)$ for different values of the parameters $\alpha$ and $\nu$. The parameter $\nu$ brings extra flexibility to the GGV density: the GGV model can accommodate a wide variety of shapes and tail behavior.

Table 1. GGV model summary

| Constraint on $E\{V^{1/\nu}\}$ | SV density and stochastic representation | RF density | RF asymptotics, $x \to \infty$ |
|---|---|---|---|
| $E\{V\}$, $\nu = 1$ | Gamma, $V = \gamma$ | K-Bessel | $x^{\alpha-1} e^{-cx}$ |
| $E\{\sqrt{V}\}$, $\nu = 2$ | Square of Gamma, $V = \gamma^2$ | GGV$(2,\alpha)$ | $x^{2\alpha/3-1} e^{-cx^{2/3}}$ |
| $E\{1/V\}$, $\nu = -1$ | Inverse Gamma, $V = 1/\gamma$ | t-distribution | $x^{-(2\alpha+1)}$ |
| $E\{V^{1/\nu}\}$, $\nu > 0$ | Generalized Gamma, $V = \gamma^{\nu}$ | GGV$(\nu,\alpha)$ | $x^{2\alpha/(1+\nu)-1} e^{-cx^{2/(1+\nu)}}$ |
| $E\{V^{1/\nu}\}$, $\nu < 0$ | Generalized (Inverse) Gamma, $V = \gamma^{\nu}$ | GGV$(\nu,\alpha)$ | $x^{-(2\alpha/|\nu|+1)}$ |
| $\nu = 0$ | SV degenerates to $V \equiv 1$ | Normal | $e^{-x^2/2}$ |

Fig. 7. Generalized Gamma Variance densities.

A statistical investigation of different SV models from the Generalized Hyperbolic family, based on historical data for 15 stock market indices, was presented by Platen (1999). The class of Generalized Hyperbolic distributions developed in Barndorff-Nielsen (1978, 1998), Eberlein and Keller (1995), and Eberlein, Keller and Prause (1998) is also a two shape parameter family in the symmetric case. All members of this family have exponential tails except the Student t-distribution, which has polynomial tails. For this specific case, the class of Generalized Hyperbolic distributions collapses to a one shape parameter (number of degrees of freedom) family. Four representatives of the Generalized Hyperbolic class (Student t, Normal Inverse Gaussian, Variance Gamma, and Hyperbolic distributions) were compared based on the Maximum Likelihood criterion. The last three of these distributions have exponential tails. The results presented in Platen (1999) show that all distributions with exponential tails fail the Pearson $\chi^2$ test. In contrast, the t-distribution was not rejected at the 99% confidence level for ten of the fifteen indices.

Two-parameter Paretian tail GGV distributions perform better than the t-distribution. As an example, Figure 8 shows a fit of the Canadian 3-month BA interest rate daily return density (1992–1998) by normal, Student t, and GGV densities calibrated using the Maximum Likelihood approach. The GGV$(\nu,\alpha)$ density with optimal parameters $\nu = -5.5$ and $\alpha = 15$ outperforms the t-distribution, and both the GGV and t-distributions significantly outperform the normal distribution. The $\chi^2$ value for the calibrated GGV density is about 80% less than the $\chi^2$ value for the calibrated t-distribution. It is interesting to note that during the period 1992–1998 the Canadian 3-month BA interest rate exhibited 14 large daily moves greater than four standard deviations (about 1% of all observations).

Fig. 8. Historical and calibrated GGV densities for the CAD 3-month BA interest rate daily log-returns.

Another example (Figure 9) shows the GGV model log-likelihood surface for the S&P 500 Index as a function of the parameters $\alpha$ and $\nu$. A deep minimum at $\nu = 0$ corresponds to the normal distribution, while the two wings correspond to the power-law ($\nu < 0$) and stretched exponential ($\nu > 1$) tailed distributions.

Fig. 9. GGV model log-likelihood surface for S&P 500.
For this example, the stretched exponential sub-class produces almost the same maximum likelihood value as the power-law sub-class.

2.3. Mean-reverting stochastic variance model

So far, we have considered a class of SV models driven by Lévy processes with independent, identically distributed, but not necessarily Gaussian increments. The model explains the non-normality of the RF distributions. For any conditional normal SV model, expressions (9) provide an exact answer for the term structures of the risk factor variance and kurtosis:

\[ D_X(t) = m_V(t), \qquad \mathrm{Kurt}_X(t) - 3 = \frac{3\, D_V(t)}{m_V^2(t)}. \tag{15} \]

Here $V(t)$ is the total variance. Since $m_V(t)$ and $D_V(t)$ are linear functions of time for any Lévy process $V(t)$, the above expressions predict a linear increase of the RF variance and a hyperbolic ($1/t$) decrease of its excess kurtosis. However, empirical investigations show that the underlying returns are almost uncorrelated, but not independent [see Bouchaud and Potters (2000), Cont (2001), Müller, Dacorogna and Pictet (1998)]. The easiest way to demonstrate this dependence is to consider the empirical correlations of the absolute values or squares of the returns. Figure 10 shows that the autocorrelations of the squares are statistically significant. This phenomenon is connected with the well-known volatility clustering effect (Figure 3). It is also known that the empirical term structure of kurtosis decreases more slowly than predicted by the "Lévy term structure" model (15) [so-called "anomalous decay", see Bouchaud and Potters (2000), Cont (2001)]. All this suggests that a better model for the instantaneous stochastic volatility is not a "white noise" kind of process, but rather a process with autocorrelation. One way to account for the autocorrelation structure of the SV is to consider regime-switching SV processes [see Konikov and Madan (2000)].
We will follow another approach to introducing the SV autocorrelation by considering Ornstein–Uhlenbeck (OU) type processes for the instantaneous SV (Levin and Tchernitser, 1999a, 2000a). This class of non-Gaussian OU type processes driven by positive Lévy noise was investigated in detail in Barndorff-Nielsen and Shephard (2000a, b). In this section we only demonstrate that the empirically observed term structure of kurtosis can be consistently described by such models.

Consider a stationary non-negative process $\upsilon(t)$ with autocorrelation function $R_\upsilon(\tau)$ that describes the instantaneous stochastic variance. For the total variance

\[ V(t) = \int_0^t \upsilon(t')\, dt', \]

it follows that

\[ m_V(t) = m_V(1)\, t, \qquad D_V(t) = 2 \int_0^t (t - \tau)\, R_\upsilon(\tau)\, d\tau. \]

The above expressions in conjunction with (15) can be used to calculate the term structure of the RF kurtosis. In particular, assume that the mean-reverting process for the instantaneous stochastic variance $\upsilon(t)$ is an Ornstein–Uhlenbeck type process

\[ d\upsilon(t) = -\lambda\, \upsilon(t)\, dt + dG(t), \tag{16} \]

where $G(t)$ is, for example, a Gamma process with variance $\sigma^2$ per unit time, and $\lambda > 0$ is the mean-reversion speed parameter. The expressions for $R_\upsilon(\tau)$ and the variance $D_V(t)$ are as follows:

\[ R_\upsilon(\tau) = \frac{\sigma^2}{2\lambda}\, e^{-\lambda|\tau|}, \qquad D_V(t) = \frac{\sigma^2 t}{\lambda^2} \left( 1 - \frac{1 - e^{-\lambda t}}{\lambda t} \right). \]

Fig. 10. Autocorrelation in the squared CAD/USD FX daily returns and in the SV.

The autocorrelation function $R_\upsilon(\tau)$ is an exponential function for any OU model (16). Note that $D_V(t)$ is not a linear function of time, contrary to the Lévy case (10). The previous formulas together with (15) result in the following term structure of the RF kurtosis:

\[ \mathrm{Kurt}^{OU}_X(t) - 3 = \frac{3\sigma^2}{\lambda^2 m_V^2(1)\, t} \left( 1 - \frac{1 - e^{-\lambda t}}{\lambda t} \right); \]

for a Gamma process $G(t)$ with shape parameter $\alpha$ per unit time the prefactor reduces to $3/(\alpha t)$.

Figure 11 shows the term structure of the RF kurtosis for different values of the mean-reversion speed parameter $\lambda$. As expected, the OU stochastic variance process provides a slower decay of kurtosis than the Lévy SV process.

Fig. 11. Term structure of the RF kurtosis for the model with autocorrelated SV.
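The slower decay can be checked numerically. The sketch below is an illustration under stated assumptions: `sigma2` denotes the variance of the driving process per unit time and `mean_rate` the stationary mean of the instantaneous variance, so that $m_V(t)$ equals `mean_rate * t`; all function names are hypothetical. It evaluates the closed form of $D_V(t)$ for an exponential autocorrelation function, cross-checks it against a direct numerical integration of $2\int_0^t (t-\tau) R(\tau)\,d\tau$, and compares the resulting excess kurtosis with the $1/t$ decay of a Lévy model matched at $t = 1$.

```python
import math


def total_variance_var(t, lam, sigma2):
    """Closed form D_V(t) = sigma2 * t / lam^2 * (1 - (1 - exp(-lam t)) / (lam t))."""
    x = lam * t
    # expm1(-x) = exp(-x) - 1, so 1 + expm1(-x) / x equals 1 - (1 - exp(-x)) / x.
    return sigma2 * t / lam ** 2 * (1.0 + math.expm1(-x) / x)


def total_variance_var_numeric(t, lam, sigma2, steps=20_000):
    """Midpoint-rule evaluation of 2 * int_0^t (t - tau) R(tau) dtau."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        r = sigma2 / (2.0 * lam) * math.exp(-lam * tau)  # exponential autocorrelation
        total += 2.0 * (t - tau) * r * h
    return total


def excess_kurtosis_ou(t, lam, sigma2, mean_rate):
    """3 * D_V(t) / m_V(t)^2 with m_V(t) = mean_rate * t, following (15)."""
    return 3.0 * total_variance_var(t, lam, sigma2) / (mean_rate * t) ** 2


def excess_kurtosis_levy(t, one_period_excess):
    """Hyperbolic decay implied by a Levy total variance: excess(t) = excess(1) / t."""
    return one_period_excess / t


if __name__ == "__main__":
    lam, sigma2, mean_rate = 0.3, 0.2, 1.0
    base = excess_kurtosis_ou(1.0, lam, sigma2, mean_rate)
    for t in (1, 5, 10, 20):
        print(t, excess_kurtosis_ou(t, lam, sigma2, mean_rate), excess_kurtosis_levy(t, base))
```

With a small mean-reversion speed the OU excess kurtosis at a 10-period horizon stays several times larger than the Lévy value, while for large `lam` the two term structures converge.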
The reduction in decay can be significant, depending on the mean-reversion speed $\lambda$. This is equivalent to a slower "normalization" effect. The bottom graph in Figure 10 presents the time series for the simulated SV and the empirical CAD/USD FX rate squared daily log-returns. The bottom graph in Figure 3 presents the simulated RF time series. The model evidently produces large deviations for the FX rate and a volatility clustering effect very similar to the one observed in the market (top graph in Figure 3).

3. Multifactor stochastic variance model

3.1. Requirements for multifactor VaR models

A realistic multifactor VaR model should consistently describe not only the correlation and volatility structure of the risk factors, but also the different shapes of the marginal risk factor distributions and of the distributions in other "diagonal" directions. Also, a principal component analysis of daily returns in different markets (interest rate curves, commodity futures prices, implied volatility curves and surfaces) clearly indicates the presence of non-linear dependence between risk factors (principal components). For example, the squared daily changes of the principal components are significantly correlated, while the daily changes themselves are uncorrelated. This non-linear dependence breaks the conditions of the Central Limit Theorem and has an important impact on VaR calculation: even for well-diversified linear portfolios with a large number of instruments there is no full normalization of the portfolio return distributions (Levin and Tchernitser, 1999a, b). An example of such a large diversified portfolio is the S&P 500 Index. Its distribution is quite far from normal despite the portfolio averaging effect. Hence, a comprehensive model for multiple risk factors should additionally capture the following important features observed in the market:

* exact match of a given volatility and correlation structure of the risk factors;
* approximate match of shapes, kurtosis, and tails for different risk factors (marginal distributions);
* approximate match of shapes, kurtosis, and tails for different linear sub-portfolios (marginal distributions in diagonal directions).

The model should also allow for an effective Monte Carlo simulation procedure. To facilitate further multivariate analysis, in the sequel we consider the case of symmetric joint probability distributions for the RF returns.

3.2. "Naïve" multifactor model

A very simple idea for constructing a multivariate conditionally Gaussian stochastic variance model is to define the distribution of the vector of risk factors $X(t) \in R^N$ as a multivariate normal with some fixed correlation matrix $R$ and independent stochastic variances $V_i(t)$, $i = 1,\dots,N$. The symmetric multivariate probability density function for the vector of risk factors is represented as

\[ p_{X(t)}(x) = \int_{V_1} \cdots \int_{V_N} \frac{1}{\sqrt{(2\pi)^N \det(C)}} \exp\left( -\frac{x' C^{-1} x}{2} \right) p_V(V_1,\dots,V_N)\, dV_1 \cdots dV_N, \tag{17} \]

\[ C = \Sigma R \Sigma, \qquad \Sigma = \mathrm{diag}\left( \sqrt{V_1(t)},\dots,\sqrt{V_N(t)} \right). \tag{18} \]

Here $p_V(V_1,\dots,V_N) = \prod_{i=1}^N p_{V_i}(V_i)$ is the probability density of the independent stochastic variances $V_i(t)$, and $x'$ is the transpose of $x$. The corresponding stochastic representation for the risk factors $X(t)$ is

\[ X(t) = \mathrm{diag}\left( \sqrt{V_1(t)},\dots,\sqrt{V_N(t)} \right) A Z, \qquad AA' = R, \qquad Z \sim N(0,I), \tag{19} \]

where $Z \sim N(0,I)$ is a standard normal vector with identity covariance matrix $I$, independent of $V$. This representation allows modelling of marginal distributions with different leptokurtic shapes. However, it can be shown that this "naïve" approach reduces the correlations between risk factors because of the "randomization" of the covariance matrix (Levin and Tchernitser, 1999a).
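The size of the randomization effect is easy to compute when the variances are Gamma distributed. As a sketch, assume independent unit-mean Gamma variances with shape $\alpha_i$ (scale $1/\alpha_i$), so that $E\{\sqrt{V_i}\} = \Gamma(\alpha_i + 1/2) / (\Gamma(\alpha_i)\sqrt{\alpha_i})$ and the model correlation equals $R_{ij}\, E\{\sqrt{V_i}\}\, E\{\sqrt{V_j}\}$ for $i \neq j$. The function names are hypothetical.

```python
import math


def sqrt_moment_gamma(alpha):
    """E{sqrt(V)} for a unit-mean Gamma variance with shape alpha and scale 1/alpha."""
    return math.gamma(alpha + 0.5) / (math.gamma(alpha) * math.sqrt(alpha))


def reduction_factor(alpha_i, alpha_j):
    """Factor multiplying R_ij in the model correlation when V_i and V_j are independent."""
    return sqrt_moment_gamma(alpha_i) * sqrt_moment_gamma(alpha_j)


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(alpha, round(reduction_factor(alpha, alpha), 2))
```

For equal shape parameters 0.5, 1, 2, 5 and 10 the factors round to 0.64, 0.79, 0.88, 0.95 and 0.98, the values reported in Table 2 of this section; the heavier the tails (smaller shape), the stronger the loss of correlation.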
Due to the independence of the stochastic variances $V_i$, the absolute values of the model correlations $\mathrm{Corr}(X_i,X_j)$ are less than the absolute values of the correlations $R_{ij}$ used in (17):

\[ \mathrm{Cov}(X_i,X_j) = \iint x_i x_j\, p_X(x)\, dx_i\, dx_j = R_{ij} \iint \sqrt{V_i V_j}\, p_V(V_i,V_j)\, dV_i\, dV_j = R_{ij} \int \sqrt{V_i}\, p_{V_i}(V_i)\, dV_i \int \sqrt{V_j}\, p_{V_j}(V_j)\, dV_j = f_{ij}\, \sigma_{X,i}\, \sigma_{X,j}\, R_{ij}, \quad i \neq j. \tag{20} \]

The reduction factors $f_{ij}$, $i \neq j$, are less than one because, by the Cauchy–Schwarz inequality,

\[ \int \sqrt{V_i} \cdot 1 \cdot p_{V_i}(V_i)\, dV_i < \sqrt{\int V_i\, p_{V_i}(V_i)\, dV_i}\; \sqrt{\int 1 \cdot p_{V_i}(V_i)\, dV_i} = \sqrt{E\{V_i\}} = \sigma_{X,i}. \]

This means that the sampling correlation matrix cannot be used as the matrix $R$ in (17). For example, the reduction factors $f_{ij} < 1$, $i \neq j$, calculated explicitly for the case of the Gamma stochastic variances (6) are as follows:

\[ \mathrm{Corr}(X_i,X_j) = f_{ij} R_{ij}, \qquad f_{ij} = f(\alpha_i,\alpha_j) = \frac{\Gamma(\alpha_i + 1/2)\, \Gamma(\alpha_j + 1/2)}{\Gamma(\alpha_i)\, \Gamma(\alpha_j)\, \sqrt{\alpha_i \alpha_j}}, \quad i \neq j. \]

The underestimation of the correlations can be significant for some values of the parameters $\alpha_i$, $\alpha_j$, as shown in Table 2.

Table 2. Correlation reduction factors

| $\alpha_i = \alpha_j$ | 0.5 | 1 | 2 | 5 | 10 |
|---|---|---|---|---|---|
| $f(\alpha_i,\alpha_j)$ | 0.64 | 0.79 | 0.88 | 0.95 | 0.98 |

The randomization effect exists for any probability density functions $p_{V_i}(V_i)$ of independent stochastic variances. Usually, the equations $\mathrm{Corr}(X_i,X_j) = f_{ij} R_{ij}$ cannot be solved for the correlations $R_{ij}$, given the sampling correlations $\mathrm{Corr}(X_i,X_j)$, while preserving the necessary conditions $|R_{ij}| \le 1$ or the non-negative definiteness of the matrix $R$. Hence, this "naïve" model does not preserve the historical correlations between the risk factors.

Remark. Equation (20) and the inequality

\[ \iint \sqrt{V_i V_j}\, p_V(V_i,V_j)\, dV_i\, dV_j < \sqrt{\mathrm{Cov}(V_i,V_j) + E\{V_i\} E\{V_j\}} \]

imply that the class of SV models with the stochastic representation (18) for the covariance matrix preserves the RF correlation structure only if

\[ \iint \sqrt{V_i V_j}\, p_V(V_i,V_j)\, dV_i\, dV_j = \sqrt{E\{V_i\}}\, \sqrt{E\{V_j\}}, \]

which requires dependent stochastic variances with positive correlations. We do not investigate this direction in the chapter.

3.3. Elliptical stochastic variance model

The simplest extension of a single-factor SV model to the multifactor case is an elliptical stochastic variance model. Elliptical models are widely used for representing non-normal multivariate distributions in finance [see Eberlein, Keller and Prause (1998), Kotz, Kozubowski and Podgórski (2001)]. This class of models preserves the observed RF correlation structure. The model is similar by construction to the one-dimensional variance mixture of normals. An elliptical $N$-dimensional symmetric process $X^E(t)$ for $N$ risk factors has a stochastic representation as a single variance mixture of multivariate normals with a given covariance matrix $C$:

\[ X^E(t) = \sqrt{V(t)}\, Z_C, \qquad Z_C \sim N(0,C). \tag{21} \]

Here $V(t)$ is a univariate SV process, and $Z_C$ is a multivariate normal $N$-dimensional vector independent of $V(t)$. The covariance matrix $C$ is estimated from historical $T_1$-day returns (e.g., daily returns), while the SV is normalized to satisfy the condition $m_V(T_1) = E\{V(T_1)\} = 1$. The unconditional density of the random vector of risk factors $X^E(t)$ is

\[ p_{X^E(t)}(x) = \int_0^\infty \frac{1}{\sqrt{(2\pi V)^N \det(C)}} \exp\left( -\frac{x' C^{-1} x}{2V} \right) p_{V(t)}(V)\, dV. \]

As an example, consider the case of a Gamma stochastic variance $V(t)$. A closed analytical form for the unconditional elliptical Bessel K-function density of $X^E(T)$ is available in Kotz, Kozubowski and Podgórski (2001). The characteristic function $\varphi_{X^E(t)}(\theta)$ of the elliptical Lévy process $X^E(t)$ is represented as

\[ \varphi_{X^E(t)}(\theta) = \left( 1 + \frac{\beta}{2}\, \theta' C \theta \right)^{-\alpha t}, \tag{22} \]

where $\theta$ is an $N$-dimensional vector and $\theta'$ is its transpose. Due to known properties of elliptical distributions [see Fang, Kotz and Ng (1990)], all marginal one-dimensional distributions of the risk factors are univariate Bessel K-function distributions with the same shape parameter $\alpha t$ and the same kurtosis. They differ only in their standard deviations. The same property holds for all one-dimensional distributions of linear combinations $X_\theta = \theta' X^E(t)$ of the risk factors.
These linear combinations correspond to the linear portfolios defined by $\theta$. The kurtosis of $X_\theta(T_1)$ for an arbitrary $\theta$ is equal to $k_\theta = 3(1 + D_V(T_1)/m_V^2(T_1)) = 3(1 + D_V(T_1))$. Therefore, within the class of elliptical models there is no normalization effect at all for the distributions of large diversified portfolios. This results from a violation of the conditions of the Central Limit Theorem: the risk factors are dependent through the common stochastic variance $V$. This property is a drawback of all elliptical models. It is clear that the actual RF fluctuations are not driven by a single stochastic variance ("global market activity"). A more realistic SV model should include a multidimensional process for the SV to model different distributional shapes for the risk factors and linear sub-portfolios. Since the sampling marginal RF distributions have different shapes, the calibration of an elliptical model is restricted to fitting the distribution of some preselected portfolio. Hence, the calibration of elliptical models is portfolio dependent.

3.4. Independent stochastic variances for the principal components

One possible way to model different shapes of the RF distributions while preserving a given correlation structure was considered in Levin and Tchernitser (1999a, b). An $N$-dimensional vector of risk factors is represented as a linear combination of principal components (PC) with independent one-dimensional stochastic variances. The corresponding stochastic representation is as follows:

\[ X^L(t) = \tilde{A} Z^I(t), \qquad Z^I_i(t) = \sqrt{V_i(t)}\, Z_i, \qquad Z_i \sim N(0,1), \quad i = 1,\dots,M. \]

Here $Z_i$ are independent standard normal variables, and $V_i(t)$ are independent SV processes with unit mean and some variances $D_{V_i}$ for a specified time horizon $T_1$. The columns of the constant matrix $\tilde{A}$ are the principal components of a given covariance matrix $C$. The covariance matrix $C$ is estimated from the historical $T_1$-day returns.
The matrix $\tilde{A}$ is calculated from the eigenvalue decomposition of the covariance matrix $C$ [see Wilkinson and Reinsch (1971)]:

\[ C = U D U', \qquad U' = U^{-1}, \qquad D = \mathrm{diag}(d_1,\dots,d_N), \tag{23} \]

\[ \tilde{A} = U_M D_M^{1/2}, \qquad D_M = \mathrm{diag}(d_1,\dots,d_M), \qquad M \le N, \qquad C = \tilde{A} \tilde{A}'. \tag{24} \]

The matrix $U_M$ consists of the first $M$ columns of the orthogonal matrix $U$, which correspond to the $M$ largest eigenvalues $d_1,\dots,d_M$ of the matrix $C$. The number $M$ may be chosen less than $N$ if the matrix $C$ is singular and has only $M$ non-zero eigenvalues. Some numerical issues related to the singularity of the matrix $C$ were considered in Kreinin and Levin (2000). It follows from the construction of the process $X^L(t)$ that $\mathrm{Cov}(X^L(T_1)) = C$. This ensures an exact match of the sampling covariance matrix $C$. One can keep an even smaller number $M$ of principal components in (24) and recover the matrix $C$ with the required accuracy.

The characteristic function of the model is a product of the characteristic functions of the one-dimensional processes for the PCs. For example, the characteristic function of the Gamma SV model with independent SV has the form

\[ \varphi_{X^L(t)}(\theta) = \prod_{i=1}^M \left( 1 + \frac{\beta_i \left( \theta' (\tilde{A})_i \right)^2}{2} \right)^{-\alpha_i t}, \]

where $(\tilde{A})_i$ is the $i$-th column of the matrix $\tilde{A}$.

The matrix $\tilde{A}$ is defined only up to an arbitrary orthogonal transformation $H$ that leaves the covariance matrix $C$ unchanged, since $Z_i$ are independent standard normal variables, $Z_i$ and $V_i(T_1)$ are independent, and $E\{V_i(T_1)\} = 1$. Hence, $E\{Z^I(T_1) Z^I(T_1)'\} = I$ and

\[ E\{X^L(T_1) X^L(T_1)'\} = \tilde{A} H\, E\{Z^I(T_1) Z^I(T_1)'\}\, H' \tilde{A}' = \tilde{A} H H^{-1} \tilde{A}' = C \]

for any orthogonal matrix $H$. However, the matrix $H$ influences the matrix of the fourth moments of $X^L(t)$, $K_{ij} = E\{(X^L)_i^2 (X^L)_j^2\}$. The orthogonal matrix $H$ and the shape parameters of the $V_i$ can be determined to approximate a given sampling matrix $\{K_{ij}\}$ of the fourth moments of the RF distribution (all moments $E\{(X^L)_i^3 (X^L)_j\}$ are equal to zero for symmetric distributions).
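The construction (23), (24) and the invariance of $C$ under an orthogonal transformation $H$ can be sketched with NumPy. This is an illustrative sketch, not the authors' implementation: the covariance matrix is made-up example data, and `pc_factor_matrix` and `rotation` are hypothetical helper names.

```python
import numpy as np


def pc_factor_matrix(C, M=None):
    """Return A_tilde = U_M D_M^{1/2} built from the M largest eigenvalues of C."""
    d, U = np.linalg.eigh(C)           # eigenvalues in ascending order
    order = np.argsort(d)[::-1]        # reorder so that d_1 >= d_2 >= ...
    d, U = d[order], U[:, order]
    if M is None:
        M = len(d)
    # Clip tiny negative eigenvalues that can appear for nearly singular C.
    return U[:, :M] * np.sqrt(np.clip(d[:M], 0.0, None))


def rotation(M, i, j, angle):
    """Elementary M x M rotation in the (i, j) coordinate plane."""
    H = np.eye(M)
    c, s = np.cos(angle), np.sin(angle)
    H[i, i], H[j, j] = c, c
    H[i, j], H[j, i] = -s, s
    return H


if __name__ == "__main__":
    C = np.array([[4.0, 1.2, 0.5],
                  [1.2, 1.0, 0.3],
                  [0.5, 0.3, 2.25]])   # illustrative covariance matrix
    A_tilde = pc_factor_matrix(C)
    H = rotation(3, 0, 2, 0.7)
    A = A_tilde @ H
    print(np.allclose(A_tilde @ A_tilde.T, C), np.allclose(A @ A.T, C))
```

Both products reproduce $C$, which is why the second moments carry no information about $H$ and the fourth moments are needed to pin it down. Keeping only $M < N$ components yields a low-rank approximation whose spectral-norm error equals the largest discarded eigenvalue.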
An explicit calculation of the fourth moments yields

\[ K_{ij} = E\{(X^L)_i^2 (X^L)_j^2\} = 3 \sum_{k=1}^M a_{ik}^2 a_{jk}^2 D_{V_k} + C_{ii} C_{jj} + 2 C_{ij}^2, \quad i,j = 1,\dots,N, \tag{25} \]

where $a_{ik}$ are the elements of the matrix $A = \tilde{A} H$. An effective method for calculating the matrix $H$ and the shape parameters is discussed in Section 3.6 below.

In contrast to the Elliptical model, this model provides an exact match of the RF correlation and volatility structures and approximates the different shapes and kurtosis of the marginal RF distributions. However, the model has a significant drawback. Since the stochastic variances $V_i$ are independent, there is a strong normalization effect in any "diagonal" direction. This means that some linear portfolios $X_\theta(t) = \theta' X^L(t)$ have almost normal distributions whenever the portfolio Delta, $\theta$, is not a marginal direction and the number of principal risk factors $M$ is large enough. This effect presents a real danger: the non-normal marginal RF distributions may be well approximated, while the modelled portfolio distributions (contrary to the actual sampling distributions) may be almost normalized and the VaR underestimated.

3.5. A model with correlated stochastic variances

As pointed out above, a more general and realistic market model should incorporate correlated stochastic variances that can correct the deficiencies of both the Elliptical model and the model with independent SV for the principal components. The correlated SV structure should allow modeling of some general economic factors as well as idiosyncratic components that drive the SV processes for different risk factors and markets. The model is defined via a stochastic representation of the following form (Levin and Tchernitser, 2000a, b):

\[ X^{CV}(t) = A Z^I(t), \qquad Z^I_i(t) = \sqrt{V_i(t)}\, Z_i, \qquad Z_i \sim N(0,1), \quad i = 1,\dots,M. \tag{26} \]

Here $Z_i$ are independent standard normal variables, and $V_i(t)$ are correlated stochastic variance processes with unit mean for a specified time horizon $T_1$.
The matrix $A \in R^{N \times M}$ is defined as in the previous section through the eigenvalue decomposition of the covariance matrix $C$, up to an arbitrary orthogonal transformation $H \in R^{M \times M}$:

\[ C = \mathrm{Cov}\left( X^{CV}(T_1) \right) = AA' = \tilde{A} \tilde{A}', \qquad A = \tilde{A} H, \qquad H' = H^{-1}. \]

The stochastic variances $V_i(t)$ are correlated with each other due to the following stochastic representation:

\[ V_i(t) = \sum_{k=1}^L b_{ik}\, \xi_k(t), \qquad \sum_{k=1}^L b_{ik} = 1, \qquad b_{ik} \ge 0, \qquad B \in R^{M \times L}, \tag{27} \]

where $\xi_k(t)$ are independent positive increasing Lévy processes with unit mean for the time horizon $T_1$ and different shape parameters, and $B = \{b_{ik}\}$ is a constant matrix with non-negative elements. The processes $\xi_k(t)$ are the drivers of the SV processes $V_i(t)$. For example, each driver $\xi_k$ can be a Gamma process or a Generalized Gamma process. The linear structure in (27) with $b_{ik} \ge 0$ ensures that $V_i(t)$ are positive increasing Lévy processes. The normalization conditions $E\{\xi_k(T_1)\} = 1$ and $\sum_k b_{ik} = 1$ ensure, as in Section 3.4, exact recovery of the sampling covariance matrix of the risk factors. It follows that the vector of stochastic variances $V(T_1)$ has the covariance matrix

\[ C_V = \mathrm{Cov}\left( V(T_1) \right) = B D_\xi B', \qquad D_\xi = \mathrm{diag}(D_1,\dots,D_L), \qquad D_k = \mathrm{Var}\left( \xi_k(T_1) \right). \tag{28} \]

The multivariate Generalized Stochastic Variance (GSV) model (26), (27) has two levels of correlations. The first level defines the usual correlations across the risk factors, described by the covariance matrix $C$. The second level defines the correlations across the stochastic variances, described by the covariance matrix $C_V$. The second level of correlations makes it possible to obtain an approximate, but consistent, match of the higher order moments and shape of the RF multivariate distribution. The elliptical model and the model with independent stochastic variances are special cases of the above GSV model. The elliptical model corresponds to the matrix $B$ being a single column with all unit entries, $B = [1,\dots,1]'$.
The model with independent SV corresponds to the case when the matrix $B$ is equal to the identity matrix, $B = I$.

There is no analytical form for the probability density function of the vector $X^{CV}(t)$, even for Gamma drivers $\xi_k(t)$. However, the characteristic function $\varphi_{X^{CV}(t)}(\theta)$ can be calculated as

\[ \varphi_{X^{CV}(t)}(\theta) = \int_{R^L_+} \exp\left( -\frac{1}{2}\, \theta' A\, \mathrm{diag}(B\xi)\, A' \theta \right) p_{\xi(t)}(\xi)\, d\xi = \prod_{j=1}^L \int_0^{+\infty} \exp\left( -\frac{\xi_j}{2} \sum_{i=1}^M b_{ij} \left( \sum_{k=1}^N A_{ki} \theta_k \right)^2 \right) p_{\xi_j(t)}(\xi_j)\, d\xi_j. \]

The expression for the characteristic function above is equivalent to

\[ \varphi_{X^{CV}(t)}(\theta) = \prod_{j=1}^L \int_0^{+\infty} \exp\left( -\frac{\xi_j}{2}\, \theta' C_j \theta \right) p_{\xi_j(t)}(\xi_j)\, d\xi_j, \]

where $C_j = A\, \mathrm{diag}(b_{1j},\dots,b_{Mj})\, A'$, $j = 1,\dots,L$, are positive semi-definite matrices. The latter expression allows for a different interpretation of the GSV model. It shows that the process for the risk factors $X^{CV}(t)$ can be presented as a sum of $L$ independent elliptical Lévy processes. In turn, each of these elliptical processes has a multivariate conditional normal distribution with a covariance matrix proportional to $C_j$ and the corresponding stochastic variance $\xi_j(t)$.

The kurtosis $k_\theta$ of a linear combination of the risk factors $X_\theta(T_1) = \theta' X^{CV}(T_1)$ for any given direction $\theta$ can be calculated analytically:

\[ k_\theta - 3 = \frac{E\{X_\theta^4(T_1)\}}{E\{X_\theta^2(T_1)\}^2} - 3 = 3\, \eta' C_V \eta = 3\, \eta' B D_\xi B' \eta, \qquad \eta \in R^M, \qquad \eta_k = \frac{(\theta' A)_k^2}{\|\theta' A\|^2}, \quad k = 1,\dots,M. \tag{29} \]

The above expression provides a link between the covariance matrix $C_V$ and the kurtosis $k_\theta$, which characterizes the shape of the RF multivariate distribution for the linear portfolio with Delta equal to $\theta$. Another useful quantity that clarifies the role of the correlated variances $V_i$ is the standardized matrix of the fourth moments. This matrix, $\{k_{ij}\}$, is a multidimensional analog of kurtosis:

\[ k_{ij} = \frac{E\{(X^{CV})_i^2 (X^{CV})_j^2\}}{E\{(X^{CV})_i^2\}\, E\{(X^{CV})_j^2\}}. \tag{30} \]

The matrix $\{k_{ij}\}$ incorporates kurtosis in all marginal and all pair-wise diagonal directions in the original risk factor space.
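Formula (29) and the stochastic representation (26), (27) can be sketched together. The block below is an illustration under stated assumptions: the drivers are taken to be Gamma variables with unit mean and variance $D_k$ (shape $1/D_k$, scale $D_k$), the matrices are made-up examples, and the function names are hypothetical.

```python
import numpy as np


def sv_covariance(B, D_xi):
    """C_V = B D_xi B' as in (28)."""
    B = np.asarray(B, dtype=float)
    return B @ np.diag(D_xi) @ B.T


def portfolio_kurtosis(theta, A, B, D_xi):
    """Kurtosis of theta'X from (29): 3 + 3 * eta' C_V eta."""
    c = np.asarray(A, dtype=float).T @ np.asarray(theta, dtype=float)  # (theta'A)_k
    eta = c ** 2 / np.sum(c ** 2)
    return 3.0 + 3.0 * eta @ sv_covariance(B, D_xi) @ eta


def simulate_gsv(n, A, B, D_xi, seed=0):
    """Draw n samples of X = A Z^I with V = B xi and unit-mean Gamma drivers (assumed)."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    D = np.asarray(D_xi, dtype=float)
    xi = rng.gamma(shape=1.0 / D, scale=D, size=(n, D.size))  # mean 1, variance D_k
    V = xi @ B.T                                              # rows of B sum to one
    Z = rng.standard_normal((n, B.shape[0]))
    return (np.sqrt(V) * Z) @ A.T


if __name__ == "__main__":
    A = np.array([[1.0, 0.0], [0.6, 0.8]])
    elliptical = np.ones((2, 1))   # one common driver
    independent = np.eye(2)        # one driver per principal component
    for theta in ([1.0, 0.0], [1.0, 1.0]):
        print(theta,
              portfolio_kurtosis(theta, A, elliptical, [0.5]),
              portfolio_kurtosis(theta, A, independent, [0.5, 0.5]))
```

With a single common driver the kurtosis is the same in every direction, while with independent drivers it drops toward three in diagonal directions; intermediate matrices $B$ interpolate between these two behaviours.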
The matrix $\{k_{ij}\}$ is expressed as

\[ k_{ij} - (1 + 2\rho_{ij}^2) = \sum_{k=1}^M \sum_{l=1}^M \eta_{ik}^2 \eta_{jl}^2\, \mathrm{Cov}(V_k, V_l) + 2 \sum_{k=1}^M \sum_{l=1}^M \eta_{ik} \eta_{jk} \eta_{il} \eta_{jl}\, \mathrm{Cov}(V_k, V_l), \qquad \eta_{ik} = \frac{a_{ik}}{\|a_i\|}, \qquad \|a_i\|^2 = \sum_{k=1}^M a_{ik}^2, \tag{31} \]

where $\rho_{ij}$ is the correlation between the $i$-th and $j$-th risk factors. Formulas (29) and (31) clearly indicate that the correlation structure of the SV is embedded in the correlation structure of the fourth moments of the RF distribution. This connection is used as the basis for the GSV model calibration.

The number $L$ of SV drivers can be chosen significantly smaller than the number of stochastic variances $M$ and of risk factors $N$. These SV drivers may be thought of as "stochastic activities" for different countries, industries, sectors, etc.

The GSV model with correlated stochastic variances is, in fact, a general framework. It can incorporate any reasonable processes to represent the SV drivers $\xi_k(t)$, $k = 1,\dots,L$. Some examples of suitable one-dimensional SV driver distributions are the Inverse Gaussian distribution (Barndorff-Nielsen, 1997), the Gamma distribution (Madan and Seneta, 1990; Levin and Tchernitser, 1999a), the Lognormal distribution (Clark, 1973), and the class of Generalized Gamma distributions considered above. The GSV model is practical in terms of effective Monte Carlo simulation: it is based on the simulation of one-dimensional SV processes and standard multivariate normal variables.

3.5.1. Example 1. Joint distribution for DEM/USD and JPY/USD FX rates

The first example presents a bivariate GSV model applied to foreign exchange market data. Four bivariate models were examined for the DEM/USD and JPY/USD FX rate daily returns: the standard Gaussian model, the Elliptical Gamma Variance model, the model with independent stochastic variances for the PCs, and the model with correlated SV. Figures 12 and 13 show a 3-D plot and a contour plot of the joint probability density for the historical data and for the four types of models considered.
All three SV models provide a far better fit than the Gaussian distribution. However, the most convincing fit is provided by the GSV model with correlated stochastic variances. The marginal distributions of the DEM/USD and JPY/USD FX rates have kurtosis 5.2 and 6.9, respectively. Figures 14 and 15 show that the latter model is able to capture the kurtosis and shape of the marginal distributions in different directions.

3.5.2. Example 2. Twenty risk factors

The second example examines a 20-dimensional GSV model with correlated SV applied to data from the interest rate, FX rate, and equity markets. The USD and CAD zero interest rate curves (each consisting of nine interest rates), the CAD/USD FX rate, and the S&P 500 Index were chosen as a representative set of risk factors. Five years (1994–1999) of daily historical data were used for the model calibration (about 1,250 data points). Figure 16 presents statistical results of the principal component analysis and the correlation matrix for the squares of the first three PCs. These results indicate that the uncorrelated PCs are neither normal nor independent. The first three "largest" PCs per zero curve were used for the GSV model calibration and simulation. Three Gamma distributed drivers $\xi_k$, $k = 1,2,3$, with different shape parameters were used to represent each stochastic variance $V_i$, $i = 1,2,3$, for the PCs. Therefore, the following parameter values were assigned: number of risk factors $N = 9 \times 2 + 1 + 1 = 20$, number of principal risk factors $M = 3 \times 2 + 1 + 1 = 8$, number of SV drivers $L = 3$. The model was calibrated to match the kurtosis (in the least squares sense) of all 20 risk factors and the kurtosis of 15 additional linear sub-portfolios. The sampling kurtosis varies within a wide range, from 5 to 25. Typically, the kurtosis of short-term interest rates is much higher than the kurtosis of long-term rates. Figure 17 shows that the GSV model reproduces this typical decreasing kurtosis term structure quite well.
It is also seen that the model adequately matches the kurtosis of the FX rate and the S&P 500 Index, as well as the kurtosis of different linear sub-portfolios. By comparison, the standard multi-dimensional Gaussian model produces a flat kurtosis term structure identically equal to three.

Fig. 12. Joint density for DEM/USD and JPY/USD FX rate.
Fig. 13. Contour plot for the DEM/USD–JPY/USD FX rate joint density.
Fig. 14. DEM/USD and JPY/USD FX marginal distributions.
Fig. 15. Fit of the kurtosis for different sub-portfolios.
Fig. 16. PCA for USD zero curve.
Fig. 17. Fit of the kurtosis.

3.6. Calibration for the GSV model

The GSV model is a two-level model that incorporates a traditional variance–covariance structure of the risk factors and a novel variance–covariance structure of the RF stochastic variances. The GSV model with correlated SV automatically preserves the RF covariance matrix $C$. At the second level, it is necessary to calibrate the SV covariance matrix $C_V$ to approximate the fourth moments of the multivariate RF distribution. The main steps of the model calibration procedure are as follows:

1. Calculate a sample covariance matrix $C \in \mathbb{R}^{N \times N}$ for a given set of risk factors $X$. The time window usually used for calibration of the covariance matrix $C$ is about 1–2 years. Exponentially weighted averaging or a uniform sliding window are the usual methods for the covariance matrix calculation (JP Morgan, 1996).

2. Decompose the sample covariance matrix $C$ using a standard eigenvalue decomposition procedure and form a matrix $\tilde{A} \in \mathbb{R}^{N \times M}$ from the set of $M$ eigenvectors corresponding to the $M$ largest eigenvalues. The number $M$ has to be chosen so as to recover the matrix $C$ with the required accuracy.

3. Calculate the sample fourth-order moments for the risk factors $X$ (the matrix $k_{ij}$ in (30)) and the kurtosis for any preselected set of directions (linear sub-portfolios). The time window typically required for calculation of the fourth moments is of the order of 5–10 years. This period of observations has to be much longer than the one used for the second-order moments, so that relatively rare extreme events are incorporated into the calibration. A longer time horizon allows for an adequate approximation of the tails and the general shape of the multivariate RF distribution.

4. Calculate the matrices $H$, $B$, and $D$ using the least squares approach:

$$\sum_i w_i \left[\hat{k}_i - k_i(H,B,D)\right]^2 + \sum_i \sum_j w_{ij} \left[\hat{k}^e_{ij} - k^e_{ij}(H,B,D)\right]^2 \to \min_{H,B,D}, \qquad (32)$$

where $w_i$ and $w_{ij}$ are predefined weights (these may be chosen depending on the importance of particular risk factors and sub-portfolios), $\hat{k}_i$ is the sample kurtosis for the direction $i$, $k_i(H,B,D)$ is the analytical estimate (29), $\hat{k}^e_{ij}$ is the sample matrix of the fourth moments, and $k^e_{ij}(H,B,D)$ is its analytical estimate (31). The minimization problem above is subject to constraints imposed on the matrices $H$, $B$, and $D$. The most difficult condition to satisfy is the orthogonality of the matrix $H$. It follows from the analysis of expressions (29) and (31) that the $M \times M$ orthogonal matrix $H$ can be constructed as a product of $M(M-1)/2$ elementary rotation matrices (Wilkinson and Reinsch, 1971). It can be shown that for the problem (29), a representation of the orthogonal matrix $H$ does not require reflections. The diagonal matrix $D$ is subject to simple non-negativity constraints. The matrix $B$ is subject to the constraints (27). Hence, the non-linear optimization problem (32) can be reformulated with respect to the $M(M-1)/2$ angles of the elementary rotation matrices, each subject to simple bound constraints, and the elements of the matrices $B$ and $D$, subject to the simple constraints mentioned above.

5.
If the Gamma Variance model for the SV drivers is adopted, the diagonal matrix $D$ and the normalization conditions (each SV driver has expectation 1 at time $T_1$) determine the shape and scale parameters in (6). For the GGV model, the real-valued powers of the SV drivers have to be additionally specified. As a practical approach, the following methodology has been adopted: the set of powers is fixed in such a way that it covers a reasonably wide range of values. For example, the set of powers can be chosen as {−2, −1, +1, +2}. This choice is justified by the fact that SV drivers with negative powers produce an RF probability density function with heavy polynomial tails. On the other hand, positive powers can produce RF distributions with semi-heavy exponential and sub-exponential tails, but with unbounded peaks at the origin. However, it is quite possible that a more flexible and adjustable structure for the set of powers would be more beneficial for the model calibration.

Acknowledgment

The authors thank C. Albanese, O. Barndorff-Nielsen, D. Duffie, P. Embrechts, H. Geman, D. Madan, J. Nolan, and, especially, M. Taqqu for interesting discussions and useful comments related to the presented models.

References

Abramowitz, M., Stegun, I.A. (Eds.), 1972. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards, Washington, DC. Albanese, C., Jaimungal, S., Rubisov, D.H., 2001. A jump model with binomial volatility. Working Paper. University of Toronto–Math-Point, Toronto. Albanese, C., Levin, A., Ching-Ming Chao, J., 1997. Bayesian Value at Risk, back-testing and calibration. Working paper. Bank of Montreal–University of Toronto, Toronto. Anderson, T.W., 1984. An Introduction to Multivariate Statistical Analysis, 2nd edition. Wiley, New York. Ané, T., Geman, H., 1999. Stochastic volatility and transaction time: an activity-based volatility estimator. Journal of Risk 2, 57–69.
Ané, T., Geman, H., 2000. Order flow, transaction clock and normality of asset returns. Journal of Finance 55, 2259–2284. Artzner, P., Delbaen, F., Eber, J.-M., Heath, D., 1999. Coherent measures of risk. Mathematical Finance 9 (3), 203–228. Bachelier, L., 1900. Theory of speculation. Ph.D. Thesis. English translation. In: Cootner, P.H. (Ed.), The Random Character of Stock Market Prices. MIT Press, Cambridge, MA, 1964. Barndorff-Nielsen, O., 1977. Exponentially decreasing distributions for the logarithm of particle size. Proceedings of the Royal Society London. Series A 353, 401–419. Barndorff-Nielsen, O., 1978. Hyperbolic distributions and distributions on hyperbolae. Scandinavian Journal of Statistics 5, 151–157. Barndorff-Nielsen, O., 1997. Normal Inverse Gaussian distributions and stochastic volatility modelling. Scandinavian Journal of Statistics 24, 1–14. Barndorff-Nielsen, O., 1998. Processes of Normal Inverse Gaussian type. Finance and Stochastics 2, 41–68. Barndorff-Nielsen, O., Levendorskii, S., 2001. Feller processes of Normal Inverse Gaussian type. Quantitative Finance 1, 318–331. Barndorff-Nielsen, O., Pérez-Abreu, V., 2000. Multivariate type G distributions. Working Paper. Centre for Mathematical Physics and Stochastics, University of Aarhus, Aarhus. Barndorff-Nielsen, O., Shephard, N., 2000a. Non-Gaussian OU based models and some of their use in financial economics. Working Paper. Centre for Mathematical Physics and Stochastics, University of Aarhus, Aarhus. Barndorff-Nielsen, O., Shephard, N., 2000b. Modelling by Lévy processes for financial econometrics. Working Paper. Centre for Mathematical Physics and Stochastics, University of Aarhus, Aarhus. Basle Committee on Banking Supervision, 1996. Supervisory framework for the use of backtesting in conjunction with the internal models approach to market risk capital requirements. January, http://www.bis.org. Basle Committee on Banking Supervision, 1997.
International Convergence of Capital Measurements and Capital Standards. July 1988, amended in April 1997, http://www.bis.org. Bates, D., 1991. The crash of '87: was it expected? The evidence from option markets. Journal of Finance 46, 1009–1044. Bates, D., 1996. Jumps and stochastic volatility: exchange rate processes implicit in Deutsche Mark options. Review of Financial Studies 9, 69–107. Bertoin, J., 1996. Lévy Processes. Cambridge University Press, Cambridge. Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–654. Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327. Bollerslev, T., Engle, R., Nelson, D., 1994. ARCH models. In: Engle, R., McFadden, D. (Eds.), Handbook of Econometrics, Vol. 4. Elsevier, Amsterdam. Bouchaud, J.-P., Potters, M., 2000. Theory of Financial Risks. Cambridge University Press, Cambridge. Boyarchenko, S., Levendorskii, S., 2000. Option pricing for truncated Lévy processes. International Journal of Theoretical and Applied Finance 3, 549–552. Buchen, P., Kelly, M., 1996. The maximum entropy distribution of an asset inferred from option prices. Journal of Financial and Quantitative Analysis 31, 143–159. Carr, P., Geman, H., Madan, D., Yor, M., 2000. The fine structure of asset returns: an empirical investigation. Working paper. University of Maryland, College Park. Clark, P., 1973. A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135–155. Cont, R., 2001. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance 1, 223–236. Cont, R., Potters, M., Bouchaud, J.-P., 1997. Scaling in stock market data: stable laws and beyond. In: Dubrulle, B., Graner, F., Sornette, D. (Eds.), Scale Invariance and Beyond, Proceedings of the CNRS Workshop on Scale Invariance. Springer.
Crouhy, M., Galai, D., Mark, R., 2001. Risk Management. McGraw-Hill, New York. Duffie, D., Pan, J., 1997. An overview of Value at Risk. The Journal of Derivatives 4, 7–49. Duffie, D., Pan, J., 2001. Analytical Value-at-Risk with jumps and credit risk. Finance and Stochastics 5, 155–180. Eberlein, E., Keller, U., 1995. Hyperbolic distributions in finance. Bernoulli 1, 281–299. Eberlein, E., Keller, U., Prause, K., 1998. New insights into smile, mispricing and Value at Risk: the hyperbolic model. Journal of Business 71, 371–405. Eberlein, E., Raible, S., 1999. Term structure models driven by general Lévy processes. Mathematical Finance 9 (1), 31–53. Embrechts, P., McNeil, A., Straumann, D., 1999. Correlation and dependency in risk management: properties and pitfalls. ETH Preprint, Zürich. Fama, E., 1965. The behavior of stock market prices. Journal of Business 38, 34–105. Fang, K., Kotz, S., Ng, K., 1990. Symmetric Multivariate and Related Distributions. Chapman and Hall, London. Feller, W., 1966. An Introduction to Probability Theory and its Applications, Vol. 2. Wiley, New York. Feuerverger, A., McDunnough, P., 1981. On efficiency of empirical characteristic function procedures. Journal of the Royal Statistical Society. Series B 43 (1), 20–27. Fishman, G.S., 1996. Monte Carlo: Concepts, Algorithms and Applications. Springer-Verlag, New York. Geman, H., Ané, T., 1996. Stochastic subordination. Risk 9 (September), 12–16. Geman, H., Madan, D., Yor, M., 1998. Asset prices are Brownian motion: only in business time. Working paper. University of Maryland, College Park. Geman, H., Madan, D., Yor, M., 1999. Time changes for Lévy processes. Working paper. University of Maryland, College Park. Grosswald, E., 1976. The Student t-distribution of any degree of freedom is infinitely divisible. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 36, 103–109. Heston, S., 1993.
A closed-form solution for options with stochastic volatility, with applications to bond and currency options. Review of Financial Studies 6, 327–344. Hull, J.C., 1999. Options, Futures, and Other Derivatives, 4th edition. Prentice-Hall, Upper Saddle River, NJ. Hull, J., White, A., 1987. The pricing of options on assets with stochastic volatilities. Journal of Finance 42, 281–300. Hull, J., White, A., 1998. Value at Risk when daily changes in market variables are not normally distributed. The Journal of Derivatives 5, 9–19. Ismail, M., 1977. Bessel functions and the infinite divisibility of the Student t-distribution. The Annals of Probability 5, 582–585. Ismail, M., Kelker, D., 1979. Special functions, Stieltjes transforms and infinite divisibility. SIAM Journal on Mathematical Analysis 10, 884–901. Janicki, A., Weron, A., 1994. Simulation and Chaotic Behavior of α-Stable Stochastic Processes. Marcel Dekker, New York. Johnson, N.L., Kotz, S., Balakrishnan, N., 1994. Continuous Univariate Distributions, Vol. 1, 2nd edition. Wiley, New York. Johnson, N.L., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions, Vol. 2, 2nd edition. Wiley, New York. Jorion, P., 2001. Value-at-Risk: The New Benchmark for Managing Financial Risk, 3rd edition. McGraw-Hill, New York. JP Morgan, 1996. RiskMetrics™ Technical Document, 4th edition. JP Morgan, New York. Kagan, A.M., Linnik, Yu.V., Rao, C.R., 1973. Characterization Problems in Mathematical Statistics. Wiley, New York. Kelker, D., 1971. Infinite divisibility and variance mixtures of the normal distributions. The Annals of Mathematical Statistics 42, 802–808. Khindanova, I., Rachev, S., Schwartz, E., 2000. Stable modelling of Value at Risk. Working Paper. University of California, Santa Barbara. Konikov, M., Madan, D., 2000. Pricing options of all strikes and maturities using a generalization of the VG model.
Working paper, University of Maryland, College Park. Kotz, S., Balakrishnan, N., Johnson, N.L., 2000. Continuous Multivariate Distributions, Vol. 1, 2nd edition. Wiley, New York. Kotz, S., Kozubowski, T., Podgórski, K., 2001. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Birkhäuser, Boston. Kou, S.G., 2000. A jump diffusion model for option pricing with three properties: leptokurtic feature, volatility smile, and analytical tractability. Working Paper. Columbia University, New York. Kreinin, A., Levin, A., 2000. Robust Monte Carlo simulation for approximate covariance matrices and VaR analysis. In: Uryasev, S. (Ed.), Probabilistic Constrained Optimization: Methodology and Applications. Kluwer Academic, Boston, pp. 166–178. Levin, A., Tchernitser, A., 1999a. Multifactor stochastic variance Value-at-Risk model. Presentation at the Workshop Probability in Finance, Toronto, January 26–30, 1999. The Fields Institute, Toronto. Levin, A., Tchernitser, A., 1999b. Multifactor Gamma stochastic variance Value-at-Risk model. Presentation at the Conference Applications of Heavy Tailed Distributions in Economics, Engineering and Statistics. Washington, DC, June 3–5, 1999. American University, Washington, DC. Levin, A., Tchernitser, A., 2000a. A class of multifactor stochastic variance VaR models: Maximum Entropy approach and jump processes. Presentation at the First World Congress of the Bachelier Finance Society, Paris, June 28–July 1, 2000. Levin, A., Tchernitser, A., 2000b. Stochastic volatility and jump Lévy processes in Value-at-Risk modelling. Presentation at the Risk Conference Math Week '2000, London, November 27–December 1, 2000. Lukacs, E., 1970. Characteristic Functions. Griffin, London. Madan, D., 1999. Purely discontinuous asset price processes. Working paper. University of Maryland, College Park. Madan, D., Milne, F., 1991. Option pricing with VG martingale components.
Mathematical Finance 1 (4), 39–55. Madan, D., Seneta, E., 1990. The Variance Gamma (VG) model for share market returns. Journal of Business 63, 511–524. Maejima, M., Rosiński, J., 2000. Type G distributions on R^d. Working Paper. Keio University. Mandelbrot, B., 1960. The Pareto–Lévy law and the distribution of income. International Economic Review 1, 79–106. Mandelbrot, B., 1963. The variation of certain speculative prices. Journal of Business 36, 394–419. Mandelbrot, B., 1997. Fractals and Scaling in Finance: Discontinuity, Concentration, Risk. Springer, New York. Mantegna, R.N., Stanley, H.E., 1994. Stochastic process with ultraslow convergence to a Gaussian: the truncated Lévy flight. Physical Review Letters 73, 2946–2949. Mantegna, R.N., Stanley, H.E., 2000. An Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, Cambridge. Marinelli, C., Rachev, S., Roll, R., 1999. Subordinated exchange rate models: evidence for heavy tailed distributions and long-range dependence. Working Paper. University of Padova. Mausser, H., Rosen, D., 2000. Managing risk with expected shortfall. In: Uryasev, S. (Ed.), Probabilistic Constrained Optimization: Methodology and Applications. Kluwer Academic, Boston, pp. 204–225. McCulloch, J., 1978. Continuous time processes with stable increments. Journal of Business 36, 394–419. McCulloch, J., 1996. Financial applications of stable distributions. In: Maddala, G., Rao, C. (Eds.), Statistical Methods in Finance, Elsevier, Amsterdam, pp. 393–425. Melino, A., Turnbull, S., 1990. Pricing foreign currency options with stochastic volatility. Journal of Econometrics 45, 239–265. Merton, R., 1976. Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics 3, 125–144. Merton, R., 1990. Continuous-Time Finance. Blackwell, Cambridge. Mittnik, S., Rachev, S., 1989. Stable distributions for asset returns.
Applied Mathematics Letters 2 (3), 301–304. Müller, U., Dacorogna, M., Pictet, O., 1998. Heavy tails in high-frequency financial data. In: Adler, R., Feldman, R., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails. Birkhäuser, Boston. Nolan, J., 1998. Multivariate stable distributions: approximation, estimation, simulation and identification. In: Adler, R., Feldman, R., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails. Birkhäuser, Boston, pp. 509–525. Olver, F., 1974. Introduction to Asymptotics and Special Functions. Academic Press, New York. Osborne, M., 1959. Brownian motion in the stock market. Operations Research 7, 145–173. Platen, E., 1999. A Minimal Share Market model with stochastic volatility. Research Paper 21. University of Technology, Sydney. Prudnikov, A., Brychkov, Yu., Marichev, O., 1986. Integrals and Series, Vol. 1. Gordon & Breach, New York. Rachev, S., Mittnik, S., 2000. Stable Paretian Models in Finance. Wiley, Chichester, New York. Rachev, S., SenGupta, A., 1993. Laplace–Weibull mixtures for modelling price changes. Management Science 39, 1029–1038. Rosiński, J., 1991. On a class of infinitely divisible processes represented as mixtures of Gaussian processes. In: Cambanis, S., Samorodnitsky, G., Taqqu, M. (Eds.), Stable Processes and Related Topics. Birkhäuser, Boston, pp. 27–41. Rosiński, J., 2001. Series representations of Lévy processes from the perspective of point processes. In: Barndorff-Nielsen, O., Mikosch, T., Resnick, S. (Eds.), Lévy Processes – Theory and Applications. Birkhäuser, Boston, pp. 27–41. Samorodnitsky, G., Taqqu, M., 1994. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, New York. Samuelson, P., 1965. Rational theory of warrant pricing. Industrial Management Review 6, 13–32. Sato, K.-I., 1999. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge. Schoenberg, I., 1938. Metric spaces and completely monotonic functions.
Annals of Mathematics 39, 811–841. Sornette, D., Simonetti, P., Andersen, J.V., 2000. φ^q-field theory for portfolio optimization: "fat tails" and nonlinear correlations. Physics Reports 335 (2), 19–92. Stein, E., Stein, J., 1991. Stock price distributions with stochastic volatility: an analytic approach. Review of Financial Studies 4, 727–752. Steutel, F., 1970. Preservation of infinite divisibility under mixing and related topics. In: Math. Centre Tracts, No. 33. Math. Centrum, Amsterdam. Steutel, F., 1973. Some recent results in infinite divisibility. Stochastic Processes and their Applications 1, 125–143. Wiggins, J., 1987. Option values under stochastic volatility: theory and empirical estimates. Journal of Financial Economics 19, 351–372. Wilkinson, J.H., Reinsch, C., 1971. Linear Algebra, Vol. 2. In: Handbook for Automatic Computation. Springer-Verlag, New York. Willinger, W., Taqqu, M., Teverovsky, V., 1999. Stock market prices and long-range dependence. Finance and Stochastics 3, 1–13.

Chapter 12

MODELLING THE TERM STRUCTURE OF MONETARY RATES

LUISA IZZI
Credit Risk Department of Banca Nazionale del Lavoro, Department of Mathematics, Statistics and Information Technology, Faculty of Economics, and Centro Interdipartimentale Vito Volterra, University of Rome Tor Vergata, Italy

Contents
Abstract
1. Introduction
2. The mathematical framework
2.1. Model setup and notation
2.2. Regularity conditions on the jump-diffusion process
2.3. Interest rates with non-identically distributed jumps
2.4. The smile effect and infinitely divisible distributions
2.5. The discrete-time process
3. The tree representation
4. The econometric analysis
4.1. Data
4.2. Estimation results
5. Conclusions
References

The analysis and conclusions set forth in this chapter are those of the author and do not indicate concurrence by other members of the Banca Nazionale del Lavoro.
Abstract

The chapter addresses the general problem of modeling and estimating the term structure of interest rates using jump-diffusion mean-reverting and stable Paretian models. The chapter proposes a new procedure to recursively compute interest rates subject to both Brownian and Poissonian noises. This procedure is consistent with the absence of arbitrage, non-negativity of interest rates, the mean-reverting hypothesis and the recombining condition, and can be calibrated with respect to any term structure which can be observed in the market. The numerical study shows that the proposed model is particularly suited to describing the behavior of European money market rates.

1. Introduction

The term structure of interest rates provides a characterization of interest rates as a function of maturity, which is mainly used in the pricing of fixed-income securities and for the valuation of contingent claims. The chapter addresses the general problem of the modeling and estimation of the term structure of interest rates. The problem is of great importance in view of its possible application to the analysis of Economic and Monetary Union monetary policy. It is this specific application that makes the problem of modeling interest rates difficult; in fact, the shortness of the available interest rate time series has been one of the main obstacles to developing a solid quantitative analysis of short-term interest rates in the Euro-zone.

The term structure has been modeled extensively, primarily using pure-Gaussian models. Notable examples of these kinds of models are those of Merton (1973), Cox (1975), Cox and Ross (1976), Vasicek (1977), Dothan (1978), Brennan and Schwartz (1980), Cox, Ingersoll and Ross (1980, 1985) and Black, Derman and Toy (1990).
Recent empirical work shows that models which accommodate skewness and kurtosis, such as stochastic volatility models [see, e.g., Bates (1996) and Britten-Jones and Neuberger (2000)] and jump-diffusion models (Ahn and Thompson, 1988; Ho, Perraudin and Sorensen, 1996; Scott, 1997), appear to fit the time series of interest rates better than pure-Gaussian ones.¹ In fact, unlike jump-diffusion processes, stochastic volatility models are not capable of generating high levels of skewness and kurtosis at short maturities under reasonable parameterizations. As a consequence, stochastic volatility models cannot generate implied volatility "smiles" as sharp as those typically observed empirically.² It should also be noted that conditional skewness and kurtosis in stochastic volatility models are always hump-shaped in the length of the horizon: indeed, for plausible parameter values, both quantities must be increasing over short to moderate maturities. This implies that the smile does not flatten out appreciably as maturity increases. Moreover, the kurtosis of changes in interest rates increases when the data are sampled daily instead of monthly, and interest rates often display discontinuous behavior, partly on account of the discrete changes the Central Bank makes in the official rates [see Baz and Das (1996), Das and Foresi (1996), Björk, Kabanov and Runggaldier (1997), Attari (1999)]. Modeling the term structure using a jump-diffusion process captures these market features better, because with these processes the variance shrinks with the time interval, yet the size of jumps remains the same, enhancing kurtosis (which is the relative size of the outliers to the variance of the process).

¹ The class of jump-diffusion models augments the Black–Scholes (1973) return distribution with a Poisson-driven jump process, while the class of stochastic volatility models extends the Black–Scholes model by allowing the volatility of the return process itself to evolve randomly over time.

² The "smile" is the plot of implied volatilities from a range of options of the same maturity across different strike prices. It can be observed that at-the-money options seem to trade at the lowest implied volatilities, while in-the-money and out-of-the-money options trade at higher volatilities. Since the options are all written on the same underlying variable, there should be no plausible reason for this, other than the fact that the model is inexact. Since the observed distribution of interest rates has fatter tails than that assumed by a pure-Gaussian model, such an effect is intuitively obvious. When plotted against the strike price, the graph of implied volatilities appears U-shaped like a smile. Hence the terminology.

In this chapter a jump-diffusion mean-reverting model for estimating monetary rates is introduced and related to the class of models driven by infinitely divisible processes [see Gnedenko (1992)], to which Gaussian, Poissonian and Stable ones [see Samorodnitsky and Taqqu (1994)] belong. Relating interest rate models to the class of Stable processes is an attempt to unify, from the probabilistic point of view, the extensive literature on pure-diffusion, pure-Poissonian and jump-diffusion processes. It completes recent works on the stock [see Schumacher (1997), Rieken (1999)] and exchange markets [see Akgiray and Booth (1988), Rachev and Mittnik (2000)] concerning the same topic.

We also propose a new numerical procedure to recursively compute interest rates subject to both Brownian and Poissonian noises. This procedure, which generalizes the trinomial approach proposed by Hull and White (1994, 1996) in the presence of pure-diffusion stochastic components, is consistent with the absence of arbitrage, non-negativity of interest rates, the mean-reverting hypothesis and the recombining condition.
It can be calibrated with respect to any term structure which can be observed in the market, for instance, the present one. Numerical experiments demonstrate that the jump-diffusion mean-reverting model is particularly suited to describing the behavior of European money market rates. Interest rates controlled by the monetary authorities behave like jump processes, and the term structure, at short maturity, is contingent upon the levels of these official rates.

The rest of the chapter is organized as follows. Section 2 introduces the interest rate model used to describe the behavior of monetary rates both in continuous and in discrete time. To provide a direct application of the model, a new numerical procedure is proposed in Section 3. The new method determines a unique, fully specified hexanomial tree, which is consistent with respect to the risk-neutral probability. Econometric estimates are reported in Section 4. The estimation of the mean-reverting jump-diffusion process is obtained via the indirect inference method. Finally, Section 5 concludes the chapter.

2. The mathematical framework

2.1. Model setup and notation

Let $\Omega$ be a (rich enough) sample space, representing the states of the world, and let $X_t : \Omega \to \mathbb{R}$ be a stochastic process on $\Omega$ such that

$$X_t = \sigma B_t + \sum_{i=1}^{n} a_i N_t^{(i)} + \gamma, \qquad (1)$$

where $t \in [0,T]$ and $T$ is a finite horizon; $B_t$ is a Brownian motion; $\sigma \ge 0$ is constant; $n \ge 1$ is fixed; $N_t^{(i)}$ are mutually independent Poisson processes, also independent of $B_t$, with parameters $\lambda_i > 0$, for $i = 1,\dots,n$; $a_i \sim \mathrm{Bin}(\mu_i, \sigma_i^2)$ is independent of any other variable, with $\mu_i \in \mathbb{R}$ and $\sigma_i \in \mathbb{R}$, and represents the jump size of the corresponding Poisson process, for $i = 1,\dots,n$; $\gamma$ is a shift parameter.

Let $\mathcal{F} = \{\mathcal{F}_t\}_{0 \le t \le T}$ be the filtration generated by $\{X_t\}_{0 \le t \le T}$ (the information structure on $\Omega$), that is, $\mathcal{F}_t$ is the $\sigma$-field generated by $\{X_s\}_{0 \le s \le t}$, where the initial filtration $\mathcal{F}_0$ is trivial, and let $P$ be a probability measure on $(\Omega, \mathcal{F}_T)$.
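To make the driving process in (1) concrete, the following minimal sketch samples its value at a fixed horizon as a scaled Brownian term, plus a sum of jump sizes times Poisson counts, plus a shift. Several choices are assumptions made here for illustration and are not taken from the chapter: the jump-size law Bin is read as a symmetric two-point law taking the values (mean ± standard deviation) with equal probability, each jump size is drawn once per path as (1) is literally written, and the function name `simulate_X_T`, the argument names and all numerical values are hypothetical.

```python
import numpy as np

def simulate_X_T(T, sigma, gamma, lam, mu, s, n_paths, seed=0):
    """Sample X_T = sigma*B_T + sum_i a_i*N_T^(i) + gamma (illustrative sketch).

    lam, mu, s hold the Poisson intensities, jump-size means and jump-size
    standard deviations, one entry per jump component.
    ASSUMPTION: a_i takes the values mu_i - s_i or mu_i + s_i with
    probability 1/2 each (mean mu_i, variance s_i**2).
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam, dtype=float)
    mu = np.asarray(mu, dtype=float)
    s = np.asarray(s, dtype=float)

    # Brownian motion at time T is N(0, T).
    B_T = rng.normal(0.0, np.sqrt(T), size=n_paths)
    # Independent Poisson counts with means lam_i * T, one column per component.
    N_T = rng.poisson(lam * T, size=(n_paths, lam.size))
    # Two-point jump sizes, drawn once per path and component.
    signs = rng.choice([-1.0, 1.0], size=(n_paths, lam.size))
    a = mu + s * signs

    return sigma * B_T + (a * N_T).sum(axis=1) + gamma

# Hypothetical parameters: one "up" and one "down" jump component.
x = simulate_X_T(T=1.0, sigma=0.5, gamma=0.1,
                 lam=[2.0, 1.0], mu=[0.25, -0.25], s=[0.05, 0.05],
                 n_paths=200_000)
print(x.mean())  # close to gamma + sum_i mu_i*lam_i*T under these assumptions
```

Because the jump sizes are independent of the counts, the sample mean should be close to the shift plus the sum of (mean jump size × intensity × T); comparing the two gives a quick sanity check of the sampler.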
Let $r_t : \Omega \to \mathbb{R}_+$ be a stochastic process on the filtered probability space $(\Omega, \{\mathcal{F}_t\}, P)$ described by the stochastic differential equation

$$dr_t = \mu_t\,dt + \sigma_t\,dX_t \qquad (t \in [0,T]), \qquad (2)$$

where $\mu_t = k(\theta_t - r_t)$ is the drift function (or conditional mean); $\theta_t$ is a positive bounded real function of time; $k$ is the calling-back strength of the $\theta_t$ rate on the instantaneous rate $r_t$; $\sigma_t$ is the diffusion function of the interest rate change ($\sigma_t^2$ is the conditional variance). So, we can redefine the model as follows:

$$dr_t = k(\theta_t - r_t)\,dt + \sigma_t\left[\sigma\,dB_t + a\,dN_t\right] \qquad (t \in [0,T]), \qquad (3)$$

where $a\,dN_t = \sum_{i=1}^{n} a_i\,dN_t^{(i)}$; $a = [a_1, a_2, \dots, a_n]$; $N_t = [N_t^{(1)}, N_t^{(2)}, \dots, N_t^{(n)}]^{\mathsf{T}}$.

Remark 2.1. Let us consider the particular case in which $n = 2$ and $a_1 \sim \mathrm{Bin}(\mu_1, \sigma_1^2)$ and $a_2 \sim \mathrm{Bin}(\mu_2, \sigma_2^2)$ are independent of any other random variable, with $\mu_1 > 0$, $\mu_2 < 0$, $\sigma_1, \sigma_2 \in \mathbb{R}$. Then, the stochastic process $X_t$ has the form

$$X_t = \sigma B_t + a_1 N_t^{(1)} + a_2 N_t^{(2)} + \gamma \qquad (t \in [0,T]),$$

and the two Poisson processes $N_t^{(1)}$ and $N_t^{(2)}$ could represent the actions of the European Central Bank ($N_t^{(1)}$ "up" and $N_t^{(2)}$ "down") on the official rates. Moreover, the stochastic differential equation

$$dr_t = k(\theta_t - r_t)\,dt + \sigma_t\left[\sigma\,dB_t + a_1\,dN_t^{(1)} + a_2\,dN_t^{(2)}\right] \qquad (t \in [0,T])$$

is subject to the constraint

$$|\theta_t - r_t| \le u_t - d_t,$$

where $\theta_t$ is the rate of the main refinancing operation of the European Central Bank (ECB); $u_t$ is the rate of the marginal lending facility; $d_t$ is the rate of the deposit facility (see Figure 1). In fact, money market rates should not leave the corridor fixed by $u_t$ (upper bound) and $d_t$ (lower bound), because under normal conditions the ECB gives banks all the money they need at the $u_t$ rate. The only limit is determined by the availability of collateral to be presented to the ECB as a guarantee of the credit received. Thus, if the collateral available to banks is unlimited, the credit that banks can receive from the ECB at the $u_t$ rate is unlimited as well.
Similarly, at the end of every working day, banks can deposit an unlimited amount of money in the ECB accounts at the $d_t$ rate.³

³ EONIA (the European overnight rate, which is computed as an average value by the ECB) has gone up to the $u_t$ rate only on very few occasions, due to market inefficiency problems.

Fig. 1. Official rates during the ten policy periods (percentage values).

2.2. Regularity conditions on the jump-diffusion process

The jump-diffusion process defined in (3) can be used to approximate a wide range of Markovian or non-Markovian processes. In particular, the weak convergence of a driving noise process to a Lévy process implies the weak convergence of the driving noise process to a jump-diffusion process [see, for example, Blasikiewicz and Brown (1996)]. In the general case, the Brownian motion $B_t$ and the compound Poisson process $\int_0^t a\,dN_\tau(\lambda)$ are infinitely divisible in time, when appropriately scaled, and have independent increments.

Note, however, that even though $B_t$ is a martingale, a compound Poisson process is in general not a martingale. Define the compensated Poisson process

$$d\tilde{N}_t(\lambda) = dN_t(\lambda) - \lambda\,dt.$$

Since

$$E\left[\int_t^{t+\Delta t} a\,d\tilde{N}_\tau(\lambda)\right] = E\left[\int_t^{t+\Delta t} a\left(dN_\tau(\lambda) - \lambda\,d\tau\right)\right] = 0,$$

the process $\int_0^t a\,d\tilde{N}_\tau(\lambda)$ is a martingale. Therefore, without loss of generality, we only need to consider the case in which the compound Poisson process is a martingale.⁴

⁴ For further details, see Zacks et al. (1999).

To ensure that the generalized stochastic differential equation defined in (3) is well behaved, it is necessary to make two additional assumptions.

Assumption 2.1. The coefficient functions $\mu(\cdot)$ and $\sigma(\cdot)$ are measurable in the product $\sigma$-algebra $\mathcal{B} \times \mathcal{F}$, where $\mathcal{B}$ is the $\sigma$-field of the Borel sets on $\mathbb{R}$, and furthermore

$$\int_0^T |\mu_t|\,dt < \infty \ \text{a.s.}, \qquad \int_0^T \sigma_t^2\,dt < \infty \ \text{a.s.} \qquad \text{and} \qquad E\left[\int_0^T \sigma_t^2\,dt\right] < \infty.$$
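The compensation step of Section 2.2 can be checked numerically. The sketch below compares the mean of a raw jump increment (jump size times Poisson count over a short interval) with the mean of its compensated version (the same increment minus jump size × intensity × interval length). It assumes, purely for illustration, a constant jump size; the function name `jump_increments` and the parameter values are hypothetical and not part of the chapter.

```python
import numpy as np

def jump_increments(a, lam, dt, n_paths, seed=0):
    """Return raw and compensated jump increments over an interval of length dt.

    ASSUMPTION: the jump size `a` is a fixed constant; `lam` is the Poisson
    intensity. The compensator subtracted is a * lam * dt.
    """
    rng = np.random.default_rng(seed)
    dN = rng.poisson(lam * dt, size=n_paths)   # number of jumps in the interval
    raw = a * dN                               # increment of the jump integral
    compensated = a * (dN - lam * dt)          # increment after compensation
    return raw, compensated

raw, comp = jump_increments(a=0.25, lam=3.0, dt=0.5, n_paths=200_000)
print(raw.mean(), comp.mean())  # raw mean near a*lam*dt, compensated mean near 0
```

The raw increment has a non-zero mean that grows with the interval length, which is why the uncompensated jump integral is not a martingale; subtracting the compensator removes exactly that drift.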
To ensure the existence and uniqueness of a solution to the stochastic differential equation (3), it is necessary to impose:

Assumption 2.2. The coefficient functions $\mu(\cdot)$ and $\sigma(\cdot)$ satisfy both Lipschitz and linear growth conditions, i.e., there exists a positive constant $K$ for which

$$|\mu(r,t) - \mu(r',t)| + |\sigma(r,t) - \sigma(r',t)| \le K|r - r'|,$$

$$|\mu(r,t) - \mu(r,t')| + |\sigma(r,t) - \sigma(r,t')| \le K|t - t'|$$

and

$$\mu^2(r,t) + \sigma^2(r,t) \le K\left(1 + r^2\right).$$

Let us consider the model

$$dr_t = k(\theta_t - r_t)\,dt + \sigma\,dB_t + a\,dN_t \qquad (t \in [0,T]), \qquad (4)$$

where the diffusion function $\sigma_t$ is assumed to be equal to 1; $\sigma \ge 0$ is the volatility of the interest rate, provided that no jumps occur; $N_t(\lambda)$ is a Poisson process with intensity parameter $\lambda$; $a$ is the jump size; $dB_t$ and $dN_t$ are statistically independent.

2.3. Interest rates with non-identically distributed jumps

Let us focus our attention on the stochastic part of the interest rate process, i.e., on $X_t$, defined by

$$X_t = \sigma B_t + \sum_{i=1}^{n} a_i N_t^{(i)} + \gamma \qquad (t \in [0,T] \text{ and } n \ge 1). \qquad (5)$$

Following Rachev and Rüschendorf (1994) and Schumacher (1997), we can study $X_t$ as an infinitely divisible random variable.⁵

⁵ For further details, see Rachev and Mittnik (2000) and Rieken (1999).

Definition 2.1. A sequence $\{X_{n,k}\}$ of random variables (with $n \ge 1$ and $k = 1,\dots,n$) is uniformly asymptotically negligible (u.a.n.) if

$$X_{n,k} \xrightarrow{P} 0 \qquad (n \to \infty),$$

uniformly in $k$, or equivalently if for every $\varepsilon > 0$

$$\max_{1 \le k \le n} P\{|X_{n,k}| \ge \varepsilon\} \to 0 \qquad (n \to \infty).$$

Let $F_{n,k}$ denote the cumulative distribution function of $X_{n,k}$ and $\varphi_{n,k}$ its characteristic function. It can be shown [see Rachev and Mittnik (2000) and the references therein] that the u.a.n. condition is equivalent to

$$\max_{1 \le k \le n} \int \frac{x^2}{1 + x^2}\,dF_{n,k} \to 0 \qquad (n \to \infty)$$

or

$$\max_{1 \le k \le n} |\varphi_{n,k} - 1| \to 0 \qquad (n \to \infty),$$

uniformly on every finite interval. The use of u.a.n. random variables restricts the class of summands, but this restricted class, known as the class of infinitely divisible random variables, is the one investigated in the classical central limit problem.
It is of primary interest in modeling interest rates and includes the most commonly used models, i.e., Normal, Poisson, Stable and others.

Definition 2.2. A random variable $X$, its distribution $F_X$, and its characteristic function $\varphi_X$ are said to be infinitely divisible (I.D.) if, for every $n \ge 1$, there are independent and identically distributed (i.i.d.) random variables $\xi_1, \xi_2, \dots, \xi_n$, each with distribution function $F_\xi$ and characteristic function $\varphi_\xi$, such that $X \overset{d}{=} \xi_1 + \cdots + \xi_n$, or equivalently if, for every $n \ge 1$, $F_X$ is the $n$-fold convolution of $F_\xi$, i.e., $F_X = F_\xi * \cdots * F_\xi = (F_\xi)^{*n}$, or equivalently if, for every $n \ge 1$, its characteristic function $\varphi_X$ is the $n$-th power of the characteristic function $\varphi_\xi$.

From Definition 2.2 it follows immediately that the Normal, the Poisson and the degenerate distributions, i.e., all possible limits of $\sum_{k=1}^{n} X_{n,k}$ in the i.i.d. case, belong to this class.

The I.D. law is uniquely defined by the Lévy–Khintchine representation of its characteristic function [see Rachev and Mittnik (2000)]:
$$\varphi_X(u) = E\,e^{iuX} = \exp\left\{ iu\mu + \int_{-\infty}^{+\infty} \left( e^{iux} - 1 - \frac{iux}{1+x^2} \right) \frac{1+x^2}{x^2}\, d\Psi(x) \right\}, \tag{6}$$
where $\mu \in \mathbb{R}$ and $\Psi$ is a distribution function up to a multiplicative constant. The characteristic function (6) describes the stochastic process which drives the interest rate in the natural world. Hence, $\mu$ and $\Psi$ can be estimated from the observed data. In the case in which the variance of $X$ is finite, the characteristic function (6) simplifies to:
$$\varphi_X(u) = \exp\left\{ iu\mu + \int_{-\infty}^{+\infty} \left( e^{iux} - 1 - iux \right) \frac{1}{x^2}\, d\Psi(x) \right\}. \tag{7}$$
The finite-variance I.D. model with characteristic function (7) contains the Normal and the Poisson distributions as special cases. In fact, suppose that $\Psi(x) = \sigma^2 I_{[0,\infty)}(x)$, where $I_{[a,\infty)}(x)$ denotes the indicator function of the interval $[a,\infty)$, i.e.,
$$I_{[a,\infty)}(x) = \begin{cases} 0, & \text{if } x < a, \\ 1, & \text{if } x \ge a, \end{cases}$$
with $a, x \in \mathbb{R}$.
In this case, the integral in (7) becomes $-\sigma^2u^2/2$ and the characteristic function reduces to:
$$\varphi_X(u) = \exp\left\{ iu\mu - \frac{\sigma^2 u^2}{2} \right\}, \tag{8}$$
which is the characteristic function of a Normal random variable with mean $\mu$ and variance $\sigma^2$. The Poisson distribution is obtained from (7) by setting $\Psi(x) = \lambda K^2 I_{[K,\infty)}(x)$, with $\lambda > 0$ and $K \ne 0$ (and $\mu = 0$). In this case, the characteristic function becomes:
$$\varphi_X(u) = \exp\left\{ \lambda\left( e^{iuK} - 1 - iuK \right) \right\}. \tag{9}$$
The relation (9) defines the characteristic function of a scaled and shifted Poisson random variable of the form $K(N - \lambda)$, where $N$ is a Poisson random variable with parameter $\lambda$, i.e.,
$$P\{N = k\} = e^{-\lambda}\,\frac{\lambda^k}{k!},$$
with $\lambda > 0$ and $k = 0, 1, 2, \dots$. The characteristic function of $N$ itself is:
$$\varphi_N(u) = \exp\left\{ \lambda\left( e^{iu} - 1 \right) \right\}. \tag{10}$$
Hence, the finite-variance I.D. distribution defined by (7) can be interpreted as an infinite mixture of independent Poisson random variables and one independent Gaussian random variable. For practical purposes, the I.D. model can, for example, be specified as a Normal random variable and a finite number $n$ of independent Poisson random variables, as defined in (5), i.e.,
$$X_t = \sigma B_t + \sum_{i=1}^{n} a_i N_t^{(i)} + \mu, \tag{11}$$
where $t \in [0,T]$; $n \ge 1$; $\sigma \ge 0$; $B_t$ is a standard Normal random variable; $N_t^{(i)}$ are independent Poisson random variables (independent from $B_t$) with parameters $\lambda_i$, for $i = 1,\dots,n$; $a_i \sim \mathrm{Bin}(\mu_i, \gamma_i^2)$ are independent of any other variable, with $\mu_i, \gamma_i \in \mathbb{R}$, for $i = 1,\dots,n$; $\mu \in \mathbb{R}$.

2.4. The smile effect and infinitely divisible distributions

If the Black–Scholes (1973) model of pricing contingent claims were true, the implied volatility would be constant for all claims. This, however, is not observed in practice. Instead, a volatility smile is observed for call options across strike prices.
That is, deep-in-the-money call options (options with a strike price $K \ll X$, where $X$ is the current price of the underlying asset) and deep-out-of-the-money options have an implied volatility parameter $\sigma$ (the scale parameter in the Normal distribution) which exceeds the implied volatility parameters of at-the-money options. Said another way, $\sigma(K)$ appears to be a convex function of $K$. Said still a third way, the Black–Scholes formula underprices deep-in-the-money and deep-out-of-the-money call options, whereas it overprices at-the-money options. A related issue is the leptokurtic nature of financial asset values. That is, it is also widely observed that asset values tend to be more highly peaked and have fatter tails than is implied by the Normal model. The use of infinitely divisible distributions may help to explain the phenomenon of leptokurtosis as well as the smile effect.[6]

[6] See Schumacher (1997) and Rachev and Mittnik (2000) for further details.

Definition 2.3. The coefficient of kurtosis $\beta_4$ of the random variable $X$ is defined as
$$\beta_4 = \frac{\mu_4}{\mu_2^2},$$
where $\mu_4$ and $\mu_2$ are the fourth and second central moments of $X$, respectively.

Remark 2.2. The existence of a moment generating function of $X$ is enough to ensure the finiteness of the central moments.

The coefficient of kurtosis is a measure of the peakedness of the distribution. It can be shown that for a Normally distributed random variable $X$, the coefficient of kurtosis is $\beta_4 = 3$. Given the leptokurtic nature of asset prices, we might find it desirable that our model of interest rate behavior have a coefficient of kurtosis $\beta_4 \ge 3$.

Theorem 2.1. Let $\beta_4(X)$ be the coefficient of kurtosis of an infinitely divisible random variable $X$. Then $\beta_4(X) \ge 3$.

Proof: Let $Y_n = \sum_{i=1}^{n} (a_i N_i + b_i)$, where the $N_i$ are independent Poisson random variables with rates $\lambda_i$. Writing $\mathrm{i}$ for the imaginary unit, the log characteristic function of $Y_n$ is then
$$\log \varphi_{Y_n}(u) = \sum_{i=1}^{n} \left[ \lambda_i\left( e^{\mathrm{i} a_i u} - 1 \right) + \mathrm{i} u b_i \right].$$
The $k$-th derivative (for $k \ge 2$) of the log characteristic function is
$$\frac{d^{k} \log \varphi_{Y_n}(u)}{du^{k}} = \sum_{i=1}^{n} \lambda_i (\mathrm{i} a_i)^k e^{\mathrm{i} a_i u},$$
where $\mathrm{i}$ inside the sum denotes the imaginary unit. Hence, we obtain the semi-invariants of order $j$, for $j \ge 2$, for the finite sum of scaled and shifted Poisson random variables:
$$\kappa_j = \sum_{i=1}^{n} a_i^{j} \lambda_i.$$
The semi-invariants can be expressed in terms of the central moments as $\kappa_2 = \mu_2 = \sigma^2$, $\kappa_3 = \mu_3$, $\kappa_4 = \mu_4 - 3\mu_2^2$, and so on. From this it follows that the coefficient of kurtosis for the given sum is
$$\beta_4 = \frac{\mu_4}{\mu_2^2} = \frac{\sum_{i=1}^{n} a_i^4 \lambda_i + 3\left(\sum_{i=1}^{n} a_i^2 \lambda_i\right)^2}{\left(\sum_{i=1}^{n} a_i^2 \lambda_i\right)^2} \ge 3.$$
Now by a remark of Gnedenko (1992), all infinitely divisible random variables are limits of sums of scaled and shifted Poisson random variables; so the theorem follows.

2.5. The discrete-time process

Let us divide the time interval $[0,T]$ into $M$ ($M \ge 1$) sub-intervals of equal length $\Delta t$, i.e., $[t_i, t_{i+1})$, where $i = 0,1,\dots,M-1$, $t_i = i\,\Delta t$ and $\Delta t = T/M$. Then, for $i = 0,1,\dots,M-1$, the stochastic differential equation (4) admits the discretization:
$$r^{(M)}_{t_{i+1}} - r^{(M)}_{t_i} = k\left(\theta^{(M)}_{t_i} - r^{(M)}_{t_i}\right)\Delta t + \sigma\left(B^{(M)}_{t_{i+1}} - B^{(M)}_{t_i}\right) + a\left(N^{(M)}_{t_{i+1}} - N^{(M)}_{t_i}\right), \tag{12}$$
where $\{B^{(M)}_{t_{i+1}} - B^{(M)}_{t_i}\}_{i=0}^{M-1}$ are i.i.d. random variables, distributed as $N(0,\Delta t)$;
$$a\left(N^{(M)}_{t_{i+1}} - N^{(M)}_{t_i}\right) = \sum_{j=1}^{n} a_j\left(N^{(M),(j)}_{t_{i+1}} - N^{(M),(j)}_{t_i}\right);$$
and $\{N^{(M),(j)}_{t_{i+1}} - N^{(M),(j)}_{t_i}\}_{i=0}^{M-1}$ are i.i.d. random variables, distributed as $\mathrm{Poisson}(\lambda^{(j)}\Delta t)$, for any $j = 1,\dots,n$. As $M \to \infty$, $r^{(M)}_{t_i}$ ($i = 0,1,\dots,M$), as defined in (12), converges in the weak sense to the process $r_t$ ($t \in [0,T]$) satisfying (4). In the following, the index $M$ will be omitted for simplicity.

Let $\mu$ and $\gamma$ be given real parameters and set, for notational convenience, $q = \lambda\Delta t$. Following Das (1997), let us define the four sequences of random variables:

* $X^{(1)}_{t_{i+1}} = \frac{1}{\sqrt{\Delta t}}\left(B_{t_{i+1}} - B_{t_i}\right)$ ($i = 0,1,\dots,M-1$), where $\{X^{(1)}_{t_i}\}_{i=0}^{M}$ are i.i.d. standard Normal random variables;
* $X^{(2)}_{t_i} = \mu + \gamma$ with probability $q/2$, $0$ with probability $1-q$, and $\mu - \gamma$ with probability $q/2$, i.i.d. for $i = 0,1,\dots,M$ and with $\{X^{(2)}_{t_i}\}_{i=0}^{M}$ and $\{X^{(1)}_{t_i}\}_{i=0}^{M}$ mutually independent;
* $\tilde X^{(2)}_{t_i} = \mu + \gamma$ with probability $1/2$ and $\mu - \gamma$ with probability $1/2$, i.i.d. for $i = 0,1,\dots,M$;
* $\mathrm{Bin}(q)_{t_i} = 1$ with probability $q$ and $0$ with probability $1-q$, i.i.d. for $i = 0,1,\dots,M$ and with $\{\mathrm{Bin}(q)_{t_i}\}_{i=0}^{M}$ and $\{\tilde X^{(2)}_{t_i}\}_{i=0}^{M}$ mutually independent.

It is easy to show that $\{X^{(2)}_{t_i}\}_{i=0}^{M} \overset{d}{=} \{\tilde X^{(2)}_{t_i}\,\mathrm{Bin}(q)_{t_i}\}_{i=0}^{M}$.

Let $D$ be the metric space of right-continuous real-valued functions on $[0,T]$ with left limits. Let the interest rate process be defined on the path space $D[0,T]$. Let $C[0,T]$ be the subspace of $D[0,T]$ of all real-valued continuous functions on $[0,T]$. The space $D$ is endowed with the Skorokhod topology which, when restricted to the space $C[0,T]$, is the topology of uniform convergence.[7] By the Functional Limit Theorem for stochastic differential equations, to approximate the spot rate process as defined in (4), in the weak sense in $D[0,T]$, with the process $r_{t_i}$ ($i = 0,1,\dots,M$) as defined in (12), we can replace $\{\sigma X^{(1)}_{t_i}\sqrt{\Delta t}\}_{i=0}^{M}$ by the sequence of i.i.d. random variables $\{\bar X^{(1)}_{t_i}\}_{i=0}^{M}$ distributed as follows:
$$\bar X^{(1)}_{t_i} = \begin{cases} +\sigma\sqrt{\Delta t}, & \text{w. prob. } \tfrac12, \\ -\sigma\sqrt{\Delta t}, & \text{w. prob. } \tfrac12, \end{cases}$$
for any $i = 0,1,\dots,M$. Then, the discrete spot rate process $r_{t_i}$ ($i = 0,1,\dots,M$) as defined in (12) is equivalent to
$$r_{t_{i+1}} = r_{t_i} + k(\theta_{t_i} - r_{t_i})\Delta t + \bar X^{(1)}_{t_{i+1}} + X^{(2)}_{t_{i+1}} \qquad (i = 0,\dots,M-1). \tag{13}$$

Remark 2.3. The discrete process above mimics the behavior of a continuous-time jump-diffusion process. Hence for $i = 0,\dots,M$, the first noise term, $\bar X^{(1)}_{t_i}$, represents the diffusion component, while the second one, $X^{(2)}_{t_i}$, represents the jump, which assumes values $\mu + \gamma$ or $\mu - \gamma$. Therefore, the jump has mean $\mu$ and variance $\gamma^2$. $\lambda$ is the probability parameter of a jump in unit time and hence the probability of a jump in any time interval approximates $q = \lambda\Delta t$. In our process, jumps occur "rarely", which is achieved by choosing a low value for $q \in (0,1)$. Moreover, as the time interval $\Delta t$ decreases, the probability of a jump goes to zero. Finally, we may choose the parameters $\mu$ and $\gamma$ to provide the necessary skewness and kurtosis.
In particular, the parameter $\mu$ governs skewness, while $\gamma$ drives kurtosis.

Remark 2.4. The moments of the discrete process provide the intuition for why the choice of jump form injects the necessary skewness and kurtosis into the model. The first four moments are as follows:
* mean: $r_t + k(\theta_t - r_t)\Delta t + q\mu$;
* variance: $q(\mu^2 + \gamma^2) - q^2\mu^2 + \sigma^2\Delta t$;
* skewness: $q\mu(1-q)(\mu^2 + 3\gamma^2 - 2q\mu^2)$, the sign of which clearly depends on the sign of $\mu$;
* kurtosis (expressed for $\mu = 0$ and $\sigma = 0$, to keep the expression simple and to observe the effect of $\gamma$): $q\gamma^4$, which demonstrates that the magnitude of kurtosis depends on the size of $\gamma$. When $\mu \ne 0$, kurtosis is equal to
$$(1-q)\,q^4\mu^4 + \frac{q}{2}(\mu - \gamma - q\mu)^4 + \frac{q}{2}(\mu + \gamma - q\mu)^4,$$
which means that the sign and size of $\mu$ also affect the degree of kurtosis. However, here too, the degree of kurtosis increases with $\gamma$.

[7] See Billingsley (1995).

3. The tree representation

The diffusion process $\bar X^{(1)}_{t_i}$ ($i = 0,\dots,M-1$), as defined above, can be represented on the binomial tree:
$$\bar X^{(1)}_{t_{i+1}} = \begin{cases} \bar X^{(1)}_{t_i} + \sigma\sqrt{\Delta t}, & \text{w. prob. } \tfrac12, \\ \bar X^{(1)}_{t_i} - \sigma\sqrt{\Delta t}, & \text{w. prob. } \tfrac12. \end{cases}$$
The random term representing the jump in Equation (13) is $X^{(2)}_{t_i}$ and can be represented on the trinomial tree:
$$X^{(2)}_{t_{i+1}} = \begin{cases} X^{(2)}_{t_i} + \mu + \gamma, & \text{w. prob. } \tfrac{q}{2}, \\ X^{(2)}_{t_i}, & \text{w. prob. } 1-q, \\ X^{(2)}_{t_i} + \mu - \gamma, & \text{w. prob. } \tfrac{q}{2}. \end{cases}$$
Let us consider, at first, only the discrete diffusion process:
$$r_{t_{i+1}} = r_{t_i} + k(\theta_{t_i} - r_{t_i})\Delta t + \bar X^{(1)}_{t_{i+1}} \qquad (i = 0,\dots,M-1), \tag{14}$$
and, following the methodology of Hull and White (1994), let us construct a binomial recombining tree for a variable $r^*_{t_i}$ ($i = 0,\dots,M$), which is initially zero, follows the process
$$r^*_{t_{i+1}} = r^*_{t_i} - k r^*_{t_i}\Delta t + \bar X^{(1)}_{t_{i+1}} \qquad (i = 0,\dots,M-1) \tag{15}$$
and is symmetrical about $r^*_{t_i} = 0$. The variable $[r^*_{t_{i+1}} - r^*_{t_i}]$ is Bernoulli distributed. If terms of higher order than $\Delta t^2$ are ignored, its first two moments are
$$E\left[r^*_{t_{i+1}} - r^*_{t_i}\right] = -k r^*_{t_i}\Delta t, \qquad E\left[\left(r^*_{t_{i+1}} - r^*_{t_i}\right)^2\right] = \sigma^2\Delta t + k^2 \left(r^*_{t_i}\right)^2 \Delta t^2,$$
so that the variance of the increment is $\sigma^2\Delta t$.
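The jump-moment expressions quoted in Remark 2.4 can be checked exactly with rational arithmetic. The sketch below is illustrative only: the function names and the numerical values are assumptions, the symbols $\mu$, $\gamma$, $q$ follow the notation used for the three-point jump, and only the jump contribution is checked (the diffusion term $\sigma^2\Delta t$ is left out). The "skewness" and "kurtosis" here are the third and fourth central moments, as in the remark, not standardized coefficients.

```python
from fractions import Fraction


def jump_central_moments(mu, gamma, q):
    """Mean and central moments (orders 2-4) of the three-point jump.

    The jump equals mu + gamma w.p. q/2, 0 w.p. 1 - q, mu - gamma w.p. q/2.
    """
    support = [(mu + gamma, q / 2), (0, 1 - q), (mu - gamma, q / 2)]
    mean = sum(x * p for x, p in support)

    def central(order):
        return sum((x - mean) ** order * p for x, p in support)

    return mean, central(2), central(3), central(4)


def closed_forms(mu, gamma, q):
    """Closed-form expressions as quoted in Remark 2.4 (jump part only)."""
    mean = q * mu
    variance = q * (mu**2 + gamma**2) - q**2 * mu**2
    third = q * mu * (1 - q) * (mu**2 + 3 * gamma**2 - 2 * q * mu**2)
    fourth = ((1 - q) * q**4 * mu**4
              + q / 2 * (mu - gamma - q * mu) ** 4
              + q / 2 * (mu + gamma - q * mu) ** 4)
    return mean, variance, third, fourth


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    mu, gamma, q = Fraction(1, 4), Fraction(1, 2), Fraction(1, 10)
    print(jump_central_moments(mu, gamma, q) == closed_forms(mu, gamma, q))
```

Using `Fraction` avoids rounding, so the comparison is an exact identity check rather than a floating-point approximation.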
Let $j_i$ be a positive or negative integer, varying in the range $\left(-\frac{1}{k\Delta t}, +\frac{1}{k\Delta t}\right)$ and depending on $i$, with $i = 0,\dots,M$. Recalling that $t_i = i\,\Delta t$, let $r^*_D(i,\cdot)$ denote the corresponding value of $r^*_{t_i}$ on a lattice scheme. For any integer $i \in [0,M]$ and any integer $j_i \in \left(-\frac{1}{k\Delta t}, +\frac{1}{k\Delta t}\right)$, the value of $r^*_D(i,\cdot)$ at "level" $j_i$, i.e., at node $(i,j_i)$, will be denoted by $r^*_D(i,j_i)$, and similar notation applies to $r_{t_i}$.

Assumption 3.1. The binomial tree is recombining.

For every $i = 0,\dots,M$, let $n_i$ be the number of nodes of the tree generated at time $t_i = i\,\Delta t$. Since the binomial tree is assumed to be recombining, the number of nodes on the tree up to time $t_i$ will be
$$\sum_{k=0}^{i} n_k = \sum_{k=0}^{i} (k+1) = \sum_{k=1}^{i+1} k.$$
Let $J_i$ denote the set of indices $j_i$ generated at time $t_i = i\,\Delta t$, with $i = 0,\dots,M$. Then $|J_i| = n_i = i+1$. Assuming $J_{-2} = J_{-1} = \emptyset$ and recalling that the tree is recombining, we can calculate recursively the set $J_i$ as:
$$J_i = J_{i-2} \cup \{i, -i\}, \qquad \text{for any } i = 0,\dots,M.$$

For any integer $j_i \in \left[0, +\frac{1}{k\Delta t}\right)$, let $\delta(j_i)$ denote the spacing between interest rates on the tree when moving from level $j_i - 1$ to level $j_i$ ("up" movement), or from level $j_i$ to level $j_i - 1$ ("down" movement). In both cases, "up" or "down" movement, the width of the spacing between rates depends on the maximum (starting or ending) level reached by moving between the two adjacent nodes, i.e., if the interest rate moves (up or down) from node $(i,j_i)$ to node $(i+1,j_{i+1})$, with $j_i \ge 1$, then
$$r^*_D(i+1,j_{i+1}) - r^*_D(i,j_i) = \pm\,\delta\left(\max\{j_i,j_{i+1}\}\right),$$
with the positive sign for an "up" movement and the negative sign for a "down" movement. On the contrary, for any integer $j_i \in \left(-\frac{1}{k\Delta t}, 0\right)$ set:
$$r^*_D(i+1,j_{i+1}) - r^*_D(i,j_i) = \pm\,\delta\left(\min\{j_i,j_{i+1}\}\right).$$
Then
$$\delta(j_i) = \delta(-j_i), \qquad \forall\, j_i \in \left(-\tfrac{1}{k\Delta t}, +\tfrac{1}{k\Delta t}\right) \setminus \{0\}.$$

Remark 3.1. Note that, by construction, the quantity $\delta(0) = 0$ will never be employed in the tree building, i.e., in deriving the interest rate position on the tree.
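The recursion for the index sets can be illustrated with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the truncation of levels to the admissible range around zero is deliberately ignored, so it only reproduces the counting argument for an unbounded recombining tree.

```python
def level_sets(M):
    """Return [J_0, ..., J_M] with J_i = J_{i-2} | {i, -i} and J_{-2} = J_{-1} = {}."""
    sets = []
    for i in range(M + 1):
        previous = sets[i - 2] if i >= 2 else set()
        sets.append(previous | {i, -i})
    return sets


def cumulative_nodes(i):
    """Number of nodes generated up to and including step i: 1 + 2 + ... + (i + 1)."""
    return (i + 1) * (i + 2) // 2


if __name__ == "__main__":
    for i, J in enumerate(level_sets(4)):
        print(i, sorted(J))
```

Each step adds the two new extreme levels to the set from two steps earlier, which is why levels at a given time step all share the parity of that step.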
For any $i = 1,\dots,M-1$ and any integer $j_i \in \left(-\frac{1}{k\Delta t}, +\frac{1}{k\Delta t}\right)$, let $\delta_u(j_i,j_{i+1})$ and $\delta_d(j_i,j_{i+1})$ denote, respectively, an "up" and a "down" movement on the binomial tree from level $j_i$ to level $j_{i+1}$. Then
$$\delta_u(j_i,j_{i+1}) = -\,\delta_d(-j_i,-j_{i+1}).$$

Assumption 3.2. Let us suppose that, for any $i = 1,\dots,M$ and $j_i \in J_i \cap \left(-\frac{1}{k\Delta t}, +\frac{1}{k\Delta t}\right)$, the spacing between interest rates on the tree is such that
$$\delta(j_i) = \frac{|j_i|\,\sigma\sqrt{\Delta t}}{\sqrt{1 - j_i^2 k^2 \Delta t^2}}.$$

Remark 3.2. The spacing between nodes is not constant, but assumes values increasing with $j_i$. This makes our model more flexible than the standard Cox, Ross and Rubinstein (1979) binomial tree, which consists of a set of nodes, representing possible future stock prices, with constant logarithmic spacing between them. This spacing is a measure of the future volatility, itself assumed to be constant in the Cox, Ross and Rubinstein framework and in the Black and Scholes model (1973) to which a Cox, Ross and Rubinstein tree, with an "infinite" number of time steps, converges. The constancy of volatility cannot easily be reconciled with the observed structure of implied volatilities for options traded in most financial markets. Volatility varies with both strike price and expiration. This variation, known as the implied volatility "smile", is currently a significant and persistent feature of option markets. By contrast, the constant local volatility assumption in the Black and Scholes theory and in the Cox, Ross and Rubinstein tree leads to the absence of a volatility smile, at least as long as market frictions are ignored [see Derman, Kani and Chriss (1996)]. Our tree belongs to the family of implied tree theories that extend the Black and Scholes model to make it consistent with the shape of the smile [see Derman and Kani (1997a, b) and Rubinstein (1994)].

Because $r^*_{t_i}$ is initially zero, $r^*_D(0,0) = 0$. The dynamics of $r^*_{t_i}$ incorporate a mean reversion to zero, where the strength of the mean reversion is proportional to the value of $r^*_{t_i}$.
Therefore, there exist an upper and a lower bound to the interest rate process. Let us formalize this condition as follows:

Assumption 3.3. There exist two integers $j_u \in \left(0, +\frac{1}{k\Delta t}\right)$ and $j_d \in \left(-\frac{1}{k\Delta t}, 0\right)$ such that
* $j_d \le j_i \le j_u$ for any $j_i \in J_i$ ($i = 0,\dots,M$);
* if $j_i = j_u$ at step $i$, then $j_{i+1} = j_u - 1$ at step $i+1$ ($i = 0,\dots,M-1$);
* if $j_i = j_d$ at step $i$, then $j_{i+1} = j_d + 1$ at step $i+1$ ($i = 0,\dots,M-1$).

The parameters $j_u$ and $j_d$ may be estimated from the real values assumed by $r_{t_i}$; otherwise we can set
$$j_u = \left[\frac{1}{k\Delta t}\right] - 1 \qquad \text{and} \qquad j_d = -j_u,$$
where $\left[\frac{1}{k\Delta t}\right]$ denotes the integer part of $\frac{1}{k\Delta t}$. Let $p_u(i,j_i)$ and $p_d(i,j_i)$ denote, respectively, the transition probabilities of "up" and "down" movements starting from node $(i,j_i)$, with $i = 0,\dots,M-1$ and $j_i \in J_i \cap (j_d,j_u)$. The two probabilities must be chosen to match the expected change and variance of the change in $r^*_{t_i}$ over the next time interval $\Delta t$. The probabilities must also be non-negative and sum to unity. This leads to the equations below:
$$p_u(i,j_i)\,\delta_u(j_i,j_{i+1}) + p_d(i,j_i)\,\delta_d(j_i,j_{i+1}) = -k\,r^*_D(i,j_i)\,\Delta t,$$
$$p_u(i,j_i)\,\delta_u^2(j_i,j_{i+1}) + p_d(i,j_i)\,\delta_d^2(j_i,j_{i+1}) = \sigma^2\Delta t + k^2\,r^*_D(i,j_i)^2\,\Delta t^2,$$
$$p_u(i,j_i) + p_d(i,j_i) = 1, \qquad 0 \le p_u(i,j_i) \le 1, \qquad 0 \le p_d(i,j_i) \le 1.$$
Using Assumptions 3.2 and 3.3, the solution to the system above is given by the following cases:
* for any $i = 0,\dots,M-1$ and $j_i \in J_i \cap [+1,j_u)$:
$$p_u(i,j_i) = p_u(j_i) = \frac12 - \sqrt{\frac14 - \frac{\sigma^2\Delta t}{\left[\delta_u(j_i,j_{i+1}) - \delta_d(j_i,j_{i+1})\right]^2}}, \qquad p_d(i,j_i) = p_d(j_i) = \frac12 + \sqrt{\frac14 - \frac{\sigma^2\Delta t}{\left[\delta_u(j_i,j_{i+1}) - \delta_d(j_i,j_{i+1})\right]^2}};$$
* for any $i = 0,\dots,M-1$ and $j_i \in J_i \cap (j_d,-1]$:
$$p_u(i,j_i) = p_u(j_i) = \frac12 + \sqrt{\frac14 - \frac{\sigma^2\Delta t}{\left[\delta_u(j_i,j_{i+1}) - \delta_d(j_i,j_{i+1})\right]^2}}, \qquad p_d(i,j_i) = p_d(j_i) = \frac12 - \sqrt{\frac14 - \frac{\sigma^2\Delta t}{\left[\delta_u(j_i,j_{i+1}) - \delta_d(j_i,j_{i+1})\right]^2}};$$
* for any $i = 0,\dots,M-1$ and $j_i = 0$: $p_u(i,0) = p_u(0) = \frac{1 - k\Delta t}{2}$, $p_d(i,0) = p_d(0) = \frac{1 + k\Delta t}{2}$;
* for any $i = 0,\dots,M-1$ and $j_i = j_u$: $p_u(i,j_u) = p_u(j_u) = 0$, $p_d(i,j_u) = p_d(j_u) = 1$;
* for any $i = 0,\dots,M-1$ and $j_i = j_d$: $p_u(i,j_d) = p_u(j_d) = 1$, $p_d(i,j_d) = p_d(j_d) = 0$.

Remark 3.3. At each node $(i,j_i)$, for any $i = 0,\dots,M-1$ and $j_i \in J_i \cap [j_d,j_u]$, with $j_d = -j_u$ and $j_i \ne 0$, the transition probabilities depend only on $j_i$, so the resulting binomial tree, besides being recombining, is also symmetrical around $j_i = 0$, i.e., $p_u(j_i) = p_d(-j_i)$, $p_u(j_u) = p_d(j_d)$ and $p_u(j_d) = p_d(j_u)$. Thus, it is not necessary to save all transition probabilities in one large array and the loss in computing time is very small.

Remark 3.4. For every $i = 0,\dots,M-1$ and $j_i \in J_i \cap [j_d,j_u]$, the transition probabilities $p_u(j_i)$ and $p_d(j_i)$ reflect the mean reversion of the rate $r^*_{t_i}$ and, thus, of $r^*_D(i,j_i)$. In fact,
* if $j_i > 0$, $r^*_{t_i}$ lies above its medium-term tendency, $p_u(j_i) < \frac12$ and $p_d(j_i) > \frac12$; moreover, for any integer $k > 0$, $p_u(j_i) > p_u(j_i + k)$ while $p_d(j_i) < p_d(j_i + k)$, i.e., the larger the $j_i$, the greater the calling-back strength;
* if $j_i < 0$, $r^*_{t_i}$ lies below its medium-term tendency, $p_u(j_i) > \frac12$ and $p_d(j_i) < \frac12$; moreover, for any integer $k < 0$, $p_u(j_i) < p_u(j_i + k)$ while $p_d(j_i) > p_d(j_i + k)$, i.e., the smaller the $j_i$, the greater the calling-back strength.

Following Hull and White (1994), we have to convert the tree for $r^*_{t_i}$ (i.e., for $r^*_D(i,\cdot)$) into a tree for $r_{t_i}$ (i.e., for $r_D(i,\cdot)$). This is accomplished by displacing the nodes on the $r^*_{t_i}$-tree so that the initial term structure is exactly matched. Define
$$\alpha_{t_i} = r_{t_i} - r^*_{t_i} \qquad (i = 0,\dots,M)$$
and let $\alpha(i) = r_D(i,\cdot) - r^*_D(i,\cdot)$ be its lattice version. For a given "time level" $i = 0,\dots,M$, all nodes are shifted by the same amount $\alpha(i)$. By the definitions of $r_{t_i}$ and $r^*_{t_i}$ in (14) and (15), the variation of $\alpha_{t_i}$ in the interval $\Delta t$ is:
$$\Delta\alpha_{t_i} = k\left[\theta_{t_i} - \alpha_{t_i}\right]\Delta t.$$
Then, as $\Delta t \to 0$ it holds:
$$d\alpha_t = k\left[\theta_t - \alpha_t\right]dt.$$
The estimation of $\alpha_t$, i.e., the calibration of the tree, could be done via the "spline–wavelet" method proposed in Cattani and Izzi (2000)[8] or, in turn, by following the procedure proposed by Hull and White (1994). The solution $\alpha_t$ can be used to create a tree for $r_{t_i}$ from the corresponding tree for $r^*_{t_i}$. The approach is to set the interest rates on the $r_D$-tree at time $i\,\Delta t$ to be equal to the corresponding interest rates on the $r^*_D$-tree plus the value of $\alpha(\cdot)$ at time $i\,\Delta t$ while keeping the probabilities the same. Then, for any $i = 0,\dots,M-1$ and $j_i \in J_i \cap [j_d,j_u]$, it holds that:
$$r^*_D(i,j_i) = \mathrm{sgn}(j_i) \sum_{l=0}^{|j_i|} \frac{|l|\,\sigma\sqrt{\Delta t}}{\sqrt{1 - l^2k^2\Delta t^2}} \qquad \text{and} \qquad r_D(i,j_i) = r^*_D(i,j_i) + \alpha(i),$$
where $\mathrm{sgn}(j_i) = +1$ if $j_i \ge 0$ and $-1$ if $j_i < 0$.

[8] Cattani and Izzi (2000) propose a spline Haar-wavelet interpolation as a flexible tool for interest rate and term structure estimation. The main advantage in employing wavelet functions is that they are a very well-localized representation of the process: the accuracy is reached with only a few sets of data and singularities do not influence the estimation of the entire process.

Remark 3.5. For any $i = 0,\dots,M-1$ and $j_i \in J_i \cap [j_d,j_u]$, $r^*_D(\cdot,\cdot)$, $p_u(\cdot)$ and $p_d(\cdot)$ do not depend on $i$, a property referred to as stationarity: the behavior of $r^*_D$ depends on its current value via $j$ but not on the date.

In the next instant of time, $t_{i+1} = (i+1)\Delta t$, $r^*_D(\cdot)$ could assume the following values:
$$r^*_D(j_{i+1}) = \begin{cases} r^*_D(j_i) + \delta_u(j_i,j_{i+1}), & \text{w. prob. } p_u(j_i), \\ r^*_D(j_i) + \delta_d(j_i,j_{i+1}), & \text{w. prob. } p_d(j_i), \end{cases}$$
while
$$r_D(i+1,j_{i+1}) = \begin{cases} r^*_D(j_i) + \delta_u(j_i,j_{i+1}) + \alpha(i+1), & \text{w. prob. } p_u(j_i), \\ r^*_D(j_i) + \delta_d(j_i,j_{i+1}) + \alpha(i+1), & \text{w. prob. } p_d(j_i). \end{cases}$$

Let us consider now the discrete jump process:
$$X^{(2)}_{t_i} = \begin{cases} \mu + \gamma, & \text{w. prob. } \tfrac{q}{2}, \\ 0, & \text{w. prob. } 1-q, \\ \mu - \gamma, & \text{w. prob. } \tfrac{q}{2}. \end{cases}$$
For $i = 0,\dots,M-1$, let $r_{JD}(i,\cdot)$ denote the value corresponding to
$$r_{t_{i+1}} = r_{t_i} + k(\theta_{t_i} - r_{t_i})\Delta t + \bar X^{(1)}_{t_{i+1}} + X^{(2)}_{t_{i+1}} \qquad (i = 0,\dots,M-1) \tag{16}$$
on a lattice scheme, and $r^*_{JD}(i,\cdot)$ that of
$$r^*_{t_{i+1}} = r^*_{t_i} - k r^*_{t_i}\Delta t + \bar X^{(1)}_{t_{i+1}} + X^{(2)}_{t_{i+1}} \qquad (i = 0,\dots,M-1). \tag{17}$$
The convolution between the jump and the diffusion processes provides the outcome for representing the evolution of the entire term structure. The product space resulting from this convolution could be represented by a hexanomial tree.
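The product structure just described can be sketched in code. The block below is an illustration under stated assumptions rather than the chapter's implementation: it reads the spacing of Assumption 3.2 as $|j|\sigma\sqrt{\Delta t}/\sqrt{1 - j^2k^2\Delta t^2}$, takes the size of a move between two adjacent levels to be the spacing at the level farther from zero, uses the square-root formula for the interior transition probabilities, and combines each binomial move with the three jump outcomes. All function names and parameter values are hypothetical.

```python
import math


def spacing(j, k, sigma, dt):
    """Spacing delta(j), under the assumed reading of Assumption 3.2."""
    return abs(j) * sigma * math.sqrt(dt) / math.sqrt(1.0 - (j * k * dt) ** 2)


def moves(j, k, sigma, dt):
    """Signed up and down increments of r* when leaving level j."""
    up = spacing(max(abs(j), abs(j + 1)), k, sigma, dt)
    down = -spacing(max(abs(j), abs(j - 1)), k, sigma, dt)
    return up, down


def probabilities(j, k, sigma, dt):
    """Interior-node probabilities (p_up, p_down); mean reversion lowers p_up for j >= 0."""
    up, down = moves(j, k, sigma, dt)
    root = math.sqrt(0.25 - sigma**2 * dt / (up - down) ** 2)
    p_up = 0.5 - root if j >= 0 else 0.5 + root
    return p_up, 1.0 - p_up


def hexanomial_branches(j, k, sigma, dt, mu, gamma, q):
    """Six (increment, probability) pairs: binomial move times three-point jump."""
    up, down = moves(j, k, sigma, dt)
    p_up, p_down = probabilities(j, k, sigma, dt)
    jumps = [(mu + gamma, q / 2), (0.0, 1.0 - q), (mu - gamma, q / 2)]
    return [(move + jump, p_move * p_jump)
            for move, p_move in ((up, p_up), (down, p_down))
            for jump, p_jump in jumps]


if __name__ == "__main__":
    # Hypothetical monthly step and parameter values.
    for increment, prob in hexanomial_branches(3, 0.1, 0.01, 1 / 12,
                                               mu=0.002, gamma=0.001, q=0.01):
        print(f"{increment:+.6f}  {prob:.6f}")
```

By construction the binomial part satisfies $p_u p_d(\delta_u - \delta_d)^2 = \sigma^2\Delta t$, i.e., it matches the variance of the diffusion increment; the sketch does not attempt to verify the mean condition or to handle the boundary levels $j_u$ and $j_d$.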
With reference to $r^*_{JD}(i,\cdot)$, i.e., to Equation (17), we have:
$$r^*_{JD}(i+1,j_{i+1}) = r^*_{JD}(j_{i+1}) = \begin{cases} r^*_D(j_i) + \delta_u(j_i,j_{i+1}) + \mu + \gamma, & \text{w. prob. } \tfrac{q}{2}\,p_u(j_i), \\ r^*_D(j_i) + \delta_u(j_i,j_{i+1}), & \text{w. prob. } (1-q)\,p_u(j_i), \\ r^*_D(j_i) + \delta_u(j_i,j_{i+1}) + \mu - \gamma, & \text{w. prob. } \tfrac{q}{2}\,p_u(j_i), \\ r^*_D(j_i) + \delta_d(j_i,j_{i+1}) + \mu + \gamma, & \text{w. prob. } \tfrac{q}{2}\,p_d(j_i), \\ r^*_D(j_i) + \delta_d(j_i,j_{i+1}), & \text{w. prob. } (1-q)\,p_d(j_i), \\ r^*_D(j_i) + \delta_d(j_i,j_{i+1}) + \mu - \gamma, & \text{w. prob. } \tfrac{q}{2}\,p_d(j_i). \end{cases}$$
Then, the hexanomial tree representing the jump-diffusion process (16) is defined by
$$r_{JD}(i+1,j_{i+1}) = r^*_{JD}(j_{i+1}) + \alpha(i+1),$$
for any $i = 0,\dots,M-1$ and $j_i \in J_i \cap (j_d,j_u)$.

4. The econometric analysis

4.1. Data

The short-term rate used in estimating our model is the one-month Euribor (EURo InterBank Offered Rate). Data come from the European Central Bank and Datastream and have been sampled daily from 1 January 1999 to 5 March 2001 (567 working days). Weekends and holidays have not been treated specifically (Monday is taken as the next day after Friday). Whereas weekend effects have been documented for stock prices, there does not seem to be a conclusive weekend effect on money market instruments. While similar theoretical and empirical work could be performed on zero-coupon bond yields, the liquidity and default characteristics of such securities are different from those of interbank instruments and the time-varying effects introduced by such features would blur our analysis.

Before defining our stochastic model for the interbank interest rates, a statistical analysis on a wide set of data has been carried out. The short-term rates used in this preliminary step are the daily interbank euro rates: overnight, 1 week, 1–12 months. The overnight rate is EONIA (Euro OverNight Index Average), calculated daily by the European Central Bank; the interbank rates, from 1 week to 12 months, are the Euribor. Table 1 reports the descriptive statistics of the fourteen rates.[9]

Table 1
Descriptive statistics of the short-term interbank rates

Rate    Mean    Standard deviation    Skewness    Kurtosis
o/n     3.549   0.906                  0.256      -1.295
1 w     3.604   0.872                  0.257      -1.512
1 m     3.653   0.872                  0.216      -1.524
2 m     3.712   0.875                  0.175      -1.508
3 m     3.768   0.876                  0.133      -1.498
4 m     3.809   0.870                  0.126      -1.499
5 m     3.846   0.867                  0.114      -1.499
6 m     3.875   0.867                  0.099      -1.509
7 m     3.900   0.868                  0.083      -1.511
8 m     3.927   0.870                  0.062      -1.510
9 m     3.955   0.873                  0.036      -1.503
10 m    3.983   0.877                  0.015      -1.484
11 m    4.007   0.877                 -0.016      -1.469
12 m    4.034   0.881                 -0.042      -1.450

4.2. Estimation results

In the European money market, jumps may arise from intervention by the European Central Bank (ECB) on the official rates. Short rates tend to track these rates, and above all the main refinancing operation rate, rather closely. The aim of the estimations is to examine the impact of changes in the main refinancing operation rate level $\theta_t$ on monetary rates. Equation (4) is specified as follows:
$$\Delta r_t = k\left[\theta_{t_i} - r_{t_i}\right]\Delta t + \sigma\,\Delta B_t + a(\mu,\gamma^2)\,\Delta N_t(\lambda),$$
with the jump size $a_{t+\Delta t}$ taken to be proportional to the change $\theta_{t+\Delta t} - \theta_t$ in the main refinancing operation rate. The process is modelled as if the main refinancing operation rate were actually implemented and perceived by the market. Main refinancing operation rate changes are assumed to be independent from the short-rate process and to be infrequent, with $\lambda \ll 1$ the probability of a main refinancing operation rate change on any day $t$. In other words, the main refinancing operation rate changes are modelled here as outcomes in Bernoulli trials, where the probability of an event (a main refinancing operation rate change on any given day) occurring in a single trial is constant.
[9] It should be noted that monetary policy actions influence the three policy rates, beginning from different dates; however, for the statistical analysis, we consider the ECB influence on the three rates as simultaneous, because of the immediate effect of the announcement on the short-term structure.

The jump-diffusion process is estimated using the indirect inference method. The problem of the optimal estimator is solved with respect to both the generalized method of moments (GMM) and full information maximum likelihood estimation (FIML). In the latter case, parameter estimates are obtained by numerically maximizing the sample log-likelihood function. The standard error is estimated using the Hessian matrix at the optimum point. The optimization algorithm is the Gauss–Newton method. In a second stage, the generalized method of moments is used to obtain efficient estimates of the parameters of the Poissonian part of the process. These are derived from the sample counterparts of $E\{\theta_{t+\Delta t} - \theta_t\}$, $\mu = E\{a_t\}$ and $\gamma^2 = \mathrm{Var}\{a_t\}$. In particular, the estimate for the daily probability is given by the empirical frequency of main refinancing operation rate changes. There are 8 main refinancing operation rate changes in our sample of 567 working days.

Equation (4) has been estimated with respect to all the interbank short rates, from overnight to twelve-month maturity. Estimation results indicate that the jump components superimposed on the diffusion process are significant in all fourteen interest rates, so that the assumption is consistent with the empirical evidence. Estimations also show that the impact of monetary policy actions is particularly evident in those rates whose maturity lies between one and three months. The estimates performed on these rates produce, from the statistical point of view, the most significant results. The model shows a good fit with respect to both the GMM and the FIML estimation method.
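The empirical-frequency estimate mentioned above is simple arithmetic, shown here as a small sketch. The function name is hypothetical, and the binomial standard error is an addition for illustration that relies on the Bernoulli-trial assumption; it is not a figure reported in the chapter.

```python
import math


def bernoulli_frequency(events, trials):
    """Empirical event frequency and its binomial standard error."""
    p_hat = events / trials
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat, std_err


if __name__ == "__main__":
    # 8 main refinancing operation rate changes in 567 working days.
    p_hat, std_err = bernoulli_frequency(8, 567)
    print(f"daily probability: {p_hat:.4f} (standard error {std_err:.4f})")
```

With only 8 events the standard error is of the same order of magnitude as the estimate itself, which is consistent with the later observation that the Poissonian parameters suffer from the small sample.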
The best performance is obtained with respect to the two-month Euribor (see Figure 2). A T-test on the forecast error has also been applied. The test results show that, in the case of the two-month Euribor, the forecast error is on average statistically equal to zero. In the case of the one-month and three-month Euribor, even if the error is not so large, the T-test rejects the hypothesis that the observed and estimated series are statistically equal on average (see Tables 2 and 3).[10]

[10] The usual T-statistic for testing the equality of the averages $\bar X_n$ and $\bar Y_m$ from two independent samples with $n$ and $m$ observations is:
$$T_{n,m} = \frac{|\bar X_n - \bar Y_m|}{\sqrt{(n-1)S_n^2 + (m-1)S_m^2}} \sqrt{\frac{nm(n+m-2)}{n+m}} \sim T(n+m-2),$$
where $S_n^2$ and $S_m^2$ are the sample variances of the two groups. The T-statistic assumes that $\sigma_n^2 = \sigma_m^2$, where $\sigma_n^2$ and $\sigma_m^2$ are the population variances of the two groups. We use the folded form of the $F$ (Fisher) statistic, $F'$, to test the assumption that the variances are equal, where
$$F' = \frac{\max[S_n^2, S_m^2]}{\min[S_n^2, S_m^2]}.$$
Under the assumption of equal variances, the T-statistic is computed with the formula given above. Under the assumption of unequal variances, the approximate T-statistic is computed as
$$T_{n,m} = \frac{|\bar X_n - \bar Y_m|}{\sqrt{S_n^2/n + S_m^2/m}}.$$

Fig. 2. A comparison among the main refinancing operation rate and actual and fitted two-month Euribor (percentage values).
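The test statistics described in the footnote on the T-test can be written out directly. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the data are made up for illustration, and no p-values are computed.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance T-statistic and its degrees of freedom n + m - 2."""
    n, m = len(x), len(y)
    pooled = (n - 1) * variance(x) + (m - 1) * variance(y)
    scale = math.sqrt(n * m * (n + m - 2) / (n + m))
    return abs(mean(x) - mean(y)) / math.sqrt(pooled) * scale, n + m - 2


def unequal_variance_t(x, y):
    """Approximate T-statistic when the population variances differ."""
    n, m = len(x), len(y)
    return abs(mean(x) - mean(y)) / math.sqrt(variance(x) / n + variance(y) / m)


def folded_f(x, y):
    """Folded F statistic: larger sample variance over the smaller one."""
    s2x, s2y = variance(x), variance(y)
    return max(s2x, s2y) / min(s2x, s2y)


if __name__ == "__main__":
    # Made-up series standing in for actual and fitted rates.
    actual = [3.61, 3.65, 3.70, 3.66, 3.72]
    fitted = [3.58, 3.66, 3.64, 3.69, 3.60]
    print(pooled_t(actual, fitted), unequal_variance_t(actual, fitted),
          folded_f(actual, fitted))
```

When the two samples have equal size and equal sample variance, the pooled and unequal-variance statistics coincide, which gives a convenient sanity check.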
Table 2
T-test between the actual series of one-, two- and three-month Euribor and their estimates with respect to the full information maximum likelihood method

Series                             1m Euribor    2m Euribor    3m Euribor
Mean value of the actual series    3.65312898    3.71241696    3.76756714
Mean value of the fitted series    3.43574164    3.64401606    3.51284076
T-test                             4.2619        1.2725        5.0116
Prob > |T|                         0.0001        0.2035        0.0001

Table 3
T-test between the actual series of one-, two- and three-month Euribor and their estimates with respect to the generalized method of moments

Series                             1m Euribor    2m Euribor    3m Euribor
Mean value of the actual series    3.65312898    3.71241696    3.76756714
Mean value of the fitted series    3.42677354    3.63484966    3.50756698
T-test                             4.4117        1.4274        5.0721
Prob > |T|                         0.0001        0.1538        0.0001

Table 4
Parameter estimates of the jump-diffusion process with respect to the generalized method of moments (one-month Euribor)

Parameter    Estimate     T-statistic
k            -0.010033    -22.08
              0.993974    395.68
              0.003086      1.70
              0.019877      4.41
              0.013522      4.41

Fig. 3. A comparison between the main refinancing operation rate changes and the shock related to the estimation of the one-month Euribor (percentage values).

The estimates of the parameters of the Poissonian part of the process, which are calculated with respect to the one-month Euribor and become asymptotically more significant as the length of the horizon grows, suffer from the small size of the sample, which does not allow us to make rolling estimations. This problem could be overcome using Monte Carlo simulations. Parameter estimates are summarized in Table 4.

As a final observation, let us consider the shock of the model, derived from a stochastic perturbation process. As shown in Figures 3–5, the shocks related to the three monetary rates (from one- to three-month Euribor) capture the effects of the approaching main refinancing operation rate changes, thus giving a measure of market expectations.
It should be noted that in all three cases, the shocks (Figures 3–5, peaks in plain) anticipate the interventions of the ECB (peaks in bold). In particular, in the case of the first intervention (the only downward one in our sample period) the shock magnitude is smaller than that observed in the case of upward interventions, taking into account that upward interventions are closer to each other.

Fig. 4. A comparison between the main refinancing operation rate changes and the shock related to the estimation of the two-month Euribor (percentage values).

Fig. 5. A comparison between the main refinancing operation rate changes and the shock related to the estimation of the three-month Euribor (percentage values).

The downward movements observed in the three shock series at the end of 1999 (see the plain peaks) show a behavior only apparently contrary to monetary policy indications (bold peaks). In fact, with the intention of contributing to a smooth transition to the year 2000, the European Central Bank injected, via longer-term and main refinancing operations, an amount of liquidity greater than that demanded by the market. Normal liquidity conditions in the money market were restored with the fine-tuning operation conducted by the ECB on 5 January 2000.

Some large positive peaks observed in the shock series far from the main refinancing operation rate changes are due to technical reasons, first of all the end of the maintenance period.

5. Conclusions

In this chapter a jump-diffusion mean-reverting model for estimating monetary rates is introduced and related to the class of models driven by infinitely divisible processes to which Gaussian, Poissonian and Stable ones belong.
Relating the interest rate models to the class of Stable processes is an attempt to unify, from the probabilistic point of view, the extensive literature of pure-diffusion, pure-Poissonian and jump-diffusion processes and completes recent works on the stock and exchange markets on the same topic. We also propose a new numerical procedure to recursively compute interest rates subject Ch. 12: Modelling the Term Structure of Monetary Rates 507 References Ahn, C.M., Thompson, H.E., 1988. Jump-diffusion processes and the term structure of interest rates. Journal of Finance 43 (1), 155­174. Akgiray, V., Booth, G.G., 1988. Mixed diffusion-jump process modeling of exchange rate movements. The Review of Economics and Statistics LXX (4), 631­637. Attari, M., 1999. Discontinuous interest rate processes: An equilibrium model for bond option prices. Journal of Financial and Quantitative Analysis 34 (3), 293­322. Bates, D.S., 1996. Jump and stochastic volatility: exchange rate processes implicit in Deutsche mark options. The Review of Financial Studies 9 (1), 69­107. Baz, J., Das, S.R., 1996. Analytical approximations of the term structure for jump-diffusion processes: A numerical analysis. The Journal of Fixed Income 6 (1), 78­86. Billingsley, P., 1995. Probability and Measure, 3rd edition. Wiley. Björk, T., Kabanov, Y., Runggaldier, W., 1997. Bond market structure in the presence of market point processes. Mathematical Finance 7 (2), 211­239. Black, F., Derman, E., Toy, W., 1990. A one-factor model of interest rates and its applications to treasury bond options. Financial Analysts Journal 46, 33­39. Black, F., Scholes, M., 1973. The pricing of options and other corporate liabilities. Journal of Political Economy 81, 637­654. Blasikiewicz, M., Brown, T.C., 1996. Poisson approximations for Markov-driven point processes. Stochastic Processes and their Applications 62 (May/June), 179­189. Brennan, M.J., Schwartz, E.S., 1980. Analyzing convertible bonds. 
Chapter 13

ASSET LIABILITY MANAGEMENT: A REVIEW AND SOME NEW RESULTS IN THE PRESENCE OF HEAVY TAILS

YESIM TOKAT
Department of Economics, University of California, Santa Barbara, USA

SVETLOZAR T. RACHEV
Department of Statistics and Applied Probability, University of California, Santa Barbara, USA
Institute of Statistics and Mathematical Economics, University of Karlsruhe, Germany
e-mail: rachev@lsoe-4.wiwi.uni-karlsruhe.de

EDUARDO S. SCHWARTZ
Anderson School of Management, University of California, Los Angeles, USA

Contents
Abstract
1. Introduction
Part I: Review of the stochastic programming ALM literature
2. Stochastic programming ALM models
2.1. Chance-constrained model
2.2. Dynamic programming
2.3. Sequential decision analysis
2.4. Stochastic Linear Programming with Recourse (SLPR)
2.5. Dynamic generalized networks
2.6. Scenario optimization
2.7. Robust optimization
3. Multistage stochastic ALM programming with decision rules
4. Scenario generation
4.1. Discrete time series model
4.1.1. Multivariate approach
4.1.2. Cascade approach
4.2. Continuous time model
Part II: Stable asset allocation
5. Stable distribution
5.1. Description of stable distributions
5.2. Financial modeling and estimation
5.2.1. Maximum likelihood estimation
5.2.2. Comparison of estimation methods
6. Multistage stable asset allocation model with decision rules
6.1. Scenario generation
6.2. Valuation of assets
6.3. Computational results
7. Conclusion
References

Abstract

Asset and liability management is the simultaneous consideration of assets and liabilities in strategic investment planning. In this chapter, asset and liability management models that use the stochastic programming framework are reviewed. Most of these models describe the financial uncertainty by a set of representative scenarios. We propose to replace the classical assumption of Gaussian returns in the scenario generation with the stable Paretian distribution, which can capture the leptokurtic nature of financial data. A multistage stochastic asset allocation model with decision rules is analyzed. Optimal asset allocations under Gaussian and stable Paretian returns are compared. Our computational results suggest that the asset allocation may differ by up to 20% depending on the utility function and the risk aversion level of the investor. The certainty equivalent return can be increased by up to 0.13% and utility can be improved by up to 0.72% by switching to the stable Paretian model.

1. Introduction

Managing assets and liabilities is a concern for banks, pension funds and insurance companies. Before the deregulation of interest rates, the market value of liabilities changed very little from year to year. However, after interest rates were deregulated in 1979, they showed much more volatility. This led the institutional investors mentioned above to consider assets and liabilities simultaneously during their strategic planning.
Strategic investment planning is the allocation of a portfolio across broad asset classes such as bonds, stocks, cash and real estate, considering the legal and policy constraints facing the institution. Empirical evidence by Brinson, Hood and Beebower (1986) suggests that asset allocation is the most important factor in determining investment performance.

Most of the early models in this field are either myopic or represent deterministic formulations of multiperiod problems. Hakansson (1971) shows that solving a sequence of single-period models optimizes the investor's long-run wealth or the expected utility of wealth.¹ This result assumes the absence of transaction costs, market impact costs, and liquidity considerations. However, these assumptions are not justifiable in many situations. Myopic models cannot capture long-term investment goals in the presence of transaction costs. There is considerable evidence of predictability in asset returns,² and the myopic models do not take this empirical finding into account. These models tend to produce high portfolio turnovers and opportunistic asset trades.

There has been a growing interest in the development of multiperiod stochastic models for asset and liability management (ALM). Kusy and Ziemba (1986) developed a multiperiod stochastic linear programming model for the Vancouver City Savings Credit Union for a 5-year planning period. Their work suggests that their stochastic ALM model is superior to 5-year deterministic models. Another successful application of multistage stochastic programming is the Russell–Yasuda Kasai model by Carino et al. (1994). The investment strategy suggested by the model resulted in extra income of $79 million during the first two years of its application (1991 and 1992). An ALM model designed by Mulvey (1994) has been implemented by the Pacific Financial Asset Management Company. Boender (1997) reported the success of a hybrid simulation/optimization scenario model for ALM of pension funds in the Netherlands.
The application of the model to a particular pension fund led to a reduction of the yearly expected contributions of $100 million.

The ALM models that have gained applicability are based on stochastic programming with or without decision rules. In these models, the future economic uncertainty is modeled using discrete scenarios. Most of the models assume that the variables or the innovations of these variables follow the normal distribution or its continuous-time counterpart, Brownian motion.

¹ Merton (1969) and Samuelson (1969) show that a constant relative risk aversion investor chooses the same asset proportions independent of the investment horizon if the market is frictionless and returns are independent over time.
² See for example Hodrick (1992), Bekaert and Hodrick (1992), Kandel and Stambaugh (1996), and Brandt (1999).

In response to the empirical evidence about the heavy tails, high peak and possible skewness in financial data, Fama (1965) and Mandelbrot (1963, 1967) propose the stable Paretian distribution³ as an alternative model. Among the alternative non-Gaussian distributions in the literature,⁴ the stable distribution has unique characteristics that make it an ideal candidate. The stable laws are the only possible limit distributions for properly normalized and centered sums of independent identically distributed random variables (Embrechts et al., 1997; Rachev and Mittnik, 2000). If a financial variable can be regarded as the result of many microscopic effects, then it can be described by a stable law. Stable distributions are leptokurtic. Compared with the normal distribution, they typically have fatter tails and a higher peak around the center. Asymmetry is allowed. Owing to its flexibility, the stable model fits the empirical distribution of financial data better [see Mittnik et al. (2000)]. The Gaussian distribution is a special case of the stable distribution.
In fact, it is the only distribution in the stable family with a finite second moment. Although autoregressive conditional heteroskedastic models driven by normally distributed innovations imply unconditional distributions that possess heavier tails, there is still considerable kurtosis unexplained by this model. Mittnik, Paolella and Rachev (2000) present empirical evidence favoring stable hypothesis over the normal assumption as a model for unconditional, homoskedastic conditional, and heteroskedastic conditional distributions of several asset return series. The purpose of this chapter is to review the stochastic programming models in the ALM literature and to analyze an asset allocation problem in the presence of heavy tails. In the first part of the chapter, we review ALM models that utilize stochastic programming methodology. In the second part, a multistage asset allocation model with decision rules is analyzed under the Gaussian and stable returns scenarios. Our computational results suggest that if the investor has very high or low risk aversion, then the normal and stable scenarios result in similar asset allocations. However, when the risk aversion level is between the two cases, the two distributional assumptions may result in considerably different asset allocations depending on the utility function and the risk aversion level of the decision maker. The investor may reduce his equity allocation up to 20%, increase his certainty equivalent wealth up to 0.13% and improve his utility by 0.72% by switching to stable model. Section 2 reviews the stochastic programming ALM models without imposing any decision rules. Section 3 describes stochastic programming models that assume the investor uses the same decision rule to update the asset allocation every period. In Section 4, we present the scenario generation methods available in the ALM literature. In the second part of the chapter, a stable asset allocation model is described. 
Section 5 states the reasons for the desirability of the stable model, describes the distribution and presents estimation methods. In Section 6, we set up the asset allocation model and report the computational results. Section 7 concludes.

³ We will call it the stable distribution from now on.
⁴ A well-known alternative to the stable model is the Student-t distribution. A major drawback of the Student-t distribution is its lack of stability with respect to summation, i.e., a portfolio of Student-t distributed asset returns does not have a Student-t distribution. It is not supported by a central limit theorem. The Student-t distribution is symmetric and cannot capture the possible skewness in financial data.

Part I: Review of the stochastic programming ALM literature

2. Stochastic programming ALM models

Stochastic programming provides a general-purpose modeling framework that conveniently addresses real-world concerns such as transaction costs, taxes, and legal and policy constraints. The number of decision variables becomes very large, resulting in large-scale optimization problems. The computational costs make it impractical to test the recommendations out of sample. We describe various modeling approaches developed within this framework.

2.1. Chance-constrained model

Charnes and Kirby (1966) develop a chance-constrained model that expresses future deposits and loan payments as jointly distributed random variables, and the capital adequacy formula by chance constraints on meeting withdrawal claims. A drawback of the model is that constraint violations are not penalized according to their magnitude. The methodology has found applications in various areas: Charnes, Gallegos and Yao (1982) apply it to balance sheet management, Li (1995a, b) uses chance-constrained programming in portfolio analysis of insurance companies, and Dert (1998) develops a multistage chance-constrained ALM model for a defined benefit pension fund.
As opposed to the original approach of Charnes and Kirby, Dert models the uncertainty using scenarios rather than making distributional assumptions. Dert's model minimizes the cost of funding while ensuring the stability of contributions and the ability to make benefit payments on time with an acceptable level of insolvency risk. The solvency requirement is that the asset level be at least equal to the product of the required funding level and the value of the remaining liabilities (constraint (7)). The asset value falling below the required level is modeled as a probabilistic constraint. Since uncertainty is modeled through scenarios, binary variables are needed to formulate the chance constraint explicitly (constraints (8)–(10)). It is assumed that remedial contributions are made in case of under-funding (constraint (6)). The ALM model is formulated as follows:

$$\min\; A_{01} + \sum_{t=1}^{T-1}\sum_{s=1}^{S_t} P(t,s)\,\gamma_{ts}\,Y_{ts} + \lambda \sum_{t=1}^{T}\sum_{s=1}^{S_t} P(t,s)\,\gamma_{ts}\,Z_{ts}$$

subject to

$$Y^{l}_{ts} \le Y_{ts} \le Y^{u}_{ts}, \tag{1}$$
$$y^{l}_{ts} \le \frac{Y_{ts}}{W_{ts}} \le y^{u}_{ts}, \tag{2}$$
$$\frac{Y_{ts}}{W_{ts}} - \frac{Y_{t-1,\hat{s}}}{W_{t-1,\hat{s}}} \le \beta_t, \tag{3}$$
$$A_{ts} + Y_{ts} - l_{ts} = \sum_{i=1}^{N} X_{its}, \tag{4}$$
$$x^{l}_{its}\,(A_{ts} + Y_{ts} - l_{ts}) \le X_{its} \le x^{u}_{its}\,(A_{ts} + Y_{ts} - l_{ts}), \quad t = 0,\dots,T-1,\; s = 1,\dots,S_t, \tag{5}$$
$$A_{ts} = Z_{ts} + \sum_{i=1}^{N} e^{r_{its}} X_{i,t-1,\hat{s}}, \tag{6}$$
$$A_{ts} \ge \alpha L_{ts}, \tag{7}$$
$$Z_{ts} \le f_{ts} M_{ts}, \tag{8}$$
$$\sum_{s=1}^{S_t} P\big((t,s) \mid (t-1,\hat{s})\big)\, f_{ts} \le \psi_{t-1,\hat{s}}, \tag{9}$$
$$f_{ts} \in \{0,1\}, \quad t = 0,\dots,T-1,\; s = 1,\dots,S_t, \tag{10}$$

where
$t = 0,1,\dots,T$ is the time period,
$s = 1,2,\dots,S_t$ is the state of the world,
$i = 1,2,\dots,N$ is the asset class,
$\alpha$ is the demanded funding level,
$\beta_t$ is the maximal raise in contribution per period as a fraction of the cost of wages at time $t$,
$\gamma_{ts}$ is the discount factor for a cash flow at time $t$ in state $s$,
$l_{ts}$ is the benefit payments and costs to the fund at time $t$ in state $s$,
$L_{ts}$ is the actuarial reserve at time $t$ in state $s$,
$\lambda$ is the penalty parameter to penalize remedial contributions,
$r_{its}$ is the continuous return on investment of asset class $i$ during period $t$ in state $s$,
$M_{ts}$ is a large constant at time $t$ in state $s$,
$W_{ts}$ is the cost of wages during period $t$ in state $s$,
$A_{ts}$ is the total asset value before receiving regular contributions and making benefit payments at time $t$ in state $s$,
$f_{ts}$ is the binary variable for remedial contributions at time $t$ in state $s$,
$\psi_{ts}$ is the probability of under-funding at time $t+1$ given that the world was in state $s$ at time $t$,
$X_{its}$ is the amount of money invested in asset class $i$ at time $t$ in state $s$,
$x_{its}$ is the fraction of asset value invested in asset class $i$ at time $t$ in state $s$,
$Y_{ts}$ is the regular contribution during period $t$ in state $s$,
$y_{ts}$ is the regular contribution as a fraction of the cost of wages during period $t$ in state $s$,
$Z_{ts}$ is the remedial contribution at time $t$ in state $s$.

The first three constraints, namely (1)–(3), limit the regular contribution amount, the regular contribution as a fraction of wages, and the maximal raise in contribution as a fraction of the cost of wages, respectively.
After receiving regular contributions and making benefit payments, the assets are reallocated (4) considering the upper and lower bounds on the asset mix (5). The price inflation, wage inflation, and asset return scenarios are generated using a vector autoregressive model. The characteristics of participants are modeled using a Markov chain. A more detailed description of a similar model is given in Dert (1998).

2.2. Dynamic programming

The main idea behind dynamic programming is to solve the problem by dealing with one stage at a time. The procedure produces one solution per possible state in each stage. If there are many state variables, or the objective function depends in an arbitrary way on the whole history up to the current period, this method is not very appropriate. It can handle only a small number of financial instruments simultaneously and is therefore of limited use in practice.

Eppen and Fama (1971) model a three-asset portfolio problem using this approach. At any point in time, they assume that the state of the system is described by two variables: $m$, the level of the cash balance ($m \in N$), and $b$, the level of the bond account ($b \in \{N - N^{-}\}$). Decisions concerning the state of the system are made at equally spaced discrete points in time. The stochastic changes in the cash balance between the periods are a sequence of independent identically distributed random variables with the discrete probability mass function $p(d)$. The function $p(d)$ is positive only on a finite state space, i.e., there is a finite $K$ such that $p(d) = 0$ if $|d| > K$.
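The stage-by-stage idea can be illustrated with a minimal backward-induction sketch for a cash balance alone. This is not Eppen and Fama's model: the bond account is omitted, the transfer cost is taken to be linear, the cash level is truncated to a bounded grid, and the function name `solve_cash_dp` and all numerical values are illustrative assumptions.

```python
def solve_cash_dp(n_periods, states, pmf, transfer_cost, hold_cost, short_cost, discount):
    """Backward induction for f_n(m) = min over m' of [T(m, m') + G_n(m')].

    states: sorted list of integer cash levels (assumed bounded grid)
    pmf:    dict mapping a cash change d to its probability p(d)
    """
    lo, hi = states[0], states[-1]

    def clamp(m):
        # Assumption: cash levels outside the grid are truncated to its edges.
        return max(lo, min(hi, m))

    def penalty(m):
        # Holding cost for positive balances, shortage cost for negative ones.
        return hold_cost * m if m >= 0 else -short_cost * m

    f_prev = {m: 0.0 for m in states}  # f_0 = 0: no cost after the horizon
    policy = {}
    for _ in range(n_periods):
        # G(m'): current penalty plus discounted expected cost-to-go from m'.
        G = {
            m2: penalty(m2)
            + discount * sum(p * f_prev[clamp(m2 + d)] for d, p in pmf.items())
            for m2 in states
        }
        f_cur, policy = {}, {}
        for m in states:
            # Linear transfer cost is an illustrative stand-in for T(m, m').
            cost, target = min(
                (transfer_cost * abs(m2 - m) + G[m2], m2) for m2 in states
            )
            f_cur[m], policy[m] = cost, target
        f_prev = f_cur
    return f_prev, policy


# Hypothetical inputs: cash changes of -1, 0, +1 and a 3-period horizon.
states = list(range(-3, 4))
pmf = {-1: 0.25, 0: 0.5, 1: 0.25}
cost, policy = solve_cash_dp(3, states, pmf, transfer_cost=0.1,
                             hold_cost=1.0, short_cost=4.0, discount=0.95)
print(policy[3], cost[3])
```

Each pass of the outer loop solves one stage and stores one decision per state, which is why the state space must stay small for the method to remain tractable.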
The notation is as follows:
$T(m,b;m',b')$ is the minimum transfer cost involved in changing the state from $(m,b)$ to $(m',b')$,
$c_h$ is the marginal opportunity cost of starting a period with an additional dollar of cash,
$c_p$ is the loss from being a dollar short on cash, which is incurred at the beginning of the period,
$L(m')$ is the penalty cost of carrying cash:

$$L(m') = \begin{cases} c_h\, m', & m' \ge 0, \\ -c_p\, m', & m' < 0, \end{cases}$$

$\rho$ is the discount factor,
$f_n(m,b)$ is the discounted expected cost for an $n$-period problem whose state at the beginning of period $n$ is $(m,b)$.

The recursive relationship for $f_n(m,b)$ is given by

$$f_n(m,b) = \min_{m',b'} \big\{ T(m,b;m',b') + G_n(m',b') \big\},$$

where

$$G_n(m,b) = c^{b}_{h}\, b + L(m) + \rho \sum_{d=-K}^{K} f_{n-1}(m+d,\, b)\, p(d).$$

$G_n(m,b)$ is the current expected holding penalty cost (the first two terms) plus the discounted expected cost if a decision is made to start period $n$ in state $(m,b)$ and an optimal policy is followed in period $n-1$ and all future periods.

2.3. Sequential decision analysis

This approach uses implicit enumeration to find an optimal solution. It results in extremely large equivalent linear programming problems, since it enumerates all possible portfolio strategies for all scenarios in all periods of consideration. The method ensures feasibility of the first period for every possible scenario; this shrinks the feasible set and gives substantial importance to scenarios with low probabilities of occurrence. The stochastic decision tree model of Bradley and Crane (1972) overcomes the computational difficulties of the approach by using a decomposition algorithm. The objective is the maximization of the expected terminal wealth of the firm. Constraint (11) guarantees that the firm cannot purchase assets that cost more than the funds it has available. The second set of constraints balances the inventory. The net realized capital losses in a period are controlled by a pre-specified upper bound using (13). Constraint (14) limits the holding of a particular asset.
Their linear programming formulation⁵ is

$$\max \sum_{e_N \in E_N} p(e_N) \sum_{k=1}^{K} \Bigg\{ \sum_{m=0}^{N-1} \big[ y^{k}_{m}(e_m) + u^{k}_{m,N}(e_N) \big]\, h^{k}_{m,N}(e_N) + \big[ y^{k}_{N}(e_N) + u^{k}_{N,N}(e_N) \big]\, b^{k}_{N}(e_N) \Bigg\}$$

subject to

$$\sum_{k=1}^{K} b^{k}_{n}(e_n) - \sum_{k=1}^{K} \Bigg[ \sum_{m=0}^{n-2} y^{k}_{m}(e_m)\, h^{k}_{m,n-1}(e_{n-1}) + y^{k}_{n-1}(e_{n-1})\, b^{k}_{n-1}(e_{n-1}) \Bigg] - \sum_{k=1}^{K} \sum_{m=0}^{n-1} \big[ 1 + g^{k}_{m,n}(e_n) \big]\, s^{k}_{m,n}(e_n) = f_n(e_n), \tag{11}$$
$$-h^{k}_{m,n-1}(e_{n-1}) + s^{k}_{m,n}(e_n) + h^{k}_{m,n}(e_n) = 0, \quad m = 0,1,\dots,n-2, \tag{12}$$
$$-b^{k}_{n-1}(e_{n-1}) + s^{k}_{n-1,n}(e_n) + h^{k}_{n-1,n}(e_n) = 0,$$
$$h^{k}_{0,0}(e_0) = h^{k}_{0},$$
$$-\sum_{k=1}^{K} \sum_{m=0}^{n-1} g^{k}_{m,n}(e_n)\, s^{k}_{m,n}(e_n) \le L_n(e_n), \tag{13}$$
$$\sum_{k \in K^{i}} \Bigg[ b^{k}_{n}(e_n) + \sum_{m=0}^{n-1} h^{k}_{m,n}(e_n) \Bigg] \le C^{i}_{n}(e_n), \quad i = 1,2,\dots,I, \tag{14}$$
$$b^{k}_{n}(e_n) \ge 0, \quad s^{k}_{m,n}(e_n) \ge 0, \quad h^{k}_{m,n}(e_n) \ge 0, \quad m = 1,\dots,n-1,$$

where $e_n \in E_n$, $n = 1,2,\dots,N$, $k = 1,2,\dots,K$, and
$e_n$ is an economic scenario from period 1 to $n$ having probability $p(e_n)$,
$E_n$ is the set of possible economic scenarios from period 1 to $n$,
$K$ is the total number of assets,
$K^{i}$ is the number of assets of type $i$,
$N$ is the number of time periods,
$y^{k}_{m}(e_m)$ is the income yield per dollar of purchase price in period $m$ of asset $k$, conditional on scenario $e_m$,
$u^{k}_{m,N}(e_N)$ is the expected terminal value per dollar of purchase price in period $m$ of asset $k$ held at period $N$, conditional on scenario $e_N$,
$b^{k}_{n}(e_n)$ is the dollar amount of asset $k$ purchased in period $n$, conditional on scenario $e_n$,
$h^{k}_{m,n}(e_n)$ is the dollar amount of asset $k$ purchased in period $m$ and held in period $n$, conditional on scenario $e_n$,
$s^{k}_{m,n}(e_n)$ is the dollar amount of asset $k$ purchased in period $m$ and sold in period $n$, conditional on scenario $e_n$,
$g^{k}_{m,n}(e_n)$ is the capital gain or loss per dollar of purchase price of asset $k$ purchased in period $m$ and sold in period $n$, conditional on scenario $e_n$,
$f_n(e_n)$ is the incremental increase or decrease of funds available for period $n$,
$L_n(e_n)$ is the dollar amount of maximum allowable net realized capital losses in period $n$,
$C^{i}_{n}(e_n)$ is the upper bound in dollars on the amount of funds invested in asset type $i$ in period $n$.

⁵ The formulation is taken from Kusy and Ziemba (1986).
They use a decomposition algorithm to break down the problem and an efficient technique to solve the sub-problems of the overall portfolio. However, the solution is still computationally intractable for real-life problems.

2.4. Stochastic Linear Programming with Recourse (SLPR)

The basic formulation of the general $T$-stage SLPR model is

$$\min_{x_1}\; c_1^{\top} x_1 + E_{w_1}\Big[ \min_{x_2}\; c_2(w_1)^{\top} x_2 + \cdots + E_{w_{T-1}}\big[ \min_{x_T}\; c_T(w_{T-1})^{\top} x_T \big] \Big]$$

subject to

$$A_1 x_1 = b_1,$$
$$B_2(w_1)\, x_1 + A_2(w_1)\, x_2 = b_2(w_1),$$
$$B_3(w_2)\, x_2 + A_3(w_2)\, x_3 = b_3(w_2),$$
$$\vdots$$
$$B_T(w_{T-1})\, x_{T-1} + A_T(w_{T-1})\, x_T = b_T(w_{T-1}),$$
$$l_t \le x_t \le u_t, \quad t = 1,2,\dots,T,$$

where
$w_t$ is the random vector that generates the coefficients $b_t$, $c_t$, $A_t$, and $B_t$ of the decision problem at the $t$-th stage, $t = 2,\dots,T$,
$l_t$, $u_t$ are the vectors of deterministic bounds on $x_t$ at stage $t$, $t = 2,\dots,T$,
$b_1$, $c_1$, and $A_1$ are the deterministic first-stage coefficient vectors or matrices, and
$x_t$ is the vector of decision variables.

The objective formalizes a sequence of optimization problems corresponding to different stages. At stage 1, the outcome completely depends on future realizations of the uncertainty. After the first period, decisions are allowed to be a function of the observed realization $(x_{t-1}, w_t)$ only. One first decides on $x_1$, then observes $w_1$, then decides on $x_2$, then observes $w_2$, and so on. The recourse decisions depend on the current state of the system as determined by previous decisions and random events.

The uncertainty is modeled by using finitely many scenarios which have pre-assigned probabilities. In this case, the problem reduces to a large linear program of a special structure:

$$\min\; c_1^{\top} x_1 + \sum_{k_2=2}^{K_2} p_{k_2}\, c_{k_2}^{\top} x_{k_2} + \sum_{k_3=K_2+1}^{K_3} p_{k_3}\, c_{k_3}^{\top} x_{k_3} + \cdots + \sum_{k_T=K_{T-1}+1}^{K_T} p_{k_T}\, c_{k_T}^{\top} x_{k_T}$$

subject to

$$A_1 x_1 = b_1,$$
$$B_{k_2} x_1 + A_{k_2} x_{k_2} = b_{k_2}, \quad k_2 = 2,\dots,K_2,$$
$$B_{k_3} x_{a(k_3)} + A_{k_3} x_{k_3} = b_{k_3}, \quad k_3 = K_2+1,\dots,K_3,$$
$$\vdots$$
$$B_{k_T} x_{a(k_T)} + A_{k_T} x_{k_T} = b_{k_T}, \quad k_T = K_{T-1}+1,\dots,K_T,$$
$$l_t \le x_{k_t} \le u_t, \quad k_t = K_{t-1}+1,\dots,K_t, \quad t = 1,2,\dots,T.$$

The scenarios used determine the size, form and optimal solution of the linear program.
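The staircase structure of this scenario-based linear program can be sketched by assembling its objective and constraint rows from a small scenario tree. The sketch below is only illustrative: it assumes one scalar decision per node, it does not solve the program, and the function name `deterministic_equivalent`, the dictionary layout and the numbers are hypothetical choices rather than part of any published model.

```python
def deterministic_equivalent(pred, prob, c, A, B, b):
    """Assemble min sum_k p_k c_k x_k  s.t.  B_k x_{a(k)} + A_k x_k = b_k.

    Each node k of the scenario tree carries one scalar decision x_k.
    pred[k] is the immediate predecessor a(k); the root has pred None.
    """
    nodes = sorted(prob)                      # node 1 is the first stage
    col = {k: j for j, k in enumerate(nodes)}  # column index of x_k
    obj = [prob[k] * c[k] for k in nodes]     # probability-weighted costs
    rows, rhs = [], []
    for k in nodes:
        row = [0.0] * len(nodes)
        row[col[k]] = A[k]                    # own-stage block A_k
        if pred[k] is not None:
            row[col[pred[k]]] = B[k]          # link to predecessor via B_k
        rows.append(row)
        rhs.append(b[k])
    return obj, rows, rhs


# Hypothetical two-stage tree: root node 1 with two equally likely children.
pred = {1: None, 2: 1, 3: 1}
prob = {1: 1.0, 2: 0.5, 3: 0.5}
c = {1: 0.0, 2: -1.0, 3: -1.0}
A = {1: 1.0, 2: 1.0, 3: 1.0}
B = {2: -1.05, 3: -0.95}
b = {1: 100.0, 2: 0.0, 3: 0.0}

obj, rows, rhs = deterministic_equivalent(pred, prob, c, A, B, b)
for row, r in zip(rows, rhs):
    print(row, "=", r)
```

Every scenario node adds one column and one row block, so the number of scenarios directly sets the size of the resulting program.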
There are finitely many sequences of possible realizations of the random coefficients $(c_{k_t}, A_{k_t}, B_{k_t}, b_{k_t})$, with path probabilities $p_{k_t}$ of the subsequences of these realizations,

$$p_{k_t} > 0 \;\; \forall k_t, \qquad \sum_{k_t=K_{t-1}+1}^{K_t} p_{k_t} = 1, \qquad t = 2,\dots,T,$$

that identify the discrete joint probability distribution of $w = \{w_1,\dots,w_{T-1}\}$. In the program, $a(k_t)$ denotes the immediate predecessor of $k_t$, for example $a(k_2) = k_1$.

An important application of the stochastic linear programming with simple recourse model is given by Kusy and Ziemba (1986). The model was developed for the Vancouver City Savings Credit Union for a 5-year planning period. The formulation has the following features:
(1) Changing yield spreads across time, transaction costs associated with selling assets prior to maturity, and synchronization of cash flows across time are incorporated in a multiperiod context.
(2) Assets and liabilities are considered simultaneously to satisfy basic accounting principles and match liquidities.
(3) Transaction costs are included.
(4) Uncertainty of withdrawal claims and deposits is reflected in uncertain cash flows.
(5) Uncertainty of interest rates is explicitly recognized.
(6) Legal and policy constraints are taken into account.

Their two-stage model did not contain end effects. Three possible scenarios that are independent over time were considered to keep the computations tractable. Their results indicate that their model generates policies that are superior to those of stochastic decision analysis.

Another milestone after the Kusy and Ziemba model is the Russell–Yasuda Kasai model by Carino et al. (1994). The model builds on the previous research to design a large-scale SLPR model with possibly dependent scenarios, end effects, and all the relevant institutional and policy constraints. We present their model next.
Decision variables are
$V_t$: total fund market value at time $t$,
$X_{nt}$: market value in asset $n$ at time $t$,
$w_{t+1}$: income shortfall at time $t+1$, and
$v_{t+1}$: income surplus at time $t+1$.

Random coefficients are
$RP_{n,t+1}$: price return of asset $n$ from end of $t$ to end of $t+1$,
$RI_{n,t+1}$: income return of asset $n$ from end of $t$ to end of $t+1$,
$F_{t+1}$: deposit inflow from end of $t$ to end of $t+1$,
$P_{t+1}$: principal payout from end of $t$ to end of $t+1$,
$I_{t+1}$: interest payout from end of $t$ to end of $t+1$,
$g_{t+1}$: rate at which interest is credited to policies from end of $t$ to end of $t+1$,
$L_t$: liability valuation at $t$.

The objective is to maximize the expected market value of the firm at the horizon net of penalties for the shortfalls. The expected amount by which goals are not achieved is a more tangible risk measure than variance. The penalty costs of shortfalls may be based on expected financial impact or psychological costs. The piecewise linear convex cost function for the shortfall is denoted by $c_t(w_t)$. Constraint (15) is the budget constraint. The return on assets and the inflow of deposits net of principal and interest payout give the total fund market value (16). Liability balances and cash flows are computed to model liability accumulation (18). If Yasuda does not achieve adequate income, recourse action must be taken at a cost. The income generation is modeled as a soft constraint (17), which permits surpluses or deficits.

$$\max\; E\Big[ V_T - \sum_{t=1}^{T} c_t(w_t) \Big]$$

subject to

$$\sum_{n} X_{nt} - V_t = 0, \tag{15}$$
$$V_{t+1} - \sum_{n} (1 + RP_{n,t+1} + RI_{n,t+1})\, X_{nt} = F_{t+1} - P_{t+1} - I_{t+1}, \tag{16}$$
$$\sum_{n} RI_{n,t+1}\, X_{nt} + w_{t+1} - v_{t+1} = g_{t+1} L_t, \tag{17}$$
$$L_{t+1} = (1 + g_{t+1})\, L_t + F_{t+1} - P_{t+1} - I_{t+1}, \tag{18}$$
$$X_{nt} \ge 0, \quad w_{t+1} \ge 0, \quad v_{t+1} \ge 0.$$

The abbreviated formulation does not include some elements of the model. There are additional types of shortfalls, indirect investment types, regulatory restrictions, multiple accounts, loan assets, tax effects and end effects that are included in the original model.
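The bookkeeping in constraints (15)–(18) can be traced for a single period and a single scenario with a short sketch. It is only an illustration under stated assumptions: the holdings are taken as given rather than optimized, the shortfall and surplus are set to the positive and negative parts of the income gap (which satisfies (17) but is a simplification of treating them as decision variables), and the function name `one_period` and all numbers are hypothetical.

```python
def one_period(X, RP, RI, F, P, I, g, L):
    """Evaluate constraints (15)-(18) for given holdings X over one period."""
    V = sum(X)                                            # (15) budget
    V_next = sum((1 + rp + ri) * x
                 for x, rp, ri in zip(X, RP, RI)) + F - P - I   # (16)
    income = sum(ri * x for x, ri in zip(X, RI))
    target = g * L                                        # interest credited
    w = max(target - income, 0.0)                         # income shortfall
    v = max(income - target, 0.0)                         # income surplus
    L_next = (1 + g) * L + F - P - I                      # (18)
    return {"V": V, "V_next": V_next, "w": w, "v": v,
            "income": income, "target": target, "L_next": L_next}


# Hypothetical two-asset example.
out = one_period(X=[60.0, 40.0], RP=[0.05, 0.0], RI=[0.02, 0.04],
                 F=10.0, P=5.0, I=3.0, g=0.03, L=90.0)
print(out)
```

In the full model the shortfall enters the objective through the penalty function $c_t(w_t)$, so the optimizer trades extra income against market value instead of taking the holdings as fixed.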
See Carino and Ziemba (1998) for the details of the formulation. Carino, Myers and Ziemba (1998) discuss the concepts, technical issues and uses of the model.

Korhonen (1987) applies stochastic linear programming with simple recourse (SLPSR) to multicriteria decision making. Oguzsoy and Guven (1997) use the SLPSR methodology for a bank ALM model in Turkey. Geyer et al. (2002) describe a pension fund planning model that utilizes this approach.

Some authors argue against linearizing the objective function. Bai, Carpenter and Mulvey (1997) demonstrate that nonlinear programs are not much more difficult than their linear counterparts. Zenios et al. (1998) apply multistage stochastic nonlinear programming with recourse to fixed income portfolio management.

2.5. Dynamic generalized networks

Multistage stochastic nonlinear programs with recourse can be represented by generalized network formulations. This framework can be used to account for the dynamic aspects of ALM problems while considering uncertainty in all relevant parameters and accommodating random parameters by means of a moderate number of scenarios. The network structure is exploited in the solution procedure. The problem is decomposed into its constituent scenario subproblems. Preserving the network structure of each subproblem is challenged by the existence of non-anticipativity constraints. These constraints dictate that scenarios that share a common information history up to a specific period must yield the same decision up to that period, i.e., dependence on hindsight is avoided. The desired decomposition is achieved by dualizing the non-anticipativity constraints. The algorithm by Rockafellar and Wets (1991) operates on the split-variable form of the original problem. The problem is solved by progressively enforcing the non-anticipativity constraints. Mulvey (1994) utilizes this methodology in designing an asset allocation model for the Pacific Financial Asset Management Company.
The single-period portfolio model is formulated as a network model. The arcs can be constrained to impose legal or policy constraints. The objective function is the expected utility of surplus at the end of the planning horizon. The model was implemented in a PC environment with acceptable accuracy and efficiency. Mulvey and Vladimirou (1989, 1991, 1992) present several aspects of stochastic generalized network models. See also the review of Mulvey and Ziemba (1995), which discusses the model in a general context.

2.6. Scenario optimization

According to the scenario optimization approach, one computes a solution to the deterministic problem under all scenarios and then solves a coordinating model to find a single feasible policy. This approach can be compared to the scenario aggregation method suggested by Rockafellar and Wets (1991). Scenario aggregation handles multistage stochastic programming problems and allows for decisions to depend on future outcomes. Scenario optimization, on the other hand, is designed for two-period models only. It is assumed that scenario probabilities are functions of time, and that estimates of the random parameters in the future stages are poor. Hence one only selects a policy for the immediate future.

Suppose the scenario subproblem is

$$v_s = \min\; c_s^{\top} x$$

subject to

$$A_s x = b_s, \tag{19}$$
$$A_d x = b_d, \tag{20}$$
$$x \ge 0,$$

where the objective function is a particular realization of the uncertainty under scenario $s$, (19) is also a particular realization of the uncertain constraints under scenario $s$, and (20) represents the deterministic constraints. A possible coordinating model could be

$$\min\; \sum_{s} p_s \,\lVert c_s^{\top} x - v_s \rVert^{2} + \sum_{s} p_s \,\lVert A_s x - b_s \rVert^{2}$$

subject to

$$A_d x = b_d, \quad x \ge 0.$$

The coordinating model tracks the scenario solution as closely as possible while still maintaining feasibility. Alternative coordination models are discussed in Dembo (1991, 1993). Illustrative applications in portfolio immunization and dedication are also presented therein.

2.7. Robust optimization

The robust optimization approach integrates goal programming formulations with a scenario-based description of the uncertainty in the data. The aim is to produce solutions that are relatively less sensitive to the realizations of different scenarios. The objective function, in its most general form, is composed of two terms: the first term trades off between the mean value and the variability in the mean; the second term is a feasibility penalty function. Consider the following formulation:

$$\min\; \sigma(x, y_1,\dots,y_S) + w\,\rho(z_1,\dots,z_S)$$

subject to

$$A x = b,$$
$$B_s x + C_s y_s + z_s = e_s, \quad \forall s \in \Omega,$$
$$x \ge 0, \quad y_s \ge 0, \quad \forall s \in \Omega,$$

where
$x$ is the vector of decision variables whose value cannot be adjusted once a specific realization of the data is observed,
$y$ is the vector of decision variables that are subject to adjustment once uncertain parameters are observed,
$z$ is the vector of decision variables that measure the infeasibility allowed,
$\Omega = \{1,\dots,S\}$ is the set of possible scenarios,
$A$, $b$, $B_s$, $C_s$, $e_s$ are the coefficients related to the variables,
$w$ is the goal programming weight that is used to derive a spectrum of answers that trade off the two objectives.

The inclusion of higher-order moments in the objective function reduces the variability of the solution. Hence, fewer adjustments become necessary as scenarios unfold. The model recognizes that it may not always be possible to find a feasible solution to the problem under all scenarios. The penalty function is used to detect the least amount of infeasibilities to be dealt with outside the model.

See Mulvey, Vanderbei and Zenios (1995) for possible objective function choices and their applications. Bai, Carpenter and Mulvey (1997) argue that linear objective functions fail to identify robust solutions and that concave utility functions produce much better results for risk-averse decision makers even when the penalty term is not used. Both papers compare robust optimization with the stochastic linear programming (SLP) approach.
Since SLP optimizes only the first moment of the distribution of the objective value, more adjustment is needed as scenarios are realized. However, there is no mechanism for choosing w, and the cost of the robust solution may be higher than that of SLP.

3. Multistage stochastic ALM programming with decision rules

In this method, time is discretized into n stages across the planning horizon, and investments are made using a decision rule, e.g., fixed mix, at the beginning of each time period. The decision rule can easily be tested with out-of-sample scenarios, and confidence limits on the recommendations can be constructed. The use of this approach hinges on discovering policies that are intuitive and that produce superior results. Decision rules may lead to non-convexities and highly nonlinear functions.

Some decision rules used in the literature are fixed mix, no rebalancing, life cycle mix (Berger and Mulvey, 1998), constant proportional portfolio insurance (Perold and Sharpe, 1988), and target wealth path tracking (Mulvey and Ziemba, 1998).

Boender (1997) and Boender, van Aalst and Heemskerk (1998) describe an ALM model designed for Dutch pension funds. Their goal is to find efficient frontiers of initial asset allocations, which minimize the value of downside risk for given values of average contribution rates. The scenarios are generated across the time horizon of interest. The management selects a funding policy, an indexation policy of the earned pension rights, and an investment decision rule. These strategies are simulated against the generated scenarios. The objective function of the optimization problem is then a simulation model that is completely specified except for the initial asset mix. The hybrid simulation/optimization model has the following three steps:

(1) Randomly generate initial asset mixes, simulate them, and evaluate their contribution rates and downside risks.
(2) Select the best performing initial asset mixes that are located at a minimal critical distance from each other.
(3) Use a local search algorithm to identify the optimal initial asset mix.

Maranas et al. (1997) adopt another approach to stochastic programming with decision rules. They determine the optimal parameters of the decision rule by means of a global optimization algorithm. They propose a dynamically balanced investment policy which is specified by the following parameters:

$w_0$: initial dollar wealth,
$r^s_{it}$: percentage return of asset $i \in \{1, 2, \ldots, I\}$ in time period $t \in \{1, 2, \ldots, T\}$ under scenario $s \in \{1, 2, \ldots, S\}$,
$p_s$: probability of occurrence of scenario s.

The decision variables are:

$w^s_t$: dollar wealth at time t in scenario s,
$\lambda_i$: fraction of wealth invested in asset category i (note that it is constant over time).

The model is a multiperiod extension of the mean-variance method. The multi-period efficient frontier is obtained by varying $\beta$ $(0 \le \beta \le 1)$. The formulation is as follows:

$$\max_{\lambda_i,\, w^s_t}\; \beta\, \mathrm{mean}(w_T) - (1 - \beta)\, \mathrm{var}(w_T)$$
subject to
$$w^s_T = w_0 \prod_{t=1}^{T} \sum_{i=1}^{I} \left(1 + r^s_{it}\right) \lambda_i, \quad s = 1, \ldots, S, \qquad (21)$$
$$\sum_{i=1}^{I} \lambda_i = 1, \qquad (22)$$
$$0 \le \lambda_i \le 1, \quad i = 1, \ldots, I.$$

The wealth accumulation is governed by (21). When (21) is substituted into the objective [...]

There are a number of variables used to predict stock returns in various studies. Brennan, Schwartz and Lagnado (1997) use the Treasury bill rate, the Treasury bond rate and the dividend yield as state variables in their model. Brandt (1999) uses the lagged excess return on the NYSE index over the Treasury bill rate as a state variable in addition to the dividend yield, default spread and term spread.

VAR may sometimes diverge from long-term equilibrium.
Boender, van Aalst and Heemskerk (1998) extend the VAR model to a Vector Error Correction Model (VECM) which additionally takes economic regime changes and long term equilibria into account. First, a sub-model generates future economic scenarios. Then, a liability sub-model determines the earned pension rights and payments corresponding to each economic scenario.

The economic scenario sub-module uses time series analysis. The vector of the lognormal transformations of inflation, wage growth, bond return, cash return, equity return, real estate return and nominal GNP growth is $y_t$. Diagnostic tests revealed that the order of the VAR process is 1:

$$y_t \sim N\big(\mu + \Phi\{y_{t-1} - \mu\},\; \Sigma\big),$$

where $N(\mu, \Sigma)$ denotes a Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$. The extended VECM is given as

$$y_t \sim N\big(\Phi_1 y_{t-1} + \Phi_2 C^T (x_{t-1} - \mu_1 I_{\{T_1\}} - \mu_2 I_{\{T_2\}}),\; \Sigma\big),$$

where $\Phi_1$ corresponds to the short term dynamics and $\Phi_2$ corresponds to the long term correction. The index set $T_1$ specifies the period of an economic regime with growth vector $\mu_1$, and $T_2$ gives the period of another economic regime with growth vector $\mu_2$. The second term, $C^T (x_{t-1} - \mu_1 I_{\{T_1\}} - \mu_2 I_{\{T_2\}})$, generates the error correction that restores violations of the equilibria, while $\Phi_2$ determines the speed of the response. They estimated the model by row-wise ordinary least squares and seemingly unrelated regression methods. Scenarios are then generated iteratively using the parameter estimates. They report that the VECM improves the explanatory power of the model. The VECM also has a clearer economic interpretation, since it incorporates regime changes and long run equilibrium.

The liability sub-module uses a push Markov model to determine the future status of each individual plan member depending on age, gender, and employee category. Given this information, the pull part of the model is used to determine additional promotions and new employees. Then, the pension rules are applied to compute the guaranteed pension payments and earned pension rights.

4.1.2. Cascade approach

Wilkie (1986) suggests using a cascade structure rather than a multivariate model in which each variable could affect each of the others. He considers inflation, ordinary shares and fixed interest securities as the main economic determinants of a stochastic investment model. The model includes the following variables: inflation, an index of share dividends, the dividend yield (the dividend index divided by the corresponding price index) on these share indices, and the yield on consols (as a measure of the general level of fixed interest yields in the market).

Wilkie's investigations and actuarial experience lead him to the conclusion that inflation is the driving force for the other investment variables. Figure 1 [6], where the arrows indicate the direction of influence, depicts the cascade structure of the model. Inflation is described first, using a first order autoregressive model. The dividend yield depends on both the current level of inflation and its own previous values. The index of share dividends depends on inflation and the residual of the yield model. The consol yield also depends on inflation and the residual of the yield model, along with its own previous values. The estimated parameters are then used to generate future economic scenarios. Wilkie (1995) improves this basic model.

Fig. 1. Wilkie's scenario generation model.

4.2. Continuous time model

Mulvey (1996) designs an economic projection model for Towers Perrin using stochastic differential equations. The model has a cascade structure as depicted in Figure 2 [7]. First the Treasury yield curve, and then government bond returns, price and wage inflation, and large cap returns are generated. Lastly, returns on primary asset categories such as small cap stocks and corporate bonds are projected.

It is assumed that short- and long-term interest rates (denoted by $r_t$ and $l_t$, respectively) are linked through a correlated white noise term.
The spread between the two is kept under control by using a stabilizing term. This variant of the two-factor Brennan and Schwartz (1982) model is as follows:

$$dr_t = a(r_0 - r_t)\,dt + b\sqrt{r_t}\,dz_1,$$
$$dl_t = c(l_0 - l_t)\,dt + e\sqrt{l_t}\,dz_2,$$

where a and c are functions that depend on the spread between the long and short rates, b and e are constants, and $dz_1$ and $dz_2$ are correlated Wiener terms.

[6] Source: Wilkie (1986).
[7] Source: Mulvey (1996).

Fig. 2. Mulvey's scenario generation model.

The price inflation rate is modeled as a diffusion process that depends on the short term interest rate:

$$dp_t = n\,dr_t + g(p_0 - p_t)\,dt + h(v_{pt})\,dz_3,$$

where $p_t$ is the price inflation at time t, and $v_{pt}$ is the stochastic volatility at time t. Since the volatility of inflation persists, it is represented using an Autoregressive Conditional Heteroskedasticity (ARCH) model. The equation for the stochastic volatility is given by

$$dv_{pt} = k(v_{p0} - v_{pt})\,dt + m\sqrt{v_{pt}}\,dz_4,$$

where g and k are functions that handle the independent movement of the underlying prices at time t for the price inflation and stochastic volatility, respectively, and h and m are constants.

Real yields are related to interest rates, current inflation, and expectations for future inflation. The diffusion equation for the long-term yield is

$$dy_t = n(y_u, y_t, l_u, l_t, p_u, p_l)\,dl_t + q(y_u, y_t, l_u, l_t, p_u, p_l)\,dt + u(y_t)\,dz_5,$$

where $y_u$ is the normative level of real yields, and n and q are vector functions that depend upon various economic factors.

The wage inflation is connected to price inflation in a lagged and smoothed fashion. The stock returns are broken down into two components, dividends and capital appreciation, which are estimated independently. Mulvey reports that the decomposed structure provides more accurate linkages to key economic factors such as inflation and interest rates.
The parameters of the model are calibrated by considering the overall market trends in the light of historical evidence and the subjective beliefs of the management. This model has been in use at Towers Perrin since 1992. Mulvey and Thorlacius (1998) extend the model to a global environment that links the economies of individual countries within a common framework.

Modeling the term structure of interest rates is an essential part of scenario generation. The use of binomial lattice models in the valuation of interest rate contingencies is prevalent. However, the number of scenarios grows very large if the valuation is to be precise. There are sampling methods that reduce the size of the event tree, such as Monte Carlo simulation, antithetic sampling and stratified sampling. However, Klaassen (1997) points out that even if the underlying description is arbitrage-free, a subset of it may include arbitrage opportunities that may lead to spurious profits. Instead of sampling paths, Klaassen (1998a) suggests an aggregation method that reduces the size of the event tree while preserving the arbitrage-free description of uncertainty. In Klaassen (1998b), he presents a solution algorithm which iteratively disaggregates the condensed representation towards a more precise description of uncertainty.

Part II: Stable asset allocation

In this part of the chapter, we analyze the asset allocation problem for an investor who maximizes an isoelastic utility function or an analog of the mean-variance objective function at the end of the investment horizon. Stochastic programming with decision rules is used as the solution methodology. The financial uncertainty is represented by a branching event tree. Each node of the tree represents the joint outcome of all the random variables at that decision stage, and each path through the event tree represents a "scenario". Financial scenarios are generated by using asset return predictors documented in the literature.
The investor perceives the world as modeled by the scenario tree and chooses the fixed mix proportions that maximize his objective function. The asset allocation problem is solved under Gaussian and stable return scenarios. Optimal allocations and utility function values under these alternative sets of scenarios are compared.

We state the reasons for the desirability of the stable model, describe the distribution and present estimation methods in Section 5. Section 6 sets up the asset allocation model and reports computational results. Section 7 concludes.

5. Stable distribution

There are several important reasons for modeling financial variables using stable distributions. The stable law is supported by a generalized central limit theorem (Embrechts et al., 1997; Rachev and Mittnik, 2000). Stable distributions are leptokurtic. Since they can accommodate fat tails and asymmetry, they fit the empirical distribution of financial data better.

Any distribution in the domain of attraction of a specified stable distribution will have properties that are close to those of the stable distribution. Even if the observed data do not exactly follow the ideal distribution specified by the modeler, in principle, the resulting decision is not affected.

Each stable distribution has an index of stability, which remains the same regardless of the sampling interval adopted. The index of stability can be regarded as an overall parameter that can be employed in inference and decision making. However, we should note that for some financial data empirical analysis shows that the index of stability increases as the sampling interval increases.

It is possible to check whether a distribution is in the domain of attraction of a stable distribution by examining the tails of the distribution. The tails dictate the properties of the distribution.

This section describes the properties of the stable distribution and addresses the estimation issues.
5.1. Description of stable distributions

If the sums of linear functions of independent identically distributed (iid) random variables belong to the same family of distributions, the family is called stable. Formally, a random variable r has a stable distribution if for any $a > 0$ and $b > 0$ there exist constants $c > 0$ and $d \in \mathbb{R}$ such that

$$a r_1 + b r_2 \overset{d}{=} c r + d,$$

where $r_1$ and $r_2$ are independent copies of r, and $\overset{d}{=}$ denotes equality in distribution.

The distribution is described by the following parameters: $\alpha \in (0, 2]$ (index of stability), $\beta \in [-1, 1]$ (skewness parameter), $\mu \in \mathbb{R}$ (location parameter), and $\sigma \in [0, \infty)$ (scale parameter). The variable is then represented as $r \sim S_{\alpha,\beta}(\mu, \sigma)$.

The Gaussian distribution is a special case of the stable distribution with $\alpha = 2$, $\beta = 0$. The smaller the stability index is, the stronger the leptokurtic nature of the distribution becomes, i.e., the higher the peak and the fatter the tails. If the skewness parameter is equal to zero, as in the case of the Gaussian distribution, the distribution is symmetric. When $\beta > 0$ ($\beta < 0$), the distribution is skewed to the right (left). If $\beta = 0$ and $\mu = 0$, then the stable random variable is called symmetric $\alpha$-stable ($S\alpha S$). The scale parameter generalizes the definition of standard deviation. The stable analog of variance is the variation, $v_\alpha$, which is given by $\sigma^\alpha$.

Stable distributions generally do not have closed form expressions for their density and distribution functions. They are more conveniently described by characteristic functions. The characteristic function of the random variable r, $\Phi_r(\theta) = E[\exp(i r \theta)]$, is given by

$$\Phi_r(\theta) = \begin{cases} \exp\left\{ -\sigma^\alpha |\theta|^\alpha \left( 1 - i\beta\, \mathrm{sign}(\theta) \tan\frac{\pi\alpha}{2} \right) + i\mu\theta \right\}, & \text{if } \alpha \ne 1, \\ \exp\left\{ -\sigma |\theta| \left( 1 + i\beta \frac{2}{\pi}\, \mathrm{sign}(\theta) \ln|\theta| \right) + i\mu\theta \right\}, & \text{if } \alpha = 1. \end{cases}$$

For $\alpha < 2$, the p-th absolute moment of r, $E|r|^p = \int_0^\infty P(|r|^p > y)\,dy$, is finite if $0 < p < \alpha$, and infinite otherwise. Hence, when $\alpha \le 1$ the first moment is infinite, and when $\alpha < 2$ the second moment is infinite. The only stable distribution that has finite first and second moments is the Gaussian distribution.
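As an illustration, the characteristic function above can be evaluated directly. The following is a minimal sketch, assuming the standard parametrization of Samorodnitsky and Taqqu (1994) with a plus sign in the $\alpha = 1$ branch; the function name `stable_cf` and its argument order are hypothetical choices made for this example, not part of the chapter.

```python
import cmath
import math


def stable_cf(theta, alpha, beta, mu, sigma):
    """Characteristic function of S_{alpha,beta}(mu, sigma) at a real theta.

    Assumes the standard parametrization; the name and signature are
    illustrative only.
    """
    if not (0 < alpha <= 2 and -1 <= beta <= 1 and sigma >= 0):
        raise ValueError("parameters outside the admissible ranges")
    if theta == 0:
        # Every characteristic function equals 1 at zero; this also
        # avoids evaluating log(0) in the alpha == 1 branch.
        return complex(1.0, 0.0)
    sign = math.copysign(1.0, theta)
    if alpha != 1:
        skew = 1 - 1j * beta * sign * math.tan(math.pi * alpha / 2)
        exponent = -((sigma * abs(theta)) ** alpha) * skew + 1j * mu * theta
    else:
        skew = 1 + 1j * beta * (2 / math.pi) * sign * math.log(abs(theta))
        exponent = -sigma * abs(theta) * skew + 1j * mu * theta
    return cmath.exp(exponent)


# Gaussian special case (alpha = 2, beta = 0): exp(-sigma^2 * theta^2).
gaussian_value = stable_cf(1.0, 2, 0.0, 0.0, 1.0)
# Symmetric Cauchy special case (alpha = 1, beta = 0): exp(-sigma * |theta|).
cauchy_value = stable_cf(2.0, 1, 0.0, 0.0, 1.0)
```

With $\alpha = 2$ and $\beta = 0$ the function returns $\exp(-\sigma^2\theta^2)$, the characteristic function of a Gaussian variable with variance $2\sigma^2$, which shows in what sense the scale parameter generalizes the standard deviation.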
In models that use financial data, it is generally assumed that $\alpha \in (1, 2]$. There are several reasons for this:

(1) When $\alpha > 1$, the first moment of the distribution is finite. It is convenient to be able to speak of expected returns.
(2) Empirical studies support this parametrization.
(3) Although the empirical distributions of financial data sometimes depart from normality, the deviation is not "too much".

In scenario generation, one may need to use multivariate stable distributions. The extension to the multivariate case is nontrivial. Although most of the literature concentrates on the univariate case, some new results have recently become available. See for example Samorodnitsky and Taqqu (1994) and Rachev and Mittnik (2000).

If R is a d-dimensional stable vector, then any linear combination of the components of R is also a stable random variable. The converse, however, holds only under certain conditions (Samorodnitsky and Taqqu, 1994). The characteristic function of R is given by

$$\Phi_R(\theta) = \begin{cases} \exp\left\{ -\int_{S_d} \left|\theta^T s\right|^\alpha \left( 1 - i\, \mathrm{sign}(\theta^T s) \tan\frac{\pi\alpha}{2} \right) \Gamma(ds) + i\theta^T \mu \right\}, & \text{if } \alpha \ne 1, \\ \exp\left\{ -\int_{S_d} \left|\theta^T s\right| \left( 1 + i \frac{2}{\pi}\, \mathrm{sign}(\theta^T s) \ln\left|\theta^T s\right| \right) \Gamma(ds) + i\theta^T \mu \right\}, & \text{if } \alpha = 1, \end{cases}$$

where $\Gamma$ is the spectral measure, which replaces the scale and skewness parameters that enter the description of the univariate stable distribution. It is a bounded nonnegative measure on the unit sphere $S_d$, and $s \in S_d$ is the integrand unit vector. The index of stability is again $\alpha$, and $\mu$ is the vector of locations.

Non-Gaussian stable distributions have infinite variances. The stable equivalent of covariance for $S\alpha S$ variables is the covariation:

$$[R_1, R_2]_\alpha = \int_{S_d} s_1\, s_2^{\langle \alpha - 1 \rangle}\, \Gamma(ds),$$

where $(R_1, R_2)$ is an $S\alpha S$ vector ($\alpha \in (1, 2)$), and $x^{\langle k \rangle} = |x|^k\, \mathrm{sign}(x)$. The matrix of covariations determines the dependence structure among the individual variables.

5.2. Financial modeling and estimation

Financial modeling frequently involves information on past market movements. Examples include technical analysis to derive investment decisions, or researchers assessing the efficiency of financial markets.
In such cases, it is not the unconditional return distribution which is of interest, but the conditional distribution, which is conditioned on information contained in past return data, or on a more general information set.

The class of autoregressive moving average (ARMA) models is a natural candidate for conditioning on the past of a return series. These models have the property that the conditional distribution is homoskedastic. Since financial markets frequently exhibit volatility clusters, the homoskedasticity assumption may be too restrictive. As a consequence, conditional heteroskedastic models, such as Engle's (1982) autoregressive conditional heteroskedastic (ARCH) models and the generalization (GARCH) of Bollerslev (1986), possibly in combination with an ARMA model (referred to as an ARMA-GARCH model), are now common in empirical finance. It turns out that ARCH-type models driven by normally distributed innovations imply unconditional distributions which themselves possess heavier tails. Thus, in this respect, ARCH models and stable distributions can be viewed as competing hypotheses. Mittnik, Rachev and Paolella (1997) present empirical evidence favoring the stable hypothesis over the normal assumption as a model for unconditional, homoskedastic conditional, and heteroskedastic conditional distributions of several asset return series.

5.2.1. Maximum likelihood estimation

We describe an approximate conditional maximum-likelihood (ML) estimation procedure suggested by Mittnik et al. (1999). The unconditional ML estimate of $\theta = (\alpha, \beta, \mu, \sigma)$ is obtained by maximizing the logarithm of the likelihood function

$$L(\theta) = \prod_{t=1}^{T} S_{\alpha,\beta}\left( \frac{r_t - \mu}{\sigma} \right) \sigma^{-1}.$$

One needs to use conditional ML to estimate ARMA and ARMA-GARCH models.
The ML estimation is conditional in the sense that, when estimating, for example, an ARMA(p,q) model, one conditions on the first p realizations of the sample, $r_p, r_{p-1}, \ldots, r_1$, and, when $\alpha > 1$ holds, sets the innovations $\varepsilon_p, \varepsilon_{p-1}, \ldots, \varepsilon_{p-q+1}$ to their unconditional mean $E(\varepsilon_t) = 0$.

The estimation of all stable models is approximate in the sense that the stable density function, $S_{\alpha,\beta}(\mu, \sigma)$, is approximated via a fast Fourier transformation (FFT) of the stable characteristic function,

$$\int_{-\infty}^{\infty} e^{itx}\, dH(x) = \begin{cases} \exp\left\{ -\sigma^\alpha |t|^\alpha \left[ 1 - i\beta\, \mathrm{sign}(t) \tan\frac{\pi\alpha}{2} \right] + i\mu t \right\}, & \text{if } \alpha \ne 1, \\ \exp\left\{ -\sigma |t| \left[ 1 + i\beta \frac{2}{\pi}\, \mathrm{sign}(t) \ln|t| \right] + i\mu t \right\}, & \text{if } \alpha = 1, \end{cases}$$

where H is the distribution function corresponding to $S_{\alpha,\beta}(\mu, \sigma)$. This ML estimation method essentially follows that of DuMouchel (1973), but differs in that the stable density is approximated numerically by an FFT of the characteristic function rather than by a series expansion. As DuMouchel shows, the resulting estimates are consistent and asymptotically normal, with the asymptotic covariance matrix of $T^{1/2}(\hat{\theta} - \theta_0)$ being given by the inverse of the Fisher information matrix. The standard errors of the estimates are obtained by evaluating the Fisher information matrix at the ML point estimates.

For details on stable ML estimation see Mittnik et al. (1999), Mittnik and Rachev (1993), and Paulauskas and Rachev (1999).

5.2.2. Comparison of estimation methods

When the residuals of the ARMA model have a Gaussian distribution, Least Squares (LS) estimation is equivalent to conditional ML estimation. Furthermore, the Whittle estimator is asymptotically equivalent to the LS and ML estimation methods.

However, when the innovations have a stable distribution, the properties of conventional estimation methods may change due to the infinite variance property. In the stable case, ML estimates are still consistent and asymptotically normal (DuMouchel, 1973); LS and Whittle estimates are consistent but they are not asymptotically normal.
The LS and Whittle estimates have infinite variance limits with a convergence rate that is faster than that of the Gaussian case (Mikosch et al., 1995).

Calder and Davis (1998) compare the LS, Least Absolute Deviation (LAD), and ML methods for the estimation of an ARMA model with stable innovations. Their simulations reveal that the difference between the estimates of the three methods is insignificant when the index of stability of the residuals is 1.75. However, when $\alpha = 1$ or $\alpha = 0.75$, they report that the LAD and ML estimation procedures are superior to LS estimation.

ML estimation has desirable properties in both the Gaussian and the stable setting, but it is computationally very demanding. Since the variables of interest in this paper have indices of stability greater than 1.5, the nonlinear LS estimation method has been utilized in this study. Our parameter estimates are consistent, but they are not asymptotically normal. However, due to the high index of stability, the parameter estimates are comparable to those that would be achieved if ML estimation were used.

6. Multistage stable asset allocation model with decision rules

The asset allocation problem for an investor who maximizes an isoelastic utility function or an analog of the mean-variance objective function at the end of the investment horizon is formulated as follows:

$$\max\; E\, u\big(\acute{R}^{i}_{s,T}\big)$$
subject to
$$\acute{R}^{i}_{s,T} = \prod_{t=1}^{T} \big(1 + R^{i}_{s,t}\big) - 1,$$
$$R^{i}_{s,t} = \sum_{j=1}^{J} w^{i}_{j}\, r_{jst},$$
$$w^{i}_{j} \ge 0,$$

where $w^{i}_{j}$ is the proportion of funds [8] of portfolio i invested in asset j, $\acute{R}^{i}_{s,T}$ is the compound return of allocation i over time periods 1 through T under scenario $s \in \{1, 2, \ldots, S\}$, $R^{i}_{s,t}$ is the return of portfolio i under scenario $s \in \{1, 2, \ldots, S\}$ in time period $t \in \{1, 2, \ldots, T\}$, and $r_{jst}$ is the percentage return of asset $j \in \{1, 2, \ldots, J\}$ under scenario s in time period t.
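The constraints above translate directly into a small computation. The following is a minimal sketch, assuming equally weighted scenarios and returns stored as nested lists; the function names and the toy numbers are hypothetical and serve only to illustrate how the compound return of a fixed mix is built up from asset returns.

```python
def portfolio_returns(weights, asset_returns):
    """Per-period portfolio returns for one scenario.

    asset_returns[t][j] is the return of asset j in period t;
    weights[j] is the fixed-mix proportion held in asset j.
    """
    return [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]


def compound_return(weights, asset_returns):
    """Compound return over all periods: prod_t (1 + R_t) - 1."""
    growth = 1.0
    for r in portfolio_returns(weights, asset_returns):
        growth *= 1.0 + r
    return growth - 1.0


def expected_objective(weights, scenarios, utility):
    """Scenario average of u(compound return), assuming equal probabilities."""
    values = [utility(compound_return(weights, s)) for s in scenarios]
    return sum(values) / len(values)


# Toy data (hypothetical): two assets, two periods, two scenarios.
mix = [0.5, 0.5]
scenario_a = [[0.10, 0.00], [0.00, 0.10]]
scenario_b = [[0.00, 0.00], [0.00, 0.00]]
r_a = compound_return(mix, scenario_a)
avg = expected_objective(mix, [scenario_a, scenario_b], lambda x: x)
```

In the first scenario the portfolio earns 5% in each period, so its compound return is $1.05^2 - 1 = 0.1025$. Because the weights are reapplied in every period, the sketch implicitly rebalances to the fixed mix at each stage.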
The restrictions on the model are that there are no short sales and that the asset allocation is updated every month according to the fixed mix decision rule [9]. In general, the fixed mix strategy requires the purchase of stocks as they fall in value, and the sale of stocks as they rise in value. The fixed mix strategy does not have much downside protection, and tends to do very well in flat but oscillating markets. However, it tends to do relatively poorly in bullish markets (Perold and Sharpe, 1988).

We use two alternative objective functions: the first is the power utility function and the second is an analog of mean-variance analysis. The power utility function, which has constant relative risk aversion, is calculated as follows [10]:

$$U(W^i) = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{(1 - \gamma)} \big(W^i_s\big)^{(1 - \gamma)}, \quad \gamma > -1,$$

where $\gamma$ is the coefficient of relative risk aversion, and $W^i_s$ is the final wealth. Assuming that the initial wealth is 1, we compute the final wealth as follows:

$$W^i_s = 1 \times \big(1 + \acute{R}^{i}_{s,T}\big).$$

A constant relative risk aversion investor chooses the same investment proportions independent of the investment horizon if the market is frictionless and returns are independent over time. Fixed mix is the optimal portfolio choice in this setting. However, if the returns are predictable, which is the conjecture of this paper, then the portfolio choice depends on the investment horizon. Although the fixed mix strategy is no longer optimal in this economic environment, the investor is assumed to follow this decision strategy for computational simplicity.

The second objective function trades off the mean final return against a measure of risk:

$$U\big(\acute{R}^{i}_{T}\big) = E\big(\acute{R}^{i}_{T}\big) - c\, MD\big(\acute{R}^{i}_{T}\big),$$

where c is the coefficient of risk aversion.

[8] The fixed mix rule requires that $w^{i}_{j}$ does not depend on time.
[9] Perold and Sharpe (1988) suggest constant proportion portfolio insurance as an alternative strategy. In this strategy, one sells stocks as they fall in value and buys stocks as they rise in value.
[10] Note that $U(W^i)$ is finite if $(1 - \gamma) < 2$.

The mean compound portfolio return of fixed mix rule $i \in \{1, 2, \ldots, I\}$ at the final date is

$$E\big(\acute{R}^{i}_{T}\big) = \frac{1}{S} \sum_{s=1}^{S} \acute{R}^{i}_{s,T}.$$

We consider the following risk measure, which gives less importance to outliers than the variance does:

$$MD\big(\acute{R}^{i}_{T}\big) = \frac{1}{S} \sum_{s=1}^{S} \left| \acute{R}^{i}_{s,T} - E\big(\acute{R}^{i}_{T}\big) \right|^{r},$$

where $1 < r < 2$. Notice that when $r = 2$, the above risk measure becomes the variance. Since variance is not defined for non-Gaussian stable variables, we use values of $r < 2$ for which $MD(\acute{R}^{i}_{T})$ is finite, such as $r = 1.5$.

The scenario generation module generates asset return scenarios, $r_{jst}$, for each time period. At each stage, n new offspring scenarios are generated from each parent scenario. If the horizon of interest is T periods, then we produce $n^T$ alternative asset return scenarios for the final date. The optimal asset allocation is calculated for this scenario tree. The scenario tree generation is repeated 100 times and the sample average of the optimal allocations is reported as the optimal asset allocation.

6.1. Scenario generation

The portfolio we analyze is composed of the Treasury bill and the S&P 500. The monthly return on the Treasury bill is assumed to be constant at a 6% annualized rate of return. The main challenge is predicting the return scenarios for the S&P 500. The financial variables that are used to generate the return scenarios for the S&P 500 are modeled in a cascade structure similar to Mulvey [11] (1996) (see Figure 3). However, the analysis is done in discrete time as in Wilkie [12] (1995).

Monthly data from 2/1965 through 12/1999 are used for the estimation of the time series models. The 3-month Treasury bill rate and the 10-year Treasury bond rate are modeled first as measures of short term and long term interest rates. The price inflation depends on the Treasury bond rate and the previous values of inflation.
Following Wilkie's and Mulvey's approaches, stock returns are analyzed in two components: dividend growth and dividend yield growth [13]. The relationship of the economic variables does not denote a one-way causal relationship, but rather indicates the sequencing of the modules.

[11] See Section 4.2 for a brief description.
[12] See Section 4.1.2 for a brief review.
[13] Tokat, Rachev and Schwartz (2002) give the details of the time series analysis.

The economic variables are modeled using the Box-Jenkins methodology. The standard Gaussian Box-Jenkins techniques carry over to the stable setting with some possible changes. We do not model the time varying volatility of the economic variables. Fitting ARMA-GARCH models may reduce the kurtosis in the residuals. However, Balke and Fomby (1994) show that even after estimating GARCH models, significant excess kurtosis and/or skewness still remains. Mittnik, Rachev and Paolella (1997) present empirical evidence favoring the stable hypothesis over the normal assumption as a model for ARMA-GARCH residuals. We postpone modeling the time varying volatility to another paper.

Future economic scenarios are simulated at monthly intervals. One set of scenarios is generated by assuming that the residuals of each variable are identically normally distributed. This is the classical assumption made in the literature. Another set of scenarios is generated by assuming that the residuals are identically stable distributed. The estimated normal and stable parameters [14] for the innovations of the time series models are given in Table 1. See Figures 4-8 for a graphical comparison of the stable and normal fits to the residuals.

Fig. 3. The scenario generation model.

Table 1
The estimated normal and stable parameters for the innovations

Innovations of               Normal distribution       Stable distribution
                             μ            σ            α        β         μ            σ
Price inflation (Inf)        6.15e-06     0.0021       1.7072   0.1073    6.15e-06     0.0012
Dividend gr. (Divg)          9.89e-4      0.0195       1.7505   -0.0229   9.89e-4      0.0114
Dividend yield (d(Divy))     -0.002551    0.0407       1.8076   0.2252    -0.002551    0.0239
Treasury bill (d(Tbill))     0.000336     0.0579       1.5600   0         0            0.0308
Treasury bond (d(Tbond))     0.000818     0.0339       1.9100   0         0            0.0230

[14] Stable parameters are estimated using the maximum likelihood estimation method.

Fig. 4. The empirical pdf of the residuals of inflation, the Gaussian fit and the stable fit.

The scenarios have a tree structure. At each stage (month) we generate n possible scenarios. For each scenario, we first generate a normal or stable residual for the Treasury bill, and calculate the corresponding Treasury bill rate for the following month. Then, given this short rate, we generate the Treasury bond rate, price inflation, dividend growth rate and dividend yield for that month according to the cascade structure and the time series models we have built. For instance, the inflation rate for next month is generated by using the Treasury bond rate, the inflation rate and the surprise to expected inflation this month, and the normal or stable innovation of the inflation rate next month. Note that we allow for an innovation of each economic variable in each simulated month.

At the next stage, n new offspring scenarios are generated from each parent scenario. This continues until the final time of interest. In this study, we generate 2 scenarios for each month, so $2^9 = 512$ possible economic scenarios are considered over the nine months of the next three quarters.

6.2. Valuation of assets

The monthly return of the S&P 500 is derived using the dividend yield and the dividend index. The dividend index is calculated by multiplying the price index by the dividend yield:

$$DI_t = P_t \times DY_t,$$

where $DI_t$ is the dividend index for period t, $P_t$ is the price index for period t, and $DY_t$ is the dividend yield for period t.

Fig. 5. The empirical pdf of the residuals of dividend growth, the Gaussian fit and the stable fit.
The dividend growth is just the log difference of dividend indices. The dividend yield and dividend growth rate are simulated as explained in the previous section. Hence, we can recover the simulated future price index in period t under scenario s from the simulated dividend index and dividend yield by

$P_t^s = DI_t^s / DY_t^s$.

Then, we can calculate the return for holding S&P 500 for a month under scenario s as

$r_t^s = (P_t^s - P_{t-1}^s + DI_t^s) / P_{t-1}^s$.

6.3. Computational results

We first present the mean annualized return of S&P 500 in 100 repetitions of the scenario tree generated by using the Gaussian and stable distribution models (see Table 2).

Fig. 6. The empirical pdf of the residuals of dividend yield, the Gaussian fit and the stable fit.

Table 2
Annualized return scenarios on S&P 500

                    Mean     1%        2.5%      25%      75%     97.5%    99%
Normal scenarios    9.07     -122.34   -103.03   -31.97   48.54   129.66   152.90
Stable scenarios    10.20    -149.17   -107.29   -27.45   44.96   128.68   171.16

The table also shows the percentiles of these return scenarios. It should be noted that the S&P 500 returns generated by stable scenarios have fatter tails than those of Gaussian scenarios. Hence, stable scenarios consider more extreme outcomes than Gaussian scenarios do. Khindanova, Rachev and Schwartz (2001) report similar observations in their paper, where they compute value at risk employing Gaussian and stable distributed daily returns. They state that the 5% percentiles of the normal and stable distributions are very close, but the 1% percentile of the stable distribution is greater than that of the Gaussian.

Fig. 7. The empirical pdf of the residuals of Treasury bill, the Gaussian fit and the stable fit.

The asset allocation problem has been solved for an investor who maximizes the power utility of final wealth. The optimal asset allocation depends on the risk aversion level of the agent.
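The power utility criterion and the certainty equivalent final wealth used in the comparisons that follow can be sketched in a few lines. The functional form U(W) = W^(1-γ)/(1-γ), with log W at γ = 1, is an assumption of this sketch: it is not written out in this section, but it agrees with the footnoted remarks that the utility reduces to the logarithm at a coefficient of 1 and turns negative above 1. The function names and the three-scenario wealth list are hypothetical.

```python
import math


def power_utility(wealth, gamma):
    """Constant relative risk aversion utility (assumed form); log utility at gamma = 1."""
    if gamma == 1.0:
        return math.log(wealth)
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def expected_utility(final_wealths, gamma):
    """Average utility over equally likely scenario outcomes."""
    return sum(power_utility(w, gamma) for w in final_wealths) / len(final_wealths)


def certainty_equivalent(utility, gamma):
    """Sure final wealth whose utility equals the given (expected) utility."""
    if gamma == 1.0:
        return math.exp(utility)
    return ((1.0 - gamma) * utility) ** (1.0 / (1.0 - gamma))


# Hypothetical final wealths in three equally likely scenarios.
wealths = [0.95, 1.05, 1.20]
for gamma in (0.80, 1.00, 2.30, 10.00):
    eu = expected_utility(wealths, gamma)
    print(gamma, round(eu, 4), round(certainty_equivalent(eu, gamma), 4))
```

Under this assumed form, the utility and certainty equivalent figures reported later in Tables 4 and 5 map onto each other to within rounding, for example a utility of 5.0633 at a coefficient of 0.80 corresponds to a certainty equivalent final wealth of about 1.065.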
If his relative risk aversion coefficient is very low, such as 0.80, or very high, such as 10.00, then the Gaussian and stable scenarios result in similar asset allocations (see Table 3). Intuitively, an investor with very low risk aversion does not mind risk very much, so his decision does not change when extreme events are modeled more realistically. Similarly, an investor with very high risk aversion is already scared away from the risky asset, so the fatter tails do not affect his decision much either. On the other hand, an investor who would put 60% in S&P 500 under normal scenarios will put only 48% in S&P 500 under stable scenarios. Because stable scenarios model extreme events more realistically, the stable investor puts less in the risky asset than the Gaussian investor does.

Fig. 8. The empirical pdf of the residuals of Treasury bond yield, the Gaussian fit and the stable fit.

Table 3
Optimal allocations under normal and stable scenarios (T = 3 quarters)
(optimal percentage invested)

Relative risk    Normal scenarios                    Stable scenarios
aversion         S&P 500 (%)   Treasury bill (%)     S&P 500 (%)   Treasury bill (%)
0.80             100           0                     100           0
1.00             100           0                     88            12
1.50             86            14                    66            34
2.30             60            40                    48            52
2.70             52            48                    42            58
10.00            14            86                    12            88

15 (on the row with coefficient 1.00) Note that when the relative risk aversion coefficient equals 1, the power utility function reduces to the logarithmic utility function.

The time series models which generate the Gaussian and stable scenarios are the same except for the residuals being Gaussian or stable, respectively. In our computations, the mean return of the Gaussian S&P 500 scenarios came out to be less than that of the stable S&P 500 scenarios. The equity premium is 3.07% in the normal scenarios and 4.20% in the stable scenarios. Since the premium on equity is higher in the stable scenarios, equity is more attractive there. However, the heavier tails of the stable scenarios outweigh this, and consequently the investor puts considerably less money in the stock index. If the equity premium were the same in both sets of scenarios, we expect that the allocation difference would be even more pronounced.

Table 4 shows the change in the utility16 if the investor uses stable scenarios rather than Gaussian scenarios. The improvement can be as large as 0.72%, depending on the risk aversion level of the investor. Table 5 reports the improvement in the certainty equivalent final wealth (CEFW) if an investor uses stable scenarios rather than Gaussian scenarios.17 The computations show a 6 basis point improvement in the certainty equivalent wealth of the investor who would put 60% in S&P 500. The difference could get larger or smaller depending on the risk aversion level of the decision maker.

Table 4
Comparison of utility achieved from normal and stable scenarios (T = 3 quarters)

Relative risk    Normal scenarios            Stable scenarios            % Change
aversion         % in S&P 500   Utility      % in S&P 500   Utility      in utility
0.80             100            5.0633       100            5.0633       0.00
1.00             100            0.0600       88             0.0604       0.72
1.50             86             -1.9458      66             -1.9445      0.06
2.30             60             -0.7188      48             -0.7181      0.09
2.70             52             -0.5391      42             -0.5386      0.09
10.00            14             -0.0728      12             -0.0728      0.03

Table 5
Comparison of certainty equivalent wealth achieved from normal and stable scenarios (T = 3 quarters)

Relative risk    Normal scenarios            Stable scenarios            % Change
aversion         % in S&P 500   CEFW         % in S&P 500   CEFW         in CEFW
0.80             100            1.0650       100            1.0650       0.00
1.00             100            1.0618       88             1.0623       0.04
1.50             86             1.0565       66             1.0579       0.13
2.30             60             1.0536       48             1.0543       0.07
2.70             52             1.0526       42             1.0532       0.05
10.00            14             1.0480       12             1.0481       0.00

16 Note that the utility value becomes negative when the relative risk aversion coefficient exceeds 1. Although negative utility does not make much sense, it can be made positive by monotonic transformations.
17 Since the Gaussian distribution is a special case of the stable distribution, the stable model encompasses the Gaussian model. Therefore, the certainty equivalent comparison is made under the assumption that stable is the correct model.

The other 'utility' function we consider is an analog of the mean–variance criterion. The computational results are very similar to those for the constant relative risk aversion utility. The investor who has very low or very high risk aversion does not gain much from using the stable model. However, the stable model makes a difference for the investors in the middle. Table 6 shows that an investor who would put 60% in S&P 500 under normal scenarios will put only 54% in S&P 500 under stable scenarios. Table 7 reports the percentage improvement in the 'utility' function18 if one uses the stable model as opposed to the Gaussian model. If there is any percentage improvement in the utility function, an investor can reduce the risk for a given level of mean return or increase the mean return for a given level of risk. This can be achieved by switching from Gaussian scenario generation to stable scenario generation.

Table 6
Optimal allocations under normal and stable scenarios (T = 3 quarters)
(optimal percentage invested)

c       Normal scenarios                    Stable scenarios
        S&P 500 (%)   Treasury bill (%)     S&P 500 (%)   Treasury bill (%)
0.35    100           0                     100           0
0.40    90            10                    80            20
0.52    60            40                    54            46
0.59    50            50                    44            56
1.00    20            80                    18            82

Table 7
Percentage change in utility achieved from normal and stable scenarios (T = 3 quarters)

c       Normal scenarios            Stable scenarios            % Change
        % in S&P 500   Utility      % in S&P 500   Utility      in utility
0.35    100            0.0583       100            0.0583       0.00
0.40    90             0.0561       80             0.0562       0.28
0.52    60             0.0526       54             0.0527       0.10
0.59    50             0.0513       44             0.0514       0.12
1.00    20             0.0479       18             0.0480       0.08

7. Conclusion

The ALM models that are based on stochastic programming with or without decision rules are starting to gain applicability in the industry. In these models, the future uncertainty is modeled using discrete scenarios.
A representative set of scenarios describes the possible future economic situations facing the institution. Generating scenarios that realistically represent the future uncertainty is important for the validity of the results of stochastic programming based ALM models. The assumption underlying the scenario generation models used in the literature is the normal distribution. The validity of the normal distribution has been questioned in the finance and macroeconomics literature. The leptokurtic (heavy-tailed and peaked) and asymmetric nature of the economic variables can be better captured by using the stable distribution as opposed to the normal distribution.

We analyze the effects of the distributional assumptions on optimal asset allocation. A multistage dynamic asset allocation model with decision rules has been set up. The optimal asset allocations found under normal and stable scenarios are compared. The analysis suggests that the normal scenarios greatly underestimate risks. Stable scenario modeling leads to asset allocations that differ by up to 20% from those of normal scenario modeling.

Although financial data exhibit time-varying volatility and long-range dependence as well as heavy tails, this study has only considered explicit modeling of heavy tails. Conditional heteroskedastic models (ARMA-GARCH) utilizing stable distributions can be used to describe the time-varying volatility along with the asymmetric and leptokurtic behavior. In addition, the long-range dependence can also be modeled if fractional-stable GARCH models are employed. These aspects of financial data will be considered in a later paper.

18 Since the risk corresponding to certainty equivalent return is zero, the certainty equivalent return is equal to the utility of return. Hence, the percentage improvement in the utility of return is equivalent to the percentage improvement in the certainty equivalent return.
References

Bai, D., Carpenter, T., Mulvey, J.M., 1997. Making a case for robust optimization models. Management Science 43 (7), 895–907.
Balke, N.S., Fomby, T.B., 1994. Large shocks, small shocks, and economic fluctuations: outliers in macroeconomic time series. Journal of Applied Econometrics 9, 181–200.
Bekaert, G., Hodrick, R.J., 1992. Characterizing predictable components in excess returns on equity and foreign exchange markets. Journal of Finance 47 (2), 467–509.
Berger, A.J., Mulvey, J.M., 1998. The home account advisor: asset and liability management for individual investors. In: Ziemba, W.T., Mulvey, J.M. (Eds.), Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge, UK, pp. 634–665.
Boender, G.C.E., 1997. A hybrid simulation/optimization scenario model for asset/liability management. European Journal of Operational Research 99, 126–135.
Boender, G.C.E., van Aalst, P., Heemskerk, F., 1998. Modeling and management of assets and liabilities of pension plans in the Netherlands. In: Ziemba, W.T., Mulvey, J.M. (Eds.), Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge, UK, pp. 561–580.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 52, 5–59.
Bradley, S.P., Crane, D.B., 1972. A dynamic model for bond portfolio management. Management Science 19, 139–151.
Brandt, M.W., 1999. Estimating portfolio and consumption choice: a conditional Euler equations approach. Journal of Finance 54, 1609–1646.
Brennan, M.J., Schwartz, E.S., 1982. An equilibrium model of bond pricing and a test of market efficiency. Journal of Financial and Quantitative Analysis 17 (3), 301–329.
Brennan, M.J., Schwartz, E.S., 1998. The use of Treasury bill futures in strategic asset allocation programs. In: Ziemba, W.T., Mulvey, J.M. (Eds.), Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge, UK, pp. 205–228.
Brennan, M.J., Schwartz, E.S., Lagnado, R., 1997. Strategic asset allocation. Journal of Economic Dynamics and Control 21, 1377–1403.
Brinson, G.P., Hood, L.R., Beebower, G.L., 1986. Determinants of portfolio performance. Financial Analysts Journal 42 (4), 39–48.
Bunn, D.W., Salo, A.A., 1993. Forecasting with scenarios. European Journal of Operational Research 68 (3), 291–303.
Calder, M., Davis, R.A., 1998. Inference for linear processes with stable noise. In: Adler, R.J., Feldman, R.E., Taqqu, M.S. (Eds.), A Practical Guide to Heavy Tails. Birkhäuser, pp. 159–176.
Carino, D.R., Kent, T., Myers, D.H., Stacy, C., Sylvanus, M., Turner, A.L., Watanabe, K., Ziemba, W.T., 1994. The Russell–Yasuda Kasai model: An asset/liability model for a Japanese insurance company using multistage stochastic programming. Interfaces 24, 29–49.
Carino, D.R., Myers, D.H., Ziemba, W.T., 1998. Concepts, technical issues, and uses of the Russell–Yasuda Kasai financial planning model. Operations Research 46 (4), 450–462.
Carino, D.R., Ziemba, W.T., 1998. Formulation of the Russell–Yasuda Kasai financial planning model. Operations Research 46 (4), 433–449.
Carpenter, T., Lustig, I., Mulvey, J.M., 1991. Formulating stochastic programs for interior point methods. Operations Research 39, 757–770.
Charnes, A., Gallegos, Yao, S., 1982. A chance constrained approach to bank dynamic balance sheet management. Center for Cybernetic Studies CCS 428, University of Texas, Austin.
Charnes, A., Kirby, M.J.L., 1966. Optimal decision rules for the E-model of chance-constrained programming. Cahiers du Centre d'Études de Recherche Opérationnelle 8, 5–44.
Chopra, V.K., Ziemba, W.T., 1993. The effect of errors in means, variances, and covariances on optimal portfolio choice. Journal of Portfolio Management, Winter, 6–11.
Culp, C., Tanner, K., Mensink, R., 1997. Risk, returns and retirement. RISK 10 (10), 63–69.
Dembo, R.S., 1991. Scenario optimization. Annals of Operations Research 30, 63–80.
Dembo, R.S., 1993. Scenario immunization. In: Zenios, S.A. (Ed.), Financial Optimization. Cambridge University Press, Cambridge, UK, pp. 290–308.
Dert, C.L., 1995. Asset and liability management for pension funds, a multistage chance constrained programming approach. Ph.D. Thesis. Erasmus University, Rotterdam, The Netherlands.
Dert, C.L., 1998. A dynamic model for asset liability management for defined benefit pension funds. In: Ziemba, W.T., Mulvey, J.M. (Eds.), Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge, UK, pp. 501–536.
DuMouchel, W.H., 1973. On the asymptotic normality of the maximum likelihood estimate when sampling from a stable distribution. The Annals of Statistics 1, 948–957.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events for Insurance and Finance. Springer-Verlag, Berlin.
Engle, R., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1007.
Eppen, G., Fama, E., 1971. Three asset cash balance and dynamic portfolio problems. Management Science 17 (5), 311–319.
Fama, E., 1965. The behavior of stock market prices. Journal of Business 38, 34–105.
Geyer, A., Herold, W., Kontriner, K., Ziemba, W.T., 2002. The Innovest Austrian pension fund financial planning model InnoALM. Working paper. University of Economics, Vienna, Austria.
Hakansson, N.H., 1971. On the optimal myopic portfolio policies, with and without serial correlation of yields. Journal of Business 44, 324–334. Reprinted in: Ziemba, W.T., Vickson, R.G. (Eds.), 1975. Stochastic Optimization Models in Finance. Academic Press, pp. 401–411.
Hiller, R.S., Eckstein, J., 1993. Stochastic dedication: designing fixed income portfolios using massively parallel Benders decomposition. Management Science 39 (11), 1422–1438.
Hodrick, R.J., 1992. Dividend yields and expected stock returns: alternative procedures for inference and measurement. Review of Financial Studies 5 (3), 357–386.
Kandel, S., Stambaugh, R.F., 1996. On predictability of stock returns: an asset allocation perspective. Journal of Finance 51, 385–424.
Khindanova, I., Rachev, S.T., Schwartz, E., 2001. Stable modeling of value at risk. Mathematical and Computer Modelling 34 (9–11), 1223–1259.
Kim, J.-R., 1999. Another look on the asymmetric behavior of unemployment rates over the business cycle. Working paper. Technische Universität, Dresden.
Klaassen, P., 1997. Discretized reality and spurious profits in stochastic programming models for asset/liability management. European Journal of Operational Research 101, 374–392.
Klaassen, P., 1998a. Financial asset-pricing theory and stochastic programming models for asset/liability management: a synthesis. Management Science 44 (1), 31–48.
Klaassen, P., 1998b. Solving stochastic programming models for asset/liability management using iterative disaggregation. In: Ziemba, W.T., Mulvey, J.M. (Eds.), Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge, UK, pp. 427–463.
Korhonen, A., 1987. A dynamic bank portfolio planning model with multiple scenarios, multiple goals and changing priorities. European Journal of Operational Research 30, 13–23.
Kusy, M.I., Ziemba, W.T., 1986. A bank asset and liability management model. Operations Research 34, 356–376.
Li, S.X., 1995a. An insurance and investment portfolio model using chance constrained programming. Omega 23 (5), 577–585.
Li, S.X., 1995b. A satisficing chance constrained model in the portfolio selection of insurance lines and investments. Journal of the Operational Research Society 46 (9), 1111–1120.
Mandelbrot, B.B., 1963. The variation of certain speculative prices. Journal of Business 36, 394–419.
Mandelbrot, B.B., 1967. The variation of some other speculative prices. Journal of Business 40, 393–413.
Maranas, C.D., Androulakis, I.P., Floudas, C.A., Berger, A.J., Mulvey, J.M., 1997. Solving long-term financial planning problems via global optimization. Journal of Economic Dynamics and Control 21, 1405–1425.
Merton, R.C., 1969. Lifetime portfolio selection under uncertainty: the continuous-time case. Review of Economics and Statistics 51 (3), 247–257.
Mikosch, T., Gadrich, T., Klüppelberg, C., Adler, R.J., 1995. Parameter estimation for ARMA models with infinite variance innovations. The Annals of Statistics 23 (1), 305–326.
Mittnik, S., Paolella, M.S., Rachev, S.T., 2000. Diagnosing and treating the fat tails in financial returns data. Journal of Empirical Finance 7, 389–416.
Mittnik, S., Rachev, S.T., 1993. Modeling asset returns with alternative stable distributions. Econometric Reviews 12 (3), 261–330.
Mittnik, S., Rachev, S.T., Doganoglu, T., Chenyao, D., 1999. Maximum likelihood estimation of stable Paretian models. Mathematical and Computer Modelling 29 (10–12), 275–293.
Mittnik, S., Rachev, S.T., Paolella, M.S., 1997. Stable Paretian modeling in finance: some empirical and theoretical aspects. Unpublished manuscript. Institute of Statistics and Econometrics, Christian Albrechts University at Kiel.
Mulvey, J.M., 1994. An asset-liability investment system. Interfaces 24, 22–33.
Mulvey, J.M., 1996. Generating scenarios for the Towers Perrin investment system. Interfaces 26, 1–15.
Mulvey, J.M., 1998. Generating scenarios for global financial planning systems. International Journal of Forecasting 14, 291–298.
Mulvey, J.M., Thorlacius, A.E., 1998. The Towers Perrin global capital market scenario generation system. In: Ziemba, W.T., Mulvey, J.M. (Eds.), Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge, UK, pp. 286–312.
Mulvey, J.M., Vanderbei, R.J., Zenios, S.A., 1995. Robust optimization of large-scale systems. Operations Research 43 (2), 264–281.
Mulvey, J.M., Vladimirou, H., 1989. Stochastic network optimization models for investment planning. Annals of Operations Research 20, 187–217.
Mulvey, J.M., Vladimirou, H., 1991. Applying the progressive hedging algorithm to stochastic generalized networks. Annals of Operations Research 31, 399–424.
Mulvey, J.M., Vladimirou, H., 1992. Stochastic network programming for financial planning problems. Management Science 38 (11), 1642–1664.
Mulvey, J.M., Zenios, S.A., 1994a. Capturing correlations of fixed income instruments. Management Science 40 (10), 1329–1342.
Mulvey, J.M., Zenios, S.A., 1994b. Dynamic diversification of fixed income portfolios. Financial Analysts Journal, January–February, 30–38.
Mulvey, J.M., Ziemba, W.T., 1995. Asset and liability allocation in a global environment. In: Jarrow, R., et al. (Eds.), Handbooks in OR and MS, Vol. 9, pp. 435–463.
Mulvey, J.M., Ziemba, W.T., 1998. Asset and liability management systems for long-term investors: discussion of the issues. In: Ziemba, W.T., Mulvey, J.M. (Eds.), Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge, UK, pp. 3–38.
Oguzsoy, C.B., Guven, S., 1997. Bank asset and liability management under uncertainty. European Journal of Operational Research 102, 575–600.
Ortobelli, L., Rachev, S.T., Schwartz, E., 2002. On the problem of optimal portfolio with stable distributed returns. Unpublished manuscript. Anderson Graduate School of Business, University of California, Los Angeles.
Paulauskas, V., Rachev, S.T., 1999. Maximum likelihood estimators in regression models with infinite variance innovation. Working paper. Vilnius University, Lithuania.
Perold, A.F., Sharpe, W.F., 1988. Dynamic strategies for asset allocation. Financial Analysts Journal, January, 16–27.
Rachev, S.T., Han, S., 1999. Optimization problems in mathematical finance. Working paper. University of Karlsruhe, Germany.
Rachev, S.T., Kim, J.-R., Mittnik, S., 1999. Stable Paretian models in econometrics: Part I. Working paper. University of Karlsruhe, Germany.
Rachev, S.T., Mittnik, S., 2000. Stable Paretian Models in Finance. Wiley, New York.
Rockafellar, R.T., Wets, R.J.-B., 1991. Scenarios and policy aggregation in optimization under uncertainty. Mathematics of Operations Research 16, 119–147.
Samorodnitsky, G., Taqqu, M.S., 1994. Stable Non-Gaussian Random Processes. Chapman and Hall, New York.
Samuelson, P.A., 1969. Lifetime portfolio selection by dynamic stochastic programming. Review of Economics and Statistics 51 (3), 239–246.
Sims, C., 1980. Macroeconomics and reality. Econometrica 48 (1), 1–48.
Sweeney, J.C., Sonlin, S.M., Correnti, S., Williams, A.P., 1998. Optimal insurance asset allocation in a multicurrency environment. In: Ziemba, W.T., Mulvey, J.M. (Eds.), Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge, UK, pp. 341–368.
Tokat, Y., Rachev, S.T., Schwartz, E., 2002. The stable non-Gaussian asset allocation: a comparison with the classical Gaussian approach. Journal of Economic Dynamics and Control. Special Issue on High Performance Computing in Finance, forthcoming.
Von Neumann, J., Morgenstern, O., 1947. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ.
Wilkie, A.D., 1986. A stochastic investment model for actuarial use. Transactions of the Faculty of Actuaries 39, 391–403.
Wilkie, A.D., 1995. More on a stochastic asset model for actuarial use. Presented to the Institute of Actuaries and Faculty of Actuaries, London.
Zenios, S.A., 1993. Financial Optimization. Cambridge University Press, Cambridge, UK.
Zenios, S.A., Holmer, M.R., McKendall, R., Vassiadou-Zeniou, C., 1998. Dynamic models for fixed income portfolio management under uncertainty. Journal of Economic Dynamics and Control 22, 1517–1541.
Ziemba, W.T., 1974. Choosing investment portfolios when the returns have stable distributions. In: Hammer, P.L., Zoutendijk, G. (Eds.), North-Holland, Amsterdam, pp. 443–482.
Ziemba, W.T., Mulvey, J.M., 1998. Worldwide Asset and Liability Modeling. Cambridge University Press, Cambridge.
Chapter 14

PORTFOLIO CHOICE THEORY WITH NON-GAUSSIAN DISTRIBUTED RETURNS

SERGIO ORTOBELLI
University of Bergamo, Italy

ISABELLA HUBER
University of Karlsruhe, Germany

SVETLOZAR T. RACHEV
Department of Statistics and Applied Probability, University of California, Santa Barbara, USA
Institute of Statistics and Mathematical Economics, University of Karlsruhe, Germany
e-mail: rachev@lsoe-4.wiwi.uni-karlsruhe.de

EDUARDO S. SCHWARTZ
Anderson School of Management, University of California, Los Angeles, USA

Contents
Abstract
1. Introduction
2. Choices determined by a finite number of parameters
2.1. Portfolio choice with institutional restrictions
2.2. Portfolio choice when unlimited short sales are allowed
2.3. Relations with Ross' multi-parameter models
3. The asymptotic distributional classification of portfolio choices
3.1. The sub-Gaussian stable model
3.2. A three fund separation model in the domain of attraction of a stable law
3.3. A k + 1 fund separation model in the domain of attraction of a stable law
4. A first comparison between the normal multivariate distributional assumption and the stable sub-Gaussian one
4.1. An optimal allocation problem
4.2. Stable versus normal optimal allocation: a first comparison
5. Conclusions
Acknowledgment
Appendix A: Proofs
Appendix B: Tables
References

Handbook of Heavy Tailed Distributions in Finance, Edited by S.T. Rachev
2003 Elsevier Science B.V. All rights reserved

Abstract

This chapter discusses the parametric distributions of asset returns and proposes portfolio choice models consistent with the maximization of the expected utility. We analyze multi-parameter models to select nonstochastically dominated portfolios when short sales are allowed and when short sales are not allowed.
We also concentrate our attention on the stable distributional approach in order to derive optimal portfolios with heavy-tailed distributed financial returns. Finally, we examine and compare optimal allocations obtained with the multivariate normal model and the sub-Gaussian stable one.

1. Introduction

The purpose of this chapter is to describe the admissible classes of parametric distribution functions of portfolio returns and to analyze their consistency with the maximization of the expected utility, in order to choose the optimal portfolios. In particular, we present a general theory and a unifying framework with the following aims:
(1) Understanding the distributional approach applied to the portfolio choice theory;
(2) Studying the implications of the classical market restrictions on the portfolio distributions;
(3) Considering the asymptotic behavior of return data.
We conclude our analysis by comparing the normal multivariate approach with the sub-Gaussian approach presented here.

The theory of portfolio choice is based on the assumption that investors allocate their wealth across the available assets in order to maximize their expected utility. Markowitz (1952, 1959, 1987) and Tobin (1958, 1965) were among the first to give rigorous results approximating the portfolio selection problem in terms of the mean and the variance. Their analysis was extended to an equilibrium theory by the capital asset pricing model (CAPM) of Sharpe (1964), Lintner (1965) and Mossin (1966). In a mean–variance world the investor is concerned with only two parameters of the probability distribution of total returns on investment: the mean of the return and the variance of the return. The simplicity of the equilibrium theory and the intuitive appeal of the mean–variance analysis directed attention toward determining their generality and extensibility.
The foundation of the whole theory lies in the arbitrage pricing theory (APT) and in the stochastic dominance analysis. As a matter of fact, both theories are strictly grounded on the equilibrium theory [see, among others, Ross (1975), Dybvig and Ross (1987), Jarrow (1986)]. The arbitrage pricing theory and the fund separation theorems [see Ross (1976, 1978a)] justify and extend CAPM to multi-parameter linear models. The stochastic dominance analysis, in turn, justifies the partial consistency of the mean–variance framework with the expected utility maximization when the portfolios are elliptically distributed [see, among others, Bawa (1975), Chamberlain (1983), Owen and Rabinovitch (1983)]. In the same years and subsequently, the theory was further generalized to intertemporal finance and to consumption-based models. Since the space of feasible consumption bundles is quite generally a linear space [as Ross (1978b), Cox and Leland (1982), Rubinstein (1976) and many others have emphasized], the original dynamic problem can be replaced with an equivalent one-period problem, which has appropriate terminal state prices, if all consumption takes place at the end. More generally, if preferences are time separable and if we treat consumption at each date separately, the analysis is unchanged. For this reason, here we propose a static approach. A first generalization to an intertemporal approach can be found in Ortobelli, Rachev and Schwartz (2002).

The mean–variance theory has survived theoretical criticism and empirical rejection, such as that of Samuelson (1967, 1969) and Samuelson and Merton (1975), who have underlined the limits of the approximation given only by the mean and the variance of a portfolio. Later, Roll (1977, 1978, 1979a, b) was the first to understand clearly the weaknesses of the theory and its empirical deficiencies. Then, Dybvig and Ingersoll (1982) verified that the mean–variance pricing and the complete market hypothesis still can lead to arbitrage opportunities. They also proved that the standard mean–variance separation theorem holds in a complete market only if all investors have quadratic utility. Further, Dybvig and Ross (1982) have demonstrated that efficient sets generally are not convex and the market portfolio could be inefficient. Bawa (1976) considered the case of a market with no short sale opportunities and jointly normally distributed returns. Under these assumptions, if there are some investors with increasing nonconcave utility functions [for example, Friedman and Savage type utility functions (Friedman and Savage, 1948)], the market portfolio could be inefficient. Moreover, Dybvig and Ross (1985a) observed that, assuming symmetric information and an inefficient index, the security market line analysis can be grossly misleading, since in general efficient and inefficient portfolios can plot above and below the security market line. In another paper [see Dybvig and Ross (1985b)] they also argued that differential information disrupts the validity of the security market line analysis, since it takes us outside the domain of the mean–variance analysis.

On the other hand, the fundamental work of Mandelbrot (1963a, b, 1967), Mandelbrot and Taylor (1967) and Fama (1963, 1965a, b) has sparked considerable interest in studying the empirical distribution of financial assets. The excess kurtosis found in Mandelbrot's and Fama's investigations led them to reject the normal assumption (generally used to justify the mean–variance approach) and to propose the stable Paretian distribution as a statistical model for asset returns. Other relevant empirical studies on postwar US data have shown that the slope of the mean–standard deviation frontier or of the expected return–beta lines is much higher than reasonable risk aversion and consumption volatility estimates suggest.
This is the equity premium puzzle (Mehra and Prescott, 1985; Hansen and Jagannathan, 1991), which could be generated by one or more of the following conditions: (a) the investors are much more risk averse than the academics might have thought; (b) the stock returns of the last 50 years are due to good luck rather than an equilibrium compensation for risk; (c) something is deeply wrong with the model [see Cochrane (1999)].

The many shortcomings and conflicting results of the empirical and theoretical mean–variance analysis have been the main justification for the alternative mean–dispersion models proposed in the last decades [see Markowitz's (1959) mean–semi-variance approach, Yitzhaki (1982) and Shalit and Yitzhaki's (1984) mean–Gini portfolio theory, Dybvig's (1988a, b) distributional approach for complete markets, Speranza's (1993) and Konno and Yamazaki's (1991) mean–absolute deviation (MAD) approach, and Ogryczak and Ruszczynski's (1999, 2001) mean–semi-deviation models]. However, from a conceptual point of view, the stochastic dominance theory has shown that the variance, like any other dispersion measure, cannot always be considered a risk measure. As a matter of fact, given a lottery X with a given dispersion measure, we can always find another lottery Y which has a greater, lower or equal dispersion and is preferred to the first one by some non-satiable or risk-averse investors [see, among others, Levy (1992)]. The implicit problems to solve are then the following: "When can we use a dispersion measure as a risk measure? What relations exist between the dispersion measure and the other parameters that characterize the parametric family of portfolio distribution functions?".
To answer these questions, we recall a distributional stochastic dominance analysis of the parametric families consistent with the maximization of expected utility; see Ortobelli (2001). The distributional analysis proposed is formally different from Dybvig's (1988a) distributional one. Dybvig's model for complete markets has been applied successfully in theoretical work to value the magnitude of inefficiency of some dynamic portfolio strategies [see Dybvig (1988b)]. However, Dybvig's model is not easily applicable from an empirical point of view. In this chapter we clearly classify the multi-parameter optimization problems to be solved in order to obtain optimal portfolios. The proposed multi-parameter approach is an alternative to Ross's, and it is a unifying and generalizing extension of the classic moment analysis in portfolio choice theory [see, among others, Jean (1971), Fishburn (1980), Ingersoll (1987)]. As a matter of fact, when the portfolio returns belong to a parametric family uniquely determined by a finite number of moments, we show what kind of optimization problems we have to solve in order to find frontiers of admissible choices. As a consequence, we obtain a further delimitation of the admissible portfolio choices. We can therefore classify the distributions for which Markowitz and Tobin's mean-variance rule is an optimal selection rule in a market with and without institutional restrictions (no short sales and limited liability). Moreover, this analysis justifies the stochastic dominance properties of the mean-dispersion models in the works recalled above and of some classic mean-dispersion-skewness approaches [see, for example, Kraus and Litzenberger (1976), Simaan (1993), Ingersoll (1987)].

Second, we study the asymptotic distributional behavior of data. The generally stationary behavior of returns over time and the Central Limit Theorem and Central Pre-limit Theorem for normalized sums of i.i.d.
random variables [see Zolotarev (1986), Klebanov, Rachev and Szekely (2001), Klebanov, Rachev and Safarian (2000)] theoretically justify the stable Paretian approach proposed by Mandelbrot and Fama. Indeed, their conjecture was supported by numerous empirical investigations in subsequent years [see Mittnik, Rachev and Paolella (1997), Rachev and Mittnik (2000)]. The practical and theoretical appeal of the stable non-Gaussian approach lies in its attractive properties, which are almost the same as those of the normal one. A relevant desirable property of stable distributions is that they have a domain of attraction. Therefore, any distribution in the domain of attraction of a specified stable distribution will have properties close to those of the stable distribution. Another attractive aspect of the stable Paretian assumption is the stability property, i.e., stable distributions are stable with respect to summation of i.i.d. stable random variables. Hence, stability governs the main properties of the underlying distribution [detailed accounts of the theoretical aspects of stable distributed random variables can be found in Samorodnitsky and Taqqu (1994), Janicki and Weron (1994)].

Here, we adapt the above mentioned multi-parameter approach to portfolio choice problems using stable laws. We find an equivalent parameterization of the stable laws (in terms of some moments) that characterizes the stable laws generally used. Then, we recall three admissible fund separation models where the asset returns are in the domain of attraction of stable laws [see Ortobelli, Rachev and Schwartz (2002)].

S. Ortobelli et al.

We first consider the portfolio allocation among n α-stable sub-Gaussian distributed risky assets (with 1 < α < 2) and the riskless one. The joint stable sub-Gaussian family is an elliptical family. Hence, as argued by Owen and Rabinovitch (1983), in this case we can use a mean-dispersion analysis.
The resulting efficient frontier is formally the same as in Markowitz-Tobin's mean-variance analysis but, instead of considering the variance as a risk parameter, we have to consider the scale parameter of the stable distributions. All the stable parameters can be estimated.

In order to consider the possible asymmetry of asset returns, we describe a three-fund separation model for returns in the domain of attraction of a stable law. In the case of asymmetry, the model results from a new stable version of Simaan's model; see Simaan (1993). In the case of symmetric returns, we obtain a version of a model recently studied by Götzenberger, Rachev and Schwartz (1999), which can also be viewed as a particular version of the two-fund separation of Fama's (1965b) model. In this case too, it is possible to estimate all parameters.

The last model proposed deals with the case of optimal allocation among stable distributed portfolios with different indexes of stability. To overcome the difficulties of the most general case of the stable law, we introduce a k + 1 fund separation model. Then, we show how to express the model's multi-parameter admissible frontier.

Finally, we analyze an investment allocation problem. It consists of the maximization of the mean minus a measure of portfolio risk. We propose a mean-risk analysis that facilitates the interpretation of the results. In the allocation problem, we take as the risk measure the expected value of a power of the absolute deviation. When the power is equal to two, we obtain the classic quadratic utility functional. We examine the optimal allocation among a riskless return and 23 risky returns, and then we compare the allocations obtained under the Gaussian and the stable sub-Gaussian distributional assumptions for the risky returns. We choose a 6% annual rate as the riskless return. The model parameters are estimated using a methodology based on the moment method.
We show that there are significant differences in the allocation when the data are fitted with the stable sub-Gaussian or the normal distributions. Comparing the joint normal distribution with the joint stable sub-Gaussian law, we find that the results of the examined optimal allocation problems are substantially different. In particular, the stable market portfolio is generally less risky than the Gaussian market portfolio. This intuitive result is confirmed by the comparison of the optimal allocations when different distributional hypotheses are assumed. Therefore, investors who fit the data with stable distributions are generally more conservative than investors who fit the data with normal laws, because they take into account the component of risk due to the heavy tails.

Section 2 presents a first classification of the parametric distributions consistent with the maximization of expected utility. Section 3 analyzes the asymptotic distributional assumption. In Section 4 we compare the stable sub-Gaussian multivariate approach with the normal multivariate one. In the last section, we briefly summarize the results.

2. Choices determined by a finite number of parameters

In this section we propose a distributional analysis of the optimal portfolio choice problem among n + 1 assets: n of those assets are risky with gross returns¹ $Z = [Z_1,\dots,Z_n]'$, and the (n + 1)-th asset has risk-free gross return $Z_0$. When unlimited short selling is allowed, every portfolio of gross returns is a linear combination of the constant riskless gross return $Z_0$ and the risky returns $Z_i$, i.e.:

$$x_0 Z_0 + \sum_{i=1}^{n} x_i Z_i, \tag{1}$$

where $(x_0, x') \in \mathbb{R}^{n+1}$, $x \in \mathbb{R}^n$. Therefore, the distribution functions of all admissible investments belong to a translation and scale invariant family² determined by a finite number of parameters.
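As a small illustration of the linear combination in (1), the following Python sketch evaluates a portfolio's gross return scenario by scenario. The function name `portfolio_gross_return`, the weights and the scenario values are hypothetical and chosen only for illustration; the riskless gross return 1.06 mirrors the 6% annual rate mentioned above.

```python
def portfolio_gross_return(x0, x, z0, z):
    """Evaluate x0*Z0 + sum_i x_i*Z_i for one realization z of the risky gross returns."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    return x0 * z0 + sum(xi * zi for xi, zi in zip(x, z))


# Hypothetical data: riskless gross return and three scenarios for two risky assets.
z0 = 1.06
scenarios = [[1.10, 0.95], [0.90, 1.20], [1.05, 1.02]]

# Hypothetical weights (x0, x); with unlimited short selling they may be negative.
x0, x = 0.2, [0.5, 0.3]

portfolio = [portfolio_gross_return(x0, x, z0, z) for z in scenarios]
print(portfolio)

# A leveraged position: borrow nothing, short the risky asset, hold extra riskless asset.
short_position = portfolio_gross_return(1.5, [-0.5], z0, [1.10])
print(short_position)
```

Because the portfolio is linear in the weights, rescaling or shifting the riskless position only rescales or translates the portfolio's distribution, which is why the admissible distributions form a translation and scale invariant family.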
Assume price taker agents have preferences depending only on the probability distribution of terminal wealth. This assumption allows von Neumann-Morgenstern preferences (1953) over wealth or, more generally, Machina's preferences (1982) over wealth, but it precludes state dependent preferences. Assume that the market faced by a decision maker comes from a standard model of a perfect market (no transaction costs, taxes, asymmetric information or arbitrage opportunities, and all securities are perfectly divisible) which may not be complete. Thus, in order to classify the parametric portfolio distribution functions consistent with expected utility maximization, we distinguish and analyze the differences in portfolio allocation when: (1) institutional restrictions (no short sales, limited liability) are imposed; or (2) unlimited short selling is allowed without penalty.

2.1. Portfolio choice with institutional restrictions

When limited liability and no short sales restrictions are imposed, portfolios of gross returns (i.e., $x'Z \ge 0$ where $Z_i > 0$ and $x_i \ge 0$ for all $i$) are positive random variables. Thus, we assume that the portfolios of gross returns are positive random variables belonging to a scale invariant family, denoted by $\sigma\tau_k^{+}(a)$, that admits positive translations and has the following characteristics:

¹ Generally, we assume the standard definition of the i-th gross return between time t and time t + 1, $Z_i = (P_{t+1,i} + d_{[t,t+1],i})/P_{t,i}$, where $P_{t,i}$ is the price of the i-th asset at time t and $d_{[t,t+1],i}$ is the total amount of cash generated by the instrument between t and t + 1. We distinguish the definition of gross return (with the capital letter) from the definition of return, denoted $z_i = Z_i - 1$ (or the alternative definition of continuously compounded return $r_i = \log Z_i$).

² Recall that a parametric family $\mathcal{F}$ of distribution functions is translation invariant if, whenever the distribution $F_X(x) = P(X \le x)$ belongs to $\mathcal{F}$, then for every $t \in \mathbb{R}$, $F_{X+t} \in \mathcal{F}$ as well.
Similarly, we say that a family $\mathcal{F}$ is scale invariant if, whenever the distribution $F_X$ belongs to $\mathcal{F}$, then for every $\lambda > 0$, $F_{\lambda X}$ belongs to $\mathcal{F}$ as well.

(1) Every distribution $F_X$ belonging to $\sigma\tau_k^{+}(a)$ is associated with a positive random variable X and is identified by k parameters $(m_X, \sigma_X, a_{1,X},\dots,a_{k-2,X}) \in A \subseteq \mathbb{R}^k$, where $m_X$ is the mean of X and $\sigma_X$ is the positive scale parameter of X.³ We assume that the class $\sigma\tau_k^{+}(a)$ is weakly determined by its parametrization. That is, the equality $(m_X, \sigma_X, a_{1,X},\dots,a_{k-2,X}) = (m_Y, \sigma_Y, a_{1,Y},\dots,a_{k-2,Y})$ implies that $F_X \stackrel{d}{=} F_Y$, but the converse is not necessarily true.
(2) For every admissible real $t \ge 0$, the distribution function $F_X \in \sigma\tau_k^{+}(a)$ has the same parameters as $F_{X+t} \in \sigma\tau_k^{+}(a)$, except the mean and the dispersion measure. In particular, the map $f(t) = \sigma_{X+t}$ is a nonincreasing continuous function.
(3) For every admissible positive $\lambda$, the distribution function $F_{\lambda X} \in \sigma\tau_k^{+}(a)$ has the same parameters as the distribution $F_X$, except for the mean, which is $\lambda m_X$, and the scale parameter, which is $\lambda\sigma_X$ (where $m_X$ and $\sigma_X$ are respectively the mean and the scale parameter of the random variable X).

³ In our context we use the mean as location parameter, but the analysis can be extended to translation invariant families which do not admit a finite first moment. Moreover, we recall Pitman's seminal work (1939) on the estimation of location and scale parameters.

When portfolios belong to a $\sigma\tau_k^{+}(a)$ class, we can identify stochastic dominance relations⁴ among portfolios and the following theorem holds.

Theorem 1. Assume all random admissible portfolios of gross returns belong to a $\sigma\tau_k^{+}(a)$ class. Let $w'Z$ and $y'Z$ be a pair of portfolios determined respectively by the parameters $(m_{w'Z}, \sigma_{w'Z}, a_{1,p},\dots,a_{k-2,p})$ and $(m_{y'Z}, \sigma_{y'Z}, a_{1,p},\dots,a_{k-2,p})$. Then, the following implications hold:
(1) Suppose $m_{w'Z}/\sigma_{w'Z} = m_{y'Z}/\sigma_{y'Z}$; then $w'Z$ FSD $y'Z$ if and only if $\sigma_{w'Z} > \sigma_{y'Z}$.
(2) $m_{w'Z}/\sigma_{w'Z} \ge m_{y'Z}/\sigma_{y'Z}$ and $\sigma_{w'Z} \ge \sigma_{y'Z}$, with at least one inequality strict, implies $w'Z$ FSD $y'Z$.
⁴ Recall that the portfolio $x'Z$ first order stochastically dominates (FSD) $y'Z$ if and only if, for every increasing utility function u, $E(u(x'Z)) \ge E(u(y'Z))$ and the inequality is strict for some u. Equivalently, $x'Z$ FSD $y'Z$ if and only if $P(x'Z \le t) \le P(y'Z \le t)$ for every real t and strictly for some t. Analogously, we say that $x'Z$ second order stochastically dominates (SSD) $y'Z$ if and only if, for every increasing, concave utility function u, $E(u(x'Z)) \ge E(u(y'Z))$ and the inequality is strict for some u. Equivalently, $x'Z$ SSD $y'Z$ if and only if $\int_{-\infty}^{t} F_{x'Z}(v)\,dv \le \int_{-\infty}^{t} F_{y'Z}(v)\,dv$ for every real t and strictly for some t [see, among others, Quirk and Saposnik (1962), Fishburn (1964), Hanoch and Levy (1969), Hadar and Russel (1969)]. We also say that $x'Z$ Rothschild-Stiglitz stochastically dominates (R-S) $y'Z$ if and only if, for every concave utility function u, $E(u(x'Z)) \ge E(u(y'Z))$ and the inequality is strict for some u. Equivalently, $x'Z$ R-S $y'Z$ if and only if $E(x'Z) = E(y'Z)$ and $x'Z$ SSD $y'Z$ [see Rothschild and Stiglitz (1970)]. However, there exist many other stochastic orders used in Economics and Finance; see, among others, Levy (1992), Shaked and Shanthikumar (1994).

(3) $w'Z$ R-S $y'Z$ if and only if $m_{w'Z} = m_{y'Z}$ and $\sigma_{w'Z} < \sigma_{y'Z}$.
(4) $m_{w'Z}/\sigma_{w'Z} \ge m_{y'Z}/\sigma_{y'Z}$ and $m_{w'Z} \ge m_{y'Z}$, with at least one inequality strict, implies $w'Z$ SSD $y'Z$.
(5) $\sigma_{w'Z} \ge \sigma_{y'Z}$ and $w'Z$ SSD $y'Z$ implies $w'Z$ FSD $y'Z$.
(6) $m_{w'Z} \ge m_{y'Z}$ and $\sigma_{w'Z} \le \sigma_{y'Z}$, with at least one inequality strict, implies $w'Z$ SSD $y'Z$.

The proofs of Theorem 1 and of the next results are given in the appendix. Observe that there exist counterexamples to the converse of implications (2), (4), (5) and (6) in Theorem 1. Thus, in order to obtain the converse of these implications, we need additional hypotheses [see Ortobelli (2001)]. Theorem 1 stresses the limits of the mean-variance rule.
In fact, suppose the portfolios of gross returns (without considering the riskless gross return) belong to a $\sigma\tau_2^{+}(a)$ class uniquely determined by the mean and the variance. Then, all non-satiable investors will choose portfolio solutions of the following constrained system

$$\max_x\ x'Qx \quad \text{subject to} \quad \frac{E(x'Z)}{\sqrt{x'Qx}} = h, \quad x'e = 1, \quad x_i \ge 0,\ i = 1,\dots,n, \tag{2}$$

for some h, where $e = [1,\dots,1]'$ and Q is the variance-covariance matrix of the vector of gross returns $Z = [Z_1,\dots,Z_n]'$. Let $\bar\sigma$ be the maximum standard deviation of all admissible portfolios, and let us denote by $\bar\mu$ the mean of the portfolio of gross returns with maximum variance. As a consequence of Theorem 1 and Bawa's results (1976), when the variance-covariance matrix Q is not singular and h varies in the following interval:

$$\frac{\bar\mu}{\bar\sigma} \le h \le \sqrt{\mu_Z' Q^{-1} \mu_Z} = \max_x \frac{x'\mu_Z}{\sqrt{x'Qx}}, \tag{3}$$

where $\mu_Z$ is the mean of the vector of gross returns Z, then the solutions of optimization problem (2) describe a set that contains the efficient frontier for agents with utility functions monotonically increasing in wealth. Moreover, under our assumptions, there exists a nonempty neighborhood U of the global minimum variance portfolio $Z'Q^{-1}e/(e'Q^{-1}e)$ such that no admissible portfolio belonging to $U(Z'Q^{-1}e/(e'Q^{-1}e))$ is a solution of optimization problem (2).

With reference to the portfolio selection problem, recall that Markowitz (1952, 1987) and Tobin (1958, 1965) proposed the following selection rule for non-satiable risk averse investors: "From among a given set of investment alternatives (which includes the set of securities available in the market as well as all possible linear combinations of those basic securities), the admissible set of alternatives is obtained by discarding those investments with a lower mean and higher variance than a member of the given set". On the basis of Theorem 1, we find that Markowitz-Tobin's selection rule is not optimal for non-satiable risk averse investors.

Fig. 1. The continuous curve represents efficient portfolios for non-satiable investors considering restrictions of nonnegative wealth; - - - dominated portfolios. The class of non-satiable investors' optimal choices (which are contained in the arc ABC) is different from the class of risk averse investors' optimal choices (which are contained in the arc DEAB), even if they generally have many choices in common. Therefore, when we consider the risk averse non-satiable investors' optimal choices, we obtain the feasible optimal portfolios (which are contained in the arc AB), which are only a part of the portfolios given by the classic Markowitz and Tobin rule (arc EAB).

In this context it is necessary to underline that no-short-sale or limited liability restrictions are imposed in a market where no riskless return is allowed. As a consequence, all portfolios are random variables uniformly bounded from below.⁵ As a matter of fact, Theorem 1 cannot be extended to nonpositive random variables. Markowitz, Tobin, Bawa and many other authors overlooked this observation in their considerations using normal distributions for returns. They considered as efficient the portfolios in the upper neighborhood of the global minimum variance portfolio (EA in Figure 1), but when the portfolios are subject to this restriction they are not all efficient. Therefore, we have proved that the Markowitz and Tobin selection rule cannot be optimal even when portfolios belong to a family uniquely determined by the mean and the variance. It is well known that a lower variance does not imply a better choice for a non-satiable risk averse investor [see the example in Hanoch and Levy (1969)]. Moreover, in a suitable neighborhood of the global minimum variance portfolio, optimal portfolios for non-satiable investors do not exist. However, when riskless borrowing or lending is allowed, the mean-variance rule provides a sharper decision, which makes it possible to derive the efficient set for decision making with increasing and concave utility functions.
In fact, if a riskless asset is allowed, the global minimum dispersion portfolio is the riskless asset itself. Thus, as shown by Levy and Kroll (1976) and Kroll and Levy (1979), the classification of the efficient frontiers given by stochastic dominance analysis assumes a simpler form.

⁵ Recall that a random variable X is bounded from below (above) if there exists a real t such that $P(X \le t) = 0$ ($P(X \ge t) = 0$). Analogously, a parametric family $\mathcal{F}$ is uniformly bounded from below (above) if there exists a real t such that, for every random variable $X \in \mathcal{F}$, $P(X \le t) = 0$ ($P(X \ge t) = 0$).

Under institutional restrictions on the market (no short sales, limited liability), we can assume that the family of all admissible portfolios of gross returns $x'Z$ belongs to a scale invariant family which admits positive translations. If every distribution function $F_X$ is associated with a positive random variable X uniquely determined⁶ by $(m_X, \sigma_X, p_{1,X},\dots,p_{k-2,X})$, where $m_X$ is the mean, $\sigma_X$ is the standard deviation and

$$p_{i,X} = \frac{E\big((X - E(X))^{i+2}\big)}{\sigma_X^{i+2}}, \qquad i = 1,\dots,k-2,$$

are the first k - 2 nontrivial fundamental ratios, then the family is a particular $\sigma\tau_k^{+}(a)$. Note that for i = 1 and i = 2 the i-th fundamental ratios are respectively Pearson's asymmetry and kurtosis coefficients of the random variable X. Thus, all risk averse investors will choose non-R-S stochastically dominated portfolios among the solutions of the following constrained optimization problem

$$\min_x\ x'Qx \quad \text{subject to} \quad E(x'Z) = m, \quad x'e = 1, \quad \frac{E\big((x'Z - E(x'Z))^{i}\big)}{(x'Qx)^{i/2}} = q_i,\ i = 3,\dots,k, \quad x_j \ge 0,\ j = 1,\dots,n, \tag{4}$$

for some m and $q_i$, i = 3,...,k, where $e = [1,\dots,1]'$ and Q is the variance-covariance matrix of the vector of gross returns $Z = [Z_1,\dots,Z_n]'$.
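The fundamental ratios that enter the moment constraints of problem (4) can be estimated from data. The sketch below is only an illustration, under the assumption that sample moments (dividing by the sample size) are used as estimates of the expectations; the function name `fundamental_ratios` and the sample values are hypothetical.

```python
import math


def fundamental_ratios(sample, k):
    """Return (mean, standard deviation, [p_1, ..., p_{k-2}]) for a sample.

    p_i is the sample analogue of E((X - E(X))**(i + 2)) / sigma**(i + 2),
    so p_1 is Pearson's asymmetry coefficient and p_2 the kurtosis coefficient.
    """
    if k < 2:
        raise ValueError("k must be at least 2 (mean and scale parameter)")
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((v - mean) ** 2 for v in sample) / n
    sd = math.sqrt(variance)
    ratios = []
    for i in range(1, k - 1):  # i = 1, ..., k - 2
        central_moment = sum((v - mean) ** (i + 2) for v in sample) / n
        ratios.append(central_moment / sd ** (i + 2))
    return mean, sd, ratios


# Hypothetical sample of portfolio gross returns, symmetric around 1.0.
gross_returns = [0.9, 1.0, 1.1]
mean, sd, ratios = fundamental_ratios(gross_returns, k=4)
print(mean, sd, ratios)
```

For a symmetric sample the first ratio (asymmetry) vanishes up to rounding, and with k = 2 the list of ratios is empty, which corresponds to the pure mean-variance case.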
Moreover, all non-satiable investors will choose portfolio weights that are solutions of the following optimization problem

$$\max_x\ x'Qx \quad \text{subject to} \quad \frac{E(x'Z)}{\sqrt{x'Qx}} \ge h, \quad x'e = 1, \quad \frac{E\big((x'Z - E(x'Z))^{i}\big)}{(x'Qx)^{i/2}} = q_i,\ i = 3,\dots,k, \quad x_j \ge 0,\ j = 1,\dots,n, \tag{5}$$

for some $q_i$, i = 3,...,k, and $h \ge \bar\mu/\bar\sigma$, where $\bar\sigma$ is the maximum standard deviation of all admissible portfolios and $\bar\mu$ is the mean of that portfolio of gross returns.

⁶ Recall that a class of distributions is uniquely determined by k parameters when the equality $(m_X, \sigma_X, a_{1,X},\dots,a_{k-2,X}) = (m_Y, \sigma_Y, a_{1,Y},\dots,a_{k-2,Y})$ implies the equality of the respective distributions, i.e., $F_X \stackrel{d}{=} F_Y$, and vice versa.

Similarly, all non-satiable risk averse investors will choose portfolio weights among the solutions of the following optimization problem

$$\max_x\ E(x'Z) \quad \text{subject to} \quad \frac{E(x'Z)}{\sqrt{x'Qx}} \ge h, \quad x'e = 1, \quad \frac{E\big((x'Z - E(x'Z))^{i}\big)}{(x'Qx)^{i/2}} = q_i,\ i = 3,\dots,k, \quad x_j \ge 0,\ j = 1,\dots,n, \tag{6}$$

for some $q_i$, i = 3,...,k, and $h \ge \mu_j/\sigma_j$, where $\mu_j$ is the maximum mean of all primary gross returns and $\sigma_j$ is the standard deviation of that return. We obtain optimization problems analogous to (4), (5) and (6) when we consider the riskless asset. In this case, the mean is given by $E(x'Z) + (1 - x'e)Z_0$ and we require that $0 \le x'e \le 1$ instead of requiring $x'e = 1$.

Theorem 1 is stated for positive random variables. However, the above results can be generalized to families of random variables which are uniformly bounded from below. In fact, without loss of generality, we can consider a translation that makes all random variables positive.

2.2. Portfolio choice when unlimited short sales are allowed

In the last fifty years, researchers in portfolio choice theory have often used unbounded random variables for portfolios of returns, typically the Gaussian laws. They have also studied continuously compounded portfolios of returns, say $x'r = \sum_{i=1}^{n} x_i \log Z_i$, where⁷ $r_i = \log Z_i$.
In particular, we assume that the distribution functions of portfolios belong to a translation and scale invariant family, denoted by $\sigma\tau_k(a)$, with the following characteristics:

(1) Every distribution $F_X$ belonging to $\sigma\tau_k(a)$ is identified by k parameters $(m_X, \sigma_X, a_{1,X},\dots,a_{k-2,X}) \in A \subseteq \mathbb{R}^k$, where $m_X$ is the mean of X and $\sigma_X$ is the positive scale parameter of X. We assume that the class $\sigma\tau_k(a)$ is weakly determined by its parameterization. That is, the equality $(m_X, \sigma_X, a_{1,X},\dots,a_{k-2,X}) = (m_Y, \sigma_Y, a_{1,Y},\dots,a_{k-2,Y})$ implies that $F_X \stackrel{d}{=} F_Y$, but the converse is not necessarily true.
(2) For every admissible real t, the distribution function $F_X \in \sigma\tau_k(a)$ has the same parameters, except the mean, as $F_{X+t} \in \sigma\tau_k(a)$ (the translate of $F_X$).
(3) For every admissible positive $\lambda$, the distribution function $F_{\lambda X} \in \sigma\tau_k(a)$ has the same parameters as the distribution $F_X \in \sigma\tau_k(a)$, except for the mean, which is $\lambda m_X$, and the scale parameter, which is $\lambda\sigma_X$ (where $m_X$ and $\sigma_X$ are respectively the mean and the scale parameter of the random variable X).

⁷ The continuously compounded portfolio of returns $x'r$ represents an approximation to the portfolio of returns $x'z$ (i.e., $\sum_{i=1}^{n} x_i \log Z_i \approx x'z$, where $z_i = Z_i - 1$). Thus, continuously compounded portfolios of returns $x'r$ are equivalently identified with, and called, portfolios of returns. However, observe that X FSD Y if and only if $\log X$ FSD $\log Y$, while X SSD Y implies $\log X$ SSD $\log Y$ but the converse is not necessarily true (a simple counterexample can be found within the log-normal class). Hence, when we study the optimal choices by considering the approximation $\sum_{i=1}^{n} x_i \log Z_i \approx x'z$, we find a set of choices that is closer to the efficient set the better the approximation is. (The approximation is good enough when we consider daily or weekly data in the empirical analysis.)
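The quality of the approximation $\sum_i x_i \log Z_i \approx x'z$ used for continuously compounded portfolios of returns can be checked numerically. The following Python sketch is only an illustration: the function names, the weights and the "daily-sized" and "annual-sized" gross returns are hypothetical values chosen to show how the gap grows with the size of the returns.

```python
import math


def cc_portfolio_return(x, gross):
    """Continuously compounded portfolio return: sum_i x_i * log(Z_i)."""
    return sum(xi * math.log(zi) for xi, zi in zip(x, gross))


def simple_portfolio_return(x, gross):
    """Portfolio of returns x'z with z_i = Z_i - 1."""
    return sum(xi * (zi - 1.0) for xi, zi in zip(x, gross))


x = [0.5, 0.3, 0.2]                   # hypothetical nonnegative weights
daily_gross = [1.004, 0.997, 1.001]   # hypothetical small (daily-sized) moves
annual_gross = [1.30, 0.70, 1.10]     # hypothetical large (annual-sized) moves

daily_gap = simple_portfolio_return(x, daily_gross) - cc_portfolio_return(x, daily_gross)
annual_gap = simple_portfolio_return(x, annual_gross) - cc_portfolio_return(x, annual_gross)
print(daily_gap, annual_gap)
```

Since $\log(1 + z) \le z$, the gap is nonnegative for nonnegative weights; it is negligible for small moves and material for large ones, which is consistent with restricting the approximation to daily or weekly data.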
The random variables associated with the distribution functions of a $\sigma\tau_k(a)$ class are not uniformly bounded from below, because every $\sigma\tau_k(a)$ class is translation invariant. When portfolios belong to a $\sigma\tau_k(a)$ class, we can identify a stochastic dominance relation among portfolios unbounded from below and the following theorem holds.

Theorem 2. Suppose the distribution functions of all random portfolios belong to the same class $\sigma\tau_k(a)$. Let $w'r$ and $y'r$ be a pair of random portfolios unbounded from below, determined respectively by the parameters $(m_{w'r}, \sigma_{w'r}, a_{1,p},\dots,a_{k-2,p})$ and $(m_{y'r}, \sigma_{y'r}, a_{1,p},\dots,a_{k-2,p})$. Then, the following properties are equivalent:
(1) $E(w'r) \ge E(y'r)$ and $\sigma_{w'r} \le \sigma_{y'r}$, with at least one inequality strict.
(2) $w'r$ SSD $y'r$ and $y'r \stackrel{d}{=} w'r - (E(w'r) - E(y'r)) + \varepsilon$, with $E(\varepsilon \mid w'r) = 0$.

By Theorem 2, when all portfolios are random variables unbounded from below and their distribution functions belong to a $\sigma\tau_2(a)$ class, two portfolios X and Y such that $\sigma_X > \sigma_Y$ and X SSD Y cannot exist. On the contrary, when the random portfolios considered in Theorem 2 are random variables bounded from below, we need further assumptions to obtain the above equivalence [see Ortobelli (2001)]. It follows from Theorem 2 that, when all portfolios are unbounded random variables belonging to a $\sigma\tau_k(a)$ class, it is easier to characterize their stochastic dominance properties. In this sense, the continuously compounded portfolios of returns $x'r$ are natural candidates for a simpler stochastic dominance analysis.

Samuelson (1969) and Samuelson and Merton (1975) were among the first to investigate the conditions for the mean-variance criterion to provide an approximate optimum. Chamberlain (1983) has shown that, when the riskless return is allowed, the families of elliptical distributions with finite variance are necessary and sufficient for the expected utility of final wealth to be a function only of the mean and the variance. Hence, when the portfolios are unbounded random variables⁸ with distribution functions belonging to the same elliptical distribution family having finite variance, we can use Markowitz and Tobin's rule to identify the optimal portfolios. Similarly, assuming that: (a) there is no riskless asset; (b) the portfolios of returns are unbounded random variables; (c) the last n - 1 components of the return random vector are elliptically distributed (with finite variance) conditional on the first component, which has an arbitrary distribution with finite variance [see Chamberlain (1983)]; then Markowitz and Tobin's rule can be used to identify the optimal portfolios. Thus, Theorems 1 and 2 underline a further limitation (point (b) above) of the previous studies on this issue.

We can now find optimal portfolios when all returns are unbounded random variables uniquely determined by a finite number of moments. Thus, if short sales are allowed, all risk averse investors will choose non-R-S stochastically dominated portfolios that are solutions of the constrained optimization problem (4) without the constraint $x_j \ge 0$, j = 1,...,n. Similarly, we obtain optimal solutions for non-satiable investors by maximizing the mean for some fixed central moments.

2.3. Relations with Ross' multi-parameter models

Consider the problem of optimal allocation among n + 1 assets: n of those assets are risky with non-redundant returns $r = [r_1,\dots,r_n]'$, and the (n + 1)-th asset is risk-free with return $z_0$. We are interested in the cases of portfolio distributions belonging to a $\sigma\tau_k(a)$ family with k < n. As argued by Ross (1978a), in order to reduce the variables of the portfolio choice problem, we have to assume some restrictions on the vector $\varepsilon = (\varepsilon_1, \varepsilon_2,\dots,\varepsilon_n)'$ in the following representation of the returns:

$$r_i = \sum_{p=1}^{q} b_{i,p}\big(Y_p - E(Y_p)\big) + \varepsilon_i, \qquad i = 1,\dots,n, \tag{7}$$

where $Y_p$ and $\varepsilon_i$ are random variables and $b_{i,p}$ are scalars. Differently from Ross, we propose to study the case where all random variables $x'\varepsilon$ belong to a $\sigma\tau_s(a)$ family.
Then, the scale parameter $\sigma_{x'\varepsilon}$ of the random variable $x'\varepsilon$ has to verify the properties relative to the $\sigma\tau_s(a)$ class. Thus, consider the parameterization given by: (1) the parameters of the $\sigma\tau_s(a)$ family, and (2) the parameters $c_j = x'b_{\cdot,j}/\sigma_{x'\varepsilon}$, for j = 1,...,q. This parameterization verifies the properties of a $\sigma\tau_k(a)$ family with k = s + q. In fact, for every positive real $\lambda$ the parameters of $\lambda x'(r - \mu)$ do not change except for the scale parameter, which becomes $\lambda\sigma_{x'\varepsilon}$. Then, all portfolios belong to a $\sigma\tau_k(a)$ family with k = s + q when the returns admit the form (7) and all admissible random variables $x'\varepsilon$ belong to a $\sigma\tau_s(a)$ family. Typical examples of this approach are the stable Paretian models presented in the next section.

⁸ The elliptical families with finite variance are symmetric around the mean and are not necessarily associated with unbounded random variables; see Ingersoll (1987), Owen and Rabinovitch (1983). Hence, following Theorem 2, we need to specify when elliptically distributed random variables have to be unbounded.

Note that the moment analysis proposed above generalizes many of the three moment models presented in recent decades [see, for example, Kraus and Litzenberger (1976), Ingersoll (1987), Simaan (1993)]. Moreover, we do not need to require that portfolio returns verify the fund separation conditions, as was the case in the three moment models. Therefore, the above theorems represent a first classification of portfolio distribution functions which is an alternative to those proposed by Ross (1976, 1978a). In fact, we underline the following differences from Ross' models:
(1) We express necessary and sufficient conditions to identify optimal portfolios. We can derive the efficient frontiers by solving a constrained optimization problem.
(2) We do not require the closure of the random law under addition.
(3) The above theorems are a unifying and generalizing extension of moment analysis in portfolio selection theory. In particular, the previous analysis describes further restrictions on using Markowitz and Tobin's selection rule as an optimal portfolio selection rule.
(4) We express a portfolio choice theory dependent on a finite number of parameters and consistent with expected utility maximization. We do not specify which parameters identify the distribution functions of asset returns. We only require very general properties which determine the existence of a scale parameter and a shift parameter.
(5) The above results can be applied to every economic choice under uncertainty when the distribution functions are weakly determined by a finite number of parameters and verify the properties of the $\sigma\tau_k(a)$ or $\sigma\tau_k^{+}(a)$ classes.

Besides, this classification of choices under uncertainty implies a first classification of the admissible dispersion measures [see Ortobelli (2001), Giacometti and Ortobelli (2001)]. As follows from the previous considerations, the models introduced here can be theoretically improved and empirically tested. However, a more general theoretical and empirical analysis, with further discussion, study and comparison of the above models, is beyond the scope of this chapter and will be the subject of future research.

3. The asymptotic distributional classification of portfolio choices

In this section we study the portfolio choice problem by analyzing the asymptotic behavior of data. In particular, we consider unbounded random portfolios of stable distributed returns, $x'r$, which, with an abuse of notation, we continue to call portfolios of stable distributed returns.⁹

⁹ If $\log Z_i$ is stable distributed, then $Z_i = 1 + z_i$ is log-stable distributed.
The recent crashes observed in the stock market showed that stock returns are more volatile than predicted by models with finite variance of the asset returns. In the empirical financial literature, it is well documented that asset returns have a distribution whose tail is heavier than that of distributions with finite variance, i.e.,

\[ P(|r_i| > x) \approx x^{-\alpha_i} L_i(x) \quad \text{as } x \to \infty, \tag{8} \]

where $0 < \alpha_i < 2$ and $L_i(x)$ is a slowly varying function at infinity, i.e., $\lim_{x \to \infty} L_i(cx)/L_i(x) = 1$ for all $c > 0$; see Rachev and Mittnik (2000) and the references therein. In particular, in the data observed until now $1 < \alpha_i < 2$. The constraint $1 < \alpha_i < 2$ and relation (8) imply that the returns $r_i$ have finite mean and infinite variance. The tail condition in (8) also implies that the vector of returns $r = [r_1,\dots,r_n]'$ is in the domain of attraction of an $(\alpha_1,\dots,\alpha_n)$-stable law. That is, given $T$ i.i.d. (independent and identically distributed) observations on $r$, namely $r^{(t)} = [r_1^{(t)},\dots,r_n^{(t)}]'$, $t = 1,2,\dots,T$, there exist normalizing constants $a^{(T)} = [a_1^{(T)},\dots,a_n^{(T)}]' \in R^n_{+}$ and $b^{(T)} = [b_1^{(T)},\dots,b_n^{(T)}]' \in R^n$ such that

\[ \left( \sum_{i=1}^{T} \frac{r_1^{(i)}}{a_1^{(T)}} + b_1^{(T)}, \dots, \sum_{i=1}^{T} \frac{r_n^{(i)}}{a_n^{(T)}} + b_n^{(T)} \right) \xrightarrow{d} S(\alpha_1,\dots,\alpha_n) \quad \text{as } T \to \infty, \tag{9} \]

where $S(\alpha_1,\dots,\alpha_n)$ is an $(\alpha_1,\dots,\alpha_n)$-stable random variable. This convergence result is a consequence of the stationary behavior of returns and of the Central Limit Theorem for normalized sums of i.i.d. random variables, which determines the domain of attraction of each stable law [see Zolotarev (1986)]. Therefore, any distribution in the domain of attraction of a specified stable distribution will have properties close to those of the stable distribution. The constants $a_j^{(T)}$ in (9) have the form $a_j^{(T)} = T^{1/\alpha_j} L_j(T)$, where $L_j(T)$ are slowly varying functions as $T \to \infty$.

Each component of $S(\alpha_1,\dots,\alpha_n) = (s_1,\dots,s_n)$ has a Pareto-Lévy stable distribution, i.e., its characteristic function is given by

\[ \Phi_j(t) = \begin{cases} \exp\left\{ -\sigma_j^{\alpha_j} |t|^{\alpha_j} \left( 1 - i\beta_j\, \mathrm{sgn}(t) \tan\frac{\pi\alpha_j}{2} \right) + i\mu_j t \right\} & \text{if } \alpha_j \neq 1, \\ \exp\left\{ -\sigma_j |t| \left( 1 + i\beta_j \frac{2}{\pi}\, \mathrm{sgn}(t) \log|t| \right) + i\mu_j t \right\} & \text{if } \alpha_j = 1, \end{cases} \tag{10} \]

where $\alpha_j \in (0,2)$ is the so-called stable (tail) index of $s_j$, $\sigma_j > 0$ is the scale (or dispersion) parameter, $\beta_j \in [-1,1]$ is a skewness parameter and $\mu_j$ is a location parameter. Moreover, for every fixed $\alpha$, the Pareto-Lévy $\alpha$-stable law is a $\sigma\tau_3(a)$ class. When $\alpha_j > 1$ the location parameter $\mu_j$ is the mean.

However, there is considerable debate in the literature concerning the applicability of $\alpha$-stable distributions as they appear in Lévy's central limit theorems. A serious drawback of Lévy's approach is that in practice one can never know whether the underlying distribution is heavy tailed, or just has a long but truncated tail. Limit theorems for stable laws are not robust with respect to truncation of the tail or with respect to any change from light to heavy tail, or conversely. Based on finite samples, one can never justify the specification of a particular tail behavior. Hence, one cannot justify the applicability of classical limit theorems in probability theory. Therefore, instead of relying on limit theorems, we can use the so-called pre-limit theorem, which provides an approximation for distribution functions when the number of observations $T$ is "large" but not too "large" [see Klebanov, Rachev and Szekely (2001), Klebanov, Rachev and Safarian (2000)]. In particular, the "pre-limiting" approach helps to overcome the drawback of Lévy-type central limit theorems. As a matter of fact, we can assume that returns are bounded "far away"; say, daily returns cannot be outside the interval $[-0.5, 0.5]$. Thus, considering the empirical observations on asset returns, we can assume that the asset returns $r_i$ are truncated $\alpha_i$-stable distributed with support $[-0.5, 0.5]$.
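To make the truncation assumption concrete, the following minimal sketch draws symmetric $\alpha$-stable variates and keeps only those inside $[-0.5, 0.5]$. It is an illustration under stated assumptions, not the chapter's procedure: the Chambers-Mallows-Stuck transformation is a standard simulation device chosen here for convenience, the rejection step is one simple way to truncate, and the values `alpha=1.7` and `scale=0.01` are hypothetical.

```python
import math
import random


def symmetric_stable(alpha, scale, rng):
    """Draw one symmetric alpha-stable variate (skewness 0, location 0)
    using the Chambers-Mallows-Stuck transformation."""
    u = rng.uniform(-math.pi / 2.0, math.pi / 2.0)  # uniform angle
    w = rng.expovariate(1.0)                        # unit exponential
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def truncated_stable_sample(n, alpha, scale, bound, seed=0):
    """Rejection sampling: keep only draws with |x| <= bound."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = symmetric_stable(alpha, scale, rng)
        if abs(x) <= bound:
            out.append(x)
    return out


# Hypothetical daily-return parameters, chosen only for illustration.
sample = truncated_stable_sample(5000, alpha=1.7, scale=0.01, bound=0.5)
print(min(sample), max(sample))
```

With a daily scale of this size almost no draws are rejected, which is the point of the pre-limit argument: the truncation is far out in the tail and barely changes the bulk of the distribution.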
Even if the returns are eventually attracted by the CLT to the Gaussian law, pre-limit theorems show that, for any reasonable $T$, the truncated stable laws are attracted to the stable laws. Therefore, it is plausible to assume that the vector of returns $r = [r_1,\dots,r_n]'$ is in the domain of attraction of an $n$-dimensional $(\alpha_1,\dots,\alpha_n)$-stable law.

In order to express a multi-parameter choice in portfolio selection theory that is coherent with the empirical evidence and consistent with expected utility maximization, we need the following asymptotic distributional assumptions:

(1) (Heavy tailedness assumption) Portfolios $x'r$ are unbounded random variables belonging to $L^p$ with $1 < p \le 2$, and the return vector $r = [r_1,\dots,r_n]'$ is in the domain of attraction of an $(\alpha_1,\dots,\alpha_n)$-stable law ($1 < \alpha_i \le 2$, $i = 1,\dots,n$). The assumption $1 < \alpha_i \le 2$ is supported by a growing body of empirical results, as shown by Mandelbrot (1963a, b, 1967), Fama (1963, 1965a, b), Mittnik, Rachev and Paolella (1997), Rachev and Mittnik (2000).

(2) (Consistency with the expected utility maximization) The distributions of the portfolio returns $x'r$ belong to the same $\sigma\tau_k(a)$ class of distribution functions.

Under these assumptions, as for Theorem 2, we obtain an admissible frontier for non-satiable investors and for non-satiable risk-averse investors. A simpler way to express the asymptotic behavior of data consists in considering every portfolio in the domain of attraction of a Pareto-Lévy stable distribution with $\alpha > 1$. Given that, we implicitly assume that all optimal choices are identified by the four parameters of the underlying stable law. Therefore, every portfolio $x'r$ can be well approximated by a stable distribution, i.e., we can assume:

\[ x'r + (1 - x'e)z_0 \stackrel{d}{=} S_{\alpha(x)}\big(\sigma(x), \beta(x), \mu(x)\big), \tag{11} \]

where $z_0$ is the riskless return, $\alpha(x) \in (\min_{1 \le i \le n} \alpha_i, 2)$ is the index of stability, $\alpha_j > 1$ is the index of stability of the $j$-th asset return, $\sigma(x)$ is the scale parameter, $\mu(x) = x'E(r) + (1 - x'e)z_0$ is the mean and $\beta(x)$ is the skewness parameter.
The properties of the $\sigma\tau_4(a)$ class are verified with this parameterization, so according to Theorem 2 every risk-averse investor will choose a portfolio weight that solves the following constrained problem

\[ \min_x \sigma(x) \quad \text{subject to} \quad x'E(r) + (1 - x'e)z_0 = m, \quad \alpha(x) = \alpha^{*}, \quad \beta(x) = \beta^{*} \tag{12} \]

for some $m$, $\alpha^{*}$, $\beta^{*}$. In this case, we are not able to find a closed form of the efficient frontier because we do not know a priori the joint distribution of the asset returns. In order to overcome this problem, we could consider another admissible parameterization of the stable distribution for problem (11). For example, we can prove that the mean $\mu(x) = x'E(r) + (1 - x'e)z_0$, the scale parameter $s(x) = E(|x'r - x'E(r)|)$ and the fundamental ratios

\[ \rho_1(x) = \frac{E(|x'r - x'E(r)|^{q_1})}{(s(x))^{q_1}} \quad \text{and} \quad \rho_2(x) = \frac{E\big((x'r - x'E(r))^{\langle q_2 \rangle}\big)}{(s(x))^{q_2}}, \]

where $q_1, q_2 \in (1, \min_{1 \le i \le n} \alpha_i)$, represent a parameterization which verifies the properties of the $\sigma\tau_4(a)$ class.10 In fact, first observe that $\rho_1(x)$ and $\rho_2(x)$ do not depend on $\mu(x)$ and $\sigma(x)$ because

\[ |x'r - x'E(r)|^{q_1} \stackrel{d}{=} \sigma(x)^{q_1} \big| S_{\alpha(x)}(1, \beta(x), 0) \big|^{q_1}, \]

and also

\[ (x'r - x'E(r))^{\langle q_2 \rangle} \stackrel{d}{=} \sigma(x)^{q_2} \big( S_{\alpha(x)}(1, \beta(x), 0) \big)^{\langle q_2 \rangle}. \]

10 The notation $x^{\langle t \rangle}$ stands for $\mathrm{sgn}(x)|x|^{t}$.

Thus, as a consequence of Property 1.2.17 in Samorodnitsky and Taqqu (1994),

\[ \rho_1(x) = \frac{E(|x'r - x'E(r)|^{q_1})}{(s(x))^{q_1}} = K\, \frac{\Gamma\big(1 - q_1/\alpha(x)\big) \cos\big( \arctan(\beta(x) \tan(\pi\alpha(x)/2))\, q_1/\alpha(x) \big)}{\Big( \Gamma\big(1 - 1/\alpha(x)\big) \cos\big( \arctan(\beta(x) \tan(\pi\alpha(x)/2))\, /\alpha(x) \big) \Big)^{q_1}}, \]

where $K$ is a constant that depends only on $q_1$. Hence, for every $q_1 \in (1, \min_{1 \le i \le n} \alpha_i)$ and for every fixed $\beta(x)$, $\rho_1(x)$ is a decreasing function of $\alpha(x)$ on the existence interval. Moreover, $\rho_1(x)$ is an even function of $\beta(x)$ and it decreases in $|\beta(x)|$ for fixed $\alpha(x) \in (\min_{1 \le i \le n} \alpha_i, 2)$. Instead, $\rho_2(x)$ is an increasing odd function of $\beta(x)$ for every $q_2 \in (1, \min_{1 \le i \le n} \alpha_i)$ and for every fixed $\alpha(x) \in (\min_{1 \le i \le n} \alpha_i, 2)$. These relations imply that $\rho_1(x)$ and $\rho_2(x)$ uniquely determine $\alpha(x)$ and $\beta(x)$.
Then, under assumption (11), every risk-averse investor will choose a portfolio weight that solves the following constrained problem

\[ \min_x E\big(|x'r - x'E(r)|\big) \quad \text{subject to} \quad x'E(r) + (1 - x'e)z_0 = m, \quad \frac{E(|x'r - x'E(r)|^{q_1})}{(s(x))^{q_1}} = \rho_1^{*}, \quad \frac{E\big((x'r - x'E(r))^{\langle q_2 \rangle}\big)}{(s(x))^{q_2}} = \rho_2^{*} \tag{13} \]

for some $m$, $\rho_1^{*}$, $\rho_2^{*}$. Unlike problem (12), problem (13) does not require knowledge of the joint distribution of asset returns, but it is still computationally too complex. Generally, in order to identify the efficient frontier and reduce the number of parameters, we assume that $\alpha_i = \alpha$ for all $i = 1,\dots,n$. Observe that stable distributions are stable with respect to summation of i.i.d. stable random variables, and the vector of returns $r = [r_1,\dots,r_n]'$ is $\alpha$-stable distributed with $\alpha > 1$ if and only if all linear combinations are stable [see Samorodnitsky and Taqqu (1994, Theorems 2.1.2 and 2.1.5)]. In this case the joint characteristic function of returns is given by

\[ \Phi_r(t) = \exp\left\{ -\int_{S_n} |t's|^{\alpha} \left( 1 - i\, \mathrm{sgn}(t's) \tan\frac{\pi\alpha}{2} \right) \Gamma(ds) + i t'\mu \right\}, \]

where $\alpha$ is the index of stability and $\Gamma(ds)$ is the spectral measure concentrated on $S_n = \{ s \in R^n \mid \|s\| = 1 \}$.

Thus, when the vector of returns is stable distributed (with $\alpha > 1$), every portfolio $x'r + (1 - x'e)z_0$ (except the riskless return, i.e., $x = 0$) is distributed as

\[ x'r + (1 - x'e)z_0 \stackrel{d}{=} S_{\alpha}\big(\sigma(x), \beta(x), \mu(x)\big), \]

where

\[ \mu(x) = x'E(r) + (1 - x'e)z_0, \quad \sigma(x) = \left( \int_{S_n} |x's|^{\alpha}\, \Gamma(ds) \right)^{1/\alpha} \quad \text{and} \quad \beta(x) = \frac{\int_{S_n} |x's|^{\alpha}\, \mathrm{sgn}(x's)\, \Gamma(ds)}{(\sigma(x))^{\alpha}} \]

are respectively the mean, the scale parameter and the skewness parameter of the portfolio $x'r + (1 - x'e)z_0$. Under this distributional assumption, every risk-averse investor will choose a portfolio weight that solves the following constrained problem

\[ \min_x \sigma(x) \quad \text{subject to} \quad x'E(r) + (1 - x'e)z_0 = m, \quad \beta(x) = \beta^{*} \tag{14} \]

for some $m$ and $\beta^{*}$.
In order to determine estimates of the scale parameter and of the skewness parameter, we can consider the tail estimator for the index of stability $\alpha$ and the estimator for the spectral measure $\Gamma(ds)$ proposed by Rachev and Xin (1993) and Cheng and Rachev (1995). However, even if the estimates of the scale parameter and the skewness parameter are computationally feasible, they require numerical calculations. Thus, model (14) is not easy to apply empirically. Similarly to problem (13), we can fix $q_2 < \alpha$ and propose a different representation based on moment-type constraints. Therefore, instead of model (14), we obtain the following constrained problem

\[ \min_x E\big(|x'r - x'E(r)|\big) \quad \text{subject to} \quad x'E(r) + (1 - x'e)z_0 = m, \quad \frac{E\big((x'r - x'E(r))^{\langle q_2 \rangle}\big)}{\big(E(|x'r - x'E(r)|)\big)^{q_2}} = \rho_2^{*} \tag{15} \]

for some $m$ and $\rho_2^{*}$. Optimization problems (15) and (13) can be used in a more general setting than optimization problems (12) and (14). In fact, a priori there could exist other classes of distribution functions for returns (not only stable distributions) that are uniquely determined by the parameters $m(x)$, $s(x)$, $\rho_1(x)$ and $\rho_2(x)$. Next, in order to overcome the intrinsic difficulties of problems (12)-(14) and (15), we analyze different fund separation models that incorporate the asymptotic distributional assumption.

3.1. The sub-Gaussian stable model

Assume the vector of returns $r = [r_1,\dots,r_n]'$ is sub-Gaussian $\alpha$-stable distributed with $1 < \alpha < 2$. Then, the characteristic function of $r$ has the following form

\[ \Phi_r(t) = E\big(\exp(i t'r)\big) = \exp\big\{ -(t'Qt)^{\alpha/2} + i t'\mu \big\}, \tag{16} \]

where $Q = [R_{i,j}/2]$ is a positive definite $(n \times n)$-matrix, $\mu = E(r)$ is the mean vector, and $\Gamma(ds)$ is the spectral measure with support concentrated on $S_n = \{ s \in R^n \mid \|s\| = 1 \}$.
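The term $R_{i,j}$ is defined by

\[ \frac{R_{i,j}}{2} = [\tilde r_i, \tilde r_j]_{\alpha}\, \|\tilde r_j\|_{\alpha}^{2-\alpha}, \tag{17} \]

where $\tilde r_j = r_j - \mu_j$ are the centralized returns, and the covariation $[\tilde r_i, \tilde r_j]_{\alpha}$ between two jointly symmetric stable random variables $\tilde r_i$ and $\tilde r_j$ is given by

\[ [\tilde r_i, \tilde r_j]_{\alpha} = \int_{S_2} s_i |s_j|^{\alpha-1}\, \mathrm{sgn}(s_j)\, \Gamma(ds); \]

in particular, $\|\tilde r_j\|_{\alpha} = \big( \int_{S_2} |s_j|^{\alpha}\, \Gamma(ds) \big)^{1/\alpha} = ([\tilde r_j, \tilde r_j]_{\alpha})^{1/\alpha}$. Here the spectral measure $\Gamma(ds)$ has support on the unit circle $S_2$.

This model can be considered as a special case of Owen-Rabinovitch's elliptical model [see Owen and Rabinovitch (1983)]. However, no estimation procedure for the model parameters is given in the elliptical models with non-finite variance. In our approach we use (16) and (17) to provide a statistical estimator of the stable efficient frontier. To estimate the efficient frontier for returns given by (16), we need an estimator for the mean vector $\mu$ and an estimator for the dispersion matrix $Q$. The estimator of $\mu$ is given by the vector $\hat\mu$ of sample averages. Using Lemma 2.7.16 in Samorodnitsky and Taqqu (1994) we can write, for every $p$ such that $1 < p < \alpha$,

\[ \frac{[\tilde r_i, \tilde r_j]_{\alpha}}{\|\tilde r_j\|_{\alpha}^{\alpha}} = \frac{E\big(\tilde r_i\, \tilde r_j^{\langle p-1 \rangle}\big)}{E(|\tilde r_j|^{p})}, \tag{18} \]

where the scale parameter $\sigma_j$ can be written $\|\tilde r_j\|_{\alpha} = \sigma_j$. It can be approximated by the moment method suggested by Samorodnitsky and Taqqu (1994) (Property 1.2.17) in the case $\beta = 0$:

\[ \sigma_j^{p} = \|\tilde r_j\|_{\alpha}^{p} = E(|\tilde r_j|^{p})\, \frac{p \int_0^{+\infty} u^{-p-1} \sin^2 u\, du}{2^{p-1}\, \Gamma(1 - p/\alpha)}. \tag{19} \]

It follows that

\[ \frac{R_{i,j}}{2} = \sigma_j^{2}\, \frac{E\big(\tilde r_i\, \tilde r_j^{\langle p-1 \rangle}\big)}{E(|\tilde r_j|^{p})} = \sigma_j^{2-p}\, \frac{p \int_0^{+\infty} u^{-p-1} \sin^2 u\, du}{2^{p-1}\, \Gamma(1 - p/\alpha)}\, E\big(\tilde r_i\, \tilde r_j^{\langle p-1 \rangle}\big). \]

The above suggests the following estimator $\hat Q = [\hat R_{i,j}/2]$ for the entries of the unknown covariation matrix $Q$:

\[ \frac{\hat R_{i,j}}{2} = \hat\sigma_j^{2-p}\, \frac{p \int_0^{+\infty} u^{-p-1} \sin^2 u\, du}{2^{p-1}\, \Gamma(1 - p/\alpha)}\, \frac{1}{N} \sum_{k=1}^{N} \tilde r_i^{(k)} \big( \tilde r_j^{(k)} \big)^{\langle p-1 \rangle}, \tag{20} \]

where $\sigma_j^{2}$ is estimated as follows

\[ \hat\sigma_j^{2} = \frac{\hat R_{j,j}}{2} = \left( \frac{1}{N} \sum_{k=1}^{N} |\tilde r_j^{(k)}|^{p}\, \frac{p \int_0^{+\infty} u^{-p-1} \sin^2 u\, du}{2^{p-1}\, \Gamma(1 - p/\alpha)} \right)^{2/p}. \tag{21} \]

The moment estimator makes sense for each fixed $p \in (1, \alpha)$.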
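The estimators (20) and (21) can be sketched in a few lines. The code below is an illustrative reading of those formulas, not the authors' implementation. It evaluates the integral with the closed form $\int_0^{\infty} u^{-p-1}\sin^2 u\,du = 2^{p-2}\pi / (\Gamma(1+p)\sin(\pi p/2))$, valid for $0 < p < 2$, which is a substitution made here for convenience. The function names and the synthetic data are hypothetical; the data are Gaussian, used only as a sanity check in the limiting case $\alpha = 2$, where the dispersion matrix should be close to half the covariance matrix.

```python
import math
import random


def moment_constant(alpha, p):
    """p * int_0^inf u^(-p-1) sin^2(u) du / (2^(p-1) * Gamma(1 - p/alpha))."""
    integral = (2.0 ** (p - 2.0) * math.pi
                / (math.gamma(1.0 + p) * math.sin(math.pi * p / 2.0)))
    return p * integral / (2.0 ** (p - 1.0) * math.gamma(1.0 - p / alpha))


def signed_power(x, t):
    """x^<t> = sgn(x) * |x|^t."""
    return math.copysign(abs(x) ** t, x)


def dispersion_matrix(returns, alpha, p):
    """returns: list of N observations, each a list of n asset returns."""
    n_obs, n = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / n_obs for j in range(n)]
    centred = [[row[j] - means[j] for j in range(n)] for row in returns]
    a_p = moment_constant(alpha, p)
    # Equation (21): squared scale parameter of each asset.
    sigma2 = [(a_p * sum(abs(row[j]) ** p for row in centred) / n_obs) ** (2.0 / p)
              for j in range(n)]
    # Equation (20): entry (i, j) of the estimated dispersion matrix.
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cross = sum(row[i] * signed_power(row[j], p - 1.0)
                        for row in centred) / n_obs
            q[i][j] = sigma2[j] ** ((2.0 - p) / 2.0) * a_p * cross
    return q, sigma2


# Synthetic Gaussian data with covariance [[1, 0.5], [0.5, 1]] (assumption).
rng = random.Random(1)
data = []
for _ in range(20000):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    data.append([g1, 0.5 * g1 + math.sqrt(0.75) * g2])
q_hat, sigma2_hat = dispersion_matrix(data, alpha=2.0, p=1.5)
print(q_hat)
```

Note that the two off-diagonal entries are computed from different sample moments, so the estimated matrix is in general not exactly symmetric.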
The rate of convergence of the empirical matrix $\hat Q = [\hat R_{i,j}/2]$ to the unknown matrix $Q$ (to be estimated) will be faster if $p$ is as large as possible; see Rachev (1991). Now, let us recall that our portfolio satisfies the relation

\[ x'r \stackrel{d}{=} S_{\alpha}(\sigma_{x'r}, \beta_{x'r}, m_{x'r}) \]

and furthermore, $W = z_0$ when $x = 0$; otherwise

\[ W = x'r + (1 - x'e)z_0 \stackrel{d}{=} S_{\alpha}(\sigma_{x'r}, \beta_{x'r}, m_W), \]

where $\alpha$ is the index of stability, $\sigma_{x'r} = \sqrt{x'Qx}$ is the scale (dispersion) parameter, $\beta_{x'r} = 0$ is the skewness parameter and $m_W = x'E(r) + (1 - x'e)z_0$ is the mean of $W$. In particular, every sub-Gaussian $\alpha$-stable family is a particular $\sigma\tau_2(m, \sigma)$ class.

In view of the above, when the returns $r = [r_1,\dots,r_n]'$ are jointly sub-Gaussian $\alpha$-stable distributed, every risk-averse investor will choose an optimal portfolio among the solutions of the following optimization problem:

\[ \min_x x'Qx \quad \text{subject to} \quad x'\mu + (1 - x'e)z_0 = m_W \tag{22} \]

for some given mean $m_W$, where $W = x'r + (1 - x'e)z_0$. Thus, every optimal portfolio that maximizes a given concave utility function $u$,

\[ \max_x E\big( u(x'r + (1 - x'e)z_0) \big), \]

belongs to the mean-dispersion frontier

\[ \sigma = \begin{cases} \dfrac{m - z_0}{\sqrt{(\mu - e z_0)'Q^{-1}(\mu - e z_0)}} & \text{if } m \ge z_0, \\[2ex] \dfrac{z_0 - m}{\sqrt{(\mu - e z_0)'Q^{-1}(\mu - e z_0)}} & \text{if } m < z_0, \end{cases} \tag{23} \]

where $\mu = E(r)$; $m = x'\mu + (1 - x'e)z_0$; $e = [1,\dots,1]'$; and $\sigma^2 = x'Qx$. Moreover, the optimal portfolio weights $x$ satisfy the following relation:

\[ x = Q^{-1}(\mu - z_0 e)\, \frac{m - z_0}{(\mu - e z_0)'Q^{-1}(\mu - e z_0)}. \tag{24} \]

Note that (23) and (24) have the same form as the mean-variance frontier. In particular, they take a more general form when the dispersion matrix $Q$ is not necessarily symmetric. As a matter of fact, even though $Q$ is a symmetric (positive definite) matrix, the estimator proposed in the sub-Gaussian case, (20) and (21), is generally not symmetric.
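Relations (23) and (24) can be checked numerically with a small sketch. The code below assumes a symmetric positive definite dispersion matrix and uses a hand-written linear solver so that it depends only on the standard library. The function names and the three numerical inputs (`q`, `mu`, `z0`) are hypothetical values chosen for illustration.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def frontier_portfolio(q, mu, z0, m):
    """Weights from (24) and scale parameter from (23) for target mean m."""
    n = len(mu)
    excess = [mu[i] - z0 for i in range(n)]
    y = solve(q, excess)                           # Q^{-1}(mu - z0 e)
    d = sum(excess[i] * y[i] for i in range(n))    # (mu - z0 e)' Q^{-1} (mu - z0 e)
    x = [y[i] * (m - z0) / d for i in range(n)]    # equation (24)
    sigma = abs(m - z0) / math.sqrt(d)             # equation (23)
    return x, sigma


# Hypothetical inputs: dispersion matrix, mean vector, riskless return, target mean.
q = [[0.04, 0.01], [0.01, 0.09]]
mu = [0.08, 0.12]
z0 = 0.03
m = 0.10
x, sigma = frontier_portfolio(q, mu, z0, m)
print(x, sigma)
```

The portfolio returned by (24) reproduces the target mean and its scale $\sqrt{x'Qx}$ coincides with the frontier value (23), which is the consistency the text relies on.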
Therefore, in some extreme cases we could obtain the inconsistent situation of a stable distribution associated with a portfolio $x$ whose squared scale parameter is zero or negative.11 Moreover, (24) exhibits the two-fund separation property for both the stable and the normal case, but the matrix $Q$ and the parameter $\sigma$ have different meanings. In the normal case, $Q$ is the variance-covariance matrix and $\sigma$ is the standard deviation, while in the stable case $Q$ is a dispersion matrix and $\sigma$ is the scale (dispersion) parameter, $\sigma = \sqrt{x'Qx}$. According to the two-fund separation property of the sub-Gaussian $\alpha$-stable approach, we can assume that the market portfolio is equal to the risky tangent portfolio under the equilibrium conditions (as in the classical mean-variance Capital Asset Pricing Model (CAPM)). Therefore, every optimal portfolio can be seen as a linear combination of the market portfolio

\[ \bar x'r = \frac{r'Q^{-1}(\mu - z_0 e)}{e'Q^{-1}\mu - e'Q^{-1}e\, z_0} \tag{25} \]

and the riskless asset return $z_0$. Following the same arguments as in Sharpe, Lintner and Mossin's mean-variance equilibrium model, the return of asset $i$ is given by:

\[ E(r_i) = z_0 + \beta_{i,m}\big( E(\bar x'r) - z_0 \big), \tag{26} \]

where $\beta_{i,m} = \bar x'Q e_i / \bar x'Q \bar x$, with $e_i$ the vector with 1 in the $i$-th component and zero in all the other components.

11 Observe that for every $x \in R^n$, we get $x'\hat Q x > 0$ if and only if $(\hat Q + \hat Q')/2$ is a positive definite matrix. Thus, we can verify that $(\hat Q + \hat Q')/2$ is positive definite in order to avoid stable portfolios $x'\tilde z$ with negative scale parameter estimators. Moreover, we observe that the symmetric matrix $(\hat Q + \hat Q')/2$ is an alternative estimator of the dispersion matrix $Q$ whose statistical properties have to be proved. In particular, if we want to simulate the vector $\tilde z = [\tilde z_1,\dots,\tilde z_n]'$ of the centred stable sub-Gaussian return distributions, we generally use the dispersion matrix $\Sigma = (\hat Q + \hat Q')/2$. As a matter of fact, we first generate the vector $G = [G_1,\dots,G_n]'$ of the joint Gaussian distribution $G = N(0, \Sigma)$ using the Cholesky decomposition matrix. Then, the vector of returns [see Samorodnitsky and Taqqu (1994)] is given by $\tilde z = \sqrt{A}\, G$, where $A \stackrel{d}{=} S_{\alpha/2}\big( 2(\cos(\pi\alpha/4))^{2/\alpha}, 1, 0 \big)$ is an $\alpha/2$-stable random variable independent of the Gaussian vector $G$.

As a consequence of Ross' necessary and sufficient conditions for two-fund separation [see Ross (1978a)], the above model admits the form

\[ r_i = \mu_i + b_i Y + \varepsilon_i, \quad i = 1,\dots,n, \]

where $\mu_i = E(r_i)$, $E(\varepsilon \mid Y) = 0$, $\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)'$, $b = [b_1,\dots,b_n]'$ and the vector $bY + \varepsilon$ is sub-Gaussian $\alpha$-stable distributed with zero mean. Hence, our sub-Gaussian $\alpha$-stable version of the CAPM is not much different from Gamrowski-Rachev's (1999) version of the two-fund separation $\alpha$-stable model. As a matter of fact, Gamrowski and Rachev (1999) propose a generalization of Fama's $\alpha$-stable model (1965b) assuming $r_i = \mu_i + b_i Y + \varepsilon_i$ for every $i = 1,\dots,n$, where $\varepsilon_i$ and $Y$ are $\alpha$-stable distributed and $E(\varepsilon \mid Y) = 0$. In view of their assumptions,

\[ E(r_i) = z_0 + \tilde\beta_{i,m}\big( E(\bar x'r) - z_0 \big), \quad \text{where} \quad \tilde\beta_{i,m} = \frac{1}{\|\bar x'\tilde r\|_{\alpha}} \frac{\partial \|\bar x'\tilde r\|_{\alpha}}{\partial \bar x_i} = \frac{[\tilde r_i, \bar x'\tilde r]_{\alpha}}{\|\bar x'\tilde r\|_{\alpha}^{\alpha}}. \]

Furthermore, the coefficient $[\tilde r_i, \bar x'\tilde r]_{\alpha} / \|\bar x'\tilde r\|_{\alpha}^{\alpha}$ can be estimated using formula (18) above. Now, we see that in the above sub-Gaussian symmetric $\alpha$-stable model $x'Qx = \|x'\tilde r\|_{\alpha}^{2}$ and $x'Qe_i = \frac{1}{2}\, \partial \|x'\tilde r\|_{\alpha}^{2} / \partial x_i$. Thus, we get the equivalence between the coefficient $\beta_{i,m}$ of model (26) and $\tilde\beta_{i,m}$ of Gamrowski-Rachev's model, i.e.:

\[ \beta_{i,m} = \frac{\bar x'Q e_i}{\bar x'Q \bar x} = \frac{1}{\sigma_{\bar x'r}} \frac{\partial \sigma_{\bar x'r}}{\partial \bar x_i} = \frac{[\tilde r_i, \bar x'\tilde r]_{\alpha}}{\|\bar x'\tilde r\|_{\alpha}^{\alpha}} = \tilde\beta_{i,m}, \]

where $\sigma_{\bar x'r}$ is the scale parameter of the market portfolio.

3.2. A three fund separation model in the domain of attraction of a stable law

Let us assume that the vector $r = [r_1,\dots,r_n]'$ describes the following three-fund separating stable model of security returns:

\[ r_i = \mu_i + b_i Y + \varepsilon_i, \quad i = 1,\dots,n, \tag{27} \]

where the random vector $\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)'$ is independent of $Y$ and follows a joint sub-Gaussian $\alpha_1$-stable distribution ($1 < \alpha_1 < 2$), with zero mean and characteristic function $\Phi_{\varepsilon}(t) = \exp\big( -|t'Qt|^{\alpha_1/2} \big)$, where $Q$ is the positive definite dispersion matrix. On the other hand, $Y \stackrel{d}{=} S_{\alpha_2}(\sigma_Y, \beta_Y, 0)$ is an $\alpha_2$-stable distributed random variable independent of $\varepsilon$, with $1 < \alpha_2 < 2$ and zero mean. Under these assumptions, the portfolios are in the domain of attraction of a stable law with $\alpha = \min(\alpha_1, \alpha_2)$ and belong to a $\sigma\tau_3(a)$ family. A testable case in which $Y$ is $\alpha_2$-stable symmetric distributed (i.e., $\beta_Y = 0$) was recently studied by Götzenberger, Rachev and Schwartz (1999). When $\beta_Y = 0$ and $\alpha_1 = \alpha_2$, our model can lead to Fama's two-fund separation model. The characteristic function of the vector of returns $r = [r_1, r_2, \dots, r_n]'$ is given by:

\[ \Phi_r(t) = \Phi_{\varepsilon}(t)\, \Phi_Y(t'b)\, e^{i t'\mu} = \exp\left\{ -|t'Qt|^{\alpha_1/2} - |t'b\, \sigma_Y|^{\alpha_2} \left( 1 - i\beta_Y\, \mathrm{sgn}(t'b) \tan\frac{\pi\alpha_2}{2} \right) + i t'\mu \right\}, \tag{28} \]

where $b = [b_1,\dots,b_n]'$ is the coefficient vector and $\mu = [\mu_1,\dots,\mu_n]'$ is the mean vector.

Next we estimate the parameters in model (27), (28). First, the estimator of $\mu$ is given by the vector $\hat\mu$ of sample averages. Then, we consider as factor $Y$ a centralized index return (for example, the market portfolio (25) given by the above sub-Gaussian model). Therefore, given the sequence of observations $Y^{(k)}$, we can estimate its stable parameters. Observe that the random vector $\varepsilon$ admits a representation as a product of a random variable $V$ and a Gaussian vector $G$: $\varepsilon = VG$. Here $V = \sqrt{A}$, where $A$ is an $\alpha_1/2$-stable subordinator, that is

\[ A \stackrel{d}{=} S_{\alpha_1/2}\left( \left( \cos\frac{\pi\alpha_1}{4} \right)^{2/\alpha_1}, 1, 0 \right); \]

$G$ is an $(n \times 1)$ Gaussian vector with zero mean and variance-covariance matrix $Q$, and it is independent of $A$. We can generate values $A_k$, $k = 1,\dots,N$, of $A$ independent of $G$. For the problem of generating such values $A_k$ we refer to Paulauskas and Rachev (1999). Regressing the centralized returns $\tilde r_j = r_j - \mu_j$ on $Y$, we write the following OLS estimators12 for $b = [b_1,\dots,b_n]'$ and $Q$:

\[ \hat b_i = \frac{\sum_{k=1}^{N} Y^{(k)} \tilde r_i^{(k)} / A_k}{\sum_{k=1}^{N} (Y^{(k)})^2 / A_k}, \quad i = 1,\dots,n, \]

and

\[ \hat Q = \frac{1}{N} \sum_{k=1}^{N} \frac{(\tilde r^{(k)} - \hat b\, Y^{(k)})(\tilde r^{(k)} - \hat b\, Y^{(k)})'}{A_k}. \]

12 For a discussion see Tokat, Rachev and Schwartz (2002).
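The weighted estimators $\hat b_i$ and $\hat Q$ of the three fund model can be sketched as follows. This is an illustrative reading of the two formulas, under the assumption that the returns are already centralized and that the subordinator values $A_k$ are supplied by the caller; generating them is outside the scope of the sketch. The function name and the tiny noise-free data set are hypothetical.

```python
def weighted_factor_estimates(returns, factor, a_values):
    """returns[k][i]: centralized return of asset i at time k,
    factor[k]: centralized factor observation Y^(k),
    a_values[k]: value A_k of the subordinator (assumed given)."""
    n_obs, n = len(returns), len(returns[0])
    denom = sum(y * y / a for y, a in zip(factor, a_values))
    # Weighted regression slope for every asset.
    b_hat = [sum(y * row[i] / a for row, y, a in zip(returns, factor, a_values)) / denom
             for i in range(n)]
    # Weighted outer products of the residuals.
    q_hat = [[0.0] * n for _ in range(n)]
    for row, y, a in zip(returns, factor, a_values):
        resid = [row[i] - b_hat[i] * y for i in range(n)]
        for i in range(n):
            for j in range(n):
                q_hat[i][j] += resid[i] * resid[j] / (a * n_obs)
    return b_hat, q_hat


# Noise-free toy data: r = b * Y exactly, so b should be recovered and Q should vanish.
true_b = [0.5, 1.5]
factor = [0.01, -0.02, 0.03, 0.005]
a_values = [1.0, 2.0, 0.5, 1.5]
returns = [[true_b[0] * y, true_b[1] * y] for y in factor]
b_hat, q_hat = weighted_factor_estimates(returns, factor, a_values)
print(b_hat, q_hat)
```

Dividing by $A_k$ down-weights observations for which the simulated subordinator is large, which is how the estimator compensates for the heavy-tailed scaling $V = \sqrt{A}$ of the residual vector.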
The selection of $\alpha_1$ is a separate problem. A possible way to estimate $\alpha_1$ is to consider the OLS estimator

\[ \tilde b_i = \frac{\sum_{k=1}^{N} Y^{(k)} \tilde r_i^{(k)}}{\sum_{k=1}^{N} (Y^{(k)})^2} \]

and then to evaluate the sample residuals $\tilde\varepsilon^{(k)} = \tilde r^{(k)} - \tilde b\, Y^{(k)}$. If these residuals are heavy tailed, one can take the tail exponent as an estimator for $\alpha_1$. The asymptotic properties of the above estimator can be derived by arguing as in Paulauskas and Rachev (1999) and Götzenberger, Rachev and Schwartz (1999).

In order to determine portfolios that are R-S non-dominated when unlimited short selling is allowed, we have to minimize the scale parameter $\sigma_W = \sqrt{x'Qx}$ for some fixed mean $m_W = x'\mu + (1 - x'e)z_0$ and $\tilde b = x'b / \sqrt{x'Qx}$. Alternatively, as shown by Ortobelli, Rachev and Schwartz (2002), we can obtain these portfolios from the solution of the following quadratic programming problem:

\[ \min_x x'Qx \quad \text{subject to} \quad x'\mu + (1 - x'e)z_0 = m_W, \quad x'b = b^{*} \tag{29} \]

for some $m_W$ and $b^{*}$. Thus, under our assumptions, every portfolio that maximizes the expected value of a given concave utility function $u$, $\max_x E(u(x'r))$, belongs to the following frontier

\[ (1 - \lambda_2 - \lambda_3) z_0 + \lambda_2 \frac{r'Q^{-1}(\mu - z_0 e)}{e'Q^{-1}(\mu - z_0 e)} + \lambda_3 \frac{r'Q^{-1}b}{e'Q^{-1}b} \tag{30} \]

spanned by the riskless return $z_0$ and the two risky portfolios

\[ u^{(1)} = \frac{r'Q^{-1}(\mu - z_0 e)}{e'Q^{-1}(\mu - z_0 e)} \quad \text{and} \quad u^{(2)} = \frac{r'Q^{-1}b}{e'Q^{-1}b}. \]

Observe in (28) that when $\alpha = \alpha_1 = \alpha_2 > 1$, every portfolio $x'r$ is $\alpha$-stable distributed and satisfies the relation

\[ W = (1 - x'e)z_0 + x'r \stackrel{d}{=} S_{\alpha}\big( \sigma_{x'r}, \beta_{x'r}, (1 - x'e)z_0 + m_{x'r} \big), \]

and $W = z_0$ when $x = 0$, where

\[ \sigma_{x'r}^{\alpha} = (x'Qx)^{\alpha/2} + |x'b\, \sigma_Y|^{\alpha}, \quad \beta_{x'r} = \frac{|x'b\, \sigma_Y|^{\alpha}\, \mathrm{sgn}(x'b)\, \beta_Y}{\sigma_{x'r}^{\alpha}}, \quad m_{x'r} = x'E(r). \]

Hence, this jointly $\alpha$-stable model is a fund separation model whose solutions are given by the optimization problem (14), and these solutions satisfy the quadratic programming problem (29).

3.3. A k + 1 fund separation model in the domain of attraction of a stable law

Empirical studies show that, in the stable case, one of the most severe restrictions in performance measurement and asset pricing is the assumption of a common index of stability for all assets, individual securities and portfolios alike. It is well understood that asset returns are not normally distributed. We also know that the return distributions do not have the same index of stability. However, under the assumption that returns have different indexes of stability, it is not generally possible to find a closed form for the efficient frontier. Generalizing the above model instead, we get the following $k + 1$ fund separation model [for details on $k$ fund separation models see Ross (1978a)]:

\[ r_i = \mu_i + b_{i,1} Y_1 + \dots + b_{i,k-1} Y_{k-1} + \varepsilon_i, \quad i = 1,\dots,n. \tag{31} \]

Here, $n \ge k \ge 2$, the vector $\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)'$ is independent of $Y_1,\dots,Y_{k-1}$ and follows a joint sub-Gaussian symmetric $\alpha_k$-stable distribution with $1 < \alpha_k < 2$, zero mean and characteristic function $\Phi_{\varepsilon}(t) = \exp(-|t'Qt|^{\alpha_k/2})$, and the random variables $Y_j \stackrel{d}{=} S_{\alpha_j}(\sigma_{Y_j}, \beta_{Y_j}, 0)$, $j = 1,\dots,k-1$, are mutually independent13 $\alpha_j$-stable distributed with $1 < \alpha_j < 2$ and zero mean. Under these assumptions, the portfolios belong to a $\sigma\tau_{k+1}(a)$ class. If we need to ensure the separation in situations where the above model degenerates into a $p$-fund separation model with $p < k + 1$, we require the rank condition [see Ross (1978a)].

In order to determine portfolios that are R-S non-dominated when unlimited short selling is allowed, we have to minimize the scale parameter $\sigma_W = \sqrt{x'Qx}$ for some fixed mean $m_W = x'\mu + (1 - x'e)z_0$ and $\tilde b_j = x'b_{\cdot,j} / \sqrt{x'Qx}$, $j = 1,\dots,k-1$. Alternatively, as shown by Ortobelli, Rachev and Schwartz (2002), we can obtain these portfolios from the solution of the following quadratic programming problem:

\[ \min_x x'Qx \quad \text{subject to} \quad x'\mu + (1 - x'e)z_0 = m_W, \quad x'b_{\cdot,j} = c_j, \quad j = 1,\dots,k-1. \tag{32} \]

13 In order to estimate the parameters, we need to know the joint law of the vector $(Y_1,\dots,Y_{k-1})$. Therefore, we assume independent random variables $Y_j$, $j = 1,\dots,k-1$. Then the characteristic function of the vector of returns $r = [r_1,\dots,r_n]'$ is given by $\Phi_r(t) = \Phi_{\varepsilon}(t) \prod_{j=1}^{k-1} \Phi_{Y_j}(t'b_{\cdot,j})\, e^{i t'\mu}$. Under this additional assumption, we can approximate all parameters of any optimal portfolio using a procedure similar to that of the previous three fund separation model. However, if we assume a given joint $(\alpha_1,\dots,\alpha_{k-1})$ stable law for the vector $(Y_1,\dots,Y_{k-1})$, we can generally determine estimators of the parameters by studying the characteristics of the multivariate stable law.

By solving the optimization problem (32), we obtain that the riskless portfolio and $k$ other risky portfolios span the efficient frontier for risk-averse investors, given by

\[ \left( 1 - \sum_{j=1}^{k} \lambda_j \right) z_0 + \lambda_1 \frac{r'Q^{-1}(\mu - z_0 e)}{e'Q^{-1}(\mu - z_0 e)} + \sum_{j=1}^{k-1} \lambda_{j+1} \frac{r'Q^{-1}b_{\cdot,j}}{e'Q^{-1}b_{\cdot,j}}. \]

The above multivariate models are motivated by arbitrage considerations as in the Arbitrage Pricing Theory (APT) [see Ross (1976)]. Without going into details, it should be noted that there are two versions of the APT for $\alpha$-stable distributed returns, a so-called equilibrium version [see Chen and Ingersoll (1983), Dybvig (1983), Grinblatt and Titman (1983)] and an asymptotic version [see Huberman (1982)]. Connor (1984) and Milne (1988) introduced a general theory which encompasses the equilibrium APT as well as the mutual fund separation theory for returns belonging to any normed vector space (hence also symmetric $\alpha$-stable distributed returns), while Gamrowski and Rachev (1999) provide the proof of the asymptotic version for $\alpha$-stable distributed returns. Hence, it follows from Connor and Milne's theory that the above model in the domain of attraction of a stable law is coherent with the classic arbitrage pricing theory, and the mean returns can be approximated by the linear pricing relation

\[ \mu_i \approx z_0 + b_{i,1}\delta_1 + \dots + b_{i,k-1}\delta_{k-1}, \]

where $\delta_p$, for $p = 1,\dots,k-1$, are the risk premiums relative to the different factors.
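The quadratic programming problem (32) has only linear equality constraints, so for a symmetric positive definite $Q$ it can be solved through its first-order conditions. The sketch below is one way to do this and is not taken from the chapter: it writes the constraints as $a_l'x = c_l$ with $a_0 = \mu - z_0 e$, and uses the solution $x = Q^{-1}A(A'Q^{-1}A)^{-1}c$. All function names and numerical inputs are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def min_dispersion_portfolio(q, mu, z0, m_w, loadings, targets):
    """Minimize x'Qx subject to the mean constraint and x'b_j = c_j.
    loadings[j] is the loading vector b_{.,j}; targets[j] is c_j.
    Assumes q is symmetric positive definite."""
    n = len(mu)
    # The mean constraint is rewritten as x'(mu - z0 e) = m_w - z0.
    cols = [[mu[i] - z0 for i in range(n)]] + [list(b) for b in loadings]
    rhs = [m_w - z0] + list(targets)
    ys = [solve(q, col) for col in cols]          # Q^{-1} a_l for each constraint
    gram = [[sum(cl[i] * yl[i] for i in range(n)) for yl in ys] for cl in cols]
    nu = solve(gram, rhs)                         # scaled Lagrange multipliers
    return [sum(nu[l] * ys[l][i] for l in range(len(cols))) for i in range(n)]


# Hypothetical three-asset, one-factor example.
q = [[0.04, 0.01, 0.0], [0.01, 0.09, 0.02], [0.0, 0.02, 0.16]]
mu = [0.08, 0.12, 0.15]
z0 = 0.03
m_w = 0.10
loadings = [[1.0, 0.5, 1.5]]
targets = [0.8]
x = min_dispersion_portfolio(q, mu, z0, m_w, loadings, targets)
print(x)
```

With an empty list of loadings the same function reduces to problem (22), and with a single loading vector it solves problem (29).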
The above $k + 1$ fund separation model concludes the examples of models in the domain of attraction of stable laws. In the next section we compare the Gaussian multivariate approach with the sub-Gaussian stable one.

4. A first comparison between the normal multivariate distributional assumption and the stable sub-Gaussian one

In this section we examine and compare the stable sub-Gaussian assumption with the normal distributional one. Thus, we implicitly assume that returns belong to a $\sigma\tau_2(m, \sigma)$ class, where $m$ is the mean and $\sigma$ is either the scale parameter of stable distributions or the standard deviation of normal distributions.

In a recent work, Ortobelli, Rachev and Schwartz (2002) compare the stable non-Gaussian assumption and the normal one by analyzing optimal allocations between a riskless return and a benchmark index. Three different indexes have been taken into consideration: CAC40, DAX30 and S&P 500. Their analysis has indicated that either the heavy tails of the data or a greater concentration of the data around the mean can have a significant impact on the approximation of the investors' choices. However, the stable non-Gaussian allocation is generally more risk preserving than the normal one. Precisely, the stable approach considers a further component of risk which is due to the fat tails of the return distributions. This fact is not very surprising. As a matter of fact, Mehra and Prescott's empirical analysis (1985) also underlines that asset pricing puzzles can be justified by assuming that people are much more risk averse. Clearly, we do not believe that the equity premium puzzle can be explained only by considering the sub-Gaussian stable distribution instead of the Gaussian one. However, we believe that the distributional differences between the data and the classic model used in finance can help to understand asset pricing puzzles.
This conjecture is partly confirmed by assuming stable distributions in place of the Gaussian one [see, for example, Kocherlakota's (1997) test on the CCAPM with heavy-tailed pricing errors]. Next, we extend Ortobelli, Rachev and Schwartz's comparison to the multivariate case. This comparison is formally and theoretically different from the previous one because here the benchmark index is given by the market portfolio, which generally changes when the distributional assumptions change. Thus, as a consequence of Roll (1977, 1978, 1979a, b) and Dybvig and Ross' (1985a, b) analysis, we observe that: (a) an investor who fits the return distributions with a joint $\alpha_1$-stable sub-Gaussian distribution will consider inefficient the choice of another investor who fits the return distributions with a joint $\alpha_2$-stable sub-Gaussian distribution with $\alpha_1 \neq \alpha_2$; and (b) the stable CAPM is still subject to some of the criticisms already addressed to the classical one. Nevertheless, it seems that the stable case explains the empirical data better. This is the main reason why we interpret and analyze here the different behavior of the investor who fits the data with a joint stable sub-Gaussian distribution and the investor who fits the data with a joint normal distribution.

4.1. An optimal allocation problem

First, we consider the optimal allocation among 24 assets: 23 of those assets are risky assets with returns $r = [r_1, r_2, \dots, r_{23}]'$ and the 24th is risk-free with annual rate 6%. We analyze the portfolio choice problems when short sales are allowed and when short sales are not allowed. In this comparison, we discuss and study the differences in the portfolio choice problems without attempting to decide between the two assumptions (Gaussian or sub-Gaussian).14 In our comparison we use daily data taken from 23 international risky indexes valued in USD and quoted from January 1995 to January 1998.
In the proposed analysis we first consider the maximum likelihood estimation of the stable parameters and of the Gaussian ones for every risky asset. Thus, Tables 1 and 2 assemble the approximating parameters obtained using the program STABLE.15 In order to compare the different stable sub-Gaussian joint distributions and the joint normal distributions for the asset returns, we assume that the vector $r$ is sub-Gaussian $\alpha$-stable distributed, with $\alpha = \alpha_k$, $k = 1,2$, where $\alpha_1 = 1.7488$ represents the average of the indexes of stability and $\alpha_2 = 1.8856$ represents the maximum of the indexes of stability (see Table 2).16 Moreover, when in the following tables we consider the index of stability $\alpha = 2$, we implicitly assume that the returns are jointly normally distributed. Thus, every portfolio of risky assets is stable distributed in the following way:

\[ x'r \stackrel{d}{=} S_{\alpha_k}(\sigma_{x'r}, \beta_{x'r}, m_{x'r}), \]

where $\alpha_k$ is one of the considered indexes of stability, $k = 1,2$, $\sigma_{x'r} = (x'Q_k x)^{1/2}$ is the respective scale parameter, $Q_k = [R_{ij}/2]_k$ is the dispersion matrix, with $k = 1,2$, $\beta_{x'r} = 0$ is the skewness parameter, and $m_{x'r}$ represents the mean of $x'r$. Observe that the matrix $Q_k$ is estimated with the method defined in the previous section and thus it depends on the index of stability $\alpha_k$ for $k = 1,2$. As observed previously, the rate of convergence of the empirical matrix $\hat Q_k$ to the unknown matrix $Q_k$ will be faster if $p$ is as large as possible. In our estimations we use $p_1 = 1.7$ (relative to $\alpha_1 = 1.7488$) and $p_2 = 1.8$ (relative to $\alpha_2 = 1.8856$).

14 On this topic, recent studies [see Ortobelli et al. (2001), Ortobelli, Huber and Schwartz (2002)] have shown that sub-Gaussian multivariate models present a superior performance with respect to the mean-variance model.

15 See Nolan (1997) and the web site www.ca.american.edu/jpnolan.
We assume the investors wish to maximize the following utility functional:

\[ U(W) = E(W) - c\, E\big( |W - E(W)|^{q} \big), \tag{33} \]

where $c$ and $q$ are positive real numbers, $W = \lambda z_0 + (1 - \lambda)\bar x'r$ is the return on the portfolio, $z_0$ is the risk-free asset return, and

\[ \bar x'r = \frac{r'Q_k^{-1}(\mu - z_0 e)}{e'Q_k^{-1}\mu - e'Q_k^{-1}e\, z_0} \]

is the tangent portfolio of returns (25). With reference to the allocation problem (33), we observe:

(1) Problem (33) is equivalent to the maximization of the utility functional

\[ a E(W) - b\, E\big( |W - E(W)|^{q} \big), \tag{34} \]

assuming $c = b/a$ in (33) for every $a, b > 0$. Thus, $E(|W - E(W)|^{q})$ represents a particular risk measure of portfolio loss, which satisfies (under the appropriate standardization) the main characteristics of typical dispersion measures. Solving the optimal allocation problem (33), the investor implicitly maximizes the expected mean of the wealth increment $aW$ and minimizes the individual risk $b E(|W - E(W)|^{q})$.

(2) Furthermore, when $q = 2$, the maximization of utility functional (33) motivates the mean-variance approach in terms of preference relations.

16 We consider different indexes of stability in order to evaluate the effects of heavy-tailedness on the portfolio selection problems.

Suppose $X$ dominates $Y$ in the sense of R-S. Since $E(X) = E(Y)$ and $f(x) = -c|x - E(X)|^{q}$ is a concave utility function for every $q \in [1, \infty)$, it follows that:

\[ U(X) = E(X) - c\, E\big( |X - E(X)|^{q} \big) \ge U(Y), \quad q \in [1, \infty). \]

The above inequality implies that every risk-averse investor with utility functional (34) should choose a portfolio $W = \lambda z_0 + (1 - \lambda)\bar x'r$ that maximizes the utility functional (33) for some real $\lambda$ and some $q \in [1, \infty)$. We know that for $\lambda \neq 1$, all the portfolio returns $W = \lambda z_0 + (1 - \lambda)\bar x'r$ admit the stable distribution

\[ S_{\alpha_k}\big( |1 - \lambda|\, \sigma_{\bar x'r}, 0, \lambda z_0 + (1 - \lambda) m_{\bar x'r} \big), \quad k = 1,2, \]

and $W = z_0$ when $\lambda = 1$.
Now, in order to solve the asset allocation problem

$$\max_{\lambda}\; E(W) - c\,E\big(|W - E(W)|^q\big),$$

notice first that, for all $q \in [1,\alpha)$ and $1 < \alpha < 2$, we get

$$U(W) = E(W) - c\,E\big(|W - E(W)|^q\big) = \lambda z_0 + (1 - \lambda)m_{\bar{x}'r} - c\,\big(H(\alpha,0,q)\big)^q\,|1 - \lambda|^q\,\sigma^q_{\bar{x}'r},$$

where

$$\big(H(\alpha,0,q)\big)^q = \frac{2^{q-1}\,\Gamma(1 - q/\alpha)}{q\int_0^{\infty} u^{-q-1}\sin^2 u\,du}$$

[see Samorodnitsky and Taqqu (1994), Hardin (1984)]. The above relation covers the stable non-Gaussian case. When the vector $r$ admits a joint normal distribution (i.e., $\alpha = 2$), then for all $q > 0$,

$$U(W) = E(W) - c\,E\big(|W - E(W)|^q\big) = \lambda z_0 + (1 - \lambda)m_{\bar{x}'r} - c\,\frac{2^{q/2}\,\Gamma((q + 1)/2)}{\sqrt{\pi}}\,|1 - \lambda|^q\,\sigma^q_{\bar{x}'r}.$$

Hence, the real optimal solution of the problem in the important case $q \in (1,\alpha)$ is given by

$$\lambda = 1 - \mathrm{sgn}(1 - \lambda)\left(\frac{\mathrm{sgn}(1 - \lambda)\,(m_{\bar{x}'r} - z_0)}{q\,c\,\sigma^q_{\bar{x}'r}\,V(\alpha,0,q)}\right)^{1/(q-1)} \qquad (35)$$

and

$$x = (1 - \lambda)\,\bar{x}, \qquad (36)$$

where $\bar{x}$ is given by (25) and

$$V(\alpha,0,q) = \begin{cases} \big(H(\alpha,0,q)\big)^q & \text{in the stable case } (1 < \alpha < 2), \\[4pt] \dfrac{2^{q/2}\,\Gamma((q + 1)/2)}{\sqrt{\pi}} & \text{in the normal case } (\alpha = 2). \end{cases}$$

In (35) the first-order condition implies that $\mathrm{sgn}(1 - \lambda)$ coincides with the sign of the excess mean $m_{\bar{x}'r} - z_0$. Again, one would expect the optimal allocations to differ, because the constant $V(\alpha,0,q)$ and the matrix $Q$ are different in the stable sub-Gaussian and in the normal case.

4.2. Stable versus normal optimal allocation: a first comparison

We analyze the differences in optimal allocations with reference to problem (33) when the investor chooses, as a model for the asset returns in his/her portfolio:
(1) the joint normal distribution, or
(2) the joint $\alpha_k$ stable sub-Gaussian distribution ($k = 1,2$), where $\alpha_1 = 1.7488$ and $\alpha_2 = 1.8856$.
Under these distinct assumptions, the investors with utility functional (33) have different information about the distributional behavior of the data. In particular, we examine the different market portfolio compositions and the different allocations of the investor's wealth in the riskless asset.
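The closed-form solution (35) can be illustrated numerically. The sketch below is not the authors' implementation: the function names are hypothetical, the sign of $1 - \lambda$ is taken from the sign of the excess mean (as the first-order condition implies), and in the stable case the sine integral in $H(\alpha,0,q)^q$ is replaced by the identity $\int_0^\infty u^{-q-1}\sin^2 u\,du = -2^{q-1}\Gamma(-q)\cos(\pi q/2)$ for $1 < q < 2$, which is an assumption of this sketch rather than something stated in the chapter. The inputs are the rounded no-short-sales figures from Table 6 and the daily riskless rate $z_0 = 0.00016$.

```python
import math


def v_constant(alpha: float, q: float) -> float:
    """V(alpha, 0, q): q-th absolute central moment for unit scale."""
    if alpha == 2:
        # Normal case: the scale is the standard deviation.
        return 2 ** (q / 2) * math.gamma((q + 1) / 2) / math.sqrt(math.pi)
    if not 1 < q < alpha < 2:
        raise ValueError("stable case needs 1 < q < alpha < 2")
    # Stable case: H(alpha, 0, q)^q after substituting the sine-integral
    # identity named in the lead-in (assumption of this sketch).
    return math.gamma(1 - q / alpha) / (
        math.gamma(1 - q) * math.cos(math.pi * q / 2)
    )


def utility(lam, z0, mean, scale, c, q, alpha):
    """U(W) for W = lam*z0 + (1 - lam)*x'r, as in the expansions of (33)."""
    y = 1 - lam
    return lam * z0 + y * mean - c * v_constant(alpha, q) * abs(y) ** q * scale ** q


def optimal_riskless_share(z0, mean, scale, c, q, alpha):
    """Unconstrained maximizer lambda of (33), following (35); needs q > 1."""
    if q <= 1:
        raise ValueError("formula (35) needs q > 1")
    excess = mean - z0
    if excess == 0:
        return 1.0
    base = abs(excess) / (q * c * scale ** q * v_constant(alpha, q))
    return 1 - math.copysign(base ** (1 / (q - 1)), excess)


# Rounded inputs from Table 6 (no short sales), q = 1.45 as in Table 7.
lam_normal = optimal_riskless_share(0.00016, 0.001072, 0.006339, c=1.4, q=1.45, alpha=2)
lam_stable = optimal_riskless_share(0.00016, 0.001077, 0.002800, c=2.2, q=1.45, alpha=1.8856)
print(lam_normal, lam_stable)
```

Because the inputs are rounded to a few significant digits, the results should only be expected to land near the corresponding Table 7 entries (0.3788 and 0.3024), not to match them exactly. The sketch imposes no bound on $\lambda$; the zero entries in the no-short-sales panels of Tables 7 and 8 presumably reflect an additional constraint that is not modeled here.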
First, both when short sales are allowed and when they are not, we examine the optimal allocation among the riskless return and 23 daily index returns: DAX 30, DAX 100 Performance, CAC 40, FTSE all share, FTSE 100, FTSE actuaries 350, Reuters Commodities, Nikkei 225 simple average, Nikkei 300 weighted stock average, Nikkei 300 simple stock average, Nikkei 500, Nikkei 225 stock average, Nikkei 300, Brent Crude Physical, Brent current month, Corn No 2 Yellow cents, Coffee Brazilian, Dow Jones Futures 1, Dow Jones Commodities, Dow Jones Industrials, Fuel Oil No 2, Goldman Sachs Commodity, S&P 500. We use a riskless return of 6% p.a. Using the estimated daily index parameters, we can compute the dispersion matrices and the approximating "market" portfolios. The dispersion matrix $Q$ is given by either the variance–covariance matrix (in the normal case) or the matrix $Q_k$ (in the stable cases), which depends on the index of stability $\alpha_k$, $k = 1,2$ ($\alpha_1 = 1.7488$ and $\alpha_2 = 1.8856$). Therefore, as shown by Tables 3 and 4, the market portfolio weights

$$\bar{x} = \frac{Q^{-1}(\mu - z_0 e)}{e'Q^{-1}\mu - e'Q^{-1}e\,z_0}$$

change under the different distributional assumptions. We observe that the market portfolio composition does not change excessively when we use either the asymmetric estimator (20) and (21) of the matrix $Q_k$ or the symmetric one $(Q_k + Q_k')/2$. However, with daily data the elements of the dispersion matrices are of order $10^{-6}$. Thus, approximation errors in the data can be decisive in determining the elements of the matrices. In particular, Table 3 presents the market portfolio weights when we consider all 23 asset returns and short sales are allowed. Table 4 gives the market portfolio weights when no short sales are allowed.
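The market portfolio weights defined above, $Q^{-1}(\mu - z_0 e)$ normalized so that the weights sum to one, can be sketched in a few lines when short sales are allowed. This is only an illustration: the two-asset dispersion matrix and mean vector below are hypothetical numbers chosen to have the order of magnitude mentioned in the text, and `solve` and `market_portfolio` are helper names introduced here.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular dispersion matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def market_portfolio(q_matrix, mu, z0):
    """Weights Q^{-1}(mu - z0 e) / (e' Q^{-1}(mu - z0 e)); short sales allowed."""
    excess = [m_i - z0 for m_i in mu]
    raw = solve(q_matrix, excess)   # Q^{-1}(mu - z0 e)
    total = sum(raw)                # e' Q^{-1} mu - e' Q^{-1} e z0
    return [w / total for w in raw]


# Hypothetical two-asset example (entries of order 1e-6, as for daily data).
Q = [[4e-6, 1e-6], [1e-6, 9e-6]]
MU = [0.0007, 0.0009]
weights = market_portfolio(Q, MU, z0=0.00016)
print(weights)
```

Because the entries of $Q$ are so small, the linear system is solved directly rather than through an explicit inverse; with 23 assets a library routine would normally be preferred over this hand-written elimination.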
Under this constraint, we evaluate the market portfolio weights in terms of the risky portfolio composition that maximizes the extended Sharpe ratio, i.e., the market portfolio weights are the solution of the following optimization problem:

$$\max_{x}\; \frac{E(x'r) - z_0}{\sigma_{x'r}}, \qquad \text{subject to } x'e = 1,\; x_i \ge 0,\; i = 1,\dots,n.$$

In this case the optimal allocation reduces to only four risky assets (DAX 100 Performance, FTSE all share, Nikkei 300 weighted stock average, Dow Jones Industrials) and the riskless one. As argued by Roll (1977, 1978) and Dybvig and Ross (1985a), different market portfolios imply a completely different security market line analysis. Thus, the approach that allows short sales presents more opportunities of earning than the approach under the no-short-sales constraint. Therefore, it dominates the other approaches. Besides, if the returns are jointly $\alpha_k$ stable sub-Gaussian distributed (for some given $k = 1,2$), then the Gaussian approach is inefficient, since, in general, efficient and inefficient portfolios can plot above and below the "real" security market line. The analysis of Tables 3 and 4 points out that the composition of the market portfolio is strictly linked to the index of stability. In fact, we see that the allocation of the market portfolio in each asset component is generally monotone with respect to the stability index. Intuition then suggests that the stable sub-Gaussian approaches take the component of risk into greater consideration because of the fat tails. Recall that the tail behavior of every stable non-Gaussian distribution $X \stackrel{d}{=} S_\alpha(\sigma,\beta,\mu)$, with $1 < \alpha < 2$, is given by

$$\lim_{\lambda \to +\infty} \lambda^{\alpha} P(X > \lambda) = C_\alpha\,\frac{1 + \beta}{2}\,\sigma^{\alpha}, \qquad (37)$$

where $C_\alpha = (1 - \alpha)/\big(\Gamma(2 - \alpha)\cos(\pi\alpha/2)\big)$. Therefore, the fat tails of smaller stability indexes underline the risk of the loss component of every portfolio. In particular, under the different distributional assumptions, we distinguish the different perception of risk in the market portfolio components.
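Relation (37) can be used to compare how much tail mass the two stable models assign to a large move. The sketch below evaluates the constant $C_\alpha$ and the leading-order tail approximation; the function names and the scale and threshold values in the example are illustrative assumptions, and the approximation is only meaningful for thresholds that are large relative to the scale.

```python
import math


def tail_constant(alpha: float) -> float:
    """C_alpha = (1 - alpha) / (Gamma(2 - alpha) * cos(pi * alpha / 2)), alpha != 1."""
    return (1 - alpha) / (math.gamma(2 - alpha) * math.cos(math.pi * alpha / 2))


def asymptotic_tail(lam: float, alpha: float, sigma: float, beta: float = 0.0) -> float:
    """Leading-order approximation of P(X > lam) implied by relation (37)."""
    return tail_constant(alpha) * (1 + beta) / 2 * (sigma / lam) ** alpha


# Illustrative comparison: same scale and threshold, two indexes of stability.
tail_1 = asymptotic_tail(0.05, alpha=1.7488, sigma=0.005)
tail_2 = asymptotic_tail(0.05, alpha=1.8856, sigma=0.005)
print(tail_1, tail_2)
```

For a fixed scale and threshold, the smaller index of stability yields the larger approximate tail probability, which is the sense in which lower indexes "underline the risk of the loss component".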
This issue can be easily analyzed through the market portfolio weights for the 23 returns when no short sales are allowed. In fact, Table 2 shows that the index of stability of FTSE all share is greater than the indexes of stability of the other assets (DAX 100 Performance, Nikkei 300 weighted stock average, Dow Jones Industrials). Observe that in Table 4 the component of FTSE all share in the market portfolio increases with the index of stability $\alpha_k$ of the sub-Gaussian approach, while the components of the other assets (DAX 100 Performance, Nikkei 300 weighted stock average, Dow Jones Industrials) decrease with the index of stability. Thus, the market portfolios obtained under the Gaussian and sub-Gaussian distributional hypotheses treat the risks due to heavy tails differently. On the other hand, the mean of the market portfolios decreases with the index of stability. However, if we accept the idea that the market portfolios represent in some sense the market behavior, then according to the classic mean–risk interpretation, an optimal portfolio with a greater mean also has a greater risk. This fact appears clearly enough when we consider and compare the dispersion measures $\bar{x}_k' Q_j \bar{x}_k$ in every mean–risk plane for the market portfolio weights

$$\bar{x}_k = \frac{Q_k^{-1}(\mu - z_0 e)}{e'Q_k^{-1}\mu - e'Q_k^{-1}e\,z_0},$$

for every $k$ and $j$. Observe that $\tilde{\sigma}_{j,k} = \bar{x}_k' Q_j \bar{x}_k$ is the dispersion measure of the market portfolio $\bar{x}_k'r$ under the $\alpha_j$ stable Paretian approach. Therefore, for every fixed mean–risk plane (i.e., for every fixed $\alpha_j$ stable distributional approach) we can compare the market portfolios through their risk positions $\tilde{\sigma}_{j,k}$ (varying $k$). In accordance with the mean–risk interpretation, we observe that the market portfolio with the greater mean also admits a greater dispersion measure $\tilde{\sigma}_{j,k}$ in every mean–risk plane (see Tables 5 and 6).
As a consequence of relation (37), every stable non-Gaussian distribution $X \stackrel{d}{=} S_\alpha(\sigma,\beta,\mu)$, with $1 < \alpha < 2$, admits

$$E\big(|X - E(X)|^q\big) < \infty \quad \text{for } q < \alpha, \qquad (38)$$

and

$$E\big(|X - E(X)|^q\big) = \infty \quad \text{for } q \ge \alpha. \qquad (39)$$

Hence, the weight of the risk measure $E(|X - E(X)|^q)$ in optimization problem (33) is generally greater for the investors who use the stable laws for asset returns when $q$ is quite close to the index of stability $\alpha$. Tables 7 and 8 list the optimal allocations for the normal and the stable fits. Recall that $\lambda$ is the optimal proportion of funds invested in the risk-free asset which maximizes $E(W) - c\,E(|W - E(W)|^q)$, where $W = \lambda z_0 + (1 - \lambda)\bar{x}'r$. We have chosen $q = 1.45$ in Table 7 and $q = 1.55$ in Table 8, so that $q$ is strictly less than all indexes of stability in the data set. At the same time, we want to evaluate and compare the effects of a $q$ farther from or closer to the stability parameters $\alpha_k$. For any given allocation problem, we mark in bold and in italics, respectively, the greatest and the smallest allocation in the riskless asset. Both tables show the greatest diversity among the optimal allocations for small risk aversion coefficients $c$. By contrast, very risk averse investors assume a less risky position under every distributional hypothesis and the allocations in the riskless asset do not change very much. As we see from these tables, when $q = 1.45$ and $q = 1.55$ the investors who fit the data with the Gaussian approach generally assume a less risky position than the investors who fit the data with the sub-Gaussian approach. Thus, if the stable sub-Gaussian approximation performs better than the Gaussian one (as observed by many empirical analyses), the "stable investors" have more opportunities of earning than the "Gaussian investors".
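The step from the tail relation (37) to the moment conditions (38) and (39) is short and can be written out as follows. This is a sketch of the standard tail-integral argument, not a reproduction of the authors' derivation; it assumes the left tail obeys the analogue of (37) with $1 - \beta$ in place of $1 + \beta$, and writes $m = E(X)$ and $\lambda_0$ for an arbitrary large threshold, both introduced here for illustration.

```latex
% Tail-integral representation of the q-th absolute central moment:
\[
  E\,|X - m|^{q} \;=\; \int_{0}^{\infty} q\,\lambda^{q-1}\,
  P\big(|X - m| > \lambda\big)\,d\lambda .
\]
% Adding the right tail from (37) and its left-tail analogue gives
\[
  P\big(|X - m| > \lambda\big) \;\sim\; C_{\alpha}\,\sigma^{\alpha}\,\lambda^{-\alpha}
  \qquad (\lambda \to +\infty),
\]
% so the integrand behaves like q C_alpha sigma^alpha lambda^{q-1-alpha}, and
\[
  \int_{\lambda_0}^{\infty} \lambda^{\,q-1-\alpha}\,d\lambda < \infty
  \;\Longleftrightarrow\; q - 1 - \alpha < -1
  \;\Longleftrightarrow\; q < \alpha .
\]
```

The integral over $[0, \lambda_0]$ is always finite, so convergence is decided by the tail alone, which gives (38) for $q < \alpha$ and (39) for $q \ge \alpha$.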
In particular, the investors with the $\alpha_1 = 1.7488$ stable sub-Gaussian approach invest less in the riskless asset than the investors who fit the data with the other approaches. However, if we consider $q$ very close to $\alpha_1$ in optimization problem (33), then, as a consequence of (38) and (39), the investors who fit the data with the $\alpha_1$ stable sub-Gaussian approach assume a less risky position than the investors who fit the data with the Gaussian approach. In this case, the "stable investor" has a very risk preserving behavior because he prefers not to allocate too much wealth to the risky asset. In this sense, intuition suggests that the stable approaches with lower indexes of stability are generally more risk preserving than those with greater indexes of stability, because they consider the component of risk due to the fat tails of asset returns. Therefore, the stability index plays a strategic role in stable optimal portfolio selection. Conversely, $q$ in the above optimization problem can be an appropriate measure of the weight to be given to the component of risk due to the heavy-tailedness of the asset returns. The importance given to $q$ is intuitively linked to the conditions of the market in which the investor operates.

5. Conclusions

Firstly, we study, analyze and discuss portfolio choice models depending on a finite number of parameters. The distributional analysis presented makes it possible to classify the admissible parametric families of returns. Moreover, through the interrelation between the parameters of each parametric family, we can order the portfolio choices using the basic principles of stochastic dominance analysis. Thus, we can identify a dispersion measure which has some basic characteristics and represents the implicit measure of the portfolio return risk.
In view of the classification of parametric portfolio choices, which is an alternative to Ross' multiparameter one, we can distinguish the different efficient frontiers for investors who are non-satiable, risk averse, or both. In particular, we identify further restrictions to the classic Markowitz–Tobin efficient frontier when no short sales are allowed. Besides, we can identify the optimization problems we have to solve in order to determine more accurate estimates of the investor's optimal allocations. In this sense, the analysis presented represents a general theory and a unifying framework for understanding the parametric distributional approach to portfolio choice theory.

Secondly, we show a simple classification of the portfolio choices that considers the asymptotic behavior of returns with heavy-tailed distributions. As a matter of fact, when returns have a stationary behavior they are in the domain of attraction of a stable law. Therefore, we present some examples of models in the domain of attraction of stable laws. The first distributional model considered is the case of sub-Gaussian stable distributed returns. It permits a mean–risk analysis quite similar to the Markowitz–Tobin mean–variance one. In fact, this model admits the same analytical form for the efficient frontier, but the parameters have a different meaning in the two models. Thus, the most important difference lies in the way the parameters are estimated. In order to present heavy-tailed models that consider the asymmetry of returns, we study a three-fund separation model where the portfolios are in the domain of attraction of an $(\alpha_1, \alpha_2)$ stable law. Next, we analyze the case of a $k + 1$ fund separation model with portfolios in the domain of attraction of an $(\alpha_1,\dots,\alpha_k)$ stable law. In all models we make explicit the efficient frontier for risk averse investors.
In this context, we have shown that if the stable optimal portfolio analysis is stable, our approach is theoretically and empirically possible. Indeed, this work should be viewed only as a starting point for new empirical and theoretical studies on the topic of optimal allocation.

Finally, the comparison made between the stable sub-Gaussian and the normal approach in terms of the allocation problems has indicated that the stable sub-Gaussian allocation is more risk preserving than the normal one and can give more opportunities of earning. Precisely, the stable approach, unlike the normal one, considers the component of risk due to the fat tails. Therefore, we find that the tail behavior of the sub-Gaussian and Gaussian approaches could imply substantial differences in the asset allocation. Taking into account that the stable approach adheres more closely to the reality of the market, we can, as argued by Götzenberger, Rachev and Schwartz (1999), obtain models that improve the performance measurements under the stable distributional assumption.

Acknowledgment

We are grateful to Stoyan Stoyanov and Boryana Racheva-Jotova (Sofia University and Bravo Risk Management Group, Santa Barbara) for the computational analysis and helpful comments.

Appendix A: Proofs

In order to prove the following results, we use some results of Hanoch and Levy [see, in particular, Theorems 3 and 4 in Hanoch and Levy (1969)].

Proof of Theorem 1:

Implication 1. According to the definition of the $\sigma\tau_k^{+}(\bar{a})$ family, it follows that $w'Z/\sigma_{w'Z} \stackrel{d}{=} y'Z/\sigma_{y'Z}$ because the two random variables have the same parameters. If $\sigma_{w'Z} > \sigma_{y'Z}$, then for every $t \ge 0$

$$P(w'Z \le t) = P\!\left(\frac{w'Z}{\sigma_{w'Z}} \le \frac{t}{\sigma_{w'Z}}\right) \le P\!\left(\frac{w'Z}{\sigma_{w'Z}} \le \frac{t}{\sigma_{y'Z}}\right) = P(y'Z \le t),$$

and the above inequality is strict for some $t$. Conversely, if $w'Z$ FSD $y'Z$, then $E(w'Z) > E(y'Z)$.

Implication 2. As a consequence of the assumptions, it follows that $q := E(w'Z) - E(y'Z) \ge 0$ and $\sigma_{w'Z} \ge \sigma_{y'Z} \ge \sigma_{y'Z+t}$, for every $t \ge 0$.
Moreover, for every $t \ge 0$ the function $g(t) = (E(y'Z) + t)/\sigma_{y'Z+t}$ is an increasing continuous positive function that tends to infinity as $t \to \infty$. As a consequence of the definition of the $\sigma\tau_k^{+}(\bar{a})$ family, there exists $t \le q$ such that the random variable $w'Z/\sigma_{w'Z}$ has the same parameters as $(y'Z + t)/\sigma_{y'Z+t}$ and hence $w'Z/\sigma_{w'Z} \stackrel{d}{=} (y'Z + t)/\sigma_{y'Z+t}$. Then, for every $\lambda \ge 0$:

$$P(w'Z \le \lambda) \le P\!\left(\frac{y'Z + t}{\sigma_{y'Z+t}} \le \frac{\lambda}{\sigma_{y'Z+t}}\right) \le P(y'Z \le \lambda). \qquad (40)$$

Observe that at least one of the two inequalities $\sigma_{w'Z} \ge \sigma_{y'Z}$ and $q \ge 0$ is strict by hypothesis. Then, at least one of the previous inequalities (40) is strict for some real $\lambda \ge 0$. Therefore, $w'Z$ FSD $y'Z$.

Implication 3. First assume $m_{w'Z} = m_{y'Z}$ and $\sigma_{w'Z} < \sigma_{y'Z}$. Then, $m_{w'Z}/\sigma_{w'Z} > m_{y'Z}/\sigma_{y'Z}$ and there exists $t > 0$ such that $m_{w'Z}/\sigma_{w'Z} = (m_{y'Z} + t)/\sigma_{y'Z+t}$. Therefore, $m_{w'Z} < m_{y'Z} + t$, $1 < \sigma_{y'Z+t}/\sigma_{w'Z} =: \beta$ and $w'Z \stackrel{d}{=} (y'Z + t)/\beta$. Hence, for every $\lambda \le M := t/(\beta - 1)$,

$$P(w'Z \le \lambda) = P(y'Z \le \beta\lambda - t) \le P(y'Z \le \lambda),$$

and for every $\lambda \ge M$, $P(w'Z \le \lambda) \ge P(y'Z \le \lambda)$. By hypothesis $m_{w'Z} = m_{y'Z}$, thus we cannot have $w'Z$ FSD $y'Z$ or $y'Z$ FSD $w'Z$. Therefore, from Theorem 3 of Hanoch and Levy (1969), $w'Z$ SSD $y'Z$ and $m_{w'Z} = m_{y'Z}$; that is, $w'Z$ R–S stochastically dominates $y'Z$. Conversely, if $w'Z$ R–S stochastically dominates $y'Z$, then $m_{w'Z} = m_{y'Z}$ and $\sigma_{w'Z} \ne \sigma_{y'Z}$. Thus, by the previous argument it follows that $\sigma_{w'Z} < \sigma_{y'Z}$, because the converse is absurd.

Implication 4. From Implication 2, $\sigma_{w'Z} \ge \sigma_{y'Z}$ implies $w'Z$ FSD $y'Z$, which implies $w'Z$ SSD $y'Z$. Next, assume $\sigma_{w'Z} < \sigma_{y'Z}$. Therefore, there exists $t \ge 0$ such that $m_{w'Z}/\sigma_{w'Z} = (m_{y'Z} + t)/\sigma_{y'Z+t}$ and $w'Z \stackrel{d}{=} (y'Z + t)/\beta$, where $\beta = \sigma_{y'Z+t}/\sigma_{w'Z}$. Thus, we can distinguish two cases:
(1) $m_{w'Z} \ge m_{y'Z} + t$ and $\sigma_{w'Z} \ge \sigma_{y'Z+t}$. As a consequence of Implication 1, $w'Z$ FSD $y'Z$.
(2) $m_{y'Z} \le m_{w'Z} < m_{y'Z} + t$ and $\sigma_{w'Z} < \sigma_{y'Z+t}$. Then, as proved in Implication 3, $w'Z$ SSD $y'Z$.

Implication 5. First, assume $\sigma_{w'Z} = \sigma_{y'Z}$. As a consequence of the stochastic dominance $w'Z$ SSD $y'Z$, the inequality $E(w'Z) > E(y'Z)$ holds. Thus, as a consequence of Implication 2, $w'Z$ FSD $y'Z$. Secondly, assume $\sigma_{w'Z} > \sigma_{y'Z}$. Therefore, we can distinguish two cases:
(1) $m_{w'Z}/\sigma_{w'Z} \ge m_{y'Z}/\sigma_{y'Z}$.
Thus, as a consequence of Implication 2, $w'Z$ FSD $y'Z$.
(2) $m_{w'Z}/\sigma_{w'Z} < m_{y'Z}/\sigma_{y'Z}$. Then, there exists $t > 0$ such that $(m_{w'Z} + t)/\sigma_{w'Z+t} = m_{y'Z}/\sigma_{y'Z}$ and $(w'Z + t)/\sigma_{w'Z+t} \stackrel{d}{=} y'Z/\sigma_{y'Z}$. Observe that $m_{w'Z} + t > m_{y'Z}$. Therefore, $\sigma_{w'Z+t} > \sigma_{y'Z}$ and for every $\lambda \le M := t/(\beta - 1)$, where $\beta = \sigma_{w'Z+t}/\sigma_{y'Z}$, it follows that $F_{w'Z}(\lambda) \ge F_{y'Z}(\lambda)$. Similarly, for every $\lambda > M$, $F_{w'Z}(\lambda) \le F_{y'Z}(\lambda)$. However, a $\lambda \le M$ such that $F_{w'Z}(\lambda) > F_{y'Z}(\lambda)$ cannot exist, because distribution functions are right continuous and we would have $\int_{-\infty}^{M} F_{w'Z}(u)\,du > \int_{-\infty}^{M} F_{y'Z}(u)\,du$, against the assumption $w'Z$ SSD $y'Z$. Then, $w'Z$ FSD $y'Z$.

Implication 6. The assumptions $\sigma_{w'Z} \le \sigma_{y'Z}$ and $m_{w'Z} \ge m_{y'Z}$ with at least one inequality strict imply $m_{w'Z}/\sigma_{w'Z} > m_{y'Z}/\sigma_{y'Z}$. As a consequence of Implication 4, it follows that $w'Z$ SSD $y'Z$.

Proof of Theorem 2: From the assumptions of the theorem it follows that

$$\frac{w'r - E(w'r)}{\sigma_{w'r}} \stackrel{d}{=} \frac{y'r - E(y'r)}{\sigma_{y'r}}.$$

First suppose $w'r$ SSD $y'r$. Therefore, $E(w'r) \ge E(y'r)$. If $E(w'r) = E(y'r)$ and $\sigma_{w'r} = \sigma_{y'r}$, the equality in distribution $w'r \stackrel{d}{=} y'r$ holds, against the hypothesis. Suppose, by contradiction, that $\sigma_{w'r} > \sigma_{y'r}$. Then, for every $t < M := (m_{w'r}\sigma_{y'r} - m_{y'r}\sigma_{w'r})/(\sigma_{y'r} - \sigma_{w'r})$ it follows that $(t - E(w'r))/\sigma_{w'r} > (t - E(y'r))/\sigma_{y'r}$ and $F_{w'r}(t) \ge F_{y'r}(t)$. The inequality is strict for some $t$ because $w'r$ is a random variable unbounded from below. This is a contradiction because by hypothesis $w'r$ SSD $y'r$. Therefore, $\sigma_{w'r} \le \sigma_{y'r}$ and $E(w'r) \ge E(y'r)$ with at least one inequality strict.

Conversely, suppose $E(w'r) > E(y'r)$ and $\sigma_{w'r} = \sigma_{y'r}$; then we obtain $w'r$ FSD $y'r$ from the properties of the $\sigma\tau_k(\bar{a})$ class [see Ortobelli (2001)]. Then, assume $E(w'r) \ge E(y'r)$ and $\sigma_{w'r} < \sigma_{y'r}$. The distributions of the random variables $w'r$ and $X \stackrel{d}{=} w'r - E(w'r) + E(y'r)$ belong to the same $\sigma\tau_k(\bar{a})$ family and $E(X) = E(y'r)$. Moreover, from the properties of the $\sigma\tau_k(\bar{a})$ family, the random variables $X$ and $w'r$ have the same scale parameter $\sigma_{w'r}$ and the same other $k - 2$ parameters $(a_{1,p},\dots,a_{k-2,p})$. Therefore, $Y := (X - E(y'r))/\sigma_{w'r} \stackrel{d}{=} (y'r - E(y'r))/\sigma_{y'r}$ and for every real $u$:

$$\int_{-\infty}^{u} \big[P(X \le \lambda) - P(y'r \le \lambda)\big]\,d\lambda = \int_{-\infty}^{u} \left[P\!\left(Y \le \frac{\lambda - E(y'r)}{\sigma_{w'r}}\right) - P\!\left(Y \le \frac{\lambda - E(y'r)}{\sigma_{y'r}}\right)\right] d\lambda \le 0.$$

The above inequality is strict for some real $u$ because the random variable $Y$ is unbounded from below and $\sigma_{w'r} < \sigma_{y'r}$. Thus, $X$ SSD $y'r$. If $E(w'r) = E(y'r)$ we have $X \stackrel{d}{=} w'r$. Instead, if $E(w'r) > E(y'r)$, from the properties of the $\sigma\tau_k(\bar{a})$ class the following stochastic relation holds: $w'r$ FSD $X$ SSD $y'r$. Therefore $w'r$ SSD $y'r$. However, considering the stochastic dominance equivalences for "SSD" [see Levy (1992)], the following equality in distribution holds: $y'r \stackrel{d}{=} w'r - E(w'r) + E(y'r) + \varepsilon$ with $E(\varepsilon \mid w'r) = 0$.

Appendix B: Tables

Table 1
Maximum likelihood estimations of the normal asset return parameters considering daily data from 1/3/95 to 1/30/98

Assets                                  Mean      Standard deviation
DAX 30                                  0.0007    0.0113
DAX 100 Performance                     0.0007    0.0106
CAC 40                                  0.0005    0.011
FTSE all share                          0.0007    0.007
FTSE 100                                0.0008    0.0078
FTSE actuaries 350                      0.0007    0.0072
Reuters Commodities                    -0.0002    0.0072
Nikkei 225 simple average               0.0005    0.0157
Nikkei 300 weighted stock average       0.0006    0.0137
Nikkei 300 simple stock average         0.0004    0.0129
Nikkei 500                              0.0003    0.0128
Nikkei 225 stock average               -0.0005    0.0158
Nikkei 300                             -0.0005    0.0138
Brent Crude Physical                    0.0000    0.0185
Brent current month                     0.0000    0.0186
Corn No 2 Yellow cents                  0.0002    0.0152
Coffee Brazilian                        0.0002    0.0270
Dow Jones Futures 1                    -0.0001    0.0055
Dow Jones Commodities                  -0.0001    0.0079
Dow Jones Industrials                   0.0013    0.0086
Fuel Oil No 2                          -0.0001    0.0201
Goldman Sachs Commodity                 0.0000    0.0092
S&P 500                                 0.0009    0.0083
Table 2
Maximum likelihood estimators of the stable asset return parameters considering daily data from 1/3/95 to 1/30/98

Assets                                  Index of     Stable      Stable     Stable
                                        stability    skewness    mean       scale
DAX 30                                  1.8148       -0.6682      0.0005    0.0069
DAX 100 performance                     1.7996       -0.6389      0.0004    0.0064
CAC 40                                  1.8381       -0.1852      0.0004    0.0071
FTSE all share                          1.8418       -0.5726      0.0006    0.0045
FTSE 100                                1.8856       -0.5192      0.0007    0.0052
FTSE actuaries 350                      1.8521       -0.5666      0.0006    0.0047
Reuters Commodities                     1.7959       -0.2075     -0.0003    0.0045
Nikkei 225 simple average               1.663        -0.0483      0.0004    0.009
Nikkei 300 weighted stock average       1.6962        0.0869      0.0006    0.0079
Nikkei 300 simple stock average         1.7064        0.085       0.0004    0.0075
Nikkei 500                              1.7253        0.0334      0.0003    0.0076
Nikkei 225 stock average                1.6798       -0.0721     -0.0006    0.0091
Nikkei 300                              1.6994        0.0303     -0.0005    0.008
Brent Crude Physical                    1.7423       -0.229      -0.0003    0.0112
Brent current month                     1.7405       -0.2039     -0.0001    0.0112
Corn No 2 Yellow cents                  1.6869       -0.1565      0.0002    0.0083
Coffee Brazilian                        1.5876       -0.0153      0.0007    0.0144
Dow Jones Futures 1                     1.8063       -0.4641     -0.0002    0.0035
Dow Jones Commodities                   1.6806       -0.1389     -0.0001    0.0037
Dow Jones Industrials                   1.7368       -0.2886      0.0012    0.0049
Fuel Oil No 2                           1.7338       -0.1961     -0.0002    0.0117
Goldman Sachs Commodity                 1.8036       -0.2663     -0.0002    0.0058
S&P 500                                 1.7052       -0.0881      0.0010    0.0047

Table 3
Stable sub-Gaussian and Gaussian market portfolio weights when short sales are allowed

Assets                                  Weights for     Weights for     Gaussian weights
                                        α = 1.7488      α = 1.8856      (α = 2)
DAX 30                                    0.3929          0.7182          1.2784
DAX 100 Performance                      -0.0704         -0.4594         -1.1205
CAC 40                                   -0.0694         -0.1217         -0.2022
FTSE all share                            9.1802         10.6690         14.0729
FTSE 100                                 -3.0513         -2.5989         -1.6515
FTSE actuaries 350                       -4.5968         -6.3873        -10.4307
Reuters Commodities                      -1.2833         -1.3451         -1.4679
Nikkei 225 Simple average                 0.9302          0.0114         -1.8360
Nikkei 300 weighted stock average        19.7594         19.4478         19.7910
Nikkei 300 simple stock average         -12.7675        -12.0352        -11.2504
Nikkei 500                               15.8399         15.5310         15.5214
Nikkei 225 stock average                 -2.1322         -1.2088          0.6395
Nikkei 300                              -21.2774        -21.3744        -22.4705
Brent Crude Physical                      0.2125          0.1850          0.1653
Brent current month                       0.0044          0.0260          0.0460
Corn No 2 Yellow cents                    0.0295          0.0032         -0.0306
Coffee Brazilian                          0.0175          0.0170          0.0201
Dow Jones Futures 1                      -0.2376         -0.2771         -0.4155
Dow Jones Commodities                    -0.7123         -0.6422         -0.5327
Dow Jones Industrials                     3.7278          3.7832          4.0654
Fuel Oil No 2                            -0.2806         -0.2740         -0.2714
Goldman Sachs Commodity                   0.2473          0.2356          0.2143
S&P 500                                  -2.8630         -2.9032         -3.1345
Table 4
Stable sub-Gaussian and Gaussian market portfolio weights when no short sales are allowed

Assets                                  Weights for     Weights for     Gaussian weights
                                        α = 1.7488      α = 1.8856      (α = 2)
DAX 100 performance                     0.0746          0.0742          0.0732
FTSE all share                          0.2329          0.2379          0.2496
Nikkei 300 weighted stock average       0.0509          0.0502          0.0485
Dow Jones Industrials                   0.6416          0.6377          0.6287
Other assets                            0               0               0

Table 5
Stable sub-Gaussian and Gaussian market portfolio parameters for every mean–dispersion plane when short sales are allowed

Parameters                    Market portfolio    Market portfolio    Gaussian market
                              α = 1.7488          α = 1.8856          portfolio
Mean                          0.0243              0.0236              0.0231
Dispersion if α = 1.7488      0.0104              0.0101              0.0101
Dispersion if α = 1.8856      0.0191              0.0185              0.0183
Standard deviation            0.0451              0.0433              0.0418

Table 6
Stable sub-Gaussian and Gaussian market portfolio parameters for every mean–dispersion plane when no short sales are allowed

Parameters                    Market portfolio    Market portfolio    Gaussian market
                              α = 1.7488          α = 1.8856          portfolio
Mean                          0.001079            0.001077            0.001072
Dispersion if α = 1.7488      0.001553            0.001549            0.001541
Dispersion if α = 1.8856      0.002807            0.002800            0.002783
Standard deviation            0.006393            0.006376            0.006339

Table 7
Optimal allocation for the optimization problem max E(W) - cE(|W - E(W)|^1.45) when different distributional assumptions are considered

Allocation in the riskless asset considering the market portfolio on 23 assets when unlimited short sales are allowed

Coefficient c of the      Optimal allocation    Optimal allocation    Optimal allocation
optimization problem      when α = 1.7488       when α = 1.8856       when α = 2
 1.8                      -10.2855              -2.3803               -0.0715
 2.2                       -6.2252              -1.1642                0.3140
 3                         -2.6268              -0.0863                0.6557
 4.2                       -0.7171               0.4857                0.8370
 5                         -0.1655               0.6509                0.8893
 6                          0.2227               0.7672                0.9262
 7                          0.4482               0.8347                0.9476
10                          0.7502               0.9252                0.9763
15                          0.8985               0.9696                0.9904
21                          0.9520               0.9856                0.9954

Allocation in the riskless asset considering the market portfolio on 23 assets when short sales are not allowed

Coefficient c of the      Optimal allocation    Optimal allocation    Optimal allocation
optimization problem      when α = 1.7488       when α = 1.8856       when α = 2
 1.4                      0                     0                     0.3788
 1.5                      0                     0                     0.4671
 1.8                      0                     0                     0.6446
 2.2                      0                     0.3024                0.7725
 3                        0                     0.6498                0.8858
 4.2                      0.4559                0.8342                0.9459
 5                        0.6306                0.8875                0.9633
 6                        0.7537                0.9250                0.9755
 7                        0.8251                0.9467                0.9826
10                        0.9208                0.9759                0.9921

This table reports the optimal allocation λ in the riskless return (6% annual rate, daily z0 = 0.00016) for different risk aversion coefficients c of the optimization problem max E(W) - cE(|W - E(W)|^1.45), where W = λz0 + (1 - λ)x'r and x'r is either the Gaussian market portfolio (for α = 2) or the sub-Gaussian market portfolio (for α = 1.7488 or α = 1.8856).
Table 8 Optimal allocation for the optimization problem max E(W) - cE(|W - E(W)|1.55) when different distributional assumptions are considered Allocation in the riskless asset considering the market portfolio on 23 assets when unlimited short sales are allowed Coefficient c of the Optimal allocation Optimal allocation Optimal allocation optimization when when when problem = 1.7488 = 1.8856 = 2 1.5 -11.4475 -4.5346 -1.2292 1.8 -7.9354 -2.9730 -0.6003 2.2 -5.2038 -1.7584 -0.1111 3 -2.5298 -0.5695 0.3678 4.2 -0.9146 0.1487 0.6571 5 -0.3944 0.3800 0.7503 6 -0.0010 0.5549 0.8207 7 0.2437 0.6637 0.8646 10 0.6046 0.8242 0.9292 15 0.8108 0.9159 0.9661 Allocation in the riskless asset considering the market portfolio on 23 assets when short sales are not allowed Coefficient c of the Optimal allocation Optimal allocation Optimal allocation optimization when when when problem = 1.7488 = 1.8856 = 2 1.3 0 0 0 1.5 0 0 0 1.8 0 0 0.0859 2.2 0 0 0.3654 3 0 0.1239 0.6389 4.2 0 0.5248 0.8042 5 0.2305 0.6539 0.8574 6 0.4476 0.7515 0.8976 7 0.5826 0.8123 0.9226 10 0.7818 0.9019 0.9595 This table computes the optimal allocation in the riskless return 6% annual rate (daily z0 = 0.00016 ) for different risk aversion coefficient c of the optimization problem max E(W) - cE(|W - E(W)|1.55)where W = z0 + (1- ) x r and x r is either the Gaussian market portfolio (for = 2) or the sub-Gaussian market portfolio (for = 1.7488 or = 1.8856). References Bawa, V.S., 1975. Optimal rules for ordering uncertain prospects. Journal of Financial Economics 2, 95­121. Bawa, V.S., 1976. Admissible portfolio for all individuals. Journal of Finance 31, 1169­1183. Chamberlein, G., 1983. A characterization of the distributions that imply mean­variance utility functions. Journal of Economic Theory 29, 975­988. Ch. 14: Portfolio Choice Theory with Non-Gaussian Distributed Returns 591 Chen, N.F., Ingersoll, J., 1983. Exact pricing in linear factor models with finitely many assets: a note. 
Journal of Finance 38, 985­988. Cheng, B., Rachev, S., 1995. Multivariate stable futures prices. Mathematical Finance 5, 133­153. Cochrane, J.H., 1999. Asset pricing. Manuscript. University of Chicago. Connor, G., 1984. A unified beta pricing theory. Journal of Economic Theory 34, 13­31. Cox, J., Leland, H., 1982. On dynamic investment strategies. In: Proceeding of Seminar on the Analysis of Security Prices 26. Center for Research in Security Prices, University of Chicago. Dybvig, P., 1983. An explicit bound on individual assets' deviations from APT pricing in a finite economy. Journal of Financial Economics 12, 483­496. Dybvig, P., 1988a. Distributional analysis of portfolio choice. Journal of Business 61, 369­393. Dybvig, P., 1988b. Inefficient dynamic portfolio strategies or how to throw away a million dollars in the stock market. Review of Financial Studies 1, 67­88. Dybvig, P., Ingersoll, J., 1982. Mean­variance theory in complete markets. Journal of Business 55, 233­251. Dybvig, P., Ross, S., 1982. Portfolio efficient sets. Econometrica 50, 1525­1546. Dybvig, P., Ross, S., 1985a. The analytics of performance measurement using a security market line. Journal of Finance 40, 401­416. Dybvig, P., Ross, S., 1985b. Differential information and performance measurement using a security market line. Journal of Finance 40, 383­399. Dybvig, P., Ross, S., 1987. Arbitrage. In: The New Palgrave: A Dictionary of Economics. Stockton Press, New York. Fama, E., 1963. Mandelbrot and the stable Paretian hypothesis. Journal of Business 36, 420­429. Fama, E., 1965a. The behavior of stock market prices. Journal of Business 38, 34­105. Fama, E., 1965b. Portfolio analysis in a stable Paretian market. Management Science 11, 404­419. Fishburn, P., 1964. Decision and Value Theory. Wiley, New York. Fishburn, P., 1980. Stochastic dominance and moments distributions. Mathematical Operation Research 5, 94­ 100. Friedman, M., Savage, L.J., 1948. 
The utility analysis of choices involving risk. Journal of Political Economy 56, 279­304. Gamrowski, B., Rachev, S., 1994. Stable models in testable asset pricing. In: Approximation, Probability and Related Fields. Plenum Press, New York. Gamrowski, B., Rachev, S., 1999. A testable version of the Pareto-stable CAPM. Mathematical and Computer Modeling 29, 61­81. Giacometti, R., Ortobelli, S., 2001. A comparison between dispersion measures for the asset allocation problem. Technical Report. Department of Mathematics, Statistics, Computer Science and Applications, University of Bergamo, Italy. Götzenberger, G., Rachev, S., Schwartz, E., 1999. Performance measurements: the stable Paretian approach. In: Applied Mathematics Reviews, Vol. 1. World Scientific, to appear. Grinblatt, M., Titman, S., 1983. Factor pricing in a finite economy. Journal of Financial Economics 12, 497­508. Hadar, J., Russel, W., 1969. Rules of ordering uncertain prospects. American Economic Review 59, 25­34. Hanoch, G., Levy, H., 1969. The efficiency analysis of choices involving risk. Review of Economic Studies 36, 35­46. Hansen, L.P., Jagannathan, R., 1991. Implications of security market data for models of dynamic economies. Journal of Political Economy 99, 225­262. Hardin, Jr., 1984. Skewed stable variable and processes. Technical Report 79. Center for Stochastic Processes at the University of North Carolina, Chapel Hill. Huberman, G., 1982. A simple approach to arbitrage pricing theory. Journal of Economic Theory 28, 183­191. Ingersoll, J., Jr., 1987. Theory of Financial Decision Making. Rowman & Littlefield, Totowa. Janicki, A., Weron, A., 1994. Simulation and Chaotic Behavior of Stable Stochastic Processes. Marcel Dekker, New York. Jarrow, R., 1986. The relationship between arbitrage and first order stochastic dominance. Journal of Finance 41, 915­921. 592 S. Ortobelli et al. Jean, W.H., 1971. The extension of portfolio analysis to three or more parameters. 
Journal of Financial and Quantitative Analysis, 505­515. Klebanov, L.B., Rachev, S., Safarian, G., 2000. Local pre-limit theorems and their applications to finance. Applied Mathematics Letters 13, 70­73. Klebanov, L.B., Rachev, S., Szekely, G., 2001. The central pre-limit theorem and its applications. In: Stable Models in Finance. Pergamon Press, to appear. Kocherlakota, N.R., 1997. Testing the consumption CAPM with heavy-tailed pricing errors. Macroeconomic Dynamics 1, 551­567. Konno, H., Yamazaki, H., 1991. Mean­absolute deviation portfolio optimization model and its application to Tokyo stock market. Management Science 37, 519­531. Kraus, A., Litzenberger, R., 1976. Skewness preference and the valuation of risk assets. Journal of Finance 31, 1085­1100. Kroll, Y., Levy, H., 1979. Stochastic dominance with a riskless asset: an imperfect market. Journal of Financial and Quantitative Analysis 14, 179­204. Levy, H., 1990. Stochastic dominance. In: Utility and Probability. The Macmillan Press, United Kingdom. Levy, H., 1992. Stochastic dominance and expected utility: survey and analysis. Management Science 38, 555­ 593. Levy, H., Kroll, Y., 1976. Stochastic dominance with riskless assets. Journal of Financial and Quantitative Analysis 11, 743­773. Lintner, J., 1965. The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics 47, 13­37. Machina, M., 1982. Expected utility analysis without the independence axiom. Econometrica 50, 277­323. Mandelbrot, B., 1963a. New methods in statistical economics. Journal of Political Economy 71, 421­440. Mandelbrot, B., 1963b. The variation of certain speculative prices. Journal of Business 26, 394­419. Mandelbrot, B., 1967. The variation of some other speculative prices. Journal of Business 40, 393­413. Mandelbrot, B.B., Taylor, H.M., 1967. On the distribution of stock price differences. Operations Research 15, 1057­1062. Markowitz, H., 1952. 
Portfolio selection. Journal of Finance 7, 77–91.
Markowitz, H., 1959. Portfolio Selection; Efficient Diversification of Investment. Wiley, New York.
Markowitz, H., 1977. An algorithm for finding undominated portfolios. In: Levy, H., Sarnat, M. (Eds.), Financial Decision Making under Uncertainty. Academic Press, New York.
Markowitz, H., 1987. Mean-Variance Analysis in Portfolio Choice and Capital Markets. Blackwell, Oxford.
Mehra, R., Prescott, E.C., 1985. The equity premium: a puzzle. Journal of Monetary Economics 15, 145–161.
Milne, F., 1988. Arbitrage and diversification in a general equilibrium asset economy. Econometrica 56, 815–840.
Mittnik, S., Rachev, S., Paolella, M., 1997. Stable Paretian modelling in finance: some empirical and theoretical aspects. In: Adler, R., et al. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques for Analyzing Heavy Tailed Distributions. Birkhäuser, Boston.
Mossin, J., 1966. Equilibrium in a capital asset market. Econometrica 34, 768–783.
Nolan, J., 1997. Numerical computation of stable densities and distribution functions. Communications in Statistics. Stochastic Models 13, 759–774.
Ogryczak, W., Ruszczynski, A., 1999. From stochastic dominance to mean-risk models: semideviations as risk measures. European Journal of Operational Research 116, 33–50.
Ogryczak, W., Ruszczynski, A., 2001. On consistency of stochastic dominance and mean-semideviation models. Mathematical Programming 89, 217–232.
Ortobelli, S., 2001. The classification of parametric choices under uncertainty: analysis of the portfolio choice problem. Theory and Decision 51, 297–327.
Ortobelli, S., Huber, I., Höchstötter, M., Rachev, S., 2001. A comparison among Gaussian and non-Gaussian portfolio choice models. In: Proceeding IFAC SME 2001.
Ortobelli, S., Huber, I., Schwartz, E., 2002. Portfolio selection with stable distributed returns. Mathematical Methods of Operations Research 55, 265–300.
Ortobelli, S., Rachev, S., Schwartz, E., 2002. The problem of optimal asset allocation with stable distributed returns. Technical Report. University of Karlsruhe.
Owen, J., Rabinovitch, R., 1983. On the class of elliptical distributions and their applications to the theory of portfolio choice. Journal of Finance 38, 745–752.
Paulauskas, V., Rachev, S., 1999. Maximum likelihood estimators in regression models with infinite variance innovations. Technical Report. Department of Statistics, UCSB, Santa Barbara, CA 93106.
Pitman, 1939. The estimation of the location and scale parameters of a continuous population of any given form. Biometrika 30, 391–421.
Quirk, J., Saposnik, R., 1962. Admissibility and measurable utility function. Review of Economic Studies 29, 140–146.
Rachev, S., 1991. Probability Metrics and the Stability of Stochastic Models. Wiley, New York.
Rachev, S., Mittnik, S., 2000. Stable Paretian Models in Finance. Wiley, New York.
Rachev, S., Xin, H., 1993. Test on association of random variables in the domain of attraction of multivariate stable law. Probability and Mathematical Statistics 14, 125–141.
Roll, R., 1977. A critique of the asset pricing theory's tests. Journal of Financial Economics 4, 129–176.
Roll, R., 1978. Ambiguity when performance is measured by the security market line. Journal of Finance 33, 1031–1069.
Roll, R., 1979a. Testing a portfolio for ex ante mean/variance efficiency. In: Elton, E., Gruber, M. (Eds.), TIMS Studies in the Management Sciences. North-Holland, Amsterdam.
Roll, R., 1979b. A reply to Meyer and Rice (1979). Journal of Financial Economics 7, 391–400.
Ross, S., 1975. Return, risk and arbitrage. In: Friend, I., Bicksler, J. (Eds.), Studies in Risk and Return. Ballinger, Cambridge, MA.
Ross, S., 1976. The arbitrage theory of capital asset pricing. Journal of Economic Theory 13, 341–360.
Ross, S., 1978a.
Mutual fund separation in financial theory: the separating distributions. Journal of Economic Theory 17, 254–286.
Ross, S., 1978b. A simple approach to the valuation of risky streams. Journal of Business 51, 453–475.
Rothschild, M., Stiglitz, J., 1970. Increasing risk: I. definition. Journal of Economic Theory 2, 225–243.
Rubinstein, M., 1976. The valuation of uncertain income streams and the pricing of options. Bell Journal of Economics 7, 407–425.
Samorodnitsky, G., Taqqu, M.S., 1994. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York.
Samuelson, P.A., 1967. General proof that diversification pays. Journal of Financial and Quantitative Analysis 2, 1–13.
Samuelson, P.A., 1969. The fundamental approximation theorem of portfolio analysis in terms of means, variances and higher moments. Review of Economic Studies 37, 537–542.
Samuelson, P.A., Merton, R., 1975. Generalized mean-variance tradeoffs for best perturbation corrections to approximate portfolio decisions. Journal of Finance 30, 27–40.
Shaked, M., Shanthikumar, G., 1994. Stochastic Orders and their Applications. Academic Press, Harcourt, Brace & Company, New York.
Shalit, H., Yitzhaki, S., 1984. Mean-Gini, portfolio theory, and the pricing of risky assets. Journal of Finance 39, 1449–1468.
Sharpe, W., 1964. Capital asset prices: a theory of market equilibrium under conditions of risk. Journal of Finance 19, 425–442.
Simaan, Y., 1993. Portfolio selection and asset pricing: three parameter framework. Management Science 5, 568–577.
Speranza, M.G., 1993. Linear programming models for portfolio optimization. Finance 14, 107–123.
Tobin, J., 1958. Liquidity preference as behavior toward risk. Review of Economic Studies 25, 65–86.
Tobin, J., 1965. The theory of portfolio selection. In: Hahn, F.H., Brechling, F.P.R. (Eds.), The Theory of Interest Rates. Macmillan, London.
Tokat, Y., Rachev, S., Schwartz, E., 2002.
The stable non-Gaussian asset allocation: a comparison with the classical Gaussian approach. Journal of Economic Dynamics and Control, to appear.
von Neumann, J., Morgenstern, O., 1953. Theory of Games and Economic Behavior, 3rd edition. Princeton University Press, Princeton, NJ.
Yitzhaki, S., 1982. Stochastic dominance, mean, variance and Gini's mean difference. American Economic Review 72, 178–185.
Zolotarev, V.M., 1986. One-Dimensional Stable Distributions. In: Translation of Mathematical Monographs, Vol. 65. American Mathematical Society, Providence, RI. Translation of the original 1983 in Russian.

Chapter 15

PORTFOLIO MODELING WITH HEAVY TAILED RANDOM VECTORS

MARK M. MEERSCHAERT
Department of Mathematics, University of Nevada, Reno, USA
e-mail: mcubed@unr.edu

HANS-PETER SCHEFFLER
Fachbereich Mathematik, University of Dortmund, 44221 Dortmund, Germany
e-mail: hps@math.uni-dortmund.de

Contents
Abstract
Keywords
1. Introduction
2. Heavy tails
3. Central limit theorems
4. Matrix scaling
5. The spectral decomposition
6. Sample covariance matrix
7. Dependent random vectors
8. Tail estimation
9. Tail estimator proof for dependent random vectors
10. Conclusions
References

Handbook of Heavy Tailed Distributions in Finance, Edited by S.T. Rachev. © 2003 Elsevier Science B.V. All rights reserved.

Abstract

Since the work of Mandelbrot in the 1960s, a great deal of empirical evidence has accumulated for heavy tailed models in finance. In these models, the probability of a large fluctuation falls off like a power law. The generalized central limit theorem shows that these heavy-tailed fluctuations accumulate to a stable probability distribution. If the tails are not too heavy then the variance is finite and we find the familiar normal limit, a special case of stable distributions.
Otherwise the limit is a nonnormal stable distribution, whose bell-shaped density may be skewed, and whose probability tails fall off like a power law. The most important model parameter for such distributions is the tail thickness $\alpha$, which governs the rate at which the probability of large fluctuations diminishes. A smaller value of $\alpha$ means that the probability tails are fatter, implying more volatility. In fact, when $\alpha < 2$ the theoretical variance is infinite. A portfolio can be modeled using random vectors, where each entry of the vector represents a different asset. The tail parameter $\alpha$ usually depends on the coordinate. The wrong coordinate system can mask variations in $\alpha$, since the heaviest tail tends to dominate. A judicious choice of coordinate system is given by the eigenvectors of the sample covariance matrix. This isolates the heaviest tails, associated with the largest eigenvalues, and allows a more faithful representation of the dependence between assets.

Keywords

multivariable regular variation, moment estimates, moving averages, generalized domains of semistable attraction, R-O varying measures

1. Introduction

In order to construct a useful probability model for an investment portfolio, we must consider the dependence between assets. If we accept the premise that price changes are heavy tailed, then we are led to consider random vectors with heavy tails. In this chapter, we survey those portions of the theory of heavy tailed random vectors that seem relevant to portfolio analysis. The most flexible models recognize the possibility that the thickness of probability tails varies in different directions, implying the need for matrix scaling. A judicious change of coordinates often simplifies the model, and may uncover features masked by the original coordinates. The original coordinates are the price changes (or returns) for each asset.
The new coordinates can be interpreted as market indices, chosen to capture certain features of the market. In some popular heavy-tailed finance models, the tails are so heavy that the theoretical variance of price changes is undefined. For these models, the theoretical covariance matrix is also undefined. Of course the sample variance and the sample covariance matrix can always be computed for any data set, but these statistics are not estimating the usual model parameters. One of the most interesting discoveries in heavy tailed modeling is that, in the infinite variance case, the sample covariance matrix actually contains quite a bit of important information about the underlying distribution. In fact, the eigenvectors of this matrix provide a very useful coordinate system. We illustrate the application of this principle, and we also include a previously unpublished proof, extending the method to more general heavy tailed vector models with time dependence.

2. Heavy tails

A probability distribution has heavy tails if some of its moments fail to exist. Suppose that $X$ is a random variable with density $f(x)$ so that
\[ P(a \le X \le b) = \int_a^b f(x)\,dx. \]
The $k$-th moment of the random variable $X$ is defined by an improper integral
\[ \mu_k = E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx. \]
The mean $\mu = \mu_1$, variance $\sigma^2 = \mu_2 - \mu_1^2$, skewness and kurtosis depend on these moments. Because $\mu_k$ is an improper integral, it may not exist. If $f(x)$ is a normal density, a lognormal density, or any other density whose tails fall off exponentially then all of the moments $\mu_k$ exist. But if $f(x)$ has heavy tails that fall off like a power law, then some of the moments $\mu_k$ will not exist. The simplest example of a heavy tailed distribution is a Pareto, invented to model the distribution of incomes. A Pareto random variable satisfies $P(X > x) = Cx^{-\alpha}$ so that the probability of large outcomes falls off like a power law.
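The failure of a moment to exist can be made concrete by cutting the moment integral off at a finite level $T$ and letting $T$ grow. The following is a minimal sketch, assuming the Pareto density $\alpha C x^{-\alpha-1}$ on $x > C^{1/\alpha}$ that results from differentiating the tail $P(X > x) = Cx^{-\alpha}$; the function name and the cutoff parameter `T` are illustrative choices and do not come from the chapter.

```python
import math

def truncated_pareto_moment(k, alpha, C, T):
    """k-th moment of a Pareto variable with the integral cut off at T.

    Assumes the tail P(X > x) = C * x**(-alpha) for x > x0 = C**(1/alpha),
    that is, the density alpha * C * x**(-alpha - 1) on (x0, infinity).
    """
    x0 = C ** (1.0 / alpha)
    if T <= x0:
        return 0.0
    if k == alpha:
        # The integrand reduces to alpha * C / x, so the growth is logarithmic.
        return alpha * C * math.log(T / x0)
    return alpha * C * (T ** (k - alpha) - x0 ** (k - alpha)) / (k - alpha)

if __name__ == "__main__":
    alpha, C = 3.0, 1.0
    for k in (1, 2, 3, 4):
        values = [truncated_pareto_moment(k, alpha, C, T) for T in (1e2, 1e4, 1e6)]
        print(k, values)
```

With $\alpha = 3$ the truncated first and second moments settle down as `T` increases, while the third and fourth keep growing, which is the numerical signature of a moment that does not exist.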
The Pareto density is defined by
\[ f(x) = \begin{cases} \alpha C x^{-\alpha-1} & \text{for } x > C^{1/\alpha}, \\ 0 & \text{otherwise} \end{cases} \]
so that
\[ \mu_k = \int_{C^{1/\alpha}}^{\infty} \alpha C x^{k-\alpha-1}\,dx = \alpha C^{k/\alpha} \int_1^{\infty} y^{k-\alpha-1}\,dy = \alpha C^{k/\alpha} \left[ \frac{y^{k-\alpha}}{k-\alpha} \right]_{y=1}^{\infty} \]
using the substitution $x = C^{1/\alpha}y$. If $k < \alpha$ then the limit at infinity is zero and $\mu_k = \alpha C^{k/\alpha}/(\alpha - k)$, but if $k \ge \alpha$ then this improper integral diverges, so that the $k$-th moment does not exist.

Pareto distributions are closely related to some other familiar distributions. If $U$ has a uniform distribution on $(0,1)$, then $X = U^{-1/\alpha}$ has a Pareto distribution with tail parameter $\alpha$. To check this, write
\[ P(X > x) = P(U^{-1/\alpha} > x) = P(U < x^{-\alpha}) = x^{-\alpha}. \]
If $X$ is Pareto with $P(X > x) = x^{-\alpha}$, then $Y = \ln X$ has an exponential distribution with rate $\alpha$. To see this, note that
\[ P(Y > y) = P(\ln X > y) = P(X > e^y) = (e^y)^{-\alpha} = e^{-\alpha y}. \]

Some other familiar distributions have Pareto-like power law tails, causing some moments to diverge. If $Y$ has a Student-$t$ distribution with $\nu$ degrees of freedom, then $P(|Y| > y) \sim Cy^{-\alpha}$ where $\alpha = \nu$.[1] Then $E(Y^k)$ exists only for $k < \alpha$. If $Y$ has a Gamma distribution with density proportional to $y^{p-1}e^{-qy}$ then the log-Gamma random variable $X$ defined by $Y = \ln X$ satisfies $P(X > x) \approx Cx^{-\alpha}$ for $x$ large, where $\alpha = q$. Some other distributions with Pareto-like tails are the stable and operator stable distributions, which will be discussed later in this chapter.

[1] Here $f(x) \sim g(x)$ means that $f(x)/g(x) \to 1$ as $x \to \infty$.

Heavy tailed random variables with $P(|X| > x) \sim Cx^{-\alpha}$ are observed in many real world applications. Estimation of the tail parameter $\alpha$ is important, because it determines which moments exist. Anderson and Meerschaert (1998) find heavy tails in a river flow with $\alpha \approx 3$, so that the variance is finite but the fourth moment is infinite. Tessier et al. (1996) find heavy tails with $2 < \alpha < 4$ for a variety of river flows and rainfall accumulations. Hosking and Wallis (1987) find evidence of heavy tails with $\alpha \approx 5$ for annual flood levels of a river in England. Benson, Wheatcraft and Meerschaert (2000), Benson et al. (2001) model concentration profiles for tracer plumes in groundwater using stochastic models whose heavy tails have $1 < \alpha < 2$, so that the mean is finite but the variance is infinite. Heavy tail distributions with $1 < \alpha < 2$ are used in physics to model anomalous diffusion, where a cloud of particles spreads faster than classical Brownian motion predicts (Blumen, Zumofen and Klafter, 1989; Klafter, Blumen and Shlesinger, 1987; Shlesinger, Zaslavsky and Frisch, 1994). More applications to physics with $0 < \alpha < 2$ are cataloged in Uchaikin and Zolotarev (1999). Resnick and Stărică (1995) examine the quiet periods between transmissions for a networked computer terminal, and find heavy tails with $0 < \alpha < 1$, so that the mean and variance are both infinite. Several additional applications to computer science, finance, and signal processing appear in Adler, Feldman and Taqqu (1998). More applications to signal processing can be found in Nikias and Shao (1995).

Mandelbrot (1963) and Fama (1965a) pioneered the use of heavy tail distributions in finance. Mandelbrot (1963) presents graphical evidence that historical daily price changes in cotton have heavy tails with $\alpha \approx 1.7$, so that the mean exists but the variance is infinite. Jansen and de Vries (1991) argue that daily returns for many stocks and stock indices have heavy tails with $3 < \alpha < 5$, and discuss the possibility that the October 1987 stock market plunge might be just a heavy tailed random fluctuation. Loretan and Phillips (1994) use similar methods to estimate heavy tails with $2 < \alpha < 4$ for returns from numerous stock market indices and exchange rates. This indicates that the variance is finite but the fourth moment is infinite. Both daily and monthly returns show heavy tails with similar values of $\alpha$ in this study. Rachev and Mittnik (2000) use different methods to find heavy tails with $1 < \alpha < 2$ for a variety of stocks, stock indices, and exchange rates.
McCulloch (1996) uses similar methods to re-analyze the data in Jansen and de Vries (1991), Loretan and Phillips (1994), and obtains estimates of $1.5 < \alpha < 2$. This is important because the variance of price returns is finite if $\alpha > 2$ and infinite if $\alpha < 2$. While there is disagreement about the true value of $\alpha$, depending on which model is employed, all of these studies agree that financial data is typically heavy tailed, and that the tail parameter $\alpha$ varies between different assets.

Portfolio analysis involves the joint probability distribution of several prices or returns $X_1,\dots,X_d$, where $d$ is the number of assets in the portfolio. It is natural to model this set of numbers as a $d$-dimensional random vector $X = (X_1,\dots,X_d)'$. We say that $X$ has heavy tails if $E(\|X\|^k)$ is undefined for some $k = 1,2,3,\dots$. Let us consider the practical problem of portfolio modeling. We choose $d$ assets and research historical performance to obtain data of the form $X_i(t)$ where $i = 1,\dots,d$ is the asset and $t = 0,\dots,n$ is the time variable. Typically the distribution of values $X_i(0),\dots,X_i(n)$ has a heavy tail whose parameter $\alpha_i$ can be estimated from this data. The research of Jansen and de Vries (1991), Loretan and Phillips (1994), and Rachev and Mittnik (2000) indicates, not surprisingly, that $\alpha_i$ will vary depending on the asset. Then the random vectors $X_t = (X_1(t),\dots,X_d(t))'$ will have heavier tails in some directions than in others. Despite this well known fact, most existing research on heavy tailed portfolio modeling has assumed that the probability tails are the same in every direction. Nolan, Panorska and McCulloch (2001) consider such a model, based on the multivariable stable distribution, for a vector of two exchange rates. They argue that $\alpha$ is the same for both.[2] Rachev and Mittnik (2000) use a multivariable stable model for portfolio analysis, so that $\alpha$ is the same for every asset.
The same approach was also applied to portfolio analysis by Bawa, Elton and Gruber (1979), Belkacem, Véhel and Walter (2000), Chamberlain, Cheung and Kwan (1990), Fama (1965b), Gamba (1999), Press (1982), Rachev and Han (2000), and Ziemba (1974). If this modeling approach can be enhanced to allow $\alpha_i$ to vary with the asset, a more realistic and flexible representation of financial portfolios can be achieved. The goal of this chapter is to show how this can be accomplished, using modern central limit theory.

[2] Example 8.1 gives an alternative operator stable model for the same data set.

3. Central limit theorems

Normal and log-normal models are popular in finance because of their simplicity and familiarity. Their use can also be justified by the central limit theorem. If $X, X_1, X_2, X_3, \dots$ are independent and identically distributed (IID) random variables with mean $m = E(X)$ and finite variance $\sigma^2 = E[(X - m)^2]$ then the central limit theorem says that
\[ \frac{(X_1 + \cdots + X_n) - nm}{n^{1/2}} \Rightarrow Y, \tag{3.1} \]
where $Y$ is a normal random variable with mean zero and variance $\sigma^2$, and $\Rightarrow$ means convergence of probability distributions. Essentially, (3.1) means that $X_1 + \cdots + X_n$ is approximately normal (with mean $nm$ and variance $n\sigma^2$) for $n$ large. If the summands $X_i$ represent independent price shocks, then their sum is the price change over a period of time. If price changes are accumulations of many IID shocks, then they should be normally distributed. If price changes accumulate multiplicatively, taking logs changes the product into a sum, leading to a log-normal model.

For portfolio analysis, we need to consider a vector of prices. Suppose that $X, X_1, X_2, X_3, \dots$ are IID random vectors on a $d$-dimensional Euclidean space $\mathbb{R}^d$. If $X = (X_1,\dots,X_d)'$ then the mean $m = E(X)$ is a vector with $i$-th entry $m_i = E(X_i)$, the covariance matrix $C$ is a $d \times d$ matrix with $ij$ entry
\[ c_{ij} = \mathrm{Cov}(X_i, X_j) = E[(X_i - m_i)(X_j - m_j)], \]
and the central limit theorem says that
\[ \frac{(X_1 + \cdots + X_n) - nm}{n^{1/2}} \Rightarrow Y, \tag{3.2} \]
where $Y$ is a normal random vector with mean zero and covariance matrix $C = E[YY']$. In this case, it simplifies the analysis to change coordinates. If the matrix $P$ defines the change of coordinates then it follows from (3.2) that
\[ \frac{(PX_1 + \cdots + PX_n) - nPm}{n^{1/2}} \Rightarrow PY, \tag{3.3} \]
where $PY$ is multivariate normal with mean zero and covariance matrix $PCP' = E[(PY)(PY)']$. If we take the new coordinate system defined by the eigenvectors of the covariance matrix $C$, then the limit $PY$ has independent normal marginals. The eigenvalues of $C$ determine the variance of each marginal, so their square roots measure volatility. The corresponding marginals of $PX$ are all linear combinations of the original assets, chosen to be asymptotically independent. This coordinate system is one of the cornerstones of Markowitz's theory of optimal portfolios, see for example Elton and Gruber (1995).

For heavy tailed random variables, the central limit theorem may not hold, because the second moment might not exist. An extended central limit theorem applies in this case. If $X, X_1, X_2, X_3, \dots$ are IID random variables we say that $X$ belongs to the domain of attraction of some random variable $Y$, and we write $X \in \mathrm{DOA}(Y)$, if
\[ \frac{(X_1 + \cdots + X_n) - b_n}{a_n} \Rightarrow Y. \tag{3.4} \]
For mathematical reasons we exclude the degenerate case where $Y = c$ with probability one. The limits in (3.4) are called stable. If $E(X^2)$ exists then the classical central limit theorem shows that $Y$ is normal, a special case of stable. In this case, we can take $a_n = n^{1/2}$ and $b_n = nE(X)$. If $X$ has heavy tails with $P(|X| > r) \sim Cr^{-\alpha}$ then the situation depends on the tail thickness $\alpha$. If $\alpha > 2$ then $E(X^2)$ exists and sums are asymptotically normal. But if $0 < \alpha < 2$ then $E(X^2) = \infty$ and (3.4) holds with $a_n = n^{1/\alpha}$ as long as a tail balancing condition holds:
\[ \frac{P(X > r)}{P(|X| > r)} \to p \quad\text{and}\quad \frac{P(X < -r)}{P(|X| > r)} \to q \quad\text{as } r \to \infty \tag{3.5} \]
for some $0 \le p, q \le 1$ with $p + q = 1$.
A proof of the extended central limit theorem can be found in Gnedenko and Kolmogorov (1968), see also Feller (1971) and Meerschaert and Scheffler (2001a). The condition for $X \in \mathrm{DOA}(Y)$ is stated in terms of regular variation. A function $f(r)$ varies regularly with index $\rho$ if
\[ \lim_{r \to \infty} \frac{f(\lambda r)}{f(r)} = \lambda^{\rho} \quad\text{for all } \lambda > 0. \tag{3.6} \]
For $Y$ stable with index $0 < \alpha < 2$, so that $Y$ is not normal, a necessary and sufficient condition for $X \in \mathrm{DOA}(Y)$ is that $P(|X| > r)$ varies regularly with index $-\alpha$ and (3.5) holds for some $0 \le p, q \le 1$ with $p + q = 1$. If we have $P(|X| > r) \sim Cr^{-\alpha}$ then it is easy to see that $P(|X| > r)$ varies regularly with index $-\alpha$, but the definition also allows a slightly more general tail behavior. For example, if $P(|X| > r) \sim Cr^{-\alpha}\log r$ then $P(|X| > r)$ still varies regularly with index $-\alpha$. The norming constants $a_n$ in (3.4) can always be chosen according to the formula $nP(|X| > a_n) \to C$. If we have $P(|X| > r) \sim Cr^{-\alpha}$ this leads to $a_n = n^{1/\alpha}$. In practical applications, it is common to assume that $P(|X| > r) \sim Cr^{-\alpha}$ because a practical procedure exists for estimating the parameters $C$, $\alpha$ for a given heavy tailed data set.[3]

Stable distributions are typically specified in terms of their characteristic functions (Fourier transforms). If $Y$ is stable with density $f(y)$ its characteristic function
\[ E(e^{ikY}) = \int_{-\infty}^{\infty} e^{iky} f(y)\,dy \]
is of the form $e^{\psi(k)}$ where
\[ \psi(k) = \begin{cases} ibk - \sigma^{\alpha}|k|^{\alpha}\left[1 - i\beta\,\mathrm{sign}(k)\tan\left(\frac{\pi\alpha}{2}\right)\right] & \text{for } \alpha \ne 1, \\ ibk - \sigma|k|\left[1 + i\beta\,\frac{2}{\pi}\,\mathrm{sign}(k)\ln|k|\right] & \text{for } \alpha = 1. \end{cases} \tag{3.7} \]
The entire class of nondegenerate stable laws on $\mathbb{R}^1$ is given by these formulas with index $\alpha \in (0,2]$, scale $\sigma \in (0,\infty)$, skewness $\beta \in [-1,+1]$, and center $b \in (-\infty,\infty)$. The stable distribution with these parameters will be written as $S_\alpha(\sigma,\beta,b)$ using the notation of Samorodnitsky and Taqqu (1994). The skewness $\beta = p - q$ governs the deviations of the distribution from symmetry, so that $f(y)$ is symmetric if $\beta = 0$.
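Formula (3.7) translates directly into code. The following is a minimal sketch of the log-characteristic function $\psi(k)$ in the $S_\alpha(\sigma,\beta,b)$ parametrization described above; the function names `stable_log_cf` and `stable_cf` are illustrative, and the code evaluates the formula only (it does not compute densities).

```python
import cmath
import math

def stable_log_cf(k, alpha, sigma, beta, b):
    """Log-characteristic function psi(k) of S_alpha(sigma, beta, b), following (3.7)."""
    if not (0 < alpha <= 2 and sigma > 0 and -1 <= beta <= 1):
        raise ValueError("parameters outside the stable range")
    if k == 0:
        return 0j
    sign = 1.0 if k > 0 else -1.0
    if alpha != 1:
        # Branch alpha != 1: skewness enters through tan(pi * alpha / 2).
        skew = 1 - 1j * beta * sign * math.tan(math.pi * alpha / 2)
        return 1j * b * k - (sigma * abs(k)) ** alpha * skew
    # Branch alpha == 1: skewness enters through (2 / pi) * ln|k|.
    skew = 1 + 1j * beta * (2 / math.pi) * sign * math.log(abs(k))
    return 1j * b * k - sigma * abs(k) * skew

def stable_cf(k, alpha, sigma, beta, b):
    """Characteristic function E[exp(i k Y)] = exp(psi(k))."""
    return cmath.exp(stable_log_cf(k, alpha, sigma, beta, b))

if __name__ == "__main__":
    # alpha = 2 with beta = 0 gives psi(k) = -sigma^2 k^2, a centered normal law.
    print(stable_log_cf(1.5, 2.0, 1.0, 0.0, 0.0))
    # alpha = 1 with beta = 0 gives psi(k) = -sigma |k|, a symmetric Cauchy law.
    print(stable_log_cf(2.0, 1.0, 1.0, 0.0, 0.0))
```

In this parametrization $\alpha = 2$ yields $\psi(k) = -\sigma^2 k^2$, so the normal limit has variance $2\sigma^2$ rather than $\sigma^2$.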
The scale $\sigma$ and the center $b$ have the usual meaning that if $Y$ has a $S_\alpha(1,\beta,0)$ distribution then $\sigma Y + b$ has a $S_\alpha(\sigma,\beta,b)$ distribution, except that for $\alpha = 1$ and $\beta \ne 0$ multiplication by $\sigma$ introduces a nonlinear change in the shift. The stable index $\alpha$ governs the tails of $Y$, and in fact $P(|Y| > r) \sim Cr^{-\alpha}$ where
\[ \sigma^{\alpha} = \begin{cases} C\,\dfrac{\Gamma(2-\alpha)}{1-\alpha}\cos\left(\dfrac{\pi\alpha}{2}\right) & \text{for } \alpha \ne 1, \\ C\,\dfrac{\pi}{2} & \text{for } \alpha = 1 \end{cases} \tag{3.8} \]
in the nonnormal case $0 < \alpha < 2$. The tails are balanced so that
\[ \frac{P(Y > r)}{P(|Y| > r)} \to p \quad\text{and}\quad \frac{P(Y < -r)}{P(|Y| > r)} \to q \quad\text{as } r \to \infty. \tag{3.9} \]

[3] See Section 8.

Stable laws belong to their own domain of attraction, but more is true. In fact, if $Y_n$ are IID with $Y$ then
\[ \frac{(Y_1 + \cdots + Y_n) - b_n}{n^{1/\alpha}} \overset{d}{=} Y \tag{3.10} \]
for some $b_n$, where $\overset{d}{=}$ indicates that both sides have the same probability distribution. Sums of IID stable laws are again stable with the same $\alpha$, $\beta$. Although there is no closed analytical formula for stable densities, the efficient computational method of Nolan (1997, 2002; Resnick and Stărică, 1995) can be used to plot density curves. Nolan (2001) uses these methods to compute maximum likelihood estimators for the stable parameters, see also Mittnik et al. (1999), Mittnik, Doganoglu and Chenyao (1999).

If $X_n$ is the price change on day $n$ then the accumulation of these changes will be approximately stable, assuming that $X_n$ are IID with $X$ and $P(|X| > x) \sim Cx^{-\alpha}$. If $\alpha < 2$, as in the cotton prices considered in Mandelbrot (1963), then the price obtained by adding these changes will be approximately stable with a power law tail. The balancing parameters $p$ and $q$ describe the probability that a large change in price will be positive or negative, respectively. The scale $\sigma$ (or equivalently, the dispersion $C$) depends on the price units (e.g., US dollars). If $2 < \alpha < 4$ then the sum of these price changes will be asymptotically normal. However, the rule of thumb that sums look normal for $n \ge 30$ is no longer reliable. The heavy tails slow the rate of convergence in the central limit theorem.
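The slowdown is easy to probe numerically. The sketch below is not the authors' code: it draws Pareto variables with $\alpha = 3$ through the transform $U^{-1/\alpha}$ from Section 2, forms sums of $n = 50$ of them, and reports simple summaries of the sums. The function names, the seed, and the use of sample skewness as a diagnostic are assumptions made for illustration.

```python
import random
import statistics

def pareto_sample(alpha, rng):
    """Draw X = U**(-1/alpha), which satisfies P(X > x) = x**(-alpha) for x >= 1."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so the power is always defined
    return u ** (-1.0 / alpha)

def simulate_sums(alpha=3.0, n=50, repetitions=100, seed=0):
    """Return `repetitions` independent sums of n Pareto(alpha) variables."""
    rng = random.Random(seed)
    return [sum(pareto_sample(alpha, rng) for _ in range(n))
            for _ in range(repetitions)]

def sample_skewness(xs):
    """Third standardized sample moment; zero for a symmetric sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

if __name__ == "__main__":
    sums = simulate_sums()
    print("mean    ", statistics.fmean(sums))
    print("median  ", statistics.median(sums))
    print("skewness", sample_skewness(sums))
```

For $\alpha = 3$ each summand has mean $3/2$, so the sums center near 75. A sample skewness well above zero, or a mean noticeably above the median, indicates that $n = 50$ is not yet enough for the normal approximation to be accurate in the upper tail.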
To illustrate the point, we simulated Pareto random variables with $\alpha = 3$, using the fact that if $U$ is uniform on $(0,1)$ then $U^{-1/\alpha}$ is Pareto with tail parameter $\alpha$. We summed $n = 50$ of these random variables, and repeated the simulation 100 times to get an idea of the distribution of these sums. The boxplot in Figure 1 indicates that the distribution of the resulting sums is skewed to the right, with some outliers. The normal probability plot in Figure 2 indicates a significant deviation from normality. The moral of this story is that for heavy tailed random variables with $\alpha > 2$, sums eventually converge to a normal limit, but slower than usual.

Fig. 1. Sums of 50 Pareto variables with $\alpha = 3$. Their distribution is skewed to the right with several outliers.

Fig. 2. Sums of 50 Pareto variables with $\alpha = 3$. Upper tail shows systematic deviation from normal distribution.

For heavy tailed random vectors, a generalized central limit theorem applies. If $X, X_1, X_2, X_3, \dots$ are IID random vectors on $\mathbb{R}^d$ we say that $X$ belongs to the generalized domain of attraction of some full dimensional random vector $Y$ on $\mathbb{R}^d$, and we write $X \in \mathrm{GDOA}(Y)$, if
\[ A_n(X_1 + \cdots + X_n - b_n) \Rightarrow Y \tag{3.11} \]
for some $d \times d$ matrices $A_n$ and vectors $b_n \in \mathbb{R}^d$. The limits in (3.11) are called operator stable (Jurek and Mason, 1993; Sharpe, 1969). If $E(\|X\|^2)$ exists then the classical central limit theorem shows that $Y$ is multivariable normal, a special case of operator stable. In this case, we can take $A_n = n^{-1/2}I$ and $b_n = nE(X)$. If $X$ has heavy tails with $P(\|X\| > r) \sim Cr^{-\alpha}$ then the situation depends on the tail thickness $\alpha$. If $\alpha > 2$ then $E(\|X\|^2)$ exists and sums are asymptotically normal. But if $0 < \alpha < 2$ then $E(\|X\|^2) = \infty$ and (3.11) holds with $A_n = n^{-1/\alpha}I$ as long as a tail balancing condition holds:
\[ \frac{P(\|X\| > r,\ X/\|X\| \in B)}{P(\|X\| > r)} \to M(B) \quad\text{as } r \to \infty \tag{3.12} \]
for all Borel subsets[4] $B$ of the unit sphere $S = \{\theta \in \mathbb{R}^d : \|\theta\| = 1\}$ whose boundary has $M$-measure zero, where $M$ is a probability measure on the unit sphere which is not supported on any $d - 1$ dimensional subspace of $\mathbb{R}^d$. A proof of the generalized central limit theorem can be found in Rvačeva (1962) or Meerschaert and Scheffler (2001a). In this case, where the tails of $X$ fall off at the same rate in every direction, the limit $Y$ is multivariable stable (Samorodnitsky and Taqqu, 1994), a special case of operator stable. If $Y$ is multivariable stable with density $f(y)$ its characteristic function
\[ E(e^{ik \cdot Y}) = \int e^{ik \cdot y} f(y)\,dy \]
is of the form $e^{\psi(k)}$ where
\[ \psi(k) = ib \cdot k - \sigma^{\alpha} \int_{\|\theta\| = 1} |\theta \cdot k|^{\alpha} \left[1 - i\,\mathrm{sign}(\theta \cdot k)\tan\left(\frac{\pi\alpha}{2}\right)\right] M(d\theta) \]
for $\alpha \ne 1$ and
\[ \psi(k) = ib \cdot k - \sigma \int_{\|\theta\| = 1} |\theta \cdot k| \left[1 + i\,\frac{2}{\pi}\,\mathrm{sign}(\theta \cdot k)\ln|\theta \cdot k|\right] M(d\theta) \]
for $\alpha = 1$. The entire class of multivariable stable laws on $\mathbb{R}^d$ is given by these formulas with index $\alpha \in (0,2]$, scale $\sigma > 0$, mixing measure $M$ and center $b \in \mathbb{R}^d$. We say that $Y$ has distribution $S_\alpha(\sigma,M,b)$ in this case. The mixing measure $M$ is a probability distribution on the unit sphere in $\mathbb{R}^d$ that governs the tails of $Y$, so that $f(y)$ is symmetric if $M$ is symmetric. The center $b$ and scale $\sigma$ have the usual meaning that if $Y$ has a $S_\alpha(1,M,0)$ distribution then $\sigma Y + b$ has a $S_\alpha(\sigma,M,b)$ distribution, except when $\alpha = 1$. The stable index $\alpha$ governs the tails of $Y$ in the nonnormal case ($0 < \alpha < 2$). In fact, $P(\|Y\| > r) \sim Cr^{-\alpha}$ where $C$ is given by (3.8). The mixing measure $M$ is a multivariable analogue of the skewness $\beta$. If $d = 1$ then $M\{+1\} = p$ and $M\{-1\} = q$, since the unit sphere on $\mathbb{R}^1$ is the two point set $\{-1,+1\}$. In this case, $Y$ is stable with skewness $\beta = p - q$. The tails of a multivariable stable random vector are balanced so that
\[ \frac{P(\|Y\| > r,\ Y/\|Y\| \in B)}{P(\|Y\| > r)} \to M(B) \quad\text{as } r \to \infty. \tag{3.13} \]
If $d = 1$ this reduces to the tail balancing condition (3.9) for stable random variables.
Multivariable stable laws belong to their own domain of attraction, and if $Y_n$ are IID with $Y$ then
\[ \frac{(Y_1 + \cdots + Y_n) - b_n}{n^{1/\alpha}} \overset{d}{=} Y \tag{3.14} \]
for some $b_n$, so that sums of IID multivariable stable laws are again multivariable stable with the same $\alpha$. When $Y$ is nonnormal multivariable stable with distribution $S_\alpha(\sigma,M,b)$ for some $0 < \alpha < 2$, the necessary and sufficient condition for $X \in \mathrm{DOA}(Y)$ is that $P(\|X\| > r)$ varies regularly with index $-\alpha$ and the balanced tails condition (3.12) holds.

[4] The class of Borel subsets is the smallest class that includes open sets and is closed under complements and countable unions.

Example 3.1. The mixing measure governs the direction of large price jumps. Take $R_i$ IID Pareto random variables with $P(R > r) = Cr^{-\alpha}$. Take $\Theta_i$ to be IID random unit vectors with distribution $M$, independent of $(R_i)$. Then $X_i = R_i\Theta_i$ are IID random vectors with $P(\|X_i\| > r) = Cr^{-\alpha}$ and
\[ \frac{P(\|X_i\| > r,\ X_i/\|X_i\| \in B)}{P(\|X_i\| > r)} = P(\Theta_i \in B) = M(B) \]
for any Borel subset $B$ of the unit sphere, and so $X_i \in \mathrm{DOA}(Y)$ where $Y$ is multivariable stable with distribution $S_\alpha(\sigma,M,b)$ for any $b \in \mathbb{R}^d$. We can take $A_n = n^{-1/\alpha}I$ in (3.11), and $b$ depends on the choice of centering $b_n$. We call these heavy tailed random vectors multivariable Pareto. If we use a multivariable Pareto model for large jumps in the vector of prices for a portfolio, the parameter $\alpha$ governs the radius and the mixing measure $M$ governs the angle of large jumps. Sums of these IID jumps are asymptotically multivariable stable with the same index $\alpha$ and mixing measure $M$. The radius $R = \|Y\|$ satisfies $P(R > r) \sim Cr^{-\alpha}$ and the distribution of the angular component $\Theta = Y/\|Y\|$ conditional on $\|Y\| > r$ tends to $M$ as $r \to \infty$ in view of the tail balancing condition (3.13). In other words, multivariable stable random vectors are asymptotically multivariable Pareto on their tails. In a multivariable stable model for price jumps, the mixing measure determines the direction of large jumps.
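The construction in Example 3.1 can be sketched in a few lines. The code below assumes a discrete mixing measure $M$ that puts given weights on finitely many directions, which is only one special case; the function name `multivariable_pareto` and its arguments are illustrative and not taken from the chapter.

```python
import math
import random

def multivariable_pareto(alpha, C, directions, weights, size, seed=0):
    """Draw X_i = R_i * Theta_i with P(R > r) = C * r**(-alpha) for r >= C**(1/alpha).

    `directions` are vectors that get normalized to unit length; `weights`
    are the probabilities that the discrete mixing measure M gives them.
    """
    rng = random.Random(seed)
    units = []
    for v in directions:
        length = math.sqrt(sum(c * c for c in v))
        units.append(tuple(c / length for c in v))
    scale = C ** (1.0 / alpha)
    draws = []
    for _ in range(size):
        u = 1.0 - rng.random()                # uniform on (0, 1]
        radius = scale * u ** (-1.0 / alpha)  # Pareto radius via inverse transform
        theta = rng.choices(units, weights=weights, k=1)[0]
        draws.append(tuple(radius * c for c in theta))
    return draws

if __name__ == "__main__":
    draws = multivariable_pareto(1.5, 1.0, [(1.0, 0.0), (0.0, 1.0)], [0.7, 0.3], 20000)
    big = [x for x in draws if math.hypot(*x) > 10.0]
    share = sum(1 for x in big if x[1] == 0.0) / len(big)
    print(len(big), share)
```

Because the radius and the direction are drawn independently, the share of large jumps pointing along each direction matches the corresponding weight of $M$ at every threshold, not only in the limit.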
If $M$ is discrete with $M(\theta_i) = p_i$, then it follows from the characteristic function formulas that $Y$ can be represented as the sum of independent stable components laid out along the $\theta_i$ directions, and the methods of Nolan (1997, 2002) can be used to plot multivariable stable densities, see Byczkowski, Nolan and Rajput (1993). The same idea is used by Modarres and Nolan (1994) to simulate stable random vectors with discrete mixing measures. For an arbitrary mixing measure, multivariable stable laws can be simulated using sums of independent, identically distributed multivariable Pareto laws. If $0 < \alpha < 1$ then the random vector $n^{-1/\alpha}(X_1 + \cdots + X_n)$ is approximately $S_\alpha(\sigma,M,0)$ where $\sigma$ and $C$ are related by (3.8). If $1 < \alpha < 2$ then $n^{-1/\alpha}(X_1 + \cdots + X_n - nEX_1)$ is approximately $S_\alpha(\sigma,M,0)$ where $\sigma$ and $C$ are related by (3.8) and
\[ E(X_1) = E(R_1)E(\Theta_1) = \frac{\alpha C^{1/\alpha}}{\alpha - 1} \int_{\|\theta\| = 1} \theta\, M(d\theta). \]

Remark 3.2. Previously a different type of multivariable Pareto distribution was considered by Arnold (1990), see also Kotz, Balakrishnan and Johnson (2000).

4. Matrix scaling

The multivariable stable model is the basis for the work of Nolan, Panorska and McCulloch (2001) on exchange rates, and the portfolio models in Rachev and Mittnik (2000). Under the assumptions of this model, the probability tail of the random vector $X_t$ is assumed to fall off at the same power law rate in every radial direction. Suppose that $X_t = (X_1(t),\dots,X_d(t))'$ where $X_i(t)$ is the price change of the $i$-th asset on day $t$. Suppose also that $X_t$ belongs to the domain of attraction of some multivariable stable random vector $Y = (Y_1,\dots,Y_d)'$ with index $\alpha$, so that (3.11) holds with $A_n = n^{-1/\alpha}I$. Projecting onto the $i$-th coordinate axis shows that
\[ \frac{X_i(1) + \cdots + X_i(n) - b_i(n)}{n^{1/\alpha}} \Rightarrow Y_i, \tag{4.1} \]
where $b_n = (b_1(n),\dots,b_d(n))'$, so that $Y_i$ is stable with index $\alpha$ and $X_i(t)$ belongs to the domain of attraction of $Y_i$.
According to Jansen and de Vries (1991), Loretan and Phillips (1994), and Rachev and Mittnik (2000), the stable index $\alpha_i$ should vary depending on the asset. Then (4.1) is replaced by

$$\frac{X_i(1) + \cdots + X_i(n) - b_i(n)}{n^{1/\alpha_i}} \Rightarrow Y_i \quad \text{for each } i = 1, \ldots, d \tag{4.2}$$

so that $Y_i$ is stable with index $\alpha_i$. Mittnik and Rachev (1993) seem to have been the first to apply such models to a problem in finance; see also Section 8.6 in Rachev and Mittnik (2000). Assuming the joint convergence

$$A_n \left[ \begin{pmatrix} X_1(1) \\ X_2(1) \\ \vdots \\ X_d(1) \end{pmatrix} + \cdots + \begin{pmatrix} X_1(n) \\ X_2(n) \\ \vdots \\ X_d(n) \end{pmatrix} - \begin{pmatrix} b_1(n) \\ b_2(n) \\ \vdots \\ b_d(n) \end{pmatrix} \right] \Rightarrow \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_d \end{pmatrix} \tag{4.3}$$

and changing to vector-matrix notation, we get (3.11) with diagonal norming matrices

$$A_n = \begin{pmatrix} n^{-1/\alpha_1} & 0 & \cdots & 0 \\ 0 & n^{-1/\alpha_2} & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & n^{-1/\alpha_d} \end{pmatrix} \tag{4.4}$$

which we will also write as $A_n = \mathrm{diag}(n^{-1/\alpha_1}, \ldots, n^{-1/\alpha_d})$. The matrix scaling is natural since we are dealing with random vectors, and it allows a more realistic portfolio model. The $i$-th marginal $Y_i$ of the operator stable limit vector $Y$ is stable with index $\alpha_i$, so the tail behavior of $Y$ varies with angle. The convergence (3.11) with $A_n$ diagonal was first considered in Resnick and Greenwood (1979); see also Meerschaert (1991).

Matrix notation also leads to a natural analogue of the stable index $\alpha$. Let $\exp(A) = I + A + A^2/2! + A^3/3! + \cdots$ be the usual exponential operator for $d \times d$ matrices. This operator occurs, for example, in the theory of linear differential equations. If $A = \mathrm{diag}(a_1, \ldots, a_d)$ then an easy matrix computation using the Taylor series formula $e^x = 1 + x + x^2/2! + x^3/3! + \cdots$ shows that $\exp(A) = \mathrm{diag}(e^{a_1}, \ldots, e^{a_d})$. See Hirsch and Smale (1974) or Section 2.2 of Meerschaert and Scheffler (2001a) for details and additional information. Now define $E = \mathrm{diag}(1/\alpha_1, \ldots, 1/\alpha_d)$. Then the norming matrices $A_n$ in (4.4) can also be written in the more compact form $A_n = n^{-E} = \exp(-E \ln n)$, since $-E \ln n = \mathrm{diag}(-(1/\alpha_1) \ln n, \ldots, -(1/\alpha_d) \ln n)$ and $e^{-(1/\alpha_i) \ln n} = n^{-1/\alpha_i}$.
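The identity $n^{-E} = \exp(-E \ln n)$ can be checked numerically. The sketch below is only an illustration: it sums the Taylor series for the matrix exponential directly in plain Python, which is adequate for small matrices of moderate norm but is not how a numerical library would compute it. The helper names `mat_mul`, `mat_exp` and `norming_matrix` and the truncation parameter `terms` are assumptions introduced here.

```python
import math


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_exp(a, terms=60):
    """Truncated Taylor series exp(A) = I + A + A^2/2! + ... for a square matrix."""
    d = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    term = [row[:] for row in result]  # holds A^k / k!, starting from the identity
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(d)] for i in range(d)]
    return result


def norming_matrix(exponent, n):
    """A_n = n^{-E} = exp(-E ln n) for an exponent matrix E."""
    log_n = math.log(n)
    return mat_exp([[-e * log_n for e in row] for row in exponent])
```

For a diagonal exponent such as `[[1 / 1.7, 0.0], [0.0, 1 / 1.2]]` the result is diagonal with entries $n^{-1/\alpha_i}$, as in (4.4); the same function also accepts a non-diagonal exponent.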
The matrix $E$, called an exponent of the operator stable random vector $Y$, plays the role of the stable index $\alpha$. This matrix $E$ need not be diagonal. Diagonalizable exponents involve a change of coordinates, degenerate eigenvalues thicken probability tails by a logarithmic factor, and complex eigenvalues introduce rotational scaling; see Meerschaert (1990). The case of a diagonalizable exponent plays an important role in Example 8.1. The generalized central limit theorem for matrix scaling can be found in Meerschaert and Scheffler (2001a). Matrix scaling allows for a limit with both normal and nonnormal components. Since $Y$ is infinitely divisible, the Lévy representation [Theorem 3.1.11 in Meerschaert and Scheffler (2001a)] shows that the characteristic function $E[e^{ik \cdot Y}]$ is of the form $e^{\psi(k)}$ where

$$\psi(k) = i b \cdot k - \frac{1}{2} k \cdot Ck + \int_{x \ne 0} \left( e^{ik \cdot x} - 1 - \frac{ik \cdot x}{1 + \|x\|^2} \right) \phi(dx)$$

for some $b \in \mathbb{R}^d$, some nonnegative definite symmetric $d \times d$ matrix $C$ and some Lévy measure $\phi$. The Lévy measure satisfies $\phi\{x : \|x\| > 1\} < \infty$ and

$$\int_{0 < \|x\| < 1} \|x\|^2 \, \phi(dx) < \infty.$$

For a multivariable stable law,

$$\phi\left\{ x : \|x\| > r,\ \frac{x}{\|x\|} \in B \right\} = C r^{-\alpha} M(B)$$

and the characteristic function formulas for multivariable stable laws follow by a lengthy computation; see Section 7.3 in Meerschaert and Scheffler (2001a) for complete details. If $\phi = 0$ then $Y$ is normal with mean $b$ and covariance matrix $C$. If $C = 0$ then a necessary and sufficient condition for (3.11) to hold is that

$$nP(A_n X \in B) \to \phi(B) \quad \text{as } n \to \infty \tag{4.5}$$

for Borel subsets $B$ of $\mathbb{R}^d \setminus \{0\}$ whose boundary has $\phi$-measure zero, where $\phi$ is the Lévy measure of the limit $Y$. Proposition 6.1.10 in Meerschaert and Scheffler (2001a) shows that the convergence (4.5) is equivalent to regular variation of the probability distribution $\mu(B) = P(X \in B)$. If (4.5) holds then Proposition 6.1.2 in Meerschaert and Scheffler (2001a) shows that the Lévy measure $\phi$ satisfies

$$t\,\phi(dx) = \phi(t^{-E} dx) \quad \text{for all } t > 0 \tag{4.6}$$

for some $d \times d$ matrix $E$.
Then it follows from the characteristic function formula that Y is operator stable with exponent E, and that for Yn IID with Y we have n-E (Y1 + + Yn - bn) d = Y (4.7) for some bn, see Theorem 7.2.1 in Meerschaert and Scheffler (2001a). Hence operator stable laws belong to their own GDOA, so that the probability distribution of Y also varies regularly, and sums of IID operator stable random vectors are again operator stable with the same exponent E. If E = aI then Y is multivariable stable with index = 1/a, and (4.5) is equivalent to the balanced tails condition (3.12). Example 4.1. Multivariable Pareto random vectors with matrix scaling extend the model in Example 3.1. Suppose Y is operator stable with exponent E and Lévy measure . Define Fr,B = sE : s > r, B and let (B) = (F1,B) for any Borel subset B of the unit sphere S whose boundary has -measure zero.5 Let C = (S) and define the probability measure M(B) = (B)/C. Take Ri IID standard Pareto random variables with P(R > r) = Cr-1, i IID random unit vectors with distribution M and independent of (Ri), and finally let Xi = RE i i . Since tEF1,B = Ft,B we have (Ft,B) = (tEF1,B) = t-1(F1,B) = Ct-1M(B) in view of (4.6). Then nP n-E Xi Ft,B = nP RE i i Fnt,B = nP(Ri > nt,i B) = nC(nt)-1 M(B) = (Ft,B) for n > 1/t, so that (4.5) holds for the sets Ft,B with An = n-E. Then Xi GDOA(Y). Operator stable laws can be simulated using sums of these IID random vectors. If every eigenvalue of E has real part greater than one, then n-E(X1 + + Xn) is approximately operator stable with exponent E and Lévy measure . If every eigenvalue of E has real 5 The measure is called the spectral measure of Y. 610 M.M. Meerschaert and H.-P. Scheffler part less than one, then n-E(X1 + + Xn - nm) is approximately operator stable with exponent E and Lévy measure where m = C =1 C rE dr r2 M(d) is the mean of X1. 5. The spectral decomposition The tail behavior of an operator stable random vector Y is determined by the eigenvalues of its exponent E. 
If $E = (1/\alpha) I$ then $Y$ is multivariable stable and $P(|Y \cdot \theta| > r) \approx C_\theta r^{-\alpha}$ for any $\theta \ne 0$. If $E = \mathrm{diag}(a_1, \ldots, a_d)$ then $Y = (Y_1, \ldots, Y_d)'$ where $Y_i$ is a stable random variable with index $\alpha_i = 1/a_i$. This requires $0 < \alpha_i \le 2$, so that $a_i \ge 1/2$. For any $d \times d$ matrix $E$ there is a unique spectral decomposition based on the real parts of the eigenvalues; see for example Theorem 2.1.14 in Meerschaert and Scheffler (2001a). This decomposition allows us to write $E = PBP^{-1}$ where $P$ is a change of coordinates matrix and $B$ is block-diagonal with

$$B = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & B_p \end{pmatrix} \tag{5.1}$$

where $B_i$ is a $d_i \times d_i$ matrix, every eigenvalue of $B_i$ has real part equal to $a_i$, $a_1 < \cdots < a_p$, and $d_1 + \cdots + d_p = d$. Let $e_1 = (1, 0, \ldots, 0)'$, $e_2 = (0, 1, 0, \ldots, 0)'$, ..., $e_d = (0, \ldots, 0, 1)'$ be the standard coordinates for $\mathbb{R}^d$ and define $p_{ik} = P e_j$ when $j = d_1 + \cdots + d_{i-1} + k$ for some $k = 1, \ldots, d_i$. Then

$$V_i = \mathrm{span}\{p_{i1}, \ldots, p_{i d_i}\} = \left\{ \sum_{k=1}^{d_i} t_k p_{ik} : t_1, \ldots, t_{d_i} \text{ real} \right\}$$

is a $d_i$-dimensional subspace of $\mathbb{R}^d$. Any vector $y \in \mathbb{R}^d$ can be written uniquely in the form $y = y_1 + \cdots + y_p$ with $y_i \in V_i$ for each $i = 1, \ldots, p$. This is called the spectral decomposition of $\mathbb{R}^d$ with respect to $E$. Since $B$ is block-diagonal and $E = PBP^{-1}$, every $E p_{ik}$ is a linear combination of $p_{i1}, \ldots, p_{i d_i}$ and therefore $E y_i \in V_i$ for every $y_i \in V_i$. This means that $V_i$ is an $E$-invariant subspace of $\mathbb{R}^d$. Given a nonzero vector $\theta \in \mathbb{R}^d$, write $\theta = \theta_1 + \cdots + \theta_p$ with $\theta_i \in V_i$ for each $i = 1, \ldots, p$ and define

$$\alpha(\theta) = \min\left\{ \frac{1}{a_i} : \theta_i \ne 0 \right\}. \tag{5.2}$$

Since the probability distribution of $Y$ varies regularly with exponent $E$, Theorem 6.4.15 in Meerschaert and Scheffler (2001a) shows that for any small $\delta > 0$ we have

$$r^{-\alpha(\theta) - \delta} < P(|Y \cdot \theta| > r) < r^{-\alpha(\theta) + \delta}$$

for all $r > 0$ sufficiently large. In other words, the tail behavior of $Y$ is dominated by the component with the heaviest tail. This also means that $E(|Y \cdot \theta|^\beta)$ exists for $0 < \beta < \alpha(\theta)$ and diverges for $\beta > \alpha(\theta)$.
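As a small worked illustration of (5.2), consider the special case of a diagonal exponent $E = \mathrm{diag}(a_1, \ldots, a_d)$, where the change of coordinates is the identity and the spectral components of a portfolio vector are simply its coordinates. The sketch below computes the tail index of a portfolio under that assumption only; the function name `tail_index` and the tolerance `tol` used to decide whether a weight counts as zero are hypothetical.

```python
def tail_index(theta, a, tol=1e-12):
    """Tail index min{1/a_i : theta_i != 0} for a diagonal exponent E = diag(a).

    theta: portfolio weights, one per asset.
    a:     diagonal entries of the exponent, so asset i has index 1/a[i].
    """
    active = [1.0 / a_i for weight, a_i in zip(theta, a) if abs(weight) > tol]
    if not active:
        raise ValueError("theta must be a nonzero vector")
    return min(active)
```

For example, with `a = [0.5, 0.6, 0.8]` a portfolio holding only the first asset has index 2, while any portfolio with a nonzero position in the third asset has index 1.25: the heaviest tail present in the portfolio dominates.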
If we write Y = Y1 + + Yp with Yi Vi for each i = 1,...,p, then projecting (4.7) onto Vi shows that Yi is an operator stable random vector on Vi with some exponent Ei. We call this the spectral decomposition of Y with respect to E. Since every eigenvalue of Ei has the same real part ai we say that Yi is spectrally simple, with index i = 1/ai. Although Yi might not be multivariable stable, it has similar tail behavior. For any small > 0 we have r-i< P Yi > r < r-i+ for all r > 0 sufficiently large, so E( Yi ) exists for 0 < < i and diverges for > i . If X GDOA(Y) then Theorem 8.3.24 in Meerschaert and Scheffler (2001a) shows that the limit Y and norming matrices An in (3.11) can be chosen so that every Vi in the spectral decomposition of Rd with respect to the exponent E of Y is An-invariant for every n, and V1,...,Vp are mutually perpendicular. Then the probability distribution of X is regularly varying with exponent E and X has the same tail behavior as Y. In particular, for any small > 0 we have r-()< P |X | > r < r-()+ for all r > 0 sufficiently large. In this case, we say that Y is spectrally compatible with X, and we write X GDOAc(Y). Example 5.1. If Y is operator stable with exponent E = aI then (4.7) shows that Y is multivariable stable with index = 1/a. Then p = 1, P = I, and B = E. There is only one spectral component, since the tail behavior is the same in every radial direction. If asset price change vectors are IID with X = (X1,...,Xd) GDOA(Y), then every asset has the same tail behavior. If j measures the amount of the j-th asset in a portfolio, price changes for this portfolio are IID with the random variable X = X11 + + Xdd. Since the probability tails of X are uniform in every direction, the probability of a large jump in price falls off like r- for any portfolio. Example 5.2. If Y is operator stable with exponent E = diag(a1,...,ad) where a1 < < ad then p = d, P = I, B = E, Bi = ai and Vi is the i-th coordinate axis. 
The spectral decomposition of Y = (Y1,...,Yd) with respect to E is Y = Y1 + + Yd with Yi = Yiei , the i-th marginal laid out along the i-th coordinate axis. Projecting (4.7) onto the i-th coordinate axis shows that Yi is stable with index i = 1/ai, so that P(|Yi | > r) 612 M.M. Meerschaert and H.-P. Scheffler Cir-i . If = 0 then P(|Y | > r) falls off like r-() where () = min{i: i = 0}. In other words, the heaviest tail dominates. If asset price change vectors are IID with X GDOAc(Y), then the assets are arranged in order of increasing tail thickness. If i measures the amount of the i-th asset in a portfolio, the probability of a large jump in price falls off like r-(). Example 5.3. If Y is operator stable with exponent E = diag(1,...,d) then Bi = aiI for some ai 1/2 and di counts the number of diagonal entries j for which j = 1/i. The matrix P sorts 1,...,d in increasing order, and the vectors pik are the coordinates ej for which j = ai. The vectors Yi are multivariable stable with index i = 1/ai, so that P( Y i > r) Cir-i . For nonzero vectors Vi we have P |Y | > r = P |Yi | > r C r-i by the balanced tails condition for multivariable stable laws. For any other nonzero vector , P(|Y | > r) C r-() where () = min{1/j : j = 0}. Again, the heaviest tail dominates. If asset price change vectors are IID with X GDOAc(Y), then X has essentially the same tail behavior as Y, and P sorts the assets in order of increasing tail thickness. Example 5.4. Take B = diag(a1,...,ad) where a1 < < ad and P orthogonal, so that P-1 = P . If Y = (Y1,...,Yd) is operator stable with exponent E = PBP-1 then p = d, Bi = ai and V1,...,Vd are the coordinate axes in the new coordinate system defined by the vectors pi = Pei for i = 1,...,d. The spectral component Yi is the stable random variable Y pi with index i = 1/ai, laid out along the Vi axis. Since Yj = Y ej is a linear combination of stable laws of different indices, it is not stable. 
The change of coordinates P rotates the coordinate axes to make the marginals stable. Since n-PBP-1 = Pn-B P-1 it follows from (4.7) that Pn-B P-1 (Y1 + + Yn - bn) d = Y, n-B P-1 Y1 + + P-1 Yn - P-1 bn d = P-1 Y so that Y0 = P-1Y is operator stable with exponent B. Then the tail behavior of Y = PY0 follows from Example 5.2 and the change of coordinates. If we write = 1p1 ++dpd in these coordinates then P(|Y | > r) C r-() where () = min{i: i = 0}. If asset price change vectors are IID with X GDOAc(Y), then the tail behavior of X is essentially the same as Y. In particular, taking = p1 gives a portfolio with the lightest probability tails. Example 5.5. Suppose that Y is operator stable with exponent E = PBP-1 where P is orthogonal and B is given by (5.1), with di ×di blocks Bi = aiI for some 1/2 a1 < < ap. Let D0 = 0 and Di = d1 + + di for 1 i p. Then pik = Pej when j = Di-1 + k Ch. 15: Portfolio Modeling with Heavy Tailed Random Vectors 613 for some k = 1,...,di and Vi = span{pik: k = 1,...,di}. To avoid double subscripts we will also write qj = Pej , so that qj = pik when j = Di-1 +k for some k = 1,...,di. The j-th column of the matrix P is the vector qj , and Eqj = PBP-1 qj = PBej = Paiej = aiPej = aiqj when qj Vi, so that qj is a unit eigenvector of the matrix E with corresponding eigenvalue ai. The spectral component Yi = di k=1 (Y pik)pik is the orthogonal projection of Y onto the di-dimensional subspace Vi. The random vector Yi is multivariable stable with index i = 1/ai, so that P( Yi > r) Cir-i , and every marginal Yik = Y pik is stable with the same index i. The change of coordinates P rotates the coordinate axes to find a set of orthogonal unit eigenvectors for E, so that the marginals of Y in the new coordinate system are all stable random variables. The matrix P also sorts the corresponding eigenvalues in increasing order. 
For any nonzero vector $\theta \in \mathbb{R}^d$, $P(|Y \cdot \theta| > r) \approx C_\theta r^{-\alpha(\theta)}$, where $\alpha(\theta) = \alpha_i$ for the largest $i$ such that the orthogonal projection of $\theta$ onto the subspace $V_i$ is not equal to zero. If asset price change vectors are IID with $X \in \mathrm{GDOA}_c(Y)$, then the tail behavior of $X$ is essentially the same as that of $Y$. If $\theta = \theta_1 e_1 + \cdots + \theta_d e_d$, so that $\theta_i$ measures the amount of the $i$-th asset in a portfolio, price changes for this portfolio are IID with $X \cdot \theta = X_1 \theta_1 + \cdots + X_d \theta_d$. In particular, any $\theta \in V_1$ gives a portfolio with the lightest probability tails.

6. Sample covariance matrix

Given a data set of price changes (or log returns) $X_1, X_2, \ldots, X_n$ for a given asset, the $k$-th sample moment

$$\hat\mu_k = \frac{1}{n} \sum_{t=1}^n X_t^k$$

estimates the $k$-th moment $\mu_k = E(X^k)$. These sample moments are used to estimate the mean, variance, skewness and kurtosis of the data. If $X_t$ are IID with $P(|X_t| > r) \approx C r^{-\alpha}$, then $X_t^k$ are also IID and heavy tailed with

$$P(|X_t|^k > r) = P(|X_t| > r^{1/k}) \approx C r^{-\alpha/k}$$

so the extended central limit theorem applies. Recall from Section 2 that $\mu_k$ exists for $k < \alpha$ and diverges for $k \ge \alpha$. If $\alpha > 4$ then $\mathrm{Var}(X_t^2) = \mu_4 - \mu_2^2$ exists and the central limit theorem (3.1) implies that

$$n^{1/2}(\hat\mu_2 - \mu_2) = n^{-1/2} \sum_{t=1}^n (X_t^2 - \mu_2) \Rightarrow Y, \tag{6.1}$$

where $Y$ is normal. When $2 < \alpha < 4$, the mean $\mu_2 = E(X_t^2)$ of these summands exists but $\mathrm{Var}(X_t^2)$ is infinite, and the extended central limit theorem (3.4) implies that

$$n^{1 - 2/\alpha}(\hat\mu_2 - \mu_2) = n^{-2/\alpha} \sum_{t=1}^n (X_t^2 - \mu_2) \Rightarrow Y,$$

where $Y$ is stable with index $\alpha/2$. When $0 < \alpha < 2$ the mean $\mu_2 = E(X_t^2)$ of the squared price change diverges, and the extended central limit theorem implies that

$$n^{1 - 2/\alpha} \hat\mu_2 = n^{-2/\alpha} \sum_{t=1}^n X_t^2 \Rightarrow Y,$$

where again $Y$ is stable with index $\alpha/2$. In this case, the sample second moment $\hat\mu_2$ exists but the second moment $\mu_2$ does not. When $0 < \alpha < 2$, or when $2 < \alpha < 4$ and $\mu_1 = 0$, the sample variance

$$\hat\sigma^2 = \frac{1}{n} \sum_{t=1}^n (X_t - \hat\mu_1)^2 = \hat\mu_2 - \hat\mu_1^2 \tag{6.2}$$

is asymptotically equivalent to the sample second moment; see for example Anderson and Meerschaert (1997).
Since we can always center to zero expectation when $2 < \alpha < 4$, both have the same asymptotics. If $\alpha > 4$ the sample variance is asymptotically normal, and when $0 < \alpha < 4$ the sample variance is asymptotically stable. Since the variance is a measure of price volatility, the sample variance estimates volatility. Confidence intervals for the variance are based on normal asymptotics when $\alpha > 4$ and stable asymptotics when $2 < \alpha < 4$. When $\alpha < 2$ the variance is undefined, but the sample variance still captures some important features of the data; see Section 8.

Suppose that $X_t = (X_1(t), \ldots, X_d(t))'$ where $X_i(t)$ is the price change of the $i$-th asset on day $t$. The covariance matrix characterizes dependence between price changes of different assets over the same day, and the sample covariance matrix estimates the covariance matrix. As before, it is simpler to begin with the uncentered estimate

$$M_n = \frac{1}{n} \sum_{t=1}^n X_t X_t', \tag{6.3}$$

where $X'$ denotes the transpose of the vector $X = (X_1, \ldots, X_d)'$ and hence

$$XX' = \begin{pmatrix} X_1 \\ \vdots \\ X_d \end{pmatrix} (X_1, \ldots, X_d) = \begin{pmatrix} X_1 X_1 & \cdots & X_1 X_d \\ X_2 X_1 & \cdots & X_2 X_d \\ \vdots & & \vdots \\ X_d X_1 & \cdots & X_d X_d \end{pmatrix}$$

is an element of the vector space $\mathcal{M}_s^d$ of symmetric $d \times d$ matrices. The $ij$ entry of $M_n$ is

$$M_n(i,j) = \frac{1}{n} \sum_{t=1}^n X_i(t) X_j(t),$$

which estimates $E(X_i X_j)$. If $X_t$ are IID with $X$, then $X_t X_t'$ are IID random matrices and we can apply the central limit theorems from Section 3 [see Section 10.2 in Meerschaert and Scheffler (2001a) for complete proofs]. If the probability distribution of $X$ is regularly varying with exponent $E$ and (4.5) holds with $t\,\phi\{dx\} = \phi\{t^{-E} dx\}$ for all $t > 0$, then the distribution of $XX'$ is also regularly varying with

$$nP(A_n XX' A_n' \in B) \to \Phi(B) \quad \text{as } n \to \infty \tag{6.4}$$

for Borel subsets $B$ of $\mathcal{M}_s^d$ that are bounded away from zero and whose boundary has $\Phi$-measure zero. The exponent $\xi$ of the limit measure $\Phi\{d(xx')\} = \phi\{dx\}$ is defined by $\xi M = EM + ME'$ for $M \in \mathcal{M}_s^d$. Using the matrix norm

$$\|M\| = \left( \sum_{i=1}^d \sum_{j=1}^d M(i,j)^2 \right)^{1/2}$$

we get

$$\|XX'\|^2 = \sum_{i=1}^d \sum_{j=1}^d (X_i X_j)^2 = \sum_{i=1}^d X_i^2 \sum_{j=1}^d X_j^2 = \|X\|^4$$

so that $\|XX'\| = \|X\|^2$.
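The uncentered estimate (6.3) and the norm identity $\|XX'\| = \|X\|^2$ are easy to check on a toy data set. The following sketch uses plain Python lists and is meant only as an illustration; the function names are hypothetical and the data are made up for the example.

```python
import math


def outer(x):
    """Outer product x x' of a vector with itself, as a list of rows."""
    return [[xi * xj for xj in x] for xi in x]


def uncentered_sample_covariance(data):
    """M_n = (1/n) * sum_t X_t X_t' for a list of d-dimensional observations."""
    n = len(data)
    d = len(data[0])
    m = [[0.0] * d for _ in range(d)]
    for x in data:
        for i in range(d):
            for j in range(d):
                m[i][j] += x[i] * x[j] / n
    return m


def frobenius_norm(m):
    """Matrix norm ||M|| = (sum_i sum_j M(i,j)^2)^(1/2)."""
    return math.sqrt(sum(v * v for row in m for v in row))
```

For the vector `x = [3.0, -4.0]`, with Euclidean norm 5, `frobenius_norm(outer(x))` returns 25 up to rounding, in line with the identity above.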
If every eigenvalue of E has real part ai < 1/4, then E( XX 2) = E( X 4) < and the multivariable central limit theorem (3.2) shows that n1/2 (Mn - C) = n-1/2 n t=1 (XtXt - C) W, (6.5) where W is a Gaussian random matrix and C is the (uncentered) covariance matrix C = E(XX ). The estimates of Jansen and de Vries (1991) and Loretan and Phillips (1994) 616 M.M. Meerschaert and H.-P. Scheffler indicate tail estimates in the range 2 < < 4. In this case, every eigenvalue of E has real part 1/4 < ai < 1/2. Then E( XX 2) = E( X 4) = , but E( XX ) = E( X 2) < so the covariance matrix C = E(XX ) exists. Now the generalized central limit theorem (3.11) gives nAn(Mn - C)An = An n t=1 (XtXt - C) An W, (6.6) where the limit W is a nonnormal operator stable random matrix. The estimates in Rachev and Mittnik (2000) give tail estimates in the range 1 < < 2, so that every eigenvalue of E has real part ai > 1/2. Then E( XX ) = E( X 2) = and the covariance matrix C = E(XX ) diverges. In this case, nAnMnAn W (6.7) holds with W operator stable. Since the covariance matrix is undefined, there is no reason to believe that the sample covariance matrix contains useful information. However, we will see in Section 8 that even in this case the sample covariance matrix characterizes the most important distributional features of the random vector X. The centered sample covariance matrix is defined by n = 1 n n i=1 Xi - Xn Xi - Xn , where Xn = n-1(X1 + + Xn) is the sample mean. In the heavy tailed case ai > 1/4, Theorem 10.6.15 in Meerschaert and Scheffler (2001a) shows that n and Mn have the same asymptotics, similar to the one dimensional case. In practice, it is common to meancenter the data, so it does not matter which form we choose. 7. Dependent random vectors Suppose that Xt = (X1(t),...,Xd(t)) where Xi(t) represents the price change (or log return) of the i-th asset on day t. 
A model where Xt are IID with X GDOA(Y) allows dependence between the price changes Xi(t) and Xj (t) on the same day t, which is commonly observed in practice. If we also want to model dependence between days, we need to relax the IID assumption. A wide variety of time series models can be mathematically reduced to a linear moving average. This reduction may involve integer or fractional differencing, detrending and deseasoning, and nonlinear mappings. Asymptotics for the underlying moving average are established in Section 10.6 of Meerschaert and Ch. 15: Portfolio Modeling with Heavy Tailed Random Vectors 617 Scheffler (2001a). Assume that Z,Z1,Z2,Z3,... are IID random vectors on Rd whose probability distribution is regularly varying with exponent E, so that nP(AnZ B) (B) as n (7.1) for Borel subsets B of Rd \ {0} whose boundary have -measure zero, and t(dx) = (t-E dx) for all t > 0. If every eigenvalue of E has real part ai > 1/2 then Z GDOA(Y) and An(Z1 + + Zn - nbn) Y, (7.2) where Y is operator stable with exponent E and Lévy measure . Define the moving average process Xt = j=0 Cj Zt-j , (7.3) where Cj are d × d real matrices. If every eigenvalue of E has real part ai < ap then the moving average (7.3) is well defined as long as j=0 Cj < (7.4) for some < 1/ap with 1. If every eigenvalue of E has real part ai < 1/2, then E( Xt 2) exists and the asymptotics are normal, see Brockwell and Davis (1991). If every eigenvalue of E has real part ai > 1/2, and if for each j either Cj = 0, or else C-1 j exists and AnCj = Cj An for all n, then Theorem 10.6.2 in Meerschaert and Scheffler (2001a) shows that An X1 + + Xn - n j=0 Cj bn j=0 Cj Y. (7.5) The limit in (7.5) is operator stable with no normal component and Lévy measure j Cj , where Cj = 0 if Cj = 0 and otherwise Cj (dx) = (C-1 j dx). If every eigenvalue of E has real part ai < 1/2, then both the mean m = E(Xt) and the lag h covariance matrix (h) = E (Xt - m)(Xt+h - m) exist. 
The matrix (h) tells us when price changes on day t are correlated with price changes (of the same asset or some other asset) h days later. These correlations are useful 618 M.M. Meerschaert and H.-P. Scheffler to identify leading indicators, and they are the basic tools of time series modeling. The sample covariance matrix at lag h 0 for the moving average Xt is defined by n(h) = 1 n - h n-h t=1 Xt - X Xt+h - X , (7.6) where X = (X1 + + Xn)/n. If every eigenvalue of E has real part ai < 1/4, then E( Xt 4) < and n(h) is asymptotically normal, see Brockwell and Davis (1991). If every eigenvalue of E has real part 1/4 < ai < 1/2, the estimates of Jansen and de Vries (1991) and Loretan and Phillips (1994), then An n t=1 (ZtZt - D) An U (7.7) as in Section 6, where U is a nonnormal operator stable random matrix and D = E(ZZ ). Then Theorem 10.6.15 in Meerschaert and Scheffler (2001a) shows that nAn n(h) - (h) An j=0 Cj UCj+h (7.8) for any h 0. The asymptotics (7.8) determine which elements of the sample covariance matrix n(h) are statistically significantly different from zero. If every eigenvalue of E has real part ai > 1/2, as in the estimates of Rachev and Mittnik (2000), then An n t=1 Zt Zt An U (7.9) and Theorem 10.6.15 in Meerschaert and Scheffler (2001a) shows that nAnn(h)An j=0 Cj UCj+h (7.10) for any h 0. In this case the covariance matrix (h) does not exist, but the sample covariance matrix n(h) still contains useful information about the time series Xt of price changes. In the next section, we will explain this apparent paradox. Ch. 15: Portfolio Modeling with Heavy Tailed Random Vectors 619 8. Tail estimation Given a set of price changes (or log-returns) X1,...,Xn for some asset, it is important to estimate the tail behavior. If the price changes Xt are identically distributed6 with X and P(X > r) Cr-, then the dispersion C and the tail index determine the central limit behavior, as well as the extreme value behavior, of the price change distribution. 
Mandelbrot (1963) pioneered a graphical estimation method for $C$ and $\alpha$. If $y = P(X > r) \approx C r^{-\alpha}$ then $\log y \approx \log C - \alpha \log r$. Ordering the data so that $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(n)}$, we should have approximately that $r = X_{(i)}$ when $y = i/n$. Then a plot of $\log(i/n)$ against $\log X_{(i)}$ should be approximately linear with slope $-\alpha$, and $\log C$ can be estimated from the vertical axis intercept. If $P(X > r) \approx C r^{-\alpha}$ for $r$ large, then the upper tail should be approximately linear. We call this a Mandelbrot plot. Several Mandelbrot plots for stock market and exchange rate returns appear in Loretan and Phillips (1994) as evidence of heavy tails with $2.5 < \alpha < 3$. Replacing $X$ by $-X$ gives information about the left tail. Least squares estimators for $\alpha$ based on the Mandelbrot plot were proposed by Schultze and Steinebach (1996); see also Csörgő and Viharos (1997).

The most popular numerical estimator for $C$ and $\alpha$ is due to Hill (1975); see also Hall (1982). Sort the data in decreasing order to obtain the order statistics $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(n)}$. Assuming that $P(X > r) = C r^{-\alpha}$ for large values of $r > 0$, the maximum likelihood estimates for $\alpha$ and $C$ based on the $m + 1$ largest observations are

$$\hat\alpha = \left[ \frac{1}{m} \sum_{i=1}^m \left( \ln X_{(i)} - \ln X_{(m+1)} \right) \right]^{-1}, \qquad \hat C = \frac{m}{n} X_{(m+1)}^{\hat\alpha}, \tag{8.1}$$

where $m$ is to be taken as large as possible, but small enough so that the tail condition $P(X > r) = C r^{-\alpha}$ remains valid. Replacing $X$ by $-X$ gives estimates for the left tail. Replacing $X$ by $|X|$ gives estimates for the combined tail. This is often advantageous, because it allows us to combine the data from both tails and increase the number $m$ of order statistics used. Finding the best value of $m$ is a challenge, and creates a certain amount of controversy. Jansen and de Vries (1991) use Hill's estimator with a fixed value of $m = 100$ for several different assets. Loretan and Phillips (1994) tabulate several different values of $m$ for each asset. Hill's estimator $\hat\alpha$ is consistent and asymptotically normal with variance $\alpha^2/m$, so confidence intervals are easy to construct.
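Hill's estimator is short enough to write out directly. The sketch below follows the formulas for $\hat\alpha$ and $\hat C$ based on the $m + 1$ largest observations, and adds an approximate confidence interval from the asymptotic variance $\alpha^2/m$ with $\hat\alpha$ plugged in for $\alpha$. The function names and the default normal quantile `z = 1.96` are choices made for this illustration, not part of the text, and the choice of $m$ is left to the user.

```python
import math


def hill_estimator(data, m):
    """Hill estimates (alpha_hat, c_hat) from the m + 1 largest observations."""
    x = sorted(data, reverse=True)  # x[0] >= x[1] >= ..., so x[m] is X_(m+1)
    if not 1 <= m < len(x):
        raise ValueError("need 1 <= m < number of observations")
    if x[m] <= 0:
        raise ValueError("the m + 1 largest observations must be positive")
    mean_log_excess = sum(math.log(x[i]) - math.log(x[m]) for i in range(m)) / m
    alpha_hat = 1.0 / mean_log_excess
    c_hat = (m / len(x)) * x[m] ** alpha_hat
    return alpha_hat, c_hat


def hill_confidence_interval(alpha_hat, m, z=1.96):
    """Approximate interval alpha_hat +/- z * alpha_hat / sqrt(m)."""
    half_width = z * alpha_hat / math.sqrt(m)
    return alpha_hat - half_width, alpha_hat + half_width
```

Passing `[-v for v in data]` gives the left tail estimate, and `[abs(v) for v in data]` the combined tail, as described above.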
These intervals clearly demonstrate that the tail parameters in Jansen and de Vries (1991) and Loretan and Phillips (1994) vary depending on the asset. Aban and Meerschaert (2001) develop a more general Hill's estimator to account for a possible shift in the data. If $P(X > r) = C(r - s)^{-\alpha}$ for $r$ large, the maximum likelihood estimates for $\alpha$ and $C$ based on the $m + 1$ largest observations are

$$\hat\alpha = \left[ \frac{1}{m} \sum_{i=1}^m \left( \ln(X_{(i)} - \hat s) - \ln(X_{(m+1)} - \hat s) \right) \right]^{-1}, \qquad \hat C = \frac{m}{n} (X_{(m+1)} - \hat s)^{\hat\alpha}, \tag{8.2}$$

where $\hat s$ is obtained by numerically solving the equation

$$\hat\alpha (X_{(m+1)} - \hat s)^{-1} = (\hat\alpha + 1) \frac{1}{m} \sum_{i=1}^m (X_{(i)} - \hat s)^{-1} \tag{8.3}$$

over $\hat s < X_{(m+1)}$. Once the optimal shift is computed, $\hat\alpha$ and $\hat C$ come from Hill's estimator applied to the shifted data. One practical implication is that, since the Pareto model is not shift-invariant, it is a good idea to try shifting the data to get a linear Mandelbrot plot.

If $X_t$ is the sum of many IID price shocks, then it can be argued that the distribution of $X_t$ must be (at least approximately) stable with distribution $S_\alpha(\sigma, \beta, b)$. Maximum likelihood estimation for the stable parameters is now practical, using the efficient method of Nolan (1997) for computing stable densities; see also Mittnik et al. (1999), Mittnik, Doganoglu and Chenyao (1999). Since the stable index satisfies $0 < \alpha \le 2$, the stable MLE for $\alpha$ cannot possibly agree with the estimates found in Jansen and de Vries (1991) and Loretan and Phillips (1994). Rachev and Mittnik (2000) use a stable model for price changes, and their estimates yield $1 < \alpha < 2$ for a variety of assets. McCulloch (1997) argues that the $\alpha > 2$ estimates found in Jansen and de Vries (1991) and Loretan and Phillips (1994) are inflated due to a distributional misspecification. The Pareto tail of a stable random variable $X$ disappears as $\alpha \to 2$, so that it may be impossible to take $m$ large enough for a reliable estimate; see Fofack and Nolan (1999) for a more detailed discussion.

Footnote 6: Note that we are not assuming IID here.
The estimator in Aban and Meerschaert (2001) corrects for the fact that Hill's $\hat\alpha$ is not shift-invariant, and may go some distance towards correcting the problem identified by McCulloch (1997). Maximum likelihood estimation is quite sensitive to deviations from the prescribed distribution, and it is no surprise that the MLE computations of Jansen and de Vries (1991) and Loretan and Phillips (1994), based on the Pareto model, differ significantly from the estimates of Rachev and Mittnik (2000), based on a stable model. Part of the controversy stems from the fact that the range of $\alpha$ is limited to $(0, 2]$ for the stable model. Akgiray and Booth (1988) interpret the results of Hill's estimator for stock returns as evidence against the stable model. Actual finance data does not exactly fit either the stable or Pareto-tail models, and in our opinion, parameter estimates are only valid with respect to the model used to obtain them, so that Pareto-based estimates of $\alpha > 2$ in no way invalidate the stable model.

Meerschaert and Scheffler (1998) propose a robust estimator

$$\hat\alpha = \frac{2 \ln n}{\ln n + \ln \hat\sigma^2} \tag{8.4}$$

based on the sample variance (6.2). This estimator can be applied whenever $X \in \mathrm{DOA}(Y)$ and $Y$ is stable with index $0 < \alpha < 2$. Then $X$ can be stable or Pareto, or any distribution with balanced power-law tails. The estimator is also applicable to dependent data, since it also applies when $X_t = \sum_j c_j Z_{t-j}$, $Z_t$ is IID with $Z \in \mathrm{DOA}(Y)$, and $Y$ is stable with index $0 < \alpha < 2$. The estimator is based on the simple idea that

$$n^{1 - 2/\alpha} \hat\sigma^2 \Rightarrow Y,$$
$$\ln(n \hat\sigma^2) - \frac{2}{\alpha} \ln n \Rightarrow \ln Y,$$
$$2 \ln n \left( \frac{\ln(n \hat\sigma^2)}{2 \ln n} - \frac{1}{\alpha} \right) \Rightarrow \ln Y$$

so that $\ln(n \hat\sigma^2)/(2 \ln n)$ estimates $1/\alpha$. If $X$ has heavy tails with $\alpha \ge 2$ then $\hat\alpha \to 2$. In this case, we can apply the estimator to $X^k$, which also has heavy tails with tail parameter $\alpha/k$.
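The estimator (8.4) needs only the sample size and the sample variance (6.2), so a sketch is very short. The function names below are hypothetical, and the guards against degenerate input are additions made for this illustration.

```python
import math


def sample_variance(data):
    """Sample variance (1/n) * sum (X_t - mean)^2, as in (6.2)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def variance_tail_estimator(data):
    """Estimate 2 ln n / (ln n + ln sigma_hat^2) of the tail index, as in (8.4)."""
    n = len(data)
    s2 = sample_variance(data)
    if n < 2 or s2 <= 0:
        raise ValueError("need at least two observations that are not all equal")
    denominator = math.log(n) + math.log(s2)
    if denominator == 0:
        raise ValueError("estimator undefined when n * sigma_hat^2 equals 1")
    return 2.0 * math.log(n) / denominator
```

Note that the value returned depends on the units of the data: multiplying every observation by a constant changes $\ln \hat\sigma^2$ and hence $\hat\alpha$ for any finite $n$. This is consistent with the rescaling by the approximate median of the absolute values that is applied before estimation in Example 8.1.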
It is interesting, and even somewhat ironic, that the sample variance can be used to estimate tail behavior, and hence tells us something about the spread of typical values, even in this case 0 < < 2 where the variance is undefined. Portfolio modeling requires a vector model to incorporate dependence between price changes for different assets. In these vector models, the sample variance is replaced by the sample covariance matrix. For heavy tailed price changes with infinite variance, the covariance matrix does not exist. Even so, we will see that the sample covariance matrix is a very useful tool for portfolio modeling. Suppose that Xt = (X1(t),...,Xd(t)) where Xi(t) is the price change of the i-th asset on day t. If Xt are identically distributed with X and if X has heavy tails with P( X > r) Cr- then the vector norms X1 ,..., Xn can be used to estimate the tail parameter . Alternatively, we can apply one variable tail estimators to the i-th marginal to get an estimate ^i of the tail parameter. If the probability tails of X fall off at the same rate r- in every radial direction, then these estimates should all be reasonably close. In that case, we might assume that X is multivariable stable with distribution S(,M,b). The mean b can be estimated using the sample mean in the usual case 1 < < 2. Several estimators now exist for the scale and the mixing measure M, or equivalently, for the spectral measure (d) = M(d). Those estimators are surveyed in another chapter in this Handbook (Kozubowski, Panorska and Rachev, 2003), so we will not dwell on them here. If > 2, one might consider the multivariable Pareto laws introduced in Example 3.1. If P( X > r) Cr- and the balanced tails condition (3.13) holds for some mixing measure M, then the tail behavior of X is multivariable Pareto. Multivariable stable random vectors have this property with 0 < < 2. 
If $\alpha > 2$ then multivariable Pareto could offer a reasonable alternative, which to our knowledge has not been pursued in the finance literature.

While experts disagree on the range of $\alpha$ for typical assets, there seems to be general agreement that the tail index depends on the asset. Then it is appropriate to assume that the probability distribution of $X$ varies regularly with some exponent $E$. For IID random vectors, a method for estimating the exponent $E$ can be found in Section 10.4 of Meerschaert and Scheffler (2001a). In Section 9 we show that the same methods also apply to dependent random vectors which are identically distributed. The method is applicable when the eigenvalues of $E$ all have real part $a_i > 1/2$, the infinite variance case. To be concrete, we adopt the model of Example 5.5, which is the simplest model flexible enough for realism. This model assumes that $E$ has a set of $d$ mutually orthogonal unit eigenvectors. Note that if the eigenvalues of $E$ are all distinct then these unit eigenvectors are unique up to a factor of $\pm 1$. On the other hand, if $E = aI$ for some $a > 1/2$ then any set of $d$ mutually orthogonal unit vectors can be used. Recall the spectral decomposition $E = PBP^{-1}$ from Example 5.5, where $P$ is orthogonal and $B$ is given by (5.1), with $d_i \times d_i$ blocks $B_i = a_i I$ for some $1/2 \le a_1 < \cdots < a_p$. Let $D_0 = 0$ and $D_i = d_1 + \cdots + d_i$ for $1 \le i \le p$. Then $q_j = P e_j$ is a unit eigenvector of the matrix $E$, and the $d_i$-dimensional subspace $V_i = \mathrm{span}\{q_j : D_{i-1} < j \le D_i\}$ contains every eigenvector of $E$ with associated eigenvalue $a_i$.

Our estimator for $E$ is based on the sample covariance matrix $M_n$ defined in (6.3). Since $M_n$ is symmetric and nonnegative definite, there exists an orthonormal basis of eigenvectors for $M_n$ with nonnegative eigenvalues. Eigenvalues and eigenvectors of $M_n$ are easily computed using standard numerical routines; see for example Press et al. (1987).
Sort the eigenvalues $\lambda_1 \le \cdots \le \lambda_d$ and the associated unit eigenvectors $\theta_1,\dots,\theta_d$ so that $M_n\theta_j = \lambda_j\theta_j$ for each $j = 1,\dots,d$. Now Theorem 10.4.5 in Meerschaert and Scheffler (2001a) shows that

$$\frac{\log n + \log\lambda_j}{2\log n} \to a_i \quad\text{as } n \to \infty$$

in probability for any $D_{i-1} < j \le D_i$. This is a multivariable analogue for the one variable tail estimator (8.4). Furthermore, Theorem 10.4.8 in Meerschaert and Scheffler (2001a) shows that the eigenvectors $\theta_j$ converge in probability to $V_1$ when $j \le D_1$, and to $V_p$ when $j > D_{p-1}$. This shows that the eigenvectors estimate the coordinate vectors in the spectral decomposition, at least for the lightest and heaviest tails.

Now we illustrate the practical application of the multivariable tail estimator. Recall that $X_t = (X_1(t),\dots,X_d(t))'$ where $X_i(t)$ is the price change of the $i$-th asset on day $t$. Compute the (uncentered) sample covariance matrix $M_n$ using the formula (6.3) and then compute the eigenvalues $\lambda_1 \le \cdots \le \lambda_d$ and the associated eigenvectors

$$\theta_1 = \bigl(\theta_1(1),\dots,\theta_d(1)\bigr)',\ \dots,\ \theta_d = \bigl(\theta_1(d),\dots,\theta_d(d)\bigr)' \tag{8.5}$$

of the matrix $M_n$. A change of coordinates is essential to the method. Write $Z_j(t) = X_t\cdot\theta_j = X_1(t)\theta_1(j) + \cdots + X_d(t)\theta_d(j)$ for each $j = 1,\dots,d$. Our portfolio model is based on these new coordinates. Let

$$\hat\alpha_j = \frac{2\log n}{\log n + \log\lambda_j}$$

for each $j = 1,\dots,d$. Since the eigenvalues are sorted in increasing order we will have $\hat\alpha_1 \ge \cdots \ge \hat\alpha_d$. Our model assumes that $Z_j(t)$ are identically distributed with $Z_j$, and the tail parameter $\hat\alpha_j$ governs the $j$-th coordinate $Z_j$. If $\hat\alpha_j < 2$ then $P(|Z_j| > r)$ falls off like $r^{-\hat\alpha_j}$ and if $\hat\alpha_j \ge 2$ then a finite variance model for $Z_j$ is adequate. We can also use any other one variable tail estimator to get $\hat\alpha_j$ for each of the new coordinates $Z_j(t)$. The new coordinates unmask variations in $\alpha$ that would go undetected in the original coordinates.

Example 8.1. We look at a data set of $n = 2853$ daily exchange rate log-returns $X_1(t)$ for the German Deutsche Mark and $X_2(t)$ for the Japanese Yen, both taken against the US Dollar.
We divide each entry by 0.004 which is the approximate median for both $|X_1(t)|$ and $|X_2(t)|$. This has no effect on the eigenvectors but helps to obtain good estimates of the tail thickness. Then we compute

$$M_n = \frac{1}{n}\sum_{t=1}^{n}\begin{pmatrix} X_1(t)^2 & X_1(t)X_2(t) \\ X_1(t)X_2(t) & X_2(t)^2 \end{pmatrix} = \begin{pmatrix} 3.204 & 2.100 \\ 2.100 & 3.011 \end{pmatrix}$$

which has eigenvalues $\lambda_1 = 1.006$, $\lambda_2 = 5.209$ and associated unit eigenvectors $\theta_1 = (0.69,-0.72)'$, $\theta_2 = (0.72,0.69)'$. Next we compute

$$\hat\alpha_1 = \frac{2\ln 2853}{\ln 2853 + \ln 1.006} = 1.998, \qquad \hat\alpha_2 = \frac{2\ln 2853}{\ln 2853 + \ln 5.209} = 1.656 \tag{8.6}$$

indicating that $Z_1(t) = 0.69X_1(t) - 0.72X_2(t)$ fits a finite variance model but $Z_2(t) = 0.72X_1(t) + 0.69X_2(t)$ fits a heavy tailed model with $\alpha = 1.656$. Then we can model $Z_t = (Z_1(t),Z_2(t))'$ as being identically distributed with the random vector $Z = (Z_1,Z_2)'$ where $P(|Z_2| > r) \approx C_1 r^{-1.656}$ and $\operatorname{Var}(Z_1) < \infty$. The simplest model with these properties is to take $Z_1(t)$ normal and $Z_2(t)$ stable with index $\alpha = 1.656$ and independent of $Z_1(t)$.

Fig. 3. Exchange rates against the US dollar. The new coordinates uncover variations in the tail parameter $\alpha$.

Next we explain the operator stable model based on these estimates. The random vectors $Z_t$ are operator stable with exponent

$$B = \begin{pmatrix} 0.50 & 0 \\ 0 & 0.60 \end{pmatrix}$$

since $0.50 = 1/1.998$ and $0.60 = 1/1.656$. The change of coordinates matrix is

$$P = \begin{pmatrix} 0.69 & -0.72 \\ 0.72 & 0.69 \end{pmatrix}$$

so that

$$Z_t = \begin{pmatrix} Z_1(t) \\ Z_2(t) \end{pmatrix} = \begin{pmatrix} 0.69 & -0.72 \\ 0.72 & 0.69 \end{pmatrix}\begin{pmatrix} X_1(t) \\ X_2(t) \end{pmatrix} = PX_t.$$

Since

$$P^{-1} = \begin{pmatrix} 0.69 & 0.72 \\ -0.72 & 0.69 \end{pmatrix}$$

(rounded off to two decimal places) we also have

$$X_t = P^{-1}Z_t = \begin{pmatrix} X_1(t) \\ X_2(t) \end{pmatrix} = \begin{pmatrix} 0.69 & 0.72 \\ -0.72 & 0.69 \end{pmatrix}\begin{pmatrix} Z_1(t) \\ Z_2(t) \end{pmatrix}$$

so that

$$X_1(t) = 0.69Z_1(t) + 0.72Z_2(t), \qquad X_2(t) = -0.72Z_1(t) + 0.69Z_2(t). \tag{8.7}$$

Both exchange rates have a common heavy-tailed stable factor $Z_2(t)$ and so both exchange rates have heavy tails with the same tail index $\alpha = 1.656$.
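The arithmetic in this example can be retraced with a short Python sketch. It starts from the rounded matrix $M_n$ and the sample size $n = 2853$ quoted in the example (the underlying exchange rate data are not reproduced here), so the results agree with the reported eigenvalues and with (8.6) only up to small rounding differences. The helper `new_coordinates` and the sign normalization of the eigenvectors are assumptions made for illustration.

```python
import numpy as np

n = 2853                                   # sample size quoted in Example 8.1
M_n = np.array([[3.204, 2.100],
                [2.100, 3.011]])           # rounded matrix quoted in Example 8.1

# Ascending eigenvalues; the unit eigenvectors are the columns of theta.
lam, theta = np.linalg.eigh(M_n)

# Eigenvectors are only defined up to sign; make each first component positive
# so that they can be compared with (0.69, -0.72)' and (0.72, 0.69)'.
theta = theta * np.sign(theta[0, :])

# Tail estimates: alpha_hat_j = 2 ln n / (ln n + ln lambda_j).
alpha_hat = 2.0 * np.log(n) / (np.log(n) + np.log(lam))

def new_coordinates(X, theta):
    """Z_j(t) = X_t . theta_j for every row X_t of an (n, d) array X."""
    return np.asarray(X, dtype=float) @ theta

print("eigenvalues :", lam)
print("eigenvectors:", theta[:, 0], theta[:, 1])
print("alpha_hat   :", alpha_hat)
```

Applying `new_coordinates` to the rescaled log-returns would give the series $Z_1(t)$ and $Z_2(t)$ on which any other one variable tail estimator could then be run.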
It is tempting to interpret $Z_2(t)$ as the common influence of fluctuations in the US dollar, and the remaining light-tailed factor $Z_1(t)$ as the accumulation of other price shocks independent of the US dollar. We also take the opportunity to fill in the details of Example 5.4 in this simple case. The original data $X_t = P^{-1}Z_t$ is modeled as operator stable with exponent

$$E = P^{-1}BP = \begin{pmatrix} 0.55 & 0.05 \\ 0.05 & 0.55 \end{pmatrix}.$$

In this case, $Z_1(t)$ and $Z_2(t)$ are independent so the density of $Z_t$ is the product of the two marginal densities, and then the density of $X_t$ can be obtained by a simple change of variables. The rows of the change of variables matrix $P$ (equivalently, the columns of $P^{-1}$) are the eigenvectors $\theta_j$ of the sample covariance matrix, which estimate the theoretical coordinate system vectors $p_j$ in the spectral decomposition. Thus $P^{-1}$ plays the role of the orthogonal matrix in the decomposition $E = PBP^{-1}$ recalled from Example 5.5.

Remark 8.2. This exchange rate data in Example 8.1 was also analyzed by Nolan, Panorska and McCulloch (2001) using a multivariable stable model. Since both marginals $X_1(t)$ and $X_2(t)$ have heavy tails with the same $\alpha$, there is no obvious reason to employ a more complicated model. However, the change of coordinates in Example 8.1 uncovers variations in the tail parameter $\alpha$, an important modeling insight.

Remark 8.3. Kotz, Kozubowski and Podgórski (2001) employ a very different model for the data in Example 8.1, based on the Laplace distribution. This distribution, and its multivariable analogues, assume exponential probability tails for the data. These models have heavier tails than the Gaussian, but they have moments of all orders.

Remark 8.4. The simplistic model in Example 8.1 assumes that the two factors $Z_1$ and $Z_2$ are independent. If we assume that $Z$ is operator stable with $Z_1$ normal and $Z_2$ stable then these components must be independent, in view of the general characteristic function formula for operator stable laws. Another alternative is to assume that $Z_1$ is stable with index $\alpha = 1.998$, very close to a normal distribution. In this case, the two components can be dependent. The dependence is captured by the mixing measure or spectral measure, see Example 4.1. Scheffler (1999) provides a method for estimating the spectral measure from data for an operator stable random vector with a known exponent. This provides a more flexible model including dependence between the two factors.

9. Tail estimator proof for dependent random vectors

In this section, we provide a proof that the multivariable tail estimator of Section 8 is still valid for certain sequences of dependent heavy tailed random vectors. We say that a sequence $(B_n)$ of invertible linear operators is regularly varying with index $-E$ if for any $\lambda > 0$ we have

$$B_{[\lambda n]}B_n^{-1} \to \lambda^{-E} \quad\text{as } n \to \infty.$$

For further information about regular variation of linear operators see Meerschaert and Scheffler (2001a, Chapter 4). In view of Theorem 2.1.14 of Meerschaert and Scheffler (2001a) we can write $\mathbb{R}^d = V_1 \oplus \cdots \oplus V_p$ and $E = E_1 \oplus \cdots \oplus E_p$ for some $1 \le p \le d$, where each $V_i$ is $E$ invariant, $E_i : V_i \to V_i$, and every eigenvalue $\lambda$ of $E_i$ has real part $\operatorname{Re}(\lambda) = a_i$ for some $a_1 < \cdots < a_p$. By Definition 2.1.15 of Meerschaert and Scheffler (2001a) this is called the spectral decomposition of $\mathbb{R}^d$ with respect to $E$. By Definition 4.3.13 of Meerschaert and Scheffler (2001a) we say that $(B_n)$ is spectrally compatible with $-E$ if every $V_i$ is $B_n$-invariant for all $n$. Note that in this case we can write $B_n = B_{1n} \oplus \cdots \oplus B_{pn}$ and each $B_{in} : V_i \to V_i$ is regularly varying with index $-E_i$. [See Proposition 4.3.14 of Meerschaert and Scheffler (2001a).]

For the proofs in this section we will always assume that the subspaces $V_i$ in the spectral decomposition of $\mathbb{R}^d$ with respect to $E$ are mutually orthogonal. We will also assume that $(B_n)$ is spectrally compatible with $-E$. Let $\pi_i$ denote the orthogonal projection operator onto $V_i$. If we let $P_i = \pi_i + \cdots + \pi_p$ and $L_i = V_i \oplus \cdots \oplus V_p$ then $P_i : \mathbb{R}^d \to L_i$ is an orthogonal projection. Furthermore, $\bar P_i = \pi_1 + \cdots + \pi_i$ is the orthogonal projection onto $\bar L_i = V_1 \oplus \cdots \oplus V_i$. Now assume $0 < a_1 < \cdots < a_p$.
Since $(B_n)$ is spectrally compatible with $-E$, Proposition 4.3.14 of Meerschaert and Scheffler (2001a) shows that the conclusions of Theorem 4.3.1 of Meerschaert and Scheffler (2001a) hold with $L_i = V_i \oplus \cdots \oplus V_p$ for each $i = 1,\dots,p$. Then for any $\varepsilon > 0$ and any $x \in L_i \setminus L_{i+1}$ we have

$$n^{-a_i-\varepsilon} \le \|B_n x\| \le n^{-a_i+\varepsilon} \tag{9.1}$$

for all large $n$. Then

$$\frac{\log\|B_n x\|}{\log n} \to -a_i \quad\text{as } n \to \infty \tag{9.2}$$

and since this convergence is uniform on compact subsets of $L_i \setminus L_{i+1}$ we also have

$$\frac{\log\|\pi_i B_n\|}{\log n} \to -a_i \quad\text{as } n \to \infty. \tag{9.3}$$

It follows that

$$\frac{\log\|B_n\|}{\log n} \to -a_1 \quad\text{as } n \to \infty. \tag{9.4}$$

Since $(B_n^*)^{-1}$ is regularly varying with index $E^*$, a similar argument shows that for any $x \in \bar L_i \setminus \bar L_{i-1}$ we have

$$n^{a_i-\varepsilon} \le \|(B_n^*)^{-1}x\| \le n^{a_i+\varepsilon} \tag{9.5}$$

for all large $n$. Then

$$\frac{\log\|(B_n^*)^{-1}x\|}{\log n} \to a_i \quad\text{as } n \to \infty \tag{9.6}$$

and since this convergence is uniform on compact subsets of $\bar L_i \setminus \bar L_{i-1}$ we also have

$$\frac{\log\|\pi_i(B_n^*)^{-1}\|}{\log n} \to a_i \quad\text{as } n \to \infty. \tag{9.7}$$

Hence

$$\frac{\log\|(B_n^*)^{-1}\|}{\log n} \to a_p \quad\text{as } n \to \infty. \tag{9.8}$$

Suppose that $X_t$, $t = 1,2,\dots$, are $\mathbb{R}^d$-valued random vectors and let $M_n$ be the sample covariance matrix of $(X_t)$ defined by (6.3). Note that $M_n$ is symmetric and positive semidefinite. Let $0 \le \lambda_{1n} \le \cdots \le \lambda_{dn}$ denote the eigenvalues of $M_n$ and let $\theta_{1n},\dots,\theta_{dn}$ be the corresponding orthonormal basis of eigenvectors.

Basic Assumptions. Assume that for some exponent $E$ with real spectrum $1/2 < a_1 < \cdots < a_p$ the subspaces $V_i$ in the spectral decomposition of $\mathbb{R}^d$ with respect to $E$ are mutually orthogonal, and there exists a sequence $(B_n)$ regularly varying with index $-E$ and spectrally compatible with $-E$ such that:
(A1) The set $\{n(B_nM_nB_n^*) : n \ge 1\}$ is weakly relatively compact.
(A2) For any limit point $M$ of this set we have:
(a) $M$ is almost surely positive definite.
(b) For all unit vectors $\theta$ the random variable $M\theta\cdot\theta$ has no atom at zero.

Now let $\mathbb{R}^d = V_1 \oplus \cdots \oplus V_p$ be the spectral decomposition of $\mathbb{R}^d$ with respect to $E$. Put $d_i = \dim V_i$ and for $i = 1,\dots,p$ let $b_i = d_i + \cdots + d_p$ and $\bar b_i = d_1 + \cdots + d_i$.
Our goal is now to estimate the real spectrum $a_1 < \cdots < a_p$ of $E$ as well as the spectral decomposition $V_1,\dots,V_p$. In various situations, these quantities completely describe the moment behavior of the $X_t$.

Theorem 9.1. Under our basic assumptions, for $i = 1,\dots,p$ and $\bar b_{i-1} < j \le \bar b_i$ we have

$$\frac{\log(n\lambda_{jn})}{2\log n} \to a_i \quad\text{in probability as } n \to \infty.$$

The proof of Theorem 9.1 is in parts quite similar to that of Theorem 2 in Meerschaert and Scheffler (1999b). See also Section 10.4 in Meerschaert and Scheffler (2001a), and Scheffler (1998). We include it here for the sake of completeness.

Proposition 9.2. Under our basic assumptions we have

$$\frac{\log(n\lambda_{dn})}{2\log n} \to a_p \quad\text{in probability.}$$

Proof: For $\delta > 0$ arbitrary we have

$$P\left\{\left|\frac{\log(n\lambda_{dn})}{2\log n} - a_p\right| > \delta\right\} \le P\{\lambda_{dn} > n^{2(a_p+\delta)-1}\} + P\{\lambda_{dn} < n^{2(a_p-\delta)-1}\}.$$

Now choose $0 < \varepsilon < \delta$ and note that by (9.8) we have $\|(B_n^*)^{-1}\| \le n^{a_p+\varepsilon}$ for all large $n$. Using assumption (A1) we obtain for all large $n$

$$P\{\lambda_{dn} > n^{2(a_p+\delta)-1}\} = P\{\|M_n\| > n^{2(a_p+\delta)-1}\} \le P\{\|(B_n^*)^{-1}\|^2\,\|nB_nM_nB_n^*\| > n^{2(a_p+\delta)}\} \le P\{\|nB_nM_nB_n^*\| > n^{2(\delta-\varepsilon)}\}$$

and the last probability tends to zero as $n \to \infty$.

Now fix any $\theta_0 \in \bar L_p \setminus \bar L_{p-1}$ and write $(B_n^*)^{-1}\theta_0 = r_n\theta_n$ for some unit vector $\theta_n$ and $r_n > 0$. Theorem 4.3.14 of Meerschaert and Scheffler (2001a) shows that every limit point of $(\theta_n)$ lies in the unit sphere in $V_p$. Then since (9.5) holds uniformly on compact sets we have for any $0 < \varepsilon < \delta$ that $n^{a_p-\varepsilon} \le r_n \le n^{a_p+\varepsilon}$ for all large $n$. Then for all large $n$ we get

$$P\{\lambda_{dn} < n^{2(a_p-\delta)-1}\} = P\Bigl\{\max_{\|\theta\|=1} M_n\theta\cdot\theta < n^{2(a_p-\delta)-1}\Bigr\} \le P\{M_n\theta_0\cdot\theta_0 < n^{2(a_p-\delta)-1}\}$$
$$= P\{nB_nM_nB_n^*\theta_n\cdot\theta_n < r_n^{-2}n^{2(a_p-\delta)}\} \le P\{nB_nM_nB_n^*\theta_n\cdot\theta_n < n^{2(\varepsilon-\delta)}\}.$$

Given any subsequence $(n')$ there exists a further subsequence $(n'') \subset (n')$ along which $\theta_n \to \theta$. Furthermore, by assumption (A1) there exists another subsequence $(n''') \subset (n'')$ such that $nB_nM_nB_n^* \Rightarrow M$ along $(n''')$. Hence by continuous mapping [see Theorem 1.2.8 in Meerschaert and Scheffler (2001a)] we have

$$nB_nM_nB_n^*\theta_n\cdot\theta_n \Rightarrow M\theta\cdot\theta$$

along $(n''')$. Now, given any $\varepsilon_1 > 0$ by assumption (A2)(b) there exists a $\rho > 0$ such that $P\{M\theta\cdot\theta < \rho\} < \varepsilon_1/2$.
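Hence for all large $n = n'''$ we have

$$P\{nB_nM_nB_n^*\theta_n\cdot\theta_n < n^{2(\varepsilon-\delta)}\} \le P\{nB_nM_nB_n^*\theta_n\cdot\theta_n < \rho\} \le P\{M\theta\cdot\theta < \rho\} + \frac{\varepsilon_1}{2} < \varepsilon_1.$$

Since for any subsequence there exists a further subsequence along which $P\{nB_nM_nB_n^*\theta_n\cdot\theta_n < n^{2(\varepsilon-\delta)}\} \to 0$, this convergence holds along the entire sequence, which concludes the proof.

Proposition 9.3. Under the basic assumptions we have

$$\frac{\log(n\lambda_{1n})}{2\log n} \to a_1 \quad\text{in probability.}$$

Proof: Since the set $GL(\mathbb{R}^d)$ of invertible matrices is an open subset of the vector space of $d \times d$ real matrices, it follows from (A1) and (A2)(a) together with the Portmanteau Theorem [cf. Theorem 1.2.2 in Meerschaert and Scheffler (2001a)] that $\lim_{n\to\infty} P\{M_n \in GL(\mathbb{R}^d)\} = 1$ holds. Hence we can assume without loss of generality that $M_n$ is invertible for all large $n$. Given any $\delta > 0$ write

$$P\left\{\left|\frac{\log(n\lambda_{1n})}{2\log n} - a_1\right| > \delta\right\} \le P\{\lambda_{1n} > n^{2(a_1+\delta)-1}\} + P\{\lambda_{1n} < n^{2(a_1-\delta)-1}\}.$$

To estimate the first probability on the right-hand side of the inequality above choose a unit vector $\theta_0 \in \bar L_1$ and write $(B_n^*)^{-1}\theta_0 = r_n\theta_n$ as above. Then, since (9.5) holds uniformly on the unit sphere in $\bar L_1 = V_1$, for $0 < \varepsilon < \delta$ we have $n^{a_1-\varepsilon} \le r_n \le n^{a_1+\varepsilon}$ for all large $n$. Therefore for all large $n$

$$P\{\lambda_{1n} > n^{2(a_1+\delta)-1}\} = P\Bigl\{\min_{\|\theta\|=1} M_n\theta\cdot\theta > n^{2(a_1+\delta)-1}\Bigr\} \le P\{M_n\theta_0\cdot\theta_0 > n^{2(a_1+\delta)-1}\} \le P\{nB_nM_nB_n^*\theta_n\cdot\theta_n > n^{2(\delta-\varepsilon)}\}.$$

It follows from assumption (A1) together with the compactness of the unit sphere in $\mathbb{R}^d$ and continuous mapping that the sequence $(nB_nM_nB_n^*\theta_n\cdot\theta_n)$ is weakly relatively compact and hence by Prohorov's Theorem this sequence is uniformly tight. Since $\delta > \varepsilon$ it follows that $P\{\lambda_{1n} > n^{2(a_1+\delta)-1}\} \to 0$ as $n \to \infty$.

Since the smallest eigenvalue of $M_n$ is the reciprocal of the largest eigenvalue of $M_n^{-1}$ we have

$$P\{\lambda_{1n} < n^{2(a_1-\delta)-1}\} = P\Bigl\{\frac{1}{\lambda_{1n}} > n^{2(\delta-a_1)+1}\Bigr\} = P\Bigl\{\max_{\|\theta\|=1} M_n^{-1}\theta\cdot\theta > n^{2(\delta-a_1)+1}\Bigr\} = P\{\|M_n^{-1}\| > n^{2(\delta-a_1)+1}\}$$
$$\le P\Bigl\{\Bigl\|\frac{1}{n}(B_n^*)^{-1}M_n^{-1}B_n^{-1}\Bigr\| > \|B_n\|^{-2}n^{2(\delta-a_1)}\Bigr\}.$$

It follows from (9.4) that for any $0 < \varepsilon < \delta$ there exists a constant $C > 0$ such that $\|B_n\| \le Cn^{-a_1+\varepsilon}$ for all $n$ and hence for some constant $K > 0$ we get $\|B_n\|^{-2} \ge Kn^{2(a_1-\varepsilon)}$ for all $n$.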
Note that by assumptions (A1) and (A2)(a) together with continuous mapping the sequence $\frac{1}{n}(B_n^*)^{-1}M_n^{-1}B_n^{-1}$ is weakly relatively compact and hence by Prohorov's theorem this sequence is uniformly tight. Hence

$$P\Bigl\{\Bigl\|\frac{1}{n}(B_n^*)^{-1}M_n^{-1}B_n^{-1}\Bigr\| > \|B_n\|^{-2}n^{2(\delta-a_1)}\Bigr\} \le P\Bigl\{\Bigl\|\frac{1}{n}(B_n^*)^{-1}M_n^{-1}B_n^{-1}\Bigr\| > Kn^{2(\delta-\varepsilon)}\Bigr\} \to 0$$

as $n \to \infty$. This concludes the proof.

Proof of Theorem 9.1: Let $\mathcal{C}_j$ denote the collection of all orthogonal projections onto subspaces of $\mathbb{R}^d$ with dimension $j$. The Courant–Fischer Max–Min Theorem [see Rao (1965, p. 51)] implies that

$$\lambda_{jn} = \min_{P\in\mathcal{C}_j}\max_{\|\theta\|=1} PM_nP\theta\cdot\theta = \max_{P\in\mathcal{C}_{d-j+1}}\min_{\|\theta\|=1} PM_nP\theta\cdot\theta, \tag{9.9}$$

where $\theta$ ranges over unit vectors in the range of $P$. Note that $P_i^2 = P_i$ and that $B_n$ and $P_i$ commute for all $n, i$. Furthermore $(P_iB_n)$ is regularly varying with index $-(E_i \oplus \cdots \oplus E_p)$. Since

$$n(P_iB_n)P_iM_nP_i(B_n^*P_i) = nP_i(B_nM_nB_n^*)P_i$$

it follows by projection from our basic assumptions that the sample covariance matrix formed from the $L_i$-valued random variables $P_iX_t$ satisfies again those basic assumptions with $E = E_i \oplus \cdots \oplus E_p$ on $L_i$. Hence if $\lambda_n$ denotes the smallest eigenvalue of the matrix $P_iM_nP_i$ it follows from Proposition 9.3 that

$$\frac{\log(n\lambda_n)}{2\log n} \to a_i \quad\text{in probability.}$$

Similarly, the sample covariance matrix formed in terms of the $\bar L_i$-valued random vectors $\bar P_iX_t$ again satisfies the basic assumptions with $E = E_1 \oplus \cdots \oplus E_i$ as above. Then, if $\bar\lambda_n$ denotes the largest eigenvalue of the matrix $\bar P_iM_n\bar P_i$ it follows from Proposition 9.2 above that

$$\frac{\log(n\bar\lambda_n)}{2\log n} \to a_i \quad\text{in probability.}$$

Now apply (9.9) to see that $\lambda_n \le \lambda_{jn} \le \bar\lambda_n$ whenever $\bar b_{i-1} < j \le \bar b_i$. The result now follows easily.

After dealing with the asymptotics of the eigenvalues of the sample covariance in Theorem 9.1 above we now investigate the convergence of the unit eigenvectors of $M_n$. Recall that $\pi_i : \mathbb{R}^d \to V_i$ denotes the orthogonal projection onto $V_i$ for $i = 1,\dots,p$. Define the random projection

$$\pi_{in}(x) = \sum_{j=\bar b_{i-1}+1}^{\bar b_i} (x\cdot\theta_{jn})\,\theta_{jn}.$$

Theorem 9.4. Under the basic assumptions we have $\pi_{1n} \to \pi_1$ and $\pi_{pn} \to \pi_p$ in probability as $n \to \infty$.
Again the proof is quite similar to the proof of Theorem 3 in Meerschaert and Scheffler (1999b) and Theorem 10.4.8 in Meerschaert and Scheffler (2001a). See also Scheffler (1998). We include here a sketch of the arguments.

Proposition 9.5. Under our basic assumptions we have: If $j > \bar b_{p-1}$ and $r < p$ then $\pi_r\theta_{jn} \to 0$ in probability.

Proof: Since $\pi_r\theta_{jn} = (\pi_rM_n/\lambda_{jn})\theta_{jn}$ we get

$$\|\pi_r\theta_{jn}\| \le \|\pi_rM_n/\lambda_{jn}\| \le \frac{\|\pi_rB_n^{-1}\|\,\|nB_nM_nB_n^*\|\,\|(B_n^*)^{-1}\|}{n\lambda_{jn}}.$$

By assumption (A1) together with continuous mapping it follows from Prohorov's theorem that $(\|nB_nM_nB_n^*\|)$ is uniformly tight. Also, by (9.7), (9.8) and Theorem 9.1 we get

$$\frac{\log\bigl(\|\pi_rB_n^{-1}\|\,\|(B_n^*)^{-1}\|/(n\lambda_{jn})\bigr)}{\log n} = \frac{\log\|\pi_rB_n^{-1}\|}{\log n} + \frac{\log\|(B_n^*)^{-1}\|}{\log n} - \frac{\log(n\lambda_{jn})}{\log n} \to a_r + a_p - 2a_p < 0$$

in probability. Hence the assertion follows.

Proposition 9.6. Under our basic assumptions we have: If $j \le \bar b_1$ and $r > 1$ then $\pi_r\theta_{jn} \to 0$ in probability.

Proof: Since $\pi_r\theta_{jn} = (\pi_rM_n^{-1}\lambda_{jn})\theta_{jn}$ we get

$$\|\pi_r\theta_{jn}\| \le \|\pi_rM_n^{-1}\lambda_{jn}\| \le \|\pi_rB_n^*\|\,\Bigl\|\frac{1}{n}(B_n^*)^{-1}M_n^{-1}B_n^{-1}\Bigr\|\,\|B_n\|\,(n\lambda_{jn}).$$

As in the proof of Proposition 9.3 the sequence $\bigl(\frac{1}{n}(B_n^*)^{-1}M_n^{-1}B_n^{-1}\bigr)$ is uniformly tight and now the assertion follows as in the proof of Proposition 9.5.

Proof of Theorem 9.4: The proof is almost identical to the proof of Theorem 3 in Meerschaert and Scheffler (1999b) or Theorem 10.4.8 in Meerschaert and Scheffler (2001a) and is therefore omitted.

Corollary 9.7. Under our basic assumptions, if $p \le 3$ then $\pi_{in} \to \pi_i$ in probability for $i = 1,\dots,p$.

Proof: Obvious.

Example 9.8. Suppose that $Z, Z_1, Z_2, \dots$ is a sequence of independent and identically distributed (IID) random vectors with common distribution $\mu$. We assume that $\mu$ is regularly varying with exponent $E$. That means that there exists a regularly varying sequence $(A_n)$ of linear operators with index $-E$ such that

$$n(A_n\mu) \to \phi \quad\text{as } n \to \infty. \tag{9.10}$$

For more information on regularly varying measures see Meerschaert and Scheffler (2001a, Chapter 6). Regularly varying measures are closely related to the generalized central limit theorem discussed in Section 3.
Recall that if

$$A_n(Z_1 + \cdots + Z_n - nb_n) \Rightarrow Y \quad\text{as } n \to \infty \tag{9.11}$$

for some nonrandom $b_n \in \mathbb{R}^d$, we say that $Z$ belongs to the generalized domain of attraction of $Y$ and we write $Z \in \mathrm{GDOA}(Y)$. Corollary 8.2.12 in Meerschaert and Scheffler (2001a) shows that $Z \in \mathrm{GDOA}(Y)$ and (9.11) holds if and only if $\mu$ varies regularly with exponent $E$ and (9.10) holds, where the real parts of the eigenvalues of $E$ are greater than 1/2. In this case, $Y$ has an operator stable distribution and the measure $\phi$ in (9.10) is the Lévy measure of the distribution of $Y$. Operator stable distributions and Lévy measures were discussed in Section 4, where (9.10) is written in the equivalent form $nP(A_nZ \in dx) \to \phi(dx)$. The spectral decomposition was discussed in Section 5. Theorem 8.3.24 in Meerschaert and Scheffler (2001a) shows that we can always choose norming operators $A_n$ and limit $Y$ in (9.11) so that $Y$ is spectrally compatible with $Z$, meaning that $A_n$ varies regularly with some exponent $-E$, the subspaces $V_i$ in the spectral decomposition of $\mathbb{R}^d$ with respect to $E$ are mutually orthogonal, and these subspaces are also $A_n$-invariant for every $n$. In this case, we write $Z \in \mathrm{GDOA}_c(Y)$.

Recall from Section 6 that, since the real parts of the eigenvalues of $E$ are greater than 1/2,

$$nA_nM_nA_n^* \Rightarrow W \quad\text{as } n \to \infty, \tag{9.12}$$

where $M_n$ is the uncentered sample covariance matrix

$$M_n = \frac{1}{n}\sum_{i=1}^{n} Z_iZ_i'$$

and $W$ is a random $d \times d$ matrix whose distribution is operator stable. Theorem 10.2.9 in Meerschaert and Scheffler (2001a) shows that $W$ is invertible with probability one, and Theorem 10.4.2 in Meerschaert and Scheffler (2001a) shows that for all unit vectors $\theta \in \mathbb{R}^d$ the random variable $W\theta\cdot\theta$ has a Lebesgue density. Then the basic assumptions of this section hold, and hence the results of this section apply.

The tail estimator proven in this section approximates the spectral index function $\alpha(x)$ defined in (5.2). This index function provides sharp bounds on the tails and radial projection moments of $Z$.
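The estimate of the spectral index built from the eigenpairs of $M_n$ in this example, $\hat\alpha(x) = \min\{\hat\alpha_j : x\cdot\theta_{jn} \ne 0\}$ with $\hat\alpha_j = 2\log n/\log(n\lambda_{jn})$, can be sketched in Python as follows. The function name and the tolerance `tol` are assumptions introduced for illustration: with floating point eigenvectors a coordinate is essentially never exactly zero, so a small relative threshold stands in for the condition $x\cdot\theta_{jn} \ne 0$. The numbers reuse the rounded matrix and sample size quoted in Example 8.1.

```python
import numpy as np

def spectral_index_estimate(x, lam, theta, n, tol=1e-8):
    """Sketch of alpha_hat(x) = min{alpha_hat_j : x . theta_j != 0}.

    x     : nonzero direction (portfolio weights), shape (d,)
    lam   : eigenvalues of M_n, shape (d,)
    theta : unit eigenvectors of M_n as columns, shape (d, d)
    n     : sample size
    tol   : hypothetical relative threshold replacing the exact test "!= 0"
    """
    x = np.asarray(x, dtype=float)
    alpha_hat = 2.0 * np.log(n) / np.log(n * lam)
    coords = theta.T @ x                       # x_j = x . theta_j
    active = np.abs(coords) > tol * np.linalg.norm(x)
    return alpha_hat[active].min()

n = 2853
M_n = np.array([[3.204, 2.100],
                [2.100, 3.011]])
lam, theta = np.linalg.eigh(M_n)

# A single exchange rate loads on both eigenvectors, so the heavier tail wins.
alpha_single_asset = spectral_index_estimate([1.0, 0.0], lam, theta, n)

# A portfolio aligned with the first eigenvector avoids the heavy-tailed factor.
alpha_light_portfolio = spectral_index_estimate(theta[:, 0], lam, theta, n)

print(alpha_single_asset, alpha_light_portfolio)
```

Under these assumptions the first call returns the smaller estimate (about 1.66) and the second the larger one (about 2.00), which illustrates how the eigenvector coordinate system singles out the directions with the lightest and heaviest tails.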
Given a $d$-dimensional data set $Z_1,\dots,Z_n$ with uncentered covariance matrix $M_n$, let $0 \le \lambda_{1n} \le \cdots \le \lambda_{dn}$ denote the eigenvalues of $M_n$ and $\theta_{1n},\dots,\theta_{dn}$ the corresponding orthonormal basis of eigenvectors. Writing $x_j = x\cdot\theta_{jn}$ we can estimate the spectral index $\alpha(x)$ by

$$\hat\alpha(x) = \min\{\hat\alpha_j : x_j \ne 0\}, \quad\text{where}\quad \hat\alpha_j = \frac{2\log n}{\log(n\lambda_{jn})},$$

using the results of this section. Hence the eigenvalues are used to approximate the tail behavior, and the eigenvectors determine the coordinate system to which these estimates pertain. A practical application of this tail estimator appears in Example 8.1.

Example 9.9. The same tail estimation methods used in the previous example also apply to the moving averages considered in Section 7. This result is apparently new. Given a sequence of IID random vectors $Z, Z_j$ whose common distribution $\mu$ varies regularly with exponent $E$, so that (9.10) holds, we define the moving average process

$$X_t = \sum_{j=-\infty}^{\infty} C_jZ_{t-j}, \tag{9.13}$$

where we assume that the $d \times d$ matrices $C_j$ fulfill for each $j$ either $C_j = 0$ or $C_j$ is invertible and $A_nC_j = C_jA_n$ for all $n$. Moreover if $a_p$ denotes the largest real part of the eigenvalues of $E$ we assume further

$$\sum_{j=-\infty}^{\infty} \|C_j\|^{\delta} < \infty \tag{9.14}$$

for some $\delta < 1/a_p$ with $\delta \le 1$. Recall from Section 7 that under those conditions $X_t$ is almost surely well defined, and that if the real parts of the eigenvalues of $E$ are greater than 1/2 we have that

$$nA_n\hat\Gamma_n(0)A_n^* \Rightarrow M = \sum_{j=-\infty}^{\infty} C_jWC_j^* \quad\text{as } n \to \infty, \tag{9.15}$$

where the sample covariance matrix $\hat\Gamma_n(h)$ is defined by (7.6) and $W$ is a random $d \times d$ matrix whose probability distribution is operator stable. Suppose that the norming operators $A_n$ are chosen so that (9.11) holds and $Z \in \mathrm{GDOA}_c(Y)$. Then in view of our basic assumptions (A1) and (A2) it remains to show:

Lemma 9.10. Under the assumptions of the paragraph above the limiting matrix $M$ in (9.15) is a.s. positive definite and for any unit vector $\theta$ the random variable $M\theta\cdot\theta$ has no atom at zero.

Proof: Since $W$ in (9.12) is a.s.
positive definite we have for any $\theta \ne 0$ that

$$C_jWC_j^*\theta\cdot\theta = WC_j^*\theta\cdot C_j^*\theta \ge 0$$

for all $j$ and strictly greater than zero for those $j$ with $C_j \ne 0$. Hence

$$M\theta\cdot\theta = \sum_{j=-\infty}^{\infty} C_jWC_j^*\theta\cdot\theta > 0$$

for any $\theta \ne 0$ so $M$ is positive definite. Moreover if for a given unit vector $\theta$ we set $z_j = C_j^*\theta$ then $z_{j_0} \ne 0$ for at least one $j_0$. Since $W$ is almost surely positive definite we have

$$P\{M\theta\cdot\theta < t\} = P\Bigl\{\sum_{j=-\infty}^{\infty} Wz_j\cdot z_j < t\Bigr\} \le P\{Wz_{j_0}\cdot z_{j_0} < t\} \to 0$$

as $t \to 0$ using the fact that $Wz_{j_0}\cdot z_{j_0}$ has a Lebesgue density as above. Hence $M\theta\cdot\theta$ has no atom at zero.

It follows from (9.15) together with Lemma 9.10 that the $X_t$ defined above fulfill the basic assumptions of this section. Hence it follows from Theorems 9.1 and 9.4 that the tail estimator used in Example 9.8 also applies to time-dependent data that can be modeled as a multivariate moving average. We can also utilize the uncentered sample covariance matrix (6.3), which has the same asymptotics as long as $EZ = 0$ [cf. Theorem 10.6.7 and Corollary 10.2.6 in Meerschaert and Scheffler (2001a)]. In either case, the eigenvalues can be used to approximate the tail behavior, and the eigenvectors determine the coordinate system in which these estimates apply.

Example 9.11. Suppose now that $Z_1, Z_2, \dots$ are IID $\mathbb{R}^d$-valued random vectors with common distribution $\mu$. We assume that $\mu$ is ROV$(E,c)$, meaning that there exist $(A_n)$ regularly varying with index $-E$ and a sequence $(k_n)$ of natural numbers tending to infinity with $k_{n+1}/k_n \to c > 1$ such that

$$k_n(A_{k_n}\mu) \to \phi \quad\text{as } n \to \infty. \tag{9.16}$$

See Meerschaert and Scheffler (2001a, Section 6.2) for more information on R–O varying measures.

R–O varying measures are closely related to a generalized central limit theorem. In fact, if $\mu$ is ROV$(E,c)$ and the real parts of the eigenvalues of $E$ are greater than 1/2 then (9.16) is equivalent to

$$A_{k_n}(Z_1 + \cdots + Z_{k_n} - k_nb_n) \Rightarrow Y \quad\text{as } n \to \infty,$$

where $Y$ has a so called $(c^E,c)$ operator semistable distribution. See Meerschaert and Scheffler (2001a), Sections 7.1 and 8.2 for details.
Once again, a judicious choice of norming operators and limits guarantees that $Y$ is spectrally compatible with $Z$, so that $A_n$ varies regularly with some exponent $-E$, the subspaces $V_i$ in the spectral decomposition of $\mathbb{R}^d$ with respect to $E$ are mutually orthogonal, and these subspaces are also $A_n$-invariant for every $n$.

It follows from Theorem 8.2.5 of Meerschaert and Scheffler (2001a) that $Z$ has the same moment and tail behavior as for the generalized domain of attraction case considered in Section 5. In particular, there is a spectral index function $\alpha(x)$ taking values in the set $\{a_1^{-1},\dots,a_p^{-1}\}$ where $a_1 < \cdots < a_p$ are the real parts of the eigenvalues of $E$. Given $x \ne 0$, for any small $\delta > 0$ we have

$$r^{-\alpha(x)-\delta} < P\{|Z\cdot x| > r\} < r^{-\alpha(x)+\delta}$$

for all $r > 0$ sufficiently large. Then $E(|Z\cdot x|^{\beta})$ exists for $0 < \beta < \alpha(x)$ and diverges for $\beta > \alpha(x)$.

Now let

$$M_n = \frac{1}{n}\sum_{i=1}^{n} Z_iZ_i'$$

denote the sample covariance matrix of $(Z_i)$. Then it follows from Theorem 10.2.3, Corollaries 10.2.4 and 10.2.6, Theorem 10.2.9, and Lemma 10.4.2 in Meerschaert and Scheffler (2001a) that $M_n$ fulfills the basic assumptions (A1) and (A2) of this section. Hence, by Theorems 9.1 and 9.4 we rediscover Theorems 10.4.5 and 10.4.8 of Meerschaert and Scheffler (2001a). See also Scheffler (1998). In other words, the approximation $\hat\alpha(x)$ from Example 9.8 still functions in this more general case, which represents the most general setting in which sums of IID random vectors can be approximated in distribution via a central limit theorem.

10. Conclusions

If one believes that asset price changes (or log-returns) have heavy tails, then there is ample reason to seek a model where the tail thickness parameter varies with the asset. Operator stable random vectors provide such a model, and are justified by a central limit theorem. Matrix-scaled sums of independent, identically distributed random vectors can only converge (in a distributional sense) to an operator stable limit. Such random vectors have regularly varying probability distributions whose tails are governed by a matrix exponent. Time dependent models can be constructed by taking moving averages of these random vectors. If $X_i$ is the price change in the $i$-th asset then the vector of price changes $X = (X_1,\dots,X_d)'$ can be described by such models. If $\theta_i$ measures the amount of the $i$-th asset in a portfolio, price changes for this portfolio are of the form $X\cdot\theta = X_1\theta_1 + \cdots + X_d\theta_d$. The probability of large jumps in price depends on the mix according to a tail index function $\alpha(\theta)$. If $2 < \alpha(\theta) < 4$ we have a finite variance model with infinite fourth moments. Then the sample covariance matrix plays the usual role as a descriptor of dependence between assets, but its asymptotics are operator stable. If $\alpha(\theta) < 2$, indicating heavy tails with infinite variance, the sample covariance matrix still provides some useful information. In particular, the coordinate system that diagonalizes this matrix also identifies the portfolios with the best or worst tail behavior.

References

Aban, I., Meerschaert, M., 2001. Shifted Hill's estimator for heavy tails. Communications in Statistics. Simulation and Computation 30, 949–962.
Adler, R., Feldman, R., Taqqu, M., 1998. A Practical Guide to Heavy Tails. Birkhäuser, Boston.
Akgiray, V., Booth, G., 1988. The stable-law model of stock returns. Journal of Business and Economic Statistics 6, 51–57.
Anderson, P., Meerschaert, M., 1997. Periodic moving averages of random variables with regularly varying tails. The Annals of Statistics 25, 771–785.
Anderson, P., Meerschaert, M., 1998. Modeling river flows with heavy tails. Water Resources Research 34, 2271–2280.
Arnold, B.C., 1990. A flexible family of multivariate Pareto distributions. Journal of Statistical Planning and Inference 24, 249–258.
Bawa, V., Elton, E., Gruber, M., 1979. Simple rules for optimal portfolio selection in stable Paretian markets.
Journal of Finance 34 (4), 1041–1047.
Belkacem, L., Véhel, J., Walter, C., 2000. CAPM, risk and portfolio selection in α-stable markets. Fractals 8 (1), 99–115.
Benson, D., Schumer, R., Meerschaert, M., Wheatcraft, S., 2001. Fractional dispersion, Lévy motions, and the MADE tracer tests. Transport Porous Med. 42, 211–240.
Benson, D., Wheatcraft, S., Meerschaert, M., 2000. Application of a fractional advection–dispersion equation. Water Resources Research 36, 1403–1412.
Blumen, A., Zumofen, G., Klafter, J., 1989. Transport aspects in anomalous diffusion: Lévy walks. Physical Review A 40, 3964–3973.
Brockwell, P., Davis, R., 1991. Time Series: Theory and Methods, 2nd edition. Springer-Verlag, New York.
Byczkowski, T., Nolan, J.P., Rajput, B., 1993. Approximation of multidimensional stable densities. Journal of Multivariate Analysis 46, 13–31.
Chamberlain, T.W., Cheung, C.S., Kwan, C.C., 1990. Optimal portfolio selection using the general multi-index model: A stable-Paretian framework. Decision Sciences 21, 563–571.
Csörgő, S., Viharos, L., 1997. Asymptotic normality of least-squares estimators of tail indices. Bernoulli 3, 351–370.
Elton, E., Gruber, M., 1995. Modern Portfolio Theory and Investment Analysis. Wiley, New York.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events. For Insurance and Finance. In: Applications of Mathematics, Vol. 33. Springer-Verlag, Berlin.
Fama, E., 1965a. The behavior of stock market prices. Journal of Business 38, 34–105.
Fama, E.F., 1965b. Portfolio analysis in a stable Paretian market. Management Science 11, 404–419.
Feller, W., 1971. An Introduction to Probability Theory and its Applications, Vol. II, 2nd edition. Wiley, New York.
Fofack, H., Nolan, J., 1999. Tail behavior, modes and other characteristics of stable distributions. Extremes 2, 39–58.
Gamba, A., 1999. Portfolio analysis with symmetric stable Paretian returns.
In: Current Topics in Quantitative Finance, Venice, 1997. In: Contributions to Management Science. Physica, Heidelberg, pp. 48–69.
Gnedenko, B., Kolmogorov, A., 1968. Limit Distributions for Sums of Independent Random Variables. Translated from the Russian, annotated, and revised by K.L. Chung. With appendices by J.L. Doob and P.L. Hsu. Revised edition. Addison-Wesley, Reading, MA.
Hall, P., 1981. A comedy of errors: the canonical form for a stable characteristic function. Bulletin of the London Mathematical Society 13, 23–27.
Hall, P., 1982. On some simple estimates of an exponent of regular variation. Journal of the Royal Statistical Society. Series B 44, 37–42.
Hill, B., 1975. A simple general approach to inference about the tail of a distribution. The Annals of Statistics 3, 1163–1173.
Hirsch, M., Smale, S., 1974. Differential Equations, Dynamical Systems and Linear Algebra. Academic Press, New York.
Hosking, J., Wallis, J., 1987. Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29, 339–349.
Janicki, A., Weron, A., 1994. Simulation and Chaotic Behavior of α-Stable Stochastic Processes. Marcel Dekker, New York.
Jansen, D., de Vries, C., 1991. On the frequency of large stock market returns: Putting booms and busts into perspective. Review of Economics and Statistics 23, 18–24.
Jurek, Z., Mason, J.D., 1993. Operator-Limit Distributions in Probability Theory. Wiley, New York.
Klafter, J., Blumen, A., Shlesinger, M., 1987. Stochastic pathways to anomalous diffusion. Physical Review A 35, 3081–3085.
Kotz, S., Balakrishnan, N., Johnson, N.L., 2000. Continuous Multivariate Distributions, Vol. 1. Wiley, New York.
Kotz, S., Kozubowski, T., Podgórski, K., 2001. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering and Finance. Birkhäuser, Boston.
Kozubowski, T.J., Panorska, A.K., Rachev, S.T., 2003. Statistical issues in modeling multivariate stable portfolios. In: Rachev, S.
(Ed.), Handbook of Heavy Tailed Distributions in Finance, Vol. 1. Elsevier, Amsterdam, pp. 131–167.
Leadbetter, M., Lindgren, G., Rootzén, H., 1980. Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York.
Loretan, M., Phillips, P., 1994. Testing the covariance stationarity of heavy tailed time series. Journal of Empirical Finance 1, 211–248.
Mandelbrot, B., 1963. The variation of certain speculative prices. Journal of Business 36, 394–419.
Mandelbrot, B., 1982. The Fractal Geometry of Nature. Freeman, San Francisco.
McCulloch, J., 1996. Financial applications of stable distributions. In: Maddala, G., Rao, C.R. (Eds.), Statistical Methods in Finance. In: Handbook of Statistics, Vol. 14. Elsevier, Amsterdam, pp. 393–425.
McCulloch, J., 1997. Measuring tail thickness to estimate the stable index α: A critique. Journal of Business and Economic Statistics 15, 74–81.
Meerschaert, M., 1990. Moments of random vectors which belong to some domain of normal attraction. The Annals of Probability 18, 870–876.
Meerschaert, M., 1991. Regular variation in R^k and vector-normed domains of attraction. Statistics & Probability Letters 11, 287–289.
Meerschaert, M., Scheffler, H.P., 1998. A simple robust estimator for the thickness of heavy tails. Journal of Statistical Planning and Inference 71, 19–34.
Meerschaert, M., Scheffler, H.P., 1999a. Sample covariance matrix for random vectors with heavy tails. Journal of Theoretical Probability 12, 821–838.
Meerschaert, M., Scheffler, H.P., 1999b. Moment estimator for random vectors with heavy tails. Journal of Multivariate Analysis 71, 145–159.
Meerschaert, M., Scheffler, H.P., 2000. Moving averages of random vectors with regularly varying tails. Journal of Time Series Analysis 21, 297–328.
Meerschaert, M., Scheffler, H.P., 2001a. Limit Distributions for Sums of Independent Random Vectors: Heavy Tails in Theory and Practice. Wiley, New York.
Meerschaert, M., Scheffler, H.P., 2001b. Limit theorems for continuous time random walks. Preprint.
Mittnik, S., Rachev, S., 1993. Reply to comments on Modeling asset returns with alternative stable distributions, and some extensions. Econometric Reviews 12, 347–389.
Mittnik, S., Doganoglu, T., Chenyao, D., 1999. Computing the probability density function of the stable Paretian distribution. Mathematical and Computer Modelling 29, 235–240.
Mittnik, S., Rachev, S., Doganoglu, T., Chenyao, D., 1999. Maximum likelihood estimation of stable Paretian models. Mathematical and Computer Modelling 29, 275–293.
Modarres, R., Nolan, J.P., 1994. A method for simulating stable random vectors. Computational Statistics 9, 11–19.
Nikias, C., Shao, M., 1995. Signal Processing with Alpha Stable Distributions and Applications. Wiley, New York.
Nolan, J.P., 1997. Numerical calculation of stable densities and distribution functions. Heavy tails and highly volatile phenomena. Communications in Statistics. Stochastic Models 13, 759–774.
Nolan, J.P., 2001. Maximum likelihood estimation of stable parameters. In: Barndorff-Nielsen, O., Mikosch, T., Resnick, S. (Eds.), Lévy Processes. Birkhäuser, Boston.
Nolan, J.P., 2002. Stable Distributions: Models for Heavy Tailed Data. Birkhäuser, Boston.
Nolan, J.P., Panorska, A., 1997. Data analysis for heavy tailed multivariate samples. Heavy tails and highly volatile phenomena. Communications in Statistics. Stochastic Models 13, 687–702.
Nolan, J.P., Panorska, A., McCulloch, J.H., 2001. Estimation of stable spectral measures. Mathematical and Computer Modelling 34, 1113–1122.
Press, S.J., 1982. Applied Multivariate Analysis, 2nd edition. Krieger, Malabar.
Press, W., Flannery, B., Teukolsky, S., Vetterling, W., 1987. Numerical Recipes. Cambridge University Press, New York.
Rachev, S., Han, S., 2000. Portfolio management with stable distributions. Mathematical Methods of Operations Research 51 (2), 341–352.
Rachev, S., Mittnik, S., 2000.
Stable Paretian Models in Finance. Wiley, Chichester.
Rao, C.R., 1965. Linear Statistical Inference and its Applications. Wiley, New York.
Resnick, S., Greenwood, P., 1979. A bivariate characterization and domains of attraction. Journal of Multivariate Analysis 9, 206–221.
Resnick, S., Stărică, C., 1995. Consistency of Hill's estimator for dependent data. Journal of Applied Probability 32, 139–167.
Rvačeva, E., 1962. On domains of attraction of multidimensional distributions. In: Select. Transl. Math. Stat. Prob. 2. American Mathematical Society, Providence, RI, pp. 183–205.
Samorodnitsky, G., Taqqu, M., 1994. Stable Non-Gaussian Random Processes. Chapman & Hall, New York.
Scheffler, H.P., 1998. Multivariable R–O varying measures and generalized domains of semistable attraction. Habilitation thesis. University of Dortmund.
Scheffler, H.P., 1999. On estimation of the spectral measure of certain nonnormal operator stable laws. Statistics & Probability Letters 43, 385–392.
Schultze, J., Steinebach, J., 1996. On least squares estimates of an exponential tail coefficient. Statistics & Decisions 14, 353–372.
Sharpe, M., 1969. Operator-stable probability distributions on vector groups. Transactions of the American Mathematical Society 136, 51–65.
Shlesinger, M., Zaslavsky, G., Frisch, U., 1994. Lévy Flights and Related Topics in Physics. In: Lecture Notes in Physics, Vol. 450. Springer-Verlag, Berlin.
Tessier, Y., Lovejoy, S., Hubert, P., Schertzer, D., 1996. Multifractal analysis and modeling of rainfall and river flows and scaling, causal transfer functions. Journal of Geophysical Research 101, 427–440.
Uchaikin, V., Zolotarev, V., 1999. Chance and Stability: Stable Distributions and their Applications. VSP, Utrecht.
Weron, R., 1996. On the Chambers–Mallows–Stuck method for simulating skewed stable random variables. Statistics & Probability Letters 28 (2), 165–171.
Ziemba, W., 1974.
Choosing investment portfolios when the returns have stable distributions. In: Mathematical Programming in Theory and Practice, Proc. NATO Adv. Study Inst., Figueira da Foz, 1972. North-Holland, Amsterdam, pp. 443–482.

Chapter 16

LONG RANGE DEPENDENCE IN HEAVY TAILED STOCHASTIC PROCESSES

BORJANA RACHEVA-IOTOVA
Faculty of Economics and Business Administration, University of Sofia, Bulgaria
Bravo Risk Management Group, 1101-C Eugenia Place, Carpinteria, Santa Barbara, CA 93013, USA
e-mail: borjana.racheva@bravo-group.com

GENNADY SAMORODNITSKY
School of Operations Research and Industrial Engineering, and Department of Statistical Science, Cornell University, Ithaca, NY 14853, USA
e-mail: gennady@orie.cornell.edu

Contents
Abstract
Keywords
1. Introduction
2. What is long range dependence?
3. Tails and rare events
4. Some classes of heavy tailed processes
4.1. Linear processes
4.2. Infinitely divisible processes
5. Rare events, associated functionals and long range dependence
5.1. Unusual sample mean and long strange segments for heavy tailed linear processes
5.2. Ruin probability for heavy tailed linear processes
5.3. Rare events for stationary stable processes
5.4. High dimensional joint tails for a linear process with stable innovations
References

G. Samorodnitsky's research was partially supported by NSF grants DMS-0071073 and DMI-9713549 at Cornell University.

Handbook of Heavy Tailed Distributions in Finance, Edited by S.T. Rachev
© 2003 Elsevier Science B.V. All rights reserved

Abstract

The notion of long range dependence has traditionally been defined through a slow decay of correlations. This approach may be completely inappropriate in the case of a stochastic process with heavy tails. Yet long memory has been reported to be found in various fields where heavy tails are a standard feature of the commonly used stochastic models.
Financial and communications networks data are among those often believed to exhibit long memory. We discuss alternative points of view on long range dependence that are applicable in the heavy tailed case. Such alternative approaches may be tailored to the particular application at hand.

Keywords

heavy tails, long range dependence, rare events, large deviations

1. Introduction

A glance at the plot in Figure 1, which shows the annual minima of the water level in the Nile river, suggests that the process plotted there has at least 4 distinct time periods in which the "level" and "drift" of the process change. This need not, however, be taken as an indication that a nonstationary model should be used for the annual minima of the water level. Even though using a nonstationary model is possible in such situations, sometimes a more parsimonious model is a long memory stationary model (or a stationary model with long range dependence). In fact, commonly used long memory models exhibit what Benoit Mandelbrot termed "persistence", or the "Joseph effect" (referring to the long stretches of plenty and famine in the Egypt of the Bible). Here is what one sees when looking at the increments of a long memory Fractional Brownian motion (these increments are also called Fractional Gaussian noise):

"Nearly every sample looks like a "random noise" superimposed upon a background that performs several cycles. However, these cycles are not periodic, that is, cannot be extrapolated as the sample lengthens. In addition, one often sees an underlying trend that need not continue in the extrapolate." (Mandelbrot, 1983, p. 251.)

The Nile river data set is a famous one; arguably, it is the data set that forced us to think about long range dependence in the first place.
It was, of course, the same Mandelbrot who, with co-workers (Mandelbrot, 1965; Mandelbrot and Van Ness, 1968; Mandelbrot and Wallis, 1968, 1969), first realized that a long memory stationary Gaussian process may explain the behaviour of a particular statistic (the so-called R/S statistic) suggested and applied to the Nile river data by Hurst (1951, 1955). Today long memory models are still being used in hydrology and related areas. However, new applications have arisen, most significantly in finance and communication networks. Observations from these latter areas often feature heavy tails, and such data sets sometimes provide extreme illustrations of Mandelbrot's remark on "spurious cycles". For example, Figure 2, which shows the load offered by a network server, suggests that the process plotted there has at least 10 distinct time periods in which the "nature" of the process changes. Once again, one should not automatically decide to use a nonstationary model, because there are perfectly reasonable stationary models that behave similarly.

Fig. 1. Annual minima of the water level in the Nile river for the years 622 to 1281, measured at the Roda gauge near Cairo.

Fig. 2. Amount of information (in bytes) sent by a server from a major telecommunication company in the middle of a workday. Time is measured in seconds.

Stationary models are attractive not only because of parsimony but also because it is important to have a reasonably small class of well studied and well understood models that have wide applicability. Hence it is important to study stationary processes that can account for the features we saw above, that is, stationary processes with long range dependence.

This chapter is an attempt to survey stationary models with long range dependence and heavy tails. These two features are believed to be present in various data sets of financial and communication networks origin and, hence, have recently attracted much attention.
Describing long range dependence in the heavy tailed case is especially challenging, and most of the work is still ahead of us. Nevertheless, it is an exciting task, and we argue that the insights one may hope to obtain are likely to be useful in other areas of stochastic modeling.

2. What is long range dependence?

The obvious way to measure the length of memory in a stochastic process is by looking at the rate at which its correlations decay with lag. Annoyingly, this requires correlations to make sense; hence finite variance needs to be assumed. Let, therefore, $X_n$, $n = 0,1,2,\ldots$, be a stationary stochastic process with mean $\mu = EX_0$ and $0 < EX_0^2 < \infty$ (we discuss discrete time processes, but parallel formulations for stationary processes with finite variance in continuous time are entirely clear). Let $\rho_n = \mathrm{Corr}(X_0, X_n)$, $n = 0,1,\ldots$, be the correlation function. For most "usual" stochastic models (ARMA processes, GARCH processes, many Markov and Markov modulated processes) the correlations decay exponentially fast with $n$; this has a number of important consequences, one of which is $\sum_{n=0}^{\infty} |\rho_n| < \infty$. This, in turn, guarantees that the variance of the partial sums $S_n = X_1 + \cdots + X_n$, $n \ge 0$, cannot grow more than linearly fast, which says, heuristically, that we do not expect to see $S_n$ more than about $\sqrt{n}$ away from its mean $n\mu$.

What Mandelbrot realized two and a half decades ago was that the strange behavior of the R/S statistic on the Nile river data might be explained if the variance of the partial sums could grow faster than linearly fast. As we already know, this implies that

$$\sum_{n=0}^{\infty} |\rho_n| = \infty. \tag{2.1}$$

Hence, (2.1) is often taken as the definition of long memory; as a definition it seems to originate with Cox (1984). Looking over the literature on long range dependence, one realizes that the definition (2.1) has not proved to be the most popular one.
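The link between summable correlations and linear growth of the variance of the partial sums can be checked numerically from the identity $\mathrm{Var}(S_n) = \sigma^2\,[\,n + 2\sum_{k=1}^{n-1}(n-k)\rho_k\,]$, where $\sigma^2$ is the variance of $X_0$. The following is a minimal sketch, not taken from the chapter: the two correlation functions (`exp_corr`, geometric decay, and `power_corr`, decay like $(1+k)^{-0.4}$) are assumptions chosen only to illustrate a summable and a non-summable case.

```python
def partial_sum_variance(rho, n, sigma2=1.0):
    """Var(S_n) for a stationary sequence with variance sigma2 and correlations rho(k)."""
    return sigma2 * (n + 2.0 * sum((n - k) * rho(k) for k in range(1, n)))


def exp_corr(k):
    # Illustrative summable correlations: geometric decay (assumption).
    return 0.5 ** k


def power_corr(k, d=0.4):
    # Illustrative non-summable correlations: power decay with 0 < d < 1 (assumption).
    return (1.0 + k) ** (-d)


for n in (100, 1000, 10000):
    ratio_short = partial_sum_variance(exp_corr, n) / n
    ratio_long = partial_sum_variance(power_corr, n) / n
    # ratio_short stabilises near 1 + 2 * sum_k 0.5**k = 3; ratio_long keeps growing.
    print(n, round(ratio_short, 4), round(ratio_long, 1))
```

In the summable case Var(S_n)/n settles at a finite constant, while in the power-law case it grows roughly like $n^{1-d}$, which is the faster-than-linear growth of the variance referred to above.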
By far the most widely used definition is the more concrete

$$\rho_n \sim c\,n^{-d} \quad \text{as } n \to \infty \tag{2.2}$$

for some $0 < d < 1$ and $c > 0$. A stationary process satisfying (2.2) would have been called long range dependent of index $d$ by Cox (1984). A weaker version of (2.2) is also sometimes mentioned; it replaces the constant $c$ by a slowly varying function (Beran, 1994). Quite often one writes the exponent as $d = 2 - 2H$ for some $0.5 < H < 1$ [e.g., Beran (1992)], and the reasons are historical: this is the relation between the exponent $H$ of self-similarity of a Fractional Brownian motion and the rate of decay of correlations of its increments. By a misnomer, $H = 1 - d/2$ is at times referred to as the self-similarity parameter even if nothing in the model is self-similar.

A less common (but still used) point of view on long range dependence is to allow $d$ in (2.2) to take any positive value or, indeed, to make a similar assumption of regular variation of correlations (which also allows for the slowly varying case $d = 0$). In fact, one could even draw the line between long and short memory by distinguishing between correlations decaying slower than exponentially fast and those decaying at least exponentially fast.

It is difficult to justify such importance assigned to the rate of decay of correlations (or, almost equivalently, to the rate at which the spectral density of the stationary process grows at the origin), unless one deals with a Gaussian model (like the Fractional Gaussian noise, the increment process of a Fractional Brownian motion), or a process that is very close to being Gaussian. Using correlations as a measure of the length of memory becomes untenable in the case of heavy tails.

Specifically, let $X_n$, $n = 0,1,2,\ldots$, be a stationary stochastic process. Let $F$ be the distribution function of $X_0$, and $\bar F = 1 - F$ the (right) distribution tail.
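As an aside on the relation $d = 2 - 2H$ mentioned above, the correlation function of Fractional Gaussian noise is known in closed form, $\rho_k = \tfrac12\big(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\big)$, and behaves like $H(2H-1)k^{2H-2}$ for large $k$. The sketch below simply compares the two numerically; the function names and the choice $H = 0.75$ are illustrative assumptions.

```python
def fgn_corr(k, hurst):
    """Exact lag-k correlation of Fractional Gaussian noise with Hurst exponent `hurst`."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def power_law_approx(k, hurst):
    """Leading-order behaviour c * k**(-d) with d = 2 - 2H and c = H(2H - 1)."""
    return hurst * (2.0 * hurst - 1.0) * k ** (2.0 * hurst - 2.0)


hurst = 0.75  # illustrative value, so that d = 2 - 2H = 0.5
for lag in (10, 100, 1000):
    print(lag, fgn_corr(lag, hurst) / power_law_approx(lag, hurst))
```

The printed ratios approach 1 as the lag grows, which is the sense in which the increments of a Fractional Brownian motion satisfy (2.2) with $d = 2 - 2H$.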
In a tradition going back, once again, to Mandelbrot in the early 1960s [an exhaustive list of references is in Mandelbrot (1983)], heavy tails are synonymous with infinite variance of $X_0$. Once again, more concrete views prevail in the literature; it is common to identify heavy tails with a particular tail behavior of $F$. Sometimes one assumes

$$\bar F(x) \sim c\,x^{-\alpha} \quad \text{as } x \to \infty \tag{2.3}$$

for some $0 < \alpha < 2$ and $c > 0$. See, for example, the recent collection Park and Willinger (2000). Note the parallels between the various definitions of heavy tails and viewing long range dependence via the rate of decay of correlations. Again, one sometimes allows any positive value of $\alpha$ in (2.3) [see, e.g., Müller, Dacorogna and Pictet (1998), Gomez, Selman and Crato (1997)]. Here regular variation of the tails, as opposed to exactly power-like tails (Pareto-type is the common expression), is widely accepted. Finally, a faster (but still slower than exponential) rate of decay of the distribution tail is sometimes also regarded as being consistent with heavy tails. Here one does not usually go beyond the class of subexponential distributions; see Embrechts, Klüppelberg and Mikosch (1997).

Obviously, one cannot use correlations to draw the line between short and long memory if the variance is infinite. Several attempts have been made to use "correlation-like" notions in that case. In the important class of stable processes, the notions of covariation and codifference have been introduced and their rate of decay for various classes of stationary stable processes computed; see, e.g., Astrauskas, Levy and Taqqu (1991). Obviously, such "surrogate correlations" can be expected to carry even less information than the "real" correlations do in the case when the latter are defined [although, surprisingly, codifference turns out to characterize mixing of stationary symmetric stable processes, a fact due, essentially, to Maruyama (1970), see also Gross (1994)].
Before finishing this section we remark that when talking about tails we are thinking about the right tails. For as long as the left tails do not interfere with the right tails, we will leave it that way. When right and left tails begin to interfere with one another, we will need to say more about the left tails and how heavy they are as well.

3. Tails and rare events

Here is an alternative point of view on long range dependence in heavy tailed processes. Most practitioners using heavy tailed models will agree that the most important feature of such processes is precisely their tails, as expressed in the probabilities of various rare events. Risk analysis, ruin probabilities, congestion and overflow analysis are just some of the key words that name such rare events in various modern applications. To be a bit more concrete, here are several specific examples of rare events one usually deals with. Let, once again, $X_n$, $n = 0,1,2,\ldots$, be a stationary stochastic process.

Example 3.1. For large $\lambda > 0$ the event $\{X_0 > \lambda\}$ is a rare event, whose probability is clearly related to the tails. This event is so elementary that it does not tell us anything about the memory in the process.

Example 3.2. For $k \ge 1$ and large $\lambda_0, \lambda_1, \ldots, \lambda_k$ the event $\{X_0 > \lambda_0, X_1 > \lambda_1, \ldots, X_k > \lambda_k\}$ is a rare event whose probability can carry very important information about the dependence in finite pieces of the process. Generally, the dependence we can measure using such rare events is a "tail dependence". However, for specific classes of heavy tailed processes (e.g., stable processes, linear processes, etc.) these events can provide even more information.

Example 3.3. For large $n \ge 1$ and a positive sequence $(\lambda_j)_{j \ge 0}$ that does not converge to zero the event $\{X_j > \lambda_j,\ j = 0,1,\ldots,n\}$ is a rare event and its probability is a very interesting measure of the length of memory in the process. The case $\lambda_j = \lambda > 0$ for all $j \ge 0$ seems to be especially appealing.
A slight generalization of this example uses a triangular array $(\lambda_j^{(n)})$, $n \ge 1$, $0 \le j \le n$. Here the case $\lambda_j^{(n)} = \lambda^{(n)}$ for $j \le n$, with various asymptotic rules for $(\lambda^{(n)})$, is very interesting.

Example 3.4. For $k \ge 1$ and large $\lambda$ the event $\{X_1 + \cdots + X_k > \lambda\}$ is a tail event. Similarly to Example 3.2, the probability of this event can be used to clarify the "finite dimensional dependence" in the process.

Example 3.5. Suppose that the mean $\mu = EX_0$ is finite, and that the stationary process $X_n$, $n = 0,1,2,\ldots$, is ergodic. For large $n \ge 1$ and $\theta > 0$ the event $\{X_1 + \cdots + X_n > n(\mu + \theta)\}$ is a rare event whose probability measures the length of memory in the sense of a tendency to stay above the mean for long stretches of time. It is, obviously, related to the tails. The effect of heavy tails is quite special, as will be discussed below.

Example 3.6. This example has a flavor similar to that of Example 3.5. Let, once again, the process $X_n$, $n = 0,1,2,\ldots$, be ergodic with a finite mean $\mu = EX_0$. Let $\theta > 0$. For large $\lambda$ the event $\{X_1 + \cdots + X_n > n(\mu + \theta) + \lambda \text{ for some } n \ge 1\}$ is a rare event, whose probability is sometimes referred to as the ruin probability in the context of risk analysis. In the queuing context various stationary quantities often have expressions of this kind for their probability tails. Adopting the risk analysis term, the ruin probability can be used to measure the length of memory; the effect of heavy tails is, once again, very special here.

The list of examples can be continued indefinitely, and we have omitted some very interesting ones. Instead, let us look at some details of the interplay between the tails, memory and rare events in the heavy tailed case, especially in the light of Examples 3.5 and 3.6. The starting point is to adopt the lens of large deviations: an unlikely event happens in the most likely way. We will argue that this lens provides a powerful way of thinking about the length of memory in a process.
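The events of Examples 3.5 and 3.6 can be written as simple predicates on an observed sample path. The sketch below is an illustration only: the function names are hypothetical, `theta` stands for the positive offset added to the mean, `level` for the large threshold in Example 3.6, and the ruin event is checked over the finite horizon given by the sample rather than over all $n \ge 1$.

```python
from itertools import accumulate


def exceeds_mean_level(xs, mu, theta):
    """Example 3.5 event for n = len(xs): X_1 + ... + X_n > n * (mu + theta)."""
    return sum(xs) > len(xs) * (mu + theta)


def ruin_occurs(xs, mu, theta, level):
    """Example 3.6 event restricted to the finite horizon len(xs):
    X_1 + ... + X_n > n * (mu + theta) + level for some 1 <= n <= len(xs)."""
    for n, partial_sum in enumerate(accumulate(xs), start=1):
        if partial_sum > n * (mu + theta) + level:
            return True
    return False


path = [0.0, 0.0, 10.0, 0.0]  # toy path with a single large observation
print(exceeds_mean_level(path, mu=0.0, theta=1.0))
print(ruin_occurs(path, mu=0.0, theta=1.0, level=5.0))
print(ruin_occurs(path, mu=0.0, theta=1.0, level=8.0))
```

In the toy path a single large observation is enough to trigger both events for moderate thresholds, which anticipates the heavy tailed heuristics discussed next.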
It is unfortunate that this idea is not made more explicit in many beautiful texts on large deviations (which also reserve the term "large deviation principle" for something else); see, e.g., Deuschel and Stroock (1989) and Dembo and Zeitouni (1993).

The following statement is not a rigorous mathematical statement. Nevertheless, it is often very useful as a guide and, in many ways, it captures the essence of heavy tails: the most likely way tail related rare events happen in a heavy tailed stochastic process is because of the smallest possible number of causes. This "smallest possible number of causes" is often equal to one. Thus, in Example 3.4 it turns out that, if $X_1, \ldots, X_k$ are i.i.d. and heavy tailed, then

$$P(X_1 + \cdots + X_k > \lambda) \sim k\,P(X_1 > \lambda) \sim P\big(\max(X_1, \ldots, X_k) > \lambda\big) \quad \text{as } \lambda \to \infty. \tag{3.1}$$

That is, the sum $X_1 + \cdots + X_k$ is most likely to be very large due to one of the terms being very large. In this case the possible "causes" are simply the individual terms in the sum. The greatest generality under which (3.1) is valid is that of subexponential distributions, introduced by Chistyakov (1964). See also Chover, Ney and Wainger (1973), and a survey in Goldie and Klüppelberg (1998).

Similarly, in Example 3.5, for every $\theta > 0$

$$P\big(X_1 + \cdots + X_n > n(\mu + \theta)\big) \sim n\,P(X_1 > n\theta) \quad \text{as } n \to \infty \tag{3.2}$$

for exactly the same reason as in (3.1). Indeed, one of the terms (= causes) in the sum $X_1 + \cdots + X_n$ has to be exceptionally large; exactly how large can be determined by realizing that the "nonexceptional" terms in that sum add up to about $n\mu$. While the domain of heavy tails over which (3.2) is valid does not extend to all subexponential distributions, it does extend to all distributions with regularly varying tails of index $\alpha > 1$; see, e.g., Heyde (1968) and Nagaev (1979). On the other hand, for distributions with "light" tails not only do (3.1) and (3.2) fail, even their spirit is false.
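The "single large term" heuristic behind (3.1) can be probed by a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not a result from the chapter: the summands are i.i.d. exact Pareto variables with $P(X > x) = x^{-\alpha}$ for $x \ge 1$, and the parameter values (`k`, `alpha`, `level`, `trials`, `seed`) are arbitrary choices. At a finite level the three quantities agree only approximately.

```python
import random


def pareto_sample(rng, alpha):
    """Inverse-transform sample with P(X > x) = x**(-alpha) for x >= 1."""
    return (1.0 - rng.random()) ** (-1.0 / alpha)


def big_jump_check(k=5, alpha=1.5, level=200.0, trials=200_000, seed=12345):
    """Estimate P(sum > level) and P(max > level); return them with k * P(X_1 > level)."""
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(trials):
        xs = [pareto_sample(rng, alpha) for _ in range(k)]
        if sum(xs) > level:
            sum_hits += 1
        if max(xs) > level:
            max_hits += 1
    p_single = k * level ** (-alpha)  # exact value of k * P(X_1 > level)
    return sum_hits / trials, max_hits / trials, p_single


p_sum, p_max, p_single = big_jump_check()
print(p_sum / p_single, p_max / p_single)
```

Both printed ratios should be of order one; the sum exceeds the level slightly more often than the maximum does, because the remaining terms contribute a small amount, and this discrepancy shrinks as the level increases.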
In fact, in the case of exponentially fast decaying tails the most likely way for the event $\{X_1 + \cdots + X_n > n(\mu + \theta)\}$ to happen is not because of a single cause, or a small number of causes, but rather because most of the terms in the sum "conspire" to be a bit bigger than they would normally be. This is, in fact, the point of the classical large deviation principle.

When $X_n$, $n = 0,1,2,\ldots$, is a stationary heavy tailed stochastic process with memory, it is not, generally, the case that individual observations should be viewed as "causes" of rare events. The nature of such causes depends on the nature of the process and it is, sometimes, a nontrivial problem to figure out what the "right causes" are. We will see several examples below. Moreover, and this is precisely why we are interested in rare events, the causes, when found, typically have their effect distributed over time, and it is in this way that they make the rare events happen. We argue that this temporal distribution of the effect of the "causes" on rare events is a useful way of thinking about long range dependence.

There are two important classes of heavy tailed processes for which progress has been made in understanding the "right causes" of certain rare events and the way the effect of these causes is distributed over time: linear processes and infinitely divisible processes. We discuss these below.

Before doing so we would like to introduce another notion related to certain rare events, with the potential of being useful, in a similar way, in studying long range dependence. Certain rare events should rather be viewed as sequences of events that become more and more rare. Examples 3.3 and 3.5 are of this nature. More generally and formally, let $A_j \subset \mathbb{R}^j$ be a Borel set, $j = 1,2,\ldots$, such that

$$p_j := P\big((X_1, \ldots, X_j) \in A_j\big) \to 0 \quad \text{as } j \to \infty. \tag{3.3}$$

For $n \ge 1$ define

$$R_n = \max\big\{ j - i + 1 : 1 \le i \le j \le n,\ (X_i, X_{i+1}, \ldots, X_j) \in A_{j-i+1} \big\}. \tag{3.4}$$

That is, $R_n$ is the highest dimension of an $A_j$ observed over the first $n$ observations $X_1, \ldots, X_n$. We call $R_n$ the functional associated with the sequence of rare events $(A_j)$. It is obvious that if $X_n$, $n = 0,1,2,\ldots$, is a mixing stationary process and $p_j > 0$ for an infinite sequence of $j$'s, then $R_n \to \infty$ with probability 1 as $n \to \infty$. It appears to be almost obvious that the rate at which $R_n$ grows is related to the rate at which $p_j$ decays to zero. Certain rigorous connections are, indeed, possible; other connections seem to require additional information on the process. In any case, the rate of growth of $R_n$ is, in its own right, related to the way rare events happen and, hence, to the memory in the process.

There is a very important reason to concentrate on the probabilities of certain rare events and on functionals associated with sequences of certain rare events, instead of concentrating on correlations, when trying to understand the boundary between short memory and long memory. Such rare events and functionals are often of direct importance in their own right, as one can see by looking at the examples above and thinking, for instance, of applications in risk analysis and congestion control. On the other hand, nobody is interested in correlations in their own right. We only study correlations hoping that they are significant for whatever application we might have at hand. Unfortunately, the information that the correlations carry is often only indirect and very limited, as anyone familiar, for example, with ARCH and GARCH models realizes.

4. Some classes of heavy tailed processes

4.1. Linear processes

One of the classes of heavy tailed processes we will consider is that of heavy tailed linear processes. Let $\varepsilon_n$, $n \in \mathbb{Z}$, be iid random variables. A (two-sided) linear process with the noise sequence $\varepsilon_n$, $n \in \mathbb{Z}$, is defined by

$$X_n = \sum_{j=-\infty}^{\infty} \varphi_{n-j}\,\varepsilon_j, \quad n = 0,1,2,\ldots, \tag{4.1}$$

where $\varphi_j$, $j \in \mathbb{Z}$, is a sequence of (nonrandom) coefficients.
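A finite-window version of the linear process (4.1) is easy to simulate. The sketch below makes several assumptions that are not in the text: only finitely many coefficients are nonzero (they are passed as a dictionary `phi` from lag to coefficient), and the noise is a two-sided exact Pareto variable with $P(|\varepsilon| > x) = x^{-\alpha}$ for $x \ge 1$, positive with probability `p`. All function names are hypothetical.

```python
import random


def two_sided_pareto(rng, alpha, p=0.5):
    """Noise with P(|e| > x) = x**(-alpha), x >= 1; positive with probability p."""
    magnitude = (1.0 - rng.random()) ** (-1.0 / alpha)
    return magnitude if rng.random() < p else -magnitude


def linear_process(phi, noise, n_obs):
    """X_n = sum_j phi[j] * noise[n - j], n = 0, ..., n_obs - 1.

    This is (4.1) after the change of index j -> n - j, with phi[j] = 0
    for every lag j that is not a key of `phi`."""
    return [sum(c * noise[n - j] for j, c in phi.items()) for n in range(n_obs)]


def simulate(phi, n_obs, alpha, p=0.5, seed=0):
    """Draw the noise variables needed for n_obs observations and build the process."""
    rng = random.Random(seed)
    lo, hi = min(phi), max(phi)
    noise = {m: two_sided_pareto(rng, alpha, p) for m in range(-hi, n_obs - lo)}
    return linear_process(phi, noise, n_obs), noise


# Toy check with hand-picked noise values: X_0 = 4 + 0.5 * 2 and X_1 = 6 + 0.5 * 4.
print(linear_process({0: 1.0, 1: 0.5}, {-1: 2.0, 0: 4.0, 1: 6.0}, 2))
xs, _ = simulate({-1: 0.3, 0: 1.0, 1: 0.5, 2: 0.25}, n_obs=5, alpha=1.5)
print(xs)
```

Passing the noise as a dictionary indexed by integers keeps the two-sided indexing of (4.1) explicit: a negative lag in `phi` makes $X_n$ depend on future noise variables.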
We will assume that the noise variables are heavy tailed, but how heavy the tails are will be left open for the moment. It is obvious that the linear process $X_n$, $n = 0,1,2,\ldots$, is a stationary stochastic process as long as it is well defined, meaning that the sum defining it converges. The latter is an assumption on the coefficients $\varphi_j$. In particular, if $E\varepsilon_0^2 < \infty$ and $E\varepsilon_0 = 0$, then a necessary and sufficient condition for convergence of the series in (4.1) is

$$\sum_{j=-\infty}^{\infty} \varphi_j^2 < \infty; \tag{4.2}$$

a nonzero mean will require, in addition, the series $\sum_{j=-\infty}^{\infty} \varphi_j$ to converge.

Frequently we will assume that the noise variables have regularly varying tails. Unless one is working with constant sign coefficients (an assumption that we will not make in this chapter), it is necessary to control both right and left probability tails of the noise since, say, a negative coefficient will "translate" the left tail of the noise into the right tail of the sum in (4.1). Therefore, a typical assumption is

$$P(|\varepsilon_0| > \lambda) = L(\lambda)\,\lambda^{-\alpha}, \qquad \lim_{\lambda \to \infty} \frac{P(\varepsilon_0 > \lambda)}{P(|\varepsilon_0| > \lambda)} = p, \qquad \lim_{\lambda \to \infty} \frac{P(\varepsilon_0 < -\lambda)}{P(|\varepsilon_0| > \lambda)} = q, \tag{4.3}$$

for some $\alpha > 0$ and $0 < p = 1 - q \le 1$. Here $L$ is a slowly varying (at infinity) function. If $\alpha > 2$ we are in the case of finite variance, but for $\alpha \le 2$ the precise condition for convergence in (4.1) depends on the slowly varying function, and can be stated through the three series theorem. In particular,

$$\sum_{j=-\infty}^{\infty} |\varphi_j|^{\alpha - \delta} < \infty \tag{4.4}$$

for some $\delta > 0$ is a sufficient condition for convergence if $0 < \alpha \le 1$, or if $1 < \alpha \le 2$ and $E\varepsilon_0 = 0$; a nonzero mean in the latter case will also require, as before, the series $\sum_{j=-\infty}^{\infty} \varphi_j$ to converge.

A rich source of information on linear processes is Brockwell and Davis (1991). This book covers, mostly, the $L^2$ case. For more information on the infinite variance case see, for example, Cline (1983, 1985) and Mikosch and Samorodnitsky (2000b).
Heavy tailed linear processes are attractive to us because, in this case, the potential "causes" of rare events appear to be evident: those are the individual noise variables $\varepsilon_n$, $n \in \mathbb{Z}$. This intuition has been borne out in a number of situations, as will be seen below.

4.2. Infinitely divisible processes

A stochastic process $X_n$, $n = 0,1,2,\ldots$, is infinitely divisible if for any $k = 1,2,\ldots$ there is a stochastic process $Y_n^{(k)}$, $n = 0,1,2,\ldots$, such that the finite dimensional distributions of $X_n$, $n = 0,1,2,\ldots$, and of $\sum_{i=1}^{k} Y_n^{(k,i)}$, $n = 0,1,2,\ldots$, coincide. Here, for $i = 1,\ldots,k$, the processes $Y_n^{(k,i)}$, $n = 0,1,2,\ldots$, are iid copies of $Y_n^{(k)}$, $n = 0,1,2,\ldots$.

Many important classes of stochastic processes are, in fact, infinitely divisible. All stable processes, and all Gaussian processes in particular, are infinitely divisible. In general, an infinitely divisible process will have two independent components, a Gaussian one and a non-Gaussian one. Since we are interested in heavy tails, for a vast majority of applications the Gaussian component will have only a negligible effect on the probabilities of the rare events we consider. Therefore, we will only consider infinitely divisible processes without a Gaussian component. Such processes have a characteristic function of the form

$$E \exp\Big\{ i \sum_{n=0}^{\infty} \theta_n X_n \Big\} = \exp\Big\{ \int_{\mathbb{R}^{\infty}} \Big( \exp\Big\{ i \sum_{n=0}^{\infty} \theta_n x_n \Big\} - 1 - i \sum_{n=0}^{\infty} \theta_n x_n \mathbf{1}(|x_n| \le 1) \Big)\, \nu(dx) + i \sum_{n=0}^{\infty} \theta_n b_n \Big\} \tag{4.5}$$

for all real $\theta_n$, $n = 0,1,2,\ldots$, only finitely many of which are different from zero. Here $\nu$ is a $\sigma$-finite measure on $\mathbb{R}^{\infty}$ equipped with the product $\sigma$-field (the Lévy measure of the process) and $b_n$, $n = 0,1,2,\ldots$, is a constant vector in $\mathbb{R}^{\infty}$.

The Lévy measure of an infinitely divisible process is its most important feature. Often an infinitely divisible process is given in the form of a stochastic integral with respect to an infinitely divisible random measure.
In that case there is a natural way to relate the Lévy measure of the process to the basic characteristics of such an integral.

Unlike the linear processes in the previous subsection, it is less obvious what the potential "causes" of rare events are when one deals with infinitely divisible processes as above. There is, however, a point of view on infinitely divisible processes that turns out to be useful here. To be able to see the essence better and not to get bogged down in the technical details, let us consider, first, a particular case, when

$$\int_{\mathbb{R}^{\infty}} |x_n|\, \mathbf{1}(|x_n| \le 1)\, \nu(dx) < \infty \quad \text{for all } n = 0,1,2,\ldots. \tag{4.6}$$

In that case one can rewrite (4.5) in the form

$$E \exp\Big\{ i \sum_{n=0}^{\infty} \theta_n X_n \Big\} = \exp\Big\{ \int_{\mathbb{R}^{\infty}} \Big( \exp\Big\{ i \sum_{n=0}^{\infty} \theta_n x_n \Big\} - 1 \Big)\, \nu(dx) + i \sum_{n=0}^{\infty} \theta_n \tilde b_n \Big\} \tag{4.7}$$

with $\tilde b_n = b_n - \int_{\mathbb{R}^{\infty}} x_n \mathbf{1}(|x_n| \le 1)\, \nu(dx)$ for $n \ge 0$. Let $M$ be a Poisson random measure on $\mathbb{R}^{\infty}$ with mean measure $\nu$. It is easy to check that the process $\int_{\mathbb{R}^{\infty}} x_n M(dx) + \tilde b_n$, $n \ge 0$, is well defined and has the characteristic function given by (4.7). That is, one can represent the process $X_n$, $n = 0,1,2,\ldots$, in the sense of equality of finite dimensional distributions, in the form

$$X_n = \int_{\mathbb{R}^{\infty}} x_n\, M(dx) + \tilde b_n, \quad n = 0,1,2,\ldots. \tag{4.8}$$

If $(z^{(j)} = (z_n^{(j)},\ n \ge 0),\ j = 1,2,\ldots)$ is a (measurable) enumeration of the points of the random measure $M$, then (4.8) means that the process $X_n$, $n = 0,1,2,\ldots$, is the sum of the points $z^{(j)}$, $j = 1,2,\ldots$ (shifted by the sequence $(\tilde b_n)$). This "discrete" structure of infinitely divisible processes makes the potential "causes" of certain rare events visible, and it is precisely the Poisson points $(z^{(j)},\ j = 1,2,\ldots)$ that turn out to be such "causes".

Even if the assumption (4.6) does not hold, a representation similar to (4.8) can still be written, but this time an appropriate centering is required to make the Poisson integral converge. The important point is that the discrete structure is still here, and the potential causes of rare events are still visible.
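The claim that a sum of Poisson points plus a constant shift has a characteristic function of the form (4.7) can be verified directly in the simplest possible setting. The sketch below assumes a Lévy measure with finitely many atoms, $\nu = \sum_i w_i \delta_{z_i}$ with $z_i \in \mathbb{R}^2$ (so that (4.6) holds trivially), and takes the shift with the sign that matches (4.7). Each atom is then hit a Poisson($w_i$) number of times, independently across atoms. The function names, atoms, weights and test point are all illustrative assumptions.

```python
import cmath
import math


def inner(theta, x):
    """Finite version of sum_n theta_n * x_n."""
    return sum(t * c for t, c in zip(theta, x))


def char_fn_levy(theta, atoms, shift):
    """Right-hand side of (4.7) for nu = sum_i w_i * delta_{z_i}."""
    exponent = sum(w * (cmath.exp(1j * inner(theta, z)) - 1.0) for z, w in atoms)
    return cmath.exp(exponent + 1j * inner(theta, shift))


def char_fn_poisson_sum(theta, atoms, shift, kmax=80):
    """E exp{i <theta, X>} for X = sum_i N_i * z_i + shift with independent
    N_i ~ Poisson(w_i), computed from the Poisson probabilities (truncated at kmax)."""
    value = cmath.exp(1j * inner(theta, shift))
    for z, w in atoms:
        phase = cmath.exp(1j * inner(theta, z))
        value *= sum(
            math.exp(-w) * w ** k / math.factorial(k) * phase ** k
            for k in range(kmax)
        )
    return value


atoms = [((1.0, 0.5), 0.7), ((-2.0, 3.0), 1.3)]  # (point z_i, weight w_i): assumptions
shift = (0.2, -0.1)
theta = (0.4, -0.9)
print(abs(char_fn_levy(theta, atoms, shift) - char_fn_poisson_sum(theta, atoms, shift)))
```

The two computations agree up to floating point error, which is the finite-atom case of the statement that the Poisson integral in (4.8) reproduces the characteristic function (4.7).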
There are various ways of summing up Poisson points to get an infinitely divisible process. A very general description is in Rosiński (1989, 1990). Sometimes it is convenient to order the Poisson points according to the value of a particular test functional. If the process is originally given in the form of a stochastic integral with respect to an infinitely divisible random measure, then one can have a more concrete structure of the Poisson points, hence a better understanding of the possible causes of rare events.

The literature on infinitely divisible processes is rich. The framework preferred by many authors is that of infinitely divisible probability laws on Banach (or other nice) spaces. See, for example, Araujo and Giné (1980) and Linde (1986). A very general treatment of stochastic integrals with respect to infinitely divisible random measures, as well as representations of infinitely divisible processes as such stochastic integrals, is in Rajput and Rosiński (1989).

An important and reasonably well understood class of infinitely divisible processes is that of $\alpha$-stable processes. The latter are characterized by the following scaling property of their Lévy measure:

$$\nu(rA) = r^{-\alpha}\,\nu(A) \quad \text{for all measurable } A \subset \mathbb{R}^{\infty} \text{ and } r > 0. \tag{4.9}$$

Here $\alpha$ is a parameter with the range $0 < \alpha < 2$. See Samorodnitsky and Taqqu (1994) for information on stable processes; the structure of stationary stable processes has been elucidated by J. Rosiński; see, e.g., Rosiński (1998).

5. Rare events, associated functionals and long range dependence

Suppose that we are considering a parametric family of laws of a stationary stochastic process $X_n$, $n = 0,1,2,\ldots$. Let $\Theta$ be the (generally, infinite dimensional) parameter space.
We are interested in significant changes ("phase transitions") in the rate of decay of probabilities of certain rare events and/or in the rate of growth of the functionals associated with sequences of rare events that may occur when the parameter crosses the boundary between a subset $\Theta_1$ of $\Theta$ and its complement. We argue that certain phase transitions of this kind can be viewed as transitions between short and long range dependence.

Clearly, it is not useful to view every significant change in, say, probabilities of rare events as an indication of interesting and important things happening to the memory of the process. Other factors may be in play as well, most significantly those related to the heaviness of the tails. If, for example, one of the components of the parameter governs how heavy the tails of $X_0$ are, one can very easily induce a very significant change in the probabilities of certain rare events by simply changing that particular component of the parameter without doing anything to the memory of the process. In the examples in the sequel we will be careful to look for phase transitions that do not involve changing how heavy the tails are.

We will see several examples of such phase transitions, indicating a shift from short to long memory, below. We present some known results; these are quite scarce. When appropriate, we supplement those with conjectures. In other cases we have performed numerical studies to try to guess whether a phase transition occurs and, if so, of what kind.

5.1. Unusual sample mean and long strange segments for heavy tailed linear processes

Here we consider the sequence of rare events of Example 3.5,

$$A_n = \{X_1 + \cdots + X_n > n(\mu + \delta)\}$$

(for a fixed $\delta > 0$) and the corresponding associated functional

$$R_n = \max\Big\{ j - i + 1 :\ 1 \le i \le j \le n,\ \frac{X_i + X_{i+1} + \cdots + X_j}{j - i + 1} > \mu + \delta \Big\}.$$
(5.1)

We will keep the distribution of the noise variables $\varepsilon_n$, $n \in \mathbb{Z}$, in the heavy tailed linear processes of Section 4.1 fixed; it is assumed to have the regular variation property (4.3) with $\alpha > 1$. In particular, the parameter $\alpha$, which is responsible for the heaviness of the tails, is kept fixed. We will also assume that $E\varepsilon_0 = 0$. In this case the parameter space is

$$\Theta = \big\{ \varphi = (\ldots,\varphi_{-1},\varphi_0,\varphi_1,\varphi_2,\ldots) \in \mathbb{R}^{\mathbb{Z}} \text{ satisfying (4.2) if } \alpha > 2 \text{ or (4.4) if } 1 < \alpha \le 2 \big\}. \tag{5.2}$$

Let $\Theta_1$ be the set of all sequences $\varphi \in \mathbb{R}^{\mathbb{Z}}$ satisfying

$$\sum_{j=-\infty}^{\infty} |\varphi_j| < \infty. \tag{5.3}$$

Note that the set $\Theta_1$ contains the parameter sequence $\varphi_j = \mathbf{1}(j = 0)$, $j \in \mathbb{Z}$, in which case the linear process is an iid sequence. It turns out that for any value of the parameters in $\Theta_1$ the functionals $R_n$ defined by (5.1) grow at the same rate, i.e., at the same rate as for an iid sequence with the same marginal tails. This has been established in Mansfield, Rachev and Samorodnitsky (2001). Specifically, let $F$ be the distribution function of the noise random variable $\varepsilon_0$ and define the usual quantile sequence

$$a_n = \Big(\frac{1}{1 - F}\Big)^{\leftarrow}(n). \tag{5.4}$$

Here for a function $U$ on $[0,\infty)$, $U^{\leftarrow}$ denotes its generalized inverse $U^{\leftarrow}(y) = \inf\{s : U(s) \ge y\}$. Note that, by (4.3), the sequence $(a_n)$ is regularly varying at infinity with exponent $1/\alpha$. See Resnick (1987) for more information on regularly varying tails and their quantile functions. For $\alpha > 0$ let $Z_\alpha$ be a Fréchet random variable with

$$P(Z_\alpha \le z) = \exp\{-z^{-\alpha}\}, \quad z > 0. \tag{5.5}$$

Assume (5.3). Then the numbers $M_+(\varphi) = \max \sup$ [...] $h > \max\big(\frac{1}{\alpha}, \frac{1}{2}\big)$ (5.10) and $L_2$ is a slowly varying function. Clearly any such parameter vector is in $\Theta_1^c$. Define $b_n$ = 1 [...] ($a_n$), $n \ge 1$, (5.11) and note that the sequence $(b_n)$ is regularly varying with exponent $1/(\alpha h)$. Then

$$b_n^{-1} R_n \Rightarrow p^{1/(\alpha h)} \big((1 - h)\delta\big)^{-1/h} \big(c_+^{1/h} + c_-^{1/h}\big) Z_{\alpha h} \tag{5.12}$$

(weakly) as $n \to \infty$. See Theorem 2.1 in Rachev and Samorodnitsky (2001).
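To make the objects in (5.1) and (5.4) concrete, here is a minimal Python sketch that computes the long strange segment functional $R_n$ by brute force and compares it with the quantile sequence $a_n$ for an iid sequence (a parameter in $\Theta_1$). The names `long_strange_segment` and `pareto_quantile_sequence` are hypothetical, and the noise law (symmetric with exact Pareto tails, so that $1 - F(x) = p\,x^{-\alpha}$ with $p = 1/2$ and $\mu = 0$) is an assumption chosen so that $a_n$ has a closed form.

```python
import numpy as np

def long_strange_segment(x, level):
    """R_n in (5.1): length of the longest stretch whose average exceeds `level`."""
    n = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    best = 0
    for i in range(n):
        lengths = np.arange(1, n - i + 1)            # candidate lengths j - i + 1
        means = (prefix[i + 1:] - prefix[i]) / lengths
        hits = np.nonzero(means > level)[0]
        if hits.size:
            best = max(best, int(lengths[hits[-1]]))
    return best

def pareto_quantile_sequence(n, alpha, p=1.0):
    """a_n in (5.4) when 1 - F(x) = p * x**(-alpha), so that a_n = (p * n)**(1/alpha)."""
    return (p * n) ** (1.0 / alpha)

rng = np.random.default_rng(1)
alpha, level, n = 1.5, 1.0, 2000
# iid symmetric noise with P(|eps| > x) = x**(-alpha) for x >= 1, hence mean zero
noise = rng.choice([-1.0, 1.0], n) * (rng.pareto(alpha, n) + 1.0)
r_n = long_strange_segment(noise, level)
print(r_n, r_n / pareto_quantile_sequence(n, alpha, p=0.5))
```

The double loop is hidden in the prefix sums, so the cost is quadratic in $n$; that is adequate for a sanity check but not for the sample sizes used in a serious study.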
Since $b_n$ grows faster than $a_n$ does, under the assumptions (5.9) the sequence $R_n$ does grow faster than $a_n$ and, hence, faster than in the iid case and, more generally, faster than is the case for any $\varphi \in \Theta_1$.

Both results (5.7) and (5.12) are, in the final analysis, a consequence of a change in the temporal distribution of the effect of the individual "causes": exceptionally large or exceptionally small values of the noise variables $(\varepsilon_m)$. In fact, the contribution of each individual noise variable $\varepsilon_m$ to the sum $X_i + X_{i+1} + \cdots + X_j$ in (5.1) is $\varepsilon_m \sum_{d=i-m}^{j-m} \varphi_d$. The intuition of heavy tailed large deviations says that it is a single $\varepsilon_m$ that is most likely to be responsible for a large value of $R_n$. Therefore, one would expect that for large $x_n$

$$P(R_n > x_n) \sim P\Big( \text{for some } m = \ldots,-1,0,1,\ldots,\ \ \varepsilon_m \sum_{d=i-m}^{j-m} \varphi_d > (j - i + 1)(\mu + \delta) \ \text{ for some } 1 \le i \le j \le n,\ j - i + 1 \ge x_n \Big). \tag{5.13}$$

This turns out to be valid. Moreover, this intuition allows one, in both cases (i.e., under (5.3) and under (5.9)), to select the right rate of growth for $x_n$ in (5.13), which is equivalent to selecting the appropriate normalization for $R_n$.

It is a bit surprising that less is known about the apparently easier problem of identifying the rate of decay of the probabilities $p_n = P(X_1 + \cdots + X_n > (\mu + \delta)n)$ for $\delta > 0$ as $n \to \infty$. It has been checked that under the assumption

$$\sum_{j=-\infty}^{\infty} |j|\,|\varphi_j| < \infty, \tag{5.14}$$

which defines a proper subset of $\Theta_1$,

$$p_n \sim n^{-(\alpha-1)} L(n)\, \delta^{-\alpha} \Big[ p \Big( \sum_{j=-\infty}^{\infty} \varphi_j \Big)_+^{\alpha} + q \Big( \sum_{j=-\infty}^{\infty} \varphi_j \Big)_-^{\alpha} \Big] \quad \text{as } n \to \infty, \tag{5.15}$$

where $p$, $q$ and $L$ are defined in (4.3), and one assumes that $q > 0$ if $\sum_{j=-\infty}^{\infty} \varphi_j < 0$. See Lemma A.5 in Mikosch and Samorodnitsky (2000b). It looks very plausible that (5.15) holds for every parameter $\varphi \in \Theta_1$. The logic of large deviations indicates that, under the assumptions (5.9), $p_n$ is regularly varying with exponent $-(\alpha h - 1)$ at infinity, but nobody has presented a rigorous proof so far.

5.2.
Ruin probability for heavy tailed linear processes

In this subsection we consider the rare event in Example 3.6,

$$A = \{X_1 + \cdots + X_n > n(\mu + \delta) + \lambda \ \text{ for some } n \ge 1\},$$

when $\delta > 0$ is fixed and $\lambda$ is large. Unfortunately, a result for the entire set $\Theta_1$ is not available here. However, there is a result for the subset of $\Theta_1$ defined by (5.14). In the latter case, the probability of the event $A$ (commonly referred to as the ruin probability) satisfies

$$P(A) \sim \frac{p \big(M_+^{(1)}(\varphi)\big)^{\alpha} + q \big(M_-^{(1)}(\varphi)\big)^{\alpha}}{(\alpha - 1)\delta}\, \lambda^{-(\alpha-1)} L(\lambda) \quad \text{as } \lambda \to \infty, \tag{5.16}$$

where $M_+^{(1)}(\varphi) = \sup$ [...] for some $n \ge 1$. (5.18)

Once again, this turns out to be valid (at least, under the assumption (5.14)).

The problem of the behaviour of the ruin probability for $\varphi \in \Theta_1^c$ has not, to the best of our knowledge, been treated. One can pursue the logic of large deviations, leading to (5.18). This leads us to conjecture that, under the assumptions (5.9), $P(A)$ is, as a function of $\lambda$, regularly varying with exponent $-(\alpha h - 1)$ at infinity.

Based on the above discussion (admittedly, part of it consists of "hard" results and part of conjectures) one can argue that a significant change occurs for heavy tailed linear processes as the parameter $\varphi$ crosses the boundary between $\Theta_1$ and its complement. Not only does the order of magnitude of the probabilities of certain rare events, and of certain functionals associated with sequences of certain rare events, appear to change at that boundary, but another interesting phenomenon seems to happen. Various orders of magnitude do not change as the parameter varies inside $\Theta_1$; not only do these orders of magnitude change at the boundary but, also, they may keep changing as the parameter varies outside $\Theta_1$.

It is important to make a remark at this point. It does appear that one should, in fact, look at the behavior of a family of related rare events, or a family of sequences of related rare events, if one wants to see what precisely happens at a boundary.
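As a small numerical companion to the ruin event $A$, the following Python sketch checks, path by path, whether the partial sums ever exceed the moving boundary $n(\mu + \delta) + \lambda$, and estimates $P(A)$ by crude Monte Carlo. The function name `ruin_time`, the finite horizon (which can only underestimate the probability), and the iid symmetric Pareto noise with $\mu = 0$ are all assumptions made for illustration; the sketch is not the method behind (5.16).

```python
import numpy as np

def ruin_time(x, drift, level):
    """First n >= 1 with X_1 + ... + X_n > n * drift + level, or None if no such n."""
    x = np.asarray(x, dtype=float)
    excess = np.cumsum(x) - drift * np.arange(1, len(x) + 1)
    hits = np.nonzero(excess > level)[0]
    return int(hits[0]) + 1 if hits.size else None

rng = np.random.default_rng(2)
alpha, drift, horizon, n_paths = 1.5, 1.0, 2000, 2000   # drift = mu + delta with mu = 0
signs = rng.choice([-1.0, 1.0], (n_paths, horizon))
noise = signs * (rng.pareto(alpha, (n_paths, horizon)) + 1.0)
for level in (10.0, 40.0):
    ruined = sum(ruin_time(path, drift, level) is not None for path in noise)
    print(f"level {level}: estimated ruin probability {ruined / n_paths:.4f}")
```

For an iid sequence one would expect the two estimates to differ roughly by the factor $4^{-(\alpha - 1)} = 1/2$, in line with the exponent $-(\alpha - 1)$ in (5.16), although a run of this size is far too small to confirm it.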
For example, the assumptions (5.9) do not cover the entire $\Theta_1^c$. We conjecture, however, that important changes happen when one moves from $\Theta_1$ into $\Theta_1^c$ and not, necessarily, into the subset of $\Theta_1^c$ defined by (5.9). It is likely that, in order to see these changes, one should look not only, say, at the event $A_n = \{X_1 + \cdots + X_n > n(\mu + \delta)\}$ but also at some related rare events, for example at the event $B_n = \{|X_1| + \cdots + |X_n| > n(\mu_1 + \delta)\}$, with $\mu_1 = E|X_1|$.

It is also interesting to mention that, in the case $\alpha > 2$, the condition (5.3) also implies the absolute summability of correlations (i.e., (2.1) fails).

5.3. Rare events for stationary stable processes

The situation regarding "phase transitions" for general stationary heavy tailed infinitely divisible processes of Section 4.2 has been investigated even less than is the case with the heavy tailed linear processes. There are several reasons for this, including the relatively complicated structure of stationary infinitely divisible processes and their very involved parameter space, which is a space of measures. Most of the known results are for stable processes, whose structure is better understood. We present here the results for a subclass of stationary stable processes, where we will be able to see a "phase transition".

Specifically, let $X_n$, $n = 0,1,2,\ldots$, be the linear fractional symmetric $\alpha$-stable noise, $1 < \alpha < 2$. For a fixed $\alpha$ the law of the process has an important parameter $H \in (0,1)$. That is,

$$X_n = \int_{\mathbb{R}} f_n(x)\, M(dx), \quad n = 0,1,2,\ldots, \tag{5.19}$$

where $M$ is a symmetric $\alpha$-stable random measure on the real line with the Lebesgue control measure, and $f_n(x) = f(x + n) - f(x + n + 1)$, $n = 0,1,2,\ldots$, $x \in \mathbb{R}$, with

$$f(x) = a\Big[ (-x)_+^{H-1/\alpha} - (-x - 1)_+^{H-1/\alpha} \Big] + b\Big[ (-x)_-^{H-1/\alpha} - (-x - 1)_-^{H-1/\alpha} \Big] \tag{5.20}$$

if $H \in (0,1)$, $H \ne 1/\alpha$. Here $a$ and $b$ are real numbers not simultaneously equal to zero. For $H = 1/\alpha$ one has two choices,

$$f(x) = a \mathbf{1}_{[-1,0]}(x) \tag{5.21}$$

and

$$f(x) = a\big( \ln|x| - \ln|x + 1| \big). \tag{5.22}$$

In the latter two cases $a$ is a real number different from zero. The resulting symmetric $\alpha$-stable process in (5.19) is an ergodic stationary process. It is the increment process of the linear fractional symmetric $\alpha$-stable motion if $H \ne 1/\alpha$, an iid sequence (the increment process of the symmetric $\alpha$-stable Lévy motion) under (5.21), and the increment process of the log-fractional symmetric $\alpha$-stable motion under (5.22). All of these processes are $H$-self-similar with stationary increments. We refer the reader to Samorodnitsky and Taqqu (1994) for information on stable processes, their integral representations and on self-similar processes.

The parameter space $\Theta$ is, then, the collection of all triples $(H,a,b)$ with $H \in (0,1)$, $H \ne 1/\alpha$, and $a,b$ real, $a^2 + b^2 > 0$, together with the triples $(H,a,i)$ with $H = 1/\alpha$, $a$ real, different from zero, and $i = 1,2$, depending on the choice between (5.21) and (5.22). Let $\Theta_1$ be the subset of $\Theta$ corresponding to $0 < H < 1/\alpha$.

We consider, once again, the rare event in Example 3.6, $A = \{X_1 + \cdots + X_n > n(\mu + \delta) + \lambda \text{ for some } n \ge 1\}$, when $\delta > 0$ is fixed and $\lambda$ is large. Of course $\mu = 0$ here. Then

$$P(A) \sim \begin{cases} K \lambda^{-(\alpha-1)} & \text{if } 0 < H < 1/\alpha \text{ or under (5.21)}, \\ K \lambda^{-(\alpha-1)} (\log \lambda)^{\alpha} & \text{under (5.22)}, \\ K \lambda^{-\alpha(1-H)} & \text{if } 1/\alpha < H < 1 \end{cases} \tag{5.23}$$

as $\lambda \to \infty$. Here $K$ is a finite positive constant that depends on $\alpha$, $H$, $a$ and $b$, but not on $\lambda$. See Proposition 4.4 in Mikosch and Samorodnitsky (2000a).

Observe that the order of magnitude of the ruin probability remains the same as $H$ varies in $(0,1/\alpha)$. Furthermore, this order of magnitude is the same as under independence. On the other hand, as $H$ varies in the interval $(1/\alpha,1)$, the order of magnitude of the ruin probability is greater than that in the case of independence and, furthermore, this order of magnitude changes with $H$. As we argued earlier, this gives us a reason to say that the range $H \in (0,1/\alpha)$ corresponds to short memory, and the range $H \in (1/\alpha,1)$ corresponds to long memory.
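The phase transition in (5.23) is easiest to see by tabulating the power of $1/\lambda$ as a function of $H$. The short Python sketch below does this; the function name `ruin_decay_exponent` is hypothetical, and the sketch deliberately ignores the constant $K$ and the logarithmic factor that appears under (5.22), so at $H = 1/\alpha$ it reports only the polynomial part.

```python
def ruin_decay_exponent(H, alpha):
    """Power of 1/lambda in (5.23), ignoring the constant K and any log factor."""
    if not (1.0 < alpha < 2.0 and 0.0 < H < 1.0):
        raise ValueError("need 1 < alpha < 2 and 0 < H < 1")
    # Below the boundary H = 1/alpha the exponent is that of the iid case;
    # above it the exponent alpha * (1 - H) is smaller and depends on H.
    return alpha - 1.0 if H <= 1.0 / alpha else alpha * (1.0 - H)

alpha = 1.5
for H in (0.2, 0.5, 1.0 / alpha, 0.8, 0.9):
    print(f"H = {H:.3f}: P(A) decays like lambda^(-{ruin_decay_exponent(H, alpha):.3f})")
```

The two branches agree at $H = 1/\alpha$, since $\alpha(1 - 1/\alpha) = \alpha - 1$, so the exponent is continuous in $H$ and the transition shows up as a kink: flat for $H < 1/\alpha$, strictly decreasing for $H > 1/\alpha$.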
It is interesting that, in this case, the boundary $H = 1/\alpha$ contains two points, corresponding to (5.21) and to (5.22), and it makes sense to view the latter as corresponding to long memory, while the former is the independent case.

Here is how the intuition of large deviations works here. As mentioned in Section 4.2, the process $X_n$, $n = 0,1,2,\ldots$, can be represented as a sum of Poisson points. In the symmetric stable case this can be done as follows. One can write (in terms of equality of finite dimensional distributions) the process given by (5.19) in the form

$$X_n = C_\alpha^{1/\alpha} \sum_{j=1}^{\infty} \varepsilon_j \Gamma_j^{-1/\alpha} g(V_j)^{-1/\alpha} f_n(V_j), \quad n = 0,1,2,\ldots, \tag{5.24}$$

where $C_\alpha$ is a finite positive constant that depends only on $\alpha$, $g$ is a strictly positive measurable function such that $\int_{\mathbb{R}} g(x)\,dx = 1$, $(\varepsilon_n)_{n \ge 1}$ is an iid sequence of Rademacher variables ($P(\varepsilon_n = -1) = P(\varepsilon_n = 1) = 1/2$), $(\Gamma_n)_{n \ge 1}$ are the points of a unit rate Poisson process on $(0,\infty)$, and $(V_n)_{n \ge 1}$ is an iid sequence of real valued random variables with common density $g$. Moreover, the three sequences are mutually independent. See Samorodnitsky and Taqqu (1994, Section 3.10). Rewriting

$$P(A) = P\Big( \sup_{n \ge 1} \Big[ C_\alpha^{1/\alpha} \sum_{j=1}^{\infty} \varepsilon_j \Gamma_j^{-1/\alpha} g(V_j)^{-1/\alpha} \sum_{k=1}^{n} f_k(V_j) - n\delta \Big] > \lambda \Big),$$

the intuition of rare events says that it is a single one of the Poisson points (in the function space) $\big( \varepsilon_j \Gamma_j^{-1/\alpha} g(V_j)^{-1/\alpha} \sum_{k=1}^{n} f_k(V_j),\ n = 1,2,\ldots \big)$ that is most likely to cause the ruin. This intuition translates into

$$P(A) \sim \sum_{j=1}^{\infty} P\Big( \sup_{n \ge 1} \Big[ C_\alpha^{1/\alpha} \varepsilon_j \Gamma_j^{-1/\alpha} g(V_j)^{-1/\alpha} \sum_{k=1}^{n} f_k(V_j) - n\delta \Big] > \lambda \Big) \tag{5.25}$$

as $\lambda \to \infty$. It is the equivalence (5.25) that allows one to understand the change in the way the effect of these Poisson points is distributed over time as the parameter $H$ crosses the boundary $1/\alpha$.

Interestingly, the probabilities of the rare events of Example 3.5, $A_n = \{X_1 + \cdots + X_n > n(\mu + \delta)\}$, do not indicate anything interesting happening at the point $H = 1/\alpha$.
In fact, since the processes under consideration are the increments of $H$-self-similar processes,

$$p_n = P(X_1 + \cdots + X_n > n\delta) = P\big( n^H X_1 > n\delta \big) \sim \text{const} \cdot \delta^{-\alpha} n^{-\alpha(1-H)}$$

as $n \to \infty$. Hence the order of magnitude of $p_n$ changes "ordinarily" as $H$ crosses the boundary $1/\alpha$. As mentioned at the end of Section 5.2, one should, probably, look at certain related rare events as well.

The behavior of the associated functionals in (5.1) does not seem to have been studied so far.

5.4. High dimensional joint tails for a linear process with stable innovations

We conclude this chapter with a simulation study of a situation in which no analytical results are yet available. Consider a heavy tailed linear process (4.1). For a fixed $\lambda > 0$ we consider the probability of the event $A_n = \{X_j > \lambda,\ j = 0,\ldots,n\}$, when $n$ is large. We are within the framework of Example 3.3. The discussion above makes it possible to conjecture that there is a phase transition at the boundary between the set $\Theta_1$ in (5.3) and its complement in the set $\Theta$ in (5.2).

To check this conjecture we ran a simulation of $10^7$ realizations of a linear process with symmetric $\alpha$-stable innovations with different $\alpha$. We estimated both the probability $P(A_n)$ as a function of $n$ and the rate of growth of the associated functional

$$R_n = \max\big\{ j - i + 1 :\ 1 \le i \le j \le n,\ \min(X_i,\ldots,X_j) > \lambda \big\}. \tag{5.26}$$

We simulated first an AR(1) process with $\varphi_j = 0$ for $j \ne 0$ or $1$, $\varphi_0 = 1$ and varying $\varphi_1$. This choice of coefficients is, clearly, in $\Theta_1$. Then we simulated a linear process with $\varphi_j = 0$ for $j < 0$ and $\varphi_j = (1 + j)^{-0.8}$ for $j \ge 0$ (and $\alpha > 1/0.8$). This choice of parameters is in the set $\Theta_1^c$.

While a simulation study of this type cannot provide a definite answer, it seems to indicate that for the AR(1) process the probabilities $P(A_n)$ decay exponentially fast with $n$. We plotted in Figure 3 the ratio $-(\log P(A_n))/n$ over the range of $n$ for $\lambda$ in the set $\{0.1, 0.2, 0.3, 0.4\}$ for the AR(1) process with $\alpha = 1.5$ and $\varphi_1 = 0.5$. Notice how the curves become horizontal.
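A scaled-down version of such an experiment can be sketched in Python as follows. This is not the code used for the study described here; it is a minimal illustration under stated assumptions: the innovations are generated with the Chambers-Mallows-Stuck formula for symmetric stable laws, the slowly decaying filter is truncated at 2000 lags, only a single path of length 5000 is drawn, and the function names are hypothetical. The short filter uses $\varphi_0 = 1$, $\varphi_1 = 0.5$ and zero elsewhere, which is the coefficient choice the text describes for its first process.

```python
import numpy as np

def symmetric_stable(alpha, size, rng):
    """Symmetric alpha-stable variates (alpha != 1), Chambers-Mallows-Stuck formula."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def linear_process(coeffs, n, alpha, rng):
    """X_t = sum_j coeffs[j] * eps_{t-j}, one-sided filter truncated at len(coeffs) lags."""
    eps = symmetric_stable(alpha, n + len(coeffs) - 1, rng)
    return np.convolve(eps, coeffs, mode="valid")

def longest_run_above(x, level):
    """R_n in (5.26): length of the longest run of consecutive values above `level`."""
    best = run = 0
    for above in np.asarray(x) > level:
        run = run + 1 if above else 0
        best = max(best, run)
    return best

rng = np.random.default_rng(3)
alpha, n, level = 1.5, 5000, 0.1
short_filter = np.array([1.0, 0.5])                   # summable coefficients
long_filter = (1.0 + np.arange(2000)) ** (-0.8)       # truncated, non-summable decay
for name, coeffs in (("short memory", short_filter), ("long memory", long_filter)):
    x = linear_process(coeffs, n, alpha, rng)
    print(name, "R_n =", longest_run_above(x, level))
```

One would expect the long memory filter to produce noticeably longer runs above the level, but a single path says little; estimating $P(A_n)$ itself requires many independent replications, as in the study above.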
In comparison, our simulations seem to indicate that for the linear process with $\varphi_j = (1 + j)^{-0.8}$, $j \ge 0$, the probabilities $P(A_n)$ decay hyperbolically fast with $n$. We plotted in Figure 4 $P(A_n)$ against $n$ in the log scale, for the case $\alpha = 1.5$. Here we use $\lambda$ in the set $\{0.1, 1, 5, 40\}$. Notice how linear the plots are.

Finally, we present a plot of $(\log R_n)/\log n$ for the long memory process with $\alpha = 1.5$ and $\lambda \in \{0.1, 0.2, 0.5, 1\}$ (Figure 5). Our intuition tells us that in that case $R_n$ should grow polynomially fast with $n$, and the simulation appears to bear this out.

Once again, even though a simulation study is not conclusive evidence of a phase transition at the boundary between the set $\Theta_1$ and its complement, its results are consistent with such a phase transition.

Fig. 3. The ratio $-(\log P(A_n))/n$ for the AR(1) process with $\alpha = 1.5$ and $\varphi_1 = 0.5$.

Fig. 4. A plot of $P(A_n)$ against $n$ for a linear process with $\alpha = 1.5$ and $\varphi_j = (1 + j)^{-0.8}$, $j \ge 0$. Log-log scale.

Fig. 5. A plot of $(\log R_n)/\log n$ for a linear process with $\alpha = 1.5$ and $\varphi_j = (1 + j)^{-0.8}$, $j \ge 0$.

References

Araujo, A., Giné, E., 1980. The Central Limit Theorem for Real and Banach Valued Random Variables. Wiley, New York.
Astrauskas, A., Levy, J., Taqqu, M.S., 1991. The asymptotic dependence structure of the linear fractional Lévy motion. Lietuvos Matematikos Rinkinys (Lithuanian Mathematical Journal) 31, 1-28.
Beran, J., 1992. Statistical methods for data with long-range dependence. Statistical Science 7, 404-416. With discussions and rejoinder, pp. 404-427.
Beran, J., 1994. Statistics for Long-Memory Processes. Chapman & Hall, New York.
Brockwell, P., Davis, R., 1991. Time Series: Theory and Methods, 2nd edition. Springer-Verlag, New York.
Chistyakov, V., 1964. A theorem on sums of independent random variables and its applications to branching random processes. Theory of Probability and its Applications 9, 640-648.
Chover, J., Ney, P., Wainger, S., 1973. Functions of probability measures.
Journal d'Analyse Mathématique 26, 255-302.
Cline, D., 1983. Estimation and linear prediction for regression, autoregression and ARMA with infinite variance data. Ph.D. Thesis. Colorado State University.
Cline, D., 1985. Linear prediction of ARMA processes with infinite variance. Stochastic Processes and their Applications 19, 281-296.
Cox, D., 1984. Long-range dependence: a review. In: David, H., David, H. (Eds.), Statistics: An Appraisal. Iowa State University Press, pp. 55-74.
Dembo, A., Zeitouni, O., 1993. Large Deviations Techniques and Applications. Jones and Bartlett, Boston.
Deuschel, J.-D., Stroock, D., 1989. Large Deviations. Academic Press, Boston.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal Events for Insurance and Finance. Springer-Verlag, Berlin.
Goldie, C., Klüppelberg, C., 1998. Subexponential distributions. In: A Practical Guide to Heavy Tails: Statistical Techniques for Analysing Heavy Tailed Distributions. Birkhäuser, Boston, pp. 435-460.
Gomez, C., Selman, B., Crato, N., 1997. Heavy-tailed probability distributions in combinatorial search. In: Smolka, G. (Ed.), Principles and Practice of Constraint Programming. Springer, Berlin, pp. 121-135.
Gross, A., 1994. Some mixing conditions for stationary symmetric stable stochastic processes. Stochastic Processes and their Applications 51, 277-295.
Heyde, C., 1968. On large deviation probabilities in the case of attraction to a non-normal stable law. Sankhyā, Series A 30, 253-258.
Hurst, H., 1951. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770-808.
Hurst, H., 1955. Methods of using long-term storage in reservoirs. Proceedings of the Institution of Civil Engineers, Part I, 519-577.
Linde, W., 1986. Probability in Banach Spaces - Stable and Infinitely Divisible Distributions. Wiley, Chichester.
Mandelbrot, B., 1965. Une classe de processus stochastiques homothétiques à soi; application à la loi climatologique de H.E. Hurst. Comptes Rendus de l'Académie des Sciences Paris 240, 3274-3277.
Mandelbrot, B., 1983. The Fractal Geometry of Nature. Freeman, San Francisco.
Mandelbrot, B., Van Ness, J., 1968. Fractional Brownian motions, fractional noises and applications. SIAM Review 10, 422-437.
Mandelbrot, B., Wallis, J., 1968. Noah, Joseph and operational hydrology. Water Resources Research 4, 909-918.
Mandelbrot, B., Wallis, J., 1969. Robustness of the rescaled range R/S in the measurement of noncyclic long-run statistical dependence. Water Resources Research 5, 967-988.
Mansfield, P., Rachev, S., Samorodnitsky, G., 2001. Long strange segments of a stochastic process and long range dependence. The Annals of Applied Probability 11, 878-921.
Maruyama, G., 1970. Infinitely divisible processes. Theory of Probability and its Applications 15, 1-22.
Mikosch, T., Samorodnitsky, G., 2000a. Ruin probability with claims modeled by a stationary ergodic stable process. The Annals of Probability 28, 1814-1851.
Mikosch, T., Samorodnitsky, G., 2000b. The supremum of a negative drift random walk with dependent heavy-tailed steps. The Annals of Applied Probability 10, 1025-1064.
Müller, U., Dacorogna, M., Pictet, O., 1998. Hill, bootstrap and jackknife estimators for heavy tails. In: A Practical Guide to Heavy Tails: Statistical Techniques for Analysing Heavy Tailed Distributions. Birkhäuser, Boston, pp. 283-310.
Nagaev, S., 1979. Large deviations of sums of independent random variables. The Annals of Probability 7, 745-789.
Park, K., Willinger, W. (Eds.), 2000. Self-Similar Network Traffic and Performance Evaluation. Wiley, New York.
Rachev, S., Samorodnitsky, G., 2001. Long strange segments in a long range dependent moving average. Stochastic Processes and their Applications 93, 119-148.
Rajput, B., Rosiński, J., 1989. Spectral representations of infinitely divisible processes. Probability Theory and Related Fields 82, 451-488.
Resnick, S., 1987. Extreme Values, Regular Variation and Point Processes. Springer-Verlag, New York.
Rosiński, J., 1989. On path properties of certain infinitely divisible processes. Stochastic Processes and their Applications 33, 73-87.
Rosiński, J., 1990. On series representation of infinitely divisible random vectors. The Annals of Probability 18, 405-430.
Rosiński, J., 1998. Structure of stationary stable processes. In: Adler, R., Feldman, R., Taqqu, M. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques for Analysing Heavy Tailed Distributions. Birkhäuser, Boston, pp. 461-472.
Samorodnitsky, G., Taqqu, M., 1994. Stable Non-Gaussian Random Processes. Chapman & Hall, New York.