Portfolio Theory
Dr. Andrea Rigamonti
andrea.rigamonti@econ.muni.cz

Lecture 3

Content:
• Statistics notions

Statistics notions

A random variable (r.v.) is one whose value depends on the outcome of a random experiment.
• A discrete r.v. can only take certain specific values.
• A continuous r.v. can take any value (inside a certain interval).

A probability distribution is a function that assigns a probability to each possible outcome.
• For a discrete r.v., the probability mass function (or probability distribution function) is the function that links each outcome with the corresponding probability.
• For a continuous r.v. we have a probability density function (pdf), and the probability of an event is given by the area under the function (i.e. by its integral).

Statistics notions

The most popular continuous distribution is the Gaussian (or normal) distribution. Characteristics:
• Symmetric
• Unimodal
• Completely described by its mean $\mu$ and variance $\sigma^2$
• Any linear transformation of a normally distributed r.v., or any linear combination of independent normally distributed r.v., is itself normally distributed.

The pdf of a normally distributed r.v. is given by:

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2 / 2\sigma^2}$$

Statistics notions

The pdf of a normal r.v. has a bell shape:
• The area under the pdf measures the probability
• Probability is measured in the range $[0,1]$, so the entire area under the curve (the sum of the probabilities of all events) is 1
• The probability of an exact value is zero (a line has area 0)
• We can compute the probability of an interval

Statistics notions

A standard normal distribution is a normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$. Given a normally distributed r.v. $Y$, a standard normally distributed r.v. is obtained as:

$$Z = \frac{Y - \mu}{\sigma} \sim N(0,1)$$

The cumulative distribution function (cdf) of a continuous r.v. measures the probability that the r.v. is less than (or equal to) a certain value. The cdf $F(y)$ is given by the integral of the pdf:

$$F(y) = P(Y \le y) = \int_{-\infty}^{y} f(t)\,dt$$

Statistics notions

The cdf of a normally distributed r.v. has a sigmoid shape:
• It is given by:
$$F(a) = P(Y \le a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2/2\sigma^2}\,dy$$
• There is no exact analytical solution for this integral
• The values of the cdf are therefore computed numerically for the standard normal and provided in tables

Statistics notions

The average value of $n$ financial returns $R_i$ is known as a measure of location or measure of central tendency.
• Mode: the most frequently occurring value
• Median: the middle value when the elements are arranged in ascending order
• Arithmetic mean: $\bar{R}_A = \frac{1}{n}\sum_{i=1}^{n} R_i$
• Geometric mean: $\bar{R}_G = \sqrt[n]{\prod_{i=1}^{n}(1+R_i)} - 1$

Statistics notions

With log returns $\bar{R}_A$ and $\bar{R}_G$ are the same. With simple returns $\bar{R}_G < \bar{R}_A$ (unless all returns are equal). The geometric mean accounts for compounding and is less affected by outliers, BUT it cannot be used as an estimate for future returns.

The most important measures of spread are:
• The range: max − min
• The interquartile range: $Q_3 - Q_1$, where $Q_3$ is the $\frac{3}{4}(n+1)$-th value and $Q_1$ is the $\frac{1}{4}(n+1)$-th value

Statistics notions

• The variance: $\bar{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(R_i - \bar{R}_A)^2$
• The standard deviation $\sigma$ (square root of the variance)
• The coefficient of variation: $CV = \frac{\sigma}{\mu_A}$

The bar in $\bar{\sigma}^2$ indicates a (sample) estimate. We divide by $n-1$ to correct for the loss of a degree of freedom. The formula for the population variance is:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(R_i - \mu_A)^2$$

where $\mu_A$ is the "true" mean.
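A minimal Python sketch of these measures (numpy and scipy assumed available; the return series is made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical monthly simple returns (illustrative values only)
R = np.array([0.02, -0.01, 0.03, 0.015, -0.005])
n = len(R)

# Arithmetic mean: (1/n) * sum of R_i
r_arith = R.mean()

# Geometric mean: n-th root of prod(1 + R_i), minus 1
r_geom = np.prod(1 + R) ** (1 / n) - 1

# Sample variance divides by n - 1 (one degree of freedom lost)
var_sample = R.var(ddof=1)
sd_sample = np.sqrt(var_sample)

print(r_arith, r_geom)           # r_geom < r_arith with simple returns
print(var_sample, sd_sample)

# The normal cdf has no closed-form solution; scipy evaluates it numerically.
# Probability that a standard normal Z falls in [-1.96, 1.96]: about 0.95
p_interval = stats.norm.cdf(1.96) - stats.norm.cdf(-1.96)
print(p_interval)
```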
Statistics notions

In practice, true parameter values are generally not available and we use estimates. To keep the notation light, the bar is usually not used when there is no risk of confusion.

Higher moments are important when the data are not Gaussian:
• The skewness measures the asymmetry around the mean:
$$s = \frac{1}{\sigma^3}\,\frac{1}{n-1}\sum_{i=1}^{n}(R_i - \bar{R}_A)^3$$
• The kurtosis measures the fatness of the tails and how peaked at the mean the distribution is:
$$k = \frac{1}{\sigma^4}\,\frac{1}{n-1}\sum_{i=1}^{n}(R_i - \bar{R}_A)^4$$

Statistics notions

[Figures: normal vs positively (right) skewed distribution; normal (k = 3) vs leptokurtic (k > 3) distribution]

Statistics notions

Measures of association evaluate links between variables:
• The covariance between two variables $X$ and $Y$ is:
$$\sigma_{X,Y} = Cov(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$
• The (Pearson) correlation between $X$ and $Y$ is:
$$\rho_{X,Y} = Corr(X,Y) = \frac{Cov(X,Y)}{SD(X)\,SD(Y)} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y}$$

A negative (positive) covariance means that the variables move, on average, in the opposite (same) direction.

Statistics notions

The correlation takes values between −1 (perfect negative correlation) and 1 (perfect positive correlation). A $\sigma_{X,Y}$ or $\rho_{X,Y}$ equal to zero indicates no linear correlation, but not necessarily independence. Only if $X$ and $Y$ are jointly normally distributed does zero covariance imply that they are independent.

Statistics notions

The variance-covariance matrix, or simply covariance matrix, is a symmetric matrix that contains all the variances (on the diagonal) and covariances (off the diagonal) of the data on which it is estimated:

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} & \cdots & \sigma_{1,n} \\ \sigma_{2,1} & \sigma_2^2 & \cdots & \sigma_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n,1} & \sigma_{n,2} & \cdots & \sigma_n^2 \end{pmatrix}$$

Since $\sigma_{i,j} = \sigma_{j,i}$, the elements below the diagonal mirror those above it ($\sigma_{2,1} = \sigma_{1,2}$, $\sigma_{n,1} = \sigma_{1,n}$, and so on).

($\Sigma$ is the standard notation to indicate the covariance matrix; do not confuse it with the sum symbol!)

Statistics notions

The relationship between a dependent variable and an explanatory variable can be described with the linear regression model with a single regressor:

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

This equation describes a line that fits the data, plus a disturbance (or error) term $\varepsilon_i$. To fit the data we choose the parameters $\alpha$ and $\beta$ using Ordinary Least Squares (OLS): take each vertical distance from the point to the line, square it, and then minimize the total sum of the squares.

Statistics notions

• For each data point $i$ we denote the fitted value obtained from the regression line as $\hat{y}_i$.
• $\hat{\varepsilon}_i$ denotes the residual: $\hat{\varepsilon}_i = y_i - \hat{y}_i$
• The OLS method minimizes the residual sum of squares (RSS):
$$\sum_{i=1}^{n}\hat{\varepsilon}_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

Statistics notions

The values of $\alpha$ and $\beta$ selected by minimizing the RSS are $\hat{\alpha}$ and $\hat{\beta}$. The equation of the fitted line is therefore:

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$$

The OLS estimators for the single-regressor case are (see the sketch below):

$$\hat{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

If $E(\varepsilon_i) = 0$, $var(\varepsilon_i) = \sigma^2 < \infty$, $cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$, and $cov(\varepsilon_i, x_i) = 0$, then the estimators $\hat{\alpha}$ and $\hat{\beta}$ determined by OLS are BLUE: Best Linear Unbiased Estimators.
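A short Python sketch of these estimators on simulated data (the "true" $\alpha$ and $\beta$ and the noise level are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x could be market returns, y the returns of one stock
x = rng.normal(0.01, 0.04, size=100)
y = 0.002 + 1.2 * x + rng.normal(0.0, 0.02, size=100)  # alpha = 0.002, beta = 1.2

# Sample covariance and Pearson correlation (both use the n - 1 divisor)
cov_xy = np.cov(x, y)[0, 1]
rho_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# OLS estimators for the single-regressor case
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(cov_xy, rho_xy)
print(alpha_hat, beta_hat)  # should be close to 0.002 and 1.2
```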
Statistics notions

The multiple linear regression model with $k$ regressors (and $k-1$ explanatory variables) has the form:

$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \cdots + \beta_k x_{ki} + \varepsilon_i$$

• Each $\beta_1, \beta_2, \ldots, \beta_k$ is known as a partial regression coefficient, and represents the partial effect of the given explanatory variable on the explained variable, after holding constant (or eliminating the effect of) all other explanatory variables.
• The regressor $x_{1i}$ is not written explicitly, as it is always equal to 1.
• The intercept $\beta_1 x_{1i} = \beta_1$ is not an explanatory variable, although for notational convenience $k$ might be referred to as the "number of explanatory variables".

Statistics notions

The model can be written in matrix form as

$$\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where $\boldsymbol{y}$ and $\boldsymbol{\varepsilon}$ are $n \times 1$ vectors, $\boldsymbol{\beta}$ is a $k \times 1$ vector, and $\boldsymbol{X}$ is an $n \times k$ matrix.

• Written in extended form, the equation describing the model is:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{22} & x_{23} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n2} & x_{n3} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

• The RSS has to be minimized with respect to all the elements of $\boldsymbol{\beta}$:

$$RSS = \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}} = \begin{pmatrix} \hat{\varepsilon}_1 & \hat{\varepsilon}_2 & \cdots & \hat{\varepsilon}_n \end{pmatrix} \begin{pmatrix} \hat{\varepsilon}_1 \\ \hat{\varepsilon}_2 \\ \vdots \\ \hat{\varepsilon}_n \end{pmatrix} = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \cdots + \hat{\varepsilon}_n^2 = \sum_{i=1}^{n}\hat{\varepsilon}_i^2$$

Statistics notions

• The coefficient estimates are given by:
$$\hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}$$
• The variance of the error terms is estimated by $s^2 = \frac{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}{n-k}$. We have $n-k$ degrees of freedom because $k$ parameters were estimated.
• The covariance matrix of $\hat{\boldsymbol{\beta}}$ is given by $var(\hat{\boldsymbol{\beta}}) = s^2(\boldsymbol{X}'\boldsymbol{X})^{-1}$.
• The standard errors of the coefficients in $\hat{\boldsymbol{\beta}}$ are thus given by the square roots of the terms on the leading diagonal of $var(\hat{\boldsymbol{\beta}})$.
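These matrix formulas translate almost one-to-one into numpy. A minimal sketch on simulated data (the number of observations, the "true" coefficients, and the noise level are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n = 200 observations, k = 3 regressors (constant + 2 variables)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0, -2.0])            # illustrative "true" coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Coefficient estimates: beta_hat = (X'X)^(-1) X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Residuals and the error-variance estimate s^2 = e'e / (n - k)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k)

# Covariance matrix of beta_hat; standard errors are the square roots
# of its diagonal elements
var_beta = s2 * XtX_inv
se_beta = np.sqrt(np.diag(var_beta))

print(beta_hat)   # close to [0.5, 1.0, -2.0]
print(se_beta)
```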