PORTFOLIO THEORY – LECTURE NOTES
Dr. Andrea Rigamonti

MEAN-VARIANCE OPTIMIZATION

From an economist's point of view, an investor who optimizes a portfolio is trying to maximize a utility function. An extremely simple utility function is the linear utility function:

$$U(V) = a + bV, \quad b > 0$$

where $U(V)$ is the utility that the investor gets depending on the value $V$ of the portfolio. This function simply says that the higher the wealth, the higher the utility. Its shape is a line, and the parameter $b$ determines how much the utility increases following a wealth increase. $b$ is assumed to be positive, otherwise the investor would be indifferent ($b = 0$) or less satisfied ($b < 0$) when wealth increases. In other words, only the mean return of the portfolio matters.

Markowitz (1952) revolutionized the field by adding risk to the equation. His standard approach assumes that an investor cares not only about the mean but also about the variance of the portfolio returns, i.e., the investor has a mean-variance utility. Given a certain mean return, the utility increases as the variance (which quantifies risk) gets lower. Equivalently, given a certain level of variance, the utility increases with a higher mean return.

The trade-off between return and risk is quantified by a risk aversion parameter $\gamma$. A higher $\gamma$ means that the investor is more risk-averse and will therefore require a higher compensation for an increased risk. A lower $\gamma$ means that the investor is less risk-averse and is willing to take more risk. In other words, given a set of assets with a certain mean and variance, the lower the investor's $\gamma$, the more the optimal portfolio tilts toward a higher mean return and a higher variance.
To model such preferences, a quadratic utility function is used:

$$U(V) = V - \frac{\gamma}{2} V^2, \quad \gamma > 0$$

$\gamma$ is assumed to be positive so that the utility function is concave:

[Figure: concave utility function. Source: https://financestu.com]

This implies that the investor is risk-averse. With $\gamma = 0$ the investor would be indifferent to risk, while $\gamma < 0$ would mean that the investor is risk-seeking (i.e., prefers more risk for the same amount of wealth, which is obviously not realistic).

Remember that the expected return of a portfolio is given by $\mu_P = \mathbf{w}'\boldsymbol{\mu} = V$, where $\mathbf{w}$ is the vector of portfolio weights and $\boldsymbol{\mu}$ is the vector of mean returns of the single assets. Moreover, recall that the variance is the expected value of the squared deviation from the mean, so that the portfolio variance is $\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$, where $\boldsymbol{\Sigma}$ is the covariance matrix of the asset returns. So, the utility function becomes:

$$U(\mathbf{w}) = \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2} \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$

Therefore, given a risk-free asset and a set of $N$ risky assets with mean returns $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, and a certain risk aversion coefficient $\gamma$, the investor we are considering wants to select the weights $\mathbf{w}$ in a way that maximizes the following utility function:

$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2} \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$

This is an unconstrained optimization problem that is easy to solve. We just need to set the first-order condition, i.e., take the partial derivative with respect to $\mathbf{w}$ and set it equal to zero:

$$\frac{\partial U(\mathbf{w})}{\partial \mathbf{w}} = \boldsymbol{\mu} - \frac{2\gamma}{2} \boldsymbol{\Sigma}\mathbf{w} = \boldsymbol{\mu} - \gamma\boldsymbol{\Sigma}\mathbf{w} = \mathbf{0}$$

We then solve for $\mathbf{w}$:

$$\boldsymbol{\Sigma}\mathbf{w} = \frac{1}{\gamma}\boldsymbol{\mu}$$

$$\mathbf{w} = \frac{1}{\gamma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

So, the closed-form solution that gives the optimal weights for the risky assets is:

$$\mathbf{w}_U = \frac{1}{\gamma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

where "U" stands for "Utility", while the weight for the risk-free asset is equal to $1 - \mathbf{w}_U'\mathbf{1}$, where $\mathbf{1}$ is a vector of ones with length equal to the number of risky assets.
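As a quick numerical check of the closed-form solution, the following is a minimal sketch in Python with NumPy. The inputs `mu`, `Sigma` and `gamma` are made-up illustrative values (not estimates from real data), and the function name `utility_weights` is hypothetical.

```python
import numpy as np

def utility_weights(mu, Sigma, gamma):
    """Risky-asset weights w_U = (1/gamma) * Sigma^{-1} mu."""
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(Sigma, mu) / gamma

# Illustrative (made-up) inputs: mean excess returns and covariance matrix.
mu = np.array([0.05, 0.08, 0.06])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.0625]])
gamma = 3.0

w_U = utility_weights(mu, Sigma, gamma)
w_riskfree = 1.0 - w_U.sum()       # weight of the risk-free asset, 1 - w_U' 1
foc = mu - gamma * Sigma @ w_U     # first-order condition, should be ~0
print("risky weights:", w_U)
print("risk-free weight:", w_riskfree)
print("first-order condition:", foc)
```

Doubling `gamma` in this sketch halves every entry of `w_U`, which is the scaling behaviour implied by the formula.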
The resulting optimal expected utility is:

$$U(\mathbf{w}_U) = \frac{1}{2\gamma}\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

Notice that we do not need to explicitly include the risk-free asset in the asset menu, as it is equivalent and simpler to work with excess returns, i.e., with the returns of the risky assets from which we have subtracted the risk-free rate.

A more difficult version is the one where we impose a full investment in the risky assets. In other words, the sum of the weights of the risky assets must be equal to 1, and nothing is invested in the risk-free asset. Hence, we have to solve the following constrained optimization problem:

$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2} \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to: } \mathbf{w}'\mathbf{1} = 1$$

To solve this problem, we use the method of Lagrange multipliers. First, we need to define the Lagrangian function, i.e., a modified version of the objective function that incorporates the constraint in this way:

$$L(\mathbf{w}, \lambda) = \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2} \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda[1 - \mathbf{w}'\mathbf{1}]$$

where $\lambda$ is the Lagrange multiplier. By including this additional term we can now solve an unconstrained problem instead of a constrained one. Therefore, we set the first-order conditions for the Lagrangian function. The conditions involve two simultaneous equations, as we have to compute the partial derivative with respect to both $\mathbf{w}$ and $\lambda$.
๐œ•๐ฟ ๐œ•๐’˜ = ๐ โˆ’ 2๐›พ 2 ฦฉ๐’˜ โˆ’ ๐œ†๐Ÿ = ๐ โˆ’ ๐›พฦฉ๐’˜ โˆ’ ๐œ†๐Ÿ = ๐ŸŽ ๐œ•๐ฟ ๐œ•๐œ† = 1 โˆ’ ๐’˜โ€ฒ ๐Ÿ = 0 We start by solving the first equation for ๐’˜: ๐›พฦฉ๐’˜ = ๐ โˆ’ ๐œ†๐Ÿ ๐’˜ = (๐›พฦฉ)โˆ’1(๐ โˆ’ ๐œ†๐Ÿ) Now we can plug this into the second equation: 1 โˆ’ [(๐›พฦฉ)โˆ’1(๐ โˆ’ ๐œ†๐Ÿ)]โ€ฒ ๐Ÿ = 0 Remember that (๐‘จ๐‘ฉ)โ€ฒ = ๐‘ฉโ€ฒ๐‘จโ€ฒ, and therefore we have: (๐ โˆ’ ๐œ†๐Ÿ)โ€ฒ((๐›พฦฉ)โˆ’1)โ€ฒ ๐Ÿ = 1 (๐ โˆ’ ๐œ†๐Ÿ)โ€ฒ ( ฦฉโˆ’๐Ÿ ๐›พ ) โ€ฒ ๐Ÿ = 1 The transpose of the sum of matrices (or vectors) is the sum of the transpose of those matrices: (๐‘จ + ๐‘ฉ)โ€ฒ = ๐‘จโ€ฒ + ๐‘ฉโ€ฒ. Moreover, a scalar is unaffected by the transpose: (๐‘๐‘จ)โ€ฒ = ๐‘๐‘จโ€ฒ. Also notice that ฦฉโˆ’๐Ÿ is a symmetric matrix, and therefore its transpose is still ฦฉโˆ’๐Ÿ . Therefore: (๐โ€ฒ โˆ’ ๐œ†๐Ÿโ€ฒ) ฦฉโˆ’๐Ÿ ๐›พ ๐Ÿ = 1 (๐โ€ฒ โˆ’ ๐œ†๐Ÿโ€ฒ)ฦฉโˆ’๐Ÿ ๐Ÿ = ๐›พ ๐โ€ฒ ฦฉโˆ’๐Ÿ ๐Ÿ โˆ’ ๐œ†๐Ÿโ€ฒฦฉโˆ’๐Ÿ ๐Ÿ = ๐›พ ๐œ† = ๐โ€ฒ ฦฉโˆ’๐Ÿ ๐Ÿ โˆ’ ๐›พ ๐Ÿโ€ฒฦฉโˆ’๐Ÿ ๐Ÿ We can now plug this into the first equation, finally obtaining the solution we were looking for: ๐’˜ = (๐›พฦฉ)โˆ’1 (๐ โˆ’ ๐œ†๐Ÿ) ๐’˜ = (๐›พฦฉ)โˆ’1 (๐ โˆ’ ๐โ€ฒ ฦฉโˆ’๐Ÿ ๐Ÿ โˆ’ ๐›พ ๐Ÿโ€ฒฦฉโˆ’๐Ÿ ๐Ÿ ๐Ÿ) With some minimal rearrangement, we have therefore the following set of optimal weights that maximize the mean-variance utility given the constraint of full investment in the risky assets: ๐’˜ ๐‘ผโˆ— = ฦฉโˆ’๐Ÿ ๐›พ (๐ + ๐›พ โˆ’ ๐โ€ฒ ฦฉโˆ’๐Ÿ ๐Ÿ ๐Ÿโ€ฒฦฉโˆ’๐Ÿ ๐Ÿ ๐Ÿ) Given a vector of mean ๐ and a covariance matrix ฦฉ, there will be different sets of optimal weights ๐’˜ ๐‘ผ and ๐’˜ ๐‘ผโˆ— for each different value of the risk-aversion parameter ๐›พ. However, while in ๐’˜ ๐‘ผโˆ— the relative wealth allocated to each risky asset changes with each different value of ๐›พ, in ๐’˜ ๐‘ผ all the weights scale up or down by the same proportion. 
When there is a risk-free asset, the only thing that changes with $\gamma$ is the total amount of wealth allocated to the risky assets; the proportions within $\mathbf{w}_U$ do not change. For example, imagine we have three risky assets and a risk-free asset, and the optimal weights with $\gamma = 3$ are $\mathbf{w}_U' = [0.2 \;\; 0.4 \;\; 0.3]$. This means that 90% of the wealth is invested in the risky assets, and 10% in the risk-free asset. If the risk aversion parameter increases to $\gamma = 6$, the new risky weights will be $\mathbf{w}_U' = [0.1 \;\; 0.2 \;\; 0.15]$. Now 45% of the wealth is invested in the risky assets and 55% in the risk-free asset. However, the weights for the risky assets maintained the same proportions: they were all cut in half.

With the formulas that we derived, we can obtain the efficient frontier, i.e., the set of portfolios that have the most efficient mean-variance combination (in other words, the highest possible utility) for each level of $\gamma$. However, the efficient frontier that we can draw in this way is incomplete, as the minimum variance portfolio is only reached with an infinite value for the risk aversion parameter.

The following picture provides an illustration of the efficient frontier obtained from 29 assets with (red line) and without (blue curve) the possibility to also invest in a risk-free asset. The black dots are the individual stocks that have been combined to obtain the portfolio allocations that lie on the efficient frontier(s). The higher $\gamma$, the more the investor chooses a portfolio located toward the left side of the plot, i.e., with lower mean and lower standard deviation.

We could also extend this plot by plotting the frontier allocations obtained with a negative $\gamma$. In this case the red line and the blue curve would be reflected across a horizontal axis, and would therefore slope downward. Such an allocation would be sought by an investor who is risk-seeking, i.e., who prefers more risk for a given mean return.
As this is unreasonable and not "efficient" in any sensible way, those allocations are generally not considered and are not part of the efficient frontier.

Providing a specific value for $\gamma$ might be difficult in practice. Moreover, it can be difficult to interpret the meaning of the specific utility value associated with a certain portfolio. While we can say that a portfolio with a certain utility is preferable to another with a lower utility, it is not obvious how good it is in a more general sense. In short, the value itself, in the case of utility, is not very informative.

A more intuitive approach is to specify the preferences through a desired mean portfolio return $R_e$ instead of a level of risk aversion. In this case, the goal becomes to minimize the variance given the desired mean return. This is a constrained optimization problem:

$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to: } \mathbf{w}'\boldsymbol{\mu} + (1 - \mathbf{w}'\mathbf{1})R_f = R_e$$

In the constraint, $\mathbf{w}'\boldsymbol{\mu}$ is the return of the risky assets, and $(1 - \mathbf{w}'\mathbf{1})R_f$ is the return of the risk-free asset. Together they give the return of the portfolio, which, as stated, must be equal to $R_e$.

As with mean-variance utility maximization, it is possible (and preferable) to work with excess returns instead of explicitly including the risk-free asset in the asset menu. In this case $R_e$ is the desired excess return, and the problem simplifies to:

$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to: } \mathbf{w}'\boldsymbol{\mu} = R_e$$

The Lagrangian function is:

$$L(\mathbf{w}, \lambda) = \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda[R_e - \mathbf{w}'\boldsymbol{\mu}]$$

In order to get a more convenient first-order condition, it is common to multiply the first term by 0.5, which does not alter the result (because minimizing half the variance is equivalent to minimizing the variance).
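So the Lagrangian becomes:

$$L(\mathbf{w}, \lambda) = \frac{1}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda[R_e - \mathbf{w}'\boldsymbol{\mu}]$$

The first-order conditions are:

$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\Sigma}\mathbf{w} - \lambda\boldsymbol{\mu} = \mathbf{0}$$

$$\frac{\partial L}{\partial \lambda} = R_e - \mathbf{w}'\boldsymbol{\mu} = 0$$

We start by solving for $\mathbf{w}$ in the first equation:

$$\mathbf{w} = \lambda\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

Then we plug it into the second equation and we solve for $\lambda$:

$$R_e = \mathbf{w}'\boldsymbol{\mu}$$

$$R_e = (\lambda\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu})'\boldsymbol{\mu}$$

$$R_e = \lambda\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

$$\lambda = \frac{R_e}{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$$

We can now substitute this back into the first equation and we get the solution we wanted:

$$\mathbf{w} = \frac{R_e}{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

Using the subscript "mv" (for "mean-variance") to uniquely identify the formula, we have:

$$\mathbf{w}_{mv} = \frac{R_e}{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

It is also possible to specify a given level of variance and maximize the mean return:

$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} \quad \text{subject to: } \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} = \sigma^2$$

We write the Lagrangian and the first-order conditions:

$$L(\mathbf{w}, \lambda) = \mathbf{w}'\boldsymbol{\mu} + \lambda[\sigma^2 - \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}]$$

$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\mu} - 2\lambda\boldsymbol{\Sigma}\mathbf{w} = \mathbf{0}$$

$$\frac{\partial L}{\partial \lambda} = \sigma^2 - \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} = 0$$

We solve the first equation for $\mathbf{w}$:

$$2\lambda\boldsymbol{\Sigma}\mathbf{w} = \boldsymbol{\mu}$$

$$\mathbf{w} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda}$$

Now we plug it into the second equation:

$$\sigma^2 = \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$

$$\sigma^2 = \left(\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda}\right)'\boldsymbol{\Sigma}\,\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda} = \frac{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}}{2\lambda}\boldsymbol{\Sigma}\,\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda}$$

$$\sigma^2 = \frac{\boldsymbol{\mu}'\mathbf{I}}{2\lambda}\,\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda} = \frac{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{4\lambda^2}$$

$$\sigma = \frac{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}{2\lambda}$$

$$\lambda = \frac{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}{2\sigma}$$

And finally we replace this in the first equation to get the solution:

$$\mathbf{w} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\left(\dfrac{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}{2\sigma}\right)} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\dfrac{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}{\sigma}} = \frac{\sigma}{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

Mathematically, these are two equivalent optimization problems (notice the similarity between the two solutions: both are a scalar multiple of $\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$).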
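The equivalence of the two problems can be illustrated with a short sketch. The inputs and the target `R_e = 0.04` are made-up values, and the function names are hypothetical. The code computes the minimum-variance weights for a target excess return, measures the resulting standard deviation, and feeds that standard deviation into the target-variance solution.

```python
import numpy as np

def min_variance_weights(mu, Sigma, target_return):
    """w_mv: minimum variance for a target mean excess return."""
    Sinv_mu = np.linalg.solve(Sigma, mu)
    return target_return / (mu @ Sinv_mu) * Sinv_mu

def max_mean_weights(mu, Sigma, target_sigma):
    """Maximum mean excess return for a target standard deviation."""
    Sinv_mu = np.linalg.solve(Sigma, mu)
    return target_sigma / np.sqrt(mu @ Sinv_mu) * Sinv_mu

# Illustrative (made-up) inputs, in excess-return terms.
mu = np.array([0.05, 0.08, 0.06])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.0625]])
R_e = 0.04

w_mv = min_variance_weights(mu, Sigma, R_e)
sigma_mv = np.sqrt(w_mv @ Sigma @ w_mv)          # risk of that portfolio
w_back = max_mean_weights(mu, Sigma, sigma_mv)   # same portfolio, other route
print("target-return solution:  ", w_mv)
print("target-variance solution:", w_back)
```

Under these assumptions (a positive target return and a positive definite covariance matrix) the two weight vectors coincide.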
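In practice, it is more intuitive and more common to specify the desired mean and minimize the variance.

For this problem too we can have a more complicated version with the additional constraint that the weights of the risky assets sum to 1:

$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to: } \mathbf{w}'\boldsymbol{\mu} = R_e, \quad \mathbf{w}'\mathbf{1} = 1$$

As usual, we have to write the Lagrangian. As there are two constraints, this time there are two Lagrange multipliers:

$$L(\mathbf{w}, \lambda_1, \lambda_2) = \frac{1}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda_1[R_e - \mathbf{w}'\boldsymbol{\mu}] + \lambda_2[1 - \mathbf{w}'\mathbf{1}]$$

The first-order conditions are:

$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\Sigma}\mathbf{w} - \lambda_1\boldsymbol{\mu} - \lambda_2\mathbf{1} = \mathbf{0}$$

$$\frac{\partial L}{\partial \lambda_1} = R_e - \mathbf{w}'\boldsymbol{\mu} = 0$$

$$\frac{\partial L}{\partial \lambda_2} = 1 - \mathbf{w}'\mathbf{1} = 0$$

We solve the first equation for $\mathbf{w}$:

$$\boldsymbol{\Sigma}\mathbf{w} - \lambda_1\boldsymbol{\mu} - \lambda_2\mathbf{1} = \mathbf{0}$$

$$\boldsymbol{\Sigma}\mathbf{w} = \lambda_1\boldsymbol{\mu} + \lambda_2\mathbf{1}$$

$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\lambda_1\boldsymbol{\mu} + \lambda_2\mathbf{1})$$

$$\mathbf{w} = \lambda_1\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

We need a formula for $\lambda_1$ and one for $\lambda_2$ that do not contain each other. To this end, notice that if we pre-multiply each side of the equation by $\boldsymbol{\mu}'$ we get

$$\boldsymbol{\mu}'\mathbf{w} = \lambda_1\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

$$R_e = \lambda_1\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

Likewise, if we pre-multiply each side by $\mathbf{1}'$ we get

$$\mathbf{1}'\mathbf{w} = \lambda_1\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

$$1 = \lambda_1\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

Therefore, we get a system of two equations:

$$R_e = \lambda_1\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

$$1 = \lambda_1\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

Notice that $\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$, $\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ and $\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$ are scalars.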
For convenience, we name them as

$$A = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

$$B = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1} = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$

$$C = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

So the system of two equations is

$$\lambda_1 A + \lambda_2 B = R_e$$

$$\lambda_1 B + \lambda_2 C = 1$$

which in matrix form is

$$\begin{bmatrix} A & B \\ B & C \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} R_e \\ 1 \end{bmatrix}$$

We solve the system for $\lambda_1$ and $\lambda_2$:

$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} A & B \\ B & C \end{bmatrix}^{-1} \begin{bmatrix} R_e \\ 1 \end{bmatrix}$$

The inverse of a $2 \times 2$ matrix $\mathbf{M} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is given by a simple formula:

$$\mathbf{M}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Hence, our system becomes

$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \frac{1}{AC - B^2} \begin{bmatrix} C & -B \\ -B & A \end{bmatrix} \begin{bmatrix} R_e \\ 1 \end{bmatrix}$$

$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \frac{1}{AC - B^2} \begin{bmatrix} CR_e - B \\ -BR_e + A \end{bmatrix}$$

So we have obtained the formulas we were looking for, which written in plain form are

$$\lambda_1 = \frac{CR_e - B}{AC - B^2}$$

$$\lambda_2 = \frac{A - BR_e}{AC - B^2}$$

We can finally plug these terms back into the first equation we obtained for $\mathbf{w}$:

$$\mathbf{w} = \lambda_1\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

$$\mathbf{w} = \frac{CR_e - B}{AC - B^2}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \frac{A - BR_e}{AC - B^2}\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

Hence, with some minimal rearrangement, the set of optimal weights that minimize the variance given a target return and the constraint of full investment in the risky assets is:

$$\mathbf{w}_{mv^*} = \boldsymbol{\Sigma}^{-1}\left[\frac{CR_e - B}{AC - B^2}\boldsymbol{\mu} + \frac{A - BR_e}{AC - B^2}\mathbf{1}\right]$$

where $A = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $B = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ and $C = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$.

Analogously to what happens with the weights that maximize the utility, given a certain $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, changing the target return alters the relative wealth allocation between the risky assets in $\mathbf{w}_{mv^*}$, but in $\mathbf{w}_{mv}$ only the value of the sum of the weights of the risky assets changes, while the proportions stay the same.
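The formula for $\mathbf{w}_{mv^*}$ translates almost line by line into code. The sketch below uses made-up inputs and a hypothetical function name, `frontier_weights`. It assumes that $AC - B^2 \neq 0$, which holds when $\boldsymbol{\mu}$ is not proportional to the vector of ones.

```python
import numpy as np

def frontier_weights(mu, Sigma, target_return):
    """w_mv*: minimum variance given w'mu = target_return and w'1 = 1."""
    ones = np.ones(len(mu))
    Sinv_mu = np.linalg.solve(Sigma, mu)
    Sinv_1 = np.linalg.solve(Sigma, ones)
    A = mu @ Sinv_mu
    B = ones @ Sinv_mu
    C = ones @ Sinv_1
    D = A * C - B**2                      # determinant of [[A, B], [B, C]]
    lam1 = (C * target_return - B) / D
    lam2 = (A - B * target_return) / D
    return lam1 * Sinv_mu + lam2 * Sinv_1

# Illustrative (made-up) inputs.
mu = np.array([0.05, 0.08, 0.06])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.0625]])

w_star = frontier_weights(mu, Sigma, target_return=0.07)
print("weights:", w_star)
print("sum of weights:", w_star.sum())
print("portfolio mean:", w_star @ mu)
```

Calling the function over a grid of target returns and recording the mean and standard deviation of each portfolio would trace out the frontier without a risk-free asset.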
By applying the formulas that we derived to different target returns, we can obtain the efficient frontier, which again will be a line when it is possible to invest also in a risk-free asset, or a curve if all the wealth must be placed in the risky assets. As with the one that can be obtained by maximizing utility, the full frontier also includes inefficient allocations. In the picture below, obtained from 29 assets, we plot in red and blue the efficient frontier with and without a risk-free asset respectively, and in orange and green the inefficient frontier allocations. The black dots are individual stocks.

The frontier obtainable by minimizing the variance given a target return (or by maximizing the mean return given a target variance) is, of course, identical to the one obtainable by maximizing the mean-variance utility. In both cases we are facing the same trade-off between mean and variance and, given a certain $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, the set of achievable efficient portfolios is the same. The problem of maximizing utility is a bit easier to solve because the risk-aversion parameter $\gamma$ directly tells us how to weigh mean and variance in this trade-off. The value of $\gamma$ is however not very meaningful in practice, and so in a practical setting we need to solve the slightly more complex (but based on the same premises) problem of minimizing the variance given a target mean or vice versa.

As you can see, there is a point at which the efficient frontiers with and without a risk-free asset touch each other. That point corresponds to the mean and standard deviation of the tangency portfolio. This is the portfolio of risky assets with the highest possible Sharpe ratio. The Sharpe ratio of an investment can be geometrically interpreted as the slope of the line that connects the risk-free rate with that investment in the plot above.
Therefore, the highest Sharpe ratio can always be achieved with the optimal weights given by the optimization procedures with a risk-free asset that we derived. The weights $\mathbf{w}_U$ or $\mathbf{w}_{mv}$ are used for the risky assets, and $1 - \mathbf{w}_U'\mathbf{1}$ or $1 - \mathbf{w}_{mv}'\mathbf{1}$ is the weight for the risk-free asset (which can also be negative). This result (i.e., that the combination of the tangency portfolio and a risk-free asset gives the highest utility for a given risk-aversion level) is known as the two-fund separation theorem, and dates back to Tobin (1958). It played a large role in the development of the CAPM, where the tangency portfolio is identified (under a series of rather stringent assumptions) as the market portfolio.

But what if we want to, or can, invest only in the risky assets, and we still want to achieve the highest possible Sharpe ratio? In other words, how can we compute the weights for the tangency portfolio? To this end, working with excess returns, we need to solve the following optimization problem:

$$\max_{\mathbf{w}} \; \frac{\mathbf{w}'\boldsymbol{\mu}}{\sqrt{\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}}} \quad \text{subject to: } \mathbf{w}'\mathbf{1} = 1$$

This can be solved with the usual Lagrange multiplier method, but the computations are rather long and intricate, so we do not show them. The resulting closed-form solution is:

$$\mathbf{w}_{tan} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$$

However, there is another very easy way to arrive at this solution. Notice that the weights of the tangency portfolio are simply the weights of the portfolio that maximizes the mean-variance utility with any given value of $\gamma$ in the presence of the risk-free asset, normalized so that they sum to 1. This is because $\mathbf{w}_U$ (and $\mathbf{w}_{mv}$) are in fact just the weights of the tangency portfolio scaled according to the risk preferences of the investor: they sum to less than 1 or more than 1 depending on how risk-averse the investor is, but their proportions do not change.
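The relation between $\mathbf{w}_{tan}$ and $\mathbf{w}_U$ can be checked numerically. The sketch below uses made-up excess-return inputs and hypothetical function names. It normalizes $\mathbf{w}_U$ for two values of $\gamma$ and compares the result with the closed-form tangency weights, along with the Sharpe ratios of the tangency and equally weighted portfolios.

```python
import numpy as np

def tangency_weights(mu, Sigma):
    """w_tan = Sigma^{-1} mu / (1' Sigma^{-1} mu), with mu in excess returns."""
    Sinv_mu = np.linalg.solve(Sigma, mu)
    return Sinv_mu / Sinv_mu.sum()

def sharpe_ratio(w, mu, Sigma):
    """Mean excess return divided by standard deviation."""
    return (w @ mu) / np.sqrt(w @ Sigma @ w)

# Illustrative (made-up) inputs.
mu = np.array([0.05, 0.08, 0.06])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.0625]])

w_tan = tangency_weights(mu, Sigma)
for gamma in (3.0, 6.0):
    w_U = np.linalg.solve(Sigma, mu) / gamma
    print(f"gamma = {gamma}: normalized w_U =", w_U / w_U.sum())
print("tangency weights:", w_tan)

w_equal = np.full(len(mu), 1.0 / len(mu))
print("Sharpe (tangency):", sharpe_ratio(w_tan, mu, Sigma))
print("Sharpe (equal weights):", sharpe_ratio(w_equal, mu, Sigma))
```

The normalization is only meaningful when $\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} > 0$, which is the case for these illustrative inputs.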
Since the proportions within $\mathbf{w}_U$ do not depend on $\gamma$, we can simply take the formula for $\mathbf{w}_U$ and divide it by its own sum:

$$\mathbf{w}_{tan} = \frac{\dfrac{1}{\gamma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}'\left(\dfrac{1}{\gamma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\right)} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$$

We plot in purple the tangency portfolio in the graph with the frontier:

GLOBAL MINIMUM VARIANCE AND EQUALLY WEIGHTED PORTFOLIO

So far we have treated the inputs $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ as if they were given, but in practice they need to be estimated. The simplest approach is called the plug-in approach: the sample estimates of the inputs are computed from past data, and are then plugged into the optimization problem as if they were the true values. Of course, sample estimates are not the true values, and they can be poor estimates of them. This typically causes theoretically optimal mean-variance optimized portfolios to perform poorly out-of-sample. In particular, the estimation error in the sample mean is typically so big that minimizing the variance while ignoring $\boldsymbol{\mu}$ usually leads to portfolios with a higher Sharpe ratio than those computed via mean-variance optimization.

Therefore, the investor might want to compute the global minimum variance portfolio (GMV), also simply called the minimum variance portfolio. We need to impose the constraint that the sum of the weights of the risky assets is equal to 1, which means that nothing is invested in the risk-free asset. Otherwise, everything would be invested in the risk-free asset.
Hence, we have to solve the following constrained optimization problem:

$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to: } \mathbf{w}'\mathbf{1} = 1$$

We write the Lagrangian function:

$$L(\mathbf{w}, \lambda) = \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda[1 - \mathbf{w}'\mathbf{1}]$$

As done before, we multiply the first term by 0.5, so the Lagrangian becomes:

$$L(\mathbf{w}, \lambda) = \frac{1}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda[1 - \mathbf{w}'\mathbf{1}]$$

The first-order conditions are:

$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\Sigma}\mathbf{w} - \lambda\mathbf{1} = \mathbf{0}$$

$$\frac{\partial L}{\partial \lambda} = 1 - \mathbf{w}'\mathbf{1} = 0$$

Through some simple rearrangement we get:

$$\mathbf{w} = \lambda\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

$$\mathbf{w}'\mathbf{1} = 1$$

In the first equation, we can pre-multiply both sides by $\mathbf{1}'$, obtaining:

$$\mathbf{1}'\mathbf{w} = \lambda\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

From the second equation we know that:

$$\mathbf{w}'\mathbf{1} = \mathbf{1}'\mathbf{w} = 1$$

Hence, the first equation becomes:

$$1 = \lambda\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

$$\lambda = \frac{1}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}$$

So, finally, we can take this last result and replace $\lambda$ in $\mathbf{w} = \lambda\boldsymbol{\Sigma}^{-1}\mathbf{1}$, obtaining:

$$\mathbf{w} = \frac{1}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

Therefore, the closed-form solution that gives the minimum variance weights is:

$$\mathbf{w}_v = \frac{1}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}$$

As nothing is invested in the risk-free asset (since we require that the weights for the risky assets must sum to 1), it is equivalent to work with returns or excess returns. However, it might be convenient to still work with excess returns, so that the results will be easily comparable with those obtained with the mean-variance portfolio.

A further improvement can come from restricting the minimum variance portfolio to only have long positions.
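Before turning to that restriction, the closed-form GMV weights can be checked with a small sketch. The covariance matrix is a made-up example and `gmv_weights` is a hypothetical name. Only $\boldsymbol{\Sigma}$ is needed, since the mean vector plays no role here.

```python
import numpy as np

def gmv_weights(Sigma):
    """w_v = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(Sigma.shape[0])
    Sinv_1 = np.linalg.solve(Sigma, ones)
    return Sinv_1 / (ones @ Sinv_1)

# Illustrative (made-up) covariance matrix.
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.0625]])

w_v = gmv_weights(Sigma)
var_gmv = w_v @ Sigma @ w_v

# Compare with a naive equally weighted, fully invested portfolio.
w_equal = np.full(3, 1.0 / 3.0)
var_equal = w_equal @ Sigma @ w_equal

print("GMV weights:", w_v)
print("GMV variance:", var_gmv)
print("equal-weight variance:", var_equal)
```

In this sketch the GMV variance equals $1 / (\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1})$ and is no larger than the variance of the equally weighted portfolio.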
To obtain a long-only minimum variance portfolio, we add another constraint that prohibits short positions:

$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to: } \mathbf{w}'\mathbf{1} = 1, \quad \mathbf{w} \geq \mathbf{0}$$

This problem does not have a closed-form solution, but it is a quadratic programming problem (i.e., a problem with a quadratic objective function subject to linear constraints) that can easily be solved with computer programs using various algorithms.

Notice that this solution is theoretically sub-optimal: if we actually knew the true parameters, it would lead to a loss compared to the unconstrained minimum variance portfolio (which is itself already theoretically sub-optimal compared to the mean-variance portfolio). However, because it limits the impact of parameter uncertainty, disallowing short selling generally leads to a performance increase out-of-sample.

When a value for $\gamma$ is specified, another strategy that mitigates the impact of the estimation error is the 1/N rule.¹ In this case we do not ignore the mean, as we want to somehow take into account the mean-variance preferences given by the value of $\gamma$. What we do instead is use the (sample) estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ to optimally allocate the wealth between the risk-free asset and the equally weighted risky assets. Remember that the mean-variance utility optimization problem is

$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$

If all the risky assets must have the same weight, it means we are imposing that $\mathbf{w} = c\mathbf{1}$, where $c$ is a scalar that determines the weight of the assets. Therefore, what we need to find is the value of $c$. Hence, we substitute $\mathbf{w} = c\mathbf{1}$ in the original problem:

$$\max_{c} \; (c\mathbf{1})'\boldsymbol{\mu} - \frac{\gamma}{2}(c\mathbf{1})'\boldsymbol{\Sigma}(c\mathbf{1})$$

$$\max_{c} \; c\mathbf{1}'\boldsymbol{\mu} - \frac{\gamma}{2}c^2\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1}$$

So we are back to an unconstrained problem. The first-order condition is:

$$\frac{\partial U(c)}{\partial c} = \mathbf{1}'\boldsymbol{\mu} - \gamma c\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1} = 0$$

from which we easily get

$$c = \frac{1}{\gamma}\frac{\mathbf{1}'\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1}}$$

We then get the optimal weights for the risky assets by simply plugging this expression into $\mathbf{w} = c\mathbf{1}$:

$$\mathbf{w}_{1/N} = \frac{1}{\gamma}\frac{\mathbf{1}'\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1}}\mathbf{1}$$

while the weight for the riskless asset is given by $1 - \mathbf{w}_{1/N}'\mathbf{1}$. For example, if $N = 5$ and this rule returns a weight of 0.15 for each risky asset, we equally divide 75% of our wealth among the risky assets (i.e., 15% on each risky asset), and then place the remaining 25% on the risk-free asset.

¹ The name "1/N rule" is often used to refer to a naive rule where one simply places all the wealth on risky assets with all weights equal to 1/N, without estimating any input. In these notes we refer instead to the more elaborate rule described in the text.

FACTOR INVESTING WITH LONG-SHORT PORTFOLIOS

Computing optimal weights is not the only possibility. An alternative approach involves creating long-short portfolios. Suppose we want to invest in $N$ assets. First, assets are ranked according to their predicted return. We then assemble a portfolio with two legs: a long leg which contains a given number of assets with the highest predicted returns, and a short leg with a given number of assets predicted to have the lowest returns. Within each leg the assets are often equally weighted, although other weighting systems that assign different weights based on the predicted return are of course possible.

One of the advantages of this approach is that it allows the investor to consider predicted returns without amplifying the effects of the estimation error in the mean. Optimization procedures like the mean-variance one are in fact error-maximizing: errors in the inputs lead to extreme weights, which can lead to abysmal performance. This is why standard mean-variance optimization rarely works well and is generally replaced either with more advanced techniques that limit extreme allocations, or with a minimum variance portfolio.
A long-short portfolio does not have theoretically optimal weights, but it can still work better by avoiding this error-maximization trap.

The other advantage is that a long-short portfolio can be self-financing: the money obtained from shorting the assets predicted to perform poorly is used to go long on the assets with a high predicted return. As short positions tend to be riskier than long positions, using a partially self-financing portfolio is also common. In this case, less than half (e.g., 30%) of the wealth is placed on the short leg, and the long leg is financed partly from the shorting and partly from the investor's initial wealth (in our example, where 70% of the invested sum goes to the long leg, 30% of the money comes from the shorting and the other 40% from the investor's funds).

A practical disadvantage of such a portfolio is that in the real world it is generally difficult to short a large number of stocks. So in practice $N$ has to be relatively small, and therefore the portfolio can be riskier, as it is not very well diversified. A long-short portfolio can also be particularly vulnerable in turbulent market conditions, when the prices of virtually all stocks are either increasing or decreasing at the same time. The latter problem can be eased by making the portfolio construction more flexible (e.g., by varying the number of stocks and/or the amount of wealth in the long and short leg depending on the market conditions).

A pre-condition for creating a long-short portfolio is having a ranking based on how we expect the assets to perform. How can we obtain it? Using the sample means is not appropriate, as we pointed out that such estimates are too unreliable. A much better alternative is to use a factor-based approach. In fact, long-short portfolios are a typical way in which factor investing is performed. This generally involves computing expected returns using multifactor models.
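Whatever the source of the predicted returns, assembling the two legs is mechanical. The following is a minimal sketch with equally weighted legs. The function name, the parameters `long_share` and `short_share`, and the predicted returns are all hypothetical choices made for illustration.

```python
import numpy as np

def long_short_weights(predicted, n_long, n_short, long_share=1.0, short_share=1.0):
    """Equally weighted long and short legs from a vector of predicted returns.

    long_share / short_share: total wealth placed on each leg (assumed names).
    """
    predicted = np.asarray(predicted, dtype=float)
    if n_long < 1 or n_short < 1:
        raise ValueError("each leg needs at least one asset")
    if n_long + n_short > predicted.size:
        raise ValueError("legs overlap: n_long + n_short exceeds the number of assets")
    order = np.argsort(predicted, kind="stable")          # ascending ranking
    weights = np.zeros(predicted.size)
    weights[order[-n_long:]] = long_share / n_long        # highest predicted returns
    weights[order[:n_short]] = -short_share / n_short     # lowest predicted returns
    return weights

# Made-up predicted returns for six assets.
predicted = [0.02, -0.01, 0.05, 0.00, 0.03, -0.03]

# Fully self-financing: the short leg pays for the whole long leg.
w_self = long_short_weights(predicted, n_long=2, n_short=2)

# Partially self-financing, as in the 70% / 30% example in the text.
w_part = long_short_weights(predicted, n_long=2, n_short=2,
                            long_share=0.7, short_share=0.3)
print("self-financing:", w_self, "net =", w_self.sum())
print("partially self-financing:", w_part, "net =", w_part.sum())
```

In the partially self-financing case the net exposure of 0.4 is the part of the long leg funded from the investor's own wealth.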
Remember that the formula of a multifactor model with $k$ factors is:

$$R_i = \alpha_i + b_{i1}f_1 + b_{i2}f_2 + \dots + b_{ik}f_k + \varepsilon_i$$

In practice, the expected return of a stock given a certain multifactor model is computed as:

$$E[R_i] = \alpha_i + b_{i1}\gamma_1 + b_{i2}\gamma_2 + \dots + b_{ik}\gamma_k$$

where $\gamma_j$ is the risk premium of factor $j$.² Therefore, we need to estimate the loadings $b_{ik}$ and the risk premia. This is typically done using the Fama-MacBeth regression, which is a two-stage linear regression.

Consider an estimation sample with $N$ assets and $T$ periods. In the first stage, the loadings are estimated by regressing the returns of each asset $i$ on the $k$ factors, using the entire set of $T$ periods:

$$\begin{aligned}
R_{1t} &= \alpha_1 + b_{11}f_{1t} + b_{12}f_{2t} + \dots + b_{1k}f_{kt} \\
R_{2t} &= \alpha_2 + b_{21}f_{1t} + b_{22}f_{2t} + \dots + b_{2k}f_{kt} \\
&\;\;\vdots \\
R_{it} &= \alpha_i + b_{i1}f_{1t} + b_{i2}f_{2t} + \dots + b_{ik}f_{kt} \\
&\;\;\vdots \\
R_{Nt} &= \alpha_N + b_{N1}f_{1t} + b_{N2}f_{2t} + \dots + b_{Nk}f_{kt}
\end{aligned}$$

The estimated loadings are then used as explanatory variables in a second regression that, for each period $t$, regresses the returns of the entire set of $N$ assets on the loadings:

$$\begin{aligned}
R_{i1} &= \gamma_{10} + \gamma_{11}\hat{b}_{i1} + \gamma_{12}\hat{b}_{i2} + \dots + \gamma_{1k}\hat{b}_{ik} \\
R_{i2} &= \gamma_{20} + \gamma_{21}\hat{b}_{i1} + \gamma_{22}\hat{b}_{i2} + \dots + \gamma_{2k}\hat{b}_{ik} \\
&\;\;\vdots \\
R_{it} &= \gamma_{t0} + \gamma_{t1}\hat{b}_{i1} + \gamma_{t2}\hat{b}_{i2} + \dots + \gamma_{tk}\hat{b}_{ik} \\
&\;\;\vdots \\
R_{iT} &= \gamma_{T0} + \gamma_{T1}\hat{b}_{i1} + \gamma_{T2}\hat{b}_{i2} + \dots + \gamma_{Tk}\hat{b}_{ik}
\end{aligned}$$

Ideally we should use the true loadings, but their value is of course unknown in practice.
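The two stages can be sketched with ordinary least squares in NumPy. This is only an illustration on simulated data: the function name `fama_macbeth`, the sample sizes and the data-generating values are assumptions made for the example, not part of the procedure itself.

```python
import numpy as np

def fama_macbeth(returns, factors):
    """Two-stage regression.

    returns: (T, N) array of asset returns; factors: (T, k) array of factors.
    Gives alphas (N,), loadings (N, k) and period-by-period gammas (T, k+1),
    where column 0 of gammas is the second-stage intercept gamma_t0.
    """
    T, N = returns.shape
    # Stage 1: one time-series regression per asset (all assets at once).
    X1 = np.column_stack([np.ones(T), factors])              # (T, k+1)
    coef1, *_ = np.linalg.lstsq(X1, returns, rcond=None)     # (k+1, N)
    alphas, loadings = coef1[0], coef1[1:].T
    # Stage 2: one cross-sectional regression per period (all periods at once).
    X2 = np.column_stack([np.ones(N), loadings])             # (N, k+1)
    coef2, *_ = np.linalg.lstsq(X2, returns.T, rcond=None)   # (k+1, T)
    return alphas, loadings, coef2.T

# Simulated data (made-up parameters): T periods, N assets, k factors.
rng = np.random.default_rng(0)
T, N, k = 120, 25, 3
factors = rng.normal(0.005, 0.03, size=(T, k))
true_b = rng.normal(1.0, 0.5, size=(N, k))
returns = factors @ true_b.T + rng.normal(0.0, 0.01, size=(T, N))

alphas, loadings, gammas = fama_macbeth(returns, factors)
print("loadings shape:", loadings.shape)   # one row of k loadings per asset
print("gammas shape:", gammas.shape)       # one row of estimates per period
```

Stacking the regressions into two `lstsq` calls is a design choice for brevity: a loop over assets in the first stage and over periods in the second stage would give the same estimates.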
To compute the expected returns of each asset $i$ we need the loadings, estimated in the first regression, and the risk premia, estimated in the second regression. Notice however that the risk premia are time-varying. A common approach is to compute their average value over the $T$ periods (just like it is common to compute the average market excess return when using the CAPM). The expected return of asset $i$ according to the chosen multifactor model is given by (we omit the hats to keep the notation light):

$$E[R_i] = b_{i1}\gamma_1 + b_{i2}\gamma_2 + \dots + b_{ik}\gamma_k$$

For greater clarity, let us consider how this works with the Fama-French three-factor model, which is probably the most important factor model.

² In the CAPM, and in single factor models in general, we can directly use the factor value (the excess market return in the case of CAPM). In multifactor models we cannot do this, and we need to use the risk premia of the factors instead.

Recall that the model is:

$$R_i = R_f + b_{i1}(R_m - R_f) + b_{i2}SMB + b_{i3}HML$$

In practice the expected return of asset $i$ will be computed as:

$$E[R_i] = R_f + b_{i1}\gamma_{(R_m - R_f)} + b_{i2}\gamma_{SMB} + b_{i3}\gamma_{HML}$$

We use the Fama-MacBeth regression to estimate the loadings and the risk premia. Usually, the excess return is used as the dependent variable, to focus on the component of the return that depends on factor exposure.
Therefore, the first stage regression for each asset $i$ is:

$$R_{it} - R_{ft} = \alpha_i + b_{i1}(R_{mt} - R_{ft}) + b_{i2}\, SMB_t + b_{i3}\, HML_t$$

To simplify the notation, we indicate the first factor as $MKT$:

$$R_{it} - R_{ft} = \alpha_i + b_{i1}\, MKT_t + b_{i2}\, SMB_t + b_{i3}\, HML_t$$

As explained before, this regression needs to be carried out separately for each of the $N$ assets. Now that we have the estimates for the loadings, we can set up the second stage regression:

$$R_{it} - R_{ft} = \gamma_{t0} + \gamma_{t1} \hat{b}_{i1} + \gamma_{t2} \hat{b}_{i2} + \gamma_{t3} \hat{b}_{i3}$$

This regression needs to be carried out separately for each of the $T$ periods in the estimation window, obtaining $T$ values for $\gamma_{t1}$, $\gamma_{t2}$ and $\gamma_{t3}$. We then compute their average in order to have a single value. For better clarity, we rename the averages of $\gamma_{t1}$, $\gamma_{t2}$ and $\gamma_{t3}$ as $\gamma_{MKT}$, $\gamma_{SMB}$ and $\gamma_{HML}$ respectively. We also compute the average risk-free rate in order to have a single value for $R_f$.[3] We can now compute the expected return of each asset $i$ as:

$$E[R_i] = R_f + b_{i1} \gamma_{MKT} + b_{i2} \gamma_{SMB} + b_{i3} \gamma_{HML}$$

It is now straightforward to create the long-short portfolio. We simply rank the assets according to their expected return, take a long position in those in the upper part of the ranking, and take a short position in those in the lower part of the ranking. Each time the portfolio has to be updated, we need to compute new estimates of the expected returns. So, for example, if we want to update the portfolio monthly, we need to repeat the procedure every month, using the up-to-date data.

DOWNSIDE RISK MEASURES

So far we measured risk using the variance.
This is based on the assumption that returns are symmetrically distributed, or at least that the investor only cares about volatility as a whole, without distinguishing between upside and downside movements. While this is not realistic (because investors want to minimize the losses but not the gains), it greatly simplifies optimization procedures. Moreover, downside risk measures tend to be more difficult to estimate as inputs for optimization procedures, which may lead to worse performance despite targeting a more appropriate measure of risk. For these reasons, here we do not consider portfolio optimization techniques targeting downside risk. However, it is still important to be familiar with the most popular downside risk measures, as they are useful for performance evaluation, and some of them are also employed by regulatory authorities supervising the banking sector.

[3] Computing the average value of the risk premia and of the risk-free rate is a reasonable approach, and the one commonly used. However, it is not "the" right approach. Other approaches might also be appropriate depending on the specific situation.

The downside risk measure most closely related to the variance is the semivariance:[4]

$$\sigma_B^2 = \frac{1}{T} \sum_{t=1}^{T} \left[\min(R_t - B,\, 0)\right]^2$$

where $T$ is the number of periods in the estimation window, and $B$ is the benchmark below which the investor considers volatility as risk. In practice, the semivariance is computed by setting to zero every deviation $R_t - B$ that is positive, so that returns above the benchmark contribute nothing to the sum. $B$ depends on the preferences of the investor. It is convenient (and also has some nice theoretical properties) to set $B$ equal to the risk-free rate. In this way, if one works with excess returns, $B$ can be treated as equal to zero. However, in principle, it can be set to any value. The square root of $\sigma_B^2$ is called downside deviation, which we indicate with $\sigma_B$.
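As a minimal sketch of the semivariance formula above, the following Python snippet computes the semivariance and the downside deviation of a short series of excess returns. The function names and the data are made up for illustration, and the benchmark is assumed to be zero (i.e., the risk-free rate when working with excess returns).

```python
import math


def semivariance(returns, benchmark=0.0):
    """Mean of the squared shortfalls below the benchmark (divides by T)."""
    shortfalls = [min(r - benchmark, 0.0) for r in returns]
    return sum(s * s for s in shortfalls) / len(returns)


def downside_deviation(returns, benchmark=0.0):
    """Square root of the semivariance."""
    return math.sqrt(semivariance(returns, benchmark))


# Made-up excess returns: only -0.01 and -0.04 fall below the benchmark
excess_returns = [0.02, -0.01, 0.03, -0.04, 0.01]
sv = semivariance(excess_returns)
dd = downside_deviation(excess_returns)
print(sv, dd)
```

Only the two negative observations enter the sum, but the divisor is still the full number of periods $T$, as in the formula.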
The downside deviation is to the semivariance what the standard deviation is to the variance. Theoretically, one can compute an optimal mean-semivariance or minimum semivariance portfolio by simply replacing the covariance matrix with the semicovariance matrix (the analogue of the covariance matrix in a downside risk setting) in the optimization procedures. However, estimating this matrix presents several challenges, and therefore we do not address this topic, but we still provide some intuition regarding what it means to target the semivariance. We distinguish between different scenarios:

• If the distribution is symmetric and the benchmark is equal to the sample mean, targeting the variance or the semivariance is always equivalent. One should therefore target the former, as sample estimates for it are more accurate than sample estimates for the latter.
• If the distribution is symmetric but the benchmark is not equal to the sample mean, targeting the variance or the semivariance is only equivalent if we set a target return (i.e., mean-semivariance optimization). Minimizing the variance or the semivariance without a target return is not equivalent in this setting.
• If the distribution is not symmetric, targeting the variance is never equivalent to targeting the semivariance.

The following figure provides a graphical illustration.

[4] Technically, this is the downside semivariance, as it is also possible to compute an upside semivariance by replacing Min with Max in the formula. However, we are generally interested in the downside semivariance, which we therefore simply call "semivariance".

To compute the risk-adjusted return in this context, the Sharpe ratio should be replaced by the Sortino ratio, which is similar to the Sharpe ratio but replaces the risk-free rate with the benchmark $B$, and the standard deviation with the downside deviation $\sigma_B$:

$$\text{Sortino} = \frac{\bar{R} - B}{\sigma_B}$$

where $\bar{R}$ is the mean portfolio return. Another popular downside risk measure is the Value at Risk (VaR).
VaR measures the maximum potential loss that an investor can suffer over a certain period, with a $1 - \alpha$ confidence level. $\alpha$ is set by the investor; for example, $\alpha = 0.05$ corresponds to a 95% confidence level. More formally, given a profit and loss distribution $Y$ we can define VaR as:

$$VaR_\alpha(Y) = -\inf\{y \in \mathbb{R} : P(Y \le y) > \alpha\}$$

For example, if we set $\alpha = 0.05$ and, when evaluating a set of returns, we get $VaR = 0.04$, it means that in any one period over the time horizon considered we have a 5% chance of losing 4% or more.

VaR can be computed in different ways. The most commonly used is the historical method: we simply rank the historical returns in increasing order and then read off the (typically negative) return found at the $\alpha$ quantile (e.g., the 5th percentile for $\alpha = 0.05$). Another possibility is the parametric method: we assume that returns follow a certain distribution and we compute the loss at the chosen percentile. Simulation ("Monte Carlo") approaches are also possible.

The main problem with VaR is that it is not a coherent risk measure. Consider the outcomes $V_1$ and $V_2$ of two investments. A risk measure is said to be coherent if it possesses the following desirable properties:

• Monotonicity: if $V_1$ is larger than or equal to $V_2$ in every possible scenario, then the risk of $V_1$ cannot be higher than the risk of $V_2$. Formally: if $V_1 \ge V_2$, then $Risk(V_1) \le Risk(V_2)$.
• Translation invariance: for any outcome $V$, adding a certain (i.e., riskless) amount $C$ reduces the risk by that amount. Formally: $Risk(V + C) = Risk(V) - C$.
• Positive homogeneity: multiplying all outcomes by a positive constant $\lambda$ should scale the risk measure by the same constant. In other words, if we invest, say, twice the original amount, the risk measure should also double. Formally: $Risk(\lambda V) = \lambda\, Risk(V)$.
• Subadditivity: the risk of a combination of two risky positions should be lower than or equal to the sum of the risks of the individual positions. In other words, diversifying by combining different assets should reduce risk, or at worst leave it unaffected, but it cannot increase it. Formally: $Risk(V_1 + V_2) \le Risk(V_1) + Risk(V_2)$.

The Value at Risk satisfies the first three conditions, but not the last one. As it violates subadditivity, risk quantified using VaR can sometimes increase with greater diversification, which is not very meaningful. To overcome this problem, the Conditional Value at Risk (CVaR), also known as Expected Shortfall (ES), has been proposed:

$$CVaR_\alpha(Y) = -\frac{1}{\alpha} \int_0^\alpha VaR_u \, du$$

where $u$ is just the variable of integration and $du$ is the differential of this variable (i.e., we are integrating from 0 to $\alpha$ using infinitesimal increments in $u$ from 0 until we reach $\alpha$). In more intuitive terms, the CVaR measures the average (the "expected") loss that we get, given that the loss exceeds the VaR. As it is a coherent measure of risk, it is preferred to, and more commonly used than, the VaR. Of course, in order to compute the CVaR, you first need to compute the VaR.

The following figure provides a graphical intuition of VaR and CVaR:

Measured on the return axis, the CVaR always lies at or below the VaR, because it is the value that we get by computing the average loss that we have when we find ourselves in the red area to the left of the VaR.

Finally, another popular measure of downside risk is the drawdown (DD). The drawdown is the decline in the value of an investment from a peak to a low point. Different drawdown measures can be computed.
A popular and easy to compute one is the maximum drawdown (MDD):

$$MDD = \frac{\text{Trough Value} - \text{Peak Value}}{\text{Peak Value}}$$

where the "Trough Value" is the lowest point in the series that is reached after the highest peak. With this formula the MDD is a negative number, so the comparisons that follow refer to its absolute value. Obviously, a lower MDD is preferable to a higher MDD. In the worst possible case, the MDD is equal to 100%, i.e., the value of the investment drops to zero.

Source: https://financetrain.com

MDD fails to consider the frequency and duration of losses, and does not account for the size of any gains. To account for the gains, we can use a more informative measure called the Calmar ratio:

$$\text{Calmar} = \frac{\bar{R} - r_f}{MDD}$$

where $\bar{R}$ is the mean portfolio return and the MDD is taken in absolute value. This is similar to the Sharpe ratio, but the MDD is used instead of the standard deviation.

PERFORMANCE EVALUATION

After we obtain the series of portfolio returns, we need to appropriately evaluate the results in order to gauge how well our investment strategy performed. We may start with some basic statistics about the distribution of the returns:

• Mean: the higher the better
• Standard deviation: the lower the better
• Skewness: a positive value is preferable
• (Excess) Kurtosis: a lower value is preferable unless the skewness is significantly positive

We then compute the Sharpe ratio to quantify the risk-adjusted return. If we are interested in evaluating downside risk we may compute, instead of or in addition to the standard deviation and the Sharpe ratio, some or all of these measures:

• Downside deviation: the lower the better
• CVaR: the lower (in absolute value) the better
• (Maximum) drawdown: the lower the better

We then quantify the risk-adjusted return with an appropriate measure, like the Sortino ratio.

Another important financial indicator is the alpha, used to check if the returns of the investments are explained by a given asset pricing model.
The alpha is obtained as the intercept in a regression of the portfolio returns on the factors of the model considered. Usually, the CAPM (in which case the alpha is called "Jensen's alpha") or the Fama-French three-factor model is used. In the first case the regression takes the form:

$$R = \alpha + \beta (R_{Mkt} - R_f)$$

while in the second case it is:

$$R = \alpha + b_1 (R_{Mkt} - R_f) + b_2\, SMB + b_3\, HML$$

If $\alpha$ is significantly greater than zero, it means that our strategy achieves returns higher than those predicted by the model based on the portfolio exposure to the factors. In order to check if we have a significantly positive $\alpha$, we compute its standard error and then perform a hypothesis test. The usual standard errors for the linear regression are generally not appropriate, as they assume homoskedasticity (i.e., constant variance of the error terms) and no autocorrelation (i.e., no temporal dependency in the error terms). These conditions are usually not met in financial time series, where heteroskedasticity and/or autocorrelation are often observed. To account for this, we may use the Newey-West standard errors instead. The technical details of this estimator are somewhat complicated, but not relevant here, and Newey-West standard errors can be easily computed in R. Using them, we can test whether the $\alpha$ of our investment is significantly greater than zero.

A proper evaluation, however, should also account for the turnover, i.e., how much trading the strategy requires. The higher the turnover, the higher the transaction costs, which of course translates into lower net returns. To get an idea of the amount of trading required we can compute the average turnover.
The turnover at a certain period $t$ is given by:

$$TO_t = \sum_{i=1}^{N} |w_{i,t} - w_{i,t-1}|$$

Basically, for each stock we compute the absolute value of the change in the corresponding weight compared to the previous period, and we sum all these $N$ values. We do this for each of the $T$ periods in which we applied our strategy, and then we compute the mean. This gives us the average turnover.

However, applying this formula using the set of weights we selected for the previous period as the value for $w_{i,t-1}$ is not entirely correct. This is because when we update the weights at the beginning of each new period we need to account for the fact that, due to the returns realized during the period that just ended, the allocation of wealth has changed compared to what it was at the beginning of the previous period. We clarify this with an example.

Suppose we have a portfolio with two assets updated monthly, and the weights at time $t - 1$ were 0.5 and 0.5, while now at time $t$ we want to change them to 0.4 and 0.6. We might think that the turnover is $|0.4 - 0.5| + |0.6 - 0.5| = 0.2$, which means we have to trade 20% of our wealth to update the portfolio. If the prices of the two assets did not change over the month that just ended, this would indeed be correct. Suppose however that during that month the first asset experienced a +10% return, and the second one a -20% return. When we update the portfolio at time $t$, we no longer have the two original weights, but $0.5 + 0.5 \times 0.1 = 0.55$ for the first asset and $0.5 - 0.5 \times 0.2 = 0.4$ for the second. The weights computed like this do not sum up to 1 because the total value of the portfolio changed compared to period $t - 1$. We need to account for this by dividing both weights by their sum. So in this example, where they sum to $0.55 + 0.4 = 0.95$, we have $0.55 / 0.95 \approx 0.58$ for the first weight and $0.4 / 0.95 \approx 0.42$ for the second.
Therefore, the actual turnover is $|0.4 - 0.58| + |0.6 - 0.42| = 0.36$. We might express this concept by rewriting the formula for the turnover as

$$TO_t = \sum_{i=1}^{N} |w_{i,t} - w_{i,t-1}^{+}|$$

where the "+" in $w_{i,t-1}^{+}$ indicates that we are considering the weights from the previous period after accounting for the redistributing effect of the realized returns. Such weights are given by:

$$w_{i,t-1}^{+} = \left(w_{i,t-1} + w_{i,t-1} \times R_{i,t-1}\right) \times \frac{\sum_{j=1}^{N} w_{j,t-1}}{\sum_{j=1}^{N} \left(w_{j,t-1} + w_{j,t-1} \times R_{j,t-1}\right)}$$

The $\sum_{j=1}^{N} w_{j,t-1}$ term in the numerator, which was not used in the example above, is necessary to rescale correctly whenever the weights do not sum up to 1. When they do sum up to 1, this term is equal to 1 and therefore leaves the rest of the formula unchanged. After computing the turnover in this way for each period, we can then compute the average turnover as before.

While the turnover certainly provides some useful information, we can get an even better figure by considering the portfolio returns net of transaction costs. Transaction costs can be fixed or proportional to the amount of trading. It is generally considered more appropriate to use proportional transaction costs. These can be accounted for by multiplying the turnover of each asset by the proportional cost. Frazzini (2012) suggests using transaction costs equal to 10 basis points (bp). A basis point is equal to 0.01%, so 10 bp correspond to 0.1%, i.e., 0.001. Therefore, for example, if we need to buy or sell 5% of the position we have in a certain asset, we face transaction costs equal to $0.05 \times 0.001 = 0.00005$, which means that 0.005% of the money invested in that position is lost in transaction costs. Once we have the portfolio returns net of transaction costs, we can use them to compute all the other statistics we listed before.
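The turnover calculation with drifted weights, and the resulting proportional cost, can be sketched as follows. The function names are hypothetical, the numbers are those of the two-asset example above, and the 10 bp cost is the value mentioned in the text. The sketch assumes that the sum of the drifted weights is not zero, which may fail for some long-short portfolios.

```python
def drifted_weights(prev_weights, realized_returns):
    """Previous weights after the realized returns, rescaled to their original sum."""
    grown = [w + w * r for w, r in zip(prev_weights, realized_returns)]
    scale = sum(prev_weights) / sum(grown)
    return [g * scale for g in grown]


def turnover(new_weights, prev_weights, realized_returns):
    """Sum of absolute differences between new weights and drifted old weights."""
    drifted = drifted_weights(prev_weights, realized_returns)
    return sum(abs(w_new - w_old) for w_new, w_old in zip(new_weights, drifted))


COST = 0.001  # proportional cost of 10 basis points

# Two-asset example: old weights 0.5/0.5, returns +10% and -20%, new weights 0.4/0.6
to = turnover([0.4, 0.6], [0.5, 0.5], [0.10, -0.20])
cost = to * COST   # fraction of wealth lost to transaction costs in this update
print(round(to, 4), cost)
```

Without intermediate rounding, the turnover is about 0.358 rather than the 0.36 obtained above from the rounded weights 0.58 and 0.42.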
It is useful to visualize the value $V$ of the portfolio over time, which can be computed as

$$V_T = V_0 + \sum_{t=1}^{T} V_{t-1} R_t$$

It is appropriate to compute the value both ignoring and net of transaction costs. The result is then plotted in a graph, which provides a visual representation of the effectiveness of our strategy.

We might also want to compute the evolution of real wealth in addition to the nominal wealth. In other words, we might want to account for inflation. We can do this by dividing the value of the portfolio over time by the deflator. We can compute the deflator $D$ using a formula analogous to the one used to compute the value of the portfolio, simply replacing the return with the inflation rate $I$:

$$D_T = D_0 + \sum_{t=1}^{T} D_{t-1} I_t$$

Of course, the two series need to have the same starting value (e.g., 1 unit of wealth), and the same frequency (e.g., monthly).

Finally, whatever metrics we use to evaluate the results, we need to compare them with appropriate benchmarks in order to know if the results are satisfactory. A benchmark portfolio that is surprisingly difficult to beat is the one created with a naïve 1/N rule.[5] This portfolio simply assigns equal weights to all the assets in all periods, and therefore does not require estimating any input or performing any optimization procedure. This is indeed one of its main strengths: it is completely immune to estimation errors. Another advantage is that it has a very low turnover, which translates into very low transaction costs.

Another possible benchmark is the market, or more precisely the returns of a large stock market index, like the S&P 500. While it is not possible to buy an index, it is possible to buy ETFs. An ETF (Exchange-Traded Fund) is a fund that is traded on the financial markets and which tries to replicate a certain index.
By investing in such a fund, the investor can invest in a certain index without having to trade all the stocks contained in that index. Stock markets sometimes go through periods of poor performance that can last years, but over the long run they provide good returns (in countries with a solid economy). Therefore, such a passive investing solution, which also does not require any estimation or optimization procedure, is another reasonable benchmark (over long enough periods of time).

[5] We refer to this rule/portfolio as "naive" or "naïve 1/N", so as not to confuse it with the 1/N rule we described before. However, it is also commonly called simply the "1/N" rule/portfolio in the literature.
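When a strategy is compared with a benchmark, the same calculations are applied to both return series. As a rough sketch, the snippet below builds the nominal wealth path and the deflator with the recursion given earlier in this section, derives real wealth, and computes the maximum drawdown. All names and numbers are made up for illustration. The drawdown helper assumes the common running-peak convention (the largest decline from any previous peak), which may differ from a definition based only on the single highest peak, and it reports the result as a negative number.

```python
def compound(rates, start=1.0):
    """Path X_0, ..., X_T with X_t = X_{t-1} + X_{t-1} * rate_t."""
    path = [start]
    for rate in rates:
        path.append(path[-1] + path[-1] * rate)
    return path


def max_drawdown(values):
    """Largest relative decline from a running peak (a negative number or zero)."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, (v - peak) / peak)
    return worst


# Made-up monthly data: strategy returns, benchmark returns, inflation rates
strategy_returns = [0.05, -0.10, 0.02, 0.04]
benchmark_returns = [0.01, -0.02, 0.01, 0.02]
inflation = [0.01, 0.01, 0.00, 0.01]

nominal = compound(strategy_returns)        # nominal wealth, starting from 1
benchmark = compound(benchmark_returns)     # benchmark wealth, same start
deflator = compound(inflation)              # deflator, same start and frequency
real = [v / d for v, d in zip(nominal, deflator)]

print(nominal[-1], benchmark[-1], real[-1])
print(max_drawdown(nominal), max_drawdown(benchmark))
```

Starting all series from the same value keeps the strategy, the benchmark and the deflator directly comparable on a single plot.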