PORTFOLIO THEORY – LECTURE NOTES
Dr. Andrea Rigamonti
MEAN-VARIANCE OPTIMIZATION
From an economist’s point of view, an investor who optimizes a portfolio is trying to maximize a
utility function. An extremely simple utility function is the linear utility function:
$$U(V) = a + bV, \qquad b > 0$$
where 𝑈(𝑉) is the utility that the investor gets depending on the value 𝑉 of the portfolio. This
function simply says that the higher the wealth, the higher the utility. Its shape is a line, and the
parameter 𝑏 determines how much the utility increases following a wealth increase. 𝑏 is assumed
to be positive; otherwise the investor would be indifferent (𝑏 = 0) or less satisfied (𝑏 < 0) when
wealth increases. Since the expected utility $E[U(V)] = a + bE[V]$ depends only on expected wealth, only the mean return of the portfolio matters: risk plays no role.
Markowitz (1952) revolutionized the field by adding risk to the equation. His standard approach
assumes that an investor cares not only about the mean but also the variance of the portfolio
returns, i.e. the investor has a mean-variance utility. Given a certain mean return, the utility
increases as the variance (which quantifies risk) gets lower. Equivalently, given a certain level of
variance, the utility increases with a higher mean return. The trade-off between
return and risk is governed by a risk aversion parameter 𝛾. A higher 𝛾 means that the investor is
more risk averse and will therefore require a higher compensation for an increased risk. A lower 𝛾
means that the investor is less risk averse and willing to take more risk. In other words,
given a set of assets with a certain mean and variance, the lower the investor's 𝛾, the higher both
the mean return and the variance of the optimal portfolio.
To model such preferences, a quadratic utility function is used:
$$U(V) = V - \frac{\gamma}{2}V^2, \qquad \gamma > 0$$
𝛾 is assumed to be positive so that the utility function is concave:
[Figure: concave quadratic utility function. Source: https://financestu.com]
This implies that the investor is risk-averse. With 𝛾 = 0 the investor would be indifferent to risk,
while 𝛾 < 0 would mean that the investor is a risk seeker (i.e., prefers more risk for the same expected
wealth, which is obviously not realistic).
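Remember that the expected return of a portfolio is given by $\mu_P = \mathbf{w}'\boldsymbol{\mu} = V$, where $\mathbf{w}$ is the vector
of portfolio weights and $\boldsymbol{\mu}$ is the vector of mean returns of the individual assets. Moreover, recall that
the variance is the expected value of the squared deviation from the mean, which for a portfolio equals $\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$, with $\boldsymbol{\Sigma}$ the covariance matrix of the asset returns.
So, the utility function becomes: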
$$U(\mathbf{w}) = \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
Therefore, given a risk-free asset and a set of 𝑁 risky assets with mean returns 𝝁 and covariance
matrix 𝚺, and a certain risk aversion coefficient 𝛾, the investor we are considering wants to select
the weights 𝒘 in a way that maximizes the following utility function:
$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
This is an unconstrained optimization problem that is easy to solve. We just need to set the first-order
condition, i.e., take the partial derivative with respect to 𝒘 and set it equal to zero:
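$$\frac{\partial U(\mathbf{w})}{\partial \mathbf{w}} = \boldsymbol{\mu} - \frac{2\gamma}{2}\boldsymbol{\Sigma}\mathbf{w} = \boldsymbol{\mu} - \gamma\boldsymbol{\Sigma}\mathbf{w} = \mathbf{0}$$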
We then solve for 𝒘:
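$$\boldsymbol{\Sigma}\mathbf{w} = \frac{1}{\gamma}\boldsymbol{\mu}$$
$$\mathbf{w} = \frac{1}{\gamma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$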
So, the closed-form solution that gives the optimal weights for the risky assets is:
$$\mathbf{w}_U = \frac{1}{\gamma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$
where “U” stands for “Utility”, while the weight for the risk-free asset is equal to $1 - \mathbf{w}_U'\mathbf{1}$, where
$\mathbf{1}$ is a vector of ones with length equal to the number of risky assets.
The resulting optimal expected utility is:
$$U(\mathbf{w}_U) = \frac{1}{2\gamma}\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$
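A minimal NumPy sketch of the closed-form solution, assuming hypothetical excess-return inputs for three risky assets (the numbers in `mu` and `Sigma` are made up for illustration, and the helper names are not from these notes). It also checks numerically that the utility at $\mathbf{w}_U$ equals $\frac{1}{2\gamma}\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$.

```python
import numpy as np

def utility_weights(mu, Sigma, gamma):
    """Risky-asset weights w_U = (1/gamma) * Sigma^{-1} mu (inputs are excess returns)."""
    # Solving Sigma x = mu is numerically preferable to forming Sigma^{-1} explicitly
    return np.linalg.solve(Sigma, mu) / gamma

def mv_utility(w, mu, Sigma, gamma):
    """Mean-variance utility U(w) = w'mu - (gamma/2) w'Sigma w."""
    return w @ mu - 0.5 * gamma * (w @ Sigma @ w)

# Hypothetical inputs: expected excess returns and covariance matrix of 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])
gamma = 3.0

w_U = utility_weights(mu, Sigma, gamma)
w_rf = 1.0 - w_U.sum()                      # weight of the risk-free asset
U_opt = mv_utility(w_U, mu, Sigma, gamma)
U_formula = mu @ np.linalg.solve(Sigma, mu) / (2.0 * gamma)
print("risky weights:", w_U.round(4), "risk-free weight:", round(w_rf, 4))
print("utility matches closed form:", np.isclose(U_opt, U_formula))
```

Using `np.linalg.solve` instead of an explicit inverse is a common numerical choice; the result is the same $\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ up to rounding.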
Notice that we do not need to explicitly include the risk-free asset in the asset menu, as it is
equivalent and simpler to work with excess returns, i.e., the returns of the risky assets minus
the risk-free rate.
A more difficult version is the one where we impose a full investment in the risky assets. In other
words, the sum of the weights of the risky assets must be equal to 1, and nothing is invested in the
risk-free asset. Hence, we have to solve the following constrained optimization problem:
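$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
subject to:
$$\mathbf{w}'\mathbf{1} = 1$$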
To solve this problem, we use the method of Lagrange multipliers.
First, we need to define the Lagrangian function, i.e., a modified version of the objective function
that incorporates the constraint in this way:
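$$L(\mathbf{w}, \lambda) = \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda\left[1 - \mathbf{w}'\mathbf{1}\right]$$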
where 𝜆 is the Lagrange multiplier.
By including this additional term we can now solve an unconstrained problem instead of a
constrained one. Therefore, we set the first order conditions for the Lagrangian function. The
conditions involve two simultaneous equations, as we have to compute the partial derivative both
with respect to 𝒘 and to 𝜆.
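$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\mu} - \frac{2\gamma}{2}\boldsymbol{\Sigma}\mathbf{w} - \lambda\mathbf{1} = \boldsymbol{\mu} - \gamma\boldsymbol{\Sigma}\mathbf{w} - \lambda\mathbf{1} = \mathbf{0}$$
$$\frac{\partial L}{\partial \lambda} = 1 - \mathbf{w}'\mathbf{1} = 0$$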
We start by solving the first equation for 𝒘:
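$$\gamma\boldsymbol{\Sigma}\mathbf{w} = \boldsymbol{\mu} - \lambda\mathbf{1}$$
$$\mathbf{w} = (\gamma\boldsymbol{\Sigma})^{-1}(\boldsymbol{\mu} - \lambda\mathbf{1})$$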
Now we can plug this into the second equation:
$$1 - \left[(\gamma\boldsymbol{\Sigma})^{-1}(\boldsymbol{\mu} - \lambda\mathbf{1})\right]'\mathbf{1} = 0$$
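Remember that $(\mathbf{A}\mathbf{B})' = \mathbf{B}'\mathbf{A}'$, and therefore we have:
$$(\boldsymbol{\mu} - \lambda\mathbf{1})'\left((\gamma\boldsymbol{\Sigma})^{-1}\right)'\mathbf{1} = 1$$
$$(\boldsymbol{\mu} - \lambda\mathbf{1})'\left(\frac{\boldsymbol{\Sigma}^{-1}}{\gamma}\right)'\mathbf{1} = 1$$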
The transpose of the sum of matrices (or vectors) is the sum of the transposes of those matrices:
$(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$. Moreover, a scalar is unaffected by the transpose: $(c\mathbf{A})' = c\mathbf{A}'$. Also notice
that $\boldsymbol{\Sigma}^{-1}$ is a symmetric matrix, and therefore its transpose is still $\boldsymbol{\Sigma}^{-1}$. Therefore:
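$$(\boldsymbol{\mu}' - \lambda\mathbf{1}')\frac{\boldsymbol{\Sigma}^{-1}}{\gamma}\mathbf{1} = 1$$
$$(\boldsymbol{\mu}' - \lambda\mathbf{1}')\boldsymbol{\Sigma}^{-1}\mathbf{1} = \gamma$$
$$\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1} - \lambda\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1} = \gamma$$
$$\lambda = \frac{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1} - \gamma}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}$$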
We can now plug this into the first equation, finally obtaining the solution we were looking for:
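$$\mathbf{w} = (\gamma\boldsymbol{\Sigma})^{-1}(\boldsymbol{\mu} - \lambda\mathbf{1})$$
$$\mathbf{w} = (\gamma\boldsymbol{\Sigma})^{-1}\left(\boldsymbol{\mu} - \frac{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1} - \gamma}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}\mathbf{1}\right)$$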
With some minimal rearrangement, we have therefore the following set of optimal weights that
maximize the mean-variance utility given the constraint of full investment in the risky assets:
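$$\mathbf{w}_U^* = \frac{\boldsymbol{\Sigma}^{-1}}{\gamma}\left(\boldsymbol{\mu} + \frac{\gamma - \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}\mathbf{1}\right)$$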
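A minimal sketch of the full-investment solution $\mathbf{w}_U^*$ under the same kind of hypothetical inputs (made-up `mu` and `Sigma`; the function name is illustrative). It verifies that the weights sum to 1 and that the first-order condition $\boldsymbol{\mu} - \gamma\boldsymbol{\Sigma}\mathbf{w} - \lambda\mathbf{1} = \mathbf{0}$ holds with the multiplier $\lambda$ derived above.

```python
import numpy as np

def utility_weights_full_investment(mu, Sigma, gamma):
    """w_U* = (Sigma^{-1}/gamma) * (mu + (gamma - mu'Sigma^{-1}1) / (1'Sigma^{-1}1) * 1)."""
    ones = np.ones(len(mu))
    Si_mu = np.linalg.solve(Sigma, mu)       # Sigma^{-1} mu
    Si_1 = np.linalg.solve(Sigma, ones)      # Sigma^{-1} 1
    shift = (gamma - mu @ Si_1) / (ones @ Si_1)
    # Sigma^{-1}(mu + shift * 1) / gamma, expressed through the two solves above
    return (Si_mu + shift * Si_1) / gamma

# Hypothetical inputs (excess returns), 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])
gamma = 3.0

w_star = utility_weights_full_investment(mu, Sigma, gamma)
Si_1 = np.linalg.solve(Sigma, np.ones(3))
lam = (mu @ Si_1 - gamma) / (np.ones(3) @ Si_1)   # Lagrange multiplier from the derivation
print(w_star.round(4), w_star.sum())
```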
Given a vector of means $\boldsymbol{\mu}$ and a covariance matrix $\boldsymbol{\Sigma}$, there will be different sets of optimal weights
$\mathbf{w}_U$ and $\mathbf{w}_U^*$ for each different value of the risk-aversion parameter 𝛾. However, while in $\mathbf{w}_U^*$ the
relative wealth allocated to each risky asset changes with each different value of 𝛾, in $\mathbf{w}_U$ all the
weights scale up or down by the same proportion. In other words, when there is a risk-free asset,
the only thing changing with different values of 𝛾 is the amount of wealth allocated to the risky
assets, while the proportions within $\mathbf{w}_U$ do not change.
For example, imagine we have three risky assets and a risk-free asset, and the optimal weights with
𝛾 = 3 are $\mathbf{w}_U' = [0.2 \;\; 0.4 \;\; 0.3]$. This means that 90% of the wealth is invested in the risky assets,
and 10% in the risk-free asset. If the risk aversion parameter increases to 𝛾 = 6, the new risky
weights will be $\mathbf{w}_U' = [0.1 \;\; 0.2 \;\; 0.15]$. Now 45% of the wealth is invested in the risky assets and
55% in the risk-free asset. However, the weights for the risky assets maintained the same proportions:
they were all cut in half.
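The proportionality property can be checked numerically. A minimal sketch, assuming hypothetical inputs: doubling 𝛾 halves every weight in $\mathbf{w}_U$, while the element-wise ratio of $\mathbf{w}_U^*$ between the two values of 𝛾 differs across assets.

```python
import numpy as np

# Hypothetical inputs (excess returns), 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])
ones = np.ones(len(mu))
Si_mu = np.linalg.solve(Sigma, mu)     # Sigma^{-1} mu
Si_1 = np.linalg.solve(Sigma, ones)    # Sigma^{-1} 1

def w_U(gamma):
    """With a risk-free asset: w_U = Sigma^{-1} mu / gamma."""
    return Si_mu / gamma

def w_U_star(gamma):
    """Full investment in the risky assets."""
    return (Si_mu + (gamma - mu @ Si_1) / (ones @ Si_1) * Si_1) / gamma

ratio_rf = w_U(6.0) / w_U(3.0)              # the same ratio (0.5) for every asset
ratio_full = w_U_star(6.0) / w_U_star(3.0)  # the ratio changes from asset to asset
print(ratio_rf.round(4), ratio_full.round(4))
```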
With the formulas that we computed, we can obtain the efficient frontier, i.e., the set of portfolios
that have the most efficient mean-variance combination (in other words, the highest possible utility)
for each level of 𝛾. However, the efficient frontier that we can draw in this way is incomplete, as the
minimum variance portfolio is only reached with an infinite value for the risk aversion parameter.
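A sketch under hypothetical inputs that traces points of the full-investment frontier by varying 𝛾 in $\mathbf{w}_U^*$. As 𝛾 grows, the standard deviation keeps decreasing toward that of the minimum variance portfolio $\boldsymbol{\Sigma}^{-1}\mathbf{1}/(\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1})$ (derived later in these notes), which is only reached in the limit.

```python
import numpy as np

# Hypothetical inputs (excess returns), 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])
ones = np.ones(len(mu))
Si_mu = np.linalg.solve(Sigma, mu)     # Sigma^{-1} mu
Si_1 = np.linalg.solve(Sigma, ones)    # Sigma^{-1} 1

def w_U_star(gamma):
    """Full-investment mean-variance weights for risk aversion gamma."""
    return (Si_mu + (gamma - mu @ Si_1) / (ones @ Si_1) * Si_1) / gamma

gammas = [1.0, 3.0, 10.0, 100.0, 1000.0]
points = []                            # (mean, standard deviation) of each frontier portfolio
for g in gammas:
    w = w_U_star(g)
    mean, sd = w @ mu, np.sqrt(w @ Sigma @ w)
    points.append((mean, sd))
    print(f"gamma={g:7.0f}  mean={mean:.4f}  sd={sd:.4f}")

# Minimum variance portfolio: approached only as gamma -> infinity
w_gmv = Si_1 / (ones @ Si_1)
```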
The following picture provides an illustration of the efficient frontier obtained from 29 assets with
(red line) and without (blue curve) the possibility to also invest in a risk-free asset. The black dots
are the individual stocks that have been combined to obtain the portfolio allocations that lie on the
efficient frontier(s).
The higher 𝛾, the more the investor chooses a portfolio located toward the left side of the plot, i.e.,
with lower mean and lower standard deviation.
We could also extend this plot by plotting the frontier allocations obtained with a negative 𝛾. In this
case the red line and the blue curve would be mirrored about a horizontal line, and would therefore have a negative
slope. Such allocations would be sought by an investor who is risk-seeking, i.e., who prefers
more risk for a given mean return. As this is unreasonable and not “efficient” in any sensible way,
those allocations are generally not considered and are not part of the efficient frontier.
Providing a specific value for 𝛾 might be difficult in practice. Moreover, it can be difficult to interpret
the meaning of the specific utility value associated with a certain portfolio. While we can say that a
portfolio with a certain utility is preferable to another with a lower utility, it is not obvious how good
it is in a more general sense. In short, the value itself, in the case of utility, is not very informative.
A more intuitive approach is to specify the preferences through a desired mean portfolio return 𝑅𝑒
instead of a level of risk aversion. In this case, the goal becomes to minimize the variance given the
desired mean return. This is a constrained optimization problem:
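$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
subject to:
$$\mathbf{w}'\boldsymbol{\mu} + (1 - \mathbf{w}'\mathbf{1})R_f = R_e$$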
In the constraint, $\mathbf{w}'\boldsymbol{\mu}$ is the return of the risky assets, and $(1 - \mathbf{w}'\mathbf{1})R_f$ is the return of the risk-free
asset. Together they give the return of the portfolio, which, as stated, must be equal to $R_e$.
As with mean-variance utility maximization, it is possible (and preferable) to work with excess
returns instead of explicitly including the risk-free asset in the asset menu. In this case 𝑅𝑒 is the
desired excess return, and the problem simplifies to:
$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
subject to:
$$\mathbf{w}'\boldsymbol{\mu} = R_e$$
The Lagrangian function is:
$$L(\mathbf{w}, \lambda) = \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda\left[R_e - \mathbf{w}'\boldsymbol{\mu}\right]$$
In order to get a more convenient first order condition, it is common to multiply the first term by
0.5, which does not alter the result (because minimizing half the variance is equivalent to minimizing
the variance). So the Lagrangian becomes:
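$$L(\mathbf{w}, \lambda) = \frac{1}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda\left[R_e - \mathbf{w}'\boldsymbol{\mu}\right]$$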
The first order conditions are:
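$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\Sigma}\mathbf{w} - \lambda\boldsymbol{\mu} = \mathbf{0}$$
$$\frac{\partial L}{\partial \lambda} = R_e - \mathbf{w}'\boldsymbol{\mu} = 0$$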
We start by solving for 𝒘 in the first equation:
$$\mathbf{w} = \lambda\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$
Then we plug it into the second equation and we solve for 𝜆:
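$$R_e = \mathbf{w}'\boldsymbol{\mu}$$
$$R_e = (\lambda\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu})'\boldsymbol{\mu}$$
$$R_e = \lambda\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$
$$\lambda = \frac{R_e}{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$$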
We can now substitute this back in the first equation and we get the solution we wanted:
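$$\mathbf{w} = \frac{R_e}{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$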
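Using the subscript “mv” (for “mean-variance”) to uniquely identify the formula, we have:
$$\mathbf{w}_{mv} = \frac{R_e}{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$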
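A minimal sketch of $\mathbf{w}_{mv}$ with hypothetical inputs and a hypothetical target excess return of 4%. It checks that the target is met, that the variance equals $R_e^2/(\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu})$, and that the result coincides with $\mathbf{w}_U$ when $\gamma = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}/R_e$, which illustrates that both problems describe the same trade-off.

```python
import numpy as np

def mv_weights(mu, Sigma, target):
    """w_mv = R_e / (mu'Sigma^{-1}mu) * Sigma^{-1} mu (excess returns, risk-free asset available)."""
    Si_mu = np.linalg.solve(Sigma, mu)
    return target / (mu @ Si_mu) * Si_mu

# Hypothetical inputs (excess returns), 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])
R_e = 0.04                                 # hypothetical target excess return

w_mv = mv_weights(mu, Sigma, R_e)
A = mu @ np.linalg.solve(Sigma, mu)        # mu' Sigma^{-1} mu
print(w_mv.round(4), w_mv @ mu, w_mv @ Sigma @ w_mv)
```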
Obviously, it is also possible to specify a given level of variance and maximize the mean return:
$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu}$$
subject to:
$$\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} = \sigma^2$$
We write the Lagrangian and the first order conditions:
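$$L(\mathbf{w}, \lambda) = \mathbf{w}'\boldsymbol{\mu} + \lambda\left[\sigma^2 - \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}\right]$$
$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\mu} - 2\lambda\boldsymbol{\Sigma}\mathbf{w} = \mathbf{0}$$
$$\frac{\partial L}{\partial \lambda} = \sigma^2 - \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} = 0$$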
We solve the first equation for 𝒘:
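$$2\lambda\boldsymbol{\Sigma}\mathbf{w} = \boldsymbol{\mu}$$
$$\mathbf{w} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda}$$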
Now we plug into the second equation:
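$$\sigma^2 = \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
$$\sigma^2 = \left(\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda}\right)'\boldsymbol{\Sigma}\,\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda} = \frac{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}}{2\lambda}\boldsymbol{\Sigma}\,\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda}$$
$$\sigma^2 = \frac{\boldsymbol{\mu}'\mathbf{I}}{2\lambda}\,\frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda} = \frac{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{4\lambda^2}$$
Taking the positive square root (a negative 𝜆 would give the portfolio with the lowest, rather than the highest, mean for that variance):
$$\sigma = \frac{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}{2\lambda}$$
$$\lambda = \frac{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}{2\sigma}$$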
And finally we replace this in the first equation to get the solution:
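$$\mathbf{w} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\lambda} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{2\left(\frac{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}{2\sigma}\right)} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}\,\sigma = \frac{\sigma}{\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$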
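A minimal sketch of the max-return solution with hypothetical inputs and a hypothetical 10% target standard deviation. It confirms that the variance constraint holds, that the resulting portfolio is the same as $\mathbf{w}_{mv}$ with target $R_e = \sigma\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$, and that random portfolios rescaled to the same standard deviation never reach a higher mean.

```python
import numpy as np

def max_return_weights(mu, Sigma, sigma):
    """w = sigma / sqrt(mu'Sigma^{-1}mu) * Sigma^{-1} mu: highest mean for a given standard deviation."""
    Si_mu = np.linalg.solve(Sigma, mu)
    return sigma / np.sqrt(mu @ Si_mu) * Si_mu

# Hypothetical inputs (excess returns), 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])
sigma_target = 0.10                          # hypothetical target standard deviation

w = max_return_weights(mu, Sigma, sigma_target)
A = mu @ np.linalg.solve(Sigma, mu)          # mu' Sigma^{-1} mu
print("sd:", np.sqrt(w @ Sigma @ w), "mean:", w @ mu)

# The same portfolio obtained from w_mv with target R_e = sigma * sqrt(A)
R_e = sigma_target * np.sqrt(A)
w_mv = R_e / A * np.linalg.solve(Sigma, mu)
```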
Mathematically, these are two equivalent optimization problems (notice the obvious similarities
between the two solutions). However, it is more intuitive and more common to specify the desired
mean and minimize the variance.
This problem also has a more complicated version, with the additional constraint that the
weights of the risky assets sum up to 1:
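$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
subject to:
$$\mathbf{w}'\boldsymbol{\mu} = R_e$$
$$\mathbf{w}'\mathbf{1} = 1$$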
As usual, we have to write the Lagrangian. As there are two constraints, this time there are two
Lagrange multipliers:
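$$L(\mathbf{w}, \lambda_1, \lambda_2) = \frac{1}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda_1\left[R_e - \mathbf{w}'\boldsymbol{\mu}\right] + \lambda_2\left[1 - \mathbf{w}'\mathbf{1}\right]$$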
The first order conditions are:
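$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\Sigma}\mathbf{w} - \lambda_1\boldsymbol{\mu} - \lambda_2\mathbf{1} = \mathbf{0}$$
$$\frac{\partial L}{\partial \lambda_1} = R_e - \mathbf{w}'\boldsymbol{\mu} = 0$$
$$\frac{\partial L}{\partial \lambda_2} = 1 - \mathbf{w}'\mathbf{1} = 0$$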
We solve the first equation for 𝒘:
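$$\boldsymbol{\Sigma}\mathbf{w} - \lambda_1\boldsymbol{\mu} - \lambda_2\mathbf{1} = \mathbf{0}$$
$$\boldsymbol{\Sigma}\mathbf{w} = \lambda_1\boldsymbol{\mu} + \lambda_2\mathbf{1}$$
$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\lambda_1\boldsymbol{\mu} + \lambda_2\mathbf{1})$$
$$\mathbf{w} = \lambda_1\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\Sigma}^{-1}\mathbf{1}$$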
We need to get a formula for 𝜆1 and one for 𝜆2 that do not contain each other among their terms.
To this end, notice that if we pre-multiply each side of the equation by 𝝁′ we get
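$$\boldsymbol{\mu}'\mathbf{w} = \lambda_1\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$
$$R_e = \lambda_1\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$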
Likewise, if we pre-multiply each side by 𝟏′ we get
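$$\mathbf{1}'\mathbf{w} = \lambda_1\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$
$$1 = \lambda_1\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$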
Therefore, we get a system of two equations:
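$$R_e = \lambda_1\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$
$$1 = \lambda_1\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$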
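Notice that $\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$, $\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ and $\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$ are scalars, and that $\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1} = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ because $\boldsymbol{\Sigma}^{-1}$ is symmetric. For convenience, we name them as
$$A = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}, \qquad B = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1} = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}, \qquad C = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$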
So the system of two equations is
$$\lambda_1 A + \lambda_2 B = R_e$$
$$\lambda_1 B + \lambda_2 C = 1$$
which in matrix form is
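$$\begin{bmatrix} A & B \\ B & C \end{bmatrix}\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} R_e \\ 1 \end{bmatrix}$$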
We solve the system for 𝜆1 and 𝜆2:
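$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} A & B \\ B & C \end{bmatrix}^{-1}\begin{bmatrix} R_e \\ 1 \end{bmatrix}$$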
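The inverse of a 2 × 2 matrix $\mathbf{M} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is given by a simple formula:
$$\mathbf{M}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$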
Hence, our system becomes
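$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \frac{1}{AC - B^2}\begin{bmatrix} C & -B \\ -B & A \end{bmatrix}\begin{bmatrix} R_e \\ 1 \end{bmatrix}$$
$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \frac{1}{AC - B^2}\begin{bmatrix} C R_e - B \\ -B R_e + A \end{bmatrix}$$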
So we have obtained the formulas we were looking for, which written in plain form are
$$\lambda_1 = \frac{C R_e - B}{AC - B^2} \qquad \lambda_2 = \frac{A - B R_e}{AC - B^2}$$
We can finally plug these terms back in the first equation we obtained for 𝒘:
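$$\mathbf{w} = \lambda_1\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \lambda_2\boldsymbol{\Sigma}^{-1}\mathbf{1}$$
$$\mathbf{w} = \frac{C R_e - B}{AC - B^2}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \frac{A - B R_e}{AC - B^2}\boldsymbol{\Sigma}^{-1}\mathbf{1}$$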
Hence, with some minimal rearrangement, the set of optimal weights that minimize the variance
given a target return and the constraint of full investment in the risky assets is:
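$$\mathbf{w}_{mv}^* = \boldsymbol{\Sigma}^{-1}\left[\frac{C R_e - B}{AC - B^2}\boldsymbol{\mu} + \frac{A - B R_e}{AC - B^2}\mathbf{1}\right]$$
where $A = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $B = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ and $C = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$.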
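A minimal sketch of $\mathbf{w}_{mv}^*$ built from the scalars $A$, $B$, $C$ and the multipliers $\lambda_1$, $\lambda_2$, assuming hypothetical inputs and a hypothetical target mean of 6.5%. Besides checking both constraints, it moves the weights along a direction that keeps both constraints satisfied (the cross product of $\boldsymbol{\mu}$ and $\mathbf{1}$, which only works for three assets) and confirms that the variance goes up.

```python
import numpy as np

def mv_weights_full_investment(mu, Sigma, target):
    """Minimum variance weights for target mean R_e with weights summing to 1."""
    ones = np.ones(len(mu))
    Si_mu = np.linalg.solve(Sigma, mu)     # Sigma^{-1} mu
    Si_1 = np.linalg.solve(Sigma, ones)    # Sigma^{-1} 1
    A, B, C = mu @ Si_mu, ones @ Si_mu, ones @ Si_1
    D = A * C - B ** 2                     # determinant of [[A, B], [B, C]]
    lam1 = (C * target - B) / D
    lam2 = (A - B * target) / D
    return lam1 * Si_mu + lam2 * Si_1

# Hypothetical inputs, 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])
R_e = 0.065                                # hypothetical target mean

w = mv_weights_full_investment(mu, Sigma, R_e)
print(w.round(4), w @ mu, w.sum())
```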
Analogously to what happens with the weights that maximize the utility, given a certain $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$,
changing the target return alters the relative wealth allocation between the risky assets in $\mathbf{w}_{mv}^*$,
but in $\mathbf{w}_{mv}$ only the value of the sum of the weights of the risky assets changes, while the
proportions stay the same.
By applying the formulas that we derived with different target returns, we can obtain the efficient
frontier, which again will be a line when it is possible to invest also in a risk-free asset, or a curve if
all the wealth must be placed in the risky assets. As with the frontier obtained by maximizing
utility, the full frontier also includes inefficient allocations. In the picture below, obtained from 29
assets, we plot in red and blue the efficient frontier with and without a risk-free asset respectively,
and in orange and green the inefficient frontier allocations. The black dots are individual stocks.
The frontier obtainable by minimizing the variance given a target return (or by maximizing the mean
return given a target variance) is, of course, identical to the one obtainable by maximizing the mean-variance
utility. In both cases we are facing the same trade-off between mean and variance and,
given a certain 𝝁 and Ʃ, the set of achievable efficient portfolios is the same. The problem of
maximizing utility is a bit easier to solve because the risk-aversion parameter 𝛾 directly tells us how
to weigh mean and variance in this trade-off. The value 𝛾 is however not very meaningful in practice,
and so in a practical setting we need to solve the slightly more complex (but based on the same
premises) problem of minimizing the variance given a target mean or vice versa.
As you can see, there is a point at which the efficient frontiers with and without a risk-free asset
touch each other. That corresponds to the mean and standard deviation of the tangency portfolio.
This portfolio is the one portfolio of risky assets which has the highest possible Sharpe ratio. The
Sharpe ratio of an investment can be geometrically interpreted as the slope of the line that connects
the risk-free rate with an investment in the plot above. Therefore, the highest Sharpe ratio can
always be achieved with the optimal weights given by the optimization procedures with a risk-free
asset that we derived. The weights $\mathbf{w}_U$ or $\mathbf{w}_{mv}$ are used for the risky assets, and $1 - \mathbf{w}_U'\mathbf{1}$ or $1 - \mathbf{w}_{mv}'\mathbf{1}$
is the weight for the risk-free asset (which can also be negative). This result (i.e., that the
combination of the tangency portfolio and a risk-free asset gives the highest utility for a given risk-aversion
level) is known as the two-fund separation theorem, and dates back to Tobin (1958). It played
a large role in the development of the CAPM, where the tangency portfolio is identified (under a series
of rather stringent assumptions) as the market portfolio.
But what if we want or can invest only in the risky assets, and we still want to achieve the highest
possible Sharpe ratio? In other words, how can we compute the weights for the tangency portfolio?
To this end, working with excess returns, we need to solve the following optimization problem:
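$$\max_{\mathbf{w}} \; \frac{\mathbf{w}'\boldsymbol{\mu}}{\sqrt{\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}}}$$
subject to:
$$\mathbf{w}'\mathbf{1} = 1$$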
This can be solved with the usual Lagrange multiplier method, but the computations are rather long
and intricate, so we do not show them. The resulting closed-form solution is:
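$$\mathbf{w}_{tan} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$$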
However, there is another very easy way to arrive at this solution. Notice that the weights of the
tangency portfolio are simply the weights of the portfolio that maximizes the mean-variance utility
with any given value of 𝛾 in the presence of the risk-free asset, normalized so that they sum to 1.
This is because $\mathbf{w}_U$ (and $\mathbf{w}_{mv}$) are in fact just the weights of the tangency portfolio scaled according
to the risk preferences of the investor: they sum to less than 1 or more than 1 depending on how
risk-averse the investor is, but their proportions do not change. Therefore, we can simply take the
formula for $\mathbf{w}_U$ and divide it by its own sum:
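$$\mathbf{w}_{tan} = \frac{\frac{1}{\gamma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}'\left(\frac{1}{\gamma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\right)} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$$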
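A minimal sketch of the tangency weights with hypothetical excess-return inputs. It checks that $\mathbf{w}_{tan}$ equals $\mathbf{w}_U$ (here with an arbitrary $\gamma = 5$) divided by its sum, that its Sharpe ratio equals $\sqrt{\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}$, and that randomly drawn long-only portfolios do not beat it. The normalization assumes $\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} > 0$; if that sum were negative, the same formula would return the portfolio with the lowest Sharpe ratio.

```python
import numpy as np

def tangency_weights(mu, Sigma):
    """w_tan = Sigma^{-1} mu / (1'Sigma^{-1} mu), assuming 1'Sigma^{-1} mu > 0."""
    Si_mu = np.linalg.solve(Sigma, mu)
    return Si_mu / Si_mu.sum()

def sharpe(w, mu, Sigma):
    """Sharpe ratio of a portfolio when mu contains excess returns."""
    return (w @ mu) / np.sqrt(w @ Sigma @ w)

# Hypothetical inputs (excess returns), 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])

w_tan = tangency_weights(mu, Sigma)
w_U = np.linalg.solve(Sigma, mu) / 5.0      # w_U for an arbitrary gamma = 5
print(w_tan.round(4), sharpe(w_tan, mu, Sigma))
```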
We plot in purple the tangency portfolio in the graph with the frontier:
GLOBAL MINIMUM VARIANCE AND EQUALLY WEIGHTED PORTFOLIO
So far we treated the inputs $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ as if they were given, but in practice they need to be estimated.
The simplest approach is called the plug-in approach: the sample estimates of the inputs are computed
from past data, and are then plugged into the optimization problem as if they were the true values.
Obviously, this is not really the case, as sample estimates can be poor estimates of the true
parameter values. This typically causes theoretically optimal mean-variance optimized portfolios to
perform poorly out-of-sample.
In particular, the estimation error in the sample mean is typically so big that minimizing the variance
while ignoring 𝝁 usually leads to portfolios with a higher Sharpe ratio than those computed via
mean-variance optimization. Therefore, the investor might want to compute the global minimum
variance portfolio (GMV), also simply called minimum variance portfolio. Obviously, we need to
impose the constraint that the sum of the weights of the risky assets is equal to 1, which means that
nothing is invested in the risk-free asset. Otherwise, everything would be invested in the risk-free
asset. Hence, we have to solve the following constrained optimization problem:
$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
subject to:
$$\mathbf{w}'\mathbf{1} = 1$$
We write the Lagrangian function:
$$L(\mathbf{w}, \lambda) = \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda\left[1 - \mathbf{w}'\mathbf{1}\right]$$
As done before, we multiply the first term by 0.5, so the Lagrangian becomes:
$$L(\mathbf{w}, \lambda) = \frac{1}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} + \lambda\left[1 - \mathbf{w}'\mathbf{1}\right]$$
The first order conditions are:
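$$\frac{\partial L}{\partial \mathbf{w}} = \boldsymbol{\Sigma}\mathbf{w} - \lambda\mathbf{1} = \mathbf{0}$$
$$\frac{\partial L}{\partial \lambda} = 1 - \mathbf{w}'\mathbf{1} = 0$$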
Through some simple rearrangement we get:
$$\mathbf{w} = \lambda\boldsymbol{\Sigma}^{-1}\mathbf{1}$$
$$\mathbf{w}'\mathbf{1} = 1$$
In the first equation, we can multiply both sides by 𝟏′, obtaining:
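$$\mathbf{1}'\mathbf{w} = \lambda\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$
From the second equation we know that:
$$\mathbf{w}'\mathbf{1} = \mathbf{1}'\mathbf{w} = 1$$
Hence, the first equation becomes:
$$1 = \lambda\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$
$$\lambda = \frac{1}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}$$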
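So, finally, we can take this last result and replace 𝜆 in $\mathbf{w} = \lambda\boldsymbol{\Sigma}^{-1}\mathbf{1}$, obtaining:
$$\mathbf{w} = \frac{1}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}$$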
Therefore, the closed-form solution that gives the minimum variance weights is:
$$\mathbf{w}_v = \frac{1}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}$$
As nothing is invested in the risk-free asset (since we require that the weights for the risky assets
sum up to 1), it is equivalent to work with returns or excess returns, as long as the risk-free rate is constant over the estimation window. However, it might be
convenient to still work with excess returns, so that the results will be easily comparable with those
obtained for the mean-variance portfolio.
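A minimal sketch of the closed-form GMV weights with a hypothetical covariance matrix (no mean vector is needed). It checks that the weights sum to 1, that the portfolio variance equals $1/(\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1})$, and that shifting weight between two assets, which keeps the sum at 1, increases the variance.

```python
import numpy as np

def min_variance_weights(Sigma):
    """Global minimum variance weights: Sigma^{-1} 1 / (1'Sigma^{-1} 1)."""
    ones = np.ones(Sigma.shape[0])
    Si_1 = np.linalg.solve(Sigma, ones)
    return Si_1 / (ones @ Si_1)

# Hypothetical covariance matrix of 3 risky assets
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])

w_v = min_variance_weights(Sigma)
C = np.ones(3) @ np.linalg.solve(Sigma, np.ones(3))   # 1' Sigma^{-1} 1
print(w_v.round(4), w_v @ Sigma @ w_v)
```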
A further improvement can come from restricting the minimum variance portfolio to only have long
positions. In other words, we add another constraint that prohibits short selling positions, to get a
long-only minimum variance portfolio:
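$$\min_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
subject to:
$$\mathbf{w}'\mathbf{1} = 1$$
$$\mathbf{w} \geq \mathbf{0}$$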
This problem does not have a closed form solution, but it is a quadratic programming problem (i.e.,
a problem with a quadratic objective function subject to linear constraints) that can easily be solved
with computer programs using various algorithms.
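The notes leave the choice of algorithm open. As one possible approach (an illustrative choice, not the method used in these notes), the sketch below uses projected gradient descent: a gradient step on $\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$ followed by a Euclidean projection onto the set $\{\mathbf{w} \geq \mathbf{0}, \mathbf{w}'\mathbf{1} = 1\}$. The covariance matrix is hypothetical and chosen so that the unconstrained GMV portfolio shorts the second asset. In practice a dedicated quadratic programming solver would typically be used.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]                        # sort in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]    # last index satisfying the condition
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def long_only_min_variance(Sigma, n_iter=5000):
    """Long-only minimum variance weights via projected gradient descent."""
    n = Sigma.shape[0]
    w = np.full(n, 1.0 / n)                     # start from equal weights
    # The gradient of w'Sigma w is 2 Sigma w; its Lipschitz constant is 2 * largest eigenvalue
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Sigma).max())
    for _ in range(n_iter):
        w = project_to_simplex(w - step * 2.0 * Sigma @ w)
    return w

# Hypothetical covariance: assets 1 and 2 highly correlated (rho = 0.9), asset 3 uncorrelated
Sigma = np.array([[0.040, 0.054, 0.000],
                  [0.054, 0.090, 0.000],
                  [0.000, 0.000, 0.160]])
w_gmv = np.linalg.solve(Sigma, np.ones(3))
w_gmv /= w_gmv.sum()                            # unconstrained GMV: shorts asset 2
w_lo = long_only_min_variance(Sigma)
print("unconstrained:", w_gmv.round(3))
print("long-only:    ", w_lo.round(3))
```

With these inputs the long-only solution drops the second asset entirely and splits the wealth between the other two, at the cost of a higher in-sample variance than the unconstrained GMV portfolio.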
Notice that this solution is theoretically sub-optimal: if we actually knew the true parameters, it
would lead to a loss compared to the unconstrained minimum variance portfolio (which is itself
already theoretically sub-optimal compared to the mean-variance portfolio). However, because it limits
the impact of parameter uncertainty, disallowing short selling generally leads to a performance
increase out-of-sample.
When a value for 𝛾 is specified, another strategy that mitigates the impact of the estimation error
is the 1/N rule.¹ In this case we do not ignore the mean, as we want to somehow take into account
the mean-variance preferences given by the value of 𝛾. What we do instead is use the (sample)
estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ to optimally allocate the wealth between the risk-free asset and the equally
weighted risky assets. Remember that the mean-variance utility optimization problem is
$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} - \frac{\gamma}{2}\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$$
If all the risky assets must have the same weight, it means we are imposing that 𝒘 = 𝑐𝟏, where 𝑐 is
a scalar that determines the weight of the assets. Therefore, what we need to find is the value of 𝑐.
Hence, we substitute 𝒘 = 𝑐𝟏 in the original problem:
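$$\max_{c} \; (c\mathbf{1})'\boldsymbol{\mu} - \frac{\gamma}{2}(c\mathbf{1})'\boldsymbol{\Sigma}(c\mathbf{1})$$
$$\max_{c} \; c\,\mathbf{1}'\boldsymbol{\mu} - \frac{\gamma}{2}c^2\,\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1}$$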
¹ The name “1/N rule” is often used to refer to a naive rule where one simply places all the wealth on risky assets with all
weights equal to 1/N, without estimating any input. In these notes we refer instead to the more elaborate rule described in
the text.
So we are back to an unconstrained problem. The first order condition is:
$$\frac{\partial U(c)}{\partial c} = \mathbf{1}'\boldsymbol{\mu} - \gamma c\,\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1} = 0$$
from which we easily get
$$c = \frac{1}{\gamma}\frac{\mathbf{1}'\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1}}$$
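We then get the optimal weights for the risky assets by simply plugging this expression into $\mathbf{w} = c\mathbf{1}$:
$$\mathbf{w}_{1/N} = \frac{1}{\gamma}\frac{\mathbf{1}'\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1}}\mathbf{1}$$
while the weight for the riskless asset is given by $1 - \mathbf{w}_{1/N}'\mathbf{1}$.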
For example, if 𝑁 = 5 and this rule returns a weight of 0.15 for each risky asset, we equally divide
75% of our wealth among the risky assets (i.e., 15% on each risky asset), and then place the
remaining 25% on the risk-free asset.
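A minimal sketch of this 1/N rule with hypothetical inputs and 𝛾 = 3 (the function names are illustrative). It computes the common weight $c$, the resulting risk-free weight, and checks that moving $c$ slightly in either direction lowers the utility.

```python
import numpy as np

def one_over_n_weights(mu, Sigma, gamma):
    """Equal risky weights w = c * 1 with c = (1/gamma) * 1'mu / (1'Sigma 1)."""
    ones = np.ones(len(mu))
    c = (ones @ mu) / (gamma * (ones @ Sigma @ ones))
    return c * ones

def utility_of_c(c, mu, Sigma, gamma):
    """Mean-variance utility when every risky asset has weight c."""
    ones = np.ones(len(mu))
    return c * (ones @ mu) - 0.5 * gamma * c ** 2 * (ones @ Sigma @ ones)

# Hypothetical inputs (excess returns), 3 risky assets
mu = np.array([0.05, 0.07, 0.06])
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.0625]])
gamma = 3.0

w_1n = one_over_n_weights(mu, Sigma, gamma)
w_rf = 1.0 - w_1n.sum()                     # weight of the risk-free asset
print(w_1n.round(4), round(w_rf, 4))
```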
FACTOR INVESTING WITH LONG-SHORT PORTFOLIOS
Computing optimal weights is not the only possibility. An alternative approach involves creating
long-short portfolios. Suppose we want to invest in N assets. First, assets are ranked according to
their predicted return. We then assemble a portfolio with two legs: a long leg which contains a given
number of assets with the highest predicted returns, and a short leg with a given number of assets
predicted to have the lowest returns. Within a certain leg the assets are often equally weighted,
although other weighting systems that assign different weights based on the predicted return are
of course possible.
One of the advantages of this approach is that it allows the investor to consider predicted returns
without amplifying the effects of the estimation error in the mean. Optimization procedures like the
mean-variance one are in fact error-maximizing: errors in the inputs lead to extreme weights, which
can lead to abysmal performance. This is why standard mean-variance optimization rarely works
well and is generally replaced either with more advanced techniques that limit extreme allocations,
or with a minimum variance portfolio. A long-short portfolio does not have theoretically optimal
weights, but it can still work better by avoiding this error-maximization trap.
The other advantage is that a long-short portfolio can be self-financing: the money obtained from
shorting the assets predicted to perform poorly is used to go long on the assets with predicted high
return. As short positions tend to be more risky than long positions, using a partially self-financing
portfolio is also common. In this case, less than half (e.g., 30%) of the wealth is placed on the short
leg, and the long leg is financed partly from the shorting and partly from the investor’s initial wealth
(in our example where 70% of the invested sum goes to the long leg, 30% of the money comes from
the shorting and the other 40% from the investor’s funds).
A practical disadvantage of such a portfolio is that in the real world it is generally difficult to short a
large number of stocks. So in practice N has to be relatively small, and therefore it can be more risky,
as it is not very well diversified. A long-short portfolio can also be particularly vulnerable in turbulent
market conditions, when the prices of virtually all stocks are either increasing or decreasing at the
same time. The latter problem can be eased by making the portfolio construction more flexible (e.g.,
by varying the number of stocks and/or the amount of wealth in the long and short leg depending
on the market conditions).
Obviously, a pre-condition for creating a long-short portfolio is having a ranking based on how we
expect the assets to perform. How can we obtain it? Using the sample means is not appropriate, as
we pointed out that such estimates are too unreliable. A much better alternative is to use a factor-based
approach. In fact, long-short portfolios are a typical way factor investing is performed. This
generally involves computing expected returns using multifactor models.
Remember that the formula of a multifactor model with 𝑘 factors is:
$$R_i = \alpha_i + b_{i1}f_1 + b_{i2}f_2 + \dots + b_{ik}f_k + \varepsilon_i$$
In practice, the expected return of a stock given a certain multifactor model is computed as:
$$E[R_i] = \alpha_i + b_{i1}\gamma_1 + b_{i2}\gamma_2 + \dots + b_{ik}\gamma_k$$
where $\gamma_j$ is the risk premium of factor $j$.²
Therefore, we need to estimate the loadings 𝑏𝑖𝑘 and the risk premia. This is typically done using the
Fama-MacBeth regression. It is a two-stage linear regression. Consider an estimation sample with
𝑁 assets and 𝑇 periods.
In the first stage, the loadings are estimated by regressing the returns of each asset 𝑖 on the 𝑘
factors, using the entire set of 𝑇 periods:
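$$\begin{aligned}
R_{1t} &= \alpha_1 + b_{11}f_{1t} + b_{12}f_{2t} + \dots + b_{1k}f_{kt} \\
R_{2t} &= \alpha_2 + b_{21}f_{1t} + b_{22}f_{2t} + \dots + b_{2k}f_{kt} \\
&\;\;\vdots \\
R_{it} &= \alpha_i + b_{i1}f_{1t} + b_{i2}f_{2t} + \dots + b_{ik}f_{kt} \\
&\;\;\vdots \\
R_{Nt} &= \alpha_N + b_{N1}f_{1t} + b_{N2}f_{2t} + \dots + b_{Nk}f_{kt}
\end{aligned}$$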
The estimated loadings are then used as explanatory variables in a second regression that, for each
period 𝑡, regresses the asset returns of the entire set of 𝑁 assets:
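$$\begin{aligned}
R_{i1} &= \gamma_{10} + \gamma_{11}\hat{b}_{i1} + \gamma_{12}\hat{b}_{i2} + \dots + \gamma_{1k}\hat{b}_{ik} \\
R_{i2} &= \gamma_{20} + \gamma_{21}\hat{b}_{i1} + \gamma_{22}\hat{b}_{i2} + \dots + \gamma_{2k}\hat{b}_{ik} \\
&\;\;\vdots \\
R_{it} &= \gamma_{t0} + \gamma_{t1}\hat{b}_{i1} + \gamma_{t2}\hat{b}_{i2} + \dots + \gamma_{tk}\hat{b}_{ik} \\
&\;\;\vdots \\
R_{iT} &= \gamma_{T0} + \gamma_{T1}\hat{b}_{i1} + \gamma_{T2}\hat{b}_{i2} + \dots + \gamma_{Tk}\hat{b}_{ik}
\end{aligned}$$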
Ideally we should use the true loadings, but their value is of course unknown in practice.
To compute the expected returns of each asset 𝑖 we need the loadings, estimated in the first
regression, and the risk premia, estimated in the second regression. Notice however that the risk
premia are time-varying. A common approach is to compute their average value over the 𝑇 periods
(just like it is common to compute the average market excess return when using the CAPM). The
expected return of asset 𝑖 according to the chosen multifactor model is given by (we omit the ^ to
keep the notation light):
$$E[R_i] = b_{i1}\gamma_1 + b_{i2}\gamma_2 + \dots + b_{ik}\gamma_k$$
For greater clarity, let us consider how this works with the Fama-French three-factor model, which
is probably the most important factor model.
² In the CAPM, and in single factor models in general, we can directly use the factor value (the excess market return in
the case of CAPM). In multifactor models we cannot do this, and we need to use the risk premia of the factors instead.
Recall that the model is:
$$R_i = R_f + b_{i1}(R_m - R_f) + b_{i2}SMB + b_{i3}HML$$
In practice the expected return of asset 𝑖 will be computed as:
$$E[R_i] = R_f + b_{i1}\gamma_{(R_m - R_f)} + b_{i2}\gamma_{SMB} + b_{i3}\gamma_{HML}$$
We use the Fama-MacBeth regression to estimate the loadings and the risk premia. Usually, the
excess return is used as dependent variable, to focus on the component of the return that is
dependent on factor exposure. Therefore, the first stage regression for each asset 𝑖 is:
$$R_{it} - R_{ft} = \alpha_i + b_{i1}(R_{mt} - R_{ft}) + b_{i2}SMB_t + b_{i3}HML_t$$
To simplify the notation, we indicate the first factor as 𝑀𝐾𝑇:
$$R_{it} - R_{ft} = \alpha_i + b_{i1}MKT_t + b_{i2}SMB_t + b_{i3}HML_t$$
As explained before, this regression needs to be carried out separately for each of the 𝑁 assets.
Now that we have the estimates for the loadings, we can set up the second stage regression:
$$R_{it} - R_{ft} = \gamma_{t0} + \gamma_{t1}\hat{b}_{i1} + \gamma_{t2}\hat{b}_{i2} + \gamma_{t3}\hat{b}_{i3}$$
This regression needs to be carried out separately for each of the 𝑇 periods in the estimation
window, obtaining 𝑇 values for $\gamma_{t1}$, $\gamma_{t2}$ and $\gamma_{t3}$. We then compute their average in order to have a
single value. We rename the averages of $\gamma_{t1}$, $\gamma_{t2}$ and $\gamma_{t3}$ as $\gamma_{MKT}$, $\gamma_{SMB}$ and $\gamma_{HML}$ respectively, for
better clarity. We also compute the average risk-free rate in order to have a single value for $R_f$.³
We can now compute the expected return of each asset 𝑖 as:
$$E[R_i] = R_f + b_{i1}\gamma_{MKT} + b_{i2}\gamma_{SMB} + b_{i3}\gamma_{HML}$$
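A minimal NumPy sketch of the two-pass Fama-MacBeth procedure described above, run on simulated data (the factor means, loadings, noise level and constant risk-free rate are all hypothetical, and the function name is illustrative). The first pass runs one time-series regression per asset; the second pass runs one cross-sectional regression per period on the estimated loadings and averages the resulting $\gamma_t$. Following the formula above, the second-pass intercept is estimated but not used in the expected returns.

```python
import numpy as np

def fama_macbeth(excess_ret, factors):
    """Two-pass Fama-MacBeth estimation.
    excess_ret: T x N asset excess returns; factors: T x k factor values.
    Returns the N x k loadings and the k average risk premia (intercept dropped)."""
    T, N = excess_ret.shape
    # First pass: one time-series regression per asset (lstsq handles all N columns at once)
    X1 = np.column_stack([np.ones(T), factors])                 # T x (k+1)
    coef1, *_ = np.linalg.lstsq(X1, excess_ret, rcond=None)     # (k+1) x N
    betas = coef1[1:].T                                         # N x k, alphas discarded
    # Second pass: one cross-sectional regression per period on the estimated loadings
    X2 = np.column_stack([np.ones(N), betas])                   # N x (k+1)
    coef2, *_ = np.linalg.lstsq(X2, excess_ret.T, rcond=None)   # (k+1) x T, column t = period t
    premia = coef2[1:].mean(axis=1)                             # average gamma_t over the T periods
    return betas, premia

# Simulated (hypothetical) monthly data: 25 assets, 240 months, factors MKT, SMB, HML
rng = np.random.default_rng(42)
T, N = 240, 25
factors = rng.normal([0.006, 0.002, 0.003], [0.045, 0.030, 0.030], size=(T, 3))
true_betas = rng.normal([1.0, 0.5, 0.3], [0.3, 0.4, 0.4], size=(N, 3))
rf = np.full(T, 0.002)                                          # constant risk-free rate
excess = factors @ true_betas.T + rng.normal(0.0, 0.02, size=(T, N))

betas, premia = fama_macbeth(excess, factors)
# E[R_i] = R_f + b_i1 * gamma_MKT + b_i2 * gamma_SMB + b_i3 * gamma_HML
expected_returns = rf.mean() + betas @ premia
print(expected_returns.round(4))
```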
It is now straightforward to create the long-short portfolio. We simply rank the assets according to
their expected return, and take a long position on those positioned in the upper part of the ranking,
and a short position on those in the lower part of the ranking.
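A minimal sketch of the ranking step with equally weighted legs, assuming hypothetical expected returns for six assets. The parameters `long_total` and `short_total` (illustrative names, not from these notes) set the fraction of wealth in each leg; the defaults of 1.0 and 1.0 give a dollar-neutral, fully self-financing portfolio, and other values can describe partially self-financing variants.

```python
import numpy as np

def long_short_weights(expected_returns, n_long, n_short, long_total=1.0, short_total=1.0):
    """Equally weighted long-short portfolio built from a ranking of expected returns."""
    er = np.asarray(expected_returns)
    assert n_long >= 1 and n_short >= 1 and n_long + n_short <= len(er), "legs must not overlap"
    order = np.argsort(er)                             # ascending: worst predicted first
    w = np.zeros(len(er))
    w[order[len(er) - n_long:]] = long_total / n_long  # top n_long assets go long
    w[order[:n_short]] = -short_total / n_short        # bottom n_short assets go short
    return w

# Hypothetical expected returns (e.g., from a factor model) for 6 assets
er = np.array([0.010, 0.004, 0.015, -0.002, 0.007, 0.001])
w = long_short_weights(er, n_long=2, n_short=2)
print(w)
```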
Each time the portfolio has to be updated, we need to compute new estimates of the expected
returns. So, for example, if we want to update the portfolio monthly, we need to repeat the
procedure every month, using up-to-date data.
DOWNSIDE RISK MEASURES
So far we measured risk using the variance. This is based on the assumption that returns are
symmetrically distributed, or at least that the investor only cares about volatility as a whole, without
distinguishing between upside and downside movements. While this is not realistic (because
investors want to minimize the losses but not the gains), it greatly simplifies optimization
procedures. Moreover, downside risk measures tend to be more difficult to estimate as inputs for
optimization procedures, which may lead to worse performance despite targeting a more
appropriate measure of risk. For these reasons, here we do not consider portfolio optimization
techniques targeting downside risk. However, it is still important to be familiar with the most
popular downside risk measures, as they are useful for performance evaluation, and some of them
are also employed by regulatory authorities supervising the banking sector.
³ Computing the average value of the risk premia and of the risk-free rate is a reasonable approach, and the one
commonly used. However, it is not “the” right approach. Other approaches might also be appropriate depending on the
specific situation.
The downside risk measure most closely related to the variance is the semivariance:⁴
$$\sigma_B^2 = \frac{1}{T}\sum_{t=1}^{T}\left[\text{Min}(R_t - B, 0)\right]^2$$
where 𝑇 is the number of periods in the estimation window, and 𝐵 is the benchmark below which
the investor considers volatility as risk. In practice, semivariance is computed by replacing with 0 all the
deviations $R_t - B$ of the portfolio returns that lie above the benchmark.
𝐵 depends on the preferences of the investor. It is convenient (and also has some nice theoretical
properties) to set 𝐵 equal to the risk-free rate. In this way if one works with excess returns, 𝐵 can
be treated as equal to zero. However, in principle, it can be set to any value.
The square root of $\sigma_B^2$ is called downside deviation, which we indicate with $\sigma_B$. The downside
deviation is to the semivariance what the standard deviation is to the variance.
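A minimal sketch of the semivariance and downside deviation formulas, assuming a short hypothetical series of excess returns and 𝐵 = 0 (i.e., the risk-free rate, as suggested above). The helper names are illustrative. As a sanity check, for a sample that is symmetric around its mean, with the benchmark set to that mean, the semivariance is exactly half of the (population) variance.

```python
import numpy as np

def semivariance(returns, benchmark=0.0):
    """Downside semivariance: (1/T) * sum of min(R_t - B, 0)^2."""
    shortfall = np.minimum(np.asarray(returns) - benchmark, 0.0)   # deviations above B become 0
    return np.mean(shortfall ** 2)

def downside_deviation(returns, benchmark=0.0):
    """Square root of the semivariance."""
    return np.sqrt(semivariance(returns, benchmark))

# Hypothetical excess returns, benchmark B = 0
r = np.array([0.02, -0.01, 0.03, -0.04, 0.01])
print(semivariance(r), downside_deviation(r))
```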
Theoretically, one can compute an optimal mean-semivariance or minimum semivariance portfolio
by simply replacing the covariance matrix with the semicovariance matrix (the analogue of the
covariance matrix in a downside risk setting) in the optimization procedures. However, estimating
this matrix presents several challenges, and therefore we do not address this topic, but we still
provide some intuition regarding what it means to target the semivariance. We distinguish between
different scenarios:
• If the distribution is symmetric and the benchmark is equal to the sample mean, targeting
the variance or the semivariance is always equivalent. One should therefore target the
former, as sample estimates for it are more accurate than sample estimates for the latter.
• If the distribution is symmetric but the benchmark is not equal to the sample mean, targeting
the variance or the semivariance is only equivalent if we set a target return (i.e., mean-semivariance
optimization). Minimizing the variance or the semivariance without a target
return is not equivalent in this setting.
• If the distribution is not symmetric, targeting the variance is never equivalent to targeting
the semivariance.
The following figure provides a graphical illustration.
4
Technically, this is the downside semivariance, as it is also possible to compute an upside semivariance by replacing
Min with Max in the formula. However, we are generally interested in the downside semivariance, which we therefore
simply call “semivariance”.
To compute the risk-adjusted return in this context, the Sharpe ratio should be replaced by the
Sortino ratio, which is similar to the Sharpe ratio but replaces the risk-free rate with the benchmark
𝐵, and the standard deviation with the downside deviation 𝜎𝐵:
$$\text{Sortino} = \frac{R - B}{\sigma_B}$$
where 𝑅 denotes the mean portfolio return over the evaluation period.
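A minimal sketch of the Sortino ratio, assuming a list of per-period returns and a constant benchmark 𝐵 (for example the risk-free rate, or 0 with excess returns). The downside deviation is recomputed inside the function so that the block runs on its own; the function name and data are illustrative:

```python
import math
from statistics import mean

def sortino_ratio(returns, benchmark=0.0):
    """Sortino ratio: (mean return - B) / downside deviation around B."""
    # Downside deviation: only shortfalls below B count, averaged over all T periods
    sigma_b = math.sqrt(sum(min(r - benchmark, 0.0) ** 2 for r in returns) / len(returns))
    if sigma_b == 0.0:
        return float("nan")  # undefined: no return fell below the benchmark
    return (mean(returns) - benchmark) / sigma_b

returns = [0.02, -0.01, 0.03, -0.04, 0.01]  # hypothetical excess returns
print(sortino_ratio(returns))  # 0.002 / 0.0184, about 0.108
```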
Another popular downside risk measure is the Value at Risk (VaR). VaR measures the maximum
potential loss that an investor can suffer over a certain period, with a 1 − 𝛼 confidence level. 𝛼 is
set by the investor; for example, an 𝛼 = 0.05 corresponds to a 95% confidence level.
More formally, given a profit and loss distribution 𝑌, we can define VaR as:
$$\mathrm{VaR}_\alpha(Y) = -\inf\{y \in \mathbb{R} : P(Y \le y) > \alpha\}$$
For example, if we set 𝛼 = 0.05 and when evaluating a set of returns we get a 𝑉𝑎𝑅 = 0.04, it means
that we have a 5% chance of losing 4% or more in one period over the time horizon considered.
VaR can be computed in different ways. The most commonly used is the historical method: we
simply rank the historical returns in increasing order, take the (typically negative) return at the
𝛼 quantile (e.g., the 5th percentile when 𝛼 = 0.05), and change its sign. Another possibility is the
parametric method: we assume that returns follow a certain distribution (e.g., a normal distribution
with the sample mean and standard deviation) and we compute the loss at the chosen quantile.
Simulation (“Monte Carlo”) approaches are also possible.
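The historical and parametric methods can be sketched as follows. This is an illustration under stated assumptions: the historical version applies the formal definition above to the empirical distribution of the returns (each observation has probability 1/𝑇), and the parametric version assumes normally distributed returns, using Python's standard `statistics.NormalDist`. Function names and data are hypothetical:

```python
from statistics import NormalDist, mean, stdev

def historical_var(returns, alpha=0.05):
    """VaR_alpha = -inf{y : F(y) > alpha}, with F the empirical CDF of the returns."""
    ordered = sorted(returns)
    T = len(ordered)
    for k, r in enumerate(ordered):
        # The empirical CDF at the (k+1)-th smallest return is at least (k + 1) / T
        if (k + 1) / T > alpha:
            return -r
    raise ValueError("alpha must be smaller than 1")

def parametric_var(returns, alpha=0.05):
    """VaR under the assumption that returns are normally distributed."""
    dist = NormalDist(mean(returns), stdev(returns))
    return -dist.inv_cdf(alpha)

returns = [i / 100 for i in range(-5, 15)]  # hypothetical: 20 returns from -5% to 14%
print(historical_var(returns, 0.05))  # 0.04: the second-smallest return, sign flipped
print(parametric_var(returns, 0.05))
```

With 20 observations and 𝛼 = 0.05, the smallest return alone has cumulative probability exactly 0.05, which is not strictly greater than 𝛼, so the definition picks the second-smallest one.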
The main problem with VaR is that it is not a coherent risk measure. Consider the outcomes 𝑉1 and
𝑉2 of two investments. A risk measure is said to be coherent if it possesses the following desirable
properties:
• Monotonicity: if 𝑉1 is larger than or equal to 𝑉2 in every possible scenario, then the risk of 𝑉1
must be lower than or equal to the risk of 𝑉2. Formally: if 𝑉1 ≥ 𝑉2, then 𝑅𝑖𝑠𝑘(𝑉1) ≤ 𝑅𝑖𝑠𝑘(𝑉2).
• Translation invariance: for any outcome 𝑉, adding a certain (riskless) amount 𝐶 reduces the
risk by exactly that amount. Formally: 𝑅𝑖𝑠𝑘(𝑉 + 𝐶) = 𝑅𝑖𝑠𝑘(𝑉) − 𝐶.
• Positive homogeneity: multiplying all outcomes by a non-negative constant should scale the
risk measure by the same constant. In other words, if we invest, say, twice the original
amount, the risk measure should also double. Formally: 𝑅𝑖𝑠𝑘(𝜆𝑉) = 𝜆𝑅𝑖𝑠𝑘(𝑉) for 𝜆 ≥ 0.
• Subadditivity: the risk of a combination of two risky positions should be lower than or equal
to the sum of the risks of the individual positions. In other words, diversifying by combining
different assets should reduce risk, or at worst leave it unaffected, but it cannot increase it.
Formally: 𝑅𝑖𝑠𝑘(𝑉1 + 𝑉2) ≤ 𝑅𝑖𝑠𝑘(𝑉1) + 𝑅𝑖𝑠𝑘(𝑉2).
The Value at Risk satisfies the first three conditions, but not the last one. As it violates subadditivity,
risk quantified using VaR can sometimes increase with greater diversification, which is not very
meaningful.
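The failure of subadditivity can be checked on a small, purely hypothetical example: two independent loans, each losing 100 with probability 4% and nothing otherwise. At 𝛼 = 0.05 each loan on its own has a VaR of 0, because a loss occurs with probability below 5%, but the probability that at least one of the two defaults is 1 − 0.96² = 7.84%, so the combined position has a VaR of 100. The sketch below applies the formal VaR definition to such discrete distributions; the function name and numbers are illustrative:

```python
from itertools import product

def var_discrete(outcomes, alpha):
    """VaR_alpha(Y) = -inf{y : P(Y <= y) > alpha} for a discrete distribution.

    outcomes: list of (value, probability) pairs, probabilities summing to 1.
    """
    cumulative = 0.0
    for value, prob in sorted(outcomes):
        cumulative += prob          # probability accumulated up to this outcome
        if cumulative > alpha:
            return -value
    raise ValueError("probabilities must sum to 1")

p_default, loss = 0.04, 100
loan = [(-loss, p_default), (0, 1 - p_default)]   # profit and loss of one loan
# Independent loans: joint outcomes and probabilities of the two-loan portfolio
portfolio = [(a + b, pa * pb) for (a, pa), (b, pb) in product(loan, loan)]

alpha = 0.05
var_single = var_discrete(loan, alpha)
var_combined = var_discrete(portfolio, alpha)
print(2 * var_single, var_combined)   # 0 100: VaR of the sum exceeds the sum of VaRs
```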
To overcome this problem, the Conditional Value at Risk (CVaR), also known as Expected Shortfall
(ES), has been proposed:
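$$\mathrm{CVaR}_\alpha(Y) = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_u(Y)\,du$$
where 𝑢 is just the variable of integration and 𝑑𝑢 is the differential of this variable (i.e., we are
integrating from 0 to 𝛼 using infinitesimal increments in 𝑢), so the CVaR is the average of the VaR
computed at all levels 𝑢 between 0 and 𝛼. Since the VaR defined above already expresses losses as
positive numbers, the CVaR of a loss is also a positive number.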
In more intuitive terms, the CVaR measures the average (the “expected”) loss that we get, given
that the loss exceeds the VaR. As it is a coherent measure of risk, it is preferred to, and more
commonly used than, the VaR. Of course, in order to compute the CVaR, you first need to compute
the VaR.
The following figure provides a graphical intuition of VaR and CVaR:
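Expressed as a loss, the CVaR is always greater than or equal to the VaR, because it is the average
loss that we have when we find ourselves in the red area to the left of the VaR, where every loss is at
least as large as the VaR. In terms of returns, this means that the CVaR lies further to the left than
the VaR.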
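A minimal sketch of the historical CVaR, assuming the same list-of-returns input as before. Rather than computing the VaR first, it evaluates the integral above directly on the empirical distribution, where each observed return carries probability 1/𝑇: the worst returns (sign flipped) are averaged over the lowest 𝛼 share of the distribution. The function name and data are illustrative:

```python
def historical_cvar(returns, alpha=0.05):
    """CVaR_alpha = (1/alpha) * integral_0^alpha VaR_u du on the empirical distribution."""
    ordered = sorted(returns)
    T = len(ordered)
    remaining = alpha   # probability mass of the left tail still to be covered
    total = 0.0
    for r in ordered:
        weight = min(1.0 / T, remaining)   # each observation carries mass 1/T
        total += weight * (-r)             # losses counted as positive numbers
        remaining -= weight
        if remaining <= 0.0:
            break
    return total / alpha

returns = [i / 100 for i in range(-5, 15)]  # hypothetical: 20 returns from -5% to 14%
print(historical_cvar(returns, 0.10))  # average of the two worst losses, 0.05 and 0.04
```

For these data the historical VaR at 𝛼 = 0.10 is 0.03, so the CVaR of about 0.045 is indeed larger as a loss.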
Finally, another popular measure of downside risk is the drawdown (DD). The drawdown is the
decline in the value of an investment from a peak to a low point. Different drawdown measures can
be computed. A popular and easy to compute one is the maximum drawdown (MDD):
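$$MDD = \frac{\text{Peak Value} - \text{Trough Value}}{\text{Peak Value}}$$
where the “Peak Value” is the highest value reached up to a given point, and the “Trough Value” is
the lowest value reached after that peak and before a new peak is set; the MDD is the largest such
relative decline over the whole series. Expressed in this way, the MDD is a positive number, so a
lower MDD is obviously preferable to a higher MDD. In the worst possible case, MDD is equal to
100%, i.e., the value of the investment drops to zero.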
Source: https://financetrain.com
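A minimal sketch of the MDD, assuming the input is the series of portfolio values (e.g., starting from 1 unit of wealth). It tracks the running peak, so it also catches declines that happen before the overall highest value is reached; the function name and data are illustrative:

```python
def max_drawdown(values):
    """Largest relative decline (peak - trough) / peak over a series of positive values."""
    peak = values[0]
    mdd = 0.0
    for v in values:
        peak = max(peak, v)                   # highest value reached so far
        mdd = max(mdd, (peak - v) / peak)     # decline from that peak
    return mdd

values = [100, 120, 90, 110, 130, 117]   # hypothetical portfolio values
print(max_drawdown(values))   # (120 - 90) / 120 = 0.25
```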
MDD fails to consider the frequency and duration of losses, and does not account for the size of any
gains. To account for the gains, we can use a more informative measure called Calmar Ratio:
$$\text{Calmar} = \frac{R - r_f}{MDD}$$
This is similar to the Sharpe ratio, but the MDD is used instead of the standard deviation.
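A minimal sketch of the Calmar ratio, assuming per-period returns, a constant per-period risk-free rate, and a value path that starts from 1 unit of wealth and compounds the returns (the starting value itself counts as a peak). Some practitioners instead use annualized returns over a fixed window; this sketch simply follows the formula above. Function name and data are illustrative:

```python
from statistics import mean

def calmar_ratio(returns, risk_free=0.0):
    """(mean return - r_f) / MDD, with the MDD computed on the compounded value path."""
    value, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1 + r                    # portfolio value after this period
        peak = max(peak, value)
        mdd = max(mdd, (peak - value) / peak)
    if mdd == 0.0:
        return float("nan")  # undefined: the value never fell below a previous peak
    return (mean(returns) - risk_free) / mdd

returns = [0.10, -0.20, 0.05, 0.10]   # hypothetical per-period returns
print(calmar_ratio(returns, risk_free=0.0025))   # (0.0125 - 0.0025) / 0.20, about 0.05
```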
PERFORMANCE EVALUATION
After we obtain the series of portfolio returns, we need to appropriately evaluate results in order
to gauge how well our investment strategy performed. We may start with some basic statistics
about the distribution of the returns:
• Mean: the higher the better
• Standard deviation: the lower the better
• Skewness: a positive value is preferable
• (Excess) Kurtosis: a lower value is preferable unless the skewness is significantly positive
We then compute the Sharpe ratio to quantify the risk-adjusted return.
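The basic statistics listed above can be computed with Python's standard library, as in the following sketch. The estimator choices are assumptions: the sample standard deviation (dividing by 𝑇 − 1) for the standard deviation and the Sharpe ratio, and simple moment ratios (dividing by 𝑇) for skewness and excess kurtosis. Statistical packages may apply bias corrections, so their values can differ slightly. The function name and data are hypothetical:

```python
from statistics import mean, stdev

def summary_stats(returns, risk_free=0.0):
    """Mean, standard deviation, skewness, excess kurtosis and Sharpe ratio."""
    m = mean(returns)
    sd = stdev(returns)  # sample standard deviation (divides by T - 1)
    # Central moments (divide by T) for the shape statistics
    m2 = mean([(r - m) ** 2 for r in returns])
    m3 = mean([(r - m) ** 3 for r in returns])
    m4 = mean([(r - m) ** 4 for r in returns])
    return {
        "mean": m,
        "std": sd,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "sharpe": (m - risk_free) / sd,
    }

returns = [0.02, -0.01, 0.03, -0.04, 0.01]  # hypothetical monthly returns
for name, value in summary_stats(returns, risk_free=0.001).items():
    print(f"{name}: {value:.4f}")
```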
If we are interested in evaluating downside risk, we may compute, instead of or in addition to the
standard deviation and the Sharpe ratio, some or all of these measures:
• Downside deviation: the lower the better
• CVaR: the lower (in absolute value) the better
• (Maximum) drawdown: the lower the better
We then quantify the risk-adjusted return with an appropriate measure, like the Sortino ratio.
Another important financial indicator is the alpha, used to check whether the returns of the
investments are explained by a given asset pricing model. The alpha is obtained as the intercept in
a regression of the portfolio excess returns on the returns of the factors of the model considered.
Usually, the CAPM (in which case the alpha is called “Jensen’s alpha”) or the Fama-French
three-factor model is used. In the first case the regression takes the form:
$$R_t - R_{f,t} = \alpha + \beta\,(R_{Mkt,t} - R_{f,t}) + \varepsilon_t$$
while in the second case it is:
$$R_t - R_{f,t} = \alpha + b_1\,(R_{Mkt,t} - R_{f,t}) + b_2\,SMB_t + b_3\,HML_t + \varepsilon_t$$
If 𝛼 is significantly greater than zero, our strategy achieves returns higher than those predicted by
the model given the portfolio’s exposure to the factors. To check whether 𝛼 is significantly positive,
we compute its standard error and then perform a hypothesis test. The usual OLS standard errors
are generally not appropriate, as they assume homoskedasticity (i.e., constant variance of the
regression errors) and no autocorrelation (i.e., no temporal dependence in the regression errors).
These conditions are usually not met in financial time series, where heteroskedasticity and/or
autocorrelation are often observed. To account for this, we may instead use Newey-West standard
errors. The technical details of this estimator are somewhat complicated and not relevant here, and
Newey-West standard errors can be easily computed in R.
Using them, we can test whether the 𝛼 of our investment is significantly greater than zero.
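For readers not using R, the following sketch computes Jensen’s alpha and its Newey-West standard error by hand with NumPy. It assumes the inputs are already excess returns (portfolio and market minus the risk-free rate), uses the Bartlett-kernel Newey-West estimator without small-sample corrections, and chooses the number of lags with a common rule of thumb, ⌊4(𝑇/100)^(2/9)⌋. These are illustrative choices; statistical packages may use different defaults, so their standard errors can differ slightly. The function name `capm_alpha_newey_west` and the simulated data are hypothetical:

```python
import numpy as np
from statistics import NormalDist

def capm_alpha_newey_west(excess_port, excess_mkt, lags=None):
    """OLS of portfolio excess returns on market excess returns with Newey-West SEs.

    Returns (coefficients [alpha, beta], standard errors, t-stat of alpha,
    one-sided p-value for H0: alpha <= 0 against H1: alpha > 0).
    """
    y = np.asarray(excess_port, dtype=float)
    x = np.asarray(excess_mkt, dtype=float)
    T = len(y)
    X = np.column_stack([np.ones(T), x])          # intercept (alpha) and market factor
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y                       # OLS estimates [alpha, beta]
    resid = y - X @ coef
    if lags is None:
        lags = int(np.floor(4 * (T / 100) ** (2 / 9)))  # rule-of-thumb lag length
    scores = X * resid[:, None]                    # row t: x_t * e_t
    S = scores.T @ scores                          # lag-0 (White) term
    for lag in range(1, lags + 1):
        weight = 1 - lag / (lags + 1)              # Bartlett kernel weight
        gamma = scores[lag:].T @ scores[:-lag]     # sum_t x_t e_t e_{t-lag} x_{t-lag}'
        S += weight * (gamma + gamma.T)
    cov = XtX_inv @ S @ XtX_inv                    # sandwich covariance matrix
    se = np.sqrt(np.diag(cov))
    t_alpha = coef[0] / se[0]
    p_value = 1 - NormalDist().cdf(t_alpha)        # normal approximation
    return coef, se, t_alpha, p_value

# Usage with simulated (hypothetical) monthly data: true alpha 0.2%, beta 1.2
rng = np.random.default_rng(0)
mkt = rng.normal(0.005, 0.04, 240)
port = 0.002 + 1.2 * mkt + rng.normal(0, 0.01, 240)
coef, se, t_alpha, p_value = capm_alpha_newey_west(port, mkt)
print(f"alpha={coef[0]:.4f}  se={se[0]:.4f}  t={t_alpha:.2f}  p={p_value:.3f}")
```

The Bartlett weights decline linearly with the lag, which keeps the estimated covariance matrix positive semi-definite.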
A proper evaluation, however, should also account for the turnover, i.e., how much trading the
strategy requires. The higher the turnover, the higher the transaction costs, which of course
translates into lower net returns. To get an idea of the amount of trading required we can compute
the average turnover. The turnover at a certain period 𝑡 is given by:
$$TO_t = \sum_{i=1}^{N} \left| w_{i,t} - w_{i,t-1} \right|$$
Basically, for each stock we compute the absolute value of the change in the corresponding weight
compared to the previous period, and we sum all these 𝑁 values.
We do this for each of the 𝑇 periods in which we applied our strategy, and then we compute the
mean. This gives us the average turnover.
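The simple version of this procedure can be sketched as follows, assuming the weights are stored as one list per period (oldest first). This sketch deliberately ignores the effect of realized returns on the weights; the function name and data are illustrative:

```python
def average_turnover(weights):
    """Mean of TO_t = sum_i |w_{i,t} - w_{i,t-1}| over consecutive rebalancing dates.

    weights: list of weight vectors (one per period, oldest first).
    Note: this simple version ignores the drift caused by realized returns.
    """
    turnovers = [
        sum(abs(w_now - w_before) for w_now, w_before in zip(current, previous))
        for previous, current in zip(weights[:-1], weights[1:])
    ]
    return sum(turnovers) / len(turnovers)

weights = [[0.5, 0.5], [0.4, 0.6], [0.4, 0.6]]   # hypothetical two-asset weights
print(average_turnover(weights))   # (0.2 + 0.0) / 2, about 0.1
```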
However, applying this formula using the set of weights we selected for the previous period as the
value of 𝑤𝑖,𝑡−1 is not entirely correct. When we update the weights at the beginning of each new
period, we need to account for the fact that, due to the returns realized during the period that just
ended, the allocation of wealth has changed compared to what it was at the beginning of that period.
We clarify this with an example.
Suppose we have a portfolio with two assets updated monthly, and the weights at time 𝑡 − 1 were
0.5 and 0.5, while now at time 𝑡 we want to change them to 0.4 and 0.6. We might think that the
turnover is |0.4 − 0.5| + |0.6 − 0.5| = 0.2, which means we have to trade 20% of our wealth to
update the portfolio. If the price of the two assets did not change over the month that just ended,
this would indeed be correct. Suppose however that during that month the first asset experienced
a +10% return, and the second one a -20% return. When we update the portfolio at time 𝑡, we no
longer have the two original weights, but 0.5 + 0.5 × 0.1 = 0.55 for the first asset and 0.5 −
0.5 × 0.2 = 0.4 for the second. The weights computed like this do not sum up to 1 because the total
value of the portfolio changed compared to period 𝑡 − 1. We need to account for this by dividing
both weights by their sum. So in this example, where they sum to 0.55 + 0.4 = 0.95, we have
0.55/0.95 ≈ 0.58 for the first weight and 0.4/0.95 ≈ 0.42 for the second. Therefore, the actual
turnover is |0.4 − 0.58| + |0.6 − 0.42| = 0.36.
We might express this concept by rewriting the formula for the turnover as
$$TO_t = \sum_{i=1}^{N} \left| w_{i,t} - w_{i,t-1}^{+} \right|$$
where the “+” in 𝑤𝑖,𝑡−1⁺ indicates that we are considering the weights from the previous period after
accounting for the redistributing effect of the realized returns. Such weights are given by:
$$w_{i,t-1}^{+} = \frac{\left(w_{i,t-1} + w_{i,t-1} R_{i,t-1}\right) \sum_{j=1}^{N} w_{j,t-1}}{\sum_{j=1}^{N} \left(w_{j,t-1} + w_{j,t-1} R_{j,t-1}\right)}$$
The sum of the previous-period weights over all 𝑁 assets in the numerator, which was not used in
the example above, is necessary to rescale correctly whenever the weights do not sum up to 1.
When the weights do sum up to 1, as in the example, this part of the formula is equal to 1 and
therefore leaves the numerator unchanged. After computing the turnover in this way for each
period, we can then compute the average turnover as before.
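A minimal sketch of the drift-adjusted turnover, assuming the realized returns of each asset over the period that just ended are available as a list aligned with the weights. It reproduces the two-asset example above; the function names are illustrative:

```python
def drifted_weights(prev_weights, realized_returns):
    """w_{i,t-1}^+: previous weights after the realized returns, rescaled as in the formula."""
    grown = [w + w * r for w, r in zip(prev_weights, realized_returns)]  # w + w * R
    scale = sum(prev_weights) / sum(grown)    # equals 1 / sum(grown) when weights sum to 1
    return [g * scale for g in grown]

def turnover(new_weights, prev_weights, realized_returns):
    """TO_t = sum_i |w_{i,t} - w_{i,t-1}^+|."""
    drifted = drifted_weights(prev_weights, realized_returns)
    return sum(abs(w_new - w_plus) for w_new, w_plus in zip(new_weights, drifted))

# Example from the text: 0.5/0.5 -> 0.4/0.6 after returns of +10% and -20%
print(drifted_weights([0.5, 0.5], [0.10, -0.20]))       # about [0.579, 0.421]
print(turnover([0.4, 0.6], [0.5, 0.5], [0.10, -0.20]))  # about 0.358 (0.36 in the text)
```

The small difference from 0.36 in the text is only due to rounding the drifted weights to two decimals.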
While the turnover certainly provides some useful information, we can get an even better picture by
considering the portfolio returns net of transaction costs. Transaction costs can be fixed or
proportional to the amount of trading. It is generally considered more appropriate to use
proportional transaction costs. These can be accounted for by multiplying the turnover of each asset
by the proportional cost. Frazzini (2012) suggests using transaction costs equal to 10 basis points
(bp). A basis point is equal to 0.01%, so 10 bp correspond to 0.1%, i.e., 0.001. Therefore, for
example, if we need to buy or sell 5% of the positions we have in a certain asset, we face transaction
costs equal to 0.05 × 0.001 = 0.00005, which means that 0.005% of the money invested in that
position is lost in transaction costs.
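A minimal sketch of net returns, assuming proportional costs of 10 bp on every unit of wealth traded and that the cost of each rebalancing is deducted from the return of the same period (a timing convention chosen here for illustration). Summing the per-asset costs gives the cost rate times the turnover 𝑇𝑂𝑡; the function name and data are hypothetical:

```python
def net_returns(gross_returns, turnovers, cost=0.001):
    """Returns net of proportional costs: R_t - cost * TO_t (cost=0.001 is 10 bp)."""
    return [r - cost * to for r, to in zip(gross_returns, turnovers)]

# e.g. trading 5% of the wealth in one asset costs 0.05 * 0.001 = 0.00005 (0.005%)
# Hypothetical gross returns and turnovers (0.36 is the turnover from the example above)
print(net_returns([0.010, 0.020], [0.36, 0.0]))   # about [0.00964, 0.02]
```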
Once we have the portfolio returns net of transaction costs, we can use them to compute all the
other statistics we listed before.
It is useful to visualize the value 𝑉 of the portfolio over time, which can be computed as
$$V_T = V_0 + \sum_{t=1}^{T} V_{t-1} R_t$$
It is appropriate to compute the value both ignoring and net of transaction costs. The result is then
plotted in a graph, which provides a visual representation of the effectiveness of our strategy.
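The value path follows directly from the formula, since each term 𝑉𝑡−1𝑅𝑡 is the gain or loss of period 𝑡. A minimal sketch, assuming a starting value of 1 unit of wealth and hypothetical gross and net return series; the resulting lists can then be passed to any plotting library:

```python
def value_path(returns, v0=1.0):
    """Portfolio values V_0, ..., V_T with V_t = V_{t-1} + V_{t-1} * R_t."""
    values = [v0]
    for r in returns:
        values.append(values[-1] + values[-1] * r)
    return values

gross = [0.010, 0.020, -0.015]   # hypothetical returns ignoring transaction costs
net = [0.009, 0.019, -0.016]     # hypothetical returns net of transaction costs
print(value_path(gross)[-1], value_path(net)[-1])
```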
We might also want to compute the evolution of real wealth in addition to nominal wealth. In other
words, we might want to account for inflation. We can do this by dividing the value of the
portfolio over time by the deflator.
We can compute the deflator 𝐷 using a formula analogous to the one used to compute the value of
the portfolio, simply replacing the return with the inflation rate 𝐼:
$$D_T = D_0 + \sum_{t=1}^{T} D_{t-1} I_t$$
Of course, the two series need to have the same starting value (e.g., 1 unit of wealth), and the same
frequency (e.g., monthly).
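A minimal sketch of real wealth, assuming the return and inflation series have the same frequency and length and both paths start from the same value (1 unit of wealth by default). The same cumulation rule is used for the portfolio value and the deflator; the function names and data are illustrative:

```python
def cumulate(rates, start=1.0):
    """X_T = X_0 + sum_t X_{t-1} * rate_t, i.e. compounding from the starting value."""
    path = [start]
    for rate in rates:
        path.append(path[-1] + path[-1] * rate)
    return path

def real_value_path(returns, inflation, start=1.0):
    """Nominal portfolio value divided by the deflator, period by period."""
    if len(returns) != len(inflation):
        raise ValueError("returns and inflation must have the same frequency and length")
    nominal = cumulate(returns, start)
    deflator = cumulate(inflation, start)
    return [v / d for v, d in zip(nominal, deflator)]

# Hypothetical monthly returns and monthly inflation rates
print(real_value_path([0.05, 0.05], [0.02, 0.02]))
```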
Finally, whatever metrics we use to evaluate the results, we need to compare them with appropriate
benchmarks in order to know whether the results are satisfactory. A benchmark portfolio that is
surprisingly difficult to beat is the one created with a naïve 1/N rule.⁵ This portfolio simply assigns
equal weights to all the assets in all periods, and therefore requires neither estimating any input nor
performing any optimization procedure. This is indeed one of its main strengths: it is completely
immune to estimation errors. Another advantage is that it has a very low turnover, which translates
into very low transaction costs.
Another possible benchmark is the market, or more precisely the returns of a large stock market
index, like the S&P 500. While it is not possible to buy an index directly, it is possible to buy ETFs.
An ETF (Exchange-Traded Fund) is a fund that is traded on the financial markets and which tries to
replicate a certain index. By investing in such a fund, the investor can invest in a certain index
without having to trade all the stocks contained in it. Stock markets sometimes go through periods
of poor performance that can last years, but over the long run they provide good returns (in
countries with a solid economy). Therefore, such a passive investment solution, which also does not
require any estimation or optimization procedure, is another reasonable benchmark (over long
enough periods of time).
5
We refer to this rule/portfolio as “naive” or “naïve 1/N” to avoid confusing it with the 1/N rule we
described before. However, in the literature it is also commonly called simply the “1/N” rule/portfolio.