Financial Engineering
with Stochastic Calculus
Jeremy Staum
School of Operations Research
and Industrial Engineering
Cornell University
Ithaca, New York
staum@orie.cornell.edu
© 2002
Contents
Chapter 1. Introduction 1
1.1. Overview 1
1.2. Portfolio Theory 2
1.3. Fundamentals of Arbitrage Theory 3
1.4. The Black-Scholes Model 6
1.5. Summary 8
1.6. Problems 8
Chapter 2. Brownian Motion and Stochastic Integration 9
2.1. Definition of Brownian Motion 9
2.2. Construction of Brownian Motion 13
2.3. Problems 15
2.4. Definition of Stochastic Integration 16
2.5. Itô's Formula 19
2.6. Itô Processes 22
2.7. Summary 25
2.8. Problems 25
Chapter 3. The Black-Scholes Analysis: Part I 27
3.1. The Black-Scholes PDE 27
3.2. The Black-Scholes Formula and Greeks 28
3.3. Summary 31
3.4. Problems 31
Chapter 4. Conditional Probability and Itô Processes 33
4.1. Conditional Probability 33
4.2. Conditioning with Itô Processes 35
4.3. Martingales 39
4.4. The Markov Property 41
4.5. Summary 42
4.6. Problems 43
Chapter 5. The Black-Scholes Analysis: Part II 44
5.1. The Feynman-Kac Formula 44
5.2. Girsanov Transformation 48
5.3. Examples 53
5.4. Problems 55
Chapter 6. Complete Markets 57
6.1. Market Price of Risk 57
6.2. The Pricing Measure Q 59
6.3. Vector Itô Processes 61
6.4. Markets with Multiple Risky Securities 64
6.5. Martingale Representation 67
6.6. Decomposition 68
6.7. Problems 70
Chapter 7. Futures and Dividends 72
7.1. Futures 72
7.2. Dividends 75
7.3. Problems 78
Chapter 8. Computation 79
8.1. Simulation 79
8.2. Calibration 82
8.3. Problems 85
Chapter 9. Volatility Models 87
9.1. Local Volatility 87
9.2. Stochastic Volatility 89
Appendix A. Final Review 91
Appendix B. Solutions to Problems 92
Appendix. Bibliography 103
CHAPTER 1
Introduction
In this chapter, we
(1) describe financial engineering and its main problems
(2) introduce a framework for probabilistic models of financial markets
(3) discuss arbitrage and its connection to pricing
(4) introduce and criticize the Black-Scholes model
1.1. Overview
What is financial engineering? How is it different from finance and from management
science, or operations research, or industrial engineering?
First, financial engineering is indeed an engineering discipline, and as such, proceeds from
facts about the world plus prespecified goals to produce a solution to a problem. Finance is
a social science, and as a science, it is concerned with understanding how the world really
is. Financial engineering is heavily dependent on probabilistic models for the evolution of
financial variables over time, and it relies on finance for this kind of knowledge. One way
in which financial engineering is unlike other engineering disciplines is that its fundamental
models are much less accurate. The field continues to develop rapidly in tandem with research
in finance, seeking better models.
The obvious difference between financial engineering and MS/OR/IE is the finance. Financial
engineering is not about managerial decisions, military operations, or industrial
processes, but about money, pure and simple. It shares with its parent discipline a focus
on optimization and stochastic processes, but uses mathematical tools that seldom appear
in other domains of MS/OR/IE, such as stochastic calculus. The branch of mathematics
inspired by the problems of financial engineering may be referred to as financial mathematics,
perhaps as biostatistics and econometrics are kinds of statistics having to do with the
problems of biology and economics. On the other hand, mathematical finance seems to be
essentially a synonym for financial engineering.
In this course, we will study three main problems of financial engineering. To deal
effectively with any of them requires the assumption of a probabilistic model for investment
opportunities in financial markets. Each proceeds from a real-world situation in which
someone has a goal and a situation and needs to know what action to take.
(1) Derivative pricing: There is great demand for so-called derivative securities,
whose payoffs are derived from underlying variables, usually the prices of other
securities. Those selling derivative securities need to determine what price to charge
in order to maximize profits and minimize risk.
(2) Risk management: Anyone managing an institutional portfolio needs to be able to
measure and control its risk in order to be able to manage prudently and to satisfy
superiors, investors, or regulators.
(3) Portfolio optimization: Individuals and institutions want to invest optimally given
their financial goals.
1.2. Portfolio Theory
We model a financial market as a collection of traded securities. In general, our notation
is that there are N traded securities, and the $i$th has price $S_i(t)$ at time t. Sometimes we
may also have a 0th security (usually a money market account), in which case there are
N + 1 securities. The price vector S is a stochastic process, and we must have in mind a
probability measure P that supports this stochastic model of security prices. Some people
think of this as the objective probability measure, which describes how the world really
is, others as the subjective probability measure, which describes someone's beliefs about
future prices.
Market participants can buy and sell traded securities, resulting in portfolios that change
over time. A portfolio strategy is a vector stochastic process $\theta$, where $\theta_i(t)$ is the number
of shares of the $i$th security held at time t. Then the value of this portfolio is
(1.2.1) $V(t) = \theta(t)S(t) = \sum_{i=1}^{N} \theta_i(t) S_i(t).$
(The convention is that $\theta$ is a row vector and S a column vector.) In this course, we will
focus on problems with a finite time horizon, usually called T, so that a portfolio strategy
is defined for $t \in [0, T]$.
We are interested primarily in self-financing portfolios, those whose value changes only
as a result of the portfolio's own gains or losses, not because of cash infusions or withdrawals.
This means that changes in the portfolio strategy $\theta$ must be costless. For instance, when all
security prices are positive, a purchase of more shares of one security requires a compensating
sale of shares of some other securities, in order to raise the necessary funds. In discrete time,
the self-financing condition is
$(\theta(t_{j+1}) - \theta(t_j))\,S(t_{j+1}) = \theta(t_j)\,D(t_j),$
where $D_i(t_j)$ is the dividend paid at step j + 1 to the owner at step j of each share of security
i. The condition means that the portfolio rebalancing done at step j + 1, at the prices then
current, has zero net cost once the dividends just received are taken into account.
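The discrete-time self-financing condition above can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-security market (a money market account and one dividend-paying stock); the function names and all numbers are illustrative and do not come from the text.

```python
def portfolio_value(theta, prices):
    """V = theta . S: holdings (row vector) times prices (column vector)."""
    return sum(th * s for th, s in zip(theta, prices))


def is_self_financing(theta_old, theta_new, prices_new, dividends, tol=1e-9):
    """Check (theta_new - theta_old) . S(t_{j+1}) == theta_old . D(t_j)."""
    rebalancing_cost = sum(
        (new - old) * s for old, new, s in zip(theta_old, theta_new, prices_new)
    )
    dividends_received = portfolio_value(theta_old, dividends)
    return abs(rebalancing_cost - dividends_received) <= tol


# Hypothetical market: security 0 is a money market account, security 1 a stock.
theta_old = [10.0, 5.0]     # holdings chosen at step j
prices_new = [1.0, 20.0]    # prices at step j+1
dividends = [0.0, 0.4]      # dividend per share paid at step j+1

# Buy one more share of stock (cost 20), funded by 2.0 of dividends
# plus the sale of 18 units of the money market account.
theta_new = [10.0 - 18.0, 6.0]
print(is_self_financing(theta_old, theta_new, prices_new, dividends))  # True
```

Buying the extra share without selling anything else would fail the check, because the purchase would then require a cash infusion from outside the portfolio.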
There is another condition that we want to demand of portfolio strategies. A portfolio
strategy is tame when the resulting portfolio value is bounded below, i.e. there exists a value
L such that there is zero probability of having $V(t) = \theta(t)S(t) < L$ for any time $t \in [0, T]$.
This is an intuitive restriction to impose, because we can imagine L as a market participant's
credit limit. Should the participant's portfolio value fall beneath L, it would go bankrupt
and have to unwind its portfolio. It would be imprudent or irresponsible to contemplate a
trading strategy with unlimited losses, and unrealistic to expect to find willing creditors or
counterparties when one's liabilities are excessive. Each market participant has some credit
limit L, but tameness is supposed to be a property of a portfolio strategy without reference
to the market participant who executes it. Therefore for tameness we require merely that
there be some finite credit limit L that bounds the portfolio value below. After all, everyone
has a credit limit, so nobody could execute a strategy that is not tame.
1.3. Fundamentals of Arbitrage Theory
An interesting kind of self-financing, tame portfolio strategy is the arbitrage, which can
be thought of as a "free lunch" or "getting something for nothing." There are two kinds of
arbitrage strategies:
(1) Money for nothing: $\theta(0)S(0) < 0$ and $\theta(T)S(T) \ge 0$.
(2) Lottery tickets for free: $\theta(0)S(0) \le 0$, $\theta(T)S(T) \ge 0$, and $P(\theta(T)S(T) > 0) > 0$.
The first type of arbitrage is a way of getting money now without taking on any risk of future
loss. The cost of setting up the initial portfolio is $\theta(0)S(0)$, and when this is negative, you
get paid a positive amount to set it up. Then $\theta(T)S(T) \ge 0$ means that you can close the
position at time T without any chance of loss. The second type of arbitrage costs nothing
to set up, has no chance of loss, and has a positive chance of a positive payoff.
Example 1.3.1 (Same stock, different markets). Suppose you can buy a share of stock
in London for a price lower than that for which you can sell it in New York at the same
time. Doing so, you pocket the cash difference now, while retaining no liability.
Example 1.3.2 (Playing the lottery). If you could get for free a lottery ticket that has a
positive chance of winning a prize, that would be an arbitrage. Even though you might end
up with nothing, the chance to win something is worth something, and getting that chance
for free is an arbitrage. On the other hand, the opportunity to buy a lottery ticket for a
positive price, no matter how small, and no matter how valuable or likely the prize, is not
an arbitrage. Indeed, paying one dollar now for a winning lottery ticket that guarantees a
million dollars at a later date with probability 1 is also not an arbitrage, but rather, investing
in a zero-coupon bond that pays a high rate of interest.
Example 1.3.3 (Mispriced bond). As in Example 1.3.2, a zero-coupon bond that pays
a million dollars in one year and costs one dollar now is not an arbitrage by itself. However,
suppose that you can borrow at an interest rate of 5% over that year. Then there is an
arbitrage: the portfolio which consists of borrowing one dollar, buying the zero-coupon
bond, and liquidating the portfolio at the end of the year. The initial cost is zero, and the
final payoff is $999,998.95, which is positive.
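The arithmetic of Example 1.3.3 can be reproduced in a few lines. This sketch assumes that the 5% interest is charged once at the end of the year, which is what the payoff figure in the example implies; the variable names are illustrative.

```python
face_value = 1_000_000.00   # bond payoff in one year
bond_price = 1.00           # cost of the bond today
borrow_rate = 0.05          # interest on the one-dollar loan, charged once

# Buying the bond with borrowed money costs nothing today.
initial_cost = bond_price - bond_price

# At the end of the year, collect the bond payoff and repay the loan with interest.
final_payoff = face_value - bond_price * (1 + borrow_rate)

print(initial_cost, round(final_payoff, 2))  # 0.0 999998.95
```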
It is not the magnitude of the payoff in Example 1.3.3 that matters, just that something
positive is available for a nonpositive price. The example illustrates that it is a portfolio that
is an arbitrage, not necessarily a single price. From the standpoint of arbitrages, nothing is
wrong with the price of the million-dollar bond, unless it is considered in relation to other
opportunities that exist in the marketplace.
Our entire approach to derivative pricing rests on the assumption that arbitrages should
not exist, if the market is in equilibrium. If $\theta$ were a tame arbitrage, there would be unlimited
demand for it, causing its price $\theta(0)S(0)$ to rise to a positive level, so that it would no longer
be an arbitrage once prices had reached equilibrium. While the market is not in equilibrium,
an arbitrage can exist. It would certainly be difficult to argue from empirical financial
data that markets are in equilibrium most of the time, although it is also very difficult to
explain how arbitrages of substantial size could be available to a significant group of market
participants for any noticeable length of time. Regardless, the no-arbitrage principle has
a lot of force as a normative principle because nobody wants to give away free money or
valuable lottery tickets.
A classic informal example of an arbitrage is a $20 bill lying in the street. One hardly
ever finds such opportunities. On the other hand, one frequently encounters pennies lying
in the street. How can we interpret this? Perhaps one finds pennies in the street when there
is a temporary disequilibrium in the distribution of loose change, for instance, after pennies
fall from the sky, and people will soon profitably snatch up all these arbitrage opportunities.
On the other hand, perhaps a penny in the street is not an arbitrage, because the cost of
bending down to pick it up or the risk of catching a disease from handling it exceeds its value.
The former argument is a way of saying that the no-arbitrage postulate does not always
hold, while the latter is an attempt to explain why certain opportunities are not arbitrages.
Bear these considerations in mind when people attempt to persuade you to pay for access
to a supposed arbitrage opportunity.
Notice that we are denying the existence of only self-financing, tame arbitrage strategies.
To have the financial interpretation of "something for nothing," the strategy must not involve
cash inflows at intermediate times, as well as avoiding initial cost or terminal loss. The reason
for focusing on tame strategies is that there is no good way of excluding arbitrages that are
not tame. The canonical example is based on the "doubling strategy" from gambling.
Example 1.3.4 (Doubling Strategy). For simplicity, imagine a gamble such as betting on
red in roulette, where the bet is either lost or pays off at even odds. That is, a bet of x dollars
results in a change of wealth of either -x or x dollars. After each loss, the gambler doubles
the bet, stopping after the first win. Suppose the gambles are Bernoulli trials, independent
and with identical positive probability of a win. With probability one, the first win occurs on
the $n$th gamble where n is a finite number, so if the first bet is one dollar, the total winnings
are $-\sum_{i=1}^{n-1} 2^{i-1} + 2^{n-1} = 1$. According to this strategy, one can begin with nothing and end
up with a profit of a dollar. This is an arbitrage, but it is not tame because the intermediate
wealth could be $1 - 2^i$ after i losses in a row, which is unbounded below because there is no
limit to the number of losses in a row that might occur. Thus to execute this strategy would
require an infinite amount of credit, and the ability to bet an unlimited amount at once.
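A short simulation illustrates both halves of Example 1.3.4: every run of the doubling strategy ends exactly one dollar ahead, but the lowest wealth reached along the way grows exponentially with the length of the losing streak. This is only a sketch; the win probability 18/38 (red on an American roulette wheel) and the function name are assumptions made for illustration.

```python
import random


def doubling_strategy(win_probability, rng, first_bet=1):
    """Play until the first win, doubling the stake after every loss.

    Returns (final_wealth, lowest_wealth, number_of_gambles).
    """
    wealth, lowest, bet, gambles = 0, 0, first_bet, 0
    while True:
        gambles += 1
        if rng.random() < win_probability:
            wealth += bet
            return wealth, lowest, gambles
        wealth -= bet
        lowest = min(lowest, wealth)
        bet *= 2


rng = random.Random(0)  # fixed seed so the run is reproducible
runs = [doubling_strategy(18 / 38, rng) for _ in range(10_000)]

# Every run ends one dollar ahead ...
print(all(final == 1 for final, _, _ in runs))
# ... but the worst intermediate wealth is 1 - 2^(n-1) for a first win on gamble n.
print(min(lowest for _, lowest, _ in runs))
```

Increasing the number of runs makes longer losing streaks appear, so the worst intermediate wealth observed keeps falling. No finite credit limit L covers every path.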
In finance, such a strategy might involve buying stock on credit. Nobody could actually
take advantage of this to get an arbitrage, because it requires the ability to borrow an unlimited
amount of money and buy an unlimited amount of shares. So this is of no economic
significance and does not count as an arbitrage. The experience of Long-Term Capital Management
is a case in point. LTCM executed strategies that were supposed to be arbitrages,
but whose maximum loss exceeded the credit that the fund was able to draw upon. In the
event, the losses grew so large that LTCM could not find willing counterparties or creditors,
was unable to continue executing its so-called arbitrage strategies, and went spectacularly
broke.
Aside from the traded securities, we are also interested in derivative securities, also
known as contingent claims. For now, we consider only the simplest derivatives, those
that have a single payoff at a future time T, called maturity or expiration, and are path-independent,
meaning that the payoff is a function f(X(T)) only of the value at maturity
of the underlying variables X, not their whole histories. The canonical example is the
European call option.
Example 1.3.5 (European call option). The European call option pays $f(S(T)) = (S(T) - K)^+$
at time T. Here S(T) is the price of a traded security, such as a stock, at
the option's maturity T, and K is a strike price, which is written into the option contract
just like the maturity T is. The notation $(S(T) - K)^+$ means $\max\{S(T) - K, 0\}$, so this is
an option, but not an obligation, to buy the stock for price K at time T.
Why does the no-arbitrage principle, the belief that arbitrages should not exist in a
market in equilibrium, help us to understand derivative securities? By considering the relationship
between derivative payoffs and underlying securities, we can make substantive
statements about prices of derivatives.
Example 1.3.6 (No-Arbitrage Bounds). The European call option's payoff $(S(T) - K)^+$
is nonnegative, so its price must be nonnegative. If it were negative, then one could buy
the option and receive money for doing so, while taking on no risk of a future loss. This
would be an arbitrage. If we believe in a probability model such that P[S(T) > K] > 0,
then we can furthermore say that the option price must be strictly positive. Also, the payoff
$(S(T) - K)^+ \le S(T)$, with strict inequality whenever S(T) > 0, assuming that the strike
price K > 0, as is always true in practice. The
portfolio of 1 share of stock and -1 shares of the call option has payoff $S(T) - (S(T) - K)^+$,
which equals K when $S(T) \ge K$ and equals S(T) when $S(T) \le K$. Assuming that in our
probability model, P[S(T) > 0] > 0, this is a nonnegative payoff with positive probability
of being positive, so it must have a positive price to avoid arbitrage. So the price of the call
must be less than that of the stock. Putting these results together, the no-arbitrage price of
the European call option must be between 0 and S(0).
Example 1.3.7 (Put-Call Parity). The European put option has payoff $(K - S(T))^+$,
thus representing the option to sell the stock for price K at time T. The difference of the
European call and put payoffs is
$\max\{S(T) - K, 0\} - \max\{K - S(T), 0\} = \max\{S(T) - K, 0\} + \min\{S(T) - K, 0\} = S(T) - K.$
Suppose that there is a riskless bond paying 1 at time T, which can be bought now for B(0).
Then a portfolio containing one share of stock and -K shares of this bond is also worth
S(T)-K at time T. The no-arbitrage principle demands that these portfolios with the same
terminal value have the same initial price. Therefore the difference between the call and put
prices should equal the initial price of the other portfolio, S(0) - KB(0).
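The payoff identities behind the no-arbitrage bounds and put-call parity are easy to verify for a handful of terminal stock prices. The sketch below does so and then uses parity to back out a put price from a call price; the quotes (call price 8, S(0) = 100, B(0) = 0.97, K = 100) are hypothetical and chosen only for illustration.

```python
def call_payoff(s_T, strike):
    """European call payoff (S(T) - K)^+."""
    return max(s_T - strike, 0.0)


def put_payoff(s_T, strike):
    """European put payoff (K - S(T))^+."""
    return max(strike - s_T, 0.0)


strike = 100.0
for s_T in [0.0, 50.0, 100.0, 150.0]:
    # Long call, short put pays the same as long stock, short K bonds (each paying 1).
    assert call_payoff(s_T, strike) - put_payoff(s_T, strike) == s_T - strike * 1.0
    # The call payoff lies between zero and the stock price.
    assert 0.0 <= call_payoff(s_T, strike) <= s_T

# Parity ties the initial prices together: C(0) - P(0) = S(0) - K * B(0).
S0, B0, call_price = 100.0, 0.97, 8.0   # hypothetical market quotes
put_price = call_price - (S0 - strike * B0)
print(put_price)
```

Parity determines the put price only once the call price is given. It does not say what either price should be on its own.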
Notice that these results did not depend on a model for the prices of traded securities.
This makes them much more reliable than results that depend on a model, which is never
a perfect description of reality. On the other hand, they are not very specific: the range
[0, S(0)] for an option price is not very helpful, and put-call parity gives you a relationship
between put and call prices, but doesn't tell you exactly what either of them should be. No-arbitrage
reasoning within a stochastic model for traded security prices will give much more
specific but less reliable answers. In particular, if real-world derivative prices do not match
the "no-arbitrage" price from some model, that does not mean that an arbitrage is really
available. One must keep in mind model risk, the risk of losses arising from a portfolio
strategy when reality departs from a model. Thus there is an implicit conditionality in all our
model-based results: if this model were true, then we could specify the price to avoid arbitrage.
Whether a model is adequate for a specific business purpose is an empirical question.
The first part of this course is devoted to understanding the Black-Scholes model of stock
prices and using it to derive no-arbitrage prices for equity derivatives.
1.4. The Black-Scholes Model
Financial engineering in continuous time begins with the results of Black and Scholes,
published in 1973. Theory and practice have both come a long way since that time, partly
due to attempts to correct the shortcomings of their model. Nonetheless, their approach is
the point of departure for much of current practice.
The Black-Scholes model was intended to handle simple equity derivatives. In this model,
there are two securities, a riskless money market account and a risky stock. The price of
a share of the money market account is $M(t) = e^{rt}$, where r is a constant, continuously
compounded interest rate. The stock price follows a geometric Brownian motion:
$S(t) = S(0) \exp((\mu - \sigma^2/2)t + \sigma W(t))$, where $\mu$ is the expected return of the stock, $\sigma$ is its volatility,
and W is a Brownian motion, also known as a Wiener process. We will study this
stochastic process in detail in Chapter 2. It gives the stock's log returns a normal distribution.
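To see what geometric Brownian motion implies, one can sample the terminal stock price directly from the formula, using the fact that W(t) is normal with mean 0 and variance t. The sketch below is illustrative only: the parameter values are hypothetical, and the sample statistics are compared with the theoretical values (log return mean $(\mu - \sigma^2/2)t$, log return standard deviation $\sigma\sqrt{t}$, mean price $S(0)e^{\mu t}$).

```python
import math
import random
import statistics


def gbm_terminal_price(s0, mu, sigma, t, rng):
    """Sample S(t) = S(0) exp((mu - sigma^2/2) t + sigma W(t)), with W(t) ~ N(0, t)."""
    w_t = rng.gauss(0.0, math.sqrt(t))
    return s0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * w_t)


rng = random.Random(42)                      # fixed seed for reproducibility
s0, mu, sigma, t = 100.0, 0.08, 0.20, 1.0    # hypothetical parameters
prices = [gbm_terminal_price(s0, mu, sigma, t, rng) for _ in range(100_000)]
log_returns = [math.log(p / s0) for p in prices]

# Each line prints a sample statistic next to its theoretical value.
print(statistics.fmean(log_returns), (mu - 0.5 * sigma**2) * t)
print(statistics.stdev(log_returns), sigma * math.sqrt(t))
print(statistics.fmean(prices), s0 * math.exp(mu * t))
```

The mean log return is $\mu - \sigma^2/2$ per unit time rather than $\mu$. The term $-\sigma^2/2$ is what makes the expected price grow at rate $\mu$.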
An attraction of this model is that it yields a unique no-arbitrage price (rather than an
interval) for derivative securities such as the European call option. It also yields a hedging
strategy that a derivatives trader can use to eliminate all the risk associated with selling
derivatives. Moreover, some nice portfolio optimization results are available within this
model. It is important to recognize that these are not the right answers in an absolute sense.
Although they may appear inside boxes in textbooks, their validity depends on the validity
of the Black-Scholes model, which is inadequate. It rests on several false assumptions, many
of which are not easy to do away with--decades of research have gone into attempts to relax
them in order to model financial markets more accurately.
(1) The stock pays no dividends, and one can hold a negative number of shares of
the money market account and stock as well as a positive number. Financially,
such negative holdings mean borrowing money at the risk-free rate, and shorting
the stock with no borrow cost. Neither of these is possible. Market participants
are rightly perceived as credit risks, so one must pay a higher rate to borrow than
any debtor viewed as (approximately) risk-free. The institutional mechanisms for
shorting stocks are not trivial, and typically one who shorts must pay a fee to the
lender of the stock. We explore these issues later in the course: it is possible to
ameliorate some of these shortcomings without much trouble.
(2) One can buy or sell an unlimited number of shares of stock at time t at the price
S(t). It is not even true that one can both buy and sell a small number of shares
at a single price. Markets have different mechanisms, but they all cost money, so
there are always transaction costs. In a transaction, the buyer must pay more
than the seller receives, so that there is money to pay for the brokers, computers,
etc. that make a market function. Aside from this, there is the problem of finding a
counterparty, especially for large trades. Liquidity is a vague term for the ability of
a market to absorb large trades without disturbing the price. Because markets are
not perfectly liquid, one who desires to buy a large number of shares of a security
must raise the price offered in order to entice a sufficient number of shareholders to
sell; likewise a large seller must lower the price asked to attract potential buyers.
The change between the price before an order hits the market and the price at which
the trade actually takes place is known as slippage.
(3) The stock price is continuous. The discrete price systems for trading stocks (e.g.
eighths or tenths of a dollar) are not the major problem with this assumption. Rather,
it gives a dangerously misleading impression about financial risks. In reality, security
prices sometimes change in large jumps, due to news announcements, or from the
end of one trading session to the beginning of another.
(4) Stock price returns are lognormally distributed. This is also dangerously misleading.
Log stock price returns actually have tails much heavier than normal, especially the
negative tail. This means that the Black-Scholes model tends to underestimate the
probability of large losses.
(5) The risk-free interest rate and stock price volatility are deterministic. Over time
horizons that are not very short, this assumption is also inadequate. Both interest
rates and price volatility are stochastic in a very complicated way, not deterministic.
It is widely accepted that there are features of these stochastic processes (e.g. mean
reversion and autocorrelation) that are important to capture.
Despite the shortcomings of this model, geometric Brownian motion is a good basic
model for a stock price, and many superior models have it as their basis. Moreover, the
Black-Scholes model shapes the language and mindset of many practitioners: for instance,
option prices are often quoted in terms of the Black-Scholes implied volatility. This is the
value of the volatility $\sigma$ that, when plugged into the Black-Scholes formula, yields the actual
market price of the option. Thus a deep understanding of this model is a great benefit to a
practitioner. Essentially every model of continuous-time financial engineering relies on the
same mathematics of Brownian motion and stochastic integration, which we now study as a
preliminary to the Black-Scholes analysis.
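The notion of implied volatility mentioned above can be made concrete with a small numerical sketch. It relies on the standard Black-Scholes formula for a European call on a non-dividend-paying stock, which is not stated in this chapter, and it inverts that formula by bisection, which is one simple choice among several root-finding methods. All inputs are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, strike, r, sigma, maturity):
    """Standard Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma**2) * maturity) / (
        sigma * math.sqrt(maturity)
    )
    d2 = d1 - sigma * math.sqrt(maturity)
    return s0 * normal_cdf(d1) - strike * math.exp(-r * maturity) * normal_cdf(d2)


def implied_volatility(market_price, s0, strike, r, maturity,
                       lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; valid because the call price increases with sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black_scholes_call(s0, strike, r, mid, maturity) < market_price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip with hypothetical inputs: price the call at sigma = 0.25,
# then recover that volatility from the price alone.
price = black_scholes_call(100.0, 110.0, 0.03, 0.25, 0.5)
print(implied_volatility(price, 100.0, 110.0, 0.03, 0.5))  # close to 0.25
```

Quoting an option by its implied volatility therefore carries the same information as quoting its price, given the other inputs.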
1.5. Summary
Financial engineering is a variant of MS/OR/IE dealing with money. Its main problems
are derivative securities valuation and hedging, risk management, and portfolio optimization.
We model a financial market as a vector stochastic process of the prices of traded securities.
A financial agent controls its random wealth by executing a tame, self-financing
portfolio strategy.
An arbitrage is a way of getting something for nothing. Derivative security prices must fall
within certain bounds in order to avoid arbitrage. The notion of arbitrage and no-arbitrage
pricing may depend on a model, in which case they are subject to model risk.
The Black-Scholes model assumes a constant interest rate and a stock price following
geometric Brownian motion. It forms the basis for a common language among practitioners,
and we must study Brownian motion deeply in order to understand it. Although the Black-Scholes
model allows relatively easy computation of clean, closed-form results, it rests on
dramatically erroneous assumptions that undercut the validity of these results.
1.6. Problems
Problem 1.1. Assume the Black-Scholes model, and consider portfolio strategies in the
stock alone, over the time interval [0, 1]. For each of the two (static) portfolio strategies
(1) long one share of stock: $\theta_S(t) = 1$ for $t \in [0, 1]$
(2) short one share of stock: $\theta_S(t) = -1$ for $t \in [0, 1]$
answer each of the questions
(1) Is it self-financing?
(2) Is it tame?
(3) Is it an arbitrage?
Problem 1.2. There is a non-dividend-paying stock worth S(0) = 100 and a riskless
zero-coupon bond paying 1 at time T = 0.5 (years), which is now worth B(0) = 0.97, and
two European call options with maturity T on the stock. One is struck at K1 = 110 and is
worth 20, and the other is struck at K3 = 130 and is worth 5. Do not assume the Black-Scholes
or any other model. Assume only that the terminal stock price S(T) > 0 and the
terminal bond price B(T) = 1. Find no-arbitrage bounds (upper and lower) for the price of
a European call option on the stock with the same maturity T and strike K2 = 120. Hint:
try graphing the payoffs of
(1) the portfolio long one call struck at K1 and one struck at K3, and short two calls
struck at K2
(2) the portfolio long one call struck at K1 and short one struck at K2
CHAPTER 2
Brownian Motion and Stochastic Integration
In this chapter, we
(1) define standard, generalized, and geometric Brownian motion, and Brownian bridge
(2) illustrate the construction of Brownian motion
(3) define the Itô stochastic integral
(4) learn Itô's formula for computations with stochastic integrals
2.1. Definition of Brownian Motion
The fundamental stochastic process for this course is the Wiener process, also known
as standard Brownian motion. Brownian motion is named for the 19th-century biologist
Robert Brown, who described the random, erratic motion of minuscule pollen grains
suspended in water. The Wiener process is named for Norbert Wiener, the 20th-century
mathematician who formalized it rigorously.
Its usual representation is W (for Wiener process) or B (for Brownian motion), and we
will use the former. Recall W(t) is a random variable, the value of W at time t, $W(\omega)$ is a
sample path, the trajectory that W follows if the state of the world is $\omega$, and $W(\omega, t)$ is
the value of W at time t in state $\omega$. Thus the random variable W(t) is random because it is
a function of the unknown state $\omega$, while $W(\omega)$ is a function of time.
A definition of a Wiener process $W = (W_t,\ t \in [0, T])$ is:
(1) W(0) = 0.
(2) Each W(t) is a normal random variable with mean 0 and variance t.
(3) The increments W(t) - W(s) are stationary and independent.
(4) Each sample path $W(\omega)$ is a continuous function of time.
In the Black-Scholes model, $S(t) = S(0) \exp((\mu - \sigma^2/2)t + \sigma W(t))$, the Wiener process W describes the fundamental
uncertainty driving changes in stock price. We now discuss an economic interpretation
of the definition of the Wiener process in light of this model. This is a nice story, but a false
one: remember that the Black-Scholes model is not very accurate.
Suppose that news concerning a company's profit outlook arrives at a constant rate in a
stream of small nuggets of information, each of which is independent of the others. Imagine
taking the limit as nuggets get smaller and the rate goes up, so that we have a continuous
stream of news.
By the central limit theorem, the total impact of news over some time period is normally
distributed. This justifies Condition (2). "At a constant rate" implies stationarity of increments.
Recall that stationary means that the distribution of W(t) - W(s) depends only
on t - s, not t or s. The parameter $\sigma$ controls the variance of $\sigma W(t)$, which is proportional to
t, according to the central limit theorem. Independence of the news items implies independence
of increments. Continuity of the sample path follows because we are imagining the
nuggets of information as infinitesimally small.
Condition (1) is just a standardization. The constant S(0) is the initial stock price.
Likewise the constant $\mu$ controls the expected growth rate of the stock, so the assertion of
mean 0 in Condition (2) is also a standardization.
In Problem 2.2, you will show that the stochastic process S does not have stationary or
independent increments. Nonetheless, Condition (3) does imply that the percentage returns
$S(t)/S(s) - 1$ of the stock, like its log returns $\ln(S(t)/S(s)) = \ln S(t) - \ln S(s)$, are stationary
and independent. In economic terms, this is a very strong assumption: the uncertainty in
the growth of the stock's value is the same at all times, and does not depend on its past
vicissitudes.
This all makes a nice story, but it is not true. The exact nature of stock price dynamics
remains unclear, but they are not so simple.
We now turn to the distributional properties of W, and in particular, its expectation and
covariance functions. For positive times s < t, $\mu_W(t) = E[W(t)] = 0$ and
$$\begin{aligned}
c_W(s, t) &= \mathrm{Cov}[W(s), W(t)] \\
&= \mathrm{Cov}[W(s) - W(0),\ (W(s) - W(0)) + (W(t) - W(s))] \\
&= \mathrm{Cov}[W(s) - W(0),\ W(s) - W(0)] + \mathrm{Cov}[W(s) - W(0),\ W(t) - W(s)] \\
&= \mathrm{Cov}[W(s), W(s)] + 0 \\
&= s.
\end{aligned}$$
Each increment W(t) - W(s) is a normal random variable with mean 0 and variance t - s.
For an increasing finite sequence of times $(t_1, \ldots, t_n)$, the distribution of the random vector
$(W(t_1), \ldots, W(t_n))$ is multivariate normal with mean vector zero and covariance matrix
$$\begin{pmatrix}
t_1 & t_1 & \cdots & t_1 \\
t_1 & t_2 & \cdots & t_2 \\
\vdots & \vdots & \ddots & \vdots \\
t_1 & t_2 & \cdots & t_n
\end{pmatrix}.$$
It is important to remember that these statements about joint distributions are more than
just statements about marginal distributions. Here we have said that W(s) and W(t) have a
bivariate normal distribution, both have mean zero, their variances are s and t respectively,
and their correlation is $\mathrm{Cov}[W(s), W(t)] / \sqrt{\mathrm{Var}[W(s)]\,\mathrm{Var}[W(t)]} = s/\sqrt{st} = \sqrt{s/t}$. Thus
W(s) and W(t) are dependent, but not totally dependent, as we can see from the equation
W(t) = W(s) + (W(t) - W(s)), where the increment W(t) - W(s) is independent of W(s).
On the other hand, we can find random variables with the same marginal distributions, one
N(0, s) and the other N(0, t), but with a different joint distribution.
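The covariance and correlation of W(s) and W(t) can be checked by simulation. The sketch below builds each sample by summing independent normal increments, as in the decomposition W(t) = W(s) + (W(t) - W(s)); the times s = 1 and t = 4 and the function name are arbitrary choices for illustration.

```python
import math
import random
import statistics


def sample_wiener(times, rng):
    """Sample (W(t_1), ..., W(t_n)) at increasing times from independent increments."""
    w, previous, path = 0.0, 0.0, []
    for t in times:
        w += rng.gauss(0.0, math.sqrt(t - previous))  # increment ~ N(0, t - previous)
        previous = t
        path.append(w)
    return path


rng = random.Random(1)  # fixed seed for reproducibility
s, t = 1.0, 4.0
samples = [sample_wiener([s, t], rng) for _ in range(100_000)]
w_s = [path[0] for path in samples]
w_t = [path[1] for path in samples]

mean_s, mean_t = statistics.fmean(w_s), statistics.fmean(w_t)
covariance = statistics.fmean((a - mean_s) * (b - mean_t) for a, b in zip(w_s, w_t))
correlation = covariance / (statistics.pstdev(w_s) * statistics.pstdev(w_t))

print(covariance, min(s, t))           # theory: Cov[W(s), W(t)] = s
print(correlation, math.sqrt(s / t))   # theory: sqrt(s/t) = 0.5
```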
Example 2.1.1. Suppose that the random variable X is normally distributed with mean
zero and variance s. Now define $Y = \sqrt{t/s}\,X$. Then Y is normally distributed with mean
zero and variance t. However, the joint distribution of X and Y is not the same as that of
W(s) and W(t). The covariance of X and Y is $\mathrm{Cov}[X, \sqrt{t/s}\,X] = \sqrt{t/s}\,\mathrm{Var}[X] = \sqrt{st}$, so
their correlation is one, illustrating that they have total dependence. Given X, we know the
value of Y . In this case, we have a degenerate bivariate normal distribution.
Example 2.1.2 (thanks to Sebastien). Again, suppose that the random variable X is
normally distributed with mean zero and variance s. Let U be independent of X, taking
on the values 1 or -1 with equal probability. Then let $Y = \sqrt{t/s}\,UX$. This is normal with
mean zero and variance t, but X and Y do not have a bivariate normal joint distribution
at all. Given X, we know that Y is either $\sqrt{t/s}\,X$ or $-\sqrt{t/s}\,X$. This does not conform to the nature
of the bivariate normal distribution, which is that one random variable must be normally
distributed given the other.
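The contrast between the three joint distributions can be checked by simulation. The following is a minimal sketch, assuming NumPy is available; the times $s = 1$, $t = 4$, the sample size, and all variable names are illustrative choices, not part of the text. It estimates the correlation of $(W(s), W(t))$ and of the pairs in Examples 2.1.1 and 2.1.2, which should be near $\sqrt{s/t}$, 1, and 0 respectively, even though the marginal distributions agree.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
s, t, n = 1.0, 4.0, 200_000      # illustrative times and sample size (assumptions)

# (W(s), W(t)): W(t) is W(s) plus an independent N(0, t - s) increment.
w_s = rng.normal(0.0, np.sqrt(s), n)
w_t = w_s + rng.normal(0.0, np.sqrt(t - s), n)

# Example 2.1.1: Y = sqrt(t/s) X, totally dependent on X.
x = rng.normal(0.0, np.sqrt(s), n)
y_dep = np.sqrt(t / s) * x

# Example 2.1.2: Y = sqrt(t/s) U X, with U an independent random sign.
u = rng.choice([-1.0, 1.0], size=n)
y_sign = np.sqrt(t / s) * u * x

corr_wiener = np.corrcoef(w_s, w_t)[0, 1]   # theory: sqrt(s/t)
corr_dep = np.corrcoef(x, y_dep)[0, 1]      # theory: 1
corr_sign = np.corrcoef(x, y_sign)[0, 1]    # theory: 0
print(corr_wiener, corr_dep, corr_sign)
```

All three second components have sample variance close to $t$, so the marginals match; only the dependence structure differs.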
It seems plausible that there exist a state space and probability measure that make it
possible to construct a stochastic process with such distributional properties. Yet Condition
(4) is about sample paths, not about distributions at all! It is a nontrivial mathematical
fact that it is possible to construct a Wiener process, which has these distributions and continuous
sample paths. The economic significance of continuity is that it facilitates hedging,
as we will see in the Black-Scholes analysis. When jumps are possible, one frequently finds
that markets are incomplete, that is, not all contingent claims can be perfectly hedged.
This considerably complicates the task of pricing and hedging derivative securities.
To illustrate the nontriviality of continuous sample paths, consider a Poisson process $N$
with parameter $\lambda$, which like a Wiener process $W$ begins at 0 and has stationary and independent
increments. The difference is that $N(t)$ has the Poisson distribution with parameter
$\lambda t$ while $W(t)$ has the normal distribution with parameters 0 and $t$. But it is not possible
to construct a Poisson process with continuous sample paths. Indeed, a Poisson process is a
pure jump process, i.e. its sample paths change only when they jump discontinuously.
In Section 2.2, we will construct a Wiener process as the limit of a sequence of simpler,
approximating stochastic processes. Later we will be content to let the formal underpinnings
of Brownian motion remain out of sight, although not out of mind!
Several processes of interest to us derive from the Wiener process, or standard Brownian
motion: generalized Brownian motion, geometric Brownian motion, and the Brownian
bridge. Standard Brownian motion is a special case of generalized Brownian motion,
which is $X(t) = X(0) + \mu t + \sigma W(t)$ in the one-dimensional case. The parameter
$\mu$ is the drift, which controls how fast the process grows on average, and $\sigma$ is the volatility,
which controls the size of its fluctuations. The expectation and covariance functions are
$\mu_X(t) = X(0) + \mu t$ and $c_X(s, t) = \sigma^2 s$ for $s \le t$. An $m$-dimensional generalized Brownian
motion is $X(t) = X(0) + \mu t + AW(t)$, where $W$ is $n$-dimensional standard Brownian motion,
$A$ is an $m \times n$ matrix, and $\mu$ is a column $m$-vector of drifts. Then the covariance matrix of
$X(t)$ is $AA^\top t$:
\[
\mathrm{Cov}[X_i(t), X_j(t)]
= \mathrm{Cov}\Big[\sum_{k=1}^n a_{ik} W_k(t),\ \sum_{k=1}^n a_{jk} W_k(t)\Big]
= \sum_{k=1}^n a_{ik} a_{jk}\, t
= (AA^\top)_{ij}\, t.
\]
Generalized Brownian motion allows for nonzero average growth rates, different variability of
components, and dependence between components. It is sometimes referred to as "Brownian
motion" simply. It still has independent and stationary increments and normal marginal
distributions, but with more general mean and covariance.
Brownian bridge is so named because it is constructed from Brownian motion, and like
a bridge connects two given points while having some freedom to change its height (value)
in between. The definition of a (standard) Brownian bridge is $Z(t) = W(t) - tW(1)$ for
$t \in [0, 1]$ only. Then $Z(0) = Z(1) = 0$, so (standard) Brownian bridge is constrained to start
at 0 at time 0 (like standard Brownian motion) and also end at 0 at time 1. A generalized
Brownian bridge can start and end at different times $a$ and $b$ and values $Z(a)$ and $Z(b)$, and
have a different volatility parameter $\sigma$. Then it would be defined for $t \in [a, b]$ as
\[
Z(t) = Z(a) + \sigma W(t) + \frac{t - a}{b - a}\,\big(Z(b) - Z(a) - \sigma W(b)\big),
\]
where $W$ is understood as a Wiener process started at time $a$, so that $W(a) = 0$.
Drift is not a parameter for a Brownian bridge: this role is played by the given slope
$(Z(b) - Z(a))/(b - a)$.
Geometric Brownian motion is a transformation of Brownian motion:
\[ Y(t) = \exp(X(t)) = Y(0) \exp(\mu t + \sigma W(t)) \]
where X(0) = ln(Y (0)). The Black-Scholes model of a stock price is an example of a geometric
Brownian motion. It does not have independent or stationary increments. Whereas
Brownian motion changes in an additive or arithmetic fashion, geometric Brownian motion
changes in a multiplicative or geometric fashion. Its "multiplicative increments" Y (t)/Y (s)
are independent, stationary, and have a lognormal distribution. That is, ln(Y (t)/Y (s)) is
normal. This process is very important in finance, because (contra Malthus) economic quantities
tend to grow in a multiplicative i.e. geometric fashion. For instance, the Black-Scholes
model of a stock price uses geometric Brownian motion rather than just generalized Brownian
motion because a stock price should be positive, and (all else being equal) a share priced
at $60 is much likelier to go up today by $1 than is a share priced at $2.
Example 2.1.3 (Binary call). A binary call option pays $f(S(T)) = 1\{S(T) \ge K\}$ at
maturity $T$. That is, it pays 1 if $S(T) \ge K$ and 0 otherwise. What is the probability, under
the Black-Scholes model, that the binary call pays off? It is
\[
\begin{aligned}
P[S(T) \ge K] &= P\big[S(0) \exp((\mu - \sigma^2/2)T + \sigma W(T)) \ge K\big] \\
&= P\left[W(T) \ge \frac{\ln(K/S(0)) - (\mu - \sigma^2/2)T}{\sigma}\right] \\
&= P\left[\frac{W(T)}{\sqrt{T}} \ge \frac{\ln(K/S(0)) - (\mu - \sigma^2/2)T}{\sigma\sqrt{T}}\right] \\
&= \Phi\left(\frac{\ln(S(0)/K) + (\mu - \sigma^2/2)T}{\sigma\sqrt{T}}\right)
\end{aligned}
\]
where $\Phi$ is the standard normal cumulative distribution function, because $W(T)/\sqrt{T}$ is a
standard normal random variable. Here we are using the property that when $Z$ is standard
normal, $P[Z \ge x] = P[Z \le -x] = \Phi(-x)$ by symmetry.
This probability is the expected payoff of the binary call, so trading the binary call at
the price $P[S(T) \ge K]$ would correspond to a "fair gamble." However, it is not its "fair
price." After all, is the stock or the money market account a fair gamble? Would the prices
making them into fair gambles be fair prices? Would you be inclined to buy them for those
prices? We will later find the no-arbitrage price for a binary call.
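The payoff probability of Example 2.1.3 is easy to evaluate numerically. The sketch below uses only the Python standard library; the function names and the parameter values ($S(0) = 100$, $K = 110$, $\mu = 0.08$, $\sigma = 0.2$, $T = 1$) are illustrative assumptions. It computes the closed form and compares it with a Monte Carlo estimate that samples $W(T)$ directly.

```python
from math import exp, log, sqrt
from statistics import NormalDist
import random

def binary_call_prob(s0, k, mu, sigma, t):
    """P[S(T) >= K] under the Black-Scholes model, as in Example 2.1.3."""
    d = (log(s0 / k) + (mu - 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    return NormalDist().cdf(d)

def binary_call_prob_mc(s0, k, mu, sigma, t, n_paths=200_000, seed=0):
    """Monte Carlo estimate of the same probability, sampling W(T) ~ N(0, T)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, sqrt(t))
        s_t = s0 * exp((mu - 0.5 * sigma**2) * t + sigma * w_t)
        hits += s_t >= k          # payoff indicator 1{S(T) >= K}
    return hits / n_paths

# Illustrative parameters (assumptions, not from the text).
p_exact = binary_call_prob(100.0, 110.0, 0.08, 0.2, 1.0)
p_mc = binary_call_prob_mc(100.0, 110.0, 0.08, 0.2, 1.0)
print(p_exact, p_mc)
```

This is the probability under the model's real-world drift $\mu$, that is, the "fair gamble" value discussed above, not the no-arbitrage price.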
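Example 2.1.4 (Black-Scholes loss probability). Suppose that you plan to hold a portfolio
of $\theta_0$ shares of the money market account and $\theta_1$ shares of stock for the whole time
interval $[0, T]$, and you are concerned about the probability of having lost money at time
$T$, that is, of the event $V(T) < V(0)$. Assume that the Black-Scholes model holds. Then
this event is $\theta_1 S(T) + \theta_0 M(T) < \theta_1 S(0) + \theta_0 M(0)$, or
\[
\theta_1 S(0) \exp((\mu - \sigma^2/2)T + \sigma W(T)) < \theta_1 S(0) + \theta_0 (1 - e^{rT}).
\]
(Recall $M(t) = e^{rt}$.) By a computation similar to that of Example 2.1.3, taking $\theta_1 > 0$
so that dividing by $\theta_1 S(0)$ preserves the inequality, the probability of this event is
\[
P\left[W(T) < \frac{1}{\sigma}\left(\ln\left(1 + \frac{\theta_0 (1 - e^{rT})}{\theta_1 S(0)}\right)
- \left(\mu - \frac{1}{2}\sigma^2\right) T\right)\right]
= \Phi\left(\frac{\ln\left(1 + \frac{\theta_0 (1 - e^{rT})}{\theta_1 S(0)}\right)
- \left(\mu - \frac{1}{2}\sigma^2\right) T}{\sigma\sqrt{T}}\right).
\]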
2.2. Construction of Brownian Motion
For a more rigorous exposition of this material, see [KS91, §2.3]. Our approach starts
with an infinite sequence of independent standard normal random variables $(Z_n, n \in \mathbb{N})$. The
strategy is to use this sequence of independent normals to construct a sequence of stochastic
processes $W^{(m)}$ that converge to a Wiener process for $t \in [0, 1]$. We want the limit stochastic
process given by $W(\omega, t) = \lim_{m \to \infty} W^{(m)}(\omega, t)$ to satisfy the definition of the Wiener process.
The point is that each $W^{(m)}$ involves only a finite number of random variables, so we avoid
the worst perplexities of measure theory, at the price of having to think about convergence.
The construction of $W^{(m)}$ begins with the Haar functions $H_n : [0, 1] \to \mathbb{R}$ defined as
$H_1(t) = 1$ and
\[
H_{2^m + k}(t) =
\begin{cases}
2^{m/2} & \text{for } t \in \left[\frac{k-1}{2^m},\ \frac{k-1}{2^m} + \frac{1}{2^{m+1}}\right) \\
-2^{m/2} & \text{for } t \in \left[\frac{k}{2^m} - \frac{1}{2^{m+1}},\ \frac{k}{2^m}\right) \\
0 & \text{otherwise}
\end{cases}
\]
where $2^m + k = n$ is the unique representation of $n$ such that $k$ is an integer between 1 and
$2^m$. Note that $(k - 1)/2^m + 1/2^{m+1} = k/2^m - 1/2^{m+1}$. For an equivalent definition, see
[Mik98, pp. 51–52]. The first four Haar functions are
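\[
H_1(t) = 1 \ \text{for } t \in [0, 1], \qquad
H_2(t) =
\begin{cases}
1 & \text{for } t \in [0, 1/2) \\
-1 & \text{for } t \in [1/2, 1]
\end{cases}
\]
\[
H_3(t) =
\begin{cases}
\sqrt{2} & \text{for } t \in [0, 1/4] \\
-\sqrt{2} & \text{for } t \in (1/4, 1/2] \\
0 & \text{for } t \in (1/2, 1]
\end{cases}
\qquad
H_4(t) =
\begin{cases}
0 & \text{for } t \in [0, 1/2) \\
\sqrt{2} & \text{for } t \in [1/2, 3/4] \\
-\sqrt{2} & \text{for } t \in (3/4, 1]
\end{cases}
\]
Next come the Schauder functions, defined as the integrals of the Haar functions:
$\tilde{H}_n(t) = \int_0^t H_n(s)\,ds$. See [Mik98, pp. 53–54] for pictures. The approximate Wiener process defined
for $t \in [0, 1]$ is
\[
W^{(m)}(\omega, t) = \sum_{n=1}^{2^m} Z_n(\omega)\,\tilde{H}_n(t).
\]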
It is not theoretically difficult to check that the limit process $W(t) = \sum_{n=1}^{\infty} Z_n \tilde{H}_n(t)$
has the right distribution. Each random vector of the form $(W^{(m)}(t_1), \ldots, W^{(m)}(t_n))$ has a
multivariate normal distribution with mean zero, so this remains true for the limit vector
$(W(t_1), \ldots, W(t_n))$. As for the covariance matrix,
\[
\mathrm{Cov}[W(s), W(t)]
= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \tilde{H}_i(s)\,\tilde{H}_j(t)\,\mathrm{Cov}[Z_i, Z_j]
= \sum_{i=1}^{\infty} \tilde{H}_i(s)\,\tilde{H}_i(t)
\]
and it is a (non-obvious) property of the Schauder functions that this equals $\min\{s, t\}$, as
desired.
What about continuity of sample paths? Just because $W(\omega)$ is the limit of continuous
sample paths $W^{(m)}(\omega)$ does not prove that it is continuous. In general the conclusion that
the limit of continuous functions is itself continuous is justified when the convergence is
uniform, meaning that the rate of convergence does not get too slow for some $t$. We avoid
the details, mentioning only that this condition is met here (with probability one) because
the normal density of the independent $Z_i$'s has very light tails: recall that this density is
proportional to $e^{-z^2/2}$, which is very small for large values $z$. This means that it is unlikely
for the convergence of $W^{(m)}(\omega, t)$ to get "held up" far from its limit $W(\omega, t)$ by the repeated
appearance of large and influential values $Z_n(\omega)$. So we regard as successful this construction
of a Wiener process with the right distribution and continuous sample paths.
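The construction can be tried out numerically. The following is a minimal sketch, assuming NumPy; the function names `schauder`, `approx_wiener`, and `cov_partial_sum` are hypothetical helpers introduced for illustration. It evaluates the Schauder functions in closed form (each is a tent, so no numerical integration is needed), builds one sample path of $W^{(m)}$, and computes the partial sum $\sum_{n \le 2^m} \tilde{H}_n(s)\tilde{H}_n(t)$, which should approach $\min\{s, t\}$.

```python
import numpy as np

def schauder(n, t):
    """Schauder function: integral over [0, t] of the n-th Haar function, t in [0, 1]."""
    t = np.asarray(t, dtype=float)
    if n == 1:
        return t.copy()                      # H_1 = 1, so its integral is t
    m = (n - 1).bit_length() - 1             # n = 2^m + k with 1 <= k <= 2^m
    k = n - 2**m
    left, right = (k - 1) / 2**m, k / 2**m
    mid = left + 1 / 2**(m + 1)
    rise = np.clip(t, left, mid) - left      # time spent where H_n = +2^(m/2)
    fall = np.clip(t, mid, right) - mid      # time spent where H_n = -2^(m/2)
    return 2**(m / 2) * (rise - fall)

def approx_wiener(z, t):
    """W^(m)(t) = sum over n of z[n-1] * schauder(n, t), with len(z) = 2^m."""
    return sum(z[n - 1] * schauder(n, t) for n in range(1, len(z) + 1))

def cov_partial_sum(s, t, m):
    """Covariance of W^(m)(s) and W^(m)(t): partial sum of Schauder products."""
    return sum(float(schauder(n, s)) * float(schauder(n, t))
               for n in range(1, 2**m + 1))

rng = np.random.default_rng(0)
m = 8                                        # illustrative level (assumption)
z = rng.standard_normal(2**m)                # the independent normals Z_1, ..., Z_{2^m}
grid = np.linspace(0.0, 1.0, 2**m + 1)
path = approx_wiener(z, grid)                # one approximate sample path
cov_03_07 = cov_partial_sum(0.3, 0.7, m)     # compare with min(0.3, 0.7)
print(path[-1], z[0], cov_03_07)
```

Since every Schauder function other than the first vanishes at $t = 1$, the endpoint $W^{(m)}(1)$ equals $Z_1$ for every $m$; the later terms only refine the path in between.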
So far we have relied on the "nice" properties of the sequences of standard normal random
variables $Z_i$ and Schauder functions $\tilde{H}_n$ surviving after convergence. However, the sequence
of Schauder functions also has a "bad" property: its derivative is unbounded in $n$, because
the sequence of Haar functions is unbounded in $n$. Indeed, the sample paths $W(\omega)$ of a
Wiener process are not differentiable, with probability one.
For the derivative of the sample path $W(\omega)$ at $t = 0$ to exist and be finite, it is necessary
that the slope $(W(\omega, s) - W(\omega, 0))/(s - 0)$ be bounded for $s$ sufficiently near 0. Formally,
there must exist a finite bound $x$ and a positive time $t > 0$ such that for all $s \in (0, t]$, the
absolute value of the slope $|(W(\omega, s) - W(\omega, 0))/(s - 0)| = |W(\omega, s)/s| \le x$. The probability
that this does not happen is greater than or equal to $P(|W(s)/s| > x)$, for any particular
$s \in (0, t]$. But
\[
\lim_{s \to 0^+} P(|W(s)/s| > x)
= \lim_{s \to 0^+} P\big(|W(1)|/\sqrt{s} > x\big)
= \lim_{s \to 0^+} 2\Phi(-x\sqrt{s}) = 1,
\]
because both $W(s)/s$ and $W(1)/\sqrt{s}$ have a normal distribution with mean 0 and variance
$1/s$, which is unbounded for small $s$. This shows that with probability one, the derivative
would have to be larger than any finite $x$, so the sample path cannot have a derivative at
$t = 0$.
2.3. Problems
Problem 2.1. In Example 2.1.4, assume the interest rate $r > 0$ and the drift $\mu > \sigma^2/2$,
and that the numbers of shares $\theta_0$ and $\theta_1$ are both strictly positive. Say whether the
probability of loss
* increases
* decreases
* stays the same
* can't tell
in each of the following scenarios:
(1) The interest rate $r$ increases.
(2) The drift $\mu$ increases.
(3) The volatility $\sigma$ increases.
(4) The time horizon $T$ increases.
(5) The ratio $\theta_0 M(0)/(\theta_1 S(0))$, of wealth in the money market account to wealth in the
stock, increases.
Give your reasoning in each case.
Problem 2.2. Show that the increments of a geometric Brownian motion Y are neither
independent nor stationary. Hint: look at the increments Y (s) - Y (0) and Y (2s) - Y (s).
Problem 2.3. What are the mean and covariance functions of a Brownian bridge $Z$
on the interval $[0, T]$, starting at value $Z(0)$ and ending at $Z(T)$, with volatility parameter
$\sigma$? Its definition is $Z(t) = Z(0) + \sigma W(t) + (Z(T) - Z(0) - \sigma W(T))(t/T)$. At what time
$t$ is the variance $\mathrm{Var}[Z(t)]$ at its maximum? Hint: check that your mean and covariance
functions are correct for the special case of standard Brownian bridge: $\mu_Z(t) = 0$ and
$c_Z(s, t) = \min\{s, t\} - st$.
Problem 2.4. Let $(W(t), t \in [0, 1])$ be a Wiener process. Show that the stochastic
process $\tilde{W}$ defined for $t \in [0, T]$ by $\tilde{W}(t) = \sqrt{T}\,W(t/T)$ is a Wiener process.
Problem 2.5. What are the mean and covariance functions of the approximate Wiener
processes $W^{(m)}$ for $m \in \{0, 1, 2\}$? For the covariance function, give $c_{W^{(m)}}(s, t)$ only for the
case $0 \le s \le t \le 1$. Present your answer by dividing this region of pairs of times into 3
sub-regions for $m = 1$ and 10 sub-regions for $m = 2$. Next define $G^{(m)}$ as the set of pairs
$(s, t)$ for which $c_{W^{(m)}}(s, t) = s$, as it does for true Brownian motion. What happens to this
set $G^{(m)}$ as $m$ grows? Warning: this problem is computationally intensive.
2.4. Definition of Stochastic Integration
Now that we understand the Wiener process, we can investigate integration and differential
equations involving it. This piece of mathematics is associated with the name of Kiyosi
Itô, a Japanese mathematician working during World War II on the problem of controlling
long-range rockets. A fundamental tool for handling stochastic differential equations is Itô's
formula, which we discuss in Section 2.5. This comes into play in our first attack on the
Black-Scholes option pricing formula, in Chapter 3.
In the approach we take to developing this subject, integration is more fundamental than
differential equations. In this section, we develop the Itô stochastic integral by extension
from integration of discrete-time stochastic processes. This relates to our motivation, which
is to compute the gains process of a self-financing portfolio.
The gain from a discrete-time, $m$-step portfolio strategy $\theta$ in $N$ assets with price vector
$S$ over the time interval $[0, t]$ is
\[
G(t) = \sum_{j=1}^{m} \theta(t_{j-1}) \cdot (S(t_j) - S(t_{j-1}))
= \sum_{i=1}^{N} \sum_{j=1}^{m} \theta_i(t_{j-1}) (S_i(t_j) - S_i(t_{j-1})). \tag{2.4.1}
\]
To evaluate the gain from a continuous-time portfolio strategy, we must define a stochastic
integral $\int_0^T \theta_i(t)\,dS_i(t)$ to replace the stochastic sum
$\sum_{j=1}^{m} \theta_i(t_{j-1})(S_i(t_j) - S_i(t_{j-1}))$. In a
similar way, the familiar Riemann integral replaces $\sum_{j=1}^{m} f(t_{j-1})(t_j - t_{j-1})$ with $\int_0^T f(t)\,dt$
by setting each $t_j - t_{j-1}$ to be a constant $\Delta t$, and taking a limit as $\Delta t$ goes to 0. However,
the situation is not nearly so simple here, because $S$ is a stochastic process.
The grand strategy for constructing the Itô stochastic integral $\int X(t)\,dW(t)$ is:
(1) Define the Itô integral for the aptly named simple processes.
(2) Approximate a more general process $X$ with a sequence of simple processes $C^{(m)}$.
(3) The Itô integral of $X$ is the limit of the integrals of $C^{(m)}$.
Unfortunately, this construction is very opaque because it does not work pathwise. That
is, we do not define from the sample paths $W(\omega)$ of the Wiener process and $X(\omega)$ of the
integrand process a path $\int_0^t X(\omega, s)\,dW(\omega, s)$ for the Itô integral. Instead, the integral's
definition involves the convergence of stochastic processes. Remember that the Itô integral
$\int_0^t X(s)\,dW(s)$ is a random variable if $t$ is regarded as a fixed time, and a stochastic process
if $t$ is regarded as the time index of the stochastic process.
A simple process $C$ on $[0, T]$ is one that for some partition $(t_0, \ldots, t_n)$ satisfies
\[ C(t) = C(t_{i-1}) \quad \text{for } t \in [t_{i-1}, t_i). \]
A simple process is not as complicated as a full-blown stochastic process, which has a different
random variable associated with each time. A simple process contains only a finite number
of different random variables, namely $(C(t_i), i = 0, \ldots, n)$.
Example 2.4.1 (Simple processes). Take a set of time intervals $[a_1, b_1), \ldots, [a_m, b_m)$ and
a set of random variables $X_1, \ldots, X_m$. Then the process $C(t) = \sum_{k=1}^{m} X_k 1_{[a_k, b_k)}(t)$ is simple.
The approximate Wiener processes $W^{(m)}$ of Section 2.2 are not simple, because the Schauder
functions change continuously over time. A Poisson process is not simple: although its sample
paths are piecewise constant, the times at which its value changes are random, not fixed. It
is possible that the Poisson process changes its value on any subinterval $[t_{i-1}, t_i]$; the random
variables $N(t_{i-1})$ and $N(t_i)$ are never the same.
For a simple process, the Itô integral is just a sum:
\[
\int_0^T C(t)\,dW(t) = \sum_{i=1}^{n} C(t_{i-1}) (W(t_i) - W(t_{i-1})).
\]
Notice the similarity to the discrete-time gains process mentioned at the beginning of this
section. The Itô integral of a simple process $C$ on $[0, T]$ is a random variable, so we would
like to be able to compute its moments. Its expectation and variance are
\[
E\left[\int_0^T C(t)\,dW(t)\right] = 0 \tag{2.4.2}
\]
\[
\mathrm{Var}\left[\int_0^T C(t)\,dW(t)\right]
= E\left[\left(\int_0^T C(t)\,dW(t)\right)^2\right]
= \int_0^T E\big[(C(t))^2\big]\,dt. \tag{2.4.3}
\]
The latter equation is called the Itô isometry, because it shows how two measures (metrics)
are the same (iso): if you measure the size of the stochastic integral $\int_0^T C(t)\,dW(t)$ by its
variance, you get the same result as if you measure the size of the stochastic process $C(t)$ by
the time integral of its second moment over $[0, T]$. We defer derivations of these properties
until Section 4.3.
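Equations (2.4.2) and (2.4.3) can be illustrated by Monte Carlo for one particular simple process. The sketch below, assuming NumPy, takes the hypothetical choice $C(t) = W(t_{i-1})$ on $[t_{i-1}, t_i)$ for a four-step partition of $[0, 1]$; the partition, sample size, and variable names are illustrative. For this choice $E[C(t)^2] = t_{i-1}$ on each subinterval, so the right side of the isometry is $\sum_i t_{i-1}(t_i - t_{i-1})$.

```python
import numpy as np

rng = np.random.default_rng(1)
times = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # partition t_0 < ... < t_n (assumption)
n_paths = 200_000

dt = np.diff(times)
# Independent increments W(t_i) - W(t_{i-1}) ~ N(0, t_i - t_{i-1}), one row per path.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, dt.size))
W = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)])  # W(t_0), ..., W(t_n)

C = W[:, :-1]                            # simple process: C(t_{i-1}) = W(t_{i-1})
ito_integral = np.sum(C * dW, axis=1)    # sum of C(t_{i-1}) (W(t_i) - W(t_{i-1}))

mc_mean = ito_integral.mean()            # (2.4.2): should be near 0
mc_var = ito_integral.var()              # (2.4.3): should be near isometry_rhs
isometry_rhs = np.sum(times[:-1] * dt)   # sum of t_{i-1} (t_i - t_{i-1}) = 0.375
print(mc_mean, mc_var, isometry_rhs)
```

The integrand is evaluated at the left endpoint of each subinterval, so it is independent of the increment it multiplies; that is what makes the mean vanish.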
For the moment, we can gain some intuition by making the simple process $C$ a deterministic
function $f$ of time, still piecewise constant and changing only at the times $t_1, \ldots, t_n$. In
that case we would have
\[
E\left[\int_0^T f(t)\,dW(t)\right] = \sum_{i=1}^{n} f(t_{i-1})\,E[W(t_i) - W(t_{i-1})] = 0
\]
and
\[
\begin{aligned}
\mathrm{Var}\left[\int_0^T f(t)\,dW(t)\right]
&= \mathrm{Var}\left[\sum_{i=1}^{n} f(t_{i-1}) (W(t_i) - W(t_{i-1}))\right] \\
&= \sum_{i=1}^{n} f^2(t_{i-1})\,\mathrm{Var}[W(t_i) - W(t_{i-1})] \\
&= \sum_{i=1}^{n} f^2(t_{i-1}) (t_i - t_{i-1}) \\
&= \int_0^T f^2(t)\,dt,
\end{aligned}
\]
which makes it clear that the integral $\int_0^T f(t)\,dW(t)$ is accumulating variance at rate $f(t)^2$
at time $t$.
We proceed to Itô integration of more general processes $X$, subject to the restriction that
$X(\omega, t)$ be a function of the Wiener process sample path history $(W(\omega, s), s \in [0, t])$, and
\[
\int_0^T E\big[(X(t))^2\big]\,dt < \infty. \tag{2.4.4}
\]
This is the defining criterion of Björk's class $\pounds^2$, see [Bjö98, Def. 3.3]. The former restriction
makes perfect sense from the standpoint of the application to finance: one's portfolio
strategy will be determined by the only relevant and available information, namely asset
price histories, which are all functions of the driving Wiener process. The latter restriction
is mathematically excessive, but it is convenient to adopt because of its relation to the Itô
isometry: it serves as a guarantee that the stochastic integral of $X$ will have finite variance.
Infinite variance is not a mathematically monstrous property. Real-world processes such
as insurance claims might well have infinite variance, but it is easier for us to deal with
finite-variance models.
It is not difficult to imagine constructing a sequence of simple processes $C^{(m)}$ that approximate
$X$ better and better, much as we saw how our approximate Wiener processes
$W^{(m)}$ got closer and closer to a true Wiener process. We do not need to investigate the
mathematical subtleties; rather, we will accept that this can be done and moreover that
no matter what sequence of approximating simple processes we use, we get the same limit
$\lim_{m \to \infty} \int_0^T C^{(m)}(t)\,dW(t)$ and call it $\int_0^T X(t)\,dW(t)$. This is by no means obvious.
Because we cannot see the workings of the Itô integral, we tend to lose intuition. We
will regain intuition after learning the rules of Itô integration and practicing it. In terms
of applications, think of it this way: if X represents a human response to observing the
phenomenon driven by a Wiener process W, the response can not be instantaneous and
continuous; perhaps a simple process C would be a better way to model human response. In
particular, this holds for our model of portfolio strategies. Nonetheless, we may be very close
to the continuous limit, which is actually more mathematically tractable to compute once
you know stochastic calculus: after all, who would want to make computations in physical
dynamics as sums over small discrete time intervals? When you first learn it, classical
calculus is more opaque than summation, but its convenience and elegance make up for
that once it has become familiar through use.
2.5. Itô's Formula
The major computational tool for the Itô integral is Itô's formula. (This is often called
Itô's lemma. Since we will use it to compute things, not prove things, we will call it a
formula, not a lemma.) This formula is a substitute for the chain rule of ordinary calculus:
\[
\frac{d}{dt} f(g(t)) = f'(g(t))\,g'(t),
\]
which leads to the equation
\[
\int_a^b f'(g(t))\,g'(t)\,dt = f(g(b)) - f(g(a)),
\]
so it is a computational tool for evaluating integrals as well as derivatives. This is equivalent
to making the formal substitution $g'(t)\,dt = dg(t)$. We would like to have similar rules
for manipulating infinitesimals such as $dt$ and $dW(t)$. We should regard these rules as a
shorthand for corresponding statements about integrals. Here are the rules for multiplying
the infinitesimals $dt$, $dW_1(t)$, and $dW_2(t)$, where $W$ is a multidimensional Wiener process,
which has independent components.
\[
\begin{array}{c|ccc}
\times & dt & dW_1(t) & dW_2(t) \\ \hline
dt & 0 & 0 & 0 \\
dW_1(t) & 0 & dt & 0 \\
dW_2(t) & 0 & 0 & dt
\end{array}
\]
Why? Imagine $dt \approx 0$ and $dW_i(t) = W_i(t + dt) - W_i(t)$ for $i = 1, 2$. The principle is
that $(dt)^2$ is much smaller than $dt$ when the latter is near 0, so $(dt)^2 \approx 0$. Anything going to
0 faster than $dt$ is relatively negligible. Then $dt\,dW_i(t)$ is a random variable with distribution
$N(0, (dt)^3)$, so $dt\,dW_i(t) \approx 0$. However $(dW_i(t))^2$ is a random variable with expectation $dt$
and variance
\[
\mathrm{Var}\big[(dW_i(t))^2\big] = E\big[(dW_i(t))^4\big] - \big(E\big[(dW_i(t))^2\big]\big)^2
= 3(dt)^2 - (dt)^2 = 2(dt)^2
\]
so $(dW_i(t))^2 \approx dt$. But $dW_1(t)\,dW_2(t)$ has expectation 0 and variance
\[
\begin{aligned}
\mathrm{Var}[dW_1(t)\,dW_2(t)] &= E\big[(dW_1(t)\,dW_2(t))^2\big] - \big(E[dW_1(t)\,dW_2(t)]\big)^2 \\
&= E\big[(dW_1(t))^2\big]\,E\big[(dW_2(t))^2\big] - 0 \\
&= (dt)^2
\end{aligned}
\]
so $dW_1(t)\,dW_2(t) \approx 0$.
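These heuristic rules are statements about sums over a fine partition, which can be seen numerically. The sketch below, assuming NumPy, splits $[0, T]$ into $n$ equal steps and accumulates each product from the table; the values $T = 1$ and $n = 100{,}000$ and the variable names are illustrative assumptions. Only the sum of $(dW_1)^2$ should stay near $T$; the others should be near 0.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 100_000              # horizon and number of steps (assumptions)
dt = T / n

# Increments of two independent Wiener processes over each step.
dW1 = rng.normal(0.0, np.sqrt(dt), n)
dW2 = rng.normal(0.0, np.sqrt(dt), n)

sum_dt_sq = n * dt**2            # sum of (dt)^2       -> 0
sum_dt_dW = np.sum(dt * dW1)     # sum of dt dW_1      -> 0
sum_dW_sq = np.sum(dW1**2)       # sum of (dW_1)^2     -> T
sum_dW1_dW2 = np.sum(dW1 * dW2)  # sum of dW_1 dW_2    -> 0
print(sum_dt_sq, sum_dt_dW, sum_dW_sq, sum_dW1_dW2)
```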
Using these heuristic rules, we derive (but do not prove) Itô's formula via a Taylor
expansion. For a sufficiently differentiable function $f$, whose $k$th derivative is $f^{(k)}$, and
partitioning the time interval $[s, T]$ into $m$ evenly spaced steps $t_1, \ldots, t_m$,
\[
\begin{aligned}
f(W(T)) - f(W(s)) &= \sum_{i=1}^{m} \big(f(W(t_i)) - f(W(t_{i-1}))\big) \\
&= \lim_{m \to \infty} \sum_{i=1}^{m} \sum_{k=1}^{\infty} \frac{1}{k!}\,
f^{(k)}(W(t_{i-1}))\,(W(t_i) - W(t_{i-1}))^k \\
&= \int_s^T \left( f'(W(t))\,dW(t) + \frac{1}{2} f''(W(t))\,(dW(t))^2 \right)
\end{aligned}
\]
and $(dW(t))^2 = dt$, so we get the formula
\[
f(W(T)) = f(W(s)) + \int_s^T f'(W(t))\,dW(t) + \frac{1}{2} \int_s^T f''(W(t))\,dt
\]
where the first integral is an Itô stochastic integral and the second is a Riemann integral
over time. Equivalently, in differential form,
\[
df(W(t)) = f'(W(t))\,dW(t) + \frac{1}{2} f''(W(t))\,dt. \tag{2.5.1}
\]
Compare this to the ordinary chain rule $dg(t) = g'(t)\,dt$. There is an extra term because
the squared infinitesimal changes of the Wiener process $W(t)$ are nonnegligible, unlike the
squared infinitesimal changes of the degenerate stochastic process $t$.
Itô's formula applies if $f$ is only twice continuously differentiable; we only use two derivatives
in the formula. Subsequent versions of Itô's formula will still need this differentiability
condition, but it will not be repeated explicitly.
Example 2.5.1 (Simple Itô computations).
(1) With the Itô formula, using $f(x) = x$, we can verify
\[
\int_s^T dW(t) = \int_s^T 1\,dW(t) = W(T) - W(s) - \frac{1}{2} \int_s^T 0\,dt = W(T) - W(s),
\]
which is what it ought to be.
(2) Suppose we want to evaluate $\int_0^T W(t)\,dW(t)$. Then to use Itô's formula, we need
$f'(W(t)) = W(t)$, i.e. $f'(x) = x$. Then by ordinary integration, $f(x) = x^2/2 + C$,
and by differentiation, $f''(x) = 1$. So the formula yields
\[
\int_0^T W(t)\,dW(t) = \left(\frac{W(T)^2}{2} + C\right) - \left(\frac{W(0)^2}{2} + C\right)
- \frac{1}{2} \int_0^T dt = \frac{1}{2}\big(W(T)^2 - T\big).
\]
From this we can see that it is always convenient to take the constant of integration
$C = 0$.
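The result of part (2) can be compared with the defining sum of the Itô integral on a single simulated path. The sketch below, assuming NumPy, approximates $\int_0^T W(t)\,dW(t)$ by a sum with the integrand evaluated at the left endpoint of each step, as in the simple-process definition; $T$, $n$, and the variable names are illustrative assumptions. It also computes the right-endpoint sum, which is not the Itô integral, to show that the choice of endpoint matters here.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 1.0, 200_000                          # horizon and number of steps (assumptions)
dW = rng.normal(0.0, np.sqrt(T / n), n)      # Wiener increments
W = np.concatenate([[0.0], np.cumsum(dW)])   # W(t_0), ..., W(t_n), with W(0) = 0

left_sum = np.sum(W[:-1] * dW)    # integrand at the LEFT endpoint: the Ito convention
right_sum = np.sum(W[1:] * dW)    # integrand at the right endpoint: a different limit
ito_value = 0.5 * (W[-1]**2 - T)  # (W(T)^2 - T) / 2 from Ito's formula
print(left_sum, ito_value, right_sum - left_sum)
```

The two sums differ by $\sum_i (W(t_i) - W(t_{i-1}))^2$, which is close to $T$ rather than 0; this is the same quadratic-variation effect that produces the extra term in Itô's formula.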
Before continuing to more interesting examples, we state two useful rules. Both are
subject to some technical conditions that we will disregard.
(1) The stochastic integral $\int_s^T f(t)\,dW(t)$ of a deterministic function $f$ has a normal
distribution with mean zero and variance $\int_s^T f^2(t)\,dt$. This is the variance given by the
Itô isometry; see the discussion of deterministic simple functions there for a justification.
When integrating a stochastic process, not a deterministic function, the integral does not
have to come out normal: this is important, not just a technicality.
(2) When $X(t) = g(W(t))$, the expectation
\[
E\left[\int_s^T X(t)\,dt\right] = \int_s^T E[X(t)]\,dt,
\]
that is, one may interchange expectation and integration. This makes sense because
expectation is a type of integration, and we are used to being able to change the
order of integration. In this class, you should feel free to interchange expectation
and integration without worrying about the technicalities.
This will help us in the following example.
Example 2.5.2 (Higher normal moments).
(1) Let's evaluate $\int_0^T W^2(t)\,dW(t)$. Here $f'(x) = x^2$, $f(x) = x^3/3$, and $f''(x) = 2x$, so
using Itô's formula,
\[
\int_0^T W^2(t)\,dW(t) = \frac{1}{3} W^3(T) - \frac{1}{3} W^3(0) - \frac{1}{2} \int_0^T 2W(t)\,dt
= \frac{1}{3} W^3(T) - \int_0^T W(t)\,dt
\]
and for the moment we are stuck, unable to deal with the time integral $\int_0^T W(t)\,dt$.
We will return to this example later. However, by interchanging expectation and
integration, we see that the expectation $E[\int_0^T W(t)\,dt] = \int_0^T E[W(t)]\,dt = 0$. The
expectation of the stochastic integral $\int_0^T W(t)^2\,dW(t)$ is also zero. Therefore $W^3(T)$
also has zero expectation. In particular, since $W(1)$ is standard normal, this shows
that the third moment $E[W^3(1)]$ of a standard normal is zero.
(2) Let's try to repeat this success for the fourth moment $E[W^4(1)]$. In this case,
$f(x) = x^4$, so $f'(x) = 4x^3$ and $f''(x) = 12x^2$. Thus
\[
W^4(1) = W^4(0) + 4 \int_0^1 W^3(t)\,dW(t) + 6 \int_0^1 W^2(t)\,dt
\]
and the first term is 0, the stochastic integral has 0 expectation, and the expectation
of the last term is
\[
6E\left[\int_0^1 W^2(t)\,dt\right] = 6 \int_0^1 E[W^2(t)]\,dt = 6 \int_0^1 t\,dt = 3
\]
by interchanging order again. Thus we have computed the fourth moment of a standard
normal random variable by stochastic integration, without any messy classical
integration of the probability density function.
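The same steps extend one moment further. As a sketch, take $f(x) = x^6$, so $f'(x) = 6x^5$ and $f''(x) = 30x^4$, and assume as above that the stochastic integral has zero expectation and that expectation and time integration may be interchanged. Since $W(t)$ has the same distribution as $\sqrt{t}\,W(1)$, part (2) gives $E[W^4(t)] = 3t^2$, and then:

```latex
\[
W^6(1) = W^6(0) + 6 \int_0^1 W^5(t)\,dW(t) + 15 \int_0^1 W^4(t)\,dt,
\]
\[
E[W^6(1)] = 15 \int_0^1 E[W^4(t)]\,dt = 15 \int_0^1 3t^2\,dt = 15 .
\]
```

This agrees with the known sixth moment of a standard normal random variable, $5 \cdot 3 \cdot 1 = 15$.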
2.6. Itô Processes
We begin by extending Itô's formula to allow the function to depend on time: we have
a function $f(t, x)$, and are interested in the increment $f(T, W(T)) - f(s, W(s))$, or the
differential $df(t, W(t))$. One important example is the stock price in the Black-Scholes model,
in which case $f(t, x) = S(0) \exp((\mu - \sigma^2/2)t + \sigma x)$.
The Taylor expansion now yields terms with $dt$, $dW(t) = W(t + dt) - W(t)$, and
$(dW(t))^2 = dt$. Remember that other terms, including $(dt)^2$ and $dt\,dW(t)$, are negligible.
We need the partial derivatives $f_x(t, x) = \partial f(t, x)/\partial x$, $f_t(t, x) = \partial f(t, x)/\partial t$, and
$f_{xx}(t, x) = \partial^2 f(t, x)/\partial x^2$. We get
\[
f(T, W(T)) = f(s, W(s)) + \int_s^T f_x(t, W(t))\,dW(t)
+ \int_s^T \left( f_t(t, W(t)) + \frac{1}{2} f_{xx}(t, W(t)) \right) dt
\]
or
\[
df(t, W(t)) = \left( f_t(t, W(t)) + \frac{1}{2} f_{xx}(t, W(t)) \right) dt + f_x(t, W(t))\,dW(t). \tag{2.6.1}
\]
Example 2.6.1 (Example 2.5.2 revisited). In Example 2.5.2, we got stuck when faced
with the time integral $\int_0^T W(t)\,dt$. To make this integral appear in Itô's formula, we try to
choose $f_t(t, x) = x$ and $f_{xx}(t, x) = 0$. One simple way of doing this is $f(t, x) = tx$, which has
$f_x(t, x) = t$. Then the formula says
\[
TW(T) = 0 + \int_0^T W(t)\,dt + \int_0^T t\,dW(t)
\quad \text{or} \quad
\int_0^T W(t)\,dt = \int_0^T (T - t)\,dW(t).
\]
This stochastic integral of a deterministic function is normal with zero mean and variance
$\int_0^T (T - t)^2\,dt = T^3/3$.
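The identity and the variance $T^3/3$ can be checked on discretized paths. The sketch below, assuming NumPy, uses a right-endpoint Riemann sum for the time integral and a left-endpoint sum for the stochastic integral; with these choices the two sums agree path by path (summation by parts), mirroring the identity above. The grid size, sample size, and variable names are illustrative assumptions, and the discretization introduces a small bias of order $1/n$ in the variance.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_steps, n_paths = 1.0, 200, 20_000     # illustrative sizes (assumptions)
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)                  # W(t_1), ..., W(t_n) on each path
t = dt * np.arange(1, n_steps + 1)         # grid times t_1, ..., t_n

# Riemann sum for the time integral of W over [0, T] (right endpoints).
time_integral = np.sum(W, axis=1) * dt
# Sum of (T - t_{j-1}) dW_j: integrand at the left endpoint of each step.
stoch_integral = np.sum((T - t + dt) * dW, axis=1)

mc_mean = time_integral.mean()             # theory: 0
mc_var = time_integral.var()               # theory: T^3 / 3, up to discretization bias
print(mc_mean, mc_var, T**3 / 3)
```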
Example 2.6.2 (Itô exponential). The process $X(t) = \exp(W(t) - t/2)$ is the "Itô
exponential," i.e. the process that satisfies
\[ dX(t) = X(t)\,dW(t). \]
How is this done? The trick is to make the above equation match the result of Itô's formula.
This means we need to have $f_x(t, x) = f(t, x)$, and also $f_t(t, x) + f_{xx}(t, x)/2 = 0$. This is a
sort of puzzle we need to solve. To get $f_x(t, x) = f(t, x)$ suggests that $f$ must be exponential
in $x$, in which case we will also get $f_{xx}(t, x) = f(t, x)$. Then we need $f_t(t, x) = -f(t, x)/2$,
which suggests that $f$ is exponential in $-t/2$. Putting all the clues together, we try
$f(t, x) = \exp(x - t/2)$, which works.
This is no mere mathematical curiosity, but rather a prototype for geometric Brownian
motion, thus for the stock price in the Black-Scholes model.
Example 2.6.3 (Black-Scholes stock). The geometric Brownian motion
\[
S(t) = S(0) \exp\left( \left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma W(t) \right)
\]
fits the form we are discussing, with $f(t, x) = S(0) \exp((\mu - \sigma^2/2)t + \sigma x)$. Then
\[
\begin{aligned}
f_t(t, x) &= (\mu - \sigma^2/2)\,S(0) \exp((\mu - \sigma^2/2)t + \sigma x) = (\mu - \sigma^2/2)\,f(t, x) \\
f_x(t, x) &= \sigma S(0) \exp((\mu - \sigma^2/2)t + \sigma x) = \sigma f(t, x) \\
f_{xx}(t, x) &= \sigma^2 S(0) \exp((\mu - \sigma^2/2)t + \sigma x) = \sigma^2 f(t, x)
\end{aligned}
\]
so the formula gives
\[ dS(t) = S(t)\,(\mu\,dt + \sigma\,dW(t)). \]
Thus while the generalized Brownian motion $\ln S(t) = \ln S(0) + (\mu - \sigma^2/2)t + \sigma W(t)$ has
arithmetic drift $\mu - \sigma^2/2$ and arithmetic volatility $\sigma$, the geometric Brownian motion
$S(t) = \exp(\ln S(t))$ has geometric drift $\mu$ and geometric volatility $\sigma$. The geometric
drift and volatility of $S(t)$ are formed by dividing the integrands by $S(t)$. Processes of the
general form
\[
X(t) = X(0) + \int_0^t \mu(s)\,ds + \int_0^t \sigma(s)\,dW(s)
\quad \text{or} \quad
dX(t) = \mu(t)\,dt + \sigma(t)\,dW(t), \tag{2.6.2}
\]
where $\mu$ and $\sigma$ are now stochastic processes driven by the Wiener process $W$, are called Itô
processes. Generalized Brownian motion is an Itô process where $\mu$ and $\sigma$ are constant,
and geometric Brownian motion is an Itô process where $\mu$ and $\sigma$ are proportional to the
geometric Brownian motion itself. Any positive Itô process can be written, like geometric
Brownian motion, in the form
\[
\begin{aligned}
Y(t) &= Y(0) + \int_0^t Y(s)\mu(s)\,ds + \int_0^t Y(s)\sigma(s)\,dW(s) \\
dY(t) &= Y(t)\,(\mu(t)\,dt + \sigma(t)\,dW(t))
\end{aligned}
\tag{2.6.3}
\]
where $\mu$ and $\sigma$ are now its geometric drift and volatility. Its arithmetic drift and volatility
would be $Y\mu$ and $Y\sigma$.
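The differential form $dS(t) = S(t)(\mu\,dt + \sigma\,dW(t))$ can be compared with the exponential formula of Example 2.6.3 on a simulated path. The sketch below, assuming NumPy, steps the differential equation forward with the Euler scheme (a standard discretization, swapped in here for illustration and not a method from the text) and evaluates the exact formula on the same Wiener increments. The parameter values and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
s0, mu, sigma = 100.0, 0.08, 0.2      # illustrative parameters (assumptions)
T, n = 1.0, 10_000
dt = T / n

dW = rng.normal(0.0, np.sqrt(dt), n)  # Wiener increments
W = np.cumsum(dW)                     # W(t_1), ..., W(t_n)
t = dt * np.arange(1, n + 1)

# Exact solution: S(t) = S(0) exp((mu - sigma^2/2) t + sigma W(t)).
exact = s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)

# Euler steps for dS = S (mu dt + sigma dW), driven by the same increments.
euler = np.empty(n)
s = s0
for i in range(n):
    s = s + s * (mu * dt + sigma * dW[i])
    euler[i] = s

max_rel_error = np.max(np.abs(euler - exact) / exact)
print(max_rel_error)
```

Note that the exponent carries $\mu - \sigma^2/2$ while the differential carries $\mu$; dropping the $-\sigma^2/2$ correction from the exact formula would make the two paths drift apart.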
We want to be able to do stochastic integration with respect to an Itô process, not just
with respect to a Wiener process. In particular, we want to be able to deal with a stochastic
integral such as $\int_0^T \theta(t)\,dS(t)$, representing the portfolio gains process. For Itô processes $X$
and $Y$, with $X$ as given in (2.6.2), the stochastic integral
\[
\int_s^T Y(t)\,dX(t) = \int_s^T Y(t)\mu(t)\,dt + \int_s^T Y(t)\sigma(t)\,dW(t).
\]
We know what both of these terms mean: one is a time integral, and the other is a stochastic
integral with respect to a Wiener process. This equation relies on the formal substitution
$dX(t) = \mu(t)\,dt + \sigma(t)\,dW(t)$.
The Itô formula for a function of an Itô process $X$ as given in (2.6.2) is
\[
\begin{aligned}
df(t, X(t)) &= \left( f_t(t, X(t)) + \frac{1}{2}\sigma(t)^2 f_{xx}(t, X(t)) \right) dt + f_x(t, X(t))\,dX(t) \\
&= \left( f_t(t, X(t)) + \mu(t) f_x(t, X(t)) + \frac{1}{2}\sigma(t)^2 f_{xx}(t, X(t)) \right) dt
+ \sigma(t) f_x(t, X(t))\,dW(t).
\end{aligned}
\tag{2.6.4}
\]
Example 2.6.4 (exp). Consider $X$ an Itô process of the form (2.6.2), i.e. with arithmetic
drift $\mu$ and volatility $\sigma$ (these can be stochastic processes). Taking $f(t, x) = \exp(x)$, so
$f_t(t, x) = 0$ and $f(t, x) = f_x(t, x) = f_{xx}(t, x)$, we find
\[
\exp(X(t)) = \exp(X(0)) + \int_0^t \left( \mu(s) \exp(X(s)) + \frac{1}{2}\sigma^2(s) \exp(X(s)) \right) ds
+ \int_0^t \sigma(s) \exp(X(s))\,dW(s)
\]
and thus $\exp(X)$ is also an Itô process, with geometric drift $\mu + \sigma^2/2$ and volatility $\sigma$.
Example 2.6.5 (ln). Take $Y$ a positive Itô process with geometric drift $\mu$ and volatility $\sigma$,
thus arithmetic drift $Y\mu$ and volatility $Y\sigma$. Let $f(t, x) = \ln(x)$, so $f_t(t, x) = 0$, $f_x(t, x) = x^{-1}$,
and $f_{xx}(t, x) = -x^{-2}$. We find
\[
\ln(Y(t)) = \ln(Y(0)) + \int_0^t \left( \mu(s) - \frac{1}{2}\sigma(s)^2 \right) ds + \int_0^t \sigma(s)\,dW(s)
\]
because the factors $Y^{-1}$ and $Y^{-2}$ from $f_x$ and $f_{xx}$ cancel the factors of $Y$ from the arithmetic
drift and volatility. Thus $\ln(Y)$ is also an Itô process, with arithmetic drift $\mu - \sigma^2/2$ and
volatility $\sigma$.
Example 2.6.6 (Extended Black-Scholes Binary Option). In Example 2.6.5, consider
the case where $S = Y$ has deterministic (but not necessarily constant) geometric drift $\mu$ and
volatility $\sigma$. We might use this to model a stock price, as a slight extension of the Black-Scholes
model. Now what is the probability that $S(T)$ exceeds some level $K$? (Compare
with Example 2.1.3.) As in Example 2.6.5,
\[
\ln(S(T)) = \ln(S(0)) + \int_0^T (\mu(t) - \sigma(t)^2/2)\,dt + \int_0^T \sigma(t)\,dW(t).
\]
The first two terms are constants. The third term is a stochastic integral
of a deterministic function, so it is normal with mean 0 and variance $\int_0^T \sigma(t)^2\,dt$. Therefore
\[
\ln(S(T)) \sim N\left( \ln(S(0)) + \int_0^T (\mu(t) - \sigma(t)^2/2)\,dt,\ \int_0^T \sigma(t)^2\,dt \right)
\]
and the probability that $S(T)$ exceeds $K$ is
\[
\begin{aligned}
P[S(T) > K] &= P[\ln(S(T)) > \ln(K)] = P[-\ln(S(T)) < -\ln(K)] \\
&= \Phi\left( \frac{-\ln(K) - E[-\ln(S(T))]}{\sqrt{\mathrm{Var}[-\ln(S(T))]}} \right) \\
&= \Phi\left( \frac{\ln(S(0)/K) + \int_0^T (\mu(t) - \sigma(t)^2/2)\,dt}{\sqrt{\int_0^T \sigma(t)^2\,dt}} \right).
\end{aligned}
\]
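The formula of Example 2.6.6 needs only two time integrals, which can be computed by numerical quadrature when $\mu(t)$ and $\sigma(t)$ have no convenient antiderivative. The sketch below uses only the Python standard library and a composite trapezoid rule; the helper names `integrate` and `prob_exceeds`, the parameter values, and the linear volatility $\sigma(t) = 0.1 + 0.2t$ are illustrative assumptions. With constant $\mu$ and $\sigma$ it reduces to the formula of Example 2.1.3.

```python
from math import log, sqrt
from statistics import NormalDist

def integrate(f, a, b, n=10_000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def prob_exceeds(s0, k, mu, sigma, T):
    """P[S(T) > K] when mu and sigma are deterministic functions of time."""
    mean_shift = integrate(lambda t: mu(t) - 0.5 * sigma(t)**2, 0.0, T)
    variance = integrate(lambda t: sigma(t)**2, 0.0, T)
    return NormalDist().cdf((log(s0 / k) + mean_shift) / sqrt(variance))

# Constant coefficients: should match Example 2.1.3.
p_const = prob_exceeds(100.0, 110.0, lambda t: 0.08, lambda t: 0.2, 1.0)
# Time-varying volatility (hypothetical choice for illustration).
p_tv = prob_exceeds(100.0, 110.0, lambda t: 0.08, lambda t: 0.1 + 0.2 * t, 1.0)
print(p_const, p_tv)
```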
2.7. Summary
The Wiener process (standard Brownian motion) has continuous sample paths and stationary,
independent increments with a normal distribution whose variance is proportional
to the length of the increment. Generalized and geometric Brownian motion, as well as
Brownian bridge, are transformations of the Wiener process. The Black-Scholes model for
a stock price is geometric Brownian motion. The Wiener process can be constructed as
a limit by taking the running sum of more and more normal random variables. Thus the
Black-Scholes model has an interpretation in terms of a continuous stream of news.
The Itô stochastic integral relates to portfolio gains. It is defined for simple processes as a sum, and for more complicated processes as the limit of stochastic integrals of simple processes. It is often convenient to write stochastic integral equations in differential form. Itô's formula comes from a second-order Taylor expansion and relates the stochastic integral of $f'(W(t))$ to $f$ and the time integral of $f''(W(t))$. Equivalently, it relates $df(W(t))$ to $f'(W(t))$, $f''(W(t))$, $dt$, and $dW(t)$. The time-dependent version relates $df(t, W(t))$ to $f_t(t, W(t))$, $f_x(t, W(t))$, $f_{xx}(t, W(t))$, $dt$, and $dW(t)$. We can also define a stochastic integral with respect to an Itô process $X$ rather than just a Wiener process $W$, and get an Itô formula for $df(X(t))$.
2.8. Problems
Problem 2.6. Let $Y(t) = \int_0^t X(s)\, dW(s)$, where $X$ is an arbitrary suitable stochastic process. What are the expectation and variance of $Y(t)$? Does $Y$ have independent increments?
Problem 2.7 (Exercises 3.1, 3.2 of [Bjö98]). Compute $dY(t)$ when the stochastic process $Y$ is defined by
(1) $Y(t) = \int_0^t X(s)\, dW(s)$, where $X$ is an arbitrary suitable stochastic process
(2) $Y(t) = \exp(W(t))$
(3) $Y(t) = \exp(X(t))$, where $dX(t) = \mu\, dt + \sigma\, dW(t)$
(4) $Y(t) = X^2(t)$, where $dX(t) = \mu X(t)\, dt + \sigma X(t)\, dW(t)$
(5) $Y(t) = 1/X(t)$, where $dX(t) = \mu X(t)\, dt + \sigma X(t)\, dW(t)$
In these definitions, $\mu$ and $\sigma$ are constants. A "suitable" stochastic process means one in Björk's class $\pounds^2$, i.e. satisfying the technical requirements to be a good stochastic integrand. In the last two parts, try to express $dY(t)$ in terms of $Y$, not $X$.
Problem 2.8. Let $X(0)$ and $x$ be constants. In each of the following cases, evaluate $P[X(T) < x]$, or say why you cannot evaluate it.
(1) $dX(t) = \mu\, dt + \sigma\, dW(t)$ where $\mu$ and $\sigma$ are constants
(2) $dX(t) = X(t)(\mu\, dt + \sigma\, dW(t))$ where $\mu$ and $\sigma$ are constants
(3) $dX(t) = \mu(t)\, dt + \sigma(t)\, dW(t)$ where $\mu$ and $\sigma$ are stochastic processes
CHAPTER 3
The Black-Scholes Analysis: Part I
In this chapter, we
(1) use no-arbitrage reasoning to derive the Black-Scholes PDE
(2) define the greeks and show their relationship to the Black-Scholes PDE
(3) state the Black-Scholes formula and show it satisfies the Black-Scholes PDE
3.1. The Black-Scholes PDE
Before developing a general theory in Chapter 6, we use the Black-Scholes model as
an extended example of no-arbitrage derivative pricing. We consider a path-independent
derivative with a single payoff $g(S(T))$ at time $T$, such as the European call option, which pays $g(S(T)) = (S(T) - K)^+$. Since we have not yet developed a general theory, for the moment we simply assume that
(1) There is a self-financing portfolio strategy $\theta$ that replicates the option's payoff, meaning that $g(S(T)) = V(T) = \theta_0(T)M(T) + \theta_1(T)S(T)$.
(2) There is a sufficiently differentiable function $f(t, S)$ of time $t$ and stock price $S$ that gives the unique no-arbitrage price of the option.
When we complete the analysis, we will have verified these assumptions, justifying the derivation.
Recall the Black-Scholes model has a money market account $M(t) = e^{rt}$ growing exponentially and a stock $S(t) = S(0) \exp((\mu - \sigma^2/2)t + \sigma W(t))$ following geometric Brownian motion. In Example 2.6.3, we saw that
\[
dS(t) = S(t)(\mu\, dt + \sigma\, dW(t)).
\]
This shows that $\mu$ is the expected geometric growth rate of the stock. Likewise, $M(t) = f(t, W(t))$ where $f(t, x) = e^{rt}$, so
\[
dM(t) = re^{rt}\, dt = rM(t)\, dt,
\]
which was obvious from ordinary calculus. The continuously compounded interest rate $r$ is the geometric growth rate of the money market account.
We now analyze a replicating strategy for the option. The no-arbitrage principle says that the option price $f(t, S(t)) = V(t)$, the value of the replicating portfolio. This implies $df(t, S(t)) = dV(t)$. The self-financing condition is $dV(t) = dG(t) = \theta_0(t)\, dM(t) + \theta_1(t)\, dS(t)$. Plugging in $dS(t) = S(t)(\mu\, dt + \sigma\, dW(t))$ and $dM(t) = rM(t)\, dt$, we have
\[
df(t, S(t)) = dV(t) = (\theta_0(t)M(t)r + \theta_1(t)S(t)\mu)\, dt + \theta_1(t)S(t)\sigma\, dW(t).
\]
The strategy is to use Itô's formula and the no-arbitrage principle to get a PDE for $f$, which we will then solve.
First, apply the time-dependent Itô formula to this option price function $f$ to get another expression for $df(t, S(t))$. The formula says, where the input Itô process has $dX(t) = \mu(t)\, dt + \sigma(t)\, dW(t)$, that
\[
df(t, X(t)) = \left( f_t(t, X(t)) + \mu(t) f_x(t, X(t)) + \frac{\sigma(t)^2}{2} f_{xx}(t, X(t)) \right) dt + \sigma(t) f_x(t, X(t))\, dW(t).
\]
Here the stock price process $S$ is given by $dS(t) = S(t)(\mu\, dt + \sigma\, dW(t))$, so we get
\[
df(t, S(t)) = \left( f_t(t, S(t)) + \mu S(t) f_S(t, S(t)) + \frac{\sigma^2}{2} S(t)^2 f_{SS}(t, S(t)) \right) dt + \sigma S(t) f_S(t, S(t))\, dW(t).
\]
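Matching the $dW(t)$ coefficient $\sigma S(t) f_S(t, S(t))$ from Itô's formula with the coefficient $\theta_1(t) S(t) \sigma$ from the self-financing condition shows that the number of shares of stock in the replicating portfolio is $\theta_1(t) = f_S(t, S(t))$, the first derivative of the option price with respect to the stock price, also known as $\Delta$.
By the no-arbitrage principle, $f(t, S(t)) = V(t) = \theta_0(t)M(t) + \theta_1(t)S(t)$. Rearranging and plugging in for $\theta_1$, $\theta_0(t)M(t) = f(t, S(t)) - f_S(t, S(t))S(t)$. Therefore the drift of $f(t, S(t))$ is
\[
\begin{aligned}
\theta_0(t)M(t)r + \theta_1(t)S(t)\mu &= \mu f_S(t, S(t))S(t) + (f(t, S(t)) - f_S(t, S(t))S(t))r \\
&= rf(t, S(t)) + (\mu - r)S(t)f_S(t, S(t)).
\end{aligned}
\]
Equating this with the drift from Itô's formula yields the Black-Scholes PDE
\[
r(Sf_S(t, S) - f(t, S)) + f_t(t, S) + \frac{1}{2}\sigma^2 S^2 f_{SS}(t, S) = 0. \tag{3.1.1}
\]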
Note that this applies to any derivative, whatever the terminal payoff g.
It is highly significant that $\mu$, the geometric drift of the stock, does not appear in this PDE. In this model, it is irrelevant for pricing derivatives. This should be surprising. This mean growth rate $\mu$ summarizes investors' attitudes towards risk: the more risk-averse they are, the greater the expected rewards they demand for holding risk. Let us think of the distribution of terminal stock price $S(T)$ as being fixed. Then more risk-aversion means greater drift $\mu$ and consequently lower initial stock price $S(0)$. The initial no-arbitrage call price is $\theta_0 M(0) + \theta_1 S(0)$, so we can see that risk preferences are reflected in the call price, but they enter through $S(0)$, not through $\mu$. This is very good for financial engineers, because $S(0)$ is directly observable in the marketplace, while $\mu$ turns out to be effectively impossible to estimate!
3.2. The Black-Scholes Formula and Greeks
In the Black-Scholes model, pricing the European call option means finding the solution for $t \in [0, T]$ to the Black-Scholes PDE with the terminal condition $f(T, S(T)) = (S(T) - K)^+$. (This is much like an initial condition for a differential equation, but it is called terminal because it is imposed at time $T$ rather than at time 0.) At this point, we simply pull the solution out of a hat and verify that it solves the Black-Scholes PDE. Later we will see how to derive this solution using the Feynman-Kac formula and Girsanov transformation.
The Black-Scholes European call pricing formula is
\[
f(t, S) = S\Phi(d_1) - e^{-r(T-t)} K \Phi(d_2)
\]
where
\[
d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}} \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{T - t}.
\]
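The formula translates directly into code. The sketch below is an illustration rather than part of the original notes; the names `norm_cdf` and `bs_call` and the argument `tau` (standing for the time to maturity $T - t$) are assumptions made here, and the function requires `tau > 0` and `sigma > 0`.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes European call price; tau = T - t must be positive."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - math.exp(-r * tau) * K * norm_cdf(d2)

# At-the-money call, one year to maturity, 5% rate, 20% volatility.
price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(round(price, 4))
```

For these inputs $d_1 = 0.35$ and $d_2 = 0.15$, and the price comes out near 10.45.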
The Black-Scholes PDE involves three derivatives of the pricing function $f(t, x)$: the time derivative $f_t(t, x)$, the first spatial derivative $f_x(t, x)$, and the second spatial derivative $f_{xx}(t, x)$. Since the spatial dimension ends up having the interpretation of the stock price $S$, these (calculus) partial derivatives describe the sensitivity of the (financial) derivative security to the passage of time and changes in the stock price. They get special names: theta is $\Theta = f_t(t, S(t))$, delta is $\Delta = f_x(t, S(t))$, and gamma is $\Gamma = f_{xx}(t, S(t))$. Let us also introduce the notation $C = f(t, S(t))$ for the price of the call option.
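Using these names, the Black-Scholes PDE is
\[
r(S\Delta - C) + \Theta + \frac{1}{2}\sigma^2 S^2 \Gamma = 0.
\]
We now verify that the Black-Scholes European call pricing formula indeed solves the Black-Scholes PDE, i.e. that
\[
C = S\Delta + \frac{1}{r}\left( \Theta + \frac{1}{2}\sigma^2 S^2 \Gamma \right).
\]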
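Some facts that aid in the calculation are
\[
\frac{\partial d_1}{\partial S} = \frac{\partial d_2}{\partial S} = \frac{1}{S\sigma\sqrt{T - t}} \quad \text{and} \quad \frac{\partial d_2}{\partial t} = \frac{\partial d_1}{\partial t} + \frac{\sigma}{2\sqrt{T - t}}
\]
and
\[
\begin{aligned}
\phi(d_2) = \phi(d_1 - \sigma\sqrt{T - t}) &= \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}(d_1 - \sigma\sqrt{T - t})^2 \right) \\
&= \phi(d_1) \exp\left( d_1\sigma\sqrt{T - t} - \frac{1}{2}\sigma^2(T - t) \right) = \phi(d_1) \frac{S}{K} e^{r(T-t)}.
\end{aligned}
\]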
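So the partial derivatives are
\[
\begin{aligned}
\Theta &= S\phi(d_1)\frac{\partial d_1}{\partial t} - e^{-r(T-t)} K \phi(d_2)\frac{\partial d_2}{\partial t} - re^{-r(T-t)} K \Phi(d_2) \\
&= -\frac{S\phi(d_1)\sigma}{2\sqrt{T - t}} - re^{-r(T-t)} K \Phi(d_2) \\
\Delta &= \Phi(d_1) + S\phi(d_1)\frac{\partial d_1}{\partial S} - e^{-r(T-t)} K \phi(d_2)\frac{\partial d_2}{\partial S} = \Phi(d_1) \\
\Gamma &= \frac{\phi(d_1)}{S\sigma\sqrt{T - t}}.
\end{aligned}
\]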
The equation for $\Delta$ is known as the "law of the unconscious finance professor" because the result is the same as what you would get if you differentiate the formal expression $S\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2)$ with respect to $S$ while forgetting that $d_1$ and $d_2$ depend on $S$. Remember that in the derivation of the Black-Scholes PDE, we found that the number of shares of stock to hold in the replicating portfolio, $\theta_1$, is delta. To make the portfolio value equal the option value $C$, there must be $C - S\Delta$ in the money market account. This accounts for the two terms of the Black-Scholes formula: $\Delta = \Phi(d_1)$ shares of stock worth $S(t)\Phi(d_1)$, and $-Ke^{-rT}\Phi(d_2)$ shares of the money market account, worth $-Ke^{-r(T-t)}\Phi(d_2)$.
Clearly $\Delta$ and $\Gamma$ are positive; the call option's sensitivity to changes in the stock price increases as the stock price increases. Indeed $\Delta \in (0, 1)$ and $\lim_{S \to 0} \Delta = 0$, while $\lim_{S \to \infty} \Delta = 1$. Once the stock price is very large, it is almost certain that the option will be exercised and result in a payoff of $S(T) - K$, which has slope 1 in $S(T)$. On the other hand, $\Theta$ is negative, meaning that the option loses value over time (if the stock price is held fixed). Unfortunately, some people define $\Theta$ to be the negative of what we have here, in order to talk about a positive value. It is worth reiterating that all these results are for a European call in the Black-Scholes model. The greeks can have different values and signs for different securities, or in different models.
From our expressions for these greeks, we verify that the Black-Scholes PDE is satisfied:
\[
\Theta = -\sigma^2 S^2 \Gamma/2 - r(S\Delta - C).
\]
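The verification can also be checked numerically by evaluating the closed-form greeks and confirming that the left side of the PDE is zero up to rounding. This is a sketch under assumed names (`call_and_greeks`, `pde_residual`, and `tau` for $T - t$); none of them come from the original notes.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density phi."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def call_and_greeks(S, K, r, sigma, tau):
    """Return (C, Delta, Gamma, Theta) for a European call, tau = T - t > 0."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / vol
    d2 = d1 - vol
    disc_k = math.exp(-r * tau) * K
    C = S * norm_cdf(d1) - disc_k * norm_cdf(d2)
    delta = norm_cdf(d1)
    gamma = norm_pdf(d1) / (S * vol)
    theta = -S * norm_pdf(d1) * sigma / (2.0 * math.sqrt(tau)) - r * disc_k * norm_cdf(d2)
    return C, delta, gamma, theta

def pde_residual(S, K, r, sigma, tau):
    """Left side of r(S*Delta - C) + Theta + sigma^2 S^2 Gamma / 2 = 0."""
    C, delta, gamma, theta = call_and_greeks(S, K, r, sigma, tau)
    return r * (S * delta - C) + theta + 0.5 * sigma ** 2 * S ** 2 * gamma

for S in (80.0, 100.0, 120.0):
    print(S, pde_residual(S, 100.0, 0.05, 0.2, 0.5))
```

The printed residuals should be zero to within floating-point error for every stock price.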
Indeed $\sigma^2 S^2 \Gamma/2$ and $r(S\Delta - C)$ account for the two terms of $\Theta$, giving an interesting interpretation of the greeks. Imagine replicating the call with the strategy of holding $\Delta$ shares of stock and borrowing $S\Delta - C$ by holding negative shares of the money market account. Since the option value is the same as the replicating portfolio value, the option's $\Theta$ can be understood in terms of the replicating portfolio strategy.
One term of $\Theta$ is simply the interest being paid on the amount borrowed to finance the purchase of shares. As for the other term, remember that $\Theta$ describes the change in option value as stock price is held fixed, so imagine a time period $[t, u]$ passing with $S(u) = S(t)$. (And then imagine letting $u - t$ go to 0.) Because the stock price is geometric Brownian motion, it does not stay flat over this time period. So what happens along the way as the stock price fluctuates? Because $\Gamma > 0$, when the stock price rises, you buy more stock, but then it goes back down, to return to $S(u) = S(t)$, at which point you need to reduce your stock holdings again, so you lose money; similarly, you lose money by selling low and buying high when the stock goes down at an intermediate time. So the replicating strategy loses money at a rate proportional to $\Gamma$ and to the instantaneous variance of the stock price (which is $\sigma^2 S^2$), when the stock price remains flat over some time interval. This accounts for the other term, which we will investigate further: managing it in practice can pose a challenge for financial institutions.
Another way of understanding the "time decay" $\Theta$ is with Jensen's inequality, which says that if $g$ is a convex function, then
\[
E[g(X)] \ge g(E[X]).
\]
In this case, the option's payoff $g(S_T) = (S_T - K)^+$ is convex, so the discounted payoff $e^{-r(T-t)}(S_T - K)^+$ is also convex, and so the option price $E^Q[e^{-r(T-t)}(S_T - K)^+]$ is more than the discounted payoff evaluated at the risk-neutral expected stock price, $e^{-r(T-t)}(S_t e^{r(T-t)} - K)^+$. So if the stock price stays flat as time passes, the option is losing its value for two
reasons. One is that the stock ought to be going up (in the risk-neutral world) with geometric
growth rate r, and if it doesn't, it fails to balance the interest owed due to borrowing the
money to buy the stock. The other reason is that the variability of the final stock price is
what makes the option's "optionality" valuable, according to Jensen's inequality. If time
passes without a change in the stock price, the remaining variance of the final stock price is
reduced, and some optionality has evaporated.
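The Jensen bound above says the call price is at least $e^{-r(T-t)}(S_t e^{r(T-t)} - K)^+ = (S_t - Ke^{-r(T-t)})^+$. A small numerical check follows, as a sketch only: the names `bs_call` and `jensen_gap`, the parameter `tau` for $T - t$, and the sample grid are assumptions for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes European call price; tau = T - t must be positive."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - math.exp(-r * tau) * K * norm_cdf(d2)

def jensen_gap(S, K, r, sigma, tau):
    """Call price minus the discounted payoff at the risk-neutral mean stock price."""
    lower_bound = max(S - K * math.exp(-r * tau), 0.0)
    return bs_call(S, K, r, sigma, tau) - lower_bound

# The gap is the value of "optionality"; it should never be negative.
for S in (60.0, 100.0, 140.0):
    for tau in (0.1, 1.0, 2.0):
        print(S, tau, jensen_gap(S, 100.0, 0.05, 0.2, tau))
```

The gap is largest near the money and with more time remaining, which matches the intuition that remaining variance is what makes the optionality valuable.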
So we see that the value of an option is positively related to the volatility of the underlying security, which brings us to another greek $\Lambda$, which, oddly enough, is called not lambda but vega, like the star. This is the partial derivative of the option price with respect to the volatility parameter $\sigma$. One might object that $\sigma$ is not even a variable that is supposed to change within the Black-Scholes model, such as time or stock price, but a constant parameter of the model. This is quite true, but since the Black-Scholes model is not a perfect description of how stock prices really behave, it is a good idea to acknowledge that $\sigma$ is not a constant governing the stock's geometric volatility. In later chapters we will discuss how vega can be used in hedging schemes and whether it is adequate as a description of model risk for Black-Scholes. Similar comments apply to rho, the partial derivative with respect to the interest rate $r$: in the model $r$ is supposed to be constant, but if we change the value of the parameter, $\rho$ expresses the sensitivity of the price formula to this change. The values of these greeks are
\[
\begin{aligned}
\Lambda &= S\phi(d_1)\sqrt{T - t} \\
\rho &= K(T - t)e^{-r(T-t)}\Phi(d_2).
\end{aligned}
\]
Note that there is no greek relating to the stock's drift $\mu$ because it does not enter into the pricing formula.
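Since vega and rho are derivatives with respect to parameters rather than state variables, a natural sanity check is to bump the parameter and reprice. The sketch below compares the closed-form values with central finite differences; the names `bs_call`, `vega`, `rho`, `central_diff`, the bump size `h`, and `tau` for $T - t$ are assumptions made for this illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density phi."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1_d2(S, K, r, sigma, tau):
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)

def bs_call(S, K, r, sigma, tau):
    d1, d2 = d1_d2(S, K, r, sigma, tau)
    return S * norm_cdf(d1) - math.exp(-r * tau) * K * norm_cdf(d2)

def vega(S, K, r, sigma, tau):
    """Closed-form sensitivity of the call price to sigma."""
    d1, _ = d1_d2(S, K, r, sigma, tau)
    return S * norm_pdf(d1) * math.sqrt(tau)

def rho(S, K, r, sigma, tau):
    """Closed-form sensitivity of the call price to r."""
    _, d2 = d1_d2(S, K, r, sigma, tau)
    return K * tau * math.exp(-r * tau) * norm_cdf(d2)

def central_diff(g, x, h=1e-5):
    """Central finite-difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

S, K, r, sigma, tau = 100.0, 100.0, 0.05, 0.2, 1.0
fd_vega = central_diff(lambda s: bs_call(S, K, r, s, tau), sigma)
fd_rho = central_diff(lambda x: bs_call(S, K, x, sigma, tau), r)
print(vega(S, K, r, sigma, tau), fd_vega)
print(rho(S, K, r, sigma, tau), fd_rho)
```

Each printed pair should agree to several decimal places.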
3.3. Summary
From the no-arbitrage principle, the self-financing condition, and Itô's formula, we derived the Black-Scholes PDE. This is a relationship among the no-arbitrage price under the Black-Scholes model of a derivative, its Greeks $\Delta$, $\Gamma$, $\Theta$, the interest rate $r$, and the volatility $\sigma$. It is significant that the drift $\mu$ is not involved, and therefore does not feature in the Black-Scholes formula for a European call option price, which satisfies the Black-Scholes PDE.
3.4. Problems
Problem 3.1. Compute the geometric drift and volatility under the subjective probability measure $P$ of the European call price, as given by the Black-Scholes formula. (Hint: first compute the arithmetic drift and volatility. Do this not by applying Itô's formula again but by calculating the coefficients of the replicating portfolio, using the known representations of $S$ and $M$ as Itô processes.) Assume that the stock's geometric drift $\mu > r$, the risk-free rate. Compare the geometric coefficients of the call option to those of the stock.
The following problems depend on put-call parity in the Black-Scholes model, for which see Example 1.3.7. The price at time $t$ of a bond paying 1 at $T$ is $B(t) = M(t)/M(T) = e^{-r(T-t)}$. Put-call parity says the difference between call and put prices at time $t$ is $S(t) - KB(t)$.
Problem 3.2. Use put-call parity to show that the Black-Scholes no-arbitrage price of the European put is $e^{-r(T-t)}K\Phi(-d_2) - S\Phi(-d_1)$.
Problem 3.3. Compute the greeks $\Delta$, $\Gamma$, $\Theta$, $\Lambda$, and $\rho$ of the European put. Show that they satisfy the Black-Scholes PDE.
Problem 3.4. Compute $\Delta$, $\Gamma$, and $\Theta$ for the stock and money market account (this is trivial). Show that these greeks satisfy put-call parity like prices do.
CHAPTER 4
Conditional Probability and Itô Processes
Conditional probability is necessary to answer questions such as "What would we do if
some event E occurs?" A good example in financial engineering is the question "How would
I hedge this option if the stock price were S(t) at time t?" A first step is to determine what
we would expect to happen given the condition that E occurs, or how we would value our
option given such a state of affairs. To use conditional probability to give a more satisfying
derivation of the Black-Scholes formula and learn the tools to compute similar results, we
must study it at a deeper level than in introductory probability courses.
Those who are rusty should review the elementary facts in an introductory probability
text. A very good, concise account appears in [Mik98, §1.4.1]. Interested students are urged
to read all of [Mik98, §1.4] to get a deeper understanding of conditional probability, while
still avoiding measure theory.
In this chapter, we will
(1) develop more advanced concepts of conditional probability
(2) learn the rules of conditional expectation
(3) evaluate conditional expectations involving Itô processes
(4) see the connection between martingales and stochastic integrals
4.1. Conditional Probability
Probabilistic computations are always based on beliefs, or knowledge. When knowledge
is static and unchanging, we use a single probability measure P and expectation operator
E. To make our notation a little more explicit, $P[\cdot]$ gives the probability of events and $E[\cdot]$ gives the expectation of random variables: plug in an event $E$ or a random variable $X$ in place of the dot, and get its probability or expectation respectively. Recall that these two mathematical objects are essentially the same, because $P[E] = E[1_E]$, where $1_E$ given by
\[
1_E(\omega) = \begin{cases} 1 & \text{if } \omega \in E \\ 0 & \text{if } \omega \notin E \end{cases}
\]
is the indicator function.
Now suppose that our knowledge grows during the course of an experiment. For instance,
suppose we are flipping a coin repeatedly. Let the random variable Hn be the total number
of heads after n flips. If we believe that the coin is fair, then we initially say P[H2 = 0] =
1/4 = P[H2 = 2] and P[H2 = 1] = 1/2. Then E[H2] = 1.
Suppose now that the coin comes up heads the first time, that is, $H_1 = 1$. Although our beliefs about the fairness of the coin might not change, our beliefs about the total number of heads certainly do! We can now define $P[\cdot|H_1 = 1]$ and $E[\cdot|H_1 = 1]$ to reflect our beliefs given that the coin came up heads the first time. Then $P[H_2 = 0|H_1 = 1] = 0$ and $P[H_2 = 1|H_1 = 1] = 1/2 = P[H_2 = 2|H_1 = 1]$, so $E[H_2|H_1 = 1] = 3/2$.
What if the coin had come up tails the first time instead? Then we would say $P[H_2 = 2|H_1 = 0] = 0$ and $P[H_2 = 1|H_1 = 0] = 1/2 = P[H_2 = 0|H_1 = 0]$, so $E[H_2|H_1 = 0] = 1/2$.
Initially, we do not know whether the first coin will come up heads or tails. From this
perspective of complete ignorance, our future expectation is unknown: it is a random variable
E[H2|H1].
Let $\mathcal{F}_n$ represent our knowledge after the $n$th coin toss. Thus $\mathcal{F}_0$ is no knowledge, $\mathcal{F}_1$ is knowing $H_1$, $\mathcal{F}_2$ is knowing $H_1$ and $H_2$, etc. These symbols represent mathematical objects called $\sigma$-algebras or $\sigma$-fields, but we do not need to study them in detail. We write $\mathcal{F}_1 \subseteq \mathcal{F}_2$ to represent that $\mathcal{F}_2$ contains all the knowledge in $\mathcal{F}_1$.
From the interpretation of $\sigma$-algebras alone, we can see how $P[\cdot|\mathcal{F}_n]$ must behave. First, $P[\cdot|\mathcal{F}_0] = P[\cdot]$. Thus also $E[\cdot|\mathcal{F}_0] = E[\cdot]$. In particular, $E[H_2|\mathcal{F}_0] = 1$. Next,
\[
P[\cdot|\mathcal{F}_1] = P[\cdot|H_1] = 1_{\{H_1=1\}}P[\cdot|H_1 = 1] + 1_{\{H_1=0\}}P[\cdot|H_1 = 0].
\]
That is, $P[\cdot|\mathcal{F}_1]$ takes on the value $P[\cdot|H_1 = 1]$ when $H_1 = 1$, and the value $P[\cdot|H_1 = 0]$ when $H_1 = 0$. We have computed both these values above. The key is that $\mathcal{F}_1$ contains exactly the information about the value of $H_1$. In particular, $E[H_2|\mathcal{F}_1] = 1_{\{H_1=1\}}(3/2) + 1_{\{H_1=0\}}(1/2)$. Finally, $E[H_2|\mathcal{F}_2] = H_2$ because $H_2$ is actually known given the information $\mathcal{F}_2$.
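The coin-flip conditional expectations can be reproduced by brute-force enumeration of the four equally likely outcomes. This is an illustrative sketch; the names `outcomes`, `total_heads`, and `cond_exp_given_first` are invented here and are not notation from the notes.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two fair flips; 1 means heads. Each has probability 1/4.
outcomes = list(product([0, 1], repeat=2))

def total_heads(w):
    """H_2: total number of heads in the outcome w."""
    return w[0] + w[1]

def cond_exp_given_first(h1):
    """E[H_2 | H_1 = h1]: average H_2 over outcomes whose first flip is h1."""
    matching = [w for w in outcomes if w[0] == h1]
    return Fraction(sum(total_heads(w) for w in matching), len(matching))

# Unconditional expectation E[H_2 | F_0] = E[H_2].
uncond = Fraction(sum(total_heads(w) for w in outcomes), len(outcomes))

print(uncond)                    # E[H_2]
print(cond_exp_given_first(1))   # E[H_2 | H_1 = 1]
print(cond_exp_given_first(0))   # E[H_2 | H_1 = 0]
```

Averaging works here only because the outcomes are equally likely; for a biased coin each outcome would need its own probability weight.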
Here we saw how E[H2|Fn] becomes "more random" as n, i.e. our knowledge, increases.
With no knowledge to condition on, we have just one number, an expectation. Conditioning
on full knowledge, we know the outcome, so the conditional expectation is the same as
the random variable. With the intermediate amount of knowledge in F1, our conditional
expectation is random, but it is "coarser" than H2 itself, in the sense of having fewer distinct
values. In fact, its value is the average of some values of H2. The conditional expectation
E[H2|F1] is not allowed to be too fine, because it must be a function only of what is known
after the first coin toss.
We now define the conditional expectation a bit more precisely. Say that an event $E$ is knowable given a $\sigma$-algebra $\mathcal{F}$ if $\mathcal{F}$ contains enough information to ascertain whether $E$ has occurred or not. This is sometimes written $E \in \mathcal{F}$. For instance, the event {H1 = 0}
is knowable given F1, but {H2 = 0} is not. This is so even though when H1 = 1 we know
that H2 = 0 does not occur; when H1 = 0, we are not sure whether or not H2 = 0 occurs. A
random variable X is knowable given F if F contains enough information to determine the
value of X. For instance, H1 is knowable given either F1 or F2, while H2 is knowable given
F2 but not F1.
The definition of $E[X|\mathcal{F}]$ is that it is the random variable satisfying the two properties:
(1) $E[X|\mathcal{F}]$ is knowable given $\mathcal{F}$.
(2) For every event $A$ that is knowable given $\mathcal{F}$, $E[1_A E[X|\mathcal{F}]] = E[1_A X]$.
The first property states the obvious: if E[X|F] is our expectation of X conditional on the information F, it must be knowable given F. The second property says that on any knowable event of positive probability, the average values of X and of E[X|F] must be the same. (For events of zero probability, these expectations come out zero no matter what.)
Here are the rules for manipulating conditional expectations.
Rule 4.1.1 (Linearity). E[aX + bY |F] = aE[X|F] + bE[Y |F].
This is the same as for an ordinary (unconditional) expectation.
Rule 4.1.2 (Extraction). If X is knowable given F, then E[XY |F] = XE[Y |F].
In particular, if X is knowable given F, then E[X|F] = X. To see this, let Y = 1 in the
above rule; it also follows from the definition.
Rule 4.1.3 (Tower Property). If $\mathcal{F}_1 \subseteq \mathcal{F}_2$, then $E[E[X|\mathcal{F}_2]|\mathcal{F}_1] = E[X|\mathcal{F}_1]$.
In this situation, $E[E[X|\mathcal{F}_1]|\mathcal{F}_2] = E[X|\mathcal{F}_1]$ follows from the extraction rule. So we can say "The less informative $\sigma$-algebra wins."
This rule is called the tower property because we can picture it as a two-story tower collapsing into a one-story heap of rubble. Think of the coarser, less informative $\sigma$-algebra $\mathcal{F}_1$ as the ground floor. Intuitively, the tower property says "Our best guess today as to what our best guess at $X$ will be tomorrow is just our best guess at $X$ today."
In particular, we see $E[E[X|\mathcal{F}]] = E[X]$. To illustrate the connection to the tower property, we could say the unconditional expectation $E[X] = E[X|\mathcal{F}_0]$ where $\mathcal{F}_0$ is the trivial, wholly uninformative $\sigma$-algebra, representing no knowledge. Then $\mathcal{F}_0 \subseteq \mathcal{F}$.
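The tower property can be checked by enumeration on a three-flip version of the coin experiment, taking $X = H_3$ and conditioning on the first one or two flips. This is a sketch under assumptions: the helper `cond_exp`, which represents a conditional expectation as a table from outcomes to values, is a construction made for this example and not the notes' notation.

```python
from fractions import Fraction
from itertools import product

# All outcomes of three fair flips (1 = heads), each with probability 1/8.
outcomes = list(product([0, 1], repeat=3))

def cond_exp(X, n):
    """E[X | F_n] as a dict outcome -> value, where F_n is knowledge of the first n flips.

    For each outcome w, average X over all outcomes that agree with w on
    the first n flips (valid because outcomes are equally likely).
    """
    table = {}
    for w in outcomes:
        block = [v for v in outcomes if v[:n] == w[:n]]
        table[w] = Fraction(sum(X(v) for v in block), len(block))
    return table

def H3(w):
    """Total number of heads after three flips."""
    return sum(w)

inner = cond_exp(H3, 2)                   # E[H_3 | F_2]
lhs = cond_exp(lambda w: inner[w], 1)     # E[ E[H_3 | F_2] | F_1 ]
rhs = cond_exp(H3, 1)                     # E[H_3 | F_1]
print(lhs == rhs)
print(rhs[(1, 0, 0)], rhs[(0, 0, 0)])
```

Taking `n = 0` in `cond_exp` gives the constant table of the unconditional expectation, which corresponds to the special case $E[E[X|\mathcal{F}]] = E[X]$.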
Rule 4.1.4 (Independence). If X is independent of F, then E[X|F] = E[X].
The expectation conditional on irrelevant information is the same as the ordinary, unconditional
expectation. We have not defined rigorously what it means for a random variable X
to be independent of a $\sigma$-algebra F, but in practice there is seldom confusion. For instance,
when F is the information gained by observing random variables that are independent of
X, then X is independent of F. One may ask, "Does having the knowledge of F tell me
anything about the likelihood of the values of X?" If not, X is independent of F.
4.2. Conditioning with Itô Processes
In our treatment of financial engineering, we are concerned with the knowledge gained by observing market prices that follow Itô processes. We want to be able to handle conditional distributions of market prices and conditional expected payoffs of derivative securities. In this section, we will extend Examples 2.1.3 and 2.1.4 to conditional probabilities in the Black-Scholes model.
Under some technical conditions (regarding invertibility of the volatility matrix) that we will ignore, the information from observing market prices is the same as the information from observing the underlying Wiener process $W$ that drives this Itô process. So $\mathcal{F}_t$ represents knowledge of $(W(s), s \in [0, t])$, that is, of the Wiener process from time 0 to $t$. A collection such as $(\mathcal{F}_t, t \in [0, T])$ is called a filtration and represents our increasing knowledge as time passes. When for all $t \in [0, T]$, the random variable $X(t)$ is knowable given the $\sigma$-algebra $\mathcal{F}_t$, we say that the stochastic process $X$ is adapted to the filtration $(\mathcal{F}_t, t \in [0, T])$. We will consider only Itô processes, which are adapted to the filtration generated by $W$. The point is that they do not depend on extraneous sources of randomness or on information about the future. Using our filtration, we can interpret $E[X|\mathcal{F}_t]$ as a stochastic process. Our expectation of a fixed random variable $X$ changes continuously in time as we get new information by observing the market.
An Itô process $X$ has the arithmetic representation (2.6.2):
\[
X(t) = X(0) + \int_0^t \mu(s)\, ds + \int_0^t \sigma(s)\, dW(s).
\]
Here $X(0)$ is a constant and $\mu$ and $\sigma$ are stochastic processes. The first integral is a Riemann integral, here called a "time integral" because of its interpretation, while the second is an Itô stochastic integral. The coefficients $\mu(t)$ and $\sigma(t)$ are respectively the arithmetic drift and the arithmetic volatility of $X$ at time $t$.
Focusing on time $t$, we can specify this instantaneous drift and volatility by writing
\[
dX(t) = \mu(t)\, dt + \sigma(t)\, dW(t).
\]
This is a stochastic differential equation (SDE for short). It specifies how $X$ changes with the passage of time and the changes of the Wiener process $W$. It is simply a shorthand for (2.6.2). One thing is missing: the initial condition specifying the value of $X(0)$. We can think of this SDE as just a compact form of notation.
In the Black-Scholes model, we are interested in the stock price stochastic process $S(t) = S(0) \exp((\mu - \sigma^2/2)t + \sigma W(t))$. Let $X(t) = \ln S(t)$. Then $X(t)$ satisfies the SDE $dX(t) = (\mu - \sigma^2/2)\, dt + \sigma\, dW(t)$. From Example 2.6.3, we know $S(t)$ satisfies $dS(t) = S(t)(\mu\, dt + \sigma\, dW(t))$. The SDEs for $X$ and $S$ are related by Itô's formula. In the general arithmetic representation of an Itô process, the arithmetic drift process $\mu(t)$ is here $(\mu - \sigma^2/2)$, while the arithmetic volatility $\sigma(t) = \sigma$. To investigate the conditional distributions of $S$, it will be more convenient to analyze $X$, which has normal distributions, and is related to $S$ by a one-to-one transformation.
Consider conditioning on all information available at time $t$, represented by $\mathcal{F}_t$. Then the following "late-starting" arithmetic representation for an Itô process is convenient:
\[
X(T) = X(t) + \int_t^T \mu(s)\, ds + \int_t^T \sigma(s)\, dW(s). \tag{4.2.1}
\]
You can check that this is consistent with (2.6.2) by plugging equation (2.6.2) in for $X(t)$ in equation (4.2.1) and seeing that the result is indeed equation (2.6.2) again, but with $T$ substituted for $t$. The reason for using this late-starting representation is that $X(t)$ is knowable given $\mathcal{F}_t$, whereas the integral terms are not. Indeed if the coefficient processes $\mu$ and $\sigma$ are deterministic, then these integral terms are even independent of $\mathcal{F}_t$.
Along with conditional expectation, one should also understand conditional variance and conditional covariance. These derive simply from conditional expectation:
\[
\mathrm{Cov}[X, Y|\mathcal{F}] = E[XY|\mathcal{F}] - E[X|\mathcal{F}]E[Y|\mathcal{F}], \quad \text{so} \quad \mathrm{Var}[X|\mathcal{F}] = E[X^2|\mathcal{F}] - E[X|\mathcal{F}]^2.
\]
Example 4.2.1 (Brownian conditional expectation and variance). Because the increment $W(t) - W(s)$ is independent of $\mathcal{F}_s$ and is normal with mean 0 and variance $t - s$,
\[
\begin{aligned}
\mathrm{Var}[W(t)|\mathcal{F}_s] &= E[W^2(t)|\mathcal{F}_s] - E[W(t)|\mathcal{F}_s]^2 \\
&= E[(W(s) + (W(t) - W(s)))^2|\mathcal{F}_s] - W^2(s) \\
&= E[W^2(s) + 2W(s)(W(t) - W(s)) + (W(t) - W(s))^2|\mathcal{F}_s] - W^2(s) \\
&= W^2(s) + 2W(s)E[W(t) - W(s)|\mathcal{F}_s] + E[(W(t) - W(s))^2|\mathcal{F}_s] - W^2(s) \\
&= 2W(s)E[W(t) - W(s)] + E[(W(t) - W(s))^2] \\
&= 2W(s) \cdot 0 + (t - s) \\
&= t - s.
\end{aligned}
\]
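A very useful fact from elementary probability is that when $Z_1$ and $Z_2$ are bivariate normal:
\[
\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right)
\]
then the conditional expectation
\[
E[Z_2|Z_1] = \mu_2 + \frac{\sigma_{12}}{\sigma_{11}}(Z_1 - \mu_1).
\]
Here the correlation $\rho = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$ and $\sigma_{12}/\sigma_{11}$ is the regression coefficient; remember the connection between linear regression and conditional expectation as the minimizer of the sum of squared errors. Furthermore, $Z_2$ is conditionally normal given $Z_1$:
\[
Z_2|Z_1 \sim N\left( \mu_2 + \rho\sqrt{\frac{\sigma_{22}}{\sigma_{11}}}(Z_1 - \mu_1),\ \sigma_{22}(1 - \rho^2) \right). \tag{4.2.2}
\]
A reference on this topic is [BD77, §1.4].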
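As a quick illustration of (4.2.2), which is not taken from the original notes, one can apply it to two values of a Wiener process. The sketch assumes $0 < s < t$ and uses the covariance $\mathrm{Cov}[W(s), W(t)] = \min\{s, t\}$; the labels $Z_1 = W(t)$ and $Z_2 = W(s)$ are chosen only for this example. The parameters are
\[
\mu_1 = \mu_2 = 0, \quad \sigma_{11} = t, \quad \sigma_{12} = s, \quad \sigma_{22} = s, \quad \rho = \frac{s}{\sqrt{ts}} = \sqrt{\frac{s}{t}},
\]
so that
\[
W(s)\,|\,W(t) \sim N\left( \frac{s}{t}W(t),\ s\left(1 - \frac{s}{t}\right) \right).
\]
Given the later value $W(t)$, the best guess at the earlier value is the linear interpolation between $W(0) = 0$ and $W(t)$, and the conditional variance vanishes at both endpoints $s = 0$ and $s = t$.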
4.2.1. Constant Coefficients. If $X$'s arithmetic drift and volatility $\mu(t) = \mu$ and $\sigma(t) = \sigma$ are constant, then we compute conditional probabilities involving $X$ in terms of the normal distribution. In this case, the integrals are
\[
\int_0^t \mu(s)\, ds = \mu\int_0^t ds = \mu t \quad \text{and} \quad \int_0^t \sigma(s)\, dW(s) = \sigma\int_0^t dW(s) = \sigma W(t)
\]
so
\[
X(t) = X(0) + \mu t + \sigma W(t) \sim N(X(0) + \mu t, \sigma^2 t).
\]
This Itô process $X$ is generalized Brownian motion, whose expectation function is
\[
E[X(t)] = X(0) + \mu t
\]
and whose covariance function is
\[
\mathrm{Cov}[X(s), X(t)] = \mathrm{Cov}[\sigma W(s), \sigma W(t)] = \sigma^2 \mathrm{Cov}[W(s), W(t)] = \sigma^2 \min\{s, t\}.
\]
To sum up, $X(s)$ and $X(t)$ are bivariate normal with the following mean vector and covariance matrix:
\[
\begin{pmatrix} X(0) + \mu s \\ X(0) + \mu t \end{pmatrix} \quad \text{and} \quad \sigma^2 \begin{pmatrix} s & \min\{s, t\} \\ \min\{s, t\} & t \end{pmatrix}.
\]
For any Itô process $S$ with constant geometric coefficients, including the Black-Scholes stock price, we look at the transformation $X(t) = \ln S(t)$, which has constant arithmetic coefficients. In general, when
\[
dS(t) = S(t)(\mu(t)\, dt + \sigma(t)\, dW(t))
\]
Itô's formula yields, for $X(t) = \ln S(t)$,
\[
dX(t) = \left( \mu(t) - \frac{1}{2}\sigma^2(t) \right) dt + \sigma(t)\, dW(t).
\]
Now we know the distribution of $X$, as long as its coefficients, or equivalently the original coefficients of $S$, are deterministic. Thus we can find the distribution of $S(T)$: $P(S(T) \le K) = P(X(T) \le \ln K)$. In the particular case of the Black-Scholes stock price, the geometric coefficients of $S$, $\mu(t)$ and $\sigma(t)$, are just the constants $\mu$ and $\sigma$. The following examples extend Examples 2.1.3 and 2.1.4.
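Example 4.2.2 (Conditional binary call). What will be the conditional probability as of time $t$ that the stock price $S(T) \ge K$, so that the binary call option pays off?
\[
\begin{aligned}
P[S(T) \ge K|\mathcal{F}_t] &= P[X(T) \ge \ln K|\mathcal{F}_t] \\
&= P[X(t) + (\mu - \sigma^2/2)(T - t) + \sigma(W(T) - W(t)) \ge \ln K|\mathcal{F}_t] \\
&= P\left[ (W(T) - W(t)) \ge \frac{\ln K - X(t) - (\mu - \sigma^2/2)(T - t)}{\sigma} \,\middle|\, \mathcal{F}_t \right] \\
&= \Phi\left( \frac{\ln(S(t)/K) + (\mu - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}} \right).
\end{aligned}
\]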
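The result of Example 4.2.2 depends on the information at time $t$ only through the current stock price $S(t)$ and the remaining time $T - t$. A minimal sketch of the computation follows; the function name `conditional_binary_call_prob` and the argument `tau` (for $T - t$) are assumptions made for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_binary_call_prob(s_t, k, mu, sigma, tau):
    """P[S(T) >= K | F_t] under constant geometric drift mu and volatility sigma.

    s_t is the observed stock price at time t and tau = T - t > 0.
    """
    d = (math.log(s_t / k) + (mu - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return norm_cdf(d)

# When mu = sigma^2 / 2 the log-price has zero drift, so an at-the-money
# binary call pays off with probability one half.
p_atm = conditional_binary_call_prob(100.0, 100.0, 0.02, 0.2, 1.0)
# The probability rises with the observed price S(t).
p_high = conditional_binary_call_prob(120.0, 100.0, 0.02, 0.2, 1.0)
print(p_atm, p_high)
```

Note that this is a probability under the subjective measure $P$, so unlike the option price it does depend on the drift $\mu$.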
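Example 4.2.3 (Black-Scholes conditional loss probability). Recall that the loss event $V(T) < V(0)$ is $S(T) < S(0) + (\theta_0/\theta_1)(1 - e^{rT})$. Just let $K = S(0) + (\theta_0/\theta_1)(1 - e^{rT})$. Then we want to find
\[
\begin{aligned}
P[S(T) < K|\mathcal{F}_t] &= P\left[ (W(T) - W(t)) < \frac{\ln K - X(t) - (\mu - \sigma^2/2)(T - t)}{\sigma} \,\middle|\, \mathcal{F}_t \right] \\
&= \Phi\left( \frac{\ln(K/S(t)) - (\mu - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}} \right).
\end{aligned}
\]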
4.2.2. Deterministic Coefficients. Next we examine the case where the Itô process $X$ has arithmetic drift $\mu$ and volatility $\sigma$ which are deterministic functions of time: $\mu(t)$ and $\sigma(t)$ are the values at time $t$. In this case, the integral $\int_0^t \mu(s)\, ds$ is also a deterministic function of time $t$, and the stochastic integral $\int_0^t \sigma(s)\, dW(s)$ forms a stochastic process in time $t$. If we consider a fixed time $t$, then $\int_0^t \mu(s)\, ds$ is a constant and $\int_0^t \sigma(s)\, dW(s)$ is a random variable. Because $\sigma$ is deterministic, this random variable is normally distributed with mean 0 and variance given by the Itô isometry. So
\[
X(t) \sim N\left( X(0) + \int_0^t \mu(s)\, ds,\ \int_0^t \sigma^2(s)\, ds \right).
\]
In a sense, $X$ is an "even more generalized" Brownian motion. The random variables $X(t)$ and $X(u)$ are bivariate normal with the following mean vector and covariance matrix:
\[
\begin{pmatrix} X(0) + \int_0^t \mu(s)\, ds \\ X(0) + \int_0^u \mu(s)\, ds \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \int_0^t \sigma^2(s)\, ds & \int_0^{\min\{t,u\}} \sigma^2(s)\, ds \\ \int_0^{\min\{t,u\}} \sigma^2(s)\, ds & \int_0^u \sigma^2(s)\, ds \end{pmatrix}.
\]
In the extended Black-Scholes model, the geometric drift $\mu(t)$ and volatility $\sigma(t)$ of the stock, as well as the interest rate, are deterministic functions of time.
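Example 4.2.4 (Extended Black-Scholes). In the extended Black-Scholes model, what will be the conditional probability as of time $t$ that the stock price $S(T) \ge K$, so that the binary call option pays off?
\[
\begin{aligned}
P[S(T) \ge K|\mathcal{F}_t] &= P[X(T) \ge \ln K|\mathcal{F}_t] \\
&= P\left[ X(t) + \int_t^T \left( \mu(u) - \frac{1}{2}\sigma^2(u) \right) du + \int_t^T \sigma(u)\, dW(u) \ge \ln K \,\middle|\, \mathcal{F}_t \right] \\
&= P\left[ \int_t^T \sigma(u)\, dW(u) \ge \ln(K/S(t)) - \int_t^T \left( \mu(u) - \frac{1}{2}\sigma^2(u) \right) du \,\middle|\, \mathcal{F}_t \right] \\
&= \Phi\left( \frac{\ln(S(t)/K) + \int_t^T \left( \mu(u) - \frac{1}{2}\sigma^2(u) \right) du}{\sqrt{\int_t^T \sigma^2(u)\, du}} \right).
\end{aligned}
\]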
4.3. Martingales
A stochastic process $M$ is a martingale with respect to a filtration $(\mathcal{F}_t, t \in [0, T])$ and
probability measure $P$ when
* $M$ is adapted to the filtration.
* $M$ equals its own conditional expectation: $M(s) = E[M(t) \mid \mathcal{F}_s]$ when $0 \le s \le t$.
There is also a technical condition that the expectation $E[|M(t)|]$ exist and be finite for all
$t$, but since the definition already requires that conditional expectations be finite, only very
"bad" processes will fall afoul of this requirement, which we will ignore.
Example 4.3.1 (Generalized Brownian motion is sometimes a martingale). As we have
already seen, where $\mathcal{F}$ is the filtration generated by a Wiener process $W$, and $s < t$, then
$E[W(t) \mid \mathcal{F}_s] = W(s)$, so a Wiener process is a martingale. And where $X(t) = X(0) + \mu t + \sigma W(t)$,
\[
E[X(t) \mid \mathcal{F}_s] = X(0) + \mu t + \sigma E[W(t) \mid \mathcal{F}_s] = X(0) + \mu t + \sigma W(s)
\]
and this equals $X(s)$ only when $\mu = 0$.
It is actually important to remember that a process is a martingale only with respect
to a filtration (obviously) and a probability measure (because the conditional expectation
involves the probability measure). There is no intrinsic "martingality" that a process can
possess in itself. This becomes clearer in light of the interpretation of a martingale as a
fair game: if M(s) is a gambler's wealth at time s, then continuing to play the game does
not change the expected future wealth E[M(t)|Fs] = M(s). But suppose we change the
probability measure so that the dice are loaded. The game would no longer be fair. Or
suppose we change the filtration so that it is known at time s which cards will be dealt next.
For some cards, the player would expect to start losing, for others, expect to start winning.
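We now see that the Itô stochastic integral of a simple process is a martingale (with
respect to $\mathcal{F}$ and $P$). Obviously it is adapted, because for every $i = 1, \ldots, n$, $C(t_{i-1})$ and
$W(t_i) - W(t_{i-1})$ are knowable given $\mathcal{F}_T$. (Go back and look at the definition of the stochastic
integral of a simple process.) As for the martingale property, take $t_{j-1} \le t \le t_j \le T$, that
is, a time $t$ falling into the $j$th subinterval of the partition. Then
\begin{align*}
E\left[\int_0^T C(s)\,dW(s) \mid \mathcal{F}_t\right] &= E\left[\int_0^t C(s)\,dW(s) + \int_t^T C(s)\,dW(s) \mid \mathcal{F}_t\right] \\
&= \int_0^t C(s)\,dW(s) + E\left[\int_t^T C(s)\,dW(s) \mid \mathcal{F}_t\right]
\end{align*}
and
\begin{align*}
E\left[\int_t^T C(s)\,dW(s) \mid \mathcal{F}_t\right] &= E\left[C(t_{j-1})(W(t_j) - W(t)) + \sum_{i=j+1}^n C(t_{i-1})(W(t_i) - W(t_{i-1})) \mid \mathcal{F}_t\right] \\
&= C(t_{j-1})\,E[W(t_j) - W(t) \mid \mathcal{F}_t] + \sum_{i=j+1}^n E\left[E\left[C(t_{i-1})(W(t_i) - W(t_{i-1})) \mid \mathcal{F}_{t_{i-1}}\right] \mid \mathcal{F}_t\right] \\
&= C(t_{j-1}) \cdot 0 + \sum_{i=j+1}^n E\left[C(t_{i-1})\,E\left[W(t_i) - W(t_{i-1}) \mid \mathcal{F}_{t_{i-1}}\right] \mid \mathcal{F}_t\right] \\
&= 0
\end{align*}
so we see $E\left[\int_0^T C(s)\,dW(s) \mid \mathcal{F}_t\right] = \int_0^t C(s)\,dW(s)$. The stochastic integral of a more general
process $X$ is also a martingale, as long as it satisfies the technical integrability condition
(2.4.4).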
However, the Itô integral does not, in general, have independent increments.
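Example 4.3.2 (Itô integral of a simple process). One simple, adapted process on $[0, 2]$
is given by the partition $(0, 1, 2)$ and $C(0) = 1$, $C(1) = \mathbf{1}\{W(1) \ge 0\}$, and $C(2) = C(1)$. For
$T \in [0, 1]$, the integral $\int_0^T C(t)\,dW(t) = C(0)(W(T) - W(0)) = W(T)$. For $T \in [1, 2]$, the
integral
\begin{align*}
\int_0^T C(t)\,dW(t) &= C(0)(W(1) - W(0)) + C(1)(W(T) - W(1)) \\
&= W(1) + \mathbf{1}\{W(1) \ge 0\}(W(T) - W(1)) \\
&= \begin{cases} W(1) & \text{if } W(1) < 0 \\ W(T) & \text{if } W(1) \ge 0. \end{cases}
\end{align*}
The integral does not have independent increments: $\int_1^2 C(t)\,dW(t)$ obviously depends on
$\int_0^1 C(t)\,dW(t) = W(1)$; we can see that the later increment has become "contaminated" by
$C(1)$, which depends on the past. However, the integral is a martingale. For $t < T \le 1$,
\[
E\left[\int_0^T C(s)\,dW(s) \mid \mathcal{F}_t\right] = E[W(T) \mid \mathcal{F}_t] = W(t) = \int_0^t C(s)\,dW(s).
\]
For $t < 1 \le T$,
\begin{align*}
E\left[\int_0^T C(s)\,dW(s) \mid \mathcal{F}_t\right] &= E[W(1) + \mathbf{1}\{W(1) \ge 0\}(W(T) - W(1)) \mid \mathcal{F}_t] \\
&= E[W(1) \mid \mathcal{F}_t] + E[E[\mathbf{1}\{W(1) \ge 0\}(W(T) - W(1)) \mid \mathcal{F}_1] \mid \mathcal{F}_t] \\
&= W(t) + E[\mathbf{1}\{W(1) \ge 0\}\,E[W(T) - W(1) \mid \mathcal{F}_1] \mid \mathcal{F}_t] \\
&= W(t) + E[\mathbf{1}\{W(1) \ge 0\} \cdot 0 \mid \mathcal{F}_t] \\
&= W(t) = \int_0^t C(s)\,dW(s).
\end{align*}
For $1 \le t < T$,
\[
E\left[\int_0^T C(s)\,dW(s) \mid \mathcal{F}_t\right] = \begin{cases} E[W(1) \mid \mathcal{F}_t] = W(1) & \text{if } W(1) < 0 \\ E[W(T) \mid \mathcal{F}_t] = W(t) & \text{if } W(1) \ge 0 \end{cases}
\]
and in either case, this is $\int_0^t C(s)\,dW(s)$.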
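A small simulation can make both claims of Example 4.3.2 concrete. This is only a sketch under
our own assumptions (the sample size, the seed, and the function name are arbitrary choices,
not from the notes): it samples $(I(1), I(2))$ for $I(T) = \int_0^T C(t)\,dW(t)$, then checks that
the increment $I(2) - I(1)$ averages to roughly zero, as the martingale property requires, and
that it vanishes whenever $I(1) = W(1) < 0$, which is the dependence on the past.

```python
import random

def simulate_integral(rng):
    """Draw (I(1), I(2)) for the simple integrand of Example 4.3.2."""
    w1 = rng.gauss(0.0, 1.0)           # W(1)
    w2 = w1 + rng.gauss(0.0, 1.0)      # W(2) = W(1) + an independent N(0, 1) increment
    c1 = 1.0 if w1 >= 0.0 else 0.0     # C(1) = 1{W(1) >= 0}
    return w1, w1 + c1 * (w2 - w1)

rng = random.Random(0)                 # fixed seed so the run is reproducible
samples = [simulate_integral(rng) for _ in range(100_000)]

# Martingale property: the increment I(2) - I(1) has mean zero.
mean_increment = sum(i2 - i1 for i1, i2 in samples) / len(samples)

# Dependence on the past: the increment is identically zero whenever I(1) < 0.
increments_after_negative = [i2 - i1 for i1, i2 in samples if i1 < 0.0]
all_zero = all(inc == 0.0 for inc in increments_after_negative)

print(mean_increment, all_zero)
```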
4.4. The Markov Property
Let $\mathcal{F}_t$ represent the information generated by observing $(X(s), s \in [0, t])$. Then a process
$X$ is Markov when for $s \le t \le T$ the conditional distribution of $X(T)$ given $X(t)$ is the
same as the conditional distribution of $X(T)$ given $\mathcal{F}_t$:
\[
P[X(T) < x \mid \mathcal{F}_t] = P[X(T) < x \mid X(t)]
\]
for all $x$. That is, for a Markov process, the future depends only on the present, not the
past.
For any process with independent increments, future changes depend on neither the past
nor the present, so the process is Markov. An Itô process $X$ with deterministic arithmetic
coefficients has independent increments and thus is Markov. A one-to-one function of a
Markov process is also Markov.
The Markov property is a type of memorylessness, but do not confuse it with the memorylessness
of the exponential distribution for arrival times. Also do not confuse a Markov
process with a martingale. The martingale property is only about conditional expectations,
not the whole conditional distribution. However, it also specifies what this conditional expectation
must be, while the Markov property does not say what the conditional distribution
is. A Markov process also need not have independent increments.
Example 4.4.1 (Markov $\neq$ martingale). The Poisson process is Markov (because it has
independent increments) but not a martingale: recall its conditional expectation is
\[
E[N(T) \mid \mathcal{F}_t] = N(t) + \lambda(T - t) \neq N(t).
\]
The process $I(t) = \int_0^t C(s)\,dW(s)$, where $C$ is a stochastic process satisfying the technical
requirements, is a martingale, but not in general Markov. That is because $(I(s), s \in [0, t])$
contains more information about $(C(s), s \in (t, T))$ than $I(t)$ alone does. For instance, look
at Example 4.3.2. There the conditional distribution of $I(2)$ given $I(3/2)$ is not the same as
that given knowledge of $(I(s), s \in [0, 3/2])$. There is a great value to knowing $I(1) = W(1)$,
since if $W(1) < 0$, then $I(2) - I(1) = 0$, whereas otherwise it is standard normal. But $I(1)$
is not known given $I(3/2)$.
4.5. Summary
A $\sigma$-algebra represents knowledge. A filtration represents knowledge that increases over
time as we observe the market. A conditional expectation is an expectation given the knowledge
available. The conditional expectation of a knowable random variable is simply the
random variable itself. The conditional expectation of a random variable given no relevant
information is just the ordinary unconditional expectation. Conditional expectation is linear,
and in a pair of nested conditional expectations, the less informative $\sigma$-algebra wins. A
conditional probability is a conditional expectation of an indicator function.
An Itô process is most easily represented as a stochastic differential equation (SDE) and
an initial condition. When applying conditional probability to Itô processes, it helps to use
a late-starting representation. It also helps to understand the bivariate normal distribution
thoroughly. The basic facts are that the Wiener process has increments that are independent
and have zero expectation, and it accumulates variance linearly in time.
A martingale is like a fair gambling game. Under some technical conditions, stochastic
integrals are martingales. The stochastic integral of a deterministic integrand has independent
increments and thus is a Markov process. These properties do not hold generally when
the integrand is stochastic.
4.6. Problems
In the following problems, do not attempt to verify the definition of conditional expectation.
Use your intuition, or rules in boxes. Your reasoning should be correct and explicitly
stated, but need not be rigorous at all.
Problem 4.1. Compute the conditional probability that a standard Brownian motion
will be positive at time $T$ given the information up to time $t < T$: $P[W(T) > 0 \mid \mathcal{F}_t] =
E[\mathbf{1}\{W(T) > 0\} \mid \mathcal{F}_t]$. Next compute the conditional probability that a generalized Brownian
motion given by $X(t) = X(0) + \mu t + \sigma W(t)$ will be positive at time $T$: $P[X(T) > 0 \mid \mathcal{F}_t]$.
Finally, what happens to this conditional probability as:
(1) The future time $T$ approaches $t$.
(2) The future time $T$ goes to infinity.
(3) The volatility $\sigma$ approaches 0.
(4) The volatility $\sigma$ goes to infinity.
(5) The drift $\mu$ goes to infinity.
(6) The drift $\mu$ goes to negative infinity.
Problem 4.2. Compute E[W(t)W(u)|Fs] where s < t < u.
Problem 4.3. Suppose a stock price process $S$ is a geometric Brownian motion, as in
the Black-Scholes model, and let $X = \ln S$. Then for $0 < t < T$, $X(t)$ and $X(T)$ have a
bivariate normal distribution. What is its mean vector and covariance matrix? Use this
result to compute $P[S(T) \ge K \mid S(t)]$. Is this equal to $P[S(T) \ge K \mid \mathcal{F}_t]$ where $\mathcal{F}_t$ is (as usual)
the information generated by $(W(s), s \in [0, t])$? Why or why not?
Problem 4.4. What is the conditional loss probability, as in Example 4.2.3, but in the
extended Black-Scholes model where $\mu$, $\sigma$, and $r$ are all deterministic functions of time?
CHAPTER 5
The Black-Scholes Analysis: Part II
In this chapter, we
(1) learn the Feynman-Kac formula
(2) learn Girsanov transformation
(3) apply them while deriving the Black-Scholes formula
5.1. The Feynman-Kac Formula
The Black-Scholes PDE (3.1.1), together with a terminal condition such as $f(T, S) =
(S_T - K)^+$, is an example of a Cauchy problem, which is one kind of PDE boundary-value
problem. In financial engineering, there are several useful variants of the Cauchy problem,
so it is worth knowing in generality. The Cauchy problem is to solve the PDE
\[
f_t(t, x) + \mu(t, x) f_x(t, x) + \tfrac{1}{2}\sigma^2(t, x) f_{xx}(t, x) - r(t, x) f(t, x) + h(t, x) = 0, \tag{5.1.1}
\]
with the terminal condition $f(T, x) = g(x)$. Under some uninteresting technical conditions
on the coefficients $\mu$, $\sigma$, $r$, $h$ and the boundary value $g$, the Cauchy problem has a solution
that can be represented as
\[
f(t, x) = E^Q_{t,x}\left[g(X(T)) D_t(T) + \int_t^T h(u, X(u)) D_t(u)\,du\right] \tag{5.1.2}
\]
where $D_t$ is a stochastic discount process given by $D_t(u) = D(u)/D(t)$ and
\[
D(u) = \exp\left(-\int_0^u r(s, X(s))\,ds\right) \quad\text{so}\quad D_t(u) = \exp\left(-\int_t^u r(s, X(s))\,ds\right) \tag{5.1.3}
\]
and $X$ is the Itô process with late-starting representation
\[
X(u) = x + \int_t^u \mu(s, X(s))\,ds + \int_t^u \sigma(s, X(s))\,dW^Q(s) \tag{5.1.4}
\]
and where $W^Q$ is a Wiener process under the probability measure $Q$. The subscript $t, x$
on the expectation indicates that we are conditioning on $X(t) = x$. The interpretation is
that $D_t(u)$ is a stochastic discount factor applying to the time interval $[t, u]$, and $X$ is a
stochastic process under a new probability measure $Q$, generally different from $P$, which we
had in mind before.
This result is the Feynman-Kac formula, and it is a very important tool, which gives
a connection between a PDE and a stochastic process: the idea is that $X(t)$ is a random
position in space ($x$) at time $t$. The Feynman-Kac formula turns a PDE problem into a
problem in stochastic calculus, namely computing an expectation of a function of an Itô
process, something we have an idea how to handle with the Itô formula. One potential
conceptual pitfall is the new measure $Q$ under which $X$ is an Itô process with arithmetic
drift $\mu(t, X(t))$ and volatility $\sigma(t, X(t))$: it need not be the same as $P$, the measure under
which the original problem was defined.
Going through a nontechnical proof will help comprehension of this fundamental result.
First, consider the simpler case where $h$ and $r$ are zero, so the PDE is
\[
f_t(t, x) + \mu(t, x) f_x(t, x) + \tfrac{1}{2}\sigma^2(t, x) f_{xx}(t, x) = 0.
\]
Define the stochastic process $Y(t) = f(t, X(t))$. Applying the Itô formula,
\begin{align*}
dY(t) &= \left(f_t(t, X(t)) + \mu(t, X(t)) f_x(t, X(t)) + \tfrac{1}{2}\sigma^2(t, X(t)) f_{xx}(t, X(t))\right) dt \\
&\quad + \sigma(t, X(t)) f_x(t, X(t))\,dW^Q(t) \\
&= \sigma(t, X(t)) f_x(t, X(t))\,dW^Q(t)
\end{align*}
because the PDE says precisely that the drift is zero. Therefore $Y$ is a martingale (with
respect to $Q$ and the Brownian filtration $\mathcal{F}$, and under some technical conditions). The terminal
condition is $Y(T) = g(X(T))$, so the martingale property says $Y(t) = E^Q[g(X(T)) \mid \mathcal{F}_t]$.
The process $X$ is Markov, because its Itô coefficients $\mu(t, X(t))$ and $\sigma(t, X(t))$, while stochastic,
are functions of $X$ itself. So
\[
f(t, X(t)) = Y(t) = E^Q[g(X(T)) \mid \mathcal{F}_t] = E^Q[g(X(T)) \mid X(t)] = E^Q_{t,X(t)}[g(X(T))].
\]
This proves that $f(t, x) = E^Q_{t,x}[g(X(T))]$.
This illustrates the main idea, and we now go back to the more general version:
\[
f_t(t, x) + \mu(t, x) f_x(t, x) + \tfrac{1}{2}\sigma^2(t, x) f_{xx}(t, x) - r(t, x) f(t, x) + h(t, x) = 0.
\]
This time define $Y(u) = D_t(u) f(u, X(u))$. From the definition (5.1.3),
\[
dD_t(u) = -r(u, X(u)) D_t(u)\,du.
\]
Then the differential of $Y$ is $D_t(u)\,df(u, X(u)) + f(u, X(u))\,dD_t(u)$, whose drift (the coefficient of $du$) is
\[
D_t(u)\left(f_t(u, X(u)) + \mu(u, X(u)) f_x(u, X(u)) + \tfrac{1}{2}\sigma^2(u, X(u)) f_{xx}(u, X(u)) - r(u, X(u)) f(u, X(u))\right).
\]
We really ought to justify this by means of the vector Itô formula, which we will develop
later. For now, we can see that it makes intuitive sense because it follows from the ordinary
product rule of differentiation, which would certainly apply here if $r$ (and thus $D$ and $D_t$)
were deterministic.
Applying the PDE, this drift is $-D_t(u) h(u, X(u))$, so the full stochastic differential is
\[
dY(u) = D_t(u)\left(-h(u, X(u))\,du + \sigma(u, X(u)) f_x(u, X(u))\,dW^Q(u)\right),
\]
and integrating,
\[
Y(T) = Y(t) - \int_t^T D_t(u) h(u, X(u))\,du + \int_t^T D_t(u)\sigma(u, X(u)) f_x(u, X(u))\,dW^Q(u).
\]
Since $Y(T) = D_t(T) f(T, X(T)) = D_t(T) g(X(T))$ and $Y(t) = D_t(t) f(t, X(t)) = f(t, X(t)) =
f(t, x)$, we have
\[
f(t, x) = D_t(T) g(X(T)) + \int_t^T D_t(u) h(u, X(u))\,du - (M(T) - M(t))
\]
where the $Q$-martingale $M$ is the last, stochastic integral term. Again because $X$ is Markov,
$E^Q_{t,x}[M(T) - M(t)] = E^Q[M(T) - M(t) \mid \mathcal{F}_t] = M(t) - M(t) = 0$, so
\[
f(t, x) = E^Q_{t,x}\left[g(X(T)) D_t(T) + \int_t^T h(u, X(u)) D_t(u)\,du\right].
\]
The coefficients of the Cauchy problem have the following interpretations in financial
engineering:
* $\mu$ is the coefficient of $f_x$, and it becomes the $Q$-drift of the process $X$.
* $\sigma$ is the square root of twice the coefficient of $f_{xx}$, and it becomes the $Q$-volatility
of the process $X$.
* $r$ is the negative coefficient of $f$, and it becomes the instantaneous interest rate used
for discounting.
* $h$ is the nonhomogeneous term (i.e. the term not multiplying $f$ or any of its derivatives),
and it is the continuous payment stream of the derivative security.
* $g$ is the terminal condition, and it is the terminal payoff of the derivative security.
Thus the Feynman-Kac formula says that the value of the derivative security is the expected
discounted sum of the terminal payoff and the cumulative continuous payments, with
expectations taken under the probability measure $Q$.
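Read this way, the Feynman-Kac representation (5.1.2) suggests a direct numerical recipe:
simulate paths of $X$ under $Q$, accumulate the discounted payments, and average. The sketch
below is an illustration of that idea only. The Euler time-stepping scheme, the function name
`feynman_kac_mc`, and every parameter value are our own assumptions rather than anything
specified in the text.

```python
import math
import random

def feynman_kac_mc(t, x, T, mu, sigma, r, h, g, n_paths=10_000, n_steps=20, seed=0):
    """Monte Carlo estimate of E^Q_{t,x}[g(X(T)) D_t(T) + int_t^T h(u, X(u)) D_t(u) du]."""
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    total = 0.0
    for _ in range(n_paths):
        u, X, D, stream = t, x, 1.0, 0.0
        for _ in range(n_steps):
            stream += h(u, X) * D * dt              # payment over [u, u + dt], discounted to t
            D *= math.exp(-r(u, X) * dt)            # update the discount factor D_t(u)
            dW = rng.gauss(0.0, math.sqrt(dt))      # Wiener increment under Q
            X += mu(u, X) * dt + sigma(u, X) * dW   # Euler step for the SDE (5.1.4)
            u += dt
        total += g(X) * D + stream
    return total / n_paths

# Hypothetical constant coefficients with payoff g(x) = x and no payment stream.
estimate = feynman_kac_mc(
    0.0, 1.0, 1.0,
    mu=lambda u, x: 0.03, sigma=lambda u, x: 0.2,
    r=lambda u, x: 0.05, h=lambda u, x: 0.0, g=lambda x: x,
)
exact = math.exp(-0.05) * (1.0 + 0.03)   # discounted mean of X(T) for these constants
print(estimate, exact)
```

For constant coefficients and a linear payoff the expectation is available in closed form,
which is why that case is used here as a check on the simulation.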
The point of all this is that we might have an easier time evaluating the $Q$-expectation
of the Feynman-Kac formula than staring at the PDE of the Cauchy problem and trying to
guess what its solution is given the particular boundary condition $g$. Let's see this in the
Black-Scholes setting. Recalling our identification of stock price $S$ with space $x$, the formula
yields the initial option price
\[
f(0, S(0)) = E^Q\left[(S(T) - K)^+ e^{-rT}\right]
\]
where now the stock price $S$ has a representation as an Itô process under $Q$
\[
S(t) = S(0) + \int_0^t r S(s)\,ds + \int_0^t \sigma S(s)\,dW^Q(s) \quad\text{or}\quad dS(t) = S(t)(r\,dt + \sigma\,dW^Q(t)).
\]
Thus in the world as seen through probability measure $Q$, the stock price still starts at
$S(0)$ and is a geometric Brownian motion with volatility $\sigma$, but now it has drift $r$, the risk-free
interest rate. For this reason, $Q$ is known as the risk-neutral measure: if investors
were neutral to risk, there would be the same reward (expected return) for holding a risky
asset like the stock as for holding a riskless asset like the money market account. Of course
this is not true, and in the real world, as seen through the probability measure $P$ under
which the original $W$ is a Wiener process, the stock has geometric drift $\mu > r$. What we
have discovered by applying the Feynman-Kac formula is that computing option prices in the
Black-Scholes model is done under the risk-neutral measure $Q$ under which the stock has geometric
drift $r$, or if you like, by pretending that the stock has geometric drift $r$.
Example 5.1.1. Consider the PDE
\[
f_t(t, x) + \mu(t) f_x(t, x) + \tfrac{1}{2}\sigma^2(t) f_{xx}(t, x) - r(t) f(t, x) = 0,
\]
where $\mu$, $\sigma$, and $r$ are deterministic processes, and with terminal condition $f(T, x) = g(x) = x$.
The Feynman-Kac formula says that the solution is
\[
f(t, x) = E^Q_{t,x}[X(T) D_t(T)]
\]
where (assuming $X(t) = x$)
\[
X(T) = x + \int_t^T \mu(u)\,du + \int_t^T \sigma(u)\,dW^Q(u).
\]
So
\[
f(t, x) = D_t(T)\,E^Q\left[x + \int_t^T \mu(u)\,du + \int_t^T \sigma(u)\,dW^Q(u)\right] = D_t(T)\left(x + \int_t^T \mu(u)\,du\right).
\]
In particular, consider the case $\mu(t) = r - \sigma^2/2$, $\sigma(t) = \sigma$, and $r(t) = r$, i.e. the PDE is
\[
f_t(t, x) + (r - \sigma^2/2) f_x(t, x) + \tfrac{1}{2}\sigma^2 f_{xx}(t, x) - r f(t, x) = 0.
\]
Then we get a generalized Brownian motion $dX(t) = (r - \sigma^2/2)\,dt + \sigma\,dW^Q(t)$ and a constant
interest rate of $r$. This corresponds to the log stock price $X = \ln S$ in the Black-Scholes
model: we would get the same expression for $dX(t)$ by applying Itô's formula to $dS(t) =
S(t)(r\,dt + \sigma\,dW^Q(t))$. In this case, substituting $\ln S$ for $X$, we have
\[
\ln S(T) = \ln S(t) + (r - \sigma^2/2)(T - t) + \sigma(W^Q(T) - W^Q(t)),
\]
which shows
\[
f(t, \ln S(t)) = e^{-r(T-t)}\left(\ln S(t) + (r - \sigma^2/2)(T - t)\right)
\]
is the price at $t$ of the derivative security with payoff $\ln S(T)$ at maturity $T$. Note that this
payoff is negative if $S(T) < 1$!
If this seems fishy to you, check out the next example, which confirms it. This shows
how different PDE coefficients $\mu$ and $\sigma$ and terminal condition $g$ can relate to what is
fundamentally the same problem. In what we just saw, the process $X$ is generalized Brownian
motion, and the payoff is $X(T)$, while in what follows, $X$ is geometric Brownian motion, and
the payoff is $\ln X(T)$.
Example 5.1.2. Consider the PDE
\[
f_t(t, x) + r x f_x(t, x) + \tfrac{1}{2}\sigma^2 x^2 f_{xx}(t, x) - r f(t, x) = 0,
\]
with terminal condition $f(T, x) = g(x) = \ln x$. The Feynman-Kac formula says that the
solution is
\[
f(t, x) = E^Q_{t,x}\left[e^{-r(T-t)} \ln X(T)\right]
\]
where
\[
dX(t) = X(t)(r\,dt + \sigma\,dW^Q(t)).
\]
By Itô's formula,
\[
d\ln X(t) = (r - \sigma^2/2)\,dt + \sigma\,dW^Q(t),
\]
so
\begin{align*}
f(t, x) &= e^{-r(T-t)} E^Q_{t,x}[\ln X(T)] \\
&= e^{-r(T-t)} E^Q\left[\ln x + (r - \sigma^2/2)(T - t) + \sigma(W^Q(T) - W^Q(t))\right] \\
&= e^{-r(T-t)}\left(\ln x + (r - \sigma^2/2)(T - t)\right).
\end{align*}
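One way to gain confidence in a closed form like this is to substitute it back into the PDE
numerically. The following sketch approximates the partial derivatives by central differences
and evaluates the left-hand side of the PDE of Example 5.1.2, which should be close to zero.
The parameter values, step size, and function names are arbitrary assumptions made for the
illustration.

```python
import math

R, SIGMA, T = 0.05, 0.2, 1.0   # hypothetical parameter values

def f(t, x):
    """Candidate solution from Example 5.1.2: price at time t of the payoff ln X(T)."""
    return math.exp(-R * (T - t)) * (math.log(x) + (R - SIGMA**2 / 2) * (T - t))

def pde_residual(t, x, eps=1e-4):
    """Left-hand side f_t + r x f_x + (1/2) sigma^2 x^2 f_xx - r f, by central differences."""
    f_t = (f(t + eps, x) - f(t - eps, x)) / (2 * eps)
    f_x = (f(t, x + eps) - f(t, x - eps)) / (2 * eps)
    f_xx = (f(t, x + eps) - 2 * f(t, x) + f(t, x - eps)) / eps**2
    return f_t + R * x * f_x + 0.5 * SIGMA**2 * x**2 * f_xx - R * f(t, x)

residuals = [pde_residual(0.3, x) for x in (0.5, 1.0, 2.0)]
print(residuals)            # each entry should be near zero
print(f(T, 2.0), math.log(2.0))   # terminal condition f(T, x) = ln x
```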
5.2. Girsanov Transformation
For the European call option, we need to evaluate the expectation $E^Q_{t,S(t)}[(S(T) - K)^+ e^{-r(T-t)}]$.
The payoff $(S(T) - K)^+ = \max\{S(T) - K, 0\} = (S(T) - K)\mathbf{1}\{S(T) > K\}$, so
\begin{align*}
f(t, S(t)) &= E^Q_{t,S(t)}\left[(S(T) - K)^+ e^{-r(T-t)}\right] \\
&= e^{-r(T-t)} E^Q_{t,S(t)}[(S(T) - K)\mathbf{1}\{S(T) > K\}] \\
&= e^{-r(T-t)}\left(E^Q_{t,S(t)}[S(T)\mathbf{1}\{S(T) > K\}] - K\,Q_{t,S(t)}[S(T) > K]\right).
\end{align*}
The probability $Q_{t,S(t)}[S(T) > K]$ is easy to handle, since $S$ is a geometric Brownian
motion under $Q$. By the Itô formula,
\[
\ln S(T) = \ln S(t) + \int_t^T (r - \sigma^2/2)\,du + \int_t^T \sigma\,dW^Q(u) \tag{5.2.1}
\]
so it has under $Q$, conditional on $S(t)$, a normal distribution with mean $\ln S(t) + (r -
\sigma^2/2)(T - t)$ and variance $\sigma^2(T - t)$. Therefore the probability is
\begin{align*}
Q_{t,S(t)}[S(T) > K] &= Q_{t,S(t)}[-\ln S(T) < -\ln K] = \Phi\left(\frac{-\ln K + E_{t,S(t)}[\ln S(T)]}{\sqrt{\mathrm{Var}_{t,S(t)}[\ln S(T)]}}\right) \\
&= \Phi\left(\frac{\ln(S(t)/K) + (r - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right) = \Phi(d_2). \tag{5.2.2}
\end{align*}
The other term $E^Q_{t,S(t)}[S(T)\mathbf{1}\{S(T) > K\}]$ is not so easy to handle.
To deal with it, we need yet more mathematical machinery: the Girsanov transformation.
This is a way of changing a complicated expectation into an easier expectation under a new
probability measure. The idea is to define a new probability measure so that the change
of measure will "use up" a factor in the random variable whose expectation you need to
take. The new probability measure alters the drift of the old Wiener process by an amount
related to the factor used up in the change of measure. Loosely speaking, Girsanov's theorem
relates changes of probability measure to changes of the drift of Brownian motion. That is,
for some pairs of probability measures $P$ and $Q$, with $W^P$ and $W^Q$ Wiener processes under
$P$ and $Q$ respectively, this result gives the relationship between $W^P$ and $W^Q$. This is a
useful tool because we will find that the easiest way to evaluate some expectations (such as
$E^Q[S(T)\mathbf{1}\{S(T) > K\}]$) is by changing the probability measure under which they are taken.
But how can one change the probability measure under which an expectation is taken?
Supposing that the original probability measure is $P$, there are some limits on the measures
that we can change to. Remember that we think of expectations as integrals: for a random
variable $X$,
\[
E^P[X] = \int X(\omega)\,dP(\omega),
\]
where in the simplest case, $P$ has a density $p$, and $dP(\omega) = p(\omega)\,d\omega$. In particular,
\[
P[A] = \int \mathbf{1}\{\omega \in A\}\,dP(\omega).
\]
If some other probability measure $\tilde{P}$ can be defined by
\[
\tilde{P}[A] = \int \mathbf{1}\{\omega \in A\}\,d\tilde{P}(\omega) = \int \mathbf{1}\{\omega \in A\}\,Y(\omega)\,dP(\omega)
\]
with some nonnegative random variable $Y$, then we say that $\tilde{P}$ is absolutely continuous
with respect to $P$. This is written $\tilde{P} \ll P$ because it implies that if $P[A] = 0$, then
$\tilde{P}[A] = 0$. (Note that this does not imply that $\tilde{P}[A] \le P[A]$ for all events $A$, which would
be impossible!) We also say that the random variable $Y$ is the density of $\tilde{P}$ with respect
to $P$ (or Radon-Nikodym derivative, or likelihood ratio), written $d\tilde{P}/dP$. The reason for this
notation is that we imagine cancelling $dP$ in the equation
\[
E^{\tilde{P}}[X] = \int X\,d\tilde{P} = \int X \frac{d\tilde{P}}{dP}\,dP = E^P\left[X \frac{d\tilde{P}}{dP}\right]
\]
that shows how to change the probability measure under which an expectation is taken. This
is exactly the same thing that is going on in importance sampling in Monte Carlo simulation.
Girsanov's theorem tells us about the density $d\tilde{P}/dP$ we need to change to a measure
$\tilde{P}$ under which the drift of Brownian motion changes. This density will be a stochastic
exponential, i.e. a positive Itô process of the form
\[
Z(t) = \exp\left(-\frac{1}{2}\int_0^t \theta^2(s)\,ds + \int_0^t \theta(s)\,dW(s)\right)
\]
where $\theta$ is another stochastic process; this is called the stochastic exponential of $\theta$. Under
the usual sort of technical condition, the stochastic exponential is a martingale, since $dZ(t) =
Z(t)\theta(t)\,dW(t)$ has zero drift, by the Itô formula.
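The change-of-measure identity $E^{\tilde{P}}[X] = E^P[X\,d\tilde{P}/dP]$ is easiest to see on a finite
sample space, where the density is just a ratio of probabilities. The toy example below is
purely illustrative: the three outcomes, both sets of probabilities, and the values of the
random variable are made-up numbers, not taken from the notes.

```python
# A three-point sample space with hypothetical probabilities.
outcomes = ["down", "flat", "up"]
p = {"down": 0.3, "flat": 0.4, "up": 0.3}            # original measure P
p_tilde = {"down": 0.5, "flat": 0.25, "up": 0.25}    # new measure, absolutely continuous w.r.t. P
x = {"down": -1.0, "flat": 0.0, "up": 2.0}           # a random variable X

# Density Y = dP~/dP, defined outcome by outcome.
density = {w: p_tilde[w] / p[w] for w in outcomes}

direct = sum(x[w] * p_tilde[w] for w in outcomes)               # E under P~ of X
reweighted = sum(x[w] * density[w] * p[w] for w in outcomes)    # E under P of X * dP~/dP
density_mean = sum(density[w] * p[w] for w in outcomes)         # E under P of dP~/dP

print(direct, reweighted, density_mean)
```

The two expectations agree up to rounding, and the density has $P$-expectation 1, as any
density of one probability measure with respect to another must.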
Girsanov's theorem states: Where $W^P$ is a Wiener process under $P$, and $d\tilde{P}/dP =
Z(T)$ as given above, then $W^{\tilde{P}}$ given by $dW^{\tilde{P}}(t) = dW^P(t) - \theta(t)\,dt$ is a Wiener process
under $\tilde{P}$, for $t \in [0, T]$.
Example 5.2.1 (Brownian Motion with Drift). Suppose $W$ is a Wiener process and
$\tilde{W}(t) = W(t) + at$. Then $\tilde{W}$ is a Wiener process under $\tilde{P}$ where $d\tilde{P}/dP = \exp(-a^2 T/2 - aW(T))$.
Here follows an informal justification of Girsanov's theorem, for the case where the change
of drift $\theta$ is a constant. The key is the way that the stochastic exponential relates to the
standard normal density.
The goal is to show that $\tilde{W}$, given by $\tilde{W}(t) = W(t) - \theta t$, is a Wiener process on $[0, T]$
under the measure $\tilde{P}$ defined by $d\tilde{P}/dP = \exp(-\theta^2 T/2 + \theta W(T))$. Obviously $\tilde{W}$ starts
at zero and has continuous paths. We just need to check that under $\tilde{P}$ it has independent
increments, and $\tilde{W}(t)$ has distribution $N(0, t)$ for $t \le T$.
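First we check the independence of increments by showing that the joint probability that
the increment $\tilde{W}(t)$ takes a value in some set $A_1$ and that the increment $\tilde{W}(T) - \tilde{W}(t)$ takes
a value in some set $A_2$ is the product of the marginal probabilities.
\begin{align*}
&\tilde{P}[\tilde{W}(t) \in A_1, \tilde{W}(T) - \tilde{W}(t) \in A_2] \\
&= E^{\tilde{P}}\left[\mathbf{1}\{\tilde{W}(t) \in A_1\}\mathbf{1}\{\tilde{W}(T) - \tilde{W}(t) \in A_2\}\right] \\
&= E^P\left[\frac{d\tilde{P}}{dP}\mathbf{1}\{W(t) - \theta t \in A_1\}\mathbf{1}\{W(T) - W(t) - \theta(T - t) \in A_2\}\right] \\
&= E^P\left[\exp\left(-\tfrac{1}{2}\theta^2 T + \theta W(T)\right)\mathbf{1}\{W(t) - \theta t \in A_1\}\mathbf{1}\{W(T) - W(t) - \theta(T - t) \in A_2\}\right] \\
&= E^P\Big[\exp\left(-\tfrac{1}{2}\theta^2 t + \theta W(t)\right)\mathbf{1}\{W(t) - \theta t \in A_1\} \\
&\qquad \times \exp\left(-\tfrac{1}{2}\theta^2 (T - t) + \theta(W(T) - W(t))\right)\mathbf{1}\{W(T) - W(t) - \theta(T - t) \in A_2\}\Big] \\
&= E^P\left[\exp\left(-\tfrac{1}{2}\theta^2 t + \theta W(t)\right)\mathbf{1}\{W(t) - \theta t \in A_1\}\right] \\
&\qquad \times E^P\left[\exp\left(-\tfrac{1}{2}\theta^2 (T - t) + \theta(W(T) - W(t))\right)\mathbf{1}\{W(T) - W(t) - \theta(T - t) \in A_2\}\right]
\end{align*}
because $W$ has independent increments under $P$. Because this probability factors, $\tilde{W}$ has
independent increments under $\tilde{P}$.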
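Now we focus just on the first of these expectations, in order to check the distribution of
$\tilde{W}(t)$ under $\tilde{P}$. Writing $\varphi$ for the standard normal density, so that $W(t)$ has density
$\varphi(x/\sqrt{t})/\sqrt{t}$ under $P$, and substituting $y = x - \theta t$,
\begin{align*}
\tilde{P}[\tilde{W}(t) \in A_1] &= E^P\left[\mathbf{1}\{W(t) - \theta t \in A_1\}\exp\left(-\tfrac{1}{2}\theta^2 t + \theta W(t)\right)\right] \\
&= \int \mathbf{1}\{x - \theta t \in A_1\}\exp\left(-\tfrac{1}{2}\theta^2 t + \theta x\right)\frac{1}{\sqrt{t}}\varphi(x/\sqrt{t})\,dx \\
&= \int \mathbf{1}\{y \in A_1\}\exp\left(-\tfrac{1}{2}\theta^2 t + \theta y + \theta^2 t\right)\frac{1}{\sqrt{t}}\varphi((y + \theta t)/\sqrt{t})\,dy \\
&= \frac{1}{\sqrt{2\pi t}}\int_{A_1}\exp\left(+\tfrac{1}{2}\theta^2 t + \theta y\right)\exp\left(-\frac{1}{2t}(y + \theta t)^2\right) dy \\
&= \frac{1}{\sqrt{2\pi t}}\int_{A_1}\exp\left(-\frac{1}{2t}\left(-\theta^2 t^2 - 2\theta y t + y^2 + 2\theta y t + \theta^2 t^2\right)\right) dy \\
&= \frac{1}{\sqrt{2\pi t}}\int_{A_1}\exp\left(-\frac{1}{2}\left(\frac{y}{\sqrt{t}}\right)^2\right) dy
\end{align*}
which shows that $\tilde{W}(t)$ indeed has the $N(0, t)$ distribution.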
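The same calculation can be checked numerically for an interval $A_1 = (a, b)$: integrate the
density $\exp(-\theta^2 t/2 + \theta x)$ against the $N(0, t)$ law of $W(t)$ over the event
$\{a < W(t) - \theta t < b\}$ and compare with the $N(0, t)$ probability of $(a, b)$. This is a
sketch under our own assumptions; the midpoint-rule quadrature, the function names, and the
sample values of $\theta$, $t$, $a$, $b$ are illustrative choices.

```python
import math

def norm_cdf(x):
    """Standard normal CDF Phi, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tilde_prob(a, b, theta, t, n=2000):
    """P~[a < W(t) - theta*t < b], computed as an expectation under P by the midpoint rule."""
    lo, hi = a + theta * t, b + theta * t      # the event, rewritten in terms of x = W(t)
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step
        weight = math.exp(-0.5 * theta**2 * t + theta * x)                  # stochastic exponential at time t
        density = math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)   # N(0, t) density of W(t) under P
        total += weight * density * step
    return total

theta, t, a, b = 0.7, 2.0, -1.0, 0.5           # hypothetical values
reweighted = tilde_prob(a, b, theta, t)
target = norm_cdf(b / math.sqrt(t)) - norm_cdf(a / math.sqrt(t))   # N(0, t) probability of (a, b)
print(reweighted, target)
```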
Armed with Girsanov's theorem, we return to the Black-Scholes model, in which $dS(t) =
S(t)(\mu\,dt + \sigma\,dW(t))$, so $S(t) = S(0)\exp((\mu - \sigma^2/2)t + \sigma W(t))$. Remember that $W$ is a
Wiener process under the probability measure $P$. We are faced with the problem of computing
$E^Q_{t,S(t)}[S(T)\mathbf{1}\{S(T) > K\}]$, but first let's look at the easier computation of $E^P[S(T)]$.
Indeed, previously we have not computed the expectation of an Itô process with deterministic
geometric coefficients.
The way to use change of probability measure in evaluating expectations is to "use up"
any unwelcome factors by putting them into the density $d\tilde{P}/dP$. The result is that instead
of a difficult expectation under $P$ we get an easy probability under $\tilde{P}$.
Example 5.2.2 (Black-Scholes stock mean). Consider $E^P[S(T)] = E^P[S(0)\exp((\mu -
\sigma^2/2)T + \sigma W(T))]$. The factor $\exp(\sigma W(T))$ is unwelcome, because we do not know how to
compute its mean. But $d\tilde{P}/dP$ is supposed to be a stochastic exponential, so we must set
$d\tilde{P}/dP = \exp(-\sigma^2 T/2 + \sigma W(T))$. Now
\begin{align*}
E^P[S(T)] &= E^P[S(0)\exp((\mu - \sigma^2/2)T + \sigma W(T))] \\
&= S(0)\exp(\mu T)\,E^P[\exp(-\sigma^2 T/2 + \sigma W(T))] \\
&= S(0)\exp(\mu T)\,E^P[d\tilde{P}/dP] \\
&= S(0)\exp(\mu T)\,E^{\tilde{P}}[1] \\
&= S(0)\exp(\mu T).
\end{align*}
This justifies our previous assertions that $\mu$ is the mean geometric growth rate of the
stock in the Black-Scholes model. Actually, it is clear without Girsanov's theorem that
$E^P[\exp(-\sigma^2 T/2 + \sigma W(T))] = 1$ because the stochastic exponential is a martingale starting
at 1.
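Next we will apply this tool to the expectation $E^Q[S(T)\mathbf{1}\{S(T) > K\}]$ appearing in the price
at time 0 of the European call option in the Black-Scholes model, using Girsanov's theorem
as in Example 5.2.2.
Example 5.2.3 (The difficult term in the Black-Scholes formula). This time use $S(T) =
S(0)\exp((r - \sigma^2/2)T + \sigma W^Q(T))$, where $W^Q$ is a Wiener process under $Q$. Much as in Example
5.2.2, with $d\tilde{P}/dQ = \exp(-\sigma^2 T/2 + \sigma W^Q(T))$,
\begin{align*}
E^Q[S(T)\mathbf{1}\{S(T) > K\}] &= E^Q[S(0)\exp((r - \sigma^2/2)T + \sigma W^Q(T))\mathbf{1}\{S(T) > K\}] \\
&= E^Q[S(0)\exp(rT)\exp(-\sigma^2 T/2 + \sigma W^Q(T))\mathbf{1}\{S(T) > K\}] \\
&= S(0)\exp(rT)\,E^Q[\mathbf{1}\{S(T) > K\}\,d\tilde{P}/dQ] \\
&= S(0)\exp(rT)\,E^{\tilde{P}}[\mathbf{1}\{S(T) > K\}] \\
&= S(0)\exp(rT)\,\tilde{P}[S(T) > K].
\end{align*}
Girsanov's theorem says that under $\tilde{P}$, $W^{\tilde{P}}(t) = W^Q(t) - \sigma t$ is a Wiener process. So
$W^Q(t) = W^{\tilde{P}}(t) + \sigma t$ and
\begin{align*}
\tilde{P}[S(T) > K] &= \tilde{P}[S(0)\exp((r - \sigma^2/2)T + \sigma W^Q(T)) > K] \\
&= \tilde{P}[S(0)\exp((r - \sigma^2/2)T + \sigma(W^{\tilde{P}}(T) + \sigma T)) > K] \\
&= \tilde{P}[S(0)\exp((r + \sigma^2/2)T + \sigma W^{\tilde{P}}(T)) > K] \\
&= \tilde{P}\left[W^{\tilde{P}}(T) > \frac{\ln(K/S(0)) - (r + \sigma^2/2)T}{\sigma}\right] \\
&= \Phi\left(\frac{\ln(S(0)/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}\right).
\end{align*}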
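Generally, if $Z \sim N(m, s^2)$ under $P$, then
\[
E^P[\exp(Z)] = \exp(m + s^2/2)
\]
and
\[
E^P[\exp(Z)\mathbf{1}\{Z > z\}] = \exp(m + s^2/2)\,\Phi((-z + m + s^2)/s).
\]
These formulae also work exactly the same way for conditional expectations. We can derive
these useful facts about the lognormal distribution without doing any serious classical
integration. The former formula follows from the latter (let $z \to -\infty$), so we'll just prove the latter.
First observe that $\int_0^1 m\,dt + \int_0^1 s\,dW^P(t) = m + sW^P(1)$ also has distribution $N(m, s^2)$
under $P$, when $m$ and $s$ are constants, so we can set $Z$ equal to this value of an Itô process
at time 1. Then
\begin{align*}
E^P[\exp(Z)\mathbf{1}\{Z > z\}] &= E^P\left[\exp\left(\int_0^1 m\,dt + \int_0^1 s\,dW^P(t)\right)\mathbf{1}\{Z > z\}\right] \\
&= E^P\left[\exp\left(\int_0^1 \left(m + \frac{s^2}{2}\right) dt - \int_0^1 \frac{s^2}{2}\,dt + \int_0^1 s\,dW^P(t)\right)\mathbf{1}\{Z > z\}\right] \\
&= \exp\left(\int_0^1 \left(m + \frac{s^2}{2}\right) dt\right) E^P\left[\frac{d\tilde{P}}{dP}\mathbf{1}\{Z > z\}\right] \\
&= \exp(m + s^2/2)\,\tilde{P}[Z > z]
\end{align*}
and by Girsanov's theorem, since we used
\[
\frac{d\tilde{P}}{dP} = \exp\left(-\int_0^1 \frac{s^2}{2}\,dt + \int_0^1 s\,dW^P(t)\right),
\]
the Wiener process under $\tilde{P}$ is $W^{\tilde{P}}(t) = W^P(t) - st$, so
\begin{align*}
\tilde{P}[Z > z] &= \tilde{P}[m + sW^P(1) > z] \\
&= \tilde{P}[m + s(W^{\tilde{P}}(1) + s) > z] \\
&= \tilde{P}[(m + s^2) + sW^{\tilde{P}}(1) > z] \\
&= \tilde{P}\left[W^{\tilde{P}}(1) > \frac{z - m - s^2}{s}\right] \\
&= \Phi\left(\frac{-z + m + s^2}{s}\right).
\end{align*}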
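Although the point of the argument is to avoid classical integration, a brute-force integral
is a useful cross-check on the lognormal rule $E^P[\exp(Z)\mathbf{1}\{Z > z\}] = \exp(m + s^2/2)\Phi((-z + m + s^2)/s)$.
The sketch below compares the formula with a midpoint-rule quadrature. The function names, the
truncation of the upper limit at ten standard deviations, and the sample values of $m$, $s$, $z$
are assumptions made for the illustration.

```python
import math

def norm_cdf(x):
    """Standard normal CDF Phi, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def partial_expectation_formula(m, s, z):
    """E[exp(Z) 1{Z > z}] for Z ~ N(m, s^2), by the closed-form rule."""
    return math.exp(m + s * s / 2) * norm_cdf((-z + m + s * s) / s)

def partial_expectation_quadrature(m, s, z, n=20_000, width=10.0):
    """The same quantity by integrating exp(y) times the N(m, s^2) density over (z, upper)."""
    upper = m + s * s + width * s          # the integrand is negligible beyond this point
    if upper <= z:
        return 0.0
    step = (upper - z) / n
    total = 0.0
    for i in range(n):
        y = z + (i + 0.5) * step
        density = math.exp(-((y - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
        total += math.exp(y) * density * step
    return total

m, s, z = 0.1, 0.3, 0.2                    # hypothetical values
by_formula = partial_expectation_formula(m, s, z)
by_quadrature = partial_expectation_quadrature(m, s, z)
print(by_formula, by_quadrature)
```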
Now we finally get the Black-Scholes formula. From the Feynman-Kac formula, we have
the European call price as
\[
f(0, S(0)) = e^{-rT}\left(E^Q[S(T)\mathbf{1}\{S(T) > K\}] - K\,E^Q[\mathbf{1}\{S(T) > K\}]\right)
\]
and $S(T) = \exp(Z)$ where the distribution of $Z$ under probability $Q$ is normal with mean
$m = \ln S(0) + (r - \sigma^2/2)T$ and variance $s^2 = \sigma^2 T$. We get $E^Q[\mathbf{1}\{S(T) > K\}]$ by using
$t = 0$ in equation (5.2.2) and we found $E^Q[S(T)\mathbf{1}\{S(T) > K\}]$ in Example 5.2.3, so the
Black-Scholes formula is
\[
S(0)\,\Phi\left(\frac{\ln(S(0)/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}\right) - Ke^{-rT}\,\Phi\left(\frac{\ln(S(0)/K) + (r - \sigma^2/2)T}{\sigma\sqrt{T}}\right),
\]
where the first and second arguments of $\Phi$ are known as $d_1$ and $d_2$.
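For reference, the formula translates into a few lines of code. This is a minimal sketch; the
function and argument names and the sample inputs are our own choices, and the shortcut
$d_2 = d_1 - \sigma\sqrt{T}$ follows directly from the two expressions above.

```python
import math

def norm_cdf(x):
    """Standard normal CDF Phi, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, strike, r, sigma, T):
    """Time-0 price of a European call with strike K and maturity T in the Black-Scholes model."""
    d1 = (math.log(s0 / strike) + (r + sigma**2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)         # equals (ln(S0/K) + (r - sigma^2/2) T) / (sigma sqrt(T))
    return s0 * norm_cdf(d1) - strike * math.exp(-r * T) * norm_cdf(d2)

# Hypothetical inputs: at-the-money call, one year to maturity.
price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(price)
```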
5.3. Examples
Let's go back and practice our new tools, the Feynman-Kac formula and Girsanov's
theorem.
Example 5.3.1 (Continuous coupon bond). Consider the PDE
\[
f_t(t, x) + \mu(t, x) f_x(t, x) + \tfrac{1}{2}\sigma^2(t, x) f_{xx}(t, x) - r f(t, x) + h = 0,
\]
where $r$ and $h$ are constants, and with terminal condition $f(T, x) = g(x) = 1$. The Feynman-Kac
formula says that the solution is
\[
f(t, x) = E^Q_{t,x}\left[1 \cdot e^{-r(T-t)} + h\int_t^T e^{-r(u-t)}\,du\right] = e^{-r(T-t)} + \frac{h}{r}\left(1 - e^{-r(T-t)}\right),
\]
because
\[
\int_t^T e^{-r(u-t)}\,du = \left[-\frac{1}{r}e^{-r(u-t)}\right]_t^T = \frac{1}{r}\left(1 - e^{-r(T-t)}\right).
\]
Because all payments are deterministic, the coefficients $\mu$ and $\sigma$ of the stochastic process
$X$ are irrelevant. The repayment of principal 1 at time $T$ is worth $e^{-r(T-t)}$ at time $t$, while
the value of receiving interest payments at rate $h$ over $[t, T]$ is the second term.
Consider $T \to \infty$ in the second term. The limit $h/r$ is the value of receiving payments at
rate $h$ forever starting now; consider putting $h/r$ in the money market account, and then
withdrawing the interest at rate $r(h/r) = h$. So $(h/r)e^{-r(T-t)}$ is the value now (at time $t$) of
receiving payments at rate $h$ forever, but starting at time $T$. Thus now at time $t$ the value
of payments over $[t, T]$ is the difference.
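The bond value is simple enough to check by direct numerical integration of the discounted
coupon stream. The sketch below does so; the function names and numerical inputs are
illustrative assumptions. It also shows one consequence of the formula: when the coupon rate
$h$ equals the interest rate $r$, the two terms sum to exactly 1, so the bond is worth its
principal.

```python
import math

def coupon_bond_value(h, r, tau):
    """Closed form: principal 1 paid after time tau = T - t, plus coupons at rate h until then."""
    return math.exp(-r * tau) + (h / r) * (1.0 - math.exp(-r * tau))

def coupon_bond_value_quadrature(h, r, tau, n=10_000):
    """The same value, integrating the discounted coupon stream by the midpoint rule."""
    step = tau / n
    coupons = sum(h * math.exp(-r * (i + 0.5) * step) * step for i in range(n))
    return math.exp(-r * tau) + coupons

closed = coupon_bond_value(0.08, 0.05, 3.0)        # hypothetical coupon rate, interest rate, maturity
numeric = coupon_bond_value_quadrature(0.08, 0.05, 3.0)
par = coupon_bond_value(0.05, 0.05, 3.0)           # coupon rate equal to the interest rate
print(closed, numeric, par)
```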
In the following example, we not only practice with our tools, but also reinforce the idea
that it makes sense to price securities by taking expectations under the risk-neutral measure
$Q$ rather than some other probability measure, such as the original $P$. In this example, we
will call the process $X$ from the Feynman-Kac formula $S$ because it represents a dividend-paying
stock. We see that using $Q$ allows us to "reprice" the stock, that is, if we regard
the stock as a trivial "derivative" security whose price is $f(t, S(t))$, then the Feynman-Kac
formula indeed tells us $f(t, S(t)) = S(t)$.
Example 5.3.2 (Repricing a dividend-paying stock). We have the PDE
\[
f_t(t, x) + (r - q)x f_x(t, x) + \tfrac{1}{2}\sigma^2 x^2 f_{xx}(t, x) - r f(t, x) + qx = 0,
\]
with terminal condition $g(x) = x$. This represents holding the stock over $[t, T]$, collecting
dividend $qS(u)$ at each time $u$, and then selling it at $T$ for payoff $S(T)$. The Feynman-Kac
formula says
\[
f(t, S(t)) = E^Q_{t,S(t)}\left[S(T)e^{-r(T-t)} + \int_t^T qS(u)e^{-r(u-t)}\,du\right]
\]
where
\[
dS(t) = (r - q)S(t)\,dt + \sigma S(t)\,dW^Q(t) \quad\text{so}\quad d\ln S(t) = (r - q - \sigma^2/2)\,dt + \sigma\,dW^Q(t).
\]
First evaluating the first term, we have
\begin{align*}
E^Q_{t,S(t)}\left[S(T)e^{-r(T-t)}\right] &= E^Q_{t,S(t)}\left[S(t)\exp\left(-r(T - t) + (r - q - \sigma^2/2)(T - t) + \sigma(W^Q(T) - W^Q(t))\right)\right] \\
&= S(t)e^{-q(T-t)}\,E^Q\left[\exp\left(-(\sigma^2/2)(T - t) + \sigma(W^Q(T) - W^Q(t))\right)\right] \\
&= S(t)e^{-q(T-t)}
\end{align*}
because the expression inside the expectation is a stochastic exponential, hence a martingale
starting at 1, so its expectation is 1. (Or you could use Girsanov's theorem to show that
this $Q$-expectation is $E^{\tilde{P}}[1] = 1$.) For the second term, similarly
\begin{align*}
E^Q_{t,S(t)}\left[\int_t^T qS(u)e^{-r(u-t)}\,du\right] &= \int_t^T E^Q_{t,S(t)}\left[qS(t)\exp\left((-q - \sigma^2/2)(u - t) + \sigma(W^Q(u) - W^Q(t))\right)\right] du \\
&= qS(t)\int_t^T e^{-q(u-t)}\,E^Q_{t,S(t)}\left[\exp\left(-(\sigma^2/2)(u - t) + \sigma(W^Q(u) - W^Q(t))\right)\right] du \\
&= qS(t)\int_t^T e^{-q(u-t)}\,du \\
&= S(t)\left(1 - e^{-q(T-t)}\right)
\end{align*}
so the sum of the two terms is $S(t)$, as required.
The first term, $S(t)e^{-q(T-t)}$, is the price of the stock for delivery at $T$: you would pay
this much at $t$ to get the stock's value $S(T)$ at $T$, without collecting the dividends over
$[t, T]$. Much as in Example 5.3.1, the second term, the value of collecting dividends over
$[t, T]$, is the value of collecting dividends over $[t, \infty)$ minus the value of collecting dividends
over $[T, \infty)$. The stock itself is precisely a certificate entitling its holder to collect dividends
forever. It costs $S(t)$ to buy the stock now and get dividends over $[t, \infty)$, and it costs the
price for future delivery $S(t)e^{-q(T-t)}$ to get them over $[T, \infty)$.
5.4. Problems
Problem 5.1 (Exercise 4.11 of [Bjö98]). Solve (for $f(t, x)$) the PDE
\[
f_t(t, x) + \tfrac{1}{2}x^2 f_{xx}(t, x) + x = 0
\]
with terminal condition $f(T, x) = \ln x$.
Problem 5.2. In class, we derived the Black-Scholes option price $f(0, S(0))$ at time 0.
Do the derivation again, this time computing
\[
f(t, S(t)) = S(t)\,\Phi\left(\frac{\ln(S(t)/K) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right) - Ke^{-r(T-t)}\,\Phi\left(\frac{\ln(S(t)/K) + (r - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right).
\]
Hint: the conditional distribution of $\ln S(T)$ given either $S(t)$ or $\mathcal{F}_t$ is normal. Use the quick
rule for evaluating $E[\exp(Z)\mathbf{1}\{Z > z\}]$.
Problem 5.3. In this problem, you find the price $f(t, F(t))$ of a European call option on
a futures price $F$, rather than a stock price $S$, in the Black-Scholes model. Solve the PDE
\[
f_t(t, x) + \tfrac{1}{2}\sigma^2 x^2 f_{xx}(t, x) - r f(t, x) = 0
\]
with terminal condition $f(T, x) = (x - K)^+$, to get a formula similar to that of the previous
problem.
CHAPTER 6
Complete Markets
6.1. Market Price of Risk
We were able to compute a unique no-arbitrage price for the European call option in
the Black-Scholes model because it can be replicated by a tame portfolio strategy $\theta$, that
is, $\theta(T)S(T) = g(S(T))$, the payoff. Then by the no-arbitrage principle, the price of the
derivative is $\theta(0)S(0)$, the price of its replicating portfolio. If you know how to replicate
a derivative, you know how to hedge it, since $g(S(T)) - \theta(T)S(T) = 0$, and thus the
combination of the derivative and the hedging portfolio is riskless. Strictly speaking then,
one ought to reserve the term "hedging strategy" for $-\theta$ as opposed to $\theta$, the replicating
strategy.
Example 6.1.1. For the European call option in the Black-Scholes model, the replicating
strategy is
\[ \theta_1(t) = \Phi(d_1) \quad\text{and}\quad \theta_0(t) = -K e^{-rT}\,\Phi(d_2). \]
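As an illustration, the replicating holdings can be computed directly. This is a minimal sketch, assuming the usual time-$t$ definitions $d_1 = [\ln(S(t)/K) + (r + \sigma^2/2)(T-t)]/(\sigma\sqrt{T-t})$ and $d_2 = d_1 - \sigma\sqrt{T-t}$, a money market account $M(t) = e^{rt}$, and illustrative parameter values; the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def replicating_portfolio(s, k, r, sigma, t, T):
    """Return (theta0, theta1): shares of money market account and of stock."""
    tau = T - t
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    theta1 = norm_cdf(d1)                           # shares of stock
    theta0 = -k * math.exp(-r * T) * norm_cdf(d2)   # shares of M, with M(t) = e^{rt}
    return theta0, theta1

def portfolio_value(s, k, r, sigma, t, T):
    """Dollar value theta0 * M(t) + theta1 * S(t) of the replicating portfolio."""
    theta0, theta1 = replicating_portfolio(s, k, r, sigma, t, T)
    return theta0 * math.exp(r * t) + theta1 * s

# Illustrative values (assumptions): at-the-money call, one year to maturity.
print(replicating_portfolio(100.0, 100.0, 0.05, 0.2, 0.0, 1.0))
print(portfolio_value(100.0, 100.0, 0.05, 0.2, 0.0, 1.0))
```

The portfolio value is the Black-Scholes call price, since $\theta_0(t)M(t) = -Ke^{-r(T-t)}\Phi(d_2)$.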
In the Black-Scholes model, every contingent claim can be replicated by a self-financing
portfolio strategy, which means the market is complete. Then every claim has a unique
no-arbitrage price, equal to the initial value of its replicating portfolio. In the Black-Scholes
model, this price is $E^Q[e^{-rT} g(S(T))]$, the expected discounted payoff under the probability
measure Q that makes the stock have geometric drift $r$. The probability measure Q is called
an equivalent martingale measure (EMM) for the original P.
Q is a martingale measure in that the discounted stock price $e^{-rt}S(t) = S(t)/M(t)$
becomes a martingale under Q. Then the discounted price of a money market account share,
$M(t)/M(t) = 1$, is also a martingale. The discounted value of any tame, self-financing portfolio
must be a martingale under the martingale measure. (In particular, then, the discounted
price of any non-dividend-paying traded security must be a martingale.) Consequently, the
discounted value of any replicated claim (which equals the value of the replicating portfolio)
is a martingale under Q, and in a complete market, this is every claim. For a Q-martingale
$V/M$,
\[
E^Q\left[\frac{V(T)}{M(T)}\right] = E^Q\left[\frac{V(T)}{M(T)} \,\Big|\, \mathcal{F}_0\right] = \frac{V(0)}{M(0)} = V(0),
\]
and this justifies computing the initial price $V(0)$ of a derivative as the Q-expectation of its
discounted payoff $V(T)/M(T)$. It would not work to use a probability measure, such as P, under
which the discounted value is not a martingale: then $E[V(T)/M(T)] \neq V(0)$.
In the preceding discussion, there was a special role for the money market account: its
price was used for discounting. This seems natural because $M(t) = e^{rt}$ in the Black-Scholes
model, and this is a simple way of expressing the time value of money. However, it is
perfectly legitimate to use the stock price S(t) to discount, and look for a measure under
which M(t)/S(t) and S(t)/S(t) = 1 are martingales. This would be a different measure from
Q, and a useful one, as we will see later. The asset whose price is used for discounting is
called the numeraire. All values are expressed in terms of shares of this asset. For instance,
K dollars at time T equals K/M(T) shares of the money market account or K/S(T) shares
of stock. In the Black-Scholes model, M(T) is a constant, so K dollars at time T is worth
K/M(T) dollars at time 0.
Q is equivalent to P in the sense that these two probability measures agree about which
events are possible:
\[ P[E] = 0 \iff Q[E] = 0. \]
In the Black-Scholes model, we can see this is true, because Q results from a Girsanov
transformation of P. We have
\[ dS(t) = S(t)(\mu\,dt + \sigma\,dW(t)) \quad\text{but}\quad dS(t) = S(t)(r\,dt + \sigma\,dW^Q(t)) \]
and therefore $\sigma\,dW^Q(t) = \sigma\,dW(t) + (\mu - r)\,dt$, so $dW^Q(t) = dW(t) + \lambda\,dt$, where $\lambda$ satisfies
\[ \sigma\lambda = \mu - r. \tag{6.1.1} \]
Therefore, as in the discussion of the Girsanov transformation in Section 5.2, there is a
density
\[ \frac{dQ}{dP} = Z(T) = \exp\left(-\frac{\|\lambda\|^2}{2}\,T - \lambda W(T)\right), \tag{6.1.2} \]
which is finite and positive, relating the two probability measures. So $Q[E] = E^Q[1_E] =
E[1_E\,dQ/dP]$ is zero if and only if $P[E] = E[1_E]$ is.
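The density process of Q with respect to the old probability measure P is the stochastic
process
\[ Z(t) = \exp\left(-\frac{1}{2}\int_0^t \|\lambda_s\|^2\,ds - \int_0^t \lambda_s\,dW(s)\right). \tag{6.1.3} \]
The density process is the conditional expectation of the density:
\[
\begin{aligned}
E\left[\frac{dQ}{dP} \,\Big|\, \mathcal{F}_t\right]
&= E\left[Z(t)\,\frac{Z(T)}{Z(t)} \,\Big|\, \mathcal{F}_t\right] \\
&= Z(t)\,E\left[\exp\left(-\frac{1}{2}\int_t^T \|\lambda_s\|^2\,ds - \int_t^T \lambda_s\,dW(s)\right) \Big|\, \mathcal{F}_t\right] \\
&= Z(t)
\end{aligned}
\]
because the stochastic exponential is a martingale. So we can refer to $Z(t)$ as $E[dQ/dP \mid \mathcal{F}_t]$.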
This makes it clear how to use Girsanov transformations for conditional expectations.
\[ E^P[Z(T)Y \mid \mathcal{F}_t] = Z(t)\,E^Q[Y \mid \mathcal{F}_t] \tag{6.1.4} \]
\[ W^P(t) = W^Q(t) - \int_0^t \lambda^\top(s)\,ds \tag{6.1.5} \]
The idea is that $Z(t)$ is known given $\mathcal{F}_t$, and the remaining factor
\[ \frac{Z(T)}{Z(t)} = \exp\left(-\frac{1}{2}\int_t^T \|\lambda(s)\|^2\,ds - \int_t^T \lambda(s)\,dW(s)\right) \]
is used up in changing the probability measure so that the old standard Brownian motion
$W$ becomes a Brownian motion with drift $-\lambda^\top$ under the new probability measure Q, in
agreement with (6.1.5). (The transpose symbol appears because $\lambda$ is a row vector but a drift,
like a Brownian motion, should be a column vector.)
The market price of risk $\lambda$ expresses the reward for accepting the risk inherent in
the stock as an increase in expected return. In the one-dimensional Black-Scholes case,
$\lambda = (\mu - r)/\sigma$, the stock's excess return divided by volatility, which is its Sharpe ratio. In
the general case, $\lambda$ is a vector stochastic process, and its kth component expresses the increase
in expected return earned by exposure to a unit of the kth risk, i.e. the kth component of
the fundamental Wiener process $W$.
6.2. The Pricing Measure Q
Thus the switch from the original probability measure P to Q, which is used for pricing,
is justified on financial grounds. It would not be correct to value risky payoffs by
discounting them at the risk-free rate and then taking an expectation under P. The density $Z(T)$ of
equation (6.1.2), built from the market price of risk process $\lambda$, helps us to understand how
much a dollar is worth in different states $\omega$. If the risk-free rate is a stochastic process $r$,
then define
\[ D(t) = \exp\left(-\int_0^t r(s)\,ds\right), \tag{6.2.1} \]
the accumulated discount factor up to time $t$, formed by continuous compounding at the
risk-free rate. If there is a money market account, $D(t) = 1/M(t)$. The pricing kernel or
state price density $\pi = DZ$ is a stochastic process specifying the desirability of a dollar
at each time and in each state. The price of a derivative paying $f(S(T))$ at time $T$ is
\[
E^Q[D(T)f(S(T))] = E\left[D(T)f(S(T))\,\frac{dQ}{dP}\right] = E[D(T)f(S(T))Z(T)] = E[\pi(T)f(S(T))]. \tag{6.2.2}
\]
So the value of a dollar at time $T$ in state $\omega$, as measured by its contribution to the price
of a derivative, depends on three things:
* the probability of state $\omega$ under the real-world probability measure P
* the time value of money due to discounting, as expressed by $D(\omega, T)$
* another factor $Z(\omega, T)$ representing the influence of relative supply and demand of
wealth in state $\omega$ at time $T$
We will study the financial significance of $Z$ in greater depth when we cover portfolio optimization,
in Chapter ??, where we will find that the state price $\pi(\omega, T)$ is related to the
marginal utility of wealth in state $\omega$ at time $T$. For now, here is a heuristic argument in an
example.
Example 6.2.1. Consider an economy satisfying the Black-Scholes model, and where
the net supply of money market shares is zero, and the net supply of stock shares is positive.
That is, market participants borrow from and lend to each other at constant rate $r$, so for
each owner of a money market share (lender) there is an owner of a negative money market
share (borrower). But there exists a company with a certain number of shares outstanding,
so while some people may own a positive and others a negative number of shares, the total
holdings are positive. Then the total wealth in the economy is proportional to the stock
price $S$. Assuming that the "demand for wealth"$^1$ is constant, the "price of wealth" should
be inversely related with its supply, hence inversely related with $S$. This is indeed what we
found:
\[
S(T) = S(0)\exp\left((\mu - \sigma^2/2)T + \sigma W(T)\right) \quad\text{and}\quad \pi(T) = \exp\left(-(r + \lambda^2/2)T - \lambda W(T)\right)
\]
where both $\sigma$ and $\lambda$ are positive, so the relationship is inverse.
To sum up, we have seen a mathematical and a financial reason why we price by taking
expected discounted payoffs under the equivalent martingale measure Q, not the original
measure P. The mathematical reason is that a derivative's initial price equals the initial
price $V(0)$ of its replicating portfolio, and we need the discounted portfolio value $DV$ to be a
Q-martingale to justify $E^Q[D(T)V(T)] = V(0)$. The financial reason is that Q incorporates
information about the value of wealth in different states of the world. Ignoring this information
and trying to price with P would be like saying that people are indifferent between
having an extra dollar when the economy is in a recession or during a boom.
One might also ask why there has to be a numeraire. Why can't prices expressed in
nominal dollars be made into martingales? The reason is the money market account. For
instance, in the Black-Scholes model, it is deterministic, and thus no change of probability
measure$^2$ can change it at all.
$^1$ One reason that this is a heuristic argument is that it is not clear that the concept of demand for wealth
makes any sense. Later we will have an economically sound argument supporting the conclusions offered
here.
$^2$ More precisely, no change to an equivalent measure.
Finally, here is an example of what goes wrong when one tries to price with P: it does
not even reprice the traded securities.
Example 6.2.2. In the Black-Scholes model, we can regard the stock as a trivial derivative
of itself, paying $f(S(T)) = S(T)$ at some time $T$. To be correct, a scheme for pricing
derivatives must tell us that the initial price of this trivial derivative is $S(0)$, the initial stock
price. However
\[
\begin{aligned}
E[D(T)f(S(T))] &= E[e^{-rT}S(T)] \\
&= E[\exp(-rT)\,S(0)\exp((\mu - \sigma^2/2)T + \sigma W(T))] \\
&= E[S(0)\exp((\mu - r - \sigma^2/2)T + \sigma W(T))] \\
&= S(0)e^{(\mu - r)T}\,E[\exp((-\sigma^2/2)T + \sigma W(T))] \\
&= S(0)e^{(\mu - r)T}
\end{aligned}
\]
since the expression inside the expectation is a P-martingale starting at 1 (or by applying a
Girsanov transformation). Thus this scheme mis-prices the stock. On the other hand,
\[
\begin{aligned}
E^Q[D(T)f(S(T))] &= E^Q[S(0)\exp((\mu - r - \sigma^2/2)T + \sigma W(T))] \\
&= E^Q[S(0)\exp((\mu - r - \sigma^2/2)T - \sigma\lambda T + \sigma(W(T) + \lambda T))] \\
&= E^Q[S(0)\exp((-\sigma^2/2)T + \sigma W^Q(T))] \\
&= S(0)
\end{aligned}
\]
which is correct.
Assuming, as only makes sense, that $\mu > r$ because investors demand a reward for holding
the risky stock, using P over-prices the stock. It is true that expected future wealth is greater
if one puts S(0) dollars into buying one share of the stock rather than into a money market
account, but this is offset by the nature of the stock's riskiness. According to our previous
discussion, the many dollars that one gets from owning the stock when S(T) is high are not
valued as highly as dollars when S(T) is low. To value the stock's or a derivative's payoff
correctly, we must use Q.
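The comparison in Example 6.2.2 can be illustrated by simulation. The following is a minimal Monte Carlo sketch, assuming the one-dimensional Black-Scholes model with illustrative parameter values that are not from the text; the function name is hypothetical. It samples $W(T)$ under P and averages the discounted payoff $e^{-rT}S(T)$ twice: once unweighted, and once weighted by the density $Z(T) = \exp(-\lambda^2 T/2 - \lambda W(T))$ with $\lambda = (\mu - r)/\sigma$.

```python
import math
import random

def discounted_stock_expectations(s0=100.0, mu=0.10, r=0.05, sigma=0.2,
                                  T=1.0, n=200_000, seed=0):
    """Estimate E[e^{-rT} S(T)] and E[e^{-rT} Z(T) S(T)] from samples under P."""
    rng = random.Random(seed)
    lam = (mu - r) / sigma                 # market price of risk
    plain = 0.0
    weighted = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, math.sqrt(T))   # W(T) under P
        s_T = s0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * w)
        z_T = math.exp(-0.5 * lam**2 * T - lam * w)   # density dQ/dP
        plain += math.exp(-r * T) * s_T
        weighted += math.exp(-r * T) * z_T * s_T
    return plain / n, weighted / n

plain, weighted = discounted_stock_expectations()
print("P-expectation of discounted payoff:", plain)
print("pricing-kernel expectation:", weighted)
```

Up to sampling error, the unweighted average should be near $S(0)e^{(\mu - r)T}$ and the weighted average near $S(0)$.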
6.3. Vector Itô Processes
We will use vector Itô processes in order to model a market with multiple risky securities.
Let $X$ be a vector Itô process of $N$ components based on a Wiener process $W$ with $K$
independent components, each a one-dimensional Wiener process. They model multiple
independent sources of risk.
The arithmetic representation of an Itô process (2.6.2)
\[ X(t) = X(0) + \int_0^t \mu(s)\,ds + \int_0^t \sigma(s)\,dW(s) \]
now needs further explanation. The convention is that $X$ and $W$ are column vectors of
length $N$ and $K$ respectively. Then $X(0)$ is a column vector of length $N$, or for short, a
column $N$-vector. Its entries $X_i(0)$ are the starting values of the components $X_i$ of the Itô
process $X$. The random variable $\mu(t)$ is also a column $N$-vector. However, $\sigma(t)$ is
an $N \times K$ matrix. We have
\[ \int_0^t \sigma(s)\,dW(s) = \sum_{k=1}^K \int_0^t \sigma_{\cdot k}(s)\,dW_k(s) \]
where $\sigma_{\cdot k}$ is the kth column of $\sigma$, i.e. the column $N$-vector whose ith component is the
element $\sigma_{ik}$. Now the $K$ integrals on the right-hand side are ordinary Itô stochastic integrals
with only a single one-dimensional Wiener process $W_k$ involved.
Breaking up (2.6.2) row by row yields, for the ith row,
\[
X_i(t) = X_i(0) + \int_0^t \mu_i(s)\,ds + \int_0^t \sigma_{i\cdot}(s)\,dW(s)
= X_i(0) + \int_0^t \mu_i(s)\,ds + \sum_{k=1}^K \int_0^t \sigma_{ik}(s)\,dW_k(s)
\]
and here we can see explicitly the interpretation of all the coefficients: $X_i(0)$ is the starting
value of $X_i$, $\mu_i(t)$ is its arithmetic drift at time $t$, and $\sigma_{ik}(t)$ is its time-$t$ arithmetic volatility
on the kth Brownian motion $W_k$. The row $K$-vector $\sigma_{i\cdot}(t)$ is the arithmetic volatility vector
of $X_i$ at time $t$.
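So $\sigma$ is the $N \times K$ instantaneous arithmetic volatility matrix, so called because $\sigma(t)$
contains arithmetic volatilities for the instant $t$. The instantaneous covariance matrix is
$N \times N$ and can be derived heuristically as follows:
\[
\begin{aligned}
\mathrm{Cov}[dX_i(t), dX_j(t)]
&= \mathrm{Cov}\left[\mu_i(t)\,dt + \sum_{k=1}^K \sigma_{ik}(t)\,dW_k(t),\; \mu_j(t)\,dt + \sum_{\ell=1}^K \sigma_{j\ell}(t)\,dW_\ell(t)\right] \\
&= \sum_{k=1}^K \mathrm{Cov}[\sigma_{ik}(t)\,dW_k(t),\, \sigma_{jk}(t)\,dW_k(t)] \\
&= \sum_{k=1}^K \sigma_{ik}(t)\sigma_{jk}(t)\,dt
\end{aligned}
\]
This is the basis for the vector Itô formula. The instantaneous covariance between $X_i$ and
$X_j$ at time $t$ is
\[ \sum_{k=1}^K \sigma_{ik}(t)\sigma_{jk}(t) = \sigma_{i\cdot}(t)\,(\sigma_{j\cdot}(t))^\top, \]
the dot product between the volatility row vectors $\sigma_{i\cdot}(t)$ and $\sigma_{j\cdot}(t)$. So the instantaneous
covariance matrix is the $N \times N$ matrix $\sigma(t)\sigma^\top(t)$.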
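Example 6.3.1 (Multidimensional Generalized Brownian Motion). Let $X$ be a two-dimensional
Itô process with constant coefficients:
\[
X(t) = \begin{pmatrix} 2 \\ -2 \end{pmatrix} + \int_0^t \begin{pmatrix} -1 \\ 1 \end{pmatrix} ds
+ \int_0^t \begin{pmatrix} 3 & 4 \\ 4 & 3 \end{pmatrix} dW(s)
= \begin{pmatrix} 2 - t + 3W_1(t) + 4W_2(t) \\ -2 + t + 4W_1(t) + 3W_2(t) \end{pmatrix}.
\]
What is the probability that $X_1(2)$ is negative given $X_2(2)$? The components of $X(t)$ are
jointly (bivariate) normal with mean vector and covariance matrix
\[
\begin{pmatrix} 2 - t \\ t - 2 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 25 & 24 \\ 24 & 25 \end{pmatrix} t.
\]
(For the covariance matrix, the computations are $3^2 + 4^2 = 25 = 4^2 + 3^2$ and $3 \times 4 + 4 \times 3 = 24$.)
The conditional distribution of $X_1(t)$ given $X_2(t)$ is then normal with mean
\[
\begin{aligned}
E[X_1(t) \mid X_2(t)] &= E[X_1(t)] + \frac{\mathrm{Cov}[X_1(t), X_2(t)]}{\mathrm{Var}[X_2(t)]}\,(X_2(t) - E[X_2(t)]) \\
&= 2 - t + \frac{24}{25}\,(X_2(t) - (t - 2))
\end{aligned}
\]
and variance
\[
\mathrm{Var}[X_1(t) \mid X_2(t)] = \mathrm{Var}[X_1(t)]\left(1 - \frac{\mathrm{Cov}[X_1(t), X_2(t)]^2}{\mathrm{Var}[X_1(t)]\,\mathrm{Var}[X_2(t)]}\right)
= 25t\left(1 - \frac{24^2}{25^2}\right) = \frac{49}{25}\,t.
\]
For $t = 2$ we have conditional mean $(24/25)X_2(2)$ and conditional variance $98/25$, so the
conditional probability that $X_1(2)$ is negative is
\[
P[X_1(2) < 0 \mid X_2(2)] = \Phi\left(\frac{0 - E[X_1(2) \mid X_2(2)]}{\sqrt{\mathrm{Var}[X_1(2) \mid X_2(2)]}}\right)
= \Phi\left(\frac{-(24/25)X_2(2)}{\sqrt{98/25}}\right)
= \Phi\left(-X_2(2)\,\frac{12\sqrt{2}}{35}\right).
\]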
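The arithmetic in Example 6.3.1 can be reproduced in a few lines. This is a minimal sketch using only the coefficients of the example; the function names are hypothetical. It forms the instantaneous covariance matrix $\sigma\sigma^\top$ and then applies the conditional mean and variance formulas for a bivariate normal vector.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

SIGMA = [[3.0, 4.0], [4.0, 3.0]]   # constant arithmetic volatility matrix

def covariance_rate(sig):
    """Instantaneous covariance matrix sigma * sigma^T (dot products of rows)."""
    n = len(sig)
    return [[sum(a * b for a, b in zip(sig[i], sig[j])) for j in range(n)]
            for i in range(n)]

def prob_x1_negative(x2, t=2.0):
    """P[X1(t) < 0 | X2(t) = x2] for the process of Example 6.3.1."""
    cov = covariance_rate(SIGMA)
    mean1, mean2 = 2.0 - t, t - 2.0
    var1, var2, cov12 = cov[0][0] * t, cov[1][1] * t, cov[0][1] * t
    cond_mean = mean1 + cov12 / var2 * (x2 - mean2)
    cond_var = var1 - cov12**2 / var2
    return norm_cdf((0.0 - cond_mean) / math.sqrt(cond_var))

print(covariance_rate(SIGMA))
print(prob_x1_negative(1.0), norm_cdf(-12.0 * math.sqrt(2.0) / 35.0))
```

The last line compares the general formula with the closed form $\Phi(-X_2(2)\,12\sqrt{2}/35)$ at the illustrative value $X_2(2) = 1$.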
The vector Itô formula in differential form says that where $X$ is a vector Itô process
satisfying $dX(t) = \mu(t)\,dt + \sigma(t)\,dW(t)$,
\[
\begin{aligned}
df(t, X(t)) &= \left(f_t(t, X(t)) + \frac{1}{2}\,\mathrm{tr}\left[\sigma^\top(t) f_{xx}(t, X(t))\sigma(t)\right]\right) dt + f_x(t, X(t))\,dX(t) \\
&= \left(f_t(t, X(t)) + f_x(t, X(t))\mu(t) + \frac{1}{2}\,\mathrm{tr}\left[\sigma^\top(t) f_{xx}(t, X(t))\sigma(t)\right]\right) dt \\
&\quad + f_x(t, X(t))\sigma(t)\,dW(t).
\end{aligned}
\tag{6.3.1}
\]
Here $f_x$ is the gradient (a row vector) and $f_{xx}$ is the Hessian matrix.
The trace of a square matrix is the sum of its diagonal elements: if $A$ is $N \times N$,
then $\mathrm{tr}(A) = \sum_{i=1}^N A_{ii}$. In the formula,
\[
\mathrm{tr}\left[\sigma^\top(t) f_{xx}(t, X(t))\sigma(t)\right] = \sum_{i=1}^N \sum_{j=1}^N \frac{\partial^2 f}{\partial x_i\,\partial x_j}(t, X(t))\,\sigma_{i\cdot}(t)\,\sigma_{j\cdot}^\top(t). \tag{6.3.2}
\]
Here is a useful special case, sometimes referred to as integration by parts, sometimes as
a product rule. Recall that in classical calculus, $(fg)' = f'g + g'f$, or in different notation,
$d(XY) = X\,dY + Y\,dX$: the product rule. Equivalent to this is $\int X\,dY = XY - \int Y\,dX$:
integration by parts.
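Example 6.3.2 (Itô product rule). Suppose $dX(t) = \mu_X(t)\,dt + \sigma_X(t)\,dW(t)$ and $dY(t) =
\mu_Y(t)\,dt + \sigma_Y(t)\,dW(t)$. Then $X(t)Y(t) = f(t, [X(t)\; Y(t)]^\top)$ where $f(t, x) = x_1 x_2$. We have
\[
f_t\left(t, \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix}\right) = 0, \quad
f_x\left(t, \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix}\right) = \begin{pmatrix} Y(t) & X(t) \end{pmatrix}, \quad
f_{xx}\left(t, \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix}\right) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
By the vector Itô formula,
\[
\begin{aligned}
d(X(t)Y(t)) &= \left(0 + \frac{1}{2}\left(0 + \sigma_X(t)\sigma_Y^\top(t) + \sigma_Y(t)\sigma_X^\top(t) + 0\right)\right) dt
+ \begin{pmatrix} Y(t) & X(t) \end{pmatrix} \begin{pmatrix} dX(t) \\ dY(t) \end{pmatrix} \\
&= \sigma_X(t)\sigma_Y^\top(t)\,dt + X(t)\,dY(t) + Y(t)\,dX(t).
\end{aligned}
\tag{6.3.3}
\]
Thus the Itô correction term for a product is the instantaneous covariance
between the two factors. In the product $(dX(t))(dY(t))$, the only nonnegligible term is
$\sigma_X(t)\sigma_Y^\top(t)\,dt$, which comes from multiplying the stochastic parts together. So we can write
an analogue of the product rule:
\[ d(X(t)Y(t)) = X(t)\,dY(t) + Y(t)\,dX(t) + (dX(t))(dY(t)). \tag{6.3.4} \]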
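A discretized version of the product rule can be checked by simulation. The following is a minimal sketch, assuming constant arithmetic coefficients, a single scalar Wiener process, an Euler discretization, and illustrative parameter values; none of these choices come from the text, and the function name is hypothetical. On a grid the identity $\Delta(XY) = X\,\Delta Y + Y\,\Delta X + \Delta X\,\Delta Y$ holds exactly, and the accumulated cross term should approach $\sigma_X\sigma_Y T$ as the grid is refined.

```python
import math
import random

def simulate_product_rule(mu_x=0.1, sigma_x=0.3, mu_y=-0.2, sigma_y=0.5,
                          T=1.0, n=100_000, seed=1):
    """Return (change in XY, sum of X dY + Y dX, sum of dX dY) on an Euler grid."""
    rng = random.Random(seed)
    h = T / n
    x, y = 1.0, 2.0                 # illustrative starting values
    start = x * y
    integral_terms = 0.0
    cross_terms = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(h))   # shared Brownian increment
        dx = mu_x * h + sigma_x * dw
        dy = mu_y * h + sigma_y * dw
        integral_terms += x * dy + y * dx
        cross_terms += dx * dy
        x += dx
        y += dy
    return x * y - start, integral_terms, cross_terms

change, integral_terms, cross_terms = simulate_product_rule()
print(change, integral_terms + cross_terms)
print(cross_terms, 0.3 * 0.5 * 1.0)   # compare with sigma_x * sigma_y * T
```

Dropping the cross term would leave a gap of roughly $\sigma_X\sigma_Y T$, which is the Itô correction in (6.3.3).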
6.4. Markets with Multiple Risky Securities
In the multidimensional case, the traded securities have a price vector $S$ modeled as
a vector Itô process: $dS(t) = \mu(t)\,dt + \sigma(t)\,dW(t)$ where $S$ and $\mu$ are column $N$-vectors,
the Wiener process $W$ is a column $K$-vector, and $\sigma$ is an $N \times K$ matrix. Note that the
coefficient processes $\mu$ and $\sigma$ are arithmetic, so the Black-Scholes model has $\mu_i(t) = S_i(t)\mu_i$
and $\sigma_{ik}(t) = S_i(t)\sigma_{ik}$, for a constant vector $\mu$ and constant matrix $\sigma$ (written without a
time argument). We can write this
in matrix notation by defining the diagonal matrix $\mathrm{diag}(S(t))$, so that $\mu(t) = \mathrm{diag}(S(t))\mu$
and $\sigma(t) = \mathrm{diag}(S(t))\sigma$. We designate the money market account by $M(t) = S_0(t)$, with
$dM(t) = r(t)M(t)\,dt$. We also have to specify the dynamics of the risk-free rate $r$, which is
not itself a traded security.
It is important to understand when such a model of a market will be arbitrage-free, and
when complete. The answer has everything to do with market prices of risk and equivalent
martingale measures (EMMs). Precise results are unnecessarily technical. Let it suffice to
say that absence of arbitrage is close to being the same as the existence of an EMM, and
completeness is close to being the same as uniqueness of the EMM. Also, an EMM is close
to being the same as a market price of risk process $\lambda$ that, with a risk-free rate process $r$,
solves a generalization of Equation (6.1.1):
\[ \sigma(t)\lambda^\top(t) = \mu(t) - r(t)S(t). \tag{6.4.1} \]
This is because the density $dQ/dP$ arises as the stochastic exponential of $\lambda$. The existence
and uniqueness of a market price of risk process $\lambda$ are just a question of linear algebra.
To get a solution, generally we want the number of assets $N \le K$, the number of risks.
Actually, what we need is for any linear dependence among the rows of $\sigma(t)$ to generate a
corresponding dependence in the drifts $\mu(t)$:
\[
\sigma_{i\cdot}(t) = \sum_{j \ne i} \alpha_j \sigma_{j\cdot}(t) \;\Longrightarrow\; \mu_i(t) = r(t)S_i(t) + \sum_{j \ne i} \alpha_j\,(\mu_j(t) - r(t)S_j(t)).
\]
The principle is "same risks, same rewards." Otherwise one can construct an arbitrage by
hedging the ith asset with this portfolio of other assets to get an instantaneously riskless
portfolio that has a return different from $r$. The technique is first to construct a portfolio
of risky assets with zero volatility, then make its initial value zero by borrowing or lending:
see Example 6.4.2.
Example 6.4.1. Suppose $\mu(t) = \mathrm{diag}(S(t))\mu$ and $\sigma(t) = \mathrm{diag}(S(t))\sigma$ where the constants
\[
\sigma = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \quad\text{and}\quad \mu = \begin{pmatrix} 0.1 \\ 0.09 \end{pmatrix}
\]
and the risk-free rate is a constant $r(t) = r = 0.05$. The matrix $\sigma$ has inverse
\[
\sigma^{-1} = \frac{1}{3}\begin{pmatrix} -1 & 2 \\ 2 & -1 \end{pmatrix}
\]
so we can compute directly that the market price of risk vector process is
\[
\lambda^\top(t) = \sigma^{-1}(t)\,(\mu(t) - r(t)S(t)) = \sigma^{-1}(\mu - r\mathbf{1}) = \begin{pmatrix} 0.01 \\ 0.02 \end{pmatrix}.
\]
In this equation the presence of the column vector $\mathbf{1}$ is just supposed to make it a legitimate
vector equation, in which the risk-free rate $r$ is subtracted from each coordinate of the
geometric drift vector $\mu$. What happens in this equation is that effectively the stock price
$S(t)$ cancels out of the equation: this does not happen in general, but does happen here in
this multidimensional Black-Scholes model.
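The computation in Example 6.4.1 amounts to solving a $2 \times 2$ linear system. This is a minimal sketch in plain Python, using the geometric form $\sigma\lambda^\top = \mu - r\mathbf{1}$ in which the stock prices have already cancelled; the function name is hypothetical.

```python
def market_price_of_risk_2x2(sigma, mu, r):
    """Solve sigma * lam = mu - r for a 2x2 geometric volatility matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det == 0:
        raise ValueError("volatility matrix is singular")
    e1, e2 = mu[0] - r, mu[1] - r          # excess geometric drifts
    # Multiply the excess drifts by the explicit 2x2 inverse.
    return [(d * e1 - b * e2) / det, (-c * e1 + a * e2) / det]

lam = market_price_of_risk_2x2([[1.0, 2.0], [2.0, 1.0]], [0.10, 0.09], 0.05)
print(lam)   # approximately [0.01, 0.02], up to floating-point rounding
```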
Example 6.4.2. Suppose $\mu(t) = \mathrm{diag}(S(t))\mu$ and $\sigma(t) = \mathrm{diag}(S(t))\sigma$ where the constants
\[
\sigma = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad\text{and}\quad \mu = \begin{pmatrix} 0.1 \\ 0.09 \end{pmatrix}
\]
and the risk-free rate is a constant $r(t) = r = 0.05$. There is only one source of risk, so $\lambda$ must
be a scalar. From the first row of the matrix equation (6.4.1), $S_1(t)\lambda = 0.1S_1(t) - 0.05S_1(t)$
so $\lambda = 0.05$, but from the second row, $2S_2(t)\lambda = 0.09S_2(t) - 0.05S_2(t)$ so $\lambda = 0.02$. Therefore
there is no market price of risk $\lambda$ that solves the matrix equation, and the market is not
arbitrage-free.
We could construct an arbitrage by following these steps. The first asset, $S_1$, is more
attractive: it requires a higher value of $\lambda$, and it is plain that it has more return but less risk
than $S_2$. So we start our portfolio by buying one share of $S_1$, that is, $\theta_1 = 1$. Then to give
the portfolio zero volatility, choose
\[ \theta_2(t) = -\frac{S_1(t)\sigma_1}{S_2(t)\sigma_2}. \]
To make the portfolio initially costless, include
\[ \theta_0(0) = S_1(0)\left(\frac{\sigma_1}{\sigma_2} - 1\right) \]
shares of the money market account. This portfolio has instantaneous arithmetic drift at
time 0 of
\[
\theta_0(0)M(0)r + \theta_1(0)S_1(0)\mu_1 + \theta_2(0)S_2(0)\mu_2
= S_1(0)\left(\left(\frac{\sigma_1}{\sigma_2} - 1\right) r + \mu_1 - \frac{\sigma_1}{\sigma_2}\,\mu_2\right) = S_1(0)(0.03)
\]
so at this instant, the zero-cost portfolio is making a riskless positive return. Formally, we
haven't shown that we have an arbitrage. We would have to compute the $\theta_0(t)$ that makes
the portfolio self-financing, and then check the portfolio value at some later time $T$, but this
is a bit cumbersome.
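The time-0 portfolio of Example 6.4.2 can be written out numerically. This is a minimal sketch, assuming $M(0) = 1$ and illustrative initial prices $S_1(0) = 50$ and $S_2(0) = 80$ that are not from the text; the function name is hypothetical. It checks that the portfolio costs nothing, has zero arithmetic volatility, and has arithmetic drift $0.03\,S_1(0)$.

```python
def zero_cost_riskless_portfolio(s1, s2, sigma1, sigma2, m0=1.0):
    """Return (theta0, theta1, theta2) at time 0 for Example 6.4.2."""
    theta1 = 1.0                                   # buy one share of asset 1
    theta2 = -s1 * sigma1 / (s2 * sigma2)          # cancel the volatility
    theta0 = -(theta1 * s1 + theta2 * s2) / m0     # borrow or lend to make cost zero
    return theta0, theta1, theta2

# Coefficients from the example; initial prices are illustrative assumptions.
s1, s2, m0 = 50.0, 80.0, 1.0
sigma1, sigma2 = 1.0, 2.0
mu1, mu2, r = 0.10, 0.09, 0.05

theta0, theta1, theta2 = zero_cost_riskless_portfolio(s1, s2, sigma1, sigma2, m0)
cost = theta0 * m0 + theta1 * s1 + theta2 * s2
volatility = theta1 * s1 * sigma1 + theta2 * s2 * sigma2
drift = theta0 * m0 * r + theta1 * s1 * mu1 + theta2 * s2 * mu2
print(cost, volatility, drift, 0.03 * s1)
```

This only verifies the instantaneous claim at time 0; as the example notes, showing an arbitrage formally would also require the self-financing $\theta_0(t)$ at later times.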
Example 6.4.3. Suppose
\[
\sigma(t) = \begin{pmatrix} 1 & \sin t \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad \mu(t) = \begin{pmatrix} 0.1 \\ 0.09 \end{pmatrix}.
\]
There are two assets and two sources of risk, but whenever $t$ is a multiple of $\pi$, the matrix
$\sigma(t)$ becomes degenerate: its two rows are the same. If there were a market price of risk $\lambda$,
this would imply that $\mu(t) - r(t)S(t)$ should also have two identical rows. But since the two
rows of $\mu(t)$ are not identical, this will not happen in general, so there cannot be a market
price of risk. At some times, the risks of the two assets are the same, but their rewards are
not, hence they cannot be explained by market prices of risk.
To make a solution unique, generally we want $N \ge K$, or more precisely, it requires that
there be at least $K$ linearly independent rows of the volatility matrix $\sigma$. When that is true,
it is possible to construct a portfolio whose volatility vector has only one nonzero entry, thus
identifying the market price of that risk. Otherwise, there will be more than one way of
allocating the $N$ assets' expected returns to the $K$ sources of risk.
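Example 6.4.4. Suppose $\mu(t) = \mathrm{diag}(S(t))\mu$ and $\sigma(t) = \mathrm{diag}(S(t))\sigma$ where the constants
\[
\sigma = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} \quad\text{and}\quad \mu = \begin{pmatrix} 0.1 \\ 0.1 \end{pmatrix}
\]
and the risk-free rate is a constant $r(t) = r = 0.05$. We have two unknowns, the two
components of the market price of risk vector $\lambda$, in one equation, $\lambda_1 + 2\lambda_2 = (0.1 - 0.05)$,
which leads to an infinite number of solutions.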
In conclusion, we see that the most natural case of a complete market in which a unique
EMM exists involves an invertible geometric volatility matrix.
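The linear-algebra criteria of this section can be summarized in a small rank test. The following is a minimal sketch, assuming the constant-coefficient (multidimensional Black-Scholes) case with positive prices, so that the system reduces to $\sigma\lambda^\top = \mu - r\mathbf{1}$; the function names and the tolerance are assumptions for illustration. A solution exists when appending the excess drifts to $\sigma$ does not raise its rank, and it is unique when the rank of $\sigma$ equals $K$.

```python
def matrix_rank(rows, tol=1e-12):
    """Rank of a small dense matrix by Gaussian elimination with pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue                       # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[rank][j]
        rank += 1
    return rank

def classify_market_price_of_risk(sigma, mu, r):
    """Return 'none', 'unique', or 'many' for sigma * lam = mu - r."""
    augmented = [list(row) + [mu_i - r] for row, mu_i in zip(sigma, mu)]
    rank_sigma = matrix_rank(sigma)
    if matrix_rank(augmented) > rank_sigma:
        return "none"                      # same risks, different rewards
    return "unique" if rank_sigma == len(sigma[0]) else "many"

print(classify_market_price_of_risk([[1, 2], [2, 1]], [0.10, 0.09], 0.05))  # Example 6.4.1
print(classify_market_price_of_risk([[1], [2]], [0.10, 0.09], 0.05))        # Example 6.4.2
print(classify_market_price_of_risk([[1, 2], [1, 2]], [0.10, 0.10], 0.05))  # Example 6.4.4
```

In the language of the text, "none" signals an arbitrage opportunity, "unique" a complete arbitrage-free market, and "many" an incomplete one, subject to the caveat that the precise theorems carry technical conditions.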
6.5. Martingale Representation
The martingale representation theorem is a kind of converse of our earlier result that,
under some technical conditions, any stochastic integral is a martingale. It says that a
martingale adapted to the market filtration $\mathcal{F}$ and starting at zero can be represented as a
stochastic integral. In particular, a contingent claim paying $Y$ at time $T$ has a representation
\[ Y = Y(0) + \int_0^T \varphi(t)\,dW(t) \]
because $E[Y \mid \mathcal{F}_t] - E[Y]$ is a martingale starting at zero and ending at $Y - E[Y]$. So we see
that $Y(0) = E[Y]$.
Also, we have some freedom to choose how we construct the martingale to be represented.
Actually, we are interested in finding a hedging strategy that relates to the contingent claim's
no-arbitrage value, which we know involves discounting and the probability measure Q, so
use
\[ D(T)Y = E^Q[D(T)Y] + \int_0^T \varphi(t)\,dW^Q(t). \]
Here $D(t)V(t) = E^Q[D(T)Y \mid \mathcal{F}_t]$ is a Q-martingale, which as the notation suggests is the
discounted value of the contingent claim, or its hedging portfolio. A Q-martingale starting
at zero is $D(t)V(t) - V(0)$, where $V(0) = E^Q[D(T)Y]$ is of course the initial value of the
contingent claim. (Note that $V(0) = D(0)V(0)$.)
How does this relate to the problem of hedging the contingent claim $Y$, not $D(T)Y$? From
the martingale representation theorem, we have $d(D(t)V(t)) = \varphi(t)\,dW^Q(t)$, but we can also
get an expression for $d(D(t)V(t))$ by applying Itô's formula to the product of the discount
factor $D$ and hedging portfolio value $V$. Then we will see that there is a self-financing
portfolio strategy $\theta$ such that $\theta(T)S(T) = V(T) = Y$, and what this hedging strategy is.
By the Itô product formula (6.3.4), $d(D(t)V(t)) = D(t)\,dV(t) + V(t)\,dD(t) + (dD(t))(dV(t))$.
From Equation (6.2.1), $dD(t) = -r(t)D(t)\,dt$, while the self-financing condition is $dV(t) =
\theta(t)\,dS(t)$. Since $D$ has no volatility, the term $(dD(t))(dV(t))$ vanishes. So we get
\[ d(D(t)V(t)) = D(t)\theta(t)\,dS(t) - V(t)r(t)D(t)\,dt. \]
Risk-neutrally, $S_i$ should have geometric drift $r(t)$ at time $t$, just like the money market
account. So the arithmetic drift of $S$ under Q is $r(t)S(t)$. Then plugging into the above
$dS(t) = r(t)S(t)\,dt + \sigma(t)\,dW^Q(t)$ and $V(t) = \theta(t)S(t)$, we get
\[
\begin{aligned}
d(D(t)V(t)) &= D(t)\theta(t)r(t)S(t)\,dt + D(t)\theta(t)\sigma(t)\,dW^Q(t) - \theta(t)S(t)r(t)D(t)\,dt \\
&= D(t)\theta(t)\sigma(t)\,dW^Q(t).
\end{aligned}
\]
Equating the two expressions for $d(D(t)V(t))$, we have
\[ \varphi(t) = D(t)\theta(t)\sigma(t). \]
So generally, there will be a portfolio strategy $\theta$ solving this equation when the volatility matrix
$\sigma$ has at least as many linearly independent rows as the $K$ components of the Brownian
motion (and hence of $\varphi$).
The problem with this approach is that the martingale representation theorem tells us
that there exists some integrand $\varphi$ giving rise to the martingale $D(t)V(t) - V(0)$, but it
gives us no help whatsoever in computing this $\varphi$, so we also have no idea what the hedging
portfolio $\theta$ is. This result is of purely theoretical significance in understanding how the vector
Itô process model for traded security prices gives rise to a complete market, under conditions
on the volatility matrix $\sigma$. To compute hedging strategies in the Black-Scholes model, we
had to construct and solve a PDE.
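In the Black-Scholes model, where the PDE solution does give the strategy explicitly, replication can be illustrated by simulating a discretely rebalanced hedge. The following is a minimal sketch, assuming the call delta $\Phi(d_1)$ as the stock holding, rebalancing on a uniform time grid, and illustrative parameter values; the function names are hypothetical. The stock is simulated under P with drift $\mu$, which does not enter the hedge.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1(s, k, r, sigma, tau):
    return (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))

def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price with time to maturity tau > 0."""
    a = d1(s, k, r, sigma, tau)
    return s * norm_cdf(a) - k * math.exp(-r * tau) * norm_cdf(a - sigma * math.sqrt(tau))

def delta_hedge(s0=100.0, k=100.0, mu=0.10, r=0.05, sigma=0.2,
                T=1.0, n=5000, seed=2):
    """Return (terminal value of the self-financing hedge, call payoff)."""
    rng = random.Random(seed)
    h = T / n
    s = s0
    value = bs_call(s0, k, r, sigma, T)        # start from the no-arbitrage price
    for i in range(n):
        tau = T - i * h                        # time to maturity, always >= h
        shares = norm_cdf(d1(s, k, r, sigma, tau))
        cash = value - shares * s              # dollars in the money market account
        dw = rng.gauss(0.0, math.sqrt(h))
        s = s * math.exp((mu - 0.5 * sigma**2) * h + sigma * dw)
        value = shares * s + cash * math.exp(r * h)   # self-financing update
    return value, max(s - k, 0.0)

value, payoff = delta_hedge()
print(value, payoff)
```

With finitely many rebalancing dates the match is only approximate; the gap should shrink as the grid is refined.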
6.6. Decomposition
One way to gain some insight into the use of numeraires and the European call option is
to decompose its payoff and price into two parts. The payoff
\[ (S(T) - K)^+ = (S(T) - K)1\{S(T) \ge K\} = S(T)1\{S(T) \ge K\} - K1\{S(T) \ge K\}, \]
and we can treat these two parts as the payoffs of separate derivative securities. The first
part pays the final stock price, but only if it is above $K$: this is known as a stock-or-nothing
call. The latter part pays $-K$ only when the stock price finishes above $K$. Think of this
as being $-K$ shares of a derivative that pays 1 when the stock price finishes above $K$: this
is a cash-or-nothing call. This type of option is also called a binary option, because the
outcome is binary: either it pays 1, or 0. So the "plain vanilla" European call option equals
a portfolio of one stock-or-nothing call and $-K$ cash-or-nothing calls, all struck at $K$.
Its price must then be the sum of these prices, and remember that we derived the plain
vanilla call price by evaluating $E^Q[e^{-r(T-t)}S(T)1\{S(T) \ge K\} \mid \mathcal{F}_t]$ and
$E^Q[e^{-r(T-t)}K1\{S(T) \ge K\} \mid \mathcal{F}_t]$ separately. So the price of the cash-or-nothing call is
$e^{-r(T-t)}\Phi(d_2)$ and that of the
stock-or-nothing call is $S(t)\Phi(d_1)$. What the cash-or-nothing option pays is either 0 or $e^{-rT}$
shares of the money market account, and it was easy to price this payoff using the money
market account as the numeraire. What the stock-or-nothing option pays is either 0 or 1
shares of stock, so it is natural to try pricing this payoff using the stock as the numeraire.
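Now the discount factor $D(t) = 1/S(t)$, so the discounted payoff is $D(T)S(T)1\{S(T) \ge
K\} = 1\{S(T) \ge K\}$. But under what probability measure are we taking the expectation of
this discounted payoff? The equivalent martingale measure $\tilde P$ when $S$ is the numeraire must
make $D(t)S(t) = 1$ and
\[
D(t)M(t) = \frac{M(t)}{S(t)} = \frac{1}{S(0)}\exp\left(\left(r - \left(\mu - \frac{\sigma^2}{2}\right)\right)t - \sigma W(t)\right)
\]
into martingales. Let $\tilde\lambda = \lambda - \sigma = (\mu - r)/\sigma - \sigma$ and $W^{\tilde P}(t) = W(t) + \tilde\lambda t$, so
\[
\frac{M(t)}{S(t)} = \frac{1}{S(0)}\exp\left(\left(-\tilde\lambda\sigma - \sigma^2/2\right)t - \sigma W(t)\right)
= \frac{1}{S(0)}\exp\left(-\sigma^2 t/2 - \sigma W^{\tilde P}(t)\right).
\]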
This is a martingale under the probability measure that makes $\exp(-\sigma^2 t/2 - \sigma W^{\tilde P}(t))$ a
stochastic exponential, which it is when $W^{\tilde P}$ is a Wiener process. So the equivalent martingale
measure is the one under which $W^{\tilde P}$ is a Wiener process. According to the Girsanov
transformation, that is $\tilde P$ where $d\tilde P/dP = \exp(-\tilde\lambda^2 T/2 - \tilde\lambda W(T))$.
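Under $\tilde P$, the conditional distribution of $\ln(M(T)/S(T))$ given $\mathcal{F}_t$ is normal with mean
$\ln(M(t)/S(t)) - \sigma^2(T - t)/2$ and variance $\sigma^2(T - t)$. So the value of the discounted payoff
$1\{S(T) \ge K\}$ is
\[
\begin{aligned}
E^{\tilde P}[1\{S(T) \ge K\} \mid \mathcal{F}_t]
&= \tilde P\left[\ln\frac{M(T)}{S(T)} \le \ln\frac{M(T)}{K} \,\Big|\, \mathcal{F}_t\right] \\
&= \Phi\left(\frac{\ln(M(T)/K) - \ln(M(t)/S(t)) + \sigma^2(T - t)/2}{\sigma\sqrt{T - t}}\right) \\
&= \Phi\left(\frac{rT - \ln(K) - (rt - \ln(S(t))) + \sigma^2(T - t)/2}{\sigma\sqrt{T - t}}\right) \\
&= \Phi(d_1).
\end{aligned}
\]
Finally, since the payoff was being expressed in terms of shares of stock, this quantity must
also be interpreted as a number of shares of stock. So the dollar price is $S(t)\Phi(d_1)$.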
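This approach is equivalent to and faster than our original approach to the same problem
in the derivation of the Black-Scholes formula. What happens here is that we reach the
probability measure $\tilde P$ in one step, rather than in two steps, from P to Q to $\tilde P$. It is the
same measure $\tilde P$ in both derivations, as we can see by observing
\[
\begin{aligned}
\frac{d\tilde P}{dQ}\,\frac{dQ}{dP}
&= \exp\left(-\frac{\sigma^2}{2}T + \sigma W^Q(T)\right)\exp\left(-\frac{\lambda^2}{2}T - \lambda W(T)\right) \\
&= \exp\left(-\frac{\sigma^2 + \lambda^2}{2}T + \sigma W^Q(T) - \lambda W(T)\right) \\
&= \exp\left(-\frac{\sigma^2 + \lambda^2}{2}T + \sigma(W(T) + \lambda T) - \lambda W(T)\right) \\
&= \exp\left(-\frac{\sigma^2 - 2\sigma\lambda + \lambda^2}{2}T + (\sigma - \lambda)W(T)\right) \\
&= \exp\left(-\frac{(\sigma - \lambda)^2}{2}T + (\sigma - \lambda)W(T)\right) \\
&= \exp\left(-\frac{\tilde\lambda^2}{2}T - \tilde\lambda W(T)\right) \\
&= \frac{d\tilde P}{dP}
\end{aligned}
\]
since $\tilde\lambda = \lambda - \sigma$.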
By choosing the right numeraire for this problem, the expectation becomes simple to
evaluate under the appropriate probability measure. Notice that under this measure $\tilde P$,
associated with the stock as numeraire, the stock is
\[
S(t) = S(0)\exp\left((\mu - \sigma^2/2)t + \sigma W(t)\right) = S(0)\exp\left((r + \sigma^2/2)t + \sigma W^{\tilde P}(t)\right)
\]
using $W(t) = W^{\tilde P}(t) - \tilde\lambda t = W^{\tilde P}(t) + \sigma t - ((\mu - r)/\sigma)t$. Of course the money market account
is still $M(t) = e^{rt}$. So $\tilde P$ is an equivalent martingale measure, and one based on using one
of the traded securities as a numeraire, but it is not a risk-neutral measure as Q is. The
geometric drift of the stock under $\tilde P$ is $r + \sigma^2$, by Itô's formula, while that of the money
market account is still $r$ (nothing can change this).
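The decomposition and the stock-numeraire measure can both be checked numerically. The following is a minimal sketch, assuming illustrative parameter values and pricing at time 0; the function names are hypothetical. It prices the two binary calls in closed form, and separately estimates $\tilde P[S(T) \ge K]$ by simulating the stock with the dynamics stated above for $\tilde P$, namely $S(T) = S(0)\exp((r + \sigma^2/2)T + \sigma W^{\tilde P}(T))$.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binary_call_prices(s, k, r, sigma, tau):
    """Return (cash-or-nothing price, stock-or-nothing price)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return math.exp(-r * tau) * norm_cdf(d2), s * norm_cdf(d1)

def prob_in_the_money_stock_measure(s0, k, r, sigma, T, n=200_000, seed=3):
    """Monte Carlo estimate of the probability that S(T) >= K under P-tilde."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        w = rng.gauss(0.0, math.sqrt(T))    # W^{P-tilde}(T)
        s_T = s0 * math.exp((r + 0.5 * sigma**2) * T + sigma * w)
        hits += s_T >= k
    return hits / n

s0, k, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0   # illustrative assumptions
cash, stock = binary_call_prices(s0, k, r, sigma, T)
print("vanilla call from the two binaries:", stock - k * cash)
print("stock-or-nothing via numeraire:", s0 * prob_in_the_money_stock_measure(s0, k, r, sigma, T))
print("stock-or-nothing closed form:", stock)
```

The simulated probability times $S(0)$ should agree with $S(0)\Phi(d_1)$ up to sampling error.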
6.7. Problems
Problem 6.1. Suppose $dX(t) = X(t)(\mu_X(t)\,dt + \sigma_X(t)\,dW(t))$ and $dY(t) = Y(t)(\mu_Y(t)\,dt +
\sigma_Y(t)\,dW(t))$ where $W$ is a vector Wiener process. Find the stochastic differential of the Itô
process $X/Y$.
All of the following assume the model of Section 6.4: an $N$-vector of market security
prices follows the SDE $dS(t) = \mu(t)\,dt + \sigma(t)\,dW(t)$, where $W$ is a $K$-dimensional Wiener
process under P.
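Problem 6.2. Suppose $\mu(t) = \mathrm{diag}(S(t))\mu$ and $\sigma(t) = \mathrm{diag}(S(t))\sigma$ where the constants
\[
\sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 3 & 0 \\ 2 & 2 & 1 \end{pmatrix} \quad\text{and}\quad
\mu = \begin{pmatrix} 0.12 \\ 0.16 \\ 0.05 \end{pmatrix}
\]
and the risk-free rate is a constant $r(t) = r = 0.05$. Does there exist a unique market price
of risk vector? If so, compute it.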
Problem 6.3. Suppose that the arithmetic volatility for the two risky market securities
and the market price of risk are the constant matrices
\[
\sigma(t) = \sigma = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \quad\text{and}\quad
\lambda^\top(t) = \lambda^\top = \begin{pmatrix} 0.02 \\ 0.03 \end{pmatrix}
\]
and the risk-free rate is a constant $r(t) = r = 0.05$. What is the arithmetic drift vector
$\mu(t)$ (under P)? Now suppose further that there is a derivative security whose unique no-arbitrage
price $C$ has constant arithmetic volatility $[0 \;\, 1]$. How many shares of the two risky
assets does the replicating strategy have at time $t$? What is the arithmetic drift of $C$ (under
P)? Hints: do not forget that the self-financing replicating strategy might also involve the
money market account, and do not give answers under Q.
Problem 6.4. Suppose that $\mu(t) = \mathrm{diag}(S(t))\mu$ and $\sigma(t) = \mathrm{diag}(S(t))\sigma$ where $\mu$ and
$\sigma$ are constants, and that there exists a unique constant market price of risk $\lambda$.
(1) Show that $E^P[S_1(T)\pi(T)/\pi(t) \mid \mathcal{F}_t] = S_1(t)$, i.e. using the pricing kernel reprices the
first security correctly.
(2) What is the time-$t$ no-arbitrage price of a derivative security paying exp(W(T))
at time $T$?
CHAPTER 7
Futures and Dividends
7.1. Futures
A forward contract is an agreement to exchange something, whether that be a quantity
of pork bellies or heating oil, or the value of some bonds or a stock index, at a future time
T, but for a price agreed on now. The price agreed upon now, but to be paid only at time
T, is called the forward price. Unlike other derivative securities that we have discussed
so far, no money changes hands when the contract is made. The forward price is not the
cost to acquire the forward contract, but the amount of money that changes hands at the
maturity T.
The question of pricing is not about finding the price for the contract, but finding the
forward price F(0, T), in terms of the spot price of the underlying, S(0), which is the price
paid to get it now (not later). The notation F(0, T) means the forward price at time 0
for delivery at time T, i.e. for a forward contract of maturity T. Consider the case of a
stock that pays no dividends in a market where it is possible to buy or sell at time 0 a
riskless bond paying 1 at time $T$ for the price $B(0, T)$. For instance, in the Black-Scholes
model, $B(0, T) = e^{-rT}$: just buy $e^{-rT}$ shares of the money market account. Then a simple
no-arbitrage argument shows that the forward price of the underlying must be $S(0)/B(0, T)$.
The replicating portfolio for the forward contract is to buy the underlying for $S(0)$ and sell
$F(0, T)$ of the riskless bond for $F(0, T)B(0, T)$: this produces the final payoff $S(T) - F(0, T)$
at time $T$, and its cost now is $S(0) - F(0, T)B(0, T)$. This needs to be 0 to match the initial
value of the forward contract, so $F(0, T) = S(0)/B(0, T)$. In the Black-Scholes model,
$F(0, T) = S(0)e^{rT}$: the forward price is the initial stock price, appreciated at the risk-free
rate. This is not the expected value of the stock price at $T$, which is $E[S(T)] = S(0)e^{\mu T}$, but
rather the time-$T$ value of the debt required to buy the stock now.
The forward price is chosen to make the forward contract have an initial value of 0, but
what happens to its value when time passes? Consider a forward contract for delivery at T,
agreed upon at time 0, when the forward price was F(0, T) = S(0)/B(0, T). At time t, the
forward price for delivery at T has become F(t, T) = S(t)/B(t, T). The owner of the forward
contract struck at F(0, T) could now sell a new forward contract, removing all risk due to
fluctuations in the price of the underlying. The resulting cashflow at time T would then
be F(t, T) - F(0, T). The value of this payoff as of time t is B(t, T)(F(t, T) - F(0, T)) =
S(t) - S(0)B(t, T)/B(0, T). The no-arbitrage version of this argument is that a portfolio
that contains 1 forward contract struck at F(0, T), -1 forward contract struck at F(t, T),
and F(0, T) - F(t, T) bonds maturing at T has value 0 at time T. So its price at time t
must be 0 to avoid arbitrage, and therefore the time-t price of the forward contract struck
at F(0, T) is the negative of the sum of the values of the other elements of the portfolio:
-(0 + B(t, T)(F(0, T) - F(t, T))).
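As a quick numerical check of the forward price and of the value of an existing forward contract, here is a minimal Python sketch. It assumes a constant interest rate r, so that B(t, T) = exp(-r(T - t)); the function names and the sample numbers are illustrative choices, not part of the text.

```python
import math

def bond_price(r, t, T):
    """B(t, T) for a constant interest rate r (an assumption of this sketch)."""
    return math.exp(-r * (T - t))

def forward_price(spot, r, t, T):
    """No-arbitrage forward price F(t, T) = S(t) / B(t, T)."""
    return spot / bond_price(r, t, T)

def forward_value(spot_t, strike, r, t, T):
    """Time-t value of a long forward struck earlier at `strike`:
    B(t, T) * (F(t, T) - strike)."""
    return bond_price(r, t, T) * (forward_price(spot_t, r, t, T) - strike)

# Hypothetical numbers: S(0) = 100, r = 5%, T = 1 year.
S0, r, T = 100.0, 0.05, 1.0
F0 = forward_price(S0, r, 0.0, T)            # equals S(0) * exp(r * T)
print("forward price at time 0:", F0)
print("value at inception:", forward_value(S0, F0, r, 0.0, T))
# Half a year later the stock trades at 110: value of the old contract.
print("value at t = 0.5:", forward_value(110.0, F0, r, 0.5, T))
```

The last line agrees with S(t) - S(0)B(t, T)/B(0, T), the expression derived above.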
So we have seen that owning the forward contract is exactly like borrowing S(0) to buy
the stock. Perhaps for this reason, forward contracts exist primarily as contracts negotiated
between two parties, and not as exchange-traded securities. A forward contract involves a
significant amount of credit risk. Suppose an airline enters into a forward contract with an
oil company to buy fuel. The oil company should factor into the forward price the risk that
the airline might be bankrupt and unable to pay when the time for delivery comes. This
risk is different from that for a power utility, for example. In reality, different borrowers can
borrow at different rates, and this must be reflected in the forward prices that they get. This
is unsuitable for exchange-traded securities, where the principle of one price for all reigns.
What would be like a forward contract, but suitable for trading on an exchange? A
futures contract, which is also a contract to buy a stock or commodity at a future time T
for an agreed-upon price called the futures price. It has the characteristic that the profit
or loss from owning it is actually realized every day. This process is called marking to
market. Suppose we now let F(t, T) symbolize the futures price for delivery at T as of time
t. Then at the end of day $i$, an owner of a futures contract receives that day's change in
the futures price: $F(t_i, T) - F(t_{i-1}, T)$. (If this is a negative number, then the owner of the
contract must make a payment.) Because the changes in the futures price are thus accounted
for daily, a futures contract is worth 0 again at the end of each day.
Someone who had shorted a futures contract would get $F(t_{i-1}, T) - F(t_i, T)$ at the end
of day i. A futures market is, in dollar terms, a zero-sum game, so that there are always an
equal number of long and short positions, and all the accounts balance. Nobody is allowed
implicitly to run up a debt as with forward contracts; the market mechanisms are designed
to minimize credit issues. Indeed, one is actually required to post margin in an account with
one's futures broker: the size of the margin is a feature of the futures contract, and is chosen
to be small relative to the futures price, and yet very likely to be larger than a daily price
change.
There are many details that we are not modeling: typically, margin accounts pay interest
like a money market account; if one's margin account falls too low, one receives anxious or
threatening calls from one's broker, who is authorized to close out one's positions if one does
not post more margin; there may be an impact from financial commodities' dividends or
interest payments, or physical commodities' storage costs, convenience yields, or seasonality
factors. Imagine that we are modeling futures on a non-dividend-paying stock.
Because of marking-to-market, a futures contract is much more complicated than a forward
contract, so much so that the futures price and forward price need not be the same, if interest
rates are stochastic. The reason is that it is not the same thing to receive S(T)-F(0, T)
at time T as it is to receive daily payments that sum to this amount for each day between
times 0 and T. However, since we have not studied stochastic interest rates yet, for the
moment we assume that interest rates are deterministic, in which case the forward price and
futures prices are the same.
A no-arbitrage argument shows why. Here we use F to denote the futures price, which
after all we are about to show is the same as the forward price denoted the same way.
Suppose that on day $i$, that is, over the time interval $[t_i, t_{i+1}]$, we are short $B(t_{i+1}, T)$
futures contracts: this is possible because interest rates are deterministic, so $B(t_{i+1}, T)$
is already known at $t_i$. At the end of the day, at $t_{i+1}$, the mark-to-market payment is
$-B(t_{i+1}, T)(F(t_{i+1}, T) - F(t_i, T))$, and we spend this to buy $F(t_i, T) - F(t_{i+1}, T)$ bonds maturing
at $T$. If we do this every day from $0 = t_0$ until $T = t_m$, the result is a time-$T$ payoff of
$$\sum_{i=0}^{m-1} \left( F(t_i, T) - F(t_{i+1}, T) \right) = F(0, T) - F(T, T),$$
but $F(T, T) = S(T)$: the price as of $T$ to get the underlying commodity at $T$. We can also
buy the underlying at time 0 while shorting S(0)/B(0, T) bonds for net zero cost, which
gives us the time-T payoff of S(T) - S(0)/B(0, T). Putting these two zero-cost strategies
together, we get a final payoff of F(0, T)-S(0)/B(0, T). To avoid arbitrage, this payoff must
be zero, i.e. F(0, T) = S(0)/B(0, T) for the futures price, the same as the forward price.
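The telescoping in this argument can be checked numerically. The sketch below assumes a constant interest rate r and takes an arbitrary futures price path as input; the function name, the daily grid, and the randomly generated path are illustrative assumptions only.

```python
import math
import random

def replication_payoff(futures_path, times, r):
    """Time-T payoff of the daily strategy described above, for a constant rate r.

    futures_path[i] is F(t_i, T) and times[i] is t_i, with times[-1] = T."""
    T = times[-1]
    bonds = 0.0  # number of bonds maturing at T accumulated so far
    for i in range(len(times) - 1):
        b_next = math.exp(-r * (T - times[i + 1]))   # B(t_{i+1}, T), known at t_i
        position = -b_next                            # short B(t_{i+1}, T) futures
        cash = position * (futures_path[i + 1] - futures_path[i])  # mark-to-market
        bonds += cash / b_next                        # buy bonds at price B(t_{i+1}, T)
    return bonds                                      # each bond pays 1 at T

# Any path will do; this one is a hypothetical random walk over 250 days.
rng = random.Random(1)
times = [i / 250 for i in range(251)]
path = [100.0]
for _ in range(250):
    path.append(path[-1] * math.exp(0.01 * rng.gauss(0.0, 1.0)))

print(replication_payoff(path, times, 0.05), path[0] - path[-1])
```

Whatever path is supplied, the two printed numbers agree up to rounding: the strategy pays F(0, T) - F(T, T).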
Recall that the extended Black-Scholes model is
$$B(t, T) = \exp\left( -\int_t^T r(u)\,du \right)$$
$$S(t) = S(0) \exp\left( \int_0^t \left( \mu(u) - \sigma^2(u)/2 \right) du + \int_0^t \sigma(u)\,dW(u) \right)$$
where $r$, $\mu$, and $\sigma$ are all deterministic functions. Note $M(t) = 1/B(0, t)$, but it is helpful
for us to focus on bond prices in order to express conveniently forward/futures prices, which
are the same because interest rates are deterministic. The risk-neutral dynamics are
$$dB(t, T) = r(t)B(t, T)\,dt$$
$$dS(t) = S(t)\left( r(t)\,dt + \sigma(t)\,dW^Q(t) \right).$$
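Then we have
$$\begin{aligned}
F(t, T) = \frac{S(t)}{B(t, T)}
&= \frac{S(0)}{B(0, T)} \cdot \frac{\exp\left( \int_0^t \left( r(u) - \sigma^2(u)/2 \right) du + \int_0^t \sigma(u)\,dW^Q(u) \right)}{\exp\left( \int_0^t r(u)\,du \right)} \\
&= \frac{S(0)}{B(0, T)} \exp\left( -\frac{1}{2} \int_0^t \sigma^2(u)\,du + \int_0^t \sigma(u)\,dW^Q(u) \right)
\end{aligned}$$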
and because the second factor is a stochastic exponential, this shows that the futures price has
zero risk-neutral drift and is itself a Q-martingale. (This also follows by applying Problem 6.1
to the risk-neutral SDEs.) The reason for this is that the futures contract is costless to enter
into: having no capital tied up in it (neglecting the margin account), one does not "deserve"
any expected return.
7.2. Dividends
Most stocks pay dividends, so it would be nice to modify the Black-Scholes model slightly
to account for this fact. Our fixes will not be perfect: we will model dividends as being
somehow predictable, either constant or proportional to stock price. In reality, companies
can change their dividend policies, but these fixes are better than nothing. A simple and
elegant way to incorporate dividends into the Black-Scholes model is to assume a constant
proportional dividend yield, that is, at time $t$ the stock pays dividends at a rate $qS(t)$, where
$q \ge 0$ is a constant. However, what companies actually do is announce in advance that they
will pay a certain lump sum on a given date. For the short-term (perhaps one year) covered
by dividend announcements, it might be preferable to use a model of deterministic, discrete
dividends, but such models are a bit more irritating to deal with.
The idea of a model with constant proportional dividend yield is that a company
will tend to pay out as dividends a constant proportion of its value, so that dividends per
share will be a fixed fraction of the share price. The assumption of a continuous yield is
an approximation made for mathematical convenience. It is a better approximation for an
index than for an individual stock: stocks in the index pay on various dates dividends small
relative to the index value. This model of dividends also works for convenience yields on
commodity futures, as well as their storage costs (in which case take q < 0).
With proportional dividend yield q, the instantaneous gain from holding a share of stock
is dS(t) + qS(t) dt. Someone shorting a share would also have to pay out the dividends.
This affects the self-financing condition for a portfolio: the dividend payout goes into the
money market account or into further stock purchases. Therefore we end up with a different
PDE from the ordinary Black-Scholes PDE. Of course, this results in a new risk-neutral
measure and new prices for derivative securities. Let's go through the relevant parts of the
Black-Scholes analysis in this modified model.
Consider a derivative security having terminal payoff $g(S(T))$ at $T$ and making a continuous
payment stream at rate $h(u, S(u))$ at time $u$. Assume it has a price $f(t, S(t))$. By no-arbitrage,
$V(t) = f(t, S(t))$, where $V$ is the value of the replicating portfolio, which must provide us not
only with the value of the derivative security at the end, but also with the continuous payment
stream. Because the portfolio pays out the stream at rate $h$, the new self-financing condition is
$dV(t) = dG(t) - h(t, S(t))\,dt$, where $dG(t) = \theta_1(t)(dS(t) + qS(t)\,dt) + \theta_0(t)\,dM(t)$ is the gain
from holding the portfolio. Plugging in $dS(t) = S(t)(\mu\,dt + \sigma\,dW(t))$ and $dM(t) = rM(t)\,dt$, we have
$$df(t, S(t)) = dV(t) = \left( \theta_1(t)S(t)(\mu + q) + \theta_0(t)M(t)r - h(t, S(t)) \right) dt + \theta_1(t)S(t)\sigma\,dW(t).$$
From the time-dependent Itô formula, we had
$$df(t, S(t)) = \left( f_t(t, S(t)) + \mu S(t)f_S(t, S(t)) + \frac{\sigma^2}{2} S^2(t)f_{SS}(t, S(t)) \right) dt + \sigma S(t)f_S(t, S(t))\,dW(t)$$
so it is still true that the number of shares of stock in the replicating portfolio is $\theta_1(t) = f_S(t, S(t))$.
And again by the no-arbitrage principle, $\theta_0(t)M(t) = f(t, S(t)) - f_S(t, S(t))S(t)$.
Now we compute that the drift of $f(t, S(t))$ is
$$\begin{aligned}
\theta_1(t)S(t)(\mu + q) + \theta_0(t)M(t)r - h(t, S(t))
&= f_S(t, S(t))S(t)(\mu + q) + \left( f(t, S(t)) - f_S(t, S(t))S(t) \right) r - h(t, S(t)) \\
&= rf(t, S(t)) + (\mu + q - r)S(t)f_S(t, S(t)) - h(t, S(t)).
\end{aligned}$$
Equating this with the drift from Itô's formula yields the PDE
$$(r - q)xf_x(t, x) - rf(t, x) + f_t(t, x) + \frac{1}{2}\sigma^2 x^2 f_{xx}(t, x) + h(t, x) = 0. \tag{7.2.1}$$
Applying the Feynman-Kac formula, the time-$t$ value of a derivative security with payoff
$g(S(T))$ at time $T$ and its own (totally separate) continuous payment stream $h(u, S(u))$ for
$u \in [0, T]$ is
$$E^Q_{t,S(t)}\left[ e^{-r(T-t)} g(S(T)) + \int_t^T e^{-r(u-t)} h(u, S(u))\,du \right]$$
where
$$dS(t) = S(t)\left( (r - q)\,dt + \sigma\,dW^Q(t) \right). \tag{7.2.2}$$
Notice that the only thing that has changed from the Black-Scholes PDE is that now the
dividend-paying stock has drift r - q under the probability measure Q.
This may still be fairly called a risk-neutral measure. The total return on the stock is
the sum of capital appreciation and dividend payment, and the sum of the geometric drift
under Q plus dividend yield is (r - q) + q = r, the same as for the money market account.
In the original Black-Scholes model, the market price of risk was $(\mu - r)/\sigma$. Now it is
$(\mu - (r - q))/\sigma = (\mu + q - r)/\sigma$, which is more: to get geometric drift $\mu$ plus dividend yield
$q$ is greater compensation than just getting geometric drift $\mu$.
From this point it is not difficult to duplicate the Black-Scholes analysis: we simply have
a different geometric drift r - q for the stock under Q. We will carry out the analysis under
the extended Black-Scholes model, where $\mu$, $r$, $q$, and $\sigma$ are all deterministic functions of
time. Then under $Q$, the conditional distribution of $\ln S(T)$ given $\mathcal{F}_t$ is
$$N\left( \ln S(t) + \int_t^T \left( r(u) - q(u) - \frac{\sigma^2(u)}{2} \right) du,\ \int_t^T \sigma^2(u)\,du \right). \tag{7.2.3}$$
Here $M(t)/M(T) = B(t, T) = \exp\left( -\int_t^T r(u)\,du \right)$. We have
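$$\begin{aligned}
E^Q_{t,S(t)}[-B(t, T)K1\{S(T) > K\}]
&= -B(t, T)K\,Q[\ln S(T) > \ln K \mid \mathcal{F}_t] \\
&= -B(t, T)K\,\Phi\left( \frac{\ln(S(t)/K) + \int_t^T \left( r(u) - q(u) - \frac{\sigma^2(u)}{2} \right) du}{\sqrt{\int_t^T \sigma^2(u)\,du}} \right)
\end{aligned}$$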
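and
$$\begin{aligned}
E^Q_{t,S(t)}[B(t, T)S(T)1\{S(T) > K\}]
&= B(t, T)\,E^Q\left[ S(t) \exp\left( \int_t^T \left( r(u) - q(u) - \frac{\sigma^2(u)}{2} \right) du + \int_t^T \sigma(u)\,dW^Q(u) \right) 1\{S(T) > K\} \,\Big|\, \mathcal{F}_t \right] \\
&= S(t) \exp\left( -\int_t^T q(u)\,du \right) E^Q\left[ \exp\left( -\frac{1}{2} \int_t^T \sigma^2(u)\,du + \int_t^T \sigma(u)\,dW^Q(u) \right) 1\{S(T) > K\} \,\Big|\, \mathcal{F}_t \right] \\
&= S(t) \exp\left( -\int_t^T q(u)\,du \right) \tilde{P}[S(T) > K \mid \mathcal{F}_t]
\end{aligned}$$
by a Girsanov transformation with $W^{\tilde{P}}(t) = W^Q(t) - \int_0^t \sigma(u)\,du$, which reduces to
$W^Q(t) - \sigma t$ as before when $\sigma$ is constant. Now the calculation is similar to above to get
$$\tilde{P}[S(T) > K \mid \mathcal{F}_t] = \Phi\left( \frac{\ln(S(t)/K) + \int_t^T \left( r(u) - q(u) + \frac{\sigma^2(u)}{2} \right) du}{\sqrt{\int_t^T \sigma^2(u)\,du}} \right)$$
so instead of the Black-Scholes formula, for the European call price we get
$$\begin{aligned}
& S(t) \exp\left( -\int_t^T q(u)\,du \right) \Phi\left( \frac{\ln(S(t)/K) + \int_t^T \left( r(u) - q(u) + \frac{\sigma^2(u)}{2} \right) du}{\sqrt{\int_t^T \sigma^2(u)\,du}} \right) \\
& \quad - B(t, T)K\,\Phi\left( \frac{\ln(S(t)/K) + \int_t^T \left( r(u) - q(u) - \frac{\sigma^2(u)}{2} \right) du}{\sqrt{\int_t^T \sigma^2(u)\,du}} \right).
\end{aligned} \tag{7.2.4}$$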
Because the risk-neutral geometric drift of the stock is decreased by $q$, the probabilities
$Q[S(T) > K \mid \mathcal{F}_t]$ and $\tilde{P}[S(T) > K \mid \mathcal{F}_t]$ are both diminished. Also, the time-$t$ value of getting
the stock at time $T$ is now only $S(t) \exp\left( -\int_t^T q(u)\,du \right)$, not $S(t)$. Some of the value of the
stock leaks away in the form of dividends between $t$ and $T$. This must be accounted for if
you use the stock as numeraire to price the first term!
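For concreteness, the call price formula with dividends can be evaluated as follows. This is a minimal sketch that assumes r, q and the volatility are constants, so that every integral over [t, T] reduces to a multiple of tau = T - t; the function names and the sample inputs are illustrative, not from the text.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def call_price(S, K, r, q, sigma, tau):
    """European call price (7.2.4) for constant r, q, sigma; tau is time to maturity."""
    total_var = sigma ** 2 * tau                 # integral of sigma^2 over [t, T]
    carry = (r - q) * tau                        # integral of r - q over [t, T]
    d1 = (math.log(S / K) + carry + 0.5 * total_var) / math.sqrt(total_var)
    d2 = d1 - math.sqrt(total_var)
    stock_leg = S * math.exp(-q * tau) * norm_cdf(d1)    # receive the stock if S(T) > K
    strike_leg = K * math.exp(-r * tau) * norm_cdf(d2)   # pay K if S(T) > K
    return stock_leg - strike_leg

print(call_price(100.0, 100.0, 0.05, 0.00, 0.2, 1.0))   # no dividends
print(call_price(100.0, 100.0, 0.05, 0.03, 0.2, 1.0))   # a 3% yield lowers the call price
```

With q = 0 the function reduces to the ordinary Black-Scholes formula; a positive q lowers both the discounted stock term and the exercise probabilities, as discussed above.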
Much as in Section 3.2, we can compute the Greeks of this European call option in the
extended Black-Scholes model with dividends. Note that we get different answers for the
same option because we have changed the model. Much as $B(t, T) = \exp\left( -\int_t^T r(u)\,du \right)$, let
$Q(t, T) = \exp\left( -\int_t^T q(u)\,du \right)$ and $V(t, T) = \int_t^T \sigma^2(u)\,du$. The no-arbitrage option price can
now be written
$$f(t, S) = SQ(t, T)\Phi(d_1) - KB(t, T)\Phi(d_2) \quad \text{where} \quad d_{1,2} = \frac{\ln \frac{SQ(t, T)}{KB(t, T)} \pm \frac{1}{2}V(t, T)}{\sqrt{V(t, T)}}.$$
Some useful facts are
$$\frac{\partial d_1}{\partial S} = \frac{\partial d_2}{\partial S} = \frac{1}{S\sqrt{V(t, T)}}$$
$$\frac{\partial d_2}{\partial t} = \frac{\partial d_1}{\partial t} + \frac{\sigma^2(t)}{2\sqrt{V(t, T)}}$$
$$\phi(d_2) = \phi(d_1)\frac{SQ(t, T)}{KB(t, T)}$$
where $\phi = \Phi'$ is the standard normal density, and the partial derivatives are
$$\Theta = q(t)Q(t, T)S\Phi(d_1) - r(t)B(t, T)K\Phi(d_2) - \frac{SQ(t, T)\sigma^2(t)\phi(d_1)}{2\sqrt{V(t, T)}}$$
$$\Delta = Q(t, T)\Phi(d_1)$$
$$\Gamma = \frac{Q(t, T)\phi(d_1)}{S\sqrt{V(t, T)}}.$$
There are two main changes: first, one share of stock received at $T$ is worth $Q(t, T) \le 1$
shares of stock now, which affects the hedging policy, thus $\Delta$ and $\Gamma$. The dividend impacts
$\Theta$ because the replicating portfolio now not only pays interest on its borrowings, but gets
a dividend at a rate $q(t)$ times the value invested in the stock. Second, volatility enters in
a more complicated way: for the most part, we care about the total remaining volatility
$V(t, T)$, but in $\Theta$, the losses due to rebalancing when the stock wiggles and returns to its
previous value have to do with the current instantaneous volatility $\sigma(t)$.
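One way to guard against sign errors in these Greeks is to compare them with finite differences of the price. The sketch below does this under the simplifying assumption of constant r, q and volatility, so that Q(t, T) = exp(-q tau), B(t, T) = exp(-r tau) and V(t, T) = sigma^2 tau with tau = T - t; all function names and inputs are illustrative.

```python
import math

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1_d2(S, K, r, q, sigma, tau):
    V = sigma ** 2 * tau
    Q, B = math.exp(-q * tau), math.exp(-r * tau)
    d1 = (math.log(S * Q / (K * B)) + 0.5 * V) / math.sqrt(V)
    return d1, d1 - math.sqrt(V)

def price(S, K, r, q, sigma, tau):
    d1, d2 = d1_d2(S, K, r, q, sigma, tau)
    return S * math.exp(-q * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def greeks(S, K, r, q, sigma, tau):
    """Return (Theta, Delta, Gamma) from the closed-form expressions."""
    d1, d2 = d1_d2(S, K, r, q, sigma, tau)
    Q, B = math.exp(-q * tau), math.exp(-r * tau)
    sqrt_V = sigma * math.sqrt(tau)
    theta = (q * Q * S * norm_cdf(d1) - r * B * K * norm_cdf(d2)
             - S * Q * sigma ** 2 * norm_pdf(d1) / (2.0 * sqrt_V))
    delta = Q * norm_cdf(d1)
    gamma = Q * norm_pdf(d1) / (S * sqrt_V)
    return theta, delta, gamma

S, K, r, q, sigma, tau = 100.0, 105.0, 0.05, 0.03, 0.2, 0.75
theta, delta, gamma = greeks(S, K, r, q, sigma, tau)
e = 1e-5
# Theta is the derivative in calendar time t, i.e. minus the derivative in tau.
fd_theta = (price(S, K, r, q, sigma, tau - e) - price(S, K, r, q, sigma, tau + e)) / (2 * e)
fd_delta = (price(S + e, K, r, q, sigma, tau) - price(S - e, K, r, q, sigma, tau)) / (2 * e)
print(theta, fd_theta)
print(delta, fd_delta)
```

Each printed pair should agree to several decimal places; a disagreement usually points to a sign or discounting mistake.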
7.3. Problems
Problem 7.1. In this same extended Black-Scholes model of a stock with continuous
proportional dividend yield, find the no-arbitrage price of a European call option with strike
K and maturity $T$ on the stock futures contract with maturity $U \ge T$.
Problem 7.2. In Problem 7.1, let U = T, so that the maturity of the futures contract is
the same as that of the option written on it. Is the option on the futures contract the same
as an option (of the same strike and maturity) on the spot, i.e. the underlying stock? Why
or why not? If the options are the same, show that your no-arbitrage pricing formulas give
the same price, or explain how and why they are different. If the options are not the same,
explain how their no-arbitrage prices differ. What do you think would be the best way to
hedge each of these two options: with the stock, with the futures, or both? What if we were
talking about a stock index, not a stock?
CHAPTER 8
Computation
There are two main types of computations in financial engineering. One type assumes
that a model and its parameters are given, then deduces a value such as a price for a
derivative security, an optimal hedging strategy, or a risk measure of a portfolio. The major
computational tools here are simulation and numerical solution of partial differential equations.
The other type is called calibration of a model: given a framework for modeling, with
unknown parameters, it chooses values of the parameters given market data, usually current
prices or price histories. The main tool here is optimization, frequently of a statistical flavor
(e.g. likelihood maximization or least-squares minimization). Numerical methods allow the
financial engineer to apply more sophisticated models to more complicated situations where
analytical results are not available.
In this chapter, we learn simulation and calibration by applying them to a simple model
where analytical results are available for comparison, and where data is readily available to
use in calibration. This is the extended Black-Scholes model for futures prices, applied to
the usual "vanilla" European options. In subsequent chapters, we will use simulation and
calibration to study more sophisticated models and securities.
8.1. Simulation
If we want to compute the no-arbitrage price of a derivative security as an expected
discounted payoff, but cannot evaluate the expectation by hand, simulation can give us a
numerical estimate of the expectation. Suppose we want to evaluate $E^Q[Y]$, where $Y$ is the
payoff discounted by the money market account, and $Q$ is the risk-neutral measure. (In
general, we can use any numeraire to discount, as long as we use the associated equivalent
martingale measure.) Suppose also that we can sample from the distribution of $Y$ under $Q$.
Produce $n$ such independent, identically distributed (iid) samples $Y^{(1)}, \ldots, Y^{(n)}$. Then
$\bar{Y} = \sum_{j=1}^n Y^{(j)}/n$ is a sample average estimate of the mean $E^Q[Y]$. Moreover, by the central limit
theorem, if $n$ is large enough, then $\bar{Y}$ is approximately distributed as $N(E^Q[Y], \mathrm{Var}^Q[Y]/n)$.
This allows us to compute an approximate confidence interval with confidence level $1 - \alpha$ in the usual way:
$\bar{Y} \pm z_{\alpha/2}\hat{s}/\sqrt{n}$, where $z_{\alpha/2} = -\Phi^{-1}(\alpha/2)$ and $\hat{s} = \sqrt{\sum_{j=1}^n (Y^{(j)} - \bar{Y})^2/n}$ is the sample standard
deviation of $Y^{(1)}, \ldots, Y^{(n)}$: having assumed that $n$ is large, we will not quibble about whether
$n$ or $n - 1$ should appear in the denominator. See any elementary statistics textbook for a
discussion of confidence intervals.
We now focus on the case of pricing a European call option in the extended Black-Scholes
model. The underlying is a stock index with a continuous dividend yield. The no-arbitrage
call price is given analytically in equation (7.2.4).
Because we are only interested in the option's payoff, which depends only on the final
stock index level $S(T)$, we can just simulate that. The distribution of $\ln S(T)$ (conditional
on what we know at time $t$) is in formula (7.2.3). So we can simulate the $j$th value of $S(T)$
as
$$S^{(j)}(T) = \exp\left( \ln S(t) + \int_t^T \left( r(u) - q(u) - \frac{\sigma^2(u)}{2} \right) du + \sqrt{\int_t^T \sigma^2(u)\,du}\; Z^{(j)} \right)$$
where $Z^{(1)}, \ldots, Z^{(n)}$ are iid standard normal. Any software you are using for simulation
should have a routine for producing such random numbers. Then our Monte Carlo estimate
for the option price at $t$ is
$$\exp\left( -\int_t^T r(u)\,du \right) \frac{1}{n} \sum_{j=1}^n \left( S^{(j)}(T) - K \right)^+.$$
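The estimator and its confidence interval might be coded as in the following sketch. It assumes constant r, q and volatility (so the integrals become multiples of tau = T - t), uses Python's standard `random.gauss` for the normal draws, and compares the result with the analytical price; the function names, the seed, and the inputs are illustrative choices.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def call_price(S, K, r, q, sigma, tau):
    """Analytical price (7.2.4) for constant coefficients, used as a benchmark."""
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - q) * tau + 0.5 * sd ** 2) / sd
    return S * math.exp(-q * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d1 - sd)

def mc_call_price(S, K, r, q, sigma, tau, n, seed=0):
    """Return (estimate, half-width of an approximate 95% confidence interval)."""
    rng = random.Random(seed)
    mean_log = math.log(S) + (r - q - 0.5 * sigma ** 2) * tau
    sd_log = sigma * math.sqrt(tau)
    discount = math.exp(-r * tau)
    total = total_sq = 0.0
    for _ in range(n):
        s_T = math.exp(mean_log + sd_log * rng.gauss(0.0, 1.0))
        y = discount * max(s_T - K, 0.0)          # discounted payoff Y^(j)
        total += y
        total_sq += y * y
    y_bar = total / n
    s_hat = math.sqrt(max(total_sq / n - y_bar ** 2, 0.0))   # n in the denominator
    return y_bar, 1.96 * s_hat / math.sqrt(n)                 # z_{0.025} is about 1.96

estimate, half_width = mc_call_price(100.0, 100.0, 0.05, 0.03, 0.2, 1.0, n=100_000, seed=42)
exact = call_price(100.0, 100.0, 0.05, 0.03, 0.2, 1.0)
print(f"Monte Carlo: {estimate:.4f} +/- {half_width:.4f}, formula: {exact:.4f}")
```

Because the half-width shrinks like one over the square root of n, halving it requires roughly four times as many samples.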
Another way to simulate is based on the SDE (7.2.2). Whenever we have an Itô process
obeying equation (5.1.4), the result of the Feynman-Kac formula, we can simulate $X$ by the
Euler scheme of discretizing time into $m$ steps each of length $h = (T - t)/m$, so $t_i = t + ih$.
For the $j$th path, let $\hat{X}^{(j)}(t_0) = x$, and
$$\hat{X}^{(j)}(t_{i+1}) = \hat{X}^{(j)}(t_i) + \mu(t_i, \hat{X}^{(j)}(t_i))h + \sigma(t_i, \hat{X}^{(j)}(t_i))\sqrt{h}\,Z^{(j)}_{i+1} \tag{8.1.1}$$
where all the $Z^{(j)}_i$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$ are iid standard normal. The reason for
writing $\hat{X}$ instead of $X$ is that we should realize that when we discretize time like this,
we may not be sampling from the correct distribution anymore: in general, $\mu(u, X(u))$ and
$\sigma(u, X(u))$ do not have to be constant over the time interval $[t_i, t_{i+1})$, and the conditional
distribution of $X(t_{i+1})$ given $X(t_i)$ does not have to be normal with mean $X(t_i) + \mu(t_i, X(t_i))h$
and variance $\sigma^2(t_i, X(t_i))h$. This means that our simulation may be estimating the wrong
expectation, resulting in discretization bias.
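A minimal sketch of the Euler scheme follows, applied to the stock SDE (7.2.2) with constant coefficients. The drift and volatility are passed in as functions of (t, x); the function name `euler_terminal`, the parameter values, and the deliberately coarse choice m = 4 are assumptions made for illustration.

```python
import math
import random

def euler_terminal(mu, sigma, x0, t, T, m, rng):
    """One Euler path (8.1.1) from time t to T in m steps; returns X_hat(T)."""
    h = (T - t) / m
    x = x0
    for i in range(m):
        t_i = t + i * h
        z = rng.gauss(0.0, 1.0)
        x = x + mu(t_i, x) * h + sigma(t_i, x) * math.sqrt(h) * z
    return x

# Coefficients of the stock SDE (7.2.2) with constant r, q, sigma (hypothetical values).
r, q, vol = 0.05, 0.03, 0.2
stock_mu = lambda t, s: (r - q) * s
stock_sigma = lambda t, s: vol * s

rng = random.Random(0)
n, m, S0, T = 20_000, 4, 100.0, 1.0
mean_euler = sum(euler_terminal(stock_mu, stock_sigma, S0, 0.0, T, m, rng)
                 for _ in range(n)) / n
print("simulated mean of Euler scheme:", mean_euler)
print("exact mean of Euler scheme:   ", S0 * (1 + (r - q) * T / m) ** m)
print("true mean of S(T) under Q:    ", S0 * math.exp((r - q) * T))
```

The second and third lines differ slightly: each Euler step multiplies the mean by 1 + (r - q)h rather than exp((r - q)h). In this example the gap is small compared with the sampling noise in the first line, but it does not shrink as n grows, which is what distinguishes bias from statistical error.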
In this situation, to reduce discretization bias, it looks like a better idea to simulate
the log stock price rather than the stock price itself. This is because the (arithmetic) drift
and volatility coefficients of the log are deterministic, while those of the stock price are not.
Indeed, we can eliminate discretization bias entirely in this case by defining
$$a_i = \int_{t_i}^{t_{i+1}} \left( r(u) - q(u) - \sigma^2(u)/2 \right) du \quad \text{and} \quad b_i^2 = \int_{t_i}^{t_{i+1}} \sigma^2(u)\,du$$
and simulating $X^{(j)}(t_0) = x$,
$$X^{(j)}(t_{i+1}) = X^{(j)}(t_i) + a_i + b_i Z^{(j)}_{i+1}, \tag{8.1.2}$$
then computing $S^{(j)}(t_i) = \exp(X^{(j)}(t_i))$.
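The log scheme can be sketched as below for time-dependent coefficients. Here r, q and the volatility are supplied as Python functions of time, and the integrals defining a_i and b_i are approximated by a midpoint rule, so the scheme is free of discretization bias only up to that quadrature error (closed-form integrals could be substituted when available). The helper names and the example term structures are hypothetical.

```python
import math
import random

def integrate(fn, a, b, steps=200):
    """Midpoint-rule approximation of the integral of fn over [a, b]."""
    w = (b - a) / steps
    return w * sum(fn(a + (k + 0.5) * w) for k in range(steps))

def log_scheme_path(S0, r, q, sigma, times, rng):
    """Simulate S at the given times using (8.1.2); r, q, sigma are functions of time."""
    x = math.log(S0)
    path = [S0]
    for t0, t1 in zip(times[:-1], times[1:]):
        a = integrate(lambda u: r(u) - q(u) - 0.5 * sigma(u) ** 2, t0, t1)   # a_i
        b = math.sqrt(integrate(lambda u: sigma(u) ** 2, t0, t1))            # b_i
        x += a + b * rng.gauss(0.0, 1.0)
        path.append(math.exp(x))
    return path

# Hypothetical deterministic term structures.
r_fn = lambda u: 0.05
q_fn = lambda u: 0.03
sigma_fn = lambda u: 0.15 + 0.10 * u

times = [k / 12 for k in range(13)]          # monthly grid over one year
print(log_scheme_path(100.0, r_fn, q_fn, sigma_fn, times, random.Random(3)))
```

Unlike the Euler scheme, the distribution of the simulated S(T) here does not depend on how fine the time grid is, which is why a coarse grid suffices whenever only a few dates matter.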
Another use of simulation is to evaluate the quality of an approximate hedging strategy.
In reality, it is not possible to follow a hedging strategy that changes continuously over
time, nor is it practical to come close: the transaction costs would be too great. With an
approximate hedging strategy, we have hedging error, i.e. there is a nonzero profit or loss left
over at the end: we sold a contingent claim paying $(S(T) - K)^+$, executed the approximate
hedging strategy $\theta$, and find that we end up with a residual $\theta_0(T)M(T) + \theta_1(T)S(T) - (S(T) - K)^+$
which is typically nonzero. Before doing this, we should inform ourselves as to what
the distribution of the residual might be. We care about the residual under $P$, a subjective
probability measure intended to describe the real world, not under $Q$ or any other equivalent
martingale measure, which is just a construct that we use to compute no-arbitrage prices.
One straightforward way to construct an approximate hedging strategy is simply to
rebalance the portfolio at a finite number of times $0 = t_0, \ldots, t_m = T$, changing nothing in
between. In this case, $\theta$ is a simple stochastic process. Trading in discrete time this way is
certainly practicable. One particular choice of such a strategy is to choose $\theta_1(t_i)$ equal to
the value of the perfect continuous-time hedging strategy at that moment. Note that it is
not obvious that this is the best choice. Also, if we are to do this in a self-financing way, we
will not be able to have $\theta_0(t_i)$ equal to the value prescribed by the continuous-time strategy
in general.
When simulating the result of this hedging process, we need to simulate all m steps, in
order to see what we would do at each time. It is irrelevant that we could simulate S(T)
correctly without any intermediate values, because we need to know those values in order to
see what hedge we use and what gains we incur at each step. However, we can and should
simulate based on the SDE as in equation (8.1.2) so as to avoid discretization bias. In the
simulation, we keep track not only of $S$ and $X$, the stock price and its log, but also $\theta_0$ and
$\theta_1$, the number of shares held in the money market account and in the stock.
At each step $i$, we will always choose $\theta_1(t_i) = \Delta = Q(t_i, T)\Phi(d_1)$, evaluated at the current
time $t_i$ and stock price $S(t_i)$, as described in Section 7.2. At step 0, we have
$\theta_0(0) = C - \theta_1(0)S(0)$ (recall that $M(0) = 1$), where $C$ is whatever premium we received
for selling the call. Consider how to update at step $i + 1$. We need to choose a way to
account for the dividends. Let us say that they are continuously reinvested in the stock.
Then we have $\theta_1(t_i)$ shares of stock at time $t_i$, and we have $\theta_1(t_i) \exp\left( \int_{t_i}^{t_{i+1}} q(u)\,du \right)$ shares of
stock when the $i$th step ends at time $t_{i+1}$. Then we decide how many shares of stock we
want to have at that time, which is $\theta_1(t_{i+1})$ based on the new computation of $\Delta$. So we buy
$\theta_1(t_{i+1}) - \theta_1(t_i) \exp\left( \int_{t_i}^{t_{i+1}} q(u)\,du \right)$ shares at $t_{i+1}$, and we pay for this purchase by taking that
much money out of the money market account. So we must update $\theta_0$ as follows:
$$\theta_0(t_{i+1}) = \theta_0(t_i) - \frac{\left( \theta_1(t_{i+1}) - \theta_1(t_i) \exp\left( \int_{t_i}^{t_{i+1}} q(u)\,du \right) \right) S(t_{i+1})}{M(t_{i+1})}.$$
Then at the end, our profit on the trade is $\theta_0(T)M(T) + \theta_1(T)S(T) - (S(T) - K)^+$.
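The hedging experiment just described could be simulated along the following lines. This is a sketch under stated assumptions: constant r, q, real-world drift mu and volatility, equally spaced rebalancing dates, M(0) = 1, a premium equal to the model price, and no rebalancing at maturity (the position is simply valued there). The function names and parameter values are illustrative.

```python
import math
import random
import statistics

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def d1(S, K, r, q, sigma, tau):
    return (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def call_price(S, K, r, q, sigma, tau):
    x = d1(S, K, r, q, sigma, tau)
    return (S * math.exp(-q * tau) * norm_cdf(x)
            - K * math.exp(-r * tau) * norm_cdf(x - sigma * math.sqrt(tau)))

def call_delta(S, K, r, q, sigma, tau):
    return math.exp(-q * tau) * norm_cdf(d1(S, K, r, q, sigma, tau))

def hedge_residual(S0, K, r, q, mu, sigma, T, m, rng):
    """Profit or loss from selling one call at its model price and delta hedging at m dates."""
    h = T / m
    S = S0
    theta1 = call_delta(S0, K, r, q, sigma, T)                  # shares of stock
    theta0 = call_price(S0, K, r, q, sigma, T) - theta1 * S0    # money market shares, M(0) = 1
    for i in range(1, m + 1):
        # Exact one-step simulation of the stock under P (drift mu, not r - q).
        S *= math.exp((mu - 0.5 * sigma ** 2) * h + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0))
        M = math.exp(r * i * h)
        shares = theta1 * math.exp(q * h)                       # dividends reinvested in stock
        # Rebalance to the new delta, except at maturity where we only value the position.
        target = call_delta(S, K, r, q, sigma, T - i * h) if i < m else shares
        theta0 -= (target - shares) * S / M
        theta1 = target
    return theta0 * M + theta1 * S - max(S - K, 0.0)

rng = random.Random(7)
for m in (5, 50):
    res = [hedge_residual(100.0, 100.0, 0.05, 0.03, 0.10, 0.2, 1.0, m, rng) for _ in range(2000)]
    print(m, statistics.mean(res), statistics.pstdev(res))
```

The spread of the residual should shrink as the number of rebalancing dates m grows, which gives a rough sense of how hedging frequency trades off against hedging error.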
8.2. Calibration
There is little orthodoxy on the subject of calibration, which is the process of using data
to choose the parameters of a model. There are two main kinds of calibration: using historical
data from the underlying price process, and using current prices of market-traded derivative
securities. An example of the former is statistical estimation of the historical volatility of
a stock price in order to choose $\sigma$ in the Black-Scholes model. An example of the latter is
finding a value of $\sigma$ that makes the results of the Black-Scholes formula come close to fitting
simultaneously the prices of European call and put options on this underlying, but with
various strikes and maturities. These approaches are sometimes combined: for instance, to
calibrate a Black-Scholes model with two stocks, one might use option prices on each of
them to choose the volatility magnitudes, and use historical stock price data to estimate
the correlation between the stocks. This would be enough information to construct a 2 × 2
volatility matrix.
At this point, we will focus on calibration to current market prices. The idea of this kind
of calibration is to use current market prices of derivative securities to choose the parameters of
a model so that the model prices match market prices well. Then the model can be used to
price nonmarketed derivative securities in a way that can be regarded as "consistent" with
market prices.
We write the model price of a derivative security as $h(\beta; \kappa)$ where $\beta$ are unknown parameters
that need to be calibrated, while $\kappa$ are the known terms of the contract specific to
a derivative security. The notation with a semicolon indicates that we regard $h$ as a function
of $\beta$, and merely acknowledge the dependence on $\kappa$: if you like, $h(\,\cdot\,; \kappa)$ is a different function
for each value of $\kappa$. We regard the market observables as fixed and do not include them in
the notation. For example, consider the Black-Scholes model. The market observables are $S$
and $r$: ignoring some troublesome facts (e.g. there is actually a bid-ask spread for the stock
price, and interest rates are not actually constant), we can simply observe the current stock
price and the current instantaneous interest rate. The contract terms are strike price $K$ and
maturity $T$: these vary from one option to another, but are always known. The unknown
parameter is the volatility $\sigma$; the drift $\mu$ is irrelevant for option pricing in this model.
In the Black-Scholes model, the option pricing formula $h$ is invertible, and for an option
whose contract terms are $\kappa$ and whose market price is $P$, $h^{-1}(P; \kappa)$ is called the implied
volatility. That is, if the Black-Scholes model were true, the volatility would have to be
this, in order for the option to have the price that it does in the market. Here the semicolon
notation shows that $\kappa$ is not involved in the inverse: we have $P = h(h^{-1}(P; \kappa); \kappa)$.
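There is no closed form for the inverse, so implied volatility is found numerically. The sketch below uses bisection, which works because the Black-Scholes call price is increasing in the volatility; it assumes no dividends and a constant rate, and the function names, bracketing interval, and tolerance are illustrative choices rather than anything prescribed by the text.

```python
import math

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price: the function h, with contract terms (K, tau)."""
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / sd
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d1 - sd)

def implied_vol(price, S, K, r, tau, lo=1e-4, hi=5.0, tol=1e-10):
    """Invert bs_call in sigma by bisection on the bracket [lo, hi]."""
    if not bs_call(S, K, r, lo, tau) <= price <= bs_call(S, K, r, hi, tau):
        raise ValueError("price is outside the range attainable on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, tau) < price:
            lo = mid          # model price too low: volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: price an option at a known volatility, then recover that volatility.
quote = bs_call(100.0, 110.0, 0.05, 0.25, 0.5)
print(implied_vol(quote, 100.0, 110.0, 0.05, 0.5))
```

The explicit range check matters in practice: a quoted price below intrinsic value or above the stock price has no implied volatility at all.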
Implied volatility can be very useful for discussing options, because market participants
have more intuition about the level of implied volatility for an underlying than for the price
of an option of specified strike and maturity. However, because the Black-Scholes model is
wrong, implied volatility and actual (statistical, historical) volatility do not have to have
a close relationship. In particular, one usually sees implied volatilities that are greater
than statistical volatilities, especially for options of short maturity, or which are deep out
of the money (meaning S(t) < K for a call). This is because the lognormal distribution
underestimates the probability of large, rapid changes in stock price relative to changes of
modest size. Given a statistically accurate volatility (if such a thing were possible), the
Black-Scholes formula would tend to give option prices that are too small: so the implied
volatility, which matches option prices, is larger.
In general, calibration is not as simple as using $h^{-1}$, or finding the values of parameters
implied by option prices. Inverse problems like this are often ill-posed, meaning that there
might be no solutions, or many solutions. If the model is not correct, then there might be
no value of the parameter vector $\beta$ that simultaneously matches many market prices. This
is quite clear with the Black-Scholes model, where we typically find an implied volatility
surface such that implied volatility is not the same for every option (as the model would
suggest) but rather curved, and typically highest for short-maturity, out-of-the-money options.
No one value of $\sigma$ will get every option price right. If the data is dirty or stale (i.e.
reflects trades that took place hours ago, not recently) then there can even be apparent
arbitrages in the data, in which case there will certainly be no value of $\beta$ that matches these
prices. But there could also be multiple values of $\beta$ that do just as well as each other at
matching the market prices.
For these reasons, people usually regard calibration as an optimization problem of minimizing
an objective function that expresses the error between model prices and market
prices. The idea is that minimizing this error makes the model "as close as possible" to
matching market prices. The value $\beta^*$ of the parameter vector $\beta$ that does this is said to be
a "best fit." One simple objective function would be the sum of squared errors:
$$O(\beta; P, \kappa) = \sum_{k=1}^N \left( P_k - h(\beta; \kappa_k) \right)^2$$
where $P_k$ is the market price and $\kappa_k$ the contract terms of the $k$th security among the $N$
data points used in calibration. This is simple, but has the drawback that it tends to result
in pricing errors of approximately the same size for each security, even though some of them
(e.g. options deep in the money) will have prices much greater than others (e.g. options
deep out of the money), by factors of perhaps 1000. We might be uncomfortable with this
disparity in the relative pricing errors $1 - h(\beta; \kappa_k)/P_k$, and one way to attack that problem
is to use instead the objective function
$$O(\beta; P, \kappa) = \sum_{k=1}^N \left( \ln P_k - \ln h(\beta; \kappa_k) \right)^2,$$
that is, first take the logarithm of prices: the difference of logs is the log of the ratio, so
we can see the minimization of this objective function as focusing on relative pricing errors.
There are other possible transformations, for example, the whole family of transformations
$g_\lambda(y) = (y^\lambda - 1)/\lambda$ for $\lambda \in (0, 1]$, with $g_0(y) = \ln y$ corresponding to the choice $\lambda = 0$. Aside
from influencing the nature of the best fit, using a transformation can also have a big impact
on the time required to execute whatever nonlinear optimization routine you might be using.
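To make the two objective functions concrete, the sketch below calibrates a single Black-Scholes volatility to three call quotes, once with squared price errors and once with squared log-price errors. Everything specific here is an assumption for illustration: the quotes are synthetic (generated from volatilities that differ by strike), dividends are ignored, and the one-dimensional minimizer is a plain golden-section search, which presumes the objective has a single minimum on the search interval.

```python
import math

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bs_call(S, K, r, sigma, tau):
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / sd
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d1 - sd)

def golden_section_min(fn, lo, hi, tol=1e-8):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = fn(c), fn(d)
    while hi - lo > tol:
        if fc < fd:                       # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = fn(c)
        else:                             # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = fn(d)
    return 0.5 * (lo + hi)

def sum_sq(sigma, quotes, S, r):
    """Sum of squared price errors; quotes are (strike, maturity, market price)."""
    return sum((P - bs_call(S, K, r, sigma, tau)) ** 2 for K, tau, P in quotes)

def sum_sq_log(sigma, quotes, S, r):
    """Sum of squared log-price errors, which emphasizes relative errors."""
    return sum((math.log(P) - math.log(bs_call(S, K, r, sigma, tau))) ** 2
               for K, tau, P in quotes)

# Synthetic quotes: each strike is priced with a different (hypothetical) volatility.
S, r = 100.0, 0.05
quotes = [(K, 0.5, bs_call(S, K, r, vol, 0.5))
          for K, vol in [(80.0, 0.26), (100.0, 0.20), (120.0, 0.23)]]

fit_abs = golden_section_min(lambda s: sum_sq(s, quotes, S, r), 0.05, 1.0)
fit_log = golden_section_min(lambda s: sum_sq_log(s, quotes, S, r), 0.05, 1.0)
print("squared errors:", fit_abs, " squared log errors:", fit_log)
```

The two fitted volatilities generally differ, because the log objective gives the cheap out-of-the-money option as much say as the expensive in-the-money one. With a vector of parameters, a general-purpose nonlinear optimizer would replace the golden-section search.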
It would be nice to modify the objective function to reflect the fact that some of the
market prices being used embody higher-quality information than others. For example,
options that are of a short, but not too short maturity, and are slightly out of the money,
are often more liquid than some others. The prices of more liquid securities are more recent
and depend less on the vagaries of supply and demand. The prices of illiquid securities drop
a lot when someone decides to sell, and rise a lot when someone decides to buy. So we might
like to give more weight in the objective to the more liquid options, reasoning that it seems
better to price them well while pricing worse the illiquid options, whose prices are not so
informative. This seems to be an advanced topic.
One might think that the way to deal with a model that fits the data poorly is to increase
the number of parameters. For instance, it might be a good idea to replace the Black-Scholes
model with the extended Black-Scholes model when calibrating to prices of options of several
maturities $T_1, \ldots, T_m$. Instead of just one parameter $\sigma$, we have $m$ parameters $b_1, \ldots, b_m$,
where $b_i$ is the standard deviation of the log stock price over $[T_{i-1}, T_i]$. Sometimes adding
parameters helps, but it has its dangers. Even if $h$ is a ridiculous function, having nothing
whatever to do with finance, we might be able to get a good fit of "model prices" $h(\beta; \kappa)$
to market prices $P$ if there are enough parameters, i.e. if $\beta$ is sufficiently high-dimensional.
This is called curve-fitting, and usually causes poor results. Remember, the objective is not
to fit the data well with some curve, but to come up with model prices for non-marketed
derivative securities that work, i.e. enable us to hedge so that we end up with long-term
profits that are adequate compensation for the risks that remain.
This is hard to assess without putting money on the line, but one way to assess from
the safety of one's computer whether the prices from the calibrated model are "good" is
to test them against out-of-sample market data. That is, calibrate the model using some
market prices, then compare the calibrated model prices for a new set of market-traded
options to their market prices. If they fit poorly out-of-sample, this model and calibration
scheme do not inspire much confidence. This often happens when a model is over-fitted. A
dilemma here is that one would like to use out-of-sample data to check whether the model
is doing well, yet one would like to calibrate the model to all of the available data so as not
to "waste" it! The scheme of calibration with penalization and cross-validation solves this
problem at the same time as another one: one often wants to use a model with many parameters
while avoiding over-fitting.
The idea behind penalization is that if the fitted parameter vector $\hat\theta$ is ugly, this is
a sign of over-fitting. In the extended Black-Scholes model, we expect that the graph of
$\theta = (b_1, \ldots, b_m)$ should be relatively smooth and pretty, not wiggling up and down all the
time, which would be hard to rationalize financially. It would be difficult to believe that such
a parameter vector bears much relation to the actual behavior of the market price process.
If you like, you can think of this in Bayesian terms: we have a prior predisposition to believe
that the parameter vector is pretty, but we also want to update our beliefs to reflect data
about current derivative security prices. So we can invent a penalty term to penalize values
of $\theta$ where, for instance, the components change too much. Examples are:
$$P(\theta) = \sum_{i=1}^{m} (b_i - b_{i-1})^2 \quad\text{or}\quad P(\theta) = \sum_{i=2}^{m} (b_i - 2b_{i-1} + b_{i-2})^2,$$
where the first penalizes changes between adjacent parameters, and the second penalizes
changes in the rate of change of parameters. Note that these penalties depend on the linear
temporal structure of $\theta = b$, the time series of log stock price standard deviations. If the
parameters had a different structure, we would need a different form for the penalty.
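As a small illustration of the two penalty terms, the sketch below computes the sum of squared first differences and the sum of squared second differences of a parameter vector. The function names are made up for this example, and the sums run only over differences that can be formed from the listed components (no boundary value $b_0$ is assumed).

```python
def first_difference_penalty(b):
    """Sum of squared changes between adjacent parameters."""
    return sum((b[i] - b[i - 1]) ** 2 for i in range(1, len(b)))


def second_difference_penalty(b):
    """Sum of squared changes in the rate of change of the parameters."""
    return sum((b[i] - 2 * b[i - 1] + b[i - 2]) ** 2 for i in range(2, len(b)))


# Three hypothetical volatility term structures.
examples = {
    "flat": [0.2, 0.2, 0.2, 0.2],
    "line": [0.1, 0.2, 0.3, 0.4],
    "wiggly": [0.1, 0.3, 0.1, 0.3],
}
for name, b in examples.items():
    print(name, first_difference_penalty(b), second_difference_penalty(b))
```

A constant vector is free under both penalties, a linear one is free only under the second, and a vector that wiggles up and down is charged by both.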
Now we will minimize the penalized objective function
$$O(\theta; P, \psi, \lambda) = E(P, h(\theta; \psi)) + \lambda P(\theta),$$
where $E(x, y)$ is the usual error term, such as $\sum_k (x_k - y_k)^2$ discussed above, and $\lambda$ is the
strength of penalization. Therefore we will no longer choose the error-minimizing $\theta$, which
might suffer from over-fitting, but a different, prettier $\theta$, which we hope will be more useful.
The problem with this scheme is that, a priori, we have no good way of choosing the
penalization strength $\lambda$. If it is too small, the penalty will count for little, and we will overfit.
If it is too big, we will get a pretty $\theta$ (nearly a constant or nearly a line, respectively,
for the two penalty terms discussed above) that provides a bad fit to the data. So now we
also need a way to choose $\lambda$, and this is what cross-validation does. The idea is that for each
data point $k$, we see how we would do in matching this market price after we calibrated to
all the other data points using penalization strength $\lambda$. When done for each $k$, this yields
an entire vector of model prices. Then we choose $\lambda$ to minimize the resulting error, known
as the cross-validation error.
For each $k$, let $P_{-k}$ and $\psi_{-k}$ be respectively the vector of market prices and of contract
terms, but with the $k$th component removed. Then find the minimizer $\tilde\theta_k(\lambda)$ of
$$O(\theta; P_{-k}, \psi_{-k}, \lambda) = E(P_{-k}, h(\theta; \psi_{-k})) + \lambda P(\theta).$$
Let $\tilde\theta(\lambda)$ be a matrix formed of these vectors $\tilde\theta_k(\lambda)$, and likewise $\psi$ be a matrix formed of
the vectors $\psi_k$. After doing this for each $k$, we can compute the cross-validation error
$$CV(\lambda; P, \psi) = E(P, h(\tilde\theta(\lambda); \psi)).$$
After finding the optimal value $\lambda^*$ that minimizes $CV(\lambda; P, \psi)$, we then optimize one last
time to get our fitted parameter vector $\hat\theta$, the minimizer of
$$O_{CV}(\theta; P, \psi) = E(P, h(\theta; \psi)) + \lambda^* P(\theta).$$
This approach helps a great deal to counteract the problem of over-fitting, but it is very
computationally expensive, because we have to do nested optimizations.
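The nested optimization can be sketched in a few lines when the model is linear in its parameters, so that each inner minimization has a closed form. The toy below is an assumption made for illustration, not the calibration used in these notes: the parameters are the variance rates $x_i = b_i^2$ of the extended Black-Scholes model, the "market prices" are total implied variances to each maturity (so the model value for maturity $T_k$ is $\sum_{i \le k} x_i (T_i - T_{i-1})$), the error is a sum of squares, and the penalty is the sum of squared first differences of $x$. All function names and data are hypothetical.

```python
import numpy as np


def fit_penalized(A, P, D, lam):
    """Minimize ||P - A x||^2 + lam * ||D x||^2 via the normal equations."""
    return np.linalg.solve(A.T @ A + lam * (D.T @ D), A.T @ P)


def cv_error(A, P, D, lam):
    """Leave-one-out cross-validation error for penalization strength lam > 0."""
    n = len(P)
    total = 0.0
    for k in range(n):
        keep = np.arange(n) != k                       # drop the k-th data point
        x_k = fit_penalized(A[keep], P[keep], D, lam)  # calibrate to the rest
        total += (P[k] - A[k] @ x_k) ** 2              # error on the held-out point
    return total


def calibrate_with_cv(A, P, D, lam_grid):
    """Pick lam minimizing the CV error, then refit once to all the data."""
    errors = [cv_error(A, P, D, lam) for lam in lam_grid]
    lam_star = lam_grid[int(np.argmin(errors))]
    return lam_star, fit_penalized(A, P, D, lam_star)


# Hypothetical data: six maturities, smooth "true" variance rates, noisy observations.
T = np.array([0.25, 0.5, 0.75, 1.0, 1.5, 2.0])
dT = np.diff(np.concatenate(([0.0], T)))
m = len(T)
A = np.tril(np.ones((m, m))) * dT   # A[k, i] = T_i - T_{i-1} for i <= k
D = np.diff(np.eye(m), axis=0)      # first-difference operator
rng = np.random.default_rng(0)
x_true = 0.04 + 0.01 * T
P = A @ x_true + 0.002 * rng.standard_normal(m)

lam_grid = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
lam_star, x_hat = calibrate_with_cv(A, P, D, lam_grid)
print("chosen penalization strength:", lam_star)
print("fitted volatilities b_i:", np.sqrt(np.maximum(x_hat, 0.0)))
```

With a nonlinear model such as option prices as a function of the $b_i$, each call to `fit_penalized` would become a numerical optimization, which is where the computational expense comes from.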
8.3. Problems
If using Excel, submit a printout of your spreadsheet, with labels showing the formulae
that you use to compute the cells. If using MATLAB, submit the output and source code of
your programs. In either case, make sure that it is easy for the grader to find your answers
and to see how you arrived at them. Groups may submit a single copy with the names of all
group members.
Problem 8.1. An Asian option is one based on an average price. Here define $\bar S_j =
\sum_{i=1}^{j} S(t_i)/j$, the discrete, arithmetic average of the prices at the dates $t_1, \ldots, t_j$. Consider
an Asian option paying $(\bar S_m - K)^+$ at time $T = t_m$. Take $T = 0.25$ and $m = 13$ for
weekly averaging dates, and let $K = 105$. Assume the ordinary Black-Scholes model with
$S(0) = 100$, $r = 0.05$, $\mu = 0.15$, and $\sigma = 0.2$. Construct a simulation estimate for the
no-arbitrage price of this Asian option in this model. Simulate n = 100 paths. On the basis
of information from this simulation, how many paths do you think you need in order to have
95% confidence of estimating the option's price within 1% of the true value? Now run a
simulation with this many paths, and report a 95% confidence interval for the option price.
Problem 8.2. Under "External Links" on the course page, there is a link to the Chicago
Board Options Exchange term sheet for options on the S&P500 index. Under "Assignments"
there is a webpage with data from the market close of November 21, 2002. The data contains
the closing level of the S&P500 index (933.79), and data for calls and puts of many strikes
and two maturities:
* Last Sale is the price at which the most recent trade in this option took place. Note
that the data does not say when that trade was.
* Net is the change between Last Sale and the price of the trade previous to that one.
* Bid and Ask are prices at which market makers most recently offered to buy or sell
(respectively) this option.
* Vol is how many of this option traded today.
* Open Int is "open interest," which equals the total number of options owned by
those people who are long this option. Because the net total number of options is
zero, this also equals the total number of options sold by those people who are short
this option.
Calibrate the extended Black-Scholes model with continuous dividend yield to this data.
Show what you did, explain why you did it, and report clearly the resulting values of all
parameters. Provide graphs illustrating the quality of the model's fit to the data: think
hard about how to present this information, and strive to make the graphs as informative as
possible. How useful do you think this model is for this market? What, if anything, could
be changed to make it more useful?
Do not ask the instructor or assistant questions outside of section or office hours, even
when it is not clear how to proceed. For example, the data does not contain an interest rate:
look it up elsewhere. Or again, the term sheet tells you when the options expire, but you
have to decide for yourself how to measure the amount of time until that day.
CHAPTER 9
Volatility Models
At this point, we have seen that we have problems calibrating even the extended Black-Scholes
model to European stock index options data. The flexibility to let volatility change
over time helps to fit prices of options with different maturities, but the fit is poor when
looking at options with the same maturity but different strikes. One approach to overcoming
this problem is to allow the volatility to have some sort of dependence on the stock price
itself. (Another is to leave the realm of Itô processes, for instance, by considering models
with jumps.)
There are two main types of models that do this: local volatility models,
$$dS(t) = S(t)\left(\mu(t)\,dt + \sigma(t, S(t))\,dW(t)\right) \tag{9.0.1}$$
and stochastic volatility models,
$$\begin{aligned} dS(t) &= S(t)\left(\mu(t)\,dt + V(t)\,dW(t)\right) \\ dV(t) &= a(t)\,dt + b(t)\,dW(t). \end{aligned} \tag{9.0.2}$$
Notice that neither of these is a model of implied volatility. The point of both of them is that
the stock's geometric volatility has to be more interesting than just a deterministic process,
but they go about this quite differently. A local volatility model implies a complete market,
because the stock is the only source of risk: the volatility is a function $\sigma$ of time and stock
price. In a stochastic volatility model, typically the stock price $S$ and its volatility $V$ are
correlated, but not perfectly dependent, and the result is an incomplete market: the risks
associated with changes in the level of volatility $V$, which is not itself a traded security, cannot
be hedged away entirely just by trading in the stock.
9.1. Local Volatility
A local volatility model is so called because the volatility of the stock depends only on
a space-time "location": the current time t, and the current value of the stock S(t). In this
model, $\sigma$ is a function of two variables that tells the geometric volatility given the time and
stock price. The drift $\mu$ is merely a function of time: a deterministic process. The rationale
for this seems to be that there is not very much point in allowing the drift too to depend on
the stock price, because we could not calibrate such a function, since it does not influence
option prices in this model.
On the other hand, the different volatilities that take effect at different stock prices do
affect various options in different ways. The local volatilities at very low stock price levels
will have little effect on a call option with a high strike price: if the stock price gets that
low, this option is likely to have zero payoff. But local volatilities at very low stock price
levels will have a significant effect on a put option with a low strike: high volatility at low
stock price levels increases the value of this option, because it increases the likelihood that
the stock will drop far below the strike, resulting in a big payoff.
Local volatility models appeared in 1994 in Risk magazine,$^1$ where articles by Dupire
(of Paribas) and Derman & Kani (of Goldman Sachs) showed how they could be used to
calibrate binomial tree models to implied volatility surfaces. With the general local volatility
model (9.0.1), one is tempted to have as many parameters as options. If there are maturities
T1, . . . , Tm and strikes K1, . . . , Kn, one may discretize space-time into m(n + 1) rectangles.
This can lead to overfitting, yet it is hard to justify deliberately coarsening this discretization.
So this is a perfect setting for penalized calibration, or for using some technique such as
spline-fitting to create a smooth surface with fewer parameters.
We do not cover binomial trees in this course. Instead, we consider simulation of a local
volatility model. As we saw before, it makes sense to use the Euler scheme to discretize the
log stock price $X(t) = \ln S(t)$, not the stock price. The discretization
$$X^{(j)}(t_{i+1}) = X^{(j)}(t_i) + \left(\mu(t_i) - \sigma^2(t_i, S^{(j)}(t_i))/2\right)h + \sigma(t_i, S^{(j)}(t_i))\sqrt{h}\,Z^{(j)}_{i+1}$$
is not exact in this case, because $\sigma(t, S(t))$ does change during a time step, but it is still
better. Another advantage to this scheme is that the simulated stock price cannot become
negative. For risk-neutral simulation, we would have in the above $\mu(u) = r(u)$, the interest
rate.
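A minimal sketch of this log-Euler scheme for a single path follows. The drift and local volatility are passed in as functions; the particular volatility function in the usage lines is invented for illustration and is not a calibrated surface. Function and argument names are this example's own.

```python
import math
import random


def simulate_local_vol_path(s0, mu, sigma, times, rng):
    """One path of S on the grid `times`, stepping X = ln S with the Euler scheme."""
    x = math.log(s0)
    path = [s0]
    for t_i, t_next in zip(times[:-1], times[1:]):
        h = t_next - t_i
        vol = sigma(t_i, math.exp(x))  # local volatility frozen at the start of the step
        z = rng.gauss(0.0, 1.0)        # standard normal draw for this step
        x += (mu(t_i) - 0.5 * vol ** 2) * h + vol * math.sqrt(h) * z
        path.append(math.exp(x))       # exponentiating keeps the stock price positive
    return path


# Usage with made-up inputs: risk-neutral drift r and a volatility that rises as S falls.
r = 0.05
times = [i / 50 for i in range(51)]                     # 50 steps on [0, 1]
example_sigma = lambda t, s: 0.2 * (100.0 / s) ** 0.5   # illustrative assumption
rng = random.Random(1)
path = simulate_local_vol_path(100.0, lambda t: r, example_sigma, times, rng)
print(path[-1])
```

Averaging a discounted payoff over many such paths gives the simulation estimate of a price; the discretization error shrinks as the step size $h$ is reduced.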
One specific type of local volatility model is the constant elasticity of variance (CEV)
model, which specifies
$$\sigma(t, S) = \sigma S^{\alpha - 1}. \tag{9.1.1}$$
Usually, we would also take the drift $\mu(t) = \mu$ a constant, and thus have the model $dS(t) =
\mu S(t)\,dt + \sigma S^{\alpha}(t)\,dW(t)$. That is, the local volatility does not depend on time, only the level
of the stock price. If $\alpha = 1$, we get the Black-Scholes model with constant geometric drift
$\mu$ and volatility $\sigma$: we can see that $\sigma$ governs the magnitude of volatility. But $\alpha$ governs the
rate at which geometric volatility changes with changes in the stock price. Typically, a CEV
model has $\alpha < 1$, so that the geometric volatility increases when the stock price decreases.
One attraction of CEV is that options can be priced in closed form, in terms of the
noncentral chi-square distribution. The derivation is excessively involved, but the result is
that, when the interest rate is $r$ and the continuous proportional dividend yield is $q$, the
no-arbitrage price at time $t$ of the European call option with maturity $T$ and strike $K$ is
$$S(t)e^{-q(T-t)}\,\bar F\!\left(y,\ 2 + \frac{1}{1-\alpha},\ \xi\right) - Ke^{-r(T-t)}\,\bar F\!\left(y,\ 2 - \frac{1}{1-\alpha},\ \xi\right) \tag{9.1.2}$$
where $F(y, n, \xi)$ is the noncentral chi-square cumulative distribution function with $n$ degrees
of freedom and noncentrality parameter $\xi$, $\bar F$ is the complement $1 - F$, and
$$y = \nu K^{2(1-\alpha)}, \qquad \xi = \nu S^{2(1-\alpha)}(t)\exp(2(r-q)(1-\alpha)(T-t)), \qquad \nu = \frac{2(r-q)}{\sigma^2(1-\alpha)\left(\exp(2(r-q)(1-\alpha)(T-t)) - 1\right)}.$$
This noncentral chi-square cdf $F$ is available in the statistics toolbox of MATLAB.

$^1$ Available in the business school library: read it.
9.2. Stochastic Volatility
"Stochastic volatility" usually means that, in the model, the actual (statistical, not implied)
volatility of the stock can change randomly, in a way that can not be completely
explained by the passage of time or the change in the stock price. We need at least K = 2
components of the Wiener process to make this work. The correlation between changes in
the stock price and changes in volatility is very important: this relationship is the crux of
local volatility models, and we do not want to give it up here. Again, usually we find that
this correlation is negative. This tends to give out-of-the-money puts (with low strikes) a
high implied volatility relative to out-of-the-money calls (with high strikes), a phenomenon
which is frequently strongly present in option prices. This leads some people to refer to
a volatility "smirk," having an asymmetric shape like a check mark, rather than "smile,"
having a symmetric shape like the letters U or V.
One particular case is the Heston model:
$$\begin{aligned} dS(t) &= S(t)\left(\mu(t)\,dt + V(t)\,dW_1(t)\right) \\ dV(t) &= -\beta V(t)\,dt + \delta\,dW(t). \end{aligned} \tag{9.2.1}$$
By Itô's formula, the SDE for the instantaneous variance, as opposed to volatility, is:
$$dV^2(t) = \left(\|\delta\|^2 - 2\beta V^2(t)\right)dt + 2V(t)\,\delta\,dW(t).$$
From this we can see that the instantaneous variance process is mean-reverting: when
$V^2(t)$ is below the "mean" level $\|\delta\|^2/(2\beta)$, it has positive drift, and negative drift when it is
above. Note that we need $\beta > 0$ for this mean level to be finite and positive: the stock
volatility $V$ needs to have a negative drift, otherwise the variance of stock returns will blow
up.
Remember, we do not care if volatility V becomes negative, because it is the absolute
value (equivalently, its square) that really counts. Everything works the same when V is
negative as when it is positive. If $V(t) < 0$ and $\beta > 0$, the drift of $V$ still tends to make it
smaller in absolute value. Likewise, although S will now vary inversely to W, the absolute
value of V also varies inversely to W, because increasing V means decreasing |V |.
The arithmetic volatility of the instantaneous variance $V^2$ is proportional to its square
root, namely the instantaneous stock volatility $V$. We will see when covering interest rates
that these two features of $V^2$, mean reversion and square root volatility, mean that it is
obeying the Cox-Ingersoll-Ross model. The point of this is that the instantaneous variance,
the rate at which stock returns' variance increases with time, should stay around some
moderate level over the long run (not grow without limit), it should change less when it is
small than when it is large, and it should never become negative. The square root volatility
prevents negativity: when $V^2$ is near zero, its volatility gets small, and its drift tends to pull
it back up. This is all similar to what we want an interest rate model to do.
In terms of option pricing, we face an interesting problem, typical of incomplete markets.
There is no obvious way to get a unique EMM: S is the only asset here, other than a money
market account, because V is not an asset. So under an EMM for the money market as
numeraire, S should have drift r, but what drift should V have? There is no way to decide.
We know how the drift of W1 changes under a Girsanov transformation from P to "Q,"
because dW1 appears in dS: but what about W2, whose change of drift we would also need
to know in order to find the drift of V ? If we want to price options via expected discounted
payoff under an EMM in the usual way, we will have to come up with a way of picking a
drift for V . One way to do this is via calibration: we pick a drift for V that causes option
prices under this model to agree with the actual market prices.
We can not observe volatility directly in the market, and the stock price S by itself is not
a Markov process: looking at the recent history of stock price changes will tell you something
about the level that volatility was at recently, and therefore about the distribution of future
stock price changes. One approach to calibration is to do just this, that is, to try to infer the
current value V (t) of volatility from recent stock price history. Then it would still remain to
calibrate from current market prices the remaining parameters, $\beta$ and the 2-vector $\delta$. The
norm $\|\delta\| = \sqrt{\delta_1^2 + \delta_2^2}$ gives the "vol of vol," and the significance of the two components
is that when $\delta_1$ is relatively larger, there is greater correlation between $V$ and $S$, which
depends only on $W_1$. Another approach to calibration would be to calibrate $V(t)$, $\beta$, and $\delta$
from market prices. As always, it is possible to attempt to calibrate all the parameters from
historical data, but this is not a good idea if the model is too far from the truth. One way
to check for a problem here is to see if you get very different answers when calibrating to
historical data based on different time scales: that is, if you look at volatilities of hourly or
weekly returns, or if you define "recent" history as the last 20 or 100 time periods.
In simulating this Heston model, it seems like a good idea to simulate $V$, which already
has a constant volatility:
$$V^{(j)}(t_{i+1}) = V^{(j)}(t_i) - \beta V^{(j)}(t_i)\,h + \delta_1\sqrt{h}\,Z^{(j,1)}_{i+1} + \delta_2\sqrt{h}\,Z^{(j,2)}_{i+1}$$
and $X = \ln S$:
$$X^{(j)}(t_{i+1}) = X^{(j)}(t_i) + \left(\mu(t_i) - (V^{(j)}(t_i))^2/2\right)h + V^{(j)}(t_i)\sqrt{h}\,Z^{(j,1)}_{i+1}.$$
More sophisticated schemes are possible: for instance, once we have simulated not only V (ti)
but also V (ti+1), we could use that information in simulating the change in the log stock
price X, based on the idea that its average volatility over [ti, ti+1] is probably between V (ti)
and V (ti+1), rather than right at V (ti).
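The two update rules can be combined into a short path simulator. This is a sketch under stated assumptions: the volatility is taken to follow $dV = -\beta V\,dt + \delta_1\,dW_1 + \delta_2\,dW_2$ with constant $\beta$ and $\delta$ (the names `beta`, `delta` and all numerical values are placeholders), the stock drift is a constant, and the plain Euler step uses the volatility at the start of each time step rather than any of the more sophisticated averaging schemes.

```python
import math
import random


def simulate_sv_path(s0, v0, mu, beta, delta, times, rng):
    """One path of (S, V): Euler steps for V and for X = ln S, sharing the first normal."""
    x, v = math.log(s0), v0
    s_path, v_path = [s0], [v0]
    for t_i, t_next in zip(times[:-1], times[1:]):
        h = t_next - t_i
        root_h = math.sqrt(h)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Log stock price: uses the volatility at the start of the step.
        x += (mu - 0.5 * v * v) * h + v * root_h * z1
        # Volatility: pull toward zero plus two-factor noise.
        v += -beta * v * h + delta[0] * root_h * z1 + delta[1] * root_h * z2
        s_path.append(math.exp(x))
        v_path.append(v)
    return s_path, v_path


# Usage with placeholder parameters; delta[0] < 0 ties rising volatility to falling prices.
times = [i / 100 for i in range(101)]
s_path, v_path = simulate_sv_path(100.0, 0.2, 0.05, 2.0, (-0.15, 0.10), times, random.Random(7))
print(s_path[-1], v_path[-1])
```

Because the same draw `z1` enters both updates, the sign and size of `delta[0]` relative to `delta[1]` control the correlation between changes in the stock price and changes in volatility.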
APPENDIX A
Final Review
Mathematical Tools:
* Itô's formula, especially:
  - exp/ln transformations; arithmetic vs. geometric representations
  - product rule
* (conditional) probabilities and moments for Itô processes
* Feynman-Kac formula
* Girsanov transformation
Concepts:
* portfolio strategies: tame, self-financing, arbitrage
* no-arbitrage pricing by replication
* hedging and greeks
* numeraires and equivalent martingale measures: risk-neutral pricing as special case
* simulation
* calibration
Models: Black-Scholes variants:
* ordinary
* multi-dimensional
* extended
* with continuous proportional dividend yield
* for futures
APPENDIX B
Solutions to Problems
Problem 1.1. Both strategies are self-financing because the portfolio allocation is not
changing. The long stock strategy is tame because S(t), as a geometric Brownian motion,
is bounded below by 0. Since it has a positive initial cost S(0), it is not an arbitrage. The
short stock strategy is not tame because S(t) is unbounded above. Since it has a negative
terminal payoff -S(T) < 0, it is not an arbitrage.
Problem 1.2. The payoff of the portfolio long one call struck at K1 and one struck at
$K_3$, and short two calls struck at $K_2$, is nonnegative. Therefore $0 \le C(K_1) + C(K_3) - 2C(K_2)$
by the no-arbitrage principle. Thus we get as an upper bound $C(K_2) \le \frac{1}{2}(C(K_1) + C(K_3)) =
12.5$. For the lower bound, consider the portfolio long one call struck at $K_1$ and short one
struck at $K_2$. This portfolio has a payoff bounded above by 10. But a payoff of 10 must
be worth 9.7 now, in order to avoid arbitrage with the bond. So $C(K_1) - C(K_2) \le 9.7$, i.e.
$C(K_2) \ge C(K_1) - 9.7 = 10.3$.
Problem 2.1. These can all be solved by differentiating. The probability of loss
(1) Decreases, because you make more in the money market account.
(2) Decreases, because you make more in the stock.
(3) Increases, because the risk of stock losses increases. This is obvious, but we can also
see the relationship from
$$\frac{\ln\left(1 + \frac{\theta_0(1 - e^{rT})}{\theta_1 S(0)}\right) - (\mu - \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}} = \frac{\ln\left(1 + \frac{\theta_0(1 - e^{rT})}{\theta_1 S(0)}\right) - \mu T}{\sigma\sqrt{T}} + \frac{\sigma\sqrt{T}}{2}$$
where the numerator of the first term is negative, because $1 - e^{rT} < 0$, so we are
taking the log of something less than 1.
(4) Sorry, this is a bit too complicated to get into, given $\mu > \sigma^2/2$.
(5) Decreases, because more value is in the money market, which will surely increase in
value. Again, note that $1 - e^{rT}$ is negative.
Problem 2.2. The increment $Y(s) - Y(0) = Y(0)(\exp(\mu s + \sigma W(s)) - 1)$ whereas
$Y(2s) - Y(s) = Y(s)(\exp(\mu s + \sigma(W(2s) - W(s))) - 1)$. The Wiener process increments
$W(s) = W(s) - W(0)$ and $W(2s) - W(s)$ are independent with distribution $N(0, s)$, so
$\exp(\mu s + \sigma W(s)) - 1$ and $\exp(\mu s + \sigma(W(2s) - W(s))) - 1$ are independent and have the
same distribution. However $Y(0)$ is constant while $Y(s)$ is a lognormal random variable, so
$Y(s) - Y(0)$ and $Y(2s) - Y(s)$ do not have the same distribution. They are dependent, since
$Y(2s) - Y(s) = (Y(0) + (Y(s) - Y(0)))X$ where $Y(0)$ is a constant and $X$ is independent of
$Y(s) - Y(0)$.
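Problem 2.3. The mean function is
$$\begin{aligned}
\mu_Z(t) = E[Z(t)] &= E[Z(0) + \sigma W(t) + (Z(T) - Z(0) - \sigma W(T))(t/T)] \\
&= Z(0) + \sigma E[W(t)] + (Z(T) - Z(0) - \sigma E[W(T)])(t/T) \\
&= Z(0) + 0 + (Z(T) - Z(0) - 0)(t/T) = Z(0)\frac{T - t}{T} + Z(T)\frac{t}{T}.
\end{aligned}$$
The covariance function is
$$\begin{aligned}
c_Z(t, s) &= \mathrm{Cov}[Z(t), Z(s)] \\
&= \mathrm{Cov}[Z(0) + \sigma W(t) + (Z(T) - Z(0) - \sigma W(T))(t/T),\ Z(0) + \sigma W(s) + (Z(T) - Z(0) - \sigma W(T))(s/T)] \\
&= \mathrm{Cov}[\sigma(W(t) - W(T)(t/T)),\ \sigma(W(s) - W(T)(s/T))] \\
&= \sigma^2\left(\mathrm{Cov}[W(t), W(s)] - \mathrm{Cov}[W(t), W(T)]\frac{s}{T} - \mathrm{Cov}[W(s), W(T)]\frac{t}{T} + \mathrm{Cov}[W(T), W(T)]\frac{st}{T^2}\right) \\
&= \sigma^2\left(\min\{s, t\} - \frac{st}{T} - \frac{st}{T} + \frac{st}{T}\right) = \sigma^2\left(\min\{s, t\} - \frac{st}{T}\right).
\end{aligned}$$
The variance $\mathrm{Var}[Z(t)] = \sigma^2(t - t^2/T)$ and this is at its maximum, $\sigma^2 T/4$, at $t = T/2$.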
Problem 2.4. Because $W(t/T)$ has the distribution $N(0, t/T)$, $\tilde W(t) = \sqrt{T}\,W(t/T)$
has the distribution $N(0, t)$. Also $\tilde W(0) = \sqrt{T} \cdot 0 = 0$. Its increment is $\tilde W(t) - \tilde W(s) =
\sqrt{T}(W(t/T) - W(s/T))$. If we let $v = t/T$ and $u = s/T$, we can see clearly that we are
talking about Wiener process increments of the form $W(v) - W(u)$, which have the properties
of stationarity and independence. Multiplying by a constant has no effect on stationarity
and independence. If a function $f(t)$ is continuous, then so is $af(bt)$. Thus continuity of
the sample path $W(\omega, \cdot)$, which is a function of time, implies continuity of $\tilde W(\omega, \cdot)$, because
$\tilde W(\omega, t) = \sqrt{T}\,W(\omega, t/T)$: we just are plugging in $a = \sqrt{T}$ and $b = 1/T$.
Problem 2.5. The approximate Wiener process is always a linear combination of standard
normal random variables, so the mean function is always zero. For each $m$, the covariance
function is
$$c_m(t, s) = c_{W^{(m)}}(t, s) = \mathrm{Cov}\left[W^{(m)}(t), W^{(m)}(s)\right] = \sum_{i=1}^{2^m} \tilde H_i(s)\tilde H_i(t).$$
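The relevant Schauder functions (up to $n = 2^2 = 4$) are
$$\tilde H_1(t) = t$$
$$\tilde H_2(t) = \begin{cases} t & \text{for } t \in [0, 1/2] \\ 1 - t & \text{for } t \in [1/2, 1] \end{cases}$$
$$\tilde H_3(t) = \begin{cases} \sqrt{2}\,t & \text{for } t \in [0, 1/4] \\ \sqrt{2}\,(1/2 - t) & \text{for } t \in [1/4, 1/2] \\ 0 & \text{for } t \in [1/2, 1] \end{cases}$$
$$\tilde H_4(t) = \begin{cases} 0 & \text{for } t \in [0, 1/2] \\ \sqrt{2}\,(t - 1/2) & \text{for } t \in [1/2, 3/4] \\ \sqrt{2}\,(1 - t) & \text{for } t \in [3/4, 1] \end{cases}$$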
The covariance function of a true Wiener process is $s$ since $s \le t$. For $m = 0$, $c_0(t, s) =
\tilde H_1(s)\tilde H_1(t) = st$. This is correct only on the line segments in $G^{(0)} = \{(t, s) \mid s = 0 \text{ or } t = 1\}$.
Next, $c_1(t, s) = c_0(t, s) + \tilde H_2(s)\tilde H_2(t)$, and the second term must be evaluated over the
three regions $s \le t \le 1/2$, $s \le 1/2 \le t$, and $1/2 \le s \le t$. (Remember we are doing the
calculation only for $s \le t$.) The result is
$$c_1(t, s) = \begin{cases} 2st & \text{for } s \le t \le 1/2 \\ s & \text{for } s \le 1/2 \le t \\ st + (1 - t)(1 - s) & \text{for } 1/2 \le s \le t \end{cases}$$
So $G^{(1)}$ is the region $\{(t, s) \mid s \le 1/2 \le t\} \cup G^{(0)}$.
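Finally, $c_2(t, s) = c_1(t, s) + \tilde H_3(s)\tilde H_3(t) + \tilde H_4(s)\tilde H_4(t)$ and there are ten regions on which
to evaluate $c_2$: the ten cells of the $4 \times 4$ square for which $s \le t$. However, $\tilde H_3(s)\tilde H_3(t)$ is
nonzero only when $s$ and $t$ are both in $[0, 1/2]$:
$$\tilde H_3(s)\tilde H_3(t) = \begin{cases} 2st & \text{for } s \le t \le 1/4 \\ 2s(1/2 - t) & \text{for } s \le 1/4 \le t \le 1/2 \\ 2(1/2 - t)(1/2 - s) & \text{for } 1/4 \le s \le t \le 1/2 \\ 0 & \text{otherwise} \end{cases}$$
Likewise
$$\tilde H_4(s)\tilde H_4(t) = \begin{cases} 2(s - 1/2)(t - 1/2) & \text{for } 1/2 \le s \le t \le 3/4 \\ 2(s - 1/2)(1 - t) & \text{for } 1/2 \le s \le 3/4 \le t \\ 2(1 - t)(1 - s) & \text{for } 3/4 \le s \le t \\ 0 & \text{otherwise} \end{cases}$$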
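In total,
$$c_2(t, s) = \begin{cases} 4st & \text{for } s \le t \le 1/4 \\ 2st + 2(1/2 - s)(1/2 - t) & \text{for } 1/4 \le s \le t \le 1/2 \\ st + (1 - s)(1 - t) + 2(s - 1/2)(t - 1/2) & \text{for } 1/2 \le s \le t \le 3/4 \\ st + 3(1 - s)(1 - t) & \text{for } 3/4 \le s \le t \\ s & \text{for } s \le k/4 \le t \text{ with } k \in \{1, 2, 3\} \end{cases}$$
So $G^{(2)}$ is the region $\{(t, s) \mid s \le k/4 \le t \text{ for some } k \in \{1, 2, 3\}\} \cup G^{(0)}$: the covariance is
correct whenever a grid point separates $s$ and $t$. As $m$ increases, the approximate
Wiener process gets better in the sense that the covariance function is correct for more pairs
of times $(t, s)$.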
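These covariance functions are easy to check numerically. The sketch below codes the four Schauder functions of this problem (with the factor $\sqrt{2}$ on $\tilde H_3$ and $\tilde H_4$ that makes their products match the $2st$-type terms), forms $c_m(t, s)$ as the sum of the first $2^m$ products, and compares it with $\min\{s, t\}$. The function names are this example's own.

```python
import math

SQRT2 = math.sqrt(2.0)


def h1(t):
    return t


def h2(t):
    return t if t <= 0.5 else 1.0 - t


def h3(t):
    if t <= 0.25:
        return SQRT2 * t
    if t <= 0.5:
        return SQRT2 * (0.5 - t)
    return 0.0


def h4(t):
    if t <= 0.5:
        return 0.0
    if t <= 0.75:
        return SQRT2 * (t - 0.5)
    return SQRT2 * (1.0 - t)


SCHAUDER = [h1, h2, h3, h4]


def cov_approx(m, t, s):
    """c_m(t, s): covariance of the level-m approximation, using the first 2**m functions."""
    return sum(h(s) * h(t) for h in SCHAUDER[: 2 ** m])


for m in (0, 1, 2):
    for s, t in [(0.1, 0.3), (0.3, 0.4), (0.6, 0.9)]:
        print(m, s, t, round(cov_approx(m, t, s), 4), "vs", min(s, t))
```

For $m = 2$, the pairs $(0.1, 0.3)$ and $(0.6, 0.9)$ are separated by a grid point and reproduce $\min\{s, t\}$, while the pair $(0.3, 0.4)$, which lies inside a single cell, falls slightly short.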
Problem 2.6. The expectation $E[Y(t)] = 0$, and the variance $\mathrm{Var}[Y(t)] = \int_0^t E[X^2(s)]\,ds$
by the Itô isometry. The increments are not independent in general: for instance, both
$Y(2) = \int_0^2 X(s)\,dW(s)$ and $Y(3) - Y(2) = \int_2^3 X(s)\,dW(s)$ may depend on $X(1)$. This is because
$X(t)$ for $t \in [1, 2]$ and for $t \in [2, 3]$ may be influenced by $X(1)$.
Problem 2.7
(1) dY (t) = X(t) dW(t)
(2) dY (t) = Y (t) 2
2
dt +  dW(t)
(3) dY (t) = Y (t)  + 22
2
dt +  dW(t)
(4) dY (t) = Y (t) ((2 + 2
) dt + 2 dW(t))
(5) dY (t) = Y (t) ((2
- ) dt -  dW(t))
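Problem 2.8.
(1) $\Phi\left(\dfrac{x - X(0) - \mu T}{\sigma\sqrt{T}}\right)$
(2) $\Phi\left(\dfrac{\ln(x/X(0)) - (\mu - \sigma^2/2)T}{\sigma\sqrt{T}}\right)$ because $d(\ln X(t)) = (\mu - \sigma^2/2)\,dt + \sigma\,dW(t)$.
(3) It is not clear how to find the distribution of $X(T)$ here.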
Problem 3.1. In the derivation of the Black-Scholes PDE, we found that the arithmetic
volatility of the call price $f(t, S(t))$ is $\sigma S(t) f_S(t, S(t))$ and the arithmetic drift is
$$rf(t, S(t)) + (\mu - r)S(t)f_S(t, S(t)).$$
Then we learned that the delta $f_S(t, S(t)) = \Phi(d_1)$. To get the geometric coefficients, simply
divide by the value of the call, which is
$$f(t, S(t)) = S(t)\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2).$$
The call option's geometric volatility is
$$\sigma_C = \frac{S(t)\Phi(d_1)}{S(t)\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2)}\,\sigma > \sigma,$$
the stock's geometric volatility. The call option's geometric drift is then
$$\mu_C = r + (\mu - r)\frac{\sigma_C}{\sigma} > r + (\mu - r) = \mu > r.$$
Problem 3.2. Put-call parity is $C(t) - P(t) = S(t) - Ke^{-r(T-t)}$ or
$$\begin{aligned}
P(t) &= C(t) - S(t) + Ke^{-r(T-t)} \\
&= S(t)(\Phi(d_1) - 1) + Ke^{-r(T-t)}(1 - \Phi(d_2)) \\
&= Ke^{-r(T-t)}\Phi(-d_2) - S(t)\Phi(-d_1)
\end{aligned}$$
using $\Phi(-z) = 1 - \Phi(z)$.
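The parity relation and the resulting put formula can be checked numerically. The sketch below assumes the usual definitions $d_1 = (\ln(S/K) + (r + \sigma^2/2)(T-t))/(\sigma\sqrt{T-t})$ and $d_2 = d_1 - \sigma\sqrt{T-t}$ from the Black-Scholes formula; the function names and input values are this example's own.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def black_scholes_call_put(s, k, r, sigma, tau):
    """Call and put prices with time to maturity tau = T - t."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    discounted_strike = k * math.exp(-r * tau)
    call = s * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    put = discounted_strike * norm_cdf(-d2) - s * norm_cdf(-d1)
    return call, put


call, put = black_scholes_call_put(s=100.0, k=105.0, r=0.05, sigma=0.2, tau=0.25)
parity_gap = (call - put) - (100.0 - 105.0 * math.exp(-0.05 * 0.25))
print(call, put, parity_gap)  # the gap should vanish up to rounding error
```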
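Problem 3.3. The result of the computations is
$$\Theta = -\frac{S\phi(d_1)\sigma}{2\sqrt{T - t}} + re^{-r(T-t)}K\Phi(-d_2), \qquad \Delta = -\Phi(-d_1), \qquad \Gamma = \frac{\phi(d_1)}{S\sigma\sqrt{T - t}}.$$
This does satisfy the Black-Scholes PDE $\Theta = -\sigma^2 S^2\Gamma/2 - r(S\Delta - P)$, where $P$ is now the
put value $P = Ke^{-r(T-t)}\Phi(-d_2) - S(t)\Phi(-d_1)$. As before, $-\sigma^2 S^2\Gamma/2 = -S\phi(d_1)\sigma/(2\sqrt{T - t})$.
As for the second term in the PDE,
$$-r(S\Delta - P) = -r\left(-S\Phi(-d_1) - Ke^{-r(T-t)}\Phi(-d_2) + S\Phi(-d_1)\right) = rKe^{-r(T-t)}\Phi(-d_2),$$
which is indeed the other term of $\Theta$.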
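Problem 3.4. The stock $S(t)$ has delta of 1, and zero gamma and theta. The money
market account $M(t) = e^{rt}$ has zero delta and gamma, and its theta is $rM(t) = re^{rt}$. Put-call
parity for prices was $C(t) - P(t) = S(t) - Ke^{-r(T-t)}$ or $P(t) = C(t) + Ke^{-rT}M(t) - S(t)$.
So we verify
$$\Delta^{(P)} = \Delta^{(C)} + Ke^{-rT}\Delta^{(M)} - \Delta^{(S)} = \Phi(d_1) + 0 - 1 = -\Phi(-d_1)$$
$$\Gamma^{(P)} = \Gamma^{(C)} + Ke^{-rT}\Gamma^{(M)} - \Gamma^{(S)} = \Gamma^{(C)}$$
$$\begin{aligned}
\Theta^{(P)} &= \Theta^{(C)} + Ke^{-rT}\Theta^{(M)} - \Theta^{(S)} \\
&= -\frac{S\phi(d_1)\sigma}{2\sqrt{T - t}} - re^{-r(T-t)}K\Phi(d_2) + Ke^{-rT}re^{rt} - 0 \\
&= -\frac{S\phi(d_1)\sigma}{2\sqrt{T - t}} + re^{-r(T-t)}K\Phi(-d_2).
\end{aligned}$$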
Problem 4.1. $P[W(T) > 0 \mid \mathcal{F}_t] = P[W(T) - W(t) > -W(t) \mid \mathcal{F}_t]$, and given $\mathcal{F}_t$, $W(t)$
is known and $W(T) - W(t)$ is independent with distribution $N(0, T - t)$. Therefore the
conditional probability is $\Phi(W(t)/\sqrt{T - t})$.
$$\begin{aligned}
P[X(T) > 0 \mid \mathcal{F}_t] &= P[X(0) + \mu T + \sigma W(T) > 0 \mid \mathcal{F}_t] = P[W(T) > -(X(0) + \mu T)/\sigma \mid \mathcal{F}_t] \\
&= \Phi\left(\frac{X(0) + \mu T + \sigma W(t)}{\sigma\sqrt{T - t}}\right) = \Phi\left(\frac{X(t) + \mu(T - t)}{\sigma\sqrt{T - t}}\right)
\end{aligned}$$
using the previous result. Or one could simply argue that the conditional distribution of
$X(T)$ given $\mathcal{F}_t$ is $N(X(t) + \mu(T - t), \sigma^2(T - t))$.
(1) As $T \downarrow t$, the probability goes to 1 if $X(t)$ is positive, to 0 if it is negative, and to
1/2 if it is 0. Note that the effect of the drift is negligible compared to the effect of
volatility in the limit.
(2) As $T \to \infty$, the probability goes to 1 if $\mu$ is positive, to 0 if it is negative, and to 1/2
if it is 0. Note that the effect of the volatility is negligible compared to the effect of
drift in the limit.
(3) As $\sigma \to 0$, the probability goes to 1 if $X(t) + \mu(T - t)$ is positive, to 0 if it is negative,
and to 1/2 if it is 0.
(4) As $\sigma \to \infty$, the probability goes to 1/2.
(5) As $\mu \to \infty$, the probability goes to 1.
(6) As $\mu \to -\infty$, the probability goes to 0.
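Problem 4.2. Using the tower property, $E[W(t)W(u) \mid \mathcal{F}_s] = E[E[W(t)W(u) \mid \mathcal{F}_t] \mid \mathcal{F}_s]$.
Pulling out what is known at time $t$, this equals $E[W(t)E[W(u) \mid \mathcal{F}_t] \mid \mathcal{F}_s] = E[W(t)W(t) \mid \mathcal{F}_s]$.
Because the second moment is mean squared plus variance, this equals $E[W(t) \mid \mathcal{F}_s]^2 +
\mathrm{Var}[W(t) \mid \mathcal{F}_s]$. But
$$\begin{aligned}
\mathrm{Var}[W(t) \mid \mathcal{F}_s] &= \mathrm{Var}[W(s) + (W(t) - W(s)) \mid \mathcal{F}_s] \\
&= \mathrm{Var}[W(s) \mid \mathcal{F}_s] + \mathrm{Var}[W(t) - W(s) \mid \mathcal{F}_s] = \mathrm{Var}[W(t) - W(s) \mid \mathcal{F}_s] = t - s
\end{aligned}$$
because $W(s)$ is known given $\mathcal{F}_s$. Since $E[W(t) \mid \mathcal{F}_s] = W(s)$, we get $W^2(s) + t - s$.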
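Problem 4.3. The mean vector and covariance matrix are
$$\begin{bmatrix} \ln S(0) + (\mu - \sigma^2/2)t \\ \ln S(0) + (\mu - \sigma^2/2)T \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} \sigma^2 t & \sigma^2 t \\ \sigma^2 t & \sigma^2 T \end{bmatrix}.$$
Thus the correlation is $\rho = t/\sqrt{tT} = \sqrt{t/T}$. We are looking for $P[S(T) \ge K \mid S(t)] =
P[X(T) \ge \ln K \mid X(t)]$, and using the bivariate normal, the conditional distribution of $X(T)$
given $X(t)$ is normal with mean
$$\begin{aligned}
\ln S(0) + (\mu - \sigma^2/2)T + 1 \cdot \left(X(t) - \ln S(0) - (\mu - \sigma^2/2)t\right) &= X(t) + (\mu - \sigma^2/2)(T - t) \\
&= \ln S(t) + (\mu - \sigma^2/2)(T - t)
\end{aligned}$$
and variance $\sigma^2 T(1 - t/T) = \sigma^2(T - t)$. Therefore the conditional probability is
$$\Phi\left(\frac{\ln(S(t)/K) + (\mu - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right).$$
This is equal to the conditional probability given all the information in $\mathcal{F}_t$ because $S$ is
Markov.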
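Problem 4.4. We have
$$M(t) = \exp\left(\int_0^t r(s)\,ds\right), \qquad S(t) = S(0)\exp\left(\int_0^t (\mu(s) - \sigma^2(s)/2)\,ds + \int_0^t \sigma(s)\,dW(s)\right).$$
The loss event $V(T) < V(0)$ is $S(T) < S(0) + (\theta_0/\theta_1)(1 - M(T))$. Let $K = S(0) + (\theta_0/\theta_1)(1 - M(T))$.
Then we want to find
$$\begin{aligned}
P[S(T) < K \mid \mathcal{F}_t] &= P\left[S(t)\exp\left(\int_t^T (\mu(s) - \sigma^2(s)/2)\,ds + \int_t^T \sigma(s)\,dW(s)\right) < K \,\Big|\, \mathcal{F}_t\right] \\
&= P\left[\int_t^T \sigma(s)\,dW(s) < \ln(K/S(t)) - \int_t^T (\mu(s) - \sigma^2(s)/2)\,ds \,\Big|\, \mathcal{F}_t\right] \\
&= \Phi\left(\frac{\ln(K/S(t)) - \int_t^T (\mu(s) - \sigma^2(s)/2)\,ds}{\sqrt{\int_t^T \sigma^2(s)\,ds}}\right).
\end{aligned}$$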
Problem 5.1. From the F-K formula,
$$f(t, x) = E^Q_{t,x}\left[\ln X(T) + \int_t^T X(u)\,du\right] = E^Q_{t,x}[\ln X(T)] + \int_t^T E^Q_{t,x}[X(u)]\,du$$
where
$$dX(t) = X(t)\,dW^Q(t) \quad\text{or}\quad d\ln X(t) = -\frac{1}{2}\,dt + dW^Q(t).$$
So
$$E^Q_{t,x}[\ln X(T)] = E^Q\left[\ln x - \frac{1}{2}(T - t) + (W^Q(T) - W^Q(t))\right] = \ln x - \frac{1}{2}(T - t)$$
and
$$E^Q_{t,x}[X(u)] = E^Q_{t,x}\left[x + \int_t^u X(v)\,dW^Q(v)\right] = x$$
because the stochastic integral has zero expectation: $X$ is itself a martingale under $Q$. If
you want to see this the usual way, with a stochastic exponential, compute
$$\begin{aligned}
E^Q_{t,x}[X(u)] &= E^Q\left[\exp\left(\ln x - \frac{1}{2}(u - t) + (W^Q(u) - W^Q(t))\right)\right] \\
&= \exp(\ln x)\,E^Q\left[\exp\left(-\frac{1}{2}\int_t^u 1\,ds + \int_t^u 1\,dW^Q(s)\right)\right] = x.
\end{aligned}$$
Thus we get
$$f(t, x) = \ln x - \frac{1}{2}(T - t) + \int_t^T x\,du = \ln x + \left(x - \frac{1}{2}\right)(T - t).$$
Problem 5.2. We have the Black-Scholes PDE
$$f_t(t, x) + rxf_x(t, x) + \frac{1}{2}\sigma^2 x^2 f_{xx}(t, x) - rf(t, x) = 0,$$
with terminal condition $f(T, x) = g(x) = (x - K)^+$. The Feynman-Kac formula evaluated
at $(t, S(t))$ yields
$$f(t, S(t)) = E^Q_{t,S(t)}\left[(S(T) - K)^+ e^{-r(T-t)}\right]$$
where
$$dS(t) = S(t)(r\,dt + \sigma\,dW^Q(t)) \quad\text{so}\quad \ln S(T) = \ln S(t) + (r - \sigma^2/2)(T - t) + \sigma(W^Q(T) - W^Q(t)).$$
As shown in the notes, the second term in the expectation is
$$E^Q_{t,S(t)}\left[Ke^{-r(T-t)}\mathbf{1}\{S(T) > K\}\right] = Ke^{-r(T-t)}\,\Phi\left(\frac{\ln(S(t)/K) + (r - \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right).$$
For the first term, use that the conditional distribution of $\ln S(T)$ given $S(t)$ under $Q$ is
normal with mean $\ln S(t) + (r - \sigma^2/2)(T - t)$ and variance $\sigma^2(T - t)$. By the quick formula
that says $E[e^Z \mathbf{1}\{Z > z\}] = \exp(m + s^2/2)\,\Phi((-z + m + s^2)/s)$ when $Z \sim N(m, s^2)$, we get
that the first term is
$$\begin{aligned}
&\exp\left(\ln S(t) + \left(r - \frac{\sigma^2}{2}\right)(T - t) + \frac{\sigma^2}{2}(T - t)\right)\Phi\left(\frac{\ln(S(t)/K) + (r - \sigma^2/2)(T - t) + \sigma^2(T - t)}{\sigma\sqrt{T - t}}\right) \\
&\qquad = S(t)e^{r(T-t)}\,\Phi\left(\frac{\ln(S(t)/K) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right).
\end{aligned}$$
The $e^{r(T-t)}$ in this expression cancels the discount factor in the Feynman-Kac formula, giving
us the Black-Scholes formula $S(t)\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2)$. The discount factor multiplies the
strike price (which, if it is paid, will be paid in the future) but not the current stock price.
Problem 5.3 We get from the F-K formula
f(t, F(t)) = EQ
t,F(t) (F(T) - K)+
e-r(T-t)
where
dF(t) = F(t) dWQ
(t) so ln F(T) = ln F(t) -
1
2
2
(T - t) + (WQ
(T) - WQ
(t)).
100 B. SOLUTIONS TO PROBLEMS
The difference is that $F$ has zero drift because there was no $f_x(t, x)$ term in the PDE. But notice that $r$ has not disappeared altogether: it is still the discount rate. When evaluating the second term in the expectation, we arrive at
\[
E^Q_{t,F(t)}\left[K e^{-r(T-t)} \mathbf{1}\{F(T) > K\}\right]
= K e^{-r(T-t)}\,\Phi\left(\frac{-\ln K + E_{t,F(t)}[\ln F(T)]}{\sqrt{\mathrm{Var}_{t,F(t)}[\ln F(T)]}}\right)
= K e^{-r(T-t)}\,\Phi\left(\frac{\ln(F(t)/K) - (\sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right).
\]
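Likewise for the first term,
\[
\begin{aligned}
E^Q_{t,F(t)}\left[F(T) e^{-r(T-t)} \mathbf{1}\{F(T) > K\}\right]
&= e^{-r(T-t)} E^Q_{t,F(t)}\left[F(t) \exp\left(-\tfrac{1}{2}\sigma^2(T - t) + \sigma\left(W^Q(T) - W^Q(t)\right)\right) \mathbf{1}\{F(T) > K\}\right] \\
&= F(t) e^{-r(T-t)}\, \tilde{P}_{t,F(t)}[F(T) > K] \\
&= F(t) e^{-r(T-t)}\,\Phi\left(\frac{\ln(F(t)/K) + (\sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right).
\end{aligned}
\]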
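This could also be done using the quick $Z \sim N(m, s^2)$ method. The final result is
\[
f(t, F(t)) = e^{-r(T-t)}\left[F(t)\,\Phi\left(\frac{\ln(F(t)/K) + (\sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right) - K\,\Phi\left(\frac{\ln(F(t)/K) - (\sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}\right)\right].
\]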
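The final formula translates directly into code. This is a minimal sketch with illustrative function and parameter names (`tau` stands for T - t); it is not taken from the notes.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_futures_call(f, k, r, sigma, tau):
    """exp(-r tau) [F Phi(d1) - K Phi(d2)] with d1,2 = (ln(F/K) +/- sigma^2 tau / 2) / (sigma sqrt(tau))."""
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(f / k) + 0.5 * sd**2) / sd
    d2 = d1 - sd
    return math.exp(-r * tau) * (f * norm_cdf(d1) - k * norm_cdf(d2))
```

One way to use it as a consistency check: setting `f = s * math.exp(r * tau)` should reproduce the Black-Scholes price of a call on a stock with current price `s` that pays no dividends.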
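Problem 6.1 Apply the vector Itô formula to $f(x) = x_1/x_2$, which has
\[
f_x(x) = \begin{bmatrix} \dfrac{1}{x_2} & -\dfrac{x_1}{x_2^2} \end{bmatrix}
\quad\text{and}\quad
f_{xx}(x) = \begin{bmatrix} 0 & -\dfrac{1}{x_2^2} \\[1ex] -\dfrac{1}{x_2^2} & \dfrac{2x_1}{x_2^3} \end{bmatrix}.
\]
So the Itô correction term is (leaving out the $(t)$'s)
\[
\frac{1}{2}\left[0 - 2\,\frac{1}{Y^2}(X\sigma_X)(Y\sigma_Y) + \frac{2X}{Y^3}(Y\sigma_Y)(Y\sigma_Y)\right] = \frac{X}{Y}(\sigma_Y - \sigma_X)\sigma_Y.
\]
The formula then yields
\[
d\left(\frac{X(t)}{Y(t)}\right) = \frac{X(t)}{Y(t)}\left[\left(\mu_X(t) - \mu_Y(t) + (\sigma_Y(t) - \sigma_X(t))\sigma_Y(t)\right)dt + \left(\sigma_X(t) - \sigma_Y(t)\right)dW(t)\right].
\]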
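The algebra can be verified numerically at a single point. The sketch below assumes both processes are driven by one Brownian motion, with dX = X(mu_X dt + sigma_X dW) and dY = Y(mu_Y dt + sigma_Y dW). It computes the drift and volatility of X/Y from the gradient and Hessian and compares them with the closed form; the function names and input values are illustrative.

```python
def quotient_dynamics_via_ito(x, y, mu_x, mu_y, sig_x, sig_y):
    """Drift and volatility of X/Y from the gradient and Hessian of f(x1, x2) = x1 / x2."""
    grad = (1.0 / y, -x / y**2)
    hess = ((0.0, -1.0 / y**2), (-1.0 / y**2, 2.0 * x / y**3))
    drift_vec = (x * mu_x, y * mu_y)    # arithmetic drifts of X and Y
    vol_vec = (x * sig_x, y * sig_y)    # arithmetic volatilities of X and Y
    first_order = sum(g * a for g, a in zip(grad, drift_vec))
    ito_correction = 0.5 * sum(
        hess[i][j] * vol_vec[i] * vol_vec[j] for i in range(2) for j in range(2)
    )
    vol = sum(g * v for g, v in zip(grad, vol_vec))
    return first_order + ito_correction, vol

def quotient_dynamics_closed_form(x, y, mu_x, mu_y, sig_x, sig_y):
    """Closed form: (X/Y)[(mu_X - mu_Y + (sig_Y - sig_X) sig_Y) dt + (sig_X - sig_Y) dW]."""
    ratio = x / y
    return ratio * (mu_x - mu_y + (sig_y - sig_x) * sig_y), ratio * (sig_x - sig_y)
```

For instance, with `x = 120`, `y = 80`, `mu_x = 0.1`, `mu_y = 0.04`, `sig_x = 0.3`, `sig_y = 0.2`, both functions should return a drift of 0.06 and a volatility of 0.15.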
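Problem 6.2 The answer is $\lambda = [0.02 \;\; 0.03 \;\; {-0.1}]$. Then we have
\[
\begin{bmatrix} 0.12 \\ 0.16 \\ 0.05 \end{bmatrix}
=
\begin{bmatrix} 0.05 \\ 0.05 \\ 0.05 \end{bmatrix}
+
\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 0 \\ 2 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 0.02 \\ 0.03 \\ -0.1 \end{bmatrix}.
\]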
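The answer to Problem 6.2 comes from solving a three-by-three linear system: the drift vector minus the riskless rate equals the volatility matrix times the unknown vector. A minimal sketch using plain Gaussian elimination follows; the helper `solve_linear` and the variable names are illustrative, not from the notes.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

drift = [0.12, 0.16, 0.05]                        # left-hand side of the equation
r = 0.05                                          # riskless rate
vol_matrix = [[2, 1, 0], [1, 3, 0], [2, 2, 1]]    # matrix from the equation
excess = [d - r for d in drift]
answer = solve_linear(vol_matrix, excess)         # should be close to [0.02, 0.03, -0.1]
```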
Problem 6.3 The arithmetic drift is
\[
\mu(t) = rS(t) + \sigma(t)\lambda(t) = \begin{bmatrix} 0.05\,S_1(t) + 0.02 \\ 0.05\,S_2(t) + 0.05 \end{bmatrix}.
\]
The replicating strategy includes $\theta_1(t) = -1$ and $\theta_2(t) = 1$ because then $\theta(t)\sigma(t) = -[1 \;\; 0] + [1 \;\; 1] = [0 \;\; 1]$. The arithmetic drift of $C$ under $P$ is
\[
rC(t) + \sigma_C(t)\lambda(t) = 0.05\,C(t) + 0.03.
\]
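Problem 6.4
(1) Using $\mu_1 = r + \sigma_1\lambda$,
\[
\begin{aligned}
\frac{S_1(T)\pi(T)}{\pi(t)}
&= S_1(t)\exp\left((\mu_1 - \sigma_1^2/2)(T - t) + \sigma_1(W(T) - W(t))\right) \\
&\quad\times \exp\left(-(r + \lambda^2/2)(T - t) - \lambda(W(T) - W(t))\right) \\
&= S_1(t)\exp\left((\mu_1 - r - \sigma_1^2/2 - \lambda^2/2)(T - t) + (\sigma_1 - \lambda)(W(T) - W(t))\right) \\
&= S_1(t)\exp\left(-\tfrac{1}{2}(\sigma_1 - \lambda)^2(T - t) + (\sigma_1 - \lambda)(W(T) - W(t))\right)
\end{aligned}
\]
and the conditional expectation of this is $S_1(t)$, because the other factor is a stochastic exponential independent of $\mathcal{F}_t$.
(2) What is the no-arbitrage price of a derivative security paying $\exp(\lambda W(T))$ at time $T$? Using the pricing kernel approach, it is
\[
\begin{aligned}
E\left[\exp(\lambda W(T))\,\pi(T)/\pi(t) \mid \mathcal{F}_t\right]
&= \exp(\lambda W(t))\,E\left[\exp(\lambda(W(T) - W(t)))\exp\left(-(r + \lambda^2/2)(T - t) - \lambda(W(T) - W(t))\right) \mid \mathcal{F}_t\right] \\
&= \exp(\lambda W(t))\exp\left(-(r + \lambda^2/2)(T - t)\right).
\end{aligned}
\]
A better question would be to price the payoff $\exp(-\lambda W(T))$.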
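Both parts of Problem 6.4 reduce to expectations over the Brownian increment W(T) - W(t), so they can be checked by integrating against the normal density. In this sketch the symbols `lam` (market price of risk), `sigma1`, and the pricing kernel ratio follow the solution above, while all numerical values, function names and the midpoint-rule quadrature are assumptions made for illustration.

```python
import math

def expect_std_normal(g, n=20000, width=12.0):
    """Midpoint-rule approximation of E[g(Z)] for Z ~ N(0, 1)."""
    h = 2.0 * width / n
    total = 0.0
    for i in range(n):
        z = -width + (i + 0.5) * h
        total += g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

# Illustrative parameter values (assumptions, not from the notes).
r, lam, sigma1, tau = 0.05, 0.4, 0.3, 2.0   # tau = T - t
s1_t, w_t = 50.0, 0.7                       # S_1(t) and W(t)
mu1 = r + sigma1 * lam                      # market price of risk equation

def kernel_ratio(dw):
    """Pricing kernel ratio as a function of dw = W(T) - W(t)."""
    return math.exp(-(r + 0.5 * lam**2) * tau - lam * dw)

def stock_T(dw):
    """S_1(T) as a function of dw = W(T) - W(t)."""
    return s1_t * math.exp((mu1 - 0.5 * sigma1**2) * tau + sigma1 * dw)

root_tau = math.sqrt(tau)
# Part (1): should recover S_1(t).
part1 = expect_std_normal(lambda z: stock_T(root_tau * z) * kernel_ratio(root_tau * z))
# Part (2): price of the payoff exp(lam * W(T)).
part2 = expect_std_normal(
    lambda z: math.exp(lam * (w_t + root_tau * z)) * kernel_ratio(root_tau * z)
)
part2_closed = math.exp(lam * w_t) * math.exp(-(r + 0.5 * lam**2) * tau)
```

Here `part1` should be close to `s1_t`, and `part2` should match `part2_closed`.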
Problem 7.1 The key point is that $F(t, U) = S(t)Q(t, U)/B(t, U)$, so the payoff is
\[
(F(T, U) - K)^+ = \left(\frac{S(T)Q(T, U)}{B(T, U)} - K\right)^+ = \frac{Q(T, U)}{B(T, U)}\left(S(T) - \frac{B(T, U)}{Q(T, U)}K\right)^+.
\]
(Because we have assumed deterministic interest rates, the futures and forward prices are the same.) Therefore this futures option is equivalent to $Q(T, U)/B(T, U)$ shares of a stock option with strike $KB(T, U)/Q(T, U)$ and maturity $T$. The answer is
\[
B(t, T)\left(F(t, U)\Phi(d_1) - K\Phi(d_2)\right)
\]
where
\[
d_{1,2} = \frac{\ln(F(t, U)/K) \pm \tfrac{1}{2}V(t, T)}{\sqrt{V(t, T)}}.
\]
This is because into equation (7.2.4) we can substitute $KB(T, U)/Q(T, U)$ in place of $K$ and $S(t) = F(t, T)B(t, T)/Q(t, T)$, and multiply the whole thing by $Q(T, U)/B(T, U)$. Then, multiplying $\Phi(d_1)$ we have the factor $S(t)Q(t, T)Q(T, U)/B(T, U) = S(t)Q(t, U)/B(T, U) = B(t, T)F(t, U)$, and inside the arguments $d_1$ and $d_2$, we have instead of $\ln((S(t)Q(t, T))/(KB(t, T)))$ now $\ln((S(t)Q(t, T)Q(T, U))/(KB(t, T)B(T, U))) = \ln(F(t, U)/K)$.
Problem 7.2 If $U = T$, the futures option has payoff $(F(T, T) - K)^+ = (S(T) - K)^+$, so it is essentially the same as the stock option. The no-arbitrage formulae do give the same price, because $B(t, T)F(t, T) = S(t)Q(t, T)$, and $\ln(F(t, T)/K) = \ln(S(t)/K) + \int_t^T (r(u) - q(u))\,du$. It would certainly be easier to hedge in futures than trade the 30 or 500 stocks making up an index.
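The equivalences in Problems 7.1 and 7.2 can be confirmed numerically. The sketch below makes several assumptions that are not stated in this section: constant r, q and sigma, so that B(a, b) = exp(-r(b - a)), Q(a, b) = exp(-q(b - a)) and V(t, T) = sigma^2 (T - t); and a stock option formula of the form S Q Phi(d1) - K B Phi(d2), inferred from the logarithm quoted from equation (7.2.4). All names and numbers are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def stock_call(s, k, b_tT, q_tT, v):
    """Assumed form of the stock option price: S Q Phi(d1) - K B Phi(d2)."""
    d1 = (math.log(s * q_tT / (k * b_tT)) + 0.5 * v) / math.sqrt(v)
    d2 = d1 - math.sqrt(v)
    return s * q_tT * norm_cdf(d1) - k * b_tT * norm_cdf(d2)

def futures_call(f_tU, k, b_tT, v):
    """Futures option price from Problem 7.1: B(t,T) (F(t,U) Phi(d1) - K Phi(d2))."""
    d1 = (math.log(f_tU / k) + 0.5 * v) / math.sqrt(v)
    d2 = d1 - math.sqrt(v)
    return b_tT * (f_tU * norm_cdf(d1) - k * norm_cdf(d2))

# Illustrative constant parameters (assumptions).
r, q, sigma = 0.05, 0.02, 0.25
t, T, U = 0.0, 1.0, 1.5
s, k = 100.0, 105.0

def B(a, b):
    return math.exp(-r * (b - a))   # discount factor, constant r assumed

def Q(a, b):
    return math.exp(-q * (b - a))   # dividend factor, constant q assumed

v = sigma**2 * (T - t)
ratio = Q(T, U) / B(T, U)

# Problem 7.1: futures option versus scaled stock option with adjusted strike.
via_futures = futures_call(s * Q(t, U) / B(t, U), k, B(t, T), v)
via_stock = ratio * stock_call(s, k / ratio, B(t, T), Q(t, T), v)

# Problem 7.2: with U = T the futures option and the stock option coincide.
same_maturity_futures = futures_call(s * Q(t, T) / B(t, T), k, B(t, T), v)
same_maturity_stock = stock_call(s, k, B(t, T), Q(t, T), v)
```

Under these assumptions `via_futures` and `via_stock` should agree up to rounding, as should `same_maturity_futures` and `same_maturity_stock`.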
Bibliography
[BD77] Peter J. Bickel and Kjell A. Doksum, Mathematical statistics: Basic ideas and selected topics,
Prentice-Hall, Englewood Cliffs, New Jersey, 1977.
[Bjö98] Tomas Björk, Arbitrage theory in continuous time, Oxford University Press, New York, 1998.
[KS91] Ioannis Karatzas and Steven E. Shreve, Brownian motion and stochastic calculus, 2nd ed., Graduate
Texts in Mathematics, no. 113, Springer-Verlag, New York, 1991.
[Mik98] Thomas Mikosch, Elementary stochastic calculus with finance in view, Advanced Series on Statistical
Science and Applied Probability, no. 6, World Scientific, Singapore, 1998.