CZ.1.07/2.2.00/28.0041 Centrum interaktivních a multimediálních studijních opor pro inovaci výuky a efektivní učení

Prologue: You should spend most of your time thinking about what you should think about most of your time.

RANDOMIZED ALGORITHMS AND PROTOCOLS - 2020
Prof. Jozef Gruska, DrSc.
Wednesday, 10.00-11.40, B410

WEB PAGE of the LECTURE: http://www.fi.muni.cz/usr/gruska/random20

FINAL EXAM: You need to answer four questions out of five given to you.
CREDIT (ZÁPOČET): You need to answer three questions out of five given to you.

EXERCISES/TUTORIALS: Thursdays 14.00-15.40, C525
TEACHER: RNDr. Matej Pivoluška, PhD
Language: English
NOTE: Exercises/tutorials are not obligatory.

CONTENTS - preliminary
1 Basic concepts and examples of randomized algorithms
2 Types and basic design methods for randomized algorithms
3 Basics of probability theory
4 Simple methods for design of randomized algorithms
5 Game theory and analysis of randomized algorithms
6 Basic techniques I: moments and deviations
7 Basic techniques II: tail probability inequalities
8 Probabilistic method I
9 Markov chains - random walks
10 Algebraic techniques - fingerprinting
11 Fooling the adversary - examples
12 Randomized cryptographic protocols
13 Randomized proofs
14 Probabilistic method II
15 Quantum algorithms

LITERATURE
R. Motwani, P. Raghavan: Randomized Algorithms, Cambridge University Press, UK, 1995
J. Gruska: Foundations of Computing, International Thomson Computer Press, USA, 715 pages, 1997
J. Hromkovič: Design and Analysis of Randomized Algorithms, Springer, 275 pages, 2005
N. Alon, J. H. Spencer: The Probabilistic Method, Wiley-Interscience, 2008

Part I. Basic Techniques II: Concentration Bounds

TWO VERY IMPORTANT INEQUALITIES
For any random variable X, any real λ > 0 and any integer k ≥ 1 it holds:
Pr[|X| > λ] ≤ E(|X|^k) / λ^k.
Case 1: k = 1 and λ replaced by λE(|X|) gives
Pr[|X| ≥ λE(|X|)] ≤ 1/λ    (Markov's inequality).
Case 2: k = 2, X replaced by X − E(X) and λ replaced by λ√V(X) gives
Pr[|X − E(X)| ≥ λ√V(X)] ≤ E((X − E(X))^2) / (λ^2 V(X)) = V(X) / (λ^2 V(X)) = 1/λ^2    (Chebyshev's inequality).
Another variant of Chebyshev's inequality is
Pr[|X − E(X)| ≥ λ] ≤ V(X)/λ^2,
and this is one of the main reasons why variance is used.

Chapter 7. BASIC TECHNIQUES II: CONCENTRATION BOUNDS
Many so-called probability concentration bounds have already been developed and broadly applied. In this chapter we derive and apply some of them - so-called tail probability bounds - bounds on the probability that the value of some random variable differs much - by some margin - from its mean.
At first we determine bounds on the probability that the random variable X = Σ_{i=1}^n X_i differs by a fixed margin from its mean, where all X_i are binary random variables with Bernoulli distribution. That is, X_i can be seen as a coin toss with Pr[X_i = 1] = p_i and Pr[X_i = 0] = 1 − p_i. (Observe that in the special case p_1 = p_2 = ... = p_n = p we get a random variable X with the binomial distribution.)
At the end of the chapter we will deal with special sequences of dependent random variables, so-called martingales, and also with tail bounds for martingales. These will then be applied also to the occupancy problem.
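As a quick illustration (added here, not part of the original slides), the following Python sketch empirically checks the Markov and Chebyshev inequalities stated above for a simple non-negative random variable; the choice of a fair die and of the thresholds is arbitrary.

```python
import random

# Empirical check of Markov's and Chebyshev's inequalities for a simple
# non-negative random variable: X = number shown by a fair six-sided die.
random.seed(1)
samples = [random.randint(1, 6) for _ in range(200_000)]

mean = sum(samples) / len(samples)                           # E[X] = 3.5
var = sum((x - mean) ** 2 for x in samples) / len(samples)   # V(X) = 35/12

for lam in (1.5, 2.0):
    # Markov: Pr[X >= lam * E[X]] <= 1 / lam
    emp = sum(x >= lam * mean for x in samples) / len(samples)
    print(f"Markov    lam={lam}: empirical {emp:.3f} <= bound {1 / lam:.3f}")

for lam in (1.0, 2.0):
    # Chebyshev: Pr[|X - E[X]| >= lam * sqrt(V(X))] <= 1 / lam^2
    emp = sum(abs(x - mean) >= lam * var ** 0.5 for x in samples) / len(samples)
    print(f"Chebyshev lam={lam}: empirical {emp:.3f} <= bound {1 / lam ** 2:.3f}")
```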
BASIC PROBLEM AND METHODS - II.
The above approach is often used to show that X lies close to E[X] with reasonably high probability. Of particular importance is the case when X is a sum of random variables. For the case that these random variables are independent we derive the so-called Chernoff bound. For the case that the random variables of the sum are dependent, but form a so-called martingale, we get the so-called Azuma-Hoeffding bound.

BASIC PROBLEM OF THE ANALYSIS OF RANDOMIZED ALGORITHMS
What is the probability that X = Σ_{i=1}^n X_i deviates from its mean E[X] = µ = Σ_{i=1}^n p_i by a fixed factor? Namely, let δ > 0.
(1) What is the probability that X is larger than (1 + δ)µ?
(2) What is the probability that X is smaller than (1 − δ)µ?
Notation: For a random variable X and a fixed t > 0, E[e^{tX}] is called the moment generating function of X. It holds
E[e^{tX}] = E[ Σ_{k≥0} t^k X^k / k! ] = Σ_{k≥0} t^k E[X^k] / k!.
Very important Chernoff bounds on the sum of independent Poisson trials are obtained when the moment generating functions of X are considered.

CHERNOFF BOUNDS - I
Theorem: Let X_1, X_2, ..., X_n be independent Poisson trials such that, for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i, where 0 < p_i < 1. Then for X = Σ_{i=1}^n X_i, µ = E[X] = Σ_{i=1}^n p_i, and any δ > 0,
Pr[X > (1 + δ)µ] < [ e^δ / (1 + δ)^{(1+δ)} ]^µ.    (1)
Proof: For any real t > 0,
Pr[X > (1 + δ)µ] = Pr[e^{tX} > e^{t(1+δ)µ}].
By applying Markov's inequality to the right-hand side we get
Pr[X > (1 + δ)µ] < E[e^{tX}] / e^{t(1+δ)µ}
(the inequality is strict). Observe that
E[e^{tX}] = E[e^{t Σ_{i=1}^n X_i}] = E[ Π_{i=1}^n e^{tX_i} ] = Π_{i=1}^n E[e^{tX_i}],
hence
Pr[X > (1 + δ)µ] < Π_{i=1}^n E[e^{tX_i}] / e^{t(1+δ)µ}.

CHERNOFF BOUNDS - II.
Since E[e^{tX_i}] = p_i e^t + (1 − p_i), we have
Pr[X > (1 + δ)µ] < Π_{i=1}^n (p_i e^t + 1 − p_i) / e^{t(1+δ)µ} = Π_{i=1}^n (1 + p_i (e^t − 1)) / e^{t(1+δ)µ}.
By using the inequality 1 + x < e^x with x = p_i (e^t − 1),
Pr[X > (1 + δ)µ] < Π_{i=1}^n e^{p_i (e^t − 1)} / e^{t(1+δ)µ} = e^{(e^t − 1)µ} / e^{t(1+δ)µ}.
Taking t = ln(1 + δ) we get our theorem (and the basic Chernoff bound), that is,
Pr[X > (1 + δ)µ] < [ e^δ / (1 + δ)^{(1+δ)} ]^µ.    (2)
Observe the three tricks that have been used in the above proof!

COROLLARIES
From the above Chernoff bound the following corollaries can be derived.
Corollary: Let X_1, X_2, ..., X_n be independent Poisson trials such that, for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i, where 0 < p_i < 1. Then for X = Σ_{i=1}^n X_i and µ = E[X] = Σ_{i=1}^n p_i, it holds:
1 For 0 < δ < 1.81: Pr(X > (1 + δ)µ) ≤ e^{−µδ^2/3}.
2 For 0 ≤ δ ≤ 4.11: Pr[X ≥ (1 + δ)µ] ≤ e^{−µδ^2/4}.
3 For R ≥ 6µ: Pr(X ≥ R) ≤ 2^{−R}.    (3)

EXAMPLE I - SOCCER GAME OUTCOMES
Notation: F+(µ, δ) = [ e^δ / (1 + δ)^{(1+δ)} ]^µ, the right-hand side of inequality (1) above.
Example: A soccer team STARS wins each game with probability 1/3. Assuming that the outcomes of different games are independent, we derive an upper bound on the probability that STARS win more than half of n games.
Let X_i = 1 if STARS win the i-th game and X_i = 0 otherwise, and let Y_n = Σ_{i=1}^n X_i.
By applying the last theorem (Pr(X > (1 + δ)µ) < F+(µ, δ)) with µ = n/3 and δ = 1/2, we get
Pr[Y_n > n/2] < F+(n/3, 1/2) < (0.915)^n,
which is exponentially small in n.
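The bound F+ is easy to evaluate numerically. The following Python sketch (an added illustration; n and the number of trials are arbitrary choices) computes F+(µ, δ) for the soccer example and compares it with an empirical estimate of Pr[Y_n > n/2].

```python
import math
import random

def F_plus(mu, delta):
    """Upper-tail Chernoff bound  [e^delta / (1+delta)^(1+delta)]^mu."""
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

# Soccer example: each game is won independently with probability 1/3.
n, p = 60, 1 / 3
mu, delta = n * p, 1 / 2            # (1 + delta) * mu = n / 2

random.seed(2)
trials = 100_000
exceed = sum(
    sum(random.random() < p for _ in range(n)) > n / 2
    for _ in range(trials)
)
print(f"empirical Pr[Y_n > n/2]  ~ {exceed / trials:.5f}")
print(f"Chernoff bound F+(mu, d) = {F_plus(mu, delta):.5f}")
```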
SECOND TYPE OF CHERNOFF BOUNDS
The previous theorem puts an upper bound on deviations of X = Σ X_i above its expectation µ, i.e. on Pr[X > (1 + δ)µ]. The next theorem puts an upper bound on deviations of X = Σ X_i below its expectation µ, i.e. on Pr[X < (1 − δ)µ].

Theorem: Let X_1, X_2, ..., X_n be independent Poisson trials such that, for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i, where 0 < p_i < 1. Then for X = Σ_{i=1}^n X_i, µ = E[X] = Σ_{i=1}^n p_i, and 0 < δ ≤ 1,
Pr[X < (1 − δ)µ] < e^{−µδ^2/2}.
Proof: For any positive real t,
Pr[X < (1 − δ)µ] = Pr[−X > −(1 − δ)µ] = Pr[e^{−tX} > e^{−t(1−δ)µ}].
By applying Markov's inequality,
Pr[X < (1 − δ)µ] < E[e^{−tX}] / e^{−t(1−δ)µ} = Π_{i=1}^n E[e^{−tX_i}] / e^{−t(1−δ)µ}
= Π_{i=1}^n (p_i e^{−t} + 1 − p_i) / e^{−t(1−δ)µ} = Π_{i=1}^n (1 + p_i (e^{−t} − 1)) / e^{−t(1−δ)µ}.
By applying the inequality 1 + x < e^x we get
Pr[X < (1 − δ)µ] < e^{Σ_{i=1}^n p_i (e^{−t} − 1)} / e^{−t(1−δ)µ} = e^{(e^{−t} − 1)µ} / e^{−t(1−δ)µ},
and if we take t = ln(1/(1 − δ)), then
Pr[X < (1 − δ)µ] < [ e^{−δ} / (1 − δ)^{(1−δ)} ]^µ.    (4)
Since (1 − δ)^{(1−δ)} ≥ e^{−δ + δ^2/2} for 0 < δ ≤ 1, we then have
Pr[X < (1 − δ)µ] < e^{−µδ^2/2}.
Combining the upper-tail corollary 1 with the lower-tail bound just derived (and noting that e^{−µδ^2/2} ≤ e^{−µδ^2/3}), it follows:
Corollary: For 0 < δ < 1,
Pr(|X − µ| ≥ δµ) ≤ 2e^{−µδ^2/3}.    (5)

EXAMPLE - COIN TOSSING
Let X be the number of heads in a sequence of n independent fair coin flips. An application of bound (5) with µ = n/2 and δ = √(6 ln n / n) gives
Pr[ |X − n/2| ≥ (1/2)√(6n ln n) ] ≤ 2e^{−(1/3)(n/2)(6 ln n / n)} = 2/n.
This implies that the concentration of the number of heads around the mean n/2 is very tight. Indeed, the deviations from the mean are of order O(√(n ln n)).

CHEBYSHEV versus CHERNOFF
Let X again be the number of heads in a sequence of n independent fair coin flips. Let us consider the probability of having either more than 3n/4 or fewer than n/4 heads.
Chebyshev's inequality gives us
Pr[ |X − n/2| ≥ n/4 ] ≤ 4/n.
On the other side, using the Chernoff bound we have
Pr[ |X − n/2| ≥ n/4 ] ≤ 2e^{−(1/3)(n/2)(1/2)^2} = 2e^{−n/24}.
Chernoff's method therefore gives an exponentially smaller upper bound than the one obtained using Chebyshev's inequality.

SOCCER GAMES REVISITED
Notation (for the lower-tail bound function): F−(µ, δ) = e^{−µδ^2/2}.
Example: Assume that the probability that the STAR team wins a game is 3/4. What is the probability that in n games STAR loses more than n/2 games?
In such a case µ = 0.75n, δ = 1/3, and for Y_n = Σ_{i=1}^n X_i we have
Pr[Y_n < n/2] < F−(0.75n, 1/3) < (0.9592)^n,
and therefore the probability decreases exponentially fast in n.

TWO-SIDED BOUNDS
By inequality (5), for δ < 1,
Pr[|X − µ| ≥ δµ] ≤ 2e^{−µδ^2/3},
and if we want this bound to be at most ε, we get
Pr[ |X − µ| ≥ √(3µ ln(2/ε)) ] ≤ ε,
provided ε ≥ 2e^{−µ/3} (so that the corresponding δ satisfies δ ≤ 1).
Proof: If ε = 2e^{−µδ^2/3}, then ln(2/ε) = µδ^2/3, and hence
√(3µ ln(2/ε)) = √(3µ · µδ^2/3) = √(µ^2 δ^2) = µδ.

NEW QUESTION
New question: Given ε, how large does δ have to be so that Pr[X > (1 + δ)µ] < ε?
In order to deal with such and related questions, the following definitions/notations are introduced.
Definition: ∆+(µ, ε) is the number such that F+(µ, ∆+(µ, ε)) = ε; ∆−(µ, ε) is the number such that F−(µ, ∆−(µ, ε)) = ε.
In other words, a deviation of δ = ∆+(µ, ε) suffices to keep Pr[X > (1 + δ)µ] below ε (irrespective of the values of n and the p_i's).
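To make the Chebyshev-versus-Chernoff comparison above concrete, here is a short Python sketch (added for illustration) that prints both bounds on Pr[|X − n/2| ≥ n/4] for a few values of n.

```python
import math

# X = number of heads in n fair coin flips, mu = n/2.
# Chebyshev:  Pr[|X - n/2| >= n/4] <= (n/4) / (n/4)^2 = 4/n
# Chernoff :  Pr[|X - n/2| >= n/4] <= 2 * exp(-(1/3) * (n/2) * (1/2)^2) = 2 e^{-n/24}
for n in (16, 64, 256, 1024):
    chebyshev = 4 / n
    chernoff = 2 * math.exp(-n / 24)
    print(f"n = {n:5d}   Chebyshev {chebyshev:.2e}   Chernoff {chernoff:.2e}")
```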
EXAMPLE 2 - MONTE CARLO METHOD - I
In this example we illustrate how Chernoff bounds help us to show that a simple Monte Carlo algorithm can be used to approximate the number π through sampling. The term Monte Carlo method refers to a broad collection of tools for estimating various values through sampling and simulation. Monte Carlo methods are used extensively in all areas of the physical sciences and in technology.

MONTE CARLO ESTIMATION OF π - I.
Let (X, Y) be a point chosen randomly in the 2 × 2 square centered at (0, 0). This is equivalent to choosing X and Y randomly and independently from the interval [−1, 1]. Let Z be the random variable that has value 1 if the point (X, Y) lies in the circle of radius 1 centered at (0, 0), and value 0 otherwise. Clearly
Pr(Z = 1) = π/4.
If we perform such an experiment m times, Z_i is the value of Z in the i-th run, and W = Σ_{i=1}^m Z_i, then
E[W] = E[Σ_{i=1}^m Z_i] = Σ_{i=1}^m E[Z_i] = mπ/4,
and therefore W' = (4/m)W is a natural estimator of π.

MONTE CARLO ESTIMATION OF π - II.
How good is this estimator? An application of the two-sided Chernoff bound gives
Pr(|W' − π| ≥ επ) = Pr(|W − mπ/4| ≥ εmπ/4) = Pr(|W − E[W]| ≥ εE[W]) ≤ 2e^{−mπε^2/12},
because E[W] = mπ/4 and, for 0 < δ < 1,
Pr(|X − µ| ≥ δµ) ≤ 2e^{−µδ^2/3}.    (6)
Therefore, by taking m sufficiently large we can get an arbitrarily good approximation of π with high probability.
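A minimal Python sketch of this estimator (added here for illustration; the sample size m and the accuracy ε are arbitrary choices) computes W' = 4W/m and prints the guarantee 2e^{−mπε²/12} alongside the actual error.

```python
import math
import random

def estimate_pi(m, seed=0):
    """Monte Carlo estimate of pi: the fraction of random points of the
    2x2 square [-1,1]^2 that fall inside the unit circle, times 4."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            hits += 1
    return 4 * hits / m

m, eps = 1_000_000, 0.01
est = estimate_pi(m)
bound = 2 * math.exp(-m * math.pi * eps ** 2 / 12)
print(f"estimate {est:.5f}, |estimate - pi| = {abs(est - math.pi):.5f}")
print(f"Pr[|estimate - pi| >= {eps * math.pi:.5f}] <= {bound:.2e}")
```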
HYPERCUBES
[Figures: 5-dimensional, 6-dimensional, 7-dimensional and 9-dimensional hypercubes.]

A CASE STUDY - ROUTING ON HYPERCUBES
Networks are modeled by graphs: processors are modeled by nodes and communication links are modeled by edges.
Principle of synchronous communication: links (edges) synchronously carry packets (i, X, d(i)), where i is the source node, X is the data and d(i) is the destination node.
Permutation routing problem on an n-processor network with nodes 1, 2, ..., n: each node i starts by sending, in parallel, a packet to a node d(i), where d(1), d(2), ..., d(n) is a permutation of 1, 2, ..., n.
Problem: How many steps (from a node to a node) are necessary and sufficient to route an arbitrary permutation?
A route for a packet is a sequence of edges the packet has to follow from its source to its destination. A routing algorithm for the permutation routing problem has to specify a route (in parallel) for each packet.
A packet may occasionally have to wait at a node because the next edge on its route is "busy", transmitting another packet at that moment. We assume each node contains one queue for each edge. A routing algorithm must therefore also specify a queueing discipline.

OBLIVIOUS ROUTING ALGORITHMS
are routing algorithms in which the route followed by a packet from a source node i to a destination d(i) depends only on i and d(i) (and not on the other d(j), j ≠ i). The following theorem gives a limit on the performance of oblivious algorithms.
Theorem: For any deterministic oblivious permutation routing algorithm on a network of n nodes, each of out-degree d, there is an instance of the permutation routing problem requiring Ω(√(n/d)) steps.
Example: Consider any d-dimensional hypercube H_d and the left-to-right routing: any packet with destination node d(i) is sent from its current node n_i to the node n_j such that the binary representation of n_j differs from the binary representation of n_i in the leftmost bit in which n_i and d(i) differ.
Example: Consider the permutation routing in H_10 given by the "reverse" mapping b_1...b_10 → b_10...b_1. Observe that if the left-to-right routing strategy is used, then all packets from the nodes b_1 b_2 b_3 b_4 b_5 00000 have to go through the node 0000000000.
Left-to-right routing on the hypercube H_d therefore sometimes requires Ω(√(2^d/d)) steps.

RANDOMIZED ROUTING
We now show a randomized (oblivious) routing algorithm whose expected number of steps is asymptotically smaller than √(2^d/d).
Notation: N = 2^d.
Phase 1: Pick a random intermediate destination σ(i) from {1, ..., N}. The packet v_i travels first to the node σ(i).
Phase 2: The packet v_i then travels from σ(i) to its final destination d(i).
(In both phases the deterministic left-to-right oblivious routing is used.)
Queueing discipline: FIFO for each edge. (Actually, any queueing discipline is good, provided that at each step at which a queue is non-empty some packet from it is sent.)

Question: How many steps are needed before a packet v_i reaches its destination? (Let us consider at first only Phase 1.)
Let ρ_i denote the route of packet v_i. It clearly holds: the number of steps taken by v_i is equal to the length of ρ_i, which is at most d, plus the number of steps for which v_i is queued at intermediate nodes of ρ_i.
Fact: For any two packets v_i, v_j there is at most one queue q such that v_i and v_j are in the queue q at the same time.

Lemma: Let the route of packet v_i follow the sequence of edges ρ_i = (e_1, e_2, ..., e_k). Let S be the set of packets (other than v_i) whose routes pass through at least one of the edges e_1, ..., e_k. Then the delay incurred by packet v_i is at most |S|.
Proof: A packet in S is said to leave ρ_i at the time step at which it traverses an edge of ρ_i for the last time. If a packet is ready to follow edge e_j at time t, we define its lag at time t to be t − j. The lag of v_i is initially 0, and the total delay of v_i is its lag when it traverses the last edge e_k of the route ρ_i. We show that each step at which the lag of v_i increases by 1 can be charged to a distinct member of S.
If the lag of v_i reaches a value l + 1, some packet in S leaves ρ_i with lag l. (When the lag of v_i increases from l to l + 1, there must be at least one packet from S that wishes to traverse the same edge as v_i at that time step; thus S contains at least one packet whose lag is l.) Let t' be the last step at which any packet in S has lag l; thus there is a packet v ∈ S ready to follow some edge e_j of ρ_i at time t' = l + j. We show that some packet of S leaves ρ_i at t'; this establishes the lemma by the Fact above. Since v is ready to follow e_j at t', some packet ω (which may be v itself) in S follows edge e_j at t'. Now ω has to leave ρ_i at t' (if ω did not leave ρ_i at t', it would be ready to follow e_{j+1} at step t' + 1 and would thus have lag l at a later step, contradicting the choice of t'). We charge to ω the increase of the lag of v_i from l to l + 1; since ω leaves ρ_i, it will never be charged again. Thus each member of S whose route intersects ρ_i is charged for at most one delay, which proves the lemma.
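A minimal Python simulation of Phase 1 (added here; the dimension d and the seed are arbitrary, and the code only computes routes, not the step-by-step schedule): it builds the left-to-right (bit-fixing) route of each packet to a random intermediate destination and, using the lemma, bounds each packet's delay by the number |S| of other packets whose routes share an edge with it.

```python
import random

def bit_fixing_route(src, dst, d):
    """Edges of the left-to-right route from src to dst in the d-cube.
    Bits are corrected from the most significant (leftmost) position down."""
    edges, cur = [], src
    for bit in reversed(range(d)):
        if (cur ^ dst) & (1 << bit):
            nxt = cur ^ (1 << bit)
            edges.append((cur, nxt))
            cur = nxt
    return edges

d = 8
N = 2 ** d
random.seed(3)
sigma = [random.randrange(N) for _ in range(N)]      # random intermediate targets
routes = [bit_fixing_route(i, sigma[i], d) for i in range(N)]

# For each edge, record which packets use it.
packets_on_edge = {}
for i, route in enumerate(routes):
    for e in route:
        packets_on_edge.setdefault(e, set()).add(i)

# |S| for packet i = number of other packets whose route shares an edge with route i.
delays = []
for i, route in enumerate(routes):
    sharing = set()
    for e in route:
        sharing |= packets_on_edge[e]
    sharing.discard(i)
    delays.append(len(sharing))

print(f"d = {d}: max route length {max(len(r) for r in routes)}, "
      f"max |S| over all packets {max(delays)}")
```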
PROOF CONTINUATION - I.
Let H_ij be the random variable defined as follows: H_ij = 1 if the routes ρ_i and ρ_j share an edge, and H_ij = 0 otherwise.
The total delay the packet v_i incurs is at most Σ_{j=1}^N H_ij. Since the routes of different packets are chosen independently and randomly, the H_ij's are independent Poisson trials for j ≠ i. Thus, to bound the delay of the packet v_i from above using the Chernoff bound, it suffices to obtain an upper bound on Σ_{j=1}^N H_ij. At first we find a bound on E[Σ_{j=1}^N H_ij].
For any edge e of the hypercube, let the random variable T(e) denote the number of routes that pass through e. Fix any route ρ_i = (e_{i,1}, e_{i,2}, ..., e_{i,k}), k ≤ d. Then
Σ_{j=1}^N H_ij ≤ Σ_{j=1}^k T(e_{i,j})   ⇒   E[Σ_{j=1}^N H_ij] ≤ Σ_{j=1}^k E[T(e_{i,j})].

PROOF CONTINUATION - II.
It can be shown that E[T(e)] is the same for every edge e of the hypercube. The expected length of a route ρ_i is d/2, so the expected total route length, summed over all packets, is Nd/2. The number of (directed) edges in the hypercube is Nd, and therefore
E[T(e_{i,j})] ≤ (Nd/2) / (Nd) = 1/2 for all i, j.
Therefore
E[Σ_{j=1}^N H_ij] ≤ k/2 ≤ d/2.
By the Chernoff bound in the form Pr[X > (1 + δ)µ] < 2^{−(1+δ)µ} (a corollary of bound (1), valid for δ > 2e − 1), applied with X = Σ_{j=1}^N H_ij, δ = 11 and µ = d/2, we get that the probability that Σ_{j=1}^N H_ij exceeds 6d is less than 2^{−6d}.
The total number of packets is N = 2^d. The probability that any of the N packets experiences a delay exceeding 6d is therefore less than 2^d × 2^{−6d} = 2^{−5d}.

PROOF CONTINUATION - III.
By adding the length of the route to the delay we get:
Theorem: With probability at least 1 − 2^{−5d}, every packet reaches its intermediate destination in Phase 1 in 7d or fewer steps.
The routing scheme for Phase 2 can be seen as the scheme for Phase 1 "run backwards". Therefore the probability that any packet fails to reach its target in either phase is less than 2 · 2^{−5d}.
To summarize:
Theorem: With probability at least 1 − 2 · 2^{−5d}, every packet reaches its destination in 14d or fewer steps.
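The arithmetic of this last step can be checked directly; the short Python snippet below (an added illustration) prints, for a few dimensions d, the per-packet bound 2^{−6d} and the union bound 2^d · 2^{−6d} = 2^{−5d}.

```python
# Chernoff corollary used above: Pr[X > (1+delta) * mu] < 2^{-(1+delta)*mu},
# with delta = 11 and mu = d/2, so (1+delta)*mu = 6d.
for d in (5, 10, 15):
    per_packet = 2.0 ** (-6 * d)     # one packet delayed by more than 6d
    union = 2 ** d * per_packet      # any of the N = 2^d packets
    print(f"d = {d:2d}: per-packet bound {per_packet:.3e}, "
          f"union bound {union:.3e} = 2^(-5d)")
```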
WIRING PROBLEM - I.
Global wiring in gate arrays.
Gate array: a √n × √n array of n gates.
Net: a pair of gates to be connected by a wire.
Manhattan wiring: wires can run vertically and horizontally only.
[Figure: a 4 × 4 gate array with four nets, each connecting a pair of gates.]
Global wiring problem I (GWP_W): given a set of nets and an integer W, we need to specify, if possible, the set of gates each wire should pass through in connecting its end-points, in such a way that no more than W nets pass through any boundary.
Global wiring problem II: For a boundary b between two gates in the array, let W_S(b) be the number of wires that pass through b in a solution S of the global wiring problem.
Notation: W_S = max_b W_S(b).
Goal: to find a solution S such that W_S is minimal.

WIRING PROBLEM - II.
We will consider only so-called one-bend Manhattan routing, in which the direction of a wire is changed at most once.
Problem: how to decide, for each net, which of the two possible one-bend connections to use (horizontal first and then vertical, or vice versa) in order to get a wiring S with minimal W_S.

REFORMULATION OF THE WIRING PROBLEM
The global wiring problem can be reformulated as a 0-1 linear programming problem. For the net i we use two binary variables x_{i0}, x_{i1}:
x_{i0} = 1 ⇔ net i is routed horizontally first and then vertically;
x_{i1} = 1 ⇔ net i is routed vertically first and then horizontally.
Notation: for a boundary b,
T_{b0} = { i | net i passes through b if it is routed horizontally first }   and   T_{b1} = { i | net i passes through b if it is routed vertically first }.
The corresponding 0-1 linear programming problem: minimize W subject to
x_{i0}, x_{i1} ∈ {0, 1} for each net i,    (3)
x_{i0} + x_{i1} = 1 for each net i,    (4)
Σ_{i∈T_{b0}} x_{i0} + Σ_{i∈T_{b1}} x_{i1} ≤ W for all boundaries b.    (5)
The last condition requires that at most W wires cross the boundary b. Denote by W_0 the minimum obtained this way.

TRICK - I.
1. Solve the corresponding rational linear programming problem, with x_{i0}, x_{i1} ∈ [0, 1] instead of (3). This trick is called linear relaxation. Denote by x̂_{i0}, x̂_{i1} the solutions of this rational linear program, 1 ≤ i ≤ n, and let Ŵ be the value of the objective function for this solution. Obviously, W_0 ≥ Ŵ.
2. Apply the technique called randomized rounding: independently for each i, set
x̄_{i0} = 1 (and x̄_{i1} = 0) with probability x̂_{i0};
x̄_{i0} = 0 (and x̄_{i1} = 1) with probability x̂_{i1} = 1 − x̂_{i0}.
The idea of randomized rounding is to interpret the fractional solutions provided by the linear program as probabilities for the rounding process.

TRICK - II.
A nice property of randomized rounding is that if the fractional value of a variable is close to 0 (or to 1), then this variable is likely to be set to 0 (or to 1).
Theorem: If 0 < ε < 1, then with probability at least 1 − ε the global wiring S produced by randomized rounding satisfies the inequalities
W_S ≤ Ŵ (1 + ∆+(Ŵ, ε/2n)) ≤ W_0 (1 + ∆+(W_0, ε/2n)).
Proof: We show that, following the rounding process, with probability at least 1 − ε no more than Ŵ (1 + ∆+(Ŵ, ε/2n)) wires pass through any boundary. This will be done by showing, for any boundary b, that the probability that W_S(b) > Ŵ (1 + ∆+(Ŵ, ε/2n)) is at most ε/2n. Since a √n × √n array has at most 2n boundaries, summing this probability of failure over all boundaries b gives an upper bound of ε on the overall failure probability.

TRICK - III.
Let b be a boundary. The solution of the rational linear program satisfies its constraints, therefore we have
Σ_{i∈T_{b0}} x̂_{i0} + Σ_{i∈T_{b1}} x̂_{i1} ≤ Ŵ.
The number of wires passing through b in the solution S is
W_S(b) = Σ_{i∈T_{b0}} x̄_{i0} + Σ_{i∈T_{b1}} x̄_{i1}.
The x̄_{i0} and x̄_{i1} are Poisson trials with probabilities x̂_{i0} and x̂_{i1}; in addition, x̄_{i0} and x̄_{i1} are each independent of x̄_{j0} and x̄_{j1} for i ≠ j. Therefore W_S(b) is a sum of independent Poisson trials with
E[W_S(b)] = Σ_{i∈T_{b0}} E[x̄_{i0}] + Σ_{i∈T_{b1}} E[x̄_{i1}] = Σ_{i∈T_{b0}} x̂_{i0} + Σ_{i∈T_{b1}} x̂_{i1} ≤ Ŵ.
Since ∆+(Ŵ, ε/2n) is such that
Pr[ W_S(b) > Ŵ (1 + ∆+(Ŵ, ε/2n)) ] ≤ ε/2n,
the theorem follows.
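A minimal Python sketch of the rounding step (added here; the fractional values x̂ and the boundary sets T_{b0}, T_{b1} are made-up placeholders standing in for an actual solution of the relaxed linear program):

```python
import random

# Hypothetical output of the relaxed linear program: for each net i,
# x_hat[i] is the fractional value of x_{i0} (horizontal-first); x_{i1} = 1 - x_{i0}.
x_hat = {0: 0.8, 1: 0.3, 2: 0.5, 3: 0.9, 4: 0.1}

# Hypothetical membership sets: T0[b] / T1[b] list the nets that would cross
# boundary b if routed horizontal-first / vertical-first, respectively.
T0 = {"b1": [0, 1, 2], "b2": [3, 4]}
T1 = {"b1": [3], "b2": [0, 2]}

def round_solution(x_hat, rng):
    """Randomized rounding: x_bar[i] = 1 with probability x_hat[i], else 0."""
    return {i: int(rng.random() < p) for i, p in x_hat.items()}

rng = random.Random(4)
x_bar = round_solution(x_hat, rng)

# Number of wires crossing each boundary in the rounded solution S.
for b in T0:
    load = sum(x_bar[i] for i in T0[b]) + sum(1 - x_bar[i] for i in T1[b])
    print(f"boundary {b}: W_S(b) = {load}")
```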
HOEFFDING INEQUALITY
The problem with the Chernoff bounds above is that they work only for 0-1 random variables. The Hoeffding inequality is another concentration bound, also based on moment generating functions, that applies to any sum of independent bounded random variables with mean 0.
Theorem: Let X_1, ..., X_n be independent random variables with E[X_i] = 0 and |X_i| ≤ c_i for all i and some constants c_i. Then for all t,
Pr[ Σ_{i=1}^n X_i ≥ t ] ≤ e^{−t^2 / (2 Σ_{i=1}^n c_i^2)}.
In the case that the X_i are dependent but form a so-called martingale, the Hoeffding inequality can be generalized and we get the so-called Azuma-Hoeffding inequality.

MARTINGALES
Martingales are very special sequences of random variables that arise in numerous applications, such as gambling or random walks. These sequences have various interesting properties, and powerful techniques exist for them to derive special Chernoff-like tail bounds. Martingales can be very useful in showing that the values of a random variable V are sharply concentrated around its expectation E[V].
Martingales originally referred to systems of betting in which a player increases his stake (usually by doubling) each time he loses a bet.
For the analysis of randomized algorithms it is of large importance that, as a general rule of thumb, most things that work for sums of independent random variables also work for martingales.

MARTINGALES - MAIN DEFINITION
Definition: A sequence of random variables Z_0, Z_1, ... is a martingale with respect to a sequence of random variables X_0, X_1, ... if, for all n ≥ 0, the following conditions hold:
Z_n is a function of X_0, X_1, ..., X_n;
E[|Z_n|] < ∞;
E[Z_{n+1} | X_0, ..., X_n] = Z_n.
A sequence of random variables Z_0, Z_1, ... is called a martingale if it is a martingale with respect to itself, that is, E[|Z_n|] < ∞ and E[Z_{n+1} | Z_0, ..., Z_n] = Z_n.

EXAMPLE
Let us have a gambler who plays a sequence of fair games. Let X_i be the amount the gambler wins in the i-th game (negative if he loses), and let Z_i be the gambler's total winnings at the end of the i-th game. Because each game is fair, we have E[X_i] = 0 and
E[Z_{i+1} | X_1, X_2, ..., X_i] = Z_i + E[X_{i+1}] = Z_i.
Thus Z_1, Z_2, ..., Z_n is a martingale with respect to the sequence X_1, X_2, ..., X_n.

DOOB MARTINGALES
A Doob martingale is a martingale constructed using the following general scheme. Let X_0, X_1, ..., X_n be a sequence of random variables, and let Y be another random variable with E[|Y|] < ∞. Then the sequence
Z_i = E[Y | X_0, ..., X_i],   i = 0, 1, ..., n,
is a martingale with respect to X_0, X_1, ..., X_n. Indeed,
E[Z_{i+1} | X_0, ..., X_i] = E[ E[Y | X_0, ..., X_{i+1}] | X_0, ..., X_i ] = E[Y | X_0, ..., X_i] = Z_i.
Here we have used the fact that E[V | W] = E[ E[V | U, W] | W ] for any random variables U, V, W.

REMINDER - CONDITIONAL EXPECTATION
Definition: It is natural and useful to define the conditional expectation of a random variable Y, conditioned on an event E, by
E[Y | E] = Σ_y y · Pr(Y = y | E).
Example: Let us roll two fair dice independently; let X_i be the number shown on the i-th die and let X be the sum of the numbers on both dice. Then
E[X | X_1 = 3] = Σ_x x · Pr(X = x | X_1 = 3) = Σ_{x=4}^{9} x · (1/6) = 13/2,
E[X_1 | X = 5] = Σ_{x=1}^{4} x · Pr(X_1 = x | X = 5) = Σ_{x=1}^{4} x · Pr(X_1 = x ∩ X = 5)/Pr(X = 5) = 5/2.
Definition: For two random variables Y and Z, E[Y | Z] is defined to be the random variable f(Z) that takes on the value E[Y | Z = z] when Z = z.
Theorem: For any random variables Y, Z it holds that E[Y] = E[E[Y | Z]].

A USEFUL FACT
For random variables X, Y it holds that
E[E[X | Y]] = E[X].
In words: what you expect X to be after learning Y is, on average, the same as what you expect X to be without learning Y.
Proof:
E[X | Y = y] = Σ_x x · Pr[X = x | Y = y] = Σ_x x · Pr[x, y] / Pr_Y[y],
and therefore
E[E[X | Y]] = Σ_y Pr_Y[y] Σ_x x · Pr[x, y] / Pr_Y[y] = Σ_x Σ_y x · Pr[x, y] = E[X].
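As a quick numerical check (added here, not from the slides) of the dice example and of the fact E[E[X|Y]] = E[X], the following Python sketch estimates the conditional expectations by simulation.

```python
import random
from collections import defaultdict

random.seed(5)
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(500_000)]

# E[X | X1 = 3], where X = X1 + X2: should be 13/2 = 6.5
sums_given_x1_3 = [a + b for a, b in rolls if a == 3]
print("E[X | X1 = 3] ~", sum(sums_given_x1_3) / len(sums_given_x1_3))

# E[X1 | X = 5]: should be 5/2 = 2.5
x1_given_sum_5 = [a for a, b in rolls if a + b == 5]
print("E[X1 | X = 5] ~", sum(x1_given_sum_5) / len(x1_given_sum_5))

# E[E[X | X1]] = E[X] = 7: average the conditional means, weighted by Pr[X1 = y]
by_x1 = defaultdict(list)
for a, b in rolls:
    by_x1[a].append(a + b)
outer = sum(len(v) / len(rolls) * (sum(v) / len(v)) for v in by_x1.values())
print("E[E[X | X1]] ~", outer, "  E[X] ~", sum(a + b for a, b in rolls) / len(rolls))
```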
STOPPING TIME
A stopping time corresponds to a strategy for stopping a sequence of steps (say, in gambling) that is based only on the outcomes seen so far.
Examples of rules for which the decision to stop gambling is a stopping time:
1 The first time the gambler wins 5 games in total;
2 The first time the gambler has either won or lost 1000 dollars;
3 The first time the gambler wins 4 times in a row.
The rule "the last time the gambler wins 4 times in a row" is not a stopping time.

MARTINGALE STOPPING THEOREM
Theorem: If Z_0, Z_1, ... is a martingale with respect to X_1, X_2, ... and if T is a stopping time for X_1, X_2, ..., then
E[Z_T] = E[Z_0]
whenever one of the following conditions holds:
there is a constant c such that |Z_i| ≤ c for all i (that is, all Z_i are bounded);
T is bounded;
E[T] < ∞ and there is a constant c such that E[|Z_{i+1} − Z_i| | X_1, ..., X_i] < c.

EXAMPLE - GAMBLER'S PROBLEM
Consider a sequence of independent fair games, where in each round the player either wins or loses one euro, each with probability 1/2. Let Z_0 = 0, let X_i be the amount won in the i-th game and let Z_i be the total amount won after i games.
Let us assume that the player quits the game when he has either lost l_1 euro or won l_2 euro (for given l_1, l_2). What is the probability p that the player wins l_2 euro before losing l_1 euro?

GAMBLER'S PROBLEM - ANSWER
Let T be the first time at which the gambler has either won l_2 or lost l_1 euro. T is a stopping time for the sequence X_1, X_2, ..., and the sequence Z_0, Z_1, ... is a martingale. Since the values of Z_i are bounded (up to time T), the martingale stopping theorem can be applied. Therefore we have E[Z_T] = 0.
Let now p be the probability that the gambler quits playing after winning l_2 euro. Then
E[Z_T] = l_2 p − l_1 (1 − p) = 0,
and therefore
p = l_1 / (l_1 + l_2).

ELECTION PROBLEM
Suppose candidates A and B run for election; at the end A gets v_A votes, B gets v_B votes, and v_B < v_A. Let us assume that the votes are counted in a random order. What is the probability that candidate A is ahead throughout the whole counting process?
Let n = v_A + v_B and let S_k be the number of votes by which A leads after k votes have been counted. Clearly S_n = v_A − v_B. For 0 ≤ k ≤ n − 1 we define
X_k = S_{n−k} / (n − k).
It can be shown, after some calculation, that the sequence X_0, X_1, ..., X_{n−1} forms a martingale. Note that this sequence relates to the counting process in a backward order: X_0 is a function of S_n, and so on.

ELECTION PROBLEM - RESULT
Let T be the minimum k such that X_k = 0, if such a k exists, and T = n − 1 otherwise. T is a bounded stopping time, and therefore the martingale stopping theorem gives
E[X_T] = E[X_0] = E[S_n]/n = (v_A − v_B)/(v_A + v_B).
Case 1: Candidate A leads throughout the count. In such a case all S_{n−k}, and therefore all X_k, are positive, T = n − 1 and X_T = X_{n−1} = S_1/1 = 1.
Case 2: Candidate A does not lead throughout the count. Then at some point of the count the candidates are tied (this certainly happens if B is ever ahead), i.e. S_{n−k} = 0 for some k; for such a k we have X_k = 0, hence T = k < n − 1 and X_T = 0.
We have therefore
E[X_T] = (v_A − v_B)/(v_A + v_B) = 1 · Pr(Case 1) + 0 · Pr(Case 2).
Therefore the probability of Case 1, in which candidate A leads throughout the count, is (v_A − v_B)/(v_A + v_B).
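The answer (v_A − v_B)/(v_A + v_B) can be checked empirically; the Python sketch below (added for illustration, with arbitrary vote counts) simulates random counting orders.

```python
import random

def leads_throughout(vA, vB, rng):
    """One random counting order; True if A is strictly ahead after every vote."""
    votes = [1] * vA + [-1] * vB       # +1 for a vote for A, -1 for a vote for B
    rng.shuffle(votes)
    lead = 0
    for v in votes:
        lead += v
        if lead <= 0:
            return False
    return True

vA, vB = 60, 40
rng = random.Random(9)
trials = 50_000
hits = sum(leads_throughout(vA, vB, rng) for _ in range(trials))
print("simulated :", hits / trials)
print("martingale:", (vA - vB) / (vA + vB))   # (vA - vB)/(vA + vB) = 0.2
```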
AZUMA-HOEFFDING INEQUALITY
Perhaps the main importance of the martingale concept for the analysis of randomized algorithms is due to various special Chernoff-type inequalities that can be applied even in the case that the random variables are not independent.
Theorem: Let X_0, X_1, ..., X_n be a martingale such that, for each k,
|X_k − X_{k−1}| ≤ c_k
for some constants c_k. Then, for all t ≥ 0 and any λ > 0,
Pr(|X_t − X_0| ≥ λ) ≤ 2e^{−λ^2 / (2 Σ_{i=1}^t c_i^2)}.

EXAMPLE - PATTERN MATCHING - I.
Let S = (s_1, ..., s_n) be a string of n characters chosen randomly from an s-ary alphabet Σ. Let P = (p_1, ..., p_k) be a string (pattern) of k characters from Σ. Let F_{P,S} be the number of occurrences of P in S. Clearly
E[F_{P,S}] = (n − k + 1)(1/s)^k.
We now use a Doob martingale and the Azuma-Hoeffding inequality to show that, if k is relatively small with respect to n, then the number of occurrences of the pattern P in S is highly concentrated around its mean.
Let Z_0 = E[F_{P,S}] and, for 1 ≤ i ≤ n, let Z_i = E[F_{P,S} | s_1, ..., s_i]. The sequence Z_0, ..., Z_n is a Doob martingale, and Z_n = F_{P,S}.

EXAMPLE - PATTERN MATCHING - II.
Since each character of the string can participate in no more than k possible occurrences of the pattern, for any 0 ≤ i < n we have |Z_{i+1} − Z_i| ≤ k. In other words, the value of s_{i+1} can affect the value of F_{P,S} by at most k. Hence
|E[F_{P,S} | s_1, ..., s_{i+1}] − E[F_{P,S} | s_1, ..., s_i]| = |Z_{i+1} − Z_i| ≤ k.
By the Azuma-Hoeffding inequality,
Pr(|F_{P,S} − E[F_{P,S}]| ≥ ε) = Pr(|Z_n − Z_0| ≥ ε) ≤ 2e^{−ε^2/(2nk^2)}.

WAITING TIMES FOR PATTERNS
Problem: Suppose that we flip a fair coin until some given pattern appears. What is the expected number of coin flips until this happens?
Example: We flip coins until we see HTHH.
Suppose that x_1 x_2 ... x_k is the pattern we want to get. Let us imagine we have an army of gamblers, with a new one showing up before each coin flip. Each gambler starts by borrowing 1$ and betting it that the next coin flip will be x_1. If she wins, she bets her 2$ that the next coin flip will be x_2, continuing to play double-or-nothing until she either loses (and is out of her initial 1$) or wins her last bet on x_k (and is up 2^k − 1 dollars).
Because each gambler's winnings form a martingale, so does their sum, and therefore the expected total winnings of all gamblers up to the stopping time τ, at which our pattern occurs for the first time, is 0.

These facts will now be used to compute E[τ]. When we stop at time τ, we have one gambler who has won 2^k − 1 dollars. Other gamblers may still be in the game: for each i such that x_1 ... x_i = x_{k−i+1} ... x_k (the first i characters of the pattern equal its last i characters) there is a gambler with net winnings 2^i − 1. All remaining gamblers are at −1.
Let χ_i = 1 if x_1 ... x_i = x_{k−i+1} ... x_k, and χ_i = 0 otherwise. Then, using the stopping time theorem,
E[X_τ] = E[ −τ + Σ_{i=1}^k χ_i 2^i ] = −E[τ] + Σ_{i=1}^k χ_i 2^i = 0,
and therefore
E[τ] = Σ_{i=1}^k χ_i 2^i.
Examples: if the pattern is HTHH (HHHH) [THHH], then E[τ] equals 18 (30) [16].
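The formula E[τ] = Σ_i χ_i 2^i is easy to verify; the Python sketch below (added for illustration) computes it for the three example patterns and compares it with simulated average waiting times.

```python
import random

def expected_wait(pattern):
    """E[tau] = sum of 2^i over all i with pattern[:i] == pattern[-i:]."""
    k = len(pattern)
    return sum(2 ** i for i in range(1, k + 1) if pattern[:i] == pattern[-i:])

def simulate_wait(pattern, trials, seed=7):
    """Average number of fair coin flips until the pattern first appears."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        flips = ""
        while not flips.endswith(pattern):
            flips += rng.choice("HT")
        total += len(flips)
    return total / trials

for p in ("HTHH", "HHHH", "THHH"):
    print(p, "formula:", expected_wait(p),
          " simulated:", round(simulate_wait(p, 20_000), 1))
```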
EXAMPLE - OCCUPANCY PROBLEM
Suppose that m balls are thrown randomly into n bins and let Z denote the number of bins that remain empty at the end. For 0 ≤ t ≤ m, let Z_t be the expectation, after the first t balls have been thrown, of the number of bins that are empty at time m. The sequence of random variables Z_0, Z_1, ..., Z_m is a (Doob) martingale, Z_0 = E[Z] and Z_m = Z.

SOME ESTIMATIONS
Kolmogorov-Doob inequality: Let X_0, X_1, ... be a martingale. Then for any λ > 0,
Pr[ max_{0≤i≤n} X_i ≥ λ ] ≤ E[|X_n|]/λ.
Azuma inequality: Let X_0, X_1, ... be a martingale sequence such that |X_k − X_{k−1}| ≤ c_k for each k. Then for all t ≥ 0 and any λ > 0,
Pr[ |X_t − X_0| ≥ λ ] ≤ 2 exp( −λ^2 / (2 Σ_{k=1}^t c_k^2) ).
Corollary: Let X_0, X_1, ... be a martingale sequence such that |X_k − X_{k−1}| ≤ c for each k, where c is independent of k. Then, for all t ≥ 0 and any λ > 0,
Pr[ |X_t − X_0| ≥ λc√t ] ≤ 2e^{−λ^2/2}.

OCCUPANCY PROBLEM REVISITED
Let m balls be thrown randomly into n bins and let Z denote the number of bins that remain empty. The Azuma inequality allows one to show that
µ = E[Z] = n(1 − 1/n)^m ≈ ne^{−m/n}
and, for λ > 0,
Pr[ |Z − µ| ≥ λ ] ≤ 2e^{−λ^2 (n − 1/2)/(n^2 − µ^2)}.

APPENDIX

EXERCISES
1 What is larger, e^π or π^e (where e is the base of the natural logarithms)?
2 Hint 1: There exists a one-line proof of the correct relation.
3 Hint 2: Use the inequality e^x > 1 + x with x = π/e − 1.
4 Solution: e^{π/e − 1} > 1 + π/e − 1 = π/e implies e^{π/e} > π, and hence e^π > π^e.
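Returning to the occupancy bound above, the short Python simulation below (added for illustration; n, m and λ are arbitrary choices) compares the empirical behaviour of the number of empty bins with µ = n(1 − 1/n)^m and with the stated tail bound.

```python
import math
import random

n, m, lam = 100, 200, 10
mu = n * (1 - 1 / n) ** m                      # exact mean, approximately n * e^{-m/n}

random.seed(8)
trials = 20_000
deviations = 0
for _ in range(trials):
    bins = [0] * n
    for _ in range(m):
        bins[random.randrange(n)] += 1         # throw one ball into a random bin
    empty = bins.count(0)
    deviations += (abs(empty - mu) >= lam)

bound = 2 * math.exp(-lam ** 2 * (n - 0.5) / (n ** 2 - mu ** 2))
print(f"mu = {mu:.2f}  (n * e^(-m/n) = {n * math.exp(-m / n):.2f})")
print(f"empirical Pr[|Z - mu| >= {lam}] = {deviations / trials:.4f}  <=  bound {bound:.4f}")
```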