CZ.1.07/2.2.00/28.0041 Centrum interaktivních a multimediálních studijních opor pro inovaci výuky a efektivní učení

Prologue: You should spend most of your time thinking about what you should think about most of your time.

RANDOMIZED ALGORITHMS AND PROTOCOLS - 2020
Prof. Jozef Gruska, DrSc.
Wednesday, 10.00-11.40, B410

WEB PAGE of the LECTURE: http://www.fi.muni.cz/usr/gruska/random20

FINAL EXAM: You need to answer four questions out of five given to you.
CREDIT (ZÁPOČET): You need to answer three questions out of five given to you.

EXERCISES/TUTORIALS: Thursdays 14.00-15.40, C525
TEACHER: RNDr. Matej Pivoluška, PhD
Language: English
NOTE: Exercises/tutorials are not obligatory.

CONTENTS - preliminary
1 Basic concepts and examples of randomized algorithms
2 Types and basic design methods for randomized algorithms
3 Basics of probability theory
4 Simple methods for design of randomized algorithms
5 Game theory and analysis of randomized algorithms
6 Basic techniques I: moments and deviations
7 Basic techniques II: tail probability inequalities
8 Probabilistic method I
9 Markov chains - random walks
10 Algebraic techniques - fingerprinting
11 Fooling the adversary - examples
12 Randomized cryptographic protocols
13 Randomized proofs
14 Probabilistic method II
15 Quantum algorithms

LITERATURE
R. Motwani, P. Raghavan: Randomized Algorithms, Cambridge University Press, UK, 1995
J. Gruska: Foundations of Computing, International Thomson Computer Press, USA, 715 pages, 1997
J. Hromkovič: Design and Analysis of Randomized Algorithms, Springer, 275 pages, 2005
N. Alon, J. H. Spencer: The Probabilistic Method, Wiley-Interscience, 2008

Part I. Basic Techniques II: Concentration Bounds

TWO VERY IMPORTANT INEQUALITIES
For any random variable X, any real λ > 0 and any integer k ≥ 1 it holds:
Pr[|X| > λ] ≤ E(|X|^k) / λ^k.
Case 1: k = 1 and λ replaced by λE(|X|) gives
Pr[|X| ≥ λE(|X|)] ≤ 1/λ    (Markov's inequality).
Case 2: k = 2, X replaced by X − E(X) and λ replaced by λ√V(X) gives
Pr[|X − E(X)| ≥ λ√V(X)] ≤ E((X − E(X))^2) / (λ^2 V(X)) = V(X) / (λ^2 V(X)) = 1/λ^2    (Chebyshev's inequality).
Another variant of Chebyshev's inequality is
Pr[|X − E(X)| ≥ λ] ≤ V(X)/λ^2,
and this is one of the main reasons why variance is used.

Chapter 7. BASIC TECHNIQUES II: CONCENTRATION BOUNDS
Many so-called probability concentration bounds have already been developed and broadly applied. In this chapter we derive and apply some of them - so-called tail probability bounds - bounds on the probability that the value of some random variable differs much - by some margin - from its mean.
At first we determine bounds on the probability that the random variable X = Σ_{i=1}^n X_i differs by a fixed margin from its mean, where all X_i are binary random variables with Bernoulli distribution. That is, X_i can be seen as a coin toss with Pr[X_i = 1] = p_i and Pr[X_i = 0] = 1 − p_i. (Observe that in the special case p_1 = p_2 = ... = p_n = p we get a random variable X with the binomial distribution.)
At the end of the chapter we will deal with special sequences of dependent random variables, so-called martingales, and also with tail bounds for martingales. These will then be applied also to the occupancy problem.
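As a quick illustration (added here, not part of the original slides), the following Python sketch empirically checks the Markov and Chebyshev inequalities stated above for a simple non-negative random variable; the choice of a fair die and of the thresholds is arbitrary.

```python
import random

# Empirical check of Markov's and Chebyshev's inequalities for a simple
# non-negative random variable: X = number shown by a fair six-sided die.
random.seed(1)
samples = [random.randint(1, 6) for _ in range(200_000)]

mean = sum(samples) / len(samples)                           # E[X] = 3.5
var = sum((x - mean) ** 2 for x in samples) / len(samples)   # V(X) = 35/12

for lam in (1.5, 2.0):
    # Markov: Pr[X >= lam * E[X]] <= 1 / lam
    emp = sum(x >= lam * mean for x in samples) / len(samples)
    print(f"Markov    lam={lam}: empirical {emp:.3f} <= bound {1 / lam:.3f}")

for lam in (1.0, 2.0):
    # Chebyshev: Pr[|X - E[X]| >= lam * sqrt(V(X))] <= 1 / lam^2
    emp = sum(abs(x - mean) >= lam * var ** 0.5 for x in samples) / len(samples)
    print(f"Chebyshev lam={lam}: empirical {emp:.3f} <= bound {1 / lam ** 2:.3f}")
```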
BASIC PROBLEM AND METHODS - II.
The above approach is often used to show that X lies close to E[X] with reasonably high probability. Of particular importance is the case when X is a sum of random variables. For the case that these random variables are independent we derive the so-called Chernoff bound. For the case that the random variables of the sum are dependent, but form a so-called martingale, we get the so-called Azuma-Hoeffding bound.

BASIC PROBLEM OF THE ANALYSIS OF RANDOMIZED ALGORITHMS
What is the probability that X = Σ_{i=1}^n X_i deviates from its mean E[X] = µ = Σ_{i=1}^n p_i by a fixed factor? Namely, let δ > 0.
(1) What is the probability that X is larger than (1 + δ)µ?
(2) What is the probability that X is smaller than (1 − δ)µ?
Notation: For a random variable X and a fixed t > 0, E[e^{tX}] is called the moment generating function of X. It holds
E[e^{tX}] = E[ Σ_{k≥0} t^k X^k / k! ] = Σ_{k≥0} t^k E[X^k] / k!.
Very important Chernoff bounds on the sum of independent Poisson trials are obtained when the moment generating functions of X are considered.

CHERNOFF BOUNDS - I
Theorem: Let X_1, X_2, ..., X_n be independent Poisson trials such that, for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i, where 0 < p_i < 1. Then for X = Σ_{i=1}^n X_i, µ = E[X] = Σ_{i=1}^n p_i, and any δ > 0,
Pr[X > (1 + δ)µ] < [ e^δ / (1 + δ)^{(1+δ)} ]^µ.    (1)
Proof: For any real t > 0,
Pr[X > (1 + δ)µ] = Pr[e^{tX} > e^{t(1+δ)µ}].
By applying Markov's inequality to the right-hand side we get
Pr[X > (1 + δ)µ] < E[e^{tX}] / e^{t(1+δ)µ}
(the inequality is strict). Observe that
E[e^{tX}] = E[e^{t Σ_{i=1}^n X_i}] = E[ Π_{i=1}^n e^{tX_i} ] = Π_{i=1}^n E[e^{tX_i}],
hence
Pr[X > (1 + δ)µ] < Π_{i=1}^n E[e^{tX_i}] / e^{t(1+δ)µ}.

CHERNOFF BOUNDS - II.
Since E[e^{tX_i}] = p_i e^t + (1 − p_i), we have
Pr[X > (1 + δ)µ] < Π_{i=1}^n (p_i e^t + 1 − p_i) / e^{t(1+δ)µ} = Π_{i=1}^n (1 + p_i (e^t − 1)) / e^{t(1+δ)µ}.
By using the inequality 1 + x < e^x with x = p_i (e^t − 1),
Pr[X > (1 + δ)µ] < Π_{i=1}^n e^{p_i (e^t − 1)} / e^{t(1+δ)µ} = e^{(e^t − 1)µ} / e^{t(1+δ)µ}.
Taking t = ln(1 + δ) we get our theorem (and the basic Chernoff bound), that is,
Pr[X > (1 + δ)µ] < [ e^δ / (1 + δ)^{(1+δ)} ]^µ.    (2)
Observe the three tricks that have been used in the above proof!

COROLLARIES
From the above Chernoff bound the following corollaries can be derived.
Corollary: Let X_1, X_2, ..., X_n be independent Poisson trials such that, for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i, where 0 < p_i < 1. Then for X = Σ_{i=1}^n X_i and µ = E[X] = Σ_{i=1}^n p_i, it holds:
1 For 0 < δ < 1.81: Pr(X > (1 + δ)µ) ≤ e^{−µδ^2/3}.
2 For 0 ≤ δ ≤ 4.11: Pr[X ≥ (1 + δ)µ] ≤ e^{−µδ^2/4}.
3 For R ≥ 6µ: Pr(X ≥ R) ≤ 2^{−R}.    (3)

EXAMPLE I - SOCCER GAME OUTCOMES
Notation: F+(µ, δ) = [ e^δ / (1 + δ)^{(1+δ)} ]^µ, the right-hand side of inequality (1) above.
Example: A soccer team STARS wins each game with probability 1/3. Assuming that the outcomes of different games are independent, we derive an upper bound on the probability that STARS win more than half of n games.
Let X_i = 1 if STARS win the i-th game and X_i = 0 otherwise, and let Y_n = Σ_{i=1}^n X_i.
By applying the last theorem (Pr(X > (1 + δ)µ) < F+(µ, δ)) with µ = n/3 and δ = 1/2, we get
Pr[Y_n > n/2] < F+(n/3, 1/2) < (0.915)^n,
which is exponentially small in n.
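The bound F+ is easy to evaluate numerically. The following Python sketch (an added illustration; n and the number of trials are arbitrary choices) computes F+(µ, δ) for the soccer example and compares it with an empirical estimate of Pr[Y_n > n/2].

```python
import math
import random

def F_plus(mu, delta):
    """Upper-tail Chernoff bound  [e^delta / (1+delta)^(1+delta)]^mu."""
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

# Soccer example: each game is won independently with probability 1/3.
n, p = 60, 1 / 3
mu, delta = n * p, 1 / 2            # (1 + delta) * mu = n / 2

random.seed(2)
trials = 100_000
exceed = sum(
    sum(random.random() < p for _ in range(n)) > n / 2
    for _ in range(trials)
)
print(f"empirical Pr[Y_n > n/2]  ~ {exceed / trials:.5f}")
print(f"Chernoff bound F+(mu, d) = {F_plus(mu, delta):.5f}")
```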
SECOND TYPE OF CHERNOFF BOUNDS
The previous theorem puts an upper bound on deviations of X = Σ X_i above its expectation µ, i.e. on Pr[X > (1 + δ)µ]. The next theorem puts an upper bound on deviations of X = Σ X_i below its expectation µ, i.e. on Pr[X < (1 − δ)µ].

Theorem: Let X_1, X_2, ..., X_n be independent Poisson trials such that, for 1 ≤ i ≤ n, Pr[X_i = 1] = p_i, where 0 < p_i < 1. Then for X = Σ_{i=1}^n X_i, µ = E[X] = Σ_{i=1}^n p_i, and 0 < δ ≤ 1,
Pr[X < (1 − δ)µ] < e^{−µδ^2/2}.
Proof: For any positive real t,
Pr[X < (1 − δ)µ] = Pr[−X > −(1 − δ)µ] = Pr[e^{−tX} > e^{−t(1−δ)µ}].
By applying Markov's inequality,
Pr[X < (1 − δ)µ] < E[e^{−tX}] / e^{−t(1−δ)µ} = Π_{i=1}^n E[e^{−tX_i}] / e^{−t(1−δ)µ}
= Π_{i=1}^n (p_i e^{−t} + 1 − p_i) / e^{−t(1−δ)µ} = Π_{i=1}^n (1 + p_i (e^{−t} − 1)) / e^{−t(1−δ)µ}.
By applying the inequality 1 + x < e^x we get
Pr[X < (1 − δ)µ] < e^{Σ_{i=1}^n p_i (e^{−t} − 1)} / e^{−t(1−δ)µ} = e^{(e^{−t} − 1)µ} / e^{−t(1−δ)µ},
and if we take t = ln(1/(1 − δ)), then
Pr[X < (1 − δ)µ] < [ e^{−δ} / (1 − δ)^{(1−δ)} ]^µ.    (4)
Since (1 − δ)^{(1−δ)} ≥ e^{−δ + δ^2/2} for 0 < δ ≤ 1, we then have
Pr[X < (1 − δ)µ] < e^{−µδ^2/2}.
Combining the upper-tail corollary 1 with the lower-tail bound just derived (and noting that e^{−µδ^2/2} ≤ e^{−µδ^2/3}), it follows:
Corollary: For 0 < δ < 1,
Pr(|X − µ| ≥ δµ) ≤ 2e^{−µδ^2/3}.    (5)

EXAMPLE - COIN TOSSING
Let X be the number of heads in a sequence of n independent fair coin flips. An application of bound (5) with µ = n/2 and δ = √(6 ln n / n) gives
Pr[ |X − n/2| ≥ (1/2)√(6n ln n) ] ≤ 2e^{−(1/3)(n/2)(6 ln n / n)} = 2/n.
This implies that the concentration of the number of heads around the mean n/2 is very tight. Indeed, the deviations from the mean are of order O(√(n ln n)).

CHEBYSHEV versus CHERNOFF
Let X again be the number of heads in a sequence of n independent fair coin flips. Let us consider the probability of having either more than 3n/4 or fewer than n/4 heads.
Chebyshev's inequality gives us
Pr[ |X − n/2| ≥ n/4 ] ≤ 4/n.
On the other side, using the Chernoff bound we have
Pr[ |X − n/2| ≥ n/4 ] ≤ 2e^{−(1/3)(n/2)(1/2)^2} = 2e^{−n/24}.
Chernoff's method therefore gives an exponentially smaller upper bound than the one obtained using Chebyshev's inequality.

SOCCER GAMES REVISITED
Notation (for the lower-tail bound function): F−(µ, δ) = e^{−µδ^2/2}.
Example: Assume that the probability that the STAR team wins a game is 3/4. What is the probability that in n games STAR loses more than n/2 games?
In such a case µ = 0.75n, δ = 1/3, and for Y_n = Σ_{i=1}^n X_i we have
Pr[Y_n < n/2] < F−(0.75n, 1/3) < (0.9592)^n,
and therefore the probability decreases exponentially fast in n.

TWO-SIDED BOUNDS
By inequality (5), for δ < 1,
Pr[|X − µ| ≥ δµ] ≤ 2e^{−µδ^2/3},
and if we want this bound to be at most ε, we get
Pr[ |X − µ| ≥ √(3µ ln(2/ε)) ] ≤ ε,
provided ε ≥ 2e^{−µ/3} (so that the corresponding δ satisfies δ ≤ 1).
Proof: If ε = 2e^{−µδ^2/3}, then ln(2/ε) = µδ^2/3, and hence
√(3µ ln(2/ε)) = √(3µ · µδ^2/3) = √(µ^2 δ^2) = µδ.

NEW QUESTION
New question: Given ε, how large does δ have to be so that Pr[X > (1 + δ)µ] < ε?
In order to deal with such and related questions, the following definitions/notations are introduced.
Definition: ∆+(µ, ε) is the number such that F+(µ, ∆+(µ, ε)) = ε; ∆−(µ, ε) is the number such that F−(µ, ∆−(µ, ε)) = ε.
In other words, a deviation of δ = ∆+(µ, ε) suffices to keep Pr[X > (1 + δ)µ] below ε (irrespective of the values of n and the p_i's).
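To make the Chebyshev-versus-Chernoff comparison above concrete, here is a short Python sketch (added for illustration) that prints both bounds on Pr[|X − n/2| ≥ n/4] for a few values of n.

```python
import math

# X = number of heads in n fair coin flips, mu = n/2.
# Chebyshev:  Pr[|X - n/2| >= n/4] <= (n/4) / (n/4)^2 = 4/n
# Chernoff :  Pr[|X - n/2| >= n/4] <= 2 * exp(-(1/3) * (n/2) * (1/2)^2) = 2 e^{-n/24}
for n in (16, 64, 256, 1024):
    chebyshev = 4 / n
    chernoff = 2 * math.exp(-n / 24)
    print(f"n = {n:5d}   Chebyshev {chebyshev:.2e}   Chernoff {chernoff:.2e}")
```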
EXAMPLE 2 - MONTE CARLO METHOD - I
In this example we illustrate how Chernoff bounds help us to show that a simple Monte Carlo algorithm can be used to approximate the number π through sampling. The term Monte Carlo method refers to a broad collection of tools for estimating various values through sampling and simulation. Monte Carlo methods are used extensively in all areas of the physical sciences and in technology.

MONTE CARLO ESTIMATION OF π - I.
Let (X, Y) be a point chosen randomly in the 2 × 2 square centered at (0, 0). This is equivalent to choosing X and Y randomly and independently from the interval [−1, 1]. Let Z be the random variable that has value 1 if the point (X, Y) lies in the circle of radius 1 centered at (0, 0), and value 0 otherwise. Clearly
Pr(Z = 1) = π/4.
If we perform such an experiment m times, Z_i is the value of Z in the i-th run, and W = Σ_{i=1}^m Z_i, then
E[W] = E[Σ_{i=1}^m Z_i] = Σ_{i=1}^m E[Z_i] = mπ/4,
and therefore W' = (4/m)W is a natural estimator of π.

MONTE CARLO ESTIMATION OF π - II.
How good is this estimator? An application of the two-sided Chernoff bound gives
Pr(|W' − π| ≥ επ) = Pr(|W − mπ/4| ≥ εmπ/4) = Pr(|W − E[W]| ≥ εE[W]) ≤ 2e^{−mπε^2/12},
because E[W] = mπ/4 and, for 0 < δ < 1,
Pr(|X − µ| ≥ δµ) ≤ 2e^{−µδ^2/3}.    (6)
Therefore, by taking m sufficiently large we can get an arbitrarily good approximation of π with high probability.
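A minimal Python sketch of this estimator (added here for illustration; the sample size m and the accuracy ε are arbitrary choices) computes W' = 4W/m and prints the guarantee 2e^{−mπε²/12} alongside the actual error.

```python
import math
import random

def estimate_pi(m, seed=0):
    """Monte Carlo estimate of pi: the fraction of random points of the
    2x2 square [-1,1]^2 that fall inside the unit circle, times 4."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            hits += 1
    return 4 * hits / m

m, eps = 1_000_000, 0.01
est = estimate_pi(m)
bound = 2 * math.exp(-m * math.pi * eps ** 2 / 12)
print(f"estimate {est:.5f}, |estimate - pi| = {abs(est - math.pi):.5f}")
print(f"Pr[|estimate - pi| >= {eps * math.pi:.5f}] <= {bound:.2e}")
```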
HYPERCUBES
[Figures: 5-dimensional, 6-dimensional, 7-dimensional and 9-dimensional hypercubes.]

A CASE STUDY - ROUTING ON HYPERCUBES
Networks are modeled by graphs: processors are modeled by nodes and communication links are modeled by edges.
Principle of synchronous communication: links (edges) synchronously carry packets (i, X, d(i)), where i is the source node, X is the data and d(i) is the destination node.
Permutation routing problem on an n-processor network with nodes 1, 2, ..., n: each node i starts by sending, in parallel, a packet to a node d(i), where d(1), d(2), ..., d(n) is a permutation of 1, 2, ..., n.
Problem: How many steps (from a node to a node) are necessary and sufficient to route an arbitrary permutation?
A route for a packet is a sequence of edges the packet has to follow from its source to its destination. A routing algorithm for the permutation routing problem has to specify a route (in parallel) for each packet.
A packet may occasionally have to wait at a node because the next edge on its route is "busy", transmitting another packet at that moment. We assume each node contains one queue for each edge. A routing algorithm must therefore also specify a queueing discipline.

OBLIVIOUS ROUTING ALGORITHMS
are routing algorithms in which the route followed by a packet from a source node i to a destination d(i) depends only on i and d(i) (and not on the other d(j), j ≠ i). The following theorem gives a limit on the performance of oblivious algorithms.
Theorem: For any deterministic oblivious permutation routing algorithm on a network of n nodes, each of out-degree d, there is an instance of the permutation routing problem requiring Ω(√(n/d)) steps.
Example: Consider any d-dimensional hypercube H_d and the left-to-right routing: any packet with destination node d(i) is sent from its current node n_i to the node n_j such that the binary representation of n_j differs from the binary representation of n_i in the leftmost bit in which n_i and d(i) differ.
Example: Consider the permutation routing in H_10 given by the "reverse" mapping b_1...b_10 → b_10...b_1. Observe that if the left-to-right routing strategy is used, then all packets from the nodes b_1 b_2 b_3 b_4 b_5 00000 have to go through the node 0000000000.
Left-to-right routing on the hypercube H_d therefore sometimes requires Ω(√(2^d/d)) steps.

RANDOMIZED ROUTING
We now show a randomized (oblivious) routing algorithm whose expected number of steps is asymptotically smaller than √(2^d/d).
Notation: N = 2^d.
Phase 1: Pick a random intermediate destination σ(i) from {1, ..., N}. The packet v_i travels first to the node σ(i).
Phase 2: The packet v_i then travels from σ(i) to its final destination d(i).
(In both phases the deterministic left-to-right oblivious routing is used.)
Queueing discipline: FIFO for each edge. (Actually, any queueing discipline is good, provided that at each step at which a queue is non-empty some packet from it is sent.)

Question: How many steps are needed before a packet v_i reaches its destination? (Let us consider at first only Phase 1.)
Let ρ_i denote the route of packet v_i. It clearly holds: the number of steps taken by v_i is equal to the length of ρ_i, which is at most d, plus the number of steps for which v_i is queued at intermediate nodes of ρ_i.
Fact: For any two packets v_i, v_j there is at most one queue q such that v_i and v_j are in the queue q at the same time.

Lemma: Let the route of packet v_i follow the sequence of edges ρ_i = (e_1, e_2, ..., e_k). Let S be the set of packets (other than v_i) whose routes pass through at least one of the edges e_1, ..., e_k. Then the delay incurred by packet v_i is at most |S|.
Proof: A packet in S is said to leave ρ_i at the time step at which it traverses an edge of ρ_i for the last time. If a packet is ready to follow edge e_j at time t, we define its lag at time t to be t − j. The lag of v_i is initially 0, and the total delay of v_i is its lag when it traverses the last edge e_k of the route ρ_i. We show that each step at which the lag of v_i increases by 1 can be charged to a distinct member of S.
If the lag of v_i reaches a value l + 1, some packet in S leaves ρ_i with lag l. (When the lag of v_i increases from l to l + 1, there must be at least one packet from S that wishes to traverse the same edge as v_i at that time step; thus S contains at least one packet whose lag is l.) Let t' be the last step at which any packet in S has lag l; thus there is a packet v ∈ S ready to follow some edge e_j of ρ_i at time t' = l + j. We show that some packet of S leaves ρ_i at t'; this establishes the lemma by the Fact above. Since v is ready to follow e_j at t', some packet ω (which may be v itself) in S follows edge e_j at t'. Now ω has to leave ρ_i at t' (if ω did not leave ρ_i at t', it would be ready to follow e_{j+1} at step t' + 1 and would thus have lag l at a later step, contradicting the choice of t'). We charge to ω the increase of the lag of v_i from l to l + 1; since ω leaves ρ_i, it will never be charged again. Thus each member of S whose route intersects ρ_i is charged for at most one delay, which proves the lemma.
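A minimal Python simulation of Phase 1 (added here; the dimension d and the seed are arbitrary, and the code only computes routes, not the step-by-step schedule): it builds the left-to-right (bit-fixing) route of each packet to a random intermediate destination and, using the lemma, bounds each packet's delay by the number |S| of other packets whose routes share an edge with it.

```python
import random

def bit_fixing_route(src, dst, d):
    """Edges of the left-to-right route from src to dst in the d-cube.
    Bits are corrected from the most significant (leftmost) position down."""
    edges, cur = [], src
    for bit in reversed(range(d)):
        if (cur ^ dst) & (1 << bit):
            nxt = cur ^ (1 << bit)
            edges.append((cur, nxt))
            cur = nxt
    return edges

d = 8
N = 2 ** d
random.seed(3)
sigma = [random.randrange(N) for _ in range(N)]      # random intermediate targets
routes = [bit_fixing_route(i, sigma[i], d) for i in range(N)]

# For each edge, record which packets use it.
packets_on_edge = {}
for i, route in enumerate(routes):
    for e in route:
        packets_on_edge.setdefault(e, set()).add(i)

# |S| for packet i = number of other packets whose route shares an edge with route i.
delays = []
for i, route in enumerate(routes):
    sharing = set()
    for e in route:
        sharing |= packets_on_edge[e]
    sharing.discard(i)
    delays.append(len(sharing))

print(f"d = {d}: max route length {max(len(r) for r in routes)}, "
      f"max |S| over all packets {max(delays)}")
```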
PROOF CONTINUATION - I.
Let H_ij be the random variable defined as follows: H_ij = 1 if the routes ρ_i and ρ_j share an edge, and H_ij = 0 otherwise.
The total delay the packet v_i incurs is at most Σ_{j=1}^N H_ij. Since the routes of different packets are chosen independently and randomly, the H_ij's are independent Poisson trials for j ≠ i. Thus, to bound the delay of the packet v_i from above using the Chernoff bound, it suffices to obtain an upper bound on Σ_{j=1}^N H_ij. At first we find a bound on E[Σ_{j=1}^N H_ij].
For any edge e of the hypercube, let the random variable T(e) denote the number of routes that pass through e. Fix any route ρ_i = (e_{i,1}, e_{i,2}, ..., e_{i,k}), k ≤ d. Then
Σ_{j=1}^N H_ij ≤ Σ_{j=1}^k T(e_{i,j})   ⇒   E[Σ_{j=1}^N H_ij] ≤ Σ_{j=1}^k E[T(e_{i,j})].

PROOF CONTINUATION - II.
It can be shown that E[T(e)] is the same for every edge e of the hypercube. The expected length of a route ρ_i is d/2, so the expected total route length, summed over all packets, is Nd/2. The number of (directed) edges in the hypercube is Nd, and therefore
E[T(e_{i,j})] ≤ (Nd/2) / (Nd) = 1/2 for all i, j.
Therefore
E[Σ_{j=1}^N H_ij] ≤ k/2 ≤ d/2.
By the Chernoff bound in the form Pr[X > (1 + δ)µ] < 2^{−(1+δ)µ} (a corollary of bound (1), valid for δ > 2e − 1), applied with X = Σ_{j=1}^N H_ij, δ = 11 and µ = d/2, we get that the probability that Σ_{j=1}^N H_ij exceeds 6d is less than 2^{−6d}.
The total number of packets is N = 2^d. The probability that any of the N packets experiences a delay exceeding 6d is therefore less than 2^d × 2^{−6d} = 2^{−5d}.

PROOF CONTINUATION - III.
By adding the length of the route to the delay we get:
Theorem: With probability at least 1 − 2^{−5d}, every packet reaches its intermediate destination in Phase 1 in 7d or fewer steps.
The routing scheme for Phase 2 can be seen as the scheme for Phase 1 "run backwards". Therefore the probability that any packet fails to reach its target in either phase is less than 2 · 2^{−5d}.
To summarize:
Theorem: With probability at least 1 − 2 · 2^{−5d}, every packet reaches its destination in 14d or fewer steps.
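The arithmetic of this last step can be checked directly; the short Python snippet below (an added illustration) prints, for a few dimensions d, the per-packet bound 2^{−6d} and the union bound 2^d · 2^{−6d} = 2^{−5d}.

```python
# Chernoff corollary used above: Pr[X > (1+delta) * mu] < 2^{-(1+delta)*mu},
# with delta = 11 and mu = d/2, so (1+delta)*mu = 6d.
for d in (5, 10, 15):
    per_packet = 2.0 ** (-6 * d)     # one packet delayed by more than 6d
    union = 2 ** d * per_packet      # any of the N = 2^d packets
    print(f"d = {d:2d}: per-packet bound {per_packet:.3e}, "
          f"union bound {union:.3e} = 2^(-5d)")
```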
WIRING PROBLEM - I.
Global wiring in gate arrays.
Gate array: a √n × √n array of n gates.
Net: a pair of gates to be connected by a wire.
Manhattan wiring: wires can run vertically and horizontally only.
[Figure: a 4 × 4 gate array with four nets, each connecting a pair of gates.]
Global wiring problem I (GWP_W): given a set of nets and an integer W, we need to specify, if possible, the set of gates each wire should pass through in connecting its end-points, in such a way that no more than W nets pass through any boundary.
Global wiring problem II: For a boundary b between two gates in the array, let W_S(b) be the number of wires that pass through b in a solution S of the global wiring problem.
Notation: W_S = max_b W_S(b).
Goal: to find a solution S such that W_S is minimal.

WIRING PROBLEM - II.
We will consider only so-called one-bend Manhattan routing, in which the direction of a wire is changed at most once.
Problem: how to decide, for each net, which of the two possible one-bend connections to use (horizontal first and then vertical, or vice versa) in order to get a wiring S with minimal W_S.

REFORMULATION OF THE WIRING PROBLEM
The global wiring problem can be reformulated as a 0-1 linear programming problem. For the net i we use two binary variables x_{i0}, x_{i1}:
x_{i0} = 1 ⇔ net i is routed horizontally first and then vertically;
x_{i1} = 1 ⇔ net i is routed vertically first and then horizontally.
Notation: for a boundary b,
T_{b0} = { i | net i passes through b if it is routed horizontally first }   and   T_{b1} = { i | net i passes through b if it is routed vertically first }.
The corresponding 0-1 linear programming problem: minimize W subject to
x_{i0}, x_{i1} ∈ {0, 1} for each net i,    (3)
x_{i0} + x_{i1} = 1 for each net i,    (4)
Σ_{i∈T_{b0}} x_{i0} + Σ_{i∈T_{b1}} x_{i1} ≤ W for all boundaries b.    (5)
The last condition requires that at most W wires cross the boundary b. Denote by W_0 the minimum obtained this way.

TRICK - I.
1. Solve the corresponding rational linear programming problem, with x_{i0}, x_{i1} ∈ [0, 1] instead of (3). This trick is called linear relaxation. Denote by x̂_{i0}, x̂_{i1} the solutions of this rational linear program, 1 ≤ i ≤ n, and let Ŵ be the value of the objective function for this solution. Obviously, W_0 ≥ Ŵ.
2. Apply the technique called randomized rounding: independently for each i, set
x̄_{i0} = 1 (and x̄_{i1} = 0) with probability x̂_{i0};
x̄_{i0} = 0 (and x̄_{i1} = 1) with probability x̂_{i1} = 1 − x̂_{i0}.
The idea of randomized rounding is to interpret the fractional solutions provided by the linear program as probabilities for the rounding process.

TRICK - II.
A nice property of randomized rounding is that if the fractional value of a variable is close to 0 (or to 1), then this variable is likely to be set to 0 (or to 1).
Theorem: If 0 < ε < 1, then with probability at least 1 − ε the global wiring S produced by randomized rounding satisfies the inequalities
W_S ≤ Ŵ (1 + ∆+(Ŵ, ε/2n)) ≤ W_0 (1 + ∆+(W_0, ε/2n)).
Proof: We show that, following the rounding process, with probability at least 1 − ε no more than Ŵ (1 + ∆+(Ŵ, ε/2n)) wires pass through any boundary. This will be done by showing, for any boundary b, that the probability that W_S(b) > Ŵ (1 + ∆+(Ŵ, ε/2n)) is at most ε/2n. Since a √n × √n array has at most 2n boundaries, summing this probability of failure over all boundaries b gives an upper bound of ε on the overall failure probability.

TRICK - III.
Let b be a boundary. The solution of the rational linear program satisfies its constraints, therefore we have
Σ_{i∈T_{b0}} x̂_{i0} + Σ_{i∈T_{b1}} x̂_{i1} ≤ Ŵ.
The number of wires passing through b in the solution S is
W_S(b) = Σ_{i∈T_{b0}} x̄_{i0} + Σ_{i∈T_{b1}} x̄_{i1}.
The x̄_{i0} and x̄_{i1} are Poisson trials with probabilities x̂_{i0} and x̂_{i1}; in addition, x̄_{i0} and x̄_{i1} are each independent of x̄_{j0} and x̄_{j1} for i ≠ j. Therefore W_S(b) is a sum of independent Poisson trials with
E[W_S(b)] = Σ_{i∈T_{b0}} E[x̄_{i0}] + Σ_{i∈T_{b1}} E[x̄_{i1}] = Σ_{i∈T_{b0}} x̂_{i0} + Σ_{i∈T_{b1}} x̂_{i1} ≤ Ŵ.
Since ∆+(Ŵ, ε/2n) is such that
Pr[ W_S(b) > Ŵ (1 + ∆+(Ŵ, ε/2n)) ] ≤ ε/2n,
the theorem follows.
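A minimal Python sketch of the rounding step (added here; the fractional values x̂ and the boundary sets T_{b0}, T_{b1} are made-up placeholders standing in for an actual solution of the relaxed linear program):

```python
import random

# Hypothetical output of the relaxed linear program: for each net i,
# x_hat[i] is the fractional value of x_{i0} (horizontal-first); x_{i1} = 1 - x_{i0}.
x_hat = {0: 0.8, 1: 0.3, 2: 0.5, 3: 0.9, 4: 0.1}

# Hypothetical membership sets: T0[b] / T1[b] list the nets that would cross
# boundary b if routed horizontal-first / vertical-first, respectively.
T0 = {"b1": [0, 1, 2], "b2": [3, 4]}
T1 = {"b1": [3], "b2": [0, 2]}

def round_solution(x_hat, rng):
    """Randomized rounding: x_bar[i] = 1 with probability x_hat[i], else 0."""
    return {i: int(rng.random() < p) for i, p in x_hat.items()}

rng = random.Random(4)
x_bar = round_solution(x_hat, rng)

# Number of wires crossing each boundary in the rounded solution S.
for b in T0:
    load = sum(x_bar[i] for i in T0[b]) + sum(1 - x_bar[i] for i in T1[b])
    print(f"boundary {b}: W_S(b) = {load}")
```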
HOEFFDING INEQUALITY
The problem with the Chernoff bounds above is that they work only for 0-1 random variables. The Hoeffding inequality is another concentration bound, also based on moment generating functions, that applies to any sum of independent bounded random variables with mean 0.
Theorem: Let X_1, ..., X_n be independent random variables with E[X_i] = 0 and |X_i| ≤ c_i for all i and some constants c_i. Then for all t,
Pr[ Σ_{i=1}^n X_i ≥ t ] ≤ e^{−t^2 / (2 Σ_{i=1}^n c_i^2)}.
In the case that the X_i are dependent but form a so-called martingale, the Hoeffding inequality can be generalized and we get the so-called Azuma-Hoeffding inequality.

MARTINGALES
Martingales are very special sequences of random variables that arise in numerous applications, such as gambling or random walks. These sequences have various interesting properties, and powerful techniques exist for them to derive special Chernoff-like tail bounds. Martingales can be very useful in showing that the values of a random variable V are sharply concentrated around its expectation E[V].
Martingales originally referred to systems of betting in which a player increases his stake (usually by doubling) each time he loses a bet.
For the analysis of randomized algorithms it is of large importance that, as a general rule of thumb, most things that work for sums of independent random variables also work for martingales.

MARTINGALES - MAIN DEFINITION
Definition: A sequence of random variables Z_0, Z_1, ... is a martingale with respect to a sequence of random variables X_0, X_1, ... if, for all n ≥ 0, the following conditions hold:
Z_n is a function of X_0, X_1, ..., X_n;
E[|Z_n|] < ∞;
E[Z_{n+1} | X_0, ..., X_n] = Z_n.
A sequence of random variables Z_0, Z_1, ... is called a martingale if it is a martingale with respect to itself, that is, E[|Z_n|] < ∞ and E[Z_{n+1} | Z_0, ..., Z_n] = Z_n.

EXAMPLE
Let us have a gambler who plays a sequence of fair games. Let X_i be the amount the gambler wins in the i-th game (negative if he loses), and let Z_i be the gambler's total winnings at the end of the i-th game. Because each game is fair, we have E[X_i] = 0 and
E[Z_{i+1} | X_1, X_2, ..., X_i] = Z_i + E[X_{i+1}] = Z_i.
Thus Z_1, Z_2, ..., Z_n is a martingale with respect to the sequence X_1, X_2, ..., X_n.

DOOB MARTINGALES
A Doob martingale is a martingale constructed using the following general scheme. Let X_0, X_1, ..., X_n be a sequence of random variables, and let Y be another random variable with E[|Y|] < ∞. Then the sequence
Z_i = E[Y | X_0, ..., X_i],   i = 0, 1, ..., n,
is a martingale with respect to X_0, X_1, ..., X_n. Indeed,
E[Z_{i+1} | X_0, ..., X_i] = E[ E[Y | X_0, ..., X_{i+1}] | X_0, ..., X_i ] = E[Y | X_0, ..., X_i] = Z_i.
Here we have used the fact that E[V | W] = E[ E[V | U, W] | W ] for any random variables U, V, W.

REMINDER - CONDITIONAL EXPECTATION
Definition: It is natural and useful to define the conditional expectation of a random variable Y, conditioned on an event E, by
E[Y | E] = Σ_y y · Pr(Y = y | E).
Example: Let us roll two fair dice independently; let X_i be the number shown on the i-th die and let X be the sum of the numbers on both dice. Then
E[X | X_1 = 3] = Σ_x x · Pr(X = x | X_1 = 3) = Σ_{x=4}^{9} x · (1/6) = 13/2,
E[X_1 | X = 5] = Σ_{x=1}^{4} x · Pr(X_1 = x | X = 5) = Σ_{x=1}^{4} x · Pr(X_1 = x ∩ X = 5)/Pr(X = 5) = 5/2.
Definition: For two random variables Y and Z, E[Y | Z] is defined to be the random variable f(Z) that takes on the value E[Y | Z = z] when Z = z.
Theorem: For any random variables Y, Z it holds that E[Y] = E[E[Y | Z]].

A USEFUL FACT
For random variables X, Y it holds that
E[E[X | Y]] = E[X].
In words: what you expect X to be after learning Y is, on average, the same as what you expect X to be without learning Y.
Proof:
E[X | Y = y] = Σ_x x · Pr[X = x | Y = y] = Σ_x x · Pr[x, y] / Pr_Y[y],
and therefore
E[E[X | Y]] = Σ_y Pr_Y[y] Σ_x x · Pr[x, y] / Pr_Y[y] = Σ_x Σ_y x · Pr[x, y] = E[X].
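As a quick numerical check (added here, not from the slides) of the dice example and of the fact E[E[X|Y]] = E[X], the following Python sketch estimates the conditional expectations by simulation.

```python
import random
from collections import defaultdict

random.seed(5)
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(500_000)]

# E[X | X1 = 3], where X = X1 + X2: should be 13/2 = 6.5
sums_given_x1_3 = [a + b for a, b in rolls if a == 3]
print("E[X | X1 = 3] ~", sum(sums_given_x1_3) / len(sums_given_x1_3))

# E[X1 | X = 5]: should be 5/2 = 2.5
x1_given_sum_5 = [a for a, b in rolls if a + b == 5]
print("E[X1 | X = 5] ~", sum(x1_given_sum_5) / len(x1_given_sum_5))

# E[E[X | X1]] = E[X] = 7: average the conditional means, weighted by Pr[X1 = y]
by_x1 = defaultdict(list)
for a, b in rolls:
    by_x1[a].append(a + b)
outer = sum(len(v) / len(rolls) * (sum(v) / len(v)) for v in by_x1.values())
print("E[E[X | X1]] ~", outer, "  E[X] ~", sum(a + b for a, b in rolls) / len(rolls))
```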
STOPPING TIME
A stopping time corresponds to a strategy for stopping a sequence of steps (say, in gambling) that is based only on the outcomes seen so far.
Examples of rules for which the decision to stop gambling is a stopping time:
1 The first time the gambler wins 5 games in total;
2 The first time the gambler has either won or lost 1000 dollars;
3 The first time the gambler wins 4 times in a row.
The rule "the last time the gambler wins 4 times in a row" is not a stopping time.

MARTINGALE STOPPING THEOREM
Theorem: If Z_0, Z_1, ... is a martingale with respect to X_1, X_2, ... and if T is a stopping time for X_1, X_2, ..., then
E[Z_T] = E[Z_0]
whenever one of the following conditions holds:
there is a constant c such that |Z_i| ≤ c for all i (that is, all Z_i are bounded);
T is bounded;
E[T] < ∞ and there is a constant c such that E[|Z_{i+1} − Z_i| | X_1, ..., X_i] < c.

EXAMPLE - GAMBLER'S PROBLEM
Consider a sequence of independent fair games, where in each round the player either wins or loses one euro, each with probability 1/2. Let Z_0 = 0, let X_i be the amount won in the i-th game and let Z_i be the total amount won after i games.
Let us assume that the player quits the game when he has either lost l_1 euro or won l_2 euro (for given l_1, l_2). What is the probability p that the player wins l_2 euro before losing l_1 euro?

GAMBLER'S PROBLEM - ANSWER
Let T be the first time at which the gambler has either won l_2 or lost l_1 euro. T is a stopping time for the sequence X_1, X_2, ..., and the sequence Z_0, Z_1, ... is a martingale. Since the values of Z_i are bounded (up to time T), the martingale stopping theorem can be applied. Therefore we have E[Z_T] = 0.
Let now p be the probability that the gambler quits playing after winning l_2 euro. Then
E[Z_T] = l_2 p − l_1 (1 − p) = 0,
and therefore
p = l_1 / (l_1 + l_2).

ELECTION PROBLEM
Suppose candidates A and B run for election; at the end A gets v_A votes, B gets v_B votes, and v_B < v_A. Let us assume that the votes are counted in a random order. What is the probability that candidate A is ahead throughout the whole counting process?
Let n = v_A + v_B and let S_k be the number of votes by which A leads after k votes have been counted. Clearly S_n = v_A − v_B. For 0 ≤ k ≤ n − 1 we define
X_k = S_{n−k} / (n − k).
It can be shown, after some calculation, that the sequence X_0, X_1, ..., X_{n−1} forms a martingale. Note that this sequence relates to the counting process in a backward order: X_0 is a function of S_n, and so on.

ELECTION PROBLEM - RESULT
Let T be the minimum k such that X_k = 0, if such a k exists, and T = n − 1 otherwise. T is a bounded stopping time, and therefore the martingale stopping theorem gives
E[X_T] = E[X_0] = E[S_n]/n = (v_A − v_B)/(v_A + v_B).
Case 1: Candidate A leads throughout the count. In such a case all S_{n−k}, and therefore all X_k, are positive, T = n − 1 and X_T = X_{n−1} = S_1/1 = 1.
Case 2: Candidate A does not lead throughout the count. Then at some point of the count the candidates are tied (this certainly happens if B is ever ahead), i.e. S_{n−k} = 0 for some k; for such a k we have X_k = 0, hence T = k < n − 1 and X_T = 0.
We have therefore
E[X_T] = (v_A − v_B)/(v_A + v_B) = 1 · Pr(Case 1) + 0 · Pr(Case 2).
Therefore the probability of Case 1, in which candidate A leads throughout the count, is (v_A − v_B)/(v_A + v_B).
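The answer (v_A − v_B)/(v_A + v_B) can be checked empirically; the Python sketch below (added for illustration, with arbitrary vote counts) simulates random counting orders.

```python
import random

def leads_throughout(vA, vB, rng):
    """One random counting order; True if A is strictly ahead after every vote."""
    votes = [1] * vA + [-1] * vB       # +1 for a vote for A, -1 for a vote for B
    rng.shuffle(votes)
    lead = 0
    for v in votes:
        lead += v
        if lead <= 0:
            return False
    return True

vA, vB = 60, 40
rng = random.Random(9)
trials = 50_000
hits = sum(leads_throughout(vA, vB, rng) for _ in range(trials))
print("simulated :", hits / trials)
print("martingale:", (vA - vB) / (vA + vB))   # (vA - vB)/(vA + vB) = 0.2
```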
AZUMA-HOEFFDING INEQUALITY
Perhaps the main importance of the martingale concept for the analysis of randomized algorithms is due to various special Chernoff-type inequalities that can be applied even in the case that the random variables are not independent.
Theorem: Let X_0, X_1, ..., X_n be a martingale such that, for each k,
|X_k − X_{k−1}| ≤ c_k
for some constants c_k. Then, for all t ≥ 0 and any λ > 0,
Pr(|X_t − X_0| ≥ λ) ≤ 2e^{−λ^2 / (2 Σ_{i=1}^t c_i^2)}.

EXAMPLE - PATTERN MATCHING - I.
Let S = (s_1, ..., s_n) be a string of n characters chosen randomly from an s-ary alphabet Σ. Let P = (p_1, ..., p_k) be a string (pattern) of k characters from Σ. Let F_{P,S} be the number of occurrences of P in S. Clearly
E[F_{P,S}] = (n − k + 1)(1/s)^k.
We now use a Doob martingale and the Azuma-Hoeffding inequality to show that, if k is relatively small with respect to n, then the number of occurrences of the pattern P in S is highly concentrated around its mean.
Let Z_0 = E[F_{P,S}] and, for 1 ≤ i ≤ n, let Z_i = E[F_{P,S} | s_1, ..., s_i]. The sequence Z_0, ..., Z_n is a Doob martingale, and Z_n = F_{P,S}.

EXAMPLE - PATTERN MATCHING - II.
Since each character of the string can participate in no more than k possible occurrences of the pattern, for any 0 ≤ i < n we have |Z_{i+1} − Z_i| ≤ k. In other words, the value of s_{i+1} can affect the value of F_{P,S} by at most k. Hence
|E[F_{P,S} | s_1, ..., s_{i+1}] − E[F_{P,S} | s_1, ..., s_i]| = |Z_{i+1} − Z_i| ≤ k.
By the Azuma-Hoeffding inequality,
Pr(|F_{P,S} − E[F_{P,S}]| ≥ ε) = Pr(|Z_n − Z_0| ≥ ε) ≤ 2e^{−ε^2/(2nk^2)}.

WAITING TIMES FOR PATTERNS
Problem: Suppose that we flip a fair coin until some given pattern appears. What is the expected number of coin flips until this happens?
Example: We flip coins until we see HTHH.
Suppose that x_1 x_2 ... x_k is the pattern we want to get. Let us imagine we have an army of gamblers, with a new one showing up before each coin flip. Each gambler starts by borrowing 1$ and betting it that the next coin flip will be x_1. If she wins, she bets her 2$ that the next coin flip will be x_2, continuing to play double-or-nothing until she either loses (and is out of her initial 1$) or wins her last bet on x_k (and is up 2^k − 1 dollars).
Because each gambler's winnings form a martingale, so does their sum, and therefore the expected total winnings of all gamblers up to the stopping time τ, at which our pattern occurs for the first time, is 0.

These facts will now be used to compute E[τ]. When we stop at time τ, we have one gambler who has won 2^k − 1 dollars. Other gamblers may still be in the game: for each i such that x_1 ... x_i = x_{k−i+1} ... x_k (the first i characters of the pattern equal its last i characters) there is a gambler with net winnings 2^i − 1. All remaining gamblers are at −1.
Let χ_i = 1 if x_1 ... x_i = x_{k−i+1} ... x_k, and χ_i = 0 otherwise. Then, using the stopping time theorem,
E[X_τ] = E[ −τ + Σ_{i=1}^k χ_i 2^i ] = −E[τ] + Σ_{i=1}^k χ_i 2^i = 0,
and therefore
E[τ] = Σ_{i=1}^k χ_i 2^i.
Examples: if the pattern is HTHH (HHHH) [THHH], then E[τ] equals 18 (30) [16].
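The formula E[τ] = Σ_i χ_i 2^i is easy to verify; the Python sketch below (added for illustration) computes it for the three example patterns and compares it with simulated average waiting times.

```python
import random

def expected_wait(pattern):
    """E[tau] = sum of 2^i over all i with pattern[:i] == pattern[-i:]."""
    k = len(pattern)
    return sum(2 ** i for i in range(1, k + 1) if pattern[:i] == pattern[-i:])

def simulate_wait(pattern, trials, seed=7):
    """Average number of fair coin flips until the pattern first appears."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        flips = ""
        while not flips.endswith(pattern):
            flips += rng.choice("HT")
        total += len(flips)
    return total / trials

for p in ("HTHH", "HHHH", "THHH"):
    print(p, "formula:", expected_wait(p),
          " simulated:", round(simulate_wait(p, 20_000), 1))
```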
EXAMPLE - OCCUPANCY PROBLEM
Suppose that m balls are thrown randomly into n bins and let Z denote the number of bins that remain empty at the end. For 0 ≤ t ≤ m, let Z_t be the expectation, after the first t balls have been thrown, of the number of bins that are empty at time m. The sequence of random variables Z_0, Z_1, ..., Z_m is a (Doob) martingale, Z_0 = E[Z] and Z_m = Z.

SOME ESTIMATIONS
Kolmogorov-Doob inequality: Let X_0, X_1, ... be a martingale. Then for any λ > 0,
Pr[ max_{0≤i≤n} X_i ≥ λ ] ≤ E[|X_n|]/λ.
Azuma inequality: Let X_0, X_1, ... be a martingale sequence such that |X_k − X_{k−1}| ≤ c_k for each k. Then for all t ≥ 0 and any λ > 0,
Pr[ |X_t − X_0| ≥ λ ] ≤ 2 exp( −λ^2 / (2 Σ_{k=1}^t c_k^2) ).
Corollary: Let X_0, X_1, ... be a martingale sequence such that |X_k − X_{k−1}| ≤ c for each k, where c is independent of k. Then, for all t ≥ 0 and any λ > 0,
Pr[ |X_t − X_0| ≥ λc√t ] ≤ 2e^{−λ^2/2}.

OCCUPANCY PROBLEM REVISITED
Let m balls be thrown randomly into n bins and let Z denote the number of bins that remain empty. The Azuma inequality allows one to show that
µ = E[Z] = n(1 − 1/n)^m ≈ ne^{−m/n}
and, for λ > 0,
Pr[ |Z − µ| ≥ λ ] ≤ 2e^{−λ^2 (n − 1/2)/(n^2 − µ^2)}.

APPENDIX

EXERCISES
1 What is larger, e^π or π^e (where e is the base of the natural logarithms)?
2 Hint 1: There exists a one-line proof of the correct relation.
3 Hint 2: Use the inequality e^x > 1 + x with x = π/e − 1.
4 Solution: e^{π/e − 1} > 1 + π/e − 1 = π/e implies e^{π/e} > π, and hence e^π > π^e.
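Returning to the occupancy bound above, the short Python simulation below (added for illustration; n, m and λ are arbitrary choices) compares the empirical behaviour of the number of empty bins with µ = n(1 − 1/n)^m and with the stated tail bound.

```python
import math
import random

n, m, lam = 100, 200, 10
mu = n * (1 - 1 / n) ** m                      # exact mean, approximately n * e^{-m/n}

random.seed(8)
trials = 20_000
deviations = 0
for _ in range(trials):
    bins = [0] * n
    for _ in range(m):
        bins[random.randrange(n)] += 1         # throw one ball into a random bin
    empty = bins.count(0)
    deviations += (abs(empty - mu) >= lam)

bound = 2 * math.exp(-lam ** 2 * (n - 0.5) / (n ** 2 - mu ** 2))
print(f"mu = {mu:.2f}  (n * e^(-m/n) = {n * math.exp(-m / n):.2f})")
print(f"empirical Pr[|Z - mu| >= {lam}] = {deviations / trials:.4f}  <=  bound {bound:.4f}")
```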