Prologue

CZ.1.07/2.2.00/28.0041 Centrum interaktivních a multimediálních studijních opor pro inovaci výuky a efektivní učení
(Centre of interactive and multimedia study materials for the innovation of teaching and effective learning)

You should spend most of your time thinking about what you should think about most of your time.

RANDOMIZED ALGORITHMS AND PROTOCOLS - 2020

WEB PAGE of the LECTURE: http://www.fi.muni.cz/usr/gruska/random20
Prof. Jozef Gruska, DrSc
Wednesday, 10.00-11.40, B410

FINAL EXAM: You need to answer four questions out of five given to you.
CREDIT (ZAPOČET): You need to answer three questions out of five given to you.

EXERCISES/TUTORIALS
EXERCISES/TUTORIALS: Thursdays 14.00-15.40, C525
TEACHER: RNDr. Matej Pivoluška, PhD
Language: English
NOTE: Exercises/tutorials are not obligatory.

CONTENTS - preliminary
■ Basic concepts and examples of randomized algorithms
■ Types and basic design methods for randomized algorithms
■ Basics of probability theory
■ Simple methods for design of randomized algorithms
■ Game theory and analysis of randomized algorithms
■ Basic techniques I: moments and deviations
■ Basic techniques II: tail probabilities inequalities
■ Probabilistic method I
■ Markov chains - random walks
■ Algebraic techniques - fingerprinting
■ Fooling the adversary - examples
■ Randomized cryptographic protocols
■ Randomized proofs
■ Probabilistic method II
■ Quantum algorithms

LITERATURE - Part I
■ R. Motwani, P. Raghavan: Randomized algorithms, Cambridge University Press, UK, 1995
■ J. Gruska: Foundations of computing, International Thompson Computer Press, USA, 715 pages, 1997
■ J. Hromkovič: Design and analysis of randomized algorithms, Springer, 275 pages, 2005
■ N. Alon, J. H. Spencer: The probabilistic method, Wiley-Interscience, 2008

Chapter 9. RANDOM WALKS - MARKOV CHAINS/MODELS

Random walks on graphs are a very simple, interesting and fundamental tool with surprisingly many applications in informatics and also in mathematics and natural sciences. Design and analysis of randomized algorithms is one of them.

The concept of a random walk is closely related to that of a Markov chain (model) - one of the key concepts of discrete stochastic processes.

BASIC CONCEPTS I

Notation: Let G = (V, E) be a connected and undirected graph with n nodes (in V) and m edges (in E). For a node v ∈ V, let $\Gamma_G(v)$ denote the set of neighbors of v in G.

A random walk on G is the following sequence of moves of a process that starts in some (initial) node $v_0$: a neighbor $v_1$ of $v_0$ is chosen from $\Gamma_G(v_0)$, randomly and independently, and the process moves (walks) from $v_0$ to $v_1$. Afterwards a neighbor $v_2$ of $v_1$ is chosen, again randomly and independently, and the process walks/moves from $v_1$ to $v_2$. The process then continues to walk/move, in the same way, from the current node to a randomly chosen neighbouring one, until, for some reason, it stops moving, or it moves again and again, even for ever.

Typical problems to explore concerning a walk on a given graph G are:
■ What is the expected number of steps to get from a given node u to a given node v?
■ What is the expected number of steps needed to visit all nodes of G at least once when starting in a given node u?

Figure: simple hypercubes (up to the 6-dimensional hypercube).
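To make these two quantities concrete, a minimal simulation sketch in Python follows; the small example graph, its adjacency-list encoding and the function names are illustrative choices, not part of the lecture.

```python
import random

# A small undirected graph given as an adjacency list (illustrative example).
graph = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [1, 2],
}

def hitting_time(graph, u, v, trials=10_000):
    """Estimate the expected number of steps of a random walk from u to v."""
    total = 0
    for _ in range(trials):
        node, steps = u, 0
        while node != v:
            node = random.choice(graph[node])  # move to a random neighbor
            steps += 1
        total += steps
    return total / trials

def cover_time(graph, u, trials=10_000):
    """Estimate the expected number of steps to visit all nodes, starting from u."""
    total = 0
    for _ in range(trials):
        node, visited, steps = u, {u}, 0
        while len(visited) < len(graph):
            node = random.choice(graph[node])
            visited.add(node)
            steps += 1
        total += steps
    return total / trials

print(hitting_time(graph, 0, 3))
print(cover_time(graph, 0))
```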
DRUNKARD'S WALK EXAMPLE

Let a drunken seaman walk on a linear, both sides infinite, graph/pathway, each time making a step right or left, with the same probability.

What are the probabilities for such a drunken man to be in a particular position after some steps, in case he starts in some fixed initial position?

EXAMPLE

Let
■ $G = K_n$ be the complete graph of n nodes,
■ $u \ne v$ be any two vertices of G.

It holds:
■ The expected number of steps of a random walk that begins in u and ends when it reaches v for the first time is ???????

EXAMPLE - to finish

Let $G = K_n$ be the complete graph of n nodes. It holds:

■ The expected number of steps of a random walk in $K_n$ that begins in a fixed node u and ends when first reaching a fixed node v is $n - 1$. Indeed, at each step the walk moves to one of the other $n - 1$ nodes chosen uniformly, so v is reached in each step with probability $p = \frac{1}{n-1}$; the number of steps is therefore geometrically distributed with expectation $\frac{1}{p} = n - 1$.
■ The expected number of steps to visit all nodes of G, starting from any node u, is $(n-1)H_n$, where $H_n$ is the so-called harmonic number
  $H_n = \sum_{i=1}^{n} \frac{1}{i} = \Theta(\ln n).$

A RELATED PROBLEM - COUPON COLLECTION

Coupon collector problem: There are n types of coupons and at each time a coupon is chosen randomly and returned. It has been shown that the average number of trials needed to have a coupon of each type is $nH_n$.

EXAMPLE

Let us consider the graph $K_3$ with the initial probabilities of its three nodes, say $A_0, A_1, A_2$, being $p_0, p_1, p_2$, and let 1/2 be the probability of the transition through any edge.

If $p_i$ is the initial probability of one of the above nodes, then the probability of being in the same node after one step is
$p_i^{(1)} = (1 - p_i)\frac{1}{2} = \frac{1}{2} - \frac{p_i}{2},$
after two steps it is
$p_i^{(2)} = \left(1 - p_i^{(1)}\right)\frac{1}{2} = \frac{1}{2} - \frac{1}{4} + \frac{p_i}{4},$
and after j steps the probability is
$p_i^{(j)} = \sum_{k=1}^{j} \frac{(-1)^{k+1}}{2^k} + (-1)^j \frac{p_i}{2^j}.$
Therefore
$\lim_{j \to \infty} p_i^{(j)} = \frac{1}{3}.$

EXAMPLE - 2-SATISFIABILITY

A simple polynomial-time (Monte Carlo) algorithm will be given to solve the 2-satisfiability problem (2-SAT):

1. Start with an arbitrary assignment.
2. While there is an unsatisfied clause C, choose randomly one of the two literals of C and complement its value in the current assignment.

Theorem: The expected number of steps of the above algorithm to find a satisfying assignment is $O(n^2)$ (where n is the number of variables).

RELATION TO A RANDOM WALK ON THE LINE

Let A be a particular satisfying assignment. The progress of the above algorithm can be represented by a particle moving between the integers {0, 1, ..., n} on the real line. The position of the particle will always indicate how many variables in the current assignment have the correct value (as in A).

Crucial observation: In an unsatisfied clause at least one of the two literals has an incorrect value. Therefore, at each step of the algorithm, with probability at least 1/2 we increase by one the number of variables having their correct value, and with probability at most 1/2 the number of variables having the correct value is decreased by one.

The motion of the particle therefore resembles a random walk on the line (that is, on the linear graph).
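A minimal sketch of this randomized 2-SAT procedure follows; the clause representation (pairs of signed literals) and the cut-off of 2n^2 steps are illustrative assumptions, not part of the lecture.

```python
import random

def random_walk_2sat(n, clauses, max_steps=None):
    """Try to find a satisfying assignment of a 2-CNF formula.

    n       -- number of variables (numbered 1..n)
    clauses -- list of pairs of literals; literal +i means x_i, -i means not x_i
    Returns an assignment (dict) or None if none was found within max_steps.
    """
    if max_steps is None:
        max_steps = 2 * n * n  # of the order of the O(n^2) expected bound
    assignment = {i: random.choice([False, True]) for i in range(1, n + 1)}

    def satisfied(lit):
        return assignment[abs(lit)] == (lit > 0)

    for _ in range(max_steps):
        unsatisfied = [c for c in clauses if not (satisfied(c[0]) or satisfied(c[1]))]
        if not unsatisfied:
            return assignment                            # all clauses satisfied
        clause = random.choice(unsatisfied)
        lit = random.choice(clause)                      # pick one of the two literals
        assignment[abs(lit)] = not assignment[abs(lit)]  # flip its variable
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(random_walk_2sat(3, [(1, 2), (-1, 3), (-2, -3)]))
```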
STOCHASTIC PROCESSES in RANDOMIZED ALGORITHMS

A stochastic process P is a sequence of random variables $P = \{X_t\}_{t \ge 1}$, where we think of the values of $X_t$ as values (states) of the process P at time t.

Two types of stochastic processes come up over and over again in the analysis of randomized algorithms:

1. Martingale, where the value of each next variable may depend, even in a complicated way, on the past history, but its expected change, given the past, is 0.
2. Markov chain, where the next state always depends only on the current state and not on the way the chain got there - not on the past history.

The most useful algorithmic property of Markov chains, to be explored in what follows, is their convergence to a fixed (probability) distribution on states.

MARKOV CHAINS - 1st DEFINITION

A Markov chain is a discrete-time stochastic process defined over a set of states S in terms of a matrix P of transition probabilities $P(i,j) = p_{ij}$ - the probability that the next state will be j if the current state is i.

Probability conditions: For any i, j it has to hold $0 \le p_{ij} \le 1$ and $\sum_j p_{ij} = 1$.

Denote by $X_t$ the state of the Markov chain at time t. The stochastic process $\{X_t\}_{t \ge 0}$, specifying the history of the evolution of the Markov chain up to time t, has the following memoryless property: the future behavior of a Markov chain depends on its current state, and not on how the chain arrived at the current state. That is, it holds, for any $t \ge 1$:

$\Pr[X_{t+1} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_t = i] = \Pr[X_{t+1} = j \mid X_t = i] = p_{ij}.$

NOTE

Markov chains do not need to have prespecified initial states. In general, initial states are chosen according to some (initial) probability distribution $X_0$ over S. Such a distribution is called the initial (probability) distribution.

APPLICATIONS of MARKOV CHAINS

In physical sciences, Markov chains provide a fundamental model for the emergence of global properties from local interactions.

In informatics, random walks provide a general paradigm for the random exploration of exponentially large combinatorial structures (for example graphs), by a sequence of simple and local transitions.

HIDDEN MARKOV MODELS

A Hidden Markov Model (HMM) is a Markov model, with random transitions among states, extended by random outputs of the states.

A HMM works in such a way that an observer can see only a sequence of the states' outputs and not the internal structure (states, transition and emission probabilities) of the underlying Markov model.

Hidden Markov Models have a lot of applications, especially in artificial intelligence. For example, almost all fast speech and pattern recognition systems use HMM.

DEVELOPMENT of MARKOV MODEL

The concept of a Markov chain/model was introduced by Andrey Andreevich Markov in 1906, in order to consider the case of a sequence of random experiments in which the result of any experiment depends also on the result of the previous experiment.

Before that, only such sequences of random experiments were considered where the result of each random experiment was fully independent of all previous experiments.
UNIVERSALITY of QUANTUM RANDOM WALKS

It can also be shown that any quantum evolution can be seen as a so-called continuous quantum walk.

MARKOV CHAINS - 2nd DEFINITION

A Markov chain/model is a discrete time stochastic process $\{X_k\}_{k \ge 0}$ of random variables with values in a countable set I such that for every $k \ge 1$ and every $i_0, i_1, \ldots, i_k$ from I we have

$\Pr[X_k = i_k \mid X_{k-1} = i_{k-1}, \ldots, X_0 = i_0] = \Pr[X_k = i_k \mid X_{k-1} = i_{k-1}] = p_{i_{k-1} i_k}.$

The matrix $P(i,j) = p_{ij}$ is called the transition matrix of one-step transition probabilities. k-step transition probabilities are defined by

$p^{(k)}_{ij} = \Pr[X_{m+k} = j \mid X_m = i]$

and they do not depend on m. If we define a matrix $P^{(k)}$ by $P^{(k)}(i,j) = p^{(k)}_{ij}$, then (Chapman-Kolmogorov equations)

$P^{(k+m)} = P^{(k)} P^{(m)}.$

The matrix $P^{(k)}$ is said to be the k-step transition matrix.

DETAILS of MARKOV MODEL STEPS

Rows of all k-step transition matrices have non-negative entries and sum up to 1. Such matrices are called stochastic matrices.

The probability distribution over the states of a Markov chain C with n nodes $N_1, \ldots, N_n$ and with a transition n × n matrix P at any given time t is given by a row vector $Q_t = (P(1,t), \ldots, P(n,t))$, where $P(i,t)$ is the probability that the chain is in the state $N_i$ after t steps. $Q_t$ is therefore the distribution vector for time step t, i.e. the probability distribution of the states after t steps.

The distribution vector of such a Markov chain at time t is then given by $Q_t = Q_0 P^t$.

BASIC REACHABILITY CONCEPTS

A state j is called reachable/accessible from the state i if there is a $k \ge 0$ such that $p^{(k)}_{ij} > 0$.

We say that states i and j are mutually reachable if i is reachable from j and vice versa.

A Markov chain is called irreducible if any two of its states are mutually reachable.

GENERAL OBSERVATION

Systems represented by Markov chains change randomly and therefore it is generally impossible to determine the exact state of the system in the future. However, we can often determine various useful statistical properties of Markov chains.

EXAMPLE - EATING HABITS

A creature in a ZOO eats once a day, either grapes, or cheese, or lettuce, according to the following rules:
■ If it ate cheese yesterday, it will not eat it today and will eat lettuce or grapes, each with probability 0.5.
■ If it ate grapes yesterday, it will today eat grapes with probability 0.1, cheese with probability 0.4 and lettuce with probability 0.5.
■ If it ate lettuce yesterday, it will eat grapes with probability 0.4 and cheese with probability 0.6.

These eating habits can be modelled by the Markov model in the next figure.

MARKOV CHAIN for EATING HABITS

A Markov chain is often described by a directed graph with edges labelled by the probabilities of going from one state/node to another one.

Figure: Eating habits of a creature (a three-state chain with the transition probabilities listed above).

One statistical property that can be computed is the percentage of days the creature eats grapes (or cheese); a small computation of this kind is sketched below.
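A minimal sketch, assuming only the three transition rules above, that computes the long-run fraction of days each food is eaten by iterating the distribution vector $Q_t = Q_0 P^t$; the state ordering (grapes, cheese, lettuce) and the use of NumPy are my own choices.

```python
import numpy as np

# States ordered as: 0 = grapes, 1 = cheese, 2 = lettuce.
P = np.array([
    [0.1, 0.4, 0.5],   # yesterday grapes
    [0.5, 0.0, 0.5],   # yesterday cheese (never cheese twice in a row)
    [0.4, 0.6, 0.0],   # yesterday lettuce
])

q = np.array([1.0, 0.0, 0.0])   # start, say, on a "grapes" day
for _ in range(1000):            # iterate q <- q P until it stabilizes
    q = q @ P

print(q)   # approximately the stationary distribution pi, with pi P = pi
```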
EHRENFEST MODEL

There are two urns that, in total, always contain four balls. At each step, one of the balls is chosen at random and moved to the other urn.

If we choose as states the number of balls in the first urn, then the transition matrix, where rows and columns are labelled (from top to bottom and from left to right) 0, 1, 2, 3, 4, looks as follows:

$P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1/4 & 0 & 3/4 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 3/4 & 0 & 1/4 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$

P(i,j) is the probability that if the first urn has i balls, then in the next step it will have j balls.

ABSORBING DRUNKARD'S WALK

A drunk man walks on a 5-node (0, 1, 2, 3, 4) linear graph, where the leftmost node (0) is his home and the rightmost one (4) is a bar. If he is at home or in the bar he keeps staying there. Otherwise he moves with probability 1/2 to the left node and with probability 1/2 to the right node.

The transition matrix has therefore the form

$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$

where rows and columns are labeled by 0, 1, 2, 3, 4 and P(i,j) is the probability that the drunken man goes from the node i to the node j.

ABSORBING MARKOV CHAINS

Definition: A state $s_i$ of a Markov chain is called absorbing if it is impossible to leave it (i.e. $p_{ii} = 1$). A Markov chain is called absorbing if it has at least one absorbing state, and if from any state one can go to an absorbing state in some number of steps.

The drunkard's walk is an example of an absorbing Markov chain.

If the transition matrix of a Markov chain has a absorbing states and t non-absorbing states, one can renumber the states so that the first t rows and columns represent the non-absorbing states and the remaining ones the absorbing states. The matrix then has the following canonical form:

$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}$

where Q is a t × t matrix, R is a t × a matrix, 0 is an a × t zero matrix and I is an a × a identity matrix, with the first t rows and columns labeled by non-absorbing states. This means that, for any integer n, the top-left t × t block of $P^n$ is $Q^n$.

BASIC CONCEPTS - 1.

Let

$r^{(t)}_{ij} = \Pr[X_t = j \text{ and } X_s \ne j \text{ for } 1 \le s < t \mid X_0 = i].$

Given an initial state $X_0 = i$, the probability that there is a visit to state j at some time $t > 0$ is given by

$f_{ij} = \sum_{t>0} r^{(t)}_{ij}.$

$f_{ii}$ is the probability that the Markov chain will return to the state i at least once when started in the state i. If $f_{ii} = 1$, the state i is called persistent/recurrent; otherwise it is called transient. A Markov chain is recurrent if every one of its states is recurrent. All states of a finite irreducible Markov chain are recurrent.

If $f_{ii} < 1$, then each time the chain is in the state i, with probability $1 - f_{ii}$ it will never return to i again. It therefore holds:

$\Pr[\text{the number of visits to } i \text{ from } i \text{ equals } k] = f_{ii}^k (1 - f_{ii}).$

BASIC CONCEPTS II.

Denote by $h_{ij}$, the hitting time, the expected number of steps needed to visit the state/node j for the first time when starting from the state/node i. Clearly, it holds

$h_{ij} = \sum_{t>0} t \cdot r^{(t)}_{ij}.$

If $h_{ii} < \infty$, then the state i is called positive (non-null) recurrent/persistent; otherwise it is called null-recurrent/persistent.

If a state i is reachable from itself, then the greatest common divisor of the set of positive k's such that $p^{(k)}_{ii} > 0$ is called the period of i and is denoted by $d_i$. If $d_i = 1$, then the state i is said to be aperiodic.

A finite Markov chain all states of which are aperiodic and recurrent is called ergodic.
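A minimal sketch using the absorbing drunkard's walk above: it forms the 5 × 5 transition matrix, checks the Chapman-Kolmogorov identity $P^{(k+m)} = P^{(k)}P^{(m)}$, and reads absorption probabilities off a high power of P; NumPy and the chosen power 200 are my own assumptions.

```python
import numpy as np

# Absorbing drunkard's walk on nodes 0..4 (0 = home, 4 = bar).
P = np.array([
    [1,   0,   0,   0,   0  ],
    [0.5, 0,   0.5, 0,   0  ],
    [0,   0.5, 0,   0.5, 0  ],
    [0,   0,   0.5, 0,   0.5],
    [0,   0,   0,   0,   1  ],
])

# Chapman-Kolmogorov: the (k+m)-step matrix is the product of the k- and m-step ones.
P3 = np.linalg.matrix_power(P, 3)
P4 = np.linalg.matrix_power(P, 4)
P7 = np.linalg.matrix_power(P, 7)
print(np.allclose(P7, P3 @ P4))          # True

# For large n, row i of P^n gives the absorption probabilities from node i.
Pn = np.linalg.matrix_power(P, 200)
print(Pn[2])   # from the middle node: about [0.5, 0, 0, 0, 0.5]
```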
EQUIVALENT DEFINITIONS

A state i has period k if any return to the state i must occur in a number of time steps that is a multiple of k. A state i is called aperiodic if it is not periodic for any k > 1.

A state i is said to be transient if, given that we start in the state i, there is a non-zero probability that we will never return to i. If a state i is not transient, then it is called recurrent.

SUMMARY

A state i is
■ transient if $f_{ii} < 1$, $h_{ii} = \infty$;
■ null recurrent if $f_{ii} = 1$, $h_{ii} = \infty$;
■ non-null recurrent if $f_{ii} = 1$, $h_{ii} < \infty$.

EXAMPLE of a Markov chain with null-recurrent states

Consider a Markov chain whose states are all positive integers. From each state i the next states are the state i+1 (with probability $\frac{i}{i+1}$) and the state 1 (with probability $\frac{1}{i+1}$).

Starting at state 1, the probability of not having returned to state 1 within the first t steps is

$\prod_{j=1}^{t} \frac{j}{j+1} = \frac{1}{t+1}.$

Hence the probability of never returning to state 1 from 1 is 0, and state 1 is recurrent. It holds also

$r^{(t)}_{1,1} = \frac{1}{t} - \frac{1}{t+1} = \frac{1}{t(t+1)}.$

However, the expected number of steps until the first return to state 1 from state 1 is

$\sum_{t=1}^{\infty} t \cdot r^{(t)}_{1,1} = \sum_{t=1}^{\infty} \frac{1}{t+1},$

which is unbounded.

ERGODIC MARKOV MODEL REVISITED

An equivalent definition of ergodic Markov chains:

Definition: A Markov chain with a transition matrix P is called ergodic if it is possible to go from every state to every state and there is an integer n such that all entries of the matrix $P^n$ are positive.

However, it also holds: in a finite Markov chain at least one state is recurrent and all recurrent states are positive recurrent.

ERGODIC THEOREM

Let us have an ergodic Markov chain C with the set of states S = {1, ..., n}. Then it holds:
■ There exists a vector $\pi = (\pi_1, \ldots, \pi_n)$ such that for every $i, j \in S$ it holds
  $\pi_j = \lim_{t \to \infty} p^{(t)}_{ij}$
  and the limit (limiting probability) does not depend on i.
■ The vector $\pi$ is the only non-negative solution of the system of linear equalities
  $\pi_j = \sum_{i=1}^{n} \pi_i p_{ij}, \qquad \sum_{j=1}^{n} \pi_j = 1.$
■ The vector $\pi$ (the stationary probability distribution) satisfies the identity $\pi = \pi P$.
■ For every $1 \le i \le n$, it holds that $f_{ii} = 1$ and $h_{ii} = \frac{1}{\pi_i}$.
■ If N(i,t) denotes the number of visits to the state i within the first t steps, then
  $\lim_{t \to \infty} \frac{N(i,t)}{t} = \pi_i.$

Implications: Ergodic Markov chains always "forget", after a number of steps, their initial probability distribution.

WEATHER FORECAST

The following Markov chain wf depicts probabilities for going from sunny days to rainy ones and vice versa.

Figure: two states, sunny and rainy, with self-loop probabilities 0.9 and 0.5.

The transition matrix has the form

$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix}$

the stationary probability distribution of wf is $\pi = (0.833, 0.167)$, and it holds $\pi P = \pi$.

EXAMPLE - QUEUE - Q_n - I./II.

Consider a queue that can hold at most n customers; in each step (if the queue is not full) a new customer arrives with probability λ and (if the queue is not empty) the customer being served leaves with probability μ. Taking the number of customers in the queue as the state, the transition probabilities are

$P_{i,i} = \begin{cases} 1 - \lambda & \text{if } i = 0 \\ 1 - \lambda - \mu & \text{if } 1 \le i \le n-1 \\ 1 - \mu & \text{if } i = n \end{cases}$

$P_{i,i+1} = \lambda \text{ if } i < n; \qquad P_{i,i-1} = \mu \text{ if } i > 0.$

This Markov chain is ergodic and therefore it has a unique stationary distribution $\pi$. It holds

$\pi_0 = \pi_0 (1 - \lambda) + \pi_1 \mu$
$\pi_i = \pi_{i-1} \lambda + \pi_i (1 - \lambda - \mu) + \pi_{i+1} \mu, \qquad 1 \le i \le n-1$
$\pi_n = \pi_{n-1} \lambda + \pi_n (1 - \mu).$
EXAMPLE - QUEUE - Q_n - III.

From that one can show that

$\pi_i = \pi_0 \left(\frac{\lambda}{\mu}\right)^i$

is the solution of the above system of equations. Since it holds $\sum_{i=0}^{n} \pi_i = 1$, we get

$\pi_0 = \frac{1}{\sum_{i=0}^{n} (\lambda/\mu)^i}.$

MARKOV CHAINS for GRAPHS

Let G = (V, E) be a connected, non-bipartite, and undirected graph with |V| = n and |E| = m. G induces a Markov chain, denoted by $M_G$, whose states are the nodes of G, and for any two nodes $u, v \in V$

$P_{uv} = P(u,v) = \begin{cases} \frac{1}{d(u)} & \text{if } (u,v) \in E; \\ 0 & \text{otherwise}; \end{cases}$

where d(u) is the degree of u (in G).

Properties of $M_G$:
■ $M_G$ is irreducible.
■ The periodicity of $M_G$ is the greatest common divisor of the lengths of all closed walks in G.
■ Since G is undirected, there are closed walks of length 2.
■ Since G is non-bipartite, it has odd cycles and therefore the greatest common divisor of the lengths of all closed walks is 1. Hence, $M_G$ is aperiodic.

STATIONARY DISTRIBUTION of MARKOV CHAINS on GRAPHS

The ergodic theorem therefore implies that $M_G$ has a unique stationary distribution $\pi$. This $\pi$ is easy to determine. Indeed, it holds:

Lemma: For all $v \in V$, $\pi_v = \frac{d(v)}{2m}$.

Proof: Let $[\pi P]_v$ be the v-th component of $\pi P$. Then

$[\pi P]_v = \sum_{u} \pi_u P(u,v) = \sum_{(u,v) \in E} \frac{d(u)}{2m} \cdot \frac{1}{d(u)} = \frac{d(v)}{2m} = \pi_v.$

Corollary: For all $v \in V$, $h_{vv} = \frac{2m}{d(v)}$.

Comment: Google's page ranking algorithm is essentially a Markov chain over the graph of the web.

APPLICATIONS of MARKOV MODELS

■ Physics - especially thermodynamics and statistical mechanics.
■ Chemistry.
■ The PageRank of a webpage, as used by Google, is defined by a Markov chain. It is the probability of being at page i in the stationary distribution of the following Markov chain on all known webpages: if N is the number of known webpages, and a page i has links to $k_i$ webpages, then the probability to go to any of these linked pages is
  $\frac{\alpha}{k_i} + \frac{1-\alpha}{N}$
  and the probability to go to any other page is
  $\frac{1-\alpha}{N},$
  where the parameter α (experimentally chosen) is about 0.85.
■ Economics and finance
■ Social sciences
■ Mathematical biology
■ Algorithmic music composition

(TIME) REVERSIBLE MARKOV CHAINS

An ergodic Markov chain with a stationary distribution $\pi$ and transition probabilities $P_{ij}$ is called (time) reversible if it holds, for any states i and j:

$\pi_i P_{ij} = \pi_j P_{ji}.$

Informally, in time reversible Markov chains, for each pair of states i, j, the long-run rate at which the chain makes a transition from state i to state j equals the long-run rate at which the chain makes a transition from state j to state i.

(TIME) REVERSIBLE MARKOV CHAINS - II.

For time reversible Markov chains the stationary distribution is easy to compute, due to the following theorem.

Theorem: Consider an ergodic Markov chain with states {1, ..., n} and the transition matrix P. If there is a vector $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$ of non-negative numbers such that $\sum_i \pi_i = 1$ and for any pair of states i, j it holds $\pi_i P_{ij} = \pi_j P_{ji}$, then $\pi$ is the stationary distribution corresponding to P.

Proof: Consider the j-th entry of $\pi P$. The conditions of the theorem imply, for any j,

$\sum_{i=1}^{n} \pi_i P_{ij} = \sum_{i=1}^{n} \pi_j P_{ji} = \pi_j \cdot 1 = \pi_j,$

and therefore it holds $\pi P = \pi$. Hence $\pi$ is the stationary distribution and can be computed from the above system of linear equations.
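A minimal numerical check of the lemma $\pi_v = d(v)/2m$ and of the reversibility condition $\pi_u P_{uv} = \pi_v P_{vu}$ for the random walk on a small graph; the example graph and the NumPy-based check are illustrative choices.

```python
import numpy as np

# Undirected, connected, non-bipartite graph (a triangle with a pendant vertex).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n, m = 4, len(edges)

adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# Transition matrix of the induced chain M_G: P(u,v) = 1/d(u) on edges.
P = np.zeros((n, n))
for u in range(n):
    for v in adj[u]:
        P[u][v] = 1.0 / len(adj[u])

# Stationary distribution predicted by the lemma: pi_v = d(v) / (2m).
pi = np.array([len(adj[v]) / (2.0 * m) for v in range(n)])

print(np.allclose(pi @ P, pi))                           # pi P = pi
print(all(np.isclose(pi[u] * P[u][v], pi[v] * P[v][u])   # detailed balance
          for u in range(n) for v in range(n)))
```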
DESIGN of TIME-REVERSIBLE CHAINS

The reason why a reversible chain is called reversible is that if we start in the stationary distribution at time 0, then the sequence of random variables $(X_0, \ldots, X_t)$ has exactly the same distribution as the reversed sequence $(X_t, \ldots, X_0)$.

Given any finite Markov chain with a transition matrix P and stationary distribution $\pi$, the matrix P*, where

$\pi_i p^*_{ij} = \pi_j p_{ji},$

is the transition matrix of the time-reversed Markov chain. Indeed, it holds:

1. The matrix P* is stochastic, because
   $\sum_j p^*_{ij} = \sum_j p_{ji} \pi_j / \pi_i = \frac{\pi_i}{\pi_i} = 1.$
2. The reversed chain has the same stationary distribution, because P*'s paths starting from the stationary distribution are the reverses of P's paths starting from the same distribution.

REVERSIBLE and TIME REVERSIBLE MARKOV CHAINS

Let us have a Markov chain process

$C_1: \ldots, X_{n-2}, X_{n-1}, X_n$

with transition probabilities $P_{ij}$ and stationary probabilities $\pi_i$.

The reverse process

$C_2: \ldots, X_n, X_{n-1}, X_{n-2}, \ldots$

is also a Markov chain, with transition probabilities

$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}.$

The Markov chain $C_1$ is called time reversible if $Q_{ij} = P_{ij}$ for all i, j, which is equivalent to

$\pi_i P_{ij} = \pi_j P_{ji}.$

Theorem: Suppose an ergodic irreducible Markov chain C has transition probabilities $P_{ij}$. If there are non-negative numbers $x_i$ summing up to 1 and satisfying the equalities $x_i P_{ij} = x_j P_{ji}$ for all i, j, then C is time reversible and $x_i = \pi_i$ for all i.

EXAMPLE 1

Let us have a connected, undirected and non-bipartite graph G with the set of nodes {1, 2, ..., n} and with weights $w_{ij} = w_{ji}$ assigned to each edge (i, j).

To G we can associate a Markov chain $C_G$ with transition probabilities

$P_{ij} = \frac{w_{ij}}{\sum_k w_{ik}};$

then $C_G$ is time reversible and

$\pi_i = \frac{\sum_j w_{ij}}{\sum_k \sum_j w_{kj}}$

are its stationary probabilities.

EXAMPLE 2

Let us have the graph G with nodes {1, 2, 3, 4, 5} and with non-zero-weight edges:

$w_{12} = 3, \quad w_{14} = 1, \quad w_{15} = 2, \quad w_{35} = 6, \quad w_{45} = 4.$

Then

$P_{12} = 1/2; \quad P_{14} = 1/6; \quad P_{15} = 1/3; \quad P_{21} = 1; \quad P_{35} = 1; \quad P_{41} = 1/5; \quad P_{45} = 4/5$

and $C_G$ is time reversible.

APPLICATIONS - SAMPLING

SAMPLING

Sampling in a set S according to a given probability distribution $\pi$ on the elements of S is picking up an element $x \in S$ with probability $\pi(x)$.

MONTE CARLO ESTIMATION OF π

■ Let Z = (X, Y) be a point chosen randomly in a 2 × 2 square centered at (0, 0).
■ This is equivalent to choosing X and Y randomly from the interval [-1, 1].
■ Let Z also be considered as a random variable that has value 1 (0) if the point (X, Y) lies (does not lie) in the circle of radius 1 centered at the point (0, 0).
■ Clearly $\Pr(Z = 1) = \frac{\pi}{4}$.
■ If we perform such an experiment m times, $Z_i$ is the value of Z at the i-th run, and $W = \sum_{i=1}^{m} Z_i$, then
  $E[W] = E\left[\sum_{i=1}^{m} Z_i\right] = \sum_{i=1}^{m} E[Z_i] = \frac{m\pi}{4},$
  and therefore $W' = (4/m)W$ is a natural estimation for π.

A natural question now is how good the estimation of π obtained from $W' = (4/m)W$ is. An application of the Chernoff bound gives:

$\Pr(|W - E[W]| \ge \varepsilon E[W]) \le 2e^{-m\pi\varepsilon^2/12}.$

Therefore, taking m large enough, we get an arbitrarily good approximation of π.
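A minimal sketch of this estimator; the choice of m = 100000 samples is an arbitrary illustrative one.

```python
import random

def estimate_pi(m=100_000):
    """Estimate pi as (4/m) * W, where W counts sample points falling in the unit circle."""
    w = 0
    for _ in range(m):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)  # uniform in the 2x2 square
        if x * x + y * y <= 1:                                # inside the unit circle
            w += 1
    return 4.0 * w / m

print(estimate_pi())   # typically close to 3.14
```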
HOW to do SAMPLING?

In the previous example the Monte Carlo method used uniform sampling in a square to achieve a counting - to determine the value of π.

In various other cases we can do efficient computation provided we can perform sampling according to a given probability distribution.

A Markov-models-induced Monte Carlo method provides a very general approach to sample according to a desired probability distribution. The basic idea is to create an ergodic Markov chain whose states form the sample space and whose stationary distribution is the required sampling distribution.

Let $X_0, X_1, \ldots$ be a run of such a Markov chain. The chain converges to the stationary distribution from any state $X_0$, and so after a sufficiently large number of steps r, the distribution of the state $X_r$ will be close to the stationary distribution and so it can be used for the required sampling. We can repeat the same argument starting with $X_r$ and getting to $X_{2r}$, and so on.

DESIGN of MARKOV CHAINS with UNIFORM STATIONARY DISTRIBUTION - I.

We show first how to construct a Markov chain with a stationary distribution that is uniform over a given state space Ω.

The first step is to define on Ω a neighbourhood relation - that is, to make out of Ω a graph - and in such a way that the graph obtained is irreducible.

The second step is to define transition probabilities in such a way that the resulting stationary distribution is uniform. The next lemma shows how this can be done in case we can introduce also self-loops.

DESIGN of MARKOV CHAINS with UNIFORM STATIONARY DISTRIBUTION - II.

Notation: For any $x \in \Omega$ let N(x) be the set of its neighbours in the created graph and let $M \ge N = \max_{x \in \Omega} |N(x)|$.

Lemma: For a finite state space Ω, a given neighbourhood structure, and any $M \ge N$, let us design a Markov chain such that for any $x, y \in \Omega$

$P_{x,y} = \begin{cases} 1/M & \text{if } x \ne y,\ y \in N(x); \\ 0 & \text{if } x \ne y,\ y \notin N(x); \\ 1 - |N(x)|/M & \text{if } x = y. \end{cases}$

Then, if this chain is irreducible and aperiodic, its stationary distribution is the uniform distribution.

It is easy to see that the chain is time reversible with respect to the uniform distribution, and so we can apply the theorem on time reversible chains above. Then for any x, $\pi_x = \frac{1}{|\Omega|}$.

EXAMPLE - INDEPENDENT SETS of a GRAPH - I.

A set S of nodes of a graph G is called independent if no two nodes in S are connected by an edge in G.

Consider a Markov chain whose states are the independent sets of a graph G = (V, E):

1. Let $X_0$ be an arbitrary independent set in G.
2. To determine $X_{i+1}$ do the following:
   ■ choose a node v uniformly and randomly from V;
   ■ if $v \in X_i$, then $X_{i+1} = X_i - \{v\}$;
   ■ if $v \notin X_i$ and adding v to $X_i$ still gives an independent set, then $X_{i+1} = X_i \cup \{v\}$;
   ■ otherwise set $X_{i+1} = X_i$.

EXAMPLE - INDEPENDENT SETS of a GRAPH - II.

Using the construction from the previous lemma one can show that if x, y are neighbouring independent sets then $P_{xy} = 1/|V|$ and the stationary distribution is the uniform distribution.
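A minimal sketch of this chain; the example graph (a 4-cycle), the number of steps per sample, and the frozenset representation of independent sets are illustrative assumptions.

```python
import random
from collections import Counter

# Undirected graph as an adjacency list (illustrative example: a 4-cycle).
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
V = list(adj)

def step(X):
    """One transition of the chain on independent sets."""
    v = random.choice(V)                 # choose a node uniformly at random
    if v in X:
        return X - {v}                   # remove it
    if not (adj[v] & X):                 # adding v keeps the set independent
        return X | {v}
    return X                             # otherwise stay put

def sample_independent_set(r=1000):
    X = set()                            # X_0: the empty independent set
    for _ in range(r):
        X = step(X)
    return frozenset(X)                  # approximately uniform over independent sets

# Rough check of (near) uniformity over the 7 independent sets of the 4-cycle.
print(Counter(sample_independent_set() for _ in range(2000)))
```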
EXERCISES

1. From a bag of white and black balls you should pick a white (black) ball with a prescribed probability. How can you do that?

2. You have 11 boxes of balls. You should pick a ball from the i-th box with a prescribed probability $p_i$, for i = 1, ..., 11, with $\sum_i p_i = 1$. How can you do that?

The METROPOLIS ALGORITHM

This is a general method to transform any irreducible Markov chain on a state space Ω into a Markov chain with a required stationary distribution.

Let us assume that we have already designed an irreducible state space (graph) and we want to construct a Markov chain on this state space with a stationary distribution $\pi_x = b(x)/B$, where $b(x) > 0$ for all $x \in \Omega$ and $B = \sum_{x \in \Omega} b(x)$ is finite.

Theorem: For a finite state space Ω, its neighbourhood structure $\{N(x) \mid x \in \Omega\}$ and $N = \max_{x \in \Omega} |N(x)|$, let $M \ge N$ be any such number. For any $x \in \Omega$, let $\pi_x$ be the desired probability of the state x in the stationary distribution. Consider a Markov chain with

$P_{x,y} = \begin{cases} \frac{1}{M} \min\{1, \pi_y/\pi_x\} & \text{if } x \ne y,\ y \in N(x); \\ 0 & \text{if } x \ne y,\ y \notin N(x); \\ 1 - \sum_{y \ne x} P_{x,y} & \text{if } x = y. \end{cases}$

Then, if this Markov chain is irreducible and aperiodic, its stationary distribution is given by the probabilities $\pi_x$.
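A minimal sketch of the Metropolis chain on a path-shaped state space with an unnormalized weight b(x); the target weights, M = 2, and the number of steps are my own illustrative choices.

```python
import random
from collections import Counter

# States 0..4 arranged on a path; desired stationary distribution pi_x ~ b(x).
b = [1, 2, 3, 2, 1]                      # unnormalized weights (illustrative)
neighbours = {x: [y for y in (x - 1, x + 1) if 0 <= y < len(b)] for x in range(len(b))}
M = 2                                    # M >= max degree of the neighbourhood graph

def metropolis_step(x):
    # Propose each neighbour with probability 1/M; otherwise stay.
    if random.random() < len(neighbours[x]) / M:
        y = random.choice(neighbours[x])
        if random.random() < min(1.0, b[y] / b[x]):   # accept with min{1, pi_y/pi_x}
            return y
    return x

x, counts = 0, Counter()
for _ in range(200_000):
    x = metropolis_step(x)
    counts[x] += 1

total = sum(counts.values())
print([round(counts[s] / total, 3) for s in range(len(b))])   # ~ [1,2,3,2,1]/9
```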
DOING SAMPLING - once more

In many important applications we need to sample elements of a set S according to a given probability distribution π on S.

One way to do that is to design a Markov chain M on the set S that has π as the stationary distribution, then to start a random walk on M, and to stop when one can expect that the stationary distribution is (almost) reached.

To find the time for halting we need an estimation of the rate of convergence of any initial distribution to the stationary one.

CONVERGENCE of RANDOM WALKS

CONVERGENCE of RANDOM WALKS - BASICS

In the next slides we deal with the following problem: how many steps are needed for a random walk (that starts at some node) to converge to the stationary distribution - this means that the walk will be in each node with the probability specified by the stationary distribution.

We present two techniques to deal with the above problem.
■ The stopping rule method calculates the rate of convergence directly, by defining a proper stopping rule.
■ The coupling method reduces the problem of determining the rate of convergence to that of calculating the number of steps needed to meet another, imaginary, random walk that starts at the stationary distribution.

PRELIMINARIES

Distance ||P - Q|| between two probability distributions P and Q on a set I of states is defined by

$\|P - Q\| = \max_{I' \subseteq I} |P(I') - Q(I')|.$

STOPPING RULE

Given are random variables $Z_1, Z_2, \ldots$, and a random variable T whose range are the natural numbers. T is a stopping rule for the variables $Z_1, Z_2, \ldots$ if, for every i, the event T = i is independent of the variables $Z_j$ for all $j > i$.

The idea is that the variables $Z_i$ are observed in order, one at each time step: at first $Z_1$, then $Z_2$ and so on. The value of the variable T then shows the number of variables observed when such a process has to stop. Observe that if the sequence $\{Z_i\}$ is not independent, then the event T = i may depend on some variable $Z_j$, $j > i$.

It is sometimes required that $\Pr(T < \infty) = 1$, i.e. that T is almost surely finite.

The intuition behind such a definition of the stopping rule is that at any particular time it is enough to look at the sequence of variables/states considered so far in order to be able to tell whether it is time to stop.

STOPPING RULE - OBSERVATIONS

A stopping rule can be seen as a mechanism for deciding whether to continue or to stop a process on the basis of the present position and past events, and which will always lead to a decision to stop at some time.

Examples: Consider a gambler playing roulette, starting with $100.
■ Playing one and only one game corresponds to the stopping time T = 1, and this is a stopping rule.
■ Playing until she either runs out of money or has played 500 games is a stopping rule.
■ Playing until she doubles her money is not a stopping rule (here it is assumed that betting systems have some limitations on doubling or tripling the money).
■ Playing until she is the maximum amount ahead she will ever be is not a stopping rule (as it requires information about the future, as well as the present and the past).

STRONG UNIFORM STOPPING RULE/TIME

For a finite ergodic Markov chain, a strong uniform stopping time T is a stopping rule which satisfies the condition

$\Pr[Z_k = i \mid T = k] = \pi_i,$

where $Z_k$ is the state at the k-th step of the Markov chain, and $\pi_i$ is the probability, under the stationary distribution, of being at the state i.

The next theorem relates the strong uniform stopping rule (time) and the rate of convergence of a random walk.

Theorem: Let π be the stationary distribution of a random walk, and let $Q^{(t)}$ be the probability distribution of that walk after t steps. In addition, let T be a strong uniform stopping time. Then

$\|Q^{(t)} - \pi\| \le \Pr[T > t].$

Proof: Let $X_t$ be the random variable producing the state (from I) visited by the random walk at step t. For every $I' \subseteq I$,

$Q^{(t)}(I') = \Pr[X_t \in I'] = \left(\sum_{j \le t} \Pr[(X_t \in I') \cap (T = j)]\right) + \Pr[(X_t \in I') \cap (T > t)]$
$= \sum_{j \le t} \Pr[X_t \in I' \mid T = j]\Pr[T = j] + \Pr[X_t \in I' \mid T > t]\Pr[T > t]$
$= \sum_{j \le t} \pi(I')\Pr[T = j] + \Pr[X_t \in I' \mid T > t]\Pr[T > t]$
$= \pi(I')(1 - \Pr[T > t]) + \Pr[X_t \in I' \mid T > t]\Pr[T > t]$
$\le \pi(I') + \Pr[T > t]\Pr[X_t \in I' \mid T > t] \le \pi(I') + \Pr[T > t].$

The equality $\Pr[X_t \in I' \mid T = j] = \pi(I')$ follows from the fact that T is a strong uniform stopping time and that if a random walk is in the stationary distribution, then it stays in it forever. The equality $\sum_{j \le t} \Pr[T = j] = 1 - \Pr[T > t]$ follows from $\Pr[T \le t] = 1 - \Pr[T > t]$.

Since both $\Pr[X_t \in I' \mid T > t]$ and $\pi(I')$ are at most 1, we get, for every $I' \subseteq I$,

$|Q^{(t)}(I') - \pi(I')| \le \Pr[T > t].$

EXAMPLE - HYPERCUBE

We show how fast a special random walk on the hypercube converges to the stationary distribution. Consider the following random walk:

1. Choose uniformly a coordinate, i.e. a neighbor of the currently visited node.
2. With probability 1/2 move to the chosen node; otherwise do not move at all.

(The last trick is needed in order to have an ergodic (aperiodic) Markov chain.)

Let us define as the stopping rule T the first step by which every coordinate has been chosen at least once (even if not all choices of coordinates yielded a move).

Note that T is a strong uniform stopping rule (informally, this is because we have an equal probability of being at any node after all coordinates have been chosen).

In order to be able to use the last theorem we need to determine $\Pr[T > t]$. However, this is exactly the coupon collector problem, since we can see coordinates as coupons that need to be collected. For the coupon collector problem, the expected number of trials needed to see all n coupons is $O(n \lg n)$. Therefore, the expected number of steps until we choose all coordinates is $O(n \lg n)$. Using Markov's inequality we get

$\Pr[T > t] \le \frac{E[T]}{t} = O\left(\frac{n \lg n}{t}\right).$
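A minimal simulation of this lazy walk on the n-dimensional hypercube, recording the strong uniform stopping time T (the first step by which every coordinate has been chosen); n = 10 and the number of trials are illustrative choices.

```python
import random

def stopping_time(n):
    """Steps of the lazy hypercube walk until every coordinate has been chosen once."""
    state = [0] * n
    chosen, steps = set(), 0
    while len(chosen) < n:
        i = random.randrange(n)          # choose a coordinate (i.e. a neighbor) uniformly
        chosen.add(i)
        if random.random() < 0.5:        # move with probability 1/2, else stay
            state[i] ^= 1
        steps += 1
    return steps

n, trials = 10, 5000
avg = sum(stopping_time(n) for _ in range(trials)) / trials
harmonic = sum(1.0 / i for i in range(1, n + 1))
print(avg, n * harmonic)                 # average T equals n*H_n (coupon collector)
```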
EXAMPLE - CARD SHUFFLING

Suppose we want to shuffle a pack of n cards, numbered 1, 2, ..., n, according to the following policy: move the top card to a random location in the pack.

How long will it take to have all cards in a random distribution (to reach the stationary distribution)?

This problem can be viewed as a random walk on a graph with n! vertices corresponding to all possible permutations. Edges are determined by the shuffling policy.

Denote by BOTTOM the card that was originally at the bottom of the pack. Let T be the step at which BOTTOM is moved from the top of the pack into the pack (that is, the first step at which the card moved from the top is BOTTOM). We claim that T is a strong uniform time.

This is due to the fact that once we remove BOTTOM from the top of the pack, we are already in the stationary distribution. (It can be shown by induction on the number of cards below BOTTOM that the cards below BOTTOM are always in a random order.)

We show now that the above stopping rule behaves as a coupon collector. Indeed, define $T_i$ to be the number of steps until there are i cards below BOTTOM (counting BOTTOM itself among them). Since T is determined by $T_n$ we have

$T_n = T_1 + (T_2 - T_1) + (T_3 - T_2) + \ldots + (T_n - T_{n-1}).$

Moreover, $T_{i+1} - T_i$ has a geometric distribution with parameter $\frac{i}{n}$. This is similar to the situation in coupon collection. Indeed, let $V_j$ denote the number of steps until we see j coupons and let $V = V_n$. The similarity between V and T follows from the fact that

$V = V_1 + (V_2 - V_1) + (V_3 - V_2) + \ldots + (V_n - V_{n-1})$

and that $V_{j+1} - V_j$ also has a geometric distribution, with parameter $\frac{n-j}{n}$. Hence $\Pr[T > t] = \Pr[V > t]$.

ELECTRICAL NETWORKS - BASICS 1/2

Figure: a small network in which the potentials of the nodes are $\phi(u) = 2$, $\phi(a) = \frac{3}{2}$, $\phi(v) = 1$; the voltage difference between u and v is 1 and between a and v is $\frac{1}{2}$.

Kirchhoff's law: The sum of all currents entering a node u of a network equals the sum of all currents leaving u.

Ohm's law: $I = \frac{V}{R}$, where I is the current, V the voltage, and R the resistance.

For each edge e, let $R_e$ be the resistance of e. For each node u, let $i_u$ be the current that enters (and exits) the node u.

Electric potential is the work needed to move a unit charge from infinity to a given point of the force field along an arbitrary path. Electric potential is a quantity characterizing the energy state at a given point of an electric force field.

Voltage is the work done in an electric field when a unit electric charge is moved along a certain path. In a potential electric field the voltage does not depend on the path and equals the difference of potentials.
ELECTRICAL NETWORKS - BASICS 2/2

Figure: the same network with one ampere injected into u and removed from v; the potentials of the nodes are $\phi(u) = 2$, $\phi(a) = \frac{3}{2}$, $\phi(v) = 1$; the voltage difference between u and v is 1 and between a and v is $\frac{1}{2}$; the resistance of the edge between u and v is 2, but the effective resistance between u and v is only 1.

We would like to compute the potential $\phi(v)$ for each node v and the current $i_{uv}$ for each edge e = (u, v). By Ohm's law (V = IR),

$i_{uv} = \frac{\phi(u) - \phi(v)}{R_e}.$

The effective resistance $R_{uv}$ between two nodes u and v is the voltage difference between u and v when one ampere is injected into u and removed from v. Hence

$R_{uv} = \phi(u) - \phi(v).$

Observe that the effective resistance between the endpoints of an edge can be smaller than the resistance of the edge (see the above figure).

COMMUTE TIME and EFFECTIVE RESISTANCE

To each graph G we associate an electrical network N(G) in which the resistance of each edge is 1.

Theorem: The commute time for any two vertices u and v in G is

$C_{uv} = 2m R_{uv} = h_{uv} + h_{vu}.$

Proof: Notation:
■ d(x) is the degree of any node x.
■ $\phi_{zv}$ is the potential of any node z, relative to v, when d(x) units of current enter each node $x \in V$ and 2m units of current exit v.

For every edge $(u, w) \in E$ it holds, by Ohm's law:

$\phi_{uv} - \phi_{wv} = i_{uw} R_{uw} = i_{uw}.$

DERIVATION

Kirchhoff's law implies that for every $u \in V - \{v\}$

$d(u) = \sum_{w \in \Gamma(u)} i_{uw} = \sum_{w \in \Gamma(u)} (\phi_{uv} - \phi_{wv}) = d(u)\,\phi_{uv} - \sum_{w \in \Gamma(u)} \phi_{wv},$

and therefore

$\phi_{uv} = \frac{1}{d(u)} \sum_{w \in \Gamma(u)} (1 + \phi_{wv}).$

On the other hand, by the definition of $h_{uv}$, for every $u \in V - \{v\}$

$h_{uv} = \sum_{w \in \Gamma(u)} \frac{1}{d(u)} (1 + h_{wv}),$

and so we got the same system of equations for $\phi_{uv}$ as for $h_{uv}$; hence $h_{uv} = \phi_{uv}$.

PROOF - CONTINUATION 1.1

Consider now the following network: a current of magnitude 2m enters u and a current of magnitude d(x) exits every node $x \in V$. If the potential of u in this network is assumed to be equal to 0, then the potential of v is equal to $-h_{vu}$.

Let us now perform a superposition of such a network with the network considered on the previous slide (in which d(x) units of current enter each node $x \in V$ and 2m units of current exit v). In the resulting network, all external currents cancel, except for those at the vertices u (where a current of magnitude 2m enters) and v (where a current of magnitude 2m exits).

PROOF - CONTINUATION 1.2

The difference of potentials between u and v is:

$h_{uv} - (-h_{vu}) = h_{uv} + h_{vu} = C_{uv}.$

Therefore $C_{uv}$ is the voltage between u and v in the last network. Hence, by Ohm's law (V = IR), $C_{uv} = 2m R_{uv}$.

Claim: For every pair of vertices u and v, the effective resistance $R_{uv}$ is not more than the distance between u and v in G.
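A minimal numerical check of the theorem $C_{uv} = 2mR_{uv}$ on a small graph: effective resistances are computed from the pseudoinverse of the graph Laplacian (a standard identity, used here as an assumption of the sketch), and hitting times are estimated by simulation.

```python
import random
import numpy as np

# Small undirected graph with unit-resistance edges.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n, m = 4, len(edges)
adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# Effective resistance via the Moore-Penrose pseudoinverse of the Laplacian:
# R_uv = L+_uu + L+_vv - 2 L+_uv.
L = np.zeros((n, n))
for u, v in edges:
    L[u][u] += 1; L[v][v] += 1
    L[u][v] -= 1; L[v][u] -= 1
Lp = np.linalg.pinv(L)
R = lambda u, v: Lp[u][u] + Lp[v][v] - 2 * Lp[u][v]

def hit(u, v, trials=20_000):
    """Simulated hitting time h_uv of the random walk on the graph."""
    total = 0
    for _ in range(trials):
        x, steps = u, 0
        while x != v:
            x = random.choice(adj[x]); steps += 1
        total += steps
    return total / trials

u, v = 0, 3
print(2 * m * R(u, v))          # commute time predicted by the theorem
print(hit(u, v) + hit(v, u))    # simulated h_uv + h_vu, should be close
```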
COVER TIME

Corollary: Let G = (V, E) be a graph, n = |V|, m = |E| and $u, v \in V$.
■ If $(u, v) \in E$, then $C_{uv} \le 2m$.
■ If $u, v \in V$, then $C_{uv} \le 2m(n-1)$.
■ If $u, v \in V$, then $C_{uv} \le n^3$.

Theorem: For a graph G with n nodes and m edges, $C(G) \le 2m(n-1)$.

Proof: Let T be any spanning tree of G = (V, E). Then there is a traversal of T, visiting nodes $v_0, v_1, \ldots, v_{2n-2} = v_0$, that traverses each edge of T exactly once in each direction. Consider a random walk that starts at $v_0$, visits all nodes in the order prescribed by the traversal, and terminates after returning to $v_0$. An upper bound on the expected length of such a walk is an upper bound on $C_{v_0}(G)$:

$C_{v_0}(G) \le \sum_{j=0}^{2n-3} h_{v_j v_{j+1}} = \sum_{(u,v) \in T} C_{uv}.$

Since $(v_j, v_{j+1}) \in E$, $C_{v_j v_{j+1}} \le 2m$ by the previous corollary. Therefore $C_{v_0}(G) \le 2m(n-1)$. The above bound is independent of the choice of $v_0$. Hence $C(G) \le 2m(n-1)$.

SPECIAL TYPES of GRAPHS

Let us derive the cover time for several special graphs.

■ A complete graph $K_n$: The problem of determining the cover time is essentially the coupon collector problem. Therefore $C(K_n) \approx nH_n = n\ln n + O(n)$.
■ A star graph $S_n$: The problem of calculating the cover time is again essentially the coupon collector problem, with two steps per trial. Therefore $C(S_n) \approx 2nH_n = 2n\ln n + O(n)$.
■ A line $L_n$: Let $u_1, u_2$ be the end points of $L_n$. Since $R_{u_1,u_2} = n - 1$, we have
  $C_{u_1,u_2} = 2|E(L_n)| R_{u_1,u_2} = 2(n-1)(n-1) = 2(n-1)^2.$
  By symmetry, $h_{u_1,u_2} = (n-1)^2$, and therefore the expected time to cover $L_n$ starting from an end point is $(n-1)^2$; hence $C(L_n) = \Theta(n^2)$.
■ A lollipop graph: the cover time is $\Theta(n^3)$.

EFFECTIVE RESISTANCE of GRAPHS - II.

The effective resistance R(G) of a graph G is defined by

$R(G) = \max_{\{u,v\} \subseteq V(G)} R_{uv}.$

Theorem:

$m R(G) \le C(G) \le 2e^3 m R(G) \ln n + n.$

Proof. Lower bound: Let $R(G) = R_{uv}$ for some vertices $u, v \in V$. Then

$C(G) \ge \max(h_{uv}, h_{vu}) \ge \frac{C_{uv}}{2} = \frac{2m R_{uv}}{2} = m R(G).$

EFFECTIVE RESISTANCE of GRAPHS - III.

Upper bound: Create a random walk of length $2e^3 m R(G) \ln n$ and divide it into $\ln n$ phases of the same length [= $2e^3 m R(G)$].

For any vertices u and v, the hitting time $h_{uv}$ is at most $2m R(G)$. (This is the average time to get through any of the $\ln n$ phases.)

By Markov's inequality ($\Pr[Y \ge t] \le \frac{E[Y]}{t}$), the probability that v is not visited during a single phase is at most

$\frac{2m R(G)}{2e^3 m R(G)} = \frac{1}{e^3},$

where $t = 2e^3 m R(G)$ and $E[Y] \le 2m R(G)$.

Therefore, the probability that v is not visited during any of the $\ln n$ phases is at most $\left(\frac{1}{e^3}\right)^{\ln n} = \frac{1}{n^3}$.

Summing over the n choices of v, we get that the probability that there is a node not visited within $2e^3 m R(G) \ln n$ steps is at most $\frac{1}{n^2}$.

When this happens (that is, if there is a node not visited during $2e^3 m R(G) \ln n$ steps), we "continue to walk until all nodes are visited" (and $n^3$ steps are enough for that - which happens with probability $1/n^2$). The expected total time is therefore

$2e^3 m R(G) \ln n + \frac{1}{n^2} \cdot n^3 = 2e^3 m R(G) \ln n + n.$

APPLICATION of RAYLEIGH's MONOTONICITY LAW

Rayleigh's monotonicity law states that the effective resistance of a graph does not increase (decrease) whenever the resistance of any edge of the graph is decreased (increased).

Corollary: The effective resistance of a graph cannot increase by adding edges.

Lemma: The effective resistance of a graph is not more than its diameter diam(G).
Proof: The whole graph can be generated by adding edges to a subgraph that corresponds to a shortest path realizing the diameter.

Fact: If G is a k-regular graph with n nodes, then $\mathrm{diam}(G) \le \frac{3n}{k}$.

Theorem: If G is a k-regular graph with n nodes, then $C(G) = O(n^2 \ln n)$.
Proof: Since $m \le \frac{nk}{2}$ and, by the last theorem, $C(G) = O(m R(G) \ln n)$, and since $R(G) \le \mathrm{diam}(G) \le \frac{3n}{k}$, we have

$C(G) = O\left(\frac{nk}{2} \cdot \frac{3n}{k} \cdot \ln n\right) = O(n^2 \ln n).$
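Returning to the special graphs above, a minimal simulation comparing the empirical cover time of the complete graph $K_n$ (about $nH_n$) and of the line $L_n$ started at an end point (about $(n-1)^2$); n = 20 and the number of trials are illustrative choices.

```python
import random

def cover_time_from(adj, start, trials=2000):
    """Empirical expected number of steps to visit every node, starting from `start`."""
    total = 0
    for _ in range(trials):
        node, visited, steps = start, {start}, 0
        while len(visited) < len(adj):
            node = random.choice(adj[node])
            visited.add(node)
            steps += 1
        total += steps
    return total / trials

n = 20
complete = {v: [u for u in range(n) if u != v] for v in range(n)}
line = {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}

harmonic = sum(1.0 / i for i in range(1, n + 1))
print(cover_time_from(complete, 0), n * harmonic)      # roughly n H_n
print(cover_time_from(line, 0), (n - 1) ** 2)          # roughly (n-1)^2 from an end point
```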
USTCON PROBLEM

USTCON is the problem of deciding, given an undirected graph G and two vertices s and t, whether there is a path from s to t.

Let RLP be the family of languages L for which there exists a probabilistic off-line log-space Turing machine M such that for any input x

$\Pr[M \text{ accepts } x] \begin{cases} \ge \frac{1}{2} & \text{if } x \in L \\ = 0 & \text{if } x \notin L. \end{cases}$

Theorem: USTCON ∈ RLP.

Proof: Let a log-space bounded probabilistic Turing machine M simulate a random walk of length $2n^3$ through the given graph, starting from s. If M encounters t during such a walk, it outputs YES; otherwise it outputs NO.

The probability of the output YES instead of NO is 0. What is the probability that M outputs NO instead of YES? We know that $h_{st} \le n^3$. By Markov's inequality, if t is reachable from s, then the probability that t is not visited during $2n^3$ steps is at most $\frac{1}{2}$.

M needs space only to count up to $2n^3$ and to keep track of its position in the graph during the walk. Therefore it needs space $O(\lg n)$.

NONUNIFORM, DETERMINISTIC, LOG-SPACE ALGORITHMS for USTCON

We will consider regular graphs of degree d with n nodes such that all edges of each node are labeled by labels from {1, 2, ..., d}.

UNIVERSAL TRAVERSAL SEQUENCE

A sequence of edge labels from {1, 2, ..., d} is called universal for a class G of such labeled graphs if, for every graph in G and every start vertex, walking according to the sequence visits all nodes of the graph.

If a sufficiently long sequence of labels is chosen at random, then for a fixed graph and a fixed start vertex the probability that the corresponding walk fails to visit all nodes is small. Summing up over the n choices of the start vertex v and the |G| choices of the labeled graph G, the probability that the random walk (sequence) fails to be universal is less than 1. As a consequence, there is a sequence of such a length that is universal for the class G. (We have just used the probabilistic method.)

HIDDEN MARKOV MODELS (HMM)

HIDDEN MARKOV MODELS - DEFINITIONS

Hidden Markov Models (HMM) have, similarly to the Markov chains considered so far, a set of states and their transition probabilities (that are given). However, in addition, each state has a set of outputs it can produce, according to its given emission probabilities, each time the system comes to that state.

However, in a HMM the states are hidden, as are their transition and emission probabilities, from any observer. An observer can see only the sequences of outputs the states produce. The task is to determine, from a large amount of such outputs, all parameters of the model, i.e. its transition and emission (production) probabilities.

Hidden Markov Models have been very successfully used in pattern recognition, speech recognition, handwriting and gesture recognition, machine translation, gene prediction, bio-informatics, human activity recognition, as well as in many other applications. In general, a HMM can be applied when the goal is to recover a data sequence that is not immediately observable (but other data that depend on the sequence are).

HMM - Figure

Figure: Probabilistic parameters of a hidden Markov model (example):
X - states
y - possible observations
a - state transition probabilities
b - output (emission) probabilities
EXAMPLE: URN PROBLEM

In a room not visible to an observer there is a robot and urns $X_1, X_2, \ldots, X_n$, each containing a known mixture of balls labeled $y_1, y_2, \ldots$.

The robot works as follows. It chooses randomly, according to a given initial probability distribution, one urn, randomly draws a ball from it, emails its label to the observer, puts the ball back and, according to the probability distribution associated with that urn, chooses the next urn, and the process continues. This process can continue many times.

Observers see each time only a sequence of labels $y_{i_1}, y_{i_2}, \ldots, y_{i_k}$.

The task for the observers is to determine the parameters: the transition probabilities between the states (of the ordinary Markov chain behind the process) and the emission probabilities - actually the numbers of the different balls in the different urns.

EXAMPLE - WEATHER

Alice and Bob live far apart from each other and talk daily about what Bob did the previous day. His actions (walking, shopping, cleaning) depend on the weather.

Figure: a two-state weather HMM (rainy/sunny) with outputs walk, shop, clean.

From their phone calls Alice tries to deduce how the weather was, and is, in the place where Bob lives.

INFERENCE PROBLEMS

In the following, x(t) is the state at time t and y(t) is the output at time t.

Probability of an observed sequence: The probability of observing an output sequence $Y = y(0), y(1), \ldots, y(l-1)$ of length l is given by

$\Pr(Y) = \sum_X \Pr(Y \mid X)\Pr(X),$

where the sum runs over all possible hidden-node sequences $X = x(0), x(1), \ldots, x(l-1)$. This problem can be handled efficiently using the so-called forward algorithm.

Filtering: The task is to compute, given the chain's parameters and a sequence of observations, the distribution of the last state at the end of the observations, i.e. to compute

$\Pr(x(t) \mid y(1), \ldots, y(t)).$

EXAMPLE

Figure: a small HMM together with one of its output sequences and the state sequences that could have produced it.

EXAMPLE 2. Markov model

For the (sunny/rainy/foggy) weather Markov model of the corresponding figure, show that:
■ Provided that today is sunny, 0.04 is the probability that tomorrow is sunny and the day after is rainy.
■ 0.34 is the probability that it will be rainy two days from now, provided it is foggy today.

EXAMPLE 2. Hidden Markov Model

Let us add to the previous model two outputs, "umbrella" and "no umbrella", and let the probability of having an umbrella be 0.1 (0.8) [0.3] for a sunny (rainy) [foggy] day.

Suppose you were locked in a room for several days and you were asked about the weather outside. The only piece of evidence you have is whether the man bringing you food carries an umbrella or not.

■ Suppose the day you were locked in was sunny. The next day the man carrying food came with an umbrella. Assume that the prior probability of the man carrying an umbrella on any day is 0.5. Show that 0.08 is the probability that the second day was rainy.
■ Suppose the day you were locked in the room was sunny and that the man brought an umbrella on day 2 but not on day 3. Show that 0.19 is the probability that it was foggy on day 3.
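A minimal sketch of the forward (filtering) computation described above. Only the umbrella emission probabilities 0.1/0.8/0.3 come from the example; the weather transition matrix below is an invented placeholder (the lecture's figure is not reproduced here), so the printed values are illustrative and are not meant to reproduce the 0.08 and 0.19 answers of the exercise.

```python
import numpy as np

states = ["sunny", "rainy", "foggy"]

# ASSUMED transition matrix (rows: today, columns: tomorrow) - an illustrative
# placeholder, not the matrix from the lecture's figure.
T = np.array([
    [0.8, 0.05, 0.15],
    [0.2, 0.6,  0.2 ],
    [0.2, 0.3,  0.5 ],
])

# Emission probabilities of the observation "umbrella", as given in the example.
p_umbrella = np.array([0.1, 0.8, 0.3])

def forward_filter(start_state, observations):
    """Return Pr(x(t) | y(1..t)) for the last day, via the forward algorithm.

    observations is a list of booleans: True = umbrella seen that day."""
    alpha = np.zeros(len(states))
    alpha[states.index(start_state)] = 1.0       # day-1 state is known
    for saw_umbrella in observations:
        emit = p_umbrella if saw_umbrella else 1.0 - p_umbrella
        alpha = (alpha @ T) * emit               # predict, then weight by the evidence
        alpha /= alpha.sum()                     # normalize to a conditional distribution
    return dict(zip(states, alpha))

print(forward_filter("sunny", [True]))           # distribution of day-2 weather
print(forward_filter("sunny", [True, False]))    # distribution of day-3 weather
```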
HMM - SPEECH RECOGNITION - EXAMPLE

Figure: a simple hidden Markov model topology to recognize two spoken words: "one" is modelled as the phoneme sequence W-AX-N and "two" as T-OO; each phoneme has its own small HMM, connected between an entry node and an exit node.

HIERARCHICAL HIDDEN MARKOV MODEL

In a Hierarchical Hidden Markov Model (HHMM) each state can itself be a HHMM.

Figure: Illustration of the structure of a HHMM. Gray lines show vertical transitions. The horizontal transitions are shown as black lines. The light gray circles are the internal states and the dark gray circles are the terminal states that return control to the activating state. The production states are not shown in this figure.

HHMM and SPEECH RECOGNITION

A huge amount of samples of speech, from many different individuals, is applied to a HHMM to infer the hierarchy of states and all transition and emission probabilities (essentially a simulation of the neocortex producing speech), and then the resulting HHMM is used to recognize new utterances.

APPENDIX

SECRETARY PROBLEM

The problem:
■ There is a single secretary position to fill in an institute.
■ There are n applicants for the position, and the value of n is known.
■ Each applicant has a unique "quality value" - the interviewing committee has no knowledge of the quality values of those applicants that have not been interviewed yet, and no knowledge of how large the best quality value of the applicants is.
■ The applicants are interviewed in a random order.
■ After each interview, the applicant is immediately accepted or rejected.
■ The decision to accept or reject an applicant can be based only on the relative "quality values" of the applicants interviewed so far.
■ Rejected applicants cannot be recalled.
■ The goal is to select the applicant with the best "quality value".
■ How should the selection committee best proceed?

SOLUTION

Terminology: A candidate is an applicant who, when interviewed, is better than all the applicants interviewed previously. Since the goal in the secretary problem is to select the single best applicant, only candidates will be considered for acceptance.

Optimal policy for this problem (the stopping rule): For large n the optimal policy is to interview and reject the first n/e applicants and then accept the next one who is better than all applicants interviewed till then.

As n gets larger, the probability of selecting the best applicant goes to 1/e, which is around 37%.

ANDREY ANDREEVICH MARKOV

■ Russian mathematician (1856-1922).
■ He introduced Markov models in 1906.
■ The original motivation was to extend the law of large numbers to dependent events.
■ In 1913 he applied his findings to the first 20 000 letters of Pushkin's Eugene Onegin.
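Returning to the secretary problem, a minimal simulation of the 1/e rule described above; n = 100 applicants and the number of trials are illustrative choices.

```python
import math
import random

def secretary_trial(n):
    """One run of the 1/e rule; returns True if the best applicant is selected."""
    quality = list(range(n))
    random.shuffle(quality)                  # applicants arrive in random order
    r = int(n / math.e)                      # observe-and-reject phase
    best_seen = max(quality[:r]) if r > 0 else -1
    for q in quality[r:]:
        if q > best_seen:                    # first candidate better than all seen so far
            return q == n - 1                # did we pick the overall best?
    return False                             # never accepted anyone

n, trials = 100, 20_000
success = sum(secretary_trial(n) for _ in range(trials)) / trials
print(success, 1 / math.e)                   # empirical probability vs. 1/e ~ 0.37
```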