Random number generators Non-uniform random variable generation Monte Carlo methods for inference Bi7740: Scientific computing Introduction to Monte Carlo methods Vlad Popovici popovici@iba.muni.cz Institute of Biostatistics and Analyses Masaryk University, Brno Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Supplemental bibliography Gentle, J.E.: Random number generation and Monte Carlo methods. 2003. Springer. 2nd Ed. Jones O., Maillardet R., Robinson, A. Scientific programming and simulation using R. 2009., CRC Press. Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Outline 1 Random number generators 2 Non-uniform random variable generation 3 Monte Carlo methods for inference Inference about the mean Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Numerical experiments: simulations General approach: 1 identify the random variable of interest X 2 identify/postulate its distributional properties 3 generate one or several large samples identical and independely distributed X1, . . . , Xn from the distribution of X 4 estimate the quantity of interest (e.g. estimate EX using sample average) and assess its accuracy (e.g. via confidence intervals) Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Random number generators (RNGs) all random variables can be generated by transforming a uniformly distributed random variable X ∈ U(0, 1) there is no algorithmic (deterministic) way of generating infinitely long sequences of true random numbers computers generate pseudorandom numbers there exist devices to generate (believed to be) random sequences: e.g. radioactive decay: the time elapsed between emission of two consecutive particles (α, β, γ). See: http://www.fourmilab.ch/hotbits Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d two aspects: 1 generate good pseudorandom numbers in U(0, 1): independent and uniformly distributed 2 find proper trasformations to the desired distribution you cannot prove that an RNG is truly random there are a batteries of tests that an RNG must pass to be acceptable for any RNG, one can find a statistical test that will reject it as a good generator Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d Formalism: an RNG is a structure (S, µ, f, U, g) where S is a finite set of states µ is a probability distribution on S used to select the initial seed (state) s0 f : S → S is a transition function. The state of the RNG evolves according to the recurrence si = f(si−1) for i ≥ 1 U is the output space. Usually U = (0, 1) g : S → U is the output function. The numbers ui = g(si) are called random numbers produced by the RNG Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d S is finite ⇒ ∃l ≥ 0, j > 0 finite such that sl+j = sl this implies that ∀i ≥ l, ui+j = ui since both f and g are deterministic the smallest positive j for which this happens is called period lenght of the RNG and is denoted by ρ obviously, ρ ≤ |S| ex.: if the state is represented on k bits, then ρ ≤ 2k Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d Quality criteria: extremly long period ρ efficient implementation repeatability portability availability of jump-ahead property: quickly compute the si+v given si, so you can partition a long sequence in subsequences to be used in parallel randomness Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d Coverage: let Ψt = {(u0, . . . , ut )|s0 ∈ S} is Ψt uniformly covering the hypercube (0, 1)t ? tests of discrepancy between the empirical distribution of Ψt and the uniform distribution figure of merit: a measure of the coverage quality Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d Randomness and i.i.d: statistical tests: try to detect empirical evidence against H0: "ui are realizations of i.i.d U(0, 1)". Example: diehard tests (Marsaglia, 1995) passing more tests improves the confidence in RNG, but cannot prove the RNG is foolproof for all cases good RNG passes a set of simple tests polynomial time perfect RNG: there is no polynomial-time algorithm the can predict any given bit of ui with a probability of success ≥ 1/2 + 2−k , for some > 0, after observing u0, . . . , ui−1 the usual RNGs are not polynomial time perfect Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d Multiple Recursive Generator has a general recurrence xi = (a1xi−1 + · · · + ak xi−k ) mod m where m (modulus) and k (order) are integers carefully selected, and coefficients a1, . . . , ak ∈ Zm. The state is si = (xi−k+1, . . . , xi)T . When m is prime, it is possible to select ai such that the period length ρ = mk − 1. Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d Example (historical, not in serious use anymore): MLCG (Lehmer, 1948): multiplicative linear congruential generator: si+1 = (a1si + a0) mod m This generates integers that are converted to (0, 1) by division with m. Weakness: (Marsaglia, 1968): if (si, . . . , si+d) represent some points in a d−dimensional space, they have a lattice structure: they lie in a number of specific hyperplanes. Famous multipliers (a0 = 0): a1 = 23, m = 108 + 1: original version, has higher order correlations a1 = 65539, m = 229 : infamous RANDU generator (IBM 360 series, in the 1970s): catastrophic higher order correlations a1 = 69069, m = 232 (Marsaglia, 1972): good properties and converage up to 6 dimensions Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d Exercise: write a function rng . mlcg = function (n , a1=20, a0=0 , m=53, s0=21) which implements the procedure MLCG (with some default parameters), and returns a sequence of n numbers. generate a sequence and plot ui+1 vs ui > u = rng . mlcg (200) > plot ( u [2:200] , u [ 1 : 1 9 9 ] ) discuss! Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d Exercise: let n = 20000 execute > u = rng . mlcg (n , a1=65539, a0=0 , m=2^31, s0=10) > z = (u−0.5) / (2^31 −1) # map to (0 ,1) > hist ( z ) # i s i t reasonably uniform? > z1 = z [ 1 : ( n −2)]; z2 = z [ 2 : ( n −1)]; z3 = z [ 3 : n ] > plot ( z1 , z2 , pch=19, xlim=c (0 ,1) , ylim=c ( 0 , 1 ) ) > x11 ( ) ; plot ( z1 [ z3 < 0.01] , z2 [ z3 < 0.01] , . . . pch=19, xlim=c (0 ,1) , ylim=c ( 0 , 1 ) ) discuss! Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference RNGs, cont’d In R: don’t let the RNG to be "randomly" selected! for serious work, always set the seed, check the RNG, etc: they might be version-dependent; also you want other to be able to reproduce your results read the help for RNG uniform random numbers are generated with runif() function check also {d, p, q}unif() functions read the help for .Random.seed() Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Outline 1 Random number generators 2 Non-uniform random variable generation 3 Monte Carlo methods for inference Inference about the mean Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Non-uniform r.v. generation (NRNG) Requirements: correctness: a good approximation of the theoretical distribution robustness: RNG should work well on a large range of parameters efficiency Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference NRNG: inversion method best choice, when feasible to generate X with distribution function F, starting from a uniform variate U ∈ (0, 1), apply the inverse F−1 to U: X = F−1 (U) := min{x|F(x) ≥ U} easy to see that the distribution of X is as required: P[X ≤ x] = P[F−1 (U) ≤ x] = P[U ≤ F(x)] = F(x) for some distributions, F−1 can be obtained analytically. Ex.: Weibull distribution F(x) = 1 − exp(−(x/β)α), with α, β > 0; has the inverse F−1 (U) = β[− ln(1 − U)]1/α other distributions do not have a close form inverse: e.g. normal, χ2 ,... ⇒ approximations Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference NRNG: inversion method, cont’d Example (principle of inversion): # return X with cdf F , f o r a # uniform r . v . 0 < U < 1 # ( look −up table method ) X = 0 while (F(X) < U) X = X + 1 return (X) Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference NRNG: Rejection method consider F with a compact support and bounded F(x) ≤ k consider a series of points (Xi, Yi) uniformly distributed under the density function the distribution of Xi is the same as the distribution of X (F): P[a < Xi < b] = probability of a point falling in the region = b a F(x)dx procedure: 1 generate X ∼ U[a, b] and Y ∼ U[0, 1] independently 2 if Y < F(X) return X, otherwise repeat Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference NRNG: Rejection method Exercise: Implement the rejection method for generating random variates from the pdf F(x) =    x if 0 < x < 1 2 − x if 1 ≤ x < 2 0 otherwise Generate n = 5000 r.v., plot their histogram (use hist(..., freq=FALSE, ylim=c(0,1,01)) and the original pdf. Histogram of z z Density 0.0 0.5 1.0 1.5 2.0 0.00.20.40.60.81.0 Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Generating normally distributed r.v. you can use the rejection method alternative: Box-Muller algorithm: based on the observation that the coordinates of points in a 2D Cartesian system described by 2 independent normal distributions correspond to polar coordinates that are realizations of 2 independent uniform distributions Box-Muller transform: if U1, U2 are independent uniformly distributed on (0,1), then Z1 = r cos θ = −2 ln U1 cos(2πU2) Z2 = r sin θ = −2 ln U1 sin(2πU2) Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Improved Box-Muller algorithm, with rejection step: 1 generate U1, U2 ∼ U(−1, 1) 2 accept S2 = U2 1 + U2 2 if S2 < 1, else go to step 1 3 set W = −2ln S2 S2 4 return X = U1W and Y = U2W Exercise: Implement the procedure above in R! Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Other methods for NRNG kernel density estimation: approximate the inverse using a kernels for which efficient generators exist composition: consider F to be a convex combination of several distributions Fj: F(x) = j pjFj(x) To generate from F, one generates J with probability pj and then generates X from Fj convolution: if X = Y1 + · · · + Yn, with Yj independent with specified distributions, then generate the Yj’s and sum them etc etc Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Efficient implementations exist in R for: normal distribution: rnorm; log-normal: dlnorm binomial distribution: rbinom Poisson distribution: rpois . . . Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Inference about the mean Outline 1 Random number generators 2 Non-uniform random variable generation 3 Monte Carlo methods for inference Inference about the mean Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Inference about the mean MC methods for inference General approach: 1 identify the random variable of interest X 2 identify/postulate its distributional properties 3 generate one or several large samples identical and independently distributed X1, . . . , Xn from the distribution of X 4 estimate the quantity of interest (e.g. estimate EX using sample average) and assess its accuracy (e.g. via confidence intervals) Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Inference about the mean Outline 1 Random number generators 2 Non-uniform random variable generation 3 Monte Carlo methods for inference Inference about the mean Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Inference about the mean MC inference about the mean Reminder: problem: compute z = EZ when x is not available analytically, but Z can be simulated consider n replicates Z1, . . . , Zn of Z and estimate z by the empirical mean ˆz = i Zi/n denote σ2 = Var{Z} < ∞ central limit theorem: √ n(ˆz − z) → N(0, σ2 ), as n → ∞ from this, an 1 − α confidence interval can be obtained as ˆz − z1−α/2 σ √ n , ˆz − zα/2 σ √ n where zα denotes the α−quantile of the normal distribution (Φ(zα) = α) Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Inference about the mean MC for inference about the mean Implement the following procedure: write the R function pdf1(n) to generate n = 1000 r.v. drawn from f(X) = 0.2N1(X) + 0.3N2(X) + 0.5N3(X) where Ni are Gaussians with parameters µ1 = 0, σ1 = 0.5, µ2 = 6.5, σ2 = 1.25, µ3 = 14.5, σ3 = 0.75. Do not use for loops or any function from the various nonstandard packages! plot the density of the sample drawn and compare it with the theoretical plot of the mixture density repeat the procedure for n = 10000 and n = 100000. what do you see? Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Inference about the mean -5 0 5 10 15 20 0.000.050.100.15 density.default(x = x) N = 1000 Bandwidth = 1.294 Density Vlad Bi7740: Scientific computing Random number generators Non-uniform random variable generation Monte Carlo methods for inference Inference about the mean generate p = 1000 samples of n = 1000 r.v.: X[p × n] compute ˆxi as the sample average for each of the p samples and the grand average ˆX what is the true mean of this mixture of Gaussians? test the normality of the distribution of ˆxi (e.g. shapiro.test()) estimate the 95% empirical confidence interval (using quantiles of the distribution of ˆxi) and compare it with the theoretical one (using sample variance for σ2 ) obtained from a single sample (say, X[1,]) Vlad Bi7740: Scientific computing