Systems Identification in Systems Biology David ˇSafr´anek Masaryk University Czech Republic Brno, 6.10.2014 1/48 Outline 1 Introduction 2 The Approach: Parametric Identification 3 System Identifiability Problem 4 Overview of Approaches Brno, 6.10.2014 2/48 Outline 1 Introduction 2 The Approach: Parametric Identification 3 System Identifiability Problem 4 Overview of Approaches Brno, 6.10.2014 3/48 Dynamical Systems in Engineering A Stirred Tank F1, F2 ... input flows (controllable) c1, c2 ... input concentrations (uncontrollable) F, c ... outputs of the system, can be observed (measured) Brno, 6.10.2014 4/48 Dynamical Systems in Engineering A Stirred Tank F1, F2 ... input flows (controllable) c1, c2 ... input concentrations (uncontrollable) GOAL: Keep the outputs F, c constant. Brno, 6.10.2014 4/48 Dynamical Systems in Engineering A Stirred Tank – Mathematical Model dynamics of the volume: dV dt = F1 + F2 − F F = a √ 2gh (Torricelli’s law of fluid dynamics) where a ... effective area of the flow, g ∼ 10m/sec2 V = Ah where A is tank area (independent on h) Brno, 6.10.2014 5/48 Dynamical Systems in Engineering A Stirred Tank – Mathematical Model dynamics of the volume: dV dt = F1 + F2 − F F = a √ 2gh (Torricelli’s law of fluid dynamics) where a ... effective area of the flow, g ∼ 10m/sec2 V = Ah where A is tank area (independent on h) problems: a is difficult to obtain, does this form of the Torricelli’s law really apply for the real case? Brno, 6.10.2014 5/48 Dynamical Systems in Biology Processes Driving the Living Cell nutrients enzymes metabolic products signals proteins regulatory elements METABOLISM PROTEOSYNTHESIS Brno, 6.10.2014 6/48 Dynamical Systems in Biology Processes Driving the Living Cell nutrients enzymes metabolic products signals proteins regulatory elements METABOLISM PROTEOSYNTHESIS questions: how to control cyanobacteria to gain max ethanol how to control E. coli to gain insuline, ... Brno, 6.10.2014 6/48 Dynamical Systems in Biology Signalling Pathways joint work with P. Krejˇc´ı, Masaryk University Brno/Medical Genetics Institute, Cedars-Sinai Medical Center, L.A. Brno, 6.10.2014 7/48 Dynamical Systems in Biology Signalling Pathways What is the right topology? Brno, 6.10.2014 8/48 Wet-lab Measurements Western blots/Northern blots western blots measurements of protein binding (presence of certain proteins) Brno, 6.10.2014 9/48 Wet-lab Measurements Photobioreactor Data Optical density as a proxy of chlorophyll content and cell count Concentration of dissolved O2 and CO2 influenced by photosynthetic activity Rate of respiration as an indicator of metabolic changes Rate of oxygen evolution Červený, J., Nedbal, L. (2009) Metabolic rhythms of the cyanobacterium Cyanothece sp. ATCC 51142 correlate with modeled dynamics of circadian clock. J. Biol. Rhythms 24, 295-303. Brno, 6.10.2014 10/48 Wet-lab Measurements Fluorometer Data Brno, 6.10.2014 11/48 Outline 1 Introduction 2 The Approach: Parametric Identification 3 System Identifiability Problem 4 Overview of Approaches Brno, 6.10.2014 12/48 The Approach: System Identification INPUT: controlled perturbance of input stimuli OUTPUT: measurements of observed variables GOAL: find a system that reliably maps INPUT to OUTPUT Brno, 6.10.2014 13/48 The Approach: System Identification INPUT: controlled perturbance of input stimuli typically interesting patterns exploring most of (expected) systems response pulses, oscillations, ... OUTPUT: measurements of observed variables time-series or steady state data not all variables might be observed measurements might be very imprecise ⇒ noisy data GOAL: find a system that reliably maps INPUT to OUTPUT mapping might be non-linear extrinsic noise on both input, output side system might be affected by intrinsic noise (internal stochasticity) Brno, 6.10.2014 13/48 System Identification Workflow Brno, 6.10.2014 14/48 System Identification Workflow Modelling in Systems Biology SBML, diferenciální rovnice, boolovská sít, Petriho sít, ... biological knowledge databases biological network hypothesis model analysis analytical methods, model checking static analysis, numerical simulation, new hypothesis inference gene reporters, DNA microarray, mass spectrometry, ... emergent properties model questions hypothesis testing, property detection, model validation network reconstruction model specification Brno, 6.10.2014 15/48 System Identification Workflow Modelling in Systems Biology Brno, 6.10.2014 16/48 System Identification Concepts system S mathematical description of the real-world process can be an idealization not necessarily required to be known model structure M non-parametric (table, mapping, frequency diagram, ...) parametric (with a parameter vector θ) M(θ) identification method I depends on available data, kind of the process, ... experimental condition E concrete setting of identification experiment selection and generation of input signals prefiltering of data Brno, 6.10.2014 17/48 Parametric Identification: Problem Statement Definition Parametric model M(θ) describing n dynamically evolving autonomous variables is defined by a set of equations: ˙x(t) = f (x(t), u(t), p) y(t) = g(x(t), s) + (t) where x(t) ∈ Rn for t ≥ 0 is a vector of internal model states u(t) ∈ Ru for t ≥ 0 is a vector of input stimuli y(t) ∈ Rm for t ≥ 0 is a vector of observables (t) is a normally distributed measurement noise If m < n we speak about partially observable models. Parameter θ is defined as a vector p, x(0), s . Brno, 6.10.2014 18/48 Parametric Identification: Problem Statement χ2 (θ) = m k=1 d l=1 yD kl − yk(θ, tl ) 2 yD kl is lth measurement point of the observable yk taken at time tl yk(θ, tl ) is model-predicted yk at time tl by employing parameter estimate θ parameter estimate ˆθ is obtained as a value that minimizes χ2(θ): ˆθ = argmin χ2 (θ) . objective function and reduction to optimisation problem Brno, 6.10.2014 19/48 Parametric Identification: Problem Statement Interpretation in Biology internal states – biochemical substances in the cell observables – substances that can be measured in time (e.g., metabolites or fluorescence reporters) input stimuli – profile of nutrient support, signalling stimuli or light program differential equations define continuous-time deterministic (population-average) evolution of biochemical substances autonomity comes from biochemistry and thermodynamics mass-action kinetics, enzyme kinetics, ... in this setting x(t) and p are always positive Brno, 6.10.2014 20/48 Parametric Identification: Problem Statement Mathematical Models in Biology mechanistic models mass-action systems describes rate of any elementary reaction n i=1 Xi → ...: v = k n i=1 Xσi i where σi denotes kinetic order given by stoichiometry easily obtainable model structure if reaction network is known non-linearity is regular if stoichiometry ≤ 1 typically leads to over-parametrised models Michaelis-Menten systems enzyme kinetics based on pseudo-steady-state approximation reduces number of variables and parameters but for general case very complicated non-linear equations similar are Hill systems (generalisation of MM) Brno, 6.10.2014 21/48 Parametric Identification: Problem Statement Mathematical Models in Biology canonical models S-systems for each species Xi one set of influxes and one set of effluxes is specified in terms of power-law functions: ˙Xi = αi n j=1 X σij j − βi n j=1 X ρij j where n is the number of all system variables, α, β are rate constants for production and degradation, σ, ρ ∈ R are kinetic orders generalised mass-action (GMA) systems for each species Xi a sum of influxes/effluxes is specified (not aggregated) ˙Xi = ni k=1 γik n j=1 X fikj j where nj is number of fluxes affecting Xi , γ positive rate constants, and f ∈ R Brno, 6.10.2014 22/48 Outline 1 Introduction 2 The Approach: Parametric Identification 3 System Identifiability Problem 4 Overview of Approaches Brno, 6.10.2014 23/48 System Identifiability: Theoretical Concept Define the (theoretical) set of exact parameter values: DT (S, M) = {θ | M(θ) matches the system behaviour } Ideally this set should be a singleton. In case of higher cardinality we speak about overparameterization. Assume an estimate ˆθ(N; S, M, I, E) where N is the number of measurements in observed variable y. Brno, 6.10.2014 24/48 System Identifiability: Theoretical Concept Define the (theoretical) set of exact parameter values: DT (S, M) = {θ | M(θ) matches the system behaviour } Ideally this set should be a singleton. In case of higher cardinality we speak about overparameterization. Assume an estimate ˆθ(N; S, M, I, E) where N is the number of measurements in observed variable y. Definition System S is (parameter) identifiable under M, I and E iff ˆθ(N; S, M, I, E) → DT (S, M) as N → ∞. Brno, 6.10.2014 24/48 System Identifiability: Confidence Intervals ˆθi is associated a confidence interval [σ− i , σ+ i ] with the meaning that true value of θi is located in [σ− i , σ+ i ] with probability α asymptotic confidence σ± i = ˆθi ± ∆α(χ2) · Cii where ∆α(χ2 ) is α-quantile for χ2 C = 2 · H−1 H is Hessian matrix (describing curvature of χ2 around ˆθi by second-order partial derivatives) Brno, 6.10.2014 25/48 System Identifiability: Confidence Intervals ˆθi is associated a confidence interval [σ− i , σ+ i ] with the meaning that true value of θi is located in [σ− i , σ+ i ] with probability α asymptotic confidence σ± i = ˆθi ± ∆α(χ2) · Cii where ∆α(χ2 ) is α-quantile for χ2 C = 2 · H−1 H is Hessian matrix (describing curvature of χ2 around ˆθi by second-order partial derivatives) gives a good approximation of actual uncertainty of ˆθi if: data have small error amount of data is large wrt number of parameters exact if y(t) depends linearly on θ Brno, 6.10.2014 25/48 System Identifiability: Confidence Intervals finite sample confidence {θ | χ2 (θ) − χ2 (ˆθ) < ∆α} where ∆α is α-quantile as in the previous case gives an approximation of actual uncertainty of ˆθi up-to a statistically computed threshold Brno, 6.10.2014 26/48 System Identifiability Definition Parameter θi is identifiable iff the confidence interval [σ− i , σ+ i ] of the estimate ˆθi is finite. Brno, 6.10.2014 27/48 System Identifiability Definition Parameter θi is identifiable iff the confidence interval [σ− i , σ+ i ] of the estimate ˆθi is finite. Reasons leading to non-identifiability: structural: model structure practical: precision of measured data Brno, 6.10.2014 27/48 Structural Identifiability Definition A parameter θi is structurally identifiable if a unique minimum of χ2(θ) exists with respect to θi . structural identifiability requires uniqueness of the solution redundant parameterisation of the model causing insufficient mapping of internal states x to observables y denote θamb ⊂ θ the set of ambiguous parameters values of θamb may be varied without any change in y (and thus χ2(θ) keeps constant) in such a case there must be functional relations h among the parameters in θamb that are invariant wrt χ2(θ), and moreover: ∀i, θi ∈ θamb. σ− i = −∞ ∧ σ+ i = ∞ Brno, 6.10.2014 28/48 Structural Identifiability Structuraly Non-identifiable Parameters functional relation between parameters: h(θamb) = θ1 · θ2 − 10 = 0 Brno, 6.10.2014 29/48 Structural Identifiability Structuraly Identifiable Parameters Brno, 6.10.2014 30/48 Practical Identifiability Definition A parameter estimate ˆθi is practically non-identifiable if the finite sample confidence interval is infinitely extended in decreasing and/or increasing direction although there exists a unique minimum of χ2. practical identifiability implies structural identifiablitity practical non-identifiability does not decide on structural identifiability detailed analysis can be used to improved experiment design Brno, 6.10.2014 31/48 Structural Identifiability Structuraly Non-identifiable System confidence region is infinitely extended for θ1 → ∞ and θ2 → ∞ Brno, 6.10.2014 32/48 Detecting Identifiability differential algebraic methods to analyse the system equations can detect structural identifiability, computionally hard detection of χ2 flateness using simulated and experimental data approximation of curvature measures by quadratic approximation of χ2 at ˆθ computation of Hessian or Fisher information matrix appropriate for linear relations h among parameters practical non-identifiability cannot be detected Brno, 6.10.2014 33/48 Detecting Identifiability Profile Likelihood Method by Raue et al. 2009 explore the parameter space for each parameter in the direction of least increase in χ2 in particular this allows to follow the functional relations h(θsub) = 0 for practical identifiability detect crossing of the quantile threshold profile likelihood χ2 PL is defined for each parameter θi : χ2 PL(θi ) = minθj=i χ2 (θ) . Brno, 6.10.2014 34/48 Experiment Design Profile Likelihood Method by Raue et al. 2009 suggestion of additional targeted measurements need measurements that narrow the confidence interval explore trajectories along PL of θi to improve estimation of θi Brno, 6.10.2014 35/48 Parameter Identification Signalling Pathway Example by Raue et al. 2009 studied system, external stimuli and measured vs. simulated data Brno, 6.10.2014 36/48 Parameter Identification Signalling Pathway Example by Raue et al. 2009 studied system, external stimuli and measured vs. simulated data Brno, 6.10.2014 36/48 Parameter Identification Signalling Pathway Example by Raue et al. 2009 profile likelihood and its quadratic approximation Brno, 6.10.2014 37/48 Parameter Identification Signalling Pathway Example by Raue et al. 2009 relations among parameters Brno, 6.10.2014 38/48 Parameter Identification Signalling Pathway Example by Raue et al. 2009 further PL-based analysis for experimental planning Brno, 6.10.2014 39/48 Outline 1 Introduction 2 The Approach: Parametric Identification 3 System Identifiability Problem 4 Overview of Approaches Brno, 6.10.2014 40/48 Parameter Identification: Approaches Overview bottom-up vs. top-down modelling bottom-up means detailed reconstruction from first principles top-down (inverse) approach starts from high-throughput data steady-state vs. transient modelling steady-state data give simplifying assumption (time is abstracted by long-run view) works well for processes with a unique stable state availability of internal system variables at steady-state (e.g., metabolism) transient analysis more complicated (requires detection of initial states and appropriate time-series resolution is needed to inverse modelling) Brno, 6.10.2014 41/48 Inverse Modelling Approach I-Chun Chou, E.O. Voit / Mathematical Biosciences 219 (2009) 57-83 Brno, 6.10.2014 42/48 Inverse Modelling Methods I-Chun Chou, E.O. Voit / Mathematical Biosciences 219 (2009) 57-83 Brno, 6.10.2014 43/48 Optimisation Methods I-Chun Chou, E.O. Voit / Mathematical Biosciences 219 (2009) 57-83 Brno, 6.10.2014 44/48 Structure Identification I-Chun Chou, E.O. Voit / Mathematical Biosciences 219 (2009) 57-83 Brno, 6.10.2014 45/48 Parameter Exploration and Synthesis by Model Checking reconstruction observation model system properties observed propertiesparameter identification system formalization specified admissible parameter settings properties + required synthesis Brno, 6.10.2014 46/48 Parameter Synthesis from LTL Specifications Robustness Given an LTL property ϕ and a parameterized model M check if M(θ) |= ϕ holds for all possible parameterizations θ ∈ P (valuations of parameters), P is called the parameter space. Parameter Synthesis Problem Given an LTL property ϕ and a parameterized model M find the maximal set P ⊆ P of parameterizations such that M(θ) |= ϕ for all θ ∈ P. Problem Reduction Robustness is reduced to Parameter Synthesis Problem by taking the set P of all possible parameterizations as P. Brno, 6.10.2014 47/48 References T. S¨oderstr¨om, P. Stoica. System Identification. Prentice-Hall, 1989. I-Chun Chou, E.O. Voit. Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Mathematical Biosciences 219 (2009) 57-83 A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingm¨uller and J. Timmer. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics, Vol. 25 no. 15 2009, pages 1923-1929. discussions with Stephan M¨uller, Jan van Schuppen, Alessandro Abate Brno, 6.10.2014 48/48