M7777 Applied Functional Data Analysis
7. Functional Linear Regression
Jan Koláček (kolacek@math.muni.cz)
Dept. of Mathematics and Statistics, Faculty of Science, Masaryk University, Brno
Jan Koláček (SCI MUNI) M7777 Applied FDA Fall 2019

Functional Linear Regression
Three different scenarios
• Scalar-on-function regression: functional covariate, scalar response
• Functional response models
  • scalar covariate
  • functional covariate
We will deal with each in turn.

Scalar-on-function regression
Example: Log total Precipitation ~ Temperature curve
[Figure: temperature curves; axes: time [days], Temperature [C]]
We want to relate annual precipitation to the shape of the temperature profile.

A First Idea
We observe $y_i, x_i(t)$.
Choose $t_1, \dots, t_k$.
Then we set
$$y_i = \alpha + \sum_{j=1}^{k} \beta_j x_i(t_j) + \varepsilon_i = \alpha + \mathbf{x}_i' \boldsymbol{\beta} + \varepsilon_i$$
• And do linear regression.
But how many $t_1, \dots, t_k$ and which ones? (it should be $k \ll n$ !!!)

In the Limit...
If we let $t_1, \dots, t_k$ get increasingly dense (i.e. $k \to \infty$),
$$y_i = \alpha + \sum_{j=1}^{k} \beta_j x_i(t_j) + \varepsilon_i$$
becomes
$$y_i = \alpha + \int \beta(t) x_i(t)\,dt + \varepsilon_i \qquad (1)$$
Minimize squared error:
$$\hat{\beta}(t) = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \alpha - \int \beta(t) x_i(t)\,dt \right)^2$$
How to solve it? (3 approaches)

1. Estimation through a basis expansion
Expand the function $\beta$ using basis functions
$$\beta(t) = \sum_{j=1}^{K} c_j \phi_j(t).$$
Thus
$$\int \beta(t) x_i(t)\,dt = \sum_{j=1}^{K} c_j \underbrace{\int \phi_j(t) x_i(t)\,dt}_{z_{ij}}$$
and model (1) reduces to $y = \alpha + Zc + \varepsilon$.
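The reduction of model (1) to a regression on the matrix Z can be checked numerically. The following is a minimal sketch in plain Python, not the course's implementation: the grid, the three-function Fourier basis, the toy curves and the coefficient vector `c` are all made-up assumptions, and the integrals are approximated by the trapezoidal rule.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal rule for the integral of sampled values over grid."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )

# Hypothetical setup: time rescaled to [0, 1] and a small Fourier basis.
grid = [k / 200 for k in range(201)]
basis = [
    lambda t: 1.0,
    lambda t: math.sin(2 * math.pi * t),
    lambda t: math.cos(2 * math.pi * t),
]

# Made-up covariate curves x_i(t), sampled on the grid.
curves = [
    [a + b * math.sin(2 * math.pi * t) + t * t for t in grid]
    for a, b in [(0.0, 1.0), (1.0, -0.5), (2.0, 0.3), (-1.0, 2.0)]
]

def design_matrix(curves, basis, grid):
    """Z[i][j] approximates the integral of phi_j(t) * x_i(t) dt."""
    return [
        [trapezoid([phi(t) * x for t, x in zip(grid, curve)], grid)
         for phi in basis]
        for curve in curves
    ]

Z = design_matrix(curves, basis, grid)

# For beta(t) = sum_j c_j phi_j(t), the integral of beta * x_i equals (Z c)_i.
c = [0.5, -1.0, 2.0]  # arbitrary illustrative coefficients
via_Z = [sum(z_ij * c_j for z_ij, c_j in zip(row, c)) for row in Z]
direct = [
    trapezoid(
        [sum(c_j * phi(t) for c_j, phi in zip(c, basis)) * x
         for t, x in zip(grid, curve)],
        grid,
    )
    for curve in curves
]
```

Here `via_Z` and `direct` agree up to rounding, which is the linearity argument behind the reduction. Once Z is available, any ordinary least-squares routine can be applied to the design matrix [1 | Z].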
It is a classical linear regression model $\Rightarrow \hat{c}$

The resulting estimate
$$\hat{\beta}(t) = \sum_{j=1}^{K} \hat{c}_j \phi_j(t)$$
Disadvantages
• Assumption of $\beta(t)$ as a linear combination of basis functions $\phi(t)$
• Estimate depends on the shape of the basis functions and on their number $K$
Confidence intervals
Assuming normality of errors, 95% confidence interval for $\beta(t)$:
$$\hat{\beta}(t) \pm 1.96 \sqrt{\sum_{j=1}^{K} \sigma_j \phi_j^2(t)},$$
where $\sigma_j$ is the $j$-th diagonal entry of $\sigma_\varepsilon^2 (X'X)^{-1}$; $X = [1_n | Z]$; $\sigma_\varepsilon^2$ is the sample variance of $y - \hat{y}$.

The estimate of $\beta(t)$
[Figure: the estimate of $\beta(t)$; horizontal axis: Days]

2. Estimation with a roughness penalty
Main idea
• The same expansion for $\beta(t)$, but $K$ is taken to be some large value (often $K$ is the number of $t_i$) $\Rightarrow$ the estimate is no longer sensitive to $K$
• The control of smoothness is shifted from $K$ to the smoothing parameter $\lambda$ and a differential operator $L$ (a penalty term)
$$P_\lambda(\alpha, \beta) = \sum_{i=1}^{n} \left( y_i - \alpha - \int \beta(t) x_i(t)\,dt \right)^2 + \lambda \int [(L\beta)(t)]^2\,dt.$$
Thus
$$P_\lambda(\alpha, \beta) = \sum_{i=1}^{n} \left( y_i - \alpha - \sum_{j=1}^{K} c_j z_{ij} \right)^2 + \lambda \int \left[ \sum_{j=1}^{K} c_j (L\phi_j)(t) \right]^2 dt.$$
The optimal $\lambda$ is selected by cross-validation.

Cross-validation scores
[Figure: cross-validation scores; horizontal axis: Log 10 of lambda]

The estimate of $\beta(t)$
[Figure: the estimate of $\beta(t)$; horizontal axis: Days]

3. Regression on functional principal components
Let us consider an approximation $\tilde{x}_i(t)$ of $x_i(t)$ by $K$ principal components
$$\tilde{x}_i(t) = \bar{x}(t) + \sum_{j=1}^{K} c_{ij} \xi_j(t),$$
where $\xi_j(t)$ is the $j$-th principal component and $c_{ij} = \int \xi_j(t) [x_i(t) - \bar{x}(t)]\,dt$ is its score.
Plugging it into model (1) reduces the model to
$$y_i = \alpha + \int \beta(t) \left( \bar{x}(t) + \sum_{j=1}^{K} c_{ij} \xi_j(t) \right) dt + \varepsilon_i = \beta_0 + \sum_{j=1}^{K} c_{ij} \beta_j + \varepsilon_i,$$
where $\beta_0 = \alpha + \int \beta(t) \bar{x}(t)\,dt$, $\beta_j = \int \beta(t) \xi_j(t)\,dt$.

It is a "classic" regression model
$$y = \Xi \beta + \varepsilon$$
with $\beta = (\beta_0, \dots, \beta_K)'$ and $\Xi = [1_n | C]$, where $C$ is the score matrix.
Denoting the estimates thus obtained by $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_K$, the estimates of the parameters in (1) are
$$\hat{\beta}(t) = \sum_{j=1}^{K} \hat{\beta}_j \xi_j(t), \qquad \hat{\alpha} = \hat{\beta}_0 - \sum_{j=1}^{K} \hat{\beta}_j \int \xi_j(t) \bar{x}(t)\,dt.$$
• Choice of $K$: the first $K$ components explain 85 or 90 percent of cumulative variance
Confidence intervals
$$\mathrm{Var}\,\hat{\beta}(t) = \sum_{j=1}^{K} \mathrm{Var}(\hat{\beta}_j)\, \xi_j^2(t), \qquad \text{CI: } \hat{\beta}(t) \pm 1.96 \sqrt{\sum_{j=1}^{K} \mathrm{Var}(\hat{\beta}_j)\, \xi_j^2(t)}$$

The estimate of $\beta(t)$
[Figure: the estimate of $\beta(t)$]

Assessing the quality
Set
$$SSE_0 = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSE_1 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
• Squared Multiple Correlation
$$RSQ = \frac{SSE_0 - SSE_1}{SSE_0}$$
• F-ratio
$$F = \frac{(SSE_0 - SSE_1)/(k - 1)}{SSE_1/(n - k)},$$
where $k$ ... degrees of freedom (usually No. of parameters)
• Plotting $y$ vs. $\hat{y}$
• Cross-validation

Assessing the quality
Model               Degrees of freedom   RSQ     F-ratio
Basis expansion     6                    0.796   22.58
Roughness penalty   4.6                  0.754   25.42
Functional PCA      5                    0.757   23.33

Comparison of fits
[Figure: comparison of fits; legend (Type): 1. Basis expansion, 2. Roughness Penalty, 3. Functional PCA]

Cross-validation
• Divide $y$ into 2 groups, training and testing data: $y = [\tilde{y}, y^*]$
• Construct the model based on the training data $\tilde{y}$
• Use the model to predict $y^*$, giving $\hat{y}^*$
• Compare $\hat{y}^*$ against $y^*$

4. Nonparametric regression
Without any parametric assumption, the model (1)
$$y_i = \alpha + \int \beta(t) x_i(t)\,dt + \varepsilon_i$$
becomes a general model
$$y_i = m(x_i(t)) + \varepsilon_i,$$
where $m : L^2 \to \mathbb{R}$ is a functional that must be estimated.
Kernel smoothing
$$\hat{m}(x) = \frac{\sum_{i=1}^{n} y_i K\left(h^{-1} d(x, x_i)\right)}{\sum_{i=1}^{n} K\left(h^{-1} d(x, x_i)\right)},$$
where $h$ is a smoothing parameter, $K$ is a kernel function and $d(f, g)$ is a measure of the distance between functions $f$ and $g$.

Comparison of fits
[Figure: comparison of fits; legend (Type): 1. Basis expansion, 2. Roughness Penalty, 3. Functional PCA, 4. Nonparametric]

Problems to solve
Medfly Data
• Load the variable medfly from the medfly.RData file.
• Perform a functional linear regression to predict the total lifespan of the fly from their egg laying. Choose a smoothing parameter by cross-validation, and plot the coefficient function along with confidence intervals (see Figure 1).
• Plot the estimated values of lifespan against the measured values (see Figure 2). Calculate the $R^2$ for your regression.
• Try a linear regression of lifespan on the principal component scores from your analysis (the previous lesson). What is the $R^2$ for this model? Does lm find that the model is significant? Reconstruct and plot the coefficient function for this model along with confidence intervals (see Figure 3).
• Conduct the nonparametric regression. How does it compare to the model obtained through functional linear regression and to the model obtained through PCA? Plot estimated values of lifespan against the measured values for all three cases (see Figure 4).

[Figure 2; axis label: Observed]
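As a starting point for the nonparametric exercise, the kernel estimator from the "Kernel smoothing" slide can be sketched as follows. This is an illustration in plain Python rather than the R workflow the exercises assume. The L2 distance for d, the Gaussian kernel for K, the bandwidth values and the toy curves are assumptions chosen for the example, not choices stated in the slides.

```python
import math

def l2_distance(f, g, grid):
    """L2 distance between two curves sampled on a common grid (trapezoidal rule)."""
    sq = [(a - b) ** 2 for a, b in zip(f, g)]
    integral = sum(
        0.5 * (sq[k] + sq[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )
    return math.sqrt(integral)

def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)

def kernel_predict(x_new, curves, y, grid, h):
    """m_hat(x) = sum_i y_i K(d(x, x_i) / h) / sum_i K(d(x, x_i) / h)."""
    weights = [gaussian_kernel(l2_distance(x_new, x_i, grid) / h) for x_i in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * y_i for w, y_i in zip(weights, y)) / total

def r_squared(y, y_hat):
    """RSQ = (SSE0 - SSE1) / SSE0, as on the 'Assessing the quality' slide."""
    y_bar = sum(y) / len(y)
    sse0 = sum((v - y_bar) ** 2 for v in y)
    sse1 = sum((v - f) ** 2 for v, f in zip(y, y_hat))
    return (sse0 - sse1) / sse0

# Toy data (made up): curves x_i(t) = a_i * t on [0, 1] with responses y_i.
grid = [k / 50 for k in range(51)]
slopes = [0.0, 1.0, 2.0, 3.0]
curves = [[a * t for t in grid] for a in slopes]
y = [1.0, 2.0, 3.0, 4.0]

# In-sample fitted values for one hypothetical bandwidth.
fitted = [kernel_predict(x_i, curves, y, grid, h=0.5) for x_i in curves]
rsq = r_squared(y, fitted)
```

A very small h makes the estimator reproduce the training responses, while a very large h pulls every prediction toward the mean of y. In-sample RSQ therefore favours small h, which is why h is better chosen by cross-validation as described earlier in the deck.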