M7777 Applied Functional Data Analysis
12. Sparse FDA
Jan Koláček (kolacek@math.muni.cz)
Dept. of Mathematics and Statistics, Faculty of Science, Masaryk University, Brno
Jan Koláček (SCI MUNI) M7777 Applied FDA Fall 2019 1 / 30

Sparse FDA: Ebay Auctions
Jank and Shmueli, 2007
• 7-day auctions for new Palm M515 PDAs
• 149 auctions, collected May-June 2003
[Figure: Sample of 10 auctions - bid histories. Axes: Time [days], Bid.]

Sparse FDA: Auction Price
• Only increases if bid is greater than current price
[Figure: Sample of 10 auctions - price histories. Axes: Time [days], Price.]

Sparse FDA
We will consider a model
$$Y_{ij} = \underbrace{\mu(t_{ij}) + \varepsilon_i(t_{ij})}_{X_i(t_{ij})} + \delta_{ij}, \qquad 1 \le i \le n,\ 1 \le j \le n_i,$$
with assumptions
• $\mu(t)$ ... the mean function (required to be smooth)
• $\varepsilon_i(t)$ ... subject-specific error functions, which induce correlation between observations on the same subject; let us denote $c(s,t) = \mathrm{Cov}(X(s), X(t)) = \mathrm{Cov}(\varepsilon(s), \varepsilon(t))$
• $\delta_{ij}$ ... errors explaining measurement noise, iid across both $i$ and $j$; let us denote $\mathrm{Var}(\delta_{ij}) = \sigma^2(t_{ij})$.
It means that we observe a process $Y(t)$ in $n$ samples $X_i(t)$; the $i$-th sample is observed at times $t_{i1}, \dots, t_{in_i}$, with
$$\mathrm{Cov}(Y(s), Y(t)) = c(s,t) + \sigma^2(s)\, I_{s=t}.$$

Sparse FDA: The Main Idea
1. Let us consider all measurements $Y_{ij}$, $1 \le i \le n$, $1 \le j \le n_i$.
2. Get an estimate $\hat\mu(t)$ of the mean function $\mu(t)$ (nonparametric, e.g. local linear kernel smoother, spline smoothing etc.).
[Figure; horizontal axis: Time.]
3. Let us consider the set of pairs of time points $t_{ij_1}, t_{ij_2} \in T$ with values
$$A(t_{ij_1}, t_{ij_2}) = \left(Y_{ij_1} - \hat\mu(t_{ij_1})\right)\left(Y_{ij_2} - \hat\mu(t_{ij_2})\right)$$
and get the covariance surface estimate $\hat c(s,t)$ (bivariate local linear etc.).

Sparse FDA: Samples: 1
[Figure.]

Sparse FDA: Samples: 2
[Figure; horizontal axis: X.]

Kernel Smoothing: Local linear smoother
[Figure; horizontal axis: X.]

Mean function estimate
Local linear smoother with global bandwidth:
$$\sum_{i=1}^{n} \sum_{j=1}^{N_i} \left[ Y_{ij} - \beta_0 - \beta_1 (t_{ij} - t) \right]^2 K\!\left(\frac{t_{ij} - t}{h}\right) \to \min_{\beta_0, \beta_1},$$
where $N_i$ is the number of observations on subject $i$ and $\hat\mu(t) = \hat\beta_0$.
Local linear smoother with local bandwidth (Fan & Gijbels (1992)):
$$\sum_{i=1}^{n} \sum_{j=1}^{N_i} \left[ Y_{ij} - \beta_0 - \beta_1 (t_{ij} - t) \right]^2 \alpha(t_{ij})\, K\!\left(\frac{t_{ij} - t}{h}\, \alpha(t_{ij})\right) \to \min_{\beta_0, \beta_1}.$$

Known issues:
• symmetry of $\hat c(s,t)$ (OK for symmetric kernels)
• positive definiteness of $\hat c(s,t)$ (particularly depends on $h$)
• optimal $\alpha(\cdot)$
• optimal $h$

Problems to solve
1. Motor Oil Data
The dataset contains the amount of Fe particles depending on operating time and the number of oil changes. Data were collected in 2006-2016 from 29 heavy-duty army vehicles.
• Load the variable df.motor from the motoroil.RData file and plot it (see Figure 1).
• Use functions from the file functionsM7777.R to fill in the data (see Figure 2).
• Try neglecting the number of oil changes and putting all groups together (see Figure 3).
• Fill in the data using one of the methods mentioned (see Figure 4).
• Do the same using FPCA from the fdapace package and compare the results (see Figures 5, 6).
• (optional) Is the number of oil changes negligible? Conduct the fANOVA analysis. Is it correct to do so?

[Figure 1. Motor Oil Data; panels 1, 2, 3, 4+; legend: vehicle id. Horizontal axis: Motor Time [hrs].]
[Figure 2. Motor Oil Data - filled. Horizontal axis: Motor Time [hrs].]
[Figure 3. Motor Oil Data. Horizontal axis: Motor Time [hrs].]
[Figure 4. Motor Oil Data - filled. Horizontal axis: Motor Time [hrs].]
[Figure 5. functionsM7777.R; one numbered panel per vehicle. Horizontal axis: Motor Time [hrs].]
[Figure 6. FDAPACE; one numbered panel per vehicle. Horizontal axis: Motor Time [hrs].]
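
The three steps of the main idea (pool all measurements, smooth the pooled data to get the mean estimate, form the raw covariance products) can be sketched in a few lines. This is a minimal illustration only, not the code from functionsM7777.R or fdapace. The function names, the Epanechnikov kernel, the bandwidth value and the toy data are all assumptions made for the example, and so is the choice to skip the pairs with j1 = j2, whose products also contain the measurement noise variance.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (an assumed kernel choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def pool(curves):
    """Step 1: pool all (t_ij, Y_ij) pairs across subjects."""
    times = [t for ts, _ in curves for t in ts]
    values = [y for _, ys in curves for y in ys]
    return times, values


def local_linear(t0, times, values, h):
    """Step 2: local linear estimate at t0 with global bandwidth h.

    Minimises sum_k w_k * (y_k - b0 - b1 * (t_k - t0))**2 with
    w_k = K((t_k - t0) / h) and returns the fitted intercept b0.
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for t, y in zip(times, values):
        d = t - t0
        w = epanechnikov(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        r0 += w * y
        r1 += w * d * y
    if s0 == 0.0:
        raise ValueError("no observations within bandwidth h of t0")
    det = s0 * s2 - s1 * s1
    if det < 1e-12:
        return r0 / s0  # degenerate design: fall back to a weighted mean
    return (s2 * r0 - s1 * r1) / det


def raw_covariances(curves, mu_hat):
    """Step 3: products A(t_ij1, t_ij2) for j1 != j2 within each subject."""
    out = []
    for ts, ys in curves:
        resid = [y - mu_hat(t) for t, y in zip(ts, ys)]
        for a in range(len(ts)):
            for b in range(len(ts)):
                if a != b:  # assumption: diagonal pairs are left out
                    out.append((ts[a], ts[b], resid[a] * resid[b]))
    return out


# Hypothetical toy data: three sparsely observed subjects, noise-free,
# all lying on the line 2 + 3t.
grids = [[0.1, 0.4, 0.9], [0.2, 0.7], [0.0, 0.5, 0.6, 1.0]]
curves = [(ts, [2.0 + 3.0 * t for t in ts]) for ts in grids]

times, values = pool(curves)


def mu_hat(t):
    """Pooled mean estimate with an assumed bandwidth h = 0.5."""
    return local_linear(t, times, values, h=0.5)


raw = raw_covariances(curves, mu_hat)
print(mu_hat(0.5), len(raw))
```

The triples in raw would then be passed to a bivariate smoother to obtain the covariance surface estimate; that step is omitted here.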