Session 1
Oleg Deev & Štefan Lyócsa
Masaryk University
FinTech Management

Outline
• The purpose of regression analysis
• Simple linear regression
• Case study 1 and 2
• Multivariate regression model
• Case study 3

The purpose of regression analysis

In general, regression analysis is concerned with estimating the (conditional) expected value (or values) of a variable of interest, given known or pre-determined values of one or more independent variables.

• Similar ideas are behind most machine-learning techniques: LASSO, Ridge, Elastic net, Bayesian model averaging, Regression trees, Random forest, Neural networks, Support vector machines, ...

Are we better off if we ignore information from other variables?

• What is the probability that institution K defaults?

Institution                 A   B   C   D   E   F   G   H   I   J   K*
Default (1 - yes, 0 - no)   0   0   0   1   1   0   0   0   1   0   ?

Now add the leverage ratio of each institution:

Institution                 A   B   C   D   E   F   G   H   I   J   K*
Default (1 - yes, 0 - no)   0   0   0   1   1   0   0   0   1   0   ?
Leverage ratio              5   6   6   4   5   8   10  7   2   8   6

The average leverage ratio (Tier 1 capital / total consolidated assets) of the defaulted institutions is just 3.67; for the non-defaulted institutions it is 8.43.

• Data on the leverage ratio seem to be helpful in predicting defaults.
• What is the probability of a default given some level of the leverage ratio?
• We could link the leverage ratio to the probability of default. Statistical methods help to find such 'links'.
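One simple way to link the leverage ratio to the probability of default is a linear probability model, that is, a regression of the 0/1 default indicator on the leverage ratio. The R sketch below is only an illustration under assumptions: the values are typed in as read from the table above, the object names (`default`, `leverage`, `lpm`, `p_K`) are hypothetical, and the linear probability model is a choice made here, not a method prescribed by the slides.

```r
# Toy data as read from the table above (institutions A to J).
# Institution K (leverage ratio 6) is the one we want to assess.
default  <- c(0, 0, 0, 1, 1, 0, 0, 0, 1, 0)
leverage <- c(5, 6, 6, 4, 5, 8, 10, 7, 2, 8)

# Ignoring other variables: the unconditional default frequency.
p_unconditional <- mean(default)

# Using the leverage ratio: a linear probability model,
# E(default | leverage) = b0 + b1 * leverage.
lpm <- lm(default ~ leverage)

# Fitted default probability for institution K.
p_K <- unname(predict(lpm, newdata = data.frame(leverage = 6)))

print(p_unconditional)
print(coef(lpm))
print(p_K)
```

With only ten observations the fit is very imprecise, and a linear model can produce fitted values outside the interval from 0 to 1, so this should be read as a sketch of the idea rather than as a usable default model.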
Simple linear regression: review of standard models

Let's define:
• $Y_i$ - variable of interest (e.g. return on a loan, $RR2_i$).
• $X_i$ - explanatory (pre-determined) variable (e.g. verification of the income, $ver2_i$).
• $u_i$ - stochastic (random) residual term.
• $i = 1, 2, \dots, N$ - index that labels observations.

We assume that $Y_i$ can be written as the expected value of $Y_i$ given the realization of $X_i$, plus the residual term:

$$Y_i = E(Y_i \mid X_i) + u_i$$

There are many possibilities for $E(\cdot)$; linear regression assumes that:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

$\beta_0$ (intercept) and $\beta_1$ (slope) are unknown parameters, the so-called regression coefficients.

Re-arranging:

$$u_i = Y_i - (\beta_0 + \beta_1 X_i)$$

This difference ($u_i$) is called the stochastic residual term, or just the residual or error term. The error term shows that factors other than $X_i$ also influence the variable of interest. The properties of $u_i$ are key in regression analysis.

In reality, we only assume that $Y_i = \beta_0 + \beta_1 X_i + u_i$, and we never have all the data from the whole population (or we do not know the data-generating process). What we have is a sample of data (presumably a random, representative sample). In practice, using data and some models, we estimate this regression.
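The difference between the assumed population model and what is estimated from a sample can be sketched with simulated data. In the R code below, the parameter values (2 and 0.5), the sample size and the distributions are assumptions chosen only for illustration; `lm()` is R's built-in function for fitting linear models.

```r
set.seed(42)  # make the simulated sample reproducible

# Hypothetical population relationship: Y = 2 + 0.5 * X + u.
beta0 <- 2
beta1 <- 0.5

# Draw a sample of n observations.
n <- 200
X <- runif(n, min = 0, max = 10)
u <- rnorm(n, mean = 0, sd = 1)  # error term: everything besides X
Y <- beta0 + beta1 * X + u

# Estimate the regression from the sample.
fit <- lm(Y ~ X)
b   <- unname(coef(fit))  # b[1]: estimated intercept, b[2]: estimated slope

# Residuals of the estimated model: observed Y minus the fitted line.
u_hat <- Y - (b[1] + b[2] * X)

print(b)
print(summary(u_hat))
```

Because the data-generating process is known here, the estimates can be compared with the true values. With real data, only the estimates are available.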
The estimated (sample) model is:

$$Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i$$

Example

Let $RR2_i$ be the return on the loan and $ver2_i$ a variable with only two values: 1 if income was not verified and 0 otherwise.

$$RR2_i = \beta_0 + \beta_1 ver2_i + u_i$$

Given the data, we estimate the $\beta$ parameters:

$$RR2_i = 8.58 - 3.52\, ver2_i + \hat u_i$$

The intercept is 8.58 and the slope is $-3.52$. The slope is negative, meaning that loans with unverified income ($ver2_i = 1$) have a lower return than loans with verified income ($ver2_i = 0$).

The goal is that the sample regression line $Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i$ fits the true data $Y_i$ 'as well as possible'. The difference between the true value and the sample regression line is:

$$\hat u_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$$

What does 'as well as possible' mean? There are many possibilities:

• $\min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} \hat u_i = \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)$
• $\min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} |\hat u_i| = \sum_{i=1}^{n} |Y_i - \hat\beta_0 - \hat\beta_1 X_i|$
• $\min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2$

Ordinary Least Squares (OLS) searches for the parameters $\hat\beta_0, \hat\beta_1$ for which the sum of squared residuals is minimized:

$$\min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2 = f(\hat\beta_0, \hat\beta_1)$$

Model assumptions

Linear regression with the OLS estimator is far from perfect. It is almost surely a faulty model. But it can nevertheless be useful. Some assumptions:

1. The model is linear in parameters: $Y_i = \beta_0 + \beta_1 X_i + u_i$.

2. Independent variables are non-stochastic.
3. $E(u_i \mid X_i) = 0$, and therefore $E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$.

4. Residuals are homoscedastic: $var(u_i \mid X_i) = \sigma^2$. Intuitively, if salary depends on gender, the error terms should have a similar spread for men and for women.

[Figure: scatter plot; x-axis: "Plocha bytov" (area of flats). Source: Výrost, T., Baumohl, E., Lyócsa, S. (2013). Kvantitativné metody v ekonomii 3, p. 218.]

5. Residuals $u_i$, $u_j$, where $i \neq j$, are not correlated. Beware of serial dependence in time series, where error terms might be related in time, e.g. $cor(u_t, u_{t-1}) \neq 0$. Time-series data are often subject to seasonality, e.g. tourism arrivals in monthly data, $cor(u_t, u_{t-12}) \neq 0$. What about spatial dependence?

6. The covariance between $u_i$ and $X_i$ is zero: $E(u_i X_i) = 0$. Assume that $u_i$ and $X_i$ are positively correlated, so that if $X_i$ increases, so does $u_i$. Then, for larger values of $X_i$, the term $\beta_1 X_i$ alone understates the change in $Y_i$, because the error term increases as well. OLS attributes part of that increase to $X_i$, so the estimate $\hat\beta_1$ is biased and does not have a meaningful interpretation.
7. The number of observations $n$ should be greater than the number of estimated coefficients. How many parameters are estimated in a linear regression model?

8. The variance of the independent variable $X$ should be finite and positive.

9. The regression is correctly specified.

10. In the case of multiple independent variables, there is no perfect collinearity between them. Perfect collinearity arises when one variable can be expressed as a linear combination of the other variables.

Case study 1

Should we require P2P markets to verify the income of the borrower?

The return on the loan is $RR2_i$ and the variable that codes verification of the income is $ver2_i$. One (not the only) approach is to estimate the following linear model:

$$RR2_i = \beta_0 + \beta_1 ver2_i + u_i$$

Another could be the linear regression model:

$$int_i = \beta_0 + \beta_1 ver2_i + u_i$$

where $int_i$ is the interest rate on the loan contract on a p.a. basis.
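When the only regressor is a 0/1 indicator such as $ver2_i$, the OLS intercept equals the mean of the group with indicator 0 and the OLS slope equals the difference between the two group means. The R sketch below checks this on simulated data. Everything in it is an assumption made for illustration: the data frame `sim`, the sample size, the share of ones and the noise level are hypothetical, and the values 8.58 and -3.52 are simply borrowed from the earlier example.

```r
set.seed(1)  # make the simulated sample reproducible

# Hypothetical data: ver2 is a 0/1 indicator, RR2 is a noisy return.
n    <- 1000
ver2 <- rbinom(n, size = 1, prob = 0.4)
RR2  <- 8.58 - 3.52 * ver2 + rnorm(n, mean = 0, sd = 20)
sim  <- data.frame(RR2 = RR2, ver2 = ver2)

# OLS fit of the return on the indicator.
fit <- lm(RR2 ~ ver2, data = sim)
b   <- unname(coef(fit))

# Mean return in each group.
m0 <- mean(sim$RR2[sim$ver2 == 0])
m1 <- mean(sim$RR2[sim$ver2 == 1])

# Intercept vs. mean of group 0; slope vs. difference of group means.
print(c(intercept = b[1], mean_group0 = m0))
print(c(slope = b[2], diff_in_means = m1 - m0))
```

This is why the slope in such a model can be read directly as the average difference in returns between the two groups of loans.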
Let's start the R session and open the script FinTech.R

• names(DT)
• plot(DT$RR2, type='p', pch=19, cex=0.25, xlab='Loans', ylab='Internal rate of return')
• abline(h=0, lwd=2, col='red')

[Figure: scatter plot of the internal rate of return; x-axis: Loans]

• hist(DT$RR2, breaks=100, xlab='Return', prob=TRUE, main='Distribution of returns')
• abline(v=0, lwd=2, col='red')

[Figure: histogram "Distribution of returns"; x-axis: Return, from -100 to 40]

• prct = round(100*table(DT$ver2)/sum(table(DT$ver2)), 2)
• prct
• pie(table(DT$ver2), labels=paste(c('Not verified', 'Verified'), ' ', prct, '%', sep=''), col=c('white', 'red'))

• boxplot(DT$RR2 ~ DT$ver2,pch=19,cex=0.
[Figure: boxplot of returns (RR2) by income verification (ver2)]

Descriptive statistics

• y = DT$RR2
• y = na.omit(y)
• install.packages('moments')
• library(moments)  # provides skewness() and kurtosis()
• round(c(mean(y), sd(y), min(y), median(y), max(y), skewness(y), kurtosis(y)), 2)
• round(100*sum(y==-100)/length(y), 2)
• round(100*sum(y>-100 & y<0)/length(y), 2)
• table(DT$ver2)
• prct

OLS model estimation

• m1 = lm(RR2 ~ ver2, data=DT)
• m1
• summary(m1)
• install.packages('lmtest')
• library(lmtest)  # provides bptest() and coeftest()
• bptest(m1)
• install.packages('sandwich')
• library(sandwich)  # provides vcovHC()
• coeftest(m1, vcov=vcovHC(m1, type='HC0'))

Case study 2

Is a higher required return (interest rate) associated with lower returns?

The return on the loan is $RR2_i$ and the (annualized) interest rate on the loan is $int_i$. We want to estimate:

$$RR2_i = \beta_0 + \beta_1 int_i + u_i$$

We already saw the data on $RR2_i$; let's continue with $int_i$.

• plot(y=DT$int, x=DT$date, type='p', pch=19, cex=0.25, xlab='Date', ylab='Annualized interest rate')

• hist(DT$int, breaks=100, xlab='Interest rate', prob=TRUE, main='Distribution of interest rates')

[Figure: histogram "Distribution of interest rates"]
Case study 3

We now use model m7 to predict the return on the next 500 loans.

• yhat = predict(m7, newdata=Sample2)
• ytrue = Sample2$RR2
• plot(y=ytrue, x=yhat, pch=19, cex=0.25, ylim=c(min(yhat,ytrue), max(yhat,ytrue)), xlim=c(min(yhat,ytrue), max(yhat,ytrue)), xlab='Predicted returns', ylab='Realized returns')
• cbind(yhat, ytrue)
• hist(abs(yhat-ytrue), main='Forecast errors')
• mean(abs(yhat-ytrue))
• mean((yhat-ytrue)^2)
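The two accuracy measures computed in the last two commands, the mean absolute error and the mean squared error, can be wrapped into small helper functions. The R sketch below is an illustration only: `mae`, `mse` and the demo vectors are hypothetical names and values, not part of FinTech.R.

```r
# Mean absolute error: the average size of the forecast errors.
mae <- function(yhat, ytrue) mean(abs(yhat - ytrue))

# Mean squared error: penalizes large errors more heavily.
mse <- function(yhat, ytrue) mean((yhat - ytrue)^2)

# Hypothetical predicted and realized returns (illustrative values only).
yhat_demo  <- c(5, 7, -2, 10)
ytrue_demo <- c(6, 7, -10, 12)

print(mae(yhat_demo, ytrue_demo))        # forecast errors are -1, 0, 8, -2
print(mse(yhat_demo, ytrue_demo))
print(sqrt(mse(yhat_demo, ytrue_demo)))  # root MSE, in the units of the returns
```

In the demo data a single large miss (8 percentage points) dominates the mean squared error much more than the mean absolute error, which is the usual reason for reporting both.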