7 Regression

Figure 7.1 Me playing with my ding-a-ling in the Holimarine Talent Show. Note the groupies queuing up at the front

7.1. What will this chapter tell me?

Although none of us can know the future, predicting it is so important that organisms are hard-wired to learn about predictable events in their environment. We saw in the previous chapter that I received a guitar for Christmas when I was 8. My first foray into public performance was a weekly talent show at a holiday camp called 'Holimarine' in Wales (it doesn't exist anymore because I am old and this was 1981). I sang a Chuck Berry song called 'My Ding-a-Ling'1 and to my absolute amazement I won the competition.2 Suddenly other 8-year-olds across the land (well, a ballroom in Wales) worshipped me (I made lots of friends after the competition). I had tasted success (it tasted like praline chocolate) and so I wanted to enter the competition in the second week of our holiday. To ensure success, I needed to know why I had won in the first week. One way to do this would have been to collect data and to use these data to predict people's evaluations of children's performances in the contest from certain variables: the age of the performer, what type of performance they gave (singing, telling a joke, magic tricks), and maybe how cute they looked. A regression analysis on these data would enable us to predict future evaluations (success in next week's competition) based on values of the predictor variables. If, for example, singing was an important factor in getting a good audience evaluation, then I could sing again the following week; however, if jokers tended to do better then I could switch to a comedy routine. When I was 8 I wasn't the sad geek that I am today, so I didn't know about regression analysis (nor did I wish to know); however, my dad thought that success was due to the winning combination of a cherub-looking 8-year-old singing songs that can be interpreted in a filthy way. He wrote me a song to sing in the competition about the keyboard player in the Holimarine Band 'messing about with his organ', and first place was mine again. There's no accounting for taste.

1 It appears that even then I had a passion for lowering the tone of things that should be taken seriously.
2 I have a very grainy video of this performance recorded by my dad's friend on a video camera the size of a medium-sized dog that had to be accompanied at all times by a 'battery pack' the size and weight of a tank. Maybe I'll put it up on the companion website …

7.2. An introduction to regression

In the previous chapter we looked at how to measure relationships between two variables. These correlations can be very useful, but we can take this process a step further and predict one variable from another. A simple example might be to try to predict levels of stress from the amount of time until you have to give a talk. You'd expect this to be a negative relationship (the smaller the amount of time until the talk, the larger the anxiety).
We could then extend this basic relationship to answer a question such as 'if there's 10 minutes to go until someone has to give a talk, how anxious will they be?' This is the essence of regression analysis: we fit a model to our data and use it to predict values of the dependent variable from one or more independent variables.3 Regression analysis is a way of predicting an outcome variable from one predictor variable (simple regression) or several predictor variables (multiple regression). This tool is incredibly useful because it allows us to go a step beyond the data that we collected. In section 2.4.3 I introduced you to the idea that we can predict any data using the following general equation:

outcomei = (model) + errori    (7.1)

This just means that the outcome we're interested in for a particular person can be predicted by whatever model we fit to the data plus some kind of error. In regression, the model we fit is linear, which means that we summarize a data set with a straight line (think back to Jane Superbrain Box 2.1). As such, the word 'model' in the equation above simply gets replaced by 'things that define the line that we fit to the data' (see the next section). With any data set there are several lines that could be used to summarize the general trend, and so we need a way to decide which of many possible lines to choose. For the sake of making accurate predictions we want to fit a model that best describes the data. The simplest way to do this would be to use your eye to gauge a line that looks as though it summarizes the data well. You don't need to be a genius to realize that the 'eyeball' method is very subjective and so offers no assurance that the model is the best one that could have been chosen. Instead, we use a mathematical technique called the method of least squares to establish the line that best describes the data collected.

3 I want to remind you here of something I discussed in Chapter 1: SAS refers to regression variables as dependent and independent variables (as in controlled experiments). However, correlational research by its nature seldom controls the independent variables to measure the effect on a dependent variable and so I will talk about 'independent variables' as predictors, and the 'dependent variable' as the outcome.

7.2.1. Some important information about straight lines

I mentioned above that in our general equation the word 'model' gets replaced by 'things that define the line that we fit to the data'. In fact, any straight line can be defined by two things: (1) the slope (or gradient) of the line (usually denoted by b1); and (2) the point at which the line crosses the vertical axis of the graph (known as the intercept of the line, b0). In fact, our general model becomes equation (7.2) below, in which Yi is the outcome that we want to predict and Xi is the ith participant's score on the predictor variable.4 Here b1 is the gradient of the straight line fitted to the data and b0 is the intercept of that line. These parameters b1 and b0 are known as the regression coefficients and will crop up time and time again in this book, where you may see them referred to generally as b (without any subscript) or bi (meaning the b associated with variable i). There is a residual term, εi, which represents the difference between the score predicted by the line for participant i and the score that participant i actually obtained.
The equation is often conceptualized without this residual term (so ignore it if it's upsetting you); however, it is worth knowing that this term represents the fact that our model will not fit the data collected perfectly:

Yi = (b0 + b1Xi) + εi    (7.2)

4 You'll sometimes see this equation written as Yi = (β0 + β1Xi) + εi. The only difference is that this equation has got βs in it instead of bs; both versions are the same thing, they just use different letters to represent the coefficients.

A particular line has a specific intercept and gradient. Figure 7.2 shows a set of lines that have the same intercept but different gradients, and a set of lines that have the same gradient but different intercepts. Figure 7.2 also illustrates another useful point: the gradient of the line tells us something about the nature of the relationship being described. In Chapter 6 we saw how relationships can be either positive or negative (and I don't mean the difference between getting on well with your girlfriend and arguing all the time!). A line that has a gradient with a positive value describes a positive relationship, whereas a line with a negative gradient describes a negative relationship. So, if you look at the graph in Figure 7.2 in which the gradients differ but the intercepts are the same, then the dashed line describes a positive relationship whereas the solid line describes a negative relationship.

Figure 7.2 Lines with the same gradients but different intercepts, and lines that share the same intercept but have different gradients

Basically, then, the gradient (b1) tells us what the model looks like (its shape) and the intercept (b0) tells us where the model is (its location in geometric space). If it is possible to describe a line knowing only the gradient and the intercept of that line, then we can use these values to describe our model (because in linear regression the model we use is a straight line). So, the model that we fit to our data in linear regression can be conceptualized as a straight line that can be described mathematically by equation (7.2). With regression we strive to find the line that best describes the data collected, then estimate the gradient and intercept of that line. Having defined these values, we can insert different values of our predictor variable into the model to estimate the value of the outcome variable.
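To see how equation (7.2) works with actual numbers, here is a small worked example (the values are invented purely for illustration and are not taken from any data in this chapter). Suppose a fitted line has an intercept of b0 = 3 and a gradient of b1 = 2, and that participant i scores Xi = 5 on the predictor and Yi = 15 on the outcome:

predicted score = b0 + b1Xi = 3 + (2 × 5) = 13
residual: εi = Yi − predicted score = 15 − 13 = 2

The line predicts a score of 13, but this participant actually scored 15, so the model is out by 2 for this person; that difference is exactly the residual term in equation (7.2).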
7.2.2. The method of least squares

I have already mentioned that the method of least squares is a way of finding the line that best fits the data (i.e. finding a line that goes through, or as close to, as many of the data points as possible). This 'line of best fit' is found by ascertaining which line, of all of the possible lines that could be drawn, results in the least amount of difference between the observed data points and the line. Figure 7.3 shows that when any line is fitted to a set of data, there will be small differences between the values predicted by the line and the data that were actually observed.

Figure 7.3 This graph shows a scatterplot of some data with a line representing the general trend. The vertical lines (dotted) represent the differences (or residuals) between the line and the actual data

Back in Chapter 2 we saw that we could assess the fit of a model (the example we used was the mean) by looking at the deviations between the model and the actual data collected. These deviations were the vertical distances between what the model predicted and each data point that was actually observed. We can do exactly the same to assess the fit of a regression line (which, like the mean, is a statistical model). So, again we are interested in the vertical differences between the line and the actual data because the line is our model: we use it to predict values of Y from values of the X variable. In regression these differences are usually called residuals rather than deviations, but they are the same thing. As with the mean, data points fall both above (the model underestimates their value) and below (the model overestimates their value) the line, yielding both positive and negative differences. In the discussion of variance in section 2.4.2 I explained that if we sum positive and negative differences then they cancel each other out and that to circumvent this problem we square the differences before adding them up. We do the same thing here. The resulting squared differences provide a gauge of how well a particular line fits the data: if the squared differences are large, the line is not representative of the data; if the squared differences are small, the line is representative. You could, if you were particularly bored, calculate the sum of squared differences (or SS for short) for every possible line that is fitted to your data and then compare these 'goodness-of-fit' measures. The one with the lowest SS is the line of best fit (a sketch of this brute-force idea follows at the end of this section). Fortunately we don't have to do this because the method of least squares does it for us: it selects the line that has the lowest sum of squared differences (i.e. the line that best represents the observed data). It does this by using a mathematical technique for finding maxima and minima to locate the line that minimizes the sum of squared differences. I don't really know much more about it than that, to be honest, so I tend to think of the process as a little bearded wizard called Nephwick the Line Finder who just magically finds lines of best fit. Yes, he lives inside your computer. The end result is that Nephwick estimates the value of the slope and intercept of the 'line of best fit' for you. We tend to call this line of best fit a regression line.
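To make the 'compare the sum of squared differences across candidate lines' idea concrete, here is a minimal SAS sketch. Everything in it is invented for illustration (a tiny made-up data set and two arbitrary candidate lines); in practice PROC REG finds the least-squares line for you, as we will see later in the chapter.

* Tiny made-up data set: x is the predictor, y is the outcome;
DATA toy;
   INPUT x y;
   DATALINES;
1 2
2 5
3 7
4 8
;
RUN;

* Squared differences between each observed y and two candidate lines;
DATA candidates;
   SET toy;
   sq_diff_a = (y - (1 + 2*x))**2;   * candidate line A: intercept 1, gradient 2;
   sq_diff_b = (y - (0 + 3*x))**2;   * candidate line B: intercept 0, gradient 3;
RUN;

* Add up the squared differences: the candidate with the smaller sum fits the data better;
PROC MEANS DATA=candidates SUM;
   VAR sq_diff_a sq_diff_b;
RUN;

The method of least squares effectively carries out this comparison over every possible intercept and gradient at once and returns the winning pair.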
7.2.3. Assessing the goodness of fit: sums of squares, R and R2

Once Nephwick the Line Finder has found the line of best fit it is important that we assess how well this line fits the actual data (we assess the goodness of fit of the model). We do this because even though this line is the best one available, it can still be a lousy fit to the data! In section 2.4.2 we saw that one measure of the adequacy of a model is the sum of squared differences (or more generally we assess models using equation (7.3) below). If we want to assess the line of best fit, we need to compare it against something, and the thing we choose is the most basic model we can find. So we use equation (7.3) to calculate the fit of the most basic model, and then the fit of the best model (the line of best fit), and basically if the best model is any good then it should fit the data significantly better than our basic model:

deviation = Σ(observed − model)²    (7.3)

This is all quite abstract so let's look at an example. Imagine that I was interested in predicting record sales (Y) from the amount of money spent advertising that record (X). One day my boss came into my office and said 'Andy, I know you wanted to be a rock star and you've ended up working as my stats-monkey, but how many records will we sell if we spend £100,000 on advertising?' If I didn't have an accurate model of the relationship between record sales and advertising, what would my best guess be? Well, probably the best answer I could give would be the mean number of record sales (say, 200,000) because on average that's how many records we expect to sell. This response might well satisfy a brainless record company executive (who didn't offer my band a record contract). However, what if he had asked 'How many records will we sell if we spend £1 on advertising?' Again, in the absence of any accurate information, my best guess would be to give the average number of sales (200,000). There is a problem: whatever amount of money is spent on advertising, I always predict the same level of sales. As such, the mean is a model of 'no relationship' at all between the variables. It should be pretty clear then that the mean is fairly useless as a model of a relationship between two variables – but it is the simplest model available.

So, as a basic strategy for predicting the outcome, we might choose to use the mean, because on average it will be a fairly good guess of an outcome, but that's all. Using the mean as a model, we can calculate the difference between the observed values and the values predicted by the mean (equation (7.3)). We saw in section 2.4.1 that we square all of these differences to give us the sum of squared differences. This sum of squared differences is known as the total sum of squares (denoted SST) because it is the total amount of differences present when the most basic model is applied to the data. This value represents how good the mean is as a model of the observed data. Now, if we fit the more sophisticated model to the data, such as a line of best fit, we can again work out the differences between this new model and the observed data (again using equation (7.3)). In the previous section we saw that the method of least squares finds the best possible line to describe a set of data by minimizing the difference between the model fitted to the data and the data themselves. However, even with this optimal model there is still some inaccuracy, which is represented by the differences between each observed data point and the value predicted by the regression line. As before, these differences are squared before they are added up so that the directions of the differences do not cancel out. The result is known as the sum of squared residuals or residual sum of squares (SSR). This value represents the degree of inaccuracy when the best model is fitted to the data. We can use these two values to calculate how much better the regression line (the line of best fit) is than just using the mean as a model (i.e. how much better is the best possible model than the worst model?). The improvement in prediction resulting from using the regression model rather than the mean is calculated as the difference between SST and SSR. This difference shows us the reduction in the inaccuracy of the model resulting from fitting the regression model to the data. This improvement is the model sum of squares (SSM). Figure 7.4 shows each sum of squares graphically.
If the value of SSM is large then the regression model is very different from using the mean to predict the outcome variable. This implies that the regression model has made a big improvement to how well the outcome variable can be predicted. However, if SSM is small then using the regression model is little better than using the mean (i.e. the regression model is no better than taking our 'best guess'). A useful measure arising from these sums of squares is the proportion of improvement due to the model. This is easily calculated by dividing the sum of squares for the model by the total sum of squares. The resulting value is called R2 and to express this value as a percentage you should multiply it by 100. R2 represents the amount of variance in the outcome explained by the model (SSM) relative to how much variation there was to explain in the first place (SST). Therefore, as a percentage, it represents the percentage of the variation in the outcome that can be explained by the model:

R2 = SSM / SST    (7.4)

This R2 is the same as the one we met in Chapter 6 (section 6.5.2.3) and you might have noticed that it is interpreted in the same way. Therefore, in simple regression we can take the square root of this value to obtain Pearson's correlation coefficient. As such, the correlation coefficient provides us with a good estimate of the overall fit of the regression model, and R2 provides us with a good gauge of the substantive size of the relationship.

A second use of the sums of squares in assessing the model is through the F-test. I mentioned way back in Chapter 2 that test statistics (like F) are usually the amount of systematic variance divided by the amount of unsystematic variance, or, put another way, the model compared against the error in the model. This is true here: F is based upon the ratio of the improvement due to the model (SSM) and the difference between the model and the observed data (SSR). Actually, because the sums of squares depend on the number of differences that we have added up, we use the average sums of squares (referred to as the mean squares or MS). To work out the mean sums of squares we divide by the degrees of freedom (this is comparable to calculating the variance from the sums of squares – see section 2.4.2). For SSM the degrees of freedom are simply the number of variables in the model, and for SSR they are the number of observations minus the number of parameters being estimated (i.e. the number of beta coefficients including the constant). The result is the mean squares for the model (MSM) and the residual mean squares (MSR). At this stage it isn't essential that you understand how the mean squares are derived (it is explained in Chapter 10). However, it is important that you understand that the F-ratio (equation (7.5)) is a measure of how much the model has improved the prediction of the outcome compared to the level of inaccuracy of the model:

F = MSM / MSR    (7.5)

Figure 7.4 Diagram showing from where the regression sums of squares derive

If a model is good, then we expect the improvement in prediction due to the model to be large (so MSM will be large) and the difference between the model and the observed data to be small (so MSR will be small). In short, a good model should have a large F-ratio (greater than 1 at least) because the top of equation (7.5) will be bigger than the bottom. The exact magnitude of this F-ratio can be assessed using critical values for the corresponding degrees of freedom (as in the Appendix).
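Here is a small worked example of these calculations using invented numbers (they are not taken from the record sales data, whose real values appear later in the chapter). Suppose a model with one predictor is fitted to 20 observations and gives SST = 100 and SSR = 40:

SSM = SST − SSR = 100 − 40 = 60
R2 = SSM / SST = 60 / 100 = .60
MSM = SSM / 1 = 60
MSR = SSR / (20 − 2) = 40 / 18 ≈ 2.22
F = MSM / MSR = 60 / 2.22 ≈ 27

This hypothetical model would explain 60% of the variation in the outcome, and its F-ratio is well above 1, so fitting the line improves prediction substantially relative to the error that remains.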
7.2.4. Assessing individual predictors

We've seen that the predictor in a regression model has a coefficient (b1), which in simple regression represents the gradient of the regression line. The value of b represents the change in the outcome resulting from a unit change in the predictor. If the model was useless at predicting the outcome, then if the value of the predictor changes, what might we expect the change in the outcome to be? Well, if the model is very bad then we would expect the change in the outcome to be zero. Think back to Figure 7.4 (see the panel representing SST) in which we saw that using the mean was a very bad way of predicting the outcome. In fact, the line representing the mean is flat, which means that as the predictor variable changes, the value of the outcome does not change (because for each level of the predictor variable, we predict that the outcome will equal the mean value). The important point here is that a bad model (such as the mean) will have regression coefficients of 0 for the predictors. A regression coefficient of 0 means: (1) a unit change in the predictor variable results in no change in the predicted value of the outcome; and (2) the gradient of the regression line is 0, meaning that the regression line is flat. Hopefully, you'll see that it logically follows that if a variable significantly predicts an outcome, then it should have a b-value significantly different from zero. This hypothesis is tested using a t-test (see Chapter 9). The t-statistic tests the null hypothesis that the value of b is 0: therefore, if it is significant we gain confidence in the hypothesis that the b-value is significantly different from 0 and that the predictor variable contributes significantly to our ability to estimate values of the outcome.

Like F, the t-statistic is also based on the ratio of explained variance to unexplained variance or error. Well, actually, what we're interested in here is not so much variance but whether the b we have is big compared to the amount of error in that estimate. To estimate how much error we could expect to find in b we use the standard error. The standard error tells us something about how different b-values would be across different samples. We could take lots and lots of samples of data regarding record sales and advertising budgets and calculate the b-values for each sample. We could plot a frequency distribution of these samples to discover whether the b-values from all samples would be relatively similar, or whether they would be very different (think back to section 2.5.1). We can use the standard deviation of this distribution (known as the standard error) as a measure of the similarity of b-values across samples. If the standard error is very small, then it means that most samples are likely to have a b-value similar to the one in our sample (because there is little variation across samples). The t-test tells us whether the b-value is different from 0 relative to the variation in b-values across samples. When the standard error is small, even a small deviation from zero can reflect a meaningful difference because b is representative of the majority of possible samples. Equation (7.6) shows how the t-test is calculated and you'll find a general version of this equation in Chapter 9 (equation (9.1)). The bexpected is simply the value of b that we would expect to obtain if the null hypothesis were true.
I mentioned earlier that the null hypothesis is that b is 0 and so this value can be replaced by 0. The equation simplifies to become the observed value of b divided by the standard error with which it is associated:

t = (bobserved − bexpected) / SEb = bobserved / SEb    (7.6)

The values of t have a special distribution that differs according to the degrees of freedom for the test. In regression, the degrees of freedom are N − p − 1, where N is the total sample size and p is the number of predictors. In simple regression when we have only one predictor, this reduces down to N − 2. Having established which t-distribution needs to be used, the observed value of t can then be compared to the values that we would expect to find if there was no effect (i.e. b = 0): if t is very large then it is unlikely to have occurred when there is no effect (these values can be found in the Appendix). SAS provides the exact probability that the observed value (or a larger one) of t would occur if the value of b was, in fact, 0. As a general rule, if this observed significance is less than .05, then scientists assume that b is significantly different from 0; put another way, the predictor makes a significant contribution to predicting the outcome.
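As a quick worked example (the numbers are invented for illustration, not taken from the record sales data): suppose a regression on N = 30 cases with one predictor gives bobserved = 0.5 with a standard error of 0.1. Then:

t = bobserved / SEb = 0.5 / 0.1 = 5.0, on N − 2 = 28 degrees of freedom

A t of 5 on 28 degrees of freedom is far larger than we would expect if b were really 0 in the population, so this predictor would be making a significant contribution to predicting the outcome.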
7.3. Doing simple regression on SAS

So far, we have seen a little of the theory behind regression, albeit restricted to the situation in which there is only one predictor. To help clarify what we have learnt so far, we will go through an example of a simple regression on SAS. Earlier on I asked you to imagine that I worked for a record company and that my boss was interested in predicting record sales from advertising. There are some data for this example in the file Record1.sas7bdat. This data file has 200 rows, each one representing a different record. There are also two columns, one representing the sales of each record in the week after release and the other representing the amount (in pounds) spent promoting the record before release. This is the format for entering regression data: the outcome variable and any predictors should be entered in different columns, and each row should represent independent values of those variables.

The pattern of the data is shown in Figure 7.5 and it should be clear that a positive relationship exists: so, the more money spent advertising the record, the more it is likely to sell. Of course there are some records that sell well regardless of advertising (top left of scatterplot), but there are none that sell badly when advertising levels are high (bottom right of scatterplot). The scatterplot also shows the line of best fit for these data: bearing in mind that the mean would be represented by a flat line at around the 200,000 sales mark, the regression line is noticeably different.

Figure 7.5 Scatterplot showing the relationship between record sales and the amount spent promoting the record (advertising budget, in thousands of pounds, on the horizontal axis; record sales, in thousands, on the vertical axis)

To find out the parameters that describe the regression line, and to see whether this line is a useful model, we need to run a regression analysis. To do the analysis you need to use PROC REG. The PROC REG syntax is very straightforward, and is shown in SAS Syntax 7.1. The MODEL statement is written in the form of the equation: we think that sales are a function of adverts, so we write sales = adverts. Notice that PROC REG is slightly different from other procedures: it is an interactive procedure that stays active after the RUN statement, so we need to write QUIT; after the RUN statement to end it.

PROC REG DATA=chapter6.record1;
   MODEL sales = adverts;
RUN;
QUIT;

SAS Syntax 7.1

7.4. Interpreting a simple regression

7.4.1. Overall fit of the model

The output from the regression is shown in SAS Output 7.1. The first part of the output reports an analysis of variance (ANOVA – see Chapter 10). The summary table shows the various sums of squares described in Figure 7.4 and the degrees of freedom associated with each. From these two values, the average sums of squares (the mean squares) can be calculated by dividing the sums of squares by the associated degrees of freedom. The most important part of the table is the F-ratio, which is calculated using equation (7.5), and the associated significance value of that F-ratio. For these data, F is 99.59, which is significant at p < .001 (because the value in the column labelled Pr > F is less than .001; SAS reports such very small probabilities as <.0001). This result tells us that there is less than a 0.1% chance that an F-ratio at least this large would happen if the null hypothesis were true. Therefore, we can conclude that our regression model results in significantly better prediction of record sales than if we used the mean value of record sales. In short, the regression model overall predicts record sales significantly well.

The second part of the output is a summary of the model. This summary provides the values of R2 (labelled R-Square) and its square root, R, for the model that has been derived (as well as some other things we are not going to worry about for now). For these data, R has a value of .578 and because there is only one predictor, this value represents the simple correlation between advertising and record sales (you can confirm this by running a correlation using what you were taught in Chapter 6 – a sketch follows at the end of this section). The value of R2 is .335, which tells us that advertising expenditure can account for 33.5% of the variation in record sales. In other words, if we are trying to explain why some records sell more than others, we can look at the variation in sales of different records. There might be many factors that can explain this variation, but our model, which includes only advertising expenditure, can explain approximately 33% of it. This means that 67% of the variation in record sales cannot be explained by advertising alone. Therefore, there must be other variables that have an influence also.
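If you want to check that, with a single predictor, R really is just the Pearson correlation, a minimal sketch is below (it assumes the record1 data set is available in the same library used in SAS Syntax 7.1; PROC CORR was covered in Chapter 6):

* Pearson correlation between advertising budget and record sales;
* With a single predictor this correlation should match R (about .578),
  the square root of the R-Square value in the regression output;
PROC CORR DATA=chapter6.record1;
   VAR adverts sales;
RUN;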
7.4.2. Model parameters

The ANOVA tells us whether the model, overall, results in a significantly good degree of prediction of the outcome variable. However, the ANOVA doesn't tell us about the individual contribution of variables in the model (although in this simple case there is only one variable in the model and so we can infer that this variable is a good predictor). The third part of the output provides details of the model parameter estimates (the beta values) and the significance of these values. We saw in equation (7.2) that b0 was the Y intercept and this value is the value labelled Parameter Estimate (in the SAS output) for the constant.

SAS Output 7.1

The REG Procedure
Model: MODEL1
Dependent Variable: SALES  Record Sales (thousands)

Number of Observations Read    200
Number of Observations Used    200

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1           433688        433688     99.59   <.0001
Error            198           862264    4354.86953
Corrected Total  199          1295952

Root MSE         65.99144   R-Square   0.3346
Dependent Mean  193.20000   Adj R-Sq   0.3313
Coeff Var        34.15706

Parameter Estimates
Variable    Label                                      DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept                                   1            134.13994          7.53657     17.80     <.0001
ADVERTS     Advertsing Budget (thousands of pounds)     1              0.09612          0.00963      9.98     <.0001

So, from the table, we can say that b0 is 134.14, and this can be interpreted as meaning that when no money is spent on advertising (when X = 0), the model predicts that 134,140 records will be sold (remember that our unit of measurement was thousands of records). We can also read off the value of b1 from the table and this value represents the gradient of the regression line. It is 0.096. Although this value is the slope of the regression line, it is more useful to think of this value as representing the change in the outcome associated with a unit change in the predictor. Therefore, if our predictor variable is increased by one unit (if the advertising budget is increased by 1), then our model predicts that 0.096 extra records will be sold. Our units of measurement were thousands of pounds and thousands of records sold, so we can say that for an increase in advertising of £1000 the model predicts 96 (0.096 × 1000 = 96) extra record sales. As you might imagine, this investment is pretty bad for the record company: it invests £1000 and gets only 96 extra sales! Fortunately, as we already know, advertising accounts for only one-third of the variation in record sales.

We saw earlier that, in general, values of the regression coefficient b represent the change in the outcome resulting from a unit change in the predictor and that if a predictor is having a significant impact on our ability to predict the outcome then this b should be different from 0 (and big relative to its standard error). We also saw that the t-test tells us whether the b-value is different from 0. SAS provides the exact probability that the observed value of t would occur if the value of b in the population were 0. If this observed significance is less than .05, then scientists agree that the result reflects a genuine effect (see Chapter 2). For these two values, the probabilities are <.0001 and so we can say that the probability of these t values (or larger) occurring if the values of b in the population were 0 is less than .0001. Therefore, the bs are different from 0 and we can conclude that the advertising budget makes a significant contribution (p < .0001) to predicting record sales.

SELF-TEST: How is the t in SAS Output 7.1 calculated? Use the values in the table to see if you can get the same value as SAS.

7.4.3. Using the model

So far, we have discovered that we have a useful model, one that significantly improves our ability to predict record sales. However, the next stage is often to use that model to make some predictions. The first stage is to define the model by replacing the b-values in equation (7.2) with the values from SAS Output 7.1.
In addition, we can replace the X and Y with the variable names so that the model becomes:

record salesi = b0 + b1 × advertising budgeti
              = 134.14 + (0.096 × advertising budgeti)    (7.7)

It is now possible to make a prediction about record sales, by replacing the advertising budget with a value of interest. For example, imagine a record executive wanted to spend £100,000 on advertising a new record. Remembering that our units are already in thousands of pounds, we can simply replace the advertising budget with 100. He would discover that record sales should be around 144,000 for the first week of sales; the arithmetic is sketched below.
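Plugging the value into equation (7.7) gives 134.14 + (0.096 × 100) = 143.74, i.e. roughly 144 thousand records. If you would rather let SAS do the arithmetic, here is a minimal sketch (the data set and variable names are just for illustration; the coefficients are the ones reported in SAS Output 7.1):

* Predicted sales (in thousands of records) for an advertising budget of
  100 (thousand pounds), using the coefficients from SAS Output 7.1;
DATA prediction;
   adverts = 100;
   sales   = 134.13994 + 0.09612*adverts;
   PUT sales=;   * writes sales=143.75194 to the log, i.e. about 144,000 records;
RUN;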