CHAPTER 10

Moderation, mediation and more regression

FIGURE 10.1  My 10th birthday. (From left to right) My brother Paul (who still hides behind cakes rather than have his photo taken), Paul Spreckley, Alan Palsey, Clair Sparks and me

10.1. What will this chapter tell me?

Having successfully slayed audiences at holiday camps around the country, my next step towards global domination was my primary school. I had learnt another Chuck Berry song ('Johnny B. Goode'), but had also broadened my repertoire to include songs by other artists (I have a feeling 'Over the edge' by Status Quo was one of them).¹ Needless to say, when the opportunity came to play at a school assembly I jumped at it. The headmaster tried to have me banned,² but the show went on. It was a huge success (I want to reiterate my earlier point that 10-year-olds are very easily impressed). My classmates carried me around the playground on their shoulders. I was a hero.

Around this time I had a childhood sweetheart called Clair Sparks. Actually, we had been sweethearts since before my newfound rock legend status. I don't think the guitar playing and singing impressed her much, but she rode a motorbike (really, a little child's one), which impressed me quite a lot; I was utterly convinced that we would one day get married and live happily ever after. I was utterly convinced, that is, until she ran off with Simon Hudson. Being 10, she probably literally did run off with him - across the playground. I remember telling my parents and them asking me how I felt about it. I told them I was being philosophical about it. I probably didn't know what philosophical meant at the age of 10, but I knew that it was the sort of thing you said if you were pretending not to be bothered about being dumped.

If I hadn't been philosophical, I might have wanted to look at what had lowered Clair's relationship satisfaction. We've seen in previous chapters that we could predict things like relationship satisfaction using regression. Perhaps it's predicted from your partner's love of rock bands like Status Quo (I don't recall Clair liking that sort of thing). However, life is usually more complicated than this; for example, your partner's love of rock music probably depends on your own love of rock music. If you both like rock music then your love of the same music might have an additive effect, giving you huge relationship satisfaction (moderation), or perhaps the relationship between your partner's love of rock and your own relationship satisfaction can be explained by your own music tastes (mediation). In the previous chapter we also saw that regression could be done with a dichotomous predictor (e.g., rock fan or not), but what if you wanted to categorize musical taste into several categories (rock, hip-hop, R & B, etc.)? Surely you can't use multiple categories as a predictor in regression?

¹ This would have been about 1982, so just before they became the most laughably bad band on the planet. Some would argue that they were always the most laughably bad band on the planet, but they were the first band that I called my favourite band.

² Seriously! Can you imagine a headmaster banning a 10-year-old from assembly? By this time I had an electric guitar and he used to play hymns on an acoustic guitar; I can assume only that he somehow lost all perspective on the situation and decided that a 10-year-old blasting out some Quo in a squeaky little voice was subversive or something.
This chapter extends what we know about regression to these more complicated scenarios. First we look at two common regression-based models - moderation and mediation - before expanding what we already know about categorical predictors.

10.2. Installing custom dialog boxes in SPSS

Although you can do both moderation and mediation analysis in SPSS manually, it's a bit of a faff. It will require you to create new variables using the compute command, and in the case of mediation analysis it will limit what you can do considerably. By far the best way to tackle moderation and mediation is to use the PROCESS command. This is not part of SPSS; it exists only because Andrew Hayes and his colleague Kristopher Preacher have spent an enormous amount of time writing a range of tools for doing moderation and mediation analyses (e.g., Hayes & Matthes, 2009; Preacher & Hayes, 2004, 2008a). These tools were previously available only through syntax, and for inexperienced users were a bit scary and fiddly. Andrew Hayes wrote the PROCESS custom dialog box (Hayes, 2012) to wrap the Preacher and Hayes mediation and moderation tools in a convenient menu and dialog box interface. It's pretty much the best thing to happen to moderation and mediation analysis in a long time. While using these tools, I strongly suggest you spare a thought of gratitude that there are people like Hayes and Preacher in the world who invest their spare time doing cool stuff like this that makes it possible for you to analyse your data without having a nervous breakdown. Even if you think you are having a nervous breakdown, trust me it's not as big as the one you'd be having if PROCESS didn't exist.

The PROCESS tool is what's known as a custom dialog box. SPSS includes the ability to add your own menus and dialog boxes, which means that you can write your own functions using syntax, but then create a custom menu and dialog box for yourself so that you can access the syntax through a nice point-and-click menu. Of course, most of us will never use this feature, but Andrew Hayes has. Essentially, he provides a file (process.spd) that you download, which installs a new menu into the Analyze > Regression menu. From this menu you access a dialog box that can be used to do moderation and mediation analysis. You install PROCESS in three easy steps, which are illustrated in Figure 10.2 (MacOS users can ignore step 2):

FIGURE 10.2  Installing the PROCESS menu

1  Download the install file: Download the file process.spd from Andrew Hayes' website: http://www.afhayes.com/spss-sas-and-mplus-macros-and-code.html. Save this file onto your computer.

2  Start SPSS as an administrator: To install the tool in Windows, you need to start IBM SPSS as an administrator. To do this, make sure that SPSS isn't already running, and then click on the Start menu. Select All Programs, which will display a list of programs installed on your machine. Within that list, there should be a folder called IBM SPSS Statistics. Select that folder to display its contents. You should see an icon within that folder labelled IBM SPSS Statistics 20 (don't be worried if the number is different from 20, it just refers to the version of SPSS that you have installed). Click on this icon with the right mouse button to activate the menu in Figure 10.2. Within this menu select (you're back to using the left mouse button now) Run as administrator. This action opens SPSS but allows it to make changes to your computer.
A dialog box will appear that asks you whether you want to let SPSS make changes to your computer, and you should select Yes.

3  Install the custom dialog: Once SPSS has loaded, select Utilities > Custom Dialogs > Install Custom Dialog..., which will open a standard dialog box for opening files (Figure 10.2). Locate the file process.spd, select it, and click on Open. This will install the PROCESS menu and dialog boxes into SPSS. If you get an error message, the most likely explanation is that you haven't opened SPSS as an administrator (see step 2). Once the installation is complete you'll find that the PROCESS menu has been added to the existing Analyze > Regression menu (Figure 10.3).

10.3. Moderation: interactions in regression

10.3.1. The conceptual model

So far we have looked at individual predictors in the linear model. However, it is possible for a statistical model to include the combined effect of two or more predictor variables on an outcome. The combined effect of two variables on another is known conceptually as moderation, and in statistical terms as an interaction effect. We'll start with the conceptual, and we'll use an example of whether violent video games make people antisocial. Video games are among the favourite online activities for young people: two-thirds of 5-16-year-olds have their own video games console, and 88% of boys aged 8-15 own at least one games console (Ofcom (Office of Communications), 2008). Although playing violent video games can enhance visuospatial acuity, visual memory, probabilistic inference and mental rotation compared to games such as Tetris (Feng, Spence, & Pratt, 2007; Green & Bavelier, 2007; Green, Pouget, & Bavelier, 2010; Mishra, Zinni, Bavelier, & Hillyard, 2011), these games have also been linked to increased aggression in youths (Anderson & Bushman, 2001). Another predictor of aggression and conduct problems is callous-unemotional traits such as lack of guilt, lack of empathy, and callous use of others for personal gain (Rowe, Costello, Angold, Copeland, & Maughan, 2010). Imagine a scientist wanted to look at the relationship between playing violent video games such as Grand Theft Auto, MadWorld and Manhunt and aggression. She gathered data from 442 youths (Video Games.sav).
She measured their aggressive behaviour (Aggression), callous-unemotional traits (CaUnTs), and the number of hours per week they play video games (Vid_Games).

FIGURE 10.3  After installation, the PROCESS menu appears as part of the existing Regression menu

FIGURE 10.4  Diagram of the conceptual moderation model

FIGURE 10.5  A categorical moderator (callous traits): aggression plotted against hours playing video games, with separate lines for the not callous and callous groups

Let's assume we're interested in the relationship between the hours spent playing these games (predictor) and aggression (outcome). The conceptual model of moderation is shown in Figure 10.4, and this diagram shows that a moderator variable is one that affects the relationship between two others. If callous-unemotional traits were a moderator then we're saying that the strength or direction of the relationship between game playing and aggression is affected by callous-unemotional traits.

Imagine that we could classify people in terms of callous-unemotional traits: they either have them or they don't. Our moderator variable would be categorical (callous or not callous). Figure 10.5 shows an example of how moderation would work in this case: for people who are not callous there is no relationship between video games and aggression (the line is completely flat), but for people who are callous there is a positive relationship because the more time spent playing these games, the higher the aggression levels (the line slopes upwards). Therefore, callous-unemotional traits moderate the relationship between video games and aggression: there is a positive relationship in those with callous-unemotional traits but not for those without. This is the simplest way to think about moderation. However, it is not necessary that there is an effect in one group but not in the other; all we're looking for is a change in the relationship between video games and aggression across the two callousness groups. It could be that the effect is weakened or changes direction.

If we measure the moderator variable along a continuum it becomes a bit trickier to visualize, but the basic interpretation stays the same. Figure 10.6 shows two graphs (labelled 'No Moderation/Interaction' and 'Moderation/Interaction') that display the relationships between the time spent playing video games, aggression and callous-unemotional traits (measured along a continuum rather than as two groups).
We're still interested in how the relationship between video games and aggression changes as a function of callous-unemotional traits. We can do this by comparing the slope of the regression plane for time spent gaming at low and high values of callous traits. To help you, I have added blue arrows that show the relationship between video games and aggression. In the left of the diagram you can see that at the low end of the callous-unemotional traits scale there is a slight positive relationship between playing video games and aggression (as time playing games increases, so does aggression). At the high end of the callous-unemotional traits scale, we see a very similar relationship between video games and aggression (the ends of the regression planes slope at the same angle). The same is also true at the middle of the callous-unemotional traits scale. This is a case of no interaction, or no moderation.

The right of Figure 10.6 shows an example of moderation: at low values of callous-unemotional traits the plane slopes downwards, indicating a slightly negative relationship between playing video games and aggression, but at the high end of callous-unemotional traits the plane slopes upwards, indicating a strong positive relationship between gaming and aggression. At the midpoint of the callous-unemotional traits scale, the relationship between video games and aggression is relatively flat. So, as we move along the callous-unemotional traits variable, the relationship between gaming and aggression changes from slightly negative to neutral to strongly positive. We can say that the relationship between violent video games and aggression is moderated by callous-unemotional traits.

FIGURE 10.6  A continuous moderator (callous traits): regression planes for aggression predicted from hours playing video games and callous-unemotional traits, showing no moderation/interaction (left) and moderation/interaction (right)

10.3.2. The statistical model

Now we know what moderation is conceptually, let's look at how we explore these effects within a statistical model. Figure 10.7 shows how we conceptualize moderation statistically: we predict the outcome from the predictor variable, the proposed moderator, and the interaction of the two. It is the interaction effect that tells us whether moderation has occurred, but we must include the predictor and moderator as well for the interaction term to be valid. This point is very important. In our example, then, we'd be looking at doing a regression in which aggression was the outcome, and we would predict it from video game playing, callous-unemotional traits and their interaction.

FIGURE 10.7  Diagram of the statistical moderation model

All of the general linear models we've considered in this book take the general form of:

outcome_i = (model) + error_i

When we encountered multiple regression in Chapter 8 we saw that this model was written as (see equation (8.6)):

Y_i = (b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni}) + \varepsilon_i

Therefore, our basic regression model for this example would be:

Aggression_i = (b_0 + b_1 Gaming_i + b_2 Callous_i) + \varepsilon_i

However, to test for moderation we need to consider the interaction between gaming and callous-unemotional traits. If we want to include this term too, then we have seen before that we can extend the linear model to include extra terms, and each time we do we assign them a parameter (b).
A model that tests for moderation, therefore, is as follows (first expressed generally and then in terms of this specific example):

Y_i = (b_0 + b_1 A_i + b_2 B_i + b_3 A_i B_i) + \varepsilon_i

Aggression_i = (b_0 + b_1 Gaming_i + b_2 Callous_i + b_3 Interaction_i) + \varepsilon_i     (10.1)

10.3.3. Centring variables

When an interaction term is included in the model the b parameters have a specific meaning: for the individual predictors they represent the regression of the outcome on that predictor when the other predictor is zero. So, in equation (10.1), b_1 represents the relationship between aggression and gaming when callous traits are zero, and b_2 represents the relationship between aggression and callous traits when someone spends zero hours gaming per week. In our particular example this interpretation isn't problematic because zero is a meaningful score for both predictors: it's plausible that a child spends no hours playing video games, and it is plausible that a child gets a score of 0 on the continuum of callous-unemotional traits. However, there are often situations where it makes no sense for a predictor to have a score of zero. Imagine that rather than measuring how much a child played violent video games, we'd measured their heart rate while playing the games as an indicator of their physiological reactivity to them:

Aggression_i = (b_0 + b_1 Heart Rate_i + b_2 Callous_i + b_3 Interaction_i) + \varepsilon_i

In this model b_2 is the regression of aggression on callous traits when someone has a heart rate of zero while playing the games. This b makes no sense unless we're interested in knowing something about the relationship between callous traits and aggression in youths who die (and therefore have a heart rate of zero) while playing these games. It's fair to say that in the unlikely event that playing a video game actually killed someone, we wouldn't really have to worry one way or another about them subsequently developing aggression. Hopefully this example illustrates that the presence of the interaction term makes the bs for the main predictors uninterpretable in many situations.

For this reason, it is common to transform the predictors using grand mean centring. Centring refers to the process of transforming a variable into deviations around a fixed point. This fixed point can be any value that you choose, but typically it's the grand mean. When we calculated z-scores in Chapter 1 we used grand mean centring because the first step was to take each score and subtract from it the mean of all scores. Like z-scores, the subsequent scores are centred on zero, but unlike z-scores we don't go on to express the centred scores as standard deviations.³ Therefore, grand mean centring for a given variable is achieved by taking each score and subtracting from it the mean of all scores (for that variable).

³ Remember that with z-scores we go a step further and divide the centred scores by the standard deviation of the original data, which changes the units of measurement to standard deviations.

Centring the predictors has no effect on the b for the highest-order predictor, but it will affect the bs for the lower-order predictors. 'Highest-order' and 'lower-order' refer to how many variables are involved: the gaming × callous traits interaction is a higher-order effect than the effect of gaming alone because it involves two variables rather than one. So, in our model (equation (10.1)), whether or not we centre the predictors will have no effect on b_3 (the parameter for the interaction), but it will change the values of b_1 and b_2 (the parameters for gaming and callous traits). As we have seen, if we don't centre the gaming and callous variables, then the bs represent the effect of the predictor when the other predictor is zero.
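To see algebraically why the lower-order bs depend on where zero falls, we can rearrange the moderation model (a small sketch I have added using the notation of equation (10.1); this is standard algebra rather than anything specific to PROCESS):

\[
Y_i = b_0 + b_1 A_i + b_2 B_i + b_3 A_i B_i + \varepsilon_i = (b_0 + b_2 B_i) + (b_1 + b_3 B_i) A_i + \varepsilon_i
\]

The slope relating A to the outcome is therefore (b_1 + b_3 B): it changes with the value of the moderator B, and b_1 on its own is that slope only when B = 0. Centring B simply relocates this zero point to the mean, which is why the centred b_1 describes the effect of A for someone average on B.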
However, if we centre the gaming and callous variables then the bs represent the effect of the predictor when the other predictor is at its mean value. For example, b_2 represents the relationship between aggression and callous traits for someone who spends the average number of hours gaming per week. Therefore, centring is particularly important when your model contains an interaction term because it makes the bs for lower-order effects interpretable. There are good reasons for not caring about the lower-order effects when the higher-order interaction involving those effects is significant, but when it is not, centring will make interpreting the main effects easier. For example, if the gaming × callous traits interaction is significant, then it's not clear why we would be interested in the individual effects of gaming and callous traits.

In any case, with centred variables the bs for individual predictors have two interpretations: (1) they are the effect of that predictor at the mean value of the sample; and (2) they are the average effect of the predictor across the range of scores for the other predictors. To explain the second interpretation, imagine we took everyone who spent no hours gaming and computed the regression between aggression and callous traits and noted the b, then we took everyone who played games for 1 hour and did the same, then we took everyone who gamed for 2 hours per week and did the same. We continued doing this until we had computed regressions for every different value of the hours spent gaming. We'd have a lot of bs: each one representing the relationship between callous traits and aggression, but for different amounts of gaming. If we took an average of these bs then we'd get the same value as the b for callous traits (centred) when we use it as a predictor along with gaming (centred) and their interaction.

The PROCESS tool will do the centring for us so we don't really need to worry too much about how it's done, but because centring is useful in other analyses Oliver Twisted has some additional material that shows you how to do it manually for this example.

OLIVER TWISTED  Please, Sir, can I have some more ... centring?
'Recentgin', babbles Oliver as he stumbles drunk out of Mrs Moonshine's alcohol emporium. 'I've had some recent gin.' I think you mean centring, Oliver, not recent gin. If you want to know how to centre your variables using SPSS, then the additional material for this chapter on the companion website will tell you.

10.3.4. Creating interaction variables

Equation (10.1) contains a variable called 'Interaction', but the data file does not. The question you might well ask is how we enter a variable into the model that doesn't exist in the data set. We can create it, and it's easier than you might think. Mathematically speaking, when we look at the combined effect of two variables (an interaction) we are literally looking at the effect of the two variables multiplied together. So the interaction variable in this case would literally be the scores on the time spent gaming multiplied by the scores for callous-unemotional traits. That's why interactions are denoted as variable 1 × variable 2.
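If you want a preview of how this is done by hand (the Oliver Twisted material and the self-tests below walk you through it), one way is sketched here in SPSS syntax. It assumes the variable names in Video Games.sav (Aggression, CaUnTs, Vid_Games) and creates the centred variables and interaction term used in the self-tests; the AGGREGATE trick is just one convenient way to get the grand means:

* Add the grand means of the predictors to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /CUT_Mean=MEAN(CaUnTs)
  /Vid_Mean=MEAN(Vid_Games).
* Grand mean centring: subtract each variable's mean from its scores.
COMPUTE CUT_Centred = CaUnTs - CUT_Mean.
COMPUTE Vid_Centred = Vid_Games - Vid_Mean.
* The interaction variable is simply the product of the two centred predictors.
COMPUTE Interaction = CUT_Centred * Vid_Centred.
EXECUTE.
* Forced entry regression with the predictor, moderator and their interaction.
REGRESSION
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT Aggression
  /METHOD=ENTER Vid_Centred CUT_Centred Interaction.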
The way we'll do moderation analysis in SPSS creates the interaction variable for you, but the self-help task gives you some practice at doing it manually (which might be handy for future reference).

SELF-TEST  Follow Oliver Twisted's instructions to create the centred variables CUT_Centred and Vid_Centred. Then use the compute command to create a new variable called Interaction in the Video Games.sav file, which is CUT_Centred multiplied by Vid_Centred.

10.3.5. Following up an interaction effect

As we have already seen, moderation is shown by a significant interaction between variables. However, if the moderation effect is significant, then we need to delve a bit deeper to find out the nature of the moderation. In our example, we're predicting that the moderator (callous traits) will influence the relationship between playing violent video games and aggression. If the interaction of callous traits and time spent gaming is a significant predictor of aggression then we know that we have a moderation effect, but we don't know the nature of the effect. It could be that the time spent gaming always has a positive relationship with aggression, but that relationship gets stronger the more a person has callous traits. Alternatively, perhaps in people low on callous traits the time spent gaming reduces aggression, but it increases aggression in those high on callous traits (i.e., the relationship reverses). To find out what is going on we need to do something known as simple slopes analysis (Aiken & West, 1991; Rogosa, 1981).

The idea behind simple slopes analysis is fairly straightforward, and it's really no different from what was illustrated in Figure 10.6. When describing that figure I talked about comparing the relationship between the predictor (time spent gaming) and outcome (aggression) at low and high levels of the moderator (callous traits). For example, in the right panel of Figure 10.6, we saw that time spent gaming and aggression had a slightly negative relationship at low levels of callous traits, but a fairly strong positive relationship at high levels of callous traits. This is the essence of simple slopes analysis: we work out the regression equations for the predictor and outcome at low, high and average levels of the moderator. The 'high' and 'low' levels can be anything you like, but PROCESS uses 1 standard deviation above and below the mean value of the moderator. Therefore, in our example, we would get the regression model for aggression predicted from hours spent gaming for the average value of callous traits, for 1 standard deviation above the mean value of callous traits and for 1 standard deviation below the mean value of callous traits. We compare these slopes both in terms of their significance and the value and direction of the b to see whether the relationship between hours spent gaming and aggression changes at different levels of callous traits (the syntax sketch below shows the logic).
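PROCESS will do this for us, but if you like to see the machinery, a simple slope can be obtained by re-centring the moderator at the value of interest and refitting the model. The sketch below (SPSS syntax; my own illustration rather than anything PROCESS runs for you) uses the variables created earlier and 9.62, which is roughly one standard deviation of CaUnTs in this data set; swap the sign of 9.62 to get the slope at 1 SD below the mean:

* Re-centre the moderator so that zero corresponds to +1 SD on callous traits.
COMPUTE CUT_High = CUT_Centred - 9.62.
COMPUTE Int_High = CUT_High * Vid_Centred.
EXECUTE.
* In this model the coefficient for Vid_Centred is the simple slope of gaming
* when callous traits are 1 SD above the mean.
REGRESSION
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT Aggression
  /METHOD=ENTER CUT_High Vid_Centred Int_High.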
A slightly different approach is to look at how the relationship between the predictor and outcome changes at lots of different values of the moderator (not just at high, low and mean values). One such approach, implemented by PROCESS, is based on Johnson and Neyman (1936). Essentially, it computes the regression model for the predictor and outcome at lots of different values of the moderator. For each model it computes the significance of the regression slope so you can see for which values of the moderator the relationship between the predictor and outcome is significant. It returns a 'zone of significance',⁴ which consists of two values of the moderator. Typically, for values of the moderator in between these two values the predictor does not significantly predict the outcome, whereas for values below the lower value and above the upper value the predictor does significantly predict the outcome.

⁴ I have to be careful not to confuse this with my wife, who is the Zoe of significance.

10.3.6. Running the analysis

Given that moderation is demonstrated through a significant interaction between the predictor and moderator in a regression, we could follow the general procedure for fitting linear models in Chapter 8 (Figure 8.11). We would first centre the predictor and moderator, then create the interaction term as discussed already, then run a forced entry regression with the centred predictor, centred moderator and the interaction of the two centred variables as predictors. The advantage of this approach is that we can inspect sources of bias in the model.

SELF-TEST  Assuming you have done the other self-test, run a regression predicting Aggression from CUT_Centred, Vid_Centred and Interaction.

Using the PROCESS tool (if you haven't installed it yet, see Section 10.2) has several advantages over using the normal regression tools: (1) it will centre the predictors for us; (2) it computes the interaction term automatically; and (3) it will do simple slopes analysis. To access the dialog boxes in Figure 10.8 select Analyze > Regression > PROCESS, by Andrew F. Hayes (http://www.afhayes.com). The variables in your data file will be listed in the box labelled Data File Variables. Select the outcome variable (in this case Aggression) and drag it to the box labelled Outcome Variable (Y), or click on the arrow. Similarly, select the predictor variable (in this case Vid_Games) and drag it to the box labelled Independent Variable (X). Finally, select the moderator variable (in this case CaUnTs) and drag it to the box labelled M Variable(s), or click on the arrow. This box is where you specify any moderators (you can have more than one).

FIGURE 10.8  The dialog boxes for running moderation analysis

PROCESS can test 74 different types of model, and these models are listed in the drop-down box labelled Model Number. If you want to investigate all 74 different models then have a look at the PROCESS documentation (http://www.afhayes.com/public/process.pdf). Simple moderation analysis is represented by model 1, but the default model is 4 (mediation, which we'll look at next). Therefore, activate this drop-down list and select 1. The rest of the options in this dialog box are for models other than simple moderation, so we'll ignore them.

If you click on Options another dialog box will appear containing four useful options for moderation. Selecting (1) Mean center for products centres the predictor and moderator for you; (2) Heteroscedasticity-consistent SEs means we need not worry about having heteroscedasticity in the model; (3) OLS/ML confidence intervals produces confidence intervals for the model, and I've tried to emphasize the importance of these throughout the book; and (4) Generate data for plotting is helpful for interpreting and visualizing the simple slopes analysis.
Talking of simple slopes analysis, if you click on Conditioning you can change whether you want simple slopes at ±1 standard deviation of the mean of the moderator (the default, which is fine) or at percentile points (it uses the 10th, 25th, 50th, 75th and 90th percentiles). It is also useful to select the Johnson-Neyman method to get a zone of significance for the moderator. Back in the main dialog box, click on OK to run the analysis.

10.3.7. Output from moderation analysis

The first thing to notice about the output is that it appears as text rather than being nicely formatted in tables. Try not to let this formatting disturb you. If your output looks odd or contains warnings, or has a lot of zeros in it, it might be worth checking the variables that you input into PROCESS (SPSS Tip 10.1). However, assuming everything has gone smoothly, you should see Output 10.1, which is the main moderation analysis.

SPSS TIP 10.1  Troubleshooting PROCESS
There are a few things worth knowing about PROCESS that might help to prevent weird stuff happening. If the variable names entered into PROCESS are longer than 8 characters, it shortens them to 8 characters. Therefore, if you enter variables with similar long names PROCESS will get confused. For example, if you had two variables in the data editor called NumberOfNephariousActs and NumberOfBlackSabbathAlbumsOwned they would both be shortened to numberof (or possibly number~1 and number~2) and PROCESS will get confused about which variable is which. If your output looks weird, then check your variable names. Don't call any of your variables xxx (I'm not sure why you would) because that is a reserved variable name in PROCESS, so naming a variable xxx will confuse it. PROCESS is also confused by string variables, so only enter numeric variables.

This output is pretty much the same as the table of regression coefficients that we saw in Chapter 8. We're told the b-value for each predictor, and the associated standard errors (which have been adjusted for heteroscedasticity because we asked for them to be). Each b is compared to zero using a t-test, which is computed from the b divided by its standard error. The confidence interval for each b is also produced (because we asked for it). Moderation is shown up by a significant interaction effect, and in this case the interaction is highly significant, b = 0.027, 95% CI [0.013, 0.041], t = 3.71, p < .001, indicating that the relationship between the time spent gaming and aggression is moderated by callous traits.
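As a quick sanity check on the numbers in Output 10.1 (my own arithmetic, not something PROCESS reports separately), the t for the interaction is just its b divided by its standard error, evaluated against the 438 residual degrees of freedom shown in the model summary:

\[
t = \frac{b_{\text{interaction}}}{SE_b} = \frac{0.0271}{0.0073} \approx 3.71
\]

which matches the value printed in the output.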
SELF-TEST  Assuming you did the previous self-test, compare the table of coefficients that you got with those in Output 10.1.

OUTPUT 10.1

Model = 1
    Y = Aggressi
    X = Vid_Game
    M = CaUnTs
Sample size: 442

Outcome: Aggressi

Model Summary
      R     R-sq        F      df1       df2      p
  .6142    .3773  90.5311   3.0000  438.0000  .0000

Model
            coeff      se        t       p      LLCI     ULCI
constant  39.9671   .4750  84.1365   .0000   39.0335  40.9007
CaUnTs      .7601   .0466  16.3042   .0000     .6685    .8517
Vid_Game    .1696   .0759   2.2343   .0260     .0204    .3188
int_1       .0271   .0073   3.7051   .0002     .0127    .0414

Interactions:
 int_1    Vid_Game  X  CaUnTs

To interpret the moderation effect we can examine the simple slopes, which are shown in Output 10.2. Essentially, the table shows us the results of three different regressions: the regression for time spent gaming as a predictor of aggression (1) when callous traits are low (to be precise, when the value of callous traits is -9.6177); (2) at the mean value of callous traits (because we centred callous traits its mean value is zero, as indicated in the output); and (3) when the value of callous traits is 9.6177 (i.e., high). We can interpret these three regressions as we would any other: we're interested in the value of b (called Effect in the output) and its significance.

OUTPUT 10.2

Conditional effect of X on Y at values of the moderator(s)
   CaUnTs   Effect      se        t       p     LLCI    ULCI
  -9.6177   -.0907   .1058   -.8568   .3920   -.2986   .1173
    .0000    .1696   .0759   2.2343   .0260    .0204   .3188
   9.6177    .4299   .1010   4.2562   .0000    .2314   .6284

Values for quantitative moderators are the mean and plus/minus one SD from mean

From what we have already learnt about regression we can interpret the three models as follows:

1  When callous traits are low, there is a non-significant negative relationship between time spent gaming and aggression, b = -0.091, 95% CI [-0.299, 0.117], t = -0.86, p = .392.
2  At the mean value of callous traits, there is a significant positive relationship between time spent gaming and aggression, b = 0.170, 95% CI [0.020, 0.319], t = 2.23, p = .026.
3  When callous traits are high, there is a significant positive relationship between time spent gaming and aggression, b = 0.430, 95% CI [0.231, 0.628], t = 4.26, p < .001.

These results tell us that the relationship between time spent playing violent video games and aggression only really emerges in people with average or greater levels of callous-unemotional traits.

Output 10.3 shows the output of the Johnson-Neyman method, and this gives a different approach to simple slopes. First we're told the boundaries of the zone of significance: it is between -17.1002 and -0.7232. Remember that these are values of the centred version of the callous-unemotional traits variable, and they define regions within which the relationship between the time spent gaming and aggression is significant. The table underneath gives a detailed breakdown of these regions. Essentially it's doing something quite similar to the simple slopes analysis: it takes different values of callous-unemotional traits and for each one computes the b (Effect) and its significance for the relationship between the time spent gaming and aggression. I have annotated the output to show the boundaries of the zone of significance.
OUTPUT 10.3

********************* JOHNSON-NEYMAN TECHNIQUE *********************
Moderator value(s) defining Johnson-Neyman significance region(s)
 -17.1002    -.7232

Conditional effect of X on Y at values of the moderator (M)
    CaUnTs   Effect      se        t       p     LLCI     ULCI
  -18.5950   -.3336   .1587  -2.1027   .0361   -.6454   -.0218   <- Significant
  -17.1002   -.2931   .1492  -1.9654   .0500   -.5863    .0000
  -16.4450   -.2754   .1451  -1.8987   .0583   -.5605    .0097   <- Not significant
  -14.2950   -.2172   .1319  -1.6467   .1003   -.4765    .0420
  -12.1450   -.1590   .1194  -1.3319   .1836   -.3937    .0756
   -9.9950   -.1009   .1077   -.9361   .3497   -.3126    .1109
   -7.8450   -.0427   .0972   -.4390   .6609   -.2338    .1484
   -5.6950    .0155   .0882    .1757   .8606   -.1579    .1889
   -3.5450    .0737   .0813    .9059   .3655   -.0862    .2336
   -1.3950    .1319   .0771   1.7111   .0878   -.0196    .2833
    -.7232    .1501   .0763   1.9654   .0500    .0000    .3001
     .7550    .1901   .0759   2.5053   .0126    .0410    .3392   <- Significant
    2.9050    .2482   .0779   3.1878   .0015    .0952    .4013
    5.0550    .3064   .0829   3.6980   .0002    .1436    .4693
    7.2050    .3646   .0903   4.0360   .0001    .1871    .5422
    9.3550    .4228   .0997   4.2386   .0000    .2267    .6188
   11.5050    .4810   .1106   4.3490   .0000    .2636    .6983
   13.6550    .5392   .1225   4.4013   .0000    .2984    .7799
   15.8050    .5973   .1352   4.4188   .0000    .3317    .8630
   17.9550    .6555   .1484   4.4160   .0000    .3638    .9473
   20.1050    .7137   .1621   4.4017   .0000    .3950   1.0324
   22.2550    .7719   .1762   4.3914   .0000    .4256   1.1181
   24.4050    .8301   .1905   4.3580   .0000    .4557   1.2044

If you look at the column labelled p you can see that we start off with a significant negative relationship between time spent gaming and aggression, b = -0.334, 95% CI [-0.645, -0.022], t = -2.10, p = .036. As we move up to the next value of callous traits (-17.1002), the relationship between time spent gaming and aggression is still significant (p = .0500), but at the next value it becomes non-significant (p = .058). Therefore, the threshold for significance ends at -17.1002 (which we were told at the top of the output). As we increase the value of callous-unemotional traits the relationship between time spent gaming and aggression remains non-significant until the value of callous-unemotional traits is -0.723, at which point it just crosses the threshold for significance again. For all subsequent values of callous-unemotional traits the relationship between time spent gaming and aggression is significant. Looking at the b-values themselves (in the column labelled Effect) we can also see that with increases in callous-unemotional traits the strength of relationship between time spent gaming and aggression goes from a small negative effect (b = -0.334) to a fairly strong positive one (b = 0.830).

The final way we can look at these effects is by graphing them. In Figure 10.8 we asked PROCESS to generate data for plotting, and these data are at the bottom of the output (see Figure 10.9). We're given values of the variable Vid_Games (-6.9622, 0, 6.9622) and of CaUnTs (-9.6177, 0, 9.6177). These values are not important in themselves, but they correspond to low, mean and high values of each variable. The yhat column tells us the predicted values of the outcome (aggression) for these combinations of the predictors. For example, when Vid_Games and CaUnTs are both low (-6.9622 and -9.6177, respectively) the predicted value of aggression is 33.2879; when both variables are at their mean (0 and 0), the predicted value of aggression is 39.9671, and so on.
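These predicted values come straight from the model in Output 10.1 (a check I have added for illustration, using the rounded coefficients). For the low-low combination:

\[
\hat{Y} = 39.9671 + 0.1696(-6.9622) + 0.7601(-9.6177) + 0.0271(-6.9622 \times -9.6177) \approx 33.29
\]

which matches (to rounding error) the 33.2879 that PROCESS prints.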
To create a simple slopes graph we need to put these values in a data file. The simplest way to create the new data file is to create coding variables that represent low, mean and high (use any codes you like) and then enter all combinations of these codes. For example, in Figure 10.9 I've created variables called Games and CaUnTs, both of which are coding variables (1 = low, 2 = mean, 3 = high), entered the combinations of these codes that correspond to the PROCESS output (e.g., low-low, mean-low, high-low), and then typed in the corresponding predicted values from the PROCESS output. Hopefully you can see from Figure 10.9 how the output from PROCESS corresponds to the new data file. You can access this file as Video Game Graph.sav if you can't work out how to create it yourself. Having transferred the output to a data file, we can draw line graphs using what we learnt in Chapter 4.

FIGURE 10.9  Entering data for graphing simple slopes. The PROCESS 'data for visualizing conditional effect' are:

   Vid_Game    CaUnTs      yhat
    -6.9622   -9.6177   33.2879
      .0000   -9.6177   32.6568
     6.9622   -9.6177   32.0256
    -6.9622     .0000   38.7861
      .0000     .0000   39.9671
     6.9622     .0000   41.1481
    -6.9622    9.6177   44.2844
      .0000    9.6177   47.2774
     6.9622    9.6177   50.2705

and these values are entered into Video Game Graph.sav as Games (Low/Mean/High), CaUnTs (Low/Mean/High) and Aggression (the predicted values).

SELF-TEST  Draw a multiple line graph of Aggression (y-axis) against Games (x-axis) with different-coloured lines for different values of CaUnTs.

The resulting graph from the self-test is shown in Figure 10.10. The graph shows what we found from the simple slopes analysis: when callous traits are low (blue line) there is a non-significant negative relationship between time spent gaming and aggression; at the mean value of callous traits (green line) there is a small positive relationship between time spent gaming and aggression; and this relationship gets even stronger at high levels of callous traits (beige line).

FIGURE 10.10  Simple slopes equations of the regression of aggression on video games (hours per week, x-axis) at three levels of callous-unemotional traits (low, mean, high)

SELF-TEST  Now draw a multiple line graph of Aggression (y-axis) against CaUnTs (x-axis) with different-coloured lines for different values of Games.

10.3.8. Reporting moderation analysis

Moderation analysis is just regression, so we can report it in the same way as described in Section 8.9. My personal preference would be to produce a table such as Table 10.1.

TABLE 10.1  Linear model of predictors of aggression

                               b [95% CI]              SE B      t        p
Constant                       39.97 [39.03, 40.90]    0.475    84.13    p < .001
Callous Traits (centred)        0.76 [0.67, 0.85]      0.047    16.30    p < .001
Gaming (centred)                0.17 [0.02, 0.32]      0.076     2.23    p = .026
Callous Traits x Gaming         0.027 [0.01, 0.04]     0.007     3.71    p < .001

Note. R² = .38.

CRAMMING SAM'S TIPS  Moderation

- Moderation occurs when the relationship between two variables changes as a function of a third variable. For example, the relationship between watching horror films and feeling scared at bedtime might increase as a function of how vivid an imagination a person has.
- Moderation is tested using a regression in which the outcome (fear at bedtime) is predicted from a predictor (how many horror films are watched), the moderator (imagination) and the interaction of these variables.
- Predictors should be centred before the analysis.
- The interaction of two variables is simply the scores on the two variables multiplied together.
- If the interaction is significant then moderation is present.
- If moderation is found, follow up the analysis with simple slopes analysis. This analysis looks at the relationship between the predictor and outcome at low, mean and high levels of the moderator.

10.4. Mediation

10.4.1. The conceptual model

Whereas moderation alludes to the combined effect of two variables on an outcome, mediation refers to a situation when the relationship between a predictor variable and an outcome variable can be explained by their relationship to a third variable (the mediator). The top of Figure 10.11 shows a basic relationship between a predictor and an outcome (denoted as c). However, the bottom of the figure shows that these variables are also related to a third variable in specific ways: (1) the predictor also predicts the mediator through the path denoted by a; and (2) the mediator predicts the outcome through the path denoted by b. The relationship between the predictor and outcome will probably be different when the mediator is also included in the model and so is denoted c'. The letters denoting each path (a, b, c and c') represent the unstandardized regression coefficient between the variables connected by the arrow; therefore, they symbolize the strength of relationship between variables. Mediation is said to have occurred if the strength of the relationship between the predictor and outcome is reduced by including the mediator (i.e., the regression parameter for c' is smaller than for c). Perfect mediation occurs when c' is zero: in other words, the relationship between the predictor and outcome is completely wiped out by including the mediator in the model.

FIGURE 10.11  Diagram of a basic mediation model. The top panel shows the simple relationship between predictor and outcome (path c); the bottom panel shows the mediated relationship, with the indirect effect running from predictor to mediator (path a) and mediator to outcome (path b), and the direct effect being the remaining predictor-outcome path (c').
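In equation form (a sketch I have added; this is the standard way the paths in Figure 10.11 are written for ordinary least squares models, not something specific to the example that follows), the paths come from three regressions:

\[
\text{Outcome}_i = c_0 + c\,\text{Predictor}_i + \varepsilon_i
\]
\[
\text{Mediator}_i = a_0 + a\,\text{Predictor}_i + \varepsilon_i
\]
\[
\text{Outcome}_i = c'_0 + c'\,\text{Predictor}_i + b\,\text{Mediator}_i + \varepsilon_i
\]

The indirect effect is the product ab, and for ordinary least squares models like these the total effect decomposes as c = c' + ab, which is why a non-zero indirect effect shows up as a drop from c to c'.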
This description is all a bit abstract, so let's use an example. My wife and I often wonder what the important factors are in making a relationship last. For my part, I don't really understand why she'd want to be with a balding heavy rock fan with an oversized collection of vinyl and musical instruments and an unhealthy love of Doctor Who and numbers. It is important I gather as much information as possible about keeping her happy because the odds are stacked against me. For her part, I have no idea why she wonders: her very existence makes me happy. Perhaps if you are in a relationship you have wondered how to make it last too.

During our cyber-travels, Mrs Field and I have discovered that physical attractiveness (McNulty, Neff, & Karney, 2008), conscientiousness and neuroticism (good for us) predict marital satisfaction (Claxton, O'Rourke, Smith, & DeLongis, 2012). Pornography use probably doesn't: it is related to infidelity (Lambert, Negash, Stillman, Olmstead, & Fincham, 2012). Mediation is really all about the variables that explain relationships like these: it's unlikely that everyone who catches a glimpse of some porn suddenly rushes out of their house to have an affair - presumably it leads to some kind of emotional or cognitive change that undermines the love glue that holds us and our partners together. Lambert et al. tested this hypothesis. Figure 10.12 shows their mediator model: the initial relationship is that between pornography consumption (the predictor) and infidelity (the outcome), and they hypothesized that this relationship is mediated by commitment (the mediator). This model suggests that the relationship between pornography consumption and infidelity isn't a direct effect but operates through a reduction in relationship commitment. For this hypothesis to be true: (1) pornography consumption must predict infidelity in the first place (path c); (2) pornography consumption must predict relationship commitment (path a); (3) relationship commitment must predict infidelity (path b); and (4) the relationship between pornography consumption and infidelity should be smaller when relationship commitment is included in the model than when it isn't. We can distinguish between the direct effect of pornography consumption on infidelity, which is the relationship between them controlling for relationship commitment, and the indirect effect, which is the effect of pornography consumption on infidelity through relationship commitment (Figure 10.12).

FIGURE 10.12  Diagram of a mediation model from Lambert et al. (2012), showing the indirect effect of pornography consumption on infidelity through commitment, and the direct effect

10.4.2. The statistical model

Unlike moderation, the statistical model for mediation is basically the same as the conceptual model: it is characterized in Figure 10.11. Historically, this model was tested through a series of regression analyses, which reflect the four conditions necessary to demonstrate mediation (Baron & Kenny, 1986). I have mentioned already that the letters denoting the paths in Figure 10.11 represent the unstandardized regression coefficients for the relationships between the variables connected by the path. Therefore, to estimate any one of these paths, we want to know the unstandardized regression coefficient for the two variables involved. For example, Baron and Kenny suggested in their seminal paper that mediation is tested through three regression models (see also Judd & Kenny, 1981):

1  A regression predicting the outcome from the predictor variable. The regression coefficient for the predictor gives us the value of c in Figure 10.11.

2  A regression predicting the mediator from the predictor variable. The regression coefficient for the predictor gives us the value of a in Figure 10.11.

3  A regression predicting the outcome from both the predictor variable and the mediator. The regression coefficient for the predictor gives us the value of c' in Figure 10.11, and the regression coefficient for the mediator gives us the value of b.

These models test the four conditions of mediation: (1) the predictor variable must significantly predict the outcome variable in model 1; (2) the predictor variable must significantly predict the mediator in model 2; (3) the mediator must significantly predict the outcome variable in model 3; and (4) the predictor variable must predict the outcome variable less strongly in model 3 than in model 1.

In Lambert et al.'s (2012) study, all participants had been in a relationship for at least a year.
The researchers measured pornography consumption on a scale from 0 (low) to 8 (high), but this variable, as you might expect, was skewed (most people had low scores), so they analysed log-transformed values (LnConsumption). They also measured commitment to their current relationship (Commitment) on a scale from 1 (low) to 5 (high). Infidelity was measured in terms of questions asking whether the person had committed a physical act (Infidelity) that they or their partner would consider to be unfaithful (0 = no, 1 = one of them would consider it unfaithful, 2 = both of them would consider the act unfaithful),⁵ and also in terms of the number of people they had 'hooked up' with in the previous year (Hook_Ups), which would mean during a time period in which they were in their current relationship.⁶ The actual data from Lambert et al.'s study are in the file Lambert et al. (2012).sav.

⁵ I've coded this variable differently from the original data to make interpretation of it more intuitive, but it doesn't affect the results.

⁶ A 'hook-up' was defined to participants as 'when two people get together for a physical encounter and don't necessarily expect anything further (e.g., no plan or intention to do it again)'.

SELF-TEST  Run the three regressions necessary to test mediation for Lambert et al.'s data: (1) a regression predicting Infidelity from LnConsumption; (2) a regression predicting Commitment from LnConsumption; and (3) a regression predicting Infidelity from both LnConsumption and Commitment. Is there evidence of mediation?
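If you want a head start on the self-test, a sketch of the three Baron and Kenny regressions in SPSS syntax follows (using the variable names in Lambert et al. (2012).sav; these subcommands are one reasonable way to set the models up rather than the only way):

* Model 1 (path c): outcome from the predictor.
REGRESSION
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT Infidelity
  /METHOD=ENTER LnConsumption.
* Model 2 (path a): mediator from the predictor.
REGRESSION
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT Commitment
  /METHOD=ENTER LnConsumption.
* Model 3 (paths b and c'): outcome from the predictor and the mediator.
REGRESSION
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT Infidelity
  /METHOD=ENTER LnConsumption Commitment.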
Similarly, you could have a situation where the b for the relationship between the predictor and the outcome reduces a lot when the mediator is included, but remains significant in both cases. For example, perhaps when looked at in isolation the relationship between the predictor and outcome is b = 0.46, p < .001, but when the mediator is included as a predictor as well it reduces to b = 0.18, p = .042. You'd conclude (based on significance) that no mediation had occurred despite the fact that relationship between the predictor and outcome is less than half its original value. An alternative is to estimate the indirect effect and its significance. The indirect effect is illustrated in Figures 10.11 and 10.12: it is the combined effects of paths a and b. The significance of this effect can be assessed using the Sobel test (Sobel, 1982). If the Sobel test is significant it means that the predictor significantly affects the outcome variable via the mediator. In other words, there is significant mediation. This test works well in large samples, but you're better off computing confidence intervals for the indirect effect using bootstrap methods (Section 5.4.3). Now that computers make it easy for us to estimate the indirect effect (i.e., the effect of mediation) and its confidence interval, this practice is becoming increasingly common and is preferable to Baron and Kenny's regressions and the Sobel test because it's harder to get sucked into the black-and-white thinking of significance testing (Section 2.6.2.2). People tend to apply Baron and Kenny's method in a way that is intrinsically bound to looking for 'significant' relationships, whereas estimating the indirect effect and its confidence interval allows us to simply report the degree of mediation observed in the data. 411 10.4.3. Effect sizes of mediation If we're going to look at the size of the indirect effect to judge whether mediation has occurred, then it's useful to have effect size measures to help us (see Section 2.7.1). Many effect size measures have been proposed and are discussed in detail elsewhere (MacKinnon, 2008; Preacher & Kelley, 2011). The simplest is to look at the regression coefficient for the indirect effect and its confidence interval. Figure 10.11 shows us that the indirect effect is the combined effect of paths a and b. We have also seen that a and b are unstandardized regression coefficients for the relationships between variables denoted by the path. To find the combined effect of these paths, we simply multiply these regression coefficients: indirect effect = ab (10.2) The resulting value is an unstandardized regression coefficient like any other, and consequently is expressed in the original units of measurement. As we have seen, it is sometimes useful to look at standardized regression parameters, because these can be compared across different studies using different outcome measures (see Chapter 8). MacKinnon (2008) suggested standardizing this measure by dividing by the standard deviation of the outcome variable: indirect effect (partially standardized): ab Outcome (10.3) 412 DISCOVERING STATISTICS USING IBM SPSS STATISTICS This standardizes the indirect effect with respect to the outcome variable, but not the predictor or mediator. As such, it is sometimes referred to as the partially standardized indirect effect. 
To fully standardize the indirect effect we would need to multiply the partially standardized measure by the standard deviation of the predictor variable (Preacher & Hayes, 2008b):

indirect effect (standardized) = (ab / s_Outcome) x s_Predictor    (10.4)

This measure is sometimes called the index of mediation. It is useful in that it can be compared across different mediation models that use different measures of the predictor, outcome and mediator. Reporting this measure would be particularly helpful if anyone decides to include your research in a meta-analysis.

A different approach to estimating the size of the indirect effect is to look at the size of the indirect effect relative to either the total effect of the predictor or the direct effect of the predictor. For example, if we wanted the ratio of the indirect effect (ab) to the total effect (c) we could use the regression parameters from the various regressions displayed in Figure 10.11:

P_M = ab / c    (10.5)

Similarly, if we want to express the indirect effect as a ratio of the direct effect (c'), the regressions give us everything we need:

R_M = ab / c'    (10.6)

These ratio-based measures only really re-describe the original indirect effect. Both are very unstable in small samples, and MacKinnon (2008) advises against using P_M and R_M in samples smaller than 500 and 5000, respectively. Also, although it is tempting to think of P_M as a proportion (because it is the ratio of the indirect effect compared to the total effect), it is not: it can exceed 1 and even take on negative values (Preacher & Kelley, 2011). For these reasons, these ratio measures are probably best avoided.

In regression we used R² as a measure of the proportion of variance explained by a predictor (or several predictors). We can compute a form of R² for the indirect effect, which tells us the proportion of variance explained by the indirect effect. MacKinnon (2008) proposes several versions, but PROCESS computes this one:

R²_med = R²_Y,M - (R²_Y,MX - R²_Y,X)    (10.7)

This uses the proportion of variance in the outcome variable explained by the predictor (R²_Y,X), the mediator (R²_Y,M), and both (R²_Y,MX). It can be interpreted as the variance in the outcome that is shared by the mediator and the predictor, but that cannot be attributed to either in isolation. Again, this measure is not bound to fall between 0 and 1, and it's possible to get negative values (which usually indicate suppression effects rather than mediation).

The final measure that I'll consider was proposed by Preacher and Kelley (2011) and is called kappa-squared (κ²). If you read the original article, it is full of scary equations that make this measure very difficult to explain. However, at a conceptual level it is a very simple and elegant idea: kappa-squared expresses the indirect effect as a ratio to the maximum possible indirect effect that you could have found given the design of your study:

κ² = ab / max(ab)    (10.8)

The scary maths comes into play in how the maximum possible value of the indirect effect is computed. However, we have computers to do that for us, so let's just imagine that a frog called Hugglefrall sticks his big slimy tongue out and numbers attach themselves to it. He then swirls the numbers around in his mouth, does that funny expanding throat thing that frogs sometimes do, and then belches out the value for us.
Beyond that, all we need to know is that kappa-squared is a proportion and we can interpret it as such: values close to 0 mean the indirect effect is very small relative to the maximum possible value, and values close to 1 mean that it is as large as it could possibly be given the design that we have. Not that I should really encourage this sort of thing, but in terms of what constitutes a large effect, κ² can be equated to the values used for R²: a small effect is .01, a medium effect would be around .09, and a large effect in the region of .25 (Preacher & Kelley, 2011).

PROCESS computes all of the effect size measures that I have discussed, but of them all probably the most useful are the unstandardized and standardized indirect effect and κ². All of the measures discussed have accompanying confidence intervals and are unaffected by sample size (although note my earlier comments about the variability of P_M and R_M in small samples). However, P_M, R_M and R²_med cannot be interpreted easily because they allude to being proportions but are not, and all of the measures apart from κ² are unbounded, which again makes interpretation tricky (Preacher & Kelley, 2011).

10.4.4. Running the analysis

Assuming we're going to test Lambert et al.'s mediation model (Figure 10.12) by estimating the indirect effect rather than through a Baron and Kenny style mediation analysis, we can again use Hayes's PROCESS tool (see Section 10.2 if you haven't installed it yet). To access the dialog boxes in Figure 10.13 select Analyze > Regression > PROCESS, by Andrew F. Hayes (http://www.afhayes.com). The variables in your data file will be listed in the box labelled Data File Variables. Select the outcome variable (in this case Infidelity) and drag it to the box labelled Outcome Variable (Y), or click on the transfer arrow. Similarly, select the predictor variable (in this case LnConsumption) and drag it to the box labelled Independent Variable (X). Finally, select the mediator variable (in this case Commitment) and drag it to the box labelled M Variable(s), or click on the transfer arrow. This box is where you specify any mediators (you can have more than one). As I mentioned before, PROCESS can test many different types of model, and simple mediation analysis is represented by model 4 (this model is selected by default). Therefore, make sure that 4 is selected in the drop-down list under Model Number. Unlike moderation, there are other options in this dialog box that are useful: for example, to test the indirect effects we will use bootstrapping to generate a confidence interval around the indirect effect. By default PROCESS uses 1000 bootstrap samples, and will compute bias corrected and accelerated confidence intervals. These default options are fine, but just be aware that you can ask for percentile bootstrap confidence intervals instead (see Section 5.4.3).

[Figure 10.13: The dialog boxes for running mediation analysis]

If you click on Options another dialog box will appear containing four useful options for mediation. Selecting (1) Effect size produces the estimates of the size of the indirect effect
discussed in Section 10.4.3;7 (2) Sobel test produces a significance test of the indirect effect devised by Sobel; (3) Total effect model produces the direct effect of the predictor on the outcome (in this case the regression of infidelity predicted from pornography consumption); and (4) Compare indirect effects will, when you have more than one mediator in the model, estimate the effect and confidence interval for the difference between the indirect effects resulting from these mediators. This final option is useful when you have more than one mediator and want to compare their relative importance in explaining the relationship between the predictor and outcome. However, we have only a single mediator so we don't need to select this option (you can select it if you like, but it won't change the output produced). None of the options activated by clicking on the other buttons apply to simple mediation models, so we can ignore them and click OK to run the analysis.

ODITI'S LANTERN: moderation and mediation
'I, Oditi, want you to join my cult of undiscovered numerical truths. I also want you to stare into my lantern to gain statistical enlightenment. It's possible that statistical knowledge mediates the relationship between staring into my lantern and joining my cult ... or it could be mediated by neurological changes to your brain created by the subliminal messages in the videos. Stare into my lantern to find out about mediation and moderation.'

7 R²_med and κ² are produced only for models with a single mediator. Although I don't look at more complex models, bear this in mind if you run models including more than one mediator, or covariates.

10.4.5. Output from mediation analysis

As with moderation, the output appears as text. Output 10.4 shows the first part of the output, which initially tells us the name of the outcome (Y), the predictor (X) and the mediator (M) variables, which have been shortened to 8 letters (SPSS Tip 10.1). This is useful for double-checking we have entered the variables in the correct place: the outcome is infidelity, the predictor consumption, and the mediator is commitment. The next part of the output shows us the results of the simple regression of commitment predicted from pornography consumption (i.e., path a in Figure 10.12). This output is interpreted just as we would interpret any regression: we can see that pornography consumption significantly predicts relationship commitment, b = -0.47, t = -2.21, p = .028. The R² value tells us that pornography consumption explains 2% of the variance in relationship commitment, and the fact that the b is negative tells us that the relationship is negative also: as consumption increases, commitment declines (and vice versa).

Output 10.4
Model = 4; Y = Infideli; X = LnConsum; M = Commitme; sample size = 239
Outcome: Commitme
Model summary: R = .1418, R² = .0201, F(1, 237) = 4.8633, p = .0284
constant: b = 4.2027, SE = .0545, t = 77.1777, p = .0000
LnConsum: b = -.4697, SE = .2130, t = -2.2053, p = .0284

Output 10.5 shows the results of the regression of infidelity predicted from both pornography consumption (i.e., path c' in Figure 10.12) and commitment (i.e., path b in Figure 10.12). We can see that pornography consumption significantly predicts infidelity even with relationship commitment in the model, b = 0.46, t = 2.35, p = .02; relationship commitment also significantly predicts infidelity, b = -0.27, t = -4.61, p < .001. The R² value tells us that the model explains 11.4% of the variance in infidelity. The negative b for commitment tells us that as commitment increases, infidelity declines (and vice versa), but the positive b for consumption indicates that as pornography consumption increases, infidelity increases also. These relationships are in the predicted direction.

Output 10.5
Outcome: Infideli
Model summary: R = .3383, R² = .1144, F(2, 236) = 15.2453, p = .0000
constant: b = 1.3704, SE = .2518, t = 5.4433, p = .0000
Commitme: b = -.2710, SE = .0587, t = -4.6128, p = .0000
LnConsum: b = .4573, SE = .1946, t = 2.3505, p = .0196

Output 10.6 shows the total effect of pornography consumption on infidelity (outcome). You will get this bit of the output only if you selected Total effect model in Figure 10.13. The total effect is the effect of the predictor on the outcome when the mediator is not present in the model - in other words, path c in Figure 10.11. When relationship commitment is not in the model, pornography consumption significantly predicts infidelity, b = 0.58, t = 2.91, p = .004. The R² value tells us that the model explains 3.46% of the variance in infidelity. As is the case when we include relationship commitment in the model, pornography consumption has a positive relationship with infidelity (as shown by the positive b-value).

Output 10.6
TOTAL EFFECT MODEL
Outcome: Infideli
Model summary: R = .1859, R² = .0346, F(1, 237) = 8.4866, p = .0039
constant: b = .2315, SE = .0513, t = 4.5123, p = .0000
LnConsum: b = .5846, SE = .2007, t = 2.9132, p = .0039

Output 10.7 is the most important part of the output because it displays the results for the indirect effect of pornography consumption on infidelity (i.e., the effect via relationship commitment). First, we're told the effect of pornography consumption on infidelity in isolation (the total effect), and these values replicate the model in Output 10.6. Next, we're told the effect of pornography consumption on infidelity when relationship commitment is included as a predictor as well (the direct effect). These values replicate those in Output 10.5. The first bit of new information is the Indirect effect of X on Y, which in this case is the indirect effect of pornography consumption on infidelity. We're given an estimate of this effect (b = 0.127) as well as a bootstrapped standard error and confidence interval.
As we have seen many times before, 95% confidence intervals contain the true value of a parameter in 95% of samples. Therefore, we tend to assume that our sample isn't one of the 5% that does not contain the true value and use them to infer the population value of an effect. In this case, assuming our sample is one of the 95% that 'hits' the true value, we know that the true b-value for the indirect effect falls between 0.023 and 0.335.8 This range does not include zero, and remember that b = 0 would mean 'no effect whatsoever'; therefore, the fact that the confidence interval does not contain zero means that there is likely to be a genuine indirect effect. Put another way, relationship commitment is a mediator of the relationship between pornography consumption and infidelity.

The rest of Output 10.7 you will see only if you selected Effect size in Figure 10.13; it contains various standardized forms of the indirect effect. In each case they are accompanied by a bootstrapped confidence interval. We discussed these measures of effect size in Section 10.4.3, and rather than interpret them all I'll merely note that for each one you get an estimate along with a confidence interval based on a bootstrapped standard error. As with the unstandardized indirect effect, if the confidence intervals don't contain zero then we can be confident that the true effect size is different from 'no effect'. In other words, there is mediation. All of the effect size measures have confidence intervals that don't include zero, so whichever one we look at we can be fairly confident that the indirect effect is greater than 'no effect'. Focusing on the most useful of these effect sizes, the standardized b for the indirect effect, its value is b = .041, 95% BCa CI [.007, .103], and similarly, κ² = .041, 95% BCa CI [.008, .104]. κ² is bounded between 0 and 1, so we can interpret this as the indirect effect being about 4.1% of the maximum value that it could have been, which is a fairly small effect. We might, therefore, want to look for other potential mediators to include in the model in addition to relationship commitment.

8 Remember that because of the nature of bootstrapping you will get slightly different values in your output.

Output 10.7
TOTAL, DIRECT, AND INDIRECT EFFECTS
Total effect of X on Y: Effect = .5846, SE = .2007, t = 2.9132, p = .0039
Direct effect of X on Y: Effect = .4573, SE = .1946, t = 2.3505, p = .0196
Indirect effect of X on Y (Commitme): Effect = .1273, Boot SE = .0716, BootLLCI = .0232, BootULCI = .3350
Partially standardized indirect effect of X on Y: Effect = .1818, Boot SE = .1002, BootLLCI = .0325, BootULCI = .4684
Completely standardized indirect effect of X on Y: Effect = .0405, Boot SE = .0220, BootLLCI = .0073, BootULCI = .1032
Ratio of indirect to total effect of X on Y: Effect = .2177, Boot SE = 1.9048, BootLLCI = .0348, BootULCI = 1.4074
Ratio of indirect to direct effect of X on Y: Effect = .2783, Boot SE = 6.4664, BootLLCI = .0222, BootULCI = 6.7410
R-squared mediation effect size (R-sq_med): Effect = .0138, Boot SE = .0101, BootLLCI = .0017, BootULCI = .0480
Preacher and Kelley (2011) kappa-squared: Effect = .0411, Boot SE = .0218, BootLLCI = .0080, BootULCI = .1044

The final part of the output (Output 10.8) shows the results of the Sobel test. As I have mentioned before, it is better to interpret the bootstrap confidence intervals than formal tests of significance; however, if you selected Sobel test in Figure 10.13 this is what you will see.
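In case you are wondering where the Sobel z comes from, it is simply the indirect effect divided by an estimate of its standard error. A widely quoted estimate of that standard error is sqrt(a² x SE_b² + b² x SE_a²); some versions add a small third term, SE_a² x SE_b². (The output doesn't say which variant was used, so treat the inclusion of that third term below as an assumption that happens to reproduce the printed value.) Plugging in the values from Outputs 10.4 and 10.5 (a = -0.4697, SE_a = 0.2130; b = -0.2710, SE_b = 0.0587):

SE_ab = sqrt(0.00076 + 0.00333 + 0.00016) = 0.065 (approximately)
z = ab / SE_ab = 0.1273 / 0.065 = 1.95 (approximately)

which matches the figures in Output 10.8 below.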
Again, we're given the size of the indirect effect (b = 0.127), its standard error, the associated z-score (z = 1.95) and p-value (p = .051).9 The p-value isn't quite under the not-at-all magic .05 threshold, so technically we'd conclude that there isn't a significant indirect effect, but this just shows you how misleading these kinds of tests can be: every single effect size had a confidence interval not containing zero, so there is compelling evidence that there is a small but meaningful mediation effect.

9 You might remember that in regression we calculate a test statistic (t) by dividing the regression coefficient by its standard error (as in equation (8.11)). We do the same here except we get a z instead of a t: z = 0.1273/0.0652 = 1.9526.

Output 10.8
Normal theory tests for indirect effect: Effect = .1273, SE = .0652, Z = 1.9526, p = .0509

LABCOAT LENI'S REAL RESEARCH 10.1: I heard that Jane has a boil and kissed a tramp
Everyone likes a good gossip from time to time, but apparently it has an evolutionary function. One school of thought is that gossip is used as a way to derogate sexual competitors - especially by questioning their appearance and sexual behaviour. For example, if you've got your eyes on a guy, but he has his eyes on Jane, then a good strategy is to spread gossip that Jane has a massive pus-oozing boil on her stomach and that she kissed a smelly vagrant called Aqualung. Apparently men rate gossiped-about women as less attractive, and they were more influenced by the gossip if it came from a woman with a high mate value (i.e., attractive and sexually desirable). Karlijn Massar and her colleagues hypothesized that if this theory is true then (1) younger women will gossip more because there is more mate competition at younger ages; and (2) this relationship will be mediated by the mate value of the person (because for those with high mate value gossiping for the purpose of sexual competition will be more effective). Eighty-three women aged from 20 to 50 (Age) completed questionnaire measures of their tendency to gossip (Gossip) and their sexual desirability (Mate_Value). Test Massar et al.'s mediation model using Baron and Kenny's method (as they did) but also using PROCESS to estimate the indirect effect (Massar et al. (2011).sav). Answers are on the companion website (or look at Figure 1 in the original article, which shows the parameters for the various regressions).

10.4.6. Reporting mediation analysis

Some people report only the indirect effect in mediation analysis, and possibly the Sobel test. However, I have repeatedly favoured using bootstrap confidence intervals, so you should report these, and preferably the effect size κ² and its confidence interval:

✓ There was a significant indirect effect of pornography consumption on infidelity through relationship commitment, b = 0.127, BCa CI [0.023, 0.335]. This represents a relatively small effect, κ² = .041, 95% BCa CI [.008, .104].

This is fine, but it can be quite useful to present a diagram of the mediation model, and indicate on it the regression coefficients, the indirect effect and its bootstrapped confidence intervals. For the current example, we might produce something like Figure 10.14.
[Figure 10.14: Model of pornography consumption as a predictor of infidelity, mediated by relationship commitment. The confidence interval for the indirect effect is a BCa bootstrapped CI based on 1000 samples. Paths: consumption to commitment, b = -0.47, p = .028; commitment to infidelity, b = -0.27, p < .001; direct effect, b = 0.46, p = .02; indirect effect, b = 0.13, 95% CI [0.02, 0.34].]

CRAMMING SAM'S TIPS: Mediation
Mediation is when the strength of the relationship between a predictor variable and outcome variable is reduced by including another variable as a predictor. Essentially, mediation equates to the relationship between two variables being 'explained' by a third. For example, the relationship between watching horror films and feeling scared at bedtime might be explained by scary images appearing in your head.
Mediation is tested by assessing the size of the indirect effect and its confidence interval. If the confidence interval contains zero then we cannot be confident that a genuine mediation effect exists. If the confidence interval doesn't contain zero, then we can conclude that mediation has occurred.
The size of the indirect effect can be expressed using kappa-squared (κ²). Values close to 0 mean that the indirect effect is very small relative to its maximum possible value, and values close to 1 mean that it is as large as it could possibly be given the research design. A small effect is .01, a medium effect would be around .09, and a large effect in the region of .25.

10.5. Categorical predictors in regression

Not everyone could be measured on day 3, so there is a change score only for a subset of the original sample.

... could look at the 'metaller' group, and to do this we give anyone who was a metaller a code of 1, and everyone else a code of 0. Our final dummy variable will code the 'indie kid' category. To do this, we give anyone who was an indie kid a code of 1, and everyone else a code of 0. The resulting coding scheme is shown in Table 10.2. Note that each group has a code of 1 on only one of the dummy variables (except the base category, which is always coded as 0).

10.5.1.2. The recode function

We looked at why dummy coding works in Section 9.2.2, so let's look at how to recode our grouping variable into these dummy variables using SPSS. To recode variables you need to use the recode function. Select Transform > Recode into Different Variables to access the dialog box in Figure 10.15. The Recode dialog box lists all of the variables in the data editor, and you need to select the one you want to recode (in this case music) and transfer it to the box labelled Numeric Variable → Output Variable by clicking on the transfer arrow. You then need to name the new variable (the Output Variable, as SPSS calls it) by going to the part labelled Output Variable and typing a name for your first dummy variable in the box labelled Name (let's call it Crusty). You can give this variable a more descriptive name by typing something in the box labelled Label (for this first dummy variable I've labelled it 'No Affiliation vs. Crusty'). Click on Change to transfer this new variable to the box labelled Numeric Variable → Output Variable (this box should now say music → Crusty).
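If you find the dialog boxes tedious, it might help to know that everything in this section and the next boils down to one line of syntax per dummy variable. As a preview (the full version, including the handling of missing values, appears in SPSS Tip 10.2 below), the first dummy variable could be created with:

RECODE music (3=1)(ELSE=0) INTO Crusty.
VARIABLE LABELS Crusty 'No Affiliation vs. Crusty'.
EXECUTE.

The next paragraph explains what the 'old' and 'new' values in this command mean when you do the same thing through the dialog boxes.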
Having defined the first dummy variable, we need to tell SPSS how to recode the values of the variable music into the values that we want for the new variable, Crusty. To do this, click on Old and New Values to access the dialog box in Figure 10.16. This dialog box is used to change values of the original variable into different values for the new variable. For our first dummy variable, we want anyone who was a crusty to get a code of 1 and everyone else to get a code of 0. Now, crusty was coded with the value 3 in the original variable, so you need to type the value 3 in the section labelled Old Value in the box labelled Value. The new value we want is 1, so we need to type the value 1 in the section labelled New Value in the box labelled Value. When you've done this, click on Add to add this change to the list of changes (the list is displayed in the box labelled Old → New, which should now say 3 → 1 as in the diagram). The next thing we need to do is to change the remaining groups to have a value of 0 for the first dummy variable. To do this just select All other values and type the value 0 in the section labelled New Value in the box labelled Value.13 When you've done this, click on Add to add this change to the list of changes (this list will now also say ELSE → 0). When you've done this, click on Continue to return to the main dialog box, and then click on OK to create the first dummy variable. This variable will appear as a new column in the data editor, and you should notice that it will have a value of 1 for anyone originally classified as a crusty and a value of 0 for everyone else.

13 Using this All other values option is fine when you don't have missing values in the data, but just note that when you do (as is the case here) cases with both system-defined and user-defined missing values will be included in the recode. One way around this is to recode only cases for which there is a value (see Oliver Twisted). The alternative is to recode missing values specifically using the Range option. It is also a good idea to use the frequencies or crosstabs commands after a recode to check that you have caught all of these missing values.

[Figure 10.15: Recode dialog box]

OLIVER TWISTED: Please, Sir, can I have some more ... recoding?
'Our data set has missing values', worries Oliver. 'What do we do if we only want to recode cases for which we have data?' Well, we can set some other options. If you want to know more, the additional material for this chapter on the companion website will tell you. Stop worrying, Oliver, everything will be OK.

[Figure 10.16: Recode dialog box for changing old values to new (see also SPSS Tip 10.2)]

SELF-TEST Try creating the remaining two dummy variables (call them Metaller and Indie_Kid) using the same principles.
10.5.2. SPSS output for dummy variables

Let's assume you've created the three dummy coding variables; if you're stuck there is a data file called GlastonburyDummy.sav (the 'Dummy' refers to the fact that it has dummy variables in it; I'm not implying that if you need to use this file you're a dummy ☺). With dummy variables, you have to enter all related dummy variables in the same block (so use the Enter method).

SPSS TIP 10.2: Using syntax to recode
If you're doing a lot of recoding it soon becomes pretty tedious using the dialog boxes all of the time. I've written the syntax file, RecodeGlastonburyData.sps, to create all of the dummy variables we've discussed. Load this file and run the syntax, or type the following into a new syntax window (see Section 3.9):

DO IF (1-MISSING(change)).
RECODE music (3=1)(ELSE=0) INTO Crusty.
RECODE music (2=1)(ELSE=0) INTO Metaller.
RECODE music (1=1)(ELSE=0) INTO Indie_Kid.
END IF.
VARIABLE LABELS Crusty 'No Affiliation vs. Crusty'.
VARIABLE LABELS Metaller 'No Affiliation vs. Metaller'.
VARIABLE LABELS Indie_Kid 'No Affiliation vs. Indie Kid'.
VARIABLE LEVEL Crusty Metaller Indie_Kid (Nominal).
FORMATS Crusty Metaller Indie_Kid (F1.0).
EXECUTE.

Each RECODE command does the equivalent of the dialog box in Figure 10.16. So, the three lines beginning RECODE ask SPSS to create three new variables (Crusty, Metaller and Indie_Kid), which are based on the original variable music. For the first variable, if music is 3 then it becomes 1, and every other value becomes 0. For the second, if music is 2 then it becomes 1, and every other value becomes 0, and so on for the third dummy variable. Note that all of these RECODE commands are within an if statement (beginning DO IF and ending with END IF). This tells SPSS to carry out the recode commands only if a certain condition is met. The condition we have set is (1-MISSING(change)). MISSING is a built-in command that returns 'true' (i.e., the value 1) for a case that has a system- or user-defined missing value for the specified variable; it returns 'false' (i.e., the value 0) if a case has a value. Hence, MISSING(change) returns a value of 1 for cases that have a missing value for the variable change and 0 for cases that do have values. We want to recode the cases that do have a value for the variable change, therefore we use '1-MISSING(change)'. This expression reverses MISSING(change) so that it returns 1 (true) for cases that have a value for the variable change and 0 (false) for system- or user-defined missing values. To sum up, the statement DO IF (1-MISSING(change)) tells SPSS 'Do the following recode commands only if the case has a value for the variable change.' The VARIABLE LABELS command tells SPSS to assign the text in the quotes as labels for the variables Crusty, Metaller and Indie_Kid, respectively. The VARIABLE LEVEL command then sets these three variables to be 'nominal', and the FORMATS command changes the variables to have a width of 1 and 0 decimal places (hence the 1.0). The EXECUTE is essential: without it none of the commands beforehand will be executed. Note also that every line ends with a full stop.

SELF-TEST Use what you learnt in Chapter 8 to run a multiple regression using the change scores as the outcome, and the three dummy variables (entered in the same block) as predictors.
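If you would rather not click through the menus for this self-test, a minimal syntax sketch is shown below. It assumes the dummy variables are named as suggested above and that the change score variable is called change (as in the DO IF example in SPSS Tip 10.2); the bootstrapped confidence intervals that appear later in Output 10.10 would additionally require the bootstrap dialog box or command.

REGRESSION
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT change
  /METHOD=ENTER Crusty Metaller Indie_Kid.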
Let's have a look at the output. Output 10.9 shows the model statistics. We see that by entering the three dummy variables we can explain 7.6% of the variance in the change in hygiene scores (the R² value x 100%). In other words, 7.6% of the variance in the change in hygiene can be explained by the musical affiliation of the person. The ANOVA (which shows the same thing as the R² change statistic because there is only one step in this regression) tells us that the model is significantly better at predicting the change in hygiene scores than having no model (put another way, the 7.6% of variance that can be explained is a significant amount).

Output 10.9
Model summary: R = .276, R² = .076, adjusted R² = .053; R² change = .076, F change = 3.270, df1 = 3, df2 = 119, p = .024
ANOVA: Regression SS = 4.646 (df = 3); Residual SS = 56.358 (df = 119, MS = .474); Total SS = 61.004; F = 3.27, p = .024
Predictors: (Constant), No Affiliation vs. Indie Kid, No Affiliation vs. Metaller, No Affiliation vs. Crusty. Dependent variable: Change in Hygiene Over The Festival.

Output 10.10 shows a basic Coefficients table for the dummy variables, which is the more interesting part of the output. The first thing to notice is that each dummy variable appears in the table with a useful label (such as No Affiliation vs. Crusty) because when we recoded our variables we gave each variable a useful label; if we hadn't done this then the table would contain the less helpful variable names of Crusty, Metaller and Indie_Kid. The labels that I have used remind me of what each dummy variable represents. The first dummy variable (No Affiliation vs. Crusty) shows the difference between the change in hygiene scores for the no affiliation group and the crusty group. Remember that the beta value tells us the change in the outcome due to a unit change in the predictor. In this case, a unit change in the predictor is the change from 0 to 1. By including all three dummy variables at the same time, zero will represent our baseline category (no affiliation). For this variable, 1 represents 'crusty'. Therefore, the change from 0 to 1 represents the change from no affiliation to crusty, and so this variable represents the difference in the change in hygiene scores for a crusty, relative to someone with no musical affiliation. This difference is the difference between the two group means (see Section 9.2.2). To illustrate this fact, I've produced a table (Output 10.11) of the group means for each of the four groups and also the difference between the means for each group and the no affiliation group. These means represent the average change in hygiene scores for the groups (i.e., the mean of each group on our outcome variable). If we calculate the difference in these means for the no affiliation group and the crusty group we get: crusty - no affiliation = (-0.966) - (-0.554) = -0.412. In other words, the change in hygiene scores is greater for the crusty group than it is for the no affiliation group (crusties' hygiene decreases more over the festival than that of those with no musical affiliation). This value is the same as the unstandardized beta value in Output 10.10.
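It is worth pausing on why the coefficient for a dummy variable equals a difference between group means; this is a general property of the model rather than anything specific to these data. Writing b0 for the intercept and b1, b2, b3 for the three dummy coefficients (labels introduced here just for this illustration), the model's predicted change score is:

predicted change = b0 + b1(Crusty) + b2(Metaller) + b3(Indie_Kid)

For the no affiliation group all three dummies are 0, so the prediction is just b0, the baseline group's mean. For a crusty, Crusty = 1 and the other dummies are 0, so the prediction is b0 + b1. Subtracting one from the other leaves b1 = mean(crusty) - mean(no affiliation), which is exactly the -0.412 we just calculated.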
So, the beta values tell us the relative difference between each group and the group that we chose as a baseline category. This beta value is converted to a t-statistic and the significance of this t reported. As we've seen before, this t-statistic tests whether the beta value is 0; therefore, when we have two categories coded with 0 and 1, it tests whether the difference between group means is 0. If it is significant then the group coded with 1 is significantly different from the baseline category - so, it's testing the difference between two means, which is the context in which students are most familiar with the t-statistic (see Chapter 9). For our first dummy variable, the t-test is significant, and the beta value has a negative value so we could say that the change in hygiene scores goes down as a person changes from having no affiliation to being a crusty. Bear in mind that a decrease in hygiene scores represents greater change (you're becoming smellier), so what this actually means is that hygiene decreased significantly more in crusties compared to those with no musical affiliation.

Output 10.10
Coefficients (dependent variable: Change in Hygiene Over The Festival)
(Constant): B = -.554, SE = .090, t = -6.131, p < .001, 95% CI [-.733, -.373]
No Affiliation vs. Crusty: B = -.412, SE = .167, β = -.212, t = -2.464, p = .015, 95% CI [-.742, -.081]
No Affiliation vs. Metaller: B = .028, SE = .160, β = .017, t = .18, p = .860, 95% CI [-.289, .346]
No Affiliation vs. Indie Kid: B = -.410, SE = .205, t = -2.001, p = .048
Bootstrap for coefficients (based on 1000 bootstrap samples)
(Constant): B = -.554, SE = .097, p = .001, BCa 95% CI [-.736, -.349]
No Affiliation vs. Crusty: B = -.412, bias = -.011, SE = .179, p = .030, BCa 95% CI [-.733, -.101]
No Affiliation vs. Metaller: B = .028, bias = -.006, SE = .149, p = .847, BCa 95% CI [-.262, .293]
No Affiliation vs. Indie Kid: B = -.410, bias = -.010, SE = .201, p = .049, BCa 95% CI [-.813, -.043]

Output 10.11
OLAP Cubes (Change in Hygiene Over The Festival)
Indie Kid: mean = -0.964, SD = 0.570, N = 14
Metaller: mean = -0.526, SD = 0.576, N = 27
Crusty: mean = -0.966, SD = 0.760, N = 24
No Musical Affiliation: mean = -0.554, SD = 0.708, N = 58
Total: mean = -0.675, SD = 0.707, N = 123
Differences from the no affiliation group: Crusty = -0.412, Metaller = 0.028, Indie Kid = -0.410

Our next dummy variable compares metallers to those that have no musical affiliation. The beta value again represents the difference in the change in hygiene scores for a person with no musical affiliation compared to a metaller. The difference in the group means for the no affiliation group and the metaller group is metaller - no affiliation = (-0.526) - (-0.554) = 0.028. This value is again the same as the unstandardized beta value in Output 10.10. For this second dummy variable, the t-test is not significant. We could conclude that the change in hygiene scores is similar if a person changes from having no affiliation to being a metaller: the change in hygiene scores is not predicted by whether someone is a metaller compared to whether they have no musical affiliation. For the final dummy variable, we're comparing indie kids to those that have no musical affiliation. The beta value again represents the shift in the change in hygiene scores if a person has no musical affiliation, compared to someone who is an indie kid.
The difference in the group means for the no affiliation group and the indie kid group is indie kid - no affiliation = (-0.964) - (-0.554) = -0.410. It should be no surprise to you by now that this is the unstandardized beta value in Output 10.10. The t-test is significant, and the beta value has a negative value so, as with the first dummy variable, we could say that the change in hygiene scores goes down as a person changes from having no affiliation to being an indie kid. Bear in mind that a decrease in hygiene scores represents more change (you're becoming smellier), so this actually means that hygiene decreased significantly more in indie kids compared to those with no musical affiliation. We could report the results as in Table 10.3 (note I've included the bootstrap confidence intervals). So, overall this analysis has shown that compared to having no musical affiliation, crusties and indie kids get significantly smellier across the three days of the festival, but metallers don't.

TABLE 10.3 Linear model of predictors of the change in hygiene scores (95% bias corrected and accelerated confidence intervals reported in parentheses). Confidence intervals and standard errors based on 1000 bootstrap samples.
Constant: b = -0.55 (-0.74, -0.35), SE = 0.10, p = .001
No Affiliation vs. Crusty: b = -0.41 (-0.73, -0.10), SE = 0.18, β = -.23, p = .030
No Affiliation vs. Metaller: b = 0.03 (-0.26, 0.29), SE = 0.15, β = .02, p = .847
No Affiliation vs. Indie Kid: b = -0.41 (-0.81, -0.04), SE = 0.20, β = -.19, p = .049
Note. R² = .08 (p = .024).

This section has introduced some really complex ideas that I expand upon in Chapter 11. It might all be a bit much to take in, and so if you're confused or want to know more about why dummy coding works in this way I suggest reading Section 11.2.1 and then coming back here. Alternatively, read Hardy's (1993) excellent monograph.

10.6. Brian's attempt to woo Jane

[Figure 10.17: What Brian learnt from this chapter. A concept map summarizing the chapter: moderation is when the relationship between two variables changes as a function of a third variable (centre the predictors; if the interaction is significant then there is moderation; simple slopes look at the relationship between the predictor and outcome at low, mean and high levels of the moderator); mediation is when the strength of the relationship between a predictor variable and outcome variable is reduced by including another variable as a predictor (direct effect: the effect of the predictor independent of the mediator; indirect effect: the effect of the predictor through the mediator; bootstrap 95% CIs; effect size: kappa-squared); and dummy coding, in which categorical variables with more than two categories are coded into variables all of which have values of only 0 or 1.]

10.7. What next?

We started this chapter by looking at my relative failures as a human being compared to Simon Hudson. I then bleated on excitedly about moderation and mediation, which could explain why Clair Sparks chose Simon Hudson all those years ago. Perhaps she could see the writing on the wall! I was true to my word to my parents, though, and I was philosophical about it. I set my sights elsewhere during the obligatory lunchtime game of kiss chase. However, my life was about to change beyond all recognition.
Not that I believe in fate, but if I did I would have believed that the wrinkly and hairy hand of fate (I don't know why, but I always imagine it wrinkly, hairy and in need of a manicure) had decided that I was far too young to be getting distracted by such things as girls. Waggling its finger at me, it plucked me out of primary school and cast me down into what can only be described as hell, also known as an all-boys' school. It's fair to say that my lunchtime primary school game of kiss chase was the last I would see of girls for quite some time ...

10.8. Key terms that I've discovered
Direct effect, Grand mean centring, Index of mediation, Indirect effect, Interaction effect, Mediation, Mediator, Moderation, Moderator, Simple slopes analysis, Sobel test

10.9. Smart Alex's tasks
Task 1: McNulty et al. (2008) found a relationship between a person's Attractiveness and how much Support they give their partner as newlyweds. Is this relationship moderated by gender (i.e., whether the data were from the husband or wife)? The data are in McNulty et al. (2008).sav.14
Task 2: Produce the simple slopes graphs for the above example.
Task 3: McNulty et al. (2008) also found a relationship between a person's Attractiveness and their relationship Satisfaction as newlyweds. Using the same data as the previous examples, is this relationship moderated by gender?
Task 4: In the chapter we tested a mediation model of infidelity for Lambert et al.'s data using Baron and Kenny's regressions. Repeat this analysis, but using Hook_Ups as the measure of infidelity.
Task 5: Repeat the above analysis but using the PROCESS tool to estimate the indirect effect and its confidence interval.
Task 6: In Chapter 3 (Task 5) we looked at data from people who had been forced to marry goats and dogs and measured their life satisfaction as well as how much they like animals (Goat or Dog.sav). Run a regression predicting life satisfaction from the type of animal to which a person was married. Write out the final model.
Task 7: Repeat the analysis above but include animal liking in the first block, and type of animal in the second block. Do your conclusions about the relationship between type of animal and life satisfaction change?
Task 8: Using the GlastonburyDummy.sav data, which you should've already analysed, comment on whether you think the model is reliable and generalizable.
Task 9: Tablets like the iPad are very popular. A company owner was interested in how to make his brand of tablets more desirable. He collected data on how cool people perceived a product's advertising to be (Advert_Cool), how cool they thought the product was (Product_Cool), and how desirable they found the product (Desirability). Test his theory that the relationship between cool advertising and product desirability is mediated by how cool people think the product is (Tablets.sav). Am I showing my age by using the word 'cool'?
Answers can be found on the companion website.

14 These are not the actual data from the study, but are simulated to mimic the findings in Table 1 of the original paper.

10.10. Further reading
Cohen, J., Cohen, P., Aiken, L., & West, S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Mahwah, NJ: Erlbaum.
Hardy, M. A. (1993). Regression with dummy variables. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-093. Newbury Park, CA: Sage.
Hayes, A. F.
(2013). An introduction to mediation, moderation, and conditional process analysis. New York: Guilford Press.

11 Comparing several means: ANOVA (GLM 1)

FIGURE 11.1 My brother Paul (left) and I (right) in our very fetching school uniforms

11.1. What will this chapter tell me?

There are pivotal moments in everyone's life, and one of mine was at the age of 11. Where I grew up in England there were three choices when leaving primary school and moving on to secondary school: (1) state school (where most people go); (2) grammar school (where clever people who pass an exam called the Eleven Plus go); and (3) private school (where rich people go). My parents were not rich and I am not clever and consequently I failed my Eleven Plus, so private school and grammar school (where my clever older brother had gone) were out. This left me to join all of my friends at the local state school. I could not have been happier. Imagine everyone's shock when my parents received a letter saying that some extra spaces had become available at the grammar school; although the local authority could scarcely believe it and had checked the Eleven Plus papers several million times to confirm their findings, I was next on their list. I could not have been unhappier. So, I waved goodbye to all of my friends and trundled off to join my brother at Ilford County High School for Boys (a school that still hit students with a cane if they were particularly bad and that, for