5 Models for ordinal outcomes

Although the categories for an ordinal variable can be ordered, the distances between the categories are unknown. For example, in survey research, questions often provide the response categories of strongly agree, agree, disagree, and strongly disagree, but an analyst would probably not assume that the distance between strongly agreeing and agreeing is the same as the distance from agreeing to disagreeing. Educational attainments can be ordered as elementary education, high school diploma, college diploma, and graduate or professional degree. Ordinal variables also commonly result from limitations of data availability that require a coarse categorization of a variable that could, in principle, have been measured on an interval scale. For example, we might have a measure of income that is simply low, medium, or high.

Ordinal variables are often coded as consecutive integers from 1 to the number of categories. Perhaps because of this coding, it is tempting to analyze ordinal outcomes with the linear regression model. However, an ordinal dependent variable violates the assumptions of the LRM, which can lead to incorrect conclusions, as demonstrated strikingly by McKelvey and Zavoina (1975, 117) and Winship and Mare (1984, 521–523). Accordingly, with ordinal outcomes it is much better to use models that avoid the assumption that the distances between categories are equal.
Although many different models have been designed for ordinal outcomes, in this chapter we focus on the logit and probit versions of the ordinal regression model (ORM), introduced by McKelvey and Zavoina (1975) in terms of an underlying latent variable and in biostatistics by McCullagh (1980), who referred to the logit version as the proportional odds model. As with the binary regression model, the ORM is nonlinear, and the magnitude of the change in the outcome probability for a given change in one of the independent variables depends on the levels of all the independent variables. As with the BRM, the challenge is to summarize the effects of the independent variables to fully reflect key substantive processes without overwhelming and distracting detail. For ordinal outcomes, as well as for the models for nominal outcomes in chapter 6, the difficulty of this task is increased by having more than two outcomes to explain.

Before proceeding, we caution that researchers should think carefully before concluding that their outcome is indeed ordinal. Simply because the values of a variable can be ordered, do not assume that the variable should be analyzed as ordinal. A variable that can be ordered when considered for one purpose could be unordered or ordered differently when used for another purpose. Miller and Volker (1985) show how different assumptions about the ordering of occupations resulted in different conclusions. A variable might also reflect ordering on more than one dimension, such as attitude scales that reflect both the intensity and the direction of opinion. Moreover, surveys commonly include the category "don't know", which probably does not correspond to the middle category in a scale, even though analysts might be tempted to treat it this way. Overall, when the proper ordering is ambiguous, the models for nominal outcomes discussed in chapter 6 should be considered.
We begin by reviewing the statistical model, followed by an examination of testing, fit, and methods of interpretation. These discussions are intended as a review for those who are familiar with the models. For a complete discussion, see Long (1997). We end the chapter by considering several less common models for ordinal outcomes, which can be fitted using ado-files that others have developed. We also introduce the stereotype logit model, added in Stata 9 with the slogit command, but we postpone a full discussion until chapter 6. As always, you can obtain sample do-files and data files by downloading the spost9_do and spost9_ado packages (see chapter 1 for details).

5.1 The statistical model

The ORM can be developed in different ways, each of which leads to the same form of the model. These approaches to the model parallel those for the BRM. Indeed, the BRM can be viewed as a special case of the ordinal model in which the ordinal outcome has only two categories.

5.1.1 A latent-variable model

The ordinal regression model is commonly presented as a latent-variable model. Defining y* as a latent variable ranging from −∞ to ∞, the structural model is

    y*_i = x_i β + ε_i

or, for the case of one independent variable,

    y*_i = α + β x_i + ε_i

where i is the observation and ε is a random error, as discussed further below.

The measurement model for binary outcomes is expanded to divide y* into J ordinal categories,

    y_i = m    if τ_{m−1} ≤ y*_i < τ_m    for m = 1 to J

where the cutpoints τ_1 through τ_{J−1} are estimated. (Some authors refer to these as thresholds.) We assume τ_0 = −∞ and τ_J = ∞ for reasons that will be clear shortly.

To illustrate the measurement model, consider the example used in this chapter. People are asked to respond to the following statement:

    A working mother can establish just as warm and secure of a relationship with her child as a mother who does not work.

Possible responses are 1 = Strongly Disagree (SD), 2 = Disagree (D), 3 = Agree (A), and 4 = Strongly Agree (SA).
The continuous latent variable can be thought of as the propensity to agree that working mothers can be warm and secure mothers. The observed response categories are tied to the latent variable by the measurement model:

    y_i = 1 (SD)    if τ_0 = −∞ ≤ y*_i < τ_1
    y_i = 2 (D)     if τ_1 ≤ y*_i < τ_2
    y_i = 3 (A)     if τ_2 ≤ y*_i < τ_3
    y_i = 4 (SA)    if τ_3 ≤ y*_i < τ_4 = ∞

Thus when the latent y* crosses a cutpoint, the observed category changes. Anderson (1984) referred to ordinal variables created in this fashion as "grouped continuous" variables, and referred to what we call the ordinal regression model as the "grouped continuous model".

Figure 5.1: Relationship between observed y and latent y* in ordinal regression model with one independent variable.

For one independent variable, the structural model is y* = α + βx + ε, which is plotted in figure 5.1 along with the cutpoints for the measurement model. This figure is similar to that for the binary regression model, except that there are now three horizontal lines representing the cutpoints τ_1, τ_2, and τ_3. The three cutpoints lead to four levels of y that are labeled on the right-hand side of the graph.

The probability of an observed outcome for a given value of x is the area under the curve between a pair of cutpoints. For example, the probability of observing y = m for given values of the xs corresponds to the region of the distribution where y* falls between τ_{m−1} and τ_m:

    Pr(y = m | x) = Pr(τ_{m−1} ≤ y* < τ_m | x)

Substituting xβ + ε for y* and using some algebra leads to the standard formula for the predicted probability in the ORM,

    Pr(y = m | x) = F(τ_m − xβ) − F(τ_{m−1} − xβ)    (5.1)

where F is the cdf for ε. In ordinal probit, F is normal with Var(ε) = 1; in ordinal logit, F is logistic with Var(ε) = π²/3. For y = 1, the second term on the right drops out because F(−∞ − xβ) = 0, and for y = J, the first term equals F(∞ − xβ) = 1.
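To make (5.1) concrete, here is a short Python sketch (not Stata; the function names and the value of xβ are our own illustration, and the cutpoints are rounded values like those ologit reports later in the chapter) that computes the four category probabilities for an ordinal logit:

```python
import math

def logistic_cdf(t):
    """CDF of the standard logistic distribution (variance pi^2/3)."""
    return 1.0 / (1.0 + math.exp(-t))

def ordinal_probs(xb, cuts):
    """Pr(y = m | x) = F(tau_m - xb) - F(tau_{m-1} - xb),
    with tau_0 = -inf and tau_J = +inf as in the text."""
    bounds = [float("-inf")] + list(cuts) + [float("inf")]
    return [logistic_cdf(hi - xb) - logistic_cdf(lo - xb)
            for lo, hi in zip(bounds[:-1], bounds[1:])]

# Three cutpoints (four categories); xb = 0.5 is a hypothetical linear predictor.
cuts = [-2.465, -0.631, 1.262]
probs = ordinal_probs(0.5, cuts)
print([round(p, 3) for p in probs])
print(sum(probs))
```

Because the cutpoints partition the whole real line, the four probabilities necessarily sum to 1 for any value of xβ.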
Comparing these equations with those for the BRM shows that the ORM is identical to the binary regression model, with one exception. To show this, we fit chapter 4's binary model for labor-force participation using both logit and ologit (the command for ordinal logit):

. use http://www.stata-press.com/data/lf2/binlfp2, clear
(Data from 1976 PSID-T Mroz)
. logit lfp k5 k618 age wc hc lwg inc, nolog
(output omitted)
. estimates store logit
. ologit lfp k5 k618 age wc hc lwg inc, nolog
(output omitted)
. estimates store ologit

To compare the coefficients, we combine them using estimates table, which leads to the following table:¹

. estimates table logit ologit, b(%9.3f) t label varwidth(30) equations(1:1)

    Variable                              logit       ologit
    #1
      # kids < 6                         -1.463       -1.463
                                         -7.43        -7.43
      # kids 6-18                        -0.065       -0.065
                                         -0.95        -0.95
      Wife's age in years                -0.063       -0.063
                                         -4.92        -4.92
      Wife College: 1=yes 0=no            0.807        0.807
                                          3.51         3.51
      Husband College: 1=yes 0=no         0.112        0.112
                                          0.54         0.54
      Log of wife's estimated wages       0.605        0.605
                                          4.01         4.01
      Family income excluding wife's     -0.034       -0.034
                                         -4.20        -4.20
      Constant                            3.182
                                          4.94
    cut1
      Constant                                        -3.182
                                                      -4.94
                                                 legend: b/t

1. Because logit has a constant and ologit has a cutpoint, by default estimates table will not line up the coefficients from the two models. Rather, each of the independent variables will be listed twice. equations(1:1) tells estimates table to line up the coefficients. This is easiest to understand if you try our command without the equations(1:1) option.

The slope coefficients and their standard errors are identical, but for logit an intercept is reported (i.e., the coefficient associated with _cons), whereas for ologit the constant is replaced by the cutpoint labeled /cut1, which is equal but of opposite sign. This difference is due to how the two models are identified. As the ORM has been presented, there are "too many" free parameters; that is, you cannot estimate J − 1 thresholds and the constant, too.
For a unique set of ML estimates to exist, an identifying assumption needs to be made about either the intercept or one of the cutpoints. In Stata, the ORM is identified by assuming that the intercept is 0, and the values of all cutpoints are estimated. Some statistics packages for the ORM instead fix one of the cutpoints to 0 and estimate the intercept. In presenting the BRM, we immediately assumed that the value that divided y* into observed 0s and 1s was 0. In effect, we identified the model by assuming a threshold of 0. Although different parameterizations can be confusing, keep in mind that the slope coefficients and predicted probabilities are the same under either parameterization (see Long 1997, 122–123 for more details).

5.1.2 A nonlinear probability model

The ordinal regression model can also be developed as a nonlinear probability model without appealing to the idea of a latent variable. Here we show how this can be done for the ordinal logit model. First, we define the odds that an outcome is less than or equal to m versus greater than m, given x:

    Ω_m(x) = Pr(y ≤ m | x) / Pr(y > m | x)    for m = 1 to J − 1

For example, we could compute the odds of disagreeing or strongly disagreeing (i.e., m ≤ 2) versus agreeing or strongly agreeing (m > 2). The log of the odds is assumed to equal

    ln Ω_m(x) = τ_m − xβ    (5.2)

For one independent variable and three categories (where we are fixing the intercept to equal 0),

    ln[ Pr(y ≤ 1 | x) / Pr(y > 1 | x) ] = τ_1 − βx
    ln[ Pr(y ≤ 2 | x) / Pr(y > 2 | x) ] = τ_2 − βx

Although it may seem confusing that the model subtracts βx rather than adding it, this is a consequence of computing the logit of y ≤ m versus y > m. While we agree that it would be simpler to stick with τ_m + βx, this is not the way the model is normally presented.
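Equation (5.2) is easy to verify numerically. The following Python sketch (hypothetical cutpoints and slope; the function names are our own) builds the category probabilities from (5.1), forms the cumulative odds directly, and confirms that their log equals τ_m − xβ for every m:

```python
import math

def F(t):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-t))

tau = [-2.0, 0.3, 1.5]    # hypothetical cutpoints, four categories
beta, x = 0.8, 1.25       # hypothetical slope and value of x
xb = beta * x

# Category probabilities from (5.1), with tau_0 = -inf and tau_4 = +inf.
bounds = [float("-inf")] + tau + [float("inf")]
p = [F(hi - xb) - F(lo - xb) for lo, hi in zip(bounds[:-1], bounds[1:])]

# Odds that y <= m versus y > m, built directly from the category probabilities;
# by (5.2), the log of these odds equals tau_m - x*beta.
for m in (1, 2, 3):
    odds = sum(p[:m]) / sum(p[m:])
    print(round(math.log(odds), 6), round(tau[m - 1] - xb, 6))
```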
5.2 Estimation using ologit and oprobit

The ordered logit and probit models can be fitted with the following commands and their basic options:

    ologit depvar [indepvars] [if] [in] [weight] [, robust cluster(varname) level(#) nolog]

    oprobit depvar [indepvars] [if] [in] [weight] [, robust cluster(varname) level(#) nolog]

In our experience, these models take more steps to converge than the models for either binary or nominal outcomes.

Variable lists

depvar is the dependent variable. The specific values assigned to the outcome categories are irrelevant, except that larger values are assumed to correspond to "higher" outcomes. For example, if you had three outcomes, you could use the values 1, 2, and 3, or −1.23, 2.3, and 999. Up to 50 outcomes are allowed in Intercooled Stata and Stata/SE; 20 outcomes are allowed in Small Stata.

indepvars is a list of independent variables. If indepvars is not included, Stata fits a model with only cutpoints.

Specifying the estimation sample

if and in qualifiers can be used to restrict the estimation sample. For example, if you want to fit an ordered logit model for only those in the 1989 sample, you could specify ologit warm age ed prst male white if yr89==1.

Listwise deletion  Stata excludes cases in which there are missing values for any of the variables in the model. Accordingly, if two models are fitted using the same dataset but have different sets of independent variables, it is possible to have different samples. We recommend that you use mark and markout (discussed in chapter 3) to explicitly remove cases with missing data.

Weights

Both ologit and oprobit can be used with fweights, pweights, and iweights. See chapter 3 for more details.

Options

robust indicates that robust variance estimates are to be used. When cluster() is specified, robust standard errors are automatically used. See chapter 3 for more details.
cluster(varname) specifies that the observations are independent across the groups specified by unique values of varname but not necessarily within the groups. See chapter 3 for further details.

level(#) specifies the level of the confidence interval for estimated parameters. By default, Stata uses a 95% interval. You can also change the default level, say, to a 90% interval, with the command set level 90.

nolog suppresses the iteration history.

5.2.1 Example of attitudes toward working mothers

Our example is based on a question from the 1977 and 1989 General Social Survey. As we have already described, respondents were asked to evaluate the following statement: "A working mother can establish just as warm and secure of a relationship with her child as a mother who does not work". Responses were coded as: 1 = Strongly Disagree (SD), 2 = Disagree (D), 3 = Agree (A), and 4 = Strongly Agree (SA). A complete description of the data can be obtained by using describe, summarize, and tabulate:

. use http://www.stata-press.com/data/lf2/ordwarm2
(77 & 89 General Social Survey)
. describe warm yr89 male white age ed prst

              storage  display    value
variable name   type   format     label     variable label
warm            byte   %10.0g     SD2SA     Mom can have warm relations with child
yr89            byte   %10.0g     yrlbl     Survey year: 1=1989 0=1977
male            byte   %10.0g     sexlbl    Gender: 1=male 0=female
white           byte   %10.0g     race2lbl  Race: 1=white 0=not white
age             byte   %10.0g               Age in years
ed              byte   %10.0g               Years of education
prst            byte   %10.0g               Occupational prestige

. summarize warm yr89 male white age ed prst

    Variable       Obs        Mean    Std. Dev.       Min        Max
        warm      2293    2.607501    .9282156          1          4
        yr89      2293    .3986044    .4897178          0          1
        male      2293    .4648932    .4988748          0          1
       white      2293    .8765809    .3289894          0          1
         age      2293    44.93546    16.77903         18         89
          ed      2293    12.21805    3.160827          0         20
        prst      2293    39.58526    14.49226         12         82

. tabulate warm

   Mom can have warm
relations with child      Freq.     Percent        Cum.
                  SD        297       12.95       12.95
                   D        723       31.53       44.48
                   A        856       37.33       81.81
                  SA        417       18.19      100.00
               Total      2,293      100.00

Using these data, we fitted the model

    Pr(warm = m | x) = F(τ_m − xβ) − F(τ_{m−1} − xβ)

where

    xβ = β_yr89 yr89 + β_male male + β_white white + β_age age + β_ed ed + β_prst prst

Here are the results from ologit and oprobit. We store each model with estimates store so that we can later make a table combining the results:

. ologit warm yr89 male white age ed prst, nolog

Ordered logistic regression                       Number of obs   =       2293
                                                  LR chi2(6)      =     301.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -2844.9123                       Pseudo R2       =     0.0504

        warm       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        yr89    .5239025   .0798988     6.56   0.000     .3673037    .6805013
        male   -.7332997   .0784827    -9.34   0.000    -.8871229   -.5794766
       white   -.3911595   .1183808    -3.30   0.001    -.6231815   -.1591374
         age   -.0216655   .0024683    -8.78   0.000    -.0265032   -.0168278
          ed    .0671728    .015975     4.20   0.000     .0358624    .0984831
        prst    .0060727   .0032929     1.84   0.065    -.0003813    .0125267
       /cut1   -2.465362   .2389126                     -2.933622   -1.997102
       /cut2    -.630904   .2333165                     -1.088194    -.173614
       /cut3    1.261854   .2340179                      .8031873    1.720521

. estimates store ologit

. oprobit warm yr89 male white age ed prst, nolog

Ordered probit regression                         Number of obs   =       2293
                                                  LR chi2(6)      =     294.32
                                                  Prob > chi2     =     0.0000
Log likelihood = -2848.611                        Pseudo R2       =     0.0491

        warm       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        yr89    .3188147   .0468519     6.80   0.000     .2269867    .4106427
        male   -.4170287   .0455459    -9.16   0.000    -.5062971   -.3277603
       white   -.2265002   .0694773    -3.26   0.001    -.3626733   -.0903272
         age   -.0122213   .0014427    -8.47   0.000    -.0150489   -.0093937
          ed    .0387234   .0093241     4.15   0.000     .0204485    .0569982
        prst     .003283    .001925     1.71   0.088    -.0004899    .0070559
       /cut1   -1.428578   .1387742                     -1.700571   -1.156586
       /cut2   -.3605589   .1369219                     -.6289209    -.092197
       /cut3    .7681637   .1370564                      .4995381    1.036789

. estimates store oprobit

The information in the header and the table of coefficients is in the same form as discussed in chapter 3.
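A quick way to see how the two sets of estimates relate is to take the ratio of each ologit coefficient to its oprobit counterpart. The Python sketch below (our own illustration, using the coefficients just reported) shows the ratios cluster between roughly 1.6 and 1.9, consistent with the standard deviation π/√3 ≈ 1.81 of the logistic error versus 1 for the probit error:

```python
import math

# Coefficients reported by ologit and oprobit for the working-mothers example.
ologit  = {"yr89": .5239025, "male": -.7332997, "white": -.3911595,
           "age": -.0216655, "ed": .0671728, "prst": .0060727}
oprobit = {"yr89": .3188147, "male": -.4170287, "white": -.2265002,
           "age": -.0122213, "ed": .0387234, "prst": .003283}

# The logistic error has sd pi/sqrt(3) ~ 1.81, the probit error sd 1, so the
# logit coefficients should be larger by roughly that factor.
for name in ologit:
    ratio = ologit[name] / oprobit[name]
    print(f"{name:5s} ratio = {ratio:.2f}")
print("pi/sqrt(3) =", round(math.pi / math.sqrt(3), 2))
```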
Since we stored the results for both models, we can compare the results using estimates table:

. estimates table ologit oprobit, b(%9.3f) t label varwidth(30)

    Variable                         ologit       oprobit
    warm
      Survey year: 1=1989 0=1977      0.524         0.319
                                      6.56          6.80
      Gender: 1=male 0=female        -0.733        -0.417
                                     -9.34         -9.16
      Race: 1=white 0=not white      -0.391        -0.227
                                     -3.30         -3.26
      Age in years                   -0.022        -0.012
                                     -8.78         -8.47
      Years of education              0.067         0.039
                                      4.20          4.15
      Occupational prestige           0.006         0.003
                                      1.84          1.71
    cut1
      Constant                       -2.465        -1.429
                                    -10.32        -10.29
    cut2
      Constant                       -0.631        -0.361
                                     -2.70         -2.63
    cut3
      Constant                        1.262         0.768
                                      6.39          5.60
                                              legend: b/t

As with the BRM, the estimated coefficients differ between logit and probit by a factor of about 1.7, reflecting the differing scaling of the ordered logit and ordered probit models. Values of the z-tests are very similar because they are not affected by the scaling, but they are not identical because of slight differences in the shape of the assumed distribution of the errors.

5.2.2 Predicting perfectly

If the dependent variable does not vary within one of the categories of an independent variable, there will be a problem with estimation. To see what happens, let's transform the prestige variable prst into a dummy variable:

. gen dumprst = (prst<20 & warm==1)
. tab dumprst warm, miss

              Mom can have warm relations with child
  dumprst        SD        D        A       SA     Total
        0       257      723      856      417     2,253
        1        40        0        0        0        40
    Total       297      723      856      417     2,293

In all cases where dumprst is 1, respondents have values of SD for warm. That is, if you know dumprst is 1, you can predict perfectly that warm is 1 (i.e., SD). Although we purposely constructed dumprst so this would happen, perfect prediction can also occur in real data. If we fit the ORM using dumprst rather than prst, the perfectly predicted observations are dropped from the estimation sample.

. ologit warm yr89 male white age ed dumprst, nolog

Ordered logistic regression                       Number of obs   =       2293
                                                  LR chi2(6)      =     447.02
                                                  Prob > chi2     =     0.0000
Log likelihood = -2772.2621                       Pseudo R2       =     0.0746

        warm       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        yr89    .5268578   .0805997     6.54   0.000     .3688853    .6848303
        male   -.7251825   .0792896    -9.15   0.000    -.8805872   -.5697778
       white   -.4240687   .1197416    -3.54   0.000     -.658758   -.1893795
         age   -.0210592   .0024462    -8.61   0.000    -.0258536   -.0162648
          ed     .072143   .0133133     5.42   0.000     .0460494    .0982366
     dumprst   -34.58373    1934739    -0.00   1.000     -3792053    3791983
       /cut1   -2.776233    .243582                     -3.253645   -2.298822
       /cut2   -.8422903   .2363736                     -1.305574   -.3790065
       /cut3     1.06148    .236561                      .5978287    1.525131

Note: 40 observations completely determined. Standard errors questionable.

The note at the bottom of the output above indicates the problem. In practice, the next step would be to delete the 40 cases in which dumprst equals 1 (you could use the command drop if dumprst==1 to do this) and refit the model without dumprst. This corresponds to what is done automatically for binary models fitted by logit and probit.

5.3 Hypothesis testing with test and lrtest

Hypothesis tests of regression coefficients can be evaluated with the z-statistics in the estimation output, with test for Wald tests of simple and complex hypotheses, and with lrtest for the corresponding likelihood-ratio tests. We will briefly review each.

5.3.1 Testing individual coefficients

If the assumptions of the model hold, the ML estimators from ologit and oprobit are distributed asymptotically normally. The hypothesis H0: β_k = β* can be tested with z = (β̂_k − β*)/σ̂_β̂k. For example, consider the results for male from the ologit output above:

        warm       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        male   -.7332997   .0784827    -9.34   0.000    -.8871229   -.5794766

We conclude that sex significantly affects attitudes toward working mothers (z = −9.34, p < 0.01 for a two-tailed test). Either a one-tailed or a two-tailed test can be used, as discussed in chapter 4. The z-test in the output of estimation commands is a Wald test, which can also be computed using test.
For example, to test H0: β_male = 0,

. test male
 ( 1)  [warm]male = 0
           chi2(  1) =   87.30
         Prob > chi2 =    0.0000

We conclude that sex significantly affects attitudes toward working mothers (χ² = 87.30, df = 1, p < 0.01). The value of a chi-squared test with 1 degree of freedom is identical to the square of the corresponding z-test, which can be demonstrated with the display command:

. display "z*z=" -9.343*-9.343
z*z=87.291649

Another way to verify this is to use the information that test leaves in memory. After test, we type

. return list
scalars:
         r(drop) =  0
         r(chi2) =  87.30028866314404
           r(df) =  1
            r(p) =  9.32342950928e-21

The return r(chi2) contains the value of the chi-squared test statistic. We can take the square root of this quantity to confirm the relationship between the z-test and the chi-squared test.

. di "chi2=" r(chi2) ";  sqrt(chi2)= " sqrt(r(chi2))
chi2=87.300289;  sqrt(chi2)= 9.3434623

An LR test is computed by comparing the log likelihood from a full model with that of a restricted model. To test one coefficient, we begin by fitting the full model:

. ologit warm yr89 male white age ed prst, nolog

Ordered logistic regression                       Number of obs   =       2293
                                                  LR chi2(6)      =     301.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -2844.9123                       Pseudo R2       =     0.0504

(output omitted)
. estimates store fmodel

Then we fit the model, excluding male:

. ologit warm yr89 white age ed prst, nolog

Ordered logistic regression                       Number of obs   =       2293
                                                  LR chi2(5)      =     212.98
                                                  Prob > chi2     =     0.0000
Log likelihood = -2889.278                        Pseudo R2       =     0.0355

(output omitted)
. estimates store nmodel
. lrtest fmodel nmodel

Likelihood-ratio test                             LR chi2(1)  =      88.73
(Assumption: nmodel nested in fmodel)             Prob > chi2 =     0.0000

The resulting LR test can be interpreted to mean that the effect of being male is significant at the .01 level (LR χ² = 88.73, df = 1, p < .01).
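Both relationships can be verified outside Stata as well. This Python sketch (our own illustration) squares the z-statistic and recomputes the LR statistic from the two log likelihoods reported above:

```python
# Wald: the chi-squared from -test male- is the square of the z-statistic.
z = -9.343                      # z-test for male from the ologit output
print(round(z * z, 2))          # prints 87.29; -test- reported 87.30 because
                                # z itself is rounded to three decimals

# LR: twice the difference in log likelihoods between the full model and the
# model that drops male.
ll_full = -2844.9123
ll_restricted = -2889.278
lr = 2 * (ll_full - ll_restricted)
print(round(lr, 2))             # 88.73, matching -lrtest-
```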
5.3.2 Testing multiple coefficients

We can also test a complex hypothesis that involves more than one coefficient. For example, our model has three demographic variables: age, white, and male. To test that all the demographic factors are simultaneously equal to zero, H0: β_age = β_white = β_male = 0, we can use either a Wald or an LR test. For the Wald test, we fit the full model as before and then type

. test age white male
 ( 1)  [warm]age = 0
 ( 2)  [warm]white = 0
 ( 3)  [warm]male = 0
           chi2(  3) =  166.62
         Prob > chi2 =    0.0000

We conclude that the hypothesis that the demographic effects of age, race, and sex are simultaneously equal to zero can be rejected at the .01 level (χ² = 166.62, df = 3, p < .01). test can also be used to test the equality of effects, as shown in chapter 4.

To compute an LR test of multiple coefficients, we first fit the full model and save the results with estimates store. Then to test H0: β_age = β_white = β_male = 0, we fit the model that excludes these three variables and run lrtest:

. ologit warm yr89 male white age ed prst, nolog
(output omitted)
. estimates store fmodel
. ologit warm yr89 ed prst, nolog
(output omitted)
. estimates store nmodel
. lrtest fmodel nmodel

Likelihood-ratio test                             LR chi2(3)  =     171.58
(Assumption: nmodel nested in fmodel)             Prob > chi2 =     0.0000

We conclude that the hypothesis that the demographic effects of age, race, and sex are simultaneously equal to zero can be rejected at the .01 level (χ² = 171.58, df = 3, p < .01).

We find that the Wald and LR tests usually lead to the same decisions. When there are differences, they generally occur when the tests are near the cutoff for statistical significance. Because the LR test is invariant to reparameterization, we prefer the LR test.

5.4 Scalar measures of fit using fitstat

As we discuss more in chapter 3, scalar measures of fit can be useful in comparing competing models (see also Long 1997, 85–113).
Several different measures can be computed after either ologit or oprobit with the SPost command fitstat:

. ologit warm yr89 male white age ed prst, nolog
(output omitted)
. fitstat

Measures of Fit for ologit of warm

Log-Lik Intercept Only:     -2995.770    Log-Lik Full Model:          -2844.912
D(2284):                     5689.825    LR(6):                         301.716
                                         Prob > LR:                       0.000
McFadden's R2:                  0.050    McFadden's Adj R2:               0.047
ML (Cox-Snell) R2:              0.123    Cragg-Uhler(Nagelkerke) R2:      0.133
McKelvey & Zavoina's R2:        0.127
Variance of y*:                 3.768    Variance of error:               3.290
Count R2:                       0.432    Adj Count R2:                    0.093
AIC:                            2.489    AIC*n:                        5707.825
BIC:                       -11982.891    BIC':                         -255.291
BIC used by Stata:           5759.463    AIC used by Stata:            5707.825

Using simulations, both Hagle and Mitchell (1992) and Windmeijer (1995) find that, for ordinal outcomes, McKelvey and Zavoina's R2 most closely approximates the R2 obtained by fitting the linear regression model on the underlying latent variable.

5.5 Converting to a different parameterization*

Earlier, we noted that different software packages use different parameterizations to identify the model. Stata sets β_0 = 0 and estimates τ_1, whereas some programs fix τ_1 = 0 and estimate β_0. Although all quantities of interest for the purpose of interpretation (e.g., predicted probabilities) are the same under both parameterizations, it is useful to see how Stata can fit the model under either parameterization. The key to understanding how this is done is the equation

    Pr(y = m | x) = F([τ_m − δ] − [β_0 − δ] − xβ) − F([τ_{m−1} − δ] − [β_0 − δ] − xβ)

Without further constraints, it is possible to estimate only the differences τ_m − δ and β_0 − δ. Stata assumes δ = β_0, which forces the estimate of β_0 to be 0, whereas some other programs assume δ = τ_1, which forces the estimate of τ_1 to be 0.
For example,

    Model        Stata's           Alternative
    parameter    estimate          parameterization
    β_0          β_0 − β_0 = 0     β_0 − τ_1
    τ_1          τ_1 − β_0         τ_1 − τ_1 = 0
    τ_2          τ_2 − β_0         τ_2 − τ_1
    τ_3          τ_3 − β_0         τ_3 − τ_1

Although you would only need to compute the alternative parameterization if you wanted to compare your results with those produced by another statistics package, seeing how this is done illustrates why the intercept and thresholds are arbitrary. To estimate the alternative parameterization, we use lincom to estimate the difference between Stata's estimates (see page 190) and the estimated value of the first cutpoint:²

. ologit warm yr89 male white age ed prst, nolog
(output omitted)
. * intercept
. lincom 0 - _b[/cut1]
 ( 1)  - [cut1]_cons = 0

        warm       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         (1)    2.465362   .2389126    10.32   0.000     1.997102    2.933622

Here we are computing the alternative parameterization of the intercept. ologit assumes that β_0 = 0, so we simply estimate 0 − τ_1; that is, 0 - _b[/cut1]. The trick is that the cutpoints are contained in the vector _b[], using the labels /cut1, /cut2, and /cut3. For the thresholds, we are estimating τ_2 − τ_1 and τ_3 − τ_1, which correspond to _b[/cut2]-_b[/cut1] and _b[/cut3]-_b[/cut1]:

. * cutpoint 2
. lincom _b[/cut2] - _b[/cut1]
 ( 1)  - [cut1]_cons + [cut2]_cons = 0

        warm       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         (1)    1.834458   .0630432    29.10   0.000     1.710895     1.95802

. * cutpoint 3
. lincom _b[/cut3] - _b[/cut1]
 ( 1)  - [cut1]_cons + [cut3]_cons = 0

        warm       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         (1)    3.727216   .0826215    45.11   0.000     3.565281    3.889151

The estimate of τ_1 − τ_1 is, of course, 0.

5.6 The parallel regression assumption

Before discussing interpretation, it is important to understand an assumption that is implicit in the ORM, known both as the parallel regression assumption and, for the ordinal logit model, the proportional odds assumption. Using (5.1), the ORM can be written as
2. Prior to version 9, Stata labeled the cutpoints as, for example, _cut1 instead of the newer /cut1. If you are using a version of Stata prior to version 9, simply replace the /s with _s in the commands in this chapter.

    Pr(y = 1 | x) = F(τ_1 − xβ)
    Pr(y = m | x) = F(τ_m − xβ) − F(τ_{m−1} − xβ)    for m = 2 to J − 1
    Pr(y = J | x) = 1 − F(τ_{J−1} − xβ)

These equations can be used to compute the cumulative probabilities, which have the simple form

    Pr(y ≤ m | x) = F(τ_m − xβ)    for m = 1 to J − 1

This equation shows that the ORM is equivalent to J − 1 binary regressions with the assumption that the slope coefficients are identical across each regression. For example, with four outcomes and one independent variable, the equations are

    Pr(y ≤ 1 | x) = F(τ_1 − βx)
    Pr(y ≤ 2 | x) = F(τ_2 − βx)
    Pr(y ≤ 3 | x) = F(τ_3 − βx)

The intercept α is not in the equation since it has been assumed to equal 0 to identify the model. These equations lead to the following figure:

[Figure: plot of Pr(y ≤ 1 | x), Pr(y ≤ 2 | x), and Pr(y ≤ 3 | x) against x.]

This figure suggests that the parallel regression assumption can be tested by comparing the estimates from the J − 1 binary regressions,

    Pr(y ≤ m | x) = F(τ_m − xβ_m)    for m = 1 to J − 1

where the βs are allowed to differ across the equations. The parallel regression assumption implies that β_1 = β_2 = ··· = β_{J−1}. To the degree that the parallel regression assumption holds, the estimates β̂_1, β̂_2, ..., β̂_{J−1} should be close. There are two commands in Stata that perform this test.

An approximate LR test

The command omodel (Wolfe and Gould 1998) is not part of official Stata but can be obtained by typing net search omodel and following the prompts. omodel computes an approximate LR test.
Essentially, this method compares the log likelihood from ologit (or oprobit) with that obtained from pooling J - 1 binary models fitted with logit (or probit), making an adjustment for the correlation between the binary outcomes defined by y ≤ m. The syntax is

    omodel logit | probit depvar [indepvars] [if] [in] [weight]

where the subcommand logit or probit indicates whether ordered logit or ordered probit is to be used. For example,

. omodel logit warm yr89 male white age ed prst
  (same output as for ologit warm yr89 male white age ed prst)
Approximate likelihood-ratio test of proportionality of odds
across response categories:
        chi2(12) =      48.91
        Prob > chi2 =  0.0000

Here the parallel regression assumption can be rejected at the .01 level.

A Wald test

The LR test is an omnibus test that the coefficients for all variables are simultaneously equal. Accordingly, you cannot determine whether the coefficients for some variables are identical across the binary equations while coefficients for other variables differ. To this end, a Wald test by Brant (1990) is useful because it tests the parallel regression assumption for each variable. The messy details of computing this test are found in Brant (1990) or Long (1997, 143-144). In Stata, the test is computed quickly with brant, which is part of SPost. After running ologit (brant does not work with oprobit), you run brant with the syntax:

    brant [, detail]

The detail option provides a table of coefficients from each of the binary models. For example,

. brant, detail
Estimated coefficients from j-1 binary regressions

                  y>1           y>2           y>3
 yr89       .9647422     .56540626     .31907316
 male     -.30536425    -.69054232    -1.0837888
 white    -.55265759    -.31427081    -.39299842
 age       -.0164704    -.02533448    -.01859051
 ed        .10479624     .05285265     .05755466
 prst     -.00141118     .00953216     .00553043
 _cons     1.8584045     .73032873    -1.0245168

Brant Test of Parallel Regression Assumption

 Variable      chi2   p>chi2    df
 All          49.18    0.000    12
 yr89         13.01    0.001     2
 male         22.24    0.000     2
 white         1.27    0.531     2
 age           7.38    0.025     2
 ed            4.31    0.116     2
 prst          4.33    0.115     2

A significant test statistic provides evidence that the parallel regression assumption has been violated. The chi-squared of 49.18 for the Brant test is close to the value of 48.91 from the LR test. However, the Brant test shows that the largest violations are for yr89 and male, which indicates that there may be problems related to these variables.

Caveat regarding the parallel regression assumption

We find that the parallel regression assumption is frequently violated. When the assumption of parallel regressions is rejected, alternative models that do not impose the constraint of parallel regressions should be considered. Violation of the parallel regression assumption is not a rationale for using ordinary least squares regression, since the assumptions implied by applying the LRM to ordinal data are even stronger. Alternative models that can be considered include the models for nominal outcomes discussed in chapter 6 or the other models for ordinal outcomes discussed in section 5.9.

5.7 Residuals and outliers using predict

Although no methods for detecting influential observations and outliers have been developed specifically for the ORM, Hosmer and Lemeshow (2000, 305) suggest applying the methods for binary models to the J - 1 cumulative probabilities that were discussed in the last section. As noted by Hosmer and Lemeshow, the disadvantage of this approach is that you are evaluating only an approximation to the model you have fitted, because the coefficients of the binary models differ from those fitted in the ordinal model. But, if the parallel regression assumption is not rejected, you can be more confident in the results of your residual analysis. To illustrate this approach, we start by generating three binary variables corresponding to warm < 2, warm < 3, and warm < 4:

. gen warmlt2 = (warm<2) if warm < .
. gen warmlt3 = (warm<3) if warm < .
. gen warmlt4 = (warm<4) if warm < .

For example, warmlt3 is 1 if warm equals 1 or 2, else 0. Next we estimate binary logits for warmlt2, warmlt3, and warmlt4 using the same independent variables as in our original ologit model. After estimating each logit, we generate standardized residuals using predict (for a detailed discussion of generating and inspecting these residuals, see chapter 4):

. * warm < 2
. logit warmlt2 yr89 male white age ed prst
  (output omitted)
. predict rstd_lt2, rs
. * warm < 3
. logit warmlt3 yr89 male white age ed prst
  (output omitted)
. predict rstd_lt3, rs
. * warm < 4
. logit warmlt4 yr89 male white age ed prst
  (output omitted)
. predict rstd_lt4, rs

Next we create an index plot for each of the three binary equations. Using the results from the logit of warmlt3 yields the following graph:

. sort prst
. gen index = _n
. graph twoway scatter rstd_lt3 index, yline(0) ylabel(-4(2)4) ///
>     xtitle("Observation Number") xlabel(0(500)2293) ///
>     msymbol(Oh)

[figure: index plot of the standardized residuals rstd_lt3 against observation number]

Given the size of the dataset, no residual stands out as being especially large. See section 4.4 for other ways in which you can examine outliers and influential cases.

5.8 Interpretation

If the idea of a latent variable makes substantive sense, simple interpretations are possible by rescaling y* to compute standardized coefficients that can be used just like coefficients for the linear regression model. If the focus is on the categories of the ordinal variable (e.g., what affects the likelihood of strongly agreeing), the methods illustrated for the BRM can be extended to multiple outcomes. Because the ORM is nonlinear in the outcome probabilities, no approach can fully describe the relationship between a variable and the outcome probabilities.
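The cumulative splits used in the residual analysis of section 5.7 are simple to construct outside of Stata as well. A minimal Python sketch (the data are hypothetical; the function mirrors the gen warmlt2 through warmlt4 step, with None standing in for Stata's missing values):

```python
def cumulative_splits(y, J):
    """For an ordinal outcome coded 1..J, build the J-1 binary
    indicators for y < m (m = 2..J), mirroring Stata's
    gen warmltM = (warm < M); None marks missing values."""
    splits = {}
    for m in range(2, J + 1):
        splits[f"lt{m}"] = [None if v is None else int(v < m) for v in y]
    return splits

# Hypothetical ordinal outcome with categories 1..4 (as for warm).
y = [1, 4, 2, 3, None, 4]
s = cumulative_splits(y, 4)
print(s["lt3"])  # indicator for y < 3 -> [1, 0, 1, 0, None, 0]
```

Each of the resulting indicators can then be used as the outcome in a separate binary regression, which is exactly the approximation Hosmer and Lemeshow suggest.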
Consequently, you should consider each of these methods before deciding which approach is most effective in your application. For purposes of illustration, we continue to use the example of attitudes toward working mothers. Remember that the test of the parallel regression assumption suggests that this model is not appropriate for these data.

5.8.1 Marginal change in y*

In the ORM, y* = xβ + ε, and the marginal change in y* with respect to x_k is

    ∂y*/∂x_k = β_k

Because y* is latent (and hence its metric is unknown), the marginal change cannot be interpreted without standardizing by the estimated standard deviation of y*,

    σ²_y* = β' Var(x) β + Var(ε)

where Var(x) is the covariance matrix for the observed xs, β contains the ML estimates, and Var(ε) = 1 for ordered probit and π²/3 for ordered logit. Then the y*-standardized coefficient for x_k is

    β^{Sy*}_k = β_k / σ_y*

which can be interpreted as follows: for a unit increase in x_k, y* is expected to increase by β^{Sy*}_k standard deviations, holding all other variables constant. The fully standardized coefficient is

    β^S_k = (σ_k β_k) / σ_y*

which can be interpreted as follows: for a standard deviation increase in x_k, y* is expected to increase by β^S_k standard deviations, holding all other variables constant. These coefficients can be computed with listcoef using the std option. For example, after fitting the ordered logit model, we type
. listcoef, std help
ologit (N=2293): Unstandardized and Standardized Estimates

  Observed SD: .9282156
  Latent SD:   1.9410634

 warm        b         z      P>|z|    bStdX    bStdY   bStdXY    SDofX
 yr89     0.52390    6.557    0.000   0.2566   0.2699   0.1322   0.4897
 male    -0.73330   -9.343    0.000  -0.3658  -0.3778  -0.1885   0.4989
 white   -0.39116   -3.304    0.001  -0.1287  -0.2015  -0.0663   0.3290
 age     -0.02167   -8.778    0.000  -0.3635  -0.0112  -0.1873  16.7790
 ed       0.06717    4.205    0.000   0.2123   0.0346   0.1094   3.1608
 prst     0.00607    1.844    0.065   0.0880   0.0031   0.0453  14.4923

 b = raw coefficient
 z = z-score for test of b=0
 P>|z| = p-value for z-test
 bStdX = x-standardized coefficient
 bStdY = y-standardized coefficient
 bStdXY = fully standardized coefficient
 SDofX = standard deviation of X

If we think of the dependent variable as measuring support for mothers in the workplace, then the effect of the year of the interview can be interpreted as indicating that in 1989 support was .27 standard deviations higher than in 1977, holding all other variables constant. To consider the effect of education, each standard deviation increase in education increases support by .11 standard deviations, holding all other variables constant.

5.8.2 Predicted probabilities

We usually prefer interpretations based somehow on predicted probabilities. These probabilities can be estimated with the formula

    Pr(y = m | x) = F(τ_m - xβ) - F(τ_{m-1} - xβ)

with cumulative probabilities computed as

    Pr(y ≤ m | x) = F(τ_m - xβ)

The values of x can be based on observations in the sample or can be hypothetical values of interest. The most basic command for computing probabilities is predict, but our SPost commands can be used to compute predicted probabilities in particularly useful ways.

5.8.3 Predicted probabilities with predict

After fitting a model with ologit or oprobit, a useful first step is to compute the in-sample predictions with the command

    predict newvar1 [newvar2 [newvar3 ...]
] [if] [in]

where you indicate one new variable name for each category of the dependent variable. For instance, in the following example predict specifies that the variables SDwarm, Dwarm, Awarm, and SAwarm should be created with predicted values for the four outcome categories:

. ologit warm yr89 male white age ed prst, nolog
  (output omitted)
. predict SDwarm Dwarm Awarm SAwarm
(option p assumed; predicted probabilities)

The message (option p assumed; predicted probabilities) reflects that predict can compute many different quantities. Because we did not specify an option indicating which quantity to predict, option p for predicted probabilities was assumed. An easy way to see the range of the predictions is with dotplot, one of our favorite commands for quickly checking data:

. label var SDwarm "Pr(SD)"
. label var Dwarm "Pr(D)"
. label var Awarm "Pr(A)"
. label var SAwarm "Pr(SA)"
. dotplot SDwarm Dwarm Awarm SAwarm, ylabel(0(.25).75)

[figure: dot plots of the predicted probabilities Pr(SD), Pr(D), Pr(A), and Pr(SA)]

The predicted probabilities for the extreme categories tend to be less than .25, with most predictions for the middle categories falling between .25 and .5. In only a few cases is the probability of any outcome greater than .5. Examining predicted probabilities within the sample provides a first, quick check of the model. To understand and present the substantive findings, however, it is usually more effective to compute predictions at specific, substantively informative values. Our commands prvalue, prtab, prgen, and prchange are designed to make this simple.

5.8.4 Individual predicted probabilities with prvalue

Predicted probabilities for individuals with a particular set of characteristics can be computed with prvalue. For example, we might want to examine the predicted probabilities for individuals with the following characteristics:

• Working-class men in 1977 who are near retirement.
• Young, highly educated women with prestigious jobs.
• An "average" individual in 1977. • An "average" individual in 1989. 206 Chapter 5 Models for ordinal outcomes Each of these can be easily computed with prvalue (see chapter 3 for a discussion of options for this command). The predicted probabilities for older, working-class men are . prvalue, x(yr89=0 male=l prst=20 age=64 ed=16) rest(mean) ologit: Predictions for warm Confidence* intervals by delta method 967. Conf. Pr(y=SDlx): Pr(y=DIx): Pr(y=Alx): Pr(y=SAIx): yr89 0 0.2317 0.4221 0.2723 0.0739 [ 0.1776, [ 0.3942, [ 0.2249, [ 0.0523, male 1 white .8765809 age 64 Interval 0.2857] 0.4500] 0.3198] 0.0954] ed 16 prst 20 For young, highly educated women with prestigious jobs, they are . prvalue, x(yr89=l male=0 prst=80 age=30 ed=24) rest(mean) brief ologit: Predictions for warm 95% Conf. Interval Pr(y=SD|x): 0.0164 [0.0106, 0.0222] Pr(y=Dlx): 0.0781 [0.0554, 0.1008] Pr(y=A|x): 0.3147 [0.2636, 0.3658] Pr(y=SAIx): 0.5908 [0.5143, 0.6673] and so on, for other sets of values. We have set the values of the independent variables that define our hypothetical person using the x() and restO options. The output from the first call of prvalue lists the values that have been set for all independent variables. This allows you to verify that x() and restO did what you intended. For the second call, we added the brief option. This suppresses the output showing the levels of the independent variables. If you use this option, be certain that you have correctly specified the levels of all variables. Remember that the output of prvalue labels the categories according to the value labels assigned to the dependent variable. For example, Pr(y=SD | x) : 0.2317. As it is easy to be confused about the outcome categories when using these models, it is prudent to assign clear value labels to your dependent variable (see chapter 2). 
We can summarize the results in a table that lists the ideal types and provides a clear indication of which variables are important:

 Ideal type                                            SD    D    A    SA
 Working-class men in 1977 who are near retirement    .23  .42  .27  .07
 Young, highly educated women in 1989
   with prestigious jobs                               .02  .08  .32  .59
 An "average" individual in 1977                       .13  .36  .37  .14
 An "average" individual in 1989                       .08  .28  .43  .21

5.8.5 Tables of predicted probabilities with prtab

In other cases, it can be useful to compute predicted probabilities for all combinations of a set of categorical independent variables. For example, the ideal types illustrate the importance of sex and the year when the question was asked. Using prtab, we can easily show the degree to which these variables affect opinions for those who are average on other characteristics.

. prtab yr89 male, novarlbl
ologit: Predicted probabilities for warm

Predicted probability of outcome 1 (SD)

           male
 yr89     Women     Men
 1977    0.0989  0.1859
 1989    0.0610  0.1191

Predicted probability of outcome 2 (D)

           male
 yr89     Women     Men
 1977    0.3083  0.4026
 1989    0.2282  0.3394

Predicted probability of outcome 3 (A)

           male
 yr89     Women     Men
 1977    0.4129  0.3162
 1989    0.4406  0.3904

Predicted probability of outcome 4 (SA)

           male
 yr89     Women     Men
 1977    0.1799  0.0953
 1989    0.2703  0.1510

         yr89       male     white        age         ed       prst
 x= .39860445  .46489315  .8765809  44.935456  12.218055  39.585259

Tip: Sometimes the output of prtab is clearer without the variable labels. These can be suppressed with the novarlbl option.

The output from prtab can be rearranged into a table that clearly shows that men are more likely than women to strongly disagree or disagree with the proposition that working mothers can have relationships with their children that are as warm as those of mothers who do not work.
The table also shows that between 1977 and 1989 there was a movement for both men and women toward more positive attitudes about working mothers.

 1977           SD    D     A     SA
 Men           .19   .40   .32   .10
 Women         .10   .31   .41   .18
 Difference    .09   .09  -.09  -.08

 1989           SD    D     A     SA
 Men           .12   .34   .39   .15
 Women         .06   .23   .44   .27
 Difference    .06   .11  -.05  -.12

 Change from 1977 to 1989
                SD    D     A     SA
 Men           -.07  -.06   .07   .05
 Women         -.04  -.08   .03   .09

prtab does not include confidence intervals. If we want confidence intervals for the probabilities in the table, we need to make separate calls to prvalue. For the first row in the above table, we could compute the confidence intervals as follows:

. prvalue, x(yr89=0 male=1)
ologit: Predictions for warm

Confidence intervals by delta method

                           95% Conf. Interval
 Pr(y=SD|x):  0.1859   [ 0.1631,  0.2088]
 Pr(y=D|x):   0.4026   [ 0.3777,  0.4275]
 Pr(y=A|x):   0.3162   [ 0.2925,  0.3400]
 Pr(y=SA|x):  0.0953   [ 0.0814,  0.1092]

     yr89  male     white        age         ed       prst
 x=     0     1  .8765809  44.935456  12.218055  39.585259

The confidence intervals for the other cells can be generated by the following commands:

. prvalue, x(yr89=0 male=0)
. prvalue, x(yr89=1 male=1)
. prvalue, x(yr89=1 male=0)

5.8.6 Graphing predicted probabilities with prgen

Graphing predicted probabilities for each outcome can also be useful for the ORM. Here we consider women in 1989 and show how predicted probabilities are affected by age. Of course, the plot could also be constructed for other sets of characteristics. The predicted probabilities as age ranges from 20 to 80 are generated by prgen:

. prgen age, from(20) to(80) generate(w89) x(male=0 yr89=1) ncases(13)
ologit: Predicted values as age varies from 20 to 80.

     yr89  male     white        age         ed       prst
 x=     1     0  .8765809  44.935456  12.218055  39.585259

You should be familiar with how x() operates, but it is useful to review the other options:

from(20) and to(80) specify the minimum and maximum values over which age is to vary. The default is the variable's minimum and maximum values.
generate(w89) is the root name for the new variables.

ncases(13) indicates that 13 evenly spaced values of age between 20 and 80 should be generated.

Here w89x contains values of age ranging from 20 to 80. The p# variables contain the predicted probability for outcome # (e.g., w89p2 is the predicted probability of outcome 2). With ordinal outcomes, prgen also computes cumulative (i.e., summed) probabilities, which are indicated by s# (e.g., w89s2 is the sum of the predicted probabilities of outcomes 1 and 2).

. desc w89*

               storage  display    value
 variable name   type   format     label   variable label
 w89x            float  %9.0g              Age in years
 w89p1           float  %9.0g              pr(SD)=Pr(1)
 w89p2           float  %9.0g              pr(D)=Pr(2)
 w89p3           float  %9.0g              pr(A)=Pr(3)
 w89p4           float  %9.0g              pr(SA)=Pr(4)
 w89s1           float  %9.0g              pr(y<=1)
 w89s2           float  %9.0g              pr(y<=2)
 w89s3           float  %9.0g              pr(y<=3)
 w89s4           float  %9.0g              pr(y<=4)

Although prgen assigns variable labels to the variables it creates, we can change these to improve the look of the plot that we are creating. Specifically,

. label var w89p1 "SD"
. label var w89p2 "D"
. label var w89p3 "A"
. label var w89p4 "SA"
. label var w89s1 "SD"
. label var w89s2 "SD or D"
. label var w89s3 "SD, D or A"

First, we plot the probabilities of individual outcomes using graph. Because the graph command is long, we use /// to allow the commands to be longer than one line in our do-file.

. // step 1: graph predicted probabilities
. graph twoway connected w89p1 w89p2 w89p3 w89p4 w89x, ///
>     title("Panel A: Predicted Probabilities") ///
>     xtitle("Age") xlabel(20(10)80) ylabel(0(.25).50) ///
>     yscale(noline) xline(44.93) ///
>     ytitle("") name(graph1, replace)

This graph command plots the four predicted probabilities against the generated values for age contained in w89x. Standard options for graph are used to specify the axes and labels. The vertical line specified by xline(44.93) marks the average age in the sample.
This line is used to illustrate the marginal effect discussed in section 5.8.7. The option name(graph1, replace) saves the graph in memory under the name graph1 so that we can combine it with the next graph, which plots the cumulative probabilities:

. // step 2: graph cumulative probabilities
. graph twoway connected w89s1 w89s2 w89s3 w89x, ///
>     title("Panel B: Cumulative Probabilities") ///
>     xtitle("Age") xlabel(20(10)80) ylabel(0(.25)1) ///
>     yscale(noline) name(graph2, replace) ///
>     ytitle("")

Next we combine these two graphs (see chapter 2 for details on combining graphs):

. // step 3: combine graphs
. graph combine graph1 graph2, col(1) iscale(*.9) imargin(small) ///
>     ysize(4.31) xsize(3.287)

This leads to figure 5.2. Panel A plots the predicted probabilities and shows that with age the probability of SA decreases rapidly, whereas the probability of D (and to a lesser degree, SD) increases. Panel B plots the cumulative probabilities. Both panels present the same information; which method you use is up to you.

[Figure 5.2: Plot of predicted probabilities for the ordered logit model. Panel A: predicted probabilities; Panel B: cumulative probabilities (SD; SD or D; SD, D or A).]

5.8.7 Changes in predicted probabilities

When there are many variables in the model, it is impractical to plot them all. In such cases, measures of change in the outcome probabilities are useful for summarizing the effects of each variable. Before proceeding, however, we hasten to note that values of both discrete and marginal change depend on the levels of all variables in the model. We return to this point shortly.

Marginal change with prchange

The marginal change in the probability is computed as

    ∂Pr(y = m | x)/∂x_k = ∂F(τ_m - xβ)/∂x_k - ∂F(τ_{m-1} - xβ)/∂x_k

which is the slope of the curve relating x_k to Pr(y = m | x), holding all other variables constant.
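For the logit CDF, each term in this difference equals β_k f(τ - xβ), where f(z) = F(z)[1 - F(z)] is the logistic density, so the slope can be computed directly from the estimates. A hedged Python sketch: the coefficients come from the earlier listcoef output, while the cutpoints (roughly -2.465, -0.631, 1.262) are approximations backed out from the book's published predictions rather than from the omitted ologit output.

```python
from math import exp, inf

# Ordered logit estimates from the text. The cutpoints are approximate
# values recovered from the published predictions (an assumption).
BETA = {"yr89": 0.52390, "male": -0.73330, "white": -0.39116,
        "age": -0.02167, "ed": 0.06717, "prst": 0.00607}
TAU = (-inf, -2.465, -0.631, 1.262, inf)  # tau_0..tau_4 with infinite ends

def logistic_pdf(z):
    """f(z) = F(z) * (1 - F(z)) for the logistic distribution."""
    if z in (-inf, inf):
        return 0.0
    p = 1.0 / (1.0 + exp(-z))
    return p * (1.0 - p)

def marginal_effect(x, var, m):
    """Slope of Pr(y = m | x) with respect to variable var (m = 1..4)."""
    xb = sum(BETA[k] * v for k, v in x.items())
    return BETA[var] * (logistic_pdf(TAU[m - 1] - xb) - logistic_pdf(TAU[m] - xb))

# Women in 1989, other variables held at their means.
x = {"yr89": 1, "male": 0, "white": 0.876581,
     "age": 44.9355, "ed": 12.2181, "prst": 39.5853}
print(round(marginal_effect(x, "age", 1), 5))  # close to prchange's .00124
```

Because the marginal effects telescope across the J outcomes, they necessarily sum to zero for any variable.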
Here we consider the marginal effect of age, ∂Pr(y = m | x)/∂age, for women in 1989 who are average on all other variables. This corresponds to the slope of the curves in panel A of figure 5.2 evaluated at the vertical line (recall that this line is drawn at the average age in the sample). The marginal is computed with prchange, where we specify that only the coefficients for age should be computed:

. prchange age, x(male=0 yr89=1) rest(mean)
ologit: Changes in Probabilities for warm

 age
            Avg|Chg|         SD          D           A           SA
 Min->Max  .16441458  .10941909  .21941006  -.05462247  -.27420671
 -+1/2     .00222661  .00124099  .00321223  -.0001803   -.00427291
 -+sd/2    .0373125   .0208976   .05372739  -.00300205  -.07162295
 MargEfct  .00222662  .00124098  .00321226  -.00018032  -.00427292

                 SD          D          A          SA
 Pr(y|x)  .06099996  .22815652  .44057754  .27026597

          yr89     male    white      age       ed     prst
 x=          1        0  .876581  44.9355  12.2181  39.5853
 sd(x)= .489718  .498875  .328989   16.779  3.16083  14.4923

The first thing to notice is the row labeled Pr(y|x), which contains the predicted probabilities at the values set by x() and rest(). In panel A, these probabilities correspond to the intersection of the vertical line and the probability curves. The row MargEfct lists the slopes of the probability curves at the point of intersection with the vertical line in the figure. For example, the slope for SD (shown with circles) is .00124, whereas the slope for A (shown with squares) is negative and small. As with the BRM, the size of the slope indicates the instantaneous rate of change but does not correspond exactly to the amount of change in the probability for a change of one unit in the independent variable. However, when the probability curve is approximately linear, the marginal effect can be used to summarize the effect of a unit change in the variable on the probability of an outcome.

Marginal change with mfx

Marginal change can also be computed using mfx, where at() is used to set values of the independent variables.
mfx estimates the marginal effects for only one outcome category at a time, where the category is specified with the option predict(outcome(#)). Using the same values for the independent variables as in the example above, we obtain the following results:

. mfx, at(male=0 yr89=1) predict(outcome(1))
warning: no value assigned in at() for variables white age ed prst;
    means used for white age ed prst
Marginal effects after ologit
      y = Pr(warm==1) (predict, outcome(1))
        = .06099996

 variable     dy/dx    Std. Err.      z    P>|z|  [    95% C.I.   ]        X
 yr89*     -.0378526    .00601    -6.30   0.000  -.049633 -.026072        1
 male*      .0581355    .00731     7.95   0.000   .043803  .072468        0
 white*     .0197511    .0055      3.59   0.000   .008972  .03053   .876581
 age         .001241    .00016     7.69   0.000   .000925  .001557  44.9355
 ed        -.0038476    .00097    -3.96   0.000  -.005754 -.001941  12.2181
 prst      -.0003478    .00019    -1.83   0.068  -.000721  .000025  39.5853

(*) dy/dx is for discrete change of dummy variable from 0 to 1

The marginal for age is .001241, which matches the result obtained from prchange. The advantage of mfx is that it computes standard errors and confidence intervals.

Discrete change with prchange

As the marginal change can be misleading when the probability curve is changing rapidly or when an independent variable is a dummy variable, we prefer using discrete change (mfx computes discrete change for independent variables that are binary but not for other independent variables). The discrete change is the change in the predicted probability for a change in x_k from the start value x_S to the end value x_E (e.g., a change from x_k = 0 to x_k = 1). Formally,

    ΔPr(y = m | x)/Δx_k = Pr(y = m | x, x_k = x_E) - Pr(y = m | x, x_k = x_S)

where Pr(y = m | x, x_k) is the probability that y = m given x, noting a specific value for x_k. The change is interpreted as indicating that when x_k changes from x_S to x_E, the predicted probability of outcome m changes by ΔPr(y = m | x)/Δx_k, holding all other variables at x.
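The discrete change is just a difference of two predicted probability vectors, which makes it easy to sketch in Python. As before, the coefficients are from the listcoef output and the cutpoints are approximations recovered from the published predictions (an assumption, since the ologit output is omitted):

```python
from math import exp

# Ordered logit estimates from the text; x order:
# (yr89, male, white, age, ed, prst). Cutpoints are approximate values
# recovered from the published predictions (an assumption).
BETA = (0.52390, -0.73330, -0.39116, -0.02167, 0.06717, 0.00607)
TAU = (-2.465, -0.631, 1.262)

def probs(x):
    """Pr(y = m | x) for m = 1..4 under the ordered logit model."""
    xb = sum(b * v for b, v in zip(BETA, x))
    cum = [1.0 / (1.0 + exp(-(t - xb))) for t in TAU] + [1.0]
    return [c - p for c, p in zip(cum, [0.0] + cum[:-1])]

def discrete_change(x, k, start, end):
    """Pr(y=m | x, x_k=end) - Pr(y=m | x, x_k=start) for every m."""
    lo = list(x); lo[k] = start
    hi = list(x); hi[k] = end
    return [pe - ps for pe, ps in zip(probs(hi), probs(lo))]

# Effect of male (index 1) changing 0 -> 1 in 1989, others at means.
x = [1, 0, 0.876581, 44.9355, 12.2181, 39.5853]
dc = discrete_change(x, 1, 0, 1)
print([round(d, 4) for d in dc])  # close to .0581, .1113, -.0502, -.1192
```

Because the probabilities sum to one both before and after the change, the four discrete changes necessarily sum to zero.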
The value of the discrete change depends on (1) the value at which x_k starts, (2) the amount of change in x_k, and (3) the values of all other variables. Most often, each continuous variable except x_k is held at its mean. For dummy independent variables, the change could be computed for both values of the variable. For example, we could compute the discrete change for age separately for men and women. Here the discrete-change coefficients for male, age, and prst for women in 1989, with other variables at their mean, are computed as follows:

. prchange male age prst, x(male=0 yr89=1) rest(mean)
ologit: Changes in Probabilities for warm

 male
            Avg|Chg|         SD          D           A           SA
 0->1      .08469636  .05813552  .11125721  -.05015317  -.11923955

 age
            Avg|Chg|         SD          D           A           SA
 Min->Max  .16441458  .10941909  .21941006  -.05462247  -.27420671
 -+1/2     .00222661  .00124099  .00321223  -.0001803   -.00427291
 -+sd/2    .0373125   .0208976   .05372739  -.00300205  -.07162295
 MargEfct  .00222662  .00124098  .00321226  -.00018032  -.00427292

 prst
            Avg|Chg|          SD           D          A          SA
 Min->Max  .04278038  -.02352008  -.06204067  .00013945  .08542132
 -+1/2     .00062411  -.00034784  -.00090037  .00005064  .00119767
 -+sd/2    .00904405  -.00504204  -.01304607  .00073212  .01735598
 MargEfct  .00062411  -.00034784  -.00090038  .00005054  .00119767

                 SD          D          A          SA
 Pr(y|x)  .06099996  .22815652  .44057754  .27026597

          yr89     male    white      age       ed     prst
 x=          1        0  .876581  44.9355  12.2181  39.5853
 sd(x)= .489718  .498875  .328989   16.779  3.16083  14.4923

For variables that are not binary, the discrete change can be interpreted for a unit change centered on the mean, for a standard deviation change centered on the mean, or as the variable changes from its minimum to its maximum value. The following are two examples: For a standard deviation increase in age, the probability of disagreeing increases by .05, holding other variables constant at their means.
Moving from the minimum prestige to the maximum prestige changes the predicted probability of strongly agreeing by .08, holding all other variables constant at their means.

The J discrete-change coefficients for a variable can be summarized by computing the average of the absolute values of the changes across all the outcome categories:

    (1/J) Σ_{j=1}^{J} | ΔPr(y = j | x)/Δx_k |

The absolute value must be used because the sum of the changes without taking the absolute value is necessarily zero. These averages are labeled Avg|Chg| in the output. For example, the effect of being a male is on average .08, which is larger than the average effect of a standard deviation change in either age or occupational prestige.

Confidence intervals for discrete changes

Although prchange computes discrete changes, it does not compute confidence intervals for these changes. If you want confidence intervals, you need to use prvalue with the save and diff options. In the first call to prvalue, you provide the starting values for the explanatory variables and specify save. In the second call, you provide the ending values and specify diff. Consider the change in probabilities associated with a change from 0 to 1 in a dummy variable. With prchange, we calculated the difference in probabilities between men and women in 1989, holding the other variables at their means. We can compute this same change using prvalue, which will give us the confidence intervals as well.

. qui prvalue, x(male=0 yr89=1) rest(mean) save
. prvalue, x(male=1 yr89=1) rest(mean) diff
ologit: Change in Predictions for warm

Confidence intervals by delta method

              Current   Saved   Change     95% CI for Change
 Pr(y=SD|x):   0.1191  0.0610   0.0581   [ 0.0438,  0.0725]
 Pr(y=D|x):    0.3394  0.2282   0.1113   [ 0.0876,  0.1350]
 Pr(y=A|x):    0.3904  0.4406  -0.0502   [-0.0679, -0.0324]
 Pr(y=SA|x):   0.1510  0.2703  -0.1192   [-0.1448, -0.0937]

           yr89  male     white        age         ed       prst
 Current=     1     1  .8765809  44.935456  12.218055  39.585259
 Saved=       1     0  .8765809  44.935456  12.218055  39.585259
 Diff=        0     1         0          0          0          0

If you compare the results listed in the Change column with the earlier results from prchange, you will see that they are the same. For example, the probability of strongly agreeing with the item about working mothers is .119 lower for men than for women. The estimated bounds of the 95% confidence interval for differences in probabilities between men and women in 1989 are -.145 and -.094, holding other variables at their means. Or, we might say, the results suggest that the difference between men and women in the predicted probability of strongly agreeing is .12 and the difference could be as small as .09 or as large as .14 with 95% confidence.

Now let's consider discrete change with a continuous variable. Earlier, prchange computed the changes associated with a standard deviation increase in age centered around the mean. The mean age is 44.9, with a standard deviation of 16.8. Accordingly, we want to compute the change in probabilities when age increases from 36.5 to 53.3. The start and end values for age are computed using the display command:

. di 44.935 - (.5*16.779)
36.5455
. di 44.935 + (.5*16.779)
53.3245

Using these values in a pair of prvalue commands, we find

. qui prvalue, x(male=0 yr89=1 age=36.5455) rest(mean) save
prvalue, x(male=0 yrS9=l age=53.3245) rest(mean) diff ologit: Change in Predictions for warm Confidence intervals by delta method Current Saved Pr(y=SD|x): 0.0723 0.0514 Pr(y=D|x): 0.2556 0.2019 Pr(y=A|x): 0.4362 0.4392 Pr(y=SA|x): 0.2359 0.3076 yr89 male white Current- 1 0 .8765809 Saved= 1 0 8765809 Diff- 0 0 0 Change 95'/. CI for Change 0.0209 [ 0.0155, 0.0262] 0.0537 [ 0.0414, 0.0661] -0.0030 [-0.0118, 0.0058] -0.0716 [-0.0883, -0.0549] age ed prst 53.3245 12.218055 39.585259 36.5455 12.218055 39.585259 16.779 0 0 The amount of change matches the results from pr change, but now we have confidence intervals. For otherwise average women in 1989, a standard deviation increase in age, about 17 years, increases the probability of disagreeing by .054 with estimated bounds for the 95% confidence interval at .041 and .066. Computing discrete change for a 10-year increase in age In the example above, age was measured in years. Not surprisingly, the change in the predicted probability for a 1-year increase in age is trivially small. But, to characterize the effect of age, we could report the effect of a 10-year change in age. Warning It is tempting to compute the discrete change for a 10-year change in age by simply multiplying the 1-year discrete change by 10. This will give you approximately the right answer if the probability curve is nearly linear over the range of change. But, when the curve is not linear, simply multiplying can give misleading results and even the wrong sign. To be safe, do not do it! The delta(#) option for prchange computes the discrete change as an independent value changes from #/2 units below the base value to #/2 above. Here we use delta(10) and set the base value of age to its mean: 5.8.8 Odds ratios using Hstcoef . 
. prchange age, x(male=0 yr89=1) rest(mean) delta(10)
ologit: Changes in Probabilities for warm

(Note: d = 10)

 age
            Avg|Chg|         SD          D           A           SA
 Min->Max  .16441458  .10941909  .21941006  -.05462247  -.27420671
 -+d/2     .02225603  .01242571  .03208634  -.00179818  -.04271388
 -+sd/2    .0373125   .0208976   .05372739  -.00300205  -.07162295
 MargEfct  .00222662  .00124098  .00321226  -.00018032  -.00427292

                 SD          D          A          SA
 Pr(y|x)  .06099996  .22815652  .44057754  .27026597

          yr89     male    white      age       ed     prst
 x=          1        0  .876581  44.9355  12.2181  39.5853
 sd(x)= .489718  .498875  .328989   16.779  3.16083  14.4923

For females interviewed in 1989, the results in the -+d/2 row show the changes in the predicted probabilities associated with a 10-year increase in age centered on the mean.

5.8.8 Odds ratios using listcoef

For ologit, but not oprobit, we can interpret the results using odds ratios. Earlier, (5.2) defined the ordered logit model as

    Ω_{≤m|>m}(x) = exp(τ_m - xβ)

For example, with four outcomes:

    Ω_{≤1|>1}(x) = exp(τ_1 - xβ)
    Ω_{≤2|>2}(x) = exp(τ_2 - xβ)
    Ω_{≤3|>3}(x) = exp(τ_3 - xβ)

Using the same approach as shown for binary logit, the effect of a change in x_k of 1 equals

    Ω_m(x, x_k + 1) / Ω_m(x, x_k) = e^{-β_k} = 1 / e^{β_k}

which can be interpreted as indicating that for a unit increase in x_k, the odds of an outcome at or below category m are changed by the factor exp(-β_k), holding all other variables constant. The value of the odds ratio does not depend on the value of m, which is why the parallel regression assumption is also known as the proportional odds assumption. We could interpret the odds ratio as follows:

For a unit increase in x_k, the odds of a lower outcome compared with a higher outcome are changed by the factor exp(-β_k), holding all other variables constant.

Or, for a change in x_k of δ,

    Ω_m(x, x_k + δ) / Ω_m(x, x_k) = exp(-δ × β_k)

so that

for an increase of δ in x_k, the odds of a lower outcome compared with a higher outcome change by the factor exp(-δ × β_k), holding all other variables constant.
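These factor changes can be checked by hand from the coefficients reported earlier. A minimal Python sketch (the coefficients and standard deviations of X are taken from the listcoef output above):

```python
from math import exp

# Coefficients and SDs of X from the earlier listcoef output.
b_male, sd_male = -0.73330, 0.4989
b_age, sd_age = -0.02167, 16.7790

# Factor change in the odds of higher vs. lower outcomes for a
# unit increase, and for a standard deviation increase.
fc_male = exp(b_male)            # about 0.48
fc_male_sd = exp(b_male * sd_male)
fc_age_sd = exp(b_age * sd_age)  # about 0.69

# The odds of lower vs. higher outcomes are just the inverse.
print(round(fc_male, 4), round(1 / fc_male, 4))
print(round(fc_male_sd, 4), round(fc_age_sd, 4))
```

The inverse of the factor change for male, about 2.08, is exactly the value that listcoef reports when the reverse option is used below.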
In these results, we are discussing factor changes in the odds of lower outcomes compared with higher outcomes. This is done because the model is traditionally written as ln Ω_{≤m|>m}(x) = τ_m - xβ, which leads to the factor change coefficient exp(-β_k). For purposes of interpretation, we could just as well consider the factor change in the odds of higher versus lower values; that is, changes in the odds Ω_{>m|≤m}(x). These coefficients can be computed with listcoef:

. listcoef male age, help
ologit (N=2293): Factor Change in Odds

Odds of: >m vs <=m

 warm        b         z      P>|z|     e^b   e^bStdX    SDofX
 male    -0.73330   -9.343    0.000   0.4803   0.6936   0.4989
 age     -0.02167   -8.778    0.000   0.9786   0.6952  16.7790

 b = raw coefficient
 z = z-score for test of b=0
 P>|z| = p-value for z-test
 e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
 SDofX = standard deviation of X

or to compute percent changes in the odds,

. listcoef male age, help percent
ologit (N=2293): Percentage Change in Odds

Odds of: >m vs <=m

 warm        b         z      P>|z|        %    %StdX    SDofX
 male    -0.73330   -9.343    0.000    -52.0    -30.6   0.4989
 age     -0.02167   -8.778    0.000     -2.1    -30.5  16.7790

 b = raw coefficient
 z = z-score for test of b=0
 P>|z| = p-value for z-test
 % = percent change in odds for unit increase in X
 %StdX = percent change in odds for SD increase in X
 SDofX = standard deviation of X

These results can be interpreted as follows:

The odds of having more positive attitudes toward working mothers are .48 times smaller for men than women, holding all other variables constant. Equivalently, the odds of having more positive attitudes are 52% smaller for men than women, holding other variables constant.

For a standard deviation increase in age, the odds of having more positive attitudes decrease by a factor of .69, holding all other variables constant.

When presenting odds ratios, we find that people find it easier to understand the results if you talk about increases in the odds rather than decreases. That is, it is clearer to say, "The odds increased by a factor of 2" than to say, "The odds decreased by a factor of .5".
If you agree, then you can reverse the order when presenting odds. For example, we could say that

    The odds of having more negative attitudes toward working mothers are 2.08 times larger for men than women, holding all other variables constant.

This new factor change, 2.08, is just the inverse of the old value .48 (that is, 1/.48). listcoef computes the odds of a lower category versus a higher category if you specify the reverse option:

. listcoef male, reverse

ologit (N=2293): Factor Change in Odds

Odds of: <=m vs >m

    warm         b         z     P>|z|      e^b   e^bStdX     SDofX
    male  -0.73330    -9.343     0.000   2.0819    1.4417    0.4989

The output now says Odds of: <=m vs >m instead of Odds of: >m vs <=m, as it did earlier. When interpreting the odds ratios, remember two points that are discussed in detail in chapter 4. First, because odds ratios are multiplicative coefficients, positive and negative effects should be compared by taking the inverse of the negative effect (or vice versa). For example, a negative factor change of .5 has the same magnitude as a positive factor change of 2 = 1/.5. Second, the interpretation assumes only that the other variables have been held constant, not held at any specific values (as was required for discrete change). But a constant factor change in the odds does not correspond to a constant change or constant factor change in the probability.

5.9 Less common models for ordinal outcomes

Stata can also be used to fit several less commonly used models for ordinal outcomes. In concluding this chapter, we describe these models briefly and note their commands for estimation. Our SPost commands do not work with these models. For ocratio, this is mainly because the command does not fully incorporate the new methods of returning information that were introduced with an earlier version of Stata.
5.9.1 The stereotype model

The stereotype logistic model (slm), also referred to as the stereotype ordered regression model (sorm), was proposed by Anderson (1984) in response to the restrictive assumption of parallel regressions in the ordered regression model. The slm, which can be fitted in Stata using the slogit command, is a compromise between allowing the coefficients for each independent variable to vary by outcome category (as is the case with the multinomial logit model considered in the next chapter) and restricting the coefficients to be identical across all outcomes (as is the case with the orm). The slm is defined as

    Pr(y = q | x) = exp(θ_q − φ_q xβ) / Σ_{j=1}^{J} exp(θ_j − φ_j xβ)

where β is a vector of coefficients associated with the independent variables, the θs are intercepts, and the φs are scale factors that mediate the effects of the x's. Because this model is so closely related to the multinomial logit model, and is sometimes used for nominal data, we postpone discussion until the next chapter.

5.9.2 The generalized ordered logit model

The parallel regression assumption results from assuming the same coefficient vector β for all comparisons in the J − 1 equations

    ln Ω_m(x) = τ_m − xβ

where Ω_m(x) = Pr(y ≤ m | x) / Pr(y > m | x). The generalized ordered logit model (golm) allows β to differ for each of the J − 1 comparisons. That is,

    ln Ω_m(x) = τ_m − xβ_m   for m = 1 to J − 1

where predicted probabilities are computed as

    Pr(y = 1 | x) = exp(τ_1 − xβ_1) / {1 + exp(τ_1 − xβ_1)}

    Pr(y = j | x) = exp(τ_j − xβ_j) / {1 + exp(τ_j − xβ_j)} − exp(τ_{j−1} − xβ_{j−1}) / {1 + exp(τ_{j−1} − xβ_{j−1})}   for j = 2 to J − 1

    Pr(y = J | x) = 1 − exp(τ_{J−1} − xβ_{J−1}) / {1 + exp(τ_{J−1} − xβ_{J−1})}

To ensure that each Pr(y = j | x) is between 0 and 1, the condition (τ_j − xβ_j) ≥ (τ_{j−1} − xβ_{j−1}) must hold. Once predicted probabilities are computed, all the approaches used to interpret the orm results can be readily applied.
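To make these formulas concrete, here is a minimal sketch, with hypothetical cutpoints and per-equation slopes for J = 3 outcomes, that computes the predicted probabilities as differences of cumulative logits and checks that they are proper probabilities:

```python
import math

# Sketch of generalized ordered logit predicted probabilities for J = 3.
# Cutpoints and per-equation slopes are hypothetical, for illustration only.
def logistic(z):
    """Cumulative logistic distribution function."""
    return math.exp(z) / (1 + math.exp(z))

tau = [-1.0, 0.8]                    # tau_1, tau_2
betas = [[0.5, -0.2], [0.9, -0.1]]   # beta_1 and beta_2 differ across equations
x = [1.0, 2.0]

def xb(beta):
    return sum(b * v for b, v in zip(beta, x))

# F[j] plays the role of Pr(y <= j+1 | x) under equation j+1
F = [logistic(tau[j] - xb(betas[j])) for j in range(2)]
p = [F[0], F[1] - F[0], 1 - F[1]]    # Pr(y=1), Pr(y=2), Pr(y=3)

assert abs(sum(p) - 1.0) < 1e-12
# Probabilities stay in (0, 1) only when tau_j - x*beta_j is nondecreasing in j
assert tau[1] - xb(betas[1]) >= tau[0] - xb(betas[0])
assert all(0 < q < 1 for q in p)
```

When the monotonicity condition fails at some value of x, the middle probabilities can turn negative; this is one practical motivation for the constrained special cases fitted by gologit2.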
This model has been discussed by Clogg and Shihadeh (1994, 146-147), Fahrmeir and Tutz (1994, 91), and McCullagh and Nelder (1989, 155). It can be fitted in Stata with the add-on command gologit (Fu 1998). This command has not been recently updated and thus is no longer supported in SPost. More recently, Williams (2005) has written gologit2, which extends the original command to fit two special cases of the general model: the proportional odds model and the partial proportional odds model (see Lall et al. 2002; Peterson and Harrell 1990). These models are less restrictive than the ordinal logit model fitted by ologit but more parsimonious than the multinomial logit model fitted by mlogit. We plan to add support for gologit2 to SPost.

5.9.3 The continuation ratio model

The continuation ratio model was proposed by Fienberg (1980, 110) and was designed for ordinal outcomes in which the categories represent the progression of events or stages in some process through which an individual can advance. For example, the outcome could be faculty rank, where the stages are assistant professor, associate professor, and full professor. A key characteristic of the process is that an individual must pass through each stage. For example, to become an associate professor you must be an assistant professor; to be a full professor, an associate professor. Although there are versions of this model based on other binary models (e.g., probit), here we consider the logit version. If Pr(y = m | x) is the probability of being in stage m given x and Pr(y > m | x) is the probability of being in a stage later than m, the continuation ratio model for the log odds is

    ln { Pr(y = m | x) / Pr(y > m | x) } = τ_m − xβ   for m = 1 to J − 1

where the βs are constrained to be equal across outcome categories, whereas the constant term τ_m differs by stage.
As with other logit models, we can also express the model in terms of the odds:

    Pr(y = m | x) / Pr(y > m | x) = exp(τ_m − xβ)

Accordingly, exp(−β_k) can be interpreted as the effect of a unit increase in x_k on the odds of being in m compared with being in a higher category, given that an individual is in category m or higher, holding all other variables constant. From this equation, the predicted probabilities can be computed as

    Pr(y = m | x) = exp(τ_m − xβ) / Π_{j=1}^{m} {1 + exp(τ_j − xβ)}   for m = 1 to J − 1

    Pr(y = J | x) = 1 − Σ_{j=1}^{J−1} Pr(y = j | x)

These predicted probabilities can be used for interpreting the model. In Stata, this model can be fitted using ocratio by Wolfe (1998); type net search ocratio and follow the prompts to download.

6 Models for nominal outcomes with case-specific data

An outcome is nominal when the categories are assumed to be unordered. For example, marital status can be grouped nominally into the categories of divorced, never married, married, or widowed. Occupations might be organized as professional, white collar, blue collar, craft, and menial, which is the example we use in this chapter. Other examples include reasons for leaving the parents' home, the organizational context of scientific work (e.g., industry, government, and academia), and the choice of language in a multilingual society. Further, in some cases a researcher might prefer to treat an outcome as nominal, even though it is ordered or partially ordered. For example, if the response categories are strongly agree, agree, disagree, strongly disagree, and don't know, the category "don't know" invalidates models for ordinal outcomes. Or, you might decide to use a nominal regression model when the assumption of parallel regressions is rejected. In general, if you have concerns about the ordinality of the dependent variable, the potential loss of efficiency in using models for nominal outcomes is outweighed by avoiding potential bias.
This chapter focuses on three closely related models for nominal (and sometimes ordinal) outcomes with case-specific data. The multinomial logit model (MNLM) is the most frequently used nominal regression model. In this model, you are essentially estimating a separate binary logit for each pair of outcome categories. Next we consider the multinomial probit model with uncorrelated errors, which is the normal counterpart to the MNLM. We then discuss the stereotype logistic regression model (SLM). Although this model is often used for ordinal outcomes, it is closely related to the MNLM. All these models assume that the data are case specific, meaning that each independent variable has one value for each individual. Examples of such variables are an individual's race or education. In the next chapter, we consider models that include alternative-specific data. Models for nominal outcomes, both in this chapter and the next, require us to be more exacting about some basic terminology. Until now we have used "individual", "observation", and "case" interchangeably to refer to observational units, where each observational unit corresponds to a single row or record in the dataset. In the next two chapters, we will use only the term "case" for this purpose. Most of the time, we use the word "alternative" to refer to a possible outcome. Sometimes we refer to an alternative as an outcome category or a comparison group in order to be consistent with the usual terminology for a model or the output generated by Stata. The term "choice"