222 Chapter 5 Models for ordinal outcomes

ln{Pr(y = m | x) / Pr(y > m | x)} = τ_m − xβ    for m = 1 to J − 1

where the βs are constrained to be equal across outcome categories, whereas the constant term τ_m differs by stage. As with other logit models, we can also express the model in terms of the odds:

Pr(y = m | x) / Pr(y > m | x) = exp(τ_m − xβ)

Accordingly, exp(−β_k) can be interpreted as the effect of a unit increase in x_k on the odds of being in m compared with being in a higher category, given that an individual is in category m or higher, holding all other variables constant. From this equation, the predicted probabilities can be computed as

Pr(y = m | x) = exp(τ_m − xβ) / ∏_{j=1}^{m} {1 + exp(τ_j − xβ)}    for m = 1 to J − 1

Pr(y = J | x) = 1 − Σ_{j=1}^{J−1} Pr(y = j | x)

These predicted probabilities can be used for interpreting the model. In Stata, this model can be fitted using ocratio by Wolfe (1998); type net search ocratio and follow the prompts to download.

6 Models for nominal outcomes with case-specific data

An outcome is nominal when the categories are assumed to be unordered. For example, marital status can be grouped nominally into the categories of divorced, never married, married, or widowed. Occupations might be organized as professional, white collar, blue collar, craft, and menial, which is the example we use in this chapter. Other examples include reasons for leaving the parents' home, the organizational context of scientific work (e.g., industry, government, and academia), and the choice of language in a multilingual society. Further, in some cases a researcher might prefer to treat an outcome as nominal, even though it is ordered or partially ordered. For example, if the response categories are strongly agree, agree, disagree, strongly disagree, and don't know, the category "don't know" invalidates models for ordinal outcomes.
Or, you might decide to use a nominal regression model when the assumption of parallel regressions is rejected. In general, if you have concerns about the ordinality of the dependent variable, the potential loss of efficiency in using models for nominal outcomes is outweighed by avoiding potential bias. This chapter focuses on three closely related models for nominal (and sometimes ordinal) outcomes with case-specific data. The multinomial logit model (MNLM) is the most frequently used nominal regression model. In this model, you are essentially estimating a separate binary logit for each pair of outcome categories. Next we consider the multinomial probit model with uncorrelated errors, which is the normal counterpart to the MNLM. We then discuss the stereotype logistic regression model (SLM). Although this model is often used for ordinal outcomes, it is closely related to the MNLM. All these models assume that the data are case specific, meaning that each independent variable has one value for each individual. Examples of such variables are an individual's race or education. In the next chapter, we consider models that include alternative-specific data. Models for nominal outcomes, both in this chapter and the next, require us to be more exacting about some basic terminology. Until now we have used "individual", "observation", and "case" interchangeably to refer to observational units, where each observational unit corresponds to a single row or record in the dataset. In the next two chapters, we will use only the term "case" for this purpose. Most of the time, we use the word "alternative" to refer to a possible outcome. Sometimes we refer to an alternative as an outcome category or a comparison group in order to be consistent with the usual terminology for a model or the output generated by Stata.
The term "choice" refers to the alternative that is actually observed, which can be thought of as the "most preferred" alternative. For example, if the dependent variable is the party voted for in the last presidential election, the alternatives might be Republican, Democrat, and Independent. If the person corresponding to a given case voted for the alternative of Democrat, we would say that the choice for this case is Democrat. But you should not infer from the term "choice" that the models we describe can be used only for data where the outcome occurs through a process of choice. For example, if we were modeling the type of injuries that people (i.e., cases) entering the emergency room of a hospital have, we would use the term "choice" even though the injury sustained is unlikely to be a choice. We will continue with this terminology in chapter 7, but with one complication. Chapter 7 deals with alternative-specific variables that vary not only by case but also by the alternative. For example, if a commuter is selecting one of three modes of travel, an alternative-specific predictor might be her travel time using each alternative. Each case has three rows of data, one for each of the alternatives, since this is the easiest way to organize the data. We discuss this more fully in the next chapter. We begin by discussing the MNLM, where the biggest challenge is that the model includes many parameters and it is easy to be overwhelmed by the complexity of the results. This complexity is compounded by the nonlinearity of the model, which leads to the same difficulties of interpretation found for models in prior chapters. Although fitting the model is straightforward, interpretation involves many challenges that are the focus of this chapter. We begin by reviewing the statistical model, followed by a discussion of testing, fit, and finally methods of interpretation.
These discussions are intended as a review for those who are familiar with the models. For a complete discussion, see Long (1997). As always, you can obtain sample do-files and data files by downloading the spost9_do and spost9_ado packages (see chapter 1 for details).

6.1 The multinomial logit model

The MNLM can be thought of as simultaneously estimating binary logits for all comparisons among the alternatives. For example, let occ3 be a nominal outcome with the categories M for manual jobs, W for white-collar jobs, and P for professional jobs. Assume that there is one independent variable, ed, measuring years of education. We can examine the effect of ed on occ3 by estimating three binary logits:

ln{Pr(P | x) / Pr(M | x)} = β0,P|M + β1,P|M ed
ln{Pr(W | x) / Pr(M | x)} = β0,W|M + β1,W|M ed
ln{Pr(P | x) / Pr(W | x)} = β0,P|W + β1,P|W ed

where the subscripts to the βs indicate the comparison being made. Fitting the first logit with the binary variable prof_man (P versus M) yields

. logit prof_man ed, nolog

Logistic regression                     Number of obs =    296
                                        LR chi2(1)    = 139.78
                                        Prob > chi2   = 0.0000
                                        Pseudo R2     = 0.3560

    prof_man |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |  .7184599   .0858735     8.37   0.000      .550151    .8867688
       _cons | -10.19854   1.177467    -8.66   0.000    -12.50632    -7.89077

Forty-one cases are missing for prof_man and have been deleted. These correspond to respondents who have white-collar occupations. Likewise, the next two binary logits also exclude cases corresponding to the excluded category:

. tab wc_man, miss

      wc_man |      Freq.     Percent        Cum.
-------------+-----------------------------------
      Manual |        184       54.60       54.60
    WhiteCol |         41       12.17       66.77
           . |        112       33.23      100.00
-------------+-----------------------------------
       Total |        337      100.00

. logit wc_man ed, nolog

Logistic regression                     Number of obs =    225
                                        LR chi2(1)    =  15.00
                                        Prob > chi2   = 0.0001
                                        Pseudo R2     = 0.0749
Log likelihood = -98.818194

      wc_man |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |  .3418255   .0934517     3.66   0.000     .1586636    .5249875
       _cons | -5.758148   1.216291    -4.73   0.000    -8.142035   -3.374262

. tab prof_wc, miss

     prof_wc |      Freq.     Percent        Cum.
-------------+-----------------------------------
    WhiteCol |         41       12.17       12.17
        Prof |        112       33.23       45.40
           . |        184       54.60      100.00
-------------+-----------------------------------
       Total |        337      100.00
. logit prof_wc ed, nolog

Logistic regression                     Number of obs =    153
                                        LR chi2(1)    =  23.34
                                        Prob > chi2   = 0.0000
                                        Pseudo R2     = 0.1312
Log likelihood = -77.257045

     prof_wc |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |  .3735466   .0874469     4.27   0.000     .2021538    .5449395
       _cons | -4.332833   1.227293    -3.53   0.000    -6.738283   -1.927382

The results from the binary logits can be compared with the output from mlogit, the command that fits the MNLM:

. tab occ3, miss

        occ3 |      Freq.     Percent        Cum.
-------------+-----------------------------------
      Manual |        184       54.60       54.60
    WhiteCol |         41       12.17       66.77
        Prof |        112       33.23      100.00
-------------+-----------------------------------
       Total |        337      100.00

. mlogit occ3 ed, nolog

Multinomial logistic regression         Number of obs =    337
                                        LR chi2(2)    = 145.89
                                        Prob > chi2   = 0.0000
                                        Pseudo R2     = 0.2272
Log likelihood = -248.14786

        occ3 |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
WhiteCol     |
          ed |  .3000735   .0841358     3.57   0.000     .1351703    .4649767
       _cons | -5.232602   1.096086    -4.77   0.000    -7.380892   -3.084312
-------------+----------------------------------------------------------------
Prof         |
          ed |  .7195673   .0805117     8.94   0.000     .5617671    .8773674
       _cons | -10.21121   1.106913    -9.22   0.000    -12.38072   -8.041698

(occ3==Manual is the base outcome)

The output from mlogit is divided into two panels. The top panel is labeled WhiteCol, which is the value label for the second category of the dependent variable; the second panel is labeled Prof, which corresponds to the third outcome category. The key to understanding the two panels is the last line of output: occ3==Manual is the base outcome. This means that the panel WhiteCol presents coefficients from the comparison of W to M, and the panel Prof holds the comparison of P to M. Accordingly, the top panel should be compared with the coefficients from the binary logit for W and M (outcome variable wc_man) listed above. For example, the coefficient for the comparison of W to M from mlogit is β̂1,W|M = .3000735 with z = 3.57, whereas the binary logit estimate is β̂1,W|M = .3418255 with z = 3.66.
Overall, the estimates from the binary model are close to those from the MNLM but not exactly the same. Although theoretically β1,P|M − β1,W|M = β1,P|W, the estimates from the binary logits are β̂1,P|M − β̂1,W|M = .7184599 − .3418255 = .3766344, which does not equal the binary logit estimate β̂1,P|W = .3735466. A series of binary logits fitted with logit does not impose the constraints among coefficients that are implicit in the definition of the model. When fitting the model with mlogit, the constraints are imposed. Indeed, the output from mlogit presents only two of the three comparisons from our example, namely, W versus M and P versus M. The remaining comparison, W versus P, is the difference between the two sets of estimated coefficients. Details on using listcoef to automatically compute the remaining comparisons are given below.

6.1.1 Formal statement of the model

Formally, the MNLM can be written as

ln Ω_m|b(x) = ln{Pr(y = m | x) / Pr(y = b | x)} = xβ_m|b    for m = 1 to J

where b is the base category, which is also referred to as the comparison group. As ln Ω_b|b(x) = ln 1 = 0, it must hold that β_b|b = 0. That is, the log odds of an outcome compared with itself are always 0, and thus the effects of any independent variables must also be 0. These J equations can be solved to compute the predicted probabilities:

Pr(y = m | x) = exp(xβ_m|b) / Σ_{j=1}^{J} exp(xβ_j|b)

Although the predicted probability will be the same regardless of the base outcome b, changing the base outcome can be confusing since the resulting output from mlogit appears to be quite different. Suppose that you have three outcomes and fit the model with alternative 1 as the base category. Your probability equations would be

Pr(y = m | x) = exp(xβ_m|1) / Σ_{j=1}^{3} exp(xβ_j|1)

and you would obtain estimates β̂_2|1 and β̂_3|1, where β̂_1|1 = 0. If someone else set up
the model with base category 2, their equations would be

Pr(y = m | x) = exp(xβ_m|2) / Σ_{j=1}^{3} exp(xβ_j|2)

and they would obtain β̂_1|2 and β̂_3|2, where β̂_2|2 = 0. Although the estimated parameters are different, they are simply different parameterizations that provide the same predicted probabilities. The confusion arises only if you are not clear about which parameterization you are using. Unfortunately, some software packages, but not Stata, make it hard to tell which set of parameters is being estimated. We return to this issue when we discuss how Stata's mlogit parameterizes the model in the next section.

6.2 Estimation using mlogit

The multinomial logit model is fitted with the following command and its basic options:

mlogit depvar [indepvars] [if] [in] [weight] [, noconstant baseoutcome(#) constraints(clist) robust cluster(varname) level(#) rrr nolog]

In our experience, the model converges quickly, even when there are many outcome categories and independent variables.

Variable lists

depvar is the dependent variable. The actual values taken on by the dependent variable are irrelevant. For example, if you had three outcomes, you could use the values 1, 2, and 3 or −1, 6, and 999. Up to 50 outcomes are allowed in Stata/SE and Intercooled Stata, and 20 outcomes are allowed in Small Stata.

indepvars is a list of independent variables. If indepvars is not included, Stata fits a model with only constants.

Specifying the estimation sample

if and in qualifiers can be used to restrict the estimation sample. For example, if you want to fit the model with only white respondents, use the command mlogit occ ed exper if white==1.

Listwise deletion

Stata excludes cases in which there are missing values for any of the variables. Accordingly, if two models are fitted using the same dataset but have different sets of independent variables, it is possible to have different samples. We recommend that you use mark and markout (discussed in chapter 3) to explicitly remove cases with missing data.
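The logic of listwise deletion is language independent, so the point about shifting estimation samples can be sketched outside Stata. The rows and variable names below are hypothetical, chosen only to show how adding a variable with missing values silently shrinks the sample:

```python
# Sketch of listwise deletion: rows with a missing value on ANY model
# variable are dropped, so different variable lists imply different samples.
rows = [
    {"occ": 1, "ed": 12,   "exper": 10},
    {"occ": 2, "ed": None, "exper": 4},    # missing ed
    {"occ": 3, "ed": 16,   "exper": None}, # missing exper
]

def estimation_sample(rows, variables):
    """Keep only rows with no missing values on the listed variables."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

# A model using only ed keeps 2 cases; adding exper drops one more,
# so the two models would be fitted on different samples.
print(len(estimation_sample(rows, ["occ", "ed"])))          # 2
print(len(estimation_sample(rows, ["occ", "ed", "exper"]))) # 1
```

This is why marking the common sample once (as mark and markout do in Stata) keeps nested models comparable.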
Weights

mlogit can be used with fweights, pweights, and iweights. In chapter 3, we provide a brief discussion of the different types of weights and how weights are specified in Stata's syntax.

Options

noconstant excludes the constant terms from the model.

baseoutcome(#) specifies the value of depvar that is the base category (i.e., reference group) for the coefficients that are listed. This determines how the model is parameterized. If the baseoutcome() option is not specified, the most frequent outcome in the estimation sample is chosen as the base. The base category is always reported immediately below the estimates; for example, occ3==Manual is the base outcome.

constraints(clist) specifies the linear constraints to be applied during estimation. The default is to perform unconstrained estimation. Constraints are defined with the constraint command. This option is illustrated in section 6.3.3 when we discuss an LR test for combining outcome categories.

robust indicates that robust variance estimates are to be used. When cluster() is specified, robust standard errors are automatically used. See chapter 3 for more details.

cluster(varname) specifies that the observations be independent across the groups specified by unique values of varname but not necessarily independent within the groups. See chapter 3 for more details.

level(#) specifies the level of the confidence interval for estimated parameters. By default, Stata uses 95% intervals. You can also change the default level to, say, a 90% interval, with the command set level 90.

rrr reports the estimated coefficients transformed to relative risk ratios, defined as exp(b) rather than b, along with standard errors and confidence intervals for these ratios.

nolog suppresses the iteration history.
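The claim in section 6.1.1 that different base categories are just different parameterizations of the same probabilities can be checked numerically. The sketch below, with hypothetical coefficient values, computes Pr(y = m | x) = exp(xβ_m|b) / Σ_j exp(xβ_j|b) under base category 1 and again after reparameterizing to base category 2 via β_m|2 = β_m|1 − β_2|1:

```python
import math

def mnlm_probs(x, betas):
    """Pr(y=m|x) = exp(x.b_m) / sum_j exp(x.b_j) for each coefficient vector."""
    scores = [math.exp(sum(xi * bi for xi, bi in zip(x, b))) for b in betas]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical coefficients (constant, slope) with alternative 1 as base,
# so b_1|1 = (0, 0).
b1 = [(0.0, 0.0), (1.0, -0.5), (-2.0, 0.3)]
# Reparameterize with alternative 2 as base: b_m|2 = b_m|1 - b_2|1.
b2 = [tuple(bm - bk for bm, bk in zip(b, b1[1])) for b in b1]

x = (1.0, 10.0)  # constant plus one predictor value
p1 = mnlm_probs(x, b1)
p2 = mnlm_probs(x, b2)
# Different parameterizations, identical predicted probabilities.
print(all(abs(a - b) < 1e-12 for a, b in zip(p1, p2)))  # True
```

The coefficients reported by mlogit under different baseoutcome() choices differ in exactly this way while implying the same probabilities.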
6.2.1 Example of occupational attainment

The 1982 General Social Survey asked respondents their occupation, which we recoded into five broad categories: menial jobs (M), blue-collar jobs (B), craft jobs (C), white-collar jobs (W), and professional jobs (P). Three independent variables are considered: white, indicating the race of the respondent; ed, measuring years of education; and exper, measuring years of work experience.

. summarize white ed exper

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-------------------------------------------------------
       white |       337    .9169139    .2764227          0          1
          ed |       337    13.09496    2.946427          3         20
       exper |       337    20.50148    13.95936          2         66

The distribution among outcome categories is

. tab occ

  Occupation |      Freq.     Percent        Cum.
-------------+-----------------------------------
      Menial |         31        9.20        9.20
     BlueCol |         69       20.47       29.67
       Craft |         84       24.93       54.60
    WhiteCol |         41       12.17       66.77
        Prof |        112       33.23      100.00
-------------+-----------------------------------
       Total |        337      100.00

Using these variables, the following MNLM was fitted:

ln Ω_M|P(x_i) = β0,M|P + β1,M|P white + β2,M|P ed + β3,M|P exper
ln Ω_B|P(x_i) = β0,B|P + β1,B|P white + β2,B|P ed + β3,B|P exper
ln Ω_C|P(x_i) = β0,C|P + β1,C|P white + β2,C|P ed + β3,C|P exper
ln Ω_W|P(x_i) = β0,W|P + β1,W|P white + β2,W|P ed + β3,W|P exper

where we specify the fifth outcome P as the base category:

. mlogit occ white ed exper, baseoutcome(5) nolog

Multinomial logistic regression         Number of obs =    337
                                        LR chi2(12)   = 166.09
                                        Prob > chi2   = 0.0000
                                        Pseudo R2     = 0.1629
Log likelihood = -426.80048
         occ |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Menial       |
       white | -1.774306   .7550543    -2.35   0.019    -3.254186   -.2944273
          ed | -.7788519   .1146293    -6.79   0.000    -1.003521   -.5541826
       exper | -.0356509    .018037    -1.98   0.048    -.0710028    -.000299
       _cons |  11.51833   1.849356     6.23   0.000     7.893659      15.143
-------------+----------------------------------------------------------------
BlueCol      |
       white | -.5378027   .7996033    -0.67   0.501    -2.104996    1.029391
          ed | -.8782767   .1005446    -8.74   0.000     -1.07534   -.6812128
       exper | -.0309296   .0144086    -2.15   0.032      -.05917   -.0026893
       _cons |  12.25956   1.668144     7.35   0.000     8.990061    15.52907
-------------+----------------------------------------------------------------
Craft        |
       white | -1.301963    .647416    -2.01   0.044    -2.570875   -.0330509
          ed | -.6850365   .0892996    -7.67   0.000    -.8600605   -.5100126
       exper | -.0079671   .0127055    -0.63   0.531    -.0328693    .0169351
       _cons |  10.42698   1.517943     6.87   0.000     7.451864    13.40209
-------------+----------------------------------------------------------------
WhiteCol     |
       white | -.2029212   .8693072    -0.23   0.815    -1.906732     1.50089
          ed | -.4256943   .0922192    -4.62   0.000    -.6064407   -.2449479
       exper |  -.001055   .0143582    -0.07   0.941    -.0291967    .0270866
       _cons |  5.279722   1.684006     3.14   0.002     1.979132    8.580313

(occ==Prof is the base outcome)

Methods of testing coefficients and interpretation of the estimates will be considered after we discuss the effects of using different base categories.

6.2.2 Using different base categories

By default, mlogit sets the base category to the alternative with the most observations. Or, as illustrated in the last example, you can select the base category with baseoutcome(). mlogit then reports coefficients for the effect of each independent variable on each category relative to the base category. However, you should also examine the effects on other pairs of outcome categories. For example, you might be interested in how race affects the allocation of workers between Craft and BlueCol (e.g., β1,B|C), which was not estimated in the output listed above. Although this coefficient can be estimated by rerunning mlogit with a different base category (e.g., mlogit occ white ed exper, baseoutcome(3)), it is easier to use listcoef, which presents estimates for all combinations of outcome categories.
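The unreported comparisons are simple differences of the reported ones: β_m|n = β_m|b − β_n|b. A quick sketch in Python, using the white coefficients from the mlogit output above (with Prof as base, so its coefficient is 0), reproduces two of the contrasts that listcoef reports:

```python
# Coefficients for white from the mlogit output above (base category Prof).
b_white = {"Menial": -1.774306, "BlueCol": -0.5378027,
           "Craft": -1.301963, "WhiteCol": -0.2029212, "Prof": 0.0}

def contrast(m, n):
    """Coefficient for alternative m versus alternative n: b_m|P - b_n|P."""
    return b_white[m] - b_white[n]

# Effect of white on BlueCol versus Craft (the beta_{1,B|C} example in the
# text is this comparison) and on Menial versus Craft:
print(round(contrast("BlueCol", "Craft"), 5))  # 0.76416
print(round(contrast("Menial", "Craft"), 5))   # -0.47234
```

These values match the corresponding rows of the listcoef output shown next.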
Because listcoef can generate much output, we show two options that limit which coefficients are listed. First, you can include a list of variables, and only coefficients for those variables will be listed. For example,

. listcoef white, help

mlogit (N=337): Factor Change in the Odds of occ

Variable: white (sd=.27642268)

Odds comparing Alternative 1
to Alternative 2             b         z     P>|z|      e^b   e^bStdX
----------------------------------------------------------------------
Menial  -BlueCol        -1.23650    -1.707    0.088   0.2904    0.7105
Menial  -Craft          -0.47234    -0.782    0.434   0.6235    0.8776
Menial  -WhiteCol       -1.57139    -1.741    0.082   0.2078    0.6477
Menial  -Prof           -1.77431    -2.350    0.019   0.1696    0.6123
BlueCol -Menial          1.23650     1.707    0.088   3.4436    1.4075
BlueCol -Craft           0.76416     1.208    0.227   2.1472    1.2352
BlueCol -WhiteCol       -0.33488    -0.359    0.720   0.7154    0.9116
BlueCol -Prof           -0.53780    -0.673    0.501   0.5840    0.8619
Craft   -Menial          0.47234     0.782    0.434   1.6037    1.1395
Craft   -BlueCol        -0.76416    -1.208    0.227   0.4657    0.8096
Craft   -WhiteCol       -1.09904    -1.343    0.179   0.3332    0.7380
Craft   -Prof           -1.30196    -2.011    0.044   0.2720    0.6978
WhiteCol-Menial          1.57139     1.741    0.082   4.8133    1.5440
WhiteCol-BlueCol         0.33488     0.359    0.720   1.3978    1.0970
WhiteCol-Craft           1.09904     1.343    0.179   3.0013    1.3550
WhiteCol-Prof           -0.20292    -0.233    0.815   0.8163    0.9455
Prof    -Menial          1.77431     2.350    0.019   5.8962    1.6331
Prof    -BlueCol         0.53780     0.673    0.501   1.7122    1.1603
Prof    -Craft           1.30196     2.011    0.044   3.6765    1.4332
Prof    -WhiteCol        0.20292     0.233    0.815   1.2250    1.0577

       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X

Or you can limit the output to coefficients that are significant at a given level using the pvalue(#) option, which specifies that only coefficients significant at the # significance level or smaller will be printed. For example,
. listcoef, pvalue(.05)

mlogit (N=337): Factor Change in the Odds of occ when P>|z| < 0.05

Variable: white (sd=.27642268)

Odds comparing Alternative 1
to Alternative 2             b         z     P>|z|      e^b   e^bStdX
----------------------------------------------------------------------
Menial  -Prof           -1.77431    -2.350    0.019   0.1696    0.6123
Craft   -Prof           -1.30196    -2.011    0.044   0.2720    0.6978
Prof    -Menial          1.77431     2.350    0.019   5.8962    1.6331
Prof    -Craft           1.30196     2.011    0.044   3.6765    1.4332

Variable: ed (sd=2.9464271)

Odds comparing Alternative 1
to Alternative 2             b         z     P>|z|      e^b   e^bStdX
----------------------------------------------------------------------
Menial  -WhiteCol       -0.35316    -3.011    0.003   0.7025    0.3533
Menial  -Prof           -0.77885    -6.795    0.000   0.4589    0.1008
BlueCol -Craft          -0.19324    -2.494    0.013   0.8243    0.5659
BlueCol -WhiteCol       -0.45258    -4.425    0.000   0.6360    0.2636
BlueCol -Prof           -0.87828    -8.735    0.000   0.4155    0.0752
Craft   -BlueCol         0.19324     2.494    0.013   1.2132    1.7671
Craft   -WhiteCol       -0.25934    -2.773    0.006   0.7716    0.4657
Craft   -Prof           -0.68504    -7.671    0.000   0.5041    0.1329
WhiteCol-Menial          0.35316     3.011    0.003   1.4236    2.8308
WhiteCol-BlueCol         0.45258     4.425    0.000   1.5724    3.7943
WhiteCol-Craft           0.25934     2.773    0.006   1.2961    2.1471
WhiteCol-Prof           -0.42569    -4.616    0.000   0.6533    0.2853
Prof    -Menial          0.77885     6.795    0.000   2.1790    9.9228
Prof    -BlueCol         0.87828     8.735    0.000   2.4067   13.3002
Prof    -Craft           0.68504     7.671    0.000   1.9838    7.5264
Prof    -WhiteCol        0.42569     4.616    0.000   1.5307    3.5053

Variable: exper (sd=13.959364)

Odds comparing Alternative 1
to Alternative 2             b         z     P>|z|      e^b   e^bStdX
----------------------------------------------------------------------
Menial  -Prof           -0.03565    -1.977    0.048   0.9650    0.6079
BlueCol -Prof           -0.03093    -2.147    0.032   0.9695    0.6494
Prof    -Menial          0.03565     1.977    0.048   1.0363    1.6449
Prof    -BlueCol         0.03093     2.147    0.032   1.0314    1.5400

If you do not need to see the comparisons between all pairs of alternatives, you can limit the output with the gt or lt options of listcoef. By default, listcoef lists comparisons in both directions. For example, it will show you the effect on the odds of alternative 1 versus alternative 2 and the effect on the odds of alternative 2 versus alternative 1.
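The e^b and e^bStdX columns, and the symmetry between the two directions of a comparison, are direct transformations of b. A sketch using the ed coefficient for Prof versus Menial and the standard deviation of ed reported in the output above:

```python
import math

# b for ed comparing Prof with Menial, and sd of ed, from the output above.
b, sd_ed = 0.77885, 2.9464271

factor_unit = math.exp(b)        # e^b: factor change for a unit increase in ed
factor_sd = math.exp(b * sd_ed)  # e^(b*SD): factor change for an SD increase

print(round(factor_unit, 4))   # 2.179
print(round(factor_sd, 3))     # close to the 9.9228 listed (b is rounded)
# The reverse comparison (Menial versus Prof) is simply the reciprocal:
print(round(math.exp(-b), 4))  # 0.4589
```

This reciprocal relationship is why listing both directions, as listcoef does by default, carries no extra information.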
The gt option limits comparisons to those in which the first alternative is greater than the second; lt shows comparisons in which the first alternative is less than the second. For example,

. listcoef ed, pvalue(.05) gt nolabel

mlogit (N=337): Factor Change in the Odds of occ when P>|z| < 0.05

Variable: ed (sd=2.9464271)

Odds comparing Alternative 1
to Alternative 2             b         z     P>|z|      e^b   e^bStdX
----------------------------------------------------------------------
3       -2               0.19324     2.494    0.013   1.2132    1.7671
4       -1               0.35316     3.011    0.003   1.4236    2.8308
4       -2               0.45258     4.425    0.000   1.5724    3.7943
4       -3               0.25934     2.773    0.006   1.2961    2.1471
5       -1               0.77885     6.795    0.000   2.1790    9.9228
5       -2               0.87828     8.735    0.000   2.4067   13.3002
5       -3               0.68504     7.671    0.000   1.9838    7.5264
5       -4               0.42569     4.616    0.000   1.5307    3.5053

We used the nolabel option to show the category values of the two alternatives rather than their value labels, and the pvalue(.05) option limits the coefficients that are printed to those that are significant at the .05 level.

6.2.3 Predicting perfectly

mlogit handles perfect prediction somewhat differently than the estimation commands for binary and ordinal models that we have discussed. logit and probit automatically remove the observations that imply perfect prediction and compute estimates accordingly. ologit and oprobit keep these observations in the model, fit the z for the problem variable as 0, and provide an incorrect LR chi-squared, but also warn that a given number of observations are completely determined. You should delete these observations and refit the model. mlogit is just like ologit and oprobit, except that you do not receive a warning message. Instead, the coefficients associated with the variable causing the problem have z = 0 (and P>|z| = 1). You should refit the model, excluding the problem variable and deleting the observations that imply the perfect predictions. Using the tabulate command to generate a cross-tabulation of the problem variable and the dependent variable should reveal the combination that results in perfect prediction.

6.3 Hypothesis testing of coefficients

In the MNLM, you can test individual coefficients with the reported z-statistics, with a Wald test using test, or with an LR test using lrtest. As the methods of testing one coefficient that were discussed in chapters 4 and 5 still apply fully, they are not considered further here.
However, in the MNLM there are new reasons for testing groups of coefficients. First, testing that a variable has no effect requires a test that J − 1 coefficients are simultaneously equal to zero. Second, testing whether the independent variables as a group differentiate between two alternatives requires a test of K coefficients. This section focuses on these two kinds of tests.

Caution regarding specification searches

Given the difficulties of interpretation that are associated with the MNLM, it is tempting to search for a more parsimonious model by excluding variables or combining outcome categories based on a sequence of tests. Such a search requires great care. First, these tests involve multiple coefficients. Although the overall test might indicate that as a group the coefficients are not significantly different from zero, an individual coefficient can still be substantively and statistically significant. Accordingly, you should examine the individual coefficients involved in each test before deciding to revise your model. Second, as with all searches that use repeated, sequential tests, there is a danger of overfitting the data. When models are constructed based on prior testing using the same data, significance levels should be used only as rough guidelines.

6.3.1 mlogtest for tests of the MNLM

Although the tests in this section can be computed using test or lrtest, in practice this is tedious. The mlogtest command (Freese and Long 2000) makes the computation of these tests easy.
The syntax is mlogtest) [variist] [, all lr wald combine lrcomb ~het(varlist[\ variist[\...]]) iia hausman smhsiao detail base] variist indicates that the variables for which tests of significance should be computed. If no variist is given, tests are run for all independent variables. Options lr requests a likelihood-ratio (lr) test for each variable in variist. If variist is not specified, tests for all variables are computed. wald requests a Wald test for each variable in variist. If variist is not specified, tests for all variables are computed. combine requests Wald tests of whether dependent categories can be combined. lrcomb requests lr tests of whether dependent categories can be combined. These tests use constrained estimation and overwrite constraint #999 if it is already defined. set (.varlist[\ varlist[\...]]) specifies that a set of variables is to be considered together for the lr test or Wald test. \ is used to specify multiple sets of variables. For example, mlogtest, lr set (age age2 \ iscatl iscat2) computes one lr test for the hypothesis that the effects of age and age2 are jointly 0 and a second lr test that the effects of iscatl and iscat2 are jointly 0. Other options for mlogtest are discussed later in the chapter. 236 Chapter 6 Models for nominal outcomes -with case-specific data 6.3.2 Testing the effects of the independent variables With J dependent categories, there are J—I nonredundant coefficients associated with each independent variable xk. For example, in our logit on occupation, there are four coefficients associated with ed: /32]M|P, #2,bjp, fh,c\p> and @2,w\p- The hypothesis that Xk does not affect the dependent variable can be written as Hq'- ßk,l\b — "' ■ = ßk, J\b 0 where b is the base category. Because Pk,b\b is necessarily 0, the hypothesis imposes constraints on J - 1 parameters. This hypothesis can be tested with either a Wald or an lr test. 
A likelihood-ratio test

The LR test involves (1) fitting the full model, including all the variables, resulting in the likelihood-ratio statistic LR²_F; (2) fitting the restricted model that excludes variable x_k, resulting in LR²_R; and (3) computing the difference LR²_RvsF = LR²_F − LR²_R, which is distributed as chi-squared with J − 1 degrees of freedom if the null hypothesis is true. This can be done using lrtest:

. use http://www.stata-press.com/data/lf2/nomocc2, clear
(1982 General Social Survey)

. mlogit occ white ed exper, baseoutcome(5) nolog
 (output omitted)

. estimates store fmodel

. mlogit occ ed exper, baseoutcome(5) nolog
 (output omitted)

. estimates store nmodel_white

. lrtest fmodel nmodel_white

Likelihood-ratio test                         LR chi2(4)  =     8.10
(Assumption: nmodel_white nested in fmodel)   Prob > chi2 =   0.0881

Although using lrtest is straightforward, the command mlogtest, lr is even simpler because it automatically computes the tests for all variables by making repeated calls to lrtest:

. mlogit occ white ed exper, baseoutcome(5) nolog
 (output omitted)

. mlogtest, lr

**** Likelihood-ratio tests for independent variables

Ho: All coefficients associated with given variable(s) are 0.

         occ |      chi2   df   P>chi2
-------------+-------------------------
       white |     8.095    4    0.088
          ed |   156.937    4    0.000
       exper |     8.561    4    0.073

The results of the LR test, regardless of how they are computed, can be interpreted as follows: The effect of race on occupation is significant at the .10 level but not at the .05 level (X² = 8.10, df = 4, p = .09). The effect of education is significant at the .01 level (X² = 156.94, df = 4, p < .01). Or, it can be stated more formally: The hypothesis that all the coefficients associated with education are simultaneously equal to 0 can be rejected at the .01 level (X² = 156.94, df = 4, p < .01).

A Wald test

Although the LR test is generally considered superior, its computational costs can be prohibitive if the model is complex or the sample is very large. Wald tests can also be computed using test without fitting additional models. For example,
. test white

 ( 1)  [Menial]white = 0
 ( 2)  [BlueCol]white = 0
 ( 3)  [Craft]white = 0
 ( 4)  [WhiteCol]white = 0

           chi2(  4) =    8.15
         Prob > chi2 =  0.0863

. test ed

 ( 1)  [Menial]ed = 0
 ( 2)  [BlueCol]ed = 0
 ( 3)  [Craft]ed = 0
 ( 4)  [WhiteCol]ed = 0

           chi2(  4) =   84.97
         Prob > chi2 =  0.0000

. test exper

 ( 1)  [Menial]exper = 0
 ( 2)  [BlueCol]exper = 0
 ( 3)  [Craft]exper = 0
 ( 4)  [WhiteCol]exper = 0

           chi2(  4) =    7.99
         Prob > chi2 =  0.0918

The output from test makes explicit which coefficients are being tested. Here we see the way in which Stata labels parameters in models with multiple equations. For example, [Menial]white is the coefficient for the effect of white in the equation comparing the outcome Menial with the base category Prof; [BlueCol]white is the coefficient for the effect of white in the equation comparing the outcome BlueCol with the base category Prof.

As with the LR test, mlogtest, wald automates this process:

. mlogtest, wald

**** Wald tests for independent variables (N=337)

Ho: All coefficients associated with given variable(s) are 0.

         occ |      chi2   df   P>chi2
-------------+-------------------------
       white |     8.149    4    0.086
          ed |    84.968    4    0.000
       exper |     7.995    4    0.092

These tests can be interpreted in the same way as shown for the LR test above.

Testing multiple independent variables

The logic of the Wald or LR tests can be extended to test that the effects of two or more independent variables are simultaneously zero. For example, the hypothesis that x_k and x_l have no effect is

H0: β_k,1|b = ··· = β_k,J|b = β_l,1|b = ··· = β_l,J|b = 0

The set(varlist[\ varlist[\...]]) option in mlogtest specifies which variables are to be simultaneously tested. For example, to test the hypothesis that the effects of ed and exper are simultaneously equal to 0, we could use lrtest as follows:

. mlogit occ white ed exper, baseoutcome(5) nolog
 (output omitted)

. estimates store fmodel

. mlogit occ white, baseoutcome(5) nolog
 (output omitted)
. estimates store nmodel
. lrtest fmodel nmodel

Likelihood-ratio test                         LR chi2(8) =    160.77
(Assumption: nmodel nested in fmodel)         Prob > chi2 =   0.0000

or, using mlogtest,

. mlogit occ white ed exper, baseoutcome(5) nolog
(output omitted)
. mlogtest, lr set(ed exper)

**** Likelihood-ratio tests for independent variables

 occ          chi2    df   P>chi2
 white       8.095     4    0.088
 ed        156.937     4    0.000
 exper       8.561     4    0.073
 set_1:    160.773     8    0.000
   ed
   exper

6.3.3 Tests for combining alternatives

If none of the independent variables significantly affect the odds of alternative m versus alternative n, we say that m and n are indistinguishable with respect to the variables in the model (Anderson 1984). Alternatives m and n being indistinguishable corresponds to the hypothesis that

H0: β_{1,m|n} = ··· = β_{K,m|n} = 0

which can be tested with either a Wald or an LR test. In our experience, the two tests provide similar results. If alternatives are indistinguishable with respect to the variables in the model, then you can obtain more efficient estimates by combining them. To test whether alternatives are indistinguishable, you can use mlogtest.

A Wald test for combining alternatives

The command mlogtest, combine computes Wald tests of the null hypothesis that two alternatives can be combined for all pairs of alternatives. For example,

. mlogit occ white ed exper, baseoutcome(5) nolog
(output omitted)
. mlogtest, combine

**** Wald tests for combining alternatives (N=337)

Ho: All coefficients except intercepts associated with a given pair
    of alternatives are 0 (i.e., alternatives can be combined).

For example, we can reject the hypothesis that categories Menial and Prof are indistinguishable, whereas we cannot reject that Menial and BlueCol are indistinguishable.

Using test [category]*

The mlogtest command computes the tests for combining categories with the test command. For example, to test that Menial is indistinguishable from the base category Prof, type
. test [Menial]

 ( 1)  [Menial]white = 0
 ( 2)  [Menial]ed = 0
 ( 3)  [Menial]exper = 0

           chi2(  3) =   48.19
         Prob > chi2 =    0.0000

which matches the results from mlogtest in row Menial-Prof:

 Alternatives tested      chi2    df   P>chi2
  Menial- BlueCol        3.994     3    0.262
  Menial-   Craft        3.203     3    0.361
  Menial-WhiteCol       11.951     3    0.008
  Menial-    Prof       48.190     3    0.000
 BlueCol-   Craft        8.441     3    0.038
 BlueCol-WhiteCol       20.055     3    0.000
 BlueCol-    Prof       76.393     3    0.000
   Craft-WhiteCol        8.892     3    0.031
   Craft-    Prof       60.583     3    0.000
WhiteCol-    Prof       22.203     3    0.000

[outcome] in test is used to indicate which equation is being referenced in multiple-equation commands. mlogit is a multiple-equation command because it is in effect estimating J − 1 binary logit equations.

The test is more complicated when neither outcome is the base category. For example, to test that m and n are indistinguishable when the base category b is neither m nor n, the hypothesis you want to test is

H0: (β_{1,m|b} − β_{1,n|b}) = ··· = (β_{K,m|b} − β_{K,n|b}) = 0

That is, you want to test the difference between two sets of coefficients. This can be done with test [outcome1=outcome2]. For example, to test if Menial and Craft can be combined, type

. test [Menial=Craft]

 ( 1)  [Menial]white - [Craft]white = 0
 ( 2)  [Menial]ed - [Craft]ed = 0
 ( 3)  [Menial]exper - [Craft]exper = 0

           chi2(  3) =    3.20
         Prob > chi2 =    0.3614

Again the results are identical to those from mlogtest.

An LR test for combining alternatives

An LR test of combining m and n can be computed by first fitting the full model with no constraints, with the resulting LR statistic LR²_F. Then we fit a restricted model M_R in which outcome m is used as the base category and all the coefficients except the constant in the equation for outcome n are constrained to 0, with the resulting test statistic LR²_R. The test statistic is the difference LR²_FvsR = LR²_F − LR²_R, which is distributed as chi-squared with K degrees of freedom.
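The p-values printed by these commands come from comparing the test statistic with the chi-squared distribution with K degrees of freedom (here K = 3). As a minimal sketch of that last step only, in Python rather than Stata, here is a survival function built from the standard series expansion of the regularized incomplete gamma function (the function name is ours, not Stata's):

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with df degrees of freedom,
    via the series for the regularized lower incomplete gamma function
    P(s, t) with s = df/2, t = x/2."""
    s, t = df / 2.0, x / 2.0
    if t == 0:
        return 1.0
    # P(s, t) = t^s e^{-t} / Gamma(s) * sum_{n>=0} t^n / (s (s+1) ... (s+n))
    term = 1.0 / s
    total = term
    n = 0
    while True:
        n += 1
        term *= t / (s + n)
        total += term
        if term < total * 1e-15:
            break
    p_lower = total * math.exp(s * math.log(t) - t - math.lgamma(s))
    return 1.0 - p_lower

# Wald test for combining Menial and Craft: chi2 = 3.20 with 3 df
print(round(chi2_sf(3.20, 3), 3))   # ~0.362; Stata's 0.3614 uses the unrounded statistic
# LR test for combining Menial and BlueCol: chi2 = 4.095 with 3 df
print(round(chi2_sf(4.095, 3), 3))  # close to the 0.251 reported by mlogtest
```

Because the reported statistics are themselves rounded, the recomputed p-values agree with the Stata output only to three decimal places or so.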
The command mlogtest, lrcomb computes J(J − 1)/2 tests for all pairs of outcome categories. For example,

. mlogit occ white ed exper, baseoutcome(5) nolog
(output omitted)
. mlogtest, lrcomb

**** LR tests for combining alternatives (N=337)

Ho: All coefficients except intercepts associated with a given pair
    of alternatives are 0 (i.e., alternatives can be collapsed).

 Alternatives tested      chi2    df   P>chi2
  Menial- BlueCol        4.095     3    0.251
  Menial-   Craft        3.376     3    0.337
  Menial-WhiteCol       13.223     3    0.004
  Menial-    Prof       64.607     3    0.000
 BlueCol-   Craft        9.176     3    0.027
 BlueCol-WhiteCol       22.803     3    0.000
 BlueCol-    Prof      125.699     3    0.000
   Craft-WhiteCol        9.992     3    0.019
   Craft-    Prof       95.889     3    0.000
WhiteCol-    Prof       26.736     3    0.000

Using constraint with lrtest*

The command mlogtest, lrcomb computes the test by using the powerful constraint command. To show this, we use the test comparing Menial and BlueCol reported by mlogtest, lrcomb above. First, we fit the full model and save the results for use by lrtest:

. mlogit occ white ed exper, nolog
(output omitted)
. estimates store fmodel

Second, we define a constraint using the command

. constraint define 999 [Menial]

This defines constraint 999, where the number is arbitrary. The expression [Menial] indicates that all the coefficients except the constant from the Menial equation should be constrained to 0. Third, we refit the model with this constraint. The base category must be BlueCol, so that the coefficients indicated by [Menial] are comparisons of BlueCol and Menial:

. mlogit occ exper ed white, base(2) constraint(999) nolog

Multinomial logistic regression                 Number of obs   =        337
                                                LR chi2(9)      =     161.99
                                                Prob > chi2     =     0.0000
Log likelihood = -428.84791                     Pseudo R2       =     0.1589

 ( 1)  [Menial]exper = 0
 ( 2)  [Menial]ed = 0
 ( 3)  [Menial]white = 0

         occ       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

Menial
       exper   (dropped)
          ed   (dropped)
       white   (dropped)
       _cons   -.8001193   .2162194    -3.70   0.000    -1.223901   -.3763371

Craft
       exper    .0242824   .0113959     2.13   0.033     .0019469    .0466179
          ed    .1599345   .0693853     2.31   0.021     .0239418    .2959273
       white   -.2381783   .4978563    -0.48   0.632    -1.213959    .7376021
       _cons   -1.969087   1.054935    -1.87   0.062    -4.036721     .098547

WhiteCol
       exper    .0281447    .015919     1.77   0.077    -.0030561    .0593454
          ed    .4195709   .0958978     4.38   0.000     .2316147     .607527
       white    .8829927    .843371     1.05   0.295    -.7699841    2.535969
       _cons   -7.140306   1.623401    -4.40   0.000    -10.32211   -3.958498

Prof
       exper     .032303   .0133779     2.41   0.016     .0060827    .0585233
          ed    .8445092    .093709     9.01   0.000     .6608429    1.028176
       white    1.097459   .6877939     1.60   0.111    -.2505923     2.44551
       _cons   -12.42143   1.569897    -7.91   0.000    -15.49837   -9.344489

(occ==BlueCol is the base outcome)

mlogit requires the option constraint(999) to indicate that estimation should impose this constraint. The output clearly indicates which constraints have been imposed. Finally, we use lrtest to compute the test:

. estimates store nmodel
. lrtest fmodel nmodel

Likelihood-ratio test                         LR chi2(3) =      4.09
(Assumption: nmodel nested in fmodel)         Prob > chi2 =   0.2514

6.4 Independence of irrelevant alternatives

Both the MNLM and the conditional logit model (discussed below) make the assumption known as the independence of irrelevant alternatives (IIA). Here we describe the assumption in terms of the MNLM. In this model,

Pr(y = m | x) / Pr(y = n | x) = exp{x(β_{m|b} − β_{n|b})}

where the odds do not depend on other alternatives that are available. In this sense, these alternatives are "irrelevant". What this means is that adding or deleting alternatives does not affect the odds among the remaining alternatives. This point is often made with the red bus-blue bus example. Suppose that you have the choice of a red bus or a car to get to work and that the odds of taking a red bus compared with those of taking a car are 1:1.
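This invariance is easy to verify numerically from the softmax form of the model: every alternative shares the same denominator, so dropping an alternative and renormalizing leaves every pairwise odds untouched. A small sketch in plain Python, with made-up linear predictors rather than estimates from any real model:

```python
import math

def mnlm_probs(xb):
    """Multinomial logit probabilities from a dict of linear predictors."""
    denom = sum(math.exp(v) for v in xb.values())
    return {k: math.exp(v) / denom for k, v in xb.items()}

# Hypothetical linear predictors; "blue_bus" is an exact clone of "red_bus"
xb = {"red_bus": 0.4, "car": -0.1, "blue_bus": 0.4}

p3 = mnlm_probs(xb)
odds3 = p3["red_bus"] / p3["car"]

# Remove blue_bus and renormalize: the red bus vs. car odds are unchanged
p2 = mnlm_probs({k: v for k, v in xb.items() if k != "blue_bus"})
odds2 = p2["red_bus"] / p2["car"]

print(round(odds3, 4), round(odds2, 4))  # both 1.6487 = exp(0.4 - (-0.1));
# the odds ignore whether the blue bus is available, even though
# Pr(car) itself changes when the blue bus is added
```

The same arithmetic is what makes the bus example below so troubling: the pairwise odds are fixed no matter how many clones of the red bus are added.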
IIA implies that the odds will remain 1:1 between these two alternatives, even if a new blue bus company comes to town that is identical to the red bus company, except for the color of the bus. Thus the probability of driving a car can be made arbitrarily small by adding enough different colors of buses! More reasonably, we might expect that the odds of a red bus compared with those of a car would be reduced to 1:2, since half of those riding the red bus would be expected to ride the blue bus.

Tests of IIA involve comparing the estimated coefficients from the full model to those from a restricted model that excludes at least one of the alternatives. If the test statistic is significant, the assumption of IIA is rejected, indicating that the MNLM is inappropriate. In this section, we consider the two most common tests of IIA: the Hausman-McFadden (HM) test (1984) and the Small-Hsiao (SH) test (1985). For details on other tests, see Fry and Harris (1996, 1998).

In a model with J alternatives, there are J − 1 ways of computing each test. If you remove the first alternative and refit the model, you get the first restricted model; if you remove the second alternative, the second; and so on, for a total of J − 1 restricted models. Each of these restricted models will lead to a different test statistic, as we demonstrate below. Both the HM and the SH tests are computed by mlogtest, and for both tests we compute J − 1 variations.

As many users of mlogtest have told us, the HM and SH tests often provide conflicting information on whether IIA has been violated (i.e., some of the tests reject the null hypothesis, whereas others do not). To explore this further, Cheng and Long (2005) ran Monte Carlo experiments to examine the properties of these tests. Their results show that the HM test has poor size properties even with sample sizes of more than 1,000. For some data structures, the SH test has reasonable size properties for samples of 500 or more.
But with other data structures, the size properties are extremely poor and do not improve as the sample size increases. Overall, they conclude that these tests are not useful for assessing violations of the IIA property. It appears that the best advice regarding IIA goes back to an early statement by McFadden (1973), who wrote that the multinomial and conditional logit models should be used only in cases where the alternatives "can plausibly be assumed to be distinct and weighted independently in the eyes of each decision maker". Similarly, Amemiya (1981, 1517) suggests that the MNLM works well when the alternatives are dissimilar. Care in specifying the model to involve distinct alternatives that are not substitutes for one another seems to be reasonable, albeit unfortunately ambiguous, advice. Nonetheless, we continue to include these tests in mlogtest, but we do not encourage their use. As we will show here, these tests can produce contradictory results.

Hausman test of IIA

The Hausman test of IIA involves the following steps:

1. Fit the full model with all J alternatives included, with estimates in β̂_F.
2. Fit a restricted model by eliminating one or more alternatives, with estimates in β̂_R.
3. Let β̂*_F be a subset of β̂_F after eliminating coefficients not estimated in the restricted model. The test statistic is

H = (β̂_R − β̂*_F)′ [ Var(β̂_R) − Var(β̂*_F) ]⁻¹ (β̂_R − β̂*_F)

where H is asymptotically distributed as chi-squared with degrees of freedom equal to the rows in β̂_R if IIA is true. Significant values of H indicate that the IIA assumption has been violated.

The Hausman test of IIA can be computed with mlogtest. Here the results are

. mlogit occ white ed exper, baseoutcome(5) nolog
(output omitted)
. mlogtest, hausman base

**** Hausman tests of IIA assumption (N=337)

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.
 Omitted       chi2   df   P>chi2   evidence
 Menial       7.324   12    0.835   for Ho
 BlueCol      0.320   12    1.000   for Ho
 Craft      -14.436   12    1.000   for Ho
 WhiteCol    -5.541   11    1.000   for Ho
 Prof        -0.119   12    1.000   for Ho

Five tests of IIA are reported. The first four correspond to excluding one of the four nonbase categories. The fifth test, in row Prof, is computed by refitting the model using the largest remaining outcome as the base category.1 Although none of the tests reject the H0 that IIA holds, the results differ considerably depending on the outcome considered. Moreover, three of the test statistics are negative, which we find to be very common. Hausman and McFadden (1984, 1226) note this possibility and conclude that a negative result is evidence that IIA has not been violated. A further sense of the variability of the results can be seen by rerunning mlogit with a different base category and then running mlogtest, hausman base.

1. Even though mlogtest fits other models to compute various tests, when the command ends it restores the estimates from your original model. Accordingly, other commands that require results from your original mlogit, such as predict and prvalue, will still work correctly.

Small-Hsiao test of IIA

To compute Small and Hsiao's test, the sample is divided randomly into two subsamples of about equal size. The unrestricted MNLM is fitted on both subsamples, where β̂_u^S1 contains estimates from the unrestricted model on the first subsample and β̂_u^S2 is its counterpart for the second subsample. A weighted average of the coefficients is computed as

β̂_u^S1S2 = (1/√2) β̂_u^S1 + [1 − (1/√2)] β̂_u^S2

Next a restricted sample is created from the second subsample by eliminating all cases with a chosen value of the dependent variable. The MNLM is fitted using the restricted sample, yielding the estimates β̂_r^S2 and the likelihood L(β̂_r^S2).
The Small-Hsiao statistic is

SH = −2 [ L(β̂_u^S1S2) − L(β̂_r^S2) ]

which is asymptotically distributed as chi-squared with degrees of freedom equal to the number of coefficients that are fitted in both the full model and the restricted model. To compute the Small-Hsiao test, you use the command mlogtest, smhsiao (our program uses code from smhsiao by Nick Winter, available at the SSC-IDEAS archive). For example,

. mlogtest, smhsiao

**** Small-Hsiao tests of IIA assumption (N=337)

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

 Omitted    lnL(full)   lnL(omit)     chi2   df   P>chi2   evidence
 Menial      -182.140    -169.907   24.466   12    0.018   against Ho
 BlueCol     -148.711    -140.054   17.315   12    0.138   for Ho
 Craft       -131.801    -119.286   25.030   12    0.015   against Ho
 WhiteCol    -161.436    -148.550   25.772   12    0.012   against Ho

In three variations of the SH test, we reject the null, whereas the HM test accepted the null in all cases. Because the Small-Hsiao test requires randomly dividing the data into subsamples, the results will differ with successive calls of the command, as the sample will be divided differently. To obtain test results that can be replicated, you must explicitly set the seed used by the random-number generator. For example,

. set seed 8675309
. mlogtest, smhsiao

**** Small-Hsiao tests of IIA assumption (N=337)

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

 Omitted    lnL(full)   lnL(omit)     chi2   df   P>chi2   evidence
 Menial      -169.785    -161.523   16.523   12    0.168   for Ho
 BlueCol     -131.900    -125.871   12.058   12    0.441   for Ho
 Craft       -136.934    -129.905   14.058   12    0.297   for Ho
 WhiteCol    -155.364    -150.239   10.250   12    0.594   for Ho

Using a new seed, we accept the null in each case, illustrating a common problem when using the SH test: you can get quite different results depending on how the sample is randomly divided.
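The arithmetic behind each row of these tables is just the SH formula applied to the two log likelihoods. A sketch in Python, using the (rounded) values printed in the Menial row of the seeded run; the subsample coefficients b1 and b2 are hypothetical, included only to show the weighting step:

```python
import math

def sh_stat(lnl_weighted, lnl_restricted):
    """SH = -2 * [ L(b_u^S1S2) - L(b_r^S2) ], compared with chi-squared."""
    return -2.0 * (lnl_weighted - lnl_restricted)

# Weighted average of the two subsample estimates, for one coefficient:
# b = (1/sqrt(2)) * b1 + (1 - 1/sqrt(2)) * b2
w = 1.0 / math.sqrt(2.0)
b1, b2 = 0.50, 0.30              # hypothetical subsample estimates
b_avg = w * b1 + (1.0 - w) * b2  # closer to b1, since w ~ 0.707

# Menial row of the seeded run: lnL(full) = -169.785, lnL(omit) = -161.523
sh = sh_stat(-169.785, -161.523)
print(round(sh, 3))  # ~16.524; the table shows 16.523 from unrounded values
```

The tiny discrepancy with the printed 16.523 comes only from the three-decimal rounding of the log likelihoods in the output.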
Advanced: setting the random seed

The random numbers that divide the sample for the Small-Hsiao test are based on Stata's uniform() function, which uses a pseudorandom-number generator. This generator creates a sequence of numbers based on a seed number. Although these numbers appear to be random, the same sequence will be generated each time you start with the same seed number. In this sense (and some others), these numbers are pseudorandom rather than random. If you specify the seed with set seed #, you ensure that you can replicate your results later. See the Data Management Reference Manual for more details.

6.5 Measures of fit

As with the binary and ordinal models, scalar measures of fit for the MNLM can be computed with the SPost command fitstat. The same caveats against overstating the importance of these scalar measures apply here as to the other models we consider (see also chapter 3). To examine the fit of individual observations, you can estimate the series of binary logits implied by the multinomial logit model and use the established methods of examining the fit of observations to binary logit estimates. This is the same approach that was recommended in chapter 5 for ordinal models.

6.6 Interpretation

Although the MNLM is a mathematically simple extension of the binary model, interpretation is made difficult by the many possible comparisons. Even in our simple example with five outcomes, we have many possible comparisons: M|P, B|P, C|P, W|P, M|W, B|W, C|W, M|C, B|C, and M|B. It is tedious to write all the comparisons, let alone to interpret each of them for each of the independent variables. Thus the key to interpretation is to avoid being overwhelmed by the many comparisons. Most of the methods we propose are similar to those for ordinal outcomes, and accordingly, these are treated briefly.
However, methods of plotting discrete changes and factor changes are new, so these are considered in greater detail.

6.6.1 Predicted probabilities

Predicted probabilities can be computed with the formula

Pr(y = m | x) = exp(xβ_{m|J}) / Σ_{j=1}^{J} exp(xβ_{j|J})

where x can contain values from individuals in the sample or hypothetical values. The most basic command for computing probabilities is predict, but we also illustrate a series of SPost commands that compute predicted probabilities in useful ways.

6.6.2 Predicted probabilities with predict

After fitting the model with mlogit, the predicted probabilities within the sample can be calculated with the command

predict newvar1 [newvar2 ... [newvarJ]] [if] [in]

where you must provide one new variable name for each of the J categories of the dependent variable, ordered from the lowest to highest numerical values. For example,

. mlogit occ white ed exper, baseoutcome(5) nolog
(output omitted)
. predict ProbM ProbB ProbC ProbW ProbP
(option p assumed; predicted probabilities)

The variables created by predict are

. desc Prob*

              storage  display    value
variable name   type   format     label      variable label
ProbM           float  %9.0g                 Pr(occ==1)
ProbB           float  %9.0g                 Pr(occ==2)
ProbC           float  %9.0g                 Pr(occ==3)
ProbW           float  %9.0g                 Pr(occ==4)
ProbP           float  %9.0g                 Pr(occ==5)

. summarize Prob*

    Variable |       Obs        Mean    Std. Dev.        Min        Max
       ProbM |       337    .0919381     .059396    .0010737   .3281906
       ProbB |       337    .2047478    .1450568    .0012066   .6974148
       ProbC |       337    .2492582    .1161309    .0079713    .551609
       ProbW |       337    .1216617    .0452844    .0083857   .2300058
       ProbP |       337    .3323442    .2870992    .0001935   .9597512

Using predict to compare mlogit and ologit

An interesting way to illustrate how predictions can be plotted is to compare predictions from ordered logit and multinomial logit when the models are applied to the same data.
Recall from chapter 5 that the range of the predicted probabilities for middle categories abruptly ended, whereas predictions for the end categories had a more gradual distribution. To illustrate this point, the example in chapter 5 is estimated using ologit and mlogit, with predicted probabilities computed for each case:

. use http://www.stata-press.com/data/lf2/ordwarm2, clear
(77 & 89 General Social Survey)
. ologit warm yr89 male white age ed prst, nolog
(output omitted)
. predict SDologit Dologit Aologit SAologit
(option p assumed; predicted probabilities)
. label var Dologit "ologit-D"
. mlogit warm yr89 male white age ed prst, nolog
(output omitted)
. predict SDmlogit Dmlogit Amlogit SAmlogit
(option p assumed; predicted probabilities)
. label var Dmlogit "mlogit-D"

We can plot the predicted probabilities of disagreeing in the two models with the command dotplot Dologit Dmlogit, ylabel(0(.25).75).

(dotplot of Dologit and Dmlogit)

Although the two sets of predictions have a correlation of .92 (computed by the command correlate Dologit Dmlogit), the abrupt truncation of the distribution for the ordered logit model strikes us as substantively unrealistic.

6.6.3 Predicted probabilities and discrete change with prvalue

Predicted probabilities for individuals with specified characteristics can be computed with prvalue. For example, we might compute the probabilities of each occupational outcome to compare nonwhites and whites who are average on education and experience:

. use http://www.stata-press.com/data/lf2/nomocc2, clear
(1982 General Social Survey)
. mlogit occ white ed exper, baseoutcome(5) nolog
(output omitted)
. quietly prvalue, x(white=0) rest(mean) save
. prvalue, x(white=1) rest(mean) diff

mlogit: Change in Predictions for occ

Confidence intervals by delta method

                      Current     Saved    Change       95%
CI for Change

 Pr(y=Menial|x):      0.0860    0.2168   -0.1309   [-0.3056,  0.0439]
 Pr(y=BlueCol|x):     0.1862    0.1363    0.0498   [-0.0897,  0.1893]
 Pr(y=Craft|x):       0.2790    0.4387   -0.1597   [-0.3686,  0.0491]
 Pr(y=WhiteCol|x):    0.1674    0.0877    0.0797   [-0.0477,  0.2071]
 Pr(y=Prof|x):        0.2814    0.1204    0.1611   [ 0.0277,  0.2944]

              white         ed      exper
 Current=         1  13.094955  20.501484
 Saved=           0  13.094955  20.501484
 Diff=            1          0          0

This example also shows how to use prvalue to compute differences between two sets of probabilities. Our first call of prvalue is done quietly, but we save the results. The second call uses the diff option, and the output compares the results for the first and second sets of values computed. By using prvalue with the save and diff options, we obtain confidence intervals for the discrete changes. The predicted difference between nonwhites and whites in the probability of having professional jobs is the only case in which the 95% confidence interval does not include zero.

6.6.4 Tables of predicted probabilities with prtab

If you want predicted probabilities for all combinations of a set of categorical independent variables, prtab is useful. For example, we might want to know how white and nonwhite respondents differ in their probability of having a menial job by years of education:

. label def lwhite 0 NonWhite 1 White
. label val white lwhite
. prtab ed white, novarlbl outcome(1)

mlogit: Predicted probabilities of outcome 1 (Menial) for occ

              white
   ed    NonWhite     White
    3      0.2847    0.1216
    6      0.2987    0.1384
    7      0.2985    0.1417
    8      0.2963    0.1431
    9      0.2906    0.1417
   10      0.2814    0.1366
   11      0.2675    0.1265
   12      0.2476    0.1104
   13      0.2199    0.0883
   14      0.1832    0.0632
   15      0.1393    0.0401
   16      0.0944    0.0225
   17      0.0569    0.0120
   18      0.0310    0.0060
   19      0.0158    0.0029
   20      0.0077    0.0014

          white          ed       exper
x=    .91691395   13.094955   20.501484

Tip: outcome() option
Here we use the outcome() option to restrict the output to one outcome category. Without this option, prtab will produce a separate table for each outcome category.

The table produced by prtab shows the substantial differences between whites and nonwhites in the probabilities of having menial jobs and how these probabilities are affected by years of education. However, given the number of categories for ed, plotting these predicted probabilities with prgen is probably a more useful way to examine the results.

6.6.5 Graphing predicted probabilities with prgen

Predicted probabilities can be plotted using the same methods considered for the ordinal regression model. After fitting the model, we use prgen to compute the predicted probabilities for whites with average working experience as education increases from 6 years to 20 years:

. prgen ed, x(white=1) from(6) to(20) generate(wht) ncases(15)

mlogit: Predicted values as ed varies from 6 to 20.

          white          ed       exper
x=            1   13.094955   20.501484

Here is what the options specify:

x(white=1) sets white to 1. Because the rest() option is not included, all other variables are set to their means by default.

from(6) and to(20) set the minimum and maximum values over which ed is to vary. The default is to use the variable's minimum and maximum values.

ncases(15) indicates that 15 evenly spaced values of ed between 6 and 20 are to be generated. We chose 15 for the number of values from 6 to 20, inclusive.

gen(wht) specifies the root name for the new variables generated by prgen.
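Behind commands like prtab and prgen is nothing more than the MNLM probability formula from section 6.6.1 evaluated over a grid of x values. A sketch in Python with made-up coefficients (each tuple is the intercept and the slopes on ed and white for one outcome versus the base; the numbers are illustrative, not the estimates from this model):

```python
import math

# Hypothetical coefficients for each nonbase outcome vs. the base (Prof):
# (intercept, b_ed, b_white) -- illustrative values only
BETA = {
    "Menial":   (2.0, -0.30, -1.0),
    "BlueCol":  (2.5, -0.25, -0.8),
    "Craft":    (2.2, -0.20, -0.5),
    "WhiteCol": (1.0, -0.05, -0.3),
}

def mnlm_probs(ed, white):
    """Pr(y = m | x) = exp(x b_m) / sum_j exp(x b_j); the base has xb = 0."""
    xb = {m: b0 + b_ed * ed + b_w * white
          for m, (b0, b_ed, b_w) in BETA.items()}
    xb["Prof"] = 0.0
    denom = sum(math.exp(v) for v in xb.values())
    return {m: math.exp(v) / denom for m, v in xb.items()}

# A prtab-style grid: Pr(Menial) by education, for nonwhites and whites
for ed in range(3, 21, 3):
    row = [mnlm_probs(ed, white)["Menial"] for white in (0, 1)]
    print(ed, [round(p, 4) for p in row])
```

With these (hypothetical) coefficients the grid shows the same qualitative pattern as the prtab output above: the probability of a menial job falls with education and is lower for whites.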
For example, the variable whtx contains values of ed, the p-variables (e.g., whtp2) contain the predicted probabilities for each outcome, and the s-variables contain the summed probabilities:

. desc wht*

              storage  display    value
variable name   type   format     label      variable label
whtx            float  %9.0g                 Years of education
whtp1           float  %9.0g                 pr(Menial)=Pr(1)
whtp2           float  %9.0g                 pr(BlueCol)=Pr(2)
whtp3           float  %9.0g                 pr(Craft)=Pr(3)
whtp4           float  %9.0g                 pr(WhiteCol)=Pr(4)
whtp5           float  %9.0g                 pr(Prof)=Pr(5)
whts1           float  %9.0g                 pr(y<=1)
whts2           float  %9.0g                 pr(y<=2)
whts3           float  %9.0g                 pr(y<=3)
whts4           float  %9.0g                 pr(y<=4)
whts5           float  %9.0g                 pr(y<=5)

The same thing can be done to compute predicted probabilities for nonwhites:

. prgen ed, x(white=0) from(6) to(20) generate(nwht) ncases(15)

mlogit: Predicted values as ed varies from 6 to 20.

          white          ed       exper
x=            0   13.094955   20.501484

Plotting probabilities for one outcome and two groups

The variables nwhtp1 and whtp1 contain the predicted probabilities of having menial jobs for nonwhites and whites. Plotting these provides clearer information than the results of prtab given above:

. label var whtp1 "Whites"
. label var nwhtp1 "Nonwhites"
. graph twoway connected whtp1 nwhtp1 nwhtx, xtitle("Years of Education")
>     ytitle("Pr(Menial Job)") ylabel(0(.25).50) xlabel(6 8 12 16 20)

(line plot of Pr(Menial Job) against years of education for whites and nonwhites)

Graphing probabilities for all outcomes for one group

Even though nominal outcomes are not ordered, plotting the summed probabilities can be a useful way to present the results. To show this, we construct a graph to show how education affects the probability of each occupation for whites (a similar graph could be plotted for nonwhites). This is done using the s# variables created by prgen, which provide the probability of being in an outcome less than or equal to some value.
For example, the label for whts3 is pr(y<=3), which indicates that all nominal categories coded as 3 or less are added together. To plot these probabilities, the first thing we do is change the variable labels to the name of the highest category in the sum, which makes the graph clearer (as you will see below):

. label var whts1 "Menial"
. label var whts2 "Blue Collar"
. label var whts3 "Craft"
. label var whts4 "White Collar"

To create the summed plot, we use the following command:

. graph twoway connected whts1 whts2 whts3 whts4 whtx,   ///
>     xtitle("Whites: Years of Education")               ///
>     ytitle("Summed Probability")                       ///
>     xlabel(6(2)20)                                     ///
>     ylabel(0(.25)1)

(line plot of the four summed probabilities, labeled Menial, Blue Collar, Craft, and White Collar, against years of education)

The graph plots the four summed probabilities against whtx, where standard options for graph are used. This graph is not ideal, but before revising it, let's make sure we understand what is being plotted. The lowest line with circles, labeled "Menial" in the key, plots the probability of having a menial job for a given year of education. This is the same information as plotted in our prior graph for whites. The next line with small diamonds, labeled "Blue Collar" in the key, plots the sum of the probability of having a menial job or a blue-collar job. Thus the area between the line with circles and the line with diamonds is the probability of having a blue-collar job, and so on.

Because what we really want to illustrate are the regions between the curves, this graph is not as effective as we would like. In the graph command below, we use the rarea plot type to shade the regions between the curves. The syntax for an rarea plot² is

graph twoway rarea y1var y2var xvar [if] [in] [, rarea_options]

where y1var defines the lower boundary and y2var defines the upper boundary of the region for each x-value given in the variable xvar.
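The quantities being stacked here are simple to compute: the s# variables are running sums of the outcome probabilities, and the band between two consecutive sums is exactly one outcome's probability, which is what the shaded rarea regions will display. A sketch in Python (the probabilities are made up):

```python
def summed(probs):
    """Running sums over outcomes, as in prgen's s# variables:
    s1 = p1, s2 = p1 + p2, ..., sJ = 1."""
    out, total = [], 0.0
    for p in probs:
        total += p
        out.append(total)
    return out

# Hypothetical probabilities for Menial, BlueCol, Craft, WhiteCol, Prof
p = [0.09, 0.20, 0.25, 0.12, 0.34]
s = summed(p)
print([round(v, 2) for v in s])  # [0.09, 0.29, 0.54, 0.66, 1.0]

# The band between s[1] and s[2] is Pr(Craft) = 0.25
print(round(s[2] - s[1], 2))     # 0.25
```

Plotting each consecutive pair of these sums as the lower and upper boundary of a shaded region reproduces, one x-value at a time, what the rarea graph below does for the whole education range.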
Continuing with our example, as the probabilities are bounded between zero and one, we begin by creating variables that hold these extreme values.

2. Type help twoway rarea for more information.

. gen zero = 0
. gen one = 1

Now we are ready to draw the full graph.

. graph twoway (rarea zero whts1 whtx, bc(gs1))          ///
>     (rarea whts1 whts2 whtx, bc(gs4))                  ///
>     (rarea whts2 whts3 whtx, bc(gs8))                  ///
>     (rarea whts3 whts4 whtx, bc(gs11))                 ///
>     (rarea whts4 one whtx, bc(gs14)),                  ///
>     ytitle("Summed Probability")                       ///
>     legend( order(1 2 3 4 5)                           ///
>         label(1 "Menial") label(2 "Blue Collar")       ///
>         label(3 "Craft") label(4 "White Collar")       ///
>         label(5 "Professional"))                       ///
>     xtitle("Whites: Years of Education")               ///
>     xlabel(6 8 12 16 20) ylabel(0(.25)1)               ///
>     plotregion(margin(zero))

Figure 6.1: Whites: years of education.

The changes in the shaded regions in figure 6.1 clearly illustrate how the probability of selecting any one occupation changes as education increases.

Marginal change is defined as

∂Pr(y = m | x) / ∂x_k = Pr(y = m | x) { β_{k,m|J} − Σ_{j=1}^{J} β_{k,j|J} Pr(y = j | x) }

As this equation combines all the β_{k,j|J}s, the value of the marginal change depends on the levels of all variables in the model. Further, as the value of x_k changes, the sign of the marginal change can change. For example, at one point the marginal effect of education on having a craft occupation could be positive, whereas at another point the marginal effect could be negative.

Discrete change is defined as

ΔPr(y = m | x) / Δx_k = Pr(y = m | x, x_k = x_E) − Pr(y = m | x, x_k = x_S)

where x_S is the starting value of x_k, x_E is its ending value, and the magnitude of the change depends on the levels of all variables and the size of the change that is being made.
The J discrete-change coefficients for a variable (one for each outcome category) can be summarized by computing the average of the absolute values of the changes across all the outcome categories, 1 J 3 = 1 J APr(y = j |x) Axk where the absolute value is taken because the sum of the .changes without taking the absolute value is necessarily zero. Computing marginal and discrete change with prchange Discrete and marginal changes are computed with prchange (the full syntax for which is provided in chapter 3). For example, (Continued on next page) 6.6.6 Changes in predicted probabilities Marginal and discrete change.can be used in the same way as in models for ordinal outcomes. As before, both can be computed using prchange. 256 Chapter 6 Models for nominal outcomes with case-specific data . mlogit occ white ed exper (output omitted) . prchange mlogit: Changes in Probabilities for occ white 0->l 0->l ed Min->Max -+1/2 -+sd/2 MargEfct Min->Max -+1/2 -+sd/2 MargEfct exper Min->Max -+1/2 -+sd/2 MargEfct Min->Max -+1/2 -+sd/2 MargEfct Craft -.15373434 Craft .15010394 .05247185 .14576758 .05287415 WhiteCol .07971004 AvglChgl Menial BlueCol .11623582 -.13085523 .04981799 Prof .1610615 AvglChgl Menial BlueCol .39242268 -.13017954 -.70077323 .05855425 -.02559762 -.06831616 .1640657 -.07129153 -.19310513 .05894859 -.02579097 -.06870635 Prof .95680079 .13387768 .37951647 .13455107 AvglChgl Menial BlueCol .12193559 -.11536534 -.18947365 .00233425 -.00226997 -.00356567 .03253578 -.03167491 -.04966453 .00233427 -.00226997 -.00356571 Prof .17889298 .00308132 .04293236 .00308134 Menial BlueCol Craft WhiteCol Prof -Ttmt&tfl-r294i-H>54-.4,6-1-12368_J6S30P_62_ WhiteCol .0Q425591 .01250795 .03064777 .01282041 Craft .03115708 .00105992 .01479983 .00105992 WhiteCol . 
09478889 .0016944 .02360725 .00169442 ' PrTyTS white ed exper x= .916914 13.095 20.5015 sd{x)= .276423 2.94643 13.9594 The first thing to notice is the output labeled Pr(ylx), which is the predicted probabilities at the values set by x() and rest(). Marginal change is listed in the rows MargEfct. For variables that are not binary, discrete change is reported over the range of the variable (reported as Min->Max), for changes of one unit centered on the base values (reported as -+1/2), and for changes of one standard deviation centered on the base values (reported as -+sd/2). If the uncentered option is used, the changes begin at the value specified by x() or restO and increase one unit or one standard deviation from there. For binary variables, the discrete change from 0 to 1 is the only appropriate quantity and is the only quantity that is presented. Looking at the results for white above, we can see that for someone who is average in education and experience, the predicted probability of having a professional job is .16 higher for whites than nonwhites. The average change is listed in the column AvglChgl. For example, for white, A = 0.12, the average absolute change in the probability of various occupational categories for being white as opposed to nonwhite is .12. 6.6:7 Plotting discrete changes with prchange and mlogview Marginal change with mfx 257 The marginal change can also be computed using mfx, where the at() option is used to set values of the independent variables. Like prchange, the mfx command sets all values of the independent variables to their means by default. Also we must estimate the marginal effects for one outcome at a time, using the predict (outcome (#)) option to specify the outcome for which we want marginal effects: . mfx, predict (outcome(D) Marginal effects after mlogit y = Pr(occ==l) (predict, outcome(l)) = .09426806 variable dy/dx Std. Err. z P>|z| [ 95'/, C.I. 
     white*   -.1308552      .08914   -1.47   0.142  -.305562   .043852   .916914
        ed     -.025791      .00688   -3.75   0.000  -.039269  -.012312    13.095
     exper      -.00227      .00126   -1.80   0.071  -.004737   .000197   20.5015

    (*) dy/dx is for discrete change of dummy variable from 0 to 1

These results are for the Menial category (occ==1). Estimates for exper and ed match the results in the MargEfct rows of the prchange output above. Meanwhile, for the binary variable white, the discrete change from 0 to 1 is presented, which also matches the corresponding result from prchange. An advantage of mfx is that standard errors for the effects are also provided; a disadvantage is that mfx can take a long time to produce results after mlogit, especially if the number of observations and independent variables is large.

6.6.7 Plotting discrete changes with prchange and mlogview

One difficulty with nominal outcomes is the many coefficients that need to be considered: one for each variable times the number of outcome categories minus one. To help you sort out all this information, discrete-change coefficients can be plotted using our program mlogview. After fitting the model with mlogit and computing discrete changes with prchange, executing mlogview opens the Multinomial Logit Plots dialog box.

Dialog boxes are easier to use than to explain. So, as we describe various features, the best advice is to open the dialog box and experiment.

Selecting variables

If you click and hold one of the variable-selection buttons, you can select a variable to be plotted. The same variable can be plotted more than once, for example, showing the effects of different amounts of change.
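Before turning to the plotting options, it may help to see that the quantities prchange and mfx report are just differences (or derivatives) of MNLM predicted probabilities. The sketch below, written in Python rather than Stata and using made-up coefficients (not the estimates from our model), computes the predicted probabilities, a one-unit discrete change, and the average absolute change defined earlier.

```python
import math

# Hypothetical MNLM coefficients for J = 3 outcomes (category 3 is the base);
# these are made-up numbers, not the estimates from the chapter's model.
beta = {1: {"cons": 0.5, "x": -0.8},
        2: {"cons": 0.2, "x": 0.3}}

def probs(x):
    """Pr(y = m | x) under the multinomial logit model."""
    scores = {m: math.exp(b["cons"] + b["x"] * x) for m, b in beta.items()}
    denom = 1.0 + sum(scores.values())   # the base category contributes exp(0) = 1
    p = {m: s / denom for m, s in scores.items()}
    p[3] = 1.0 / denom
    return p

# Discrete change for a one-unit increase in x, starting from x = 1
p0, p1 = probs(1.0), probs(2.0)
change = {m: p1[m] - p0[m] for m in p0}

# Average absolute change across the J outcomes
avg_abs_change = sum(abs(c) for c in change.values()) / len(change)
```

Because the probabilities sum to 1 at every value of x, the J discrete changes necessarily sum to zero, which is why the summary measure averages their absolute values.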
Selecting the amount of change

The radio buttons allow you to select the type of discrete-change coefficient to plot for each selected variable: +1 selects coefficients for a change of one unit; +SD selects coefficients for a change of one standard deviation; and 0/1 selects changes from 0 to 1 for binary variables.

Making a plot

Even though there are more options to explain, you should try plotting your selections by clicking on DC Plot, which produces a graph. The command mlogview works by generating the syntax for the command mlogplot, which actually draws the plot. In the Results window, you will see the mlogplot command that was used to generate your graph (full details on mlogplot are given in section 6.6.9). If there is an error in the options you select, the error message will appear in the Results window. Assuming that everything has worked, we generate the following graph:

    [Discrete-change plot: rows for white 0/1, ed, and exper; letters M, B, C, W, P mark the outcome categories; horizontal axis: Change in Predicted Probability for occ, from -.16 to .16]

The graph immediately shows how a unit increase in each variable affects the probability of each outcome. Although it appears that the effects of being white are the largest, changes of one unit in education and (especially) experience are often too small to be as informative. It would make more sense to look at the effects of a standard deviation change in these variables. To do this, we return to the dialog box and click on the radio button +SD. Before we see what this does, let's consider several other options that can be used.

Adding labels

The box Note allows you to enter text that will be placed at the top of the graph. Clicking the box for Use variable labels replaces the names of the variables on the left axis with the variable labels associated with each variable. When you do this, you may find that the labels are too long. If so, you can use the label variable command to change them.
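The reason the one-unit changes for exper look so small is largely a matter of scale: to a close approximation, the change for a standard-deviation increase is the marginal effect times the standard deviation. A quick check in Python (not Stata), using the prchange results shown earlier in this section:

```python
# Values taken from the prchange output shown earlier: the marginal effect
# of exper on Pr(Prof) and the standard deviation of exper.
marg_efct_prof = 0.00308134
sd_exper = 13.9594

# A one-unit increase in experience moves Pr(Prof) by only about .003 ...
unit_change = marg_efct_prof * 1.0

# ... but scaling by sd(exper) gives roughly .043, close to the centered
# standard-deviation change (-+sd/2) of .04293236 that prchange reports.
sd_change = marg_efct_prof * sd_exper
```

The two numbers differ slightly because the probability curve is only locally linear; the marginal effect times the standard deviation is an approximation to the centered discrete change.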
Tick marks

The values for the tick marks are determined by specifying the minimum and maximum values to plot and the number of tick marks. For example, we could specify a plot from -.2 to .4 with seven tick marks. This will lead to labels every .1 units. Using some of the features discussed above, we set up the dialog box accordingly. Clicking on DC Plot produces the following graph:

    [Discrete-change plot: rows for White Worker 0/1, Yrs of Education (std), and Yrs of Experience (std); horizontal axis: Change in Predicted Probability for occ, from -.2 to .4]

You can see that the effects of education are largest and that those of experience are smallest. Or, each coefficient can be interpreted individually, such as the following: The effects of a standard deviation change in education are largest, with an increase of more than .35 in the probability of having a professional occupation. The effects of race are also substantial, with average blacks being less likely to enter blue-collar, white-collar, or professional jobs than average whites. Expected changes due to a standard deviation change in experience are much smaller and show that experience increases the probabilities of more highly skilled occupations.

In using these graphs, remember that different values for discrete change are obtained at different levels of the variables, which are specified with the x() and rest() options for prchange.

Value labels with mlogview: The value labels of the outcome categories of the dependent variable must begin with different letters because the plots generated with mlogview use the first letter of the value label.

6.6.8 Odds ratios using listcoef and mlogview

Discrete change does little to illuminate the dynamics among the outcomes. For example, a decrease in education increases the probability of both blue-collar and craft jobs, but how does it affect the odds of a person choosing a craft job relative to a blue-collar job?
To deal with these issues, odds ratios (also referred to as factor change coefficients) can be used. Holding other variables constant, the factor change in the odds of outcome m versus outcome n as x_k increases by δ equals

    Ω_{m|n}(x, x_k + δ) / Ω_{m|n}(x, x_k) = exp(β_{k,m|n} × δ)

If the amount of change is δ = 1, the odds ratio can be interpreted as follows: For a unit change in x_k, the odds of m versus n are expected to change by a factor of exp(β_{k,m|n}), holding all other variables constant.

If the amount of change is δ = s_k, the standard deviation of x_k, then the odds ratio can be interpreted as follows: For a standard deviation change in x_k, the odds of m versus n are expected to change by a factor of exp(β_{k,m|n} × s_k), holding all other variables constant.

Listing odds ratios with listcoef

The difficulty in interpreting odds ratios for the MNLM is that, to understand the effect of a variable, you need to examine the coefficients for comparisons among all pairs of outcomes. The standard output from mlogit includes only the J - 1 comparisons with the base category. Although you could estimate coefficients for all possible comparisons by rerunning mlogit with different base categories (e.g., mlogit occ white ed exper, baseoutcome(3)), using listcoef is much simpler. For example, to examine the effects of race, type
    . listcoef white, help
    mlogit (N=337): Factor Change in the Odds of occ
    Variable: white (sd=.27642268)

    Odds comparing
    Alternative 1
    to Alternative 2           b        z    P>|z|      e^b  e^bStdX
    Menial   -BlueCol   -1.23650   -1.707   0.088   0.2904   0.7105
    Menial   -Craft     -0.47234   -0.782   0.434   0.6235   0.8776
    Menial   -WhiteCol  -1.57139   -1.741   0.082   0.2078   0.6477
    Menial   -Prof      -1.77431   -2.350   0.019   0.1696   0.6123
    BlueCol  -Menial     1.23650    1.707   0.088   3.4436   1.4075
    BlueCol  -Craft      0.76416    1.208   0.227   2.1472   1.2352
    BlueCol  -WhiteCol  -0.33488   -0.359   0.720   0.7154   0.9116
    BlueCol  -Prof      -0.53780   -0.673   0.501   0.5840   0.8619
    Craft    -Menial     0.47234    0.782   0.434   1.6037   1.1395
    Craft    -BlueCol   -0.76416   -1.208   0.227   0.4657   0.8096
    Craft    -WhiteCol  -1.09904   -1.343   0.179   0.3332   0.7380
    Craft    -Prof      -1.30196   -2.011   0.044   0.2720   0.6973
    WhiteCol -Menial     1.57139    1.741   0.082   4.8133   1.5440
    WhiteCol -BlueCol    0.33488    0.359   0.720   1.3978   1.0970
    WhiteCol -Craft      1.09904    1.343   0.179   3.0013   1.3550
    WhiteCol -Prof      -0.20292   -0.233   0.815   0.8163   0.9455
    Prof     -Menial     1.77431    2.350   0.019   5.8968   1.6331
    Prof     -BlueCol    0.53780    0.673   0.501   1.7122   1.1603
    Prof     -Craft      1.30196    2.011   0.044   3.6765   1.4332
    Prof     -WhiteCol   0.20292    0.233   0.815   1.2250   1.0577

    b = raw coefficient
    z = z-score for test of b=0
    P>|z| = p-value for z-test
    e^b = exp(b) = factor change in odds for unit increase in X
    e^bStdX = exp(b*SD of X) = change in odds for SD increase in X

The odds ratios of interest are in the column labeled e^b. For example, the odds ratio for the effect of race on having a professional versus a menial job is 5.90, which can be interpreted as follows: The odds of having a professional occupation relative to a menial occupation are 5.90 times greater for whites than for blacks, holding education and experience constant.

Remember: the gt, lt, and pvalue options control which comparisons are printed by listcoef.
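The pairwise coefficients that listcoef lists are not new estimates: each is the difference between two base-category coefficients, β_{k,m|n} = β_{k,m|b} − β_{k,n|b}. A small Python (not Stata) sketch reproduces two entries of the table above from the coefficients against the base category Prof:

```python
import math

# Coefficients for white against the base category Prof, read from the
# m-versus-Prof rows of the listcoef output above.
b_vs_prof = {"Menial": -1.77431, "BlueCol": -0.53780,
             "Craft": -1.30196, "WhiteCol": -0.20292, "Prof": 0.0}

def b_pair(m, n):
    """Coefficient for the m-versus-n comparison: b(m|n) = b(m|base) - b(n|base)."""
    return b_vs_prof[m] - b_vs_prof[n]

# Prof versus Menial: the factor change in the odds for whites versus nonwhites
or_prof_menial = math.exp(b_pair("Prof", "Menial"))   # about 5.90

# The Menial-Craft row is likewise recovered as a difference of base comparisons
b_menial_craft = b_pair("Menial", "Craft")            # about -0.47
```

This identity is also why the b for any m-versus-n row is exactly the negative of the n-versus-m row in the listcoef output.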
See pages 233-234 for more details.

Plotting odds ratios

However, examining all the coefficients for even a single variable with only five dependent categories is complicated. An odds-ratio plot makes it easy to quickly see patterns in results for even a complex MNLM (see Long 1997, chapter 6 for full details). To explain how to interpret an odds-ratio plot, we begin with some hypothetical output from an MNLM with three outcomes and three independent variables:

    Logit coefficient for
    Comparison                    x1       x2       x3
    B | A    beta(B|A)        -0.693    0.693    0.347
             exp[beta(B|A)]    0.500    2.000    1.414
             p                  0.04     0.01     0.42
    C | A    beta(C|A)         0.347   -0.347    0.693
             exp[beta(C|A)]    1.414    0.707    2.000
             p                  0.21     0.04     0.37
    C | B    beta(C|B)         1.040   -1.040    0.346
             exp[beta(C|B)]    2.828    0.354    1.414
             p                  0.02     0.03     0.21

These coefficients were constructed to have some fixed relationships among categories and variables:

• The effects of x1 and x2 on B | A (which you can read as B versus A) are equal but of opposite sign. The effect of x3 is half as large.
• The effects of x1 and x2 on C | A are half as large (and in opposite directions) as the effects on B | A, whereas the effect of x3 is in the same direction but twice as large.

In the odds-ratio plot, the independent variables are each represented on a separate row, and the horizontal axis indicates the relative magnitude of the β coefficients associated with each outcome. Here is the plot, where the letters correspond to the outcome categories:

    [Odds-ratio plot: rows for x1, x2, and x3; letters A, B, and C placed at each coefficient's value; top axis: Factor Change Scale Relative to Category A, from .5 to 2; bottom axis: Logit Coefficient Scale Relative to Category A, from -.69 to .69]

The plot reveals much information, which we now summarize.

Sign of coefficients

If a letter is to the right of another letter, increases in the independent variable make the outcome to the right more likely. Thus relative to outcome A, an increase in x1 makes it more likely that we will observe outcome C and less likely that we will observe outcome B.
This corresponds to the positive sign of the β_{1,C|A} coefficient and the negative sign of the β_{1,B|A} coefficient. The signs of these coefficients are reversed for x2, and accordingly, the odds-ratio plot for x2 is a mirror image of that for x1.

Magnitude of effects

The distance between a pair of letters indicates the magnitude of the effect. For both x1 and x2, the distance between A and B is twice the distance between A and C, which reflects that β_{B|A} is twice as large as β_{C|A} for both variables. For x3, the distance between A and B is half the distance between A and C, reflecting that β_{3,C|A} is twice as large as β_{3,B|A}.

The additive relationship

The additive relationships among coefficients shown in (6.1) are also fully reflected in this graph. For any of the independent variables, β_{C|A} = β_{B|A} + β_{C|B}. Accordingly, the distance from A to C is the sum of the distances from A to B and B to C.

The base category

The additive scale on the bottom axis measures the values of the β_{k,m|n}s. The multiplicative scale on the top axis measures the exp(β_{k,m|n})s. The As are stacked on top of one another because the plot uses A as its base category for graphing the coefficients. The choice of base category is arbitrary. We could have used alternative B instead. If we had, the rows of the graph would be shifted to the left or right so that the Bs lined up. Doing this leads to the following graph:

    [Odds-ratio plot: rows for x1, x2, and x3, now aligned on B; top axis: Factor Change Scale Relative to Category B, from .35 to 2.83; bottom axis: Logit Coefficient Scale Relative to Category B, from -1.04 to 1.04]

Creating odds-ratio plots

These graphs can be created using mlogview after running mlogit. Using our example and after changing a few options, we obtain the Multinomial Logit Plots dialog box.
Clicking on OR Plot gives

    [Odds-ratio plot: rows for white 0/1, ed (std), and exper (std); letters M, B, C, W, P; top axis: Factor Change Scale Relative to Category Prof, from .06 to 1.73; bottom axis: Logit Coefficient Scale Relative to Category Prof, from -2.75 to .55]

Several things are immediately apparent. The effect of experience is the smallest, although increases in experience make it more likely that one will be in a craft, white-collar, or professional occupation relative to a menial or blue-collar one. We also see that education has the largest effect; as expected, increases in education increase the odds of having a professional job relative to any other type.

Adding significance levels

The current graph does not reflect statistical significance. This is added by drawing a line between categories for which there is not a significant coefficient. The lack of statistical significance is shown by a connecting line, suggesting that those two outcomes are "tied together". You can add the significance level to the plot with the Connect if box on the dialog box. For example, if we enter .1 in this box and uncheck the "pack odds ratio plot" box, we obtain

    [Odds-ratio plot with significance lines: rows for white 0/1, ed (std), and exper (std); outcomes not significantly differentiated at the .1 level are connected by lines; axes as in the previous plot]

To make the connecting lines clear, vertical spacing is added to the graph. This vertical spacing has no meaning and is used only to make the lines clearer. The graph shows that race orders occupations from menial to craft to blue collar to white collar to professional, but the connecting lines show that none of the adjacent categories are significantly differentiated by race.
Being white increases the odds of being a craft worker relative to having a menial job, but the effect is not significant. However, being white significantly increases the odds of being a blue-collar worker, a white-collar worker, or a professional, relative to having a menial job. The effects of ed and exper can be interpreted similarly.

Adding discrete change

In chapter 4, we emphasized that whereas the factor change in the odds is constant across the levels of all variables, the discrete change gets larger or smaller at different values of the variables. For example, if the odds increase by a factor of 10 but the current odds are 1 in 10,000, the substantive impact is small. But if the current odds were 1 in 5, the impact is large. Information on the discrete change in probability can be incorporated in the odds-ratio graph by making the size of the letter proportional to the discrete change in the odds (specifically, the area of the letter is proportional to the size of the discrete change). This can easily be added to our graph. First, after estimating the MNLM, run prchange at the levels of the variables that you want. Then enter mlogview to open the dialog box. Set any of the options, and then click the OR+DC Plot button:

    [Odds-ratio plot with letter sizes proportional to the discrete change: rows for white 0/1, ed (std), and exper (std); top axis: Factor Change Scale Relative to Category Prof; bottom axis: Logit Coefficient Scale Relative to Category Prof]

With a little practice, you can quickly create and interpret these graphs.

6.6.9 Using mlogplot*

The dialog box mlogview does not actually draw the plots but only sends the options you select to mlogplot, which creates the graph. Once you click a plot button in mlogview, the necessary mlogplot command, including options, appears in the Results window. This is done because mlogview invokes a dialog box and so cannot be used effectively in a do-file.
But once you create a plot using the dialog box, you can copy the generated mlogplot command from the Results window and paste it into a do-file. This should be clear from the following screenshot:

    [Screenshot: the dialog box with selected options in the upper left, the resulting graph in the upper right, and the generated mlogplot command in the Results window]

After we clicked on the OR Plot button, the graph appeared along with the following command in the Results window:

    . mlogplot white ed exper, std(0ss) p(.1) min(-2.75) max(.55) or ntics(7)

If you enter this command from the Command window or run it from a do-file, the same graph will be generated. The full syntax for mlogplot is described in appendix A.

6.6.10 Plotting estimates from matrices with mlogplot*

You can also use mlogplot to construct odds-ratio plots (but not discrete-change plots) using coefficients that are contained in matrices. For example, you can plot coefficients from published papers or generate examples like those we used above. To do this, you must construct matrices containing the information to be plotted and add the option matrix to the command. The easiest way to see how this is done is with an example, followed by details on each matrix. The commands

    . matrix mnlbeta = (-.693, .693, .347 \ .347, -.347, .693)
    . matrix mnlsd = (1, 2, 4)
    . global mnlname = "x1 x2 x3"
    . global mnlcatnm = "B C A"
    . global mnldepnm "depvar"
    . mlogplot, matrix std(uuu) vars(x1 x2 x3) packed

create the following plot:

    [Odds-ratio plot: rows for x1, x2, and x3; top axis: Factor Change Scale Relative to Category A, from .5 to 1.59; bottom axis: Logit Coefficient Scale Relative to Category A, from -.69 to .69]

Options for using matrices with mlogplot

matrix indicates that the coefficients to be plotted are contained in matrices.

vars(varlist) contains the names of the variables to be plotted.
This list must contain names from mnlname, which will be described next, but does not need to be in the same order as in mnlname. The list can contain the same name more than once and can select a subset of the names from mnlname.

Global macros and matrices used by mlogplot

mnlname is a string containing the names of the variables corresponding to the columns of the matrix mnlbeta. For example, global mnlname = "x1 x2 x3".

mnlbeta is a matrix with the βs, where element (i, j) is the coefficient β_{j,i|b}. That is, rows i are for different contrasts; columns j are for variables. For example, matrix mnlbeta = (-.693, .693, .347 \ .347, -.347, .693). As constant terms are not plotted, they are not included in mnlbeta.

mnlsd is a vector with the standard deviations for the variables listed in mnlname. For example, matrix mnlsd = (1, 2, 4). If you do not want to view standardized coefficients, this matrix can be made all 1s.

mnlcatnm is a string with labels for the outcome categories, with each label separated by a space. For example, global mnlcatnm = "B C A". The first label corresponds to the first row of mnlbeta, the second to the second, and so on. The label for the base category is last.

Example

Suppose that you want to compare the logit coefficients estimated from two groups, such as whites and nonwhites from the example used in this chapter. We begin by estimating the logit coefficients for whites:

    . use http://www.stata-press.com/data/lf2/nomocc2, clear
    (1982 General Social Survey)
    . mlogit occ ed exper if white==1, base(5) nolog

    Multinomial logistic regression               Number of obs   =        309
                                                  LR chi2(8)      =     154.60
                                                  Prob > chi2     =     0.0000
    Log likelihood = -388.21313                   Pseudo R2       =     0.1660

         occ        Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    Menial
          ed    -.8307514   .1297238   -6.40   0.000    -1.085005   -.5764973
       exper    -.0338038   .0192045   -1.76   0.078     -.071444    .0038364
       _cons     10.34842   1.779603    5.82   0.000     6.860465    13.83638
    BlueCol
          ed    -.9225522   .1085452   -8.50   0.000    -1.135297   -.7098075
       exper     -.031449   .0150766   -2.09   0.037    -.0609987   -.0018994
       _cons     12.27337   1.507683    8.14   0.000     9.318368    15.22838
    Craft
          ed    -.6876114   .0952882   -7.22   0.000    -.8743729     -.50085
       exper    -.0002589   .0131021   -0.02   0.984    -.0259385    .0254207
       _cons     9.017976    1.36333    6.61   0.000     6.345897    11.69005
    WhiteCol
          ed    -.4196403   .0956209   -4.39   0.000    -.6070539   -.2322268
       exper     .0008478   .0147558    0.06   0.954    -.0280731    .0297687
       _cons     4.972973   1.421146    3.50   0.000     2.187578    7.758368

    (occ==Prof is the base outcome)

Next we compute coefficients for nonwhites:

    . mlogit occ ed exper if white==0, base(5) nolog

    Multinomial logistic regression               Number of obs   =         28
                                                  LR chi2(8)      =      17.79
                                                  Prob > chi2     =     0.0228
    Log likelihood = -32.779416                   Pseudo R2       =     0.2135

         occ        Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    Menial
          ed    -.7012628   .3331146   -2.11   0.035    -1.354155   -.0483701
       exper    -.1108415   .0741488   -1.49   0.135    -.2561705    .0344876
       _cons     12.32779   6.053743    2.04   0.042     .4626714    24.19291
    BlueCol
          ed     -.560695   .3283292   -1.71   0.088    -1.204208    .0828185
       exper    -.0261099   .0682348   -0.38   0.702    -.1598477    .1076279
       _cons     8.063397   6.008358    1.34   0.180    -3.712768    19.83956
    Craft
          ed     -.882502   .3359805   -2.63   0.009    -1.541012   -.2239924
       exper    -.1597929   .0744172   -2.15   0.032     -.305648   -.0139378
       _cons     16.21925   6.059753    2.68   0.007     4.342356    28.09615
    WhiteCol
          ed    -.5311514    .369815   -1.44   0.151    -1.255976    .1936728
       exper    -.0520881   .0838967   -0.62   0.535    -.2165227    .1123464
       _cons     7.821371   6.805367    1.15   0.250    -5.516904    21.15965

    (occ==Prof is the base outcome)

The two sets of coefficients for ed are placed in mnlbeta:

    . matrix mnlbeta = (-.8307514, -.9225522, -.6876114, -.4196403 \
                        -.7012628, -.560695, -.882502, -.5311514)

Rows of the
matrix correspond to the variables (i.e., ed for whites and ed for nonwhites) since this was the easiest way to enter the coefficients. For mlogplot, the columns must correspond to variables, so we transpose the matrix:

    . matrix mnlbeta = mnlbeta'

We assign names to the columns using mnlname and to the rows using mnlcatnm (where the last element is the name of the reference outcome):

    . global mnlname = "White NonWhite"
    . global mnlcatnm = "Menial BlueCol Craft WhiteCol Prof"

We named the coefficients for ed for whites, White, and the coefficients for ed for nonwhites, NonWhite, as this will make the plot clearer. Next we compute the standard deviation of ed:

    . summarize ed

6.7 Multinomial probit model with IIA

Assume that an individual chooses the alternative that provides the greatest utility (that is, alternative m is chosen if u_im > u_ij for all j ≠ m). The choice that a person makes under these assumptions will not change if the utility associated with each alternative changes by some fixed amount, say, δ. That is, if u_im > u_ij, then u_im + δ > u_ij + δ. Thus the choice is based on the difference in the utilities between alternatives. We can incorporate this idea into the model by taking the difference in the utilities for two alternatives. To illustrate this, assume that there are three alternatives. We can consider the utility of each alternative relative to some base alternative. It does not matter which alternative is chosen as the base, so we assume that each utility is compared with alternative 1. Accordingly, we have

    u_i1 - u_i1 = 0
    u_i2 - u_i1 = x_i(β_2 - β_1) + (ε_i2 - ε_i1)
    u_i3 - u_i1 = x_i(β_3 - β_1) + (ε_i3 - ε_i1)

If we define u*_im = u_im - u_i1, ε*_im = ε_im - ε_i1, and β_{m|1} = β_m - β_1, the model can be written as

    u*_i2 = x_i β_{2|1} + ε*_i2
    u*_i3 = x_i β_{3|1} + ε*_i3

The specific form of the model depends on the distribution of the error terms. Assuming that the εs have an extreme value distribution with mean 0 and variance π²/6 leads to the MNLM that we discussed with respect to mlogit.
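This link between extreme value errors and the MNLM can be checked by simulation. The sketch below, in Python with made-up systematic utilities, draws iid Gumbel (type I extreme value) errors, picks the utility-maximizing alternative, and compares the simulated choice frequencies with the closed-form logit probabilities. (A location shift of the errors does not matter, since only utility differences affect the choice.)

```python
import math
import random

random.seed(0)

# Made-up systematic utilities x_i * beta_m for three alternatives
# (alternative 1 is normalized to 0); these are not estimates from the chapter.
xb = [0.0, 0.5, 1.2]
draws = 100_000

counts = [0, 0, 0]
for _ in range(draws):
    # Add an iid Gumbel error to each utility; -log(-log(U)) with U uniform
    # on (0, 1) is a standard Gumbel draw via the inverse CDF.
    u = [v - math.log(-math.log(random.random())) for v in xb]
    counts[u.index(max(u))] += 1

simulated = [c / draws for c in counts]

# Closed-form multinomial logit probabilities: exp(xb_m) / sum_j exp(xb_j)
denom = sum(math.exp(v) for v in xb)
closed_form = [math.exp(v) / denom for v in xb]
```

With 100,000 draws, the simulated frequencies agree with the logit formula to within sampling error, which is the result the derivation above asserts.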
Assuming that the εs have a normal distribution leads to a probit-type model. To understand the model fitted by mprobit and how it relates to the usual binary probit model, we need to pay careful attention to the assumed variance of the errors. The binary probit model fitted by probit makes the usual assumption that Var(ε_j) = 1/2, so Var(ε*) = Var(ε_j) + Var(ε_1) = 1. Since we assume that the errors are uncorrelated, Cov(ε_j, ε_1) = 0. Using our earlier example for labor force participation, we can fit the binary probit model: