ORGANIZATIONAL BEHAVIOR AND HUMAN DECISION PROCESSES
Vol. 67, No. 3, September 1996, pp. 247-257 (Article No. 0077)

The Evaluability Hypothesis: An Explanation for Preference Reversals between Joint and Separate Evaluations of Alternatives

CHRISTOPHER K. HSEE
Center for Decision Research, Graduate School of Business, The University of Chicago

Author note: This research is supported by fourth-quarter funding provided by the Graduate School of Business, University of Chicago. Correspondence and reprint requests should be addressed to Christopher K. Hsee, Graduate School of Business, University of Chicago, 1101 East 58th Street, Chicago, IL 60637. E-mail: christopher.hsee@gsb.uchicago.edu. I thank Sally Blount, David Budescu, Bill Goldstein, Josh Klayman, Rick Larrick, George Loewenstein, Paul Slovic, Dick Thaler, and Elke Weber for their helpful comments on drafts of this article.

This research investigates a particular type of preference reversal (PR), existing between joint evaluation, where two stimulus options are evaluated side by side simultaneously, and separate evaluation, where these options are evaluated separately. I first examine how this PR differs from other types of PRs and review studies demonstrating this PR. I then propose an explanation, called the evaluability hypothesis, and report experiments that tested this hypothesis. According to this hypothesis, PRs between joint and separate evaluations occur because one of the attributes involved in the options is hard to evaluate independently and another attribute is relatively easy to evaluate independently. I conclude by discussing prescriptive implications of this research. © 1996 Academic Press, Inc.

Normative decision theories assume that people have stable and consistent preferences regardless of how the preferences are elicited. An increasing amount of evidence suggests otherwise; for example, people may exhibit different or even reversed preferences for the same options in two normatively equivalent evaluation conditions. Most preference reversals (PRs) documented in the literature are between choice and judgment. One widely studied type of choice-judgment PR is between choice and pricing (e.g., Grether & Plott, 1979; Lichtenstein & Slovic, 1971; Slovic & Lichtenstein, 1968). In choice, participants choose between two alternatives, typically a high-payoff/low-probability gamble and a low-payoff/high-probability gamble. In pricing, participants indicate their minimum selling price for each gamble.

Another widely studied type of choice-judgment PR is between choice and matching (e.g., Tversky, Sattath, & Slovic, 1988). In choice, participants choose between two alternatives. In matching, participants are presented with the same alternatives, but some information is missing, and participants' task is to fill in that missing information so that the two options are equally attractive.

In both the choice-pricing and the choice-matching paradigms, reversals occur between tasks that involve different evaluation scales (Bazerman, Loewenstein, & White, 1992; Goldstein & Einhorn, 1987). In the choice-pricing paradigm, the evaluation scale for choice is relative acceptability and that for pricing is money. In the choice-matching paradigm, the evaluation scale for choice is, again, relative acceptability, and that for matching is probability or value estimation.

The present research investigates a different type of PR than the conventionally studied choice-judgment PRs. It is between tasks that have identical (or similar) evaluation scales but different evaluation modes. Evaluation mode refers to whether the stimulus options are presented side by side and evaluated by the same people (the joint evaluation mode), or presented separately and evaluated by two different groups of people (the separate evaluation mode) (cf. Goldstein & Einhorn, 1987). In this article, I first review studies that demonstrate this type of PR. Next I propose an explanation and describe several other studies that tested this explanation. Finally, I discuss prescriptive implications of this research.
STUDY 1

Method

The purpose of this study is to illustrate the joint-separate evaluation PR effect. The study involved the evaluations of two hypothetical second-hand music dictionaries:

                        Dictionary A           Dictionary B
Year of publication:    1993                   1993
Number of entries:      10,000                 20,000
Any defects?            No, it's like new.     Yes, the cover is torn; otherwise it's like new.

The questionnaire for this study had three between-subject versions: joint-evaluation, separate-evaluation-A, and separate-evaluation-B. In each version, participants were asked to assume that they were a music major, that they were looking for a music dictionary in a used book store, and that they planned to spend between $10 and $50. In the joint-evaluation condition, participants were told that there were two music dictionaries in the store. They were then presented with the information about both dictionaries (as listed above) and asked how much they were willing to pay for each dictionary. In each of the separate-evaluation conditions, participants were told that there was only one music dictionary in the store; they were presented with the information on one of the dictionaries and asked how much they were willing to pay. (Because there was only one dictionary in each separate-evaluation condition, the label "A" or "B" was not used.) Note that across the three conditions the evaluation scale was held constant, namely, willingness-to-pay (WTP) price.

Respondents were 116 unpaid college students from the University of Chicago and the University of Illinois at Chicago. They randomly received one of the three versions of the questionnaire and completed it individually.

Results and Discussion

The results are summarized in Fig. 1.[1] As the figure shows, there was a PR between joint and separate evaluations. In joint evaluation, willingness-to-pay (WTP) prices were higher for Dictionary B than for Dictionary A (t = 7.11, p < .001), but in separate evaluation WTP values were higher for Dictionary A than for Dictionary B (t = 1.69, p = .1). The PR was highly significant (t = 4.56, p < .001).[2] Note that this PR occurred between conditions that shared a constant WTP scale; the only difference lay in whether the stimulus options were evaluated jointly or separately.

FIG. 1. Mean WTP values for Dictionary A and Dictionary B in Study 1. The numbers in parentheses indicate numbers of participants.

[1] To prevent their undue influence, extreme WTP values, defined here as those at least three standard deviations from the mean, were excluded prior to analysis. This footnote applies to all the studies reported in this article.

[2] To assess the significance of a joint-separate evaluation PR, one needs to compare the difference between the valuations of A and B in joint evaluation with that in separate evaluation. Note that the difference in joint evaluation is within subjects and that in separate evaluation is between subjects. To meet this need, the following t statistic is used: t = ((MJA - MJB) - (MSA - MSB)) / sqrt(SJ^2/NJ + SSA^2/NSA + SSB^2/NSB), where MJA and MJB are the means for A and for B in joint evaluation, MSA and MSB are the means for A and for B in separate evaluation, SJ^2, SSA^2, and SSB^2 are the corresponding variances, and NJ, NSA, and NSB are the numbers of participants in the joint and the two separate-evaluation conditions, respectively. I thank Jimmy Ye for his help on this statistic.
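As a concrete reading of this footnote, the short Python sketch below (my own illustration, not code from the article) computes the statistic from raw willingness-to-pay responses. It assumes that SJ^2 refers to the variance of the paired A - B differences in joint evaluation, which is how I interpret the footnote; the numbers fed to the function are hypothetical toy data, not the study's data.

# Sketch of the joint-separate PR t statistic from Footnote 2 (my own code,
# not the article's). SJ^2 is taken to be the variance of the paired (A - B)
# differences in joint evaluation; SSA^2 and SSB^2 are the variances of the
# two separate-evaluation groups. Toy data only.
from math import sqrt
from statistics import mean, variance  # variance() = sample variance (n - 1)

def joint_separate_t(joint_diffs, sep_a, sep_b):
    """t = ((MJA - MJB) - (MSA - MSB)) / sqrt(SJ^2/NJ + SSA^2/NSA + SSB^2/NSB)."""
    numerator = mean(joint_diffs) - (mean(sep_a) - mean(sep_b))
    denominator = sqrt(variance(joint_diffs) / len(joint_diffs)
                       + variance(sep_a) / len(sep_a)
                       + variance(sep_b) / len(sep_b))
    return numerator / denominator

# Hypothetical WTP data: per-participant (A - B) differences in joint
# evaluation, and raw WTP values from the two separate-evaluation groups.
print(joint_separate_t([-10, -8, -12, -9], [27, 30, 25, 28], [20, 22, 18, 21]))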
Joint-separate evaluation PRs have been documented in other contexts as well. One of the original demonstrations of joint-separate evaluation PRs was provided by Bazerman, Loewenstein, and White (1992). Participants read a description of a dispute between themselves and their neighbor and then evaluated different potential resolutions of the dispute. Among the various resolution options were the following two:

A: $600 for self and $800 for neighbor
B: $500 for self and $500 for neighbor

In joint evaluation, participants were presented with pairs of options, such as the one listed above, and asked to indicate which was more acceptable or more satisfying. In separate evaluation, participants were presented with these options one at a time and asked to indicate on a rating scale how acceptable or how satisfying each option was. Overall, rates of preference reversal between joint and separate evaluations were quite high. For example, of the two options listed above, most participants rated Option A more favorably in joint evaluation, but most rated Option B more favorably in separate evaluation. Bazerman, Schroth, Shah, Diekmann, and Tenbrunsel (1994) replicated this PR with business students in the context of hypothetical job offers that differed in terms of (a) salary for oneself and (b) salary for others, or differed in terms of (a) salary for oneself and (b) fairness of the grievance procedure of the company (see Bazerman et al., 1994, for details). Similar PRs have also been obtained by Hsee (1993) in the context of salary preferences and by Lowenthal (1993) in the context of preferences for political candidates.

It should be noted that joint-separate evaluation PRs are different from the observation that effects revealed in a within-subject design may disappear in a between-subject design (e.g., Fox & Tversky, 1995). In a joint-separate evaluation PR, the preference revealed in joint evaluation does not disappear in separate evaluation; it reverses itself.

Joint-separate evaluation PRs cannot be easily accounted for by theories designed to explain choice-pricing and choice-matching PRs. The standard explanation for choice-pricing PRs is the compatibility principle (Slovic, Griffin, & Tversky, 1990). According to this principle, a given attribute will carry more weight in a response that is on the same scale as this attribute than in a response that is on a different scale. For example, monetary attributes will loom larger if the evaluation is made on a monetary scale, such as in pricing, than if it is made in terms of choice. Evidently, this principle is concerned with PRs involving different evaluation scales and is not applicable to joint-separate evaluation PRs. The standard explanation for choice-matching PRs is the prominence principle (Tversky et al., 1988; see also Fischer & Hawkins, 1993). It posits that the most prominent attribute of the stimulus options has a greater weight in choice than in matching. However, there are substantial differences between choice and joint evaluation, and between matching and independent evaluation. For example, in matching the evaluator is exposed to both stimulus options and performs careful trade-off analyses (Tversky et al., 1988); in separate evaluation the evaluator is presented with only one option and cannot perform trade-off analyses. Moreover, as will be demonstrated later in this article, joint-separate evaluation PRs can be turned "on" and "off" by varying the relative evaluability of the attributes, even if the relative prominence of those attributes remains the same.
THE EVALUABILITY HYPOTHESIS

In this section I propose an explanation for joint-separate evaluation PRs, called the evaluability hypothesis.[3] Unless otherwise specified, the discussion below assumes that there are two options to be evaluated and that the two options vary on two attributes:

             Attribute 1    Attribute 2
Option A:    a1             a2
Option B:    b1             b2

Also assume that Option A is superior to Option B on one of the attributes and Option B superior to Option A on the other attribute. The two options in Study 1 comply with this pattern. The differences between the two dictionaries can be interpreted as follows:

                 Entries    Defects
Dictionary A:    10,000     no
Dictionary B:    20,000     yes

Dictionary A was superior on the Defects attribute and Dictionary B superior on the Entries attribute.

[3] Loewenstein, Blount, and Bazerman (1994) proposed a similar account of the joint-separate evaluation preference reversal, which they cast in terms of attribute ambiguity rather than (in)evaluability. Although we developed our ideas independently, I have benefited from discussions with those authors.

According to the evaluability hypothesis, joint-separate evaluation PRs occur because one of the attributes involved in the stimulus options is hard to evaluate independently and the other attribute is relatively easy to evaluate independently. To say that an attribute is hard to evaluate independently means that the evaluator does not know how good a given value on the attribute is without comparisons; to say that an attribute is easy to evaluate independently means that the evaluator knows how good the value is. In Study 1, for example, the Entries attribute was hard to evaluate independently. Without something to compare with, most students would not know how good a dictionary with 10,000 entries (or with 20,000 entries) is. On the other hand, the Defects attribute was relatively easy to evaluate independently. Even without a direct comparison, most people would find a defective dictionary unattractive and a like-new dictionary attractive.
The relative impact of the hard-to-evaluate and the easy-to-evaluate attributes will vary depending on the mode of evaluation. In separate evaluation, because people do not know how to evaluate an option's value on the hard-to-evaluate attribute, they have to base their evaluation chiefly on the easy-to-evaluate attribute alone. For example, in Study 1, because those evaluating only one of the dictionaries would not know how good its number of entries was, they would be forced to base their evaluation of the dictionary on its cosmetic condition alone. In joint evaluation, people could compare one option against the other, and this comparison would increase the evaluability of the otherwise hard-to-evaluate attribute. For example, in the joint-evaluation condition of Study 1, respondents could compare one dictionary against the other, and through this comparison they would recognize that a dictionary with 20,000 entries was relatively good and one with only 10,000 entries not as good. In short, separate evaluation is determined primarily by the easy-to-evaluate attribute and not by the hard-to-evaluate attribute, whereas joint evaluation is influenced by both the hard-to-evaluate and the easy-to-evaluate attributes.

Based on the preceding discussion, the evaluability hypothesis can be stated as follows: When two stimulus options involve a trade-off between a hard-to-evaluate attribute and an easy-to-evaluate attribute, the hard-to-evaluate attribute has a lesser impact in separate evaluation than in joint evaluation, and the easy-to-evaluate attribute has a greater impact. In terms of Study 1, this hypothesis implies that the Entries attribute had a lesser impact in separate evaluation than in joint evaluation, and the Defects attribute had a greater impact.

The evaluability hypothesis makes a specific prediction for the direction of joint-separate evaluation PRs. Because the hard-to-evaluate attribute loses impact and the easy-to-evaluate attribute gains impact from joint evaluation to separate evaluation, the direction of any joint-separate evaluation PR will always be from the option superior on the hard-to-evaluate attribute in joint evaluation to the option superior on the easy-to-evaluate attribute in separate evaluation. Indeed, the PR observed in Study 1 conforms to this pattern. (Of course, in order for a PR to happen, the option superior on the hard-to-evaluate attribute must be preferred in joint evaluation; otherwise there would be no room for a PR. Unless otherwise specified, this condition is assumed to be true in the rest of this article.)
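One way to make this logic concrete is to treat an overall evaluation as a weighted average of the two attributes and let the weight of the hard-to-evaluate attribute shrink in separate evaluation. The sketch below is my own minimal formalization, not a model proposed in the article; the weights and the 0-1 attribute scores are hypothetical, loosely patterned after the Study 1 dictionaries.

# A minimal sketch (my own formalization, not a model from the article) of the
# evaluability hypothesis as a weighted evaluation: the hard-to-evaluate
# attribute gets less weight in separate than in joint evaluation.
def overall_value(easy_score, hard_score, hard_weight):
    return (1 - hard_weight) * easy_score + hard_weight * hard_score

# Dictionary A: like new (easy attribute high), only 10,000 entries (hard low).
# Dictionary B: torn cover (easy attribute low), 20,000 entries (hard high).
A = {"easy": 1.0, "hard": 0.3}
B = {"easy": 0.2, "hard": 0.9}

for mode, w_hard in [("joint evaluation", 0.6), ("separate evaluation", 0.2)]:
    vA = overall_value(A["easy"], A["hard"], w_hard)
    vB = overall_value(B["easy"], B["hard"], w_hard)
    preferred = "B" if vB > vA else "A"
    print(f"{mode}: value(A) = {vA:.2f}, value(B) = {vB:.2f} -> prefer {preferred}")

With the larger weight on the hard-to-evaluate attribute, Option B wins; when that weight collapses in separate evaluation, Option A wins, which is the predicted direction of the reversal.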
The evaluability hypothesis is also consistent with the finding of the self-neighbor study by Bazerman et al. (1992). In that study, the two outcomes mentioned above can be interpreted as varying on two attributes: (a) payoff to oneself and (b) whether this payoff equaled the payoff to one's neighbor:

             Payoff    Equality
Option A:    $600      unequal
Option B:    $500      equal

The payoff attribute was presumably relatively hard to evaluate independently. In contrast, the equality attribute was relatively easy to evaluate independently; even without a direct comparison, most people would find an equal settlement appealing and an unequal settlement unappealing. Again, the PR observed in that study was in the direction predicted by the evaluability hypothesis, that is, from Option A (superior on the hard-to-evaluate attribute) in joint evaluation to Option B (superior on the easy-to-evaluate attribute) in separate evaluation. Similar analyses can be applied to other joint-separate evaluation PR findings (e.g., Bazerman et al., 1994; Hsee, 1993; Lowenthal, 1993).

So far, the evaluability hypothesis has only been used to make post hoc explanations for already observed PRs. The following studies were designed to test whether the evaluability hypothesis is capable of making predictions. These studies each involved options that varied on a hard-to-evaluate attribute and an easy-to-evaluate attribute. The evaluability hypothesis was used to predict the direction of a PR. Study 2 used naturally occurring hard-to-evaluate and naturally occurring easy-to-evaluate attributes. In Studies 3 and 4, whether an attribute was hard or easy to evaluate was manipulated empirically.

STUDY 2

Study 2 differed from Study 1 in two major respects. First, as mentioned earlier, Study 2 was designed to test the evaluability hypothesis rather than simply to demonstrate a PR. Second, in Study 1 as well as in the other studies reviewed previously, the evaluability of an attribute was confounded with the continuous/dichotomous nature of the attribute; the hard-to-evaluate attribute was always a continuous variable and the easy-to-evaluate attribute always a dichotomous variable. In Study 2, both the hard-to-evaluate and the easy-to-evaluate attributes were continuous variables.

Method

Design and stimuli. This study involved the evaluations of two hypothetical job candidates for a computer programmer position. The programmer was expected to use a computer language called KY. The two candidates were:

                      Candidate A                          Candidate B
Education:            B.S. in computer science from UIC    B.S. in computer science from UIC
GPA from UIC:         4.9                                  3.0
Experience with KY:   has written 10 KY programs           has written 70 KY programs
                      in the last 2 years                  in the last 2 years

("UIC" stands for the University of Illinois at Chicago. The participants were students of that university and knew the abbreviation. GPA at UIC is on a 5-point scale.) Note that the two candidates differed on two attributes, GPA and Experience. Both are continuous variables. For ease of discussion later, let us summarize the differences between the candidates in the following format:

                 Experience         GPA
Candidate A:     10 KY programs     4.9
Candidate B:     70 KY programs     3.0

As in Study 1, the questionnaire for this study had three between-subject versions: joint-evaluation, separate-evaluation-A, and separate-evaluation-B. In all three versions, participants were asked to imagine that they were the owner of a consulting firm, that they were looking for a computer programmer to use a computer language called KY, and that they planned to pay the person between $20,000 and $40,000 per year. In the joint-evaluation condition, participants evaluated both candidates. In each separate-evaluation condition, they evaluated only one of the candidates. The evaluation scale was constant across the three versions: willingness-to-pay salary.

Measure of evaluability. To assess which attribute was hard to evaluate independently and which was easy, participants in the two separate-evaluation conditions were asked the following questions after they had indicated their WTP salaries for the candidate: (a) "Do you have any idea how good a GPA of 4.9 (3.0) from UIC is?" and (b) "If someone has written 10 (70) KY programs in the last 2 years, do you have any idea how experienced he/she is with KY?" (The numbers preceding the parentheses were for the separate-evaluation-A condition and those in the parentheses were for the separate-evaluation-B condition.) To answer each question, participants chose among four options, ranging from (1) "I don't have any idea" to (4) "I have a clear idea." These options served as an evaluability scale, where a greater number indicated greater evaluability.

Participants and procedure. Respondents were 112 college students from the University of Illinois at Chicago. They randomly received one of the three versions of the questionnaire and completed it individually. Upon completion each participant received a candy bar as compensation.
Results and Discussion

Measure of evaluability. The mean evaluability score for GPA was 3.7 and that for Experience was 2.1. The difference was significant (t = 11.79, p < .001). These results established that GPA was a relatively easy-to-evaluate attribute and Experience a relatively hard-to-evaluate attribute.

Willingness-to-pay values. According to the evaluability hypothesis, there was likely to be a joint-separate evaluation PR, because one of the attributes involved in the stimulus options (Experience) was hard to evaluate independently and the other attribute (GPA) was relatively easy. Given that Candidate A was superior on GPA and Candidate B superior on Experience, the direction of the PR would be from Candidate B in joint evaluation to Candidate A in separate evaluation.

The results, summarized in Fig. 2, were consistent with these predictions. There was a significant PR between joint and separate evaluations (t = 4.94, p < .001). In joint evaluation, WTP salaries were higher for Candidate B than for Candidate A (t = 1.65, p = .1). In separate evaluation, WTP values were higher for Candidate A than for Candidate B (t = 5.50, p < .001).

FIG. 2. Mean WTP salaries for Candidate A and Candidate B in Study 2. The numbers in parentheses indicate numbers of participants.

This study yields two important implications. First, joint-separate evaluation PRs exist not only when one attribute is dichotomous and the other attribute continuous, but also when both attributes are continuous. Second, this study shows that the evaluability hypothesis is not only able to provide post-dictions for already-observed PRs, but also able to provide predictions.

STUDY 3

In all of the studies discussed thus far, whether an attribute was hard or easy to evaluate independently was a characteristic of the attribute per se and was never manipulated empirically. In the two studies described below, the evaluability of an attribute was manipulated empirically. As mentioned earlier, the evaluability hypothesis asserts that joint-separate evaluation PRs occur because one of the attributes involved in the stimulus options is hard to evaluate independently while the other attribute is relatively easy, and the relative impact of the two attributes changes from joint evaluation to separate evaluation. It implies that if both attributes are hard to evaluate independently, or if both are easy to evaluate independently, then the relative impact of the two attributes will not change between the two evaluation modes, and there will be no PR.

If the foregoing analysis is correct, then a PR can be turned either "on" or "off" by varying the relative evaluability of the attributes. Study 3 and Study 4 were designed to test this intuition.

Study 3 involved two evaluability conditions, Hard/Hard and Hard/Easy: In the Hard/Hard condition, both attributes were hard to evaluate independently; in the Hard/Easy condition, one was hard and one easy. The evaluability of an attribute was manipulated by whether or not participants were informed of the meaning of the attribute. In the Hard/Hard condition, the values on both attributes (Clarity and Warranty of a TV) were meaningless numbers. In the Hard/Easy condition, participants were told that the Warranty rating meant the length of the warranty; presumably the Warranty attribute would be easier to evaluate independently once participants knew its meaning. The prediction for this study was that there would be no PR between joint and separate evaluations in the Hard/Hard condition and that there would be one in the Hard/Easy condition.
Method

Design and stimuli. Study 3 involved the evaluations of two hypothetical TVs; they varied on two attributes, Clarity and Warranty:

         Clarity    Warranty
TV A:    90         9
TV B:    40         18

The questionnaire for this study had three versions, and each included two parts. They constituted 3 Evaluation Mode × 2 Evaluability conditions. In all versions, participants were asked to assume that they were shopping for a basic 20-inch color TV and that most such TVs would cost around $200. Participants were also asked to assume that they were in a store where the salespeople knew nothing about TVs and that the tag on the TV(s) contained two indices, Clarity and Warranty. In the joint-evaluation condition, participants indicated their WTP prices for both TVs; in each separate-evaluation condition, for only one of the TVs.[4]

The two Evaluability conditions were Hard/Hard and Hard/Easy. In the Hard/Hard condition, both the Clarity and the Warranty ratings were meaningless numbers and hence both were hard to evaluate independently. Participants were simply told that Clarity reflected how clear the picture was, that Warranty reflected how good the warranty was, and that for both indices, the higher the number, the better. In the Hard/Easy condition, Clarity remained hard to evaluate, but Warranty was made relatively easy to evaluate by telling participants that the Warranty rating indicated the length, in months, of the warranty.

The two Evaluability conditions were presented within-subjects. Because the Hard/Easy condition contained information not available in the Hard/Hard condition but not vice versa, the Hard/Hard condition always preceded the Hard/Easy condition.

Participants and procedure. Respondents were 98 college students from the University of Chicago who completed multiple questionnaires and received a cash payment. Each participant received one of the three versions of the questionnaire and completed it individually.

[4] In Study 3 and Study 4, participants in the joint-evaluation condition first indicated whether their WTP price was higher for A or for B before indicating a specific WTP price for each option. Two participants in Study 3 and four in Study 4 gave contradictory responses, i.e., said that they were willing to pay more for one option but gave a higher WTP price for the other. These responses were excluded.
Results and Discussion

According to the evaluability hypothesis, there would be no PR in the Hard/Hard condition, and there was likely to be a PR in the Hard/Easy condition. Because TV A was superior on the hard-to-evaluate attribute (Clarity) and TV B superior on the easy-to-evaluate attribute (Warranty), the direction of the PR would be from TV A in joint evaluation to TV B in separate evaluation.

The results, summarized in Fig. 3, confirmed these predictions. In the Hard/Hard condition, there was no PR: WTP values were higher for TV A than for TV B in both joint evaluation (t = 5.5, p < .001) and separate evaluation (although the difference in separate evaluation was not significant). In the Hard/Easy condition, there was a significant PR (t = 3.47, p < .01): In joint evaluation, WTP values were higher for TV A than for TV B (t = 4.33, p < .001), but in separate evaluation WTP values were higher for TV B than for TV A (t = 1.56, p = .12).

FIG. 3. Mean WTP values for TV A and TV B in Study 3. The numbers in parentheses indicate numbers of participants.

STUDY 4

Study 4 was a replication of Study 3 with the following main differences. First, like Study 3, Study 4 had two Evaluability conditions, but instead of Hard/Hard and Hard/Easy, the two conditions were Hard/Easy and Easy/Easy. It was predicted that there would be a PR in the Hard/Easy condition, but no PR in the Easy/Easy condition. Second, the two Evaluability conditions were between-subjects rather than within-subjects. Third, evaluability was manipulated differently in Study 4 than in Study 3. In the Hard/Easy condition, the hard-to-evaluate attribute was an unfamiliar variable (total harmonic distortion of a CD changer); in the Easy/Easy condition, the possible range of the total-harmonic-distortion attribute was provided. It was expected that an unfamiliar attribute would become relatively easier to evaluate if one knows the range of the attribute and hence knows where the focal value falls in this range. Finally, a manipulation check was used in Study 4 to verify the effectiveness of the evaluability manipulation.

Method

Design and stimuli. This study involved the evaluations of two CD changers (i.e., multiple compact disc players):

                  CD Changer A       CD Changer B
Brand:            JVC                JVC
CD capacity:      can hold 5 CDs     can hold 20 CDs
Sound quality:    THD = .003%        THD = .01%
Warranty:         1 year             1 year

Note that the two CD changers varied on two attributes: CD-capacity and sound quality; the latter was indexed by THD. It was explained to participants in all conditions that THD stands for total harmonic distortion and that the smaller the THD, the better the sound quality. For ease of discussion, let us summarize the differences between the two CD changers as follows:

                    THD       CD Capacity
CD Changer A:       .003%     5 CDs
CD Changer B:       .01%      20 CDs

The questionnaire for this study had six between-subject versions; they constituted 3 Evaluation Mode × 2 Evaluability conditions. In all conditions, participants were asked to assume that they were shopping for a CD changer in a department store and that the price of a CD changer would range from $150 to $300. In the joint-evaluation condition, participants indicated their WTP prices for both CD changers; in each separate-evaluation condition, for only one of the CD changers.

The two Evaluability conditions were Hard/Easy and Easy/Easy. In the Hard/Easy condition, participants received no information about either THD or CD-capacity other than that described previously. It was assumed that THD was hard to evaluate independently and CD-capacity relatively easy. Without a comparison, most students would not know whether a given THD rating (e.g., .01%) was good or bad, but they would have some idea of how many CDs a CD changer could hold and whether a CD changer that can hold 5 CDs (or 20 CDs) was good or not.
In the Easy/Easy condition, participants were given the following additional information about THD: "For most CD changers on the market, THD ratings range from .002% (best) to .012% (worst)." This information was designed to make THD easier to evaluate independently. With this information, participants in the separate-evaluation conditions would have some idea where the given THD rating fell in the range and hence whether the rating was good or bad. Participants received no additional information about CD-capacity.

Measure of evaluability. To check that the evaluability manipulation was effective, participants in the two separate-evaluation conditions were asked the following questions after they had indicated their WTP prices: "Do you have any idea how good a THD rating of .003% (.01%) is?" and "Do you have any idea how large a CD capacity of 5 (20) CDs is?" (The numbers preceding the parentheses were for the separate-evaluation-A condition and those in the parentheses were for the separate-evaluation-B condition.) As in Study 2, answers to those questions ranged from 1 to 4, with greater numbers indicating greater evaluability.

Participants and procedure. Respondents were 202 college students from the University of Illinois at Chicago. They randomly received one of the six versions of the questionnaire and completed it individually. Each participant received a candy bar as compensation.

Results and Discussion

Measure of evaluability. Mean evaluability scores for THD and CD-capacity in the Hard/Easy condition were 1.98 and 3.25, respectively, and in the Easy/Easy condition were 2.53 and 3.22. A 2 Attribute (THD versus CD-capacity) × 2 Evaluability (Hard/Easy versus Easy/Easy) analysis of variance revealed a significant interaction effect (F(1,135) = 9.40, p < .01) and a significant main effect for Attribute (F(1,135) = 111.79, p < .001). Planned comparisons indicated that evaluability scores for THD were significantly higher in the Easy/Easy condition than in the Hard/Easy condition (t = 2.92, p < .01), suggesting that the evaluability manipulation for THD was effective. There were virtually no differences in the evaluability of CD-capacity between the Hard/Easy and the Easy/Easy conditions.
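For readers who want to see the form of this manipulation check, the sketch below runs a 2 (Attribute) × 2 (Evaluability) ANOVA on 1-4 evaluability ratings. It is my own illustration, not the original analysis: it treats both factors as between-subjects for simplicity, and the small data frame is invented for demonstration rather than taken from the study.

# Illustrative 2 x 2 ANOVA on evaluability ratings (my own sketch; toy data).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One row per rating (1 = "no idea" ... 4 = "clear idea"); values are invented.
df = pd.DataFrame({
    "rating":       [2, 1, 2, 3, 3, 4, 3, 2, 3, 2, 3, 4, 3, 4, 3, 3],
    "attribute":    ["THD", "THD", "THD", "THD",
                     "CDcap", "CDcap", "CDcap", "CDcap"] * 2,
    "evaluability": ["HardEasy"] * 8 + ["EasyEasy"] * 8,
})

# The interaction term tests whether the range information raised the
# evaluability of THD more than it changed the evaluability of CD-capacity.
model = ols("rating ~ C(attribute) * C(evaluability)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))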
Willingness-to-pay values. Based on the evaluability hypothesis, the following predictions were made: A PR was likely to occur in the Hard/Easy condition, but not in the Easy/Easy condition. In the Hard/Easy condition, because CD Changer A was superior on the hard-to-evaluate attribute (THD) and CD Changer B superior on the easy-to-evaluate attribute (CD-capacity), the direction of the PR in that condition would be from CD Changer A in joint evaluation to CD Changer B in separate evaluation.

The results, summarized in Fig. 4, confirmed these predictions. In the Hard/Easy condition, there was a significant PR (t = 3.32, p < .01), and the direction of the PR was consistent with the evaluability hypothesis: In joint evaluation, WTP values were higher for CD Changer A than for CD Changer B (t = 1.96, p = .06), but in separate evaluation WTP values were higher for CD Changer B than for CD Changer A (t = 2.70, p < .01). In the Easy/Easy condition, the PR disappeared (t < 1, n.s.). WTP values were higher for CD Changer A than for CD Changer B in both joint evaluation (t = 2.81, p < .01) and separate evaluation (t = 2.92, p < .01).

FIG. 4. Mean WTP values for CD Changer A and CD Changer B in Study 4. The numbers in parentheses indicate numbers of participants.

Study 4 corroborates Study 3 by showing that the presence of a joint-separate evaluation PR depends on the relative evaluability of the attributes involved in the stimulus options. A PR is likely to occur if one attribute is hard to evaluate independently and the other is relatively easy to evaluate independently. The chance of a PR is greatly mitigated if both attributes are hard to evaluate independently (as in the Hard/Hard condition of Study 3) or if both attributes are easy to evaluate independently (as in the Easy/Easy condition of Study 4).
The findings occurs simply because one attribute is more prominentof these studies provide consistent support for the evathan the other, then the Evaluability manipulationsluability hypothesis. in Study 3 and Study 4 should not have affected theSeveral potential questions about this research need existence of a PR, because these manipulations wouldto be addressed. First, what determines whether an not alter the relative prominence of the attributes.attribute is hard or easy to evaluate independently? Finally, one may wonder if there are other explana-My speculation is that it depends on how much knowltions for joint-separate evaluation PRs than the evalua-edge the evaluator has about that attribute, especially bility hypothesis. The answer is probably yes. Gener-about the value distribution of that attribute. An attrially speaking, the evaluability hypothesis is most ap-bute will be hard to evaluate independently if the evalplicable to PRs where the options vary on two distinctuator does not know its distribution information, such attributes, and one of the attributes is markedly harderas the possible values of the attribute, its best and to evaluate independently than the other attribute.worst values, and so forth. Without such knowledge, PRs of other forms may or may not be explained bythe evaluator will not know where a given value on the evaluability hypothesis, and there are also otherthat attribute lies in relation to the other values on the possible explanations. For example, when two optionsattribute and hence will not know how to evaluate it. involving a tradeoff between two attributes, and theIndeed, the evaluability manipulations in Study 3 and values of the options are of the same sign on one attri-Study 4 were based on these intuitions. For example, bute but of different signs on the other attribute, a PRin the Hard/Easy condition of Study 4, THD was hard may emerge between joint and separate evaluationsto evaluate independently because most participants did not have any distribution information of THD. In (see Hsee, 1994, for a demonstration). This type of PR / a706$$2632 09-16-96 13:33:33 obhas AP: OBHDP 256 CHRISTOPHER K. HSEE can be explained in terms of the different evaluation separately. Similarly, when a person is applying for a job, she may have the option of either being interviewedmodels people use in joint versus in separate evaluaat the same time as another job candidate or beingtion and the curvilinearity of the utility functions of interviewed on a different day than the other candi-the attributes (Bazerman et al., 1992; Hsee, 1994). Gendate. In the same-time scenario, the candidate is likelyerally speaking, people use an additive difference to be evaluated jointly with the other candidate. In themodel (Tversky, 1969) in joint evaluation and an addidifferent-day case, especially if the two interview daystive model in separate evaluation, and most attributes are scheduled far enough apart, she is likely to be eval-have a concave utility function in the positive domain uated separately from her competitor. Which optionand a convex utility function in the negative domain should the manufacturer adopt? Which should the job(e.g., Kahneman & Tversky, 1979). From these ascandidate adopt? 
As another example, consider a study reported by Irwin, Slovic, Lichtenstein, and McClelland (1993). Participants stated their WTP values for various options, including (a) improvement in air quality in Denver and (b) improvement in a VCR. In joint evaluation, WTP values were higher for improvements in air quality, but in separate evaluation WTP values were higher for improvements in consumer products (see also Kahneman & Ritov, 1994, for similar findings). This type of PR is quite different from the one explored in the present research; for example, the options in Irwin et al. (1993) were of two entirely different categories (air quality versus consumer products) and shared no common attributes. Irwin et al. (1993) considered their finding an instance of choice-judgment PR and explained it using the prominence principle. Another possible explanation is related to elastic justification (Hsee, 1995, 1996), the notion that one's decision will be influenced more by unjustifiable considerations when there is ambiguity as to what one should do than when there is not. It is reasonable to assume that improving one's VCR is a less justifiable action than improving air quality in Denver and that separate evaluation entails more ambiguity as to what one should do than joint evaluation. Thus, the results of Irwin et al. (1993) may be an instance of elastic justification.

I shall conclude this article with a discussion of some prescriptive implications. In real life, people often face the option of engaging either in joint evaluation or in separate evaluation. For example, when a manufacturer launches a new product, he may face the choice of either running a comparative advertisement against a rival product or advertising his product by itself. In the former scenario the focal and the rival products will likely be evaluated jointly; in the latter scenario, separately. Similarly, when a person is applying for a job, she may have the option of either being interviewed at the same time as another job candidate or being interviewed on a different day than the other candidate. In the same-time scenario, the candidate is likely to be evaluated jointly with the other candidate. In the different-day case, especially if the two interview days are scheduled far enough apart, she is likely to be evaluated separately from her competitor. Which option should the manufacturer adopt? Which should the job candidate adopt? According to the findings of this research, the answers to these questions depend on the type of attributes on which one excels over one's rival. If one is superior to one's rival on hard-to-evaluate attributes and inferior on easy-to-evaluate attributes, one should try to create a joint evaluation environment so as to facilitate direct comparison. If the reverse is true, one should try to be evaluated separately from one's rival.
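Restated compactly, and purely as my own paraphrase of the advice above rather than anything from the article, the decision rule looks like this:

# Toy decision rule paraphrasing the prescriptive advice (my own sketch).
def preferred_mode(superior_on_hard: bool) -> str:
    """superior_on_hard: True if you beat your rival on the hard-to-evaluate
    attribute(s) but trail on the easy-to-evaluate one(s); False if the
    pattern is reversed."""
    if superior_on_hard:
        return "seek joint evaluation (invite direct comparison)"
    return "seek separate evaluation (avoid direct comparison)"

# e.g., a candidate with more KY experience but a lower GPA, and the reverse
print(preferred_mode(True))
print(preferred_mode(False))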
The present research shows that when two options involving a trade-off between a hard-to-evaluate attribute and an easy-to-evaluate attribute are evaluated, preference between these options may change depending on whether the options are presented jointly or separately. More important, the direction of this change can be predicted, and can even be manipulated.

REFERENCES

Bazerman, M. H., Loewenstein, G. F., & White, S. B. (1992). Reversals of preference in allocation decisions: Judging an alternative versus choosing among alternatives. Administrative Science Quarterly, 37, 220-240.

Bazerman, M. H., Schroth, H. A., Shah, P. P., Diekmann, K. A., & Tenbrunsel, A. E. (1994). The inconsistent role of comparison others and procedural justice in reactions to hypothetical job descriptions: Implications for job acceptance decisions. Organizational Behavior and Human Decision Processes, 60, 326-352.

Fischer, G. W., & Hawkins, S. A. (1993). Strategy compatibility, scale compatibility, and the prominence effect. Journal of Experimental Psychology: Human Perception and Performance, 19, 580-597.

Fox, C. R., & Tversky, A. (1995). Ambiguity aversion and comparative ignorance. The Quarterly Journal of Economics, 585-603.

Goldstein, W. M., & Einhorn, H. J. (1987). Expression theory and the preference reversal phenomena. Psychological Review, 94, 236-254.

Grether, D. M., & Plott, C. R. (1979). Economic theory of choice and the preference reversal phenomenon. American Economic Review, 69, 623-638.

Hsee, C. K. (1993). When trend of monetary outcomes matters: Separate versus joint evaluation, and judgment of feelings versus choice. Unpublished manuscript, University of Chicago.

Hsee, C. K. (1994). Preference reversal between joint and separate evaluations. Working paper, The University of Chicago.

Hsee, C. K. (1995). Elastic justification: How tempting but task-irrelevant factors influence decisions. Organizational Behavior and Human Decision Processes, 62, 330-337.

Hsee, C. K. (1996). Elastic justification: How unjustifiable factors influence judgments. Organizational Behavior and Human Decision Processes, 66, 122-129.

Irwin, J. R., Slovic, P., Lichtenstein, S., & McClelland, G. H. (1993). Preference reversals and the measurement of environmental values. Journal of Risk and Uncertainty, 6, 5-18.

Kahneman, D., & Ritov, I. (1994). Determinants of stated willingness to pay for public goods: A study in the headline method. Journal of Risk and Uncertainty, 9, 5-38.

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263-291.

Lichtenstein, S., & Slovic, P. (1971). Reversal of preferences between bids and choices in gambling decisions. Journal of Experimental Psychology, 89, 46-55.

Loewenstein, G. F., Blount, S., & Bazerman, M. H. (1994). Reversals of preference between independent and simultaneous evaluation of alternatives. Working paper, Carnegie Mellon University.

Lowenthal, D. (1993). Preference reversals in candidate evaluation. Working paper, Carnegie Mellon University.

Mellers, B. A., Richards, V., & Birnbaum, M. H. (1992). Distributional theory of impression formation. Organizational Behavior and Human Decision Processes, 51, 313-343.

Slovic, P., Griffin, D., & Tversky, A. (1990). Compatibility effects in judgment and choice. In R. M. Hogarth (Ed.), Insights in decision making: Theory and applications. Chicago: University of Chicago Press.

Slovic, P., & Lichtenstein, S. (1968). The relative importance of probabilities and payoffs in risk taking. Journal of Experimental Psychology Monographs, 78(3, Pt. 2).

Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31-48.

Tversky, A., Sattath, S., & Slovic, P. (1988). Contingent weighting in judgment and choice. Psychological Review, 95, 371-384.

Received: January 9, 1996