1
Introduction

1.1 Factor Analysis and Structural Theories

By a structural theory we shall mean a theory that regards a phenomenon as an aggregate of elemental components interrelated in a lawful way. An excellent example of a structural theory is the theory of chemical compounds: Chemical substances are lawful compositions of the atomic elements, with the laws governing the compositions based on the manner in which the electron orbits of different atoms interact when the atoms are combined in molecules.

Structural theories occur in other sciences as well. In linguistics, for example, structural descriptions of language analyze speech into phonemes or morphemes. The aim of structural linguistics is to formulate laws governing the combination of morphemes in a particular language. Biology has a structural theory, which takes, as its elemental components, the individual cells of the organism and organizes them into a hierarchy of tissues, organs, and systems. In the study of the inheritance of characters, modern geneticists regard the manifest characteristics of an organism (phenotype) as a function of the particular combination of genes (genotype) in the chromosomes of the cells of the organism.

Structural theories occur in psychology as well. At the most fundamental level a psychologist may regard behaviors as ordered aggregates of cellular responses of the organism. However, psychologists still have considerable difficulty in formulating detailed structural theories of behavior because many of the physical components necessary for such theories have not been identified and understood. But this does not make structural theories impossible in psychology. The history of other sciences shows that scientists can understand the abstract features of a structure long before they know the physical basis for this structure. For example, the history of chemistry indicates that chemists could formulate principles regarding the effects of mixing compounds in certain amounts long before the atomic and molecular aspects of matter were understood. Gregor Mendel stated the fundamental laws of inheritance before biologists had associated the chromosomes of the cell with inheritance. In psychology, Isaac Newton, in 1704, published a simple mathematical model of the visual effects of mixing different hues, but nearly a hundred years elapsed before Thomas Young postulated the existence of three types of color receptors in the retina to account for the relationships described in Newton's model. And only a half-century later did the physiologist Helmholtz actually give a physiological basis to Young's theory. Other physiological theories subsequently followed. Much of psychological theory today still operates at the level of stating relationships among stimulus conditions and gross behavioral responses.

One of the most difficult problems of formulating a structural theory involves discovering the rules that govern the composition of the aggregates of components. The task is much easier if the scientist can show that the physical structure he is concerned with is isomorphic to a known mathematical structure. Then he can use the many known theorems of the mathematical structure to make predictions about the properties of the physical structure.
In this regard, George Miller (1964) suggests that psychologists have used the structure of Euclidean space more than any other mathematical structure to represent structural relationships of psychological processes. He cites, for example, how Isaac Newton's (1704) model for representing the effects of mixing different hues involved taking the hues of the spectrum in their natural order and arranging them as points appropriately around the circumference of a circle. The effects of color mixtures could be determined by proportionally weighting the points of the hues in the mixture according to their contribution to the mixture and finding the center of gravity of the resulting points. The closer this center of gravity approached the center of the color circle, the more the resulting color would appear gray. In addition, Miller cites Schlosberg's (1954) representation of perceived emotional similarities among facial expressions by a two-dimensional graph with one dimension interpreted as pleasantness versus unpleasantness and the other as rejection versus attention, and Osgood's (1952) analysis of the components of meaning of words into three primary components: (1) evaluation, (2) power, and (3) activity.

Realizing that spatial representations have great power to suggest the existence of important psychological mechanisms, psychologists have developed techniques, such as metric and nonmetric factor analysis and metric and nonmetric multidimensional scaling, to create, systematically, spatial representations from empirical measurements. All four of these techniques represent objects of interest (e.g., psychological "variables" or stimulus "objects") as points in a multidimensional space. The points are so arranged with respect to one another in the space as to reflect relationships of similarity among the corresponding objects (variables) as given by empirical data on these objects.

Although a discussion of the full range of techniques using spatial representations of relationships found in data would be of considerable interest, we shall confine ourselves, in this book, to an in-depth examination of the methods of factor analysis. The reason for this is that the methodology of factor analysis is historically much more fully developed than, say, that of multidimensional scaling; as a consequence, prescriptions for the ways of doing factor analysis are much more established than they are for these other techniques. Furthermore, factor analysis, as a technique, dovetails very nicely with such classic topics in statistics as correlation, regression, and multivariate analysis, which are also well developed. No doubt, as the gains in the development of multidimensional scaling, especially the nonmetric versions of it, become consolidated, there will be authors who will write textbooks about this area as well. In the meantime, the interested reader can consult Torgerson's (1958) textbook on metric multidimensional scaling for an account of that technique.

1.2 Brief History of Factor Analysis as a Linear Model

The history of factor analysis can be traced back to the latter half of the nineteenth century, to the efforts of the British scientist Francis Galton (1869, 1889) and other scientists to discover the principles of the inheritance of manifest characters (Mulaik, 1985, 1987).
Unlike Gregor Mendel (1866), who is today considered the founder of modern genetics, Galton did not try to discover these principles chiefly through breeding experiments using simple, discrete characters of organisms with short maturation cycles; rather, he concerned himself with human traits such as body height, physical strength, and intelligence, which today are not believed to be simple in their genetic determination. The general question asked by Galton was: To what extent are individual differences in these traits inherited and by what mechanism? To be able to answer this question, Galton had to have some way of quantifying the relationships of traits of parents to traits of offspring. Galton's solution to this problem was the method of regression.

Galton noticed that, when he took the heights of sons and plotted them against the heights of their fathers, he obtained a scatter of points indicating an imperfect relationship. Nevertheless, taller fathers tended strongly to have on the average taller sons than shorter fathers. Initially, Galton believed that the average height of sons of fathers of a given height would be the same as the height of the fathers, but instead the average was closer to the average height of the population of sons as a whole. In other words, the average height of sons "regressed" toward the average height in the population and away from the more extreme height of their fathers. Galton believed this implied a principle of inheritance and labeled it "regression toward the mean," although today we regard the regression phenomenon as a statistical artifact associated with the linear-regression model. In addition, Galton discovered that he could fit a straight line, called the regression line, with positive slope very nicely through the average heights of sons whose fathers had a specified height. Upon consultation with the mathematician Karl Pearson, Galton learned that he could use a linear equation to relate the heights of fathers to heights of sons (cf. Pearson and Lee, 1903):

Y = a + bX + E    (1.1)

Here
Y is the height of a son
a is the intercept with the Y-axis of the regression line passing through the averages of sons with fathers of fixed height
b is the slope of the regression line
X is the height of the father
E is an error of prediction

As a measure of the strength of relationship, Pearson used the ratio of the variance of the predicted variable, Ŷ = a + bX, to the variance of Y.

Pearson, who was an accomplished mathematician and an articulate writer, recognized that the mathematics underlying Galton's regression method had already been worked out nearly 70 years earlier by Gauss and other mathematicians in connection with determining the "true" orbits of planets from observations of these orbits containing error of observation. Subsequently, because the residual variate E appeared to be normally distributed in the prediction of height, Pearson identified the "error of observation" of Gauss's theory of least-squares estimation with the "error of prediction" of Equation 1.1 and treated the predicted component Ŷ = a + bX as an estimate of an average value. This initially amounted to supposing that the average heights of sons would be given in terms of their fathers' heights by the equation Y = a + bX (without the error term), if nature and environment did not somehow interfere haphazardly in modifying the predicted value.
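To make Equation 1.1 and Pearson's strength-of-relationship index concrete, the following minimal sketch fits the regression line by least squares and computes the ratio of the variance of Ŷ = a + bX to the variance of Y. The heights are simulated for illustration (the slope, intercept, and error spread are arbitrary choices, not Galton's data); for a simple linear regression fitted by least squares this ratio equals the squared product-moment correlation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated data standing in for Galton's father-son heights (inches).
    n = 1000
    father = rng.normal(68.0, 2.5, n)                    # X
    son = 34.0 + 0.5 * father + rng.normal(0, 2.2, n)    # Y = a + bX + E

    # Least-squares estimates of the slope b and intercept a in Equation 1.1.
    b = np.cov(father, son, bias=True)[0, 1] / np.var(father)
    a = son.mean() - b * father.mean()
    son_hat = a + b * father                             # Y-hat = a + bX

    # Pearson's index: variance of the predicted variable over the variance of Y.
    strength = np.var(son_hat) / np.var(son)
    r = np.corrcoef(father, son)[0, 1]
    print(strength, r**2)   # the two numbers agree: var(Y-hat)/var(Y) = r squared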
Although Pearson subsequently founded the field of biometry on such an exploitation of Gauss's least-squares theory of error, modern geneticists now realize that heredity can also contribute to the E term in Equation 1.1 and that an uncritical application of least-squares theory in the study of the inheritance of characters can be grossly misleading. Intrigued with the mathematical problems implicit in Galton's program to metricize biology, anthropology, and psychology, Pearson became Galton's junior colleague in this endeavor and contributed enormously as a mathematical innovator (Pearson, 1895).

After his work on the mathematics of regression, Pearson concerned himself with finding an index for indicating the type and degree of relationship between metric variables (Pearson, 1909). This resulted in what we know today as the product-moment correlation coefficient, given by the formula

ρXY = E[(X − E(X))(Y − E(Y))] / √(E[(X − E(X))²] E[(Y − E(Y))²])    (1.2)

where
E[·] is the expected-value operator
X and Y are two random variables

This index takes on values between −1 and +1, with 0 indicating no relationship. A deeper meaning for this coefficient will be given later when we consider that it represents the cosine of the angle between two vectors, each standing for a different variable. In and of itself the product-moment correlation coefficient is descriptive only in that it shows the existence of a relationship between variables without showing the source of this relationship, which may be causal or coincidental in nature. When the researcher obtains a nonzero value for the product-moment correlation coefficient, he must supply an explanation for the relationship between the related variables. This usually involves finding that one variable is the cause of the other or that some third variable (and maybe others) is a common cause of them both. In any case, interpretations of correlations are greatly facilitated if the researcher already has a structural model on which to base his interpretations concerning the common component producing a correlation.

To illustrate, some of the early applications of the correlation coefficient in genetics led nowhere in terms of furthering the theory of inheritance because explanations given to nonzero correlation coefficients were frequently tautological, amounting to little more than saying that a relationship existed because such-and-such relationship-causing factor was present in the variables exhibiting the relationship. However, when Mendel's theory of inheritance (Mendel, 1865) was rediscovered during the last decade of the nineteenth century, researchers of hereditary relationships had available to them a structural mechanism for understanding how characters in parents were transmitted to offspring. Working with a trait that could be measured quantitatively, a geneticist could hypothesize a model of the behavior of the genes involved and from this model draw conclusions about, say, the correlation between relatives for the trait in the population of persons. Thus, product-moment correlation became not only an exploratory, descriptive index but an index useful in hypothesis testing. R. A. Fisher (1918) and Sewall Wright (1921) are credited with formulating the methodology for using correlation in testing Mendelian hypotheses.

With the development of the product-moment correlation coefficient, other related developments followed: In 1897, G. U.
Yule published his classic paper on multiple and partial correlation. The idea of multiple correlation was this: Suppose one has p variables, X1, X2,…, Xp, and wishes to find that linear combination

X̂1 = β2X2 + … + βpXp    (1.3)

of the variables X2,…, Xp which is maximally correlated with X1. The problem is to find the weights β2,…, βp that make the linear combination X̂1 maximally correlated with X1. After Yule's paper was published, multiple correlation became quite useful in prediction problems and turned out to be systematically related but not exactly equivalent to Gauss's solution for linear least-squares estimation of a variable, using information obtained on several independent variables observed at certain preselected values. In any case, with multiple correlation, the researcher can consider several components (on which he had direct measurements) as accounting for the variability in a given variable.

At this point, the stage was set for the development of factor analysis. By the time 1900 had arrived, researchers had obtained product-moment correlations on many variables such as physical measurements of organisms, intellectual-performance measures, and physical-performance measures. With variables showing relationships with many other variables, the need existed to formulate structural models to account for these relationships. In 1901, Pearson published a paper on lines and planes of closest fit to systems of points in space, which formed the basis for what we now call the principal-axes method of factoring. However, the first common-factor-analysis model is attributed to Spearman (1904). Spearman intercorrelated the test scores of 36 boys on topics such as classics, French, English, mathematics, discrimination of tones, and musical talent. Spearman had a theory, primarily attributed by him to Francis Galton and Herbert Spencer, that the abilities involved in taking each of these six tests were a general ability, common to all the tests, and a specific ability, specific to each test. Mathematically, this amounts to the equation

Yj = ajG + ψj    (1.4)

where
Yj is the jth manifest variable (e.g., test score in mathematics)
aj is a weight indicating the degree to which the latent general-ability variable G participates in Yj
ψj is an ability variable uncorrelated with G and specific to Yj

Without loss of generality, one can assume that E(Yj) = E(ψj) = E(G) = 0, for all j, implying that all variables have zero means. Then saying that ψj is specific to Yj amounts to saying that ψj does not covary with another manifest variable Yk, so that E(Ykψj) = 0, with the consequence that E(ψjψk) = 0 (implying that different specific variables do not covary). Thus the covariances between different variables are due only to the general-ability variable, that is,

E(YjYk) = E[(ajG + ψj)(akG + ψk)]
        = E(ajakG² + ajGψk + akGψj + ψjψk)
        = ajakE(G²)    (1.5)

From covariance to correlation is a simple step. Assuming in Equation 1.5 that E(G²) = 1 (the variance of G is equal to 1), we then can derive the correlation between Yj and Yk:

ρjk = ajak    (1.6)

Spearman noticed that the pattern of correlation coefficients obtained among the six intellectual-test variables in his study was consistent with the model of a single common variable and several specific variables.
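A small simulation can make Equations 1.4 through 1.6 tangible. The sketch below generates scores from a single-common-factor model and checks that the correlation between two manifest variables is approximately the product of their loadings. The loadings aj are invented for the illustration (they are not Spearman's values), and the specific factors are scaled so that each manifest variable has unit variance, matching the correlation metric assumed in Equation 1.6.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    a = np.array([0.8, 0.7, 0.6, 0.5, 0.4, 0.3])    # hypothetical loadings a_j

    # One general factor G and six specific factors psi_j, all uncorrelated,
    # with the specifics scaled so that each Y_j has variance 1.
    G = rng.standard_normal(n)
    psi = rng.standard_normal((6, n)) * np.sqrt(1 - a**2)[:, None]
    Y = a[:, None] * G + psi                        # Equation 1.4, one row per Y_j

    R = np.corrcoef(Y)
    # Sample correlation of Y_1 with Y_2 versus a_1 * a_2 (Equation 1.6).
    print(round(R[0, 1], 3), round(a[0] * a[1], 3))

With a sample this large the two printed values agree to roughly two decimal places.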
For the remainder of his life Spearman championed the doctrine of one general ability and many specific abilities, although evidence increasingly accumulated which disconfirmed such a simple model for all intellectual-performance variables. Other British psychologists contemporary with Spearman either disagreed with his interpretation of general ability or modified his "two-factor" (general and specific factors) theory to include "group factors," corresponding to ability variables not general to all intellectual variables but general to subgroups of them. The former case was exemplified by Godfrey H. Thomson (cf. Thomson, 1956), who asserted that the mind does not consist of a part which participates in all mental performance and a large number of particular parts which are specific to each performance. Rather, the mind is a relatively undifferentiated collection of many tiny parts. Any intellectual performance that we may consider involves only a "sample" of all these many tiny parts. Two performance measures are intercorrelated because they sample overlapping sets of these tiny parts. And Thomson was able to show how data consistent with Spearman's model were consistent with his sampling-theory model also.

Another problem with G is that one tends to define it mathematically as "the common factor common to all the variables in the set" rather than in terms of something external to the mathematics, for example, "rule-inferring ability." This is further exacerbated by the fact that to know what something is, you need to know what it is not. If all variables in a set have G in common, you have no instance for which it does not apply, and G easily becomes mathematical G by default. If there were other variables that were not due to G in the set, this would narrow the possibilities as to what G is in the world by indicating what it is not pertinent to. This problem has subtly bedeviled the theory of G throughout its history.

On the other hand, psychologists such as Cyril Burt and Philip E. Vernon took the view that in addition to a general ability (general intelligence), there were less general abilities such as verbal–numerical–educational ability and practical–mechanical–spatial–physical ability, and even less general abilities such as verbal comprehension, reading, spelling, vocabulary, drawing, handwriting, and mastery of various subjects (cf. Vernon, 1961). In other words, the mind was organized into a hierarchy of abilities running from the most general to the most specific. Their model of test scores can be represented mathematically much like the equation for multiple correlation as

Yj = ajG + bj1G1 + … + bjsGs + cj1H1 + … + cjtHt + ψj

where
Yj is a manifest intellectual-performance variable
aj, bj1,…, bjs, cj1,…, cjt are the weights
G is the latent general-ability variable
G1,…, Gs are the major group factors
H1,…, Ht are the minor group factors
ψj is a specific-ability variable for the jth variable

Correlations between two observed variables, Yj and Yk, would depend upon having not only the general-ability variable in common but group-factor variables in common as well.

By the time all these developments in the theory of intellectual abilities had occurred, the 1930s had arrived, and the center of new developments in this theory (and indirectly of new developments in the methodology of common-factor analysis) had shifted to the United States where L. L.
Thurstone at the University of Chicago developed his theory and method of multiple-factor analysis. By this time, the latent-ability variables had come to be called "factors" owing to a usage of Spearman (1927). Thurstone differed from the British psychologists over the idea that there was a general-ability factor and that the mind was hierarchically organized. For him, there were major group factors but no general factor. These major group factors he termed the primary mental abilities. That he did not cater to the idea of a hierarchical organization for the primary mental abilities was most likely because of his commitment to a principle of parsimony; this caused him to search for factors which related to the observed variables in such a way that each factor pertained as much as possible to one nonoverlapping subset of the observed variables. Sets of common factors displaying this property, Thurstone said, had a "simple structure." To obtain an optimal simple structure, Thurstone had to consider common-factor variables that were intercorrelated. And in the case of factor analyses of intellectual-performance tests, Thurstone discovered that usually his common factors were all positively intercorrelated with one another. This fact was considerably reassuring to the British psychologists who believed that by relying on his simple-structure concept Thurstone had only hidden the existence of a general-ability factor, which they felt was evidenced by the correlations among his factors.

Perhaps one reason why Thurstone's simple-structure approach to factor analysis became so popular (not just in the United States but in recent years in England and other countries as well) was because simple-structure solutions could be defined in terms of more-or-less objective properties which computers could readily identify, and the factors so obtained were easy to interpret. It seemed by the late 1950s, when the first large-scale electronic computers were entering universities, that all the drudgery could be taken out of factor-analytic computations and that the researcher could let the computer do most of his work for him. Little wonder, then, that not much thought was given to whether theoretically hierarchical solutions were preferable to simple-structure solutions, especially when hierarchical solutions did not seem to be blindly obtainable. And believing that factor analysis could automatically and blindly find the key latent variables in a domain, what researchers would want hierarchical solutions, which might be more difficult to interpret than simple-structure solutions?

The 1950s and early 1960s might be described as the era of blind factor analysis. In this period, factor analysis was frequently applied agnostically, as regards structural theory, to all sorts of data, from personality-rating variables, Rorschach-test-scoring variables, physiological variables, semantic differential variables, and biographical-information variables (in psychology), to characteristics of mining districts (in mineralogy), characteristics of cities (in city planning), characteristics of arrowheads (in anthropology), characteristics of wasps (in zoology), variables in the stock market (in economics), and aromatic-activity variables (in chemistry), to name just a few applications. In all these applications the hope was that factor analysis could bring order and meaning to the many relationships between variables.
Whether blind factor analyses often succeeded in providing meaningful explanations for the relationships among variables is a debatable question. In the case of Rorschach-test-score variables (Cooley and Lohnes, 1962) there is little question that blind factor analysis failed to provide a manifestly meaningful account of the structure underlying the score variables. Again, factor analyses of personality trait-rating variables have not yielded factors universally regarded by psychologists as explanatory constructs of human behavior (cf. Mulaik, 1964; Mischel, 1968). Rather, the factors obtained in personality trait-rating studies represent confoundings of intrarater processes (semantic relationships among trait words) with intraratee processes (psychological and physiological relationships within the persons rated). In the case of factor-analytic studies of biographical inventory items, the chief benefit has been in terms of classifying inventory items into clusters of similar content, but as yet no theory as to life histories has emerged from such studies. Still, blind factor analyses have served classification purposes quite well in psychology and other fields, but these successes should not be interpreted as generally providing advances in structural theories as well.

In the first 60 years of the history of factor analysis, factor-analytic methodologists developed heuristic algebraic solutions and corresponding algorithms for performing factor analyses. Many of these methods were designed to facilitate the finding of approximate solutions using mechanical hand calculators. Harman (1960) credits Cyril Burt with formulating the centroid method, but Thurstone (1947) gave it its name and developed it more fully as an approximation to the computationally more challenging principal axes, the eigenvector–eigenvalue solution put forth by Hotelling (1933). Until the development of electronic computers, the centroid method was a simple and straightforward solution that closely approximated the principal-axes solution. But in the 1960s, computers came on line as the government poured billions into the development of computers for decryption work and into the mathematics of nuclear physics in developing nuclear weapons. Out of the latter came fast computer algorithms for finding eigenvectors and eigenvalues. Subsequently, factor analysts discovered the computer and its eigenvector and eigenvalue routines and began programming them to obtain principal-axes solutions, which rapidly became the standard approach. Nevertheless, most of the procedures initially used were still based on least-squares methods, for the statistically more sophisticated method of maximum-likelihood estimation was still both mathematically and computationally challenging.

Throughout the history of factor analysis there were statisticians who sought to develop a more rigorous statistical theory for factor analysis. In 1940, Lawley (1940) made a major breakthrough with the development of equations for the maximum-likelihood estimation of factor loadings (assuming multivariate normality for the variables), and he followed up this work with other papers (1942, 1943, 1949) that sketched a framework for statistical testing in factor analysis. The problem was, to use these methods you needed maximum-likelihood estimates of the factor loadings. Lawley's computational recommendations for finding solutions were not practical for more than a few variables.
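Before returning to the maximum-likelihood story, it may help to see concretely what the principal-axes (eigenvector–eigenvalue) approach mentioned above computes. The sketch below is only a minimal illustration: the 4-variable correlation matrix is invented to follow a one-common-factor pattern, squared multiple correlations are used as one common choice of initial communality estimates, and the loadings on the first common factor are taken from the leading eigenvector of the reduced correlation matrix.

    import numpy as np

    # Illustrative 4-variable correlation matrix (invented for demonstration).
    R = np.array([[1.00, 0.56, 0.48, 0.40],
                  [0.56, 1.00, 0.42, 0.35],
                  [0.48, 0.42, 1.00, 0.30],
                  [0.40, 0.35, 0.30, 1.00]])

    # Squared multiple correlations as initial communality estimates.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)

    # Principal axes: eigendecomposition of the reduced correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(R_reduced)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Loadings on the first common factor (close to the .8, .7, .6, .5
    # pattern built into the off-diagonal elements of R).
    loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])
    print(np.round(loadings, 3))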
So, factor analysts continued to use the centroid method and to regard any factor loading less than .30 as "nonsignificant." In the 1950s, Rao (1955) developed an iterative computer program for obtaining maximum-likelihood estimates, but this was later shown not to converge. Howe (1955) showed that the maximum-likelihood estimates of Lawley (1949) could be derived mathematically without making any distributional assumptions at all by simply seeking to minimize the determinant of the matrix of partial correlations among residual variables after partialling out common factors from the original variables. Brown (1961) noted that the same idea was put forth on intuitive grounds by Thurstone in 1953. Howe also provided a far more efficient Gauss–Seidel algorithm for computing the solution. Unfortunately, this was ignored or unknown. In the meantime, Harman and Jones (1966) presented their Gauss–Seidel minres method of least-squares estimation, which rapidly converged and yielded close approximations to the maximum-likelihood estimates.

The major breakthrough mathematically, statistically, and computationally in maximum-likelihood exploratory factor analysis was made by Karl Jöreskog (1967), then a new PhD in mathematical statistics from the University of Uppsala in Sweden. He applied a then recently developed numerical algorithm of Fletcher and Powell (1963) to the maximum-likelihood estimation of the full set of parameters of the common-factor model. The algorithm was quite rapid in convergence. Jöreskog's algorithm has been the basis for maximum-likelihood estimation in most commercial computer programs ever since. However, the algorithm was not always well integrated with other computing methods in some major commercial programs, so that a program may report principal-components eigenvalues rather than those of the weighted reduced correlation matrix of the common-factor model provided by Jöreskog's method, which Jöreskog used in initially determining the number of factors to retain.

Recognizing that more emphasis should be placed on the testing of hypotheses in factor-analytic studies, factor analysts in the latter half of the 1960s began increasingly to concern themselves with the methodology of hypothesis testing in factor analysis. The first efforts in this regard, using what are known as procrustean transformations, trace their beginnings to a paper by Mosier (1939) that appeared in Psychometrika nearly two decades earlier. The techniques of procrustean transformations seek to transform (by a linear transformation) the obtained factor-pattern matrix (containing regression coefficients for the observed variables regressed onto the latent, underlying factors) to be as much as possible like a hypothetical factor-pattern matrix constructed according to some structural hypothesis pertaining to the variables studied. When the transformed factor-pattern matrix is obtained, it is tested for its degree of fit to the hypothetical factor-pattern matrix. For example, Guilford (1967) used procrustean techniques to isolate factors predicted by his three-faceted model of the intellect. However, hypothesis testing with procrustean transformations has been displaced in favor of confirmatory factor analysis since the 1970s, because the latter is able to assess how well the model reproduces the sample covariance matrix.
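Common to the maximum-likelihood exploratory methods just described and to the confirmatory approach discussed next is a discrepancy function measuring how well the covariance (or correlation) matrix implied by the common-factor model (the factor loadings multiplied by their transpose, plus a diagonal matrix of unique variances) reproduces the sample matrix S. The sketch below is only an illustration of that idea, not Jöreskog's algorithm: it reuses the invented one-factor correlation matrix from the earlier sketch, fits a single factor, and lets a general-purpose quasi-Newton optimizer stand in for the Fletcher–Powell procedure.

    import numpy as np
    from scipy.optimize import minimize

    # The same illustrative correlation matrix as before, built from loadings
    # .8, .7, .6, .5 on one common factor.
    S = np.array([[1.00, 0.56, 0.48, 0.40],
                  [0.56, 1.00, 0.42, 0.35],
                  [0.48, 0.42, 1.00, 0.30],
                  [0.40, 0.35, 0.30, 1.00]])
    p, m = 4, 1          # four observed variables, one common factor

    def ml_discrepancy(theta):
        # theta packs the p*m loadings and the p log unique variances.
        L = theta[:p * m].reshape(p, m)
        psi = np.exp(theta[p * m:])
        Sigma = L @ L.T + np.diag(psi)           # model-implied matrix
        # F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p, zero at perfect fit.
        return (np.linalg.slogdet(Sigma)[1]
                + np.trace(S @ np.linalg.inv(Sigma))
                - np.linalg.slogdet(S)[1] - p)

    theta0 = np.concatenate([np.full(p * m, 0.5), np.log(np.full(p, 0.5))])
    fit = minimize(ml_discrepancy, theta0, method="BFGS")
    print(np.round(fit.x[:p * m], 3))            # loadings, near .8 .7 .6 .5
    print(np.round(np.exp(fit.x[p * m:]), 3))    # unique variances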
Toward the end of the 1960s, Bock and Bargmann (1966) and Jöreskog (1969a) considered hypothesis testing from the point of view of fitting a hypothetical model to the data. In these approaches the researcher specifies, ahead of time, various parameters of the common-factor-analysis model relating manifest variables to hypothetical latent variables according to a structural theory pertaining to the manifest variables. The resulting model is then used to generate a hypothetical covariance matrix for the manifest variables that is tested for goodness of fit to a corresponding empirical covariance matrix (with unspecified parameters of the factor-analysis model adjusted to make the fit to the empirical covariance matrix as good as possible). These approaches to factor analysis have had the effect of encouraging researchers to have greater concern with substantive, structural theories before assembling collections of variables and implementing the factor-analytic methods. We will treat confirmatory factor analysis in Chapter 15, although it is better treated as a special case of structural equation modeling, which would be best dealt with in a separate book.

The factor analysis we primarily treat in this book is exploratory factor analysis, which may be regarded as an "abductive," "hypothesis-generating" methodology rather than a "hypothesis-testing" methodology. With the development of structural equation modeling, researchers have come to see traditional factor analysis as a methodology to be used, among other methods, at the outset of a research program, to formulate hypotheses about latent variables and their relation to observed variables. Furthermore, it is now regarded as just one of several approaches to formulating such hypotheses, although it has general applications any time one believes that a set of observed variables is dependent upon a set of latent common factors.

1.3 Example of Factor Analysis

At this point, to help the reader gain a more concrete appreciation of what is obtained in a factor analysis, it may help to consider a small factor-analytic study conducted by the author in connection with a research project designed to predict the reactions of soldiers to combat stress. The researchers had the theory that an individual soldier's reaction to combat stress would be a function of the degree to which he responded emotionally to the potential danger of a combat situation and the degree to which he nevertheless felt he could successfully cope with the situation. It was felt that, realistically, combat situations should arouse strong feelings of anxiety for the possible dangers involved. But ideally these feelings of anxiety should serve as internal stimuli for coping behaviors which would in turn provide the soldier with a sense of optimism in being able to deal with the situation. Soldiers who responded pessimistically to strong feelings of fear or anxiety were expected to have the greatest difficulties in managing the stress of combat. Soldiers who showed little appreciation of the dangers of combat were also expected to be unprepared for the strong anxiety they would likely feel in a real combat situation. They would have difficulties in managing the stress of combat, especially if they had past histories devoid of successful encounters with stressful situations.
To implement research on this theory, it was necessary to obtain measures of a soldier's emotional concern for the danger of a combat situation and of his degree of optimism in being able to cope with the situation. To obtain these measures, 14 seven-point adjectival rating scales were constructed, half of which were selected to measure the degree of emotional concern for threat, and half of which were selected to measure the degree of optimism in coping with the situation. However, when these adjectival scales were selected, the researchers were not completely certain to what extent these scales actually measured two distinct dimensions of the kind intended. Thus, the researchers decided to conduct an experiment to isolate the common-meaning dimensions among these 14 scales.

Two hundred and twenty-five soldiers in basic training were asked to rate the meaning of "firing my rifle in combat" using the 14 adjectival scales, with ratings being obtained from each soldier on five separate occasions over a period of 2 months. Intercorrelations among the 14 scales were then obtained by summing the cross products over the 225 soldiers and five occasions. (Intercorrelations were obtained in this way because the researchers felt that, although on any one occasion various soldiers might differ in their conceptions of "firing my rifle in combat" and on different occasions an individual soldier might have different conceptions, the major determinants of covariation among the adjectival scales would still be conventional-meaning dimensions common to the scales.)

The matrix of intercorrelations, illustrated in Table 1.1, was then subjected to image factor analysis (cf. Jöreskog, 1962), which is a relatively accurate but simple-to-compute approximation of common-factor analysis.

TABLE 1.1
Intercorrelations among 14 Scales

                      1     2     3     4     5     6     7     8     9    10    11    12    13    14
 1 Frightening      1.00
 2 Useful            .20  1.00
 3 Nerve-shaking     .65  −.26  1.00
 4 Hopeful          −.26   .74  −.32  1.00
 5 Terrifying        .71  −.27   .70  −.32  1.00
 6 Controllable     −.25   .64  −.30   .68  −.31  1.00
 7 Upsetting         .64  −.30   .73  −.34   .74  −.34  1.00
 8 Painless         −.40   .39  −.44   .40  −.47   .39  −.53  1.00
 9 Exciting         −.13   .24  −.11   .24  −.16   .26  −.17   .27  1.00
10 Nondepressing    −.45   .36  −.49   .42  −.53   .39  −.53   .58   .32  1.00
11 Disturbing        .59  −.26   .63  −.31   .32  −.32   .69  −.45  −.15  −.51  1.00
12 Successful       −.30   .69  −.35   .75  −.36   .68  −.39   .44   .28   .48  −.38  1.00
13 Settling (vs. unsettling)
                    −.36   .35  −.50   .43  −.42   .38  −.52   .45   .20   .49  −.45   .46  1.00
14 Bearable         −.35   .62  −.36   .65  −.38   .62  −.46   .50   .36   .51  −.45   .67   .49  1.00

Four orthogonal factors were retained, and the matrix of "loadings" associated with the "unrotated factors" is given in Table 1.2. The coefficients in this matrix are correlations of the observed variables with the common factors.

TABLE 1.2
Unrotated Factors

                      1     2     3     4
 1 Frightening       .73   .35   .01  −.15
 2 Useful           −.56   .60  −.10   .12
 3 Nerve-shaking     .78   .31  −.05  −.12
 4 Hopeful          −.62   .60  −.09   .13
 5 Terrifying        .78   .34   .43   .00
 6 Controllable     −.59   .52  −.06   .08
 7 Upsetting         .84   .29  −.08   .00
 8 Painless         −.65   .07   .03  −.33
 9 Exciting         −.29   .21  −.02  −.35
10 Nondepressing    −.70   .04   .03  −.36
11 Disturbing        .70   .13  −.59  −.05
12 Successful       −.67   .54  −.04   .05
13 Settling         −.63   .09   .08  −.16
14 Bearable         −.69   .43   .04  −.12

However, the unrotated factors of Table 1.2 are not readily interpretable, and they do not in this form appear to correspond to the two expected-meaning dimensions used in selecting the 14 scales.
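Since the loadings in Table 1.2 are correlations of the variables with orthogonal common factors, the common-factor model implies that the correlation between any two variables should be approximately reproduced by the sum of the products of their loadings on the four factors. The short sketch below checks this for two pairs of variables using the values printed in Tables 1.1 and 1.2; the small discrepancies reflect rounding and the fact that four factors do not account for all of the common variance.

    import numpy as np

    # Unrotated loadings from Table 1.2 for variables 1, 2, 3, and 4.
    load = {
        1: np.array([0.73, 0.35, 0.01, -0.15]),   # Frightening
        2: np.array([-0.56, 0.60, -0.10, 0.12]),  # Useful
        3: np.array([0.78, 0.31, -0.05, -0.12]),  # Nerve-shaking
        4: np.array([-0.62, 0.60, -0.09, 0.13]),  # Hopeful
    }

    # Reproduced correlations: sums of products of loadings on the four factors.
    print(round(float(load[1] @ load[3]), 2))   # about .70; Table 1.1 reports .65
    print(round(float(load[2] @ load[4]), 2))   # about .73; Table 1.1 reports .74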
At this point it was decided, after some experimentation, to rotate only the first two factors and to retain the latter two unrotated factors as "difficult to interpret" factors. Rotation of the first two factors was done using Kaiser's normalized Varimax method (cf. Kaiser, 1958). The resulting rotated matrix is given in Table 1.3.

TABLE 1.3
Rotated Factors

                      1     2
 1 Frightening       .83  −.10
 2 Useful           −.11   .85
 3 Nerve-shaking     .84  −.16
 4 Hopeful          −.17   .87
 5 Terrifying        .86  −.14
 6 Controllable     −.18   .80
 7 Upsetting         .89  −.22
 8 Painless         −.50   .44
 9 Exciting         −.12   .34
10 Nondepressing    −.57   .43
11 Disturbing        .67  −.29
12 Successful       −.24   .85
13 Settling         −.48   .43
14 Bearable         −.33   .76

The meaning of "rotation" may not be clear to the reader. Therefore let us consider the plot in Figure 1.1 of the 14 variables, using for their coordinates the loadings of the variables on the first two unrotated factors. Here we see that the coordinate axes do not correspond to factors that would be clearly definable by their association with the variables. On the other hand, note the cluster of points in the upper right-hand quadrant (variables 1, 3, 5, and 7) and the cluster of points in the upper left-hand quadrant (variables 2, 4, 6, 12, and 14). It would seem that one could rotate the coordinate axes so as to have them pass near these clusters. As a matter of fact, this is what has been done to obtain the rotated coordinates in Table 1.3, which are plotted in Figure 1.2.

FIGURE 1.1 Plot of 14 variables on unrotated factors 1 and 2.

Rotated factor 1 appears almost exclusively associated with variables 1, 3, 5, 7, and 11, which were picked as measures of a fear response, whereas rotated factor 2 appears most closely associated with variables 2, 4, 6, 12, and 14, which were picked as measures of optimism regarding outcome. Although variables 8, 10, and 13 appear now to be consistent in their relationships to these two dimensions, they are not unambiguous measures of either factor. Variable 9 appears to be a poor measure of these two dimensions.

Some factor analysts at this point might prefer to relax the requirement that the obtained factors be orthogonal to one another. They would, in this case, most likely construct a unit-length vector collinear with variable 7 to represent an "oblique" factor 1 and another unit-length vector collinear with variable 12 to represent an "oblique" factor 2. The resulting oblique factors would be negatively correlated with one another and would be interpreted as dimensions that are slightly negatively correlated. Such "oblique" factors are drawn in Figure 1.2 as arrows from the origin.
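For readers who want to see the mechanics of an orthogonal rotation, the sketch below rotates the first two unrotated factors for a subset of the variables in Table 1.2, searching over rotation angles for the one that maximizes the raw varimax criterion (the sum, over factors, of the variance of the squared loadings). This is a simplified stand-in for Kaiser's normalized Varimax procedure, so the resulting values only roughly resemble the corresponding rows of Table 1.3.

    import numpy as np

    # First two unrotated loadings from Table 1.2 for a subset of the variables.
    A = np.array([[ 0.73,  0.35],   # 1 Frightening
                  [-0.56,  0.60],   # 2 Useful
                  [ 0.78,  0.31],   # 3 Nerve-shaking
                  [-0.62,  0.60],   # 4 Hopeful
                  [ 0.84,  0.29],   # 7 Upsetting
                  [-0.67,  0.54]])  # 12 Successful

    def varimax_criterion(B):
        # Sum over factors of the variance of the squared loadings.
        return np.sum((B ** 2).var(axis=0))

    best_angle, best_value = 0.0, -np.inf
    for angle in np.radians(np.arange(0.0, 90.0, 0.5)):
        c, s = np.cos(angle), np.sin(angle)
        T = np.array([[c, -s], [s, c]])          # planar rotation matrix
        value = varimax_criterion(A @ T)
        if value > best_value:
            best_angle, best_value = angle, value

    T = np.array([[np.cos(best_angle), -np.sin(best_angle)],
                  [np.sin(best_angle),  np.cos(best_angle)]])
    print(np.degrees(best_angle))                # chosen rotation angle
    print(np.round(A @ T, 2))                    # rotated loadings for the subset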
In conclusion, factor analysis has isolated two dimensions among the 14 scales which appear to correspond to dimensions expected to be present when the 14 scales were chosen for study. Factor analysis has also shown that some of the 14 scales (variables 8, 9, 10, and 13) are not unambiguous measures of the intended dimensions. These scales can be discarded in constructing a final set of scales for measuring the intended dimensions. Factor analysis has also revealed the presence of additional, unexpected dimensions among the scales. Although it is possible to hazard guesses as to the meaning of these additional dimensions (represented by factors 3 and 4), such guessing is not strongly recommended. There is considerable likelihood that the interpretation of these dimensions will be spurious. This is not to say that factor analysis cannot, at times, discover something unexpected but interpretable. It is just that in the present data the two additional dimensions are so poorly determined from the variables as to be interpretable only with a considerable risk of error. This example of factor analysis represents the traditional, exploratory use of factor analysis where the researcher has some idea of what he will encounter but nevertheless allows the method freedom to find unexpected dimensions (or factors).

FIGURE 1.2 Plot of 14 variables on rotated factors 1 and 2.