1
Introduction

1.1 Factor Analysis and Structural Theories

By a structural theory we shall mean a theory that regards a phenomenon as an aggregate of elemental components interrelated in a lawful way. An excellent example of a structural theory is the theory of chemical compounds: Chemical substances are lawful compositions of the atomic elements, with the laws governing the compositions based on the manner in which the electron orbits of different atoms interact when the atoms are combined in molecules.

Structural theories occur in other sciences as well. In linguistics, for example, structural descriptions of language analyze speech into phonemes or morphemes. The aim of structural linguistics is to formulate laws governing the combination of morphemes in a particular language. Biology has a structural theory, which takes, as its elemental components, the individual cells of the organism and organizes them into a hierarchy of tissues, organs, and systems. In the study of the inheritance of characters, modern geneticists regard the manifest characteristics of an organism (phenotype) as a function of the particular combination of genes (genotype) in the chromosomes of the cells of the organism.

Structural theories occur in psychology as well. At the most fundamental level a psychologist may regard behaviors as ordered aggregates of cellular responses of the organism. However, psychologists still have considerable difficulty in formulating detailed structural theories of behavior because many of the physical components necessary for such theories have not been identified and understood. But this does not make structural theories impossible in psychology. The history of other sciences shows that scientists can understand the abstract features of a structure long before they know the physical basis for this structure. For example, the history of chemistry indicates that chemists could formulate principles regarding the effects of mixing compounds in certain amounts long before the atomic and molecular aspects of matter were understood. Gregor Mendel stated the fundamental laws of inheritance before biologists had associated the chromosomes of the cell with inheritance. In psychology, Isaac Newton, in 1704, published a simple mathematical model of the visual effects of mixing different hues, but nearly a hundred years elapsed before Thomas Young postulated the existence of three types of color receptors in the retina to account for the relationships described in Newton's model. And only a half-century later did the physiologist Helmholtz actually give a physiological basis to Young's theory. Other physiological theories subsequently followed. Much of psychological theory today still operates at the level of stating relationships among stimulus conditions and gross behavioral responses.

One of the most difficult problems of formulating a structural theory involves discovering the rules that govern the composition of the aggregates of components. The task is much easier if the scientist can show that the physical structure he is concerned with is isomorphic to a known mathematical structure. Then he can use the many known theorems of the mathematical structure to make predictions about the properties of the physical structure.
In this regard, George Miller (1964) suggests that psychologists have used the structure of Euclidean space more than any other mathematical structure to represent structural relationships of psychological processes. He cites, for example, how Isaac Newton's (1704) model for representing the effects of mixing different hues involved taking the hues of the spectrum in their natural order and arranging them as points appropriately around the circumference of a circle. The effects of color mixtures could be determined by proportionally weighting the points of the hues in the mixture according to their contribution to the mixture and finding the center of gravity of the resulting points. The closer this center of gravity approached the center of the color circle, the more the resulting color would appear gray. In addition, Miller cites Schlosberg's (1954) representation of perceived emotional similarities among facial expressions by a two-dimensional graph with one dimension interpreted as pleasantness versus unpleasantness and the other as rejection versus attention, and Osgood's (1952) analysis of the components of meaning of words into three primary components: (1) evaluation, (2) power, and (3) activity.

Realizing that spatial representations have great power to suggest the existence of important psychological mechanisms, psychologists have developed techniques, such as metric and nonmetric factor analysis and metric and nonmetric multidimensional scaling, to create, systematically, spatial representations from empirical measurements. All four of these techniques represent objects of interest (e.g., psychological "variables" or stimulus "objects") as points in a multidimensional space. The points are so arranged with respect to one another in the space as to reflect relationships of similarity among the corresponding objects (variables) as given by empirical data on these objects.

Although a discussion of the full range of techniques using spatial representations of relationships found in data would be of considerable interest, we shall confine ourselves, in this book, to an in-depth examination of the methods of factor analysis. The reason for this is that the methodology of factor analysis is historically much more fully developed than, say, that of multidimensional scaling; as a consequence, prescriptions for the ways of doing factor analysis are much more established than they are for these other techniques. Furthermore, factor analysis, as a technique, dovetails very nicely with such classic topics in statistics as correlation, regression, and multivariate analysis, which are also well developed. No doubt, as the gains in the development of multidimensional scaling, especially the nonmetric versions of it, become consolidated, there will be authors who will write textbooks about this area as well. In the meantime, the interested reader can consult Torgerson's (1958) textbook on metric multidimensional scaling for an account of that technique.

1.2 Brief History of Factor Analysis as a Linear Model

The history of factor analysis can be traced back to the latter half of the nineteenth century, to the efforts of the British scientist Francis Galton (1869, 1889) and other scientists to discover the principles of the inheritance of manifest characters (Mulaik, 1985, 1987).
Unlike Gregor Mendel (1866), who is today considered the founder of modern genetics, Galton did not try to discover these principles chiefly through breeding experiments using simple, discrete characters of organisms with short maturation cycles; rather, he concerned himself with human traits such as body height, physical strength, and intelligence, which today are not believed to be simple in their genetic determination. The general question asked by Galton was: To what extent are individual differences in these traits inherited and by what mechanism? To be able to answer this question, Galton had to have some way of quantifying the relationships of traits of parents to traits of offspring. Galton's solution to this problem was the method of regression.

Galton noticed that, when he took the heights of sons and plotted them against the heights of their fathers, he obtained a scatter of points indicating an imperfect relationship. Nevertheless, taller fathers tended strongly to have on the average taller sons than shorter fathers. Initially, Galton believed that the average height of sons of fathers of a given height would be the same as the height of the fathers, but instead the average was closer to the average height of the population of sons as a whole. In other words, the average height of sons "regressed" toward the average height in the population and away from the more extreme height of their fathers. Galton believed this implied a principle of inheritance and labeled it "regression toward the mean," although today we regard the regression phenomenon as a statistical artifact associated with the linear-regression model. In addition, Galton discovered that he could fit a straight line, called the regression line, with positive slope very nicely through the average heights of sons whose fathers had a specified height. Upon consultation with the mathematician Karl Pearson, Galton learned that he could use a linear equation to relate the heights of fathers to heights of sons (cf. Pearson and Lee, 1903):

Y = a + bX + E    (1.1)

Here
Y is the height of a son
a is the intercept with the Y-axis of the regression line passing through the averages of sons with fathers of fixed height
b is the slope of the regression line
X is the height of the father
E is an error of prediction

As a measure of the strength of relationship, Pearson used the ratio of the variance of the predicted variable, Ŷ = a + bX, to the variance of Y.

Pearson, who was an accomplished mathematician and an articulate writer, recognized that the mathematics underlying Galton's regression method had already been worked out nearly 70 years earlier by Gauss and other mathematicians in connection with determining the "true" orbits of planets from observations of these orbits containing error of observation. Subsequently, because the residual variate E appeared to be normally distributed in the prediction of height, Pearson identified the "error of observation" of Gauss's theory of least-squares estimation with the "error of prediction" of Equation 1.1 and treated the predicted component Ŷ = a + bX as an estimate of an average value. This initially amounted to supposing that the average heights of sons would be given in terms of their fathers' heights by the equation Y = a + bX (without the error term), if nature and environment did not somehow interfere haphazardly in modifying the predicted value.
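To make Equation 1.1 and Pearson's strength-of-relationship index concrete, the following minimal sketch fits the regression line by least squares and computes the ratio of the variance of Ŷ = a + bX to the variance of Y. The heights are simulated for illustration (the slope, intercept, and error spread are arbitrary choices, not Galton's data); for a simple linear regression fitted by least squares this ratio equals the squared product-moment correlation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated data standing in for Galton's father-son heights (inches).
    n = 1000
    father = rng.normal(68.0, 2.5, n)                    # X
    son = 34.0 + 0.5 * father + rng.normal(0, 2.2, n)    # Y = a + bX + E

    # Least-squares estimates of the slope b and intercept a in Equation 1.1.
    b = np.cov(father, son, bias=True)[0, 1] / np.var(father)
    a = son.mean() - b * father.mean()
    son_hat = a + b * father                             # Y-hat = a + bX

    # Pearson's index: variance of the predicted variable over the variance of Y.
    strength = np.var(son_hat) / np.var(son)
    r = np.corrcoef(father, son)[0, 1]
    print(strength, r**2)   # the two numbers agree: var(Y-hat)/var(Y) = r squared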
Although Pearson subsequently founded the field of biometry on such an exploitation of Gauss's least-squares theory of error, modern geneticists now realize that heredity can also contribute to the E term in Equation 1.1 and that an uncritical application of least-squares theory in the study of the inheritance of characters can be grossly misleading. Intrigued with the mathematical problems implicit in Galton's program to metricize biology, anthropology, and psychology, Pearson became Galton's junior colleague in this endeavor and contributed enormously as a mathematical innovator (Pearson, 1895).

After his work on the mathematics of regression, Pearson concerned himself with finding an index for indicating the type and degree of relationship between metric variables (Pearson, 1909). This resulted in what we know today as the product-moment correlation coefficient, given by the formula

ρXY = E[(X − E(X))(Y − E(Y))] / √(E[(X − E(X))²] E[(Y − E(Y))²])    (1.2)

where
E[·] is the expected-value operator
X and Y are two random variables

This index takes on values between −1 and +1, with 0 indicating no relationship. A deeper meaning for this coefficient will be given later when we consider that it represents the cosine of the angle between two vectors, each standing for a different variable. In and of itself the product-moment correlation coefficient is descriptive only in that it shows the existence of a relationship between variables without showing the source of this relationship, which may be causal or coincidental in nature. When the researcher obtains a nonzero value for the product-moment correlation coefficient, he must supply an explanation for the relationship between the related variables. This usually involves finding that one variable is the cause of the other or that some third variable (and maybe others) is a common cause of them both. In any case, interpretations of correlations are greatly facilitated if the researcher already has a structural model on which to base his interpretations concerning the common component producing a correlation.

To illustrate, some of the early applications of the correlation coefficient in genetics led nowhere in terms of furthering the theory of inheritance because explanations given to nonzero correlation coefficients were frequently tautological, amounting to little more than saying that a relationship existed because such-and-such relationship-causing factor was present in the variables exhibiting the relationship. However, when Mendel's theory of inheritance (Mendel, 1865) was rediscovered during the last decade of the nineteenth century, researchers of hereditary relationships had available to them a structural mechanism for understanding how characters in parents were transmitted to offspring. Working with a trait that could be measured quantitatively, a geneticist could hypothesize a model of the behavior of the genes involved and from this model draw conclusions about, say, the correlation between relatives for the trait in the population of persons. Thus, product-moment correlation became not only an exploratory, descriptive index but an index useful in hypothesis testing. R. A. Fisher (1918) and Sewall Wright (1921) are credited with formulating the methodology for using correlation in testing Mendelian hypotheses.

With the development of the product-moment correlation coefficient, other related developments followed: In 1897, G. U.
Yule published his classic paper on multiple and partial correlation. The idea of multiple correlation was this: Suppose one has p variables, X1, X2,…, Xp, and wishes to find that linear combination

X̂1 = β2X2 + … + βpXp    (1.3)

of the variables X2,…, Xp which is maximally correlated with X1. The problem is to find the weights β2,…, βp that make the linear combination X̂1 maximally correlated with X1. After Yule's paper was published, multiple correlation became quite useful in prediction problems and turned out to be systematically related but not exactly equivalent to Gauss's solution for linear least-squares estimation of a variable, using information obtained on several independent variables observed at certain preselected values. In any case, with multiple correlation, the researcher can consider several components (on which he had direct measurements) as accounting for the variability in a given variable.

At this point, the stage was set for the development of factor analysis. By the time 1900 had arrived, researchers had obtained product-moment correlations on many variables such as physical measurements of organisms, intellectual-performance measures, and physical-performance measures. With variables showing relationships with many other variables, the need existed to formulate structural models to account for these relationships. In 1901, Pearson published a paper on lines and planes of closest fit to systems of points in space, which formed the basis for what we now call the principal-axes method of factoring. However, the first common-factor-analysis model is attributed to Spearman (1904). Spearman intercorrelated the test scores of 36 boys on topics such as classics, French, English, mathematics, discrimination of tones, and musical talent. Spearman had a theory, primarily attributed by him to Francis Galton and Herbert Spencer, that the abilities involved in taking each of these six tests were a general ability, common to all the tests, and a specific ability, specific to each test. Mathematically, this amounts to the equation

Yj = ajG + ψj    (1.4)

where
Yj is the jth manifest variable (e.g., test score in mathematics)
aj is a weight indicating the degree to which the latent general-ability variable G participates in Yj
ψj is an ability variable uncorrelated with G and specific to Yj

Without loss of generality, one can assume that E(Yj) = E(ψj) = E(G) = 0, for all j, implying that all variables have zero means. Then saying that ψj is specific to Yj amounts to saying that ψj does not covary with another manifest variable Yk, so that E(Ykψj) = 0, with the consequence that E(ψjψk) = 0 (implying that different specific variables do not covary). Thus the covariances between different variables are due only to the general-ability variable, that is,

E(YjYk) = E[(ajG + ψj)(akG + ψk)]
        = E(ajakG² + ajGψk + akGψj + ψjψk)
        = ajakE(G²)    (1.5)

From covariance to correlation is a simple step. Assuming in Equation 1.5 that E(G²) = 1 (the variance of G is equal to 1), we then can derive the correlation between Yj and Yk:

ρjk = ajak    (1.6)

Spearman noticed that the pattern of correlation coefficients obtained among the six intellectual-test variables in his study was consistent with the model of a single common variable and several specific variables.
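A small simulation can make Equations 1.4 through 1.6 tangible. The sketch below generates scores from a single-common-factor model and checks that the correlation between two manifest variables is approximately the product of their loadings. The loadings aj are invented for the illustration (they are not Spearman's values), and the specific factors are scaled so that each manifest variable has unit variance, matching the correlation metric assumed in Equation 1.6.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    a = np.array([0.8, 0.7, 0.6, 0.5, 0.4, 0.3])    # hypothetical loadings a_j

    # One general factor G and six specific factors psi_j, all uncorrelated,
    # with the specifics scaled so that each Y_j has variance 1.
    G = rng.standard_normal(n)
    psi = rng.standard_normal((6, n)) * np.sqrt(1 - a**2)[:, None]
    Y = a[:, None] * G + psi                        # Equation 1.4, one row per Y_j

    R = np.corrcoef(Y)
    # Sample correlation of Y_1 with Y_2 versus a_1 * a_2 (Equation 1.6).
    print(round(R[0, 1], 3), round(a[0] * a[1], 3))

With a sample this large the two printed values agree to roughly two decimal places.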
For the remainder of his life Spearman championed the doctrine of one general ability and many specific abilities, although evidence increasingly accumulated which disconfirmed such a simple model for all intellectual-performance variables. Other British psychologists contemporary with Spearman either disagreed with his interpretation of general ability or modified his "two-factor" (general and specific factors) theory to include "group factors," corresponding to ability variables not general to all intellectual variables but general to subgroups of them. The former case was exemplified by Godfrey H. Thomson (cf. Thomson, 1956), who asserted that the mind does not consist of a part which participates in all mental performance and a large number of particular parts which are specific to each performance. Rather, the mind is a relatively undifferentiated collection of many tiny parts. Any intellectual performance that we may consider involves only a "sample" of all these many tiny parts. Two performance measures are intercorrelated because they sample overlapping sets of these tiny parts. And Thomson was able to show how data consistent with Spearman's model were consistent with his sampling-theory model also.

Another problem with G is that one tends to define it mathematically as "the common factor common to all the variables in the set" rather than in terms of something external to the mathematics, for example, "rule-inferring ability." This is further exacerbated by the fact that to know what something is, you need to know what it is not. If all variables in a set have G in common, you have no instance for which it does not apply, and G easily becomes mathematical G by default. If there were other variables that were not due to G in the set, this would narrow the possibilities as to what G is in the world by indicating what it is not pertinent to. This problem has subtly bedeviled the theory of G throughout its history.

On the other hand, psychologists such as Cyril Burt and Philip E. Vernon took the view that in addition to a general ability (general intelligence), there were less general abilities such as verbal–numerical–educational ability and practical–mechanical–spatial–physical ability, and even less general abilities such as verbal comprehension, reading, spelling, vocabulary, drawing, handwriting, and mastery of various subjects (cf. Vernon, 1961). In other words, the mind was organized into a hierarchy of abilities running from the most general to the most specific. Their model of test scores can be represented mathematically much like the equation for multiple correlation as

Yj = ajG + bj1G1 + … + bjsGs + cj1H1 + … + cjtHt + ψj

where
Yj is a manifest intellectual-performance variable
aj, bj1,…, bjs, cj1,…, cjt are the weights
G is the latent general-ability variable
G1,…, Gs are the major group factors
H1,…, Ht are the minor group factors
ψj is a specific-ability variable for the jth variable

Correlations between two observed variables, Yj and Yk, would depend upon having not only the general-ability variable in common but group-factor variables in common as well.

By the time all these developments in the theory of intellectual abilities had occurred, the 1930s had arrived, and the center of new developments in this theory (and indirectly of new developments in the methodology of common-factor analysis) had shifted to the United States where L. L.
Thurstone at the University of Chicago developed his theory and method of multiple-factor analysis. By this time, the latent-ability variables had come to be called "factors" owing to a usage of Spearman (1927). Thurstone differed from the British psychologists over the idea that there was a general-ability factor and that the mind was hierarchically organized. For him, there were major group factors but no general factor. These major group factors he termed the primary mental abilities. That he did not cater to the idea of a hierarchical organization for the primary mental abilities was most likely because of his commitment to a principle of parsimony; this caused him to search for factors which related to the observed variables in such a way that each factor pertained as much as possible to one nonoverlapping subset of the observed variables. Sets of common factors displaying this property, Thurstone said, had a "simple structure." To obtain an optimal simple structure, Thurstone had to consider common-factor variables that were intercorrelated. And in the case of factor analyses of intellectual-performance tests, Thurstone discovered that usually his common factors were all positively intercorrelated with one another. This fact was considerably reassuring to the British psychologists who believed that by relying on his simple-structure concept Thurstone had only hidden the existence of a general-ability factor, which they felt was evidenced by the correlations among his factors.

Perhaps one reason why Thurstone's simple-structure approach to factor analysis became so popular (not just in the United States but in recent years in England and other countries as well) was because simple-structure solutions could be defined in terms of more-or-less objective properties which computers could readily identify, and the factors so obtained were easy to interpret. It seemed by the late 1950s, when the first large-scale electronic computers were entering universities, that all the drudgery could be taken out of factor-analytic computations and that the researcher could let the computer do most of his work for him. Little wonder, then, that not much thought was given to whether theoretically hierarchical solutions were preferable to simple-structure solutions, especially when hierarchical solutions did not seem to be blindly obtainable. And believing that factor analysis could automatically and blindly find the key latent variables in a domain, what researchers would want hierarchical solutions, which might be more difficult to interpret than simple-structure solutions?

The 1950s and early 1960s might be described as the era of blind factor analysis. In this period, factor analysis was frequently applied agnostically, as regards structural theory, to all sorts of data, from personality-rating variables, Rorschach-test-scoring variables, physiological variables, semantic differential variables, and biographical-information variables (in psychology), to characteristics of mining districts (in mineralogy), characteristics of cities (in city planning), characteristics of arrowheads (in anthropology), characteristics of wasps (in zoology), variables in the stock market (in economics), and aromatic-activity variables (in chemistry), to name just a few applications. In all these applications the hope was that factor analysis could bring order and meaning to the many relationships between variables.
Whether blind factor analyses often succeeded in providing meaningful explanations for the relationships among variables is a debatable question. In the case of Rorschach-test-score variables (Cooley and Lohnes, 1962) there is little question that blind factor analysis failed to provide a manifestly meaningful account of the structure underlying the score variables. Again, factor analyses of personality trait-rating variables have not yielded factors universally regarded by psychologists as explanatory constructs of human behavior (cf. Mulaik, 1964; Mischel, 1968). Rather, the factors obtained in personality trait-rating studies represent confoundings of intrarater processes (semantic relationships among trait words) with intraratee processes (psychological and physiological relationships within the persons rated). In the case of factor-analytic studies of biographical inventory items, the chief benefit has been in terms of classifying inventory items into clusters of similar content, but as yet no theory as to life histories has emerged from such studies. Still, blind factor analyses have served classification purposes quite well in psychology and other fields, but these successes should not be interpreted as generally providing advances in structural theories as well.

In the first 60 years of the history of factor analysis, factor-analytic methodologists developed heuristic algebraic solutions and corresponding algorithms for performing factor analyses. Many of these methods were designed to facilitate the finding of approximate solutions using mechanical hand calculators. Harman (1960) credits Cyril Burt with formulating the centroid method, but Thurstone (1947) gave it its name and developed it more fully as an approximation to the computationally more challenging principal axes, the eigenvector–eigenvalue solution put forth by Hotelling (1933). Until the development of electronic computers, the centroid method was a simple and straightforward solution that closely approximated the principal-axes solution. But in the 1960s, computers came on line as the government poured billions into the development of computers for decryption work and into the mathematics of nuclear physics in developing nuclear weapons. Out of the latter came fast computer algorithms for finding eigenvectors and eigenvalues. Subsequently, factor analysts discovered the computer and its eigenvector and eigenvalue routines and began programming them to obtain principal-axes solutions, which rapidly became the standard approach. Nevertheless, most of the procedures initially used were still based on least-squares methods, for the statistically more sophisticated method of maximum-likelihood estimation was still both mathematically and computationally challenging.

Throughout the history of factor analysis there were statisticians who sought to develop a more rigorous statistical theory for factor analysis. In 1940, Lawley (1940) made a major breakthrough with the development of equations for the maximum-likelihood estimation of factor loadings (assuming multivariate normality for the variables), and he followed up this work with other papers (1942, 1943, 1949) that sketched a framework for statistical testing in factor analysis. The problem was, to use these methods you needed maximum-likelihood estimates of the factor loadings. Lawley's computational recommendations for finding solutions were not practical for more than a few variables.
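Before returning to the maximum-likelihood story, it may help to see concretely what the principal-axes (eigenvector–eigenvalue) approach mentioned above computes. The sketch below is only a minimal illustration: the 4-variable correlation matrix is invented to follow a one-common-factor pattern, squared multiple correlations are used as one common choice of initial communality estimates, and the loadings on the first common factor are taken from the leading eigenvector of the reduced correlation matrix.

    import numpy as np

    # Illustrative 4-variable correlation matrix (invented for demonstration).
    R = np.array([[1.00, 0.56, 0.48, 0.40],
                  [0.56, 1.00, 0.42, 0.35],
                  [0.48, 0.42, 1.00, 0.30],
                  [0.40, 0.35, 0.30, 1.00]])

    # Squared multiple correlations as initial communality estimates.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)

    # Principal axes: eigendecomposition of the reduced correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(R_reduced)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Loadings on the first common factor (close to the .8, .7, .6, .5
    # pattern built into the off-diagonal elements of R).
    loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])
    print(np.round(loadings, 3))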
So, factor analysts continued to use the centroid method and to regard any factor loading less than .30 as "nonsignificant." In the 1950s, Rao (1955) developed an iterative computer program for obtaining maximum-likelihood estimates, but this was later shown not to converge. Howe (1955) showed that the maximum-likelihood estimates of Lawley (1949) could be derived mathematically without making any distributional assumptions at all by simply seeking to minimize the determinant of the matrix of partial correlations among residual variables after partialling out common factors from the original variables. Brown (1961) noted that the same idea was put forth on intuitive grounds by Thurstone in 1953. Howe also provided a far more efficient Gauss–Seidel algorithm for computing the solution. Unfortunately, this was ignored or unknown. In the meantime, Harman and Jones (1966) presented their Gauss–Seidel minres method of least-squares estimation, which rapidly converged and yielded close approximations to the maximum-likelihood estimates.

The major breakthrough mathematically, statistically, and computationally in maximum-likelihood exploratory factor analysis was made by Karl Jöreskog (1967), then a new PhD in mathematical statistics from the University of Uppsala in Sweden. He applied a then recently developed numerical algorithm of Fletcher and Powell (1963) to the maximum-likelihood estimation of the full set of parameters of the common-factor model. The algorithm was quite rapid in convergence. Jöreskog's algorithm has been the basis for maximum-likelihood estimation in most commercial computer programs ever since. However, the algorithm was not always well integrated with other computing methods in some major commercial programs, so that a program may report principal-components eigenvalues rather than those of the weighted reduced correlation matrix of the common-factor model provided by Jöreskog's method, which Jöreskog used in initially determining the number of factors to retain.

Recognizing that more emphasis should be placed on the testing of hypotheses in factor-analytic studies, factor analysts in the latter half of the 1960s began increasingly to concern themselves with the methodology of hypothesis testing in factor analysis. The first efforts in this regard, using what are known as procrustean transformations, trace their beginnings to a paper by Mosier (1939) that appeared in Psychometrika nearly two decades earlier. The techniques of procrustean transformations seek to transform (by a linear transformation) the obtained factor-pattern matrix (containing regression coefficients for the observed variables regressed onto the latent, underlying factors) to be as much as possible like a hypothetical factor-pattern matrix constructed according to some structural hypothesis pertaining to the variables studied. When the transformed factor-pattern matrix is obtained, it is tested for its degree of fit to the hypothetical factor-pattern matrix. For example, Guilford (1967) used procrustean techniques to isolate factors predicted by his three-faceted model of the intellect. However, hypothesis testing with procrustean transformations has been displaced in favor of confirmatory factor analysis since the 1970s, because the latter is able to assess how well the model reproduces the sample covariance matrix.
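Common to the maximum-likelihood exploratory methods just described and to the confirmatory approach discussed next is a discrepancy function measuring how well the covariance (or correlation) matrix implied by the common-factor model (the factor loadings multiplied by their transpose, plus a diagonal matrix of unique variances) reproduces the sample matrix S. The sketch below is only an illustration of that idea, not Jöreskog's algorithm: it reuses the invented one-factor correlation matrix from the earlier sketch, fits a single factor, and lets a general-purpose quasi-Newton optimizer stand in for the Fletcher–Powell procedure.

    import numpy as np
    from scipy.optimize import minimize

    # The same illustrative correlation matrix as before, built from loadings
    # .8, .7, .6, .5 on one common factor.
    S = np.array([[1.00, 0.56, 0.48, 0.40],
                  [0.56, 1.00, 0.42, 0.35],
                  [0.48, 0.42, 1.00, 0.30],
                  [0.40, 0.35, 0.30, 1.00]])
    p, m = 4, 1          # four observed variables, one common factor

    def ml_discrepancy(theta):
        # theta packs the p*m loadings and the p log unique variances.
        L = theta[:p * m].reshape(p, m)
        psi = np.exp(theta[p * m:])
        Sigma = L @ L.T + np.diag(psi)           # model-implied matrix
        # F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p, zero at perfect fit.
        return (np.linalg.slogdet(Sigma)[1]
                + np.trace(S @ np.linalg.inv(Sigma))
                - np.linalg.slogdet(S)[1] - p)

    theta0 = np.concatenate([np.full(p * m, 0.5), np.log(np.full(p, 0.5))])
    fit = minimize(ml_discrepancy, theta0, method="BFGS")
    print(np.round(fit.x[:p * m], 3))            # loadings, near .8 .7 .6 .5
    print(np.round(np.exp(fit.x[p * m:]), 3))    # unique variances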
Toward the end of the 1960s, Bock and Bargmann (1966) and Jöreskog (1969a) considered hypothesis testing from the point of view of fitting a hypothetical model to the data. In these approaches the researcher specifies, ahead of time, various parameters of the common-factor-analysis model relating manifest variables to hypothetical latent variables according to a structural theory pertaining to the manifest variables. The resulting model is then used to generate a hypothetical covariance matrix for the manifest variables that is tested for goodness of fit to a corresponding empirical covariance matrix (with unspecified parameters of the factor-analysis model adjusted to make the fit to the empirical covariance matrix as good as possible). These approaches to factor analysis have had the effect of encouraging researchers to have greater concern with substantive, structural theories before assembling collections of variables and implementing the factor-analytic methods. We will treat confirmatory factor analysis in Chapter 15, although it is better treated as a special case of structural equation modeling, which would be best dealt with in a separate book.

The factor analysis we primarily treat in this book is exploratory factor analysis, which may be regarded as an "abductive," "hypothesis-generating" methodology rather than a "hypothesis-testing" methodology. With the development of structural equation modeling, researchers have come to see traditional factor analysis as a methodology to be used, among other methods, at the outset of a research program, to formulate hypotheses about latent variables and their relation to observed variables. Furthermore, it is now regarded as just one of several approaches to formulating such hypotheses, although it has general applications any time one believes that a set of observed variables is dependent upon a set of latent common factors.

1.3 Example of Factor Analysis

At this point, to help the reader gain a more concrete appreciation of what is obtained in a factor analysis, it may help to consider a small factor-analytic study conducted by the author in connection with a research project designed to predict the reactions of soldiers to combat stress. The researchers had the theory that an individual soldier's reaction to combat stress would be a function of the degree to which he responded emotionally to the potential danger of a combat situation and the degree to which he nevertheless felt he could successfully cope with the situation. It was felt that, realistically, combat situations should arouse strong feelings of anxiety for the possible dangers involved. But ideally these feelings of anxiety should serve as internal stimuli for coping behaviors which would in turn provide the soldier with a sense of optimism in being able to deal with the situation. Soldiers who responded pessimistically to strong feelings of fear or anxiety were expected to have the greatest difficulties in managing the stress of combat. Soldiers who showed little appreciation of the dangers of combat were also expected to be unprepared for the strong anxiety they would likely feel in a real combat situation. They would have difficulties in managing the stress of combat, especially if they had past histories devoid of successful encounters with stressful situations.
To implement research on this theory, it was necessary to obtain measures of a soldier's emotional concern for the danger of a combat situation and of his degree of optimism in being able to cope with the situation. To obtain these measures, 14 seven-point adjectival rating scales were constructed, half of which were selected to measure the degree of emotional concern for threat, and half of which were selected to measure the degree of optimism in coping with the situation. However, when these adjectival scales were selected, the researchers were not completely certain to what extent these scales actually measured two distinct dimensions of the kind intended. Thus, the researchers decided to conduct an experiment to isolate the common-meaning dimensions among these 14 scales.

Two hundred and twenty-five soldiers in basic training were asked to rate the meaning of "firing my rifle in combat" using the 14 adjectival scales, with ratings being obtained from each soldier on five separate occasions over a period of 2 months. Intercorrelations among the 14 scales were then obtained by summing the cross products over the 225 soldiers and five occasions. (Intercorrelations were obtained in this way because the researchers felt that, although on any one occasion various soldiers might differ in their conceptions of "firing my rifle in combat" and on different occasions an individual soldier might have different conceptions, the major determinants of covariation among the adjectival scales would still be conventional-meaning dimensions common to the scales.)

The matrix of intercorrelations, illustrated in Table 1.1, was then subjected to image factor analysis (cf. Jöreskog, 1962), which is a relatively accurate but simple-to-compute approximation of common-factor analysis.

TABLE 1.1
Intercorrelations among 14 Scales

                      1     2     3     4     5     6     7     8     9    10    11    12    13    14
 1 Frightening      1.00
 2 Useful            .20  1.00
 3 Nerve-shaking     .65  −.26  1.00
 4 Hopeful          −.26   .74  −.32  1.00
 5 Terrifying        .71  −.27   .70  −.32  1.00
 6 Controllable     −.25   .64  −.30   .68  −.31  1.00
 7 Upsetting         .64  −.30   .73  −.34   .74  −.34  1.00
 8 Painless         −.40   .39  −.44   .40  −.47   .39  −.53  1.00
 9 Exciting         −.13   .24  −.11   .24  −.16   .26  −.17   .27  1.00
10 Nondepressing    −.45   .36  −.49   .42  −.53   .39  −.53   .58   .32  1.00
11 Disturbing        .59  −.26   .63  −.31   .32  −.32   .69  −.45  −.15  −.51  1.00
12 Successful       −.30   .69  −.35   .75  −.36   .68  −.39   .44   .28   .48  −.38  1.00
13 Settling (vs. unsettling)
                    −.36   .35  −.50   .43  −.42   .38  −.52   .45   .20   .49  −.45   .46  1.00
14 Bearable         −.35   .62  −.36   .65  −.38   .62  −.46   .50   .36   .51  −.45   .67   .49  1.00

Four orthogonal factors were retained, and the matrix of "loadings" associated with the "unrotated factors" is given in Table 1.2. The coefficients in this matrix are correlations of the observed variables with the common factors.

TABLE 1.2
Unrotated Factors

                      1     2     3     4
 1 Frightening       .73   .35   .01  −.15
 2 Useful           −.56   .60  −.10   .12
 3 Nerve-shaking     .78   .31  −.05  −.12
 4 Hopeful          −.62   .60  −.09   .13
 5 Terrifying        .78   .34   .43   .00
 6 Controllable     −.59   .52  −.06   .08
 7 Upsetting         .84   .29  −.08   .00
 8 Painless         −.65   .07   .03  −.33
 9 Exciting         −.29   .21  −.02  −.35
10 Nondepressing    −.70   .04   .03  −.36
11 Disturbing        .70   .13  −.59  −.05
12 Successful       −.67   .54  −.04   .05
13 Settling         −.63   .09   .08  −.16
14 Bearable         −.69   .43   .04  −.12

However, the unrotated factors of Table 1.2 are not readily interpretable, and they do not in this form appear to correspond to the two expected-meaning dimensions used in selecting the 14 scales.
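Since the loadings in Table 1.2 are correlations of the variables with orthogonal common factors, the common-factor model implies that the correlation between any two variables should be approximately reproduced by the sum of the products of their loadings on the four factors. The short sketch below checks this for two pairs of variables using the values printed in Tables 1.1 and 1.2; the small discrepancies reflect rounding and the fact that four factors do not account for all of the common variance.

    import numpy as np

    # Unrotated loadings from Table 1.2 for variables 1, 2, 3, and 4.
    load = {
        1: np.array([0.73, 0.35, 0.01, -0.15]),   # Frightening
        2: np.array([-0.56, 0.60, -0.10, 0.12]),  # Useful
        3: np.array([0.78, 0.31, -0.05, -0.12]),  # Nerve-shaking
        4: np.array([-0.62, 0.60, -0.09, 0.13]),  # Hopeful
    }

    # Reproduced correlations: sums of products of loadings on the four factors.
    print(round(float(load[1] @ load[3]), 2))   # about .70; Table 1.1 reports .65
    print(round(float(load[2] @ load[4]), 2))   # about .73; Table 1.1 reports .74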
At this point it was decided, after some experimentation, to rotate only the first two factors and to retain the latter two unrotated factors as "difficult to interpret" factors. Rotation of the first two factors was done using Kaiser's normalized Varimax method (cf. Kaiser, 1958). The resulting rotated matrix is given in Table 1.3.

TABLE 1.3
Rotated Factors

                      1     2
 1 Frightening       .83  −.10
 2 Useful           −.11   .85
 3 Nerve-shaking     .84  −.16
 4 Hopeful          −.17   .87
 5 Terrifying        .86  −.14
 6 Controllable     −.18   .80
 7 Upsetting         .89  −.22
 8 Painless         −.50   .44
 9 Exciting         −.12   .34
10 Nondepressing    −.57   .43
11 Disturbing        .67  −.29
12 Successful       −.24   .85
13 Settling         −.48   .43
14 Bearable         −.33   .76

The meaning of "rotation" may not be clear to the reader. Therefore let us consider the plot in Figure 1.1 of the 14 variables, using for their coordinates the loadings of the variables on the first two unrotated factors. Here we see that the coordinate axes do not correspond to factors that would be clearly definable by their association with the variables. On the other hand, note the cluster of points in the upper right-hand quadrant (variables 1, 3, 5, and 7) and the cluster of points in the upper left-hand quadrant (variables 2, 4, 6, 12, and 14). It would seem that one could rotate the coordinate axes so as to have them pass near these clusters. As a matter of fact, this is what has been done to obtain the rotated coordinates in Table 1.3, which are plotted in Figure 1.2.

FIGURE 1.1 Plot of 14 variables on unrotated factors 1 and 2.

Rotated factor 1 appears almost exclusively associated with variables 1, 3, 5, 7, and 11, which were picked as measures of a fear response, whereas rotated factor 2 appears most closely associated with variables 2, 4, 6, 12, and 14, which were picked as measures of optimism regarding outcome. Although variables 8, 10, and 13 appear now to be consistent in their relationships to these two dimensions, they are not unambiguous measures of either factor. Variable 9 appears to be a poor measure of these two dimensions.

Some factor analysts at this point might prefer to relax the requirement that the obtained factors be orthogonal to one another. They would, in this case, most likely construct a unit-length vector collinear with variable 7 to represent an "oblique" factor 1 and another unit-length vector collinear with variable 12 to represent an "oblique" factor 2. The resulting oblique factors would be negatively correlated with one another and would be interpreted as dimensions that are slightly negatively correlated. Such "oblique" factors are drawn in Figure 1.2 as arrows from the origin.
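For readers who want to see the mechanics of an orthogonal rotation, the sketch below rotates the first two unrotated factors for a subset of the variables in Table 1.2, searching over rotation angles for the one that maximizes the raw varimax criterion (the sum, over factors, of the variance of the squared loadings). This is a simplified stand-in for Kaiser's normalized Varimax procedure, so the resulting values only roughly resemble the corresponding rows of Table 1.3.

    import numpy as np

    # First two unrotated loadings from Table 1.2 for a subset of the variables.
    A = np.array([[ 0.73,  0.35],   # 1 Frightening
                  [-0.56,  0.60],   # 2 Useful
                  [ 0.78,  0.31],   # 3 Nerve-shaking
                  [-0.62,  0.60],   # 4 Hopeful
                  [ 0.84,  0.29],   # 7 Upsetting
                  [-0.67,  0.54]])  # 12 Successful

    def varimax_criterion(B):
        # Sum over factors of the variance of the squared loadings.
        return np.sum((B ** 2).var(axis=0))

    best_angle, best_value = 0.0, -np.inf
    for angle in np.radians(np.arange(0.0, 90.0, 0.5)):
        c, s = np.cos(angle), np.sin(angle)
        T = np.array([[c, -s], [s, c]])          # planar rotation matrix
        value = varimax_criterion(A @ T)
        if value > best_value:
            best_angle, best_value = angle, value

    T = np.array([[np.cos(best_angle), -np.sin(best_angle)],
                  [np.sin(best_angle),  np.cos(best_angle)]])
    print(np.degrees(best_angle))                # chosen rotation angle
    print(np.round(A @ T, 2))                    # rotated loadings for the subset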
In conclusion, factor analysis has isolated two dimensions among the 14 scales which appear to correspond to dimensions expected to be present when the 14 scales were chosen for study. Factor analysis has also shown that some of the 14 scales (variables 8, 9, 10, and 13) are not unambiguous measures of the intended dimensions. These scales can be discarded in constructing a final set of scales for measuring the intended dimensions. Factor analysis has also revealed the presence of additional, unexpected dimensions among the scales. Although it is possible to hazard guesses as to the meaning of these additional dimensions (represented by factors 3 and 4), such guessing is not strongly recommended. There is considerable likelihood that the interpretation of these dimensions will be spurious. This is not to say that factor analysis cannot, at times, discover something unexpected but interpretable. It is just that in the present data the two additional dimensions are so poorly determined from the variables as to be interpretable only with a considerable risk of error. This example of factor analysis represents the traditional, exploratory use of factor analysis where the researcher has some idea of what he will encounter but nevertheless allows the method freedom to find unexpected dimensions (or factors).

FIGURE 1.2 Plot of 14 variables on rotated factors 1 and 2.