Fitting the Common Factor Model
PSYn5440 – Introduction to Factor Analysis
Week 6

The Common Factor Model
•You learned a LOT already!
•More than a majority of factor analysis practitioners know about the model ☺ (pretty sad, huh?)

Fitting the model
•One important thing to note is that these models are intended for a population – they are population models, describing how things work in a population.
•Anyway, at the beginning we learned that there are two sides to factor analysis – theory and methodology.
•What we have covered so far was the theory (the model itself).
•Now, we will focus on the methodology (how to fit the model to data / how to estimate the unknowns in the model).

Fitting the model
•More specifically, we will focus on the theoretical basis for fitting the model. Later in the course, we will cover the practical side (software and examples).
•A model represents some hypothesized structure of the data. Different methods are available for fitting the model to data; they yield estimates of the model parameters (the elements of the model matrices) and provide information on how well the model fits the data.

Fitting the model
•For the sake of argument, we will consider the hypothetical scenario where the population correlation matrix P is known and the model holds exactly in the population (i.e., the model explains P perfectly).
•This will never be the case in practice, but it is a better starting point for understanding the principles.
•Later, we will drop these assumptions, no worries.

The population correlation matrix

Rotational indeterminacy
•Let's look at an example, using the example data we have seen before.
•The matrix P is given as follows:
[matrix P shown on the slide]

The communality problem
•Many solutions were suggested to the communality problem.
•The one that "won" (was, and still is, the most widely used) was suggested by Louis Guttman in 1940.
•Guttman suggested squared multiple correlations (SMCs) as the initial approximations to communalities.

The communality problem
•Just what is a squared multiple correlation (SMC)?
•Imagine you have p manifest variables. You can try to predict the j-th manifest variable from the other (p – 1) manifest variables, linear-regression-style.
•This prediction will be imperfect. You can correlate the predicted values of the j-th manifest variable with its actual values. What you get is a correlation coefficient – the multiple correlation coefficient. Square it and you have the SMC.

The communality problem
•However, in order to obtain the population SMCs, we need to know P in the first place. Most often, we don't.
•In practice, we can apply the same procedure to a sample correlation matrix, R, to obtain sample SMCs (a small computational sketch follows below). Since, in reality, we usually work with sample correlation matrices, let's slowly shift gears towards thinking more about a sample correlation matrix R and less about the population correlation matrix P.
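As a side note (not from the slides): for a correlation matrix, the SMC of the j-th variable can be computed directly as 1 − 1/d_j, where d_j is the j-th diagonal element of the inverted matrix. Here is a minimal Python/numpy sketch; the 3 × 3 correlation matrix is made up purely for illustration, not course data:

import numpy as np

def smc(R):
    # SMC_j = 1 - 1 / [R^-1]_jj, which is equivalent to regressing the
    # j-th variable on the remaining p - 1 variables and squaring the
    # resulting multiple correlation coefficient.
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Hypothetical correlation matrix (illustration only)
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])

print(smc(R))  # initial communality estimates, one per variable

These values are what would then be placed on the diagonal of the reduced correlation matrix as the prior (initial) communality estimates.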
Working with a sample correlation matrix
•So far, we have studied factor analysis limiting ourselves to the ideal scenario in which we know the population correlation matrix, P. Moreover, we only considered the case where the model holds exactly in the population.
•Now, let's consider the real world, in which we do not have access to P but we do have access to R. In this real-world scenario, we are not even sure the sample correlation matrix R is drawn from a population with a correlation matrix P for which the model holds.
•As before, let's just consider the uncorrelated / orthogonal model for now.

Working with a sample correlation matrix
•Every element in the residual matrix tells us how far the model-implied (predicted) value of that element is from its observed value.
•Alright, so – again, we don't have the population correlation matrix P which we used for all the computations and methods covered before. What are we going to do?
•Of course, we're going to pretend the problem isn't there, and we'll start by doing things in the exact same way.

Working with a sample correlation matrix
•Again, we will obtain some eigenvalues and some eigenvectors. However, in this case (not having a population correlation matrix, not being sure the model holds exactly in the population), we will generally not obtain an eigen-solution where the (p – m) smallest eigenvalues are zero.
•Thus, we cannot rely on the number of non-zero eigenvalues to show us the "true" number of factors (m). We will have to choose m ourselves beforehand, based on our best judgement (more on that later).

Example
[worked example shown on the slides]
•The solution produced a residual matrix with the minimum sum of squares, conditional on the prior communality estimates. If the prior communality estimates were different, a different residual matrix would satisfy the RSS criterion.

Short review

Iterative procedure
•That's really all there is (in principle) to OLS.
•By the way, the RSS function (the formula we have seen before: the sum of squared differences between the observed and model-implied correlations) is a discrepancy function – it quantifies the distance between the observed and model-implied correlation matrices. In other words, it expresses the degree of lack of model fit.
•Being a discrepancy function, it is always greater than or equal to zero, and it is zero only when the observed and model-implied correlation matrices are the same.

Heywood cases
•One nasty thing can happen when using OLS estimation.
•Some communalities can, in the course of the iterations, become greater than one. Conversely, the unique variances can become less than zero (because in a standardized solution, the communality and the unique variance of an MV add up to one).
•But there's no such thing as negative variance. Such a solution would be nonsensical and unacceptable. We call these occurrences Heywood cases.

Heywood cases
•If you're using smart software, you should be notified whenever a Heywood case occurs.
•Smart software can also help you circumvent the problem by placing a constraint on the associated unique variance such that it can only be greater than or equal to zero (the sketch at the end illustrates the idea).

Summary
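To tie the pieces together, here is a minimal Python/numpy sketch of the whole iterative procedure in its classic principal-axis (OLS-style) form – my own illustration, not material from the slides; the function name and the example matrix are made up. It starts from SMCs as prior communalities, alternates eigendecomposition of the reduced correlation matrix with updated communalities, and flags Heywood cases:

import numpy as np

def iterative_principal_axis(R, m, max_iter=200, tol=1e-6):
    # Start from SMCs as the prior communality estimates.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    L = None
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities on the diagonal.
        R_red = R.copy()
        np.fill_diagonal(R_red, h2)
        # eigh returns eigenvalues in ascending order; take the m largest.
        vals, vecs = np.linalg.eigh(R_red)
        vals, vecs = vals[::-1][:m], vecs[:, ::-1][:, :m]
        # Loadings from the m leading eigenpairs (negative eigenvalues clipped).
        L = vecs * np.sqrt(np.clip(vals, 0.0, None))
        h2_new = np.sum(L ** 2, axis=1)
        # Heywood check: a communality >= 1 implies a negative unique variance.
        if np.any(h2_new >= 1.0):
            print("Heywood case: constraining unique variance(s) to be >= 0")
            h2_new = np.minimum(h2_new, 1.0)
        if np.max(np.abs(h2_new - h2)) < tol:
            h2 = h2_new
            break
        h2 = h2_new
    return L, h2

# Usage with a hypothetical sample correlation matrix R and m = 1 factor:
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
L, h2 = iterative_principal_axis(R, m=1)
residual = R - (L @ L.T + np.diag(1.0 - h2))  # observed minus model-implied

Real software enforces the non-negativity constraint on unique variances more carefully (e.g., via bounded optimization); the simple clamp above only illustrates the idea.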