**Factor Analysis in Personality Psychology**

**© 2010**

This eText is the property of Toru Sato. All rights reserved. This eText is
not to be copied, distributed, or downloaded without permission of the author.
Any violation of copyright found in this eText is unintentional. Please notify
the author if copyrighted material is found and not appropriately referenced.

**Exploratory factor analysis**

There are countless words and phrases for describing people's personalities. If a personality psychologist wanted to understand a particular person's personality, figuring out how characteristic each of these countless words and phrases was of that person would be an overwhelming task. To make life easier, personality psychologists commonly use a statistical tool that simplifies vast amounts of information by lumping similar information into clusters. This tool is a procedure known as **exploratory factor analysis**.

The basic idea behind factor analysis is quite simple. If two or more characteristics correlate, they may reflect a shared underlying trait. We could say, then, that patterns of correlations reveal the trait dimensions that lie beneath the measured qualities (Tabachnick & Fidell, 2005). Factor analysis is a more complex version of a correlation: instead of looking at the correlation between just two variables, factor analysis works with a large number of correlations among a large number of variables (Kline, 1994).

In order to conduct factor analysis, we first collect data on many variables across large numbers of people. The data can be collected in a myriad of ways. They can be derived from paper-and-pencil questionnaires on which we rate ourselves on various personality characteristics. They can also be derived from behavior ratings made by objective observers. We can even obtain data about people from their family members by asking what they think of those people. As long as the same data are collected from everybody participating, we can use the data for factor analysis.

Once we collect the data, we can calculate the correlations between every possible pair of variables. The researcher then examines the **eigenvalues**, among other things, to decide on the number of factors the data should be reduced to. Eigenvalues indicate how much accuracy we would lose if we simplified the data by lumping it into a specific number of factors. Although the point of this analysis is to simplify the data, there is a cost to simplifying data: by simplifying, we lose details and therefore accuracy. The smaller the number of factors we reduce the data to, the more we simplify the data, but the more accuracy we lose. Eigenvalues, among other things, allow us to conduct a cost-benefit analysis regarding how much we should simplify the data. After determining the number of factors the data should be reduced to, the set of correlations is put through a procedure called **factor extraction**. This procedure reduces the large number of variables to a smaller set of higher-order variables that we call "factors" (Kline, 1994).
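The steps above can be sketched in a few lines of code. The data here are simulated (hypothetical) ratings built so that two underlying traits drive six items, and the "eigenvalues greater than 1" rule shown at the end (often called the Kaiser criterion) is only one of several common guidelines for choosing the number of factors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 300 people rated on 6 personality items.
# Items 0-2 share one underlying trait, items 3-5 share another.
n = 300
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)
data = np.column_stack(
    [trait_a + rng.normal(scale=0.6, size=n) for _ in range(3)]
    + [trait_b + rng.normal(scale=0.6, size=n) for _ in range(3)]
)

# Correlations between every possible pair of variables.
corr = np.corrcoef(data, rowvar=False)

# Eigenvalues of the correlation matrix: each tells us how much of the
# total variance one factor would capture, i.e. how little accuracy we
# would lose by keeping that factor.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first
print(np.round(eigenvalues, 2))

# One common rule of thumb (the Kaiser criterion): retain factors whose
# eigenvalue exceeds 1, the variance of a single standardized variable.
n_factors = int(np.sum(eigenvalues > 1))
print("factors to retain:", n_factors)
```

With data built around two traits, two eigenvalues stand well above 1 and the rest fall well below it, which is the cost-benefit signal described above.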

Once the factors are extracted, we end up with a **factor
structure**. The factor structure consists of numerical figures known as
**factor loadings**. It may be useful to think of factor loadings as numbers
representing how much each variable correlates with particular "factors"
(Gorsuch, 1983). Variables that correlate highly with the factor are said
to "load on" that factor. Variables that do not correlate with the factor
are said not to load on it. The variables that load on the factor allow us
to figure out the underlying meaning of the factor (i.e., what do all of
the variables loading on that factor have in common?).
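A factor structure of this kind can be illustrated with a small sketch. The data are again simulated, and the extraction method shown (principal-component extraction, where loadings are eigenvectors scaled by the square roots of their eigenvalues) is just one of several extraction methods in use:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ratings: 300 people on 4 items. Items 0-1 reflect one trait
# (measured fairly precisely); items 2-3 reflect another (measured more noisily).
n = 300
t1, t2 = rng.normal(size=n), rng.normal(size=n)
data = np.column_stack([t1 + rng.normal(scale=0.5, size=n),
                        t1 + rng.normal(scale=0.5, size=n),
                        t2 + rng.normal(scale=1.0, size=n),
                        t2 + rng.normal(scale=1.0, size=n)])

corr = np.corrcoef(data, rowvar=False)

# Principal-component extraction: scale each eigenvector by the square
# root of its eigenvalue, so each loading behaves like the correlation
# between a variable and a factor.
eigval, eigvec = np.linalg.eigh(corr)
order = np.argsort(eigval)[::-1]          # largest eigenvalues first
eigval, eigvec = eigval[order], eigvec[:, order]
loadings = eigvec[:, :2] * np.sqrt(eigval[:2])  # keep two factors

# The factor structure: rows are variables, columns are factors.
print(np.round(loadings, 2))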

The final step in this process involves labeling the factors. Because a factor is defined by the variables that load on it, we must choose a label that characterizes the content of those variables (especially the variables with the highest factor loadings) as closely as possible. When we use factor analysis in personality research, a factor is typically viewed as a reflection of a personality trait, and the label for the factor becomes the name of that trait. Choosing representative labels for the factors is extremely important. Many researchers in psychology use factor analysis to construct and refine personality tests. Because we often forget that the label of a factor is merely something we have inferred from a cluster of correlating variables, we tend to assume that personality test scores directly reflect a person's personality traits with little to no error. Carelessness in labeling a factor may therefore lead to misunderstandings of test scores for thousands of people who take that personality test.

To sum up, factor analysis is a very useful statistical tool in the trait approach to personality psychology. Perhaps we could say that it has three very important functions in the study of personality. First, it simplifies the multiple ways we can understand a person by reducing the information to a smaller set of personality traits. Second, it provides a basis for thinking that some traits (those that form large, highly correlating clusters) may be more important than others. Third, factor analysis is extremely useful in creating personality measures. We keep test items (i.e., variables) that load highly on specific factors and discard items that do not. As researchers continue to create new test items, the items that do not load highly on certain factors are replaced by better ones.

Factor analysis is a very useful tool. However, please keep in mind that it is only a tool. Factor analysis can only tell us about the variables we put into it. Thus, the factors that emerge depend largely on the kind of data collected or the variables that were included in the analysis to begin with (Kline, 1994).

**Confirmatory factor analysis**

When we already have a theory about the factor structure, such as with an already established personality test that measures people on numerous personality traits, we can test whether new data fit the hypothesized factor structure (derived from theory or from previous research). This is called **confirmatory factor analysis** (conducted using structural equation modeling). With confirmatory factor analysis, the researcher begins with a hypothesized structure in mind. This structure specifies which variables (e.g., responses to personality test questions) will be correlated with which variables (Stevens, 1996).

We first specify which items will be correlated with which items by freeing and fixing parameters. We **free parameters** when we can theoretically assume that there is a significant correlation between two variables. We **fix parameters** when there is no theoretical reason to assume a significant correlation between the variables. Just as in exploratory factor analysis, we expect these variables to form correlating clusters. Unlike in exploratory factor analysis, however, in confirmatory factor analysis we have already determined which variables cluster together. Each cluster represents a "**latent variable**": a variable that is not directly measured but is inferred from the combined results of numerous measured variables (e.g., responses to personality test questions) within the cluster (Hershberger, Marcoulides, & Parramore, 2003).

Using either a correlation matrix together with the standard deviations of each of the variables or a variance/covariance matrix, we conduct the analysis to examine whether the data collected match this factor structure. How well the data match the hypothesized structure is referred to as "goodness of fit."
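In practice, confirmatory factor analysis is carried out with dedicated structural equation modeling software, but the core logic of freeing and fixing parameters can be sketched directly. The sketch below uses simulated (hypothetical) data for two correlated traits, each measured by two items; it estimates the free parameters by minimizing an unweighted least-squares discrepancy between the sample covariance matrix and the covariance matrix implied by the model (maximum likelihood estimation is more common in real analyses):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Simulated (hypothetical) data: two correlated latent traits, each
# measured by two test items.
n = 500
factors = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=n)
lam_true = np.array([[0.8, 0.0],
                     [0.7, 0.0],
                     [0.0, 0.9],
                     [0.0, 0.6]])
data = factors @ lam_true.T + rng.normal(scale=0.5, size=(n, 4))
S = np.cov(data, rowvar=False)        # sample variance/covariance matrix

# The hypothesized structure: freed loadings are True; loadings fixed to
# zero are False (items 0-1 load only on trait 1, items 2-3 on trait 2).
pattern = lam_true != 0

def implied_cov(theta):
    lam = np.zeros((4, 2))
    lam[pattern] = theta[:4]          # 4 free loadings
    phi = np.array([[1.0, theta[4]],  # factor correlation is free;
                    [theta[4], 1.0]]) # factor variances are fixed at 1
    psi = np.diag(theta[5:9])         # free residual variances
    return lam @ phi @ lam.T + psi

# Unweighted least-squares discrepancy between the sample covariance and
# the model-implied covariance.
def discrepancy(theta):
    return np.sum((S - implied_cov(theta)) ** 2)

start = np.concatenate([[0.5] * 4, [0.0], [0.5] * 4])
fit = minimize(discrepancy, start)
print("estimated free loadings:", np.round(fit.x[:4], 2))
print("estimated factor correlation:", round(fit.x[4], 2))
```

Because the model that generated the data matches the hypothesized structure, the estimated free parameters land near the values used in the simulation; with real data, how closely the implied matrix reproduces the sample matrix is what the goodness of fit indices summarize.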

There are numerous ways to assess **goodness of fit**. Common indices are the results of a chi-squared test, the goodness of fit index (**GFI**), the comparative fit index (**CFI**), the incremental fit index (**IFI**), the standardized root mean square residual (**SRMR**), and the root mean square error of approximation (**RMSEA**). These figures are statistically calculated to tell us how well the data match our hypothesized factor structure (Mulaik, James, van Alstine, Bennett, Lind, & Stilwell, 1989).
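Two of these indices have simple closed-form definitions that can be computed directly from a model's chi-squared results. The numbers plugged in below are hypothetical, and the cutoffs mentioned in the comments are common rules of thumb rather than strict standards:

```python
import math

def rmsea(chi2, df, n):
    # Root mean square error of approximation: 0 means perfect fit;
    # values below roughly .06-.08 are often taken as acceptable.
    return math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))

def cfi(chi2, df, chi2_null, df_null):
    # Comparative fit index: compares our model's misfit with that of a
    # "null" model in which all variables are uncorrelated; values near 1
    # (often above .95) are taken to indicate good fit.
    d_model = max(chi2 - df, 0)
    d_null = max(chi2_null - df_null, d_model)
    return 1 - d_model / d_null

# Hypothetical chi-squared results for a model tested on 500 people:
print(round(rmsea(chi2=48.3, df=34, n=500), 3))
print(round(cfi(chi2=48.3, df=34, chi2_null=912.0, df_null=45), 3))
```

Note how both indices penalize the amount by which the chi-squared statistic exceeds its degrees of freedom, i.e. the misfit beyond what sampling error alone would produce.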

If the goodness of fit indices are not satisfactory, we may reconsider our model. If we find that the **maximum likelihood estimates** (somewhat like correlations) of the free parameters are low, we can fix those parameters as long as it makes theoretical sense to think that there is no significant correlation between the variables. We can also look at the **modification indices** of the fixed parameters. A modification index tells us how much goodness of fit would improve if we freed a specific fixed parameter. If the modification index of a parameter is high, we may be able to free the parameter as long as it makes theoretical sense to think that there is a significant correlation between the variables. This may lead us to create an alternative model that still makes theoretical sense. It may involve removing some variables altogether or moving certain variables from one cluster to another. We may then conduct the confirmatory factor analysis with the new model to see how well it matches the data.

In other types of research we may use confirmatory factor analysis to test for two or more alternative factor structures to see which model matches the data better. In these cases, the model with the better set of goodness of fit indices is considered to be the better one.

When interpreting the findings of confirmatory factor analyses, it is important to keep in mind that oftentimes there is more than one factor structure that provides an excellent set of goodness of fit indices. Finding one factor structure with an excellent set of goodness of fit indices does not mean that there are no other possible models. In addition, because there are a number of goodness of fit indices, it is not uncommon to find that some of them say factor structure A is better than factor structure B while others say the exact opposite. Therefore, it is important to evaluate factor structures using multiple goodness of fit indices in a holistic manner (Biddle & Marlin, 1987).

**References**

Biddle, B. J., & Marlin, M. M. (1987). Causality, confirmation, credulity, and structural equation modeling. *Child Development, 58*, 4-17.

Gorsuch, R. L. (1983). *Factor analysis* (2nd ed.). Hillsdale, NJ: Erlbaum.

Hershberger, S. L., Marcoulides, G. A., & Parramore, M. M. (2003). Structural equation modeling: An introduction. In B. H. Pugesek, A. Tomer, & A. von Eye (Eds.), *Structural equation modeling: Applications in ecological and evolutionary biology* (pp. 3-41). Cambridge, UK: Cambridge University Press.

Kline, P. (1994). *An easy guide to factor analysis*.
New York: Routledge.

Mulaik, S. A., James, L. R., van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation of goodness-of-fit indices for structural equation models. *Psychological Bulletin, 105*, 430-455.

Stevens, J. P. (1996). *Applied multivariate statistics for the social sciences* (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Tabachnick, B. G., & Fidell, L. S. (2005). *Using multivariate statistics* (5th ed.). Needham Heights, MA: Allyn and Bacon.