Factor Analysis in Personality Psychology

© 2010

This eText is the property of Toru Sato. All rights reserved © 2010. This eText is not to be copied, distributed, or downloaded without permission of the author. Any violation of copyright found in this eText is unintentional. Please notify the author if copyrighted material is found and not appropriately referenced.

Exploratory factor analysis

There are countless words and phrases to describe people's personalities. If a personality psychologist wanted to understand a particular person's personality, it would be an overwhelming task to figure out how characteristic each of these countless words and phrases was of that person. To make life easier, personality psychologists commonly use a statistical tool to simplify vast amounts of information by lumping similar information into clusters. This tool is a procedure known as exploratory factor analysis.

The basic idea behind factor analysis is quite simple. If two or more characteristics correlate, they may reflect a shared underlying trait. We could say, then, that patterns of correlations reveal the trait dimensions that lie beneath the measured qualities (Tabachnick & Fidell, 2005). Factor analysis is a more complex version of correlation. Instead of looking at the correlation between just two variables, factor analysis examines a large number of correlations among a large number of variables (Kline, 1994).

In order to conduct a factor analysis, we first collect data on many variables across large numbers of people. The data can be collected in a myriad of ways. They can be derived from paper-and-pencil questionnaires on which we rate ourselves on various personality characteristics. They can also be derived from behavior ratings made by objective observers. We can also obtain data about people from their family members by asking what they think of those people. As long as the same data are collected from everybody participating, we can use those data for factor analysis.

Once we collect the data, we can calculate the correlations between every possible pair of variables. The researcher then examines the eigenvalues, among other things, to decide on the number of factors the data should be reduced to. Eigenvalues indicate how much of the variation in the data each potential factor accounts for, and therefore how much accuracy we would lose if we simplified the data down to a specific number of factors. Although the point of this analysis is to simplify the data, there is a cost to simplifying data. By simplifying data, we lose the details and therefore we lose accuracy. The smaller the number of factors we reduce the data to, the more we simplify the data, but the more accuracy we lose. Eigenvalues, among other things, allow us to conduct a cost-benefit analysis regarding how much we should simplify the data. After determining the number of factors the data should be reduced to, the set of correlations is put through a procedure called factor extraction. This procedure reduces the large number of variables to a smaller set of higher-order variables that we call "factors" (Kline, 1994).
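
As a rough sketch of this step, the eigenvalue examination can be carried out in a few lines of Python. The simulated questionnaire items, sample size, and the "keep eigenvalues greater than one" rule of thumb below are illustrative assumptions, not the author's specific procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate responses from 500 people on 6 questionnaire items:
# items 0-2 share one underlying trait, items 3-5 share another.
n = 500
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)
data = np.column_stack(
    [trait_a + rng.normal(scale=0.5, size=n) for _ in range(3)]
    + [trait_b + rng.normal(scale=0.5, size=n) for _ in range(3)]
)

# Correlations between every possible pair of variables.
corr = np.corrcoef(data, rowvar=False)

# Eigenvalues of the correlation matrix, largest first. Large
# eigenvalues mark factors that account for a lot of the variation;
# a common rule of thumb keeps only eigenvalues greater than 1.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
n_factors = int(np.sum(eigenvalues > 1.0))
print(n_factors)  # → 2
```

With two strong underlying traits built into the simulated data, two eigenvalues stand well above 1 and the rest fall well below it, which is the cost-benefit signal described above: reducing to two factors keeps most of the information.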

Once the factors are extracted, we end up with a factor structure. The factor structure consists of numerical figures known as factor loadings. It may be useful to think of factor loadings as numbers representing how strongly each variable correlates with each factor (Gorsuch, 1983). Variables that correlate highly with a factor are said to "load on" that factor. Variables that do not correlate with the factor are said not to load on it. The variables that load on a factor allow us to figure out the underlying meaning of the factor (i.e., what do all of the variables loading on that factor have in common?).
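
One way to see extraction and loadings in practice is scikit-learn's FactorAnalysis estimator. This is only one of several possible tools, and the simulated items, the two-factor choice, and the 0.4 loading cutoff are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Simulated data: 6 items, two underlying traits (as in the text).
n = 500
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)
data = np.column_stack(
    [trait_a + rng.normal(scale=0.5, size=n) for _ in range(3)]
    + [trait_b + rng.normal(scale=0.5, size=n) for _ in range(3)]
)

# Extract two factors; components_ holds the factor loadings,
# one row per factor, one column per measured variable. A varimax
# rotation makes each item load clearly on just one factor.
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(data)
loadings = fa.components_

# Items "load on" a factor when the loading is large in absolute
# value (|loading| > 0.4 is a common cutoff for data like these).
for k, row in enumerate(loadings):
    items = np.where(np.abs(row) > 0.4)[0]
    print(f"factor {k}: items {items.tolist()}")
```

Running this recovers the two clusters built into the simulation: one factor loaded by items 0-2 and the other by items 3-5, which is exactly the "what do these variables have in common?" question the text describes.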

The final step in this process involves labeling the factors. Because a factor is defined by the variables that load on it, we must choose a label that characterizes the content of those variables as closely as possible (especially the variables with the highest factor loadings). When we use factor analysis in personality research, a factor is typically viewed as a reflection of a personality trait, and the label for the factor becomes the name of that trait. Choosing representative labels for the factors is extremely important. Many researchers in psychology use factor analysis to construct and refine personality tests. We often forget that the label of a factor is merely something we have inferred from a cluster of correlating variables, and instead assume that personality test scores directly reflect a person's traits with little to no error. Carelessness in labeling a factor may therefore lead to misunderstandings of test scores for the thousands of people who take that personality test.

To sum up, factor analysis is a very useful statistical tool in the trait approach to personality psychology. Perhaps we could say that it has three very important functions in the study of personality. First, it simplifies the multiple ways we can understand a person by reducing the information to a smaller set of personality traits. Second, it provides a basis for thinking that perhaps some traits (those that form large, highly correlating clusters) are more important than others. Third, factor analysis is extremely useful in creating personality measures. We keep test items (i.e., variables) that load highly on specific factors and discard items that do not. As researchers continue to create new test items, the items that do not load highly on certain factors are replaced by better ones.

Factor analysis is a very useful tool. However, please keep in mind that it is only a tool. Factor analysis can only tell us about the variables we put into it. Thus, the factors that emerge depend largely on the kind of data collected or the variables that were included in the analysis to begin with (Kline, 1994).

Confirmatory factor analysis

When we already have a theory about the factor structure, such as with an established personality test that measures people on numerous personality traits, we can test whether the data fit the existing factor structure (based either on theory or on previous research). This is called confirmatory factor analysis (conducted using structural equation modeling). With confirmatory factor analysis, the researcher begins with a hypothesized structure in mind. This structure specifies which variables (e.g., responses to personality test questions) will be correlated with which other variables (Stevens, 1996).

We first specify which items will be correlated with which items by freeing and fixing parameters. We free a parameter when we can theoretically assume that there is a significant correlation between two variables. We fix a parameter when there is no theoretical reason to assume a significant correlation between the variables. Just as in exploratory factor analysis, we expect these variables to form correlating clusters. Unlike exploratory factor analysis, however, in confirmatory factor analysis we have already determined which variables cluster together. Each cluster represents a "latent variable": a variable that is not directly measured but inferred from the combined results of the numerous measured variables (e.g., responses to personality test questions) within the cluster (Hershberger, Marcoulides, & Parramore, 2003).
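
One schematic way to picture freeing and fixing parameters is as a pattern matrix: a 1 marks a loading that is freed (to be estimated), and a 0 marks a loading fixed at zero. The six items and two latent traits below are hypothetical.

```python
import numpy as np

# Hypothesized pattern for six items and two latent traits:
# 1 = free parameter (loading to be estimated),
# 0 = fixed parameter (no correlation assumed).
pattern = np.array([
    [1, 0],  # item 1 loads only on latent trait A
    [1, 0],  # item 2 loads only on latent trait A
    [1, 0],  # item 3 loads only on latent trait A
    [0, 1],  # item 4 loads only on latent trait B
    [0, 1],  # item 5 loads only on latent trait B
    [0, 1],  # item 6 loads only on latent trait B
])

# Each column defines one cluster, i.e., one latent variable
# inferred from the measured items that load on it.
for k in range(pattern.shape[1]):
    items = np.where(pattern[:, k] == 1)[0]
    print(f"latent variable {k}: items {items.tolist()}")
```

Unlike exploratory factor analysis, this clustering is decided before the analysis is run; the estimation then fills in values only for the freed loadings.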

Using either a correlation matrix along with the standard deviations of each variable or a variance/covariance matrix, we conduct the analysis to examine whether the data collected match this factor structure. How well the data match the hypothesized structure is referred to as "goodness of fit."

There are numerous ways to assess goodness of fit. Common indices of goodness of fit are: the results of a chi-squared test, the goodness of fit index (GFI), the comparative fit index (CFI), the incremental fit index (IFI), the standardized root mean square residual (SRMR), and the root mean square error of approximation (RMSEA). These figures are statistically calculated to tell us how well the data match our hypothesized factor structure (Mulaik, James, Van Alstine, Bennett, Lind, & Stilwell, 1989).
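
Of these indices, the SRMR has a particularly simple form: it is the root mean square of the differences between the observed correlations and the correlations implied by the hypothesized structure. The matrices below are made-up values for illustration, and including the diagonal in the average is one common convention among several.

```python
import numpy as np

# Observed correlations among four items (hypothetical values).
observed = np.array([
    [1.00, 0.62, 0.15, 0.10],
    [0.62, 1.00, 0.12, 0.08],
    [0.15, 0.12, 1.00, 0.58],
    [0.10, 0.08, 0.58, 1.00],
])

# Correlations implied by a hypothesized two-factor structure:
# items 0-1 on one factor, items 2-3 on the other, factors uncorrelated.
implied = np.array([
    [1.00, 0.60, 0.00, 0.00],
    [0.60, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.60],
    [0.00, 0.00, 0.60, 1.00],
])

# SRMR: root mean square of the residual correlations over the
# unique (lower-triangle) elements; for correlation matrices the
# diagonal contributes only zeros.
idx = np.tril_indices_from(observed)
residuals = (observed - implied)[idx]
srmr = np.sqrt(np.mean(residuals ** 2))
print(round(srmr, 3))  # → 0.074
```

Here the residuals are the small cross-cluster correlations the model fixed at zero; an SRMR this low would usually be read as a good match between data and hypothesized structure.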

If the goodness of fit indices are not satisfactory, we may reconsider our model. If we find that the maximum likelihood estimates (somewhat like correlations) of the free parameters are low, we can fix those parameters, as long as it makes theoretical sense to think that there is no significant correlation between the variables. We can also look at the modification indices of the fixed parameters. A modification index tells us how much the goodness of fit would improve if we freed a specific fixed parameter. If the modification index of a parameter is high, we may be able to free the parameter, as long as it makes theoretical sense to think that there is a significant correlation between the variables. This may lead us to create an alternative model that still makes theoretical sense. It may involve removing some variables altogether or moving certain variables from one cluster to another. We can then conduct the confirmatory factor analysis with the new model to see how well it matches the data.

In other types of research we may use confirmatory factor analysis to test for two or more alternative factor structures to see which model matches the data better. In these cases, the model with the better set of goodness of fit indices is considered to be the better one.

When interpreting the findings of confirmatory factor analyses, it is important to keep in mind that oftentimes, there is more than one factor structure that provides an excellent set of goodness of fit indices. Finding one factor structure with an excellent set of goodness of fit indices does not mean that there are no other possible models. In addition, because there are a number of goodness of fit indices, it is not uncommon to find that some of them will say that factor structure A is better than B and others will say the exact opposite. Therefore it is important to evaluate factor structures using multiple goodness of fit indices in a holistic manner (Biddle & Marlin, 1987).


Biddle, B. J., & Marlin, M. M. (1987). Causality, confirmation, credulity, and structural equation modeling. Child Development, 58, 4-17.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.

Hershberger, S. L., Marcoulides, G. A., & Parramore, M. M. (2003). Structural equation modeling: An introduction. In B. H. Pugesek, A. Tomer, & A. von Eye (Eds.), Structural equation modeling: Applications in ecological and evolutionary biology (pp. 3-41). Cambridge, UK: Cambridge University Press.

Kline, P. (1994). An easy guide to factor analysis. New York: Routledge.

Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430-445.

Stevens, J. P. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Tabachnick, B. G., & Fidell, L. S. (2005). Using multivariate statistics (5th ed.). Needham Heights, MA: Allyn and Bacon.
