RSS Matters


How to Calculate Empirically Derived Composite or Indicator Scores

Link to the last RSS article here: Working with Sage and R -- Ed.

By Dr. Jon Starkweather, Research and Statistical Support Consultant

This month’s article was motivated by the frequent need to calculate composite scores from multiple variables. Often we have Likert response survey data and want to combine several questions’ or items’ responses into a composite score, and then use the composite score(s) as a variable of interest in some traditional analysis. For those unfamiliar, Likert response survey questions have response choices such as: strongly disagree, disagree, agree, strongly agree. The response choices are typically considered ordinal, meaning they represent sequentially ordered categories (e.g. strongly disagree = 1, disagree = 2, agree = 3, strongly agree = 4). In this context, we typically refer to the words as labels (e.g. “strongly disagree”) and the numbers as values (e.g. “1”). Occasionally, we are confronted with a client who wants to simply average each participant’s response values on several questions to arrive at a composite score for the domain the questions are believed to be assessing. This is generally a bad idea because it treats each question as contributing to the composite score equally, which is often not the case when one considers the latent variable structure of what one is attempting to measure or assess.

Essentially, we will be using factor analysis to generate the composite scores. In a very real sense, the best composite scores are factor scores when the structure of the data is known, or there is a strongly supported belief about that structure. There are a few ways to go about generating the composite, or factor, scores, depending on what type of structure you believe the data contains (e.g. single factor, multiple correlated factors, multiple uncorrelated factors, bifactor model, hierarchical factor model, etc.) and how the variables are measured or scaled (e.g. nominal, ordinal or Likert, interval/ratio, etc.).

The general procedure for generating composite/indicator scores includes the following steps: (1) convert, or recode, nominal or ordinal (Likert) responses to numeric responses; (2) apply a factor analysis model which reflects the known structure, or calculated correlation structure, of the variables; (3) save the factor scores and factor loadings; (4) rescale the factor scores using the factor loadings, the weighted mean, and the weighted standard deviation of the original data, so that the composite scores reflect (as nearly as possible) the semantic (i.e. word) meaning of the original data. In this process, the factor loadings serve as weights for the weighted mean and weighted standard deviation calculations. The last step of rescaling is necessary because it allows us to retain the meaning of the responses which went into creating the composites. For instance, if we have a composite score of 3.6, and the four questions’ responses used to create that composite were all 4-point Likert style with the labels and values strongly disagree = 1, disagree = 2, agree = 3, strongly agree = 4, then we can say the 3.6 means that the person associated with that score responded more with strongly agree than with agree, disagree, or strongly disagree. The primary benefits of using the rescaled factor scores as composite scores are that they are considered interval/ratio scaled and that they more closely reflect a true score on the latent construct we were attempting to measure.


First, import some example data (the data file used below has been simulated).

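The article’s simulated data file is not reproduced here, so the sketch below generates comparable Likert data instead; the sample size, loadings, noise levels, and the shared component producing the mild correlation among constructs are all assumptions.

```r
# A sketch only: simulate survey data shaped like the article's example.
set.seed(1)
n <- 250
likert.4 <- c("strongly disagree", "disagree", "agree", "strongly agree")
likert.5 <- c("lethargic", "not very active", "no difference",
              "more active", "hyperactive")

g <- rnorm(n)  # shared component, so the three constructs are mildly correlated

# One group of items driven by a single latent construct; the continuous
# responses are cut into ordered categories carrying the word labels.
gen.group <- function(k, labels) {
  latent <- 0.9 * rnorm(n) + 0.45 * g
  items <- lapply(seq_len(k), function(i) {
    y <- 0.8 * latent + rnorm(n, sd = 0.6)
    cut(y, breaks = quantile(y, seq(0, 1, length.out = length(labels) + 1)),
        labels = labels, include.lowest = TRUE)
  })
  names(items) <- paste0("v", seq_len(k))
  as.data.frame(items)
}

data.1 <- cbind(gen.group(4, likert.4),   # q1 - q4
                gen.group(5, likert.5),   # q5 - q9
                gen.group(5, likert.4))   # q10 - q14
names(data.1) <- paste0("q", 1:14)
str(data.1)
```

In the article's setting you would instead import the actual data file (e.g. with read.table or read.csv).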

In this example, we have three groups of questions: q1 - q4, q5 - q9, and q10 - q14. Each group measures a particular latent construct (i.e. indirect measurement), and the three latent constructs are mildly correlated with one another. The response choices for q1 - q4 and q10 - q14 were the same: 1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree. The response choices for q5 - q9 were: 1 = lethargic, 2 = not very active, 3 = no difference, 4 = more active, 5 = hyperactive. These statements represent the known (or strongly supported) hypothesis of the data's factor structure; in this example, a model with three mildly correlated factors.

Consider the situation where you have a set of Likert-scaled items which you believe are the result of one continuously scaled latent factor that is not related to any other questions or factors in the analysis. In that case, you would recode the ordinal responses as numeric, run a one-factor model, collect the factor scores, and rescale the factor scores as composite scores which reflect the original metric. As an example, consider q1, q2, q3, and q4, which we believe reflect a single latent construct.


First, recode the responses into numbers which reflect the ordinality of the original responses (the words). This can be tricky because R cannot tell whether "agree" should be a 1, 2, 3, etc., so it is best to impose the values on specific labels using a fairly simple function which returns a subset of variables containing the recoded data. Here, we extract the four columns of the original data and assign them to an object called ‘subset.1’, then submit that object to the recoding function and re-assign the result to that same name (subset.1).
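A sketch of that recoding step is below; the function and object names (recode.4, subset.1, data.1) are assumptions, and a toy data.1 stands in for the imported survey data so the example runs on its own.

```r
# Toy stand-in for the imported survey data (assumption).
data.1 <- data.frame(q1 = c("agree", "strongly agree", "disagree", "agree"),
                     q2 = c("agree", "agree", "strongly disagree", "disagree"),
                     q3 = c("disagree", "strongly agree", "disagree", "agree"),
                     q4 = c("agree", "agree", "disagree", "strongly agree"))

# Map each 4-point label to its value explicitly, so the ordering is imposed
# rather than guessed.
recode.4 <- function(df) {
  key <- c("strongly disagree" = 1, "disagree" = 2,
           "agree" = 3, "strongly agree" = 4)
  as.data.frame(lapply(df, function(col) unname(key[as.character(col)])))
}

subset.1 <- data.1[, c("q1", "q2", "q3", "q4")]
subset.1 <- recode.4(subset.1)
subset.1   # q1 is now 3, 4, 2, 3
```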


Since we now have a properly recoded (and numeric) version of the subset, we can apply a one-factor model. Of course, factor analysis assumes a linear relationship between each pair of variables included in the factor model, so it is suggested that linearity be checked among each pair of variables included.

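A sketch of this step using base R's factanal is below; the simulated subset.1 is an assumption standing in for the recoded q1 - q4 data.

```r
# Simulated stand-in for the recoded, numeric q1 - q4 responses (assumption).
set.seed(3)
latent <- rnorm(300)
subset.1 <- as.data.frame(lapply(1:4, function(i)
  round(pmin(pmax(2.5 + latent + rnorm(300, sd = 0.5), 1), 4))))
names(subset.1) <- paste0("q", 1:4)

# Rough check of pairwise association / linearity before modeling.
round(cor(subset.1), 2)
pairs(subset.1)

# One-factor model; regression-method factor scores are requested up front.
fa.1 <- factanal(subset.1, factors = 1, scores = "regression")
fa.1$loadings               # how much each question contributes
f.scores <- fa.1$scores[, 1]
mean(f.scores)              # regression factor scores are centered at zero
```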

Simply use the "$scores" operator on the factor analysis object to extract the factor scores.


You'll notice the factor scores have a mean of zero, so in order for them to have (semantic) meaning, we must convert them back into the scale of the original 1 - 4 responses. To do this, we will need three things: the new factor scores, the raw data, and the factor loadings. The factor loadings augment the meaning of the composite scores by providing insight into how each question contributed to them. Notice in this example each of the four questions contributed approximately equally to the composite scores (i.e. the loadings are roughly equal). However, if you have loadings which are not (roughly) equal, then you must communicate the loadings (and why they are important) to anyone interpreting or using the composite scores.


Unfortunately, there is no base R function for calculating the weighted standard deviation (as there is for the weighted mean, weighted.mean). Therefore, we create a small function for calculating the weighted standard deviation, which is needed below during the rescaling of the factor scores. The function takes a vector of values (x) and a vector of weights (w), which here are the loadings, and returns the weighted standard deviation of the values.

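One minimal version of such a function is sketched below; note that definitions of the weighted standard deviation vary, and this sketch uses the simple (population-style) weighted form.

```r
# Weighted standard deviation: x = values, w = weights (here, the loadings).
wt.sd <- function(x, w) {
  wm <- weighted.mean(x, w)
  sqrt(sum(w * (x - wm)^2) / sum(w))
}

# With equal weights it reduces to the population standard deviation:
wt.sd(c(1, 2, 3, 4), rep(1, 4))   # sqrt(1.25), about 1.118
```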

The rescaling function below simply puts the scores back into the metric of the original questions. Keep in mind, some of the final scores may be slightly below 1 and some slightly above 4; this is because we modeled the latent 'true scores'. Now we have one set of scores, or one variable, which contains each participant's score on the latent variable (subset.1 = ss1: q1 - q4).

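The article's exact rescaling function is not reproduced here; the sketch below shows one way to implement the idea it describes: z-score the factor scores, then give them the loading-weighted mean and weighted standard deviation of the original responses. A weighted SD helper, as described above, is included so the sketch runs on its own.

```r
# Weighted SD helper (one common form; definitions vary).
wt.sd <- function(x, w) {
  wm <- weighted.mean(x, w)
  sqrt(sum(w * (x - wm)^2) / sum(w))
}

# Rescale factor scores into the metric of the original items, with the
# loadings serving as the weights (a sketch of the article's procedure).
rescale <- function(f.scores, raw, loadings) {
  x <- unlist(raw)                      # stack all responses, column-wise
  w <- rep(loadings, each = nrow(raw))  # each item's loading, once per person
  z <- (f.scores - mean(f.scores)) / sd(f.scores)
  weighted.mean(x, w) + z * wt.sd(x, w)
}
```

Applied to 1 - 4 Likert data, the rescaled scores land back on (approximately) the 1 - 4 metric, with a few values possibly falling slightly outside it, as noted above.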

We can go ahead and apply this same general procedure to the other two sets of questions (q5 - q9 and q10 - q14). However, because q5 - q9 have a 5-point Likert response format, we need a second recoding function to put those responses into numeric format.

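The second recoding function follows the same pattern, with the 5-point key; the function name is an assumption.

```r
# Recode the 5-point activity labels (q5 - q9) to their ordinal values.
recode.5 <- function(df) {
  key <- c("lethargic" = 1, "not very active" = 2, "no difference" = 3,
           "more active" = 4, "hyperactive" = 5)
  as.data.frame(lapply(df, function(col) unname(key[as.character(col)])))
}

recode.5(data.frame(q5 = c("lethargic", "more active", "hyperactive")))
# q5 becomes 1, 4, 5
```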

Next, we create a single function which takes the numeric data, applies the one-factor model, extracts the factor scores and factor loadings, and applies the rescaling function. This function returns a list object with two elements: the rescaled scores and the factor loadings.

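A sketch of such a wrapper is below. The wt.sd and rescale helpers are the weighted standard deviation and rescaling functions the article describes; compact versions (assumptions, as before) are repeated here so the sketch runs on its own.

```r
# Helpers (sketched versions of the article's weighted SD and rescaling steps).
wt.sd <- function(x, w) {
  wm <- weighted.mean(x, w)
  sqrt(sum(w * (x - wm)^2) / sum(w))
}
rescale <- function(f.scores, raw, loadings) {
  x <- unlist(raw)
  w <- rep(loadings, each = nrow(raw))
  z <- (f.scores - mean(f.scores)) / sd(f.scores)
  weighted.mean(x, w) + z * wt.sd(x, w)
}

# Fit the one-factor model, then return rescaled scores and loadings.
composite <- function(num.df) {
  fa <- factanal(num.df, factors = 1, scores = "regression")
  loadings <- as.vector(fa$loadings)
  list(scores = rescale(fa$scores[, 1], num.df, loadings),
       loadings = loadings)
}
```

With the recoded subsets in hand, applying it would look like out.2 <- composite(subset.2) and out.3 <- composite(subset.3), with the results available as out.2$scores and out.2$loadings.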

Now, we can apply the above function to subset.2 and subset.3.


Notice the factor loadings from above are all roughly equal. Next, we can extract the rescaled factor scores.


Now, we can create a data frame which contains just the composite scores for each subset or section of the questionnaire.

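A sketch of assembling the composites is below; dummy vectors stand in for the three sets of rescaled scores (ss1 - ss3) so the example runs on its own.

```r
# Dummy stand-ins for the rescaled composite score vectors (assumptions):
# in the article's workflow these come from the three factor analyses.
set.seed(5)
ss1 <- runif(20, 1, 4)   # q1 - q4 composite (1 - 4 metric)
ss2 <- runif(20, 1, 5)   # q5 - q9 composite (1 - 5 metric)
ss3 <- runif(20, 1, 4)   # q10 - q14 composite (1 - 4 metric)

composite.scores <- data.frame(ss1 = ss1, ss2 = ss2, ss3 = ss3)
head(composite.scores)
```

With real data, the three columns would be mildly correlated, reflecting the relationships among the latent constructs.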

Keep in mind, because the three composite score variables are likely to be related, we could have chosen to run a single factor analysis specifying three latent factors (rather than doing three separate factor analyses). However, if a single model were applied, each question would have a loading on each latent factor, and some of those cross-loadings might be substantial. If they were, they might call into question the factor structure (i.e. question 2 was supposed to load on factor 1, but instead loaded most strongly on factor 3). Furthermore, if we knew these three latent variables, represented by the three composite score vectors, supported a global or general factor in a hierarchical fashion, then we would use these three composite score vectors in another one-factor model to calculate composite scores for that general factor.
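A sketch of that alternative single model is below: three factors with an oblique (promax) rotation, so the factors are allowed to correlate. The simulated num.data is an assumption standing in for the fully recoded q1 - q14 responses.

```r
# Simulated stand-in for the recoded q1 - q14 data (assumption): three item
# blocks, each driven by its own latent construct, mildly correlated via g.
set.seed(6)
n <- 300
g <- rnorm(n)
make.block <- function(k, top) {
  latent <- 0.9 * rnorm(n) + 0.45 * g
  as.data.frame(lapply(1:k, function(i)
    round(pmin(pmax((top + 1) / 2 + latent + rnorm(n, sd = 0.5), 1), top))))
}
num.data <- cbind(make.block(4, 4), make.block(5, 5), make.block(5, 4))
names(num.data) <- paste0("q", 1:14)

# Three-factor model with an oblique rotation; inspect the loading matrix
# for substantial cross-loadings.
fa.3 <- factanal(num.data, factors = 3, rotation = "promax")
print(fa.3$loadings, cutoff = 0.3)
```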


Generating composite scores using weighted factor scores is an extremely useful skill to have in one’s repertoire. The composite scores can be used as independent or dependent variables in more traditional analyses (e.g. linear regression). However, the example above provides only an introduction to calculating these composite scores. When data do not display the linear relationships required of factor analysis, one might explore the use of correspondence analysis, optimal scaling, or data transformations. The best defense against violations of assumptions, such as linearity, is a sound design and careful planning, which can often ensure the data one collects is capable of providing the information one is seeking.

References and Resources

Organization for Economic Co-operation and Development (OECD). (2008). Handbook on Constructing Composite Indicators: Methodology and User Guide. Paris: OECD Publishing.

Statistics Canada. (2010). Survey Methods and Practices. Ottawa, Canada: Minister of Industry.