# Using Census data to derive a new area-based measure of deprivation - Section 3: Derivation

## Section 3: Deriving a new area-based measure based on Census 2011

To form our composite measure, we utilise data at the output area level on the qualifications and occupations of residents.

Earlier in this paper, we noted why vehicle ownership is not suitable for inclusion in a measure of deprivation. While Townsend (1987) incorporates housing quality into the construction of his index, a synthesis of the evidence on the drivers of poverty by HM Government (2014) shows that low income is more likely to lead to a family living in poor-quality housing, rather than the other way around.

The same study, however, highlights the important role that education plays in deprivation. ONS (2018c) illustrates the relationship between education and earnings, with those holding a degree or equivalent having the highest median gross hourly pay. Meanwhile, Table 29 in ONS (2022) shows that those in professional occupations tend to have the highest earnings. The transmission mechanism linking education, occupation and income, as noted in HM Government (2014), is therefore as follows. Adults with few or no recognised qualifications are more likely to experience spells of unemployment or underemployment and to find low-paid work (for example, routine occupations such as waiters or bar staff). In contrast, those with higher levels of education often secure high-wage employment (for example, professional jobs in accountancy and law), a pattern illustrated by ONS (2001).

Across the UK, there are 232,296 Census 2011 output areas. For each of these, we generate the following two variables:

- Proportion of residents in the output area aged 16 and over with below level 4 qualifications
- Proportion of residents in the output area aged 16 to 74 in NSSEC groups 3 to 8 (residents who could not be classified were excluded from the calculation)
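As a minimal sketch of how the two indicators could be computed for a single output area, the Python below assumes the relevant Census counts have already been extracted into one record per area. The class `OutputAreaCounts`, its field names, the function `area_indicators` and the example counts are all hypothetical and do not correspond to any published Census table layout.

```python
from dataclasses import dataclass


@dataclass
class OutputAreaCounts:
    """Hypothetical Census counts for one output area (illustrative field names)."""
    area_code: str
    residents_16_plus: int  # all residents aged 16 and over (qualifications base)
    below_level_4: int      # of whom: highest qualification below level 4
    nssec_1_to_8: int       # residents aged 16 to 74 classified in NSSEC groups 1 to 8
    nssec_3_to_8: int       # of whom: in NSSEC groups 3 to 8


def area_indicators(counts: OutputAreaCounts) -> tuple[float, float]:
    """Return (% below level 4 qualifications, % in NSSEC groups 3 to 8)."""
    if counts.residents_16_plus == 0 or counts.nssec_1_to_8 == 0:
        raise ValueError(f"empty denominator for {counts.area_code}")
    pct_below_level_4 = 100 * counts.below_level_4 / counts.residents_16_plus
    # Unclassified residents are excluded, so the base is NSSEC groups 1 to 8 only.
    pct_nssec_3_to_8 = 100 * counts.nssec_3_to_8 / counts.nssec_1_to_8
    return pct_below_level_4, pct_nssec_3_to_8


# Invented counts for a single illustrative area.
example = OutputAreaCounts("OA_EXAMPLE", residents_16_plus=233, below_level_4=177,
                           nssec_1_to_8=193, nssec_3_to_8=131)
quals, occ = area_indicators(example)
print(f"{quals:.1f}% below level 4, {occ:.1f}% in NSSEC 3 to 8")
# prints: 76.0% below level 4, 67.9% in NSSEC 3 to 8
```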

Table 1 presents some initial summary statistics on the population sizes (the denominators) of the two variables we draw upon to create our composite measure. The median number of residents aged 16 to 74 in NSSEC groups 1 to 8 across the output areas of the UK is 193. The median is slightly higher, at 233, for residents aged 16 and over whose education levels form the base from which the proportion with below level 4 qualifications is calculated. Though there are a few particularly small or large populations at either end of the scale, the proportions that make up our measure are generally based on totals ranging from roughly 40 to 450. Smaller population sizes are more common in Scotland, where an output area can contain as few as 20 households.

In Australia, two of the criteria the Australian Bureau of Statistics uses when deciding whether an area will be given a score are whether its population is under 11 and whether the denominator of a variable in the index is less than 6. In such areas, confidentiality and data quality concerns lead to the value being suppressed. Only 1 output area in our file would be flagged under these criteria. Indeed, in only 0.1% of cases is the occupation proportion based on a total of less than 30. We therefore do not impose any form of suppression on our data in this paper.
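The suppression check described above can be sketched as follows, assuming each area is represented as a tuple of (total population, qualifications denominator, occupation denominator). The function names and the four example areas are hypothetical, and the two thresholds are simply those quoted above rather than a full reproduction of the ABS rules.

```python
def fails_abs_style_criteria(population: int, denominators: list[int],
                             min_population: int = 11,
                             min_denominator: int = 6) -> bool:
    """True if an area would be suppressed: population under 11 or any denominator under 6."""
    return population < min_population or any(d < min_denominator for d in denominators)


def suppression_summary(areas: list[tuple[int, int, int]],
                        small_base: int = 30) -> tuple[int, float]:
    """Count flagged areas and the share whose occupation base is below `small_base`.

    Each area is (population, qualifications denominator, occupation denominator).
    """
    flagged = sum(fails_abs_style_criteria(pop, [quals, occ]) for pop, quals, occ in areas)
    small_occupation_share = sum(occ < small_base for _, _, occ in areas) / len(areas)
    return flagged, small_occupation_share


# Invented example areas; only the second has a denominator below 6.
areas = [(300, 233, 193), (60, 33, 5), (120, 52, 28), (500, 436, 355)]
print(suppression_summary(areas))  # (1, 0.5)
```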

**Table 1: Summary statistics on the population sizes for the two variables used to generate the area-based measure of disadvantage**

Statistic | Occupation (aged 16 to 74, NSSEC groups 1 to 8) | Qualifications (aged 16 and over) |
---|---|---|
Minimum | 5 | 33 |
1st percentile | 41 | 52 |
50th percentile | 193 | 233 |
99th percentile | 355 | 436 |
Maximum | 1,895 | 4,087 |
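Summary statistics like those in Table 1 could be produced along the following lines. This sketch uses `statistics.quantiles` from the Python standard library with its `"inclusive"` method; the paper does not state which percentile definition was used, so that choice is an assumption, and `denominator_summary` is a hypothetical helper.

```python
from statistics import quantiles


def denominator_summary(denominators: list[int]) -> dict[str, float]:
    """Minimum, 1st/50th/99th percentiles and maximum of a list of area base counts."""
    data = sorted(denominators)
    # n=100 returns 99 cut points: index 0 is the 1st percentile, 49 the median, 98 the 99th.
    cuts = quantiles(data, n=100, method="inclusive")
    return {
        "Minimum": data[0],
        "1st percentile": cuts[0],
        "50th percentile": cuts[49],
        "99th percentile": cuts[98],
        "Maximum": data[-1],
    }


# Toy input: the integers 1 to 101, so each percentile is easy to check by hand.
for label, value in denominator_summary(list(range(1, 102))).items():
    print(f"{label}: {value:g}")
```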

We find that the median output-area proportion of residents with below level 4 qualifications is 76%, while the corresponding figure for the share in NSSEC groups 3 to 8 is 68%. The standard deviations of the two fields are very similar (around 15 percentage points). When exploring the correlation between the two, we find a strong, positive and linear relationship, with a Pearson's correlation coefficient of 0.91. Figure 1 demonstrates this graphically. Earlier, we discussed the transmission mechanism between education and occupation. Reassuringly, the expected pattern emerges in the data: in areas where residents hold higher levels of qualification, they are more likely to work in professional jobs and hence be classified in NSSEC groups 1 or 2.

**Figure 1: The correlation between the qualification and occupation indicators in the Census used to create our measure of deprivation**
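The Pearson coefficient reported above measures how closely the two indicators follow a straight line. A minimal standard-library implementation is sketched below; the function name `pearson_r` and the five toy areas are illustrative only and are not the data behind the reported coefficient of 0.91.

```python
import math
from statistics import mean, median, stdev


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical percentages for five output areas (qualifications, occupation).
quals = [55.0, 70.0, 76.0, 84.0, 92.0]
occ = [45.0, 61.0, 68.0, 75.0, 88.0]
print(f"medians: {median(quals)}, {median(occ)}")
print(f"standard deviations: {stdev(quals):.1f}, {stdev(occ):.1f}")
print(f"Pearson r: {pearson_r(quals, occ):.3f}")
```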

In the Townsend index, each of the four variables is standardised (converted to a z-score) before they are added together, so that all carry equal weight. The unemployment and overcrowding indicators are log-transformed before standardising because of the skewed distributions of these fields. Standardisation is often used when several variables are not all measured on the same scale. That is not the case here, though: both of the variables we are working with are percentages that can range from 0 to 100.
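For comparison, a Townsend-style score can be sketched as follows: log-transform the two skewed indicators, convert each of the four indicators to z-scores and sum them with equal weights. The use of ln(x + 1) as the log transform, the population standard deviation in the z-scores and the function names are assumptions made for this illustration, and the three-area input is invented.

```python
import math
from statistics import mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def townsend_style_scores(unemployment: list[float], no_car: list[float],
                          not_owner_occupied: list[float],
                          overcrowding: list[float]) -> list[float]:
    """Equal-weight sum of standardised indicators, one score per area."""
    # The two skewed indicators are log-transformed first; ln(x + 1) avoids log(0).
    unemployment_t = [math.log(v + 1) for v in unemployment]
    overcrowding_t = [math.log(v + 1) for v in overcrowding]
    components = [z_scores(unemployment_t), z_scores(no_car),
                  z_scores(not_owner_occupied), z_scores(overcrowding_t)]
    # zip(*components) yields the four standardised values for each area in turn.
    return [sum(area) for area in zip(*components)]


# Invented percentages for three areas; higher scores indicate greater deprivation.
scores = townsend_style_scores([5, 10, 20], [20, 35, 60], [25, 40, 70], [1, 3, 8])
print([round(s, 2) for s in scores])
```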

Furthermore, our two variables display a strong linear association, not far from the perfect linear relationship that a correlation of 1 would represent, and both exhibit similar distributions and comparable summary statistics. In these circumstances, the simplest ways to create a composite measure are either to take the average of the two values for each output area or to use just one of the indicators. We choose the former approach. A technique such as principal component analysis would not produce any material difference in the final composite measure, and would have the disadvantage of being harder for end users of the statistics to interpret. Indeed, we find the correlation between our preferred approach to creating the index and an alternative method based on principal component analysis to be 1.

For each of our 232,296 output areas, we therefore assign a value equal to the average of the two variables we originally started with on education and occupation. Averaging the two medians above suggests that the median output area would have a value of around 72% on this new composite measure. All output areas are then ranked on this value, with those displaying the highest percentages considered the most disadvantaged, as they have the largest proportions of residents without level 4 qualifications and working in occupations that fall into NSSEC groups 3 to 8.
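The final step, averaging the two indicators and ranking areas so that the highest values come first, might look like the sketch below. The output-area codes and percentages are invented, and giving rank 1 to the most disadvantaged area is an assumed convention rather than one stated in the text.

```python
def rank_by_composite(areas: dict[str, tuple[float, float]]) -> list[tuple[int, str, float]]:
    """Average the two indicators per area and rank, highest (most disadvantaged) first.

    `areas` maps an output-area code to (% below level 4 qualifications,
    % in NSSEC groups 3 to 8). Ties keep their input order because sorted() is stable.
    """
    composite = {code: (quals + occ) / 2 for code, (quals, occ) in areas.items()}
    ordered = sorted(composite.items(), key=lambda item: item[1], reverse=True)
    return [(rank, code, score) for rank, (code, score) in enumerate(ordered, start=1)]


# Hypothetical output-area codes and percentages.
example_areas = {"OA_A": (76.0, 68.0), "OA_B": (90.0, 85.0), "OA_C": (40.0, 30.0)}
for rank, code, score in rank_by_composite(example_areas):
    print(rank, code, score)
```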


#### Contents

- Summary
- Abstract
- 1. Introduction
- 2. Data
- **3. Deriving a new area-based measure based on Census 2011**
- 4. Is our measure of deprivation likely to be correlated with income?
- 5. Should housing tenure form part of a composite measure based on the Census?
- 6. How does our measure compare to the Indices of Deprivation?
- 7. Discussion and concluding remarks
- References