Using Census data to derive a new area-based measure of deprivation - Section 2: Data

Section 2: Data

To derive the new area-based measure, as well as carry out the subsequent analysis, it was necessary to access and link a variety of data sources.

The first of these was 2011 Census data available in the public domain. The Census is a UK-wide collection that takes place every ten years, and completion is mandatory for all households. It is administered by the Office for National Statistics (ONS) in England and Wales, while the Northern Ireland Statistics and Research Agency (NISRA) and National Records of Scotland (NRS) gather the relevant data for Northern Ireland and Scotland, respectively. As well as there being a very high level of coverage across the population, the Census forms made available by the UK Data Service (2022) illustrate that questions are asked in a broadly consistent way across all four nations. A wide range of topics is covered, including employment, education, and home and vehicle ownership.

As specified earlier, the smallest geography at which data is subsequently released to the general public is the output area (or small area in Northern Ireland). In England and Wales, ONS (2021a) highlight that the aspiration was for output areas to contain approximately 125 households, while also being as homogeneous as possible (based on tenure and dwelling type). In Scotland, no such requirement was set on homogeneity, with NRS (2015) indicating that output areas are expected to contain between 20 and 78 households. NISRA (2019) point out that small areas in Northern Ireland average around 160 households (400 individuals) and are intended to be socially similar. Our starting point was therefore to obtain key statistics at output area level from the 2011 Census, supplied by ONS (2021b), NRS (2021b) and NISRA (2021), relating to various indicators, which included:

  a) Age structure (KS102)
  b) Health and provision of unpaid care (KS301)
  c) Household tenure (KS402)
  d) Household composition (KS105)
  e) Qualifications and students (KS501)
  f) National Statistics Socioeconomic Classification (NSSEC) (KS611)

As stated in the introduction, qualifications (KS501) and occupation (KS611) were the two variables we used to create our index. Data on age, housing tenure, household structure and self-reported health were used to develop summary statistics on our index and to assess the extent to which it may be correlated with low income. For example, The Health Foundation (2020) demonstrate that poorer self-reported health is correlated with lower household income, while HM Government (2014) note that lone parent households have a higher risk of experiencing long-term poverty.

This was followed by ingesting look-up files for each of the home nations. ONS (2018a) assigns each output area to an English region, while ONS (2018b) indicates how output areas in England and Wales map to lower layer super output areas (LSOAs), middle layer super output areas (MSOAs) and local authority districts. NRS (2011) supply the look-up file from output areas to data zones and intermediate zones for Scotland. It should be noted, though, that 2011 output areas in Scotland only nest perfectly into council areas, with best-fit aggregations having to be applied at other levels of geography. For Northern Ireland, NISRA (2013a) disseminate information on how small areas map to larger geographical domains, such as wards and local government districts (LGDs). However, small areas do not nest exactly into LGDs, so each small area is commonly assigned to the LGD in which the majority of its households are located. The same file also includes a variable summarising the extent to which an area is urban or rural, which allows us to assess whether the area-level measures that we examine capture rural parts of Northern Ireland. As this data source provides the 1992 LGDs and wards, we also use an additional file supplied by NISRA (2013b) to obtain the updated 2014 LGDs. Within this, there is supplementary detail on how 2014 District Electoral Areas (DEAs) match up to the 2014 LGDs. The rationale behind linking these look-up files to our Census data was that we would then have the codes needed to bring in the Indices of Deprivation and/or urban-rural classifications, which are formed at a higher level of geography.
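The majority-of-households rule and the subsequent linkage can be illustrated with a minimal Python sketch. This is not the procedure used by NISRA or in our own processing; the function names, the area codes (`SA_001`, `LGD_A`) and the household records are all hypothetical and exist only to show the logic. Ties are broken alphabetically, which is an assumption of the sketch.

```python
from collections import defaultdict


def assign_by_majority(household_locations):
    """Assign each small area to the larger area containing most of its households.

    `household_locations` is an iterable of (small_area_code, larger_area_code)
    pairs, one pair per household. Ties are broken alphabetically by the
    larger-area code (an assumption made purely for determinism).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for small_area, larger_area in household_locations:
        counts[small_area][larger_area] += 1

    assignment = {}
    for small_area, per_area in counts.items():
        # Highest household count first; alphabetical code order on ties.
        best_code, _ = min(per_area.items(), key=lambda kv: (-kv[1], kv[0]))
        assignment[small_area] = best_code
    return assignment


def attach_larger_area(census_rows, lookup, field="lgd"):
    """Return copies of the Census rows with the larger-area code attached.

    Small areas missing from the look-up receive None so that they can be
    inspected rather than silently dropped.
    """
    linked = {}
    for code, row in census_rows.items():
        new_row = dict(row)
        new_row[field] = lookup.get(code)
        linked[code] = new_row
    return linked


# Hypothetical household records: SA_001 straddles two larger areas.
household_locations = [
    ("SA_001", "LGD_A"),
    ("SA_001", "LGD_A"),
    ("SA_001", "LGD_B"),
    ("SA_002", "LGD_B"),
]
lookup = assign_by_majority(household_locations)

# Hypothetical Census rows keyed by small-area code.
census_rows = {
    "SA_001": {"households": 3},
    "SA_002": {"households": 1},
    "SA_003": {"households": 2},  # deliberately absent from the look-up
}
linked = attach_larger_area(census_rows, lookup)
unmatched = sorted(code for code, row in linked.items() if row["lgd"] is None)

print(lookup)
print(unmatched)
```

Keeping unmatched areas in the output with a missing code, rather than discarding them, makes it easier to check that every area has been linked before higher-level data such as the Indices of Deprivation are brought in.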

One of the drawbacks of the Indices of Deprivation is that they are less useful in capturing disadvantage in rural areas. Consequently, we wanted to ensure that our dataset contained information on the urban-rural classification for each nation. For Northern Ireland, this was already made available through the look-up file. In England and Wales, a grouping has been developed by the Department for Environment, Food and Rural Affairs (DEFRA, 2021) and we examine their detailed 10-fold categorisation when conducting our analysis for these two countries. For Scotland, the Scottish Government (2019a) have developed a file that outlines how data zones map to higher level geographies, with this also containing an urban-rural classification at varying levels of granularity.

The Indices of Deprivation (both the composite measure and individual domains) were then ingested for all four countries. The Ministry of Housing, Communities and Local Government (2019) was the government department responsible for publishing the most recent version for England. They also release supplementary data on the Income Deprivation Affecting Children Index (IDACI) and, given the relevance of this variable to our work, it was also brought into our dataset. The latest version of the Welsh Index of Multiple Deprivation (WIMD) was likewise disseminated in 2019 by Welsh Government (2019), while the Scottish Government (2020) circulated their updated Scottish Index of Multiple Deprivation (SIMD) data a year later. In Northern Ireland, NISRA (2017) distributed the Northern Ireland Multiple Deprivation Measure (NIMDM), which also included some extra data on the urban-rural nature of an area, as well as an Income Deprivation Affecting Children (IDAC) indicator.

Additionally, as we stated in the introduction to our paper, interest lies in understanding how indices developed from the Census correlate with income. Concerns around sensitivity and the potential impact on non-response have precluded the inclusion of questions on income in the Census. However, one file we can ingest to enhance our knowledge of how a derived index may be associated with income is the ONS (2020) small area income estimates for England and Wales. These figures have been derived using a model-based approach that draws on a dataset comprising both survey and administrative sources. Data at MSOA level is available for the financial year 2011/12 and contains four different income measures: total weekly household income, net weekly household income, and equivalised net weekly household income both before and after taking into account housing costs. However, similar data is not available for ingestion into our master dataset in either Scotland or Northern Ireland.
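Because the income estimates are published at MSOA level while the Census key statistics are at output area level, any comparison requires the two to be brought to a common geography. The Python sketch below shows one possible way of doing so, assuming a household-weighted mean of an output-area score within each MSOA followed by a Pearson correlation. It is an illustration only and does not describe the analysis reported later in the paper; the codes (`OA1`, `M1`), scores and income figures are invented toy values, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def aggregate_to_msoa(oa_rows, oa_to_msoa):
    """Household-weighted mean of an output-area score within each MSOA.

    `oa_rows` maps output-area code -> (score, households).
    `oa_to_msoa` maps output-area code -> MSOA code.
    """
    totals = defaultdict(lambda: [0.0, 0])  # MSOA -> [weighted sum, households]
    for oa_code, (score, households) in oa_rows.items():
        msoa = oa_to_msoa.get(oa_code)
        if msoa is None:
            continue  # no look-up match; skipped in this sketch
        totals[msoa][0] += score * households
        totals[msoa][1] += households
    return {m: s / h for m, (s, h) in totals.items() if h > 0}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy values only: (share of residents flagged by a hypothetical index, households).
oa_rows = {
    "OA1": (0.2, 100),
    "OA2": (0.4, 100),
    "OA3": (0.6, 50),
    "OA4": (0.8, 150),
    "OA5": (0.1, 120),
}
oa_to_msoa = {"OA1": "M1", "OA2": "M1", "OA3": "M2", "OA4": "M2", "OA5": "M3"}

# Toy MSOA-level weekly income figures (invented, not ONS estimates).
msoa_income = {"M1": 600.0, "M2": 450.0, "M3": 700.0}

msoa_scores = aggregate_to_msoa(oa_rows, oa_to_msoa)

# Compare only MSOAs present in both sources, in a fixed order.
common = sorted(set(msoa_scores) & set(msoa_income))
r = pearson([msoa_scores[m] for m in common], [msoa_income[m] for m in common])
print(msoa_scores, r)
```

Restricting the comparison to MSOAs present in both sources mirrors the situation described above, where income estimates exist for England and Wales but not for Scotland or Northern Ireland.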