Using Census data to generate a UK-wide measure of disadvantage - Data

3. Data

Data source 1: Census 2011

The Census is a UK-wide collection that takes place every ten years, and completion is mandatory for all households. It is administered by the ONS in England and Wales, while the Northern Ireland Statistics and Research Agency (NISRA) and National Records of Scotland gather the relevant data for Northern Ireland and Scotland, respectively. As well as achieving a very high level of coverage across the population, the Census asks questions consistently (as far as possible) across all four nations.[1] It covers a wide range of topics, including employment, education, and home and vehicle ownership.

The smallest geographic domain at which data is subsequently released to the general public is the output area (or ‘small area’ in Northern Ireland). In England and Wales, the aspiration was for output areas to contain approximately 125 households, while also being as homogeneous as possible (based on tenure and dwelling type).[2] In Scotland, no such requirement was set on homogeneity, with output areas expected to contain between 20 and 78 households.[3] Small areas in Northern Ireland average around 160 households (400 individuals) and are intended to be socially similar.[4] Note that output areas in England and Wales can be grouped into LSOAs, which can themselves be further aggregated into MSOAs. MSOAs contain between 5,000 and 15,000 residents, with the closest comparable grouping in Scotland being Intermediate Zones (IZs), which have populations ranging from 2,500 to 6,000.[5]

Our starting point was therefore to download key statistics at output area level from the 2011 Census, across all nations, relating to four aspects that one may believe to be associated with disadvantage.[6] These were:

  • Qualifications
  • Socio-economic classification (NSSEC)
  • Tenure (i.e. whether one owns or (socially/privately) rents their accommodation)
  • Car or van availability

These were then transformed to create the following variables:

  • Proportion of residents in an output area aged 16 and over with below level 4 qualifications[7]
  • Proportion of residents in an output area aged 16 to 74 in NSSEC groups 3 to 8 (those that couldn’t be classified were excluded from the calculation)[8]
  • Proportion of households in an output area living in (rented) social housing
  • Proportion of households in an output area without a car or van
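
The transformation from Census counts to these four proportions can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the field names and counts are hypothetical, and it assumes that the ‘below level 4’ count is obtained by subtracting those with level 4 qualifications or above from all residents aged 16 and over, and that NSSEC groups 3 to 8 are the classified residents who are not in groups 1 or 2.

```python
from dataclasses import dataclass
import math


@dataclass
class OutputAreaCounts:
    """Hypothetical Census counts for one output area (names are illustrative)."""
    residents_16_plus: int
    residents_16_plus_level4: int   # level 4 qualifications or above
    residents_16_to_74: int
    nssec_groups_1_2: int           # NSSEC groups 1 and 2
    nssec_unclassified: int         # could not be classified
    households: int
    households_social_rented: int
    households_no_car: int


def share(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def derive_variables(c):
    # Unclassified residents are excluded from the NSSEC denominator.
    classified = c.residents_16_to_74 - c.nssec_unclassified
    return {
        "below_level4": share(c.residents_16_plus - c.residents_16_plus_level4,
                              c.residents_16_plus),
        "nssec_3_to_8": share(classified - c.nssec_groups_1_2, classified),
        "social_rented": share(c.households_social_rented, c.households),
        "no_car": share(c.households_no_car, c.households),
    }


# Made-up counts for a single output area.
example = OutputAreaCounts(
    residents_16_plus=250, residents_16_plus_level4=50,
    residents_16_to_74=220, nssec_groups_1_2=60, nssec_unclassified=20,
    households=125, households_social_rented=25, households_no_car=50,
)
variables = derive_variables(example)
```

Returning NaN for an empty denominator is one possible design choice; it keeps an output area with no eligible residents or households visible in the data rather than silently dropping it.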

Data source 2: ONS small area income estimates

Government departments and policymakers have long required income data in order to increase understanding of poverty and deprivation. Sensitivity concerns and the potential impact on non-response have precluded questions on income appearing in the Census. As an alternative, the ONS have generated small area income estimates (at MSOA level) for England and Wales using a model-based approach, which draws upon the Family Resources Survey (FRS) and various administrative data sources (including the 2011 Census).[9] The final dataset created by the ONS that we utilise in this study is based on the financial year 2011/12[10] and consists of four income measures (where equivalised figures take into account household composition)[11]:

  • Total household weekly income (unequivalised)
  • Net household weekly income (unequivalised)
  • Net household weekly income before housing costs (equivalised)
  • Net household weekly income after housing costs (equivalised)

As the ONS note in the accompanying statistical bulletin, these estimates carry their own uncertainty. Given this, and given that the data are available only at a more aggregated geographic level and only for England and Wales, we do not use them in our derivation of a UK-wide measure of disadvantage. The range of income measures generated by the ONS also highlights the complexities of using income (or any measure derived from it) to determine disadvantage, such as establishing the most appropriate definition of income. More importantly for this research, the ONS dataset also contains variables such as local authority name, which prove most useful in our analysis, as will become clear later in this paper. To assist in understanding data at MSOA level, we also merge in MSOA names provided by the House of Commons Library.[12]

Data source 3: Scottish intermediate zones

Public Health Scotland publish a range of open data covering various issues. Under the Health and Care theme, one can access a file of geography codes and associated labels. This file provides the label names for the 1,279 IZs in Scotland as of 2011, as well as the associated higher-level geographies; council area name is of particular interest to us in this study.[13]

Data source 4: Northern Ireland Look Up Table

In a similar fashion to Public Health Scotland, Northern Ireland also disseminates information showing how small areas map to larger geographical domains, such as wards and local government districts (LGDs). This source also includes a variable summarising the extent to which an area is urban or rural, which allows us to assess whether the area-level measures that we examine capture rural parts of Northern Ireland.[14] As this data source provides the 1992 LGDs and wards, we also utilise an additional file supplied by NISRA to obtain the updated 2014 LGDs. This file also gives supplementary detail on how 2014 District Electoral Areas (DEAs) match up to the 2014 LGDs.[15]
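
Combining the two Northern Ireland lookups can be sketched as below. This is only an illustration under stated assumptions: it assumes both files can be keyed on a small area code, and every code, name and field label shown is made up rather than taken from the NISRA files.

```python
# Hypothetical extract of the main lookup: small area -> 1992 geographies
# plus an urban-rural indicator. All codes and names are invented.
small_area_lookup = {
    "SA001": {"ward_1992": "Ward A", "lgd_1992": "District A", "urban_rural": "Urban"},
    "SA002": {"ward_1992": "Ward B", "lgd_1992": "District B", "urban_rural": "Rural"},
    "SA003": {"ward_1992": "Ward B", "lgd_1992": "District B", "urban_rural": "Rural"},
}

# Hypothetical extract of the supplementary file: small area -> 2014 geographies.
lgd_2014_lookup = {
    "SA001": {"dea_2014": "DEA X", "lgd_2014": "District X"},
    "SA002": {"dea_2014": "DEA Y", "lgd_2014": "District Y"},
}


def combine_lookups(primary, supplementary):
    """Left-join supplementary fields onto every row of the primary lookup.

    Small areas missing from the supplementary file are kept, with None
    in the supplementary fields, so that gaps can be inspected later.
    """
    extra_fields = sorted({key for row in supplementary.values() for key in row})
    combined = {}
    for code, row in primary.items():
        extra = supplementary.get(code, {})
        combined[code] = {**row, **{field: extra.get(field) for field in extra_fields}}
    return combined


combined = combine_lookups(small_area_lookup, lgd_2014_lookup)

# Small areas flagged as rural, e.g. to check how well a measure covers them.
rural_codes = [code for code, row in combined.items() if row["urban_rural"] == "Rural"]
unmatched = [code for code, row in combined.items() if row["lgd_2014"] is None]
```

Keeping unmatched small areas in the combined table, rather than dropping them, makes it easy to see how many areas lack an updated 2014 geography.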

Data source 5: Urban-rural identification in England, Wales and Scotland

Given the presence of an urban-rural marker in the data for Northern Ireland, we explored whether similar information was available for the other nations of the UK. In England and Wales, a grouping has been developed by the Department for Environment, Food and Rural Affairs (DEFRA) and we examine the more detailed 10-fold classification when conducting our analysis for these two countries. In Scotland, the relevant data is supplied through Public Health Scotland. We again choose to utilise the most granular categorical variable provided, which consists of eight categories.[16]

Data source 6: HESA data

Our population of interest in the HESA Student record is UK domiciled full-time first degree entrants aged 18 to 20 in the academic year 2011/12. Some existing measures used in widening participation policy, such as POLAR or parental occupation (collected through the UCAS application form), relate specifically to young entrants. Given that part of this study focuses on assessing the similarities and differences between our variable and existing measures, it is important that we restrict our attention to a comparable group. From the Student record, we extracted any individual-level data that has been considered to be of use in assessing access to higher education (e.g. a derived POLAR marker, parental education/occupation, state school marker etc.), alongside fields relating to demographic and course characteristics, as well as prior qualifications. Using an individual’s postcode information, we then link this bespoke dataset to information gathered from the ONS postcode directory. This enables us to obtain the output area and MSOA (IZ) in which an individual lived prior to commencing higher education study, together with an additional measure of disadvantage (IMD).[17]

It is on the basis of the output area and MSOA (IZ) codes in the various sources outlined above that we are able to create a linked file that matches HESA records with these external datasets.
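
The restriction to the population of interest and the linkage through geography codes can be sketched as follows. This is a hedged illustration, not the linkage code used in the study: the record layout, field names, postcodes and area codes are all invented, and the postcode normalisation shown is simply one plausible way of making lookups consistent.

```python
def normalise_postcode(postcode):
    """Upper-case a postcode and remove all whitespace."""
    return "".join(postcode.split()).upper()


# Hypothetical postcode directory extract: postcode -> geography codes.
postcode_directory = {
    "AB12CD": {"output_area": "OA1", "msoa_or_iz": "M1"},
    "EF34GH": {"output_area": "OA2", "msoa_or_iz": "M2"},
}
# Hypothetical area-level data keyed on the same codes.
output_area_data = {"OA1": {"no_car": 0.4}, "OA2": {"no_car": 0.1}}
msoa_data = {"M1": {"area_name": "Area One"}, "M2": {"area_name": "Area Two"}}

# Invented student records.
students = [
    {"student_id": 1, "age": 18, "mode": "full-time", "level": "first degree",
     "uk_domiciled": True, "entry_year": "2011/12", "postcode": "ab1 2cd"},
    {"student_id": 2, "age": 19, "mode": "full-time", "level": "first degree",
     "uk_domiciled": True, "entry_year": "2011/12", "postcode": "ZZ9 9ZZ"},
    {"student_id": 3, "age": 25, "mode": "full-time", "level": "first degree",
     "uk_domiciled": True, "entry_year": "2011/12", "postcode": "EF3 4GH"},
]


def in_population(s):
    """UK domiciled full-time first degree entrants aged 18 to 20 in 2011/12."""
    return (s["uk_domiciled"] and s["mode"] == "full-time"
            and s["level"] == "first degree" and s["entry_year"] == "2011/12"
            and 18 <= s["age"] <= 20)


def link_record(s):
    """Attach geography codes and area-level variables; None where unmatched."""
    geo = postcode_directory.get(normalise_postcode(s["postcode"]), {})
    oa, msoa = geo.get("output_area"), geo.get("msoa_or_iz")
    linked = dict(s, output_area=oa, msoa_or_iz=msoa)
    linked["no_car"] = output_area_data.get(oa, {}).get("no_car")
    linked["area_name"] = msoa_data.get(msoa, {}).get("area_name")
    return linked


linked = [link_record(s) for s in students if in_population(s)]
```

Records whose postcode cannot be found are retained with missing area-level values, which allows the match rate to be checked before any analysis is run.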

[1] Please see for sample questionnaires from each nation.

[7] Level 4 qualifications (or above) comprise degrees, professional qualifications and other equivalent higher education qualifications.

[8] NSSEC 3 to 8 covers intermediate occupations, small employers and own account workers, lower supervisory and technical occupations, semi-routine occupations, routine occupations and those who have never worked (or are long-term unemployed).

[14] This data source can be found at Note that small areas are allocated a Settlement (2015), which is utilised to develop the (2015) urban-rural indicator.

[15] The excel file can be found at and is called ‘District Electoral Areas 2014 Lookup Tables’.

[17] We use 2015 IMD for England, 2014 IMD for Wales, 2012 IMD for Scotland and 2010 IMD for Northern Ireland.