Graduate Outcomes and other data on graduates
While Graduate Outcomes is the only national survey designed specifically to provide insight into the experiences of higher education graduates, the domains of several other datasets overlap to some extent with that of the Graduate Outcomes survey. Graduates in further study at UK higher education providers will be recorded in the HESA Student record, and linking the two datasets can provide further information about the quality of Graduate Outcomes data. Beyond HESA, both the Longitudinal Educational Outcomes (LEO) study and the Labour Force Survey (LFS) collect data on education and salary, with the LFS also including detailed information on employment and occupation. While Graduate Outcomes, LEO, and the LFS can provide complementary views of graduates in the workforce, it is important to understand key differences between the three data sources.
In Spring 2021, HESA analysts carried out a quality assurance investigation based on linked data from the HESA Student record and the 2017/18 Graduate Outcomes dataset. All graduates in the 2017/18 target population were linked with the Student records from 2017/18 to 2019/20, and fuzzy matching of data items contained in both Student and Graduate Outcomes was used to identify those members of the Graduate Outcomes population who appeared to be in further study according to the Student record during the relevant Graduate Outcomes census week. We then investigated the characteristics of three groups: graduates who appeared to be in further study in both datasets; those who recorded themselves in Graduate Outcomes as engaged in a course of further study or training but could not be found in the Student record; and those who appeared to be in further study in the Student record but not in the Graduate Outcomes dataset. In doing so, we hoped to evaluate the extent to which data on further study in the two datasets is consistent and comparable.
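The three-way comparison can be illustrated with a minimal sketch. The identifiers, dates, and field layout below are hypothetical, and exact matching on a shared identifier stands in for the fuzzy matching used in the investigation, the details of which are not given here.

```python
from datetime import date

# Hypothetical Graduate Outcomes responses: graduate id -> reported further study?
go_further_study = {"g1": True, "g2": True, "g3": False, "g4": False}

# Hypothetical Student record enrolments: graduate id -> (course start, course end)
student_enrolments = {
    "g1": (date(2018, 9, 24), date(2019, 9, 30)),
    "g3": (date(2018, 9, 24), date(2019, 9, 30)),
    "g5": (date(2019, 1, 14), date(2019, 12, 20)),  # did not respond to the survey
}

# Hypothetical census week (first day, last day), shared by all graduates for simplicity
census_week = (date(2019, 6, 3), date(2019, 6, 9))


def enrolled_during(period, week):
    """Return True if the enrolment period overlaps the census week."""
    start, end = period
    return start <= week[1] and end >= week[0]


def classify(go, enrolments, week):
    """Split graduates into the three groups compared in the investigation."""
    in_go = {g for g, studying in go.items() if studying}
    in_student = {g for g, p in enrolments.items() if enrolled_during(p, week)}
    return {
        "both": in_go & in_student,
        "graduate_outcomes_only": in_go - in_student,
        "student_record_only": in_student - in_go,
    }


groups = classify(go_further_study, student_enrolments, census_week)
```

In this toy example the "student_record_only" group contains both a respondent who did not report further study (g3) and a non-respondent (g5), which mirrors the two sub-groups that have to be separated in the real analysis.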
For those graduates who could be found in further study in both datasets, we examined the quality of the matches between the two datasets and identified the most frequent areas of mismatch. Of the 33,525 records which could be found in both datasets, 86% matched on UKPRN, 91% matched on mode of study, and 85% matched on level of study; nearly 79% of records matched on both provider and mode of study.
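As a rough illustration of how field-level agreement rates of this kind can be computed, the sketch below compares hypothetical linked record pairs on provider, mode, and level. The records, field names, and UKPRN values are invented for the example, and a null value on the Graduate Outcomes side is treated as a mismatch, which is an assumption rather than a documented rule.

```python
FIELDS = ("ukprn", "mode", "level")

# Hypothetical linked pairs: (Graduate Outcomes record, Student record)
linked_pairs = [
    ({"ukprn": "10000001", "mode": "full-time", "level": "PGT"},
     {"ukprn": "10000001", "mode": "full-time", "level": "PGT"}),
    ({"ukprn": None, "mode": "full-time", "level": "PGR"},
     {"ukprn": "10000002", "mode": "part-time", "level": "PGT"}),
    ({"ukprn": "10000003", "mode": "part-time", "level": "PGT"},
     {"ukprn": "10000003", "mode": "part-time", "level": "PGT"}),
    ({"ukprn": "10000004", "mode": None, "level": "PGT"},
     {"ukprn": "10000004", "mode": "full-time", "level": "PGT"}),
]


def matches(go, student, field):
    """A field matches only if the survey value is present and equal."""
    return go[field] is not None and go[field] == student[field]


def agreement_rate(pairs, fields):
    """Share of linked pairs that agree on every field in `fields`."""
    hits = sum(all(matches(go, st, f) for f in fields) for go, st in pairs)
    return hits / len(pairs)


rates = {f: agreement_rate(linked_pairs, [f]) for f in FIELDS}
provider_and_mode = agreement_rate(linked_pairs, ["ukprn", "mode"])
```

A joint rate can never exceed the lowest of the individual rates it combines, which is consistent with the 79% figure for provider and mode sitting below the 86% and 91% figures for each field alone.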
Mismatches at the level of UKPRN were most often the result of null data in Graduate Outcomes; of the records which did not match on UKPRN, 88% had a null value for the Graduate Outcomes variable UCNAME. Of the records which did not match on mode of study, 27% were the result of null data, and 46% were the result of graduates recording themselves as engaged in full-time study when the Student record indicated that they were enrolled part-time; the majority of these mismatches by mode of study occurred at the postgraduate (taught) level. Mismatches on level of study seemed to stem from some graduate uncertainty about the official levels of different qualifications: approximately 10% of graduates who identified themselves as engaged in a postgraduate (research) course were recorded in the Student record as enrolled on a postgraduate (taught) course, as were approximately 70% of those who said they were studying for a professional qualification.
Of the 21,840 records which could be found in Graduate Outcomes, but not the Student record, 15,755 stated that they were currently engaged in further study, while 6,090 reported that they were due to start further study in the next month; the 6,090 graduates due to start further study would not yet be expected to be in the Student record. Of the 15,755 respondents who recorded themselves as currently engaged in further study, nearly two thirds were not aiming for a formal qualification, were studying for a professional qualification, or were studying for a type of qualification not mentioned in the survey; graduates engaged in these types of study, like those who had not yet started their course of study, would not necessarily be expected to be in the Student record.
Of the 40,430 records which could be found in the Student record, but not in Graduate Outcomes, 29,595 (73%) represent members of the target population who did not respond to the Graduate Outcomes survey. Of the 10,835 respondents who could be found in further study in the Student record, but did not report themselves as engaged in further study in the Graduate Outcomes survey, 74% identified themselves as engaged in employment, 15% said they were unemployed, and 11% said they were engaged in some other activity. Around two thirds (66%) of these respondents said they had undertaken further study since graduation; of those, approximately one third were within 30 days of their course end date during census week, so it is possible that they had completed most or all of the formal requirements of their course and therefore considered themselves to be no longer in study.
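The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts and percentages quoted above; the variable names are illustrative, and the derived counts are approximations because the published percentages are rounded.

```python
# Counts quoted in the text
student_only = 40_430      # in the Student record but not in Graduate Outcomes
non_respondents = 29_595   # of these, did not respond to the survey

# Respondents found in further study in the Student record only
respondents = student_only - non_respondents

# Share of the Student-record-only group made up of non-respondents
non_response_share = non_respondents / student_only

# Approximate number who said they had studied since graduation (66%)
studied_since = 0.66 * respondents

# Approximate number within 30 days of course end ("approximately one third")
near_course_end = studied_since / 3
```

On these figures the respondent count works out at 10,835 and the non-response share at about 73%, matching the text; roughly 7,150 respondents had studied since graduation, of whom something in the region of 2,400 were close to their course end date.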
The work which has so far been carried out on linked Graduate Outcomes and Student data suggests that, where individuals can be found in both datasets, the two datasets match quite closely, which in turn suggests that the Graduate Outcomes data is generally robust. The cases in which individuals who should be identifiable in both datasets can only be found in one have raised a number of questions for us to pursue in further quality investigations and assessments of the survey instrument. Some of these questions surround the use of a more detailed linked dataset to help us better understand areas of mismatch, some surround the potential use of other relevant datasets to enhance our quality assurance capabilities, and some surround additional guidance which might help respondents provide fuller information about their further study.
The LEO dataset, which was first published in 2017, brings together education data from the Department for Education (DfE) along with employment, earnings, and benefits data from the Department for Work and Pensions (DWP) and Her Majesty’s Revenue and Customs (HMRC). Using these sources, LEO provides earnings and benefits information for graduates one, three, five, and ten years after completion of their qualifications; it also includes data on personal characteristics (gender, ethnicity, and age), university attended, subject studied, qualification achieved, and graduate movement between home region, provider region, and current region.
Unlike Graduate Outcomes, which, as a survey, depends on the individual responses of graduates, the LEO dataset is drawn from administrative data and includes information on all graduates from English providers in paid work in the UK; since LEO earnings data comes directly from HMRC, it is free of some of the risks of inaccuracy inherent in self-reported salary data. LEO does not, however, include data on hours worked, so it is not possible to distinguish between graduates who are in full-time work and those who are working part-time; this can be a particular issue for data on female graduates, who are more likely to be working part-time than their male counterparts. LEO also does not include data on graduates doing voluntary or unpaid work, and, because the LEO earnings data does not include self-assessment earnings, LEO data on graduates in self-employment cannot be entirely representative. LEO data, moreover, does not include information about occupation; the LEO record tells us what graduates earn, but it does not give us any further information about what graduates do.
Graduate Outcomes and LEO thus provide different pictures of the graduate population in the UK. One of the goals in the design of the Graduate Outcomes survey was to provide statistical outputs which could contextualise data on graduates from other sources, such as LEO, and this goal is reflected in the breadth of information collected in the Graduate Outcomes survey. While the LEO dataset provides data on a small number of variables for most graduates in the UK, and while it, moreover, tracks changes in earnings over time, the Graduate Outcomes survey provides a more detailed picture of each annual cohort at a single point in their post-university careers. The LEO dataset measures graduate outcomes only in terms of whether graduates are in paid employment and, if so, how much they are earning, while the Graduate Outcomes survey collects a broader range of information about what graduates are doing and how they feel about it.
While LEO is specifically geared towards collecting data about employment outcomes for higher education graduates, the LFS is a household survey designed to collect data about the employment circumstances of the UK population as a whole. It was first run in 1973 as a biennial survey and shifted to an annual survey in 1984; since 1992, the LFS has been collected quarterly, with a switch from seasonal to calendar quarters in 2006. Households participating in the LFS are surveyed for five consecutive quarters, with a fifth of the overall sample being replaced each quarter. Where LEO collects administrative data on all graduates in employment in the UK, the LFS is administered to a systematic sample of approximately 35,000 households in Great Britain, plus approximately 2,500 households from Northern Ireland; conclusions about overall patterns in employment circumstances are thus drawn from a relatively small portion of the UK population.
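The rotating design described above, with five consecutive quarterly interviews and a fifth of the sample replaced each quarter, can be sketched as follows. This is a simplified model that assumes equal-sized cohorts and no attrition; the quarter numbering and function names are hypothetical.

```python
WAVES = 5  # each household is interviewed in five consecutive quarters


def cohorts_in_sample(quarter):
    """Entry quarters of the cohorts interviewed in a given quarter.

    A cohort entering in quarter q is interviewed in quarters q to q + 4,
    so the sample in any quarter consists of the five most recent cohorts.
    """
    return set(range(quarter - WAVES + 1, quarter + 1))


def overlap_share(q1, q2):
    """Share of the sample (by cohort) common to two quarters."""
    return len(cohorts_in_sample(q1) & cohorts_in_sample(q2)) / WAVES


consecutive = overlap_share(10, 11)  # four of the five cohorts are retained
year_apart = overlap_share(10, 14)   # only the cohort that entered in quarter 10 remains
no_overlap = overlap_share(10, 15)   # every cohort has been replaced
```

Under these assumptions, 80% of the sample is shared between consecutive quarters and 20% between quarters a year apart, so quarterly LFS samples are not independent of one another.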
Unlike the LFS, which is concerned with the entire UK labour force, Graduate Outcomes is concerned only with those who have completed HE qualifications in a given year, and, while there will inevitably be some level of non-response, Graduate Outcomes aims to collect data from the entire target population. With 361,215 responses in the first year and 380,980 in the second, the Graduate Outcomes sample is thus much larger than the annual sample collected by the LFS, despite the narrower focus of the Graduate Outcomes survey.
Although both Graduate Outcomes and the LFS include questions about employment and education, the focuses of the two surveys are different. The LFS is primarily focused on employment, but participants are also asked to respond to the ONS4 subjective wellbeing (SWB) questions and to a series of questions about their educational attainment. Since not all LFS respondents have the same educational qualifications, the educational information collected in the survey allows for some comparison of outcomes between people with different educational histories. All Graduate Outcomes respondents, on the other hand, are higher education graduates, so different comparisons are possible; rather than encouraging comparisons between graduates and non-graduates, Graduate Outcomes encourages comparisons between different categories of graduates.
Respondents to the LFS can be at any stage in their careers; for those who have higher education qualifications, this means that they may be selected to participate in the LFS shortly after finishing their qualifications, or they may be selected many years later. Even within the subset of LFS respondents with higher education qualifications, there will therefore be a wider variation in experiences and possible outcomes than is likely to be visible in Graduate Outcomes, where graduates are deliberately surveyed at the same point in their post-university careers. While Graduate Outcomes provides a cross-section of the experiences of higher education graduates 15 months after finishing their qualifications, the LFS can provide glimpses into what their lives may be like at a variety of different points.
If we are looking for a complete picture of what happens to higher education graduates in the UK, Graduate Outcomes, LEO, and the LFS all fill in different pieces of the puzzle. Although the datasets could fruitfully be used in conjunction with each other – the use of the same set of SWB questions in Graduate Outcomes and the LFS might, for example, allow for some research into the comparative SWB of graduates and non-graduates – in making any comparison between the three data sources, it will be important to recognise the differences in methodology and coverage between the sources. To return to the example of SWB comparisons, although LFS and Graduate Outcomes respondents answer the same four questions about SWB, they are faced with those questions at different points in their careers, and differences in SWB may depend on a range of factors not necessarily connected to education.
In addition to enabling careful comparisons between graduates and the population as a whole or between different stages in graduates’ careers, the existence of other datasets with overlapping domains is likely to be important in the future development of Graduate Outcomes. When LEO data was first published, the DfE conducted a comparison between the LEO and Destinations of Leavers from Higher Education (DLHE) datasets; HESA has in the past carried out similar comparisons in order to check the quality of DLHE salary data, and a further, detailed comparison of LEO and Graduate Outcomes would provide useful information about the respective strengths and weaknesses of the two datasets. HESA also hopes in future years to explore the possibility of linking the Graduate Outcomes record with other relevant datasets, including LEO salary data. Doing so would not only allow us to streamline our collection processes but also, and perhaps more importantly, allow us to provide a fuller view of the trajectories of graduates after they leave higher education.
 Department for Education. 2021. Tax Year 2018/19. Graduate Outcomes (LEO). https://explore-education-statistics.service.gov.uk/find-statistics/graduate-outcomes-leo/2018-19
 Due to the limitations of LEO as a representative measure of female earnings, researchers from the Institute for Fiscal Studies chose to focus on the earnings of sons in their recent report for the Social Mobility Commission, The Long Shadow of Deprivation: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/923623/SMC_Long_shadow_of_deprivation_MAIN_REPORT_Accessible.pdf
 Department for Education. 2017. Employment and earnings outcomes for higher education graduates by subject and institution: experimental statistics using the Longitudinal Educational Outcomes (LEO) data. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/718225/SFR_18_2017_LEO_mainText.pdf
Office for Statistics Regulation. 2019. Exploring the public value of statistics about post-16 education and skills in England. Office for Statistics Regulation Systematic Review Programme. https://www.statisticsauthority.gov.uk/publication/exploring-the-public-value-of-statistics-about-post-16-education-and-skills-in-england/
Further discussion of the goals which shaped the design of the survey can be found in relevant sections of the Graduate Outcomes Survey methodology; see https://www.hesa.ac.uk/data-and-analysis/graduates/methodology/understanding-outcomes and https://www.hesa.ac.uk/data-and-analysis/graduates/methodology/review-topics
 Office for National Statistics. 2018. Labour Force Survey User Guide, Volume 1: LFS Background and Methodology. Available from: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/methodologies/labourforcesurveyuserguidance
HESA. 2021. Graduate Outcomes Cohort D Review: C18071 2018/19. https://www.hesa.ac.uk/files/End%20of%20cohort%20D%20report%20C18071.pdf
HESA. 2020. Graduate Outcomes Cohort D Review: C17071 2017/18. https://www.hesa.ac.uk/files/End%20of%20cohort%20D%20report.pdf
 Office for National Statistics. 2020. Labour Force Survey User Guide, Volume 2: User guide to the LFS questionnaire. https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/methodologies/labourforcesurveyuserguidance
 Department for Education. 2016. Employment and earnings outcomes for higher education graduates: experimental statistics using the Longitudinal Educational Outcomes (LEO) data. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/543794/SFR36-2016_main_text_LEO.pdf
 HESA. Key principles of Graduate Outcomes. https://www.hesa.ac.uk/innovation/outcomes/about/principles