HMRC data brings some brawn, but HESA data is the brains of the operation
Today’s essential reading (on the train, in my case) is the IFS report authored by Anna Vignoles of Cambridge and Neil Shephard of Harvard. This is the second report from a landmark research project that uses HMRC tax data, SLC loans data, and HESA Student data to find out about some of the factors that affect graduate earnings.
This report has been trailed extensively, and much of the attention will rightly focus on the very interesting findings around the impact of family wealth on earnings (compounding the effects that wealth already has on access), and the strong effects related to subject of study. There will also be a lot of interest in the government response, as the research opens up the prospect of using information on salary outcomes to tailor state support for HE very precisely – expect lots of articles in the HE press about the potential unintended consequences of using data like this to determine funding for HE.
The IFS report offers a proof of concept that graduate tax data, with its unparalleled coverage, completeness and accuracy, will add a new dimension to the debate on graduate outcomes. But for me, the most interesting aspect of the report was the limitation placed on it by not being in a position to use linked individualised HESA data – and what linked HESA data would add to a future study.
The report’s authors use HESA data at a high level to contextualise the HMRC and SLC data at the level of the institution and subject, but the data is not linked to individuals (p.2). In a ground-breaking study like this, it is reasonable to use HESA data in this way, but the scene has been set for more in-depth studies in the future. And even using HESA data to add context (by linking the high-level JACS codes and HE providers in HESA data to those in the individualised SLC data) demonstrates the potential for better public information than we are currently able to achieve from DLHE salary data.
Of course, there are some limitations in this study, as the authors note:
One purpose of this paper is to provide a proof of concept for using the data to provide some useful information that might inform students' choice of degree. However, although we are able to document the variation in earnings across graduates from different institutions, once we start studying subject-institution combinations we find that the 10% sample of data that we are using from HMRC is still not large enough to look in detail at large numbers of higher education institution and subject combinations without making strong econometric modelling assumptions. This limits our ability to provide sufficient information about every subject-institution combination and hence the usability of the database, as it stands, for information provision to students.
What and where you study matter for graduate earnings – but so does parents’ income (p. 55)
What kind of information will this be? Expect even more column inches in the sector’s press about what sorts of information might be provided and the effects this will have. I’ll keep my powder dry on that for the time being, only noting that the report’s request could be granted readily, since the legal gateway is already in place – and for every part of the UK, not just for the data on England used in the IFS report.
If the IFS’s proof of concept were to be developed into a public information product, or used to inform funding, there would understandably be calls for the data to be contextualised in a wide range of ways: by region (of domicile and of work); by protected characteristics; by POLAR postcode; by course; by mode of study; by mobility experience, and so on. This is where the advantages of using linked individualised HESA data become clear.
For a start, the gap in the IFS data on self-funded undergraduate students who don’t take out loans from the SLC could be closed immediately – HESA has this data, along with the associated personal characteristics. We could also look at the long-term salary effects of additional protected characteristics, such as ethnicity and disability, at a detailed level.
Thinking about the impact on curriculum, imagine being able to track the longer-term salary effect for students who go on a mobility experience, and to cut this by country or type of experience, against a control group. The same would apply to a whole range of curriculum data, since HESA data includes course information at a fairly granular level. Far from being a blunt tool to shape public demand or funding for HE, data like this could drive innovation in the curriculum at individual course level – not by ‘following the herd’ but by using the data in a sophisticated way to identify practices, strategies and activities that will have a positive and long-lasting educational impact.
No one has mentioned part-time students yet (and I can’t remember exactly what funding was available to the cohort used in the data) but the SLC is unlikely to be a good source of data for these students. If we want to understand the impact of HE on part-time learners, then only individualised HESA data can support this analysis. We know too little about the impacts of HE on part-time learners – their success rate in finding work (as evidenced in the DLHE) is very high, precisely because many part-time students never stop working during their studies, but beyond this they are an under-studied group, and we have very little systematic understanding of how HE fits into their working lives. This could be about to change. Looking at this the other way around – do part-time students make better learners? Could linked data tell us more about the educational performance of students with certain kinds of employment backgrounds? What is the optimal set of background employment characteristics to get the most out of a Master’s degree, and will this vary by subject? HESA data has the nuanced information about academic outcomes to drive a rational debate about the role of part-time and adult learning – and how it should be funded.
HE also creates network effects and social benefit – and a linked data approach will be able to show this. HESA data can add intelligence about what and how students study to raw salary data to demonstrate that educational experiences have desired employment effects – not just in terms of salary, but in sector and location of employment, too. HMRC data appears to be weak on location data, but HESA data is not. We have home postcodes prior to study, a variety of postcodes during study as well as the study location itself, and, from the DLHE, postcode of employment at 6 months (this is something that we are currently reviewing in our Destinations and Outcomes review). Imagine using this data alongside earnings to develop regional approaches to supporting both HE and business: local priorities could be identified, funded and monitored, with HE providers playing to the full their role as ‘Anchor Institutions’, and benefiting from diversified local funding as a result. At a higher level, governments in each part of the UK could decide on national settlements that support the outcomes they want to see, not just in narrow terms of wealth creation, but by investing in the right kinds of HE necessary to create and maintain a happy and healthy society.
HMRC and SLC data will tell us about the salary effects of institution and subject, but, when linked to individualised student data, they could also act as a source of intelligence to drive curriculum development. Combined, this data could be used to create intelligent policy that supports HE to create social wealth, to develop regions and industries, and to raise people up to achieve, whatever their walk of life. Only HESA data can add the necessary explanatory power needed to create value out of tax and loan data.