Keeping track: Can linking higher education records tell us anything about the quality of parental education data?

First-in-family data has been the subject of discussion within the sector over the past year. In this piece, HESA assess the quality of the parental education information we collect.

Key findings

  • Students registered on full-time degree courses who transfer providers across academic years will generally be required to complete the UCAS application form on two separate occasions.
  • This supplies an opportunity to assess the extent to which transferring students give the same answer to the parental education question.
  • Among those who do indicate whether they had at least one parent/step-parent/guardian with a higher education qualification in both instances, the agreement rate stands at 91%.
  • While this is a reassuring finding, it does not preclude the possibility that respondents have supplied an incorrect response.
  • Future research may wish to consider the reasons for students stating that they do not know the answer, refuse to say or give no response at all.
  • This could assist with reducing the extent of ‘missing’ data and improve the utility of this variable across the sector.


Assessing the proportion of students who transfer higher education provider across academic years has been a core aspect of the non-continuation performance indicators produced by HESA.[1] However, looking at students who move between two different providers can also offer an additional method by which to evaluate the quality of data collected in the Student record. In this piece, we explain how and why this is possible for the parental education variable, as well as the results of an analysis we conducted for this field.

This insight will be of particular relevance to those involved in widening participation activity (e.g. outreach work), given the parental education variable is used by some providers and organisations working in this area to support decision-making. Indeed, recent research has illustrated the potential usefulness of utilising such data for these purposes.[2] With parental education and higher education participation recently being the focus of questions in Parliament, this exploration may also be of interest to policymakers.[3] Additionally, our study may be helpful to those involved in developing questions (e.g. in social surveys) relating to this topic. For example, there is currently a programme of work taking place around developing harmonised standards on socioeconomic background.[4]

As a producer of official statistics, we have a responsibility to carry out quality assurance activity on our data to ensure that it meets user requirements and is suitable for its intended purposes. Indeed, the Code of Practice for Statistics states that ‘the quality of the statistics and data, including their accuracy and reliability, coherence and comparability, and timeliness and punctuality, should be monitored and reported regularly’.

We have previously undertaken an examination into the quality of the parental education field using the 2011 Census. Here, we seek to build on this by offering further evidence around the quality of this data.

Transferring provider – how does the process work?

When a student applies to study on a full-time undergraduate course in higher education, they will usually submit an online application form to UCAS. Alongside stating information such as their preferred subject and provider of study, individuals will be asked to (optionally) indicate whether any of their (step-) parents or guardians hold a higher education qualification. Parental education data is then supplied to providers, who will then submit this to the HESA student record.[5] Each year, there will be a small proportion of students who do not apply through this route. For example, some prospective students may only decide to enter higher education during the clearing period of the admissions cycle. Such individuals will generally contact their provider of choice at this time and if they have not utilised the UCAS system earlier in the process, they will be asked by the provider to complete a Record of Prior Acceptance (RPA). This involves the provider collecting a minimum amount of information from the student and then passing on this data to UCAS.[6] While providers are encouraged to collect parental education data from non-UCAS entrants, we cannot be certain how or whether the question is asked.

In some instances, students may begin a course and then subsequently decide that they wish to transfer provider (e.g. due to a change in their preferred career pathway). UCAS undergraduate admissions guidance states that, irrespective of the point of transfer, if a student is enrolled full-time at a specific provider, but then wishes to transfer to a full-time course at another provider, they should complete the same UCAS application form for a second time. Again, there may be a small proportion of individuals who fill in an RPA as part of their switch to another provider.

How can this assist with evaluating data quality?

It is likely that most individuals will transfer between providers within the space of a few years. Among some individuals, it may well be that their (step-) parents or guardians acquire new qualifications during this time, but one would expect this to constitute a low fraction of the population. For the vast majority of applicants, parental qualification levels are unlikely to change in such a short timeframe.[7]

We would therefore anticipate that most individuals who respond to the parental education question on two separate occasions in the UCAS form should give the same answer. If this were the case, this does not mean to say that the answer they have provided is necessarily accurate, as they could have given an incorrect response in both instances (either intentionally or unintentionally) and we have no means by which to verify the response supplied.[8] However, should we observe high levels of discrepancy between the two responses, this would lead to greater scepticism over the accuracy and reliability of the parental education field. In particular, there would be increased concern that there are individuals who are unwilling to give a true answer or that the design of the question is making it difficult for them to respond correctly, which would then need to be explored through further investigation.

A similar idea for assessing data quality was utilised in the 2011 Census. A few months after administering the main collection, the Office for National Statistics conducted the Census Quality Survey (CQS), in which a voluntary sample of respondents were asked to complete the questionnaire for a second time. The purpose of this was to develop an understanding of the potential accuracy of the responses submitted. In the CQS, however, participants were not asked to self-complete the form. Instead, face-to-face interviews were carried out, and there is evidence that this survey mode is more likely to elicit accurate responses (e.g. because an interviewer can help the respondent understand the question).[9]

Below, we illustrate how we have used a data linking approach to understand student journeys through higher education. This includes transfers between providers across different academic years, which opens up the possibility to assess the quality of some of our data fields, with our focus here being on parental education. We differ from the CQS exploration in that we cannot examine quality by using different modes to collect the same information and we are unable to ask students about what qualifications their (step-) parents or guardians held at a particular point in time. Also, while the CQS asks respondents about their own education, the UCAS application form asks prospective students to provide information about the level of qualifications attained by their parents.

Data and methods

We begin with the same HESA dataset we used to carry out our earlier work on the quality of the parental education field through linking to Census 2011 records. The population therefore encompasses the academic years 2011/12 to 2016/17 (inclusive) and we focus on UK (excluding Guernsey, Jersey and the Isle of Man) domiciled full-time first degree entrants aged 18 to 20 at the time of starting their course.[10] However, some of these individuals will not be entirely new entrants to higher education, as they may have, for example, transferred from one provider to another (and hence potentially completed the UCAS application form twice in doing so). We start by drawing upon a personal identifier field[11], which enables us to track the same student across academic years, to limit the dataset to the row that relates to the most recent entry into higher education for each individual.

We then put together a separate file, initially containing rows of all the times we find each student in our aforementioned HESA dataset between the academic years 2010/11 and 2016/17.[12] For example, if a student began their higher education journey at a provider in 2011/12, before transferring the following year to a different provider and then completing three further years of education, we would have four rows of data for this individual. The reason for creating this separate dataset is so that we can capture the first recorded occurrence of a student being in the first year of study of a full-time first degree programme, which we can then compare with the latest year to see which students have transferred providers. We therefore subsequently restrict this dataset such that we only obtain a single row for each individual in line with this criterion. As this also contains the same personal identifier contained within the dataset discussed in the previous paragraph, we then link the two sources together.

Figure 1. Illustration of linking data between HESA Student records

HESA Student record
An individual student appears in different years at two different providers with different information in the Parental education variable.

| Student | Academic year | HE provider | Year of study | Parental education |
| --- | --- | --- | --- | --- |
| Jane Bloggs | 2013/14 | University of Poppleton | First year | Unknown |
| Jane Bloggs | 2014/15 | University of Poppleton | Second year | Unknown |
| Jane Bloggs | 2014/15 | Poppleton Metropolitan | First year | Yes |
| Jane Bloggs | 2015/16 | Poppleton Metropolitan | Second year | Yes |
| Jane Bloggs | 2016/17 | Poppleton Metropolitan | Third year | Yes |

Final dataset for analysis
Linked data holds the different responses to the Parental education question

| Student | Earliest record | HE provider 1 | Parental education 1 | Latest first year record | HE provider 2 | Parental education 2 |
| --- | --- | --- | --- | --- | --- | --- |
| Jane Bloggs | 2013/14 | University of Poppleton | Unknown | 2014/15 | Poppleton Metropolitan | Yes |
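
The linking step illustrated in Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the analysis: the tuple layout and the function name `link_first_year_records` are assumptions made for the example, and the student's name stands in for the HESA personal identifier that the real linkage relies on.

```python
# Hypothetical rows mirroring Figure 1:
# (student, academic year, HE provider, year of study, parental education).
# In the real data a personal identifier, not a name, identifies the student.
records = [
    ("Jane Bloggs", "2013/14", "University of Poppleton", 1, "Unknown"),
    ("Jane Bloggs", "2014/15", "University of Poppleton", 2, "Unknown"),
    ("Jane Bloggs", "2014/15", "Poppleton Metropolitan", 1, "Yes"),
    ("Jane Bloggs", "2015/16", "Poppleton Metropolitan", 2, "Yes"),
    ("Jane Bloggs", "2016/17", "Poppleton Metropolitan", 3, "Yes"),
]


def link_first_year_records(rows):
    """Pair each student's earliest and latest first-year record."""
    linked = {}
    # Academic years written as "YYYY/YY" sort chronologically as strings.
    first_year_rows = sorted((r for r in rows if r[3] == 1), key=lambda r: r[1])
    for student, year, provider, _year_of_study, parental_education in first_year_rows:
        entry = {
            "year": year,
            "provider": provider,
            "parental_education": parental_education,
        }
        # The earliest record is stored once; "latest" is overwritten on each pass.
        pair = linked.setdefault(student, {"earliest": entry})
        pair["latest"] = entry
    return linked


linked = link_first_year_records(records)
jane = linked["Jane Bloggs"]
print(jane["earliest"]["provider"], "->", jane["latest"]["provider"])
print(jane["earliest"]["parental_education"], "->", jane["latest"]["parental_education"])
```

For the example student, the linked row pairs the 2013/14 entry at University of Poppleton with the 2014/15 entry at Poppleton Metropolitan, matching the final dataset shown in the figure.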

On combining these two datasets, we have a final file in which each row of data for an individual contains information relating to their earliest and latest record of entry into a full-time first degree course between the academic years 2010/11 and 2016/17. We proceed by firstly excluding any individual for whom their academic year of entry is the same in both the earliest and latest entry points. These students are likely to be those who have entered higher education for the first time and are therefore not part of the population of interest in this study, as they will have most probably submitted the UCAS application form only once. The dataset is then limited to those we have identified to have changed provider between academic years.[13] Furthermore, by assessing whether an individual has a valid UCAS application ID in both cases, we are able to restrict our data to those who are likely to have completed a UCAS application form on both occasions. Our final population size was 55,320.
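
The three restrictions described above can be expressed as a simple filter. The sketch below is hypothetical: the `LinkedRow` fields and the example rows are invented for illustration, a non-empty value is assumed to stand for a valid UCAS application ID, and the handling of mergers and de-mergers mentioned in footnote 13 is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedRow:
    """One linked row per student (hypothetical field names)."""
    student_id: str
    earliest_year: str
    earliest_provider: str
    earliest_ucas_id: Optional[str]
    latest_year: str
    latest_provider: str
    latest_ucas_id: Optional[str]


def is_in_population(row: LinkedRow) -> bool:
    """Apply the three restrictions described in the text."""
    # 1. Drop students whose earliest and latest entry years coincide.
    different_years = row.earliest_year != row.latest_year
    # 2. Keep only students who changed provider (mergers are ignored here).
    changed_provider = row.earliest_provider != row.latest_provider
    # 3. Require a UCAS application ID on both occasions
    #    (validity is simplified to "not empty" in this sketch).
    both_ucas = bool(row.earliest_ucas_id) and bool(row.latest_ucas_id)
    return different_years and changed_provider and both_ucas


rows = [
    LinkedRow("A", "2012/13", "Provider X", "u1", "2012/13", "Provider X", "u1"),
    LinkedRow("B", "2012/13", "Provider X", "u2", "2013/14", "Provider X", "u3"),
    LinkedRow("C", "2012/13", "Provider X", "u4", "2013/14", "Provider Y", "u5"),
    LinkedRow("D", "2012/13", "Provider X", "u6", "2013/14", "Provider Y", None),
]

population = [row for row in rows if is_in_population(row)]
print([row.student_id for row in population])
```

Only student C passes all three checks: A has a single entry year, B re-entered at the same provider and D has no UCAS application ID for the second entry.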


Prospective students completing the UCAS application form are asked ‘Do any of your parents, step-parents or guardians have any higher education qualifications, such as a degree, diploma or certificate of higher education?’. Table 1 below provides the results from a cross-tabulation of the parental education fields in the earliest record we captured for the individual and the corresponding data in the latest extract. We note the following key findings.

  1. 67% of individuals who had changed provider are found to have given the same response in their earliest and latest records.[14]
  2. For students that initially did not declare the level of education attained by their (step-) parents or guardians (i.e. they stated that they didn’t know, refused to give this information or supplied no response), we see the majority subsequently give a ‘Yes’ or ‘No’ response.
  3. Among those that did give either a ‘Yes’ or ‘No’ response in their earliest data submission, 77% provide the same answer on the latest occasion.[15]
  4. The lack of agreement for the remaining 23% is predominantly due to the individual refusing to give this information, indicating that they did not know the answer or supplying no response at all.
  5. Indeed, focusing on the 69% of individuals who do state either ‘Yes’ or ‘No’ in both instances, we find that the agreement rate jumps to 91%.[16]
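
The percentages quoted in these findings can be reproduced from the rounded counts in Table 1. The short sketch below does so, following the calculations given in footnotes 15 and 16. The variable names are illustrative, the label of the fourth row is inferred from the column order, and because the published counts are rounded to the nearest 5 the results may differ marginally from figures based on unrounded data.

```python
# Rounded counts from Table 1. Rows are the initial entry, columns the latest
# entry, both ordered: Yes, No, Don't know, Information refused, No response.
COUNTS = [
    [18200, 2110, 995, 1345, 1285],   # initial: Yes
    [1445, 16680, 1130, 1225, 1080],  # initial: No
    [865, 1590, 850, 455, 275],       # initial: Don't know
    [1190, 1470, 395, 1020, 220],     # initial: Information refused (inferred label)
    [540, 555, 125, 60, 210],         # initial: No response
]
GRAND_TOTAL = 55320          # published total
YES_NO_ROW_TOTAL = 23940 + 21560  # published row totals for Yes and No

# Finding 1: same response on both occasions (the diagonal of the table).
same_response = sum(COUNTS[i][i] for i in range(5))
overall_agreement = same_response / GRAND_TOTAL

# Finding 3: initial Yes/No respondents giving the same answer again (footnote 15).
yes_no_same = COUNTS[0][0] + COUNTS[1][1]
initial_yes_no_agreement = yes_no_same / YES_NO_ROW_TOTAL

# Finding 5: those answering Yes or No on both occasions (footnote 16).
yes_no_both = yes_no_same + COUNTS[0][1] + COUNTS[1][0]
share_yes_no_both = yes_no_both / GRAND_TOTAL
agreement_yes_no_both = yes_no_same / yes_no_both

for label, value in [
    ("Same response overall", overall_agreement),
    ("Initial Yes/No, same answer later", initial_yes_no_agreement),
    ("Yes/No in both instances", share_yes_no_both),
    ("Agreement among Yes/No in both", agreement_yes_no_both),
]:
    print(f"{label}: {value:.1%}")
```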

Further remarks

As stated earlier, the consistency of the responses among the sub-population explored in point 5 (i.e. those who state a ‘Yes’ or ‘No’ in both instances) does not conclusively mean that the responses given are accurate and reliable. However, this is a more reassuring result than one in which greater disparity was observed, which would be more indicative of poor data quality.

While we appreciate that the sample we are considering is small, one matter that our findings do suggest could be worth exploring in more detail is why such a high proportion of individuals move from providing no concrete answer to giving a ‘Yes’ or ‘No’ response. Just over 60% of students who initially stated that they did not know if their (step-) parents or guardians held a higher level qualification or refused to supply this information now give a ‘Yes’ or ‘No’ response, with the proportion being 73% among those who originally gave no answer at all. Additional investigation into this issue could bring the longer-term benefit of reducing the extent of ‘missing’ data, with completeness of the variable often one of the factors taken into consideration by policymakers, providers and researchers when determining the potential to use this data field.
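
As a rough check, the two proportions quoted in this paragraph can be derived from the rounded counts in Table 1. The dictionary and function names below are illustrative, and the 'Information refused' row label is inferred from the column order of the table.

```python
# From Table 1, for each initial 'no concrete answer' category:
# (later answered Yes, later answered No, published row total).
NOT_DECLARED = {
    "Don't know": (865, 1590, 4030),
    "Information refused": (1190, 1470, 4290),  # row label inferred
    "No response": (540, 555, 1495),
}


def share_moving_to_yes_no(categories):
    """Share of students in the given initial categories who later say Yes or No."""
    moved = sum(NOT_DECLARED[c][0] + NOT_DECLARED[c][1] for c in categories)
    total = sum(NOT_DECLARED[c][2] for c in categories)
    return moved / total


dont_know_or_refused = share_moving_to_yes_no(["Don't know", "Information refused"])
no_response = share_moving_to_yes_no(["No response"])

print(f"Don't know or refused, later Yes/No: {dont_know_or_refused:.1%}")
print(f"No response, later Yes/No: {no_response:.1%}")
```

On the rounded counts, the first share comes out at just over 61% and the second at around 73%, in line with the figures quoted above.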

This could be achieved through qualitative fieldwork and speaking to students about the reasons behind them responding more definitively to this question in the UCAS application form when they were seeking to transfer to another provider in the higher education sector. For example, if a key reason for the discrepancies was that individuals struggled to understand the rationale and meaning of the question at their initial point of entry, there may be value in carrying out additional research to see if this is a wider problem. If so, there may be a need to devise alternative ways of asking for this information in order to facilitate higher levels of response.

Given the importance of fields such as parental education to our users, we will be actively exploring other methods by which we can examine the quality of this data. In the meantime, any comments/feedback on this work are welcome and can be sent to [email protected].




[1] Tables T3a-T3e provide relevant statistics on this matter.

[5] See for further information on this process. Please note that it is possible that some providers may have done further collection/validation work on the parental education data before submitting the information to HESA.

[6] The exact data fields that must be collected through the Record of Prior Acceptance route can be found on page 22 of the following webpage - Page 19 details the rationale behind introducing the Record of Prior Acceptance.

[7] See, for example, for evidence of this. Using the British Household Panel Survey, the authors illustrate the limited change in the formal qualifications held by participants over a five-year window (Table 3).

[8] The potential for intentional and unintentional misreporting was highlighted in two recent reports on first-in-family students. See page 41 of and page 39 of for further information.

[10] We draw upon the first year marker to define entrants. As indicated on our definitions page - - this does not necessarily mean that the student is entering higher education for the first time.

[11] The HESA person identifier is a unique identifier derived for each UK student in the HESA Student and Student Alternative records using a combination of student identifiers, names, date of birth, postcode and other personal characteristics. Complex matching techniques have been used to create the identifiers which allow UK students to be tracked throughout their time in higher education.

[12] HESA have been tracking individuals through the Student record using a personal identifier field since the 2010/11 academic year. While we appreciate that some individuals may have started their higher education journeys in our dataset before 2010/11, as most individuals return to higher education within a few years, conducting our assessment from 2010/11 does not seem unreasonable in this instance.

[13] We have also taken into consideration mergers and de-mergers when assessing whether an individual has changed provider.

[14] Incidentally, the agreement rate for the highest qualification question in the 2011 CQS was 68% (see page 5 at), though as we note earlier in this insight, there are differences in the methodological approaches we deploy in our work and those utilised in the CQS.

[15] The calculation used to derive this value is (18,200 + 16,680) / (23,940 + 21,560).

[16] The calculation used to derive this value is (18,200 + 16,680) / (18,200 + 16,680 + 1,445 + 2,110).

Appendix: Descriptive statistics tables

Table 1: Cross-tabulation of initial parental education entry (row) versus latest parental education entry (column)
Figures are counts reported to the nearest 5.

Latest parental education entry

| Initial parental education entry | Yes | No | Don't know | Information refused | No response given | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Yes | 18,200 | 2,110 | 995 | 1,345 | 1,285 | 23,940 |
| No | 1,445 | 16,680 | 1,130 | 1,225 | 1,080 | 21,560 |
| Don't know | 865 | 1,590 | 850 | 455 | 275 | 4,030 |
| Information refused | 1,190 | 1,470 | 395 | 1,020 | 220 | 4,290 |
| No response | 540 | 555 | 125 | 60 | 210 | 1,495 |
| Total | 22,240 | 22,400 | 3,500 | 4,105 | 3,075 | 55,320 |

Table 2: Cross-tabulation of initial parental education entry (row) versus latest parental education entry (column)
Figures reported are row percentages based on unrounded values.

Latest parental education entry

| Initial parental education entry | Yes | No | Don't know | Information refused | No response given | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Yes | 76.0 | 8.8 | 4.2 | 5.6 | 5.4 | 100 |
| No | 6.7 | 77.4 | 5.2 | 5.7 | 5.0 | 100 |
| Don't know | 21.4 | 39.4 | 21.1 | 11.2 | 6.9 | 100 |
| Information refused | 27.7 | 34.2 | 9.2 | 23.8 | 5.1 | 100 |
| No response | 36.2 | 37.2 | 8.4 | 4.1 | 14.1 | 100 |
| Total | 40.2 | 40.5 | 6.3 | 7.4 | 5.6 | 100 |
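
Table 2 expresses each row of Table 1 as a percentage of its row total. The sketch below recomputes those percentages from the rounded counts and reports the largest gap from the published values. Small gaps are expected, since Table 2 is based on unrounded counts. The structure and names are illustrative, and the 'Information refused' row label is inferred from the column order.

```python
# Table 1: initial entry -> (counts by latest entry, published row total).
TABLE_1 = {
    "Yes": ([18200, 2110, 995, 1345, 1285], 23940),
    "No": ([1445, 16680, 1130, 1225, 1080], 21560),
    "Don't know": ([865, 1590, 850, 455, 275], 4030),
    "Information refused": ([1190, 1470, 395, 1020, 220], 4290),  # inferred label
    "No response": ([540, 555, 125, 60, 210], 1495),
}

# Table 2: published row percentages in the same column order.
TABLE_2 = {
    "Yes": [76.0, 8.8, 4.2, 5.6, 5.4],
    "No": [6.7, 77.4, 5.2, 5.7, 5.0],
    "Don't know": [21.4, 39.4, 21.1, 11.2, 6.9],
    "Information refused": [27.7, 34.2, 9.2, 23.8, 5.1],
    "No response": [36.2, 37.2, 8.4, 4.1, 14.1],
}


def row_percentages(counts, total):
    """Express each count as a percentage of the row total."""
    return [100 * count / total for count in counts]


# Largest absolute difference between recomputed and published percentages.
max_gap = max(
    abs(computed - published)
    for label, (counts, total) in TABLE_1.items()
    for computed, published in zip(row_percentages(counts, total), TABLE_2[label])
)
print(f"Largest gap from published values: {max_gap:.2f} percentage points")
```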

Siobhan Donnelly
Lead Statistical Analyst

Jenny Bermingham
Principal Statistical Analyst

Tej Nathwani
Principal Researcher (Economist)

