
Accuracy and reliability

In this section we evaluate the closeness between the estimated results produced from the survey and the (unknown) true value. The design of Graduate Outcomes minimises the possibility for sampling error, due to the comprehensive approach taken to surveying all cases available to be contacted from the sampling frame. We therefore start by describing the sampling frame, and how we maintain it, also describing the close resemblance of the sample to the sampling frame. We then go on to concentrate on various forms of non-sampling error in the subsequent subsections, including:

  • coverage error
  • non-response error
  • measurement error
  • processing error.

The sampling frame, and how it is maintained

The Graduate Outcomes survey aims to survey the population of graduates from Higher Education (HE), and the survey employs a dynamic sampling frame that is kept up to date when source data changes. The source data is a list of individual graduates drawn from existing administrative census datasets about students. These sources are enriched with contact details supplied by the providers where those graduates studied. Below, we cover these two separate aspects of how the sampling frame is constructed. The Survey methodology section on the sampling frame offers an overview of this area[1]. We present additional information in the following paragraphs.

The sampling frame has been developed utilising the main administrative data sources for HE provision in HE settings across the UK[2], and for college HE in all parts of the UK except Scotland[3]. These data sources each support existing official statistics publications, so our initial assumption is that they are of high quality and fit for their purposes. The sampling frame is drawn from this administrative data, according to the criteria set out in the coverage statement for the Graduate Outcomes Contact Details record[4], which we summarise in the Survey methodology section on survey coverage[5]. The (separate) coverage statement for the Graduate Outcomes Survey Results record gives further detail[6]. The following subsection summarises this information, and provides additional commentary, starting with the main processes HESA uses for all data sourced from HE providers. In the subsection after that, we cover how we derive the sampling frame related to college HE settings.

Sampling frame data based on HESA data collections

The majority of data used to determine the sampling frame is collected by HESA. HESA collects individualised data on students in HE providers across the whole UK in its Student record and Student Alternative record (referred to hereinafter as the “Student record(s)”, for brevity). These records form an administrative census: their goal is to enumerate the HE student population and describe their personal and study characteristics. The data on qualifiers contained in the Student records is the most complete single record of graduates from HE available. The Student records are the primary official record of UK HE, and are principally collected on behalf of the UK Government, the Devolved Administrations, and the Office for Students[7]. HESA collects this data annually, from a constituency of HE providers that is refreshed at least annually – referred to by HESA as ‘reporting providers’. This covers all publicly-funded and/or regulated HE providers in the UK. The HESA Student records for the 2018/19 academic year were used in the creation of the sampling frame for the second year of the Graduate Outcomes survey.

The sampling frame comprises all students reported to HESA or the relevant body as obtaining relevant higher education qualifications during the reporting period 01 August to 31 July, and whose study was full-time or part-time (including sandwich students and those writing-up theses). Graduates with awards from dormant status are only included in the target population for postgraduate research students. Graduates with some qualifications are excluded from the sampling frame, principally because their work and study destinations are already captured by other data sources. These include intercalated degrees, awards to visiting students, students on post-registration health and social care courses, and professional qualifications for serving school teachers[8].
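The inclusion criteria in the preceding paragraph can be read as a filter over qualifier records. The following is a minimal sketch only: the `Qualifier` fields, the mode and level codes, and the labels in `EXCLUDED_QUALIFICATIONS` are hypothetical names chosen for illustration, not fields of the HESA Student record, and the authoritative rules remain those in the coverage statement[8].

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical labels for the excluded qualification groups named in the text.
EXCLUDED_QUALIFICATIONS = {
    "intercalated_degree",
    "visiting_student_award",
    "post_registration_health_social_care",
    "serving_teacher_professional_qualification",
}


@dataclass
class Qualifier:
    graduate_id: str
    award_date: date
    mode: str           # illustrative codes: "full_time", "part_time", "dormant"
    level: str          # illustrative codes, e.g. "postgraduate_research"
    qualification: str  # illustrative category label


def in_sampling_frame(q: Qualifier, period_start: date, period_end: date) -> bool:
    """Return True if the record meets the simplified criteria sketched here."""
    # The award must fall inside the reporting period (01 August to 31 July).
    if not (period_start <= q.award_date <= period_end):
        return False
    # Some qualifications are excluded outright.
    if q.qualification in EXCLUDED_QUALIFICATIONS:
        return False
    # Awards from dormant status count only for postgraduate research students.
    if q.mode == "dormant":
        return q.level == "postgraduate_research"
    # Full-time and part-time study (assumed here to cover sandwich and writing-up).
    return q.mode in {"full_time", "part_time"}
```

For the 2018/19 academic year, the period would run from `date(2018, 8, 1)` to `date(2019, 7, 31)`.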

Exceptionally, issues may be found in the source administrative data which, when corrected through the data amendments process (also termed the fixed database facility), have the effect of altering the sampling frame[9]. Up to the dates specified in the coding manual (which overlap substantially with the contact period), changes made to the sampling frame via the fixed database are reflected in the “population file” that is passed to the provider through an online electronic portal for providers (hereinafter, ‘the Portal’), so that additional contact details can be gathered. This would be necessary, for example, if the fixed database change adds previously missing records to a provider’s sampling frame data. Furthermore, the data that is published (including response rates in relation to targets) always reflects the most up-to-date sampling frame available from the fixed database at the time of production. This means that if over-sampling has occurred (because a fixed database change removes graduates from the sampling frame after their responses have already been gathered), those responses are discarded from the output file.
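The alignment of published outputs with the latest sampling frame can be illustrated with a short sketch. It assumes, purely for illustration, that the frame is a list of graduate identifiers and that responses are held in a dictionary keyed by the same identifier; the function names are hypothetical and do not describe HESA's production systems.

```python
def align_with_frame(frame_ids, responses):
    """Keep only responses from graduates still in the current sampling frame.

    Returns the retained responses and a sorted list of discarded identifiers.
    """
    current = set(frame_ids)
    kept = {gid: resp for gid, resp in responses.items() if gid in current}
    discarded = sorted(set(responses) - current)
    return kept, discarded


def response_rate(frame_ids, responses):
    """Response rate measured against the most up-to-date frame."""
    kept, _ = align_with_frame(frame_ids, responses)
    frame_size = len(set(frame_ids))
    return len(kept) / frame_size if frame_size else 0.0
```

In this sketch, a response gathered from a graduate who is later removed by a fixed database change appears in `discarded` and does not count towards the response rate.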

In order to derive the sample, and to obtain graduates’ contact details, information about the sampling frame is passed back to the HE providers, through the Portal. The goal is to maximise the availability of usable contact details for use during data collection. A full data collection process exists to support this activity, and it is specified in detail in the coding manual for the Graduate Outcomes Contact Details record[10]. This document explains the collection schedule and the data items collected, and gives information to support interactions with graduates – an engagement strategy is defined by HESA and roles and responsibilities are shared with HE providers[11]. The coding manual also gives details of the quality assurance regime (automated and manual) along with other guidance and training materials on the systems and processes operated via the Graduate Outcomes provider portal.

In the provider portal, providers are presented with an output file showing graduates from the sampling frame drawn from the providers’ own data (collected previously) and are asked to populate and upload an XML file with contact details. Detailed guidance and training is offered on data quality expectations and using the tools provided[12]. The provider portal enables HE providers to act as peers in the quality assurance process, and HESA’s system logs show that interaction with the Portal has reduced as providers have normalised their use of the tool following an initial period of teething problems and experimentation. This complements increased use of the web-based update facility (mainly used by smaller providers).

Table 3 Portal usage statistics

Year of survey      | Providers attempting upload | File uploads attempted | Providers successfully uploading files | Successful file uploads
First year (17071)  | 190                         | 5,437                  | 176                                    | 1,462
Second year (18071) | 193                         | 3,448                  | 181                                    | 1,259
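To make the change in Portal interaction between the two years easier to see, the figures in Table 3 can be turned into simple ratios. The sketch below uses only the published counts; the derived ratios and the dictionary keys are our own illustrative arithmetic and naming, not statistics reported by HESA.

```python
# Counts copied from Table 3.
portal_usage = {
    "First year": {
        "providers_attempting": 190,
        "uploads_attempted": 5437,
        "providers_successful": 176,
        "uploads_successful": 1462,
    },
    "Second year": {
        "providers_attempting": 193,
        "uploads_attempted": 3448,
        "providers_successful": 181,
        "uploads_successful": 1259,
    },
}


def summarise(row):
    """Derive attempts per provider and the share of attempts that succeeded."""
    return {
        "attempts_per_provider": round(
            row["uploads_attempted"] / row["providers_attempting"], 1
        ),
        "upload_success_rate_pct": round(
            100 * row["uploads_successful"] / row["uploads_attempted"], 1
        ),
    }


for year, row in portal_usage.items():
    print(year, summarise(row))
```

On these figures, attempted uploads per provider fall from roughly 28.6 to 17.9, while the share of attempted uploads that succeed rises from roughly 26.9% to 36.5%.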

On submission, checks are undertaken by HESA to identify any problems with various quality dimensions of the data[13]: validity[14], uniqueness[15], completeness[16], and consistency[17]. Further information about the 51 automated rules applied consistently during the second year of operation is available online in the quality rules directory[18]. While new rules can be added in response to feedback from survey operations, no changes were required during the second year of operation. Version control is applied to all aspects of the coding manual and quality rules, allowing analysts to see which rules were introduced at which points.
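The four quality dimensions can be illustrated with a small sketch based on the examples given in footnotes [14] to [17]. This is not a reproduction of the 51 rules in the quality rules directory: the record layout, the `check_contacts` function, the email pattern and the per-record form of the consistency check are all simplifying assumptions made for illustration.

```python
import re
from collections import Counter

# Deliberately loose pattern: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_contacts(records, provider_domain):
    """Flag illustrative issues per record.

    records: list of dicts with keys 'id', 'emails' (list) and 'phones' (list).
    provider_domain: hypothetical domain of the provider's own email addresses.
    """
    issues = []
    email_counts = Counter(e.lower() for r in records for e in r["emails"])
    for r in records:
        emails, phones = r["emails"], r["phones"]
        # Completeness: the graduate has at least some contact details.
        if not emails and not phones:
            issues.append((r["id"], "completeness"))
        # Validity: phone numbers consist of digits (spaces and a leading + allowed).
        bad_phone = any(not p.replace(" ", "").lstrip("+").isdigit() for p in phones)
        bad_email = any(not EMAIL_RE.match(e) for e in emails)
        if bad_phone or bad_email:
            issues.append((r["id"], "validity"))
        # Uniqueness: the same email address is not shared between graduates.
        if any(email_counts[e.lower()] > 1 for e in emails):
            issues.append((r["id"], "uniqueness"))
        # Consistency (simplified): the only route is the provider's own address.
        own = [e for e in emails if e.lower().endswith("@" + provider_domain)]
        if emails and not phones and len(own) == len(emails):
            issues.append((r["id"], "consistency"))
    return issues
```

A flagged record corresponds loosely to a triggered rule: the provider would either correct the data or ask for the rule to be switched off for that observation.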

The quality regime seeks to maximise the number of usable details available for contact. Where quality rules are triggered, providers must either update the data, or contact HESA to request that the rule be ‘switched-off’ for that observation. This process is managed by HESA’s Liaison team, who have oversight of these operational data quality issues. We do not directly assess the accuracy[19] of the contact details – our current checks do not determine if the contact details provided belong to the graduate. On submission, providers must therefore warrant the accuracy of the data and the fitness for purpose of the contact details. The head of the provider also affirms compliance with the (supply side) Code of Practice for Data Collection[20]. Providers’ interactions with HESA also form part of their internal audit and compliance mechanisms, which are typically overseen by their governing bodies.

At this point, we summarise the quality characteristics of the contact details. Quality of contact details is measured primarily in terms of coverage (completeness of the record) and validity. The following table shows that coverage has remained largely constant between years 1 and 2, with a slight deterioration reflected in a rise in the proportion of graduates with an email address but no phone number:

Table 4 Quality characteristics of contact details

Type of contact details | % with no contact details | % with email only | % with UK Landline or International number only | % with UK mobile but no email | % of grads with email and number
Year 1                  | 0.2%                      | 2.6%              | 0.2%                                            | 0.8%                          | 95.9%
Year 2                  | 0.2%                      | 5.4%              | 0.3%                                            | 0.8%                          | 93.3%
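The categories in Table 4 can be thought of as a classification of each graduate by the contact routes held for them. The sketch below is one plausible reading of the column headings, not HESA's published derivation: the three boolean flags are hypothetical inputs, and it assumes that a graduate with a UK mobile and another number but no email falls under “UK mobile but no email”.

```python
from collections import Counter


def contact_category(has_email, has_uk_mobile, has_other_number):
    """Assign one Table 4 category (assumed precedence; see lead-in)."""
    if has_email and (has_uk_mobile or has_other_number):
        return "email and number"
    if has_email:
        return "email only"
    if has_uk_mobile:
        return "UK mobile but no email"
    if has_other_number:
        return "UK landline or international number only"
    return "no contact details"


def category_shares(graduates):
    """Percentage of graduates in each category, rounded to one decimal place.

    graduates: list of (has_email, has_uk_mobile, has_other_number) tuples.
    """
    counts = Counter(contact_category(*g) for g in graduates)
    total = len(graduates)
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}
```

Because each share is rounded separately, the percentages in a row of such a table need not sum to exactly 100.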

Throughout year two we have been reporting on the validity of contact details through the end-of-cohort review reports[21]. Based on our evaluation of the quality of contact details over the past two years, we recently published a blog post aimed at providers, with a view to highlighting the most common issues and their impact on our ability to make contact with graduates and collect responses[22]. This has led to the introduction of a series of internal checks which are regularly carried out on contact details, and the feedback is shared directly with providers who have submitted relatively low-quality contact details compared to the rest of the sector. In practice, some contact details prove unavailable. A few graduates do not keep in touch with their HE providers, and contact details that were accurate when recorded can become out of date. Providers are encouraged to stay in touch with their graduates through different means, enabling them to supply good quality contact details in time for the survey 15 months later.

During the second year of the survey, the automated quality assurance of contact details has remained largely consistent with the approach established by the end of the first year of operations. The main additions to the process were guidance about the utilisation of providers’ own email addresses, the use of a postcode validator, and the implementation of new functionality for monitoring the quality of mobile numbers used for SMS delivery. Details of the quality rules we utilised during construction of the elements of the sampling frame that are drawn from the HESA Student records are available within the quality rules directory in the coding manual[23].

During the contact details collection process, HE providers are also able to supply additional information that allows HESA to exclude graduates from the surveyable population, for example if they have become seriously ill, or have died, since graduating. During the first year of the survey, we excluded from the sample entirely those graduates whose providers had told us they had died or were seriously ill. However, following reflection on the appropriateness of this analytical choice, we determined that we should adopt a different approach from the second year. These graduates are in the population of interest and in the sampling frame, so we do not wish to ignore them; however, we must respect the ethical choice of providers in their decision not to pass on contact details in such circumstances. Nevertheless, providers cannot possess perfect knowledge of the health outcomes of graduates, and through surveying we discovered cases where the graduate had died or become seriously ill. In some cases, we even elicited a response from seriously ill graduates. Given that the rates of serious illness and death among recent graduates appear to be very low, our approach here would be unlikely to have material impacts on our outputs, or on end users. The total number of graduates excluded in this way from the surveyable population for the first year was 150[24]. We note that a further 270 graduates were discovered to have died, or become seriously ill, during the collection of survey responses (this information was not independently verified). Subsequent investigation revealed a further 3 cases that had been excluded from the college HE data in England (one in Cohort C, two in Cohort D) and it was confirmed that there were no cases in Northern Ireland.

The main impacts would be on the response rates of very small providers, but this is an insufficient argument for exclusion: since we anticipate that the distribution of these cases will be random, there is no reason to expect smaller providers to be affected disproportionately. We therefore determined that the appropriate approach would be simply to treat these graduates as part of the sample; where no contact details are provided, they are treated as non-respondents. We are still able to gather information from providers about their reasons for not including contact details in such cases, but the sample has now been aligned with the sampling frame, with the result that the survey is more inclusive and analysis is more straightforward.

Timeliness of the data in the sampling frame is a central consideration. The collection of contact details follows four phases, each aligned to one of the four cohorts (A, B, C, and D). Comprehensive information aimed at HE providers is published about timescales for collection activities[25]. Because the survey takes place approximately 15 months following course completion, allowance has to be made for changes of circumstance following this. Contact details are therefore collected during a period when the provider has had maximum opportunity to ensure they are as up-to-date as possible.

Sampling frame data based on other ingested data

A minority of HE study takes place in further education (FE) settings[26]. We use the term ‘college HE’ to refer to this provision. HESA collects data about college HE students in Wales as part of its Student record (the process for this is the same as for the other data described in the paragraphs following this one). In England, Northern Ireland, and Scotland, college HE data is collected by other bodies[27]. Given the prevalence and success of articulation agreements, graduates from college HE in Scotland are excluded from the survey coverage[28]. HESA ingests data about college HE students from the administrative records collected in England and Northern Ireland. This data, along with, in England, contact details found within these administrative records, is provided to HESA in a timely manner by the relevant bodies, in order to permit these college HE graduates to be contacted during the normal operation of the survey. Where contact details are not provided, or where the FE provider is able to source improved contact details, a Portal-based collection process identical to the one described in the previous section is employed to permit this. We do not describe the quality processes followed in the construction of these administrative records here, but we do provide supporting information for Further Education Colleges (FECs) in England and Northern Ireland[29]. College HE data collectors tend to see a record for each qualification aim separately, and hence they have to exercise judgement about when a qualification aim is ‘nested’ within a larger aim, and when it is suitable for driving survey coverage. Such matters are handled by skilled professionals, but they prudently acknowledge that there is a small risk of undercoverage or overcoverage occurring in situations such as unusual personal circumstances of a student, or where a qualification is unfamiliar. Further details should be sought from the data collectors (see footnote [27]).


[2] These are the HESA Student record(s) described in detail further on. See for the data published from these records.

[3] The detail is covered later on, in the Sampling frame data based on other ingested data section.

[5] For further information about the survey coverage, see the relevant section in the Survey methodology:

[7] HESA’s Collection Notice for its Student record details the statutory background for this. The coverage statement for the Student record (2018/19) utilised in creating the sampling frame gives details on which students are included in the record: The equivalent statement for the Student Alternative record is here:

[8] Full details of exclusions are available at:

[9] For details of the financial impact and regulatory authorisation needed to make a change to the previously-submitted data (to amend the fixed database) see

[11] This engagement plan is detailed in the information provided on the operational management of the survey. See
Communications resources are here:
Roles and responsibilities are here:

[12] See for an accessible overview. For full information about types of contact details we accept and other best practice see the Portal user guide, at:

[13] HESA’s approach to data quality management during collection rests partly on the quality dimensions specified in the DAMA DMBOK. See (DAMA UK Working Group on “Data Quality Dimensions”, 2013). (For outputs, HESA uses the ESS dimensions.)

[14] E.g. telephone numbers consist of digits.

[15] E.g. identifying graduates with duplicate email addresses or telephone numbers.

[16] E.g. that most graduates in the sampling frame have some contact details.

[17] E.g. that a variety of different contact methods have been given, and that they do not consist entirely of the provider’s own ‘email for life’ address (where this exists) for each graduate.

[19] E.g. Properly-formed contact details could theoretically pass our checks, without necessarily belonging to the respondent we hope to reach.

[24] The prevailing HESA approach to rounding and anonymisation has been applied to these and all other specific figures about people included in this report; full details can be found at

[26] To summarise, in 2017/18, FE providers accounted for 0.5% of the UK’s total postgraduate enrolments, 1.4% of the UK’s total first degree enrolments, and 47.8% of the UK’s “other undergraduate” enrolments. For detailed figures and explanatory notes, see

[27] In England, the Individualised Learner Record (ILR) is collected by the Education and Skills Funding Agency (ESFA). In Northern Ireland, the Assembly mandates the collection of the Consolidated Data Return (CDR) of which an extract is supplied to HESA by the Department for the Economy (Northern Ireland). In Scotland, the government mandates the collection of the Further Education Statistics record (FES). However, the college HE activity in Scotland, collected in the FES, is not within coverage for the Graduate Outcomes survey.

[29] For FECs in England, see:
For FECs in Northern Ireland, see:
FECs in Wales are longer-standing HESA subscribers, and information for them is consistent with the general information sources, here: and elsewhere.