
Sampling error and non-response error

Sampling error is the difference between a population value and an estimate based on a sample, and is one of the components of total survey error. A quality report on a sample survey would normally caveat that, in principle, many different random samples could be drawn and each would give different results, because each sample would be made up of different people giving different answers to the questions asked. The spread of these results is the sampling variability. Sampling error, in other words, arises only because estimates are based on a sample rather than a census. As we have previously demonstrated, Graduate Outcomes is a population-scale survey[1] in which the sample is identical to the sampling frame, and the sampling frame resembles the population of interest very closely. We know that the quality and availability of contact details must affect the response rate we can achieve from the sample, but developing a comprehensive measure of that quality is a complex exercise in the absence of a perfect and accessible descriptor of it. We are, however, making significant improvements in our understanding of the various facets of quality, as described in the Sampling frame data based on HESA data collections section, and we aspire to provide response rates not just as a proportion of the target population but also as a proportion of the contactable population. For now, the response rate achieved is itself our best indicator of the quality of contact details. Our analytical focus in this section is therefore on the extent to which the achieved sample is representative of the population: that is, on non-response error.

This section comprises two subsections, which cover the strategies HESA has followed to limit the practical effects of missing responses. One of the main types of non-sampling error that can arise in conducting a survey is that resulting from non-response. Whilst a lower level of response reduces the precision of the obtained estimates, the impact of response rates on bias is ambiguous[2]. The two types of error in this category are unit non-response[3] and item non-response[4], and we cover issues related to each in the two subsections that follow.

Unit non-response error

Unit non-response occurs where a graduate does not respond to the survey. A poor response rate will result in less precision in any estimates we generate. Its effect on bias is less certain. Bias is determined by two components[5]: the response rate, and the variation between respondent and non-respondent values. Hence, a better response rate can be associated with increased bias, if the discrepancy between those who respond to the survey and those who do not grows larger. Consequently, attempting to maximise response rates will not necessarily minimise non-response bias[6].
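As an illustration, the standard expression for this relationship under a deterministic view of non-response (following Groves, 2004) is shown below; the notation is ours rather than taken from the Graduate Outcomes methodology:

```latex
% Non-response bias of the respondent mean, deterministic view (after Groves, 2004)
% N = population (or full sample) size, M = number of non-respondents,
% \bar{Y}_R and \bar{Y}_M = means of the quantity of interest among
% respondents and non-respondents respectively.
\[
  \operatorname{bias}(\bar{y}_R) \;\approx\; \frac{M}{N}\left(\bar{Y}_R - \bar{Y}_M\right)
\]
```

Both factors matter: a smaller non-respondent share M/N only reduces bias if the gap between respondents and non-respondents does not widen at the same time.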

A number of elements of the survey design are intended to maximise response rates, and an overview is offered in the operational survey information on the HESA website[7]. These include:

  • A website aimed at respondents to reinforce the legitimacy and credentials of the survey[8]
  • A smartphone-optimised survey
  • Allowing the survey to be completed in more than one stage, whether online, over the telephone, or using a mixture of both modes
  • Bespoke email invitations and reminders that include the name of the graduate and their provider
  • A dynamic engagement strategy informed by best practice and survey paradata
  • Using a data collection platform that seamlessly integrates all modes together
  • The adoption of a concurrent mixed-mode design (computer-assisted telephone interviewing (CATI) starts a week after the online system opens, and those who start online are not followed up until much later in the field period)
  • Increasing the convenience of responding for graduates, by making appointments for telephone interviews at times that suit them
  • Collecting proxy responses from half-way through the fieldwork period.

For the rest of this section we cover the specifics of our approach where non-response bias is concerned. Root cause remediation is one of the practices HESA adopts to proactively manage data quality[9]. In this case, our goal was to reduce data quality issues arising during collection. Historically, organisations administering surveys have relied upon methods executed after collection (i.e. weighting) to deal with the challenge of non-response. Over the last decade, however, those working in this area have increasingly examined whether anything can also be done during the data-gathering phase. Work by the Netherlands’ official statistics agency[10] points to the advantages of attempting this, such as improved precision due to less variable weights. In trying to reduce non-response bias, other authors highlight the potential benefit of developing propensity models and subsequently diverting more attention to those individuals with a lower likelihood of responding in the latter stages of the collection process[11]. An adaptive survey design methodology was therefore designed and implemented from cohort C of the first year of the survey onwards. This is subject to a quarterly refinement process in which opportunities for improvement to our response propensity model are identified and, where possible, implemented by analysts. Details of the practical approach to case prioritisation we take (based on our response propensity model) are covered in detail in the section of the Survey methodology covering data collection[12]. In summary, approximately halfway through a collection cycle, a logit model (with student and course characteristics as independent variables) is created to generate individual response propensities. Additional resource and effort are then allocated to obtaining responses from those graduates identified as being least likely to partake in the survey. The objective of this exercise is not only to achieve higher response rates, but also to reduce possible non-response bias by aspiring to a more representative sample.
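To make the case prioritisation step concrete, the sketch below shows one way a mid-fieldwork logit propensity model could be fitted and used to flag low-propensity cases. It is a minimal illustration, not HESA's production model: the column names, predictors and the prioritisation cut-off are assumptions.

```python
# Illustrative sketch of response-propensity-based case prioritisation.
# Column names ("responded", "age_band", "subject_area", "mode_of_study") and
# the prioritisation threshold are hypothetical, not HESA's specification.
import pandas as pd
import statsmodels.api as sm

def prioritise_cases(frame: pd.DataFrame, predictors: list) -> pd.DataFrame:
    """Score every graduate with a logit response-propensity model fitted on a
    mid-fieldwork snapshot, then flag the least likely open cases for extra effort."""
    # One-hot encode categorical student/course characteristics.
    X = sm.add_constant(pd.get_dummies(frame[predictors], drop_first=True, dtype=float))
    y = frame["responded"]  # 1 = responded so far, 0 = not yet

    propensity_model = sm.Logit(y, X).fit(disp=False)

    scored = frame.copy()
    scored["propensity"] = propensity_model.predict(X)

    # Divert additional reminder/CATI effort to the lowest-propensity open cases.
    open_cases = scored[scored["responded"] == 0]
    threshold = open_cases["propensity"].quantile(0.25)  # illustrative cut-off
    scored["prioritised"] = (scored["responded"] == 0) & (scored["propensity"] <= threshold)
    return scored

# Example usage with hypothetical columns:
# scored = prioritise_cases(cohort, ["age_band", "subject_area", "mode_of_study"])
```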

We cannot, however, simply assume that the adaptive survey design will achieve its objective. The resulting data must be assessed and, if necessary, action taken to address bias. This is referred to as “weighting” the survey. The overarching objective of weighting is to enable the sample to be adjusted such that it is more representative of the population[13]. Most surveys are weighted following collection. However, the Graduate Outcomes survey has some unusual features, such as a large sample size, an adaptive survey design, and a concurrent mixed-mode data collection approach. We therefore undertook a study to determine whether year one of the Graduate Outcomes survey should be weighted. The recommendation of this study was that weighting would not be applied to the statistics published by HESA for this first year (17/18) of survey data. Our analysis of the survey data did not identify any evidence of bias relating to mismatch between the achieved sample and graduate population characteristics in any direction at sector level. Indeed, when analysing across a range of demographic and course variables, we found a high level of similarity between the sample and population distributions. We trialled various weighting methods, and these did not improve the quality of our estimates. Unweighted and weighted estimates were generated at the overall level, as well as by key subgroups, for each of three different weighting methods[14]. Overall, across the breadth of HESA variables analysed, we generally observe close resemblance between the sample and the population, reducing concerns over potential bias. For a summary of our research and the findings, see the Survey methodology section on data analysis[15]. Technical details of the study we undertook are also available in our research paper titled ‘Should we weight?’[16]. This paper offers a detailed account of how we reached the decision not to apply weighting for year one of the survey. It describes the research methodology and illustrates the results that were found from the analysis. The paper is mainly aimed at academics, statisticians, and other interested parties wishing to understand the weighting research and its conclusions. Appendices A and B of the research paper include a series of tables and graphs that illustrate our findings in detail.
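For readers unfamiliar with the mechanics, the sketch below illustrates one simple weighting approach of the general kind examined (post-stratification to known population totals) and the comparison of weighted and unweighted estimates. The grouping variable, outcome name and figures are hypothetical; the methods actually trialled are documented in ‘Should we weight?’.

```python
# Illustrative post-stratification sketch: adjust the achieved sample to known
# population totals for one characteristic, then compare weighted and
# unweighted estimates. "subject_area" and "highly_skilled" are hypothetical
# column names, not fields from the Graduate Outcomes record.
import pandas as pd

def post_stratify(sample: pd.DataFrame, population_counts: pd.Series, cell: str) -> pd.Series:
    """Return a weight per respondent so weighted cell totals match the population."""
    sample_counts = sample[cell].value_counts()
    cell_weights = population_counts / sample_counts   # N_h / n_h for each cell h
    return sample[cell].map(cell_weights)

def compare_estimates(sample: pd.DataFrame, weights: pd.Series, outcome: str) -> dict:
    """Unweighted vs weighted mean of a 0/1 outcome, plus the difference."""
    unweighted = sample[outcome].mean()
    weighted = (sample[outcome] * weights).sum() / weights.sum()
    return {"unweighted": unweighted, "weighted": weighted,
            "difference": weighted - unweighted}

# Example usage with hypothetical population totals:
# pop = pd.Series({"A": 120_000, "B": 80_000})
# w = post_stratify(respondents, pop, "subject_area")
# print(compare_estimates(respondents, w, "highly_skilled"))
```

If the weighted and unweighted estimates barely differ across subgroups, as our research found, weighting adds complexity without improving quality.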

For the second year of the survey, HESA commissioned an external report from the Institute for Social and Economic Research (ISER) at the University of Essex. The research objective was to understand whether the application of statistical weighting to the Graduate Outcomes survey would effectively mitigate the consequences of non-response by graduates. The researchers were further asked to assess what (if any) estimation method should be used, and why, along with details of the variables that should be used in any weighting approach, and the rationale for this. To facilitate this, the researchers were given appropriately-controlled access to microdata, and asked to compare weighted and unweighted estimates for the whole sample, and also for subsamples by provider, by subject, by subject within provider, and by protected characteristics (including measures of disadvantage). Since Graduate Outcomes is a population-scale survey, any design weight assigned to individuals would be the same. However, we wanted to determine what the best approach (if any) would be for ensuring the sample matches known population totals. We also required an investigation into whether a non-response adjustment is necessary. The brief for the investigation was not a replication of our previous work. Instead, we wanted to explore the impact of weighting on estimates of the proportion in highly skilled employment and/or further study using the year one survey data, and to take the opportunity to extend the work HESA had done using the proportion in employment and/or further study outcome. We also asked the researchers to test weighting approaches not previously examined by HESA researchers. In summary, the research found that weighting reduced the measurable error for only a minority of estimates, and where this was the case, the magnitude of the reduction was very small. Alternative models did not reveal substantially different results. There is therefore no advantage to be gained by using weighted estimation. This is a rather unusual finding for a survey (where the starting assumption is always that weighting will be required), and it corroborates the similar finding we made based on the year one data. The full report from ISER is available on HESA’s website[17].

Some statistics published from the Graduate Outcomes survey are at a very granular level, e.g. activity by provider, domicile, level of qualification and mode of qualification. In some cases, the sample size for such statistics may be small. In these cases, the statistics may be subject to high levels of variability and a lack of statistical precision. Confidence intervals on these statistics (ranges within which we have a high level of confidence that the equivalent whole-population parameter would fall, where a narrow range indicates greater precision and a wide range indicates less precision) are, for key tables, published alongside the data.
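As a generic illustration of why precision falls as the sample size shrinks (this is not HESA's published interval methodology), the snippet below shows how the width of a 95% confidence interval for a proportion changes with the number of respondents:

```python
# Generic illustration only: how the width of a 95% confidence interval for a
# proportion depends on the sample size. The 70% figure is arbitrary.
from statsmodels.stats.proportion import proportion_confint

for n in (50, 500, 5000):
    successes = round(0.7 * n)  # e.g. 70% of respondents in a given activity
    low, high = proportion_confint(successes, n, alpha=0.05, method="wilson")
    print(f"n={n:5d}: 95% CI = ({low:.3f}, {high:.3f}), width = {high - low:.3f}")
```

The interval narrows markedly as the base grows, which is why small-sample statistics carry wide intervals and may warrant suppression, as described next.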

In addition, for some statistics, it may be necessary to introduce publication thresholds whereby statistics based on very small sample sizes and/or lower response rates are suppressed – this will be explained in any statistical releases where this decision is taken[18].

Research to date therefore indicates there is no evidence of measurable non-response bias in the data. We are fortunate to be able to link to good data on population characteristics to support these assessments. The risk of non-response bias appears to have been minimised by the combination of relatively high response rates and the adaptive survey design. Despite this, it is not easy to quantify the extent to which non-response bias remains a problem. There may be variables that we are not currently measuring that are more strongly correlated with unit non-response. As noted in ‘Should we weight?’, the Longitudinal Education Outcomes data offers a suitable external source for analysis of bias, and undertaking this work forms part of our future plans. Survey paradata may also prove useful in this respect in future. Users of Graduate Outcomes microdata may wish to conduct their own analyses to ensure the Graduate Outcomes data supports their analytical objectives. However, users should be reassured that there is no evidence to suggest that measurable non-response bias is present in the Graduate Outcomes survey data.

Item non-response error

Item non-response occurs where a value for a particular variable is missing for a graduate in a case where that observation was expected. In our survey, this typically occurs when respondents decline to answer particular questions. No single graduate is expected to answer all available survey questions. A routing structure directs respondents to the sets of questions that are most relevant to their circumstances[19]. Furthermore, optional questions will not be presented to all respondents. So, some data will not be present, but this does not mean it is missing – it may never have been sought, as it was not relevant in that case. In HESA’s publications, these issues will be made clear in the data and the notes, for example by indicating the sample used to produce a table or chart in its title, and by enumerating the unknown values. Researchers and other microdata users in particular will need to note this feature of the survey.

A derived field (ZRESPSTATUS[20]) describes the status of response to the Graduate Outcomes survey for each graduate for whom some (however minimal) results data has been received. A core set of mandatory questions[21] must be completed for a response to be marked as complete. This field classifies responses into categories denoting various states of completeness. The terms ‘complete’ and ‘full response’[22] are used interchangeably to refer to those cases where all the questions requiring a response have been answered. In addition to responses classified as ‘survey completed’,[23] a status of ‘partially completed’[24] is assigned where some of the core questions are missing but the first two questions have been answered.[25] Although partially completed responses do not contribute to the survey’s response rate targets, they are used alongside ‘survey completed’ responses in statistical outputs. Data from such responses appear in published statistics in the following ways: in tables of counts, unknown values are shown for questions that were not answered; wherever we display percentage values, we exclude unknowns from the calculations; and the sample used will be clear in the title or accompanying text.
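The sketch below gives a simplified, hypothetical illustration of this kind of completeness classification. The authoritative derivation is the ZRESPSTATUS specification linked in the notes; the question identifiers here are placeholders.

```python
# Simplified sketch of a completeness classification in the spirit of
# ZRESPSTATUS. The real derivation is defined in the HESA coding manual;
# question identifiers ("Q1", "Q2", "Q3") are placeholders.
def response_status(answers: dict, core_questions: list, first_two: list) -> str:
    """Classify one graduate's response by how many core questions are answered."""
    answered = {q for q, value in answers.items() if value not in (None, "")}
    if all(q in answered for q in core_questions):
        return "survey completed"             # cf. ZRESPSTATUS=04
    if all(q in answered for q in first_two):
        return "partially completed"          # cf. ZRESPSTATUS=03
    return "other partial / no usable response"

# Example: first two questions answered, but not every core question.
# response_status({"Q1": "Paid work", "Q2": "One job"}, ["Q1", "Q2", "Q3"], ["Q1", "Q2"])
# -> "partially completed"
```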

Just as unit non-response has the potential to introduce bias into overall survey results, item non-response can also introduce bias into estimates based on responses to specific questions which experience a relatively high proportion of survey drop-out. Where this non-response is non-randomly distributed for reasons such as question sensitivity and social desirability bias, it is important that patterns of non-response are well understood.[26] This would enable us to implement treatment plans to reduce non-response and therefore the risk of bias.

So far, we have observed a high completion rate and a very low drop-out rate in Graduate Outcomes. Most people (more than 90%) who start responding to the Graduate Outcomes survey go on to complete it. This not only reduces the risk of item non-response, but also reduces the requirement for interventions. HESA has started a programme of work aimed at gaining a better understanding of the characteristics of, and reasons behind, unit and item non-response, leading to the development and implementation of treatment plans where necessary and possible.

With regard to item non-response, we are currently prioritising the most sensitive questions in the survey, which are prone to higher drop-out rates than other questions. Questions relating to the following topics have been shortlisted in the first instance: job title, salary, employer’s name, and subjective well-being. The following table contains response rates for each of these questions in the 2018/19 survey.

Table 5 Response rates for sensitive questions, year two

Question/topic | Response rate | Base description
Job title (employment) | 96.1% | Graduates in or due to start employment
Job title (self-employment, business, portfolio) | 93.4% | Graduates in self-employment, running a business or developing a portfolio
Employer’s name (employment) | 92.7% | Graduates in or due to start employment
Employer’s name (self-employment, business) | 91.5% | Graduates in self-employment or running a business
Salary | 93.0% | Graduates in employment or self-employment
Subjective well-being | 88.6% | Graduates who have answered at least the first question

It is evident from the above table that most sensitive questions in the survey perform extremely well and are not likely to pose a threat to the robustness of the data. Even though item non-response rates are low, in year three we are preparing to introduce new functionality to the survey that provides additional guidance to graduates who are unsure about responding to these sensitive questions. It will take the form of information buttons that explain the reasons for collecting this information, with the aim of addressing respondents’ concerns. This information will be available in the self-completion mode as well as the CATI mode, to ensure consistency across the two data collection systems. This is expected to reassure concerned respondents and, we hope, reduce non-response rates even further.

The only topic in the above table with a slightly lower response rate is subjective well-being. These questions appear right at the end of the survey and, as noted above, the base population used to calculate the response rate comprises all individuals who answered at least the first question. There were plenty of ‘opportunities’ for respondents to discontinue the survey before they even reached the subjective well-being questions. The questions themselves are therefore not expected to have a detrimental impact on data quality where respondents never had a chance to see them. This is evident from the fact that, of those who answer the minimum set of questions required for a complete response, 95.5% go on to answer at least one of the well-being questions (which appear after the minimum set in the survey).

This in turn raises a question about the method of calculating item non-response. So far, we have used a cautious method involving an assessment of the eligibility of a graduate to respond to a question based on their activities (i.e. their response to the first question). This tends to be useful to analysts, as the activity groupings are most often used as the basis for various sub-group analyses. However, it is arguable that a more accurate method of calculation would use the response to the previous question as the base. This poses some challenges given the complex routing structure within the survey. We continue to review our methodology and hope to publish a more refined set of statistics in future. These would include non-sensitive questions, which have been excluded from the analysis so far.
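The sketch below illustrates the two bases under discussion. The column names are hypothetical, and a real calculation would need to respect the survey's routing structure rather than a single flag or preceding item.

```python
# Sketch of the two bases discussed above for an item response rate:
# (a) all graduates deemed eligible for the question from their reported
#     activity, and (b) only those who answered the immediately preceding
#     question on the route. Column names are hypothetical.
import pandas as pd

def item_response_rate(df: pd.DataFrame, item: str, base_mask: pd.Series) -> float:
    """Proportion of graduates in the chosen base who answered the item."""
    base = df[base_mask]
    return base[item].notna().mean()

# (a) eligibility-based base (current, cautious approach):
# rate_a = item_response_rate(responses, "salary", responses["eligible_for_salary"])
# (b) previous-question base (possible alternative):
# rate_b = item_response_rate(responses, "salary", responses["employer_name"].notna())
```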



[2] As Koch and Blohm (2016) note.

[3] This is where we are missing all observations for a case – this would mainly happen in situations where we are unable to elicit any response from a graduate.

[4] This is where we are missing some observations for a case – a common situation might be a graduate who answers the survey, but does not wish to answer some questions in the survey. We explain more about how we handle this sort of issue, in the following section.

[5] As Groves (2004) illustrates.

[6] Keeter et al. (2000) and Curtin et al. (2000) are examples of previous studies demonstrating that higher response rates can coexist with bias.

[9] Addressing quality issues closest to their source is generally the most efficient approach, and follows established data quality management principles (Data Management Association, 2017, p. 453).

[10] See Schouten and Shlomo (2017).

[11] See Rosen et al. (2014) for details. This approach has also been applied in a similar fashion by Peytchev et al. (2010) and Wagner (2013).

[12] See https://www.hesa.ac.uk/data-and-analysis/graduates/methodology/data-collection (particularly the section on case prioritisation).

[13] The creation of weights can comprise several components. First, the base weight refers to the probability that an individual is selected into the sample given the design of the survey. In Graduate Outcomes, we aim to send the survey to everyone in the sampling frame. We have not quantified how many people actually receive the survey. Second, a (unit) non-response weight may be generated, which seeks to account for the fact that participation may vary among different groups. In instances where information is available on the entire population, a final step would be to ensure that the weights allow the sample data to match known population totals for a chosen set of categories.

[14] This included subject area, provider and subject area within providers, which tend to be groups of interest for different stakeholders across the sector (e.g. to help providers evaluate their performance and for prospective students considering what course to study). As policy matters in this area are devolved across the four nations, estimates were also produced by country of provider. Additionally, the Equality Act 2010 requires public sector bodies to promote equal opportunity among individuals from all types of backgrounds. Consequently, we have also produced estimates by some of the key protected characteristics, such as age, ethnicity, disability and gender. Others, such as marital status and gender reassignment, were not covered due to insufficient coverage in the data.

[18] Where suppression is applied, this will be done in line with the prevailing HESA statistical confidentiality policy (see https://www.hesa.ac.uk/about/regulation/official-statistics/confidentiality) and the associated rounding and suppression approach: https://www.hesa.ac.uk/about/regulation/data-protection/rounding-and-suppression-anonymise-statistics (summarised in the Confidentiality and disclosure control section of this report).

[19] A flow diagram showing the survey response record fields produced given each survey routing is available in the coding manual: https://www.hesa.ac.uk/collection/c18072/download/Overall_Survey_Routing_Structure.pdf

[20] See the derived field specification at: https://www.hesa.ac.uk/collection/c18072/derived/zrespstatus

[21] Details of mandatory questions can be found as a PDF download from: https://www.hesa.ac.uk/innovation/outcomes/survey

[23] ZRESPSTATUS=04

[24] ZRESPSTATUS=03

[25] The observations gathered from the first two survey questions permit the derived field XACTIVITY to be produced – see https://www.hesa.ac.uk/collection/c18072/derived/xactivity. Since ‘activity’ is the Graduate Outcomes survey’s central concept, these responses are often partly usable.

[26] See De Leeuw, Hox and Huisman (2003).