
Data collection

On this page: Online data collection | Telephone data collection | Postal data collection | Opt-outs | Case prioritisation | Welsh language requirements | Data collection and our response to the COVID-19 situation

Graduate Outcomes data, for a given academic year, is collected in four instalments, known as cohorts. Each cohort represents a group of graduates who completed their course during a certain period, approximately 15 months prior to the start of data collection. Figure 1 outlines the data collection plan for the 2020/21 collection year:

Figure 1: Data collection plan for the 2020/21 collection

Cohort   | End date of course       | Contact period (c. 15 months after the end date) | Census week (week commencing)
Cohort A | 2020-08-01 to 2020-10-31 | 2021-12-01 to 2022-02-28                          | 2021-12-01
Cohort B | 2020-11-01 to 2021-01-31 | 2022-03-01 to 2022-05-31                          | 2022-03-01
Cohort C | 2021-02-01 to 2021-04-30 | 2022-06-01 to 2022-08-31                          | 2022-06-01
Cohort D | 2021-05-01 to 2021-07-31 | 2022-09-01 to 2022-11-30                          | 2022-09-01
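
For illustration, the cohort allocation in Figure 1 amounts to a date-range lookup on the course end date. A minimal sketch in Python, using the 2020/21 boundaries above; the function and dictionary names are purely illustrative and not part of any HESA system:

```python
from datetime import date

# Cohort boundaries for the 2020/21 collection, taken from Figure 1.
COHORTS_2021 = {
    "A": (date(2020, 8, 1), date(2020, 10, 31)),
    "B": (date(2020, 11, 1), date(2021, 1, 31)),
    "C": (date(2021, 2, 1), date(2021, 4, 30)),
    "D": (date(2021, 5, 1), date(2021, 7, 31)),
}

def cohort_for(course_end: date) -> str | None:
    """Return the cohort letter for a course end date, or None if out of scope."""
    for label, (start, end) in COHORTS_2021.items():
        if start <= course_end <= end:
            return label
    return None

print(cohort_for(date(2020, 9, 15)))  # -> "A"
```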

As not all graduates have access to the internet (or a telephone), the survey adopts a mixed-mode design to maximise contact with respondents. The primary modes of data collection in every cohort are web and telephone (except for non-EU overseas graduates, who are contacted online only), with several strategies (outlined below) designed to maximise response rates. Postal surveys are also used for a small number of graduates with no contact details other than a residential address.

The two main modes of data collection interact seamlessly: respondents who start the survey in one mode can finish it in another without having to start again from the beginning. Respondents can also access the online survey multiple times until they reach the end and submit all of their responses. Respondents can choose not to complete the survey over the phone; in such instances, interviewers can transfer a respondent to the online survey by instantly sending a survey link via email.

Online data collection

About

Data collection commences a few weeks prior to the start of a cohort, with a pre-notification email sent to all graduates with an email address. The aim is to introduce the survey to prospective respondents and encourage them to look out for an invitation email. Once the cohort opens, an invitation email is sent to all graduates, using email addresses submitted by providers. The email contains a survey link that is unique to every graduate. This is followed by an SMS (usually the following day, though it can take longer for larger cohorts) to UK mobile numbers only. All graduates therefore receive a form of invitation in the first week of data collection. Telephone follow-ups with all eligible non-respondents commence in the second week. Respondents who only partially complete the survey online are given a few weeks to finish it before they are contacted by telephone.

Providers are able to return up to three contact details of each type. Every contact detail submitted and approved by providers is used to send emails and SMS messages.

During the entire 13-week field period in each cohort, five to seven emails and SMS messages are sent to all non-responding graduates and those partially completing the survey. The exact number and timing of these reminders varies slightly from one cohort to another and is communicated in the engagement plan, which is published for each cohort on our website.

Design features

The first few years of Graduate Outcomes have seen the implementation of several enhancements during and between cohorts. The objective of these enhancements has always been to improve data quality and/or the effectiveness of the data collection instrument, which in turn leads to higher response rates. Some of the enhancements include:

  • Trialling email and SMS delivery on different days and at different times of day, using paradata to inform future deliveries (see the sketch after this list).
  • Recognising respondents who may have partially completed the survey, through targeted emails and SMS messages.
  • Using SMS messages flexibly, as a prompt or to encourage a direct response.
  • Use of hover text and information buttons to provide non-intrusive guidance to respondents.
  • Use of pre-notification or "warm-up" emails to prospective respondents before the start of data collection. This was implemented for the first time in cohort D of the 2017/18 collection year. All graduates with approved contact details in this cohort received a pre-notification (warm-up) email at least a week before they received the first invitation. The purpose of this exercise was to improve the take-up of online completion and to 'warm up' our IP addresses, raising their recognition as legitimate by the information security systems of the email providers on which respondents receive notifications (e.g. Gmail and Microsoft).
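
On the first item, a response-rate-by-send-slot summary is one way paradata can inform future deliveries. A minimal sketch, assuming each record carries the send timestamp and whether the graduate subsequently started the survey; the structure and field layout are hypothetical, not a description of HESA's actual paradata:

```python
from collections import defaultdict
from datetime import datetime

def response_rate_by_slot(sends):
    """Summarise survey starts by (weekday, hour) of the send.

    `sends` is an iterable of (sent_at, started_survey) pairs, where
    `sent_at` is a datetime and `started_survey` is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # slot -> [starts, sends]
    for sent_at, started in sends:
        slot = (sent_at.strftime("%a"), sent_at.hour)
        counts[slot][1] += 1
        counts[slot][0] += int(started)
    return {slot: starts / total for slot, (starts, total) in counts.items()}

# e.g. {('Tue', 10): 0.5}
print(response_rate_by_slot([
    (datetime(2022, 3, 1, 10), True),
    (datetime(2022, 3, 1, 10), False),
]))
```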

We have taken steps to risk-assess these improvements prior to implementation to minimise any likely impact on bias in the survey. Balancing the potential improvements in response rates and data quality with assessed risk of bias has been a key consideration, but in the case of all improvements implemented, we believe the balance of benefits has been compelling.  

View the emails used in the engagement strategy and survey materials on the Suggested graduate contact plan page.

Telephone data collection

About

Telephone interviewing usually commences in week two of fieldwork. In year four we implemented a 'soft launch' whereby graduates with no email address but a valid phone number are called in the first week, as that is the only mode of data collection available to them. In year three, calls to non-EU overseas graduates ceased in the last week of October; they continued to receive emails and SMS messages (where a UK mobile number was present). This was done to redirect expenditure towards UK-domiciled graduates, the primary group of interest to our users.

Calls are handled using an auto-dialler that randomly selects respondents from the entire sample and connects them to an available interviewer. Depending on the outcome of the call, it is marked as complete, incomplete or refusal. An incomplete status is further classified according to the nature of the call and its outcome, for example 'no reply', 'busy' or 'answer phone'. To maximise response rates, interviewers are also able to book appointments with respondents who wish to be contacted on certain days or at certain times of day.
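
The dispositions described above form a small two-level hierarchy, which could be modelled roughly as follows. This is an illustrative sketch only; the contact centre's actual status codes are not published:

```python
from enum import Enum

class CallOutcome(Enum):
    """Top-level call dispositions recorded by the dialler (illustrative)."""
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"
    REFUSAL = "refusal"

# Illustrative sub-classifications of an incomplete call, per the examples above.
INCOMPLETE_REASONS = {"no reply", "busy", "answer phone", "appointment booked"}
```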

As with email addresses and mobile numbers, a graduate can have up to three UK landline and international numbers. All numbers are used to contact respondents and collect a valid response. Once a number has been used to make direct contact with a graduate, it is marked as 'successful' and used in all subsequent attempts. As advised by our contact centre, mobile numbers are more likely to be unique to a graduate, so they are used before landline and international numbers.
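
The ordering just described (a previously successful number first, then mobiles, then landline and international numbers) amounts to a simple sort key. A minimal sketch, assuming each number record carries a type and a 'successful' flag; the data structure is hypothetical:

```python
def dialling_order(numbers):
    """Order a graduate's numbers for calling: any number already marked
    'successful' first, then mobiles, then landline and international numbers.

    `numbers` is a list of dicts with keys 'number', 'type'
    ('mobile' | 'landline' | 'international') and 'successful' (bool).
    """
    type_rank = {"mobile": 0, "landline": 1, "international": 2}
    return sorted(numbers, key=lambda n: (not n["successful"], type_rank[n["type"]]))
```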

Geo-dialling

The contact centre operates a geo-dialling system, whereby the geographical location of providers is taken into consideration. Graduates are presented with a familiar area code, increasing the likelihood that they will answer a call rather than ignoring or rejecting it, as they might with an unknown or unrecognisable number. This approach is supported more generally by existing best practice within the market research sector. As well as increasing the likelihood of graduates picking up the phone, it also dilutes the risk of a single number becoming blacklisted.

Despite the benefits of a geo-dialling system, the use of phone numbers that are visible but unknown to respondents does increase the likelihood that they will repeatedly ignore or even bar the calls, especially where they are called multiple times from the same number. It was therefore vital to consider any steps that could be taken to reduce this behaviour, with a view to increasing levels of response. Regularly changing the telephone numbers during the fieldwork period mitigates this risk. 

Following a recommendation from our contact centre, this approach was adapted to use mobile numbers alongside geolocated landline numbers when calling graduates. This led to a significant improvement in pick-up rates. The change was implemented in the last two weeks of cohort D.

Third-party interviewing

During the second half of the field period, interviewers are advised to collect responses from third parties, where possible, and where a suitable proxy respondent (defined as a partner, relative, carer or close friend) is available. Only the mandatory questions are asked, and subjective questions are excluded.

Interviewer training and development

To minimise interviewer error, the contact centre undertakes an extensive exercise to train its interviewers on Graduate Outcomes. HESA worked with the contact centre to compile a set of guidance notes and training materials on every question in the survey. The training covers practical, theoretical and technical aspects of the job. For quality control purposes, team leaders provide ongoing support throughout, enhancing interviewer skills and coaching around areas for improvement. This is carried out through top-up sessions, structured debriefs and shorter knowledge-sharing initiatives about 'what works'.

All interviewers receive a detailed briefing upon commencing interviewing, covering the purpose of the survey, data requirements (for example level of detail needed in certain free-text questions), running through each survey question, and pointing out areas of potential difficulty so objections and questions can be handled appropriately and sensitively.

Making calls and scripting

Interviewers are randomly allocated to respondents by the telephone dialler. This reduces the risk of creating interviewer-respondent clusters based on common characteristics. The only exception to this rule is the use of Welsh-speaking interviewers, who are allocated only to Welsh-speaking respondents.

Interviewers introduce the Graduate Outcomes survey as the reason for the call and state they are calling on behalf of the provider for the particular graduate. If asked for further information, they will explain that they are from a research agency that has been appointed by HESA to carry out this work. If required, the interviewer can also advise that the survey has been commissioned by the UK higher education funding and regulatory bodies.

All interviews are recorded digitally to keep an accurate record of interviews. A minimum of 5% of each interviewer's calls are reviewed in full by a team leader. Quality control reviews are all documented using a series of scores. Should an interviewer have below acceptable scores, this will be discussed with them, an action plan agreed and signed, and their work further quality controlled. Team leaders rigorously check for tone/technique, data quality, and conduct around data protection and information security.
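
The 5% review floor can be expressed as a per-interviewer sampling step. A sketch, assuming each call record carries an interviewer identifier; this does not reflect the contact centre's actual tooling:

```python
import math
import random
from collections import defaultdict

def qc_sample(calls, rate=0.05, seed=1):
    """Draw at least `rate` of each interviewer's calls for full review."""
    by_interviewer = defaultdict(list)
    for call in calls:
        by_interviewer[call["interviewer_id"]].append(call)
    rng = random.Random(seed)
    selected = []
    for recs in by_interviewer.values():
        k = max(1, math.ceil(rate * len(recs)))  # at least one call per interviewer
        selected.extend(rng.sample(recs, k))
    return selected
```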

Recontacting graduates

Some of the data collected in the survey is coded by an external supplier, using national industry and occupational coding frameworks. Where the supplier is unable to code verbatim responses, these are returned to the contact centre, which tries to supply more detailed responses by listening back to the interview and, where necessary, calling the graduate again.

HESA collects regular feedback from interviewers on the handling of different questions and respondents with the aim of identifying survey or script modifications.

Postal data collection

A third and final mode of data collection used in Graduate Outcomes is postal. Under exceptional circumstances, where a higher education provider is unable to supply email addresses or phone numbers for graduates, survey questionnaires are sent by post to the residential address supplied by the provider. The number of records with only residential addresses is not permitted to exceed 5% of a provider’s population in a given cohort.
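
The 5% ceiling on address-only records is a straightforward validation over a provider's cohort population. A minimal sketch, with hypothetical field names:

```python
def within_postal_cap(records, cap=0.05):
    """Check that postal-only records (a residential address but no email
    address or phone number) do not exceed `cap` of a provider's population."""
    if not records:
        return True
    postal_only = sum(
        1 for r in records
        if r["has_address"] and not (r["has_email"] or r["has_phone"])
    )
    return postal_only / len(records) <= cap
```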

The postal survey is a much shorter questionnaire, containing only a subset of the core survey questions that are required as a minimum to produce the main outputs. This is largely done to keep the survey short and minimise the level of navigation required due to routing. So far, the requirement for postal surveys has been minimal across all cohorts and approximately 10% of recipients have returned a completed questionnaire. Data from completed surveys is manually entered into the system by HESA.

Opt-outs

Graduates are able to opt out of the survey and any further communication through a number of different channels. The email invitations and online survey instrument provide access to information and a direct 'unsubscribe' link to opt out. Respondents can contact HESA at any point to request an opt-out or the deletion of their survey data or contact details, as per their rights under GDPR (this right extends beyond the close of the survey, up to a fixed point outlined in the privacy notice).

Respondents can also refuse to take part in the survey over the phone, and interviewers are trained to handle such requests.

Graduates can also get in touch with their providers to request an opt-out. Such requests are redirected to HESA for formal action. Respondents who opt out are marked as such on the survey data collection system, and all future communications cease within five working days from receipt of the request.

Case prioritisation

While achieving a higher response rate can improve the precision of estimates, its impact on bias is ambiguous. This is because non-response bias depends not only on the level of response, but also on the discrepancy between respondent and non-respondent values. As the latter component can continue to widen as more individuals complete the survey, a better response rate will not necessarily solve the problem of bias. Typically, the post-collection procedure of weighting is applied as a solution to this issue (for details on weighting, refer to the Data analysis section of this report). However, rather than relying on this technique alone, it was concluded that additionally addressing bias during the data-gathering phase could bring supplementary benefits (e.g. less variable weights).
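
To make the first point concrete: for a simple mean, the textbook decomposition of non-response bias is the non-response rate multiplied by the gap between respondents and non-respondents, so raising the response rate only helps if the gap does not widen. A minimal numeric sketch (all figures invented purely for illustration):

```python
# bias(respondent mean) = (1 - response_rate) * (respondent mean - non-respondent mean)
response_rate = 0.50
mean_respondents = 0.72      # e.g. proportion in full-time employment
mean_nonrespondents = 0.64

bias = (1 - response_rate) * (mean_respondents - mean_nonrespondents)
print(round(bias, 4))  # 0.04: more responses help only if the gap stays put
```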

Consequently, for cohorts C and D in year one, a case prioritisation approach was introduced.

While in theory case prioritisation should work, because its basic premise is well established, it is the view of the survey data collection team that our approach to prioritisation is ineffective and burdensome. The following paragraphs outline some of the flaws in this approach, along with findings from the last three years that highlight these concerns.

The aim of case prioritisation is to encourage those least likely to respond (referred to as the priority group) to take part in the survey. Our method of encouraging the priority group to participate is simply to try to call them a greater number of times than the rest of the sample. In practice this is achieved by placing these graduates into a specific 'team' within the call management system. A small number of interviewers are assigned to call this team throughout the duration of case prioritisation. The rest of the sample also continues to receive calls during this time, but at a slower pace, or at least that is the expectation.

Practical constraints 

The number of times the priority group will be called depends on the number of interviewers assigned to the group, the number of graduates in it, the number of valid telephone numbers available for each graduate, the call outcomes of every graduate in the group, and their participation in the online survey. Most of these characteristics cannot be pre-determined, and some of them change during the course of surveying.

Assigning too many interviewers could result in over-calling of this small sample (and a risk of complaints), while assigning too few would result in under-calling. This makes the management of this group a highly manual and imperfect process. There is no guarantee that we will ever get the assignment right, and this risk materialised in cohort D last year: our analysis shows that, on average, the priority group received slightly fewer calls than the non-priority group. By contrast, in cohort A last year the priority group received on average twice as many calls as the non-priority group. This led us to conclude that case prioritisation effectively failed in cohort D, because the priority group was no better off in terms of the number of calls received; if anything, it was slightly disadvantaged as a result of being placed in a separate team.

Representativeness of the responding sample 

The aim of case prioritisation is to reduce non-response bias and therefore increase representativeness of the responding sample compared with the overall population. 

Having looked at the representativeness of the sample prior to and at the end of case prioritisation, it is clear that case prioritisation is not having a material impact on the representativeness of the responding sample.

In cohort D of year three, for all characteristics under observation (age, sex, mode of study, level of study and degree classification), the responding sample became more representative of the population, despite the fact that case prioritisation did not work as intended.

For reasons described in the next section, case prioritisation was not implemented in cohort A of year four, and we found no pattern of reduced representativeness in the responding sample compared with the same cohort in the previous year.

Furthermore, where case prioritisation does have the desired effect on call management, there are instances where the sample becomes less representative of the overall population. Generally, any differences in representativeness prior to and after case prioritisation are extremely small (less than 0.5 percentage points in either direction).
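
The comparisons in this subsection reduce to per-category gaps, in percentage points, between the responding sample's distribution and the population's, measured before and after case prioritisation. A sketch, assuming category counts for a single characteristic such as sex or mode of study; the function name is illustrative:

```python
def representativeness_gap(sample_counts, population_counts):
    """Per-category gaps, in percentage points, between the responding
    sample's distribution and the population's for one characteristic."""
    n_sample = sum(sample_counts.values())
    n_pop = sum(population_counts.values())
    return {
        cat: 100 * (sample_counts.get(cat, 0) / n_sample - count / n_pop)
        for cat, count in population_counts.items()
    }
```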

Statistical assessment of non-response bias – Year 3 

It was noted at the end of year three that there was no need to conduct statistical weighting of the outputs from Graduate Outcomes, as there was no significant evidence requiring an adjustment for non-response bias. Cohort D, being the largest cohort (contributing 70% of the responses), has a strong influence on overall survey results. It is evident that the absence of case prioritisation in this cohort did not have a material impact on non-response bias or on our decision not to weight the data.

Changes to scope and setup of CATI 

The removal of international calling at the start of year four, and a reduction in the maximum number of times a non-responding graduate is called (from 20 to 15), have meant a more even distribution of calls across all graduates. As a result, in year four all non-responding graduates received on average 15 calls in cohort A and 14 in cohorts B and C. This made case prioritisation irrelevant, as the priority group had as much chance of receiving the maximum number of calls as the remaining sample. Seeing how quickly interviewers were getting through the sample, we decided not to implement case prioritisation in cohorts A to C, to avoid the risk of over-calling over a short period of time.

Due to the size of the sample in cohort D, we expect a smaller proportion of graduates to receive the maximum number of calls, but the reduction in the maximum try count will still have a noticeable impact on the average number of calls across the entire non-responding sample. It takes around four calls on average to achieve a successful interview, and at the time of writing the majority of the sample had received considerably more than four calls on average.

Impact on overall response rates 

Assigning interviewers to a small group of graduates who are least likely to respond takes them away from the rest of the sample, which contains many graduates who are more likely to respond. While reducing non-response bias is essential, trying to do so through an activity with very limited (or no) benefit is not an effective use of resources.

Welsh language requirements

HESA is committed to providing access to Graduate Outcomes in Welsh, recognising the importance of ensuring Welsh speakers are not disadvantaged in comparison with English-speaking graduates. Working alongside the Welsh funding and regulatory body, we have contracted a partner organisation to undertake all English-to-Welsh translation work for Graduate Outcomes. This includes the logo, the Graduate Outcomes website, the survey, the script, results, and email and SMS text. All communications are offered in Welsh, English or bilingual modes, depending on a graduate's fluency in Welsh.

Data collection and our response to the COVID-19 situation

With COVID-19 continuing to affect individuals around the world during year three, the adjustments implemented in the previous year remained in place. Our focus was on ensuring that graduates could self-administer the survey in a way that accurately reflected their personal situation, and that interviewers could support participants sensitively and appropriately. The following survey changes were implemented in year two and remained in place for year three:

Furloughed staff

To ensure that respondents who are furloughed under the new Government scheme (remaining technically employed) select the correct option at ‘What activities were you doing in [CENSUS WEEK]’, we added additional text to code 01 (Paid work) to clarify that this does include furloughed employees.

For year four, the above changes were removed following the easing of the COVID-19 pandemic and its associated restrictions.
