Outcomes of 17/18 SOC coding assessment
As you may know, SOC coding for the Graduate Outcomes survey is carried out centrally following a standard process, rather than by providers on a distributed basis as was the case with DLHE. This change was a key recommendation of the NewDLHE consultation. Coding responses centrally ensures that a single approach is applied to every record, so data users can be better assured that the data is free from coding bias and is consistent and comparable.
Providers have had the opportunity to review their data, including the draft SOC (occupation) coding. HESA extended the deadline for this review from 6 December 2019 to 6 January 2020 to allow more time for provider feedback. We have spent the last few weeks compiling that feedback and assessing its impact on the classification of the C17072 collection results.
We would like to express our sincere thanks to all providers for their cooperation and patience during this process. This update gives an overview of the process, the action we’ve taken and our next steps.
As we have noted in previous communications, we received 2,500 queries from 90 providers, and reviewing and actioning this feedback was a significant undertaking.
The dissemination timetable allowed for quality assurance activity at the end of the collection, consisting of two rounds of manual checking plus consistency checking. The volume of provider feedback was higher than anticipated, and HESA has worked to ensure that every area of feedback has been properly considered and, where applicable, actioned. Getting this right has put us behind schedule in finalising SOC coding. As a result, final data delivery to Statutory Customers and providers has been delayed. View the latest timeline for data dissemination.
Although we received feedback from only a subset of providers, it’s important to stress that any changes to SOC coding resulting from provider feedback or from HESA’s assurance work have been applied consistently across the entire collection. So even if your provider did not submit feedback directly, your data will have gone through the same assurance process and been subject to the same changes as everyone else’s.
The complete review of SOC coding involved the following processes:
Assessment of provider feedback
All provider feedback received has been individually reviewed and tracked through a substantial assessment process. HESA reviewed all 2,500 cases and placed each one into one of the following four categories:
| Category | Description | Action |
|---|---|---|
| Systemic | Where the error is widespread and there is a clear pattern of miscoding | Passed to Oblong for further review and amendment if necessary |
| Non-systemic | Isolated cases | Not passed to Oblong |
| Inconsistent | Where multiple records in an occupation group are coded inconsistently with no obvious pattern | Passed to Oblong for further review and amendment if necessary |
| Not actionable | No basis or evidence exists for coding to be changed | Not passed to Oblong |
To identify systemic and non-systemic issues, HESA reviewed the entire dataset of 300,000+ records, not just the records reported by providers. In other words, the review process was provider-neutral. As well as enabling the identification of systemic issues, this compensated for the lack of feedback from the remaining 300 providers, some of whom may not have found any issues.
The records identified as inconsistent would very likely have been corrected by Oblong in the course of their consistency checks. However, because of the timing of this review, those checks had not yet been carried out (see below).
Planned consistency checking
In parallel, Oblong have completed their quality assurance and consistency checking of the entire dataset. This ensures that all four cohorts are coded consistently and that any change made during one cohort is applied across all of them.
Once Oblong had quality assured the dataset, HESA checked it again to ensure that any systemic issues had been addressed. In addition, we examined a random sample of just under 10,000 records to confirm that no other systemic issues had been missed and to check the effectiveness of the consistency checking process.
HESA’s assessment process identified 73 occupation groups for review by Oblong. Drawing on their knowledge and expertise, Oblong made the final recommendation on whether, and how, the coding needed to be corrected. In parallel, they were carrying out the consistency checking process, which in some cases identified the same issues.
Of these 73, 28 occupation groups were confirmed as having systemic issues, 38 were inconsistently coded and 7 were not actionable. Oblong also made several other changes following their consistency checks. Across both processes combined, 8% of records have had their SOC codes changed. Of these changes, 59% moved a record from one major group to another; the remainder were lateral changes within a major group.
Where a group has been identified as having systemic issues, not every record in that group will necessarily be recoded to the same SOC code. A large occupation group often covers several variations of a role, some of which lead to different SOC codes.
As we have said throughout this process, we have focused this activity on systemic issues only. We found that a significant proportion (over two-thirds) of the challenges received from providers were either non-systemic or not actionable. We understand that any error has a direct impact on the individual provider concerned. However, we have to balance this against the nature of the centralised model and the disproportionate effort that would be required to check, and where applicable correct, every single record in the dataset.
We would like to share a number of other observations that we hope will clarify the process for year two and explain HESA’s position on a number of aspects:
Requests to change survey responses
Some providers asked us to change graduates’ actual survey responses where the provider had concluded, based on its own assumptions, that a graduate had either misinterpreted the question or given an inaccurate answer. A fundamental principle of HESA and Graduate Outcomes is that we do not change data returned by respondents. In the absence of unbiased contradictory evidence, answers given in the survey are deemed to be true from the perspective of that graduate.
We can only determine coding from the information the graduate supplies in answer to the relevant survey questions. The quality of this information varies between graduates, as some may choose to withhold an answer or provide insufficient detail. Where a record is initially returned as uncodeable and the graduate completed the survey by telephone, we recontact them. For online responses, however, we are unable to make follow-up contact because of the logistics of doing so at such a large scale.
Requests that were not applicable
Some providers were confused by the presence of two sets of SOC codes for a given record. Unlike DLHE, Graduate Outcomes asks graduates about their employment and self-employment activities separately, which can result in two different activities with two different codes.
Some providers sent us examples where the codes had already changed as a result of Oblong’s continuous quality assurance processes.
Overall, at least 4% of the 2,500 requests received were deemed not applicable for further review.
Consistent contextual information
Some providers asked us to take account of presuppositions, anecdotal evidence or assumptions about certain responses and job types, for example by sending a template job specification for a particular job title. This information is not available to HESA and its external coder consistently and objectively for all graduates, and we cannot be assured that it applies to a given graduate’s response. We have therefore taken the position that it would be inappropriate to factor it into the coding process for a centralised survey.
Bias in coding SOC major groups 1-3
HESA has sought to ensure that adjustments to SOC coding allowed for responses to move both up and down the SOC major groups. Based on a sample analysis of provider feedback, around 90% of requests sought to change the coding from major groups 4-9 to major groups 1-3, moving records up the classification. The spread of actionable issues did not follow this trend. Of all the records whose major group changed, 28% moved from major groups 4-9 to 1-3 and 13% moved from major groups 1-3 to 4-9. The remaining 59% stayed within the same grouping (either 1-3 or 4-9).
Coding of occupations is an area prone to subjective views and preconceptions, but HESA seeks to take as consistent and objective an approach as possible. We believe that a centralised process operated by a single supplier supports that aim. The first stage of coding uses an automated process that codes records according to a pre-defined algorithm. This is followed by a series of manual quality checks. Wherever human intervention and checking are needed, they follow consistent principles and approaches, minimising the risk of introducing bias or coding inconsistency.
We have learned many lessons from this process. We expect to identify most of them fully over the coming months, and they will feed into a new coding process for year two of the survey. We will also investigate how the survey instrument can support this process (e.g. through interviewer and respondent guidance), how providers may support it, and how we will continue to listen and respond to the needs of the sector.
Use of QUALREQ
Through the assessment process and the consistency checking, we were able to make a significant change to the coding approach for year one.
One of the fields used for SOC coding was QUALREQ (qualification required), which we have now determined should no longer be used. This question asks whether the qualification the graduate completed 15 months earlier was a requirement for the main job they are doing now. However, the graduate may have completed other study that QUALREQ does not capture, either before that course or between completing it and the census date. This variable therefore cannot be used to distinguish between professional and non-professional jobs, which is how it was previously being used. The final coding therefore classifies individuals on the basis of their employment only. QUALREQ will not be used for year two coding.
Some issues raised by providers were affected by QUALREQ and have been addressed during the review. Similarly, some occupation groups that providers did not query were also found to be inconsistently coded because of QUALREQ. Where necessary, the codes for these records have also been amended.
We are unable to report on the outcome of each individual case raised by providers. However, we have added a list of the professions whose coding has changed to the existing SIC / SOC data classification information page. You may also see some changes to the raw data in the provider portal over the coming weeks, but the data will not be final until delivery at the end of March 2020.
As previously communicated, any feedback supplied to HESA after the 6 January 2020 deadline cannot be incorporated into year one (17/18) outputs. This is to ensure we remain on track to deliver data on the advised timeline. However, we still welcome such feedback, as it will help to inform the coding of year two (18/19).
This process has taken longer than expected, and we want year two coding to benefit from everything we have learned in year one so that data quality improves. We will therefore not switch on the raw feed of year two SIC / SOC coding until summer 2020.
Informing a revised process for year two
In mid-April, HESA will host a workshop with representatives from key sector groups (AGCAS, AHUA, HESPA and HECSU), ONS and the Steering Group to inform our approach to occupation coding for future collections (from year two). We recognise the need to explore whether some coding practices that institutions previously used for the DLHE survey could be replicated centrally for Graduate Outcomes, where these can be shown not to introduce bias or coding inconsistency. We will provide an update once we have identified a potential way forward for year two.
On Friday 14 February, the ONS launched SOC 2020, which is now the current Standard Occupational Classification (SOC) for the UK. HESA was aware of the imminent launch of this new version of the framework and of its potential role in SOC coding for Graduate Outcomes.
As the coding process for year one of Graduate Outcomes is now complete, we can confirm that year one has been coded using SOC 2010. We are currently reviewing the impact of SOC 2020 and carrying out a full assessment to determine its role in year two coding. Once this is complete, we will give providers an update.
We hope you find this update informative. We would like to express our sincere thanks to all providers for their cooperation and patience during this process. We believe we have a robust final set of data and look forward to working with the sector to further enhance our coding processes for year two and beyond.
Managing Director HESA Statutory