Data analysis

Approach to weighting

Non-response to a survey can mean that estimates derived from the responding sample do not accurately reflect the wider population. In year 1, HESA therefore carried out an internal investigation into whether weighting could mitigate this effect of non-response. We found no evidence that applying weighting would be beneficial, given the minimal difference between the weighted and unweighted estimates of the proportion in employment and/or study.[1]

It was recognised that further research would be needed in year 2 to test the robustness of this conclusion. HESA therefore commissioned the Institute for Social and Economic Research (ISER) at the University of Essex to carry out this analysis for the second year. Compared with year 1, the work was extended to include a second outcome: the proportion in highly skilled employment and/or study.

Data from year 2 of the survey, covering graduates from the 2018/19 academic year, was used in the research. The analysis was also replicated on year 1 data to test the robustness of the conclusions.

Weights were constructed so that the responding sample, once weighted, matched the full population of graduates on characteristics such as subject area of study, level and class of award, provider, sex, age at entry and region of domicile. Four different sets of weights were developed, each matching the sample to a different combination of the available variables.
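
A common way to make a sample match known population totals on several variables at once is raking (iterative proportional fitting), which repeatedly rescales the weights one variable at a time until every set of margins is matched. This page does not say which weighting method ISER used, so the following is only a minimal sketch assuming raking; the function `rake`, the two variables and the population counts are hypothetical.

```python
from collections import defaultdict


def rake(records, targets, max_iter=100, tol=1e-9):
    """Raking (iterative proportional fitting): return one weight per record
    so that weighted category totals match `targets` for every variable.

    records: list of dicts mapping variable name -> category.
    targets: dict mapping variable name -> {category: population count}.
    Assumes the sample's categories match the target categories.
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, margin in targets.items():
            # Current weighted total for each category of this variable
            totals = defaultdict(float)
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            # Rescale weights so this variable's weighted totals hit the targets
            for i, rec in enumerate(records):
                factor = margin[rec[var]] / totals[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass leaves the weights (almost) unchanged
        if max_change < tol:
            break
    return weights


# Hypothetical responding sample and population margins
sample = [
    {"sex": "F", "level": "UG"},
    {"sex": "F", "level": "PG"},
    {"sex": "M", "level": "UG"},
    {"sex": "M", "level": "UG"},
]
targets = {"sex": {"F": 60, "M": 40}, "level": {"UG": 70, "PG": 30}}
weights = rake(sample, targets)
print(weights)
```

Here the weights converge to `[30.0, 30.0, 20.0, 20.0]`. In this framing, each of the four weight sets would correspond to a different choice of variables in `targets`.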

Each set of weights was tested to see whether it improved the accuracy of estimates of the proportion of graduates in employment and/or study, or in highly skilled employment and/or study, both overall and for a number of population subgroups.

Weighting, using any of the four approaches, was found to improve accuracy for only a minority of estimates. Where an improvement did occur, it was small, making little or no practical difference to the conclusions that would be drawn from the data. In other words, the accuracy of weighted and unweighted estimates did not differ substantially.
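
The comparison at the heart of this finding is between the same proportion computed with and without survey weights. The sketch below shows only that basic calculation, on hypothetical 0/1 outcome indicators and weights; it does not reproduce ISER's accuracy assessment, which is described in the technical report. The function name `proportion` is illustrative.

```python
def proportion(outcomes, weights=None):
    """Share of graduates with a positive outcome (e.g. in employment
    and/or study). outcomes: list of 0/1 indicators; weights: optional
    survey weights (equal weights if omitted, i.e. the unweighted estimate).
    """
    if weights is None:
        weights = [1.0] * len(outcomes)
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical respondents and weights
outcomes = [1, 1, 0, 1, 0, 1, 1, 1]
weights = [1.2, 0.8, 1.1, 0.9, 1.0, 1.0, 1.3, 0.7]

unweighted = proportion(outcomes)
weighted = proportion(outcomes, weights)
difference_pp = (weighted - unweighted) * 100  # in percentage points
print(f"unweighted={unweighted:.4f} weighted={weighted:.4f} "
      f"difference={difference_pp:.2f} pp")
```

A difference of around one percentage point, as here, is the kind of gap that would usually make little practical difference to conclusions.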

ISER’s investigation of year 2 data therefore reached a similar conclusion to HESA’s previous investigation of year 1 data: there is no need to use weighted estimation with the Graduate Outcomes survey data.

Users of the data may be reassured that the findings from this project indicate that there is no evidence of substantial non-response bias in the survey data.

A full technical report of the ISER research on weighting has been published on the HESA website.[2]

Reliability

Some statistics published from the Graduate Outcomes survey will be at a very granular level, for example employment rates by HE provider and subject. In some cases the sample of respondents for such statistics may be small and/or its response rate may be lower than the overall survey response rate. These statistics may then be subject to high variability and low statistical precision. HESA intends to publish confidence intervals[3] for these statistics. A confidence interval is a range within which we have a high level of confidence that the equivalent whole-population statistic would fall: a narrow range indicates greater precision and a wide range indicates less precision.
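
Footnote [3] states that the intervals are 95% intervals following Goodman (1965), computed in R with `MultinomCI`. As an illustration only, the sketch below re-implements Goodman's formula in Python; it has not been checked against HESA's published figures, and the function name `goodman_ci` and the example counts are hypothetical. The chi-square quantile with 1 degree of freedom is obtained as the square of a normal quantile, which avoids a SciPy dependency.

```python
from math import sqrt
from statistics import NormalDist


def goodman_ci(counts, conf=0.95):
    """Goodman (1965) simultaneous confidence intervals for multinomial
    proportions. Returns one (lower, upper) pair per category."""
    k = len(counts)
    n = sum(counts)
    alpha = 1 - conf
    # A = upper alpha/k point of the chi-square distribution with 1 df,
    # computed as the square of the two-sided normal quantile at alpha/k.
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    a = z * z
    intervals = []
    for x in counts:
        centre = a + 2 * x
        half_width = sqrt(a * (a + 4 * x * (n - x) / n))
        denom = 2 * (n + a)
        intervals.append(((centre - half_width) / denom,
                          (centre + half_width) / denom))
    return intervals


# Hypothetical counts for one provider/subject cell:
# employed, in further study, other
counts = [62, 25, 13]
for x, (lo, hi) in zip(counts, goodman_ci(counts)):
    print(f"{x / sum(counts):.2f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

Scaling the same proportions up to a larger cell (for example `[620, 250, 130]`) gives noticeably narrower intervals, which is why granular statistics based on few respondents are less precise.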

In addition, for some statistics it may be necessary to introduce publication thresholds, whereby statistics based on very small samples and/or low response rates are suppressed. Decisions on the use of these techniques will be explained in each HESA statistical release.
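
A publication threshold of this kind amounts to a simple rule applied to each cell before release. The text does not give any threshold values, so `min_n=25` and `min_rate=0.30` below are hypothetical placeholders, as are the function names and the example cells; the actual rules are set out in each statistical release.

```python
def publishable(n_respondents, response_rate, min_n=25, min_rate=0.30):
    """True if a cell meets the (hypothetical) publication thresholds.
    min_n and min_rate are placeholder values, not HESA's thresholds."""
    return n_respondents >= min_n and response_rate >= min_rate


def suppress(cells, **thresholds):
    """Replace the statistic with None for cells that fail the thresholds.
    cells: name -> (statistic, respondents, response rate)."""
    return {
        name: (stat if publishable(n, rate, **thresholds) else None)
        for name, (stat, n, rate) in cells.items()
    }


# Hypothetical provider/subject cells
cells = {
    "Provider A / Law": (0.82, 140, 0.45),
    "Provider B / Physics": (0.91, 12, 0.50),   # too few respondents
    "Provider C / History": (0.77, 60, 0.18),   # response rate too low
}
result = suppress(cells)
print(result)
```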

Salary data

Preliminary analysis of the salary data submitted by respondents to the Graduate Outcomes survey reveals a small number of salary outliers that suggest data quality issues, such as misinterpretation of the salary question. Although HESA has taken steps to reduce misinterpretation, some irregularity in responses is expected, so decisions must be made on how salary outliers are treated when survey data is disseminated. At the lower end of the range of reported salaries, HESA has decided to exclude salaries below the UK national minimum wage equivalent (calculated using the minimum wage rates applicable to the year of reported employment), as salaries below this level are considered implausible.
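
The lower exclusion needs an annual figure derived from an hourly minimum wage rate. The conversion HESA uses is not described here, so the sketch below assumes a full-time week of 37.5 hours over 52 weeks; those parameters, the function names and the hourly rate of £8.00 are illustrative placeholders, not official figures. The applicable rate for the year of employment would be passed in by the caller.

```python
def min_wage_annual_equivalent(hourly_rate, hours_per_week=37.5, weeks_per_year=52):
    """Annualise an hourly minimum wage rate for a full-time worker.

    hours_per_week and weeks_per_year are illustrative assumptions, not
    HESA's published parameters.
    """
    return hourly_rate * hours_per_week * weeks_per_year


def drop_implausibly_low(salaries, hourly_rate):
    """Keep salaries at or above the minimum wage annual equivalent."""
    floor = min_wage_annual_equivalent(hourly_rate)
    return [s for s in salaries if s >= floor]


# Hypothetical reported salaries; 1_200 might be a monthly figure entered
# by mistake. 8.00 is a placeholder hourly rate, not an official figure.
reported = [1_200, 14_000, 15_600, 21_000, 26_500]
print(drop_implausibly_low(reported, hourly_rate=8.00))
```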

At the opposite end of the salary range, a small proportion of very high reported salaries merit additional scrutiny. In the first year of publication in 2020 (using 2017/18 data), HESA's statistical analysis suggested that if the top 1.5% of reported salaries were excluded, the remaining data would more closely fit a ‘normal’ statistical distribution (which was then taken to be the usual expectation for data such as this, drawn from a very large sample). HESA therefore concluded at the time that it would be appropriate to exclude the top 1.5% of salaries as outliers. Some user feedback received since the 2020 publication challenged this approach on the grounds that the statistical analysis did not necessarily show those high salaries to be erroneous.

In response, HESA has undertaken further analysis of both the 2017/18 and 2018/19 data, including a literature review and manual scrutiny of salaries reported to exceed £100,000 alongside other reported characteristics of the associated employment, such as job titles. This assessment leads us to conclude that salary data is unlikely to be normally distributed and, further, that above a certain salary threshold the proportion of reported salaries that are not credible increases markedly. The threshold we have determined lies at approximately £245,000, which corresponds to the top 0.1% of reported salaries. Because we cannot be confident of the quality of salaries in this uppermost range, we have decided to exclude them so that they do not distort calculations such as overall mean salary. Further detail on our assessment of high salaries is available in the Quality Report.
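
The effect of a few non-credible high values on the mean, and the purpose of the upper exclusion, can be seen in a small example. This sketch treats the approximate £245,000 threshold as a fixed cap; HESA's exact rule may be defined differently (for example as a percentile), and the function name and salaries are hypothetical.

```python
from statistics import mean, median

UPPER_CAP = 245_000  # approximate threshold given in the text


def exclude_high_outliers(salaries, cap=UPPER_CAP):
    """Drop salaries above `cap`; return the kept values and the share removed."""
    kept = [s for s in salaries if s <= cap]
    share_removed = 1 - len(kept) / len(salaries)
    return kept, share_removed


# Hypothetical reported salaries, one of them not credible
salaries = [24_000, 27_000, 30_000, 31_500, 36_000, 2_500_000]
kept, share = exclude_high_outliers(salaries)
print(f"removed {share:.1%}")
print(f"mean:   {mean(salaries):,.0f} -> {mean(kept):,.0f}")
print(f"median: {median(salaries):,.0f} -> {median(kept):,.0f}")
```

The single implausible value pulls the mean far above every credible salary, while the median moves only slightly; excluding it mainly protects mean-based figures.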

As with previous presentations of graduate salary data, HESA expects to show salary data only for graduates who report being in full-time paid UK employment and being paid in pounds sterling.
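
Restricting the salary population amounts to a filter applied before any salary statistics are computed. The record layout below (`SalaryRecord` and its fields) is a hypothetical illustration and does not reflect HESA's data model.

```python
from dataclasses import dataclass


@dataclass
class SalaryRecord:
    # Field names are hypothetical; they do not reflect HESA's data model.
    mode: str        # e.g. "full-time" or "part-time"
    paid: bool
    country: str     # country of employment
    currency: str    # ISO 4217 code of the currency paid
    salary: float


def in_salary_scope(rec: SalaryRecord) -> bool:
    """Full-time, paid, UK employment, with salary paid in pounds sterling."""
    return (rec.mode == "full-time" and rec.paid
            and rec.country == "UK" and rec.currency == "GBP")


records = [
    SalaryRecord("full-time", True, "UK", "GBP", 27_000),
    SalaryRecord("part-time", True, "UK", "GBP", 14_000),
    SalaryRecord("full-time", True, "FR", "EUR", 35_000),
    SalaryRecord("full-time", True, "UK", "USD", 40_000),
]
in_scope = [r.salary for r in records if in_salary_scope(r)]
print(in_scope)
```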

HESA statistical releases show numbers of graduates by salary band. The bands start at the national minimum wage equivalent and are divided into £3,000 bands within the most common range of graduate salaries. In the 2020 publications the highest band covered £39,000 or more. For publications of 2018/19 data, the bands have been revised because a sizeable proportion of graduates in full-time paid UK employment fell into this top band (particularly among those who graduated from some postgraduate courses). Four additional bands have therefore been added at the upper end, so the highest band now covers £51,000 or more. Median salaries are published alongside the banded salary data.
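
Assigning salaries to bands is a lookup against a sorted list of band boundaries. Only the £3,000 width, the lowest band starting at the minimum wage equivalent and the open-ended £51,000-or-more top band come from the text; the first boundary of £18,000, the £15,600 floor and the function names are hypothetical choices for illustration. Labels assume whole-pound salaries.

```python
from bisect import bisect_right
from collections import Counter
from statistics import median


def band_edges(first_edge=18_000, top_edge=51_000, width=3_000):
    """Band boundaries in £3,000 steps up to the open-ended top band.
    first_edge is a hypothetical choice; top_edge follows the text."""
    return list(range(first_edge, top_edge + 1, width))


def band_label(salary, floor, edges):
    """Label of the band containing a (whole-pound) salary >= floor."""
    i = bisect_right(edges, salary)  # number of edges <= salary
    if i == len(edges):
        return f"£{edges[-1]:,.0f} or more"
    lower = floor if i == 0 else edges[i - 1]
    return f"£{lower:,.0f} to £{edges[i] - 1:,.0f}"


floor = 15_600  # placeholder minimum wage equivalent
edges = band_edges()
salaries = [16_000, 21_000, 23_500, 27_000, 52_000]
counts = Counter(band_label(s, floor, edges) for s in salaries)
print(counts)
print("median:", median(salaries))
```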

[3] Confidence intervals are calculated at the 95% level using the method proposed by Goodman (1965), implemented in R using the MultinomCI function. Goodman, L. A. (1965). On Simultaneous Confidence Intervals for Multinomial Proportions. Technometrics, 7, 247–254.