Rounding and suppression to anonymise statistics
HESA data is collected for statistical research purposes, but student and staff data can be 'Personal Data' in its raw form and this needs to be protected from unauthorised exposure.
If you've obtained data from HESA (including via Heidi Plus) it is your responsibility to apply our specified disclosure control to your research outputs. This page is intended to help clarify how our rounding and suppression strategy works. If you're using published HESA data this page explains why you might notice missing data or figures that don't add up.
The HESA Standard Rounding Methodology is used in all HESA publications and must be used by other users of HESA data too. It is a condition of the Agreement for the Supply of Information Services (the licence to use HESA data) that any permitted outputs from research using HESA data must apply this rounding methodology. This includes research papers, internal reports and presentation slides: anything that displays the results of research on the raw data.
The HESA Standard Rounding Methodology may also be referred to as the HESA Services Standard Rounding Methodology, HESA’s Rounding Methodology or the Heidi Plus Rounding Methodology depending on the context of the reference. These names all refer to the same methodology described here.
There are three aspects to the rounding methodology:
- Counts of people are rounded to the nearest multiple of 5.
- Percentages (like % of students who are disabled) are not published if they are fractions of a small group of people (fewer than 22.5).
This includes percentage change calculations ([New-Old]/Old) where either the old or new number is less than 22.5.
- Averages (like average age or average salary) are not published if they are averages of a small group of people (7 or fewer).
These rules only apply to data about people. They do not apply to data about finances, areas, volumes etc.
The rules are applied after any calculations (sums, averages, percentages etc.) have been done so that changes to the data don't compound each other to give even more inaccurate results. This sometimes means numbers in tables don't appear to add up.
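As a minimal sketch of why rounded tables may not add up, the Python snippet below rounds a total and its parts independently. The figures (12 female and 12 male staff) and the helper name `round_to_5` are hypothetical and chosen only for illustration.

```python
import math


def round_to_5(count):
    """Round to the nearest multiple of 5, with halves rounded upwards."""
    return math.floor(count / 5 + 0.5) * 5


# Hypothetical raw counts, not taken from any HESA table.
female, male = 12, 12

# The total is calculated from the raw data first, then rounded.
published_total = round_to_5(female + male)

# Each part is rounded on its own, so the parts need not sum to the total.
published_female = round_to_5(female)
published_male = round_to_5(male)
sum_of_published_parts = published_female + published_male
```

Here the published total is 25 while the published parts are 10 and 10, so a reader adding up the parts would get 20. Both figures are correct applications of the rule.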
The full methodology, published in our data definitions and included in all our data supply contracts, is as follows:
- All numbers are rounded to the nearest multiple of 5
- Any number lower than 2.5 is rounded to 0
- Halves are always rounded upwards (e.g. 2.5 is rounded to 5)
- Percentages based on fewer than 22.5 individuals are suppressed
- Averages based on 7 or fewer individuals are suppressed
- The above requirements apply to headcounts, FPE and FTE data
- Financial data is not rounded
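The rules listed above could be expressed in Python roughly as follows. This is a sketch under stated assumptions, not HESA's own implementation: the function names are hypothetical, percentages are shown to zero decimal places, averages are returned unrounded, and group sizes for averages are treated as whole headcounts.

```python
import math
from typing import Optional, Sequence


def round_to_5(count: float) -> int:
    """Round to the nearest multiple of 5, with halves rounded upwards."""
    return math.floor(count / 5 + 0.5) * 5


def publishable_percentage(part: float, total: float) -> Optional[int]:
    """Whole-number percentage from raw figures, or None if suppressed."""
    if total < 22.5:  # percentages based on fewer than 22.5 individuals
        return None
    return math.floor(100 * part / total + 0.5)


def publishable_percentage_change(old: float, new: float) -> Optional[int]:
    """Percentage change ([New-Old]/Old), or None if either figure is small."""
    if old < 22.5 or new < 22.5:
        return None
    return math.floor(100 * (new - old) / old + 0.5)


def publishable_average(values: Sequence[float]) -> Optional[float]:
    """Mean of the raw values, or None if based on 7 or fewer individuals."""
    if len(values) <= 7:
        return None
    return sum(values) / len(values)
```

For instance, `round_to_5(2.4)` gives 0, `round_to_5(2.5)` gives 5, and `publishable_percentage(11, 22)` returns `None` because the group is smaller than 22.5. Percentages and averages are calculated from the raw figures, in line with the rule that rounding and suppression are applied after calculations.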
Here is a fictional table prepared from raw data:
|Total staff||% female staff|
This is how the data should appear after rounding and suppression have been applied. The notes below describe what has changed:
|Total staff||% female staff|
- All the counts of individuals have been rounded to the nearest multiple of 5.
- The total staff number for College doesn't match the sum of female and male staff because the real sum (24) is rounded independently of the constituent parts.
- Average salaries of female staff at College and Conservatoire are suppressed because there are 7 or fewer people in these groups.
- The percentage of female staff at Conservatoire is suppressed because there are fewer than 22.5 staff overall.
The purpose of the rounding methodology is to reduce the risk of identifying individuals from published figures.
Rounding all figures, even very large ones, to the nearest 5 prevents multiple tables being used to identify small numbers. For example, one table might show 134 students studying French at a university. Another table might be restricted to UK domicile students and show 133 students. Together, the two tables would reveal that exactly one student on that course is not UK-domiciled. Rounding obscures this potentially personal information, because both figures are published as 135.
Our rounding methodology is applied to our own publications, but also to outputs from other researchers. Consistently rounding all numbers is easy to understand and easy to apply in any situation. It is also easy to spot so that readers can see it has been applied.
Percentages are suppressed for small groups to prevent percentages from giving away the real un-rounded figures in a table. For example the rounded figures in the table below could only match the percentages for one set of original un-rounded data:
The only un-rounded figures that could give these percentages are:
Suppressing percentages where the Total is less than 22.5 (that is, where it would round to 20 or less) reduces this risk significantly, but doesn't eliminate it completely. We have to balance the small risk of working out real un-rounded figures against being able to publish useful statistics.
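The back-calculation risk described above can be illustrated with a brute-force search. The sketch below uses hypothetical figures (not those from the original example table) and assumes whole-number headcounts and percentages displayed to zero decimal places. It lists every pair of raw figures consistent with a published rounded part, rounded total and percentage.

```python
import math


def whole_percent(part, total):
    """Percentage to zero decimal places, with halves rounded upwards."""
    return math.floor(100 * part / total + 0.5)


def candidates(rounded_part, rounded_total, percent):
    """List (part, total) raw headcounts matching the published figures.

    A whole number rounds to a given multiple of 5 only if it lies within
    2 of that multiple, so only those values need to be checked.
    """
    matches = []
    for total in range(max(1, rounded_total - 2), rounded_total + 3):
        for part in range(max(0, rounded_part - 2), rounded_part + 3):
            if part <= total and whole_percent(part, total) == percent:
                matches.append((part, total))
    return matches


# Small group: the published figures pin down the raw data exactly.
small_group = candidates(rounded_part=5, rounded_total=15, percent=46)

# Group above the threshold: more than one raw combination fits.
larger_group = candidates(rounded_part=10, rounded_total=25, percent=42)
```

In this illustration `small_group` contains the single pair `(6, 13)`, so publishing the percentage would reveal the un-rounded figures. `larger_group` contains both `(10, 24)` and `(11, 26)`, so the raw figures stay ambiguous. Other combinations above the threshold can still be unique, which is why suppression reduces the risk without removing it.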
It is important to think about how percentages are displayed. Percentages should be displayed to zero decimal places unless there is a good statistical reason for using more precision. This reduces the risk of deducing un-rounded figures and the risk of drawing unwarranted conclusions from small differences between percentages.
Averages are suppressed only for much smaller groups of 7 people or fewer. An average salary for one person will be that person's actual salary. In a small group of people some of them could work out the salary of another member by calculating backwards from their own salaries. The more people in the group, the less plausible this scenario becomes. Suppressing averages based on groups of 7 or fewer (anything rounded to 0 or 5) eliminates the most likely chances of working out someone's personal information.
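The backwards calculation mentioned above is simple arithmetic. As a sketch, with hypothetical salaries and a hypothetical function name, the snippet below shows how members of a small group who know their own salaries could recover the remaining salary from a published average.

```python
from typing import Sequence


def remaining_salary(published_average: float, group_size: int,
                     known_salaries: Sequence[float]) -> float:
    """Total salary of the group members whose salaries are not known.

    When exactly one salary is unknown, this is that person's salary.
    """
    return published_average * group_size - sum(known_salaries)


# Hypothetical group of two: one member knows their own salary.
pair = remaining_salary(30000, 2, [28000])

# Hypothetical group of three: two colleagues pool what they know.
trio = remaining_salary(31000, 3, [28000, 30000])
```

In both cases the result is the exact salary of the remaining person (32000 and 35000 respectively). In a larger group far more people would need to pool their salaries, which is why the scenario becomes less plausible as the group grows.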
How to do it
The easiest way to apply rounding is to use the MROUND function in Microsoft Excel: =MROUND(cellref,5)
To suppress percentages and averages you can use the IF function to return a blank or a suppression marker instead of the figure when the group is too small.
Offences under the Data Protection Act 2018
It is a criminal offence under Section 171 of the Data Protection Act 2018 for a person knowingly or recklessly to re-identify information that is de-identified personal data without the consent of the controller responsible for de-identifying the personal data.