
Unistats dataset

The Discover Uni website provides comparable sets of information about full- and part-time undergraduate courses. It is run by the Office for Students and is designed to meet the information needs of prospective students.


As an additional technical resource for analysts and developers, we have made the raw dataset that underlies the Discover Uni website available for download. This incorporates information from the Unistats record. The download is presented as a *.zip file containing the data in both XML format and multiple *.csv files.

Supporting files, documents and information about updates to the data are provided below.

Experimental statistics - Graduate Outcomes and Longitudinal Education Outcomes (LEO) data

The Graduate Outcomes data in the Unistats dataset is being published as experimental statistics by HESA. Experimental statistics are newly developed or innovative official statistics undergoing evaluation. They are published to involve users and stakeholders in assessing their suitability and quality at an early stage. Users should exercise caution when using data from experimental statistics, and should evaluate the quality and coverage of any data they intend to use to ensure that it is fit for the intended application. We wish to continue to engage with users to ensure that our statistical products meet user needs. To give feedback on this data, please contact our Official Statistics team by email or on +44 (0)1242 388 513 (option 2).

The Longitudinal Education Outcomes (LEO) data in the Unistats dataset is also being published as experimental statistics by HESA, following receipt of the data from the Office for Students. It is available in the LEO3.csv and LEO5.csv files and, in the XML Unistats dataset, between the GOSalary and Tariff entities. It is also available in the LEO3SEC.csv and LEO5SEC.csv files and, in the XML Unistats dataset, within the SectorSal entity.

The LEO data in the Unistats dataset is derived from data extracts owned by the Department for Education (DfE). The DfE do not accept responsibility for any inferences or conclusions derived from the LEO data by third parties.

Further information can be found on the OfS website, where you can also provide feedback on this data.

Onward use of Unistats data

We are keen to ensure that any end users of Unistats data are given sufficient information to allow correct use of the data which has been made available to them.

Several of the courses featured in the Unistats dataset have very small numbers of students. As a result, a change of only one student can make a substantial difference to the reported figures. We advise caution when reporting on such courses, particularly when making comparisons between courses.
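To illustrate the point about small numbers, the following minimal sketch shows how far a percentage moves when a single student changes category. The cohort sizes are made up for illustration and are not taken from the dataset, and the function name is hypothetical.

```python
def percentage_point_shift(cohort_size):
    """Return how many percentage points a rate moves when one student changes."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return 100.0 / cohort_size


if __name__ == "__main__":
    # Illustrative cohort sizes only; these are not figures from the dataset.
    for size in (8, 20, 200):
        shift = percentage_point_shift(size)
        print(f"{size:>4} students: one student = {shift:.1f} percentage points")
```

For a course with 8 students, one student accounts for 12.5 percentage points, whereas for a course with 200 students the same change accounts for 0.5 percentage points.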

The Unistats dataset contains many datapoints about courses offered by UK providers. In some cases the course is too small, or the response rates to surveys are too low, for the data to be published about the course specifically. In such cases, data describing the course is aggregated either across multiple years, or with other similar courses at the same provider.

Some courses in the Unistats dataset cover more than one subject area – this may be because it is a joint honours course, or the course covers a diverse subject. When the data for these courses is published at subject level, the course will appear several times in each Unistats table: once for each subject.
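Because a multi-subject course appears once per subject, counting rows in a table is not the same as counting courses. The sketch below shows one way to count distinct courses instead. The column names (course_id, subject, respondents) and the sample rows are hypothetical placeholders, so check the file structure document for the real column names before adapting it.

```python
import csv
import io

# Hypothetical sample: course C001 covers two subjects, so it has two rows.
SAMPLE_ROWS = """course_id,subject,respondents
C001,SubjectA,40
C001,SubjectB,40
C002,SubjectA,25
"""


def count_rows_and_courses(csv_text, key_column="course_id"):
    """Return (number of rows, number of distinct courses) for a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    row_count = 0
    course_ids = set()
    for row in reader:
        row_count += 1
        course_ids.add(row[key_column])
    return row_count, len(course_ids)


if __name__ == "__main__":
    rows, courses = count_rows_and_courses(SAMPLE_ROWS)
    print(f"{rows} rows describe {courses} distinct courses")
```

In the real files a course is likely to be identified by more than one column, in which case the set should hold a tuple of those columns rather than a single value.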

For these reasons, we advise anyone wishing to make onward use of the Unistats data to contact us, so that they understand how the considerations outlined above are reflected in the dataset and how to highlight them to end users. We aim to ensure that there is minimal risk of onward misinterpretation of the Unistats data.

Terms and conditions

The Unistats dataset is free to copy, use, share, and adapt for any purpose. The Unistats dataset is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence. You must give appropriate credit (HESA, www.hesa.ac.uk), provide a link to the licence, and indicate if any changes have been made.

Download the Unistats dataset

When you click the button above, the download will begin. The file is delivered as a compressed archive (*.zip) containing a single XML file, a readme.txt file, and a number of *.csv files.
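As a starting point for working with the download, the following sketch lists the members of a zip archive grouped by file extension, using only the Python standard library. The real archive name includes a date and time that is not reproduced here, so the example builds a small in-memory archive instead. In that sample, dataset.xml is a hypothetical file name and the file contents are dummy values.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def list_archive_members(zip_source):
    """Group the file names in a zip archive by lower-cased extension."""
    members = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            extension = PurePosixPath(name).suffix.lower().lstrip(".")
            members[extension].append(name)
    return dict(members)


def build_sample_archive():
    """Create a tiny in-memory archive that mimics the described layout.

    The XML file name and all file contents are hypothetical stand-ins.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("dataset.xml", "<root/>")
        archive.writestr("readme.txt", "sample")
        archive.writestr("LEO3.csv", "A,B\n1,2\n")
        archive.writestr("LEO5.csv", "A,B\n3,4\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(list_archive_members(build_sample_archive()))
```

To inspect a real download, pass the path of the downloaded *.zip file to list_archive_members in place of the sample archive.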

Supporting files and documents

Unistats record 2021/22: Coding manual for the 2021/22 data collection.

XSD schema file: Unistats output schema - Provides detail of the structure of the data file.

Overview of the dataset: Unistats dataset file structure and description.

Brief introduction to XML format: Using the Unistats output file  

Subject codes:

Lookup table for CAH 1.3.4 (applicable to latest data)

Lookup table for CAH 1.3.3 (applicable to data from 25 February 2020 to 14 April 2021)

Lookup table for CAH 1.2 (applicable to data from 18/19 to 25 February 2020)

Lookup table for JACS 3.0 (applicable to data from 12/13 to 17/18).

Lookup table for JACS 2.0 (applicable to data from 09/10 to 11/12).

KISAIM codes: List of KISCourse.KISAIM valid entries.

Data updates

Updates to the Unistats dataset will be made as required, and in parallel with the Discover Uni website, when contributing higher and further education providers wish to update their information, or when HESA or the Office for Students (OfS) make changes to the underlying data. These updates occur weekly on Wednesday mornings. The file name of the *.zip file includes the date and time.

Please see the OfS statement on corrections and revisions to the data presented on the Discover Uni website for a full description of how Unistats data changes throughout the year and how these changes are recorded.

If you have any questions on the dataset, please email HESA’s Official Statistics team, or call +44 (0)1242 388 513.

Older Unistats data

If you are using an older version of the Unistats dataset please use the supporting files for the relevant collection year. These are available via the 'collection' links below.

Unistats Collection 2020/21 for dataset downloaded 2020-10-13 to 2021-09-29

Unistats Collection 2019/20 for dataset downloaded 2019-09-11 to 2020-10-13 [Please note, due to updates to the Common Aggregation Hierarchy, there are two versions of the Unistats output schema - one for use prior to 26 February 2020, and one for use from 26 February onwards]

Unistats Collection 2018/19 for dataset downloaded 2018-09-01 to 2019-09-10.

Unistats Collection 2017/18 for dataset downloaded 2017-09-04 to 2018-08-29 [Please note, due to the inclusion of LEO data, there are two versions of the Unistats output schema - one for use prior to 5 July 2018, and one for use from 5 July to 29 August 2018]

KIS Collection 2016/17 for dataset downloaded 2016-09-01 to 2017-09-03 [Please note, due to changes in the TEF, there are two versions of the Unistats output schema - one for use prior to 22 June 2017, and one for use from 22 June to 3 September 2017].

KIS Collection 2015/16 for dataset downloaded 2015-09-03 to 2016-08-31

KIS Collection 2014/15 for dataset downloaded 2014-08-28 to 2015-09-02

KIS Collection 2013/14 for dataset downloaded 2013-09-19 to 2014-08-27

KIS Collection 2012/13 for dataset downloaded before 2013-09-19
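The date ranges listed above can be turned into a small lookup that suggests which collection's supporting files apply to a given download date. This is only a sketch under stated assumptions: both ends of each range are treated as inclusive, "before 2013-09-19" is represented as any date up to 2013-09-18, and the mid-year schema changes noted for some collections are not modelled. The function and variable names are hypothetical.

```python
from datetime import date

# (collection, first download date, last download date), as listed above.
# Treating both bounds as inclusive is an assumption.
COLLECTION_RANGES = [
    ("Unistats Collection 2020/21", date(2020, 10, 13), date(2021, 9, 29)),
    ("Unistats Collection 2019/20", date(2019, 9, 11), date(2020, 10, 13)),
    ("Unistats Collection 2018/19", date(2018, 9, 1), date(2019, 9, 10)),
    ("Unistats Collection 2017/18", date(2017, 9, 4), date(2018, 8, 29)),
    ("KIS Collection 2016/17", date(2016, 9, 1), date(2017, 9, 3)),
    ("KIS Collection 2015/16", date(2015, 9, 3), date(2016, 8, 31)),
    ("KIS Collection 2014/15", date(2014, 8, 28), date(2015, 9, 2)),
    ("KIS Collection 2013/14", date(2013, 9, 19), date(2014, 8, 27)),
    ("KIS Collection 2012/13", date.min, date(2013, 9, 18)),
]


def collections_for(download_date):
    """Return every listed collection whose range contains download_date."""
    return [
        name
        for name, start, end in COLLECTION_RANGES
        if start <= download_date <= end
    ]


if __name__ == "__main__":
    print(collections_for(date(2019, 1, 15)))
    print(collections_for(date(2020, 10, 13)))
```

The function returns a list rather than a single name because the listed ranges share one boundary date (2020-10-13 appears in two ranges) and leave short gaps between some collections, for which the list is empty.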