Data Futures: Key concepts
Key concepts that underpin the new Data Futures approach are discussed below.
A Glossary is also available to help clarify new terminology.
To define what is meant by ‘discrete’ in the context of data collection we should consider three dimensions:
- Nature of the data
- Nature of collection
- Scope of data
Nature of the data
‘Discrete’ collection sees data returned to reflect the status at a point in time, in this case the end of the Reference Period. In contrast, the continuous collection approach originally envisaged would have enabled event-driven data, whereby the provider submits data as events occur and HESA packages these up into data that is usable by the customer.
The scenario below provides an example of how discrete collection would work. The data returned for Student A would show that they are studying on a BA Hons History, because the transfer took place before the end of the Reference Period; we would not see that they had initially registered on a BA Hons English Literature.
If the transfer occurred later, as in the example of Student B, Reference Period one would reflect the BA Hons English Literature because the student had not transferred by the end of the Reference Period. Although the collection window for Reference Period one is still open when the transfer occurs, the transfer would not be reported until Reference Period two, as this is the period it relates to.
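The Student A and Student B scenarios can be sketched as a point-in-time lookup. This is a minimal illustration only: the `CourseEvent` record, the `course_at_period_end` function and all dates are assumptions made for the example, not part of any HESA specification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CourseEvent:
    """Hypothetical record of a student registering on, or transferring to, a course."""
    effective: date
    course: str


def course_at_period_end(events, period_end):
    """Return the course in effect on period_end, or None if none has started yet."""
    in_effect = [e for e in events if e.effective <= period_end]
    if not in_effect:
        return None
    # Only the latest state is reported; earlier states within the period are not seen.
    return max(in_effect, key=lambda e: e.effective).course


# Hypothetical period end dates, chosen only for illustration.
RP1_END = date(2024, 11, 30)
RP2_END = date(2025, 3, 31)

# Student A transfers before the end of Reference Period one.
student_a = [
    CourseEvent(date(2024, 9, 16), "BA Hons English Literature"),
    CourseEvent(date(2024, 10, 14), "BA Hons History"),
]
# Student B transfers after the end of Reference Period one.
student_b = [
    CourseEvent(date(2024, 9, 16), "BA Hons English Literature"),
    CourseEvent(date(2024, 12, 9), "BA Hons History"),
]

print(course_at_period_end(student_a, RP1_END))
print(course_at_period_end(student_b, RP1_END))
print(course_at_period_end(student_b, RP2_END))
```

Under these assumed dates, Student A's initial registration never appears in the returned data, while Student B's transfer appears only in Reference Period two.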
Whilst data is typically returned to reflect the end of the Reference Period, there may be some exceptions. For example, where a student moves between active and inactive modes of study within a Reference Period, the date on which they change mode may need to be recorded. Other than removing the requirement to report changes within a Reference Period, the change from continuous to discrete collection does not change the data required in each Reference Period.
There are some specific cases where information can be returned in advance of the Reference Period in which it becomes known. For example, where an assessment outcome becomes known during the sign-off window, the outcome can be reported in the collection for the Reference Period in which the assessment took place:
In this example, the exam board meets in December to assess the awards for the summer examinations. These outcomes could be included in Reference Period one. If the assessment information is not known in time for Reference Period one, it would be reported in the next collection.
Nature of collection
In a discrete approach, each collection operates independently of any others. In contrast, the continuous collection approach saw data persist across multiple collections, so providers were not required to resend data in each collection unless it had changed.
In practice, discrete collection means that if a student is active for the whole year, a record of them would be returned in each of the three Reference Periods for the year, even if the information is unchanged.
This is equivalent to the approach in the existing Student and AP student collections whereby a full suite of student information is returned in each collection year.
Information that reflects the position ‘on entry’ to the engagement will not need to be resubmitted in subsequent Reference Periods, but we would not prevent providers from continuing to return the data if they wish to. ‘On entry’ data should not be updated to reflect changes during the lifetime of the engagement, and validation would be in place to monitor this.
Once a module is completed and all outcome information has been returned, there would be no requirement to continue to return it in subsequent Reference Periods. This allows a picture of a student's activity to be built up over time without providers needing to maintain and return data that can no longer change because it relates to a period of time that has passed.
The Reference Periods are independent and not cumulative, although for dissemination and analysis multiple periods may be joined together. For data suppliers, this means that a module completed in Reference Period one would not need to also be returned in Reference Periods two and three, unless there is information outstanding.
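The rule for modules can be sketched as follows. This is an illustration under stated assumptions: the `Module` record, the `outcome_returned_in` field, the `periods_to_return` function and the period end dates are all hypothetical and do not come from the Data Futures specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical Reference Period end dates; no timings have been agreed.
PERIOD_ENDS = {1: date(2024, 11, 30), 2: date(2025, 3, 31), 3: date(2025, 7, 31)}


@dataclass(frozen=True)
class Module:
    code: str
    start: date
    # Reference Period in which all outcome information was returned, if it has been.
    outcome_returned_in: Optional[int] = None


def periods_to_return(module):
    """List the Reference Periods in which the module would need to be returned."""
    periods = []
    for number, period_end in sorted(PERIOD_ENDS.items()):
        started = module.start <= period_end
        # Information is still outstanding until the outcome has been returned.
        outstanding = (
            module.outcome_returned_in is None
            or number <= module.outcome_returned_in
        )
        if started and outstanding:
            periods.append(number)
    return periods


print(periods_to_return(Module("HIS101", date(2024, 9, 23), outcome_returned_in=1)))
print(periods_to_return(Module("HIS102", date(2024, 9, 23), outcome_returned_in=2)))
print(periods_to_return(Module("HIS201", date(2025, 1, 13))))
```

In this sketch, a module whose outcome is returned in Reference Period one drops out of periods two and three, whereas a module with outstanding information continues to be returned until that information has been supplied.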
Scope of data
The scope of data is unaffected by the move from continuous to discrete collection. All data items are in scope for each Reference Period; which of them are required will depend on the characteristics of the student or curriculum data in question.
The diagram below shows the Reference Period in which ‘on entry’ data, such as qualifications on entry, would be returned for Students A-C:
In this example, ‘on entry’ data would need to be sent for Student A in Reference Period one, but for Students B and C in Reference Period two. Once in Reference Period two, there would be no requirement to send the ‘on entry’ data for Student A again.
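A minimal sketch of this rule, assuming that ‘on entry’ data is required in the Reference Period containing the engagement start date. The period boundaries, the student start dates and the function names `period_for` and `on_entry_required` are hypothetical.

```python
from datetime import date

# Hypothetical, contiguous Reference Periods: (number, first day, last day).
PERIODS = [
    (1, date(2024, 8, 1), date(2024, 11, 30)),
    (2, date(2024, 12, 1), date(2025, 3, 31)),
    (3, date(2025, 4, 1), date(2025, 7, 31)),
]


def period_for(day):
    """Return the number of the Reference Period containing day, or None."""
    for number, first, last in PERIODS:
        if first <= day <= last:
            return number
    return None


def on_entry_required(engagement_start, period_number):
    """'On entry' data is required only in the period in which the engagement starts."""
    return period_for(engagement_start) == period_number


# Assumed start dates for the three students in the example.
starts = {
    "Student A": date(2024, 9, 16),
    "Student B": date(2024, 12, 9),
    "Student C": date(2025, 2, 3),
}
for name, start in starts.items():
    print(name, [n for n, _, _ in PERIODS if on_entry_required(start, n)])
```

The function describes only when the data is required; as noted above, providers would not be prevented from continuing to return ‘on entry’ data in later periods.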
Reference Periods will not have distinct specifications. There have previously been suggestions that, for example, Reference Period one could collect ‘on entry’ information for access and participation and Reference Period three could collect outcomes. However, as provision diversifies, the concept of an ‘August-July’ academic year becomes less relevant in the sector and there will be intakes throughout the year. As a result, it would not be appropriate to structure Reference Periods around particular data items.
Analysis and onwards use of data may focus on particular themes which will typically correlate with the predominant activities within the period. For example, Reference Period one will still include the majority of enrolments and re-enrolments, but will not be limited to that.
The data required to be returned for a student will depend on attributes such as their start date.
A Reference period is a fixed period of time, the end of which aligns with when HESA’s statutory and public purpose customers require sector-wide data and information. The diagram below summarises the structure of a Reference period:
Key terms relating to a Reference period:
Sign-off: A formal declaration that the in-scope data submitted to HESA represents an honest, impartial, and rigorous account of the HE provider’s events up to the end of the Reference period. This is not the same as all data submitted since the last sign-off (see question about In-scope periods below).
Sign-off occurs during the Sign-off period, before the Dissemination point. It must occur at least once, but a provider’s Reference period data can be signed off as many times as the provider deems necessary until the deadline (the Dissemination point) is reached. Data must be signed off by the head of the HE provider, normally the Vice-Chancellor or Principal.
Dissemination point: The specified date, following the end of a Reference period, by which signed-off data will be extracted and supplied to HESA's data customers. Data disseminated at the Dissemination point will be used for official accounts of the higher education provider’s activity for statistical, regulatory, and public information purposes.
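The relationship between repeated sign-offs and the Dissemination point can be sketched as below. This rests on an assumption that is not stated in the definitions above: that where data has been signed off more than once, the most recent sign-off on or before the Dissemination point is the one that applies. The function name `effective_sign_off` and the timestamps are hypothetical.

```python
from datetime import datetime


def effective_sign_off(sign_offs, dissemination_point):
    """Return the latest sign-off made on or before the Dissemination point.

    Assumption: the most recent valid sign-off is the one that applies.
    Raises ValueError if the data was never signed off in time, since
    sign-off must occur at least once.
    """
    in_time = [s for s in sign_offs if s <= dissemination_point]
    if not in_time:
        raise ValueError("no sign-off recorded before the Dissemination point")
    return max(in_time)


# Hypothetical timestamps, for illustration only.
deadline = datetime(2025, 1, 31, 17, 0)
sign_offs = [
    datetime(2025, 1, 10, 9, 30),
    datetime(2025, 1, 24, 14, 0),
    datetime(2025, 2, 3, 11, 0),  # after the deadline, so ignored
]
print(effective_sign_off(sign_offs, deadline))
```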
There will be three Reference periods over the traditional academic year, as this common timetable reflects both the majority of activity, and the principal regulatory activities that depend on data. The flexibility of the model allows HE providers to reflect their own timetables of activity, and respect the different delivery patterns of courses with different start dates.
The Reference periods are not deterministic: the model is designed to follow the (generally annual) rhythm of course deliveries, recognising that different courses operate on different timescales. Business events occur and generate data, which is then reported to HESA in relation to the Reference period in which they occur. A suite of quality assurance and sign-off activities enables us to provide the sector with reliable, comparable, and consistent information.
The diagram below illustrates how Reference periods will work (please note that it is presented only to illustrate the model; it does not imply that any decisions on timings have been agreed):
The diagram illustrates that when one Reference period ends the next one immediately begins.
The data model translates the Student and Student Alternative records into a single data stream.
Details of the entities and fields to be collected will be presented in the Data dictionary in the Coding manual.
Quality assurance will happen within the context of a collection. Quality rules and reports will compare the incoming data with that sent in previous collections to ensure consistency and look at changes over time.
Quality assurance will be automated as far as possible to ensure rapid feedback to providers. In addition to greater automation, we will streamline the process where possible, based on feedback from the existing submission process and the Alpha and Beta Pilots. Examples of the efficiencies that we might introduce include tolerances that can be adjusted on a per-provider basis, tolerances that apply to more than one collection, and the management of quality issues within a single system.
Quality checks analogous to the existing ‘continuity’ checks will be in place to monitor changes to particular data items between collections. Some items in the model will relate to data that is not expected to change, for example date of birth, and changes to these items will be monitored through ‘updateability periods’. Changes outside of the expected ‘updateability’ period for an item will be highlighted as part of the quality assurance process.
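A check of this kind might look like the sketch below. How ‘updateability periods’ are measured is not defined here, so the example assumes a window counted in days from the engagement start date; the `UPDATEABILITY_DAYS` mapping, the item name, the 56-day value and the function `flag_late_changes` are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical updateability periods, in days from the engagement start date.
UPDATEABILITY_DAYS = {"date_of_birth": 56}


def flag_late_changes(previous, current, engagement_start, collection_end):
    """Return items whose value changed after their updateability period closed."""
    flagged = []
    for item, days in UPDATEABILITY_DAYS.items():
        changed = previous.get(item) != current.get(item)
        window_closed = collection_end > engagement_start + timedelta(days=days)
        if changed and window_closed:
            flagged.append(item)
    return flagged


# Data for the same student as sent in two successive collections.
earlier = {"date_of_birth": date(2001, 5, 1)}
later = {"date_of_birth": date(2001, 5, 11)}

print(flag_late_changes(earlier, later, date(2024, 9, 16), date(2025, 3, 31)))
```

A flagged item in this sketch is highlighted for review rather than rejected, in keeping with the description above of changes being highlighted as part of the quality assurance process.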
Quality thresholds will, where appropriate, be driven by the data. For example, information about entry qualifications might be required to be known 8 weeks following the registration start date. This is intended to ensure that providers have a sufficient window to capture and assure data for all students, irrespective of whether they commence studies at the beginning or the end of the Reference Period. Quality checks will be in place to monitor completeness and updateability, ensuring that data is returned, and is of high quality, within the expected timeframe.
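The 8-week example can be sketched as a data-driven threshold. The 8-week figure is taken from the illustration above and is itself only indicative; the function name `entry_quals_due` and the dates are assumptions.

```python
from datetime import date, timedelta

# Indicative threshold from the example above; not an agreed value.
ENTRY_QUALS_WINDOW = timedelta(weeks=8)


def entry_quals_due(registration_start, period_end):
    """True if entry qualifications are expected to be known by period_end."""
    return registration_start + ENTRY_QUALS_WINDOW <= period_end


# Hypothetical Reference Period end dates.
RP1_END = date(2024, 11, 30)
RP2_END = date(2025, 3, 31)

early_starter = date(2024, 9, 16)   # starts near the beginning of the period
late_starter = date(2024, 11, 18)   # starts near the end of the period

print(entry_quals_due(early_starter, RP1_END))
print(entry_quals_due(late_starter, RP1_END))
print(entry_quals_due(late_starter, RP2_END))
```

Under these assumed dates, the late starter's entry qualifications would not be expected in Reference Period one but would be expected by Reference Period two, so both students receive the same 8-week window.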