Data Futures business requirements

Business requirements are the critical activities that must be performed to meet the Data Futures programme objective(s) while remaining solution independent; they indicate what the programme seeks to achieve.

For the Data Futures programme, the business objectives provide an outline of the needs that the HESA Data Platform (HDP) is being developed to meet. The requirements have been developed with and signed off by Statutory Customers to ensure alignment on the planned aspects of the delivery. In addition to this process, the requirements have been shaped by sector engagement through work with the Provider Forum, lessons learnt from the Alpha pilot, and detailed work with Alpha participants, software suppliers and in-house system teams.

In addition to the business requirements, data requirements and functional requirements have been developed. Functional requirements will, by their nature, evolve during the development of the HDP; as such, these will not be shared. Data requirements will be made available in the near future.

The business requirements have been grouped to ease navigation through them.


Business requirements: specification publication

The process of and requirements for the publication of the collection specification. The specification includes the following:

  • Data model
  • Coding frame
  • Guidance
  • Quality ruleset
  • Derived fields and entities (specific to the collection process)
  • Corresponding XSD file.
Requirement description Justification
Ability to publish a specification which is accessible by any interested party. The specification needs to be available to anyone who submits or uses the data and so must not be restricted to only those with a HESA account.
Ability to navigate between the HESA website and the published specifications. Users need to be able to navigate easily between the main HESA website and the platform on which the specifications are published.
Ability to publish an XSD. A published schema is required to allow providers and software suppliers to use this in their systems.
Publish the data model. A published data model is required so viewers can see the overall model structure.
Publish a data dictionary. The data dictionary is required to define the data that needs to be returned.
Filter between collected and derived items in the data dictionary. Filters allow providers to view only the fields they need to submit.
Filter between items required by providers in different countries/regulators in the data dictionary. Filters allow providers to view only the fields that are applicable to them.
Publish enhanced coding frames. Enhanced coding frames ensure users can see useful additional information, e.g. applicability of valid entries and categories for onward use.
Users able to download coding frames and enhanced coding frames for a specification. Download capabilities allow users to view coding frames and enhanced coding frames in Excel. This is particularly useful for larger coding frames or where the enhanced coding frames indicate applicability.
Users able to download a CSV file containing all fields and all schema defined valid entries. Requested by providers with in-house systems during the previous iteration of the data specification as this helps them to prepare their internal systems.
Users able to download the valid entries for a specific field in CSV format. CSV download is an existing functionality in the coding manual; it was requested by providers and has been well used. A download is easier to work with than an onscreen view, particularly for large coding frames like HECoS.
Ability to publish formatted specification guidance on field and entity pages, e.g. tables, sub-headings, bullet points. Guidance ensures the specification is user-friendly and accessible.
Function to search the data dictionary by key word. Search function will allow users to search for a specific field or entity.
Publish quality rules. Providers can view the validation that their data is subject to.
Filter rules by regulator. Providers can view only the validation that is applicable to them.
Filter rules by type of rule. Providers can view certain types of rules separately, e.g. credibility.
Search quality rules by key word. Users can see all rules related to a particular field, e.g. to allow users to narrow the list of quality rules based on the area of interest. 
Publish a revision history for a specification. User needs to view all changes made to a specification.
Publish a revision history between specifications. Users need to be able to view changes between specifications and identify when a change was made to a data stream without looking at each specification individually.
Filter the revision history by type of change. Allow users to only look at certain types of changes, e.g. changes that affect the schema.
Filter the revision history by the version number. Users able to see all changes made in a particular version of a specification.
Publish a specification with a version number and date. Users able to know which version of a specification they are viewing.
Publish additional guidance documents. Multiple guidance documents required to assist providers and other data users with their understanding, e.g. collection schedule, coverage, guidance on particular types of students, etc.
Link to downloadable collection tools. HESA need to link to downloadable collection tools, e.g. validation kit, data entry tool, etc.
Publish multiple specifications for the same stream. HESA will have current, future and past specifications published at the same time.
Ability to indicate the status of a specification. HESA need to indicate status to assist users looking at multiple specifications, e.g. specification under consultation.
Ability to publish specifications for multiple streams. HESA have a number of data collections with different specifications that need to be published.
Ability to clearly label a specification with a name and code. Many specifications are published, it needs to be clear which specification applies to which collection.
Ability to associate a specification with a collection. Most users will want to navigate by collection to find the specification that is relevant to their purpose.
Ability to associate a specification with multiple collections. Although there will be changes between collections it is likely that the same major version of a specification would apply across multiple collections.
Publish input and output specifications. Published input and output specifications ensure visibility and that users can access the specifications that they require.
Ability to restrict access to output specifications. Some output specifications may be sensitive and should only be visible to authorised users from relevant organisations.
Ability to access a previous version of a specification. Providers may need to submit data against an old version of a specification: they would need to be able to access the specification as it was to understand what has changed since and retrieve the XSD.
Ability to produce a revision history to support specification releases. Revision history means users can view changes over time.
Ability to produce a diff between current and previous specification versions and report differences. Required to assist all users in seeing what has changed between two specifications.
Automated revision history for data model changes. Automated revision history will reduce overhead and risk of human error.
Ability to tag revision history entries by type of change. Change tags in revision history will help all users to understand the impact of changes to their systems, operations and processes.
Revision history screen to support rich text editing. Rich text editing assists non-technical users with data entry and updates.
Revision history to be downloadable as a CSV file. This is an existing feature which has received positive feedback from the sector.
Revision history to be downloadable as an XLSX file. This is an existing feature which has received positive feedback from the sector.
Ability to search for specific key words, etc in the revision history. Key word search will enable users to easily locate the information that they are looking for.
Ability to filter revision history entries by tag. Revision history tag filter will enable users to easily locate the information that they are looking for.
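The diff and change-tagging requirements above can be sketched as follows. This is a minimal illustration only: the field-dictionary shape, tag names and example field codes are assumptions, not part of the published specification.

```python
# Sketch: diff two specification versions and tag each change by type,
# so the revision history can be filtered by tag. All shapes are illustrative.

def diff_specifications(previous, current):
    """Compare two {field_name: definition} dicts and return tagged changes."""
    changes = []
    for name in sorted(set(previous) | set(current)):
        old, new = previous.get(name), current.get(name)
        if old is None:
            changes.append({"field": name, "tag": "schema", "change": "added"})
        elif new is None:
            changes.append({"field": name, "tag": "schema", "change": "removed"})
        elif old.get("valid_entries") != new.get("valid_entries"):
            changes.append({"field": name, "tag": "schema", "change": "valid entries amended"})
        elif old.get("guidance") != new.get("guidance"):
            changes.append({"field": name, "tag": "guidance", "change": "guidance amended"})
    return changes

# Hypothetical field codes for illustration only.
v1 = {"ENTRYQUAL": {"valid_entries": ["01", "02"], "guidance": "Entry qualification."}}
v2 = {"ENTRYQUAL": {"valid_entries": ["01", "02", "03"], "guidance": "Entry qualification."},
      "FUNDCOMP": {"valid_entries": ["1", "2"], "guidance": "Completion of year."}}

for change in diff_specifications(v1, v2):
    print(change)
```

Filtering the resulting entries by `tag` corresponds to the "filter the revision history by type of change" requirement; filtering by a stored version number would work the same way.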


Business requirements: collection process

The collection process covers the end to end process of data collection from the configuration of a collection, to data being uploaded and signed-off.

Requirement description Justification
Associate a single user account with multiple organisations. An individual may return data to HESA on behalf of multiple organisations. Requiring an individual to create multiple accounts associated with different email addresses will encourage bad practice regarding password security.
Specify the in-collection reports to be produced during a collection. Different collections and streams will have different reports produced to support providers in undertaking quality assurance.
Send automatic notifications via email when defined criteria are met. Email was the preferred method of communication in prior trials, although on-screen system messages were also useful as a secondary source.
Enable an authorised HESA user to reset a provider's submission. Although the general approach in a single file submission is to replace the previous file as a new one is uploaded, in extreme scenarios an authorised HESA user may need to 'reset' a provider’s submission. This would entail wiping any submitted data and QA results/reports and activity log, so essentially setting the collection back to how it would have been before the provider uploaded anything. This is a hard delete and would typically be used in cases where there are data protection concerns.
Perform a reprocess of data for a collection where new comparison or reference data becomes available. Where new reference data becomes available or we fix a bug in a rule or derived field (for example), we may need to reprocess all providers to regenerate their validation and other collection outputs. This could be for a collection in an open, closed or historic amendment state.
Monitor the progress of one or more providers for a collection. To support the management of the collection the Liaison team and other HESA users (such as analysts and product owners), and Statutory Customers, need to be able to see summary level information about how providers are progressing in a collection. This enables targeted help and monitoring for any risks regarding collection closure.
Compare the progress of a provider across a number of collections or streams. Will support the HESA Liaison team in identifying higher-level trends for a provider.
Upload data for a specified collection. A discrete collection requires providers to associate a file with a collection, rather than HESA interpreting the data to identify which collection(s) it should be processed for.
Check whether a submitted file conforms to the published schema. Will ensure that the data can be ingested and, where it isn't, enable the provider to correct the file.
Identify the source of the issue where a schema error occurs. Will ensure that the data can be ingested and, where it isn't, enable the provider to correct the file.
Prevent a file from progressing where schema errors are identified. Data with schema errors, such as invalid codes, etc will mean quality rules and other outputs cannot be produced.
Submit data using an older minor version of the specification, but have it validated against the latest version of the specification for the collection. During a collection, new minor versions of a specification may be issued to address guidance questions or add new quality rules (for example). For providers and software houses, upversioning the XSD each time in a timely manner could be difficult, particularly where there is no impact on the schema itself. However, validation should always be against the latest version to ensure that all data submitted for a collection is held to the same standard.
View an activity log for a provider for a given collection. To enable providers to manage their uploaded files by seeing a history of actions.
Provider to be able to add comments to an activity log for their organisation for a given collection. Providers will be able to manage their files more easily if they can add comments to their activity log.
Identify different uploaded files. Providers need to be able to manage their files to understand the data that HESA holds, so there is a need to be able to uniquely identify each file, e.g. using a unique identifier.
Access the raw data files submitted for a provider for a collection. The system must be designed in adherence to the principle of least privilege, so access should be restricted as far as possible. However, in some circumstances it may be necessary for HESA Liaison users (e.g. to support troubleshooting) and providers (for audit and recovery) to access raw data. Accessing these via the portal interface would be preferable to the system back-end.
Allow multiple lifecycles to exist within a collection. Better control over high-risk new providers and ability to support different lifecycles for different regulators/countries, for example the submission process is slightly different for providers in England as OfS are involved in quality assurance during the collection.
Allow more than one end point in a lifecycle. The OfS have a requirement for nil returns to be available in all collections, which means that not all providers in a collection would be expected to submit data. There need to be two ways to progress through a lifecycle: sending data, or confirming that there is no data to send.
See a measure of how long a process will take. As a provider, if I upload a file, I want to understand how long it will take to process to optimise my workload and internal processes.
Accountable officer of a provider able to sign-off their uploaded data for a collection. Sign-off confirms that the provider is content that the data is a true reflection of the organisation. This is necessary as only 'signed off' data can be used for statutory purposes.
Capability to sign-off data electronically. Paper-based sign-off is inefficient and requires unnecessary manual processing.
Process a 'nil return' for a collection. Changes to registration in England mean that providers may need to submit a nil return declaration for a collection where they have no data within the coverage of the record.
Capability to sign-off a nil return electronically. Paper-based sign-off of nil returns is inefficient and requires unnecessary manual processing.
Hold data for a collection independently from any other collection. Discrete collections are independent of each other and the data submitted to collection A should not contaminate collection B or vice versa.
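The schema-conformance requirements above (checking a submitted file and pointing the provider at the source of the issue before the file can progress) can be sketched as follows. This is a minimal illustration covering well-formedness only; full XSD validation would additionally need a schema validator, and the element names are assumptions.

```python
# Sketch: pre-ingestion check that an uploaded file is well-formed XML,
# reporting the location of the first error so the provider can correct it.
# Full XSD validation against the published schema would be a further step.
import xml.etree.ElementTree as ET

def check_submission(xml_text):
    """Return (ok, message); on failure, the message points at line and column."""
    try:
        ET.fromstring(xml_text)
        return True, "File parsed successfully; proceed to schema validation."
    except ET.ParseError as err:
        line, column = err.position
        return False, f"Parse error at line {line}, column {column}: file rejected."

# Hypothetical element names for illustration only.
ok, msg = check_submission("<Student><UKPRN>10007789</UKPRN></Student>")
print(ok, msg)

ok, msg = check_submission("<Student><UKPRN>10007789</Student>")  # mismatched tag
print(ok, msg)
```

Returning a rejection message rather than ingesting the file mirrors the "prevent a file from progressing where schema errors are identified" requirement.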


Business requirements: in-collection reporting

During the submission process a number of different reports are produced to support providers in undertaking quality assurance of their data, for example frequency counts, cost centre analysis and IRIS.

Requirement description Justification
Retrieve quality assurance (QA) results at row level via an API. Feeding back QA results via the API will assist providers in engaging with QA and correcting the data at source.
Specify the data to be sent to regulators during an active collection. To support QA and the generation of reconciliation reports.
Ability to provide in-flight collection data for a provider to be sent to their primary regulator for QA purposes. The IRIS process must run within an active collection as it supports QA.
Manage IRIS transactions through a workflow. Currently an 'IRIS monitor' is accessible to OfS, HEFCW, SFC and DfE-NI users as well as HESA. This shows the current status of each IRIS transaction for the collection against a workflow state. This is used during collection to reprocess and identify any processing issues.
Ability to present an output file back to HESA for publication to the provider. The purpose of the IRIS process is to provide a series of outputs back to the provider showing a reconciliation between the data that they have submitted for a collection and other data sources not available to HESA, e.g. against HESES, Research Degree Programme (RDP) funding and Transparent Approach to Costing for Teaching (TRAC(T)). These reconciliation reports need to be visible to the provider to support QA.
Make features [visible/active], [invisible/active], [invisible/inactive] for a given collection. Until the previous collection's data is available there may be limited value in producing some reporting because the comparison data is unavailable or patchy. In these cases, the results may be misleading and so it may be preferable to not display them at all.
Sort data tables in onscreen reports. Make it easier for users to engage with tables - allowing them to sort by the areas of most importance to them.
Generate a report for HESA Liaison team to support management of open issues. Assist HESA in managing open collections and providing support to providers, with access to summary-level information about the number of remaining issues, duration of inactivity, assignee, etc.
Report on rules across multiple reference periods at a provider level. To understand trends in provider data.
Report on tolerances per rule across multiple reference periods at a provider level. To understand trends in provider data.
Report on the rule outcomes and tolerances across all providers in a collection. When monitoring the collection HESA needs to report on the number of providers failing rules and what tolerances are being applied. This also ensures that tolerances are applied appropriately across all providers.
Show total number of active issues in a collection by category (Quality Rule, Credibility Rule, Regulator Issue, Historic Amendment, General Issue). To help a provider manage their issues, HESA need to provide them with a screen to monitor and manage their active issues.
Display the outcome of a rule validation to the provider. Inform a provider at a high level whether they have passed or failed a rule.
Report the total number of records outside of tolerance to the provider. To assist the provider in understanding the severity of an issue.
Report the detail that makes up the total number of the records which have failed a rule to the provider. For showing the row level detail of rule failures.
Ability to click through from row level detail into a detailed report on the issues. In issue management we want to show the rows which have failed a validation, but HESA also need to provide a report for each failure. This will give more detail around the row of data which has failed, which will support the provider in understanding why it has failed validation.
Create a detailed report for each rule. Ability to manage the detailed report for each rule.
Monitor the usage of a rule. It would be useful to be able to extract metrics on how frequently a rule is triggered, by how many providers, and how the tolerance is being amended. This will help HESA to understand which rules are providing value and which may require review. This will enable HESA to improve the effectiveness of the ruleset.
Ability to see a count of records for each valid entry for fields that have a defined list of valid entries. This enables providers to see, at a glance, whether they are returning the spread of codes that they expected. This is particularly helpful where there are fields that do not have credibility tables associated with them or where the provider has a large number of students that fall outside of the standard registration population and are therefore excluded from most reporting.
Ability to supply a flattened copy of provider data complete with the enrichments applied by HESA. To assist providers in managing and analysing their data as well as reproducing reports that have been generated by HESA.
Ability to produce a report containing information about the records that are expected in the next collection for a provider. Assist providers in understanding the expectations for the next collection and pre-empt validation issues.
Supply a comparison of data from the Student, Staff and Finance streams to a provider for comparison activity. Enable cross-stream comparison of academic cost centre information to ensure that the data returned by the provider is consistent.
Ability to include data from a different provider (i.e. provider B) in a report for a provider (i.e. provider A). To assist providers in managing the handover of students studying under collaborative provision arrangements.
Ability to use historic data from the same stream and provider as comparison data in collection reporting. To support QA, it is necessary to provide a comparison of incoming data and data from a previous collection in the same stream - note this might not be the immediately previous collection.
Ability to use rolled-up historic data from the same stream and provider as comparison data in collection reporting. To support QA, it is necessary to provide a comparison of incoming data and data from previous collections in the same stream. In some cases, this might need to be an aggregated 'year-long' picture of three reference periods.
Ability to use data from a different stream for a provider as comparison data in collection reporting. To support QA it is necessary to show incoming data alongside data from a different stream to provide context - for example, in reviewing cost centres it is useful to see both the student and staff FTE attributed to a cost centre to ensure that the data looks correct.
Ability to use data from a different provider in collection reporting. To assist providers in managing the handover of students studying under collaborative provision arrangements.
Ability to use signed-off and in-flight comparison data in collection reporting. Comparison data may be drawn from an active collection and therefore might need to be updated dynamically as the data changes; this may be the case when comparing student and staff data, for example.
Require the ability to access previous versions of a report within a collection. Providers need to be able to understand the differences that they have made in resubmissions, so need to be able to access previous iterations within a collection. Providers will need to be able to refer back, as manual queries raised (such as those relating to IRIS) might relate to a non-current version of the report.
Ability to download reports in CSV and XML format. To support providers in ingesting the reports.
Require the ability to display a report on screen. To support providers in accessing and understanding the reports.
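A minimal sketch of the issue-reporting requirements above: counting rule failures, retaining row-level detail for drill-down, and flagging whether the result is within tolerance. The rule identifiers, result shape and tolerance model here are assumptions for illustration only.

```python
# Sketch: summarise row-level quality assurance results into the totals
# shown back to a provider. The (rule_id, row_id, passed) result shape and
# the simple max-failures tolerance are illustrative assumptions.
from collections import defaultdict

def summarise_qa_results(results, tolerances):
    """results: iterable of (rule_id, row_id, passed); tolerances: rule_id -> max fails."""
    failures = defaultdict(list)
    for rule_id, row_id, passed in results:
        if not passed:
            failures[rule_id].append(row_id)
    report = {}
    for rule_id, max_fails in tolerances.items():
        failed_rows = failures.get(rule_id, [])
        report[rule_id] = {
            "failed_count": len(failed_rows),
            "failed_rows": failed_rows,            # row-level detail for drill-down
            "within_tolerance": len(failed_rows) <= max_fails,
        }
    return report

# Hypothetical rule identifiers and row ids.
results = [("QR001", 1, True), ("QR001", 2, False),
           ("QR002", 1, False), ("QR002", 2, False)]
report = summarise_qa_results(results, {"QR001": 5, "QR002": 1})
print(report["QR001"]["within_tolerance"], report["QR002"]["within_tolerance"])
```

Serving `report` as JSON would correspond to the requirement to retrieve QA results at row level via an API.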


Business requirements: credibility

Credibility reporting allows data for a collection to be displayed back to users in a tabular format to assist in quality assurance. Much of the automated quality assurance (QA) checking runs off the credibility reports as this puts the data into context.

Requirement description Justification
Ability to use credibility reporting to compare data back to an earlier collection using the current structure of the provider. Ensure consistent data is returned.
Ability to check whether the credibility reports are compatible with the collection specification. To ensure that there are no errors created by publishing a credibility report which is not compatible with the collection specification.
Ability to present credibility results in a report. Currently, credibility reports are presented in a table format: this is a high-level requirement to present the output of a credibility rule in some form of report. The expectation is that this will be a table; however, HESA would not want to limit this to tables only, and thinks there would be benefit in more advanced visualisations.
Ability to visualise the credibility reports in a dashboard style.  To present the credibility data in tables, charts, graphs, etc. Currently credibility reports are presented in a table format. 
Ability to only have the relevant credibility tables for a higher education provider displayed. To keep tables relevant to the provider, e.g. tables specific to Wales are not displayed for non-Welsh providers.
Ability to manage the display of unpopulated tables. Some tables will not be relevant for all providers. HESA therefore need the ability to display or not display unpopulated tables to remove ‘noise’ and improve user experience.
Ability for non-technical users to manage the information displayed on a credibility report including titles, descriptions, fields, values, columns, guidance and links to guidance. Currently, credibility tables are configured by Quality Analysts. HESA therefore need to maintain this workflow and need a method for non-technical users to configure credibility tables.
Ability to use functions within the credibility report, e.g. sum, count, division, distinct, addition, multiplication, absolute. To provide the flexibility to present the data with suitable aggregation applied, e.g. counting records or working out an average full-time equivalent (FTE) of the records.
Ability for non-technical users to specify the credibility tables and rules. Currently, credibility tables are configured by Quality Analysts. HESA therefore need to maintain this workflow and need a method for non-technical users to configure credibility tables.
Require the credibility report and row level detail to be exportable into Excel and PDF. Many providers export rule reports into Excel to be shared within their organisation. We need to support this process by making the reports exportable into a format which retains formatting.
Require the ability to specify different comparison options for the data, i.e. against the latest position, previous year, aggregated year on year, etc. When writing credibility reports, HESA need to select different types of summarised data to compare against. Examples include the previous submission or the previous collection’s signed-off data.
Ability to have credibility tables with drill down functionality. To support the provider to understand their data, they will need to drill down to the row level data to understand which records are causing them issues.
Require the ability for a provider to navigate to a page of data in the drill down. High level requirement for providers to navigate between pages of row level detail.
Ability to click through from a cell in the credibility table to a dashboard for that cell. A dashboard with enhanced visuals and statistical information will give additional information and analysis for a provider to really understand and interpret their data. 
Require a click through credibility dashboard to support R and more advanced visualisation. HESA wants to use R for statistical analysis in the live collection. A click through dashboard for the credibility table was identified as the most appropriate place to begin using R in Data Futures.
Require the ability to implement credibility tables independent of developers. The implementation of credibility reports is currently done by quality analysts. HESA want to ensure that this implementation can continue to be carried out by non-technical users.
Ability to test credibility tables and rules. Testing has been identified as difficult in the legacy system. Highlighted as a specific credibility requirement to ensure the design takes testing improvements into account.
Ability to publish the tolerances for each cell in a credibility table. HESA require the ability to publish credibility rule tolerances to maintain consistency in tolerance defaults for quality rules.
Ability to add tolerances for each rule. To follow the same process in quality rules, each rule should have associated tolerances which measure the number of passes, fails, result and identify whether that is within an acceptable range or outside of that.  
Ability for providers to select different comparison options for the data, i.e. against the latest position, previous year, aggregated year on year, etc. To help the provider interrogate the credibility tables, we need to allow them to select different sets of data to compare their report against. For example, the previous submission or the previous collection’s signed-off data.
Ability to navigate to credibility tables. For users to navigate to and engage with the credibility tables.
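A minimal sketch of a credibility table as described above, comparing counts per category in the current submission against a selected comparison set (e.g. the previous collection's signed-off data). The field name and record shape are assumptions for illustration only.

```python
# Sketch: build a simple credibility table of counts per category,
# current submission vs a chosen comparison data set. The "MODE" field
# and the record shape are illustrative assumptions.
from collections import Counter

def credibility_table(current_records, comparison_records, field):
    """Return {category: {"current": n, "comparison": n}} for one field."""
    current = Counter(record[field] for record in current_records)
    previous = Counter(record[field] for record in comparison_records)
    table = {}
    for value in sorted(set(current) | set(previous)):
        table[value] = {"current": current[value], "comparison": previous[value]}
    return table

current = [{"MODE": "FT"}, {"MODE": "FT"}, {"MODE": "PT"}]
signed_off = [{"MODE": "FT"}, {"MODE": "PT"}, {"MODE": "PT"}]
for value, cells in credibility_table(current, signed_off, "MODE").items():
    print(value, cells)
```

Swapping `comparison_records` for a different data set (previous submission, aggregated years, another stream) corresponds to the requirement to select different comparison options; the aggregation function could equally be a sum of FTE rather than a record count.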


Business requirements: rules management (quality assurance)

Automated quality checks (rules) are essential for in-year collection to ensure that the data submitted is within the expected or accepted bounds. In Data Futures we will use tolerances to enable rules to be adjusted to reflect the profile of each provider and enable more efficient and effective quality assurance.

Requirement description Justification
Following a merger, demerger or change of ID, ability to view the data previously returned for a stream as if the providers have always had their current structure or identifiers. The credibility and continuity quality assurance (QA) processes require a view of the historic data as if it has always been merged.
Ability to associate pre-merger records with the correct new provider for continuity QA purposes. Continuity ensures records are returned as expected. Some sector benchmarking relies on continuation rates for students and we need to ensure key data is not missing. There is also a continuity aspect to the Staff stream.
Ability to compare the details of a record back to a collection where it was previously returned for a provider that no longer applies as the result of a merger, demerger or new identifier. This is not necessarily the collection immediately prior to the current collection. For GDPR, HESA needs to ensure that data for Students and Staff is returned consistently and that the same identifiers are not used for different people.
Ability to use quality rules to perform credibility checks against an earlier collection using the current structure of the provider. Ensure consistent data is returned.
Ability to create a quality rule. To enable incoming data to be validated and assessed against agreed quality criteria.
Ability to create a unique identifier for a rule. To assist in management of the rule base, each rule must be uniquely identifiable.
Ability for quality analysts to specify the rule and to input all the details about it. Quality analysts need to record all the information about a rule, such as the technical description, which will be used by the developers to build the rule. They will also record all the information which will be presented to the providers and statutory customers about the rule.
Ability to capture the workflow status of a rule such as Draft, Test, Peer review, Published. To enable rule authors and developers to manage their work.
Ability to assign a version number to a rule. To enable the rule base to be managed over time.
Require the ability to temporarily switch off groups of rules until the data they require for a comparison is available. Some rules will reference third party data, which is not available until a point in time during the collection. We do not want to run a rule when it has no data to run against.
Ability to specify which fields to be displayed for row level reporting for a rule breach. It would be useful to be able to determine the relevant fields to display for any given rule, to assist providers to understand a rule and identify the source of the error within a record.
Ability to add links into external rule notes which navigate the user to additional information. Will enable further information relating to a rule to be linked and navigable.
Ability to create a rule which can’t have a tolerance change. Not all rules are negotiable - for example, some of them will relate to structural elements that would cause problems for further processing if not addressed. Preventing these rules from having the tolerance changed would reduce the potential for human error during peak processing times, particularly with a new and therefore less familiar rule set.
Ability for a rule to use data provided by a third party, e.g. postcode, UCAS *J data and data collected in previous collections. Rules may reference many data sets external to the collection, such as third-party data like *J, or they may compare a submission against previous collections' data. We therefore need to be able to compare against multiple data sources such as the data warehouse, migrated data, data in other streams or previous collections.
Ability to create a rule which is only applicable to the student data which is covered by one of a provider's regulators, e.g. HEFCW. To allow for validation to be tailored to the needs of an individual regulator/funder.
Ability to create a tolerance which can be applied to a rule (credibility rule or quality rule). To allow for data to be managed within agreed thresholds.
Ability to set a default tolerance for a rule. This is the default value for a rule for each regulator. At the start of each collection, we will need a default tolerance for each rule by regulator associated to it. This will then be changed by the regulator as they see fit through the issue management tolerance request process.
Require the ability to specify the type of tolerance required for a rule: row level, total, percentage or value. To enable the rule to trigger correctly it will be necessary to define the criteria to judge it on. For different rules the criteria may differ - for example, it might be a row count, or a percentage, etc.
Require the ability to set multiple tolerance ranges. To enable a tolerance change request to be routed to the correct approver based on the severity of the breach.
Ability for a tolerance owner to designate another organisation, e.g. HESA, to take responsibility for setting an override tolerance on their behalf. To enable the effective management of tolerances.
Ability for tolerance owners to set their own default tolerances for all rules in one screen. In each collection we want to give the tolerance owner the ability to update their default tolerance for a rule before providers begin submitting data against them.
Ability to publish the rule details and its associated tolerances. To support providers in understanding the rules, we need to publish the details about the rules such as the plain English description, technical description, change date, change reason, associated regulators/population, and details about the default tolerances.
Ability to view a list of rules, e.g. in a table form. Users need to be able to see the full list of quality checks in place, to understand the checks that will be applied to data and potentially build these into business processes or systems.
Ability to amend a quality rule. Rules will be developed over time, therefore HESA need to be able to maintain the rule and update the associated guidance which will need to be versioned to support historic amendments.
Ability to make the same change to a collection of rules (rule set) just once, e.g. removing the reason for change from a number of rules. To reduce the risk of manual error and improve efficiency and consistency in maintaining the rule base.
Ability to amend a rule tolerance. To allow for changes in the agreed quality thresholds.
Require the ability to allow a tolerance owner to amend the tolerances they have set. Covered in more detail in the issue management section.
Ability to deactivate a quality rule. To cater for instances where a rule needs to be suppressed either because of an error or because it is no longer relevant/required.
Ability to implement a rule. To enable incoming data to be subject to quality checks.
Ability to write test cases before implementing the rule. To ensure that the rule functions correctly.
Ability to record a pass/fail result for a rule at cohort level. A student who falls under two regulators may have a different pass or fail result for a rule in each cohort, so HESA needs to be able to record the output of the rule from the perspective of each cohort the student is grouped in. Where a student falls into two cohorts, HESA requires the ability to record the pass or fail twice.
Ability to suspend a rule. To cater for instances where a rule needs to be temporarily suppressed, for example because of an error in implementation.
Require that rules only use signed-off data when comparing against a previous collection. As a provider could be completing multiple returns concurrently, there is a need for a stable basis for comparison. Signed-off data has been verified by the provider as being valid at the point of return.
Require the ability to execute a rule and apply tolerances as part of the quality assurance process. To manage incoming data in accordance with agreed quality thresholds.
Require the ability to run a rule individually across all providers' submitted data in an open collection. Should a new rule be deployed or a change to a rule be made whilst a collection is active, we require the ability to run that individual rule against all the data that has been submitted (excluding historic loads) whilst a collection is still open.
Require the ability to compare the rule outcome against the correct cohort's tolerances relating to the rule. It is vital to link the outcome of the rule to the correct tolerance for that rule.
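The rule, tolerance and status requirements above can be sketched in code. This is a minimal, hypothetical model (the class and field names are assumptions, not HESA's actual schema): each rule carries a unique identifier, a version, a workflow status and per-regulator tolerances of one of the four stated types, and a rule outcome is judged against the tolerance's criteria.

```python
from dataclasses import dataclass
from enum import Enum

class ToleranceType(Enum):
    ROW_LEVEL = "row level"
    TOTAL = "total"
    PERCENTAGE = "percentage"
    VALUE = "value"

@dataclass
class Tolerance:
    regulator: str                # e.g. "HEFCW"
    type: ToleranceType
    threshold: float              # default value, amendable by the tolerance owner

@dataclass
class QualityRule:
    rule_id: str                  # unique identifier for the rule
    version: int                  # rule base is versioned over time
    status: str                   # Draft, Test, Peer review or Published
    tolerance_locked: bool        # some rules cannot have a tolerance change
    tolerances: list              # default tolerance per regulator

def breaches_tolerance(failures: int, total_rows: int, tol: Tolerance) -> bool:
    """Judge a rule outcome against the criteria its tolerance type defines."""
    if tol.type is ToleranceType.PERCENTAGE:
        return total_rows > 0 and 100.0 * failures / total_rows > tol.threshold
    # row-level, total and value thresholds are compared as plain counts/values
    return failures > tol.threshold
```

For example, 5 failing rows out of 100 breaches a 4% tolerance but not a total-count tolerance of 10.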


Business requirements: cohorts

Cohorts are central to supporting an automated quality assurance process which is essential to supporting more frequent data collection. Tolerances enable HESA/Customers to set thresholds for rules and to tailor these at provider level to provide a more robust quality assurance (QA) process and reduce 'noise' for providers, enabling them to focus on the issues that really matter. Cohorts enable records to be classified into groups to support the QA process in relation to issue routing. This includes the ability to define and maintain tolerances and cohorts for the rule base.

Requirements description Justification
Ability to identify which cohort a record belongs to, based upon information provided within the rule, whether in the submission or in reference data. To remove the need for duplication within the rule base by allowing a single logical rule to be executed and routed to the relevant parties.
Ability to record organisation types of specialist regulator and principal regulator. This is to categorise the regulators into their two sets.
Require cohorts to identify which regulator to associate rules with. To support identifying which rules to run for different cohorts, HESA will need the cohort solution to link rules with regulators. This can then inform valid values for a rule or associate tolerances for a rule which are different per regulator.
Require the ability to use the cohort-derived regulator to associate a tolerance with a rule for a provider. To ensure that tolerances are associated with the correct group of students in a submission, the cohort will inform which regulator's tolerances are associated with each rule.
Ability for cohorts to be used in credibility tables to filter/categorise the results. Credibility tables will need to be split by regulator, either as a category, table or filter parameter, to ensure that credibility rules can follow the same workflow as quality rules.
Ability to display rule failure reports by cohorts. To ensure that rule breaches are displayed in relation to the regulator that they relate to, we require that regulators derived from cohorts are included in the rules report.
Ability to use the determined regulator from cohorts to be a component of identifying what data should be disseminated to each regulator. HESA will need to determine which regulator(s) should receive the data for each student registration, which can be informed by the cohort.
Ability to update and maintain the regulators in cohorts. HESA will need to update and maintain the list of regulators, to either change their details in the case of a rebrand or add / remove regulators as the regulatory landscape changes or new regulators are brought onto the system.
Ability to maintain cohorts at a collection level. Cohorts will be collection specific. Whilst HESA expect a slow change over time, the cohorts need to be bound to individual collections.
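The cohort requirements above can be illustrated with a short sketch. The cohort predicates below are invented examples (a real cohort definition would live in the rule or reference data): a record may fall into more than one cohort, and the cohort-derived regulator selects which tolerance applies.

```python
# Hypothetical cohort definitions: each regulator's cohort is a predicate
# over the record. A record may satisfy several predicates at once.
cohort_rules = {
    "OfS":   lambda rec: rec.get("provider_country") == "England",
    "HEFCW": lambda rec: rec.get("provider_country") == "Wales",
}

def assign_cohorts(record, rules=cohort_rules):
    """Return the regulators whose cohorts the record belongs to."""
    return [regulator for regulator, applies in rules.items() if applies(record)]

def tolerances_for(rule_tolerances, record):
    """Pick the tolerance(s) matching the record's cohort-derived regulator(s)."""
    regulators = set(assign_cohorts(record))
    return {reg: rule_tolerances[reg] for reg in regulators if reg in rule_tolerances}
</n```

A Welsh-regulated record would thus be routed to HEFCW's tolerance for the rule rather than the OfS one.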


Business requirements: data entry tool

HESA currently supplies a downloadable data entry tool which can be used to produce XML data in the correct format. This tool is currently available for some of our streams; where it is available, it can be downloaded from the relevant coding manual.

Description of requirement Justification
Produce a schema valid XML file without using a student records system. Support smaller providers that may not have student records systems capable of producing a return in the correct format.
Not need to install separate applications for different collections or streams. A universal tool to reduce the maintenance overhead for HESA and providers in managing their data versus specific collection or stream-based tools.
Select the version of the XSD to prepare a file against. A universal tool to reduce the maintenance overhead for HESA and providers in managing their data versus specific collection or stream-based tools.
Navigate by entity - adding or removing on an entity level basis. Present data in a logical format that makes it easy for users to navigate.
Navigate by record details such as primary or compound keys. Present data in a logical format that makes it easy for users to navigate within the tool.
Ability to search for a record. Help providers to locate record(s) to update or review.
Delete a field or entity to remove it from the XML. Help providers to manage their data without needing to manipulate the XML file directly.
Select a value from a predefined list of valid entries for a field. Reduce the potential for manual error.
Support union domains: fields that allow both free text and schema-defined entries, e.g. where the provider can enter a UKPRN (not schema defined) or use a generic code. Reduce the potential for manual error. A number of fields in the model will accept either a schema-defined entry or something non-schema-defined such as a UKPRN.
Locate the valid entry that is relevant to my data. Support providers using large coding frames such as those relating to HECoS or countries.
See in line where an entry or no entry causes a schema error. Help providers address errors as they are entered, to support more efficient working.
See a full list of schema errors associated with my file within the tool. Help providers address errors in the data.
Resume file editing in a later session. Support provider working practices and the need to continuously refine a file during the collection period.
Use a downloadable tool on all major operating systems. Support provider working practice. Not all providers will use Microsoft products (for example) so the tool needs to work across systems.
Ability to clone a completed entity within the tool. Providers requested this ability to improve efficiency in data entry.
Open a partial or full XML file against an XSD with a different version number of a specification. As version numbers for a specification can change within and across collections providers need to be able to open a file against a different version of the schema where possible. For example, collection 1 and 2 might have different specification versions but 96% of the students returned in collection 1 are the same as collection 2. For providers it would be easier to use the data from collection 1 as a basis rather than re-key it all for collection 2.
See which fields are mandatory in the tool. Assist providers to complete the data.
Navigate from the data entry tool to the relevant page in the data dictionary. Assist providers to find field/entity guidance quickly to complete the data.
View the entity relationships in the data entry tool. To help providers understand the data model it would be helpful for the tool to reflect the parent/child relationships in the data entry tool.
Date fields in the data entry tool to be presented chronologically irrespective of the order in the schema. Fields are ordered alphabetically in the schema, but this is not always a logical order for data entry - having an end date before a start date is likely to lead to a greater chance of entry error.
Tool to 'remember' where I last loaded/saved a file, so I only need to do this once per session. Assist providers using the tool and locating their data files.
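The union-domain requirement above can be sketched as an in-line check of the kind the tool would run as data is entered. The generic code and the eight-digit UKPRN pattern below are illustrative assumptions, not the real coding frame:

```python
import re

# Hypothetical union-domain field: accepts either a schema-defined generic
# code or a free-text UKPRN-style entry (UKPRNs are eight-digit numbers).
GENERIC_CODES = {"ZZZZZZZZ"}            # illustrative schema-defined value
UKPRN_PATTERN = re.compile(r"\d{8}")

def is_valid_union_entry(value: str) -> bool:
    """Flag in line whether an entry would cause a schema error."""
    return value in GENERIC_CODES or UKPRN_PATTERN.fullmatch(value) is not None
```

An entry matching neither branch of the union would be surfaced to the user immediately, rather than only at file validation.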


Business requirements: enrichment

HESA enriches the submitted data with a number of derived calculations to support quality assurance and analysis. These derivations are available to providers through in-collection reporting, to customers through deliveries, and for onward use of the data.

Business requirements Justification
Ability to create a derived field. Allow the system to contain and consume derived field data.
Ability for the system to tell the user which derived fields are affected by a change to the field's component parts. Will allow data stewards to understand the impact of any change to the component parts of a derived field.
Ability to manage non-structural information about a derived field (guidance and related information). Allow notes and supporting information to be recorded about a derived field.
Ability to give the derived field a technical definition. A technical definition will allow internal developers to code the derived field but will also give external users this information so they can re-create the derived field should they require.
Ability to assign a coding frame to a derived field. To give a derived field valid values and corresponding labels.
Ability to associate the underlying code which defines how the derived field operates to the derived field in the system. The underlying code can be written and associated with a derived field in the system to allow it to function.
Ability to test derived fields are deriving data correctly before being implemented on the live system. Standard testing practice.
Ability to publish derived field specification to external users. To expose derived field specifications to external users.
Require the status of a derived field to determine how it is utilised within a specification. To enable users to understand whether a derived field is active or inactive.
Ability to create a derived entity. To allow for a table of derived data to be created.
System to tell the user which derived entities are affected by a change in data model/specification. To allow user to understand the impact on derived entities of a change in the data model.
Require the ability to manage non structural information about a derived entity.  
Ability to associate with a derived entity the underlying code which defines how it operates.  
Ability to test derived entity outputs before making them live. Standard testing practice.
Ability to publish derived entity specifications to external users.  
Require the status of a derived entity to determine how it is utilised within a specification. Need to expand what we mean here as part of design, i.e. each status defined and what it means.
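A derived field's technical definition, and the requirement to test it before it goes live, can be sketched as follows. The age-on-census-date derivation is a hypothetical example (field name, census date and test values are assumptions), chosen because external users could re-create it from the definition alone:

```python
from datetime import date

# Hypothetical derived field: the student's age in whole years on a census
# date, derived from the submitted birth date.
def derive_age_on_census(birthdate: date, census: date) -> int:
    """Whole years of age on the census date."""
    had_birthday = (census.month, census.day) >= (birthdate.month, birthdate.day)
    return census.year - birthdate.year - (0 if had_birthday else 1)

# Test cases written before the derivation is implemented on the live system.
TEST_CASES = [
    (date(2000, 9, 1), date(2021, 8, 1), 20),   # birthday not yet reached
    (date(2000, 9, 1), date(2021, 9, 1), 21),   # birthday falls on census date
]
```

Publishing the plain definition alongside the coded version lets providers and customers reproduce the derivation independently.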


Business requirements: reference data

Reference data is used to support the collection, quality assurance and dissemination processes. Reference data is not supplied by the provider but is used to support the processing of the submitted data and the collection process more broadly – for example enabling validation against the UCAS data for HESA (*J).

Requirement description Justification
Notification when mergers and demergers will occur, type of merger/demerger and date from when it is applicable. Also, notification when providers change their IDs. Require details of provider mergers, demergers and changes of IDs and how it impacts the data collection.
Record when mergers and demergers will occur, the type of merger/demerger and the date from when it is applicable. Must be able to record the details of provider mergers and demergers and how it impacts the constituencies for data collection. This information will feed into a history of mergers and demergers required for the production of time series information.
Ability to set up the merger of two or more providers by stream, including updating the collection constituencies in line with the merger. Must be able to set up a merger to include the following information:

  • Which provider/s are being merged
  • Which provider/s will no longer be returning data to HESA
  • Change details of existing provider/s if required
  • Set up a new provider if required by the terms of the merger.
Maintain a history of mergers so that the knowledge of mergers/demergers/changes of ID on top of mergers is retained. Analysts using the collected data to produce time series need to know how the provider has changed over time.
Set up demergers on the data collection system by stream, updating the collection constituencies in line with the demerger. Must be able to set up a demerger on data collection system in terms of which providers are demerging, changing the details of the original provider and setting up a new provider if required.
Maintain a history of demergers. Analysts using the collected data to produce time series need to know how the provider has changed over time.
Require the ability to change at least one of the IDs of a provider. Providers sometimes apply for a new UKPRN if their status changes, e.g. achieving university status.
Require the ability to maintain a history of providers changing their IDs so that the knowledge of mergers/demergers/changes of ID on top of changes of ID is retained. Analysts using the collected data to produce time series need to know how the provider has changed over time.
Require the ability to view a history of mergers, demergers and changes of ID by stream and/or provider, in chronological or reverse chronological order. Analysts using the collected data to produce time series need to know how the provider has changed over time.
Group all the data from more than one provider together or group a subset of data from more than one provider together for reporting purposes. Some onward use of the data for publication requires these sorts of groupings, e.g. joint medical schools for Unistats and Discover Uni.
Report and update duplicate identifiers prior to the collection for a merged provider opening. Minimum requirement to look at all data returned in the previous 12 months for when the data is rolled up for dissemination.
Providers to apply historical amendments in the context of their structure at the time the correction applies. Statutory customers require that the data is disseminated in the same structure as the provider had at the time of the original submission.
Create reference data tables. Reference data tables do not need to be the same as the data model holding the collected data, it could be modelled separately or held outside of the modelling software.
Ability for HESA users to view content of a reference entity, i.e. what does the data look like. Allow users to query the data and understand its content.
Import a set of reference data relevant to a reference table. Allow the data relating to a reference table to be imported from an external source.
Set reference tables to a published status. Indicate that the table/data is ready to use.
Test reference table content before making it live. Ensure upload process was successful.
Set reference entities to a draft status. Ensure the table is not used by the data collection system before it is published.
Ability for internal users of the system to understand the uses of each reference data table. Allow users to understand where each reference data table is used across the system; this will allow the impact of change to be assessed.
Require the ability for quality assurance (QA) rules to access reference data. Allow reference data to be utilised as part of the QA process.
Enable collection set-up to access reference data. Allow the collection set-up to access a list of providers to be included in the collection and access such items as reference period dates.
Associate versions of reference data to a specification. Ensure the correct version is used for a specific specification.
Ability for a rule to use global metadata on providers, e.g. medical schools.  
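The reference data lifecycle described above (draft on import, tested, then published and pinned by version to a specification) can be sketched briefly. The class and field names are assumptions for illustration:

```python
# Sketch of a reference table lifecycle: imported as draft, then published.
class ReferenceTable:
    def __init__(self, name, version, rows):
        self.name, self.version, self.rows = name, version, rows
        self.status = "draft"        # draft tables must not be used by collection

    def publish(self):
        self.status = "published"    # table/data ready for use

def resolve_reference(tables, name, version):
    """Return the published table a specification pins by name and version."""
    for table in tables:
        if (table.name, table.version, table.status) == (name, version, "published"):
            return table
    return None

# Two versions of a hypothetical table: only the published one resolves.
draft_v3 = ReferenceTable("postcode", 3, [])
published_v2 = ReferenceTable("postcode", 2, [("AB1 2CD", "Region X")])
published_v2.publish()
```

Binding the version into the lookup ensures each specification always validates against the reference data it was published with.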


Business requirements: validation service

To assist providers in preparing their data HESA will provide a service which enables providers to subject their data to validation checks to identify quality issues prior to submission of data for review by HESA.

Requirement description Justification
Ability for users to test data in an environment not readily accessible to users outside of their organisation. Provide a way for users to expose their data to checks before uploading it to the platform, as they do not want to expose 'draft' data to HESA.
Ability to use the validation service across different types of organisation. HESA subscribers, software suppliers and EFECs also use the validation kits for some collections and would need to do so going forward whether the tool is online or locally downloadable.
Ability to test my data in a 'sandpit' area before and during a collection. The validation kit is often used prior to a collection opening through to the close of the collection. Peak usage of the kit coincides with the peak usage of the data collection system.
Ability to use a validation service to expose a file to validation checks. The validation kit, as the name suggests, assists providers when testing their data and helps users to improve the quality of data before exposing it to HESA.
Ability for the validation service to require minimal development input for a collection. Reduce the need for developer input.
Ability to use a validation service for different data streams and formats. Pre-submission validation would be useful for all of HESA's collections and would encourage providers to engage with quality assurance (QA) earlier.
Ability to use a validation service for an XML collection. The XML collections tend to have the most complex data structures and requirements, and the greatest QA burden. These collections are most likely to benefit from a pre-submission validation system.
Ability to access reports on activity from the validation service. Activity reports provide HESA with information useful to the operation and development of the system.
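A pre-submission validation run of the kind described above can be sketched as applying a rule set to the records in a file and reporting breaches locally, before anything reaches HESA. The rules shown are invented examples, not the published ruleset:

```python
def run_validation(records, rules):
    """Apply each check to every record; return {rule_id: [failing row indices]}."""
    report = {}
    for rule_id, check in rules.items():
        failures = [i for i, record in enumerate(records) if not check(record)]
        if failures:
            report[rule_id] = failures
    return report

# Hypothetical checks for illustration only.
example_rules = {
    "R001.mode_valid": lambda r: r.get("mode") in {"FT", "PT"},
    "R002.has_ukprn":  lambda r: bool(r.get("ukprn")),
}
```

Because the report is produced locally, draft data is never exposed outside the provider's organisation.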


Business requirements: entry data carry forward

Through consultation providers showed a clear preference to continue the current process whereby ‘on entry’ data is only required to be submitted at the start of the instance and is thereafter pulled forward by HESA for use in validation and reporting.

Requirement description Justification
Ability to pull forward entry data for an entity with the same primary key from a previous submission in the same stream when a new version is not submitted in this collection. This only applies to an entity that has been identified as eligible for being pulled forward. In the Student and Student Alternative collections, providers only have to submit EntryProfile and QualificationsOnEntry data the first time an instance is returned. For subsequent submissions, the data is copied and added to the new collection. This reduces the risk of providers changing on entry data by mistake.
Ability to perform and report on ‘fuzzy matching’ if the data to be pulled forward cannot be located and the record with the missing entities on input is not new data. Identifies possible broken links as providers may have changed a key identifier in error.
Ability to continue the quality assurance (QA) process with missing data if data cannot be pulled forward because there are no allowable links. The QA process must be allowed to carry on if the data cannot be pulled forward.
Ability for 'on entry' data to be pulled forward from an earlier collection for use in outputs such as data supply and data deliveries where it relates to an active engagement/registration. ‘On entry' data can be submitted once or resubmitted during the lifecycle of a record. For ease of analysis, this data should continue to be available to the end user during each collection in which the associated engagement is returned.
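The pull-forward and fuzzy-matching requirements above can be sketched as follows. The record keys and 0.8 similarity cutoff are assumptions; a real implementation would match on the entity's primary key:

```python
import difflib

def pull_forward(current_keys, previous_entry_data, cutoff=0.8):
    """Copy 'on entry' data forward; report near-miss keys as possible broken links."""
    pulled, possible_broken_links = {}, {}
    for key in current_keys:
        if key in previous_entry_data:
            pulled[key] = previous_entry_data[key]     # exact match: pull forward
        else:
            # fuzzy match: the provider may have changed a key identifier in error
            close = difflib.get_close_matches(key, previous_entry_data,
                                              n=1, cutoff=cutoff)
            if close:
                possible_broken_links[key] = close[0]  # report, do not auto-copy
    return pulled, possible_broken_links
```

Where neither branch matches (no allowable links), both results stay empty and the QA process simply continues with the data missing, as the requirement states.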


Business requirements: end collection reporting

Following the closure of the collection, signed-off data will be delivered to HESA’s customers to enable them to carry out their statutory and regulatory functions.

Requirement description Justification
Ability to select at provider level what data must be included in a de-duplicated rolled-up three-collection picture following a merger, demerger or change of identifiers. For GDPR reasons, HESA will be able to merge/demerge the historic data for delivery for some mergers/demergers but not for others.
Ability to deliver a merged/demerged de-duplicated rolling three-collection view of the data based on current structure of a provider. Where the GDPR arrangements permit, data must be disseminated based on the latest structure/identifiers of the provider with any rolled-up historic data fitting into that same structure.
Require the ability to deliver a de-duplicated rolling three-period view of the data, ignoring any changes to the provider structure as the result of any mergers or demergers. GDPR arrangements for the merger/demerger may dictate that the historic data cannot be merged/demerged for delivery.
Ability to analyse the latest signed-off data for a collection. Although in most cases the 'latest' picture would be used, there are cases when it would be important to be able to identify what was submitted at a given time - even if that data was later reported to be incorrect.
Ability to disseminate data for a collection. HESA's role is to supply data for statutory purposes.
Ability to remove versions of a delivery from the portal. To support the need to remove access to deliveries if an issue is identified, or in line with the specified retention period, it must be possible to remove access to specific iterations of a delivery.
Ability to regenerate or retrieve a historic delivery. To support instances where a customer has failed to download a final delivery, it must be possible to recreate it as it would have been at the time.
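The de-duplicated rolling three-collection view described above can be sketched simply. The data structure is illustrative (periods keyed by year, records keyed by an assumed record key), and "later wins" is one possible de-duplication choice consistent with delivering the latest signed-off picture:

```python
def rolling_three_collection_view(signed_off):
    """signed_off: {collection_period: {record_key: record}} -> merged view.

    Rolls up the three most recent signed-off collections; where the same
    record key appears more than once, the later collection supersedes.
    """
    view = {}
    for period in sorted(signed_off)[-3:]:   # three most recent periods only
        view.update(signed_off[period])      # later submission wins on duplicates
    return view
```

Applying the same roll-up either to the provider's current merged structure or to the pre-merger structures gives the two delivery shapes the requirements distinguish.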