What are derived fields under Data Futures?
The Data Futures programme envisages returning a number of ‘value added’ products to providers, both as a by-product of submitting data in-year and by allowing derivation across years of data submission. These derived fields will enable providers to replicate many of the standard published outputs of HESA data, and will allow analysis and quality assurance processes to be carried out earlier and in more depth than is currently possible with HESA data supply.
As part of this, we anticipate providing additional data back to providers, calculated or ‘derived’ using the data submitted at each stage. These data fields are identical in concept to the derived fields given back to providers in Data Supply reports, and used by HESA analysts to create bespoke data.
Whilst the concept of these fields remains the same, their scope is expected to be wider than is currently produced, in order to provide greater value to both the demand and supply sides. We will look to produce new derived fields that meet specific high-profile requirements on the demand side and that aid internal analysis on the supply side. The aim of these fields is twofold: to provide the demand side with data that requires less manipulation to meet their requirements, and to give the supply side prior sight of potential onward usage as well as greater analytical potential from the data.
Types of derived fields we envisage
- Extrapolated values
Using the variables of a record to derive a specific pre-determined value. There are subtypes of this category:
Derived values that determine ‘pots’ of students that are used for analysis, e.g. Standard Registration Population.
Identifying an overarching category from more granular data, e.g. subject area from subjects, or level of study from course aim codes.
- Calculated values
Using the variables of a record to calculate a dependent mathematical value. Examples: tariff scores, FPE and FTE calculations.
- Associated values
Using the variables of a record to associate it with a value from a third-party source, i.e. something external to the collected data. Examples: using postcode variables to assign geographical categories from the ONS datasets, or using UKPRN to associate an XINSTID01 code.
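As a rough illustration of the three types above, the Python sketch below derives one field of each kind from a single record. It is not HESA's specification: the record structure, function names, lookup tables and the credit-based FTE formula are all hypothetical and chosen only to show the pattern of each derivation type.

```python
from dataclasses import dataclass
from itertools import takewhile
from typing import Optional

# Hypothetical lookups for illustration only; these are NOT real HESA or ONS coding frames.
LEVEL_BY_AIM_PREFIX = {
    "D": "Postgraduate (research)",
    "M": "Postgraduate (taught)",
    "H": "First degree",
}
# Stand-in for an external (third-party) dataset keyed on postcode area.
REGION_BY_POSTCODE_AREA = {"GL": "South West", "M": "North West"}


@dataclass
class StudentRecord:
    """Hypothetical, heavily simplified record."""
    course_aim: str
    credits_studied: float
    postcode: Optional[str]


def derive_level_of_study(record: StudentRecord) -> str:
    """Extrapolated value: roll a granular code up to a broader category."""
    prefix = record.course_aim[:1].upper()
    return LEVEL_BY_AIM_PREFIX.get(prefix, "Unknown")


def derive_fte(record: StudentRecord, full_time_credits: float = 120.0) -> float:
    """Calculated value: an assumed FTE percentage based on credits studied."""
    return round(100.0 * record.credits_studied / full_time_credits, 1)


def derive_region(record: StudentRecord) -> str:
    """Associated value: join the record to data held outside the collection."""
    if not record.postcode:
        return "Unknown"
    cleaned = record.postcode.strip().upper()
    # The postcode area is the run of letters at the start of the postcode.
    area = "".join(takewhile(str.isalpha, cleaned))
    return REGION_BY_POSTCODE_AREA.get(area, "Unknown")


def derive_fields(record: StudentRecord) -> dict:
    """Return all three illustrative derived fields for one record."""
    return {
        "level_of_study": derive_level_of_study(record),
        "fte": derive_fte(record),
        "region": derive_region(record),
    }


example = StudentRecord(course_aim="M22", credits_studied=60, postcode="gl50 1hz")
print(derive_fields(example))
```

In this sketch each derivation is a separate function over the same record, which mirrors the idea that a derived field can be added, changed or retired without affecting the others.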
Derived field dictionary
We have started creating a dictionary of derived fields, which is contained within the spreadsheet below. The fields are arranged by segment, although many will require data from multiple segments in order to be calculated. We have used the current list of derived fields as a starting point and added some more which may be useful. Please note, however, that the list is only indicative at this stage and by no means exhaustive. Please also note that calculating a field at an early stage does not preclude its calculation at successive stages: we anticipate that linking to previously submitted data will be routine, and derived fields would be part of that.
Process for creating and maintaining derived fields
We will put in place a process to create, maintain and decommission fields according to the needs of all of HESA’s customers, on both the supply and demand sides. It is envisaged that this process will involve some form of value threshold and acceptance criteria to justify the creation of a field, but a request could be initiated by any interested participant in the submission/dissemination process.
A number of these proposed derivations will require the collection of more granular data. Please refer to V3: Implementation approach for more information.