
Data Futures Alpha phase

Alpha participants

  • Anglia Ruskin University Higher Education Corporation
  • University of Dundee
  • Lamda Limited
  • London Business School
  • London School of Science and Technology Limited
  • University of Manchester
  • Matrix College of Counselling & Psychotherapy
  • The Open University
  • School of Oriental and African Studies
  • Oxford Brookes University
  • The Queen's University of Belfast
  • University of St Andrews
  • Study Group Limited
  • Swansea University

We are producing progress reports to update all of our colleagues on the work and outputs from the Alpha phase:

Alpha phase two progress report: includes coding manual usability sessions, real data submission

Alpha September progress report: includes online validation toolkit sessions, Alpha wash-up session

Alpha August progress report: includes website top tasks sessions, data migration update

Alpha July progress report: includes user interface (UI) sessions, quality assurance deep dive, data migration

Alpha June progress report

This section includes Alpha communications for reference and to share with colleagues: please also check our Data Futures resources area, which includes supporting toolkits and external presentations.


Alpha webinar 19 March part 2: Q&A session


Alpha webinar 19 March part 1: Introduction to Alpha


Accountable officer email: Alpha pre-notification

Screenshot of Data Futures Alpha phase email sent to Accountable officers

Operational contact email: Alpha phase

Please note, we have removed the link to the Alpha phase expression of interest survey as we require one application per organisation.

Data Futures Alpha phase email sent to operational contacts


Participation and engagement

Will the pilot be delivered in a way that reflects actual Devolved Administration needs, for example Scottish testers only working on Scottish relevant data / rules? We certainly hope so. Most of the quality rules will have a primary regulator, so if you're a Scottish provider your data will be tested using the Scottish thresholds, so it should be an accurate reflection of the type of quality rules you'll expect once we go live.

Will we just be testing technology or will there be an opportunity to give feedback on guidance? We'll certainly be picking up any guidance feedback as well and trying to improve what we currently have in there.

Involvement in Alpha is not guaranteed; what are the selection criteria? We are going to go through a process. We're looking for a split of providers across nations and across provision, ideally also across software houses and providers with in-house software. We want to make sure we've covered providers in England, Northern Ireland, Scotland and Wales. We want selection to be representative, so we don't want to discourage anyone from applying, but a range of different types of provision, geographic areas and experience levels, in terms of how long people have been doing HESA returns, will be really valuable to us. Alpha isn't the only opportunity to engage: even if you apply but aren't selected, you can still provide feedback, as we'll be looking at other ways to engage.

How many Alpha participants are you looking for? We're looking for about 15 participants.

So would it be expected for a provider to engage throughout the whole Alpha phase or could they engage only in the first phase, for example just using test data? It is important that real data is used during the pilot as it allows us to effectively test specific features in the HESA Data Platform, such as data delivery, migration, quality rules and derived fields. It also helps to support our ongoing work with the data specification and the data model. Whilst we will encourage participants' support with this, the use of 'real' data in the second phase of the pilot will be optional. There are still lots of tests that can be done effectively using test data, and it's still really valuable for us to get that feedback, so it [not being able to use 'real' data] wouldn't preclude you from the second phase at all.

How many hours or weeks do you envisage an institution will need to commit to this? We expect 10 days' participation. We want Alpha to use moderated testing for some parts, which means working one-on-one with a HESA colleague. We would anticipate a mixture of one-on-one and small group exercises. We'll try to make sure that the schedule is workable for everyone and that there's scope for some flexibility, but we would expect most of it to be structured time. Anything else that you can do around that would be a bonus, but there's no expectation.

We are a small alternative provider using an Access database for Student data; would our interaction be helpful during May-August period? Absolutely, your interaction would be very beneficial to us. We're conscious that the sector is very varied, so we want to make sure that the system caters for your requirements.

I suspect there'll be more requests to participate in the Alpha than you can accommodate; how will those who aren't in the pilot be made aware of progress to date and of outcomes and findings? Whilst access to the systems is restricted to successful applicants, we recognise that the Alpha pilot is of great importance to the rest of the sector. We have published a dedicated Alpha space on our website where we will be sharing knowledge, tests and feedback.

Do we definitely have to be employed in HE (Higher Education) to work as a tester? To participate in the Alpha pilot you need to be a HESA subscribing organisation. We will be conducting parallel testing with Statutory Customers and software suppliers, and there will be opportunities for each of these three groups to work collaboratively. Participation in Alpha is currently restricted to these three groups.

What opportunities are there for engagement with the sector as a whole during the Alpha pilot? We have the dedicated Alpha space on the web page and we're continuously producing content as e-learning on different areas of the specification and data model so we'll be continuing to release those out to the sector to engage with. 

Will there be any difference between APs (alternative providers) and HEs (HE providers) in Data Futures? No there won't. We've got different categories of providers for Wales, Scotland and Northern Ireland, and for England we've got Approved and Approved (fee cap). As we're amalgamating into one collection, for English providers we will work on the coverage statements and OfS requirements between Approved and Approved (fee cap). There is not a great deal of difference between APs and HEs now we've got the new categories, and any differences will not be relevant to Alpha testing.

System testing

Is there reliance on software systems in the Beta? There is no reliance on software systems in Alpha, but yes, there will be for Beta. We hope that by that stage software houses will be in place to put data through the data platform, rather than just using the data entry tool, which is what we expect for Alpha.

For Alpha participants, will access to the system sessions, etc. be available to multiple staff at a participating institution, or would you expect it to flow through one individual? Alpha system access will all be run through Alpha IDS (identity system) instances, and part of the testing is to make sure that people have the right access to the areas we intend. If you have a hierarchy of roles within one submitting organisation, you'll have multiple instances, so multiple people can work within the provider.

What does integration with issue management mean in practice? Our quality assurance approach is one of the biggest changes with Data Futures. We are moving away from a manual approach into something much more automated that joins our systems together, so we want to test the full integration between those two systems. This is slightly different to our legacy process, so we want to make sure that firstly, providers understand what the process is and how each of the systems integrate and relate to each other, but also that the functionality fully works as well. That is one of the key areas that we want to test early on to make sure that the user experience works for providers, customers and HESA users. 

Can we access HESA Alpha systems from any computer, or will Alpha access be restricted to a specific IP address? We will be using an Alpha identity system and setting up specific Alpha testing roles which will be allocated to individuals within the provider, so the system doesn't have to be accessed from the same IP address, as long as you've got access to that identity system. With regard to granting those roles to people who don't currently have any of the identity roles: the Alpha roles are specific to Alpha, so if you want to allocate them to somebody who doesn't have access to any of the existing systems, there won't be any crossover with live collections.

What happens if the pilot demonstrates the need for systemic or significant change? Part of the reason we want to do an Alpha pilot is to catch any issues early; we had an extensive requirements gathering period and some of those requirements are published on our website. The purpose of the Alpha pilot is to make sure that the way we've interpreted the requirements works as expected. We'll triage feedback and re-evaluate our roadmap; as part of Alpha we're not just looking at what we have developed, but also at what we plan to develop.

Data specification

How will generation and submission of the XML fit into the Alpha pilot / Will we be able to generate or submit our own XML or will this be done through the data entry tool? It is completely up to you how you want to do that; if you do have software systems to generate your XML file, please use them, but if you want to use the Data Entry Tool, that is also acceptable. The Data Entry Tool is a piece of technology that we've produced for Data Futures, so we also want to test that stringently. Ideally we'll have a mixture of those using software and those using the HESA Data Entry Tool.
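As an illustration of the in-house submission route, the sketch below builds a tiny XML file programmatically, the kind of thing a provider's own software might do instead of keying records into the Data Entry Tool. The element names (Provider, Student, SID, Engagement, StartDate) and all values are hypothetical and are not taken from the published HESA schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: build a minimal submission file in code.
# None of these element names come from the real HESA specification.
root = ET.Element("Provider", attrib={"UKPRN": "10000000"})
student = ET.SubElement(root, "Student")
ET.SubElement(student, "SID").text = "1311234567890"
engagement = ET.SubElement(student, "Engagement")
ET.SubElement(engagement, "StartDate").text = "2021-09-27"

# Serialise with an XML declaration, ready to write to a file or upload.
xml_bytes = ET.tostring(root, encoding="utf-8")
```

A real generator would of course validate the output against the published schema before submission; the point here is only that either route, generated XML or the Data Entry Tool, produces the same kind of document for the platform to consume.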

Regarding the possibility of real data being entered into the data entry tool; would this be existing student data, i.e. data which will eventually be submitted in the current student return at the point of the Alpha pilot? We anticipate this being the case; the student return you submit in October / November time is retrospective, so we expect you to use that as your real data in the summer. The benefit of using your own student data is that we can test more levels of the specification. If you use the data you'll be returning in October, you might need to make a couple of tweaks to conform with the reference period dates that we will have in Alpha. That's also a good test of your understanding of the quality rules or schema rules that might be produced, and would improve your understanding of how to resolve those issues.

Do you anticipate that the pilot will cover the whole of the specification? We certainly hope so. The Alpha pilot is testing the technology of the Data Platform itself, but also this is a good chance to actually test the specification, guidance and the coverage that we've got published. We are taking this opportunity to make sure that there's full understanding of the requirements of the data and use any feedback to improve that area as well. 

Will the test files you give us be based on how we as a provider provide education, for example PGT (Postgraduate Taught) and PGR (Postgraduate Research) only? We're providing test files that will be relevant to various different forms of provision to ensure that providers can submit successfully. We do want to test the entire system robustly and ensure that various different forms of provision can be returned, so we will be curating and tailoring files. We won't be handcrafting files for individual participants because we really need to make sure that we've got full coverage. If, for instance, you are a provider that only has postgraduate provision, we wouldn't necessarily give you test files that only have postgraduate provision, as we need to make sure we've got comprehensive coverage for the testing. But that's not to say that you couldn't then enhance those files, or, when we get into the phase where you're using real data, tailor files to the provision that you have.

Who will the real Student data be shared with outside of HESA? During the first phase we will be running test deliveries to our Statutory Customers, so one of the things that we need to make sure is that once we've collected all your data that we can actually share it with people who need to see it and so we will be doing that with test data. There will be lots of caveats that this is test data, you can't draw any conclusions from it.

If a provider is using real Student data for testing outside of a live collection presumably this could be anonymised in some way to cover any data protection issues? Absolutely, yes, so in submitting it you can anonymise or pseudonymise it and we could do the same in terms of exporting it and sending it on. 
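As a sketch of the pseudonymisation idea in the answer above: a keyed hash can replace each student identifier with a stable token before submission, so the same student maps to the same token across files while the real identifier never leaves the provider. The key, function name and record fields here are illustrative assumptions, not part of any HESA process.

```python
import hashlib
import hmac

# Assumption: the secret key is held by the provider and never submitted,
# so recipients of the file cannot reverse the tokens.
SECRET_KEY = b"provider-held-secret"

def pseudonymise(student_id: str) -> str:
    """Return a stable, irreversible token for a student identifier."""
    digest = hmac.new(SECRET_KEY, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Replace the identifier in each record, leaving the other fields intact.
records = [{"student_id": "1311234567890", "course": "X300"}]
safe_records = [
    {**r, "student_id": pseudonymise(r["student_id"])} for r in records
]
```

Using a keyed hash rather than a plain hash matters: without the provider-held key, an attacker who knows the identifier format could rebuild the mapping by hashing every possible identifier.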

Is being part of the Alpha a prerequisite to the Beta? No, it isn't. Anyone who's not an Alpha participant would be eligible to participate in Beta. We would hope that those involved in Alpha would want to be involved in Beta as well. In Beta we'll be looking for much larger numbers, around 100 participants.

What data quality tools will be available to providers to start with; when do you think data supply outputs will be available? We’re starting Alpha with quality rules and credibility reports. Providers use data supply in lots of different ways and the structure that we're proposing for the Data Futures version of data supply is quite different to what you get in our current collections. We're aware that that will have implications for how you use the data, how you join it together, and all the things that you need to do with it. So even if we can't give you an actual file early on in Alpha, we'd want to talk that through with you and get feedback on that, because we absolutely need that to be fully functional to make Beta work.

Are there any new data sharing requirements that we need to sign up to? We are drawing up participation agreements for Alpha participants and the contracts will cover data sharing. We are essentially re-drafting previous participant agreements and we will share more information on this when it is available. In the second phase we will make sure that any sharing of that data, and where it is going, is covered in the contracts that need to be signed before providers gain access to the HESA Data Platform.

Will we be able to test multiple different files, i.e. testing different engagements, or will we be working off just one file? Yes, absolutely, there'll be opportunities to test lots of different files. We anticipate giving you a handful of files to cover different test scenarios. You can then do whatever you want with those to continue your own testing, or provide your own; we expect you to have lots of opportunities to send lots of different things through.

Is it safe to assume that the Alpha real student data will not link to the actual annual return, i.e. not throw queries for the actual return resulting from Alpha data? Absolutely, we will not allow you into Alpha and raise queries on your annual return off the back of it.

So when using real Student data how many records would you expect to be returned, or do you expect all records to be returned? This relates more to Beta testing; if we're using the Data Entry Tool we might not be able to return all records, maybe just a few records for testing. There's no expectation on the amount of data that you put in for your real test files, and there's certainly no minimum amount needed, so whatever you can achieve is more than enough.

Will standard outputs be available for the Alpha, for example will all derived fields be available? You won't have all the derived fields available in Alpha; a lot are still under development. There will be a significant number that we can use to test, and what we're really keen to do is understand more about how you engage with the derived fields, what you need from the specifications, and whether we're presenting them in a format that's understandable. We'll also be making use of the derived fields in our quality rules and credibility reports, which will be visible in the Alpha environment for you. So there absolutely will be an opportunity to engage, but it won't be a comprehensive list; we're still working on those. We would expect to have a comprehensive list by Beta, but we want to road-test with you during the Alpha pilot some of the areas of the specification that changed most significantly in the derivations.

Legal and agreements

Are the legal agreements the same as last time? We're using previous participant agreements to re-draft; we will provide further information when it is available. 

What level of senior sign-off will be required on the participation agreement? Someone with legal authority, whether that is your accountable officer, a director or a legal representative, must sign off on your organisation's behalf.

Support and guidance

If we find out we've completely misunderstood the guidance on an area, will we have access to support to gain greater understanding? Absolutely. Alpha participants will have access to support. We will create a Microsoft Teams page where we share our tests, and there will also be access to support there. Some Liaison analysts have been working on the guidance and will be part of that Microsoft Teams group. You can also always contact the Liaison team if you have any questions outside of Alpha. Alpha is testing the functionality and technical development of the system, but it is also looking at the specification and the guidance, and we're really keen to make sure it's all accurate and easy to understand.

All e-learning releases are announced to our operational contacts in the HESA weekly update: if you are a non-operational contact, you can also subscribe to receive the latest news. 

Our e-learning platform is hosted by Easygenerator and you will need to subscribe to access the content. We will only use this information to administer and monitor use of the system.

Data Futures e-learning 

Introduction to Data Migration 

Data migration in Data Futures refers to the translation of legacy data submitted against the C2051/54 collections into the Data Futures data model. The data migration activity will allow quality assurance checks to be run against data submitted by providers for the first Data Futures live collection in 2022/23. This activity will mean there is some replication across continuity and credibility checks, but we are doing this to ensure an appropriate degree of quality assurance as we move from legacy to Data Futures. Clearly the models are different, both at the field level and in their high-level concepts, so it is important that we implement the migration as accurately as possible to ensure robust assurance can be undertaken.

Principles of Data Migration

  • We will only migrate data that will be used either directly or to support transition year quality assurance. 
  • We will only use migrated data for quality assurance (QA) where the migration process is deemed robust enough to do this. 
  • We will provide a migrated dataset to each Alpha participant along with a detailed migration specification to support understanding of this data for the transition year submission/QA process. 
  • We will work with providers to develop the migration specification to ensure it is fit for purpose. 
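The field-level translation described above can be pictured as a simple mapping step over each legacy record. The sketch below is purely illustrative: the target field names and the code-frame values are invented for this example and do not come from the real migration specification.

```python
# Hypothetical code-frame translation from a legacy mode-of-study code
# to a Data Futures value. These codes and labels are invented for
# illustration only.
LEGACY_TO_DF_MODE = {
    "01": "FULL-TIME",
    "31": "PART-TIME",
}

def migrate(legacy_record: dict) -> dict:
    """Translate one legacy record into the (hypothetical) Data Futures shape."""
    return {
        "StudentIdentifier": legacy_record["HUSID"],
        "ModeOfStudy": LEGACY_TO_DF_MODE.get(legacy_record["MODE"], "UNKNOWN"),
    }

migrated = migrate({"HUSID": "1311234567890", "MODE": "01"})
```

A real migration specification would define one such mapping per field, including where no clean translation exists; flagging unmapped codes (the "UNKNOWN" fallback here) is what lets the QA process judge whether the migration is robust enough to use.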

Please view our Privacy information to find out more.