
IPO Experimental data collection outcomes and recommendations


In June 2022 HESA ran an experimental data collection to explore the possibilities of linking HE-BCI data to that held by the Intellectual Property Office (IPO). A sample population of UK providers was selected based on the figures they reported for their cumulative patent portfolio in the C19032 HE-BCI data collection. Of this sample, three providers supplied application numbers submitted to the IPO during the 2018-19 and 2019-20 academic years, totalling 224 application numbers. We requested that institutions highlight their relationship to each application using the following identifiers:

  • The institution is a named applicant.
  • A spin-out from the institution is a named applicant.
  • An employed member of staff from the institution is a named applicant.
  • A registered student from the institution is a named applicant.

Submitted application numbers were ingested by the IPO, which performed a data-matching exercise. The IPO returned 93.3% successful matches; the 6.7% of unsuccessful matches included two non-GB application numbers and 13 trademark application numbers.
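
The reported percentages can be cross-checked against the sample size. The short Python sketch below assumes that both percentages are expressed against the 224 submitted application numbers, and that the two non-GB and 13 trademark numbers account for every unsuccessful match. The counts of 209 and 15 are derived from the published figures under those assumptions; they are not stated in the IPO return.

```python
# Cross-check of the reported match rates.
# Assumption: percentages are relative to all 224 submitted application numbers.
submitted = 224
non_gb = 2
trademark = 13

# Assumption: these two categories make up all unsuccessful matches.
unmatched = non_gb + trademark
matched = submitted - unmatched

matched_pct = round(100 * matched / submitted, 1)
unmatched_pct = round(100 * unmatched / submitted, 1)

print(f"matched: {matched} ({matched_pct}%)")
print(f"unmatched: {unmatched} ({unmatched_pct}%)")
```

Under these assumptions the derived rates agree with the 93.3% and 6.7% reported above.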

For the successful matches, the IPO returned the following data items:

  • Filing date.
  • Publication date.
  • Grant date.
  • Patent in force (or potentially in force in future).
  • EPO worldwide bibliographic data (DOCDB) Family ID (where part of a patent family filed in multiple jurisdictions).
  • DOCDB Family size (where part of a patent family filed in multiple jurisdictions).
  • IPO Owner name (1-4).


Analysis of the matched data has highlighted a number of considerations for the potential future linking of HE-BCI data to that held by the IPO.

Missing data

4.8% of applications were returned with no associated filing date, and 89% were returned with no publication date. We acknowledge that a patent lifecycle typically takes 18 months from application to publication. Therefore, we would expect all applications submitted to the experiment to have reached the point of publication where successful. Of those missing a publication date, seven matched application numbers were indicated to be in force or to have potential future force. Therefore, we may assume either that publication date data is missing or that further information is needed to fully understand the patent lifecycle from application to publication.

97% of submitted application numbers were matched without a grant date, of which 9.5% were indicated as being either in force or having potential future force. We acknowledge that there is no specified lifecycle for the process of granting successful applications. Therefore, we may consider the need for a longer lead time to produce useful outputs relating to grant dates.
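
A missing-data check of this kind could be sketched in Python as follows. The field names, the record layout and the sample records are all hypothetical, since the format of the IPO return is not described here. The sketch only illustrates how the share of missing dates, and the records flagged as in force despite a missing date, might be derived.

```python
from typing import List, Optional, TypedDict


class MatchedRecord(TypedDict):
    # Hypothetical field names; not taken from the IPO return.
    application_number: str
    filing_date: Optional[str]
    publication_date: Optional[str]
    grant_date: Optional[str]
    in_force: bool  # in force, or potentially in force in future


def missing_share(records: List[MatchedRecord], field: str) -> float:
    """Percentage of records that have no value for `field`."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r[field])
    return 100 * missing / len(records)


def in_force_without(records: List[MatchedRecord], field: str) -> List[str]:
    """Application numbers flagged as in force but lacking `field`."""
    return [
        r["application_number"]
        for r in records
        if r["in_force"] and not r[field]
    ]


# Made-up records for illustration only.
sample: List[MatchedRecord] = [
    {"application_number": "EX-001", "filing_date": "2018-10-01",
     "publication_date": "2020-04-01", "grant_date": None, "in_force": True},
    {"application_number": "EX-002", "filing_date": "2019-02-15",
     "publication_date": None, "grant_date": None, "in_force": True},
    {"application_number": "EX-003", "filing_date": None,
     "publication_date": None, "grant_date": None, "in_force": False},
    {"application_number": "EX-004", "filing_date": "2019-11-20",
     "publication_date": "2021-05-20", "grant_date": "2022-01-05", "in_force": True},
]

for field in ("filing_date", "publication_date", "grant_date"):
    print(field, missing_share(sample, field), in_force_without(sample, field))
```

Listing the in-force records alongside each missing-date rate keeps the two questions separate: how much data is absent, and how much of that absence is unexpected given the patent's status.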

Discrepancies in data

After accounting for alternative spellings of IPO Owner names, the sample submitted by the three providers contained 42 distinct intellectual property (IP) owners. Institutions were named differently across applications and abbreviations were sometimes used (e.g. Ltd/Limited). Therefore, using this data would require HESA to perform a data cleaning exercise to identify inconsistencies in applications submitted by a single institution.
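
As an illustration of what such a cleaning step might involve, the Python sketch below groups owner names that differ only in case, punctuation or a common abbreviation. It is a minimal sketch, not the method used in this exercise: the alias table, the function names and the sample names are all hypothetical, and a real exercise would need a fuller, manually reviewed set of rules.

```python
import re

# Hypothetical alias table; a real exercise would need a reviewed list.
ALIASES = {
    "ltd": "limited",
    "univ": "university",
}


def normalise_owner_name(name: str) -> str:
    """Build a comparison key: lower case, no punctuation, aliases expanded."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())  # punctuation becomes a space
    tokens = [ALIASES.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


def group_owner_names(names):
    """Group raw owner names under their normalised key."""
    groups = {}
    for raw in names:
        groups.setdefault(normalise_owner_name(raw), []).append(raw)
    return groups


# Made-up owner names for illustration only.
raw_names = [
    "Example Innovations Ltd",
    "Example Innovations Limited",
    "EXAMPLE INNOVATIONS LTD.",
    "University of Example",
]

groups = group_owner_names(raw_names)
for key, variants in groups.items():
    print(key, variants)
```

Keeping the raw variants alongside each key means any automatic grouping can be reviewed by hand before distinct owners are counted.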

Five application numbers were submitted to HESA by providers who stated they were a named owner of the IP but were not named within the IPO data.

Future considerations

There are a number of considerations HESA would need to make to evaluate the efficacy and quality of linking HE-BCI data to that held by the IPO.

Data inconsistencies

Firstly, we need to confirm whether applicant names are the same as owner names. Clarifying this could improve guidance for providers submitting application numbers and help to reduce the discrepancies between HESA and IPO data. Additionally, if these are the same, HESA would need to confirm whether the IPO Owner name held within the IPO data is taken directly from the application, so that spellings and abbreviations are reflected exactly as submitted. We would also seek clarity on whether the order of owner names (1-4) is indicative of their relationship to the IP or whether any hierarchy of ownership is inferred.

Patent lifecycles

HESA needs to understand how patents with no publication or grant date may be in force or have future potential force. Exploring this would confirm whether there are genuine missing items in the data held by the IPO or whether this data is accurate and reflective of the process from application to publication/ grant.

Data protection

Whilst data held by the IPO is publicly available, consideration would need to be given to the publication of named individuals as owners of IP. Furthermore, the implied identification of relationships between providers and external researchers or companies may have significant political repercussions. Therefore, the suitability of producing outputs would require careful consideration.


Further experimental data ingestion exercises are needed to explore the outlined considerations. Where clarification is sought, we may wish to replicate the exercise with a larger sample population to further test the quantity and quality of data that can be matched to that held by the IPO.

Therefore, whilst linking HESA and IPO data in this way appears to be practically possible, the potential benefits of doing so need to be further explored and understood.