Guidance and support with using XML

What is XML?

XML stands for Extensible Markup Language. It is a widely used international standard for transferring data, and it allows hierarchical data structures to be carried in a single file.

The majority of HESA returns are made using XML:

  • Student record
  • Staff record
  • Graduate Outcomes record
  • Aggregate Offshore record
  • Unistats record
  • ITT record
  • Provider Profile record
  • Student Alternative record 

Data Model

The specification of each of our records begins with the data model. A data model (sometimes known as an Entity Relationship Model) defines the entities covered by the specification and the relationships between those entities.

Entities have attributes; for example, gender identity is an attribute of a student. Each attribute of an entity is represented as a field. To refer to specific fields, our documentation uses the format Entity.SHORTFIELDNAME.

An *.xml file contains elements. An element is identified using tags (written in angle brackets), and each element must have a start tag and an end tag. Each tag contains the element name, and the end tag also contains a / character before the name.

Elements can contain data; for example, a birth date element might look like this:

<BIRTHDTE>1975-06-18</BIRTHDTE>

Elements can be nested within other elements to represent hierarchical data structures. An element can be an entity or an attribute (i.e. a field). For example, this Student entity has attributes of name and date of birth:

<Student>
<FNAMES>Joseph William</FNAMES>
<SURNAME>Bloggs</SURNAME>
<BIRTHDTE>1975-06-18</BIRTHDTE>
</Student>
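
As a rough illustration of how such a nested structure can be read programmatically, the sketch below parses a Student element with Python's standard xml.etree.ElementTree module. The helper name get_field and its 'Entity.FIELD' lookup string are assumptions made for this example, mirroring the Entity.SHORTFIELDNAME notation; they are not part of any HESA tooling.

```python
import xml.etree.ElementTree as ET

# A Student entity with three fields, held as a string for illustration.
STUDENT_XML = """
<Student>
<FNAMES>Joseph William</FNAMES>
<SURNAME>Bloggs</SURNAME>
<BIRTHDTE>1975-06-18</BIRTHDTE>
</Student>
"""


def get_field(root, reference):
    """Return the text of a field given an 'Entity.FIELD' style reference.

    Hypothetical helper: assumes `root` is the entity element itself.
    Returns None when the field is not present.
    """
    entity, field = reference.split(".", 1)
    if root.tag != entity:
        raise ValueError(f"expected entity {entity!r}, found {root.tag!r}")
    child = root.find(field)
    return None if child is None else child.text


student = ET.fromstring(STUDENT_XML)

# Look up one field, then list the child elements in document order.
print(get_field(student, "Student.BIRTHDTE"))
print([child.tag for child in student])
```

The entity is the parent element and each field is a direct child, so a single find() call on the entity is enough to reach any field.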

Additional support

There are many XML training resources on the web, including the W3C tutorial.

Terminology

  • Entity: An entity groups together a set of fields that all relate to the same thing (for example, a student).
  • Field: A field is an attribute (data item) of an entity.
  • Reason for null: Used to describe a field requiring an explanation for a null value. This must be accompanied by a reason code, for example, 2: not sought, 3: refused or 9: not applicable.
  • Parent and child elements: XML is made up of elements. Entities are known as the parent elements; fields are known as child elements as they belong to a parent element (i.e. an entity). 
  • Nested: Elements (fields and entities) can belong to one another and are therefore 'nested'. 
  • Schema: The schema describes the structure of the XML document (number of elements, default/fixed values, whether an element can be empty, etc.) and valid entries. The schema is defined by the *.xsd.

Special characters in XML

Characters with special meaning in XML (such as < less than, > greater than, & ampersand, ' apostrophe and " quotation mark) must be replaced with the appropriate entity reference wherever they appear in the content of an element. Further information is available here.

Character name            Character   Entity reference   Character reference
Ampersand                 &           &amp;              &#38;
Left angle bracket        <           &lt;               &#60;
Right angle bracket       >           &gt;               &#62;
Straight quotation mark   "           &quot;             &#34;
Apostrophe                '           &apos;             &#39;
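
The replacement can be automated rather than done by hand. The following is a minimal sketch using escape and unescape from Python's standard xml.sax.saxutils module; the wrapper names escape_field and unescape_field and the sample value are assumptions made for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; the two quote characters
# have to be supplied as extra replacements.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_field(value):
    """Replace the five special characters with their entity references."""
    return escape(value, QUOTE_ENTITIES)


def unescape_field(value):
    """Reverse escape_field(), turning entity references back into characters."""
    return unescape(value, {ref: char for char, ref in QUOTE_ENTITIES.items()})


raw = "O'Brien & Sons <\"Ltd\">"
escaped = escape_field(raw)
print(escaped)
print(unescape_field(escaped) == raw)
```

Most XML libraries apply this escaping automatically when they write a file, so manual escaping is mainly relevant when XML is assembled from plain strings.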

Common issues and queries

What is a schema error?

As with any other data format, the contents of the file must appear in the correct order. Schema errors are triggered when data fields are not in the correct order, and validation cannot run fully until these errors are corrected. The *.xsd files in each record coding manual show the position of the entities and fields within each data stream. Find out more information about Schema errors.
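
To make the ordering requirement concrete, the sketch below compares the order of the child elements in an entity with an expected sequence. The expected order used here (FNAMES, SURNAME, BIRTHDTE) and the function name first_out_of_order are assumptions for illustration only; the real order is defined by the *.xsd for each record, and this check is not a substitute for schema validation.

```python
import xml.etree.ElementTree as ET

# Assumed order for illustration; the real order comes from the record's *.xsd.
EXPECTED_ORDER = ["FNAMES", "SURNAME", "BIRTHDTE"]


def first_out_of_order(entity):
    """Return the tag of the first child that breaks EXPECTED_ORDER, or None."""
    last_position = -1
    for child in entity:
        if child.tag not in EXPECTED_ORDER:
            return child.tag  # an element the assumed schema does not know
        position = EXPECTED_ORDER.index(child.tag)
        if position < last_position:
            return child.tag  # appears after an element it should precede
        last_position = position
    return None


good = ET.fromstring("<Student><FNAMES>A</FNAMES><SURNAME>B</SURNAME></Student>")
bad = ET.fromstring("<Student><SURNAME>B</SURNAME><FNAMES>A</FNAMES></Student>")

print(first_out_of_order(good))
print(first_out_of_order(bad))
```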

How do I complete a field that doesn't apply?

The flexibility of the XML model means that fields can have minimum and maximum occurrences. As a result, a field does not need to be returned where it does not apply or is not required. The minimum and maximum occurrences are defined in the record description for each field. These should be read in conjunction with the coverage of the field, as the field may be required for certain groups.

Certain fields require an empty element to be returned where they do not apply. For example, this is required for the ENDDATE field in the Student and Student Alternative records. If a student has not finished their Instance, the ENDDATE does not apply and so an empty element is returned:

<ENDDATE></ENDDATE>

If an empty element is required where the field does not apply, the guidance for the field will indicate this.
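
An empty element is not the same as an omitted one, and the difference can be checked with a short sketch. The Instance wrapper element and the function name describe are assumptions for this example. In XML, a start tag followed immediately by an end tag and the self-closing form are equivalent ways of writing an empty element; whether a particular record accepts both forms should be confirmed against its guidance.

```python
import xml.etree.ElementTree as ET

# Three variants of a hypothetical Instance entity.
paired = ET.fromstring("<Instance><ENDDATE></ENDDATE></Instance>")
self_closing = ET.fromstring("<Instance><ENDDATE/></Instance>")
absent = ET.fromstring("<Instance></Instance>")


def describe(instance):
    """Classify the ENDDATE field as 'absent', 'empty' or 'populated'."""
    element = instance.find("ENDDATE")
    if element is None:
        return "absent"
    return "populated" if (element.text or "").strip() else "empty"


print(describe(paired))
print(describe(self_closing))
print(describe(absent))
```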

XML Schema Definition (*.xsd) 

The specification of an XML structure is contained in an XML Schema Definition (*.xsd) file. This defines the elements, their optionality and their structure. Specifically, the *.xsd defines:

  • The minimum and maximum occurrences for each element. If the minimum value is one, then the element has to be present in every case. If the value is zero, this implies that there are cases where this element does not occur: these cases will be controlled through validation rules.
  • The nesting of elements. It defines which fields belong to which entity, and defines the hierarchical structure of the entities.
  • The data types. The *.xsd defines lists of valid entries, character sets for name fields, and which fields are in date or numeric format.
  • The order in which elements must appear within submitted files. This can be different to the order that they are presented within the coding manual.
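
The points above can be seen by reading a schema directly. The sketch below parses a small, purely illustrative XSD fragment with Python's standard xml.etree.ElementTree module and lists each field with its minimum and maximum occurrences in schema order. The fragment, its occurrence values and the function name describe_sequence are assumptions for this example and are not taken from any published *.xsd.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by XML Schema elements, in ElementTree's notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Illustrative fragment only; not taken from any published HESA *.xsd.
XSD_FRAGMENT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Student">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FNAMES" type="xs:string" minOccurs="0"/>
        <xs:element name="SURNAME" type="xs:string"/>
        <xs:element name="BIRTHDTE" type="xs:date" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def describe_sequence(xsd_text, entity_name):
    """List (name, minOccurs, maxOccurs) for each field of an entity, in schema order."""
    schema = ET.fromstring(xsd_text)
    entity = schema.find(f"{XS}element[@name='{entity_name}']")
    fields = entity.findall(f"{XS}complexType/{XS}sequence/{XS}element")
    # In XML Schema, minOccurs and maxOccurs both default to 1 when omitted.
    return [(f.get("name"), f.get("minOccurs", "1"), f.get("maxOccurs", "1"))
            for f in fields]


for name, low, high in describe_sequence(XSD_FRAGMENT, "Student"):
    print(name, low, high)
```

The xs:sequence element is what fixes the order of the fields, while minOccurs="0" marks a field that may be omitted.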

Sample files

In order to further assist institutions, we have produced sample files to show how the XML file may look for the different records. The files are designed to show the structure of data and pass schema validation only.