Guidance and support with using XML

 

 

What is XML?

XML stands for Extensible Markup Language and is used to structure, store, and transport data.

XML is an internationally recognised standard for data transfer (a W3C Recommendation), enabling hierarchical data structures to be transferred in a single file.

The majority of HESA returns are made using XML:

  • Student record
  • Staff record (from 2012/13)
  • DLHE record (from 2011/12)
  • Aggregate Offshore record
  • KIS record (from 2012/13)
  • ITT record

Data Model

The specification of each record begins with the data model. A data model (sometimes known as an Entity Relationship Model) defines the entities covered by the specification and the relationship between the entities.

Entities have attributes; for example, gender is an attribute of a student. A field is therefore an attribute of an entity. HESA documentation uses the structure Entity.SHORTFIELDNAME (e.g. Student.GENDER) when referring to specific fields.

An XML file contains elements. An element is identified using tags, which are enclosed in angle brackets ('pointy brackets'), and each element must have a start tag and an end tag. Each tag contains the element name, and the end tag also contains a / character before the name.

Elements can contain data; for example, a birth date element might look like this:

 
<BIRTHDTE>1975-06-18</BIRTHDTE>

Elements can be nested within other elements to represent hierarchical data structures. Therefore, in the HESA Student record an element can be an entity or an attribute (i.e. a field). For example, this Student entity has attributes of name and date of birth:

 
<Student>
<FNAMES>Joseph William</FNAMES>
<SURNAME>Bloggs</SURNAME>
<BIRTHDTE>1975-06-18</BIRTHDTE>
</Student>
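
As a rough illustration of how nested elements can be read by software, the sketch below parses a Student entity like the one above using Python's standard library. The helper name get_field is hypothetical and is only there to show how an Entity.SHORTFIELDNAME reference maps onto the XML; it is not part of any HESA tool.

```python
import xml.etree.ElementTree as ET

# A Student entity like the one shown above, held as a string.
STUDENT_XML = """
<Student>
<FNAMES>Joseph William</FNAMES>
<SURNAME>Bloggs</SURNAME>
<BIRTHDTE>1975-06-18</BIRTHDTE>
</Student>
"""


def get_field(entity, reference):
    """Return the text of a field given an Entity.SHORTFIELDNAME reference.

    Hypothetical helper for illustration only.
    """
    entity_name, field_name = reference.split(".")
    if entity.tag != entity_name:
        raise ValueError(f"expected <{entity_name}>, found <{entity.tag}>")
    return entity.findtext(field_name)  # None if the field is absent


student = ET.fromstring(STUDENT_XML.strip())

# Each child element of the entity is one field (attribute) of that entity.
fields = {child.tag: child.text for child in student}

print(get_field(student, "Student.BIRTHDTE"))
print(fields["SURNAME"])
```

The parent element (the entity) is the parsed Student object, and each child element is reached by its field name.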
 

Further detail

There are many XML training resources on the web, including the W3C XML Schema primer, which can be found at www.w3.org/TR/xmlschema-0.

More information about XML can be found in the Technical Formats document at http://www.hesa.ac.uk/submit-tech.

Terminology

  • Entity: A single entity groups together a set of fields which have the same relationship.
  • Field: A field is an attribute (data item) of an entity.
  • Reason for null: Used to describe a field requiring an explanation for a null value. This must be accompanied by a reason code, for example, 2: not sought, 3: refused or 9: not applicable.
  • Parent element: XML is made up of elements. Entities are known as 'parent elements'.
  • Child element: XML is made up of elements. Fields are known as 'child elements' as they belong to a 'parent element' (i.e. entity).
  • Nested: Elements (fields and entities) can belong to one another and thus are 'nested'.
  • Schema: The schema describes the structure of the XML document (number of elements, whether an element can be empty, default/fixed values, etc.) and valid entries. The schema is defined by the XSD.

 

What are the benefits?

  • XML provides a more robust structure than the formats previously used and allows greater flexibility in the data model.
  • Data that may previously have been held in separate files is combined into one file, so institutions can validate across entities (previously tables) locally using the validation kit.

Common issues and queries

What is a schema error?

As with any other data format, the file structure must be in the correct order. Schema errors are triggered where data fields are not in the correct order, and validation cannot run fully until these errors are corrected. The XSD files in the record coding manual show the position of the entities and fields within each data stream.
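
The kind of ordering problem described here can be sketched with a small check written in Python. This is a simplified illustration and not real schema validation: the list EXPECTED_ORDER is an assumed field order made up for the example (the actual order is defined by the XSD), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed field order, for illustration only; the real order comes from the XSD.
EXPECTED_ORDER = ["FNAMES", "SURNAME", "BIRTHDTE"]


def first_out_of_order(entity, expected_order):
    """Return the tag of the first child that breaks the expected order, or None."""
    position = 0
    for child in entity:
        if child.tag not in expected_order:
            return child.tag  # elements not in the list are reported as well
        index = expected_order.index(child.tag)
        if index < position:
            return child.tag  # this field appears after one it should precede
        position = index
    return None


good = ET.fromstring(
    "<Student><FNAMES>Jo</FNAMES><SURNAME>Bloggs</SURNAME></Student>"
)
bad = ET.fromstring(
    "<Student><SURNAME>Bloggs</SURNAME><FNAMES>Jo</FNAMES></Student>"
)

print(first_out_of_order(good, EXPECTED_ORDER))
print(first_out_of_order(bad, EXPECTED_ORDER))
```

In the second file the fields are all present and well formed, but FNAMES follows SURNAME, which is the sort of condition that would be reported as a schema error.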

How do I complete a field that doesn't apply?

The flexibility of the XML model means that fields can have minimum and maximum occurrences, so a field that does not apply or is not required does not need to be completed. In other formats, fields would need to be completed with the default code in order to pass validation. The minimum and maximum occurrences are defined within the record description for each field. These should be viewed in conjunction with the coverage of the field, as the field may be required for certain groups.
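
A minimal sketch of how occurrence limits behave is given below in Python. The limits in OCCURRENCES are hypothetical values chosen for the example; the real minimum and maximum occurrences are those given in the record description for each field.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical (minimum, maximum) occurrences per field, for illustration only.
OCCURRENCES = {"SURNAME": (1, 1), "FNAMES": (0, 1), "BIRTHDTE": (0, 1)}


def occurrence_problems(entity, occurrences):
    """List the fields whose number of occurrences falls outside the limits."""
    counts = Counter(child.tag for child in entity)
    problems = []
    for name, (minimum, maximum) in occurrences.items():
        found = counts[name]  # Counter gives 0 for a field that is absent
        if found < minimum or found > maximum:
            problems.append(f"{name}: found {found}, allowed {minimum} to {maximum}")
    return problems


# Optional fields (minimum of zero) are simply left out of the file.
complete = ET.fromstring("<Student><SURNAME>Bloggs</SURNAME></Student>")
# A field with a minimum of one must always be present.
incomplete = ET.fromstring("<Student><FNAMES>Jo</FNAMES></Student>")

print(occurrence_problems(complete, OCCURRENCES))
print(occurrence_problems(incomplete, OCCURRENCES))
```

A check like this covers only the structural limits; whether a field with a minimum of zero is required for a particular group of students depends on its coverage.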

Can multiple files be submitted?

Yes, institutions can submit multiple XML files to the system; however, the data must be complete and discrete. A common issue arises where, for example, a student is studying two programmes but each is contained in a separate file. It is therefore important to ensure that any common information is consistent across the files.
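
One way an institution might check for inconsistent common information before submitting is sketched below in Python. The Students wrapper element, the use of HUSID as the identifier element, and the function name are all assumptions made for illustration; the actual structure is defined by the XSD for the record.

```python
import xml.etree.ElementTree as ET

# Two hypothetical files holding the same student with different surnames.
FILE_A = (
    "<Students><Student><HUSID>0001</HUSID>"
    "<SURNAME>Bloggs</SURNAME></Student></Students>"
)
FILE_B = (
    "<Students><Student><HUSID>0001</HUSID>"
    "<SURNAME>Blogs</SURNAME></Student></Students>"
)


def inconsistent_fields(documents, id_field, common_fields):
    """Return (student id, field name) pairs whose values differ between files."""
    seen = {}  # (student id, field name) -> first value encountered
    conflicts = set()
    for text in documents:
        for student in ET.fromstring(text).iter("Student"):
            student_id = student.findtext(id_field)
            for name in common_fields:
                value = student.findtext(name)
                key = (student_id, name)
                if key in seen and seen[key] != value:
                    conflicts.add(key)
                seen.setdefault(key, value)
    return sorted(conflicts)


print(inconsistent_fields([FILE_A, FILE_B], "HUSID", ["SURNAME"]))
```

Here the same student appears in both files with a different surname, so the pair of identifier and field name is reported for correction.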

 

XML Schema Definition (XSD) and schema trees

The specification of an XML structure is contained in an XML Schema Definition (XSD) file. This defines the elements, their optionality and their structure. Specifically:

  • The XSD defines the minimum and maximum occurrences for each element. If the minimum is one then the element has to be present in every case; a value of zero implies that there are cases where this element does not occur - these cases will be controlled through validation rules.
  • The XSD defines the nesting of elements - in this case defining which fields belong to which entity, as well as defining the hierarchical structure of the entities.
  • The XSD defines the data types, including lists of valid entries, character sets for name fields, and which fields are in date or numeric format.
  • The XSD defines the order in which elements must appear within submitted files. This can be different to the order that they are presented within the coding manual.
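
The points above can be illustrated with a small, made-up XSD fragment. The sketch below, in Python, reads the fragment and lists each field in schema order with its data type and occurrence limits. The fragment is not taken from any HESA XSD: the element types and minOccurs values are assumptions for the example, and the function name is hypothetical. Where minOccurs or maxOccurs is not stated, XML Schema treats the value as 1.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical XSD fragment, for illustration only.
XSD_FRAGMENT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Student">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FNAMES" type="xs:string" minOccurs="0"/>
        <xs:element name="SURNAME" type="xs:string"/>
        <xs:element name="BIRTHDTE" type="xs:date" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def sequence_rules(xsd_text, entity_name):
    """List (name, type, minOccurs, maxOccurs) for an entity's fields, in schema order."""
    schema = ET.fromstring(xsd_text.strip())
    entity = schema.find(f"{XS}element[@name='{entity_name}']")
    rules = []
    for field in entity.findall(f"{XS}complexType/{XS}sequence/{XS}element"):
        rules.append((
            field.get("name"),
            field.get("type"),
            int(field.get("minOccurs", "1")),  # defaults to 1 when omitted
            field.get("maxOccurs", "1"),       # kept as text: may be "unbounded"
        ))
    return rules


for rule in sequence_rules(XSD_FRAGMENT, "Student"):
    print(rule)
```

The xs:sequence element is what fixes the order of the fields, while the nesting of xs:element inside the Student definition records which fields belong to which entity.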

HESA have also produced schema trees to assist institutions in interpretation of the XSD and to illustrate the order of the fields. Please note that the schema trees should be used in conjunction with the XSD and not instead of it.

 

Sample files

To further assist institutions, HESA have produced a number of sample files to show how the XML file may look for the different records. The files are designed to show the structure of the data and to pass schema validation only.