
Understanding schema errors


What is a schema error?

The schema describes the structure of the XML document (number of elements, whether an element can be empty, default/fixed values, etc.) and valid entries. The schema is defined by the XSD. Schema errors occur when there is a problem with the structure or order of the file, or when the file contains an invalid character.

Schema errors prevent validation from running in full because the file cannot be read. This means that errors cannot be traced to a particular record. Instead, the position of the error is given as a line number in the file to help you identify the problem.

Tip: Using a text editor programme to open the raw data will show the line numbers.

Understanding schema errors

The language used to describe a schema error can be difficult to understand at first glance. In order to assist you in identifying and rectifying any errors, a glossary of terms is included below.

Schema: The schema describes the structure of the XML document (number of elements, whether an element can be empty, default/fixed values, etc.) and valid entries. The schema is defined by the XSD.

Entity: An entity groups together, under a single name, a set of related fields.

Field: A field is an attribute (data item) of an entity. This refers to the short field name as shown in the coding manual.

Reason for null: Used to describe a field that requires an explanation for a null value. It must be accompanied by a reason code, for example 2: not sought, 3: refused or 9: not applicable.

Parent element: XML is made up of elements. Entities are known as ‘parent elements’.

Child element: XML is made up of elements. Fields are known as ‘child elements’ because they belong to a ‘parent element’ (i.e. an entity).

Nested: Elements (fields and entities) can belong to one another and are therefore nested.
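To see how these terms map onto a file, the short Python sketch below walks an XML fragment and prints each element indented by its nesting depth, so that entities (parent elements) appear above the fields (child elements) they contain. The fragment, its nesting and the walk helper are hypothetical and for illustration only; the real DLHE structure is defined by the XSD.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only: the real nesting comes from the XSD.
FRAGMENT = """<Student>
  <STATUS>01</STATUS>
  <Employment>
    <JOBTITLE>Cleaner</JOBTITLE>
    <SALARY>22000</SALARY>
  </Employment>
</Student>"""


def walk(element, depth=0):
    """Yield (depth, tag, is_parent) for every element, parents before their children."""
    is_parent = len(element) > 0  # an element that contains other elements acts as an entity
    yield depth, element.tag, is_parent
    for child in element:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    root = ET.fromstring(FRAGMENT)
    for depth, tag, is_parent in walk(root):
        kind = "entity (parent element)" if is_parent else "field (child element)"
        print("  " * depth + f"{tag}: {kind}")
```

The indentation in the printed output is the nesting: STATUS and Employment belong to Student, and JOBTITLE and SALARY belong to Employment.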

Worked examples of schema errors for the DLHE record

1) Invalid values

In these examples the error message states that the value returned is invalid. This means that it does not conform to the valid entries included in the field descriptions.

Schema errors - image 1

Where the value is shown as empty quotes, this means that the value is missing and the file therefore contains an empty field. Within the file this would look like:

<EMPBASIS></EMPBASIS> or <EMPBASIS> </EMPBASIS>

The XML structure does not allow blank values, so each field in the file must contain a valid code. If the value is missing, you will need to input the appropriate value. If, however, a value cannot be identified and the field is not required for the record, you should remove the field from the file altogether.

Tip: The minimum and maximum occurrences given for each field in the coding manual will tell you whether or not a field can be excluded from the record.
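Empty fields like the ones above can be spotted before submission. The Python sketch below is a hypothetical pre-check, not part of the validation kit: it lists every field (an element with no child elements) whose content is missing or only whitespace, which covers both forms shown above. It does not know which fields are optional, so use the coding manual to decide whether to fill in or remove each one.

```python
import xml.etree.ElementTree as ET


def empty_fields(xml_text):
    """Return the tags of fields (elements with no children) that are empty or only whitespace."""
    root = ET.fromstring(xml_text)
    empty = []
    for element in root.iter():  # iter() visits the root and every descendant
        is_field = len(element) == 0
        if is_field and (element.text is None or not element.text.strip()):
            empty.append(element.tag)
    return empty


if __name__ == "__main__":
    sample = (
        "<Employment><EMPBASIS></EMPBASIS><POSTDOC> </POSTDOC>"
        "<SALARY>22000</SALARY></Employment>"
    )
    print(empty_fields(sample))  # ['EMPBASIS', 'POSTDOC']
```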

Schema errors - image 2

In this version of the error message a code has been returned, but it does not conform to the schema and is therefore invalid. Schema checks ensure that the code returned is one of the valid entries listed for the field in the coding manual. If you encounter this error message, you must review and amend the data submitted.
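The valid-entries check works like a lookup of each value in the list for its field. The Python sketch below mirrors that idea. The VALID_ENTRIES table and the invalid_codes helper are hypothetical placeholders, not the real code lists, which must be taken from the coding manual. Values are compared exactly as written, so a code such as 3 does not match 03, and an empty field shows up with an empty value.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder code lists: replace with the valid entries from the coding manual.
VALID_ENTRIES = {
    "EMPBASIS": {"01", "02", "03"},
    "POSTDOC": {"1", "2"},
}


def invalid_codes(xml_text):
    """Return (field, value) pairs whose value is not in that field's list of valid entries."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        allowed = VALID_ENTRIES.get(element.tag)
        if allowed is None:
            continue  # no code list is known for this field in the sketch
        value = element.text or ""  # compared as written, without trimming
        if value not in allowed:
            problems.append((element.tag, value))
    return problems


if __name__ == "__main__":
    sample = "<Employment><POSTDOC>2</POSTDOC><EMPBASIS>3</EMPBASIS></Employment>"
    print(invalid_codes(sample))  # [('EMPBASIS', '3')]
```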

2) Invalid child elements

Within the schema, a parent element refers to the entity, or group of fields, and a child element refers to an individual field. For example, in the DLHE record ‘Employment’ is an entity and therefore a parent element, whereas the ‘SALARY’ field is a child element.

Where the error message refers to an invalid child element, it means that the fields within an entity in the file are not in the order defined by the schema (XSD). This could mean that a field is missing altogether, or that existing fields are in the wrong place. The error message gives an indication of the problem. In the example below, the issue is on line 11 of the file and relates to the Employment entity: SALARY is the unexpected field, because validation was expecting the EMPBASIS field. To resolve the issue, review the fields in this entity in the raw data file and make any necessary changes before resubmitting. The schema tree diagram may help you identify the correct order and rectify the issue.

Schema errors - image 3

The raw data contains:

<Employment>
  <JOBTITLE>Cleaner</JOBTITLE>
  <JOBDUTIES>Cleaning</JOBDUTIES>
  <SOCDLHE2010>92330</SOCDLHE2010>
  <POSTDOC>2</POSTDOC>
  <SALARY>22000</SALARY>
  <EMPBASIS>03</EMPBASIS>

Looking at the schema for this collection, the fields in the Employment entity must appear in the following order, so SALARY and EMPBASIS are the wrong way round:

Employment
  JOBSNO
  JOBTITLE
  JOBDUTIES
  SOCDLHE2010
  POSTDOC
  EMPBASIS
  SALARY

Therefore the correct structure would be:

<Employment>
  <JOBTITLE>Cleaner</JOBTITLE>
  <JOBDUTIES>Cleaning</JOBDUTIES>
  <SOCDLHE2010>92330</SOCDLHE2010>
  <POSTDOC>2</POSTDOC>
  <EMPBASIS>03</EMPBASIS>
  <SALARY>22000</SALARY>
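The order check can be pictured as a walk through the expected sequence of fields. The Python sketch below is a simplified model of a schema sequence in which each field appears at most once and optional fields may be skipped; it uses the Employment order from the schema tree above. The names EMPLOYMENT_SEQUENCE and check_sequence are hypothetical, and the required/optional flags are assumptions made for the example (EMPBASIS is treated as required so that the sketch reproduces the error described above); the real rules are in the XSD.

```python
import xml.etree.ElementTree as ET

# Employment field order from the schema tree above.
# The True/False "required" flags are assumptions for this sketch, not the XSD's values.
EMPLOYMENT_SEQUENCE = [
    ("JOBSNO", False),
    ("JOBTITLE", False),
    ("JOBDUTIES", False),
    ("SOCDLHE2010", False),
    ("POSTDOC", False),
    ("EMPBASIS", True),
    ("SALARY", False),
]


def check_sequence(parent, sequence):
    """Return the first ordering problem in parent's child elements, or None if they fit.

    Each field may appear at most once and in sequence order; optional fields may be skipped.
    """
    pos = 0
    for child in parent:
        # Skip optional fields until we reach the one the file actually contains.
        while pos < len(sequence) and sequence[pos][0] != child.tag:
            name, required = sequence[pos]
            if required:
                return f"Invalid child element '{child.tag}': expected '{name}'"
            pos += 1
        if pos == len(sequence):
            return f"Invalid child element '{child.tag}': no more fields expected"
        pos += 1  # the child matched this position, so move past it
    # A required field still unmatched after the last child means the content is incomplete.
    for name, required in sequence[pos:]:
        if required:
            return f"Incomplete content: expected '{name}'"
    return None


if __name__ == "__main__":
    raw = ET.fromstring(
        "<Employment><JOBTITLE>Cleaner</JOBTITLE><JOBDUTIES>Cleaning</JOBDUTIES>"
        "<SOCDLHE2010>92330</SOCDLHE2010><POSTDOC>2</POSTDOC>"
        "<SALARY>22000</SALARY><EMPBASIS>03</EMPBASIS></Employment>"
    )
    print(check_sequence(raw, EMPLOYMENT_SEQUENCE))
    # Invalid child element 'SALARY': expected 'EMPBASIS'
```

Running the same function on the corrected structure returns None. Because a validator reports the first position where the file and the schema disagree, the field it names as unexpected is not always the one that needs to move.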

     

3) Incomplete content

This type of error is similar to the invalid child element issue shown above; in this case, however, fields are missing from the parent element (entity). This error may occur where a required or mandatory field has been excluded from the record because the value is not known. In this case you would need to locate the missing data and ensure that it is included in the XML extract submitted.

Schema errors - image 4

The error message above shows that, after the ALLACT field, validation was expecting to see the HEWORKEXP field, but the extract below shows that the entity ends without this field being included:

The raw data contains:

<STATUS>01</STATUS>
<APRJAN>2</APRJAN>
<MIMPACT>1</MIMPACT>
<ALLACT>1</ALLACT>
</Student>

Looking at the schema for this collection, the entity should contain the following fields, so HEWORKEXP, HESTUDYEXP and HEBUSNEXP are missing:

STATUS
APRJAN
MIMPACT
ALLACT
HEWORKEXP
HESTUDYEXP
HEBUSNEXP

Therefore the correct structure would be:

<STATUS>01</STATUS>
<APRJAN>2</APRJAN>
<MIMPACT>1</MIMPACT>
<ALLACT>1</ALLACT>
<HEWORKEXP>value</HEWORKEXP>
<HESTUDYEXP>value</HESTUDYEXP>
<HEBUSNEXP>value</HEBUSNEXP>
</Student>
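Missing fields can be found before submission by comparing the fields present in an entity with the fields it is expected to contain. The Python sketch below does this for the extract above. The missing_fields helper is hypothetical, treating all seven fields as required is an assumption for the example, and a real Student entity contains other fields that are not shown here.

```python
import xml.etree.ElementTree as ET

# Fields from the extract above, all treated as required for this sketch (an assumption).
EXPECTED_FIELDS = ["STATUS", "APRJAN", "MIMPACT", "ALLACT", "HEWORKEXP", "HESTUDYEXP", "HEBUSNEXP"]


def missing_fields(entity, expected):
    """Return the expected fields that are not child elements of entity, in schema order."""
    present = {child.tag for child in entity}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    raw = ET.fromstring(
        "<Student><STATUS>01</STATUS><APRJAN>2</APRJAN>"
        "<MIMPACT>1</MIMPACT><ALLACT>1</ALLACT></Student>"
    )
    print(missing_fields(raw, EXPECTED_FIELDS))  # ['HEWORKEXP', 'HESTUDYEXP', 'HEBUSNEXP']
```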

4) Invalid character

This type of error occurs where an unexpected character is found in the file. This might be a character that falls outside the data type, such as a non-UTF-8 character. In the example below, a < is missing from one of the tags of a field, which causes validation to fail.

Schema errors - image 5
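For the non-UTF-8 case, the Python sketch below reports the line of the first byte sequence that is not valid UTF-8, so you know where to look in a text editor. The first_invalid_utf8 helper is hypothetical and only covers encoding problems; a missing < is a markup error instead, and is reported by an XML parser.

```python
def first_invalid_utf8(data: bytes):
    """Return (line_number, offending_bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        line_number = data.count(b"\n", 0, err.start) + 1  # lines are counted from 1
        return line_number, data[err.start:err.end]
    return None


if __name__ == "__main__":
    # 0xE9 is "é" in Latin-1/Windows-1252, but on its own it is not valid UTF-8.
    sample = b"<Employment>\n<JOBTITLE>Caf\xe9 worker</JOBTITLE>\n</Employment>"
    print(first_invalid_utf8(sample))  # (2, b'\xe9')
```

To check a real extract, read it in binary mode (open(path, "rb").read()) so that the bytes are examined before any decoding takes place.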

5) Missing closing tag

This type of error occurs where an entity or field is missing a closing tag. Remember that each field or entity must be correctly opened and closed for validation to read the file. For example, in this error the final closing tag of </DLHERecord> is missing, which means the file structure does not conform to the schema definition.

Schema errors - image 6

Schema errors - image 7
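A missing closing tag, or a missing < as in the previous example, makes the file not well formed, so any XML parser stops at the first such problem. The Python sketch below uses the standard library's xml.etree.ElementTree to report the line and column where parsing stopped; the first_parse_error helper is hypothetical. It only checks well-formedness, not the XSD. The line it reports is where the parser noticed the problem, which for a missing closing tag at the end of the record is the end of the file rather than the line that needs fixing.

```python
import xml.etree.ElementTree as ET


def first_parse_error(xml_text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line counted from 1, column counted from 0
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # The final </DLHERecord> closing tag is missing.
    missing_close = "<DLHERecord>\n<Student>\n<STATUS>01</STATUS>\n</Student>\n"
    print(first_parse_error(missing_close))

    # The < of the SALARY opening tag is missing, so </SALARY> has no matching open tag.
    missing_bracket = "<Employment>\n<POSTDOC>2</POSTDOC>\nSALARY>22000</SALARY>\n</Employment>"
    print(first_parse_error(missing_bracket))
```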

Locating the error

The validation kit provides information in the ‘location’ section of the error message to help institutions find the issue within the file. Often this will be just a line reference (as in the example below). Opening the raw data in a text editor will show line numbers so that you can find the affected line of data.

Schema errors - image 8
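If a text editor with line numbers is not to hand, the same lookup takes a few lines of Python. The show_context helper below is hypothetical and not part of the validation kit: it returns the reported line with a couple of lines either side, numbered from 1 as in a text editor, and flags the reported line with >>.

```python
def show_context(text, line_number, radius=2):
    """Return the lines around line_number (counted from 1), each prefixed with its number."""
    lines = text.splitlines()
    start = max(1, line_number - radius)
    end = min(len(lines), line_number + radius)
    out = []
    for n in range(start, end + 1):
        marker = ">>" if n == line_number else "  "  # flag the reported line
        out.append(f"{marker} {n:4d}  {lines[n - 1]}")
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "<Employment>\n<POSTDOC>2</POSTDOC>\n<SALARY>22000</SALARY>\n"
        "<EMPBASIS>03</EMPBASIS>\n</Employment>"
    )
    print(show_context(sample, 3, radius=1))
```

To use it on a real extract, read the file with open(path, encoding="utf-8").read() and pass in the line number from the error message.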

Note, however, that a schema error means the validation kit cannot read the file, so validation stops at the first problem it finds because it cannot progress beyond that point. As a result, the location information will always contain a line number but may not contain Student.HUSID. Schema errors are designed to give as much location information as possible, but the amount displayed depends on how far through the file the validation programme reaches and on the nature of the error.