Understanding schema errors
Contents
- What is a schema error?
- Understanding schema errors
- Worked examples of schema errors for the DLHE record
- Locating the error
What is a schema error?
The schema describes the structure of the XML document (the number of elements, whether an element can be empty, default/fixed values, etc.) and the valid entries for each field. The schema is defined in the XSD (XML Schema Definition) file. Schema errors occur where there is a problem with the structure or order of the file, or where an invalid character is included.
Schema errors prevent validation from running in full because the file cannot be read. This means that errors cannot be traced to a particular record. Instead, the position of the error is given as a line number in the file to help you identify the problem.
Tip: Opening the raw data in a text editor that displays line numbers will let you go straight to the reported line.
Understanding schema errors
The language used to describe a schema error can be difficult to understand at first glance. To help you identify and correct any errors, a glossary of terms is included below.
Expression | Meaning |
---|---|
Schema | The schema describes the structure of the XML document (the number of elements, whether an element can be empty, default/fixed values, etc.) and the valid entries. The schema is defined by the XSD. |
Entity | A single entity name groups together a set of related fields (for example, Employment). |
Field | A field is an attribute (data item) of an entity. This refers to the short field name as shown in the coding manual. |
Reason for null | Describes a field for which a null value must be explained. The field must be accompanied by a reason code, for example 2: not sought, 3: refused or 9: not applicable. |
Parent element | XML is made up of elements. Entities are known as ‘parent elements’. |
Child element | XML is made up of elements. Fields are known as ‘child elements’ as they belong to a ‘parent element’ (i.e. an entity). |
Nested | Elements (fields and entities) can belong to one another and are therefore described as nested. |
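To make the parent/child terminology concrete, the sketch below reads a short Employment extract with Python's standard `xml.etree.ElementTree` module and lists the fields nested inside it. It is an illustration only, assuming the extract is well-formed XML; `describe` is a hypothetical helper and is not part of the validation kit.

```python
import xml.etree.ElementTree as ET

# A short extract of a DLHE Employment entity (the parent element)
# containing three fields (child elements), using values from the
# worked examples in this guide.
EXTRACT = """
<Employment>
  <JOBTITLE>Cleaner</JOBTITLE>
  <JOBDUTIES>Cleaning</JOBDUTIES>
  <EMPBASIS>03</EMPBASIS>
</Employment>
"""


def describe(entity: ET.Element) -> list[tuple[str, str]]:
    """Return (field name, value) for each child element of an entity."""
    return [(child.tag, (child.text or "").strip()) for child in entity]


entity = ET.fromstring(EXTRACT.strip())
print(entity.tag)  # the parent element (entity)
for field, value in describe(entity):
    print("  ", field, "=", value)  # the child elements (fields) nested in it
```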
Worked examples of schema errors for the DLHE record
1) Invalid values
In these examples the error message states that the value returned is invalid. This means that it does not conform to the valid entries listed in the field descriptions.
Where the value is shown as '' (empty quotes), the value is missing and the file therefore contains an empty field. Within the file this would look like:
<EMPBASIS></EMPBASIS> or <EMPBASIS> </EMPBASIS>
The XML structure does not allow blank values, so each field included in the file must contain a valid code. If the value is missing, you will need to enter the appropriate value. If, however, a value cannot be identified and the field is not required for the record, you should remove the field from the file altogether.
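Empty fields can be spotted before submission by scanning the file for fields with no value. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; `find_empty_fields` is a hypothetical helper, not part of the validation kit, and it assumes the file is otherwise well-formed. It reports leaf elements (fields with no children) whose text is missing or only whitespace, which is the case this section describes.

```python
import xml.etree.ElementTree as ET


def find_empty_fields(root: ET.Element) -> list[tuple[str, str]]:
    """Return (parent tag, field tag) for every field with no value.

    A field here is any element with no child elements of its own. Text
    that is missing or only whitespace, as in <EMPBASIS></EMPBASIS> or
    <EMPBASIS> </EMPBASIS>, counts as empty.
    """
    empty = []
    for parent in root.iter():  # the root itself and every descendant
        for child in parent:
            is_leaf = len(child) == 0
            if is_leaf and not (child.text or "").strip():
                empty.append((parent.tag, child.tag))
    return empty


sample = ET.fromstring(
    "<Employment><JOBTITLE>Cleaner</JOBTITLE>"
    "<EMPBASIS> </EMPBASIS><SALARY></SALARY></Employment>"
)
print(find_empty_fields(sample))
```

Each field reported either needs a valid code entering or, if it is not required for the record, removing from the file.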
Tip: The minimum and maximum occurrences given in the field details in the coding manual will tell you whether or not a field can be excluded from the record.
In this version of the error message a code has been returned, but it does not conform to the schema and is therefore invalid. Schema checks ensure that the code returned is one of the valid entries for the field listed in the coding manual. If you encounter this error message, you must review and amend the data submitted.
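A check of codes against lists of valid entries can be sketched in Python with the standard `xml.etree.ElementTree`. In this sketch `VALID_ENTRIES` holds hypothetical placeholder lists invented purely for illustration; the real lists must be taken from the coding manual or the XSD. `find_invalid_codes` is likewise a hypothetical helper. The sample shows a typical slip: a two-digit code returned without its leading zero.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL placeholder lists for illustration only. Take the real
# lists of valid entries for each field from the coding manual / XSD.
VALID_ENTRIES: dict[str, set[str]] = {
    "EMPBASIS": {"01", "02", "03"},
    "POSTDOC": {"1", "2"},
}


def find_invalid_codes(root: ET.Element, valid_entries: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (field, value) pairs whose value is not in the field's list."""
    invalid = []
    for elem in root.iter():
        allowed = valid_entries.get(elem.tag)
        if allowed is None:
            continue  # no list supplied for this field, so it is not checked
        value = (elem.text or "").strip()
        # Empty values are skipped here; they are a separate problem
        # (a missing value rather than an invalid code).
        if value and value not in allowed:
            invalid.append((elem.tag, value))
    return invalid


sample = ET.fromstring(
    "<Employment><POSTDOC>2</POSTDOC><EMPBASIS>3</EMPBASIS></Employment>"
)
print(find_invalid_codes(sample, VALID_ENTRIES))
```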
2) Invalid child elements
Within the schema a parent element refers to the entity, or group of fields, and a child element refers to an individual field. For example, in the DLHE record ‘Employment’ is an entity and therefore a parent element, whereas the ‘SALARY’ field is a child element.
Where the error message refers to an invalid child element, it means that the fields within an entity in the file are not in the order defined by the schema (XSD). A field may be missing altogether, or existing fields may be in the wrong place. The error message will give you some indication of the problem.
In the example below the issue is in line 11 of the file and relates to the Employment entity. SALARY is the unexpected field, because validation was expecting the EMPBASIS field. To resolve the issue, review the fields in this entity in the raw data file and make any necessary changes before resubmitting. The schema tree diagram may help you identify the correct order and so rectify the issue with the file.
The raw data contains:
<Employment> <JOBTITLE>Cleaner</JOBTITLE> <JOBDUTIES>Cleaning</JOBDUTIES> <SOCDLHE2010>92330</SOCDLHE2010> <POSTDOC>2</POSTDOC> <SALARY>22000</SALARY> <EMPBASIS>03</EMPBASIS>

Looking at the schema for this collection, the fields of the Employment entity are defined in the order below, so EMPBASIS and SALARY are in the wrong order:
Employment: JOBSNO, JOBTITLE, JOBDUTIES, SOCDLHE2010, POSTDOC, EMPBASIS, SALARY

Therefore the correct structure would be:
<Employment> <JOBTITLE>Cleaner</JOBTITLE> <JOBDUTIES>Cleaning</JOBDUTIES> <SOCDLHE2010>92330</SOCDLHE2010> <POSTDOC>2</POSTDOC> <EMPBASIS>03</EMPBASIS> <SALARY>22000</SALARY>
3) Incomplete content
This type of error is similar to the invalid child element issue shown above; in this case, however, the issue is that fields are missing from the parent element (entity). This error may occur where a required (mandatory) field has been excluded from the record because the value is not known. In this case you will need to locate the missing data and ensure that it is included in the XML extract submitted.
The error message above shows that, after the ALLACT field, validation was expecting to find the HEWORKEXP field, but the extract below shows that the entity ends without this field being included:
The raw data contains:
<STATUS>01</STATUS> <APRJAN>2</APRJAN> <MIMPACT>1</MIMPACT> <ALLACT>1</ALLACT> </Student>

Looking at the schema for this collection, it can be seen that HEWORKEXP, HESTUDYEXP and HEBUSNEXP are missing. The schema defines this part of the entity as:
STATUS, APRJAN, MIMPACT, ALLACT, HEWORKEXP, HESTUDYEXP, HEBUSNEXP

Therefore the correct structure would be (with each value replaced by the appropriate code):
<STATUS>01</STATUS> <APRJAN>2</APRJAN> <MIMPACT>1</MIMPACT> <ALLACT>1</ALLACT> <HEWORKEXP>value</HEWORKEXP> <HESTUDYEXP>value</HESTUDYEXP> <HEBUSNEXP>value</HEBUSNEXP> </Student>
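Missing fields can be listed by comparing the fields present in an entity with a list of expected fields. This is a minimal Python sketch using the standard `xml.etree.ElementTree`; the `EXPECTED_FIELDS` list is a hypothetical illustration built from the fields in this example, not a statement of which DLHE fields are actually required (check the minimum occurrences in the coding manual and the XSD). `missing_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL list for illustration, taken from the fields shown in this
# example. Whether each field is required must be checked in the coding
# manual (minimum occurrences) and the XSD.
EXPECTED_FIELDS = ["STATUS", "APRJAN", "MIMPACT", "ALLACT",
                   "HEWORKEXP", "HESTUDYEXP", "HEBUSNEXP"]


def missing_fields(entity: ET.Element, expected: list[str]) -> list[str]:
    """Return the expected field names with no child element in `entity`,
    in the order they appear in `expected`."""
    present = {child.tag for child in entity}
    return [name for name in expected if name not in present]


student = ET.fromstring(
    "<Student><STATUS>01</STATUS><APRJAN>2</APRJAN>"
    "<MIMPACT>1</MIMPACT><ALLACT>1</ALLACT></Student>"
)
print(missing_fields(student, EXPECTED_FIELDS))
```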
4) Invalid character
This type of error occurs where an unexpected character is found in the file. This might be something that falls outside the data type, such as a non-UTF-8 character. In the example below, a < character is missing from a tag, which causes validation to fail.
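Because an invalid character can stop the file being read at all, it can help to check the raw bytes before any XML parsing. This sketch uses only the Python standard library and reports lines that are not valid UTF-8 or that contain control characters, which XML 1.0 does not permit apart from tab, line feed and carriage return. It assumes the file is meant to be UTF-8 encoded and numbers lines by splitting on line feeds; `find_bad_lines` is a hypothetical helper. A missing < is not a bad character in itself and would instead surface as a parse error.

```python
def find_bad_lines(data: bytes) -> list[tuple[int, str]]:
    """Return (line number, problem) for lines with characters that
    cannot appear in a UTF-8 encoded XML 1.0 file. Line numbers are
    1-based, matching a text editor's numbering."""
    problems = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((number, f"not valid UTF-8 at byte {exc.start}"))
            continue
        for ch in text:
            # XML 1.0 allows tab, LF and CR but no other C0 control chars.
            if ord(ch) < 0x20 and ch not in "\t\r":
                problems.append((number, f"control character U+{ord(ch):04X}"))
                break
    return problems


# Line 2 has a Latin-1 encoded byte, line 3 a vertical tab.
sample = b"<A>\n<B>caf\xe9</B>\n<C>\x0b</C>\n</A>"
for line, problem in find_bad_lines(sample):
    print(line, problem)
```

In practice the bytes would come from reading the extract in binary mode, for example `open(path, "rb").read()`.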
5) Missing closing tag
This type of error occurs where an entity or field is missing a closing tag. Remember that each field and entity must be correctly opened and closed for validation to read the file. For example, in this error the final closing tag </DLHERecord> is missing, which means the file structure does not conform to the schema definition.
Locating the error
The validation kit provides information in the ‘location’ section of the error message to help institutions find the issue within the file. Often this will be simply a line reference (as in the example below). Opening the raw data in a text editor that shows line numbers will let you find the affected line of data.
Note, however, that a schema error means the validation kit cannot read the file, so validation stops at the first problem found because it cannot progress beyond that point. As a result, the information displayed will always contain a line number but may not contain Student.HUSID. Schema errors are designed to give as much location information as possible, but the amount of information displayed will depend on how far through the file the validation programme reaches and on the nature of the error.
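Where a text editor with line numbers is not to hand, a few lines of Python can show the data around a reported line. This is a sketch only: `show_context` is a hypothetical helper, and it assumes the line number refers to the same line breaks that Python's `str.splitlines` recognises.

```python
def show_context(text: str, line_number: int, radius: int = 2) -> str:
    """Return the lines around `line_number` (1-based), each prefixed with
    its number, with the reported line marked by '>>'."""
    lines = text.splitlines()
    start = max(1, line_number - radius)
    end = min(len(lines), line_number + radius)
    out = []
    for n in range(start, end + 1):
        marker = ">>" if n == line_number else "  "
        out.append(f"{marker} {n:>4} | {lines[n - 1]}")
    return "\n".join(out)


extract = "\n".join([
    "<Employment>",
    "  <POSTDOC>2</POSTDOC>",
    "  <SALARY>22000</SALARY>",
    "  <EMPBASIS>03</EMPBASIS>",
    "</Employment>",
])
print(show_context(extract, 3, radius=1))
```

For a real file, the text could be read with `open(path, encoding="utf-8", errors="replace").read()`, so that the context can still be displayed even if the file contains an invalid character.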