Help with xml validation kits

 

This help document contains the following sections:

Downloading the kit
Running the kit
Validation kit updates
Validation results
Errors tab
Schema errors
Warnings tab
Summary tab
Setting switches
Saving the error file
Configuration
Folder for rules, results and switches
Max number of errors
Size of batch
Max viewable file
Use Proxy

 

Downloading the kit

This software runs the validation kits for the Student (Cyy051), Aggregate Offshore (Cyy052) and ITT In-Year (Cyy053) data collections.

To download the kit, select the MIS installation file from the 'Validation Overview' on the coding manual page for the Student, Aggregate Offshore or ITT collection. This will open the setup wizard. Follow the step-by-step instructions of this wizard.

HESA recommends using a machine with at least a dual-core processor and 2 GB of RAM.

Running the kit

Double-click on the validation kit icon located on your desktop, or alternatively select and open the file from its saved location on your PC.

From the ‘Collection’ drop-down, choose the collection you wish to validate.

vkit_collection.gif

From the ‘Country’ drop-down select the country in which your institution is located. You will only need to select the country of your institution the first time you use the kit. Subsequently the kit will remember your selection.

vkit_country.gif

Select the file you wish to validate using the 'Browse' button and then click 'Validate data'. Please note that the time taken by the kit to validate a data file depends on both the size of the file and the batch size you have specified in the Configuration tab.

Validating data will activate three new tabs in which the validation results are displayed: Errors, Warnings and Summary.

Validation kit updates

When you start the validation kit it will automatically check the HESA server for updates, downloading these automatically where available. If new validation rules are available the kit will inform you that these updates have been applied automatically. If a new version of the kit is available then instructions for installation of the update will be provided. Details of downloaded updates will be listed on the Systems Info tab.

Validation results

The validation kit runs two stages of checks against your xml data file: schema rule checks and business rule checks. The kit will only progress to the business rule checks once the file is clear of schema errors. Note that switches can only be applied to business rules.
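The ordering of the two stages can be pictured as a simple gate. The following is a minimal sketch only, not the kit's own code: the function `run_validation` and the two stand-in checks are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical check signature: takes the file text, returns a list of messages.
Check = Callable[[str], List[str]]


def run_validation(text: str, schema_checks: List[Check],
                   business_checks: List[Check]) -> Dict[str, object]:
    """Run schema checks first; run business checks only if they all pass."""
    schema_errors = [msg for check in schema_checks for msg in check(text)]
    if schema_errors:
        # Business rules are not attempted while schema errors remain.
        return {"stage": "schema", "messages": schema_errors}
    business = [msg for check in business_checks for msg in check(text)]
    return {"stage": "business", "messages": business}


# Illustrative stand-in checks, not real HESA rules.
def starts_with_element(text: str) -> List[str]:
    return [] if text.lstrip().startswith("<") else ["file does not start with an element"]


def has_records(text: str) -> List[str]:
    return [] if "<Record" in text else ["no records found"]


blocked = run_validation("not xml", [starts_with_element], [has_records])
passed = run_validation("<Data><Record/></Data>", [starts_with_element], [has_records])
```

In this sketch `blocked` stops at the schema stage, so the business check is never run against the first input.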

Errors tab

The Errors tab lists the individual records that fail each error rule. These errors must be resolved before the file will pass INSERT-stage validation.

vkit_error.gif

Schema errors

The validation kit will generate schema errors where the submitted xml file does not conform to the schema (.xsd) of the record.

vkit_schema.gif

Warnings tab

The Warnings tab lists the individual records that trigger warnings. Each warning should be reviewed to confirm that the coded data is genuine.

vkit_warning.gif

Summary tab

The Summary tab gives the total number of records failing each error or warning.
The blue hyperlinks lead directly to the relevant coding manual page, so that the user can cross-check the coding requirements of the field with which the validation rule is associated.

vkit_summary.gif

Setting switches

The validation kit provides two methods of applying switches to deactivate a validation rule. Note that only business rules can be switched off.
Through the Errors tab and the Warnings tab, a switch can be applied to one particular validation rule for selected records.
To set a switch for a record, tick the check box to the left-hand side of the error and then click the ‘Set Switches’ button.

vkit_errorswitch.gif

Through the Summary tab, a switch can be applied across your whole file, so that no record fails the selected rule.
To set a switch against all records, tick the check box to the left of the error text and click the ‘Set Switches’ button.

vkit_summaryswitch.gif

To keep track of the switches which have been set, view the ‘Switches’ tab. This tab logs all applied switches.
From this tab, a switched-off rule can be re-activated by de-selecting its switch and clicking the ‘Set Switches’ button.

vkit_switchtab.gif
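The difference between a switch set for selected records and a switch set across the whole file can be sketched as a small data structure. This is an illustration under assumptions, not a description of how the kit stores switches: the class `Switches`, its method names, and the rule and record identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Switches:
    # Rules switched off for every record (as set from the Summary tab).
    rule_wide: Set[str] = field(default_factory=set)
    # Rules switched off for named records only (as set from the Errors or Warnings tab).
    per_record: Dict[str, Set[str]] = field(default_factory=dict)

    def set_for_record(self, rule: str, record_id: str) -> None:
        self.per_record.setdefault(rule, set()).add(record_id)

    def set_for_all(self, rule: str) -> None:
        self.rule_wide.add(rule)

    def clear(self, rule: str) -> None:
        """Remove every switch for a rule, so the rule is active again."""
        self.rule_wide.discard(rule)
        self.per_record.pop(rule, None)

    def is_switched_off(self, rule: str, record_id: str) -> bool:
        return rule in self.rule_wide or record_id in self.per_record.get(rule, set())


switches = Switches()
switches.set_for_record("RULE_A", "rec-001")   # one record only
switches.set_for_all("RULE_B")                 # every record in the file
```

Here `RULE_A` remains active for every record other than `rec-001`, while `RULE_B` is off for the whole file until `clear("RULE_B")` is called.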

Saving the error file

The ‘Errors’ and ‘Warnings’ tabs provide users with the option to save details of the validation errors to a tab-delimited text file.
This file can then be opened in Excel to provide a working document of the listed errors.
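A tab-delimited file can also be read by a script, for example to count errors per rule. The sketch below assumes the saved file has a header row and a column named `Rule`; the actual column names written by the kit are not given here and may differ, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def count_by_rule(lines, rule_column="Rule"):
    """Count rows per rule in a tab-delimited export that has a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[rule_column] for row in reader)


# Made-up sample standing in for a saved error file.
sample = io.StringIO(
    "Record\tRule\tMessage\n"
    "rec-001\tRULE_A\texample message\n"
    "rec-002\tRULE_A\texample message\n"
    "rec-003\tRULE_B\texample message\n"
)
counts = count_by_rule(sample)
```

For a file on disk, the same function can be given an open file object, for example `open(path, newline="")`, with the encoding adjusted to suit the saved file.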

Configuration

The ‘Configuration’ tab provides users with the option to specify certain functionality of the kit.

vkit_config.gif

Folder for rules, results and switches

This section enables users to specify where these folders are stored on their local PC. By default the location is set to an area specified by your Windows profile. HESA recommends using local folders, not network folders, for results, rules and switches. Users should also note that the results folder will contain temporary files generated by each run of the kit, and might wish to clear this folder out regularly.

Note: There are two check boxes within this section of the configuration tab which enable users to clear down the temporary files created when a validation kit is run and also to clear down older versions of validation rules when updates are received.

Max number of errors

Once the number of errors (or warnings) has reached this value no further errors will be reported. Increasing this value may increase the time taken to produce the results. Entering a value of zero here will remove the limit and all errors will be reported.
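The meaning of this setting, including the special value of zero, can be illustrated as follows. This is a sketch of the behaviour described above and not the kit's code; the function name `apply_error_limit` is hypothetical.

```python
from itertools import islice
from typing import Iterable, List


def apply_error_limit(errors: Iterable[str], max_errors: int) -> List[str]:
    """Return at most max_errors items; a limit of 0 means report everything."""
    if max_errors < 0:
        raise ValueError("max_errors must be zero or positive")
    if max_errors == 0:
        return list(errors)
    # islice stops pulling from the source once the limit is reached.
    return list(islice(errors, max_errors))


found = (f"error {n}" for n in range(1, 6))   # five made-up errors
limited = apply_error_limit(found, 3)
everything = apply_error_limit([f"error {n}" for n in range(1, 6)], 0)
```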

Size of batch

The data is validated in batches of records. This avoids loading the whole data file into memory, which could cause your machine to run very slowly or crash. Please note that the default batch size is not necessarily the optimum value. In adjusting the batch size you should note the following:

  • Reducing the size of the batch will reduce the amount of memory required to process your file. However, the kit will have to validate more batches and so may take more time.
  • Increasing the size of the batch will increase the amount of memory required to process your file but reduce the number of batches being processed. If your machine has sufficient memory this may improve the speed of validation.

A larger batch size does not necessarily mean better performance (the maximum for 2 GB of memory is about 700 or 800). Trial and error will be needed to find a suitable value for any specific configuration: the optimum batch size varies between computers and is influenced by the volume of data being passed through. You can try changing this value to gauge its effect on validating your data on your computer.
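The idea of validating in batches, rather than loading the whole file, can be sketched with Python's incremental xml parser. This is an illustration of the general technique only, and the kit's own implementation is not described here. The element names `Data`, `Record` and `Id` are made up for the example; real files will use the element names, and possibly namespaces, defined by the collection's schema.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List


def iter_batches(source, record_tag: str, batch_size: int) -> Iterator[List[ET.Element]]:
    """Yield lists of up to batch_size record elements, parsed incrementally."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[ET.Element] = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            batch.append(elem)
            if len(batch) == batch_size:
                yield batch
                for done in batch:
                    done.clear()  # release the contents of processed records
                batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


# Made-up sample file with five records.
SAMPLE = b"<Data>" + b"".join(
    b"<Record><Id>%d</Id></Record>" % n for n in range(1, 6)
) + b"</Data>"

sizes = []
ids = []
for batch in iter_batches(io.BytesIO(SAMPLE), "Record", 2):
    sizes.append(len(batch))
    ids.extend(record.findtext("Id") for record in batch)
```

Each batch must be fully processed inside the loop, because its records are emptied as soon as the next batch is requested. A smaller `batch_size` keeps fewer records in memory at once at the cost of more loop iterations, which mirrors the trade-off described above.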

Max viewable file

If the results of a validation are very large, attempting to display them could cause memory problems. This value sets a limit beyond which the results will not be shown in the kit (although they will be stored in the 'Folder with results' folder if you wish to view them in your web browser). If you encounter a message stating that the result file was too large you could modify this value. In any event you should be able to view the Summary tab of the results page.

Use Proxy

Where your institution runs a firewall which prevents the kit from communicating with the HESA servers, where the validation rules and updates reside, the proxy settings can be completed to allow the kit to run. Tick ‘Use Proxy’ and set all the fields listed in that section to values applicable to your institution's configuration.
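If you are unsure whether your proxy details are correct, it may help to check them outside the kit first. The sketch below uses Python's standard library to build a connection through a proxy. The field names (host, port, user, password) are assumptions, since the fields in the kit's proxy section are not listed here, and `proxy.example.ac.uk` is a placeholder.

```python
import urllib.request
from urllib.parse import quote


def proxy_url(host: str, port: int, user: str = "", password: str = "") -> str:
    """Build a proxy URL, percent-encoding any credentials."""
    credentials = ""
    if user:
        credentials = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    return f"http://{credentials}{host}:{port}"


def opener_via_proxy(url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


# Placeholder values; substitute those supplied by your IT department.
opener = opener_via_proxy(proxy_url("proxy.example.ac.uk", 8080))
```

Calling `opener.open(address, timeout=10)` with a web address of your choice would then attempt a request through the proxy, and a failure at that point suggests the proxy values need checking with your IT department.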