Validator Tutorial: Introduction

Overview of the architecture

The PSI Validator is a framework for validating data against a set of rules. These rules can define how controlled vocabularies and ontologies are used, but they can also be arbitrary rules defined and implemented by the developer of a specific validator instance.


A. Bird's Eye View


B. Technologies and Requirements

The validator framework is written in Java and uses Maven 2 as its build system. The framework is configured mostly through XML files. Should you wish to write your own validator, the following requirements apply:

  • Java 5 or higher
  • Maven 2 or higher (if you wish to take advantage of the existing infrastructure)
  • A data model written in Java (this is the data you are going to validate). Alternatively, you can use our sample data model to try the validator out.

C. Validator's Components

The validator is built in a component-oriented manner. Here is a short description of the major components:

  • Controlled vocabulary and ontology access: this module gives unified access to controlled vocabularies and ontologies (whether they are available locally or remotely) via a simple API.
  • Controlled Vocabulary Mapping Rules define how controlled vocabularies and ontologies are used in a specific data model. By means of XPath-like expressions, one can define which ontology terms are allowed at a specific location of a data model.
  • User Defined Rules are defined and implemented by the validator's developer when mapping rules cannot express the desired validation. These rules also have access to the controlled vocabularies and ontologies, and they can be much more complex, since YOU are coding them.
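To make the division of labour between these components concrete, here is a minimal Java sketch. Everything in it is hypothetical: `OntologyAccess`, `LocalOntology`, `Rule` and `CvMappingRule` are illustrative names, not the PSI Validator API; the term IDs are made up; and the XPath-like expressions of real mapping rules are replaced by a flat map from paths to term IDs. It uses lambdas, so it needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names belong to the PSI Validator API.
class ComponentsSketch {

    // Unified ontology access: rules do not care whether terms are local or remote.
    interface OntologyAccess {
        // True if termId equals ancestorId or lies somewhere below it.
        boolean isSameOrDescendant(String termId, String ancestorId);
    }

    // Local, in-memory implementation backed by a child -> parent map.
    static class LocalOntology implements OntologyAccess {
        private final Map<String, String> parentOf = new HashMap<>();

        LocalOntology addTerm(String termId, String parentId) {
            parentOf.put(termId, parentId);
            return this;
        }

        @Override
        public boolean isSameOrDescendant(String termId, String ancestorId) {
            // Walk up the parent chain until the ancestor or a root is reached.
            for (String t = termId; t != null; t = parentOf.get(t)) {
                if (t.equals(ancestorId)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Shared contract: a rule inspects the model and returns messages.
    // The model is simplified to a flat "path -> term ID" map.
    interface Rule {
        List<String> check(Map<String, String> model, OntologyAccess ontology);
    }

    // CV mapping rule: the term at `path` must be `allowedRoot` or one of its children.
    static class CvMappingRule implements Rule {
        private final String path;
        private final String allowedRoot;

        CvMappingRule(String path, String allowedRoot) {
            this.path = path;
            this.allowedRoot = allowedRoot;
        }

        @Override
        public List<String> check(Map<String, String> model, OntologyAccess ontology) {
            List<String> messages = new ArrayList<>();
            String term = model.get(path);
            if (term != null && !ontology.isSameOrDescendant(term, allowedRoot)) {
                messages.add("Term " + term + " is not allowed at " + path);
            }
            return messages;
        }
    }

    public static void main(String[] args) {
        OntologyAccess ontology = new LocalOntology()
                .addTerm("CV:0001", null)        // root term
                .addTerm("CV:0002", "CV:0001")   // child of CV:0001
                .addTerm("CV:0003", null);       // unrelated root

        Rule mapping = new CvMappingRule("interaction.detectionMethod", "CV:0001");

        // User-defined rule: arbitrary Java logic written by the validator developer.
        Rule userRule = (m, ont) -> {
            List<String> messages = new ArrayList<>();
            if (!m.containsKey("interaction.participant")) {
                messages.add("An interaction needs at least one participant");
            }
            return messages;
        };

        Map<String, String> model = new HashMap<>();
        model.put("interaction.detectionMethod", "CV:0003");

        System.out.println(mapping.check(model, ontology));
        System.out.println(userRule.check(model, ontology));
    }
}
```

Running `main` prints `[Term CV:0003 is not allowed at interaction.detectionMethod]` followed by `[An interaction needs at least one participant]`. The mapping rule only restricts which terms may appear where, while the user-defined rule can run any logic; both receive the same ontology handle.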

Below is a simple comparison of the two kinds of rules a validator can be built upon:

D. Flow of a Validation

Once you have a data model and a validator consisting of a set of rules, you can run your first validation. Here is how this is done, step by step:

  1. The data model is submitted to the validator. Usually, one would submit specific objects for validation rather than the whole data model at once. However, every data model is different and the granularity of its objects varies accordingly; consequently, you have to define for yourself what your unit of work is going to be (e.g. a car in a car factory, a molecular interaction in a proteomics experiment, ...).
  2. If any CV mapping rules have been defined, the validator first runs an internal validation on them and removes those that are not valid. The provided model is then checked against the remaining rules. Should a validation exception occur, a message is returned; each message includes a description of the issue and a severity level (one of DEBUG, INFO, WARN, ERROR, FATAL).
  3. If any user defined rules have been defined, the validator runs each of them on the data model; here again, messages can be generated upon exceptions.
  4. All messages are returned to the user, who is then free to process them.
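The four steps can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: the `Rule`, `MappingRule` and `Message` types, the `wellFormed` flag standing in for the internal check of mapping rules, the severity assigned to each failure, and the `Car` unit of work (borrowed from the car-factory example) are all hypothetical. Lambdas require Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the validation flow; not the PSI Validator API.
class ValidationFlowSketch {

    // The severity levels named in the tutorial.
    enum Severity { DEBUG, INFO, WARN, ERROR, FATAL }

    // A message carries a description of the issue and its severity.
    static final class Message {
        final String description;
        final Severity level;
        Message(String description, Severity level) {
            this.description = description;
            this.level = level;
        }
        @Override public String toString() { return level + ": " + description; }
    }

    // Any rule checks one unit of work and returns zero or more messages.
    interface Rule<T> {
        List<Message> check(T unit);
    }

    // Stand-in for a CV mapping rule; `wellFormed` models the outcome of the
    // validator's internal check of the rule itself.
    static final class MappingRule<T> implements Rule<T> {
        final boolean wellFormed;
        private final Predicate<T> accepts;
        private final String failure;
        MappingRule(boolean wellFormed, Predicate<T> accepts, String failure) {
            this.wellFormed = wellFormed;
            this.accepts = accepts;
            this.failure = failure;
        }
        @Override
        public List<Message> check(T unit) {
            List<Message> out = new ArrayList<>();
            if (!accepts.test(unit)) {
                out.add(new Message(failure, Severity.ERROR));
            }
            return out;
        }
    }

    // Hypothetical unit of work, after the tutorial's car-factory example.
    static final class Car {
        final String color;
        final int wheels;
        Car(String color, int wheels) { this.color = color; this.wheels = wheels; }
    }

    static <T> List<Message> validate(T unit, List<MappingRule<T>> mappingRules,
                                      List<Rule<T>> userRules) {
        List<Message> messages = new ArrayList<>();
        // Step 2: skip mapping rules that fail the internal check, run the rest.
        for (MappingRule<T> r : mappingRules) {
            if (r.wellFormed) {
                messages.addAll(r.check(unit));
            }
        }
        // Step 3: run every user-defined rule.
        for (Rule<T> r : userRules) {
            messages.addAll(r.check(unit));
        }
        // Step 4: return all messages; the caller decides how to process them.
        return messages;
    }

    public static void main(String[] args) {
        // Step 1: submit one unit of work.
        Car car = new Car("purple", 3);

        List<MappingRule<Car>> mapping = new ArrayList<>();
        mapping.add(new MappingRule<Car>(true,
                c -> c.color.equals("red") || c.color.equals("blue"),
                "Color is not an allowed term"));
        mapping.add(new MappingRule<Car>(false, c -> false, "Never reported"));

        List<Rule<Car>> user = new ArrayList<>();
        user.add(c -> {
            List<Message> out = new ArrayList<>();
            if (c.wheels != 4) {
                out.add(new Message("A car needs 4 wheels", Severity.WARN));
            }
            return out;
        });

        for (Message m : validate(car, mapping, user)) {
            System.out.println(m);
        }
    }
}
```

With the purple, three-wheeled car, `main` prints `ERROR: Color is not an allowed term` and then `WARN: A car needs 4 wheels`; the second mapping rule never runs because it is dropped in step 2. Mapping-rule messages come before user-rule messages simply because the two loops run in that order.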

This tutorial will now take you through the steps required to build a Validator and is organised as follows:

  1. How to write your own validator?
  2. Getting access to the needed Ontologies and Controlled vocabularies
  3. Building rules to map ontologies and controlled vocabulary terms to your domain model
  4. Building your own rules
  5. Wiring it together: build your validator and run it on sample data
  6. Download the Validator tutorial source code


E. Contact

Should you have any further questions about the Validator, please send an email to mayerg97 [at] rub [dot] de