Molecular Interactions

Validator Tutorial: Wiring It Together - Bringing All Components Together


Previous: Building Your Own Rules

Next: Download Validator's tutorial source code


Now that you have created your CV mapping rules and/or your own object rules, the next logical step is to create your own validator.

Here is a graphical representation of the process of building a validator given the separate components:


As the representation above shows, building your own validator means bringing together the configuration files that define the ontologies, the CV mapping rules, and the object rules (for which you also have to provide your rule implementations). Once all of this is in place inside a project, you can create your own validator as follows:



In this code example, one can see that two methods have been written:

  • The constructor of the SPE Validator that essentially passes the 3 configuration files to the generic validator,
  • The validate method, which takes an Experiment and runs the CV mapping validation as well as the object rule validation. Any message generated in this process is stored in a collection and returned to the caller.
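As a rough illustration of the two methods described above, here is a minimal, self-contained sketch. It does not use the framework's real classes: Experiment, Message, GenericValidatorSketch and SpeValidatorSketch are stand-ins invented for this example, the file name object-rules.xml is hypothetical, and both validation passes are reduced to placeholders.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-in data model and message types; not the framework's real classes.
class Experiment {
    final int id;
    final String name;

    Experiment(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class Message {
    final String level;
    final String text;

    Message(String level, String text) {
        this.level = level;
        this.text = text;
    }

    @Override
    public String toString() {
        return "Message{level=" + level + ", text='" + text + "'}";
    }
}

// Hypothetical generic validator: it only remembers the three configuration
// locations and declares the two validation passes.
abstract class GenericValidatorSketch {
    final String ontologyConfig;
    final String cvMappingConfig;
    final String objectRuleConfig;

    GenericValidatorSketch(String ontologyConfig, String cvMappingConfig, String objectRuleConfig) {
        this.ontologyConfig = ontologyConfig;
        this.cvMappingConfig = cvMappingConfig;
        this.objectRuleConfig = objectRuleConfig;
    }

    abstract Collection<Message> checkCvMapping(Object object);

    abstract Collection<Message> checkObjectRules(Object object);
}

class SpeValidatorSketch extends GenericValidatorSketch {

    // The constructor simply hands the three configuration files to the parent.
    SpeValidatorSketch(String ontologyConfig, String cvMappingConfig, String objectRuleConfig) {
        super(ontologyConfig, cvMappingConfig, objectRuleConfig);
    }

    // Runs both passes and gathers every message into one collection.
    Collection<Message> validate(Experiment experiment) {
        List<Message> messages = new ArrayList<>();
        messages.addAll(checkCvMapping(experiment));
        messages.addAll(checkObjectRules(experiment));
        return messages;
    }

    @Override
    Collection<Message> checkCvMapping(Object object) {
        // Placeholder: a real validator would evaluate the CV mapping rules here.
        return new ArrayList<>();
    }

    @Override
    Collection<Message> checkObjectRules(Object object) {
        // Placeholder object rule: flag experiments without a name.
        List<Message> messages = new ArrayList<>();
        if (object instanceof Experiment) {
            Experiment experiment = (Experiment) object;
            if (experiment.name == null || experiment.name.trim().isEmpty()) {
                messages.add(new Message("WARN",
                        "Experiment id:" + experiment.id + " doesn't have a name."));
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        SpeValidatorSketch validator =
                new SpeValidatorSketch("ontologies.xml", "cv-mapping.xml", "object-rules.xml");
        Collection<Message> messages = validator.validate(new Experiment(3, null));
        System.out.println("Validation run collected " + messages.size() + " message(s):");
        for (Message message : messages) {
            System.out.println(message);
        }
    }
}
```

The point of the design is that the calling code only ever sees validate and a flat collection of messages, whichever pass produced them.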

Now that we have put everything together, it's time to run our validator on some data and display the result. The aim of this tutorial is not to lecture on user interfaces, or on how to write them in Java, so we will settle for a basic program that prints the validation result on the command line.



Here is what our little program outputs:

Validation run collected 3 message(s): 

ValidatorMessage{message='The result found at: /molecules/modifications/@id for which the values are ''BLA:0000X'' didn't match any of the 1 specified CV term:
- MOD:01157 (protein modification categorized by amino acid modified) or any of its children. The term can be repeated. The matching value has to be the identifier of the term, not its name.', level=WARN, context=Context(/molecules/modifications/@id ), rule=}

ValidatorMessage{message='The result found at: /molecules/type/@id for which the values are ''SPE:0328'' didn't match any of the 2 specified CV terms:
- The sole term SPE:0326 (protein) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- SPE:0318 (nucleic acid) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.', level=ERROR, context=Context(/molecules/type/@id ), rule=}

ValidatorMessage{message='Experiment id:3 doesn't have a name.', level=WARN, context=null, rule=null}







Validator Tutorial: How to Build Your Own Rules?


Previous: How to Build CV Mapping Rules ?

Next: Wiring It Together - Bringing All Components Together


A. What can these user-defined rules do for you ?


Essentially, whenever the CV mapping rules cannot model the validation you want to apply, the Object Rules are the alternative. There is inherently no limit to what these rules can do, as long as you can program them in the Java language with the help of the many libraries available on the internet.


B. Implementing your first rule

The validator API defines a class that one has to extend in order to write a rule: 


The class diagram below illustrates this part of the Validator's data model:

As you can see on this diagram, in order to fulfill the contract of an ObjectRule, you will have to implement the following methods:

boolean canCheck( Object object );

Collection<ValidatorMessage> check( Object object );

The canCheck method defines which object type (i.e. class) a specific rule is able to validate. The second method, check, performs the validation and returns messages if inconsistencies are detected.
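To show how the two methods cooperate, here is a small sketch of a runner that only calls check on rules whose canCheck accepts the object. Only the two method signatures come from the text above. RuleSketch (an interface here, whereas the framework asks you to extend a class), NonEmptyStringRule and RuleRunnerSketch are hypothetical names, and plain strings stand in for ValidatorMessage.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-in for the rule contract; plain strings replace ValidatorMessage.
interface RuleSketch {
    boolean canCheck(Object object);

    Collection<String> check(Object object);
}

// Example rule: only handles String instances and flags empty ones.
class NonEmptyStringRule implements RuleSketch {
    public boolean canCheck(Object object) {
        return object instanceof String;
    }

    public Collection<String> check(Object object) {
        List<String> messages = new ArrayList<>();
        if (((String) object).isEmpty()) {
            messages.add("Empty string found.");
        }
        return messages;
    }
}

class RuleRunnerSketch {
    // Runs every rule that declares it can handle the object and skips the others,
    // so that check never receives a type it would have to cast blindly.
    static List<String> run(List<RuleSketch> rules, Object object) {
        List<String> messages = new ArrayList<>();
        for (RuleSketch rule : rules) {
            if (rule.canCheck(object)) {
                messages.addAll(rule.check(object));
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<RuleSketch> rules = new ArrayList<>();
        rules.add(new NonEmptyStringRule());
        System.out.println(run(rules, ""));  // handled by the rule
        System.out.println(run(rules, 42));  // skipped: not a String
    }
}
```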


1. Writing a simple rule


So let's define a first very simple rule that only accesses the data available in the provided instance of the data model. In this example we are still playing with our Simple Proteomics Experiment of which the class diagram is available here.

In this first simple rule, we are going to look into the Experiment and report an error whenever no name has been given.



If you wish to run this rule yourself, you can download the source code of this sample validator here.
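For readers who just want the shape of such a rule, the name check can be approximated as below. This is a sketch and not the downloadable source: SpeExperiment and RuleMessage are stand-in types, and the WARN level is an assumption taken from the sample output shown in the wiring section.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-in for the Experiment class of the sample data model.
class SpeExperiment {
    final int id;
    final String name;

    SpeExperiment(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Stand-in for ValidatorMessage.
class RuleMessage {
    final String text;
    final String level;

    RuleMessage(String text, String level) {
        this.text = text;
        this.level = level;
    }
}

class ExperimentNameRuleSketch {

    // This rule only knows how to validate experiments.
    boolean canCheck(Object object) {
        return object instanceof SpeExperiment;
    }

    // Reports one message when the name is missing or blank.
    Collection<RuleMessage> check(Object object) {
        List<RuleMessage> messages = new ArrayList<>();
        SpeExperiment experiment = (SpeExperiment) object;
        if (experiment.name == null || experiment.name.trim().isEmpty()) {
            messages.add(new RuleMessage(
                    "Experiment id:" + experiment.id + " doesn't have a name.", "WARN"));
        }
        return messages;
    }

    public static void main(String[] args) {
        ExperimentNameRuleSketch rule = new ExperimentNameRuleSketch();
        for (RuleMessage message : rule.check(new SpeExperiment(3, "  "))) {
            System.out.println(message.level + ": " + message.text);
        }
    }
}
```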


2. Writing a rule that does use Ontologies


Now let's write a rule that reports the following inconsistencies :

  • If the molecule type is protein (SPE:0326), then if the sequence is defined it has to be composed of amino acid only.
  • If the molecule type is nucleic acid or one of its child terms, then if the sequence is defined it has to be composed of nucleic acids only.
  • If the molecule type is ribonucleic acid or one of its child terms, then if the sequence is defined it has to be composed of ribonucleic acids only.
  • If the molecule doesn't have a sequence (unless it is a small molecule), we report a low severity (INFO) message.

Here is the rule implementing these constraints:



Please note that in order to keep this code sample concise, we have removed the import section. Please download the full source code if you want to get the complete version.
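Independently of the original sample, the four constraints listed above can be approximated in a self-contained sketch. Everything here is an assumption made for illustration: the ontology is replaced by a tiny hard-coded child-to-parent map keyed by term name, "messenger RNA" is a made-up child term, the accepted alphabets (20 standard amino acids, ACGTU, ACGU) are a simplification, and the ERROR level for composition problems is a guess (the text only fixes INFO for a missing sequence).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SequenceRuleSketch {
    // Hypothetical child -> parent links standing in for the molecule type ontology.
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("ribonucleic acid", "nucleic acid");
        PARENT.put("messenger RNA", "ribonucleic acid");
    }

    // True when type is the ancestor itself or one of its descendants.
    static boolean isA(String type, String ancestor) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    static List<String> check(String type, String sequence) {
        List<String> messages = new ArrayList<>();
        if (sequence == null || sequence.isEmpty()) {
            // Missing sequences are only worth an INFO, and not at all for small molecules.
            if (!isA(type, "small molecule")) {
                messages.add("INFO: molecule of type '" + type + "' has no sequence.");
            }
            return messages;
        }
        String s = sequence.toUpperCase();
        if ("protein".equals(type) && !s.matches("[ACDEFGHIKLMNPQRSTVWY]+")) {
            messages.add("ERROR: protein sequence is not composed of amino acids only.");
        }
        if (isA(type, "nucleic acid") && !s.matches("[ACGTU]+")) {
            messages.add("ERROR: nucleic acid sequence is not composed of nucleic acids only.");
        }
        if (isA(type, "ribonucleic acid") && !s.matches("[ACGU]+")) {
            messages.add("ERROR: RNA sequence is not composed of ribonucleic acids only.");
        }
        return messages;
    }

    public static void main(String[] args) {
        System.out.println(check("protein", "MKT"));
        System.out.println(check("ribonucleic acid", "ACGT"));
        System.out.println(check("protein", null));
    }
}
```

In a real rule, isA would be answered by the ontology access layer rather than a hard-coded map.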



C. Configuring Your Set of Object Rules


1. The Object Rules Schema


2. Example of rule set for the two rules defined above






Validator Tutorial: How to Build CV Mapping Rules?


Previous: Ontologies

Next: How to Build Your Own Rules ?


In this section we are going to see how one can simply define a direct mapping between a data model and a set of ontologies/controlled vocabularies.

A. Defining how the model is supposed to relate to the ontologies

This is a crucial step in the design of your mapping rules as you are going to define which part of the data model is going to map to which specific part of the ontologies or controlled vocabularies.

B. Formalizing this binding in a configuration file

b1. Format of the configuration file

XSD available here.

Definition of the attributes of each element:


  • modelName - Name of the PSI data exchange schema, e.g. mzML, GelML, MIF.
  • modelURI - URI of the data exchange schema.
  • modelVersion - Version number of the model supported by the CvMapping file.


  • cvIdentifier - Short label for the CV or namespace, this should correspond to a cvIdentifier attribute of CvTerm in the CvSourceList configuration file.
  • cvName - Full descriptive name for the CV.


  • id - Unique identifier for this rule in the scope of the current configuration file. Identifiers are alphanumerical.
  • name - A short name for this rule. This may be used for error reporting.
  • scopePath - Element scope in the schema within which the non repeatable (isRepeatable = FALSE) condition applies.
  • cvElementPath - The full XPath expression that defines the part of the data model we are mapping.
  • cvTermsCombinationLogic - Boolean operator describing the combination logic of multiple CvTerm elements associated with the same CvMappingRule.
  • requirementLevel - When the XML element exists in the instance data file, the requirement level indicates whether the association with CV terms is optional (MAY), recommended (SHOULD) or mandatory (MUST).


  • cvIdentifierRef - Internal reference (e.g. namespace abbreviation) to a term source file as defined in a CvReference element.
  • termAccession - CV term accession number as in the CV file.
  • termName - CV term name.
  • useTermName - Boolean to set whether the check is done on the termName (TRUE) or on the termAccession (FALSE and default).
  • useTerm - This attribute indicates whether the term itself can be used to annotate data (TRUE) or not (FALSE). This latter case may happen when a term, parent of valid terms for annotation, is mentioned to keep the mapping concise.
  • allowChildren - This attribute indicates whether the children of the described term are allowed to annotate data (TRUE) or not (FALSE).
  • isRepeatable - Value is 'True' when a term can be repeated in the same instance of the associated XML element.
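The interplay of useTerm and allowChildren is easier to see in code. The sketch below is one possible reading of the two attributes, not the framework's implementation: CvTermSketch and TermMatcherSketch are hypothetical, the EX: accessions and their hierarchy are made up, and "children" is assumed to mean any descendant.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory view of one CvTerm element of a mapping rule.
class CvTermSketch {
    final String accession;
    final boolean useTerm;
    final boolean allowChildren;

    CvTermSketch(String accession, boolean useTerm, boolean allowChildren) {
        this.accession = accession;
        this.useTerm = useTerm;
        this.allowChildren = allowChildren;
    }
}

class TermMatcherSketch {
    // Made-up child -> parent links: EX:0003 is a child of EX:0002, itself a child of EX:0001.
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("EX:0002", "EX:0001");
        PARENT.put("EX:0003", "EX:0002");
    }

    // True when accession sits strictly below ancestor in the hierarchy.
    static boolean isDescendant(String accession, String ancestor) {
        for (String p = PARENT.get(accession); p != null; p = PARENT.get(p)) {
            if (p.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // The term itself is accepted only if useTerm is set,
    // its descendants only if allowChildren is set.
    static boolean matches(CvTermSketch term, String value) {
        if (value.equals(term.accession)) {
            return term.useTerm;
        }
        return term.allowChildren && isDescendant(value, term.accession);
    }

    public static void main(String[] args) {
        CvTermSketch parentOnlyAsAnchor = new CvTermSketch("EX:0001", false, true);
        System.out.println(matches(parentOnlyAsAnchor, "EX:0001"));
        System.out.println(matches(parentOnlyAsAnchor, "EX:0003"));
    }
}
```

Setting useTerm to false while allowing children is the "parent mentioned only to keep the mapping concise" case described in the attribute list.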


Sample configuration file


C. Example of rule definition

Now let's define a toy example on which we will be able to build a sample custom Validator:


In a nutshell, this model describes an experiment containing one or more molecules. Each molecule is characterized by a sequence (if applicable) and a MoleculeType (values taken from an ontology we have defined in an OBO file: molecule-type.obo), and can have zero or more post-translational modifications (values taken from the PSI-MOD ontology).

Here is a graphical representation of the molecule type ontology:

Now let's define some rules based on this data model and express them using the cv mapping.

rule 1: all molecules must have a type that is 'protein' or 'nucleic acid' or one of their child terms

rule 2: if a modification is defined on a molecule, it should be a child term of 'protein modification categorized by amino acid modified' (MOD:01157)
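Before looking at the XML, it may help to see how the wording of the two rules ("must" versus "should") could translate into message severity. The sketch below is an assumption-laden illustration: MappingRuleSketch and RequirementSketch are hypothetical types, the MUST to ERROR and SHOULD to WARN mapping mirrors the sample output of the wiring section, MAY to INFO is a guess, and child-term lookup is left out so only the listed accessions are accepted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

enum RequirementSketch { MAY, SHOULD, MUST }

class MappingRuleSketch {
    final String path;
    final RequirementSketch requirement;
    final Set<String> allowedTerms;

    MappingRuleSketch(String path, RequirementSketch requirement, String... allowedTerms) {
        this.path = path;
        this.requirement = requirement;
        this.allowedTerms = new HashSet<>(Arrays.asList(allowedTerms));
    }

    // Assumed mapping from requirement level to message severity.
    static String levelFor(RequirementSketch requirement) {
        switch (requirement) {
            case MUST:
                return "ERROR";
            case SHOULD:
                return "WARN";
            default:
                return "INFO";
        }
    }

    // OR logic: the value is accepted as soon as it equals one of the listed terms.
    // Child terms are not resolved in this sketch.
    Optional<String> check(String value) {
        if (allowedTerms.contains(value)) {
            return Optional.empty();
        }
        return Optional.of(levelFor(requirement) + ": value '" + value + "' at " + path
                + " matched none of the " + allowedTerms.size() + " specified term(s)");
    }

    public static void main(String[] args) {
        MappingRuleSketch rule1 = new MappingRuleSketch(
                "/molecules/type/@id", RequirementSketch.MUST, "SPE:0326", "SPE:0318");
        MappingRuleSketch rule2 = new MappingRuleSketch(
                "/molecules/modifications/@id", RequirementSketch.SHOULD, "MOD:01157");
        rule1.check("SPE:0328").ifPresent(System.out::println);
        rule2.check("BLA:0000X").ifPresent(System.out::println);
        System.out.println(rule1.check("SPE:0326").isPresent());
    }
}
```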


You can download the complete sample file here: cv-mapping.xml

Note: we have tried to design this component so that it makes the developer's life a little easier when writing XPath expressions. The component automatically verifies that each XPath expression is valid against the instance of the data model submitted; if it is not, a ValidatorMessage is generated to describe the issue and, if possible, suggest a fix. Let's take a look at an example:

We define on the above described model the following XPath expression:


When you run the validator's CV Mapping Rules on an instance of experiment that does have at least one molecule, you would get the following error message:

Could not find property 'molecul' of the xpath expression 'molecul/modifications/@id' (element position: 1) 
in the given object of: net.sf.psi.spe.Experiment - Did you mean 'molecules' ?
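How the framework computes its "Did you mean" hint is not described here. One common way to produce such a suggestion is to pick the known property name with the smallest edit distance to the unknown one; the sketch below does that with a standard Levenshtein distance. PropertySuggestionSketch and the list of known property names are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

class PropertySuggestionSketch {

    // Classic Levenshtein distance computed row by row.
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] current = new int[b.length() + 1];
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            previous = current;
        }
        return previous[b.length()];
    }

    // Returns the known name closest to the unknown one, or null if the list is empty.
    static String suggest(String unknown, List<String> known) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : known) {
            int d = distance(unknown, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> experimentProperties = Arrays.asList("id", "name", "molecules");
        System.out.println("Did you mean '" + suggest("molecul", experimentProperties) + "' ?");
    }
}
```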





Validator Tutorial: Getting access to the needed Ontologies and Controlled vocabularies


Previous : How to write your own validator ?

Next : CV Mapping Rules

In this section, we are going to see in more detail how one can deal with ontologies and controlled vocabularies in the validator framework. To start with, you have to define which ontologies will be required to validate your data model. It could be any ontology available in the Ontology Lookup Service, or any data in OBO format available on the network or locally.

A. configuration file

- Representation of the schema, as well as the location of the XSD

XSD available at:

1. Description of the attributes

source - Physical source of the CV file or term information. The keywords 'OLS' or 'file' should be used in this attribute and coupled with the appropriate URI. A fully qualified class name is also allowed when the class implements the ontology loader interface and has a public default constructor.
name - Name of the CV as in the PSI CV resource.
identifier - Internal identifier for the CV source to be cross-referenced in the CVTerm instances.
uri - Universal identifier of the CV resource.
format - Describes the CV format. Consistently use the upper-case acronym of the CV language, e.g. 'OBO', 'OWL', or the 'plain text' keyword when applicable.
version - Version of the OBO format used.

2. Sample file


You can download this sample file here: ontologies.xml


B. Different types of access

The framework currently allows several ways to access a controlled vocabulary or ontology resource.
We are going to describe below some of the facilities provided:

source={OLS, FILE, user-defined-class}

1. File

This is essentially any OBO file that can be found via a URL (http, ftp, file...) or in the classpath of the running application.

a. Using a local file

A local file can be accessed by defining a URL that uses the file protocol. Here is an example:


b. Using a URL

Here is an example of access using the HTTP protocol:


c. Using your classpath

If you have made an OBO file available in your classpath, you can access it by prefixing the URI with classpath:. Here is an example:
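Separately from the configuration itself, it can be useful to picture how a loader might turn the three kinds of location (file, URL, classpath) into something readable. The sketch below is not the framework's code: OboLocationSketch is a hypothetical helper that treats the classpath: prefix specially and delegates everything else to java.net.URL, and its main method only exercises the local file case with a temporary file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OboLocationSketch {
    static final String CLASSPATH_PREFIX = "classpath:";

    // Resolves a configured location to a URL that can be opened as a stream.
    static URL resolve(String location) throws IOException {
        if (location.startsWith(CLASSPATH_PREFIX)) {
            String resource = location.substring(CLASSPATH_PREFIX.length());
            // ClassLoader.getResource expects a name without a leading slash.
            while (resource.startsWith("/")) {
                resource = resource.substring(1);
            }
            URL url = OboLocationSketch.class.getClassLoader().getResource(resource);
            if (url == null) {
                throw new IOException("Not found on the classpath: " + resource);
            }
            return url;
        }
        // file:, http: and ftp: locations are handled by the standard URL machinery.
        return URI.create(location).toURL();
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway local file so the example does not depend on the network.
        Path local = Files.createTempFile("molecule-type", ".obo");
        Files.write(local, "format-version: 1.2\n".getBytes(StandardCharsets.UTF_8));

        URL url = resolve(local.toUri().toString());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            System.out.println(url.getProtocol() + " -> " + reader.readLine());
        }
        Files.delete(local);
    }
}
```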


2. Ontology Lookup Service

As of May 23rd 2008, OLS has integrated 61 ontologies and 720,114 terms, among which one can access GO, PSI-MI, PSI-MS, PSI-MOD... The Ontology Manager module is provided with an implementation that uses OLS to access ontologies and controlled vocabularies.

Please note that when using OLS, the URI of the source is not mandatory, as OLS relies on the source's identifier to access the data. A complete list of all supported identifiers can be found on the OLS web site.


3. Writing your own implementation of OntologyAccess

Currently, only the OBO format is supported. Should one of the ontologies or controlled vocabularies you use not be supported, you can extend the functionality of the Ontology Manager.
You can write your own class that implements OntologyAccess.
Now let's say you have implemented an OWL access in the following class:
You can then declare a new CvSource using it as follows:


Obviously, the compiled class OwlAccess would have to be in the classpath when running the validator.
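The requirement of a fully qualified class name with a public default constructor suggests that such a class is instantiated by reflection. The sketch below shows that mechanism in isolation, under stated assumptions: OntologyAccessSketch and its single method are invented stand-ins for the real interface, OwlAccessStub does nothing useful, and SourceLoaderSketch is a hypothetical loader.

```java
import java.util.Collections;
import java.util.Set;

// Invented stand-in for the ontology access interface.
interface OntologyAccessSketch {
    Set<String> childrenOf(String accession);
}

// Stub implementation with the required public default constructor.
class OwlAccessStub implements OntologyAccessSketch {
    public OwlAccessStub() {
    }

    public Set<String> childrenOf(String accession) {
        return Collections.emptySet();
    }
}

class SourceLoaderSketch {

    // Instantiates the class named in the configuration, after checking
    // that it really implements the expected interface.
    static OntologyAccessSketch load(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        if (!OntologyAccessSketch.class.isAssignableFrom(type)) {
            throw new IllegalArgumentException(
                    className + " does not implement " + OntologyAccessSketch.class.getName());
        }
        return (OntologyAccessSketch) type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        OntologyAccessSketch access = load(OwlAccessStub.class.getName());
        System.out.println(access.getClass().getName() + " loaded, children: "
                + access.childrenOf("EX:0001"));
    }
}
```

Class.forName only finds classes that are on the classpath, which is why the compiled class has to be available when the validator runs.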




Validator Tutorial: How to Write Your Own Validator


Previous: Overview

Next : Ontologies

In this section, we are going to give more information about what you should consider if you are planning to write your own validator.

a. Requirements


  • Java 5 or above
  • Maven 2 or above. This is not, per se, a mandatory requirement, but as we have developed the framework with it, there are many advantages to be gained. Should you choose not to use it, please be aware that we have made available a version of the validator framework on SourceForge that contains all necessary dependencies.
  • A Java IDE to ease development (in this tutorial I will mostly refer to IntelliJ 13.x)


b. Defining your needs

Here are a few questions you could ask yourself before going any further:

What is to be checked on ?
What part of my data model ?

Am I using ontologies and controlled vocabularies ?
What ontologies and controlled vocabularies is my model using ?
Are these ontologies available in OBO format ?
Are these available in the Ontology Lookup Service ? On the Internet ? On your local computer ?

Anything else you need to check on ?
How would I proceed to validate it, how can I implement it ?


In the following sections we are going to define more precisely how to use the various components of the validator.




Validator Tutorial: Introduction

Overview of the architecture

The PSI Validator is a framework that allows one to validate data against a set of rules. These rules can define how controlled vocabularies and ontologies are used, but can also be arbitrary rules defined and implemented by the developer of a specific instance of a validator.

PDF version of this tutorial.

A. Bird's Eye View


B. Technologies and Requirements

The validator framework was written in Java and uses Maven 2 as build system. The configuration of the framework is mostly done using XML files. Should you wish to write your own validator, the following requirements apply:

  • Java 5 and higher
  • Maven 2 and higher (if you wish to take advantage of existing infrastructure)
  • A data model written in Java (this is the data you are going to validate) you can also use our sample data model to try the validator out.

C. Validator's Components

The validator is built in a component-oriented manner. Here is a short description of the major ones:

  • Controlled vocabularies and Ontologies access: this module is meant to give a unified access to Controlled vocabularies and Ontologies (whether they are available locally or remotely) via a simple API.
  • The Controlled Vocabulary Mapping Rules define how controlled vocabularies and ontologies are used in a specific data model. By means of XPath-like expressions, one can define which ontology terms are allowed at a specific location of a data model.
  • The User Defined Rules are defined and implemented by the validator's developer when mapping rules cannot perform the desired validation. These rules have access to the controlled vocabularies and ontologies, and their complexity can potentially be much higher as YOU are coding them.

Below is a simple comparison of the 2 kinds of rules a validator can be built upon:

D. Flow of a Validation

Once you have a data model and a validator consisting of a set of rules, you can run your first validation. Here we define step by step how this is done:

  1. The data model is submitted to the validator. Usually one would rather submit specific objects to validate than the whole data model at once. However, every data model is different and the granularity of the objects defined in it varies accordingly. Consequently, you have to define for yourself what your unit of work is going to be (e.g. a car in a car factory, a molecular interaction in a proteomics experiment, ...).
  2. If any CV mapping rules have been defined, the validator first runs an internal validation on them and potentially removes all those that are not valid. The model provided is then checked against the remaining rules. Messages are returned should a validation exception occur; each message includes a description of the issue and a level of severity (one of DEBUG, INFO, WARN, ERROR, FATAL).
  3. If any user-defined rules have been defined, the validator runs each of them on the data model and, here again, messages can be generated upon exceptions.
  4. All messages are returned to the user, who is then free to process them.
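As an example of what "processing" the returned messages can mean, the sketch below keeps only the messages at or above a chosen severity. The five level names come from step 2. Treating their listed order as increasing severity is an assumption, and SeveritySketch, LeveledMessage and MessageFilterSketch are hypothetical types, not the framework's own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Levels in the order given in the text, assumed to go from least to most severe.
enum SeveritySketch { DEBUG, INFO, WARN, ERROR, FATAL }

class LeveledMessage {
    final SeveritySketch level;
    final String text;

    LeveledMessage(SeveritySketch level, String text) {
        this.level = level;
        this.text = text;
    }

    @Override
    public String toString() {
        return level + ": " + text;
    }
}

class MessageFilterSketch {

    // Keeps the messages whose level is at least as severe as the threshold.
    static List<LeveledMessage> atLeast(Collection<LeveledMessage> messages,
                                        SeveritySketch threshold) {
        List<LeveledMessage> kept = new ArrayList<>();
        for (LeveledMessage message : messages) {
            if (message.level.compareTo(threshold) >= 0) {
                kept.add(message);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<LeveledMessage> all = Arrays.asList(
                new LeveledMessage(SeveritySketch.INFO, "Molecule has no sequence."),
                new LeveledMessage(SeveritySketch.WARN, "Experiment id:3 doesn't have a name."),
                new LeveledMessage(SeveritySketch.ERROR, "Molecule type is not an allowed term."));
        for (LeveledMessage message : atLeast(all, SeveritySketch.WARN)) {
            System.out.println(message);
        }
    }
}
```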

This tutorial will now take you through the steps required to build a Validator and is organised as follows:

  1. How to write your own validator ?
  2. Getting access to the needed Ontologies and Controlled vocabularies
  3. Building rules to map ontologies and controlled vocabulary terms to your domain model
  4. Building your own rules
  5. Wiring it together: build your validator and run it on sample data
  6. Download Validator's tutorial source code


E. Contact

Should you have any further questions about the Validator, please send an email to mayerg97 [at] rub [dot] de


Development of standards-compliant tools for molecular interaction data management



The outcome of this meeting was published here.


Molecular interaction data is a key resource in modern biomedical research, but interactions are often presented in different data formats and managed in different protocols. The HUPO Proteomics Standards Initiative has developed the PSI MI XML format, a community standard for molecular interactions that is now widely implemented. The aim of this workshop is to provide intensive hands-on training in the development of standards-compliant software for molecular interaction data management.


The workshop will take place at the European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, Cambridgeshire, UK. Thanks to generous funding by the European Science Foundation, participation in the meeting is free, including accommodation. However, you will have to pay for your travel expenses.

Signup for participation is now closed.

For travel information please see 

Scientific Summary

Molecular interaction data is a key resource in modern biomedical research, and molecular interaction datasets are currently generated on a large scale, demonstrating from one to tens of thousands of interactions per experiment. These interaction data sets are represented in many different forms, from simple pairs of protein names to detailed textual descriptions, and are collected in various databases, each with their own database schema. In 2004, the HUPO Proteomics Standards Initiative developed and published the PSI-MI XML format for molecular interactions as a community format for the exchange of protein interaction data. This format had been jointly developed by major producers of protein interaction data and by data providers, among them BIND, DIP, IntAct, MINT, MIPS and Hybrigenics. The PSI-MI XML format is now widely implemented and supported by both software tool development and data providers.

However, practical implementation of the standard often differs from resource to resource, sometimes resulting in actually invalid XML, sometimes only resulting in minor discrepancies in the use of attributes. In addition, a number of additional conventions and protocols leveraging the PSI MI format are currently under development, in particular the PSI Common Scoring Framework (PSISCORE) and the PSI Common Query Interface (PSICQUIC). The aim of the workshop is to facilitate and co-ordinate the precise and efficient use and implementation of the PSI MI standard for molecular interactions, as well as related standards and protocols. The expected result is a harmonized provision of molecular interaction data through participating databases, both in the form of data files and computational services. These interfaces, in turn, will be used by analysis tools like interaction confidence scoring systems or Cytoscape*. The invited participants reflect this aim: the list includes bioinformaticians from key molecular interaction databases and tool developers.

The meeting will be located in the IT training room of the European Bioinformatics Institute. The focus will be on actual collaborative software development, using concepts from the extreme programming paradigm.


Provisional Meeting Program

Sunday, November 16, 2008

13:00 Introduction – Henning Hermjakob, EBI, UK

13:20 The PSI MI XML 2.5 standard – Arnaud Ceol, U Tor Vergata, Rome, Italy

13:40 PSI tools: Java API, validator – Samuel Kerrien, EBI, UK

14:00 PSI tools: RpsiXML - Jitao David Zhang, DKFZ, Germany

14:20 The PSICQUIC common query interface – Bruno Aranda, EBI, UK

14:45 Coffee

15:00 Definition of development targets and developer teams

          Hands-on software specification/development in small groups

17:00 Short status summary

18:00 Hinxton Hall Bar open (self-funding)

19:00 Meeting dinner Hinxton Hall.


Monday, November 17, 2008

09:15 Cytoscape as a web services client - Keiichiro Ono, UCSD, US

10:00 The PSISCORE molecular interaction confidence scoring framework – Hagen Blankenburg, MPI, Germany

10:20 Definition of development targets and developer teams

          Hands-on software specification/development in small groups

12:30 Lunch

13:30 Hands-on software development in small groups

15:00 Coffee

17:00 Short status summary

19:00 Meeting dinner, Red Lion, Hinxton

Tuesday, November 18, 2008

09:15 Definition of development targets and developer teams

           Hands-on software development training in small groups

12:30 Lunch

13:30 Hands-on software development in small groups

15:00 Coffee

17:00 Short status summary


Wednesday, November 19, 2008

09:15 Definition of development targets and developer teams

           Hands-on software development training in small groups

12:30 Lunch

13:30 Summary session: Results and further planning of development and dissemination (publications, updates of

15:00 Meeting end



The optimum results at the end of the workshop would be:

  • The PSICQUIC protocol is formally defined and approved.
  • The participating databases have PSICQUIC implemented
  • Cytoscape can connect to all the databases simultaneously, query for a given list of proteins, and visualize the joint data contents of these databases
  • The PSISCORE protocol is formally defined and approved.
  • The participating resources have PSISCORE implemented.

Background material

Familiarity with the PSI MI 2.5 standard would be highly advantageous: 

Kerrien S, Orchard S, Montecchi-Palazzi L, Aranda B, Quinn AF, Vinod N, Bader GD, Xenarios I, Wojcik J, Sherman D, Tyers M, Salama JJ, Moore S, Ceol A,
Chatr-Aryamontri A, Oesterheld M, Stümpflen V, Salwinski L, Nerothin J, Cerami E, Cusick ME, Vidal M, Gilson M, Armstrong J, Woollard P, Hogue C, Eisenberg D,
Cesareni G, Apweiler R, Hermjakob H.: Broadening the horizon--level 2.5 of the HUPO-PSI format for molecular interactions.
BMC Biol. 2007 Oct 9;5:44. PMID: 17925023


The workshop is generously funded by the European Science Foundation.



Molecular Interactions

Several well-established public databases for protein-protein interaction data exist, including BIND, DIP, IntAct, MINT and MIPS. However, the data were originally provided in many different, incompatible formats.

The PSI Molecular Interaction work group develops and maintains a common data standard that allows users to retrieve all relevant data from different sites and perform comparative analysis of different data sets in a consistent format. The standard comprises an XML format and detailed controlled vocabularies, and allows a detailed representation of a fully curated interaction record.

For a detailed documentation of the PSI Molecular Interaction standard, please see this page and/or the Nature Biotechnology paper.


Molecular Interaction XML Format: schema changes from version 1.0 to 2.5

HUPO Proteomics Standards Initiative Protein Interaction Specification Documentation 

Proteomics Standards Initiative

Molecular Interaction XML Format 2.5

Documentation of schema changes from version 1.0 to 2.5

December 2005


Significant changes have been made from MIF 1.0 to 2.5. The overall aims were to

  • increase the expressive power of the format, going from “intersection” to “union” of molecular interaction annotation, allowing inter-database exchange of fully annotated records;
  • perform a cleanup and fix minor bugs.


Format changes

Although MIF 2.5 has many changes and is significantly more complex than 1.0, nearly all changes are optional additions, and the minimal representation of a given interaction has not been significantly expanded. In detail, the changes are:


  1. <experimentDescription>
    1. bibref is now mandatory. Submissions are considered bibrefs.
    2. confidence is now a confidenceList, allowing multiple confidence values.
  2. <interaction>
    1. added id attribute
    2. added interaction/parameterList element to describe especially kinetic parameters.
    3. added an optional flag "negative" with default value false. If set to "true", it indicates that the interaction has explicitly been described as NOT being observed in the experiment.
    4. added inferredInteractionList to allow correct description of complex topology, with supporting experimental evidence.
    5. imexId has been added for the purpose of the IMEx molecular interaction exchange consortium.
  3. <proteinInteractor> and <proteinParticipant>
    1. To enable representation of more general interactions, not only protein interactions, proteinParticipant and proteinInteractor have been renamed participant and interactor, respectively, and an element interactorType, controlled by a new controlled vocabulary, has been added. This allows a high flexibility for representation of general molecular interactions.
  4. <interactor>
    1. added modelled flag: If true, it describes an interaction in a species of interest, e.g. human, that has actually been investigated in another organism, e.g. mouse. The transfer will usually be based on a homology statement made by the data producer. If this optional element is missing, it is assumed to be set to false.
    2. added intraMolecular flag: If true, it is an intramolecular interaction, e.g. an autophosphorylation. If missing, this element is assumed to be false.
  5. <participant>
    1. added id attribute
    2. The addition of participant/interactionRef allows the representation of hierarchical structure in complexes, e.g. composition of a receptor complex from subunits, and interaction of such a receptor complex with a ligand.
    3. participantIdentificationList has been added to allow the description of the participant identification method on a per-participant basis, not only a global method on the experiment level.
    4. The previous “role” attribute has been split into the biological role, e.g. enzyme/target, and the experimental role, e.g. bait/prey.
    5. New element experimentalFormList has been added to allow description of experimental forms, e.g. protein tags
    6. ExperimentalInteractorList allows the representation of homology-based deductions made by the data provider. For example, an experimentalist might work with mouse proteins to make a statement on a human system. In this case, the experimentally used protein would be stored in an experimentalInteractor element, the human protein would be stored in the normal participant. On <interaction> level, the flag <modelled> should be set.
    7. The <confidence> element has been extended into a <confidenceList>.
  6. <feature>
    1. added id attribute
    2. renamed featureDescription -> featureType
    3. renamed featureLocation -> featureRange
    4. added ExperimentRefList. This allows reference to one or more experiments in which the feature has been determined.
    5. added a <names> element
    6. added a <attributeList> element, to allow handling of free text feature description.
  7. <featureRange>
    1. each feature now has a list of range elements, to allow representation of discontinuous features, e.g. structural domains.
    2. the range element has been restructured to allow fuzzy locations, and start/end ranges.
    3. site and position have been removed
  8. Administrative changes
    1. id attributes
      1. The type of id attributes has been changed from xs:ID to xs:int. xs:ID requires that any id is unique in the file. This was incompatible with the denormalised form of MIF 1.0, where e.g. the same protein may be listed more than once.
        Ids are now defined to be arbitrary integers, unique to each object within an <entry>.
        The type xs:int has been chosen to provide an easy mapping to standard data types, as it provides a limited range of integers, while xs:integer represents the mathematical concept of integers with an unlimited value range.
      2. All major objects now have an id attribute, namely <experiment>, <interaction>, <interactor>, <participant>, <feature>.
    2. Method-related elements have been renamed for clarity:
      1. ParticipantDetection -> ParticipantIdentificationMethod
      2. InteractionDetection -> InteractionDetectionMethod
      3. FeatureDetection -> FeatureDetectionMethod
    3. namesType extended by addition of an optional list of aliases.
    4. Ordered sequence of standard elements. New order is:
      1. names
      2. bibref
      3. xref
      4. other
      5. attributeList
    5. created a new complex type confidenceType and inserted it in all previous occurrences of confidence elements.
    6. Added attribute/nameAc to allow controlled vocabulary for attributes.
    7. extended xrefType, now allows a controlled vocabulary representation of database xrefs.
    8. For all string attributes and elements, the length has been set to at least 1. This avoids empty attributes and elements, which could cause problems in data exchange.
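The remark on xs:int versus xs:integer in the list above can be made concrete for Java consumers of the format. xs:int covers the same range as Java's int (from -2147483648 to 2147483647), whereas an unbounded xs:integer would call for java.math.BigInteger. The helper below is only a sketch written for this note; IdTypeSketch is a hypothetical name.

```java
import java.math.BigInteger;

class IdTypeSketch {

    // True when the lexical value can be stored in a Java int,
    // which has the same value range as xs:int.
    static boolean fitsInt(String lexical) {
        try {
            Integer.parseInt(lexical);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String largestInt = String.valueOf(Integer.MAX_VALUE);
        String tooLarge = BigInteger.valueOf(Integer.MAX_VALUE).add(BigInteger.ONE).toString();

        System.out.println(largestInt + " fits in int: " + fitsInt(largestInt));
        System.out.println(tooLarge + " fits in int: " + fitsInt(tooLarge));
        // An unbounded integer still parses, but needs BigInteger on the Java side.
        System.out.println("as BigInteger: " + new BigInteger(tooLarge));
    }
}
```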


Controlled vocabulary changes

The major change from PSI 1.0 to 2.5 requires a remapping of controlled vocabularies.

Proposed mappings from PSI 1.0 to 2.5 CVs are described in cv-1to25mapping.doc.

The reverse mapping is described in cv-25to1mapping.txt. This file is presented in plain text format to facilitate parsing.


Henning Hermjakob, 21/11/2005


Outcome of the Molecular Interaction ESF Workshop

  Development of standards-compliant tools for molecular interaction data management


The original workshop description is available here:



Workshop Participants (by alphabetical order)



Each entry lists: name, affiliation, working on, use of the PSI-MI standards

Achuthan Premanand

European Bioinformatics Institute

IntAct [1]

Integration of PSI standards in the IntAct services.

Arnaud Ceol

University of Rome Tor Vergata

MINT database [2]

PSI-MI/XML and tab export/import (exchange), web-services

Bruno Aranda

European Bioinformatics Institute

IntAct [1]

Integration of PSI standards in the IntAct services.

Hagen Blankenburg

Max Planck Institute for Informatics

PSI Common Interaction Confidence Scoring System (PSISCORE)


Distributed Annotation System for Molecular Interactions (DASMI) [3]

PSISCORE uses PSI-MI XML and TAB files as the basis for a common confidence scoring system. Users/clients can send these files to distributed scoring servers, which perform calculations, store the resulting scores in the original files, and return them.

Henning Hermjakob

European Bioinformatics Institute

IntAct molecular interaction database
PSI-MI standard development [1]

IntAct data representation and dissemination
Data exchange with IMEx partner databases

Dr Ian Donaldson

Biotechnology Centre of Oslo, University of Oslo

iRefIndex [4]

We are using PSI-MI standards to consolidate and re-distribute interaction data

Dr Iskra Ventseslavova

IEMAM – BAS, Bulgaria

Differentiation of stem and progenitor cells from bone-marrow material and peripheral blood, in activated myeloid and lymphoid cells with anti-malignant properties


Dr Jan Wildenhain

Tyers Lab, The University of Edinburgh

Programmer on the BioGrid database [11]


Javier de Las Rivas

Cancer Research Center (CIC-IBMCC, CSIC/USAL)

PSIMEx [6]

Using PSI-MI standards in APID

Jens Hansen

Helmholtz Center Munich - German Research Center for Environmental Health

High-throughput interaction screen on disease-related proteins

Planned to implement PSI-MI standards in the project

Jitao David Zhang

DKFZ, Heidelberg


Statistical analysis of PPIN

Johannes Goll

J. Craig Venter Institute

MPIDB [8], Literature curation and analysis of interaction data

MPIDB download format


Jules Kessemakers

Radboud University Nijmegen

IntAct interaction validation on the basis of structural information (SIC - Structural Interaction Confirmation) [1]


Keiichiro Ono

University of California, San Diego (US)

Cytoscape [9]

PSI-MI 1.0 & 2.5 support in Cytoscape

Milan Simonovic

Institute of Molecular Biology, University of Zurich


STRING has a web service API which exposes search results in PSI-MI XML format, and the STRING team is committed to implementing the PSICQUIC interface

Sabry Razick

Biotech Center of Oslo, Norway

iRefIndex [4]

Uses PSI-MI as a data source and as a data export format

Samuel Kerrien

European Bioinformatics Institute

IntAct [1]

Development of the PSI-MI standards and implementation in IntAct


[1] IntAct -

[2] MINT -

[3] DASMI -

[4] iRefIndex -

[5] APID and APID2NET -

[6] PSIMEx -

[7] R PSI XML -

[8] MPIDB -

[9] Cytoscape -

[10] String -

[11] BioGrid -




Opening - Henning Hermjakob

The PSI MI XML 2.5 standard – Arnaud Ceol

PSI tools: Java API, validator – Samuel Kerrien

PSI tools: RpsiXML - Jitao David Zhang

The PSICQUIC common query interface – Bruno Aranda


Cytoscape as a web services client - Keiichiro Ono

The PSISCORE molecular interaction confidence scoring framework – Hagen Blankenburg

IRefIndex – Ian Donaldson


Outcome of the workshop

The goals set for this workshop essentially targeted two fronts:


A protocol definition was finalized, resulting in a final Web Service Definition (WSDL), which has been made available here:

To obtain a fully working Web Service, we also had to re-engineer the PSI-MI XML 2.5 schema. The result is version 2.5.4 of the schema, which is not backward compatible with previous versions: the namespace was changed from net:sf:psidev:mi in order to be standards compliant and allow interoperability.
The resulting schema is available here:
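
Because the namespace change breaks backward compatibility, a tool may want to detect which files still use the pre-2.5.4 namespace. The following is a minimal sketch using only the standard Java XML API; the class NamespaceCheck and its method names are hypothetical, and the sample document in main is a bare root element made up for illustration rather than a complete PSI-MI file.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration only; not part of the PSI-MI tools. */
final class NamespaceCheck {

    /** Namespace used by schema versions before 2.5.4, as stated in the text. */
    static final String OLD_NAMESPACE = "net:sf:psidev:mi";

    /** Returns the namespace URI of the root element, or null if it has none. */
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without namespace awareness, getNamespaceURI() would always return null.
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML document", e);
        }
    }

    /** Returns true if the document still uses the pre-2.5.4 namespace. */
    static boolean usesOldNamespace(String xml) {
        return OLD_NAMESPACE.equals(rootNamespace(xml));
    }

    public static void main(String[] args) {
        // Made-up minimal document; a real file would contain full entries.
        String oldStyle = "<entrySet xmlns=\"net:sf:psidev:mi\"/>";
        System.out.println("Uses old namespace: " + usesOldNamespace(oldStyle));
    }
}
```

A check like this only inspects the root element; whether a file is actually valid against a given schema version still has to be established by schema validation.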

A sample Java project was put together to help people wishing to implement a PSIMITAB-based PSICQUIC service get started quickly.
Project home:
More information in the wiki pages:

A session was also held to discuss other features considered desirable in a version 2 of the PSICQUIC interface, namely a Molecular Interaction Query Language that would (as SQL does for databases) allow flexible querying of interaction resources. The outcome of this session was summarized in this document:

At the end of this workshop, the participants committed to delivering the following components:

Future PSICQUIC Data Providers


Aiming at an early January implementation of PSICQUIC


Aiming at an early January implementation of PSICQUIC. It will use the reference MITAB implementation code


Trying to solve some technical issues. Will use existing MITAB open source tools

Proposes writing a MITAB validator


Aiming at an early January implementation of PSICQUIC


Currently cannot commit due to licensing issues


Haven't committed to delivering a service yet


Future PSICQUIC Clients



Creating a basic skeleton for a PSICQUIC client. Aiming at integrating it for release 2.6.2 (possibly early January)



The RpsiXML package is able to digest the WSDL on the server side to output the methods to distribute. There are still bugs/pitfalls when processing the WSDL file in R from the client side; I suppose these will be resolved soon, as the WSDL file gets updated.

For now we only support parsing XML files, so we hope to see PSICQUIC return XML 2.5 soon.

Note: We aim to support MPIDB and IRefIndex as soon as the data is available in XML 2.5 version.



Once a WSDL has been drafted and agreed upon, it will be made available here:

 A preliminary Web Service Definition was put together by Hagen Blankenburg:

 A summary of the outcome of this track is available in this document:

At the end of this workshop, the participants committed to delivering the following components:


Aiming at an early January implementation of PSISCORE.

Hagen proposed to put together a list of controlled vocabulary terms reflecting:

a. Classes of algorithms used to score molecular interactions.

b. Known implementations of such algorithms.


Aiming at implementing a PSISCORE service based on a protein docking algorithm as part of Jules Kessemakers’ traineeship, which will end in March 2009.


Finally, a shared spreadsheet has been put together to give any interested party the opportunity to follow the progress of every participant. The document is available here:

The PSI-MI mailing list should be the principal means of communication between the participants after the workshop. Participants who weren’t registered prior to the meeting have also been invited onto this list.


