Molecular Interaction XML Format: schema changes from version 1.0 to 2.5

HUPO Proteomics Standards Initiative Protein Interaction Specification Documentation 

Proteomics Standards Initiative

Molecular Interaction XML Format 2.5

Documentation of schema changes from version 1.0 to 2.5

December 2005

 

Significant changes have been made from MIF 1.0 to 2.5. The overall aims were to

  • increase the expressive power of the format, going from “intersection” to “union” of molecular interaction annotation, allowing inter-database exchange of fully annotated records;
  • perform a cleanup and fix minor bugs.

 

Format changes

Although MIF 2.5 has many changes and is significantly more complex than 1.0, nearly all changes are optional additions, and the minimal representation of a given interaction has not been significantly expanded. In detail, the changes are:

 

  1. <experimentDescription>
    1. bibref is now mandatory. Submissions are considered bibrefs.
    2. confidence is now a confidenceList, allowing multiple confidence values.
  2. <interaction>
    1. added id attribute
    2. added interaction/parameterList element to describe parameters of the interaction, in particular kinetic parameters.
    3. added an optional flag "negative" with default value false. If set to "true", it indicates that the interaction has explicitly been described as NOT being observed in the experiment.
    4. added inferredInteractionList to allow correct description of complex topology, with supporting experimental evidence.
    5. imexId has been added for the purpose of the IMEx molecular interaction exchange consortium.
  3. <proteinInteractor> and <proteinParticipant>
    1. To enable representation of more general interactions, not only protein interactions, proteinParticipant and proteinInteractor have been renamed participant and interactor, respectively, and an element interactorType, controlled by a new controlled vocabulary, has been added. This allows great flexibility in the representation of general molecular interactions.
  4. <interactor>
    1. added modelled flag: If true, the interaction is described for a species of interest, e.g. human, but has actually been investigated in another organism, e.g. mouse. The transfer will usually be based on a homology statement made by the data producer. If this optional element is missing, it is assumed to be set to false.
    2. added intraMolecular flag: If true, it is an intramolecular interaction, e.g. an autophosphorylation. If missing, this element is assumed to be false.
  5. <participant>
    1. added id attribute
    2. The addition of participant/interactionRef allows the representation of hierarchical structure in complexes, e.g. composition of a receptor complex from subunits, and interaction of such a receptor complex with a ligand.
    3. participantIdentificationList has been added to allow the description of the participant identification method on a per-participant basis, not only a global method on the experiment level.
    4. The previous “role” attribute has been split into the biological role, e.g. enzyme/target, and the experimental role, e.g. bait/prey.
    5. New element experimentalFormList has been added to allow description of experimental forms, e.g. protein tags.
    6. experimentalInteractorList allows the representation of homology-based deductions made by the data provider. For example, an experimentalist might work with mouse proteins to make a statement on a human system. In this case, the experimentally used protein would be stored in an experimentalInteractor element, and the human protein would be stored in the normal participant. At the <interaction> level, the flag <modelled> should be set.
    7. The <confidence> element has been extended into a <confidenceList>.
  6. <feature>
    1. added id attribute
    2. renamed featureDescription -> featureType
    3. renamed featureLocation -> featureRange
    4. added experimentRefList. This allows a feature to refer to one or more experiments in which it has been determined.
    5. added a <names> element
    6. added an <attributeList> element, to allow handling of free text feature description.
  7. <featureRange>
    1. each feature now has a list of range elements, to allow representation of discontinuous features, e.g. structural domains.
    2. the range element has been restructured to allow fuzzy locations, and start/end ranges.
    3. site and position have been removed
  8. Administrative changes
    1. id attributes
      1. The type of id attributes has been changed from xs:ID to xs:int. xs:ID requires that any id is unique in the file. This was incompatible with the denormalised form of MIF 1.0, where e.g. the same protein may be listed more than once.
        Ids are now defined to be arbitrary integers, unique to each object within an <entry>.
        The type xs:int has been chosen to provide an easy mapping to standard data types, as it provides a limited range of integers, while xs:integer represents the mathematical concept of integers with an unlimited value range.
      2. All major objects now have an id attribute, namely <experiment>, <interaction>, <interactor>, <participant>, <feature>.
    2. Method-related elements have been renamed for clarity:
      1. ParticipantDetection -> ParticipantIdentificationMethod
      2. InteractionDetection -> InteractionDetectionMethod
      3. FeatureDetection -> FeatureDetectionMethod
    3. namesType extended by addition of an optional list of aliases.
    4. Ordered sequence of standard elements. New order is:
      1. names
      2. bibref
      3. xref
      4. other
      5. attributeList
    5. created a new complex type confidenceType and inserted it in all previous occurrences of confidence elements.
    6. Added attribute/nameAc to allow controlled vocabulary for attributes.
    7. extended xrefType, now allows a controlled vocabulary representation of database xrefs.
    8. For all string attributes and elements, the length has been set to at least 1. This avoids empty attributes and elements, which could cause problems in data exchange.
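
The id rules in the administrative changes above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not part of the specification: the sample document is invented, XML namespaces are ignored, and the set of id-bearing element names (ID_BEARING) is taken from the element names used in this document. The helper names is_xs_int and ids_by_entry are hypothetical. The sketch checks that an id fits the 32-bit range of xs:int and collects ids separately for each <entry>, so that the same id value may occur in different entries.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Value range of xs:int (32-bit signed); xs:integer has no such bounds.
XS_INT_MIN, XS_INT_MAX = -2**31, 2**31 - 1

# Assumption: element names as used in this document, without namespace.
ID_BEARING = {"experimentDescription", "interaction", "interactor",
              "participant", "feature"}


def is_xs_int(text):
    """Return True if text is a decimal integer within the xs:int range."""
    text = text.strip()
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    return XS_INT_MIN <= int(text) <= XS_INT_MAX


def ids_by_entry(root):
    """For each <entry>, map every id value to the tags that carry it."""
    result = []
    for entry in root.iter("entry"):
        seen = defaultdict(list)
        for elem in entry.iter():
            if elem.tag in ID_BEARING and "id" in elem.attrib:
                seen[elem.get("id")].append(elem.tag)
        result.append(dict(seen))
    return result


# Invented sample: id "1" is reused across entries, which the per-entry
# scope permits; the last id is one above the largest xs:int value.
SAMPLE = """
<entrySet>
  <entry>
    <interactor id="1"/>
    <interaction id="2"><participant id="3"/></interaction>
  </entry>
  <entry>
    <interactor id="1"/>
    <interactor id="2147483648"/>
  </entry>
</entrySet>
"""

root = ET.fromstring(SAMPLE)
for number, ids in enumerate(ids_by_entry(root), start=1):
    for value, tags in ids.items():
        status = "ok" if is_xs_int(value) else "outside xs:int"
        print(number, value, tags, status)
```

The sketch deliberately only reports which elements share an id within an entry. Whether a repeated id is an error depends on the form of the file, since a denormalised file may list the same object more than once.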

 

Controlled vocabulary changes

The major change from PSI 1.0 to 2.5 requires a remapping of controlled vocabularies.

Proposed mappings from PSI 1.0 to 2.5 CVs are described in cv-1to25mapping.doc.

The reverse mapping is described in cv-25to1mapping.txt. This file is presented in plain text format to facilitate parsing.

 


Henning Hermjakob, 21/11/2005
