Controlled Vocabularies
Released October, 2006
Last maintenance update, October 2017
Table of Contents
- Introduction
- OBO PSI CVs by working groups
- Recommendation for PSI CVs
- Common PSI CVs
- Mapping between exchange schema and CVs
- Maintenance procedure
- Further information and relevant links
- References
Introduction
The Controlled Vocabularies (CVs) of the Proteomics Standards Initiative (PSI) provide a consensus annotation system that standardizes the meaning, syntax and formalism of terms used across proteomics, as required by the PSI working groups. Each PSI working group develops the CVs required by the technology or data type it aims to standardize, following common recommendations for development and maintenance. At the PSI meeting in Washington (Science 296, 827), it was decided that all PSI working groups should adopt the same CVs to standardize some overlapping concepts (units and resources). Finally, we propose a common mapping schema that describes, for each data file schema, the associations between its specific elements and the PSI CVs or other external ontology resources. Such mappings support the validation of XML files.
OBO PSI CVs by working groups
- PSI-MI CV is available here; send comments to psi-mi@ebi.ac.uk
- PSI-MS CV, supporting the mzML, TraML, mzIdentML, mzQuantML and mzTab schemas, is available here; send comments to psidev-ms-vocab@lists.sourceforge.net
- XLMOD CV is available here
- PSI-MOD CV is available here and documented here; send comments to psidev-mod-vocab@lists.sourceforge.net
- PSI-GEL is supported by the SEP ontology, documented here; the CVs are available here, and comments can be sent to psidev-gps-dev@lists.sourceforge.net
Recommendation for PSI CVs
The recommendations for the creation and maintenance of PSI CVs are defined in the Guidelines for the development of Controlled Vocabularies. Please send comments to psidev-onto-dev@lists.sourceforge.net
Common PSI CVs
Developing CVs is a process of collecting and, if necessary, defining terms. Every effort must be made to adopt and re-use existing ontologies or CVs where they exist, to avoid “re-inventing the wheel”. As stated by the OBO Foundry, “we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies”. It is therefore recommended to represent the following concepts as described.
Units: It is RECOMMENDED to use the Unit ontology and to contribute to it by requesting required terms via its mailing list.
Chemical Entities: For the representation of chemical entities it is RECOMMENDED to use terms from Chemical Entities of Biological Interest (ChEBI). ChEBI is also available from the OBO Foundry website.
Phenotypic quality: For the representation of phenotypic quality (e.g. age, color, shape) it is RECOMMENDED to use terms from the quality ontology and to request any missing term via the dedicated mailing list.
As a common reference system for databases, the MIRIAM Registry is recommended.
Mapping between exchange schema and CVs
Mapping the exchange schema elements supported by CVs through a common mechanism would greatly increase the cross-compatibility of the PSI resources and facilitate the joint development of data validation tools. Therefore we propose a simple XML schema that provides a pattern for writing a mapping file (see also the documentation and an example file for the MI WG). A validation tool based on this mapping documentation is being developed for both the MI and MS WGs.
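The idea of validating a data file against a schema-to-CV mapping can be illustrated with a minimal Python sketch. It does not use the PSI mapping schema itself: the rule table, the element names, the `accession` attribute and the `EX:` accessions are all hypothetical placeholders chosen only to show the principle.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: element path -> CV accessions allowed there.
# A real PSI mapping file is written in XML; this dict only mimics the idea.
MAPPING_RULES = {
    "./interaction/interactionType": {"EX:0001", "EX:0002"},
    "./interaction/detectionMethod": {"EX:0010"},
}

# Hypothetical data file fragment using made-up element names.
SAMPLE = """
<entry>
  <interaction>
    <interactionType accession="EX:0001"/>
    <detectionMethod accession="EX:9999"/>
  </interaction>
</entry>
"""


def validate(xml_text, rules):
    """Return (path, accession) pairs whose accession is not allowed."""
    root = ET.fromstring(xml_text)
    violations = []
    for path, allowed in rules.items():
        for element in root.findall(path):
            accession = element.get("accession")
            if accession not in allowed:
                violations.append((path, accession))
    return violations


if __name__ == "__main__":
    for path, accession in validate(SAMPLE, MAPPING_RULES):
        print(f"{path}: term {accession} is not allowed here")
```

Keeping the rules in a separate table rather than in the validator code mirrors the motivation given above: the same tool can then serve several working groups by swapping the mapping.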
Maintenance procedure
The PSI-MS CV has evolved over time through important contributions from a wide community, including hardware and software vendors, who contributed much to the high-quality definitions of many terms. Its further development is an ongoing process. To support it, the HUPO-PSI working groups defined guidelines for the development of controlled vocabularies. In addition, the detailed maintenance process has matured over time and some informal best practices have evolved. Previously, new terms were requested by filling in a form on the PSI-PI website, and new term proposals or disputed terms were discussed via an issue tracker. Now everyone in the proteomics community is free to subscribe to the ‘psidev-ms-vocab’ mailing list and to propose new terms or improvements to existing ‘psi-ms.obo’ terms. Requests to restructure parts of the ontology are also possible, for instance when the current hierarchical structuring of terms turns out to be sub-optimal or needs reorganization because of new technological developments. In all these cases, existing terms are never deleted from the ontology; the obsoletion mechanism guarantees this. Proposals are often also discussed in the telephone conferences of the various PSI subgroups, so that an update can be made within about 5 working days of a request, provided that there are no objections and there is consensus about the requested terms. The maintenance procedure described here has been applied to the ‘psi-ms.obo’ ontology file since January 2012 (see Figure).
This maintenance work is coordinated by the PSI ontology coordinator, a member of the proteomics scientific community who is normally elected at the annual HUPO-PSI spring meeting, or appointed by the steering committee if a vacancy must be filled between these meetings. After receiving a request for a new CV term, the coordinator checks whether the term and its description, data type, parent terms and relations are sensible. If necessary, any inconsistencies are clarified with the proposer of the term. The coordinator then checks whether a term with the same meaning is already present in the ontology and whether the term is necessary at all. The coordinator also checks whether the names of the terms and synonyms are in accordance with the IUPAC (International Union of Pure and Applied Chemistry) nomenclature for mass spectrometry terms. If an attribute with the same meaning is already present in the schema of the corresponding data format, the CV term is typically not added, to avoid duplication of information.
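Part of the duplicate check could be assisted by a simple label comparison. The Python sketch below is only an illustration: the term table, the `EX:` accessions and the function names are hypothetical, and it finds identical names or synonyms only. Deciding whether two differently worded terms have the same meaning remains a manual judgment.

```python
def normalize(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def find_duplicates(proposed_name, existing_terms):
    """Return accessions whose name or synonym equals the proposed name."""
    wanted = normalize(proposed_name)
    hits = []
    for accession, term in existing_terms.items():
        labels = [term["name"], *term.get("synonyms", [])]
        if any(normalize(label) == wanted for label in labels):
            hits.append(accession)
    return hits


# Hypothetical excerpt of an already parsed ontology.
EXISTING = {
    "EX:0000001": {"name": "example ion source",
                   "synonyms": ["sample ion source"]},
    "EX:0000002": {"name": "example detector"},
}

if __name__ == "__main__":
    print(find_duplicates("Sample  Ion Source", EXISTING))
```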
An additional check applies if a term is related to MALDI (Matrix Assisted Laser Desorption Ionization): the coordinator checks whether the term is already present in the MALDI imaging obo and whether it would be more suitable in that ontology. For proposals about chemical substances, used e.g. for matrix solutions, the coordinator checks whether the substance is already defined in the ChEBI ontology. If it is, the request is denied and the proposer is advised to use a CV term referencing the corresponding ChEBI entry instead. If it is not, the CV coordinator can ask the ChEBI team to incorporate the substance into their ontology, provided it fulfills the criteria for inclusion in ChEBI. If it does not fulfill these criteria, the coordinator checks whether the substance is defined in the PubChem database and creates a new term in the PSI-MS CV that references this PubChem entry by specifying the corresponding ‘dbxref’ at the end of the def: tag line.
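The decision sequence for chemical substances can be summarized in a short Python sketch. The function names, the returned messages and the final fallback branch (a substance found in neither resource) are assumptions added for illustration. The `PubChem_Compound:0000` reference is a placeholder, not a real entry.

```python
def triage_substance(in_chebi, meets_chebi_criteria, in_pubchem):
    """Mirror the order of checks described for chemical substances."""
    if in_chebi:
        return "deny request: reference the existing ChEBI entry"
    if meets_chebi_criteria:
        return "ask the ChEBI team to add the substance"
    if in_pubchem:
        return "create a PSI-MS term with a PubChem dbxref"
    # Assumption: this case is not covered by the described procedure.
    return "unresolved: clarify with the proposer"


def def_line(definition, dbxrefs):
    """Build an OBO def: line with the dbxref list at its end."""
    escaped = definition.replace("\\", "\\\\").replace('"', '\\"')
    return f'def: "{escaped}" [{", ".join(dbxrefs)}]'


if __name__ == "__main__":
    print(triage_substance(False, False, True))
    # Placeholder definition and placeholder PubChem reference.
    print(def_line("A matrix substance.", ["PubChem_Compound:0000"]))
```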
A term that passes all these checks is included in the next release candidate of the obo file. This release candidate is sent to the three mailing lists psidev-ms-vocab, psidev-pi-dev and psidev-ms-dev for public discussion. To allow a prompt update of the CV after requests for new or changed terms, there is no regular schedule for the update process: if there is no objection, the new terms of the release candidate become part of the next official release of the obo file, which is made public about 5 working days after the release candidate. Otherwise the term in question is discussed further by the subscribers of the mailing lists, either by email or, if necessary, in a telephone conference call, until the community reaches a consensus about the exact definition of the term. Consensus should be reached by the strength of the arguments.
As far as possible, term names should be general and non-proprietary. Where vendor-specific terms are unavoidable, for instance because they describe a proprietary software or product, the term name can be composed of a leading identifier for the proprietary product, followed by a colon and the actual CV term name. This naming mechanism can also help to prevent deadlocks resulting from conflicts of interest between rival companies.
For the official release, the ontology coordinator updates the date and version, checks the syntactic correctness of the new obo file with the ‘Verification Manager’ of OBO-Edit and then transfers the file to the GitHub website. The release of the new obo version is then announced to the three mailing lists stated above, together with a short summary of the new and/or changed terms.
The version number of the PSI-MS CV has the format ‘x.y.z’. An increase in x marks a major build, i.e. a change in a root level term; an increase in y indicates the addition of new terms or the obsoletion of terms; and an increase in z means that only minor changes were made, such as edits to names or definitions.
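The versioning and naming rules lend themselves to a small Python sketch. The change categories, function names and the vendor name are hypothetical. Resetting the lower components to zero after a higher one increases is an assumption of this sketch, not something stated in the procedure.

```python
def bump_version(version, change):
    """Return the next 'x.y.z' version for a given kind of change.

    change: 'root'  - a root level term changed (x increases)
            'terms' - terms were added or made obsolete (y increases)
            'minor' - names or definitions were edited (z increases)
    """
    x, y, z = (int(part) for part in version.split("."))
    if change == "root":
        return f"{x + 1}.0.0"      # assumption: y and z reset to 0
    if change == "terms":
        return f"{x}.{y + 1}.0"    # assumption: z resets to 0
    if change == "minor":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change!r}")


def vendor_term_name(product, term_name):
    """Prefix a term name with a product identifier and a colon."""
    return f"{product}:{term_name}"


if __name__ == "__main__":
    print(bump_version("3.7.2", "terms"))
    print(vendor_term_name("ExampleSoft", "score"))
```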
Where merging, splitting, replacing or deprecating an ontology term is necessary, e.g. because of new technologies or instruments or changes in standard formats, the old terms must be marked obsolete by assigning the ‘is_obsolete’ tag to them. They must nevertheless stay in the ontology to ensure backwards compatibility with instance data files that already use these now obsolete terms.
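Because obsolete terms stay in the file, software reading the ontology has to recognize them. The following Python sketch shows one way to do that with a deliberately minimal reader. The two stanzas use placeholder `EX:` identifiers, and the parser ignores parts of the OBO format such as trailing `!` comments and escaped characters.

```python
# Placeholder ontology excerpt; the identifiers and names are made up.
OBO_TEXT = """\
[Term]
id: EX:0000001
name: example current term

[Term]
id: EX:0000002
name: example retired term
is_obsolete: true
"""


def parse_terms(obo_text):
    """Collect the tag-value pairs of every [Term] stanza."""
    terms = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


def is_obsolete(term):
    """True if the stanza carries 'is_obsolete: true'."""
    return term.get("is_obsolete", ["false"])[0] == "true"


if __name__ == "__main__":
    for term in parse_terms(OBO_TEXT):
        status = "obsolete" if is_obsolete(term) else "current"
        print(term["id"][0], status)
```

A reader built this way can still resolve an old accession found in an existing data file and report that the term is obsolete, which is the backwards compatibility the rule is meant to preserve.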
Further information and relevant links
- Ontology lookup service (OLS)
- OBO Foundry portal
- GeneOntology (GO)
- Ontology for Biomedical Investigations (OBI)
- BioPortal
- OBO-Edit
- Berkeley Bioinformatics Open-source projects
References
- The HUPO proteomics standards initiative- mass spectrometry controlled vocabulary
- Controlled vocabularies and ontologies in proteomics: Overview, principles and practice
- Ontological analysis of controlled vocabularies used in PSI/MSI supported XML standards
- Ontology usage in Omics Standards Initiatives: Pros and Cons of enriching XML data formats with controlled vocabulary terms
- Ten minutes guide for requesting new CV terms in the PSI-MS CV
- The PSI-MOD community standard for representation of protein modification data
Gerhard Mayer, mayerg97 :at: rub.de, October 2017