PSI-MS: Mass Spectrometry Standards Working Group
The PSI-MS working group defines community data formats and controlled vocabulary terms that facilitate data exchange and archiving in the field of proteomics mass spectrometry.
Current projects are:
- The mzML format, which merges the mzData format (see below) with the similar mzXML format. mzML 1.1.0 was released on June 1, 2009 and has been stable since then. Everyone is encouraged to implement mzML 1.1.0 in their software. See the mzML information page for the full specification, other documentation and examples.
- The TraML format has been developed as a standardized format for the exchange and transmission of transition lists for selected reaction monitoring (SRM) experiments. This specification has been accepted through the PSI document process and is complete. Please email the list email@example.com with your questions, comments, and suggestions. See the TraML information page for the full specification, other documentation and examples.
Past achievements are:
- The mzData standard, which captures mass spectrometry output data. mzData’s aim is to unite the large number of current formats (pkl, dta, mgf, …) into a single format. mzData has been released and is stable at version 1.05. It is now deprecated in favor of mzML.
Note that the mzIdentML (formerly AnalysisXML) format now comes under the aegis of the Proteomics Informatics Working Group.
|Chair||Eric Deutsch, Institute for Systems Biology|
|Co-chairs||Pierre-Alain Binz, Centre Universitaire Hospitalier Vaudois CHUV; Henry Lam, Hong Kong University of Science and Technology|
|Secretary||Andrew Dowsey, University of Bristol|
|MIAPE||Pierre-Alain Binz, Centre Universitaire Hospitalier Vaudois CHUV|
|Ontology Co-ordinators||Gerhard Mayer, Ruhr-University Bochum|
Other Working Group Members:
Steffen Neumann, IPB Halle
Florian Reisinger, European Bioinformatics Institute
David Creasy, Matrix Science Ltd.
Wilfred Tang, Applied Biosystems
Pete Souda, University of California, Los Angeles
Angel Pizarro, University of Pennsylvania
Sean Seymour, Applied Biosystems
Status: Released 1.1.0 [more information…]
As of 2006 there existed two separate XML formats for encoding raw spectrometer output: mzData, developed by the PSI, and mzXML, developed at the Seattle Proteome Center at the Institute for Systems Biology. It was recognized that the existence of two separate formats for essentially the same thing generated confusion and extra programming effort. Therefore the PSI, with full participation by ISB, developed a new format intended to replace the previous two by merging the best ideas from each. This new format is called mzML. See the information page for mzML, which includes current documentation, example files and other related materials. We encourage everyone to implement mzML in their software and workflows and to cease using mzXML and mzData as soon as possible.
The PSI-MS Controlled Vocabulary is developed jointly with the PSI-Proteomics Informatics group. It consists of a large collection of structured terms covering the description and use of mass spectrometry instrumentation as well as protein identification and quantitation software. The terms come from multiple sources: vocabulary and definitions in chapter 12 of the IUPAC nomenclature book, instrument and software vendors and developers, and other user-submitted terms. Although its structure and use are linked to mzML, mzIdentML and mzQuantML, it is dynamically maintained in an OBO format.
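Because the vocabulary is kept in OBO format, its terms can be read with very little code. The following is a minimal sketch, not an official PSI tool: it assumes the common OBO layout of `[Term]` stanzas made of `tag: value` lines, and the sample text, the `EX:` accessions and the function name `parse_obo_terms` are hypothetical illustrations rather than real entries from the PSI-MS CV.

```python
# Hypothetical OBO fragment for illustration only; these are NOT real PSI-MS CV terms.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example instrument
def: "A made-up instrument term." [EX:source]

[Term]
id: EX:0000002
name: example detector
is_a: EX:0000001

[Typedef]
id: part_of
name: part_of
"""


def parse_obo_terms(text):
    """Return a dict mapping each [Term] id to a dict of tag -> list of values."""
    terms = {}
    current = None  # tags of the [Term] stanza being read, or None outside one
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and OBO comment lines
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue  # header lines and non-Term stanzas are ignored
        tag, value = line.split(": ", 1)
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
for accession, tags in terms.items():
    print(accession, tags["name"][0])
```

Values are kept as lists because a tag such as `is_a` may appear more than once in a stanza. A production reader would also need to handle escapes, trailing `!` comments and obsolete terms, which this sketch leaves out.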
The latest version of the PSI-MS CV is available here.
To request new CV terms to be added to the PSI-MS Controlled Vocabulary, please use the psidev-vocab mailing list.
More information on PSI CVs can be found here.
Status: version 1.05. Deprecated.
mzData in a nutshell
mzData is a data format capturing peak list information. Its aim is to unite the large number of current formats (pkl, dta, mgf, …) into one: mzData. mzData is NOT a substitute for the raw file formats of the instrument vendors. Some vendors, if not all, will provide software transforming their raw files to mzData. There are already a number of programs which can use mzData. To keep mzData files small, m/z and intensity information is stored as base64-encoded binary.
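The base64 storage of peak values can be illustrated with a short round trip. This is a sketch under stated assumptions, not the reference implementation: it assumes the values are IEEE floating-point numbers of a known precision (32 or 64 bit) and byte order, both of which the reader must take from the file. The function names `encode_peaks` and `decode_peaks` and the sample values are hypothetical.

```python
import base64
import struct


def _layout(count, precision, endian):
    """Build the struct format string for `count` floats (assumed encoding)."""
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    if endian not in ("little", "big"):
        raise ValueError("endian must be 'little' or 'big'")
    order = "<" if endian == "little" else ">"
    kind = "f" if precision == 32 else "d"
    return f"{order}{count}{kind}"


def encode_peaks(values, precision=32, endian="little"):
    """Pack floats into bytes and return them as a base64 ASCII string."""
    packed = struct.pack(_layout(len(values), precision, endian), *values)
    return base64.b64encode(packed).decode("ascii")


def decode_peaks(encoded, precision=32, endian="little"):
    """Decode a base64 string back into a list of floats."""
    raw = base64.b64decode(encoded)
    width = precision // 8
    if len(raw) % width:
        raise ValueError("byte length is not a multiple of the value width")
    return list(struct.unpack(_layout(len(raw) // width, precision, endian), raw))


# Sample m/z values chosen to be exactly representable in 32-bit floats.
mz_values = [100.0, 200.5, 300.25]
encoded = encode_peaks(mz_values)
print(encoded)
print(decode_peaks(encoded))
```

Decoding with the wrong precision or byte order yields plausible-looking but wrong numbers rather than an error, so these two settings should always be read from the file rather than guessed.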
Technical description and other resources
- Schema and Specification Documents
- [UPDATED: 2008-04-25] A full list of known mzData implementations.
- mzData example instance files.
- The controlled vocabulary for use with mzData is here: PSI-MS CV
Questions? Consult the PSI-MS email discussion list