mzML

History

From 2005 to 2008 there were two separate XML formats for encoding raw spectrometer output: mzData, developed by the PSI, and mzXML, developed at the Seattle Proteome Center at the Institute for Systems Biology (ISB). It was recognized that having two separate formats for essentially the same thing generated confusion and required extra programming effort. Therefore the PSI, with full participation by ISB, developed a new format called mzML by combining the best aspects of each of the precursor formats into a single one. It is intended to replace the previous two formats, which are now deprecated, although still sometimes used by older software.

On 2008-06-01, mzML 1.0.0 was released. In early 2009, several implementation efforts identified a few minor shortcomings in mzML 1.0.0. Since no vendors had yet released software supporting mzML 1.0, the working group decided to release an update in June 2009. It is expected that all software will support mzML 1.1 as the long-term-stable format instead of 1.0. Below is the available documentation for mzML 1.1.0 and related information. Please send feedback to psidev-ms-dev@lists.sourceforge.net.

Status

mzML 1.1.0 was released on 2009-06-01 and has been stable ever since. There were initial plans for a new 1.2 release to support ion mobility mass spectrometry (IM-MS) and data-independent acquisition (DIA) MS. However, as of 2022-11-21, it appears that support for IM-MS and DIA can be achieved without a schema change, with just some additional terms. Please contact psidev-ms-dev@lists.sourceforge.net for more information.


Proposed mzML best practices for encoding IM-MS and DIA data in mzML

(updated 2022-11-21)

The following proposal for adding IM-MS and DIA support has not yet entered the PSI Document Process. Comments to psidev-ms-dev@lists.sourceforge.net or an issue to the GitHub repo are welcome. Some parts of this have already been implemented in ProteoWizard.


mzMLb

One drawback of mzML is that while it compresses well, compressing greatly reduces random access performance. We propose an alternative based upon HDF5 that embeds an mzML XML document but stores each type of binary data array as separate datasets. HDF5’s chunked compression method makes random access to compressed data orders of magnitude faster than for compressed mzML. See Bhamber et al. 2021 for more details.
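The benefit of chunked compression can be illustrated with a small sketch. This is not HDF5 or the mzMLb layout; it only uses the Python standard library to mimic the idea of compressing an array in independent chunks so that a single value can be read without decompressing everything. The chunk size, function names, and sample values are assumptions made for illustration.

```python
import struct
import zlib

CHUNK_VALUES = 1024  # hypothetical chunk size, in values per chunk


def compress_chunked(values, chunk_values=CHUNK_VALUES):
    """Compress a sequence of floats as a list of independent zlib chunks."""
    chunks = []
    for start in range(0, len(values), chunk_values):
        block = values[start:start + chunk_values]
        # Little-endian 64-bit floats, packed back to back.
        raw = struct.pack(f"<{len(block)}d", *block)
        chunks.append(zlib.compress(raw))
    return chunks


def read_value(chunks, index, chunk_values=CHUNK_VALUES):
    """Read one value by decompressing only the chunk that contains it."""
    chunk_no, offset = divmod(index, chunk_values)
    raw = zlib.decompress(chunks[chunk_no])
    return struct.unpack_from("<d", raw, offset * 8)[0]


# Made-up data standing in for one binary data array.
values = [100.0 + 0.5 * i for i in range(10_000)]
chunks = compress_chunked(values)
print(read_value(chunks, 5000))
```

With a single compressed stream, reading the value at index 5000 would require decompressing everything before it. Here only one chunk of 1024 values is touched, which is the property that chunked storage relies on for fast random access.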

References

  • Bhamber, R. S., Jankevics, A., Deutsch, E. W., Jones, A. R., & Dowsey, A. W. (2021). mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements. Journal of Proteome Research, 20(1), 172–183. https://doi.org/10.1021/acs.jproteome.0c00192

mzML Release Schedule

(updated 2022-11-21)

    • 2008-06-01 mzML 1.0.0 released

    • 2009-06-01 mzML 1.1.0 released

    • 2010-06-01 mzML index wrapper schema updated to 1.1.1

    • 2022-11 Minor updates to CV still occur, but no new schema changes are planned at this time

    • 2023-03-17 mzML schema adjusted to 1.1.1 to allow the schema to tolerate common identifiers better


mzML 1.1.0 Finished Specification

(updated 2024-02-09)

The information and documents in this subsection are related to mzML 1.1.0, revised after going through the PSI document process on May 19, 2009. Everyone is encouraged to implement mzML 1.1.0. It is hoped that mzML 1.1.0 will remain stable for a long time.

NOTE: On 2010-06-01, the mzML index schema was updated from 1.1.0 to 1.1.1. There was no functional change, but rather the addition of an enumeration constraint to an attribute to prevent creative, unintended values. This could cause some files that previously validated to no longer validate. However, any such files should never have successfully validated in the first place.

XML schema definition files:

– mzML1.1.1.xsd (main schema)

– mzML1.1.3_idx.xsd (separate and optional index)

– Latest mapping file, which defines where certain controlled vocabulary terms may be used in a document.

– Latest version of the controlled vocabulary (CV) in OBO 1.2 format.  (OBO-Edit)

– Latest version of the controlled vocabulary (CV) in OWL format.

Documentation files:

– Full Specification Document: mzML1.1.0_specificationDocument.doc

– HTML schema documentation for mzML 1.1.0

– HTML schema documentation for mzML 1.1.0 index wrapper schema

Validation of mzML files

Although at one time there were on-line mzML validators, these have fallen into disrepair and are no longer functional.

You can download and run a local validator.

– The OpenMS validator can be installed locally by downloading and installing OpenMS.

– The Java-based validator can be downloaded from GitHub
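Before running a full validator, a quick sanity check can catch files that are not even well-formed XML. The sketch below is not a substitute for the validators listed above: it does no schema or controlled vocabulary checking, and only confirms that the document parses and that its root element is `mzML` or `indexedmzML` in the mzML namespace. The function name `quick_check` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

MZML_NS = "http://psi.hupo.org/ms/mzml"
EXPECTED_ROOTS = {f"{{{MZML_NS}}}mzML", f"{{{MZML_NS}}}indexedmzML"}


def quick_check(source):
    """Return (ok, message) for a file path or binary file-like object.

    Checks well-formedness and the root element only; this is NOT
    schema validation.
    """
    root_tag = None
    try:
        for event, elem in ET.iterparse(source, events=("start", "end")):
            if event == "start" and root_tag is None:
                root_tag = elem.tag  # the first start event is the root
            elif event == "end":
                elem.clear()  # keep memory use low on large files
    except ET.ParseError as exc:
        return False, f"not well-formed XML: {exc}"
    if root_tag not in EXPECTED_ROOTS:
        return False, f"unexpected root element: {root_tag}"
    return True, "well-formed, root element looks like mzML"


# Tiny in-memory stand-in for a real file.
sample = io.BytesIO(
    b'<mzML xmlns="http://psi.hupo.org/ms/mzml" version="1.1.0"><run/></mzML>'
)
print(quick_check(sample))
```

A file that passes this check can still be invalid mzML, so a local validator remains necessary for any real conformance testing.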

Sample instance documents for all relevant formats:

All documents are meant to contain equivalent information in the various formats.

– tiny1.mzML1.1.0.mzML
– tiny1.mzData1.05.xml

– tiny1.mzXML2.0.mzXML
– tiny1.mzXML3.0.mzXML

Sample files generated by the ProteoWizard:

– small.RAW (a small Thermo RAW file with LTQ-FT data)

– small.pwiz.1.1.mzML (converted from small.RAW by msconvert)

– small_miape.pwiz.1.1.mzML (converted by msconvert, with example MIAPE fields added programmatically)

– small_zlib.pwiz.1.1.mzML (converted by msconvert, with zlib compression and 32-bit precision)
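For readers unfamiliar with how compression and precision affect the binary arrays in files such as these, the following sketch decodes the text content of a `<binary>` element. It assumes the usual mzML encoding of base64 text over little-endian floats, optionally zlib-compressed. The function name and its parameters are hypothetical; in a real file the compression and precision are declared by CV terms on the enclosing binary data array, which this sketch does not read.

```python
import base64
import struct
import zlib


def decode_binary_array(text, *, compressed, bits):
    """Decode base64 text from a <binary> element into a tuple of floats.

    compressed: True if the array was declared as zlib-compressed.
    bits: 32 or 64, matching the declared float precision.
    """
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    code = {32: "f", 64: "d"}[bits]
    count = len(raw) // (bits // 8)
    # "<" selects little-endian byte order.
    return struct.unpack(f"<{count}{code}", raw)


# Round trip on made-up m/z values (chosen to be exact in 32-bit floats).
mz = (200.5, 300.25, 400.125)
encoded = base64.b64encode(zlib.compress(struct.pack("<3f", *mz))).decode("ascii")
decoded = decode_binary_array(encoded, compressed=True, bits=32)
print(decoded)
```

Passing the wrong `compressed` or `bits` setting produces an error or silently wrong numbers, which is why a reader must take both from the file's own declarations rather than guess.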

Other sample files:

– PDA example file (created by Steffen Neumann)

– Sample files generated by the Proteios Software Environment

Other relevant websites:

– HUPO-PSI GitHub mzML

– General PSI guidelines for creating controlled vocabularies

Current and future support for mzML:

(updated 2013-02-19)

Product | Source | Contact | Support comments
ProteoWizard | USC | Parag Mallick | Full mzML support today
TPP | ISB | Eric Deutsch | Full mzML support today (including embedded X!Tandem)
Insilicos Viewer | Insilicos | Erik Nilsson | Full mzML support today
X!Tandem | GPM | Ron Beavis | Full mzML support today
Myrimatch | Vanderbilt | Matt Chambers | Full mzML support today
InSilicoSpectro | SIB | Alex Masselot | Full mzML support today
Proteios SE | Univ Lund | Fredrik Levander | Full mzML support today
NCBI C++ toolkit | NCBI | Douglas Slotta | available in next release
OpenMS/TOPP | Univ Tübingen | Marc Sturm | Full mzML support today
Phenyx | GeneBio | Pierre-Alain Binz | Full mzML support today
Mascot | Matrix Science | David Creasy | Full mzML support today
Mascot Distiller | Matrix Science | David Creasy | Full mzML support today
jmzML | Ghent/ EMBL-EBI | Lennart Martens | Full mzML support today
Conversion tool in Proteomics Toolbox | Thermo Scientific | Jim Shofstahl | beta testing
ReAdW (.RAW converter) | ISB | Eric Deutsch | Replaced by ProteoWizard msconvert
mzWiff (.wiff converter) | ISB | Eric Deutsch | Replaced by ProteoWizard msconvert
MassWolf (.raw/ converter) | ISB | Eric Deutsch | Replaced by ProteoWizard msconvert
Trapper (Agilent data converter) | ISB | Eric Deutsch | Replaced by ProteoWizard msconvert
mzML_Exporter | ABI | Sean Seymour | beta testing
CompassXport | Bruker | ? | ?
PEAKS | Bioinformatics Solutions Inc | Kevin Zhang | Beta Testing
PRIDE database | EMBL-EBI | Juan A. Vizcaino | ongoing
PRIDE Inspector | EMBL-EBI | Juan A. Vizcaino | Full mzML support
MIAPE MS Extractor | ProteoRed | Salvador Martinez-Bartolome | Full mzML support
mzR | Bioconductor | Bernd Fischer, Steffen Neumann, Laurent Gatto | Full mzML support
pymzML | Univ Münster | Christian Fufezan | Full mzML support
Crux | University of Washington | W. Noble | Full mzML support


Released mzML 1.0.0 Specification

(updated 2009-02-10)

The information and documents below relate to mzML 1.0.0, which is now obsolete. Do not use it.

XML schema definition files (.xsd):

– mzML1.0.0.xsd (main schema)

– mzML1.0.0_idx.xsd (separate and optional index)

Documentation files:

– Full Specification Document: mzML1.0.0_specificationDocument.doc

– HTML schema documentation for mzML 1.0.0

– HTML schema documentation for mzML 1.0.0 index wrapper schema

– ASMS June 2008 Poster (3MB PDF)
