This schema can capture the use of a mass spectrometer, the data generated, and the initial processing of that data (to the level of the peak list). Peak lists are processed data from a mass spectrometry experiment. There can be multiple peak lists in an mzData file, which might be related via a separation, or just in sequence from an automated run. Any one peak list (mass spectrum) may also be composed of a number of acquisitions, which can be described individually herein.
This stores the location, name, version and a short, arbitrarily assigned in-file reference label, for one or more controlled vocabulary sources. Short labels are for use elsewhere in the data file, primarily in the cvLabel attribute, to indicate the use of a particular source for an item of controlled vocabulary.
Each data set starts with a description section. This section captures 'general' information, such as the instrument on which the data were generated.
Administrative information pertaining to the entire mzData file (i.e. not specific to any part of the data set) is stored here.
Instrument description (not 'run time' parameters, which should properly be captured in spectrumInstrument); these features must be common to all acquisitions.
Description of the default processing by which the peak list(s) were generated.
All mass spectra and the acquisitions underlying them are described and attached here. Subsidiary data arrays are also described and attached here.
This is an individual spectrum. The spectrum is considered to be composed of an array of acquisitions. There are two primary ways of representing data: base64-encoded binary (single or double precision) or arrays of simple data types. All arrays used to describe a single spectrum are the same length, with the same indexing.
The number of spectra that are to be found in the attached list.
The development version of this mzData schema.
The accession number assigned arbitrarily to a particular mzData instance (i.e. data) file by the generator of that file. This accession number is intended to serve as a (locally) unique reference by which to identify a particular mzData instance file. It is not intended to be related to any other accession number, such as that of an entry in a reference database (e.g. UniProt), or to the sampleName element under the admin branch of mzData.
Description of the source file, including location and type.
Name of the source file, without reference to location (either URI or local path).
URI-formatted full path to file, without actual file name appended.
Type of the file if appropriate, else a description of the software or reference resource used.
Data type for additional data vectors (beyond m/z and intensity).
Name of the supplemental data array.
The value to which the supDataArrayRef attribute on supDesc refers; values should never be shared between binary and non-binary supplemental arrays.
Number of items in the supDataArray.
Each supDataArray may or may not be related to the mzArray. If its elements are related to the mzArray, this flag is set to 1.
If the indexed flag is set, then this value gives the element count (starting from 1) in the mzArray which aligns with the first element in this supDataArray.
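A minimal Python sketch of how an indexed supplemental array might be placed against the mzArray, assuming the flag and the 1-based element count described above are already available as a boolean and an integer. The function name `align_to_mz` and its parameter names are hypothetical, not attribute names confirmed by this text; positions not covered by the supplemental array are filled with None.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def align_to_mz(
    mz: Sequence[float],
    sup: Sequence[T],
    indexed: bool,
    offset: int = 1,
) -> List[Optional[T]]:
    """Return a list as long as mz, with sup placed from the 1-based offset.

    Hypothetical helper: parameter names mirror the prose, not the schema.
    """
    if not indexed:
        # Non-indexed arrays have no defined relationship to the mzArray.
        raise ValueError("supplemental array is not indexed to the mzArray")
    if offset < 1:
        raise ValueError("offset is an element count starting from 1")
    start = offset - 1  # convert the 1-based count to a 0-based index
    if start + len(sup) > len(mz):
        raise ValueError("supplemental array runs past the end of the mzArray")
    aligned: List[Optional[T]] = [None] * len(mz)
    for i, value in enumerate(sup):
        aligned[start + i] = value
    return aligned


# Two labels aligned to the 2nd and 3rd m/z values.
print(align_to_mz([100.0, 200.0, 300.0, 400.0], ["a", "b"], indexed=True, offset=2))
# [None, 'a', 'b', None]
```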
Information about an ontology/CV source and a short 'lookup' tag to refer to.
The short label used as a reference tag to refer to this particular controlled vocabulary source description from an instance of the cvLabel attribute, wherever it appears (i.e. in things of type paramType).
The usual name for the resource (e.g. The MGED Ontology).
The version of the CV from which the referred-to terms are drawn.
The URI for the resource.
Parameters from a controlled vocabulary.
The short tag for the resource as defined in cvLookupType.
The accession number of the referred-to term in the named resource.
The actual name for the parameter, from the referred-to controlled vocabulary.
The value for the parameter; may be absent if not appropriate, or a numeric or symbolic value, or may itself be CV (legal values for a parameter should be enumerated and defined in the ontology).
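To illustrate how the short label ties a parameter back to its vocabulary source, here is a sketch that resolves each cvParam against the declared sources using Python's standard `xml.etree.ElementTree`. The element and attribute names in the sample fragment (`cvLookup`, `fullName`, `version`, `address`, `accession`, `name`, `value`), the `EX` label and the example ontology are assumptions for illustration; check them against the schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element/attribute names and the ontology are illustrative.
SAMPLE = """
<mzData>
  <cvLookup cvLabel="EX" fullName="Example Ontology" version="1.0"
            address="http://example.org/example.obo"/>
  <cvParam cvLabel="EX" accession="EX:0000001" name="example term" value="42"/>
  <cvParam cvLabel="EX" accession="EX:0000002" name="flag term"/>
</mzData>
"""


def resolve_cv_params(xml_text: str) -> list[dict]:
    """Attach the declared vocabulary source to every cvParam via its cvLabel."""
    root = ET.fromstring(xml_text.strip())
    # Map each short label to the attributes of its source description.
    sources = {lk.get("cvLabel"): lk.attrib for lk in root.iter("cvLookup")}
    resolved = []
    for param in root.iter("cvParam"):
        label = param.get("cvLabel")
        if label not in sources:
            raise KeyError(f"cvParam uses undeclared cvLabel {label!r}")
        resolved.append({
            "source": sources[label].get("fullName"),
            "version": sources[label].get("version"),
            "accession": param.get("accession"),
            "name": param.get("name"),
            "value": param.get("value"),  # None when the value is absent
        })
    return resolved


for entry in resolve_cv_params(SAMPLE):
    print(entry["accession"], entry["name"], entry["value"])
```

Raising on an undeclared label is a design choice here: a cvParam whose label has no matching source description cannot be interpreted.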
Data type for operator identification information.
Contact person name, or role name (e.g. "Group leader of team 42") of the individual responsible for this dataset.
Academic or corporate organisation with which the contact person or role is associated.
Phone number, email, postal address or other appropriate means of contact.
Software information (the software that produced the peak list).
The official name for the software package used.
The version number of the software package.
Additional comments on the use of the software.
The time taken to complete the processing that produced this mzData file, if the file was generated in a single use of the software. Here 'single use' means the software was run with one parameter set; it does not matter whether the job was completed in several phases.
Description of the software, and the way in which it was used to generate the peak list.
Specific information on the conversion or processing software.
Description of the default peak processing method. This element describes the base method used in the generation of a particular mzData file. Variable methods should be described in the appropriate acquisition section; if no acquisition-specific details are found, this information serves as the default.
The structure that captures the generation of a peak list (including the underlying acquisitions).
There is one spectrumDesc per spectrum. It captures both the instance-specific parameters for the underlying acquisitions and, where applicable, the position of this spectrum in a possible hierarchy of spectra. For example, in tandem mass spectrometry, the id attribute on the spectrum element makes it possible to identify the survey scan from which the parent ion that gave rise to this MSMS spectrum was selected. These identifying numbers can be given, in a list if necessary, whether or not the referred-to spectra are present in the file; they form the 'family tree' of this spectrum.
There is one supDesc for each supDataArray (binary or otherwise) found under this particular spectrum element.
The list of m/z values (for any type of spectrum), stored as base64-encoded binary. The only type allowed is IEEE-754 floating point; the precision must be specified as either 32- or 64-bit, and the endianness must also be specified.
The intensities for each member of the m/z array, also stored as base64-encoded binary IEEE-754 floating point, with specified precision and endianness.
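Decoding these arrays only needs the base64 text plus the declared precision and endianness. A minimal sketch using Python's `base64` and `struct` modules follows; the function names and the strings 'little'/'big' used to name the byte order are assumptions for illustration rather than the schema's own attribute vocabulary.

```python
import base64
import struct

_CODES = {32: "f", 64: "d"}            # IEEE-754 single / double precision
_ORDERS = {"little": "<", "big": ">"}  # struct byte-order prefixes


def _layout(precision: int, endian: str) -> tuple[str, str]:
    if precision not in _CODES:
        raise ValueError("precision must be 32 or 64")
    if endian not in _ORDERS:
        raise ValueError("endian must be 'little' or 'big'")
    return _ORDERS[endian], _CODES[precision]


def encode_array(values: list[float], precision: int, endian: str) -> str:
    order, code = _layout(precision, endian)
    packed = struct.pack(f"{order}{len(values)}{code}", *values)
    return base64.b64encode(packed).decode("ascii")


def decode_array(text: str, precision: int, endian: str) -> list[float]:
    order, code = _layout(precision, endian)
    raw = base64.b64decode(text)
    size = struct.calcsize(order + code)
    if len(raw) % size:
        raise ValueError("byte length is not a multiple of the value size")
    return list(struct.unpack(f"{order}{len(raw) // size}{code}", raw))


mz = [100.5, 200.25, 300.125]
intensity = [10.0, 55.5, 3.25]
decoded_mz = decode_array(encode_array(mz, 64, "little"), 64, "little")
decoded_int = decode_array(encode_array(intensity, 32, "big"), 32, "big")
# Arrays for one spectrum share indexing, so their lengths must agree.
assert len(decoded_mz) == len(decoded_int)
print(list(zip(decoded_mz, decoded_int)))
# [(100.5, 10.0), (200.25, 55.5), (300.125, 3.25)]
```

The example values are chosen to be exactly representable in single precision; arbitrary values stored at 32-bit precision will come back rounded.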
For each acquisition, there can be a mixture of binary and other data arrays. This is mostly to allow string and other data not conveniently handled by base64 to be associated with binary data. This [0..n] choice allows the number of arrays to be arbitrary.
In addition to the m/z and intensity arrays, an arbitrary number of other arrays can be stored using the same indexing. For each array stored as base64-encoded binary, the precision and endianness must be specified. The only type allowed is IEEE-754 floating point (even booleans must be re-encoded this way).
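The text does not say which floating-point values stand for true and false. The sketch below assumes 1.0 and 0.0 on the way in and treats any non-zero value as true on the way out; the function names and the default precision and byte order are hypothetical.

```python
import base64
import struct


# Assumption: True -> 1.0, False -> 0.0; the schema text does not fix the mapping.
def encode_flags(flags: list[bool], precision: int = 32, endian: str = "little") -> str:
    code = {32: "f", 64: "d"}[precision]
    order = {"little": "<", "big": ">"}[endian]
    as_floats = [1.0 if flag else 0.0 for flag in flags]
    packed = struct.pack(f"{order}{len(as_floats)}{code}", *as_floats)
    return base64.b64encode(packed).decode("ascii")


def decode_flags(text: str, precision: int = 32, endian: str = "little") -> list[bool]:
    code = {32: "f", 64: "d"}[precision]
    order = {"little": "<", "big": ">"}[endian]
    raw = base64.b64decode(text)
    count = len(raw) // struct.calcsize(order + code)
    # Any non-zero value is read back as True.
    return [value != 0.0 for value in struct.unpack(f"{order}{count}{code}", raw)]


encoded = encode_flags([True, False, True])
print(encoded, decode_flags(encoded))
```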
If the raw data representation method was not binary, or if the supplemental data array was a string or other non-binary type, then it can be represented in the supDataArray element (again with the same indexing).
The identifier for a particular spectrum, serving both as an internal (to the file) reference by which to order spectra and as a means to associate them with each other (e.g. parent and child spectra from a tandem experiment). This number should be provided whether it legitimately comes from the source data or has to be generated. In the absence of a parent spectrum for an MS
Extension of binary data group for supplemental data
Name of the supplemental data array.
The value to which the supDataArrayRef attribute on supDesc refers; values should never be shared between binary and non-binary supplemental arrays.
Extension of binary data group for m/z and intensity values
The structure into which base64-encoded binary data go
'Header' information - sample description, contact details, comments
A short label that is referable to the sample used to generate the dataset. This will often be a copy of the internal (lab) reference code for the sample being analysed.
Expansible description of the sample used to generate the dataset, named in sampleName.
Information about the original source file (i.e. that generated by the instrument) used in generating the instance document.
Audit information concerning the means by which the originator/owner of this mzData file can be identified, and contacted if necessary.
Description of the parameters for the mass spectrometer for a given acquisition (or list of acquisitions).
Specification for combining raw scans/acquisitions into a single peak list or spectrum. A list of acquisitions from the original raw file can be specified. Software parameters specified in the cv/userParams under acquisition automatically override the default parameters given in dataProcessing.
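The override rule amounts to a keyed merge in which acquisition-level values win over the dataProcessing defaults. A sketch, assuming parameters have already been flattened to name-value pairs keyed by parameter name (a simplification, since real cvParams also carry an accession and a cvLabel); the function name and parameter names are hypothetical.

```python
def effective_params(defaults: dict[str, str], acquisition: dict[str, str]) -> dict[str, str]:
    """Acquisition-level values replace same-named dataProcessing defaults."""
    merged = dict(defaults)     # copy so the defaults stay untouched
    merged.update(acquisition)  # acquisition values win on name clashes
    return merged


# Hypothetical parameter names, for illustration only.
defaults = {"smoothing": "none", "threshold": "5"}
per_acquisition = {"threshold": "10"}
print(effective_params(defaults, per_acquisition))
# {'smoothing': 'none', 'threshold': '10'}
```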
Scan or acquisition from original raw file used to create this peak list, as specified in sourceFile.
The actual acquisition number taken directly from the raw file.
Whether these are discrete or continuous spectra.
The method (most usually summing or some form of averaging) by which the acquisitions were combined to make the spectrum.
The total number of acquisitions attached (as a simple data integrity check).
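A sketch of combining acquisitions and of the count check, assuming every acquisition has already been decoded onto one shared m/z grid so that intensities can be combined position by position. The method names 'sum' and 'mean' are placeholders, not the schema's enumerated values, and the function name is hypothetical.

```python
from typing import List, Sequence


def combine_acquisitions(
    acquisitions: Sequence[Sequence[float]],
    method: str,
    declared_count: int,
) -> List[float]:
    # Integrity check: the declared count must match what is attached.
    if len(acquisitions) != declared_count:
        raise ValueError(f"count is {declared_count} but {len(acquisitions)} acquisitions found")
    if not acquisitions:
        raise ValueError("nothing to combine")
    width = len(acquisitions[0])
    if any(len(a) != width for a in acquisitions):
        raise ValueError("acquisitions must share one m/z grid")
    # Position-by-position totals across all acquisitions.
    totals = [sum(column) for column in zip(*acquisitions)]
    if method == "sum":
        return totals
    if method == "mean":
        return [t / len(acquisitions) for t in totals]
    raise ValueError(f"unsupported combination method {method!r}")


scans = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(combine_acquisitions(scans, "sum", 2))   # [4.0, 6.0, 8.0]
print(combine_acquisitions(scans, "mean", 2))  # [2.0, 3.0, 4.0]
```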
The instrument's 'run time' parameters; common to the whole of this spectrum.
The method of precursor ion selection and activation
This captures the type of ion selection being performed, and trigger m/z (or m/z's), neutral loss criteria etc. for tandem MS or data dependent scans.
The type and energy level used for activation.
Reference to the id attribute of the spectrum from which the precursor was selected.
Description of the process of performing an acquisition
Both the run-time instrument settings and the variations in software parameters that led to the generation of the specific spectrum being described.
List and descriptions of precursors to the spectrum currently being described.
This is the precursor step. If source activation is used, both msLevel and spectrumRef have the value 0. The spectrumRef holds the value of the id attribute of the spectrum from which the precursor ion was selected. An ordered list of these precursors can be given; the referred-to id numbers may not represent spectra present in the mzData file, but this should not prevent providing the history of this scan. Example (trivially): an MS survey scan has id = 1 and the first MSMS spectrum has id = 2, with the spectrumRef attribute on precursor for the MSMS spectrum having the value 1.
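Following spectrumRef links reconstructs a spectrum's 'family tree'. The sketch assumes a simplified map from each spectrum id to a single precursor id (real precursor lists may hold several entries) and treats 0 or a missing entry as the end of the chain. An id that is referenced but not present in the file is still reported, matching the point that referred-to spectra need not be in the file. The function name is hypothetical.

```python
def precursor_chain(spectrum_id: int, precursor_of: dict[int, int]) -> list[int]:
    """Walk spectrumRef links from spectrum_id back towards the survey scan.

    precursor_of maps a spectrum id to the id of its (single) precursor
    spectrum; 0 or a missing entry ends the chain.
    """
    chain: list[int] = []
    seen = {spectrum_id}
    current = precursor_of.get(spectrum_id, 0)
    while current != 0:
        if current in seen:
            raise ValueError(f"cycle in precursor references at id {current}")
        chain.append(current)  # recorded even if that spectrum is absent from the file
        seen.add(current)
        current = precursor_of.get(current, 0)
    return chain


# Survey scan id=1, MSMS id=2 (from 1), MS3 id=3 (from 2).
links = {2: 1, 3: 2}
print(precursor_chain(3, links))  # [2, 1]
```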
The number of precursor selection processes described in the attached list.
Additional comments regarding the acquisition are captured here as free text. This should only be used as a lifeboat for when the cv/userParams are inappropriate; or as a 'scratch' comment space.
Description of a supplemental data array
Description of the supplemental dataset, in both CV and free-text.
Information about the original source file used to generate the supDataArray.
Reference to the id attribute of the supplemental data array/binary.
Description of the components of the mass spectrometer used
Descriptive name of the instrument (make, model, significant customisations).
Invariant ion source (e.g. MALDI) information, as a run of name-value pairs.
Mass analyzer component list, ordered to reflect the physical order of the described components in the mass spectrometer.
A single component of the mass analyzer (e.g. quadrupole, collision cell), described with a run of name-value pairs.
The number of analyzers that are described in the attached list.
Ion detector information, as a run of name-value pairs.
Subsidiary information about the instrument; a run of additional parameters captured as name-value pairs
Structure allowing the use of controlled or uncontrolled vocabulary
This element holds additional data or annotation. Only controlled values are allowed here.
This element holds additional data or annotation. Uncontrolled, or user controlled values are allowed here.
Uncontrolled user parameters (vocabulary).
The actual name for the parameter.
The value for the parameter, where appropriate.
Extension of 'paramType' with an added free-text comment attribute.
Free-text opportunity to supplement, but not to replace, the main CV-based description.