The inputs to the analyses, including the databases searched, the spectral data and the source file converted to mzIdentML.
Data sets generated by the analyses, including peptide and protein lists.
The upper-most hierarchy level of mzIdentML, with sub-containers describing, for example, software, protocols and search results (spectrum identifications or protein detection results).
The list of CVs used within the file
The software used to perform the analyses (specify at least name, manufacturer, version, URL).
The Provider of the mzIdentML record in terms of the contact and software.
The complete set of Contacts (people and organisations) for this file.
The samples analysed can optionally be recorded using CV terms for descriptions. If a composite...
The collection of sequences (DBSequence or Peptide) identified to be referenced in the results.
The analyses performed to get the results, which map the input and output data sets. Examples of analyses are SpectrumIdentification (resulting in peptides) and ProteinDetection (assembling proteins from peptides).
The collection of protocols which include the parameters and settings of the performed analyses.
The collection of input and output data sets of the analyses.
Any bibliographic references associated with the file
The date on which the file was produced.
The version of the schema this instance document refers to, in the format x.y.z. Changes to z should not prevent instance documents from validating.
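As a rough illustration of the x.y.z rule above, two schema versions might be treated as interchangeable for validation when they differ only in z. The helper names below are hypothetical and not part of mzIdentML.

```python
def parse_version(version: str) -> tuple:
    """Split an 'x.y.z' schema version string into three integers."""
    x, y, z = version.split(".")
    return int(x), int(y), int(z)


def same_major_minor(a: str, b: str) -> bool:
    """Return True when two versions differ at most in z (hypothetical helper)."""
    return parse_version(a)[:2] == parse_version(b)[:2]
```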
A database for searching mass spectra. Examples include a set of amino acid sequence entries, or annotated spectra libraries.
The database name may be given as a cvParam if it maps exactly to one of the release databases listed in the CV, otherwise a userParam should be used.
The version of the database.
The release date of the database.
The total number of sequences in the database.
The number of residues in the database.
A file from which this mzIdentML instance was created.
The type of filter e.g. database taxonomy filter, pI filter, MW filter
All sequences fulfilling the specified criteria are included.
All sequences fulfilling the specified criteria are excluded.
The filter must include at least one of Include and Exclude. If both are used, it is assumed that inclusion is performed first.
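The ordering rule above (inclusion first, then exclusion) can be sketched as follows. This is only an illustration: the function name and the use of Python predicates to stand in for Include and Exclude criteria are assumptions, not part of the schema.

```python
def apply_filter(sequences, include=None, exclude=None):
    """Apply a database filter: Include criteria first, then Exclude.

    include / exclude are predicates standing in for the filter criteria
    (an assumption made for this sketch).
    """
    if include is None and exclude is None:
        raise ValueError("a filter needs at least one of include and exclude")
    kept = list(sequences)
    if include is not None:
        # Inclusion is performed first.
        kept = [s for s in kept if include(s)]
    if exclude is not None:
        # Exclusion is then applied to what was included.
        kept = [s for s in kept if not exclude(s)]
    return kept
```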
The table used to translate codons into amino acids e.g. by reference to the NCBI translation table.
The frames in which the nucleic acid sequence has been translated as a space separated list
The parameters and settings of a SpectrumIdentification analysis.
The type of search performed e.g. PMF, Tag searches, MS/MS
The search parameters other than the modifications searched.
The specification of static/variable modifications (e.g. Oxidation of Methionine) that are to be considered in the spectra search.
The threshold(s) applied to determine that a result is significant. If multiple terms are used it is assumed that all conditions are satisfied by the passing results.
The specification of filters applied to the database searched.
A specification of how a nucleic acid sequence database was translated for searching.
The search algorithm used, given as a reference to the SoftwareCollection section.
The attribute referencing an identifier within the SpectraData section.
A reference to the SpectraData element which locates the input spectra to an external file.
A reference to the database searched.
An Analysis which tries to identify peptides in input spectra, referencing the database searched, the input spectra, the output results and the protocol that is run.
One of the spectra data sets used (can be several).
One of the search databases used (can be several).
A reference to the search protocol used for this SpectrumIdentification.
A reference to the SpectrumIdentificationList produced by this analysis in the DataCollection section.
References to CV terms defining the measures about product ions to be reported in SpectrumIdentificationItem
Represents the set of all search results from SpectrumIdentification.
Scores or output parameters associated with the SpectrumIdentificationList
Contains the types of measures that will be reported in generic arrays for each SpectrumIdentificationItem e.g. product ion m/z, product ion intensity, product ion m/z error
This value should be provided unless a de novo search has been performed.
Specification of a search modification as parameter for a spectra search. Contains the name of the modification, the mass, the specificity and whether it is a static modification.
The specificity rules of the searched modification, including for example the probability of a modification's presence or its restriction to peptide or protein termini. Standard fixed or variable status should be provided by the attribute fixedMod.
True, if the modification is static (i.e. occurs always).
The values of this particular measure, corresponding to the index defined in ion type
A reference to the Measure defined in the FragmentationTable
The type of ion identified.
An array of values for a given type of measure and for a particular ion type, in parallel to the index of ions identified.
The index of ions identified as integers, following standard notation for a-c, x-z e.g. if b3 b5 and b6 have been identified, the index would store "3 5 6". For internal ions, the index contains pairs defining the start and end point - see specification document for examples. For immonium ions, the index is the position of the identified ion within the peptide sequence - if the peptide contains the same amino acid in multiple positions that cannot be distinguished, all positions should be given.
The charge of the identified fragmentation ions.
IonType defines the index of fragmentation ions being reported, importing a CV term for the type of ion e.g. b ion. Example: if b3 b7 b8 and b10 have been identified, the index attribute will contain 3 7 8 10, and the corresponding values will be reported in parallel arrays below
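To make the parallel-array idea concrete, the sketch below pairs an ion index with one array of measure values. It assumes that both are held as space-separated strings, and the m/z numbers used with it are made up for illustration; the function name is hypothetical.

```python
def pair_ions(index: str, values: str) -> dict:
    """Map each ion position in the index to the value at the same array slot."""
    positions = [int(i) for i in index.split()]
    measures = [float(v) for v in values.split()]
    if len(positions) != len(measures):
        raise ValueError("index and values must have the same length")
    return dict(zip(positions, measures))


# Illustrative (invented) m/z values for b3, b7, b8 and b10.
b_ion_mz = pair_ions("3 7 8 10", "301.1 702.3 815.4 1001.5")
```

Here `b_ion_mz[7]` gives the value reported for b7, because it sits in the same slot of the array as the 7 in the index.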
An identification of a single (poly)peptide, resulting from querying an input spectrum, along with the set of confidence values for that identification. PeptideEvidence elements should be given for all mappings of the corresponding Peptide sequence within protein sequences.
The product ions identified in this result.
The charge state of the identified peptide.
The mass-to-charge value measured in the experiment in Daltons / charge.
The theoretical mass-to-charge value calculated for the peptide in Daltons / charge.
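For orientation, a theoretical m/z can be derived from a neutral peptide mass and a charge state roughly as below. The sketch assumes positively charged, protonated ions; the function name is hypothetical and search engines may compute this value differently.

```python
PROTON_MASS = 1.007276466621  # mass of a proton in Daltons


def theoretical_mz(neutral_mass: float, charge: int) -> float:
    """Return m/z for a neutral mass carrying `charge` protons (assumed model)."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer in this sketch")
    return (neutral_mass + charge * PROTON_MASS) / charge
```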
The calculated isoelectric point of the (poly)peptide, with relevant modifications included. Do not supply this value if the pI cannot be calculated properly.
A reference to the identified (poly)peptide sequence in the Peptide element.
For an MS/MS result set, this is the rank of the identification quality as scored by the search engine. 1 is the top rank. If multiple identifications have the same top score, they should all be assigned rank = 1. For PMF data, the rank attribute may be meaningless and a value of rank = 0 should be given.
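The tie rule for rank can be sketched as below. Two things are assumptions here: that a higher score is better, and that ranks after a tie continue without gaps (dense ranking); the text above only fixes the behaviour for the top score.

```python
def assign_ranks(scores):
    """Rank scores so that equal scores share a rank and the best gets 1."""
    distinct = sorted(set(scores), reverse=True)  # best score first (assumed)
    rank_of = {score: rank for rank, score in enumerate(distinct, start=1)}
    return [rank_of[score] for score in scores]
```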
Set to true if the producers of the file have deemed that the identification has passed a given threshold or been validated as correct. If no such threshold has been set, a value of true should be given for all results.
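One way to picture this flag: every threshold term must hold for a result to pass, and with no thresholds at all every result passes. In the sketch below the score names and the use of predicates are hypothetical stand-ins for CV terms.

```python
def pass_threshold(scores: dict, thresholds: dict) -> bool:
    """Return True only if every threshold predicate accepts its score.

    `thresholds` maps a score name to a predicate; with an empty mapping
    all() returns True, matching the rule for files without a threshold.
    """
    return all(check(scores[name]) for name, check in thresholds.items())
```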
A reference should be given to the MassTable used to calculate the sequenceMass only if more than one MassTable has been given
A reference should be provided to link the SpectrumIdentificationItem to a Sample if more than one sample has been described in the AnalysisSampleCollection.
All identifications made from searching one spectrum. For PMF data, all peptide identifications will be listed underneath as SpectrumIdentificationItems. For MS/MS data, there will be ranked SpectrumIdentificationItems corresponding to possible different peptide IDs.
The locally unique id for the spectrum in the spectra data set specified by SpectraData_ref. External guidelines are provided on the use of consistent identifiers for spectra in different external formats.
A reference to a spectra data set (e.g. a spectra file).
A reference to the list of spectrum identifications that were input to the process.
An Analysis which assembles a set of peptides (e.g. from a spectra search analysis) to proteins.
The lists of spectrum identifications that are input to the protein detection process.
A reference to the ProteinDetectionList in the DataCollection section.
A reference to the detection protocol used for this ProteinDetection.
The parameters and settings of a ProteinDetection process.
The parameters and settings for the protein detection given as CV terms.
The threshold(s) applied to determine that a result is significant. If multiple terms are used it is assumed that all conditions are satisfied by the passing results.
The protein detection software used, given as a reference to the SoftwareCollection section.
The protein list resulting from a protein detection process.
Scores or output parameters associated with the ProteinDetectionList
A reference to the PeptideEvidence element on which this hypothesis is based.
A single result of the ProteinDetection analysis (i.e. a protein).
Peptide evidence on which this ProteinHypothesis is based by reference to a PeptideEvidence element in a SpectrumIdentificationItem.
A reference to the corresponding DBSequence entry. This is optional and redundant, because the PeptideEvidence elements referenced from here also map to the DBSequence.
Set to true if the producers of the file have deemed that the ProteinDetectionHypothesis has passed a given threshold or been validated as correct. If no such threshold has been set, a value of true should be given for all results.
A set of logically related results from a protein detection, for example to represent conflicting assignments of peptides to proteins.
A molecule modification specification. If n modifications have been found on a peptide, there should be n instances of Modification. If multiple modifications are provided as cvParams, it is assumed that the modification is ambiguous i.e. one modification or another. If no CVParams are provided it is assumed that the delta has not been matched to a known modification. A neutral loss should be defined as an additional CVParam within Modification. If more complex information should be given about neutral losses (such as presence/absence on particular product ions), this can additionally be encoded within the FragmentationArray.
Location of the modification within the peptide - position in peptide sequence, counted from the N-terminus residue, starting at position 1. Specific modifications to the N-terminus should be given the location 0. Modification to the C-terminus should be given as peptide length + 1.
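The location convention can be illustrated with a small helper that interprets a location value against a peptide sequence. The function name and return labels are hypothetical.

```python
def describe_location(sequence: str, location: int) -> str:
    """Interpret a modification location for the given peptide sequence."""
    if location == 0:
        return "N-term"
    if location == len(sequence) + 1:
        return "C-term"
    if 1 <= location <= len(sequence):
        # Positions are 1-based, counted from the N-terminal residue.
        return sequence[location - 1]
    raise ValueError("location outside 0..len(sequence)+1")
```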
Specification of the residue (amino acid) on which the modification occurs. If multiple values are given, it is assumed that the exact residue modified is unknown i.e. the modification is to ONE of the residues listed. Multiple residues would usually only be specified for PMF data.
Atomic mass delta considering the natural distribution of isotopes in Daltons.
Atomic mass delta when assuming only the most common isotope of elements in Daltons.
The modification searched for (sourced from e.g. UniMod) and its mass delta
The name of the modification imported from a relevant CV
The mass delta of the searched modification in Daltons
The residue(s) searched with the specified modification
One (poly)peptide (a sequence with modifications).
The amino acid sequence of the (poly)peptide. If a substitution modification has been found, the original sequence should be reported.
A modification where one residue is substituted by another (amino acid change).
The original residue before replacement.
The residue that replaced the originalResidue.
Location of the modification within the peptide - position in peptide sequence, counted from the N-terminus residue, starting at position 1. Specific modifications to the N-terminus should be given the location 0. Modification to the C-terminus should be given as peptide length + 1.
Atomic mass delta considering the natural distribution of isotopes in Daltons. This should only be reported if the original amino acid is known i.e. it is not "X"
Atomic mass delta when assuming only the most common isotope of elements in Daltons. This should only be reported if the original amino acid is known i.e. it is not "X"
A data set containing spectra data (consisting of one or more spectra).
The software used for performing the analyses.
The name of the analysis software package, sourced from a CV if available.
Any customizations to the software, such as alternative scoring mechanisms implemented, should be documented here as free text.
URI of the analysis software e.g. manufacturer's website
The details of an individual cleavage enzyme should be provided by giving a regular expression or a CV term if a "standard" enzyme cleavage has been performed.
The name of the enzyme from a CV.
Element formula gained at NTerm.
Element formula gained at CTerm.
Set to true if the enzyme cleaves semi-specifically (i.e. one terminus must cleave according to the rules, the other can cleave at any residue), false if the enzyme cleavage is assumed to be specific to both termini (allowing for any missed cleavages).
The number of missed cleavage sites allowed by the search. The attribute must be provided if an enzyme has been used.
Minimal distance for another cleavage (minimum: 1).
The list of enzymes used in the experiment
If there are multiple enzymes specified, this attribute is set to true if cleavage with different enzymes is performed independently
The single letter code for the residue.
The residue mass in Daltons (not including any fixed modifications).
The single letter code of the ambiguous residue e.g. X.
The masses of residues used in the search.
The specification of a single residue within the mass table.
Ambiguous residues e.g. X can be specified by the Code attribute and a set of parameters for example giving the different masses that will be used in the search.
The MS spectrum that the MassTable refers to e.g. "1" for MS1, "2" for MS2, or "1 2" for MS1 or MS2
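As a sketch of how a mass table might be used, the snippet below sums residue masses and adds one water to obtain a neutral peptide mass. The table holds only a few monoisotopic residue masses for illustration, and the names `MASS_TABLE` and `peptide_mass` are hypothetical.

```python
# Monoisotopic residue masses in Daltons (small illustrative subset).
MASS_TABLE = {"G": 57.02146, "A": 71.03711, "S": 87.03203}
WATER = 18.010565  # monoisotopic mass of H2O, added once per peptide


def peptide_mass(sequence: str, table: dict = MASS_TABLE) -> float:
    """Sum residue masses from the table and add one water."""
    try:
        return sum(table[residue] for residue in sequence) + WATER
    except KeyError as missing:
        raise ValueError(f"residue {missing} not in mass table") from None
```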
PeptideEvidence maps a spectrum identification to the DBSequence in which such a peptide is located.
A reference to the sequence from which this identification has been made.
Start position of the peptide inside the protein sequence, where the first amino acid of the protein sequence is position 1.
The index position of the last amino acid of the peptide inside the protein sequence, where the first amino acid of the protein sequence is position 1.
Previous flanking residue. If the peptide is N-terminal, pre="-" and not pre="". If for any reason it is unknown (e.g. de novo), pre="?" should be used.
Post flanking residue. If the peptide is C-terminal, post="-" and not post="". If for any reason it is unknown (e.g. de novo), post="?" should be used.
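The start, end, pre and post conventions can be illustrated by locating a peptide in a protein sequence. This is a minimal sketch with a hypothetical function name; it reports only the first occurrence, whereas a peptide may map to several positions.

```python
def locate_peptide(protein: str, peptide: str) -> dict:
    """Return 1-based start/end and flanking residues for the first match."""
    offset = protein.find(peptide)
    if offset < 0:
        raise ValueError("peptide not found in protein sequence")
    start = offset + 1            # first residue of the protein is position 1
    end = offset + len(peptide)   # position of the peptide's last residue
    pre = protein[offset - 1] if offset > 0 else "-"   # "-" at the N-terminus
    post = protein[end] if end < len(protein) else "-"  # "-" at the C-terminus
    return {"start": start, "end": end, "pre": pre, "post": post}
```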
A reference to the translation table used if this is PeptideEvidence derived from nucleic acid sequence
The translation frame of this sequence if this is PeptideEvidence derived from nucleic acid sequence
Set to true if the peptide is matched to a decoy sequence.
Number of missed cleavage sites (not required if no enzyme has been used).
The tolerance of the search given as a plus and minus value with units.
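A plus and minus tolerance allows an asymmetric window around the theoretical value. The sketch below assumes both values are in the same unit as the masses (Daltons here) and that the error is taken as observed minus theoretical; both choices, and the function name, are assumptions.

```python
def within_tolerance(observed: float, theoretical: float,
                     plus: float, minus: float) -> bool:
    """Check theoretical - minus <= observed <= theoretical + plus."""
    error = observed - theoretical
    return -minus <= error <= plus
```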
The format of the spectrum identifier within the source file
A database sequence from the specified SearchDatabase (nucleic acid or amino acid). If the sequence is nucleic acid, the source nucleic acid sequence should be given in the seq attribute rather than a translated sequence.
The actual sequence of amino acids or nucleic acid.
The length of the sequence as a number of bases or residues.
The source database of this sequence.
The unique accession of this sequence.
A description of the sample analysed by mass spectrometry using CVParams or UserParams. If a composite sample has been analysed, a parent sample should be defined, which references subsamples.
References to the individual component samples within a mixed parent sample.
Regular expression for specifying the enzyme cleavage site.
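A cleavage-site regular expression can drive an in-silico digest along the following lines. The pattern `(?<=[KR])(?!P)` is a commonly used expression for trypsin (cleave after K or R unless followed by P); the `digest` function and its handling of missed cleavages are a hypothetical sketch, not a reference implementation. It relies on `re.split` accepting zero-width matches, which needs Python 3.7 or later.

```python
import re

TRYPSIN = r"(?<=[KR])(?!P)"  # zero-width match at each cleavage site


def digest(protein: str, site_regex: str, missed_cleavages: int = 0) -> list:
    """Cut a protein at every regex match, allowing some missed cleavages."""
    # Drop empty pieces, e.g. when the protein ends at a cleavage site.
    pieces = [p for p in re.split(site_regex, protein) if p]
    peptides = []
    for start in range(len(pieces)):
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(pieces):
                break
            # Joining adjacent pieces models `extra` missed cleavage sites.
            peptides.append("".join(pieces[start:end]))
    return peptides
```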