proBAM

proBAM is one of the standards developed by members of the Proteomics Informatics working group of the PSI.

For general information on the activities and organization of this working group, see HERE.

Contents

  1. proBAM (version 1.0.0): Specification documents
  2. proBAM Tools and Implementations

proBAM (version 1.0.0): Specification documents

The proteomics BAM (proBAM) file format is designed for storing and analyzing peptide-spectrum matches (PSMs) within the context of the genome. proBAM is built upon the SAM format and its compressed binary version, BAM, with the modifications needed to accommodate information specific to proteomics data, such as PSM scores and confidence, charge states, and peptide-level modifications, both artefactual modifications and post-translational modifications (PTMs). A manuscript describing the proBAM format (together with the proBed format) is available at Genome Biology.
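Because proBAM builds on SAM, proteomics-specific values travel in SAM optional fields, which have the form TAG:TYPE:VALUE. The following is a minimal sketch of how such a record could be read from its text (SAM) representation using only the Python standard library. The example record is hypothetical, and the tag names used in it (XP for the peptide sequence, XC for the charge state, XS for the PSM score) are assumptions for illustration that should be checked against the specification document.

```python
from typing import Dict, Union

TagValue = Union[int, float, str]

# Hypothetical alignment line: 11 mandatory SAM fields followed by optional
# TAG:TYPE:VALUE fields. The tag names below are illustrative assumptions.
EXAMPLE_LINE = "\t".join([
    "spectrum_1",  # QNAME: here, a made-up spectrum identifier
    "0",           # FLAG
    "chr1",        # RNAME
    "1000",        # POS (1-based leftmost position)
    "255",         # MAPQ
    "9M",          # CIGAR
    "*",           # RNEXT
    "0",           # PNEXT
    "0",           # TLEN
    "ATGAAACGT",   # SEQ: nucleotides coding for the peptide MKR
    "*",           # QUAL
    "XP:Z:MKR",    # assumed tag: peptide sequence
    "XC:i:2",      # assumed tag: charge state
    "XS:f:45.2",   # assumed tag: PSM score
])


def parse_optional_fields(fields) -> Dict[str, TagValue]:
    """Convert SAM optional fields (TAG:TYPE:VALUE) into a dictionary."""
    tags: Dict[str, TagValue] = {}
    for item in fields:
        tag, type_code, value = item.split(":", 2)
        if type_code == "i":
            tags[tag] = int(value)
        elif type_code == "f":
            tags[tag] = float(value)
        else:
            # Other types (A, Z, H, B) are kept as raw strings in this sketch.
            tags[tag] = value
    return tags


def parse_sam_record(line: str) -> dict:
    """Parse one tab-separated SAM alignment line into a small dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "cigar": fields[5],
        "seq": fields[9],
        "tags": parse_optional_fields(fields[11:]),
    }


if __name__ == "__main__":
    record = parse_sam_record(EXAMPLE_LINE)
    print(record["rname"], record["pos"], record["tags"])
```

In practice a BAM library would handle the binary file; the sketch only shows where the genomic coordinates and the proteomics annotations sit within a single record.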

Direct links to deliverables:

First set of proBAM example and related files:

  • PXD001524.pro.bam – example proBAM file converted from the mzIdentML example file indicated below using the proBAMconvert tool (http://probam.biobix.be/). In this example, the SwissProt entry names (e.g. HDGF_HUMAN) were used for mapping to Ensembl prior to obtaining the genomic coordinates. 
  • PXD001524_reprocessed_sort.pro.bam – example proBAM file converted from the mzIdentML file using proBAMr (https://www.bioconductor.org/packages/release/bioc/html/proBAMr.html). In this example, peptides were mapped to protein sequences from the RefSeq database. The genome coordinates in this file are based on hg19.
  • Additional related files:

Second set of proBAM example and related files:

  • PXD001390.pro.bam – 2nd example proBAM file converted from the 2nd mzTab example file using the proBAMconvert tool. In this example, the Ensembl protein identifiers (e.g. ENSP00000349878) were used for genome coordinate mapping.
  • PXD001390_sort.pro.bam – 2nd example proBAM file converted from the mzIdentML example file using proBAMr. Ensembl v86 was used for annotating genomic coordinates.
  • Additional related files:

Third set of proBAM example and related files:

  • PXD000124.pro.bam – 3rd example proBAM file converted from the 3rd mzTab example file. In this example, the Ensembl transcript identifiers (e.g. ENSMUST00000005017) were used for genome coordinate mapping. In addition, a three-frame translation was performed on these transcript sequences, enabling the mapping of 5′ UTR translation products.
  • Additional related files:

Fourth set of proBAM example and related files:

  • CPTAC_CRC_custom.pro.bam – 4th example proBAM file generated from the CPTAC_CRC dataset (proteomics data for 91 samples representing 86 TCGA colorectal cancer tumors, generated by the Clinical Proteomic Tumor Analysis Consortium). RefSeq annotation (hg19) was used for genome coordinate mapping. In addition to standard peptide identification, customized databases derived from RNA-Seq data were used to identify variant peptides. The location of each variant and the nucleotide-level change are also included in this proBAM file.

Both the proBAMconvert and proBAMr tools perform the mapping, thus obtaining the genomic coordinates.

proBAM Tools and Implementations

  • Commonly used genomic tools that can process proBAM files (not all of their functions may apply)
    • SAMtools: provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. (http://www.htslib.org/doc/samtools.html)
    • BEDtools: a powerful toolset to intersect, merge, count, complement and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF and VCF. (http://bedtools.readthedocs.io/en/latest)
    • IGV (Integrative Genomics Viewer): high-performance visualization tool for interactive exploration of large, integrated genomic datasets. (https://www.broadinstitute.org/igv/)
  • proBAMconvert: a Python implementation for generating proBAM files (proBAMconvert, PMID: 28573858). A worked-out example is also available (example).
  • proBAMsuite: includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively (proBAMr, proBAMtools, PMID: 26657539)
    • A computational pipeline to generate and analyze proBAM files in the R environment. (link)
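As an illustration of using a general genomic tool on a proBAM file, the sketch below assembles three standard SAMtools invocations (sort, index, and view of a genomic region) and runs them only if samtools is found on the PATH. The helper names, the output file naming, and the region string are hypothetical; the region in particular has to match the reference sequence names in the file's header.

```python
import shutil
import subprocess
from typing import List


def samtools_commands(probam_path: str, region: str) -> List[List[str]]:
    """Build argument lists to sort, index and query a proBAM file."""
    # Hypothetical naming scheme for the coordinate-sorted output file.
    if probam_path.endswith(".pro.bam"):
        sorted_path = probam_path[: -len(".pro.bam")] + ".sorted.pro.bam"
    else:
        sorted_path = probam_path + ".sorted.bam"
    return [
        # Sort by genomic coordinate, writing to a new file.
        ["samtools", "sort", "-o", sorted_path, probam_path],
        # Index the sorted file so that regions can be retrieved quickly.
        ["samtools", "index", sorted_path],
        # Print the header and the records overlapping the region.
        ["samtools", "view", "-h", sorted_path, region],
    ]


def run_if_available(commands: List[List[str]]) -> bool:
    """Run the commands in order; return False if samtools is missing."""
    if shutil.which("samtools") is None:
        print("samtools not found on PATH; commands were not run")
        return False
    for argv in commands:
        subprocess.run(argv, check=True)
    return True


if __name__ == "__main__":
    # The region "chr1:1-100000" is an assumed example.
    for argv in samtools_commands("PXD001524.pro.bam", "chr1:1-100000"):
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to inspect the commands before running them against a real file.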