PEFF - PSI Extended FASTA Format

PEFF (PSI Extended FASTA Format) is a proposed unified format for protein and nucleotide sequence databases, intended for use by sequence search engines and associated tools (spectral library search tools, sequence alignment software, data repositories, etc.). It enables consistent extraction, display and processing of information such as the sequence database entry identifier, description and taxonomy across software platforms. It also allows the representation of structural annotations such as post-translational modifications, mutations and other processing events. The proposed format is a flat file that extends the formalism of the individual sequence entries of the FASTA format and adds a header of metadata describing the database(s) from which the sequences were obtained (e.g., name, version). Sequence database providers are encouraged to generate this format as part of their release policy, or to provide appropriate converters that can be incorporated into processing tools.
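To illustrate why carrying structural annotations alongside a sequence is useful to a search engine, the sketch below reads a simple substitution annotation and derives the variant sequence from it. The annotation notation used here, one or more "(position|residue)" groups with 1-based positions, is an assumption loosely modeled on the draft; the normative key names and delimiters are defined by the PEFF specification and are not reproduced here. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# ASSUMED notation (illustrative only): "(position|residue)" groups, 1-based,
# e.g. "(3|K)(5|W)". Check the PEFF specification for the real syntax.
_VARIANT_GROUP = re.compile(r"\((\d+)\|([A-Z])\)")


def parse_simple_variants(value: str) -> List[Tuple[int, str]]:
    """Return (1-based position, replacement residue) pairs from an annotation value."""
    return [(int(pos), aa) for pos, aa in _VARIANT_GROUP.findall(value)]


def apply_variant(sequence: str, position: int, residue: str) -> str:
    """Return a copy of `sequence` with the residue at 1-based `position` replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError(
            f"position {position} outside sequence of length {len(sequence)}"
        )
    index = position - 1  # convert the 1-based annotation position to a 0-based index
    return sequence[:index] + residue + sequence[index + 1:]
```

Each substitution is applied independently to the reference sequence, so a tool can enumerate one variant sequence per annotation rather than silently combining all of them.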
PEFF is a flat file consisting of a header section, which contains metadata about the sequence database (such as name, version, source and type), and a sequence section, which contains, for each protein or nucleotide sequence, a FASTA-like header line followed by the sequence itself.
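The two-part layout described above can be read with a small line-oriented parser. This is a minimal sketch under stated assumptions, not the normative PEFF grammar: metadata lines are taken to look like "# Key=Value", an entry line like ">identifier \Key=Value \Key=Value", and every other non-empty line is sequence data for the current entry. Key names such as DbName and PName in the test data are modeled on the draft but should be treated as assumptions; parse_peff_like and Entry are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# ASSUMED layout (illustrative, not the normative PEFF grammar):
#   "# Key=Value"                -> database metadata in the header section
#   ">id \Key=Value \Key=Value"  -> start of a sequence entry
#   any other non-empty line     -> residues of the current entry
# Split on whitespace + backslash only when a "Key=" follows, so values may contain spaces.
_FIELD_SPLIT = re.compile(r"\s+\\(?=[A-Za-z]\w*=)")


@dataclass
class Entry:
    identifier: str
    fields: Dict[str, str] = field(default_factory=dict)
    sequence: str = ""


def parse_peff_like(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[Entry]]:
    """Split PEFF-like text into header metadata pairs and sequence entries."""
    metadata: List[Tuple[str, str]] = []
    entries: List[Entry] = []
    chunks: List[str] = []  # sequence lines collected for the current entry

    def flush() -> None:
        # Attach the collected sequence lines to the most recent entry.
        if entries:
            entries[-1].sequence = "".join(chunks)
        chunks.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, sep, value = line[1:].strip().partition("=")
            if sep:  # keep Key=Value lines; skip version or separator lines
                metadata.append((key.strip(), value.strip()))
        elif line.startswith(">"):
            flush()
            identifier, *pairs = _FIELD_SPLIT.split(line[1:].strip())
            fields = dict(pair.split("=", 1) for pair in pairs)
            entries.append(Entry(identifier, fields))
        else:
            chunks.append(line)
    flush()
    return metadata, entries
```

Metadata is kept as ordered (key, value) pairs rather than a dictionary so that repeated keys, as may occur when a header describes several source databases, are not silently overwritten.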

As with other PSI formats, PEFF consists of a specification document, a controlled vocabulary (CV) to be incorporated into the PSI-MS CV, a set of example files and several software implementations.

The format has been submitted to the PSI document process and is under review. The draft submission is available here. The format is currently being revised, and a GitHub repository is being created.


Current Implementations

- neXtProt exports PEFF

- Comet reads PEFF as an input database format

- Compomics has a PEFF viewer

- ProteinPilot reads PEFF as an input database format