
mzPeak is a new binary file format for representing instrument runs compactly and quickly, whether the data lives locally or in the cloud. It is built on top of Apache Parquet, a feature-rich, robust, and production-tested data format with implementations available in virtually all major programming languages. You can read it from your hard drive, an object store like Amazon’s S3, or most HTTP servers. It competes with, and sometimes beats, proprietary vendor file formats on size, and uncompressed mzML on complex random access queries.
Specification Status: In Development
Current Location: https://github.com/mobiusklein/mzpeak_prototyping
Specification Draft: Markdown
Original White Paper: https://pubs.acs.org/doi/full/10.1021/acs.jproteome.5c00435
Active Reference Implementations
These are all independent re-implementations, not bindings to a single core library written in a lower-level language. They use the generally available Apache Parquet and Apache Arrow libraries for their respective languages.
Rust: https://github.com/mobiusklein/mzpeak_prototyping
- This is the ur-reference implementation, which serves as the basis for developing the initial prototypes and pushing their limits.
- Read (Local, Cloud)
- Write (Local), including direct conversion from Thermo RAW and Bruker .TDF
Python: https://github.com/mobiusklein/mzpeak_prototyping/python
- Read (Local, Cloud)
- Bonus: SQL interface via DataFusion
R: https://github.com/mobiusklein/mzpeak_prototyping/R
- Read (Local)
C++: TODO
C#: TODO
Java: TODO
TypeScript/WASM: TODO