Mass spectrometry software

Mass spectrometry software is any software for data acquisition, analysis or data representation in mass spectrometry.

Most of the tools described below can read the mass spectrometry data formats mzData and mzXML.

PEAKS
PEAKS is designed for peptide sequencing and protein identification from tandem mass spectrometry (MS/MS) data.

Besides search engine protein identification (Protein ID), PEAKS was one of the earliest tools to adopt de novo sequencing (both automated and manual) and sequence tag-based searching (SPIDER).

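De novo sequencing is peptide sequencing performed without prior knowledge of the amino acid sequence. Instead, the sequence is inferred from the spectrum itself, for example from the mass differences between fragment ions.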
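A minimal sketch of the idea, assuming an idealized spectrum in which every singly charged b ion is observed and nothing else is present. Under that assumption, consecutive ion masses differ by exactly one residue mass, so the sequence can be read directly from the gaps. The function names and the 0.02 Da tolerance are hypothetical. This is not PEAKS's algorithm, which must also handle missing ions, noise and y ions.

```python
# Monoisotopic residue masses in daltons. Isoleucine is omitted because its
# mass equals that of leucine, so a mass ladder cannot tell them apart.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # Da; a singly charged b ion carries one extra proton


def b_ion_ladder(peptide):
    """Singly charged b1..bn m/z values for an unmodified peptide (idealized)."""
    ladder, total = [], PROTON
    for residue in peptide:
        total += RESIDUE_MASSES["L" if residue == "I" else residue]
        ladder.append(total)
    return ladder


def de_novo_from_ladder(ladder, tol=0.02):
    """Read a sequence from the mass gaps between consecutive ladder ions.

    Each gap is assigned the residue of closest mass; gaps with no residue
    within `tol` Da are reported as '?'.
    """
    sequence, previous = [], PROTON
    for mz in sorted(ladder):
        gap = mz - previous
        residue, mass = min(RESIDUE_MASSES.items(), key=lambda item: abs(item[1] - gap))
        sequence.append(residue if abs(mass - gap) <= tol else "?")
        previous = mz
    return "".join(sequence)


print(de_novo_from_ladder(b_ion_ladder("PEPTIDEK")))  # PEPTLDEK
```

Because leucine and isoleucine have the same mass, PEPTIDEK is read back as PEPTLDEK. If an ion is missing, two gaps merge into one. This is how errors such as GG being read as N arise.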

PEAKS reports a complete sequence for each peptide, confidence scores for individual amino acid assignments, simple summary reports for high-throughput analysis, and detailed results for in-depth investigations.

Comparing results is an important part of validation. PEAKS can automatically cross-check its results against other protein ID search engines, such as Sequest, OMSSA, X!Tandem and Mascot. Agreement between independent engines helps guard against false-positive peptide assignments.
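A simple way to combine identifications from several engines is to hold a vote for each spectrum. A peptide is accepted only if enough engines report it independently, because a random false match is unlikely to be reproduced by different scoring methods. The sketch assumes that each engine's results have already been reduced to one peptide per spectrum. The function name and the two-engine threshold are hypothetical, and the sketch does not describe how PEAKS actually performs its cross-check.

```python
from collections import Counter


def consensus_ids(ids_by_engine, min_agree=2):
    """Keep a spectrum's peptide only if at least `min_agree` engines report it.

    ids_by_engine maps an engine name to {spectrum_id: peptide_sequence}.
    """
    # Count, per spectrum, how many engines proposed each peptide.
    votes = {}
    for ids in ids_by_engine.values():
        for spectrum_id, peptide in ids.items():
            votes.setdefault(spectrum_id, Counter())[peptide] += 1
    # Accept the most-voted peptide only when it reaches the agreement threshold.
    accepted = {}
    for spectrum_id, counter in votes.items():
        peptide, count = counter.most_common(1)[0]
        if count >= min_agree:
            accepted[spectrum_id] = peptide
    return accepted
```

With min_agree=2, a spectrum on which the engines disagree is left unassigned rather than given a possibly false identification.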

PROTRAWLER
ProTrawler is an LCMS data reduction application. It reads raw mass spectrometry data from several major instrument vendors and creates lists of {mass, retention time, integrated signal intensity} triplets that summarize the LCMS chromatogram. Each measurement is reported with its error, which is essential for dynamic binning when data sets are compared. ProTrawler operates in two modes. A highly visual, hands-on (expert) mode is used to develop the data reduction parameters, and a fully automated mode processes many chromatograms in batch. The data reduction workflow includes background elimination, noise estimation, peak shape estimation, shape deconvolution, and isotopic and charge-state deconvolution of the resulting lists (taking errors and signal noise into account), and produces a list of features. Typically, ProTrawler reduces 1 GB of raw data to 10 KB of processed results, with a detection sensitivity spanning three orders of magnitude, and processing takes 25% of the data acquisition time. No formal Bayesian methods are used, but statistical inference is employed throughout. ProTrawler has been used in bacterial protein biomarker discovery as well as in IPEx-related applications.
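Of the steps listed above, charge-state deconvolution is the easiest to show in a few lines. An ion of neutral mass M carrying z protons appears at m/z = (M + z * m_p) / z, where m_p is the proton mass. Peaks observed at different charge states can therefore be converted back to M and merged. The sketch assumes that charges have already been assigned, and it ignores isotope envelopes and measurement errors. The names and the 10 ppm tolerance are hypothetical and do not describe ProTrawler's implementation.

```python
PROTON = 1.007276  # Da


def neutral_mass(mz, charge):
    """Neutral mass M of an ion seen at m/z = (M + charge * PROTON) / charge."""
    return charge * (mz - PROTON)


def summarize(group):
    """Intensity-weighted mean mass and summed intensity of merged peaks."""
    total = sum(intensity for _, intensity in group)
    mass = sum(m * intensity for m, intensity in group) / total
    return mass, total


def charge_deconvolve(peaks, ppm_tol=10.0):
    """Collapse (m/z, charge, intensity) peaks into (neutral mass, intensity) features.

    Peaks are sorted by neutral mass; a new feature starts whenever a mass is
    more than `ppm_tol` ppm above the first mass of the current group.
    """
    by_mass = sorted((neutral_mass(mz, z), intensity) for mz, z, intensity in peaks)
    features, group = [], []
    for mass, intensity in by_mass:
        if group and (mass - group[0][0]) / group[0][0] * 1e6 > ppm_tol:
            features.append(summarize(group))
            group = []
        group.append((mass, intensity))
    if group:
        features.append(summarize(group))
    return features
```

For example, one analyte observed at charges 1, 2 and 3 collapses into a single feature whose intensity is the sum of the three peaks.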

REGATTA
Regatta is an LCMS list comparison application for filtering and normalizing {mass, retention time, integrated intensity} lists. It works hand-in-hand with ProTrawler, but its input is not restricted to ProTrawler output, and it also accepts Excel/CSV files. To accomplish this, Regatta solves the Transitive Property of Equality problem that arises when analytical list data are compared. Suppose Peak A in Sample A overlaps Peak B in Sample B, and Peak B overlaps Peak C in Sample C, but Peak A does not overlap Peak C. Can we then say that the same analyte was measured in all three samples? Regatta also implements multivariate analysis, such as hierarchical cluster analysis and principal component analysis, as well as statistics such as coefficients of variation. Regatta has been used successfully for biomarker discovery.
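The overlap problem can be made concrete with features that carry explicit errors. In the sketch below, two features overlap when both their mass intervals and their retention-time intervals intersect. A overlaps B and B overlaps C, yet A does not overlap C. Single-linkage grouping (union-find over the overlap relation) is shown as one possible answer, and it chains all three features into one analyte. A stricter rule, such as requiring every pair in a group to overlap, would split them. The class and function names are hypothetical, and this is not Regatta's documented algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    sample: str
    mass: float      # Da
    mass_err: float  # Da
    rt: float        # minutes
    rt_err: float    # minutes


def overlaps(a, b):
    """True if the error intervals of a and b intersect in both mass and RT."""
    return (abs(a.mass - b.mass) <= a.mass_err + b.mass_err
            and abs(a.rt - b.rt) <= a.rt_err + b.rt_err)


def single_linkage_groups(features):
    """Group features that are connected by any chain of pairwise overlaps."""
    parent = list(range(len(features)))

    def find(i):
        # Follow parent links to the group root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if overlaps(features[i], features[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, feature in enumerate(features):
        groups.setdefault(find(i), []).append(feature)
    return list(groups.values())
```

Because overlap is not transitive, the choice of grouping rule decides the answer to the question posed above.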

SPIDER
Common database search engines are unable to recognize some peptides. SPIDER, a sequence tag-based search tool, complements protein identification by quickly searching protein databases for sequences homologous to the proposed (de novo) peptide sequences. Because partial sequence matches are recognized, SPIDER gives a greater understanding of post-translational modifications and sequence mutations.

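BLAST-style homology search fails when confronted with sequence substitutions that are common in de novo results, such as I/L, N/GG and SAT/TAS. The two sides of each pair have the same mass, so the spectrum cannot distinguish them, yet a letter-by-letter comparison counts them as mismatches.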
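The arithmetic behind these substitutions is simple. I and L have identical residue masses, N has the same elemental composition as two glycines, and SAT and TAS contain the same residues in a different order, so the two sides of each pair have the same mass. The sketch below uses hypothetical function names and includes only the residues needed. It shows that a mass comparison treats each pair as equivalent, while a letter-by-letter comparison counts mismatches. It illustrates the problem and is not SPIDER's algorithm.

```python
# Monoisotopic residue masses (Da) for the substitutions discussed here.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "L": 113.08406, "I": 113.08406, "N": 114.04293,
}


def residue_mass(seq):
    """Sum of residue masses (no water, no charge) for a sequence fragment."""
    return sum(RESIDUE_MASSES[aa] for aa in seq)


def mass_equivalent(a, b, tol=0.001):
    """True if two fragments are indistinguishable by mass within `tol` Da."""
    return abs(residue_mass(a) - residue_mass(b)) <= tol


def letter_mismatches(a, b):
    """Naive position-by-position mismatch count for equal-length strings."""
    return sum(x != y for x, y in zip(a, b))
```

For example, SAT and TAS are mass-equivalent but differ at two of three positions when compared letter by letter.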

SEQUEST
SEQUEST is a popular tandem mass spectrometry data analysis program. Sequest matches collections of tandem mass spectra to peptide sequences generated from databases of protein sequences.

This tool is most useful in the context of shotgun proteomics. This strategy starts with a complex mixture of proteins and typically uses trypsin to digest them into peptides. The resulting peptides are separated by liquid chromatography en route to a tandem mass spectrometer. The mass spectrometer then isolates ions of a particular peptide, subjects them to collision-induced dissociation, and records the resulting fragments in a tandem mass spectrum. Repeating this process for several hours produces thousands of tandem mass spectra. Identifying such a data collection requires automation, and Sequest was the first software to fill that need.
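When a search database is prepared, the trypsin digestion step is usually simulated in silico. The minimal sketch below assumes the textbook simplification of trypsin specificity: cleave after K or R unless the next residue is P. The function name and the missed_cleavages parameter are hypothetical.

```python
import re


def tryptic_digest(protein, missed_cleavages=0):
    """In silico trypsin digest: cleave after K or R unless followed by P.

    Peptides spanning up to `missed_cleavages` uncut sites are also returned.
    """
    # Split at zero-width positions (re.split on empty matches needs Python 3.7+);
    # drop the empty string produced when the protein ends in K or R.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for start in range(len(pieces)):
        for end in range(start, min(start + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[start:end + 1]))
    return peptides
```

For example, tryptic_digest("MAKPLRGEKTW") gives MAKPLR, GEK and TW. The K in MAKPLR is not cleaved because a proline follows it.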

Sequest identifies each tandem mass spectrum individually. The software evaluates the protein sequences in a database to compute the list of peptides that could result from each protein. The intact mass of the peptide is known from the mass spectrum, and Sequest uses it to select candidate peptide sequences: only those near the mass of the observed peptide ion are compared to the spectrum. For each candidate peptide, Sequest projects a theoretical tandem mass spectrum and compares it to the observed tandem mass spectrum by cross-correlation. The candidate sequence whose theoretical spectrum matches best is reported as the best identification for this spectrum.
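The steps in this paragraph can be sketched as follows. The sketch assumes unmodified peptides, singly charged b and y fragment ions, and 1 Da bins. The score is a simplified cross-correlation in the spirit of Sequest's XCorr: the correlation at zero offset minus the mean correlation over nearby offsets. Sequest's actual spectrum preprocessing and scoring differ in detail, and all names here are hypothetical.

```python
from statistics import mean

RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # Da
PROTON = 1.007276  # Da


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASSES[aa] for aa in peptide) + WATER


def candidates(peptides, precursor_mass, tol=0.02):
    """Keep only database peptides within `tol` Da of the precursor mass."""
    return [p for p in peptides if abs(peptide_mass(p) - precursor_mass) <= tol]


def fragment_ions(peptide):
    """Singly charged b and y ion m/z values: the theoretical spectrum."""
    ions = []
    for i in range(1, len(peptide)):
        prefix = sum(RESIDUE_MASSES[aa] for aa in peptide[:i])
        suffix = sum(RESIDUE_MASSES[aa] for aa in peptide[i:])
        ions += [prefix + PROTON, suffix + WATER + PROTON]
    return ions


def binned(mz_values, n_bins=2000, bin_width=1.0):
    """Presence/absence vector with one slot per m/z bin."""
    vector = [0.0] * n_bins
    for mz in mz_values:
        k = int(mz / bin_width)
        if 0 <= k < n_bins:
            vector[k] = 1.0
    return vector


def xcorr_like(theoretical, observed, max_shift=75):
    """Zero-offset correlation minus the mean correlation at offsets +-1..max_shift."""
    def correlation(shift):
        return sum(t * observed[i + shift]
                   for i, t in enumerate(theoretical)
                   if t and 0 <= i + shift < len(observed))
    background = mean(correlation(s) for s in range(-max_shift, max_shift + 1) if s != 0)
    return correlation(0) - background
```

I/L isomers such as PEPTIDEK and PEPTLDEK both pass the mass filter. They also have identical theoretical spectra and therefore identical scores.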

Mascot
Matrix Science produces Mascot, a search engine that analyzes mass spectrometry data by statistically evaluating matches between observed and projected peptide fragments, rather than by cross-correlation. Since version 2.2, Mascot has also supported peptide quantitation methods in addition to its identification features.
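A miniature probability-based score shows the contrast with cross-correlation. Mascot's documentation reports scores as -10 log10(P), where P is the probability that the observed match is a random event. The sketch obtains P from a binomial model in which each theoretical fragment matches an observed peak by chance with probability p_random. This model is an illustrative assumption, not Mascot's proprietary scoring, and the function names are hypothetical.

```python
import math


def random_match_probability(n_fragments, n_matched, p_random):
    """Binomial tail: chance of matching >= n_matched of n_fragments ions by luck."""
    return sum(
        math.comb(n_fragments, k) * p_random**k * (1 - p_random) ** (n_fragments - k)
        for k in range(n_matched, n_fragments + 1)
    )


def probability_score(p_value):
    """Express a probability as -10 * log10(P): smaller P gives a higher score."""
    return -10 * math.log10(p_value)
```

For example, matching all 10 of 10 fragments when p_random is 0.1 gives P = 1e-10, which corresponds to a score of 100.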

VIPER and Decon2LS
The "Proteomics Research Resource for Integrative Biology" distributes several software tools, including VIPER and Decon2LS, that can be used to analyze LC-MS features by accurate mass and chromatographic retention time. This strategy is sometimes referred to as the Accurate Mass and Time tag approach (AMT tag approach), and these tools are generally used for proteomics.
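In the AMT tag approach, an LC-MS feature is identified by looking it up, by both accurate mass and elution time, in a database of previously identified peptides (AMT tags). The minimal sketch below assumes that elution times have already been normalized to a 0 to 1 scale (normalized elution time, NET). The names and tolerances (5 ppm, 0.02 NET) are hypothetical, and the sketch is not VIPER's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmtTag:
    peptide: str
    mass: float  # monoisotopic mass, Da
    net: float   # normalized elution time, 0..1


def match_features(features, tags, ppm_tol=5.0, net_tol=0.02):
    """Return (feature index, peptide) pairs for features matching an AMT tag.

    A feature (mass, net) matches a tag when its mass error in ppm and its
    elution-time difference are both within the given tolerances.
    """
    matches = []
    for i, (mass, net) in enumerate(features):
        for tag in tags:
            ppm_error = abs(mass - tag.mass) / tag.mass * 1e6
            if ppm_error <= ppm_tol and abs(net - tag.net) <= net_tol:
                matches.append((i, tag.peptide))
    return matches
```

Requiring agreement in both dimensions lets a feature with the right mass but the wrong elution time be rejected.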

Phenyx
Phenyx was developed by Geneva Bioinformatics (GeneBio) in collaboration with the Swiss Institute of Bioinformatics (SIB). Phenyx incorporates OLAV, a family of statistical scoring models, to generate and optimize scoring schemes that can be tailored to different instruments, instrument set-ups and sample treatments.

Phenyx computes a score that evaluates the quality of a match between a theoretical and an experimental peak list (i.e., a mass spectrum). A match is thus a collection of observations deduced from this comparison. The basic peptide score is the sum of raw scores for up to twelve physico-chemical properties, and it is ultimately transformed into a normalized z-score and a p-value.
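The text does not specify how Phenyx converts its basic score into a z-score and a p-value. A common construction, assumed here, compares the score with the mean and standard deviation of scores for random matches and takes the p-value from the upper tail of a normal distribution. The function names and the normal approximation are hypothetical.

```python
import math


def basic_peptide_score(raw_scores):
    """Sum of per-property raw scores (up to twelve properties in Phenyx)."""
    return sum(raw_scores)


def z_and_p(score, random_mean, random_sd):
    """Normalize a score against random-match statistics.

    Returns the z-score and the one-sided upper-tail p-value under a normal
    approximation (an assumption of this sketch).
    """
    z = (score - random_mean) / random_sd
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p
```

For example, a score two standard deviations above the random-match mean has z = 2 and p of about 0.023.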

OpenMS / TOPP
OpenMS is a C++ software library for LC/MS data management and analysis. It offers an infrastructure for the development of mass spectrometry related software. OpenMS is free software available under the LGPL.

TOPP (The OpenMS Proteomics Pipeline) is a set of small applications that can be chained to create analysis pipelines tailored to a specific problem. TOPP is built on the data structures and algorithms provided by OpenMS. TOPP is free software available under the LGPL.

OpenMS and TOPP are a joint project of the Algorithmic Bioinformatics group at the Free University of Berlin, the Department for Simulation of Biological Systems of Tübingen University and the Junior Research Group for Protein-Protein Interactions and Computational Proteomics at Saarland University.