Protein Data Bank (file format)

The Protein Data Bank (PDB) file format is a text file format that describes the three-dimensional structures of molecules held in the Protein Data Bank. A PDB file contains atomic coordinates and sequence information, along with the names of the researchers who determined the structure and a citation. It may also include optional remarks that help in interpreting the data in the file.

Over the years the file format has undergone many changes and revisions. Its original fixed-column layout was dictated by the width of computer punch cards.
 * PDB Format Guide, prepared by the PDB staff at BNL: the PDB format specification, which should be read before working with the raw data.
 * PDBML: an XML representation of PDB data, more recently provided by the PDB.
 * ftp.rcsb.org: the site from which the raw data can be downloaded.
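
Because of the punch-card heritage described above, a record in a PDB file is read by column position rather than by splitting on whitespace. The following is a minimal sketch of that idea in Python, not a complete or authoritative parser. The column ranges used for ATOM and HETATM records are an assumption taken from the commonly published layout and should be checked against the PDB Format Guide; the names Atom, parse_atom_line and SAMPLE are hypothetical, and the sample values are made up for illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed column position.

    Column ranges are assumed, not taken from the article; verify them
    against the format specification before relying on this.
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an atom record: {record!r}")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Illustrative record (made-up values), assembled field by field so the
# column boundaries stay visible.
SAMPLE = (
    "ATOM  "    # columns 1-6: record name
    "    1"     # columns 7-11: atom serial number
    " "         # column 12: unused
    " N  "      # columns 13-16: atom name
    " "         # column 17: alternate location indicator
    "MET"       # columns 18-20: residue name
    " "         # column 21: unused
    "A"         # column 22: chain identifier
    "   1"      # columns 23-26: residue sequence number
    "    "      # columns 27-30: insertion code and padding
    "  27.340"  # columns 31-38: x coordinate
    "  24.430"  # columns 39-46: y coordinate
    "   2.614"  # columns 47-54: z coordinate
)

atom = parse_atom_line(SAMPLE)
print(atom.name, atom.res_name, atom.chain_id, atom.x, atom.y, atom.z)
```

Slicing by position matters because adjacent fields may have no space between them, for example when a coordinate fills its entire eight-character field, so whitespace splitting can merge two values into one.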

The format's punch-card legacy has caused many problems, and consequently there are three distinct 'clean-up' projects:
 * The Molecular Modeling DataBase (MMDB) from NCBI
 * The Macromolecular Structure Database from the European Bioinformatics Institute
 * The Data Uniformity Project from PDB