Biological database

As of 2006, there were over 1,000 public and commercial biological databases. Most of them contain genomic and proteomic data, but databases are also used in taxonomy. The core data are typically nucleotide sequences of genes or amino acid sequences of proteins. Databases may also hold information about function, structure, localisation on the chromosome, clinical effects of mutations, and similarities between biological sequences.

Overview
Biological databases have become an important tool in helping scientists understand and explain a wide range of biological phenomena: from the structure of biomolecules and their interactions, to the metabolism of whole organisms, to the evolution of species. This knowledge aids the fight against diseases, assists in the development of medications, and helps uncover basic relationships among species in the history of life.

Biological knowledge is usually distributed among many different specialized databases. This makes it difficult to keep information consistent across them, which sometimes leads to low data quality.

By far the most important resource for finding biological databases is a special annual issue of the journal "Nucleic Acids Research" (NAR). This Database Issue is freely available and categorizes the publicly available online databases related to computational biology (bioinformatics).

The Database Issue of NAR

See also: NCBI, PubMed

Most important public databases for molecular biology
(from www.kokocinski.net)

Primary sequence databases
The International Nucleotide Sequence Database (INSD) consists of the following three databases. Together, these databanks represent current knowledge about the sequences of all sequenced organisms. They exchange their stored data with one another and serve as the source for many other databases.
 * 1) DDBJ (DNA Data Bank of Japan)
 * 2) EMBL Nucleotide DB (European Molecular Biology Laboratory)
 * 3) GenBank  (National Center for Biotechnology Information)
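The data exchange between the three INSD members can be pictured with a deliberately simplified model, sketched below. It is not the actual INSD exchange mechanism: it assumes that accession numbers are globally unique and that each member simply ends up holding the union of all records. The `Databank` class, the `exchange` function and the accessions and sequences are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Databank:
    """Toy stand-in for one INSD member: maps accession number -> sequence."""
    name: str
    records: dict[str, str] = field(default_factory=dict)


def exchange(banks: list[Databank]) -> None:
    """Give every databank the union of all records held by any member.

    Simplifying assumption: accession numbers are globally unique, so a
    plain dictionary merge never has to resolve conflicting entries.
    """
    merged: dict[str, str] = {}
    for bank in banks:
        merged.update(bank.records)
    for bank in banks:
        bank.records = dict(merged)  # each member gets its own copy


# Made-up accessions and sequences, for illustration only.
ddbj = Databank("DDBJ", {"AB000001": "ACGT"})
embl = Databank("EMBL", {"X00001": "GGCC"})
genbank = Databank("GenBank", {"M00001": "TTAA"})
exchange([ddbj, embl, genbank])
print(sorted(genbank.records))  # ['AB000001', 'M00001', 'X00001']
```

After one exchange round, a record submitted to any single member can be retrieved from all three, which is why the INSD members can act as interchangeable sources for downstream databases.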

Meta-databases
Strictly speaking, a meta-database can be considered a database of databases rather than any one integration project or technology. It collects information from several other sources and usually makes it available in a new and more convenient form.


 * 1) Entrez (National Center for Biotechnology Information)
 * 2) euGenes (Indiana University)
 * 3) GeneCards (Weizmann Inst.)
 * 4) SOURCE (Stanford University)
 * 5) mGen Gene extraction from four of the world's largest databases (GenBank, RefSeq, EMBL and DDBJ) in a simple, program-friendly form
 * 6) Harvester III Integration of 26 major protein/gene resources (Karlsruhe Institute of Technology)
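The core idea of a meta-database, collecting records about the same entity from several sources into one view, can be sketched as follows. This is a hypothetical illustration, not how any of the projects listed above is implemented: the source names, the record layout and the functions `integrate` and `conflicts` are assumptions. Keeping each value tagged with its source, instead of overwriting it, also exposes the consistency problems that arise when knowledge is spread across many databases.

```python
from collections import defaultdict

# Shape: source name -> gene symbol -> {field name: value}
Sources = dict[str, dict[str, dict[str, str]]]


def integrate(sources: Sources) -> dict[str, dict[str, dict[str, str]]]:
    """Build one view per gene: gene symbol -> field -> {source: value}.

    Values are stored per source rather than overwritten, so disagreements
    between sources remain visible.
    """
    merged: defaultdict = defaultdict(lambda: defaultdict(dict))
    for source_name, genes in sources.items():
        for symbol, fields in genes.items():
            for key, value in fields.items():
                merged[symbol][key][source_name] = value
    # Convert the nested defaultdicts back to plain dicts.
    return {sym: {k: dict(v) for k, v in flds.items()} for sym, flds in merged.items()}


def conflicts(view: dict[str, dict[str, str]]) -> list[str]:
    """Return the fields of one gene view whose sources disagree."""
    return sorted(k for k, by_src in view.items() if len(set(by_src.values())) > 1)


# Hypothetical source names and records.
sources = {
    "source_a": {"TP53": {"chromosome": "17", "function": "tumour suppressor"}},
    "source_b": {"TP53": {"chromosome": "17p13.1"}},
}
view = integrate(sources)
print(view["TP53"]["chromosome"])  # {'source_a': '17', 'source_b': '17p13.1'}
print(conflicts(view["TP53"]))     # ['chromosome']
```

Note that "17" and "17p13.1" describe compatible locations but are flagged as a conflict by plain string comparison; a real integration project would need to normalise such representations before comparing them.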

Genome Browsers
Genome browsers enable researchers to visualize and browse entire genomes (most host many complete genomes) together with annotation data, including gene predictions and gene structure, proteins, expression, regulation, variation and comparative analyses. The annotation data usually come from multiple, diverse sources.
 * 1) Integrated Microbial Genomes (IMG) system by the DOE-Joint Genome Institute
 * 2) UCSC Genome Bioinformatics Genome Browser and Tools (UCSC)
 * 3) Ensembl The Ensembl Genome Browser (Sanger Institute and EBI)
 * 4) GBrowse The GMOD GBrowse Project
 * 5) Pathway Tools Genome Browser
 * 6) X:Map A genome browser that shows Affymetrix Exon Microarray hit locations alongside gene, transcript and exon data, displayed using the Google Maps API
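A basic operation behind a genome browser is fetching every annotation that overlaps the region currently on screen. The sketch below shows one simple way to do this, assuming half-open, 0-based coordinates and features sorted by start position; the `Feature` and `AnnotationIndex` names, the source labels and the coordinates are hypothetical and not taken from any browser listed above. Production browsers use dedicated index structures, such as binning schemes or interval trees, to scale to whole genomes.

```python
from bisect import bisect_left
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    source: str  # which annotation source supplied the feature


class AnnotationIndex:
    """Per-chromosome list of features sorted by start coordinate."""

    def __init__(self, features: list[Feature]) -> None:
        self.by_chrom: dict[str, list[Feature]] = {}
        for feat in sorted(features, key=lambda x: (x.chrom, x.start)):
            self.by_chrom.setdefault(feat.chrom, []).append(feat)
        self.starts = {c: [f.start for f in fs] for c, fs in self.by_chrom.items()}

    def query(self, chrom: str, start: int, end: int) -> list[Feature]:
        """Return features overlapping the half-open window [start, end)."""
        fs = self.by_chrom.get(chrom, [])
        # Features at index >= cutoff start at or after `end`, so cannot overlap.
        cutoff = bisect_left(self.starts.get(chrom, []), end)
        # Of the rest, keep those that have not ended before the window starts.
        return [f for f in fs[:cutoff] if f.end > start]


# Hypothetical features from two hypothetical annotation sources.
index = AnnotationIndex([
    Feature("chr1", 100, 200, "geneA", "gene_predictions"),
    Feature("chr1", 220, 300, "geneB", "gene_predictions"),
    Feature("chr1", 400, 500, "probe1", "expression"),
    Feature("chr2", 100, 200, "geneC", "gene_predictions"),
])
print([f.name for f in index.query("chr1", 150, 250)])  # ['geneA', 'geneB']
```

Because each feature records its `source`, the same query can feed separate display tracks for gene predictions, expression data and so on, mirroring how browsers combine annotation from several origins.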

Protein sequence databases

 * 1) UniProt Universal Protein Resource (UniProt Consortium: EBI, Expasy, PIR)
 * 2) PIR Protein Information Resource (Georgetown University Medical Center (GUMC))
 * 3) Swiss-Prot Protein Knowledgebase  (Swiss Institute of Bioinformatics)
 * 4) PEDANT Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
 * 5) PROSITE Database of Protein Families and Domains
 * 6) DIP Database of Interacting Proteins (Univ. of California)
 * 7) Pfam Protein families database of alignments and HMMs (Sanger Institute)
 * 8) ProDom Comprehensive set of Protein Domain Families (INRA/CNRS)
 * 9) SignalP Server for signal peptide prediction

Protein structure databases
 * 1) Protein Data Bank (PDB) (Research Collaboratory for Structural Bioinformatics (RCSB))
 * 2) CATH Protein Structure Classification
 * 3) SCOP Structural Classification of Proteins
 * 4) SWISS-MODEL Server and Repository for Protein Structure Models
 * 5) ModBase Database of Comparative Protein Structure Models (Sali Lab, UCSF)

Protein-Protein Interactions
 * 1) BioGRID A General Repository for Interaction Datasets (Samuel Lunenfeld Research Institute)
 * 2) STRING Database of known and predicted protein-protein interactions (EMBL)
 * 3) Database of Interacting Proteins

Pathway Databases

 * 1) BioCyc Database Collection including EcoCyc and MetaCyc
 * 2) KEGG PATHWAY Database (Univ. of Kyoto)
 * 3) Reactome (Cold Spring Harbor Laboratory, EBI, Gene Ontology Consortium)

Microarray-databases

 * 1) ArrayExpress (European Bioinformatics Institute)
 * 2) Gene Expression Omnibus (National Center for Biotechnology Information)
 * 3) maxd (Univ. of Manchester)
 * 4) SMD (Stanford University)
 * 5) GPX (Scottish Centre for Genomic Technology and Informatics)

Specialized databases

 * 1) CGAP Cancer Genes (National Cancer Institute)
 * 2) Clone Registry Clone Collections (National Center for Biotechnology Information)
 * 3) DBGET H. sapiens (Univ. of Kyoto)
 * 4) GDB Human Genome Database (Human Genome Organisation)
 * 5) MGI Mouse Genome (Jackson Lab.)
 * 6) SHMPD The Singapore Human Mutation and Polymorphism Database
 * 7) NCBI-UniGene (National Center for Biotechnology Information)
 * 8) OMIM Inherited Diseases (Online Mendelian Inheritance in Man)
 * 9) Off. Hum. Genome Db (HUGO Gene Nomenclature Committee)
 * 10) HGMD disease-causing mutations (HGMD Human Gene Mutation Database)
 * 11) PhenCode linking human mutations with phenotype
 * 12) List of SNP databases
 * 13) p53 The p53 Knowledgebase
 * 14) Edinburgh Mouse Atlas
 * 15) Corn (Maize Genetics and Genomics Database)
 * 16) HvrBase++ Human and primate mitochondrial DNA