GC-content



GC-content (or guanine-cytosine content), in molecular biology, is the percentage of nitrogenous bases in a DNA molecule that are either guanine or cytosine (out of four possible bases, which also include adenine and thymine). This may refer to a specific fragment of DNA or RNA, or to the whole genome. When it refers to a fragment of the genetic material, it may denote the GC-content of part of a gene (domain), a single gene, a group of genes (or gene clusters) or even a non-coding region. G (guanine) pairs specifically with C (cytosine), whereas A (adenine) pairs specifically with T (thymine). A GC pair is held together by three hydrogen bonds and an AT pair by two, and GC pairs are therefore more thermostable than AT pairs. In spite of the higher thermostability this confers on the genetic material, it has been suggested that cells with high-GC DNA undergo autolysis, thereby reducing the longevity of the cell itself. Because of the robustness conferred on the genetic material of high-GC organisms, it was commonly believed that GC content played a vital part in adaptation to high temperatures, a hypothesis that has recently been refuted.

In PCR experiments, the GC-content of primers is used to determine their annealing temperature to the template DNA. A higher GC-content indicates a higher melting temperature.
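As a rough illustration of how primer GC-content feeds into melting temperature, the sketch below applies the Wallace rule of thumb, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). This rule is a swapped-in approximation, not a method given in this article; it is meant only for short oligonucleotides, and primer-design tools generally use more detailed nearest-neighbour models. The function name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer: str) -> float:
    """Rough melting temperature (°C) of a short DNA primer by the Wallace rule:
    2 °C for every A or T plus 4 °C for every G or C.
    Assumption: only suitable for short oligonucleotides."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


# Two made-up 12-mers of equal length but different GC-content
for primer in ("ATATATATATGC", "GCGCGCGCATGC"):
    print(primer, wallace_tm(primer))
```

The two made-up 12-mers have the same length, but the GC-rich one gets an estimated Tm 16 °C higher, in line with the statement above. In practice the annealing temperature is usually set a few degrees below the estimated Tm.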

Determination of GC content
GC content is usually expressed as a percentage value, but sometimes as a ratio (called G+C ratio or GC-ratio). GC-content percentage is calculated as


 * $$\cfrac{G+C}{A+T+G+C}\times 100$$

whereas the AT/GC ratio is calculated as
 * $$\cfrac{A+T}{G+C}$$
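Both formulas translate directly into code. This is a minimal sketch that assumes the four base counts are already known; the function names are illustrative rather than taken from any library.

```python
def gc_percent(g: int, c: int, a: int, t: int) -> float:
    """GC-content as a percentage: (G + C) / (A + T + G + C) x 100."""
    total = a + t + g + c
    if total == 0:
        raise ValueError("no bases counted")
    return 100 * (g + c) / total


def at_gc_ratio(g: int, c: int, a: int, t: int) -> float:
    """AT/GC ratio: (A + T) / (G + C)."""
    if g + c == 0:
        raise ValueError("ratio is undefined when there is no G or C")
    return (a + t) / (g + c)


# Hypothetical counts for a 100-base fragment
print(gc_percent(g=30, c=30, a=20, t=20))   # GC percentage
print(at_gc_ratio(g=30, c=30, a=20, t=20))  # AT/GC ratio
```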

The GC-content percentage and the GC-ratio can be measured in several ways, but one of the simplest is to measure the melting temperature of the DNA double helix by spectrophotometry: the absorbance of DNA at a wavelength of 260 nm increases fairly sharply when double-stranded DNA is heated enough to separate into two single strands. For large numbers of samples, the most commonly used protocol for determining GC ratios uses flow cytometry.
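The melting-temperature approach can be sketched in two steps: take Tm as the midpoint of the rise in absorbance at 260 nm, then convert Tm to GC-content. For the second step this sketch assumes the empirical Marmur-Doty relation, Tm = 69.3 + 0.41 × (%GC), which applies to DNA in standard saline citrate (SSC) buffer; the relation is not stated in this article, and other buffer conditions need other constants. The absorbance readings are invented for illustration, and the function names are hypothetical.

```python
def melting_midpoint(temps_c, a260):
    """Temperature (°C) at which A260 is halfway between its lowest and highest
    reading, found by linear interpolation along an upward melting curve."""
    low, high = min(a260), max(a260)
    half = low + (high - low) / 2
    for i in range(len(a260) - 1):
        a0, a1 = a260[i], a260[i + 1]
        # Interpolate inside the first interval where the curve rises through `half`
        if a1 > a0 and a0 <= half <= a1:
            frac = (half - a0) / (a1 - a0)
            return temps_c[i] + frac * (temps_c[i + 1] - temps_c[i])
    raise ValueError("absorbance never rises through its midpoint")


def gc_from_tm(tm_c: float) -> float:
    """Assumed Marmur-Doty relation for DNA in SSC buffer,
    Tm = 69.3 + 0.41 * (%GC), rearranged to give %GC."""
    return (tm_c - 69.3) / 0.41


# Illustrative, made-up readings (not measured data)
temps = [86.0, 88.0, 89.0, 91.0, 92.0, 94.0]
a260 = [1.00, 1.02, 1.10, 1.30, 1.38, 1.40]
tm = melting_midpoint(temps, a260)
print(f"Tm = {tm:.1f} °C, estimated GC = {gc_from_tm(tm):.1f}%")
```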

Alternatively, if the DNA or RNA molecule under investigation has been sequenced, the GC-content can be calculated exactly by simple arithmetic.
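For a sequenced molecule, the arithmetic amounts to counting bases. The sketch below is one way to do it in plain Python; it assumes that ambiguous symbols such as N are left out of the denominator and that U (in RNA) counts alongside A and T, both of which are conventions rather than fixed rules. Sequence-analysis libraries offer their own GC functions, with their own options for ambiguous bases.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T or U).
    Assumption: ambiguous symbols such as N are ignored entirely."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T") + s.count("U")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    return 100.0 * gc / (gc + at)


print(gc_content("ATGCGC"))   # DNA, four of six bases are G or C
print(gc_content("augcNN"))   # RNA in lowercase, Ns ignored
```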

GC ratio of genomes
GC ratios within a genome are markedly variable. In the genomes of higher organisms, this variation produces a mosaic-like structure of regions called isochores, which is reflected in differences in staining intensity along the chromosomes. The isochores include essential protein-coding genes, termed housekeeping genes, so determining the GC ratio of these regions contributes to mapping these genes.
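Variation in GC ratio along a genome, such as the isochore structure described above, is commonly examined with a sliding-window GC profile. A minimal sketch follows; the window and step sizes in the example are toy values chosen for readability (analyses at isochore scale use windows many kilobases long), and `gc_profile` is a hypothetical name.

```python
def gc_profile(seq: str, window: int, step: int):
    """GC percentage in windows of `window` bases, advancing by `step` bases.
    Returns (start position, GC%) pairs; windows containing no A, C, G or T
    are skipped."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    s = seq.upper()
    profile = []
    for start in range(0, len(s) - window + 1, step):
        chunk = s[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at:
            profile.append((start, 100.0 * gc / (gc + at)))
    return profile


# Toy sequence: an AT-rich stretch followed by a GC-rich stretch
toy = "AT" * 10 + "GC" * 10
for start, pct in gc_profile(toy, window=10, step=10):
    print(start, pct)
```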

GC ratios and coding sequence
Within a long region of genomic sequence, genes are often characterised by a higher GC-content than the background GC-content of the entire genome. Studies relating the GC ratio to the length of a gene's coding region have shown that longer coding sequences tend to have a higher G+C content. This has been attributed to the AT bias of the stop codons (TAA, TAG and TGA): because stop codons are AT-rich, they arise more often by chance in AT-rich sequence and cut open reading frames short, so the shorter the coding sequence, the higher its AT bias.
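The stop-codon argument can be made concrete with a simple null model, which is an assumption of this sketch rather than something stated in the source: bases are drawn independently, with P(G) = P(C) = g/2 and P(A) = P(T) = (1 - g)/2 for a GC fraction g. The chance that a codon is TAA, TAG or TGA then falls as g rises, and the expected number of codons up to the first stop (the mean of a geometric distribution, 1/P(stop)) grows accordingly. The function names are hypothetical.

```python
def stop_codon_probability(gc_fraction: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA when bases are
    independent, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    p_g_or_c = gc_fraction / 2         # probability of G, and separately of C
    p_a_or_t = (1 - gc_fraction) / 2   # probability of A, and separately of T
    taa = p_a_or_t ** 3
    tag = p_a_or_t * p_a_or_t * p_g_or_c
    tga = p_a_or_t * p_g_or_c * p_a_or_t
    return taa + tag + tga


def expected_orf_codons(gc_fraction: float) -> float:
    """Mean number of codons read up to and including the first stop codon."""
    return 1 / stop_codon_probability(gc_fraction)


for gc in (0.3, 0.5, 0.7):
    print(f"GC {gc:.0%}: P(stop) = {stop_codon_probability(gc):.4f}, "
          f"mean run = {expected_orf_codons(gc):.1f} codons")
```

At g = 0.5 every codon is a stop with probability 3/64, giving a mean of about 21 codons. Real genomes deviate from this model through codon bias and selection, so the numbers indicate only the direction of the effect.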

Application in systematics
GC content varies between organisms, a variation thought to arise from differences in selection, mutational bias and biased recombination-associated DNA repair. The species problem in prokaryotic taxonomy has led to various proposals for classifying bacteria, and the ad hoc committee on reconciliation of approaches to bacterial systematics has recommended the use of GC ratios in higher-level hierarchical classification. For example, the Actinobacteria are characterised as "high GC-content bacteria"; in Streptomyces coelicolor A3(2), the GC-content is 72%. The GC-content of yeast (Saccharomyces cerevisiae) is 38%, and that of another common model organism, thale cress (Arabidopsis thaliana), is 36%. Because of the nature of the genetic code, it is virtually impossible for an organism to have a genome with a GC-content approaching either 0% or 100%. A species with an extremely low GC-content is Plasmodium falciparum (GC% of about 20%), and such genomes are usually described as AT-rich rather than GC-poor.
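The claim that GC-content cannot approach 0% or 100% can be checked against the standard genetic code. The sketch below builds the standard codon table (NCBI translation table 1, assumed here as the relevant code) and lists which amino acids remain encodable when codons may use only A and T, or only G and C. The helper name `encodable` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG;
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def encodable(alphabet: str):
    """Return (sorted amino acids, whether a stop codon exists) for codons
    built only from the bases in `alphabet`."""
    translations = {CODON_TABLE["".join(c)] for c in product(alphabet, repeat=3)}
    return sorted(translations - {"*"}), "*" in translations


for label, alphabet in (("A/T only", "AT"), ("G/C only", "GC")):
    aas, has_stop = encodable(alphabet)
    print(f"{label}: {len(aas)} amino acids ({''.join(aas)}), stop codon: {has_stop}")
```

With only A and T, six amino acids (F, I, K, L, N, Y) remain; with only G and C, just four (A, G, P, R) remain and there is no stop codon at all, so neither extreme can encode proteins drawing on the full set of twenty amino acids.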