Chargaff's rules

Chargaff's rules state that DNA from any cell of any organism should have a 1:1 ratio of pyrimidine and purine bases and, more specifically, that the amount of guanine equals the amount of cytosine and the amount of adenine equals the amount of thymine. This pattern is found in both strands of the DNA. The rules were discovered by Austrian chemist Erwin Chargaff.

Chargaff Parity Rule 1
Chargaff Parity Rule 1 holds that in a double-stranded DNA molecule, globally, %A = %T and %G = %C. The rigorous validation of the rule constitutes the basis of Watson-Crick pairs in the DNA double helix.
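The first parity rule follows directly from base pairing, which a short Python sketch can illustrate. The helper names (`reverse_complement`, `duplex_composition`) and the toy sequence are illustrative assumptions, not taken from any cited work; the sketch builds both strands of a duplex from one strand and tallies the pooled base percentages.

```python
from collections import Counter

# Watson-Crick pairing: A<->T and G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the partner strand, read 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


def duplex_composition(strand):
    """Percentage of each base over both strands of the duplex."""
    duplex = strand + reverse_complement(strand)
    counts = Counter(duplex)
    return {base: 100.0 * counts[base] / len(duplex) for base in "ACGT"}


# Arbitrary toy sequence (hypothetical, for illustration only).
composition = duplex_composition("ATGGCGTACGCTTAAGGCA")
print(composition)
```

Every A on one strand is matched by a T on the other, so the pooled counts of A and T (and of G and C) are identical whatever the input sequence.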

Chargaff Parity Rule 2
Chargaff Parity Rule 2 holds that globally both %A ~ %T and %G ~ %C are valid for each of the two DNA strands. The Chargaff Parity Rule 2 describes only a global feature of the base composition in a single DNA strand.
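Unlike the first rule, the second parity rule is only approximate, so checking it means measuring how far a single strand departs from parity. The following Python sketch does this under stated assumptions: the function name `parity_deviation` and the choice of a relative gap as the measure are illustrative, not a standard definition.

```python
from collections import Counter


def parity_deviation(strand):
    """Return (A/T gap, G/C gap) for one strand.

    Each gap is |x - y| / (x + y): 0.0 means perfect parity,
    1.0 means only one base of the pair is present.
    """
    counts = Counter(strand.upper())

    def relative_gap(x, y):
        total = counts[x] + counts[y]
        return abs(counts[x] - counts[y]) / total if total else 0.0

    return relative_gap("A", "T"), relative_gap("G", "C")


print(parity_deviation("AATTGGCC"))  # (0.0, 0.0): perfect single-strand parity
print(parity_deviation("AAAT"))      # (0.5, 0.0): A and T are out of balance
```

On real genome-length sequences the gaps would be expected to be small but not exactly zero; short sequences such as these toy inputs can deviate arbitrarily.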

Research
Chargaff also observed that the composition of DNA varies from one species to another, in particular in the relative amounts of A, G, T, and C bases. This observation is sometimes itself called the second of Chargaff's rules, and it should not be confused with the second parity rule described above. Such evidence of molecular diversity, which had been presumed absent from DNA, made DNA a more credible candidate for the genetic material than protein.

In 2006 it was shown that the second parity rule applies to four of the five types of double-stranded genomes: eukaryotic chromosomes, bacterial chromosomes, double-stranded DNA viral genomes, and archaeal chromosomes. It does not apply to organellar genomes (mitochondria and plastids), to single-stranded DNA (viral) genomes, or to any type of RNA genome. The basis for this rule is still under investigation.

The rule itself has consequences. In most bacterial genomes (which are generally 80-90% coding) genes are arranged in such a fashion that approximately 50% of the coding sequence lies on each strand. In the 1960s, Szybalski showed that in bacteriophage coding sequences purines (A and G) exceed pyrimidines (C and T). This observation has since been confirmed in other organisms and is referred to here as "Szybalski's rule". While Szybalski's rule generally holds, exceptions are known to exist. The biological basis for Szybalski's rule, like that of Chargaff's, is not yet known.
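The purine excess in coding sequences can be measured as a simple fraction. The Python sketch below is a minimal illustration; the function name `purine_fraction` and the five-codon toy sequence are assumptions made for the example, and a value above 0.5 indicates a purine excess.

```python
def purine_fraction(coding_sequence):
    """Fraction of A and G among the unambiguous bases of a sequence."""
    seq = coding_sequence.upper()
    purines = seq.count("A") + seq.count("G")
    total = purines + seq.count("C") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return purines / total


# Hypothetical five-codon reading frame: ATG GCG AAA GGA TAA.
toy_cds = "ATGGCGAAAGGATAA"
print(purine_fraction(toy_cds))  # 12 purines out of 15 bases
```

A single short sequence says nothing about the rule; the comparison only becomes meaningful when the fraction is computed over all coding sequences of a genome.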

The combined effect of Chargaff's second rule and Szybalski's rule can be seen in bacterial genomes where the coding sequences are not equally distributed between the strands. The genetic code has 64 codons, of which 3 function as termination codons, whereas only 20 amino acids are normally present in proteins. (There are two uncommon amino acids - selenocysteine and pyrrolysine - found in a limited number of proteins and encoded by the 'stop' codons TGA and TAG respectively.) The mismatch between the number of codons and the number of amino acids allows several codons to code for a single amino acid. These codons normally differ in the third codon base position.
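The codon arithmetic can be checked by enumerating the standard genetic code. The Python sketch below assumes the standard code in its usual compact form (bases in the order T, C, A, G, with `*` marking a stop); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# "Fourfold" boxes: first two bases fix the amino acid, third base is free.
fourfold_boxes = [
    prefix
    for prefix in ("".join(p) for p in product(BASES, repeat=2))
    if len({CODON_TABLE[prefix + third] for third in BASES}) == 1
]

print(len(CODON_TABLE), stop_codons)  # 64 ['TAA', 'TAG', 'TGA']
print(len(degeneracy), max(degeneracy.values()))  # 20 6
print(len(fourfold_boxes))  # 8
```

The 61 sense codons are shared among 20 amino acids, and in eight of the sixteen two-base prefixes the third base can change freely without altering the amino acid, which is why third-position composition is comparatively free to vary.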

Multivariate statistical analysis of codon use within genomes with unequal quantities of coding sequences on the two strands has shown that codon use in the third position depends on the strand on which the gene is located. This seems likely to be the result of Szybalski's and Chargaff's rules. Because of the asymmetry in pyrimidine and purine use in coding sequences, the strand with the greater coding content will tend to have the greater number of purine bases (Szybalski's rule). Because the number of purine bases will to a very good approximation equal the number of their complementary pyrimidines within the same strand, and because the coding sequences occupy 80-90% of the strand, it appears (1) that there is a selective pressure on the third base to minimise the number of purine bases in the strand with the greater coding content and (2) that this pressure is proportional to the mismatch in the length of the coding sequences between the two strands.
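The published analyses used multivariate statistics, but the quantity they rest on, purine use at the third codon position broken down by strand, can be sketched much more simply. In the Python example below, the input format (a list of `(strand, sequence)` pairs), the function name, and the three toy genes are all hypothetical, and each sequence is assumed to be an in-frame coding sequence.

```python
def third_position_purines_by_strand(genes):
    """Fraction of A/G at third codon positions, pooled per strand.

    `genes` is an iterable of (strand_label, coding_sequence) pairs;
    sequences are assumed to start at the first base of a codon.
    """
    totals = {}
    for strand, cds in genes:
        third_bases = cds.upper()[2::3]  # every third base, starting at codon position 3
        purines, sites = totals.get(strand, (0, 0))
        purines += sum(base in "AG" for base in third_bases)
        totals[strand] = (purines, sites + len(third_bases))
    return {s: p / n for s, (p, n) in totals.items() if n}


# Hypothetical genes, labelled by the strand they are encoded on.
toy_genes = [
    ("+", "ATGGCCAAAGGTTAA"),
    ("+", "ATGAAGTGA"),
    ("-", "ATGTTCTCTTAG"),
]
by_strand = third_position_purines_by_strand(toy_genes)
print(by_strand)  # {'+': 0.75, '-': 0.5}
```

Comparing such per-strand fractions across genomes with different coding imbalances is one simple way to look for the strand dependence described above; it does not replace the multivariate analysis.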

The origin of the deviation from Chargaff's rule in the organelles has been suggested to be a consequence of the mechanism of replication. During replication the DNA strands separate. In single-stranded DNA, cytosine slowly and spontaneously deaminates to uracil, which pairs like thymine (a C to T transition). The longer the strands are separated, the greater the quantity of deamination. For reasons that are not yet clear, the strands tend to exist longer in single-stranded form in mitochondria than in chromosomal DNA. This process tends to yield one strand that is enriched in guanine (G) and thymine (T), with its complement enriched in cytosine (C) and adenine (A), and this process may have given rise to the deviations found in the mitochondria.
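The direction of the resulting bias can be illustrated with a deliberately crude toy model. The Python sketch below assumes that deamination of cytosine yields uracil, which is then read as thymine, and it converts every fourth cytosine on the exposed strand rather than sampling at random; the function names and the conversion rate are illustrative assumptions, not measured values.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def deaminate(strand, every=4):
    """Toy model: turn every `every`-th cytosine into thymine."""
    result = []
    cytosines_seen = 0
    for base in strand:
        if base == "C":
            cytosines_seen += 1
            if cytosines_seen % every == 0:
                base = "T"
        result.append(base)
    return "".join(result)


def fraction(strand, bases):
    """Fraction of the strand made up of the given bases."""
    return sum(strand.count(b) for b in bases) / len(strand)


exposed = "ACGT" * 100                             # balanced starting strand
damaged = deaminate(exposed)                       # strand left single-stranded
partner = damaged.translate(COMPLEMENT)[::-1]      # complement copied from it

print(fraction(exposed, "GT"))  # 0.5
print(fraction(damaged, "GT"))  # 0.5625
print(fraction(partner, "CA"))  # 0.5625
```

The exposed strand gains G+T at the expense of C, and its complement gains C+A by the same amount, so neither strand on its own satisfies the second parity rule any longer.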

Chargaff's second rule appears to be the consequence of a more complex parity rule: within a single strand of DNA, any oligonucleotide is present in approximately equal numbers to its reverse-complementary oligonucleotide. Because of the computational requirements, this has not been verified in all genomes for all oligonucleotides. It has been verified for triplet oligonucleotides for a large data set. Albrecht-Buehler has suggested that this rule is the consequence of genomes evolving by a process of inversion and transposition. This process does not appear to have acted on the mitochondrial genomes.
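Checking the extended rule for a given oligonucleotide length amounts to counting k-mers on one strand and comparing each count with that of its reverse complement. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the test sequence is constructed to be its own reverse complement so that parity holds exactly, which real genomes only approximate.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(strand, k):
    """Count all overlapping k-mers on a single strand."""
    return Counter(strand[i:i + k] for i in range(len(strand) - k + 1))


def parity_gaps(strand, k):
    """For each observed k-mer: its count minus its reverse complement's count."""
    counts = kmer_counts(strand, k)
    return {
        kmer: n - counts.get(reverse_complement(kmer), 0)
        for kmer, n in counts.items()
    }


half = "ATGGCGTACGCTTAAGGCA"                  # arbitrary toy sequence
palindromic = half + reverse_complement(half)  # equals its own reverse complement
gaps = parity_gaps(palindromic, 3)
print(max(abs(g) for g in gaps.values()))      # 0: exact triplet parity

print(parity_gaps("AAAA", 2))                  # {'AA': 3}: no TT to balance AA
```

Setting `k=1` recovers the single-base form of the second parity rule. Memory use grows with the number of distinct k-mers, which is one reason exhaustive checks become costly for long oligonucleotides.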

A connection between the Fibonacci numbers and Chargaff's second rule in the human genome has been proposed.

Relative proportions (%) of bases in DNA
Both of Chargaff's rules are supported by the following table: