CpG island

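CpG islands are regions of DNA that contain a large number of CpG sites, in which a cytosine is immediately followed by a guanine along the same strand (i.e. the two nucleotides are linked by a phosphodiester bond in the backbone of the DNA). They are found in or near approximately 40% of the promoters of mammalian genes (about 70% of human promoters). The "p" in the CpG notation refers to the phosphodiester bond between the cytosine and the guanine, which distinguishes a CpG site from a cytosine base-paired with a guanine on the opposite strand.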

CpG islands are typically 300 to 3,000 base pairs long. These regions are characterized by a CpG dinucleotide content equal to or greater than what would be statistically expected (≈6%), whereas the rest of the genome has a much lower CpG frequency (≈1%), a phenomenon called CG suppression. Unlike CpG sites in the coding region of a gene, the CpG sites in promoter CpG islands are in most instances unmethylated if the gene is expressed. This observation led to the speculation that methylation of CpG sites in the promoter of a gene may inhibit its expression. Methylation, alongside histone modifications, is central to imprinting.

The usual formal definition of a CpG island is a region at least 200 bp long with a GC percentage greater than 50% and an observed/expected CpG ratio greater than 0.6. The majority of these islands are associated with genes, and the CpG sites they contain can serve as recognition sites for restriction enzymes.
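The three criteria in the formal definition can be checked directly on a candidate sequence. The following is a minimal sketch, not a reference implementation: the function names `cpg_stats` and `is_cpg_island` are hypothetical, and the expected CpG count is assumed to be (number of C × number of G) / sequence length, which is one common way of computing the observed/expected ratio.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # Observed CpG count: occurrences of C immediately followed by G.
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    observed = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Assumption: expected CpG count if C and G were placed independently.
    expected = (c * g) / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return n, gc_fraction, ratio


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the thresholds from the formal definition (defaults as in the text)."""
    n, gc_fraction, ratio = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and ratio > min_ratio


if __name__ == "__main__":
    candidate = "CG" * 100  # artificial 200 bp sequence made only of CpG sites
    print(cpg_stats(candidate))
    print(is_cpg_island(candidate))
```

In practice such a test is usually applied to a sliding window along a longer sequence rather than to a single fixed fragment, and the thresholds are exposed as parameters here because stricter variants of the definition are also in use.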

In vertebrates, CpG islands are associated with genes, particularly housekeeping genes. They are common near transcription start sites and may be associated with promoter regions. Normally, a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the cytosines in such an arrangement tend to be methylated. This methylation may help distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated cytosines tend to turn into thymines because of spontaneous deamination. As a result, CpGs are relatively rare unless there is selective pressure to keep them or a region is left unmethylated for some reason, perhaps related to the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.