DNA replication



DNA replication is the process of copying a double-stranded deoxyribonucleic acid (DNA) molecule, a process essential in all known life forms. The details of DNA replication differ between prokaryotic and eukaryotic organisms, although the core mechanism of DNA synthesis is similar.

As each DNA strand holds the same genetic information, both strands can serve as templates for the reproduction of the opposite strand. The template strand is preserved in its entirety and the new strand is assembled from nucleotides. This process is called semiconservative replication. The resulting double-stranded DNA molecules are identical; proofreading and error-checking mechanisms exist to ensure extremely high fidelity.

In a cell, DNA replication must happen before cell division. Prokaryotes replicate their DNA throughout the interval between cell divisions. In eukaryotes, timings are highly regulated and this occurs during the S phase of the cell cycle, preceding mitosis or meiosis I.

DNA structure
A DNA strand is a long polymer built from nucleotides; two complementary DNA strands form a double helix, each strand possessing a 5' phosphate end and a 3' hydroxyl end. The primed numbers indicate the position on the deoxyribose sugar of the backbone to which the phosphate or hydroxyl group is attached (numbers without primes are reserved for the bases). The two strands of the double helix run antiparallel to each other: one strand runs in the 5' → 3' direction while the other runs in the 3' → 5' direction. Each nucleotide consists of a phosphate and a deoxyribose sugar, which together form the backbone of the DNA double helix, plus a base. The bonding angles of the backbone ensure that DNA tends to twist along its length, giving rise to a double helix rather than a straight ladder. Base pairs form the steps of the helix ladder while the sugars and phosphates form the handrails.

Each of the four bases has a partner with which it makes the strongest hydrogen bonds. When a nucleotide base forms hydrogen bonds with its complementary base on the other strand, they form a base pair: adenine pairs with thymine and cytosine pairs with guanine. These pairings can be written as C•G and A•T, or as C:::G and A::T, where the number of colons indicates the number of hydrogen bonds in each base pair. For example, in a 10-base strand written in the 5' → 3' direction, an adenine at the 3rd position pairs with a thymine at the 8th position of the complementary strand, counted from that strand's own 5' end.
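The pairing and antiparallel rules can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the 10-base sequence and the function names are made up for the example. When both strands are counted from their own 5' ends, position i on one strand pairs with position n + 1 - i on the other.

```python
# Watson-Crick partners: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written 5' -> 3'.

    The strands are antiparallel, so the complement is read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(strand_5_to_3))

def partner_position(position: int, length: int) -> int:
    """1-based position of the pairing base, counted from the partner's 5' end."""
    return length + 1 - position

strand = "GCATTCGGAC"   # hypothetical 10-base strand; adenine is the 3rd base
partner = complementary_strand(strand)
position = partner_position(3, len(strand))
print(partner)                           # complementary strand, 5' -> 3'
print(position, partner[position - 1])   # position and identity of the pairing base
```

Here the adenine at position 3 pairs with a thymine at position 8 of the complementary strand.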

The replication fork
The replication fork is a structure which forms when DNA is being replicated. It is created through the action of helicase, which breaks the hydrogen bonds holding the two DNA strands together. The resulting structure has two branching "prongs", each one made up of a single strand of DNA.

Lagging strand synthesis
In DNA replication, the lagging strand is the DNA strand at the replication fork opposite the leading strand. It is oriented in the opposite direction to the leading strand, with its 5' end nearest the replication fork rather than its 3' end, as is the case for the leading strand. When helicase unwinds DNA, two single-stranded regions of DNA form at the replication fork. DNA polymerase cannot build a strand in the 3' → 5' direction. This poses no problem for the leading strand, which can be synthesized continuously in a processive manner, but it does for the lagging strand, which would otherwise have to be synthesized in the 3' → 5' direction. The lagging strand is therefore synthesized in short segments known as Okazaki fragments. On the lagging strand, primase builds short RNA primers at intervals. DNA polymerase then uses the free 3' hydroxyl group on each RNA primer to synthesize DNA in the 5' → 3' direction. The RNA primers are then removed (by different mechanisms in eukaryotes and prokaryotes) and new deoxyribonucleotides are added to fill the gaps where the RNA was present. DNA ligase then joins the fragments together, completing the synthesis of the lagging strand.
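Discontinuous synthesis can be illustrated with a toy model. The sketch below assumes a fixed fragment length and leaves out the RNA primers and their replacement. The template sequence, the fragment length and the function names are all hypothetical. It shows that each fragment is made 5' → 3', and that the fragment made last lies nearest the fork and so sits at the 5' end of the finished strand.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def synthesize(template_5_to_3: str) -> str:
    """Copy a template; the product is returned written 5' -> 3'."""
    return "".join(PAIR[base] for base in reversed(template_5_to_3))

def okazaki_fragments(lagging_template: str, fragment_len: int) -> list[str]:
    """Fragments in the order they are made as the fork exposes the template.

    The template is written 5' -> 3' and the fork is assumed to move from
    index 0 toward the end of the string.
    """
    return [
        synthesize(lagging_template[start:start + fragment_len])
        for start in range(0, len(lagging_template), fragment_len)
    ]

def ligate(fragments: list[str]) -> str:
    """Join fragments into one strand written 5' -> 3'.

    The most recently made fragment lies nearest the fork, so it comes first.
    """
    return "".join(reversed(fragments))

template = "ATGCCGTAAGCT"   # hypothetical 12-base lagging-strand template
fragments = okazaki_fragments(template, fragment_len=4)
print(fragments)
print(ligate(fragments) == synthesize(template))
```

The ligated fragments equal the strand that continuous synthesis would have produced, which is the point of the fragment-and-join strategy.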

Leading strand synthesis
The leading strand is defined as the DNA strand that is read in the 3' → 5' direction but synthesized in the 5' → 3' direction, in a continuous manner. On this strand, DNA polymerase III is able to synthesize DNA using the free 3'-OH group donated by a single RNA primer (multiple RNA primers are not used), and continuous synthesis occurs in the direction in which the replication fork is moving.

Dynamics at the replication fork
The sliding clamps found in all domains of life share a similar structure and are able to interact with the various processive and non-processive DNA polymerases found in cells. The sliding clamp also serves as a processivity factor. The C-terminal end of the clamp forms loops that can interact with other proteins involved in DNA replication (such as DNA polymerase and the clamp loader). The inner face of the clamp allows DNA to be threaded through it, and the clamp forms no specific interactions with DNA. The hole in the middle of the clamp is large, about 35 Å across, which leaves room for both DNA and water and allows the clamp to slide along the DNA. Once the polymerase reaches the end of the template or detects double-stranded DNA (see below), the sliding clamp undergoes a conformational change that releases the DNA polymerase.

The clamp loader, a multisubunit protein, is able to bind to the sliding clamp and to DNA polymerase. When ATP is hydrolyzed, the clamp loader loses affinity for the sliding clamp, allowing DNA polymerase to bind to the clamp. Furthermore, the sliding clamp can remain bound to a polymerase only as long as single-stranded DNA is being copied. Once the single-stranded DNA runs out, the polymerase is able to bind to a subunit of the clamp loader and move to a new position on the lagging strand. On the leading strand, DNA polymerase III associates with the clamp loader and is bound to the sliding clamp.

Recent evidence suggests that the enzymes and proteins involved in DNA replication remain stationary at the replication forks while DNA is looped out to maintain bidirectionality observed in replication. This is a result of an interaction between DNA polymerase, the sliding clamp, and the clamp loader.

DNA replication
DNA replication differs somewhat between eukaryotic and prokaryotic cells. Much of our knowledge of the process of DNA replication was derived from the study of E. coli, while yeast has been used as a model organism for understanding eukaryotic DNA replication.

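The energy for replication is not supplied by the polymerase itself: it comes from the incoming deoxyribonucleoside triphosphates, whose pyrophosphate group is released and hydrolyzed as each nucleotide is added.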

Mechanism of replication
Once priming of DNA is complete, DNA polymerase is loaded onto the DNA and replication begins. The catalytic mechanism of DNA polymerase involves two metal ions in the active site and a region of the active site that can discriminate between deoxyribonucleotides and ribonucleotides. The metal ions are divalent cations (typically Mg2+) that help the 3'-OH initiate a nucleophilic attack on the alpha-phosphate of the incoming deoxyribonucleotide and that orient and stabilize its negatively charged triphosphate. Nucleophilic attack by the 3'-OH on the alpha-phosphate releases pyrophosphate, which is subsequently hydrolyzed by inorganic pyrophosphatase into two phosphates. This hydrolysis drives DNA synthesis to completion.

Furthermore, DNA polymerase must be able to distinguish between correctly paired bases and incorrectly paired bases. It does so by recognizing Watson-Crick base pairs through an active site pocket that is complementary in shape to the structure of correctly paired nucleotides. This pocket has a tyrosine residue that is able to form van der Waals interactions with the correctly paired nucleotide. In addition, double-stranded DNA in the active site has a wider and shallower minor groove that permits the formation of hydrogen bonds with the third nitrogen of purine bases and the second oxygen of pyrimidine bases. Finally, the active site makes extensive hydrogen bonds with the DNA backbone. These interactions result in DNA polymerase III closing around a correctly paired base. If a base is inserted and incorrectly paired, these interactions cannot occur because hydrogen bonding and van der Waals interactions are disrupted. The mechanism of replication is similar in eukaryotes and prokaryotes.

DNA is read in the 3' → 5' direction relative to the parent strand; nucleotides are therefore added to the growing daughter strand in the 5' → 3' direction. However, one of the parent strands runs 3' → 5' and the other 5' → 3'. To solve this, synthesis of the two daughter strands must proceed in opposite directions. The leading strand is synthesized towards the replication fork and is thus made in a continuous fashion, requiring only one primer. The lagging strand, on the other hand, is synthesized in the opposite direction, away from the replication fork, in a series of short pieces known as Okazaki fragments, and consequently requires many primers. The RNA primers of Okazaki fragments are subsequently degraded by RNase H and by the exonuclease activity of DNA polymerase I; the resulting gaps are filled with deoxyribonucleotides and the remaining nicks are sealed by DNA ligase.

Initiation of replication and the bacterial origin
DNA replication in E. coli is bidirectional and originates at a single origin of replication (OriC). The initiation of replication is mediated by DnaA, a protein that binds to regions of the origin known as DnaA boxes. In E. coli there are five DnaA boxes, each of which contains a highly conserved 9-base pair consensus sequence, 5' - TTATCCACA - 3'. Binding of DnaA to this region causes it to become negatively supercoiled. Following this, a region of OriC upstream of the DnaA boxes (known as DnaB boxes) melts. There are three of these regions; each is 13 base pairs long and rich in A-T base pairs, with the consensus sequence 5' - GATCTNTTNTTTT - 3'. The high A-T content facilitates melting because less energy is required to break the two hydrogen bonds that form between A and T nucleotides. Melting of the DnaB boxes requires ATP (which is hydrolyzed by DnaA). Following melting, DnaA recruits a hexameric helicase (six DnaB proteins) to opposite ends of the melted DNA. This is where the replication forks will form. Recruitment of helicase requires six DnaC proteins, each of which is attached to one subunit of helicase. Once this complex is formed, an additional five DnaA proteins bind to the original five DnaA proteins to form five DnaA dimers. DnaC is then released, and the prepriming complex is complete.

For DNA replication to continue, single-strand binding proteins (SSBs) are needed to prevent the single strands of DNA from forming secondary structures or reannealing, and DNA gyrase is needed to relieve the torsional stress created by the action of DnaB helicase (by introducing negative supercoils). The unwinding of DNA by DnaB helicase allows primase (DnaG), an RNA polymerase, to prime each DNA template so that DNA synthesis can begin.
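The two consensus sequences quoted above can be located in a sequence with a simple pattern search. The sketch below is only an illustration: the test sequence is invented rather than taken from OriC, only exact matches on the strand as written are reported, and N is treated as "any base". Real origin annotation would also have to allow mismatches and search the opposite strand.

```python
import re

DNAA_BOX = "TTATCCACA"           # 9-bp consensus quoted in the text
THIRTEEN_MER = "GATCTNTTNTTTT"   # 13-bp consensus; N stands for any base

def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn a consensus with N wildcards into a regex that allows overlaps."""
    body = consensus.replace("N", "[ACGT]")
    return re.compile("(?=(" + body + "))")   # lookahead, so matches may overlap

def find_sites(sequence: str, consensus: str) -> list[int]:
    """0-based start positions of every exact match on the given strand."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(sequence)]

# Hypothetical sequence built for the example (not the real OriC).
sequence = "CCGATCTATTATTTTAGGTTATCCACAGC"
print(find_sites(sequence, THIRTEEN_MER))
print(find_sites(sequence, DNAA_BOX))
```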

Termination of replication
Termination of DNA replication in E. coli is completed through the use of termination sequences and the Tus protein. These sequences allow the two replication forks to pass through in only one direction, but not the other. In order to slow down and stop the movement of the replication fork in the termination region of the E. coli chromosome, the Tus protein is required. This protein binds to the termination sites, and prevents DnaB from displacing DNA strands. However, these sequences are not required for termination of replication.

Regulation of replication
Regulation of DNA replication is achieved through several mechanisms. Mechanisms of regulation involve the ratio of ATP to ADP, the ratio of DnaA protein to DnaA boxes and the hemimethylation and sequestering of OriC. The ratio of ATP to ADP indicates that the cell has reached a specific size and is ready to divide. This "signal" occurs because in a rich medium, the cell will grow quickly and will have a lot of excess ATP. Furthermore, DnaA binds equally well to ATP or ADP, but only the DnaA-ATP complex is able to initiate replication. Thus, in a fast growing cell, there will be more DnaA-ATP than DnaA-ADP.

Another mode of regulation involves the levels of DnaA in the cell. Five DnaA dimers are needed to initiate replication, so the ratio of DnaA protein to the number of DnaA boxes in the cell is important. After DNA replication is complete, this ratio is halved, and replication cannot occur again until the level of DnaA protein increases.

Finally, upon completion of DNA replication, the newly replicated DNA is sequestered by a membrane-binding protein called SeqA. This protein binds to hemimethylated GATC DNA sequences. This 4-base pair sequence occurs 11 times in OriC. Immediately after DNA synthesis, only the parent strand is methylated. Dam methyltransferase methylates the adenine residues in the newly synthesized strand only where the DNA is not bound by SeqA. The importance of this form of regulation is twofold: 1) OriC becomes inaccessible to DnaA and 2) DnaA binds better to fully methylated DNA than to hemimethylated DNA.
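The sequestration logic can be written as a small state model. This is a deliberately simplified sketch: every class and function name is made up for the illustration, how and when SeqA lets go is not described in the text (so `release_seqa` is a placeholder), and DnaA's preference for fully methylated DNA is modelled as a hard requirement.

```python
from dataclasses import dataclass

@dataclass
class GatcSite:
    """Methylation state of one GATC site on the two strands of a duplex."""
    parent_methylated: bool = True      # the parent strand keeps its methyl group
    daughter_methylated: bool = False   # the new strand starts unmethylated
    seqa_bound: bool = False

    @property
    def hemimethylated(self) -> bool:
        return self.parent_methylated != self.daughter_methylated

def bind_seqa(sites: list[GatcSite]) -> None:
    """SeqA binds only hemimethylated sites."""
    for site in sites:
        site.seqa_bound = site.hemimethylated

def dam_methylate(sites: list[GatcSite]) -> None:
    """Dam methylates the new strand only at sites not bound by SeqA."""
    for site in sites:
        if not site.seqa_bound:
            site.daughter_methylated = True

def release_seqa(sites: list[GatcSite]) -> None:
    """Placeholder for SeqA leaving the DNA (mechanism not modelled)."""
    for site in sites:
        site.seqa_bound = False

def origin_accessible(sites: list[GatcSite]) -> bool:
    """Simplification: DnaA can use the origin only if no site is bound or hemimethylated."""
    return all(not s.seqa_bound and not s.hemimethylated for s in sites)

oric = [GatcSite() for _ in range(11)]   # 11 GATC sites, freshly replicated
bind_seqa(oric)
dam_methylate(oric)                      # blocked at every site by SeqA
print(origin_accessible(oric))
release_seqa(oric)
dam_methylate(oric)                      # the new strand is now methylated
print(origin_accessible(oric))
```

The first check reports an inaccessible origin and the second an accessible one, mirroring the delay that sequestration imposes on re-initiation.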

Rolling circle replication
Replication of the bacterial chromosome is known as theta (θ) replication. However, another method of replication exists in bacterial cells, known as rolling circle replication.

Rolling circle replication describes a process of DNA replication that can rapidly synthesize multiple copies of circular molecules of DNA, such as plasmids and the genomes of bacteriophages.

Rolling circle replication is initiated by an initiator protein encoded by the plasmid or bacteriophage DNA. This protein is able to nick one strand of the double-stranded, circular DNA molecule at a site called the double-strand origin (DSO) and remains bound to the 5'-PO4 end of the nicked strand. The free 3'-OH end is released and can serve as a primer for DNA synthesis by DNA polymerase III. Using the unnicked strand as a template, replication proceeds around the circular DNA molecule, displacing the nicked strand as single-stranded DNA.

Continued DNA synthesis can produce multiple single-stranded linear copies of the original DNA in a continuous head-to-tail series. These linear copies can be converted to double-stranded circular molecules through the following process: First, the initiator protein makes another nick to terminate synthesis of the first (leading) strand. RNA polymerase and DNA polymerase III then replicate the single-stranded origin (SSO) DNA to make another double-stranded circle. DNA polymerase I removes the primer, replacing it with DNA, and DNA ligase joins the ends to make another molecule of double-stranded circular DNA.
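The head-to-tail product can be illustrated by walking a polymerase repeatedly around a circular template. The sketch is a toy model with a made-up 5-base circle and hypothetical function names; it ignores the initiator protein and the later conversion to double-stranded circles. The index wraps around with a modulo so that synthesis never reaches an end.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def rolling_circle(template_circle: str, start: int, n_bases: int) -> str:
    """Strand made by copying a circular template for n_bases steps.

    template_circle is the unnicked strand written 5' -> 3'. The polymerase
    reads it 3' -> 5', that is toward lower indices, wrapping around the circle.
    The product is returned written 5' -> 3'.
    """
    size = len(template_circle)
    product = []
    for step in range(n_bases):
        index = (start - step) % size
        product.append(PAIR[template_circle[index]])
    return "".join(product)

def unit_copies(concatemer: str, unit_len: int) -> list[str]:
    """Cut a concatemer into unit-length linear copies."""
    return [concatemer[i:i + unit_len] for i in range(0, len(concatemer), unit_len)]

circle = "ATGCA"   # hypothetical 5-base circular template
product = rolling_circle(circle, start=len(circle) - 1, n_bases=3 * len(circle))
print(product)
print(unit_copies(product, len(circle)))
```

Three trips around the circle give three identical unit-length copies joined head to tail.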

A striking feature of rolling circle replication is the uncoupling of the replication of the two strands of the DNA molecule. In contrast to common modes of DNA replication where both the parental DNA strands are replicated simultaneously, in rolling circle replication one strand is replicated first (which protrudes after being displaced, giving the characteristic appearance) and the second strand is replicated after completion of the first one.

Rolling circle replication has found wide uses in academic research and biotechnology, and has been successfully used for amplification of DNA from very small amounts of starting material.

Plasmid replication: Origin and regulation
The regulation of plasmid replication differs considerably from the regulation of chromosomal replication. However, the machinery involved in the replication of plasmids is similar to that of chromosomal replication. The plasmid origin, at which DNA replication is initiated, is commonly termed OriV. The ori region of plasmids, unlike that found on the host chromosome, contains the genes required for its replication. In addition, the ori region determines the host range. Plasmids carrying the ColE1 origin have a narrow host range and are restricted to the relatives of E. coli. Plasmids utilizing the RK2 ori, and ones that replicate by rolling circle replication, have a broad host range and are compatible with Gram-positive and Gram-negative bacteria. Another important characteristic of the ori region is the regulation of plasmid copy number. Generally, high copy number plasmids have mechanisms that inhibit the initiation of replication. Regulation of plasmids based on the ColE1 origin, a high copy number origin, requires an antisense RNA. A gene close to the origin, RNAII, is transcribed, and the 3'-OH of the transcript primes the origin only if the transcript is cleaved by RNase H. Transcription of RNAI, the antisense RNA, prevents RNAII from priming the DNA because it blocks formation of the RNA-DNA hybrid recognized by RNase H.

Eukaryotic DNA replication
Although the mechanisms of DNA synthesis in eukaryotes and prokaryotes are similar, DNA replication in eukaryotes is much more complicated. Though DNA synthesis in prokaryotes such as E. coli is regulated, DNA replication is initiated before the end of the cell cycle. Eukaryotic cells can only initiate DNA replication at a specific point in the cell cycle known as the S phase.

Although DNA synthesis in eukaryotes occurs only in the S phase of the cell cycle, pre-initiation occurs in G1. Because eukaryotic chromosomes are so large, they contain multiple origins of replication. Some origins are well characterized, such as the autonomously replicating sequences (ARS) of yeast, while other eukaryotic origins, particularly those in metazoa, are found within spans of thousands of base pairs. However, the assembly and initiation of replication is similar in both the protozoa and metazoa. Detailed information on yeast ARS elements is available at http://www.oridb.org/index.php

Initiation of replication
The first step in eukaryotic DNA replication is the formation of the pre-initiation replication complex (the pre-RC). The formation of this complex occurs in two stages. The first stage requires that there is no cyclin-dependent kinase (CDK) activity, a condition met only in early G1. The formation of the pre-RC is known as licensing, but a licensed pre-RC cannot by itself initiate replication. Initiation of replication can only occur during S-phase. Thus, the separation of licensing and activation ensures that the origin can only fire once per cell cycle.
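The once-per-cycle logic can be sketched as a small state machine. The model below is a simplification with hypothetical names: CDK activity is reduced to a single on/off flag per phase, and firing is assumed to use up the license, which is one way of expressing the separation of licensing from activation.

```python
class Origin:
    """One replication origin; licensing and firing are separated by CDK state."""

    def __init__(self) -> None:
        self.licensed = False
        self.times_fired = 0

    def try_license(self, cdk_active: bool) -> bool:
        """Pre-RC assembly is only possible while CDK activity is absent."""
        if cdk_active or self.licensed:
            return False
        self.licensed = True
        return True

    def try_fire(self, cdk_active: bool) -> bool:
        """Firing needs a licensed origin and CDK activity; it uses up the license."""
        if not (cdk_active and self.licensed):
            return False
        self.licensed = False
        self.times_fired += 1
        return True

# Simplified cell cycle: CDK activity is off in early G1 and on from S phase onward.
CELL_CYCLE = [("G1", False), ("S", True), ("G2", True), ("M", True)]

origin = Origin()
for phase, cdk_active in CELL_CYCLE:
    origin.try_license(cdk_active)
    origin.try_fire(cdk_active)
    origin.try_fire(cdk_active)   # a second attempt in the same phase fails
print(origin.times_fired)
```

However many firing attempts are made, the origin fires once per pass through the cycle, because a new license can only be acquired after CDK activity has disappeared again.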

DNA replication in eukaryotes is not very well characterized. However, researchers believe that it begins with the binding of the origin recognition complex (ORC) to the origin. This complex is a hexamer of related proteins and remains bound to the origin even after DNA replication occurs. ORC is the functional analogue of DnaA. Following the binding of ORC to the origin, Cdc6/Cdc18 and Cdt1 coordinate the loading of the minichromosome maintenance (MCM) complex onto the origin by first binding to ORC and then binding to the MCM complex. The MCM complex, a hexamer (Mcm2-7), is thought to be the major DNA helicase in eukaryotic organisms. Once binding of MCM occurs, a fully licensed pre-RC exists.

Activation of the complex occurs in S-phase and requires Cdk2-Cyclin E and Ddk. The activation process begins with the addition of Mcm10 to the pre-RC, which displaces Cdt1. Following this, Ddk phosphorylates Mcm3-7, which activates the helicase. It is believed that ORC and Cdc6/Cdc18 are phosphorylated by Cdk2-Cyclin E. Ddk and the Cdk complex then recruit another protein, Cdc45, which in turn recruits all of the DNA replication proteins to the replication fork. At this stage the origin fires and DNA synthesis begins.

Regulation of replication
Activation of a new round of replication is prevented through the actions of the cyclin-dependent kinases and a protein known as geminin. Geminin binds to Cdt1 and sequesters it. It is a periodic protein that first appears in S-phase and is degraded in late M-phase, possibly through the action of the anaphase promoting complex (APC). In addition, phosphorylation of Cdc6/Cdc18 prevents it from binding to the ORC (thus inhibiting loading of the MCM complex), while the role of ORC phosphorylation remains unclear. Cells in the G0 stage of the cell cycle are prevented from initiating a round of replication because the MCM proteins are not expressed. Researchers believe that termination of DNA replication in eukaryotes occurs when two replication forks encounter each other.

Eukaryotic polymerases
Numerous polymerases can replicate DNA in eukaryotic cells. Currently, six families of polymerases (A, B, C, D, X, Y) have been discovered. At least four different DNA polymerases are involved in the replication of DNA in animal cells (POLA, POLG, POLD1 and POLE). POLA functions by extending the primer in the 5' → 3' direction; however, it lacks the ability to proofread DNA. POLD1 has a proofreading ability and is able to replicate the entire length of a template only when associated with PCNA. POLE is able to replicate the entire length of a template in the absence of PCNA and is able to proofread DNA, while POLG replicates mitochondrial DNA via the D-loop mechanism of DNA replication. All primers are removed by RNase H1 and Flap Endonuclease I. The general mechanisms of DNA replication on the leading and lagging strands, however, are the same as those found in prokaryotic cells.

Replication foci
Eukaryotic DNA replication takes place at discrete sites in the nucleus. These replication foci contain the replication machinery (the proteins involved in DNA replication).

Telomerase
A unique problem that occurs during the replication of linear chromosomes is chromosome shortening. Chromosome shortening occurs when the primer at the 5' end of the lagging strand is degraded. Because DNA polymerase cannot add new nucleotides to the 5' end of DNA (there is no place for a new primer), the ends would shorten after each round of replication. However, in most replicating cells a small amount of telomerase is present, and this enzyme extends the ends of the chromosomes so that this problem does not occur. Telomerase binds to the 3' end of the DNA and extends it, using its own RNA component as the template. This allows a primer to bind so that the complementary strand can be extended by normal lagging strand synthesis. Finally, telomeres must be capped by proteins to prevent chromosomal instability.
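The effect of repeated rounds of replication on a chromosome end can be shown with simple bookkeeping. The numbers below are placeholders chosen for illustration, not measured telomere lengths or loss rates, and the function name is hypothetical. The sketch only shows that a fixed loss per round shortens the end steadily unless telomerase adds back a matching amount.

```python
def telomere_lengths(start_len: int, loss_per_round: int,
                     telomerase_gain: int, rounds: int) -> list[int]:
    """Length of one chromosome end after each round of replication."""
    lengths = []
    length = start_len
    for _ in range(rounds):
        # Removing the terminal primer leaves a gap that cannot be filled.
        length = max(0, length - loss_per_round)
        # Telomerase, if present, extends the end again.
        length += telomerase_gain
        lengths.append(length)
    return lengths

# All numbers are placeholders chosen for illustration, not measurements.
without = telomere_lengths(start_len=100, loss_per_round=10, telomerase_gain=0, rounds=3)
with_telomerase = telomere_lengths(start_len=100, loss_per_round=10, telomerase_gain=10, rounds=3)
print(without)
print(with_telomerase)
```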

Replication of mitochondrial DNA and chloroplast DNA
D-loop replication is a process by which chloroplasts and mitochondria replicate their genetic material. An important component of understanding D-loop replication is that chloroplasts and mitochondria have a single circular chromosome, like bacteria, instead of the linear chromosomes found in the eukaryotic nucleus. Replication begins at the leading strand origin. The leading strand is replicated in one direction, and after about 2/3 of the chromosome's leading strand has been replicated, the lagging strand origin is exposed. Replication of the lagging strand is 1/3 complete when the replication of the leading strand is finished. The resulting structure looks like the letter D, because the synthesis of the leading strand displaces the lagging strand.
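The 2/3 and 1/3 figures fit together if both strands are copied at the same speed, which is an assumption made here rather than something the text states. A short worked calculation with hypothetical names: the lagging strand starts when the leading strand has 1/3 of its length left, so it can copy 1/3 of its own length in that time.

```python
from fractions import Fraction

def lagging_fraction_when_leading_done(exposure_point: Fraction,
                                       relative_rate: Fraction = Fraction(1)) -> Fraction:
    """Fraction of the lagging strand copied when the leading strand finishes.

    exposure_point: fraction of leading-strand synthesis completed when the
        lagging strand origin is exposed.
    relative_rate: lagging-strand speed divided by leading-strand speed
        (assumed to be 1 here; the text does not state it).
    """
    remaining_leading = 1 - exposure_point
    return remaining_leading * relative_rate

print(lagging_fraction_when_leading_done(Fraction(2, 3)))
```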

The D-loop region is important for phylogeographic studies. Because the region does not code for any genes, it is free to vary, with only a few selective constraints on size and heavy/light strand factors. Its mutation rate is among the highest of any region in either the nuclear or the mitochondrial genome of animals. Mutations in the D-loop can therefore be used to track recent and rapid evolutionary changes, such as those within species and among very closely related species.

DNA replication in archaea
Understanding DNA replication in the archaea is just beginning, and it is the goal of this section to provide a basic understanding of how DNA replication occurs in these unique prokaryotes. In addition, this section aims to provide a comparison between the three domains.

Origin of replication
Archaeal origins of replication are AT-rich and generally have one or more AT stretches. In addition, long inverted repeats flank both ends of the origin; these are thought to be important in the initiation process and may serve a function similar to that of the DnaA boxes in the eubacteria. The genes that code for Cdc6/Orc1 are also located near the origin region, and this arrangement may allow these proteins to associate with the origin as soon as they are translated.

Initiation
Initiation of replication begins with the binding of Cdc6/Orc1 to the origin in an ATP-independent manner. This complex is constitutively expressed and most likely forms the origin binding proteins (OBP). Because of their similarity to proteins involved in eukaryotic initiation, Cdc6/Orc1 may be involved in helicase loading in archaea. However, other evidence suggests that this complex may function as an initiator and create a replication bubble large enough to allow the helicase (Mcm) to load without a loader. In either case, once loading of this complex is complete, the DNA melts and the helicase can be loaded.

In archaea, a hexameric protein known as the Mcm complex may function as the primary helicase. This protein is homologous to the eukaryotic Mcm complex. In archaea there is no Cdt1 homologue, and the helicase may be able to self-assemble at an archaeal origin without the need for a helicase loader. The archaeal Mcm proteins possess 3' → 5' helicase capability.

Elongation
Single-stranded binding protein (SSB) prevents exposed single-stranded DNA from forming secondary structures or reannealing. This complex is able to recruit primase, DNA polymerase and other replication machinery. The mechanisms of this process are similar to those in eukaryotes.

Similarities to eukaryotic and eubacterial replication

 * ORC is homologous to Cdc6/Orc1 in archaea and may represent the ancestral state of the eukaryotic pre-RC.
 * Homologous Mcm proteins exist in eukarya and archaea.
 * The structure of Cdc6/Orc1 resembles the tertiary structure of DnaA in eubacteria.
 * Both eukaryotic and archaeal helicases possess 3' → 5' helicase capability.
 * Archaeal SSB is similar to RPA.