Transcription (genetics)

Overview


Transcription is the process by which genetic information in DNA is copied into RNA. RNA polymerase enzymatically copies a DNA sequence to produce a complementary RNA strand. One significant difference between RNA and DNA sequences is that RNA contains uracil (U) where DNA contains thymine (T). For protein-encoding DNA, transcription is usually the first step in gene expression: it produces the mRNA intermediate, which is then translated, following the genetic code, into a functional peptide or protein. The stretch of DNA that is transcribed into an RNA molecule is called a transcription unit. A transcription unit that is translated into protein contains sequences that direct and regulate protein synthesis in addition to the coding sequence that is translated into protein. The regulatory sequence upstream (5') of the coding sequence is called the 5' untranslated region (5' UTR), and the sequence downstream (3') of the coding sequence is called the 3' untranslated region (3' UTR). Transcription has some proofreading mechanisms, but they are fewer and less effective than those for copying DNA; transcription therefore has a lower copying fidelity than DNA replication.
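
The layout of a protein-coding transcript described above (5' UTR, coding sequence, 3' UTR) can be illustrated with a small sketch. It rests on simplifying assumptions that are not part of the text above: the coding sequence is taken to start at the first AUG and to end at the first in-frame stop codon (counted here as part of the coding sequence), and the function name `split_mrna` and the example sequence are hypothetical.

```python
# Toy model: split an mRNA (written 5'->3') into 5' UTR, coding sequence, 3' UTR.
# Assumption: translation starts at the first AUG and stops at the first
# in-frame stop codon. Real start-site selection is more involved.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna):
    """Return (five_prime_utr, coding_sequence, three_prime_utr)."""
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Walk codon by codon from the start codon, staying in frame.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is kept with the coding sequence
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")

# Hypothetical example transcript.
five_utr, cds, three_utr = split_mrna("GGCACCAUGGCUUGGUAAUCCGA")
print(five_utr, cds, three_utr)
```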

As in DNA replication, transcription proceeds in the 5' → 3' direction. RNA polymerase reads the DNA template strand 3' → 5' and synthesizes the new RNA strand 5' → 3'. RNA polymerase binds at the promoter, which lies at the 3' end of the gene on the DNA template strand, and travels toward the 5' end. Except that thymines in DNA are represented as uracils in RNA, the newly synthesized RNA strand has the same sequence as the coding (non-template) strand of the DNA. For this reason, when describing the directionality of genes on DNA, scientists usually refer to the coding strand, which matches the resulting RNA, rather than the template strand.
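
The relationship between the template strand, the coding strand and the RNA can be checked with a short sketch. The function names `template_for` and `transcribe` and the example sequence are hypothetical; the template strand is written 3' → 5' so that it lines up base by base with the coding strand.

```python
# Base-pairing rules: DNA template base -> RNA base, and DNA base -> DNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def template_for(coding_5to3):
    """Template strand written 3'->5', aligned under the coding strand."""
    return "".join(DNA_COMPLEMENT[base] for base in coding_5to3)

def transcribe(template_3to5):
    """RNA written 5'->3', built by pairing with a template read 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

coding = "ATGGCTTGA"               # hypothetical coding strand, 5'->3'
template = template_for(coding)    # same region, template strand, 3'->5'
rna = transcribe(template)

# The RNA matches the coding strand except that T is replaced by U.
print(template, rna, rna == coding.replace("T", "U"))
```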

Transcription is divided into three stages: initiation, elongation and termination.

Prokaryotic vs. eukaryotic transcription

 * Prokaryotic transcription occurs in the cytoplasm alongside translation.
 * Eukaryotic transcription is primarily localized to the nucleus, where it is separated from the cytoplasm by the nuclear membrane. The transcript is then transported into the cytoplasm where translation occurs.
 * Another important difference is that eukaryotic DNA is wound around histones to form nucleosomes and packaged as chromatin. Chromatin has a strong influence on the accessibility of the DNA to transcription factors and the transcriptional machinery including RNA polymerase.
 * In prokaryotes, mRNA is generally not modified after transcription. Eukaryotic mRNA is modified through RNA splicing, 5' end capping, and the addition of a poly(A) tail.
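
The three eukaryotic mRNA modifications listed above can be sketched as string operations. This is a toy model only: the function name `process_pre_mrna` is hypothetical, exon positions are supplied by hand rather than detected, the cap is represented by the text label "m7G-", and the tail length is an arbitrary parameter.

```python
def process_pre_mrna(pre_mrna, exons, poly_a_length=20):
    """Toy model of eukaryotic mRNA processing.

    exons: ordered list of (start, end) half-open index pairs into pre_mrna.
    Everything between consecutive exons is treated as intron and removed.
    """
    spliced = "".join(pre_mrna[start:end] for start, end in exons)  # splicing
    capped = "m7G-" + spliced                                       # 5' cap (label only)
    return capped + "A" * poly_a_length                             # poly(A) tail

# Hypothetical pre-mRNA: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
mature = process_pre_mrna(pre_mrna, [(0, 6), (17, 23)], poly_a_length=5)
print(mature)
```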

Initiation


Unlike DNA replication, transcription does not need a primer to start. RNA polymerase simply binds to the DNA and, along with other cofactors, unwinds the DNA to create an initiation bubble so that the RNA polymerase has access to the single-stranded DNA template.

In bacteria, transcription begins with the binding of RNA polymerase to the promoter in DNA. The RNA polymerase core enzyme consists of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. At the start of initiation, the core enzyme is associated with a sigma factor (such as σ70), together forming the holoenzyme. The sigma factor aids in finding the -35 and -10 promoter elements, which lie about 35 and 10 base pairs upstream of the transcription start site.
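
As a rough illustration of promoter recognition, the sketch below scans a DNA sequence for exact copies of the commonly cited σ70 consensus elements (TTGACA at -35 and TATAAT at -10) separated by a 16 to 18 bp spacer. These are assumptions for the example: real promoters usually deviate from the consensus, so an exact-match search misses most of them, and the function name `find_sigma70_like_promoters` and the test sequence are hypothetical.

```python
import re

def find_sigma70_like_promoters(dna, min_spacer=16, max_spacer=18):
    """Return (pos_35, pos_10) start indexes of exact consensus matches.

    Assumption: -35 element TTGACA and -10 element TATAAT, separated by
    min_spacer..max_spacer bases. A lookahead is used so that overlapping
    candidates are not skipped.
    """
    pattern = re.compile(
        r"(?=(TTGACA[ACGT]{%d,%d}TATAAT))" % (min_spacer, max_spacer)
    )
    hits = []
    for match in pattern.finditer(dna):
        hit = match.group(1)
        pos_35 = match.start(1)
        pos_10 = pos_35 + len(hit) - 6  # the -10 element is the last 6 bases
        hits.append((pos_35, pos_10))
    return hits

# Hypothetical sequence with a 17 bp spacer between the two elements.
dna = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGCATG"
print(find_sigma70_like_promoters(dna))
```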

Transcription initiation is far more complex in eukaryotes and archaea. The main difference is that eukaryotic polymerases do not directly recognize their core promoter sequences; transcription factors must first mediate the binding of RNA polymerase and the initiation of transcription. The completed assembly of transcription factors and RNA polymerase bound to the promoter is called the transcription initiation complex.

In prokaryotes, RNA polymerase binds the promoter DNA and forms a "closed complex." The DNA in this complex is then unwound to create the "open complex," in which the DNA is melted from -12 to +2 relative to the transcription start site.

Elongation


One strand of DNA, the template strand (or non-coding strand), is used as a template for RNA synthesis. As transcription proceeds, RNA polymerase traverses the template strand and uses base-pairing complementarity with the DNA template to create an RNA copy. Although RNA polymerase traverses the template strand from 3' → 5', the coding (non-template) strand is usually used as the reference point, so transcription is said to go from 5' → 3'. This produces an RNA molecule from 5' → 3' that is an exact copy of the coding strand, with two differences: thymines are replaced with uracils, and the sugar-phosphate backbone contains ribose (a 5-carbon sugar) where DNA has deoxyribose (which has one fewer oxygen atom).
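
The direction of travel can be made concrete by simulating elongation one nucleotide at a time. In this sketch the template strand is written in the usual 5' → 3' orientation, so the polymerase starts at the right-hand (3') end of the string and moves left, while the RNA grows at its own 3' end. The function name `elongate` and the sequence are hypothetical.

```python
# DNA template base -> RNA base added opposite it.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def elongate(template_5to3):
    """Yield the growing RNA (written 5'->3') after each added nucleotide.

    The template is read from its 3' end toward its 5' end, i.e. from the
    last character of the string to the first.
    """
    rna = ""
    for base in reversed(template_5to3):
        rna += PAIRING[base]  # each new nucleotide joins the RNA's 3' end
        yield rna

# Hypothetical template strand (reverse complement of coding strand ATGGCTTGA).
steps = list(elongate("TCAAGCCAT"))
for step in steps:
    print(step)
```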

Unlike DNA replication, mRNA transcription can involve multiple RNA polymerases on a single DNA template and multiple rounds of transcription, so many mRNA molecules can be produced from a single copy of a gene. This step also involves a proofreading mechanism that can replace incorrectly incorporated bases.

Prokaryotic elongation starts with the "abortive initiation cycle." During this cycle, RNA polymerase synthesizes mRNA fragments 2-12 nucleotides long. This continues until the σ factor rearranges, which results in the transcription elongation complex (which gives a 35 bp moving footprint). The σ factor is released before 80 nucleotides of mRNA are synthesized.

Termination


Bacteria use two different strategies for transcription termination. In Rho-independent termination, transcription stops when the newly synthesized RNA molecule forms a hairpin loop followed by a run of Us, which makes it detach from the DNA template. In Rho-dependent termination, a protein factor called "Rho" destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex. Transcription termination in eukaryotes is less well understood. It involves cleavage of the new transcript, followed by template-independent addition of As at its new 3' end, in a process called polyadenylation.
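
The Rho-independent signal described above (a hairpin followed by a run of Us) can be searched for with a simple scan. This is a sketch under stated assumptions, not a validated terminator predictor: the stem length, loop sizes and minimum number of Us are arbitrary parameters, only Watson-Crick pairs are counted (no G-U wobble pairs, no mismatches), and the function names and the example RNA are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))

def find_terminators(rna, stem=5, min_loop=3, max_loop=8, min_us=4):
    """Return (hairpin_start, hairpin_end) index pairs for candidate terminators.

    A candidate is a perfect stem of `stem` base pairs enclosing a loop of
    min_loop..max_loop bases, immediately followed by at least min_us Us.
    """
    hits = []
    for i in range(len(rna) - 2 * stem):
        left_arm = rna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop                  # start of the right arm
            right_arm = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_us]
            if (len(right_arm) == stem
                    and right_arm == reverse_complement(left_arm)
                    and tail == "U" * min_us):
                hits.append((i, j + stem))
    return hits

# Hypothetical transcript: stem GCCGC / GCGGC, loop UUCG, then a run of Us.
rna = "AUGGC" + "GCCGC" + "UUCG" + "GCGGC" + "UUUUUUAG"
print(find_terminators(rna))
```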

Measuring and detecting transcription
Transcription can be measured and detected in a variety of ways:
 * Northern blot
 * RNase protection assay
 * RT-PCR
 * In vitro transcription
 * In situ hybridization
 * DNA microarrays

Transcription factories
Active transcription units are clustered in the nucleus, in discrete sites called ‘transcription factories’. Such sites can be visualized by allowing engaged polymerases to extend their transcripts with tagged precursors (Br-UTP or Br-U) and then immuno-labeling the tagged nascent RNA. Transcription factories can also be localized using fluorescence in situ hybridization, or marked by antibodies directed against polymerases. There are ~10,000 factories in the nucleoplasm of a HeLa cell, of which ~8,000 are polymerase II factories and ~2,000 are polymerase III factories. Each polymerase II factory contains ~8 polymerases. As most active transcription units are associated with only one polymerase, each factory will be associated with ~8 different transcription units. These units might be associated through promoters and/or enhancers, with loops forming a ‘cloud’ around the factory.
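
The approximate figures above imply a rough total for engaged polymerase II molecules in a HeLa nucleus. The short calculation below only multiplies the quoted estimates; the variable names are illustrative, and the result inherits the uncertainty of the inputs.

```python
# Approximate values quoted in the text for a HeLa cell nucleoplasm.
pol2_factories = 8_000
pol3_factories = 2_000
polymerases_per_pol2_factory = 8

total_factories = pol2_factories + pol3_factories
engaged_pol2 = pol2_factories * polymerases_per_pol2_factory

# With roughly one polymerase per active transcription unit, this is also an
# estimate of the number of active polymerase II transcription units.
print(total_factories, engaged_pol2)
```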

History
A molecule which allows the genetic material to be realized as a protein was first hypothesized by Jacob and Monod. RNA synthesis by RNA polymerase was established in vitro by several laboratories by 1965; however, the RNA synthesized by these enzymes had properties that suggested the existence of an additional factor needed to terminate transcription correctly.

Roger D. Kornberg won the 2006 Nobel Prize in Chemistry "for his studies of the molecular basis of eukaryotic transcription".

Reverse transcription
Some viruses (such as HIV, the cause of AIDS) have the ability to transcribe RNA into DNA. HIV has an RNA genome that is copied into DNA, and the resulting DNA can be merged with the DNA genome of the host cell. The main enzyme responsible for synthesis of DNA from an RNA template is called reverse transcriptase. In the case of HIV, reverse transcriptase synthesizes a complementary DNA strand (cDNA) from the viral RNA genome. An associated enzyme, ribonuclease H, digests the RNA strand, and reverse transcriptase then synthesizes a second, complementary strand of DNA to form a double-helix DNA structure. This cDNA is integrated into the host cell's genome by another enzyme (integrase), causing the host cell to generate viral proteins, which assemble into new viral particles. The infected host cell often subsequently undergoes programmed cell death (apoptosis).
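
The two synthesis steps described above (first-strand cDNA from the RNA, then a second DNA strand after the RNA is digested) can be sketched with base-pairing tables. The function names and the example sequence are hypothetical, all strands are written 5' → 3', and the sketch ignores priming, RNase H activity and integration.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(rna_5to3):
    """DNA strand complementary to the RNA, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5to3))

def second_strand(cdna_5to3):
    """DNA strand complementary to the first cDNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna_5to3))

viral_rna = "AUGGCUUGA"                # hypothetical RNA fragment
cdna = first_strand_cdna(viral_rna)
plus_strand = second_strand(cdna)

# The second DNA strand carries the RNA's sequence with T in place of U.
print(cdna, plus_strand, plus_strand == viral_rna.replace("U", "T"))
```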

Some eukaryotic cells contain an enzyme with reverse transcription activity called telomerase. Telomerase is a reverse transcriptase that lengthens the ends of linear chromosomes. It carries an RNA template from which it synthesizes a repeating DNA sequence, or "junk" DNA. This repeated "junk" DNA is important because a linear chromosome is shortened every time it is duplicated. With "junk" DNA at the ends of chromosomes, the shortening eliminates some of the repeated sequence rather than the protein-encoding DNA sequence that lies further from the chromosome ends. Telomerase is often activated in cancer cells, enabling them to duplicate their genomes without losing important protein-coding DNA sequence. Activation of telomerase can be part of the process that allows cancer cells to become immortal.
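
The buffering role of the repeats can be illustrated with a toy simulation that counts how many divisions a chromosome end survives before its repeats are used up. The numbers are arbitrary assumptions chosen for the example, not measured rates. The repeat unit TTAGGG is the human telomere repeat, and the function names are hypothetical.

```python
TELOMERE_REPEAT = "TTAGGG"  # human telomere repeat unit

def telomere_sequence(repeats):
    """Render a chromosome end carrying the given number of repeats."""
    return TELOMERE_REPEAT * repeats

def divisions_until_telomere_lost(repeats, loss_per_division,
                                  telomerase_adds=0, max_divisions=1000):
    """Count divisions until no repeats remain, or None if that never happens.

    Assumption: each division removes a fixed number of repeats and
    telomerase (if active) adds a fixed number back.
    """
    for division in range(1, max_divisions + 1):
        repeats = repeats - loss_per_division + telomerase_adds
        if repeats <= 0:
            return division
    return None  # telomere maintained for the whole simulated period

print(telomere_sequence(3))
print(divisions_until_telomere_lost(10, loss_per_division=2))
print(divisions_until_telomere_lost(10, loss_per_division=2, telomerase_adds=2))
```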