User:Forluvoft/sandbox/Transcription factor

In the field of molecular biology, a transcription factor is a protein that binds to specific parts of DNA using DNA binding domains and is part of the system that controls the transfer (or transcription) of genetic information from DNA to RNA.

Transcription factors perform this function alone or together with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the recruitment of RNA polymerase, the enzyme that transcribes genetic information from DNA to RNA.

DNA binding domains
The portion (domain) of the transcription factor that binds DNA is called its DNA binding domain. Below is a partial list of some of the major families of DNA-binding domains/transcription factors:


 * lambda repressor-like
 * C-terminal effector domain of the bipartite response regulators
 * srf-like (serum response factor)
 * basic-helix-loop-helix
 * GCC box
 * Zn2/Cys6
 * winged helix
 * Zn2/Cys8 nuclear receptor zinc finger
 * homeodomain proteins - encoded by homeobox genes; they bind DNA and regulate target genes, many of which in turn encode other transcription factors. Homeodomain proteins play critical roles in the regulation of development.
 * multi-domain Cys2His2 zinc fingers
 * basic-leucine zipper (bZIP)

Other proteins also play crucial roles in the regulation of transcription but are not classified as transcription factors because they lack DNA binding domains (for example coactivators, chromatin remodelers, histone acetylases, deacetylases, kinases, and methylases).

Transcription factor binding sites/response elements
The DNA sequence that a transcription factor binds to is called a transcription factor binding site or response element.

Chemically, transcription factors usually interact with their binding sites using a combination of hydrogen bonds and van der Waals forces. Due to the nature of these chemical interactions, most transcription factors bind DNA in a sequence-specific manner. However, not all bases in the transcription factor binding site necessarily interact with the transcription factor. In addition, some of these interactions may be weaker than others. Thus, transcription factors don't bind just one sequence but are capable of binding a subset of closely related sequences, each with a different strength of interaction.

For example, although the consensus binding site for the TATA binding protein (TBP) is:

TATAAAA

the TBP transcription factor can also bind similar sequences such as:

TATATAT or TATATAA

Because transcription factors can bind a set of related sequences and these sequences tend to be short, potential transcription factor binding sites can occur just by chance if the DNA sequence is long enough. It is unlikely, however, that a transcription factor binds all compatible sequences in the genome of the cell. Other constraints, such as DNA accessibility in the cell or availability of cofactors, may also help dictate where a transcription factor will actually bind. Thus, even given the genome sequence, it is still difficult to predict where a transcription factor will actually bind in a living cell.
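The point about related sequences and chance matches can be illustrated with a minimal Python sketch. It is not a real binding model: it simply counts mismatches against the TBP consensus quoted above, and it estimates chance matches under the assumption that all four bases are equally likely and independent. The function names and the `max_mismatches` threshold are hypothetical choices made for this illustration.

```python
CONSENSUS = "TATAAAA"  # consensus TBP binding site quoted in the text


def mismatches(site, consensus=CONSENSUS):
    """Count positions where a candidate site differs from the consensus."""
    return sum(a != b for a, b in zip(site, consensus))


def find_sites(sequence, consensus=CONSENSUS, max_mismatches=2):
    """Return (position, site, mismatch count) for each window near the consensus.

    Only the given strand is scanned; max_mismatches is an illustrative
    threshold, not a measured property of any transcription factor.
    """
    k = len(consensus)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        d = mismatches(window, consensus)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


def expected_exact_matches(length, k=len(CONSENSUS)):
    """Expected exact matches on one strand of a random sequence.

    Assumes each base is drawn uniformly and independently, which real
    genomes do not satisfy; treat the result as an order-of-magnitude guide.
    """
    return (length - k + 1) * 0.25 ** k
```

Under these assumptions, `mismatches("TATATAT")` is 2 and `mismatches("TATATAA")` is 1, so both alternative sites sit close to the consensus, and `expected_exact_matches(1_000_000)` comes to roughly 61 exact matches per million bases from chance alone. Allowing mismatches raises that number further, which is why sequence alone is a weak predictor of binding in a living cell.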

Regulation of transcription factor activity
It is common in biology for important processes to have multiple layers of regulation and control. This is just as true of transcription: not only do rates of transcription regulate the amounts of gene products (RNA and protein) available to the cell, but the process of transcription itself is regulated. Below is a brief synopsis of some of the ways that the activity of transcription factors can be regulated:


 * Transcription factor synthesis: Like all proteins, transcription factors are produced from a gene on a chromosome that is transcribed into RNA, and the RNA is then translated into protein.  Any of these steps can be regulated to affect the production (and thus activity) of a transcription factor.  One interesting implication is that transcription factors can regulate themselves.  For example, in a negative feedback loop, the transcription factor acts as its own repressor:  if the transcription factor protein binds the DNA of its own gene, it down-regulates the production of more of itself.  This is one mechanism for maintaining low levels of a transcription factor in a cell.


 * Localization to the nucleus: In eukaryotes, the genes for transcription factors (like those for most proteins) are transcribed in the nucleus, but the resulting RNA is translated in the cell's cytoplasm.  Many proteins that are active in the nucleus contain nuclear localization signals that direct them to the nucleus.  For many transcription factors, this import step is a key point in their regulation.  Important classes of transcription factors, such as some hormone receptors, must first bind a ligand while in the cytoplasm before they can relocate to the nucleus.


 * Activation via chemical modifications or ligand binding: Ligand binding can influence not only where a transcription factor is located within a cell but also whether it is in an active state and capable of binding DNA or other cofactors.  A transcription factor can also be activated by chemical modification of the protein itself.  For example, many transcription factors must be phosphorylated before they can bind DNA.


 * Accessibility of DNA binding site: In eukaryotes, genes that are not being actively transcribed are often located in heterochromatin.  Heterochromatin consists of regions of chromosomes that are heavily compacted by tightly bundling the DNA onto histones.  DNA within heterochromatin is inaccessible to many transcription factors.  For a transcription factor to bind its DNA binding site, the heterochromatin must first be converted to euchromatin, usually via histone modifications.


 * Availability of other cofactors/transcription factors needed for a complex: Most transcription factors don't work alone.  Often, for gene transcription to occur, a number of transcription factors must bind to DNA regulatory sequences.  This collection of transcription factors in turn recruits intermediary proteins such as cofactors that allow efficient recruitment of the preinitiation complex and RNA polymerase.  Thus, for a single transcription factor to initiate transcription, all of these other proteins must also be present, and the transcription factor must be in a state where it can bind to them if necessary.
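
The negative feedback loop described under transcription factor synthesis can be sketched as a toy simulation. The model below is an assumption made for illustration, not something taken from this article: production follows a Hill-type repression term, degradation is first order, and the parameter names and values (`beta`, `gamma`, `K`, `n`) are hypothetical.

```python
def simulate(beta=10.0, gamma=1.0, K=1.0, n=2,
             autoregulated=True, dt=0.01, steps=5000):
    """Integrate dx/dt = production(x) - gamma * x with the Euler method.

    x is the (dimensionless) level of the transcription factor.
    With autoregulation, production = beta / (1 + (x / K) ** n), so the
    factor represses its own gene more strongly as it accumulates.
    Without it, production is the constant beta.
    All parameter values are illustrative, not measured.
    """
    x = 0.0
    for _ in range(steps):
        if autoregulated:
            production = beta / (1.0 + (x / K) ** n)
        else:
            production = beta
        x += dt * (production - gamma * x)
    return x


if __name__ == "__main__":
    print("unregulated level:  ", round(simulate(autoregulated=False), 3))
    print("autoregulated level:", round(simulate(autoregulated=True), 3))
```

With these particular values the unregulated level settles near beta / gamma = 10, while the self-repressing factor settles near 2 (the solution of 10 / (1 + x^2) = x), which is consistent with the idea that negative autoregulation helps keep a transcription factor at a low level.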

Structure


Transcription factors are modular in structure and contain the following domains:
 * DNA-binding domain (DBD), which attaches to specific sequences of DNA (enhancer or promoter sequences) adjacent to regulated genes. DNA sequences that bind transcription factors are often referred to as response elements.
 * Trans-activating domain (TAD), which contains binding sites for other proteins such as transcription coregulators. These binding sites are frequently referred to as activation functions (AFs).
 * An optional signal sensing domain (SSD) (e.g., a ligand binding domain), which senses external signals and in response transmits these signals to the rest of the transcription complex, resulting in up- or down-regulation of gene expression. Alternatively, the DBD and signal sensing domains may reside on separate proteins that associate within the transcription complex to regulate gene expression.

Mechanism of action
Transcription factors may be activated (or deactivated) through their signal sensing domain by a number of mechanisms including:
 * ligand binding (see for example nuclear receptors)
 * phosphorylation
 * interactions with other transcription factors (e.g., homo- or hetero-dimerization) and/or coregulatory proteins

The resulting activated transcription factors bind, through their DBD, to specific sequences of DNA upstream or downstream of the gene they regulate and then either enhance or repress transcription of these genes by assisting or blocking RNA polymerase binding, respectively. A cluster of transcription factors forms the preinitiation complex (PIC), which recruits and activates RNA polymerase. Conversely, repressor transcription factors inhibit transcription by blocking the attachment of activator proteins.

The regulation of transcription is a highly complex process as it is dependent upon a number of factors including which transcription factors and other coregulatory proteins are present within a particular cell as well as the local 3-dimensional structure of the gene (chromatin).

Initial models, based on in vitro experiments, suggested that the assembly of transcription factors was dictated by the DNA sequence. It is, however, becoming increasingly obvious that the events leading to activation of transcription depend on a large number of factors and are highly intertwined. Furthermore, epigenetic information present on DNA appears to play an important role in transcriptional activation.

Mechanistic
There are three mechanistic classes of transcription factors:


 * General transcription factors are involved in the formation of a preinitiation complex. The most common are abbreviated as TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. They are ubiquitous and interact with the core promoter region surrounding the transcription start site(s) of all class II genes.
 * Upstream transcription factors are proteins that bind somewhere upstream of the initiation site to stimulate or repress transcription.
 * Inducible transcription factors are similar to upstream transcription factors but require activation or inhibition.

Functional
Alternatively transcription factors have been classified according to their regulatory function:


 * I. constitutively active - present in all cells at all times - general transcription factors, Sp1, NF1, CCAAT
 * II. conditionally active - requires activation
 * II.A developmental (cell specific) - expression is tightly controlled but, once expressed, requires no additional activation - GATA, HNF, PIT-1, MyoD, Myf5, Hox, Winged Helix
 * II.B signal dependent - requires external signal for activation
 * II.B.1 extracellular ligand dependent - nuclear receptors
 * II.B.2 intracellular ligand dependent - activated by small intracellular molecules - SREBP, p53, orphan nuclear receptors
 * II.B.3 cell membrane receptor dependent - second messenger signaling cascades resulting in the phosphorylation of the transcription factor
 * II.B.3.a resident nuclear factors - reside in the nucleus regardless of activation state - CREB, AP-1, Mef2
 * II.B.3.b latent cytoplasmic factors - inactive form resides in the cytoplasm but, when activated, is translocated into the nucleus - STAT, R-SMAD, NF-kB, Notch, TUBBY, NFAT

Roles and Conservation in Different Organisms
Transcription factors are essential for the regulation of gene expression and consequently are found in all living organisms. The number of transcription factors found within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene.

There are approximately 2600 proteins encoded in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors. Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins. Furthermore, genes are often flanked by several binding sites for distinct transcription factors, and efficient expression of each of these genes requires the cooperative action of several different transcription factors (see for example hepatocyte nuclear factors). Hence, the combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development.
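
The combinatorial argument can be checked with a few lines of arithmetic. The sketch below counts unordered sets of transcription factors and ignores every biological constraint (expression patterns, binding site compatibility, cooperativity), so it only shows that the raw number of combinations is not the limiting factor. The gene count of 26,000 is an assumption implied by the figures above (2600 being roughly 10% of genes), and the helper name is hypothetical.

```python
from math import comb

N_FACTORS = 2000   # approximate number of human transcription factors (from the text)
N_GENES = 26_000   # assumption: implied by 2600 proteins being ~10% of genes


def smallest_subset_size(n_factors, n_genes):
    """Smallest k such that the number of k-factor sets is at least n_genes."""
    k = 1
    while comb(n_factors, k) < n_genes:
        k += 1
    return k


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"sets of {k} factor(s): {comb(N_FACTORS, k):,}")
    print("smallest set size giving every gene its own combination:",
          smallest_subset_size(N_FACTORS, N_GENES))
```

Pairs alone already give 1,999,000 distinct combinations and triples give 1,331,334,000, far more than the assumed number of genes.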

Classification of Transcription Factors
Transcription factors are often classified based on the similarity of their DNA binding domains:


 * 1 Superclass: Basic Domains (Basic-helix-loop-helix)
 * 1.1 Class: Leucine zipper factors (bZIP)
 * 1.1.1 Family: AP-1(-like) components; includes c-Fos/c-Jun
 * 1.1.2 Family: CREB
 * 1.1.3 Family: C/EBP-like factors
 * 1.1.4 Family: bZIP / PAR
 * 1.1.5 Family: Plant G-box binding factors
 * 1.1.6 Family: ZIP only
 * 1.2 Class: Helix-loop-helix factors (bHLH)
 * 1.2.1 Family: Ubiquitous (class A) factors
 * 1.2.2 Family: Myogenic transcription factors (MyoD)
 * 1.2.3 Family: Achaete-Scute
 * 1.2.4 Family: Tal/Twist/Atonal/Hen
 * 1.3 Class: Helix-loop-helix / leucine zipper factors (bHLH-ZIP)
 * 1.3.1 Family: Ubiquitous bHLH-ZIP factors; includes USF, SREBP
 * 1.3.2 Family: Cell-cycle controlling factors; includes c-Myc
 * 1.4 Class: NF-1
 * 1.4.1 Family: NF-1
 * 1.5 Class: RF-X
 * 1.5.1 Family: RF-X
 * 1.6 Class: bHSH


 * 2 Superclass: Zinc-coordinating DNA-binding domains
 * 2.1 Class: Cys4 zinc finger of nuclear receptor type
 * 2.1.1 Family: Steroid hormone receptors
 * 2.1.2 Family: Thyroid hormone receptor-like factors
 * 2.2 Class: diverse Cys4 zinc fingers
 * 2.2.1 Family: GATA-Factors
 * 2.3 Class: Cys2His2 zinc finger domain
 * 2.3.1 Family: Ubiquitous factors, includes TFIIIA, Sp1
 * 2.3.2 Family: Developmental / cell cycle regulators; includes Krüppel
 * 2.3.4 Family: Large factors with NF-6B-like binding properties
 * 2.4 Class: Cys6 cysteine-zinc cluster
 * 2.5 Class: Zinc fingers of alternating composition


 * 3 Superclass: Helix-turn-helix
 * 3.1 Class: Homeo domain
 * 3.1.1 Family: Homeo domain only; includes Ubx
 * 3.1.2 Family: POU domain factors; includes Oct
 * 3.1.3 Family: Homeo domain with LIM region
 * 3.1.4 Family: homeo domain plus zinc finger motifs
 * 3.2 Class: Paired box
 * 3.2.1 Family: Paired plus homeo domain
 * 3.2.2 Family: Paired domain only
 * 3.3 Class: Fork head / winged helix
 * 3.3.1 Family: Developmental regulators; includes forkhead
 * 3.3.2 Family: Tissue-specific regulators
 * 3.3.3 Family: Cell-cycle controlling factors
 * 3.3.0 Family: Other regulators
 * 3.4 Class: Heat Shock Factors
 * 3.4.1 Family: HSF
 * 3.5 Class: Tryptophan clusters
 * 3.5.1 Family: Myb
 * 3.5.2 Family: Ets-type
 * 3.5.3 Family: Interferon regulatory factors
 * 3.6 Class: TEA domain
 * 3.6.1 Family: TEA


 * 4 Superclass: beta-Scaffold Factors with Minor Groove Contacts
 * 4.1 Class: RHR (Rel homology region)
 * 4.1.1 Family: Rel/ankyrin; NF-kappaB
 * 4.1.2 Family: ankyrin only
 * 4.1.3 Family: NF-AT
 * 4.2 Class: STAT
 * 4.2.1 Family: STAT
 * 4.3 Class: p53
 * 4.3.1 Family: p53
 * 4.4 Class: MADS box
 * 4.4.1 Family: Regulators of differentiation; includes Mef2
 * 4.4.2 Family: Responders to external signals, SRF (serum response factor)
 * 4.5 Class: beta-Barrel alpha-helix transcription factors
 * 4.6 Class: TATA binding proteins
 * 4.6.1 Family: TBP
 * 4.7 Class: HMG
 * 4.7.1 Family: SOX genes, SRY
 * 4.7.2 Family: TCF-1
 * 4.7.3 Family: HMG2-related, SSRP1
 * 4.7.5 Family: MATA
 * 4.8 Class: Heteromeric CCAAT factors
 * 4.8.1 Family: Heteromeric CCAAT factors
 * 4.9 Class: Grainyhead
 * 4.9.1 Family: Grainyhead
 * 4.10 Class: Cold-shock domain factors
 * 4.10.1 Family: csd
 * 4.11 Class: Runt
 * 4.11.1 Family: Runt


 * 0 Superclass: Other Transcription Factors
 * 0.1 Class: Copper fist proteins
 * 0.2 Class: HMGI(Y)
 * 0.2.1 Family: HMGI(Y)
 * 0.3 Class: Pocket domain
 * 0.4 Class: E1A-like factors
 * 0.5 Class: AP-2/EREBP-related factors