Paleopolyploidy



Paleopolyploidy refers to ancient genome duplications that occurred at least several million years ago (mya). The genome doubling event could be either an autopolyploidy or an allopolyploidy. Because of functional redundancy, genes are rapidly silenced and/or lost from the duplicated genomes. Over evolutionary time, most paleopolyploids have lost their "polyploid" status through a process called diploidization and are now referred to as "diploids" (e.g. baker's yeast, Arabidopsis and perhaps humans).

Paleopolyploidy is extensively studied in plant lineages. Almost all flowering plants have been found to have undergone at least one round of genome duplication at some point in their evolutionary history. Ancient genome duplications are also found in the early ancestor of vertebrates (which includes the human lineage) and near the origin of the bony fishes. Evidence also suggests that baker's yeast (Saccharomyces cerevisiae), which has a compact genome, experienced polyploidization during its evolutionary history.

Paleopolyploidy in eukaryotes
Ancient genome duplications are widespread throughout eukaryotic lineages, particularly in plants. Almost all important cereal crops are paleopolyploids. Studies suggest that the common ancestor of Poaceae, the grass family, had a genome duplication 50-70 mya. Subsequent genome doublings occurred in maize and wheat (two rounds).

Furthermore, Arabidopsis thaliana, which has a small genome for a plant, experienced at least two rounds of paleopolyploidy. The most recent event took place before the divergence of the Arabidopsis and Brassica lineages 25-40 mya. Another, shared by all eudicots, dates to 50-70 mya, and a possible third round, shared by all flowering plants, dates to >200 mya.

Compared with plants, paleopolyploidy is much rarer in the animal kingdom. It has been identified mainly in amphibians and bony fishes. Although some studies have suggested that one (sometimes two) genome duplications are shared by all vertebrates (including humans), the evidence is not as strong as in other cases and remains under debate. The reason why animal lineages had fewer paleopolyploidization events than plants has nevertheless attracted the interest of many researchers.

Lastly, a well-supported paleopolyploidy event, which occurred after the divergence from K. waltii, has been found in baker's yeast (Saccharomyces cerevisiae), despite its small, compact genome (~13 Mbp). Through genome streamlining, yeast has lost 90% of the duplicated genome over evolutionary time and is now recognized as a diploid organism.

Detection method
Duplicated genes can be identified through sequence homology at the DNA or protein level. Paleopolyploidy can be identified as a massive burst of gene duplication at a single point in time, dated using a molecular clock. To distinguish a whole-genome duplication from a collection of single-gene duplication events (which are common in the genome), the following rules are often applied:



1. Duplicated genes are located in large duplicated blocks. Single gene duplication is a random process and tends to make duplicated genes scattered throughout the genome.

2. Duplicated blocks are non-overlapping because they were created simultaneously. Segmental duplication within the genome can fulfill Rule #1, but multiple independent segmental duplications could overlap each other.
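
Rule #2 can be checked mechanically once candidate duplicated blocks have been found. The following is a minimal sketch in Python, not a published method: it assumes each side of a duplicated block has already been reduced to a hypothetical (chromosome, start, end) tuple with inclusive coordinates, and it simply reports which segments overlap on the same chromosome. The function name and the input format are illustrative assumptions.

```python
def overlapping_segments(segments):
    """Return pairs of segments that overlap on the same chromosome.

    segments: list of (chrom, start, end) tuples with inclusive coordinates
    (a hypothetical format chosen for this sketch).
    """
    overlaps = []
    ordered = sorted(segments)  # sorts by chromosome, then start, then end
    for i, (chrom_a, start_a, end_a) in enumerate(ordered):
        for chrom_b, start_b, end_b in ordered[i + 1:]:
            # Because of the sort order, once we reach another chromosome
            # or a segment starting past end_a, nothing later can overlap.
            if chrom_b != chrom_a or start_b > end_a:
                break
            overlaps.append(((chrom_a, start_a, end_a),
                             (chrom_b, start_b, end_b)))
    return overlaps


# Made-up coordinates: the second list adds a segment that overlaps two others.
clean = [("chr1", 1, 100), ("chr1", 150, 300), ("chr2", 1, 80)]
messy = clean + [("chr1", 90, 160)]
print(len(overlapping_segments(clean)))
print(len(overlapping_segments(messy)))
```

An empty result is consistent with a single simultaneous duplication; overlapping segments point instead toward independent segmental duplications, or toward more than one round of genome duplication.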

In theory, all gene pairs duplicated by paleopolyploidy (homeologs) should have the same "age"; that is, the sequence divergence between the two copies should be similar for every pair. The synonymous substitution rate, Ks, is often used as a molecular clock to determine the time of gene duplication. Thus, paleopolyploidy is identified as a "peak" on the graph of duplicate number vs. Ks (shown on the right).
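
The idea of a "peak" in the Ks distribution can be illustrated with a small sketch. It assumes Ks has already been estimated for each duplicate pair by some external tool; the bin width, the minimum count and the example values are arbitrary assumptions for illustration, and real analyses typically fit mixture models rather than looking for simple local maxima.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count duplicate pairs per Ks bin; values outside [0, max_ks) are ignored."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 <= ks < max_ks:
            counts[int(ks / bin_width)] += 1
    return counts


def find_peaks(counts, min_count=5):
    """Return indices of bins that are strict local maxima with >= min_count pairs.

    The first and last bins are never reported. In particular, the youngest
    bin is usually dominated by recent single-gene duplicates.
    """
    peaks = []
    for i in range(1, len(counts) - 1):
        if (counts[i] >= min_count
                and counts[i] > counts[i - 1]
                and counts[i] > counts[i + 1]):
            peaks.append(i)
    return peaks


# Made-up Ks values: a decaying background of young duplicates
# plus a cluster of pairs of similar age around Ks = 0.75.
ks_values = ([0.05] * 8 + [0.15] * 4 + [0.25] * 2
             + [0.65] * 3 + [0.75] * 9 + [0.85] * 4 + [1.25])
counts = ks_histogram(ks_values)
for i in find_peaks(counts):
    print(f"candidate peak in bin starting at Ks = {i * 0.1:.1f} ({counts[i]} pairs)")
```

With these made-up values the only candidate is the bin covering Ks 0.7 to 0.8, which is the signature a whole-genome duplication would be expected to leave.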

Duplication events that occurred long ago in the history of an evolutionary lineage can be difficult to detect because of subsequent diploidization (in which a polyploid comes to behave cytogenetically as a diploid over time), as mutations and gene translocations gradually make one copy of each chromosome unlike its counterpart. This usually results in low confidence when identifying a very ancient paleopolyploidy.

Evolutionary importance
Paleopolyploidization events lead to massive cellular changes, including doubling of the genetic material, changes in gene expression and increased cell size. Gene loss during diploidization is not completely random, but strongly shaped by selection. Genes from large gene families, whose diversity is required for their functions, and genes that function in a dosage-dependent manner tend to be retained (e.g. cellular machinery, transcription factors and protein kinases). On the other hand, genes involved in DNA repair and apoptosis, and genes encoding transmembrane receptors, tend to return to single-copy status. Overall, paleopolyploidy can have both short-term and long-term evolutionary effects on an organism's fitness in the natural environment.


 * Genome Diversity -- genome doubling provides the organism with redundant alleles that can evolve freely under little selection pressure. The duplicated genes can undergo neofunctionalization or subfunctionalization, which could help the organism adapt to a new environment or survive different stress conditions.


 * Heterosis -- polyploids often have larger cells and even larger organs. Many important crops, including wheat, maize and cotton, are paleopolyploids that were domesticated by ancient peoples.


 * Speciation -- It has been suggested that many polyploidization events created new species, either through the gain of adaptive traits or through sexual incompatibility with their diploid counterparts. An example is the recent speciation of the allopolyploid Spartina, S. anglica; this polyploid plant is so successful that it is listed as an invasive species in many regions.

Human as paleopolyploid
The hypothesis that humans are paleopolyploid originated as early as 1970, when it was proposed by the biologist Susumu Ohno. He reasoned that the vertebrate genome could not have achieved its complexity without large-scale whole-genome duplications. The "two rounds of genome duplication" hypothesis (2R hypothesis) was subsequently formulated and has gained popularity since then, especially among developmental biologists.

However, the 2R hypothesis has also been questioned by many researchers. According to the hypothesis, the human genome should contain about four times as many genes as an invertebrate genome (a 4:1 ratio). This is not supported by findings from various genome projects -- the human genome consists of ~35,000 genes, while an average invertebrate genome has about 15,000 genes. Diploidization alone cannot fully account for this ratio, because phylogenetic studies have failed to resolve the (AB)(CD) tree topology expected for four-gene clusters in vertebrates.
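
The arithmetic behind this objection can be written out as a short sketch. It uses the approximate gene counts quoted above and makes the simplifying assumption, for illustration only, that an average invertebrate gene count stands in for the pre-duplication ancestral count; the function and variable names are hypothetical.

```python
def expected_gene_count(ancestral_genes, rounds):
    """Gene count after `rounds` whole-genome duplications with no gene loss."""
    return ancestral_genes * 2 ** rounds


invertebrate_genes = 15_000  # approximate figure quoted in the text
human_genes = 35_000         # approximate figure quoted in the text

# Two rounds of duplication without any loss would quadruple the gene count.
expected = expected_gene_count(invertebrate_genes, rounds=2)
observed_ratio = human_genes / invertebrate_genes
retained_fraction = human_genes / expected

print(f"expected with no gene loss: {expected} genes (4:1)")
print(f"observed ratio: {observed_ratio:.2f}:1")
print(f"observed count as a fraction of the expectation: {retained_fraction:.0%}")
```

On these figures the observed ratio is roughly 2.3:1 rather than 4:1, so a 2R scenario has to assume that a large share of the duplicated genes was subsequently lost.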

The 2R hypothesis is still under debate today, with researchers on both sides. Some critical evidence might be found in organisms near the root of the vertebrate lineage, such as Ciona intestinalis. As more and more ESTs become available, we may soon have a better understanding of our own paleopolyploidy.