Phylogenetics

Overview
In biology, phylogenetics (Greek: phyle = tribe, race and genetikos = relative to birth, from genesis = birth) is the study of evolutionary relatedness among various groups of organisms (e.g., species, populations). Also known as phylogenetic systematics or cladistics, phylogenetics treats each species as a group of lineage-connected individuals. Taxonomy, the classification of organisms according to similarity, has been richly informed by phylogenetics but remains methodologically and logically distinct.

Evolution is regarded as a branching process, whereby populations are altered over time and may speciate into separate branches, hybridize together, or terminate by extinction. This may be visualized as a multidimensional character-space that a population moves through over time. The central difficulty in phylogenetics is that genetic data are available only for present-day organisms, while fossil records (osteometric data) are sporadic and less reliable. Knowledge of how evolution operates is therefore used to reconstruct the full tree.

Cladistics provides a simplified method of understanding phylogenetic trees, and uses a small set of terms to describe the nature of a grouping. For instance, all birds and reptiles are believed to have descended from a single common ancestor, so this taxonomic grouping (yellow in the diagram) is called monophyletic. "Modern reptile" (cyan in the diagram) is a grouping that contains a common ancestor, but does not contain all descendants of that ancestor (birds are excluded). This is an example of a paraphyletic group. A grouping such as warm-blooded animals would include only mammals and birds (red/orange in the diagram) and is called polyphyletic because the grouping does not include the most recent common ancestor of its members.
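The three kinds of grouping can be illustrated with a short Python sketch. The tree below is a deliberately simplified toy topology chosen for illustration, not an authoritative phylogeny, and the names `TREE`, `leaves`, `mrca` and `classify` are hypothetical. Because the sketch only sees the tips of the tree, it uses a leaf-based heuristic as an assumption: a group whose excluded relatives form a single clade is reported as paraphyletic, and anything else as polyphyletic.

```python
# Toy rooted tree as nested tuples; leaves are strings.
# The topology is an illustrative assumption, not an authoritative phylogeny.
TREE = ("mammals", ("lizards", ("turtles", ("crocodiles", "birds"))))


def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def mrca(node, group):
    """Return the smallest subtree whose leaves contain every member of group."""
    if not isinstance(node, str):
        for child in node:
            if group <= leaves(child):
                return mrca(child, group)
    return node


def classify(tree, group):
    """Classify a set of leaves using a leaf-based heuristic (an assumption)."""
    group = frozenset(group)
    clade = leaves(mrca(tree, group))
    excluded = clade - group
    if not excluded:
        # The group is exactly an ancestor plus all of its descendants.
        return "monophyletic"
    if leaves(mrca(tree, excluded)) == excluded:
        # What was left out is itself a single clade (e.g. birds).
        return "paraphyletic"
    return "polyphyletic"


if __name__ == "__main__":
    print(classify(TREE, {"lizards", "turtles", "crocodiles", "birds"}))
    print(classify(TREE, {"lizards", "turtles", "crocodiles"}))
    print(classify(TREE, {"mammals", "birds"}))
```

A stricter treatment would also record which ancestral nodes belong to the group, since the distinction between paraphyly and polyphyly is defined in terms of the common ancestor rather than the tips alone.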

The most commonly used methods to infer phylogenies include parsimony, maximum likelihood, and MCMC-based Bayesian inference. Distance-based methods construct trees based on overall similarity, which is often assumed to approximate phylogenetic relationships. All methods depend upon an implicit or explicit mathematical model describing the evolution of the characters observed in the species included, and are usually used for molecular phylogeny, where the characters are aligned nucleotide or amino acid sequences.
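As a minimal sketch of the first step in a distance-based method, the Python below computes pairwise distances from aligned nucleotide sequences. It shows both the raw proportion of differing sites (the p-distance) and the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which is one simple explicit model of character evolution. The alignment, the taxon names and the function names are invented for illustration, and the sketch assumes equal-length sequences with no gaps or ambiguity codes.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a, b):
    """Jukes-Cantor corrected distance: -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical toy alignment, made up for this example.
ALIGNMENT = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAT",
    "taxon_C": "ACGAACCTAT",
    "taxon_D": "TCGAACCTGT",
}


def distance_matrix(alignment, metric=jc69_distance):
    """Map each unordered pair of taxa to its distance."""
    return {
        (s, t): metric(alignment[s], alignment[t])
        for s, t in combinations(sorted(alignment), 2)
    }


def closest_pair(matrix):
    """Return the pair of taxa with the smallest distance."""
    return min(matrix, key=matrix.get)


if __name__ == "__main__":
    matrix = distance_matrix(ALIGNMENT)
    for pair, d in matrix.items():
        print(pair, round(d, 3))
    print("closest:", closest_pair(matrix))
```

A clustering procedure would then repeatedly join the closest pair to build a tree. That step is omitted here, and the choice of correction model is itself an assumption that affects the result.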

Ernst Haeckel's recapitulation theory
During the late 19th century, Ernst Haeckel's recapitulation theory, or biogenetic law, was widely accepted. This theory was often expressed as "ontogeny recapitulates phylogeny", i.e. the development of an organism exactly mirrors the evolutionary development of the species. Haeckel's early version of this hypothesis (that the embryo mirrors adult evolutionary ancestors) has since been rejected, and the hypothesis has been amended to state that the embryo's development mirrors the embryos of its evolutionary ancestors. Most modern biologists recognize numerous connections between ontogeny and phylogeny, explain them using evolutionary theory, or view them as supporting evidence for that theory. Donald Williamson suggested that larvae and embryos represent adults in other taxa that have been transferred by hybridization (the larval transfer theory).

Gene transfer
Organisms can generally inherit genes in two ways: from parent to offspring (vertical gene transfer), or by horizontal or lateral gene transfer, in which genes jump between unrelated organisms, a common phenomenon in prokaryotes.

Lateral gene transfer has complicated the determination of phylogenies of organisms, because trees inferred from different genes have been reported to be inconsistent with one another.
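One simple way to make such inconsistency concrete is to list the clades that two gene trees do not share. The Python sketch below does this for two invented rooted trees over four hypothetical taxa (A to D). The trees and the function names are assumptions made for illustration, and the comparison is similar in spirit to, but simpler than, the tree distances used in practice.

```python
def clades(tree):
    """Return the leaf sets defined by the internal nodes of a rooted tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def conflicting_clades(tree_1, tree_2):
    """Clades present in exactly one of the two trees."""
    return clades(tree_1) ^ clades(tree_2)


# Hypothetical gene trees for the same four taxa, as nested tuples.
GENE_1_TREE = (("A", "B"), ("C", "D"))
GENE_2_TREE = (("A", "C"), ("B", "D"))

if __name__ == "__main__":
    for clade in sorted(conflicting_clades(GENE_1_TREE, GENE_2_TREE), key=sorted):
        print(sorted(clade))
```

An empty result means the two genes support the same rooted topology. A non-empty result shows only that the trees disagree, not why, since causes other than lateral transfer can also produce conflict.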

Carl Woese proposed the three-domain theory of life (eubacteria, archaea and eukaryotes) based on his discovery that the genes encoding ribosomal RNA are ancient and distributed over all lineages of life with little or no lateral gene transfer. rRNA genes are therefore commonly recommended as molecular clocks for reconstructing phylogenies.

This approach has been particularly useful for the phylogeny of microorganisms, to which the species concept does not apply and which are too morphologically simple to be classified based on phenotypic traits.

Taxon sampling and phylogenetic signal
Owing to the development of advanced sequencing techniques in molecular biology, it has become feasible to gather large amounts of data (DNA or amino acid sequences) to estimate phylogenies. For example, it is not rare to find studies with character matrices based on whole mitochondrial genomes. However, it has been proposed that it is more important to increase the number of taxa in the matrix than to increase the number of characters, because the more taxa are included, the more robust the resulting phylogeny. This is partly due to the breaking up of long branches. It has been argued that this is an important reason to incorporate data from fossils into phylogenies where possible. Using simulations, Derrick Zwickl and David Hillis found that increasing taxon sampling in phylogenetic inference has a positive effect on the accuracy of phylogenetic analyses.

Another important factor that affects the accuracy of tree reconstruction is whether the data analyzed actually contain useful phylogenetic signal, a term that is used generally to denote whether related organisms tend to resemble each other with respect to their genetic material or phenotypic traits.