Cladistics

Cladistics is a philosophy of classification that arranges organisms only by their order of branching in an evolutionary tree and not by their morphological similarity, in the words of Luria et al. (1981). A major contributor to this school of thought was the German entomologist Willi Hennig, who referred to it as phylogenetic systematics (Hennig, 1966). The word cladistics is derived from the ancient Greek κλάδος, klados, or "branch."

As the end result of a cladistic analysis, tree-like relationship-diagrams called "cladograms" are drawn up to show hypothesized relationships. A cladistic analysis can be based on as much or as little information as the researcher selects. Modern systematic research is likely to be based on a wide variety of information, including DNA-sequences (so-called "molecular data"), biochemical data and morphological data.

In a cladogram, all organisms lie at the leaves, and each inner node is ideally binary (two-way). The two taxa on either side of a split are called sister taxa or sister groups. Each subtree, whether it contains one item or a hundred thousand items, is called a clade. A natural group comprises all the organisms in any one clade, which share a unique ancestor (one that they do not share with any other organisms on the diagram). Each clade is distinguished by a set of characteristics that appear in its members, but not in the other forms from which it diverged. These identifying characteristics of a clade are called synapomorphies (shared, derived characters). For instance, hardened front wings (elytra) are a synapomorphy of beetles, while circinate vernation, or the unrolling of new fronds, is a synapomorphy of ferns.
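The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple encoding, the function names (leaves, clades, sister_of) and the four-taxon example tree are hypothetical choices made here, not part of any standard cladistics package.

```python
# A cladogram as nested 2-tuples: leaves are strings, inner nodes are binary.
# The taxa and topology are a toy example chosen for illustration.
tree = ((("crocodile", "bird"), "lizard"), "mouse")

def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def clades(node):
    """Yield the leaf set of every subtree; each subtree is one clade."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def sister_of(node, target):
    """Return the leaf set of the sister group of the clade `target`, or None."""
    if isinstance(node, str):
        return None
    left, right = node
    if leaves(left) == target:
        return leaves(right)
    if leaves(right) == target:
        return leaves(left)
    return sister_of(left, target) or sister_of(right, target)

for clade in clades(tree):
    print(sorted(clade))
print(sorted(sister_of(tree, frozenset({"crocodile", "bird"}))))
```

Every subtree counts as a clade, single leaves included, so a fully binary tree with four leaves yields seven clades in this encoding.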

Definitions
A character state (see below) that is present in both the outgroups (see below) and in the ancestors is called a plesiomorphy (meaning "close form", also called an ancestral state). A character state that occurs only in later descendants is called an apomorphy (meaning "separate form", also called a "derived" state) for that group. The adjectives plesiomorphic and apomorphic are used instead of "primitive" and "advanced" to avoid placing value-judgments on the evolution of the character states, since both may be advantageous in different circumstances. It is not uncommon to refer informally to a collective set of plesiomorphies as a ground plan for the clade or clades they refer to.

Several more terms are defined for the description of cladograms and the positions of items within them. A species or clade is basal to another clade if it holds more plesiomorphic characters than that other clade. Usually a basal group is very species-poor as compared to a more derived group. It is not a requirement that a basal group be extant. For example, when considering birds and mammals together, neither is basal to the other: both have many derived characters.

A clade or species located within another clade can be described as nested within that clade.

Cladistics as a break with the past
The school of thought now known as cladistics took inspiration from the work of Willi Hennig. But Hennig's major book, even the 1979 version, does not contain the term 'cladistics' in the index. He referred to his own approach as phylogenetic systematics, as implied by the book's title (Hennig, 1979). A review paper by Dupuis (1984) observes that the term 'clade' was introduced in 1958 by Julian Huxley, 'cladistic' by Cain and Harrison in 1960 and 'cladist' (for an adherent of Hennig's school) by Mayr in 1965. Some of the debates that the cladists engaged in had been running since the 19th century, but they entered these debates with a new fervor, as can be learned from the Foreword to Hennig (1979) in which Rosen, Nelson and Patterson wrote the following: "Encumbered with vague and slippery ideas about adaptation, fitness, biological species and natural selection, neo-Darwinism (summed up in the 'evolutionary' systematics of Mayr and Simpson) not only lacked a definable investigatory method, but came to depend, both for evolutionary interpretation and classification, on consensus or authority" (Foreword, page ix).

Cladistic methods
A cladistic analysis is applied to a certain set of information. To organize this information, a distinction is made between characters and character states. Consider the color of feathers: it may be blue in one species but red in another. Thus, "red feathers" and "blue feathers" are two character states of the character "feather-color."

The researcher decides which character states were present before the last common ancestor of the species group (plesiomorphies) and which were present in the last common ancestor (synapomorphies) by considering one or more outgroups. An outgroup is an organism that is considered not to be part of the group in question, but is closely related to the group. This makes the choice of an outgroup an important task, since this choice can profoundly change the topology of a tree. Note that only synapomorphies are of use in characterising clades.
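Outgroup comparison can be illustrated with a small character matrix. The sketch below is a simplification under stated assumptions: the taxa, the binary 0/1 scoring, and the function name polarize are hypothetical, a single outgroup is used, and a derived state shared by two or more ingroup taxa is merely flagged as a candidate synapomorphy (it could still turn out to be a homoplasy).

```python
# Toy character matrix: taxon -> {character: state}. 0 = absent, 1 = present.
# "limbs" here means paired tetrapod limbs. The lamprey is the assumed outgroup.
matrix = {
    "lamprey": {"jaws": 0, "limbs": 0, "amnion": 0},
    "shark":   {"jaws": 1, "limbs": 0, "amnion": 0},
    "frog":    {"jaws": 1, "limbs": 1, "amnion": 0},
    "lizard":  {"jaws": 1, "limbs": 1, "amnion": 1},
}

def polarize(matrix, outgroup):
    """Treat the outgroup's state as plesiomorphic and list ingroup departures."""
    ingroup = [taxon for taxon in matrix if taxon != outgroup]
    result = {}
    for character, ancestral in matrix[outgroup].items():
        derived = [t for t in ingroup if matrix[t][character] != ancestral]
        result[character] = {
            "plesiomorphic_state": ancestral,
            "taxa_with_derived_state": derived,
            # A derived state in a single taxon cannot group anything.
            "candidate_synapomorphy": len(derived) >= 2,
        }
    return result

for character, info in polarize(matrix, "lamprey").items():
    print(character, info)
```

Swapping in a different outgroup changes which states count as derived, which is why the choice of outgroup matters so much for the resulting tree.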

Next, different possible cladograms are drawn up and evaluated. Clades ideally have many "agreeing" synapomorphies. Ideally there is a sufficient number of true synapomorphies to overwhelm homoplasies caused by convergent evolution (i.e. characters that resemble each other because of environmental conditions or function, not because of common ancestry). A well-known example of homoplasy due to convergent evolution would be a character "presence of wings". Though the wings of birds and insects serve the same function, each evolved independently, as can be seen by their anatomy. If a bird and a winged insect were scored for the character "presence of wings", a homoplasy would be introduced into the dataset, and this confounds the analysis, possibly resulting in a false picture of evolution.

Homoplasies can often be avoided outright in morphological datasets by defining characters more precisely and increasing their number: in the example above, using "wings supported by bony endoskeleton" and "wings supported by chitinous exoskeleton" as characters would avoid the homoplasy. When analyzing "supertrees" (datasets incorporating as many taxa of a suspected clade as possible), it may become unavoidable to introduce character definitions that are imprecise, as otherwise the characters might not apply at all to a large number of taxa. The "wings" example would hardly be useful if attempting a phylogeny of all Metazoa, as most of these don't have wings at all. Cautious choice and definition of characters is thus another important element in cladistic analyses. With a faulty outgroup and/or character set, no method of evaluation is likely to produce a phylogeny representing the evolutionary reality.

Many cladograms are possible for any given set of taxa, but one is chosen based on the principle of parsimony: the most compact arrangement, that is, the one with the fewest character state changes (synapomorphies), is the hypothesis of relationship accepted here (see Occam's razor for a discussion of the principle of parsimony and possible complications). Though at one time this analysis was done by hand, computers are now used to evaluate much larger data sets. Sophisticated software packages such as PAUP* allow the statistical evaluation of the confidence we can put in the veracity of the nodes of a cladogram.
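The parsimony criterion can be made concrete by counting state changes on each candidate tree. The sketch below uses Fitch's small-parsimony counting for unordered characters on rooted binary trees; it is one common way to obtain the count, not necessarily what any particular package does internally. The toy matrix, the three candidate trees and the function names are assumptions made for illustration.

```python
# Toy character matrix (hypothetical data): 0 = absent, 1 = present.
matrix = {
    "lamprey": {"jaws": 0, "limbs": 0, "amnion": 0},
    "shark":   {"jaws": 1, "limbs": 0, "amnion": 0},
    "frog":    {"jaws": 1, "limbs": 1, "amnion": 0},
    "lizard":  {"jaws": 1, "limbs": 1, "amnion": 1},
}

def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for one character."""
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Children disagree: at least one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, matrix):
    """Sum the minimum number of state changes over all characters."""
    characters = next(iter(matrix.values())).keys()
    return sum(
        fitch(tree, {taxon: matrix[taxon][c] for taxon in matrix})[1]
        for c in characters
    )

# Three candidate cladograms as nested 2-tuples, rooted on the lamprey.
candidates = {
    "A": ("lamprey", ("shark", ("frog", "lizard"))),
    "B": ("lamprey", ("frog", ("shark", "lizard"))),
    "C": ("lamprey", ("lizard", ("shark", "frog"))),
}

lengths = {name: tree_length(t, matrix) for name, t in candidates.items()}
best = min(lengths, key=lengths.get)
print(lengths, "most parsimonious:", best)
```

With this toy data, tree A needs one change per character, while B and C each require the "limbs" state to change twice, so A is the preferred hypothesis under parsimony. Real analyses cannot enumerate every tree for more than a handful of taxa and rely on heuristic searches instead.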

It is important to note that the nodes of cladograms do not usually represent divergences of evolutionary lineages, but divergences of character states that are found between such lineages. DNA sequence characters can only diverge after gene flow between (sub)populations has been reduced below some threshold, whereas comprehensive morphological alterations, usually being epistatic (the product of interactions of several genes), tend to occur only after lineages have already evolved separately for quite some time: biological subspecies can usually be distinguished genetically but not, for example, by internal anatomy.

As DNA sequencing has become cheaper and easier, molecular systematics has become an increasingly popular way to reconstruct phylogenies. Using a parsimony criterion is only one of several methods to infer a phylogeny from molecular data; maximum likelihood and Bayesian inference, which incorporate explicit models of sequence evolution, are non-Hennigian ways to evaluate sequence data. Another powerful method of reconstructing phylogenies is the use of genomic retrotransposon markers, which are thought to be less prone to the problem of reversion that plagues sequence data. They are also generally assumed to have a low incidence of homoplasies because it was once thought that their integration into the genome was entirely random; however, this seems at least sometimes not to be the case.

Ideally, morphological, molecular and possibly other (behavioral etc.) phylogenies should be combined into an analysis of total evidence: all have different intrinsic sources of error. For example, character convergence (homoplasy) is much more common in morphological data than in molecular sequence data, but character reversions that cannot be noticed as such are more common in the latter (see long branch attraction). Morphological homoplasies can usually be recognized as such if character states are defined with enough attention to detail.

Cladistics does not assume any particular theory of evolution, only the background knowledge of descent with modification. Thus, cladistic methods can be, and recently have been, usefully applied to non-biological systems, including determining language families in historical linguistics and the filiation of manuscripts in textual criticism.

Classification using cladistics


A trend in biology since the 1960s, called phylogenetic nomenclature, attempts to use cladistic trees as the basis for scientific classification, requiring taxa to be clades. In other words, cladists argue that the classification system should be reformed to eliminate all non-clades. In contrast, other taxonomists insist that groups reflect phylogenies and often make use of cladistic techniques, but allow both monophyletic and paraphyletic groups as taxa. In practice, since the early 20th century at the latest, taxonomists have generally tried to make genus- and lower-level taxa monophyletic (even though the word might not have been used). Class- and higher-level taxa proved to be more difficult.

A monophyletic group is a clade, comprising an ancestral form and all of its descendants, and so forming one (and only one) evolutionary group. A paraphyletic group is similar, but excludes some of the descendants that have undergone significant changes. For instance, the traditional class Reptilia excludes birds even though they evolved from the ancestral reptile. Similarly, the traditional Invertebrates are paraphyletic because Vertebrates are excluded, although the latter evolved from an Invertebrate.

A group with members from separate evolutionary lines is called polyphyletic. For instance, the once-recognized Pachydermata was found to be polyphyletic because elephants and rhinoceroses arose from non-pachyderms separately. Evolutionary taxonomists consider polyphyletic groups to be errors in classification, often occurring because convergence or other homoplasy was misinterpreted as homology.
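Whether a named group is a clade can be checked mechanically against a tree. The sketch below is a minimal illustration: the nested-tuple trees, the simplified topologies and the function names are assumptions made here. It tests monophyly and lists the subclades that a non-monophyletic group leaves out. It does not formally decide between paraphyly and polyphyly, because that depends on whether the common ancestor is counted as a member of the group, which leaf sets alone do not record.

```python
def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return frozenset([node])
    return leaves(node[0]) | leaves(node[1])

def mrca(node, group):
    """Return the smallest subtree containing every member of `group`."""
    if not isinstance(node, str):
        for child in node:
            if group <= leaves(child):
                return mrca(child, group)
    return node

def excluded_clades(node, group):
    """List the maximal subclades under `node` with no member of `group`."""
    tips = leaves(node)
    if not (tips & group):
        return [tips]
    if isinstance(node, str):
        return []
    return excluded_clades(node[0], group) + excluded_clades(node[1], group)

def describe(tree, group):
    """Return a verdict plus the subclades the group leaves out."""
    group = frozenset(group)
    left_out = excluded_clades(mrca(tree, group), group)
    verdict = "monophyletic" if not left_out else "not monophyletic"
    return verdict, left_out

# Simplified toy trees; real phylogenies contain many more taxa.
amniotes = ("mouse", ("lizard", ("crocodile", "bird")))
mammals = (("elephant", "manatee"), ("rhinoceros", "horse"))

print(describe(amniotes, {"lizard", "crocodile", "bird"}))
print(describe(amniotes, {"lizard", "crocodile"}))   # reptiles without birds
print(describe(mammals, {"elephant", "rhinoceros"})) # "pachyderms"
```

In the reptile case a single nested clade (birds) is left out, which matches the paraphyly pattern described above. In the pachyderm case each member's closest relative lies outside the group, which matches the pattern of separate evolutionary lines.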

Following Hennig, cladists argue that paraphyly is as harmful as polyphyly. The idea is that monophyletic groups can be defined objectively, in terms of common ancestors or the presence of synapomorphies. In contrast, paraphyletic and polyphyletic groups are both defined based on key characters, and the decision of which characters are of taxonomic import is inherently subjective. Many argue that they lead to "gradistic" thinking, where groups advance from "lowly" grades to "advanced" grades, which can in turn lead to teleology. In evolutionary studies, teleology is usually avoided because it implies a plan that cannot be empirically demonstrated.

Going further, some cladists argue that ranks for groups above species are too subjective to present any meaningful information, and so argue that they should be abandoned. Thus they have moved away from Linnaean taxonomy towards a simple hierarchy of clades. The validity of this argument hinges crucially on how often in evolution gradualist near-equilibria are punctuated. A quasi-stable state will result in phylogenies that may be all but unmappable onto the Linnaean hierarchy, whereas a punctuation event that throws a taxon out of its ecological equilibrium is likely to lead to a split between clades that occurs in a comparatively short time and thus lends itself readily to classification according to the Linnaean system.

Other evolutionary systematists argue that all taxa are inherently subjective, even when they reflect evolutionary relationships, since living things form an essentially continuous tree. Any dividing line is artificial, and creates both a monophyletic section above and a paraphyletic section below. Paraphyletic taxa are necessary for classifying earlier sections of the tree – for instance, the early vertebrates that would someday evolve into the family Hominidae cannot be placed in any other monophyletic family. They also argue that paraphyletic taxa provide information about significant changes in organisms' morphology, ecology, or life history – in short, that both taxa and clades are valuable but distinct notions, with separate purposes. Many use the term monophyly in its older sense, where it includes paraphyly, and use the alternate term holophyly to describe clades (monophyly in Hennig's sense). As an unscientific rule of thumb, if a distinct lineage that renders the containing clade paraphyletic has undergone marked adaptive radiation and collected many synapomorphies (especially ones that are radical and/or unprecedented), the paraphyly is usually not considered a sufficient argument to prevent recognition of the lineage as distinct under the Linnaean system (but it is by definition sufficient in phylogenetic nomenclature). For example, as touched upon briefly above, the Sauropsida ("reptiles") and the Aves (birds) are both ranked as a Linnaean class, although the latter are a highly derived offshoot of some forms of the former which themselves were already quite advanced.

A formal code of phylogenetic nomenclature, the PhyloCode, is currently under development for cladistic taxonomy. It is intended for use by both those who would like to abandon Linnaean taxonomy and those who would like to use taxa and clades side by side. In several instances (see for example Hesperornithes) it has been employed to clarify uncertainties in Linnaean systematics so that in combination they yield a taxonomy that unambiguously places the group in the evolutionary tree in a way that is consistent with current knowledge.