Phylogenetic nomenclature



Phylogenetic nomenclature is an alternative to Linnaean nomenclature. Its two defining features are the use of phylogenetic definitions of biological taxon names, and the lack of obligatory ranks. It is currently not regulated, but the PhyloCode (International Code of Phylogenetic Nomenclature) is intended to regulate it once implemented.

Phylogenetic definitions
All forms of phylogenetic definitions are ways of specifying the ancestor in the definition of "clade" (which is "an ancestor and all its descendants"). Possible formats of phylogenetic definitions include:


 * Ancestor-based: "A and all its descendants". This is the only definition type that only needs a single specifier/anchor (see below); all others require at least two. It has so far never been used because the fossil record is hardly ever good enough that we can expect to find a direct ancestor of any known organism, let alone to recognize it as such with reasonable confidence.
 * Node-based: "the most recent common ancestor (MRCA) of A and B (and C etc. as needed), and all its descendants" = "the smallest clade that includes A and B (and C etc.)". (Note: Trivially, A and B are descendants of the MRCA of A and B.)
 * Branch-based: "all organisms or species that share a more recent common ancestor with A (and B, C, etc. as needed) than with Z (and with Y, X, etc. as needed)" = "the largest clade that includes A (and B, C, etc.) but not Z (and also not Y, X, etc.)". (Note: Trivially, A shares a more recent common ancestor with itself than with Z.)
 * Apomorphy-based: "the first organism or species to possess apomorphies M (and N etc. as needed) as inherited by A (and B etc. as needed), and all its descendants" = "the clade diagnosed by the presence of M (and N etc.) as inherited by A (and B etc.)". (Notes: if M evolves more than once, only the one event from which A has inherited it counts: if M is "walking on two legs" and A is Homo sapiens, then birds are not members of the clade; when the apomorphy is lost, the organisms that have lost it stay members of the clade: if M is flight and A is a sparrow, ostriches and penguins are members of the clade.)
 * Branch-modified node-based: "the crown clade that includes all extant (or Recent) organisms or species which share a more recent common ancestor with A (and B etc. as needed) than with Z (and with Y etc. as needed), and all its descendants" = "the crown clade of the clade consisting of all organisms or species that belong to the largest clade that includes A (and B etc.) but not Z (and Y etc.)". (Notes: The specifiers of the node-based clade are not explicitly mentioned; instead "all extant/Recent organisms/species that share a more recent common ancestor with A than with Z" – the extant/Recent members of a different, larger clade – are its specifiers. A and B etc. will in all likelihood be extant themselves and therefore be members of the clade to which the name applies. Both of these notes also apply to the following definition type.)
 * Apomorphy-modified node-based: "the crown clade that includes all extant (or Recent) organisms or species which are descended from the first organism to possess apomorphy M (and N etc. as needed) as inherited by A (and B etc. as needed)" = "the crown clade of the clade diagnosed by the presence of M (and N etc.) as inherited by A (and B etc.)".

Letters indicate "specifiers" (also called "anchors"); specifically, A, B, C, X, Y, and Z indicate specimens, species, or larger taxa, and M and N indicate derived character states (apomorphies). To avoid ambiguity, the use of taxa larger than species as anchors is now widely discouraged (and will be forbidden under the PhyloCode).
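The node-based and branch-based formats can be read as simple operations on a rooted tree. The following Python sketch is only an illustration: the toy topology, the internal node names (n_amniote, n_mammal, etc.) and the helper functions are assumptions made for this example, and the branch-based helper accepts only one internal and one external specifier.

```python
# Toy rooted tree stored as a child -> parent map. The topology and the
# internal node names are illustrative assumptions, not a published analysis.
PARENT = {
    "n_synapsid": "n_amniote",
    "n_sauropsid": "n_amniote",
    "Dimetrodon": "n_synapsid",
    "n_mammal": "n_synapsid",
    "Ornithorhynchus": "n_mammal",
    "Homo": "n_mammal",
    "Lacerta": "n_sauropsid",
    "n_archosaur": "n_sauropsid",
    "Crocodylus": "n_archosaur",
    "Passer": "n_archosaur",
}


def lineage(node):
    """Return the node followed by all its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(*specifiers):
    """Most recent common ancestor of the given specifiers."""
    common = set(lineage(specifiers[0]))
    for s in specifiers[1:]:
        common &= set(lineage(s))
    # Walking rootward, the first shared node is the most recent one.
    return next(n for n in lineage(specifiers[0]) if n in common)


def clade(ancestor):
    """The ancestor and all its descendants (internal nodes included)."""
    nodes = set(PARENT) | set(PARENT.values())
    return {n for n in nodes if ancestor in lineage(n)}


def node_based(*internal):
    """Smallest clade that includes all internal specifiers."""
    return clade(mrca(*internal))


def branch_based(internal, external):
    """Largest clade that includes `internal` but not `external`.

    Assumes two distinct specifiers, neither an ancestor of the other.
    """
    excluded = set(lineage(external))
    # Keep the part of the internal specifier's lineage that is not shared
    # with the external specifier; its most rootward node anchors the clade.
    stem = [n for n in lineage(internal) if n not in excluded]
    return clade(stem[-1])


print(sorted(node_based("Homo", "Ornithorhynchus")))
print(sorted(branch_based("Homo", "Passer")))
```

On this toy tree the node-based definition with Homo and Ornithorhynchus as specifiers yields only the two species and their common ancestor, while the branch-based definition anchored on Homo and excluding Passer also takes in Dimetrodon, because it extends down to the split from the lineage leading to Passer.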

So-called "taxon-based definitions" can be encountered in literature from the late 1980s and early 1990s. These are not definitions, but merely non-exhaustive lists of contents, and are therefore not unambiguously applicable to all phylogenetic hypotheses. Accordingly, they are no longer used.

Universality
As long as all their anchors share any common ancestor at any time in the present or past, node-based definitions (whether modified or not) always objectively refer to a clade, whether that clade has been correctly identified (by phylogenetics) or not. This also holds for branch- and apomorphy-based definitions that mention only one species or specimen that must belong to the clade they refer to.

Branch- and apomorphy-based definitions with more than one internal anchor (here A, B, and C etc.) cannot be applied to all imaginable phylogenetic hypotheses: for example, if B shares a more recent common ancestor with Z than with A, there are no "organisms that share a more recent common ancestor with A and B than with Z". Under phylogenetic hypotheses where such definitions cannot be applied, they "self-destruct" by not referring to any clade. They stay valid under other phylogenetic hypotheses.

It is also possible to deliberately restrict the applicability of phylogenetic definitions by qualifying clauses, as in this example: "the MRCA of A and B, and all its descendants, provided that B shares a more recent MRCA with A than with C". If B does share a more recent MRCA with C than with A, the definition "self-destructs".
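"Self-destruction" can be made concrete by evaluating one definition under two competing hypotheses. The sketch below is a minimal illustration under stated assumptions: the two three-taxon trees, the internal node names (ab, bz, root) and the function names are invented for the example, and a clade is represented simply by the name of its basal node.

```python
# Two competing toy hypotheses for the specifiers A, B and Z, stored as
# child -> parent maps. Internal node names are illustrative assumptions.
H1 = {"A": "ab", "B": "ab", "ab": "root", "Z": "root"}  # ((A, B), Z)
H2 = {"B": "bz", "Z": "bz", "bz": "root", "A": "root"}  # (A, (B, Z))


def lineage(parent, node):
    """Return the node followed by all its ancestors, ending at the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(parent, *specifiers):
    """Most recent common ancestor of the given specifiers."""
    common = set(lineage(parent, specifiers[0]))
    for s in specifiers[1:]:
        common &= set(lineage(parent, s))
    return next(n for n in lineage(parent, specifiers[0]) if n in common)


def branch_based(parent, internal, external):
    """Basal node of the largest clade that contains every internal
    specifier and no external one, or None if no such clade exists."""
    excluded = set()
    for z in external:
        excluded |= set(lineage(parent, z))
    anc = mrca(parent, *internal)
    if anc in excluded:
        # Any clade holding all internal specifiers also holds an external
        # one, so the definition refers to nothing under this hypothesis.
        return None
    stem = [n for n in lineage(parent, anc) if n not in excluded]
    return stem[-1]


def qualified_node_based(parent, a, b, c):
    """MRCA of a and b, provided that b shares a more recent common
    ancestor with a than with c; otherwise None."""
    ab, bc = mrca(parent, a, b), mrca(parent, b, c)
    # Both ancestors lie on b's lineage; the one met first is more recent.
    path = lineage(parent, b)
    return ab if path.index(ab) < path.index(bc) else None


print(branch_based(H1, ["A", "B"], ["Z"]))     # ab
print(branch_based(H2, ["A", "B"], ["Z"]))     # None
print(qualified_node_based(H1, "A", "B", "Z"))  # ab
print(qualified_node_based(H2, "A", "B", "Z"))  # None
```

Under the first hypothesis both definitions pick out the clade rooted at the common ancestor of A and B; under the second, where B is closer to Z than to A, both return nothing, while remaining applicable to any hypothesis in which their conditions are met.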

Phylogenetic definitions whose anchors are taxa that subsequently turn out to be para- or polyphyletic run the risk of "self-destructing" contrary to the author's intention. Therefore, the use of taxa larger than species as anchors has greatly decreased and will be forbidden under the PhyloCode.

Extensions
The definition types listed above all define clade names. Indeed, all users of phylogenetic nomenclature have so far advocated the principle that only clades (and, usually, species) should be named, and this is also all the PhyloCode will allow. However, it is possible to create phylogenetic definitions for the names of paraphyletic taxa. Assuming Mammalia and Aves are defined, Reptilia could be defined as "the MRCA of birds and mammals and all its descendants except birds and mammals". This includes taxa that are not currently named and even taxa that cannot be named under the existing codes without seriously disrupting existing classifications, such as "all organisms that share a more recent common ancestor with Homo sapiens than with birds and plesiomorphically keep laying eggs". Names of polyphyletic taxa could be defined by referring to the sum of two or more clades or paraphyletic taxa.
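The Reptilia example amounts to a set difference between one clade and two of its subclades. The following Python sketch shows this on a toy tree; the topology, the internal node names and the specifiers chosen for Mammalia and Aves are assumptions made for the illustration, not definitions taken from the literature.

```python
# Toy rooted tree stored as a child -> parent map. The topology and the
# internal node names are illustrative assumptions.
PARENT = {
    "n_synapsid": "n_amniote",
    "n_sauropsid": "n_amniote",
    "Dimetrodon": "n_synapsid",
    "n_mammal": "n_synapsid",
    "Ornithorhynchus": "n_mammal",
    "Homo": "n_mammal",
    "Lacerta": "n_sauropsid",
    "n_archosaur": "n_sauropsid",
    "Crocodylus": "n_archosaur",
    "n_bird": "n_archosaur",
    "Passer": "n_bird",
    "Struthio": "n_bird",
}


def lineage(node):
    """Return the node followed by all its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(*specifiers):
    """Most recent common ancestor of the given specifiers."""
    common = set(lineage(specifiers[0]))
    for s in specifiers[1:]:
        common &= set(lineage(s))
    return next(n for n in lineage(specifiers[0]) if n in common)


def clade(ancestor):
    """The ancestor and all its descendants (internal nodes included)."""
    nodes = set(PARENT) | set(PARENT.values())
    return {n for n in nodes if ancestor in lineage(n)}


# Node-based clade names, with specifiers assumed for this example only.
mammalia = clade(mrca("Homo", "Ornithorhynchus"))
aves = clade(mrca("Passer", "Struthio"))

# Paraphyletic taxon: "the MRCA of birds and mammals and all its
# descendants except birds and mammals".
reptilia = clade(mrca("Passer", "Homo")) - aves - mammalia

print(sorted(reptilia))
```

Read literally, this definition also places stem relatives of mammals such as Dimetrodon in Reptilia on this toy tree, which shows that such a definition is as unambiguous as a clade definition once a phylogenetic hypothesis is given.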

Definitions
Under the currently existing codes of biological nomenclature, taxon names have (implicit) definitions that consist of a type and a rank. For example, Hominidae is the family that includes the genus Homo. The term "family" is not defined; therefore, the size of this taxon remains entirely at the discretion of the individual systematist, even if everyone agrees on the phylogeny of the potential members of Hominidae. Indeed, the use of Hominidae has over the last few decades changed from including only Homo and Australopithecus to including the chimpanzees and gorillas, then also the orangutans, and occasionally the gibbons as well. There is no way to say that any of these usages is "right" or "wrong", because none of them corresponds to a testable scientific hypothesis; only aesthetic and utilitarian arguments can be made, and even the latter cannot be quantified or otherwise made objective. A growing number of biologists consider this situation unsatisfactory and feel that instability in nomenclature should only reflect instability of our knowledge of phylogeny, not instability in subjective opinions about which ranks should be given to which groups. Phylogenetic nomenclature, on the other hand, uses phylogenetic definitions (as explained above) to tie a name to a clade in such a manner that the meaning of the name is objective under any phylogenetic hypothesis, thus preventing splitting and lumping (unless definitions are changed in the process, which will not be allowed under the PhyloCode).

Lack of obligatory ranks
The current codes of biological nomenclature stipulate that taxa cannot be given a valid name without being given a rank; as mentioned above, the rank is part of the definition of every valid taxon name. However, the number of generally recognized ranks is limited; this means that many taxa must go unnamed because no ranks are available for them. Accordingly, the number of commonly used ranks has increased from the five to seven Linnaeus used (variety, species, genus, order, class; kingdom, arguably empire) to about twenty-one (by the addition of family, phylum/division, domain, and the prefixes super- and sub- and occasionally infra-; "empire" is not used, and "variety" is only used in botany). Given the million or more known species, this is still not enough; for example, Gauthier et al. (1988) showed that a classification which uses the common array of ranks, but includes Aves within Reptilia and keeps Reptilia at its traditional rank of class, is forced to demote Aves to the rank of genus, despite the ~12,000 known species of extant and extinct birds that would have to be incorporated into this one genus. To reduce this problem, Patterson and Rosen (1977) suggested nine new ranks between family and superfamily in order to be able to classify a clade of herrings, and McKenna and Bell (1997) introduced a large array of new ranks in order to cope with the diversity of Mammalia, to cite only the two most famous examples. None of these proposals has become widely applied, mostly because they contain too many ranks to remember easily, and because there are never enough ranks to name all clades on a sufficiently large phylogenetic tree. In practice, this means that some taxon names are used by biologists, but either not made official or not used in classifications because no rank is available for them. Examples include the "tricolpates" in botany (no rank available) and Amniota, Tetrapoda, and/or Gnathostomata in zoology (ranks for one or two are available, but not for the third; selections of which names are not acknowledged vary). Phylogenetic nomenclature does not require that names have ranks.

The limited number of ranks also discourages researchers in another way from naming taxa when they discover them. If a new name is introduced into a classification that already exhausts the available ranks, for example when two of three suborders of an order are recognized as a clade that might be interesting enough to be named, then either ranks must be shifted elsewhere in the classification (the order might be promoted to superorder, and the third suborder to order; or the two suborders might be demoted to superfamilies) or names must be removed from the classification because no rank is available for them any longer. By allowing unranked names, phylogenetic nomenclature circumvents this problem.

Furthermore, the current codes each have rules saying that names must have certain endings if they are applied to taxa that have certain ranks. When a taxon changes rank from one classification to another, its name must change its suffix. To return to the example of Hominidae, Ereshefsky (1997:512) stated: "The Linnaean rule of assigning rank-specific suffices [sic] gives rise to even more confusing cases. Simpson (1963, 29–30) and Wiley (1981, 238) agree that the genus Homo belongs to a particular taxon. They disagree, however, on that taxon's rank. Acting in accord with the Linnaean system, they attach different suffixes to the root Homini [actually Homin-] and give the taxon in question different names: Wiley calls it 'Hominini' [tribe rank] and Simpson calls it 'Hominidae' [family rank]. Their disagreement does not stop there. Wiley believes that the taxon just cited is a part of a more inclusive taxon which is a family. Using the root Homini, and following the rules of the Linnaean system [more precisely, the zoological code], he names the more inclusive taxon 'Hominidae.' So for Wiley and Simpson, the name 'Hominidae' refers to two different taxa. In brief, the Linnaean system causes Wiley and Simpson to assign different names to what they agree is the same taxon, and it causes them to give the same name to what they agree are different taxa." In phylogenetic nomenclature, ranks have no bearing on the spelling of taxon names (see also the PhyloCode).

Perhaps most importantly, ranks encourage the misperception that taxa of the same rank are equivalent in a meaningful way. Many authors have treated genera, families, or orders as countable, using a count of such taxa as a measure (or proxy) of biodiversity. But because the ranks are not defined, taxa at the same rank are not equivalent in any objective way, so counting them does not tell us anything about nature – at most, it reveals the personal taxonomic preferences held by the authors of such studies. Alternative measures of biodiversity exist, such as the Phylogenetic Diversity Index.

Ranks are, however, not altogether forbidden in phylogenetic nomenclature. They are merely decoupled from nomenclature: they do not influence which names can be used, which taxa are associated with which names, and which names can refer to nested taxa.

History
Most of the basic tenets of phylogenetic nomenclature (lack of obligatory ranks, and something close to phylogenetic definitions) can be traced to 1916. In that year Edwin Goodrich interpreted the name Sauropsida, erected 40 years earlier by T. H. Huxley, to include the birds (Aves) as well as part of Reptilia, and coined the new name Theropsida to include the mammals as well as another part of Reptilia. He did not give these names ranks, and he treated them exactly as if they had what would today be termed branch-based definitions: he used neither contents nor diagnostic characters to decide whether a given animal should belong to Theropsida, Sauropsida, or something else once its phylogenetic position was agreed upon. Goodrich also opined that the name Reptilia should be abandoned once the phylogeny of the reptiles was better known. The lack of compatibility of his scheme with the existing rank-based classifications (despite agreement on the phylogeny in all but details), and the lack of a method of phylogenetics at this time, are the most likely reasons why Goodrich's suggestions were largely ignored.

The principle that only clades (monophyletic taxa – an ancestor plus all its descendants) should be formally named became popular in the second half of the 20th century. It spread together with the methods for discovering clades (cladistics). At the same time, it became apparent that the obligatory ranks that are part of the traditional systems of nomenclature produced problems (see above under "Lack of obligatory ranks"). Some authors suggested abandoning them altogether, starting with Willi Hennig's abandonment of his earlier proposal to define ranks as geological age classes.

The origin of phylogenetic nomenclature can be dated to 1986, when Jacques Gauthier used phylogenetic definitions for the first time in a published work. Theoretical papers outlining the principles of phylogenetic nomenclature, as well as further publications containing applications of phylogenetic nomenclature (mostly to vertebrates), soon followed (see "Important literature" below). Since then, the number of publications using phylogenetic nomenclature has steadily increased. So has the number of parts of the tree of life to which phylogenetic nomenclature has been applied, although great disparities still remain, most notably a bias towards fossil vertebrates and extant flowering plants (for example, all workers on Mesozoic dinosaur phylogeny today use phylogenetic nomenclature, while no entomologists use it).

Gauthier and de Queiroz appealed to the International Commission on Zoological Nomenclature to have phylogenetic definitions included in zoological nomenclature. When this proposal was rejected, they and the botanist Philip Cantino started drafting their own code of nomenclature, the PhyloCode, for regulating phylogenetic nomenclature.

Important literature
Only a few seminal publications are cited here (in roughly chronological order from top to bottom) and in the references (below). An exhaustive list can be found on the website of the International Society for Phylogenetic Nomenclature.


 * Gauthier, Jacques (1986). Saurischian Monophyly and the Origin of Birds. Pages 1–55 in Kevin Padian (ed.): The Origin of Birds and the Evolution of Flight. Mem. Cal. Acad. Sci. 8.
 * Gauthier, Jacques A., Arnold G. Kluge, and Timothy Rowe (1988). The early evolution of the Amniota. Pages 103–155 in Michael J. Benton (ed.): The Phylogeny and Classification of the Tetrapods, Volume 1: Amphibians, Reptiles, Birds. Syst. Ass. Spec. Vol. 35A. Clarendon Press, Oxford.
 * Gauthier, Jacques, David Cannatella, Kevin de Queiroz, Arnold G. Kluge, and Timothy Rowe (1989). Tetrapod phylogeny. Pages 337–353 in B. Fernholm, K. Bremer, and H. Jörnvall (eds.): The Hierarchy of Life. Elsevier Science B. V. (Biomedical Division), New York.
 * de Queiroz, Kevin (1992). Phylogenetic definitions and taxonomic philosophy. Biol. Philos. 7:295–313.
 * de Queiroz, Kevin, and Jacques Gauthier (1992). Phylogenetic taxonomy. Annu. Rev. Ecol. Syst. 23:449–480.
 * Gauthier, Jacques A. (1994). The diversification of the amniotes. Pages 129–159 in Donald R. Prothero and Rainer M. Schoch (eds.): Major features of vertebrate evolution. Paleontological Society, Knoxville.
 * Sereno, Paul C. (1998). A rationale for phylogenetic definitions, with application to the higher-level taxonomy of Dinosauria. N. Jb. Geol. Paläont. Abh. 210:41–83.
 * Sereno, Paul C. (1999). Definitions in phylogenetic taxonomy: critique and rationale. Syst. Biol. 48:329–351.
 * Sereno, Paul C. (2005). The Logical Basis of Phylogenetic Taxonomy [sic]. Syst. Biol. 54:595–619.