DNA barcoding

DNA barcoding is a taxonomic method that uses a short genetic marker in an organism's mitochondrial DNA to identify it as belonging to a particular species. It is based on a relatively simple concept: most eukaryote cells contain mitochondria, and mitochondrial DNA (mtDNA) has a relatively fast mutation rate, which results in significant variance in mtDNA sequences between species and, in principle, a comparatively small variance within species. However, because all mtDNA genes are maternally inherited, any occurrences of hybridization, horizontal gene transfer (such as via cellular symbionts), or other "reticulate" evolutionary phenomena in a lineage can lead to misleading results (i.e., it is possible for two different species to share mtDNA, or for one species to have more than one mtDNA sequence exhibited among different individuals). A 648-bp region of the mitochondrial cytochrome c oxidase I (COI) gene was initially proposed as a potential 'barcode'.

Origin
The use of nucleotide sequence variations to investigate evolutionary relationships is not a new concept. Carl Woese used sequence differences in ribosomal RNA (rRNA) to discover archaebacteria, which in turn led to the redrawing of the evolutionary tree, and molecular markers (e.g., allozymes, rDNA, and mtDNA) have been successfully used in molecular systematics for decades. DNA barcoding provides a standardised method for this process via the use of a short DNA sequence from a particular region of the genome to provide a 'barcode' for identifying species. In 2003, Professor Paul D.N. Hebert from the University of Guelph, Ontario, Canada, proposed the compilation of a public library of DNA barcodes that would be linked to named specimens. This library would “provide a new master key for identifying species, one whose power will rise with increased taxon coverage and with faster, cheaper sequencing”.
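The idea of a reference library linked to named specimens can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length and uses the simple proportion of differing sites as the distance; the function names, voucher identifiers, and sequences are hypothetical and are not taken from any actual barcode database.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query: str, library: dict) -> tuple:
    """Return (species, distance) for the closest reference barcode.

    library maps a specimen id to (species name, aligned sequence).
    """
    species, seq = min(library.values(),
                       key=lambda rec: p_distance(query, rec[1]))
    return species, p_distance(query, seq)


# Hypothetical toy library: two named voucher specimens (20 bp, not real COI).
library = {
    "voucher_001": ("Species A", "ACGTACGTACGTACGTACGT"),
    "voucher_002": ("Species B", "ACGTTCGTACCTACGAACGT"),
}

query = "ACGTACGTACGTACGTACGA"  # differs from Species A at one site
match = identify(query, library)
print(match)
```

A closest match alone does not establish identity; how small the distance must be is a separate question of thresholds.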

Identification of Birds
In an effort to find a correspondence between traditional species boundaries established by taxonomy and those inferred by DNA barcoding, Hebert and co-workers sequenced DNA barcodes of 260 of the 667 bird species that breed in North America (Hebert et al. 2004a). They found that every one of the 260 species had a different COI sequence. 130 species were represented by two or more specimens; in all of these species, COI sequences were either identical or most similar to sequences of the same species. COI variation between species averaged 7.93%, whereas variation within species averaged 0.43%. In four cases there were deep intraspecific divergences, indicating possible new species. Three of these four polytypic species are already split into two by some taxonomists. Hebert et al.'s (2004a) results reinforce these views and strengthen the case for DNA barcoding. Hebert et al. also proposed a standard sequence threshold to define new species, set at 10 times the mean intraspecific variation for the group under study.
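The comparison of within-species and between-species variation, and the "10 times" threshold, can be sketched as follows. This is an illustration under stated assumptions, not the authors' actual analysis: the sequences are toy 20-bp strings, the distance is an uncorrected proportion of differing sites, and all function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def divergence_summary(specimens):
    """Return (mean within-species, mean between-species) distance.

    specimens is a list of (species name, aligned sequence) pairs.
    """
    within, between = [], []
    for (sp1, s1), (sp2, s2) in combinations(specimens, 2):
        (within if sp1 == sp2 else between).append(p_distance(s1, s2))

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(within), mean(between)


def species_threshold(mean_within: float, factor: float = 10.0) -> float:
    """Threshold set at `factor` times the mean intraspecific variation."""
    return factor * mean_within


# Toy data (assumed, not real COI sequences).
specimens = [
    ("sp_A", "ACGTACGTACGTACGTACGT"),
    ("sp_A", "ACGTACGTACGTACGTACGA"),
    ("sp_B", "ACGTTCGTACCTACGAACGT"),
    ("sp_B", "ACGTTCGTACCTACGAACGT"),
]

mean_within, mean_between = divergence_summary(specimens)
threshold = species_threshold(mean_within)
print(mean_within, mean_between, threshold)
```

Because the threshold is derived from the group's own intraspecific variation, applying it to a different group requires recomputing `mean_within` for that group.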

Delimiting Cryptic Species
The next major study into the efficacy of DNA barcoding focused on the neotropical skipper butterfly, Astraptes fulgerator, at the Area Conservacion de Guanacaste (ACG) in north-western Costa Rica. This species was already known as a cryptic species complex, due to subtle morphological differences as well as an unusually large variety of caterpillar food plants. However, several years would have been required for taxonomists to completely delimit species. Hebert et al. (2004b) sequenced the COI gene of 484 specimens from the ACG. This sample included “at least 20 individuals reared from each species of food plant, extremes and intermediates of adult and caterpillar color variation, and representatives” from the three major ecosystems where Astraptes fulgerator is found. Hebert et al. (2004b) concluded that Astraptes fulgerator consists of 10 different species in north-western Costa Rica. These results, however, were subsequently challenged by Brower (2006), who pointed out numerous serious flaws in the analysis and concluded that the original data could support no more than the possibility of three cryptic taxa rather than ten. This highlights that the results of DNA barcoding analyses can depend on the analytical methods chosen by the investigators, so the process of delimiting cryptic species using DNA barcodes can be as subjective as any other form of taxonomy.

A more recent example used DNA barcoding for the identification of cryptic species included in the ongoing long-term database of tropical caterpillar life generated by Dan Janzen and Winnie Hallwachs in Costa Rica at the ACG. In 2006 Smith et al. examined whether a COI DNA barcode could function as a tool for identification and discovery for the 20 morphospecies of Belvosia parasitoid flies (Tachinidae) that have been reared from caterpillars in ACG. Barcoding not only discriminated among all 17 highly host-specific morphospecies of ACG Belvosia, but it also suggested that the species count could be as high as 32 by indicating that each of the three generalist species might actually be arrays of highly host-specific cryptic species.

In 2007 Smith et al. expanded on these results by barcoding 2,134 flies belonging to what appeared to be the 16 most generalist of the ACG tachinid morphospecies. They encountered 73 mitochondrial lineages separated by an average of 4% sequence divergence. Because these lineages are supported by collateral ecological information and, where tested, by independent nuclear markers (28S and ITS1), the authors viewed them as provisional species. Each of the 16 initially apparent generalist species was categorized into one of four patterns: (i) a single generalist species, (ii) a pair of morphologically cryptic generalist species, (iii) a complex of specialist species plus a generalist, or (iv) a complex of specialists with no remaining generalist. In sum, there remained 9 generalist species classified among the 73 mitochondrial lineages analyzed.

However, also in 2007, Whitworth et al. reported that flies in the related family Calliphoridae could not be discriminated by barcoding. They investigated the performance of barcoding in the fly genus Protocalliphora, known to be infected with the endosymbiotic bacterium Wolbachia. Assignment of unknown individuals to species was impossible for 60% of the species, and if the technique had been applied, as in the previous study, to identify new species, it would have underestimated the species number in the genus by 75%. They attributed the failure of barcoding to the non-monophyly of many of the species at the mitochondrial level; in one case, individuals from four different species had identical barcodes. The authors went on to state: “The pattern of Wolbachia infection strongly suggests that the lack of within-species monophyly results from introgressive hybridization associated with Wolbachia infection. Given that Wolbachia is known to infect between 15 and 75% of insect species, we conclude that identification at the species level based on mitochondrial sequence might not be possible for many insects.”
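One simple symptom of the problem described above, identical barcodes carried by individuals of different species, can be checked with a short Python sketch. Exact sequence identity is only the most obvious case of mitochondrial non-monophyly, so this is a simplification rather than the analysis Whitworth et al. performed; the species labels and sequences are hypothetical.

```python
from collections import defaultdict


def shared_barcodes(specimens):
    """Map each barcode carried by more than one species to those species.

    specimens is a list of (species name, sequence) pairs.
    """
    species_by_seq = defaultdict(set)
    for species, seq in specimens:
        species_by_seq[seq].add(species)
    return {seq: sorted(spp)
            for seq, spp in species_by_seq.items() if len(spp) > 1}


def fraction_ambiguous(specimens):
    """Fraction of species sharing at least one barcode with another species."""
    all_species = {sp for sp, _ in specimens}
    ambiguous = {sp for spp in shared_barcodes(specimens).values()
                 for sp in spp}
    return len(ambiguous) / len(all_species)


# Toy data (assumed): sp_1 and sp_2 share an identical barcode.
specimens = [
    ("sp_1", "ACGTACGTAC"),
    ("sp_2", "ACGTACGTAC"),
    ("sp_3", "ACGTTCGTAC"),
    ("sp_4", "TCGTACGAAC"),
]

shared = shared_barcodes(specimens)
ambiguous_fraction = fraction_ambiguous(specimens)
print(shared, ambiguous_fraction)
```

A species flagged here cannot be assigned from the barcode alone, whatever distance threshold is used.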

Identifying Flowering Plants
Kress et al. (2005) suggest that the use of the COI sequence “is not appropriate for most species of plants because of a much slower rate of cytochrome c oxidase I gene evolution in higher plants than in animals”. A series of experiments was then conducted to find a more suitable region of the genome for use in the DNA barcoding of flowering plants. Three criteria were set for the appropriate genetic loci:

 * 1) Significant species-level genetic variability and divergence,
 * 2) An appropriately short sequence length so as to facilitate DNA extraction and amplification, and
 * 3) The presence of conserved flanking sites for developing universal primers.

At the conclusion of these experiments, Kress et al. (2005) proposed the nuclear internal transcribed spacer region and the plastid trnH-psbA intergenic spacer as a potential DNA barcode for flowering plants. These results suggest that DNA barcoding, rather than being a 'master key', may be a 'master keyring', with different kingdoms of life requiring different keys.

Cataloguing Ancient Life
Lambert et al. (2005) examined the possibility of using DNA barcoding to assess the past diversity of the earth's biota. The COI gene of a group of extinct ratite birds, the moa, was sequenced using 26 subfossil moa bones. As with Hebert's results, each species sequenced had a unique barcode, and intraspecific COI sequence variance ranged from 0 to 1.24%. To determine new species, a standard sequence threshold of 2.7% COI sequence difference was set. This value is 10 times the average intraspecies difference of North American birds, which is inconsistent with Hebert's recommendation that the threshold value be based on the group under study. Using this value, the group detected six moa species. A further standard sequence threshold of 1.24% was also used. This value resulted in 10 moa species, which corresponded with the previously known species with one exception. This exception suggested a possible, previously unidentified complex of species. Given the slow rate of growth and reproduction of moa, it is probable that the interspecies variation is rather low. On the other hand, there is no set value of molecular difference at which populations can be assumed to have irrevocably started to undergo speciation. It is safe to say, however, that the 2.7% COI sequence difference initially used was far too high.
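The dependence of the species count on the chosen threshold can be illustrated with a small Python sketch that groups specimens by single linkage: any two specimens closer than the threshold end up in the same provisional species. The grouping method, specimen names, and distances are assumptions made for illustration and are not Lambert et al.'s procedure or data; only the two threshold values (2.7% and 1.24%) come from the text.

```python
def cluster_by_threshold(names, dist, threshold):
    """Group specimens by single linkage below a distance threshold.

    dist maps (name_a, name_b) pairs to a pairwise distance (proportion).
    Returns a sorted list of sorted groups.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise distances between four specimens.
names = ["m1", "m2", "m3", "m4"]
dist = {
    ("m1", "m2"): 0.005,
    ("m1", "m3"): 0.020,
    ("m2", "m3"): 0.018,
    ("m1", "m4"): 0.060,
    ("m2", "m4"): 0.058,
    ("m3", "m4"): 0.055,
}

coarse = cluster_by_threshold(names, dist, 0.027)   # 2.7% threshold
fine = cluster_by_threshold(names, dist, 0.0124)    # 1.24% threshold
print(len(coarse), len(fine))
```

With these toy distances the lower threshold splits off one additional group, mirroring how the same data can yield different species counts.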

Criticisms
DNA barcoding has met with spirited reaction from scientists, especially systematists, ranging from enthusiastic endorsement to vociferous opposition. For example, many stress that DNA barcoding does not provide reliable information above the species level, while others indicate that it is inapplicable at the species level but may still have merit for higher-level groups. Others resent what they see as a gross oversimplification of the science of taxonomy. More practically, some suggest that recently diverged species might not be distinguishable on the basis of their COI sequences. Regarding the latter, exploratory studies have shown that about 96% of eukaryotic species surveyed can be detected with barcoding, though most of these would also be resolvable with traditional means. The remaining 4% do, however, pose problems that can lead to unacceptably high error rates (up to 31% false attributions) when relying on DNA barcoding alone. Funk & Omland (2003) found that, due to various phenomena, some 23% of animal species are polyphyletic if their mtDNA data are accurate, indicating that using an mtDNA barcode to assign a species name to an animal will be ambiguous or erroneous some 23% of the time (see also Meyer & Paulay, 2005). Studies with insects suggest an equal or even greater error rate, due to the frequent lack of correlation between the mitochondrial genome and the nuclear genome (e.g., Hurst and Jiggins, 2005; Whitworth et al., 2007). Given that insects represent over 75% of all known organisms, this suggests that while barcoding may work for vertebrates, it may not be effective for the majority of known organisms.

Moritz and Cicero (2004) have questioned the efficacy of DNA barcoding by suggesting that other avian data are inconsistent with Hebert et al.'s interpretation, namely Johnson and Cicero's (2004) finding that 74% of sister species comparisons fall below the 2.7% threshold suggested by Hebert et al. These criticisms are somewhat misleading considering that, of the 39 species comparisons reported by Johnson and Cicero, only 8 actually use COI data to arrive at their conclusions. Johnson and Cicero (2004) have also claimed to have detected species with identical DNA barcodes; however, these 'barcodes' refer to an unpublished 723-bp sequence of ND6, which has never been suggested as a likely candidate for DNA barcoding.

The DNA barcoding debate resembles the phenetics debate of decades gone by. It remains to be seen whether what is now touted as a revolution in taxonomy will eventually go the same way as phenetic approaches, for which exactly the same claim was made decades ago, but which were all but rejected when they failed to live up to overblown expectations. Controversy surrounding DNA barcoding stems not so much from the method itself as from extravagant claims that it will supersede or radically transform traditional taxonomy. Other critics fear that a "big science" initiative like barcoding will make funding even scarcer for already underfunded disciplines like taxonomy, but barcoders respond that they compete for funding not with fields like taxonomy but with other big science fields, such as medicine and genomics.

The current trend appears to be that DNA barcoding needs to be used alongside traditional taxonomic tools and alternative forms of molecular systematics so that problem cases can be identified and errors detected. Non-cryptic species can generally be resolved by either traditional or molecular taxonomy without ambiguity. However, more difficult cases will only yield to a combination of approaches. Finally, as most of the global biodiversity remains unknown, molecular barcoding can only hint at the existence of new taxa, but not delimit or describe them (DeSalle, 2006; Rubinoff, 2006).