

Molecular barcoding is a molecular identification technique. It is based on comparing selected DNA sequences from the sample to be identified with those in a reference database.

Identification is a fundamental part of biology. It focusses on linking a specimen or sample to a name, in order to communicate more precisely the subject of the study. The identification can be based on morphological or molecular characteristics of the sample. These characteristics are compared to those listed for described species in identification keys or databases, or can even be compared directly to those present in specimens stored as vouchers in collections.

Barcoding can be performed as long as the sample or specimen contains DNA of relatively good quality, even if there is not much left: dried, cooked or mixed samples can yield DNA. Although even older samples can sometimes give good results, DNA denaturation and destruction set the limit; formalin-treated specimens, for instance, are very rarely usable.

The process can be represented by a flowchart:


The sequence of the barcode marker for the specimen is compared to the sequences from identified specimens listed in the online database. The chosen barcode marker for most animal species is a partial sequence of the mitochondrial gene coding for cytochrome oxidase I (COI), a respiratory enzyme of the mitochondrion. However, the identification is reliable only if sequences from individuals that belong to the same species are more similar to each other than to sequences from other species. If this is the case, and if sequences from the species the specimen belongs to are present in the database, the identification will successfully yield a genus and species name.
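The comparison step can be illustrated with a minimal sketch. It assumes the sequences are already aligned to the same length, uses the simple proportion of differing sites (p-distance) as the similarity measure, and accepts a match only below an arbitrary 3% cut-off chosen for illustration. The species names, sequences, function names and threshold are all invented for the example; real barcode databases use their own search and assignment methods.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences have an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def identify(query, reference, max_distance=0.03):
    """Return (species, distance) for the closest reference sequence.

    The species is None when even the closest sequence is further away
    than max_distance (an illustrative cut-off, not a standard value).
    """
    best_species, best_distance = None, float("inf")
    for species, sequences in reference.items():
        for seq in sequences:
            d = p_distance(query, seq)
            if d < best_distance:
                best_species, best_distance = species, d
    if best_distance > max_distance:
        return None, best_distance
    return best_species, best_distance


# Toy reference "database": hypothetical species, made-up 20-base sequences.
reference = {
    "Genus alpha": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
    "Genus beta": ["ACGTTCGTACCTACGTAGGT"],
}

match = identify("ACGTACGTACGTACGTACGT", reference)
no_match = identify("TTTTTTTTTTTTTTTTTTTT", reference)
print(match)
print(no_match)
```

The second query shows the failure mode described above: when nothing in the reference set is close enough, the sketch returns no name rather than the nearest, possibly wrong, species.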

Molecular identification was already in use years before Hebert et al. proposed molecular barcoding in 2003. However, barcoding is particularly interesting as a project because:

  • The limited number of molecular markers standardises the data and makes them largely comparable across taxonomic groups. Thus, a specimen that cannot easily be assigned to a group can still be identified. At first, the chosen marker was a part of the mitochondrial cytochrome oxidase I gene. However, this choice was reassessed, as this gene is not informative for some groups (green plants, fungi...).
  • A dedicated database is available. It links the sequences for a species to the voucher specimens they come from. This is a great step forward: in other sequence databases such as GenBank, the link to the specimen is most often missing, so the specimen cannot be re-examined in case of doubt.

The Barcode of Life project aims to gather and explore the necessary technologies, as well as the reference dataset for fast molecular identification of hopefully all living organisms.

Problems and criticisms

At first, the project was presented as a revolutionary development, with very far-reaching conclusions. It was therefore widely criticized over the next few years, on two main grounds. First, part of the taxonomic community perceived the proposal as a direct attack on their work. Second, subsequent studies revealed numerous exceptions and problems that had not been identified at first. Most barcoding proponents have taken these criticisms into account in their more recent studies, and are more careful with their conclusions.

Barcoding has several limitations:

  • Mitochondrial markers are inherited only from the maternal side in most groups. A mitochondrial marker will therefore not reflect gene flow, and might give an incomplete or false picture of species delineations. However, this is true of any study based on a single marker.
  • Mitochondrial markers can have nuclear copies in some species. These copies can bias the results if they are sequenced in place of the mitochondrial gene.
  • The divergence level between specimens from the same species is not always clearly different from the divergence level between specimens from different species, especially for closely related species. Therefore, a specimen can be wrongly attributed to a closely related species, and the error will go undetected if some species are missing from the database.
  • Identification efficiency is highly dependent on the completeness of the database: if there is no sequence for the species or for closely related species, the identification will fail, even for higher taxonomic levels.
  • The methods used for assignment and tree reconstruction in the system are highly controversial with regard to their efficiency.
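The point about divergence levels can be checked on a reference set by comparing the largest within-species distance with the smallest between-species distance. The sketch below is only an illustration under simple assumptions: aligned sequences of equal length, uncorrected p-distances, and invented species names and sequences. The function names are hypothetical and do not correspond to any barcoding tool.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper())) / len(a)


def divergence_summary(reference):
    """Return (max within-species distance, min between-species distance)."""
    labelled = [(sp, seq) for sp, seqs in reference.items() for seq in seqs]
    within, between = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        d = p_distance(s1, s2)
        (within if sp1 == sp2 else between).append(d)
    if not within or not between:
        raise ValueError("need at least two species, one with two sequences")
    return max(within), min(between)


# Toy data: hypothetical species with made-up 20-base sequences.
reference = {
    "Genus alpha": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
    "Genus beta": ["ACGTTCGTACCTACGTAGGT", "ACGTTCGTACCTACGTAGGA"],
}

max_within, min_between = divergence_summary(reference)
# Identification by similarity is only safe when within-species
# divergence stays below between-species divergence.
has_gap = max_within < min_between
print(max_within, min_between, has_gap)
```

In this toy set the two ranges do not overlap, so a nearest-match identification would be unambiguous. With closely related species the ranges can overlap, which is the situation described in the third point above.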

What are the uses of molecular barcoding?

Unexpected DNA barcoding results can reveal problems in our knowledge of current species delineations, and can therefore lead to the discovery and description of new species. But molecular barcoding is not sufficient on its own to resolve dubious or complex cases. Complementary morphological data, or additional molecular data and analyses, are necessary.

Identification is a problem for a number of groups: lack of taxonomists, difficult identifications... In many studies in biology outside systematics, specimens are not correctly identified (Bely & Weisblat 2006, Bortolus 2008). This can lead to problems in the interpretation and comparison of data and publications. Help for identification remains a priority.

Lastly, some types of samples (larvae, eggs, parts) are not readily identifiable from morphological characteristics. Quite often, the samples that most need to be identified for practical reasons (customs or fraud squad samples...) fall into this category. Molecular identification is often the only possibility in these cases.