Figures of merit and statistics for detecting faulty species identification with DNA barcodes: A case study in Ramaria and related fungal genera

DNA barcoding can identify biological species and provides an important tool in diverse applications, such as conserving species and identifying pathogens, among many others. If combined with statistical tests, DNA barcoding can focus taxonomic scrutiny onto anomalous species identifications based o...

Bibliographic Details
Main authors: Martín, María P., Daniëls, Pablo P., Erickson, David, Spouge, John L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7437900/
https://www.ncbi.nlm.nih.gov/pubmed/32813726
http://dx.doi.org/10.1371/journal.pone.0237507
_version_ 1783572711561232384
author Martín, María P.
Daniëls, Pablo P.
Erickson, David
Spouge, John L.
collection PubMed
description DNA barcoding can identify biological species and provides an important tool in diverse applications, such as conserving species and identifying pathogens, among many others. If combined with statistical tests, DNA barcoding can focus taxonomic scrutiny onto anomalous species identifications based on morphological features. Accordingly, we put nonparametric tests into a taxonomic context to answer questions about our sequence dataset of the formal fungal barcode, the nuclear ribosomal internal transcribed spacer (ITS). For example, does DNA barcoding concur with annotated species identifications significantly better if expert taxonomists produced the annotations? Does species assignment improve significantly if sequences are restricted to lengths greater than 500 bp? Both questions require a figure of merit to measure the accuracy of species identification, typically provided by the probability of correct identification (PCI). Many articles on DNA barcoding use variants of PCI to measure the accuracy of species identification but do not name the variants, and the absence of explicit names hinders the recognition that the different variants are not comparable from study to study. We name four PCI variants and show that, for fixed data, they follow systematic inequalities. Despite custom, therefore, comparing them is problematic at best. Some popular PCI variants are particularly vulnerable to errors in species annotation, insensitive to improvements in a barcoding pipeline, and unable to predict identification accuracy as a database grows, making them unsuitable for many purposes. Generally, the Fractional PCI has the best properties as a figure of merit for species identification. The fungal genus Ramaria provides unusual taxonomic difficulties. As a case study, it shows that a good taxonomic background can be combined with the pertinent summary statistics of molecular results to improve the identification of doubtful samples, linking both disciplines synergistically.
format Online
Article
Text
id pubmed-7437900
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7437900 2020-08-26 Figures of merit and statistics for detecting faulty species identification with DNA barcodes: A case study in Ramaria and related fungal genera Martín, María P. Daniëls, Pablo P. Erickson, David Spouge, John L. PLoS One Research Article DNA barcoding can identify biological species and provides an important tool in diverse applications, such as conserving species and identifying pathogens, among many others. If combined with statistical tests, DNA barcoding can focus taxonomic scrutiny onto anomalous species identifications based on morphological features. Accordingly, we put nonparametric tests into a taxonomic context to answer questions about our sequence dataset of the formal fungal barcode, the nuclear ribosomal internal transcribed spacer (ITS). For example, does DNA barcoding concur with annotated species identifications significantly better if expert taxonomists produced the annotations? Does species assignment improve significantly if sequences are restricted to lengths greater than 500 bp? Both questions require a figure of merit to measure the accuracy of species identification, typically provided by the probability of correct identification (PCI). Many articles on DNA barcoding use variants of PCI to measure the accuracy of species identification but do not name the variants, and the absence of explicit names hinders the recognition that the different variants are not comparable from study to study. We name four PCI variants and show that, for fixed data, they follow systematic inequalities. Despite custom, therefore, comparing them is problematic at best. Some popular PCI variants are particularly vulnerable to errors in species annotation, insensitive to improvements in a barcoding pipeline, and unable to predict identification accuracy as a database grows, making them unsuitable for many purposes. Generally, the Fractional PCI has the best properties as a figure of merit for species identification. The fungal genus Ramaria provides unusual taxonomic difficulties. As a case study, it shows that a good taxonomic background can be combined with the pertinent summary statistics of molecular results to improve the identification of doubtful samples, linking both disciplines synergistically. Public Library of Science 2020-08-19 /pmc/articles/PMC7437900/ /pubmed/32813726 http://dx.doi.org/10.1371/journal.pone.0237507 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
spellingShingle Research Article
Martín, María P.
Daniëls, Pablo P.
Erickson, David
Spouge, John L.
Figures of merit and statistics for detecting faulty species identification with DNA barcodes: A case study in Ramaria and related fungal genera
title Figures of merit and statistics for detecting faulty species identification with DNA barcodes: A case study in Ramaria and related fungal genera
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7437900/
https://www.ncbi.nlm.nih.gov/pubmed/32813726
http://dx.doi.org/10.1371/journal.pone.0237507