
Drastic reduction of false positive species in samples of insects by intersecting the default output of two popular metagenomic classifiers

Bibliographic details
Main authors: Garrido-Sanz, Lidia; Senar, Miquel Àngel; Piñol, Josep
Format: Online article, text
Language: English
Published: Public Library of Science, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9595558/
https://www.ncbi.nlm.nih.gov/pubmed/36282811
http://dx.doi.org/10.1371/journal.pone.0275790
author Garrido-Sanz, Lidia
Senar, Miquel Àngel
Piñol, Josep
collection PubMed
description The use of high-throughput sequencing to recover short DNA reads of many species has been widely applied in biodiversity studies, either as amplicon metabarcoding or as shotgun metagenomics. These reads are assigned to taxa using classifiers. However, for different reasons, the results often contain many false positives. Here we focus on reducing the false positive species attributable to the classifiers. We benchmarked two popular classifiers, BLASTn followed by MEGAN6 (BM) and Kraken2 (K2), on shotgun-sequenced artificial single-species samples of insects. To reduce the number of misclassified reads, we combined the output of the two classifiers in two different ways: (1) by keeping only the reads that both classifiers attributed to the same species (intersection approach); and (2) by keeping the reads assigned to some species by either classifier (union approach). In addition, we applied an analytical detection limit to further reduce the number of false positive species. As expected, both metagenomic classifiers used with default parameters generated an unacceptably high number of misidentified species (tens with BM, hundreds with K2). The false positive species were not necessarily phylogenetically close to the true species, as some of them belonged to different orders of insects. The union approach failed to reduce the number of false positives, but the intersection approach eliminated most of them. Adding an analytical detection limit of 0.001 further reduced the number to ca. 0.5 false positive species per sample. The misidentification of species by most classifiers undermines confidence in DNA-based methods for assessing the biodiversity of biological samples. Our approach to alleviating the problem is straightforward and significantly reduced the number of reported false positive species.
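The intersection and union approaches and the detection limit described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes each classifier's output has already been parsed into a mapping from read ID to species name (parsing BLASTn/MEGAN6 or Kraken2 output files is not shown). The function names, the rule for reads on which the two classifiers disagree in the union, and the denominator used for the 0.001 detection limit are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical per-read species calls: read ID -> species name.
# Reads that a classifier left unclassified (or assigned above species
# rank) are simply absent from that classifier's dict.
Calls = dict[str, str]


def intersection_counts(bm: Calls, k2: Calls) -> Counter:
    """Count only reads that both classifiers assigned to the same species."""
    return Counter(sp for read, sp in bm.items() if k2.get(read) == sp)


def union_counts(bm: Calls, k2: Calls) -> Counter:
    """Count reads assigned to some species by either classifier.

    Assumption: when the classifiers disagree on a read, the read is
    counted once for each of the two species.
    """
    counts: Counter = Counter()
    for read in bm.keys() | k2.keys():
        counts.update({calls[read] for calls in (bm, k2) if read in calls})
    return counts


def apply_detection_limit(counts: Counter, limit: float = 0.001) -> dict[str, int]:
    """Drop species whose share of the counted reads is below `limit`.

    Assumption: the share is taken relative to all reads counted after
    combining the two classifiers.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sp: n for sp, n in counts.items() if n / total >= limit}


# Toy example (invented data, not from the study).
bm = {"r1": "Apis mellifera", "r2": "Apis mellifera",
      "r3": "Bombus terrestris", "r4": "Apis mellifera"}
k2 = {"r1": "Apis mellifera", "r2": "Apis cerana",
      "r3": "Bombus terrestris", "r5": "Vespa crabro"}

print(sorted(intersection_counts(bm, k2)))  # ['Apis mellifera', 'Bombus terrestris']
print(sorted(union_counts(bm, k2)))         # all four species
print(apply_detection_limit(Counter({"Apis mellifera": 5000, "Apis cerana": 3})))
# {'Apis mellifera': 5000}
```

On the toy data the union keeps every species either classifier reported, including those from disagreeing or single-classifier reads, while the intersection keeps only the two species with concordant reads. The detection limit then removes a species represented by 3 of 5003 reads, since its share (about 0.0006) falls below 0.001.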
format Online
Article
Text
id pubmed-9595558
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9595558 2022-10-26 PLoS One, Research Article. Published by Public Library of Science, 2022-10-25. © 2022 Garrido-Sanz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Drastic reduction of false positive species in samples of insects by intersecting the default output of two popular metagenomic classifiers
topic Research Article