Simple Matching Using QIIME 2 and RDP Reveals Misidentified Sequences and an Underrepresentation of Fungi in Reference Datasets

Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an e...

Bibliographic Details
Main authors: Eldred, Lauren E., Thorn, R. Greg, Smith, David Roy
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662557/
https://www.ncbi.nlm.nih.gov/pubmed/34899856
http://dx.doi.org/10.3389/fgene.2021.768473
_version_ 1784613462303834112
author Eldred, Lauren E.
Thorn, R. Greg
Smith, David Roy
collection PubMed
description Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an environmental metabarcoding dataset of partial large subunit rRNA sequences of Basidiomycota and basal fungi. We employed the simple matching method using the QIIME 2 classifier and the RDP Classifier in conjunction with the latest releases of the SILVA (138.1, 2020) and RDP (11, 2014) reference datasets and then compared the results with a manual phylogenetic binning approach. Of the 71 query sequences tested, 21 and 42% were misidentified using QIIME 2 and the RDP Classifier, respectively. Of these simple matching misidentifications, more than half resulted from the underrepresentation of various groups of fungi in the SILVA and RDP reference datasets. More comprehensive reference datasets with fewer misidentified sequences will increase the accuracy of simple matching identifications. However, we argue that the phylogenetic binning approach is a better alternative to simple matching since, in addition to better accuracy, it provides evolutionary information about query sequences.
format Online
Article
Text
id pubmed-8662557
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8662557 2021-12-11 Front Genet Genetics Frontiers Media S.A. 2021-11-26 /pmc/articles/PMC8662557/ /pubmed/34899856 http://dx.doi.org/10.3389/fgene.2021.768473 Text en
Copyright © 2021 Eldred, Thorn and Smith. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Simple Matching Using QIIME 2 and RDP Reveals Misidentified Sequences and an Underrepresentation of Fungi in Reference Datasets
topic Genetics