Simple Matching Using QIIME 2 and RDP Reveals Misidentified Sequences and an Underrepresentation of Fungi in Reference Datasets
Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an e...
Main authors: | Eldred, Lauren E.; Thorn, R. Greg; Smith, David Roy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662557/ https://www.ncbi.nlm.nih.gov/pubmed/34899856 http://dx.doi.org/10.3389/fgene.2021.768473 |
_version_ | 1784613462303834112 |
---|---|
author | Eldred, Lauren E.; Thorn, R. Greg; Smith, David Roy |
author_facet | Eldred, Lauren E.; Thorn, R. Greg; Smith, David Roy |
author_sort | Eldred, Lauren E. |
collection | PubMed |
description | Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an environmental metabarcoding dataset of partial large subunit rRNA sequences of Basidiomycota and basal fungi. We employed the simple matching method using the QIIME 2 classifier and the RDP Classifier in conjunction with the latest releases of the SILVA (138.1, 2020) and RDP (11, 2014) reference datasets and then compared the results with a manual phylogenetic binning approach. Of the 71 query sequences tested, 21 and 42% were misidentified using QIIME 2 and the RDP Classifier, respectively. Of these simple matching misidentifications, more than half resulted from the underrepresentation of various groups of fungi in the SILVA and RDP reference datasets. More comprehensive reference datasets with fewer misidentified sequences will increase the accuracy of simple matching identifications. However, we argue that the phylogenetic binning approach is a better alternative to simple matching since, in addition to better accuracy, it provides evolutionary information about query sequences. |
format | Online Article Text |
id | pubmed-8662557 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8662557 2021-12-11 Simple Matching Using QIIME 2 and RDP Reveals Misidentified Sequences and an Underrepresentation of Fungi in Reference Datasets Eldred, Lauren E. Thorn, R. Greg Smith, David Roy Front Genet Genetics Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an environmental metabarcoding dataset of partial large subunit rRNA sequences of Basidiomycota and basal fungi. We employed the simple matching method using the QIIME 2 classifier and the RDP Classifier in conjunction with the latest releases of the SILVA (138.1, 2020) and RDP (11, 2014) reference datasets and then compared the results with a manual phylogenetic binning approach. Of the 71 query sequences tested, 21 and 42% were misidentified using QIIME 2 and the RDP Classifier, respectively. Of these simple matching misidentifications, more than half resulted from the underrepresentation of various groups of fungi in the SILVA and RDP reference datasets. More comprehensive reference datasets with fewer misidentified sequences will increase the accuracy of simple matching identifications. However, we argue that the phylogenetic binning approach is a better alternative to simple matching since, in addition to better accuracy, it provides evolutionary information about query sequences. Frontiers Media S.A. 2021-11-26 /pmc/articles/PMC8662557/ /pubmed/34899856 http://dx.doi.org/10.3389/fgene.2021.768473 Text en Copyright © 2021 Eldred, Thorn and Smith. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Eldred, Lauren E. Thorn, R. Greg Smith, David Roy Simple Matching Using QIIME 2 and RDP Reveals Misidentified Sequences and an Underrepresentation of Fungi in Reference Datasets |
title | Simple Matching Using QIIME 2 and RDP Reveals Misidentified Sequences and an Underrepresentation of Fungi in Reference Datasets |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662557/ https://www.ncbi.nlm.nih.gov/pubmed/34899856 http://dx.doi.org/10.3389/fgene.2021.768473 |