16S-ITGDB: An Integrated Database for Improving Species Classification of Prokaryotic 16S Ribosomal RNA Sequences
Main authors: | Hsieh, Yu-Peng; Hung, Yuan-Mao; Tsai, Mong-Hsun; Lai, Liang-Chuan; Chuang, Eric Y. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580931/ https://www.ncbi.nlm.nih.gov/pubmed/36304264 http://dx.doi.org/10.3389/fbinf.2022.905489 |
Field | Value |
---|---|
author | Hsieh, Yu-Peng; Hung, Yuan-Mao; Tsai, Mong-Hsun; Lai, Liang-Chuan; Chuang, Eric Y. |
collection | PubMed |
description | Analyzing 16S ribosomal RNA (rRNA) sequences allows researchers to elucidate the prokaryotic composition of an environment. In recent years, third-generation sequencing technology has given researchers the opportunity to perform full-length sequence analysis of bacterial 16S rRNA. RDP, SILVA, and Greengenes are the most widely used 16S rRNA databases, and many 16S rRNA classifiers have used them as references for taxonomic assignment. However, some prokaryotic taxonomies exist in only one of the three databases. Furthermore, Greengenes and SILVA include a considerable number of taxonomies that are not resolved to the species level, which has limited the classifiers’ performance. To improve the accuracy of species-level taxonomic assignment for full-length 16S rRNA sequences, we manually curated the three databases and removed the sequences that did not have a species name. We then established a taxonomy-based integrated database by considering both taxonomies and sequences from all three 16S rRNA databases, and validated it using a mock community. Results showed that our taxonomy-based integrated database improved taxonomic resolution at the species level. The integrated database and the related datasets are available at https://github.com/yphsieh/ItgDB. |
format | Online Article Text |
id | pubmed-9580931 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
journal | Front Bioinform |
publication date | 2022-08-03 |
rights | Copyright © 2022 Hsieh, Hung, Tsai, Lai and Chuang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | 16S-ITGDB: An Integrated Database for Improving Species Classification of Prokaryotic 16S Ribosomal RNA Sequences |
topic | Bioinformatics |