16S-ITGDB: An Integrated Database for Improving Species Classification of Prokaryotic 16S Ribosomal RNA Sequences

Analyzing 16S ribosomal RNA (rRNA) sequences allows researchers to elucidate the prokaryotic composition of an environment. In recent years, third-generation sequencing technology has provided opportunities for researchers to perform full-length sequence analysis of bacterial 16S rRNA. RDP, SILVA, a...

Bibliographic Details
Main Authors: Hsieh, Yu-Peng; Hung, Yuan-Mao; Tsai, Mong-Hsun; Lai, Liang-Chuan; Chuang, Eric Y.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580931/
https://www.ncbi.nlm.nih.gov/pubmed/36304264
http://dx.doi.org/10.3389/fbinf.2022.905489
author Hsieh, Yu-Peng
Hung, Yuan-Mao
Tsai, Mong-Hsun
Lai, Liang-Chuan
Chuang, Eric Y.
collection PubMed
description Analyzing 16S ribosomal RNA (rRNA) sequences allows researchers to elucidate the prokaryotic composition of an environment. In recent years, third-generation sequencing technology has provided opportunities for researchers to perform full-length sequence analysis of bacterial 16S rRNA. RDP, SILVA, and Greengenes are the most widely used 16S rRNA databases. Many 16S rRNA classifiers have used these databases as a reference for taxonomic assignment tasks. However, some of the prokaryotic taxonomies only exist in one of the three databases. Furthermore, Greengenes and SILVA include a considerable number of taxonomies that do not have the resolution to the species level, which has limited the classifiers’ performance. In order to improve the accuracy of taxonomic assignment at the species level for full-length 16S rRNA sequences, we manually curated the three databases and removed the sequences that did not have a species name. We then established a taxonomy-based integrated database by considering both taxonomies and sequences from all three 16S rRNA databases and validated it by a mock community. Results showed that our taxonomy-based integrated database had improved taxonomic resolution to the species level. The integrated database and the related datasets are available at https://github.com/yphsieh/ItgDB.
format Online
Article
Text
id pubmed-9580931
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9580931 2022-10-26 Front Bioinform Bioinformatics Frontiers Media S.A. 2022-08-03 /pmc/articles/PMC9580931/ /pubmed/36304264 http://dx.doi.org/10.3389/fbinf.2022.905489 Text en Copyright © 2022 Hsieh, Hung, Tsai, Lai and Chuang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title 16S-ITGDB: An Integrated Database for Improving Species Classification of Prokaryotic 16S Ribosomal RNA Sequences
topic Bioinformatics
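
The description field in this record outlines two steps: dropping sequences that lack a species name, then pooling taxonomies and sequences from RDP, SILVA, and Greengenes. The Python sketch below illustrates that idea only and is not the authors' pipeline. It assumes seven-rank, semicolon-delimited lineage strings, treats an empty species field or a placeholder word as "no species name", and merges identical sequences by exact string match. The names `PLACEHOLDERS`, `species_of`, `has_species_name`, and `integrate` are hypothetical, and the lineages and sequences are toy data.

```python
from collections import defaultdict

# Hypothetical markers for "no species name"; the real curation rules may differ.
PLACEHOLDERS = ("uncultured", "unidentified", "unclassified", "metagenome")


def species_of(lineage):
    """Return the species field of a 7-rank, semicolon-delimited lineage, or ''."""
    ranks = [rank.strip() for rank in lineage.split(";")]
    if len(ranks) < 7:
        return ""
    species = ranks[6]
    # Drop a Greengenes-style rank prefix such as "s__" if present.
    if species.startswith("s__"):
        species = species[3:]
    return species.strip()


def has_species_name(lineage):
    """True when the lineage ends in a usable (non-placeholder) species name."""
    species = species_of(lineage)
    if not species:
        return False
    lowered = species.lower()
    return not any(word in lowered for word in PLACEHOLDERS)


def integrate(databases):
    """Pool species-level records from several databases.

    `databases` maps a database label to a list of (lineage, sequence) pairs.
    Returns {species: {sequence: sorted list of source labels}}.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for label, records in databases.items():
        for lineage, sequence in records:
            if has_species_name(lineage):
                merged[species_of(lineage)][sequence.upper()].add(label)
    return {
        species: {seq: sorted(sources) for seq, sources in seqs.items()}
        for species, seqs in merged.items()
    }


# Toy input: lineages and sequences are invented for illustration.
ECOLI = ("Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;"
         "Enterobacteriaceae;Escherichia;Escherichia coli")
toy = {
    "RDP": [(ECOLI, "ACGT")],
    "SILVA": [
        (ECOLI, "acgt"),
        ("Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;"
         "uncultured bacterium", "GGCC"),
    ],
    "Greengenes": [
        ("k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
         "f__Bacillaceae;g__Bacillus;s__", "TTAA"),
    ],
}
result = integrate(toy)
```

With this toy input, only the two Escherichia coli records survive the filter, and they collapse into one sequence entry attributed to both RDP and SILVA. Keying on the species field alone is a simplification: a real integration would also need to reconcile differing higher-rank lineages and naming conventions across the three databases.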