A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy

BACKGROUND: Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable....

Bibliographic Details
Main authors: Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5424349/
https://www.ncbi.nlm.nih.gov/pubmed/28486927
http://dx.doi.org/10.1186/s12859-017-1670-4
author Gao, Xiang
Lin, Huaiying
Revanna, Kashi
Dong, Qunfeng
author_sort Gao, Xiang
collection PubMed
description BACKGROUND: Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement.
RESULTS: We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes.
CONCLUSIONS: Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA.
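Illustrative note: the abstract describes weighting each database hit by a posterior probability derived from its similarity to the query, then assigning taxa rank by rank from species up to phylum. The Python sketch below shows that general idea only. It is not the BLCA implementation: the function names, the uniform prior, and the choice to treat the similarity value itself as the likelihood are all assumptions made for illustration, and the bootstrap confidence step mentioned in the abstract is omitted.

```python
from collections import defaultdict

# Ranks are visited from the most specific to the most general.
RANKS = ["species", "genus", "family", "order", "class", "phylum"]


def posterior_weights(similarities, priors=None):
    """Return normalized prior * likelihood for each hit.

    Assumption for illustration: the likelihood of a hit is simply its
    similarity (for example, fractional identity), and the prior is uniform
    unless given. The published method defines its own probability model.
    """
    if priors is None:
        priors = [1.0 / len(similarities)] * len(similarities)
    unnormalized = [p * s for p, s in zip(priors, similarities)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("all hits have zero weight")
    return [u / total for u in unnormalized]


def weighted_assignment(hits):
    """Pick, at every rank, the taxon with the largest summed weight.

    hits: list of (similarity, lineage) pairs, where lineage maps each
    rank name in RANKS to a taxon name.
    Returns a dict mapping rank -> (best taxon, summed weight).
    """
    weights = posterior_weights([similarity for similarity, _ in hits])
    result = {}
    for rank in RANKS:
        support = defaultdict(float)
        for weight, (_, lineage) in zip(weights, hits):
            support[lineage[rank]] += weight
        best = max(support, key=support.get)
        result[rank] = (best, support[best])
    return result


def make_lineage(species):
    """Hypothetical helper: build a toy lineage that differs only at species."""
    return {
        "species": species,
        "genus": "Lactobacillus",
        "family": "Lactobacillaceae",
        "order": "Lactobacillales",
        "class": "Bacilli",
        "phylum": "Firmicutes",
    }


# Toy input: three hits for one query, with made-up similarity values.
hits = [
    (0.99, make_lineage("Lactobacillus crispatus")),
    (0.98, make_lineage("Lactobacillus crispatus")),
    (0.90, make_lineage("Lactobacillus gasseri")),
]
result = weighted_assignment(hits)
for rank in RANKS:
    taxon, weight = result[rank]
    print(f"{rank}: {taxon} ({weight:.3f})")
```

In this toy input the two hits that agree at the species level pool their weights, so the species call carries less support than the genus call, where all three hits agree. That mirrors the abstract's point that confidence is evaluated per rank across multiple database hits.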
format Online
Article
Text
id pubmed-5424349
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5424349 2017-05-10 A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy Gao, Xiang Lin, Huaiying Revanna, Kashi Dong, Qunfeng BMC Bioinformatics Software BioMed Central 2017-05-10 /pmc/articles/PMC5424349/ /pubmed/28486927 http://dx.doi.org/10.1186/s12859-017-1670-4 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy
topic Software