Benchmarking of 16S rRNA gene databases using known strain sequences

Bibliographic Details
Main Authors: Dixit, Kunal; Davray, Dimple; Chaudhari, Diptaraj; Kadam, Pratik; Kshirsagar, Rudresh; Shouche, Yogesh; Dhotre, Dhiraj; Saroj, Sunil D
Format: Online Article Text
Language: English
Published: Biomedical Informatics 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8131573/
https://www.ncbi.nlm.nih.gov/pubmed/34092959
http://dx.doi.org/10.6026/97320630017377
collection PubMed
description 16S rRNA gene analysis is the most convenient and robust method for microbiome studies. Inaccurate taxonomic assignment of bacterial strains could have deleterious effects, as all downstream analyses rely heavily on accurate assessment of microbial taxonomy. The use of mock communities to check the reliability of results has been suggested. However, the mock communities used in most studies represent only a small fraction of taxa and serve mainly to validate the sequencing run and estimate sequencing artifacts. Moreover, the large number of databases and tools available for classification and taxonomic assignment of the 16S rRNA gene makes it challenging to select the best-suited method for a particular dataset. In the present study, we used authentic, validly published 16S rRNA gene type strain sequences (full length and V3-V4 region) and analyzed them using the widely used QIIME pipeline with different OTU clustering parameters and QIIME-compatible databases. Data Analysis Measures (DAMs) revealed high discrepancy in confirming the taxonomy at different taxonomic hierarchies. Beta diversity analysis showed clear segregation of the different DAMs. Limited differences were observed between analyses of the reference dataset using partial (V3-V4) and full-length 16S rRNA gene sequences, indicating that partial 16S rRNA gene sequences are reliable for microbiome studies. Our analysis also highlights common discrepancies observed at various taxonomic levels across methods and databases.
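The abstract reports discrepancies in confirming taxonomy at different taxonomic ranks when known type strain sequences are classified. As a hedged illustration only (not the authors' code and not part of the QIIME API), the Python sketch below scores how often an assigned lineage agrees with the expected type strain lineage at each rank. It assumes Greengenes-style lineage strings such as "k__Bacteria; p__Firmicutes; ..."; the function names `parse_lineage` and `rank_agreement` and the example lineages are hypothetical.

```python
from typing import Dict, List

# Assumption: seven Greengenes-style ranks with prefixes "k__", "p__", ..., "s__".
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage: str) -> List[str]:
    """Split 'k__Bacteria; p__Firmicutes; ...' into one name per rank.

    Rank prefixes such as 'k__' are stripped, empty ranks (e.g. 's__')
    become '', and missing trailing ranks are padded with ''.
    """
    names = []
    for part in lineage.split(";"):
        part = part.strip()
        if "__" in part:
            part = part.split("__", 1)[1]
        names.append(part)
    names += [""] * (len(RANKS) - len(names))
    return names[: len(RANKS)]


def rank_agreement(expected: Dict[str, str], assigned: Dict[str, str]) -> Dict[str, float]:
    """Fraction of reference sequences whose assigned name equals the
    expected (type strain) name at each rank. Unassigned, empty, or
    missing calls count as disagreements."""
    if not expected:
        return {rank: 0.0 for rank in RANKS}
    matches = {rank: 0 for rank in RANKS}
    for seq_id, true_lineage in expected.items():
        truth = parse_lineage(true_lineage)
        call = parse_lineage(assigned.get(seq_id, ""))
        for rank, t, c in zip(RANKS, truth, call):
            if t and t == c:
                matches[rank] += 1
    return {rank: matches[rank] / len(expected) for rank in RANKS}


# Hypothetical example: made-up lineages, not data from the study.
expected = {
    "seq1": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
            "f__Lactobacillaceae; g__Lactobacillus; s__acidophilus",
    "seq2": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
            "o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia; s__coli",
}
assigned = {
    "seq1": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
            "f__Lactobacillaceae; g__Lactobacillus; s__",
    "seq2": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
            "o__Enterobacteriales; f__Enterobacteriaceae; g__Escherichia; s__coli",
}

print(rank_agreement(expected, assigned))
# {'kingdom': 1.0, 'phylum': 1.0, 'class': 1.0, 'order': 0.5, 'family': 1.0, 'genus': 1.0, 'species': 0.5}
```

Reporting agreement per rank rather than as one overall score shows where a classification diverges from the reference nomenclature; in this made-up example the order name differs for seq2 even though its genus and species match.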
id pubmed-8131573
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8131573 2021-06-04 Bioinformation Research Article
Biomedical Informatics 2021-03-31. © 2021 Biomedical Informatics. License: https://creativecommons.org/licenses/by/3.0/. This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.
topic Research Article