Benchmarking of 16S rRNA gene databases using known strain sequences
16S rRNA gene analysis is the most convenient and robust method for microbiome studies. Inaccurate taxonomic assignment of bacterial strains could have deleterious effects as all downstream analyses rely heavily on the accurate assessment of microbial taxonomy. The use of mock communities to check t...
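Main authors: | Dixit, Kunal; Davray, Dimple; Chaudhari, Diptaraj; Kadam, Pratik; Kshirsagar, Rudresh; Shouche, Yogesh; Dhotre, Dhiraj; Saroj, Sunil D |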
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Biomedical Informatics 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8131573/ https://www.ncbi.nlm.nih.gov/pubmed/34092959 http://dx.doi.org/10.6026/97320630017377 |
_version_ | 1783694725596839936 |
---|---|
author | Dixit, Kunal; Davray, Dimple; Chaudhari, Diptaraj; Kadam, Pratik; Kshirsagar, Rudresh; Shouche, Yogesh; Dhotre, Dhiraj; Saroj, Sunil D |
author_sort | Dixit, Kunal |
collection | PubMed |
description | 16S rRNA gene analysis is the most convenient and robust method for microbiome studies. Inaccurate taxonomic assignment of bacterial strains could have deleterious effects, as all downstream analyses rely heavily on the accurate assessment of microbial taxonomy. The use of mock communities to check the reliability of the results has been suggested. However, the mock communities used in most studies often represent only a small fraction of taxa and serve mostly to validate the sequencing run and estimate sequencing artifacts. Moreover, the large number of databases and tools available for classification and taxonomic assignment of the 16S rRNA gene makes it challenging to select the best-suited method for a particular dataset. In the present study, we used authentic and validly published 16S rRNA gene type strain sequences (full length and V3-V4 region) and analyzed them using the widely used QIIME pipeline along with different OTU clustering parameters and QIIME-compatible databases. Data Analysis Measures (DAMs) revealed high discrepancy in ratifying the taxonomy at different taxonomic hierarchies. Beta diversity analysis showed clear segregation of the different DAMs. Only limited differences were observed between analyses of the reference data set using partial (V3-V4) and full-length 16S rRNA gene sequences, which supports the reliability of partial 16S rRNA gene sequences in microbiome studies. Our analysis also highlights common discrepancies observed at various taxonomic levels across methods and databases. |
format | Online Article Text |
id | pubmed-8131573 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Biomedical Informatics |
record_format | MEDLINE/PubMed |
spelling | pubmed-8131573 2021-06-04 Benchmarking of 16S rRNA gene databases using known strain sequences. Dixit, Kunal; Davray, Dimple; Chaudhari, Diptaraj; Kadam, Pratik; Kshirsagar, Rudresh; Shouche, Yogesh; Dhotre, Dhiraj; Saroj, Sunil D. Bioinformation. Research Article. Biomedical Informatics 2021-03-31 /pmc/articles/PMC8131573/ /pubmed/34092959 http://dx.doi.org/10.6026/97320630017377 Text en © 2021 Biomedical Informatics https://creativecommons.org/licenses/by/3.0/ This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License. |
title | Benchmarking of 16S rRNA gene databases using known strain sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8131573/ https://www.ncbi.nlm.nih.gov/pubmed/34092959 http://dx.doi.org/10.6026/97320630017377 |
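
The description above reports discrepancies in taxonomic assignment at different taxonomic ranks when known type strain sequences are classified. A minimal sketch of how such a per-rank comparison could be scored is given below. It is an illustration only, not the study's method or any QIIME API: the rank list, the semicolon-delimited lineage format, and the helper names `split_lineage` and `rank_agreement` are all assumptions.

```python
from typing import Dict, List

# Assumed rank order for a semicolon-delimited lineage string (hypothetical format).
RANKS: List[str] = ["phylum", "class", "order", "family", "genus", "species"]


def split_lineage(lineage: str) -> List[str]:
    """Split a semicolon-delimited lineage into cleaned labels, dropping empty parts."""
    return [part.strip() for part in lineage.split(";") if part.strip()]


def rank_agreement(known: Dict[str, str], assigned: Dict[str, str]) -> Dict[str, float]:
    """Return, per rank, the fraction of sequences whose assigned label equals the known label.

    A sequence with no assignment at a rank counts as a mismatch at that rank.
    Ranks missing from the known lineage are not scored for that sequence.
    """
    totals = {rank: 0 for rank in RANKS}
    matches = {rank: 0 for rank in RANKS}
    for seq_id, known_lineage in known.items():
        truth = split_lineage(known_lineage)
        call = split_lineage(assigned.get(seq_id, ""))
        for i, rank in enumerate(RANKS):
            if i >= len(truth):
                break  # no known label at this rank or below
            totals[rank] += 1
            if i < len(call) and call[i] == truth[i]:
                matches[rank] += 1
    return {rank: matches[rank] / totals[rank] for rank in RANKS if totals[rank]}
```

Called with one dictionary of known lineages and one of assigned lineages (both keyed by sequence identifier), the function returns a per-rank agreement fraction, so results from different databases or clustering settings can be compared rank by rank.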