Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments

BACKGROUND: Taxonomic profiling of ribosomal RNA (rRNA) sequences has been the accepted norm for inferring the composition of complex microbial ecosystems. Quantitative Insights Into Microbial Ecology (QIIME) and mothur have been the most widely used taxonomic analysis tools for this purpose, with M...

Bibliographic Details
Main authors: Almeida, Alexandre; Mitchell, Alex L; Tarkowska, Aleksandra; Finn, Robert D
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5967554/
https://www.ncbi.nlm.nih.gov/pubmed/29762668
http://dx.doi.org/10.1093/gigascience/giy054
author Almeida, Alexandre
Mitchell, Alex L
Tarkowska, Aleksandra
Finn, Robert D
collection PubMed
description BACKGROUND: Taxonomic profiling of ribosomal RNA (rRNA) sequences has been the accepted norm for inferring the composition of complex microbial ecosystems. Quantitative Insights Into Microbial Ecology (QIIME) and mothur have been the most widely used taxonomic analysis tools for this purpose, with MAPseq and QIIME 2 being two recently released alternatives. However, no independent and direct comparison between these four main tools has been performed. Here, we compared the default classifiers of MAPseq, mothur, QIIME, and QIIME 2 using synthetic simulated datasets comprised of some of the most abundant genera found in the human gut, ocean, and soil environments. We evaluate their accuracy when paired with both different reference databases and variable sub-regions of the 16S rRNA gene. FINDINGS: We show that QIIME 2 provided the best recall and F-scores at genus and family levels, together with the lowest distance estimates between the observed and simulated samples. However, MAPseq showed the highest precision, with miscall rates consistently <2%. Notably, QIIME 2 was the most computationally expensive tool, with CPU time and memory usage almost 2 and 30 times higher than MAPseq, respectively. Using the SILVA database generally yielded a higher recall than using Greengenes, while assignment results of different 16S rRNA variable sub-regions varied up to 40% between samples analysed with the same pipeline. CONCLUSIONS: Our results support the use of either QIIME 2 or MAPseq for optimal 16S rRNA gene profiling, and we suggest that the choice between the two should be based on the level of recall, precision, and/or computational performance required.
format Online
Article
Text
id pubmed-5967554
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5967554 2018-06-04 Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments Almeida, Alexandre; Mitchell, Alex L; Tarkowska, Aleksandra; Finn, Robert D Gigascience Technical Note
Oxford University Press 2018-05-11 /pmc/articles/PMC5967554/ /pubmed/29762668 http://dx.doi.org/10.1093/gigascience/giy054 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments
topic Technical Note
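
The abstract compares the classifiers by precision, recall, and F-score at genus and family levels, but this record does not give the formulas used. The sketch below assumes the standard set-based definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F-score as their harmonic mean) and may differ from the authors' exact procedure. The function name `score_assignments` and the genus lists are hypothetical and serve only as an illustration.

```python
def score_assignments(expected, observed):
    """Return (precision, recall, f_score) for two collections of taxon names.

    Assumption: a taxon counts as a true positive if it appears in both the
    simulated (expected) and the classified (observed) sample.
    """
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)    # correctly recovered taxa
    false_pos = len(observed - expected)   # miscalled taxa
    false_neg = len(expected - observed)   # missed taxa

    precision = true_pos / (true_pos + false_pos) if observed else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical genus-level example: three genera recovered, one missed,
# one miscalled.
expected_genera = ["Bacteroides", "Faecalibacterium", "Prevotella", "Roseburia"]
observed_genera = ["Bacteroides", "Faecalibacterium", "Prevotella", "Escherichia"]

precision, recall, f_score = score_assignments(expected_genera, observed_genera)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.2f}")
```

Under these definitions a tool can trade recall for precision by leaving uncertain reads unassigned, which is consistent with the abstract's recommendation to choose between QIIME 2 and MAPseq according to which metric matters more.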