
Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics

Bibliographic Details
Main authors: Siegwald, Léa, Touzet, Hélène, Lemoine, Yves, Hot, David, Audebert, Christophe, Caboche, Ségolène
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5215245/
https://www.ncbi.nlm.nih.gov/pubmed/28052134
http://dx.doi.org/10.1371/journal.pone.0169563
author Siegwald, Léa
Touzet, Hélène
Lemoine, Yves
Hot, David
Audebert, Christophe
Caboche, Ségolène
collection PubMed
description Targeted metagenomics, also known as metagenetics, is a high-throughput sequencing application focusing on a nucleotide target in a microbiome to describe its taxonomic content. A wide range of bioinformatics pipelines is available to analyze sequencing outputs, and the choice of an appropriate tool is crucial and not trivial. No standard evaluation method exists for estimating the accuracy of a pipeline for targeted metagenomics analyses. This article proposes an evaluation protocol containing real and simulated targeted metagenomics datasets, and adequate metrics allowing us to study the impact of different variables on the biological interpretation of results. This protocol was used to compare six different bioinformatics pipelines in the basic user context: three common ones (mothur, QIIME and BMP) based on a clustering-first approach and three emerging ones (Kraken, CLARK and One Codex) using an assignment-first approach. This study surprisingly reveals that sequencing errors have a bigger impact on the results than the choice of amplified region. Moreover, increasing sequencing throughput increases richness overestimation, even more so for microbiota of high complexity. Finally, the choice of the reference database has a bigger impact on richness estimation for clustering-first pipelines, and on correct taxa identification for assignment-first pipelines. Using emerging assignment-first pipelines is a valid approach for targeted metagenomics analyses, with a quality of results comparable to popular clustering-first pipelines, even with an error-prone sequencing technology like Ion Torrent. However, those pipelines are highly sensitive to the quality of databases and their annotations, which makes clustering-first pipelines still the only reliable approach for studying microbiomes that are not well described.
format Online
Article
Text
id pubmed-5215245
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5215245 2017-01-19 PLoS One Research Article Public Library of Science 2017-01-04 /pmc/articles/PMC5215245/ /pubmed/28052134 http://dx.doi.org/10.1371/journal.pone.0169563 Text en © 2017 Siegwald et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics
topic Research Article