Quality control of microbiota metagenomics by k-mer analysis

BACKGROUND: The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has crea...

Bibliographic Details
Main authors: Plaza Onate, Florian, Batto, Jean-Michel, Juste, Catherine, Fadlallah, Jehane, Fougeroux, Cyrielle, Gouas, Doriane, Pons, Nicolas, Kennedy, Sean, Levenez, Florence, Dore, Joel, Ehrlich, S Dusko, Gorochov, Guy, Larsen, Martin
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4373121/
https://www.ncbi.nlm.nih.gov/pubmed/25887914
http://dx.doi.org/10.1186/s12864-015-1406-7
_version_ 1782363295987007488
author Plaza Onate, Florian
Batto, Jean-Michel
Juste, Catherine
Fadlallah, Jehane
Fougeroux, Cyrielle
Gouas, Doriane
Pons, Nicolas
Kennedy, Sean
Levenez, Florence
Dore, Joel
Ehrlich, S Dusko
Gorochov, Guy
Larsen, Martin
author_facet Plaza Onate, Florian
Batto, Jean-Michel
Juste, Catherine
Fadlallah, Jehane
Fougeroux, Cyrielle
Gouas, Doriane
Pons, Nicolas
Kennedy, Sean
Levenez, Florence
Dore, Joel
Ehrlich, S Dusko
Gorochov, Guy
Larsen, Martin
author_sort Plaza Onate, Florian
collection PubMed
description BACKGROUND: The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue. RESULTS: We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high to low quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue not only for the well-defined serial dilution datasets, but also for 52 human gut microbiota derived metagenomic datasets. CONCLUSIONS: We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. 
This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.
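The abstract's central idea, a distribution of overlapping k-mers computed from raw reads, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy reads, and the choice of k = 4 are assumptions made purely for illustration, and the record does not state which k, tools, or statistics the article uses.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer (window step of 1) across all reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def abundance_histogram(counts):
    """Map each occurrence count to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


def fraction_in_top(counts, n=1):
    """Share of all counted k-mer occurrences taken by the n most frequent k-mers."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total


if __name__ == "__main__":
    # Hypothetical toy reads, not data from the article.
    toy_reads = ["ACGTACGT", "ACGNACGT"]
    kmer_counts = count_kmers(toy_reads, k=4)
    print(abundance_histogram(kmer_counts))
    print(fraction_in_top(kmer_counts, n=1))
```

A distribution dominated by a handful of very frequent k-mers is one plausible signature of the low-complexity contamination the abstract mentions, such as reads from "empty" ligation products; any threshold for flagging a sample would have to be calibrated on real data.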
format Online
Article
Text
id pubmed-4373121
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-43731212015-03-26 Quality control of microbiota metagenomics by k-mer analysis Plaza Onate, Florian Batto, Jean-Michel Juste, Catherine Fadlallah, Jehane Fougeroux, Cyrielle Gouas, Doriane Pons, Nicolas Kennedy, Sean Levenez, Florence Dore, Joel Ehrlich, S Dusko Gorochov, Guy Larsen, Martin BMC Genomics Methodology Article BACKGROUND: The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue. RESULTS: We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high to low quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue not only for the well-defined serial dilution datasets, but also for 52 human gut microbiota derived metagenomic datasets. 
CONCLUSIONS: We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-03-14 /pmc/articles/PMC4373121/ /pubmed/25887914 http://dx.doi.org/10.1186/s12864-015-1406-7 Text en © Plaza Onate et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Plaza Onate, Florian
Batto, Jean-Michel
Juste, Catherine
Fadlallah, Jehane
Fougeroux, Cyrielle
Gouas, Doriane
Pons, Nicolas
Kennedy, Sean
Levenez, Florence
Dore, Joel
Ehrlich, S Dusko
Gorochov, Guy
Larsen, Martin
Quality control of microbiota metagenomics by k-mer analysis
title Quality control of microbiota metagenomics by k-mer analysis
title_sort quality control of microbiota metagenomics by k-mer analysis
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4373121/
https://www.ncbi.nlm.nih.gov/pubmed/25887914
http://dx.doi.org/10.1186/s12864-015-1406-7