
MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage

MOTIVATION: Sequencing technologies allow the sequencing of microbial communities directly from the environment without prior culturing. Because assembly typically produces only genome fragments, also known as contigs, it is crucial to group them into putative species for further taxonomic profiling...


Bibliographic Details
Main authors: Qian, Jia; Comin, Matteo
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6873667/
https://www.ncbi.nlm.nih.gov/pubmed/31757198
http://dx.doi.org/10.1186/s12859-019-2904-4
_version_ 1783472711467532288
author Qian, Jia
Comin, Matteo
author_facet Qian, Jia
Comin, Matteo
author_sort Qian, Jia
collection PubMed
description MOTIVATION: Sequencing technologies allow the sequencing of microbial communities directly from the environment without prior culturing. Because assembly typically produces only genome fragments, also known as contigs, it is crucial to group them into putative species for further taxonomic profiling and downstream functional analysis. Taxonomic analysis of microbial communities requires contig clustering, a process referred to as binning, which remains one of the most challenging tasks in metagenomic data analysis. The major problems are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, sequencing errors, and the limitations that arise when binning contigs of different lengths. RESULTS: In this context we present MetaCon, a novel tool for unsupervised metagenomic contig binning based on probabilistic k-mer statistics and coverage. MetaCon uses a signature based on k-mer statistics that accounts for the different probability of appearance of a k-mer in different species. In addition, contigs of different lengths are clustered in two separate phases. The effectiveness of MetaCon is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, MaxBin and MetaBAT. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2904-4) contains supplementary material, which is available to authorized users.
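As a rough illustration of what a probability-aware k-mer signature can look like, the Python sketch below standardizes each observed k-mer count against the count expected from the contig's own base frequencies. This is not MetaCon's actual formula, which the abstract does not give: the independent-bases model, the z-score form, and the function names (`kmer_counts`, `base_frequencies`, `standardized_signature`) are all assumptions made for this example, and coverage and the two-phase clustering are not modelled.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(c in ALPHABET for c in kmer):
            counts[kmer] += 1
    return counts


def base_frequencies(seq):
    """Relative frequency of each base in the contig (all zeros if no ACGT present)."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in ALPHABET)
    if total == 0:
        return {b: 0.0 for b in ALPHABET}
    return {b: seq.count(b) / total for b in ALPHABET}


def standardized_signature(seq, k=4):
    """Illustrative signature: (observed - expected) / standard deviation per k-mer.

    ASSUMPTION: the expected probability of a k-mer is the product of its
    base frequencies (bases treated as independent). This is a simple
    stand-in model, not the statistic used by MetaCon.
    """
    counts = kmer_counts(seq, k)
    n = sum(counts.values())          # number of valid k-mer windows
    freqs = base_frequencies(seq)
    signature = {}
    for letters in product(ALPHABET, repeat=k):
        kmer = "".join(letters)
        p = 1.0
        for b in kmer:
            p *= freqs[b]
        expected = n * p
        variance = n * p * (1.0 - p)  # binomial approximation
        if variance > 0:
            signature[kmer] = (counts[kmer] - expected) / sqrt(variance)
        else:
            signature[kmer] = 0.0     # k-mer impossible or certain under the model
    return signature


if __name__ == "__main__":
    sig = standardized_signature("ACGTACGTACGTACGT", k=2)
    # Over-represented k-mers score positive, absent ones negative.
    print(sorted(sig.items(), key=lambda kv: -kv[1])[:4])
```

Contigs with similar signature vectors would then be candidates for the same bin. A real binner would combine such a vector with per-contig coverage before clustering.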
format Online
Article
Text
id pubmed-6873667
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6873667 2019-11-25 MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage Qian, Jia Comin, Matteo BMC Bioinformatics Research MOTIVATION: Sequencing technologies allow the sequencing of microbial communities directly from the environment without prior culturing. Because assembly typically produces only genome fragments, also known as contigs, it is crucial to group them into putative species for further taxonomic profiling and downstream functional analysis. Taxonomic analysis of microbial communities requires contig clustering, a process referred to as binning, which remains one of the most challenging tasks in metagenomic data analysis. The major problems are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, sequencing errors, and the limitations that arise when binning contigs of different lengths. RESULTS: In this context we present MetaCon, a novel tool for unsupervised metagenomic contig binning based on probabilistic k-mer statistics and coverage. MetaCon uses a signature based on k-mer statistics that accounts for the different probability of appearance of a k-mer in different species. In addition, contigs of different lengths are clustered in two separate phases. The effectiveness of MetaCon is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, MaxBin and MetaBAT. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2904-4) contains supplementary material, which is available to authorized users.
BioMed Central 2019-11-22 /pmc/articles/PMC6873667/ /pubmed/31757198 http://dx.doi.org/10.1186/s12859-019-2904-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Qian, Jia
Comin, Matteo
MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage
title MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage
title_full MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage
title_fullStr MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage
title_full_unstemmed MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage
title_short MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage
title_sort metacon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6873667/
https://www.ncbi.nlm.nih.gov/pubmed/31757198
http://dx.doi.org/10.1186/s12859-019-2904-4
work_keys_str_mv AT qianjia metaconunsupervisedclusteringofmetagenomiccontigswithprobabilistickmersstatisticsandcoverage
AT cominmatteo metaconunsupervisedclusteringofmetagenomiccontigswithprobabilistickmersstatisticsandcoverage