Binning unassembled short reads based on k-mer abundance covariance using sparse coding

BACKGROUND: Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to...

Bibliographic Details
Main Authors:	Kyrgyzov, Olexiy; Prost, Vincent; Gazut, Stéphane; Farcy, Bruno; Brüls, Thomas
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Technical Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7099633/ https://www.ncbi.nlm.nih.gov/pubmed/32219339 http://dx.doi.org/10.1093/gigascience/giaa028

collection	PubMed
description	BACKGROUND: Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to conveniently assemble multi-terabyte datasets. RESULTS: We present here a scalable pre-assembly binning scheme (i.e., operating on unassembled short reads) enabling latent genome recovery by leveraging sparse dictionary learning and elastic-net regularization, and its use to recover hundreds of metagenome-assembled genomes, including very low-abundance genomes, from a joint analysis of microbiomes from the LifeLines DEEP population cohort (n = 1,135, >10^10 reads). CONCLUSION: We showed that sparse coding techniques can be leveraged to carry out read-level binning at large scale and that, despite lower genome reconstruction yields compared to assembly-based approaches, bin-first strategies can complement the more widely used assembly-first protocols by targeting distinct genome segregation profiles. Read enrichment levels across 6 orders of magnitude in relative abundance were observed, indicating that the method has the power to recover genomes consistently segregating at low levels.
id	pubmed-7099633
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-7099633 2020-04-06 Gigascience Technical Note Oxford University Press 2020-03-29 /pmc/articles/PMC7099633/ /pubmed/32219339 http://dx.doi.org/10.1093/gigascience/giaa028 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
