Deconvolute individual genomes from metagenome sequences through short read clustering

Metagenome assembly from short next-generation sequencing data is a challenging process due to its large scale and computational complexity. Clustering short reads by species before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. How...

Bibliographic Details
Main authors: Li, Kexue, Lu, Yakang, Deng, Li, Wang, Lili, Shi, Lizhen, Wang, Zhong
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7150542/
https://www.ncbi.nlm.nih.gov/pubmed/32296615
http://dx.doi.org/10.7717/peerj.8966
author Li, Kexue
Lu, Yakang
Deng, Li
Wang, Lili
Shi, Lizhen
Wang, Zhong
collection PubMed
description Metagenome assembly from short next-generation sequencing data is a challenging process due to its large scale and computational complexity. Clustering short reads by species before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. However, current read clustering methods suffer from either false-negative (under-clustering) or false-positive (over-clustering) problems. Here we extended our previous read clustering software, SpaRC, by exploiting statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using synthetic and real-world datasets, we demonstrated that this method has the potential to cluster almost all of the short reads from genomes with sufficient sequencing coverage. The improved read clustering in turn leads to improved downstream genome assembly quality.
format Online
Article
Text
id pubmed-7150542
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7150542 2020-04-15 PeerJ Bioinformatics PeerJ Inc. 2020-04-08 /pmc/articles/PMC7150542/ /pubmed/32296615 http://dx.doi.org/10.7717/peerj.8966 Text en © 2020 Li et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Deconvolute individual genomes from metagenome sequences through short read clustering
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7150542/
https://www.ncbi.nlm.nih.gov/pubmed/32296615
http://dx.doi.org/10.7717/peerj.8966