Evaluation of short read metagenomic assembly

Bibliographic Details
Main Authors:	Charuvaka, Anveshi; Rangwala, Huzefa
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194239/ https://www.ncbi.nlm.nih.gov/pubmed/21989307 http://dx.doi.org/10.1186/1471-2164-12-S2-S8

author	Charuvaka, Anveshi; Rangwala, Huzefa
collection	PubMed
description	BACKGROUND: Metagenomic assembly is a challenging problem due to the presence of genetic material from multiple organisms. The problem becomes even more difficult when short reads produced by next-generation sequencing technologies are used. Although whole-genome assemblers are not designed to assemble metagenomic samples, they are being used for metagenomics due to the lack of assemblers capable of dealing with metagenomic samples. We present an evaluation of the assembly of simulated short-read metagenomic samples using a state-of-the-art de Bruijn graph-based assembler. RESULTS: We assembled simulated metagenomic reads from datasets of various complexities using a state-of-the-art de Bruijn graph-based parallel assembler. We also studied the effect of the k-mer size used in the de Bruijn graph on metagenomic assembly and developed a clustering solution to pool the contigs obtained from different assembly runs, which allowed us to obtain longer contigs. We also assessed the degree of chimericity of the assembled contigs using an entropy/impurity metric and compared the metagenomic assemblies to assemblies of isolated individual source genomes. CONCLUSIONS: Our results show that the accuracy of the assembled contigs was better than expected for metagenomic samples with a few dominant organisms and was especially poor in samples containing many closely related strains. Clustering contigs from assemblies run with different k-mer sizes allowed us to obtain longer contigs; however, the clustering also accumulated erroneous contigs, increasing the error rate of the clustered contigs.
format	Online Article Text
id	pubmed-3194239
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3194239 2011-10-17 Evaluation of short read metagenomic assembly Charuvaka, Anveshi; Rangwala, Huzefa BMC Genomics Proceedings BioMed Central 2011-07-27 /pmc/articles/PMC3194239/ /pubmed/21989307 http://dx.doi.org/10.1186/1471-2164-12-S2-S8 Text en Copyright ©2011 Charuvaka and Rangwala; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Evaluation of short read metagenomic assembly
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194239/ https://www.ncbi.nlm.nih.gov/pubmed/21989307 http://dx.doi.org/10.1186/1471-2164-12-S2-S8

Similar Items