
Optimizing k-mer size using a variant grid search to enhance de novo genome assembly

Largely driven by huge reductions in per-base costs, sequencing nucleic acids has become a near-ubiquitous technique in laboratories performing biological and biomedical research. Most of the effort goes to re-sequencing, but assembly of de novo generated, raw sequence reads into contigs that span as...


Bibliographic Details
Main authors:	Cha, Soyeon, Bird, David McK
Format:	Online Article Text
Language:	English
Published:	Biomedical Informatics 2016
Subjects:	Prediction Model
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5237644/ https://www.ncbi.nlm.nih.gov/pubmed/28104957 http://dx.doi.org/10.6026/97320630012036

_version_	1782495561264398336
author	Cha, Soyeon Bird, David McK
author_facet	Cha, Soyeon Bird, David McK
author_sort	Cha, Soyeon
collection	PubMed
description	Largely driven by huge reductions in per-base costs, sequencing nucleic acids has become a near-ubiquitous technique in laboratories performing biological and biomedical research. Most of the effort goes to re-sequencing, but assembly of de novo generated, raw sequence reads into contigs that span as much of the genome as possible is central to many projects. Although truly complete coverage is not realistically attainable, maximizing the amount of sequence that can be correctly assembled into contigs contributes to coverage. Here we compare three commonly used assembly algorithms (ABySS, Velvet and SOAPdenovo2), and show that empirical optimization of k-mer values has a disproportionate influence on de novo assembly of a eukaryotic genome, the nematode parasite Meloidogyne chitwoodi. Each assembler was challenged with about 40 million Illumina II paired-end reads, and assemblies were performed under a range of k-mer sizes. In each instance, the optimal k-mer was 127, although based on N50 values, ABySS was more efficient than the others. That the assembly was not spurious was established using the “Core Eukaryotic Gene Mapping Approach”, which indicated that 98.79% of the M. chitwoodi genome was accounted for by the assembly. Subsequent gene finding and annotation are consistent with this and suggest that k-mer optimization contributes to the robustness of assembly.
format	Online Article Text
id	pubmed-5237644
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Biomedical Informatics
record_format	MEDLINE/PubMed
spelling	pubmed-5237644 2017-01-19 Optimizing k-mer size using a variant grid search to enhance de novo genome assembly Cha, Soyeon Bird, David McK Bioinformation Prediction Model Largely driven by huge reductions in per-base costs, sequencing nucleic acids has become a near-ubiquitous technique in laboratories performing biological and biomedical research. Most of the effort goes to re-sequencing, but assembly of de novo generated, raw sequence reads into contigs that span as much of the genome as possible is central to many projects. Although truly complete coverage is not realistically attainable, maximizing the amount of sequence that can be correctly assembled into contigs contributes to coverage. Here we compare three commonly used assembly algorithms (ABySS, Velvet and SOAPdenovo2), and show that empirical optimization of k-mer values has a disproportionate influence on de novo assembly of a eukaryotic genome, the nematode parasite Meloidogyne chitwoodi. Each assembler was challenged with about 40 million Illumina II paired-end reads, and assemblies were performed under a range of k-mer sizes. In each instance, the optimal k-mer was 127, although based on N50 values, ABySS was more efficient than the others. That the assembly was not spurious was established using the “Core Eukaryotic Gene Mapping Approach”, which indicated that 98.79% of the M. chitwoodi genome was accounted for by the assembly. Subsequent gene finding and annotation are consistent with this and suggest that k-mer optimization contributes to the robustness of assembly. Biomedical Informatics 2016-04-10 /pmc/articles/PMC5237644/ /pubmed/28104957 http://dx.doi.org/10.6026/97320630012036 Text en © 2016 Biomedical Informatics This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.
spellingShingle	Prediction Model Cha, Soyeon Bird, David McK Optimizing k-mer size using a variant grid search to enhance de novo genome assembly
title	Optimizing k-mer size using a variant grid search to enhance de novo genome assembly
title_full	Optimizing k-mer size using a variant grid search to enhance de novo genome assembly
title_fullStr	Optimizing k-mer size using a variant grid search to enhance de novo genome assembly
title_full_unstemmed	Optimizing k-mer size using a variant grid search to enhance de novo genome assembly
title_short	Optimizing k-mer size using a variant grid search to enhance de novo genome assembly
title_sort	optimizing k-mer size using a variant grid search to enhance de novo genome assembly
topic	Prediction Model
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5237644/ https://www.ncbi.nlm.nih.gov/pubmed/28104957 http://dx.doi.org/10.6026/97320630012036
work_keys_str_mv	AT chasoyeon optimizingkmersizeusingavariantgridsearchtoenhancedenovogenomeassembly AT birddavidmck optimizingkmersizeusingavariantgridsearchtoenhancedenovogenomeassembly
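
The description above reports that assemblies were run over a range of k-mer sizes and compared by N50. The sketch below illustrates that kind of comparison only; it assumes the contig lengths for each k have already been extracted from the assembler output. The function names and the example lengths are hypothetical and do not come from the article, and the authors' "variant grid search" may differ in its details.

```python
from typing import Dict, Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs supplied


def best_k_by_n50(assemblies: Dict[int, List[int]]) -> int:
    """Pick the k whose assembly has the highest N50.
    Ties are broken in favour of the larger k (an arbitrary choice)."""
    return max(assemblies, key=lambda k: (n50(assemblies[k]), k))


# Hypothetical contig lengths per k-mer size; NOT data from the article.
example = {
    63: [500, 400, 300, 200, 100],
    95: [900, 300, 200, 100],
    127: [1200, 200, 100],
}

if __name__ == "__main__":
    for k in sorted(example):
        print(k, n50(example[k]))
    print("best k:", best_k_by_n50(example))
```

N50 alone rewards long contigs whether or not they are correct, which is presumably why the description pairs it with a completeness check (the Core Eukaryotic Gene Mapping Approach) before concluding that the assembly was not spurious.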


Similar items