
Comparing De Novo Genome Assembly: The Long and Short of It

Bibliographic Details
Main authors: Narzisi, Giuseppe; Mishra, Bud
Format: Text
Language: English
Published: Public Library of Science, 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3084767/
https://www.ncbi.nlm.nih.gov/pubmed/21559467
http://dx.doi.org/10.1371/journal.pone.0019175
Collection: PubMed
Description: Recent advances in DNA sequencing technology and their focal role in Genome-Wide Association Studies (GWAS) have rekindled interest in the whole-genome sequence assembly (WGSA) problem, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted, standardized method for comparison exists yet. Even worse, the metrics widely used to compare assembled sequences emphasize only size, poorly capturing contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric introduced here, the Feature-Response Curve (FRC), which transparently captures the trade-off between contig quality and contig size. For this purpose, most of the major publicly available sequence assemblers, for both low-coverage long-read (Sanger) and high-coverage short-read (Illumina) technologies, are compared. These assemblers are applied to microbial genomes (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and to a partial human genome sequence (Chr. Y), using reads of various lengths, coverages and accuracies, with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing “next-generation” assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
Record ID: pubmed-3084767
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Publication date: 2011-04-29
Record date: 2011-05-10
Copyright holders: Narzisi and Mishra
License: This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
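The description contrasts size-only metrics such as N50 with the Feature-Response Curve (FRC). The Python sketch below is a minimal illustration under stated assumptions, not the AMOS tool: each contig is reduced to a hypothetical (length, feature_count) pair, where feature_count counts suspicious regions flagged by some validation step, and the genome size is assumed known and positive. `n50` follows the common definition. `frc` reflects one reading of the FRC idea: for each feature threshold tau, walk the contigs from longest to shortest and report the fraction of the genome covered before the cumulative feature count would exceed tau. Both function names are illustrative.

```python
from typing import List, Tuple

# Hypothetical contig summary: (length in bp, number of flagged features).
Contig = Tuple[int, int]


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least
    half of the total assembly length. Returns 0 for an empty assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def frc(contigs: List[Contig], genome_size: int) -> List[Tuple[int, float]]:
    """One (tau, coverage) point per feature threshold tau.

    Assumption: contigs are taken in decreasing length order and tallied
    until adding the next contig would push the cumulative feature count
    above tau; coverage is the tallied length divided by genome_size.
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    max_features = sum(f for _, f in contigs)
    curve = []
    for tau in range(max_features + 1):
        covered = 0
        features = 0
        for length, f in ordered:
            if features + f > tau:
                break  # this contig would exceed the feature budget
            features += f
            covered += length
        curve.append((tau, covered / genome_size))
    return curve


if __name__ == "__main__":
    # Two toy assemblies with identical contig lengths (hence identical N50)
    # but different numbers of flagged features.
    asm_a = [(500, 0), (300, 0), (200, 0)]
    asm_b = [(500, 3), (300, 0), (200, 1)]
    print(n50([l for l, _ in asm_a]), n50([l for l, _ in asm_b]))  # 500 500
    print(frc(asm_a, 1000))  # [(0, 1.0)]
    print(frc(asm_b, 1000))  # [(0, 0.0), (1, 0.0), (2, 0.0), (3, 0.8), (4, 1.0)]
```

In the usage example both toy assemblies have the same N50, yet the second curve stays at zero coverage until the threshold reaches the three features carried by its longest contig, which is the kind of quality difference a size-only metric hides.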