Genome reassembly with high-throughput sequencing data

MOTIVATION: Recent studies in genomics have highlighted the significance of structural variation in determining individual variation. Current methods for identifying structural variation, however, are predominantly focused on either assembling whole genomes from scratch, or identifying the relativel...

Bibliographic Details
Main Authors: Parrish, Nathaniel, Sudakov, Benjamin, Eskin, Eleazar
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549812/
https://www.ncbi.nlm.nih.gov/pubmed/23368744
http://dx.doi.org/10.1186/1471-2164-14-S1-S8
author Parrish, Nathaniel
Sudakov, Benjamin
Eskin, Eleazar
author_facet Parrish, Nathaniel
Sudakov, Benjamin
Eskin, Eleazar
author_sort Parrish, Nathaniel
collection PubMed
description MOTIVATION: Recent studies in genomics have highlighted the significance of structural variation in determining individual variation. Current methods for identifying structural variation, however, are predominantly focused on either assembling whole genomes from scratch, or identifying the relatively small changes between a genome and a reference sequence. While significant progress has been made in recent years on both de novo assembly and resequencing (read mapping) methods, few attempts have been made to bridge the gap between them. RESULTS: In this paper, we present a computational method for incorporating a reference sequence into an assembly algorithm. We propose a novel graph construction that builds upon the well-known de Bruijn graph to incorporate the reference, and describe a simple algorithm, based on iterative message passing, which uses this information to significantly improve assembly results. We validate our method by applying it to a series of 5 Mb simulation genomes derived from both mammalian and bacterial references. The results of applying our method to this simulation data are presented along with a discussion of the benefits and drawbacks of this technique.
format Online
Article
Text
id pubmed-3549812
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35498122013-01-23 Genome reassembly with high-throughput sequencing data Parrish, Nathaniel Sudakov, Benjamin Eskin, Eleazar BMC Genomics Proceedings MOTIVATION: Recent studies in genomics have highlighted the significance of structural variation in determining individual variation. Current methods for identifying structural variation, however, are predominantly focused on either assembling whole genomes from scratch, or identifying the relatively small changes between a genome and a reference sequence. While significant progress has been made in recent years on both de novo assembly and resequencing (read mapping) methods, few attempts have been made to bridge the gap between them. RESULTS: In this paper, we present a computational method for incorporating a reference sequence into an assembly algorithm. We propose a novel graph construction that builds upon the well-known de Bruijn graph to incorporate the reference, and describe a simple algorithm, based on iterative message passing, which uses this information to significantly improve assembly results. We validate our method by applying it to a series of 5 Mb simulation genomes derived from both mammalian and bacterial references. The results of applying our method to this simulation data are presented along with a discussion of the benefits and drawbacks of this technique. BioMed Central 2013-01-21 /pmc/articles/PMC3549812/ /pubmed/23368744 http://dx.doi.org/10.1186/1471-2164-14-S1-S8 Text en Copyright ©2013 Parrish et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Parrish, Nathaniel
Sudakov, Benjamin
Eskin, Eleazar
Genome reassembly with high-throughput sequencing data
title Genome reassembly with high-throughput sequencing data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549812/
https://www.ncbi.nlm.nih.gov/pubmed/23368744
http://dx.doi.org/10.1186/1471-2164-14-S1-S8