
Enhanced De Novo Assembly of High Throughput Pyrosequencing Data Using Whole Genome Mapping

Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but...


Bibliographic Details
Main authors: Onmus-Leone, Fatma; Hang, Jun; Clifford, Robert J.; Yang, Yu; Riley, Matthew C.; Kuschner, Robert A.; Waterman, Paige E.; Lesho, Emil P.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3629165/
https://www.ncbi.nlm.nih.gov/pubmed/23613926
http://dx.doi.org/10.1371/journal.pone.0061762
author Onmus-Leone, Fatma
Hang, Jun
Clifford, Robert J.
Yang, Yu
Riley, Matthew C.
Kuschner, Robert A.
Waterman, Paige E.
Lesho, Emil P.
collection PubMed
description Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that did not align to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method, we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel bla (NDM-1) plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.
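The step in which large contigs are ordered and oriented against the restriction map can be illustrated with a minimal sketch. This is not the authors' software: the record names neither the enzyme nor the alignment tool, so the recognition site, the 10% size tolerance, and the function names `digest` and `place_contig` are all assumptions made for illustration. The idea is to digest a contig in silico into restriction fragment lengths and then slide that pattern along the map's fragment lengths in both orientations.

```python
from typing import List, Optional, Tuple


def digest(seq: str, site: str) -> List[int]:
    """Cut seq at the start of every occurrence of site; return fragment lengths.

    Simplification (assumption): the cut is placed at the first base of the site.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip zero-length fragments (e.g. a site at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def place_contig(
    contig_frags: List[int], map_frags: List[int], tol: float = 0.1
) -> Optional[Tuple[int, str]]:
    """Find where a contig's fragment pattern fits on the restriction map.

    Only interior fragments are compared, because the two end fragments of a
    contig are not bounded by restriction sites on both sides.
    Returns (index of first matching map fragment, strand) or None.
    tol is a hypothetical relative size tolerance, not a value from the paper.
    """
    interior = contig_frags[1:-1]
    if not interior:
        return None  # too few cut sites to anchor this contig
    # For a palindromic site, the reverse strand gives the reversed pattern.
    for strand, pattern in (("+", interior), ("-", interior[::-1])):
        for i in range(len(map_frags) - len(pattern) + 1):
            window = map_frags[i : i + len(pattern)]
            if all(abs(m - c) <= tol * m for m, c in zip(window, pattern)):
                return i, strand
    return None


if __name__ == "__main__":
    # Toy restriction map (fragment sizes in bp) and a toy contig digest.
    wgrm = [500, 1200, 300, 800, 950, 400]
    contig = [120, 810, 295, 60]  # interior fragments: 810, 295
    print(place_contig(contig, wgrm))
```

A contig for which `place_contig` returns `None` corresponds to the "not aligned" case in the abstract; in the described method such contigs are merged into scaffolds using contig branching information and retried, a loop this sketch does not attempt to reproduce.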
format Online
Article
Text
id pubmed-3629165
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3629165 2013-04-23 Enhanced De Novo Assembly of High Throughput Pyrosequencing Data Using Whole Genome Mapping. Onmus-Leone, Fatma; Hang, Jun; Clifford, Robert J.; Yang, Yu; Riley, Matthew C.; Kuschner, Robert A.; Waterman, Paige E.; Lesho, Emil P. PLoS One. Research Article.
Public Library of Science 2013-04-17 /pmc/articles/PMC3629165/ /pubmed/23613926 http://dx.doi.org/10.1371/journal.pone.0061762 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title Enhanced De Novo Assembly of High Throughput Pyrosequencing Data Using Whole Genome Mapping
topic Research Article