
Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools

BACKGROUND: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. RESULTS: In order to sequence the genome of...


Bibliographic Details
Main authors: Alexeyenko, Andrey, Nystedt, Björn, Vezzi, Francesco, Sherwood, Ellen, Ye, Rosa, Knudsen, Bjarne, Simonsen, Martin, Turner, Benjamin, de Jong, Pieter, Wu, Cheng-Cang, Lundeberg, Joakim
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4070561/
https://www.ncbi.nlm.nih.gov/pubmed/24906298
http://dx.doi.org/10.1186/1471-2164-15-439
author Alexeyenko, Andrey
Nystedt, Björn
Vezzi, Francesco
Sherwood, Ellen
Ye, Rosa
Knudsen, Bjarne
Simonsen, Martin
Turner, Benjamin
de Jong, Pieter
Wu, Cheng-Cang
Lundeberg, Joakim
collection PubMed
description BACKGROUND: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. RESULTS: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS. CONCLUSIONS: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/. (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-439) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4070561
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4070561 2014-06-27 Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools Alexeyenko, Andrey Nystedt, Björn Vezzi, Francesco Sherwood, Ellen Ye, Rosa Knudsen, Bjarne Simonsen, Martin Turner, Benjamin de Jong, Pieter Wu, Cheng-Cang Lundeberg, Joakim BMC Genomics Methodology Article BioMed Central 2014-06-06 /pmc/articles/PMC4070561/ /pubmed/24906298 http://dx.doi.org/10.1186/1471-2164-15-439 Text en © Alexeyenko et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools
topic Methodology Article