Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology

BACKGROUND: Usually, next generation sequencing (NGS) technology has the property of ultra-high throughput but the read length is remarkably short compared to conventional Sanger sequencing. Paired-end NGS could computationally extend the read length but with a lot of practical inconvenience because...

Bibliographic Details
Main Authors: Ruan, Jue, Jiang, Lan, Chong, Zechen, Gong, Qiang, Li, Heng, Li, Chunyan, Tao, Yong, Zheng, Caihong, Zhai, Weiwei, Turissini, David, Cannon, Charles H, Lu, Xuemei, Wu, Chung-I
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046676/
https://www.ncbi.nlm.nih.gov/pubmed/24134808
http://dx.doi.org/10.1186/1471-2164-14-711
author Ruan, Jue
Jiang, Lan
Chong, Zechen
Gong, Qiang
Li, Heng
Li, Chunyan
Tao, Yong
Zheng, Caihong
Zhai, Weiwei
Turissini, David
Cannon, Charles H
Lu, Xuemei
Wu, Chung-I
author_sort Ruan, Jue
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) technology offers ultra-high throughput, but its reads are remarkably short compared with conventional Sanger sequencing. Paired-end NGS can computationally extend the read length, but the inherent gaps between the paired ends cause considerable practical inconvenience. Now that Illumina paired-end sequencing can read both ends of 600 bp or even 800 bp DNA fragments, filling in the gaps between paired ends to produce accurate long reads is an intriguing but challenging problem. RESULTS: We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It fills in the gaps between paired ends and can generate near error-free sequences equivalent in length to conventional Sanger reads but with the high throughput of NGS. The major novelty of the PS method is that gap filling is based on local assembly of paired-end reads that overlap at either end. We are thus able to fill in the gaps in repetitive genomic regions correctly. PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method transforms these special paired-end reads into long and near error-free PS sequences, which correspond in length to the largest insert sizes. The PS construction has three advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5-fold improvement over existing de novo assembly methods and a 3-fold advantage over the assembly of long reads from 454 sequencing. CONCLUSIONS: Our method generated near error-free long reads from NGS paired-end sequencing. We demonstrated that de novo assembly benefits substantially from these Sanger-like reads. In addition, the long reads could be applied to structural variation detection and metagenomics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-14-711) contains supplementary material, which is available to authorized users.
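The abstract reports its assembly result as an N50 contig length (190 kb). For readers unfamiliar with the metric, the following is a minimal sketch of the standard N50 definition: the largest length L such that contigs of length L or longer together cover at least half of all assembled bases. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the article or its software.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Standard definition: sort contigs from longest to shortest and
    accumulate their lengths; the N50 is the length of the contig at
    which the running sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)  # total is 300, so half is 150; 80 + 70 reaches it
```

Because N50 weights contigs by the bases they contain, a few long contigs raise it far more than many short ones, which is why it is commonly used to compare assembly contiguity.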
format Online
Article
Text
id pubmed-4046676
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4046676 2014-06-06 Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology BMC Genomics Methodology Article BioMed Central 2013-10-17 /pmc/articles/PMC4046676/ /pubmed/24134808 http://dx.doi.org/10.1186/1471-2164-14-711 Text en © Ruan et al.; licensee BioMed Central Ltd. 2013 This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology
topic Methodology Article