Optimizing Information in Next-Generation-Sequencing (NGS) Reads for Improving De Novo Genome Assembly
Next-Generation-Sequencing is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, making de novo genome assembly very challenging. Because genome assembly is essential for all downs...
Main authors: | Liu, Tsunglin; Tsai, Cheng-Hung; Lee, Wen-Bin; Chiang, Jung-Hsien |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3726674/ https://www.ncbi.nlm.nih.gov/pubmed/23922726 http://dx.doi.org/10.1371/journal.pone.0069503 |
_version_ | 1782278687212699648 |
---|---|
author | Liu, Tsunglin Tsai, Cheng-Hung Lee, Wen-Bin Chiang, Jung-Hsien |
author_sort | Liu, Tsunglin |
collection | PubMed |
description | Next-Generation-Sequencing is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, making de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires the presence of long reads or long distance information. To improve de novo genome assembly, we developed a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes as input Illumina paired-end (PE) reads and recovers the original DNA fragments from whose two ends the paired reads were obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved >98% of perfect DNA fragment recovery. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased the assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. The assembly accuracies dropped, however, but still remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity, but future error correction is needed for ARF-PE to also increase the assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/. |
format | Online Article Text |
id | pubmed-3726674 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3726674 2013-08-06 Optimizing Information in Next-Generation-Sequencing (NGS) Reads for Improving De Novo Genome Assembly Liu, Tsunglin Tsai, Cheng-Hung Lee, Wen-Bin Chiang, Jung-Hsien PLoS One Research Article Public Library of Science 2013-07-29 /pmc/articles/PMC3726674/ /pubmed/23922726 http://dx.doi.org/10.1371/journal.pone.0069503 Text en © 2013 Liu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Optimizing Information in Next-Generation-Sequencing (NGS) Reads for Improving De Novo Genome Assembly |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3726674/ https://www.ncbi.nlm.nih.gov/pubmed/23922726 http://dx.doi.org/10.1371/journal.pone.0069503 |
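
The description above reports assembly contiguity as N50 lengths. As a reminder of what that statistic measures, the following is a minimal Python sketch under the common definition: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembled length. The function name `n50` and the contig lengths are illustrative assumptions; they are not taken from ARF-PE or from any of the assemblers named in the record.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the cumulative length first reaches
    at least half of the total assembly length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
before = [100, 80, 60, 40, 20]
# The same total length after two contigs (60 and 40) are joined into one.
after = [100, 100, 80, 20]

print(n50(before))
print(n50(after))
```

Joining contigs leaves the total length unchanged but shifts more of it into long contigs, which is why a higher N50 is read as better contiguity.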