Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing

BACKGROUND: Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the...

Bibliographic Details
Main Authors: Chen, Ting-Wen, Gan, Ruei-Chi, Chang, Yi-Feng, Liao, Wei-Chao, Wu, Timothy H., Lee, Chi-Ching, Huang, Po-Jung, Lee, Cheng-Yang, Chen, Yi-Ywan M., Chiu, Cheng-Hsun, Tang, Petrus
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552406/
https://www.ncbi.nlm.nih.gov/pubmed/26315384
http://dx.doi.org/10.1186/s12864-015-1859-8
author Chen, Ting-Wen
Gan, Ruei-Chi
Chang, Yi-Feng
Liao, Wei-Chao
Wu, Timothy H.
Lee, Chi-Ching
Huang, Po-Jung
Lee, Cheng-Yang
Chen, Yi-Ywan M.
Chiu, Cheng-Hsun
Tang, Petrus
collection PubMed
description BACKGROUND: Whole genome sequence construction is becoming increasingly feasible because of advances in next-generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood. RESULTS: We used 250 bp Illumina MiSeq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes cause insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results. CONCLUSIONS: In summary, this study provides systematic evaluations of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1859-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4552406
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4552406 2015-08-29 BMC Genomics Research Article BioMed Central 2015-08-28 /pmc/articles/PMC4552406/ /pubmed/26315384 http://dx.doi.org/10.1186/s12864-015-1859-8 Text en © Chen et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552406/
https://www.ncbi.nlm.nih.gov/pubmed/26315384
http://dx.doi.org/10.1186/s12864-015-1859-8