Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing
BACKGROUND: Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the...
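Main Authors: | Chen, Ting-Wen; Gan, Ruei-Chi; Chang, Yi-Feng; Liao, Wei-Chao; Wu, Timothy H.; Lee, Chi-Ching; Huang, Po-Jung; Lee, Cheng-Yang; Chen, Yi-Ywan M.; Chiu, Cheng-Hsun; Tang, Petrus |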
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552406/ https://www.ncbi.nlm.nih.gov/pubmed/26315384 http://dx.doi.org/10.1186/s12864-015-1859-8 |
_version_ | 1782387721890693120 |
---|---|
author | Chen, Ting-Wen; Gan, Ruei-Chi; Chang, Yi-Feng; Liao, Wei-Chao; Wu, Timothy H.; Lee, Chi-Ching; Huang, Po-Jung; Lee, Cheng-Yang; Chen, Yi-Ywan M.; Chiu, Cheng-Hsun; Tang, Petrus |
author_sort | Chen, Ting-Wen |
collection | PubMed |
description | BACKGROUND: Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood. RESULTS: We used 250 bp Illumina MiSeq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes causes insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results. CONCLUSIONS: In summary, this study provides systematic evaluations of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1859-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4552406 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4552406 2015-08-29 Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing Chen, Ting-Wen; Gan, Ruei-Chi; Chang, Yi-Feng; Liao, Wei-Chao; Wu, Timothy H.; Lee, Chi-Ching; Huang, Po-Jung; Lee, Cheng-Yang; Chen, Yi-Ywan M.; Chiu, Cheng-Hsun; Tang, Petrus BMC Genomics Research Article BACKGROUND: Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood. RESULTS: We used 250 bp Illumina MiSeq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes causes insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results. CONCLUSIONS: In summary, this study provides systematic evaluations of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1859-8) contains supplementary material, which is available to authorized users. BioMed Central 2015-08-28 /pmc/articles/PMC4552406/ /pubmed/26315384 http://dx.doi.org/10.1186/s12864-015-1859-8 Text en © Chen et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552406/ https://www.ncbi.nlm.nih.gov/pubmed/26315384 http://dx.doi.org/10.1186/s12864-015-1859-8 |