Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly

Next-generation-sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among the challenges, GC bias in NG...

Bibliographic Details
Main Authors: Chen, Yen-Chun; Liu, Tsunglin; Yu, Chun-Hui; Chiang, Tzen-Yuh; Hwang, Chi-Chuan
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3639258/
https://www.ncbi.nlm.nih.gov/pubmed/23638157
http://dx.doi.org/10.1371/journal.pone.0062856
_version_ 1782475929902120960
author Chen, Yen-Chun
Liu, Tsunglin
Yu, Chun-Hui
Chiang, Tzen-Yuh
Hwang, Chi-Chuan
author_facet Chen, Yen-Chun
Liu, Tsunglin
Yu, Chun-Hui
Chiang, Tzen-Yuh
Hwang, Chi-Chuan
author_sort Chen, Yen-Chun
collection PubMed
description Next-generation-sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among the challenges, GC bias in NGS data is known to aggravate genome assembly. However, it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis on the effects of GC bias on genome assembly. Our analyses reveal that GC bias only lowers assembly completeness when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation due to GC bias can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation because of GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These pieces of information provide guidance toward a better de novo genome assembly in the presence of GC bias.
format Online
Article
Text
id pubmed-3639258
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3639258 2013-05-01 Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly Chen, Yen-Chun Liu, Tsunglin Yu, Chun-Hui Chiang, Tzen-Yuh Hwang, Chi-Chuan PLoS One Research Article Next-generation-sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among the challenges, GC bias in NGS data is known to aggravate genome assembly. However, it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis on the effects of GC bias on genome assembly. Our analyses reveal that GC bias only lowers assembly completeness when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation due to GC bias can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation because of GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These pieces of information provide guidance toward a better de novo genome assembly in the presence of GC bias. Public Library of Science 2013-04-29 /pmc/articles/PMC3639258/ /pubmed/23638157 http://dx.doi.org/10.1371/journal.pone.0062856 Text en © 2013 Chen et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Chen, Yen-Chun
Liu, Tsunglin
Yu, Chun-Hui
Chiang, Tzen-Yuh
Hwang, Chi-Chuan
Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly
title Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly
title_full Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly
title_fullStr Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly
title_full_unstemmed Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly
title_short Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly
title_sort effects of gc bias in next-generation-sequencing data on de novo genome assembly
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3639258/
https://www.ncbi.nlm.nih.gov/pubmed/23638157
http://dx.doi.org/10.1371/journal.pone.0062856
work_keys_str_mv AT chenyenchun effectsofgcbiasinnextgenerationsequencingdataondenovogenomeassembly
AT liutsunglin effectsofgcbiasinnextgenerationsequencingdataondenovogenomeassembly
AT yuchunhui effectsofgcbiasinnextgenerationsequencingdataondenovogenomeassembly
AT chiangtzenyuh effectsofgcbiasinnextgenerationsequencingdataondenovogenomeassembly
AT hwangchichuan effectsofgcbiasinnextgenerationsequencingdataondenovogenomeassembly