
Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications

BACKGROUND: As the error rate is high and the distribution of errors across sites is non-uniform in next generation sequencing (NGS) data, it has been a challenge to estimate DNA polymorphism (θ) accurately from NGS data. RESULTS: By computer simulations, we compare the two methods of data acquisiti...


Bibliographic Details
Main Authors: He, Ziwen, Li, Xinnian, Ling, Shaoping, Fu, Yun-Xin, Hungate, Eric, Shi, Suhua, Wu, Chung-I
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750404/
https://www.ncbi.nlm.nih.gov/pubmed/23919637
http://dx.doi.org/10.1186/1471-2164-14-535
author He, Ziwen
Li, Xinnian
Ling, Shaoping
Fu, Yun-Xin
Hungate, Eric
Shi, Suhua
Wu, Chung-I
collection PubMed
description BACKGROUND: As the error rate is high and the distribution of errors across sites is non-uniform in next generation sequencing (NGS) data, it has been a challenge to estimate DNA polymorphism (θ) accurately from NGS data. RESULTS: By computer simulations, we compare the two methods of data acquisition - sequencing each diploid individual separately and sequencing the pooled sample. Under the current NGS error rate, sequencing each individual separately offers little advantage unless the coverage per individual is high (>20X). We hence propose a new method for estimating θ from pooled samples that have been subjected to two separate rounds of DNA sequencing. Since errors from the two sequencing applications are usually non-overlapping, it is possible to separate low frequency polymorphisms from sequencing errors. Simulation results show that the dual applications method is reliable even when the error rate is high and θ is low. CONCLUSIONS: In studies of natural populations where the sequencing coverage is usually modest (~2X per individual), the dual applications method on pooled samples should be a reasonable choice.
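The idea in the abstract, that errors from two separate sequencing rounds rarely fall on the same site, can be illustrated with a minimal sketch. This is not the authors' estimator, which the abstract does not specify. It assumes, for illustration only, that every true variant is detected in both rounds, that errors land on uniformly random sites, and that θ is estimated with Watterson's estimator; all function names and parameter values below are hypothetical.

```python
import random


def harmonic_number(n):
    """Return a_n = sum_{i=1}^{n-1} 1/i, the normaliser in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(num_segregating, num_chromosomes, num_sites):
    """Per-site Watterson estimate: S / (a_n * L)."""
    return num_segregating / (harmonic_number(num_chromosomes) * num_sites)


def simulate_round(true_variants, num_sites, num_errors, rng):
    """One sequencing round: the true variant sites plus randomly placed error sites.

    Assumption: every true variant is observed; real low-coverage data would miss some.
    """
    errors = set(rng.sample(range(num_sites), num_errors))
    return set(true_variants) | errors


def dual_round_variants(round_a, round_b):
    """Keep only the sites called as variant in both rounds."""
    return round_a & round_b


# Hypothetical parameters, chosen only to make the effect visible.
rng = random.Random(0)
num_sites = 100_000
num_chromosomes = 20
true_variants = set(rng.sample(range(num_sites), 200))

round_a = simulate_round(true_variants, num_sites, 1000, rng)
round_b = simulate_round(true_variants, num_sites, 1000, rng)
shared = dual_round_variants(round_a, round_b)

theta_true = watterson_theta(len(true_variants), num_chromosomes, num_sites)
theta_single = watterson_theta(len(round_a), num_chromosomes, num_sites)
theta_dual = watterson_theta(len(shared), num_chromosomes, num_sites)

print(f"true: {theta_true:.5f}  single round: {theta_single:.5f}  dual: {theta_dual:.5f}")
```

In this toy setting a single round counts every error site as a segregating site and so inflates the estimate, while the intersection of the two rounds retains only the few error sites that coincide by chance.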
id pubmed-3750404
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3750404 2013-08-27 BMC Genomics Methodology Article BioMed Central 2013-08-07 /pmc/articles/PMC3750404/ /pubmed/23919637 http://dx.doi.org/10.1186/1471-2164-14-535 Text en Copyright © 2013 He et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article