Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications


Bibliographic Details
Main authors:	He, Ziwen; Li, Xinnian; Ling, Shaoping; Fu, Yun-Xin; Hungate, Eric; Shi, Suhua; Wu, Chung-I
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750404/ https://www.ncbi.nlm.nih.gov/pubmed/23919637 http://dx.doi.org/10.1186/1471-2164-14-535

collection	PubMed
description	BACKGROUND: As the error rate is high and the distribution of errors across sites is non-uniform in next-generation sequencing (NGS) data, it has been a challenge to estimate DNA polymorphism (θ) accurately from NGS data. RESULTS: By computer simulations, we compare the two methods of data acquisition: sequencing each diploid individual separately and sequencing the pooled sample. Under the current NGS error rate, sequencing each individual separately offers little advantage unless the coverage per individual is high (>20X). We hence propose a new method for estimating θ from pooled samples that have been subjected to two separate rounds of DNA sequencing. Since errors from the two sequencing applications are usually non-overlapping, it is possible to separate low-frequency polymorphisms from sequencing errors. Simulation results show that the dual applications method is reliable even when the error rate is high and θ is low. CONCLUSIONS: In studies of natural populations where the sequencing coverage is usually modest (~2X per individual), the dual applications method on pooled samples should be a reasonable choice.
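The description's key argument, that errors from two independent sequencing runs of the same pool rarely coincide while real polymorphisms recur in both, can be illustrated with a toy simulation. This is a minimal sketch under assumptions made here for illustration, not the authors' method: the uniform four-base error model, the hypothetical helpers `sequence_pool`, `alt_bases`, `call_segregating` and `simulate`, every parameter value, and the use of Watterson's estimator (θ_W = S / (a_n L), with a_n = Σ_{i=1}^{n-1} 1/i) as a familiar stand-in, since the record does not say which θ estimator the paper uses.

```python
import random
from collections import Counter

BASES = "ACGT"


def sequence_pool(pool, depth, error_rate, rng):
    """Simulate one sequencing run of a pooled sample at a single site.

    Each read copies the base of a randomly chosen chromosome in `pool`; with
    probability `error_rate` it is misread as one of the three other bases,
    chosen uniformly (a simplified error model assumed for illustration).
    """
    counts = Counter()
    for _ in range(depth):
        base = rng.choice(pool)
        if rng.random() < error_rate:
            base = rng.choice([b for b in BASES if b != base])
        counts[base] += 1
    return counts


def alt_bases(run, ref, min_count):
    """Non-reference bases observed in at least `min_count` reads of one run."""
    return {b for b, c in run.items() if b != ref and c >= min_count}


def call_segregating(run1, run2, ref, min_count=1):
    """Hypothetical dual-run filter: call a site polymorphic only if the same
    non-reference base passes `min_count` in BOTH independent runs."""
    return bool(alt_bases(run1, ref, min_count) & alt_bases(run2, ref, min_count))


def watterson_theta(num_segregating, n_chromosomes, num_sites):
    """Watterson's estimator per site: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return num_segregating / (a_n * num_sites)


def simulate(num_sites=10_000, n_chromosomes=40, depth=40, error_rate=0.01,
             poly_fraction=0.005, min_count=2, seed=1):
    """Compare single-run and dual-run polymorphism calls on simulated pools.

    All defaults are illustrative assumptions, not values from the paper
    (40 chromosomes at depth 40 is roughly 2X per diploid individual).
    """
    rng = random.Random(seed)
    truth = single = dual = 0
    for _ in range(num_sites):
        pool = ["A"] * n_chromosomes                # reference base "A" everywhere
        if rng.random() < poly_fraction:            # a truly segregating site
            truth += 1
            k = rng.randint(1, n_chromosomes - 1)   # derived-allele count, uniform for simplicity
            pool[:k] = ["G"] * k
        run1 = sequence_pool(pool, depth, error_rate, rng)
        run2 = sequence_pool(pool, depth, error_rate, rng)
        # Single run: any sufficiently supported non-reference base is called.
        single += bool(alt_bases(run1, "A", min_count))
        # Dual runs: the same non-reference base must appear in both runs.
        dual += call_segregating(run1, run2, "A", min_count)
    return {
        "true_sites": truth,
        "single_run_calls": single,
        "dual_run_calls": dual,
        "theta_w_true_sites": watterson_theta(truth, n_chromosomes, num_sites),
        "theta_w_single": watterson_theta(single, n_chromosomes, num_sites),
        "theta_w_dual": watterson_theta(dual, n_chromosomes, num_sites),
    }
```

For example, `simulate()` returns the number of truly segregating sites together with the single-run and dual-run calls and their Watterson estimates. With these illustrative defaults, single-run calls should be inflated by sequencing errors, while dual-run calls should land much closer to the true count. Lowering `min_count` to 1 lets more coincident errors through both filters; raising it loses more low-frequency variants. A real method would model error probabilities rather than rely on a presence filter.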
id	pubmed-3750404
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Genomics (published 2013-08-07)
rights	Copyright © 2013 He et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
