Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications
BACKGROUND: As the error rate is high and the distribution of errors across sites is non-uniform in next generation sequencing (NGS) data, it has been a challenge to estimate DNA polymorphism (θ) accurately from NGS data. RESULTS: By computer simulations, we compare the two methods of data acquisiti...
Main Authors: | He, Ziwen; Li, Xinnian; Ling, Shaoping; Fu, Yun-Xin; Hungate, Eric; Shi, Suhua; Wu, Chung-I |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750404/ https://www.ncbi.nlm.nih.gov/pubmed/23919637 http://dx.doi.org/10.1186/1471-2164-14-535 |
_version_ | 1782477118214504448 |
---|---|
author | He, Ziwen Li, Xinnian Ling, Shaoping Fu, Yun-Xin Hungate, Eric Shi, Suhua Wu, Chung-I |
author_sort | He, Ziwen |
collection | PubMed |
description | BACKGROUND: As the error rate is high and the distribution of errors across sites is non-uniform in next generation sequencing (NGS) data, it has been a challenge to estimate DNA polymorphism (θ) accurately from NGS data. RESULTS: By computer simulations, we compare the two methods of data acquisition - sequencing each diploid individual separately and sequencing the pooled sample. Under the current NGS error rate, sequencing each individual separately offers little advantage unless the coverage per individual is high (>20X). We hence propose a new method for estimating θ from pooled samples that have been subjected to two separate rounds of DNA sequencing. Since errors from the two sequencing applications are usually non-overlapping, it is possible to separate low frequency polymorphisms from sequencing errors. Simulation results show that the dual applications method is reliable even when the error rate is high and θ is low. CONCLUSIONS: In studies of natural populations where the sequencing coverage is usually modest (~2X per individual), the dual applications method on pooled samples should be a reasonable choice. |
format | Online Article Text |
id | pubmed-3750404 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3750404 2013-08-27 Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications He, Ziwen Li, Xinnian Ling, Shaoping Fu, Yun-Xin Hungate, Eric Shi, Suhua Wu, Chung-I BMC Genomics Methodology Article BioMed Central 2013-08-07 /pmc/articles/PMC3750404/ /pubmed/23919637 http://dx.doi.org/10.1186/1471-2164-14-535 Text en Copyright © 2013 He et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article He, Ziwen Li, Xinnian Ling, Shaoping Fu, Yun-Xin Hungate, Eric Shi, Suhua Wu, Chung-I Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications |
title | Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750404/ https://www.ncbi.nlm.nih.gov/pubmed/23919637 http://dx.doi.org/10.1186/1471-2164-14-535 |
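
The abstract describes separating low frequency polymorphisms from sequencing errors by requiring agreement between two independent sequencing applications of the same pooled sample. The following is a minimal sketch of that idea, not the authors' estimator: it assumes each site is summarized by the minor-allele read count in each run, treats a site as polymorphic only when the minor allele appears in both runs, and then plugs the confirmed count into the standard Watterson estimator θ = S / (a_n · L). The function names, the `min_reads` threshold, and the use of the pool's chromosome count as n are all illustrative assumptions; the record itself does not specify them.

```python
from typing import Sequence, Tuple

# One site = (minor-allele reads in run 1, minor-allele reads in run 2).
# This data layout is an assumption made for illustration only.
SiteCounts = Tuple[int, int]


def harmonic_constant(num_chromosomes: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i, the Watterson normalizing constant."""
    return sum(1.0 / i for i in range(1, num_chromosomes))


def count_confirmed_sites(sites: Sequence[SiteCounts], min_reads: int = 1) -> int:
    """Count sites whose minor allele is seen in BOTH sequencing runs.

    A variant present in only one run is treated as a likely sequencing
    error, on the premise that errors from two runs rarely coincide.
    """
    return sum(1 for run1, run2 in sites if run1 >= min_reads and run2 >= min_reads)


def watterson_theta(num_segregating: int, num_chromosomes: int, num_sites: int) -> float:
    """Per-site Watterson estimate: S / (a_n * L)."""
    if num_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    if num_sites <= 0:
        raise ValueError("number of sites must be positive")
    return num_segregating / (harmonic_constant(num_chromosomes) * num_sites)


if __name__ == "__main__":
    # Hypothetical toy data: five variable-looking sites out of 1000 surveyed.
    toy_sites = [(3, 2), (1, 0), (0, 1), (2, 4), (0, 0)]
    confirmed = count_confirmed_sites(toy_sites)
    theta = watterson_theta(confirmed, num_chromosomes=4, num_sites=1000)
    print(f"confirmed sites: {confirmed}, theta per site: {theta:.6f}")
```

In this sketch, raising `min_reads` trades sensitivity to rare variants for stronger protection against coincident errors. A real analysis of pooled data would also need to account for uneven sampling of chromosomes at modest coverage, which this example ignores.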