Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications

Bibliographic Details
Main authors: He, Ziwen, Li, Xinnian, Ling, Shaoping, Fu, Yun-Xin, Hungate, Eric, Shi, Suhua, Wu, Chung-I
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750404/
https://www.ncbi.nlm.nih.gov/pubmed/23919637
http://dx.doi.org/10.1186/1471-2164-14-535
Description
Summary: BACKGROUND: As the error rate is high and the distribution of errors across sites is non-uniform in next generation sequencing (NGS) data, it has been a challenge to estimate DNA polymorphism (θ) accurately from NGS data. RESULTS: By computer simulations, we compare the two methods of data acquisition: sequencing each diploid individual separately and sequencing the pooled sample. Under the current NGS error rate, sequencing each individual separately offers little advantage unless the coverage per individual is high (>20X). We hence propose a new method for estimating θ from pooled samples that have been subjected to two separate rounds of DNA sequencing. Since errors from the two sequencing applications are usually non-overlapping, it is possible to separate low frequency polymorphisms from sequencing errors. Simulation results show that the dual applications method is reliable even when the error rate is high and θ is low. CONCLUSIONS: In studies of natural populations where the sequencing coverage is usually modest (~2X per individual), the dual applications method on pooled samples should be a reasonable choice.
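
The intuition behind the dual applications idea can be illustrated with a toy simulation. The sketch below is not the authors' estimator, which this record does not spell out. It assumes a deliberately simple model: every true variant site is detected in both sequencing applications, and every other site is independently miscalled as variant with a fixed probability. Watterson's estimator, S / (a_n · L), is swapped in as a standard stand-in for θ. All function names, the error rate, and the sample sizes are hypothetical choices made for illustration.

```python
import random


def harmonic_a(n_chromosomes):
    """Watterson's a_n = sum of 1/i for i = 1 .. n-1."""
    return sum(1.0 / i for i in range(1, n_chromosomes))


def watterson_theta(n_segregating, n_chromosomes, n_sites):
    """Per-site Watterson estimate: S / (a_n * L)."""
    return n_segregating / (harmonic_a(n_chromosomes) * n_sites)


def simulate_calls(true_sites, n_sites, error_rate, rng):
    """Sites called variant in one sequencing application.

    Toy assumption: all true variants are detected, and each remaining
    site is miscalled as variant with probability error_rate.
    """
    false_calls = {
        s for s in range(n_sites)
        if s not in true_sites and rng.random() < error_rate
    }
    return set(true_sites) | false_calls


def dual_application_calls(calls_a, calls_b):
    """Keep only sites called variant in both applications."""
    return calls_a & calls_b


rng = random.Random(1)  # fixed seed so the run is repeatable
n_sites, n_chromosomes = 100_000, 20  # illustrative values only
true_sites = set(rng.sample(range(n_sites), 100))

run_a = simulate_calls(true_sites, n_sites, 0.01, rng)
run_b = simulate_calls(true_sites, n_sites, 0.01, rng)
both = dual_application_calls(run_a, run_b)

theta_true = watterson_theta(len(true_sites), n_chromosomes, n_sites)
theta_single = watterson_theta(len(run_a), n_chromosomes, n_sites)
theta_dual = watterson_theta(len(both), n_chromosomes, n_sites)

print("true:  ", theta_true)
print("single:", theta_single)
print("dual:  ", theta_dual)
```

Because independent errors rarely hit the same site twice, the intersection of the two call sets discards most false variants, so under these assumptions the dual estimate sits between the true value and the single-application estimate. A realistic analysis would also have to model coverage, missed low-frequency variants, and non-uniform error rates across sites, none of which this sketch attempts.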