Evaluation of Allele Frequency Estimation Using Pooled Sequencing Data Simulation

Bibliographic Details
Main Authors: Guo, Yan; Samuels, David C.; Li, Jiang; Clark, Travis; Li, Chung-I; Shyr, Yu
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582166/
https://www.ncbi.nlm.nih.gov/pubmed/23476151
http://dx.doi.org/10.1155/2013/895496
author Guo, Yan
Samuels, David C.
Li, Jiang
Clark, Travis
Li, Chung-I
Shyr, Yu
author_sort Guo, Yan
collection PubMed
description Next-generation sequencing (NGS) technology has provided researchers with opportunities to study the genome in unprecedented detail. In particular, NGS is applied to disease association studies. Unlike genotyping chips, NGS is not limited to a fixed set of SNPs. Prices for NGS are now comparable to the SNP chip, although for large studies the cost can be substantial. Pooling techniques are often used to reduce the overall cost of large-scale studies. In this study, we designed a rigorous simulation model to test the practicability of estimating allele frequency from pooled sequencing data. We took crucial factors into consideration, including pool size, overall depth, average depth per sample, pooling variation, and sampling variation. We used real data to demonstrate and measure reference allele preference in DNAseq data and implemented this bias in our simulation model. We found that pooled sequencing data can introduce high levels of relative error rate (defined as error rate divided by targeted allele frequency) and that the error rate is more severe for low minor allele frequency SNPs than for high minor allele frequency SNPs. In order to overcome the error introduced by pooling, we recommend a large pool size and high average depth per sample.
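The factors named in this description (pool size, overall depth, pooling variation, sampling variation, reference allele preference) and the relative error rate can be illustrated with a minimal Python sketch. This is not the authors' simulation model. The function names, the Gaussian pooling weights, the `ref_bias` parameterization (0.5 means no preference, and the value must lie strictly between 0 and 1), and the reading of "error rate" as the absolute difference between estimate and target are all assumptions made for illustration.

```python
import random


def relative_error(estimate, target_freq):
    """Absolute estimation error divided by the targeted allele frequency.

    Assumption: 'error rate' is taken to mean |estimate - target|.
    """
    return abs(estimate - target_freq) / target_freq


def simulate_pooled_estimate(target_freq, pool_size, total_depth,
                             ref_bias=0.5, pooling_sd=0.0, rng=None):
    """Simulate one pooled sequencing run at a single SNP (hypothetical model).

    Returns the fraction of reads that carry the non-reference allele.
    Average depth per sample is total_depth / pool_size.
    """
    rng = rng or random.Random()

    # Sampling variation: each diploid individual carries 0, 1, or 2 copies
    # of the non-reference allele, drawn at the target frequency.
    alt_copies = [(rng.random() < target_freq) + (rng.random() < target_freq)
                  for _ in range(pool_size)]

    # Pooling variation: individuals contribute unequal amounts of DNA.
    # Weights are clamped to stay positive (an assumption of this sketch).
    weights = [max(rng.gauss(1.0, pooling_sd), 1e-9) for _ in range(pool_size)]
    pool_freq = (sum(w * c / 2 for w, c in zip(weights, alt_copies))
                 / sum(weights))

    # Reference allele preference: with ref_bias = 0.5 a read reports the
    # non-reference allele with probability pool_freq; larger values favor
    # the reference allele.
    alt_weight = pool_freq * (1 - ref_bias)
    ref_weight = (1 - pool_freq) * ref_bias
    p_alt = alt_weight / (alt_weight + ref_weight)

    # Read sampling: each of the total_depth reads is drawn independently.
    alt_reads = sum(rng.random() < p_alt for _ in range(total_depth))
    return alt_reads / total_depth


if __name__ == "__main__":
    rng = random.Random(1)
    target = 0.05
    estimates = [simulate_pooled_estimate(target, pool_size=50,
                                          total_depth=2000, ref_bias=0.52,
                                          pooling_sd=0.2, rng=rng)
                 for _ in range(200)]
    mean_rel = sum(relative_error(e, target) for e in estimates) / len(estimates)
    print(f"mean relative error at target {target}: {mean_rel:.3f}")
```

Repeating the loop with different `target`, `pool_size`, and `total_depth` values gives a rough way to compare how the relative error behaves for low versus high minor allele frequencies under these assumptions.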
format Online
Article
Text
id pubmed-3582166
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3582166 2013-03-09 ScientificWorldJournal Research Article Hindawi Publishing Corporation 2013-02-07 /pmc/articles/PMC3582166/ /pubmed/23476151 http://dx.doi.org/10.1155/2013/895496 Text en Copyright © 2013 Yan Guo et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluation of Allele Frequency Estimation Using Pooled Sequencing Data Simulation
topic Research Article