
Estimation of allele frequency and association mapping using next-generation sequencing data

BACKGROUND: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequ...


Bibliographic Details
Main Authors: Kim, Su Yeon, Lohmueller, Kirk E, Albrechtsen, Anders, Li, Yingrui, Korneliussen, Thorfinn, Tian, Geng, Grarup, Niels, Jiang, Tao, Andersen, Gitte, Witte, Daniel, Jorgensen, Torben, Hansen, Torben, Pedersen, Oluf, Wang, Jun, Nielsen, Rasmus
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3212839/
https://www.ncbi.nlm.nih.gov/pubmed/21663684
http://dx.doi.org/10.1186/1471-2105-12-231
author Kim, Su Yeon
Lohmueller, Kirk E
Albrechtsen, Anders
Li, Yingrui
Korneliussen, Thorfinn
Tian, Geng
Grarup, Niels
Jiang, Tao
Andersen, Gitte
Witte, Daniel
Jorgensen, Torben
Hansen, Torben
Pedersen, Oluf
Wang, Jun
Nielsen, Rasmus
author_sort Kim, Su Yeon
collection PubMed
description BACKGROUND: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates. RESULTS: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. CONCLUSIONS: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
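The likelihood approach summarized in the description can be illustrated with a minimal sketch. It assumes a biallelic site, Hardy-Weinberg proportions, and precomputed per-individual genotype likelihoods for 0, 1, and 2 copies of the minor allele. The EM update, the likelihood ratio test, and every function name below are illustrative assumptions, not the authors' implementation.

```python
import math


def hwe_priors(f):
    """Genotype probabilities for 0, 1, 2 minor alleles under Hardy-Weinberg (assumed)."""
    return ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)


def log_likelihood(gls, f):
    """Log-likelihood of frequency f, summing over the three genotypes per individual.

    gls is a list of (L0, L1, L2) genotype likelihoods, one tuple per individual.
    Fails with a math domain error if an individual has zero likelihood at f.
    """
    priors = hwe_priors(f)
    return sum(math.log(sum(l * p for l, p in zip(gl, priors))) for gl in gls)


def estimate_frequency(gls, f=0.2, tol=1e-8, max_iter=1000):
    """EM estimate of the allele frequency without calling genotypes first."""
    for _ in range(max_iter):
        priors = hwe_priors(f)
        expected_alleles = 0.0
        for gl in gls:
            # Unnormalized posterior over genotypes for this individual.
            post = [l * p for l, p in zip(gl, priors)]
            total = sum(post)
            # Expected number of minor alleles carried by this individual.
            expected_alleles += (post[1] + 2 * post[2]) / total
        new_f = expected_alleles / (2 * len(gls))
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


def association_lrt(case_gls, control_gls):
    """Likelihood ratio test: one shared frequency versus separate case/control frequencies.

    Returns (statistic, p_value), using a chi-square distribution with 1 degree
    of freedom as the assumed null distribution.
    """
    pooled = case_gls + control_gls
    f_null = estimate_frequency(pooled)
    f_case = estimate_frequency(case_gls)
    f_ctrl = estimate_frequency(control_gls)
    stat = 2 * (
        log_likelihood(case_gls, f_case)
        + log_likelihood(control_gls, f_ctrl)
        - log_likelihood(pooled, f_null)
    )
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `estimate_frequency([(1, 0, 0)] * 3 + [(0, 0, 1)])` treats three individuals as certain major-allele homozygotes and one as a certain minor-allele homozygote. In practice each tuple would come from the read data, so a low-coverage individual contributes a flatter likelihood rather than a hard genotype call.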
format Online
Article
Text
id pubmed-3212839
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3212839 2011-11-11 Estimation of allele frequency and association mapping using next-generation sequencing data Kim, Su Yeon Lohmueller, Kirk E Albrechtsen, Anders Li, Yingrui Korneliussen, Thorfinn Tian, Geng Grarup, Niels Jiang, Tao Andersen, Gitte Witte, Daniel Jorgensen, Torben Hansen, Torben Pedersen, Oluf Wang, Jun Nielsen, Rasmus BMC Bioinformatics Research Article BACKGROUND: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates. RESULTS: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. CONCLUSIONS: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score. BioMed Central 2011-06-11 /pmc/articles/PMC3212839/ /pubmed/21663684 http://dx.doi.org/10.1186/1471-2105-12-231 Text en Copyright ©2011 Kim et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Estimation of allele frequency and association mapping using next-generation sequencing data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3212839/
https://www.ncbi.nlm.nih.gov/pubmed/21663684
http://dx.doi.org/10.1186/1471-2105-12-231