
An EM algorithm based on an internal list for estimating haplotype distributions of rare variants from pooled genotype data

BACKGROUND: Pooling is a cost-effective way to collect data for genetic association studies, particularly for rare genetic variants. It is of interest to estimate the haplotype frequencies, which contain more information than single-locus statistics. By viewing the pooled genotype data as incomplete...


Bibliographic Details
Main Authors:	Kuk, Anthony YC; Li, Xiang; Xu, Jinfeng
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3847674/ https://www.ncbi.nlm.nih.gov/pubmed/24034507 http://dx.doi.org/10.1186/1471-2156-14-82

author	Kuk, Anthony YC; Li, Xiang; Xu, Jinfeng
collection	PubMed
description	BACKGROUND: Pooling is a cost-effective way to collect data for genetic association studies, particularly for rare genetic variants. It is of interest to estimate the haplotype frequencies, which contain more information than single-locus statistics. When the pooled genotype data are viewed as incomplete data, the expectation-maximization (EM) algorithm is the natural choice, but it is computationally intensive. A recent proposal to reduce the computational burden is to use database information to form a list of frequently occurring haplotypes, and to restrict the EM algorithm to haplotypes on this list. There is, however, the danger of using an incorrect list, and in some applications there may not be enough database information to form a list externally. RESULTS: We investigate the possibility of creating an internal list from the data at hand. One way to form such a list is to collapse the observed total minor allele frequencies to “zero” or “at least one”, which is shown to have the desirable effect of amplifying the haplotype frequencies. To improve coverage, we propose ways to add and remove haplotypes from the list, and a benchmarking method to determine the frequency threshold for removing haplotypes. Simulation results show that the EM estimates based on a suitably augmented and trimmed collapsed data list (ATCDL) perform satisfactorily. In two scenarios involving 25 and 32 loci respectively, the EM-ATCDL estimates outperform the EM estimates based on other lists as well as the collapsed data maximum likelihood estimates. CONCLUSIONS: The proposed augmented and trimmed CD list is a useful list on which to base the EM algorithm when estimating the haplotype distributions of rare variants. It can handle more markers and larger pool sizes than existing methods, and the resulting EM-ATCDL estimates are more efficient than the EM estimates based on other lists.
format	Online Article Text
id	pubmed-3847674
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3847674 2013-12-05 BMC Genet Methodology Article BioMed Central 2013-09-13 /pmc/articles/PMC3847674/ /pubmed/24034507 http://dx.doi.org/10.1186/1471-2156-14-82 Text en Copyright © 2013 Kuk et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An EM algorithm based on an internal list for estimating haplotype distributions of rare variants from pooled genotype data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3847674/ https://www.ncbi.nlm.nih.gov/pubmed/24034507 http://dx.doi.org/10.1186/1471-2156-14-82
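
The collapsing step mentioned in the description can be illustrated with a minimal sketch. It assumes each pool is recorded as a tuple of total minor allele counts per locus, and it treats each distinct collapsed pattern as one candidate entry for an internal list. The function names and the toy data are hypothetical and are not taken from the article; the augmentation, trimming, and EM steps are not shown.

```python
from collections import Counter


def collapse_pool(minor_allele_counts):
    """Collapse each locus count of one pool to 0 ("zero") or 1 ("at least one")."""
    return tuple(1 if count > 0 else 0 for count in minor_allele_counts)


def collapsed_list(pools):
    """Return distinct collapsed patterns with their relative frequencies.

    Patterns are ordered from most to least frequent.
    """
    counts = Counter(collapse_pool(pool) for pool in pools)
    total = sum(counts.values())
    return [(pattern, n / total) for pattern, n in counts.most_common()]


# Hypothetical data: 4 pools, 3 loci; entries are total minor allele counts.
pools = [
    (0, 0, 0),
    (2, 0, 1),
    (0, 0, 0),
    (1, 0, 3),
]

candidates = collapsed_list(pools)
for pattern, freq in candidates:
    print(pattern, freq)
```

In this toy input the pools (2, 0, 1) and (1, 0, 3) collapse to the same pattern, which shows how collapsing merges distinct count vectors into fewer, more frequent patterns.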
