
Statistical Inference for Hardy-Weinberg Proportions in the Presence of Missing Genotype Information

In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are disca...
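As a rough illustration of the quantities this record's description refers to (the test for Hardy-Weinberg proportions and the inbreeding coefficient), the sketch below computes both from genotype counts at a single biallelic marker. It is not the authors' multiple-imputation procedure. The function name `hwe_summary` and all counts are hypothetical and chosen only for illustration. It uses the standard chi-square test without continuity correction, for which the statistic equals n times the squared inbreeding coefficient.

```python
import math


def hwe_summary(n_aa, n_ab, n_bb):
    """Return (inbreeding coefficient f, chi-square statistic, p-value)
    for a biallelic marker with the given genotype counts.

    Hypothetical helper for illustration only; uses the chi-square test
    with 1 degree of freedom and no continuity correction.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic")
    expected_het = 2 * n * p * q      # expected heterozygotes under HWP
    f = 1.0 - n_ab / expected_het     # f > 0 means a lack of heterozygotes
    chi2 = n * f * f                  # equals the Pearson chi-square statistic
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return f, chi2, p_value


# Hypothetical complete data in exact Hardy-Weinberg proportions.
complete = hwe_summary(250, 500, 250)

# Same sample after discarding 50 heterozygotes whose calls are missing.
discarded = hwe_summary(250, 450, 250)

print("complete :", complete)
print("discarded:", discarded)
```

With these made-up counts, dropping missing genotypes that are mostly heterozygous moves the estimated inbreeding coefficient away from zero. This is the kind of bias the description attributes to discarding missing values.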


Bibliographic Details
Main authors: Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3877411/
https://www.ncbi.nlm.nih.gov/pubmed/24391752
http://dx.doi.org/10.1371/journal.pone.0083316
_version_ 1782297642653450240
author Graffelman, Jan
Sánchez, Milagros
Cook, Samantha
Moreno, Victor
collection PubMed
description In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
format Online
Article
Text
id pubmed-3877411
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3877411 2014-01-03 Statistical Inference for Hardy-Weinberg Proportions in the Presence of Missing Genotype Information Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor PLoS One Research Article
Public Library of Science 2013-12-31 /pmc/articles/PMC3877411/ /pubmed/24391752 http://dx.doi.org/10.1371/journal.pone.0083316 Text en © 2013 Graffelman et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Statistical Inference for Hardy-Weinberg Proportions in the Presence of Missing Genotype Information
topic Research Article