Efficiency of multiple imputation to test for association in the presence of missing data


Bibliographic Details
Main Authors: Croiseau, Pascal; Bardel, Claire; Génin, Emmanuelle
Format: Text
Language: English
Published: BioMed Central 2007
Subjects: Proceedings
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367517/
https://www.ncbi.nlm.nih.gov/pubmed/18466521
author Croiseau, Pascal
Bardel, Claire
Génin, Emmanuelle
collection PubMed
description The presence of missing data in association studies is an important problem, particularly with high-density single-nucleotide polymorphism (SNP) maps, because the probability that at least one genotype is missing increases dramatically with the number of markers. One possible strategy is simply to ignore the missing data and use only the complete observations, and consequently to accept a substantial reduction in sample size. Using Genetic Analysis Workshop 15 simulated data from which we removed some genotypes to generate different levels of missing data, we show that this strategy may not only lead to a large loss of power to detect association but also result in false conclusions regarding the most likely susceptibility site if another marker is in linkage disequilibrium with the disease susceptibility site. We propose a multiple imputation approach to deal with missing data in case-parent trios and evaluate its performance on the same simulated data. We found that our multiple imputation approach has high power to detect association with the susceptibility site even with a large amount of missing data, and can identify the susceptibility sites among a set of sites in linkage disequilibrium.
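The abstract's point that the chance of at least one missing genotype grows quickly with the number of markers can be made concrete with a small calculation. This is a minimal sketch, not the authors' imputation method: it assumes (an assumption not stated in this record) that every genotype call is missing independently with the same probability p, so a case-parent trio typed at m markers is fully genotyped with probability (1 - p)^(3m). The names prob_complete and expected_complete_units are hypothetical.

```python
"""Sketch: how quickly complete-case data shrink as markers are added.

ASSUMPTION (not from the paper): each genotype call is missing
independently with the same probability p.
"""


def prob_complete(p: float, n_markers: int, n_individuals: int = 1) -> float:
    """Probability that none of the n_markers * n_individuals genotypes is missing."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return (1.0 - p) ** (n_markers * n_individuals)


def expected_complete_units(n_units: int, p: float, n_markers: int,
                            n_individuals: int = 1) -> float:
    """Expected number of units (e.g. trios) kept by a complete-case filter."""
    return n_units * prob_complete(p, n_markers, n_individuals)


if __name__ == "__main__":
    p = 0.01  # hypothetical 1% per-genotype missing rate
    for m in (1, 10, 50, 100):
        # A case-parent trio contributes 3 genotyped individuals per marker.
        frac = prob_complete(p, m, n_individuals=3)
        print(f"{m:>3} markers: {frac:.3f} of trios fully genotyped")
```

Under the hypothetical rate p = 0.01, the fraction of fully genotyped trios falls from about 0.97 at one marker to about 0.05 at 100 markers, which illustrates why a complete-case analysis can discard most of a sample. Real missingness is rarely independent across markers and individuals, so actual losses may differ.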
format Text
id pubmed-2367517
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2367517 2008-05-06 Efficiency of multiple imputation to test for association in the presence of missing data Croiseau, Pascal; Bardel, Claire; Génin, Emmanuelle BMC Proc Proceedings BioMed Central 2007-12-18 /pmc/articles/PMC2367517/ /pubmed/18466521 Text en Copyright © 2007 Croiseau et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Efficiency of multiple imputation to test for association in the presence of missing data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367517/
https://www.ncbi.nlm.nih.gov/pubmed/18466521