Efficiency of multiple imputation to test for association in the presence of missing data
The presence of missing data in association studies is an important problem, particularly with high-density single-nucleotide polymorphism (SNP) maps, because the probability that at least one genotype is missing dramatically increases with the number of markers. A possible strategy is to simply ign...
Main authors: | Croiseau, Pascal; Bardel, Claire; Génin, Emmanuelle |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2007 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367517/ https://www.ncbi.nlm.nih.gov/pubmed/18466521 |
_version_ | 1782154310748995584 |
---|---|
author | Croiseau, Pascal Bardel, Claire Génin, Emmanuelle |
author_facet | Croiseau, Pascal Bardel, Claire Génin, Emmanuelle |
author_sort | Croiseau, Pascal |
collection | PubMed |
description | The presence of missing data in association studies is an important problem, particularly with high-density single-nucleotide polymorphism (SNP) maps, because the probability that at least one genotype is missing dramatically increases with the number of markers. A possible strategy is to simply ignore the missing data and only use the complete observations, and, consequently, to accept a significant decrease of the sample size. Using Genetic Analysis Workshop 15 simulated data on which we removed some genotypes to generate different levels of missing data, we show that this strategy might lead to an important loss in power to detect association, but may also result in false conclusions regarding the most likely susceptibility site if another marker is in linkage disequilibrium with the disease susceptibility site. We propose a multiple imputation approach to deal with missing data on case-parent trios and evaluated the performance of this approach on the same simulated data. We found that our multiple imputation approach has high power to detect association with the susceptibility site even with a large amount of missing data, and can identify the susceptibility sites among a set of sites in linkage disequilibrium. |
format | Text |
id | pubmed-2367517 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2367517 2008-05-06 Efficiency of multiple imputation to test for association in the presence of missing data Croiseau, Pascal Bardel, Claire Génin, Emmanuelle BMC Proc Proceedings The presence of missing data in association studies is an important problem, particularly with high-density single-nucleotide polymorphism (SNP) maps, because the probability that at least one genotype is missing dramatically increases with the number of markers. A possible strategy is to simply ignore the missing data and only use the complete observations, and, consequently, to accept a significant decrease of the sample size. Using Genetic Analysis Workshop 15 simulated data on which we removed some genotypes to generate different levels of missing data, we show that this strategy might lead to an important loss in power to detect association, but may also result in false conclusions regarding the most likely susceptibility site if another marker is in linkage disequilibrium with the disease susceptibility site. We propose a multiple imputation approach to deal with missing data on case-parent trios and evaluated the performance of this approach on the same simulated data. We found that our multiple imputation approach has high power to detect association with the susceptibility site even with a large amount of missing data, and can identify the susceptibility sites among a set of sites in linkage disequilibrium. BioMed Central 2007-12-18 /pmc/articles/PMC2367517/ /pubmed/18466521 Text en Copyright © 2007 Croiseau et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Croiseau, Pascal Bardel, Claire Génin, Emmanuelle Efficiency of multiple imputation to test for association in the presence of missing data |
title | Efficiency of multiple imputation to test for association in the presence of missing data |
title_full | Efficiency of multiple imputation to test for association in the presence of missing data |
title_fullStr | Efficiency of multiple imputation to test for association in the presence of missing data |
title_full_unstemmed | Efficiency of multiple imputation to test for association in the presence of missing data |
title_short | Efficiency of multiple imputation to test for association in the presence of missing data |
title_sort | efficiency of multiple imputation to test for association in the presence of missing data |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367517/ https://www.ncbi.nlm.nih.gov/pubmed/18466521 |
work_keys_str_mv | AT croiseaupascal efficiencyofmultipleimputationtotestforassociationinthepresenceofmissingdata AT bardelclaire efficiencyofmultipleimputationtotestforassociationinthepresenceofmissingdata AT geninemmanuelle efficiencyofmultipleimputationtotestforassociationinthepresenceofmissingdata |