Methods for Dealing With Missing Covariate Data in Epigenome-Wide Association Studies

Multiple imputation (MI) is a well-established method for dealing with missing data. MI is computationally intensive when imputing missing covariates with high-dimensional outcome data (e.g., DNA methylation data in epigenome-wide association studies (EWAS)), because every outcome variable must be i...

Bibliographic Details
Main Authors: Mills, Harriet L, Heron, Jon, Relton, Caroline, Suderman, Matt, Tilling, Kate
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Practice of Epidemiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6825836/
https://www.ncbi.nlm.nih.gov/pubmed/31504104
http://dx.doi.org/10.1093/aje/kwz186
author Mills, Harriet L
Heron, Jon
Relton, Caroline
Suderman, Matt
Tilling, Kate
collection PubMed
description Multiple imputation (MI) is a well-established method for dealing with missing data. MI is computationally intensive when imputing missing covariates with high-dimensional outcome data (e.g., DNA methylation data in epigenome-wide association studies (EWAS)), because every outcome variable must be included in the imputation model to avoid biasing associations towards the null. Instead, EWAS analyses are reduced to only complete cases, limiting statistical power and potentially causing bias. We used simulations to compare 5 MI methods for high-dimensional data under 2 missingness mechanisms. All imputation methods had increased power over complete-case (C-C) analyses. Imputing missing values separately for each variable was computationally inefficient, but dividing sites at random into evenly sized bins improved efficiency and gave low bias. Methods imputing solely using subsets of sites identified by the C-C analysis suffered from bias towards the null. However, if these subsets were added into random bins of sites, this bias was reduced. The optimal methods were applied to an EWAS with missingness in covariates. All methods identified additional sites over the C-C analysis, and many of these sites had been replicated in other studies. These methods are also applicable to other high-dimensional data sets, including the rapidly expanding area of “-omics” studies.
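The random-binning idea in the description can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `random_bins`, the `bin_size` and `always_include` parameters, and the example site identifiers are all hypothetical. It assumes each resulting bin would supply the outcome variables for one imputation model, and that sites flagged by a complete-case analysis can optionally be appended to every bin.

```python
import random

def random_bins(site_ids, bin_size, always_include=(), seed=0):
    """Shuffle sites and split them into bins of near-equal size.

    `always_include` is a hypothetical hook for sites flagged by a
    complete-case analysis; those sites are appended to every bin.
    """
    rng = random.Random(seed)
    extra = list(always_include)
    extra_set = set(extra)
    # Keep flagged sites out of the shuffled pool so no bin holds them twice.
    pool = [s for s in site_ids if s not in extra_set]
    rng.shuffle(pool)
    n_bins = max(1, -(-len(pool) // bin_size))  # ceiling division
    # Deal sites round-robin so bin sizes differ by at most one.
    bins = [pool[i::n_bins] for i in range(n_bins)]
    return [b + extra for b in bins]

# Hypothetical site identifiers, for illustration only.
sites = [f"cg{i:04d}" for i in range(10)]
bins = random_bins(sites, bin_size=4, always_include=["cg0003"])
```

Dealing the shuffled sites round-robin keeps the bins evenly sized even when the number of sites is not a multiple of `bin_size`.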
format Online
Article
Text
id pubmed-6825836
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6825836 2019-11-07
Methods for Dealing With Missing Covariate Data in Epigenome-Wide Association Studies
Mills, Harriet L; Heron, Jon; Relton, Caroline; Suderman, Matt; Tilling, Kate
Am J Epidemiol
Practice of Epidemiology
Oxford University Press 2019-11 2019-09-05
/pmc/articles/PMC6825836/ /pubmed/31504104 http://dx.doi.org/10.1093/aje/kwz186
Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Methods for Dealing With Missing Covariate Data in Epigenome-Wide Association Studies
topic Practice of Epidemiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6825836/
https://www.ncbi.nlm.nih.gov/pubmed/31504104
http://dx.doi.org/10.1093/aje/kwz186