Joint screening of ultrahigh dimensional variables for family-based genetic studies

BACKGROUND: Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables from a family-based genetic study, taking into account the kinship coefficients. When there are ultrahigh dimensional genetic variables (ie, p ≫ n), it is challenging to fit an...

Bibliographic Details
Main Authors: Datta, Subha, Fang, Yixin, Loh, Ji Meng
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6156922/
https://www.ncbi.nlm.nih.gov/pubmed/30263041
http://dx.doi.org/10.1186/s12919-018-0120-2
_version_ 1783358183029342208
author Datta, Subha
Fang, Yixin
Loh, Ji Meng
author_facet Datta, Subha
Fang, Yixin
Loh, Ji Meng
author_sort Datta, Subha
collection PubMed
description BACKGROUND: Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables from a family-based genetic study, taking into account the kinship coefficients. When there are ultrahigh dimensional genetic variables (ie, p ≫ n), it is challenging to fit any mixed effect model. METHODS: We propose a two-stage strategy, screening genetic variables in the first stage and then fitting the mixed effect model in the second stage to those variables that survive the screening. For the screening stage, we can use the sure independence screening (SIS) procedure, which fits the mixed effect model to one genetic variable at a time. Because the SIS procedure may fail to identify genetic variables that are marginally unimportant but jointly important, we propose a joint screening (JS) procedure that screens all the genetic variables simultaneously. We evaluate the performance of the proposed JS procedure via a simulation study and an application to the GAW20 data. RESULTS: We perform the proposed JS procedure on the GAW20 representative simulated data set (n = 680 participants and p = 463,995 cytosine-phosphate-guanine [CpG] sites) and select the top d = ⌊n/log(n)⌋ variables. Then we fit the mixed model using these top variables. At the 5% significance level, 43 CpG sites are found to be significant. Diagnostic analyses based on the residuals show that the fitted mixed model is appropriate. CONCLUSIONS: Although the GAW20 data set is ultrahigh dimensional and family-based, with within-group variances, we were successful in performing subset selection using a two-step strategy that is computationally simple and easy to understand.
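The two-stage idea in the description (rank the variables, keep the top d = ⌊n/log(n)⌋, then fit a model to the survivors) can be illustrated with a minimal sketch. This is not the authors' JS procedure and it ignores kinship entirely: it ranks variables by absolute marginal Pearson correlation, a simplified stand-in for SIS in an ordinary linear setting. The function names are hypothetical, and the logarithm is assumed to be the natural log, which the record does not state.

```python
import math

def screening_size(n):
    """Number of variables to keep: floor(n / log(n)), natural log assumed."""
    return math.floor(n / math.log(n))

def marginal_scores(X, y):
    """Absolute Pearson correlation between each column of X and y.

    X is a list of rows (one per participant); y is a list of outcomes.
    Constant columns get a score of 0.
    """
    n = len(y)
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in y_c))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        c_c = [v - c_mean for v in col]
        c_norm = math.sqrt(sum(v * v for v in c_c))
        if c_norm == 0 or y_norm == 0:
            scores.append(0.0)
        else:
            cov = sum(a * b for a, b in zip(c_c, y_c))
            scores.append(abs(cov) / (c_norm * y_norm))
    return scores

def screen(X, y, d):
    """Return the sorted column indices of the d highest-scoring variables."""
    scores = marginal_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:d])

# For the sample size quoted in the record, n = 680:
d = screening_size(680)  # 104 under the natural-log assumption
```

In a full analysis the second stage would fit a mixed effect model, with a kinship-based covariance, to the columns returned by `screen`; that step is omitted here because the record does not specify the model or the software used.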
format Online
Article
Text
id pubmed-6156922
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6156922 2018-09-27 Joint screening of ultrahigh dimensional variables for family-based genetic studies Datta, Subha Fang, Yixin Loh, Ji Meng BMC Proc Proceedings BACKGROUND: Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables from a family-based genetic study, taking into account the kinship coefficients. When there are ultrahigh dimensional genetic variables (ie, p ≫ n), it is challenging to fit any mixed effect model. METHODS: We propose a two-stage strategy, screening genetic variables in the first stage and then fitting the mixed effect model in the second stage to those variables that survive the screening. For the screening stage, we can use the sure independence screening (SIS) procedure, which fits the mixed effect model to one genetic variable at a time. Because the SIS procedure may fail to identify genetic variables that are marginally unimportant but jointly important, we propose a joint screening (JS) procedure that screens all the genetic variables simultaneously. We evaluate the performance of the proposed JS procedure via a simulation study and an application to the GAW20 data. RESULTS: We perform the proposed JS procedure on the GAW20 representative simulated data set (n = 680 participants and p = 463,995 cytosine-phosphate-guanine [CpG] sites) and select the top d = ⌊n/log(n)⌋ variables. Then we fit the mixed model using these top variables. At the 5% significance level, 43 CpG sites are found to be significant. Diagnostic analyses based on the residuals show that the fitted mixed model is appropriate. CONCLUSIONS: Although the GAW20 data set is ultrahigh dimensional and family-based, with within-group variances, we were successful in performing subset selection using a two-step strategy that is computationally simple and easy to understand.
BioMed Central 2018-09-17 /pmc/articles/PMC6156922/ /pubmed/30263041 http://dx.doi.org/10.1186/s12919-018-0120-2 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Proceedings
Datta, Subha
Fang, Yixin
Loh, Ji Meng
Joint screening of ultrahigh dimensional variables for family-based genetic studies
title Joint screening of ultrahigh dimensional variables for family-based genetic studies
title_full Joint screening of ultrahigh dimensional variables for family-based genetic studies
title_fullStr Joint screening of ultrahigh dimensional variables for family-based genetic studies
title_full_unstemmed Joint screening of ultrahigh dimensional variables for family-based genetic studies
title_short Joint screening of ultrahigh dimensional variables for family-based genetic studies
title_sort joint screening of ultrahigh dimensional variables for family-based genetic studies
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6156922/
https://www.ncbi.nlm.nih.gov/pubmed/30263041
http://dx.doi.org/10.1186/s12919-018-0120-2
work_keys_str_mv AT dattasubha jointscreeningofultrahighdimensionalvariablesforfamilybasedgeneticstudies
AT fangyixin jointscreeningofultrahighdimensionalvariablesforfamilybasedgeneticstudies
AT lohjimeng jointscreeningofultrahighdimensionalvariablesforfamilybasedgeneticstudies