Joint screening of ultrahigh dimensional variables for family-based genetic studies
BACKGROUND: Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables from a family-based genetic study, taking into account the kinship coefficients. When there are ultrahigh dimensional genetic variables (ie, p ≫ n), it is challenging to fit an...
Main authors: | Datta, Subha; Fang, Yixin; Loh, Ji Meng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2018 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6156922/ https://www.ncbi.nlm.nih.gov/pubmed/30263041 http://dx.doi.org/10.1186/s12919-018-0120-2 |
_version_ | 1783358183029342208 |
---|---|
author | Datta, Subha Fang, Yixin Loh, Ji Meng |
author_facet | Datta, Subha Fang, Yixin Loh, Ji Meng |
author_sort | Datta, Subha |
collection | PubMed |
description | BACKGROUND: Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables from a family-based genetic study, taking into account the kinship coefficients. When there are ultrahigh dimensional genetic variables (ie, p ≫ n), it is challenging to fit any mixed effect model. METHODS: We propose a two-stage strategy, screening genetic variables in the first stage and then fitting the mixed effect model in the second stage to those variables that survive the screening. For the screening stage, we can use the sure independence screening (SIS) procedure, which fits the mixed effect model to one genetic variable at a time. Because the SIS procedure may fail to identify genetic variables that are marginally unimportant but jointly important, we propose a joint screening (JS) procedure that screens all the genetic variables simultaneously. We evaluate the performance of the proposed JS procedure via a simulation study and an application to the GAW20 data. RESULTS: We perform the proposed JS procedure on the GAW20 representative simulated data set (n = 680 participants and p = 463,995 cytosine-phosphate-guanine [CpG] sites) and select the top d = ⌊n/log(n)⌋ variables. Then we fit the mixed model using these top variables. At the 5% significance level, 43 CpG sites are found to be significant. Diagnostic analyses based on the residuals show that the fitted mixed model is appropriate. CONCLUSIONS: Although the GAW20 data set is ultrahigh dimensional and family-based, with within-group variances, we were successful in performing subset selection using a two-step strategy that is computationally simple and easy to understand. |
format | Online Article Text |
id | pubmed-6156922 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6156922 2018-09-27 Joint screening of ultrahigh dimensional variables for family-based genetic studies Datta, Subha Fang, Yixin Loh, Ji Meng BMC Proc Proceedings BACKGROUND: Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables from a family-based genetic study, taking into account the kinship coefficients. When there are ultrahigh dimensional genetic variables (ie, p ≫ n), it is challenging to fit any mixed effect model. METHODS: We propose a two-stage strategy, screening genetic variables in the first stage and then fitting the mixed effect model in the second stage to those variables that survive the screening. For the screening stage, we can use the sure independence screening (SIS) procedure, which fits the mixed effect model to one genetic variable at a time. Because the SIS procedure may fail to identify genetic variables that are marginally unimportant but jointly important, we propose a joint screening (JS) procedure that screens all the genetic variables simultaneously. We evaluate the performance of the proposed JS procedure via a simulation study and an application to the GAW20 data. RESULTS: We perform the proposed JS procedure on the GAW20 representative simulated data set (n = 680 participants and p = 463,995 cytosine-phosphate-guanine [CpG] sites) and select the top d = ⌊n/log(n)⌋ variables. Then we fit the mixed model using these top variables. At the 5% significance level, 43 CpG sites are found to be significant. Diagnostic analyses based on the residuals show that the fitted mixed model is appropriate. CONCLUSIONS: Although the GAW20 data set is ultrahigh dimensional and family-based, with within-group variances, we were successful in performing subset selection using a two-step strategy that is computationally simple and easy to understand.
BioMed Central 2018-09-17 /pmc/articles/PMC6156922/ /pubmed/30263041 http://dx.doi.org/10.1186/s12919-018-0120-2 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Datta, Subha Fang, Yixin Loh, Ji Meng Joint screening of ultrahigh dimensional variables for family-based genetic studies |
title | Joint screening of ultrahigh dimensional variables for family-based genetic studies |
title_full | Joint screening of ultrahigh dimensional variables for family-based genetic studies |
title_fullStr | Joint screening of ultrahigh dimensional variables for family-based genetic studies |
title_full_unstemmed | Joint screening of ultrahigh dimensional variables for family-based genetic studies |
title_short | Joint screening of ultrahigh dimensional variables for family-based genetic studies |
title_sort | joint screening of ultrahigh dimensional variables for family-based genetic studies |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6156922/ https://www.ncbi.nlm.nih.gov/pubmed/30263041 http://dx.doi.org/10.1186/s12919-018-0120-2 |
work_keys_str_mv | AT dattasubha jointscreeningofultrahighdimensionalvariablesforfamilybasedgeneticstudies AT fangyixin jointscreeningofultrahighdimensionalvariablesforfamilybasedgeneticstudies AT lohjimeng jointscreeningofultrahighdimensionalvariablesforfamilybasedgeneticstudies |
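
The abstract in this record describes screening down to the top d = ⌊n/log(n)⌋ variables before fitting a mixed model. The sketch below illustrates only the screening-size arithmetic and a simplified marginal (SIS-style) ranking. It is not the authors' joint screening procedure: it assumes a natural logarithm, ranks variables by ordinary marginal correlation rather than by a mixed effect model, and ignores kinship entirely. The function names `screening_size` and `marginal_screen` and the toy data are hypothetical.

```python
import math
import random


def screening_size(n: int) -> int:
    """Number of variables kept after screening: d = floor(n / log(n)).

    Assumes the natural logarithm, which the record does not state explicitly.
    """
    return math.floor(n / math.log(n))


def marginal_screen(X, y, d):
    """Keep the d columns of X with the largest |marginal correlation| with y.

    Simplified stand-in for one-variable-at-a-time screening: plain Pearson
    correlation, with no kinship-based random effect.
    """
    n = len(y)
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in y_c))

    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        col_c = [v - c_mean for v in col]
        c_norm = math.sqrt(sum(v * v for v in col_c))
        denom = c_norm * y_norm
        corr = sum(a * b for a, b in zip(col_c, y_c)) / denom if denom > 0 else 0.0
        scores.append((abs(corr), j))

    # Largest absolute correlation first; return the surviving column indices.
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:d])


# Hypothetical toy data: 50 samples, 200 variables, two of which drive y.
rng = random.Random(0)
n, p = 50, 200
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[5] - 3 * row[17] + rng.gauss(0, 0.5) for row in X]

d = screening_size(n)
kept = marginal_screen(X, y, d)
```

With n = 680, as reported for the GAW20 data set, `screening_size(680)` returns 104 under the natural-log assumption. In the two-stage strategy, the second stage would then fit the mixed model only to the columns listed in `kept`.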