Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery
Introduction: The availability of large-scale biobanks linking genetic data, rich phenotypes, and biological measures is a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive missingness. While missing data prediction is possible, performance is s...
Main authors: | Gentry, Amanda Elswick; Kirkpatrick, Robert M.; Peterson, Roseann E.; Webb, Bradley T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399453/ https://www.ncbi.nlm.nih.gov/pubmed/37547462 http://dx.doi.org/10.3389/fgene.2023.1162690 |
_version_ | 1785084248620793856 |
---|---|
author | Gentry, Amanda Elswick Kirkpatrick, Robert M. Peterson, Roseann E. Webb, Bradley T. |
author_facet | Gentry, Amanda Elswick Kirkpatrick, Robert M. Peterson, Roseann E. Webb, Bradley T. |
author_sort | Gentry, Amanda Elswick |
collection | PubMed |
description | Introduction: The availability of large-scale biobanks linking genetic data, rich phenotypes, and biological measures is a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive missingness. While missing data prediction is possible, performance is significantly impaired by block-wise missingness inherent to many biobanks. Methods: To address this, we developed Missingness Adapted Group-wise Informed Clustered (MAGIC)-LASSO, which performs hierarchical clustering of variables based on missingness followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and balance between training and target sets, with final models built using stepwise inclusion of features ranked by completeness. This research has been conducted using the UK Biobank (n > 500 k) to predict unmeasured Alcohol Use Disorders Identification Test (AUDIT) scores. Results: The phenotypic correlation between measured and predicted total score was 0.67, while genetic correlations between independent subjects were high (>0.86). Discussion: Phenotypic and genetic correlations in the real-data application, as well as in simulations, demonstrate that the method has significant accuracy and utility for increasing power for genetic loci discovery. |
format | Online Article Text |
id | pubmed-10399453 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10399453 2023-08-04 Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery Gentry, Amanda Elswick Kirkpatrick, Robert M. Peterson, Roseann E. Webb, Bradley T. Front Genet Genetics Introduction: The availability of large-scale biobanks linking genetic data, rich phenotypes, and biological measures is a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive missingness. While missing data prediction is possible, performance is significantly impaired by block-wise missingness inherent to many biobanks. Methods: To address this, we developed Missingness Adapted Group-wise Informed Clustered (MAGIC)-LASSO, which performs hierarchical clustering of variables based on missingness followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and balance between training and target sets, with final models built using stepwise inclusion of features ranked by completeness. This research has been conducted using the UK Biobank (n > 500 k) to predict unmeasured Alcohol Use Disorders Identification Test (AUDIT) scores. Results: The phenotypic correlation between measured and predicted total score was 0.67, while genetic correlations between independent subjects were high (>0.86). Discussion: Phenotypic and genetic correlations in the real-data application, as well as in simulations, demonstrate that the method has significant accuracy and utility for increasing power for genetic loci discovery. Frontiers Media S.A. 2023-07-20 /pmc/articles/PMC10399453/ /pubmed/37547462 http://dx.doi.org/10.3389/fgene.2023.1162690 Text en Copyright © 2023 Gentry, Kirkpatrick, Peterson and Webb. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Gentry, Amanda Elswick Kirkpatrick, Robert M. Peterson, Roseann E. Webb, Bradley T. Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery |
title | Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery |
title_sort | missingness adapted group informed clustered (magic)-lasso: a novel paradigm for phenotype prediction to improve power for genetic loci discovery |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399453/ https://www.ncbi.nlm.nih.gov/pubmed/37547462 http://dx.doi.org/10.3389/fgene.2023.1162690 |
work_keys_str_mv | AT gentryamandaelswick missingnessadaptedgroupinformedclusteredmagiclassoanovelparadigmforphenotypepredictiontoimprovepowerforgeneticlocidiscovery AT kirkpatrickrobertm missingnessadaptedgroupinformedclusteredmagiclassoanovelparadigmforphenotypepredictiontoimprovepowerforgeneticlocidiscovery AT petersonroseanne missingnessadaptedgroupinformedclusteredmagiclassoanovelparadigmforphenotypepredictiontoimprovepowerforgeneticlocidiscovery AT webbbradleyt missingnessadaptedgroupinformedclusteredmagiclassoanovelparadigmforphenotypepredictiontoimprovepowerforgeneticlocidiscovery |
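
The abstract describes grouping variables by their missingness and then ranking features by completeness. The sketch below illustrates only that first idea on toy data, and it is not the authors' implementation: the Jaccard distance on missingness indicators, the average-linkage rule, the `threshold` value, and every function and variable name are assumptions made for illustration. The Group LASSO step is deliberately left out.

```python
from itertools import combinations


def missingness_distance(col_a, col_b):
    """Jaccard distance between the missingness indicators of two variables."""
    miss_a = [v is None for v in col_a]
    miss_b = [v is None for v in col_b]
    union = sum(a or b for a, b in zip(miss_a, miss_b))
    if union == 0:
        return 0.0  # neither variable has any missing values
    both = sum(a and b for a, b in zip(miss_a, miss_b))
    return 1.0 - both / union


def cluster_by_missingness(data, threshold):
    """Average-linkage agglomerative clustering of variables (hypothetical helper).

    data: dict mapping variable name -> list of values, with None for missing.
    threshold: merging stops once the closest clusters are farther apart than this.
    """
    clusters = [[name] for name in data]
    dist = {
        frozenset(pair): missingness_distance(data[pair[0]], data[pair[1]])
        for pair in combinations(data, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def completeness(data, cluster):
    """Fraction of non-missing cells across all variables in a cluster."""
    cells = [v for name in cluster for v in data[name]]
    return sum(v is not None for v in cells) / len(cells)


# Toy data with block-wise missingness: six subjects, five variables.
data = {
    "a": [1, 2, 3, 4, None, None],
    "b": [2, 1, 4, 3, None, None],
    "c": [None, None, 5, 6, 7, 8],
    "d": [None, None, 1, 0, 1, 0],
    "e": [1, 1, 0, 0, 1, 1],
}

clusters = cluster_by_missingness(data, threshold=0.5)
# Rank clusters so the most complete block of variables comes first.
ranked = sorted(clusters, key=lambda c: completeness(data, c), reverse=True)
for cluster in ranked:
    print(sorted(cluster), round(completeness(data, cluster), 2))
```

In this toy example, variables that are missing for the same subjects end up in the same cluster, and the fully observed variable is ranked first. A real application would replace the toy distance and threshold with choices justified by the data.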