Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery

Bibliographic Details
Main Authors: Gentry, Amanda Elswick; Kirkpatrick, Robert M.; Peterson, Roseann E.; Webb, Bradley T.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399453/
https://www.ncbi.nlm.nih.gov/pubmed/37547462
http://dx.doi.org/10.3389/fgene.2023.1162690
_version_ 1785084248620793856
author Gentry, Amanda Elswick
Kirkpatrick, Robert M.
Peterson, Roseann E.
Webb, Bradley T.
author_facet Gentry, Amanda Elswick
Kirkpatrick, Robert M.
Peterson, Roseann E.
Webb, Bradley T.
author_sort Gentry, Amanda Elswick
collection PubMed
description Introduction: The availability of large-scale biobanks linking genetic data, rich phenotypes, and biological measures is a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive missingness. While missing data prediction is possible, performance is significantly impaired by the block-wise missingness inherent to many biobanks. Methods: To address this, we developed Missingness Adapted Group-wise Informed Clustered (MAGIC)-LASSO, which performs hierarchical clustering of variables based on missingness, followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and for balance between training and target sets, with final models built using stepwise inclusion of features ranked by completeness. This research has been conducted using the UK Biobank (n > 500 k) to predict unmeasured Alcohol Use Disorders Identification Test (AUDIT) scores. Results: The phenotypic correlation between measured and predicted total score was 0.67, while genetic correlations between independent subjects were high (>0.86). Discussion: Phenotypic and genetic correlations in the real data application, as well as in simulations, demonstrate that the method has significant accuracy and utility for increasing power for genetic loci discovery.
format Online
Article
Text
id pubmed-10399453
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10399453 2023-08-04 Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery Gentry, Amanda Elswick Kirkpatrick, Robert M. Peterson, Roseann E. Webb, Bradley T. Front Genet Genetics Introduction: The availability of large-scale biobanks linking genetic data, rich phenotypes, and biological measures is a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive missingness. While missing data prediction is possible, performance is significantly impaired by the block-wise missingness inherent to many biobanks. Methods: To address this, we developed Missingness Adapted Group-wise Informed Clustered (MAGIC)-LASSO, which performs hierarchical clustering of variables based on missingness, followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and for balance between training and target sets, with final models built using stepwise inclusion of features ranked by completeness. This research has been conducted using the UK Biobank (n > 500 k) to predict unmeasured Alcohol Use Disorders Identification Test (AUDIT) scores. Results: The phenotypic correlation between measured and predicted total score was 0.67, while genetic correlations between independent subjects were high (>0.86). Discussion: Phenotypic and genetic correlations in the real data application, as well as in simulations, demonstrate that the method has significant accuracy and utility for increasing power for genetic loci discovery. Frontiers Media S.A. 2023-07-20 /pmc/articles/PMC10399453/ /pubmed/37547462 http://dx.doi.org/10.3389/fgene.2023.1162690 Text en Copyright © 2023 Gentry, Kirkpatrick, Peterson and Webb. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Gentry, Amanda Elswick
Kirkpatrick, Robert M.
Peterson, Roseann E.
Webb, Bradley T.
Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery
title Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery
title_full Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery
title_fullStr Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery
title_full_unstemmed Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery
title_short Missingness adapted group informed clustered (MAGIC)-LASSO: a novel paradigm for phenotype prediction to improve power for genetic loci discovery
title_sort missingness adapted group informed clustered (magic)-lasso: a novel paradigm for phenotype prediction to improve power for genetic loci discovery
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399453/
https://www.ncbi.nlm.nih.gov/pubmed/37547462
http://dx.doi.org/10.3389/fgene.2023.1162690
work_keys_str_mv AT gentryamandaelswick missingnessadaptedgroupinformedclusteredmagiclassoanovelparadigmforphenotypepredictiontoimprovepowerforgeneticlocidiscovery
AT kirkpatrickrobertm missingnessadaptedgroupinformedclusteredmagiclassoanovelparadigmforphenotypepredictiontoimprovepowerforgeneticlocidiscovery
AT petersonroseanne missingnessadaptedgroupinformedclusteredmagiclassoanovelparadigmforphenotypepredictiontoimprovepowerforgeneticlocidiscovery
AT webbbradleyt missingnessadaptedgroupinformedclusteredmagiclassoanovelparadigmforphenotypepredictiontoimprovepowerforgeneticlocidiscovery