Validating and automating learning of cardiometabolic polygenic risk scores from direct-to-consumer genetic and phenotypic data: implications for scaling precision health research

Bibliographic Details
Main authors: Lopez-Pineda, Arturo, Vernekar, Manvi, Moreno-Grau, Sonia, Rojas-Muñoz, Agustin, Moatamed, Babak, Lee, Ming Ta Michael, Nava-Aguilar, Marco A., Gonzalez-Arroyo, Gilberto, Numakura, Kensuke, Matsuda, Yuta, Ioannidis, Alexander, Katsanis, Nicholas, Takano, Tomohiro, Bustamante, Carlos D.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9452874/
https://www.ncbi.nlm.nih.gov/pubmed/36076307
http://dx.doi.org/10.1186/s40246-022-00406-y
collection PubMed
description INTRODUCTION: A major challenge to enabling precision health at a global scale is the disparity between those who enroll in state-sponsored genomic research and those who suffer from chronic disease. More than 30 million people have been genotyped by direct-to-consumer (DTC) companies such as 23andMe, Ancestry DNA, and MyHeritage, providing a potential mechanism for democratizing access to medical interventions and thus catalyzing improvements in patient outcomes as the cost of data acquisition drops. However, much of these data are sequestered in the initial provider network, where the scientific community can neither access nor validate them. Here, we present a novel geno-pheno platform that integrates heterogeneous data sources and applies what it learns to common chronic conditions, including type 2 diabetes (T2D) and hypertension.
METHODS: We collected genotype data from a novel DTC platform where participants uploaded their genotype data files and were invited to answer general health questionnaires on cardiometabolic traits over a period of 6 months. Quality control, imputation, and genome-wide association studies (GWAS) were performed on this dataset, and polygenic risk scores (PRS) were built in a case–control setting using the BASIL algorithm.
RESULTS: We collected T2D data on N = 4,550 participants (389 cases / 4,161 controls; cases reported being currently or previously affected) and hypertension data on N = 4,528 participants (1,027 cases / 3,501 controls). We identified 164 of 272 variants showing the same effect direction as previously reported genome-wide significant findings in Europeans. The PRS models achieved an AUC of 0.68, comparable to previously published PRS models obtained with larger datasets that include clinical biomarkers.
DISCUSSION: DTC platforms have the potential to invert research models of genome sequencing and phenotypic data acquisition. Quality control (QC) mechanisms successfully enabled traditional GWAS and PRS analyses. The direct participation of individuals has shown the potential to generate rich datasets that enable the creation of cardiometabolic PRS models. More importantly, federated learning of PRS from reused DTC data provides a mechanism for scaling precision health care delivery beyond the small number of countries that can afford to finance these efforts directly.
CONCLUSIONS: The genetics of T2D and hypertension have been studied extensively in controlled datasets, and various polygenic risk scores (PRS) have been developed. We developed predictive tools for both phenotypes, trained on heterogeneous genotypic and phenotypic data generated outside the clinical environment, and show that our methods recapitulate prior findings with fidelity. From these observations, we conclude that DTC genetic repositories can be leveraged to identify individuals at risk of debilitating diseases based on their unique genetic landscape, so that informed, timely clinical interventions can be incorporated.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40246-022-00406-y.
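The abstract evaluates a case–control PRS by AUC and checks how many variants agree in effect direction with earlier European findings. The Python sketch below is a hedged illustration of those two evaluation steps only, not the authors' implementation: it assumes genotypes coded as allele dosages (0, 1, 2) and per-variant weights that have already been fitted (the BASIL fitting step is not reproduced), and the function names `polygenic_score`, `roc_auc`, and `direction_concordance` are hypothetical.

```python
# Minimal, hypothetical sketch (not the authors' pipeline): score individuals
# with an already-fitted linear PRS, evaluate it by ROC AUC, and compute the
# effect-direction concordance described in the RESULTS section.
# Assumptions: genotypes are allele dosages (0, 1, 2); per-variant weights
# would come from a fitted model such as BASIL, which is not shown here.
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Linear PRS for one individual: sum over variants of weight * dosage."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * x for w, x in zip(weights, dosages))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(a random case scores higher than a random control).

    Mann-Whitney pairwise formulation; ties count as 1/2.
    O(cases * controls), which is fine for a toy example.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def direction_concordance(estimated: Sequence[float], reported: Sequence[float]) -> float:
    """Fraction of variants whose estimated effect has the same sign as the
    previously reported effect (a zero effect counts as discordant)."""
    if len(estimated) != len(reported) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    same = sum(1 for a, b in zip(estimated, reported) if a * b > 0)
    return same / len(estimated)


if __name__ == "__main__":
    # Toy data, purely illustrative: 3 variants, 4 individuals.
    weights = [0.2, -0.1, 0.05]
    genotypes = [[2, 0, 1], [0, 2, 0], [1, 1, 2], [0, 0, 0]]
    labels = [1, 0, 0, 1]  # 1 = case, 0 = control
    scores = [polygenic_score(g, weights) for g in genotypes]
    print(round(roc_auc(scores, labels), 3))  # 0.75
    # Concordance reported in the abstract: 164 of 272 variants.
    print(round(164 / 272, 3))  # 0.603
```

The pairwise AUC loop is quadratic and written for clarity; for cohorts of the size reported here (thousands of participants), a rank-based computation or an established library routine would be the usual choice.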
id pubmed-9452874
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Hum Genomics (Human Genomics); article type: Research; published 2022-09-08
rights © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.