284 Generalizable Machine Learning Methods for Subtyping Individuals on National Health Databases: Case Studies Using Data from HRS, N3C, and All of Us

OBJECTIVES/GOALS: While disease subtypes are critical for precision medicine, most projects use unipartite clustering methods such as k-means, which are not fully automated, do not provide statistical significance, and are difficult to interpret. These gaps were addressed through bipartite networks a...

Bibliographic Details
Main Authors:	Bhavnani, Suresh K., Zhang, Weibin, Bao, Daniel, Hatch, Sandra, Reistetter, Timothy, Downer, Brian
Format:	Online Article Text
Language:	English
Published:	Cambridge University Press 2023
Subjects:	Precision Medicine/Health
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10129454/ http://dx.doi.org/10.1017/cts.2023.340

author	Bhavnani, Suresh K. Zhang, Weibin Bao, Daniel Hatch, Sandra Reistetter, Timothy Downer, Brian
collection	PubMed
description	OBJECTIVES/GOALS: While disease subtypes are critical for precision medicine, most projects use unipartite clustering methods such as k-means, which are not fully automated, do not provide statistical significance, and are difficult to interpret. These gaps were addressed through bipartite networks and tested for generalizability on three national databases. METHODS/STUDY POPULATION: Data. All participants with self-reported stroke from the 2010 Health and Retirement Study (HRS), with cases (n=798) having one or more of 8 depressive symptoms measured by the Center for Epidemiologic Studies Depression (CES-D 8) scale, and controls (n=389) with none of those symptoms. The replication data set consisted of independent, identically defined participants (cases=725, controls=190) from the 1998 HRS. Method. (1) Bipartite network analysis and modularity maximization to automatically identify patient-symptom biclusters with significance. (2) Rand Index to measure the replicability of symptom co-occurrences in the replication data. (3) ExplodeLayout to visualize and interpret the subtypes. (4) R libraries to generalize the methods, which were uploaded to CRAN and then tested on the N3C and All of Us platforms. RESULTS/ANTICIPATED RESULTS: The analysis identified 4 depressive symptom subtypes (https://postimg.cc/Ny8YwXJW) which had significant modularity (Q=0.26, z=3.03, P ... DISCUSSION/SIGNIFICANCE: We developed generalizable methods to automatically identify biclusters, measure the clustering significance, and visualize the results for interpretation. These methods were successfully tested on three national-level databases. Such generalizable methods should accelerate the analysis of subtypes and the design of targeted interventions.
format	Online Article Text
id	pubmed-10129454
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cambridge University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10129454 2023-04-26 284 Generalizable Machine Learning Methods for Subtyping Individuals on National Health Databases: Case Studies Using Data from HRS, N3C, and All of Us Bhavnani, Suresh K. Zhang, Weibin Bao, Daniel Hatch, Sandra Reistetter, Timothy Downer, Brian J Clin Transl Sci Precision Medicine/Health Cambridge University Press 2023-04-24 /pmc/articles/PMC10129454/ http://dx.doi.org/10.1017/cts.2023.340 Text en © The Association for Clinical and Translational Science 2023 https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
title	284 Generalizable Machine Learning Methods for Subtyping Individuals on National Health Databases: Case Studies Using Data from HRS, N3C, and All of Us
topic	Precision Medicine/Health
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10129454/ http://dx.doi.org/10.1017/cts.2023.340
