Significant EHR Feature-Driven T2D Inference: Predictive Machine Learning and Networks

Background: Electronic health records (EHR) play an important role for the redefinition of phenotypes in view of the wealth and heterogeneity of information now available from disparate data sources. A recent cross-sectional retrospective study has described the potential of EHR toward type 2 diabet...

Bibliographic Details
Main Authors: Preo, Nicolo', Capobianco, Enrico
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Big Data
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931876/
https://www.ncbi.nlm.nih.gov/pubmed/33693353
http://dx.doi.org/10.3389/fdata.2019.00030
author Preo, Nicolo'
Capobianco, Enrico
collection PubMed
description Background: Electronic health records (EHR) play an important role in the redefinition of phenotypes, given the wealth and heterogeneity of information now available from disparate data sources. A recent cross-sectional retrospective study has described the potential of EHR for type 2 diabetes mellitus (T2D) screening when ad hoc models are used. About 10,000 US patients were analyzed through a variety of inference techniques applied to all records, which had a variable degree of completeness. The analyses conducted in the reference study indicated that EHR phenotypes significantly improved T2D detection.
Methods: With these US patients and the T2D data evidenced in the above study, we propose an integrative inference approach that leverages the predictive power of EHR features selected by two well-known methods, Random Forests and Lasso. The goal is twofold: reducing the Big Data redundancies potentially harmful to the predictive learning task, and exploiting the interconnectivity of EHR features. A mutual information (MI) network is the inference tool used to identify communities useful for prioritizing significant T2D features underlying the similarity between patients.
Results: Endowed with a different degree of granularity, the communities detected after the application of both methods were centered especially on T2D comorbidities and risk factors. As such, they appear very relevant for the assessment of two main issues: T2D disease burden and prevention.
Conclusions: Our analytical approach offers a solution for managing the EHR scale factor in a complex disease context. EHR are rich sources of phenotypic diversity through which novel stratifications of patients are expected. To enable these results, both pre-screening of variables and calibration of risk prediction methods become necessary steps in EHR analyses. We have presented networks identifying major T2D communities. The specific significance assigned to comorbidities and risk factors in relation to T2D can be inferred with accuracy from just a suitably reduced number of EHR features.
format Online
Article
Text
id pubmed-7931876
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7931876 2021-03-09 Front Big Data Big Data Frontiers Media S.A. 2019-09-27 /pmc/articles/PMC7931876/ /pubmed/33693353 http://dx.doi.org/10.3389/fdata.2019.00030 Text en
Copyright © 2019 Preo and Capobianco. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Significant EHR Feature-Driven T2D Inference: Predictive Machine Learning and Networks
topic Big Data
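
Illustrative note: the abstract in this record describes a mutual information (MI) network whose communities group related EHR features. The following is only a minimal sketch of that general idea, not the authors' implementation. The feature names, the toy patient data, and the threshold value are all hypothetical, and plain connected components are swapped in for a real community detection algorithm. The Random Forests and Lasso selection steps mentioned in the abstract are not shown.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mi_network(features, threshold):
    """Link two features when their mutual information exceeds the threshold."""
    edges = {}
    for a, b in combinations(sorted(features), 2):
        mi = mutual_information(features[a], features[b])
        if mi > threshold:
            edges[(a, b)] = mi
    return edges


def connected_groups(nodes, edges):
    """Connected components: a crude stand-in for community detection."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Made-up binary indicators for eight hypothetical patients (not study data).
features = {
    "hypertension": [1, 1, 1, 1, 0, 0, 0, 0],
    "obesity":      [1, 1, 1, 0, 0, 0, 0, 1],
    "smoker":       [1, 0, 1, 0, 1, 0, 1, 0],
}
edges = mi_network(features, threshold=0.05)  # threshold is an arbitrary choice
groups = connected_groups(features, edges)
print(edges)
print(groups)
```

In this toy data, only the hypertension and obesity indicators share information, so they form one group and the smoker indicator stays isolated. A realistic analysis would need a principled threshold and a dedicated community detection method.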