Significant EHR Feature-Driven T2D Inference: Predictive Machine Learning and Networks
Main authors: | Preo, Nicolo'; Capobianco, Enrico |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2019 |
Subjects: | Big Data |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931876/ https://www.ncbi.nlm.nih.gov/pubmed/33693353 http://dx.doi.org/10.3389/fdata.2019.00030 |
_version_ | 1783660372867153920 |
---|---|
author | Preo, Nicolo'; Capobianco, Enrico |
collection | PubMed |
description | Background: Electronic health records (EHR) play an important role in the redefinition of phenotypes, given the wealth and heterogeneity of information now available from disparate data sources. A recent cross-sectional retrospective study has described the potential of EHR for type 2 diabetes mellitus (T2D) screening when ad hoc models are used. About 10,000 US patients were analyzed with a variety of inference techniques applied to all records, whose degree of completeness varied. The analyses in that reference study indicated that EHR phenotypes significantly improved T2D detection. Methods: Using the same US patients and the T2D data reported in that study, we propose an integrative inference approach that leverages the predictive power of EHR features selected by two well-known methods, Random Forests and Lasso. The goal is twofold: reducing the Big Data redundancies that can harm the predictive learning task, and exploiting the interconnectivity of EHR features. A mutual information (MI) network is the inference tool used to identify communities that help prioritize the significant T2D features underlying the similarity between patients. Results: The communities detected after applying both methods differed in granularity and were centered mainly on T2D comorbidities and risk factors. As such, they appear highly relevant for assessing two main issues: T2D disease burden and prevention. Conclusions: Our analytical approach offers a solution for managing the EHR scale factor in a complex disease context. EHR are rich sources of phenotypic diversity from which novel patient stratifications are expected. To enable these results, both pre-screening of variables and calibration of risk prediction methods become necessary steps in EHR analyses. We have presented networks identifying major T2D communities. The specific significance assigned to comorbidities and risk factors in relation to T2D can be inferred accurately from a suitably reduced number of EHR features. |
format | Online Article Text |
id | pubmed-7931876 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7931876 2021-03-09; Front Big Data; Big Data; Frontiers Media S.A. 2019-09-27; /pmc/articles/PMC7931876/ /pubmed/33693353 http://dx.doi.org/10.3389/fdata.2019.00030; Text; en; Copyright © 2019 Preo and Capobianco. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Significant EHR Feature-Driven T2D Inference: Predictive Machine Learning and Networks |
topic | Big Data |
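The network step summarized in the abstract (an MI network over preselected EHR features, grouped into communities) can be illustrated with a minimal sketch. This record does not say how the authors estimated MI or which community-detection algorithm they used, so everything below is an assumption: features are taken to be binary (0/1) indicators that were already preselected upstream (for instance with scikit-learn's `RandomForestClassifier.feature_importances_` and an L1-penalized `LogisticRegression`), MI is the plug-in estimate from counts, edges are kept above a hypothetical threshold, and connected components of the thresholded graph stand in, deliberately simply, for a proper community-detection method. The function names, feature names, and toy data are hypothetical and synthetic, not taken from the study.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Plug-in (empirical) mutual information, in nats, between two discrete columns."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))) written with raw counts;
        # empty cells are skipped because 0 * log 0 is taken as 0.
        mi += (n_ab / n) * math.log(n_ab * n / (px[a] * py[b]))
    return mi


def mi_network(features, threshold):
    """Hypothetical helper: keep feature pairs whose MI exceeds `threshold` as weighted edges."""
    edges = {}
    for u, v in combinations(sorted(features), 2):
        w = mutual_information(features[u], features[v])
        if w > threshold:
            edges[(u, v)] = w
    return edges


def communities(nodes, edges):
    """Connected components of the thresholded graph.

    A simple stand-in for community detection (e.g. modularity-based
    methods), not necessarily the algorithm used in the paper.
    """
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            group.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # Synthetic 0/1 indicators for 8 hypothetical patients (not study data).
    toy = {
        "obesity":             [1, 1, 1, 1, 0, 0, 0, 0],
        "bmi_over_30":         [1, 1, 1, 1, 0, 0, 0, 0],
        "hypertension":        [1, 1, 0, 0, 1, 1, 0, 0],
        "antihypertensive_rx": [1, 1, 0, 0, 1, 1, 0, 0],
    }
    edges = mi_network(toy, threshold=0.1)
    print(edges)
    # Expect two groups: ['antihypertensive_rx', 'hypertension'] and ['bmi_over_30', 'obesity']
    print(communities(toy, edges))
```

In this toy data, identical columns share log 2 nats of information while the two pairs are independent of each other (MI of 0), so the threshold separates them into two groups. With real EHR data, continuous variables would need discretization or a dedicated MI estimator, and the threshold (or a modularity-based method) would have to be calibrated rather than fixed by hand.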