A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history
OBJECTIVE: To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. MATERIAL AND METHODS: We used 1872 billing cod...
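Main authors: | Maurits, Marc P; Korsunsky, Ilya; Raychaudhuri, Soumya; Murphy, Shawn N; Smoller, Jordan W; Weiss, Scott T; Petukhova, Lynn M; Weng, Chunhua; Wei, Wei-Qi; Huizinga, Thomas W J; Reinders, Marcel J T; Karlson, Elizabeth W; van den Akker, Erik B; Knevel, Rachel |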
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9122640/ https://www.ncbi.nlm.nih.gov/pubmed/35139533 http://dx.doi.org/10.1093/jamia/ocac008 |
_version_ | 1784711388117073920 |
---|---|
author | Maurits, Marc P Korsunsky, Ilya Raychaudhuri, Soumya Murphy, Shawn N Smoller, Jordan W Weiss, Scott T Petukhova, Lynn M Weng, Chunhua Wei, Wei-Qi Huizinga, Thomas W J Reinders, Marcel J T Karlson, Elizabeth W van den Akker, Erik B Knevel, Rachel |
author_sort | Maurits, Marc P |
collection | PubMed |
description | OBJECTIVE: To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. MATERIAL AND METHODS: We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. RESULTS: We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 “other headache” clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2–8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. DISCUSSION: Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data. CONCLUSION: We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes. |
format | Online Article Text |
id | pubmed-9122640 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9122640 2022-05-23 J Am Med Inform Assoc Research and Applications Oxford University Press 2022-02-09 /pmc/articles/PMC9122640/ /pubmed/35139533 http://dx.doi.org/10.1093/jamia/ocac008 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history |
topic | Research and Applications |