A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history

Bibliographic Details
Main authors: Maurits, Marc P, Korsunsky, Ilya, Raychaudhuri, Soumya, Murphy, Shawn N, Smoller, Jordan W, Weiss, Scott T, Petukhova, Lynn M, Weng, Chunhua, Wei, Wei-Qi, Huizinga, Thomas W J, Reinders, Marcel J T, Karlson, Elizabeth W, van den Akker, Erik B, Knevel, Rachel
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9122640/
https://www.ncbi.nlm.nih.gov/pubmed/35139533
http://dx.doi.org/10.1093/jamia/ocac008
author Maurits, Marc P
Korsunsky, Ilya
Raychaudhuri, Soumya
Murphy, Shawn N
Smoller, Jordan W
Weiss, Scott T
Petukhova, Lynn M
Weng, Chunhua
Wei, Wei-Qi
Huizinga, Thomas W J
Reinders, Marcel J T
Karlson, Elizabeth W
van den Akker, Erik B
Knevel, Rachel
collection PubMed
description OBJECTIVE: To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects.
MATERIAL AND METHODS: We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features.
RESULTS: We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 “other headache” clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2–8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles.
DISCUSSION: Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data.
CONCLUSION: We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
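The replication criterion mentioned in the abstract (a cluster's phenotypic pattern correlating at ≥0.75 with its counterpart in other cohorts) can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a cluster's profile is the per-code prevalence among its patients and that similarity is measured with Pearson correlation; the record does not say how the authors defined profiles or matched clusters across centers. The function names (`code_prevalence`, `pearson`, `replicating_cohorts`) and the toy data are hypothetical, and the sketch does not cover the batch correction, clustering, PheSpec, or Ranked Scope Pervasion steps.

```python
from math import sqrt

# Threshold taken from the abstract's replication criterion; everything else is assumed.
REPLICATION_THRESHOLD = 0.75


def code_prevalence(patients, codes):
    """Hypothetical profile: fraction of a cluster's patients carrying each billing code.

    `patients` is a list of sets of billing codes (one set per patient);
    `codes` fixes the order of the returned profile vector.
    """
    n = len(patients)
    return [sum(code in p for p in patients) / n for code in codes]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)


def replicating_cohorts(cluster_by_cohort, reference, codes, threshold=REPLICATION_THRESHOLD):
    """Other cohorts whose profile for this cluster correlates >= threshold with the reference."""
    ref_profile = code_prevalence(cluster_by_cohort[reference], codes)
    hits = []
    for cohort, patients in cluster_by_cohort.items():
        if cohort == reference or not patients:
            continue  # skip the reference itself and cohorts where the cluster is empty
        r = pearson(ref_profile, code_prevalence(patients, codes))
        if r >= threshold:
            hits.append(cohort)
    return sorted(hits)


# Toy usage with invented data (not from the study): one cluster observed in three centers.
codes = ["migraine", "convulsion", "eye disorder", "joint pain"]
cluster = {
    "center_A": [{"migraine"}, {"migraine", "eye disorder"}, {"migraine"}],
    "center_B": [{"migraine"}, {"migraine"}, {"migraine", "joint pain"}],
    "center_C": [{"convulsion"}, {"convulsion", "joint pain"}],
}
print(replicating_cohorts(cluster, "center_A", codes))  # prints ['center_B']
```

Here center_B's profile correlates with center_A's at about 0.83, so it counts as a replication, while center_C's profile is negatively correlated and does not. Counting such hits per cluster over all cohorts would give a per-cluster replication count of the kind the abstract summarizes as "an average of 6 (2–8) of the 12 different cohorts".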
format Online
Article
Text
id pubmed-9122640
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9122640 2022-05-23. J Am Med Inform Assoc, Research and Applications. Oxford University Press, 2022-02-09. © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9122640/
https://www.ncbi.nlm.nih.gov/pubmed/35139533
http://dx.doi.org/10.1093/jamia/ocac008