Medication based machine learning to identify subpopulations of pediatric hemodialysis patients in an electronic health record database

Bibliographic Details
Main authors:	McKnite, Autumn M., Job, Kathleen M., Nelson, Raoul, Sherwin, Catherine M.T., Watt, Kevin M., Brewer, Simon C.
Format:	Online Article Text
Language:	English
Published:	2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674326/ https://www.ncbi.nlm.nih.gov/pubmed/36405250 http://dx.doi.org/10.1016/j.imu.2022.101104

author	McKnite, Autumn M.; Job, Kathleen M.; Nelson, Raoul; Sherwin, Catherine M.T.; Watt, Kevin M.; Brewer, Simon C.
collection	PubMed
description	Electronic health records (EHRs) have given rise to large and complex databases of medical information that have the potential to become powerful tools for clinical research. However, coding systems differ across institutions, and the detail and accuracy of the information within EHRs can vary. This makes it challenging to identify subpopulations of patients and limits the widespread use of multi-institutional databases. In this study, we leveraged machine learning to identify patterns in medication usage among hospitalized pediatric patients receiving renal replacement therapy and created a predictive model that successfully differentiated between patients receiving intermittent hemodialysis (iHD) and those receiving continuous renal replacement therapy (CRRT). We trained six machine learning algorithms (logistic regression, naïve Bayes, k-nearest neighbors, support vector machine, random forest, and gradient-boosted trees) using patient records from a multi-center database (n = 533), with prescribed medication ingredients (n = 228) as features, to discriminate between the two hemodialysis types. Predictive skill was assessed using 5-fold cross-validation, and balanced accuracy ranged from 0.70 (logistic regression) to 0.86 (random forest). The two best-performing models were further tested on an independent single-center dataset and achieved balanced accuracies of 0.84–0.87. This model overcomes the inconsistent coding inherent in large multi-institutional databases and will allow us to use and combine historical records, significantly increasing population size and diversity within both the iHD and CRRT populations for future clinical studies. Our work demonstrates that medications alone can accurately differentiate subpopulations of patients in large datasets, allowing codes to be transferred between different coding systems.
This framework has the potential to distinguish other subpopulations of patients for whom discriminatory ICD codes are not available, permitting more detailed insights and new lines of research.
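The evaluation protocol summarized in the description (medication ingredients as binary features, 5-fold cross-validation, balanced accuracy as the score) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code. The multi-hot encoding, the stratified round-robin fold assignment, the use of Bernoulli naive Bayes (one of the six model families the abstract lists) and every function name are illustrative assumptions, and the toy data are synthetic. Balanced accuracy averages recall over classes, so the larger class cannot dominate the score.

```python
"""Hypothetical sketch (not the authors' code): multi-hot medication features,
stratified k-fold cross-validation, and balanced accuracy for a binary
iHD-vs-CRRT label. Standard library only; all names are illustrative."""
import math
import random
from collections import defaultdict


def multi_hot(records, vocabulary):
    """Encode each patient's set of ingredients as a 0/1 vector over `vocabulary`."""
    index = {name: i for i, name in enumerate(vocabulary)}
    rows = []
    for ingredients in records:
        row = [0] * len(vocabulary)
        for name in ingredients:
            if name in index:  # ingredients outside the vocabulary are ignored
                row[index[name]] = 1
        rows.append(row)
    return rows


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true;
    for two classes this equals (sensitivity + specificity) / 2."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train, test) index lists, dealing each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Bernoulli naive Bayes with Laplace smoothing (alpha) on 0/1 features."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[c] = (math.log(len(rows) / len(X)), probs)
    return model


def predict_bernoulli_nb(model, X):
    """Return the class with the highest log-posterior for each row."""
    preds = []
    for x in X:
        scores = {
            c: log_prior + sum(math.log(p if v else 1 - p) for v, p in zip(x, probs))
            for c, (log_prior, probs) in model.items()
        }
        preds.append(max(scores, key=scores.get))
    return preds


def cross_validate(X, y, k=5, seed=0):
    """Mean balanced accuracy of Bernoulli naive Bayes over stratified k folds."""
    scores = []
    for train, test in stratified_k_fold(y, k, seed):
        model = fit_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        preds = predict_bernoulli_nb(model, [X[i] for i in test])
        scores.append(balanced_accuracy([y[i] for i in test], preds))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic toy data: hypothetical ingredient names, label 0 = iHD, 1 = CRRT.
    vocab = ["drug_a", "drug_b", "drug_c"]
    patients = [{"drug_a"}] * 10 + [{"drug_b", "drug_c"}] * 5
    labels = [0] * 10 + [1] * 5
    X = multi_hot(patients, vocab)
    print(cross_validate(X, labels))
```

Swapping in another of the listed model families (for example, a random forest from a machine learning library) would only change the fit and predict functions; the fold assignment and the balanced-accuracy metric would stay the same.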
format	Online Article Text
id	pubmed-9674326
institution	National Center for Biotechnology Information
language	English
publishDate	2022
record_format	MEDLINE/PubMed
spelling	pubmed-9674326 2022-11-18 Inform Med Unlocked Article 2022 2022-10-06 /pmc/articles/PMC9674326/ /pubmed/36405250 http://dx.doi.org/10.1016/j.imu.2022.101104 Text en This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Medication based machine learning to identify subpopulations of pediatric hemodialysis patients in an electronic health record database
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674326/ https://www.ncbi.nlm.nih.gov/pubmed/36405250 http://dx.doi.org/10.1016/j.imu.2022.101104

Similar Items