Simulating complex patient populations with hierarchical learning effects to support methods development for post-market surveillance

Bibliographic Details
Main Authors: Davis, Sharon E., Ssemaganda, Henry, Koola, Jejo D., Mao, Jialin, Westerman, Dax, Speroff, Theodore, Govindarajulu, Usha S., Ramsay, Craig R., Sedrakyan, Art, Ohno-Machado, Lucila, Resnic, Frederic S., Matheny, Michael E.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10088292/
https://www.ncbi.nlm.nih.gov/pubmed/37041457
http://dx.doi.org/10.1186/s12874-023-01913-9
_version_ 1785022542687240192
author Davis, Sharon E.
Ssemaganda, Henry
Koola, Jejo D.
Mao, Jialin
Westerman, Dax
Speroff, Theodore
Govindarajulu, Usha S.
Ramsay, Craig R.
Sedrakyan, Art
Ohno-Machado, Lucila
Resnic, Frederic S.
Matheny, Michael E.
author_sort Davis, Sharon E.
collection PubMed
description BACKGROUND: Validating new algorithms, such as methods to disentangle intrinsic treatment risk from risk associated with experiential learning of novel treatments, often requires knowing the ground truth for data characteristics under investigation. Since the ground truth is inaccessible in real-world data, simulation studies using synthetic datasets that mimic complex clinical environments are essential. We describe and evaluate a generalizable framework for injecting hierarchical learning effects within a robust data generation process that incorporates the magnitude of intrinsic risk and accounts for known critical elements in clinical data relationships.
METHODS: We present a multi-step data-generating process with customizable options and flexible modules to support a variety of simulation requirements. Synthetic patients with nonlinear and correlated features are assigned to provider and institution case series. The probabilities of treatment and outcome assignment are associated with patient features based on user definitions. Risk due to experiential learning by providers and/or institutions when novel treatments are introduced is injected at various speeds and magnitudes. To further reflect real-world complexity, users can request missing values and omitted variables. We illustrate an implementation of our method in a case study using MIMIC-III data for reference patient feature distributions.
RESULTS: Realized data characteristics in the simulated data reflected specified values. Apparent deviations in treatment effects and feature distributions, though not statistically significant, were most common in small datasets (n < 3000) and attributable to random noise and variability in estimating realized values in small samples. When learning effects were specified, synthetic datasets exhibited changes in the probability of an adverse outcome as cases accrued for the treatment group impacted by learning and stable probabilities as cases accrued for the treatment group not affected by learning.
CONCLUSIONS: Our framework extends clinical data simulation techniques beyond generation of patient features to incorporate hierarchical learning effects. This enables the complex simulation studies required to develop and rigorously test algorithms developed to disentangle treatment safety signals from the effects of experiential learning. By supporting such efforts, this work can help identify training opportunities, avoid unwarranted restriction of access to medical advances, and hasten treatment improvements.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-023-01913-9.
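The data-generating steps summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the exponential-decay learning curve, the coefficients, the single `severity` feature, and the names `learning_log_odds` and `simulate_cases` are all assumptions made for illustration. It shows one way that provider-level and institution-level learning effects of differing speed and magnitude could be added to the outcome risk of patients receiving a novel treatment, while the comparison group keeps a stable risk model.

```python
import math
import random


def learning_log_odds(case_number, magnitude, speed):
    """Added log-odds of an adverse outcome at the given case number.

    Assumed form (not taken from the paper): exponential decay that starts
    at `magnitude` for the first case and approaches 0 as cases accrue.
    """
    return magnitude * math.exp(-speed * (case_number - 1))


def sigmoid(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_cases(n_patients, n_providers=6, n_institutions=2, seed=0):
    """Generate synthetic cases with hierarchical learning effects.

    All coefficients below are hypothetical placeholders.
    """
    rng = random.Random(seed)
    provider_cases = [0] * n_providers        # novel-treatment cases so far
    institution_cases = [0] * n_institutions  # novel-treatment cases so far
    rows = []
    for _ in range(n_patients):
        provider = rng.randrange(n_providers)
        institution = provider % n_institutions  # fixed provider-to-site map
        severity = rng.gauss(0.0, 1.0)           # one illustrative feature
        # Treatment assignment depends on the patient feature.
        novel = rng.random() < sigmoid(0.5 * severity)
        # Baseline outcome risk depends on the patient feature.
        logit = -2.0 + 0.8 * severity
        if novel:
            provider_cases[provider] += 1
            institution_cases[institution] += 1
            logit += 0.3  # intrinsic treatment risk
            # Fast, larger learning effect at the provider level.
            logit += learning_log_odds(provider_cases[provider], 1.0, 0.10)
            # Slow, smaller learning effect at the institution level.
            logit += learning_log_odds(institution_cases[institution], 0.5, 0.02)
        p_outcome = sigmoid(logit)
        rows.append({
            "provider": provider,
            "institution": institution,
            "novel_treatment": novel,
            "p_outcome": p_outcome,
            "adverse_outcome": rng.random() < p_outcome,
        })
    return rows
```

Calling `simulate_cases(3000, seed=1)` returns one dictionary per synthetic patient. Because the true learning terms are known by construction, a candidate surveillance method can be checked against them. Missing values and omitted variables, which the abstract also mentions, are left out of this sketch.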
format Online
Article
Text
id pubmed-10088292
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10088292 2023-04-12 Simulating complex patient populations with hierarchical learning effects to support methods development for post-market surveillance Davis, Sharon E. Ssemaganda, Henry Koola, Jejo D. Mao, Jialin Westerman, Dax Speroff, Theodore Govindarajulu, Usha S. Ramsay, Craig R. Sedrakyan, Art Ohno-Machado, Lucila Resnic, Frederic S. Matheny, Michael E. BMC Med Res Methodol Research BioMed Central 2023-04-11 /pmc/articles/PMC10088292/ /pubmed/37041457 http://dx.doi.org/10.1186/s12874-023-01913-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Simulating complex patient populations with hierarchical learning effects to support methods development for post-market surveillance
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10088292/
https://www.ncbi.nlm.nih.gov/pubmed/37041457
http://dx.doi.org/10.1186/s12874-023-01913-9