Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units

Bibliographic Details
Main Authors: Yu, Chao; Liu, Jiming; Zhao, Hongyi
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454602/
https://www.ncbi.nlm.nih.gov/pubmed/30961594
http://dx.doi.org/10.1186/s12911-019-0763-6
_version_ 1783409568821280768
author Yu, Chao
Liu, Jiming
Zhao, Hongyi
author_facet Yu, Chao
Liu, Jiming
Zhao, Hongyi
author_sort Yu, Chao
collection PubMed
description BACKGROUND: Reinforcement learning (RL) provides a promising technique for solving complex sequential decision-making problems in health care domains. To enable such applications, an explicit reward function encoding domain knowledge must be specified beforehand to indicate the goal of the task. However, medical records usually contain no explicit information about the reward function. It is therefore necessary to consider an approach whereby the reward function can be learned from a set of presumably optimal treatment trajectories in retrospective real-world medical data. This paper applies inverse RL to infer the reward functions that clinicians have in mind when making decisions on weaning of mechanical ventilation and sedative dosing in Intensive Care Units (ICUs). METHODS: We model the decision-making problem as a Markov Decision Process and use a batch RL method, Fitted Q Iteration with Gradient Boosted Decision Trees, to learn a suitable ventilator weaning policy from real trajectories in retrospective ICU data. A Bayesian inverse RL method is then applied to infer the latent reward functions in terms of weights that trade off various aspects of the evaluation criteria. We then evaluate how well the policy learned with the Bayesian inverse RL method matches the clinicians' policy, compared to policies learned with fixed reward functions. RESULTS: The inverse RL method extracts meaningful indicators for recommending extubation readiness and sedative dosage, indicating that clinicians pay more attention to patients' physiological stability (e.g., heart rate and respiration rate) than to the oxygenation criteria (FiO(2), PEEP and SpO(2)) emphasized by previous RL methods. Moreover, the discovered optimal weights suggest new and effective treatment protocols. CONCLUSIONS: Inverse RL is an effective approach to discovering clinicians' underlying reward functions and can inform the design of better protocols for ventilator weaning and sedative dosing in future ICUs.
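The METHODS field above describes a two-stage pipeline: Fitted Q Iteration with gradient-boosted trees for the batch-RL step, and Bayesian inverse RL to infer reward weights from clinicians' trajectories. The following Python sketch illustrates how such a pipeline could look. It is not the authors' implementation: the linear reward model r(s) = w·s, the Boltzmann likelihood, the flat prior, the discount factor, and every function name and hyperparameter below are assumptions made for this example.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

GAMMA = 0.99  # assumed discount factor (not stated in the abstract)

def fitted_q_iteration(S, A, R, S_next, n_actions, n_iters=30):
    """Batch FQI: iteratively regress Q(s, a) onto r + GAMMA * max_a' Q(s', a').

    S, S_next: (n, d) patient-state feature arrays from logged transitions.
    A: integer-coded discrete actions (e.g., wean/keep ventilation, dose level).
    """
    X = np.column_stack([S, A])
    y = R.astype(float)
    model = None
    for _ in range(n_iters):
        model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
        model.fit(X, y)
        q_next = np.column_stack([
            model.predict(np.column_stack([S_next, np.full(len(S_next), a)]))
            for a in range(n_actions)
        ])
        y = R + GAMMA * q_next.max(axis=1)  # bootstrap target for the next sweep
    return model

def q_values(model, S, n_actions):
    """Q(s, a) for every state in S and every discrete action."""
    return np.column_stack([
        model.predict(np.column_stack([S, np.full(len(S), a)]))
        for a in range(n_actions)
    ])

def bayesian_irl(S, A, S_next, n_actions, n_samples=200, step=0.05, beta=1.0):
    """Metropolis-Hastings random walk over reward weights w, assuming r(s) = w . s.

    The likelihood scores the clinicians' logged actions A under a Boltzmann
    policy derived from the Q-function that a candidate w induces."""
    rng = np.random.default_rng(0)
    d = S.shape[1]

    def log_likelihood(w):
        R = S @ w                                  # reward implied by candidate weights
        m = fitted_q_iteration(S, A, R, S_next, n_actions, n_iters=5)
        Q = beta * q_values(m, S, n_actions)
        log_z = np.logaddexp.reduce(Q, axis=1)     # Boltzmann normaliser per state
        return np.sum(Q[np.arange(len(A)), A] - log_z)

    w = np.zeros(d)
    ll = log_likelihood(w)
    samples = []
    for _ in range(n_samples):
        w_new = w + rng.normal(scale=step, size=d)
        ll_new = log_likelihood(w_new)
        if np.log(rng.random()) < ll_new - ll:     # flat prior: ratio reduces to likelihood
            w, ll = w_new, ll_new
        samples.append(w.copy())
    return np.asarray(samples).mean(axis=0)        # posterior-mean reward weights

Re-solving the MDP for every proposed w, as in this sketch, is the textbook but expensive formulation of Bayesian inverse RL; practical samplers such as PolicyWalk reuse the previous Q-function between proposals to keep inference tractable.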
format Online
Article
Text
id pubmed-6454602
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6454602 2019-04-19 Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units Yu, Chao Liu, Jiming Zhao, Hongyi BMC Med Inform Decis Mak Research BioMed Central 2019-04-09 /pmc/articles/PMC6454602/ /pubmed/30961594 http://dx.doi.org/10.1186/s12911-019-0763-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Yu, Chao
Liu, Jiming
Zhao, Hongyi
Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
title Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
title_full Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
title_fullStr Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
title_full_unstemmed Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
title_short Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
title_sort inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454602/
https://www.ncbi.nlm.nih.gov/pubmed/30961594
http://dx.doi.org/10.1186/s12911-019-0763-6
work_keys_str_mv AT yuchao inversereinforcementlearningforintelligentmechanicalventilationandsedativedosinginintensivecareunits
AT liujiming inversereinforcementlearningforintelligentmechanicalventilationandsedativedosinginintensivecareunits
AT zhaohongyi inversereinforcementlearningforintelligentmechanicalventilationandsedativedosinginintensivecareunits