Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units
Main Authors: | Yu, Chao; Liu, Jiming; Zhao, Hongyi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454602/ https://www.ncbi.nlm.nih.gov/pubmed/30961594 http://dx.doi.org/10.1186/s12911-019-0763-6 |
_version_ | 1783409568821280768 |
---|---|
author | Yu, Chao Liu, Jiming Zhao, Hongyi |
author_facet | Yu, Chao Liu, Jiming Zhao, Hongyi |
author_sort | Yu, Chao |
collection | PubMed |
description | BACKGROUND: Reinforcement learning (RL) provides a promising technique to solve complex sequential decision making problems in health care domains. To ensure such applications, an explicit reward function encoding domain knowledge should be specified beforehand to indicate the goal of tasks. However, there is usually no explicit information regarding the reward function in medical records. It is then necessary to consider an approach whereby the reward function can be learned from a set of presumably optimal treatment trajectories using retrospective real medical data. This paper applies inverse RL in inferring the reward functions that clinicians have in mind during their decisions on weaning of mechanical ventilation and sedative dosing in Intensive Care Units (ICUs). METHODS: We model the decision making problem as a Markov Decision Process, and use a batch RL method, Fitted Q Iterations with Gradient Boosting Decision Tree, to learn a suitable ventilator weaning policy from real trajectories in retrospective ICU data. A Bayesian inverse RL method is then applied to infer the latent reward functions in terms of weights in trading off various aspects of evaluation criterion. We then evaluate how the policy learned using the Bayesian inverse RL method matches the policy given by clinicians, as compared to other policies learned with fixed reward functions. RESULTS: Results show that the inverse RL method is capable of extracting meaningful indicators for recommending extubation readiness and sedative dosage, indicating that clinicians pay more attention to patients’ physiological stability (e.g., heart rate and respiration rate), rather than oxygenation criteria (FiO(2), PEEP and SpO(2)) which is supported by previous RL methods. Moreover, by discovering the optimal weights, new effective treatment protocols can be suggested. CONCLUSIONS: Inverse RL is an effective approach to discovering clinicians’ underlying reward functions for designing better treatment protocols in the ventilation weaning and sedative dosing in future ICUs. |
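The METHODS passage above names Fitted Q Iteration with a Gradient Boosting Decision Tree as the batch RL learner applied to the retrospective ICU trajectories. As a rough illustration of that idea (not the authors' code), the sketch below regresses Q(s, a) with scikit-learn's GradientBoostingRegressor over batched transitions; the discount factor, tree hyperparameters, and the (states, actions, rewards, next_states, done) array layout are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_q_iteration(states, actions, rewards, next_states, done,
                       n_actions, n_iterations=50, gamma=0.99):
    """Batch FQI over retrospective (s, a, r, s') transitions."""
    X = np.hstack([states, actions.reshape(-1, 1)])   # regress Q on (state, action)
    q_model = None
    for _ in range(n_iterations):
        if q_model is None:
            targets = rewards                          # first iteration: Q_0 = r
        else:
            # Bootstrapped targets: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                q_model.predict(
                    np.hstack([next_states, np.full((len(next_states), 1), a)]))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * (1.0 - done) * q_next.max(axis=1)
        q_model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
        q_model.fit(X, targets)
    return q_model
```

The greedy policy argmax_a Q(s, a) taken from the fitted model is the kind of learned weaning/sedation policy that can then be compared against the clinicians' recorded decisions.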
format | Online Article Text |
id | pubmed-6454602 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-64546022019-04-19 Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units Yu, Chao Liu, Jiming Zhao, Hongyi BMC Med Inform Decis Mak Research BACKGROUND: Reinforcement learning (RL) provides a promising technique to solve complex sequential decision making problems in health care domains. To ensure such applications, an explicit reward function encoding domain knowledge should be specified beforehand to indicate the goal of tasks. However, there is usually no explicit information regarding the reward function in medical records. It is then necessary to consider an approach whereby the reward function can be learned from a set of presumably optimal treatment trajectories using retrospective real medical data. This paper applies inverse RL in inferring the reward functions that clinicians have in mind during their decisions on weaning of mechanical ventilation and sedative dosing in Intensive Care Units (ICUs). METHODS: We model the decision making problem as a Markov Decision Process, and use a batch RL method, Fitted Q Iterations with Gradient Boosting Decision Tree, to learn a suitable ventilator weaning policy from real trajectories in retrospective ICU data. A Bayesian inverse RL method is then applied to infer the latent reward functions in terms of weights in trading off various aspects of evaluation criterion. We then evaluate how the policy learned using the Bayesian inverse RL method matches the policy given by clinicians, as compared to other policies learned with fixed reward functions. RESULTS: Results show that the inverse RL method is capable of extracting meaningful indicators for recommending extubation readiness and sedative dosage, indicating that clinicians pay more attention to patients’ physiological stability (e.g., heart rate and respiration rate), rather than oxygenation criteria (FiO(2), PEEP and SpO(2)) which is supported by previous RL methods. Moreover, by discovering the optimal weights, new effective treatment protocols can be suggested. CONCLUSIONS: Inverse RL is an effective approach to discovering clinicians’ underlying reward functions for designing better treatment protocols in the ventilation weaning and sedative dosing in future ICUs. BioMed Central 2019-04-09 /pmc/articles/PMC6454602/ /pubmed/30961594 http://dx.doi.org/10.1186/s12911-019-0763-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
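The same abstract describes inferring latent reward weights with Bayesian inverse RL. The sketch below illustrates the general Bayesian IRL recipe under a linear reward r_w(s) = w · φ(s): a Boltzmann likelihood of the clinicians' recorded actions, sampled with a random-walk Metropolis step over the weight vector. The `solve_q` solver, the feature map, and all hyperparameters are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def log_likelihood(w, trajectories, solve_q, alpha=1.0):
    """Boltzmann likelihood of observed (state, action) pairs under reward weights w."""
    q = solve_q(w)                       # Q-function for r_w(s) = w . phi(s); solver is assumed
    ll = 0.0
    for states, actions in trajectories:
        for s, a in zip(states, actions):
            qs = q(s)                    # Q-values over the discrete action set at state s
            ll += alpha * qs[a] - np.log(np.sum(np.exp(alpha * qs)))
    return ll

def bayesian_irl(trajectories, solve_q, n_features, n_samples=1000, step=0.05, seed=0):
    """Random-walk Metropolis over reward weights (PolicyWalk-style sampler)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_features)
    ll = log_likelihood(w, trajectories, solve_q)
    samples = []
    for _ in range(n_samples):
        w_new = w + step * rng.normal(size=n_features)
        ll_new = log_likelihood(w_new, trajectories, solve_q)
        # Flat prior: accept or reject on the likelihood ratio alone
        if np.log(rng.random()) < ll_new - ll:
            w, ll = w_new, ll_new
        samples.append(w.copy())
    return np.mean(samples, axis=0)      # posterior-mean trade-off weights
```

In this reading, the returned weights play the role of the trade-offs the abstract mentions, e.g. how heavily physiological stability is weighted relative to oxygenation criteria.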
spellingShingle | Research Yu, Chao Liu, Jiming Zhao, Hongyi Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units |
title | Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units |
title_full | Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units |
title_fullStr | Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units |
title_full_unstemmed | Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units |
title_short | Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units |
title_sort | inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454602/ https://www.ncbi.nlm.nih.gov/pubmed/30961594 http://dx.doi.org/10.1186/s12911-019-0763-6 |
work_keys_str_mv | AT yuchao inversereinforcementlearningforintelligentmechanicalventilationandsedativedosinginintensivecareunits AT liujiming inversereinforcementlearningforintelligentmechanicalventilationandsedativedosinginintensivecareunits AT zhaohongyi inversereinforcementlearningforintelligentmechanicalventilationandsedativedosinginintensivecareunits |