Inter operator variability of machine learning researchers predicting all-cause mortality in patients admitted to intensive care unit
Main Authors: | Jones, Y; Cleland, J; Li, C; Pellicori, P; Friday, J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9708006/ http://dx.doi.org/10.1093/ehjdh/ztab104.3052 |
author | Jones, Y; Cleland, J; Li, C; Pellicori, P; Friday, J
collection | PubMed |
description | BACKGROUND: The number of publications using machine learning (ML) to predict cardiovascular outcomes and identify clusters of patients at greater risk has risen dramatically in recent years. However, research papers that use ML often fail to provide sufficient information about their algorithms to enable results to be replicated by others in the same or different datasets.
AIM: To test the reproducibility of results from ML algorithms given three different levels of information commonly found in publications: model type alone, a description of the model, and the complete algorithm.
METHODS: MIMIC-III is a healthcare dataset comprising detailed information from over 60,000 intensive care unit (ICU) admissions to the Beth Israel Deaconess Medical Centre between 2001 and 2012. Access is available to everyone pending approval and completion of a short training course. Using this dataset, three models for predicting all-cause in-hospital mortality were created: two from a PhD student working in ML, and one from an existing research paper which used the same dataset and provided complete model information. A second researcher (a PhD student in ML and cardiology) was given the same dataset and was tasked with reproducing their results. Initially, this second researcher was told only what type of model was created in each case, then given a brief description of the algorithms, and finally the complete algorithms from each participant. In all three scenarios, recreated models were compared to original models using the Area Under the Receiver Operating Characteristic Curve (AUC).
RESULTS: After excluding those younger than 18 years and events with missing or invalid entries, 21,139 ICU admissions remained from 18,094 patients between 2001 and 2012, including 2,797 in-hospital deaths. Three models were produced: two Recurrent Neural Networks (RNNs), which differed significantly in internal weights and variables, and a Boosted Tree Classifier (BTC). The AUC of the first reproduced RNN matched that of the original RNN (Figure 1); however, the second RNN and the BTC could not be reproduced given model type alone. As more information was provided about these algorithms, the results from the reproduced models matched the original results more closely.
CONCLUSIONS: To create clinically useful ML tools with results that are reproducible and consistent, it is vital that researchers share enough detail about their models. Model type alone is not enough to guarantee reproducibility. Although some models can be recreated with limited information, this is not always the case, and the best results are obtained when the complete algorithm is shared. These findings are highly relevant to applying ML in clinical practice.
FUNDING ACKNOWLEDGEMENT: Type of funding sources: None. |
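The evaluation step described in the abstract — scoring an original and a reproduced mortality model by AUC on the same cohort — can be sketched as follows. This is a minimal illustration only: the labels and risk scores below are synthetic placeholders, not MIMIC-III data, and the study's actual RNN and boosted-tree models are not reproduced here.

```python
# Minimal sketch of the AUC comparison described above. AUC is computed
# via the rank-sum (Mann-Whitney U) formulation: the probability that a
# randomly chosen positive case receives a higher score than a randomly
# chosen negative case (ties count as 0.5).

def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = event)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic in-hospital mortality labels (1 = died) and the risk scores
# of a hypothetical "original" and "reproduced" model on the same cohort.
labels            = [0, 0, 1, 0, 1, 1, 0, 1]
original_scores   = [0.1, 0.3, 0.8, 0.2, 0.7, 0.9, 0.4, 0.6]
reproduced_scores = [0.2, 0.1, 0.7, 0.3, 0.9, 0.8, 0.5, 0.4]

print(f"original AUC:   {auc(labels, original_scores):.3f}")
print(f"reproduced AUC: {auc(labels, reproduced_scores):.3f}")
```

In the study, closeness of the reproduced AUC to the original AUC was the criterion for successful reproduction; the sketch shows only the metric, not the models.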
format | Online Article Text |
id | pubmed-9708006 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9708006 2023-01-27 Inter operator variability of machine learning researchers predicting all-cause mortality in patients admitted to intensive care unit. Jones, Y; Cleland, J; Li, C; Pellicori, P; Friday, J. Eur Heart J Digit Health, Abstracts. Oxford University Press 2021-12-29 /pmc/articles/PMC9708006/ http://dx.doi.org/10.1093/ehjdh/ztab104.3052 Text en Reproduced from: European Heart Journal, Volume 42, Issue Supplement_1, October 2021, ehab724.3052, https://doi.org/10.1093/eurheartj/ehab724.3052 by permission of Oxford University Press on behalf of the European Society of Cardiology. © The Author(s) 2021. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Inter operator variability of machine learning researchers predicting all-cause mortality in patients admitted to intensive care unit |
topic | Abstracts |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9708006/ http://dx.doi.org/10.1093/ehjdh/ztab104.3052 |