Comparison of machine learning methods for estimating case fatality ratios: An Ebola outbreak simulation study

Bibliographic Details
Main authors:	Forna, Alpha; Dorigatti, Ilaria; Nouvellet, Pierre; Donnelly, Christl A.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8443081/ https://www.ncbi.nlm.nih.gov/pubmed/34525098 http://dx.doi.org/10.1371/journal.pone.0257005

author	Forna, Alpha; Dorigatti, Ilaria; Nouvellet, Pierre; Donnelly, Christl A.
collection	PubMed
description	BACKGROUND: Machine learning (ML) algorithms are now increasingly used in infectious disease epidemiology. Epidemiologists should understand how ML algorithms behave in the context of outbreak data, where missing data are almost ubiquitous. METHODS: Using simulated data, we use an ML algorithmic framework to evaluate data imputation performance and the resulting case fatality ratio (CFR) estimates, focusing on the scale and type of data missingness (i.e., missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR)). RESULTS: Across ML methods, dataset sizes and proportions of training data used, the area under the receiver operating characteristic curve decreased by 7% (median; range: 1%–16%) when missingness was increased from 10% to 40%. The overall reduction in CFR bias for MAR, across methods, proportions of missingness, outbreak sizes and proportions of training data, was 0.5% (median; range: 0%–11%). CONCLUSION: ML methods could reduce bias and increase the precision of CFR estimates at low levels of missingness. However, no method is robust to high percentages of missingness. Thus, a data-centric approach is recommended in outbreak settings: patient survival outcome data should be prioritised for collection, and random-sample follow-ups should be implemented to ascertain missing outcomes.
format	Online Article Text
id	pubmed-8443081
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8443081 2021-09-16 Comparison of machine learning methods for estimating case fatality ratios: An Ebola outbreak simulation study. Forna, Alpha; Dorigatti, Ilaria; Nouvellet, Pierre; Donnelly, Christl A. PLoS One, Research Article.
Public Library of Science 2021-09-15 /pmc/articles/PMC8443081/ /pubmed/34525098 http://dx.doi.org/10.1371/journal.pone.0257005 Text en © 2021 Forna et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Comparison of machine learning methods for estimating case fatality ratios: An Ebola outbreak simulation study
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8443081/ https://www.ncbi.nlm.nih.gov/pubmed/34525098 http://dx.doi.org/10.1371/journal.pone.0257005
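
A CFR computed only from cases with a recorded survival outcome (a complete-case estimate) is unbiased when outcomes are missing completely at random, but it can be biased under MAR or MNAR. This is the setting the abstract describes. The Python sketch below illustrates that logic. It is not the authors' ML framework. The following are all hypothetical assumptions chosen for illustration:

- the age covariate;
- the linear fatality model (risk rising from 0.2 to 0.7 with age);
- the three missingness rules;
- the helper names `simulate_cases`, `record_outcome`, `complete_case_cfr` and `imputed_cfr`.

The 10-year age-band imputation is a deliberately simple stand-in for the ML classifiers compared in the study.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    age: int    # hypothetical covariate, assumed always recorded
    died: bool  # true survival outcome (known only inside the simulation)


def simulate_cases(n: int, rng: random.Random) -> List[Case]:
    """Simulate n cases whose fatality risk rises with age (assumed model)."""
    cases = []
    for _ in range(n):
        age = rng.randint(0, 80)
        p_death = 0.2 + 0.5 * age / 80  # hypothetical risk, between 0.2 and 0.7
        cases.append(Case(age, rng.random() < p_death))
    return cases


def record_outcome(case: Case, mechanism: str, rate: float,
                   rng: random.Random) -> Optional[bool]:
    """Return the recorded outcome, or None if it is missing (assumed rules)."""
    if mechanism == "MCAR":
        p_missing = rate  # unrelated to age or outcome
    elif mechanism == "MAR":
        # depends only on the observed covariate (age); averages about `rate`
        p_missing = 2 * rate * case.age / 80
    elif mechanism == "MNAR":
        # depends on the unobserved outcome itself: deaths go unrecorded more often
        p_missing = 1.5 * rate if case.died else 0.5 * rate
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return None if rng.random() < p_missing else case.died


def complete_case_cfr(outcomes: List[Optional[bool]]) -> float:
    """Deaths / cases with a known outcome; assumes at least one is known."""
    known = [o for o in outcomes if o is not None]
    return sum(known) / len(known)


def imputed_cfr(cases: List[Case], outcomes: List[Optional[bool]]) -> float:
    """Replace each missing outcome with the observed death rate of its
    10-year age band, then average over all cases."""
    deaths: dict = defaultdict(int)
    known: dict = defaultdict(int)
    for case, outcome in zip(cases, outcomes):
        if outcome is not None:
            band = case.age // 10
            known[band] += 1
            deaths[band] += outcome
    overall = complete_case_cfr(outcomes)  # fallback for bands with no known outcome
    total = 0.0
    for case, outcome in zip(cases, outcomes):
        if outcome is not None:
            total += outcome
        else:
            band = case.age // 10
            total += deaths[band] / known[band] if known[band] else overall
    return total / len(cases)


if __name__ == "__main__":
    rng = random.Random(2021)
    cases = simulate_cases(100_000, rng)
    true_cfr = sum(c.died for c in cases) / len(cases)
    print(f"true CFR: {true_cfr:.3f}")
    for mechanism in ("MCAR", "MAR", "MNAR"):
        outcomes = [record_outcome(c, mechanism, 0.4, rng) for c in cases]
        print(f"{mechanism}: complete-case {complete_case_cfr(outcomes):.3f}, "
              f"age-band imputation {imputed_cfr(cases, outcomes):.3f}")
```

The following holds with these assumed parameters and 40% missingness:

- Under MCAR, the complete-case estimate should stay close to the true CFR.
- Under MAR, it is pulled down because older, higher-risk cases are recorded less often, and age-band imputation largely removes that bias.
- Under MNAR, both estimates stay biased, because missingness depends on the outcome itself and imputing from age alone cannot recover it.

The exact numbers depend on the seed and on the assumed models.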

