Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods

Bibliographic Details
Main Authors: Luo, Gang; Stone, Bryan L; Johnson, Michael D; Tarczy-Hornoch, Peter; Wilcox, Adam B; Mooney, Sean D; Sheng, Xiaoming; Haug, Peter J; Nkoy, Flory L
Format: Online Article Text
Language: English
Published: JMIR Publications 2017
Subjects: Proposal
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5596298/
https://www.ncbi.nlm.nih.gov/pubmed/28851678
http://dx.doi.org/10.2196/resprot.7757
Collection: PubMed
Description:
BACKGROUND: To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (also known as clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack the machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. Health care researchers can work with data scientists who have deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a US shortage of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient’s weight kept rising in the past year). This process becomes infeasible with limited budgets.

OBJECTIVE: This study’s goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from their data.
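The BACKGROUND names the choice of operators and periods for temporally aggregating clinical attributes as one of the iteration-heavy manual steps. The record does not describe how Auto-ML represents these operators, so the Python snippet below is only a minimal sketch under stated assumptions: an attribute is a list of (date, value) pairs, a period is a look-back window in days ending at a prediction date, and the helpers `in_window` and `aggregate` and the operator names (`mean`, `max`, `last`, `kept_rising`) are hypothetical. The sample weights are invented for illustration and are not patient data.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Callable, Sequence

# Assumed representation: one clinical attribute as (date, value) pairs,
# e.g. body weight in kg. All names below are hypothetical.
Measurement = tuple[date, float]


def in_window(ms: Sequence[Measurement], end: date, period_days: int) -> list[float]:
    """Values recorded in the window (end - period_days, end], oldest first."""
    start = end - timedelta(days=period_days)
    return [v for d, v in sorted(ms) if start < d <= end]


def kept_rising(values: list[float]) -> bool:
    """True if there are at least two values and each exceeds the previous one."""
    return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))


# Hypothetical operator library; each maps the windowed values to one feature.
OPERATORS: dict[str, Callable[[list[float]], object]] = {
    "mean": lambda vs: mean(vs) if vs else None,
    "max": lambda vs: max(vs) if vs else None,
    "last": lambda vs: vs[-1] if vs else None,
    "kept_rising": kept_rising,
}


def aggregate(ms: Sequence[Measurement], end: date, period_days: int, op: str) -> object:
    """Apply the named operator to the values inside the look-back window."""
    return OPERATORS[op](in_window(ms, end, period_days))


# Invented example readings (not real patient data).
weights = [
    (date(2015, 1, 1), 90.0),
    (date(2016, 3, 1), 80.0),
    (date(2016, 9, 1), 82.5),
    (date(2017, 2, 1), 84.0),
]
prediction_date = date(2017, 6, 1)
print(aggregate(weights, prediction_date, 365, "kept_rising"))   # True
print(aggregate(weights, prediction_date, 1000, "kept_rising"))  # False
```

Because the operator and the window length are plain parameters, an automated search can enumerate (operator, period) pairs as candidate features instead of a data scientist hand-coding each one; the two calls above show how the same operator gives different answers for different periods.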
METHODS: This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation and pilot one model with care managers; and (3) perform simulations to estimate the impact of adopting Auto-ML on US patient outcomes.

RESULTS: We are currently writing Auto-ML’s design document. We intend to finish our study by around 2022.

CONCLUSIONS: Auto-ML will generalize to various clinical prediction/classification problems. With minimal help from data scientists, health care researchers can use Auto-ML to quickly build high-quality models. This will boost wider use of machine learning in health care and improve patient outcomes.
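The METHODS describe Auto-ML as software that automates model selection, that is, the algorithm and hyper-parameter choices listed in the BACKGROUND. The record does not say which search strategy Auto-ML uses. The sketch below is plain random search over a joint (algorithm, hyperparameter) space, a common baseline chosen here purely for illustration; it is not presented as Auto-ML's method. `SEARCH_SPACE`, `sample_config`, `random_search`, and `toy_evaluate` are hypothetical names, the algorithms and ranges are assumptions, and `toy_evaluate` is a placeholder objective that trains no model and reflects no real results.

```python
import random
from typing import Callable

# Hypothetical search space: candidate algorithms, each with samplers for its
# own hyperparameters. Names and ranges are illustrative assumptions only.
SEARCH_SPACE: dict[str, dict[str, Callable[[random.Random], object]]] = {
    "logistic_regression": {"C": lambda r: 10 ** r.uniform(-3, 2)},
    "random_forest": {
        "n_trees": lambda r: r.choice([50, 100, 200, 500]),
        "max_depth": lambda r: r.randint(2, 20),
    },
    "knn": {"k": lambda r: r.randint(1, 50)},
}

Config = tuple[str, dict[str, object]]


def sample_config(rng: random.Random) -> Config:
    """Draw an algorithm uniformly, then draw each of its hyperparameters."""
    algo = rng.choice(sorted(SEARCH_SPACE))
    params = {name: draw(rng) for name, draw in SEARCH_SPACE[algo].items()}
    return algo, params


def random_search(
    evaluate: Callable[[str, dict], float], n_trials: int, seed: int = 0
) -> tuple[float, Config, list[tuple[float, Config]]]:
    """Evaluate n_trials random configurations; higher scores are better.

    `evaluate` is expected to train a model and return a validation score
    (for example, cross-validated AUC); here it is supplied by the caller.
    """
    rng = random.Random(seed)  # seeded so a search can be reproduced
    history: list[tuple[float, Config]] = []
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(*config)
        history.append((score, config))
        if score > best_score:  # keep the first configuration with the top score
            best_score, best_config = score, config
    return best_score, best_config, history


def toy_evaluate(algo: str, params: dict) -> float:
    """Stand-in objective for demonstration only; it trains nothing."""
    if algo == "knn":
        return 1.0 - abs(params["k"] - 15) / 50
    return 0.5


best_score, best_config, history = random_search(toy_evaluate, n_trials=30)
print(best_config, round(best_score, 3))
```

In real use, `evaluate` would fit the named algorithm on training data and score it on held-out data. Each call then corresponds to one of the manual iterations the abstract describes, so the number of trials becomes a compute budget rather than data scientist time.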
ID: pubmed-5596298
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: JMIR Res Protoc (JMIR Research Protocols); article type: Proposal; published 2017-08-29.
Copyright and license: ©Gang Luo, Bryan L Stone, Michael D Johnson, Peter Tarczy-Hornoch, Adam B Wilcox, Sean D Mooney, Xiaoming Sheng, Peter J Haug, Flory L Nkoy. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 29.08.2017. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on http://www.researchprotocols.org, as well as this copyright and license information must be included.