Cross-Modal Data Programming Enables Rapid Medical Machine Learning

A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise im...

Bibliographic Details
Main Authors:	Dunnmon, Jared A., Ratner, Alexander J., Saab, Khaled, Khandwala, Nishith, Markert, Matthew, Sagreiya, Hersh, Goldman, Roger, Lee-Messer, Christopher, Lungren, Matthew P., Rubin, Daniel L., Ré, Christopher
Format:	Online Article Text
Language:	English
Published:	Elsevier 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7413132/ https://www.ncbi.nlm.nih.gov/pubmed/32776018 http://dx.doi.org/10.1016/j.patter.2020.100019

_version_	1783568745198780416
author	Dunnmon, Jared A. Ratner, Alexander J. Saab, Khaled Khandwala, Nishith Markert, Matthew Sagreiya, Hersh Goldman, Roger Lee-Messer, Christopher Lungren, Matthew P. Rubin, Daniel L. Ré, Christopher
author_facet	Dunnmon, Jared A. Ratner, Alexander J. Saab, Khaled Khandwala, Nishith Markert, Matthew Sagreiya, Hersh Goldman, Roger Lee-Messer, Christopher Lungren, Matthew P. Rubin, Daniel L. Ré, Christopher
author_sort	Dunnmon, Jared A.
collection	PubMed
description	A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise image classification. A key challenge in weak supervision is combining sources of information that may differ in quality and have correlated errors. Recently, a statistical theory of weak supervision called data programming has shown promise in addressing this challenge. Data programming now underpins many deployed machine-learning systems in the technology industry, even for critical applications. We propose a new technique for applying data programming to the problem of cross-modal weak supervision in medicine, wherein weak labels derived from an auxiliary modality (e.g., text) are used to train models over a different target modality (e.g., images). We evaluate our approach on diverse clinical tasks via direct comparison to institution-scale, hand-labeled datasets. We find that our supervision technique increases model performance by up to 6 points area under the receiver operating characteristic curve (ROC-AUC) over baseline methods by improving both coverage and quality of the weak labels. Our approach yields models that on average perform within 1.75 points ROC-AUC of those supervised with physician-years of hand labeling and outperform those supervised with physician-months of hand labeling by 10.25 points ROC-AUC, while using only person-days of developer time and clinician work—a time saving of 96%. Our results suggest that modern weak supervision techniques such as data programming may enable more rapid development and deployment of clinically useful machine-learning models.
format	Online Article Text
id	pubmed-7413132
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-7413132 2020-08-07 Cross-Modal Data Programming Enables Rapid Medical Machine Learning Dunnmon, Jared A. Ratner, Alexander J. Saab, Khaled Khandwala, Nishith Markert, Matthew Sagreiya, Hersh Goldman, Roger Lee-Messer, Christopher Lungren, Matthew P. Rubin, Daniel L. Ré, Christopher Patterns (N Y) Article A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise image classification. A key challenge in weak supervision is combining sources of information that may differ in quality and have correlated errors. Recently, a statistical theory of weak supervision called data programming has shown promise in addressing this challenge. Data programming now underpins many deployed machine-learning systems in the technology industry, even for critical applications. We propose a new technique for applying data programming to the problem of cross-modal weak supervision in medicine, wherein weak labels derived from an auxiliary modality (e.g., text) are used to train models over a different target modality (e.g., images). We evaluate our approach on diverse clinical tasks via direct comparison to institution-scale, hand-labeled datasets. We find that our supervision technique increases model performance by up to 6 points area under the receiver operating characteristic curve (ROC-AUC) over baseline methods by improving both coverage and quality of the weak labels. Our approach yields models that on average perform within 1.75 points ROC-AUC of those supervised with physician-years of hand labeling and outperform those supervised with physician-months of hand labeling by 10.25 points ROC-AUC, while using only person-days of developer time and clinician work—a time saving of 96%. Our results suggest that modern weak supervision techniques such as data programming may enable more rapid development and deployment of clinically useful machine-learning models. Elsevier 2020-04-28 /pmc/articles/PMC7413132/ /pubmed/32776018 http://dx.doi.org/10.1016/j.patter.2020.100019 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Dunnmon, Jared A. Ratner, Alexander J. Saab, Khaled Khandwala, Nishith Markert, Matthew Sagreiya, Hersh Goldman, Roger Lee-Messer, Christopher Lungren, Matthew P. Rubin, Daniel L. Ré, Christopher Cross-Modal Data Programming Enables Rapid Medical Machine Learning
title	Cross-Modal Data Programming Enables Rapid Medical Machine Learning
title_full	Cross-Modal Data Programming Enables Rapid Medical Machine Learning
title_fullStr	Cross-Modal Data Programming Enables Rapid Medical Machine Learning
title_full_unstemmed	Cross-Modal Data Programming Enables Rapid Medical Machine Learning
title_short	Cross-Modal Data Programming Enables Rapid Medical Machine Learning
title_sort	cross-modal data programming enables rapid medical machine learning
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7413132/ https://www.ncbi.nlm.nih.gov/pubmed/32776018 http://dx.doi.org/10.1016/j.patter.2020.100019
work_keys_str_mv	AT dunnmonjareda crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT ratneralexanderj crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT saabkhaled crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT khandwalanishith crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT markertmatthew crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT sagreiyahersh crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT goldmanroger crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT leemesserchristopher crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT lungrenmatthewp crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT rubindaniell crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT rechristopher crossmodaldataprogrammingenablesrapidmedicalmachinelearning
