Cross-Modal Data Programming Enables Rapid Medical Machine Learning

A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise im...

Bibliographic Details
Main Authors: Dunnmon, Jared A., Ratner, Alexander J., Saab, Khaled, Khandwala, Nishith, Markert, Matthew, Sagreiya, Hersh, Goldman, Roger, Lee-Messer, Christopher, Lungren, Matthew P., Rubin, Daniel L., Ré, Christopher
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7413132/
https://www.ncbi.nlm.nih.gov/pubmed/32776018
http://dx.doi.org/10.1016/j.patter.2020.100019
author Dunnmon, Jared A.
Ratner, Alexander J.
Saab, Khaled
Khandwala, Nishith
Markert, Matthew
Sagreiya, Hersh
Goldman, Roger
Lee-Messer, Christopher
Lungren, Matthew P.
Rubin, Daniel L.
Ré, Christopher
collection PubMed
description A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise image classification. A key challenge in weak supervision is combining sources of information that may differ in quality and have correlated errors. Recently, a statistical theory of weak supervision called data programming has shown promise in addressing this challenge. Data programming now underpins many deployed machine-learning systems in the technology industry, even for critical applications. We propose a new technique for applying data programming to the problem of cross-modal weak supervision in medicine, wherein weak labels derived from an auxiliary modality (e.g., text) are used to train models over a different target modality (e.g., images). We evaluate our approach on diverse clinical tasks via direct comparison to institution-scale, hand-labeled datasets. We find that our supervision technique increases model performance by up to 6 points area under the receiver operating characteristic curve (ROC-AUC) over baseline methods by improving both coverage and quality of the weak labels. Our approach yields models that on average perform within 1.75 points ROC-AUC of those supervised with physician-years of hand labeling and outperform those supervised with physician-months of hand labeling by 10.25 points ROC-AUC, while using only person-days of developer time and clinician work—a time saving of 96%. Our results suggest that modern weak supervision techniques such as data programming may enable more rapid development and deployment of clinically useful machine-learning models.
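The challenge named in the description, combining weak sources that differ in quality, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each source's accuracy is already known and that sources err independently, whereas data programming estimates accuracies without ground truth and can account for correlated errors. The function name `combine_votes`, the accuracy values, and the example votes are all hypothetical.

```python
import math

ABSTAIN = 0  # a weak source may decline to vote on an example


def combine_votes(votes, accuracies, prior=0.5):
    """Return P(y = +1) for one example under a naive-Bayes vote model.

    votes: list of +1 / -1 / ABSTAIN, one entry per weak source.
    accuracies: assumed probability that each source is right when it votes
                (hypothetical inputs; data programming would estimate these).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # abstentions carry no evidence
        # A source with accuracy acc shifts the log-odds by log(acc / (1 - acc))
        # toward the class it voted for.
        log_odds += vote * math.log(acc / (1.0 - acc))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical example: three text-report heuristics vote on one image.
accuracies = [0.9, 0.6, 0.6]
p_abnormal = combine_votes([+1, -1, -1], accuracies)
print(round(p_abnormal, 3))  # 0.8
```

A plain majority vote would label this example negative, while the accuracy-weighted combination assigns it a probability of 0.8 of being positive. In a cross-modal setting such as the one described, a probabilistic label of this kind, derived from the text report, would serve as the training target for the image model.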
format Online
Article
Text
id pubmed-7413132
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7413132 2020-08-07 Cross-Modal Data Programming Enables Rapid Medical Machine Learning Patterns (N Y) Article Elsevier 2020-04-28 /pmc/articles/PMC7413132/ /pubmed/32776018 http://dx.doi.org/10.1016/j.patter.2020.100019 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Cross-Modal Data Programming Enables Rapid Medical Machine Learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7413132/
https://www.ncbi.nlm.nih.gov/pubmed/32776018
http://dx.doi.org/10.1016/j.patter.2020.100019