Cross-Modal Data Programming Enables Rapid Medical Machine Learning
A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise im...
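Main authors: | Dunnmon, Jared A.; Ratner, Alexander J.; Saab, Khaled; Khandwala, Nishith; Markert, Matthew; Sagreiya, Hersh; Goldman, Roger; Lee-Messer, Christopher; Lungren, Matthew P.; Rubin, Daniel L.; Ré, Christopher |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7413132/ https://www.ncbi.nlm.nih.gov/pubmed/32776018 http://dx.doi.org/10.1016/j.patter.2020.100019 |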
_version_ | 1783568745198780416 |
---|---|
author | Dunnmon, Jared A. Ratner, Alexander J. Saab, Khaled Khandwala, Nishith Markert, Matthew Sagreiya, Hersh Goldman, Roger Lee-Messer, Christopher Lungren, Matthew P. Rubin, Daniel L. Ré, Christopher |
author_facet | Dunnmon, Jared A. Ratner, Alexander J. Saab, Khaled Khandwala, Nishith Markert, Matthew Sagreiya, Hersh Goldman, Roger Lee-Messer, Christopher Lungren, Matthew P. Rubin, Daniel L. Ré, Christopher |
author_sort | Dunnmon, Jared A. |
collection | PubMed |
description | A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise image classification. A key challenge in weak supervision is combining sources of information that may differ in quality and have correlated errors. Recently, a statistical theory of weak supervision called data programming has shown promise in addressing this challenge. Data programming now underpins many deployed machine-learning systems in the technology industry, even for critical applications. We propose a new technique for applying data programming to the problem of cross-modal weak supervision in medicine, wherein weak labels derived from an auxiliary modality (e.g., text) are used to train models over a different target modality (e.g., images). We evaluate our approach on diverse clinical tasks via direct comparison to institution-scale, hand-labeled datasets. We find that our supervision technique increases model performance by up to 6 points area under the receiver operating characteristic curve (ROC-AUC) over baseline methods by improving both coverage and quality of the weak labels. Our approach yields models that on average perform within 1.75 points ROC-AUC of those supervised with physician-years of hand labeling and outperform those supervised with physician-months of hand labeling by 10.25 points ROC-AUC, while using only person-days of developer time and clinician work—a time saving of 96%. Our results suggest that modern weak supervision techniques such as data programming may enable more rapid development and deployment of clinically useful machine-learning models. |
format | Online Article Text |
id | pubmed-7413132 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-7413132 2020-08-07 Cross-Modal Data Programming Enables Rapid Medical Machine Learning Dunnmon, Jared A. Ratner, Alexander J. Saab, Khaled Khandwala, Nishith Markert, Matthew Sagreiya, Hersh Goldman, Roger Lee-Messer, Christopher Lungren, Matthew P. Rubin, Daniel L. Ré, Christopher Patterns (N Y) Article A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise image classification. A key challenge in weak supervision is combining sources of information that may differ in quality and have correlated errors. Recently, a statistical theory of weak supervision called data programming has shown promise in addressing this challenge. Data programming now underpins many deployed machine-learning systems in the technology industry, even for critical applications. We propose a new technique for applying data programming to the problem of cross-modal weak supervision in medicine, wherein weak labels derived from an auxiliary modality (e.g., text) are used to train models over a different target modality (e.g., images). We evaluate our approach on diverse clinical tasks via direct comparison to institution-scale, hand-labeled datasets. We find that our supervision technique increases model performance by up to 6 points area under the receiver operating characteristic curve (ROC-AUC) over baseline methods by improving both coverage and quality of the weak labels. Our approach yields models that on average perform within 1.75 points ROC-AUC of those supervised with physician-years of hand labeling and outperform those supervised with physician-months of hand labeling by 10.25 points ROC-AUC, while using only person-days of developer time and clinician work—a time saving of 96%. Our results suggest that modern weak supervision techniques such as data programming may enable more rapid development and deployment of clinically useful machine-learning models. Elsevier 2020-04-28 /pmc/articles/PMC7413132/ /pubmed/32776018 http://dx.doi.org/10.1016/j.patter.2020.100019 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Dunnmon, Jared A. Ratner, Alexander J. Saab, Khaled Khandwala, Nishith Markert, Matthew Sagreiya, Hersh Goldman, Roger Lee-Messer, Christopher Lungren, Matthew P. Rubin, Daniel L. Ré, Christopher Cross-Modal Data Programming Enables Rapid Medical Machine Learning |
title | Cross-Modal Data Programming Enables Rapid Medical Machine Learning |
title_full | Cross-Modal Data Programming Enables Rapid Medical Machine Learning |
title_fullStr | Cross-Modal Data Programming Enables Rapid Medical Machine Learning |
title_full_unstemmed | Cross-Modal Data Programming Enables Rapid Medical Machine Learning |
title_short | Cross-Modal Data Programming Enables Rapid Medical Machine Learning |
title_sort | cross-modal data programming enables rapid medical machine learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7413132/ https://www.ncbi.nlm.nih.gov/pubmed/32776018 http://dx.doi.org/10.1016/j.patter.2020.100019 |
work_keys_str_mv | AT dunnmonjareda crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT ratneralexanderj crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT saabkhaled crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT khandwalanishith crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT markertmatthew crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT sagreiyahersh crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT goldmanroger crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT leemesserchristopher crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT lungrenmatthewp crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT rubindaniell crossmodaldataprogrammingenablesrapidmedicalmachinelearning AT rechristopher crossmodaldataprogrammingenablesrapidmedicalmachinelearning |