Effects of Image Quantity and Image Source Variation on Machine Learning Histology Differential Diagnosis Models
AIMS: Histology, the microscopic study of normal tissues, is a crucial element of most medical curricula. Learning tools focused on histology are very important to learners who seek diagnostic competency within this important diagnostic arena. Recent developments in machine learning (ML) suggest tha...
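Main Authors: | Vali-Betts, Elham; Krause, Kevin J.; Dubrovsky, Alanna; Olson, Kristin; Graff, John Paul; Mitra, Anupam; Datta-Mitra, Ananya; Beck, Kenneth; Tsirigos, Aristotelis; Loomis, Cynthia; Neto, Antonio Galvao; Adler, Esther; Rashidi, Hooman H. |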
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Wolters Kluwer - Medknow 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112343/ https://www.ncbi.nlm.nih.gov/pubmed/34012709 http://dx.doi.org/10.4103/jpi.jpi_69_20 |
_version_ | 1783690671613280256 |
---|---|
author | Vali-Betts, Elham Krause, Kevin J. Dubrovsky, Alanna Olson, Kristin Graff, John Paul Mitra, Anupam Datta-Mitra, Ananya Beck, Kenneth Tsirigos, Aristotelis Loomis, Cynthia Neto, Antonio Galvao Adler, Esther Rashidi, Hooman H. |
author_facet | Vali-Betts, Elham Krause, Kevin J. Dubrovsky, Alanna Olson, Kristin Graff, John Paul Mitra, Anupam Datta-Mitra, Ananya Beck, Kenneth Tsirigos, Aristotelis Loomis, Cynthia Neto, Antonio Galvao Adler, Esther Rashidi, Hooman H. |
author_sort | Vali-Betts, Elham |
collection | PubMed |
description | AIMS: Histology, the microscopic study of normal tissues, is a crucial element of most medical curricula. Learning tools focused on histology are very important to learners who seek diagnostic competency within this important diagnostic arena. Recent developments in machine learning (ML) suggest that certain ML tools may be able to benefit this histology learning platform. Here, we aim to explore how one such tool based on a convolutional neural network, can be used to build a generalizable multi-classification model capable of classifying microscopic images of human tissue samples with the ultimate goal of providing a differential diagnosis (a list of look-alikes) for each entity. METHODS: We obtained three institutional training datasets and one generalizability test dataset, each containing images of histologic tissues in 38 categories. Models were trained on data from single institutions, low quantity combinations of multiple institutions, and high quantity combinations of multiple institutions. Models were tested against withheld validation data, external institutional data, and generalizability test images obtained from Google image search. Performance was measured with macro and micro accuracy, sensitivity, specificity, and f1-score. RESULTS: In this study, we were able to show that such a model's generalizability is dependent on both the training data source variety and the total number of training images used. Models which were trained on 760 images from only a single institution performed well on withheld internal data but poorly on external data (lower generalizability). Increasing data source diversity improved generalizability, even when decreasing data quantity: models trained on 684 images, but from three sources improved generalization accuracy between 4.05% and 18.59%. Maintaining this diversity and increasing the quantity of training images to 2280 further improved generalization accuracy between 16.51% and 32.79%. 
CONCLUSIONS: This pilot study highlights the significance of data diversity within such studies. As expected, optimal models are those that incorporate both diversity and quantity into their platforms. |
format | Online Article Text |
id | pubmed-8112343 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Wolters Kluwer - Medknow |
record_format | MEDLINE/PubMed |
spelling | pubmed-81123432021-05-18 Effects of Image Quantity and Image Source Variation on Machine Learning Histology Differential Diagnosis Models Vali-Betts, Elham Krause, Kevin J. Dubrovsky, Alanna Olson, Kristin Graff, John Paul Mitra, Anupam Datta-Mitra, Ananya Beck, Kenneth Tsirigos, Aristotelis Loomis, Cynthia Neto, Antonio Galvao Adler, Esther Rashidi, Hooman H. J Pathol Inform Original Article AIMS: Histology, the microscopic study of normal tissues, is a crucial element of most medical curricula. Learning tools focused on histology are very important to learners who seek diagnostic competency within this important diagnostic arena. Recent developments in machine learning (ML) suggest that certain ML tools may be able to benefit this histology learning platform. Here, we aim to explore how one such tool based on a convolutional neural network, can be used to build a generalizable multi-classification model capable of classifying microscopic images of human tissue samples with the ultimate goal of providing a differential diagnosis (a list of look-alikes) for each entity. METHODS: We obtained three institutional training datasets and one generalizability test dataset, each containing images of histologic tissues in 38 categories. Models were trained on data from single institutions, low quantity combinations of multiple institutions, and high quantity combinations of multiple institutions. Models were tested against withheld validation data, external institutional data, and generalizability test images obtained from Google image search. Performance was measured with macro and micro accuracy, sensitivity, specificity, and f1-score. RESULTS: In this study, we were able to show that such a model's generalizability is dependent on both the training data source variety and the total number of training images used. 
Models which were trained on 760 images from only a single institution performed well on withheld internal data but poorly on external data (lower generalizability). Increasing data source diversity improved generalizability, even when decreasing data quantity: models trained on 684 images, but from three sources improved generalization accuracy between 4.05% and 18.59%. Maintaining this diversity and increasing the quantity of training images to 2280 further improved generalization accuracy between 16.51% and 32.79%. CONCLUSIONS: This pilot study highlights the significance of data diversity within such studies. As expected, optimal models are those that incorporate both diversity and quantity into their platforms. Wolters Kluwer - Medknow 2021-01-23 /pmc/articles/PMC8112343/ /pubmed/34012709 http://dx.doi.org/10.4103/jpi.jpi_69_20 Text en Copyright: © 2021 Journal of Pathology Informatics https://creativecommons.org/licenses/by-nc-sa/4.0/ This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms. |
spellingShingle | Original Article Vali-Betts, Elham Krause, Kevin J. Dubrovsky, Alanna Olson, Kristin Graff, John Paul Mitra, Anupam Datta-Mitra, Ananya Beck, Kenneth Tsirigos, Aristotelis Loomis, Cynthia Neto, Antonio Galvao Adler, Esther Rashidi, Hooman H. Effects of Image Quantity and Image Source Variation on Machine Learning Histology Differential Diagnosis Models |
title | Effects of Image Quantity and Image Source Variation on Machine Learning Histology Differential Diagnosis Models |
title_sort | effects of image quantity and image source variation on machine learning histology differential diagnosis models |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112343/ https://www.ncbi.nlm.nih.gov/pubmed/34012709 http://dx.doi.org/10.4103/jpi.jpi_69_20 |