
Effects of Image Quantity and Image Source Variation on Machine Learning Histology Differential Diagnosis Models

AIMS: Histology, the microscopic study of normal tissues, is a crucial element of most medical curricula. Learning tools focused on histology are very important to learners who seek diagnostic competency within this important diagnostic arena. Recent developments in machine learning (ML) suggest tha...


Bibliographic Details
Main authors: Vali-Betts, Elham, Krause, Kevin J., Dubrovsky, Alanna, Olson, Kristin, Graff, John Paul, Mitra, Anupam, Datta-Mitra, Ananya, Beck, Kenneth, Tsirigos, Aristotelis, Loomis, Cynthia, Neto, Antonio Galvao, Adler, Esther, Rashidi, Hooman H.
Format: Online Article Text
Language: English
Published: Wolters Kluwer - Medknow 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112343/
https://www.ncbi.nlm.nih.gov/pubmed/34012709
http://dx.doi.org/10.4103/jpi.jpi_69_20
author Vali-Betts, Elham
Krause, Kevin J.
Dubrovsky, Alanna
Olson, Kristin
Graff, John Paul
Mitra, Anupam
Datta-Mitra, Ananya
Beck, Kenneth
Tsirigos, Aristotelis
Loomis, Cynthia
Neto, Antonio Galvao
Adler, Esther
Rashidi, Hooman H.
collection PubMed
description AIMS: Histology, the microscopic study of normal tissues, is a crucial element of most medical curricula. Learning tools focused on histology are very important to learners who seek diagnostic competency within this important diagnostic arena. Recent developments in machine learning (ML) suggest that certain ML tools may be able to benefit this histology learning platform. Here, we aim to explore how one such tool, based on a convolutional neural network, can be used to build a generalizable multi-classification model capable of classifying microscopic images of human tissue samples, with the ultimate goal of providing a differential diagnosis (a list of look-alikes) for each entity.
METHODS: We obtained three institutional training datasets and one generalizability test dataset, each containing images of histologic tissues in 38 categories. Models were trained on data from single institutions, low-quantity combinations of multiple institutions, and high-quantity combinations of multiple institutions. Models were tested against withheld validation data, external institutional data, and generalizability test images obtained from Google image search. Performance was measured with macro and micro accuracy, sensitivity, specificity, and F1-score.
RESULTS: In this study, we were able to show that such a model's generalizability is dependent on both the training data source variety and the total number of training images used. Models trained on 760 images from only a single institution performed well on withheld internal data but poorly on external data (lower generalizability). Increasing data source diversity improved generalizability, even when decreasing data quantity: models trained on 684 images, but from three sources, improved generalization accuracy between 4.05% and 18.59%. Maintaining this diversity and increasing the quantity of training images to 2280 further improved generalization accuracy between 16.51% and 32.79%.
CONCLUSIONS: This pilot study highlights the significance of data diversity within such studies. As expected, optimal models are those that incorporate both diversity and quantity into their platforms.
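The abstract reports macro and micro versions of its metrics without defining them. As a rough illustration of that distinction only (this is not the authors' code, and the function names and toy tissue labels are hypothetical), the sketch below computes sensitivity, specificity, and F1-score from one-vs-rest counts. Macro averaging takes the mean of the per-class scores, so every class counts equally; micro averaging pools the counts across classes first, so frequent classes dominate.

```python
from collections import Counter


def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn, tn)} using one-vs-rest counting."""
    n = len(y_true)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # the true class was missed
            fp[p] += 1  # the predicted class was wrongly assigned
    return {c: (tp[c], fp[c], fn[c], n - tp[c] - fp[c] - fn[c]) for c in labels}


def safe_div(a, b):
    """Divide, returning 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and F1 from one set of counts."""
    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * tp, 2 * tp + fp + fn)
    return sensitivity, specificity, f1


def macro_micro(y_true, y_pred):
    """Return (macro, micro), each a (sensitivity, specificity, f1) tuple."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = per_class_counts(y_true, y_pred, labels)
    per_class = [metrics(*counts[c]) for c in labels]
    # Macro: average the per-class scores.
    macro = tuple(sum(m[i] for m in per_class) / len(labels) for i in range(3))
    # Micro: pool the counts over all classes, then score once.
    pooled = tuple(sum(counts[c][i] for c in labels) for i in range(4))
    micro = metrics(*pooled)
    return macro, micro


# Hypothetical toy labels, not data from the study.
y_true = ["liver", "liver", "liver", "liver", "kidney", "spleen"]
y_pred = ["liver", "liver", "liver", "kidney", "kidney", "liver"]
macro, micro = macro_micro(y_true, y_pred)
print("macro (sens, spec, f1):", macro)
print("micro (sens, spec, f1):", micro)
```

In this toy case the single spleen image is misclassified, which pulls the macro sensitivity well below the micro value; with 38 categories of uneven size, the same effect is one reason to report both.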
format Online
Article
Text
id pubmed-8112343
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Wolters Kluwer - Medknow
record_format MEDLINE/PubMed
spelling pubmed-8112343 2021-05-18 Effects of Image Quantity and Image Source Variation on Machine Learning Histology Differential Diagnosis Models J Pathol Inform Original Article Wolters Kluwer - Medknow 2021-01-23 /pmc/articles/PMC8112343/ /pubmed/34012709 http://dx.doi.org/10.4103/jpi.jpi_69_20 Text en Copyright: © 2021 Journal of Pathology Informatics https://creativecommons.org/licenses/by-nc-sa/4.0/ This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.
title Effects of Image Quantity and Image Source Variation on Machine Learning Histology Differential Diagnosis Models
topic Original Article