Evaluating reproducibility of AI algorithms in digital pathology with DAPPER

Artificial Intelligence is exponentially increasing its impact on healthcare. As deep learning is mastering computer vision tasks, its application to digital pathology is natural, with the promise of aiding in routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve the validity and robustness of current clinico-pathological features, up to identifying novel histological patterns, e.g., from tumor infiltrating lymphocytes. In this study, we examine the issue of evaluating the accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. We introduce the DAPPER framework for validation, based on a rigorous Data Analysis Plan derived from the FDA’s MAQC project and designed to analyze causes of variability in predictive biomarkers. We apply the framework to models that identify tissue of origin on 787 Whole Slide Images from the Genotype-Tissue Expression (GTEx) project. We test three different deep learning architectures (VGG, ResNet, Inception) as feature extractors and three classifiers (a fully connected multilayer network, a Support Vector Machine, and a Random Forest), and work with four datasets (5, 10, 20 or 30 classes), for a total of 53,000 tiles at 512 × 512 resolution. We analyze accuracy and feature stability of the machine learning classifiers, also demonstrating the need for diagnostic tests (e.g., random labels) to identify selection bias and risks for reproducibility. Further, we use the deep features from the VGG model trained on GTEx on the KIMIA24 dataset for identification of slide of origin (24 classes), training a classifier on 1,060 annotated tiles and validating it on 265 unseen ones. The DAPPER software, including its deep learning pipeline and the Histological Imaging—Newsy Tiles (HINT) benchmark dataset derived from GTEx, is released as a basis for standardization and validation initiatives in AI for digital pathology.

Bibliographic Details
Main Authors: Bizzego, Andrea, Bussola, Nicole, Chierici, Marco, Maggio, Valerio, Francescatto, Margherita, Cima, Luca, Cristoforetti, Marco, Jurman, Giuseppe, Furlanello, Cesare
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6467397/
https://www.ncbi.nlm.nih.gov/pubmed/30917113
http://dx.doi.org/10.1371/journal.pcbi.1006269
_version_ 1783411265432977408
author Bizzego, Andrea
Bussola, Nicole
Chierici, Marco
Maggio, Valerio
Francescatto, Margherita
Cima, Luca
Cristoforetti, Marco
Jurman, Giuseppe
Furlanello, Cesare
author_facet Bizzego, Andrea
Bussola, Nicole
Chierici, Marco
Maggio, Valerio
Francescatto, Margherita
Cima, Luca
Cristoforetti, Marco
Jurman, Giuseppe
Furlanello, Cesare
author_sort Bizzego, Andrea
collection PubMed
description Artificial Intelligence is exponentially increasing its impact on healthcare. As deep learning is mastering computer vision tasks, its application to digital pathology is natural, with the promise of aiding in routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve the validity and robustness of current clinico-pathological features, up to identifying novel histological patterns, e.g., from tumor infiltrating lymphocytes. In this study, we examine the issue of evaluating the accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. We introduce the DAPPER framework for validation, based on a rigorous Data Analysis Plan derived from the FDA’s MAQC project and designed to analyze causes of variability in predictive biomarkers. We apply the framework to models that identify tissue of origin on 787 Whole Slide Images from the Genotype-Tissue Expression (GTEx) project. We test three different deep learning architectures (VGG, ResNet, Inception) as feature extractors and three classifiers (a fully connected multilayer network, a Support Vector Machine, and a Random Forest), and work with four datasets (5, 10, 20 or 30 classes), for a total of 53,000 tiles at 512 × 512 resolution. We analyze accuracy and feature stability of the machine learning classifiers, also demonstrating the need for diagnostic tests (e.g., random labels) to identify selection bias and risks for reproducibility. Further, we use the deep features from the VGG model trained on GTEx on the KIMIA24 dataset for identification of slide of origin (24 classes), training a classifier on 1,060 annotated tiles and validating it on 265 unseen ones. The DAPPER software, including its deep learning pipeline and the Histological Imaging—Newsy Tiles (HINT) benchmark dataset derived from GTEx, is released as a basis for standardization and validation initiatives in AI for digital pathology.
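
The pipeline the abstract describes (a pretrained CNN as a frozen feature extractor, a classical classifier on the deep features, and a random-label diagnostic for selection bias) can be sketched in a few lines. The following is a minimal illustration, not the released DAPPER code: it assumes torchvision's pretrained VGG16 and scikit-learn's SVC, and substitutes random tensors for real GTEx/HINT tiles, so the data below is a placeholder.

import numpy as np
import torch
from torchvision.models import vgg16, VGG16_Weights
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Pretrained VGG16 with the final classification layer removed, so a
# forward pass yields a 4096-dimensional deep-feature vector per image.
backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)
backbone.classifier = torch.nn.Sequential(*list(backbone.classifier.children())[:-1])
backbone.eval()

# Placeholder data: random tensors stand in for 512 x 512 tiles resized
# to VGG's 224 x 224 input; 5 tissue classes with 8 tiles each.
tiles = torch.rand(40, 3, 224, 224)
y = np.repeat(np.arange(5), 8)

with torch.no_grad():
    X = backbone(tiles).numpy()  # (40, 4096) feature matrix

# Cross-validated accuracy of an SVM (one of the three classifier types
# the abstract lists) on the true labels...
acc_true = cross_val_score(SVC(), X, y, cv=5).mean()

# ...and the random-label diagnostic: shuffled labels should give
# chance-level accuracy (about 0.2 for 5 classes); anything higher
# would flag selection bias or leakage in the evaluation scheme.
rng = np.random.default_rng(0)
acc_rand = cross_val_score(SVC(), X, rng.permutation(y), cv=5).mean()
print(f"true-label accuracy: {acc_true:.3f}  random-label: {acc_rand:.3f}")

The random-label run is the diagnostic test the abstract mentions: with shuffled labels, accuracy should fall to chance, and any result above that points to selection bias in the validation design rather than genuine predictive signal.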
format Online
Article
Text
id pubmed-6467397
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6467397 2019-05-03 Evaluating reproducibility of AI algorithms in digital pathology with DAPPER Bizzego, Andrea Bussola, Nicole Chierici, Marco Maggio, Valerio Francescatto, Margherita Cima, Luca Cristoforetti, Marco Jurman, Giuseppe Furlanello, Cesare PLoS Comput Biol Research Article Artificial Intelligence is exponentially increasing its impact on healthcare. As deep learning is mastering computer vision tasks, its application to digital pathology is natural, with the promise of aiding in routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve the validity and robustness of current clinico-pathological features, up to identifying novel histological patterns, e.g., from tumor infiltrating lymphocytes. In this study, we examine the issue of evaluating the accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. We introduce the DAPPER framework for validation, based on a rigorous Data Analysis Plan derived from the FDA’s MAQC project and designed to analyze causes of variability in predictive biomarkers. We apply the framework to models that identify tissue of origin on 787 Whole Slide Images from the Genotype-Tissue Expression (GTEx) project. We test three different deep learning architectures (VGG, ResNet, Inception) as feature extractors and three classifiers (a fully connected multilayer network, a Support Vector Machine, and a Random Forest), and work with four datasets (5, 10, 20 or 30 classes), for a total of 53,000 tiles at 512 × 512 resolution. We analyze accuracy and feature stability of the machine learning classifiers, also demonstrating the need for diagnostic tests (e.g., random labels) to identify selection bias and risks for reproducibility. Further, we use the deep features from the VGG model trained on GTEx on the KIMIA24 dataset for identification of slide of origin (24 classes), training a classifier on 1,060 annotated tiles and validating it on 265 unseen ones. The DAPPER software, including its deep learning pipeline and the Histological Imaging—Newsy Tiles (HINT) benchmark dataset derived from GTEx, is released as a basis for standardization and validation initiatives in AI for digital pathology. Public Library of Science 2019-03-27 /pmc/articles/PMC6467397/ /pubmed/30917113 http://dx.doi.org/10.1371/journal.pcbi.1006269 Text en © 2019 Bizzego et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Bizzego, Andrea
Bussola, Nicole
Chierici, Marco
Maggio, Valerio
Francescatto, Margherita
Cima, Luca
Cristoforetti, Marco
Jurman, Giuseppe
Furlanello, Cesare
Evaluating reproducibility of AI algorithms in digital pathology with DAPPER
title Evaluating reproducibility of AI algorithms in digital pathology with DAPPER
title_full Evaluating reproducibility of AI algorithms in digital pathology with DAPPER
title_fullStr Evaluating reproducibility of AI algorithms in digital pathology with DAPPER
title_full_unstemmed Evaluating reproducibility of AI algorithms in digital pathology with DAPPER
title_short Evaluating reproducibility of AI algorithms in digital pathology with DAPPER
title_sort evaluating reproducibility of ai algorithms in digital pathology with dapper
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6467397/
https://www.ncbi.nlm.nih.gov/pubmed/30917113
http://dx.doi.org/10.1371/journal.pcbi.1006269
work_keys_str_mv AT bizzegoandrea evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper
AT bussolanicole evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper
AT chiericimarco evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper
AT maggiovalerio evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper
AT francescattomargherita evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper
AT cimaluca evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper
AT cristoforettimarco evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper
AT jurmangiuseppe evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper
AT furlanellocesare evaluatingreproducibilityofaialgorithmsindigitalpathologywithdapper