SAMPLER: Empirical distribution representations for rapid analysis of whole slide tissue images
Deep learning has revolutionized digital pathology, allowing for automatic analysis of hematoxylin and eosin (H&E) stained whole slide images (WSIs) for diverse tasks. In such analyses, WSIs are typically broken into smaller images called tiles, and a neural network backbone encodes each tile in...
Main authors: | Mukashyaka, Patience; Sheridan, Todd B.; Foroughi pour, Ali; Chuang, Jeffrey H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418159/ https://www.ncbi.nlm.nih.gov/pubmed/37577691 http://dx.doi.org/10.1101/2023.08.01.551468 |
_version_ | 1785088207300329472 |
---|---|
author | Mukashyaka, Patience; Sheridan, Todd B.; Foroughi pour, Ali; Chuang, Jeffrey H. |
author_facet | Mukashyaka, Patience; Sheridan, Todd B.; Foroughi pour, Ali; Chuang, Jeffrey H. |
author_sort | Mukashyaka, Patience |
collection | PubMed |
description | Deep learning has revolutionized digital pathology, allowing for automatic analysis of hematoxylin and eosin (H&E) stained whole slide images (WSIs) for diverse tasks. In such analyses, WSIs are typically broken into smaller images called tiles, and a neural network backbone encodes each tile in a feature space. Many recent works have applied attention-based deep learning models to aggregate tile-level features into a slide-level representation, which is then used for slide-level prediction tasks. However, training attention models is computationally intensive, necessitating hyperparameter optimization and specialized training procedures. Here, we propose SAMPLER, a fully statistical approach to generate efficient and informative WSI representations by encoding the empirical cumulative distribution functions (CDFs) of multiscale tile features. We demonstrate that SAMPLER-based classifiers are as accurate as or better than state-of-the-art fully deep learning attention models for classification tasks including distinction of: subtypes of breast carcinoma (BRCA: AUC=0.911±0.029); subtypes of non-small cell lung carcinoma (NSCLC: AUC=0.940±0.018); and subtypes of renal cell carcinoma (RCC: AUC=0.987±0.006). A major advantage of the SAMPLER representation is that predictive models are >100X faster than attention models. Histopathological review confirms that SAMPLER-identified high attention tiles contain tumor morphological features specific to the tumor type, while low attention tiles contain fibrous stroma, blood, or tissue folding artifacts. We further apply SAMPLER concepts to improve the design of attention-based neural networks, yielding a context-aware multi-head attention model with increased accuracy for subtype classification within BRCA and RCC (BRCA: AUC=0.921±0.027, and RCC: AUC=0.988±0.010). Finally, we provide theoretical results identifying sufficient conditions under which SAMPLER is optimal. SAMPLER is a fast and effective approach for analyzing WSIs, with greatly improved scalability over attention methods to benefit digital pathology analysis. |
format | Online Article Text |
id | pubmed-10418159 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10418159 2023-08-12 SAMPLER: Empirical distribution representations for rapid analysis of whole slide tissue images Mukashyaka, Patience; Sheridan, Todd B.; Foroughi pour, Ali; Chuang, Jeffrey H. bioRxiv Article Cold Spring Harbor Laboratory 2023-08-03 /pmc/articles/PMC10418159/ /pubmed/37577691 http://dx.doi.org/10.1101/2023.08.01.551468 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | SAMPLER: Empirical distribution representations for rapid analysis of whole slide tissue images |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418159/ https://www.ncbi.nlm.nih.gov/pubmed/37577691 http://dx.doi.org/10.1101/2023.08.01.551468 |
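
The description above says that SAMPLER builds a slide-level representation by encoding the empirical CDFs of tile features. The record does not say how that encoding is done, so the following is only a minimal sketch under stated assumptions: each feature's empirical CDF is summarized by its quantiles at a few fixed levels, and the per-feature quantiles are concatenated into one vector per slide. The function names, the quantile levels, and the toy data are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def empirical_cdf(values, x):
    """Fraction of observed values that are less than or equal to x."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)


def quantile(values, q):
    """Linearly interpolated q-th quantile (0 <= q <= 1) of the values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def slide_representation(tile_features, levels=(0.1, 0.5, 0.9)):
    """Concatenate per-feature quantiles taken over all tiles of one slide.

    Assumption: quantiles at fixed levels stand in for the empirical CDF
    of each feature; the levels chosen here are illustrative only.
    """
    n_features = len(tile_features[0])
    rep = []
    for j in range(n_features):
        column = [tile[j] for tile in tile_features]
        rep.extend(quantile(column, q) for q in levels)
    return rep


# Toy slide: 5 tiles, each encoded as 2 hypothetical backbone features.
tiles = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0], [4.0, 50.0]]
rep = slide_representation(tiles)
print(len(rep), rep)
```

A fixed-length vector of this kind can be passed to any standard classifier regardless of how many tiles a slide has, which is one plausible reading of why a purely statistical representation avoids training an attention model.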