The impact of site-specific digital histology signatures on deep learning model accuracy and bias

The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these...

Bibliographic Details
Main Authors: Howard, Frederick M.; Dolezal, James; Kochanny, Sara; Schulte, Jefree; Chen, Heather; Heij, Lara; Huo, Dezheng; Nanda, Rita; Olopade, Olufunmilayo I.; Kather, Jakob N.; Cipriani, Nicole; Grossman, Robert L.; Pearson, Alexander T.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8292530/
https://www.ncbi.nlm.nih.gov/pubmed/34285218
http://dx.doi.org/10.1038/s41467-021-24698-1
author Howard, Frederick M.
Dolezal, James
Kochanny, Sara
Schulte, Jefree
Chen, Heather
Heij, Lara
Huo, Dezheng
Nanda, Rita
Olopade, Olufunmilayo I.
Kather, Jakob N.
Cipriani, Nicole
Grossman, Robert L.
Pearson, Alexander T.
author_sort Howard, Frederick M.
collection PubMed
description The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the image characteristics constituting this site-specific digital histology signature. We demonstrate that these site-specific signatures lead to biased accuracy for prediction of features including survival, genomic mutations, and tumor stage. Furthermore, ethnicity can also be inferred from site-specific signatures, which must be accounted for to ensure equitable application of DL. These site-specific signatures can lead to overoptimistic estimates of model performance, and we propose a quadratic programming method that abrogates this bias by ensuring models are not trained and validated on samples from the same site.
format Online
Article
Text
id pubmed-8292530
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8292530 2021-07-23 Nat Commun Article
Nature Publishing Group UK 2021-07-20 /pmc/articles/PMC8292530/ /pubmed/34285218 http://dx.doi.org/10.1038/s41467-021-24698-1 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title The impact of site-specific digital histology signatures on deep learning model accuracy and bias
title_sort impact of site-specific digital histology signatures on deep learning model accuracy and bias
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8292530/
https://www.ncbi.nlm.nih.gov/pubmed/34285218
http://dx.doi.org/10.1038/s41467-021-24698-1