Deep learning predicts gene expression as an intermediate data modality to identify susceptibility patterns in Mycobacterium tuberculosis infected Diversity Outbred mice

BACKGROUND: Machine learning sustains successful application to many diagnostic and prognostic problems in computational histopathology. Yet, few efforts have been made to model gene expression from histopathology. This study proposes a methodology which predicts selected gene expression values (mic...

Bibliographic Details
Main Authors: Tavolara, Thomas E.; Niazi, M.K.K.; Gower, Adam C.; Ginese, Melanie; Beamer, Gillian; Gurcan, Metin N.
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8138606/
https://www.ncbi.nlm.nih.gov/pubmed/34000621
http://dx.doi.org/10.1016/j.ebiom.2021.103388
author Tavolara, Thomas E.
Niazi, M.K.K.
Gower, Adam C.
Ginese, Melanie
Beamer, Gillian
Gurcan, Metin N.
collection PubMed
description BACKGROUND: Machine learning sustains successful application to many diagnostic and prognostic problems in computational histopathology. Yet, few efforts have been made to model gene expression from histopathology. This study proposes a methodology which predicts selected gene expression values (microarray) from haematoxylin and eosin whole-slide images as an intermediate data modality to identify fulminant-like pulmonary tuberculosis ('supersusceptible') in an experimentally infected cohort of Diversity Outbred mice (n=77).
METHODS: Gradient-boosted trees were utilized as a novel feature selector to identify gene transcripts predictive of fulminant-like pulmonary tuberculosis. A novel attention-based multiple instance learning model for regression was used to predict selected genes' expression from whole-slide images. Gene expression predictions were shown to be sufficiently replicated to identify supersusceptible mice using gradient-boosted trees trained on ground truth gene expression data.
FINDINGS: The model was accurate, showing high positive correlations with ground truth gene expression on both cross-validation (n = 77, 0.63 ≤ ρ ≤ 0.84) and external testing sets (n = 33, 0.65 ≤ ρ ≤ 0.84). The sensitivity and specificity for gene expression predictions to identify supersusceptible mice (n=77) were 0.88 and 0.95, respectively, and for an external set of mice (n=33) 0.88 and 0.93, respectively.
IMPLICATIONS: Our methodology maps histopathology to gene expression with sufficient accuracy to predict a clinical outcome. The proposed methodology exemplifies a computational template for gene expression panels, in which relatively inexpensive and widely available tissue histopathology may be mapped to specific genes' expression to serve as a diagnostic or prognostic tool.
FUNDING: National Institutes of Health and American Lung Association.
format Online
Article
Text
id pubmed-8138606
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8138606 2021-05-24 EBioMedicine Research Paper Elsevier 2021-05-14 /pmc/articles/PMC8138606/ /pubmed/34000621 http://dx.doi.org/10.1016/j.ebiom.2021.103388 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Deep learning predicts gene expression as an intermediate data modality to identify susceptibility patterns in Mycobacterium tuberculosis infected Diversity Outbred mice
topic Research Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8138606/
https://www.ncbi.nlm.nih.gov/pubmed/34000621
http://dx.doi.org/10.1016/j.ebiom.2021.103388