
Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma

Bibliographic Details
Main authors: Hayakawa, Jin; Seki, Tomohisa; Kawazoe, Yoshimasa; Ohe, Kazuhiko
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9231717/
https://www.ncbi.nlm.nih.gov/pubmed/35749395
http://dx.doi.org/10.1371/journal.pone.0269570
author Hayakawa, Jin
Seki, Tomohisa
Kawazoe, Yoshimasa
Ohe, Kazuhiko
collection PubMed
description Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to interpret why the prediction results are obtained using deep learning models from model parameters. New methods have been proposed for interpreting deep learning model predictions but have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model using graph convolutions of genetic pathways can provide pathway-level feature importance for classification prediction of diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. Using Kyoto Encyclopedia of Genes and Genomes pathways, a graph convolutional network (GCN) model was implemented to construct graphs with nodes and edges. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. The GCN model showed an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906 in analysis of the classification performance for the test datasets. The pathways with high feature importance by SHAP included highly enriched pathways in the gene set enrichment analysis. Moreover, a logistic regression model with explanatory variables of genes in pathways with high feature importance showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.
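The F1 score reported in the description is the harmonic mean of precision and recall, so the four rounded figures can be cross-checked against one another. The snippet below is only a consistency check on the numbers quoted in the abstract, not the authors' evaluation code; the function name `f1_score` and the tolerance are illustrative choices.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values exactly as quoted in the abstract (rounded to three decimals there).
reported = {"accuracy": 0.914, "precision": 0.948, "recall": 0.868, "f1": 0.906}

recomputed_f1 = f1_score(reported["precision"], reported["recall"])

# Tolerance is an assumption: it allows for rounding of the published inputs.
f1_consistent = abs(recomputed_f1 - reported["f1"]) < 1e-3

print(f"recomputed F1 = {recomputed_f1:.3f}, consistent = {f1_consistent}")
```

The recomputed value agrees with the reported F1 to three decimals, which suggests the precision, recall, and F1 figures refer to the same class and the same test split.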
format Online
Article
Text
id pubmed-9231717
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9231717 2022-06-25 Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma Hayakawa, Jin; Seki, Tomohisa; Kawazoe, Yoshimasa; Ohe, Kazuhiko PLoS One Research Article
Public Library of Science 2022-06-24 /pmc/articles/PMC9231717/ /pubmed/35749395 http://dx.doi.org/10.1371/journal.pone.0269570 Text en © 2022 Hayakawa et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma
topic Research Article