
Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma

Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to interpret why the prediction results are obtained using deep learning models from model par...


Bibliographic Details
Main Authors:	Hayakawa, Jin; Seki, Tomohisa; Kawazoe, Yoshimasa; Ohe, Kazuhiko
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9231717/ https://www.ncbi.nlm.nih.gov/pubmed/35749395 http://dx.doi.org/10.1371/journal.pone.0269570

author	Hayakawa, Jin; Seki, Tomohisa; Kawazoe, Yoshimasa; Ohe, Kazuhiko
author_sort	Hayakawa, Jin
collection	PubMed
description	Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to interpret why the prediction results are obtained using deep learning models from model parameters. New methods have been proposed for interpreting deep learning model predictions but have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model using graph convolutions of genetic pathways can provide pathway-level feature importance for classification prediction of diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. Using Kyoto Encyclopedia of Genes and Genomes pathways, a graph convolutional network (GCN) model was implemented to construct graphs with nodes and edges. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. The GCN model showed an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906 in analysis of the classification performance for the test datasets. The pathways with high feature importance by SHAP included highly enriched pathways in the gene set enrichment analysis. Moreover, a logistic regression model with explanatory variables of genes in pathways with high feature importance showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.
format	Online Article Text
id	pubmed-9231717
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9231717 2022-06-25 PLoS One Research Article Public Library of Science 2022-06-24 /pmc/articles/PMC9231717/ /pubmed/35749395 http://dx.doi.org/10.1371/journal.pone.0269570 Text en © 2022 Hayakawa et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Pathway importance by graph convolutional network and Shapley additive explanations in gene expression phenotype of diffuse large B-cell lymphoma
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9231717/ https://www.ncbi.nlm.nih.gov/pubmed/35749395 http://dx.doi.org/10.1371/journal.pone.0269570

