Integrating pathway knowledge with deep neural networks to reduce the dimensionality in single-cell RNA-seq data

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell...

Bibliographic Details
Main authors: Gundogdu, Pelin, Loucera, Carlos, Alamo-Alvarez, Inmaculada, Dopazo, Joaquin, Nepomuceno, Isabel
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8722116/
https://www.ncbi.nlm.nih.gov/pubmed/34980200
http://dx.doi.org/10.1186/s13040-021-00285-4
author Gundogdu, Pelin
Loucera, Carlos
Alamo-Alvarez, Inmaculada
Dopazo, Joaquin
Nepomuceno, Isabel
author_sort Gundogdu, Pelin
collection PubMed
description BACKGROUND: Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of a lack of interpretability in the results. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data.
RESULTS: In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of scRNA-seq data. We tested the proposed biologically based architectures on thousands of cells of human and mouse origin across a collection of public datasets in order to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that try to mimic how unknown cell types are clustered by the DNN and how it correctly annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved comparable to other, less interpretable DNN approaches constrained by protein-protein interaction or gene regulation data. Finally, we show how the latent structure learned by the network can be used to visualize and interpret the composition of human single-cell datasets.
CONCLUSIONS: Here we demonstrate how integrating pathways, which convey fundamental information on functional relationships between genes, with DNNs, which provide an excellent classification framework, yields an excellent alternative for learning a biologically meaningful representation of scRNA-seq data. In addition, the introduction of prior biological knowledge in the DNN reduces the size of the network architecture. Comparative results demonstrate superior performance of this approach over other similar approaches. As an additional advantage, the use of pathways within the DNN structure makes the results easy to interpret by connecting features to cell functionalities through the pathway nodes, as demonstrated with an example using human melanoma tumor cells.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-021-00285-4.
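The abstract describes constraining a DNN with pathway membership so that hidden nodes correspond to pathways. A minimal sketch of that general idea follows. It is not the authors' implementation: the gene names, pathway sets, weights, and the tanh activation are all hypothetical choices made for illustration. Each pathway node receives input only from its member genes, which is done here by multiplying the weights by a 0/1 mask.

```python
import math

# Hypothetical toy data (not from the paper): 4 genes, 2 pathways.
GENES = ["geneA", "geneB", "geneC", "geneD"]
PATHWAYS = {
    "pathway1": {"geneA", "geneB"},
    "pathway2": {"geneB", "geneC", "geneD"},
}


def build_mask(genes, pathways):
    """Return a genes x pathways 0/1 matrix; 1 where the gene is a pathway member."""
    names = sorted(pathways)
    return [[1.0 if g in pathways[p] else 0.0 for p in names] for g in genes]


def pathway_layer(expression, weights, mask, bias):
    """One masked dense layer: each pathway node only sees its member genes."""
    out = []
    for j in range(len(bias)):
        total = bias[j]
        for i, x in enumerate(expression):
            # A masked-out weight contributes nothing, whatever its value.
            total += x * weights[i][j] * mask[i][j]
        out.append(math.tanh(total))  # tanh is an assumed activation
    return out


mask = build_mask(GENES, PATHWAYS)
weights = [[0.5, 0.5] for _ in GENES]  # arbitrary illustrative weights
bias = [0.0, 0.0]
cell = [1.0, 2.0, 0.0, 4.0]            # one cell's expression over GENES

# Two pathway activations: a 4-gene profile reduced to 2 dimensions.
activations = pathway_layer(cell, weights, mask, bias)
```

Because the mask zeroes every gene-to-pathway connection that the prior knowledge does not support, the layer has fewer effective parameters than a fully connected one, and each output can be read as the activity of a named pathway.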
format Online
Article
Text
id pubmed-8722116
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8722116 2022-01-06 BioData Min Methodology BioMed Central 2022-01-03 /pmc/articles/PMC8722116/ /pubmed/34980200 http://dx.doi.org/10.1186/s13040-021-00285-4 Text en
© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Integrating pathway knowledge with deep neural networks to reduce the dimensionality in single-cell RNA-seq data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8722116/
https://www.ncbi.nlm.nih.gov/pubmed/34980200
http://dx.doi.org/10.1186/s13040-021-00285-4