Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data

BACKGROUND: Unsupervised machine learning methods (deep learning) have shown their usefulness with noisy single cell mRNA-sequencing data (scRNA-seq), where the models generalize well, despite the zero-inflation of the data. A class of neural networks, namely autoencoders, has been useful for denois...

Bibliographic Details
Main authors: Kinalis, Savvas; Nielsen, Finn Cilius; Winther, Ole; Bagger, Frederik Otzen
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6615267/
https://www.ncbi.nlm.nih.gov/pubmed/31286861
http://dx.doi.org/10.1186/s12859-019-2952-9
_version_ 1783433335830216704
author Kinalis, Savvas
Nielsen, Finn Cilius
Winther, Ole
Bagger, Frederik Otzen
author_sort Kinalis, Savvas
collection PubMed
description BACKGROUND: Unsupervised machine learning methods (deep learning) have shown their usefulness with noisy single cell mRNA-sequencing data (scRNA-seq), where the models generalize well despite the zero-inflation of the data. A class of neural networks, namely autoencoders, has been useful for denoising of single cell data, imputation of missing values and dimensionality reduction. RESULTS: Here, we present a striking feature with the potential to greatly increase the usability of autoencoders: with specialized training, the autoencoder is not only able to generalize over the data, but also to tease apart biologically meaningful modules, which we found encoded in the representation layer of the network. Our model can, from scRNA-seq data, delineate biologically meaningful modules that govern a dataset, as well as give information as to which modules are active in each single cell. Importantly, most of these modules can be explained by known biological functions, as provided by the Hallmark gene sets. CONCLUSIONS: We discover that tailored training of an autoencoder makes it possible to deconvolute biological modules inherent in the data, without any assumptions. By comparison with gene signatures of canonical pathways, we see that the modules are directly interpretable. The scope of this discovery has important implications, as it makes it possible to outline the drivers behind a given effect of a cell. In comparison with other dimensionality reduction methods, or supervised models for classification, our approach has the benefit of both handling the zero-inflated nature of scRNA-seq well and validating that the model captures relevant information, by establishing a link between input and decoded data. In perspective, our model in combination with clustering methods is able to provide information about which subtype a given single cell belongs to, as well as which biological functions determine that membership.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2952-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6615267
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6615267 2019-07-18
Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data
Kinalis, Savvas; Nielsen, Finn Cilius; Winther, Ole; Bagger, Frederik Otzen
BMC Bioinformatics
Methodology Article
BioMed Central 2019-07-08
/pmc/articles/PMC6615267/
/pubmed/31286861
http://dx.doi.org/10.1186/s12859-019-2952-9
Text en
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data
title_sort deconvolution of autoencoders to learn biological regulatory modules from single cell mrna sequencing data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6615267/
https://www.ncbi.nlm.nih.gov/pubmed/31286861
http://dx.doi.org/10.1186/s12859-019-2952-9