RVAgene: generative modeling of gene expression time series data

MOTIVATION: Methods to model dynamic changes in gene expression at a genome-wide level are not currently sufficient for large (temporally rich or single-cell) datasets. Variational autoencoders offer means to characterize large datasets and have been used effectively to characterize features of sing...

Bibliographic Details
Main Authors: Mitra, Raktim; MacLean, Adam L
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504625/
https://www.ncbi.nlm.nih.gov/pubmed/33974008
http://dx.doi.org/10.1093/bioinformatics/btab260
author Mitra, Raktim
MacLean, Adam L
author_facet Mitra, Raktim
MacLean, Adam L
author_sort Mitra, Raktim
collection PubMed
description MOTIVATION: Methods to model dynamic changes in gene expression at a genome-wide level are not currently sufficient for large (temporally rich or single-cell) datasets. Variational autoencoders offer means to characterize large datasets and have been used effectively to characterize features of single-cell datasets. Here, we extend these methods for use with gene expression time series data. RESULTS: We present RVAgene: a recurrent variational autoencoder to model gene expression dynamics. RVAgene learns to accurately and efficiently reconstruct temporal gene profiles. It also learns a low dimensional representation of the data via a recurrent encoder network that can be used for biological feature discovery, and from which we can generate new gene expression data by sampling the latent space. We test RVAgene on simulated and real biological datasets, including embryonic stem cell differentiation and kidney injury response dynamics. In all cases, RVAgene accurately reconstructed complex gene expression temporal profiles. Via cross validation, we show that a low-error latent space representation can be learnt using only a fraction of the data. Through clustering and gene ontology term enrichment analysis on the latent space, we demonstrate the potential of RVAgene for unsupervised discovery. In particular, RVAgene identifies new programs of shared gene regulation of Lox family genes in response to kidney injury. AVAILABILITY AND IMPLEMENTATION: All datasets analyzed in this manuscript are publicly available and have been published previously. RVAgene is available in Python, at GitHub: https://github.com/maclean-lab/RVAgene; Zenodo archive: http://doi.org/10.5281/zenodo.4271097. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
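The abstract describes a recurrent encoder that maps a gene's time series to a low-dimensional latent space, from which new profiles are generated by sampling. The following is a minimal, standard-library-only sketch of that general variational autoencoder mechanism, not the RVAgene implementation (which is available at the GitHub repository cited in the abstract). The function names, the one-unit tanh recurrence, the fixed weights, and the two-dimensional latent space are all hypothetical choices made for illustration.

```python
import math
import random


def toy_recurrent_encoder(series, w_in=0.5, w_rec=0.5):
    """Hypothetical one-unit tanh recurrence over a time series.

    Returns the mean and log-variance of a 2-D diagonal Gaussian over the
    latent variable. The weights are fixed toy values, not learned ones.
    """
    h = 0.0
    for x in series:
        h = math.tanh(w_in * x + w_rec * h)
    mu = [h, -h]
    log_var = [-1.0, -1.0]
    return mu, log_var


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def toy_recurrent_decoder(z, n_steps, w_z=1.0, w_rec=0.5):
    """Hypothetical recurrent decoder: unrolls n_steps values from z."""
    h = 0.0
    profile = []
    for _ in range(n_steps):
        h = math.tanh(w_z * sum(z) + w_rec * h)
        profile.append(h)
    return profile


if __name__ == "__main__":
    rng = random.Random(0)

    # Encode one made-up expression profile, sample a latent point, decode.
    series = [0.1, 0.4, 0.9, 0.3]
    mu, log_var = toy_recurrent_encoder(series)
    z = reparameterize(mu, log_var, rng)
    print("reconstruction:", toy_recurrent_decoder(z, len(series)))
    print("KL term:", kl_to_standard_normal(mu, log_var))

    # Generate a new profile by sampling the prior N(0, I) directly.
    z_new = [rng.gauss(0.0, 1.0) for _ in range(2)]
    print("generated:", toy_recurrent_decoder(z_new, len(series)))
```

In a trained model the encoder and decoder weights would be fitted by minimizing reconstruction error plus the KL term; here they are constants so that the data flow (encode, sample, decode) is easy to follow.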
format Online
Article
Text
id pubmed-8504625
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8504625 2021-10-13 RVAgene: generative modeling of gene expression time series data Mitra, Raktim MacLean, Adam L Bioinformatics Original Papers
Oxford University Press 2021-05-11 /pmc/articles/PMC8504625/ /pubmed/33974008 http://dx.doi.org/10.1093/bioinformatics/btab260 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Mitra, Raktim
MacLean, Adam L
RVAgene: generative modeling of gene expression time series data
title RVAgene: generative modeling of gene expression time series data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504625/
https://www.ncbi.nlm.nih.gov/pubmed/33974008
http://dx.doi.org/10.1093/bioinformatics/btab260