
Genomic data imputation with variational auto-encoders

BACKGROUND: As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbor...


Bibliographic Details
Main Authors:	Qiu, Yeping Lina; Zheng, Hong; Gevaert, Olivier
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Technical Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7407276/ https://www.ncbi.nlm.nih.gov/pubmed/32761097 http://dx.doi.org/10.1093/gigascience/giaa082

_version_	1783567589906055168
author	Qiu, Yeping Lina; Zheng, Hong; Gevaert, Olivier
author_facet	Qiu, Yeping Lina; Zheng, Hong; Gevaert, Olivier
author_sort	Qiu, Yeping Lina
collection	PubMed
description	BACKGROUND: As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets and it is difficult to modify these algorithms to handle certain cases not missing at random. RESULTS: In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performances than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on the imputation performances and, in this context, show why VAE has a better imputation capacity compared to a regular deterministic auto-encoder. CONCLUSIONS: We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.
format	Online Article Text
id	pubmed-7407276
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7407276 2020-08-10 Genomic data imputation with variational auto-encoders Qiu, Yeping Lina; Zheng, Hong; Gevaert, Olivier Gigascience Technical Note Oxford University Press 2020-08-06 /pmc/articles/PMC7407276/ /pubmed/32761097 http://dx.doi.org/10.1093/gigascience/giaa082 Text en © The Author(s) 2020. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Technical Note Qiu, Yeping Lina Zheng, Hong Gevaert, Olivier Genomic data imputation with variational auto-encoders
title	Genomic data imputation with variational auto-encoders
title_full	Genomic data imputation with variational auto-encoders
title_fullStr	Genomic data imputation with variational auto-encoders
title_full_unstemmed	Genomic data imputation with variational auto-encoders
title_short	Genomic data imputation with variational auto-encoders
title_sort	genomic data imputation with variational auto-encoders
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7407276/ https://www.ncbi.nlm.nih.gov/pubmed/32761097 http://dx.doi.org/10.1093/gigascience/giaa082
work_keys_str_mv	AT qiuyepinglina genomicdataimputationwithvariationalautoencoders AT zhenghong genomicdataimputationwithvariationalautoencoders AT gevaertolivier genomicdataimputationwithvariationalautoencoders


Similar Items