Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning–based neural network

BACKGROUND: Gene expression plays a key intermediate role in linking molecular features at the DNA level and phenotype. However, owing to various experimental limitations, RNA-seq data are missing in many samples for which high-quality DNA methylation data exist. Because DNA methylation i...

Bibliographic Details
Main authors: Zhou, Xiang; Chai, Hua; Zhao, Huiying; Luo, Ching-Hsing; Yang, Yuedong
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7350980/
https://www.ncbi.nlm.nih.gov/pubmed/32649756
http://dx.doi.org/10.1093/gigascience/giaa076
collection PubMed
description BACKGROUND: Gene expression plays a key intermediate role in linking molecular features at the DNA level and phenotype. However, owing to various experimental limitations, RNA-seq data are missing in many samples for which high-quality DNA methylation data exist. Because DNA methylation is an important epigenetic modification that regulates gene expression, it can be used to predict RNA-seq data. Many methods have been developed for this purpose. A common limitation of these methods is that they mainly focus on a single cancer dataset and do not fully utilize information from large pan-cancer datasets. RESULTS: Here, we have developed a novel method, TDimpute, to impute missing gene expression data from DNA methylation data through a transfer learning–based neural network. In this method, the pan-cancer dataset from The Cancer Genome Atlas (TCGA) was used to train a general model, which was then fine-tuned on the specific cancer dataset. By testing on 16 cancer datasets, we found that our method significantly outperforms other state-of-the-art methods in imputation accuracy, with a 7–11% improvement under different missing rates. The imputed gene expression further proved useful for downstream analyses, including the identification of both methylation-driving and prognosis-related genes, clustering analysis, and survival analysis on the TCGA dataset. More importantly, an independent test on the Wilms tumor dataset from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project indicated that our method is generally applicable. CONCLUSIONS: TDimpute is an effective method for RNA-seq imputation with limited training samples.
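The two-stage idea in the abstract (train a general model on pan-cancer data, then fine-tune it on one cancer type) can be illustrated with a toy sketch. This is not the TDimpute implementation: the data are synthetic, the network is a single hidden layer written in NumPy, and every name, size, and hyperparameter below (`make_data`, `train`, `N_CPG`, the learning rates, and so on) is an assumption made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CPG, N_HIDDEN, N_GENES = 20, 16, 5  # toy sizes, illustrative assumptions

def make_data(n_samples, A, B):
    """Synthetic methylation (beta values in [0, 1]) and matching expression."""
    X = rng.uniform(0.0, 1.0, size=(n_samples, N_CPG))
    Y = np.tanh(X @ A) @ B + 0.01 * rng.normal(size=(n_samples, N_GENES))
    return X, Y

def init_params():
    return {
        "W1": rng.normal(0.0, 0.1, size=(N_CPG, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.1, size=(N_HIDDEN, N_GENES)),
        "b2": np.zeros(N_GENES),
    }

def forward(p, X):
    """Return hidden activations and predicted expression."""
    H = np.tanh(X @ p["W1"] + p["b1"])
    return H, H @ p["W2"] + p["b2"]

def mse(p, X, Y):
    return float(np.mean((forward(p, X)[1] - Y) ** 2))

def train(params, X, Y, lr, epochs):
    """Full-batch gradient descent on mean squared error; returns a new copy."""
    p = {k: v.copy() for k, v in params.items()}
    for _ in range(epochs):
        H, Y_hat = forward(p, X)
        dY = 2.0 * (Y_hat - Y) / Y.size          # d(MSE)/d(Y_hat)
        dZ = (dY @ p["W2"].T) * (1.0 - H ** 2)   # backprop through tanh
        grads = {"W1": X.T @ dZ, "b1": dZ.sum(axis=0),
                 "W2": H.T @ dY, "b2": dY.sum(axis=0)}
        for k in p:
            p[k] -= lr * grads[k]
    return p

# Hypothetical ground-truth mappings: the target cancer is a perturbed
# version of the pan-cancer relationship.
A_pan = rng.normal(0.0, 0.3, size=(N_CPG, 8))
B = rng.normal(0.0, 1.0, size=(8, N_GENES))
A_target = A_pan + 0.1 * rng.normal(size=A_pan.shape)

X_pan, Y_pan = make_data(500, A_pan, B)        # large pan-cancer set
X_tgt, Y_tgt = make_data(30, A_target, B)      # few paired target samples
X_missing, _ = make_data(10, A_target, B)      # methylation only, RNA-seq missing

# Stage 1: train a general model on the pan-cancer data.
pretrained = train(init_params(), X_pan, Y_pan, lr=0.1, epochs=2000)
# Stage 2: fine-tune all weights on the small target-cancer dataset.
finetuned = train(pretrained, X_tgt, Y_tgt, lr=0.05, epochs=300)

loss_pretrained = mse(pretrained, X_tgt, Y_tgt)
loss_finetuned = mse(finetuned, X_tgt, Y_tgt)
imputed = forward(finetuned, X_missing)[1]     # imputed expression matrix
print(loss_pretrained, loss_finetuned, imputed.shape)
```

In this sketch, fine-tuning starts from the pretrained weights rather than from a fresh initialization, which is what lets the small target dataset reuse structure learned from the larger one. The comparison here is on the target training samples only; a real evaluation would use held-out samples and the missing-rate protocol described in the article.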
id pubmed-7350980
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Gigascience
published 2020-07-10
license © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research