Mining influential genes based on deep learning

BACKGROUND: Currently, large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of an ever-expanding gene expression profile, a new, low-cost, high-throughput reduced repr...

Bibliographic Details
Main Authors:	Kong, Lingpeng; Chen, Yuanyuan; Xu, Fengjiao; Xu, Mingmin; Li, Zutan; Fang, Jingya; Zhang, Liangyun; Pian, Cong
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7821411/ https://www.ncbi.nlm.nih.gov/pubmed/33482718 http://dx.doi.org/10.1186/s12859-021-03972-5

author	Kong, Lingpeng; Chen, Yuanyuan; Xu, Fengjiao; Xu, Mingmin; Li, Zutan; Fang, Jingya; Zhang, Liangyun; Pian, Cong
collection	PubMed
description	BACKGROUND: Currently, large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of an ever-expanding gene expression profile, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1000 carefully chosen landmark genes that can capture ~80% of information from the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to mine the influential genes in the genome more deeply. RESULTS: Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationship between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The results show that our landmark genes can predict target genes more accurately and robustly than those of L1000 based on two metrics [mean absolute error (MAE) and Pearson correlation coefficient (PCC)]. This reveals that the landmark genes detected by our method contain more genomic information. CONCLUSIONS: We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.
format	Online Article Text
id	pubmed-7821411
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7821411 2021-01-22 Mining influential genes based on deep learning BMC Bioinformatics Methodology Article BioMed Central 2021-01-22 /pmc/articles/PMC7821411/ /pubmed/33482718 http://dx.doi.org/10.1186/s12859-021-03972-5 Text en © The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Mining influential genes based on deep learning
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7821411/ https://www.ncbi.nlm.nih.gov/pubmed/33482718 http://dx.doi.org/10.1186/s12859-021-03972-5
