
A machine learning framework that integrates multi-omics data predicts cancer-related LncRNAs

BACKGROUND: Long non-coding RNAs (lncRNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers for cancer diagnosis and prognosis. However, it is difficult to discover the true association mechanism between lncRNAs and c...


Bibliographic Details
Main Authors: Yuan, Lin, Zhao, Jing, Sun, Tao, Shen, Zhen
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210375/
https://www.ncbi.nlm.nih.gov/pubmed/34134612
http://dx.doi.org/10.1186/s12859-021-04256-8
author Yuan, Lin
Zhao, Jing
Sun, Tao
Shen, Zhen
collection PubMed
description BACKGROUND: Long non-coding RNAs (lncRNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers for cancer diagnosis and prognosis. However, it is difficult to discover the true association mechanism between lncRNAs and complex diseases. The unprecedented enrichment of multi-omics data and the rapid development of machine learning technology provide an opportunity to design a machine learning framework to study the relationship between lncRNAs and complex diseases. RESULTS: In this article, we propose a new machine learning approach, LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), which predicts disease-related lncRNAs based on multi-omics data, machine learning methods and neural network neighborhood information aggregation. First, LGDLDA calculates similarity matrices for lncRNAs, genes and diseases. It calculates the similarity between lncRNAs from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix and the lncRNA-protein interaction matrix. The gene similarity matrix is obtained from the lncRNA-gene association matrix and the gene-disease association matrix, and the disease similarity matrix is obtained from the disease ontology, the disease-miRNA association matrix, and Gaussian interaction profile kernel similarity. Second, LGDLDA integrates the neighborhood information in the similarity matrices using the nonlinear feature learning of a neural network. Third, LGDLDA uses embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and selects potential disease-related lncRNAs. CONCLUSIONS: Compared with other lncRNA-disease prediction methods, our proposed method takes more critical information into account and improves the performance of cancer-related lncRNA prediction.
Experiments on randomly split data show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. Results on different simulated data sets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied the method to three real cancer data sets (gastric cancer, colorectal cancer and breast cancer) to predict potential cancer-related lncRNAs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04256-8.
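Two of the steps named in the abstract, Gaussian interaction profile kernel similarity and the final ranking of candidate lncRNA-disease pairs, can be illustrated with a minimal sketch. This is not the LGDLDA implementation: the function names, the toy matrices, and the bandwidth setting gamma_prime = 1 are assumptions made here for illustration, and the kernel follows the commonly used formulation in which the bandwidth is normalized by the mean squared norm of the interaction profiles.

```python
import numpy as np


def gip_kernel_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows of `assoc`.

    Each row is the binary interaction profile of one entity (for example,
    one disease across all lncRNAs). `gamma_prime` is an assumed default.
    """
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = np.sum(assoc ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    # Normalize the bandwidth by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    # Pairwise squared distances: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (assoc @ assoc.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-gamma * sq_dists)


def rank_candidates(scores, known, disease_idx):
    """Return lncRNA indices with no known association to the given disease,
    ordered from highest to lowest predicted score."""
    scores = np.asarray(scores, dtype=float)
    known = np.asarray(known, dtype=bool)
    candidates = np.flatnonzero(~known[disease_idx])
    order = np.argsort(-scores[disease_idx, candidates], kind="stable")
    return candidates[order]


# Hypothetical toy data: 3 diseases (rows) x 4 lncRNAs (columns).
toy_assoc = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
])
# Hypothetical predicted scores, standing in for the output of a trained model.
toy_scores = np.array([
    [0.9, 0.2, 0.8, 0.6],
    [0.7, 0.1, 0.9, 0.3],
    [0.4, 0.8, 0.5, 0.2],
])

sim = gip_kernel_similarity(toy_assoc)
ranked = rank_candidates(toy_scores, toy_assoc, disease_idx=0)
print(np.round(sim, 3))
print(ranked)
```

In this toy example the first two diseases have identical interaction profiles, so their kernel similarity is 1, and only the lncRNAs without a known association to the first disease are ranked as candidates.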
format Online
Article
Text
id pubmed-8210375
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8210375 2021-06-17 A machine learning framework that integrates multi-omics data predicts cancer-related LncRNAs Yuan, Lin Zhao, Jing Sun, Tao Shen, Zhen BMC Bioinformatics Research BioMed Central 2021-06-16 /pmc/articles/PMC8210375/ /pubmed/34134612 http://dx.doi.org/10.1186/s12859-021-04256-8 Text en © The Author(s) 2021
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A machine learning framework that integrates multi-omics data predicts cancer-related LncRNAs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210375/
https://www.ncbi.nlm.nih.gov/pubmed/34134612
http://dx.doi.org/10.1186/s12859-021-04256-8