A machine learning framework that integrates multi-omics data predicts cancer-related LncRNAs
Main authors: | Yuan, Lin; Zhao, Jing; Sun, Tao; Shen, Zhen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210375/ https://www.ncbi.nlm.nih.gov/pubmed/34134612 http://dx.doi.org/10.1186/s12859-021-04256-8 |
author | Yuan, Lin; Zhao, Jing; Sun, Tao; Shen, Zhen |
collection | PubMed |
description | BACKGROUND: LncRNAs (long non-coding RNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers for cancer diagnosis and prognosis. However, the true mechanisms of association between lncRNAs and complex diseases are difficult to uncover. The unprecedented growth of multi-omics data and the rapid development of machine learning technology provide an opportunity to design a machine learning framework for studying the relationship between lncRNAs and complex diseases. RESULTS: In this article, we propose a new machine learning approach, LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), which predicts disease-related lncRNAs from multi-omics data by combining machine learning methods with neural network neighborhood information aggregation. First, LGDLDA computes separate similarity matrices for lncRNAs, genes and diseases. LncRNA similarity is calculated from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix and the lncRNA-protein interaction matrix. Gene similarity is obtained from the lncRNA-gene association matrix and the gene-disease association matrix, and disease similarity from the disease ontology, the disease-miRNA association matrix and Gaussian interaction profile kernel similarity. Second, LGDLDA integrates the neighborhood information in the similarity matrices through nonlinear feature learning with a neural network. Third, LGDLDA uses the embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and selects potential disease-related lncRNAs. CONCLUSIONS: Compared with existing lncRNA-disease prediction methods, our method takes more critical information into account and improves the performance of cancer-related lncRNA prediction.
Experiments on randomly split data show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. Results on different simulated data sets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied the method to three real cancer data sets (gastric cancer, colorectal cancer and breast cancer) to predict potential cancer-related lncRNAs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04256-8. |
format | Online Article Text |
id | pubmed-8210375 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8210375 2021-06-17. BMC Bioinformatics, Research. Published by BioMed Central, 2021-06-16. © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
topic | Research |
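The abstract names Gaussian interaction profile (GIP) kernel similarity as one ingredient of the disease similarity matrix, and ends with a step that ranks candidate lncRNA-disease pairs. The Python sketch below illustrates both ideas on a toy binary association matrix, using the standard GIP kernel K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2) with gamma = gamma' / (mean squared norm of the interaction profiles). It rests on several assumptions. The function names (`gip_kernel`, `transpose`, `rank_candidates`), gamma' = 1 and the toy matrix are all hypothetical. The scoring in `rank_candidates` is a plain similarity-weighted neighbour average used as a stand-in. It is not LGDLDA's neural-network neighborhood aggregation or its embedding-based matrix approximation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def gip_kernel(assoc: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """GIP kernel similarity between the rows of a binary association matrix.

    K[i][j] = exp(-gamma * ||row_i - row_j||^2), where
    gamma = gamma_prime / (mean squared norm of the rows).
    """
    n = len(assoc)
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix contains no interactions")
    gamma = gamma_prime / mean_sq  # bandwidth normalised by average profile size
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * d2)
    return sim


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns (lncRNA x disease -> disease x lncRNA)."""
    return [list(col) for col in zip(*m)]


def rank_candidates(assoc: Matrix, lnc_sim: Matrix,
                    dis_sim: Matrix) -> List[Tuple[float, int, int]]:
    """Score every unobserved (lncRNA, disease) pair, highest score first.

    Hypothetical stand-in scoring: the mean of a lncRNA-side and a
    disease-side similarity-weighted average over known associations.
    """
    n_l, n_d = len(assoc), len(assoc[0])
    ranked = []
    for i in range(n_l):
        for j in range(n_d):
            if assoc[i][j]:
                continue  # only unobserved pairs are candidates
            # How strongly lncRNAs similar to i are already linked to disease j.
            s_l = sum(lnc_sim[i][k] * assoc[k][j] for k in range(n_l)) / sum(lnc_sim[i])
            # How strongly diseases similar to j are already linked to lncRNA i
            # (the GIP kernel is symmetric, so row j sums the same as column j).
            s_d = sum(assoc[i][k] * dis_sim[k][j] for k in range(n_d)) / sum(dis_sim[j])
            ranked.append((0.5 * (s_l + s_d), i, j))
    ranked.sort(reverse=True)
    return ranked


if __name__ == "__main__":
    # Toy data: 3 lncRNAs (rows) x 3 diseases (columns); 1 = known association.
    A = [[1, 0, 1],
         [1, 0, 0],
         [0, 1, 0]]
    lnc_sim = gip_kernel(A)
    dis_sim = gip_kernel(transpose(A))
    for score, i, j in rank_candidates(A, lnc_sim, dis_sim):
        print(f"lncRNA {i} - disease {j}: {score:.3f}")
```

The GIP kernel is convenient here because it needs nothing beyond the association matrix itself. Other similarity sources mentioned in the abstract, such as expression profiles or disease ontology, would enter the sketch simply as different `lnc_sim` or `dis_sim` matrices.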