
Geometric complement heterogeneous information and random forest for predicting lncRNA-disease associations

Mounting evidence shows that abnormal expression of long non-coding RNA (lncRNA) is associated with a variety of human diseases. Therefore, accurate identification of disease-related lncRNAs can help to understand lncRNA expression at the molecular level and to explore more effective tr...


Bibliographic Details
Main authors: Yao, Dengju, Zhang, Tao, Zhan, Xiaojuan, Zhang, Shuli, Zhan, Xiaorong, Zhang, Chao
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9448985/
https://www.ncbi.nlm.nih.gov/pubmed/36092871
http://dx.doi.org/10.3389/fgene.2022.995532
author Yao, Dengju
Zhang, Tao
Zhan, Xiaojuan
Zhang, Shuli
Zhan, Xiaorong
Zhang, Chao
collection PubMed
description Mounting evidence shows that abnormal expression of long non-coding RNA (lncRNA) is associated with a variety of human diseases. Therefore, accurate identification of disease-related lncRNAs can help to understand lncRNA expression at the molecular level and to explore more effective treatments for diseases. Many lncRNA-disease association prediction models have been proposed, but recognizing unknown lncRNA-disease associations remains a challenge. In this work, we propose a computational model for predicting lncRNA-disease associations based on geometric complement heterogeneous information and random forest. First, geometric complement heterogeneous information is used to integrate experimentally verified lncRNA-miRNA interactions and miRNA-disease associations. Second, lncRNA and disease features, each consisting of the respective similarity coefficients, are fused into the input feature space. Third, an autoencoder projects the raw high-dimensional features into a low-dimensional space to learn representations of lncRNAs and diseases. Finally, the low-dimensional lncRNA and disease features are fused into the input feature space to train a random forest classifier for lncRNA-disease association prediction. Under five-fold cross-validation, the AUC (area under the receiver operating characteristic curve) is 0.9897 and the AUPR (area under the precision-recall curve) is 0.7040, indicating that our model performs better than several state-of-the-art lncRNA-disease association prediction models. In addition, case studies on colon and stomach cancer indicate that our model predicts disease-related lncRNAs well.
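The description reports two evaluation metrics, AUC and AUPR, with noticeably different values. The following is a minimal sketch, not the authors' evaluation code, of how the two metrics can be computed from predicted scores. The toy labels and scores are invented for illustration only, and the function names `roc_auc` and `average_precision` are hypothetical. AUPR is approximated here by average precision, one common step-wise estimate of the area under the precision-recall curve.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision values at the rank of each positive.

    Assumes scores are distinct; tied scores keep their input order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy, imbalanced data (invented): 2 known associations among 20 candidate pairs.
labels = [1, 1] + [0] * 18
scores = [0.9, 0.4, 0.8, 0.7, 0.5] + [0.01 * i for i in range(1, 16)]

auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
print(f"AUC  = {auc:.4f}")
print(f"AUPR = {aupr:.4f}")
```

On this toy data the AUC stays above 0.9 while the average precision is 0.7, because a few high-scoring negatives cost little in AUC when negatives are plentiful but reduce precision sharply. This kind of class imbalance is one plausible reason, stated here as an assumption, why a model can report a high AUC alongside a lower AUPR.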
format Online
Article
Text
id pubmed-9448985
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9448985 2022-09-08 Geometric complement heterogeneous information and random forest for predicting lncRNA-disease associations Yao, Dengju Zhang, Tao Zhan, Xiaojuan Zhang, Shuli Zhan, Xiaorong Zhang, Chao Front Genet Genetics Frontiers Media S.A. 2022-08-24 /pmc/articles/PMC9448985/ /pubmed/36092871 http://dx.doi.org/10.3389/fgene.2022.995532 Text en Copyright © 2022 Yao, Zhang, Zhan, Zhang, Zhan and Zhang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Geometric complement heterogeneous information and random forest for predicting lncRNA-disease associations
topic Genetics