
GBDTLRL2D Predicts LncRNA–Disease Associations Using MetaGraph2Vec and K-Means Based on Heterogeneous Network


Bibliographic Details
Main authors: Duan, Tao; Kuang, Zhufang; Wang, Jiaqi; Ma, Zhihao
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2021
Subjects: Cell and Developmental Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8718797/
https://www.ncbi.nlm.nih.gov/pubmed/34977011
http://dx.doi.org/10.3389/fcell.2021.753027
author Duan, Tao
Kuang, Zhufang
Wang, Jiaqi
Ma, Zhihao
collection PubMed
description In recent years, long noncoding RNAs (lncRNAs) have been shown to be involved in many disease processes. Predicting lncRNA–disease associations helps clarify the mechanisms of disease onset and can offer new approaches to disease prevention and treatment. Current methods for predicting potential lncRNA–disease associations seldom consider heterogeneous networks with complex node paths, and they suffer from an imbalance between positive and negative samples. To address these problems, this paper proposes GBDTLRL2D, a method that predicts lncRNA–disease associations by combining a Gradient Boosting Decision Tree (GBDT) with logistic regression (LR). MetaGraph2Vec is used for feature learning, and negative sample sets are selected by K-means clustering. The novelty of GBDTLRL2D lies in using clustering to select a representative negative sample set and in using MetaGraph2Vec, which better preserves the semantic and structural features of heterogeneous networks. In 10-fold cross-validation, GBDTLRL2D achieved average areas under the receiver operating characteristic curve (AUC) of 0.98, 0.98, and 0.96 on the three datasets.
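The negative-sampling idea described in the abstract can be illustrated with a minimal, stdlib-only Python sketch. It assumes that every unlabelled lncRNA–disease pair has already been turned into a numeric feature vector (in the paper this role is played by MetaGraph2Vec embeddings, which are not reproduced here), clusters those candidate negatives with plain k-means, and then draws a balanced negative set round-robin across the clusters, nearest-to-centroid first. The function names, the round-robin selection rule, and the toy data are illustrative assumptions, not the authors' published procedure. The GBDT+LR classifier is omitted, and `roc_auc` only shows how the reported AUC metric is defined.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on equal-length float tuples; returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so centroids are already the cluster means
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def select_representative_negatives(candidates, n_needed, k, seed=0):
    """Return indices of n_needed candidate pairs spread over k clusters.

    Hypothetical rule: clusters are visited round-robin, and each cluster
    contributes its members in order of distance to its centroid.
    """
    if n_needed > len(candidates):
        raise ValueError("not enough candidate negatives")
    centroids, labels = kmeans(candidates, k, seed=seed)
    queues = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: math.dist(candidates[i], centroids[c]))
        queues.append(members)
    chosen, depth = [], 0
    while len(chosen) < n_needed:
        for q in queues:
            if depth < len(q) and len(chosen) < n_needed:
                chosen.append(q[depth])
        depth += 1
    return chosen


def roc_auc(labels, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy 2-D stand-ins for embeddings of six unlabelled lncRNA-disease pairs.
    cands = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(select_representative_negatives(cands, n_needed=2, k=2))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))
```

Drawing from every cluster keeps the selected negatives spread over the whole unlabelled space instead of concentrating them in its densest region, which is one way to read "representative negative sample set". In this sketch the number of clusters `k` is a free parameter, and `n_needed` would typically be set to the number of known positive associations to balance the classes.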
format Online
Article
Text
id pubmed-8718797
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
journal Front Cell Dev Biol (Cell and Developmental Biology), published 2021-12-17
rights Copyright © 2021 Duan, Kuang, Wang and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Cell and Developmental Biology