
A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks

Long non-coding RNAs (lncRNAs) have been shown to play a regulatory role in various processes of human diseases. However, lncRNA experiments are inefficient, time-consuming and highly subjective, so that the number of experimentally verified associations between lncRNA and diseases is limited. In th...


Bibliographic Details
Main authors:	Biyu, Hou, GuangWen, Tan, Ming, Zeng, Lixin, Guan, Mengshan, Li
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10395133/ https://www.ncbi.nlm.nih.gov/pubmed/37539215 http://dx.doi.org/10.1016/j.heliyon.2023.e17726

_version_	1785083524975427584
author	Biyu, Hou GuangWen, Tan Ming, Zeng Lixin, Guan Mengshan, Li
author_sort	Biyu, Hou
collection	PubMed
description	Long non-coding RNAs (lncRNAs) have been shown to play a regulatory role in various processes of human diseases. However, lncRNA experiments are inefficient, time-consuming and highly subjective, so that the number of experimentally verified associations between lncRNA and diseases is limited. In the era of big data, numerous machine learning methods have been proposed to predict potential associations between lncRNAs and diseases, but the characteristics of the association data have seldom been explored. In these methods, negative samples are randomly selected for model training, so the model is prone to learning potential positive associations incorrectly, which reduces prediction accuracy. In this paper, we propose a cyclic optimization model for predicting lncRNA-disease associations (COPTLDA for short). In COPTLDA, a two-step training strategy is adopted: the unlabeled samples most likely to be negative are identified and treated as negative samples, then combined with the known positive samples to train the model. The searching and training steps are repeated until the best model is obtained, which serves as the final prediction model. To evaluate the model, 30% of the known positive samples are used to calculate accuracy and 10% of the positive samples are used to calculate recall. The sampling strategy used in this paper improves accuracy, and the AUC reaches 0.9348. Case studies showed that the model could predict potential associations between lncRNAs and malignant tumors such as colorectal cancer, gastric cancer, and breast cancer. The predicted top 20 associated lncRNAs included 10 colorectal cancer lncRNAs, 2 gastric cancer lncRNAs, and 8 breast cancer lncRNAs.
format	Online Article Text
id	pubmed-10395133
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-10395133 2023-08-03 A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks Biyu, Hou GuangWen, Tan Ming, Zeng Lixin, Guan Mengshan, Li Heliyon Research Article Elsevier 2023-06-28 /pmc/articles/PMC10395133/ /pubmed/37539215 http://dx.doi.org/10.1016/j.heliyon.2023.e17726 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10395133/ https://www.ncbi.nlm.nih.gov/pubmed/37539215 http://dx.doi.org/10.1016/j.heliyon.2023.e17726
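
The two-step strategy summarized in the description field (pick the unlabeled samples most likely to be negative, retrain on them together with the known positives, repeat) can be illustrated with a minimal sketch. This is not the authors' COPTLDA implementation: the logistic regression stands in for the fully connected network, the fixed number of rounds stands in for the "until the best model is obtained" criterion, and the function names, the choice to keep as many negatives as there are positives, and the synthetic 2-D data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit a logistic regression by batch gradient descent.

    Assumption: a stand-in for the paper's fully connected network.
    """
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    """Predicted probability that sample x is a positive association."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def two_step_pu(positives, unlabeled, rounds=3):
    """Alternate between retraining and re-selecting likely negatives."""
    # Before the first round, every unlabeled sample is a provisional negative.
    negatives = list(unlabeled)
    model = None
    for _ in range(rounds):
        # Training step: known positives versus the current negatives.
        X = positives + negatives
        y = [1] * len(positives) + [0] * len(negatives)
        model = train_logreg(X, y)
        # Search step: keep the unlabeled samples with the lowest predicted
        # positive probability (assumption: as many as there are positives).
        ranked = sorted(unlabeled, key=lambda x: predict(model, x))
        negatives = ranked[:len(positives)]
    return model, negatives


if __name__ == "__main__":
    rng = random.Random(0)

    def blob(cx, cy, n):
        # Synthetic 2-D feature vectors around a centre (illustrative data only).
        return [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(n)]

    positives = blob(2, 2, 30)
    # Unlabeled pool: hidden positives mixed with true negatives.
    unlabeled = blob(2, 2, 20) + blob(-2, -2, 60)
    model, negatives = two_step_pu(positives, unlabeled)
    print(predict(model, [2.0, 2.0]), predict(model, [-2.0, -2.0]))
```

Selecting negatives by model score rather than at random is the point of the sketch: the hidden positives in the unlabeled pool score high and are therefore unlikely to be used as negative training examples.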
