
A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks

Long non-coding RNAs (lncRNAs) have been shown to play a regulatory role in various processes of human diseases. However, lncRNA experiments are inefficient, time-consuming and highly subjective, so that the number of experimentally verified associations between lncRNA and diseases is limited. In th...

Bibliographic Details
Main authors: Biyu, Hou; GuangWen, Tan; Ming, Zeng; Lixin, Guan; Mengshan, Li
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10395133/
https://www.ncbi.nlm.nih.gov/pubmed/37539215
http://dx.doi.org/10.1016/j.heliyon.2023.e17726
_version_ 1785083524975427584
author Biyu, Hou
GuangWen, Tan
Ming, Zeng
Lixin, Guan
Mengshan, Li
author_facet Biyu, Hou
GuangWen, Tan
Ming, Zeng
Lixin, Guan
Mengshan, Li
author_sort Biyu, Hou
collection PubMed
description Long non-coding RNAs (lncRNAs) have been shown to play a regulatory role in various processes of human diseases. However, lncRNA experiments are inefficient, time-consuming and highly subjective, so that the number of experimentally verified associations between lncRNA and diseases is limited. In the era of big data, numerous machine learning methods have been proposed to predict potential associations between lncRNAs and diseases, but the characteristics of the association data have seldom been explored. In these methods, negative samples are randomly selected for model training, so the model is prone to learning potential positive associations incorrectly, which affects prediction accuracy. In this paper, we propose a cyclic optimization model for predicting lncRNA-disease associations (COPTLDA for short). In COPTLDA, a two-step training strategy is adopted: the unlabeled samples are searched for those with a greater probability of being negative examples, and the selected samples are treated as negative samples and combined with the known positive samples to train the model. The searching and training steps are repeated until the best model is obtained as the final prediction model. To evaluate the performance of the model, 30% of the known positive samples are used to calculate the model accuracy and 10% of the positive samples are used to calculate the recall rate of the model. The sampling strategy used in this paper can improve the accuracy, and the AUC value reaches 0.9348. The results of case studies showed that the model could predict potential associations between lncRNAs and malignant tumors such as colorectal cancer, gastric cancer, and breast cancer. The predicted top 20 associated lncRNAs included 10 colorectal cancer lncRNAs, 2 gastric cancer lncRNAs, and 8 breast cancer lncRNAs.
format Online
Article
Text
id pubmed-10395133
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10395133 2023-08-03 A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks Biyu, Hou GuangWen, Tan Ming, Zeng Lixin, Guan Mengshan, Li Heliyon Research Article Long non-coding RNAs (lncRNAs) have been shown to play a regulatory role in various processes of human diseases. However, lncRNA experiments are inefficient, time-consuming and highly subjective, so that the number of experimentally verified associations between lncRNA and diseases is limited. In the era of big data, numerous machine learning methods have been proposed to predict potential associations between lncRNAs and diseases, but the characteristics of the association data have seldom been explored. In these methods, negative samples are randomly selected for model training, so the model is prone to learning potential positive associations incorrectly, which affects prediction accuracy. In this paper, we propose a cyclic optimization model for predicting lncRNA-disease associations (COPTLDA for short). In COPTLDA, a two-step training strategy is adopted: the unlabeled samples are searched for those with a greater probability of being negative examples, and the selected samples are treated as negative samples and combined with the known positive samples to train the model. The searching and training steps are repeated until the best model is obtained as the final prediction model. To evaluate the performance of the model, 30% of the known positive samples are used to calculate the model accuracy and 10% of the positive samples are used to calculate the recall rate of the model. The sampling strategy used in this paper can improve the accuracy, and the AUC value reaches 0.9348. The results of case studies showed that the model could predict potential associations between lncRNAs and malignant tumors such as colorectal cancer, gastric cancer, and breast cancer. The predicted top 20 associated lncRNAs included 10 colorectal cancer lncRNAs, 2 gastric cancer lncRNAs, and 8 breast cancer lncRNAs. Elsevier 2023-06-28 /pmc/articles/PMC10395133/ /pubmed/37539215 http://dx.doi.org/10.1016/j.heliyon.2023.e17726 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
Biyu, Hou
GuangWen, Tan
Ming, Zeng
Lixin, Guan
Mengshan, Li
A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks
title A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks
title_full A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks
title_fullStr A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks
title_full_unstemmed A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks
title_short A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks
title_sort lncrna-disease association prediction model based on the two-step pu learning and fully connected neural networks
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10395133/
https://www.ncbi.nlm.nih.gov/pubmed/37539215
http://dx.doi.org/10.1016/j.heliyon.2023.e17726
work_keys_str_mv AT biyuhou alncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT guangwentan alncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT mingzeng alncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT lixinguan alncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT mengshanli alncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT biyuhou lncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT guangwentan lncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT mingzeng lncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT lixinguan lncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
AT mengshanli lncrnadiseaseassociationpredictionmodelbasedonthetwosteppulearningandfullyconnectedneuralnetworks
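
Illustrative note on the description field: the two-step PU (positive-unlabeled) procedure summarized in the abstract can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' COPTLDA implementation. A small logistic-regression scorer stands in for the fully connected neural network, the toy 2-D features replace real lncRNA-disease features, and every name and parameter here (`train_scorer`, `two_step_pu`, `held_out_recall`, `make_toy_data`, `rounds`, `neg_fraction`, `threshold`) is hypothetical. The abstract does not state how many rounds are run or how the "best" model is chosen, so a fixed number of rounds is assumed.

```python
import math
import random


def predict(w, b, x):
    """Return the predicted probability that sample x is a positive association."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_scorer(X, y, epochs=200, lr=0.5):
    """Fit a logistic-regression scorer by batch gradient descent.

    Assumption: this is a stand-in for the fully connected network in the paper.
    """
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def two_step_pu(positives, unlabeled, rounds=3, neg_fraction=0.3):
    """Alternate between selecting likely negatives and retraining the scorer."""
    # Initial round: treat every unlabeled sample as a provisional negative.
    negatives = list(unlabeled)
    model = None
    for _ in range(rounds):
        # Training step: known positives vs. the current negative set.
        X = positives + negatives
        y = [1.0] * len(positives) + [0.0] * len(negatives)
        model = train_scorer(X, y)
        # Searching step: the lowest-scoring unlabeled samples are taken
        # as the most probable negatives for the next round.
        ranked = sorted(unlabeled, key=lambda x: predict(*model, x))
        k = max(1, round(neg_fraction * len(unlabeled)))
        negatives = ranked[:k]
    return model, negatives


def held_out_recall(model, held_out_positives, threshold=0.5):
    """Fraction of held-out known positives scored at or above the threshold."""
    hits = sum(predict(*model, x) >= threshold for x in held_out_positives)
    return hits / len(held_out_positives)


def make_toy_data(seed=0):
    """Toy data: positives near (2, 2); unlabeled mixes hidden positives and negatives."""
    rng = random.Random(seed)

    def cluster(center, n):
        return [[rng.gauss(center, 0.3), rng.gauss(center, 0.3)] for _ in range(n)]

    positives = cluster(2.0, 20)
    held_out = cluster(2.0, 5)
    unlabeled = cluster(2.0, 10) + cluster(-2.0, 30)
    return positives, held_out, unlabeled
```

Calling `two_step_pu(*make_toy_data()[::2])` returns the final scorer together with the samples it selected as negatives; `held_out_recall` mirrors the abstract's use of withheld positive samples to measure recall. Choosing negatives by score rather than at random is the point of the design: it lowers the chance that an unverified but true association is trained as a negative.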