Seq-SymRF: a random forest model predicts potential miRNA-disease associations based on information of sequences and clinical symptoms

Bibliographic Details
Main authors: Li, Jinlong; Chen, Xingyu; Huang, Qixing; Wang, Yang; Xie, Yun; Dai, Zong; Zou, Xiaoyong; Li, Zhanchao
Format: Online article, text
Language: English
Published: Nature Publishing Group UK, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7578641/
https://www.ncbi.nlm.nih.gov/pubmed/33087810
http://dx.doi.org/10.1038/s41598-020-75005-9
collection PubMed
description Increasing evidence indicates that miRNAs play a vital role in biological processes and are closely related to various human diseases. Research on miRNA-disease associations is helpful not only for disease prevention, diagnosis and treatment, but also for new drug identification and lead compound discovery. A novel sequence- and symptom-based random forest model (Seq-SymRF) was developed to identify potential associations between miRNAs and diseases. Features derived from sequence information and clinical symptoms were used to characterize miRNAs and diseases, respectively. Moreover, a clustering method based on Euclidean distance was adopted to construct reliable negative samples. Under fivefold cross-validation, Seq-SymRF achieved an accuracy of 98.00%, a specificity of 99.43%, a sensitivity of 96.58%, a precision of 99.40% and a Matthews correlation coefficient of 0.9604. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.9967 and 0.9975, respectively. Additionally, case studies were conducted on leukemia, breast neoplasms and hsa-mir-21. Most of the top-25 predicted disease-related miRNAs (19/25 for leukemia; 20/25 for breast neoplasms) and 15 of the top-25 predicted miRNA-related diseases (for hsa-mir-21) were verified by the literature and the dbDEMC database. It is anticipated that Seq-SymRF could serve as a powerful high-throughput virtual screening tool for drug research and development. All source code can be downloaded from https://github.com/LeeKamlong/Seq-SymRF.
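The abstract reports textbook evaluation metrics and mentions Euclidean-distance-based construction of negative samples. The Python sketch below illustrates both under stated assumptions. `metrics` uses the standard definitions of accuracy, sensitivity, specificity, precision and Matthews correlation coefficient computed from confusion-matrix counts. `select_negatives` is a hypothetical, deliberately simplified stand-in (it ranks unlabeled miRNA-disease pair vectors by distance to the centroid of known associations) and is not the clustering procedure in the Seq-SymRF repository. All names, feature vectors and counts are made up for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_negatives(positives: List[Vector], unlabeled: List[Vector], n: int) -> List[int]:
    """HYPOTHETICAL simplification, not the Seq-SymRF procedure:
    rank unlabeled miRNA-disease pair vectors by Euclidean distance to the
    centroid of the known (positive) pairs and return the indices of the n
    farthest ones as candidate 'reliable' negatives."""
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: euclidean(unlabeled[i], centroid),
        reverse=True,  # farthest from the positives first
    )
    return ranked[:n]


def metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Textbook binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # recall / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy 2-D feature vectors (made up for illustration).
    positives = [[0.0, 0.0], [2.0, 0.0]]
    unlabeled = [[1.0, 0.0], [5.0, 0.0], [1.0, 3.0]]
    print(select_negatives(positives, unlabeled, n=2))  # [1, 2]

    # Illustrative counts, not the paper's results.
    for name, value in metrics(tp=90, tn=95, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Ranking by distance to a single centroid is the crudest possible distance-based filter; a genuine clustering step (for example k-means over the candidate pairs) would be closer to what the abstract describes.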
id pubmed-7578641
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
record pubmed-7578641, dated 2020-10-23
journal Sci Rep (Nature Publishing Group UK), article published 2020-10-21
license © The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third-party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.