Seq-SymRF: a random forest model predicts potential miRNA-disease associations based on information of sequences and clinical symptoms
Increasing evidence indicates that miRNAs play a vital role in biological processes and are closely related to various human diseases. Research on miRNA-disease associations is helpful not only for disease prevention, diagnosis and treatment, but also for new drug identification and lead compound di...
Main authors: | Li, Jinlong; Chen, Xingyu; Huang, Qixing; Wang, Yang; Xie, Yun; Dai, Zong; Zou, Xiaoyong; Li, Zhanchao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7578641/ https://www.ncbi.nlm.nih.gov/pubmed/33087810 http://dx.doi.org/10.1038/s41598-020-75005-9 |
_version_ | 1783598410576691200 |
---|---|
author | Li, Jinlong; Chen, Xingyu; Huang, Qixing; Wang, Yang; Xie, Yun; Dai, Zong; Zou, Xiaoyong; Li, Zhanchao |
author_facet | Li, Jinlong; Chen, Xingyu; Huang, Qixing; Wang, Yang; Xie, Yun; Dai, Zong; Zou, Xiaoyong; Li, Zhanchao |
author_sort | Li, Jinlong |
collection | PubMed |
description | Increasing evidence indicates that miRNAs play a vital role in biological processes and are closely related to various human diseases. Research on miRNA-disease associations supports not only disease prevention, diagnosis and treatment, but also new drug identification and lead compound discovery. A novel sequence- and symptom-based random forest model (Seq-SymRF) was developed to identify potential associations between miRNAs and diseases. Features derived from sequence information and clinical symptoms were used to characterize miRNAs and diseases, respectively. Moreover, a clustering method based on Euclidean distance was adopted to construct reliable negative samples. Under fivefold cross-validation, Seq-SymRF achieved an accuracy of 98.00%, a specificity of 99.43%, a sensitivity of 96.58%, a precision of 99.40% and a Matthews correlation coefficient of 0.9604. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.9967 and 0.9975, respectively. Additionally, case studies were carried out on leukemia, breast neoplasms and hsa-mir-21. Most of the top-25 predicted disease-related miRNAs (19/25 for leukemia; 20/25 for breast neoplasms) and 15 of the top-25 predicted miRNA-related diseases were verified by the literature and the dbDEMC database. It is anticipated that Seq-SymRF could serve as a powerful high-throughput virtual screening tool for drug research and development. All source code can be downloaded from https://github.com/LeeKamlong/Seq-SymRF. |
format | Online Article Text |
id | pubmed-7578641 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7578641 2020-10-23 Seq-SymRF: a random forest model predicts potential miRNA-disease associations based on information of sequences and clinical symptoms Li, Jinlong Chen, Xingyu Huang, Qixing Wang, Yang Xie, Yun Dai, Zong Zou, Xiaoyong Li, Zhanchao Sci Rep Article Increasing evidence indicates that miRNAs play a vital role in biological processes and are closely related to various human diseases. Research on miRNA-disease associations is helpful not only for disease prevention, diagnosis and treatment, but also for new drug identification and lead compound discovery. A novel sequence- and symptom-based random forest algorithm model (Seq-SymRF) was developed to identify potential associations between miRNA and disease. Features derived from sequence information and clinical symptoms were utilized to characterize miRNA and disease, respectively. Moreover, the clustering method by calculating the Euclidean distance was adopted to construct reliable negative samples. Based on the fivefold cross-validation, Seq-SymRF achieved the accuracy of 98.00%, specificity of 99.43%, sensitivity of 96.58%, precision of 99.40% and Matthews correlation coefficient of 0.9604, respectively. The areas under the receiver operating characteristic curve and precision recall curve were 0.9967 and 0.9975, respectively. Additionally, case studies were implemented with leukemia, breast neoplasms and hsa-mir-21. Most of the top-25 predicted disease-related miRNAs (19/25 for leukemia; 20/25 for breast neoplasms) and 15 of top-25 predicted miRNA-related diseases were verified by literature and dbDEMC database. It is anticipated that Seq-SymRF could be regarded as a powerful high-throughput virtual screening tool for drug research and development. All source codes can be downloaded from https://github.com/LeeKamlong/Seq-SymRF. 
Nature Publishing Group UK 2020-10-21 /pmc/articles/PMC7578641/ /pubmed/33087810 http://dx.doi.org/10.1038/s41598-020-75005-9 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Li, Jinlong Chen, Xingyu Huang, Qixing Wang, Yang Xie, Yun Dai, Zong Zou, Xiaoyong Li, Zhanchao Seq-SymRF: a random forest model predicts potential miRNA-disease associations based on information of sequences and clinical symptoms |
title | Seq-SymRF: a random forest model predicts potential miRNA-disease associations based on information of sequences and clinical symptoms |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7578641/ https://www.ncbi.nlm.nih.gov/pubmed/33087810 http://dx.doi.org/10.1038/s41598-020-75005-9 |
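
The description field reports accuracy, specificity, sensitivity, precision and the Matthews correlation coefficient under fivefold cross-validation. As a reading aid, the sketch below shows how these five quantities are conventionally derived from the confusion-matrix counts of a binary classifier. It is not taken from the Seq-SymRF repository: the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration, and they do not reproduce the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive samples predicted as positive/negative.
    tn/fp: negative samples predicted as negative/positive.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    # The MCC denominator is zero when any row or column of the confusion
    # matrix is empty; returning 0.0 in that case is a common convention.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the paper's results).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a k-fold setting, one common choice is to compute these metrics per fold and average them; the record does not say which aggregation the authors used.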