An improved random forest-based computational model for predicting novel miRNA-disease associations

BACKGROUND: A large body of evidence shows that miRNAs regulate the expression of their target genes at the post-transcriptional level and that miRNA dysregulation is related to many complex human diseases. Accurately identifying disease-related miRNAs is conducive to exploring the pathogenesis...

Bibliographic Details
Main Authors: Yao, Dengju; Zhan, Xiaojuan; Kwoh, Chee-Keong
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6889672/
https://www.ncbi.nlm.nih.gov/pubmed/31795954
http://dx.doi.org/10.1186/s12859-019-3290-7
author Yao, Dengju
Zhan, Xiaojuan
Kwoh, Chee-Keong
collection PubMed
description BACKGROUND: A large body of evidence shows that miRNAs regulate the expression of their target genes at the post-transcriptional level and that miRNA dysregulation is related to many complex human diseases. Accurately identifying disease-related miRNAs is conducive to exploring the pathogenesis and treatment of diseases. However, because experimental methods are time-consuming and expensive, predicting miRNA-disease associations with computational models has become a more economical and effective means. RESULTS: Inspired by previous work, we proposed an improved computational model based on random forest (RF) for identifying miRNA-disease associations (IRFMDA). First, the integrated similarity of diseases was calculated by combining their semantic similarity and Gaussian interaction profile kernel (GIPK) similarity, and the integrated similarity of miRNAs was calculated by combining their functional similarity and GIPK similarity. Then, the integrated similarity of diseases and the integrated similarity of miRNAs were combined to represent each miRNA-disease pair. Next, the miRNA-disease pairs contained in the HMDD (v2.0) database were taken as positive samples, and randomly constructed miRNA-disease pairs not included in HMDD (v2.0) were taken as negative samples. Feature selection based on the RF variable importance score was then performed to choose the more useful features for representing samples and so improve the model's ability to infer miRNA-disease associations. Finally, an RF regression model was trained on the reduced sample space to score the unknown miRNA-disease associations. The AUCs of IRFMDA under local leave-one-out cross-validation (LOOCV), global LOOCV and 5-fold cross-validation were 0.8728, 0.9398 and 0.9363, respectively, which were higher than those of several excellent models for predicting miRNA-disease associations.
Moreover, case studies on oesophageal cancer, lymphoma and lung cancer showed that 94 (oesophageal cancer), 98 (lymphoma) and 100 (lung cancer) of the top 100 disease-associated miRNAs predicted by IRFMDA were supported by experimental data in the dbDEMC (v2.0) database. CONCLUSIONS: Cross-validation and case studies demonstrated that IRFMDA is an excellent miRNA-disease association prediction model that can guide future experimental studies on the regulatory mechanisms of miRNAs in complex human diseases.
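The similarity and pair-representation steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give formulas, so the sketch assumes the Gaussian interaction profile kernel definition commonly used in this literature, a fallback rule for integration (use the semantic or functional similarity where it is non-zero, otherwise the GIPK similarity), and simple concatenation for the pair features. All function names and the toy matrices are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIPK similarity between the rows of a 0/1 interaction matrix.

    Assumed definition: K(i, j) = exp(-gamma * ||p_i - p_j||^2), with
    gamma = gamma_prime / (mean squared norm of the profiles).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


def integrate(primary, gip):
    """Assumed integration rule: keep the primary similarity if non-zero, else GIPK."""
    return [[p if p != 0 else g for p, g in zip(p_row, g_row)]
            for p_row, g_row in zip(primary, gip)]


def pair_features(m, d, mirna_sim, disease_sim):
    """Assumed pair representation: miRNA similarity row followed by disease row."""
    return mirna_sim[m] + disease_sim[d]


# Toy known-association matrix (hypothetical): rows are miRNAs, columns are diseases.
assoc = [[1, 0],
         [1, 1],
         [0, 1]]
disease_profiles = [list(col) for col in zip(*assoc)]

# Hypothetical functional and semantic similarities (0 means "not available").
mirna_func = [[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]]
disease_sem = [[1.0, 0.0], [0.0, 1.0]]

mirna_int = integrate(mirna_func, gip_kernel(assoc))
disease_int = integrate(disease_sem, gip_kernel(disease_profiles))
features = pair_features(0, 1, mirna_int, disease_int)
```

Each feature vector built this way has one entry per miRNA plus one per disease. In a pipeline like the one described, such vectors for positive and sampled negative pairs would then be passed to a random forest for importance-based feature selection and regression scoring; that stage is omitted here.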
format Online
Article
Text
id pubmed-6889672
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6889672 2019-12-11 An improved random forest-based computational model for predicting novel miRNA-disease associations Yao, Dengju Zhan, Xiaojuan Kwoh, Chee-Keong BMC Bioinformatics Research Article BioMed Central 2019-12-03 /pmc/articles/PMC6889672/ /pubmed/31795954 http://dx.doi.org/10.1186/s12859-019-3290-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An improved random forest-based computational model for predicting novel miRNA-disease associations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6889672/
https://www.ncbi.nlm.nih.gov/pubmed/31795954
http://dx.doi.org/10.1186/s12859-019-3290-7