
Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from...


Bibliographic Details
Main authors: Ma, Xin; Guo, Jing; Sun, Xiao
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4620426/
https://www.ncbi.nlm.nih.gov/pubmed/26543860
http://dx.doi.org/10.1155/2015/425810
author Ma, Xin
Guo, Jing
Sun, Xiao
collection PubMed
description The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.
format Online
Article
Text
id pubmed-4620426
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4620426 2015-11-05 Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection Ma, Xin Guo, Jing Sun, Xiao Biomed Res Int Research Article The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information. Hindawi Publishing Corporation 2015 2015-10-12 /pmc/articles/PMC4620426/ /pubmed/26543860 http://dx.doi.org/10.1155/2015/425810 Text en Copyright © 2015 Xin Ma et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection
topic Research Article