IDP–CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields

Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predict...

Bibliographic Details
Main Authors: Liu, Yumeng; Wang, Xiaolong; Liu, Bin
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6164615/
https://www.ncbi.nlm.nih.gov/pubmed/30135358
http://dx.doi.org/10.3390/ijms19092483
author Liu, Yumeng
Wang, Xiaolong
Liu, Bin
collection PubMed
description Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we proposed a new computational predictor called IDP–CRF, which is trained on an updated benchmark dataset based on the MobiDB database and the DisProt database, and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), kmer, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP–CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP–CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP–CRF will facilitate the development of protein sequence analysis.
format Online
Article
Text
id pubmed-6164615
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-61646152018-10-10 IDP–CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields Liu, Yumeng Wang, Xiaolong Liu, Bin Int J Mol Sci Article Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we proposed a new computational predictor called IDP–CRF, which is trained on an updated benchmark dataset based on the MobiDB database and the DisProt database, and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), kmer, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP–CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP–CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP–CRF will facilitate the development of protein sequence analysis. MDPI 2018-08-22 /pmc/articles/PMC6164615/ /pubmed/30135358 http://dx.doi.org/10.3390/ijms19092483 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title IDP–CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6164615/
https://www.ncbi.nlm.nih.gov/pubmed/30135358
http://dx.doi.org/10.3390/ijms19092483