
Evaluation of Sequence Features from Intrinsically Disordered Regions for the Estimation of Protein Function

With the exponential increase in the number of sequenced organisms, automated annotation of proteins is becoming increasingly important. Intrinsically disordered regions are known to play a significant role in protein function. Despite their abundance, especially in eukaryotes, they are rarely used...


Bibliographic Details
Main Authors: Sharma, Alok, Dehzangi, Abdollah, Lyons, James, Imoto, Seiya, Miyano, Satoru, Nakai, Kenta, Patil, Ashwini
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3933697/
https://www.ncbi.nlm.nih.gov/pubmed/24587103
http://dx.doi.org/10.1371/journal.pone.0089890
_version_ 1782304971782356992
author Sharma, Alok
Dehzangi, Abdollah
Lyons, James
Imoto, Seiya
Miyano, Satoru
Nakai, Kenta
Patil, Ashwini
author_sort Sharma, Alok
collection PubMed
description With the exponential increase in the number of sequenced organisms, automated annotation of proteins is becoming increasingly important. Intrinsically disordered regions are known to play a significant role in protein function. Despite their abundance, especially in eukaryotes, they are rarely used to inform function prediction systems. In this study, we extracted seven sequence features in intrinsically disordered regions and developed a scheme to use them to predict Gene Ontology Slim terms associated with proteins. We evaluated the function prediction performance of each feature. Our results indicate that the residue composition based features have the highest precision while bigram probabilities, based on sequence profiles of intrinsically disordered regions obtained from PSIBlast, have the highest recall. Amino acid bigrams and features based on secondary structure show an intermediate level of precision and recall. Almost all features showed a high prediction performance for GO Slim terms related to extracellular matrix, nucleus, RNA and DNA binding. However, feature performance varied significantly for different GO Slim terms emphasizing the need for a unique classifier optimized for the prediction of each functional term. These findings provide a first comprehensive and quantitative evaluation of sequence features in intrinsically disordered regions and will help in the development of a more informative protein function predictor.
format Online
Article
Text
id pubmed-3933697
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3933697 2014-02-25 Evaluation of Sequence Features from Intrinsically Disordered Regions for the Estimation of Protein Function Sharma, Alok Dehzangi, Abdollah Lyons, James Imoto, Seiya Miyano, Satoru Nakai, Kenta Patil, Ashwini PLoS One Research Article
Public Library of Science 2014-02-24 /pmc/articles/PMC3933697/ /pubmed/24587103 http://dx.doi.org/10.1371/journal.pone.0089890 Text en © 2014 Sharma et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Evaluation of Sequence Features from Intrinsically Disordered Regions for the Estimation of Protein Function
topic Research Article