Evaluation of Sequence Features from Intrinsically Disordered Regions for the Estimation of Protein Function
Main authors: | Sharma, Alok; Dehzangi, Abdollah; Lyons, James; Imoto, Seiya; Miyano, Satoru; Nakai, Kenta; Patil, Ashwini |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3933697/ https://www.ncbi.nlm.nih.gov/pubmed/24587103 http://dx.doi.org/10.1371/journal.pone.0089890 |
author | Sharma, Alok; Dehzangi, Abdollah; Lyons, James; Imoto, Seiya; Miyano, Satoru; Nakai, Kenta; Patil, Ashwini |
---|---|
collection | PubMed |
description | With the exponential increase in the number of sequenced organisms, automated annotation of proteins is becoming increasingly important. Intrinsically disordered regions are known to play a significant role in protein function. Despite their abundance, especially in eukaryotes, they are rarely used to inform function prediction systems. In this study, we extracted seven sequence features from intrinsically disordered regions and developed a scheme that uses them to predict the Gene Ontology (GO) Slim terms associated with proteins. We evaluated the function prediction performance of each feature. Our results indicate that residue composition-based features have the highest precision, while bigram probabilities based on PSI-BLAST sequence profiles of intrinsically disordered regions have the highest recall. Amino acid bigrams and features based on secondary structure show intermediate precision and recall. Almost all features showed high prediction performance for GO Slim terms related to extracellular matrix, nucleus, RNA binding and DNA binding. However, feature performance varied significantly across GO Slim terms, emphasizing the need for a separate classifier optimized for the prediction of each functional term. These findings provide a first comprehensive and quantitative evaluation of sequence features in intrinsically disordered regions and will help in the development of a more informative protein function predictor. |
id | pubmed-3933697 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-3933697, 2014-02-25. PLoS One, Research Article. Public Library of Science, 2014-02-24. Text, en. © 2014 Sharma et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
topic | Research Article |
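The description names residue composition, amino acid bigrams and PSI-BLAST profile bigram probabilities as features of intrinsically disordered regions, scored by precision and recall per GO Slim term. The Python below is a minimal sketch of those feature types for a single region, not the authors' code. Several details are assumptions made for illustration: the alphabetical ordering of the 20 amino acids, normalization by total counts, the premise that the profile is already an L × 20 matrix of per-position probabilities (not raw PSI-BLAST log-odds scores), and the profile bigram formula B[m, n] = Σ_i P[i, m]·P[i+1, n]. All function names are hypothetical.

```python
import numpy as np

# Assumed fixed order for the 20 standard amino acids (one-letter codes, alphabetical).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def residue_composition(region: str) -> np.ndarray:
    """20 values: fraction of each standard residue in one disordered region."""
    counts = np.zeros(20)
    for aa in region.upper():
        if aa in INDEX:  # non-standard letters (X, U, ...) are skipped
            counts[INDEX[aa]] += 1
    total = counts.sum()
    return counts / total if total else counts


def aa_bigrams(region: str) -> np.ndarray:
    """400 values: relative frequency of each ordered pair of adjacent residues."""
    seq = region.upper()
    counts = np.zeros((20, 20))
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:
            counts[INDEX[a], INDEX[b]] += 1
    total = counts.sum()
    return (counts / total if total else counts).ravel()


def profile_bigrams(profile: np.ndarray) -> np.ndarray:
    """400 values from an L x 20 per-position probability profile.

    Assumed formulation: B[m, n] = sum over i of P[i, m] * P[i + 1, n].
    """
    p = np.asarray(profile, dtype=float)
    if p.ndim != 2 or p.shape[1] != 20:
        raise ValueError("profile must have shape (L, 20)")
    # (20 x L-1) @ (L-1 x 20) sums products of consecutive positions.
    return (p[:-1].T @ p[1:]).ravel()


def precision_recall(predicted: set, annotated: set) -> tuple:
    """Precision and recall for one GO Slim term over sets of protein IDs."""
    tp = len(predicted & annotated)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(annotated) if annotated else 0.0
    return precision, recall


if __name__ == "__main__":
    region = "SPEDKKSPEE"  # hypothetical disordered segment
    print(residue_composition(region)[INDEX["E"]])
    print(aa_bigrams(region).shape)
    print(precision_recall({"P1", "P2"}, {"P2", "P3", "P4"}))
```

Each region maps to vectors of fixed size (20, 400 and 400 values) whatever its length, so regions of different lengths can feed the same per-term classifier. How the study aggregated precision and recall across terms is not stated in this record.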