Cargando…
FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences
To understand fully cell behaviour, biologists are making progress towards cataloguing the functional elements in the human genome and characterising their roles across a variety of tissues and conditions. Yet, functional information – either experimentally validated or computationally inferred by s...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2013
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3661659/ https://www.ncbi.nlm.nih.gov/pubmed/23717476 http://dx.doi.org/10.1371/journal.pone.0063754 |
_version_ | 1782270723751936000 |
---|---|
author | Minneci, Federico Piovesan, Damiano Cozzetto, Domenico Jones, David T. |
author_facet | Minneci, Federico Piovesan, Damiano Cozzetto, Domenico Jones, David T. |
author_sort | Minneci, Federico |
collection | PubMed |
description | To understand fully cell behaviour, biologists are making progress towards cataloguing the functional elements in the human genome and characterising their roles across a variety of tissues and conditions. Yet, functional information – either experimentally validated or computationally inferred by similarity – remains completely missing for approximately 30% of human proteins. FFPred was initially developed to bridge this gap by targeting sequences with distant or no homologues of known function and by exploiting clear patterns of intrinsic disorder associated with particular molecular activities and biological processes. Here, we present an updated and improved version, which builds on larger datasets of protein sequences and annotations, and uses updated component feature predictors as well as revised training procedures. FFPred 2.0 includes support vector regression models for the prediction of 442 Gene Ontology (GO) terms, which largely expand the coverage of the ontology and of the biological process category in particular. The GO term list mainly revolves around macromolecular interactions and their role in regulatory, signalling, developmental and metabolic processes. Benchmarking experiments on newly annotated proteins show that FFPred 2.0 provides more accurate functional assignments than its predecessor and the ProtFun server do; also, its assignments can complement information obtained using BLAST-based transfer of annotations, improving especially prediction in the biological process category. Furthermore, FFPred 2.0 can be used to annotate proteins belonging to several eukaryotic organisms with a limited decrease in prediction quality. We illustrate all these points through the use of both precision-recall plots and of the COGIC scores, which we recently proposed as an alternative numerical evaluation measure of function prediction accuracy. |
format | Online Article Text |
id | pubmed-3661659 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-36616592013-05-28 FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences Minneci, Federico Piovesan, Damiano Cozzetto, Domenico Jones, David T. PLoS One Research Article To understand fully cell behaviour, biologists are making progress towards cataloguing the functional elements in the human genome and characterising their roles across a variety of tissues and conditions. Yet, functional information – either experimentally validated or computationally inferred by similarity – remains completely missing for approximately 30% of human proteins. FFPred was initially developed to bridge this gap by targeting sequences with distant or no homologues of known function and by exploiting clear patterns of intrinsic disorder associated with particular molecular activities and biological processes. Here, we present an updated and improved version, which builds on larger datasets of protein sequences and annotations, and uses updated component feature predictors as well as revised training procedures. FFPred 2.0 includes support vector regression models for the prediction of 442 Gene Ontology (GO) terms, which largely expand the coverage of the ontology and of the biological process category in particular. The GO term list mainly revolves around macromolecular interactions and their role in regulatory, signalling, developmental and metabolic processes. Benchmarking experiments on newly annotated proteins show that FFPred 2.0 provides more accurate functional assignments than its predecessor and the ProtFun server do; also, its assignments can complement information obtained using BLAST-based transfer of annotations, improving especially prediction in the biological process category. Furthermore, FFPred 2.0 can be used to annotate proteins belonging to several eukaryotic organisms with a limited decrease in prediction quality. We illustrate all these points through the use of both precision-recall plots and of the COGIC scores, which we recently proposed as an alternative numerical evaluation measure of function prediction accuracy. Public Library of Science 2013-05-22 /pmc/articles/PMC3661659/ /pubmed/23717476 http://dx.doi.org/10.1371/journal.pone.0063754 Text en © 2013 Minneci et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Minneci, Federico Piovesan, Damiano Cozzetto, Domenico Jones, David T. FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences |
title | FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences |
title_full | FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences |
title_fullStr | FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences |
title_full_unstemmed | FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences |
title_short | FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences |
title_sort | ffpred 2.0: improved homology-independent prediction of gene ontology terms for eukaryotic protein sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3661659/ https://www.ncbi.nlm.nih.gov/pubmed/23717476 http://dx.doi.org/10.1371/journal.pone.0063754 |
work_keys_str_mv | AT minnecifederico ffpred20improvedhomologyindependentpredictionofgeneontologytermsforeukaryoticproteinsequences AT piovesandamiano ffpred20improvedhomologyindependentpredictionofgeneontologytermsforeukaryoticproteinsequences AT cozzettodomenico ffpred20improvedhomologyindependentpredictionofgeneontologytermsforeukaryoticproteinsequences AT jonesdavidt ffpred20improvedhomologyindependentpredictionofgeneontologytermsforeukaryoticproteinsequences |