
Predictability of gene ontology slim-terms from primary structure information in Embryophyta plant proteins


Bibliographic Details
Main authors: Jaramillo-Garzón, Jorge Alberto, Gallardo-Chacón, Joan Josep, Castellanos-Domínguez, César Germán, Perera-Lluna, Alexandre
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3660269/
https://www.ncbi.nlm.nih.gov/pubmed/23441934
http://dx.doi.org/10.1186/1471-2105-14-68
collection PubMed
description BACKGROUND: Proteins are the key elements on the path from genetic information to the development of life. The roles played by different proteins are difficult to uncover experimentally, as this process involves complex procedures such as genetic modification, injection of fluorescent proteins, gene knock-out methods and others. The knowledge learned about each protein is usually annotated in databases through different methods, such as the one proposed by the Gene Ontology (GO) Consortium. Different methods have been proposed to predict GO terms from primary structure information, but very few are available for large-scale functional annotation of plants, and their reported success rates are much lower than those reported by non-plant predictors. This paper explores the predictability of GO annotations for proteins belonging to the Embryophyta group from a set of features extracted solely from their primary amino acid sequence. RESULTS: High predictability was found for several GO terms in the Molecular Function and Cellular Component ontologies. As expected, a lower degree of predictability was found for Biological Process annotations, although a few biological processes were easily predicted. Proteins related to transport and transcription were particularly well predicted from primary structure information. The most discriminant features for prediction were those related to the electric charges of the amino acid sequence and hydropathicity-derived features. CONCLUSIONS: An analysis of GO-slim term predictability in plants was carried out to determine the single categories or groups of functions that are most related to primary structure information. For each highly predictable GO term, the features responsible for that success were identified and discussed.
In contrast to most published studies, which focus on a few categories or on single ontologies, the results in this paper comprise a complete landscape of GO predictability from primary structure, encompassing 75 GO terms at the molecular, cellular and phenotypical levels. Thus, it provides a valuable guide for researchers interested in further advances in protein function prediction for Embryophyta plants.
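The abstract names charge-related and hydropathicity-derived sequence features as the most discriminant. This record does not describe the paper's exact feature set, so the sketch below is only a rough illustration of features of that kind. It rests on several assumptions that are not taken from the paper: the Kyte-Doolittle hydropathy scale, K/R counted as positive and D/E as negative residues (H ignored), and a hypothetical 19-residue sliding window. The function name `charge_hydropathy_features` is likewise hypothetical.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale (assumed choice; the paper's scale is not stated here).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POSITIVE = set("KR")  # assumption: histidine treated as neutral
NEGATIVE = set("DE")


def _max_window_mean(values: List[float], window: int) -> float:
    """Highest mean over any contiguous window (whole sequence if shorter)."""
    w = min(window, len(values))
    current = sum(values[:w])
    best = current
    for i in range(w, len(values)):
        current += values[i] - values[i - w]  # slide the window by one residue
        best = max(best, current)
    return best / w


def charge_hydropathy_features(seq: str, window: int = 19) -> Dict[str, float]:
    """Hypothetical charge and hydropathy features from a protein sequence."""
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]  # skip non-standard symbols
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    values = [KYTE_DOOLITTLE[aa] for aa in residues]
    pos = sum(aa in POSITIVE for aa in residues)
    neg = sum(aa in NEGATIVE for aa in residues)
    return {
        "gravy": sum(values) / n,  # grand average of hydropathy
        "max_window_hydropathy": _max_window_mean(values, window),
        "frac_positive": pos / n,
        "frac_negative": neg / n,
        "net_charge": float(pos - neg),  # crude count-based charge at neutral pH
    }


if __name__ == "__main__":
    print(charge_hydropathy_features("MKTAYIAKQRQISFVKSHFSRQ"))
```

A vector of this kind could then be passed to a per-term classifier for each GO-slim category. The classifier and any further features are outside what this record describes.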
id pubmed-3660269
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source BMC Bioinformatics, Research Article, BioMed Central, 2013-02-26
rights Copyright © 2013 Jaramillo-Garzón et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article