Detailed analysis of putative genes encoding small proteins in legume genomes

Diverse plant genome sequencing projects coupled with powerful bioinformatics tools have facilitated massive data analysis to construct specialized databases classified according to cellular function. However, there are still a considerable number of genes encoding proteins whose function has not ye...

Bibliographic Details
Main Authors: Guillén, Gabriel; Díaz-Camino, Claudia; Loyola-Torres, Carlos A.; Aparicio-Fabre, Rosaura; Hernández-López, Alejandrina; Díaz-Sánchez, Mauricio; Sanchez, Federico
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2013
Subjects: Plant Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3687714/
https://www.ncbi.nlm.nih.gov/pubmed/23802007
http://dx.doi.org/10.3389/fpls.2013.00208
author Guillén, Gabriel
Díaz-Camino, Claudia
Loyola-Torres, Carlos A.
Aparicio-Fabre, Rosaura
Hernández-López, Alejandrina
Díaz-Sánchez, Mauricio
Sanchez, Federico
collection PubMed
description Diverse plant genome sequencing projects coupled with powerful bioinformatics tools have facilitated massive data analysis to construct specialized databases classified according to cellular function. However, there are still a considerable number of genes encoding proteins whose function has not yet been characterized. Included in this category are small proteins (SPs, 30–150 amino acids) encoded by short open reading frames (sORFs). SPs play important roles in plant physiology, growth, and development. Unfortunately, protocols focused on the genome-wide identification and characterization of sORFs are scarce or remain poorly implemented. As a result, these genes are underrepresented in many genome annotations. In this work, we exploited publicly available genome sequences of Phaseolus vulgaris, Medicago truncatula, Glycine max, and Lotus japonicus to analyze the abundance of annotated SPs in legumes. Our strategy to uncover bona fide sORFs at the genome level was centered on bioinformatics analysis of characteristics such as evidence of expression (transcription), presence of known protein regions or domains, and identification of orthologous genes in the genomes explored. We collected 6,170, 10,461, 30,521, and 23,599 putative sORFs from P. vulgaris, G. max, M. truncatula, and L. japonicus genomes, respectively. Expressed sequence tags (ESTs) available in the DFCI Gene Index database provided evidence that approximately one-third of the predicted legume sORFs are expressed. Most potential SPs have a counterpart in a different plant species and counterpart regions or domains in larger proteins. Potential functional sORFs were also classified according to a reduced set of GO categories, and the expression of 13 of them during P. vulgaris nodule ontogeny was confirmed by qPCR. This analysis provides a collection of sORFs that potentially encode meaningful SPs, and offers the possibility of their further functional evaluation.
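The size criterion in the description above (SPs of 30–150 amino acids) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `parse_fasta` and `select_small_proteins`, the use of FASTA-formatted protein text as input, and the stripping of a trailing `*` stop symbol are all assumptions made for illustration. The sketch covers only the length filter, not the expression, domain, or orthology evidence the abstract mentions.

```python
from typing import Dict, Iterator, Tuple

# Size window for small proteins as stated in the abstract (inclusive).
MIN_AA = 30
MAX_AA = 150


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text.

    Hypothetical helper: the identifier is the first whitespace-delimited
    token of each header line.
    """
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_small_proteins(
    fasta_text: str, min_aa: int = MIN_AA, max_aa: int = MAX_AA
) -> Dict[str, str]:
    """Return annotated proteins whose length falls within [min_aa, max_aa]."""
    selected = {}
    for name, seq in parse_fasta(fasta_text):
        seq = seq.rstrip("*")  # assumption: '*' marks the stop codon, not a residue
        if min_aa <= len(seq) <= max_aa:
            selected[name] = seq
    return selected
```

Calling `select_small_proteins` on the text of an annotated proteome file would return only the candidate SPs. Each candidate would still need the additional lines of evidence described in the abstract before being treated as a bona fide sORF product.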
format Online
Article
Text
id pubmed-3687714
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-3687714 2013-06-25 Detailed analysis of putative genes encoding small proteins in legume genomes Front Plant Sci Plant Science Frontiers Media S.A. 2013-06-20 /pmc/articles/PMC3687714/ /pubmed/23802007 http://dx.doi.org/10.3389/fpls.2013.00208 Text en Copyright © 2013 Guillén, Díaz-Camino, Loyola-Torres, Aparicio-Fabre, Hernández-López, Díaz-Sánchez and Sanchez. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
title Detailed analysis of putative genes encoding small proteins in legume genomes
topic Plant Science