
Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome

BACKGROUND: The structure annotation of a genome is based either on ab initio methodologies or on similarity searches versus molecules that have already been annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab i...


Bibliographic Details
Main Authors:	D'Agostino, Nunzio; Traini, Alessandra; Frusciante, Luigi; Chiusano, Maria Luisa
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1885861/ https://www.ncbi.nlm.nih.gov/pubmed/17430576 http://dx.doi.org/10.1186/1471-2105-8-S1-S9

_version_	1782133658479493120
author	D'Agostino, Nunzio Traini, Alessandra Frusciante, Luigi Chiusano, Maria Luisa
author_facet	D'Agostino, Nunzio Traini, Alessandra Frusciante, Luigi Chiusano, Maria Luisa
author_sort	D'Agostino, Nunzio
collection	PubMed
description	BACKGROUND: The structure annotation of a genome is based either on ab initio methodologies or on similarity searches versus molecules that have already been annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task, the common approach is to align species-specific full-length cDNA and EST sequences along the genomic sequences in order to define the exon/intron structure of mRNA coding genes. RESULTS: GeneModelEST is the software proposed here for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences. GeneModelEST requires the genome coordinates of the spliced alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure to be formatted in a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file. The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign preliminary functional annotations to the candidate gene models. We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium. CONCLUSION: The contig classification procedure used by GeneModelEST supports the detection of candidate gene models and the identification of potential alternative transcripts, and it is useful for filtering out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large-scale analysis in order to provide species-specific gene models that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human-curated annotation.
format	Text
id	pubmed-1885861
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1885861 2007-06-05 Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome D'Agostino, Nunzio Traini, Alessandra Frusciante, Luigi Chiusano, Maria Luisa BMC Bioinformatics Research BioMed Central 2007-03-08 /pmc/articles/PMC1885861/ /pubmed/17430576 http://dx.doi.org/10.1186/1471-2105-8-S1-S9 Text en Copyright © 2007 D'Agostino et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research D'Agostino, Nunzio Traini, Alessandra Frusciante, Luigi Chiusano, Maria Luisa Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome
title	Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome
title_full	Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome
title_fullStr	Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome
title_full_unstemmed	Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome
title_short	Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome
title_sort	gene models from ests (genemodelest): an application on the solanum lycopersicum genome
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1885861/ https://www.ncbi.nlm.nih.gov/pubmed/17430576 http://dx.doi.org/10.1186/1471-2105-8-S1-S9
work_keys_str_mv	AT dagostinonunzio genemodelsfromestsgenemodelestanapplicationonthesolanumlycopersicumgenome AT trainialessandra genemodelsfromestsgenemodelestanapplicationonthesolanumlycopersicumgenome AT fruscianteluigi genemodelsfromestsgenemodelestanapplicationonthesolanumlycopersicumgenome AT chiusanomarialuisa genemodelsfromestsgenemodelestanapplicationonthesolanumlycopersicumgenome
