Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome

Bibliographic Details
Main authors: Aubourg, Sébastien, Martin-Magniette, Marie-Laure, Brunaud, Véronique, Taconnat, Ludivine, Bitton, Frédérique, Balzergue, Sandrine, Jullien, Pauline E, Ingouff, Mathieu, Thareau, Vincent, Schiex, Thomas, Lecharny, Alain, Renou, Jean-Pierre
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2174955/
https://www.ncbi.nlm.nih.gov/pubmed/17980019
http://dx.doi.org/10.1186/1471-2164-8-401
author Aubourg, Sébastien
Martin-Magniette, Marie-Laure
Brunaud, Véronique
Taconnat, Ludivine
Bitton, Frédérique
Balzergue, Sandrine
Jullien, Pauline E
Ingouff, Mathieu
Thareau, Vincent
Schiex, Thomas
Lecharny, Alain
Renou, Jean-Pierre
collection PubMed
description BACKGROUND: Since the completion of the Arabidopsis thaliana genome sequence, the Arabidopsis community and the annotation centers have been working to improve gene annotation at both the structural and functional levels. In this context, we used the large CATMA resource on the Arabidopsis transcriptome to search for genes missed by different annotation processes. Probes on the CATMA microarrays are gene-specific sequence tags (GSTs) based on the CDS models predicted by the Eugene software. Among the 24 576 CATMA v2 GSTs, 677 lie in regions considered intergenic by the TAIR annotation. We analyzed the cognate transcriptome data in the CATMA resource and carried out data mining to characterize novel genes and improve gene models. RESULTS: Statistical analysis of more than 500 hybridized samples from 12 organs provides experimental validation for 465 novel genes. RT-PCR confirmed the hybridization evidence for 88% of these 465 novel genes. Comparisons with the current annotation show that these novel genes often encode small proteins, with an average size of 137 aa. Our approach also improved pre-existing gene models, both by extending 16 CDS and by identifying 13 gene models that had erroneously merged two CDS. CONCLUSION: This work is a notable step forward in improving the Arabidopsis genome annotation. We increased the number of validated Arabidopsis genes by 465 novel transcribed genes, to which we assigned several functional annotations, such as expression profiles, sequence conservation in plants, cognate transcripts and protein motifs.
format Text
id pubmed-2174955
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2174955 2008-01-05
BMC Genomics, Research Article, BioMed Central, 2007-11-02
Copyright © 2007 Aubourg et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2174955/
https://www.ncbi.nlm.nih.gov/pubmed/17980019
http://dx.doi.org/10.1186/1471-2164-8-401