Protein encoding genes in an ancient plant: analysis of codon usage, retained genes and splice sites in a moss, Physcomitrella patens

BACKGROUND: The moss Physcomitrella patens is an emerging plant model system due to its high rate of homologous recombination, haploidy, simple body plan, physiological properties as well as phylogenetic position. Available EST data was clustered and assembled, and provided the basis for a genome-wi...

Bibliographic Details
Main authors: Rensing, Stefan A, Fritzowsky, Dana, Lang, Daniel, Reski, Ralf
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1079823/
https://www.ncbi.nlm.nih.gov/pubmed/15784153
http://dx.doi.org/10.1186/1471-2164-6-43
author Rensing, Stefan A
Fritzowsky, Dana
Lang, Daniel
Reski, Ralf
collection PubMed
description BACKGROUND: The moss Physcomitrella patens is an emerging plant model system due to its high rate of homologous recombination, haploidy, simple body plan, physiological properties as well as phylogenetic position. Available EST data were clustered and assembled, and provided the basis for a genome-wide analysis of protein encoding genes. RESULTS: We have clustered and assembled Physcomitrella patens EST and CDS data in order to represent the transcriptome of this non-seed plant. Clustering of the publicly available data and subsequent prediction resulted in a total of 19,081 non-redundant ORFs. Of these putative transcripts, approximately 30% have a homolog in both the rice and Arabidopsis transcriptomes. More than 130 transcripts are not present in seed plants but can be found in other kingdoms. These potential "retained genes" might have been lost during seed plant evolution. Functional annotation of these genes reveals unequal distribution among taxonomic groups and intriguing putative functions such as cytotoxicity and nucleic acid repair. Whereas introns in the moss are larger on average than in the seed plant Arabidopsis thaliana, the position and number of introns are approximately the same. Contrary to Arabidopsis, where CDS contain on average 44% G/C, in Physcomitrella the average G/C content is 50%. Interestingly, moss orthologs of Arabidopsis genes show a significant drift of codon fraction usage towards the seed plant. While averaged codon bias is the same in Physcomitrella and Arabidopsis, the distribution pattern is different, with 15% of moss genes being unbiased. Species-specific, sensitive and selective splice site prediction for Physcomitrella has been developed using a dataset of 368 donor and acceptor sites, utilizing a support vector machine. The prediction accuracy is better than that achieved with tools trained on Arabidopsis data.
CONCLUSION: Analysis of the moss transcriptome displays differences in gene structure, codon and splice site usage in comparison with the seed plant Arabidopsis. Putative retained genes exhibit possible functions that might explain the peculiar physiological properties of mosses. Both the transcriptome representation (including a BLAST and retrieval service) and splice site prediction have been made available on , setting the basis for assembly and annotation of the Physcomitrella genome, of which draft shotgun sequences will become available in 2005.
format Text
id pubmed-1079823
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1079823 2005-04-15 Protein encoding genes in an ancient plant: analysis of codon usage, retained genes and splice sites in a moss, Physcomitrella patens Rensing, Stefan A Fritzowsky, Dana Lang, Daniel Reski, Ralf BMC Genomics Research Article BioMed Central 2005-03-22 /pmc/articles/PMC1079823/ /pubmed/15784153 http://dx.doi.org/10.1186/1471-2164-6-43 Text en Copyright © 2005 Rensing et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Protein encoding genes in an ancient plant: analysis of codon usage, retained genes and splice sites in a moss, Physcomitrella patens
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1079823/
https://www.ncbi.nlm.nih.gov/pubmed/15784153
http://dx.doi.org/10.1186/1471-2164-6-43
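
The abstract compares average G/C content of coding sequences (44% in Arabidopsis, 50% in Physcomitrella) and codon fraction usage between the two species. As a rough illustration of what those two quantities measure, the sketch below computes them for a single coding sequence. It is not the authors' pipeline: the function names and the toy sequence are hypothetical, and a real analysis would average over many CDS read from a sequence file.

```python
from collections import Counter


def gc_content(cds: str) -> float:
    """Return the fraction of G and C bases in a coding sequence."""
    seq = cds.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_fractions(cds: str, synonymous: list[str]) -> dict[str, float]:
    """Return each codon's share among a set of synonymous codons.

    'synonymous' lists the codons that encode one amino acid; the
    fractions sum to 1 when at least one of them occurs in the CDS.
    """
    counts = codon_counts(cds)
    total = sum(counts[codon] for codon in synonymous)
    if total == 0:
        return {codon: 0.0 for codon in synonymous}
    return {codon: counts[codon] / total for codon in synonymous}


# Hypothetical toy CDS: Met, four alanine codons, stop.
toy_cds = "ATGGCTGCCGCAGCGTAA"
alanine = ["GCT", "GCC", "GCA", "GCG"]

print(f"G/C content: {gc_content(toy_cds):.3f}")
print("Alanine codon fractions:", codon_fractions(toy_cds, alanine))
```

Comparing such per-amino-acid fractions between orthologous genes of two species is one simple way to look at a shift in codon usage; the measure of codon bias used in the paper itself is not specified in this record.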